
OpenFeeder – LLM-native web content API (better accuracy, 20x less data)

Posted by jcviau | 3 hours ago | 2 comments

jcviau 2 hours ago

Author here. A few things worth clarifying upfront:

The benchmark methodology is simple: we made HTTP requests with the actual User-Agents these crawlers use in production and measured raw response sizes. "Actual text content" is after stripping tags/scripts/styles — it still includes some noise (aria-labels, data attributes), so the real useful content ratio is probably worse.
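The tag/script/style stripping described above can be sketched with the standard library; this is an illustrative reconstruction, not the actual benchmark code, and the sample `page` string is made up:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.depth = 0    # nesting depth inside skipped tags
        self.chunks = []  # visible text fragments

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.chunks.append(data)

def content_ratio(html: str) -> float:
    """Bytes of stripped, whitespace-collapsed text divided by bytes of raw HTML."""
    p = TextExtractor()
    p.feed(html)
    text = " ".join(" ".join(p.chunks).split())
    return len(text.encode()) / len(html.encode())

# Tiny made-up page: only "Hello world" (and the nav placeholder) survives stripping.
page = ("<html><head><style>p{color:red}</style></head>"
        "<body><nav>...</nav><p>Hello world</p>"
        "<script>track()</script></body></html>")
print(round(content_ratio(page), 3))
```

As the comment notes, a parser like this still keeps text-bearing noise, so the measured ratio is an upper bound on useful content.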

The protocol itself is intentionally minimal. Two endpoints, clean JSON, HTTP caching built in. The spec is ~10 pages. If you've dealt with robots.txt or sitemaps, it'll feel familiar.
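To make "clean JSON, HTTP caching built in" concrete, here is a hypothetical sketch of consuming such a feed; the field names (`items`, `title`, `updated`, `text`) and the ETag revalidation flow are my assumptions, not taken from the actual spec:

```python
import json

# Made-up example payload; the real response shape is defined by the spec.
raw = """{
  "version": "1.0",
  "items": [
    {"url": "https://example.com/post-1",
     "title": "Hello",
     "updated": "2024-05-01T12:00:00Z",
     "text": "Plain article text, no nav or scripts."}
  ]
}"""

feed = json.loads(raw)

def revalidation_headers(etag=None):
    """Headers for a conditional GET; a 304 response means the cached copy is current."""
    return {"If-None-Match": etag} if etag else {}

print([item["title"] for item in feed["items"]])
print(revalidation_headers('"abc123"'))
```

The point is that a client gets structured content directly, with standard HTTP caching semantics instead of a bespoke sync protocol.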

The WordPress plugin is the most mature implementation (200 lines of PHP, hooks into WP_Query, 22 tests). The Express middleware and Docker sidecar are also solid. The other adapters (Next.js, FastAPI, Astro, etc.) work but are less battle-tested.

Things I'm genuinely unsure about: whether LLM operators will actually implement client-side support, or whether this ends up being useful mainly for RAG pipelines and AI agents rather than general crawlers. Happy to discuss the threat model.

What would make this useful for you?

jcviau 3 hours ago

Two problems with LLMs consuming web content today:

Quality: they get full browser HTML, roughly 90% of which is nav/scripts/ads; the content they actually need is about 10% of what they parse. They infer structure, miss updates, and get things wrong.

Cost/environment: ~80KB of HTML per page when only ~4KB of actual content is needed. That 20x overhead, multiplied across millions of daily crawls, adds up to massive bandwidth and energy waste.

OpenFeeder: one /openfeeder endpoint per site, clean structured JSON, semantic search, differential sync.
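A differential-sync client could look roughly like this; the `updated_since` query parameter and the item field names are hypothetical, standing in for whatever the spec actually defines:

```python
class SyncState:
    """Tracks the newest item timestamp seen, so later fetches can be incremental."""

    def __init__(self):
        self.last_sync = None  # ISO 8601 timestamp of the newest item ingested

    def query(self) -> str:
        """Path for the next fetch: full feed first, then only changed items."""
        if self.last_sync:
            return f"/openfeeder?updated_since={self.last_sync}"
        return "/openfeeder"

    def ingest(self, items):
        """Record the newest `updated` timestamp from a batch of feed items."""
        for item in items:
            ts = item["updated"]
            if self.last_sync is None or ts > self.last_sync:
                self.last_sync = ts

s = SyncState()
print(s.query())  # first sync fetches everything: /openfeeder
s.ingest([{"updated": "2024-05-01T12:00:00Z"}])
print(s.query())  # /openfeeder?updated_since=2024-05-01T12:00:00Z
```

Comparing ISO 8601 strings lexicographically works here because they sort chronologically when the timezone is fixed.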

9 adapters (WordPress, Drupal, Joomla, Next.js, Astro, Express, FastAPI, Vite, WooCommerce) + Python sidecar + MCP server for Claude Code.

Live demo: https://sketchynews.snaf.foo/openfeeder