A B2B SaaS in the observability and developer-tools space
Scaled llms-full.txt from 8KB to 1.8MB; blended citation rate on category prompts rose from 4% to about 33% in four months
The challenge
The client sells an observability platform to engineering teams. Its buyers are technical and use LLMs heavily, both for vendor research and for direct questions about implementation patterns. The team had shipped an llms.txt and a placeholder llms-full.txt a year earlier, when those files first started getting attention, but the llms-full.txt was an 8KB summary of the company's marketing positioning and a feature list. It was not useful as a knowledge source. The blended citation rate across LLMs on category prompts was 4 percent, even though the client had one of the most comprehensive technical documentation libraries in the category. The documentation itself was excellent. The problem was that it was scattered across five surfaces: a docs subdomain, a separate API reference, a runbooks repo, an integrations gallery, and a community knowledge base. LLMs were not consistently crawling all of it, and even when they did, they were not weighting it as authoritative because the surfaces were inconsistent with one another.
What we shipped
The engagement was unusually narrow. We focused almost entirely on the llms-full.txt build and the underlying content consolidation, with a smaller amount of schema and citation-tracking work attached.
The llms-full.txt build had four phases. First, audit and categorization of every existing technical surface, which produced an inventory of about 4,200 pages across the five separate documentation properties. Second, deduplication, because there was significant overlap and contradiction across the surfaces. Third, restructuring the source content into a consistent format: every page got a clear single-purpose title, a one-paragraph summary at the top, structured headings, and a consistent code-block format. Fourth, the actual llms-full.txt build, which ended up at 1.8MB and was structured as a hierarchical index with the densest technical reference content at the top, followed by integration documentation, then conceptual guides, then less-cited content.
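The case study does not say how deduplication was done. As a minimal sketch of the exact-duplicate part of that phase, assuming every page has already been exported as plain text with a label for the surface it came from, the Python below fingerprints each page body after lowercasing and collapsing whitespace, then keeps one copy per fingerprint. The `Page` type, the surface names in `SURFACE_PRIORITY`, and the precedence order are hypothetical.

```python
import hashlib
import re
from dataclasses import dataclass

@dataclass
class Page:
    surface: str  # hypothetical surface label, e.g. "docs" or "runbooks"
    url: str
    title: str
    body: str

# Hypothetical precedence: when two surfaces carry the same content,
# keep the copy from the surface listed first. Unknown surfaces rank last.
SURFACE_PRIORITY = ["docs", "api-reference", "integrations", "runbooks", "community"]

def fingerprint(text: str) -> str:
    """SHA-256 of the body after lowercasing and collapsing whitespace."""
    normalized = re.sub(r"\s+", " ", text.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def dedupe(pages: list[Page]) -> tuple[list[Page], list[tuple[Page, Page]]]:
    """Return (kept pages, [(dropped page, page kept instead)]) using exact fingerprints."""
    rank = {name: i for i, name in enumerate(SURFACE_PRIORITY)}
    # sorted() is stable, so pages from the same surface keep their input order.
    ordered = sorted(pages, key=lambda p: rank.get(p.surface, len(rank)))
    kept: dict[str, Page] = {}
    dropped: list[tuple[Page, Page]] = []
    for page in ordered:
        key = fingerprint(page.body)
        if key in kept:
            dropped.append((page, kept[key]))
        else:
            kept[key] = page
    return list(kept.values()), dropped
```

Exact matching only catches copies. Near-duplicates and pages that contradict each other still need fuzzier matching or human review, and the `dropped` pairs are a useful worklist for redirects.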
We made some deliberate choices that turned out to matter. The file was structured so that the most-cited content was front-loaded, on the working hypothesis that LLM crawlers were not necessarily ingesting the full file in every retrieval. We included full code examples rather than abbreviated snippets. We did not include marketing or sales content. We included the integration list with explicit version compatibility, which became a heavy citation source for "does X work with Y" prompts.
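A sketch of the assembly step under stated assumptions: each restructured page is a record with a tier, a title, a one-paragraph summary, and a body; tiers are emitted in the front-loaded order described above; pages within a tier are ordered by a hypothetical observed citation count; marketing and sales pages are skipped; and the version-compatibility table opens the integration tier. The tier names, the `citations` field, and the markdown-style headings are illustrative choices, not the client's actual format.

```python
from dataclasses import dataclass

# Hypothetical tier labels, in the front-loaded order described above.
TIER_ORDER = ["reference", "integration", "concept", "other"]
EXCLUDED_TIERS = {"marketing", "sales"}  # never shipped in the file

@dataclass
class DocPage:
    tier: str
    title: str
    summary: str        # one-paragraph summary placed under the title
    body: str           # structured headings and full code examples, kept verbatim
    citations: int = 0  # hypothetical observed citation count, orders pages within a tier

@dataclass
class Integration:
    name: str
    min_version: str
    latest_tested: str

def render_page(page: DocPage) -> str:
    return f"## {page.title}\n\n{page.summary.strip()}\n\n{page.body.strip()}\n"

def render_compatibility(integrations: list[Integration]) -> str:
    rows = ["| Integration | Minimum version | Latest tested |", "|---|---|---|"]
    for i in sorted(integrations, key=lambda i: i.name.lower()):
        rows.append(f"| {i.name} | {i.min_version} | {i.latest_tested} |")
    return "## Integration compatibility\n\n" + "\n".join(rows) + "\n"

def build_llms_full(pages: list[DocPage], integrations: list[Integration], product: str) -> str:
    groups: dict[str, list[DocPage]] = {tier: [] for tier in TIER_ORDER}
    for page in pages:
        if page.tier in EXCLUDED_TIERS:
            continue
        groups.get(page.tier, groups["other"]).append(page)  # unknown tiers sink to "other"
    parts = [f"# {product}\n"]
    for tier in TIER_ORDER:
        if tier == "integration" and integrations:
            parts.append(render_compatibility(integrations))  # version table opens the tier
        # Most-cited pages first, so a truncated read still gets the densest material.
        parts.extend(render_page(p) for p in sorted(groups[tier], key=lambda p: -p.citations))
    return "\n".join(parts)
```

Keeping the compatibility data as structured rows rather than prose means a "does X work with Y" question can be answered from a single table line.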
On the schema side, we added TechArticle markup across the documentation surfaces and built a consistent author entity for the dev-rel team. We instrumented citation tracking on 90 priority prompts, run weekly. We did not do off-domain press work in this engagement.
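For the TechArticle markup, a minimal sketch that emits JSON-LD for one documentation page. The point it illustrates is the consistent author entity: every page references the same author node through a stable `@id`. `TechArticle`, `headline`, `description`, `url`, `dateModified`, and `author` are schema.org vocabulary; the organization name, the URLs, and the choice to model the dev-rel team as an `Organization` rather than as individual `Person` entities are assumptions.

```python
import json

# One shared author node; every page points at the same @id.
# Name and URLs are hypothetical placeholders.
DEVREL_AUTHOR = {
    "@type": "Organization",
    "@id": "https://example.com/#devrel",
    "name": "Example DevRel Team",
    "url": "https://example.com/devrel",
}

def tech_article_jsonld(headline: str, description: str, url: str, date_modified: str) -> str:
    """Return a JSON-LD script tag describing one documentation page as a TechArticle."""
    data = {
        "@context": "https://schema.org",
        "@type": "TechArticle",
        "headline": headline,
        "description": description,
        "url": url,
        "dateModified": date_modified,  # ISO 8601 date string
        "author": DEVREL_AUTHOR,
    }
    # Escape "<" so text such as "</script>" inside a value cannot close the tag early.
    payload = json.dumps(data, indent=2).replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```

The `\u003c` escape is valid JSON, so any JSON-LD parser decodes the value back to the original text.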
The numbers
| Metric | Baseline | After 60 days | After 4 months |
|---|---|---|---|
| Blended LLM citation rate, category prompts | 4% | around 19% | around 33% |
| ChatGPT citation rate, technical how-to prompts | 6% | around 24% | around 41% |
| llms-full.txt size | 8KB | 1.6MB | 1.8MB |
| Documentation pages consolidated | 0 | 3,100 | 4,200 |
| Organic inbound trials, monthly | 240 | 290 | 380 |
| Branded search volume, monthly | 6,800 | 8,400 | around 12,000 |
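The case study does not define how the blended rate is computed. One plausible reading, sketched below, treats each weekly run of a tracked prompt against a given model as one observation flagged cited or not cited; the per-model rate is the cited share for that model, and the blended rate pools all observations. The `Observation` fields, the model and week labels, and the pooling choice are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Observation:
    week: str       # hypothetical label, e.g. "2024-W18"
    model: str      # hypothetical label, e.g. "chatgpt"
    prompt_id: str
    category: str   # e.g. "category" or "how-to"
    cited: bool     # did the response cite the client's domain?

def citation_rates(observations: list[Observation], week: str, category: str) -> tuple[dict[str, float], float]:
    """Per-model and blended (pooled) citation rate for one week and prompt category."""
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for o in observations:
        if o.week == week and o.category == category:
            totals[o.model] += 1
            hits[o.model] += int(o.cited)
    per_model = {m: hits[m] / totals[m] for m in totals}
    all_runs = sum(totals.values())
    blended = sum(hits.values()) / all_runs if all_runs else 0.0
    return per_model, blended
```

Pooling weights each model by its number of runs. With the same 90 prompts sent to every model, that equals a simple average of the per-model rates, but the two diverge if some runs fail or are skipped.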
The trial-volume lift was real, but the more interesting funnel change was on the support side. Engineering customers arriving in months three and four submitted fewer onboarding tickets and reached their first successful deploy faster, because they had already worked through implementation questions with an LLM that was now citing the actual documentation correctly. The customer success team measured a meaningful drop in onboarding-related support volume. Time-to-first-value, measured by the client's product-led-growth metric, shortened by about 18 percent on organic-acquired trials. Branded search lifted noticeably and started including specific technical phrases that appeared in the llms-full.txt, which strongly suggests that LLM responses were driving the search behavior. Inside sales noted that prospects on calls were quoting the documentation back to them, sometimes verbatim, which had not been a common pattern before.
What we'd do differently
We spent too long on the dedup phase. Three weeks of cleanup work produced about 60 percent of the citation lift that the next ten days of restructuring produced. We should have moved faster on the early dedup and gotten to restructuring sooner. We also did not initially version the llms-full.txt file, which made it hard to run clean before-and-after experiments when we shipped iterations. By month three we were rebuilding the file weekly and had no clear way to attribute citation movement to specific changes. We eventually built proper versioning and changelog tracking, but the first six weeks of iterations remain unattributed. Finally, we did not include a structured table of contents at the very top of the file, which we now think would have helped LLMs index it more efficiently.
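A minimal sketch of the versioning and table-of-contents fixes described above, assuming each section in the file starts with a `## ` heading. The version ID is a truncated SHA-256 of the file body, the changelog lists which section titles were added, removed, or changed between two builds, and the contents list is prepended at the very top. The heading convention, the 12-character ID, and the header layout are assumptions.

```python
import hashlib
import re
from datetime import date

# Assumes every section starts with a "## " heading line.
HEADING = re.compile(r"^## (.+)$", re.MULTILINE)

def split_sections(text: str) -> dict[str, str]:
    """Map each section title to its body text."""
    matches = list(HEADING.finditer(text))
    sections = {}
    for n, m in enumerate(matches):
        end = matches[n + 1].start() if n + 1 < len(matches) else len(text)
        sections[m.group(1).strip()] = text[m.end():end].strip()
    return sections

def version_id(body: str) -> str:
    """Short content hash: identical builds get identical IDs."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()[:12]

def changelog(old_body: str, new_body: str) -> dict[str, list[str]]:
    """Section titles added, removed, or edited between two builds."""
    old, new = split_sections(old_body), split_sections(new_body)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(t for t in old.keys() & new.keys() if old[t] != new[t]),
    }

def add_header(body: str, released: date) -> str:
    """Prepend a version line and a table of contents at the very top of the file."""
    toc = "\n".join(f"- {title}" for title in split_sections(body))
    return f"Version: {version_id(body)} ({released.isoformat()})\n\nContents:\n\n{toc}\n\n{body}"
```

Storing each changelog next to that week's citation numbers is what lets a movement be traced to specific sections rather than to "the rebuild" as a whole.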
What's next
The four-month engagement ended on scope. The client brought the maintenance in-house but kept us on a small monthly retainer for citation-tracking and quarterly schema audits. The next focus, which the in-house team is leading, is to extend the same structured-knowledge approach to the API reference, which currently lives in a separate generated documentation system. The hypothesis is that exposing the API reference cleanly through the llms-full.txt structure will unlock a different class of LLM responses, specifically when developers ask LLMs to write integration code. If that works, it has implications for how the rest of the category will need to structure their documentation, and we expect a follow-on engagement to help productize the approach internally.
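To make the API-reference hypothesis concrete, here is a sketch that renders an OpenAPI 3 document into a heading-per-entry text format, one section per operation with its parameters listed. It assumes the generated documentation system can export an OpenAPI file (the case study does not say which generator is used) and that `$ref` parameters are already resolved. It ignores path-level parameters, request bodies, and responses for brevity; the example path and parameter in any usage are hypothetical.

```python
# Operation keys allowed inside an OpenAPI 3 Path Item object.
HTTP_METHODS = ("get", "put", "post", "delete", "options", "head", "patch", "trace")

def render_parameter(param: dict) -> str:
    """One bullet per parameter; assumes $ref parameters were resolved beforehand."""
    kind = param.get("schema", {}).get("type", "any")
    need = "required" if param.get("required") else "optional"
    line = f"- `{param['name']}` ({param['in']}, {kind}, {need})"
    if param.get("description"):
        line += f": {param['description']}"
    return line

def render_openapi(spec: dict) -> str:
    """Render each operation as a '## METHOD /path' section."""
    sections = []
    for path, item in sorted(spec.get("paths", {}).items()):
        for method in HTTP_METHODS:
            op = item.get(method)
            if op is None:
                continue  # skips non-operation keys such as "summary" or "servers"
            lines = [f"## {method.upper()} {path}", ""]
            text = op.get("summary") or op.get("description")
            if text:
                lines += [text, ""]
            lines += [render_parameter(p) for p in op.get("parameters", [])]
            sections.append("\n".join(lines).rstrip() + "\n")
    return "\n".join(sections)
```

A format like this puts the exact parameter names and types in plain text, which is the material an LLM needs when asked to write integration code.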
Want this outcome for your domain?
Start with a $597 Strategy Sprint or get a free GEO audit.