Who cares if AI reads your docs correctly?

Dachary Carey, a senior technical writer at MongoDB, cares. A lot! She has been researching how AI tools interact with technical documentation. Her approach is experimental: she probes how AI agents actually behave when they read docs, rather than taking reports from developers and other experts at face value.

When Dachary first published her findings, they caught the technical writing community off guard. Her follow-up research has shown how pervasive and inconsistent the behavior is: AI tools routinely read only part of a documentation page. While some tools indicate how much of the content they read, most don’t. It’s not a quirk. When asked, Claude confirmed that partial reading is a common practice, in part to minimize token use.

For tech writers, that’s a problem worth understanding. When an AI agent skips content and nothing flags it, the tool’s answer comes back sounding confident and looking complete. Neither the end user nor the developer who built the tool has any way of knowing what got left out, because the failure mode is silent. In fact, it’s not even considered a failure.

Dachary’s research revealed that getting data from a web page to an AI tool’s answer involves more steps and more moving parts than many technical writers realized. That complexity is worth unpacking, because buried in it is a question with some uncomfortable answers: who in that process actually has a reason to care whether the AI gets it right?

The following table describes the layers between the end user and the source content when AI tools collect and process data, and each layer’s incentive to care about the outcome.

| Layer | Function | Incentive to care |
| --- | --- | --- |
| End users | Use AI tools to solve problems | Very high. They often have a high personal stake in trusting the response. |
| Developers and integrators | Adapt AI tools to specialized use cases | Very high. Errors in AI tool responses come back to them directly. |
| LLM providers (Anthropic, OpenAI, Google) | Develop the AI tools used by the end users | Moderate. They currently focus on confidence over accuracy. |
| Agent frameworks (LangChain, LlamaIndex, etc.) | Perform tasks on the AI tool’s behalf | Low. They focus more on task success and efficiency than task quality. |
| Extraction and scraping tools (Firecrawl, trafilatura, etc.) | Read data sources (e.g., web pages) and return normalized data to agents for further processing | Low. Their focus is to properly read and process the data. How much of it an agent uses is out of their scope. |
| Content authors | Write and publish the source material used by AI tools and human readers | Very high. The human authors have a professional interest in ensuring their content is useful and accurate for their readers. |

Notice how the layers with the most incentive to care have the least agency in the process. End users care very much about accuracy but get no feedback from the process. They must validate the results externally, which can be so labor-intensive that it negates any time savings the automation provides. Content authors also care very much about how their content is used but get no information from the process to know whether their content is usable.

The layers with the least agency have the most to gain from demanding transparency. Knowing how much of the source content actually reached the answer is useful information for end users, developers, and content authors alike. Some layers currently have more incentive to project confidence than to surface limitations. That only changes when customers start asking questions they can’t currently ask because the information isn’t surfaced.
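That missing signal is measurable, at least in principle. None of the names below come from Dachary’s research or any specific tool; this is a minimal sketch, assuming a developer can capture both the full page text and the excerpt an agent actually passed to the model, of what a rough coverage figure could look like:

```python
# Hypothetical sketch: estimate how much of a source document actually
# reached the model. Assumes access to both the full page text and the
# excerpt the agent forwarded; the paragraph-level matching is a crude
# illustration, not a production-grade diff.

def coverage_ratio(source_text: str, excerpt: str) -> float:
    """Fraction of the source's paragraphs that survive into the excerpt."""
    paragraphs = [p.strip() for p in source_text.split("\n\n") if p.strip()]
    if not paragraphs:
        return 0.0
    seen = sum(1 for p in paragraphs if p in excerpt)
    return seen / len(paragraphs)


source = "Install the CLI.\n\nConfigure auth tokens.\n\nRun the migration."
excerpt = "Install the CLI.\n\nRun the migration."  # middle step silently dropped
print(coverage_ratio(source, excerpt))  # 2 of 3 paragraphs survived
```

Even a crude number like this, surfaced alongside the answer, would let an end user or author see that a third of the page never made it into the response.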

Paying customers tend to gravitate toward tools they can trust, and trust earned through transparency is more durable than trust built on not knowing how much of the source material was ignored.

P.S.
As I was writing this post, Dachary published a post on the verification gap in AI content pipelines, in which she examines the verification of AI-generated content. In that context she is both the content author (by proxy, since the content is AI-generated at her direction and under her review) and the end user of the verification process. In both roles, her work is undermined by silent errors and confidently wrong outputs. I’m looking forward to seeing more research into this.
