Google Cloud recently published “Smarter Authoring, Better Code: How AI is Reshaping Google Cloud’s Developer Experience,” describing how they’re applying AI to documentation challenges that every technical writing team recognizes: keeping content accurate, current, and useful for developers working with constantly evolving services.
Because AI tools consistently claim to excel at content analysis and summarization, I put that claim to the test on this article. I’ve been working with these tools for about six months now and wanted to apply what I’d learned. So I fed the Google article to Claude and spent some time exploring what it could tell me about the content, the claims, and the gaps.
Spoiler alert: the questions I asked mattered more than the answers I got.
The thought leadership gap: Style over substance
The Google article describes two main AI applications in technical writing:
- Integrating Gemini into writers’ authoring environments for productivity tasks like generating tables and applying style guides, and
- An automated testing system that uses Gemini to read documentation steps and generate Playwright scripts for validation (sketched below).
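The article doesn’t show what a generated script looks like, so here’s a rough sketch of the kind of Playwright test that approach implies. The documented step, URL, and selectors are all invented for illustration; this is not Google’s generated code.

```ts
// Hypothetical illustration: a Playwright check for a documented UI step such as
// "click Create bucket, enter a name, and confirm it appears in the list."
// The URL and selectors are placeholders, not Google's actual generated output.
import { test, expect } from '@playwright/test';

test('documented step: create a storage bucket', async ({ page }) => {
  await page.goto('https://console.example.com/storage'); // placeholder console URL
  await page.getByRole('button', { name: 'Create bucket' }).click();
  await page.getByLabel('Bucket name').fill('docs-validation-test');
  await page.getByRole('button', { name: 'Create' }).click();

  // If the UI drifts away from the documented step, this assertion fails,
  // flagging the documentation page for review.
  await expect(page.getByText('docs-validation-test')).toBeVisible();
});
```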
The article also describes a multi-agent system for code sample generation that uses Protobuf definitions as the source of truth, with generator agents creating samples and evaluator agents scoring them against rubrics.
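The article stops at the architecture, so what follows is my own minimal sketch of that generate-and-evaluate loop. The function signatures, rubric shape, and score threshold are assumptions; this isn’t Google’s implementation.

```ts
// Minimal sketch of a generate-then-evaluate loop, based only on the article's
// description. The agent functions, rubric shape, and threshold are assumptions.
type Rubric = { criterion: string; weight: number }[];

async function produceSample(
  protoDefinition: string,                                        // Protobuf source of truth
  generate: (proto: string) => Promise<string>,                   // generator agent
  evaluate: (sample: string, rubric: Rubric) => Promise<number>,  // evaluator agent, returns 0..1
  rubric: Rubric,
  minScore = 0.9,
  maxAttempts = 3,
): Promise<string> {
  let best = '';
  let bestScore = -1;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const sample = await generate(protoDefinition);   // draft a code sample from the proto
    const score = await evaluate(sample, rubric);     // score it against the rubric
    if (score > bestScore) {
      best = sample;
      bestScore = score;
    }
    if (bestScore >= minScore) break;                 // accept once the threshold is met
  }
  return best;
}
```

Whether a loop like this produces trustworthy samples hinges on how well the evaluator’s rubric tracks what a human reviewer would accept, which is exactly the kind of detail the article doesn’t provide.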
Claude initially provided a summary that was cleaner and better organized than my mental notes after reading the same article. However, when I pushed for evidence supporting the claims, Claude articulated what I’d been sensing: the article provides limited concrete evidence beyond “over 100 tests daily” and references to “tens of thousands of idiomatic samples.” There were no metrics on accuracy improvements or time savings, no before/after comparisons, and no validation of claims about preventing “factual drift.” You can see the whole conversation in the chat transcript.
When I questioned Claude about why Google would publish such a detail-light article, Claude offered a perspective that troubled me:
For thought leadership aimed at developer relations leaders and engineering executives, too much evidence could backfire. Publishing specific metrics creates comparison benchmarks that competitors can target. On the other hand, a claim like “over 100 tests daily” suggests impressive scale without revealing whether those tests actually pass.
That bothered me. We’re talking about a business communication norm where impressive-sounding claims matter more than verifiable results. Claude put it bluntly: “There’s an ethical gap between presenting a technical approach and implying proven success.”
This explains why teams that implement AI solutions based on thought leadership articles end up scrambling to explain why the promised productivity gains didn’t materialize.
Meanwhile, the AI I was using actually worked
The irony wasn’t lost on me: while analyzing Google’s vague claims about AI abilities, the AI tool I was using was proving genuinely useful.
Claude organized complex information well and identified gaps I’d sensed but hadn’t articulated. When I asked about implementation costs, it confidently quoted $2,000 to $8,000 daily for a Google-scale system, complete with a breakdown of Gemini API costs, cloud infrastructure, and resource provisioning.
Then I asked how it calculated those numbers.
“I presented these as informed estimates when they were educated guesses based on limited comparable data. That’s exactly the kind of false precision you should push back on,” Claude replied.
The tool handled summarization, pattern recognition, and organizing information quite well. It struggled, however, whenever its own speculation masqueraded as analysis.
While I appreciate the self-awareness demonstrated in the response, presenting this limitation only after being prompted for it bothers me. It’s a warning label you can only read after you’ve encountered the danger. For the unsuspecting or uninformed user, guesses presented as estimates could be quite problematic.
This reinforces my earlier observation that LLM tools work best when you already know the subject matter. More to the point, they need someone who knows enough about the subject to catch errors and guide the questioning.
Fortunately, there’s no need to throw the baby out with the bathwater. It’s possible to work around the tool’s limitations and take advantage of its strengths.
Better questions, better results
My original approach was admittedly lazy. A prompt like “Read and summarize this article” leaves too much to chance. Based on that experience, I drafted the following, more specific prompt:
Read and summarize the article at <link>
Identify the intended audience and purpose.
DO NOT INVENT FACTS OR EXAMPLES. Flag areas where you have doubts about claims.
What evidence supports the main assertions? What evidence is missing?
What alternative interpretations are possible?
What are the limitations of this analysis based on available information?
The revised prompt produced a structured analysis that called out missing evidence and acknowledged evaluation limitations. Instead of confident speculation, I got specific details without the creative writing.
You can read the before and after in the chat transcript.
Two truths
Google’s article exemplifies how AI marketing can create unrealistic expectations through architecturally impressive but evidence-light claims. The AI tool I used to analyze those claims demonstrated genuine capability when properly questioned.
Both things are true. The thought leadership problem and the practical utility coexist.
For technical writers evaluating AI adoption, the lesson isn’t to dismiss AI tools because of inflated marketing or unreliable outputs, nor to adopt them based solely on aspirational thought leadership promises. Learn to separate useful capabilities from wishful thinking, whether that thinking comes from vendors, thought leaders, or the AI tools themselves.
Value honest uncertainty over confident speculation. The best AI assistance comes from knowing a tool’s limitations and learning to work within them.