What Makes Documentation AI-Ready: Media

In previous posts, I’ve explored how the structural principles, writing quality, and code-example pedagogy that serve human readers also serve AI tools. Those posts focused on the text-based content found in all technical documentation.

This post addresses media: images, videos, diagrams, and audio. These elements appear less frequently in API documentation than in other technical content, but they’re valuable for tutorials, demonstrations, system architecture explanations, and physical procedures.

The patterns that make media accessible to AI tools aren’t new requirements. They’re the same accessibility standards that have served vision-impaired readers, users who have difficulty hearing audio content, and people with cognitive disabilities for decades. Once again, the fundamentals endure.

What accessibility standards tell us about media

The Web Content Accessibility Guidelines (WCAG) have provided clear requirements for accessible media since the late 1990s. These requirements weren’t created for AI—they were created for people who can’t see images, can’t hear audio, or need alternative ways to process information. To address these cases, they recommend:

For images:

  • Provide text alternatives (alt text) that describe content and function
  • Include longer descriptions for complex images like diagrams
  • Don’t rely on color alone to convey information
  • Ensure text in images is also available as actual text
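In HTML, these requirements translate directly into markup. A minimal sketch, with hypothetical filenames and wording:

```html
<!-- Alt text describes both content and function -->
<img src="save-button.png" alt="Save button with disk icon">

<!-- For complex images such as diagrams, pair brief alt text
     with a longer description in a visible caption -->
<figure>
  <img src="architecture.png"
       alt="System architecture diagram of a three-tier application">
  <figcaption>
    The web server connects to the application server,
    which connects to the database.
  </figcaption>
</figure>
```

The caption carries the detail, so sighted readers, screen reader users, and AI tools all get the same information.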

For video:

  • Provide synchronized captions for deaf and hard-of-hearing users
  • Provide audio descriptions for blind and vision-impaired users
  • Provide transcripts that include both dialogue and important visual information
  • Identify speakers in multi-person videos
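In HTML, the video requirements look roughly like this (filenames are hypothetical; audio descriptions can also be delivered as a separately described version of the video):

```html
<video controls>
  <source src="setup-tutorial.mp4" type="video/mp4">
  <!-- Synchronized captions for deaf and hard-of-hearing viewers -->
  <track kind="captions" src="setup-tutorial.en.vtt"
         srclang="en" label="English">
</video>
<!-- A transcript covers dialogue plus important visual information -->
<p><a href="setup-tutorial-transcript.html">Read the full transcript</a></p>
```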

For audio:

  • Provide text transcripts
  • Identify speakers
  • Note non-speech sounds that convey meaning

These requirements are the result of decades of refinement based on how people with disabilities actually use content.

How AI tools process images

I asked Claude how it processes images in documentation. The patterns it described align exactly with accessibility requirements.

Claude can process images, but only when they’re provided directly—screenshots, diagrams, photos, charts. Claude can’t extract images from web pages: when it encounters <img src="diagram.png" alt="System architecture"> in a page, it sees only the alt text (“System architecture”), not the actual image. Screen reader users face the same limitation. In both cases, alt text becomes the primary interface to the images on the page.

In our conversation, Claude identified these patterns for processing images. Each one aligns with accessibility requirements. I’ve added comments to Claude’s responses in italics for additional clarification and context.

What helps Claude process images

These aspects help Claude process documents with images and help people using screen readers.

Screenshots with readable text

  • UI elements, error messages, code snippets
  • Text must be clear enough to read/OCR

Diagrams with text labels

  • Architecture diagrams, flowcharts, system maps
  • Labels identify components and show relationships
  • Arrows and connections have descriptive text

Charts and graphs with data labels

  • Axis labels, data points, legends
  • Text alternatives provide the data the chart represents

Annotated images

  • Arrows, callouts, highlights pointing to specific elements
  • Text explains what’s being indicated

What creates barriers

These situations complicate Claude’s image processing and make it hard for people using screen readers.

Low resolution or heavy compression

  • Text becomes unreadable for OCR
  • Details blur, making description difficult

Color-coded information without labels

  • “The red line shows X” with no legend
  • Relies on color perception alone

Images without context

  • Screenshot with no surrounding explanation
  • Diagram with unlabeled components
  • Reference to “upper right” without clear landmark

Text embedded as images

  • Code as screenshots instead of actual text
  • Can’t be copied, searched, or reliably read
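The last barrier has a direct fix: publish code as text rather than as a screenshot. A sketch in HTML (the filename and command are illustrative):

```html
<!-- Barrier: code shipped as a screenshot can't be copied,
     searched, or reliably read -->
<img src="curl-example.png" alt="Code example">

<!-- Fix: the same code as actual text -->
<pre><code>curl -H "Authorization: Bearer $TOKEN" \
  https://api.example.com/v1/users</code></pre>
```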

These barriers affect vision-impaired users and AI tools equally. The solutions are the same for both audiences: provide text alternatives, use clear labels, and add context.

Writing effective alt text

Write alt text as though it’s the only way your reader will learn the information in the image. One way to do this is to describe the information in the document content that precedes the image, using the image as reinforcement. Another is to put the description in the image’s alt attribute.

WCAG provides clear guidance on alt text

The WCAG guidelines for effective alt text are summarized here.

Describe content and function

  • Not: “Image of a button”
  • Better: “Save button with disk icon”

Be concise but complete

  • Not: “Screenshot”
  • Better: “Error message: ‘Connection timeout after 30 seconds. Check network settings.’”

Provide context

  • Not: “Diagram”
  • Better: “System architecture diagram showing three-tier application: web server connects to application server, which connects to database”

For complex images, use longer descriptions [in the text]

Alt text gives an overview; a longer description (in a caption or linked document) provides the details.

Don’t be redundant

If surrounding text says, “The following diagram shows the authentication flow,” alt text doesn’t need to repeat “diagram”
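Applying this guidance together, a sketch of the authentication-flow case (markup and wording are illustrative):

```html
<p>The following diagram shows the authentication flow.</p>
<!-- Surrounding text already says "diagram", so the alt text skips
     that word and goes straight to content and function -->
<img src="auth-flow.png"
     alt="Client sends credentials to the login service, which returns
          a token; the client includes the token on every later request">
```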

How AI tools process video

It doesn’t (without your help).

AI tools can’t process video directly, so they have no access to its content: no frames, no audio, no visual information from video files.

This is identical to how blind and vision-impaired users experience video: they rely entirely on audio descriptions and transcripts.

Claude identified what makes video content accessible. As with images, each requirement aligns with accessibility standards:

What makes video accessible to all audiences

Claude described the following aspects of video and audio transcripts that make them easier to process. Visually impaired audiences also benefit from these recommendations.

Human-edited transcripts

  • Proper punctuation and paragraph breaks
  • Logical topic sections
  • Technical terms spelled correctly (not auto-transcript errors)

Speaker identification

  • “Sarah: [text]” vs unmarked dialogue
  • Critical for interviews and panel discussions

Timestamps with section headers

  • “00:45 – Introduction to the problem”
  • “03:20 – Proposed solution”
  • Allows navigation to specific information

Visual content described explicitly

  • Not: “As you can see here…”
  • Better: “The diagram shows three components: authentication layer, API gateway, and database. The auth layer connects to the gateway via OAuth…”

This is valuable information if you’re writing video scripts. Describe things as though the viewer can’t see the images in the video.

Non-verbal content noted

  • “[demonstrates on screen]”
  • “[shows code example]”
  • “[points to diagram]”

Key resources linked

  • Code repositories mentioned
  • Slides or presentation materials
  • Related documentation
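Put together, these practices yield a transcript excerpt like the following (the speakers, timestamps, and content are invented for illustration):

```text
00:45 – Introduction to the problem

Sarah: Our deployment pipeline was failing intermittently. [shows error
log on screen] The log reports "connection refused" from the build agent.

03:20 – Proposed solution

Miguel: The diagram shows three components: the build agent, the artifact
store, and the deploy service. The agent pushes artifacts to the store,
and the deploy service pulls from it. [points to diagram]

Resources: the code repository and slides are linked below the video.
```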

What creates barriers

Claude described these aspects that make transcripts more confusing and harder to process.

Wall of text with no structure

  • No paragraph breaks
  • All one run-on without topic divisions

Missing visual context

  • “This part right here does X” (which part?)
  • “Look at how this changes” (describes screen without narrating)
  • References to “above” or “below” without description

Auto-generated transcripts without editing

  • Technical terms mangled (“Kubernetes” becomes “cooper nettys”)
  • No punctuation or paragraph structure
  • Speaker changes unmarked

Ambiguous references

  • “It connects to this, then it processes that”
  • Worse than in ordinary written text, because transcript readers can’t see the on-screen referents

Why video transcripts matter

Transcripts serve multiple audiences beyond accessibility compliance:

  • Deaf and hard-of-hearing users – Can’t access audio information
  • People in sound-sensitive environments – Libraries, open offices, public transit
  • Non-native speakers – Reading is often easier than listening
  • People who prefer reading – Faster to scan text than watch video
  • Search engines – Can’t index video content without transcripts
  • AI tools – Can’t process video without text alternatives

A quality transcript makes video content accessible to all these audiences simultaneously.

The AI perspective confirms established practice

When I asked Claude what it needs from images and video, every requirement it identified matched WCAG accessibility standards.

AI tools don’t introduce new requirements for media. They validate requirements that accessibility advocates have documented and refined over the past thirty years. The surprise isn’t that AI needs text alternatives—it’s that we’ve had clear standards for providing them since 1999, yet many organizations still treat accessibility as optional.

Bringing it together: Media for all readers

The previous articles in this series showed that AI tools need the same structural principles, writing quality, and pedagogical approaches that serve human readers. Media follows the same pattern.

The fundamentals persist. Whether you’re serving a screen reader user, a deaf student, someone in a quiet library, or an AI tool processing your documentation, the same principles apply: provide text alternatives, describe visual content explicitly, structure information clearly.

Meeting accessibility standards serves all your audiences—human and AI alike.

Further Reading

For readers interested in learning more about media accessibility:

Web Accessibility Standards:

W3C Web Content Accessibility Guidelines (WCAG) – Current standards for accessible web content, including detailed requirements for images, video, and audio.

WCAG 2.2 Quick Reference – Filterable reference guide to all WCAG requirements with techniques and examples.

Accessibility Resources:

WebAIM: Alternative Text – Comprehensive guide to writing effective alt text.

WebAIM: Captions, Transcripts, and Audio Descriptions – Detailed guidance on making video and audio accessible.

Universal Design:

The Center for Universal Design – Principles and applications of universal design that benefit all users.

Additional Resources:

For a comprehensive bibliography tracing web accessibility principles from the 1980s through current practice, see Bibliography of Web Design Principles.
