What to measure?

One of the things I’ve learned from writing developer docs for more than 11 years is that measuring API documentation is difficult. Running the study for my dissertation gave me detailed insight into some of the reasons why.

The first challenge to overcome is answering the question, “What do you want to measure?” A question that is followed immediately by, “…and under what conditions?” Valid and essential, but not simple, questions. Step back from those, and a higher-level question comes into view: “What’s the goal?” …of the topic? …of the content set? And then back to the original question: …of the measurement?

For my dissertation, I spent considerable effort scoping the experiment down to something manageable, measurable, and meaningful, ending up at the relevance decision. Clearly, there is more to the API documentation experience than deciding whether a topic is relevant, but that decision is a pivotal moment in the content experience. It also seemed to be the most easily identifiable, discrete event in the overall API reference topic experience. A pivotal point, but by no means the only one.

The processing model I used was based on the TRACE model presented by Rouet (2006). Similar cognitive-processing models were also identified in other API documentation and software development research papers. In this model, the experiment focuses on step 6.

The Task-based Relevance Assessment and Content Extraction (TRACE) model of document processing
Rouet, J.-F. (2006). The Skills of Document Use: From Text Comprehension to Web-Based Learning (1st ed.). Lawrence Erlbaum Associates.

Even in this context, my experiment studies a very small part of the overall cognitive processing of a document and an even smaller part of the overall task of information gathering to solve a larger problem or to answer a specific question.

To wrap this up, let me return to the original question. That is… what was the question?

  1. The goal of the topic is to provide information that is easily accessible to the reader.
  2. The easily accessible goal is measured by the time it takes the reader to identify whether or not the topic provides the information they seek.
  3. The experiment simulates the reader’s task by providing the test participants with programming scenarios in which to evaluate the topics.
  4. The topics being studied are varied randomly to reduce order effects and bias, and participants see only one version of the topics so that their experience isn’t biased by seeing other variations.

In this experiment, other elements of the TRACE model are managed by or excluded from the task.

API reference topic study – thoughts

Last month, I published a summary of my dissertation study and I wanted to summarize some of the thoughts that the study results provoked. My first thought was that my experiment was broken. I had four distinctly different versions of each topic yet saw no significant difference between them in the time participants took to determine the relevance of the topic to the task scenario. Based on all the literature about how people read on the web and the importance of headings and in-page navigation cues in web documents, I expected to see at least some difference. But, no.

The other finding that surprised me was the average length of time that participants spent evaluating the topics. Whether the topic was relevant or not, participants reviewed a topic for an average of about 44 seconds before they decided its relevance. This was interesting for several reasons.

  1. In web time, 44 seconds is an eternity–long enough to read the topic completely, if not several times. Farhad Manjoo wrote a great article about how people read Slate articles online, which agrees with the widely-held notion that people don’t read online. However, API reference topics appear to be different than Slate articles and other web content, which is probably a good thing for both audiences.
  2. The average time spent reading a reference topic to determine its relevance in my study was the same whether the topic was relevant to the scenario or not. I would have expected them to differ, with the non-relevant topics taking longer than the relevant ones, on the assumption that readers would spend more time looking for an answer. But no. Participants took about 44 seconds to decide whether the topic would apply or not in both cases.

While these findings are interesting and bear further investigation, they point out the importance of readers’ contexts and tasks when considering page content and design. In this case, changing one aspect of a document’s design can improve one metric (e.g., decision speed) at the cost of degrading others (e.g., credibility and appearance).

The challenges then become:

  1. Finding ways to understand the audience and their tasks better to know what’s important to them
  2. Finding ways to measure how well the content helps them accomplish those tasks

I’m taking a stab at those in the paper I’ll be presenting at the HCII 2015 conference next month.

Checklist for technical writing

Devin Hunt's Design hierarchy

Devin Hunt posted this figure from “Universal Principles of Design,” which is an adaptation of Maslow’s Hierarchy of Needs for design. It seemed like the levels could also apply to technical writing. Working up from the bottom…


Functionality

As with a product, technical content must work. The challenge is knowing what that actually means and how to measure it. Unfortunately, for a lot of content, this is fuzzy. I’m presenting a paper next month that should help provide a framework for defining this, but, as with Maslow’s triangle, you must do this before you can hope to accomplish the rest.

For technical content, like any product, you must know your audience’s needs to know what works means. At the very least, the content should support the user’s usage scenarios: getting started or onboarding, learning common use cases, and having reference information to support infrequent, but important, usage or application questions. What this looks like is specific to the documentation and product.


Reliability

Once you know what works means, you can tell whether the content does work and whether it does so consistently. Again, this requires knowledge of the audience, not unlike product design.

This is tough to differentiate from functionality, except that it has the dimension of providing the functionality over time. Measuring this is a matter of tracking the functionality metrics over time.


Usability

Once you know what content that works looks like, you can make sure it works consistently and in a way that is as effortless as possible.

Separating usability from functionality is a tough one in the content case. If content is not usable, does it provide functionality? If you look closely, you can separate them. For example, a content set can have all the elements that a user requires, but they can be difficult to find or navigate. Likewise, the content might all exist, but be accessible only in a way that is inconvenient or disruptive to the user. As with product development, understanding the audience is essential, as is user testing to evaluate this.


Proficiency

Can readers become expert at using the documentation? One could ask whether they should become experts, but in the case of a complex product with a diverse set of features and capabilities, it’s not hard to imagine a correspondingly large set of documentation to help users develop expertise.

What does this look like in documentation? At the very least, the terms used in the documentation should correspond to the audience’s vocabulary to facilitate searching for new topics.


Creativity

Not every product supports creativity, nor does every documentation set. However, those that do make the user feel empowered and are delightful to use. A noble, albeit difficult, goal to achieve, but something worthy of consideration.

This might take the form of community engagement in forums, or ongoing updates and tips to increase the value of the documentation and the product to the audience.

API reference topic study – summary results

During November and December, 2014, I ran a study to test how varying the design and content of an API reference topic influenced participants’ time to decide if the topic was relevant to a scenario.


  • I collected data from 698 individual task scenarios completed by 201 participants.
  • The shorter API reference topics were assessed 20% more quickly than the longer ones, but were less credible and were judged to have a less professional appearance than the longer ones.
  • The API reference topics with more design elements were not assessed any more quickly than those with only a few design elements, but the topics with more design elements were more credible and judged to have a more professional appearance.
  • Testing API documentation isn’t that difficult (now that I know how to do it, anyway).

The most unexpected result, based on the literature, was how the variations of visual design did not significantly influence the decision time. Another surprise was how long the average decision time was–almost 44 seconds, overall. That’s more than long enough to read the entire topic. Did they scan or read? Unfortunately, I couldn’t tell from my study.


The experiment measured how quickly participants assessed the relevance of an API reference topic to a task-based programming scenario. Each participant was presented with four task scenarios: two to which the topic was relevant and two to which it was not. There were four variations of each API reference topic; however, each participant saw only one, so they had no way to compare one variation to another.

The four variations of API reference topics resulted from two levels of visual design and two levels of the amount of information presented in the topic.

Information variations (rows):

  • High information (copy_ld_hi, copy_hd_hi): slower decision time; higher credibility; more professional appearance
  • Low information (copy_ld_lo, copy_hd_lo): faster decision time; lower credibility; less professional appearance

Design variations (columns):

  • Low visual design (copy_ld_hi, copy_ld_lo): lower credibility; less professional appearance
  • High visual design (copy_hd_hi, copy_hd_lo): higher credibility; more professional appearance; no significant difference in decision time
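The condition assignment used in the study could be sketched in a few lines (a hypothetical illustration, not the actual study apparatus; only the four variation names come from the study design, while the function and scenario names are made up):

```python
import random

# The four topic variations from the 2x2 study design.
VARIATIONS = ["copy_ld_hi", "copy_hd_hi", "copy_ld_lo", "copy_hd_lo"]

def assign_participant(scenarios):
    """Pick one topic variation for a participant and shuffle the
    scenario order to reduce order effects."""
    variation = random.choice(VARIATIONS)  # each participant sees only one variation
    order = list(scenarios)
    random.shuffle(order)                  # randomize scenario presentation order
    return variation, order

# Two relevant and two non-relevant scenarios per participant, as in the study.
scenarios = ["relevant_1", "relevant_2", "non_relevant_1", "non_relevant_2"]
variation, order = assign_participant(scenarios)
```

Because each participant gets exactly one variation, differences between conditions show up only between participants, never within one participant’s session.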


Is it really just that simple?

A tiny house. Is less more or less or does it depend?
After being submerged in the depths of my PhD research project since I can’t remember when, I’m finally able to ponder its nuance and complexity. I find that I’m enjoying the interesting texture that I found in something as mundane as API reference documentation, now that I have a chance to explore and appreciate it (because my dissertation has been turned in!!!!). It’s in that frame of mind that I consider the antithesis of that nuance, the “sloganeering” I’ve seen so often in technical writing.

Is technical writing really so easy and simple that it can be reduced to a slogan or a list of 5 (or even 7) steps? I can appreciate the need to condense a topic into something that fits in a tweet, a blog post, or a 50-minute conference talk. But, is that it?

Let’s start with Content minimalism or, in slogan form, Less is more! While my research project showed that less can be read faster (fortunately, or I’d have a lot more explaining to do), it also showed that less is, well, in a word, less, not more. It turns out that even the father of Content Minimalism, John Carroll, agrees. He says in his 1996 article, “Ten Misconceptions about Minimalism,”

In essence, we will argue that a general view of minimalism cannot be reduced to any of these simplifications, that the effectiveness of the minimalist approach hinges on taking a more comprehensive, articulated, and artful approach to the design of information.

In the context of a well-considered task and audience analysis, it’s easy for the writer to know what’s important and focus on it; less can be more useful and easier to grok. He says later in that same article,

Minimalist design in documentation, as in architecture or music, requires identifying the core structures and content.

In the absence of audience and task information, less can simply result in less, when the content lacks those core structures and misses the readers’ needs. More can also be less, when writers try to compensate by covering everything they can think of (so-called peanut-butter documentation, which covers everything to some unsatisfying, uniform depth).

For less to be more, it has to be well informed. It’s that last part that makes it a little complicated.

Carroll, J., & van der Meij, H. (1996). Ten misconceptions about minimalism. IEEE Transactions on Professional Communication, 39(2), 72–86.

Infrequently used but extremely critical information

Seattle at sunset from over Bainbridge Island

I just finished the refresher training for my Flight Instructor (CFI) certificate. This is the eighth time I’ve renewed it since my last check-ride. Yet, for some reason, this time seemed unusually focused on mishaps. Granted, the lessons focused mostly on avoiding them, and surviving them (when they couldn’t be avoided), but mishaps nonetheless.

This year, I had lessons on avoiding and surviving:

  • mishaps in the mountains
  • mishaps in helicopters
  • mishaps in seaplanes
  • mishaps while teaching student pilots
  • mishaps in bad weather

Then, as a break in training, I watched the Weather Channel’s “Why Planes Crash.”

Yeah, that was relaxing.

Yet, after all this, I can’t wait to get to the airport and go flying again. I really enjoy flying, so why focus on all this negativity?

So it won’t happen to me. I hope that I never need to apply any of this information, directly. At the same time, I hope to apply what I learned about these mishaps all the time–not just while flying.

Most of the mishap scenarios could be summed up as:

Failing to plan is planning to fail.

Alan Lakein

Which applies to just about any aspect of life.

I’ve had a lot of flight training in the 40+ years I’ve spent hanging around airports. The bulk of it–70-80%, I’d guess–involved dealing with emergencies (the rest dealt with how to avoid them in the first place). So far, fortunately, these emergency scenarios have only happened to me in training. While I have a story or two to tell about when things didn’t go exactly as planned, the vast majority of my flights were quite safe.

So, if the probability of an inflight emergency is very low, why spend a disproportionate time in training for them? Because, while they are very low-probability events (0%, ideally), they are very high-cost events.

Is it worth it to spend all that time training for something that is not likely to ever happen? You betcha!!  I’m sure the people who fly with me would agree!

What does any of this have to do with technical writing? As I noted in my post on in-flight reading material, some topics provide a little value many times, while other topics provide a great deal of value, but infrequently. The value of content that results, somehow, in a financial transaction is relatively easy to compute: subtract the cost of producing and promoting the content from the gain it provides. Computing the value of content that provides great value only infrequently is much harder.

Like the flight training that’s kept me from having an accident, how do you measure the value of events (accidents, misunderstandings, errors, etc.) that have been averted or costs that have been avoided?
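The two cases above can be compared with a back-of-the-envelope expected-value calculation (a hypothetical sketch; the function names and all of the numbers are made up for illustration):

```python
def content_value(gain, production_cost, promotion_cost):
    # Net value of content that pays off directly and often,
    # e.g. content tied to a financial transaction.
    return gain - (production_cost + promotion_cost)

def expected_avoided_cost(incident_probability, incident_cost):
    # Expected value of rarely used but critical content:
    # the chance of a costly event times the cost it helps avert.
    return incident_probability * incident_cost

# A frequently used topic with a modest, direct payoff...
frequent = content_value(gain=5_000, production_cost=1_000, promotion_cost=500)

# ...versus a rarely needed topic that could avert a costly incident.
rare = expected_avoided_cost(incident_probability=0.01, incident_cost=500_000)
```

Even with invented numbers, the comparison shows why averted costs are hard to credit: the second value stays entirely hypothetical unless the incident actually happens.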

How many topics in a page?

API reference topic type distribution observed in study of open-source API documentation

This question has come up a few times recently, and the answer, like the answer to so many technical writing questions, is, of course, “it depends.” Which raises the next question: “depends on what?”

Well, at the root, it depends on what the reader wants to accomplish and, perhaps most critically, how they want to accomplish it.

I researched this a couple of years ago and found that, in the API reference documentation we studied, the multi-topic-per-page format was twice as common as the single-topic-per-page format.

The distribution of API reference topic type by API size

Now, just because a format is more popular doesn’t necessarily mean that it’s better. Looking at this chart from the study shows that the format preference shifts toward the single-topic-per-page format with larger APIs. It’s possible that the difference observed is nothing more than an artifact of authoring systems or organizational style guides.

Research on how people construct knowledge, however, tends to favor the multi-topic-per-page format (to a point). If you are constructing knowledge about an API, you might look for an overview, some sample code, explanations of some key methods, then go back to the overview, look at some more sample code… Lather, rinse, repeat. Such a learning method is best supported by a big topic in which the reader can skip around to see all the related information quickly, in whatever order their learning style desires. Doing that with each topic on a separate page requires multiple web-server accesses, each interrupting the flow for more time than in-page navigation would. While the 2-3 seconds it might take for the page to load doesn’t sound like much, if it breaks the reader’s flow, it degrades the learning experience.

A key part of this learning method is intra-page navigation, which facilitates meaningful skipping around: random access to an otherwise sequentially oriented topic. This topic came up in a discussion about the scrollspy feature of Twitter Bootstrap, which provides some very helpful in-page navigation elements. The advantage of scrollspy is that it automatically keeps the in-page navigation synchronized with the reader’s position as they scroll, which makes the navigation much easier for the author (and maintainer) of the topic to keep useful.

A great tutorial saved my glasses

Last night, in an attempt to fix a loose temple on my glasses, I found myself in a pickle. As is often the case, it was much easier to take my glasses apart than it was to reassemble them.

I’d taken the temple off my glasses frame to tighten a loose hinge. However, I realized (after disassembly, of course) that the spring tensioner in the temple made it impossible to replace the screw that holds it to the frame. Of course, I tried (stubbornly) for about 30 minutes to figure out how to reassemble my glasses. Alas, to no avail.

Eventually, I asked YouTube and found this video.

In less than two-and-a-half minutes, the video producer demonstrated the problem, the tools needed for the repair, showed how the method worked, and then the final outcome—a textbook-perfect tutorial. No fluff, just useful information. Five minutes later, I had my glasses reassembled and back in business.

Thanks danliv99!

In-flight reading

The collection of in-flight reading material found in the seat back of a recent Alaska Airlines flight

I’m working on a paper for the HCII 2015 conference and thought of the reading material I saw on a recent airline flight. The paper discusses ways to identify the goals of different types of information-focused web content and how to measure how well the content is accomplishing those goals, so now I see this everywhere I look.

This is what occurred to me while I was staring at this collection of literature for several hours.

The Alaska Airlines magazine

Its goal is to provide the reader with entertainment, er, I mean provide them with advertising, using entertainment as the bait. So how would you measure success? Interaction with the content, maybe? Coupon codes entered, products ordered from the web links, and other interactions traceable to the magazine. Pretty straightforward. Content can be compared to the corresponding advertisements and reader feedback, and the publisher can decide what’s working and what needs work.

The airsick bag

This isn’t really literature, but a good case of “if it’s not easy to use, it’s going to get messy.” I don’t think any amount of documentation could fix a poorly-designed air sickness bag.

The emergency procedures brochure

This is everyone’s favorite reading material, right? Its goal is to provide important information in a way that’s easy to understand (quickly, in the worst-case scenario). This is a document that Alaska Airlines (and its passengers) hope to never need, but when it’s needed, its value will be immeasurable. How do you track that goal? User feedback? Probably not. Survivor stories? Let’s hope not! Maybe usability testing?

The WiFi and the “Meals & Snacks” advertisements

Again, this is purely promotional material whose effectiveness can be tracked by sales. Like the magazine, this is not unfamiliar territory.

What’s this got to do with me?

As a writer of help content, I relate to the emergency procedures brochure. Sometimes I don’t think anyone reads my content, and frequently, Google Analytics tends to agree with me. But I know that in some cases, when those few people need to read it, it’s going to be the most important thing for them to read (if only for that moment). If I’ve done my job right, what I’ve written will save them time and money. I’ll never know that from looking at Google Analytics, but a usability test (even an informal or discount test) will tell me if a topic will be ready to save the day (or at least the moment) when it’s called upon to do so.

Back to the paper.

Knowing your audience

Image of earth from space
Where to find our audience

For my PhD dissertation, I ran an unmoderated, online study to see how variations in page design and content of an API reference topic would affect how people found the information in the topic.

For the study, I solicited participants from several software development groups on LinkedIn and a few universities around the country. It’s definitely a convenience sample, in that it’s not a statistically random sample, but it’s a pretty diverse one. Is it representative of my audience? I’m working on that. My suspicion, in the meantime, is that I have no reason to think it’s not, given that the people who read API documentation include a lot of people. For now, it’s representative enough.

A wide variety of people responded to the invitations I sent, in one way or another, to the 750,000 or so software developers and people interested in software development that I contacted. From those invitations (all in English, by the way), 436 people responded and 253 filled out enough of the survey to be useful. The 253 participants who completed the demographic survey and at least one of the tasks came from 29 different countries and reported speaking a total of 32 different native languages. Slightly less than half of the participants reported speaking English as their native language. After English, the top five native languages in this group were Hindi, Tamil, Telugu, Kannada, and Spanish.

More than half of the participants didn’t speak English as their native language, but the vast majority of them should have no problem reading and understanding it. Of the 144 who didn’t speak English as their native language, 81% strongly agreed with the statement, “I can read, write, and speak English in a professional capacity,” or agreed or strongly agreed with the statement, “I can speak English as well as a native speaker.” So, while they are a very international group, the vast majority seem to speak English pretty well. The rest might need to resort to Google Translate.

All of this supports the notion that providing API documentation only in English doesn’t inconvenience many developers; at least, not the developers who respond to study invitations in English. An interesting experiment might be to send this same survey to developers in other countries (India, China, Japan, Latin America, for starters) in their native languages to see how the responses vary.

So many studies. So little time.