New articles on API documentation

These are some of the new articles I found while browsing Google Scholar on the subject of API documentation since my last update on the topic. I’ve found 20-some new articles since January, which is exciting! Here’s a review of just the ones I’ve had a chance to skim and write up so far.

This recent trove of documents makes me both happy and sad, as did the papers I reviewed in an earlier post. These articles contain lots of great research into API documentation, who uses it, and how it’s used. All of them are available for download, and I would encourage you to download them if you are interested in API documentation.

At the same time, I’m still disappointed that there has been absolutely zero research published on the topic from the technical communication community this year (was I really the only one who was doing that lately?!) and little to no reference to anything tech-comm related in these papers. If you know of some API research published in a tech-comm venue (and that’s not already in my list), please let me know. At this point, all this is just a warning that some of my disappointment might seep through into my article reviews.

Many of these articles (3/4) had both academic and industry authors, which suggests that industry-academy partnerships aren’t that unusual in the research of API Documentation—if you’re in computer science. In tech comm, as Tom Johnson’s recent survey laments, not so much.

Some of these articles cite earlier research in developer user studies and also contribute to that body of work. None of them, however, cite what I would call writing or reading research. That wouldn’t be so troubling if it were just these articles, but I can’t think of an article on the topic of API documentation published in a computer science venue that talks about actually reading and using the content. At the same time, there now seems to be a recognizable genre in API documentation literature that Head et al. describe as “finding anti-patterns in documentation” (see reviews below).

So, here’s a short review of four of the articles I read this morning.

When not to comment: questions and tradeoffs with API documentation for C++ projects

The first two lines of this paper drew me in:

Without usable and accurate documentation of how to use an API, developers can find themselves deterred from reusing relevant code. In C++, one place developers can find documentation is in a header file.

Wait, what?! The header file? (and NOT the documentation?!)

Well, of course.

A recurring refrain from developers is that documentation credibility falls off with its distance from the code (perhaps with the square of the distance in some cases). The reasons cited, often with only anecdotal evidence (they also appear in the paper’s interview responses in section 4.3.2), include that the program code is the absolute authority on what a function does, while comments and documentation can fall out of sync with it because they aren’t tested like the code is. The header file provides a succinct listing of the functions a C++ library or API provides and makes a convenient place for the corresponding description of those functions to reside.
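To make the header-as-documentation idea concrete, here is a minimal sketch of what that can look like. The file names, the `example` namespace, and the `ClampPercent` function are all hypothetical and are not taken from the paper. The point is only that the contract sits directly above the declaration a caller reads, while the implementation lives elsewhere.

```cpp
// percent.h (hypothetical header, for illustration only)
#ifndef EXAMPLE_PERCENT_H_
#define EXAMPLE_PERCENT_H_

namespace example {

// Clamps `value` to the closed range [0, 100].
//
// Values below 0 return 0 and values above 100 return 100.
// This comment is the documentation a caller sees in the header,
// without having to open the implementation file.
int ClampPercent(int value);

}  // namespace example

#endif  // EXAMPLE_PERCENT_H_

// percent.cc (hypothetical implementation file, shown in the same
// listing only so the sketch compiles as a single unit)
namespace example {

int ClampPercent(int value) {
  if (value < 0) return 0;
  if (value > 100) return 100;
  return value;
}

}  // namespace example
```

Nothing checks that the comment and the function body agree, which is exactly the out-of-sync risk the interviewed developers describe.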

This paper describes the findings from a survey of Google developers who searched for information about APIs and the people who maintained the APIs and the corresponding documentation.

This should be good. (Spoiler: it was!)

The literature review referred to a familiar cast of researchers who have conducted various studies into what developers thought of API documentation and how developers sought and used it. I would not, however, characterize any of the researchers cited as “technical communicators.” The authors do acknowledge “ideas and feedback” from a couple of Google’s senior technical writers at the end of the paper, though.

The study was conducted inside of Google, which is both a blessing and a curse. In the blessing category, they were able to instrument their search tools to collect specific analytic data and connect with the searchers. In the curse category, it’s hard to know how their findings generalize to a world of non-Google developers, as they are studying the very tippy-tip of the bell curve that describes developer skills. Their paper mentions this limitation and also that their interviewees “had very strong opinions,” which might not represent other developers.

I appreciated the acknowledgement that, “the interviews were aimed at finding anti-patterns in documentation; there are many benefits of getting information from documentation that are not reported in our results.” It would, however, be nice to see some of those “many benefits” and their underlying practices studied and reported some day.

The authors report on what developers were looking for in their searches (VERY helpful information, if it generalizes to your audience, of course). Nothing special, but it’s good to see the conventional wisdom confirmed every now and then.

The single most interesting section of the paper for a tech writer involved in API documentation was section 5.1 Implications. This section mentions many of the quandaries that you face every day. While the authors don’t provide definitive answers, they show that you’re not the only one pondering such notions as these (excerpted from section 5.1):

  • Should you document for an unexpected client?
  • When isn’t the code enough to be self-documenting?
  • Which implementation details do you bring into the documentation?

That section alone makes it worth downloading the article. Reading about what they instrumented in their search tool might also provide some insights for technical writers.

How developers search for code: a case study

While reviewing the literature cited in the previous article, I found this article written several years earlier by one of its authors. This article focused on search motivations of developers. Specifically, two of their research questions ask: “Why do programmers search” and “In what contexts is search used?” These are questions I’ve heard asked in many tech-writing conversations.

This report is a smaller-scale version of the paper (above) that cites it, so maybe read this one first? Either way, both are interesting.

Towards Extracting Web API Specifications from Documentation

This paper was interesting in that it treats API documentation as the source from which to derive the underlying API’s specification: literally a documentation-driven approach. Reading into the article, the intent is noble: read the documentation to create an API schema definition that could be used to validate the use of the API. While this seems like a pragmatic approach in a world in which such schemas, however helpful they might be, can be hard to come by, it might also benefit from some more ecological context, namely considering the source of the data they’ve chosen: API documentation.
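As a toy illustration of the general idea (and not the paper’s actual extraction method, which is far more involved), the sketch below reads one line of documentation, turns it into a tiny endpoint specification, and then uses that specification to check a request. The assumed documentation format `METHOD /path/{param}` and every name here (`EndpointSpec`, `ParseDocLine`, `MatchesSpec`) are my own assumptions for the example.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// A minimal "specification" for one endpoint: an HTTP method plus
// the path split into segments. Segments written as "{name}" are
// treated as parameters that match any value.
struct EndpointSpec {
  std::string method;
  std::vector<std::string> segments;
};

// Splits "/users/{id}" into {"users", "{id}"}, skipping empty parts.
std::vector<std::string> SplitPath(const std::string& path) {
  std::vector<std::string> parts;
  std::stringstream stream(path);
  std::string part;
  while (std::getline(stream, part, '/')) {
    if (!part.empty()) parts.push_back(part);
  }
  return parts;
}

// Parses a documentation line of the assumed form "METHOD /path".
// Returns false, leaving `spec` untouched, if the line does not fit.
bool ParseDocLine(const std::string& line, EndpointSpec* spec) {
  std::istringstream stream(line);
  std::string method;
  std::string path;
  if (!(stream >> method >> path)) return false;
  if (path.empty() || path[0] != '/') return false;
  spec->method = method;
  spec->segments = SplitPath(path);
  return true;
}

// True if the segment looks like a "{name}" path parameter.
bool IsParameter(const std::string& segment) {
  return segment.size() >= 2 && segment.front() == '{' &&
         segment.back() == '}';
}

// Checks a request against the extracted specification.
bool MatchesSpec(const EndpointSpec& spec, const std::string& method,
                 const std::string& path) {
  if (method != spec.method) return false;
  const std::vector<std::string> actual = SplitPath(path);
  if (actual.size() != spec.segments.size()) return false;
  for (std::size_t i = 0; i < actual.size(); ++i) {
    if (!IsParameter(spec.segments[i]) && spec.segments[i] != actual[i]) {
      return false;
    }
  }
  return true;
}
```

Even in this toy version, the specification is only as good as the documentation line it was parsed from: a typo in the documented path silently becomes a wrong specification.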

Now, if you’ve read any of my blog, you know that I’m as big of a fan of API documentation as they get, but I’ve also worked in the field of API Documentation production long enough to know that such source material isn’t always the highest priority of an API development project (in terms of either quality or quantity). To the article’s credit, they identify deficiencies in the documentation as contributing to inaccurate results from their API scraper.

As a technical communicator, it’s disappointing to see inaccurate documentation as a given rather than an underlying problem to be solved.

Where does Google find API documentation?

This is a short paper that describes the different sites that had documentation on 10 different popular APIs. To no real surprise, the top locations included Stack Overflow and GitHub. However, I was surprised by a couple of things from this paper.

First, the venue: WAPI’18: IEEE/ACM 2nd International Workshop on API Usage and Evolution. That there’s an annual IEEE/ACM workshop on API usage is COOL! That API Documentation was part of the conversation was even cooler!!! With respect to a recent post on Academic/Industry partnerships, the conference at which this paper was presented also had an invited talk from an IBM Researcher, so points for that as well. That there’s no tech-comm reference to API Documentation in this particular article or the conference (from what I can tell of the program, anyway), however, gives me that Eeyore feeling.

Which leads to the second “surprise,” except it’s not really a surprise anymore. All the literature cited in this article consisted of computer science-y references. The documentation sites were reviewed from a strictly documentation-as-data perspective. In academic literature, you can do that: focus on a very narrow aspect of a topic and exclude the context in which that narrow aspect lives. While that’s often encouraged in the name of boundaries, I think some recognition of the context in which an article is situated helps. In the case of this article, recognition of developer use cases and search motivations might have helped. But, I get it. Perhaps the article’s page count was limited.

Parting thoughts

So, it’s great to see that API documentation is the subject of so much research, and this tip-of-the-iceberg view of what’s recently been published provides some empirical answers to questions I’ve seen posed by technical writers in forums lately. My disappointment in the lack of tech-comm contributions to this body of work isn’t because I think that tech comm needs to be duplicating these studies. It’s that tech comm and computer science don’t seem to be talking to each other about the topic. The question I come away with after reading these articles is, why not? I hope it’s not because tech comm is being made obsolete and, as such, is becoming irrelevant.

