I’m in the process of writing one of several academic articles that my current profession (professor) demands of me. An essential part of that work is indulging in the diversions and distractions necessary to retain some sanity along the way. Today’s diversion was updating my global bibliography. Unfortunately, that idea turned out to have some depressing side effects, which I’m here to share with you.
It turns out that there’s a lot of research being done on how to automatically generate API documentation. Having written a lot of it and read a lot more, I can certainly understand the motivations. What I didn’t realize was how many different directions the problem is being attacked from. Someone even patented an idea for it (US 8819629 B2, in case you were wondering).
This might not look like anything new (documentation generators have been around for years), but those have, until lately, required some degree of human involvement to write the underlying text. The human involvement is the part these researchers are trying to eliminate.
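As a rough illustration of where that human involvement sits, here is a minimal Python sketch of what a conventional documentation generator does. It is my own toy example, not taken from any of the papers below, and the names `reference_entry`, `area`, and `perimeter` are hypothetical. The signature comes from the code for free; the description only exists if a person wrote it.

```python
import inspect


def reference_entry(func):
    """Build a plain-text reference entry for a function."""
    # The signature is recovered mechanically from the code itself.
    signature = f"{func.__name__}{inspect.signature(func)}"
    # The description is whatever a person wrote in the docstring, if anything.
    description = inspect.getdoc(func) or "(no description written)"
    return f"{signature}\n    {description}"


# Hypothetical functions used only to exercise the sketch.
def area(width, height):
    """Return the area of a rectangle."""
    return width * height


def perimeter(width, height):
    return 2 * (width + height)


print(reference_entry(area))
print(reference_entry(perimeter))
```

The second entry comes out with a signature and nothing else of use, and that gap is what the research described here is trying to fill without a writer.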
Here is some of what I came across (without looking very hard, by the way). Let’s start out light…
Who Asked What: Integrating Crowdsourced FAQs into API Documentation
Chen, C., & Zhang, K. (2014). In Companion Proceedings of the 36th International Conference on Software Engineering (pp. 456–459). New York, NY, USA: ACM. https://doi.org/10.1145/2591062.2591128
This paper from 2014 describes a process for analyzing search queries and consolidating information from various sources (e.g., Stack Overflow), then organizing and categorizing it into documentation sets that target different users. I can’t cite an example, but I have the feeling this is already being implemented (in some form, if not as described in the paper).
This approach still requires people (the crowd in crowdsourced), but don’t rest on your technical writing laurels, yet.
Automated API Documentation with Tutorials Generated From Stack Overflow
Rocha, A. M., & Maia, M. A. (2016). In Proceedings of the 30th Brazilian Symposium on Software Engineering (pp. 33–42). New York, NY, USA: ACM. https://doi.org/10.1145/2973839.2973847
I haven’t read the text of this article, but the abstract describes “four different methodologies to generate tutorials for APIs from the contents of Stack Overflow and organize them according to the complexity of understanding.” So, it sounds like they’re planning to mine Stack Overflow to find tutorial topics and organize them into related tutorial guides. This process, like the first, still requires some human intervention, but they aren’t the only ones looking to mine Stack Overflow to create edited documentation sets.
Augmenting API Documentation with Insights from Stack Overflow
Treude, C., & Robillard, M. P. (2016). In Proceedings of the 38th International Conference on Software Engineering (pp. 392–403). New York, NY, USA: ACM. https://doi.org/10.1145/2884781.2884800
This article describes another machine-learning approach, one that mines Stack Overflow data to enrich API reference documentation. It augments reference topics with additional information gleaned automatically from Stack Overflow.
Statistical Learning for Inference Between Implementations and Documentation
Phan, H., Nguyen, H. A., Nguyen, T. N., & Rajan, H. (2017). In Proceedings of the 39th International Conference on Software Engineering: New Ideas and Emerging Results Track (pp. 27–30). Piscataway, NJ, USA: IEEE Press. https://doi.org/10.1109/ICSE-NIER.2017.9
The method described in this paper analyzes the code itself to derive reference topic content. This has always been the gold standard, as the code always accurately describes what a software module does. In their words, they “treat API documentation generation as a machine translation problem”; in essence, they translate code into documentation. What developer wouldn’t like that?!
Implications for technical writing
First, there’s still time, but not much. The processes in these studies are a ways from production, but there is a lot of interest in this field in computer science (not so much in technical writing, for some reason). This is literally a case of being replaced by a machine.
It’s not a question of if, but a question of when.
And, when is right now, actually.
My advice is twofold:
- Embrace these tools
Technical writers need to get out in front of these tools so they can understand them and their limitations. The tools will likely be oversold as being able to do everything, but even if they are taken to their logical extreme, they have their limitations. The solutions that rely on Stack Overflow for content, for example, will not be able to provide any value until Stack Overflow starts getting some relevant content, something that cannot happen before a product is released (and something that cannot happen without people providing content).
- Demonstrate and promote the value that technical writing adds
Technical writers need to differentiate themselves from what machines can (or will soon be able to) do and establish a more durable value proposition. Reference topics are probably not going to be it. Introductory content that supports early adopters might have a better chance. But, whatever it is, we need to start selling it now or the message will be drowned out by the “we don’t need no writers!” hype.
The future is now!