By this, I mean, should you describe in the formal documentation the way a feature actually works or the way it is supposed to work?
There is a lot of research that says software developers (and, I would imagine, almost anyone else, but I haven’t studied that research) expect the documentation to be accurate, so reference documentation should describe how something ACTUALLY works. If “how it works” changes, the documentation should be updated to reflect that. Simple, right?
Well, if a feature is shipped that (let’s say) still has some room for improvement, your accurate documentation will highlight that. A real fear in the hearts of some product managers is that your accurate documentation could turn some customers away saying, “we need something that does X and your product’s documentation says that it doesn’t do X (yet).” Your product manager will have to weigh the cost of that vs. the cost of having the customer build their system around your product expecting “X” only to find out that it doesn’t. That decision could get your product into the market and/or into an unflattering TechCrunch article.
The problem with reference topics (of almost any sort) is that, individually, they don’t do much and don’t get a lot of attention (except, of course, when they do). Collectively, they tell your customers how much you value their time and their business.
Recent events in the news bring this topic into the spotlight.
Documentation in the news
The recent news about the Meltdown and Spectre vulnerabilities has put documentation into the spotlight. It’s bad enough that the recent discoveries identified microprocessor vulnerabilities that have apparently been around for quite some time, but it’s interesting to see technical documentation enter the discussion.
While the vulnerabilities are troubling on various levels, from a documentation perspective, Microsoft’s explanation of its pause in pushing out the patches caught my eye. A Microsoft spokesman was quoted in The Verge as attributing this pause to the fact that, “After investigating, Microsoft has determined that some AMD chipsets do not conform to the documentation previously provided to Microsoft to develop the Windows operating system mitigations to protect against the chipset vulnerabilities known as Spectre and Meltdown.”
Let’s deconstruct that a bit (a cool thing about being an academic is that I can now work “deconstruct” and “unpack” into conversation).
Microsoft as angry customer
Reading this quote, Microsoft is in the position of a very upset customer saying, basically, “Your documentation said the chip would do X and it doesn’t!” The implication is that this discrepancy is going to cost Microsoft and its millions (billions?) of customers some considerable amount of time and money. I don’t know the exact documentation to which the unnamed spokesman is referring, but I would imagine it is some obscure reference topic or errata buried deep in a [virtual] stack of other equally important and equally obscure content.
Could this problem have been avoided from the start by spending more time on the documentation? Would that have been deemed worth the effort (before the fact, not with the advantage of 20/20 hindsight)? To look at this in the value context of documentation that I’ve mentioned in earlier posts, I’m curious how the value of such a topic could have been evaluated when it was written.
Page views are the go-to unit of measure for valuing web content. Would the page view unit be useful here?
Not likely. First, the number of people who read microprocessor references is quite small and the number of details about microprocessors is quite large, so the number of times any one page would be read on average is likely to be barely perceptible. Further, what value does a reference topic page view have in dollars? I mention how inappropriate page views are as a unit of reference-topic value in my observations of in-flight reading materials.
The value to the customer would be a likely candidate in that this would translate to some value to the company (although that calculation could be complicated). Right now, the value to Microsoft (and its customers) of having accurate and useful documentation about that topic is quite high and I would imagine that the cost to AMD is also measurable. Divide that value across all the pages necessary and the average value per page could still be pretty low (assuming millions of dollars of impact and thousands, if not millions, of pages of content over the time this chip has been in production).
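To make that division concrete, here is a minimal sketch in Python. The $5 million impact figure and the page counts are made-up numbers chosen only for illustration, and the function name is my own; none of them come from Microsoft or AMD.

```python
def average_value_per_page(total_impact_dollars: float, page_count: int) -> float:
    """Spread a total dollar impact evenly across every page in a doc set."""
    if page_count <= 0:
        raise ValueError("page_count must be positive")
    return total_impact_dollars / page_count


# Hypothetical: $5 million of impact spread over doc sets of different sizes.
HYPOTHETICAL_IMPACT = 5_000_000
for pages in (5_000, 50_000, 5_000_000):
    per_page = average_value_per_page(HYPOTHETICAL_IMPACT, pages)
    print(f"{pages:>9,} pages -> ${per_page:,.2f} per page")
```

The same total impact looks very different per page depending on how big the documentation set is, which is part of why a per-page value is a slippery thing to budget against.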
Whether the value per page is $1, $100, or $1,000, if the odds of a documentation failure causing a lot of financial damage are low, it might make the most business sense to just cut the tangible costs of producing documentation up front and have an expert or two available on hand to handle the crises that may (or may not) arise. Or it might be cheaper to invest heavily up front to avert any downstream damage. It’s hard to say, in general.
How all that math works out depends on the situation, but in layman’s terms, it’s like deciding between carrying a spare tire or having AAA (which is not an exclusive choice, by the way). Having a spare tire means you lose a lot of trunk space to haul around a tire and a jack all the time, but a flat tire will only take about 20 minutes to change no matter where it goes flat. Having AAA or some other roadside assistance plan frees up all that space in the trunk, but a flat tire could become a 30-minute wait on a good day, or you could be stuck for quite a while if circumstances are working against you (and remember, they are already working against you because you’re stuck with a flat tire). Which is the better option depends on many variables, but they can be estimated; it’s actually a pretty standard formula.
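One common way to frame that kind of trade-off is an expected-cost comparison: what you pay no matter what, plus the probability of a failure times what the failure costs. The Python sketch below is my own simplification, not a formula taken from any particular source, and every number in it is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    name: str
    upfront_cost: float         # paid whether or not anything goes wrong
    failure_probability: float  # chance of a costly failure over the period
    failure_cost: float         # what the failure costs if it happens

    def expected_cost(self) -> float:
        """Upfront cost plus the probability-weighted cost of a failure."""
        return self.upfront_cost + self.failure_probability * self.failure_cost


def cheaper(a: Strategy, b: Strategy) -> Strategy:
    """Return whichever strategy has the lower expected cost (a wins ties)."""
    return min((a, b), key=Strategy.expected_cost)


# Hypothetical numbers: the "spare tire" (invest heavily in docs up front)
# versus "AAA" (spend less up front and keep experts on call for crises).
spare_tire = Strategy("invest up front", 500_000, 0.01, 10_000_000)
roadside = Strategy("handle crises as they come", 100_000, 0.03, 10_000_000)

for s in (spare_tire, roadside):
    print(f"{s.name}: expected cost ${s.expected_cost():,.0f}")
print("Cheaper on paper:", cheaper(spare_tire, roadside).name)
```

With these made-up inputs, the reactive approach comes out ahead, but nudging the failure probability or the failure cost can flip the answer, which is why the estimates matter more than the formula.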
In the absence of any historical data, however, calculating customer value, in this case, is tricky.
Another candidate is the value to the company’s brand. This is related to customer value, but from the company’s perspective (AMD, in this example). What do Microsoft’s (extremely visible and public) press reports do to AMD’s image and perception? I haven’t seen AMD’s response (although I have seen lots of reporting on Microsoft’s view).
What would it cost your company to have one of the largest tech companies in the world spreading bad press about you because of a documentation error? That’s got to be worth something. (Turns out, it is.)
But wait a minute, is it a documentation error or a chip (design/manufacturing) error? It depends on what you use as your “truth reference.”
Specification as truth reference
If the design spec (invariably an internal document) is your truth reference and the documentation describes the specified functionality, it is, by definition, an implementation error (perhaps in translating the design to the fabrication or in the manufacture of the chips themselves). Somehow countless chips were designed and built with the error and managed to pass final test to be shipped to customers.
OK, that happens. Such occurrences are typically handled in software by listing them in the release notes or in Knowledge Base articles or some other ad hoc form of last-minute documentation. Hardware devices have a similar documentation protocol for this. Unfortunately, such addenda, errata, or readme files are often not linked to the original reference documentation that describes the feature being referenced in the errata. It’s not the most accessible way to provide the information. But it’s a common one.
The argument for this approach is that eventually, errors will be recognized and fixed (somewhere along the line) and so, eventually, reality will align itself with the documentation. This argument is particularly compelling when the cost of publishing the documentation is high (e.g. because it is printed, or because the publishing process is costly). Problems arise, such as in this Microsoft-AMD case, when the error is not detected or the impact of the error is not appreciated for quite some time.
What ships as the truth reference
Agile development would advocate for this approach in that, according to the Agile Manifesto, you might not have any specification in the first place: the code is the specification. The AMD chips were probably not developed with the Agile Manifesto in mind, but a lot of software is. While documenting what ships is less likely to produce discrepancies between documentation and code, there are still a lot of ways for the code to surprise the user. Documentation omissions, such as unexplored edge cases or undocumented (perhaps unknown) limitations that result from how a function was implemented, are a more likely source of surprises in this case.
Practically, when development is allowed to outpace documentation, omissions are much more likely to occur than not. The trick is to not omit anything the customer will need (to the extent that’s humanly possible).
What’s the answer?
It’s not an easy call. Ideally, your reference documentation should be complete, accurate, useful, and usable. If there are known bugs (or as bugs or discrepancies become known) the documentation should refer to bugs in a way that such references are discoverable when they need to be and they disappear when the fixes are shipped. I don’t know of a content management system that does that, but perhaps it’s only because no one asked it to? Having a formal set of documentation, some separate (often, unpublished) list of bugs, errors, and omissions, and another knowledge base of problems & solutions might give you the ability to say “it’s documented,” but not in a very customer-friendly way. However, it is a good way to show customers how your company is organized.
The paradigm shift to make is to consider reference topics as having more value to your company’s brand than any other value category. Sure, they help the customer, occasionally, but even when they aren’t read, they speak volumes about what you think of your customer. The documentation takeaway from this event is that how much you spend on reference content, and the level of quality and detail you put into it, tells customers and potential customers how much you value their time and their business.