In my thread on user interactions with documentation, I suggested that you want your measurement instrument, usually some form of survey question, to influence the experience you’re trying to measure as little as possible. From a measurement standpoint, that’s nothing new: you never want the act of measuring to influence the thing being measured, because that contaminates the measurement.
In the case of documentation or experience feedback, a recent tweet by Nate Silver (@NateSilver538) described how the use, and perceived use, of the measurement system had contaminated the data collected by Uber/Lyft (and countless other services). He tweeted, “Given that an equilibrium has emerged where any rating lower than 5 stars means your Uber/Lyft driver was bad, they should probably just replace the 5-star scale with a simple thumbs-up/thumbs-down.”
— Nate Silver (@NateSilver538) April 15, 2018
In this case, what people had learned about how Uber/Lyft used the rating to determine a driver’s future had contaminated the driver-performance measurement; that perception had become a non-trivial part of the measure. What might have started as a linear relationship, where 5 was best, 4 not quite as good, 3 somewhat worse than 4, and so on, turned into “5 is great, anything less is not.” At that point, Nate Silver suggests, essentially: if it’s perceived to be a binary measure, it should be collected as a binary measure, especially when the raters (customers) know it’s going to be used as one.
Instead of 1 to 5, Nate Silver suggests a thumbs up or down (or similar) binary rating scheme.
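The collapse described above is easy to see in code: if raters treat the scale as binary, recoding the stars to thumbs-up/down loses essentially nothing. A minimal sketch, using hypothetical ratings (not real Uber/Lyft data):

```python
from collections import Counter

# Hypothetical ratings where the scale is already used as binary:
# riders give 5 when satisfied and anything lower when not.
ratings = [5, 5, 4, 5, 5, 1, 5, 3, 5, 5]

# Recode the 5-point scale into the binary signal it already carries.
thumbs = ["up" if r == 5 else "down" for r in ratings]

print(Counter(ratings))  # counts per star value
print(Counter(thumbs))   # the same information, expressed as a binary rating
```

The distinctions among 1 through 4 add categories but, under this equilibrium, no information: every non-5 value recodes to the same “down.”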
However, Shinobi replied to Nate’s tweet, suggesting this binary effect is the result of “getting a five” having become a standard metric in the book of “business people who like easy-to-understand and oversimplified metrics to fire people with.” Perhaps the rating should simply be labeled “Should we keep this driver?” as that is the perceived net effect of the rating, which further supports moving to thumbs-up/down values, in a way that is unfortunately reminiscent of the Roman Colosseum.
Are users capable of providing more detailed feedback?
This is an unfortunate use of a potentially valuable metric, but maybe the value it seemed to offer was imaginary all along?
@kevinpurcell replied to Nate’s tweet suggesting that “the general populace is incapable of making finer gradations [than sucks/rocks],” citing YouTube’s experience from nine years earlier.
Google concluded the same thing about YouTube ratings (anything less than 5 is a sucks rating) and wrote a blog about it before moving to a sucks/rocks system (aka thumbs up/down). The general populace is incapable of making finer gradations. https://t.co/RtDvgskeVx
— Dr Kevin Purcell (@kevinpurcell) April 16, 2018
That’s entirely possible, but I’d argue that, in the case of an Uber/Lyft ride, the general populace is more than capable but simply has other things on its mind—not unlike an encounter with documentation. In both cases, the user or reader has little interest or inclination to give the rating any more thought than a binary decision (if that). If the user generally doesn’t care to give the situation more than a binary thought, then asking for more information than that, such as an analog rating on a scale, is asking for information that simply isn’t available.
When asking people for information, the best answers come from people who know the answer. In the context of a questionnaire, asking questions for which the respondent doesn’t know (or care to give) the answer makes the responses noisier. More to the point, it intrudes into the experience to the point that it starts to influence the experience as well as the measurement. Lose-lose, generally.
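That noise is easy to simulate. The sketch below assumes, purely for illustration, that each rider holds only a binary opinion: satisfied riders reliably give 5, while unsatisfied riders, forced onto a 5-point scale, scatter their answers across 1–4. The satisfaction rate and sample size are made up.

```python
import random

random.seed(0)  # deterministic for the example

# Assumption: a rider's true opinion is binary (satisfied or not).
def five_point(satisfied):
    # Unsatisfied riders pick a "not 5" value more or less arbitrarily.
    return 5 if satisfied else random.randint(1, 4)

def binary(satisfied):
    return 1 if satisfied else 0

opinions = [random.random() < 0.8 for _ in range(1000)]  # ~80% satisfied

five = [five_point(s) for s in opinions]
two = [binary(s) for s in opinions]

# On the 5-point scale, the 1-4 region is pure noise: several distinct
# values all encode the same underlying "not satisfied" state.
unsatisfied = [r for r in five if r < 5]
print(len(set(unsatisfied)), "distinct values, all meaning thumbs-down")
```

The binary rating captures everything the respondent actually knew; the extra resolution in the 5-point responses is variance the analyst must then explain away.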
Whether you treat the data collection as a measurement exercise or as another opportunity to influence the experience is a business decision. I’m looking at it simply as a means of collecting data on which decisions can be based.
Justin Owings differentiates between some of the uses of such data in Analytics Theater vs. Actionable Insights, in which he contrasts data for show with data for decisions, describing Analytics Theater as:
“…when your analytics tools provide you with signals—metrics, charts, graphs, whatever—that give the appearance of valuable insights while being mostly if not completely useless and/or poorly reflecting the underlying reality.”
And he summarizes the properties of Actionable Insights as:
- “When you get a signal from your analytics tool, do you have an idea of what you should do next?
- Does a visualization drive you to clear next steps?
- Does a metric tell you enough to deduce what is happening?
- Having absorbed it, do you end up more capable of doing your job?”
Having a good visualization and presentation of data is important, but it’s equally important (if not more so) to collect valid data to display. Valid data come from an accurate measure of whatever will drive the Actionable Insights that Justin describes. You can also rest assured that your customers, users, and readers will be happier, and will provide better data to display, if you don’t ask them for more data than they have the time or interest to provide.
But the effect Nate describes demonstrates another piece of analytics wisdom: you shouldn’t use the data you collect to make business decisions that might adversely affect those who collect or generate the data. While that might seem like common sense (as in, “what could possibly go wrong?”), it’s a pattern that is repeated with unfortunate consistency.