Wrestling with UTF-8

I’m working on a project for an international customer base, initially supporting Spanish and English. Having worked on international projects before, I knew that I’d have to make some accommodations. Even so, I was surprised at how un-automatic the process of making it all work still is in the 21st century. The surprises are now less frequent, but I no longer trust that I won’t find another one around the next corner.
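To illustrate why UTF-8 support isn’t automatic, here is a small Python sketch. It is not necessarily one of the surprises this project ran into. It shows that Spanish text takes more bytes than characters in UTF-8, and that decoding those bytes with the wrong encoding fails silently instead of raising an error. The word list and the helper name `char_and_byte_counts` are just illustrative.

```python
def char_and_byte_counts(text: str) -> tuple[int, int]:
    """Return (number of characters, number of UTF-8 bytes) for text."""
    return len(text), len(text.encode("utf-8"))


# Sample words: only the unaccented one has as many bytes as characters.
for word in ["clinic", "clínica", "año", "niño"]:
    chars, nbytes = char_and_byte_counts(word)
    print(f"{word!r}: {chars} characters, {nbytes} bytes")

# Decoding UTF-8 bytes as Latin-1 (ISO-8859-1) never raises an error;
# it just produces garbled text ("mojibake"), so the mistake can go unnoticed.
garbled = "año".encode("utf-8").decode("latin-1")
print(garbled)  # prints aÃ±o
```

Byte-versus-character mismatches like this are what make field lengths, string truncation, and database column sizes worth checking when non-ASCII text is involved.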

The project

I’m developing a small patient automation system (the piClinic) for use in limited resource clinics in developing countries. While there is no shortage of Electronic Health Record (EHR) systems, they tend to work best in well-funded and well-supported clinics and hospitals. For everyone else (which is a rather large population) there are virtually no suitable systems, especially for small clinics in countries that do not (yet) need to support the comprehensive (and complex) data collection and reporting requirements for health information in the U.S.

The piClinic system is designed to fill the gap between zero automation and complete EHR systems until that gap can be closed or the clinic grows out of it and becomes able to install a more full-featured system. Given that much of the developing world speaks a language other than English, internationalization is something that needs to be built in from the start and not just bolted on as an afterthought.

Knowing your audience

Where to find our audience

For my PhD dissertation, I ran an unmoderated, online study to see how variations in page design and content of an API reference topic would affect how people found the information in the topic.

For the study, I solicited participants from several software development groups on LinkedIn and from a few universities around the country. It’s definitely a convenience sample, in that it’s not a statistically random sample, but it’s a pretty diverse one. Is it representative of my audience? I’m working on that. In the meantime, I have no reason to think it isn’t, because the people who read API documentation are a very diverse group. For now, it’s representative enough.

I contacted about 750,000 software developers and people interested in software development in one way or another, and a wide variety of people responded. From those invitations (all in English, by the way), 436 people responded and 253 filled out enough of the survey to be useful. The 253 participants who completed the demographic survey and at least one of the tasks were from 29 different countries and reported a total of 32 different native languages. Slightly less than half of the participants reported English as their native language. The top five non-English native languages in this group were Hindi, Tamil, Telugu, Kannada, and Spanish.

More than half of the participants didn’t speak English as their native language, but the vast majority of them should have no problem reading and understanding it. Of the 144 who didn’t speak English as their native language, 81% either strongly agreed with the statement, “I can read, write, and speak English in a professional capacity,” or agreed or strongly agreed with the statement, “I can speak English as well as a native speaker.” So, while they are a very international group, the vast majority seem to speak English pretty well. The rest might need to resort to Google Translate.
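The following Python check works through these figures to show how “slightly less than half” and “more than half” fit together. It uses only the counts reported in this post. The 81% is applied to the 144 non-native speakers, so the resulting head count is an approximation.

```python
# Counts reported for the API documentation study.
completed = 253    # completed the demographic survey and at least one task
non_native = 144   # reported a native language other than English
agree_rate = 0.81  # share of non-native speakers who agreed with either English statement

native = completed - non_native
print(f"Native English speakers: {native} ({native / completed:.1%})")
print(f"Non-native speakers: {non_native} ({non_native / completed:.1%})")

# Approximate number of non-native speakers reporting professional-level English.
print(f"Non-native, professional-level English: about {round(non_native * agree_rate)}")
```

In other words, “slightly less than half” works out to about 43% native English speakers (109 people), and roughly 117 of the 144 non-native speakers reported professional-level English.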

All of this supports the notion that providing API documentation only in English is not inconveniencing many developers (at least, not many of the developers who respond to study invitations in English). An interesting experiment might be to send this same survey to developers in other countries (India, China, Japan, and LatAm, for starters) in their native languages to see how the responses vary.

So many studies. So little time.

Well, that was fast, Bob

Just three days after posting that I was going to consider the minority, I posted that software development documentation should be written and published only in English.

Three days.

Am I ignoring my theme of the year (or the week)?

I don’t think so.

I did consider the rest of the non-English-speaking world (which is actually the majority of the world) when I thought about that. Will anyone be harmed by a lack of API documentation written in Miskito, for example? I don’t think so. If it turns out that they need it, however, I’ll revise my decision. But what about Spanish? Possibly. But, from the information I have available, probably not. Being inconvenienced might be the worst-case scenario.

So, while I won’t be writing any technical documentation for the Miskito people of Central America in the immediate future, they haven’t been ignored in the decision. As a side note, later this year, I will be helping to give them something they need much more than technical writing.

The point of this year’s theme was to consider the minority: to include them in the design and thought process. So far, I think I’ve done that (for going on four days, now!). The point is to ask the question, “Will not accommodating the minority hinder, or worse, harm them?” Sometimes it will, as with the accessibility aspects of documentation. Sometimes it won’t, as with translating or writing for people who have no use for the documentation in the first place (i.e., they have other problems to solve before documentation becomes an issue). Either way, they were included in the process.

Four days and counting…

Lost in translation

I’m mixing some of my favorite themes, music videos that feature dance numbers and technical writing, with a dash of Latin culture.

(Bear with me… or skip to the technical part)

Latin pop star Enrique Iglesias made this video of a song in Spanish. As I write this, it has had over 648 million views since it was published on April 11, 2014, or about 65 million per month.

Released a few months later, on June 13, 2014, was this “localized” version in Spanglish (with mixed Spanish and English lyrics). It has had only 90 million views, or about 13 million per month.

Now, there are a lot of reasons that could explain such a difference (and 13 million a month isn’t shabby, by the way), but after listening to both, I’m with the majority and prefer the original. Both versions are good, but they are different, and each has a unique feel. To my ears, the all-Spanish version has more feeling and is a bit more romantic, while the Spanglish version doesn’t carry the same sentiment. You can compare the two sets of lyrics for yourself, if you’re interested: Spanish lyrics and Spanglish lyrics.

What does this have to do with technical writing? (Thanks for hanging in there, by the way.)

In code samples and technical documentation, as in music (and many other fields), the original is almost always better than the translation. The best information about developing with a software library or API is going to be in the original language, which is almost invariably English.

So, as a technical writer of developer documentation for a software product with an international audience, should you write in English, localize the technical content, or both? In the absence of evidence to the contrary, my default answer is: yes, you should write the documentation in English, and no, you shouldn’t localize it. Localization is guaranteed to be costly and is not guaranteed to be anything more than marginally beneficial.

Over the years, I’ve collected a lot of anecdotal information on this, but this is one of the things I’ve wanted to study more formally. Anecdotally, international developers believe that the translations aren’t as good as the original English version and tend to prefer the original English version over the same content translated into their language. One reason might be that software has a lot of keywords, class and method names, and the like in English which can’t really be translated. If you read the original, you know you won’t be tripping over inappropriately translated terms.

I have some info from my recent API documentation study, but it’s a bit tangential to this topic.

Yet another study to put on the list.