Every programmer knows that it is good to measure how usable a piece of software is, but for some reason no one besides maybe Apple actually invests sufficient resources in obtaining these empirical measures. A small team of entrepreneurial programmers or an open-source project could make quite a splash by upping the ante on this dimension. The way to proceed is to pick some small, simple task that millions of people do every day and get incredibly geeky about obtaining empirical measures of the task.
Consider for example the following task. Joe User is subscribed to a handful of RSS feeds and is in the habit every morning of checking a few of the feeds in a feed reader. It is my hypothesis that it would make a big splash for a group of entrepreneurial or open-source programmers to recruit on the internet people willing to serve as experimental subjects. Dear volunteer, you (the group of programmers) say, please visit the following URL in your web browser. It is a feed reader. Notice that the reader is already subscribed to three feeds, X, Y and Z. Please, dear volunteer, delete the subscription to feed Y. Please, volunteer, subscribe to the feed for the blog U. Notice that there are four unread entries in feed Z. Please, volunteer, mark those four entries as read so that the reader will no longer show them to you or remind you about them.
You measure in milliseconds how long it takes each volunteer to do these tiny tasks. You measure how many mistakes they make. Naturally, different volunteers get different versions of the software behind the feed reader, each version incorporating a different design decision, so that the measurements can be compared across designs. The goal is to try to develop a theory of the user. Maybe that theory takes the form of a probability distribution and you get all geeky with probability theory.
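Here is a minimal sketch of what that bookkeeping might look like, assuming each trial is recorded as a tuple of design variant, task name, elapsed milliseconds, and mistake count. The variant labels, the function names `assign_variant` and `summarize`, and the choice of a hash to spread volunteers across variants are all my own assumptions for illustration, not a description of any existing tool.

```python
import hashlib
import statistics
from collections import defaultdict

# Hypothetical design variants; each would embody a different design decision.
VARIANTS = ["A", "B", "C"]


def assign_variant(volunteer_id: str) -> str:
    """Map a volunteer to one variant, the same one on every visit."""
    digest = hashlib.sha256(volunteer_id.encode("utf-8")).digest()
    return VARIANTS[digest[0] % len(VARIANTS)]


def summarize(trials):
    """Summarize trials given as (variant, task, elapsed_ms, mistakes) tuples.

    Returns a dict keyed by (variant, task) with the number of trials,
    the median completion time, and the mean number of mistakes.
    """
    grouped = defaultdict(list)
    for variant, task, elapsed_ms, mistakes in trials:
        grouped[(variant, task)].append((elapsed_ms, mistakes))

    summary = {}
    for key, rows in grouped.items():
        times = [elapsed for elapsed, _ in rows]
        errors = [count for _, count in rows]
        summary[key] = {
            "n": len(rows),
            "median_ms": statistics.median(times),
            "mean_mistakes": statistics.mean(errors),
        }
    return summary
```

I use the median rather than the mean for completion time because a single distracted volunteer can take minutes over a task that takes everyone else seconds; that is a design choice, and a fuller theory of the user would look at the whole distribution.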
Now you brainstorm ways to optimize the measures (e.g., the time it takes to do tasks, the number of errors made) and you iterate. This is what I mean by getting all geeky on the topic.
Actually, time in milliseconds strikes me as a nonoptimal thing to measure. What I would really like to measure is how stressed out the user is, millisecond by millisecond. My observation of my computer-hating and computer-fearing friends suggests that stressful experiences or sensations induced by interaction with the computer are the basis of their hate or fear. Heart rate, galvanic skin response, and tension in the muscles of the arms or the back of the neck all strike me as good or excellent barometers of whether the person is feeling stress. Alas, those things are probably too expensive to start measuring (though monitoring the diaphragm bears consideration). Perhaps there are “cognitive” barometers that are cheap to measure and are good proxies for how stressed out the user feels. For example, you could ask the volunteer test subject to keep three digits in his head while he does the task, and you could use the volunteer’s forgetting one of the digits (when you ask him to repeat them back to you) as a proxy for how stressed out he feels.
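The digit-recall proxy is cheap enough to sketch in a few lines. What follows is only an illustration under my own assumptions: the function names (`make_digits`, `digits_forgotten`, `forgetting_rate`) are made up, recall is scored position by position, and whether this score really tracks stress is exactly the kind of thing the experiments would have to establish.

```python
import random


def make_digits(rng: random.Random, n: int = 3) -> str:
    """Generate the digits the volunteer is asked to hold in mind."""
    return "".join(str(rng.randrange(10)) for _ in range(n))


def digits_forgotten(shown: str, recalled: str) -> int:
    """Count positions where the recalled digit is missing or wrong."""
    # Pad or trim the answer so it lines up with the digits that were shown.
    padded = recalled.ljust(len(shown))[: len(shown)]
    return sum(1 for s, r in zip(shown, padded) if s != r)


def forgetting_rate(sessions) -> float:
    """Fraction of (shown, recalled) sessions with at least one digit lost."""
    sessions = list(sessions)
    if not sessions:
        return 0.0
    failures = sum(
        1 for shown, recalled in sessions if digits_forgotten(shown, recalled) > 0
    )
    return failures / len(sessions)
```

You would compute `forgetting_rate` separately for each design variant and treat a higher rate as a hint, not proof, that the variant is more taxing.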
Let us review. I propose that it would be a great idea for a couple of programmers to pick some simple task done by millions every day and constantly run what I will call synthetic workloads. A synthetic workload is a made-up task: it is not real work. The volunteer (or beta tester, where a “beta tester” is defined as someone who is bribed in some way, e.g., with a free copy of the software being tested when it ships) is not accomplishing anything while performing the synthetic workload except helping the programmers get experimental data about users like him. That is why it is called “synthetic”.
In a sense the volunteers (or beta testers) are donating hours of their time to “science”, e.g., to the “science” of feed-reader usability and feed-reader righteousness and feed-reader splendidness. Every day volunteers perform these made-up tasks, and the tasks constantly change to reflect the needs of “science”. There are at least two “scientists” working together on this continuous endeavor, where a “scientist” is defined as someone with technical skills in software, usability engineering, machine learning, etc. But the goal of the “scientists” is not to benefit humankind but rather to enrich themselves (or at least to make reputational coin in the world of open-source software that can be converted into fun cool paying gigs).
I have used as my example the task of subscribing to a set of RSS feeds and checking them every day using a service such as Google Reader, but there are hundreds of other tasks that millions of people do every day that can and probably should be examined in the technical detail I have just described. I do not think it is important to pick the best one of these hundreds of tasks: the science team should just pick one that interests them and seems to be important in the lives of many people (and is poorly served by current user-interface designs??) and get on with the technical work described above. Moreover, it does not matter whether the task is accomplished using a web interface to software hosted on the server (as with Google Reader) or using an old-school GUI to software hosted on the client.
There is a skill that I like to call “empathy for the user” that dovetails very nicely with programming skill. Some programmers seem to be almost completely lacking in this skill (because of autism??) and some seem to have high amounts of it. I tend to think I have high amounts of the skill or that I could acquire high amounts with practice. This skill is probably highly useful in the endeavor I just described.
In my next blog entry, I will explain a little about what I mean when I say that the goal is to try to develop a theory of the user. In short, I think it pays to use a whole lot more math than usability experts have been using up to now. A startup could make a lot of money by getting a lot more geeky than companies and open-source projects have gotten on a theory of the user.
One of the two themes of this blog is software startup companies of the Silicon-Valley type, and this blog will assume that the reader is familiar with the writing of Paul Graham on software startups. I will however indulge the reader now by explaining that according to Paul Graham, the key to success for a startup is to make something users love. Once the company has done that, it is usually pretty easy to figure out how to make money from it: the hard part is making something users love. We spent a few paragraphs above considering what to measure and what to optimize: the time it takes for the user to complete a simple task, the stress level of the user (as determined by muscle tone in the back of the neck), or what? The best thing to optimize is whatever best predicts user satisfaction and user delight with a piece of software.
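One crude way to ask which measure "best predicts" satisfaction is to collect a satisfaction rating from each volunteer and see which candidate measure correlates with it most strongly. The sketch below does that with a plain Pearson correlation. The measure names, the idea of a numeric satisfaction rating, and the function names `pearson` and `best_predictor` are assumptions of mine, and correlation is only one of many ways to judge predictive power.

```python
import math


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant series carries no predictive information
    return cov / (spread_x * spread_y)


def best_predictor(measures, satisfaction):
    """Return (name, scores) for the measure most correlated with satisfaction.

    measures: dict mapping a measure name to one value per volunteer.
    satisfaction: one rating per volunteer, in the same order.
    Strength is judged by absolute correlation, since a good predictor
    may be negatively related (slower tasks, lower satisfaction).
    """
    scores = {name: pearson(values, satisfaction) for name, values in measures.items()}
    best = max(scores, key=lambda name: abs(scores[name]))
    return best, scores
```

With real data you would want many more volunteers than measures before trusting the winner, and a held-out set of volunteers to check that the relationship persists.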
I have talked about recruiting volunteer test subjects (performers of synthetic workloads) over the internet. Another way is to take a notebook computer to a cafe and invite patrons to perform simple synthetic tasks while you watch.
Tags: software industry
What’s the advantage of synthetic workloads over real workloads?
Hi, Phil.
If I remember correctly, the reason I specified synthetic workloads is that I was afraid that if I specified just workloads in general, the reader would visualize real workloads and consequently would object to the whole idea because of the loss of privacy. There is also the issue that when someone is doing real work, he is often unwilling or less willing to switch from doing that work to answering a question about the work (e.g., what are you trying to achieve right now?), and I thought asking questions of the user would be useful. Of course, there is no universal reason that real workloads cannot yield satisfactory data.