In many ways, the most creative, challenging, and under-appreciated aspect of interaction design is evaluating designs with people.
The insights you’ll get from testing designs with people can help you generate new ideas, make changes, decide wisely, and fix bugs. One reason I think design is such an interesting field is its relationship to truth and objectivity.
I find design so incredibly fascinating because we can say more in response to a question like “How can we measure success?” than “It’s just personal preference” or “Whatever feels right.” At the same time, the answers are more complex, more open-ended, and more subjective, and they require more wisdom than just a number like 7 or 3.
One of the things that we’re
going to learn is the different kinds of knowledge that you can get out of
different kinds of methods.
Why evaluate designs with people?
Why learn about how people use interactive systems?
One major reason is that it can be difficult to tell how good a user interface is until you’ve tried it out with actual users. Clients, designers, and developers may know too much about the domain and the user interface, or may have acquired blinders through designing and building it.
At the same time, they may not know enough about users’ actual tasks. And while experience and theory can help, it can still be hard to predict what real users will actually do. You might want to know: “Can people figure out how to use it?” “Do they swear or giggle when using this interface?” “How does this design compare to that design?” “If we changed the interface, how would that change people’s behaviour?” “What new practices might emerge?” “How do things change over time?”
These are all great questions to ask about an interface, and each is best answered by a different method. A broad toolbox of methods is especially valuable in emerging areas like mobile and social software, where people’s use practices can be particularly context-dependent and can evolve significantly over time in response to how other people use the software, through network effects and the like. To give you a flavour of this, I’d like to quickly run through some common types of empirical research in HCI.
The first is the “watch someone use my interface” approach, a common one in HCI. The basic strategy of traditional user-centred design is to iteratively bring people into your lab or office until you run out of time, and then release. If you have deep pockets, these rooms have a one-way mirror with the development team watching from the other side. In a leaner environment, this may mean just bringing people into your dorm room or office. You’ll learn a huge amount by doing this. Every single time that I, or a student, friend, or colleague, has watched somebody use a new interactive system, we’ve learned something: as designers, we acquire blinders to a system’s quirks, bugs, and false assumptions.
However, there are some major shortcomings to this approach. In particular, the setting probably isn’t very ecologically valid. In the real world, people may have different tasks, goals, motivations, and physical settings than they do in your office or lab. This can be especially true for user interfaces that people might use on the go, like at a bus stop or while waiting in line.
Second, there can be a “please me” experimental bias: when you bring somebody in to try out a user interface, they know they’re trying out technology that you developed, so they may work harder or be nicer than they would be without the constraints of a lab setup and without the person who developed it watching over their shoulder.
Third, in its most basic form, where you’re trying out just one user interface, there is no comparison point. So while you can track when people laugh, swear, or smile with joy, you won’t know whether they would have laughed more, sworn less, or smiled more with a different user interface.
And finally, it requires bringing people to your physical location, which is often harder than people expect; it can be a psychological burden, if nothing else.
A second category is responder strategies, where you ask people for their reactions rather than watch them work. One example is a survey, such as asking people which of several San Francisco street light designs they prefer. With surveys, it’s relatively easy to compare multiple alternatives, and you can automatically tally the results. You don’t even need to build anything; you can just show screenshots or mock-ups. One of the things that I’ve learned the hard way, though, is the difference between what people say they’re going to do and what they actually do. Ask people how often they exercise and you’ll probably get a much more optimistic answer than how often they really do exercise.
The same holds for the street light example here. Trying to imagine what a number of different street light designs might look like is really different from actually observing them on the street and having them become part of normal everyday life. Still, it can be valuable to get this kind of feedback.
Another type of responder strategy is focus groups, where you gather a small group of people to discuss a design or idea. The group setting is a double-edged sword. On one hand, participants can tease out of their colleagues things they might not have thought to say on their own. On the other hand, for a variety of psychological reasons, people may be inclined to say polite things, or to generate answers on the spot that are totally uncorrelated with what they believe or what they would actually do.
Focus groups can be a particularly problematic method when you’re trying to gather data about taboo topics or about cultural biases. With those caveats, and given that for now we’re just making a laundry list, I think focus groups, like almost any other method, can play an important role in your tool belt.
Our third category of
techniques is to get feedback from experts.
For example, in this class we’re going to do a bunch of peer critique for your
weekly project assignments. In addition to having users try your interface, it
can be important to eat your own dog food and use the tools that you built
yourself.
When you’re getting feedback from experts, it can often be helpful to have some kind of structured format, much like the rubrics you’ll see in your project assignments. For getting feedback on user interfaces, one common approach to this structured feedback is called heuristic evaluation, pioneered by Jakob Nielsen; you’ll learn how to do it in this class.
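To make the idea of a structured format concrete, here’s a minimal sketch of one way you might record heuristic-evaluation findings, using Nielsen’s 0-to-4 severity scale. The specific findings, field names, and interface locations below are hypothetical, purely for illustration.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    heuristic: str   # which of Nielsen's heuristics is violated
    location: str    # where in the interface the problem appears
    severity: int    # Nielsen's scale: 0 = not a problem ... 4 = catastrophe
    note: str        # a short description of the problem

# Hypothetical findings from a hypothetical evaluation session.
findings = [
    Finding("Visibility of system status", "upload screen", 3,
            "No progress indicator while a file uploads"),
    Finding("Error prevention", "checkout form", 2,
            "Date field silently accepts dates in the past"),
]

# Listing the most severe problems first helps the team prioritize fixes.
for f in sorted(findings, key=lambda f: f.severity, reverse=True):
    print(f"[severity {f.severity}] {f.heuristic}: {f.note} ({f.location})")
```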
Our next genre is comparative experiments: taking two or more distinct options and comparing their performance against each other. These comparisons can take place in lots of different ways: in the lab, in the field, or online. The experiments can be more or less controlled, and they can run over shorter or longer durations. What you’re trying to learn is which option is more effective, and more often, what the active ingredients are: which variables matter in creating the user experience that you seek.
Here’s an example. My former PhD student Joel Brandt and his colleague at Adobe ran a number of studies comparing help interfaces for programmers. In particular, they compared a more traditional search-style user interface for finding programming help with a search interface that integrated programming help directly into the development environment. By running these comparisons, they were able to see how programmers’ behaviour differed as the help interface changed.
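To give a flavour of the analysis side of such a comparison, here’s a minimal sketch of testing whether task-completion times differ between two interface variants. The numbers are invented, and a real study would involve far more careful experimental design; this just shows the basic shape of the comparison.

```python
# A minimal sketch: comparing task-completion times (in seconds) for two
# hypothetical interface variants. All numbers here are made up.
from scipy import stats

variant_a = [38.2, 41.5, 35.9, 44.1, 39.8, 42.3, 37.6, 40.9]
variant_b = [31.4, 29.8, 34.2, 30.6, 33.1, 28.9, 32.7, 31.9]

# Welch's t-test: is the difference in mean completion time likely real?
t_stat, p_value = stats.ttest_ind(variant_a, variant_b, equal_var=False)

print(f"Variant A mean: {sum(variant_a) / len(variant_a):.1f}s")
print(f"Variant B mean: {sum(variant_b) / len(variant_b):.1f}s")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```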
Comparative experiments have an advantage over surveys in that you get to see actual behaviour as opposed to self-report, and they can be better than usability studies because you’re comparing multiple alternatives. This enables you to see what works better or worse, or at least what works differently. I find that comparative feedback is also often much more actionable. However, if you’re running controlled experiments online, you don’t get to see much about the person on the other side of the screen. And if you’re inviting people into your office or lab, the behaviour you’re measuring might not be very realistic.
If realistic, longitudinal behaviour is what you’re after, participant observation may be the approach for you. This approach is just what it sounds like: observing what people actually do in their actual work environment. This more long-term evaluation can be important for uncovering things that you might not see in shorter-term, more controlled scenarios.
For example, Prof. Bob Sutton
and Andrew Hargadon studied brainstorming.
The prior literature on brainstorming had focused mostly on questions like “Do people come up with more ideas?” What Bob and Andrew realized by going into the field was that brainstorming serves a number of other functions as well: it provides a way for members of the design team to demonstrate their creativity to their peers; it allows them to pass along knowledge that can then be reused in other projects; and it creates a fun, exciting environment that people like to work in and that clients like to participate in.
In a real ecosystem, all of these things are important, in addition to the ideas that people come up with.

Nearly all experiments seek to build a theory on some level. I don’t mean anything fancy by this, just that we take some things to be more relevant and other things less relevant.
We might, for example, assume that the ordering of search results plays an important role in what people click on, but that the batting average of the Detroit Tigers doesn’t, unless, of course, somebody’s searching for baseball. If you have a theory that’s sufficiently formal and mathematical that you can make predictions, then you can compare alternative interfaces using that model, without having to bring people in.
We’ll go over that a little bit in this blog, with respect to input models. Simulation makes it possible to try out a number of alternatives really fast; consequently, when people use simulations, it’s often in conjunction with something like Monte Carlo optimization.
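The classic example of an input model is Fitts’ law, which predicts how long it takes to point at a target from the distance to the target and the target’s width. Here’s a minimal sketch; the constants a and b are illustrative stand-ins for values you’d normally obtain by fitting the model to observed pointing data.

```python
import math

def fitts_time(distance, width, a=0.1, b=0.15):
    """Predicted pointing time in seconds, via Fitts' law (Shannon form).
    The constants a and b are illustrative; real values come from fitting
    the model to measured pointing data."""
    return a + b * math.log2(distance / width + 1)

# Compare two hypothetical button placements without running a study.
print(f"Small, far button:  {fitts_time(distance=600, width=20):.2f}s")
print(f"Large, near button: {fitts_time(distance=200, width=60):.2f}s")
```

Because evaluating the model is nearly free, you can score thousands of design alternatives in the time it would take to run a single participant.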
One example of this can be found in the ShapeWriter system, where Shumin Zhai and colleagues figured out how to build a keyboard where people could enter an entire word in a single stroke. They were able to do this with the benefit of formal models and optimization-based approaches.
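Here’s a toy illustration of pairing such a model with Monte Carlo search: randomly sample candidate placements for a single button and keep the one with the lowest predicted reach time. Real keyboard optimization of the kind behind ShapeWriter scores whole layouts against word corpora, but the pattern is similar. The screen size, start positions, and frequencies below are all invented.

```python
import math
import random

def fitts_time(distance, width, a=0.1, b=0.15):
    # Predicted pointing time via Fitts' law; constants are illustrative.
    return a + b * math.log2(distance / width + 1)

# Hypothetical task: place one 40-pixel button on an 800x600 screen to
# minimize expected reach time, given that users start from the menu bar
# 70% of the time and from a text field 30% of the time.
starts = [((400, 20), 0.7), ((200, 500), 0.3)]  # (start position, frequency)

random.seed(0)
best = None
for _ in range(10_000):
    x, y = random.uniform(0, 800), random.uniform(0, 600)
    expected = sum(
        freq * fitts_time(max(math.hypot(x - sx, y - sy), 1.0), width=40)
        for (sx, sy), freq in starts
    )
    if best is None or expected < best[0]:
        best = (expected, x, y)

print(f"Best placement: ({best[1]:.0f}, {best[2]:.0f}), "
      f"predicted expected time {best[0]:.2f}s")
```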
Simulation has mostly been used for input techniques, because people’s motor performance is probably the most well-quantified area of HCI. And while we won’t get into it much in this blog, simulation can also be used for higher-level cognitive tasks. For example, Pete Pirolli and colleagues at PARC have built impressive models of people’s web-searching behaviour. These models enable them to estimate, for example, which link somebody is most likely to click on by looking at the relevance of the link text.
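As a toy stand-in for that idea, here’s a sketch that scores links by how many words they share with the user’s information goal. The real models use much richer estimates of semantic relatedness than simple word overlap, and the goal and link texts below are invented.

```python
def scent(goal: str, link_text: str) -> float:
    """A crude proxy for 'information scent': the fraction of the goal's
    words that also appear in the link text."""
    goal_words = set(goal.lower().split())
    link_words = set(link_text.lower().split())
    return len(goal_words & link_words) / len(goal_words)

goal = "cheap flights to tokyo"
links = ["Budget airfare deals", "Flights to Tokyo and Osaka",
         "Hotel bookings", "Cheap flights worldwide"]

# Rank the links by predicted likelihood of being clicked.
for link in sorted(links, key=lambda l: scent(goal, l), reverse=True):
    print(f"{scent(goal, link):.2f}  {link}")
```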
That’s our whirlwind tour of a number of the empirical methods that this class will introduce. You’ll want to pick the right method for the right task, and here are some issues to consider. One is reliability: if you did it again, would you get the same thing? Another is generalizability and realism: does this hold for people other than 18-year-old upper-middle-class students who are doing this for course credit or a gift certificate? Is this behaviour also what you’d see in the real world, or only in a more stilted lab environment? Comparisons are important because they can tell you how the user experience would change with different interface choices, as opposed to just a “people liked it” study. It’s also important to think about how to gain these insights efficiently, without chewing up a lot of resources, especially when your goal is practical.
My experience as a designer, researcher, teacher, consultant, advisor, and mentor has taught me that evaluating designs with people is both easier and more valuable than many people expect, and there’s an incredible light-bulb moment that happens when you actually get designs in front of people and see how they use them. So, to sum up this whole talk, I’d like to ask what may be the most important question: “What do you want to learn?”