Re: Online Usability Tests

• meszaros_susan
  Message 1 of 29, Mar 16 7:48 PM
I think you can get a lot of "broad" information from an online study, such as whether users can accomplish the tasks at all and, if so, how accurately and how well. You would then know whether the site is working or not, and maybe even which areas of it are problematic. But I wonder how you capture the confusion or frustration of users, which is most apparent from their body language and/or from how they use the site (gleaned from watching rather than from a questionnaire).

Have you tried to work out the trade-off between quality of information and quantity of information? Is it better to have a broad user testing base (like your 1,000) or a narrow one (say 5-7), once you factor in the cost of getting them (e.g. setting up the online test vs. other, more traditional methods)? Presumably it would depend on what was being tested, and probably also on the type of users you require.

susan
ps. You might want to include some kind of self-reporting on level of computer experience and subject-matter knowledge (1-5 or something).

--- In agile-usability@yahoogroups.com, TomTullis@... wrote:
>
> One of the techniques I've been using more and more often lately is
> conducting online usability tests with potentially rather large numbers of
> users. I've found that these can be particularly useful in comparing
> alternative designs and getting feedback about them quickly. For example,
> when the employees of our company are reasonable representatives of the
> target users, we've found that we can send out a quick online study to all
> our employees and get perhaps 1,000 of them to do the study in a couple of
> days. Obviously I'm wondering whether this technique might be a promising
> one for incorporating usability into an agile development project.
>
> I think the best way to understand the type of online usability study I'm
> talking about is to actually participate in one yourself. So I set up a
> demonstration study evaluating websites about the Apollo Space Program.
> You can participate in the study by going to:
>
> http://www.webusabilitystudy.com/Apollo/
>
> It should only take about 15 minutes. I'll post a summary of the results
> to this group later this week. Please hold off on making comments about
> the websites being evaluated until after I post that summary so as not to
> bias others who may not have done the study yet.
>
> Please contact me directly (TomTullis@...) if you have any technical
> problems with the online study.
>
> Thanks!
> Tom Tullis
> Senior VP, User Experience
> Fidelity Investments
• TomTullis@aol.com
  Message 2 of 29, Mar 16 8:27 PM
I've been pretty surprised by the specificity of the usability feedback you can get from an online study.  Although it's not apparent in this particular study, one thing I've often done is select the "distracter" answers for each task in such a way that I could know, with some degree of certainty, what specific error led a participant to that answer.  Of course, a great deal depends upon the tasks.  One thing not apparent from the user's perspective in this particular study is that you saw 4 tasks randomly selected from a complete set of 9.  I've done other studies where the total set of tasks was as high as 20.  But with enough participants, each person can do a reasonable number and you still get plenty of data, across all participants, on all the tasks.
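To make the "distracter" idea concrete, here is a minimal TypeScript sketch, purely illustrative and not taken from the tool described above, of tying each wrong answer to the navigation error that most plausibly produces it:

```typescript
// Hypothetical task definition: each multiple-choice answer is tied to the
// specific error that would most plausibly lead a participant to choose it.
interface AnswerOption {
  text: string;
  correct: boolean;
  likelyError?: string; // diagnosis recorded when this distracter is chosen
}

interface Task {
  prompt: string;
  options: AnswerOption[];
}

const task: Task = {
  prompt: "In what year did Apollo 11 land on the Moon?",
  options: [
    { text: "1969", correct: true },
    { text: "1968", correct: false, likelyError: "Read the Apollo 8 mission page instead" },
    { text: "1972", correct: false, likelyError: "Confused Apollo 11 with Apollo 17" },
  ],
};

// Mapping each chosen answer back to its likely error, then aggregating across
// participants, shows not just that a task failed but which wrong path most people took.
function diagnose(task: Task, chosenText: string): string {
  const chosen = task.options.find(o => o.text === chosenText);
  if (!chosen) return "answer not recognised";
  return chosen.correct ? "correct" : chosen.likelyError ?? "unknown error";
}

console.log(diagnose(task, "1968")); // "Read the Apollo 8 mission page instead"
```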
         
        On the other hand, it is true that even though an online study may be good at identifying WHAT usability problems may exist, it's not as good at helping you understand WHY the users are having those problems.  Lab tests are generally much better at getting at the why.
         
        One thing I have discovered is that participants in an online study usually express their frustration pretty clearly.  Here are some sample comments from this study (I'm not going to identify which site these were for!):
        "The information is horribly organized!"
        "The info architecture seems pathetic"
        "Terrible!"
         
        The tradeoff between the quality and quantity of information you can get from a traditional lab test vs. an online test is definitely an interesting question.  I haven't done a direct comparison between an online test and a lab test of the same site recently, but we did one a few years ago (http://home.comcast.net/~tomtullis/publications/RemoteVsLab.pdf) and got reasonably consistent results from the two tests.  But we did find that there were certain usability issues that only the lab test "caught" and certain ones that only the online test "caught".
         
So I'm certainly not advocating that we replace traditional lab usability testing with online testing; I'm just suggesting that both might be useful tools.  In my User Experience team at Fidelity Investments, we do far more lab tests than we do online tests.  But I was wondering if online usability studies might play a significant role in the agile process.  Being able to get usability data from 20-30 people in one day, or even a few hours, seems to fit in nicely with the goals of agile development.
         
        Yes, we do commonly ask a variety of demographic questions, including ratings of self-reported web experience, subject-matter knowledge, and many other things in our "real" online studies.  They're generally collected on the starting page.  But I didn't include them in this sample study.  Sometimes these data provide very useful ways of slicing and dicing the rest of the data.
         
        By the way, it's not too late to do the sample online study, if other people are interested: http://www.webusabilitystudy.com/Apollo/.  It should only take about 15 minutes, and if you complete it by 11:00 pm (Eastern US time) Monday night, you get entered in a drawing for a $20 Amazon gift certificate!
         
        --Tom
         
        In a message dated 3/16/2008 10:48:44 P.M. Eastern Daylight Time, susan.meszaros@... writes:
--snipped--
• Manish Pillewar
  Message 3 of 29, Mar 16 9:24 PM
Hi Tom,
Just wondering:

1,000 users seems difficult to assemble for any project, Agile or Waterfall. What I wonder is whether the benefits of such a large user base outweigh the money invested (even if you're testing amazon.com).

Agile releases are very short. We have been trying to fit usability testing seamlessly into agile (remote or otherwise). Apart from the usual issue of getting real users rather than 'representative subject matter experts', the timelines are too tight to accommodate very detailed usability testing. Other challenges exist as well, like getting the project management team and the client on your side.

The cycle of sending out the feedback form, getting responses back, and doing a quick analysis of the data seems generally a long one for such a huge user base. It would be a challenge to keep the rest of the project team busy until then, and late feedback does not really help in any way.

I like the way the test is designed. However, I'm still trying to figure out the benefit of this test to you. The real issues may not be well articulated by the users: we may know for certain that Task A is very difficult to do, but the why and the how are missing from that bit of information.

I'm interested in the conclusions of this test for sure.

Thanks for sharing this,

Cheers!
Manish Govind Pillewar
www.thoughtworks.com





• Todd Zaki Warfel
  Message 4 of 29, Mar 17 4:00 AM

            On Mar 16, 2008, at 10:48 PM, meszaros_susan wrote:

            Have you tried to work out the quality of information / quantity of information tradeoff? Is it better to have a broad user testing base (like your 1000) or a narrow base (say 5 - 7), and the cost of getting them (eg setting up the online test v. other more traditional methods)?

Additionally, I wonder about the reliability of the data. Without a moderator present, I'd expect a lot more participants to just crank through the tasks as quickly as possible rather than engaging with them the way they would if they were genuinely trying to get something done at home or in their office (i.e. in a typical environment and setting).

It's like comparing a steak to a hamburger: both beef, but not at all the same experience or flavour.
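One common mitigation for this concern, sketched below in TypeScript as a purely illustrative example (neither poster describes a specific filter), is to screen out "speeders" whose task times are implausibly short before analysing the remaining responses:

```typescript
// Hypothetical response record from an unmoderated online test.
interface Response {
  participantId: string;
  taskId: string;
  timeOnTaskSec: number;
  success: boolean;
}

// Drop participants whose median task time falls below a plausibility floor,
// on the assumption that they were clicking through rather than really trying.
function filterSpeeders(responses: Response[], minMedianSec = 10): Response[] {
  const byParticipant = new Map<string, number[]>();
  for (const r of responses) {
    const times = byParticipant.get(r.participantId) ?? [];
    times.push(r.timeOnTaskSec);
    byParticipant.set(r.participantId, times);
  }
  const median = (xs: number[]) => {
    const s = [...xs].sort((a, b) => a - b);
    const mid = Math.floor(s.length / 2);
    return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
  };
  const keep = new Set(
    [...byParticipant.entries()]
      .filter(([, times]) => median(times) >= minMedianSec)
      .map(([id]) => id)
  );
  return responses.filter(r => keep.has(r.participantId));
}
```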

            Cheers!

            Todd Zaki Warfel
            President, Design Researcher
            Messagefirst | Designing Information. Beautifully.
            ----------------------------------
            Contact Info
            Voice: (215) 825-7423
            Email: todd@...
            ----------------------------------
            In theory, theory and practice are the same.
            In practice, they are not.

• TomTullis@aol.com
  Message 5 of 29, Mar 17 4:59 AM
              Sorry if I misled people with my statement that we often get 1,000 people to do our online studies in a couple of days.  While that is true for studies aimed at a general audience, it's also common to get data from perhaps 20-30 more targeted users in a couple of days.
               
              In terms of the timelines involved, that is in fact what I see as one of the major advantages of this online testing technique.  The sample study of the Apollo Space Program took me about 1 hour to set up.  And that included the time to decide on the tasks.  Of course, I already had the basic online testing tool and so it was just a matter of defining the characteristics of this particular study (the tasks and their possible answers, the websites being evaluated, etc).  In terms of the BASIC analysis of the data (task completion rates, task times, subjective ratings, etc), as you would imagine most of that is automated and takes perhaps half an hour.  That kind of analysis is very quick and takes the same amount of time whether you have data from 20 people or 2,000 people.  That lets you see the obvious things like which tasks users had the most trouble with. Or simply whether they had trouble or not.  In some situations, that may be what you're most interested in.
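For readers wondering what that automated analysis amounts to, here is a rough TypeScript sketch of computing per-task completion rates, mean times, and mean ease ratings; it is a generic illustration, not the tool described above:

```typescript
interface TaskResult {
  taskId: string;
  success: boolean;
  timeSec: number;
  easeRating: number; // e.g. 1 (very hard) to 5 (very easy)
}

interface TaskSummary {
  taskId: string;
  n: number;
  completionRate: number;
  meanTimeSec: number;
  meanEase: number;
}

// Group results by task and summarise; the cost is the same whether the
// input holds 20 participants or 2,000.
function summarise(results: TaskResult[]): TaskSummary[] {
  const groups = new Map<string, TaskResult[]>();
  for (const r of results) {
    const g = groups.get(r.taskId) ?? [];
    g.push(r);
    groups.set(r.taskId, g);
  }
  const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
  return [...groups.entries()].map(([taskId, g]) => ({
    taskId,
    n: g.length,
    completionRate: g.filter(r => r.success).length / g.length,
    meanTimeSec: mean(g.map(r => r.timeSec)),
    meanEase: mean(g.map(r => r.easeRating)),
  }));
}
```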
               
              The more time-consuming analysis is the analysis of the verbatim comments from the users.  And the time for that, of course, depends upon the number of participants.  And it is those verbatim comments that can often give you more insight into WHY people were encountering the problems that they were.  Making sense of verbatim comments from 1,000 people certainly takes longer than it does for 20 people.  But you can often see the trends pretty quickly even with the large sample sizes.
               
              I've been involved in projects where we decided what we wanted to test one morning, set up the online study, sent out the message about the study to a panel of a few hundred users that had been set up previously, got data back from perhaps 20-30 of them by that evening, and reviewed the data with the project team the next morning.
               
              As I mentioned in my other message, I certainly don't see online testing as a replacement for lab testing, but perhaps a useful adjunct.
               
              --Tom 
               
              In a message dated 3/17/2008 12:25:17 A.M. Eastern Daylight Time, manish1022@... writes:
--snipped--

• Vincent Matyi
  Message 6 of 29, Mar 17 5:47 AM
                Hi Tom,

                What online testing tool are you using, if you don't mind sharing?

                Thanks,
                Vincent

                TomTullis@... wrote:
> --snipped--
• Desilets, Alain
  Message 7 of 29, Mar 18 3:15 AM
> I think you can get a lot of "broad" information doing an on-line study,
> such as whether or not users can get the tasks accomplished at all, if
> so then how accurately/well. You might then know if the site is working
> or if it's not, and maybe even about certain areas of it. But I wonder
> about how you can capture the confusion or frustration of users which is
> most apparent from their body language and/or how they use the site
> (gleaned from watching rather than from a questionnaire).

                  Here's an example.

                  When I went to the URL you provided to test this Apollo site, I clicked
                  on a button to start the study.

                  This opened up a new Firefox window for me to do my work in. But for
                  some reason, this window did not have any of the menus, and in
                  particular, I could find no way to search within a page. I fiddled with
                  this for a good 2 minutes until I eventually decided to just copy the
                  URL to a different Firefox window (one that I opened myself).

                  This is presumably something that would not be observable by your
                  system.

>
> Have you tried to work out the quality of information / quantity of
> information tradeoff? Is it better to have a broad user testing base
> (like your 1000) or a narrow base (say 5 - 7), and the cost of getting
> them (eg setting up the online test v. other more traditional methods)?
> Presumably it would be contingent upon what was being tested. And
> probably also upon the type of users you require.

                  That would be a really interesting finding.

                  Alain
• Todd Zaki Warfel
  Message 8 of 29, Mar 18 5:31 AM

                    On Mar 18, 2008, at 6:15 AM, Desilets, Alain wrote:

                    This opened up a new Firefox window for me to do my work in. But for some reason, this window did not have any of the menus, and in particular, I could find no way to search within a page. I fiddled with this for a good 2 minutes until I eventually decided to just copy the URL to a different Firefox window (one that I opened myself).

                    This is presumably something that would not be observable by your system.

                    We just finished a study two weeks ago where one participant tried to enter their name and email address into a registration form. They typed in their first name (field), last name (field), email address (field), and password (field). The form returned an error message that they had an illegal character in their first name field and to try again. 

They tried three times. On the fourth attempt, they said, "I wonder if it's not letting me put a space in." Yup, that was the problem. They couldn't get past the registration form because the validation wouldn't allow spaces in the name fields (ironically, that means I wouldn't be able to register, as my last name has a space in it).

                    Something on-line testing won't capture.
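For illustration only (the actual site's validation code is not shown in the thread), the failure described above typically comes from an overly strict pattern. A hypothetical TypeScript sketch:

```typescript
// Overly strict: letters only, so "Zaki Warfel" is rejected as containing
// an "illegal character" (the space).
const tooStrict = /^[A-Za-z]+$/;

// Slightly more permissive: allows spaces, hyphens, and apostrophes,
// which real names routinely contain.
const morePermissive = /^[A-Za-z][A-Za-z' -]*$/;

console.log(tooStrict.test("Zaki Warfel"));      // false -> "illegal character" error
console.log(morePermissive.test("Zaki Warfel")); // true
```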


                    Cheers!

                    Todd Zaki Warfel
                    President, Design Researcher
                    Messagefirst | Designing Information. Beautifully.
                    ----------------------------------
                    Contact Info
                    Voice: (215) 825-7423
                    Email: todd@...
                    ----------------------------------
                    In theory, theory and practice are the same.
                    In practice, they are not.

• meszaros_susan
  Message 9 of 29, Mar 18 3:20 PM
                      My motivation for asking those questions about on-line testing was due
                      to our own shop being both agile and virtual or remote. Three of us
                      work in 3 different cities, one of us in another country. Sometimes it
                      is really hard to get something tested very quickly using "on-site" or
                      "in presence of" testers (who actually have the background or
                      knowledge that represents the user base). So I'm interested to get a
                      better handle on when it makes sense to invest in some form of remote
                      testing (I prefer remote to on-line since most of what we do even with
                      face to face testing is on-line anyway).

The low-hanging fruit on a site, whether it exists online or on paper, is easy to pluck and can be identified quickly using just a few testers: they all hit the same issues, and those issues quickly come to the surface. In contrast, to figure out the subtleties, you need to understand in more detail what you are testing, focus your testing on and around that, and use the right testers (and probably observe them?).

But I haven't convinced myself yet that remote testing isn't useful or can't be done cost-effectively. In some respects, we already use a form of remote testing when we analyse our web site stats, even though that is at a rather broad level and isn't focused very well (although it probably could be). Has anybody used something like Google Analytics to try to do testing? For example, setting up goals and measuring conversions (isn't this the same as setting a task for users, except implicitly rather than explicitly?).

I think there could be real value in developing dynamic testing methods that run throughout the life of a site or application: just as we use code tests when refactoring code, why can't we use built-in or dynamic user tests when refactoring a design (or at least for monitoring it)?
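As a purely hypothetical sketch of what such a "built-in user test" might look like (the funnel steps, data shape, and threshold below are invented for illustration and are not tied to Google Analytics' actual API), a live site could log each step of a task funnel and an ongoing check could flag conversion regressions after a design change:

```typescript
// Hypothetical funnel log entry emitted by the live site.
interface FunnelEvent {
  sessionId: string;
  step: "landing" | "search" | "results" | "detail" | "goal";
}

// Share of sessions that reach the "goal" step.
function conversionRate(events: FunnelEvent[]): number {
  const sessions = new Set(events.map(e => e.sessionId));
  const converted = new Set(events.filter(e => e.step === "goal").map(e => e.sessionId));
  return sessions.size === 0 ? 0 : converted.size / sessions.size;
}

// A "design test" analogous to a code test: fail loudly if conversion drops
// well below the pre-redesign baseline.
function assertNoRegression(current: FunnelEvent[], baselineRate: number, tolerance = 0.05): void {
  const rate = conversionRate(current);
  if (rate < baselineRate - tolerance) {
    throw new Error(
      `Conversion dropped to ${(rate * 100).toFixed(1)}% (baseline ${(baselineRate * 100).toFixed(1)}%)`
    );
  }
}
```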

                      susan

                      --- In agile-usability@yahoogroups.com, "Desilets, Alain"
                      <alain.desilets@...> wrote:
> --snipped--
• Catriona Campbell
  Message 10 of 29, Mar 18 4:33 PM

                        Tom,

                         

                        I couldn’t agree more with you!

                         

In the UK, our agency is using a remote testing tool similar to yours in an agile environment.

                         

                        It is fast becoming an essential medium with which to get representative and VERY fast responses about designs in each Sprint.

                         

More importantly, used appropriately, it gives management the quantitative and statistically significant numbers they need to inform expensive design and development decisions!

                         

                        Catriona Campbell,

                        Director, Foviance

                         


                        From: agile-usability@yahoogroups.com [mailto:agile-usability@yahoogroups.com] On Behalf Of TomTullis@...
                        Sent: 16 March 2008 13:57
                        To: agile-usability@yahoogroups.com
                        Subject: [agile-usability] Online Usability Tests

                         

--snipped--



• William Pietri
  Message 11 of 29, Mar 18 6:23 PM
                          Todd Zaki Warfel wrote:
                          > They tried three times. On the fourth time, they said "I wonder if
                          > it's not letting me put a space in." Yup, that was the problem. They
                          > couldn't get past the registration form. because the validation
                          > wouldn't allow spaces in the name fields (ironically, that means I
                          > wouldn't be able to register, as my last name has a space in it).
                          >
                          > Something on-line testing won't capture.

I do know of places that capture this on a statistical basis. I have pals at one site where, as with something like Netflix, signups are the primary revenue source, so they care a lot about the signup process. They have two techniques that they use heavily.

One is to record all form submissions, successful or failed, and take a close look at the ones that don't succeed the first time. They have discovered a number of usability issues this way, including issues with field labeling, allowed values, and hard-to-read CAPTCHAs.

                          The other is extensive A/B testing. When they realized the CAPTCHA was a
                          problem, they ran several different versions in parallel to see which
                          one users had the most success with.

                          Given that most everybody uses JavaScript these days, I suspect one
                          could also do some interesting things by capturing the event stream and
                          sending it back via AJAX. And I'd love to try to build something that
                          ties those events into snapshots taken via Flash's webcam support.
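A rough browser-side sketch of that event-stream idea, using only standard DOM APIs; the /usability-events endpoint and the batching interval are invented for illustration:

```typescript
// Capture a coarse stream of user events and periodically POST it back
// to a (hypothetical) collection endpoint for later analysis.
interface UiEvent {
  type: string;
  target: string;
  timestamp: number;
}

const buffer: UiEvent[] = [];

// Describe the event target as "tag#id" where possible.
function describe(el: EventTarget | null): string {
  const node = el as HTMLElement | null;
  if (!node || !node.tagName) return "unknown";
  return node.tagName.toLowerCase() + (node.id ? `#${node.id}` : "");
}

for (const type of ["click", "submit", "change"]) {
  document.addEventListener(type, (e) => {
    buffer.push({ type, target: describe(e.target), timestamp: Date.now() });
  }, true); // capture phase, so events are seen even if handlers stop propagation
}

// Flush the buffer every few seconds via a plain XMLHttpRequest ("AJAX").
setInterval(() => {
  if (buffer.length === 0) return;
  const xhr = new XMLHttpRequest();
  xhr.open("POST", "/usability-events", true);
  xhr.setRequestHeader("Content-Type", "application/json");
  xhr.send(JSON.stringify(buffer.splice(0, buffer.length)));
}, 5000);
```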

                          But yeah, I agree with the basic point, which is that there are some
                          things much easier to get in person.

                          William
• TomTullis@aol.com
  Message 12 of 29, Mar 18 8:04 PM
                            For a previous online usability study, I had made a conscious decision to suppress the browser's menu, and thus the "Find on this page" function (unless you happen to know the keyboard shortcut: Ctrl-F in both IE and Firefox). That was because that previous study primarily involved doing a visual search to find information on one page (we were comparing different visual treatments for the page), and we didn't want people using the "Find on this page" function.  Frankly, I carried that suppression of the browser's menu over to this sample study without even thinking about it.  (That's what I get for setting it up quickly!) In fact, I would probably want to use the "Find on this page" function for some of these tasks myself, especially on one of the sites.
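For context, the usual way a web-based test harness suppresses the browser menu (a generic sketch, not necessarily how the tool above does it) is via the features string passed to window.open, in browsers that honour it:

```typescript
// Open the site under test (hypothetical URL) in a stripped-down window.
// "menubar=no" removes the browser menu, which also hides the
// "Find on this page" menu item, though Ctrl-F still works for users
// who know the keyboard shortcut.
const testWindow = window.open(
  "http://example.com/site-under-test/",
  "usabilityTestWindow",
  "menubar=no,toolbar=no,location=yes,resizable=yes,scrollbars=yes,width=1024,height=768"
);
```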
                             
                            But in response to the point about this behavior of wanting to use "Find on this page" not being "observable" in an online study, it turns out that the verbatim comments entered by some of the participants did include a comment about not being able to use "Find on page" when they wanted to, or about discovering a work-around (like yours) for not having that menu item displayed.  So the behavior (or the desire to take that approach) was at least detectable from the online study's data. 
                             
                            On the other hand, there obviously are lots of behaviors that you can observe and capture in a lab setting that you can't capture in an online study.  But I think the question is whether you can quickly get useful information from an online study, perhaps to be followed up in a later iteration with a lab study.  I believe you can.
                             
                            By the way, about 112 people have completed the "Apollo Space Program Websites" sample study.  Since I'm doing this in my "spare time" in the evenings, I haven't had a chance to look at the data yet, except to see that we did get some significant differences in several of the usability metrics for the two sites.  I'll post a more detailed summary of the results in the next couple of days.
                             
                            Tom Tullis
                             
                            In a message dated 3/18/2008 6:16:03 A.M. Eastern Daylight Time, alain.desilets@... writes:
> I think you can get a lot of "broad" information doing an on-line study,
> such as whether or not users can get the tasks accomplished at all, if
> so then how accurately/well. You might then know if the site is working
> or if it's not, and maybe even about certain areas of it. But I wonder
> about how you can capture the confusion or frustration of users which is
> most apparent from their body language and/or how they use the site
> (gleaned from watching rather than from a questionnaire).

                            Here's an example.

                            When I went to the URL you provided to test this Apollo site, I clicked
                            on a button to start the study.

                            This opened up a new Firefox window for me to do my work in. But for
                            some reason, this window did not have any of the menus, and in
                            particular, I could find no way to search within a page. I fiddled with
                            this for a good 2 minutes until I eventually decided to just copy the
                            URL to a different Firefox window (one that I opened myself).

                            This is presumably something that would not be observable by your
                            system.

>
> Have you tried to work out the quality of information / quantity of
> information tradeoff? Is it better to have a broad user testing base
> (like your 1000) or a narrow base (say 5 - 7), and the cost of getting
> them (eg setting up the online test v. other more traditional methods)?
> Presumably it would be contingent upon what was being tested. And
> probably also upon the type of users you require.

                            That would be a really interesting finding.

                            Alain

                             




• Manish Pillewar
  Message 13 of 29, Mar 18 10:05 PM
Lovely that you mention it!
Google Analytics definitely seems like a step further into remote usability testing. Combine it with Website Optimizer and you have some instant user feedback for each change you make to your web app. Google Analytics gives you a wide variety of usage reports, which can help you understand your website visitors. Pasting some excerpts from http://www.googleanalyticsresults.com/2007_11_01_archive.html:

"The Usage report allows you to see the percentage of website visitors using your internal search function and those who are not. If you know your search function is in need of an upgrade, and a high percentage of visitors engage with the internal search function, then you should consider upgrading your internal search function to better meet their needs.

If your website is already running an enterprise-level search solution, like the Google Search Appliance, then this report will allow you to begin measuring the ROI (Return On Investment) of your search solution.

The Goal Conversion tab within the Usage report allows you to dive further into how your website's internal search is performing, letting you see whether visitors using your internal search are more likely to convert or not.

The following questions will make you think critically about your internal search:

* Can visitors find the information they require in order to convert without using the internal search?
* Where is the search box located on your website?
* Is the search box consistently located across your entire site?
* Are visitors getting the search results they expect?
* Does your internal search cope with spelling mistakes?
* Do visitors using search spend more time or view more pages on your website?

Once you have a good understanding of how your internal search is set up and used, you will be able to improve its performance. The Site Search Usage report will then allow you to gauge the effectiveness of your improvements."

Well, we are experimenting with this a bit, but we can definitely see it bringing in more figures to nail down usability as a key driver for the design. The figures are up front for all to see :-)

Internally, you can use Google's Website Optimizer to see whether small differences in the content of the web app make a difference, say something as simple as shifting the search from the top right corner into the left navigation. The design iterations can be shorter, until you see your goals accomplished. Check out this online video:

http://www.youtube.com/watch?v=AU87ozKYY4M

-Manish Pillewar
-Gilberto Medrano
www.thoughtworks.com






                              From: "meszaros_susan" <susan.meszaros@...>
                              Date: Tue, 18 Mar 2008 22:20:36 -0000
                              Subject: [agile-usability] Re: Online Usability Tests
                              --Snipped---

But I haven't convinced myself yet that remote testing isn't useful or can't be done cost-effectively. In some respect, we use remote testing when we analyse our web site stats, even though it is at a rather broad level and isn't focused very well, although it probably could be. Has anybody used something like Google Analytics to try to do testing? Setting up goals and measuring conversions (isn't this the same as setting a task for users except not explicitly but implicitly?) for example.

                              --snipped--

                              susan


                              Thanks and Regards
                              Manish Govind Pillewar
                              Sr. User Experience Designer
Thoughtworks India Pvt. Ltd., Bangalore, India

                              Tel. +91 9880566951 (M)
                              +91 80 41113967 (Eve.)
                              Smith & Wesson: The original point and click interface :-)



• Todd Zaki Warfel
  Message 14 of 29, Mar 19 4:03 AM

                                On Mar 18, 2008, at 9:23 PM, William Pietri wrote:
                                Given that most everybody uses JavaScript these days, I suspect one could also do some interesting things by capturing the event stream and sending it back via AJAX. And I'd love to try to build something that ties those events into snapshots taken via Flash's webcam support.

                                Seems like a lot more work than just watching a few people using it.


                                Cheers!

                                Todd Zaki Warfel
                                President, Design Researcher
                                Messagefirst | Designing Information. Beautifully.
                                ----------------------------------
                                Contact Info
                                Voice: (215) 825-7423
                                Email: todd@...
                                ----------------------------------
                                In theory, theory and practice are the same.
                                In practice, they are not.

• Todd Zaki Warfel
  Message 15 of 29, Mar 19 4:06 AM

                                  On Mar 18, 2008, at 11:04 PM, TomTullis@... wrote:
                                  By the way, about 112 people have completed the "Apollo Space Program Websites" sample study.

                                  How many people didn't complete it or bailed? Any idea why?

                                  I think the main thing to keep in mind is that online usability testing is just another tool—good for some things, not so good for others.


                                  Cheers!

                                  Todd Zaki Warfel
                                  President, Design Researcher
                                  Messagefirst | Designing Information. Beautifully.
                                  ----------------------------------
                                  Contact Info
                                  Voice: (215) 825-7423
                                  Email: todd@...
                                  ----------------------------------
                                  In theory, theory and practice are the same.
                                  In practice, they are not.

• William Pietri
  Message 16 of 29, Mar 19 10:53 AM
                                    Todd Zaki Warfel wrote:
                                    On Mar 18, 2008, at 9:23 PM, William Pietri wrote:
                                    Given that most everybody uses JavaScript these days, I suspect one could also do some interesting things by capturing the event stream and sending it back via AJAX. And I'd love to try to build something that ties those events into snapshots taken via Flash's webcam support.

                                    Seems like a lot more work than just watching a few people using it.

                                    Undeniable. But better data is sometimes worth more work.

                                    William
                                  • Todd Zaki Warfel
                                    Message 17 of 29 , Mar 19 11:58 AM
                                      That's my point. Watching a few people would provide richer data that will tell you the why, not just the what.

                                      On Mar 19, 2008, at 1:53 PM, William Pietri wrote:
                                      Undeniable. But better data is sometimes worth more work. 


                                      Cheers!

                                      Todd Zaki Warfel
                                      President, Design Researcher
                                      Messagefirst | Designing Information. Beautifully.
                                      ----------------------------------
                                      Contact Info
                                      Voice: (215) 825-7423
                                      Email: todd@...
                                      ----------------------------------
                                      In theory, theory and practice are the same.
                                      In practice, they are not.

                                    • patdaman999
                                      Message 18 of 29 , Mar 19 12:36 PM
                                        --- In agile-usability@yahoogroups.com, William Pietri <william@...>
                                        wrote:
                                        >
                                        > Todd Zaki Warfel wrote:
                                        > > On Mar 18, 2008, at 9:23 PM, William Pietri wrote:
                                        > >> Given that most everybody uses JavaScript these days, I suspect
                                        > >> one could also do some interesting things by capturing the event
                                        > >> stream and sending it back via AJAX. And I'd love to try to build
                                        > >> something that ties those events into snapshots taken via Flash's
                                        > >> webcam support.
                                        > >
                                        > > Seems like a lot more work than just watching a few people using it.
                                        >
                                        > Undeniable. But better data is sometimes worth more work.
                                        >
                                        > William
                                        >

This topic is very similar to my master's research topic. Last May,
many members of this community assisted me by filling out a web survey
aimed at collecting requirements for a low-fi prototyping tool. We
(Frank Maurer and I) have created a tool called ActiveStory which
allows designers to create low-fi wireframe prototypes via a tablet
and deploy them to the internet. Usability test participants can then
use the prototype, and the system automatically collects data such as
the time spent on each page and the mouse trail of each user; it also
lets users post comments on any page in the design. If anyone is
interested in evaluating the tool, I would greatly appreciate it.
                                        ActiveStory can be downloaded from
                                        http://pages.cpsc.ucalgary.ca/~piwilson/research

                                        Thanks

                                        Patrick Wilson
                                        MSc Student
                                        University of Calgary
                                      • Patrick Wilson
                                        Message 19 of 29 , Mar 19 1:45 PM
--- In agile-usability@yahoogroups.com, "patdaman999" <piwilson@...>
wrote:
> [...]
> ActiveStory can be downloaded from
> http://pages.cpsc.ucalgary.ca/~piwilson/research


                                          Sorry, I have been informed the link is wrong...

                                          here is the right one.
                                          http://pages.cpsc.ucalgary.ca/~piwilson/research.php

                                          sorry about that.
                                          Patrick
                                        • William Pietri
                                          Message 20 of 29 , Mar 19 4:30 PM
                                            Todd Zaki Warfel wrote:
                                            On Mar 19, 2008, at 1:53 PM, William Pietri wrote:
                                            Undeniable. But better data is sometimes worth more work. 


                                            That's my point. Watching a few people would provide richer data that will tell you the why, not just the what.

                                            Hmmm. I guess I should have been more complete in my last message.

I'm totally in favor of watching a few people. I love it, and do it whenever I can. Indeed, I've encouraged it often enough that some of my clients are getting tired of hearing it. But there are things you can't learn from that. I tend to see the statistical and personal methods as complementary, each raising questions that the other can answer, and each helping to mitigate the flaws of the other.

                                            Watching a few people is prone to sample bias, observer bias, confirmation bias, and whatever the social-science equivalent of founder effect is. The data-driven approaches, on the other hand, are prone to promoting too-simple models, often fail to engage our deep understanding of personal behavior and motivation, and, as you say, are better at pointing out the existence of a problem than possible solutions.

                                            Neither approach is perfect, but I can't think of a project where I haven't found them both useful.

                                            However, when I mentioned using event tracking and web cams, I was thinking of it partly for remote versions of "watching a few people". I mainly work with internet startups, which have limited resources but global reach. In practice, the few people that they can afford to watch are likely to be picked for their availability. It'd be nice to reduce some of the barriers around studying distant users, and I think new tech could help with that.

                                            William



                                          • Todd Zaki Warfel
                                            Message 21 of 29 , Mar 19 5:42 PM

                                              On Mar 19, 2008, at 7:30 PM, William Pietri wrote:
                                              Neither approach is perfect, but I can't think of a project where I haven't found them both useful.

Another reason why we typically recommend complementary methods. Personally, if I had my choice of observing 12 participants in person or 100 people run through an automated remote process, I'd take the 12 in person every single time. In the 15+ years I've been doing this type of research, I've yet to find a pattern identified by survey and remote methods w/100 people that we weren't able to identify with 12 in person. 

                                              The main benefit we get from more quantitative methods is to satisfy the marketing research people who only believe in quantitative methods.

                                              As always, YMMV (your mileage may vary)


                                              Cheers!

                                              Todd Zaki Warfel
                                              President, Design Researcher
                                              Messagefirst | Designing Information. Beautifully.
                                              ----------------------------------
                                              Contact Info
                                              Voice: (215) 825-7423
                                              Email: todd@...
                                              ----------------------------------
                                              In theory, theory and practice are the same.
                                              In practice, they are not.

                                            • William Pietri
                                              Message 22 of 29 , Mar 20 10:17 AM
                                                Todd Zaki Warfel wrote:
                                                > Personally, if I had my choice of observing 12 participants in person
                                                > or 100 people run through an automated remote process, I'd take the 12
                                                > in person every single time. In the 15+ years I've been doing this
                                                > type of research, I've yet to find a pattern identified by survey and
                                                > remote methods w/100 people that we weren't able to identify with 12
                                                > in person.
                                                >
                                                > The main benefit we get from more quantitative methods is to satisfy
                                                > the marketing research people who only believe in quantitative methods.

                                                I think we're talking about different kinds of quantitative methods.

                                                The choice I typically see isn't 12 local vs 100 remote. It's 4 versus
                                                1,000. Or 10,000. Or 100,000. And actual users doing actual tasks versus
                                                recruited users doing requested tasks.

                                                The simplest version of this is just log analysis combined with some
                                                basic instrumentation. For example, at one client I looked into failed
                                                orders. Looking at the data, circa 10% of orders were failing at the
                                                credit card processing stage. Some of them were legitimate failures, but
                                                I suspected that not all of them were. So we logged every bit of
                                                information in every order attempt.
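(If anyone wants to try something similar, here's a bare-bones client-side sketch in JavaScript. Our actual logging was server-side, and the /log URL and form id below are invented, so treat it as an illustration rather than our implementation:)

  var form = document.getElementById('order-form');   // invented id
  form.addEventListener('submit', function () {
    var fields = form.elements, pairs = [];
    for (var i = 0; i < fields.length; i++) {
      var f = fields[i];
      // Never log secrets; in real life you'd also mask card numbers.
      if (!f.name || f.type === 'password') { continue; }
      pairs.push(encodeURIComponent(f.name) + '=' + encodeURIComponent(f.value));
    }
    var xhr = new XMLHttpRequest();
    // Synchronous, so the log call finishes before the page navigates away.
    xhr.open('POST', '/log/order-attempt', false);
    xhr.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded');
    xhr.send(pairs.join('&'));
  }, false);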

                                                It turned out there were a number of minor user interface issues, none
                                                affecting more than 2% of order attempts, which is well below the power
                                                of a 12-user study to resolve. And several related to different
                                                international styles of entering addresses, which we couldn't have
                                                solved with a local user study anyhow. The cost-benefit ratio was also
                                                much better; from inception to recommendations, it was under two
                                                person-days of work.

I'm also fond of tracking metrics. In-person user testing is good for
things you know to ask about, but you need live site metrics to catch
things that you didn't even know had changed. One client has a very
data-driven site, and manually testing all the key pages with every data
update is impossible. They track dozens of metrics, and significant
deviations in key numbers get people paged. Good metrics also let you
catch surprises with what you thought were minor changes to the site.
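To give a feel for what "significant deviation" can mean, here's a toy
version in JavaScript (the three-sigma threshold and the numbers are
made up; real monitoring is fancier):

  // Flag today's number if it's more than three standard deviations
  // from the trailing mean of recent days.
  function isDeviant(history, today) {
    var mean = 0, variance = 0, i;
    for (i = 0; i < history.length; i++) { mean += history[i]; }
    mean /= history.length;
    for (i = 0; i < history.length; i++) {
      variance += (history[i] - mean) * (history[i] - mean);
    }
    var stddev = Math.sqrt(variance / history.length);
    return Math.abs(today - mean) > 3 * stddev;
  }

  // e.g. isDeviant([812, 790, 805, 831, 798, 820, 809], 610) -> true: page somebody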

                                                And once you have some metrics, A/B testing can really pay off. Suppose
                                                you want to know which of three new landing page versions increases
                                                signups by 5%. You can't do that with a 12-person user test. But you can
                                                put each version up in parallel for a week with users randomly assigned
                                                to each, and get thousands or tens of thousands of data points.
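The assignment part, at least, is cheap. A bare-bones sketch in
client-side JavaScript (the variant names and cookie name are invented):

  function pickVariant() {
    var m = document.cookie.match(/landing_variant=(\w+)/);
    if (m) { return m[1]; }                  // returning visitor keeps the same version
    var variants = ['a', 'b', 'c'];          // the three landing page versions
    var chosen = variants[Math.floor(Math.random() * variants.length)];
    document.cookie = 'landing_variant=' + chosen + '; path=/';
    return chosen;
  }

  var variant = pickVariant();
  document.body.className += ' landing-' + variant;  // CSS/markup swaps in the right version
  // ...and record the variant with each signup so conversion rates can be compared later.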

                                                For any of these approaches, the data can lead to additional questions.
                                                Some of those are best answered with more data, but many can be more
                                                effectively approached with traditional user testing.

                                                William
                                              • Todd Zaki Warfel
                                                Message 23 of 29 , Mar 20 1:16 PM

                                                  On Mar 20, 2008, at 1:17 PM, William Pietri wrote:
                                                  I think we're talking about different kinds of quantitative methods.

                                                  The choice I typically see isn't 12 local vs 100 remote. It's 4 versus 1,000. Or 10,000. Or 100,000. And actual users doing actual tasks versus recruited users doing requested tasks.

Well, 4 isn't enough. I wouldn't recommend any fewer than 5, and typically 8-12 is best; otherwise you don't have enough to start seeing significant patterns. That's going to be one reason that 1,000 is going to yield better results with web tracking—you don't have enough participants in your qualitative study.

                                                  [...] It turned out there were a number of minor user interface issues, none affecting more than 2% of order attempts, which is well below the power of a 12-user study to resolve. And several related to different international styles of entering addresses, which we couldn't have solved with a local user study anyhow. The cost-benefit ratio was also much better; from inception to recommendations, it was under two person-days of work.

                                                  A couple of things here: 
1. I would suspect that the "minor user interface issues" could have been easily corrected simply by having a good, informed interaction designer or usability specialist assess the interface. 

2. Did you do a 12 user study on this interface? I'll bet that if you did, you would have found the same issues—I've done this literally hundreds of times. If you didn't, how would you know that it's beyond what you can find from a 12 person study? We use web metrics to help identify key abandonment areas, then in-person field studies to find out the why. For example, we had a client who had a significant abandonment in one of their cart screens, but didn't know exactly which fields. They could have spent time coding up the fields w/JS to track every single one to figure it out. Instead we did a quick study w/12 people and found out that it was a combination of two fields that was causing the problem on that screen and exactly why they were an issue. Problem fixed. 

Just a different approach. And yes, we used a mix of qual and quant—something we do quite often.


                                                  I'm also fond of tracking metrics. In-person user testing is good for things you know to ask about, but you need live site metrics to catch things that you didn't even know have changed. 

Not sure I agree with that. It might be the way you (or the person doing the testing) are conducting the tests. Our testing format is an open format with a discovery method. We have some predefined tasks based on goals that we know/hope people are trying to accomplish with the site. This info comes from combined sources (e.g. metrics, sales, marketing, customer service, customer feedback). However, that's not all of it—we always include open-ended discovery time to watch for things we don't expect, anticipate, or couldn't plan for—unexpected tasks. We've done this in pretty much every test in the last couple of years and every time we find a number of new features, functions, and potential lines of revenue for our client. 

                                                  And once you have some metrics, A/B testing can really pay off. Suppose you want to know which of three new landing page versions increases signups by 5%. You can't do that with a 12-person user test. But you can put each version up in parallel for a week with users randomly assigned to each, and get thousands or tens of thousands of data points.

True. Our method selection is goal-driven. What's your goal? That drives your method. Just to provide the counterpoint, the downside of A/B testing the way you're suggesting is that while it will tell you that one model increased signups by 5%, it won't tell you why. A quick 12-person study will tell you why and give you guidance on which one would probably increase sign-ups. You then take that info and challenge/validate it with a quantitative study like you suggest. Or the reverse: take your A/B results and do a supplemental 12-person study to find out why. 

                                                  Answering the why will give you far more from a design insight perspective than just seeing what happened.


                                                  Cheers!

                                                  Todd Zaki Warfel
                                                  President, Design Researcher
                                                  Messagefirst | Designing Information. Beautifully.
                                                  ----------------------------------
                                                  Contact Info
                                                  Voice: (215) 825-7423
                                                  Email: todd@...
                                                  ----------------------------------
                                                  In theory, theory and practice are the same.
                                                  In practice, they are not.

                                                • William Pietri
                                                  Message 24 of 29 , Mar 25 1:10 AM
                                                    Hi, Todd. I think we agree entirely that people should do both
                                                    quantitative and qualitative research, and both have their strengths. A
                                                    few responses to minor points.

                                                    Todd Zaki Warfel wrote:
                                                    > Well, 4 isn't enough. [...]
                                                    > 1. I would suspect that the "minor user interface issues" would have
                                                    > been easily corrected with simply having a good informed interaction
                                                    > designer or usability specialist assess the interface.
                                                    >
                                                    > 2. Did you do a 12 user study on this interface?

                                                    I work mainly with startups and small companies. Perhaps you can do a
                                                    12-user study more effectively than they can, but it's well beyond
                                                    something a lot of small shops can afford to do on a regular basis.
                                                    Doing 4-6 people once a month is more their speed.

                                                    It may be that their designers don't meet the level you consider good.
                                                    However, they are generally the best the company has been able to find.
                                                    Whatever practices I recommend have to work in that context, which
                                                    frequently includes people wearing multiple hats.

                                                    > I'll bet that if you did, you would have found the same issues—I've
                                                    > done this literally hundreds of times.

                                                    I'm sure you win that bet a lot, but in this case you would have lost.

                                                    One substantial cause of failure was international addresses. The cost
                                                    of a multi-continent usability study surely makes sense for some people,
                                                    but not for the sums involved in this case. At the cost of a couple of
days' time around the office, though, it paid off nicely.

                                                    > If you didn't, how would you know that it's beyond what you can find
                                                    > from a 12 person study?

                                                    Well, I said that because of a little math. Perhaps I'm doing it wrong,
                                                    but if only 1-2% of people have some issue, the odds of finding that
                                                    particular issue in a 12-person test don't seem particularly high. And
                                                    if only 1 of 12 has a problem, it would be hard to say whether it's a
                                                    pattern or a fluke. Whereas with 10,000 data points, you'll be able to
                                                    do solid ROI calculations so that you know which fixes are worth the effort.
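To spell that out: an issue that affects 1% of users has only a 1 - 0.99^12, or roughly 11%, chance of being seen by even a single participant in a 12-person test (about 21% for a 2% issue), and one sighting is hard to tell from a fluke.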

                                                    > We use web metrics to help identify key abandonment areas, then
                                                    > in-person field studies to find out the why. For example [...]

                                                    I find it weird to keep saying this, but I really like in-person
studies. I think they are the bee's knees. Honest. I do them every chance
                                                    I get.

                                                    The only reason I got involved in this thread was to mention how on-line
                                                    testing indeed can capture some things you said it couldn't, and to
                                                    mention how some people I know are doing it. Nobody should feel obliged
                                                    to do it that way, and they certainly shouldn't stop doing in-person
                                                    studies.

                                                    William
                                                  • Todd Zaki Warfel
                                                    Message 25 of 29 , Mar 25 5:19 AM

                                                      On Mar 25, 2008, at 4:10 AM, William Pietri wrote:
                                                      One substantial cause of failure was international addresses. The cost of a multi-continent usability study surely makes sense for some people[...]

This comes down to a recruiting issue. You don't need to do a multi-continent study to find this—you can use Craigslist to recruit international participants. During the study, just have them use their home address, or an address from the country they lived in before coming here, to surface this. 

                                                      Now, if this is something you don't think of to test in the study, then that's another story. 

                                                      Quick question for you, how did you find this w/the on-line study? What did you do to measure/find something like this using an on-line study? The reason I ask is that it would be nice for others to know the technique so they could use it to look for this when they are testing (reason I included the Craigslist item above).

                                                      Well, I said that because of a little math. Perhaps I'm doing it wrong, but if only 1-2% of people have some issue, the odds of finding that particular issue in a 12-person test don't seem particularly high. And if only 1 of 12 has a problem, it would be hard to say whether it's a pattern or a fluke. Whereas with 10,000 data points, you'll be able to do solid ROI calculations so that you know which fixes are worth the effort.

Aw, see, that's the flaw in the equation. First, what you're looking for is a pattern. Second, what you're looking for is something that might be a smaller issue by sheer numbers, but you know it's significant. For example, we recently had an issue I cited earlier with someone getting hung up at registration due to a space in their name. That was one person, but it's something you know is a show stopper. In short, it depends on what the issue is. Some of "knowing this" only comes with time and experience. Some of it is a no-brainer. Some of these items are easier to see with 10,000 data points. Most of them are going to be something you'll see as a pattern with 5-6 people and confirm with 8-12. 

The point is that if you start to see it in a few people in a 12 person study, you're going to see it in hundreds or thousands with 10,000. I think the issue is that people get hung up on sheer numbers instead of percentages. We rarely have issues that aren't found in 70% or more of participants in an 8-12 person study. Either way, we make sure that when we report we include:
1. The percentage of participants who encountered the relevant part of the design (e.g. 8/10 reached that screen)
2. The percentage of those participants who had trouble with it (e.g. 4/8)
3. If it's a small number, then the actual count of people and why we think it might require further investigation.

For example, in a 10 person study:
An issue hit by 70% of participants when 100% encountered it would be 7/10
An issue hit by 50% when 80% encountered it would be 4/8

It's particularly important to be accurate about the reporting. Simply stating "4 of our participants" isn't accurate—you need to indicate that 8 came across the issue and of those 8, 4 (or 50%) had trouble with it.

                                                      I find it weird to keep saying this, but I really like in-person studies. I think they are the bees knees. Honest. I do them every chance I get.

                                                      Obviously, I agree, but I also think that remote studies are extremely beneficial. One of the biggest benefits is that they're using their own machine and you get to view how they access things (e.g. bookmarks, pop-up blockers). Additionally, it enables you to do research w/geographically dispersed audiences. 

                                                      For example, we did an ethnographic-based study last year for a client who had employees across the world. We did 48 interviews. We couldn't afford to fly there (time/budget), so we used remote screen sharing and phones to do the research. We had some very interesting findings and remote studies were the only way we could have done this.


                                                      Cheers!

                                                      Todd Zaki Warfel
                                                      President, Design Researcher
                                                      Messagefirst | Designing Information. Beautifully.
                                                      ----------------------------------
                                                      Contact Info
                                                      Voice: (215) 825-7423
                                                      Email: todd@...
                                                      ----------------------------------
                                                      In theory, theory and practice are the same.
                                                      In practice, they are not.

                                                    • William Pietri
                                                      Message 26 of 29 , Mar 25 1:56 PM
                                                        Todd Zaki Warfel wrote:
                                                        >
                                                        > Quick question for you, how did you find this w/the on-line study?
                                                        > What did you do to measure/find something like this using an on-line
                                                        > study? The reason I ask is that it would be nice for others to know
                                                        > the technique so they could use it to look for this when they are
                                                        > testing (reason I included the Craigslist item above).

                                                        We took a running site and instrumented things so that we could see the
                                                        raw submissions from every sign-up attempt, successful or failed. Then
                                                        we let it run for a while and sifted through the failures looking for
                                                        patterns. Some of the issues discovered involved the three-way
                                                        interaction of the user, our code, and external credit-card processors,
                                                        who have yet different ideas of what constitutes a valid address.
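The sifting itself can be pretty simple. A toy version in JavaScript,
assuming each logged attempt looks something like
{ success: false, reason: 'AVS_MISMATCH', country: 'DE' } (the field
names are invented):

  function tallyFailures(attempts) {
    var counts = {};
    for (var i = 0; i < attempts.length; i++) {
      var a = attempts[i];
      if (a.success) { continue; }
      var key = a.reason + ' / ' + a.country;   // group by failure reason and country
      counts[key] = (counts[key] || 0) + 1;
    }
    return counts;   // e.g. { 'AVS_MISMATCH / DE': 37, 'DECLINED / US': 214 }
  }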

                                                        That definitely doesn't catch everything, as the user has to get as far
                                                        as clicking the submit button, which is why next time I try this I'd
                                                        like to do a little AJAX instrumentation, uploading a record of
                                                        keypresses, pauses, mouse movements, and the like. That still won't
                                                        catch everything, of course. Like Col. Prescott, I like to see the
                                                        whites of their eyes. And as you say, it can still leave the "why" a puzzle.


                                                        > The point is that if you start to see it in a few people in a 12
                                                        > person study, you're going to see it in hundreds or thousands with 10,000.

I'm certainly not denying that there are a lot of great issues that will
show up in a 12-person study, and that those issues will show up in larger
samples. I'm concerned about the opposite case. If I see it in 100 of
10,000 (1%), then I may not see it in 1 of 12 people (8%), and I'm even
less likely to see it in 1 of 5 (20%).

                                                        I suspect there's also a question of relative expertise. I think you
                                                        mentioned you have done hundreds of these studies, and clearly you have
                                                        spent a lot of time thinking about the usability of interfaces. The kind
                                                        and volume of issues that you can surface in a 5-person study are
                                                        probably much superior to what a self-taught designer can do in between
                                                        cranking out HTML and tweaking the JavaScript. That may explain how you
                                                        extract so much value from them.


                                                        William
                                                      • Todd Zaki Warfel
                                                        Message 27 of 29 , Mar 25 2:03 PM
                                                          Sorry, should have been more specific. What I'm interested in is exactly how you "instrumented things," or what you did to capture everything so you were able to tell exactly what individual fields/items were the culprit. I'd love to know more about this technique as an option to use in the future. 
                                                          On Mar 25, 2008, at 4:56 PM, William Pietri wrote:
                                                          We took a running site and instrumented things so that we could see the 
                                                          raw submissions from every sign-up attempt, successful or failed.


                                                          Cheers!

                                                          Todd Zaki Warfel
                                                          President, Design Researcher
                                                          Messagefirst | Designing Information. Beautifully.
                                                          ----------------------------------
                                                          Contact Info
                                                          Voice: (215) 825-7423
                                                          Email: todd@...
                                                          ----------------------------------
                                                          In theory, theory and practice are the same.
                                                          In practice, they are not.

                                                        • William Pietri
                                                          Message 28 of 29 , Mar 25 3:58 PM
                                                            Todd Zaki Warfel wrote:
                                                            > Sorry, should have been more specific. What I'm interested in is
                                                            > exactly how you "instrumented things," or what you did to capture
                                                            > everything so you were able to tell exactly what individual
                                                            > fields/items were the culprit. I'd love to know more about this
                                                            > technique as an option to use in the future.

                                                            Ah, I see. Sorry for the confusion.

                                                            On one occasion, where credit card usage was the primary focus, the
                                                            system had already been designed to record every request sent to the
                                                            credit card processor, plus processor responses, so that was a rich seam
                                                            of data to mine. After various UI changes, we then monitored to make
                                                            sure that we indeed solved the problems.

On another, we found the particular place in the code where a form was
submitted, and logged as much information as possible, including IP,
browser info, and the details of the form submission. Then a little Perl
cleaned the output enough to feed it to a business analyst, who worked
in collaboration with the designer to figure out what particular
failures meant and how to fix them.

On a third project, we started out wondering about these things early, and so
                                                            had one bit of code near the heart of things that logged every bit of
                                                            user input. I don't remember any formal studies that used it, but we'd
                                                            often use it to answer some particular question, or to see what users
                                                            were up to. That was especially handy when discussing whether or not
                                                            users would really do something.

                                                            The imagined ajaxification of this would be a little more complicated,
                                                            collecting client-side events (like keypresses and mouse movements) and
                                                            state information (transaction ids, current state of forms) and
                                                            uploading them via background asynchronous requests. It'd be exciting to
                                                            dig through that data, but one of the first things I'd look at is failed
                                                            client-side validation. Another would be the amount of time and rework
                                                            for individual fields.
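In case the skeleton is useful to anyone, it might look something like
this (the /events URL and the five-second interval are invented; it
assumes a JSON serializer such as json2.js or a browser with native JSON):

  var buffer = [];

  function record(type, detail) {
    buffer.push({ t: new Date().getTime(), type: type, detail: detail });
  }

  // Record which fields get typed in and what gets clicked, not the raw keystrokes.
  document.addEventListener('keyup', function (e) {
    if (e.target && e.target.name) { record('keyup', e.target.name); }
  }, true);
  document.addEventListener('click', function (e) {
    record('click', e.target && (e.target.id || e.target.tagName));
  }, true);

  // Ship the buffered events to the server every few seconds.
  setInterval(function () {
    if (buffer.length === 0) { return; }
    var xhr = new XMLHttpRequest();
    xhr.open('POST', '/events', true);
    xhr.setRequestHeader('Content-Type', 'application/json');
    xhr.send(JSON.stringify(buffer));
    buffer = [];
  }, 5000);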

                                                            People who are doing advanced work in this area include Netflix and
                                                            Google. They had a great BayCHI presentation which is, alas, not up on
                                                            the web yet, but I will mention it here when it is.

                                                            Hoping that helps,

                                                            William