This has been a good discussion, and it demonstrates that everyone is
a lot closer together than many thought. At the end of the day, I
think most of the apparent disagreement lies in the distinction
between a "design principle" and a "functional requirement."
Arguments that good self-service should eliminate the need for full
service, or that some applications have long queue times, or that
some agents are more competent than others -- all of the discussion
stuff that revolves around whether users "should" or "will" press the
zero key -- is product specification and functional requirements. The
*design principle* is what to do *if* the IVR receives a zero press
(or its speech equivalent). And the answer is, "don't just ignore
it." I think all of the designers in this group and gethuman agree on
this point. If a user presses zero, it means she wants an agent. It
is not up to the designer to decide whether the user *should* want an
agent, or if the application is so good that no agent will ever be
required. Fact is, a zero press means "I want an agent."
The best machine behavior is therefore to recognize that the caller
wants an agent, and to engage in a conversation that conveys clear
options for the user to decide what to do next. Perhaps there is no
agent. Then the machine must explain that. Perhaps queues are long,
in which case the machine should not only explain that, but should
also provide any features or options that might be useful --
including callbacks, returning to the IVR, etc. Those features are
part of the spec. How to explain them to the user is the job of the
user interface design.
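
To make the split concrete, here is a minimal sketch of the "don't just ignore it" principle. It is only an illustration under stated assumptions: the `QueueStatus` fields, the 5-minute threshold, the prompt wording, and the key assignments are all hypothetical and come from neither the gethuman spec nor any real IVR platform. The feature flags stand in for the spec; the branching and prompt text stand in for the design.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QueueStatus:
    """Hypothetical snapshot of the agent queue; every field is an assumption."""
    agents_staffed: bool                   # False when there is no agent, or after hours
    estimated_wait_minutes: Optional[int]  # None when no estimate is available
    callback_available: bool               # whether the spec includes a callback feature


LONG_WAIT_MINUTES = 5  # illustrative threshold, not taken from the discussion


def respond_to_zero_press(status: QueueStatus) -> List[str]:
    """Return the prompts to play after a zero press (or its speech equivalent).

    Every branch acknowledges the request and tells the caller what
    happens next; no branch ignores the key press.
    """
    prompts = ["OK, you'd like to speak with an agent."]

    wait = status.estimated_wait_minutes
    if not status.agents_staffed:
        prompts.append("No agents are available right now.")
    elif wait is not None and wait >= LONG_WAIT_MINUTES:
        prompts.append(f"The wait is about {wait} minutes.")
    else:
        # An agent is reachable soon: just transfer.
        prompts.append("Transferring you now.")
        return prompts

    # No agent, or a long queue: explain the options the spec provides.
    if status.callback_available:
        prompts.append("To request a callback, press 1.")
    prompts.append("To return to the automated system, press 2.")
    if status.agents_staffed:
        prompts.append("To keep waiting for an agent, press 3.")
    return prompts
```

For example, `respond_to_zero_press(QueueStatus(True, 12, False))` acknowledges the request, announces the wait, and offers a return to the IVR or continued waiting, but no callback, because that feature is not in the (hypothetical) spec.
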
Similarly, there's no need to assume that a callback will be made by
a person -- automated outbound conversations are perfectly reasonable
and legitimate. The principle of announcing wait times is similarly
sensible -- whether it's to alert for "dead air" during a database
dip, or if it's a queue for an agent. The idea that a person or a
machine would ask a question that has already been answered is
certainly bad practice. Touch-tone fallback is a minimum requirement
for noisy environments -- and should always be a part of a functional
requirement. But the more interesting design principle is that a TUI
should be fully integrated to support both modalities (speech and
DTMF) at all reasonable times, with modality-swapping principles that
support user choice and that simplify user interaction. We don't
necessarily agree on all of the finer design points to accomplish
that kind of multimodality, but I think there's general agreement on
As for language, turn-taking issues such as barge-in, verbosity,
dialect and cultural implications -- the gethuman spec could be
tightened up along with some designer assistance to uncouple the spec
from the design. The goals are the same, and are obviously based on a
shared premise: effectiveness, efficiency, and satisfaction (ISO
9241). But gethuman should establish the spec without presuming the
design solution -- sometimes longer prompts shorten call time, and
sometimes turning off barge-in reduces errors and error-recovery to
the advantage of callers.
In other words, the gethuman effort is really about the proposition
that an IVR and a call center agent together represent a single
*system* that has one goal -- delivering service to a caller. I think
everyone agrees with that. Gethuman is generally right about the
spec. This vuids group is right to be a bit incensed that non-
designers can or should specify exactly *how* those specs should be
rendered. If designers can agree to accept specs from non-vuids
people, then the spec people should be willing to capitulate on
details of the design itself. Then there would be a larger community
with one common goal -- delivering service to callers.
--- In email@example.com, "simmeee" <simonie3@...> wrote:
> I did a point by point rundown on my website recently:
> Point/Counterpoint - gethuman Standard v1.0
> 1. The caller must always be able to dial 0 or to say "operator" to
> queue for a human.
> - Agreed. Systems that do not allow this option are either 10 years
> old or recently installed by people using 10-year-old methodology.
> When IVRs began, it was thought that "containing" the caller within
> the automated system was the height of efficiency. Designers
> learned otherwise.
> 2. An accurate estimated wait-time, based on call traffic
> at the time of the call, should always be given when the caller
> arrives in the queue. A revised update should be provided
> periodically during hold time.
> - There are no wait times to talk to an automated system.
> 3. Callers should never be asked to repeat any information (name,
> full account number, description of issue, etc.) provided to a human
> or an automated system during a call.
> - Agreed. Systems that do not propagate information that has been
> collected from the user have a bad codebase. This has nothing to do
> with speech recognition or talking to a human vs. an automated
> system. Either the "screen pop" of information that is displayed to
> the human agent isn't being properly filled out or even human to
> human, the information isn't being transferred correctly in the
> 4. When a human is not available, callers should be offered the
> option to be called back. If 24 hour service is not available, the
> caller should be able to leave a message, including a request for a
> call back the following business day. Gold Standard: Call back the
> caller at a time that they have specified.
> - Call back options have become popular recently and seem to
> callers that something is being done. However, the common problem
> with these systems so far is that no one does indeed call back. This
> is not an automated problem, but a human agent problem. They are
> keeping track of who needs to be called back and doing so in a
> manner. However, call back options are only encountered once a
> caller chooses the option to speak to a human and it's after hours.
> It's never after hours for an automated system.
> 5. Speech applications should provide touch-tone (DTMF) fall-back.
> - Speech systems and DTMF systems have inherently different
> Many of the complaints about speech systems today stem from the fact
> that old DTMF systems were converted simply by "slapping some speech"
> on it, which creates a very poor user interface. Having said that,
> there are advanced design methods to include a gradual progression of
> prompts and help in a speech system that account for the possible
> need to "fall back".
> 6. Callers should not be forced to listen to long/verbose prompts.
> - Agreed. Long, verbose prompts are not ideal design. I can tell
> you no VUI designer puts those in. They are usually mandated by the
> vendor and many arguments ensue. It takes several iterations to
> convince vendors not to use long prompts, even for legalese.
> 7. Callers should be able to interrupt prompts (via dial-through
> DTMF applications and/or via barge-in for speech applications)
> whenever doing so will enable the user to complete his task more
> - Barge-In is a design standard. Those systems not using it are
> again antiquated. I'll go one step beyond that and say that there
> are a few different kinds of barge-in and I call for what is referred
> to as "speak ahead", which allows a caller to barge-in with more than
> one piece of data ahead of being asked for it, thus allowing the
> system to skip layers of dialog.
> 8. Do not disconnect for user errors, including when there are no
> perceived key presses (as the caller might be on a rotary phone);
> instead queue for a human operator and/or offer the choice for call-back.
> - The systems should never abruptly disconnect until the user is
> clearly aware of why. There has to be some limit to the number of
> errors the system can accept, but it should generally be higher than
> what you see in a lot of systems out there and there should be a
> prompt indicating to the user what caused it and why they are being
> disconnected or transferred.
> 9. Default language should be based on consumer demographics for the
> organization. Primary language should be assumed with the option for
> the caller to change language. (i.e. English should generally be
> assumed for the US, with a specified key for Spanish.) Gold Standard:
> Remember the caller's language preference for future calls. Gold
> Standard: Organizations should ideally support separate toll-free
> numbers for each individual language.
> - This would all be lovely. However, the vendor again dictates the
> choice and a lot of personal bias can get into the choice
> unfortunately. As a designer, my choice for US systems is
> to say the initial Welcome prompt, followed by "for Spanish, say
> Español" (in Spanish) and proceed in English by default. For
> Canada, sometimes we welcome in both English and French, followed
> by "for French, say Français" (in French).
> 10. All operators/representatives of the organization should be able
> to communicate clearly with the caller (i.e. accents should not
> hinder communication; representatives should have excellent diction
> and enunciation.)
> - Agreed. I'll go one step further here as well and say the same
> should be expected of automated systems, whether they use text to
> speech or recorded prompts. The caller has the right to expect to
> understand a human or a system that purports to speak the same
> language. Speech recognition may actually have an advantage here,
> because data from some deployed systems shows automated systems
> understanding callers BETTER than agents and there are some solid
> reasons for that, like phones only covering a certain range of
[the rest of this thread is truncated]