Wired for speech

I’ve long been a fan of Clifford Nass’s first book The Media Equation, which demonstrates how humans unconsciously respond to computers and other interactive machines as if they were human. A later book, which has only just come to my attention, is Wired for Speech: How Voice Activates and Advances the Human-Computer Relationship, which Nass co-wrote with Scott Brave. The primary audience for the book is those responsible for designing automated voice interfaces, such as telephone booking systems and satnavs, which is not my field at all. I was looking for relevance to learning technologies and I wasn’t disappointed.
I’m going to list a few of the findings which struck me as useful. I’m not going to back these up with the research or even the arguments, so you’ll have to buy the book if you want to dig deeper.
The central premise is as follows:
‘As a number of experiments show, the human brain rarely makes distinctions between speaking to a machine – even those with very poor speech understanding and low-quality speech production – and speaking to a person. In fact humans use the same parts of the brain to interact with machines as they do to interact with humans.’
In other words, it doesn’t matter whether we know a voice is computer-generated or not, we will still respond to it in the same way we respond to other humans, at least at an unconscious level. And don’t tell me you’ve never felt awkward ignoring the directions given by your satnav!
‘The more similar two people are, the more positively they will be disposed towards each other.’ Implication: If you’re choosing a voiceover artist, look to match the voice to your audience wherever possible.
People will assign whatever gender, racial or other stereotypes they have in dealing with humans to a machine they perceive as having that gender, race, etc. I’m not going there, mainly because I haven’t really figured out the implication.
Multiple voices
‘When a person is confronted with a new voice cognitive load is increased.’ Currently my recommendation is to use multiple voices in webinars for much the same reason: one voice becomes boring and a new voice attracts attention. Which is why they very rarely stick to a single voice on the radio. However, the implication here is that, when attention is already high, such as when a learner is concentrating hard, don’t add to the load by bringing in new voices.
On average people prefer extrovert voices to introvert, so it’s safer to use a voice that comes over this way. Extrovert people speak quickly, loudly and with significant frequency range.
Specialists v generalists
‘Experiments have shown that the products of specialists are perceived to be better than the products of generalists, even when their contents are identical.’ All that’s needed to be convincing is for the person to be labelled (not by themselves) as a specialist. The implication for voiceovers is that it may be better to use a subject specialist than a professional voiceover artist who clearly knows nothing about the subject and is simply reading a script. (After I read this, I decided to voice a number of scripts myself rather than use a pro voice. Let’s hope it works.)
Recorded v synthetic
‘The current data strongly supports the view that recorded speech is superior to synthetic speech.’ Not surprising this.
‘Although people can certainly listen without seeing a speaker’s face, they have a clear and strong bias toward the integration of faces and voices.’ So show what the speaker looks like if you can.
An interface should speak as clearly as possible. Again, hardly surprising.
Humour that is light and not provocative seems to be consistently effective. I’m all for this, but try getting humour through your client’s thought police.
Being recorded
‘When people have a sense of being recorded they are likely to say different things and process what is said differently. The lack of a record allows people to speak with a sense of informality and plausible deniability.’ The implication here is for webinars. We routinely record them, but perhaps in doing so we are constraining the dialogue. Nass’s recommendation is to minimise the signals that recording is taking place, because people will forget after a while, but whether this is ethical is debatable.
So, plenty of good stuff in here. I’d recommend you have a proper read. Particularly interesting is what Nass has to say about the brain’s ability to recognise and interpret speech, starting in the womb.
