Hundreds of millions of homes now have a smart speaker providing voice-controlled access to the internet, according to one global estimate. Add to that the virtual assistants installed on smartphones, kitchen appliances and cars, and that is a lot of Alexas and Siris. Because conversation is such a basic human function, it is natural to assume that all these artificial helpers should be designed to talk just like we do. While this would give us a relatable way to interact with our devices, replicating genuinely realistic human conversation is incredibly difficult. What's more, research suggests that making a machine sound human may be unnecessary and even dishonest. It may be better to rethink how we communicate with machines altogether and embrace the benefits of them being machines.
Speech technology designers often talk about the concept of "humanness". Recent developments in artificial voice production have begun to blur the line between machine and human, and research and development in this area continues to push towards ever more human-like voices. But why is sounding natural, or more human, so important?
Chasing the goal of making systems sound and behave like us perhaps stems from the pop culture that inspires their design. The idea of talking to machines has fascinated us in literature, television and film for decades, through characters such as HAL 9000 in 2001: A Space Odyssey or Samantha in Her. These characters hold seamless conversations with machines. In the case of Her, there is even a love story between an operating system and its user. Critically, all these machines sound and respond the way we think humans would.
There are interesting technological challenges in trying to achieve something resembling conversation between humans and machines. To this end, Amazon has recently launched the Alexa Prize, looking to "create social bots that can converse coherently and engagingly with humans on a range of current events and popular topics such as entertainment, sports, politics, technology, and fashion". The current round of the competition asks teams to produce a 20-minute conversation between one of these bots and a human interlocutor.
These grand challenges, like others across science, clearly advance the state of the art, bringing planned and unplanned benefits. Yet when striving to give machines the ability to truly converse with us like other human beings, we need to think about what our spoken interactions with people are actually for, and whether that is the same kind of conversation we want to have with machines. Or have we simply grown tired of talking to other people, and begun to long for perfect machines that will speak to us exactly as we wish?
If conversation has ceased to serve its purpose of helping us hear each other, reach compromises and move forward, then as a society we may have begun striving only for what we want to hear, even if it comes from a high-tech voice-producing machine. An artificial assistant that tells us only what we want to hear creates the illusion of an unreal world. It can start to seem that if we must listen to something, it is better to listen to an intelligent machine, and better still not to listen to other people at all.
Pursuing natural conversation with machines that sound like us can become an unnecessary and burdensome objective. It creates unrealistic expectations of systems that can actually communicate and understand as we do. Anyone who has interacted with an Amazon Echo or Google Home knows this is not possible with existing systems.
This matters because people need some idea of how to get a system to do things, and with voice-only interfaces, which have few buttons and little visual feedback, that understanding is guided largely by what the system says and how it says it. Given the importance of interface design, humanness may be not only questionable but deceptive, especially if it is used to fool people into thinking they are interacting with another person. Even if their intent is simply to create intelligible voices, tech companies need to consider the potential impact on users.
Rather than consistently embracing humanness, we can accept that there may be fundamental limits, both technological and philosophical, to the types of interactions we can and want to have with machines.
We should be inspired by human conversation rather than treating it as the gold standard for interaction. For instance, viewing these systems as performers rather than human-like conversationalists may be one way to create more engaging and expressive interfaces. Incorporating specific elements of conversation may be appropriate in some contexts, but human-like conversational interaction should be a considered choice rather than a default design goal.
It is hard to predict what technology will be like in the future and how social perceptions will change and develop around our devices. Maybe people will be okay with having conversations with machines, becoming friends with robots and seeking their advice.