Voice Interfaces: pros & cons

So it turns out that Duplex, Google's supposedly fully automatic and decidedly high-tech system that calls the pizzeria to book a table for you, is not completely automatic after all, nor quite as high-tech as it seemed. This is not really surprising and, actually, it is not such a bad thing!

Let us take a step back: at Google I/O 2018, its developer conference, Google caused quite a sensation with a demo of this service, which “seemed” able to call a pizzeria, ask for information and book a table. The aim was to connect digital users with businesses that do not yet accept online reservations.

A praiseworthy idea, and a praiseworthy execution: the AI understood the conversation, answered appropriately and did in fact seem to do what it was supposed to. At the time there were serious doubts about whether the demo was genuine, but the technology was real, and it was incredible.

Over the past year, Duplex has become available in 43 US states, proving more practical than many people, myself included, would have predicted.

But last week a report revealed that 25% of calls are still placed by human operators, and that 15% of the remaining calls are handed over to a flesh-and-blood operator after the first exchanges. This caused something of a stir: accusations of cheating and plenty of negative comments about the system's real value.
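Taking the report's figures at face value, the two percentages cannot simply be added, because the 15% applies only to the 75% of calls that start automatically:

$$
P(\text{human involved}) = 0.25 + 0.15 \times (1 - 0.25) = 0.25 + 0.1125 = 0.3625
$$

$$
P(\text{fully automatic}) = (1 - 0.25) \times (1 - 0.15) = 0.75 \times 0.85 = 0.6375
$$

In other words, roughly 36% of calls involve a person at some point, and about 64% run from start to finish without one.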

To be honest, I am actually a bit relieved: a system that never makes a mistake when dealing with people evokes creepy sci-fi scenarios (and the Uncanny Valley is just around the corner), and I see nothing wrong with a system that talks to humans calling on human help when it needs it. There are simply too many factors involved to expect “any phone call” to be handled that way.

If anything, 25% plus 15% of the rest seems low to me: roughly two calls out of three still appear to be completely automatic, and this is simply amazing. Congratulations to Google.

Beyond systems built by multibillion-dollar multinationals, here are a few points we should all reflect on:

  1. Conversational voice interfaces are doing well  
    I do not know whether people are trying to avoid human contact, but that is not the point: voice is an ideal way to interact with the “digital world” when you are doing something else, when your hands are busy or when you are driving. And since our solutions for order collection, on-the-road sales and mobile CRM are aimed at sales reps who spend much of their time at the wheel... this trend is definitely worth considering! (This post may give you some ideas on the topic: A sales tool that works where needed: in front of the customer)
  2. The perfect voice interface does not exist yet  
    AI, even when it can tell dogs from mops, cannot yet handle “any” conversation perfectly. Duplex works in a well-defined field, booking, which has its own “rituals” and set phrases; even with humans involved in more than a third of calls, it is still an incredible achievement.

Unfortunately, we cannot yet build a system you can “just talk to freely”; interaction still has to be designed carefully, both to give users the impression of speaking as naturally as possible and to nudge them into giving the “automatic interlocutor” content it can understand, for example by using keywords. Think of Siri, Cortana, Alexa or Google Assistant: they can answer general questions, with varying degrees of wit, but they perform best when focused on words like “weather”, “alarm clock”, “reminder” and so on.
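To make the keyword point concrete, here is a minimal, hypothetical sketch of keyword-driven intent detection, the kind of narrow matching that lets a voice assistant act reliably when the user says “weather” or “reminder”. It assumes speech has already been transcribed to text; the intent names, keyword lists and the `detect_intent` function are invented for illustration and do not reflect how Siri, Alexa or any other assistant actually works.

```python
import re

# Hypothetical intents and trigger keywords, for illustration only.
INTENT_KEYWORDS = {
    "weather": {"weather", "forecast", "rain", "temperature"},
    "alarm": {"alarm", "wake"},
    "reminder": {"remind", "reminder"},
}


def detect_intent(transcript: str) -> str:
    """Return the intent whose keywords best match the transcript.

    Falls back to "unknown" when no keyword is found: that is where a
    real system would ask the user to rephrase or hand over to a human.
    """
    # Lowercase and split into words, keeping apostrophes ("what's").
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    best_intent, best_score = "unknown", 0
    for intent, keywords in INTENT_KEYWORDS.items():
        score = len(words & keywords)  # number of matching keywords
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent


if __name__ == "__main__":
    for phrase in [
        "What's the weather like tomorrow?",
        "Set an alarm for seven",
        "Remind me to call the customer",
        "Tell me something funny",
    ]:
        print(phrase, "->", detect_intent(phrase))
```

The last phrase matches nothing and returns "unknown", which mirrors the point above: outside its keywords, such a system has little to go on, so the interaction has to be designed to steer users toward them.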

Admittedly, things are improving fast: speech recognition, the first piece of the puzzle, has reached excellent levels in Italian too, and it copes with unusual accents, dialect words and the like far better than it used to. One example is the voice picking system adopted by the main Coca-Cola manufacturer and distributor in the Yucatan peninsula (Mexico). Its strong point is the AI-based, speaker-independent recogniser from the Lydia Voice suite, which we have chosen to adopt for our projects.

The future, then, will be a balancing act. We do not believe these systems will ever fully replace human interaction, for example in “open-ended” areas such as technical support. As we have already stressed, the human factor is indispensable, both to fully understand the problem and to build empathy with the person on the other end. Nevertheless, systems like these can support operators in their work: opening a ticket, acting as a first filter, or providing a minimum level of service outside office hours or helpline opening times. We need to think, plan, perhaps make mistakes and go back to the drawing board. That, too, is part of working in IT.
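The “first filter” role described above can be sketched as a simple routing rule: let the automatic system handle well-defined requests it is confident about, hand everything else to a human operator during office hours, and open a ticket outside them. This is a minimal, hypothetical sketch; the office hours, the confidence threshold and the names `Call` and `route` are assumptions made for illustration, not part of any specific product.

```python
from dataclasses import dataclass
from datetime import datetime, time

# Hypothetical values, chosen only for illustration.
OFFICE_OPEN = time(9, 0)
OFFICE_CLOSE = time(18, 0)
MIN_CONFIDENCE = 0.8  # below this, the automatic system should not act alone


@dataclass
class Call:
    intent: str        # e.g. the output of a keyword or NLU step
    confidence: float  # how sure the recogniser is, from 0.0 to 1.0
    received_at: datetime


def within_office_hours(moment: datetime) -> bool:
    """True Monday to Friday between OFFICE_OPEN and OFFICE_CLOSE."""
    return moment.weekday() < 5 and OFFICE_OPEN <= moment.time() < OFFICE_CLOSE


def route(call: Call, handled_intents: set[str]) -> str:
    """Decide who deals with a call: the bot, a human, or a ticket."""
    if call.intent in handled_intents and call.confidence >= MIN_CONFIDENCE:
        return "automatic"       # well-defined request, e.g. a booking
    if within_office_hours(call.received_at):
        return "human_operator"  # open-ended or uncertain: hand over
    return "open_ticket"         # minimum service level after hours
```

The key design choice is that uncertainty never leads to a dead end: whatever the bot cannot handle confidently goes to a person, either immediately or through a ticket picked up later.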