“Hey Stacy, can you open our sales forecast and get me last year’s retention KPIs?” Today Stacy is a flesh‑and‑blood human like you. Soon enough you could be addressing a smart speaker, a voice‑enabled chatbot, or some other app that lets you talk your way through a process faster than you could with fingers and keyboard.
Our phones already feature voice‑enabled assistants like Siri, Google Assistant, and Bixby. Roughly one in six consumers has a smart speaker, such as the Amazon Echo or Google Home, at home, according to a recent survey by NPR, and sales are growing as fast as smartphone sales did a decade ago.
Yet at work, the voice revolution can still seem far off. One deterrent is the trend toward open offices: nobody wants to be the noisy jerk who can’t stop yelling at his virtual assistant. A global survey on AI adoption found that while 84% of respondents would freely converse with Alexa or Siri at home, just 27% would do so at the office. Another obstacle is that most enterprise‑level software involves complex interactions of objects and words, which still require a mouse and keyboard.
But just as smartphones and web‑based software made their way into the enterprise, so too will the conversational UI. Advances in voice recognition and synthesis have finally intersected with AI, resulting in fertile conditions for computers that can listen and talk back while handling more complex functions and tasks.
“Alexa is barely four years old, and habituating users to a voice interface is still in the early stages,” says Joe Buzzanga, chief analyst of New Jersey‑based Fivesight Research. “It’s important that consumers are experiencing voice now so they can more naturally adopt it in the office, and there are applications where voice will be the best interface.”
[Chart: Employees who say they’re comfortable using voice recognition at home versus at work]
Computers find a voice
Today’s voice assistants can already tackle basic administrative chores, such as transcribing calls or scheduling meetings, and even some higher‑level tasks, such as monitoring phone calls to identify high‑potential sales leads. Reaching even this basic level of accuracy and ability has taken decades of research. In part that’s because computers have historically struggled to parse human speech, which is freeform, creative, and full of idiosyncrasies.
Progress in recent years has come from machine learning, which involves feeding machines enormous amounts of speech data and teaching them to recognize patterns on their own. In 2017, Google CEO Sundar Pichai announced that the company’s voice recognition technology had reached 95% accuracy—a 20% improvement since 2013.
Andrew Ng, the former chief scientist at Chinese tech giant Baidu, has predicted that voice assistants will become ubiquitous in the workplace once they reach 99% accuracy. That last mile will be challenging. Today’s voice assistants often struggle to recognize unfamiliar ethnic names, or even pop songs with “foreign” titles.
Currently, you can string together only two commands for Google Home (for example, “Play Spotify and set volume to 10”), and Google’s AI still fails at traffic updates and other combined requests. Computers also still don’t speak entirely naturally: you probably won’t mistake Alexa for a friend or coworker.
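Why is chaining even two commands hard? As a toy illustration (this is a sketch under simple assumptions, not Google’s actual pipeline), a compound utterance can be split on a conjunction and each clause matched against known command patterns:

```python
import re

# Illustrative command grammar: each entry maps a spoken clause
# to an action name. These patterns are hypothetical, not any
# vendor's real interaction model.
COMMANDS = [
    (re.compile(r"play (?P<arg>.+)", re.I), "play"),
    (re.compile(r"set volume to (?P<arg>\d+)", re.I), "set_volume"),
]

def parse_utterance(utterance):
    """Split a compound utterance on 'and' and match each clause
    against the known command patterns."""
    actions = []
    for clause in re.split(r"\s+and\s+", utterance.strip()):
        for pattern, action in COMMANDS:
            match = pattern.fullmatch(clause)
            if match:
                actions.append((action, match.group("arg")))
                break
        else:
            # No pattern matched; flag the clause as unrecognized.
            actions.append(("unknown", clause))
    return actions
```

Even this toy version shows the brittleness: splitting on “and” would mangle a request like “Play Rock and Roll All Nite,” which hints at why assistants cap the number of chained commands rather than parse arbitrary compounds.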
“The technology is moving very fast,” says Joshua Montgomery, CEO of Mycroft, a startup that is creating an open‑source equivalent to Amazon’s Alexa. That’s because of massive investments in smart speakers, improved voice functions for phones and cars, more advanced chatbots, and so on. Mycroft has raised about $3 million in venture capital and another $800,000 in preorders on Kickstarter and Indiegogo to get its voice assistant off the ground.
At the other end of the market, Amazon and Microsoft have formed an intriguing alliance aimed at the workplace. Alexa and Cortana (Microsoft’s digital assistant) already share reciprocal features; each can be used to interact with the other platform. Both can perform basic tasks like setting meetings, managing appointments, and sending emails. And both work with Office 365, Microsoft’s suite of productivity apps.
Integrations are still fairly basic, but it’s easy to imagine a future where Cortana could, for instance, tap into the automated “Insights” functions of Excel so users can take a quick hit of data analysis without opening a spreadsheet. Other advances will likely come from overseas. Last year, Chinese web giant Baidu announced DuerOS, a proprietary conversational platform that includes more than 100 partner brands, including HTC and Nvidia.
“We’re seeing a virtuous cycle where the technology is accelerating because so many people are working on it,” says Buzzanga. “Five years from now, will we still have today’s Microsoft and Google applications with voice bolted on? I don’t know, but it’s not what I’m looking for. I think it will be something more radical.”
An AI assistant worth talking to
The most important ability of any voice assistant is what it can do with the voice commands that it receives. By this measure, digital assistants are becoming more capable every year.
The AI‑driven scheduling startup X.ai, for instance, has created an intelligent agent, Andrew or Amy, which focuses solely on calendar tasks like scheduling a meeting. While that’s still an admin chore, Andrew/Amy is capable of working with much less information than past applications. If you tell Amy to book time with a potential client on Wednesday, she understands the request and how to perform it.
Because you’re not monitoring the process (opening the calendar or jumping into the email thread with the client, for instance), the assistant is also making more logical leaps than software has in the past. Even more than consumers, enterprise users will demand that this process be error‑free. “There’s a bar even higher than you’d expect,” says Dennis Mortensen, the founder of X.ai.
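To make concrete what “working with much less information” means, the first step for an agent like Amy is pulling structured fields out of a freeform sentence. Here is a minimal sketch of that intent extraction; the pattern and field names are illustrative assumptions, not X.ai’s implementation:

```python
import re

# Hypothetical pattern for a meeting request: who to meet and on
# which weekday. A real system would use trained models, not one regex.
MEETING_RE = re.compile(
    r"(?:book|schedule)\s+(?:time|a meeting)\s+with\s+"
    r"(?P<who>[\w\s]+?)\s+on\s+"
    r"(?P<day>monday|tuesday|wednesday|thursday|friday)",
    re.I,
)

def extract_meeting_request(utterance):
    """Return the attendee and day from a meeting request,
    or None if the utterance doesn't look like one."""
    match = MEETING_RE.search(utterance)
    if not match:
        return None
    return {
        "attendee": match.group("who").strip(),
        "day": match.group("day").capitalize(),
    }
```

Everything after this step — checking both calendars, proposing slots, emailing the client — has to happen without a human watching, which is where the “higher bar” for accuracy comes in.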
Advances in AI have brought this level of quality within reach. For example, Andrew helped set up the interview for this article. Asking a machine to spin up that sales forecast is also possible: Montgomery’s team at Mycroft is building an assistant that could give a voice reply to queries about the number of backers, total funding and time left in the crowd‑funding round.
“Many companies don’t realize they need a voice strategy,” Montgomery says. With AI and speech recognition both moving so fast, he warns that companies without one will fall behind.
Some companies may avoid voice tech because request data flows to servers in the cloud, which can violate corporate security policies. Startups like Mycroft, which allows companies to control their own data, may help address this concern. Other companies will address security needs by building their own voice apps.
AI developers whose apps don’t require voice are just as excited about its possibilities as voice specialists. “There will be a point where it’s normal, if not expected, that you talk to your computer,” says Mortensen.
A platform designed for growth
The strongest driver of the voice‑UI trend is perhaps the simplest: Voice remains one of the most efficient, cost‑effective ways to communicate.
“My company isn’t paying me to do email ping pong, organize my receipts, plan travels or do any number of things that are just chores,” says Mortensen. “The future is where we’re all managers. Even as a junior employee, you’ll need to figure out: what agents do I need to do my job?”
Directing your computer to organize and submit your receipts is surely cheaper than doing it on your own—and surely faster if you can say it, rather than opening and navigating through a program.
Voice may also prove to be a simpler way to learn new job skills. After all, it’s an instinctive and natural way of communicating. So, while call centers may be the first places where we’ll see the conversational UI take root, the foundation is in place for it to spread quickly through the enterprise.