Voice enablement of new applications will be nearly universal by 2023. Tools train users how to operate them by virtue of the results they deliver. Remember how you had to take a course on how to pinch and swipe on your mobile phone?

Of course, you didn’t! You probably watched someone else do it, imitated their behavior, and were delighted when the action worked.

The first hammer you used taught you things about hammering. Applications exploiting the dominant conversational platforms (from the likes of Amazon, Apple, and Google) will teach you when and how to converse to get what you want from the applications.

Virtually all technologies will be programmed to accept and produce natural language, in both written and spoken form, and to process explicit and implicit intents by following rules defined by developers.
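To make the rule-following concrete, here is a minimal sketch of how a developer might wire up explicit and implicit intent handling. The intent names, patterns, and keywords are entirely illustrative, not any vendor's actual API:

```python
import re

# Illustrative sketch: explicit intents match developer-defined patterns;
# an "implicit" intent is inferred from keywords when no pattern matches.
EXPLICIT_RULES = {
    "set_timer": re.compile(r"\bset (?:a )?timer for (\d+) minutes?\b"),
    "play_music": re.compile(r"\bplay (.+)"),
}

IMPLICIT_KEYWORDS = {
    "weather": "get_weather",
    "forecast": "get_weather",
}

def classify(utterance: str) -> str:
    text = utterance.lower()
    # Explicit rules take priority over keyword inference.
    for intent, pattern in EXPLICIT_RULES.items():
        if pattern.search(text):
            return intent
    for keyword, intent in IMPLICIT_KEYWORDS.items():
        if keyword in text:
            return intent
    return "fallback"
```

A system like this "understands" nothing; it only matches strings, which is why the illusion breaks down as soon as users stray from the patterns the developer anticipated.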

In the past, technologies implicitly trained us on command-line, keyboard, graphical, and touch interfaces. None of these were intuitive. We got accustomed to them, and their usage became reflexive as the technology forced us to learn productive behavior patterns.

Everything will get a voice (or an ear or both)

Cars, TVs, phones, watches, speakers, security systems, household appliances, and personal headsets, among other things, already converse with us. And more will shortly. Also, consider:

    • Email and search applications predict the next word we might type and offer to insert it.
    • Transcription tools automatically convert radiologists’ recordings into electronic health record entries and ease salespeople’s typing burden by transcribing spoken words into CRM system entries.
    • Internal IT help desk systems already offer at least limited automated conversational features.
    • Most commercial websites either offer or are considering chatbot assistance for users.

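The next-word prediction in the first bullet can be sketched, in toy form, with a simple bigram frequency model. Production systems use vastly larger language models; this only shows the underlying idea, and all names here are illustrative:

```python
from collections import Counter, defaultdict

def train(corpus: str):
    """Count, for each word, which word most often follows it."""
    words = corpus.lower().split()
    model = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        model[prev][nxt] += 1
    return model

def predict(model, word: str):
    """Suggest the most frequent follower of `word`, or None."""
    counts = model.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

model = train("thank you for your time thank you for your help")
# predict(model, "thank") suggests "you"
```

Even this toy version hints at why such features feel helpful so quickly: the tool's suggestions train the user to expect and accept them.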
Significant limitations exist

Conversational technology has a very long way to go to reach its user-satisfaction potential. Despite its shortcomings, 45 percent of Millennials are already using voice assistants while shopping[1]. Voice will not replace keyboards, mice, and other interface devices this decade.

Conversational systems sometimes appear to understand. Don’t be fooled. The technology doesn’t understand anything. NLU – natural language understanding – is an illusion today. The programmers who set up and train natural language processing systems are getting better at ‘faking it,’ making it appear as though the system understands. The technology has come far enough that we’re now adapting to it and asking it to perform an ever-expanding swath of simple tasks. Business chatbots continue to annoy us but sometimes deliver valuable services more quickly than people do. The best also hand off to a human operator quickly, but not too quickly, in case they can handle our needs without putting a person on the line.

Humans adapt! We are very good at it. The technology is getting better. Unlike machines, people are learning how (and where) to use conversational technology.

By 2023, voice enablement of applications will be nearly universal.

Actions

Start planning now. Speech matters. Speech to text, intent determination, text to speech, language translation, word and phrase prediction, and semantic analysis, to name a few conversation-related categories, are all improving and belong in the plans you are implementing, or drawing up, this year.
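Several of those categories typically compose into a single pipeline. The sketch below shows the shape of that composition with stub functions standing in for real services (cloud speech recognition, an NLU endpoint, a synthesis engine); every name and reply here is a hypothetical placeholder, not a real API:

```python
def speech_to_text(audio: bytes) -> str:
    # Stub standing in for a real speech-recognition service.
    return "what is my account balance"

def determine_intent(text: str) -> str:
    # Stub standing in for a real NLU/intent service.
    return "check_balance" if "balance" in text else "fallback"

def respond(intent: str) -> str:
    replies = {
        "check_balance": "Here is your account balance.",
        "fallback": "Sorry, could you rephrase that?",
    }
    return replies.get(intent, replies["fallback"])

def text_to_speech(reply: str) -> bytes:
    # Stand-in for audio synthesis.
    return reply.encode()

def handle(audio: bytes) -> bytes:
    # The full round trip: ear in, voice out.
    return text_to_speech(respond(determine_intent(speech_to_text(audio))))
```

Mapping your plans onto stages like these helps clarify which pieces you will buy, which you will build, and where the disparate vendor technologies in the next paragraph will need to plug in.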

This impacts technology acquisition and development plans, work environments, user equipment, skills training, and user support strategies. User organizations should seek to minimize the number of disparate conversational technologies in use inside the enterprise while simultaneously supporting a broad range of conversational technologies that customers, partners, and suppliers may be using.

[1] https://voicebot.ai/2019/03/20/45-of-millennials-use-voice-assistants-while-shopping-according-to-a-new-study/