There's Not Enough "Smart" In Smart Speakers

If you still don’t believe that smart speakers are going to change everything from how we use technology to how we will connect, please read: Hello, Voice – Smart Audio Is The Now/Next Frontier.

Personally, I’ve spent over eight months deep diving into the smart audio space. It feels like the Internet at the moment that the first web browser went online. New, exciting, and – as Kevin Kelly might call it – an inevitable. There is no doubt that all devices will be voice-enabled. Voice (as I described it in the article above) is the best user interface and best user experience for consumers (when done well). Still, with all of the current excitement and pending growth in the smart audio space, one issue that keeps coming up (and remains one of the beasts to tackle) is the content.

Lots of apps… or voice skills.

On smartphones, they are called “apps,” but on smart speakers, they are called “voice skills.” As you can imagine, there’s another “gold rush” happening right now, with brands (and individuals) trying to get pole position in the smart audio space. According to MediaPost’s article, Amazon Dominates Google In Smart Speaker Voice Apps, “The number of skills for Amazon Alexa is at 80,000 globally, with 58,000 of those in the U.S. Google’s version, called Google Assistant Actions, is at 4,300, based on a new analysis by Voicebot. Interestingly, the top categories of the two platforms differ. For example, the largest Alexa skill category in the U.S. is games and trivia, accounting for more than 21% of all skills. The second position is education and reference, at about 14%.”

Do you remember the early days of the app store? Lots and lots of apps, but a serious lack of quality. Smart audio is trending in a similar direction, but the issue is more problematic. The true power of smart audio is that a consumer can interact with the content. For this to work, the platform needs to handle a simple back-and-forth volley of questions and answers, and it also needs to understand multiple queries within one question (an example of this might be: “Alexa, can you tell me which Thai restaurants are open right now, have a patio and can take a reservation for 4 people at 7:30 pm?”). You would think that Alexa (or Google… or whatever service) would be able to handle that request, but they can’t. These smart audio systems should be able to get past the first query (“which Thai restaurants are open right now?”), but they would flunk out on the patio and reservation requests. The consumer would have to break that request into three or four individual ones (and even then, the platform might struggle to get to the right answer). This is the macro challenge of smart audio.

Getting smart audio right.

Providing robust answers back to consumers requires a lot of work and effort. It is not an easy task. Beyond that, in order to feed the growth and sales of smart audio devices (from smart speakers to appliances), corners are being cut. Most of the voice skills aren’t very “smart” at all. Most of the voice skills are really just audio streams. The only real “voice” component of the voice skill is when the user invokes the voice skill to “play something.” To be clear, if I were to put my podcast, Six Pixels of Separation, on Alexa, the only real voice component would come from the consumer asking it to play the podcast. There is no interactivity or back and forth (or layers of back and forth) after that. All you’re really doing is asking instead of swiping or typing.

That’s not smart at all.

The real winners in smart audio are just around the corner. They will be the content creators who figure out how to make engaging exchanges with consumers, so that the content is truly smart audio and not just an audio trigger to play a form of content that you can hear (or watch) anywhere else in any other online channel. That’s where the excitement is. That’s where the excitement is going to be. Maybe consumers just need to get used to triggering content with their voices, and the next logical step will be multiple interactions? It’s hard to know. For now, it feels like a lot of brands are copping out and making the smart audio opportunity much less smart, because it’s easier to dump whatever audio (or content) they already have into this platform, check a box and posture like they are being innovative. Don’t forget, it took a while for apps to get good, reach mass consumer adoption and be used like they are today (and some might – rightfully – argue that mobile apps aren’t even really that great yet). This is where the true opportunity is.

Companies need a voice strategy, but – more importantly – they need a smart audio strategy that’s… well… smart.