Voice Search is something that I’ve been interested in for quite some time. In fact, I’ve been helping clients optimize for voice search for as long as I can remember, even before Google algorithm updates such as Hummingbird back in 2013. One of the great opportunities I had when I started my Digital Marketing career nearly two decades ago was working at Enquiro, where we really focused on the user and on understanding how people use Search. One of the things that I quickly noticed was the impact of long-tail keyword activity. I remember being at a Search conference in San Jose and talking with Keith Hogan, who at the time worked with Ask (formerly Ask Jeeves). He shared his thoughts on how the keyword queries they were seeing were getting longer and longer. In fact, his presentation stated that queries of eight or more words were one of the fastest growing segments of keyword activity on Ask.com.

Something clicked with me when it came to focusing on long-tail keyword queries. They tended to read more like natural spoken language, and they seemed to be more question-based. (Ask Jeeves was a search engine quite ahead of its time.) It made sense to me that at some point machine learning and A.I. would shape how people search. Voice Search seemed like a natural progression to me, and I’m surprised that it has taken this long to become mainstream.

Voice Sound Wave

Again, it is probably worth repeating the difference between Voice Search and Voice Assistance. Voice Search is simply the ability to use vocal commands on a device to find information. Voice Assistance (or Voice Assist) is technology or A.I. that a user can speak to and that provides a response or outcome. Many people making predictions about Digital trends state that voice search is the next big thing. Ogilvy outlined some key digital trends for 2018, stating:

In virtually every form of technology, user interfaces are evolving away from fingers and thumbs and towards less tactile forms of interaction like voice and images. Users will increasingly engage with technology in more natural and instinctive ways. Technology will have to do a better job of understanding us and all of the nuance that comes with natural language.

I’m not sure if this is a trend or simply something that will become (and in some cases has already become) a part of our everyday life. For me, Voice Search is nothing new, but many others may only just be getting introduced to the concept. Hence the reason for this post. I thought it would be great to provide some additional insight into Voice Search and how Digital Marketers and content writers may be, and likely will be, impacted.

The ABCs of Voice Search

So let’s take a look at some of the semantics around Voice Search. Think of this as a bit of a semantic map of the topic.

A – Alexa – Alexa is Amazon’s cloud-based voice service available on tens of millions of devices from Amazon and third-party device manufacturers. With Alexa, you can build natural voice experiences that offer customers a more intuitive way to interact with the technology they use every day. More information is available at: https://developer.amazon.com/alexa

B – Bixby – Samsung’s voice-powered virtual assistant, launched in the United States in 2017. Learn more about Bixby from Wikipedia: https://en.wikipedia.org/wiki/Bixby_(virtual_assistant)

C – Chatbot – a computer program that is designed to conduct a conversation with human users online, typically through a chat box or other chat interface. Chatbots can be powered by a series of rules and, in some cases, by artificial intelligence. Examples of chatbots include bots on Facebook Messenger, TacoBot from Taco Bell and many forms of Live Chat options. Learn more with our article on Using Chatbots to Enhance your SEO and Social Strategy.

D – Disfluency – common breaks in language or verbal utterances, such as “ah” or “um,” that disrupt the flow of fluent speech, as exhibited by smart speakers when hesitating.

E – Echo – as in Amazon Echo, an Alexa-enabled speaker you control with your voice. Echo connects to the Alexa Voice Service to play music and provide information, news, sports scores, weather and more, instantly. As per eMarketer, in 2017, 70.6% of Americans who used a voice-enabled speaker at least once a month used an Amazon Echo.

F – Far Field Voice, aka Far Field Speech Recognition – speech recognition technology that is able to process speech spoken by a user from a distance to a receiving device. Amazon’s Alexa was considered the first Far Field Speech Recognition technology to go mainstream. This is in contrast to Near Field speech recognition, which is technology used for handling spoken input from handheld mobile devices (e.g. Siri on an iPhone) that are used within inches of, or otherwise close to, the user’s voice.

G – Google Home – Google’s answer to Amazon Echo: a voice-enabled speaker that can be used to play music (via Spotify) and provide information, news, sports scores, weather and more. According to eMarketer, in 2017, 23.8% of Americans who used a voice-enabled speaker at least once a month used a Google Home. My wife purchased a Google Home smart speaker and we enjoy it, although we still have a lot to teach Google. Google Assistant is now available on 400 million devices, as per Google: https://blog.google/products/assistant/how-google-home-and-google-assistant-helped-you-get-more-done-in-2017/

H – Hearable – the ability to hear something or the ability of a device to be audible.

I – Intonation – the rise and fall of the voice in speaking. When dealing with voice-powered devices, a quiet voice command may not register with the device.

J – Jibo – a personal assistant in the form of a social robot designed to interact with humans and help the family stay connected at home. Review of Jibo from Wired.com.

K – Kaldi – a toolkit for speech recognition written in C++ and licensed under the Apache License v2.0. Kaldi is intended for use by speech recognition researchers. More information at Github.com

L – Latent Semantic Indexing – Latent Semantic Indexing (LSI) is a mathematical method used to determine the relationship between terms and concepts in content. LSI is an important part of keyword research and of determining the semantic relationship of your content. Search engines such as Google use LSI-style techniques to identify semantically relevant word associations from related keywords and synonyms in order to serve up more accurate search results. You can see this in action when you conduct a search in Google: it will serve up “related searches” based on what you are typing (or speaking) into the search engine. Hubspot has an introductory article on LSI and its use in SEO.


Google's Related Searches for Voice Search

M – Machine Learning / Machine Translation – Machine Learning is the ability of a machine to gather, analyze and learn from data in order to produce an outcome efficiently, without being explicitly programmed for each case. Machine Translation is the ability to automatically translate text from one human language to another.

N – Natural Language Processing (NLP) – NLP is an area of study that focuses on the interactions between human language and computers. In essence, it is a method for computers to analyze, understand and derive meaning from human language in a smart and useful way. As stated in this article, it is often used with large data sets and forms a bridge between computer science, Artificial Intelligence and computer-based linguistics. Natural Language Processing includes a number of tasks such as:

  • grammar parsing
  • word stemming
  • word segmentation
  • machine translation
  • natural language understanding
  • sentiment analysis
  • topic segmentation
  • speech recognition
  • discourse analysis

All very cool, but complicated, initiatives. The concept of natural language processing (NLP) dates back to 1950, when Alan Turing published his paper entitled “Computing Machinery and Intelligence,” in which he shared his vision that a computer could conduct a conversation with a human being without the person realizing that they were talking to a machine. Wired featured an article on this topic a few years back: https://www.wired.com/insights/2014/02/growing-importance-natural-language-processing/
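
To make two of the tasks in that list a little more concrete, here is a minimal Python sketch of word segmentation and word stemming. It is a toy illustration only: the function names and the suffix list are my own assumptions, and real NLP libraries use far more careful rules than simply chopping off common endings.

```python
import re

# Illustrative suffix list (an assumption for this sketch); real stemmers
# such as the Porter stemmer apply much more careful rules.
SUFFIXES = ("ing", "ed", "es", "s")


def segment_words(text: str) -> list[str]:
    """Word segmentation: split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def naive_stem(word: str) -> str:
    """Word stemming: strip one common suffix, keeping a stem of 3+ letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


tokens = segment_words("Which restaurants are serving breakfast nearby?")
stems = [naive_stem(token) for token in tokens]
print(tokens)
print(stems)
```

Even this crude version shows why stemming matters for search: “restaurants” and “serving” reduce to “restaurant” and “serv”, so a spoken query can match content that uses a different form of the same word.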

O – Optimizing for Voice Search – the act of tailoring your content so that it can be found through voice-related queries in search engines or on other voice-powered devices. Tips on How to Optimize for Voice Search – Search Engine Land

P – Pronunciation – the way in which a word is communicated or pronounced.  Pronunciation can also be referred to as the act or result of producing the sound of speech.  When it comes to voice recognition and voice-powered devices, failure to pronounce words clearly may result in the voice assistant not being able to understand the voice command that it is being given.

Q – Queries – traditionally, search activity with search engines was based on typing terms into the search engine and getting a list of results. In recent years, search queries have become more complex and have progressed beyond users typing a keyword query into a search engine. Now users are asking questions and using more long-tail phrases, and the voice assistant can provide a response as long as it can understand them. In 2015, Google incorporated A.I. into their ranking algorithm with RankBrain. Google is becoming smarter and now looks at context, which means that it can tell from your search terminology or voice commands what you really want to see.
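
If you want to see how much of your own keyword data already looks like voice-style queries, a rough check is easy to script. The Python sketch below is only an illustration: the question-word list and the four-word cut-off for “long-tail” are assumptions I picked for the example, not an industry standard.

```python
QUESTION_WORDS = {"who", "what", "when", "where", "why", "how", "which"}
LONG_TAIL_MIN_WORDS = 4  # hypothetical cut-off chosen for this sketch


def describe_query(query: str) -> dict:
    """Flag whether a query is question-based and whether it is long-tail."""
    words = query.lower().split()
    # Reduce the first word to its base form, so "what's" counts as "what".
    first = words[0].strip("?,.!").split("'")[0] if words else ""
    return {
        "word_count": len(words),
        "is_question": first in QUESTION_WORDS,
        "is_long_tail": len(words) >= LONG_TAIL_MIN_WORDS,
    }


print(describe_query("pizza delivery"))
print(describe_query("where can I get pizza delivered right now"))
```

Run over an export of search queries, a check like this gives a quick feel for how many of them are conversational, question-based phrases rather than short typed keywords.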

R – Resonation – in terms of voice or speech, resonation is the ability to amplify vocal sound by way of vibration. Wikipedia describes vocal resonation with regards to human resonating chambers as: “The voice, like all acoustic instruments such as the guitar, trumpet, piano, or violin, has its own special chambers for resonating the tone. Once the tone is produced by the vibrating vocal cords, it vibrates in and through the open resonating ducts and chambers.”

S – Semantic Search – also referred to as Conversational Search – the act of a user speaking into a device and the device responding with full sentences. Conversational Search has natural language and semantic search built into it. Semantic Search is the act of searching for something based on a semantic map of related items. Semantic, by definition, means relating to meaning in language or logic, arising from the different meanings of words, topics or symbols. It is based on the study of meaning: for example, understanding the meaning of different word associations.

T – Text to Speech (TTS): Text to Speech technology or services attempt to understand text and natural language to generate a synthesized audio output complete with proper cadence and intonation.  TTS is technology that converts text to audio that is spoken by the system.

Tone – The actual vocal sound of a voice with regards to quality, strength and pitch. Tone can be referred to as the quality of your voice that results from the resonance of the tone that is initially produced in your larynx.

U – Utterance – also referred to as articulation with regards to the formation of clear and distinct sounds in speech.  Utterance can be described as a spoken word, statement or any vocal sound.

V – Voice recognition – the ability for technology or a device to understand a human voice and provide a response or carry out a command. While there are a lot of emerging technologies when it comes to speech recognition, uses of voice recognition range from Smart Speakers and virtual assistants to dictation tools and search engines. As an example, did you know that Google has a free speech recognition option as part of Google Docs? You have to be using the Google Chrome browser, but you can use this voice recognition capability to voice type in a Google doc. Simply log in to Google Docs (using the Chrome browser), click on the Tools menu and select Voice Typing.

W – Wake Word – the spoken word or phrase that “wakes up” an always-listening device. Google Assistant, for example, responds to two wake phrases: “OK Google” is the main one, or you can use “Hey Google.” Using Alexa as an example, to change your wake word in the Alexa app:

  1. Go to the menu and select settings
  2. Select your device
  3. Scroll down and select Wake Word
  4. Use the drop-down menu to select a wake word.
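
Conceptually, a wake word acts as a gate: nothing is treated as a command until the wake word is heard. The Python sketch below illustrates that idea on a text transcript. This is a simplification and an assumption on my part, since real devices listen for the wake word in the audio itself, and the function name is made up for the example.

```python
WAKE_WORDS = ("ok google", "hey google")  # the Google Assistant phrases above


def extract_command(transcript: str, wake_words=WAKE_WORDS):
    """Return the command that follows a wake word, or None if none was heard."""
    # Normalize case, commas and spacing so "Hey Google, ..." still matches.
    normalized = " ".join(transcript.lower().replace(",", " ").split())
    for wake_word in wake_words:
        if normalized == wake_word:
            return ""  # device woke up, but no command followed
        if normalized.startswith(wake_word + " "):
            return normalized[len(wake_word) + 1:]
    return None  # no wake word, so the device keeps ignoring the speech


print(extract_command("Hey Google, what's the weather today?"))
print(extract_command("what's the weather today?"))
```

The first call returns the command text after the wake word, while the second returns None because the speech never addressed the device.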

X – Xenoglossy – in regards to speech, Xenoglossy is the ability to speak (or write) a language that you’ve never formally learnt or acquired by natural means.  More at Wikipedia.

Y – Yodel – the process of carrying a tune with one’s voice. Yodeling is a form of singing that involves repeated and quick changes of pitch between lower register sounds and higher pitched registers.

Z – Zone – the area in which a voice-activated device is able to pick up audible sounds or voice commands.

New voice technology and devices are making it easier than ever for people to simply ask a question and get information from their device, whether it is a smartphone, smart speaker or virtual assistant. It is not just about Siri, Echo or Google Home: voice-powered technology is in our cars, elevators, homes, restaurants, airports and beyond. When you hear the phrases semantic search, conversational keywords, virtual assistants and voice-powered, what do you think about? AI is driving a new reality as voice-powered solutions are being leveraged across numerous industries. Voice Search is not a buzzword; it’s a new, efficient way of acquiring information and of interacting with a brand. Devices, search engines and AI are getting smarter. Today personal assistants provide users with information and suggestions; tomorrow these same assistants may be shopping for your groceries or filing your taxes.

It’s a brave new world for sure.

Mobile Search Stats

Source: Benu Aggarwal “Optimizing Content for Voice Search and Virtual Assistants.”

