Deng Liu


© 2024 Deng (Jerry) Liu

How Much Are Your Virtual Assistants Really Hearing?

2021-09-02

“Hey Siri, what’s the weather today?” “Alexa, set a timer for 20 minutes.” We’ve grown so accustomed to these conversational requests that we often don’t think twice about them. But behind the scenes, our virtual assistants are listening and recording far more than many people realize.

Just how frequently are our Echo, Google Home, and Siri-enabled devices recording ambient sounds, private conversations, and other snippets of audio that have nothing to do with our requests? You might be surprised at the sheer volume of recordings being made within your own home on a daily basis.

In this post, we’ll take a look at recent studies and statistics that shed light on the startling amount of audio virtual assistants are capturing each day. We’ll also explore the privacy implications of having an always-on listening device and whether users are fully aware of what’s being collected. The convenience of AI comes at an unexamined cost for many consumers. Join me as we delve into the eye-opening data around how much our helpful virtual assistants are actually hearing.

Virtual assistants like Apple’s Siri, Amazon Alexa, Google Assistant, and Microsoft Cortana have become ubiquitous in many homes. These AI-powered assistants use natural language processing to understand voice commands and complete tasks like setting alarms, answering questions, playing music, and controlling smart home devices.

The global virtual assistant market is enormous and growing - there are currently over 4 billion devices with built-in assistants worldwide, and market research predicts there will be over 8 billion by 2023. Alexa and Google Assistant currently have the most market share and active users, with Amazon reporting over 100 million Alexa devices sold by the end of 2019.

Virtual assistants rely on always-on listening in order to detect their programmed wake words like “Hey Siri” or “Alexa.” Small microphones built into devices are constantly listening for these specific words. Once the wake word is detected, the device records the user’s full command and uploads it to company servers for processing and to return the requested information.

Always-on listening modes are enabled by default on most virtual assistant devices today. This allows them to begin recording as soon as you say the wake word, without any manual activation needed. Companies emphasize that only a short, continuously overwritten buffer of audio is kept locally until the wake word is detected - nothing is recorded or transmitted without a deliberate command.
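The wake-word flow described above - a small local buffer plus a capture-and-upload step - can be sketched in a few lines. This is a simplified simulation over a stream of already-transcribed words; the wake word, buffer size, and pause marker here are illustrative assumptions, not any vendor's actual implementation:

```python
from collections import deque

WAKE_WORD = "alexa"  # illustrative wake word

def process_stream(words, buffer_len=4):
    """Simulate always-on listening over a stream of transcribed words.

    Ambient sound stays in a short rolling buffer that is continuously
    overwritten. Only when the wake word appears does the device start
    capturing; the command that follows is what would be uploaded.
    """
    buffer = deque(maxlen=buffer_len)  # pre-wake audio, never uploaded
    captured = []                      # snippets that would be sent to servers
    recording = False
    command = []
    for w in words:
        if recording:
            command.append(w)
            if w == "<pause>":         # end of utterance (simplified)
                captured.append(" ".join(command[:-1]))
                recording = False
                command = []
        elif w == WAKE_WORD:
            recording = True           # wake word detected: start capturing
        else:
            buffer.append(w)           # ambient noise only fills the local buffer
    return captured

snippets = process_stream(
    ["tv", "noise", "alexa", "set", "a", "timer", "<pause>", "more", "chatter"]
)
print(snippets)  # ['set a timer']
```

Note that in this model the TV noise before the wake word and the chatter after the command never leave the device - which is exactly the behavior companies claim, and exactly what accidental wake-word triggers undermine.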

However, always-on listening means these devices are continuously monitoring sound environments and potentially capturing more than users intend even before the wake word is spoken. We’ll explore next how much ambient sound is actually being recorded daily by virtual assistants while in waiting mode.

Recent studies reveal just how frequently our virtual assistants are recording audio in our homes:

  • An MIT study found Google Home devices were activated 1,568 times over a two-week period in one household - an average of over 100 times per day.

  • A separate study out of Northeastern University saw Amazon Alexa-enabled devices recording as many as 1,700 snippets of audio in one week in a single home.

  • Based on these studies, it’s estimated that the average Alexa device records about 83 audio snippets per day starting from when the wake word is detected.

  • With over 100 million Alexa devices out there, that’s billions of recordings happening every day around the world.
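Scaling the per-device estimate to the installed base is what produces the "billions per day" figure; a quick back-of-envelope check:

```python
# Back-of-envelope: scale the per-device estimate to the installed base.
snippets_per_device_per_day = 83   # estimated average, per the studies above
alexa_devices = 100_000_000        # Amazon's reported sales through 2019

daily_recordings = snippets_per_device_per_day * alexa_devices
print(f"{daily_recordings:,}")  # 8,300,000,000 - roughly 8.3 billion per day
```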

It’s not just intentional queries that get recorded. Virtual assistants also pick up ambient background noise, accidental wake word triggers, and private conversations not meant for the device:

  • Accidental triggers: One study found Alexa was accidentally activated over 700 times in one week in a home due to words sounding similar to “Alexa.”

  • Background noise: Recordings include TV sounds, music, appliances beeping, people talking in the background before a request.

  • Private conversations: Sensitive talk unrelated to requests can be inadvertently recorded if within range of the device.

While companies maintain that only small clips are kept prior to the wake word, the total amount of audio collected inside users' homes each day is likely far higher than they expect.

While virtual assistants rely heavily on AI, companies do have human reviewers listen to some audio recordings to improve speech recognition and accuracy.

  • Amazon has thousands of employees around the world listening to Alexa recordings to transcribe and annotate them.

  • Other companies like Apple and Google also have reviewers analyze snippets of Siri and Assistant recordings.

  • It’s estimated tech companies review between 0.1% and 0.2% of total recordings - roughly 1 out of every 500 to 1,000 clips.

  • Critics argue this violates user privacy, even if anonymized and randomized. Some lawsuits have been filed.

  • Tech companies argue limited human review improves product functionality and user experience overall.

So while the chances might be low, there is a possibility sensitive requests and conversations could end up being reviewed by an actual person beyond just algorithm analysis.

Despite the vast amount of recordings happening, most users are unaware:

  • Surveys show over 70% of smart speaker owners don’t realize recordings are reviewed by humans.

  • Over 90% underestimate the amount of data collected by virtual assistants.

  • 67% were uncomfortable when told how much audio is actually captured when they use a virtual assistant.

  • Many falsely believe devices only record with manual activation or only after hearing the wake word.

In reality, virtual assistants are triggered far more frequently than users perceive based on always-on listening defaults and accidental activations. There is a clear gap between user assumptions and the actual reality of how much audio these devices capture daily.

The prevalence of always-on listening has concerning privacy implications:

  • Constant audio collection amasses detailed data profiles on individuals and households, including conversations and habits.

  • This data could be exploited to target users with personalized ads or content.

  • Virtual assistant data represents a tempting target for hackers looking to steal private information.

  • Users lack transparency and control over how many recordings are stored long-term and which ones are reviewed.

  • Unauthorized third parties could gain access to the data.

  • Minimal repercussions or liability for tech companies if user data is misused or breached.

Many privacy advocates argue consumers are sacrificing personal privacy for convenience without fully understanding or consenting to the scale of data collection.

On the other side, tech companies argue:

  • The benefits and utility of virtual assistants outweigh privacy risks for most consumers.

  • They only keep and review a tiny fraction of recordings to improve products.

  • Data practices comply with regulations and user agreements.

  • No evidence of systemic data misuse or breaches.

  • Users willingly choose to use virtual assistants knowing they are always listening.

  • Recordings don’t represent a security issue since they are anonymized and encrypted.

They emphasize that always-on functionality is vital to delivering the convenience users expect. The privacy tradeoff is seen as reasonable compared to usefulness for most people.

Virtual assistants undeniably offer great utility and convenience for millions of consumers. However, their always-on listening capabilities mean our devices are recording far more audio data each day than most users realize or are comfortable with.

While companies defend their practices as crucial for functionality and only retaining a small fraction of clips, the sheer volume of daily recordings represents a trove of private data largely beyond user control. With the meteoric rise of voice AIs, we need greater transparency around their data collection practices and more options to limit how much audio is stored long-term.

Achieving the right balance between AI convenience and privacy protection remains an ongoing challenge. As virtual assistants continue permeating our homes, we must fully examine their always-on listening habits and hold companies accountable for prioritizing user consent. How much are you comfortable with these devices hearing? Consumers must be informed in order to make that decision.