Google is investigating the source of voice data leak, plans to update its privacy policies

Google has responded to a report this week from Belgian public broadcaster VRT NWS, which revealed that contractors were given access to Google Assistant voice recordings, including those which contained sensitive information — like addresses, conversations between parents and children, business calls and others containing all sorts of private information. As a result of the report, Google says it’s now preparing to investigate and take action against the contractor who leaked this information to the news outlet.

The company, by way of a blog post, explained that it partners with language experts around the world who review and transcribe a “small set of queries” to help Google better understand various languages.

Only around 0.2% of all audio snippets are reviewed by language experts, and these snippets are not associated with Google accounts during the review process, the company says. Other background conversations or noises are not supposed to be transcribed.

The leaker had listened to more than 1,000 recordings, and found 153 were accidental in nature — meaning, it was clear the user hadn’t intended to ask for Google’s help. In addition, the report found that determining a user’s identity was often possible because the recordings themselves would reveal personal details. Some of the recordings contained highly sensitive information, like “bedroom conversations,” medical inquiries or people in what appeared to be domestic violence situations, to name a few.

Google defended the transcription process as being a necessary part of providing voice assistant technologies to its international users.

But instead of focusing on its lack of transparency with consumers over who’s really listening to their voice data, Google says it’s going after the leaker themselves.

“[Transcription] is a critical part of the process of building speech technology, and is necessary to creating products like the Google Assistant,” writes David Monsees, product manager for Search at Google, in the blog post. “We just learned that one of these language reviewers has violated our data security policies by leaking confidential Dutch audio data. Our Security and Privacy Response teams have been activated on this issue, are investigating, and we will take action. We are conducting a full review of our safeguards in this space to prevent misconduct like this from happening again,” he said.

As voice assistant devices are becoming a more common part of consumers’ everyday lives, there’s increased scrutiny on how tech companies are handling the voice recordings, who’s listening on the other end, what records are being stored and for how long, among other things.

This is not an issue that only Google is facing.

Earlier this month, Amazon responded to a U.S. senator’s inquiry over how it was handling consumers’ voice records. The inquiry had followed a CNET investigation that discovered Alexa recordings were kept unless manually deleted by users, and that some voice transcripts were never deleted. In addition, a Bloomberg report recently found that Amazon workers and contractors during the review process had access to the recordings, as well as an account number, the user’s first name and the device’s serial number.

Further, a coalition of consumer privacy groups recently lodged a complaint with the U.S. Federal Trade Commission that claims Amazon Alexa is violating the U.S. Children’s Online Privacy Protection Act (COPPA) by failing to obtain proper consent over the company’s use of the kids’ data.

Neither Amazon nor Google have gone out of their way to alert consumers as to how the voice recordings are being used.

As Wired notes, the Google Home privacy policy doesn’t disclose that Google is using contract labor to review or transcribe audio recordings. The policy also says that data only leaves the device when the wake word is detected. But these leaked recordings indicate that’s clearly not true — the devices accidentally record voice data at times.

The issues around the lack of disclosure and transparency could be yet another signal to U.S. regulators that tech companies aren’t able to make responsible decisions on their own when it comes to consumer data privacy.

The timing of the news isn’t great for Google. According to reports, the U.S. Department of Justice is preparing for a possible antitrust investigation of Google’s business practices, and is watching the company’s behavior closely. Given this increased scrutiny, one would think Google would be going over its privacy policies with a fine-toothed comb — especially in areas that are newly coming under fire, like policies around consumers’ voice data — to ensure that consumers understand how their data is being stored, shared and used.

Google also notes today that people do have a way to opt-out of having their audio data stored. Users can either turn off audio data storage entirely, or choose to have the data auto-delete every three months or every 18 months.

The company also says it will work to better explain how this voice data is used going forward.

“We’re always working to improve how we explain our settings and privacy practices to people, and will be reviewing opportunities to further clarify how data is used to improve speech technology,” said Monsees.

Responses are currently closed, but you can trackback from your own site.

Comments are closed.