Parsing Google Voice Search

Continuing our discussion of Google Takeout data (available to download for your personal Google account, or obtainable via warrant), we’re going to take a look at the voice searches done on a smartphone.

If a user speaks into the microphone in the Google widget or the search bar on their phone, or uses the Assistant to create a reminder or search for nearby locations, the data is kept in a text format that is available in the Takeout data.

The artifact parsed by my script is found within the Takeout>My Activity>Search>MyActivity.html file. This file should not be confused with what appears to be an older artifact found in Takeout>My Activity>Voice and Audio>MyActivity.html. (Interestingly enough, there are some voice recordings in that Voice and Audio folder.)

The Search MyActivity.html file holds a collection of all Google Assistant activity, including incoming calls, tips, pop-up notifications, and voice searches. Each activity has a correlated timestamp.

The download from Google Takeout includes an HTML file that can be parsed with Python. Unfortunately, the HTML does not have tags that make separating the voice search data easy. However, searching the data for entries beginning with the word “Said” and then reporting out the following three lines will print most entries cleanly, along with their associated timestamps.
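To make the approach concrete, here is a minimal sketch of the idea. It is not the script referenced in this post. It assumes that each piece of visible text in the HTML can be treated as one "line", and that a voice search shows up as a line starting with "Said" followed by the lines of interest. The names TextLines and voice_searches are hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser


class TextLines(HTMLParser):
    """Collect the non-empty text nodes of an HTML document, in order."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def handle_data(self, data):
        # Each run of text between tags is treated as one "line".
        text = data.strip()
        if text:
            self.lines.append(text)


def voice_searches(html, marker="Said", following=3):
    """Return each line starting with `marker` plus the lines after it.

    Assumption: the search text and timestamp fall within the
    `following` lines after the marker. Entries near the end of the
    file may come back shorter than expected.
    """
    parser = TextLines()
    parser.feed(html)
    parser.close()
    lines = parser.lines

    hits = []
    for i, line in enumerate(lines):
        if line.startswith(marker):
            hits.append(lines[i:i + 1 + following])
    return hits
```

To try it, read the Search MyActivity.html file into a string (for example with open(path, encoding="utf-8").read()), pass it to voice_searches, and print each returned group. Because the match is purely textual, any activity whose text happens to begin with "Said" will also be reported, so the output is worth a manual review.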

The Python script will print all voice searches and their timestamps. Code is found here:
