Google Trends is a website that lets you assess the level of public interest in a particular topic across time and region by analysing search queries made through Google. Data professionals have used it to predict consumer demand for particular products (e.g. ANZ bank’s ANZ Housing Search Index, which aims to predict Australian house prices based on Google Trends data). As a source, Google Trends offers some of the richest data available to data analysts on public interest in any given topic.
There are several ways to access Google Trends data, but the gtrendsR package makes the process very easy for someone who may not know where to begin.
To begin, install the gtrendsR package and load libraries:
#Run once if the packages are not yet installed:
#install.packages(c("gtrendsR", "ggplot2"))
library(gtrendsR)
library(ggplot2)
Once you have done this, you can start to replicate what you see in Google Trends through R and get the relevant data for your analysis.
The key function within this package is gtrends, which has the usage below. Among other parameters, it lets you specify the keyword searched (keyword), the geographical location of the search (geo), the time period you want data for (time), and the Google product you want to query (gprop):
#gtrends usage, here querying data science searches in the US for the 5 years to today,
#in the English language, excluding low search volume regions; gprop lists the
#available Google products (the first, web, is the default)
gtrends(keyword = "data science", geo = "US", time = "today+5-y",
gprop = c("web", "news", "images", "froogle", "youtube"),
category = 0, hl = "en-US", low_search_volume = FALSE,
cookie_url = "http://trends.google.com/Cookies/NID", tz = 0,
onlyInterest = FALSE)
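The time argument in the usage above is not limited to relative windows such as "today+5-y". According to the gtrendsR documentation it also accepts a fixed span written as "Y-m-d Y-m-d". The sketch below builds such a string in base R. The helper name trends_time_range is hypothetical and not part of gtrendsR, and the gtrends call is left commented out because it needs an internet connection.

```r
# Hypothetical helper (not part of gtrendsR): builds a "Y-m-d Y-m-d" string
# suitable for the time argument of gtrends().
trends_time_range <- function(start, end) {
  start <- as.Date(start)
  end <- as.Date(end)
  # Google Trends data starts in 2004, and the range must run forwards
  stopifnot(start >= as.Date("2004-01-01"), start < end)
  paste(format(start, "%Y-%m-%d"), format(end, "%Y-%m-%d"))
}

range_2015_2018 <- trends_time_range("2015-01-01", "2018-12-31")
print(range_2015_2018)

# With gtrendsR loaded, the string could then be passed on (not run here):
# gtrends(keyword = "data science", geo = "US", time = range_2015_2018)
```

Validating the dates up front means a mistyped or reversed range fails locally instead of producing a confusing error from the remote query.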
As a working example, if you wanted a quick snapshot of the search trends for various position titles in Australia during the last 12 months, you could use the following:
#Google search data for the position titles of data analyst, data scientist, statistician and business analyst
datascience <- gtrends(keyword = c("data analyst", "data scientist", "statistician",
"business analyst"),
geo = "AU", time = "today 12-m", gprop = "web",
low_search_volume = FALSE, onlyInterest = FALSE)
Depending on the specifics of your query, there are a number of aspects of this data that you may decide to explore. One key feature of Google Trends data is that it goes back to 2004, so you can analyse interest in a topic over any period since then. For the position titles example, the data covers the last 12 months:
#Select data frame from the google trends data looking at interest over time
time_trend <- datascience$interest_over_time
#The data includes date, relative searches, and keyword
head(time_trend)
##         date hits geo       time      keyword gprop category
## 1 2018-08-19   25  AU today 12-m data analyst   web        0
## 2 2018-08-26   41  AU today 12-m data analyst   web        0
## 3 2018-09-02   29  AU today 12-m data analyst   web        0
## 4 2018-09-09   23  AU today 12-m data analyst   web        0
## 5 2018-09-16   29  AU today 12-m data analyst   web        0
## 6 2018-09-23   30  AU today 12-m data analyst   web        0
#Plotting this data we can see the popularity for a particular position title through google
#search in Australia
ggplot(data = time_trend) + aes(x = date, y = hits, color = keyword) +
geom_line() + scale_colour_viridis_d(option = "viridis") +
labs(title = "Interest by Date", x = "Date", y = "Relative Interest") +
theme_minimal() + theme(legend.position = 'bottom')
You can see from the search hits that business analyst and data analyst were searched more often in Australia than data scientist or statistician during the last 12 months, with data analyst searches trending slightly upwards.
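A comparison like this can also be checked numerically by averaging hits per keyword. One thing to watch for: in some queries the hits column may come back as text containing entries such as "<1", which has to be handled before averaging. The sketch below uses a small made-up data frame with the same columns as interest_over_time, so it runs offline. The values are illustrative only, and replacing "<1" with 0.5 is an arbitrary assumption.

```r
# Illustrative data only: same columns as interest_over_time, invented values
time_trend_demo <- data.frame(
  date = as.Date(rep(c("2018-08-19", "2018-08-26", "2018-09-02"), times = 2)),
  hits = c("25", "41", "29", "<1", "3", "2"),
  keyword = rep(c("data analyst", "statistician"), each = 3),
  stringsAsFactors = FALSE
)

# Treat "<1" as 0.5 (an arbitrary choice) so the column can be made numeric
hits_chr <- as.character(time_trend_demo$hits)
time_trend_demo$hits_num <- ifelse(
  hits_chr == "<1",
  0.5,
  suppressWarnings(as.numeric(hits_chr))
)

# Average relative interest per keyword over the period
mean_hits <- aggregate(hits_num ~ keyword, data = time_trend_demo, FUN = mean)
print(mean_hits)
```

The same two steps could be applied to time_trend from a real query. Keep in mind that hits are relative values scaled within the query, so the averages are only comparable between keywords fetched together.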
Given the data your query returns from Google, you may also want to look at trends across a particular region. For example, you might be interested in whether interest in these position titles differs across Australian States and Territories:
#Select data frame from the google trends data looking at interest across Australia
region_trend <- datascience$interest_by_region
#The data includes Australian States or Territories, relative searches, and keyword
head(region_trend)
##                       location hits      keyword geo gprop
## 1 Australian Capital Territory  100 data analyst  AU   web
## 2                     Victoria   90 data analyst  AU   web
## 3              New South Wales   87 data analyst  AU   web
## 4            Western Australia   58 data analyst  AU   web
## 5                   Queensland   48 data analyst  AU   web
## 6              South Australia   46 data analyst  AU   web
#Plotting this data we can see the popularity for these position titles across Australia
ggplot(data = region_trend) + aes(x = location, fill = keyword, weight = hits) +
geom_bar() + scale_fill_brewer(palette = "RdYlBu") +
labs(title = "Interest by Region", x = "Region", y = "Relative Interest") +
theme_minimal() + coord_flip()
We can see from the above search hits that neither Tasmania nor the Northern Territory registered any interest in these position titles, while New South Wales and Victoria have similar interest profiles. It is also interesting to note the unique profile of the Australian Capital Territory, which shows interest in all position titles except data scientist.
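Gaps like these are easier to spot in a wide table with one row per region and one column per keyword, where a missing combination shows up as NA. The sketch below uses base R's tapply on a small stand-in data frame. The three data analyst rows come from the head(region_trend) output above, while the data scientist values are invented purely for illustration.

```r
# Stand-in for region_trend: data analyst rows are from the output above,
# data scientist rows are invented for illustration
region_demo <- data.frame(
  location = c("Australian Capital Territory", "Victoria", "New South Wales",
               "Victoria", "New South Wales"),
  hits = c(100, 90, 87, 95, 100),
  keyword = c(rep("data analyst", 3), rep("data scientist", 2)),
  stringsAsFactors = FALSE
)

# One row per region, one column per keyword; combinations with no data are NA
region_wide <- tapply(
  region_demo$hits,
  list(region_demo$location, region_demo$keyword),
  sum
)
print(region_wide)
```

Running the same tapply call on the real region_trend would give the full table for all four position titles, provided the hits column is numeric.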
Another really useful aspect of this package is the ability to look at the data at a more granular level when it comes to regions. For this sample query, we can further investigate the differences in searches for the position titles across Australian major cities:
#Select data frame from the google trends data looking at interest across Australian major cities
city_trend <- datascience$interest_by_city
#The data includes Australian major cities, relative searches, and keyword
head(city_trend)
##    location hits          keyword geo gprop
## 1 Melbourne  100     data analyst  AU   web
## 2    Sydney   99     data analyst  AU   web
## 3  Brisbane   53     data analyst  AU   web
## 4    Sydney  100   data scientist  AU   web
## 5 Melbourne   92   data scientist  AU   web
## 6    Sydney  100 business analyst  AU   web
#Plotting this data we can see the popularity for these position titles across Australian major cities
ggplot(data = city_trend) + aes(x = location, fill = keyword, weight = hits) +
geom_bar() + scale_fill_brewer(palette = "RdYlBu") +
labs(title = "Searches by Major Cities", x = "City", y = "Relative Interest") +
theme_minimal() + coord_flip()
Here you can see the unique search interest of Melbourne and Sydney, with 3 of the 4 position titles searched, while the remaining cities show interest only in the analyst positions. The granularity of the detail very much depends on your search. For example, if you searched for Marvel superheroes, you might get data for more regional cities (e.g. Townsville, Ballarat), which may show interest in those keywords but not in the position titles selected above.
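To count how many position titles registered in each city without reading it off the chart, you can tally the distinct keywords per location. The sketch below does this in base R using only the six rows shown in the head(city_trend) output above, so its counts cover that excerpt and not the full result.

```r
# The six rows from head(city_trend) above, re-entered by hand
city_head <- data.frame(
  location = c("Melbourne", "Sydney", "Brisbane", "Sydney", "Melbourne", "Sydney"),
  hits = c(100, 99, 53, 100, 92, 100),
  keyword = c("data analyst", "data analyst", "data analyst",
              "data scientist", "data scientist", "business analyst"),
  stringsAsFactors = FALSE
)

# Number of distinct keywords that returned data for each city
keywords_per_city <- tapply(
  city_head$keyword,
  city_head$location,
  function(k) length(unique(k))
)
print(sort(keywords_per_city, decreasing = TRUE))
```

Applied to the full city_trend data frame, the same tally would show which cities returned data for which share of the keywords in the query.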
Finally, beyond being interesting, this information has practical uses whatever your subject matter. Much of this data has been, and continues to be, used in data analytics projects across the globe. There is scope to do more with it in many fields, so it is really useful to have it at hand through this very simple R package. There may be more sophisticated ways to get this data, but hopefully this is a good starting point for anyone interested.