Gauging Public Interest

Google Trends is a website that lets you assess the level of public interest in a particular topic across time and region by analysing search queries made through Google. Data professionals have used it to predict consumer demand for particular products (e.g. ANZ bank’s ANZ Housing Search Index, which aims to predict Australian house prices from Google Trends data). As a source, Google Trends offers some of the richest data available to analysts on public interest in any given topic.

gtrendsR

There are several ways to access Google Trends data, but the gtrendsR package makes the process very easy for someone who may not know where to begin.

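To begin, install the gtrendsR package and load the libraries:

#Install the packages once if they are not already available
#install.packages(c("gtrendsR", "ggplot2"))
library(gtrendsR)
library(ggplot2)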

Once you have done this, you can start to replicate what you see on Google Trends through R and get the relevant data for your analysis.

The key function in this package is gtrends, whose usage is shown below. Among other parameters, it lets you specify the search terms (keyword), the geographical location of the searches (geo), the time period you want data for (time), and the Google product you want to query (gprop):

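#gtrends function querying for data science searches in the US, for a 5 year period to today,
#in the English language, excluding low search volume regions
#gprop lists the available Google products; a single product is queried per call
#("web" is used when the full vector is passed, as here)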
gtrends(keyword = "data science", geo = "US", time = "today+5-y",
  gprop = c("web", "news", "images", "froogle", "youtube"),
  category = 0, hl = "en-US", low_search_volume = FALSE,
  cookie_url = "http://trends.google.com/Cookies/NID", tz = 0,
  onlyInterest = FALSE)

Position Titles in Australia

As a working example, if you wanted a quick snapshot of the search trends for various position titles in Australia over the last 12 months, you could use the following:

#Google search data for the position titles of data analyst, data scientist, statistician and business analyst
datascience <- gtrends(keyword = c("data analyst", "data scientist", "statistician", 
                                   "business analyst"), 
                     geo = "AU", time = "today 12-m", gprop = "web", 
                     low_search_volume = FALSE, onlyInterest = FALSE)
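Besides presets such as "today 12-m", the time argument also accepts a fixed date range written as two dates separated by a space. The sketch below shows one way to build such a string. The helper make_time_range is hypothetical (it is not part of gtrendsR), and the dates are only an illustration.

```r
# Hypothetical helper (not part of gtrendsR): builds a "start end" time string
make_time_range <- function(start, end = Sys.Date()) {
  start <- as.Date(start)
  end <- as.Date(end)
  # Google Trends data begins in 2004, so reject earlier start dates
  stopifnot(start >= as.Date("2004-01-01"), start < end)
  paste(format(start, "%Y-%m-%d"), format(end, "%Y-%m-%d"))
}

time_range <- make_time_range("2015-01-01", "2018-12-31")
print(time_range)

# The string can then be passed as the time argument (needs internet access):
# datascience_range <- gtrends(keyword = "data scientist", geo = "AU",
#                              time = time_range, gprop = "web")
```

Validating the dates before the query is sent avoids a wasted request when the range is reversed or starts before the data does.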

Trend Over Time

Depending on the specifics of your query, there are a number of aspects of this data that you may decide to explore. One of the key features of Google Trends data is that it goes back to 2004, so you can analyse interest in a topic over any period since then. For the position titles example, using data for the last 12 months from today:

#Select data frame from the Google Trends data looking at interest over time
time_trend <- datascience$interest_over_time


#The data includes date, relative searches, and keyword
head(time_trend)
##         date hits geo       time      keyword gprop category
## 1 2018-08-19   25  AU today 12-m data analyst   web        0
## 2 2018-08-26   41  AU today 12-m data analyst   web        0
## 3 2018-09-02   29  AU today 12-m data analyst   web        0
## 4 2018-09-09   23  AU today 12-m data analyst   web        0
## 5 2018-09-16   29  AU today 12-m data analyst   web        0
## 6 2018-09-23   30  AU today 12-m data analyst   web        0
#Plotting this data we can see the popularity for a particular position title through google 
#search in Australia
ggplot(data = time_trend) + aes(x = date, y = hits, color = keyword) + 
  geom_line() + scale_colour_viridis_d(option  = "viridis") + 
  labs(title = "Interest by Date", x = "Date", y = "Relative Interest") + 
  theme_minimal() + theme(legend.position = 'bottom')

You can see from the hits that business analyst and data analyst were searched more in Australia than data scientist or statistician during the last 12 months, with data analyst searches trending slightly upwards.
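One practical wrinkle before plotting: as far as I am aware, gtrendsR can return hits as text when some values are very low (reported as "<1"), which would stop ggplot2 from treating the column as numeric. The sketch below is a minimal clean-up under that assumption. The helper clean_hits, the placeholder value of 0.5, and the toy data frame are all hypothetical.

```r
# Toy data standing in for datascience$interest_over_time (values are made up)
toy_trend <- data.frame(
  date = as.Date(c("2018-08-19", "2018-08-26", "2018-09-02")),
  hits = c("25", "<1", "41"),
  keyword = "statistician",
  stringsAsFactors = FALSE
)

# Hypothetical helper: map "<1" to a placeholder value and convert to numeric
clean_hits <- function(hits, below_one = 0.5) {
  hits <- as.character(hits)
  is_low <- !is.na(hits) & hits == "<1"
  hits[is_low] <- as.character(below_one)
  as.numeric(hits)
}

toy_trend$hits <- clean_hits(toy_trend$hits)
str(toy_trend$hits)
```

If hits is already numeric, the helper returns it unchanged, so it is safe to apply to time_trend$hits in either case.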

Interest Across Australian States and Territories

The data returned by your query also lets you look at trends across a particular region. For example, you might be interested in whether interest in these position titles differs across Australian States and Territories:

#Select data frame from the Google Trends data looking at interest across Australia
region_trend <- datascience$interest_by_region


#The data includes Australian States or Territories, relative searches, and keyword
head(region_trend)
##                       location hits      keyword geo gprop
## 1 Australian Capital Territory  100 data analyst  AU   web
## 2                     Victoria   90 data analyst  AU   web
## 3              New South Wales   87 data analyst  AU   web
## 4            Western Australia   58 data analyst  AU   web
## 5                   Queensland   48 data analyst  AU   web
## 6              South Australia   46 data analyst  AU   web
#Plotting this data we can see the popularity for these position titles across Australia
ggplot(data = region_trend) + aes(x = location, fill = keyword, weight = hits) + 
  geom_bar() + scale_fill_brewer(palette = "RdYlBu") + 
  labs(title = "Interest by Region", x = "Region", y = "Relative Interest") + 
  theme_minimal() + coord_flip()

We can see from the hits above that Tasmania and the Northern Territory recorded no interest in these position titles, while New South Wales and Victoria have similar interest profiles. It is also interesting to note the distinct profile of the Australian Capital Territory, which shows interest in every position title except data scientist.
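A table can make the regional comparison easier to read than the stacked bars. The sketch below reshapes the long data into one row per region and one column per keyword using base R's xtabs. The toy data frame only mimics the layout of region_trend, and its values are made up.

```r
# Toy data standing in for datascience$interest_by_region (values are made up)
toy_region <- data.frame(
  location = c("Victoria", "New South Wales", "Victoria", "New South Wales"),
  keyword = c("data analyst", "data analyst", "data scientist", "data scientist"),
  hits = c(90, 87, 92, 100),
  stringsAsFactors = FALSE
)

# One row per region, one column per keyword; absent combinations show as 0
region_table <- xtabs(hits ~ location + keyword, data = toy_region)
print(region_table)
```

To use it on the real query, replace toy_region with region_trend.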

Interest Across Australian Major Cities

Another really useful feature of this package is that it lets you look at regional data at a more granular level. For this sample query, we can investigate further how searches for the position titles differ across Australian major cities:

#Select data frame from the Google Trends data looking at interest across Australian major cities
city_trend <- datascience$interest_by_city


#The data includes Australian major cities, relative searches, and keyword
head(city_trend)
##    location hits          keyword geo gprop
## 1 Melbourne  100     data analyst  AU   web
## 2    Sydney   99     data analyst  AU   web
## 3  Brisbane   53     data analyst  AU   web
## 4    Sydney  100   data scientist  AU   web
## 5 Melbourne   92   data scientist  AU   web
## 6    Sydney  100 business analyst  AU   web
#Plotting this data we can see the popularity for these position titles across Australian major cities
ggplot(data = city_trend) +  aes(x = location, fill = keyword, weight = hits) + 
  geom_bar() + scale_fill_brewer(palette = "RdYlBu") + 
  labs(title = "Searches by Major Cities", x = "City", y = "Relative Interest") + 
  theme_minimal() + coord_flip()

Here you can see the distinct search interest of Melbourne and Sydney, with 3 of the 4 position titles searched, while the remaining cities show interest only in the analyst positions. The granularity of the detail depends very much on your search. For example, if you searched for Marvel superheroes, you might get data for more regional cities (e.g. Townsville, Ballarat) that show interest in those keywords but not in the position titles selected above.
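In the sample output above, each keyword has its own city scoring 100, which suggests that hits are scaled within each keyword and are best compared city against city for the same keyword. Under that reading, a simple summary is the top city per keyword. The sketch below uses base R. The toy data frame copies a few rows from the head(city_trend) output shown earlier, and the name top_city is only illustrative.

```r
# Rows copied from the head(city_trend) output shown earlier
toy_city <- data.frame(
  location = c("Melbourne", "Sydney", "Brisbane", "Sydney", "Melbourne"),
  hits = c(100, 99, 53, 100, 92),
  keyword = c("data analyst", "data analyst", "data analyst",
              "data scientist", "data scientist"),
  stringsAsFactors = FALSE
)

# Split by keyword, then keep the row with the highest hits in each group
top_city <- do.call(rbind, lapply(split(toy_city, toy_city$keyword), function(d) {
  d[which.max(d$hits), c("keyword", "location", "hits")]
}))
rownames(top_city) <- NULL
print(top_city)
```

The same pattern works on region_trend, since both data frames share the location, hits, and keyword columns.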

Finally, beyond being interesting, this information has practical uses for whatever subject matter you care about. Google Trends data has been, and continues to be, used in data analytics projects across the globe, and there is scope to do more with it in many fields, so it is really useful to have it on hand through this very simple R package. There may be more sophisticated ways to get this data, but hopefully this is a good starting point for anyone interested.