Lecture One: Mapping Public Attention with Google Trends

Vincentwx

December 30, 2018

Using Google Trends to Map Public Attention

Public attention
Public attention is a cognitive concept, distinct from the affective concept of public opinion. It denotes the time, emotion, and other mental resources that citizens of a society dedicate to thinking about a set of publicly debated issues (Newig, 2004).
Related concepts: public interest, public concern, etc.
Understanding public attention to various social issues has both theoretical and practical implications.

Web Queries as a Measure of Public Attention
Search queries have become a valuable source of information about the issues, concerns, and desires on the public's mind. Google Trends is a useful instrument for capturing public attention: it shows how relative search interest in any given keyword has changed since 2004.
Public interest towards artificial intelligence
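
The word "relative" matters here: Google Trends reports an index from 0 to 100 rather than raw search counts. As a rough sketch of the idea (not Google's actual pipeline, which also involves sampling), the keyword's share of all searches in each period is computed and then rescaled so that the peak period equals 100. The function name `scale_to_index` and all numbers below are made up for illustration.

```r
# Hypothetical helper: convert raw counts into a 0-100 relative index
scale_to_index <- function(keyword_searches, total_searches) {
  share <- keyword_searches / total_searches  # keyword's share of all searches
  round(100 * share / max(share))             # rescale so the peak is 100
}

# Made-up counts for four periods
keyword_searches <- c(120, 300, 450, 900)
total_searches   <- c(10000, 20000, 30000, 45000)

index <- scale_to_index(keyword_searches, total_searches)
print(index)
```

Note that the second and third periods receive the same index even though the raw counts differ, because the keyword's share of all searches is the same in both.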

The search engine captures trends in interests and behavior in a unique way. Hidden racism, sexual orientation, advertising returns: see the work of Seth Stephens-Davidowitz for inspiration on the potential of Google Trends data.

This tutorial shows how Google Trends data can be retrieved directly into R using the package gtrendsR, and how to gain insight into public attention through simple analysis.

if (!require("gtrendsR")) install.packages("gtrendsR")
library(gtrendsR)

#define the search keywords
keywords <- "artificial intelligence"
#define the time window
time = "all" #"all" means get data since the beginning of Google Trends (2004)

trends = gtrends(keywords, time = time)
plot(trends)

Another important feature of Google Trends is that it allows you to compare the popularity of several keywords. For example, the graph below shows that the search term “Black Panther” was much more popular than “Crazy Rich Asians” among Google Search users during the selected year.

#define multiple keywords
keywords = c("Black Panther","Crazy Rich Asians")
#define the time window (start date and end date, separated by a space)
time = ("2017-12-30 2018-12-30")
#define the channel
channel = 'web' #other channels include 'news', 'images', 'youtube'
#use '?gtrends' to see the full description of the arguments

trends = gtrends(keywords, gprop = channel, time = time)
plot(trends)
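
When comparing keywords, it can help to summarise each series numerically rather than relying on the plot alone. The sketch below runs offline: `toy` is a made-up data frame that mimics the `keyword` and `hits` columns of `trends$interest_over_time`, and `clean_hits()` is a helper name introduced here, not part of gtrendsR. It assumes that very low values may arrive as the string "<1" (which turns the whole column into text) and recodes them as 0.5, an arbitrary choice.

```r
# Hypothetical helper: turn a hits column into numbers
clean_hits <- function(hits) {
  hits <- as.character(hits)
  hits[hits == "<1"] <- "0.5"  # arbitrary stand-in for "below 1"
  as.numeric(hits)
}

# Toy data shaped like trends$interest_over_time (values are made up)
toy <- data.frame(
  keyword = rep(c("Black Panther", "Crazy Rich Asians"), each = 4),
  hits    = c("10", "100", "40", "10", "<1", "4", "12", "3"),
  stringsAsFactors = FALSE
)
toy$hits <- clean_hits(toy$hits)

# Mean relative interest per keyword
mean_hits <- tapply(toy$hits, toy$keyword, mean)
print(mean_hits)
```

With real data, the same two steps could be applied to `trends$interest_over_time` after the call to `gtrends()`.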

Case study: Dengue fever in Singapore

Here we try to replicate the study by Althouse, Ng, & Cummings (2011), "Prediction of dengue incidence using search query surveillance", published in PLoS Neglected Tropical Diseases, using data from Google Trends. Google Trends became popular in research largely because of the study by Ginsberg et al. (2009), "Detecting influenza epidemics using search engine query data", published in Nature. Broadly, these studies suggest that search engine query data can be used to detect influenza or dengue epidemics faster and more efficiently than monitoring data from hospitals. Here, we examine the relationship between public attention and reported dengue cases.
First, we map the Google Trends series for “dengue” from 2015 through 2017:

keywords = c("dengue")
time = ("today+5-y") #past five years
geo = "SG" 
channel = 'web'

dengue = gtrends(keywords, gprop = channel, geo = geo, time = time)

dengue_trend = dengue$interest_over_time
dengue_trend = dengue_trend[dengue_trend$date > "2015-01-01" & dengue_trend$date < "2017-12-31",] #keep the weeks from 2015 through 2017
head(dengue_trend)
##                   date hits keyword geo gprop category
## 52 2015-01-04 01:00:00   25  dengue  SG   web        0
## 53 2015-01-11 01:00:00   25  dengue  SG   web        0
## 54 2015-01-18 01:00:00   33  dengue  SG   web        0
## 55 2015-01-25 01:00:00   27  dengue  SG   web        0
## 56 2015-02-01 01:00:00   28  dengue  SG   web        0
## 57 2015-02-08 01:00:00   23  dengue  SG   web        0

Second, we download the following data file, which contains the weekly number of dengue cases for the same period.
Download denguecases_sg.csv

(source: DATA.GOV.SG, NEA.GOV.SG).

Import the file into R and plot both series:

dengueSG = read.csv("denguecases_sg.csv",header=TRUE)
head(dengueSG) #the data has two columns: the first holds the week labels, the second the weekly number of dengue cases in SG
##      weeks dengue_cases
## 1 2015-W01          256
## 2 2015-W02          228
## 3 2015-W03          237
## 4 2015-W04          259
## 5 2015-W05          212
## 6 2015-W06          173
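par(mfrow=c(2,1)) #stack the two plots vertically
plot(dengue_trend$date,dengue_trend$hits,type="l",main="Search volume of dengue (2015-2017)")
#weeks holds text labels such as "2015-W01", so plot the cases against the week index instead
plot(seq_along(dengueSG$dengue_cases),dengueSG$dengue_cases,type="l",xlab="week index",main="Reported cases of dengue (2015-2017)")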
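
The `weeks` column stores labels such as 2015-W01 rather than dates, so it cannot be used directly as a numeric axis. A minimal sketch of one way to split such labels into a year and a week number is shown below. The vector `weeks` is a small made-up sample in the same format, and the `week_axis` formula assumes 52 weeks per year, so positions in 53-week years will be slightly off.

```r
# Sample labels in the "YYYY-Www" format used by the weeks column
weeks <- c("2015-W01", "2015-W02", "2015-W52", "2016-W01")
weeks <- as.character(weeks)  # guards against the column being read as a factor

year <- as.integer(substr(weeks, 1, 4))  # characters 1-4: the year
week <- as.integer(substr(weeks, 7, 8))  # characters 7-8: the week number

# Approximate position on a continuous time axis (assumes 52 weeks per year)
week_axis <- year + (week - 1) / 52
print(data.frame(weeks, year, week, week_axis))
```

Applied to the real file, `week_axis` computed from `dengueSG$weeks` could serve as the x-axis for the case counts.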

We see a synchronous pattern between Google Trends and reported dengue fever cases. Next, we test the relationship between the volume of public attention to dengue and the number of dengue fever cases:

cor.test(dengue_trend$hits,dengueSG$dengue_cases)
## 
##  Pearson's product-moment correlation
## 
## data:  dengue_trend$hits and dengueSG$dengue_cases
## t = 22.424, df = 154, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8322344 0.9073409
## sample estimates:
##       cor 
## 0.8749509

The significant and strong correlation between Google Trends volume and reported cases is consistent with the result of Althouse, Ng, & Cummings (2011): Google Trends data could serve as an indicator of dengue fever outbreaks.
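
Because the studies above are about early detection, a natural follow-up question is whether searches move ahead of cases. One way to explore this is a lagged cross-correlation with base R's `ccf()`. The sketch below uses simulated data only: `searches` and `cases` are made-up series constructed so that searches lead cases by two weeks, and it is not part of the original analysis.

```r
set.seed(1)
n <- 150
signal <- rnorm(n + 2)                     # made-up weekly signal

# cases[t] equals searches[t - 2] plus noise, so searches lead by two weeks
searches <- signal[3:(n + 2)]
cases    <- signal[1:n] + rnorm(n, sd = 0.1)

# ccf(x, y) at lag k estimates the correlation between x[t + k] and y[t];
# a peak at a negative lag means the first series leads the second
cc <- ccf(searches, cases, lag.max = 8, plot = FALSE)
best_lag <- as.vector(cc$lag)[which.max(as.vector(cc$acf))]
print(best_lag)
```

With the real data, the same call could be tried as `ccf(dengue_trend$hits, dengueSG$dengue_cases)`, provided the two series cover the same weeks in the same order.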

Plotting the data in one step

The package can also plot the data directly, which is convenient if you just want to test some keywords.

#plot the past five years' public attention to mental health and house rent
plot(gtrendsR::gtrends(keyword = c("depression","anxiety","house rent"), geo = "HK", time = "today+5-y"))

#interest in movies in the past 12 months
plot(gtrendsR::gtrends(keyword = c("black panther","crazy rich asians","ant man"), geo = "SG", time = "today 12-m"))

References:

  1. Datacareer. Analyzing Google Trends with R: Retrieve and plot with gtrendsR.
  2. Seth Stephens-Davidowitz.
  3. Ginsberg et al. (2009). Detecting influenza epidemics using search engine query data. Nature.
  4. Althouse, B.M., Ng, Y.Y., & Cummings, D.A. (2011). Prediction of dengue incidence using search query surveillance. PLoS Neglected Tropical Diseases, 5(8), e1258.