I think the image below is pretty self-explanatory. Google search data reveal the thoughts we keep in the closet, and they offer new insights into the human psyche. The New York Times best-seller Everybody Lies by Seth Stephens-Davidowitz describes intriguing projects that use Google Trends data to expose racism, hate speech, mental disorders and more.
What is also exciting is that Facebook recently released prophet (https://github.com/facebook/prophet), a forecasting tool designed to be easy to use. People without sophisticated knowledge of time series analysis can use it to make some quick predictions.
We know that Google search trends are characterized by seasonal patterns and unexpected events. As shown below, the search volume for “chocolate” peaks around Christmas, New Year’s Day and Valentine’s Day, just as we would predict.
Search volume for “chocolate”
On the Google Trends page, you can download the trend data as CSV files. That’s handy, but it becomes cumbersome when you need data for multiple terms, different time periods and different geographic locations. Luckily, the R library gtrendsR does the dirty work!
To install gtrendsR and Facebook’s prophet run:
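Assuming the packages come from CRAN, a minimal sketch of the install step could look like this (ggplot2 is added because it is loaded in the next step):

```r
# Assumption: all three packages are installed from CRAN.
install.packages(c("gtrendsR", "prophet", "ggplot2"))
```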
Now, let’s load the libraries.
library(gtrendsR)
library(ggplot2)
library(prophet)
Let’s look at the search volume for “quit smoking” in the US for the past five years.
# Search interest for "quit smoking" in the US; [[1]] is the interest-over-time table
trends_us <- gtrends(c("quit smoking"), geo = c("US"), gprop = "web", time = "2012-06-30 2017-06-30")[[1]]
# Plot the relative search volume over time
ggplot(data = trends_us, aes(x = date, y = hits)) +
  geom_line(size = 0.5, alpha = 0.7, aes(color = geo)) +
  geom_point(size = 0) +
  ylim(0, NA) +
  theme(legend.title = element_blank(), axis.title.x = element_blank()) +
  ylab("Hits") +
  ggtitle("Google Trend")
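One caveat when plotting: Google Trends reports very small values as "<1", and in that case the hits column may come back as text rather than numbers. A minimal sketch of a cleanup step, assuming we are happy to treat "<1" as 0.5 (the helper name clean_hits and the 0.5 value are illustrative choices, not part of gtrendsR):

```r
# Hypothetical helper: convert a Google Trends "hits" column to numeric.
# Assumption: "<1" is replaced by 0.5 (any value between 0 and 1 would do).
clean_hits <- function(hits, below_one = 0.5) {
  hits <- as.character(hits)
  hits[hits == "<1"] <- as.character(below_one)
  as.numeric(hits)
}

# Example with made-up values
clean_hits(c("12", "<1", "100"))
```

With the data above, this would be applied as trends_us$hits <- clean_hits(trends_us$hits) before plotting.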
How do the trends compare across the US, Canada, the UK and Australia?
# Same query for four countries; each country becomes one line in the plot
trends_global <- gtrends(c("quit smoking"), geo = c("US", "CA", "GB", "AU"), gprop = "web", time = "2012-06-30 2017-06-30")[[1]]
ggplot(data = trends_global, aes(x = date, y = hits, group = geo)) +
  geom_line(size = 1, alpha = 0.7, aes(color = geo)) +
  geom_point(size = 0) +
  ylim(0, NA) +
  theme(legend.title = element_blank(), axis.title.x = element_blank()) +
  ylab("Hits") +
  ggtitle("Google Trends in US, Canada, UK, and Australia")
It seems that people google the term most after major holidays (Thanksgiving, Christmas and New Year’s Day). See the peak in January? We might safely assume that quitting smoking is a frequent New Year’s resolution.
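To check the January peak with numbers rather than by eye, one option is to average the hits by calendar month. A minimal sketch in base R, assuming a data frame with a date column and a numeric hits column like the one gtrends returns (the helper name monthly_mean_hits is made up for illustration):

```r
# Hypothetical helper: mean search interest per calendar month (1 = January).
# Assumes `df` has a `date` column (Date or POSIXct) and a numeric `hits` column.
monthly_mean_hits <- function(df) {
  month <- as.integer(format(df$date, "%m"))
  out <- aggregate(df$hits, by = list(month = month), FUN = mean)
  names(out) <- c("month", "mean_hits")
  out[order(-out$mean_hits), ]  # busiest month first
}

# Example with made-up data in which interest is highest in January
toy <- data.frame(
  date = as.Date(c("2016-01-03", "2016-01-10", "2016-06-05", "2016-11-27")),
  hits = c(90, 100, 40, 60)
)
monthly_mean_hits(toy)
```

With the data above, the call would be monthly_mean_hits(trends_us); if the January resolution story holds, month 1 should appear at or near the top.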
Now we let prophet discern the patterns and make predictions.
# prophet expects a data frame with the columns ds (date) and y (value to forecast)
history <- trends_us[, c("date", "hits")]
colnames(history) <- c("ds", "y")
m <- prophet(history)
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## Initial log joint probability = -2.60747
## Optimization terminated normally:
## Convergence detected: relative gradient magnitude is below tolerance
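# Extend the dates 365 days past the end of the data, then predict for every date
future <- make_future_dataframe(m, periods = 365)
forecast <- predict(m, future)
# Plot the observed points together with the fitted and forecast values
plot(m, forecast)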
We can also break the forecast down into its components: the overall trend and the seasonal patterns prophet fitted, such as the yearly cycle.
prophet_plot_components(m, forecast)
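The plots are convenient, but the predicted values can also be read directly from the forecast data frame, whose yhat, yhat_lower and yhat_upper columns hold the point forecast and its uncertainty interval. A minimal sketch in base R, assuming we only want the rows beyond the last observed date (the helper name future_rows is made up for illustration):

```r
# Hypothetical helper: keep only the forecast rows after the last observed date.
# Assumes `forecast` has the columns prophet's predict() returns:
# ds, yhat, yhat_lower, yhat_upper.
future_rows <- function(forecast, last_observed) {
  keep <- as.Date(forecast$ds) > as.Date(last_observed)
  forecast[keep, c("ds", "yhat", "yhat_lower", "yhat_upper")]
}

# Example with a made-up forecast table standing in for predict(m, future)
toy_forecast <- data.frame(
  ds = as.Date("2017-06-28") + 0:4,
  yhat = c(50, 51, 52, 53, 54),
  yhat_lower = c(45, 46, 47, 48, 49),
  yhat_upper = c(55, 56, 57, 58, 59),
  trend = c(50, 50, 50, 50, 50)
)
future_rows(toy_forecast, "2017-06-30")
```

With the objects above, the call would be future_rows(forecast, max(trends_us$date)).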
This tutorial was developed for COMM497DB Fall 2017, taught at UMass-Amherst.
If you find this tutorial helpful and would like to use it in your projects, please acknowledge the source:
Xu, Weiai W. (2017). How to Detect Sentiments from Donald Trump’s Tweets?. Amherst, MA: http://curiositybits.com