Let’s learn how to pull data from Google Trends in R, using a package called gtrendsR.
For background, refer to "Google Trends data analysis in R".
1. First, let’s pull trends data for the keyword “オリンピック” (Japanese for “Olympics”), covering searches made in Japan over the last 12 months.
# import libraries
library(gtrendsR)
library(tidyverse)
library(ggrepel)
# get searches for "オリンピック" in Japan
trends_jp <- gtrends(keyword = c("オリンピック"), time = "today 12-m", geo = "JP") # returns a list of data frames
trends_jp_df <- trends_jp$interest_over_time
tail(trends_jp_df)
## date hits keyword geo time gprop category
## 47 2021-05-02 63 オリンピック JP today 12-m web 0
## 48 2021-05-09 67 オリンピック JP today 12-m web 0
## 49 2021-05-16 57 オリンピック JP today 12-m web 0
## 50 2021-05-23 82 オリンピック JP today 12-m web 0
## 51 2021-05-30 83 オリンピック JP today 12-m web 0
## 52 2021-06-06 100 オリンピック JP today 12-m web 0
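One thing to watch before plotting: as far as I know, gtrendsR can return the hits column as character instead of numeric when Google reports very low interest as "<1". The sketch below shows one way to guard against that. The helper name clean_hits and the choice to map "<1" to 0.5 are my own assumptions, and the toy data frame only imitates the shape of interest_over_time (the "<1" row is made up for illustration).

```r
# Hypothetical helper (not part of gtrendsR): make sure `hits` is numeric.
clean_hits <- function(df) {
  hits <- as.character(df$hits)
  hits[hits == "<1"] <- "0.5"  # assumption: treat "<1" as 0.5
  df$hits <- as.numeric(hits)
  df
}

# Toy data shaped like trends_jp$interest_over_time; the "<1" value is illustrative
toy <- data.frame(
  date = as.Date(c("2021-05-23", "2021-05-30", "2021-06-06")),
  hits = c("<1", "83", "100"),
  stringsAsFactors = FALSE
)

toy_clean <- clean_hits(toy)
str(toy_clean$hits)
```

With real data this would be called as clean_hits(trends_jp_df) right after extracting interest_over_time.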
2. Let’s take another example: trends data for the keyword “olympics”, covering searches made in Brazil over the last 12 months. We expect people in Brazil to be less interested in the Olympics than people in Japan.
# get searches for "olympics" in Brazil
trends_br <- gtrends(keyword = c("olympics"), time="today 12-m", geo="BR")
trends_br_df <- trends_br$interest_over_time
tail(trends_br_df)
## date hits keyword geo time gprop category
## 47 2021-05-02 33 olympics BR today 12-m web 0
## 48 2021-05-09 25 olympics BR today 12-m web 0
## 49 2021-05-16 31 olympics BR today 12-m web 0
## 50 2021-05-23 56 olympics BR today 12-m web 0
## 51 2021-05-30 57 olympics BR today 12-m web 0
## 52 2021-06-06 71 olympics BR today 12-m web 0
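A word of caution when reading the two tables together: Google Trends scales each request to 0–100 relative to that request's own peak, so a 100 in Japan and a 71 in Brazil describe the shape of each series, not the same search volume. The sketch below puts the last six weeks from the two outputs above side by side using base R. The column names hits_jp and hits_br are my own choice for illustration.

```r
# Last six weeks, copied from the tail() outputs above
weeks <- as.Date(c("2021-05-02", "2021-05-09", "2021-05-16",
                   "2021-05-23", "2021-05-30", "2021-06-06"))
jp <- data.frame(date = weeks, hits = c(63, 67, 57, 82, 83, 100))
br <- data.frame(date = weeks, hits = c(33, 25, 31, 56, 57, 71))

# Join on date; the suffixes tell the two `hits` columns apart
both <- merge(jp, br, by = "date", suffixes = c("_jp", "_br"))
both
```

Both series rise toward early June, which is the kind of comparison this view supports. Comparing the absolute numbers across the two columns would not be meaningful.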
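Now let’s create a chart for each country with ggplot2. In each chart, ggforce::facet_zoom() adds a second panel that zooms in on part of the main one, and ggrepel labels the peak. Before plotting, let’s check the five weeks with the most searches.
trends_jp_df %>%
  top_n(5, hits) %>%
  arrange(desc(hits))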
## date hits keyword geo time gprop category
## 1 2021-06-06 100 オリンピック JP today 12-m web 0
## 2 2021-05-30 83 オリンピック JP today 12-m web 0
## 3 2021-05-23 82 オリンピック JP today 12-m web 0
## 4 2021-05-09 67 オリンピック JP today 12-m web 0
## 5 2021-05-02 63 オリンピック JP today 12-m web 0
# plot weekly interest in Japan, with a zoomed panel for May to early June 2021
trends_jp_df %>%
  ggplot(aes(x = date, y = hits, group = keyword, color = keyword)) +
  theme_bw() +
  labs(title = "Google web searches for 'Olympics' in Japan over the past year",
       caption = "Note: the week of 2021-06-06 had the most searches",
       x = NULL, y = "Interest") +
  # second panel zooms in on 2021-05-01 to 2021-06-06
  ggforce::facet_zoom(xlim = c(as.POSIXct(as.Date("2021-05-01")),
                               as.POSIXct(as.Date("2021-06-06")))) +
  geom_smooth(span = 0.1, se = FALSE) +
  # dashed red line at 2020-10-04, the peak week of the Brazil series
  geom_vline(xintercept = as.POSIXct(as.Date("2020-10-04")),
             color = "red", lwd = 0.5, linetype = "dashed") +
  theme(legend.position = "none") +
  geom_point(color = "black") +
  # label the peak (hits == 100) with its date
  geom_label_repel(data = subset(trends_jp_df, hits == 100),
                   aes(label = as.character(date)),
                   size = 5,
                   box.padding = unit(0.35, "lines"),
                   point.padding = unit(0.3, "lines"))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning in sqrt(sum.squares/one.delta): NaNs produced
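The caption in the plot above is typed in by hand, so it goes stale whenever the data is pulled again. A small sketch of computing it from the data instead, in base R. The five rows are copied from the top-5 output above, and the variable names peak and peak_caption are only for illustration. Note that the real date column is POSIXct, while this toy version uses Date to keep things simple.

```r
# Top five weeks for Japan, copied from the output above
top_jp <- data.frame(
  date = as.Date(c("2021-06-06", "2021-05-30", "2021-05-23",
                   "2021-05-09", "2021-05-02")),
  hits = c(100, 83, 82, 67, 63)
)

# Row with the highest interest (first one if there are ties)
peak <- top_jp[which.max(top_jp$hits), ]

# Build the caption text from the peak date
peak_caption <- sprintf("Note: the week of %s had the most searches",
                        format(peak$date, "%Y-%m-%d"))
peak_caption
```

The resulting string can then be passed to labs(caption = peak_caption).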
trends_br_df %>%
  top_n(5, hits) %>%
  arrange(desc(hits))
## date hits keyword geo time gprop category
## 1 2020-10-04 100 olympics BR today 12-m web 0
## 2 2021-06-06 71 olympics BR today 12-m web 0
## 3 2020-06-28 61 olympics BR today 12-m web 0
## 4 2021-05-30 57 olympics BR today 12-m web 0
## 5 2021-05-23 56 olympics BR today 12-m web 0
# plot weekly interest in Brazil, with a zoomed panel for May to early June 2021
trends_br_df %>%
  ggplot(aes(x = date, y = hits, group = keyword, color = keyword)) +
  theme_bw() +
  labs(title = "Google web searches for 'Olympics' in Brazil over the past year",
       caption = "Note: the week of 2020-10-04 had the most searches",
       x = NULL, y = "Interest") +
  # second panel zooms in on 2021-05-01 to 2021-06-06
  ggforce::facet_zoom(xlim = c(as.POSIXct(as.Date("2021-05-01")),
                               as.POSIXct(as.Date("2021-06-06")))) +
  geom_smooth(span = 0.1, se = FALSE) +
  # dashed red line at the peak week, 2020-10-04
  geom_vline(xintercept = as.POSIXct(as.Date("2020-10-04")),
             color = "red", lwd = 0.5, linetype = "dashed") +
  theme(legend.position = "none") +
  geom_point(color = "black") +
  # label the peak (hits == 100) with its date
  geom_label_repel(data = subset(trends_br_df, hits == 100),
                   aes(label = as.character(date)),
                   size = 5,
                   box.padding = unit(0.35, "lines"),
                   point.padding = unit(0.3, "lines"))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning in sqrt(sum.squares/one.delta): NaNs produced
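The two plotting blocks differ only in the data frame and the text, so they could be folded into one function. Below is a rough sketch of that idea. plot_trend and its arguments are hypothetical names of mine, not part of ggplot2, ggforce, or ggrepel. The toy data reuses the last six Brazil rows printed earlier, so the labelled point here is the highest of those six rows, not the true peak of the full series.

```r
library(ggplot2)
library(ggrepel)

# Hypothetical helper: one zoomed trend chart for any interest_over_time data frame
plot_trend <- function(df, title, caption, zoom_from, zoom_to) {
  # row(s) with the highest interest in this data frame
  peak <- df[df$hits == max(df$hits), ]
  ggplot(df, aes(x = date, y = hits, group = keyword, color = keyword)) +
    theme_bw() +
    labs(title = title, caption = caption, x = NULL, y = "Interest") +
    # second panel zooms in on the chosen date range
    ggforce::facet_zoom(xlim = c(as.POSIXct(as.Date(zoom_from)),
                                 as.POSIXct(as.Date(zoom_to)))) +
    geom_smooth(span = 0.1, se = FALSE) +
    geom_point(color = "black") +
    geom_label_repel(data = peak, aes(label = as.character(date)), size = 5) +
    theme(legend.position = "none")
}

# Toy data: the last six weeks of the Brazil series from the tail() output
toy_br <- data.frame(
  date = as.POSIXct(as.Date(c("2021-05-02", "2021-05-09", "2021-05-16",
                              "2021-05-23", "2021-05-30", "2021-06-06"))),
  hits = c(33, 25, 31, 56, 57, 71),
  keyword = "olympics"
)

p_br <- plot_trend(toy_br,
                   title = "Google web searches for 'Olympics' in Brazil",
                   caption = "Toy data: last six weeks only",
                   zoom_from = "2021-05-16", zoom_to = "2021-06-06")
```

With the real data, the same function would be called once with trends_jp_df and once with trends_br_df, and print(p_br) draws the chart.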
To be continued.