With the coronavirus now affecting daily life around the world, I wanted to show when panic peaked in different countries by collecting internet search data for the term “coronavirus” and for related terms.
I describe each step before showing the code for it.
To begin, I loaded the packages I needed. For this kind of research, the key one is “trendyy,” which retrieves Google Trends data on internet searches.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0 ✔ purrr 0.3.0
## ✔ tibble 2.0.1 ✔ dplyr 0.7.8
## ✔ tidyr 0.8.2 ✔ stringr 1.3.1
## ✔ readr 1.3.1 ✔ forcats 0.3.0
## ── Conflicts ────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(DT)
library(trendyy)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
The second step was simple: I passed the term “coronavirus” to the trendy() function from the trendyy package. This returned the raw search data.
coronavirus <- trendy("coronavirus")
## Warning in system("timedatectl", intern = TRUE): running command 'timedatectl'
## had status 1
coronavirus %>%
  get_interest() %>%
  glimpse()
## Observations: 260
## Variables: 6
## $ date <dttm> 2015-05-03, 2015-05-10, 2015-05-17, 2015-05-24, 2015-05-31,…
## $ hits <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ keyword <chr> "coronavirus", "coronavirus", "coronavirus", "coronavirus", …
## $ geo <chr> "world", "world", "world", "world", "world", "world", "world…
## $ gprop <chr> "web", "web", "web", "web", "web", "web", "web", "web", "web…
## $ category <chr> "All categories", "All categories", "All categories", "All c…
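Because the raw data is hard to read, my next step was to plot it as a line graph of Google search interest in the term “coronavirus.” No geography was specified in this first query, so the data is worldwide (the geo column above reads "world"), and hits is Google Trends' relative interest score rather than a raw count of searches. As you can see, searches rose at the very beginning of 2020 and then peaked around March.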
coronavirus %>%
  get_interest() %>%
  ggplot(aes(x = date, y = hits)) +
  geom_line() +
  theme_minimal() +
  labs(title = "Google Searches for Coronavirus")
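To confirm the peak without reading it off the plot, one option is to filter for the maximum value. The sketch below uses a small invented data frame (toy_interest is hypothetical, not real Google Trends data) with the same date and hits columns that get_interest() returns above.

```r
library(dplyr)

# Hypothetical stand-in for the output of get_interest(): weekly dates and
# relative hits. These values are invented for illustration only.
toy_interest <- data.frame(
  date = as.Date(c("2020-02-16", "2020-03-01", "2020-03-15", "2020-03-29")),
  hits = c(12, 45, 100, 60)
)

# Keep the row(s) with the highest relative interest.
peak_week <- toy_interest %>%
  filter(hits == max(hits))

peak_week
```

With the real data, the same filter() call could be placed directly after get_interest().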
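Because the coronavirus only became widely discussed in 2020, I thought it would be best to summarize the data by month rather than across years. Note that month() pools each calendar month across every year in the data, so the 2020 values are averaged together with the near-zero values from earlier years. Even so, it is now easier to see that searches peaked in March.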
coronavirus %>%
  get_interest() %>%
  mutate(month = month(date)) %>%
  group_by(month) %>%
  summarize(hits_per_month = mean(hits)) %>%
  ggplot(aes(x = month, y = hits_per_month)) +
  geom_line() +
  # month is numeric, so use a continuous scale with one break per month
  scale_x_continuous(breaks = 1:12)
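Grouping by month(date) behaves differently from grouping by a full year and month when the data spans several years. The sketch below uses invented values (toy is hypothetical, not real search data) to show that month() pools the same calendar month across years, whereas lubridate's floor_date() keeps each year's month separate.

```r
library(dplyr)
library(lubridate)

# Invented weekly values: two weeks in March 2019 and two in March 2020.
toy <- data.frame(
  date = as.Date(c("2019-03-03", "2019-03-10", "2020-03-01", "2020-03-08")),
  hits = c(1, 1, 80, 100)
)

# month() keeps only the calendar month, so both Marches fall in one group.
pooled <- toy %>%
  group_by(month = month(date)) %>%
  summarize(hits_per_month = mean(hits))

# floor_date() rounds down to the first of the month and keeps the year,
# so March 2019 and March 2020 stay in separate groups.
by_year_month <- toy %>%
  group_by(month = floor_date(date, "month")) %>%
  summarize(hits_per_month = mean(hits))

pooled
by_year_month
```

Which grouping is appropriate depends on the question: pooling answers "which calendar month is busiest on average," while the year-and-month grouping preserves the timeline.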
To narrow the time frame further, I restricted the search to the US, set the start date to November 1st, 2019, and set the end date to the day I completed this research, April 18th, 2020.
coronavirus_US <- trendy("coronavirus", geo = "US", from = "2019-11-01", to = "2020-04-18")
Piping the result of get_interest_dma() into datatable() produces an interactive table that shows the hits for each location (designated market area) in the US.
coronavirus_US %>%
  get_interest_dma() %>%
  datatable()
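The table can be sorted interactively, but the locations with the most interest can also be pulled out in code. The sketch below is only an illustration: toy_dma is an invented data frame, and the column names location and hits are my assumption about what get_interest_dma() returns.

```r
library(dplyr)

# Hypothetical stand-in for get_interest_dma() output. The column names
# (location, hits) and all values are assumptions made for illustration.
toy_dma <- data.frame(
  location = c("Area A", "Area B", "Area C", "Area D"),
  hits = c(74, 100, 88, 61),
  stringsAsFactors = FALSE
)

# Sort from highest to lowest interest and keep the top three locations.
top_areas <- toy_dma %>%
  arrange(desc(hits)) %>%
  head(3)

top_areas
```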
Having looked at the peak of the coronavirus panic in the US, I wanted to compare it with another location that has also been greatly affected, so I changed the geography to include both the US and Italy. As you likely know, Italy was hit incredibly hard by the virus, and its searches show a higher peak than those in the US.
coronavirus_countries <- trendy("coronavirus", geo = c("US", "IT"), from = "2019-11-01", to = "2020-04-18")
coronavirus_countries %>%
  get_interest() %>%
  # floor_date() keeps the year, so November 2019 sorts before January 2020
  mutate(month = floor_date(date, "month")) %>%
  group_by(month, geo) %>%
  summarize(hits_per_month = mean(hits)) %>%
  ggplot(aes(x = month, y = hits_per_month, color = geo)) +
  geom_line() +
  theme_minimal() +
  labs(title = "Internet searches for 'Coronavirus' over time, by country")
Now that the term “coronavirus” has been examined, it is time to look at a term that carries the same meaning. I hear many people call the disease “covid,” so I selected that as my related term. As you can see, adding the related term did not change the picture much: the two lines look almost identical.
coronavirus_covid <- trendy(c("coronavirus", "covid"), geo = "US")
coronavirus_covid %>%
  get_interest() %>%
  ggplot(aes(x = date, y = hits, color = keyword)) +
  geom_line()
In the previous graph, the two keywords tracked each other so closely that they looked like a single line. Next, I narrowed the search to images only. To do this, I set the gprop argument of trendy() to "images". As you can see, there were many more image searches for “coronavirus” than for “covid.”
coronavirus_covid_images <- trendy(c("coronavirus", "covid"), gprop = "images")
coronavirus_covid_images %>%
  get_interest() %>%
  ggplot(aes(x = date, y = hits, color = keyword)) +
  geom_line() +
  labs(title = "'Coronavirus' vs. 'Covid' Google image searches over time")
For my last query, I wanted to compare searches for depression with searches for the coronavirus. Because of state lockdowns and self-isolation, many people have spoken out about struggling with their mental health, and I thought it would be interesting to see whether the two peaked at the same time.
coronavirus_depression <- trendy(c("coronavirus", "depression"), geo = "US")
As you can see, there is a spike in searches for “depression” at the beginning of March, at the same time as the peak in coronavirus searches. Correlation does not imply causation, but the simultaneous peaks could be related to the conditions we are currently living under.
coronavirus_depression %>%
  get_interest() %>%
  mutate(month = month(date)) %>%
  group_by(month, keyword) %>%
  summarize(hits_per_month = mean(hits)) %>%
  ggplot(aes(x = month, y = hits_per_month, color = keyword)) +
  geom_line() +
  # month is numeric, so use a continuous scale with one break per month
  scale_x_continuous(breaks = 1:12) +
  theme_minimal() +
  labs(title = "Internet searches for 'coronavirus' and 'depression' by month")
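The correlation mentioned above can also be put into a number rather than judged by eye. The sketch below is a minimal illustration on invented monthly values (toy_long is hypothetical, not the real search data): it reshapes the long table so that each keyword has its own column and then computes a Pearson correlation with cor(). I use tidyr's spread() here because it is available in older tidyr versions; newer versions also offer pivot_wider().

```r
library(tidyr)

# Invented monthly averages in the same long shape as the summarized data:
# one row per month and keyword. Values are for illustration only.
toy_long <- data.frame(
  month = rep(1:4, times = 2),
  keyword = rep(c("coronavirus", "depression"), each = 4),
  hits_per_month = c(5, 20, 90, 70, 60, 62, 75, 68),
  stringsAsFactors = FALSE
)

# Reshape to one row per month with a column for each keyword.
toy_wide <- spread(toy_long, key = keyword, value = hits_per_month)

# Pearson correlation between the two monthly series.
r <- cor(toy_wide$coronavirus, toy_wide$depression)
r
```

A value near 1 would mean the two series rise and fall together, but, as noted above, it would say nothing about whether one causes the other.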