Kaitlin Kavlie PSYC-541: Independent Project
First I loaded the tidyverse, DT, trendyy, and lubridate packages before extracting and analyzing data.
library(tidyverse)
library(DT)
library(trendyy)
library(lubridate)
I created a data set called suicide by using trendy() to retrieve Google search data for the word ‘suicide’, restricted to the United States.
suicide <- trendy(c("suicide"), geo = "US")
suicide %>%
  glimpse()
List of 1
$ :List of 7
..$ interest_over_time :'data.frame': 261 obs. of 7 variables:
.. ..$ date : POSIXct[1:261], format: "2017-04-16" "2017-04-23" "2017-04-30" "2017-05-07" ...
.. ..$ hits : int [1:261] 43 38 30 29 32 30 26 24 25 24 ...
.. ..$ keyword : chr [1:261] "suicide" "suicide" "suicide" "suicide" ...
.. ..$ geo : chr [1:261] "US" "US" "US" "US" ...
.. ..$ time : chr [1:261] "today+5-y" "today+5-y" "today+5-y" "today+5-y" ...
.. ..$ gprop : chr [1:261] "web" "web" "web" "web" ...
.. ..$ category: int [1:261] 0 0 0 0 0 0 0 0 0 0 ...
..$ interest_by_country: NULL
..$ interest_by_region :'data.frame': 51 obs. of 5 variables:
.. ..$ location: chr [1:51] "Utah" "Alaska" "Colorado" "Idaho" ...
.. ..$ hits : int [1:51] 100 96 94 93 93 92 90 90 88 88 ...
.. ..$ keyword : chr [1:51] "suicide" "suicide" "suicide" "suicide" ...
.. ..$ geo : chr [1:51] "US" "US" "US" "US" ...
.. ..$ gprop : chr [1:51] "web" "web" "web" "web" ...
..$ interest_by_dma :'data.frame': 210 obs. of 5 variables:
.. ..$ location: chr [1:210] "Glendive MT" "Colorado Springs-Pueblo CO" "Salt Lake City UT" "Spokane WA" ...
.. ..$ hits : int [1:210] 100 97 94 90 87 87 87 86 85 85 ...
.. ..$ keyword : chr [1:210] "suicide" "suicide" "suicide" "suicide" ...
.. ..$ geo : chr [1:210] "US" "US" "US" "US" ...
.. ..$ gprop : chr [1:210] "web" "web" "web" "web" ...
..$ interest_by_city :'data.frame': 200 obs. of 5 variables:
.. ..$ location: chr [1:200] "El Segundo" "Oxford" "Joint Base Lewis-McChord" "Fairbanks" ...
.. ..$ hits : int [1:200] NA NA NA NA NA NA NA NA NA NA ...
.. ..$ keyword : chr [1:200] "suicide" "suicide" "suicide" "suicide" ...
.. ..$ geo : chr [1:200] "US" "US" "US" "US" ...
.. ..$ gprop : chr [1:200] "web" "web" "web" "web" ...
..$ related_topics :'data.frame': 46 obs. of 6 variables:
.. ..$ subject : chr [1:46] "100" "28" "25" "20" ...
.. ..$ related_topics: chr [1:46] "top" "top" "top" "top" ...
.. ..$ value : chr [1:46] "Suicide" "Suicide Squad" "The Suicide Squad" "Suicide Squad" ...
.. ..$ geo : chr [1:46] "US" "US" "US" "US" ...
.. ..$ keyword : chr [1:46] "suicide" "suicide" "suicide" "suicide" ...
.. ..$ category : int [1:46] 0 0 0 0 0 0 0 0 0 0 ...
.. ..- attr(*, "reshapeLong")=List of 4
..$ related_queries :'data.frame': 50 obs. of 6 variables:
.. ..$ subject : chr [1:50] "100" "21" "19" "19" ...
.. ..$ related_queries: chr [1:50] "top" "top" "top" "top" ...
.. ..$ value : chr [1:50] "suicide squad" "commit suicide" "the suicide squad" "suicide hotline" ...
.. ..$ geo : chr [1:50] "US" "US" "US" "US" ...
.. ..$ keyword : chr [1:50] "suicide" "suicide" "suicide" "suicide" ...
.. ..$ category : int [1:50] 0 0 0 0 0 0 0 0 0 0 ...
.. ..- attr(*, "reshapeLong")=List of 4
..- attr(*, "class")= chr [1:2] "gtrends" "list"
- attr(*, "class")= chr "trendy"
Then I created a table of Google search interest for the term ‘suicide’, organized by month. The hits_per_month column is the mean of the weekly hits values that fall in each calendar month, so it describes how popular the term was in that month relative to the others rather than a raw count of searches. May appears to be the 5th highest month for Google searches of the term.
suicide %>%
  get_interest() %>%
  mutate(month = month(date)) %>%
  group_by(month) %>%
  summarize(hits_per_month = mean(hits)) %>%
  datatable(options = list(pageLength = 12)) %>%
  formatRound(2, 2)
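As a minimal sketch of what the monthly averaging step does, the example below runs the same mutate, group_by, and summarize steps on a small made-up data frame. The name toy_interest and its values are assumptions for illustration only, not real Google Trends data.

```r
library(dplyr)
library(lubridate)

# Made-up weekly values standing in for the output of get_interest()
toy_interest <- tibble(
  date = as.POSIXct(c("2021-05-02", "2021-05-09", "2021-08-01",
                      "2021-08-08", "2022-05-01"), tz = "UTC"),
  hits = c(30, 34, 50, 54, 32)
)

toy_monthly <- toy_interest %>%
  mutate(month = month(date)) %>%          # 5 = May, 8 = August
  group_by(month) %>%
  summarize(hits_per_month = mean(hits))   # mean across all years in the data

toy_monthly
```

Because the grouping is by calendar month only, the May value here pools the weeks from both 2021 and 2022.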
Then I made a line graph with month on the x-axis. It shows that the most popular month for Google searches of the term ‘suicide’ is August and the least popular month is November. May, as shown in the table above, appears to be the 5th highest month for Google searches of the term.
suicide %>%
  get_interest() %>%
  mutate(month = month(date)) %>%
  group_by(month) %>%
  summarize(hits_per_month = mean(hits)) %>%
  ggplot(aes(x = month, y = hits_per_month)) +
  geom_line() +
  # month is numeric, so use a continuous scale with one break per month
  scale_x_continuous(breaks = 1:12) +
  theme_minimal() +
  labs(title = "Google searches for suicide by month")
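Because month() returns numbers, one way to draw the month axis is a continuous scale with a break at each month. The sketch below uses made-up values (toy_monthly is a hypothetical data frame, not the real search data) and labels the breaks with R's built-in month.abb abbreviations.

```r
library(ggplot2)

# Made-up monthly averages, one row per calendar month
toy_monthly <- data.frame(
  month = 1:12,
  hits_per_month = c(31, 33, 34, 35, 36, 34, 33, 40, 38, 36, 30, 32)
)

p <- ggplot(toy_monthly, aes(x = month, y = hits_per_month)) +
  geom_line() +
  # Numeric x values go on a continuous scale; breaks and labels mark each month
  scale_x_continuous(breaks = 1:12, labels = month.abb) +
  theme_minimal() +
  labs(title = "Toy example: average hits by month",
       x = "Month", y = "Average hits")

p
```

Supplying breaks rather than limits keeps every month labeled without passing numeric limits to a discrete scale.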
I created a second data set called suicide_rate by using trendy() to retrieve Google search data for the term ‘suicide rate’, restricted to the United States.
suicide_rate <- trendy(c("suicide rate"), geo = "US")
Then I created a datatable of Google search interest for the term ‘suicide rate’, organized by month. The hits_per_month column is the mean of the weekly hits values that fall in each calendar month, so it describes how popular the term was in that month relative to the others. May appears to be the 7th highest month for Google searches of the term.
suicide_rate %>%
  get_interest() %>%
  mutate(month = month(date)) %>%
  group_by(month) %>%
  summarize(hits_per_month = mean(hits)) %>%
  datatable(options = list(pageLength = 12)) %>%
  formatRound(2, 2)
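The same monthly summary is used for every term in this project, so it could be wrapped in a small function. The sketch below is one possible way to do that; monthly_hits is a hypothetical helper name, and the toy data frame is made up for illustration.

```r
library(dplyr)
library(lubridate)

# Hypothetical helper: expects a data frame with `date` and `hits` columns
monthly_hits <- function(interest_df) {
  interest_df %>%
    mutate(month = month(date)) %>%
    group_by(month) %>%
    summarize(hits_per_month = mean(hits), .groups = "drop")
}

# Made-up values to show the helper in use
toy <- tibble(
  date = as.Date(c("2021-01-03", "2021-01-10", "2021-02-07")),
  hits = c(10, 20, 40)
)

toy_result <- monthly_hits(toy)
toy_result
```

With a helper like this, a call such as suicide_rate %>% get_interest() %>% monthly_hits() would stand in for the repeated mutate, group_by, and summarize steps.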
Then I made a line graph of the average hits per month for Google searches of the term ‘suicide rate’. May, as shown in the table above, appears to be the 7th highest month for Google searches of the term.
suicide_rate %>%
  get_interest() %>%
  mutate(month = month(date)) %>%
  group_by(month) %>%
  summarize(hits_per_month = mean(hits)) %>%
  ggplot(aes(x = month, y = hits_per_month)) +
  geom_line() +
  # month is numeric, so use a continuous scale with one break per month
  scale_x_continuous(breaks = 1:12) +
  theme_minimal() +
  labs(title = "Google searches for suicide rate by month")
I created a third data set called mental_health by using trendy() to retrieve Google search data for the term ‘mental health’, restricted to the United States.
mental_health <- trendy(c("mental health"), geo = "US")
Then I created a datatable of Google search interest for the term ‘mental health’, organized by month. The hits_per_month column is the mean of the weekly hits values that fall in each calendar month, so it describes how popular the term was in that month relative to the others. May appears to be tied for the 3rd most popular month for Google searches of the term.
mental_health %>%
  get_interest() %>%
  mutate(month = month(date)) %>%
  group_by(month) %>%
  summarize(hits_per_month = mean(hits)) %>%
  datatable(options = list(pageLength = 12)) %>%
  formatRound(2, 2)
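Month rankings like the ones reported in this project can also be computed instead of read off the table. The sketch below uses made-up averages (toy_monthly is hypothetical) and dplyr's min_rank(), which gives tied months the same rank.

```r
library(dplyr)

# Made-up monthly averages; months 3 and 4 are tied on purpose
toy_monthly <- tibble(
  month = 1:6,
  hits_per_month = c(40, 55, 48, 48, 60, 35)
)

toy_ranked <- toy_monthly %>%
  mutate(rank = min_rank(desc(hits_per_month))) %>%  # 1 = highest average
  arrange(rank, month)

toy_ranked
```

In this toy data, months 3 and 4 both receive rank 3 and the next month receives rank 5, which is how a result like "tied for 3rd" would show up.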
Then I made a line graph of the average hits per month for Google searches of the term ‘mental health’. May, as shown in the table above, appears to be tied for the 3rd most popular month for Google searches of the term.
mental_health %>%
  get_interest() %>%
  mutate(month = month(date)) %>%
  group_by(month) %>%
  summarize(hits_per_month = mean(hits)) %>%
  ggplot(aes(x = month, y = hits_per_month)) +
  geom_line() +
  # month is numeric, so use a continuous scale with one break per month
  scale_x_continuous(breaks = 1:12) +
  theme_minimal() +
  labs(title = "Google searches for mental health by month")
I created a fourth and final data set called suicide_every by using trendy() to retrieve Google search data for all three terms, ‘suicide’, ‘suicide rate’, and ‘mental health’, in the United States.
suicide_every <- trendy(c("suicide", "suicide rate", "mental health"), geo = "US")
Then I created a line graph from the data set suicide_every, comparing the search trends of the three terms by their average hits per month.
suicide_every %>%
  get_interest() %>%
  mutate(month = month(date)) %>%
  group_by(month, keyword) %>%
  # drop the grouping after summarizing so no grouped-output message is printed
  summarize(hits_per_month = mean(hits), .groups = "drop") %>%
  ggplot(aes(x = month, y = hits_per_month, color = keyword)) +
  geom_line() +
  # month is numeric, so use a continuous scale with one break per month
  scale_x_continuous(breaks = 1:12) +
  theme_minimal() +
  labs(title = "Internet searches for suicide, suicide rate, and mental health by month")
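When several terms are requested together, Google Trends scales the hits values against the highest point across all of the terms, so the lines in the combined graph are comparable with each other but not directly with the single-term tables earlier in this project. As a rough sketch of how the three terms could also be compared side by side in a table, the example below averages made-up values by month and keyword and then spreads the keywords into columns. The toy_every data frame and its values are assumptions for illustration only.

```r
library(dplyr)
library(tidyr)

# Made-up values: two weekly observations per keyword in May (5) and August (8)
toy_every <- tibble(
  month   = rep(c(5, 8), each = 6),
  keyword = rep(c("suicide", "suicide rate", "mental health"), times = 4),
  hits    = c(30, 2, 40, 34, 4, 44,
              50, 2, 36, 54, 2, 40)
)

toy_wide <- toy_every %>%
  group_by(month, keyword) %>%
  summarize(hits_per_month = mean(hits), .groups = "drop") %>%
  # one row per month, one column per keyword
  pivot_wider(names_from = keyword, values_from = hits_per_month)

toy_wide
```

Each row of toy_wide then shows the three monthly averages next to each other, which makes it easier to see which term was highest in a given month.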