library(tidyverse)
library(broom)
library(plotly)
library(tidycensus) # gets census data that we can use to create maps
library(sf) # helper package for mapping
library(leaflet) # interactive mapping package
library(trendyy)
library(usdata) # this package has a conversion utility for state abbreviations to full names
census_api_key("")
- For this assignment, I chose to look at cigarette smoking rates across the US alongside US Census data. The following graph is a choropleth showing each state's median household income according to US Census data.
states_leaflet <- get_acs(geography = "state", # gets state by state data
variables = "B19013_001", # median household income
geometry = TRUE) # gets geometry (the maps)
# shift_geo = T # shifts Hawaii and Alaska
state_colors <- colorNumeric(palette = "viridis", domain = states_leaflet$estimate)
states_leaflet %>%
leaflet() %>%
addTiles() %>%
addPolygons(weight = 1,
fillColor = ~state_colors(estimate),
label = ~paste0(NAME, ", income = ", estimate),
highlight = highlightOptions(weight = 2)) %>%
setView(-95, 40, zoom = 4) %>%
addLegend(pal = state_colors, values = ~estimate)
Warning: sf layer has inconsistent datum (+proj=longlat +datum=NAD83 +no_defs).
Need '+proj=longlat +datum=WGS84'
The choropleth above shows each state's median household income according to US Census data. Lighter-colored states have higher incomes, whereas darker-colored states have lower incomes. According to the map, higher-income states include California, Maryland, and New York.
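The datum warning printed with the map arises because the census geometries are in NAD83 while leaflet expects WGS84. One way to address it, sketched here on a made-up two-point layer rather than the real `states_leaflet` object, is to reproject with `sf::st_transform()` before piping into `leaflet()`. The point names and coordinates are illustrative assumptions.

```r
library(sf)

# Toy point layer declared in NAD83 (EPSG:4269); coordinates are made up
pts <- st_as_sf(
  data.frame(name = c("a", "b"), lon = c(-95, -77), lat = c(40, 39)),
  coords = c("lon", "lat"),
  crs = 4269
)

# Reproject to WGS84 (EPSG:4326), the datum named in the warning
pts_wgs84 <- st_transform(pts, 4326)

# Confirm the new coordinate reference system
st_crs(pts_wgs84)$epsg
```

Applied to this analysis, the same call would presumably be `states_leaflet %>% st_transform(4326)` ahead of `leaflet()`.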
- To complement the CDC data on cigarette smoking rates across the US, I also looked at hits per state for the term ‘nicotine’. Google reports “hits” as a standardized score from 0 to 100, not a raw number of searches. The following is a map of hits per state for the term ‘nicotine’.
nicotine <- trendy("nicotine",
geo = "US",
from = "2020-01-01", to = "2021-01-01")
nicotine_states <- nicotine %>%
get_interest_region()
nicotine_states
nicotine_colors <- colorNumeric(palette = "viridis", domain = nicotine_states$hits)
states_leaflet %>%
rename(location = NAME) %>%
inner_join(nicotine_states) %>%
leaflet() %>%
addTiles() %>%
addPolygons(weight = 1,
fillColor = ~nicotine_colors(hits),
label = ~paste0(location, ", Search volume = ", hits),
highlight = highlightOptions(weight = 2)) %>%
setView(-95, 40, zoom = 4) %>%
addLegend(pal = nicotine_colors, values = ~hits)
Joining, by = "location"
Warning: sf layer has inconsistent datum (+proj=longlat +datum=NAD83 +no_defs).
Need '+proj=longlat +datum=WGS84'
The choropleth above shows hits per state for the term ‘nicotine’. Lighter states have higher hit scores and darker states have lower ones. According to the map, states like Montana, West Virginia, and Idaho have higher hit scores for ‘nicotine’ than other states.
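Because hits are a relative index rather than a search count, it can help to see how such a 0 to 100 scale might be produced. The sketch below assumes the score is each region's search share divided by the largest share, times 100. The share values are hypothetical, and Google's exact procedure may differ.

```r
# Hypothetical search shares (term searches / all searches); NOT real data
raw_share <- c(Montana = 0.0042, Idaho = 0.0038, California = 0.0021)

# Rescale so the region with the largest share scores 100
hits <- round(100 * raw_share / max(raw_share))
hits
```

Under this scaling, a score of 50 means half the top region's search share, not half as many searches in absolute terms.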
- The following tests whether there is a statistically significant relationship between states' smoking rates and hits for the term ‘nicotine’.
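cig_smoking <- read_csv("Cig_smoking_percent.csv")
# build the modeling data the same way as chantix_data: add state abbreviations, then join the smoking rates
nicotine_data <- nicotine_states %>%
mutate(State = state2abbr(location)) %>%
inner_join(cig_smoking)
nicotine_model <- lm(Cig_percent ~ hits, data = nicotine_data)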
glance(nicotine_model)
tidy(nicotine_model)
nicotine_data %>%
drop_na() %>%
plot_ly(x = ~hits,
y = ~Cig_percent,
hoverinfo = "text",
text = ~paste("State: ", location, "<br>", "'Nicotine' search rate: ", hits, "<br>", "Cig smoking percent: ", Cig_percent)) %>%
add_markers(showlegend = F) %>%
add_lines(y = ~fitted(nicotine_model)) %>%
layout(title = "Relationship between Google searches for 'nicotine' and smoking rates, by state",
xaxis = list(title = "Google search volume for 'nicotine'"),
yaxis = list(title = "State smoking rate (%)"))
According to the regression output, there is a statistically significant relationship between states' smoking rates and hits for the term ‘nicotine’ (p < 0.01). The positive slope of the fitted line shows the direction of this relationship: states with higher smoking rates also tend to have higher hit scores for ‘nicotine’.
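When a p-value is very small, rounding to two decimals prints it as 0, which hides its size. A minimal sketch of reporting it more precisely with `broom::tidy()` and base R's `format.pval()` follows. It uses the built-in `mtcars` data as a stand-in, not the smoking data.

```r
library(broom)

# Stand-in model on a built-in dataset; mtcars is NOT the smoking data
model <- lm(mpg ~ wt, data = mtcars)

# Pull the p-value for the slope term out of the tidy coefficient table
coefs <- tidy(model)
slope_p <- coefs$p.value[coefs$term == "wt"]

# format.pval() reports values below eps as "<eps" instead of 0
format.pval(slope_p, digits = 3, eps = 0.001)
```

For this analysis the same pattern would presumably use `tidy(nicotine_model)` and the `hits` term.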
- To complement the CDC data on cigarette smoking rates across the US, I also looked at hits per state for the term ‘chantix’. The following is a map of hits per state for the term ‘chantix’.
chantix <- trendy("chantix",
geo = "US",
from = "2020-01-01", to = "2021-01-01")
chantix_states <- chantix %>%
get_interest_region()
chantix_states
chantix_states %>%
mutate(State = state2abbr(location)) %>%
inner_join(cig_smoking)
Joining, by = "State"
chantix_data <- chantix_states %>%
mutate(State = state2abbr(location)) %>%
inner_join(cig_smoking)
chantix_colors <- colorNumeric(palette = "viridis", domain = chantix_states$hits)
states_leaflet %>%
rename(location = NAME) %>%
inner_join(chantix_states) %>%
leaflet() %>%
addTiles() %>%
addPolygons(weight = 1,
fillColor = ~chantix_colors(hits),
label = ~paste0(location, ", Search volume = ", hits),
highlight = highlightOptions(weight = 2)) %>%
setView(-95, 40, zoom = 4) %>%
addLegend(pal = chantix_colors, values = ~hits)
Joining, by = "location"
Warning: sf layer has inconsistent datum (+proj=longlat +datum=NAD83 +no_defs).
Need '+proj=longlat +datum=WGS84'
The choropleth above shows hits per state for the term ‘chantix’. Lighter states have higher hit scores and darker states have lower ones. According to the map, states like Arkansas, Idaho, and Kentucky search for ‘chantix’ more than other US states.
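The map is built with `inner_join()`, which silently drops any row whose `location` has no match in the other table. A quick way to see what was dropped is `dplyr::anti_join()`. The sketch below uses two small made-up tables, and the assumption that a territory such as Puerto Rico might be the unmatched row is illustrative only.

```r
library(dplyr)

# Toy tables; names and values are illustrative, NOT real data
map_states <- tibble(location = c("Idaho", "Kentucky", "Puerto Rico"))
trend_states <- tibble(location = c("Idaho", "Kentucky"), hits = c(90, 100))

# Rows of map_states with no matching location in trend_states
unmatched <- anti_join(map_states, trend_states, by = "location")
unmatched
```

Running the same check on `states_leaflet` and `chantix_states` would show which regions, if any, are missing from the map.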
- The following tests whether there is a statistically significant relationship between states' smoking rates and hits for the term ‘chantix’.
chantix_model <- lm(Cig_percent ~ hits, data = chantix_data)
glance(chantix_model)
tidy(chantix_model)
chantix_data %>%
drop_na() %>%
plot_ly(x = ~hits,
y = ~Cig_percent,
hoverinfo = "text",
text = ~paste("State: ", location, "<br>", "'Chantix' search rate: ", hits, "<br>", "Cig smoking percent: ", Cig_percent)) %>%
add_markers(showlegend = F) %>%
add_lines(y = ~fitted(chantix_model)) %>%
layout(title = "Relationship between Google searches for 'Chantix' and smoking rates, by state",
xaxis = list(title = "Google search volume for 'Chantix'"),
yaxis = list(title = "State smoking rate (%)"))
According to the regression output, there is a statistically significant relationship between states' smoking rates and how much they search for the term ‘chantix’ (p < 0.01). The positive slope of the fitted line shows the direction of this relationship: states with higher smoking rates also tend to have higher hit scores for ‘chantix’.
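The fitted line in the plot is drawn from `fitted()` on the model, while the points come from the data after `drop_na()`. If the two steps drop different rows, the lengths can disagree. One hedged alternative is `broom::augment()`, which returns the rows used in the fit together with their fitted values. The toy data below is made up for illustration.

```r
library(broom)

# Toy data with one missing predictor value; NOT the smoking data
toy <- data.frame(
  hits = c(40, 55, NA, 70, 90),
  Cig_percent = c(11, 13, 15, 16, 19)
)

# lm() drops the incomplete row by default
model <- lm(Cig_percent ~ hits, data = toy)

# augment() with no data argument returns only the rows used in the fit,
# with a .fitted column aligned to them
plot_data <- augment(model)

nrow(toy)
nrow(plot_data)
```

Plotting from `plot_data` with `y = ~.fitted` in `add_lines()` keeps the points and the line on the same rows.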
