Lesson 5 is all about census data. When I started over a year ago I did not know much about what the census does, but I have learned a lot since then. I am now comfortable with data.census.gov. However, it has its drawbacks. Since the start of this class I have been using RStudio and R to wrangle CSV files. I didn’t know how easy it was to get census data using tidycensus.
For the first part of this lesson, we will look at the percentage of the population with a graduate degree, by county.
One thing that I have learned is to set the variables in a different section. This way it is easy to make any necessary changes.
# Alternative variable: c(percent_high_school = "DP02_0062P")
# Here we set the values for the parameters of the get_acs() call.
geography <- "county"
variables <- c(percent_grad ="DP02_0066P") #graduate degree
state <- "OH"
year <- 2022
Here we request the 2018-2022 ACS 5-year estimates. Since these estimates were just released, it made sense to use them. We are getting the percentage of the population with a graduate degree for each county in Ohio.
# Get the ACS data for graduate degrees
library(tidycensus)
grad_degree <- get_acs(
geography = geography,
variables = variables,
state = state,
year = year # use the year set above instead of repeating the literal
)
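For orientation, `get_acs()` returns a tidy data frame with the columns `GEOID`, `NAME`, `variable`, `estimate`, and `moe`. The sketch below builds a small stand-in called `grad_toy` (a hypothetical name) so it runs without a Census API key. The estimates are the two figures quoted in this lesson, while the `GEOID` and `moe` values are invented placeholders. It also shows one way to shorten the county names for plot labels, using base R `sub()`.

```r
# Stand-in for get_acs() output. GEOID and moe values are invented placeholders.
grad_toy <- data.frame(
  GEOID    = c("00001", "00002"),
  NAME     = c("Delaware County, Ohio", "Meigs County, Ohio"),
  variable = "percent_grad",
  estimate = c(22.6, 3.5),
  moe      = c(1.5, 0.9),
  stringsAsFactors = FALSE
)

# Drop the " County, Ohio" suffix to get short labels for plots.
grad_toy$label <- sub(" County, Ohio$", "", grad_toy$NAME)

print(grad_toy[, c("NAME", "label", "estimate", "moe")])
```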
library(dplyr)
#get the maximum value:
max_county<-grad_degree %>%
filter(estimate == max(estimate)) %>%
summarise(NAME = first(NAME), MaxPercentGrad = first(estimate))
#get the minimum value:
min_county<-grad_degree %>%
filter(estimate == min(estimate)) %>%
summarise(NAME = first(NAME), MinPercentGrad = first(estimate))
Which counties in the selected state have the largest percentages of graduate degree holders?
The county with the highest percentage of graduate degree holders is Delaware County, Ohio, at 22.6%.
The county with the lowest percentage of graduate degree holders is Meigs County, Ohio, at 3.5%.
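One caveat about the filter-and-summarise pattern: `first()` keeps a single row, so if two counties tie for the maximum or minimum, only one is reported. A minimal base R sketch of keeping every tied row follows. The data frame `toy` and its values are invented for illustration and are not real ACS figures.

```r
# Hypothetical data shaped like get_acs() output; all values are invented.
toy <- data.frame(
  NAME     = c("A County, Ohio", "B County, Ohio", "C County, Ohio"),
  estimate = c(10.2, 22.6, 22.6),
  moe      = c(1.1, 1.9, 2.3),
  stringsAsFactors = FALSE
)

# which() drops NA comparisons, so missing estimates cannot produce NA rows.
top_rows    <- toy[which(toy$estimate == max(toy$estimate, na.rm = TRUE)), ]
bottom_rows <- toy[which(toy$estimate == min(toy$estimate, na.rm = TRUE)), ]

print(top_rows$NAME)     # every county tied for the highest estimate
print(bottom_rows$NAME)  # every county tied for the lowest estimate
```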
Next, create a margin of error (MOE) plot for your chosen state.
library(ggplot2)
library(scales) # label_percent()
library(stringr) # str_remove()
oh_plot_errorbar <- ggplot(grad_degree, aes(x = estimate, y = reorder(NAME, estimate))) +
geom_errorbar(aes(xmin = estimate - moe, xmax = estimate + moe), width = 0.5, linewidth = 0.5) +
geom_point(color = "#663399", size = 2) +
scale_x_continuous(labels = label_percent(scale = 1)) + # estimates are already percentages, not dollars
scale_y_discrete(labels = function(x) str_remove(x, " County, Ohio")) + # NAME looks like "Delaware County, Ohio"
labs(title = "Percent Graduate Degrees, 2018-2022 ACS",
subtitle = "Counties in OH",
caption = "Data acquired with R and tidycensus. Error bars represent margin of error around estimates.",
x = "2018-2022 5-Year ACS Estimate (percent)",
y = "") +
theme_minimal(base_size = 12)+
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 16), # Bold and bigger main title
plot.subtitle = element_text(hjust = 0.5), # Center-align the subtitle
plot.caption = element_text(hjust = 0.5), # Center-align the caption
axis.title.x = element_text(hjust = 0.5), # Center-align the X-axis title
axis.title.y = element_text(hjust = 0.5) # Center-align the Y-axis title
)
# Explicitly print the plot object
print(oh_plot_errorbar)
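To make the error bars concrete: the ACS publishes margins of error at the 90 percent confidence level, so each bar runs from `estimate - moe` to `estimate + moe`, and dividing the MOE by 1.645 gives an approximate standard error. The sketch below works through that arithmetic for the Delaware County estimate quoted earlier. The `moe` value of 1.2 is an assumed placeholder, not the published figure.

```r
estimate <- 22.6 # percent with a graduate degree (figure quoted in the text)
moe      <- 1.2  # ASSUMED margin of error, for illustration only

# Bounds of the error bar, i.e. the 90 percent interval around the estimate.
lower <- estimate - moe
upper <- estimate + moe

# Approximate standard error, then the MOE rescaled to 95 percent confidence.
se     <- moe / 1.645
moe_95 <- se * 1.96

cat(sprintf("90%% interval: %.1f to %.1f\n", lower, upper))
cat(sprintf("SE: %.2f, 95%% MOE: %.2f\n", se, moe_95))
```

When two counties' bars overlap, the difference between their estimates may not be statistically meaningful, which is why the plot shows the bars rather than the points alone.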
Next we will make an interactive chart using plotly.
library(plotly)
ggplotly(oh_plot_errorbar, tooltip = "x")
Now we will use ggiraph as another way to create an interactive chart.
library(ggiraph)
oh_plot_ggiraph <- ggplot(grad_degree, aes(x = estimate,
y = reorder(NAME, estimate),
tooltip = estimate,
data_id = GEOID)) +
geom_errorbar(aes(xmin = estimate - moe, xmax = estimate + moe),
width = 0.5, linewidth = 0.5) + # linewidth replaces size for lines in ggplot2 3.4.0 and later
geom_point_interactive(color = "darkred", size = 1) +
scale_x_continuous(labels = label_percent(scale = 1)) + # estimates are percentages, not dollars
scale_y_discrete(labels = function(x) str_remove(x, " County, Ohio|, Ohio")) +
labs(title = "Percent of Population with Graduate Degrees, 2018-2022 ACS",
subtitle = "Counties in Ohio",
caption = "Data acquired with R and tidycensus. Error bars represent margin of error around estimates.",
x = "ACS estimate",
y = "") +
theme_minimal(base_size = 12)
girafe(ggobj = oh_plot_ggiraph) %>%
girafe_options(opts_hover(css = "fill:cyan;"))
# girafeOutput("plotOutputId", width = "100%", height = "600px") # only needed inside a Shiny UI; girafe() above renders the chart here
#print(oh_plot_ggiraph)
Here we will get the population under 19 without health insurance, by census tract, for New Jersey. After that we will map the variable using mapview.
library(mapview)
# Request the tract-level estimate with tidycensus get_acs()
nj_under19_health <- get_acs(
geography = "tract",
variables = c(no_health_ins_u19 = "B27010_017"), # under 19, no health insurance coverage
state = "NJ",
year = 2022, # 2018-2022 ACS, matching the map caption
geometry = TRUE
)
# Visualize the data
mapview(nj_under19_health, zcol = "estimate")
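Note that the mapped `estimate` is a count, so tracts with more residents under 19 can look worse simply because they are larger. One common remedy is to divide the count by a denominator and map a percentage instead. The sketch below shows the arithmetic on a made-up data frame. The `under19_total` column, the `GEOID` labels, and all values are hypothetical stand-ins for a denominator you would request separately.

```r
# Hypothetical tract-level counts; GEOIDs and values are invented.
tracts <- data.frame(
  GEOID         = c("tract_a", "tract_b", "tract_c"),
  estimate      = c(30, 30, 0),    # under 19 without health insurance
  under19_total = c(600, 1500, 0), # ASSUMED denominator: all residents under 19
  stringsAsFactors = FALSE
)

# Percent uninsured; tracts with no residents under 19 get NA instead of NaN.
tracts$pct_uninsured <- ifelse(
  tracts$under19_total > 0,
  100 * tracts$estimate / tracts$under19_total,
  NA_real_
)

print(tracts)
```

In this toy data the first two tracts have the same count but different rates, which a map of raw counts would hide.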
# Create a choropleth map of the population under 19 without health insurance
ggplot(nj_under19_health, aes(fill = estimate)) +
geom_sf() +
theme_void() +
scale_fill_viridis_c(option = "magma") +
labs(title = "Population Under 19 Without Health Insurance by Census Tract",
subtitle = "New Jersey",
fill = "ACS estimate",
caption = "2018-2022 ACS | tidycensus R package")
```