In this report we will work through obtaining census data with the tidycensus package. We will look at data from Illinois and, in particular, Knox County, where my hometown is located.
The first thing we need to do is load the packages we will be using throughout this report.
library(tidycensus)
library(tidyverse)
library(ggplot2)
library(scales)
library(plotly)
library(mapview)
library(sf)
To access the Census API we will need an API key. You can insert your key in the code below and remove the leading ### to run it.
### census_api_key("INSERT YOUR API KEY HERE", install = TRUE)
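Once a key has been installed, tidycensus reads it from the CENSUS_API_KEY environment variable. A minimal sketch of a check you could run before requesting data is shown below; `has_census_key()` is a hypothetical helper written for this report, not a tidycensus function.

```r
# Hypothetical helper (not part of tidycensus): TRUE when a key is visible to this R session
has_census_key <- function() {
  nzchar(Sys.getenv("CENSUS_API_KEY"))
}

# Print a reminder instead of letting a later request run without a key
if (!has_census_key()) {
  message("No Census API key found; run census_api_key() first.")
}
```

If the key was only just installed with install = TRUE, you may need to restart R or run readRenviron("~/.Renviron") before the check returns TRUE.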
For the first part of this report we will be creating a margin of error plot for the percentage of population that has a graduate degree in each county within Illinois.
First we will get the data from the US Census Bureau. For this, we will use the American Community Survey (ACS) data.
# We use get_acs to obtain data from the American Community Survey
pop_graduate_degrees <- get_acs(
  # Since we want information on the counties in Illinois, we will limit our geography to counties from the state of Illinois
  geography = "county",
  state = "IL",
  # We will replace the census ID with a custom name to make it easier to remember
  variables = c(percent_graduate = "DP02_0066P"),
  year = 2021
)
We can look at the data to ensure it meets our needs.
view(pop_graduate_degrees)
To find out which counties have the highest percentages of graduate degrees, we can arrange the data in descending order of the estimate column.
arrange(pop_graduate_degrees, desc(estimate))
## # A tibble: 102 × 5
## GEOID NAME variable estimate moe
## <chr> <chr> <chr> <dbl> <dbl>
## 1 17019 Champaign County, Illinois percent_graduate 24.8 0.9
## 2 17043 DuPage County, Illinois percent_graduate 20.3 0.4
## 3 17097 Lake County, Illinois percent_graduate 19.3 0.4
## 4 17077 Jackson County, Illinois percent_graduate 17.7 1.4
## 5 17031 Cook County, Illinois percent_graduate 17 0.2
## 6 17109 McDonough County, Illinois percent_graduate 15.9 1.9
## 7 17113 McLean County, Illinois percent_graduate 15.4 0.9
## 8 17133 Monroe County, Illinois percent_graduate 14 1.7
## 9 17167 Sangamon County, Illinois percent_graduate 13.4 0.7
## 10 17143 Peoria County, Illinois percent_graduate 13.1 0.7
## # ℹ 92 more rows
To find out which counties have the lowest percentages of graduate degrees, we can arrange the data in ascending order of the estimate column.
arrange(pop_graduate_degrees, estimate)
## # A tibble: 102 × 5
## GEOID NAME variable estimate moe
## <chr> <chr> <chr> <dbl> <dbl>
## 1 17009 Brown County, Illinois percent_graduate 2.4 1
## 2 17047 Edwards County, Illinois percent_graduate 3.4 1.1
## 3 17051 Fayette County, Illinois percent_graduate 3.4 0.8
## 4 17145 Perry County, Illinois percent_graduate 3.5 0.9
## 5 17127 Massac County, Illinois percent_graduate 3.7 1.1
## 6 17087 Johnson County, Illinois percent_graduate 4.2 1
## 7 17003 Alexander County, Illinois percent_graduate 4.3 1.7
## 8 17059 Gallatin County, Illinois percent_graduate 4.3 1.5
## 9 17075 Iroquois County, Illinois percent_graduate 4.3 0.5
## 10 17069 Hardin County, Illinois percent_graduate 4.5 2.2
## # ℹ 92 more rows
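Before plotting, it can help to see what the error bars will encode. The ACS publishes its margins of error at a 90% confidence level, so estimate - moe and estimate + moe bound the range the true value is likely to fall in. The sketch below recomputes those bounds for two rows copied from the sorted output above; `top_counties`, `lower`, `upper`, and `intervals_overlap` are names made up for this illustration.

```r
library(dplyr)

# Two rows copied by hand from the descending output above
top_counties <- tibble::tibble(
  NAME = c("Jackson County, Illinois", "Cook County, Illinois"),
  estimate = c(17.7, 17.0),
  moe = c(1.4, 0.2)
)

# Lower and upper ends of each error bar
top_counties <- mutate(top_counties,
                       lower = estimate - moe,
                       upper = estimate + moe)

# The two intervals overlap when the larger lower bound does not exceed the smaller upper bound
intervals_overlap <- max(top_counties$lower) <= min(top_counties$upper)
intervals_overlap
```

Here the Jackson County interval (16.3 to 19.1) contains the Cook County interval (16.8 to 17.2), so the ranking of these two counties should be read with caution. Overlapping intervals are only a rough visual check, not a formal significance test.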
Now we can create our margin of error plot for Illinois counties.
# We use ggplot to create a plot from our pop_graduate_degrees dataset and assign it to the variable il_margin_error_plot
il_margin_error_plot <- ggplot(pop_graduate_degrees, aes(x = estimate, y = reorder(NAME, estimate))) +
  # We use geom_errorbar to draw the margin of error bars and assign the xmin and xmax values
  geom_errorbar(aes(xmin = estimate - moe, xmax = estimate + moe),
                width = 0.5, linewidth = 0.5) +
  # We use geom_point to draw the plot points and base their color on the estimate value
  geom_point(aes(color = estimate), size = 2) +
  # We use scale_color_gradient to assign a color gradient to our plot
  scale_color_gradient(low = "gray", high = "darkblue") +
  # We clean up the x and y tick labels. We format the x labels as percentages and remove " County, Illinois" or ", Illinois" from the y labels
  scale_x_continuous(labels = label_percent(scale = 1)) +
  scale_y_discrete(labels = function(x) str_remove(x, " County, Illinois|, Illinois")) +
  # Next we create our title, subtitle, caption, and labels for the x and y axes
  labs(title = "Percentage of Population with a Graduate Degree, 2017-2021, ACS",
       subtitle = "Counties in Illinois",
       caption = "Data acquired with R and tidycensus. Error bars represent margin of error around estimates.",
       x = "ACS estimate",
       y = "") +
  # We set the theme to minimal
  theme_minimal(base_size = 12)
# We call our plot to take a look at it
plot(il_margin_error_plot)
# We use ggplotly to make our margin of error plot interactive with tooltips on the x value
ggplotly(il_margin_error_plot, tooltip = "x")
The second half of this report focuses on creating a graduated symbols map of the population living in poverty in Knox County, IL, using data from the US Census Bureau.
We need to identify which variable in the ACS contains the information we need.
vars <- load_variables(2021, "acs5")
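The table returned by load_variables() has name, label, and concept columns, so it can be searched like any other data frame. The sketch below shows one way to narrow it down by keyword. To keep the example runnable without an API call it uses `vars_demo`, a hand-made two-row stand-in with abbreviated labels rather than the real table.

```r
library(dplyr)
library(stringr)

# Hand-made stand-in for the output of load_variables(); the real table has
# many thousands of rows, and the labels here are abbreviated for illustration
vars_demo <- tibble::tibble(
  name = c("B01003_001", "B17001_001"),
  label = c("Estimate!!Total", "Estimate!!Total:"),
  concept = c("TOTAL POPULATION", "POVERTY STATUS IN THE PAST 12 MONTHS BY SEX BY AGE")
)

# Keep only the rows whose concept mentions poverty, ignoring case
poverty_vars <- filter(vars_demo, str_detect(concept, regex("poverty", ignore_case = TRUE)))
poverty_vars$name
```

On the real data, replace vars_demo with vars and read the label column of the matching rows to see what each variable counts.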
The poverty information we are looking for is contained in table B17001. Variable B17001_001 is the table total (everyone for whom poverty status is determined), while B17001_002 counts the people whose income in the past 12 months was below the poverty level, so B17001_002 is the one we want.
Now that we know which variable contains the information we seek, we can pull the data for that variable and limit it to the geographic area we are interested in.
# We use get_acs to pull the data and assign it to the variable poverty_pop
poverty_pop <- get_acs(
  # Since we will be looking at a smaller area, census tracts make sense to use as our geographic unit
  geography = "tract",
  # We limit the data to the state of Illinois and Knox County
  state = "IL",
  county = "Knox",
  # We rename the variable to poverty_number so that it is easier to remember
  variables = c(poverty_number = "B17001_002"),
  year = 2021,
  # Setting geometry to TRUE pulls the spatial information for the data
  geometry = TRUE)
We will create an interactive map view of the data.
mapview(poverty_pop, zcol = "estimate")
Lastly, we will create a graduated symbols map based on the number of people living below the poverty line in each census tract in Knox County.
# Since we will be using a graduated symbols map, we need a centroid for each census tract. We use st_centroid to compute the centroids from the poverty_pop data we pulled
centroids <- st_centroid(poverty_pop)
# We use ggplot to create a plot and assign it to the variable poverty_pop_map
poverty_pop_map <- ggplot() +
  # We use geom_sf to draw the census tracts. The fill for each tract will be white smoke and the border color will be dark blue
  geom_sf(data = poverty_pop, fill = "whitesmoke", color = "darkblue") +
  # We base the size of the centroids on the estimate value, give them 70% opacity through the alpha argument, and color them sea green
  geom_sf(data = centroids, aes(size = estimate), alpha = 0.7, color = "seagreen") +
  # Create an empty theme
  theme_void() +
  # Create our title, subtitle, and legend (size) label
  labs(title = "Poverty Population by Census Tract in Knox County, IL",
       subtitle = "2017-2021 ACS, Knox County, Illinois",
       size = "ACS Estimate") +
  # We set a maximum size for our centroid symbols
  scale_size_area(max_size = 4)
Now we can take a look at our graduated symbols map.
poverty_pop_map
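To see what the st_centroid step does in isolation, here is a minimal sketch on a toy polygon. The 1 x 1 square and the names `square`, `toy_tract`, and `toy_centroid` are invented for illustration and have nothing to do with the Knox County data.

```r
library(sf)

# A hypothetical 1 x 1 square standing in for a census tract
square <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
toy_tract <- st_sf(estimate = 100, geometry = st_sfc(square))

# st_centroid() keeps the attribute columns and replaces each polygon with its center point
toy_centroid <- st_centroid(toy_tract)

# The center of the unit square is at x = 0.5, y = 0.5
st_coordinates(toy_centroid)
```

Because the estimate column is carried over to the point, it can be mapped to symbol size exactly as in the Knox County map. sf may print a warning that attributes are assumed constant over geometries; for a count that is simply attached to a display point, that is usually fine.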