This report provides an analysis of American Community Survey (ACS) data for the state of New Jersey, focusing on population characteristics and spatial distribution of selected variables. The analysis includes both non-spatial and spatial components.
In this section, we will analyze non-spatial ACS data related to the percentage of the population with a graduate degree in New Jersey.
We start by fetching county-level ACS data for New Jersey with the tidycensus package. The variable DP02_0066P is the percentage of the population aged 25 and over with a graduate or professional degree.
# Load necessary libraries
library(tidycensus)
library(tidyverse)
library(plotly)
# Use get_acs() to fetch data on the percentage of the population with a graduate degree
# Specify the state of interest (i.e., New Jersey)
acs_data <- get_acs(geography = "county", variables = "DP02_0066P", state = "NJ", survey = "acs5")
# Display a summary of the fetched data
summary(acs_data)
## GEOID NAME variable estimate
## Length:21 Length:21 Length:21 Min. : 5.70
## Class :character Class :character Class :character 1st Qu.:11.60
## Mode :character Mode :character Mode :character Median :14.60
## Mean :15.56
## 3rd Qu.:19.40
## Max. :26.10
## moe
## Min. :0.4000
## 1st Qu.:0.5000
## Median :0.6000
## Mean :0.6143
## 3rd Qu.:0.7000
## Max. :1.0000
After fetching the data, we identify counties with the largest and smallest percentages of graduate degree holders in New Jersey.
# Find counties with the largest percentages of graduate degree holders
top_counties <- acs_data %>%
arrange(desc(estimate)) %>%
slice(1:5)
top_counties
## # A tibble: 5 × 5
## GEOID NAME variable estimate moe
## <chr> <chr> <chr> <dbl> <dbl>
## 1 34035 Somerset County, New Jersey DP02_0066P 26.1 0.7
## 2 34027 Morris County, New Jersey DP02_0066P 24.2 0.6
## 3 34019 Hunterdon County, New Jersey DP02_0066P 23.1 1
## 4 34003 Bergen County, New Jersey DP02_0066P 20.8 0.5
## 5 34021 Mercer County, New Jersey DP02_0066P 20.2 0.7
# Find counties with the smallest percentages of graduate degree holders
bottom_counties <- acs_data %>%
arrange(estimate) %>%
slice(1:5)
bottom_counties
## # A tibble: 5 × 5
## GEOID NAME variable estimate moe
## <chr> <chr> <chr> <dbl> <dbl>
## 1 34011 Cumberland County, New Jersey DP02_0066P 5.7 0.6
## 2 34033 Salem County, New Jersey DP02_0066P 7.8 0.9
## 3 34031 Passaic County, New Jersey DP02_0066P 9.8 0.4
## 4 34001 Atlantic County, New Jersey DP02_0066P 10.3 0.6
## 5 34029 Ocean County, New Jersey DP02_0066P 11.4 0.4
We create a margin of error plot to visualize the uncertainty in the estimates, providing insights into data reliability.
# Preprocess county names to remove "New Jersey"
acs_data$NAME <- gsub(", New Jersey", "", acs_data$NAME)
# Make a margin of error plot
# Note: this layout may not suit every state; with many counties the axis labels crowd,
# and a wide spread of estimates can make small error bars hard to see
plot <- ggplot(acs_data, aes(x = reorder(NAME, estimate), y = estimate, ymin = estimate - moe, ymax = estimate + moe)) +
  geom_point(color = "blue") +
  geom_errorbar(width = 0.2) +
  coord_flip() +
  labs(title = "Margin of Error for % of Population with Graduate Degree by County",
       x = "County",
       y = "Percentage with Graduate Degree",
       caption = "This plot visualizes the percentage of population with a graduate degree by county in New Jersey,\nalong with their margin of error. The data is sourced from the American Community Survey (ACS)\n5-Year Estimates.") +
  theme_minimal() +
  theme(plot.caption = element_text(hjust = 0)) # Left-align the caption
plot
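The error bars also indicate which differences between counties are distinguishable from sampling noise. As a minimal sketch, assuming the published ACS margins of error are at the 90 percent confidence level, two estimates can be compared with a z-test on their difference. The helper name `is_significant_diff` is hypothetical; the inputs are the estimates and margins printed in the tables above.

```r
# Hypothetical helper: test whether two ACS estimates differ at the 90% level.
# Assumes each MOE is a 90% margin of error, so SE = MOE / 1.645.
is_significant_diff <- function(est1, moe1, est2, moe2, z_crit = 1.645) {
  se_diff <- sqrt((moe1 / 1.645)^2 + (moe2 / 1.645)^2)
  z <- abs(est1 - est2) / se_diff
  list(z = z, significant = z > z_crit)
}

# Somerset (26.1 +/- 0.7) vs. Morris (24.2 +/- 0.6)
somerset_vs_morris <- is_significant_diff(26.1, 0.7, 24.2, 0.6)

# Bergen (20.8 +/- 0.5) vs. Mercer (20.2 +/- 0.7)
bergen_vs_mercer <- is_significant_diff(20.8, 0.5, 20.2, 0.7)

print(somerset_vs_morris$significant)
print(bergen_vs_mercer$significant)
```

Under these assumptions the Somerset and Morris estimates differ by more than sampling error would explain, while the Bergen and Mercer estimates do not, so the relative ranking of those two counties should be read with caution.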
To enhance interactivity, we convert the margin of error plot into an interactive chart using the plotly package.
# Convert the plot to an interactive chart using plotly
plotly_plot <- ggplotly(plot)
plotly_plot
In this section, we’ll work with spatial ACS data and visualize it using maps.
We fetch ACS estimates for a variable of interest with the tidycensus package and county boundaries with the tigris package, then join the two on GEOID so the estimates can be mapped.
# Load necessary libraries
library(tidycensus)
library(tigris)
library(mapview)
library(ggplot2)
library(dplyr)
library(viridisLite)
library(viridis)
library(sf)
# Use load_variables() to list every variable available in the 2022 5-year ACS (not only income variables)
income_variables <- load_variables(2022, "acs5")
head(income_variables)
## # A tibble: 6 × 4
## name label concept geography
## <chr> <chr> <chr> <chr>
## 1 B01001A_001 Estimate!!Total: Sex by Age (Whi… tract
## 2 B01001A_002 Estimate!!Total:!!Male: Sex by Age (Whi… tract
## 3 B01001A_003 Estimate!!Total:!!Male:!!Under 5 years Sex by Age (Whi… tract
## 4 B01001A_004 Estimate!!Total:!!Male:!!5 to 9 years Sex by Age (Whi… tract
## 5 B01001A_005 Estimate!!Total:!!Male:!!10 to 14 years Sex by Age (Whi… tract
## 6 B01001A_006 Estimate!!Total:!!Male:!!15 to 17 years Sex by Age (Whi… tract
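The table returned by load_variables() has many thousands of rows, so the usual next step is to search it for the variable of interest. The sketch below shows the pattern on a small data frame rebuilt from the six rows printed above (name and label columns only); on the full table the same filter could target another table prefix, such as B19101. The object names `vars_head`, `matches`, and `table_rows` are illustrative.

```r
# Rebuild the first rows of the variable table shown above (name and label only)
vars_head <- data.frame(
  name = c("B01001A_001", "B01001A_002", "B01001A_003",
           "B01001A_004", "B01001A_005", "B01001A_006"),
  label = c("Estimate!!Total:",
            "Estimate!!Total:!!Male:",
            "Estimate!!Total:!!Male:!!Under 5 years",
            "Estimate!!Total:!!Male:!!5 to 9 years",
            "Estimate!!Total:!!Male:!!10 to 14 years",
            "Estimate!!Total:!!Male:!!15 to 17 years"),
  stringsAsFactors = FALSE
)

# Keep the rows whose label mentions a given phrase
matches <- vars_head[grepl("5 to 9 years", vars_head$label, fixed = TRUE), ]
print(matches)

# Variables that belong to one table share a name prefix
table_rows <- vars_head[startsWith(vars_head$name, "B01001A_"), ]
print(nrow(table_rows))
```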
# Fetch county-level estimates for B19101_017 (families with income of $200,000 or more)
filtered_data <- get_acs(
geography = "county",
variables = "B19101_017",
state = "NJ", # Fetching data for New Jersey
year = 2022
)
## Getting data from the 2018-2022 5-year ACS
# Print the filtered data
print(filtered_data)
## # A tibble: 21 × 5
## GEOID NAME variable estimate moe
## <chr> <chr> <chr> <dbl> <dbl>
## 1 34001 Atlantic County, New Jersey B19101_017 8925 683
## 2 34003 Bergen County, New Jersey B19101_017 83830 1974
## 3 34005 Burlington County, New Jersey B19101_017 28389 1193
## 4 34007 Camden County, New Jersey B19101_017 21165 877
## 5 34009 Cape May County, New Jersey B19101_017 4691 435
## 6 34011 Cumberland County, New Jersey B19101_017 3044 492
## 7 34013 Essex County, New Jersey B19101_017 45056 1336
## 8 34015 Gloucester County, New Jersey B19101_017 16234 891
## 9 34017 Hudson County, New Jersey B19101_017 35748 1401
## 10 34019 Hunterdon County, New Jersey B19101_017 13050 678
## # ℹ 11 more rows
# Load the shapefile for New Jersey counties
nj_counties <- counties(state = "NJ", cb = TRUE, year = 2022)
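# counties() already returns an sf object, so st_as_sf() here is only a safeguard
nj_counties_sf <- st_as_sf(nj_counties)
# Left-join the ACS estimates onto the county geometries by GEOID
# (both tables carry a NAME column, which merge() suffixes as NAME.x and NAME.y)
merged_data <- merge(nj_counties_sf, filtered_data, by = "GEOID", all.x = TRUE)
# merge() on an sf object keeps the sf class; this conversion is kept as a safeguard
merged_sf <- st_as_sf(merged_data)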
# Use mapview to display the data interactively
map <- mapview(merged_sf, zcol = "estimate", col.regions = viridis_pal(option = "C")(5), legend = TRUE)
map
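Because the merge uses all.x = TRUE, a county with no matching ACS row would be kept with a missing estimate rather than raising an error, so it is worth checking for unmatched GEOIDs after the join. A minimal sketch of such a check follows, using plain data frames in place of the sf objects. The GEOIDs and estimates are taken from the output above, and the fourth county is deliberately left out of the ACS side to simulate a mismatch. The object names are illustrative.

```r
# Stand-in for the county geometries: only the join key is needed here
shapes <- data.frame(
  GEOID = c("34001", "34003", "34005", "34007"),
  stringsAsFactors = FALSE
)

# Stand-in for the ACS estimates, with 34007 omitted on purpose
acs <- data.frame(
  GEOID = c("34001", "34003", "34005"),
  estimate = c(8925, 83830, 28389),
  stringsAsFactors = FALSE
)

# Same left join as in the report: keep every row of the geometry side
joined <- merge(shapes, acs, by = "GEOID", all.x = TRUE)

# Counties that found no estimate end up with NA and can be listed directly
unmatched <- joined$GEOID[is.na(joined$estimate)]
print(unmatched)
```

An empty result from the same check on the real `merged_sf` would confirm that all 21 counties received an estimate.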
The mapview call above gives an interactive view of the spatial data. Next, we create a static choropleth map with ggplot, after converting the raw counts to each county's share of the statewide total.
# Compute each county's share of the statewide total for B19101_017
# (B19101_017 counts families with income of $200,000 or more, so this is a
# share of the state total, not the rate within each county)
merged_sf$percentage <- (merged_sf$estimate / sum(merged_sf$estimate)) * 100
# Create a choropleth map using ggplot with viridis color palette
ggplot(merged_sf, aes(fill = percentage)) +
  geom_sf() + # Draw the county polygons from the sf geometry column
  scale_fill_viridis(option = "C", direction = -1) + # Reversed viridis "plasma" palette
  labs(title = "County Share of New Jersey Families with\nIncome of $200,000 or More, 2018-2022",
       fill = "Share of state total (%)",
       x = "Longitude",
       y = "Latitude",
       caption = "This choropleth map shows each county's share of New Jersey families with income of $200,000 or more.\nThe data is sourced from the American Community Survey (ACS) 2018-2022 5-Year Estimates (table B19101).") +
  theme_minimal() +
  theme(plot.caption = element_text(hjust = 0), # Left-align the caption
        plot.title = element_text(hjust = 0.5)) # Center the title
This report analyzed ACS data for New Jersey in both non-spatial and spatial form. At the county level, the share of the population with a graduate degree ranges from 5.7% in Cumberland County to 26.1% in Somerset County, and the maps show how families with income of $200,000 or more are distributed across the state's counties. Together, these results help in understanding socioeconomic differences within the state.