In this analysis, I focus on West Virginia (WV), my home state in Appalachia. It has great personal meaning to me, as most of my family still lives there, and it is a vivid example of how education can change lives. My goal is to examine the state’s educational framework, focusing particularly on tertiary and vocational skills and how they relate to economic prosperity.
West Virginia is a compelling case for examining how educational attainment affects economic outcomes. Using data from the American Community Survey (ACS), I aim to uncover patterns and inequalities that could inform policy and growth strategies. I use the ACS because it provides current, detailed demographic data for communities across the U.S. on an annual basis, including educational attainment, income, occupation, and housing characteristics. For researchers, the ACS is an indispensable tool for understanding a community’s needs, allocating resources, planning services, and developing policy based on the current social, economic, and housing conditions of the population.
I prepared the ACS data to compare educational attainment and median household income in each county and to build a comprehensive picture of the state’s socioeconomic status.
To gather relevant data, I used the tidycensus package, focusing on educational attainment and median income by county. The inclusion of libraries such as tidyverse and ggplot2 supported the analysis and visualization.
# Ensure all required libraries are loaded
required_packages <- c("tidycensus", "tidyverse", "ggplot2", "plotly", "sf", "mapview","ggiraph")
new_packages <- required_packages[!required_packages %in% installed.packages()[,"Package"]]
if(length(new_packages)) install.packages(new_packages)
# Load libraries
library(tidycensus)
library(tidyverse)
library(ggplot2)
library(plotly)
library(sf)
library(mapview)
library(ggiraph)
Obtaining a Census API key was crucial to accessing the detailed data that formed the basis of this research.
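A minimal sketch of registering the key for a session. It assumes the key was saved beforehand in an environment variable named CENSUS_API_KEY; the fallback message is only illustrative.

```r
library(tidycensus)

# Assumption: the key was stored earlier in the CENSUS_API_KEY environment
# variable (for example in ~/.Renviron), so it never appears in the script.
api_key <- Sys.getenv("CENSUS_API_KEY")

if (nzchar(api_key)) {
  # Register the key for the current R session
  census_api_key(api_key)
} else {
  message("No Census API key found; request one at https://api.census.gov/data/key_signup.html")
}
```

Keeping the key out of the script makes the analysis easier to share without exposing credentials.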
To browse the available variables with the load_variables() function, I specify the year and the dataset I’m interested in.
# Load the variables for 2021 ACS 5-year estimates
variables <- load_variables(year = 2021, dataset = "acs5", cache = TRUE)
# Print the loaded variables
print(variables)
## # A tibble: 27,886 × 4
## name label concept geography
## <chr> <chr> <chr> <chr>
## 1 B01001A_001 Estimate!!Total: SEX BY AGE (W… tract
## 2 B01001A_002 Estimate!!Total:!!Male: SEX BY AGE (W… tract
## 3 B01001A_003 Estimate!!Total:!!Male:!!Under 5 years SEX BY AGE (W… tract
## 4 B01001A_004 Estimate!!Total:!!Male:!!5 to 9 years SEX BY AGE (W… tract
## 5 B01001A_005 Estimate!!Total:!!Male:!!10 to 14 years SEX BY AGE (W… tract
## 6 B01001A_006 Estimate!!Total:!!Male:!!15 to 17 years SEX BY AGE (W… tract
## 7 B01001A_007 Estimate!!Total:!!Male:!!18 and 19 years SEX BY AGE (W… tract
## 8 B01001A_008 Estimate!!Total:!!Male:!!20 to 24 years SEX BY AGE (W… tract
## 9 B01001A_009 Estimate!!Total:!!Male:!!25 to 29 years SEX BY AGE (W… tract
## 10 B01001A_010 Estimate!!Total:!!Male:!!30 to 34 years SEX BY AGE (W… tract
## # ℹ 27,876 more rows
# Filter for variables related to education
education_variables <- variables %>%
filter(grepl("education", label, ignore.case = TRUE))
# Print the filtered variables related to education
print(education_variables)
## # A tibble: 171 × 4
## name label concept geography
## <chr> <chr> <chr> <chr>
## 1 B08126_011 Estimate!!Total:!!Educational services, and hea… MEANS … tract
## 2 B08126_026 Estimate!!Total:!!Car, truck, or van - drove al… MEANS … tract
## 3 B08126_041 Estimate!!Total:!!Car, truck, or van - carpoole… MEANS … tract
## 4 B08126_056 Estimate!!Total:!!Public transportation (exclud… MEANS … tract
## 5 B08126_071 Estimate!!Total:!!Walked:!!Educational services… MEANS … tract
## 6 B08126_086 Estimate!!Total:!!Taxicab, motorcycle, bicycle,… MEANS … tract
## 7 B08126_101 Estimate!!Total:!!Worked from home:!!Educationa… MEANS … tract
## 8 B08526_011 Estimate!!Total:!!Educational services, and hea… MEANS … county
## 9 B08526_026 Estimate!!Total:!!Car, truck, or van - drove al… MEANS … county
## 10 B08526_041 Estimate!!Total:!!Car, truck, or van - carpoole… MEANS … county
## # ℹ 161 more rows
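The label search above mostly surfaces industry rows ("Educational services, and health care…") rather than attainment tables. A sketch of searching the concept column instead, run on a small hand-made stand-in for the variables table; the rows are illustrative and were not pulled from the API.

```r
# Hand-made stand-in for the table returned by load_variables();
# the rows are illustrative and only mimic its name/label/concept layout.
variables_demo <- data.frame(
  name = c("B08126_011", "B15003_022", "B15003_025", "B19013_001"),
  label = c(
    "Estimate!!Total:!!Educational services, and health care and social assistance",
    "Estimate!!Total:!!Bachelor's degree",
    "Estimate!!Total:!!Doctorate degree",
    "Estimate!!Median household income in the past 12 months"
  ),
  concept = c(
    "MEANS OF TRANSPORTATION TO WORK BY INDUSTRY",
    "EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER",
    "EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER",
    "MEDIAN HOUSEHOLD INCOME IN THE PAST 12 MONTHS"
  ),
  stringsAsFactors = FALSE
)

# Matching on the label picks up the industry row
label_hits <- variables_demo[grepl("education", variables_demo$label, ignore.case = TRUE), ]

# Matching on the concept isolates the attainment table
attainment_hits <- variables_demo[grepl("educational attainment", variables_demo$concept, ignore.case = TRUE), ]

print(label_hits$name)
print(attainment_hits$name)
```

The DP02 variable used later in this analysis belongs to the ACS Data Profile tables, which load_variables() exposes under a separate dataset name ("acs5/profile"), so a search of "acs5" alone will not list it.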
Preparing and cleaning the data allowed me to examine educational trends and economic conditions throughout West Virginia. My analysis aims to highlight the areas with high educational attainment and identify the lagging areas.
# Retrieve educational attainment data
education_data <- get_acs(geography = "county", variables = "DP02_0066PE", state = "WV", year = 2021)
# Retrieve median household income data (2019 release; the education data above uses 2021)
income_data <- get_acs(geography = "county", variables = "B19013_001E", state = "WV", year = 2019, geometry = TRUE)
# Cleaning and preparing data for analysis
education_data <- education_data %>% rename(County = NAME, GradDegreePercent = estimate) %>% drop_na()
income_data <- st_transform(income_data, crs = 4326) # Reproject the whole sf object to WGS 84 (EPSG:4326)
With the processed data, I can now examine education and economic metrics across West Virginia to gain insights and inform policy discussions.
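Relating the two metrics county by county requires a shared key. A minimal sketch, assuming both tables keep the GEOID column that get_acs() returns; the GEOIDs and values below are made-up placeholders, not ACS figures.

```r
# Illustrative placeholder values, not ACS figures; the GEOIDs are made up
education_demo <- data.frame(
  GEOID = c("99001", "99003", "99005", "99007"),
  GradDegreePercent = c(6.0, 9.5, 4.0, 3.5),
  stringsAsFactors = FALSE
)
income_demo <- data.frame(
  GEOID = c("99001", "99003", "99005", "99007"),
  MedianIncome = c(45000, 62000, 41000, 38000),
  stringsAsFactors = FALSE
)

# Join the two tables on the county identifier
county_demo <- merge(education_demo, income_demo, by = "GEOID")

# Rank-based correlation is less sensitive to a few outlying counties
rho <- cor(county_demo$GradDegreePercent, county_demo$MedianIncome, method = "spearman")
print(rho)
```

With the real data, the geometry column of income_data would probably need to be dropped first (for example with sf::st_drop_geometry()) so that the join returns an ordinary data frame.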
I begin by examining the distribution of graduate degree holders across counties, identifying areas with high educational attainment and those facing major challenges. Through this analysis, I aim to highlight educational disparities and opportunities for improvement.
The code below identifies the counties with the highest and lowest shares of graduate degree holders, shedding light on inequities in education and pointing to where intervention may be most needed.
# Extract the top 5 counties with the highest percentage of individuals with graduate degrees
top_education <- education_data %>%
arrange(desc(GradDegreePercent)) %>%
slice_head(n = 5)
# Extract the bottom 5 counties with the lowest percentage of individuals with graduate degrees
bottom_education <- education_data %>%
arrange(GradDegreePercent) %>%
slice_head(n = 5)
# Prepare the data for listing
top_list <- paste0(1:5, ". ", top_education$County)
bottom_list <- paste0(1:5, ". ", bottom_education$County)
# Print the lists
cat("Counties with the Highest Levels of Educational Attainment:\n")
## Counties with the Highest Levels of Educational Attainment:
cat(top_list, sep = "\n")
## 1. Monongalia County, West Virginia
## 2. Ohio County, West Virginia
## 3. Jefferson County, West Virginia
## 4. Cabell County, West Virginia
## 5. Kanawha County, West Virginia
cat("\n\nCounties with the Lowest Levels of Educational Attainment:\n")
##
##
## Counties with the Lowest Levels of Educational Attainment:
cat(bottom_list, sep = "\n")
## 1. McDowell County, West Virginia
## 2. Lincoln County, West Virginia
## 3. Calhoun County, West Virginia
## 4. Grant County, West Virginia
## 5. Webster County, West Virginia
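The same top and bottom selection can also be written with dplyr's slice_max() and slice_min(), which combine the sort and the cut in one step. This is an alternative sketch on made-up placeholder values, not the code that produced the lists above.

```r
library(dplyr)

# Illustrative placeholder values, not ACS figures
demo <- tibble(
  County = c("A", "B", "C", "D", "E", "F"),
  GradDegreePercent = c(12.1, 3.4, 7.8, 5.0, 9.9, 2.2)
)

# with_ties = FALSE guarantees exactly n rows even when values are tied
top_two <- demo %>% slice_max(GradDegreePercent, n = 2, with_ties = FALSE)
bottom_two <- demo %>% slice_min(GradDegreePercent, n = 2, with_ties = FALSE)

print(top_two$County)
print(bottom_two$County)
```

By default both functions keep ties, so a request for five counties can return more than five rows unless with_ties = FALSE is set.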
I used both static and interactive visualizations to make the results more accessible. From charts of educational disparities to maps of county income levels, these visual aids promote understanding and engagement.
Static charts illustrate how educational attainment varies across West Virginia counties and give a first overview of the state’s educational dynamics.
# Clean the county names: drop the state suffix so the axis labels stay short
education_data <- education_data %>%
mutate(County = str_replace(County, ", West Virginia", ""))
# Horizontal bar chart of graduate degree share by county:
# - counties are ordered by GradDegreePercent
# - bars are filled in dodgerblue
# - coord_flip() puts the county names on the vertical axis for readability
ggplot(education_data, aes(x = reorder(County, GradDegreePercent), y = GradDegreePercent)) +
geom_col(fill = "dodgerblue") +
coord_flip() +
labs(title = "Educational Attainment in West Virginia Counties",
x = "",
y = "Percentage with Graduate Degrees") +
theme_minimal() +
theme(
axis.title.y = element_text(hjust = 0.5, margin = margin(t = 0, r = 20, b = 0, l = 20)),
plot.caption = element_text(hjust = 0.5),
plot.title = element_text(hjust = 0.5), # Center the plot title
plot.margin = margin(0, 0, 2, 0, "cm") # Adjust the bottom margin as needed
)
Figure 1: Static Chart illustrating the Distribution of Graduate Education across West Virginia Counties: This bar chart shows the proportion of adults aged 25 and over who hold graduate degrees in each of West Virginia’s counties. The data come from the 2021 American Community Survey (ACS) 5-year estimates and are arranged in descending order to underscore the educational disparities among counties. All counties returned by the query are included, giving a complete overview of the state’s educational status. The percentages are the published ACS estimates for variable DP02_0066PE and were not recomputed here.
Interactive visualizations invite readers to explore educational and economic landscapes, allowing for deeper engagement and understanding of the data.
# Create a ggplot object to visualize the educational attainment in West Virginia counties.
# This ggplot object represents a bar plot showing the percentage of individuals with graduate degrees in each county.
# Counties are reordered based on the percentage of individuals with graduate degrees (GradDegreePercent).
# Bars are colored in dodgerblue, and their width is adjusted for spacing.
ggplot_object <- ggplot(education_data, aes(x = reorder(County, GradDegreePercent), y = GradDegreePercent)) +
geom_bar(stat = "identity", fill = "dodgerblue", width = 0.5) + # Adjust width for spacing
coord_flip() + # Flip the bars to be horizontal
theme_classic() + # Use a classic theme for a clean look
theme(
axis.text.y = element_text(size = 6), # Adjust the text size for y-axis labels (county names)
axis.title.x = element_blank(), # Remove the x-axis title
plot.title = element_text(hjust = 0.5), # Center the plot title
axis.title.y = element_text(size = 10), # Adjust the text size for the y-axis title
plot.margin = margin(t = 0, r = 0, b = 1, l = 0, "cm") # Adjust the top, right, bottom, and left margins
) +
labs(title = "Educational Attainment in West Virginia Counties",
x = "",
y = "Percentage with Graduate Degrees")
# Convert to an interactive plotly object
interactive_plot <- ggplotly(ggplot_object)
# Print the interactive plot
interactive_plot
Figure 2: Interactive Chart demonstrating the Educational Disparity in Graduate Degree Holders Across West Virginia: This interactive bar chart shows the variation in graduate degree attainment among adults aged 25 and over in West Virginia’s counties, presented in descending order. The underlying 2021 American Community Survey gathers educational data from a representative sample of households in every county. The figure illustrates the notable differences in higher education achievement across the state.
To better understand the reliability of my data, I visualize the margin of error associated with the percentage of graduate degree holders by county. This visualization shows how precise each estimate is and allows for more informed decision making.
# Fetch educational attainment data
education_data <- get_acs(
geography = "county",
variables = c(estimate = "DP02_0066PE"),
state = "WV",
year = 2021,
survey = "acs5"
) %>%
mutate(
County = gsub(", West Virginia", "", NAME),
County = gsub(" County", "", County), # Remove " County" from county names
LowerBound = estimate - moe,
UpperBound = estimate + moe
)
# Plot with colorblind-friendly colors
ggplot(education_data, aes(x = reorder(County, estimate), y = estimate)) +
geom_point(color = '#377eb8') + # Adjusted point color to a shade of blue
geom_errorbar(aes(ymin = LowerBound, ymax = UpperBound), width = 0.2, color = '#ff7f00') + # Adjusted error bar color to a shade of orange
coord_flip() +
labs(title = "Graduate Degree Holders by County in WV: Estimate and Margin of Error",
x = "County",
y = "Percentage (%) of Population with Graduate Degrees") +
theme_minimal() +
theme(
axis.text.y = element_text(angle = 45, hjust = 1),
plot.margin = margin(t = 0, r = 0, b = 1, l = 0, "cm") # Adjust the margins (top, right, bottom, left)
)
Figure 3: Educational Attainment Visualized: The Spread of Graduate Degrees Across West Virginia. This graph displays the share of adults with graduate degrees in each of West Virginia’s counties. Each county’s percentage is marked with a blue dot, and the range of uncertainty is captured by orange error bars representing the margin of error. Counties are ordered by educational attainment level to emphasize disparities. The data were sourced from the 2021 American Community Survey 5-year estimates, focusing on the population aged 25 and over with advanced education. Error bars are derived from the margins of error published with the ACS data, which the Census Bureau reports at the 90 percent confidence level, and show the statistical uncertainty in these estimates. Such visualizations are instrumental in policy formulation, targeting areas that might benefit from educational initiatives.
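Because ACS margins of error are published at the 90 percent confidence level, they can be rescaled to other levels or turned into a coefficient of variation to compare reliability across counties. A worked sketch on made-up placeholder numbers, using the standard z-values 1.645 and 1.96:

```r
# Illustrative placeholder values, not ACS figures
estimate <- c(10.0, 4.0)   # percent with a graduate degree
moe_90 <- c(1.645, 3.29)   # margins of error at the 90 percent level

# Standard error implied by a 90 percent margin of error
se <- moe_90 / 1.645

# Rescale to a 95 percent margin of error
moe_95 <- se * 1.96

# Coefficient of variation, as a percentage of the estimate
cv <- 100 * se / estimate

print(data.frame(estimate, moe_90, se, moe_95, cv))
```

A high coefficient of variation flags an estimate that should be read with caution. tidycensus also documents a moe_level argument to get_acs() for requesting a different confidence level directly, which may be simpler than rescaling by hand.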
With the data prepared, I can now create an interactive chart for examining the percentages and margins of error for graduate degree holders by county.
# Fetch educational attainment data
education_data <- get_acs(
geography = "county",
variables = c(estimate = "DP02_0066PE"),
state = "WV",
year = 2021,
survey = "acs5"
) %>%
mutate(
County = gsub(", West Virginia", "", NAME),
LowerBound = estimate - moe,
UpperBound = estimate + moe
)
# Remove "County" from each county name
education_data$County <- gsub(" County", "", education_data$County)
# Generate the interactive plot
interactive_plot <- ggplot(education_data, aes(
x = reorder(County, estimate),
y = estimate,
text = paste0(County, ": ", estimate, "% ± ", moe)
)) +
geom_point(color = '#377eb8') +
geom_errorbar(aes(ymin = LowerBound, ymax = UpperBound), width = 0.2, color = '#ff7f00') +
labs(
title = "Interactive Visualization of Graduate Degree Holders by County in WV",
x = "County",
y = "Percentage (%) of Population with Graduate Degrees"
) +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
plot.margin = margin(t = 0, r = 0, b = 1, l = 0, unit = "cm") # Added extra space at the bottom
)
# Convert ggplot plot to interactive plotly plot
interactive_plotly <- ggplotly(interactive_plot, tooltip = "text")
# Print the interactive plot
interactive_plotly
Figure 4: Interactive Visualization of Graduate Degree Holders by County in WV. This interactive plot displays the percentage of the population with graduate degrees in each county of West Virginia. Hover over the points and error bars to see the exact percentages and margins of error for each county.
Transitioning from education to economics, I examine median household income in different counties to uncover economic opportunities.
A choropleth map is used to visually represent median household income in West Virginia counties. This mapping technique provides a spatial perspective that allows for the identification of economic disparities and differences between different regions. By coloring each county according to its income level, the choropleth map provides a clear and intuitive way to understand the distribution of wealth within the state. This visualization enhances our economic analysis by providing spatial context to the data.
# Choropleth of median household income in West Virginia by county.
# Each county polygon is filled according to its income estimate;
# theme_minimal() keeps the map clean and the title is centered.
ggplot(data = income_data) +
geom_sf(aes(fill = estimate), color = "white", size = 0.1) +
scale_fill_viridis_c(option = "magma", direction = -1) +
labs(title = "Median Household Income in West Virginia by County", fill = "Income") +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5),
plot.margin = margin(t = 0, r = 0, b = 1, l = 0, "cm") # Adjusted margins
)
Figure 5: Median Household Income in West Virginia by County: This choropleth map illustrates the median household income across West Virginia’s counties. Each county is shaded according to its income level, providing a visual representation of economic disparities within the state. The data are the 2019 ACS median household income estimates (variable B19013_001E) stored in income_data. The color gradient reflects income levels, with darker shades indicating higher median household incomes and lighter shades indicating lower ones. This visualization offers valuable insights into the economic landscape of West Virginia, enabling policymakers and stakeholders to identify areas of concern and prioritize resources effectively.
The interactive map provides a dynamic exploration of median household income in West Virginia counties. By hovering over each county, you can view detailed information, including the county name and median income. This interactive visualization encourages user engagement and understanding by providing a hands-on exploration of economic disparities within the state. Users can interact with the map, zooming in on specific areas of interest and gaining valuable insight into West Virginia’s economic landscape.
# Build an interactive version of the income map with ggplot2 and plotly.
# Each county polygon is filled by median household income, and hovering
# over a county shows its name and median income.
# First, create a new text column for the hover information
income_data$text_hover <- paste0(income_data$NAME, "<br>Median Income: $", format(income_data$estimate, big.mark = ",", trim = TRUE))
# Now, create the static ggplot object with the hover text
static_map <- ggplot(data = income_data) +
geom_sf(aes(fill = estimate, text = text_hover), color = "white", size = 0.1) +
scale_fill_viridis_c(option = "magma", direction = -1) +
labs(title = "Median Household Income in West Virginia by County", fill = "Income") +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5),
plot.margin = margin(t = 0, r = 0, b = 1, l = 0, "cm") # Add extra space at the bottom
)
# Convert the static map into an interactive plotly object with custom hover text
interactive_map <- ggplotly(static_map, tooltip = "text") # "text" is the aesthetic that carries text_hover
# Print the interactive map
interactive_map
Figure 6: Interactive Map: This interactive map offers a dynamic exploration of median household income across West Virginia’s counties. By hovering over each county, users can access detailed information, including the county name and median income. The map enhances user engagement and understanding by allowing hands-on exploration of economic disparities within the state. Users can interact with the map, zooming in on specific areas of interest and gaining valuable insights into the economic landscape of West Virginia.
In this section of my analysis, I focus on creating an interactive map to visualize the differences in median household incomes across West Virginia counties. The visualization is done using the mapview package, which can be used to create interactive maps that can be dynamically explored in an R environment or a web browser.
# Interactive mapview map of median household income across West Virginia counties.
# Legend breaks group incomes into classes, and each class gets its own color
# (lighter for lower incomes, darker for higher incomes).
# The legend title is set to "Estimated Median Household Income".
# Define breaks for the legend
breaks <- c(0, 40000, 60000, 80000, 100000)
# Define the colors for lower incomes (shades of yellow) and higher incomes (shades of green)
colors <- c("#ffffcc", "#c2e699", "#78c679", "#31a354")
# Create the mapview with the specified breaks, colors, and legend title
income_map <- mapview(income_data, zcol = 'estimate', legend = TRUE, at = breaks, col.regions = colors, legend.title = "Estimated Median Household Income")
# Print the mapview map
income_map
Figure 7: Interactive Map Visualization of Median Household Income: This interactive map visualization showcases the disparities in median household incomes across the counties of West Virginia. The mapview package is utilized to create an interactive map that can be dynamically explored within an R environment or a web browser. To understand the visualization, the breaks for the legend are defined to categorize income levels, and corresponding colors are assigned to represent lower and higher incomes. The legend title is set as “Estimated Median Household Income” to provide clarity on the data being represented. This visualization allows users to interactively explore the median household incomes of different counties in West Virginia. It offers a valuable tool for policymakers, researchers, and stakeholders to understand economic disparities and prioritize interventions effectively.
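To see how individual incomes fall into the four legend classes, the same break points can be applied with base R's cut(). This is a sketch on made-up placeholder incomes; treating each class as closed on the left (right = FALSE) is an assumption and may not match how mapview assigns boundary values.

```r
# Same class boundaries as the map legend
breaks <- c(0, 40000, 60000, 80000, 100000)

# Illustrative placeholder incomes, not ACS figures
income_demo <- c(35000, 40000, 52500, 79999, 91000)

# Assign each income to a class; right = FALSE makes intervals [lower, upper)
income_class <- cut(income_demo, breaks = breaks, right = FALSE, dig.lab = 6)

# Count how many values land in each legend class
print(table(income_class))
```

Any value at or above the last break would come back as NA here, so the upper break should sit above the largest county median when this is applied to real data.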
In this analysis, I examined the educational and economic landscape of West Virginia and the interplay between educational attainment and economic prosperity. The results derived from the American Community Survey (ACS) reveal not only inequities and opportunities, but also the pressing challenges facing the state.
Educational Insights: Examining educational trends in West Virginia’s counties revealed both encouraging successes and troubling inequities. Identifying the areas with the highest and lowest educational attainment underscored the urgent need for targeted educational interventions. These efforts are essential to improving the state’s overall educational landscape and reducing existing inequalities.
Visual Representations: Through the use of static and interactive visualizations, this analysis has made the educational dynamics and economic disparities within the state more understandable. From detailed static charts to dynamic interactive maps, these visual tools have greatly improved engagement and understanding of the data presented and provided a clearer picture of West Virginia’s educational and economic status.
Economic Analysis: Moving from education to the economic dimension provided insight into median household income in each county. Choropleth maps and interactive visualizations helped highlight economic disparities and give policy makers and stakeholders valuable tools to identify and prioritize areas in need of economic uplift.
Margin of Error Visualization for graduate degree holders: An important addition to this analysis is the visualization of the margin of error associated with the percentage of graduate degree holders by county. This aspect of the analysis shows how precise each estimate is and where caution is needed, particularly for counties with wide error bars. By making the statistical uncertainty of these estimates visible, the margin of error visualization supports informed decision making and policy formulation. It underscores the importance of recognizing the uncertainties inherent in survey data, so that interventions and policies rest on a nuanced understanding of the state’s educational attainment.
To summarize this analysis of West Virginia’s education and economy, data from the American Community Survey was carefully prepared and analyzed. Specific trends in educational attainment and median household income by county were identified, with interactive visualizations allowing for detailed examination of the data by county. Bar charts and choropleth maps highlighted disparities, and interactive tools facilitated engagement with the results and helped stakeholders make decisions about educational and economic improvement in the state.
As a native West Virginian, I see this project as more than an academic exercise; it is a personal exploration of the potential of my home state. It reinforces my belief in the power of education and economic development as catalysts for change.
Going forward, it is important to continue to explore these dynamics and look for innovative ways to address inequities and promote growth in West Virginia. Through informed policy and targeted initiatives, we can create a better future for the state and its citizens.
Cheng, J., Xie, Y., & Allaire, J. (2021). plotly: Create Interactive Web Graphics via ‘plotly.js’ (R package version 4.10.0) [Computer software]. CRAN.R-project.org/package=plotly
Pebesma, E. (2018). sf: Simple Features for R (R package version 1.0-3) [Computer software]. CRAN.R-project.org/package=sf
Tennekes, M. (2021). tmap: Thematic Maps (R package version 3.3-1) [Computer software]. CRAN.R-project.org/package=tmap
U.S. Census Bureau. (2019). American Community Survey 5-year estimates, variable B19013_001E. Data retrieved using tidycensus package in R.
U.S. Census Bureau. (2021). American Community Survey 5-year estimates, variable DP02_0066PE. Data retrieved using tidycensus package in R.
Walker, K. (2021). tidycensus: Load US Census Boundary and Attribute Data as ‘tidyverse’ and ‘sf’-Ready Data Frames (R package version 1.0) [Computer software]. CRAN.R-project.org/package=tidycensus
Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis (2nd ed.) [Computer software]. Springer-Verlag New York. ggplot2.tidyverse.org
Wickham, H., François, R., Henry, L., & Müller, K. (2021). dplyr: A Grammar of Data Manipulation (R package version 1.0.7) [Computer software]. CRAN.R-project.org/package=dplyr
Yau, N. (2013). Data Points: Visualization That Means Something. Wiley.