Analysis of Education and Economy in West Virginia

In this analysis, I focus on West Virginia (WV), my home state in Appalachia. It has great personal meaning to me: most of my family still lives there, and the state is a vivid example of how education can change lives. My goal is to examine the state’s educational attainment, focusing on graduate and professional degrees, and how it relates to economic prosperity.

Introduction: Exploring Education and Economy in West Virginia

West Virginia is a compelling case for examining how educational attainment relates to economic outcomes. Using data from the American Community Survey (ACS), I aim to uncover patterns and inequalities that could inform policy and growth strategies. I use the ACS because it provides current, detailed demographic data for communities across the U.S. on an annual basis, including information on educational attainment, income, occupation, and housing characteristics. For researchers, the ACS is an indispensable tool for understanding a community’s needs, allocating resources, planning services, and developing policy based on the current social, economic, and housing conditions of the population.

Data Preparation

I prepared county-level ACS data on educational attainment and median household income to build a picture of the socioeconomic status of our state.

Fetching the Data

To gather relevant data, I used the tidycensus package, focusing on educational attainment and median income by county. The inclusion of libraries such as tidyverse and ggplot2 supported the analysis and visualization.

# Ensure all required libraries are loaded
required_packages <- c("tidycensus", "tidyverse", "ggplot2", "plotly", "sf", "mapview","ggiraph")
new_packages <- required_packages[!required_packages %in% installed.packages()[,"Package"]]
if(length(new_packages)) install.packages(new_packages)

# Load libraries
library(tidycensus)
library(tidyverse)
library(ggplot2)
library(plotly)
library(sf)
library(mapview)
library(ggiraph)

Securing Access: Establishing the Census API Key

Obtaining a Census API key was crucial to accessing the detailed data that formed the basis of this research.
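
As a minimal sketch of that step, the key can be registered once with tidycensus’s census_api_key() function and then checked at the start of each session. The placeholder string and the helper name has_census_key are assumptions for illustration; CENSUS_API_KEY is the environment variable that tidycensus reads.

```r
# One-time setup, run interactively with your own key in place of the placeholder:
# tidycensus::census_api_key("YOUR_KEY_HERE", install = TRUE)

# Hypothetical helper: TRUE if a key is visible to the current R session.
has_census_key <- function() {
  nzchar(Sys.getenv("CENSUS_API_KEY"))
}

# Warn early instead of failing later inside a data request.
if (!has_census_key()) {
  message("No Census API key found. Request one at https://api.census.gov/data/key_signup.html")
}
```

Keeping the key in the environment rather than in the script means the analysis can be shared without exposing it.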

Using load_variables()

To examine the available variables with the load_variables() function, I need to specify the year and the data set I’m interested in.

# Load the variables for 2021 ACS 5-year estimates
variables <- load_variables(year = 2021, dataset = "acs5", cache = TRUE)

# Print the loaded variables
print(variables)

## # A tibble: 27,886 × 4
##    name        label                                    concept        geography
##    <chr>       <chr>                                    <chr>          <chr>    
##  1 B01001A_001 Estimate!!Total:                         SEX BY AGE (W… tract    
##  2 B01001A_002 Estimate!!Total:!!Male:                  SEX BY AGE (W… tract    
##  3 B01001A_003 Estimate!!Total:!!Male:!!Under 5 years   SEX BY AGE (W… tract    
##  4 B01001A_004 Estimate!!Total:!!Male:!!5 to 9 years    SEX BY AGE (W… tract    
##  5 B01001A_005 Estimate!!Total:!!Male:!!10 to 14 years  SEX BY AGE (W… tract    
##  6 B01001A_006 Estimate!!Total:!!Male:!!15 to 17 years  SEX BY AGE (W… tract    
##  7 B01001A_007 Estimate!!Total:!!Male:!!18 and 19 years SEX BY AGE (W… tract    
##  8 B01001A_008 Estimate!!Total:!!Male:!!20 to 24 years  SEX BY AGE (W… tract    
##  9 B01001A_009 Estimate!!Total:!!Male:!!25 to 29 years  SEX BY AGE (W… tract    
## 10 B01001A_010 Estimate!!Total:!!Male:!!30 to 34 years  SEX BY AGE (W… tract    
## # ℹ 27,876 more rows

Searching for Variables

# Filter for variables whose label mentions "education"
# (note: this also matches industry categories such as "Educational services")
education_variables <- variables %>%
  filter(grepl("education", label, ignore.case = TRUE))

# Print the filtered variables related to education
print(education_variables)

## # A tibble: 171 × 4
##    name       label                                            concept geography
##    <chr>      <chr>                                            <chr>   <chr>    
##  1 B08126_011 Estimate!!Total:!!Educational services, and hea… MEANS … tract    
##  2 B08126_026 Estimate!!Total:!!Car, truck, or van - drove al… MEANS … tract    
##  3 B08126_041 Estimate!!Total:!!Car, truck, or van - carpoole… MEANS … tract    
##  4 B08126_056 Estimate!!Total:!!Public transportation (exclud… MEANS … tract    
##  5 B08126_071 Estimate!!Total:!!Walked:!!Educational services… MEANS … tract    
##  6 B08126_086 Estimate!!Total:!!Taxicab, motorcycle, bicycle,… MEANS … tract    
##  7 B08126_101 Estimate!!Total:!!Worked from home:!!Educationa… MEANS … tract    
##  8 B08526_011 Estimate!!Total:!!Educational services, and hea… MEANS … county   
##  9 B08526_026 Estimate!!Total:!!Car, truck, or van - drove al… MEANS … county   
## 10 B08526_041 Estimate!!Total:!!Car, truck, or van - carpoole… MEANS … county   
## # ℹ 161 more rows
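
The first matches above are industry categories rather than attainment levels, because the filter looks for "education" anywhere in the label. A narrower search, sketched below, matches "educational attainment" in either the label or the concept. The helper name find_variables is hypothetical, and the commented usage assumes that DP-prefixed variables such as DP02_0066PE are listed under the "acs5/profile" dataset rather than "acs5".

```r
# Hypothetical helper: keep rows whose label or concept matches a pattern.
# `vars` is expected to have character columns `label` and `concept`,
# as returned by load_variables().
find_variables <- function(vars, pattern) {
  hit <- grepl(pattern, vars$label, ignore.case = TRUE) |
    grepl(pattern, vars$concept, ignore.case = TRUE)
  vars[hit, , drop = FALSE]
}

# Usage (requires network access, so it is left commented out):
# profile_vars <- tidycensus::load_variables(2021, "acs5/profile", cache = TRUE)
# find_variables(profile_vars, "educational attainment")
```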

Insights from Analysis: Understanding Educational and Economic Dynamics

Preparing and cleaning the data allowed me to examine educational trends and economic conditions throughout West Virginia. My analysis aims to highlight the areas with high educational attainment and identify the lagging areas.

Data Retrieval and Cleaning

# Retrieve educational attainment data (percent of adults 25+ with a graduate or professional degree)
education_data <- get_acs(geography = "county", variables = "DP02_0066PE", state = "WV", year = 2021)

# Retrieve median household income data together with county geometries
# Note: this request uses the 2019 vintage, while the education data above uses 2021
income_data <- get_acs(geography = "county", variables = "B19013_001E", state = "WV", year = 2019, geometry = TRUE)

# Cleaning and preparing data for analysis
education_data <- education_data %>% rename(County = NAME, GradDegreePercent = estimate) %>% drop_na()
income_data <- st_transform(income_data, crs = 4326) # Reproject the whole sf object to WGS 84 (EPSG:4326)

Analysis

With the processed data, I can now examine education and economic metrics across West Virginia to gain insights and inform policy discussions.

Educational Trends

I begin by examining the distribution of graduate degree holders across counties, identifying areas with high educational attainment and those facing major challenges. Through this analysis, I aim to highlight educational disparities and opportunities for improvement.

Identifying Extremes

Here I identify the counties with the highest and lowest shares of graduate degree holders. This comparison sheds light on inequities in education and points to where interventions might be targeted.

# Extract the top 5 counties with the highest percentage of individuals with graduate degrees
top_education <- education_data %>% 
  arrange(desc(GradDegreePercent)) %>% 
  slice_head(n = 5)

# Extract the bottom 5 counties with the lowest percentage of individuals with graduate degrees
bottom_education <- education_data %>% 
  arrange(GradDegreePercent) %>% 
  slice_head(n = 5)

# Prepare the data for listing
top_list <- paste0(1:5, ". ", top_education$County)
bottom_list <- paste0(1:5, ". ", bottom_education$County)

# Print the lists
cat("Counties with the Highest Levels of Educational Attainment:\n")

## Counties with the Highest Levels of Educational Attainment:

cat(top_list, sep = "\n")

## 1. Monongalia County, West Virginia
## 2. Ohio County, West Virginia
## 3. Jefferson County, West Virginia
## 4. Cabell County, West Virginia
## 5. Kanawha County, West Virginia

cat("\n\nCounties with the Lowest Levels of Educational Attainment:\n")

## 
## 
## Counties with the Lowest Levels of Educational Attainment:

cat(bottom_list, sep = "\n")

## 1. McDowell County, West Virginia
## 2. Lincoln County, West Virginia
## 3. Calhoun County, West Virginia
## 4. Grant County, West Virginia
## 5. Webster County, West Virginia
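
The same ranking can also be written with dplyr’s slice_max() and slice_min(), which make the handling of ties explicit. This is a sketch of an alternative, not the code that produced the lists above; the helper name top_and_bottom is hypothetical and it assumes a GradDegreePercent column as created during cleaning.

```r
library(dplyr)

# Hypothetical helper: return the n highest and n lowest rows by GradDegreePercent.
# with_ties = FALSE guarantees exactly n rows even when counties share a value.
top_and_bottom <- function(data, n = 5) {
  list(
    top    = slice_max(data, GradDegreePercent, n = n, with_ties = FALSE),
    bottom = slice_min(data, GradDegreePercent, n = n, with_ties = FALSE)
  )
}

# Usage with the cleaned data from above:
# extremes <- top_and_bottom(education_data)
# extremes$top$County
```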

Visualizing Findings: Bringing Data to Life

I used both static and interactive visualizations to make the results more accessible. From charts of educational disparities to maps of income levels by county, these visual aids promote understanding and engagement.

Crafting Static Representations

I use a static chart to illustrate graduate degree attainment in West Virginia counties and to promote a deeper understanding of educational dynamics.

# Clean the county names by removing ", West Virginia" for clearer plot labels
education_data <- education_data %>%
  mutate(County = str_replace(County, ", West Virginia", ""))

# Create the plot with the cleaned 'County' names.
# In this plot:
# - Counties are ordered by the percentage of individuals with graduate degrees (GradDegreePercent).
# - Bars are colored in dodgerblue.
# - coord_flip() turns the bars horizontal so the county names are readable.
# (An earlier theme() call was dropped: theme_minimal() below would have replaced it anyway.)
ggplot(education_data, aes(x = reorder(County, GradDegreePercent), y = GradDegreePercent)) +
  geom_bar(stat = "identity", fill = "dodgerblue") +
  coord_flip() +
  labs(title = "Educational Attainment in West Virginia Counties",
       x = "",
       y = "Percentage with Graduate Degrees") +
  theme_minimal() +
  theme(
    axis.title.y = element_text(hjust = 0.5, margin = margin(t = 0, r = 20, b = 0, l = 20)),
    plot.caption = element_text(hjust = 0.5),
    plot.title = element_text(hjust = 0.5),  # Center the plot title
    plot.margin = margin(0, 0, 2, 0, "cm")  # Adjust the bottom margin as needed
  )

Figure 1: Static Chart illustrating the Distribution of Graduate Education across West Virginia Counties: This bar chart shows the proportion of adults aged 25 and over who hold a graduate or professional degree in each of West Virginia’s counties. The data come from the 2021 American Community Survey (ACS) 5-year estimates, and the bars are ordered from highest to lowest to underscore the disparities among counties. Every county in the state is included. The percentages are those published by the ACS, expressed as a share of each county’s population aged 25 and over.

Interactive Visualizations

Interactive visualizations invite readers to explore educational and economic landscapes, allowing for deeper engagement and understanding of the data.

# Create a ggplot object to visualize the educational attainment in West Virginia counties.
# This ggplot object represents a bar plot showing the percentage of individuals with graduate degrees in each county.
# Counties are reordered based on the percentage of individuals with graduate degrees (GradDegreePercent).
# Bars are colored in dodgerblue, and their width is adjusted for spacing.

ggplot_object <- ggplot(education_data, aes(x = reorder(County, GradDegreePercent), y = GradDegreePercent)) +
  geom_bar(stat = "identity", fill = "dodgerblue", width = 0.5) +  # Adjust width for spacing
  coord_flip() +  # Flip the bars to be horizontal
  theme_classic() +  # Use a classic theme for a clean look
  theme(
    axis.text.y = element_text(size = 6),  # Adjust the text size for the county names
    axis.title.y = element_blank(),  # Remove the (empty) county-axis title
    plot.title = element_text(hjust = 0.5),  # Center the plot title
    axis.title.x = element_text(size = 10),  # After coord_flip() the percentage title sits on the horizontal axis
    plot.margin = margin(t = 0, r = 0, b = 1, l = 0, "cm")  # Adjust the top, right, bottom, and left margins
  ) +
  labs(title = "Educational Attainment in West Virginia Counties",
       x = "",
       y = "Percentage with Graduate Degrees")

# Convert to an interactive plotly object
interactive_plot <- ggplotly(ggplot_object)

# Print the interactive plot
interactive_plot

Figure 2: Interactive Chart demonstrating the Educational Disparity in Graduate Degree Holders Across West Virginia: This interactive bar chart shows the variation in graduate degree attainment among adults aged 25 and over in West Virginia’s counties, presented in descending order. The underlying 2021 American Community Survey estimates are based on a representative sample of households in every county. The figure illustrates the notable differences in graduate education across the state.

Margin of Error Visualization for Graduate Degree Holders

To better understand the reliability of the data, I visualize the margin of error associated with the percentage of graduate degree holders in each county. Because ACS figures are estimates from a sample, the margin of error shows how much uncertainty surrounds each county’s value, which matters most for small counties and allows for more informed decision making.

# Fetch educational attainment data
education_data <- get_acs(
  geography = "county",
  variables = c(estimate = "DP02_0066PE"), 
  state = "WV",
  year = 2021,
  survey = "acs5"
) %>%
  mutate(
    County = gsub(", West Virginia", "", NAME), 
    County = gsub(" County", "", County),  # Remove " County" from county names
    LowerBound = estimate - moe, 
    UpperBound = estimate + moe 
  )

# Plot with colorblind-friendly colors
ggplot(education_data, aes(x = reorder(County, estimate), y = estimate)) +
  geom_point(color = '#377eb8') + # Adjusted point color to a shade of blue
  geom_errorbar(aes(ymin = LowerBound, ymax = UpperBound), width = 0.2, color = '#ff7f00') + # Adjusted error bar color to a shade of orange
  coord_flip() + 
  labs(title = "Graduate Degree Holders by County in WV: Estimate and Margin of Error",
       x = "County",
       y = "Percentage (%) of Population with Graduate Degrees") +
  theme_minimal() +
  theme(
    axis.text.y = element_text(angle = 45, hjust = 1),
    plot.margin = margin(t = 0, r = 0, b = 1, l = 0, "cm")  # Adjust the margins (top, right, bottom, left)
  )

Figure 3: Educational Attainment Visualized: The Spread of Graduate Degrees Across West Virginia. This graph displays the distribution of adults with graduate degrees across West Virginia’s counties. Each county’s percentage is marked with a blue dot and the range of uncertainty is captured by orange error bars representing the margin of error. Counties are ordered by the educational attainment level to emphasize disparities. The data were sourced from the 2021 American Community Survey, focusing on the population aged 25 and over with advanced education. Error bars are derived from margins of error provided with the ACS data, highlighting the statistical confidence in these estimates. Such visualizations are instrumental in policy formulation, targeting areas that might benefit from educational initiatives.
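
To make the error bars easier to interpret, a margin of error can be converted into a standard error and a coefficient of variation (CV). The sketch below assumes the published ACS margins of error correspond to a 90 percent confidence level (z = 1.645), which is the Census Bureau’s documented convention. The function name add_reliability is hypothetical.

```r
# Hypothetical helper: derive reliability measures from an estimate and its MOE.
# Assumes the MOE was published at the 90 percent confidence level (z = 1.645).
add_reliability <- function(estimate, moe, z = 1.645) {
  se <- moe / z                        # standard error implied by the MOE
  data.frame(
    estimate = estimate,
    moe      = moe,
    se       = se,
    cv_pct   = 100 * se / estimate,    # coefficient of variation, in percent
    moe_95   = 1.96 * se               # MOE rescaled to 95 percent confidence
  )
}

# Usage with the data fetched above:
# add_reliability(education_data$estimate, education_data$moe)
```

A larger CV means a less reliable estimate, so sorting by cv_pct is a quick way to spot the counties whose percentages should be read with the most caution.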

Margin of Error Interactive Visualization

With the data prepared, I can now create an interactive chart for examining the percentages and margins of error for graduate degree holders by county.

# Fetch educational attainment data
education_data <- get_acs(
  geography = "county",
  variables = c(estimate = "DP02_0066PE"), 
  state = "WV",
  year = 2021,
  survey = "acs5"
) %>%
  mutate(
    County = gsub(", West Virginia", "", NAME), 
    LowerBound = estimate - moe, 
    UpperBound = estimate + moe 
  )

# Remove "County" from each county name
education_data$County <- gsub(" County", "", education_data$County)

# Generate the interactive plot
interactive_plot <- ggplot(education_data, aes(
  x = reorder(County, estimate), 
  y = estimate, 
  text = paste0(County, ": ", estimate, "% ± ", moe)
)) +
  geom_point(color = '#377eb8') +
  geom_errorbar(aes(ymin = LowerBound, ymax = UpperBound), width = 0.2, color = '#ff7f00') +
  labs(
    title = "Interactive Visualization of Graduate Degree Holders by County in WV",
    x = "County",
    y = "Percentage (%) of Population with Graduate Degrees"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    plot.margin = margin(t = 0, r = 0, b = 1, l = 0, unit = "cm") # Added extra space at the bottom
  )

# Convert ggplot plot to interactive plotly plot
interactive_plotly <- ggplotly(interactive_plot, tooltip = "text")

# Print the interactive plot
interactive_plotly

Figure 4: Interactive Visualization of Graduate Degree Holders by County in WV. This interactive plot displays the percentage of the population with graduate degrees in each county of West Virginia. Hover over the points and error bars to see the exact percentages and margins of error for each county.
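
Overlapping error bars alone do not settle whether two counties really differ. A common check, sketched here under the assumption that the margins of error are at the 90 percent level, divides the difference between two estimates by the standard error of that difference and compares the result with 1.645. The function name differs_significantly is hypothetical, and the test treats the two estimates as independent.

```r
# Hypothetical helper: TRUE if two estimates differ at the 90 percent level.
# Each MOE is converted to a standard error, and the two are combined in quadrature.
differs_significantly <- function(est1, moe1, est2, moe2, z = 1.645) {
  se_diff <- sqrt((moe1 / z)^2 + (moe2 / z)^2)
  abs(est1 - est2) / se_diff > z
}

# Usage: pass the estimate and moe values of any two counties, for example
# rows picked out of education_data by County name.
```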

Economic Analysis: Mapping Pathways to Prosperity

Transitioning from education to economics, I examine median household income in different counties to uncover economic opportunities.

Median Household Income Map: Visualizing Economic Disparities

A choropleth map is used to represent median household income in West Virginia counties. By coloring each county according to its income level, the map gives a clear and intuitive view of how income is distributed across the state and makes regional economic disparities easy to spot. This spatial context complements the county-level charts earlier in the analysis.

# Plotting median household income in West Virginia by county using ggplot.
# This plot utilizes ggplot to visualize the median household income in West Virginia by county.
# Each county is represented by a filled polygon, where the fill color indicates the median household income (estimate).
# The theme is set to minimal for a clean appearance, and the plot title is centered horizontally.

ggplot(data = income_data) +
  geom_sf(aes(fill = estimate), color = "white", size = 0.1) +
  scale_fill_viridis_c(option = "magma", direction = -1) +
  labs(title = "Median Household Income in West Virginia by County", fill = "Income") + 
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    plot.margin = margin(t = 0, r = 0, b = 1, l = 0, "cm")  # Adjusted margins
  )

Figure 5: Median Household Income in West Virginia by County: This choropleth map illustrates the median household income across West Virginia’s counties. Each county is shaded according to its income level, providing a visual representation of economic disparities within the state. The data come from the 2019 American Community Survey request stored in income_data. The color gradient reflects income levels, with darker shades indicating higher median household incomes and lighter shades indicating lower ones. This visualization helps policymakers and stakeholders identify areas of concern and prioritize resources effectively.

Interactive Map: Exploring Median Household Income

The interactive map provides a dynamic exploration of median household income in West Virginia counties. By hovering over each county, you can view detailed information, including the county name and median income. This interactive visualization encourages user engagement and understanding by providing a hands-on exploration of economic disparities within the state. Users can interact with the map, zooming in on specific areas of interest and gaining valuable insight into West Virginia’s economic landscape.

# Create an interactive map of median household income in West Virginia counties.
# The map is built with ggplot and converted into an interactive plotly object.
# Each county is a filled polygon whose fill color indicates the median household income.
# Hovering over a county displays its name and median income.

# First, create a new text column for the hover information
# (paste0 avoids a stray space after "$"; formatC adds thousands separators)
income_data$text_hover <- paste0(income_data$NAME, "<br>Median Income: $",
                                 formatC(income_data$estimate, format = "d", big.mark = ","))

# Now, create the static ggplot object with the hover text
static_map <- ggplot(data = income_data) +
  geom_sf(aes(fill = estimate, text = text_hover), color = "white", size = 0.1) +
  scale_fill_viridis_c(option = "magma", direction = -1) +
  labs(title = "Median Household Income in West Virginia by County", fill = "Income") +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    plot.margin = margin(t = 0, r = 0, b = 1, l = 0, "cm")  # Add extra space at the bottom
  )

# Convert the static map into an interactive plotly object with custom hover text
# (tooltip names the aesthetic, "text", rather than the column mapped to it)
interactive_map <- ggplotly(static_map, tooltip = "text")

# Print the interactive map
interactive_map

Figure 6: Interactive Map: This interactive map offers a dynamic exploration of median household income across West Virginia’s counties. By hovering over each county, users can access detailed information, including the county name and median income. The map enhances user engagement and understanding by allowing hands-on exploration of economic disparities within the state. Users can interact with the map, zooming in on specific areas of interest and gaining valuable insights into the economic landscape of West Virginia.

Interactive Map Visualization of Median Household Income

In this section of my analysis, I focus on creating an interactive map to visualize the differences in median household incomes across West Virginia counties. The visualization is done using the mapview package, which can be used to create interactive maps that can be dynamically explored in an R environment or a web browser.

# This code chunk creates an interactive map visualization of median household income across the counties of West Virginia.
# The mapview package is used to generate the interactive map, allowing users to explore the data dynamically.
# The breaks argument is used to define breaks for the legend, categorizing income levels.
# Corresponding colors are assigned to represent lower and higher incomes, enhancing visualization.
# The legend title is set as "Estimated Median Household Income" to provide clarity on the data being represented.
# Users can interactively explore median household incomes of different counties in West Virginia using this visualization.


# Define breaks for the legend
breaks <- c(0, 40000, 60000, 80000, 100000)

# Define the colors for lower incomes (shades of yellow) and higher incomes (shades of green)
colors <- c("#ffffcc", "#c2e699", "#78c679", "#31a354")

# Create the mapview with the specified breaks, colors, and legend title
# (mapview takes the legend title from the layer.name argument)
income_map <- mapview(income_data, zcol = 'estimate', legend = TRUE, at = breaks, col.regions = colors, layer.name = "Estimated Median Household Income")

# Print the mapview map
income_map

Figure 7: Interactive Map Visualization of Median Household Income: This interactive map visualization showcases the disparities in median household incomes across the counties of West Virginia. The mapview package is utilized to create an interactive map that can be dynamically explored within an R environment or a web browser. To understand the visualization, the breaks for the legend are defined to categorize income levels, and corresponding colors are assigned to represent lower and higher incomes. The legend title is set as “Estimated Median Household Income” to provide clarity on the data being represented. This visualization allows users to interactively explore the median household incomes of different counties in West Virginia. It offers a valuable tool for policymakers, researchers, and stakeholders to understand economic disparities and prioritize interventions effectively.
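
To see how the legend breaks translate into categories, the same cut points can be applied with base R’s cut(). This only illustrates the binning idea: the bin labels and the helper name income_bin are invented here, and mapview’s own handling of values that fall exactly on a break may differ.

```r
# Same break points as the map legend above
breaks <- c(0, 40000, 60000, 80000, 100000)

# Illustrative labels, one per bin (not taken from the map itself)
bin_labels <- c("Up to $40k", "$40k to $60k", "$60k to $80k", "$80k to $100k")

# Hypothetical helper: assign each income to a bin.
# Intervals are closed on the right, e.g. (40000, 60000];
# values above the last break come back as NA.
income_bin <- function(income) {
  cut(income, breaks = breaks, labels = bin_labels, include.lowest = TRUE)
}

# Usage: count counties per bin
# table(income_bin(income_data$estimate))
```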

Conclusion

In this analysis, I examined the educational and economic landscape of West Virginia and the interplay between educational attainment and economic prosperity. The results derived from the American Community Survey (ACS) reveal not only inequities and opportunities, but also the pressing challenges facing the state.

Educational Insights: Examining educational trends in West Virginia’s counties revealed both encouraging successes and troubling inequities. Identifying the areas with the highest and lowest educational attainment underscored the urgent need for targeted educational interventions. These efforts are essential to improving the state’s overall educational landscape and eliminating existing inequalities.
Visual Representations: Through the use of static and interactive visualizations, this analysis has made the educational dynamics and economic disparities within the state more understandable. From detailed static charts to dynamic interactive maps, these visual tools have improved engagement with the data and provided a clearer picture of West Virginia’s educational and economic status.
Economic Analysis: Moving from education to the economic dimension provided insight into median household income in each county. Choropleth maps and interactive maps helped highlight economic disparities and give policy makers and stakeholders tools to identify and prioritize areas in need of economic uplift.
Margin of Error Visualization for Graduate Degree Holders: An important addition to this analysis is the visualization of the margin of error associated with the percentage of graduate degree holders by county. This aspect of the analysis shows how precise each estimate is and makes the limits of the data explicit. By highlighting the statistical uncertainty of these estimates, the visualization supports informed decision making and policy formulation. It underscores the importance of recognizing the uncertainties inherent in survey data so that interventions and policies rest on a nuanced understanding of the state’s educational attainment.

To summarize this analysis of West Virginia’s education and economy, data from the American Community Survey was carefully prepared and analyzed. Specific trends in educational attainment and median household income by county were identified, with interactive visualizations allowing for detailed examination of the data by county. Bar charts and choropleth maps highlighted disparities, and interactive tools facilitated engagement with the results and helped stakeholders make decisions about educational and economic improvement in the state.
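
One way to put a number on the relationship between attainment and income would be to join the two county tables on GEOID and compute a rank correlation. This is a sketch rather than part of the analysis above. The function name attainment_income_cor is hypothetical, and it assumes both inputs are plain data frames with GEOID and estimate columns, so geometry would need to be dropped from income_data first. A correlation on its own says nothing about causation.

```r
# Hypothetical helper: Spearman correlation between two county-level measures.
# edu: data frame with GEOID and estimate (percent with a graduate degree)
# inc: data frame with GEOID and estimate (median household income)
attainment_income_cor <- function(edu, inc) {
  merged <- merge(
    edu[c("GEOID", "estimate")],
    inc[c("GEOID", "estimate")],
    by = "GEOID",
    suffixes = c("_edu", "_inc")
  )
  cor(merged$estimate_edu, merged$estimate_inc,
      method = "spearman", use = "complete.obs")
}

# Usage with the objects created earlier:
# attainment_income_cor(education_data, sf::st_drop_geometry(income_data))
```

A rank correlation is used here because county incomes and degree shares are both skewed, and ranks are less sensitive to a single outlying county.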

Reflection

As a native West Virginian, I see this project as more than an academic exercise; it is a personal exploration of the potential of my home state. It reinforces my belief in the power of education and economic development as catalysts for change.

Going forward, it is important to continue to explore these dynamics and look for innovative ways to address inequities and promote growth in West Virginia. Through informed policy and targeted initiatives, we can create a better future for the state and its citizens.

References

Cheng, J., Xie, Y., & Allaire, J. (2021). plotly: Create Interactive Web Graphics via ‘plotly.js’ (R package version 4.10.0) [Computer software]. CRAN.R-project.org/package=plotly
Pebesma, E. (2018). sf: Simple Features for R (R package version 1.0-3) [Computer software]. CRAN.R-project.org/package=sf
Tennekes, M. (2021). tmap: Thematic Maps (R package version 3.3-1) [Computer software]. CRAN.R-project.org/package=tmap
U.S. Census Bureau. (2019). American Community Survey 5-year estimates, variable B19013_001 (median household income). Data retrieved using tidycensus package in R.
U.S. Census Bureau. (2021). American Community Survey 5-year estimates, variable DP02_0066P (percent with a graduate or professional degree). Data retrieved using tidycensus package in R.
Walker, K. (2021). tidycensus: Load US Census Boundary and Attribute Data as ‘tidyverse’ and ‘sf’-Ready Data Frames (R package version 1.0) [Computer software]. CRAN.R-project.org/package=tidycensus
Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis (2nd ed.) [Computer software]. Springer-Verlag New York. ggplot2.tidyverse.org
Wickham, H., François, R., Henry, L., & Müller, K. (2021). dplyr: A Grammar of Data Manipulation (R package version 1.0.7) [Computer software]. CRAN.R-project.org/package=dplyr
Yau, N. (2013). Data Points: Visualization That Means Something. Wiley.