Introduction

This report explores broadband access across Mississippi counties from 2015 to 2023.
The goal of the analysis is to understand how broadband access varies across counties, how it changes over time, and how it relates to education levels.

The main story communicated in this report is that counties with higher levels of education tend to have higher broadband availability, and broadband access appears to improve over time, particularly after the introduction of federal broadband expansion funding.

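To explore this story, I use six visualizations: a histogram, a boxplot by county, a violin plot by year, a faceted scatter plot of counties above the state average, a scatter plot with a regression line, and a layered plot.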

Load libraries

library(tidyverse)
library(here)
library(janitor)
library(ggbeeswarm)
library(RColorBrewer)
library(ggplot2)

Read the dataset

broadband <- read_csv("Mississippi_broadband_2015_2023.csv")
## Rows: 656 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): county
## dbl (14): population, pop25plus, high_school, bachelors, masters, profession...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
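
The message above suggests declaring column types explicitly. A minimal sketch of that idea follows. It uses a small literal stand-in for the CSV (the two rows and three columns are illustrative, not the full file) and assumes readr 2.0 or later, where literal data can be passed with I().

```r
library(readr)

# Hypothetical stand-in for the real CSV, passed as literal data via I()
toy_csv <- I("county,population,year\nAdams County,31583,2017\nAlcorn County,37242,2017\n")

toy <- read_csv(
  toy_csv,
  col_types = cols(
    county = col_character(),  # county names stay as text
    .default = col_double()    # every other column is read as numeric
  )
)
```

Declaring the types up front quiets the column specification message and makes the read fail loudly if a numeric column unexpectedly contains text.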

Clean column names so they are lowercase and consistent

broadband <- broadband %>%
  clean_names()
names(broadband) # list the new column names
##  [1] "county"            "population"        "pop25plus"        
##  [4] "high_school"       "bachelors"         "masters"          
##  [7] "professional"      "doctorate"         "median_income"    
## [10] "per_capita_income" "uninsured"         "broadband"        
## [13] "no_internet"       "poverty_rate"      "year"

Check the structure of the dataset

glimpse(broadband) # view the structure of the dataset
## Rows: 656
## Columns: 15
## $ county            <chr> "Adams County, Mississippi", "Alcorn County, Mississ…
## $ population        <dbl> 31583, 37242, 12574, 18731, 8306, 33121, 14617, 1022…
## $ pop25plus         <dbl> 21829, 25228, 9304, 12426, 5669, 21136, 9908, 7140, …
## $ high_school       <dbl> 6151, 6689, 3202, 3052, 1428, 4155, 3054, 1735, 3183…
## $ bachelors         <dbl> 2327, 2282, 796, 1021, 355, 2986, 579, 789, 1009, 64…
## $ masters           <dbl> 1088, 1112, 362, 430, 173, 1293, 431, 244, 218, 241,…
## $ professional      <dbl> 286, 485, 48, 39, 39, 292, 39, 100, 72, 39, 68, 112,…
## $ doctorate         <dbl> 97, 141, 25, 44, 16, 275, 64, 38, 34, 76, 62, 12, 38…
## $ median_income     <dbl> 30359, 38919, 30129, 33815, 40605, 28468, 33370, 430…
## $ per_capita_income <dbl> 17721, 20527, 19665, 20617, 18599, 16984, 17837, 229…
## $ uninsured         <dbl> 586, 554, 178, 99, 300, 875, 182, 0, 193, 27, 63, 13…
## $ broadband         <dbl> 7792, 8714, 2548, 4095, 1557, 6673, 3116, 2079, 3317…
## $ no_internet       <dbl> 195, 1103, 165, 483, 224, 189, 361, 136, 361, 251, 4…
## $ poverty_rate      <dbl> 34.2, 19.6, 22.8, 22.6, 20.2, 37.4, 26.9, 14.0, 22.9…
## $ year              <dbl> 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017…

Check summary statistics to inspect the data

summary(broadband) # examine summary statistics for the dataset
##     county            population       pop25plus       high_school   
##  Length:656         Min.   :   928   Min.   :   804   Min.   :  189  
##  Class :character   1st Qu.: 12607   1st Qu.:  9025   1st Qu.: 2644  
##  Mode  :character   Median : 22352   Median : 14602   Median : 3976  
##                     Mean   : 36228   Mean   : 24035   Mean   : 5757  
##                     3rd Qu.: 35099   3rd Qu.: 24292   3rd Qu.: 6536  
##                     Max.   :243249   Max.   :154349   Max.   :32264  
##    bachelors        masters         professional      doctorate     
##  Min.   :    8   Min.   :    8.0   Min.   :   0.0   Min.   :   0.0  
##  1st Qu.:  735   1st Qu.:  357.0   1st Qu.:  57.0   1st Qu.:  29.0  
##  Median : 1444   Median :  619.5   Median : 121.5   Median :  76.0  
##  Mean   : 3397   Mean   : 1531.5   Mean   : 353.7   Mean   : 256.9  
##  3rd Qu.: 2726   3rd Qu.: 1280.2   3rd Qu.: 291.2   3rd Qu.: 175.0  
##  Max.   :26462   Max.   :14196.0   Max.   :3769.0   Max.   :3108.0  
##  median_income   per_capita_income   uninsured        broadband    
##  Min.   :17109   Min.   :12394     Min.   :   0.0   Min.   :  140  
##  1st Qu.:35621   1st Qu.:20359     1st Qu.: 154.8   1st Qu.: 2990  
##  Median :41367   Median :23161     Median : 285.0   Median : 5450  
##  Mean   :42934   Mean   :23836     Mean   : 495.8   Mean   :10289  
##  3rd Qu.:48829   3rd Qu.:26749     3rd Qu.: 555.0   3rd Qu.: 9851  
##  Max.   :85297   Max.   :48905     Max.   :3981.0   Max.   :79974  
##   no_internet      poverty_rate        year     
##  Min.   :   0.0   Min.   : 8.20   Min.   :2017  
##  1st Qu.: 155.0   1st Qu.:18.18   1st Qu.:2019  
##  Median : 331.5   Median :21.90   Median :2020  
##  Mean   : 505.1   Mean   :22.89   Mean   :2020  
##  3rd Qu.: 659.2   3rd Qu.:26.82   3rd Qu.:2022  
##  Max.   :5448.0   Max.   :49.70   Max.   :2024

Check for missing values

colSums(is.na(broadband))
##            county        population         pop25plus       high_school 
##                 0                 0                 0                 0 
##         bachelors           masters      professional         doctorate 
##                 0                 0                 0                 0 
##     median_income per_capita_income         uninsured         broadband 
##                 0                 0                 0                 0 
##       no_internet      poverty_rate              year 
##                 0                 0                 0
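
Beyond missing values, it can help to check that every county appears in every year. The 656 rows would be consistent with 82 counties observed in 8 years, but that is an inference, not something the output above confirms. The sketch below shows one way to check, using a hypothetical toy panel in which county "C" is missing a year.

```r
library(dplyr)

# Hypothetical toy panel: 3 counties, 2 years, one county-year missing
toy <- data.frame(
  county = c("A", "A", "B", "B", "C"),
  year   = c(2017, 2018, 2017, 2018, 2017)
)

# Rows per year: a complete panel has the same count in every year
rows_per_year <- count(toy, year)

# Counties that do not appear in every year
n_years <- n_distinct(toy$year)
incomplete <- toy %>%
  count(county, name = "n_years_present") %>%
  filter(n_years_present < n_years)
```

On the real data, the same two steps would be applied to broadband instead of toy.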

Convert the year to factor

broadband$year <- as.factor(broadband$year)
glimpse(broadband) # confirm that year is now a factor
## Rows: 656
## Columns: 15
## $ county            <chr> "Adams County, Mississippi", "Alcorn County, Mississ…
## $ population        <dbl> 31583, 37242, 12574, 18731, 8306, 33121, 14617, 1022…
## $ pop25plus         <dbl> 21829, 25228, 9304, 12426, 5669, 21136, 9908, 7140, …
## $ high_school       <dbl> 6151, 6689, 3202, 3052, 1428, 4155, 3054, 1735, 3183…
## $ bachelors         <dbl> 2327, 2282, 796, 1021, 355, 2986, 579, 789, 1009, 64…
## $ masters           <dbl> 1088, 1112, 362, 430, 173, 1293, 431, 244, 218, 241,…
## $ professional      <dbl> 286, 485, 48, 39, 39, 292, 39, 100, 72, 39, 68, 112,…
## $ doctorate         <dbl> 97, 141, 25, 44, 16, 275, 64, 38, 34, 76, 62, 12, 38…
## $ median_income     <dbl> 30359, 38919, 30129, 33815, 40605, 28468, 33370, 430…
## $ per_capita_income <dbl> 17721, 20527, 19665, 20617, 18599, 16984, 17837, 229…
## $ uninsured         <dbl> 586, 554, 178, 99, 300, 875, 182, 0, 193, 27, 63, 13…
## $ broadband         <dbl> 7792, 8714, 2548, 4095, 1557, 6673, 3116, 2079, 3317…
## $ no_internet       <dbl> 195, 1103, 165, 483, 224, 189, 361, 136, 361, 251, 4…
## $ poverty_rate      <dbl> 34.2, 19.6, 22.8, 22.6, 20.2, 37.4, 26.9, 14.0, 22.9…
## $ year              <fct> 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017…

The dataset includes the population aged 25 and over (pop25plus). This variable is the denominator used to calculate the percentage of residents with a bachelor’s degree.

# create education rate based on population age 25+ (pop25plus)
broadband <- broadband %>%
  mutate(
    # calculate percentage of population with a bachelor's degree
    bachelors_rate = (bachelors / pop25plus) * 100
  )
glimpse(broadband) # make sure we have the bachelors_rate
## Rows: 656
## Columns: 16
## $ county            <chr> "Adams County, Mississippi", "Alcorn County, Mississ…
## $ population        <dbl> 31583, 37242, 12574, 18731, 8306, 33121, 14617, 1022…
## $ pop25plus         <dbl> 21829, 25228, 9304, 12426, 5669, 21136, 9908, 7140, …
## $ high_school       <dbl> 6151, 6689, 3202, 3052, 1428, 4155, 3054, 1735, 3183…
## $ bachelors         <dbl> 2327, 2282, 796, 1021, 355, 2986, 579, 789, 1009, 64…
## $ masters           <dbl> 1088, 1112, 362, 430, 173, 1293, 431, 244, 218, 241,…
## $ professional      <dbl> 286, 485, 48, 39, 39, 292, 39, 100, 72, 39, 68, 112,…
## $ doctorate         <dbl> 97, 141, 25, 44, 16, 275, 64, 38, 34, 76, 62, 12, 38…
## $ median_income     <dbl> 30359, 38919, 30129, 33815, 40605, 28468, 33370, 430…
## $ per_capita_income <dbl> 17721, 20527, 19665, 20617, 18599, 16984, 17837, 229…
## $ uninsured         <dbl> 586, 554, 178, 99, 300, 875, 182, 0, 193, 27, 63, 13…
## $ broadband         <dbl> 7792, 8714, 2548, 4095, 1557, 6673, 3116, 2079, 3317…
## $ no_internet       <dbl> 195, 1103, 165, 483, 224, 189, 361, 136, 361, 251, 4…
## $ poverty_rate      <dbl> 34.2, 19.6, 22.8, 22.6, 20.2, 37.4, 26.9, 14.0, 22.9…
## $ year              <fct> 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017…
## $ bachelors_rate    <dbl> 10.660131, 9.045505, 8.555460, 8.216643, 6.262127, 1…

Display Color Palettes

# To choose colors for the visualizations, display all palettes in RColorBrewer.
display.brewer.all()

Visualization 1: Histogram of broadband access across all counties and years

ggplot(broadband, aes(x = broadband)) +
  geom_histogram(fill = "blue", bins = 30) +
  labs(
    title = "Distribution of Broadband Access",
    subtitle = "Mississippi counties between 2015 and 2023",
    x = "Broadband Access (count)",
    y = "Count of Observations",
    caption = "Data source: Census Bureau"
  ) +
  theme_minimal()

ggsave("Distribution of Broadband Access.png") # saving my plots
## Saving 7 x 5 in image

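The histogram shows how broadband access is distributed across Mississippi counties and years. The distribution is right-skewed: most county-year observations have relatively low broadband values, while a few have very high values. Because the broadband variable is a raw count, part of this skew likely reflects differences in county population, so the plot shows that broadband access is not evenly distributed without separating coverage from county size.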

Visualization 2: Boxplot of broadband by county

ggplot(broadband, aes(x = county, y = broadband)) +
  geom_boxplot(fill = "lightblue") +
  labs(
    title = "Broadband Access Across Mississippi Counties",
    subtitle = "Distribution of broadband availability",
    x = "County",
    y = "Broadband Access (count)",
    caption = "Each box represents the distribution of broadband access for a county"
  ) +
  theme_minimal()

ggsave("Broadband Access Across Mississippi Counties.png")
## Saving 7 x 5 in image

The boxplot compares broadband access across Mississippi counties. Some counties consistently have higher broadband access than others. This shows that broadband availability varies across counties.
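
With 82 counties on the x-axis, the labels are likely to overlap. One possible refinement, sketched here on hypothetical toy data, is to order the counties by their median value and flip the axes so the names read horizontally.

```r
library(ggplot2)

# Hypothetical toy data: three counties with three observations each
toy <- data.frame(
  county    = rep(c("A", "B", "C"), each = 3),
  broadband = c(5, 6, 7, 1, 2, 3, 9, 10, 11)
)

# Order the county factor by each county's median broadband value
toy$county <- reorder(toy$county, toy$broadband, FUN = median)

p <- ggplot(toy, aes(x = county, y = broadband)) +
  geom_boxplot(fill = "lightblue") +
  coord_flip() +   # county names on the vertical axis, read left to right
  theme_minimal()
```

Ordering by median makes it easier to see which counties sit consistently above or below the others.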

Visualization 3: Does broadband access improve over time? (Violin plot by year)

ggplot(broadband, aes(x = year, y = broadband, fill = year)) +
  geom_violin() +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "Broadband Distribution Across Years",
    subtitle = "Comparing broadband availability by year",
    x = "Year",
    y = "Broadband Access (count)"
  ) +
  theme_minimal()

ggsave("Broadband Distribution Across Years.png")
## Saving 7 x 5 in image

The violin plot shows how broadband access changes over time. The distributions appear to increase slightly in later years, suggesting that broadband availability has improved over time.
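
The visual impression of a slight increase can be checked with a per-year summary. The sketch below uses hypothetical toy values, not the report's data, and computes the median and mean for each year.

```r
library(dplyr)

# Hypothetical toy data: three observations in each of two years
toy <- data.frame(
  year      = factor(c(2017, 2017, 2017, 2018, 2018, 2018)),
  broadband = c(100, 200, 900, 150, 250, 950)
)

yearly <- toy %>%
  group_by(year) %>%
  summarise(
    median_broadband = median(broadband),  # robust to a few very large counties
    mean_broadband   = mean(broadband),
    .groups = "drop"
  )
```

For a right-skewed variable, the median is usually the more informative of the two when comparing years.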

Visualization 4: Counties with above average broadband (Broadband availability differs across counties)

avg_broadband <- mean(broadband$broadband, na.rm = TRUE)

broadband %>%
  na.omit() %>%   # remove missing values
  filter(broadband > avg_broadband) %>%
  ggplot(aes(x = year, y = broadband, color = bachelors_rate)) +
  geom_point() +
  facet_wrap(~ county) +
  scale_colour_gradient(low = "lightblue", high = "darkblue") +
  labs(
    title = "Broadband Access in Counties Above the State Average",
    subtitle = "Counties with higher than average broadband access across years",
    x = "Year",
    y = "Broadband Access (count)",
    color = "Bachelor's Degree Rate",
    caption = "Only counties with broadband levels above the state average are shown"
  ) +
  theme_minimal()

ggsave("Broadband Access in Counties Above the State Average.png")
## Saving 7 x 5 in image

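The faceted plot shows the county-year observations whose broadband values exceed the overall average across all counties and years. Each panel represents a county, making it easier to compare how broadband access changes across years in those counties. Because the filter is applied to individual rows, a county can appear with only some of its years.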
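
Filtering rows and filtering whole counties can give different panels. The sketch below, on hypothetical toy data, contrasts the row-level filter used in the code above with a county-level alternative that keeps every year of a county whose own mean exceeds the overall mean.

```r
library(dplyr)

# Hypothetical toy data: three counties observed in two years
toy <- data.frame(
  county    = rep(c("A", "B", "C"), each = 2),
  year      = rep(c(2017, 2018), times = 3),
  broadband = c(10, 30, 5, 6, 40, 50)
)

avg <- mean(toy$broadband)  # overall mean across all rows

# Row-level filter: keeps any county-year above the overall mean
row_filter <- toy %>% filter(broadband > avg)

# County-level filter: keeps all years of counties whose mean is above it
county_filter <- toy %>%
  group_by(county) %>%
  filter(mean(broadband) > avg) %>%
  ungroup()
```

In this toy example county "A" survives the row-level filter for one year only, while the county-level filter drops it entirely.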

Visualization 5: The relationship between education levels and broadband access, along with poverty rate, across all Mississippi counties (scatter plot with regression line)

broadband %>%
  na.omit() %>%   # remove missing values
  ggplot(aes(x = bachelors_rate, y = broadband, color = poverty_rate)) +
  geom_point() +                      # scatter points
  geom_smooth(method = "lm", color = "blue") +  # straight regression line
  scale_colour_gradient(low = "yellow", high = "red") +
  labs(
    title = "Education Level and Broadband Access",
    subtitle = "Scatter plot with linear regression line",
    x = "Population with Bachelor's Degree (%)",
    y = "Broadband Access (count)",
    color = "Poverty Rate",
    caption = "Gray band represents the confidence interval around the regression line"
  ) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

ggsave("Education Level and Broadband Access.png")
## Saving 7 x 5 in image
## `geom_smooth()` using formula = 'y ~ x'

This scatter plot shows the relationship between education levels and broadband access. Counties with higher percentages of people with bachelor’s degrees tend to have higher broadband availability. This supports the main story of the report.
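
The strength of this relationship can also be summarized numerically. The sketch below uses hypothetical toy values, not the report's data, to compute a Pearson correlation and the slope of a simple linear fit with base R.

```r
# Hypothetical toy data: four counties
toy <- data.frame(
  bachelors_rate = c(5, 10, 15, 20),
  broadband      = c(1000, 2100, 2900, 4000)
)

# Pearson correlation between education rate and broadband
r <- cor(toy$bachelors_rate, toy$broadband)

# Simple linear regression: change in broadband per one-point change in rate
fit   <- lm(broadband ~ bachelors_rate, data = toy)
slope <- coef(fit)[["bachelors_rate"]]
```

On the real data, a positive correlation would be consistent with the pattern described above, though it would not separate the effect of education from county size or poverty.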

Visualization 6: Layered Plot

ggplot(broadband, aes(x = year, y = broadband)) +
  geom_violin(colour= "lightblue", fill= "blue") +
  geom_boxplot() +
  geom_point() +
  labs(
    title = "Layered Visualization of Broadband Access",
    subtitle = "Combining violin, boxplot, and points",
    x = "Year",
    y = "Broadband Access (count)"
  ) +
  theme_minimal()

ggsave("Layered Visualization of Broadband Access.png")
## Saving 7 x 5 in image

The layered plot combines violin, boxplot, and points to show broadband access across years. Together, these layers help show the distribution and variation of broadband access over time.