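This report explores broadband access across Mississippi counties
from 2015 to 2023, the range given in the data file name. Note that the
year column in the loaded data runs from 2017 to 2024.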
The goal of the analysis is to understand how broadband access varies
across counties, how it changes over time, and how it relates to
education levels.
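The main story is that counties with higher levels of education tend to have higher broadband availability, and that broadband access appears to improve over time, particularly after the introduction of federal broadband expansion funding.
To explore this story, I use six visualizations: a histogram, a boxplot by county, a violin plot by year, a faceted scatter plot, a scatter plot with a fitted trend line, and a layered plot. The setup code below loads the packages and the data.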
library(tidyverse)
library(here)
library(janitor)
library(ggbeeswarm)
library(RColorBrewer)
library(ggplot2)
broadband <- read_csv("Mississippi_broadband_2015_2023.csv")
## Rows: 656 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): county
## dbl (14): population, pop25plus, high_school, bachelors, masters, profession...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
broadband <- broadband %>%
  clean_names() # standardize column names to snake_case
names(broadband) # list the new column names
## [1] "county" "population" "pop25plus"
## [4] "high_school" "bachelors" "masters"
## [7] "professional" "doctorate" "median_income"
## [10] "per_capita_income" "uninsured" "broadband"
## [13] "no_internet" "poverty_rate" "year"
glimpse(broadband) # view the structure of the dataset
## Rows: 656
## Columns: 15
## $ county <chr> "Adams County, Mississippi", "Alcorn County, Mississ…
## $ population <dbl> 31583, 37242, 12574, 18731, 8306, 33121, 14617, 1022…
## $ pop25plus <dbl> 21829, 25228, 9304, 12426, 5669, 21136, 9908, 7140, …
## $ high_school <dbl> 6151, 6689, 3202, 3052, 1428, 4155, 3054, 1735, 3183…
## $ bachelors <dbl> 2327, 2282, 796, 1021, 355, 2986, 579, 789, 1009, 64…
## $ masters <dbl> 1088, 1112, 362, 430, 173, 1293, 431, 244, 218, 241,…
## $ professional <dbl> 286, 485, 48, 39, 39, 292, 39, 100, 72, 39, 68, 112,…
## $ doctorate <dbl> 97, 141, 25, 44, 16, 275, 64, 38, 34, 76, 62, 12, 38…
## $ median_income <dbl> 30359, 38919, 30129, 33815, 40605, 28468, 33370, 430…
## $ per_capita_income <dbl> 17721, 20527, 19665, 20617, 18599, 16984, 17837, 229…
## $ uninsured <dbl> 586, 554, 178, 99, 300, 875, 182, 0, 193, 27, 63, 13…
## $ broadband <dbl> 7792, 8714, 2548, 4095, 1557, 6673, 3116, 2079, 3317…
## $ no_internet <dbl> 195, 1103, 165, 483, 224, 189, 361, 136, 361, 251, 4…
## $ poverty_rate <dbl> 34.2, 19.6, 22.8, 22.6, 20.2, 37.4, 26.9, 14.0, 22.9…
## $ year <dbl> 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017…
summary(broadband) # examine summary statistics for the dataset
## county population pop25plus high_school
## Length:656 Min. : 928 Min. : 804 Min. : 189
## Class :character 1st Qu.: 12607 1st Qu.: 9025 1st Qu.: 2644
## Mode :character Median : 22352 Median : 14602 Median : 3976
## Mean : 36228 Mean : 24035 Mean : 5757
## 3rd Qu.: 35099 3rd Qu.: 24292 3rd Qu.: 6536
## Max. :243249 Max. :154349 Max. :32264
## bachelors masters professional doctorate
## Min. : 8 Min. : 8.0 Min. : 0.0 Min. : 0.0
## 1st Qu.: 735 1st Qu.: 357.0 1st Qu.: 57.0 1st Qu.: 29.0
## Median : 1444 Median : 619.5 Median : 121.5 Median : 76.0
## Mean : 3397 Mean : 1531.5 Mean : 353.7 Mean : 256.9
## 3rd Qu.: 2726 3rd Qu.: 1280.2 3rd Qu.: 291.2 3rd Qu.: 175.0
## Max. :26462 Max. :14196.0 Max. :3769.0 Max. :3108.0
## median_income per_capita_income uninsured broadband
## Min. :17109 Min. :12394 Min. : 0.0 Min. : 140
## 1st Qu.:35621 1st Qu.:20359 1st Qu.: 154.8 1st Qu.: 2990
## Median :41367 Median :23161 Median : 285.0 Median : 5450
## Mean :42934 Mean :23836 Mean : 495.8 Mean :10289
## 3rd Qu.:48829 3rd Qu.:26749 3rd Qu.: 555.0 3rd Qu.: 9851
## Max. :85297 Max. :48905 Max. :3981.0 Max. :79974
## no_internet poverty_rate year
## Min. : 0.0 Min. : 8.20 Min. :2017
## 1st Qu.: 155.0 1st Qu.:18.18 1st Qu.:2019
## Median : 331.5 Median :21.90 Median :2020
## Mean : 505.1 Mean :22.89 Mean :2020
## 3rd Qu.: 659.2 3rd Qu.:26.82 3rd Qu.:2022
## Max. :5448.0 Max. :49.70 Max. :2024
colSums(is.na(broadband))
## county population pop25plus high_school
## 0 0 0 0
## bachelors masters professional doctorate
## 0 0 0 0
## median_income per_capita_income uninsured broadband
## 0 0 0 0
## no_internet poverty_rate year
## 0 0 0
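# convert year to a factor so each year is treated as a discrete category in the plots
broadband$year <- as.factor(broadband$year)
glimpse(broadband) # confirm that year is now a factor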
## Rows: 656
## Columns: 15
## $ county <chr> "Adams County, Mississippi", "Alcorn County, Mississ…
## $ population <dbl> 31583, 37242, 12574, 18731, 8306, 33121, 14617, 1022…
## $ pop25plus <dbl> 21829, 25228, 9304, 12426, 5669, 21136, 9908, 7140, …
## $ high_school <dbl> 6151, 6689, 3202, 3052, 1428, 4155, 3054, 1735, 3183…
## $ bachelors <dbl> 2327, 2282, 796, 1021, 355, 2986, 579, 789, 1009, 64…
## $ masters <dbl> 1088, 1112, 362, 430, 173, 1293, 431, 244, 218, 241,…
## $ professional <dbl> 286, 485, 48, 39, 39, 292, 39, 100, 72, 39, 68, 112,…
## $ doctorate <dbl> 97, 141, 25, 44, 16, 275, 64, 38, 34, 76, 62, 12, 38…
## $ median_income <dbl> 30359, 38919, 30129, 33815, 40605, 28468, 33370, 430…
## $ per_capita_income <dbl> 17721, 20527, 19665, 20617, 18599, 16984, 17837, 229…
## $ uninsured <dbl> 586, 554, 178, 99, 300, 875, 182, 0, 193, 27, 63, 13…
## $ broadband <dbl> 7792, 8714, 2548, 4095, 1557, 6673, 3116, 2079, 3317…
## $ no_internet <dbl> 195, 1103, 165, 483, 224, 189, 361, 136, 361, 251, 4…
## $ poverty_rate <dbl> 34.2, 19.6, 22.8, 22.6, 20.2, 37.4, 26.9, 14.0, 22.9…
## $ year <fct> 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017…
# create education rate based on population age 25+ (pop25plus)
broadband <- broadband %>%
  mutate(
    # calculate percentage of population age 25+ with a bachelor's degree
    bachelors_rate = (bachelors / pop25plus) * 100
  )
glimpse(broadband) # make sure we have the bachelors_rate
## Rows: 656
## Columns: 16
## $ county <chr> "Adams County, Mississippi", "Alcorn County, Mississ…
## $ population <dbl> 31583, 37242, 12574, 18731, 8306, 33121, 14617, 1022…
## $ pop25plus <dbl> 21829, 25228, 9304, 12426, 5669, 21136, 9908, 7140, …
## $ high_school <dbl> 6151, 6689, 3202, 3052, 1428, 4155, 3054, 1735, 3183…
## $ bachelors <dbl> 2327, 2282, 796, 1021, 355, 2986, 579, 789, 1009, 64…
## $ masters <dbl> 1088, 1112, 362, 430, 173, 1293, 431, 244, 218, 241,…
## $ professional <dbl> 286, 485, 48, 39, 39, 292, 39, 100, 72, 39, 68, 112,…
## $ doctorate <dbl> 97, 141, 25, 44, 16, 275, 64, 38, 34, 76, 62, 12, 38…
## $ median_income <dbl> 30359, 38919, 30129, 33815, 40605, 28468, 33370, 430…
## $ per_capita_income <dbl> 17721, 20527, 19665, 20617, 18599, 16984, 17837, 229…
## $ uninsured <dbl> 586, 554, 178, 99, 300, 875, 182, 0, 193, 27, 63, 13…
## $ broadband <dbl> 7792, 8714, 2548, 4095, 1557, 6673, 3116, 2079, 3317…
## $ no_internet <dbl> 195, 1103, 165, 483, 224, 189, 361, 136, 361, 251, 4…
## $ poverty_rate <dbl> 34.2, 19.6, 22.8, 22.6, 20.2, 37.4, 26.9, 14.0, 22.9…
## $ year <fct> 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017…
## $ bachelors_rate <dbl> 10.660131, 9.045505, 8.555460, 8.216643, 6.262127, 1…
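The `summary()` output above shows `broadband` ranging from 140 to 79974, so the column holds counts rather than percentages. A minimal sketch of how it could be put on a comparable scale, in the same way as `bachelors_rate`, is shown below. It uses two rows copied from the `glimpse()` output. The choice of `population` as the denominator and the column name `broadband_per_100` are assumptions for illustration, since the data do not state what unit `broadband` counts.

```r
# Two rows copied from the glimpse() output above (year 2017)
toy <- data.frame(
  county     = c("Adams County, Mississippi", "Alcorn County, Mississippi"),
  population = c(31583, 37242),
  broadband  = c(7792, 8714)
)

# ASSUMPTION: total population is used as the denominator. The source does
# not say what `broadband` counts, so this is a rough per-100-residents
# figure, not an official broadband rate.
toy$broadband_per_100 <- toy$broadband / toy$population * 100

print(toy)
```

In the pipeline used in this report, the same step would be one more line inside `mutate()`.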
# To choose colors for the visualizations, display all palettes in RColorBrewer.
display.brewer.all()
ggplot(broadband, aes(x = broadband)) +
  geom_histogram(fill = "blue", bins = 30) +
  labs(
    title = "Distribution of Broadband Access",
    subtitle = "Mississippi counties between 2015 and 2023",
    x = "Broadband Access (count)", # broadband is a count (140 to 79974), not a percentage
    y = "Count of Observations",
    caption = "Data source: Census Bureau"
  ) +
  theme_minimal()
ggsave("Distribution of Broadband Access.png") # save the most recent plot
## Saving 7 x 5 in image
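The histogram shows how broadband access is distributed across Mississippi counties and years. Most observations fall at the low end and only a few reach very high values, so the distribution is right-skewed. Because broadband is recorded as a count rather than a percentage, part of this skew likely reflects differences in county population. This shows that broadband availability is not evenly distributed.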
ggplot(broadband, aes(x = county, y = broadband)) +
  geom_boxplot(fill = "lightblue") +
  labs(
    title = "Broadband Access Across Mississippi Counties",
    subtitle = "Distribution of broadband availability",
    x = "County",
    y = "Broadband Access (count)", # broadband is a count, not a percentage
    caption = "Each box represents the distribution of broadband access for a county"
  ) +
  theme_minimal()
ggsave("Broadband Access Across Mississippi Counties.png")
## Saving 7 x 5 in image
The boxplot compares broadband access across Mississippi counties. Some counties consistently have higher broadband access than others. This shows that broadband availability varies across counties.
ggplot(broadband, aes(x = year, y = broadband, fill = year)) +
  geom_violin() +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "Broadband Distribution Across Years",
    subtitle = "Comparing broadband availability by year",
    x = "Year",
    y = "Broadband Access (count)" # broadband is a count, not a percentage
  ) +
  theme_minimal()
ggsave("Broadband Distribution Across Years.png")
## Saving 7 x 5 in image
The violin plot shows how broadband access changes over time. The distributions appear to increase slightly in later years, suggesting that broadband availability has improved over time.
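A year-by-year median is one way to check a visual impression like this with numbers. The sketch below shows the idea in base R. The values in `toy` are made up for illustration and are not from the dataset, and the column name `median_broadband` is my own.

```r
# Made-up toy values for illustration only; they are NOT from the dataset
toy <- data.frame(
  year      = factor(c(2017, 2017, 2017, 2018, 2018, 2018)),
  broadband = c(100, 200, 900, 150, 250, 1000)
)

# Median broadband count within each year
yearly <- aggregate(broadband ~ year, data = toy, FUN = median)
names(yearly)[2] <- "median_broadband"

print(yearly)
```

Applied to the real data, replacing `toy` with `broadband` would give one median per year to compare against the violin shapes.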
avg_broadband <- mean(broadband$broadband, na.rm = TRUE) # mean over all county-year rows
broadband %>%
  na.omit() %>% # remove missing values
  filter(broadband > avg_broadband) %>% # keep county-year rows above the overall mean
  ggplot(aes(x = year, y = broadband, color = bachelors_rate)) +
  geom_point() +
  facet_wrap(~ county) +
  scale_colour_gradient(low = "lightblue", high = "darkblue") +
  labs(
    title = "Broadband Access in Counties Above the State Average",
    subtitle = "Counties with higher than average broadband access across years",
    x = "Year",
    y = "Broadband Access (count)", # broadband is a count, not a percentage
    color = "Bachelor's Degree Rate",
    caption = "Only county-year observations above the overall average are shown"
  ) +
  theme_minimal()
ggsave("Broadband Access in Counties Above the State Average.png")
## Saving 7 x 5 in image
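The faceted plot shows observations with broadband levels above the overall average. Because the filter is applied to each county-year row, a county appears only for the years in which it exceeds that average. Each panel represents a county, making it easier to compare how broadband access changes across years in those counties.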
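The call `filter(broadband > avg_broadband)` works row by row, so it can give a different result from selecting whole counties whose own average is above the overall average. The sketch below contrasts the two approaches in base R. The county labels and values in `toy` are made up for illustration and are not from the dataset.

```r
# Made-up toy values for illustration only; they are NOT from the dataset
toy <- data.frame(
  county    = c("A", "A", "B", "B", "C", "C"),
  year      = c(2017, 2018, 2017, 2018, 2017, 2018),
  broadband = c(100, 120, 400, 900, 500, 700)
)

overall_mean <- mean(toy$broadband)

# Row-level filter: keeps individual county-year rows above the overall mean
rows_above <- toy[toy$broadband > overall_mean, ]

# County-level filter: keeps every row of a county whose own mean is above
# the overall mean (ave() returns each row's county mean)
county_mean    <- ave(toy$broadband, toy$county)
counties_above <- toy[county_mean > overall_mean, ]

print(rows_above)
print(counties_above)
```

Here county B's 2017 row is dropped by the row-level filter but kept by the county-level one, which is the difference to keep in mind when reading the panels.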
broadband %>%
  na.omit() %>% # remove missing values
  ggplot(aes(x = bachelors_rate, y = broadband, color = poverty_rate)) +
  geom_point() + # scatter points
  geom_smooth(method = "lm", color = "blue") + # straight regression line (the default would be a loess curve)
  scale_colour_gradient(low = "yellow", high = "red") +
  labs(
    title = "Education Level and Broadband Access",
    subtitle = "Scatter plot with linear regression line",
    x = "Population with Bachelor's Degree (%)",
    y = "Broadband Access (count)", # broadband is a count, not a percentage
    color = "Poverty Rate",
    caption = "Gray band represents the confidence interval around the regression line"
  ) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
ggsave("Education Level and Broadband Access.png")
## Saving 7 x 5 in image
## `geom_smooth()` using formula = 'y ~ x'
This scatter plot shows the relationship between education levels and broadband access. Counties with higher percentages of people with bachelor’s degrees tend to have higher broadband availability. This supports the main story of the report.
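The relationship described above can also be summarized numerically with a correlation and a simple linear model. The sketch below uses only the first eight rows visible in the `glimpse()` output, so the numbers it prints illustrate the method and say little about the full dataset. Using `lm()` and `cor()` here is my choice of summary, not something the report itself runs.

```r
# First eight rows copied from the glimpse() output above (year 2017)
toy <- data.frame(
  pop25plus = c(21829, 25228, 9304, 12426, 5669, 21136, 9908, 7140),
  bachelors = c(2327, 2282, 796, 1021, 355, 2986, 579, 789),
  broadband = c(7792, 8714, 2548, 4095, 1557, 6673, 3116, 2079)
)

# Same definition of bachelors_rate as in the report
toy$bachelors_rate <- toy$bachelors / toy$pop25plus * 100

# Linear model: broadband count as a function of bachelor's degree rate
fit <- lm(broadband ~ bachelors_rate, data = toy)

print(coef(fit))                                # intercept and slope
print(cor(toy$bachelors_rate, toy$broadband))   # Pearson correlation
```

To run this on the full data, replace `toy` with the `broadband` data frame, which already contains `bachelors_rate`.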
ggplot(broadband, aes(x = year, y = broadband)) +
  geom_violin(colour = "lightblue", fill = "blue") +
  geom_boxplot() +
  geom_point() +
  labs(
    title = "Layered Visualization of Broadband Access",
    subtitle = "Combining violin, boxplot, and points",
    x = "Year",
    y = "Broadband Access (count)" # broadband is a count, not a percentage
  ) +
  theme_minimal()
ggsave("Layered Visualization of Broadband Access.png")
## Saving 7 x 5 in image
The layered plot combines violin, boxplot, and points to show broadband access across years. Together, these layers help show the distribution and variation of broadband access over time.