This script examines the relationship between broadband access and median household income among Tennessee’s 95 counties.
First, let’s load the required packages.
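# Install (if missing) and load the required packages
if (!requireNamespace("dplyr", quietly = TRUE)) install.packages("dplyr")
if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2")
library(dplyr)    # data manipulation and the %>% pipe
library(ggplot2)  # plotting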
Next, let’s load the dataset.
# Read the data (read.csv() looks for the file in the current working directory)
mydata <- read.csv("TNBBAccessData.csv")
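Before plotting, it can help to confirm that the file loaded as expected. The descriptive statistics below call mean(), sd(), and related functions without na.rm = TRUE, so a single missing value would turn a result into NA. The helper below, check_county_data(), is a hypothetical name used only for illustration. It assumes the columns are called PctBB and MedIncome, as in the rest of this script, and that there is one row per county (95 rows). It stops if a column is missing, warns if the row count differs, and returns the number of missing values in each column.

```r
# Hypothetical helper (not part of the original script): sanity-check the
# county data frame before analysis.
check_county_data <- function(df, cols = c("PctBB", "MedIncome"), n_expected = 95) {
  # Stop early if any required column is absent
  missing_cols <- setdiff(cols, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  # Warn (but continue) if the row count is not one row per county
  if (nrow(df) != n_expected) {
    warning("Expected ", n_expected, " rows but found ", nrow(df))
  }
  # Count missing values in each required column
  sapply(df[cols], function(v) sum(is.na(v)))
}
```

Running check_county_data(mydata) should return 0 for both columns if the file is complete.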
Here are histograms of the broadband access (PctBB) and median household income (MedIncome) measures.
# Histograms of the PctBB and MedIncome distributions
ggplot(mydata, aes(x = PctBB)) +
  geom_histogram(color = "black", fill = "dodgerblue")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(mydata, aes(x = MedIncome)) +
  geom_histogram(color = "black", fill = "dodgerblue")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
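The message from stat_bin() suggests choosing a bin width instead of relying on the default 30 bins. One common data-driven choice, swapped in here as an illustration rather than as the method this script prescribes, is the Freedman-Diaconis rule: bin width = 2 × IQR / n^(1/3). The function name fd_binwidth() is hypothetical. The result is in the variable's own units, so it is in percentage points for PctBB and in dollars for MedIncome.

```r
# Hypothetical helper: Freedman-Diaconis bin width, 2 * IQR / n^(1/3),
# computed after dropping missing values.
fd_binwidth <- function(x) {
  x <- x[!is.na(x)]
  2 * IQR(x) / length(x)^(1 / 3)
}
```

To use it, pass the width to geom_histogram(), for example ggplot(mydata, aes(x = PctBB)) + geom_histogram(binwidth = fd_binwidth(mydata$PctBB), color = "black", fill = "dodgerblue"). Other widths are equally defensible; the main point is to check whether the shape of each histogram holds up when the bin width changes.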
(Here, describe what you see in the histogram of each variable.)
Next, let’s get some descriptive statistics for each variable.
# Median, mean, standard deviation, minimum, and maximum of each variable
mydata %>%
  select(PctBB, MedIncome) %>%
  summarise_all(list(Median = median,
                     Mean = mean,
                     SD = sd,
                     Min = min,
                     Max = max))
## PctBB_Median MedIncome_Median PctBB_Mean MedIncome_Mean PctBB_SD MedIncome_SD
## 1 71.9 44122 72.22947 47167 7.045458 10837.72
## PctBB_Min MedIncome_Min PctBB_Max MedIncome_Max
## 1 51.2 30136 93.4 112962
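The summarise_all() output interleaves the two variables across a wide row, which makes side-by-side comparison awkward. As an optional alternative, the base-R sketch below computes the same five statistics and returns a matrix with one row per statistic and one column per variable. The function name describe_vars() is hypothetical, and, like the original call, it assumes the columns contain no missing values.

```r
# Hypothetical helper: the same five statistics as the summarise_all() call,
# laid out with statistics as rows and variables as columns.
describe_vars <- function(df, cols) {
  stats <- function(x) c(Median = median(x), Mean = mean(x), SD = sd(x),
                         Min = min(x), Max = max(x))
  sapply(df[cols], stats)  # each column's stats become one matrix column
}
```

For example, round(describe_vars(mydata, c("PctBB", "MedIncome")), 2) prints a 5 × 2 table. In that layout, comparing each variable's mean with its median is a quick first check for skew.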
(Here, summarize what the descriptive statistics say.)
Finally, let’s see what the relationship between broadband access and median household income looks like.
# Scatterplot of broadband access against income, with a least-squares line
ggplot(mydata, aes(x = MedIncome,
                   y = PctBB)) +
  geom_point(size = 2) +
  geom_smooth(method = "lm",
              se = FALSE)
## `geom_smooth()` using formula = 'y ~ x'
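The line drawn by geom_smooth(method = "lm") is an ordinary least-squares fit, but the plot does not report its numbers. The sketch below fits the same model with lm() so that the slope, the correlation, and R² can be quoted in the interpretation. It also flags counties with large standardized residuals as candidate oddball cases. The function name fit_bb_income(), the rescaling of income to thousands of dollars (assuming MedIncome is in dollars, as its values suggest), and the cutoff of 2 are all illustrative choices, not fixed rules.

```r
# Hypothetical helper: fit PctBB ~ income (in $1,000s) by least squares and
# flag rows whose standardized residual exceeds a rule-of-thumb cutoff.
fit_bb_income <- function(df, cutoff = 2) {
  # Rescale income so the slope reads as "points of PctBB per $1,000"
  df$IncomeK <- df$MedIncome / 1000
  fit <- lm(PctBB ~ IncomeK, data = df)
  # Row names of observations with |standardized residual| > cutoff
  flagged <- names(which(abs(rstandard(fit)) > cutoff))
  list(
    slope     = unname(coef(fit)["IncomeK"]),
    r         = cor(df$PctBB, df$MedIncome),  # Pearson correlation
    r_squared = summary(fit)$r.squared,
    outliers  = which(rownames(df) %in% flagged)  # row positions in df
  )
}
```

For example, res <- fit_bb_income(mydata) followed by mydata[res$outliers, ] lists the counties that sit unusually far from the fitted line. Those rows are worth a closer look, but they are not automatically errors.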
(Finish up by summarizing your interpretation of what the scatterplot shows. You can discuss, for example, the overall trend and any outliers or unusual cases you see.)