Purpose

This script examines the relationship between broadband access and median household income among Tennessee’s 95 counties.

First, let’s load the required packages.

#Install (if needed) and load required packages
#Note: ggplot2 is part of the tidyverse, so installing tidyverse covers it
if (!require("dplyr")) install.packages("dplyr")
if (!require("tidyverse")) install.packages("tidyverse")
library(dplyr)
library(ggplot2)

Next, let’s load the dataset.

#Read the data
mydata <- read.csv("TNBBAccessData.csv")
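Before plotting, it can help to confirm that the file loaded as expected. The following is a minimal sketch, not part of the original script. It assumes the data has one row per county (95 in total) and the `PctBB` and `MedIncome` columns used later in this script. The helper name `check_bb_data` is hypothetical, and the toy data frame holds made-up values that stand in for `mydata`.

```r
# Hypothetical helper: checks that a data frame looks like the county file.
check_bb_data <- function(df, expected_rows = 95) {
  needed <- c("PctBB", "MedIncome")
  missing_cols <- setdiff(needed, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  list(
    rows_ok   = nrow(df) == expected_rows,   # TRUE if the row count matches
    n_missing = colSums(is.na(df[needed]))   # count of NA values per column
  )
}

# Toy stand-in for mydata (made-up values, not the real file)
toy <- data.frame(PctBB = c(70.1, 65.3, NA),
                  MedIncome = c(45000, 39000, 52000))
result <- check_bb_data(toy, expected_rows = 3)
print(result)
```

With the real data, the call would be `check_bb_data(mydata)`. Missing values matter here because `mean()`, `median()`, and `sd()` return `NA` when any input is `NA`, unless `na.rm = TRUE` is passed.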

Here are histograms of the broadband access and household income measures.

#Look at histograms of the PctBB and MedIncome distributions
ggplot(mydata, aes(x = PctBB)) +
  geom_histogram(color = "black", fill = "dodgerblue")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(mydata, aes(x = MedIncome)) +
  geom_histogram(color = "black", fill = "dodgerblue")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
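The `stat_bin()` message appears because no bin size was specified, so ggplot2 falls back to 30 bins. One way to address it is to set `binwidth` explicitly, as sketched below. The widths chosen here (2.5 percentage points and $5,000) are arbitrary assumptions for illustration, and the toy data frame is simulated rather than read from the real file.

```r
library(ggplot2)

# Toy stand-in for mydata (simulated values, not the real file)
set.seed(1)
toy <- data.frame(PctBB = runif(95, 50, 95),
                  MedIncome = runif(95, 30000, 115000))

# binwidth is expressed in the units of the x variable:
# 2.5 percentage points for PctBB, 5000 dollars for MedIncome
p_bb <- ggplot(toy, aes(x = PctBB)) +
  geom_histogram(binwidth = 2.5, color = "black", fill = "dodgerblue")

p_inc <- ggplot(toy, aes(x = MedIncome)) +
  geom_histogram(binwidth = 5000, color = "black", fill = "dodgerblue")

print(p_bb)
print(p_inc)
```

Trying a few different widths is worthwhile, since the apparent shape of a distribution can change with the bin size.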

(Here, describe what you see in the histograms of each variable)

Next, let’s get some descriptive statistics for each variable.

mydata %>%
  select(PctBB, MedIncome) %>%
  summarise_all(list(Median = median,
                     Mean = mean,
                     SD = sd,
                     Min = min,
                     Max = max))
##   PctBB_Median MedIncome_Median PctBB_Mean MedIncome_Mean PctBB_SD MedIncome_SD
## 1         71.9            44122   72.22947          47167 7.045458     10837.72
##   PctBB_Min MedIncome_Min PctBB_Max MedIncome_Max
## 1      51.2         30136      93.4        112962
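In current versions of dplyr, `summarise_all()` is superseded by `across()`. The sketch below shows one possible equivalent, not the script's own method. It uses a three-row toy data frame with made-up values. The column names follow the same `Variable_Statistic` pattern, but `across()` orders them by variable first (all `PctBB_*` columns, then all `MedIncome_*` columns) rather than by statistic.

```r
library(dplyr)

# Toy stand-in for mydata (made-up values, not the real file)
toy <- data.frame(PctBB = c(60, 70, 80),
                  MedIncome = c(40000, 50000, 90000))

# Apply each named function to each selected column
stats <- toy %>%
  summarise(across(c(PctBB, MedIncome),
                   list(Median = median,
                        Mean = mean,
                        SD = sd,
                        Min = min,
                        Max = max)))
print(stats)
```

Grouping the columns by variable can make the output easier to read, because each variable's median, mean, and range sit next to each other.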

(Here, summarize what the descriptive statistics say)

Finally, let’s see what the relationship between broadband access and median household income looks like.

ggplot(mydata, aes(x = MedIncome,
                   y = PctBB)) +
  geom_point(size = 2) +
  geom_smooth(method = "lm",
              se = FALSE)
## `geom_smooth()` using formula = 'y ~ x'
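To put numbers on what the scatterplot shows, one option is to compute the correlation and fit the same straight line that `geom_smooth(method = "lm")` draws. The sketch below is an illustration under stated assumptions. The toy data are simulated with an arbitrary built-in trend, so the results say nothing about the real Tennessee data. The `resid` column and the `oddballs` object are hypothetical names introduced here.

```r
# Toy stand-in for mydata (simulated values, not the real file)
set.seed(42)
toy <- data.frame(MedIncome = runif(95, 30000, 115000))
toy$PctBB <- 55 + 0.0003 * toy$MedIncome + rnorm(95, sd = 4)

# Pearson correlation between income and broadband access
r <- cor(toy$MedIncome, toy$PctBB)

# Same linear model that geom_smooth(method = "lm") fits: y ~ x
fit <- lm(PctBB ~ MedIncome, data = toy)

# Slope rescaled to percentage points of PctBB per $10,000 of income
slope_per_10k <- unname(coef(fit)["MedIncome"]) * 10000

# Rows farthest from the fitted line (candidate oddball cases)
toy$resid <- resid(fit)
oddballs <- toy[order(-abs(toy$resid)), ][1:3, ]

print(r)
print(slope_per_10k)
print(oddballs)
```

With the real data, replacing `toy` with `mydata` would give the corresponding values. Rows with large residuals are worth naming in the written interpretation, since they are the counties the overall trend describes least well.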

(Finish up by summarizing your interpretation of what the scatterplot shows. You can talk, for example, about the overall trend and about any outlier / oddball cases you see).