Question 1: Loading and Exploring the Dataset

VaccineDoses <- read.csv("StateCovidDoses.csv", stringsAsFactors = FALSE)
TrumpVote    <- read.csv("StateTrumpVote.csv", stringsAsFactors = FALSE)

head(VaccineDoses)
##        State TotalDoses FullyVaccinated Population
## 1    Alabama    7018011         2611593    4903479
## 2     Alaska    1328221          477592     731493
## 3    Arizona   14647405         4821350    7278608
## 4   Arkansas    4874091         1720209    3017911
## 5 California   88487852        29588939   39509866
## 6   Colorado   13033446         4248431    5759023
head(TrumpVote)
##        State TrumpPct
## 1    Alabama 62.91137
## 2     Alaska 55.26185
## 3    Arizona 49.84317
## 4   Arkansas 64.21243
## 5 California 35.09109
## 6   Colorado 43.06168
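
Beyond head(), a few quick checks can confirm the data loaded as expected. The sketch below is only an illustration: it rebuilds the six rows printed above by hand (the name `doses` is a stand-in, not part of the original script) so that it runs without the CSV files.

```r
# First six rows of the vaccine table, copied from the head() output above.
# `doses` is a hypothetical stand-in for VaccineDoses.
doses <- data.frame(
  State = c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado"),
  TotalDoses = c(7018011, 1328221, 14647405, 4874091, 88487852, 13033446),
  FullyVaccinated = c(2611593, 477592, 4821350, 1720209, 29588939, 4248431),
  Population = c(4903479, 731493, 7278608, 3017911, 39509866, 5759023),
  stringsAsFactors = FALSE
)

str(doses)      # column types and dimensions
summary(doses)  # ranges of the numeric columns

n_missing <- colSums(is.na(doses))      # missing values per column
n_dup <- sum(duplicated(doses$State))   # repeated state names

# Sanity check: fully vaccinated counts should not exceed the population.
impossible <- doses$FullyVaccinated > doses$Population
```

The same calls could be run on VaccineDoses and TrumpVote directly once the files are read in.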

Question 2: Merging Data

stateMerge <- merge(VaccineDoses, TrumpVote, by = "State")
attach(stateMerge)
head(stateMerge)
##        State TotalDoses FullyVaccinated Population TrumpPct
## 1    Alabama    7018011         2611593    4903479 62.91137
## 2     Alaska    1328221          477592     731493 55.26185
## 3    Arizona   14647405         4821350    7278608 49.84317
## 4   Arkansas    4874091         1720209    3017911 64.21243
## 5 California   88487852        29588939   39509866 35.09109
## 6   Colorado   13033446         4248431    5759023 43.06168
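
merge() with only `by = "State"` silently drops any state whose name does not match exactly in both tables. A minimal sketch of how that could be checked, using three rows from the output above and one deliberately introduced mismatch (the lowercase "arizona" is an assumption made for illustration, not something seen in the real files):

```r
# Three rows from the tables above; `doses` and `votes` are hypothetical names.
doses <- data.frame(
  State = c("Alabama", "Alaska", "Arizona"),
  FullyVaccinated = c(2611593, 477592, 4821350),
  stringsAsFactors = FALSE
)
votes <- data.frame(
  State = c("Alabama", "Alaska", "arizona"),  # lowercase on purpose
  TrumpPct = c(62.91137, 55.26185, 49.84317),
  stringsAsFactors = FALSE
)

# State names present in one table but not the other.
only_in_doses <- setdiff(doses$State, votes$State)
only_in_votes <- setdiff(votes$State, doses$State)

# The default merge keeps only matching keys; all = TRUE keeps every row
# and fills the missing side with NA.
inner <- merge(doses, votes, by = "State")
outer <- merge(doses, votes, by = "State", all = TRUE)
```

If both setdiff() calls return empty vectors on the real data, the default merge loses nothing.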

Question 3: Examining Association Between Variables

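# Scatterplot of raw fully-vaccinated counts against Trump vote share
plot(stateMerge$TrumpPct, stateMerge$FullyVaccinated,
     xlab = "Trump Vote % (2020)",
     ylab = "Number Fully Vaccinated",
     main = "Fully Vaccinated vs Trump %")
# Overlay the least-squares line
abline(lm(FullyVaccinated ~ TrumpPct, data = stateMerge), col = "red")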

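States with a higher Trump vote share tend to have fewer fully vaccinated residents. Because these are raw counts, however, the pattern may mostly reflect differences in state population rather than differences in vaccination rates.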
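
One caveat with raw counts is that they scale with population. A rough sketch of that check, using only the six rows printed above (so the number is illustrative, not a result for all states; `states` is a hypothetical name):

```r
# Six rows copied from the merged table above.
states <- data.frame(
  State = c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado"),
  FullyVaccinated = c(2611593, 477592, 4821350, 1720209, 29588939, 4248431),
  Population = c(4903479, 731493, 7278608, 3017911, 39509866, 5759023),
  stringsAsFactors = FALSE
)

# Correlation between the raw count and population size.
# A value near 1 means the count is mostly a proxy for how big the state is.
count_vs_pop <- cor(states$FullyVaccinated, states$Population)
count_vs_pop
```

A value close to 1 here is the motivation for switching to a per-capita rate before comparing states.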

Question 4: Creating New Variables

stateMerge$PctFullyVaccinated <- (stateMerge$FullyVaccinated / stateMerge$Population) * 100
hist(stateMerge$PctFullyVaccinated,
     xlab = "Percent Fully Vaccinated",
     main = "Distribution of State Vaccination Rates",
     col = "lightblue", breaks = 10)

The histogram shows how state vaccination rates are distributed. There's a skew, with a few states having much higher rates than most others.
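
Skew read off a histogram can be cross-checked numerically by comparing the mean with the median and looking at the quantiles. The sketch below does this for the six states shown earlier only, so it demonstrates the calls rather than the shape of the full distribution; `states` is a hypothetical name.

```r
# Six rows copied from the merged table above.
states <- data.frame(
  State = c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado"),
  FullyVaccinated = c(2611593, 477592, 4821350, 1720209, 29588939, 4248431),
  Population = c(4903479, 731493, 7278608, 3017911, 39509866, 5759023),
  stringsAsFactors = FALSE
)
states$PctFullyVaccinated <- states$FullyVaccinated / states$Population * 100

pct_mean <- mean(states$PctFullyVaccinated)
pct_median <- median(states$PctFullyVaccinated)
pct_quantiles <- quantile(states$PctFullyVaccinated)  # min, quartiles, max

# A mean well above the median suggests a right skew;
# a mean well below it suggests a left skew.
c(mean = pct_mean, median = pct_median)
pct_quantiles
```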

Question 5: More Associations Between Variables

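# Scatterplot of the vaccination rate (per capita) against Trump vote share
plot(stateMerge$TrumpPct, stateMerge$PctFullyVaccinated,
     xlab = "Trump Vote % (2020)",
     ylab = "Percent Fully Vaccinated",
     main = "Vaccination Rate vs Trump %")
# Overlay the least-squares line
abline(lm(PctFullyVaccinated ~ TrumpPct, data = stateMerge), col = "red")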
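
The direction and strength of the plotted relationship can also be put into numbers with cor() and the fitted slope. This is a minimal sketch on the six states shown earlier (not the full data set), with `states` as a hypothetical stand-in for stateMerge:

```r
# Six rows copied from the merged table above.
states <- data.frame(
  State = c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado"),
  FullyVaccinated = c(2611593, 477592, 4821350, 1720209, 29588939, 4248431),
  Population = c(4903479, 731493, 7278608, 3017911, 39509866, 5759023),
  TrumpPct = c(62.91137, 55.26185, 49.84317, 64.21243, 35.09109, 43.06168),
  stringsAsFactors = FALSE
)
states$PctFullyVaccinated <- states$FullyVaccinated / states$Population * 100

# Pearson correlation: sign gives the direction, magnitude the strength.
r <- cor(states$TrumpPct, states$PctFullyVaccinated)

# Slope of the least-squares line: change in vaccination rate
# (percentage points) per one-point increase in Trump vote share.
fit <- lm(PctFullyVaccinated ~ TrumpPct, data = states)
slope <- coef(fit)[["TrumpPct"]]

c(correlation = r, slope = slope)
```

Running the same two calls on stateMerge would give the values for all states.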

There's a clearer negative slope here than in Question 3: states with higher Trump support usually had lower vaccination rates.

Question 6: Summing Variables

US_PctFullyVaccinated <- sum(FullyVaccinated, na.rm = TRUE) / sum(Population, na.rm = TRUE) * 100
US_PctFullyVaccinated
## [1] 68.96843
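
Dividing total vaccinated by total population gives a population-weighted rate, which generally differs from the simple average of the state percentages. A small sketch of the difference, again using only the six states shown earlier (so the numbers will not match the national figure above; `states` is a hypothetical name):

```r
# Six rows copied from the merged table above.
states <- data.frame(
  State = c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado"),
  FullyVaccinated = c(2611593, 477592, 4821350, 1720209, 29588939, 4248431),
  Population = c(4903479, 731493, 7278608, 3017911, 39509866, 5759023),
  stringsAsFactors = FALSE
)
states$PctFullyVaccinated <- states$FullyVaccinated / states$Population * 100

# Ratio of sums: every person counts equally, so large states dominate.
pooled <- sum(states$FullyVaccinated) / sum(states$Population) * 100

# Same quantity written as a population-weighted mean of the state rates.
weighted <- weighted.mean(states$PctFullyVaccinated, states$Population)

# Simple mean of state rates: every state counts equally.
unweighted <- mean(states$PctFullyVaccinated)

c(pooled = pooled, weighted = weighted, unweighted = unweighted)
```

The pooled and weighted values agree by construction; the unweighted mean answers a different question (the typical state rather than the typical resident).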