Due by 11:00pm on 9/16, submitted through Canvas

In this activity, you will merge two datasets, each of which has U.S. states as the unit of analysis. In other words, each observation in each of these datasets is a U.S. state.

The first dataset, which can be downloaded as a comma-separated values (.csv) file from Canvas, is called StateCovidDoses.csv and includes information about the number of Covid vaccines administered in each U.S. state as of January 25, 2022, obtained from Our World in Data.

The second dataset (also a .csv file that can be downloaded from Canvas) is called percapita-income-states-2022.csv and contains the per capita income in each state in 2022 (i.e. the average income per person in the state).

The variables in StateCovidDoses.csv are: State, TotalDoses, AtLeastOneDose, FullyVaccinated, and Population.

The variables in percapita-income-states-2022.csv are: State and incpercap.

You should begin by downloading both of these datasets as well as the Lab 2 R Markdown template to your computer, saving them all in the same folder. Then double-click the .Rmd template file to start RStudio.

Question 1: Loading and Exploring the Dataset

Load both datasets. Because both of these datasets are stored as .csv files, you’ll use the read.csv() command to load each, assigning each a name (I suggest calling them VaccineDoses and PerCapitaIncome, respectively). Don’t attach either of them yet, since we’ll be merging them and then attaching the merged dataset later.

Then use the head() command twice to look at the first several rows of each dataset separately.

VaccineDoses <- read.csv("StateCovidDoses.csv")
PerCapitaIncome <- read.csv("percapita-income-states-2022.csv")
head(VaccineDoses)
##        State TotalDoses AtLeastOneDose FullyVaccinated Population
## 1    Alabama    5919745        2989103         2414385    4903300
## 2     Alaska    1062252         495018          425713     731591
## 3    Arizona   11067589        5075793         4259553    7278799
## 4   Arkansas    3975150        1954538         1583447    3017814
## 5 California   70217759       34066658        27051905   39514907
## 6   Colorado    9979697        4439463         3913025    5758683
head(PerCapitaIncome)
##        State incpercap
## 1    Alabama     48540
## 2     Alaska     67742
## 3    Arizona     54422
## 4   Arkansas     50943
## 5 California     77211
## 6   Colorado     70764
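
Before merging, it can help to confirm that each dataset has one row per state and that the state names match across the two files. The following is a minimal sketch, not part of the assigned answer. It uses only the first three rows printed above instead of the full files, and the names vaccine_demo and income_demo are hypothetical.

```r
# Toy versions of the two datasets, built from the first three rows shown above.
vaccine_demo <- data.frame(
  State = c("Alabama", "Alaska", "Arizona"),
  TotalDoses = c(5919745, 1062252, 11067589),
  AtLeastOneDose = c(2989103, 495018, 5075793),
  FullyVaccinated = c(2414385, 425713, 4259553),
  Population = c(4903300, 731591, 7278799)
)
income_demo <- data.frame(
  State = c("Alabama", "Alaska", "Arizona"),
  incpercap = c(48540, 67742, 54422)
)

# str() lists each variable with its type; dim() gives rows and columns.
str(vaccine_demo)
dim(income_demo)

# Each state should appear exactly once in a dataset.
dup_states <- anyDuplicated(vaccine_demo$State) > 0

# States present in one dataset but missing from the other (empty if they match).
only_in_vaccine <- setdiff(vaccine_demo$State, income_demo$State)
only_in_income <- setdiff(income_demo$State, vaccine_demo$State)
```

If either setdiff() result is non-empty, those states would not find a partner in the merge.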

Question 2: Merging data

Merge the two datasets (called VaccineDoses and PerCapitaIncome if you used the suggested names) together. Note that the variable State is included in both datasets. You should call this new merged dataset StateMerge.

(Hint: Remember that the merge() command wants you to give the variable name on which you’re merging – the one that tells it how to match observations between the two datasets – in quotes for the by argument.)

After merging, you should attach the new merged dataset.

StateMerge <- merge(VaccineDoses, PerCapitaIncome, by = "State")
head(StateMerge)
##        State TotalDoses AtLeastOneDose FullyVaccinated Population incpercap
## 1    Alabama    5919745        2989103         2414385    4903300     48540
## 2     Alaska    1062252         495018          425713     731591     67742
## 3    Arizona   11067589        5075793         4259553    7278799     54422
## 4   Arkansas    3975150        1954538         1583447    3017814     50943
## 5 California   70217759       34066658        27051905   39514907     77211
## 6   Colorado    9979697        4439463         3913025    5758683     70764
attach(StateMerge)
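
By default, merge() keeps only the observations that appear in both datasets, so a state missing from one file would silently drop out of the merged data. The sketch below illustrates this with toy data in which Alaska is deliberately left out of the income table; the names doses_demo, income_demo, inner_demo, and left_demo are hypothetical and the values come from the rows printed above.

```r
# Toy data: "Alaska" is deliberately missing from the income table.
doses_demo <- data.frame(
  State = c("Alabama", "Alaska", "Arizona"),
  Population = c(4903300, 731591, 7278799)
)
income_demo <- data.frame(
  State = c("Alabama", "Arizona"),
  incpercap = c(48540, 54422)
)

# Default merge keeps only states found in both datasets.
inner_demo <- merge(doses_demo, income_demo, by = "State")

# all.x = TRUE keeps every row of the first dataset, filling gaps with NA.
left_demo <- merge(doses_demo, income_demo, by = "State", all.x = TRUE)

nrow(inner_demo)
nrow(left_demo)
```

Comparing nrow() of the merged dataset with nrow() of each original is a quick way to check that no states were lost.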

Question 3: Examining Association Between Variables

Make a scatterplot using the plot() command of AtLeastOneDose against incpercap (with AtLeastOneDose on the vertical axis) and briefly comment on what relationship you see, if any.

plot(incpercap, AtLeastOneDose)

Income per capita appears to have only a moderate positive association with the number of people who have received at least one dose of the vaccine.

Question 4: Creating New Variables

Create a new variable that is the percent of the population in each state that has at least one dose of the vaccine, calling this new variable PctAtLeastOneDose.

Hint: you will want to divide the number of people with at least one dose by the population, then multiply the whole thing by 100 so it’s a percent not a proportion (i.e. it’s between 0 and 100 like a percent, not just between 0 and 1 like a proportion).

Next, make a histogram of this new variable and briefly comment on what you learned. (Note: New Hampshire’s numbers are reported incorrectly in these data for some reason; the state obviously can’t have vaccinated more than 100 percent of its residents, but we will ignore that.)

PctAtLeastOneDose <- AtLeastOneDose/Population * 100

hist(PctAtLeastOneDose)

Most commonly, states have between 60% and 80% of their population with at least one dose of the vaccine.
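
The percent calculation and the check for impossible values can be sketched as follows. This is an illustration only: the Alabama and Alaska rows come from the output above, while the third row is a made-up state (labeled "Hypothetical") with invented numbers, included just to show how a value above 100 could be flagged. The sketch stores the new variable as a column of a data frame called pct_demo (a hypothetical name) instead of relying on attach().

```r
# Two real rows from the output above plus one invented row for illustration.
pct_demo <- data.frame(
  State = c("Alabama", "Alaska", "Hypothetical"),
  AtLeastOneDose = c(2989103, 495018, 1050),   # last value is invented
  Population = c(4903300, 731591, 1000),       # last value is invented
  stringsAsFactors = FALSE
)

# Proportion (0 to 1) times 100 gives a percent (0 to 100).
pct_demo$PctAtLeastOneDose <- pct_demo$AtLeastOneDose / pct_demo$Population * 100

# range() shows the smallest and largest percent at a glance.
range(pct_demo$PctAtLeastOneDose)

# States whose percent exceeds 100 cannot be correct and deserve a closer look.
impossible <- pct_demo$State[pct_demo$PctAtLeastOneDose > 100]
impossible
```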

Question 5: Associations Between Variables

Now make a scatterplot similar to the one you made above, but this time having PctAtLeastOneDose on the vertical axis and again having incpercap on the horizontal axis. Briefly comment on what you see.

plot(incpercap, PctAtLeastOneDose)

There appears to be a positive correlation between income per capita and the percentage of the population who have received at least one dose.

Question 6: More Associations Between Variables

Create a variable called TotalDosesPerCapita that is the total number of doses in each state divided by the state’s population. Then make a scatterplot similar to the one you made above, but this time having TotalDosesPerCapita on the vertical axis and again having incpercap on the horizontal axis. Briefly comment on what you see.

TotalDosesPerCapita <- TotalDoses/Population

plot(incpercap, TotalDosesPerCapita)

As income per capita increases, total doses per capita also appears to increase, suggesting a positive correlation.
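
One way to put a number on what the scatterplot suggests is the cor() command, which returns the correlation coefficient between two variables. The sketch below is an illustration under a strong assumption: it uses only the six states printed in the head() output above, so the value it produces is not the correlation for all 50 states. The names cor_demo and r_demo are hypothetical.

```r
# The six rows shown in the head() output above.
cor_demo <- data.frame(
  State = c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado"),
  TotalDoses = c(5919745, 1062252, 11067589, 3975150, 70217759, 9979697),
  Population = c(4903300, 731591, 7278799, 3017814, 39514907, 5758683),
  incpercap = c(48540, 67742, 54422, 50943, 77211, 70764)
)

# Doses per person in each state.
cor_demo$TotalDosesPerCapita <- cor_demo$TotalDoses / cor_demo$Population

# Correlation between income per capita and doses per capita (between -1 and 1).
r_demo <- cor(cor_demo$incpercap, cor_demo$TotalDosesPerCapita)
r_demo
```

With the full merged dataset attached, the equivalent call would be cor(incpercap, TotalDosesPerCapita).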

Question 7: What about Texas?

What percentage of Texans have received at least one dose of the vaccine? …and what percentage of Texans are fully vaccinated? …and how many total doses has Texas distributed? …and how big is Texas?

(Hint: Remember you can type x[y=="something"] to have R print the value of the variable x for the observation that has variable y equal to “something”.)

PctAtLeastOneDose[State=="Texas"]
## [1] 69.31905
FullyVaccinated[State=="Texas"]
## [1] 16958567
TotalDoses[State=="Texas"]
## [1] 42726217
Population[State=="Texas"]
## [1] 28993960
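
The question asks for the percentage of Texans who are fully vaccinated, while FullyVaccinated[State=="Texas"] returns a count. A minimal sketch of the conversion follows, using the Texas count and population printed above and the Alabama row from the earlier output. The names texas_demo, PctFullyVaccinated_demo, and texas_pct_full are hypothetical.

```r
# Texas values come from the output above; Alabama is included so the
# subsetting step has more than one row to choose from.
texas_demo <- data.frame(
  State = c("Alabama", "Texas"),
  FullyVaccinated = c(2414385, 16958567),
  Population = c(4903300, 28993960)
)

# Percent fully vaccinated in each state.
PctFullyVaccinated_demo <- texas_demo$FullyVaccinated / texas_demo$Population * 100

# Pick out the value for Texas, as in x[y == "something"].
texas_pct_full <- PctFullyVaccinated_demo[texas_demo$State == "Texas"]
texas_pct_full
```

With the merged dataset attached, the same idea would be FullyVaccinated[State=="Texas"] / Population[State=="Texas"] * 100.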

Question 8: Summing Variables

Use these data to calculate the proportion of the entire U.S. population that has had at least one shot (for our purposes here we’ll ignore the fact that these data don’t contain Washington, DC or U.S. territories). Also separately calculate the proportion of the U.S. population that is fully vaccinated.

Hint: This is not the simple average (mean) of the PctAtLeastOneDose variable (or of the PctFullyVaccinated variable for the second part of the question). To calculate this appropriately, you can first calculate the U.S. population based on the populations of the 50 states, then calculate the number of people who have had at least one shot in the entire U.S. based on the number who have had at least one shot in each of the 50 states, then use those numbers to calculate the percent of Americans with at least one shot (and similarly for the percent fully vaccinated). You should be able to do this all with one line of code for each variable if you think carefully.

USPopulation <- sum(Population)
USPopulation
## [1] 327530913
AtLeastOneUS <- sum(AtLeastOneDose)
AtLeastOneUS
## [1] 246245532
PctAtLeastOneUS <- AtLeastOneUS/USPopulation * 100
PctAtLeastOneUS
## [1] 75.18238
FullyVacUS <- sum(FullyVaccinated)
FullyVacUS
## [1] 206562679
PctFullyVacUS <- FullyVacUS/USPopulation * 100
PctFullyVacUS
## [1] 63.06662
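
The hint's point, that the national percent is not the simple mean of the state percents, can be illustrated with a short sketch. It uses only the six states from the head() output above, so its numbers are not the national figures. The names us_demo, pooled_pct, simple_mean_pct, and weighted_pct are hypothetical.

```r
# The six rows shown in the head() output above.
us_demo <- data.frame(
  State = c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado"),
  AtLeastOneDose = c(2989103, 495018, 5075793, 1954538, 34066658, 4439463),
  Population = c(4903300, 731591, 7278799, 3017814, 39514907, 5758683)
)

# One-line version: total people with a dose divided by total population.
pooled_pct <- sum(us_demo$AtLeastOneDose) / sum(us_demo$Population) * 100

# Simple mean of the state percents treats every state as equally large.
state_pct <- us_demo$AtLeastOneDose / us_demo$Population * 100
simple_mean_pct <- mean(state_pct)

# Weighting each state's percent by its population recovers the pooled value.
weighted_pct <- weighted.mean(state_pct, w = us_demo$Population)

pooled_pct
simple_mean_pct
weighted_pct
```

The simple mean differs because large states such as California count no more than small ones; the pooled calculation is equivalent to a population-weighted mean. With the merged dataset attached, the one-line form would be sum(AtLeastOneDose)/sum(Population) * 100.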