In this activity, you merge together two datasets, each of which has U.S. states as the unit of analysis. In other words, an observation in each of these datasets is a U.S. state.
The first dataset, which can be downloaded as a comma-separated values
(.csv) file from Canvas, is called
StateCovidDoses.csv and includes the
number of COVID-19 vaccine doses administered in each U.S. state as of January
25, 2022, obtained from Our
World in Data.
The second dataset (also a .csv file that can be
downloaded from Canvas) is called
percapita-income-states-2022.csv and contains the per
capita income in each state in 2022 (i.e. the average income per person
in the state).
The variables in StateCovidDoses.csv are:
State: name of the state
TotalDoses: total number of doses administered in state
AtLeastOneDose: number of people in state who have had at least one dose
FullyVaccinated: number of people in state who are fully vaccinated
Population: state population
The variables in percapita-income-states-2022.csv are:
State: name of the state
incpercap: per capita income in the state in 2022
You should begin by downloading both of these datasets as well as the
Lab 2 R Markdown template to your computer, saving them all in the same
folder. Then double-click the .Rmd template file to start
RStudio.
Load both datasets. Because both are stored as .csv files, you’ll use the
read.csv() command to load each one, assigning each a
name (I suggest calling them VaccineDoses and
PerCapitaIncome, respectively). Don’t attach either
dataset yet: we’ll be merging them and then attaching the merged dataset.
Then use the head() command twice to look at the first
several rows of each dataset separately.
VaccineDoses <- read.csv("StateCovidDoses.csv")
PerCapitaIncome <- read.csv("percapita-income-states-2022.csv")
head(VaccineDoses)
## State TotalDoses AtLeastOneDose FullyVaccinated Population
## 1 Alabama 5919745 2989103 2414385 4903300
## 2 Alaska 1062252 495018 425713 731591
## 3 Arizona 11067589 5075793 4259553 7278799
## 4 Arkansas 3975150 1954538 1583447 3017814
## 5 California 70217759 34066658 27051905 39514907
## 6 Colorado 9979697 4439463 3913025 5758683
head(PerCapitaIncome)
## State incpercap
## 1 Alabama 48540
## 2 Alaska 67742
## 3 Arizona 54422
## 4 Arkansas 50943
## 5 California 77211
## 6 Colorado 70764
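Before merging, it can help to confirm what read.csv() produced. The sketch below is self-contained: instead of the real file, it passes the first three rows shown by head(PerCapitaIncome) above as an inline string through read.csv()'s text argument (done only so the example runs without the file). str() reports each column's type and nrow() the number of observations.

```r
# Inline stand-in for percapita-income-states-2022.csv (first three rows only);
# read.csv() reads from a character string when given text = instead of a file.
csv_text <- "State,incpercap
Alabama,48540
Alaska,67742
Arizona,54422"
income_demo <- read.csv(text = csv_text)

str(income_demo)   # State is character (R >= 4.0), incpercap is integer
nrow(income_demo)  # 3 here; run nrow() on the real datasets to check their size
```

With the real data, comparing nrow(VaccineDoses) with nrow(PerCapitaIncome) is a quick way to spot a dataset with extra or missing rows before you merge.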
Merge the two datasets (called VaccineDoses and
PerCapitaIncome if you used the suggested names) together.
Note that the variable State is included in both datasets.
You should call this new merged dataset StateMerge.
(Hint: Remember that the merge() command wants you to
give the variable name on which you’re merging – the one that tells it
how to match observations between the two datasets – in quotes for the
by argument.)
After merging, you should attach the new merged dataset.
StateMerge <- merge(VaccineDoses, PerCapitaIncome, by = "State")
head(StateMerge)
## State TotalDoses AtLeastOneDose FullyVaccinated Population incpercap
## 1 Alabama 5919745 2989103 2414385 4903300 48540
## 2 Alaska 1062252 495018 425713 731591 67742
## 3 Arizona 11067589 5075793 4259553 7278799 54422
## 4 Arkansas 3975150 1954538 1583447 3017814 50943
## 5 California 70217759 34066658 27051905 39514907 77211
## 6 Colorado 9979697 4439463 3913025 5758683 70764
attach(StateMerge)
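By default merge() keeps only the observations whose State value appears in both datasets, and the matching is exact, so capitalization and spelling must agree. The toy example below uses a few values from the head() output above, with one state name deliberately lowercased to mimic a hypothetical data-entry mismatch. It shows how an unmatched row silently disappears, how all = TRUE keeps it, and how setdiff() finds such mismatches.

```r
# Toy versions of the two datasets (values from the head() output above);
# "arizona" is deliberately lowercase to mimic a hypothetical spelling mismatch.
doses <- data.frame(State = c("Alabama", "Alaska", "Arizona"),
                    TotalDoses = c(5919745, 1062252, 11067589))
income <- data.frame(State = c("Alabama", "Alaska", "arizona"),
                     incpercap = c(48540, 67742, 54422))

inner <- merge(doses, income, by = "State")              # matched states only
outer <- merge(doses, income, by = "State", all = TRUE)  # every row, NA where unmatched

nrow(inner)                          # 2: Arizona was dropped
nrow(outer)                          # 4: "Arizona" and "arizona" kept separately
setdiff(doses$State, income$State)   # "Arizona" has no partner in income
```

On the real data, setdiff(VaccineDoses$State, PerCapitaIncome$State) and the reverse should both return character(0) if every state matched.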
Make a scatterplot using the plot() command of
AtLeastOneDose against incpercap (with
AtLeastOneDose on the vertical axis) and briefly comment on
what relationship you see, if any.
plot(incpercap, AtLeastOneDose)
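Income per capita appears to have at most a moderate positive association with the number of people who have received at least one dose. Because this is a raw count, it mostly reflects how many people live in each state, so the plot says little about income on its own.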
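Because AtLeastOneDose is a count of people, it largely tracks state size. The sketch below uses only the six states shown in the head() output above (a small subset, not the full dataset) and cor() to compare how strongly the count correlates with population versus with income per capita. The demo_ names are hypothetical and chosen so they don't mask the attached StateMerge columns.

```r
# Six states from the head() output above (not the full 50-state data)
demo_income       <- c(48540, 67742, 54422, 50943, 77211, 70764)
demo_at_least_one <- c(2989103, 495018, 5075793, 1954538, 34066658, 4439463)
demo_population   <- c(4903300, 731591, 7278799, 3017814, 39514907, 5758683)

cor(demo_population, demo_at_least_one)  # close to 1: counts mostly mirror state size
cor(demo_income, demo_at_least_one)      # association with income, for comparison
```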
Create a new variable that is the percent of the population in each
state that has at least one dose of the vaccine, calling this new
variable PctAtLeastOneDose.
Hint: you will want to divide the number of people with at least one dose by the population, then multiply the whole thing by 100 so it’s a percent not a proportion (i.e. it’s between 0 and 100 like a percent, not just between 0 and 1 like a proportion).
Next, make a histogram of this new variable and briefly comment on what you learned. (Note: New Hampshire’s figures are reported incorrectly in this dataset for some reason, since it obviously can’t have vaccinated more than 100 percent of its residents, but we will ignore that.)
PctAtLeastOneDose <- AtLeastOneDose/Population * 100
hist(PctAtLeastOneDose)
Most commonly, states have between 60 and 80 percent of their population with at least one dose of the vaccine.
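Logical indexing gives a quick numerical companion to the histogram and a way to list states with impossible values such as New Hampshire's. This is a sketch on the six states from the head() output above, none of which exceeds 100 percent. The demo_ names are hypothetical stand-ins for State and PctAtLeastOneDose.

```r
# Six states from the head() output above (not the full 50-state data)
demo_state        <- c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado")
demo_at_least_one <- c(2989103, 495018, 5075793, 1954538, 34066658, 4439463)
demo_population   <- c(4903300, 731591, 7278799, 3017814, 39514907, 5758683)

demo_pct <- demo_at_least_one / demo_population * 100  # percent, 0 to 100
summary(demo_pct)            # min, quartiles, median, mean, max
demo_state[demo_pct > 100]   # states reporting over 100 percent (none here)
```

On the full merged data, State[PctAtLeastOneDose > 100] applies the same check to all 50 states.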
Now make a scatterplot similar to the one you made above, but this
time having PctAtLeastOneDose on the vertical axis and
again having incpercap on the horizontal axis. Briefly
comment on what you see.
plot(incpercap, PctAtLeastOneDose)
There appears to be a positive association: states with higher income per capita tend to have a higher percentage of their population who have received at least one dose.
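To summarize the trend in the scatterplot, one option (not required by the lab) is to overlay a least-squares line with lm() and abline(). The sketch uses the six states from the head() output above, so its slope will not match a fit on all 50 states.

```r
# Six states from the head() output above (not the full 50-state data)
demo_income       <- c(48540, 67742, 54422, 50943, 77211, 70764)
demo_at_least_one <- c(2989103, 495018, 5075793, 1954538, 34066658, 4439463)
demo_population   <- c(4903300, 731591, 7278799, 3017814, 39514907, 5758683)
demo_pct <- demo_at_least_one / demo_population * 100

fit <- lm(demo_pct ~ demo_income)  # simple linear regression of percent on income
coef(fit)                          # slope: change in percentage points per $1 of income
plot(demo_income, demo_pct)
abline(fit)                        # add the fitted line to the current plot
```

With the attached merged data, the equivalent is abline(lm(PctAtLeastOneDose ~ incpercap)) right after the plot() call.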
Create a variable called TotalDosesPerCapita that is
number of total doses in each state divided by the state’s population.
Then make a scatterplot similar to the one you made above, but this time
having TotalDosesPerCapita on the vertical axis and again
having incpercap on the horizontal axis. Briefly comment on
what you see.
TotalDosesPerCapita <- TotalDoses/Population
plot(incpercap, TotalDosesPerCapita)
As income per capita increases, total doses per capita also tends to increase, indicating a positive correlation between the two.
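TotalDosesPerCapita is a ratio rather than a percent: it counts doses per resident and can exceed 1 because a person can receive more than one dose. Variables created with <- after attach() (such as PctAtLeastOneDose and TotalDosesPerCapita) are stored in your workspace, not inside StateMerge. If you want them kept with the data frame, one option is to add them as columns. The sketch below does this on two states from the head() output above. StateDemo is a hypothetical name.

```r
# Two states from the head() output above, in a hypothetical data frame
StateDemo <- data.frame(State = c("Alabama", "Alaska"),
                        TotalDoses = c(5919745, 1062252),
                        Population = c(4903300, 731591))

# with() evaluates the expression using StateDemo's columns; the result is
# stored as a new column, so it travels with the data frame.
StateDemo$TotalDosesPerCapita <- with(StateDemo, TotalDoses / Population)
StateDemo
```

A copy attached earlier does not pick up new columns. You would need to detach() and attach() the data frame again.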
What percentage of Texans have received at least one dose of the vaccine? …and what percentage of Texans are fully vaccinated? …and how many total doses has Texas administered? …and what is the population of Texas?
(Hint: Remember you can type x[y=="something"] to have R
print the value of the variable x for the observation that
has variable y equal to “something”.)
PctAtLeastOneDose[State=="Texas"]
## [1] 69.31905
FullyVaccinated[State=="Texas"]/Population[State=="Texas"] * 100
## [1] 58.49
TotalDoses[State=="Texas"]
## [1] 42726217
Population[State=="Texas"]
## [1] 28993960
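Instead of one lookup per variable, you can pull the whole Texas row at once with data frame row indexing and then compute percentages from it. The sketch uses a hypothetical two-state data frame holding the Alabama and Texas values printed in this document. AtLeastOneDose is left out because the Texas count isn't shown here.

```r
# Hypothetical data frame built from values printed in this document
demo <- data.frame(State = c("Alabama", "Texas"),
                   FullyVaccinated = c(2414385, 16958567),
                   TotalDoses = c(5919745, 42726217),
                   Population = c(4903300, 28993960))

tx <- demo[demo$State == "Texas", ]       # logical row index, all columns
tx$FullyVaccinated / tx$Population * 100  # percent of Texans fully vaccinated
tx$TotalDoses / tx$Population             # doses administered per Texan
```

On the real data, StateMerge[StateMerge$State == "Texas", ] returns the full Texas row, including AtLeastOneDose.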
Use these data to calculate the proportion of the entire U.S. population that has had at least one shot (for our purposes here we’ll ignore the fact that these data don’t contain Washington, DC or U.S. territories). Also separately calculate the proportion of the U.S. population that is fully vaccinated.
Hint: This is not the simple average (mean) of the
PctAtLeastOneDose variable (or of the
PctFullyVaccinated variable for the second part of the
question). To calculate this appropriately, you can first calculate the
U.S. population based on the populations of the 50 states, then
calculate the number of people who have had at least one shot in the
entire U.S. based on the number who have had at least one shot in each
of the 50 states, then use those numbers to calculate the percent of
Americans with at least one shot (and similarly for the percent fully
vaccinated). You should be able to do this all with one line of code for
each variable if you think carefully.
USPopulation <- sum(Population)
USPopulation
## [1] 327530913
AtLeastOneUS <- sum(AtLeastOneDose)
AtLeastOneUS
## [1] 246245532
PctAtLeastOneUS <- AtLeastOneUS/USPopulation * 100
PctAtLeastOneUS
## [1] 75.18238
FullyVacUS <- sum(FullyVaccinated)
FullyVacUS
## [1] 206562679
PctFullyVacUS <- FullyVacUS/USPopulation * 100
PctFullyVacUS
## [1] 63.06662
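One way to write the one-line version the hint mentions is sum(AtLeastOneDose) / sum(Population) * 100 (and likewise with FullyVaccinated). The simple mean of PctAtLeastOneDose is wrong because it gives every state equal weight regardless of size. The pooled percentage instead equals the population-weighted mean of the state percentages. The sketch below checks this with weighted.mean() on the six states from the head() output above.

```r
# Six states from the head() output above (not the full 50-state data)
demo_at_least_one <- c(2989103, 495018, 5075793, 1954538, 34066658, 4439463)
demo_population   <- c(4903300, 731591, 7278799, 3017814, 39514907, 5758683)
demo_pct <- demo_at_least_one / demo_population * 100

pooled     <- sum(demo_at_least_one) / sum(demo_population) * 100  # one-line version
unweighted <- mean(demo_pct)                                       # each state counts equally
weighted   <- weighted.mean(demo_pct, w = demo_population)         # states weighted by population

all.equal(pooled, weighted)  # TRUE: the pooled percent is the weighted mean
pooled - unweighted          # large gap here because California is both big and high
```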