Chapter 8 Project Solutions
The Data
Run the following R command:
Take a look at the data:
Each row is a state in the United States, on a particular day. The data include only states that had experienced 500 or more cases of COVID-19 by April 4, 2020, the most recent day covered by this data. The variables are:
- date: the date
- state: the name of the state
- fips: a standard geographic code for the state
- cases: the number of confirmed cases of COVID-19 in the state, up to the given date
- deaths: the number of deaths in the state due to COVID-19, up to the given date
- day_number: the number of days since the 500-case mark was reached in the state
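To make the meaning of day_number concrete, here is a minimal sketch of how such a column could be derived from raw cumulative counts. The data frame `raw`, the states "State A" and "State B", and all of the counts are made up for illustration. Starting the count at 0 on the first day at or above 500 cases is an assumption; the actual data set may number the days differently.

```r
# Made-up cumulative case counts (hypothetical states and numbers).
raw <- data.frame(
  date = as.Date(c("2020-03-24", "2020-03-25", "2020-03-26",
                   "2020-03-25", "2020-03-26")),
  state = c("State A", "State A", "State A", "State B", "State B"),
  cases = c(450, 520, 610, 480, 530),
  stringsAsFactors = FALSE
)

# Keep only the days on which a state is at or above the 500-case threshold.
over <- subset(raw, cases >= 500)

# Within each state, count days elapsed since its first day over the threshold.
# ASSUMPTION: the first such day is numbered 0.
over$day_number <- ave(
  as.numeric(over$date),
  over$state,
  FUN = function(d) d - min(d)
)

print(over)
```

The call to `ave()` applies the function separately to the dates of each state, so every state gets its own clock that starts on the day it crossed the threshold.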
The data frame threshold_states comes from public data provided by the New York Times. For more information see:
Problem 1
Write a function called state_cases_compare() that makes line plots of the number of confirmed cases against the number of days since reaching the threshold of 500 or more confirmed cases, for any set of states that the user specifies. The function should take one parameter called states, a character vector of the names of the states that are to figure in the graph. If one of the requested states has not yet experienced 500 cases, then the function should return a message to that effect, and not produce a graph. Examples of use should be as follows:
Here is an example where one of the states has not experienced 500 cases by April 4, 2020:
## Alaska has not yet had 500 cases.
## West Virginia has not yet had 500 cases.
## Try other states!
Hint
geom_line() makes the line plots that you see in the example above. Don’t forget to aesthetically map color to state.
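As a minimal sketch of the hint, the following plots a small made-up data frame with `geom_line()` and maps color to state. The data frame `toy`, its state names and its counts are hypothetical stand-ins, not values from threshold_states.

```r
library(ggplot2)

# Made-up stand-in for threshold_states (hypothetical states and counts).
toy <- data.frame(
  state = rep(c("State A", "State B"), each = 3),
  day_number = rep(0:2, times = 2),
  cases = c(510, 640, 800, 530, 590, 700),
  stringsAsFactors = FALSE
)

# Mapping color to state draws one line per state and adds a legend.
p <- ggplot(toy, aes(x = day_number, y = cases)) +
  geom_line(aes(color = state))

print(p)
```

Without the color mapping, `geom_line()` would join all of the points into a single zig-zag line, because it would have no way to tell the states apart.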
Solution
Here is one approach:
library(ggplot2)  # provides ggplot(), aes(), geom_line() and labs()

state_cases_compare <- function(states) {
  ## first take care of validation:
  graphable_states <- unique(threshold_states$state)
  good_to_go <- TRUE
  for (state in states) {
    if (!(state %in% graphable_states)) {
      cat(state, " has not yet had 500 cases.\n", sep = "")
      good_to_go <- FALSE
    }
  }
  if (!good_to_go) {
    return(cat("Try other states!\n"))
  }
  ## if we made it this far, our input is OK,
  ## now get all the rows for the states to graph:
  df <- subset(threshold_states, state %in% states)
  ## now make the graph:
  ggplot(df, aes(x = day_number, y = cases)) +
    geom_line(aes(color = state)) +
    labs(
      x = "days since reaching 500 cases",
      y = "number of confirmed cases"
    )
}
Problem 2
Part a
Create a new variable for threshold_states called mortality_rate that gives the mortality rate for COVID-19, computed as the percentage of all confirmed cases that have resulted in death by the given date. Then create a new data frame containing all of the rows of threshold_states but only the columns date, state, cases and mortality_rate.
Hint for Part a
If you do it right then the first ten rows of the new data frame will look like this:
| date | state | cases | mortality_rate |
|---|---|---|---|
| 2020-03-26 | Alabama | 538 | 0.5576208 |
| 2020-03-27 | Alabama | 639 | 0.6259781 |
| 2020-03-28 | Alabama | 720 | 0.5555556 |
| 2020-03-29 | Alabama | 830 | 0.6024096 |
| 2020-03-30 | Alabama | 947 | 1.1615628 |
| 2020-03-31 | Alabama | 999 | 1.3013013 |
| 2020-04-01 | Alabama | 1106 | 2.5316456 |
| 2020-04-02 | Alabama | 1270 | 2.5196850 |
| 2020-04-03 | Alabama | 1535 | 2.4755700 |
| 2020-04-04 | Alabama | 1633 | 2.6944274 |
Solution to Part a
Here is one way to do it:
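A minimal sketch of one possible approach, in base R. So that it runs on its own, it builds a three-row stand-in for threshold_states; the case counts are the first three Alabama rows of the hint table, and the death counts (3, 4, 4) are back-computed from the mortality rates shown there. The name `mortality` for the new data frame is an assumption.

```r
# Three-row stand-in for threshold_states. Cases come from the hint table;
# deaths are back-computed from its mortality_rate column.
threshold_states <- data.frame(
  date = as.Date(c("2020-03-26", "2020-03-27", "2020-03-28")),
  state = "Alabama",
  cases = c(538, 639, 720),
  deaths = c(3, 4, 4),
  stringsAsFactors = FALSE
)

# Mortality rate as a percentage of confirmed cases.
threshold_states$mortality_rate <-
  100 * threshold_states$deaths / threshold_states$cases

# Keep every row, but only the four requested columns.
# ASSUMPTION: the name `mortality` is chosen here for illustration.
mortality <- threshold_states[, c("date", "state", "cases", "mortality_rate")]

print(mortality)
```

With the full threshold_states data frame loaded, only the last two statements would be needed.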
Part b
Next make a data frame that ranks the states in order of mortality rate as of April 4, 2020.
Hint for Part b
You will need to select only the rows for April 4, 2020. Also read about order() in Chapter 7.
If you do it right the data frame will look like this:
Solution to Part b
Here is one way to do it:
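A minimal sketch of one possible approach, using `subset()` to pick out the April 4 rows and `order()` to sort them. The data frame `mortality` below is a made-up stand-in for the result of Part a, with hypothetical states and rates. Ranking from highest to lowest mortality rate is an assumption; drop `decreasing = TRUE` to rank in the other direction.

```r
# Made-up stand-in for the Part a data frame (hypothetical states and values).
mortality <- data.frame(
  date = as.Date(c("2020-04-03", "2020-04-04", "2020-04-03",
                   "2020-04-04", "2020-04-04")),
  state = c("State A", "State A", "State B", "State B", "State C"),
  cases = c(900, 1000, 1800, 2000, 500),
  mortality_rate = c(1.0, 1.5, 3.8, 4.0, 0.8),
  stringsAsFactors = FALSE
)

# Keep only the rows for the most recent day in the data.
latest <- subset(mortality, date == as.Date("2020-04-04"))

# order() returns the row positions that sort the rates;
# ASSUMPTION: highest mortality rate first.
ranked <- latest[order(latest$mortality_rate, decreasing = TRUE), ]

# Renumber the rows so that the row names read as ranks.
rownames(ranked) <- NULL

print(ranked)
```

Resetting the row names is optional, but it makes the printed row numbers match each state's position in the ranking.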