Chapter 8 Project Solutions

The Data

Run the following R command:

Take a look at the data:

Each row represents a state in the United States on a particular day. Only states that had experienced 500 or more cases of COVID-19 by April 4, 2020, the most recent day covered by this data, are included. The variables are:

  • date: the date
  • state: the name of the state
  • fips: a standard geographic code for the state
  • cases: the number of confirmed cases of COVID-19 in the state, up to the given date
  • deaths: the number of deaths in the state due to COVID-19, up to the given date
  • day_number: number of days since the 500-case mark was reached in the state
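To make the variable list concrete, here is a minimal sketch of a data frame with the same six columns. It is not the real threshold_states: it holds only the Alabama rows shown in the hint for Problem 2 Part a, with deaths back-computed from the mortality rates in that table. The name alabama is hypothetical, and the choice to count day_number from 0 on the first day at or above 500 cases is an assumption (the real data may start at 1).

```r
# Toy data frame mirroring the columns of threshold_states (Alabama only).
alabama <- data.frame(
  date   = as.Date("2020-03-26") + 0:9,   # March 26 through April 4, 2020
  state  = "Alabama",
  fips   = "01",                          # FIPS code for Alabama
  cases  = c(538, 639, 720, 830, 947, 999, 1106, 1270, 1535, 1633),
  deaths = c(3, 4, 4, 5, 11, 13, 28, 32, 38, 44),
  stringsAsFactors = FALSE
)

# Assumption: day_number is 0 on the first day with 500 or more cases.
alabama$day_number <- as.integer(alabama$date - min(alabama$date))

str(alabama)
```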

The data frame threshold_states comes from public data provided by the New York Times. For more information see:

https://github.com/nytimes/covid-19-data

Problem 1

Write a function called state_cases_compare() that makes line plots of the number of confirmed cases against the number of days since reaching the threshold of 500 or more confirmed cases, for any set of states that the user specifies. The function should take one parameter called states, a character vector of the names of the states that are to figure in the graph. If one of the requested states has not yet experienced 500 cases, then the function should return a message to that effect, and not produce a graph. Examples of use should be as follows:

Here is an example where some of the requested states have not experienced 500 cases by April 4, 2020:

## Alaska has not yet had 500 cases.
## West Virginia has not yet had 500 cases.
## Try other states!

Hint

geom_line() makes the line plots that you see in the example above. Don’t forget to map the color aesthetic to state.
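One possible shape for state_cases_compare() is sketched below; it is an illustration under stated assumptions, not necessarily the intended solution. So that the block runs on its own, threshold_states is replaced by a toy stand-in holding only the Alabama rows from the hint for Problem 2 Part a (with day_number assumed to start at 0). The sketch treats any requested name that is absent from the data frame as a state that has not reached 500 cases, and it calls ggplot2 functions with the ggplot2:: prefix, so the ggplot2 package must be installed for the plotting branch.

```r
# Toy stand-in for threshold_states (assumption: real data has more states).
threshold_states <- data.frame(
  date       = as.Date("2020-03-26") + 0:9,
  state      = "Alabama",
  cases      = c(538, 639, 720, 830, 947, 999, 1106, 1270, 1535, 1633),
  deaths     = c(3, 4, 4, 5, 11, 13, 28, 32, 38, 44),
  day_number = 0:9,                       # assumed to start at 0
  stringsAsFactors = FALSE
)

state_cases_compare <- function(states) {
  # Requested names that do not appear in the data at all.
  missing_states <- states[!(states %in% threshold_states$state)]
  if (length(missing_states) > 0) {
    for (s in missing_states) {
      cat(s, " has not yet had 500 cases.\n", sep = "")
    }
    cat("Try other states!\n")
    return(invisible(NULL))               # no graph in this case
  }
  # Keep only the rows for the requested states.
  df <- threshold_states[threshold_states$state %in% states, ]
  # One line per state, colored by state.
  ggplot2::ggplot(df, ggplot2::aes(x = day_number, y = cases, color = state)) +
    ggplot2::geom_line() +
    ggplot2::labs(x = "Days since reaching 500 cases", y = "Confirmed cases")
}

# With the toy data, Alaska is absent, so only messages are printed:
state_cases_compare(c("Alabama", "Alaska"))
```

Note that with this approach a misspelled state name is reported in the same way as a state below the threshold; a stricter version could check names against a full list of states first.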

Problem 2

Part a

Create a new variable for threshold_states called mortality_rate that gives the mortality rate for COVID-19, computed as the percentage of all confirmed cases that have resulted in death by the given date. Then create a new data frame containing all of the rows of threshold_states but only the columns date, state, cases and mortality_rate.

Hint for Part a

If you do it right then the first ten rows of the new data frame will look like this:

date        state    cases  mortality_rate
2020-03-26  Alabama    538       0.5576208
2020-03-27  Alabama    639       0.6259781
2020-03-28  Alabama    720       0.5555556
2020-03-29  Alabama    830       0.6024096
2020-03-30  Alabama    947       1.1615628
2020-03-31  Alabama    999       1.3013013
2020-04-01  Alabama   1106       2.5316456
2020-04-02  Alabama   1270       2.5196850
2020-04-03  Alabama   1535       2.4755700
2020-04-04  Alabama   1633       2.6944274
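The computation behind this table can be sketched in base R as follows. This is one possible approach, not necessarily the intended one. The toy threshold_states below contains only the Alabama rows, with deaths back-computed from the rates in the table, and the name mortality for the new data frame is hypothetical.

```r
# Toy stand-in for threshold_states (Alabama rows only).
threshold_states <- data.frame(
  date   = as.Date("2020-03-26") + 0:9,
  state  = "Alabama",
  cases  = c(538, 639, 720, 830, 947, 999, 1106, 1270, 1535, 1633),
  deaths = c(3, 4, 4, 5, 11, 13, 28, 32, 38, 44),
  stringsAsFactors = FALSE
)

# Mortality rate: deaths as a percentage of confirmed cases.
threshold_states$mortality_rate <-
  100 * threshold_states$deaths / threshold_states$cases

# Keep every row, but only the four requested columns.
mortality <- threshold_states[, c("date", "state", "cases", "mortality_rate")]

head(mortality, 10)
```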

Part b

Next make a data frame that ranks the states in order of mortality rate as of April 4, 2020.

Hint for Part b

You will need to select only the rows for April 4, 2020. Also read about order() in Chapter 7.

If you do it right the data frame will look like this:
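The filtering and ordering steps from the hint can be sketched as follows. The Alabama rows come from the table in Part a; "State X" and "State Y" are invented placeholders with made-up numbers, included only so that there is something to rank. The names toy, latest and ranked are hypothetical, and ranking from highest to lowest mortality rate is an assumption, since the problem does not fix the direction.

```r
# Toy data: Alabama rows are from Part a; State X and State Y are invented.
toy <- data.frame(
  date  = as.Date(c("2020-04-03", "2020-04-04", "2020-04-04", "2020-04-04")),
  state = c("Alabama", "Alabama", "State X", "State Y"),
  cases = c(1535, 1633, 2000, 1000),
  mortality_rate = c(2.4755700, 2.6944274, 1.5, 4.0),
  stringsAsFactors = FALSE
)

# Step 1: keep only the rows for April 4, 2020.
latest <- toy[toy$date == as.Date("2020-04-04"), ]

# Step 2: order() returns the row positions sorted by mortality rate;
# decreasing = TRUE puts the highest rate first (an assumption).
ranked <- latest[order(latest$mortality_rate, decreasing = TRUE), ]
rownames(ranked) <- NULL   # renumber rows 1, 2, 3, ...

ranked
```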

Homer S. White