library('tidyverse')
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.7 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
UScovid <- read.csv("/Users/kylerhalat-shafer/Desktop/UVA/MSDS/STAT 6021/Day 1 - R Basics/UScovid.csv")
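# Keep the 2021-06-03 snapshot, drop rows with an unknown county,
# remove the fips and date columns, and sort by county, then state.
latest <- UScovid %>%
  filter(date == '2021-06-03', county != 'Unknown') %>%
  select(-c(fips, date)) %>%
  arrange(county, state)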
head(latest)
# Death rate = deaths divided by cases, rounded to two decimal places.
latest <- latest %>%
  mutate(death.rate = round(deaths / cases, 2))
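# death.rate is computed row by row, so grouping by county does not change it;
# this call only prints the grouped result and leaves latest unchanged.
latest %>%
  group_by(county) %>%
  mutate(death.rate = round(deaths / cases, 2))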
head(latest)
# Ten counties with the most cases.
latest %>%
  slice_max(cases, n = 10)
# Ten counties with the most deaths.
latest %>%
  slice_max(deaths, n = 10)
The counties with the highest death rates are very small in terms of population, so each death has a large impact on the fatality rate. Larger counties record many more cases and deaths because their populations are much larger, yet their fatality rates remain lower.
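The effect of a small denominator can be illustrated with a minimal sketch. The data frame `toy` and its three counties ("Tiny", "Small", "Large") are hypothetical and are not taken from UScovid.csv; only the `death.rate` formula matches the one used above.

```r
library(dplyr)

# Hypothetical counts for illustration only; not taken from UScovid.csv.
toy <- data.frame(
  county = c("Tiny", "Small", "Large"),
  cases  = c(10, 200, 500000),
  deaths = c(2, 8, 9000)
)

# Same death rate definition as above: deaths divided by cases, rounded.
toy <- toy %>%
  mutate(death.rate = round(deaths / cases, 2))

# The county with the fewest cases has the highest death rate.
toy %>%
  slice_max(death.rate, n = 1)

# The county with the most cases has the highest death count.
toy %>%
  slice_max(deaths, n = 1)

# Change in the unrounded rate caused by one additional death.
(2 + 1) / 10 - 2 / 10                # "Tiny"
(9000 + 1) / 500000 - 9000 / 500000  # "Large"
```

In this made-up example a single extra death shifts the rate of the smallest county by far more than it shifts the rate of the largest one, which is why a ranking by rate and a ranking by count can surface very different counties.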
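# Ten counties with the highest death rates.
latest %>%
  slice_max(death.rate, n = 10)
# Ten highest death rates among counties with at least 100,000 cases.
latest %>%
  filter(cases >= 100000) %>%
  slice_max(death.rate, n = 10)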
# Rows for Albemarle and Charlottesville city.
latest %>%
  filter(county %in% c('Albemarle', 'Charlottesville city'))