In this project I use R Markdown and Plotly to create a bar chart of the COVID-19 outbreak using data from April 18, 2020. The data come from the 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository maintained by Johns Hopkins CSSE on GitHub. I examine only confirmed cases and the death toll among states within the United States.
Project Prompt: Create a web page presentation using R Markdown that features a plot created with Plotly. Host your webpage on either GitHub Pages, RPubs, or NeoCities. Your webpage must contain the date that you created the document, and it must contain a plot created with Plotly.
Sys.info()
## sysname release version nodename
## "Windows" "10 x64" "build 18363" "DESKTOP-DP7KPRO"
## machine login user effective_user
## "x86-64" "Derek" "Derek" "Derek"
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.6.3
library(plotly)
## Warning: package 'plotly' was built under R version 3.6.3
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(tidyr)
## Warning: package 'tidyr' was built under R version 3.6.2
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.6.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Downloading and reading in the data set.
url <- "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/04-18-2020.csv"
file <- "04-18-2020.csv"
download.file(url = url, destfile = file, method = "curl")
mapdata <- read.csv(file)
head(mapdata)
## ï..FIPS Admin2 Province_State Country_Region Last_Update Lat
## 1 45001 Abbeville South Carolina US 2020-04-18 22:32:47 34.22333
## 2 22001 Acadia Louisiana US 2020-04-18 22:32:47 30.29506
## 3 51001 Accomack Virginia US 2020-04-18 22:32:47 37.76707
## 4 16001 Ada Idaho US 2020-04-18 22:32:47 43.45266
## 5 19001 Adair Iowa US 2020-04-18 22:32:47 41.33076
## 6 21001 Adair Kentucky US 2020-04-18 22:32:47 37.10460
## Long_ Confirmed Deaths Recovered Active Combined_Key
## 1 -82.46171 15 0 0 15 Abbeville, South Carolina, US
## 2 -92.41420 110 7 0 103 Acadia, Louisiana, US
## 3 -75.63235 33 0 0 33 Accomack, Virginia, US
## 4 -116.24155 593 9 0 584 Ada, Idaho, US
## 5 -94.47106 1 0 0 1 Adair, Iowa, US
## 6 -85.28130 47 3 0 44 Adair, Kentucky, US
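Because knitting re-runs every chunk, the download step above fetches the file again each time the page is built. The following is a minimal sketch of one way to skip the download when the file is already on disk. `fetch_once` is a hypothetical helper name, not part of the original analysis, and it assumes the default download method is acceptable in place of `method = "curl"`.

```r
# Hypothetical helper: download `url` to `destfile` only if the file is not already there.
fetch_once <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # mode = "wb" writes the bytes unchanged, avoiding line-ending translation on Windows
    download.file(url = url, destfile = destfile, mode = "wb")
  }
  invisible(destfile)
}

# Usage with the names from this report (commented out because it needs network access):
# fetch_once(url, file)
```

Deleting the local CSV forces a fresh download on the next knit.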
str(mapdata)
## 'data.frame': 3053 obs. of 12 variables:
## $ ï..FIPS : int 45001 22001 51001 16001 19001 21001 29001 40001 8001 16003 ...
## $ Admin2 : Factor w/ 1636 levels "","Abbeville",..: 2 3 4 5 6 6 6 6 7 7 ...
## $ Province_State: Factor w/ 138 levels "","Alabama","Alaska",..: 116 62 129 50 54 60 73 94 20 50 ...
## $ Country_Region: Factor w/ 185 levels "Afghanistan",..: 177 177 177 177 177 177 177 177 177 177 ...
## $ Last_Update : Factor w/ 33 levels "2020-02-23 11:19:02",..: 31 31 31 31 31 31 31 31 31 31 ...
## $ Lat : num 34.2 30.3 37.8 43.5 41.3 ...
## $ Long_ : num -82.5 -92.4 -75.6 -116.2 -94.5 ...
## $ Confirmed : int 15 110 33 593 1 47 12 29 860 1 ...
## $ Deaths : int 0 7 0 9 0 3 0 3 31 0 ...
## $ Recovered : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Active : int 15 103 33 584 1 44 12 26 829 1 ...
## $ Combined_Key : Factor w/ 3053 levels "Abbeville, South Carolina, US",..: 1 2 3 4 5 6 7 8 9 10 ...
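The first column name appears as `ï..FIPS` rather than `FIPS`. That pattern is typical of a UTF-8 byte order mark (BOM) at the start of a file being read as part of the header, though this diagnosis is an assumption about the downloaded CSV. The sketch below builds a small temporary file with a BOM (the two data rows are copied from the `head()` output above) and reads it with `fileEncoding = "UTF-8-BOM"`, which asks R to drop the mark.

```r
# Build a tiny CSV whose first three bytes are the UTF-8 BOM (EF BB BF).
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("FIPS,Deaths\n45001,0\n22001,7\n"), con)
close(con)

# Reading with the BOM-aware encoding keeps the first column name clean.
with_bom_handled <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
names(with_bom_handled)
```

If the assumption holds, the same argument could be added to the `read.csv(file)` call in this report. Nothing later in the analysis uses the FIPS column, so the plot is unaffected either way.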
Now I will subset the data to exclude all rows that do not pertain to the United States.
mapdata_US <- mapdata[mapdata$Country_Region == "US",]
str(mapdata_US)
## 'data.frame': 2791 obs. of 12 variables:
## $ ï..FIPS : int 45001 22001 51001 16001 19001 21001 29001 40001 8001 16003 ...
## $ Admin2 : Factor w/ 1636 levels "","Abbeville",..: 2 3 4 5 6 6 6 6 7 7 ...
## $ Province_State: Factor w/ 138 levels "","Alabama","Alaska",..: 116 62 129 50 54 60 73 94 20 50 ...
## $ Country_Region: Factor w/ 185 levels "Afghanistan",..: 177 177 177 177 177 177 177 177 177 177 ...
## $ Last_Update : Factor w/ 33 levels "2020-02-23 11:19:02",..: 31 31 31 31 31 31 31 31 31 31 ...
## $ Lat : num 34.2 30.3 37.8 43.5 41.3 ...
## $ Long_ : num -82.5 -92.4 -75.6 -116.2 -94.5 ...
## $ Confirmed : int 15 110 33 593 1 47 12 29 860 1 ...
## $ Deaths : int 0 7 0 9 0 3 0 3 31 0 ...
## $ Recovered : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Active : int 15 103 33 584 1 44 12 26 829 1 ...
## $ Combined_Key : Factor w/ 3053 levels "Abbeville, South Carolina, US",..: 1 2 3 4 5 6 7 8 9 10 ...
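One caveat about the bracket filter above: if `Country_Region` ever contained missing values, the comparison would yield `NA` for those rows, and bracket indexing would return them as all-`NA` rows. The sketch below uses a made-up four-row data frame (not the real data) to compare bracket indexing with `subset()`, which drops rows where the condition is `NA`.

```r
# Toy data with one missing country; the values are made up for illustration.
toy <- data.frame(
  Country_Region = c("US", "Canada", NA, "US"),
  Deaths = c(7, 2, 1, 9)
)

# Bracket indexing keeps a row of NAs where the comparison is NA.
by_bracket <- toy[toy$Country_Region == "US", ]

# subset() treats an NA condition as FALSE and drops the row.
by_subset <- subset(toy, Country_Region == "US")

nrow(by_bracket)
nrow(by_subset)
```

The two approaches agree whenever the filter column has no missing values.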
Now I will show a grouped bar chart of confirmed COVID-19 cases and deaths by state.
State <- mapdata_US$Province_State
Confirmed <- mapdata_US$Confirmed
Death <- mapdata_US$Deaths
PlotData <- data.frame(State, Confirmed, Death)
## As it stands, PlotData contains many rows per state. We first need to aggregate these into one row per state, summing the other columns. This can be done using dplyr.
PlotData_Clean <- PlotData %>%
group_by(State) %>%
summarise_all(sum)
summary(PlotData_Clean)
## State Confirmed Death
## Alabama : 1 Min. : 0 Min. : 0.00
## Alaska : 1 1st Qu.: 1151 1st Qu.: 26.75
## Arizona : 1 Median : 2812 Median : 128.50
## Arkansas : 1 Mean : 12624 Mean : 666.62
## California: 1 3rd Qu.: 10536 3rd Qu.: 443.50
## Colorado : 1 Max. :241712 Max. :17671.00
## (Other) :52
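As a cross-check on the aggregation step, the same one-row-per-state sum can be sketched in base R with `aggregate()`. The data frame below is a made-up miniature, not the real data, and `aggregate()` is a swapped-in alternative to the dplyr pipeline used in this report.

```r
# Toy data: several rows per state, with made-up counts.
toy <- data.frame(
  State = c("Idaho", "Idaho", "Iowa", "Iowa", "Iowa"),
  Confirmed = c(593, 1, 1, 4, 10),
  Death = c(9, 0, 0, 1, 0)
)

# Sum Confirmed and Death within each State, giving one row per state.
by_state <- aggregate(cbind(Confirmed, Death) ~ State, data = toy, FUN = sum)
by_state
```

Running both approaches on the real data and comparing the totals would be one way to confirm that the grouping behaves as intended.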
## Examining the State column, we see several entries that are not actually states (cruise ships, territories, and a "Recovered" placeholder) and need to be removed. I will do this now.
non_states <- c("Diamond Princess", "Grand Princess", "Guam", "Northern Mariana Islands", "Puerto Rico", "Recovered", "Virgin Islands")
PlotData_Clean <- PlotData_Clean[!PlotData_Clean$State %in% non_states, ]
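The deny-list above has to be updated by hand if a new non-state entry appears in the data. A possible alternative, sketched here on made-up rows, is an allow-list built from R's built-in `state.name` vector. The original filter leaves the District of Columbia in, so this sketch keeps it too; that choice is an assumption about what the chart should show.

```r
# Toy data mixing real states with non-state entries; the counts are made up.
toy <- data.frame(
  State = c("Alabama", "Diamond Princess", "District of Columbia", "Guam", "Wyoming"),
  Death = c(1, 2, 3, 4, 5)
)

# state.name ships with R and holds the 50 state names; DC is added by hand.
keep <- c(state.name, "District of Columbia")
toy_states <- toy[toy$State %in% keep, ]
toy_states
```

An allow-list fails safe: an unexpected entry is dropped rather than plotted as if it were a state.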
## R still holds on to the unused factor levels in the State variable, even though we no longer use them. Left in place, they would show up in the plot as empty categories. We can remove them with the droplevels function.
PlotData_Clean <- droplevels(PlotData_Clean)
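The effect of `droplevels` is easiest to see on a small factor. This sketch uses a three-value factor rather than the real State column.

```r
# A small factor standing in for the State column.
state <- factor(c("Alabama", "Guam", "Wyoming"))

# Filtering out a value removes the element but not its level.
kept <- state[state != "Guam"]
levels(kept)

# droplevels() discards levels that no longer occur in the data.
levels(droplevels(kept))
```

The first call to `levels()` still lists "Guam"; the second lists only the two values that remain.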
## Now the data is ready for plotting.
plot <- plot_ly(PlotData_Clean, x = ~State, y = ~Confirmed, type = 'bar', name = 'Confirmed Cases')
plot <- plot %>% add_trace(y = ~Death, name = 'Deaths')
plot <- plot %>% layout(yaxis = list(title = 'Count'), barmode = 'group')
plot
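With states in alphabetical order, it is hard to see at a glance which ones have the most cases. One option, sketched below on a made-up three-row data frame, is to reorder the factor levels by confirmed count before plotting; to my understanding Plotly's R interface follows factor level order on a categorical axis, but treat that as an assumption to verify. The placeholder names `A`, `B`, `C` and the object `fig` are hypothetical.

```r
# Toy data with placeholder state names and made-up counts.
toy <- data.frame(
  State = factor(c("A", "B", "C")),
  Confirmed = c(10, 30, 20),
  Death = c(1, 3, 2)
)

# Reorder the levels so the state with the most confirmed cases comes first.
toy$State <- reorder(toy$State, -toy$Confirmed)
levels(toy$State)

# Build the same grouped bar chart as above, only if plotly is installed.
if (requireNamespace("plotly", quietly = TRUE)) {
  fig <- plotly::plot_ly(toy, x = ~State, y = ~Confirmed, type = "bar", name = "Confirmed Cases")
  fig <- plotly::add_trace(fig, y = ~Death, name = "Deaths")
  fig <- plotly::layout(fig, yaxis = list(title = "Count"), barmode = "group")
}
```

Applied to the real data, the same `reorder()` call on `PlotData_Clean$State` would go just before the `plot_ly()` call.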