https://datahub.io/core/covid-19#resource-time-series-19-covid-combined
I used the time-series-19-covid-combined dataset from the DataHub page linked above.
Look at the structure of the data
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.2.1 v purrr 0.3.3
## v tibble 2.1.3 v dplyr 0.8.3
## v tidyr 1.0.0 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.4.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(lubridate)
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
covid19 <- read_csv("time-series-19-covid-combined_csv.csv")
## Parsed with column specification:
## cols(
## Date = col_character(),
## Country_Region = col_character(),
## Province_State = col_character(),
## Lat = col_double(),
## Long = col_double(),
## Confirmed = col_double(),
## Recovered = col_double(),
## Deaths = col_double()
## )
str(covid19)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 17526 obs. of 8 variables:
## $ Date : chr "1/22/2020" "1/23/2020" "1/24/2020" "1/25/2020" ...
## $ Country_Region: chr "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
## $ Province_State: chr NA NA NA NA ...
## $ Lat : num 33 33 33 33 33 33 33 33 33 33 ...
## $ Long : num 65 65 65 65 65 65 65 65 65 65 ...
## $ Confirmed : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Recovered : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Deaths : num 0 0 0 0 0 0 0 0 0 0 ...
## - attr(*, "spec")=
## .. cols(
## .. Date = col_character(),
## .. Country_Region = col_character(),
## .. Province_State = col_character(),
## .. Lat = col_double(),
## .. Long = col_double(),
## .. Confirmed = col_double(),
## .. Recovered = col_double(),
## .. Deaths = col_double()
## .. )
covid19a <- covid19 %>%
  mutate(date = lubridate::mdy(Date))
str(covid19a)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 17526 obs. of 9 variables:
## $ Date : chr "1/22/2020" "1/23/2020" "1/24/2020" "1/25/2020" ...
## $ Country_Region: chr "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
## $ Province_State: chr NA NA NA NA ...
## $ Lat : num 33 33 33 33 33 33 33 33 33 33 ...
## $ Long : num 65 65 65 65 65 65 65 65 65 65 ...
## $ Confirmed : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Recovered : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Deaths : num 0 0 0 0 0 0 0 0 0 0 ...
## $ date : Date, format: "2020-01-22" "2020-01-23" ...
knitr::kable(head(covid19a))
Date | Country_Region | Province_State | Lat | Long | Confirmed | Recovered | Deaths | date |
---|---|---|---|---|---|---|---|---|
1/22/2020 | Afghanistan | NA | 33 | 65 | 0 | 0 | 0 | 2020-01-22 |
1/23/2020 | Afghanistan | NA | 33 | 65 | 0 | 0 | 0 | 2020-01-23 |
1/24/2020 | Afghanistan | NA | 33 | 65 | 0 | 0 | 0 | 2020-01-24 |
1/25/2020 | Afghanistan | NA | 33 | 65 | 0 | 0 | 0 | 2020-01-25 |
1/26/2020 | Afghanistan | NA | 33 | 65 | 0 | 0 | 0 | 2020-01-26 |
1/27/2020 | Afghanistan | NA | 33 | 65 | 0 | 0 | 0 | 2020-01-27 |
I filtered for the eight countries with the highest number of confirmed cases. Then I plotted the confirmed case counts over time as points, colored by country. Finally, I used ggplotly to make the plot interactive, with mouse-over tooltips.
p1 <- covid19a %>%
  filter(Country_Region %in% c("China", "US", "Italy", "Spain",
                               "Germany", "Iran", "France", "United Kingdom")) %>%
  ggplot(aes(date, Confirmed, color = Country_Region)) +
  geom_point() +
  ggtitle("COVID19 Confirmed Cases") +
  xlab("Date") +
  ylab("Confirmed Cases") +
  theme_minimal()
ggplotly(p1)
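The eight countries above are typed in by hand, and some countries in this dataset are split across several Province_State rows, so each province gets its own point. A minimal sketch of one alternative follows: sum the provinces per country and date, then pick the top countries from the most recent date. It assumes Confirmed is a cumulative count and that summing provinces is appropriate. The toy tibble, its values, and the names toy, by_country, n_top, top_countries and p_top are hypothetical stand-ins, not part of the original analysis.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for covid19a (hypothetical values, same column names).
toy <- tibble::tibble(
  date = as.Date(rep(c("2020-03-01", "2020-03-02"), times = 4)),
  Country_Region = rep(c("China", "China", "Italy", "Iran"), each = 2),
  Province_State = rep(c("Hubei", "Beijing", NA, NA), each = 2),
  Confirmed = c(100, 110, 10, 12, 50, 60, 5, 6)
)

# Sum provinces so each country has one value per date.
by_country <- toy %>%
  group_by(Country_Region, date) %>%
  summarise(Confirmed = sum(Confirmed, na.rm = TRUE)) %>%
  ungroup()

# Rank countries by their count on the most recent date and keep the top n.
n_top <- 2  # the analysis above uses 8
top_countries <- by_country %>%
  filter(date == max(date)) %>%
  arrange(desc(Confirmed)) %>%
  head(n_top) %>%
  pull(Country_Region)

# Same plot layers as above, restricted to the selected countries.
p_top <- by_country %>%
  filter(Country_Region %in% top_countries) %>%
  ggplot(aes(date, Confirmed, color = Country_Region)) +
  geom_point() +
  theme_minimal()
```

With the real data, replacing toy with covid19a and setting n_top to 8 should give a comparable selection, and p_top can be passed to ggplotly in the same way as p1.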