Let's install and load the required libraries.
#install.packages("ggpubr")
library(ggpubr)
## Loading required package: ggplot2
#install.packages("mvnormtest")
library(mvnormtest)
#install.packages("agricolae")
library(agricolae)
#install.packages("gplots")
library(gplots)
##
## Attaching package: 'gplots'
## The following object is masked from 'package:stats':
##
## lowess
#install.packages("decoder")
library(decoder)
#install.packages("lubridate")
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
#install.packages("covid19.analytics")
library(covid19.analytics)
#install.packages("xts")
library(xts)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
#install.packages("fpp2")
library(fpp2)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## ── Attaching packages ────────────────────────────────────────────── fpp2 2.4 ──
## ✓ forecast 8.14 ✓ expsmooth 2.3
## ✓ fma 2.4
## ── Conflicts ───────────────────────────────────────────────── fpp2_conflicts ──
## x forecast::gghistogram() masks ggpubr::gghistogram()
In this lab, we use a dataset of active Covid-19 cases, reported daily for each city and province.
Data source: https://github.com/dtandev/coronavirus/tree/master/data
url <- "https://raw.githubusercontent.com/dtandev/coronavirus/master/data/CoronavirusPL%20-%20Timeseries.csv"
covid_pl <- read.csv(url, header = TRUE, sep = ",", encoding = "UTF-8")
# Keep only infection records ("I") and count each one as a single case
infected <- covid_pl[covid_pl$Infection.Death.Recovery == "I", ]
infected$Infection.Death.Recovery <- 1
# Number of infections per province and day, then the daily mean per province
infected_city <- aggregate(Infection.Death.Recovery ~ Province + Timestamp, data = infected, FUN = sum)
infected_city_mean <- aggregate(Infection.Death.Recovery ~ Province, data = infected_city, FUN = mean)
# Draw box plots with ggboxplot() from the ggpubr package (already loaded above, so there is no need to load it again)
ggboxplot(infected_city, x = "Province", y = "Infection.Death.Recovery",
          color = "Province", ylab = "Number of infected", xlab = "Province")

## Data being read from JHU/CCSE repository
## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
## Reading data from https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv
## Data retrieved on 2021-06-22 23:53:14 || Range of dates on data: 2020-01-22--2021-06-21 | Nbr of records: 278
## --------------------------------------------------------------------------------
## Processing... POLAND
## Loading required package: pheatmap
The transformations used:
- BoxCox with auto lambda
- log()
- differencing with lag = 1 and differences = 2
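The transformations listed above can be sketched roughly as follows. This is a minimal illustration, not the original chunk: the series `y` is synthetic, and the order of the steps is an assumption. Note that a Box-Cox transformation with lambda = 0 is the same as taking log(), so the sketch applies only the Box-Cox step before differencing.

```r
library(forecast)  # BoxCox.lambda(), BoxCox()

# Synthetic, strictly positive daily counts (a stand-in, not the real data)
set.seed(1)
y <- ts(round(200 + 100 * sin(seq_len(300) / 25) + rnorm(300, sd = 15)))

# Box-Cox with an automatically selected lambda
lambda <- BoxCox.lambda(y)
y_bc <- BoxCox(y, lambda)

# Differencing with lag = 1 and differences = 2,
# i.e. y_bc[t] - 2 * y_bc[t - 1] + y_bc[t - 2]
y_stat <- diff(y_bc, lag = 1, differences = 2)
```

Second-order differencing shortens the series by two observations, which is worth remembering when aligning the result with the original dates.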
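# Daily series; the first observation is day 23 of 2020 (23 January)
zmiany_ts = ts(zmiany, start = c(2020, 23), frequency = 365)
# Re-wrap with the default index (1, 2, ...), so the x-axis counts days
x = ts(zmiany_ts)
# Second-order differencing only (no Box-Cox, no log)
x = diff(x, differences = 2)
x %>% autoplot() + ylab("Daily infections") + xlab("Day") + ggtitle("(Worse) stationarity data after transformations")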
This transformation uses only diff() with differences = 2.
It is simpler than the previous one, but the resulting series looks less stationary, so I will not use it.
In the next steps, I will use the previous transformations with Box-Cox.
zmiany_ts = ts(zmiany)
zmiany_ts_diff = diff(zmiany_ts, lag = 2)
a = auto.arima(zmiany_ts, d = 2, lambda = "auto", stationary = TRUE, biasadj = TRUE)
I used an auto ARIMA model with d = 2 and an automatically selected Box-Cox lambda (lambda = "auto"). The Box-Cox transformation is intended to keep the back-transformed forecasts non-negative, since the number of new daily infections cannot be negative.
In the 31-day forecast, the number of new daily infections is expected to stay at a similar level. A sudden increase is possible, but according to the forecast it would not reach even half of the level of the second wave (days 400-450).
In the long term (180 days), the number of new daily infections will probably stay low, but the violet areas (the prediction intervals) show that a slight or even rapid increase is also possible. Another "wave" may occur, but it should be smaller than the previous one.
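The 31-day and 180-day forecasts discussed above could be produced along these lines. This is only a sketch under stated assumptions: the series `y` is synthetic rather than the real infection data, the names `fit`, `fc_31` and `fc_180` are made up for illustration, and the model call is simplified to the Box-Cox options only.

```r
library(forecast)  # auto.arima(), forecast(), autoplot()

# Synthetic, strictly positive daily counts (a stand-in, not the real data)
set.seed(42)
y <- ts(round(200 + 100 * sin(seq_len(300) / 25) + rnorm(300, sd = 15)))

# Fit with an automatically selected Box-Cox lambda and bias adjustment
fit <- auto.arima(y, lambda = "auto", biasadj = TRUE)

# Short- and long-horizon forecasts with 80% and 95% prediction intervals
fc_31  <- forecast(fit, h = 31,  level = c(80, 95))
fc_180 <- forecast(fit, h = 180, level = c(80, 95))

# The shaded bands in these plots are the prediction intervals
autoplot(fc_31)
autoplot(fc_180)
```

The bands typically widen as the horizon grows, which is why the 180-day forecast leaves room for both a low level and a new increase.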