Let’s load the libraries
library(ggpubr)
## Loading required package: ggplot2
library(mvnormtest)
library(agricolae)
library(gplots)
##
## Attaching package: 'gplots'
## The following object is masked from 'package:stats':
##
## lowess
library(decoder)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(covid19.analytics)
library(xts)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(fpp2)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## -- Attaching packages ---------------------------------------------- fpp2 2.4 --
## v forecast 8.14 v expsmooth 2.3
## v fma 2.4
## -- Conflicts ------------------------------------------------- fpp2_conflicts --
## x forecast::gghistogram() masks ggpubr::gghistogram()
library(tseries)
In this lab, we use a dataset of active COVID-19 cases, reported daily for each city and province.
Data source: https://github.com/dtandev/coronavirus/tree/master/data
url <- "https://raw.githubusercontent.com/dtandev/coronavirus/master/data/CoronavirusPL%20-%20Timeseries.csv"
covid_pl <- read.csv(url, header = TRUE, sep = ",", encoding = "UTF-8")
# Keep only infection records and count each one as a single case
infected <- covid_pl[covid_pl$Infection.Death.Recovery == "I", ]
infected$Infection.Death.Recovery <- 1
# Daily number of infections per province
infected_city <- aggregate(Infection.Death.Recovery ~ Province + Timestamp, data = infected, FUN = sum)
# Mean daily number of infections per province
infected_city_mean <- aggregate(Infection.Death.Recovery ~ Province, data = infected_city, FUN = mean)
# Draw box plots
# Use the ggboxplot() function from the ggpubr package; it is already loaded above, so there is no need to load it again.
ggboxplot(infected_city, x = "Province", y = "Infection.Death.Recovery",
          color = "Province", ylab = "Number of infected", xlab = "Province")
The largest number of infected people is in the Śląskie province.
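The aggregation steps above can be tried on a small made-up data frame. The following is a minimal sketch: the `toy` data and its values are invented for illustration, and only the column names mirror the real dataset.

```r
# Toy records: one row per reported case (values are made up for illustration)
toy <- data.frame(
  Province  = c("A", "A", "A", "B", "B", "A"),
  Timestamp = c("01-03-2020", "01-03-2020", "02-03-2020",
                "01-03-2020", "02-03-2020", "02-03-2020"),
  Infection.Death.Recovery = c("I", "I", "I", "I", "D", "I")
)

# Keep infections only and count each row as one case
toy_inf <- toy[toy$Infection.Death.Recovery == "I", ]
toy_inf$Infection.Death.Recovery <- 1

# Cases per province per day, then the mean daily count per province
per_day <- aggregate(Infection.Death.Recovery ~ Province + Timestamp,
                     data = toy_inf, FUN = sum)
per_prov <- aggregate(Infection.Death.Recovery ~ Province,
                      data = per_day, FUN = mean)

# Province with the highest mean daily count
top_province <- as.character(
  per_prov$Province[which.max(per_prov$Infection.Death.Recovery)]
)
print(per_prov)
print(top_province)
```

The same `which.max()` lookup on `infected_city_mean` would give the province with the highest mean daily count in the real data.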
## Data being read from JHU/CCSE repository
## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
## Reading data from https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv
## Data retrieved on 2021-06-22 22:30:21 || Range of dates on data: 2020-01-22--2021-06-21 | Nbr of records: 278
## --------------------------------------------------------------------------------
## Processing... POLAND
## Loading required package: pheatmap
There were two major periods in which daily new infections were rising. The first ran from the third to the fourth quarter of 2020, and the second lasted from the beginning to the end of the first quarter of 2021.
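The code that produced the output above is not shown. As a hedged sketch of the underlying idea: the JHU confirmed-cases time series holds cumulative counts, so daily new infections are the first differences of that series. The `cumulative` vector and the dates below are made-up values, not real data.

```r
# Hypothetical cumulative confirmed counts for seven consecutive days
cumulative <- c(100, 130, 170, 250, 400, 420, 500)
dates <- seq(as.Date("2020-10-01"), by = "day", length.out = length(cumulative))

# Daily new infections are the differences between consecutive days
daily_new <- diff(cumulative)

# Pair each daily value with the date it was reported on
daily <- data.frame(date = dates[-1], new_cases = daily_new)
print(daily)

# Day with the largest increase
peak_day <- daily$date[which.max(daily$new_cases)]
print(peak_day)
```

Plotting `new_cases` against `date` for the real series is one way to see the two periods of growth described above.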
##
## Augmented Dickey-Fuller Test
##
## data: zmiany_ts
## Dickey-Fuller = -2.0659, Lag order = 8, p-value = 0.5504
## alternative hypothesis: stationary
## Warning in adf.test(zmiany_ts_diff): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: zmiany_ts_diff
## Dickey-Fuller = -5.099, Lag order = 8, p-value = 0.01
## alternative hypothesis: stationary
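The two Augmented Dickey-Fuller tests above follow a common pattern: test the series, and if the unit-root null hypothesis is not rejected, difference the series and test again. The sketch below reproduces that pattern on a simulated random walk, since the construction of `zmiany_ts` is not shown. The series `walk` is an assumption made purely for illustration, and `adf.test()` from the tseries package is called only if that package is installed.

```r
set.seed(42)

# A random walk is non-stationary by construction
walk <- cumsum(rnorm(500))

# First differences of a random walk are white noise, which is stationary
walk_diff <- diff(walk)

if (requireNamespace("tseries", quietly = TRUE)) {
  # Null hypothesis: the series has a unit root (is non-stationary)
  print(tseries::adf.test(walk))
  # After differencing, the test may warn that the true p-value
  # is smaller than the printed one, as in the output above
  print(tseries::adf.test(walk_diff))
}
```

A large p-value for the original series and a small one for the differenced series is the usual signal that one round of differencing is enough.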
## Warning: The chosen seasonal unit root test encountered an error when testing for the first difference.
## From stl(): series is not periodic or has less than two periods
## 0 seasonal differences will be used. Consider using a different unit root test.
## Series: zmiany_ts_diff
## ARIMA(4,0,0) with zero mean
##
## Coefficients:
## ar1 ar2 ar3 ar4
## 0.6703 -0.9705 0.4192 -0.6125
## s.e. 0.0347 0.0416 0.0415 0.0344
##
## sigma^2 estimated as 4888990: log likelihood=-4687.23
## AIC=9384.47 AICc=9384.59 BIC=9405.68
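The model proposed by the auto.arima() function for zmiany_ts_diff is ARIMA(4,0,0): autoregressive order p = 4, differencing order d = 0, and moving-average order q = 0.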
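To illustrate what an ARIMA(p,0,0) fit estimates, the sketch below simulates an autoregressive series with known coefficients and recovers them with base R's `arima()`. The AR(2) coefficients and the series `x` are made up for the example. Note that `auto.arima()` additionally searches over candidate orders, which this sketch does not do.

```r
set.seed(1)

# Simulate a stationary AR(2) process with known coefficients (illustrative values)
true_ar <- c(0.5, -0.3)
x <- arima.sim(model = list(ar = true_ar), n = 500)

# Fit ARIMA(2,0,0) with zero mean, i.e. a pure autoregressive model
fit <- arima(x, order = c(2, 0, 0), include.mean = FALSE)
print(fit)

# Estimated coefficients, to compare with true_ar
est <- coef(fit)
print(est)
```

The printed summary has the same layout as the output above: coefficients, their standard errors, the estimated sigma^2, the log likelihood, and the AIC.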
##
## Ljung-Box test
##
## data: Residuals from ARIMA(4,0,0) with non-zero mean
## Q* = 1386.6, df = 98, p-value < 2.2e-16
##
## Model df: 5. Total lags used: 103
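The Ljung-Box test checks whether a series, here the model residuals, still contains autocorrelation; a very small p-value, as in the output above, indicates that it does. The sketch below applies base R's `Box.test()` to two simulated series, both made up for illustration: white noise and a strongly autocorrelated AR(1) series.

```r
set.seed(7)

# White noise: no autocorrelation by construction
noise <- rnorm(500)

# AR(1) series with coefficient 0.8: strong autocorrelation
ar1 <- arima.sim(model = list(ar = 0.8), n = 500)

# Null hypothesis: the first 10 autocorrelations are jointly zero
lb_noise <- Box.test(noise, lag = 10, type = "Ljung-Box")
lb_ar1 <- Box.test(ar1, lag = 10, type = "Ljung-Box")

print(lb_noise)
print(lb_ar1)
```

When the test is applied to residuals from a fitted model, the `fitdf` argument of `Box.test()` adjusts the degrees of freedom for the number of estimated parameters.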
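The model forecasts that the changes in the number of infected people will increase slowly over the forecast period. However, the Ljung-Box test above rejects the hypothesis of uncorrelated residuals (p-value < 2.2e-16), so the model does not capture all the structure in the data and the forecast should be interpreted with caution.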
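Because the model is fitted to a differenced series, its forecasts are forecasts of changes; levels are obtained by cumulatively adding those changes to the last observed value. A minimal sketch with base R follows. The simulated series `level`, the AR(1) order, and the 14-step horizon are assumptions for illustration, not the lab's actual data or settings.

```r
set.seed(3)

# Simulated level series whose first differences follow an AR(1) process
level <- cumsum(as.numeric(arima.sim(model = list(ar = 0.6), n = 300)))
changes <- diff(level)

# Fit an AR(1) model to the changes
fit <- arima(changes, order = c(1, 0, 0), include.mean = FALSE)

# Forecast the changes 14 steps ahead; $pred holds point forecasts, $se standard errors
fc <- predict(fit, n.ahead = 14)

# Convert forecast changes back to levels
level_fc <- tail(level, 1) + cumsum(as.numeric(fc$pred))

print(fc$pred)
print(level_fc)
```

The standard errors in `fc$se` can be used to build approximate prediction intervals around the forecast changes.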