Let’s load the required libraries

library(ggpubr)

## Loading required package: ggplot2

library(mvnormtest)
library(agricolae)
library(gplots)

## 
## Attaching package: 'gplots'

## The following object is masked from 'package:stats':
## 
##     lowess

library(decoder)
library(lubridate)

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

library(covid19.analytics)
library(xts)

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

library(fpp2)

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

## -- Attaching packages ---------------------------------------------- fpp2 2.4 --

## v forecast  8.14     v expsmooth 2.3 
## v fma       2.4

## -- Conflicts ------------------------------------------------- fpp2_conflicts --
## x forecast::gghistogram() masks ggpubr::gghistogram()

library(tseries)

Data

In this lab, we use a dataset of active Covid-19 cases, recorded daily for each city and province.

Data source: https://github.com/dtandev/coronavirus/tree/master/data

Let's download the data from the GitHub repository.

url <- "https://raw.githubusercontent.com/dtandev/coronavirus/master/data/CoronavirusPL%20-%20Timeseries.csv"

covid_pl <- read.csv(url, header = TRUE, sep = ",", encoding = "UTF-8")

Preparing dataset for analysis

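# Keep only the infection records (flag "I")
infected <- covid_pl[covid_pl$Infection.Death.Recovery == "I", ]

# Recode the flag to 1 so that summing it counts records
infected$Infection.Death.Recovery <- 1

# Number of infections per province and day
infected_city <- aggregate(Infection.Death.Recovery ~ Province + Timestamp, data = infected, FUN = sum)

# Mean daily number of infections per province
infected_city_mean <- aggregate(Infection.Death.Recovery ~ Province, data = infected_city, FUN = mean)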
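
To make the two aggregation steps easier to follow, here is a minimal sketch on a tiny made-up data frame. The provinces "A" and "B", the days "d1" and "d2", and the name `toy` are hypothetical; only the column names match the real dataset.

```r
# Hypothetical toy data: one row per reported case.
toy <- data.frame(
  Province = c("A", "A", "A", "B", "B"),
  Timestamp = c("d1", "d1", "d2", "d1", "d2"),
  Infection.Death.Recovery = c("I", "I", "I", "I", "D")
)

# Step 1: keep infections only and recode the flag to 1.
toy_inf <- toy[toy$Infection.Death.Recovery == "I", ]
toy_inf$Infection.Death.Recovery <- 1

# Step 2: count infections per province and day.
toy_daily <- aggregate(Infection.Death.Recovery ~ Province + Timestamp,
                       data = toy_inf, FUN = sum)

# Step 3: average the daily counts per province.
toy_mean <- aggregate(Infection.Death.Recovery ~ Province,
                      data = toy_daily, FUN = mean)
print(toy_mean)
```

Province "B" has no infection on day "d2", so that day does not appear in `toy_daily` at all. The mean is therefore taken only over days with at least one recorded infection, not over all calendar days.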

Drawing box plots

# Draw box plots with ggboxplot() from the ggpubr package (already loaded above, so there is no need to load it again).

ggboxplot(infected_city, x = "Province", y = "Infection.Death.Recovery", 
          color = "Province", ylab = "Number of infected", xlab = "Province")

Śląskie has the highest number of infected people.

Preparing data for infected cases only

## Data being read from JHU/CCSE repository

## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## Reading data from https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv

## Data retrieved on 2021-06-22 22:30:21 || Range of dates on data: 2020-01-22--2021-06-21 | Nbr of records: 278

## --------------------------------------------------------------------------------

## Processing...  POLAND

## Loading required package: pheatmap
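
The code for this step is not shown in the output above. The JHU confirmed-cases time series is cumulative, so one way to obtain daily changes is to difference it. The sketch below illustrates only that idea on made-up numbers; the vector `cumulative` is hypothetical and this is not necessarily how the lab derived its series.

```r
# Hypothetical cumulative confirmed counts for six consecutive days.
cumulative <- c(0, 1, 5, 11, 11, 20)

# Daily change = today's cumulative count minus yesterday's.
daily_new <- diff(cumulative)

# Wrap the result in a time series object for later analysis.
daily_ts <- ts(daily_new)
print(daily_ts)
```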

Plots

New daily infections rose strongly in two periods: first between the third and fourth quarters of 2020, and then from the beginning to the end of the first quarter of 2021.

Stationarity - transformations

## 
##  Augmented Dickey-Fuller Test
## 
## data:  zmiany_ts
## Dickey-Fuller = -2.0659, Lag order = 8, p-value = 0.5504
## alternative hypothesis: stationary

## Warning in adf.test(zmiany_ts_diff): p-value smaller than printed p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  zmiany_ts_diff
## Dickey-Fuller = -5.099, Lag order = 8, p-value = 0.01
## alternative hypothesis: stationary
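
The two outputs above have the form produced by `adf.test()` from the tseries package, applied before and after a transformation. As a rough sketch, assuming the transformation is a first difference (suggested by the name `zmiany_ts_diff`, but not confirmed by the output), the same pattern can be reproduced on a simulated random walk. The series `y` is synthetic and stands in for the real data.

```r
# Simulated random walk: a textbook non-stationary series (not the Covid data).
set.seed(42)
y <- ts(cumsum(rnorm(500)))

# First difference; one observation is lost.
y_diff <- diff(y)

# Run the ADF test only if tseries is installed.
if (requireNamespace("tseries", quietly = TRUE)) {
  print(tseries::adf.test(y))       # H0: the series has a unit root
  print(tseries::adf.test(y_diff))  # a small p-value supports stationarity
}
```

A large p-value, such as the 0.5504 reported for `zmiany_ts`, means the unit-root hypothesis cannot be rejected, while the 0.01 reported for `zmiany_ts_diff` supports stationarity of the differenced series.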

ACF, PACF
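
The plots for this step are not included in the output. As an illustration only, the sketch below computes the ACF and PACF numerically with base R's `acf()` and `pacf()` on a simulated AR(1) series and lists the PACF lags that exceed the approximate 95% white-noise bounds. The series `x` and the AR coefficient 0.7 are assumptions made for the example.

```r
# Simulated AR(1) series (synthetic, not the Covid data).
set.seed(1)
x <- arima.sim(model = list(ar = 0.7), n = 500)

# Compute the correlations without drawing the plots.
acf_vals <- acf(x, lag.max = 20, plot = FALSE)
pacf_vals <- pacf(x, lag.max = 20, plot = FALSE)

# Approximate 95% bounds under the white-noise hypothesis.
bound <- qnorm(0.975) / sqrt(length(x))

# PACF lags outside the bounds hint at the AR order.
significant_pacf_lags <- which(abs(as.numeric(pacf_vals$acf)) > bound)
print(significant_pacf_lags)
```

For a pure AR(p) process the PACF is expected to cut off after lag p, while the ACF decays gradually, which is why the PACF is usually read first when choosing the AR order.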

ARIMA

## Warning: The chosen seasonal unit root test encountered an error when testing for the first difference.
## From stl(): series is not periodic or has less than two periods
## 0 seasonal differences will be used. Consider using a different unit root test.

## Series: zmiany_ts_diff 
## ARIMA(4,0,0) with zero mean 
## 
## Coefficients:
##          ar1      ar2     ar3      ar4
##       0.6703  -0.9705  0.4192  -0.6125
## s.e.  0.0347   0.0416  0.0415   0.0344
## 
## sigma^2 estimated as 4888990:  log likelihood=-4687.23
## AIC=9384.47   AICc=9384.59   BIC=9405.68

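The model proposed by auto.arima() is ARIMA(4,0,0): AR order 4, differencing order 0, MA order 0. It was fitted to zmiany_ts_diff, the series that had already been transformed to be stationary, which is consistent with I = 0.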
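
To give a feel for order selection by information criterion, here is a simplified stand-in that uses base R's `arima()` instead of `auto.arima()`. It fits pure AR(p) models for p = 1 to 4 on a simulated series and keeps the one with the lowest AIC. The real `auto.arima()` searches a wider model space and uses AICc by default, so treat this only as a sketch; the series `x` and its coefficients are made up.

```r
# Simulated stationary AR(2) series (synthetic, not the Covid data).
set.seed(1)
x <- arima.sim(model = list(ar = c(0.5, -0.3)), n = 500)

# Candidate AR orders to compare.
orders <- 1:4

# Fit ARIMA(p,0,0) with zero mean for each p and collect the AIC.
aics <- sapply(orders, function(p) {
  arima(x, order = c(p, 0, 0), include.mean = FALSE, method = "ML")$aic
})

# Keep the order with the smallest AIC.
best_p <- orders[which.min(aics)]
print(data.frame(p = orders, AIC = aics))
print(best_p)
```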

Forecasting

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(4,0,0) with non-zero mean
## Q* = 1386.6, df = 98, p-value < 2.2e-16
## 
## Model df: 5.   Total lags used: 103

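The model forecasts that the daily changes in the number of infected people will increase slowly over the forecast period. Note that the Ljung-Box test above rejects the hypothesis of uncorrelated residuals (p-value < 2.2e-16), so the model leaves some structure in the data unexplained and the forecast should be treated with caution.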
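
The forecasting code is not shown in the output. The following sketch shows the general workflow with base R only: `predict()` for the forecast and `Box.test()` for the Ljung-Box check, rather than the forecast package functions. The simulated series, the AR(2) order, the 14-step horizon and the lag of 10 are all assumptions chosen for the example.

```r
# Simulated stationary AR(2) series (synthetic, not the Covid data).
set.seed(1)
x <- arima.sim(model = list(ar = c(0.5, -0.3)), n = 500)

# Fit the model and forecast 14 steps ahead.
fit <- arima(x, order = c(2, 0, 0), include.mean = FALSE)
fc <- predict(fit, n.ahead = 14)

# Approximate 95% prediction interval from the forecast standard errors.
lower <- fc$pred - 1.96 * fc$se
upper <- fc$pred + 1.96 * fc$se

# Ljung-Box test on the residuals; fitdf = number of estimated AR coefficients.
lb <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 2)
print(lb$p.value)
```

A large p-value here would suggest the residuals look like white noise. A very small one, as in the output above, indicates remaining autocorrelation that the model does not capture.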

Covid19 modeling daily changes for Poland

Karol Flisikowski edited by Marcin Kyć 184678

6/3/2020