Abstract

The primary objective of this analysis is to identify whether public health policies have a significant and varying impact on the number of positive Covid-19 cases using data obtained from various sources. Analysis of variance (ANOVA) is used to assess the relationship between Covid-19 and a comprehensive set of factors to determine the effect of public health policies on the number of positive Covid-19 cases while taking into account the effects of different regions. For this method, the number of global Covid-19 cases is used as the response variable while categorical region and policy factors are used as predictor variables. Based on the results obtained, all five public health policies (face covering, vaccination, internal movement, stay-at-home, and international travel regulations) are significantly useful in determining the trajectory of Covid-19 transmission globally. Additionally, the interactions between these factors are equally effective. The identified association between the number of positive Covid-19 cases and the significant factors observed in this analysis may be an indirect association due to the fact that successful transmission of the Covid-19 virus may depend on other entities excluded from this analysis that may be used to define the significant relationships observed.

Introduction

Despite the technological innovations in our civilization today, a significant portion of the entire world’s population suffered from the unexpectedly devastating effects of the Covid-19 virus. Various government interventions and public health regulations were implemented in order to suppress the ongoing and rapid spread of Covid-19 virus and to decrease the mortality rate associated with it. As the spread of Covid-19 virus gradually declines around the world and sufficient amount of data have been collected, it is important to improve our understanding of the effectiveness of the public health policies enforced that may have significant influence on the transmission of Covid-19 virus. Experts often consider the interplay of various public health regulations to slow down the spread of a virus. However, identifying the specific policies that are correlated with virus transmission is often a challenging and complex task to perform. Often times results vary between different research studies. The results obtained from this analysis may slightly contribute to the determination of important measures that need to be implemented during a pandemic. The primary goal of this analysis is to investigate the association between the number of detected Covid-19 cases and a diverse set of public health policies. The observations obtained from this analysis can be used for hypothesis generation and further investigations in the future since this project only focuses on the limited numbers of factors and doesn’t include other covariates that might be correlated with the features identified as statistically significant. The data sets used for this analysis were collected from two sources, World Health Organization (WHO) and Our World in Data (OWID). These data sets contain the features country, region, new Covid-19 cases, date, new cases, cumulative cases, and public health policies including vaccination, face covering, internal movement, international travel, and stay-at-home order. Possible correlations between these factors are expected and appropriate methods are used to verify them. Based on the previous studies done on this subject, it can be hypothesized that a positive correlation between Covid-19 transmission and the public health policies exists. This analysis project is primarily focused on answering the following questions.

Primary Question of Interest:

Are public health policies useful in determining the number of positive Covid-19 cases?

Secondary Questions of Interest:

How effective are public health policies?
Which of these public health policies are most and least effective?

Background

Since Covid-19 virus can be transmitted from one individual to another regardless of gender or health and most public health policies are focused on limiting social contact, it is important to identify whether the policies implemented by public health experts are effective or not. According to a study conducted by BMC Public Health, public health measures are found to be effective in decreasing Covid-19 transmission (Ayouni et al., 2021). One of the public health policy included in this investigation is face covering. It has been proven that face masks are effective in filtering out microscopic particles. However, specific studies that focus on the effectiveness of wearing a face mask in decreasing the transmission of Covid-19 has not been extensively explored. One research study recently revealed that the use of facial coverings indoor has a positive effect on the decreasing the number of individuals testing positive for Covid-19 (Andrejko et al., 2022). Some of the public health regulations are mainly focused on reducing social contact. These regulations include stay-at-home orders, reduction of internal mobility, and international travel restrictions. Studies have shown that minimizing social contact by issuing stay-at-home orders is associated with decline in successful Covid-19 transmission (Fowler et al., 2021). Similarly, travel restrictions have been proven to mitigate the rapid spread of Covid-19 virus (Kwok et al., 2021). If stay-at-home orders are placed and international or domestic travel is restricted, then internal movement is also expected to decline. Therefore, the internal movement restriction should have similar effect on the spread of Covid-19 virus.

This project explores the data collected from two sources in order to answer the questions of interest, which are the primary focus of this analysis. Since any individual can be affected by the Covid-19 virus, this project targets the general population worldwide who are susceptible to contracting the Covid-19 virus regardless of existing health conditions. The total number of individuals all over the world in various countries that tested positive for Covid-19 from January 2020 to January 2022 is used as the response variable in the data modeling section. The data set that includes valuable information about the daily recorded new Covid-19 cases from 237 countries around the world is gathered from World Health Organization (WHO). The categorical covariates chosen are a diverse set of public health policies. The data containing public health policies are collected from Our World in Data (OWID). Specific information about the sources are provided in the Acknowledgement section at the end of this report. The final cleaned and aggregated data set that excludes missing records are displayed in the Data section below.

Data

The data used for this analysis was collected from various sources mentioned in the Acknowledgement section of this report.The supplementary data sets were collected from OWID, which contain the features that are useful for answering the primary and secondary questions of interest. The final aggregated data displayed in Table 3 below specifically used for this analysis project are aggregated by month for a total of 180 countries from 2020 to 2022. This data set contains variables that are defined in the tables attached below (Tables 1 and 2). The maximum policy level for each month is retained and used for this analysis since policy levels do not change daily. Simplifying the data by aggregating it into a monthly record also contributes to the efficiency when analyzing this data to answer the questions of interest. A lagged variable for total number of Covid-19 cases per month is also generated in order to consider the fact that changes in policies implemented during a specific time may not yield any significant results immediately.

Table 1: Variable Definitions

Variable	Definition	Levels (Values)
Country	Includes 180 countries worldwide with record
New_cases	Total number of new positive Covid-19 cases from 1/3/20 through 2/16/22
Face_cover	Policies on the use of face coverings outside-of-the-home	5 (0,1,2,3, and 4)
Vaccination	Policies on vaccination availability for different groups	6 (0,1,2,3,4, and 5)
Internal_movement	policies on restrictions on internal movement/travel between regions and cities	3 (0,1, and 2)
Stay_home	Policies on stay-at-home requirements or household lockdowns	4 (0,1,2, and 3)
International_travel	Policies on restrictions on international travel controls	5 (0,1,2,3, and 4)

Table 2: Level Definitions

table2 <- data.frame(Levels = c(0,1,2,3,4,5), Face_cover = c('No policy',
'Recommended',
'Required in some specified shared/public spaces outside the home with other people present, or some situations when social distancing not possible',
'Required in all shared or public spaces outside the home with other people present or all situations when social distancing not possible',
'Required outside the home at all times regardless of location or presence of other people', '--'), Vaccination = c('No availability',
'Availability key workers or clinically vulnerable groups or elderly groups',
'Availability for two of the following: key workers, clinically vulnerable groups, or elderly groups',
'Availability for key workers, clinically vulnerable groups, and elderly groups',
'Availability for all three plus partial additional availability (select broad groups/ages)',
'Universal availability'), Internal_movement= c('No measures', 'Recommended movement restriction', 'Restrict movement', '--', '--', '--'), Stay_home = c('No measures',
'Recommended not to leave the house',
'Required to not leave the house with exceptions for daily exercise, grocery shopping, and ‘essential’ trips',
'Required to not leave the house with minimal exceptions (e.g. allowed to leave only once every few days, or only one person can leave at a time, etc.', '--', '--'), International_travel = c('No measures', 'Screening', 'Quarantine from high-risk regions', 'Ban on high-risk regions', 'Total border closure', '--'))
library(kableExtra)
table2%>%kable()

Levels	Face_cover	Vaccination	Internal_movement	Stay_home	International_travel
0	No policy	No availability	No measures	No measures	No measures
1	Recommended	Availability key workers or clinically vulnerable groups or elderly groups	Recommended movement restriction	Recommended not to leave the house	Screening
2	Required in some specified shared/public spaces outside the home with other people present, or some situations when social distancing not possible	Availability for two of the following: key workers, clinically vulnerable groups, or elderly groups	Restrict movement	Required to not leave the house with exceptions for daily exercise, grocery shopping, and ‘essential’ trips	Quarantine from high-risk regions
3	Required in all shared or public spaces outside the home with other people present or all situations when social distancing not possible	Availability for key workers, clinically vulnerable groups, and elderly groups	–	Required to not leave the house with minimal exceptions (e.g. allowed to leave only once every few days, or only one person can leave at a time, etc.	Ban on high-risk regions
4	Required outside the home at all times regardless of location or presence of other people	Availability for all three plus partial additional availability (select broad groups/ages)	–	–	Total border closure
5	–	Universal availability	–	–	–

#load WHO data
who_data <-read.csv("C:/Users/kayan/UCD/STA141A/141Final-Project/Covid-19-141A/Data/covid/WHO-COVID-19-global-data.csv")
who_data <- as.data.frame(who_data)
colnames(who_data)[1] <- "Date" 
who_data <- subset(who_data, select = -c(Country_code,New_deaths, Cumulative_deaths))
who_data$Date  <- as.Date(who_data$Date, format = "%m/%d/%Y")
# who_data$Date  <- as.Date(who_data$Date , format = "%y/%m/%d")
# who_data$Date  <- as.character(who_data$Date)
who_data$Year <- strftime(who_data$Date, "%Y")    # Create year column
who_data$Month <- strftime(who_data$Date, "%m") 
library(zoo)
who_data$MonthYear <- as.yearmon(paste(who_data$Year, who_data$Month), "%Y %m")
who_data <- subset(who_data[, c(1,8,6,7,2,3,4,5)])
# sapply(who_data, class)
# who_data$Country <- tolower(who_data$Country)

#load face covering data
face_data <-read.csv("C:/Users/kayan/UCD/STA141A/141Final-Project/Data/policies/face-covering-policies-covid.csv", header=T)
face_data <- as.data.frame(face_data)
colnames(face_data)[1] <- "Country"
colnames(face_data)[3] <- "Date"
colnames(face_data)[4] <- "Face_cover"
face_data$Date  <- as.Date(face_data$Date)
# face_data$face_cover  <- as.factor(face_data$face_cover)
sapply(face_data, class)

Country        Code        Date  Face_cover

“character” “character” “Date” “integer”

#load vaccination policy data
vacc_data <-read.csv("C:/Users/kayan/UCD/STA141A/141Final-Project/Data/policies/covid-vaccination-policy.csv", header=T)
vacc_data <- as.data.frame(vacc_data)
colnames(vacc_data)[1] <- "Country"
colnames(vacc_data)[3] <- "Date"
colnames(vacc_data)[4] <- "Vaccination"
vacc_data$Date  <- as.Date(vacc_data$Date)
# vacc_data$vaccination_policy  <- as.factor(vacc_data$vaccination_policy)
sapply(vacc_data, class)

Country        Code        Date Vaccination

“character” “character” “Date” “integer”

#load internal movement policy data
internal_data <-read.csv("C:/Users/kayan/UCD/STA141A/141Final-Project/Data/policies/internal-movement-covid.csv", header=T)
internal_data <- as.data.frame(internal_data)
colnames(internal_data)[1] <- "Country"
colnames(internal_data)[3] <- "Date"
colnames(internal_data)[4] <- "Internal_movement"
internal_data$Date  <- as.Date(internal_data$Date)
# internal_data$restrictions_internal_movements  <- as.factor(internal_data$restrictions_internal_movements)
sapply(internal_data, class)

      Country              Code              Date Internal_movement 
  "character"       "character"            "Date"         "integer"

#load stay at home restrictions data
stayhome_data <-read.csv("C:/Users/kayan/UCD/STA141A/141Final-Project/Data/policies/stay-at-home-covid.csv", header=T)
stayhome_data <- as.data.frame(stayhome_data)
colnames(stayhome_data)[1] <- "Country"
colnames(stayhome_data)[3] <- "Date"
colnames(stayhome_data)[4] <- "Stay_home"
stayhome_data$Date  <- as.Date(stayhome_data$Date)
# stayhome_data$stay_home_requirements  <- as.factor(stayhome_data$stay_home_requirements)
sapply(stayhome_data, class)

Country        Code        Date   Stay_home

“character” “character” “Date” “integer”

#load international travel restrictions data
inttravel_data <-read.csv("C:/Users/kayan/UCD/STA141A/141Final-Project/Data/policies/international-travel-covid.csv", header=T)
inttravel_data <- as.data.frame(inttravel_data)
colnames(inttravel_data)[1] <- "Country"
colnames(inttravel_data)[3] <- "Date"
colnames(inttravel_data)[4] <- "International_travel"
inttravel_data$Date  <- as.Date(inttravel_data$Date)
# inttravel_data$international_travel_controls  <- as.factor(inttravel_data$international_travel_controls)
sapply(inttravel_data, class)

         Country                 Code                 Date 
     "character"          "character"               "Date"

International_travel “integer”

#inner join three policy data
library(dplyr)
policy1 <- inner_join(face_data, vacc_data, by = c("Country", "Code", "Date"))
policy1 <- inner_join(policy1, internal_data, by = c("Country", "Code", "Date"))
policy1 <- inner_join(policy1, stayhome_data, by = c("Country", "Code", "Date"))
policy_data <- inner_join(policy1, inttravel_data, by = c("Country", "Code", "Date"))

#inner join policy data and who data
library(dplyr)

df <- who_data %>% anti_join(policy_data,by = 'Country')
df <-filter(df, Date == "2022-02-01")


policy_data$Country[policy_data$Country=="Cape Verde"]<-"Cabo Verde"
policy_data$Country[policy_data$Country=="Faeroe Islands"]<-"Faroe Islands"
policy_data$Country[policy_data$Country=="Timor"]<-"Timor-Leste"

who_data$Country[who_data$Country=="Bolivia (Plurinational State of)"]<-"Bolivia"
who_data$Country[who_data$Country=="Brunei Darussalam"]<-"Brunei"
who_data$Country[who_data$Country=="Republic of Korea"]<-"South Korea"
who_data$Country[who_data$Country=="Republic of Moldova"]<-"Moldova"
who_data$Country[who_data$Country=="The United Kingdom"]<-"United Kingdom"
who_data$Country[who_data$Country=="United Republic of Tanzania"]<-"Tanzania"
who_data$Country[who_data$Country=="United States of America"]<-"United States"
who_data$Country[who_data$Country=="Viet Nam"]<-"Vietnam"
who_data$Country[who_data$Country=="Russian Federation"]<-"Russia"
who_data$Country[who_data$Country=="Iran (Islamic Republic of)"]<-"Iran"
who_data$Country[who_data$Country=="Lao People's Democratic Republic"]<-"Laos"
who_data$Country[who_data$Country=="Micronesia (Federated States of)"]<-"Micronesia"
# who_agg$Country[who_agg$Country=="Timor-Leste"]<-"Timor"
who_data$Country[who_data$Country=="Venezuela (Bolivarian Republic of)"]<-"Venezuela"
who_data$Country[who_data$Country=="Democratic Republic of the Congo"]<-"Democratic Republic of Congo"
who_data$Country[who_data$Country=="Kosovo[1]"]<-"Kosovo"
who_data$Country[who_data$Country=="Syrian Arab Republic"]<-"Syria"


data_daily <- inner_join(who_data, policy_data, by = c("Country", "Date"))

colSums(is.na(data_daily))

##                 Date            MonthYear                 Year 
##                    0                    0                    0 
##                Month              Country           WHO_region 
##                    0                    0                    0 
##            New_cases     Cumulative_cases                 Code 
##                    0                    0                    0 
##           Face_cover          Vaccination    Internal_movement 
##                    0                    0                    0 
##            Stay_home International_travel 
##                    0                    0

# data_daily$WHO_region  <- as.factor(data_daily$WHO_region)
sapply(data_daily, class)

##                 Date            MonthYear                 Year 
##               "Date"            "yearmon"          "character" 
##                Month              Country           WHO_region 
##          "character"          "character"          "character" 
##            New_cases     Cumulative_cases                 Code 
##            "integer"            "integer"          "character" 
##           Face_cover          Vaccination    Internal_movement 
##            "integer"            "integer"            "integer" 
##            Stay_home International_travel 
##            "integer"            "integer"

# aggregate data_daily by month
library(dplyr)

data_monthly1<- data_daily %>% group_by(Country, Code, WHO_region, Year, Month, MonthYear) %>%
  dplyr::summarize(New_cases = sum(New_cases, na.rm = TRUE),Face_cover = max(Face_cover,na.rm = TRUE), Vaccination = max(Vaccination,na.rm = TRUE), Internal_movement=max(Internal_movement,na.rm = TRUE), Stay_home= max(Stay_home,na.rm = TRUE),  International_travel=      max(International_travel,na.rm = TRUE), .groups = "keep") %>%
  as.data.frame()

data_monthly1[,8:12] <- lapply(data_monthly1[,8:12], as.factor)
# sapply(data_monthly1, class)



#generate lagged cases column
data_monthly <-data_monthly1 %>%  group_by(Country) %>%
    mutate(Lag_cases = lag(New_cases))
data_monthly$Lag_cases[is.na(data_monthly$Lag_cases)] <- 0
data_monthly <-data_monthly [,c(1:7,13,8:12)]

#write.csv(who_data,"../Data/aggregated-data/data-monthly.csv", row.names = FALSE)

Table 3: Monthly Data

#View final data
library(DT)
d<-data_monthly[,-6]
datatable(d, caption = htmltools::tags$caption(
                  style = 'caption-side: bottom; text-align: center;',
                  'Table 3: ', htmltools::em('Monthly Data'), rownames = FALSE,filter="top", options = list(pageLength = 5, autoWidth = TRUE, scrollX=T, columnDefs = list(list(width = '50px', targets = "_all")))))

# aggregate data_monthly by year
d<- data_monthly
d[,9:13] <- sapply(d[,9:13], as.integer)
sapply(d, class)

         Country                 Code           WHO_region 
     "character"          "character"          "character" 
            Year                Month            MonthYear 
     "character"          "character"            "yearmon" 
       New_cases            Lag_cases           Face_cover 
       "integer"            "numeric"            "integer" 
     Vaccination    Internal_movement            Stay_home 
       "integer"            "integer"            "integer"

International_travel “integer”

data_yearly<- d %>% 
  group_by(Country, Code, WHO_region, Year) %>%
  dplyr::summarize(New_cases = sum(New_cases, na.rm = TRUE),Face_cover = max(Face_cover,na.rm = TRUE), Vaccination = max(Vaccination,na.rm = TRUE), Internal_movement=max(Internal_movement,na.rm = TRUE), Stay_home= max(Stay_home,na.rm = TRUE),  International_travel=      max(International_travel,na.rm = TRUE), .groups = "keep") %>%
  as.data.frame()

#yearly data
data_2021 <- filter(data_yearly, Year=="2021")
data_2020 <- filter(data_yearly, Year=="2020")

#data with aggregated covid-cases
data_2020_2022 <- data_yearly %>% 
  group_by(Country, Code, WHO_region) %>%
  dplyr::summarize(New_cases = sum(New_cases, na.rm = TRUE),Face_cover = max(Face_cover,na.rm = TRUE), Vaccination = max(Vaccination,na.rm = TRUE), Internal_movement=max(Internal_movement,na.rm = TRUE), Stay_home= max(Stay_home,na.rm = TRUE),  International_travel=      max(International_travel,na.rm = TRUE), .groups = "keep") %>%
  as.data.frame()

data_2020_2022$Year <- "2020-2022"
data_2020_2022 <- data_2020_2022[, c(1,2,3,10,4,5,6,7,8,9)]

#data with aggregated covid-cases grouped by region
data_regional <- data_2020_2022 %>% 
  group_by(WHO_region) %>%
  dplyr::summarize(New_cases = sum(New_cases, na.rm = TRUE),Face_cover = max(Face_cover,na.rm = TRUE), Vaccination = max(Vaccination,na.rm = TRUE), Internal_movement=max(Internal_movement,na.rm = TRUE), Stay_home= max(Stay_home,na.rm = TRUE),  International_travel=      max(International_travel,na.rm = TRUE), .groups = "keep") %>%
  as.data.frame()

#data with aggregated covid-cases grouped by region
data_regional <- data_2020_2022 %>% 
  group_by(WHO_region) %>%
  dplyr::summarize(New_cases = sum(New_cases, na.rm = TRUE),Face_cover = max(Face_cover,na.rm = TRUE), Vaccination = max(Vaccination,na.rm = TRUE), Internal_movement=max(Internal_movement,na.rm = TRUE), Stay_home= max(Stay_home,na.rm = TRUE),  International_travel=      max(International_travel,na.rm = TRUE), .groups = "keep") %>%
  as.data.frame()

#data with aggregated covid-cases grouped by policies 
data_policies <- data_monthly %>% 
  group_by(Face_cover, Vaccination, Internal_movement, Stay_home, International_travel) %>%
  dplyr::summarize(New_cases = sum(New_cases, na.rm = TRUE), .groups = "keep") %>%
  as.data.frame()

Descriptive Analysis

The following sections show the results of further data exploration using numerical and graphical methods.

Summary Statistics

The following chart displays the general summary statistics of the monthly data being used for this analysis project. All five categorical variables contain contain unequal number of factor levels. In this dataset, there are a total of 180 countries from six WHO regions. The original aggregated data was properly cleaned to exclude all the missing values. In the graph column of the chart shown below, the bar graph for each variable shows the type of distributions each variable have. The New_cases variable appears to be heavily right-skewed so a proper variable transformation needs to be performed. The categorical variables appear to be roughly normal distributed with the exception of Vaccination, which is slightly right-tailed.The second column in the chart shows the computed mean, standard deviation, minimum, median, maximum, and interquartile range (IQR), and coefficient of variation (CV) for each quantitative variable. The variable Year shows that the observations in this data are records from three different years (2020 to 2022).

#view summary
library(summarytools)
monthly_data <- data_monthly1[, -c(2,6)]
print(dfSummary(monthly_data), method="render")

Data Frame Summary

monthly_data

Dimensions: 4641 x 10
Duplicates: 0

Variable

Stats / Values

Freqs (% of Valid)

Graph

Valid

Missing

Country [character]

1. Albania

2. Algeria

3. Andorra

4. Angola

5. Australia

6. Austria

7. Bahrain

8. Bangladesh

9. Belarus

10. Belgium

[ 170 others ]

26	(	0.6%	)
26	(	0.6%	)
26	(	0.6%	)
26	(	0.6%	)
26	(	0.6%	)
26	(	0.6%	)
26	(	0.6%	)
26	(	0.6%	)
26	(	0.6%	)
26	(	0.6%	)
4381	(	94.4%	)

4641 (100.0%)

0 (0.0%)

WHO_region [character]

1. AFRO

2. AMRO

3. EMRO

4. EURO

5. SEARO

6. WPRO

1083	(	23.3%	)
870	(	18.7%	)
541	(	11.7%	)
1398	(	30.1%	)
231	(	5.0%	)
518	(	11.2%	)

4641 (100.0%)

0 (0.0%)

Year [character]

1. 2020

2. 2021

3. 2022

2160	(	46.5%	)
2159	(	46.5%	)
322	(	6.9%	)

4641 (100.0%)

0 (0.0%)

Month [character]

1. 01
2. 02
3. 03
4. 04
5. 05
6. 06
7. 08
8. 09
9. 10
10. 11
[ 2 others ]

539	(	11.6%	)
503	(	10.8%	)
360	(	7.8%	)
360	(	7.8%	)
360	(	7.8%	)
360	(	7.8%	)
360	(	7.8%	)
360	(	7.8%	)
360	(	7.8%	)
360	(	7.8%	)
719	(	15.5%	)

4641 (100.0%)

0 (0.0%)

New_cases [integer]

Mean (sd) : 86102.5 (488758.6)

min ≤ med ≤ max:

0 ≤ 3053 ≤ 20257043

IQR (CV) : 27325 (5.7)

3354 distinct values

4641 (100.0%)

0 (0.0%)

Face_cover [factor]

1. 0

2. 1

3. 2

4. 3

5. 4

793	(	17.1%	)
212	(	4.6%	)
727	(	15.7%	)
1925	(	41.5%	)
984	(	21.2%	)

4641 (100.0%)

0 (0.0%)

Vaccination [factor]

1. 0

2. 1

3. 2

4. 3

5. 4

6. 5

2343	(	50.5%	)
138	(	3.0%	)
227	(	4.9%	)
362	(	7.8%	)
490	(	10.6%	)
1081	(	23.3%	)

4641 (100.0%)

0 (0.0%)

Internal_movement [factor]

1. 0

2. 1

3. 2

2090	(	45.0%	)
699	(	15.1%	)
1852	(	39.9%	)

4641 (100.0%)

0 (0.0%)

Stay_home [factor]

1. 0

2. 1

3. 2

4. 3

1580	(	34.0%	)
1064	(	22.9%	)
1737	(	37.4%	)
260	(	5.6%	)

4641 (100.0%)

0 (0.0%)

International_travel [factor]

1. 0

2. 1

3. 2

4. 3

5. 4

244	(	5.3%	)
773	(	16.7%	)
1011	(	21.8%	)
1432	(	30.9%	)
1181	(	25.4%	)

4641 (100.0%)

0 (0.0%)

Generated by summarytools 1.0.1 (R version 4.2.3)
2023-03-18

The summary statistics of monthly new Covid-19 cases from 2020 to 2022 for each country in the data is shown in Table 4 below. It appears that the United States, India, and Brazil have the highest total and average number of positive Covid-19 cases during this time period. Meanwhile, Turkmenistan, Tonga, and Vanuatu have the lowest number of Covid-19 cases in this data. However, it appears that these three countries may have inaccurate record due to their extremely low cases compare to other countries. This information is also shown in Figure 1 below.

Table 4: Summary statistics by Country

library(dplyr)
d <- data_monthly %>%
  group_by(Country) %>%
  summarise(total = sum(New_cases, na.rm=T), mean = mean(New_cases, na.rm=T),median=median(New_cases, na.rm=T), maximum = max(New_cases, na.rm=T), minimum = min(New_cases, na.rm=T), IQR = IQR(New_cases, na.rm=T), quantile_25 = quantile(New_cases, na.rm=T, c(0.25)),quantile_75 = quantile(New_cases, na.rm=T, c(0.75)), .groups = "keep")

# display summary data
library(DT)
datatable(d, caption = htmltools::tags$caption(
                  style = 'caption-side: bottom; text-align: center;overflow: auto;position: relative;width : 100% !important;',
                  'Table 4: ', htmltools::em('Summary statistics of Monthly New Cases (2020-2022) grouped by Country'), rownames = FALSE,filter="top", options = list(pageLength = 10, autoWidth = T, scrollX=TRUE, scrollY=F, columnDefs = list(list(width = '3px', targets = "_all")))))

Data Visualization

The interactive bubble chart displayed below (Figure 1) shows how the total number of new cases per year for each country included in the data change every year from 2020 to 2022. The purpose of this project is to show how these cumulative Covid-19 cases is influenced by different public health policies and restrictions. Therefore, it is important to visualize the number of Covid-19 cases, which is used in the data modeling section as the response variable. In this chart, it is evident that United States , India , and Brazil have the highest number of Covid-19 cases in 2020 and 2021. The United States remains as the country with the highest number of positive cases in 2022 and France surpassed India as the country with the second highest number of cases. It is also clear that the total number of cases for most countries significantly increased in 2021 compared to the recorded cases in 2020. Additionally, it is apparent that the highest number of Covid-19 cases belong to different regions. It is important to add that this analysis does not consider possible effect of each country on the number of Covid-19 cases. In order to improve the efficiency of the methods used in this analysis, the variable region is used as a between-subjects factor with a few number of groups.

library(plotly)

p <- plot_ly(data_yearly,
      x = ~Country,
      y = ~New_cases,
      sizes = c(10, 100),
      marker = list(opacity = 0.7,
            sizemode = 'diameter')) 
p <- p %>% layout(title = 'Covid-19 Total New Cases by Country and Year',titlefont = list(
           family = "Agency FB",
           size = 22,
           color = 'black') , font = list(family = "Agency FB", size = 12),
         xaxis = list(title = ''),
         yaxis = list(title = 'Total New Cases'), margin=0.5)

p <- add_markers(p,
      size = ~New_cases,
      color = ~WHO_region,
      text = ~Country, hoverinfo = "text",
      ids = ~Country,
      frame = ~Year,
      showlegend = TRUE) 
p <- animation_opts(p,
         frame = 1000,
         transition = 800,
         redraw = FALSE)

p <- animation_slider(p,
           currentvalue = list(prefix = "Year "))
p

Figure 1: Covid-19 Total New Cases by Country and Year

# widgetframe::frameWidget(p)

The heatmap of correlation matrix for the different regions and policy factors is displayed in Figure 2 below. Generally the different levels of the factors are not correlated with each other. The highest correlation appears to be between Internal_movement2 and Stay_home2, which is reasonable because higher restrictions on stay-at-home orders should result in less number of people outside and less amount of mobility in a particular area.

library(ggcorrplot)
d <- data_monthly[, c(3,9:13)]
corr_plot <- model.matrix(~0+., data=d) %>% 
  cor(use="pairwise.complete.obs") %>% 
  ggcorrplot(show.diag = F, type="lower", lab=TRUE, lab_size=2)

ggplotly(corr_plot)%>%layout(title ="Heatmap of the correlation matrix", 
                             titlefont = list( family = "Agency FB",size = 26,color = 'black') , 
                             font = list(family = "Agency FB", size = 18),margin = 0.5, yaxis = list(tickfont = 
list(size = 12),color = 'black',title = ""), xaxis = list(tickfont = 
list(size = 12)))

Figure 2: Correlations of Policy Levels

The histogram below shows the frequency of the different regions and levels of policies and restrictions using the data aggregated by year and month (Table 3). The histogram shows that the region and policy factors are roughly normally distributed. It also shows the these factors have unequal number of levels, which results in an imbalanced model design.

# library
library(ggplot2)
library(plotly)
library(hrbrthemes)

data_monthly$main1 = "Face cover"
data_monthly$main2 = "Vaccination"
data_monthly$main3 = "Internal movement"
data_monthly$main4 = "International travel"
data_monthly$main5 = "Stay home"
data_monthly$main6 = "Region"

f0 <- data_monthly %>% ggplot(aes(WHO_region)) +
  geom_histogram(stat="count", fill="cornflowerblue",alpha=0.5, col="white") + #+#FFBB00 
  labs(x="Region",y="Counts") +
  geom_text(geom = "text",label="",x=3.5,y=90,col="black")+ facet_wrap(~main6) + theme(text = element_text(size=10),axis.text.x = element_text(angle = 90, vjust = 0.5,size=10, hjust=1))

f1 <- data_monthly %>% ggplot(aes(Face_cover)) +
  geom_histogram(stat="count", fill="#808000",alpha=0.5, col="white") + #+#FFBB00 
  labs(x="Face cover",y="Counts") +
  geom_text(geom = "text",label="",x=3.5,y=90,col="black")+ facet_wrap(~main1) + theme(text = element_text(size=10),axis.text.x = element_text(angle = 90, vjust = 0.5,size=10, hjust=1))

f2 <- data_monthly %>% ggplot(aes(Vaccination)) +
  geom_histogram(stat="count",fill="sandybrown",alpha=0.5, col="white") +#+#FFBB00
  labs(x="Vaccination",y="Counts") +
  geom_text(geom = "text",label="",x=3.5,y=187.5,col="black")+ facet_wrap(~main2) + theme(text = element_text(size=10),axis.text.x = element_text(angle = 90, vjust = 0.5,size=10, hjust=1))

f3 <- data_monthly %>% ggplot(aes(Internal_movement)) +
  geom_histogram(stat="count",fill="#404080",alpha=0.3, col="white") + #+#FFBB00
  labs(x="Internal movement",y="Counts") +
  geom_text(geom = "text",label="",x=3.5,y=68,col="black")+ facet_wrap(~main3) + theme(text = element_text(size=10),axis.text.x = element_text(angle = 90, vjust = 0.5,size=10, hjust=1))

f4 <- data_monthly %>% ggplot(aes(International_travel)) +
  geom_histogram(stat="count",fill="#69b3a2",alpha=0.5, col="white") + #+#FFBB00
  labs(x="International travel",y="Counts") +
  geom_text(geom = "text",label="",x=3.5,y=68,col="black")+ facet_wrap(~main4) + theme(text = element_text(size=10),axis.text.x = element_text(angle = 90, vjust = 0.5,size=10, hjust=1))

f5 <- data_monthly %>% ggplot(aes(Stay_home)) +
  geom_histogram(stat="count",fill="lightsteelblue4",alpha=0.5, col="white") + #+#FFBB00
  labs(x="Stay home",y="Counts") +
  geom_text(geom = "text",label="",x=0.5,y=1800,col="black")+ facet_wrap(~main5) + theme(text = element_text(size=10),axis.text.x = element_text(angle = 90, vjust = 0.5,size=10, hjust=1))


subplot(f0, f2, f3,f1 ,f4,f5, shareY = T, nrows = 1,heights =1, margin=0.005)%>% layout(title ="Histogram of Factors", titlefont = list(
           family = "Agency FB",
           size = 26,
           color = 'black') , font = list(family = "Agency FB", size = 15),margin = 0.5, yaxis = list(
        color = 'black',
        title = "Count"))

Figure 3: Histogram of Categorical Region and Policy Variables

# library(ggplot2)
# library(plotly)
# data_monthly$main1 = "Face cover"
# data_monthly$main2 = "Vaccination"
# data_monthly$main3 = "Internal movement"
# data_monthly$main4 = "International travel"
# data_monthly$main5 = "Stay home"
# 
# f1 <- data_monthly%>% 
#   ggplot(aes(y=New_cases,x=as.integer(Face_cover))) +
#   geom_point(alpha=0.1, color="#808000") + 
#   labs(x="Total New Cases", y= "Face Cover",
#        title="") +
#   scale_y_log10()+ facet_wrap(~main1) +
#   geom_smooth(method=lm , color="darkgrey", fill="#808000", level = 0.95, se=T,size=0.5) 
# 
# f2 <- data_monthly%>% 
#   ggplot(aes(y=New_cases,x=as.integer(Vaccination))) +
#   geom_point(alpha=0.1, color="cornflowerblue") + 
#   labs(x="Total New Cases", y= "Face Cover",
#        title="") +
#   scale_y_log10()+ facet_wrap(~main2) +
#   geom_smooth(method=lm , color="darkgrey", fill="cornflowerblue", level = 0.95, se=T,size=0.5) 
# 
# f3 <- data_monthly%>% 
#   ggplot(aes(y=New_cases,x=as.integer(Internal_movement))) +
#   geom_point(alpha=0.1,  color="#FFBB00") + 
#   labs(x="Total New Cases", y= "Face Cover",
#        title="") +
#   scale_y_log10()+  facet_wrap(~main3) +
#   geom_smooth(method=lm , color="darkgrey", fill="#FFBB00", level = 0.95, se=T,size=0.5) 
# 
# f4 <- data_monthly%>% 
#   ggplot(aes(y=New_cases,x=as.integer(International_travel))) +
#   geom_point(alpha=0.1,  color="#69b3a2") + 
#   labs(x="Total New Cases", y= "Face Cover",
#        title="") +
#   scale_y_log10()+  facet_wrap(~main4) +
#   geom_smooth(method=lm , color="darkgrey", fill="#69b3a2", level = 0.95, se=T,size=0.5) 
# 
# f5 <- data_monthly%>% 
#   ggplot(aes(y=New_cases,x=as.integer(Stay_home))) +
#   geom_point(alpha=0.1,  color="#404080") + 
#   labs(x="Total New Cases", y= "Face Cover",
#        title="") +
#   scale_y_log10()+ facet_wrap(~main5) +
#   geom_smooth(method=lm , color="darkgrey", fill="#404080", level = 0.95, se=T,size=0.5) 
# 
# 
# # subplot(f2, f3,f1 ,f4,f5, shareY = T, nrows = 1,heights = 0.92)%>% layout(title ="Scatterplot of Covid-19 Cases and Policies")
# subplot(f2, f3,f1 ,f4,f5, shareY = T, shareX=F,nrows = 1, margin=0.005)%>% layout(title ="Scatterplot of Covid-19 Cases and Policies", titlefont = list(
#            family = "Agency FB",
#            size = 26,
#            color = 'black') , font = list(family = "Agency FB", size = 18),margin = 0.5, yaxis = list(
#         color = 'black',
#         title = "Monthly Covid-19 Cases"))

The attached figure below (Figure 4) shows the boxplot of total monthly Covid-19 cases in the data (Table 3) under different regions and policy factors. Since the means (red dots) within each plot does not follow a horizontal trend, this means that the regions and policies shown have an impact on the number of cases. The means of the number of cases vary across all factor levels within region and each policy variable. The boxplot of Lag_cases does not differ significantly from the boxplot using New_cases. Therefore, it is not necessary to use Lag_cases for this project.

F0<- ggplot(data = data_monthly, aes(y=New_cases,x=as.factor(WHO_region),color="grey"))+
# ggplot(Wage, aes(occupation,wage,fill=occupation))+ #to leave them alphabetic
  geom_jitter(colour = "dark gray",alpha =0.5, width=.1, size=0.5) +
  stat_boxplot(geom ='errorbar',width = 0.05, color="black") +
  geom_boxplot(fill="cornflowerblue", color="black", alpha=0.8)+
  labs(title="", 
       x = "Region",
       y = "Monthly Covid-19 Cases",
       #subtitle ="Gray dots=sample data points, Black dot=outlier, Blue dot=mean, Red=99% confidence interval",
       caption = "") +
  guides(fill=FALSE) +
  #stat_summary(fun.data = "mean_cl_normal", colour = "red", size = 1.5, fun.args = list(conf.int=.99)) +
  stat_summary(geom="point", fun.y=mean, color="red")+ facet_wrap(~main6) + 
  theme(text = element_text(size=10), axis.text.x = element_text(angle = 90, vjust = 0.5,size=10, hjust=1)) +
  scale_y_log10()


F1<-data_monthly %>% ggplot(aes(y=New_cases,x=as.factor(Face_cover),color="grey"))+
# ggplot(Wage, aes(occupation,wage,fill=occupation))+ #to leave them alphabetic
  geom_jitter(colour = "dark gray",alpha =0.5, width=.1, size=0.5) +
  stat_boxplot(geom ='errorbar',width = 0.05, color="black") +
  geom_boxplot(fill="#808000", color="black", alpha=0.8)+
  labs(title="", 
       x = "Face cover",
       y = "Monthly Covid-19 Cases",
       #subtitle ="Gray dots=sample data points, Black dot=outlier, Blue dot=mean, Red=99% confidence interval",
       caption = "") +
  guides(fill=FALSE) +
  #stat_summary(fun.data = "mean_cl_normal", colour = "red", size = 1.5, fun.args = list(conf.int=.99)) +
  stat_summary(geom="point", fun.y=mean, color="red")+ facet_wrap(~main1) + theme(text = element_text(size=10),axis.text.x = element_text(angle = 90, vjust = 0.5,size=10, hjust=1)) +
  scale_y_log10()

F2 <- data_monthly %>% ggplot( aes(y=New_cases,x=as.factor(Vaccination),color="grey"))+
# ggplot(Wage, aes(occupation,wage,fill=occupation))+ #to leave them alphabetic
  geom_jitter(colour = "dark gray",alpha =0.5, width=.1, size=0.5) +
  stat_boxplot(geom ='errorbar',width = 0.05, color="black") +
  geom_boxplot(fill="sandybrown", color="black", alpha=0.8)+
  labs(title="", 
       x = "Vaccination",
       y = "Monthly Covid-19 Cases",
       #subtitle ="Gray dots=sample data points, Black dot=outlier, Blue dot=mean, Red=99% confidence interval",
       caption = "") +
  guides(fill=FALSE) +
  #stat_summary(fun.data = "mean_cl_normal", colour = "red", size = 1.5, fun.args = list(conf.int=.99)) +
  stat_summary(geom="point", fun.y=mean, color="red")+ facet_wrap(~main2) + theme(text = element_text(size=10),axis.text.x = element_text(angle = 90, vjust = 0.5,size=10, hjust=1)) +
  scale_y_log10()


F3 <- data_monthly %>% ggplot( aes(y=New_cases,x=as.factor(Internal_movement),color="grey"))+
# ggplot(Wage, aes(occupation,wage,fill=occupation))+ #to leave them alphabetic
  geom_jitter(colour = "dark gray",alpha =0.5, width=.1, size=0.5) +
  stat_boxplot(geom ='errorbar',width = 0.05, color="black") +
  geom_boxplot(fill="#404080", color="black", alpha=0.8)+
  labs(title="", 
       x = "Internal movement",
       y = "Monthly Covid-19 Cases",
       #subtitle ="Gray dots=sample data points, Black dot=outlier, Blue dot=mean, Red=99% confidence interval",
       caption = "") +
  guides(fill=FALSE) +
  #stat_summary(fun.data = "mean_cl_normal", colour = "red", size = 1.5, fun.args = list(conf.int=.99)) +
  stat_summary(geom="point", fun.y=mean, color="red")+ facet_wrap(~main3) + theme(text = element_text(size=10),axis.text.x = element_text(angle = 90, vjust = 0.5,size=10, hjust=1)) +
  scale_y_log10()
  

F4 <- data_monthly %>% ggplot(aes(y=New_cases,x=as.factor(International_travel),color="grey"))+
# ggplot(Wage, aes(occupation,wage,fill=occupation))+ #to leave them alphabetic
  geom_jitter(colour = "dark gray",alpha =0.5, width=.1, size=0.5) +
  stat_boxplot(geom ='errorbar',width = 0.05, color="black") +
  geom_boxplot(fill="#69b3a2", color="black", alpha=0.8)+
  labs(title="", 
       x = "International travel",
       y = "Monthly Covid-19 Cases",
       #subtitle ="Gray dots=sample data points, Black dot=outlier, Blue dot=mean, Red=99% confidence interval",
       caption = "") +
  guides(fill=FALSE) +
  #stat_summary(fun.data = "mean_cl_normal", colour = "red", size = 1.5, fun.args = list(conf.int=.99)) +
  stat_summary(geom="point", fun.y=mean, color="red")+ facet_wrap(~main4) + theme(text = element_text(size=10),axis.text.x = element_text(angle = 90, vjust = 0.5,size=10, hjust=1)) +
  scale_y_log10()
 

F5 <- data_monthly %>% ggplot(aes(y=New_cases,x=as.factor(Stay_home),color="grey"))+
# ggplot(Wage, aes(occupation,wage,fill=occupation))+ #to leave them alphabetic
  geom_jitter(colour = "dark gray",alpha =0.5,width=.1, size=0.5) +
  stat_boxplot(geom ='errorbar',width = 0.05, color="black") +
  geom_boxplot(fill="lightsteelblue4", color="black", alpha=0.8)+
  labs(title="", 
       x = "Stay home",
       y = "Monthly Covid-19 Cases",
       #subtitle ="Gray dots=sample data points, Black dot=outlier, Blue dot=mean, Red=99% confidence interval",
       caption = "") +
  guides(fill=FALSE) +
  #stat_summary(fun.data = "mean_cl_normal", colour = "red", size = 1.5, fun.args = list(conf.int=.99)) +
  stat_summary(geom="point", fun.y=mean, color="red") + 
  facet_wrap(~main5) + theme(text = element_text(size=10),axis.text.x = element_text(angle = 90, vjust = 0.5,size=10, hjust=1)) +
  scale_y_log10()


# subplot(f2, f3,f1 ,f4,f5, shareX = T,margin = 0.005, nrows=5)%>% layout(title ="Scatterplot of Covid-19 Cases and Policies", titlefont = list(
#            family = "Agency FB",
#            size = 12,
#            color = 'black'))




subplot(F0, F2, F3, F1, F4, F5, shareY = T, nrows = 1,  heights=1, margin=0.005)%>% layout(title ="Boxplot of Covid-19 and Policies", titlefont = list(
           family = "Agency FB",
           size = 26,
           color = 'black') , font = list(family = "Agency FB", size = 18),margin = 0.5, yaxis = list(
        color = 'black',
        title = "Monthly Covid-19 Cases"))

Figure 4: Boxplot of Covid-19 and Policies

Exploratory Analysis

In order to further explore the data used for this analysis, the following method is used.

Multiple correspondence analysis (MCA)

The goal in this section is to identify any possible similarities between categorical predictors in the data that are used in the data modelling section. In order to perform this task, a method that is specifically used for summarizing and visualizing data that contains more than one categorical variables is used. The multiple correspondence analysis (MCA) method is similar to principal component analysis (PCA) but more useful when the data is categorical. Using this method, the significant variables that contribute the most in explaining the variations in the data can also be identified. In order to improve process efficiency, the data used in this section is an aggregated yearly data where the number of cumulative cases for each country every year are aggregated as total number of cases from 2020 to 2022 and the maximum policy imposed are retained. In terms of MCA, the categorical region and policies are used as the active variables, which are the variables that are passed into the MCA() function. The different countries are the active individuals, which represent the observations or rows included in the MCA. The number of Covid-19 cases are used as the supplementary variable, which are not included in the MCA but their coordinates can be predicted by the MCA using the active variables and active individuals. The histogram of the active variables (policies) in Figure 3 in the previous section does not show any categories with extremely small frequency. The lowest predictor level frequency is equal to 138, which is from vaccination with level 1. This value should be large enough to not cause any distortion in MCA.

# install.packages(c("FactoMineR", "factoextra"))
library("FactoMineR")
library("factoextra")
data_2020_2022[,c(3,6:10)] <- lapply(data_2020_2022[,c(3,6:10)], as.factor)
# sapply(data_2020_2022, class)
active <- data_2020_2022[, c(3,6:10)]
# number of categories per variable
n = apply(active, 2, function(x) nlevels(as.factor(x)))
# head(active[, 1:6], 3)
#perform MCA
mca <- MCA(active, graph = FALSE)
# print(mca)

To visualize the proportion of variances explained by each MCA dimension, the function fviz_screeplot() is used and the generated Scree plot is shown in Figure 5 below. This Scree plot can be interpreted in a similar way as the PCA Scree plots for principal components. The scree plot below does not appear to have any significant drop or “elbow”. Therefore, it can be inferred that there are no dominating directions that exist in the data.

# Eigenvalues / Variances
library("factoextra")
eig.val <- get_eigenvalue(mca)
fviz_screeplot(mca, addlabels = TRUE, ylim = c(0, 18))

Figure 5: MCA Scree plot

The function fviz_mca_biplot() is used to draw the MCA Biplot of individuals and variable categories or levels in Figure 6 below. This plot shows an overall pattern within the data. Observations (rows or individuals) are represented by blue points and columns (active variable categories) are the triangles color coded by contribution to the MCA dimension. In this plot, the distance between the row points or column points can be used to measure their similarity or dissimilarity depending on how close the distance is. For example, Stay_home_1, Internal_movement_1, and International_travel_3 are fairly close to each other in this plot and highly contribute to MCA dimension 1 (x-axis). Therefore, it can be concluded that they have similar profiles. These three variable categories or levels also contribute to MCA dimension 2 (y-axis).

# Biplot
# fviz_mca_biplot(mca, 
#                repel = TRUE, # avoid text overlapping 
#                ggtheme = theme_gray())

fviz_mca_biplot(mca, col.var = "contrib",
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"), 
             repel = TRUE, # avoid text overlapping (slow)
             ggtheme = theme_gray()
             )

Figure 6: MCA Biplot

# Graph of variables
var <- get_mca_var(mca)
# Coordinates
# head(var$coord)
# # Cos2: quality on the factore map
# head(var$cos2)
# # Contributions to the principal components
# head(var$contrib)

The correlation between variables and MCA principal dimensions can also be assessed using the plot of MCA variables in Figure 7 below. Using this plot, we can identify which of the variables contribute the most to MCA dimensions 1 (x-axis) and 2 (y-axis). In this plot, the squared correlations between MCA dimensions and the variables are used as coordinates. It is apparent that, the variables Stay_home and Internal_movement are the most correlated with MCA dimension 1. These two variables are also the most correlated with MCA dimension 2.

fviz_mca_var(mca, choice = "mca.cor", 
            repel = TRUE, # Avoid text overlapping (slow)
            ggtheme = theme_gray())

Figure 7: MCA Variables

Inferential Analysis

Since the variables included in the data are categorical and the questions of interest for this analysis are primarily focused on the effects of the different levels in these predictors on the total number of positive Covid-19 cases worldwide, using analysis of variance (ANOVA) can help determine if the dependent variable (Covid-19 cases) changes according to the level of the independent variables (policies).

Variable Transformation

For this analysis, it can be tested whether New_cases can be transformed in order to remove its heavy right-skewed distribution. This variable is used as the response for the ANOVA model in the following section. Therefore, it is crucial to determine if this variable violates the normality assumption or not. Although the residual follow a constant variance, we can see from the normal Q-Q plot below (Figure 8) that the residuals are heavily right-tailed. The log transformation plot below (Figure 9) shows that the distribution of the New_cases variable is normally distributed after performing the log transformation. Thus, the ANOVA model can be fitted using the log of New_cases.

##Box-cox transformation
d<- data_monthly1[-1,c(3,7:12)]
fit1<-aov(New_cases~., data=d) #first-order model with the (non-transformed) variables
par(mfrow=c(2,2))
plot(fit1,pch =20, cex = 2, col = "cornflowerblue")

Figure 8: Diagnostic Plot

# summary(fit1)
# ###The Box-Cox likelihood suggests a power transformation with power -0.02.
# bc<-MASS::boxcox(fit1, lambda = seq(-4, 4, by = 0.01),plotit = T)
data_monthly2 <- data_monthly1
data_monthly2$New_cases <- log(data_monthly2$New_cases)
data_monthly2$New_cases[which(!is.finite(data_monthly2$New_cases))] <- 0 #replace inf val with 0 for log(0) = inf

library(ggplot2)
library(plotly)
f2 <- ggplot(data_monthly1, aes(x = New_cases)) + xlab("Histogram of New_cases") +
    geom_histogram(aes(y = ..density..),bins = 30, fill = "#69b3a2", color = "white",alpha=0.8) +
                stat_function(fun = dnorm,
                args = list(mean = mean(data_monthly1$New_cases),
                            sd = sd(data_monthly1$New_cases)),
                col = "grey40",
                size = 0.5)+labs(x="Histogram of New_cases")

f4 <- ggplot(data_monthly1, aes(x = log(data_monthly1$New_cases))) +
    geom_histogram(aes(y = ..density..),bins = 30, fill = "#69b3a2", color = "white",alpha=0.8) +
                stat_function(fun = dnorm,
                args = list(mean = mean(log(data_monthly1$New_cases)),
                            sd = sd(log(data_monthly1$New_cases))),
                col = "grey40",
                size = 1)+labs(x="Histogram of log(New_cases)")

plotly::subplot(f2, f4, nrows=1, shareX=F, shareY=F, margin =0.02,titleX = TRUE)%>% layout(title ="Log Transformation", titlefont = list(
           family = "Agency FB",
           size = 26,
           color = 'black') , font = list(family = "Agency FB", size = 18),margin = 0.1, yaxis = list(
        color = 'black'))

Figure 9: Log transformation of response variable

Fixed-effect ANOVA Model

For this analysis, the proposed fixed-effect ANOVA model with multiple factors in the factor-effect form defined below uses the categorical policy variables (Face_cover (\(\alpha\)), Vaccination (\(\beta\)), Internal_movement (\(\gamma\)), Stay_home (\(\delta\)), International_travel (\(\zeta\)), WHO_region (\(\eta\)) for predicting New_cases.

\[Y_{ijk\ell m n o} = \mu_{\cdot\cdot }+ \alpha_i + \beta_j + \gamma_k+ \delta_{\ell} + \zeta_m + \eta_n +\epsilon_{ijk\ell m n o}\] where

\(o = 1, 2, \cdots, n_{ijk\ell m n}\), where \(n_{ijk\ell m n}\)= is the number of observations in cell (\(i,j,k,\ell,m, n\)) or the combinations of the different levels of the categorical predictors

\(i = 0,1, \cdots, 4\)

\(j = 0,1, \cdots, 5\)

\(k = 0, 1, 2\)

\(\ell = 0, 1, \cdots, 3\)

\(m = 0, 1, \cdots, 4\)

\(n = 1, 2, \cdots, 6\)

\(\epsilon_{ijk\ell m n o}\) (error terms) are i.i.d. \(N(0,\sigma^2)\)

\(Y_{ijk\ell m n o}\) = \(o^{th}\) observation from the \(i^{th}\) level of Face_cover, \(j^{th}\) level of Vaccination, \(k^{th}\) level of Internal_movement, \(\ell ^{th}\) level of Stay_home, \(m ^{th}\) level of International_travel policies, and \(n ^{th}\) group of WHO_region.

\(\mu_{\cdot\cdot } = \sum_{i=0}^{4} \sum_{j=0}^{5} \sum_{k=0}^{2} \sum_{\ell=0}^{3} \sum_{m=0}^{4} \sum_{n=1}^{6} \frac{\mu_{ijk\ell m}}{ijk\ell m}\) (overall mean across all populations)

\(\alpha_i = \mu_{i\cdot} - \mu_{\cdot\cdot}\) = factor effect of Face_cover with \(\sum_{i=0}^{4} \alpha_i = 0\) constraint

\(\beta_j = \mu_{j\cdot} - \mu_{\cdot\cdot}\) = factor effect of Vaccination with \(\sum_{j=0}^{5} \beta_j =0\) constraint

\(\gamma_k = \mu_{k\cdot} - \mu_{\cdot\cdot}\) = factor effect of Internal_movement with \(\sum_{k=0}^{2} \gamma_k =0\) constraint

\(\delta_{\ell} = \mu_{\ell\cdot} - \mu_{\cdot\cdot}\) = factor effect of Stay_home with \(\sum_{\ell=0}^{3} \delta_{\ell} =0\) constraint

\(\zeta_m = \mu_{m\cdot} - \mu_{\cdot\cdot}\) = factor effect of International_travel with \(\sum_{m=0}^{4} \zeta_m =0\) constraint

\(\eta_n = \mu_{n\cdot} - \mu_{\cdot\cdot}\) = factor effect of WHO_region with \(\sum_{n=1}^{6} \eta_n =0\) constraint

Hypothesis

Because the is analysis is focused on testing the effects of five independent variables as well as their interaction on an outcome measure, it is important to state a hypothesis test. For the main effects, which are the effects of the different levels of a single independent variable, the null hypothesis (\(H_o\)) is that the means of the different levels of a given independent variable are not different from each other, while the alternative hypothesis (\(H_a\)) is that these groups are different from each other as follows.

\[H_o : \mu_1 =\mu_2 = \cdots = \mu_k = 0 \ \ \ vs. \ \ H_a: \ not \ all \ \mu_i \ equals \ 0\] The decision rule for this hypothesis are defined as follows, where \(F(1-\alpha;r−1,n_T - r)\) is the \((1-\alpha)100\) percentile of the appropriate F distribution.

If \(F^* \le F(1-\alpha;r−1,n_T - r)\), then conclude \(H_o\).
If \(F^* > F(1-\alpha;r−1,n_T - r)\), then conclude \(H_a\).

Choosing a significance level \(\alpha=0.01\), it is evident that the Main Effect of each factor are all statistically significant based on the output of the two models specified below, where all the p-values are less than 2e-16 as shown in the summary output. The null hypothesis stated above can be rejected because of the strong evidence observed. It can also be concluded that there are significant differences in the impact of each factor on New_cases. It is important to note that this analysis is only focused on the effect of the different region and policies without considering the possible effects of the variable Country in the data. Since fitting the model with the variable Country is more complex and requires longer time to complete, choosing the variable WHO_region as the between-subjects factor is more efficient.

Summary output of the full additive ANOVA model without interactions:

data_monthly2$WHO_region <- as.factor(data_monthly2$WHO_region)
reduced_mod=aov(New_cases~Face_cover+Vaccination+Internal_movement+Stay_home+International_travel+WHO_region, data=data_monthly2)
summary(reduced_mod)

##                        Df Sum Sq Mean Sq F value Pr(>F)    
## Face_cover              4  22698    5675  888.22 <2e-16 ***
## Vaccination             5   2374     475   74.33 <2e-16 ***
## Internal_movement       2   4348    2174  340.32 <2e-16 ***
## Stay_home               3   1038     346   54.18 <2e-16 ***
## International_travel    4   2289     572   89.59 <2e-16 ***
## WHO_region              5   3923     785  122.80 <2e-16 ***
## Residuals            4617  29496       6                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The ANOVA table of the full additive ANOVA model without interactions is displayed below and shows that all of the factors are statistically significant at signficance level \(\alpha=0.05\). The ges (Generalized Eta-Squared), which measures the effect size, also shows the the factors Face_cover, Vaccination, and WHO_region have the highest effect size out of all the 6 factors included in the model.

ANOVA Table (Type II test) for full additive ANOVA model without interactions:

library(rstatix)
res.aov <- anova_test(New_cases~Face_cover+Vaccination+Internal_movement+Stay_home+International_travel+WHO_region, data=data_monthly2)
res.aov

## ANOVA Table (type II tests)
## 
##                 Effect DFn  DFd       F         p p<.05   ges
## 1           Face_cover   4 4617 216.545 1.52e-170     * 0.158
## 2          Vaccination   5 4617  74.513  2.71e-75     * 0.075
## 3    Internal_movement   2 4617 110.358  1.52e-47     * 0.046
## 4            Stay_home   3 4617  47.493  3.19e-30     * 0.030
## 5 International_travel   4 4617  54.995  2.29e-45     * 0.045
## 6           WHO_region   5 4617 122.803 2.22e-122     * 0.117

It can be further tested whether the interactions between the categorical predictors have a significant impact on the response variable New_cases. Based on the ANOVA table displayed below, the interaction between these factors appear to have a significant impact on the response with small p-values suggesting that there there is a strong evidence against the null hypothesis that the interactions between the predictors have no significant effect on the response. Additional tests can be performed in order to find the best combination of predictors for this case where their effect on the response is statistically significant. The summary output of the full ANOVA model shown below also shows that all 5 policy factors are statistically significant at significance level \(\alpha\)=0.01. By looking at this summary output, we can see that the interaction between all of the factors are equally statistically significant. Therefore, additional tests are not needed to identify which of the interactions are the best combinations. Based on these results, all of the interactions between the categorical predictors are statistically significant. Therefore the final anova model must be additive with all possible interactions.

Summary output of the full ANOVA model with interactions:

# Test for interactions 
# library(gtsummary)
full_mod=aov(New_cases~(Face_cover+Vaccination+Internal_movement+Stay_home+International_travel+WHO_region)^2,data=data_monthly2)
summary(full_mod)

##                                          Df Sum Sq Mean Sq  F value   Pr(>F)
## Face_cover                                4  22698    5675 1081.020  < 2e-16
## Vaccination                               5   2374     475   90.461  < 2e-16
## Internal_movement                         2   4348    2174  414.190  < 2e-16
## Stay_home                                 3   1038     346   65.942  < 2e-16
## International_travel                      4   2289     572  109.037  < 2e-16
## WHO_region                                5   3923     785  149.459  < 2e-16
## Face_cover:Vaccination                   20    735      37    7.002  < 2e-16
## Face_cover:Internal_movement              8    819     102   19.496  < 2e-16
## Face_cover:Stay_home                     12    450      37    7.137 4.71e-13
## Face_cover:International_travel          15    334      22    4.242 6.85e-08
## Face_cover:WHO_region                    20    927      46    8.830  < 2e-16
## Vaccination:Internal_movement            10    209      21    3.987 1.89e-05
## Vaccination:Stay_home                    15    234      16    2.970 9.53e-05
## Vaccination:International_travel         18    332      18    3.515 6.80e-07
## Vaccination:WHO_region                   25    364      15    2.776 5.34e-06
## Internal_movement:Stay_home               6    159      27    5.057 3.52e-05
## Internal_movement:International_travel    8    308      39    7.345 9.55e-10
## Internal_movement:WHO_region             10    365      36    6.953 6.76e-11
## Stay_home:International_travel           11    128      12    2.218   0.0113
## Stay_home:WHO_region                     15    382      25    4.847 1.80e-09
## International_travel:WHO_region          20    632      32    6.022 4.24e-16
## Residuals                              4404  23118       5                  
##                                           
## Face_cover                             ***
## Vaccination                            ***
## Internal_movement                      ***
## Stay_home                              ***
## International_travel                   ***
## WHO_region                             ***
## Face_cover:Vaccination                 ***
## Face_cover:Internal_movement           ***
## Face_cover:Stay_home                   ***
## Face_cover:International_travel        ***
## Face_cover:WHO_region                  ***
## Vaccination:Internal_movement          ***
## Vaccination:Stay_home                  ***
## Vaccination:International_travel       ***
## Vaccination:WHO_region                 ***
## Internal_movement:Stay_home            ***
## Internal_movement:International_travel ***
## Internal_movement:WHO_region           ***
## Stay_home:International_travel         *  
## Stay_home:WHO_region                   ***
## International_travel:WHO_region        ***
## Residuals                                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# tbl_regression(full_mod)
# sapply(data_monthly1, class)

ANOVA Table of full and reduced model comparison:

reduced_mod=aov(New_cases~Face_cover+Vaccination+Internal_movement+Stay_home+International_travel+WHO_region, data=data_monthly2)
anova(reduced_mod,full_mod)

## Analysis of Variance Table
## 
## Model 1: New_cases ~ Face_cover + Vaccination + Internal_movement + Stay_home + 
##     International_travel + WHO_region
## Model 2: New_cases ~ (Face_cover + Vaccination + Internal_movement + Stay_home + 
##     International_travel + WHO_region)^2
##   Res.Df   RSS  Df Sum of Sq      F    Pr(>F)    
## 1   4617 29496                                   
## 2   4404 23118 213    6378.6 5.7049 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Mean Pair-Wise Comparisons

In the previous section where Figure 4 is shown, it is observed that the means within each factor vary between different levels or groups. Tukey’s HSD (honestly significant difference) is utilized to perform the pairwise comparison of means (\(\mu_i -\mu_{i'}\)). The coverage is exactly 1−\(\alpha\) for this study it is at least 1−α because it is an unbalanced case.The following tables show the mean pairwise comparisons within each factor using the Tukey HSD method. The p-values in each table appears to be extremely small the pairwise differences are significant, which proves the observation obtained using Figure 4 in the previous section.

# ```{r, echo = FALSE, results='asis', message=FALSE, warning=FALSE}
sig.level= 0.01
T.cii=TukeyHSD(full_mod,conf.level = 1-sig.level)
# par(mfrow=c(3,5))
# plot(T.cii, las=1 , col="red")
# title(main=print(paste0("95% family-wise confidence level")))
# Differences of the two largest means
idxx=list();
idxx[[1]]=data_monthly2$Face_cover
means.combb=tapply(data_monthly$New_cases, INDEX=idxx,mean)
means.combb<-as.data.frame(means.combb)
# library(ggiraphExtra)
# library(gridExtra)
# f1<-ggHSD(T.cii, no = 1, digits = 2, interactive = F)
# f2<-ggHSD(T.cii, no = 2, digits = 2, interactive = FALSE)
# grid.arrange(f1, f2, ncol =2)

# Find the confidence interval among the many comparisons...
tcii1 <- T.cii[['Face_cover']]
tcii2 <- T.cii[['Vaccination']]
tcii3 <- T.cii[['Internal_movement']]
tcii4 <- T.cii[['Stay_home']]
tcii5 <- T.cii[['International_travel']]
tcii6 <- T.cii[['WHO_region']]

tcii<-rbind(tcii1,tcii2, tcii3, tcii4, tcii5, tcii6)

datatable(tcii1, caption = htmltools::tags$caption(
                  style = 'caption-side: bottom; text-align: center;',
                  'Table 5: ', htmltools::em('Face Cover Pairwise Comparison on Covid-19 cases'), rownames = FALSE,filter="top", options = list(pageLength = 10, autoWidth = TRUE, scrollX=T, columnDefs = list(list(width = '50px', targets = "_all")))))

datatable(tcii2, caption = htmltools::tags$caption(
                  style = 'caption-side: bottom; text-align: center;',
                  'Table 6: ', htmltools::em('Vaccination Pairwise Comparison on Covid-19 cases'), rownames = FALSE,filter="top", options = list(pageLength = 10, autoWidth = TRUE, scrollX=T, columnDefs = list(list(width = '50px', targets = "_all")))))

datatable(tcii3, caption = htmltools::tags$caption(
                  style = 'caption-side: bottom; text-align: center;',
                  'Table 7: ', htmltools::em('Internal movement  Pairwise Comparison on Covid-19 cases'), rownames = FALSE,filter="top", options = list(pageLength = 10, autoWidth = TRUE, scrollX=T, columnDefs = list(list(width = '50px', targets = "_all")))))

datatable(tcii4, caption = htmltools::tags$caption(
                  style = 'caption-side: bottom; text-align: center;',
                  'Table 8: ', htmltools::em('Stay at home Pairwise Comparison on Covid-19 cases'), rownames = FALSE,filter="top", options = list(pageLength = 10, autoWidth = TRUE, scrollX=T, columnDefs = list(list(width = '50px', targets = "_all")))))

datatable(tcii5, caption = htmltools::tags$caption(
                  style = 'caption-side: bottom; text-align: center;',
                  'Table 9: ', htmltools::em('International travel Pairwise Comparison on Covid-19 cases'), rownames = FALSE,filter="top", options = list(pageLength = 10, autoWidth = TRUE, scrollX=T, columnDefs = list(list(width = '50px', targets = "_all")))))

datatable(tcii6, caption = htmltools::tags$caption(
                  style = 'caption-side: bottom; text-align: center;',
                  'Table 10: ', htmltools::em('WHO Region Pairwise Comparison on Covid-19 cases'), rownames = FALSE,filter="top", options = list(pageLength = 10, autoWidth = TRUE, scrollX=T, columnDefs = list(list(width = '50px', targets = "_all")))))

Sensitivity Analysis

Each of the assumptions stated for the ANOVA model can be tested in order to verify if the chosen model is reliable or not. The following lsits the assumptions associated with the ANOVA model used for this analysis.

The error terms are independent.
The error terms are normally distributed.
The data does not have any significant outliers.
The variance of the error terms are equal.
The variances of the sampled populations are equal.

The Normal Q-Q Plot shown below (Figure 10) can be used to determine if the residuals follow a normal distribution. Since all of the points are roughly along the diagonal dashed line, it can be inferred that the the error terms \(\epsilon_{ijk\ell mn}\) follow a normal distribution without significant outliers. Furthermore, the Residuals against Fitted values shows that the red line is horizontal, which suggests that the error terms \(\epsilon_{ijk\ell mn}\) are independent. Based on the Residuals vs Leverage plot below, it appears that there are no possible signficant outlier that needs to be removed from the data.

par(mfrow=c(2,2))
plot(full_mod,pch =20, cex = 2, col = "aquamarine2")

Figure 10: Model Diagnostic Plots

The Levene test method is used to test the homogeneity of variance and generate the results shown below. The null hypothesis for the Levene test method states that the variances across each population are equal (\(H_o: \sigma_{1}^{2} = \sigma_{2}^{2} = \ldots = \sigma_{a}^{2}\)) and the alternative hypothesis is defined as \(H_a: \sigma_{i}^{2} \ne \sigma_{j}^{2}\) for at least two populations. For this test, the \(F\)-statistic is computed for \(H_0: \mathbb{E}[d_{1\cdot}]=\mathbb{E}[d_{2\cdot}] = \cdots =\mathbb{E}[d_{r\cdot}]\) and \(H_0\) is rejected if \(F^*>F(1-\alpha; r-1, n_T-r)\) at significance level \(\alpha\).The Levene test is performed below by simply using the leveneTest() function. The p-values for all factors are statistically significant, which declares there is non-homogeneity across their respective levels when tested to significance level \(\alpha\)=0.01. This confirms that the equal variance assumption stated for this model is violated. Therefore, further investigation is needed to assess the source of non-homogenous variance that we see in this result. However, for the model diagnostics portion of this report we use the results obtained using the diagnostics plot above.

Face covering:

library(car)
leveneTest(New_cases~Face_cover,data_monthly2)

## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value    Pr(>F)    
## group    4  24.148 < 2.2e-16 ***
##       4636                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Vaccination:

leveneTest(New_cases~Vaccination,data_monthly2)

## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value    Pr(>F)    
## group    5  46.511 < 2.2e-16 ***
##       4635                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Internal movement:

leveneTest(New_cases~Internal_movement,data_monthly2)

## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value    Pr(>F)    
## group    2   97.59 < 2.2e-16 ***
##       4638                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Stay-at-home:

leveneTest(New_cases~Stay_home,data_monthly2)

## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value    Pr(>F)    
## group    3  141.48 < 2.2e-16 ***
##       4637                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

International travel:

leveneTest(New_cases~International_travel,data_monthly2)

## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value    Pr(>F)    
## group    4  36.238 < 2.2e-16 ***
##       4636                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

WHO region:

leveneTest(New_cases~WHO_region,data_monthly2)

## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value    Pr(>F)    
## group    5  36.362 < 2.2e-16 ***
##       4635                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The table below shows the different possible outliers in this data. However, in the diagnostics plot above wecan see that they do not have significant effect on the model assumptions so they can be retained.

#outlier
p<-data_monthly1 %>%
  group_by(WHO_region, Face_cover, Vaccination, Internal_movement, International_travel, Stay_home) %>%
  identify_outliers(New_cases)
datatable(p, caption = htmltools::tags$caption(
                  style = 'caption-side: bottom; text-align: center;',
                  'Table 11: ', htmltools::em('Outliers'), rownames = FALSE,filter="top", options = list(pageLength = 5, autoWidth = TRUE, scrollX=T, columnDefs = list(list(width = '50px', targets = "_all")))))

Discussion

The primary objective of this analysis is to identify whether public health policies have a significant impact on the number of positive Covid-19 cases using data obtained from various sources. Using the Analysis of variance (ANOVA) method to assess the effect of public health policies on Covid-19 cases across different regions, we can conclude that all the public health policies have significant association with the number of positive cases in different regions around the world. To answer the secondary question of interest stated in the introduction section, face covering and vaccination policies have the highest effect size out of all the 6 factors included in the model while stay-at-home order has the least effect size. It important to clarify that these identified association between the number of positive Covid-19 cases and the significant factors may be a classified as an indirect association due to the fact that successful transmission of the Covid-19 virus may depend on other factors that are excluded from this analysis that can be used to define the significant relationships observed. Further investigations maybe necessary in order to verify the results obtained from the methods used in this analysis. The relationships between the variables in the data used for this project can be explored using other methods such as Mixed Effect ANOVA with multiple factors using Country as the random factor and holding the policy factors fixed. The results of using Fixed-effect ANOVA suggests that all public health policies and their interactions have significant effect on the number of positive Covid-19 cases across different regions worldwide. This may be the reason why the current number of positive cases is generally declining however, there were times during the pandemic when these factors were not effective due the increase in positive cases recorded. Therefore, we should not only consider these factors when determining the possible causes of increased virus transmission if if they are extremely helpful. The presence of other factor might play an important role and override the effects of the public health policies. Overall, government interventions are necessary during a pandemic as we see in these results.

Acknowledgement

https://statisticsglobe.com/aggregate-daily-data-to-month-year-intervals-in-r

https://www.r-bloggers.com/2018/01/bitcoin-world-map-bubbles/

https://www.machinelearningplus.com/machine-learning/feature-selection/

https://github.com/bbc/bbplot

https://rawgit.com/valentinitnelav/valentinitnelav.github.io/master/assets/2018-08-25-PCA-interactive-biplot/PCA-interactive-biplot.html

https://rpubs.com/Saskia/520216

http://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/114-mca-multiple-correspondence-analysis-in-r-essentials/

https://machinelearningmastery.com/feature-selection-with-categorical-data/

https://services.google.com/fh/files/misc/exploratory_data_analysis_for_feature_selection_in_machine_learning.pdf

Data Sources:

Covid-19

Other

Reference

https://bmcpublichealth.biomedcentral.com/articles/10.1186/s12889-021-11111-1#

https://www.cdc.gov/mmwr/volumes/71/wr/mm7106e1.htm

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0248849#

https://bmcpublichealth.biomedcentral.com/articles/10.1186/s12889-021-11889-0

Appendix

Github Repository

Session information

sessionInfo()

## R version 4.2.3 (2023-03-15 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 22621)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.utf8 
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] car_3.1-1          carData_3.0-5      rstatix_0.7.2      factoextra_1.0.7  
##  [5] FactoMineR_2.7     hrbrthemes_0.8.0   ggcorrplot_0.1.4   plotly_4.10.1     
##  [9] ggplot2_3.4.1      summarytools_1.0.1 DT_0.27            dplyr_1.1.0       
## [13] zoo_1.8-11         kableExtra_1.3.4  
## 
## loaded via a namespace (and not attached):
##   [1] colorspace_2.1-0        ggsignif_0.6.4          pryr_0.1.6             
##   [4] ellipsis_0.3.2          estimability_1.4.1      base64enc_0.1-3        
##   [7] rstudioapi_0.14         httpcode_0.3.0          ggpubr_0.6.0           
##  [10] farver_2.1.1            ggrepel_0.9.3           fansi_1.0.4            
##  [13] mvtnorm_1.1-3           lubridate_1.9.2         xml2_1.3.3             
##  [16] codetools_0.2-19        leaps_3.1               extrafont_0.19         
##  [19] cachem_1.0.7            knitr_1.42              jsonlite_1.8.4         
##  [22] broom_1.0.4             Rttf2pt1_1.3.12         cluster_2.1.4          
##  [25] shiny_1.7.4             compiler_4.2.3          httr_1.4.5             
##  [28] emmeans_1.8.5           backports_1.4.1         fastmap_1.1.1          
##  [31] lazyeval_0.2.2          cli_3.6.0               later_1.3.0            
##  [34] htmltools_0.5.4         tools_4.2.3             gtable_0.3.1           
##  [37] glue_1.6.2              reshape2_1.4.4          Rcpp_1.0.10            
##  [40] jquerylib_0.1.4         fontquiver_0.2.1        vctrs_0.6.0            
##  [43] crul_1.3                svglite_2.1.1           extrafontdb_1.0        
##  [46] crosstalk_1.2.0         xfun_0.37               stringr_1.5.0          
##  [49] rvest_1.0.3             timechange_0.2.0        mime_0.12              
##  [52] lifecycle_1.0.3         MASS_7.3-58.2           scales_1.2.1           
##  [55] promises_1.2.0.1        fontLiberation_0.1.0    RColorBrewer_1.1-3     
##  [58] yaml_2.3.7              curl_5.0.0              memoise_2.0.1          
##  [61] pander_0.6.5            gdtools_0.3.2           sass_0.4.5             
##  [64] stringi_1.7.12          fontBitstreamVera_0.1.1 highr_0.10             
##  [67] checkmate_2.1.0         rlang_1.1.0             pkgconfig_2.0.3        
##  [70] systemfonts_1.0.4       matrixStats_0.63.0      evaluate_0.20          
##  [73] lattice_0.20-45         purrr_1.0.1             rapportools_1.1        
##  [76] htmlwidgets_1.6.1       labeling_0.4.2          tidyselect_1.2.0       
##  [79] plyr_1.8.8              magrittr_2.0.3          bookdown_0.33          
##  [82] R6_2.5.1                magick_2.7.4            generics_0.1.3         
##  [85] multcompView_0.1-8      pillar_1.8.1            withr_2.5.0            
##  [88] abind_1.4-5             scatterplot3d_0.3-43    tibble_3.2.0           
##  [91] crayon_1.5.2            gfonts_0.2.0            utf8_1.2.3             
##  [94] rmarkdown_2.20          grid_4.2.3              data.table_1.14.8      
##  [97] rmdformats_1.0.4        digest_0.6.31           flashClust_1.01-2      
## [100] webshot_0.5.4           xtable_1.8-4            tidyr_1.3.0            
## [103] httpuv_1.6.9            munsell_0.5.0           viridisLite_0.4.1      
## [106] bslib_0.4.2             tcltk_4.2.3

Covid-19 Analysis: Assessment of Covid-19 Policies Effectiveness

Kay Royo (920227691)

3/13/22

Abstract

Abstract

Introduction

Background

Data

Descriptive Analysis

Summary Statistics

Data Frame Summary

monthly_data

Data Visualization

Exploratory Analysis

Multiple correspondence analysis (MCA)

Inferential Analysis

Variable Transformation

Fixed-effect ANOVA Model

Hypothesis

Mean Pair-Wise Comparisons

Sensitivity Analysis

Discussion

Acknowledgement

Reference

Appendix

Session information