Coronavirus disease (COVID-19) is an infectious disease caused by a newly discovered coronavirus.
WHO is bringing the world’s scientists and global health professionals together to accelerate the research and development process, and develop new norms and standards to contain the spread of the coronavirus pandemic and help care for those affected. https://www.who.int/emergencies/diseases/novel-coronavirus-2019
The Department of Health (DOH) is the principal health agency in the Philippines. It is responsible for ensuring access to basic public health services to all Filipinos through the provision of quality health care and regulation of providers of health goods and services. https://doh.gov.ph/about-us
Objective: This study analyzes COVID-19 in the Philippines by observing the daily confirmed cases over time. The aim is to identify patterns and trends and, using statistical methods, to build a prediction model that supports actionable plans should cases trend upward.
DATA RETRIEVAL AND STRUCTURE
Data is retrieved from the DOH repository, then cleaned and organized so that it is readable for our analysis.
library(dplyr)    # filter(), group_by(), summarise(), %>%
library(ggplot2)  # plots
library(prophet)  # forecasting model
dohacc <- read.csv("dohacc.csv")
dohacc$date <- as.Date(dohacc$date, "%m/%d/%Y")
dohacc$year <- format(dohacc$date, format = "%Y")
dohacc$confirm <- as.numeric(dohacc$confirm)
dohacc$month <- format(dohacc$date, format = "%m/%Y")
dohacc$accumulate <- as.numeric(dohacc$accumulate)
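A quick consistency check after loading can catch parsing problems early. The sketch below is illustrative only: it uses a small toy data frame (hypothetical dates, not the DOH file) with the same column names, and checks that every date parsed and that accumulate equals the running sum of confirm.

```r
# Toy stand-in for dohacc; the dates here are illustrative, not DOH records.
toy <- data.frame(
  date = c("01/30/2020", "02/03/2020", "02/05/2020", "03/06/2020"),
  confirm = c(1, 1, 1, 2),
  accumulate = c(1, 2, 3, 5)
)
toy$date <- as.Date(toy$date, "%m/%d/%Y")

# Rows must be in date order for the running total to be meaningful.
toy <- toy[order(toy$date), ]

# TRUE when no date failed to parse with the "%m/%d/%Y" format.
dates_ok <- !anyNA(toy$date)

# TRUE when accumulate is exactly the running sum of confirm.
accumulate_ok <- isTRUE(all.equal(cumsum(toy$confirm), toy$accumulate))
```

The same two checks could be run on dohacc itself, assuming its rows are sorted by date.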
Inspect the data structure and view the first few rows
str(dohacc)
'data.frame': 394 obs. of 5 variables:
$ date : Date, format: "2020-01-30" ...
$ confirm : num 1 1 1 2 1 4 14 9 16 3 ...
$ accumulate: num 1 2 3 5 6 10 24 33 49 52 ...
$ year : chr "2020" "2020" "2020" "2020" ...
$ month : chr "01/2020" "02/2020" "02/2020" "03/2020" ...
head(dohacc)
Summary and Definitions
The data set has 394 observations, with dates ranging from Jan 30, 2020 (the first date shown in the structure output) to Mar 31, 2021.
The data has 5 variables, defined as follows:
date: the report date, in daily format, of the confirmed cases
confirm: the daily confirmed cases as reported by DOH
accumulate: the accumulated confirmed cases, a running total that grows daily
year: the year of the report
month: the month and year of the report (the year is included because the data spans two calendar years)
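Because the number of observations can be smaller than the number of calendar days in the range, it may help to measure the date span and any gaps directly. This is a minimal sketch on a hypothetical vector of dates; on the real data, dohacc$date would take the place of dates.

```r
# Hypothetical report dates with gaps between them.
dates <- as.Date(c("2020-01-30", "2020-02-03", "2020-02-05",
                   "2020-03-06", "2020-03-07"))

# First and last report date.
date_range <- range(dates)

# Calendar days covered by the range, counting both endpoints.
span_days <- as.integer(diff(date_range)) + 1L

# Calendar days in the range that have no report.
n_missing <- span_days - length(unique(dates))

# Days between consecutive reports; values above 1 mark gaps.
gaps <- as.integer(diff(sort(dates)))
```

A non-zero n_missing matters for forecasting, since a daily model sees those days as absent rather than as zero cases.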
CONFIRMED CASES SUMMARY
year2020 <- dohacc %>% filter(year=="2020")
year2021 <- dohacc %>% filter(year=="2021")
sum(dohacc$confirm)
[1] 747446
From January 30, 2020 to March 31, 2021, the Philippines recorded 747,446 confirmed cases.
Year in Review
dohacc %>% group_by(year) %>% summarize(confirmed_cases=sum(confirm))
dohacc %>% ggplot(aes(date,confirm)) + geom_point() + ggtitle("Covid 19 Daily Confirmed cases Jan 2020 - Mar 2021")
Plot of the daily confirmed cases from Jan 2020 to Mar 2021. The data shows two major spikes in cases: Aug 2020 and Mar 2021.
dohacc %>% ggplot(aes(date,accumulate)) + geom_point() + ggtitle("Covid 19 Daily Accumulated Confirmed cases Jan 2020 - Mar 2021")
Overall, accumulated confirmed cases in the Philippines continue to climb, and the curve steepens again in Mar 2021, which suggests accelerating (roughly exponential) growth.
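One way to test whether a cumulative curve is close to exponential is to regress the log of the cumulative count on time: exponential growth appears as a straight line, and the slope gives the daily growth rate. The sketch below uses synthetic data with an assumed 5% daily growth rate, not the DOH series.

```r
# Synthetic cumulative counts growing at an assumed 5% per day.
day <- 0:29
accumulate <- 100 * exp(0.05 * day)

# Linear fit on the log scale; the slope estimates the daily growth rate.
fit <- lm(log(accumulate) ~ day)
growth_rate <- unname(coef(fit)["day"])

# Days needed for the cumulative count to double at that rate.
doubling_days <- log(2) / growth_rate
```

On real data the fit would not be exact, so the residuals (or a plot with a log-scaled y axis) would show how far the series departs from exponential growth.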
monthly <- dohacc %>% group_by(month) %>% summarise(confirm_cases=sum(confirm))
monthly$month <- factor(monthly$month, levels= c("01/2020","02/2020","03/2020", "04/2020", "05/2020","06/2020", "07/2020", "08/2020", "09/2020","10/2020", "11/2020", "12/2020","01/2021","02/2021","03/2021" ))
monthly %>% ggplot(aes(x=month, y=confirm_cases)) + geom_bar(stat = "identity", position = position_stack()) + theme(axis.text.x = element_text(angle = 45)) + ggtitle("Covid-19 Monthly Trend Philippines")
Observing monthly cases, we see that since Jul 2020 the country has averaged about 50,000 cases per month, peaking in Aug 2020. In Mar 2021, however, monthly cases surpassed that all-time peak, something that DOH needs to act on.
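The month labels used for the bar chart are ordered by a hand-typed list of levels, which has to be extended every time a new month arrives. As an alternative sketch, the levels can be generated from the date range itself; the dates vector below is hypothetical and stands in for dohacc$date.

```r
# Hypothetical report dates spanning two calendar years.
dates <- as.Date(c("2020-01-30", "2020-12-15", "2021-01-02", "2021-03-31"))

# Month label in the same "%m/%Y" form used in the analysis.
month <- format(dates, "%m/%Y")

# First day of each month from the earliest to the latest report.
month_starts <- seq(as.Date(format(min(dates), "%Y-%m-01")),
                    max(dates), by = "month")

# Chronologically ordered labels to use as factor levels.
month_levels <- format(month_starts, "%m/%Y")
month_factor <- factor(month, levels = month_levels)
```

Because the levels are built in calendar order, "01/2021" sorts after "12/2020" instead of alphabetically before "02/2020".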
COVID-19 FORECAST FOR THE NEXT 30 DAYS
Time series forecasting is the use of a model to predict future values based on previously observed values. https://en.wikipedia.org/wiki/Time_series
We will use time series forecasting (TSF), here with the prophet package, to forecast the next 30 days from Mar 31, 2021.
We will start our model at January 1, 2021
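Before trusting any forecast, it can be useful to hold out the most recent days and compare the model against a simple baseline. The sketch below is not the prophet model used in this study; it is a seasonal-naive baseline (repeat the last observed week) scored with mean absolute error on synthetic data, shown only to illustrate the holdout idea.

```r
# Synthetic daily counts: linear growth plus a repeating weekly pattern.
y <- 1000 + 50 * (0:27) + rep(c(0, 100, 200, 100, 0, -200, -300), 4)

# Hold out the last 7 days as a test set.
h <- 7
n <- length(y)
train <- y[1:(n - h)]
test <- y[(n - h + 1):n]

# Seasonal-naive forecast: each day repeats the value from one week earlier.
pred <- tail(train, h)

# Mean absolute error of the baseline on the held-out week.
mae <- mean(abs(test - pred))
```

A forecasting model that cannot beat this baseline on held-out data is unlikely to be useful for planning.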
cases2021 <- dohacc %>% filter(year == "2021")
ds <- cases2021$date
y <- cases2021$confirm
df <- data.frame(ds,y)
m <- prophet(df)
Disabling yearly seasonality. Run prophet with yearly.seasonality=TRUE to override this.
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
future <- make_future_dataframe(m, periods=30)
forecast <- predict(m,future)
dyplot.prophet(m, forecast, main="Jan 2021 - Mar 2021 Data + 30 days forecast")
summary(forecast)
ds trend additive_terms
Min. :2021-01-01 00:00:00 Min. : 1365 Min. :-709.843
1st Qu.:2021-01-30 18:00:00 1st Qu.: 1711 1st Qu.:-504.749
Median :2021-03-01 12:00:00 Median : 2213 Median : 194.788
Mean :2021-03-01 12:00:00 Mean : 5581 Mean : 3.575
3rd Qu.:2021-03-31 06:00:00 3rd Qu.: 9365 3rd Qu.: 428.959
Max. :2021-04-30 00:00:00 Max. :16859 Max. : 456.152
additive_terms_lower additive_terms_upper weekly
Min. :-709.843 Min. :-709.843 Min. :-709.843
1st Qu.:-504.749 1st Qu.:-504.749 1st Qu.:-504.749
Median : 194.788 Median : 194.788 Median : 194.788
Mean : 3.575 Mean : 3.575 Mean : 3.575
3rd Qu.: 428.959 3rd Qu.: 428.959 3rd Qu.: 428.959
Max. : 456.152 Max. : 456.152 Max. : 456.152
weekly_lower weekly_upper multiplicative_terms
Min. :-709.843 Min. :-709.843 Min. :0
1st Qu.:-504.749 1st Qu.:-504.749 1st Qu.:0
Median : 194.788 Median : 194.788 Median :0
Mean : 3.575 Mean : 3.575 Mean :0
3rd Qu.: 428.959 3rd Qu.: 428.959 3rd Qu.:0
Max. : 456.152 Max. : 456.152 Max. :0
multiplicative_terms_lower multiplicative_terms_upper
Min. :0 Min. :0
1st Qu.:0 1st Qu.:0
Median :0 Median :0
Mean :0 Mean :0
3rd Qu.:0 3rd Qu.:0
Max. :0 Max. :0
yhat_lower yhat_upper trend_lower trend_upper
Min. : 50.98 Min. : 1373 Min. : 1365 Min. : 1365
1st Qu.: 1134.15 1st Qu.: 2451 1st Qu.: 1711 1st Qu.: 1711
Median : 1824.53 Median : 3162 Median : 2213 Median : 2213
Mean : 4876.51 Mean : 6296 Mean : 5497 Mean : 5670
3rd Qu.: 8437.57 3rd Qu.: 9837 3rd Qu.: 9365 3rd Qu.: 9365
Max. :16175.00 Max. :18431 Max. :16023 Max. :17743
yhat
Min. : 713.7
1st Qu.: 1789.3
Median : 2491.7
Mean : 5584.8
3rd Qu.: 9116.5
Max. :17287.9
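The summary above mixes fitted values for observed days with predictions for future days. To read the forecast itself, the rows after the last observed date can be selected along with their uncertainty bounds. This sketch uses a small made-up data frame, forecast_toy, with the same column names as a prophet forecast; its numbers are illustrative and are not the model's output.

```r
# Last date with observed data in this study.
last_observed <- as.Date("2021-03-31")

# Made-up stand-in for the forecast data frame (values are illustrative).
forecast_toy <- data.frame(
  ds = seq(as.Date("2021-03-29"), as.Date("2021-04-03"), by = "day"),
  yhat = c(9000, 9500, 10000, 10400, 10900, 11300),
  yhat_lower = c(8300, 8800, 9300, 9700, 10100, 10500),
  yhat_upper = c(9700, 10200, 10700, 11100, 11700, 12100)
)

# Keep only future rows; as.Date() also handles date-time ds values.
is_future <- as.Date(forecast_toy$ds) > last_observed
future_only <- forecast_toy[is_future,
                            c("ds", "yhat", "yhat_lower", "yhat_upper")]

# Prediction for the final day of the forecast window.
last_row <- future_only[which.max(future_only$ds), ]
```

Applied to the real forecast object, last_row would give the predicted value and interval for April 30, 2021 directly, rather than inferring it from the column maxima.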
SUMMARY
Our predictions end at April 30, 2021.
By that date, we anticipate the trend to be between 16,000 and 17,000 cases per day, with extremes between 16,000 and 18,000 cases per day.
This prediction will change as time progresses and as higher or lower daily confirmed cases are recorded.
With this data and prediction, we hope that the government, the business sector, and the public will remain steadfast and cooperative in defeating this global pandemic, while balancing an economic impact that reaches down to the grassroots level.
Prepared By Dodgecarl Incila