R Packages Utilized
library(readr)
library(ggplot2)
library(ggfortify)
library(forecast)
library(psych)
## library(dplyr) is loaded further below because loading it here caused issues when knitting the R Markdown file
Importing the datasets
## dengue_features_train.csv
train <- read_csv("C:/Users/bryce_anderson/Desktop/Boston College/Predictive Analytics and Forecasting/Week 7 (PA&F)/Project datasets/dengue_features_train.csv")
## dengue_features_test.csv
test <- read_csv("C:/Users/bryce_anderson/Desktop/Boston College/Predictive Analytics and Forecasting/Week 7 (PA&F)/Project datasets/dengue_features_test.csv")
## dengue_labels_train.csv
label <- read_csv("C:/Users/bryce_anderson/Desktop/Boston College/Predictive Analytics and Forecasting/Week 7 (PA&F)/Project datasets/dengue_labels_train.csv")
## submission_format.csv
submit <- read_csv("C:/Users/bryce_anderson/Desktop/Boston College/Predictive Analytics and Forecasting/Week 7 (PA&F)/Project datasets/submission_format.csv")
Combining the Label set with the Train dataset
## Joining the label and feature datasets with dplyr's left_join()
library(dplyr)
train <- left_join(x=label, y=train, by=c("year", "weekofyear", "city"))
## Omitting NA values from the Train set
train <- na.omit(train)
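summary(label) below reports 1456 labelled weeks, while summary(train) reports 1199 rows after na.omit(), so 257 weeks are dropped. na.omit() removes a whole week if any single feature is missing, which also leaves gaps in the weekly series used later. The sketch below, on hypothetical toy tables (not the dengue files), shows how to count NAs per column before dropping anything, and one possible alternative: carrying the last observed value forward with a hypothetical helper, fill_forward().

```r
library(dplyr)

# Hypothetical miniature label and feature tables (illustration only)
toy_label <- data.frame(city = "sj", year = 1990, weekofyear = 18:20,
                        total_cases = c(4, 5, 4))
toy_features <- data.frame(city = "sj", year = 1990, weekofyear = 18:20,
                           ndvi_ne = c(0.12, NA, 0.03))

joined <- left_join(toy_label, toy_features, by = c("year", "weekofyear", "city"))
colSums(is.na(joined))          # NA count per column before dropping rows

dropped <- na.omit(joined)      # the week with a missing ndvi_ne disappears
nrow(joined) - nrow(dropped)

# Hypothetical helper: replace each NA with the previous non-missing value
fill_forward <- function(x) {
  for (i in seq_along(x)[-1]) if (is.na(x[i])) x[i] <- x[i - 1]
  x
}
filled <- joined
filled$ndvi_ne <- fill_forward(filled$ndvi_ne)   # keeps all three weeks
```

Forward filling should be applied within each city separately so that Iquitos values never fill San Juan gaps.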
Data Exploration
Train set Exploration and Interpretation
## Summarizing the Train, Test, and Label datasets
summary(train)
## city year weekofyear total_cases
## Length:1199 Min. :1990 Min. : 1.0 Min. : 0.0
## Class :character 1st Qu.:1998 1st Qu.:14.0 1st Qu.: 4.0
## Mode :character Median :2002 Median :26.0 Median : 11.0
## Mean :2001 Mean :26.5 Mean : 21.2
## 3rd Qu.:2006 3rd Qu.:39.0 3rd Qu.: 26.0
## Max. :2010 Max. :52.0 Max. :329.0
## week_start_date ndvi_ne ndvi_nw
## Min. :1990-04-30 Min. :-0.40625 Min. :-0.45610
## 1st Qu.:1998-01-11 1st Qu.: 0.04242 1st Qu.: 0.05159
## Median :2002-09-03 Median : 0.12430 Median : 0.12680
## Mean :2001-10-18 Mean : 0.13972 Mean : 0.13436
## 3rd Qu.:2006-02-01 3rd Qu.: 0.24688 3rd Qu.: 0.22071
## Max. :2010-06-25 Max. : 0.50836 Max. : 0.45443
## ndvi_se ndvi_sw precipitation_amt_mm
## Min. :-0.01553 Min. :-0.06346 Min. : 0.00
## 1st Qu.: 0.15561 1st Qu.: 0.14495 1st Qu.: 12.55
## Median : 0.19673 Median : 0.19230 Median : 41.41
## Mean : 0.20556 Mean : 0.20571 Mean : 47.58
## 3rd Qu.: 0.25255 3rd Qu.: 0.25440 3rd Qu.: 71.77
## Max. : 0.53831 Max. : 0.54602 Max. :390.60
## reanalysis_air_temp_k reanalysis_avg_temp_k reanalysis_dew_point_temp_k
## Min. :294.6 Min. :294.9 Min. :289.6
## 1st Qu.:297.6 1st Qu.:298.3 1st Qu.:294.2
## Median :298.6 Median :299.3 Median :295.7
## Mean :298.7 Mean :299.2 Mean :295.3
## 3rd Qu.:299.8 3rd Qu.:300.2 3rd Qu.:296.5
## Max. :302.2 Max. :302.6 Max. :298.4
## reanalysis_max_air_temp_k reanalysis_min_air_temp_k
## Min. :297.8 Min. :286.9
## 1st Qu.:301.1 1st Qu.:293.6
## Median :302.6 Median :296.0
## Mean :303.7 Mean :295.6
## 3rd Qu.:306.0 3rd Qu.:297.9
## Max. :313.2 Max. :299.9
## reanalysis_precip_amt_kg_per_m2 reanalysis_relative_humidity_percent
## Min. : 0.00 Min. :57.79
## 1st Qu.: 13.53 1st Qu.:77.37
## Median : 28.60 Median :80.80
## Mean : 41.55 Mean :82.63
## 3rd Qu.: 54.75 3rd Qu.:88.16
## Max. :570.50 Max. :98.61
## reanalysis_sat_precip_amt_mm reanalysis_specific_humidity_g_per_kg
## Min. : 0.00 Min. :11.72
## 1st Qu.: 12.55 1st Qu.:15.65
## Median : 41.41 Median :17.17
## Mean : 47.58 Mean :16.81
## 3rd Qu.: 71.77 3rd Qu.:18.01
## Max. :390.60 Max. :20.46
## reanalysis_tdtr_k station_avg_temp_c station_diur_temp_rng_c
## Min. : 1.357 Min. :21.40 Min. : 4.529
## 1st Qu.: 2.357 1st Qu.:26.43 1st Qu.: 6.593
## Median : 3.000 Median :27.45 Median : 7.471
## Mean : 5.137 Mean :27.23 Mean : 8.248
## 3rd Qu.: 8.029 3rd Qu.:28.16 3rd Qu.:10.012
## Max. :16.029 Max. :30.80 Max. :15.800
## station_max_temp_c station_min_temp_c station_precip_mm
## Min. :26.70 Min. :14.70 Min. : 0.00
## 1st Qu.:31.40 1st Qu.:21.10 1st Qu.: 9.70
## Median :32.80 Median :22.10 Median : 24.70
## Mean :32.57 Mean :22.08 Mean : 40.92
## 3rd Qu.:33.90 3rd Qu.:23.30 3rd Qu.: 56.05
## Max. :42.20 Max. :25.60 Max. :543.30
summary(test)
## city year weekofyear week_start_date
## Length:416 Min. :2008 Min. : 1.00 Min. :2008-04-29
## Class :character 1st Qu.:2010 1st Qu.:13.75 1st Qu.:2010-04-28
## Mode :character Median :2011 Median :26.00 Median :2011-05-28
## Mean :2011 Mean :26.44 Mean :2011-04-04
## 3rd Qu.:2012 3rd Qu.:39.00 3rd Qu.:2012-05-27
## Max. :2013 Max. :53.00 Max. :2013-06-25
##
## ndvi_ne ndvi_nw ndvi_se ndvi_sw
## Min. :-0.4634 Min. :-0.21180 Min. :0.0062 Min. :-0.01467
## 1st Qu.:-0.0015 1st Qu.: 0.01597 1st Qu.:0.1487 1st Qu.: 0.13408
## Median : 0.1101 Median : 0.08870 Median :0.2042 Median : 0.18647
## Mean : 0.1260 Mean : 0.12680 Mean :0.2077 Mean : 0.20172
## 3rd Qu.: 0.2633 3rd Qu.: 0.24240 3rd Qu.:0.2549 3rd Qu.: 0.25324
## Max. : 0.5004 Max. : 0.64900 Max. :0.4530 Max. : 0.52904
## NA's :43 NA's :11 NA's :1 NA's :1
## precipitation_amt_mm reanalysis_air_temp_k reanalysis_avg_temp_k
## Min. : 0.000 Min. :294.6 Min. :295.2
## 1st Qu.: 8.175 1st Qu.:297.8 1st Qu.:298.3
## Median : 31.455 Median :298.5 Median :299.3
## Mean : 38.354 Mean :298.8 Mean :299.4
## 3rd Qu.: 57.773 3rd Qu.:300.2 3rd Qu.:300.5
## Max. :169.340 Max. :301.9 Max. :303.3
## NA's :2 NA's :2 NA's :2
## reanalysis_dew_point_temp_k reanalysis_max_air_temp_k
## Min. :290.8 Min. :298.2
## 1st Qu.:294.3 1st Qu.:301.4
## Median :295.8 Median :302.8
## Mean :295.4 Mean :303.6
## 3rd Qu.:296.6 3rd Qu.:305.8
## Max. :297.8 Max. :314.1
## NA's :2 NA's :2
## reanalysis_min_air_temp_k reanalysis_precip_amt_kg_per_m2
## Min. :286.2 Min. : 0.00
## 1st Qu.:293.5 1st Qu.: 9.43
## Median :296.3 Median : 25.85
## Mean :295.7 Mean : 42.17
## 3rd Qu.:298.3 3rd Qu.: 56.48
## Max. :299.7 Max. :301.40
## NA's :2 NA's :2
## reanalysis_relative_humidity_percent reanalysis_sat_precip_amt_mm
## Min. :64.92 Min. : 0.000
## 1st Qu.:77.40 1st Qu.: 8.175
## Median :80.33 Median : 31.455
## Mean :82.50 Mean : 38.354
## 3rd Qu.:88.33 3rd Qu.: 57.773
## Max. :97.98 Max. :169.340
## NA's :2 NA's :2
## reanalysis_specific_humidity_g_per_kg reanalysis_tdtr_k
## Min. :12.54 Min. : 1.486
## 1st Qu.:15.79 1st Qu.: 2.446
## Median :17.34 Median : 2.914
## Mean :16.93 Mean : 5.125
## 3rd Qu.:18.17 3rd Qu.: 8.171
## Max. :19.60 Max. :14.486
## NA's :2 NA's :2
## station_avg_temp_c station_diur_temp_rng_c station_max_temp_c
## Min. :24.16 Min. : 4.043 Min. :27.20
## 1st Qu.:26.51 1st Qu.: 5.929 1st Qu.:31.10
## Median :27.48 Median : 6.643 Median :32.80
## Mean :27.37 Mean : 7.811 Mean :32.53
## 3rd Qu.:28.32 3rd Qu.: 9.812 3rd Qu.:33.90
## Max. :30.27 Max. :14.725 Max. :38.40
## NA's :12 NA's :12 NA's :3
## station_min_temp_c station_precip_mm
## Min. :14.20 Min. : 0.00
## 1st Qu.:21.20 1st Qu.: 9.10
## Median :22.20 Median : 23.60
## Mean :22.37 Mean : 34.28
## 3rd Qu.:23.30 3rd Qu.: 47.75
## Max. :26.70 Max. :212.00
## NA's :9 NA's :5
summary(label)
## city year weekofyear total_cases
## Length:1456 Min. :1990 Min. : 1.00 Min. : 0.00
## Class :character 1st Qu.:1997 1st Qu.:13.75 1st Qu.: 5.00
## Mode :character Median :2002 Median :26.50 Median : 12.00
## Mean :2001 Mean :26.50 Mean : 24.68
## 3rd Qu.:2005 3rd Qu.:39.25 3rd Qu.: 28.00
## Max. :2010 Max. :53.00 Max. :461.00
## Descriptive Statistics of the Training set
describe(train)
## vars n mean sd median
## city* 1 1199 NaN NA NA
## year 2 1199 2001.30 5.35 2002.00
## weekofyear 3 1199 26.50 14.90 26.00
## total_cases 4 1199 21.20 30.86 11.00
## week_start_date 5 1199 NaN NA NA
## ndvi_ne 6 1199 0.14 0.14 0.12
## ndvi_nw 7 1199 0.13 0.12 0.13
## ndvi_se 8 1199 0.21 0.07 0.20
## ndvi_sw 9 1199 0.21 0.09 0.19
## precipitation_amt_mm 10 1199 47.58 43.18 41.41
## reanalysis_air_temp_k 11 1199 298.68 1.36 298.62
## reanalysis_avg_temp_k 12 1199 299.24 1.26 299.32
## reanalysis_dew_point_temp_k 13 1199 295.30 1.50 295.68
## reanalysis_max_air_temp_k 14 1199 303.66 3.30 302.60
## reanalysis_min_air_temp_k 15 1199 295.58 2.59 296.00
## reanalysis_precip_amt_kg_per_m2 16 1199 41.55 44.49 28.60
## reanalysis_relative_humidity_percent 17 1199 82.63 7.30 80.80
## reanalysis_sat_precip_amt_mm 18 1199 47.58 43.18 41.41
## reanalysis_specific_humidity_g_per_kg 19 1199 16.81 1.52 17.17
## reanalysis_tdtr_k 20 1199 5.14 3.59 3.00
## station_avg_temp_c 21 1199 27.23 1.27 27.45
## station_diur_temp_rng_c 22 1199 8.25 2.18 7.47
## station_max_temp_c 23 1199 32.57 1.95 32.80
## station_min_temp_c 24 1199 22.08 1.55 22.10
## station_precip_mm 25 1199 40.92 49.00 24.70
## trimmed mad min max range
## city* NaN NA Inf -Inf -Inf
## year 2001.62 5.93 1990.00 2010.00 20.00
## weekofyear 26.49 19.27 1.00 52.00 51.00
## total_cases 14.78 13.34 0.00 329.00 329.00
## week_start_date NaN NA Inf -Inf -Inf
## ndvi_ne 0.14 0.15 -0.41 0.51 0.91
## ndvi_nw 0.13 0.12 -0.46 0.45 0.91
## ndvi_se 0.20 0.07 -0.02 0.54 0.55
## ndvi_sw 0.20 0.08 -0.06 0.55 0.61
## precipitation_amt_mm 42.52 43.90 0.00 390.60 390.60
## reanalysis_air_temp_k 298.69 1.55 294.64 302.20 7.56
## reanalysis_avg_temp_k 299.28 1.43 294.89 302.61 7.72
## reanalysis_dew_point_temp_k 295.44 1.49 289.64 298.45 8.81
## reanalysis_max_air_temp_k 303.37 2.97 297.80 313.20 15.40
## reanalysis_min_air_temp_k 295.77 2.97 286.90 299.90 13.00
## reanalysis_precip_amt_kg_per_m2 33.67 26.24 0.00 570.50 570.50
## reanalysis_relative_humidity_percent 82.23 6.17 57.79 98.61 40.82
## reanalysis_sat_precip_amt_mm 42.52 43.90 0.00 390.60 390.60
## reanalysis_specific_humidity_g_per_kg 16.91 1.55 11.72 20.46 8.75
## reanalysis_tdtr_k 4.64 1.42 1.36 16.03 14.67
## station_avg_temp_c 27.31 1.24 21.40 30.80 9.40
## station_diur_temp_rng_c 8.06 1.87 4.53 15.80 11.27
## station_max_temp_c 32.64 1.63 26.70 42.20 15.50
## station_min_temp_c 22.11 1.63 14.70 25.60 10.90
## station_precip_mm 31.74 27.43 0.00 543.30 543.30
## skew kurtosis se
## city* NA NA NA
## year -0.48 -0.75 0.15
## weekofyear 0.00 -1.20 0.43
## total_cases 3.80 21.13 0.89
## week_start_date NA NA NA
## ndvi_ne -0.08 -0.14 0.00
## ndvi_nw -0.06 0.13 0.00
## ndvi_se 0.57 0.44 0.00
## ndvi_sw 0.72 0.58 0.00
## precipitation_amt_mm 1.73 7.45 1.25
## reanalysis_air_temp_k -0.07 -0.67 0.04
## reanalysis_avg_temp_k -0.23 -0.49 0.04
## reanalysis_dew_point_temp_k -0.77 0.03 0.04
## reanalysis_max_air_temp_k 0.74 -0.44 0.10
## reanalysis_min_air_temp_k -0.57 -0.39 0.07
## reanalysis_precip_amt_kg_per_m2 3.41 22.70 1.28
## reanalysis_relative_humidity_percent 0.49 -0.59 0.21
## reanalysis_sat_precip_amt_mm 1.73 7.45 1.25
## reanalysis_specific_humidity_g_per_kg -0.58 -0.39 0.04
## reanalysis_tdtr_k 0.92 -0.52 0.10
## station_avg_temp_c -0.62 0.03 0.04
## station_diur_temp_rng_c 0.70 -0.54 0.06
## station_max_temp_c -0.27 0.30 0.06
## station_min_temp_c -0.28 0.29 0.04
## station_precip_mm 2.94 14.89 1.41
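Several describe() columns can be reproduced with base R, which helps when reading the table: as far as I know, psych::describe() uses a 10% trimmed mean by default, stats::mad() (scaled by 1.4826), and se = sd / sqrt(n). The vector x below is hypothetical; the last line is a worked check against the total_cases row, where a trimmed mean of 14.78 against a mean of 21.20 also reflects the strong right skew (3.80).

```r
x <- c(0, 2, 5, 11, 26, 329)   # hypothetical weekly case counts
n <- length(x)

c(
  trimmed = mean(x, trim = 0.1),  # trimmed mean, 10% trimmed from each end
  mad     = mad(x),               # median absolute deviation, scaled by 1.4826
  se      = sd(x) / sqrt(n)       # standard error of the mean
)

# Worked check using the describe(train) figures for total_cases
se_total_cases <- 30.86 / sqrt(1199)   # about 0.89, as reported above
```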
Significance of variables for total_cases in the Train set
lmTrain <- lm(total_cases ~ ., data=train)
summary(lmTrain)
##
## Call:
## lm(formula = total_cases ~ ., data = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -42.391 -13.736 -3.699 6.159 283.066
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value
## (Intercept) -4.938e+04 2.716e+04 -1.818
## citysj 9.901e+00 1.207e+01 0.820
## year 2.921e+01 1.379e+01 2.118
## weekofyear 7.471e-01 2.637e-01 2.833
## week_start_date -8.388e-02 3.777e-02 -2.221
## ndvi_ne -1.386e+01 1.170e+01 -1.185
## ndvi_nw 8.681e+00 1.336e+01 0.650
## ndvi_se -1.471e+01 1.915e+01 -0.768
## ndvi_sw 7.784e+00 1.797e+01 0.433
## precipitation_amt_mm -4.947e-03 2.332e-02 -0.212
## reanalysis_air_temp_k 6.914e+00 1.140e+01 0.606
## reanalysis_avg_temp_k -1.051e+01 5.013e+00 -2.096
## reanalysis_dew_point_temp_k -2.664e+01 1.348e+01 -1.977
## reanalysis_max_air_temp_k 1.583e+00 1.118e+00 1.416
## reanalysis_min_air_temp_k -3.253e-01 1.529e+00 -0.213
## reanalysis_precip_amt_kg_per_m2 -1.671e-02 2.449e-02 -0.682
## reanalysis_relative_humidity_percent -1.016e+00 2.497e+00 -0.407
## reanalysis_sat_precip_amt_mm NA NA NA
## reanalysis_specific_humidity_g_per_kg 3.661e+01 9.345e+00 3.918
## reanalysis_tdtr_k -6.440e-02 1.558e+00 -0.041
## station_avg_temp_c -1.772e+00 1.881e+00 -0.942
## station_diur_temp_rng_c 2.617e-01 1.164e+00 0.225
## station_max_temp_c -1.394e+00 1.065e+00 -1.308
## station_min_temp_c 6.950e-01 1.277e+00 0.544
## station_precip_mm -9.822e-04 1.928e-02 -0.051
## Pr(>|t|)
## (Intercept) 0.06931 .
## citysj 0.41223
## year 0.03442 *
## weekofyear 0.00468 **
## week_start_date 0.02656 *
## ndvi_ne 0.23626
## ndvi_nw 0.51611
## ndvi_se 0.44269
## ndvi_sw 0.66508
## precipitation_amt_mm 0.83202
## reanalysis_air_temp_k 0.54434
## reanalysis_avg_temp_k 0.03628 *
## reanalysis_dew_point_temp_k 0.04832 *
## reanalysis_max_air_temp_k 0.15711
## reanalysis_min_air_temp_k 0.83151
## reanalysis_precip_amt_kg_per_m2 0.49523
## reanalysis_relative_humidity_percent 0.68408
## reanalysis_sat_precip_amt_mm NA
## reanalysis_specific_humidity_g_per_kg 9.45e-05 ***
## reanalysis_tdtr_k 0.96704
## station_avg_temp_c 0.34657
## station_diur_temp_rng_c 0.82223
## station_max_temp_c 0.19096
## station_min_temp_c 0.58629
## station_precip_mm 0.95938
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 26.97 on 1175 degrees of freedom
## Multiple R-squared: 0.2509, Adjusted R-squared: 0.2362
## F-statistic: 17.11 on 23 and 1175 DF, p-value: < 2.2e-16
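The coefficient for reanalysis_sat_precip_amt_mm is NA ("1 not defined because of singularities"). Its summary statistics above are identical to those of precipitation_amt_mm (median 41.41, mean 47.58, max 390.60), which suggests the two columns hold the same values, so lm() cannot separate their effects. A minimal sketch on hypothetical data, using base R's alias() to report which term is an exact linear function of the others:

```r
set.seed(1)
d <- data.frame(x1 = rnorm(50))
d$x2 <- d$x1                     # exact copy of x1 (hypothetical duplicate column)
d$y  <- 2 * d$x1 + rnorm(50)

fit <- lm(y ~ x1 + x2, data = d)
coef(fit)                        # x2 is NA, as with reanalysis_sat_precip_amt_mm
alias(fit)$Complete              # shows x2 as a linear combination of x1
```

Dropping one of the duplicated columns before fitting removes the singularity without changing the fitted values.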
autoplot(lmTrain)
checkresiduals(lmTrain)
##
## Breusch-Godfrey test for serial correlation of order up to 28
##
## data: Residuals
## LM test = 985.69, df = 28, p-value < 2.2e-16
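The Breusch-Godfrey statistic (LM = 985.69, p < 2.2e-16) indicates strong serial correlation in the residuals, so the ordinary lm() standard errors and p-values above are likely too optimistic. The same kind of test can be run directly with lmtest::bgtest() (lmtest is installed alongside forecast). The sketch below simulates a regression with AR(1) errors; the data and the choice of order = 10 are assumptions for illustration.

```r
library(lmtest)

set.seed(42)
n <- 200
x <- rnorm(n)
e <- as.numeric(arima.sim(list(ar = 0.8), n = n))  # autocorrelated errors (simulated)
y <- 1 + 2 * x + e

fit <- lm(y ~ x)
bg <- bgtest(fit, order = 10)   # H0: no serial correlation up to lag 10
bg
```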
Variable Exploration of Train set
## Plotting total cases by the time components
par(bg="grey")
plot(total_cases ~ year, data=train, col="Dodgerblue", bg="grey", ylab="Total Cases of Dengue Fever", xlab="Year", main="Cases of Dengue Fever per Year")
plot(total_cases ~ weekofyear, data=train, col="firebrick4", bg="grey", ylab="Total Cases of Dengue Fever", xlab="Week of Year", main="Cases of Dengue Fever per Week of Year")
plot(total_cases ~ week_start_date, data=train, col="mediumpurple", bg="grey", ylab="Total Cases of Dengue Fever", xlab="Week Start Date", main="Cases of Dengue Fever per Week Start Date")
## Plotting Variables for Total Cases
plot(total_cases ~ reanalysis_avg_temp_k, data=train, col="coral", bg="grey", ylab="Total Cases of Dengue Fever", xlab="Average Temperature (Kelvin)", main="Cases of Dengue Fever per Average Temperature")
plot(total_cases ~ reanalysis_dew_point_temp_k, data=train, col="springgreen4", bg="grey", ylab="Total Cases of Dengue Fever", xlab="Average Dewpoint Temperature (Kelvin)", main="Cases of Dengue Fever per Average Dewpoint Temperature")
plot(total_cases ~ reanalysis_specific_humidity_g_per_kg, data=train, col="goldenrod4", bg="grey", ylab="Total Cases of Dengue Fever", xlab="Specific Humidity (g per kg)", main="Cases of Dengue Fever per Specific Humidity")
plot(total_cases ~ reanalysis_max_air_temp_k, data=train, col="cornflowerblue", bg="grey", ylab="Total Cases of Dengue Fever", xlab="Maximum Air Temperature (Kelvin)", main="Cases of Dengue Fever per Maximum Air Temperature")
Splitting the Train set by City
library(dplyr)
## Using dplyr's filter() function to split the joined dataset by city
## San Juan filter
sj <- train %>% filter(city=="sj")
## Iquitos filter
iq <- train %>% filter(city=="iq")
Linear Regression for each City
## San Juan set
lm1 <- lm(total_cases ~ year + weekofyear + week_start_date + reanalysis_avg_temp_k + reanalysis_dew_point_temp_k + reanalysis_specific_humidity_g_per_kg, data=sj)
summary(lm1)
##
## Call:
## lm(formula = total_cases ~ year + weekofyear + week_start_date +
## reanalysis_avg_temp_k + reanalysis_dew_point_temp_k + reanalysis_specific_humidity_g_per_kg,
## data = sj)
##
## Residuals:
## Min 1Q Median 3Q Max
## -48.788 -18.127 -6.939 8.676 280.846
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) -6.405e+04 3.847e+04 -1.665
## year 4.059e+01 1.962e+01 2.068
## weekofyear 1.096e+00 3.726e-01 2.941
## week_start_date -1.153e-01 5.375e-02 -2.145
## reanalysis_avg_temp_k 2.168e+00 2.538e+00 0.854
## reanalysis_dew_point_temp_k -5.943e+01 1.457e+01 -4.079
## reanalysis_specific_humidity_g_per_kg 6.320e+01 1.479e+01 4.274
## Pr(>|t|)
## (Intercept) 0.09636 .
## year 0.03896 *
## weekofyear 0.00337 **
## week_start_date 0.03229 *
## reanalysis_avg_temp_k 0.39325
## reanalysis_dew_point_temp_k 5.03e-05 ***
## reanalysis_specific_humidity_g_per_kg 2.18e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 33.15 on 720 degrees of freedom
## Multiple R-squared: 0.165, Adjusted R-squared: 0.1581
## F-statistic: 23.72 on 6 and 720 DF, p-value: < 2.2e-16
plot(lm1)
## Iquitos set
lm2 <- lm(total_cases ~ year + weekofyear + week_start_date + reanalysis_avg_temp_k + reanalysis_dew_point_temp_k + reanalysis_specific_humidity_g_per_kg, data=iq)
summary(lm2)
##
## Call:
## lm(formula = total_cases ~ year + weekofyear + week_start_date +
## reanalysis_avg_temp_k + reanalysis_dew_point_temp_k + reanalysis_specific_humidity_g_per_kg,
## data = iq)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.444 -4.902 -2.564 1.255 74.302
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) -4.463e+03 1.814e+04 -0.246
## year 3.362e+00 9.195e+00 0.366
## weekofyear 5.541e-02 1.756e-01 0.316
## week_start_date -7.942e-03 2.517e-02 -0.316
## reanalysis_avg_temp_k 1.281e-01 3.673e-01 0.349
## reanalysis_dew_point_temp_k -8.002e+00 4.914e+00 -1.628
## reanalysis_specific_humidity_g_per_kg 9.185e+00 4.829e+00 1.902
## Pr(>|t|)
## (Intercept) 0.8058
## year 0.7148
## weekofyear 0.7525
## week_start_date 0.7525
## reanalysis_avg_temp_k 0.7274
## reanalysis_dew_point_temp_k 0.1041
## reanalysis_specific_humidity_g_per_kg 0.0578 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.128 on 465 degrees of freedom
## Multiple R-squared: 0.09257, Adjusted R-squared: 0.08086
## F-statistic: 7.906 on 6 and 465 DF, p-value: 3.968e-08
plot(lm2)
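lm1 and lm2 use the same formula on different subsets. One way to avoid repeating the formula is to fit one model per city with split() and lapply(). A minimal sketch on hypothetical toy data (the humidity column is a stand-in, not a dengue feature):

```r
set.seed(10)
toy <- data.frame(
  city     = rep(c("sj", "iq"), each = 30),
  humidity = runif(60, 12, 20)          # hypothetical predictor
)
toy$total_cases <- round(5 + 2 * toy$humidity + rnorm(60))

# One lm() per city, stored in a named list
fits <- lapply(split(toy, toy$city),
               function(d) lm(total_cases ~ humidity, data = d))

sapply(fits, function(f) summary(f)$adj.r.squared)  # compare fit across cities
```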
Creating a time series for each City dataset
## frequency defaults to 1, so these series have no week-of-year seasonal period
sjTS <- ts(sj$total_cases, start=c(1))
iqTS <- ts(iq$total_cases, start=c(1))
## Summaries of the time series sets
summary(sjTS)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 9.00 18.00 30.21 36.00 329.00
summary(iqTS)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.000 5.000 7.309 9.000 83.000
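Because sjTS and iqTS have frequency 1, the time axis is simply the observation index and no model can pick up an annual cycle; this is one likely reason ets(model="ZZZ") below settles on a non-seasonal ETS(A,N,N). A sketch contrasting the default with a weekly frequency of 52; the counts and the start of week 18 of 1990 are hypothetical, chosen only for illustration.

```r
cases <- c(rep(5, 52), rep(10, 52))   # two hypothetical years of weekly counts

y1  <- ts(cases, start = 1)                             # default: frequency 1
y52 <- ts(cases, start = c(1990, 18), frequency = 52)   # weekly, year-based time axis

frequency(y1)
frequency(y52)
start(y52)    # year and week of the first observation
```

Note that rows removed by na.omit() earlier mean consecutive observations are not always consecutive weeks, which a regular ts() cannot represent.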
Model Formulation
Model 1 - ETS Model
## San Juan
smodel1 <- ets(sjTS, model="ZZZ", damped=NULL, alpha=NULL, beta=NULL,
gamma=NULL, phi=NULL, lambda=NULL, biasadj=FALSE,
additive.only=FALSE, restrict=TRUE,
allow.multiplicative.trend=FALSE)
smodel1
## ETS(A,N,N)
##
## Call:
## ets(y = sjTS, model = "ZZZ", damped = NULL, alpha = NULL, beta = NULL,
##
## Call:
## gamma = NULL, phi = NULL, additive.only = FALSE, lambda = NULL,
##
## Call:
## biasadj = FALSE, restrict = TRUE, allow.multiplicative.trend = FALSE)
##
## Smoothing parameters:
## alpha = 0.9999
##
## Initial states:
## l = 4.1836
##
## sigma: 12.4615
##
## AIC AICc BIC
## 8462.075 8462.108 8475.842
forecast(smodel1, h=10)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 728 4.9996 -10.97050 20.96970 -19.42456 29.42376
## 729 4.9996 -17.58440 27.58360 -29.53964 39.53884
## 730 4.9996 -22.65957 32.65877 -37.30146 47.30066
## 731 4.9996 -26.93820 36.93740 -43.84505 53.84425
## 732 4.9996 -30.70776 40.70696 -49.61010 59.60930
## 733 4.9996 -34.11573 44.11493 -54.82213 64.82133
## 734 4.9996 -37.24968 47.24888 -59.61510 69.61430
## 735 4.9996 -40.16670 50.16590 -64.07630 74.07550
## 736 4.9996 -42.90643 52.90563 -68.26635 78.26555
## 737 4.9996 -45.49773 55.49693 -72.22941 82.22861
sjf1 <- forecast(smodel1, h=52)
autoplot(sjf1) + xlab("Week") + ylab("Total Cases of Dengue Fever")
checkresiduals(sjf1)
##
## Ljung-Box test
##
## data: Residuals from ETS(A,N,N)
## Q* = 30.582, df = 8, p-value = 0.0001667
##
## Model df: 2. Total lags used: 10
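ETS(A,N,N) is simple exponential smoothing: each week the level moves toward the new observation by a fraction alpha, and every future point forecast equals the final level, which is why the forecast table is flat. With alpha = 0.9999 the level is essentially the last observation. A hand-written sketch of the recursion; the input values are hypothetical, while alpha and l0 are taken from the San Juan output.

```r
# Level recursion: l_t = alpha * y_t + (1 - alpha) * l_{t-1}
ses_forecast <- function(y, alpha, l0) {
  level <- l0
  for (obs in y) {
    level <- alpha * obs + (1 - alpha) * level
  }
  level   # flat point forecast for every horizon h
}

ses_forecast(c(8, 6, 5), alpha = 0.9999, l0 = 4.1836)  # about 5, the last value
```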
## Iquitos
imodel1 <- ets(iqTS, model="ZZZ", damped=NULL, alpha=NULL, beta=NULL,
gamma=NULL, phi=NULL, lambda=NULL, biasadj=FALSE,
additive.only=FALSE, restrict=TRUE,
allow.multiplicative.trend=FALSE)
imodel1
## ETS(A,N,N)
##
## Call:
## ets(y = iqTS, model = "ZZZ", damped = NULL, alpha = NULL, beta = NULL,
##
## Call:
## gamma = NULL, phi = NULL, additive.only = FALSE, lambda = NULL,
##
## Call:
## biasadj = FALSE, restrict = TRUE, allow.multiplicative.trend = FALSE)
##
## Smoothing parameters:
## alpha = 0.5619
##
## Initial states:
## l = 0.1055
##
## sigma: 6.5424
##
## AIC AICc BIC
## 4683.212 4683.263 4695.683
forecast(imodel1, h=10)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 473 3.176212 -5.208239 11.56066 -9.646699 15.99912
## 474 3.176212 -6.441405 12.79383 -11.532664 17.88509
## 475 3.176212 -7.533508 13.88593 -13.202891 19.55531
## 476 3.176212 -8.524114 14.87654 -14.717893 21.07032
## 477 3.176212 -9.437161 15.78958 -16.114277 22.46670
## 478 3.176212 -10.288434 16.64086 -17.416188 23.76861
## 479 3.176212 -11.088999 17.44142 -18.640545 24.99297
## 480 3.176212 -11.846963 18.19939 -19.799751 26.15217
## 481 3.176212 -12.568480 18.92090 -20.903216 27.25564
## 482 3.176212 -13.258351 19.61077 -21.958282 28.31071
iqf1 <- forecast(imodel1, h=52)
autoplot(iqf1) + xlab("Week") + ylab("Total Cases of Dengue Fever")
checkresiduals(iqf1)
##
## Ljung-Box test
##
## data: Residuals from ETS(A,N,N)
## Q* = 18.462, df = 8, p-value = 0.01802
##
## Model df: 2. Total lags used: 10
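For ETS(A,N,N) the h-step forecast variance is sigma^2 * (1 + (h - 1) * alpha^2), so the intervals widen with the horizon even though the point forecast stays flat. The sketch below recomputes the 80% limits from the printed sigma and alpha; the helper ann_interval() is hypothetical, and results match the tables above up to rounding of the printed parameters.

```r
# Normal prediction interval for an ETS(A,N,N) forecast at horizon h
ann_interval <- function(point, sigma, alpha, h, level = 80) {
  z  <- qnorm(0.5 + level / 200)                 # 1.2816 for an 80% interval
  se <- sigma * sqrt(1 + (h - 1) * alpha^2)      # h-step forecast standard deviation
  c(lo = point - z * se, hi = point + z * se)
}

ann_interval(3.176212, sigma = 6.5424, alpha = 0.5619, h = 1)   # Iquitos, week 473
ann_interval(4.9996, sigma = 12.4615, alpha = 0.9999, h = 10)   # San Juan, week 737
```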
Model 2 - ARIMA Model
## San Juan
smodel2 <- auto.arima(sjTS)
smodel2
## Series: sjTS
## ARIMA(2,1,2)
##
## Coefficients:
## ar1 ar2 ma1 ma2
## -1.0539 -0.4930 1.1509 0.6432
## s.e. 0.1614 0.1615 0.1429 0.1423
##
## sigma^2 estimated as 151.3: log likelihood=-2850.29
## AIC=5710.58 AICc=5710.67 BIC=5733.52
forecast(smodel2, h=10)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 728 5.171452 -10.59441 20.93732 -18.94035 29.28326
## 729 5.733913 -17.66979 29.13762 -30.05897 41.52679
## 730 5.056630 -24.49946 34.61272 -40.14551 50.25877
## 731 5.493073 -28.35739 39.34354 -46.27675 57.26289
## 732 5.367052 -32.86062 43.59472 -53.09713 63.83123
## 733 5.284677 -36.63312 47.20248 -58.82307 69.39242
## 734 5.433621 -39.86478 50.73203 -63.84431 74.71155
## 735 5.317270 -43.23581 53.87035 -68.93826 79.57280
## 736 5.366452 -46.13232 56.86523 -73.39413 84.12703
## 737 5.371987 -48.96863 59.71260 -77.73481 88.47879
sjf2 <- forecast(smodel2, h=52)
autoplot(sjf2) + xlab("Week") + ylab("Total Cases of Dengue Fever")
checkresiduals(sjf2)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,1,2)
## Q* = 10.112, df = 6, p-value = 0.12
##
## Model df: 4. Total lags used: 10
## Iquitos
imodel2 <- auto.arima(iqTS)
imodel2
## Series: iqTS
## ARIMA(0,1,1)
##
## Coefficients:
## ma1
## -0.4370
## s.e. 0.0407
##
## sigma^2 estimated as 42.8: log likelihood=-1552.61
## AIC=3109.22 AICc=3109.24 BIC=3117.53
forecast(imodel2, h=10)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 473 3.176061 -5.208391 11.56051 -9.646852 15.99897
## 474 3.176061 -6.446017 12.79814 -11.539638 17.89176
## 475 3.176061 -7.541669 13.89379 -13.215293 19.56741
## 476 3.176061 -8.535262 14.88738 -14.734862 21.08698
## 477 3.176061 -9.450912 15.80303 -16.135227 22.48735
## 478 3.176061 -10.304510 16.65663 -17.440693 23.79281
## 479 3.176061 -11.107186 17.45931 -18.668280 25.02040
## 480 3.176061 -11.867093 18.21922 -19.830458 26.18258
## 481 3.176061 -12.590417 18.94254 -20.936687 27.28881
## 482 3.176061 -13.281982 19.63410 -21.994344 28.34647
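The Iquitos ARIMA(0,1,1) and ETS(A,N,N) forecasts are nearly identical (3.176061 vs 3.176212). This is expected: ARIMA(0,1,1) with MA coefficient theta gives the same one-step forecasts as simple exponential smoothing with alpha = 1 + theta, and 1 + (-0.4370) = 0.563 is close to the ETS estimate alpha = 0.5619. The sketch writes both recursions by hand on hypothetical counts and checks that they coincide.

```r
# SES: f[t+1] = f[t] + alpha * (y[t] - f[t])
ses_path <- function(y, alpha, start) {
  f <- numeric(length(y) + 1); f[1] <- start
  for (t in seq_along(y)) f[t + 1] <- f[t] + alpha * (y[t] - f[t])
  f
}

# ARIMA(0,1,1): y_t = y_{t-1} + e_t + theta * e_{t-1}, so f[t+1] = y[t] + theta * e[t]
arima011_path <- function(y, theta, start) {
  f <- numeric(length(y) + 1); f[1] <- start
  for (t in seq_along(y)) {
    e <- y[t] - f[t]              # one-step forecast error
    f[t + 1] <- y[t] + theta * e
  }
  f
}

y <- c(2, 5, 3, 8, 4, 6)          # hypothetical weekly counts
theta <- -0.4370                  # ma1 from imodel2
all.equal(ses_path(y, 1 + theta, start = 0), arima011_path(y, theta, start = 0))
```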
iqf2 <- forecast(imodel2, h=52)
autoplot(iqf2) + xlab("Week") + ylab("Total Cases of Dengue Fever")
checkresiduals(iqf2)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,1)
## Q* = 18.398, df = 9, p-value = 0.03082
##
## Model df: 1. Total lags used: 10
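The Ljung-Box degrees of freedom printed by checkresiduals() are the number of lags minus the number of fitted model parameters: 10 - 1 = 9 for ARIMA(0,1,1) and 10 - 4 = 6 for ARIMA(2,1,2). The same test is available in base R through Box.test() with fitdf. A sketch on a simulated ARIMA(0,1,1) series (the data and ma = -0.4 are assumptions):

```r
set.seed(7)
y <- arima.sim(list(order = c(0, 1, 1), ma = -0.4), n = 300)  # simulated series
fit <- arima(y, order = c(0, 1, 1))

# fitdf = number of estimated ARMA coefficients, so df = lag - fitdf = 9
bt <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 1)
bt
```

A p-value below 0.05, as for Iquitos (0.03082), suggests some autocorrelation remains in the residuals.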
Model 3 - Holt-Winters’ Seasonal Method
## San Juan
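## frequency=3 imposes a 3-week seasonal cycle; hw() with its default settings calls ets(), which rejects seasonal periods above 24, so a 52-week cycle cannot be used here
sjHW <- ts(sj$total_cases, frequency=3, start=c(1))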
smodel3 <- hw(sjHW, seasonal="additive", damped=FALSE, h=52)
autoplot(smodel3) + xlab("Week") + ylab("Total Cases of Dengue Fever")
smodel3
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 243.3333 4.629310 -11.40265 20.66127 -19.88946 29.14808
## 243.6667 4.919338 -17.75327 27.59195 -29.75542 39.59410
## 244.0000 5.655215 -22.11387 33.42430 -36.81393 48.12437
## 244.3333 5.284527 -26.78259 37.35164 -43.75789 54.32695
## 244.6667 5.574556 -30.27900 41.42811 -49.25873 60.40784
## 245.0000 6.310433 -32.96687 45.58774 -53.75902 66.37989
## 245.3333 5.939745 -36.48712 48.36661 -58.94655 70.82604
## 245.6667 6.229773 -39.12848 51.58803 -63.13969 75.59923
## 246.0000 6.965650 -41.14625 55.07755 -66.61515 80.54645
## 246.3333 6.594962 -44.12231 57.31223 -70.97041 84.16033
## 246.6667 6.884990 -46.31020 60.08018 -74.47003 88.24001
## 247.0000 7.620867 -47.94230 63.18404 -77.35567 92.59740
## 247.3333 7.250179 -50.58499 65.08535 -81.20108 95.70144
## 247.6667 7.540207 -52.48102 67.56144 -84.25434 99.33475
## 248.0000 8.276084 -53.85475 70.40692 -86.74482 103.29699
## 248.3333 7.905396 -56.26653 72.07732 -90.23709 106.04789
## 248.6667 8.195424 -57.95464 74.34549 -92.97237 109.36322
## 249.0000 8.931301 -59.13983 77.00243 -95.17450 113.03711
## 249.3333 8.560613 -61.37956 78.50079 -98.40366 115.52488
## 249.6667 8.850642 -62.90992 80.61121 -100.89767 118.59895
## 250.0000 9.586519 -63.94973 83.12277 -102.87747 122.05051
## 250.3333 9.215830 -66.05491 84.48657 -105.90084 124.33250
## 250.6667 9.505859 -67.46030 86.47202 -108.20372 127.21544
## 251.0000 10.241736 -68.38361 88.86709 -110.00536 130.48883
## 251.3333 9.871048 -70.37984 90.12194 -112.86209 132.60419
## 251.6667 10.161076 -71.68307 92.00522 -115.00874 135.33090
## 252.0000 10.896953 -72.51033 94.30424 -116.66348 138.45739
## 252.3333 10.526265 -74.41600 95.46853 -119.38173 140.43426
## 252.6667 10.816293 -75.63371 97.26630 -121.39758 143.03016
## 253.0000 11.552170 -76.38001 99.48435 -122.92850 146.03284
## 253.3333 11.181482 -78.20888 100.57185 -125.52928 147.89225
## 253.6667 11.471510 -79.35363 102.29665 -127.43356 150.37658
## 254.0000 12.207387 -80.03049 104.44527 -128.85828 153.27305
## 254.3333 11.836699 -81.79316 105.46656 -131.35781 155.03121
## 254.6667 12.126728 -82.87471 107.12817 -133.16544 157.41889
## 255.0000 12.862604 -83.49116 109.21637 -134.49777 160.22298
## 255.3333 12.491916 -85.19599 110.17982 -136.90884 161.89268
## 255.6667 12.781945 -86.22212 111.78601 -138.63171 164.19560
## 256.0000 13.517822 -86.78540 113.82104 -139.88271 166.91835
## 256.3333 13.147134 -88.43913 114.73340 -142.21565 168.50991
## 256.6667 13.437162 -89.41614 116.29046 -143.86339 170.73771
## 257.0000 14.173039 -89.93213 118.27821 -145.04208 173.38816
## 257.3333 13.802351 -91.54030 119.14501 -147.30534 174.91004
## 257.6667 14.092379 -92.47339 120.65815 -148.88590 177.07066
## 258.0000 14.828256 -92.94699 122.60350 -149.99975 179.65626
## 258.3333 14.457568 -94.51420 123.42933 -152.20037 181.11550
## 258.6667 14.747596 -95.40770 124.90289 -153.72039 183.21558
## 259.0000 15.483473 -95.84300 126.80995 -154.77568 185.74262
## 259.3333 15.112785 -97.37313 127.59870 -156.91958 187.14515
## 259.6667 15.402814 -98.23072 129.03635 -158.38468 189.19031
## 260.0000 16.138690 -98.63121 130.90859 -159.38672 191.66410
## 260.3333 15.768002 -100.12757 131.66357 -161.47898 193.01498
summary(smodel3)
##
## Forecast method: Holt-Winters' additive method
##
## Model Information:
## Holt-Winters' additive method
##
## Call:
## hw(y = sjHW, h = 52, seasonal = "additive", damped = FALSE)
##
## Smoothing parameters:
## alpha = 0.9999
## beta = 1e-04
## gamma = 1e-04
##
## Initial states:
## l = 3.5836
## b = 0.235
## s = -0.1627 -0.223 0.3858
##
## sigma: 12.5098
##
## AIC AICc BIC
## 8472.665 8472.866 8509.377
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.2277723 12.44943 7.773029 -Inf Inf 0.6154498 0.09173901
##
checkresiduals(smodel3)
##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' additive method
## Q* = 30.484, df = 3, p-value = 1.091e-06
##
## Model df: 7. Total lags used: 10
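A 3-week seasonal period has no clear epidemiological meaning for weekly dengue counts, and the Ljung-Box test (p = 1.091e-06) shows the residuals are still autocorrelated. One alternative that accepts a 52-week period is forecast::stlf(), which removes an STL seasonal component and forecasts the adjusted series (here with ETS). A sketch on a hypothetical series with an annual cycle; using stlf() is a swapped-in technique, not part of the original analysis.

```r
library(forecast)

set.seed(3)
weeks <- 1:(4 * 52)
# Hypothetical weekly counts: annual sine cycle plus noise (not the dengue data)
cases <- 20 + 10 * sin(2 * pi * weeks / 52) + rnorm(length(weeks), sd = 2)
y <- ts(cases, frequency = 52)

fc <- stlf(y, h = 52, method = "ets")   # STL decomposition + ETS on the adjusted series
autoplot(fc) + xlab("Week") + ylab("Total Cases")
```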
## Iquitos
## Same 3-week seasonal period as the San Juan model
iqHW <- ts(iq$total_cases, frequency=3, start=c(1))
imodel3 <- hw(iqHW, seasonal="additive", damped=FALSE, h=52)
autoplot(imodel3) + xlab("Week") + ylab("Total Cases of Dengue Fever")
imodel3
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 158.3333 2.820597 -5.618911 11.26011 -10.08652 15.72771
## 158.6667 3.142969 -6.515785 12.80172 -11.62882 17.91476
## 159.0000 3.197722 -7.543121 13.93856 -13.22898 19.62442
## 159.3333 2.784539 -8.939615 14.50869 -15.14601 20.71509
## 159.6667 3.106911 -9.524235 15.73806 -16.21076 22.42458
## 160.0000 3.161663 -10.315868 16.63920 -17.45044 23.77377
## 160.3333 2.748481 -11.525894 17.02286 -19.08229 24.57925
## 160.6667 3.070853 -11.958175 18.09988 -19.91406 26.05577
## 161.0000 3.125605 -12.622206 18.87342 -20.95859 27.20980
## 161.3333 2.712423 -13.723249 19.14810 -22.42377 27.84862
## 161.6667 3.034795 -14.061085 20.13067 -23.11110 29.18069
## 162.0000 3.089547 -14.642199 20.82129 -24.02882 30.20791
## 162.3333 2.676365 -15.669655 21.02239 -25.38145 30.73418
## 162.6667 2.998737 -15.941646 21.93912 -25.96808 31.96555
## 163.0000 3.053489 -16.463368 22.57035 -26.79497 32.90195
## 163.3333 2.640307 -17.436875 22.71749 -28.06509 33.34571
## 163.6667 2.962679 -17.659609 23.58497 -28.57639 34.50175
## 164.0000 3.017431 -18.136108 24.17097 -29.33412 35.36898
## 164.3333 2.604249 -19.067891 24.27639 -30.54043 35.74893
## 164.6667 2.926621 -19.251997 25.10524 -30.99265 36.84589
## 165.0000 2.981373 -19.692587 25.65533 -31.69546 37.65820
## 165.3333 2.568191 -20.590863 25.72724 -32.85053 37.98691
## 165.6667 2.890562 -20.743631 26.52476 -33.25482 39.03594
## 166.0000 2.945315 -21.154818 27.04545 -33.91266 39.80329
## 166.3333 2.532133 -22.025425 27.08969 -35.02541 40.08968
## 166.6667 2.854504 -22.152112 27.86112 -35.38982 41.09882
## 167.0000 2.909257 -22.538653 28.35717 -36.00996 41.82848
## 167.3333 2.496074 -23.385913 28.37806 -37.08701 42.07916
## 167.6667 2.818446 -23.490458 29.12735 -37.41755 43.05444
## 168.0000 2.873199 -23.855954 29.60235 -38.00551 43.75191
## 168.3333 2.460016 -24.683173 29.60321 -39.05191 43.97194
## 168.6667 2.782388 -24.768617 30.33339 -39.35324 44.91801
## 169.0000 2.837140 -25.115873 30.79015 -39.91330 45.58759
## 169.3333 2.423958 -25.925646 30.77356 -40.93302 45.78094
## 169.6667 2.746330 -25.994393 31.48705 -41.20881 46.70147
## 170.0000 2.801082 -26.325645 31.92781 -41.74440 47.34657
## 170.3333 2.387900 -27.120053 31.89585 -42.74062 47.51642
## 170.6667 2.710272 -27.174044 32.59459 -42.99385 48.41439
## 171.0000 2.765024 -27.491106 33.02115 -43.50773 49.03778
## 171.3333 2.351842 -28.271849 32.97553 -44.48305 49.18674
## 171.6667 2.674214 -28.312678 33.66111 -44.71615 50.06458
## 172.0000 2.728966 -28.617047 34.07498 -45.21062 50.66856
## 172.3333 2.315784 -29.385534 34.01710 -46.16720 50.79877
## 172.6667 2.638156 -29.414529 34.69084 -46.38220 51.65851
## 173.0000 2.692908 -29.707457 35.09327 -46.85917 52.24499
## 173.3333 2.279726 -30.464872 35.02432 -47.79882 52.35827
## 173.6667 2.602098 -30.483152 35.68735 -47.99743 53.20162
## 174.0000 2.656850 -30.765699 36.07940 -48.45853 53.77223
## 174.3333 2.243668 -31.513048 36.00038 -49.38277 53.87011
## 174.6667 2.566039 -31.521567 36.65365 -49.56646 54.69854
## 175.0000 2.620792 -31.794641 37.03622 -50.01307 55.25465
## 175.3333 2.207610 -32.532787 36.94801 -50.92324 55.33846
summary(imodel3)
##
## Forecast method: Holt-Winters' additive method
##
## Model Information:
## Holt-Winters' additive method
##
## Call:
## hw(y = iqHW, h = 52, seasonal = "additive", damped = FALSE)
##
## Smoothing parameters:
## alpha = 0.5565
## beta = 1e-04
## gamma = 1e-04
##
## Initial states:
## l = -0.1505
## b = -0.0136
## s = 0.0945 -0.2518 0.1573
##
## sigma: 6.5854
##
## AIC AICc BIC
## 4694.342 4694.653 4727.598
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.03339355 6.536368 3.623232 NaN Inf 0.753234 -0.01440227
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 158.3333 2.820597 -5.618911 11.26011 -10.08652 15.72771
## 158.6667 3.142969 -6.515785 12.80172 -11.62882 17.91476
## 159.0000 3.197722 -7.543121 13.93856 -13.22898 19.62442
## 159.3333 2.784539 -8.939615 14.50869 -15.14601 20.71509
## 159.6667 3.106911 -9.524235 15.73806 -16.21076 22.42458
## 160.0000 3.161663 -10.315868 16.63920 -17.45044 23.77377
## 160.3333 2.748481 -11.525894 17.02286 -19.08229 24.57925
## 160.6667 3.070853 -11.958175 18.09988 -19.91406 26.05577
## 161.0000 3.125605 -12.622206 18.87342 -20.95859 27.20980
## 161.3333 2.712423 -13.723249 19.14810 -22.42377 27.84862
## 161.6667 3.034795 -14.061085 20.13067 -23.11110 29.18069
## 162.0000 3.089547 -14.642199 20.82129 -24.02882 30.20791
## 162.3333 2.676365 -15.669655 21.02239 -25.38145 30.73418
## 162.6667 2.998737 -15.941646 21.93912 -25.96808 31.96555
## 163.0000 3.053489 -16.463368 22.57035 -26.79497 32.90195
## 163.3333 2.640307 -17.436875 22.71749 -28.06509 33.34571
## 163.6667 2.962679 -17.659609 23.58497 -28.57639 34.50175
## 164.0000 3.017431 -18.136108 24.17097 -29.33412 35.36898
## 164.3333 2.604249 -19.067891 24.27639 -30.54043 35.74893
## 164.6667 2.926621 -19.251997 25.10524 -30.99265 36.84589
## 165.0000 2.981373 -19.692587 25.65533 -31.69546 37.65820
## 165.3333 2.568191 -20.590863 25.72724 -32.85053 37.98691
## 165.6667 2.890562 -20.743631 26.52476 -33.25482 39.03594
## 166.0000 2.945315 -21.154818 27.04545 -33.91266 39.80329
## 166.3333 2.532133 -22.025425 27.08969 -35.02541 40.08968
## 166.6667 2.854504 -22.152112 27.86112 -35.38982 41.09882
## 167.0000 2.909257 -22.538653 28.35717 -36.00996 41.82848
## 167.3333 2.496074 -23.385913 28.37806 -37.08701 42.07916
## 167.6667 2.818446 -23.490458 29.12735 -37.41755 43.05444
## 168.0000 2.873199 -23.855954 29.60235 -38.00551 43.75191
## 168.3333 2.460016 -24.683173 29.60321 -39.05191 43.97194
## 168.6667 2.782388 -24.768617 30.33339 -39.35324 44.91801
## 169.0000 2.837140 -25.115873 30.79015 -39.91330 45.58759
## 169.3333 2.423958 -25.925646 30.77356 -40.93302 45.78094
## 169.6667 2.746330 -25.994393 31.48705 -41.20881 46.70147
## 170.0000 2.801082 -26.325645 31.92781 -41.74440 47.34657
## 170.3333 2.387900 -27.120053 31.89585 -42.74062 47.51642
## 170.6667 2.710272 -27.174044 32.59459 -42.99385 48.41439
## 171.0000 2.765024 -27.491106 33.02115 -43.50773 49.03778
## 171.3333 2.351842 -28.271849 32.97553 -44.48305 49.18674
## 171.6667 2.674214 -28.312678 33.66111 -44.71615 50.06458
## 172.0000 2.728966 -28.617047 34.07498 -45.21062 50.66856
## 172.3333 2.315784 -29.385534 34.01710 -46.16720 50.79877
## 172.6667 2.638156 -29.414529 34.69084 -46.38220 51.65851
## 173.0000 2.692908 -29.707457 35.09327 -46.85917 52.24499
## 173.3333 2.279726 -30.464872 35.02432 -47.79882 52.35827
## 173.6667 2.602098 -30.483152 35.68735 -47.99743 53.20162
## 174.0000 2.656850 -30.765699 36.07940 -48.45853 53.77223
## 174.3333 2.243668 -31.513048 36.00038 -49.38277 53.87011
## 174.6667 2.566039 -31.521567 36.65365 -49.56646 54.69854
## 175.0000 2.620792 -31.794641 37.03622 -50.01307 55.25465
## 175.3333 2.207610 -32.532787 36.94801 -50.92324 55.33846
checkresiduals(imodel3)
##
## Ljung-Box test
##
## data: Residuals from Holt-Winters' additive method
## Q* = 19.038, df = 3, p-value = 0.0002685
##
## Model df: 7. Total lags used: 10
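The smoothing parameters in the `summary(imodel3)` output can be read against the additive Holt-Winters recursions. The block below writes them in error-correction form with seasonal period $m$, where $m = 3$ here because of `frequency=3`. This assumes that `hw()` reports `alpha`, `beta` and `gamma` in this parameterization, as the forecast package's ETS framework does.

```latex
\begin{aligned}
\varepsilon_t &= y_t - (\ell_{t-1} + b_{t-1} + s_{t-m}) \\
\ell_t &= \ell_{t-1} + b_{t-1} + \alpha\,\varepsilon_t \\
b_t &= b_{t-1} + \beta\,\varepsilon_t \\
s_t &= s_{t-m} + \gamma\,\varepsilon_t \\
\hat{y}_{t+h \mid t} &= \ell_t + h\,b_t + s_{t+h-m(k+1)}, \qquad k = \left\lfloor (h-1)/m \right\rfloor
\end{aligned}
```

Both beta and gamma were estimated at 1e-04, so the trend and seasonal states barely move away from their initial values. As a result, the Iquitos point forecasts repeat an almost fixed three-step pattern with a slight downward drift (2.82, 3.14, 3.20, then 2.78, 3.11, 3.16, and so on).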
Performance / Accuracy
The models created each had benefits and limitations in their formulation, and they produced varying results in predicting the total number of cases in each city. The ETS model showed reasonable residual behavior, although its Ljung-Box test p-value of 0.0001667 indicates that some autocorrelation remains in the residuals. Its forecast range was fairly similar to that of the ARIMA model. This is expected given the automatic selection procedure used and the structure of each model. The selected ETS models produced AIC, AICc, and BIC values of 8462, 8462, and 8475 for San Juan and 4683, 4683, and 4695 for Iquitos. The ETS model for San Juan performed slightly better than the Iquitos model, and the selected models had the lowest values for these criteria among all the ETS models created.
The ARIMA models showed results similar to the ETS models, with AIC, AICc, and BIC values of 5170, 5170, and 5733 for San Juan and 3109, 3109, and 3117 for Iquitos. In this case, the Iquitos model outperformed San Juan in terms of predictive power and model fit. Finally, the Holt-Winters' seasonal method produced AIC, AICc, and BIC values of 8472, 8472, and 8509 for San Juan and 4694, 4694, and 4727 for Iquitos. Iquitos outperformed San Juan in the Holt-Winters' model by a large margin and proved to be a better predictor of total cases.
These information criteria are meant for comparison within a model class. They should not be compared between model classes such as ETS and ARIMA, because the likelihood is computed differently in each. Based on the plotted forecasts and the model fit values, the Holt-Winters' seasonal method performed the best of the three techniques in predicting total cases of Dengue Fever. Within this model, Iquitos showed much stronger forecasting ability and overall model fit. The Holt-Winters' models also showed favorable residuals for both cities, with ACF plots resembling white noise and a reasonable overall residual distribution. However, the Ljung-Box tests (p = 1.091e-06 for San Juan and p = 0.0002685 for Iquitos) still reject the hypothesis that the residuals are white noise.
Ultimately, the Holt-Winters' seasonal method was the best model produced overall and should be chosen for this analysis.
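AIC, AICc and BIC only rank models within one series and model class. A holdout comparison is one common way to put different model families on the same footing. The following is a minimal base-R sketch, not the procedure used in this analysis. It holds out the final 52 weeks of a simulated weekly series (`cases`, a hypothetical stand-in) and compares the mean absolute error of `stats::HoltWinters()` with that of a simple AR(1) fitted by `stats::arima()`. The AR(1) model is only a stand-in for the auto-selected ARIMA models.

```r
# Minimal sketch using base R only (stats package). 'cases' is a simulated,
# hypothetical weekly series, not the San Juan or Iquitos data.
set.seed(7)
n_weeks <- 6 * 52
annual  <- 10 + 8 * sin(2 * pi * seq_len(n_weeks) / 52)       # annual cycle
noise   <- as.numeric(arima.sim(list(ar = 0.6), n = n_weeks, sd = 3))
cases   <- pmax(0, annual + noise)                            # counts cannot be negative

# Hold out the final 52 weeks; frequency = 52 gives one cycle per year of weekly data
h       <- 52
n_train <- n_weeks - h
train_y <- ts(cases[seq_len(n_train)], frequency = 52)
test_y  <- cases[(n_train + 1):n_weeks]

# Additive Holt-Winters, and a simple AR(1) with mean standing in for ARIMA
hw_fit <- HoltWinters(train_y, seasonal = "additive")
ar_fit <- arima(train_y, order = c(1, 0, 0))

hw_fc <- as.numeric(predict(hw_fit, n.ahead = h))
ar_fc <- as.numeric(predict(ar_fit, n.ahead = h)$pred)

# Error on the same held-out weeks is comparable across model classes
mae <- c(holt_winters = mean(abs(test_y - hw_fc)),
         ar1          = mean(abs(test_y - ar_fc)))
print(round(mae, 2))
```

On the real data, the same split could be applied to each city's series and to the forecast package models. The package's `accuracy()` function, given a forecast and the held-out observations, reports test-set error measures.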
Limitations
The data set and the forecasting models selected had a number of limitations that could be addressed in future work. Most of the time spent on this analysis went to data exploration and to evaluating the time components of the data, such as month, year, and week start date. These time components revealed strong seasonal patterns that proved useful in formulating the time series models discussed. One limitation of the analysis was the types of models used to predict total cases of Dengue Fever. The models were appropriate and have been used in similar situations, but the overall fit could be improved with more advanced techniques and further investigation into variable significance. Introducing techniques such as a neural network model may improve the accuracy of future predictions. Another limitation was the data provided. As discussed, adding variables that more directly measure factors such as mosquito population and cultural events could add to the predictive power of the model. Data collection is an expensive process, and the data sets provided already offer a wide range of variables on which to base predictions. More time and greater familiarity with forecasting techniques would improve future studies in the Predicting Disease Spread project.
Future Work
The analysis of the data provided by the Centers for Disease Control and Prevention gives valuable predictions of future cases of Dengue Fever and insight into the factors that lead to its spread. These predictions can be improved in future work with more advanced forecasting models, improved environmental variables, and outside research. The models created serve their purpose of predicting the total number of future cases, but more specialized and advanced techniques could improve them. The forecast ranges produced were fairly wide, which can cause problems when planning for the spread of a harmful disease. For example, the lower 80% and 95% bounds for Iquitos fall below zero even though case counts cannot be negative. A more focused forecast would allow disease prevention agencies to anticipate the resources and time required for aid. Models such as a neural network, combined with bootstrapped prediction intervals, may provide a more focused forecast range. The datasets could also incorporate additional environmental factors specific to each city, such as average mosquito population, vaccination rates, and birth rate. These variables could help predict population growth in each city and changes in the mosquito population that carries the disease. Outside research would also provide valuable insight into topics such as the income of the population most affected by Dengue Fever. This would allow a more focused approach on the population most at risk of catching the viral disease. With these additional resources and more time, future work could produce stronger predictions and a better understanding of the total cases of Dengue Fever in San Juan and Iquitos. A more refined approach can provide valuable forecasts to help prevent the spread of a harmful disease in the targeted cities.
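The future-work discussion mentions combining models with bootstrapping to obtain more focused forecast ranges. The base-R code below is a minimal sketch of the bootstrapping half only. It simulates future paths from a fitted AR(1) model by resampling the model's own residuals, then reads prediction intervals off the path quantiles. The AR(1) model and the simulated series `y` are assumptions standing in for a neural network and the dengue data. In the forecast package, `nnetar()` models can produce simulation-based intervals through `forecast(..., PI = TRUE)`.

```r
# Minimal sketch using base R only (stats package). An AR(1) model stands in
# for a neural network; 'y' is a simulated, hypothetical series.
set.seed(11)
y <- as.numeric(arima.sim(list(ar = 0.6), n = 300, sd = 2)) + 20

fit <- arima(y, order = c(1, 0, 0))
phi <- coef(fit)[["ar1"]]
mu  <- coef(fit)[["intercept"]]   # arima() labels the series mean "intercept"
res <- as.numeric(na.omit(residuals(fit)))

h       <- 52
n_paths <- 1000
paths   <- matrix(NA_real_, nrow = n_paths, ncol = h)
for (i in seq_len(n_paths)) {
  last <- y[length(y)]
  for (j in seq_len(h)) {
    # AR(1) recursion around the mean, driven by a resampled residual
    last <- mu + phi * (last - mu) + sample(res, 1)
    paths[i, j] <- last
  }
}

# 95% (2.5%, 97.5%) and 80% (10%, 90%) bootstrap bounds for each forecast step
bounds <- apply(paths, 2, quantile, probs = c(0.025, 0.10, 0.90, 0.975))
point  <- colMeans(paths)
print(round(rbind(point = point, bounds)[, 1:5], 2))
```

Resampling residuals avoids assuming normally distributed errors. Whether the resulting intervals are narrower than the normal-theory intervals depends on the data.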