R Packages Utilized

library(readr)
library(ggplot2)
library(ggfortify)
library(forecast)
library(psych)

## library(dplyr) is loaded below because loading it here created issues with the R Markdown knit

Importing the datasets

## dengue_features_train.csv
train <- read_csv("C:/Users/bryce_anderson/Desktop/Boston College/Predictive Analytics and Forecasting/Week 7 (PA&F)/Project datasets/dengue_features_train.csv")

## dengue_features_test.csv
test <- read_csv("C:/Users/bryce_anderson/Desktop/Boston College/Predictive Analytics and Forecasting/Week 7 (PA&F)/Project datasets/dengue_features_test.csv")

## dengue_labels_train.csv
label <- read_csv("C:/Users/bryce_anderson/Desktop/Boston College/Predictive Analytics and Forecasting/Week 7 (PA&F)/Project datasets/dengue_labels_train.csv")

## submission_format.csv
submit <- read_csv("C:/Users/bryce_anderson/Desktop/Boston College/Predictive Analytics and Forecasting/Week 7 (PA&F)/Project datasets/submission_format.csv")
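The absolute paths above only resolve on one machine. A minimal sketch of a more portable import follows, assuming a hypothetical `data/dengue` folder (not part of the original project) and base R's `read.csv()` in place of `read_csv()`, so column types such as `week_start_date` may be parsed differently.

```r
## Hypothetical base directory; replace with the folder that holds the CSV files
data_dir <- file.path("data", "dengue")

## Build each path from the base directory and the file names used above
files <- c(train  = "dengue_features_train.csv",
           test   = "dengue_features_test.csv",
           label  = "dengue_labels_train.csv",
           submit = "submission_format.csv")
paths <- file.path(data_dir, files)
names(paths) <- names(files)

## Only read the files that actually exist, so the sketch runs anywhere
found <- paths[file.exists(paths)]
datasets <- lapply(found, read.csv)
```

Keeping the directory in one variable means only `data_dir` changes when the project moves.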

Combining the Label set with the Train dataset

## Joining the datasets using dplyr's left_join() function
library(dplyr)
train <- left_join(x=label, y=train, by=c("year", "weekofyear", "city"))

## Omitting NA values from the Train set
train <- na.omit(train)
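Judging by the summaries in the next section, the Label set has 1456 rows and the Train set has 1199 after `na.omit()`, so 257 weeks are dropped. Those dropped weeks leave gaps in the weekly sequence, which matters once the data are treated as a time series. A minimal sketch for auditing missing values before dropping rows, on a made-up toy data frame rather than the real Train set:

```r
## Toy data frame standing in for the joined Train set (values are made up)
toy <- data.frame(
  total_cases       = c(4, 5, 3, 6, 2),
  ndvi_ne           = c(0.12, NA, 0.20, 0.15, NA),
  station_precip_mm = c(16.0, 8.6, NA, 4.0, 5.8)
)

## Missing values per column
na_per_column <- colSums(is.na(toy))

## Rows kept and dropped by na.omit()
complete <- na.omit(toy)
rows_dropped <- nrow(toy) - nrow(complete)
```

Running the same two checks on `train` before the `na.omit()` call would show which columns account for most of the lost rows.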

Data Exploration

Train set Exploration and Interpretation

## Summarizing the Train, Test, and Label datasets
summary(train)
##      city                year        weekofyear    total_cases   
##  Length:1199        Min.   :1990   Min.   : 1.0   Min.   :  0.0  
##  Class :character   1st Qu.:1998   1st Qu.:14.0   1st Qu.:  4.0  
##  Mode  :character   Median :2002   Median :26.0   Median : 11.0  
##                     Mean   :2001   Mean   :26.5   Mean   : 21.2  
##                     3rd Qu.:2006   3rd Qu.:39.0   3rd Qu.: 26.0  
##                     Max.   :2010   Max.   :52.0   Max.   :329.0  
##  week_start_date         ndvi_ne            ndvi_nw        
##  Min.   :1990-04-30   Min.   :-0.40625   Min.   :-0.45610  
##  1st Qu.:1998-01-11   1st Qu.: 0.04242   1st Qu.: 0.05159  
##  Median :2002-09-03   Median : 0.12430   Median : 0.12680  
##  Mean   :2001-10-18   Mean   : 0.13972   Mean   : 0.13436  
##  3rd Qu.:2006-02-01   3rd Qu.: 0.24688   3rd Qu.: 0.22071  
##  Max.   :2010-06-25   Max.   : 0.50836   Max.   : 0.45443  
##     ndvi_se            ndvi_sw         precipitation_amt_mm
##  Min.   :-0.01553   Min.   :-0.06346   Min.   :  0.00      
##  1st Qu.: 0.15561   1st Qu.: 0.14495   1st Qu.: 12.55      
##  Median : 0.19673   Median : 0.19230   Median : 41.41      
##  Mean   : 0.20556   Mean   : 0.20571   Mean   : 47.58      
##  3rd Qu.: 0.25255   3rd Qu.: 0.25440   3rd Qu.: 71.77      
##  Max.   : 0.53831   Max.   : 0.54602   Max.   :390.60      
##  reanalysis_air_temp_k reanalysis_avg_temp_k reanalysis_dew_point_temp_k
##  Min.   :294.6         Min.   :294.9         Min.   :289.6              
##  1st Qu.:297.6         1st Qu.:298.3         1st Qu.:294.2              
##  Median :298.6         Median :299.3         Median :295.7              
##  Mean   :298.7         Mean   :299.2         Mean   :295.3              
##  3rd Qu.:299.8         3rd Qu.:300.2         3rd Qu.:296.5              
##  Max.   :302.2         Max.   :302.6         Max.   :298.4              
##  reanalysis_max_air_temp_k reanalysis_min_air_temp_k
##  Min.   :297.8             Min.   :286.9            
##  1st Qu.:301.1             1st Qu.:293.6            
##  Median :302.6             Median :296.0            
##  Mean   :303.7             Mean   :295.6            
##  3rd Qu.:306.0             3rd Qu.:297.9            
##  Max.   :313.2             Max.   :299.9            
##  reanalysis_precip_amt_kg_per_m2 reanalysis_relative_humidity_percent
##  Min.   :  0.00                  Min.   :57.79                       
##  1st Qu.: 13.53                  1st Qu.:77.37                       
##  Median : 28.60                  Median :80.80                       
##  Mean   : 41.55                  Mean   :82.63                       
##  3rd Qu.: 54.75                  3rd Qu.:88.16                       
##  Max.   :570.50                  Max.   :98.61                       
##  reanalysis_sat_precip_amt_mm reanalysis_specific_humidity_g_per_kg
##  Min.   :  0.00               Min.   :11.72                        
##  1st Qu.: 12.55               1st Qu.:15.65                        
##  Median : 41.41               Median :17.17                        
##  Mean   : 47.58               Mean   :16.81                        
##  3rd Qu.: 71.77               3rd Qu.:18.01                        
##  Max.   :390.60               Max.   :20.46                        
##  reanalysis_tdtr_k station_avg_temp_c station_diur_temp_rng_c
##  Min.   : 1.357    Min.   :21.40      Min.   : 4.529         
##  1st Qu.: 2.357    1st Qu.:26.43      1st Qu.: 6.593         
##  Median : 3.000    Median :27.45      Median : 7.471         
##  Mean   : 5.137    Mean   :27.23      Mean   : 8.248         
##  3rd Qu.: 8.029    3rd Qu.:28.16      3rd Qu.:10.012         
##  Max.   :16.029    Max.   :30.80      Max.   :15.800         
##  station_max_temp_c station_min_temp_c station_precip_mm
##  Min.   :26.70      Min.   :14.70      Min.   :  0.00   
##  1st Qu.:31.40      1st Qu.:21.10      1st Qu.:  9.70   
##  Median :32.80      Median :22.10      Median : 24.70   
##  Mean   :32.57      Mean   :22.08      Mean   : 40.92   
##  3rd Qu.:33.90      3rd Qu.:23.30      3rd Qu.: 56.05   
##  Max.   :42.20      Max.   :25.60      Max.   :543.30
summary(test)
##      city                year        weekofyear    week_start_date     
##  Length:416         Min.   :2008   Min.   : 1.00   Min.   :2008-04-29  
##  Class :character   1st Qu.:2010   1st Qu.:13.75   1st Qu.:2010-04-28  
##  Mode  :character   Median :2011   Median :26.00   Median :2011-05-28  
##                     Mean   :2011   Mean   :26.44   Mean   :2011-04-04  
##                     3rd Qu.:2012   3rd Qu.:39.00   3rd Qu.:2012-05-27  
##                     Max.   :2013   Max.   :53.00   Max.   :2013-06-25  
##                                                                        
##     ndvi_ne           ndvi_nw            ndvi_se          ndvi_sw        
##  Min.   :-0.4634   Min.   :-0.21180   Min.   :0.0062   Min.   :-0.01467  
##  1st Qu.:-0.0015   1st Qu.: 0.01597   1st Qu.:0.1487   1st Qu.: 0.13408  
##  Median : 0.1101   Median : 0.08870   Median :0.2042   Median : 0.18647  
##  Mean   : 0.1260   Mean   : 0.12680   Mean   :0.2077   Mean   : 0.20172  
##  3rd Qu.: 0.2633   3rd Qu.: 0.24240   3rd Qu.:0.2549   3rd Qu.: 0.25324  
##  Max.   : 0.5004   Max.   : 0.64900   Max.   :0.4530   Max.   : 0.52904  
##  NA's   :43        NA's   :11         NA's   :1        NA's   :1         
##  precipitation_amt_mm reanalysis_air_temp_k reanalysis_avg_temp_k
##  Min.   :  0.000      Min.   :294.6         Min.   :295.2        
##  1st Qu.:  8.175      1st Qu.:297.8         1st Qu.:298.3        
##  Median : 31.455      Median :298.5         Median :299.3        
##  Mean   : 38.354      Mean   :298.8         Mean   :299.4        
##  3rd Qu.: 57.773      3rd Qu.:300.2         3rd Qu.:300.5        
##  Max.   :169.340      Max.   :301.9         Max.   :303.3        
##  NA's   :2            NA's   :2             NA's   :2            
##  reanalysis_dew_point_temp_k reanalysis_max_air_temp_k
##  Min.   :290.8               Min.   :298.2            
##  1st Qu.:294.3               1st Qu.:301.4            
##  Median :295.8               Median :302.8            
##  Mean   :295.4               Mean   :303.6            
##  3rd Qu.:296.6               3rd Qu.:305.8            
##  Max.   :297.8               Max.   :314.1            
##  NA's   :2                   NA's   :2                
##  reanalysis_min_air_temp_k reanalysis_precip_amt_kg_per_m2
##  Min.   :286.2             Min.   :  0.00                 
##  1st Qu.:293.5             1st Qu.:  9.43                 
##  Median :296.3             Median : 25.85                 
##  Mean   :295.7             Mean   : 42.17                 
##  3rd Qu.:298.3             3rd Qu.: 56.48                 
##  Max.   :299.7             Max.   :301.40                 
##  NA's   :2                 NA's   :2                      
##  reanalysis_relative_humidity_percent reanalysis_sat_precip_amt_mm
##  Min.   :64.92                        Min.   :  0.000             
##  1st Qu.:77.40                        1st Qu.:  8.175             
##  Median :80.33                        Median : 31.455             
##  Mean   :82.50                        Mean   : 38.354             
##  3rd Qu.:88.33                        3rd Qu.: 57.773             
##  Max.   :97.98                        Max.   :169.340             
##  NA's   :2                            NA's   :2                   
##  reanalysis_specific_humidity_g_per_kg reanalysis_tdtr_k
##  Min.   :12.54                         Min.   : 1.486   
##  1st Qu.:15.79                         1st Qu.: 2.446   
##  Median :17.34                         Median : 2.914   
##  Mean   :16.93                         Mean   : 5.125   
##  3rd Qu.:18.17                         3rd Qu.: 8.171   
##  Max.   :19.60                         Max.   :14.486   
##  NA's   :2                             NA's   :2        
##  station_avg_temp_c station_diur_temp_rng_c station_max_temp_c
##  Min.   :24.16      Min.   : 4.043          Min.   :27.20     
##  1st Qu.:26.51      1st Qu.: 5.929          1st Qu.:31.10     
##  Median :27.48      Median : 6.643          Median :32.80     
##  Mean   :27.37      Mean   : 7.811          Mean   :32.53     
##  3rd Qu.:28.32      3rd Qu.: 9.812          3rd Qu.:33.90     
##  Max.   :30.27      Max.   :14.725          Max.   :38.40     
##  NA's   :12         NA's   :12              NA's   :3         
##  station_min_temp_c station_precip_mm
##  Min.   :14.20      Min.   :  0.00   
##  1st Qu.:21.20      1st Qu.:  9.10   
##  Median :22.20      Median : 23.60   
##  Mean   :22.37      Mean   : 34.28   
##  3rd Qu.:23.30      3rd Qu.: 47.75   
##  Max.   :26.70      Max.   :212.00   
##  NA's   :9          NA's   :5
summary(label)
##      city                year        weekofyear     total_cases    
##  Length:1456        Min.   :1990   Min.   : 1.00   Min.   :  0.00  
##  Class :character   1st Qu.:1997   1st Qu.:13.75   1st Qu.:  5.00  
##  Mode  :character   Median :2002   Median :26.50   Median : 12.00  
##                     Mean   :2001   Mean   :26.50   Mean   : 24.68  
##                     3rd Qu.:2005   3rd Qu.:39.25   3rd Qu.: 28.00  
##                     Max.   :2010   Max.   :53.00   Max.   :461.00
## Descriptive Statistics of the Training set
describe(train)
##                                       vars    n    mean    sd  median
## city*                                    1 1199     NaN    NA      NA
## year                                     2 1199 2001.30  5.35 2002.00
## weekofyear                               3 1199   26.50 14.90   26.00
## total_cases                              4 1199   21.20 30.86   11.00
## week_start_date                          5 1199     NaN    NA      NA
## ndvi_ne                                  6 1199    0.14  0.14    0.12
## ndvi_nw                                  7 1199    0.13  0.12    0.13
## ndvi_se                                  8 1199    0.21  0.07    0.20
## ndvi_sw                                  9 1199    0.21  0.09    0.19
## precipitation_amt_mm                    10 1199   47.58 43.18   41.41
## reanalysis_air_temp_k                   11 1199  298.68  1.36  298.62
## reanalysis_avg_temp_k                   12 1199  299.24  1.26  299.32
## reanalysis_dew_point_temp_k             13 1199  295.30  1.50  295.68
## reanalysis_max_air_temp_k               14 1199  303.66  3.30  302.60
## reanalysis_min_air_temp_k               15 1199  295.58  2.59  296.00
## reanalysis_precip_amt_kg_per_m2         16 1199   41.55 44.49   28.60
## reanalysis_relative_humidity_percent    17 1199   82.63  7.30   80.80
## reanalysis_sat_precip_amt_mm            18 1199   47.58 43.18   41.41
## reanalysis_specific_humidity_g_per_kg   19 1199   16.81  1.52   17.17
## reanalysis_tdtr_k                       20 1199    5.14  3.59    3.00
## station_avg_temp_c                      21 1199   27.23  1.27   27.45
## station_diur_temp_rng_c                 22 1199    8.25  2.18    7.47
## station_max_temp_c                      23 1199   32.57  1.95   32.80
## station_min_temp_c                      24 1199   22.08  1.55   22.10
## station_precip_mm                       25 1199   40.92 49.00   24.70
##                                       trimmed   mad     min     max  range
## city*                                     NaN    NA     Inf    -Inf   -Inf
## year                                  2001.62  5.93 1990.00 2010.00  20.00
## weekofyear                              26.49 19.27    1.00   52.00  51.00
## total_cases                             14.78 13.34    0.00  329.00 329.00
## week_start_date                           NaN    NA     Inf    -Inf   -Inf
## ndvi_ne                                  0.14  0.15   -0.41    0.51   0.91
## ndvi_nw                                  0.13  0.12   -0.46    0.45   0.91
## ndvi_se                                  0.20  0.07   -0.02    0.54   0.55
## ndvi_sw                                  0.20  0.08   -0.06    0.55   0.61
## precipitation_amt_mm                    42.52 43.90    0.00  390.60 390.60
## reanalysis_air_temp_k                  298.69  1.55  294.64  302.20   7.56
## reanalysis_avg_temp_k                  299.28  1.43  294.89  302.61   7.72
## reanalysis_dew_point_temp_k            295.44  1.49  289.64  298.45   8.81
## reanalysis_max_air_temp_k              303.37  2.97  297.80  313.20  15.40
## reanalysis_min_air_temp_k              295.77  2.97  286.90  299.90  13.00
## reanalysis_precip_amt_kg_per_m2         33.67 26.24    0.00  570.50 570.50
## reanalysis_relative_humidity_percent    82.23  6.17   57.79   98.61  40.82
## reanalysis_sat_precip_amt_mm            42.52 43.90    0.00  390.60 390.60
## reanalysis_specific_humidity_g_per_kg   16.91  1.55   11.72   20.46   8.75
## reanalysis_tdtr_k                        4.64  1.42    1.36   16.03  14.67
## station_avg_temp_c                      27.31  1.24   21.40   30.80   9.40
## station_diur_temp_rng_c                  8.06  1.87    4.53   15.80  11.27
## station_max_temp_c                      32.64  1.63   26.70   42.20  15.50
## station_min_temp_c                      22.11  1.63   14.70   25.60  10.90
## station_precip_mm                       31.74 27.43    0.00  543.30 543.30
##                                        skew kurtosis   se
## city*                                    NA       NA   NA
## year                                  -0.48    -0.75 0.15
## weekofyear                             0.00    -1.20 0.43
## total_cases                            3.80    21.13 0.89
## week_start_date                          NA       NA   NA
## ndvi_ne                               -0.08    -0.14 0.00
## ndvi_nw                               -0.06     0.13 0.00
## ndvi_se                                0.57     0.44 0.00
## ndvi_sw                                0.72     0.58 0.00
## precipitation_amt_mm                   1.73     7.45 1.25
## reanalysis_air_temp_k                 -0.07    -0.67 0.04
## reanalysis_avg_temp_k                 -0.23    -0.49 0.04
## reanalysis_dew_point_temp_k           -0.77     0.03 0.04
## reanalysis_max_air_temp_k              0.74    -0.44 0.10
## reanalysis_min_air_temp_k             -0.57    -0.39 0.07
## reanalysis_precip_amt_kg_per_m2        3.41    22.70 1.28
## reanalysis_relative_humidity_percent   0.49    -0.59 0.21
## reanalysis_sat_precip_amt_mm           1.73     7.45 1.25
## reanalysis_specific_humidity_g_per_kg -0.58    -0.39 0.04
## reanalysis_tdtr_k                      0.92    -0.52 0.10
## station_avg_temp_c                    -0.62     0.03 0.04
## station_diur_temp_rng_c                0.70    -0.54 0.06
## station_max_temp_c                    -0.27     0.30 0.06
## station_min_temp_c                    -0.28     0.29 0.04
## station_precip_mm                      2.94    14.89 1.41
## Significance of variables to total_cases in Train set
lmTrain <- lm(total_cases ~ ., data=train)
summary(lmTrain)
## 
## Call:
## lm(formula = total_cases ~ ., data = train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -42.391 -13.736  -3.699   6.159 283.066 
## 
## Coefficients: (1 not defined because of singularities)
##                                         Estimate Std. Error t value
## (Intercept)                           -4.938e+04  2.716e+04  -1.818
## citysj                                 9.901e+00  1.207e+01   0.820
## year                                   2.921e+01  1.379e+01   2.118
## weekofyear                             7.471e-01  2.637e-01   2.833
## week_start_date                       -8.388e-02  3.777e-02  -2.221
## ndvi_ne                               -1.386e+01  1.170e+01  -1.185
## ndvi_nw                                8.681e+00  1.336e+01   0.650
## ndvi_se                               -1.471e+01  1.915e+01  -0.768
## ndvi_sw                                7.784e+00  1.797e+01   0.433
## precipitation_amt_mm                  -4.947e-03  2.332e-02  -0.212
## reanalysis_air_temp_k                  6.914e+00  1.140e+01   0.606
## reanalysis_avg_temp_k                 -1.051e+01  5.013e+00  -2.096
## reanalysis_dew_point_temp_k           -2.664e+01  1.348e+01  -1.977
## reanalysis_max_air_temp_k              1.583e+00  1.118e+00   1.416
## reanalysis_min_air_temp_k             -3.253e-01  1.529e+00  -0.213
## reanalysis_precip_amt_kg_per_m2       -1.671e-02  2.449e-02  -0.682
## reanalysis_relative_humidity_percent  -1.016e+00  2.497e+00  -0.407
## reanalysis_sat_precip_amt_mm                  NA         NA      NA
## reanalysis_specific_humidity_g_per_kg  3.661e+01  9.345e+00   3.918
## reanalysis_tdtr_k                     -6.440e-02  1.558e+00  -0.041
## station_avg_temp_c                    -1.772e+00  1.881e+00  -0.942
## station_diur_temp_rng_c                2.617e-01  1.164e+00   0.225
## station_max_temp_c                    -1.394e+00  1.065e+00  -1.308
## station_min_temp_c                     6.950e-01  1.277e+00   0.544
## station_precip_mm                     -9.822e-04  1.928e-02  -0.051
##                                       Pr(>|t|)    
## (Intercept)                            0.06931 .  
## citysj                                 0.41223    
## year                                   0.03442 *  
## weekofyear                             0.00468 ** 
## week_start_date                        0.02656 *  
## ndvi_ne                                0.23626    
## ndvi_nw                                0.51611    
## ndvi_se                                0.44269    
## ndvi_sw                                0.66508    
## precipitation_amt_mm                   0.83202    
## reanalysis_air_temp_k                  0.54434    
## reanalysis_avg_temp_k                  0.03628 *  
## reanalysis_dew_point_temp_k            0.04832 *  
## reanalysis_max_air_temp_k              0.15711    
## reanalysis_min_air_temp_k              0.83151    
## reanalysis_precip_amt_kg_per_m2        0.49523    
## reanalysis_relative_humidity_percent   0.68408    
## reanalysis_sat_precip_amt_mm                NA    
## reanalysis_specific_humidity_g_per_kg 9.45e-05 ***
## reanalysis_tdtr_k                      0.96704    
## station_avg_temp_c                     0.34657    
## station_diur_temp_rng_c                0.82223    
## station_max_temp_c                     0.19096    
## station_min_temp_c                     0.58629    
## station_precip_mm                      0.95938    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 26.97 on 1175 degrees of freedom
## Multiple R-squared:  0.2509, Adjusted R-squared:  0.2362 
## F-statistic: 17.11 on 23 and 1175 DF,  p-value: < 2.2e-16
autoplot(lmTrain)

checkresiduals(lmTrain)

## 
##  Breusch-Godfrey test for serial correlation of order up to 28
## 
## data:  Residuals
## LM test = 985.69, df = 28, p-value < 2.2e-16
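The coefficient table above reports one coefficient as not defined because of singularities: `reanalysis_sat_precip_amt_mm`, whose summary statistics are identical to those of `precipitation_amt_mm`. A minimal sketch of how such a duplicated column can be found and removed, using a made-up toy data frame with hypothetical column names:

```r
## Toy data (made-up values): sat_precip is an exact copy of precip
toy <- data.frame(
  total_cases = c(4, 5, 3, 6, 2, 7),
  precip      = c(12.4, 22.8, 34.5, 15.4, 7.5, 9.6),
  sat_precip  = c(12.4, 22.8, 34.5, 15.4, 7.5, 9.6),
  humidity    = c(14.0, 15.4, 16.8, 16.7, 17.2, 17.1)
)

## Columns whose values exactly repeat an earlier column
dup_cols <- names(toy)[duplicated(as.list(toy))]

## lm() reports NA for the aliased coefficient, as in the output above
fit <- lm(total_cases ~ ., data = toy)
aliased <- names(coef(fit))[is.na(coef(fit))]

## Refit without the duplicated column
fit_clean <- lm(total_cases ~ ., data = toy[, !names(toy) %in% dup_cols])
```

Dropping the copy does not change the fit; it only removes the `NA` row from the coefficient table.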

Variable Exploration of Train set

## Plotting total cases by the time components
par(bg="grey")
plot(total_cases ~ year, data=train, col="Dodgerblue", bg="grey", ylab="Total Cases of Dengue Fever", xlab="Year", main="Cases of Dengue Fever per Year")

plot(total_cases ~ weekofyear, data=train, col="firebrick4", bg="grey", ylab="Total Cases of Dengue Fever", xlab="Week of Year", main="Cases of Dengue Fever per Week of Year")

plot(total_cases ~ week_start_date, data=train, col="mediumpurple", bg="grey", ylab="Total Cases of Dengue Fever", xlab="Week Start Date", main="Cases of Dengue Fever per Week Start Date")

## Plotting Variables for Total Cases
plot(total_cases ~ reanalysis_avg_temp_k, data=train, col="coral", bg="grey", ylab="Total Cases of Dengue Fever", xlab="Average Temperature (Kelvin)", main="Cases of Dengue Fever per Average Temperature")

plot(total_cases ~ reanalysis_dew_point_temp_k, data=train, col="springgreen4", bg="grey", ylab="Total Cases of Dengue Fever", xlab="Average Dewpoint Temperature (Kelvin)", main="Cases of Dengue Fever per Average Dewpoint Temperature")

plot(total_cases ~ reanalysis_specific_humidity_g_per_kg, data=train, col="goldenrod4", bg="grey", ylab="Total Cases of Dengue Fever", xlab="Specific Humidity (g per kg)", main="Cases of Dengue Fever per Specific Humidity")

plot(total_cases ~ reanalysis_max_air_temp_k, data=train, col="cornflowerblue", bg="grey", ylab="Total Cases of Dengue Fever", xlab="Maximum Air Temperature (Kelvin)", main="Cases of Dengue Fever per Maximum Air Temperature")

Splitting the Train set by City

library(dplyr)
## Utilizing dplyr's filter() function to split the joined dataset
## San Juan filter
sj <- train %>% filter(city=="sj")

## Iquitos filter
iq <- train %>% filter(city=="iq")

Linear Regression of each City

## San Juan set
lm1 <- lm(total_cases ~ year + weekofyear + week_start_date + reanalysis_avg_temp_k + reanalysis_dew_point_temp_k + reanalysis_specific_humidity_g_per_kg, data=sj)

summary(lm1)
## 
## Call:
## lm(formula = total_cases ~ year + weekofyear + week_start_date + 
##     reanalysis_avg_temp_k + reanalysis_dew_point_temp_k + reanalysis_specific_humidity_g_per_kg, 
##     data = sj)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -48.788 -18.127  -6.939   8.676 280.846 
## 
## Coefficients:
##                                         Estimate Std. Error t value
## (Intercept)                           -6.405e+04  3.847e+04  -1.665
## year                                   4.059e+01  1.962e+01   2.068
## weekofyear                             1.096e+00  3.726e-01   2.941
## week_start_date                       -1.153e-01  5.375e-02  -2.145
## reanalysis_avg_temp_k                  2.168e+00  2.538e+00   0.854
## reanalysis_dew_point_temp_k           -5.943e+01  1.457e+01  -4.079
## reanalysis_specific_humidity_g_per_kg  6.320e+01  1.479e+01   4.274
##                                       Pr(>|t|)    
## (Intercept)                            0.09636 .  
## year                                   0.03896 *  
## weekofyear                             0.00337 ** 
## week_start_date                        0.03229 *  
## reanalysis_avg_temp_k                  0.39325    
## reanalysis_dew_point_temp_k           5.03e-05 ***
## reanalysis_specific_humidity_g_per_kg 2.18e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 33.15 on 720 degrees of freedom
## Multiple R-squared:  0.165,  Adjusted R-squared:  0.1581 
## F-statistic: 23.72 on 6 and 720 DF,  p-value: < 2.2e-16
plot(lm1)

## Iquitos set
lm2 <- lm(total_cases ~ year + weekofyear + week_start_date + reanalysis_avg_temp_k + reanalysis_dew_point_temp_k + reanalysis_specific_humidity_g_per_kg, data=iq)

summary(lm2)
## 
## Call:
## lm(formula = total_cases ~ year + weekofyear + week_start_date + 
##     reanalysis_avg_temp_k + reanalysis_dew_point_temp_k + reanalysis_specific_humidity_g_per_kg, 
##     data = iq)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -11.444  -4.902  -2.564   1.255  74.302 
## 
## Coefficients:
##                                         Estimate Std. Error t value
## (Intercept)                           -4.463e+03  1.814e+04  -0.246
## year                                   3.362e+00  9.195e+00   0.366
## weekofyear                             5.541e-02  1.756e-01   0.316
## week_start_date                       -7.942e-03  2.517e-02  -0.316
## reanalysis_avg_temp_k                  1.281e-01  3.673e-01   0.349
## reanalysis_dew_point_temp_k           -8.002e+00  4.914e+00  -1.628
## reanalysis_specific_humidity_g_per_kg  9.185e+00  4.829e+00   1.902
##                                       Pr(>|t|)  
## (Intercept)                             0.8058  
## year                                    0.7148  
## weekofyear                              0.7525  
## week_start_date                         0.7525  
## reanalysis_avg_temp_k                   0.7274  
## reanalysis_dew_point_temp_k             0.1041  
## reanalysis_specific_humidity_g_per_kg   0.0578 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.128 on 465 degrees of freedom
## Multiple R-squared:  0.09257,    Adjusted R-squared:  0.08086 
## F-statistic: 7.906 on 6 and 465 DF,  p-value: 3.968e-08
plot(lm2)

Creating a time series of each City data set

sjTS <- ts(sj$total_cases, start=c(1))
iqTS <- ts(iq$total_cases, start=c(1))

## Summaries of the time series sets
summary(sjTS)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    9.00   18.00   30.21   36.00  329.00
summary(iqTS)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.000   5.000   7.309   9.000  83.000
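The `ts()` calls above do not set `frequency`, so both series default to a frequency of 1 and are treated as non-seasonal. A minimal sketch of declaring weekly data, assuming 52 observations per year; the toy counts and the start of week 18 of 1990 are illustrative assumptions, not values taken from `sj` or `iq`:

```r
## Toy weekly counts (made up): three years of 52 weeks with a yearly cycle
set.seed(1)
weeks <- 1:156
toy_cases <- round(20 + 15 * sin(2 * pi * weeks / 52) + rnorm(156, sd = 3))

## frequency = 52 marks the data as weekly; the start of (1990, 18) is an assumed example
toyTS <- ts(toy_cases, frequency = 52, start = c(1990, 18))

frequency(toyTS)   # observations per cycle
start(toyTS)       # first year and week
```

Note that `ets()` ignores seasonality when the frequency exceeds 24, so for weekly data a seasonal ARIMA or `stlf()` may be the more suitable way to use this information.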

Model Formulation

Model 1 - ETS Model

## San Juan
smodel1 <- ets(sjTS, model="ZZZ", damped=NULL, alpha=NULL, beta=NULL,
    gamma=NULL, phi=NULL, lambda=NULL, biasadj=FALSE,
    additive.only=FALSE, restrict=TRUE,
    allow.multiplicative.trend=FALSE)

smodel1
## ETS(A,N,N) 
## 
## Call:
##  ets(y = sjTS, model = "ZZZ", damped = NULL, alpha = NULL, beta = NULL,
##      gamma = NULL, phi = NULL, additive.only = FALSE, lambda = NULL,
##      biasadj = FALSE, restrict = TRUE, allow.multiplicative.trend = FALSE)
## 
##   Smoothing parameters:
##     alpha = 0.9999 
## 
##   Initial states:
##     l = 4.1836 
## 
##   sigma:  12.4615
## 
##      AIC     AICc      BIC 
## 8462.075 8462.108 8475.842
forecast(smodel1, h=10)
##     Point Forecast     Lo 80    Hi 80     Lo 95    Hi 95
## 728         4.9996 -10.97050 20.96970 -19.42456 29.42376
## 729         4.9996 -17.58440 27.58360 -29.53964 39.53884
## 730         4.9996 -22.65957 32.65877 -37.30146 47.30066
## 731         4.9996 -26.93820 36.93740 -43.84505 53.84425
## 732         4.9996 -30.70776 40.70696 -49.61010 59.60930
## 733         4.9996 -34.11573 44.11493 -54.82213 64.82133
## 734         4.9996 -37.24968 47.24888 -59.61510 69.61430
## 735         4.9996 -40.16670 50.16590 -64.07630 74.07550
## 736         4.9996 -42.90643 52.90563 -68.26635 78.26555
## 737         4.9996 -45.49773 55.49693 -72.22941 82.22861
sjf1 <- forecast(smodel1, h=52)
autoplot(sjf1) + xlab("Week") + ylab("Total Cases of Dengue Fever")

checkresiduals(sjf1)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,N,N)
## Q* = 30.582, df = 8, p-value = 0.0001667
## 
## Model df: 2.   Total lags used: 10
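ETS(A,N,N) is simple exponential smoothing: the level is updated as alpha times the new observation plus (1 - alpha) times the previous level, and every future forecast equals the final level. With alpha = 0.9999 the level almost exactly tracks the latest observation, which is why all of the San Juan point forecasts above share one value. A minimal sketch of that recursion on made-up counts; `ses_level` is a hypothetical helper, and only `alpha` and the initial level are taken from the output:

```r
## Simple exponential smoothing: returns the final level, which is the flat forecast
ses_level <- function(y, alpha, l0) {
  level <- l0
  for (obs in y) {
    level <- alpha * obs + (1 - alpha) * level
  }
  level
}

## Made-up weekly counts; alpha and the initial level are the values reported above
toy_cases <- c(4, 5, 4, 3, 6, 2, 4, 5)
flat_forecast <- ses_level(toy_cases, alpha = 0.9999, l0 = 4.1836)
```

Here `flat_forecast` lands within a small fraction of the last toy observation, so the model behaves almost like a naive "last value" forecast.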
## Iquitos
imodel1 <- ets(iqTS, model="ZZZ", damped=NULL, alpha=NULL, beta=NULL,
    gamma=NULL, phi=NULL, lambda=NULL, biasadj=FALSE,
    additive.only=FALSE, restrict=TRUE,
    allow.multiplicative.trend=FALSE)

imodel1
## ETS(A,N,N) 
## 
## Call:
##  ets(y = iqTS, model = "ZZZ", damped = NULL, alpha = NULL, beta = NULL,
##      gamma = NULL, phi = NULL, additive.only = FALSE, lambda = NULL,
##      biasadj = FALSE, restrict = TRUE, allow.multiplicative.trend = FALSE)
## 
##   Smoothing parameters:
##     alpha = 0.5619 
## 
##   Initial states:
##     l = 0.1055 
## 
##   sigma:  6.5424
## 
##      AIC     AICc      BIC 
## 4683.212 4683.263 4695.683
forecast(imodel1, h=10)
##     Point Forecast      Lo 80    Hi 80      Lo 95    Hi 95
## 473       3.176212  -5.208239 11.56066  -9.646699 15.99912
## 474       3.176212  -6.441405 12.79383 -11.532664 17.88509
## 475       3.176212  -7.533508 13.88593 -13.202891 19.55531
## 476       3.176212  -8.524114 14.87654 -14.717893 21.07032
## 477       3.176212  -9.437161 15.78958 -16.114277 22.46670
## 478       3.176212 -10.288434 16.64086 -17.416188 23.76861
## 479       3.176212 -11.088999 17.44142 -18.640545 24.99297
## 480       3.176212 -11.846963 18.19939 -19.799751 26.15217
## 481       3.176212 -12.568480 18.92090 -20.903216 27.25564
## 482       3.176212 -13.258351 19.61077 -21.958282 28.31071
iqf1 <- forecast(imodel1, h=52)
autoplot(iqf1) + xlab("Week") + ylab("Total Cases of Dengue Fever")

checkresiduals(iqf1)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,N,N)
## Q* = 18.462, df = 8, p-value = 0.01802
## 
## Model df: 2.   Total lags used: 10
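Both Ljung-Box tests above use 10 lags and 2 model degrees of freedom, giving df = 8, and both p-values (0.0001667 for San Juan, 0.01802 for Iquitos) fall below 0.05, which suggests autocorrelation remains in the ETS residuals. A minimal sketch of the same test with base R's `Box.test()`, run on simulated white noise as a stand-in for the real residuals:

```r
## Simulated residuals (white noise), standing in for residuals(smodel1)
set.seed(42)
toy_resid <- rnorm(200)

## 10 lags, with 2 degrees of freedom removed for the fitted model
lb <- Box.test(toy_resid, lag = 10, type = "Ljung-Box", fitdf = 2)
lb$parameter   # degrees of freedom: 10 - 2
lb$p.value
```

Replacing `toy_resid` with `residuals(smodel1)` or `residuals(imodel1)` would be expected to reproduce the statistics printed by `checkresiduals()`, assuming the same lag and `fitdf` settings.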

Model 2 - ARIMA Model

## San Juan
smodel2 <- auto.arima(sjTS)
smodel2
## Series: sjTS 
## ARIMA(2,1,2) 
## 
## Coefficients:
##           ar1      ar2     ma1     ma2
##       -1.0539  -0.4930  1.1509  0.6432
## s.e.   0.1614   0.1615  0.1429  0.1423
## 
## sigma^2 estimated as 151.3:  log likelihood=-2850.29
## AIC=5710.58   AICc=5710.67   BIC=5733.52
forecast(smodel2, h=10)
##     Point Forecast     Lo 80    Hi 80     Lo 95    Hi 95
## 728       5.171452 -10.59441 20.93732 -18.94035 29.28326
## 729       5.733913 -17.66979 29.13762 -30.05897 41.52679
## 730       5.056630 -24.49946 34.61272 -40.14551 50.25877
## 731       5.493073 -28.35739 39.34354 -46.27675 57.26289
## 732       5.367052 -32.86062 43.59472 -53.09713 63.83123
## 733       5.284677 -36.63312 47.20248 -58.82307 69.39242
## 734       5.433621 -39.86478 50.73203 -63.84431 74.71155
## 735       5.317270 -43.23581 53.87035 -68.93826 79.57280
## 736       5.366452 -46.13232 56.86523 -73.39413 84.12703
## 737       5.371987 -48.96863 59.71260 -77.73481 88.47879
sjf2 <- forecast(smodel2, h=52)
autoplot(sjf2) + xlab("Week") + ylab("Total Cases of Dengue Fever")

checkresiduals(sjf2)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(2,1,2)
## Q* = 10.112, df = 6, p-value = 0.12
## 
## Model df: 4.   Total lags used: 10
## Iquitos
imodel2 <- auto.arima(iqTS)
imodel2
## Series: iqTS 
## ARIMA(0,1,1) 
## 
## Coefficients:
##           ma1
##       -0.4370
## s.e.   0.0407
## 
## sigma^2 estimated as 42.8:  log likelihood=-1552.61
## AIC=3109.22   AICc=3109.24   BIC=3117.53
forecast(imodel2, h=10)
##     Point Forecast      Lo 80    Hi 80      Lo 95    Hi 95
## 473       3.176061  -5.208391 11.56051  -9.646852 15.99897
## 474       3.176061  -6.446017 12.79814 -11.539638 17.89176
## 475       3.176061  -7.541669 13.89379 -13.215293 19.56741
## 476       3.176061  -8.535262 14.88738 -14.734862 21.08698
## 477       3.176061  -9.450912 15.80303 -16.135227 22.48735
## 478       3.176061 -10.304510 16.65663 -17.440693 23.79281
## 479       3.176061 -11.107186 17.45931 -18.668280 25.02040
## 480       3.176061 -11.867093 18.21922 -19.830458 26.18258
## 481       3.176061 -12.590417 18.94254 -20.936687 27.28881
## 482       3.176061 -13.281982 19.63410 -21.994344 28.34647
iqf2 <- forecast(imodel2, h=52)
autoplot(iqf2) + xlab("Week") + ylab("Total Cases of Dengue Fever")

checkresiduals(iqf2)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,1)
## Q* = 18.398, df = 9, p-value = 0.03082
## 
## Model df: 1.   Total lags used: 10
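For Iquitos, the ARIMA(0,1,1) forecasts (3.176061) nearly coincide with the ETS(A,N,N) forecasts from Model 1 (3.176212). This is expected, because an ARIMA(0,1,1) model produces the same forecasts as simple exponential smoothing with alpha = 1 + ma1. A small check using only the coefficients reported in the two outputs:

```r
## Values copied from the model output above
ma1       <- -0.4370   # ARIMA(0,1,1) moving-average coefficient
ets_alpha <-  0.5619   # ETS(A,N,N) smoothing parameter

## ARIMA(0,1,1) corresponds to simple exponential smoothing with alpha = 1 + ma1
implied_alpha <- 1 + ma1
difference <- abs(implied_alpha - ets_alpha)
```

The implied alpha of about 0.563 differs from the ETS estimate by roughly 0.001, so for Iquitos the two models are effectively the same forecaster fitted by different routines.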

Model 3 - Holt-Winters’ Seasonal Method

## San Juan
sjHW <- ts(sj$total_cases, frequency=3, start=c(1))
smodel3 <- hw(sjHW, seasonal="additive", damped=FALSE, h=52)
autoplot(smodel3) + xlab("Week") + ylab("Total Cases of Dengue Fever")

smodel3
##          Point Forecast      Lo 80     Hi 80      Lo 95     Hi 95
## 243.3333       4.629310  -11.40265  20.66127  -19.88946  29.14808
## 243.6667       4.919338  -17.75327  27.59195  -29.75542  39.59410
## 244.0000       5.655215  -22.11387  33.42430  -36.81393  48.12437
## 244.3333       5.284527  -26.78259  37.35164  -43.75789  54.32695
## 244.6667       5.574556  -30.27900  41.42811  -49.25873  60.40784
## 245.0000       6.310433  -32.96687  45.58774  -53.75902  66.37989
## 245.3333       5.939745  -36.48712  48.36661  -58.94655  70.82604
## 245.6667       6.229773  -39.12848  51.58803  -63.13969  75.59923
## 246.0000       6.965650  -41.14625  55.07755  -66.61515  80.54645
## 246.3333       6.594962  -44.12231  57.31223  -70.97041  84.16033
## 246.6667       6.884990  -46.31020  60.08018  -74.47003  88.24001
## 247.0000       7.620867  -47.94230  63.18404  -77.35567  92.59740
## 247.3333       7.250179  -50.58499  65.08535  -81.20108  95.70144
## 247.6667       7.540207  -52.48102  67.56144  -84.25434  99.33475
## 248.0000       8.276084  -53.85475  70.40692  -86.74482 103.29699
## 248.3333       7.905396  -56.26653  72.07732  -90.23709 106.04789
## 248.6667       8.195424  -57.95464  74.34549  -92.97237 109.36322
## 249.0000       8.931301  -59.13983  77.00243  -95.17450 113.03711
## 249.3333       8.560613  -61.37956  78.50079  -98.40366 115.52488
## 249.6667       8.850642  -62.90992  80.61121 -100.89767 118.59895
## 250.0000       9.586519  -63.94973  83.12277 -102.87747 122.05051
## 250.3333       9.215830  -66.05491  84.48657 -105.90084 124.33250
## 250.6667       9.505859  -67.46030  86.47202 -108.20372 127.21544
## 251.0000      10.241736  -68.38361  88.86709 -110.00536 130.48883
## 251.3333       9.871048  -70.37984  90.12194 -112.86209 132.60419
## 251.6667      10.161076  -71.68307  92.00522 -115.00874 135.33090
## 252.0000      10.896953  -72.51033  94.30424 -116.66348 138.45739
## 252.3333      10.526265  -74.41600  95.46853 -119.38173 140.43426
## 252.6667      10.816293  -75.63371  97.26630 -121.39758 143.03016
## 253.0000      11.552170  -76.38001  99.48435 -122.92850 146.03284
## 253.3333      11.181482  -78.20888 100.57185 -125.52928 147.89225
## 253.6667      11.471510  -79.35363 102.29665 -127.43356 150.37658
## 254.0000      12.207387  -80.03049 104.44527 -128.85828 153.27305
## 254.3333      11.836699  -81.79316 105.46656 -131.35781 155.03121
## 254.6667      12.126728  -82.87471 107.12817 -133.16544 157.41889
## 255.0000      12.862604  -83.49116 109.21637 -134.49777 160.22298
## 255.3333      12.491916  -85.19599 110.17982 -136.90884 161.89268
## 255.6667      12.781945  -86.22212 111.78601 -138.63171 164.19560
## 256.0000      13.517822  -86.78540 113.82104 -139.88271 166.91835
## 256.3333      13.147134  -88.43913 114.73340 -142.21565 168.50991
## 256.6667      13.437162  -89.41614 116.29046 -143.86339 170.73771
## 257.0000      14.173039  -89.93213 118.27821 -145.04208 173.38816
## 257.3333      13.802351  -91.54030 119.14501 -147.30534 174.91004
## 257.6667      14.092379  -92.47339 120.65815 -148.88590 177.07066
## 258.0000      14.828256  -92.94699 122.60350 -149.99975 179.65626
## 258.3333      14.457568  -94.51420 123.42933 -152.20037 181.11550
## 258.6667      14.747596  -95.40770 124.90289 -153.72039 183.21558
## 259.0000      15.483473  -95.84300 126.80995 -154.77568 185.74262
## 259.3333      15.112785  -97.37313 127.59870 -156.91958 187.14515
## 259.6667      15.402814  -98.23072 129.03635 -158.38468 189.19031
## 260.0000      16.138690  -98.63121 130.90859 -159.38672 191.66410
## 260.3333      15.768002 -100.12757 131.66357 -161.47898 193.01498
summary(smodel3)
## 
## Forecast method: Holt-Winters' additive method
## 
## Model Information:
## Holt-Winters' additive method 
## 
## Call:
##  hw(y = sjHW, h = 52, seasonal = "additive", damped = FALSE) 
## 
##   Smoothing parameters:
##     alpha = 0.9999 
##     beta  = 1e-04 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = 3.5836 
##     b = 0.235 
##     s = -0.1627 -0.223 0.3858
## 
##   sigma:  12.5098
## 
##      AIC     AICc      BIC 
## 8472.665 8472.866 8509.377 
## 
## Error measures:
##                      ME     RMSE      MAE  MPE MAPE      MASE       ACF1
## Training set -0.2277723 12.44943 7.773029 -Inf  Inf 0.6154498 0.09173901
## 
## Forecasts:
##          Point Forecast      Lo 80     Hi 80      Lo 95     Hi 95
## 243.3333       4.629310  -11.40265  20.66127  -19.88946  29.14808
## 243.6667       4.919338  -17.75327  27.59195  -29.75542  39.59410
## 244.0000       5.655215  -22.11387  33.42430  -36.81393  48.12437
## 244.3333       5.284527  -26.78259  37.35164  -43.75789  54.32695
## 244.6667       5.574556  -30.27900  41.42811  -49.25873  60.40784
## 245.0000       6.310433  -32.96687  45.58774  -53.75902  66.37989
## 245.3333       5.939745  -36.48712  48.36661  -58.94655  70.82604
## 245.6667       6.229773  -39.12848  51.58803  -63.13969  75.59923
## 246.0000       6.965650  -41.14625  55.07755  -66.61515  80.54645
## 246.3333       6.594962  -44.12231  57.31223  -70.97041  84.16033
## 246.6667       6.884990  -46.31020  60.08018  -74.47003  88.24001
## 247.0000       7.620867  -47.94230  63.18404  -77.35567  92.59740
## 247.3333       7.250179  -50.58499  65.08535  -81.20108  95.70144
## 247.6667       7.540207  -52.48102  67.56144  -84.25434  99.33475
## 248.0000       8.276084  -53.85475  70.40692  -86.74482 103.29699
## 248.3333       7.905396  -56.26653  72.07732  -90.23709 106.04789
## 248.6667       8.195424  -57.95464  74.34549  -92.97237 109.36322
## 249.0000       8.931301  -59.13983  77.00243  -95.17450 113.03711
## 249.3333       8.560613  -61.37956  78.50079  -98.40366 115.52488
## 249.6667       8.850642  -62.90992  80.61121 -100.89767 118.59895
## 250.0000       9.586519  -63.94973  83.12277 -102.87747 122.05051
## 250.3333       9.215830  -66.05491  84.48657 -105.90084 124.33250
## 250.6667       9.505859  -67.46030  86.47202 -108.20372 127.21544
## 251.0000      10.241736  -68.38361  88.86709 -110.00536 130.48883
## 251.3333       9.871048  -70.37984  90.12194 -112.86209 132.60419
## 251.6667      10.161076  -71.68307  92.00522 -115.00874 135.33090
## 252.0000      10.896953  -72.51033  94.30424 -116.66348 138.45739
## 252.3333      10.526265  -74.41600  95.46853 -119.38173 140.43426
## 252.6667      10.816293  -75.63371  97.26630 -121.39758 143.03016
## 253.0000      11.552170  -76.38001  99.48435 -122.92850 146.03284
## 253.3333      11.181482  -78.20888 100.57185 -125.52928 147.89225
## 253.6667      11.471510  -79.35363 102.29665 -127.43356 150.37658
## 254.0000      12.207387  -80.03049 104.44527 -128.85828 153.27305
## 254.3333      11.836699  -81.79316 105.46656 -131.35781 155.03121
## 254.6667      12.126728  -82.87471 107.12817 -133.16544 157.41889
## 255.0000      12.862604  -83.49116 109.21637 -134.49777 160.22298
## 255.3333      12.491916  -85.19599 110.17982 -136.90884 161.89268
## 255.6667      12.781945  -86.22212 111.78601 -138.63171 164.19560
## 256.0000      13.517822  -86.78540 113.82104 -139.88271 166.91835
## 256.3333      13.147134  -88.43913 114.73340 -142.21565 168.50991
## 256.6667      13.437162  -89.41614 116.29046 -143.86339 170.73771
## 257.0000      14.173039  -89.93213 118.27821 -145.04208 173.38816
## 257.3333      13.802351  -91.54030 119.14501 -147.30534 174.91004
## 257.6667      14.092379  -92.47339 120.65815 -148.88590 177.07066
## 258.0000      14.828256  -92.94699 122.60350 -149.99975 179.65626
## 258.3333      14.457568  -94.51420 123.42933 -152.20037 181.11550
## 258.6667      14.747596  -95.40770 124.90289 -153.72039 183.21558
## 259.0000      15.483473  -95.84300 126.80995 -154.77568 185.74262
## 259.3333      15.112785  -97.37313 127.59870 -156.91958 187.14515
## 259.6667      15.402814  -98.23072 129.03635 -158.38468 189.19031
## 260.0000      16.138690  -98.63121 130.90859 -159.38672 191.66410
## 260.3333      15.768002 -100.12757 131.66357 -161.47898 193.01498
checkresiduals(smodel3)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt-Winters' additive method
## Q* = 30.484, df = 3, p-value = 1.091e-06
## 
## Model df: 7.   Total lags used: 10

## Iquitos
iqHW <- ts(iq$total_cases, frequency=3, start=c(1))
imodel3 <- hw(iqHW, seasonal="additive", damped=FALSE, h=52)
autoplot(imodel3) + xlab("Week") + ylab("Total Cases of Dengue Fever")

imodel3
##          Point Forecast      Lo 80    Hi 80     Lo 95    Hi 95
## 158.3333       2.820597  -5.618911 11.26011 -10.08652 15.72771
## 158.6667       3.142969  -6.515785 12.80172 -11.62882 17.91476
## 159.0000       3.197722  -7.543121 13.93856 -13.22898 19.62442
## 159.3333       2.784539  -8.939615 14.50869 -15.14601 20.71509
## 159.6667       3.106911  -9.524235 15.73806 -16.21076 22.42458
## 160.0000       3.161663 -10.315868 16.63920 -17.45044 23.77377
## 160.3333       2.748481 -11.525894 17.02286 -19.08229 24.57925
## 160.6667       3.070853 -11.958175 18.09988 -19.91406 26.05577
## 161.0000       3.125605 -12.622206 18.87342 -20.95859 27.20980
## 161.3333       2.712423 -13.723249 19.14810 -22.42377 27.84862
## 161.6667       3.034795 -14.061085 20.13067 -23.11110 29.18069
## 162.0000       3.089547 -14.642199 20.82129 -24.02882 30.20791
## 162.3333       2.676365 -15.669655 21.02239 -25.38145 30.73418
## 162.6667       2.998737 -15.941646 21.93912 -25.96808 31.96555
## 163.0000       3.053489 -16.463368 22.57035 -26.79497 32.90195
## 163.3333       2.640307 -17.436875 22.71749 -28.06509 33.34571
## 163.6667       2.962679 -17.659609 23.58497 -28.57639 34.50175
## 164.0000       3.017431 -18.136108 24.17097 -29.33412 35.36898
## 164.3333       2.604249 -19.067891 24.27639 -30.54043 35.74893
## 164.6667       2.926621 -19.251997 25.10524 -30.99265 36.84589
## 165.0000       2.981373 -19.692587 25.65533 -31.69546 37.65820
## 165.3333       2.568191 -20.590863 25.72724 -32.85053 37.98691
## 165.6667       2.890562 -20.743631 26.52476 -33.25482 39.03594
## 166.0000       2.945315 -21.154818 27.04545 -33.91266 39.80329
## 166.3333       2.532133 -22.025425 27.08969 -35.02541 40.08968
## 166.6667       2.854504 -22.152112 27.86112 -35.38982 41.09882
## 167.0000       2.909257 -22.538653 28.35717 -36.00996 41.82848
## 167.3333       2.496074 -23.385913 28.37806 -37.08701 42.07916
## 167.6667       2.818446 -23.490458 29.12735 -37.41755 43.05444
## 168.0000       2.873199 -23.855954 29.60235 -38.00551 43.75191
## 168.3333       2.460016 -24.683173 29.60321 -39.05191 43.97194
## 168.6667       2.782388 -24.768617 30.33339 -39.35324 44.91801
## 169.0000       2.837140 -25.115873 30.79015 -39.91330 45.58759
## 169.3333       2.423958 -25.925646 30.77356 -40.93302 45.78094
## 169.6667       2.746330 -25.994393 31.48705 -41.20881 46.70147
## 170.0000       2.801082 -26.325645 31.92781 -41.74440 47.34657
## 170.3333       2.387900 -27.120053 31.89585 -42.74062 47.51642
## 170.6667       2.710272 -27.174044 32.59459 -42.99385 48.41439
## 171.0000       2.765024 -27.491106 33.02115 -43.50773 49.03778
## 171.3333       2.351842 -28.271849 32.97553 -44.48305 49.18674
## 171.6667       2.674214 -28.312678 33.66111 -44.71615 50.06458
## 172.0000       2.728966 -28.617047 34.07498 -45.21062 50.66856
## 172.3333       2.315784 -29.385534 34.01710 -46.16720 50.79877
## 172.6667       2.638156 -29.414529 34.69084 -46.38220 51.65851
## 173.0000       2.692908 -29.707457 35.09327 -46.85917 52.24499
## 173.3333       2.279726 -30.464872 35.02432 -47.79882 52.35827
## 173.6667       2.602098 -30.483152 35.68735 -47.99743 53.20162
## 174.0000       2.656850 -30.765699 36.07940 -48.45853 53.77223
## 174.3333       2.243668 -31.513048 36.00038 -49.38277 53.87011
## 174.6667       2.566039 -31.521567 36.65365 -49.56646 54.69854
## 175.0000       2.620792 -31.794641 37.03622 -50.01307 55.25465
## 175.3333       2.207610 -32.532787 36.94801 -50.92324 55.33846
summary(imodel3)
## 
## Forecast method: Holt-Winters' additive method
## 
## Model Information:
## Holt-Winters' additive method 
## 
## Call:
##  hw(y = iqHW, h = 52, seasonal = "additive", damped = FALSE) 
## 
##   Smoothing parameters:
##     alpha = 0.5565 
##     beta  = 1e-04 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = -0.1505 
##     b = -0.0136 
##     s = 0.0945 -0.2518 0.1573
## 
##   sigma:  6.5854
## 
##      AIC     AICc      BIC 
## 4694.342 4694.653 4727.598 
## 
## Error measures:
##                      ME     RMSE      MAE MPE MAPE     MASE        ACF1
## Training set 0.03339355 6.536368 3.623232 NaN  Inf 0.753234 -0.01440227
## 
## Forecasts:
##          Point Forecast      Lo 80    Hi 80     Lo 95    Hi 95
## 158.3333       2.820597  -5.618911 11.26011 -10.08652 15.72771
## 158.6667       3.142969  -6.515785 12.80172 -11.62882 17.91476
## 159.0000       3.197722  -7.543121 13.93856 -13.22898 19.62442
## 159.3333       2.784539  -8.939615 14.50869 -15.14601 20.71509
## 159.6667       3.106911  -9.524235 15.73806 -16.21076 22.42458
## 160.0000       3.161663 -10.315868 16.63920 -17.45044 23.77377
## 160.3333       2.748481 -11.525894 17.02286 -19.08229 24.57925
## 160.6667       3.070853 -11.958175 18.09988 -19.91406 26.05577
## 161.0000       3.125605 -12.622206 18.87342 -20.95859 27.20980
## 161.3333       2.712423 -13.723249 19.14810 -22.42377 27.84862
## 161.6667       3.034795 -14.061085 20.13067 -23.11110 29.18069
## 162.0000       3.089547 -14.642199 20.82129 -24.02882 30.20791
## 162.3333       2.676365 -15.669655 21.02239 -25.38145 30.73418
## 162.6667       2.998737 -15.941646 21.93912 -25.96808 31.96555
## 163.0000       3.053489 -16.463368 22.57035 -26.79497 32.90195
## 163.3333       2.640307 -17.436875 22.71749 -28.06509 33.34571
## 163.6667       2.962679 -17.659609 23.58497 -28.57639 34.50175
## 164.0000       3.017431 -18.136108 24.17097 -29.33412 35.36898
## 164.3333       2.604249 -19.067891 24.27639 -30.54043 35.74893
## 164.6667       2.926621 -19.251997 25.10524 -30.99265 36.84589
## 165.0000       2.981373 -19.692587 25.65533 -31.69546 37.65820
## 165.3333       2.568191 -20.590863 25.72724 -32.85053 37.98691
## 165.6667       2.890562 -20.743631 26.52476 -33.25482 39.03594
## 166.0000       2.945315 -21.154818 27.04545 -33.91266 39.80329
## 166.3333       2.532133 -22.025425 27.08969 -35.02541 40.08968
## 166.6667       2.854504 -22.152112 27.86112 -35.38982 41.09882
## 167.0000       2.909257 -22.538653 28.35717 -36.00996 41.82848
## 167.3333       2.496074 -23.385913 28.37806 -37.08701 42.07916
## 167.6667       2.818446 -23.490458 29.12735 -37.41755 43.05444
## 168.0000       2.873199 -23.855954 29.60235 -38.00551 43.75191
## 168.3333       2.460016 -24.683173 29.60321 -39.05191 43.97194
## 168.6667       2.782388 -24.768617 30.33339 -39.35324 44.91801
## 169.0000       2.837140 -25.115873 30.79015 -39.91330 45.58759
## 169.3333       2.423958 -25.925646 30.77356 -40.93302 45.78094
## 169.6667       2.746330 -25.994393 31.48705 -41.20881 46.70147
## 170.0000       2.801082 -26.325645 31.92781 -41.74440 47.34657
## 170.3333       2.387900 -27.120053 31.89585 -42.74062 47.51642
## 170.6667       2.710272 -27.174044 32.59459 -42.99385 48.41439
## 171.0000       2.765024 -27.491106 33.02115 -43.50773 49.03778
## 171.3333       2.351842 -28.271849 32.97553 -44.48305 49.18674
## 171.6667       2.674214 -28.312678 33.66111 -44.71615 50.06458
## 172.0000       2.728966 -28.617047 34.07498 -45.21062 50.66856
## 172.3333       2.315784 -29.385534 34.01710 -46.16720 50.79877
## 172.6667       2.638156 -29.414529 34.69084 -46.38220 51.65851
## 173.0000       2.692908 -29.707457 35.09327 -46.85917 52.24499
## 173.3333       2.279726 -30.464872 35.02432 -47.79882 52.35827
## 173.6667       2.602098 -30.483152 35.68735 -47.99743 53.20162
## 174.0000       2.656850 -30.765699 36.07940 -48.45853 53.77223
## 174.3333       2.243668 -31.513048 36.00038 -49.38277 53.87011
## 174.6667       2.566039 -31.521567 36.65365 -49.56646 54.69854
## 175.0000       2.620792 -31.794641 37.03622 -50.01307 55.25465
## 175.3333       2.207610 -32.532787 36.94801 -50.92324 55.33846
checkresiduals(imodel3)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt-Winters' additive method
## Q* = 19.038, df = 3, p-value = 0.0002685
## 
## Model df: 7.   Total lags used: 10
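
Both Holt-Winters series above are declared with frequency=3, so the fitted seasonal component repeats every three observations rather than once a year. As a hedged sketch of how an annual cycle in weekly data could be declared instead, the example below uses base R's HoltWinters() from the stats package with frequency = 52. The data are synthetic (a sine wave plus noise), and the object names are illustrative assumptions, not part of the original analysis.

```r
# Minimal sketch on synthetic data (NOT the dengue series).
set.seed(42)
n_years <- 6
t <- seq_len(52 * n_years)

# Illustrative weekly series with one seasonal cycle per 52 observations.
y <- 20 + 10 * sin(2 * pi * t / 52) + rnorm(length(t), sd = 2)
y_weekly <- ts(y, frequency = 52, start = c(1, 1))

# Additive Holt-Winters; smoothing parameters are estimated from the data.
fit <- HoltWinters(y_weekly, seasonal = "additive")

# One year ahead with 95% prediction bounds (columns: fit, upr, lwr).
fc <- predict(fit, n.ahead = 52, prediction.interval = TRUE, level = 0.95)
head(fc)
```

HoltWinters() needs at least two full seasonal periods to initialise the seasonal states, so this approach assumes at least two years of weekly observations per city.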

Performance / Accuracy

Each model had benefits and limitations that produced varying results in predicting the total number of cases in each city. The ETS model showed strong performance in its residual plots, although the Ljung-Box p-value of 0.0001667 is below 0.05 and therefore indicates that some autocorrelation remains in the residuals. The forecast range was fairly similar to that of the ARIMA model, which makes sense given the automatic selection procedure used and the make-up of each model. The selected ETS models produced AIC, AICc, and BIC values of 8462, 8462, 8475 for San Juan and 4683, 4683, 4695 for Iquitos. The ETS model for San Juan performed slightly better than the one for Iquitos. The selected model also had the best values among the other ETS models created for comparison.
The ARIMA models showed similar results to the ETS models. They produced AIC, AICc, and BIC values of 5170, 5170, 5733 for San Juan and 3109, 3109, 3117 for Iquitos. In this case, the Iquitos model clearly outperformed San Juan in terms of predictive power and model fit.
Finally, the Holt-Winters' seasonal method produced AIC, AICc, and BIC values of 8472, 8472, 8509 for San Juan and 4694, 4694, 4727 for Iquitos. Iquitos outperformed San Juan in the Holt-Winters' model by a large margin and proved to be a better predictor of total cases. It is important to note that these values should not be compared across model types; they exist for comparing candidate specifications within each method.
In evaluating the plotted forecasts and the model fit values, the Holt-Winters' seasonal method performed the best of the three techniques in predicting total cases of Dengue Fever. Within this method, Iquitos showed a much stronger forecasting ability and overall model fit. The Holt-Winters' models also showed favorable residuals in terms of white noise in the ACF plots and the overall shape of the residual distributions, although the Ljung-Box tests (p-value = 1.091e-06 for San Juan and 0.0002685 for Iquitos) suggest that some autocorrelation remains.
Ultimately, the Holt-Winters' seasonal method was the best model produced overall and should be chosen in the given analysis.
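
Because information criteria are not comparable across model types, one common alternative is to hold out the most recent observations and compare forecast errors on that holdout. The sketch below illustrates the idea with base R only. The series is synthetic, the 52-week holdout and the two candidate models are assumptions chosen for illustration, and the helper names rmse and mae are hypothetical.

```r
# Minimal sketch on synthetic data (NOT the dengue series).
set.seed(7)
t <- seq_len(52 * 6)
y <- ts(20 + 10 * sin(2 * pi * t / 52) + rnorm(length(t), sd = 2),
        frequency = 52)

# Hold out the final 52 weeks for evaluation.
h <- 52
n <- length(y)
train <- ts(y[1:(n - h)], frequency = 52)
test  <- as.numeric(y[(n - h + 1):n])

# Hypothetical helper functions for the error measures.
rmse <- function(actual, pred) sqrt(mean((actual - pred)^2))
mae  <- function(actual, pred) mean(abs(actual - pred))

# Candidate 1: ARIMA(0,1,1) fitted with stats::arima().
fit_arima  <- arima(train, order = c(0, 1, 1))
pred_arima <- as.numeric(predict(fit_arima, n.ahead = h)$pred)

# Candidate 2: additive Holt-Winters fitted with stats::HoltWinters().
fit_hw  <- HoltWinters(train, seasonal = "additive")
pred_hw <- as.numeric(predict(fit_hw, n.ahead = h))

# Same holdout, same error measures, so the rows are directly comparable.
scores <- data.frame(
  model = c("ARIMA(0,1,1)", "Holt-Winters additive"),
  RMSE  = c(rmse(test, pred_arima), rmse(test, pred_hw)),
  MAE   = c(mae(test, pred_arima), mae(test, pred_hw))
)
scores
```

Since every candidate is scored on the same held-out weeks, the resulting RMSE and MAE values can be compared across model types, which is not true of AIC, AICc, or BIC.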

Limitations

There were a number of limitations in the data set and the forecasting models selected that could be improved on in future work. A majority of the time spent in this analysis went toward data exploration and evaluating the time components of the series, such as month, year, and week start date. These time components showed major seasonal trends that proved useful in formulating the time series models discussed. One limitation of the analysis is the types of models used to predict total cases of Dengue Fever. While the models were appropriate and have been used in similar situations, the overall fit could be improved with more advanced techniques and further investigation into variable significance. Introducing techniques such as a Neural Net model could improve the accuracy of future predictions. Another limitation came from the data provided. As discussed, adding variables that more closely document factors such as mosquito population and cultural events could add to the predictive power of the model. Data collection is an expensive process, however, and the data sets provided already give a large range of variables to make predictions on. More time and greater knowledge of forecasting techniques will improve future studies of the Predicting Disease Spread project.

Future Work

The analysis developed from the data from the Centers for Disease Control and Prevention gives valuable predictions of future cases of Dengue Fever and the factors that lead to its spread. The predictions can be improved in future work with more advanced forecasting models, improved environmental variables, and outside research. The models created serve the purpose of predicting the number of total cases in the future but could be improved with more specialized and advanced techniques. The forecast ranges produced were fairly large, which can cause issues in planning for the spread of a harmful disease. A more focused approach would allow disease prevention agencies to anticipate the resources and time required for aid. Models such as a Neural Net, combined with bootstrapping, may provide a narrower forecast range. The datasets could also incorporate additional factors specific to each city, such as average mosquito population, vaccination rates, and birth rate. These values can help to predict the growth in each city's population and in the carriers of the disease. Outside research would also provide valuable insight into topics like the income of the population most affected by Dengue Fever. This would allow for a more focused approach on the population most at risk of catching the viral disease. With these additional resources and more time, future work could produce stronger predictions and a better understanding of the total cases of Dengue Fever in San Juan and Iquitos. A more refined approach can provide valuable forecasts to help prevent the spread of a harmful disease in the targeted cities.