Cholera Reported Cases in Africa

Africa CDC

2024-10-05

Introduction on Cholera Reported Cases

This document analyzes annual reported cholera cases (1970 to 2024) using time series techniques. The analysis covers visualizing the data, checking stationarity, and modeling with ARIMA.

Load and Visualize Data on Cholera Reported Cases

## Time Series:
## Start = 1970 
## End = 2024 
## Frequency = 1 
##       reported_cases
##  [1,]          11086
##  [2,]          72654
##  [3,]           5137
##  [4,]           6337
##  [5,]           6074
##  [6,]           6650
##  [7,]           3180
##  [8,]           9502
##  [9,]          24643
## [10,]          21586
## [11,]          18742
## [12,]          19415
## [13,]          46924
## [14,]          37383
## [15,]          17504
## [16,]          31884
## [17,]          35585
## [18,]          34358
## [19,]          23012
## [20,]          35857
## [21,]          43262
## [22,]         153367
## [23,]          92079
## [24,]          76713
## [25,]         162413
## [26,]          72597
## [27,]         108535
## [28,]         118367
## [29,]         211761
## [30,]         210820
## [31,]         124484
## [32,]         173954
## [33,]         137866
## [34,]         108067
## [35,]          95560
## [36,]         125018
## [37,]         234226
## [38,]         167298
## [39,]         179323
## [40,]         217333
## [41,]         115106
## [42,]         188678
## [43,]         117570
## [44,]          56329
## [45,]         105287
## [46,]          71176
## [47,]          71058
## [48,]         179835
## [49,]         120650
## [50,]          55087
## [51,]          47256
## [52,]         141467
## [53,]         112282
## [54,]         177521
## [55,]         100317

Check for Stationarity and Differencing on Cholera Reported Cases

ADF Unit Root Test for Stationarity on Cholera Reported Cases

## 
##  Augmented Dickey-Fuller Test
## 
## data:  cases.ts
## Dickey-Fuller = -2.0897, Lag order = 3, p-value = 0.5384
## alternative hypothesis: stationary
## Warning in adf.test(cases.ts_diff1): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  cases.ts_diff1
## Dickey-Fuller = -5.296, Lag order = 3, p-value = 0.01
## alternative hypothesis: stationary
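
The second test above is run on `cases.ts_diff1`, which by its name appears to be the first difference of the series. As a minimal sketch of that transformation (written in Python purely for illustration, since the report's own output comes from R; the helper name `first_difference` is hypothetical), first differencing replaces each value with its change from the previous year:

```python
def first_difference(series):
    """Return the lag-1 differences y[t] - y[t-1]; the result is one element shorter."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


# First four reported-case counts from the series above (1970 to 1973).
cases_head = [11086, 72654, 5137, 6337]
diffs = first_difference(cases_head)
print(diffs)  # [61568, -67517, 1200]
```

On the original series the ADF test does not reject a unit root (p-value = 0.5384), while on the differenced series it does (p-value below 0.01), which is the usual justification for modeling the differenced data.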

Fractional Differencing and Long Memory Test on Cholera Reported Cases

## $d
## [1] 1.062263
## 
## $sd.as
## [1] 0.6192913
## 
## $sd.reg
## [1] 0.6941739
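
The output above has the shape of a log-periodogram estimate of the fractional differencing parameter `d`, with an asymptotic and a regression standard error (for example, as returned by the estimators in R's fracdiff package; the report does not show which call was used). As a hedged illustration of what a given `d` means, the sketch below expands the operator (1 - B)^d into its filter weights using the standard recursion w_0 = 1, w_k = w_{k-1} (k - 1 - d) / k. It is written in Python for illustration only, and the function name `frac_diff_weights` is hypothetical.

```python
def frac_diff_weights(d, n_weights):
    """Weights of the fractional differencing filter (1 - B)^d, truncated to n_weights terms."""
    weights = [1.0]
    for k in range(1, n_weights):
        # Binomial-expansion recursion: w_k = w_{k-1} * (k - 1 - d) / k
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


# d = 1 reproduces ordinary first differencing: y[t] - y[t-1].
print(frac_diff_weights(1.0, 4))

# Estimated d for reported cases, taken from the output above.
print(frac_diff_weights(1.062263, 4))
```

With d = 1 the weights are 1, -1, 0, 0, that is, plain first differencing. With standard errors of roughly 0.62 and 0.69, the estimate d = 1.06 is imprecise and is consistent with d = 1.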

ARIMA Modeling on Cholera Reported Cases

## Series: cases.ts_diff1 
## ARIMA(2,1,2) 
## 
## Coefficients:
##          ar1      ar2      ma1     ma2
##       0.0482  -0.0489  -1.6252  0.6253
## s.e.  0.2890   0.2064   0.2903  0.2787
## 
## sigma^2 = 2.246e+09:  log likelihood = -646.92
## AIC=1303.84   AICc=1305.11   BIC=1313.69
## 
## Training set error measures:
##                     ME     RMSE     MAE      MPE     MAPE      MASE
## Training set -969.8791 45148.98 32370.8 437.2808 667.6828 0.4806923
##                      ACF1
## Training set -0.002766728
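
The information criteria in the summary above can be checked by hand from the printed log likelihood. The sketch below (Python, for illustration only) uses the usual definitions; the parameter count `k = 5` (four ARMA coefficients plus the innovation variance) and the effective sample size `n = 53` (54 differenced values, less one for the model's own d = 1) are assumptions inferred from the output, not values stated in the report.

```python
import math

log_lik = -646.92  # log likelihood printed in the model summary above
k = 5              # assumption: ar1, ar2, ma1, ma2 plus the innovation variance
n = 53             # assumption: 54 values in cases.ts_diff1, minus 1 for d = 1

aic = -2 * log_lik + 2 * k
aicc = aic + (2 * k * (k + 1)) / (n - k - 1)   # small-sample correction
bic = aic + k * (math.log(n) - 2)              # same as -2*log_lik + k*log(n)

print(round(aic, 2), round(aicc, 2), round(bic, 2))
```

Under these assumptions the results agree with the printed AIC=1303.84, AICc=1305.11 and BIC=1313.69 to within the rounding of the log likelihood. Because the model is fitted to an already differenced series and itself uses d = 1, the original series is effectively differenced twice.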

Forecasting and Model Evaluation on Cholera Reported Cases

## RMSE: 130737.7
## MAE: 112511
## MAPE: 174.4429 %
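
The report does not show how these three measures were computed, so the following is a minimal sketch under the usual definitions, written in Python for illustration. The function name `forecast_errors` and the `actual` and `predicted` values are made up for the example; they are not the report's hold-out data.

```python
import math


def forecast_errors(actual, predicted):
    """Return (RMSE, MAE, MAPE in percent) for paired actual and predicted values."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # MAPE scales each error by the observed value, so it is undefined when an actual is 0.
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return rmse, mae, mape


# Hypothetical values, chosen only to make the arithmetic easy to follow.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
rmse, mae, mape = forecast_errors(actual, predicted)
print(rmse, mae, mape)
```

Because MAPE divides by the observed value, years with few cases inflate it. A MAPE above 100%, such as the 174.44% reported here, means that on average the absolute error is larger than the observed value.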

Conclusion on Cholera Reported Cases

This analysis covers the basic steps in exploring and modeling cholera case data using time series methods. Further refinement of models and diagnostics could be pursued to enhance predictive accuracy.

Introduction on Death due to Cholera

This document provides an analysis of deaths due to cholera using time series techniques. The analysis includes visualizing the data, checking stationarity, and modeling with ARIMA.

Load and Visualize Data on Death due to Cholera

## Time Series:
## Start = 1970 
## End = 2024 
## Frequency = 1 
##       death
##  [1,]   747
##  [2,] 11427
##  [3,]   386
##  [4,]   636
##  [5,]   582
##  [6,]   504
##  [7,]   194
##  [8,]   462
##  [9,]  1591
## [10,]  1869
## [11,]  1185
## [12,]  1581
## [13,]  2988
## [14,]  1903
## [15,]  1711
## [16,]  3837
## [17,]  3490
## [18,]  2610
## [19,]  2237
## [20,]  1443
## [21,]  2167
## [22,] 13998
## [23,]  5319
## [24,]  2542
## [25,]  8136
## [26,]  2962
## [27,]  5935
## [28,]  5853
## [29,]  9858
## [30,]  8707
## [31,]  4960
## [32,]  2752
## [33,]  4551
## [34,]  1884
## [35,]  2331
## [36,]  2230
## [37,]  6292
## [38,]  3996
## [39,]  5074
## [40,]  4883
## [41,]  3397
## [42,]  4148
## [43,]  2042
## [44,]  1366
## [45,]  1882
## [46,]   937
## [47,]  1762
## [48,]  3217
## [49,]  2436
## [50,]   880
## [51,]   741
## [52,]  4094
## [53,]  2495
## [54,]  1745
## [55,]  1379

Check for Stationarity and Differencing on Death due to Cholera

ADF Unit Root Test for Stationarity on Death due to Cholera

## 
##  Augmented Dickey-Fuller Test
## 
## data:  death.ts
## Dickey-Fuller = -2.2376, Lag order = 3, p-value = 0.4788
## alternative hypothesis: stationary
## Warning in adf.test(death.ts_diff1): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  death.ts_diff1
## Dickey-Fuller = -5.3765, Lag order = 3, p-value = 0.01
## alternative hypothesis: stationary

Fractional Differencing and Long Memory Test on Death due to Cholera

## $d
## [1] 0.8461494
## 
## $sd.as
## [1] 0.6192913
## 
## $sd.reg
## [1] 0.4145109

ARIMA Modeling on Death due to Cholera

## Series: death.ts_diff1 
## ARIMA(2,1,2) 
## 
## Coefficients:
##           ar1      ar2      ma1     ma2
##       -0.0554  -0.1083  -1.6733  0.6733
## s.e.   0.2301   0.2093   0.2318  0.2190
## 
## sigma^2 = 8216034:  log likelihood = -498.66
## AIC=1007.33   AICc=1008.61   BIC=1017.18
## 
## Training set error measures:
##                     ME     RMSE      MAE      MPE     MAPE      MASE      ACF1
## Training set -82.56852 2730.438 1762.568 84.76194 221.9807 0.5033628 0.1352915

Forecasting and Model Evaluation on Death due to Cholera

##      Point Forecast      Lo 80     Hi 80      Lo 95     Hi 95
## 2025      -631.0939  -6471.863  5209.675  -9563.779  8301.591
## 2026      -448.0867  -6562.465  5666.291  -9799.221  8903.048
## 2027      -574.4255  -8221.251  7072.400 -12269.237 11120.386
## 2028      -487.2076  -8586.946  7612.531 -12874.689 11900.274
## 2029      -547.4184  -9579.752  8484.916 -14361.181 13266.344
## 2030      -505.8520 -10049.315  9037.611 -15101.318 14089.615
## 2031      -534.5473 -10775.070  9705.975 -16196.075 15126.981
## 2032      -514.7375 -11266.842 10237.367 -16958.662 15929.187
## 2033      -528.4132 -11860.352 10803.526 -17859.118 16802.292
## 2034      -518.9722 -12341.514 11303.570 -18599.990 17562.046
## 2035      -525.4898 -12859.215 11808.235 -19388.295 18337.315
## 2036      -520.9904 -13318.731 12276.750 -20093.445 19051.464
## 2037      -524.0965 -13787.415 12739.222 -20808.592 19760.399
## 2038      -521.9522 -14223.127 13179.222 -21476.091 20432.186
## 2039      -523.4325 -14656.866 13610.001 -22138.654 21091.789
## 2040      -522.4106 -15069.920 14025.099 -22770.906 21726.085
## 2041      -523.1161 -15476.827 14430.595 -23392.843 22346.611
## 2042      -522.6290 -15869.379 14824.121 -23993.458 22948.200
## 2043      -522.9653 -16254.564 15208.633 -24582.369 23536.438
## 2044      -522.7331 -16628.890 15583.424 -25154.975 24109.508
## 2045      -522.8934 -16995.835 15950.048 -25716.083 24670.296
## 2046      -522.7828 -17354.017 16308.452 -26263.934 25218.369
## 2047      -522.8591 -17705.255 16659.537 -26801.066 25755.348
## 2048      -522.8064 -18049.101 17003.488 -27326.960 26281.348
## 2049      -522.8428 -18386.571 17340.886 -27843.058 26797.372
## 2050      -522.8177 -18717.618 17671.983 -28349.364 27303.728
## 2051      -522.8350 -19042.861 17997.191 -28846.771 27801.101
## 2052      -522.8231 -19362.413 18316.767 -29335.490 28289.844
## 2053      -522.8313 -19676.688 18631.025 -29816.127 28770.464
## 2054      -522.8256 -19985.852 18940.201 -30288.956 29243.304
## 2055      -522.8296 -20290.206 19244.547 -30754.423 29708.764
## 2056      -522.8268 -20589.928 19544.274 -31212.809 30167.156
## 2057      -522.8287 -20885.250 19839.592 -31664.464 30618.807
## 2058      -522.8274 -21176.341 20130.687 -32109.651 31063.997
## 2059      -522.8283 -21463.393 20417.736 -32548.658 31503.001
## 2060      -522.8277 -21746.558 20700.903 -32981.722 31936.067
## 2061      -522.8281 -22025.997 20980.341 -33409.088 32363.431
## 2062      -522.8278 -22301.850 21256.194 -33830.968 32785.312
## 2063      -522.8280 -22574.253 21528.597 -34247.573 33201.917
## 2064      -522.8279 -22843.331 21797.676 -34659.092 33613.437
## 2065      -522.8280 -23109.205 22063.549 -35065.711 34020.055
## 2066      -522.8279 -23371.984 22326.329 -35467.597 34421.941
## 2067      -522.8280 -23631.776 22586.120 -35864.914 34819.258
## 2068      -522.8279 -23888.680 22843.024 -36257.814 35212.158
## 2069      -522.8280 -24142.789 23097.133 -36646.441 35600.785
## 2070      -522.8279 -24394.193 23348.538 -37030.931 35985.275
## 2071      -522.8279 -24642.978 23597.322 -37411.414 36365.758
## 2072      -522.8279 -24889.222 23843.566 -37788.012 36742.356
## 2073      -522.8279 -25133.003 24087.347 -38160.842 37115.186
## 2074      -522.8279 -25374.392 24328.736 -38530.015 37484.359
## 2075      -522.8279 -25613.459 24567.803 -38895.637 37849.981
## 2076      -522.8279 -25850.270 24804.614 -39257.807 38212.151
## 2077      -522.8279 -26084.886 25039.231 -39616.623 38570.967
## 2078      -522.8279 -26317.369 25271.713 -39972.175 38926.519
## 2079      -522.8279 -26547.776 25502.120 -40324.551 39278.895
## RMSE: 4693.267
## MAE: 3766.429
## MAPE: 135.6387 %
## 

## 
## Call:
## arima(x = cases.ts, order = c(1, d, 1), seasonal = list(order = c(1, 0, 1), 
##     period = 1))
## 
## Coefficients:
## Warning in sqrt(diag(x$var.coef)): NaNs produced
##          ar1      ma1    sar1     sma1
##       0.2715  -0.5632  0.2715  -0.5632
## s.e.     NaN   1.2227     NaN   1.2227
## 
## sigma^2 estimated as 2.062e+09:  log likelihood = -655.94,  aic = 1321.89
## 
## Training set error measures:
##                    ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 4953.194 44996.19 32512.04 -31.29897 62.92448 0.8199362
##                     ACF1
## Training set -0.02012548

##      Point Forecast    Lo 80    Hi 80     Lo 95    Hi 95
## 2025       115067.7 56871.29 173264.1 26063.960 204071.5
## 2026       117963.2 54919.77 181006.6 21546.607 214379.7
## 2027       118448.1 52315.79 184580.3 17307.475 219588.7
## 2028       118497.9 49360.62 187635.2 12761.534 224234.3
## 2029       118489.3 46354.71 190623.8  8168.981 228809.6
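
The interval columns in the table above can be related back to the fitted model. Assuming the intervals are the usual symmetric Gaussian ones (point forecast plus or minus a normal quantile times the forecast standard error, which is an assumption, since the report does not state it), the one-step-ahead standard error can be recovered from the 2025 row. The sketch is in Python for illustration only.

```python
import math
from statistics import NormalDist

# 2025 row of the forecast table above.
point, lo80, hi80, lo95, hi95 = 115067.7, 56871.29, 173264.1, 26063.960, 204071.5

z80 = NormalDist().inv_cdf(0.90)    # two-sided 80% interval -> 90th percentile
z95 = NormalDist().inv_cdf(0.975)   # two-sided 95% interval -> 97.5th percentile

se_from_80 = (hi80 - lo80) / (2 * z80)
se_from_95 = (hi95 - lo95) / (2 * z95)

# Innovation standard deviation implied by sigma^2 = 2.062e+09 in the model summary.
sigma = math.sqrt(2.062e9)

print(se_from_80, se_from_95, sigma)
```

Both intervals imply a standard error of about 45,400 cases, which matches the square root of the estimated sigma^2. The lower 95% bound shrinking toward zero by 2029 reflects this large error variance accumulating over the forecast horizon.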

Correlation Plots between reported cases and deaths due to Cholera outbreak

Basic Scatter Plot with Trend Line
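
The plot for this section is not reproduced in the text. As a hedged sketch of what a scatter plot with a trend line summarizes, the Python code below computes the Pearson correlation and the ordinary least-squares line of deaths on reported cases, using the two series printed earlier in this document. The function name `pearson_and_trend` is hypothetical, and a straight-line trend is an assumption about how the original plot was drawn.

```python
import math


def pearson_and_trend(x, y):
    """Return (Pearson r, OLS slope, OLS intercept) for y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


# Annual reported cases and deaths, 1970 to 2024, copied from the series above.
cases = [
    11086, 72654, 5137, 6337, 6074, 6650, 3180, 9502, 24643, 21586,
    18742, 19415, 46924, 37383, 17504, 31884, 35585, 34358, 23012, 35857,
    43262, 153367, 92079, 76713, 162413, 72597, 108535, 118367, 211761, 210820,
    124484, 173954, 137866, 108067, 95560, 125018, 234226, 167298, 179323, 217333,
    115106, 188678, 117570, 56329, 105287, 71176, 71058, 179835, 120650, 55087,
    47256, 141467, 112282, 177521, 100317,
]
deaths = [
    747, 11427, 386, 636, 582, 504, 194, 462, 1591, 1869,
    1185, 1581, 2988, 1903, 1711, 3837, 3490, 2610, 2237, 1443,
    2167, 13998, 5319, 2542, 8136, 2962, 5935, 5853, 9858, 8707,
    4960, 2752, 4551, 1884, 2331, 2230, 6292, 3996, 5074, 4883,
    3397, 4148, 2042, 1366, 1882, 937, 1762, 3217, 2436, 880,
    741, 4094, 2495, 1745, 1379,
]

r, slope, intercept = pearson_and_trend(cases, deaths)
print(f"r = {r:.3f}, deaths = {intercept:.1f} + {slope:.4f} * cases")
```

The slope can be read as the average number of additional deaths per additional reported case over the whole period, so it is a pooled figure that ignores changes in reporting and treatment over time.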

Scatter Plot with Color Gradient based on Density

Scatter Plot with Marginal Histograms

Animated Scatter Plot over Time

Conclusion on Death due to Cholera

This analysis covers the basic steps in exploring and modeling reported cholera cases and deaths using time series methods. It also explores the relationship between the cases and deaths reported by the member states, in support of the public health intelligence report for Africa.