Modelling Cholera Transmission in Nigeria

Cholera Transmission in Nigeria

INTRODUCTION

Cholera is an infectious transmissible diarrheal disease caused by the ingestion of contaminated water or food with the bacterium Vibrio cholerae (Awofeso & Aldbak, 2018).

It is undoubtedly among the diseases of public health importance in many developing countries, including Nigeria (Ajayi & Smith, 2019; Ngwa et al., 2021). The disease is predominant in areas lacking safe drinking water, adequate sanitation and hygiene, such as those typically found in Internally Displaced Persons (IDP) camps in Nigeria. (Chansa et al., 2024; Mike-Ogburia et al., 2024; Salako et al., 2021).

The infection is often mild or without symptoms, but sometimes it can be severe. Infected persons may develop severe disease characterized by profuse watery diarrhea, vomiting, leg cramps and if untreated, it leads to rapid dehydration, acidosis, circulatory collapse and death within 12–24 h (Sanchez et al., 1994, WHO, 2008). Despite being easily preventable and treatable, cholera remains a major cause of morbidity and mortality in regions with inadequate water, sanitation, and hygiene (WASH) infrastructure.(Lauren D’Mello-Guyett et al.,2020).

The disease is predominant in areas lacking safe drinking water, adequate sanitation and hygiene, such as those typically found in Internally Displaced Persons (IDP) camps in Nigeria (Chansa et al., 2024; Mike-Ogburia et al., 2024; Salako et al., 2021).

AIM

To examine the trend of cholera in Nigeria and evaluate how WASH interventions help in reducing cholera outbreaks.

Research Questions

What has been the trend and pattern of cholera cases in Nigeria?
How have WASH (Water, Sanitation and Hygiene) interventions impacted cholera transmission and case reduction in Nigeria?

Research Objectives

To examine the trend and pattern of cholera cases in Nigeria over the study period.
To investigate the effect of WASH interventions and treatment measures on cholera transmission in Nigeria.

Literature Reviews

(Mike-Ogburia et al., 2024) examined cholera outbreaks in Nigeria from 2010–2024 and found that recurrent epidemics are driven by poor WASH conditions, seasonal flooding, and conflict-related displacement. Northern regions and overcrowded IDP camps were the most affected. They also reported rising cases in young children, multiple outbreak waves, and increasing antibiotic-resistant Vibrio cholerae. Although surveillance, treatment, WASH interventions, and vaccination campaigns have helped, gaps in healthcare capacity and vaccine supply continue to hinder effective control.
(Charnley et al., 2022) reported that Cholera remains endemic in over 50 countries, with Nigeria having the second-highest burden. Using co variate selection and machine-learning models, the authors linked cholera transmission to environmental extremes (floods, droughts), conflict, poverty, and poor sanitation. Their model showed that improved sanitation and reduced poverty significantly protect populations—even during climate shocks. A traffic-light early-warning system was developed to signal high-risk periods. The study highlights that addressing poverty, sanitation, and social instability is essential for reducing cholera outbreaks in Nigeria.

Research Methodology

Study Area

The study was conducted using cholera data aggregated at the national level across all 36 states in Nigeria, including the Federal Capital Territory (FCT). Nigeria is located in West Africa between latitudes 4°N–14°N and longitudes 3°E–15°E. It shares boundaries with Benin Republic to the west, Niger to the north, Chad to the northeast, Cameroon to the east, and the Atlantic Ocean to the south. With an estimated population of over 220 million people, the country experiences diverse climatic conditions that influence cholera transmission patterns. The Nigeria hotspot map used in this study, showing the spatial distribution of cholera cases across the states and highlighting high-risk areas, particularly in the northern regions.

Study Area

Data Description

The data used in this study was obtained from the Nigeria Centre for Disease Control (NCDC) and contains monthly records of suspected and confirmed cholera cases in Nigeria from 2018 to 2024. The dataset represents aggregated national reports covering the 36 states and the Federal Capital Territory (FCT). Altogether, the dataset spans 81 months, from May 2018 to December 2024.

During data extraction, inconsistencies were observed in monthly reporting on the NCDC platform. The most notable gap occurred in 2020, where only one monthly record (December) was available, while all other months were missing. Similar missing months appeared in other years as well, but these were fewer compared to the gaps seen in 2020. To maintain a continuous monthly time series, the missing months were assigned a value of 1, rather than zero, to indicate the absence of reported data rather than an actual absence of cases.

In total, the dataset contains 8,590 confirmed cholera cases within the study period, with an approximate monthly average of 195 confirmed cases. Clear outbreak patterns were evident in the data. A major peak occurred in 2021, beginning with an increase in August (316 cases) and reaching the highest recorded value in December (943 cases). Another significant surge was recorded in 2024, with high case counts from July to September, including 660 cases in July and 650 cases in August. These periods represent the most prominent cholera clusters within the dataset.

Time Series Analysis

Required packages

library(tidyr)
library(dplyr)
library(ggplot2)
library(caret)
library(sf)
require(tseries)
require(forecast)
library(corrplot)
library(rnaturalearth)
library(devtools)
library(rnaturalearthdata)
library(ggspatial)
library(Metrics)

Extract cholera cases

cholera_data<-read.csv("~/GLADYS FOLDER .R/Cholera_dataaaa.csv")
#View(cholera_data)

Converting date to date format

dates <- seq(as.Date("2018-01-01"),as.Date("2024-12-01"),by = "month")
head(dates)

## [1] "2018-01-01" "2018-02-01" "2018-03-01" "2018-04-01" "2018-05-01"
## [6] "2018-06-01"

plot(dates,cholera_data$ConfirmedCases,type = "l",col="red")

Checking Stationary

adf_result<-adf.test(cholera_data$ConfirmedCases)

## Warning in adf.test(cholera_data$ConfirmedCases): p-value smaller than printed
## p-value

print(adf_result)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  cholera_data$ConfirmedCases
## Dickey-Fuller = -4.458, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary

# Extract exogenous variables
rainfall <- cholera_data$Rainfall
humidity <- cholera_data$Humidity

Log transformation

cholera_data<- cholera_data$ConfirmedCases+1
log_cholera_data<-log(cholera_data)
plot(log_cholera_data,type= "l")

ts_cholera_data <- ts( log_cholera_data, start = c(2018,1), frequency = 12)
ts_rain <- ts(rainfall, start = c(2018,1), frequency = 12)
ts_humid <- ts(humidity, start = c(2018,1), frequency = 12)

plot(ts_cholera_data)

decomp<-stl(ts_cholera_data,s.window = "periodic")
plot(decomp,col.range = "red")

Training and Testing data

n <- length(ts_cholera_data)
n_train <- floor(0.8 * n)

train_cholera_data <- window(ts_cholera_data, end = time(ts_cholera_data)[n_train])
test_cholera_data  <- window(ts_cholera_data, start = time(ts_cholera_data)[n_train + 1])

train_rain <- window(ts_rain, end = time(ts_rain)[n_train])
test_rain  <- window(ts_rain, start = time(ts_rain)[n_train + 1])

train_humid <- window(ts_humid, end = time(ts_humid)[n_train])
test_humid  <- window(ts_humid, start = time(ts_humid)[n_train + 1])

# Combine exogenous variables
train_xreg <- cbind(train_rain, train_humid)
test_xreg  <- cbind(test_rain, test_humid)

Auto.arima

model_auto <- auto.arima(train_cholera_data, seasonal = TRUE,
                              ic = "aic",
                              trace = TRUE)

## 
##  ARIMA(2,0,2)(1,0,1)[12] with non-zero mean : 236.3737
##  ARIMA(0,0,0)            with non-zero mean : 283.6322
##  ARIMA(1,0,0)(1,0,0)[12] with non-zero mean : 229.9084
##  ARIMA(0,0,1)(0,0,1)[12] with non-zero mean : 245.6296
##  ARIMA(0,0,0)            with zero mean     : 342.5922
##  ARIMA(1,0,0)            with non-zero mean : 227.9085
##  ARIMA(1,0,0)(0,0,1)[12] with non-zero mean : 229.9084
##  ARIMA(1,0,0)(1,0,1)[12] with non-zero mean : 231.664
##  ARIMA(2,0,0)            with non-zero mean : 228.6632
##  ARIMA(1,0,1)            with non-zero mean : 228.6917
##  ARIMA(0,0,1)            with non-zero mean : 244.2091
##  ARIMA(2,0,1)            with non-zero mean : 230.6355
##  ARIMA(1,0,0)            with zero mean     : 232.0472
## 
##  Best model: ARIMA(1,0,0)            with non-zero mean

Model fitting

model1 = Arima(train_cholera_data, order = c(1,0,0))
model2 = Arima(train_cholera_data,order = c(1,0,0),seasonal = list(order = c(0,0,1), period = 12))
model3 = Arima(train_cholera_data,order = c(1,0,0),seasonal = list(order = c(1,0,1), period = 12))
model4 = Arima(train_cholera_data,order = c(2,0,2),seasonal = list(order = c(1,0,1), period = 12))
model5 = Arima(train_cholera_data,order = c(1,0,0),seasonal = list(order = c(0,1,1), period = 12))

forecast

forecast1 <- forecast(model1,h = length(test_cholera_data))
forecast2 <- forecast(model2,h =length(test_cholera_data ))
forecast3 <- forecast(model3,h = length(test_cholera_data ))
forecast4 <- forecast(model4,h = length(test_cholera_data ))
forecast5 <- forecast(model5,h =length(test_cholera_data ))

Root Mean Square Error

rmse1 <- rmse(test_cholera_data, forecast1$mean)
rmse2 <- rmse(test_cholera_data, forecast2$mean)
rmse3 <- rmse(test_cholera_data, forecast3$mean)
rmse4 <- rmse(test_cholera_data, forecast4$mean)
rmse5 <- rmse(test_cholera_data, forecast5$mean)
RMSE= c(rmse1,rmse2, rmse3,rmse4,rmse5)
RMSE

## [1] 2.397805 2.397102 2.343880 2.389193 2.190628

Mean Absolute Error

mae1  <- mae(test_cholera_data, forecast1$mean)
mae2   <- mae(test_cholera_data, forecast2$mean)
mae3   <- mae(test_cholera_data, forecast3$mean)
mae4   <- mae(test_cholera_data, forecast4$mean)
mae5   <- mae(test_cholera_data, forecast5$mean)
MAE= c(mae1,mae2,mae3,mae4,mae5)
MAE

## [1] 2.109525 2.108861 2.067711 2.122549 2.028944

Train and test best model

model <- Arima(train_cholera_data,order = c(1,0,0),seasonal = list(order = c(0,1,1), period = 12))
fit <- forecast(model,h =length(test_cholera_data ))

Train test fit

plot(fit, main="Train-Test fit",
     ylab="log(Cases)", xlab="Time", col="black", lwd=2)
lines(test_cholera_data, col="red", lwd=2, lty=2)
legend("topleft", legend=c("Forecast", "Actual"),
       col=c("blue", "red"), lty=c(1,2))

Full forecast

model_actual <- Arima(ts_cholera_data,order = c(1,0,0),seasonal = list(order = c(0,1,1), period = 12))
fit1 <- forecast(model_actual,h =24)
plot(fit1, ylab="log(Cases)", xlab="Time (Months)",lwd=2, main="Cholera Cases")
legend("topleft", c("cases", "Forecast","80% CI",
                    "95% CI"), cex=0.7, col=c(1, 4,"lightblue",8),lty=c(1, 1, 1, 2))

Back-transform the forecast mean

mean_original <- exp(fit1$mean)

#--------Back-transform the 80% and 95% prediction intervals---------
lower_80 <- exp(fit1$lower[ , "80%"])
upper_80 <- exp(fit1$upper[ , "80%"])
lower_95 <- exp(fit1$lower[ , "95%"])
upper_95 <- exp(fit1$upper[ , "95%"])

#--------Back-transform mean and prediction intervals---------
mean_original <- exp(fit1$mean)
lower_80 <- exp(fit1$lower[ , "80%"])
upper_80 <- exp(fit1$upper[ , "80%"])
lower_95 <- exp(fit1$lower[ , "95%"])
upper_95 <- exp(fit1$upper[ , "95%"])

Combine into a data frame

#-------Create time variable------
time_months <- time(mean_original)


forecast_df <- data.frame(
  Time = time_months,
  Mean = mean_original,
  Lower_80 = lower_80,
  Upper_80 = upper_80,
  Lower_95 = lower_95,
  Upper_95 = upper_95
)

print(forecast_df)

##        Time      Mean  Lower_80   Upper_80   Lower_95    Upper_95
## 1  2025.000  1.925122 0.2593862   14.28794 0.08976893    41.28482
## 2  2025.083  3.715725 0.3096111   44.59342 0.08308223   166.18014
## 3  2025.167  5.509685 0.3669436   82.72831 0.08745475   347.11240
## 4  2025.250  7.959582 0.4732460  133.87318 0.10621501   596.47833
## 5  2025.333  9.936179 0.5567333  177.33382 0.12108907   815.33080
## 6  2025.417 56.298217 3.0566664 1036.91042 0.65383317  4847.55038
## 7  2025.500 55.822463 2.9802986 1045.58228 0.63184844  4931.79570
## 8  2025.583 86.374511 4.5702173 1632.42917 0.96433006  7736.51724
## 9  2025.667 62.839721 3.3092355 1193.27580 0.69650970  5669.45528
## 10 2025.750  9.289034 0.4880119  176.81156 0.10258474   841.12065
## 11 2025.833  9.999756 0.5248451  190.52311 0.11027122   906.81068
## 12 2025.917  3.190932 0.1674910   60.79161 0.03519165   289.33138
## 13 2026.000  4.513788 0.2221455   91.71593 0.04511025   451.65528
## 14 2026.083  6.947633 0.3302333  146.16818 0.06583525   733.18775
## 15 2026.167  8.724386 0.4070427  186.99488 0.08035279   947.25905
## 16 2026.250 11.155391 0.5152934  241.49881 0.10118603  1229.84123
## 17 2026.333 12.731516 0.5849526  277.10191 0.11453907  1415.16339
## 18 2026.417 67.540135 3.0942398 1474.24572 0.60495904  7540.46051
## 19 2026.500 63.808264 2.9188198 1394.91124 0.57020236  7140.43788
## 20 2026.583 95.286498 4.3553774 2084.66815 0.85049095 10675.61811
## 21 2026.667 67.538871 3.0860695 1478.09345 0.60252372  7570.65474
## 22 2026.750  9.794268 0.4475263  214.35091 0.08737441  1097.89214
## 23 2026.833 10.396375 0.4751804  227.46018 0.09278824  1164.85251
## 24 2026.917  3.283399 0.1501815   71.78453 0.02933718   367.47601

Checking for residual

checkresiduals(fit1)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,0,0)(0,1,1)[12]
## Q* = 8.9727, df = 15, p-value = 0.8789
## 
## Model df: 2.   Total lags used: 17

Model plot

# Dataset
model_data <- data.frame(
  Models = c("ARIMA(1,0,0)",
             "ARIMA(1,0,0)(0,0,1)[12]",
             "ARIMA(1,0,0)(1,0,1)[12]",
             "ARIMA(2,0,2)(1,0,1)[12]",
             "ARIMA(1,0,0)(0,1,1)[12]"),
  RMSE = c(2.397805, 2.397102, 2.343880, 2.389193, 2.190628 ),
  MAE  = c(2.109525, 2.108861,  2.067711, 2.122549,  2.028944)
)

# Convert to long format
long_data <- model_data %>%
  pivot_longer(cols = RMSE:MAE, names_to = "Metric", values_to = "Value") %>%
  mutate(Highlight = ifelse(Models == "ARIMA(1,0,0)(0,1,1)[12]", "Best", "Others"))

##-------Plot---------
ggplot(long_data, aes(x = Models, y = Value, fill = Highlight)) +
  geom_col(width = 0.7) +
  facet_wrap(~ Metric, scales = "free_y") +
  scale_fill_manual(values = c("Best" = "maroon", "Others" = "lightgrey")) +
  theme_minimal(base_size = 12) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5)) +
  labs(
    title = "Model Comparison (ARIMA Forecasts)",
    x = NULL,
    y = "Metric Value",
    fill = "Model Type"
  )

Heatmap Correlation Plot with the EXOGENOUS Variable

# Load data
cholera_data <- read.csv("~/GLADYS FOLDER .R/Cholera_dataaaa.csv")

# KEEP THE DATA AS A DATAFRAME. DO NOT UNLIST.
# Rename column ConfirmedCases → Cases
names(cholera_data)[names(cholera_data) == "ConfirmedCases"] <- "Cases"

cholera_df <- cholera_data

# --- Decompose ConfirmedCases into seasonal component ---
ts_cases <- ts(cholera_df$Cases, frequency = 12)

decomp <- stl(ts_cases, s.window = "periodic")

# Add seasonal effect to your dataframe
cholera_df$seasonal_effect <- decomp$time.series[, "seasonal"]

# --- Select variables for correlation ---
variables <- cholera_df[, c("Cases", "Rainfall", "Temperature", "Humidity")]

# --- Correlation matrix ---
corre_matrix <- cor(variables, use = "complete.obs")

# --- Plot correlation ---
corrplot(corre_matrix,
         method = "color",
         type = "upper",
         col = colorRampPalette(c("red", "white", "blue"))(200),
         addCoef.col = "black",
         tl.col = "black",
         number.cex = 0.7,
         tl.srt = 45)

Model Formulation

Model Variables

Human population is divided into:

\(S(t)\) – Susceptible

\(E(t)\) – Pre-infectious (exposed)

\(A(t)\) – Asymptomatic

\(I(t)\) – Symptomatic

\(R(t)\) – Recovered

\(N(t)\) – Total population size: S + E + A + I + R

The environment is represented by :

\(B(t)\) – Reservoir of v.cholerae

Flow Diagram

The schematic diagram of the compartmental deterministic model

Model Parameter

\(\Pi_1\) – Recruitment rate of humans into the susceptible population.

\(\Pi_2\) – Introduction of bacteria into the reservoir.

\(\lambda\) – Rate at which the susceptible individuals get infected.

\(\gamma\) – Rate at which asymptomatic class becomes infectious.

\(\omega_1\) – Recovery rate of asymptomatic individuals.

\(\omega_2\) – Recovery rate of symptomatic individuals.

\(\Theta\) – Proportion of exposed that becomes asymptomatic.

\(\tau\) – Natural death of bacteria.

\(\delta\) – Disease induced death.

\(\phi\) – Rate at which the recovered individuals goes back to the susceptible.

\((1-\Theta)\kappa\) – Proportion of exposed indviduals who become infectious.

\(\psi_1\) – Contribution of asymptomatic individuals to the bacteria reservoir.

\(\psi_2\) – Contribution of symptomatic individuals to the bacteria reservoir.

\(\eta\) – efficacy of WASH intervention.

\(\mu\) – Natural death rate of human population.

Model Equation

\[\frac{dS}{dt} = \Pi_1 + \psi R -(1-\eta)\lambda S -\mu S\]

\[\frac{dE}{dt} = (1-\eta) \lambda S - \kappa E - \mu E\]

\[\frac{dA}{dt} = \kappa\Theta E - (\gamma+ \omega _{1}+ \mu )A\]

\[\frac{dI}{dt} = \kappa (1-\Theta) E +\gamma A -(\omega_{2} + \mu +\delta) I\]

\[\frac{dR}{dt} = \omega_{1}A +\omega_{2} I -(\psi + \mu) R\]

\[\frac{dB}{dt} = \Pi_2 + (1-\eta)\phi_1 A + (1-\eta)\phi_2 I - \tau B \]

Conclusion

The time series forecast shows that cholera cases are expected to increase mainly between June and September, which is the rainy season, in both 2025 and 2026. This pattern agrees with previous studies in Nigeria, which report that cholera outbreaks usually become more common during periods of heavy rainfall and flooding. Abaje, I., Abdullahi, N., & Jeje, O. (2016).

During the rainy season, flooding and poor drainage can contaminate water sources, making people more exposed to cholera through unsafe drinking water and poor sanitation.

These findings show that cholera follows a predictable seasonal pattern. Using time series forecasts to prepare ahead of the rainy season—through improved water supply, sanitation, hygiene education, and early public health action—can help reduce future outbreaks (Ahmad & Prasad, 2023).

Modelling Cholera Transmission in Nigeria

Ibitoye Gladys

2025-12-16

Cholera Transmission in Nigeria

INTRODUCTION

AIM

Research Questions

Research Objectives

Literature Reviews

Research Methodology

Study Area

Data Description

Time Series Analysis

Extract cholera cases

Converting date to date format

Checking Stationary

Log transformation

Training and Testing data

Auto.arima

Model fitting

forecast

Root Mean Square Error

Mean Absolute Error

Train and test best model

Train test fit

Full forecast

Back-transform the forecast mean

Combine into a data frame

Checking for residual

Model plot

Heatmap Correlation Plot with the EXOGENOUS Variable

Model Formulation

Model Variables

Flow Diagram

Model Parameter

Model Equation

Conclusion