The aim of the research is to generate an early warning system based on malaria trends in Kaduna State for decision-making in malaria control.
What are the temporal patterns in malaria incidence in Kaduna State?
What are the future trends in malaria incidence in Kaduna State based on historical data?
The objectives of this study are to:
Examine and describe the temporal patterns of malaria incidence in Kaduna State.
Forecast future malaria incidence in Kaduna State using historical data.
In respect to objective 1
Historical confirmed uncomplicated malaria case data were obtained from the National Malaria Data Repository, and malaria incidence was estimated using population figures sourced from the Nigerian Population Commission data repository.
The malaria incidence time series was plotted and decomposed into its trend, seasonal, and residual components to enable a detailed examination of the underlying temporal patterns.
In respect to objective 2
setwd("C:\\Users\\USER\\Documents\\R")
kaduna_data <- read.csv("Kaduna_Dataa.csv")
summary(kaduna_data)
## periodname Confirmed.uncomplicated.Malaria
## Length:120 Min. : 6362
## Class :character 1st Qu.: 45606
## Mode :character Median :116448
## Mean :116347
## 3rd Qu.:168176
## Max. :279846
## Persons.with.clinically.diagnosed.Malaria Pop
## Min. : 11 Length:120
## 1st Qu.: 1774 Class :character
## Median : 5377 Mode :character
## Mean :15713
## 3rd Qu.:31398
## Max. :60096
## Rainfall Humidity Temperature Persons.with.fever
## Min. : 0.000 Min. :24.60 Min. :16.96 Min. : 11442
## 1st Qu.: 0.165 1st Qu.:49.73 1st Qu.:22.70 1st Qu.: 82649
## Median : 141.775 Median :72.15 Median :23.93 Median :158179
## Mean : 224.823 Mean :66.10 Mean :23.86 Mean :159375
## 3rd Qu.: 293.317 3rd Qu.:85.33 3rd Qu.:25.16 3rd Qu.:219428
## Max. :1552.160 Max. :91.24 Max. :28.77 Max. :350658
head(kaduna_data)
## periodname Confirmed.uncomplicated.Malaria
## 1 15-Jan 6362
## 2 15-Feb 20387
## 3 15-Mar 24538
## 4 15-Apr 23037
## 5 15-May 25563
## 6 15-Jun 31619
## Persons.with.clinically.diagnosed.Malaria Pop Rainfall Humidity
## 1 7016 1,046,000 0.00 29.05
## 2 22860 1,046,000 0.60 28.34
## 3 28344 1,046,000 2.44 32.81
## 4 25232 1,046,000 2.21 25.25
## 5 28807 1,046,000 56.76 56.19
## 6 33598 1,046,000 123.16 75.69
## Temperature Persons.with.fever
## 1 20.80 11442
## 2 26.94 38728
## 3 28.05 45905
## 4 28.70 42961
## 5 28.77 51961
## 6 25.97 58044
# Convert population values from character to numeric
kaduna_data$Pop <- as.numeric(gsub(",", "", kaduna_data$Pop))
# Calculate malaria incidence per 1000 inhabitants
Confirmed_Malaria_Incidence <-
(kaduna_data$Confirmed.uncomplicated.Malaria / kaduna_data$Pop) * 1000
kaduna_data <- kaduna_data %>%
mutate(Malaria_Incidence = round(Confirmed_Malaria_Incidence, 3))
Data_1 <- kaduna_data$Malaria_Incidence
#date
dates_data <- seq(as.Date("2015/1/1"), as.Date("2024/12/1"), by = "month")
plot(Data_1~dates_data, data=kaduna_data,col='red', xlab = "Time", type='l', lwd=3)
The plot indicates strong seasonal (temporal) patterns in malaria incidence in Kaduna State, with recurrent peaks typically occurring between May and October each year, corresponding to the rainy season, and lower incidence between November and March. Over time, there is a gradual upward trend, with malaria cases increasing from relatively low levels in 2015–2017, rising more sharply from 2018 onward, and reaching the highest levels between 2022 and 2024. This pattern reflects persistent seasonality alongside a growing malaria burden over the study period
#data("Kaduna_routine_2015_2024")
Data_2 <- log(Data_1) # log to stabilize variance
##
ts_data <- ts((Data_2),start= c(2015,1), frequency = 12) # specify frequency!
length(ts_data) # should be >= 24 for monthly data
## [1] 120
plot(ts_data, type= "l")
#To check if the data is stationary
adf.test(ts_data) # To check if the p-value of the data is less than 0.05
## Warning in adf.test(ts_data): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: ts_data
## Dickey-Fuller = -4.5652, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
# STL decomposition
decomp <- stl(ts_data, s.window = "periodic") # s.window = "periodic" for stable seasonality
# Plot decomposition
plot(decomp,col.range = "blue")
# Data Split
n=length(ts_data)
n_train=floor(0.7*n)
train=window(ts_data,end=time(ts_data)[n_train])
test=window(ts_data,start=time(ts_data)[n_train+1])
#fit arima model
model=auto.arima(train,ic="aic",trace=TRUE)
##
## ARIMA(2,1,2)(1,0,1)[12] with drift : Inf
## ARIMA(0,1,0) with drift : 1.051391
## ARIMA(1,1,0)(1,0,0)[12] with drift : 0.5957612
## ARIMA(0,1,1)(0,0,1)[12] with drift : 1.935375
## ARIMA(0,1,0) : 1.150643
## ARIMA(1,1,0) with drift : 2.729574
## ARIMA(1,1,0)(2,0,0)[12] with drift : 1.394708
## ARIMA(1,1,0)(1,0,1)[12] with drift : 0.2136217
## ARIMA(1,1,0)(0,0,1)[12] with drift : 1.658619
## ARIMA(1,1,0)(2,0,1)[12] with drift : 2.21345
## ARIMA(1,1,0)(1,0,2)[12] with drift : 2.213413
## ARIMA(1,1,0)(0,0,2)[12] with drift : 2.751596
## ARIMA(1,1,0)(2,0,2)[12] with drift : 4.17356
## ARIMA(0,1,0)(1,0,1)[12] with drift : Inf
## ARIMA(2,1,0)(1,0,1)[12] with drift : 1.699748
## ARIMA(1,1,1)(1,0,1)[12] with drift : 1.341303
## ARIMA(0,1,1)(1,0,1)[12] with drift : 0.4571998
## ARIMA(2,1,1)(1,0,1)[12] with drift : Inf
## ARIMA(1,1,0)(1,0,1)[12] : -0.68151
## ARIMA(1,1,0)(0,0,1)[12] : 1.720156
## ARIMA(1,1,0)(1,0,0)[12] : 0.3987677
## ARIMA(1,1,0)(2,0,1)[12] : 1.317541
## ARIMA(1,1,0)(1,0,2)[12] : 1.3173
## ARIMA(1,1,0) : 2.998669
## ARIMA(1,1,0)(0,0,2)[12] : 2.620064
## ARIMA(1,1,0)(2,0,0)[12] : 0.8265951
## ARIMA(1,1,0)(2,0,2)[12] : 3.278579
## ARIMA(0,1,0)(1,0,1)[12] : Inf
## ARIMA(2,1,0)(1,0,1)[12] : 0.7277321
## ARIMA(1,1,1)(1,0,1)[12] : 0.4522662
## ARIMA(0,1,1)(1,0,1)[12] : -0.4458432
## ARIMA(2,1,1)(1,0,1)[12] : Inf
##
## Best model: ARIMA(1,1,0)(1,0,1)[12]
#select model with lowest aic value
model1=Arima(train,order=c(1,1,0),seasonal=list(order=c(1,0,1),period=12))
model2=Arima(train,order=c(1,1,0),seasonal=list(order=c(1,0,0),period=12))
model3=Arima(train,order=c(1,1,1),seasonal=list(order=c(1,0,1),period=12))
model4=Arima(train,order=c(0,1,1),seasonal=list(order=c(1,0,1),period=12))
#forecast
forecast1=forecast(model1,h=length(test))
forecast2=forecast(model2,h=length(test))
forecast3=forecast(model3,h=length(test))
forecast4=forecast(model4,h=length(test))
#check r model metrics
rmse1=rmse(test,forecast1$mean)
rmse2=rmse(test,forecast2$mean)
rmse3=rmse(test,forecast3$mean)
rmse4=rmse(test,forecast4$mean)
RMSE=c(rmse1,rmse2,rmse3,rmse4)
RMSE
## [1] 0.1390960 0.1697500 0.1404222 0.1496580
mae1=mae(test,forecast1$mean)
mae2=mae(test,forecast2$mean)
mae3=mae(test,forecast3$mean)
mae4=mae(test,forecast4$mean)
MAE=c(mae1,mae2,mae3,mae4)
MAE
## [1] 0.1223946 0.1392379 0.1243776 0.1332541
mape1=mape(test,forecast1$mean)
mape2=mape(test,forecast2$mean)
mape3=mape(test,forecast3$mean)
mape4=mape(test,forecast4$mean)
MAPE=c(mape1,mape2,mape3,mape4)
MAPE
## [1] 0.02445229 0.02723795 0.02482584 0.02658660
# Dataset
model_data <- data.frame(
Models = c("ARIMA(1,1,0)(1,0,1)[12]",
"ARIMA(1,1,0)(1,0,0)[12]",
"ARIMA(1,1,1)(1,0,1)[12]",
"ARIMA(0,1,1)(1,0,1)[12]"
),
RMSE = c(0.1390960, 0.1697500, 0.1404222, 0.1496580),
MAE = c(0.1223946, 0.1392379, 0.1243776, 0.1332541),
MAPE = c(0.02445229, 0.02723795, 0.02482584, 0.02658660)
)
# Convert to long format
long_data <- model_data %>%
pivot_longer(cols = RMSE:MAPE, names_to = "Metric", values_to = "Value") %>%
mutate(Highlight = ifelse(Models == "ARIMA(1,1,0)(1,0,1)[12]", "Best", "Others"))
##-------Plot---------
ggplot(long_data, aes(x = Models, y = Value, fill = Highlight)) +
geom_col(width = 0.7) +
facet_wrap(~ Metric, scales = "free_y") +
scale_fill_manual(values = c("Best" = "blue", "Others" = "lightgrey")) +
theme_minimal(base_size = 12) +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
plot.title = element_text(hjust = 0.5)) +
labs(
title = "Model Comparison (ARIMA Forecasts)",
x = NULL,
y = "Metric Value",
fill = "Model Type"
)
forecast= forecast(model1, h=length(test))
plot(forecast1,ylab="log(cases)",xlab="Time",
lwd=2,main="Forecast vs Actual")
lines(test, col="red",lty=2,lwd=2)
legend("topleft",legend=c("forecast","actual"),
col=c("blue","red"),lty=c(1,2))
The forecast versus actual plot compares observed malaria cases with model-predicted values over time. Historical data show an overall increasing trend with fluctuations. The forecasted values closely track the actual observed cases during the validation period, indicating good model performance. The widening prediction intervals reflect increasing uncertainty further into the future.
#fitting (For the forecast)
model1= Arima(ts_data,order=c(1,1,0),seasonal=list(order=c(1,0,1),period=12))
fit=forecast(model1,h=24)
plot(forecast(fit,h=24),ylab="log(cases)",
xlab="Time",lwd=2,main="Forecast from ARIMA(1,1,0)(1,0,1)[12]")
legend("topleft",c("log(cases)","forecast","80% CI","95% CI"),
cex=0.7,col=c(1,4,"lightblue",8),lty=c(1,1,1,2))
The result from the plot suggest that malaria incidence in Kaduna State is expected to remain persistently high if current transmission patterns continue.
# Back-transform the forecast mean
mean_original <- exp(fit$mean)
# Back-transform the 80% and 90% prediction intervals
lower_80 <- exp(fit$lower[,"80%"])
upper_80 <- exp(fit$upper[,"80%"])
lower_95 <- exp(fit$lower[,"95%"])
upper_95 <- exp(fit$upper[,"95%"])
# Back-transform mean and prediction intervals
mean_original <- exp(fit$mean)
lower_80 <- exp(fit$lower[,"80%"])
upper_80 <- exp(fit$upper[,"80%"])
lower_95 <- exp(fit$lower[,"95%"])
upper_95 <- exp(fit$upper[,"95%"])
# Create time variable
time_months <- time(mean_original)
# Combine into a data frame
forecast_df <- data.frame(
Time = time_months,
Mean = mean_original,
Lower_80 = lower_80,
Upper_80 = upper_80,
Lower_95 = lower_95,
Upper_95 = upper_95
)
print(forecast_df)
## Time Mean Lower_80 Upper_80 Lower_95 Upper_95
## 1 2025.000 180.2297 140.53755 231.1322 123.19771 263.6636
## 2 2025.083 191.9438 139.08901 264.8837 117.28546 314.1261
## 3 2025.167 191.8755 130.45339 282.2172 106.35362 346.1678
## 4 2025.250 201.2354 129.63217 312.3893 102.70886 394.2766
## 5 2025.333 207.2449 127.23115 337.5780 98.27102 437.0611
## 6 2025.417 241.1382 141.69952 410.3588 106.93959 543.7429
## 7 2025.500 236.1156 133.25034 418.3898 98.43362 566.3775
## 8 2025.583 254.0262 138.04644 467.4465 99.95863 645.5604
## 9 2025.667 237.1959 124.39826 452.2722 88.39674 636.4701
## 10 2025.750 252.1573 127.86346 497.2750 89.25294 712.3944
## 11 2025.833 229.0559 112.48031 466.4515 77.19217 679.6882
## 12 2025.917 228.2558 108.69753 479.3182 73.39326 709.8840
## 13 2026.000 209.1075 95.36061 458.5327 62.92948 694.8406
## 14 2026.083 221.3930 97.15001 504.5275 62.81695 780.2807
## 15 2026.167 221.3696 93.59285 523.5925 59.33692 825.8688
## 16 2026.250 231.3503 94.39324 567.0210 58.72764 911.3758
## 17 2026.333 237.7475 93.73855 602.9950 57.27286 986.9225
## 18 2026.417 273.5756 104.36137 717.1580 62.65841 1194.4700
## 19 2026.500 268.2909 99.12967 726.1196 58.51992 1230.0085
## 20 2026.583 287.0998 102.84826 801.4360 59.72905 1380.0034
## 21 2026.667 269.4282 93.66265 775.0319 53.53673 1355.9203
## 22 2026.750 285.1418 96.27293 844.5348 54.18476 1500.5295
## 23 2026.833 260.8487 85.60202 794.8650 47.45919 1433.6959
## 24 2026.917 260.0042 82.99182 814.5644 45.34170 1490.9494
# Extract residuals
res <- residuals(model)
# set up plotting layout: 3 plots, one below the other
#par(mfrow = c(3,1), mar = c(4,4,3,2)) # Adjust margins
# ACF plot
acf(res, main = "ACF of Residuals", lwd = 2)
checkresiduals(fit)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,1,0)(1,0,1)[12]
## Q* = 22.926, df = 21, p-value = 0.3479
##
## Model df: 3. Total lags used: 24
The analysis of malaria incidence in Kaduna State shows a gradual upward trend with clear seasonal fluctuations. Using the ARIMA(1,1,0)(1,0,1)[12] model on log-transformed monthly data, Malaria cases was forecasted from early 2024 through 2026. The forecasted log(cases) range between approximately 5.2 and 5.5, corresponding to 180 to 245 cases per month, with widening 80% and 95% confidence intervals indicating increasing uncertainty over time. These projections reflect the expected trajectory of malaria in Kaduna if no additional interventions or control measures are implemented, preserving the historical seasonal pattern with recurring peaks during months of higher malaria incidence. These results underscore persistent seasonality and highlight the urgent need for continuous monitoring to guide timely interventions and resource allocation.