Cholera is an acute diarrheal, waterborne disease caused by the bacterium Vibrio cholerae [1]. It spreads by the fecal-oral route, either directly from person to person or indirectly through contaminated fluids from an environmental reservoir of varying duration, food, and potentially flies and fomites [2]. Epidemic cholera has been shown to occur often near watercourses when weather conditions favor bacterial growth, so that the aquatic environment and feces interact [2]. In response, prevention and control efforts have been deployed, including strengthening laboratory capacity, improving epidemiological surveillance systems, and promoting access to safe drinking water and sanitation [3]. Despite these initiatives, cholera remains endemic in many African countries. According to the World Health Organization (WHO), Africa is one of the regions most affected by the disease, with peaks in incidence often linked to poor health conditions and humanitarian crises [3]. The first cholera epidemics in Africa date back to 1817, when the disease spread from the Indian subcontinent along maritime trade routes [4]. Since then, the continent has experienced seven major pandemics, with recurring outbreaks that have strained health systems [5]. Cholera thus remains a major health problem in Africa, with recurring epidemics affecting many countries despite the prevention and control efforts of the organizations working towards its eradication [1]. Although it is preventable through simple measures such as improving access to drinking water and basic sanitation infrastructure, epidemics have persisted on the continent for decades [6].
Despite efforts in vaccination, awareness raising and improved health infrastructure, cholera outbreaks continue to occur at regular intervals, particularly in vulnerable areas [7]. Although progress has been made in combating the disease, it continues to affect millions of people each year in Africa, particularly in the poorest and most vulnerable regions. This situation leads us to ask the following question: what are the root causes of the recurrence of cholera in Africa, and how do these factors interact to maintain its spread? This research question focuses on identifying the underlying causes of cholera recurrence and on understanding the complex interactions between the demographic, environmental, socio-economic and health factors that contribute to the persistence of the disease in Africa. The bulk of the new cases and deaths have been in Malawi, which is facing its worst cholera outbreak in two decades. Malawi’s neighbors Mozambique and Zambia have also recently reported cases. In East Africa, Ethiopia, Kenya and Somalia are responding to outbreaks amid prolonged and severe drought that has left millions in urgent need of humanitarian assistance. Burundi, Cameroon, the Democratic Republic of the Congo and Nigeria have also reported cases. “We are witnessing a worrying scenario in which conflict and extreme weather events are exacerbating cholera triggers and increasing the toll,” said Dr Matshidiso Moeti, World Health Organization (WHO) Regional Director for Africa. Western DRC is considered to be affected by cholera outbreaks along the Congo River and its tributaries [8]. In 2023, 225,857 cases and 3,167 deaths were reported in the 20 African countries facing outbreaks since the beginning of the year. In 2022, nearly 80,000 cases and 1,863 deaths were recorded in 15 affected countries.
If the current rapid upward trend continues, the number of cases could surpass that recorded in 2021, the worst year for cholera in Africa in nearly a decade. The average case fatality rate is currently close to 3%, above the 2.3% reached in 2022 and well above the acceptable level of less than 1% [9]. Cholera therefore remains a major cause of illness and death in Africa, where it continues to occur in several countries despite having been largely eliminated from developed countries over a century ago. It disproportionately affects populations already strained by conflict, inadequate infrastructure, non-resilient health systems, and poverty [10]. The causes of cholera persistence in Africa are multiple and interconnected. They include environmental factors, such as water contamination and degraded sanitation infrastructure, as well as socio-economic determinants such as poverty, rapid urbanization and forced displacement [11]. In addition, the fragility of African health systems, characterized by limited access to care and a lack of resources, hampers efforts to prevent and respond to cholera outbreaks.
The main objective of this research is to identify and analyze the root causes of cholera recurrence in Africa, taking into account the demographic, health, socio-economic and environmental factors that contribute to the spread of the disease. The study aims to better understand why cholera persists despite prevention efforts and infrastructure improvements, and to explore the interactions between these factors in order to propose sustainable solutions to reduce future epidemics. To achieve this goal, three specific objectives have been defined: (1) analyze the environmental conditions and health infrastructure that promote cholera transmission, including access to drinking water and sanitation, as well as the impact of climate change; (2) analyze the socio-economic determinants that influence the vulnerability of populations to cholera, in particular inequalities in access to health services and precarious living conditions; (3) propose practical and appropriate solutions to limit the recurrence of cholera in Africa, based on the results of the analysis. This article explores the causes of the recurrence of cholera in Africa, particularly for the year 2023, by highlighting the dynamics that favor its spread and by discussing the measures needed to mitigate its impact on the most vulnerable populations.
To carry out our Principal Component Analysis, we first import the data into R.
setwd("C:/Users/HARISSE/Downloads/cours_2iE/S7_GEAAH/RTI/Maladi_hydriqu/Projet_RTI_Groupe_10_GEAAH/Donnee_ACP/")
data = read.csv(file ="data.csv", header = TRUE, sep = ";", quote = "\"",
dec = ",", row.names = 1)
data[,1:11]
## N_C P_0_15 P_15_64 den GIRE hand edu san wat rain tem
## Burundi 1394 45.2 54.85 0.42 47 6.30 75.54 45.69 62.44 1274 21
## Cameroun 6470 42.0 58.05 0.59 40 36.71 78.23 43.12 69.59 1604 25
## RDC 52654 46.5 53.46 0.48 32 19.35 80.54 16.17 35.12 1543 25
## Ethiopya 30389 39.3 60.70 0.49 41 8.32 51.77 9.34 51.51 848 24
## Kenya 8809 37.2 62.78 0.60 59 37.60 82.88 36.53 62.86 630 26
## Malawi 32530 43.3 57.95 0.51 55 15.31 68.80 49.24 71.87 1181 23
## Mozambique 39101 42.0 56.71 0.46 62 15.00 59.78 37.38 63.20 1032 24
## Nigeria 3457 42.8 57.24 0.55 44 31.08 62.02 46.57 79.64 1150 28
## Sud_Africa 1478 28.3 71.70 0.72 71 44.00 90.00 77.63 94.49 495 18
## Soudan_Sud 1471 43.0 57.04 0.38 43 5.60 34.52 16.06 41.19 900 29
## Tanzanie 1040 43.1 56.88 0.53 54 28.92 80.20 30.62 60.79 1071 23
## Zambie 4531 42.4 57.55 0.57 58 18.15 87.50 36.30 68.25 1020 23
## Zimbabwe 14148 40.3 59.72 0.55 63 42.46 89.85 34.62 62.29 657 22
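The import step can be illustrated without the original file. The following is a minimal sketch, assuming a semicolon-separated file with decimal commas like data.csv; the temporary file, its three rows and the column name country are hypothetical and only serve the illustration.

```r
# Write a tiny semicolon-separated file with decimal commas to a temporary path
# (hypothetical excerpt: three rows and three columns of the table above).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "country;N_C;den;wat",
  "Burundi;1394;0,42;62,44",
  "Cameroun;6470;0,59;69,59",
  "RDC;52654;0,48;35,12"
), tmp)

# read.csv2() defaults to sep = ";" and dec = ",", i.e. the same settings
# that are passed explicitly to read.csv() above.
excerpt <- read.csv2(tmp, header = TRUE, row.names = 1)

# Every column should be numeric; a character column would indicate
# that the decimal separator was not recognized.
str(excerpt)
stopifnot(all(sapply(excerpt, is.numeric)))
```

Checking the column types right after the import is a cheap safeguard, since cor() and PCA() require numeric columns.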
The corrplot, psych and Hmisc libraries were loaded to compute the correlation matrix and produce the graphs.
library(corrplot)
## Warning: package 'corrplot' was built under R version 4.4.2
## corrplot 0.95 loaded
library(psych)
library(Hmisc)
## Warning: package 'Hmisc' was built under R version 4.4.2
##
## Attaching package: 'Hmisc'
## The following object is masked from 'package:psych':
##
## describe
## The following objects are masked from 'package:base':
##
## format.pval, units
mat_cor = cor(data)
col = colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))
corrplot(mat_cor, method="color", col=col(200),
type="upper", order="hclust",
addCoef.col = "black", # Add the correlation coefficients
tl.col="black", tl.srt=90, # Color and rotation of the text labels
sig.level = 0.1, insig = "blank", # Only take effect when a p.mat argument is supplied
# Hide the correlation coefficients on the diagonal
diag=FALSE)
To check the relevance of the correlations between variables, we examined the p-values associated with the correlation coefficients. For each pair of variables, the null hypothesis is that the correlation is zero; if every pairwise correlation were zero, the correlation matrix would be an identity matrix, meaning that there are no correlations between the variables. Thus, a p-value of less than 5% indicates a significant correlation between the two variables concerned.
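# NB: rcorr() is run here on mat_cor itself (hence n = 11 below); rcorr(as.matrix(data)) would test the raw observations.
rcorr(as.matrix(mat_cor[,1:11]))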
## N_C P_0_15 P_15_64 den GIRE hand edu san wat rain tem
## N_C 1.00 0.74 -0.74 -0.76 -0.73 -0.78 -0.59 -0.81 -0.84 0.69 0.56
## P_0_15 0.74 1.00 -1.00 -0.96 -0.95 -0.91 -0.75 -0.90 -0.93 0.95 0.84
## P_15_64 -0.74 -1.00 1.00 0.96 0.95 0.92 0.75 0.91 0.93 -0.95 -0.84
## den -0.76 -0.96 0.96 1.00 0.91 0.98 0.88 0.94 0.95 -0.85 -0.87
## GIRE -0.73 -0.95 0.95 0.91 1.00 0.87 0.79 0.92 0.93 -0.94 -0.90
## hand -0.78 -0.91 0.92 0.98 0.87 1.00 0.88 0.88 0.90 -0.82 -0.81
## edu -0.59 -0.75 0.75 0.88 0.79 0.88 1.00 0.84 0.81 -0.62 -0.91
## san -0.81 -0.90 0.91 0.94 0.92 0.88 0.84 1.00 0.99 -0.79 -0.91
## wat -0.84 -0.93 0.93 0.95 0.93 0.90 0.81 0.99 1.00 -0.82 -0.88
## rain 0.69 0.95 -0.95 -0.85 -0.94 -0.82 -0.62 -0.79 -0.82 1.00 0.73
## tem 0.56 0.84 -0.84 -0.87 -0.90 -0.81 -0.91 -0.91 -0.88 0.73 1.00
##
## n= 11
##
##
## P
## N_C P_0_15 P_15_64 den GIRE hand edu san wat rain
## N_C 0.0096 0.0088 0.0061 0.0102 0.0050 0.0545 0.0025 0.0012 0.0186
## P_0_15 0.0096 0.0000 0.0000 0.0000 0.0000 0.0081 0.0001 0.0000 0.0000
## P_15_64 0.0088 0.0000 0.0000 0.0000 0.0000 0.0076 0.0001 0.0000 0.0000
## den 0.0061 0.0000 0.0000 0.0000 0.0000 0.0003 0.0000 0.0000 0.0009
## GIRE 0.0102 0.0000 0.0000 0.0000 0.0005 0.0035 0.0000 0.0000 0.0000
## hand 0.0050 0.0000 0.0000 0.0000 0.0005 0.0003 0.0003 0.0002 0.0021
## edu 0.0545 0.0081 0.0076 0.0003 0.0035 0.0003 0.0014 0.0027 0.0426
## san 0.0025 0.0001 0.0001 0.0000 0.0000 0.0003 0.0014 0.0000 0.0040
## wat 0.0012 0.0000 0.0000 0.0000 0.0000 0.0002 0.0027 0.0000 0.0018
## rain 0.0186 0.0000 0.0000 0.0009 0.0000 0.0021 0.0426 0.0040 0.0018
## tem 0.0706 0.0013 0.0012 0.0004 0.0002 0.0028 0.0001 0.0001 0.0004 0.0100
## tem
## N_C 0.0706
## P_0_15 0.0013
## P_15_64 0.0012
## den 0.0004
## GIRE 0.0002
## hand 0.0028
## edu 0.0001
## san 0.0001
## wat 0.0004
## rain 0.0100
## tem
Examining the p-values, we notice that almost all of them are below 5%; the only exceptions are the pairs N_C and edu (0.0545) and N_C and tem (0.0706). This allows us to continue our PCA with the collected data.
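The global hypothesis that the correlation matrix is an identity matrix can be tested with Bartlett's test of sphericity. The sketch below is not part of the original analysis: it implements the standard statistic in base R on four columns copied from the data table above (the subset and the object name cholera are choices made for this illustration).

```r
# Four variables copied from the data table (13 countries).
cholera <- data.frame(
  N_C  = c(1394, 6470, 52654, 30389, 8809, 32530, 39101, 3457, 1478, 1471, 1040, 4531, 14148),
  san  = c(45.69, 43.12, 16.17, 9.34, 36.53, 49.24, 37.38, 46.57, 77.63, 16.06, 30.62, 36.30, 34.62),
  wat  = c(62.44, 69.59, 35.12, 51.51, 62.86, 71.87, 63.20, 79.64, 94.49, 41.19, 60.79, 68.25, 62.29),
  rain = c(1274, 1604, 1543, 848, 630, 1181, 1032, 1150, 495, 900, 1071, 1020, 657)
)

R <- cor(cholera)    # correlation matrix of the raw observations
n <- nrow(cholera)   # number of observations
p <- ncol(cholera)   # number of variables

# Bartlett's statistic: -((n - 1) - (2p + 5) / 6) * log(det(R)),
# compared with a chi-squared distribution with p(p - 1) / 2 degrees of freedom.
chi2    <- -((n - 1) - (2 * p + 5) / 6) * log(det(R))
df      <- p * (p - 1) / 2
p_value <- pchisq(chi2, df = df, lower.tail = FALSE)

c(chi2 = chi2, df = df, p_value = p_value)
```

The psych package, already loaded above, provides the same test as cortest.bartlett(R, n = n). A small p-value rejects the identity-matrix hypothesis and supports the use of PCA.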
The FactoMineR package was used to run our PCA.
library(FactoMineR)
pca_1 = PCA(X = data, scale.unit = TRUE, ncp = 11, ind.sup = NULL,
quanti.sup = NULL, quali.sup = NULL, row.w = NULL,
col.w = NULL, graph = FALSE, axes = c(1,2))
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
library(factoextra)
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
fviz_eig(pca_1, addlabels=TRUE, hjust = -0.3) +
ylim(0, 65)
The first two axes of the analysis account for 70.48% of the total inertia of the data set, meaning that 70.48% of the information carried by our 11 variables is captured by these two axes. Axis 1 is predominant and alone explains 56.82% of the total variability of the data. We therefore retain the first two axes for our analysis.
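The percentages of inertia shown in the scree plot come from the eigenvalues of the correlation matrix, because the PCA is run on standardized variables (scale.unit = TRUE). A minimal base R sketch, assuming a four-variable subset copied from the data table, so the percentages will differ from those of the full 11-variable analysis:

```r
# Four variables copied from the data table (13 countries).
cholera <- data.frame(
  N_C  = c(1394, 6470, 52654, 30389, 8809, 32530, 39101, 3457, 1478, 1471, 1040, 4531, 14148),
  san  = c(45.69, 43.12, 16.17, 9.34, 36.53, 49.24, 37.38, 46.57, 77.63, 16.06, 30.62, 36.30, 34.62),
  wat  = c(62.44, 69.59, 35.12, 51.51, 62.86, 71.87, 63.20, 79.64, 94.49, 41.19, 60.79, 68.25, 62.29),
  rain = c(1274, 1604, 1543, 848, 630, 1181, 1032, 1150, 495, 900, 1071, 1020, 657)
)

# Eigenvalues of the correlation matrix: one per principal axis,
# returned in decreasing order; their sum equals the number of variables.
eig <- eigen(cor(cholera))$values

# Share of the total inertia carried by each axis, and cumulative share.
pct <- 100 * eig / sum(eig)
inertia <- data.frame(
  axis       = seq_along(eig),
  eigenvalue = eig,
  percent    = pct,
  cumulative = cumsum(pct)
)
print(inertia)
```

In FactoMineR the same three quantities are stored in the eig element of the PCA result (pca_1$eig for the object created above).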
library(ggplot2)
library(factoextra)
fviz_pca_var(pca_1, col.var = "cos2", gradient.cols = c("blue", "green", "red"), repel = TRUE)
The correlation circle, colored by squared cosine (cos2), shows the quality of representation of the variables on the principal axes. The variables are better represented on axis 1 than on axis 2, which is consistent with axis 1 carrying 56.82% of the variance against only 13.65% for axis 2.
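The cos2 of a variable on an axis is the squared correlation between that variable and the corresponding principal component. The sketch below illustrates this in base R with prcomp() rather than FactoMineR, on a four-variable subset copied from the data table; both choices are assumptions made for the illustration.

```r
# Four variables copied from the data table (13 countries).
cholera <- data.frame(
  N_C  = c(1394, 6470, 52654, 30389, 8809, 32530, 39101, 3457, 1478, 1471, 1040, 4531, 14148),
  san  = c(45.69, 43.12, 16.17, 9.34, 36.53, 49.24, 37.38, 46.57, 77.63, 16.06, 30.62, 36.30, 34.62),
  wat  = c(62.44, 69.59, 35.12, 51.51, 62.86, 71.87, 63.20, 79.64, 94.49, 41.19, 60.79, 68.25, 62.29),
  rain = c(1274, 1604, 1543, 848, 630, 1181, 1032, 1150, 495, 900, 1071, 1020, 657)
)

# PCA on centered and standardized variables.
pca <- prcomp(cholera, center = TRUE, scale. = TRUE)

# Coordinates of the variables on the correlation circle:
# loading multiplied by the standard deviation of the component,
# which equals the correlation between variable and component.
var_coord <- sweep(pca$rotation, 2, pca$sdev, `*`)

# Quality of representation: squared coordinates.
cos2 <- var_coord^2
print(round(cos2[, 1:2], 3))   # cos2 on the first two axes

# Across all axes the cos2 of each variable sum to 1.
print(rowSums(cos2))
```

A variable whose cos2 on the first two axes sums to a value close to 1 lies near the edge of the circle and is well represented on that plane.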
fviz_pca_ind(pca_1, col.ind = "contrib", gradient.cols = c("blue" , "green" , "red"), repel = TRUE)
fviz_pca_biplot(pca_1, repel = TRUE,col.var = "blue",col.ind = "red")
The graph shows that axis 1 opposes individuals such as South Africa to individuals such as South Sudan and Ethiopia. South Africa has high values for the population aged 15 to 64, population density and the rate of access to drinking water, and low values for the population aged 0 to 15 and average annual temperature. By contrast, South Sudan and Ethiopia share low values for the literacy rate and the sanitation access rate.
Description in relation to axis 1:
fviz_contrib(pca_1, choice = "ind", axes = 1)
fviz_contrib(pca_1, choice = "var", axes = 1)
Axis 1 accounts for 56.82% of the variance, making it the most informative axis. The individuals that contribute more than 10% to its construction are South Africa, the Democratic Republic of the Congo and South Sudan. The variables that contribute most are population density, the shares of the population aged 0 to 15 and 15 to 64, the rates of access to drinking water and sanitation, and the rate of implementation of Integrated Water Resources Management (IWRM, the GIRE variable).
Description in relation to axis 2:
fviz_contrib(pca_1, choice = "ind", axes = 2)
fviz_contrib(pca_1, choice = "var", axes = 2)
Axis 2 accounts for 13.65% of the variance. The individuals contributing most to its construction are South Sudan, Cameroon, Ethiopia and the Democratic Republic of the Congo, while the variables contributing most are average annual rainfall and the literacy rate.
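The contributions plotted by fviz_contrib() can be recomputed by hand, which makes the reference level explicit: the dashed line in those plots marks the expected average contribution, 100 divided by the number of elements. The following base R sketch uses prcomp() on a four-variable subset copied from the data table, so its values are illustrative and do not reproduce the figures above.

```r
# Four variables copied from the data table, with the country names as row names.
cholera <- data.frame(
  N_C  = c(1394, 6470, 52654, 30389, 8809, 32530, 39101, 3457, 1478, 1471, 1040, 4531, 14148),
  san  = c(45.69, 43.12, 16.17, 9.34, 36.53, 49.24, 37.38, 46.57, 77.63, 16.06, 30.62, 36.30, 34.62),
  wat  = c(62.44, 69.59, 35.12, 51.51, 62.86, 71.87, 63.20, 79.64, 94.49, 41.19, 60.79, 68.25, 62.29),
  rain = c(1274, 1604, 1543, 848, 630, 1181, 1032, 1150, 495, 900, 1071, 1020, 657),
  row.names = c("Burundi", "Cameroun", "RDC", "Ethiopya", "Kenya", "Malawi", "Mozambique",
                "Nigeria", "Sud_Africa", "Soudan_Sud", "Tanzanie", "Zambie", "Zimbabwe")
)

pca <- prcomp(cholera, center = TRUE, scale. = TRUE)

# Variables: contribution (%) to an axis = 100 * squared loading
# (equivalently 100 * cos2 / eigenvalue of the axis).
var_contrib <- 100 * pca$rotation^2

# Individuals: contribution (%) = 100 * squared score / sum of squared scores on the axis.
ind_contrib <- 100 * sweep(pca$x^2, 2, colSums(pca$x^2), `/`)

# Expected average contributions if all elements contributed equally.
threshold_var <- 100 / ncol(cholera)
threshold_ind <- 100 / nrow(cholera)

# Elements above the average contribution on axis 1.
print(names(which(var_contrib[, 1] > threshold_var)))
print(names(which(ind_contrib[, 1] > threshold_ind)))
```

With 13 countries the average individual contribution is 100 / 13, about 7.7%, and with 11 variables the average variable contribution is 100 / 11, about 9.1%.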
fviz_pca_var(pca_1)
Looking at the graph, we notice a positive correlation between the
number of cholera cases and rainfall, the population aged 0 to 15 and
average annual temperature. This suggests that these variables are
positively associated with the recurrence of cholera, whereas the other
parameters are associated with slowing its progress.
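# PCA with the default settings, followed by hierarchical clustering on the principal components
res.PCA <- PCA(data, graph = FALSE)
# Cut the tree into 3 clusters, without k-means consolidation
res.HCPC <- HCPC(res.PCA, nb.clust = 3, consol = FALSE, graph = FALSE)
plot.HCPC(res.HCPC, choice = 'tree', title = 'Hierarchical tree')
plot.HCPC(res.HCPC, choice = 'map', draw.tree = FALSE, title = 'Factor map')
plot.HCPC(res.HCPC, choice = '3D.map', ind.names = FALSE, centers.plot = FALSE, angle = 60, title = 'Hierarchical tree on the factor map')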
The classification of our individuals reveals three groups, each characterized by the similarity of its members with respect to certain variables.
Group 1 consists of 6 countries: the Democratic Republic of the Congo, Burundi, Malawi, Mozambique, Ethiopia and South Sudan. These individuals are characterized by high values for the number of cholera cases and low values for the handwashing and hygiene rate, population density, the literacy rate and the rate of access to drinking water.
Group 2 consists of Cameroon, Tanzania, Zambia, Nigeria, Kenya and Zimbabwe. This group is characterized by high values for the handwashing and hygiene rate.
Group 3 is composed of South Africa alone, which has high values for the population aged 15 to 64, population density, the sanitation access rate and the drinking water access rate.
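The principle behind this classification can be sketched in base R: hierarchical clustering with Ward's criterion on the principal component scores, then a cut into three groups. This is an illustration under stated assumptions (prcomp() and hclust() instead of HCPC(), and a four-variable subset copied from the data table), so the groups it produces need not match those described above.

```r
# Four variables copied from the data table, with the country names as row names.
cholera <- data.frame(
  N_C  = c(1394, 6470, 52654, 30389, 8809, 32530, 39101, 3457, 1478, 1471, 1040, 4531, 14148),
  san  = c(45.69, 43.12, 16.17, 9.34, 36.53, 49.24, 37.38, 46.57, 77.63, 16.06, 30.62, 36.30, 34.62),
  wat  = c(62.44, 69.59, 35.12, 51.51, 62.86, 71.87, 63.20, 79.64, 94.49, 41.19, 60.79, 68.25, 62.29),
  rain = c(1274, 1604, 1543, 848, 630, 1181, 1032, 1150, 495, 900, 1071, 1020, 657),
  row.names = c("Burundi", "Cameroun", "RDC", "Ethiopya", "Kenya", "Malawi", "Mozambique",
                "Nigeria", "Sud_Africa", "Soudan_Sud", "Tanzanie", "Zambie", "Zimbabwe")
)

# Coordinates of the countries on the principal components.
scores <- prcomp(cholera, center = TRUE, scale. = TRUE)$x

# Agglomerative clustering with Ward's criterion on Euclidean distances.
tree <- hclust(dist(scores), method = "ward.D2")

# Cut the dendrogram into three groups, as with nb.clust = 3.
groups <- cutree(tree, k = 3)

# Members of each group and group means of the original variables.
print(split(names(groups), groups))
print(aggregate(cholera, by = list(cluster = groups), FUN = mean))
```

Comparing the group means with the overall means is what supports statements such as "high values for the number of cholera cases" in a given group.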
library(car)
## Warning: package 'car' was built under R version 4.4.2
## Loading required package: carData
## Warning: package 'carData' was built under R version 4.4.2
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
library(carData)
library(corrplot)
library("clusterSim")
## Warning: package 'clusterSim' was built under R version 4.4.2
## Loading required package: cluster
## Loading required package: MASS
library(DataExplorer)
## Warning: package 'DataExplorer' was built under R version 4.4.2
library(factoextra)
library(FactoInvestigate)
print(data)
## N_C P_0_15 P_15_64 den GIRE hand edu san wat rain tem
## Burundi 1394 45.2 54.85 0.42 47 6.30 75.54 45.69 62.44 1274 21
## Cameroun 6470 42.0 58.05 0.59 40 36.71 78.23 43.12 69.59 1604 25
## RDC 52654 46.5 53.46 0.48 32 19.35 80.54 16.17 35.12 1543 25
## Ethiopya 30389 39.3 60.70 0.49 41 8.32 51.77 9.34 51.51 848 24
## Kenya 8809 37.2 62.78 0.60 59 37.60 82.88 36.53 62.86 630 26
## Malawi 32530 43.3 57.95 0.51 55 15.31 68.80 49.24 71.87 1181 23
## Mozambique 39101 42.0 56.71 0.46 62 15.00 59.78 37.38 63.20 1032 24
## Nigeria 3457 42.8 57.24 0.55 44 31.08 62.02 46.57 79.64 1150 28
## Sud_Africa 1478 28.3 71.70 0.72 71 44.00 90.00 77.63 94.49 495 18
## Soudan_Sud 1471 43.0 57.04 0.38 43 5.60 34.52 16.06 41.19 900 29
## Tanzanie 1040 43.1 56.88 0.53 54 28.92 80.20 30.62 60.79 1071 23
## Zambie 4531 42.4 57.55 0.57 58 18.15 87.50 36.30 68.25 1020 23
## Zimbabwe 14148 40.3 59.72 0.55 63 42.46 89.85 34.62 62.29 657 22
attach(data)
summary(data)
## N_C P_0_15 P_15_64 den
## Min. : 1040 Min. :28.30 Min. :53.46 Min. :0.3800
## 1st Qu.: 1478 1st Qu.:40.30 1st Qu.:56.88 1st Qu.:0.4800
## Median : 6470 Median :42.40 Median :57.55 Median :0.5300
## Mean :15190 Mean :41.18 Mean :58.82 Mean :0.5269
## 3rd Qu.:30389 3rd Qu.:43.10 3rd Qu.:59.72 3rd Qu.:0.5700
## Max. :52654 Max. :46.50 Max. :71.70 Max. :0.7200
## GIRE hand edu san
## Min. :32.00 Min. : 5.60 Min. :34.52 Min. : 9.34
## 1st Qu.:43.00 1st Qu.:15.00 1st Qu.:62.02 1st Qu.:30.62
## Median :54.00 Median :19.35 Median :78.23 Median :36.53
## Mean :51.46 Mean :23.75 Mean :72.43 Mean :36.87
## 3rd Qu.:59.00 3rd Qu.:36.71 3rd Qu.:82.88 3rd Qu.:45.69
## Max. :71.00 Max. :44.00 Max. :90.00 Max. :77.63
## wat rain tem
## Min. :35.12 Min. : 495 Min. :18.00
## 1st Qu.:60.79 1st Qu.: 848 1st Qu.:23.00
## Median :62.86 Median :1032 Median :24.00
## Mean :63.33 Mean :1031 Mean :23.92
## 3rd Qu.:69.59 3rd Qu.:1181 3rd Qu.:25.00
## Max. :94.49 Max. :1604 Max. :29.00
model1 = lm(formula = N_C ~ edu + san + hand + den + wat + GIRE + P_15_64 + tem + P_0_15 + rain,data= data)
summary(model1)
##
## Call:
## lm(formula = N_C ~ edu + san + hand + den + wat + GIRE + P_15_64 +
## tem + P_0_15 + rain, data = data)
##
## Residuals:
## Burundi Cameroun RDC Ethiopya Kenya Malawi Mozambique
## -3721 -8076 11406 9712 9088 14859 13995
## Nigeria Sud_Africa Soudan_Sud Tanzanie Zambie Zimbabwe
## 6274 -4732 -19056 -22383 -13134 5769
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 582765.24 2425475.09 0.240 0.833
## edu -2034.27 3444.31 -0.591 0.615
## san 1014.72 3090.52 0.328 0.774
## hand -121.58 1561.25 -0.078 0.945
## den 591585.77 1030677.55 0.574 0.624
## wat -2549.07 3737.58 -0.682 0.566
## GIRE 526.94 2589.25 0.204 0.858
## P_15_64 -7909.58 27355.04 -0.289 0.800
## tem -6731.46 11898.31 -0.566 0.629
## P_0_15 202.78 19354.45 0.010 0.993
## rain -13.82 146.30 -0.094 0.933
##
## Residual standard error: 31100 on 2 degrees of freedom
## Multiple R-squared: 0.469, Adjusted R-squared: -2.186
## F-statistic: 0.1766 on 10 and 2 DF, p-value: 0.9773
vif(model1)
## edu san hand den wat GIRE P_15_64
## 40.028512 36.435541 5.723006 99.811729 40.549970 10.475205 191.293649
## tem P_0_15 rain
## 14.479389 95.839302 29.355792
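The variance inflation factor of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the others; values far above 10, as here, signal strong multicollinearity. The sketch below checks this definition in base R on three predictors copied from the data table. The subset is an assumption made to keep the example short, so the values differ from the output above.

```r
# Three predictors copied from the data table (13 countries).
X <- data.frame(
  san  = c(45.69, 43.12, 16.17, 9.34, 36.53, 49.24, 37.38, 46.57, 77.63, 16.06, 30.62, 36.30, 34.62),
  wat  = c(62.44, 69.59, 35.12, 51.51, 62.86, 71.87, 63.20, 79.64, 94.49, 41.19, 60.79, 68.25, 62.29),
  rain = c(1274, 1604, 1543, 848, 630, 1181, 1032, 1150, 495, 900, 1071, 1020, 657)
)

# Definition: regress each predictor on the remaining ones and apply 1 / (1 - R^2).
vif_manual <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})

# Equivalent shortcut: the diagonal of the inverse correlation matrix of the predictors.
vif_from_cor <- diag(solve(cor(X)))

print(round(rbind(vif_manual, vif_from_cor), 3))
```

For a linear model with numeric predictors only, car::vif() reports this same quantity.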
library(ggplot2)
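donnees = data.frame(data)
# Observed number of cases against the values fitted by model1
data_vis = data.frame(valeurs_reelles = donnees$N_C, predictions = predict(model1))
ggplot(data_vis, aes(x = valeurs_reelles, y = predictions)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "blue") +
labs(x = "Observed values", y = "Predictions") +
ggtitle("model1: observed vs. predicted cholera cases") # the title must be a character string, not the lm object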
## `geom_smooth()` using formula = 'y ~ x'
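The linear regression gives an R² of 0.47, but the adjusted R² is negative (-2.186), the overall F-test is far from significant (p-value = 0.9773) and none of the coefficients is significant. With 13 observations and 10 predictors, only 2 residual degrees of freedom remain, and the variance inflation factors are very high (up to 191 for P_15_64), which indicates strong multicollinearity between the predictors. This linear model is therefore not suitable for analyzing cholera recurrence, and another model should be chosen.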
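The gap between R² and adjusted R² follows directly from the number of observations and predictors. A short worked check in base R, using only figures reported in the summary(model1) output (R² = 0.469, 13 observations, 10 predictors):

```r
r2 <- 0.469   # Multiple R-squared reported by summary(model1)
n  <- 13      # number of countries
p  <- 10      # number of predictors in model1

# Residual degrees of freedom: n - p - 1 = 2.
df_resid <- n - p - 1

# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid
print(round(adj_r2, 3))   # -2.186, as in the summary output

# Overall F statistic rebuilt from the rounded R-squared
# (close to the reported 0.1766), and its p-value.
f_stat  <- (r2 / p) / ((1 - r2) / df_resid)
p_value <- pf(f_stat, df1 = p, df2 = df_resid, lower.tail = FALSE)
print(c(f_stat = f_stat, p_value = p_value))
```

With only 2 residual degrees of freedom the penalty factor (n - 1) / (n - p - 1) equals 6, which is why a moderate R² turns into a strongly negative adjusted R².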
The principal component analysis (PCA) that we carried out on the factors contributing to the recurrence of cholera in Africa revealed several trends and correlations. The main variables positively associated with the recurrence of cholera in African countries are climatic parameters, namely high rainfall and high temperatures, together with a large share of young people in the population. Conversely, the recurrence of cholera is favored in countries with low values for health parameters (rates of access to water, sanitation, and handwashing and hygiene), environmental parameters (Integrated Water Resources Management) and socio-economic parameters (literacy rate). These results show the importance of a multisectoral approach that integrates interventions in public health, infrastructure improvement and water resource management in order to combat this disease that is raging in our countries. This involves implementing community awareness programs, improving access to safe drinking water sources, and strengthening sanitation systems. In addition, given the increasing influence of climate change on the recurrence of cholera, it is crucial to integrate environmental considerations into the strategies adopted, in order to minimize its impact on the transmission of the disease.