INTRODUCTION

Cholera is a waterborne acute diarrheal disease caused by the bacterium Vibrio cholerae [1]. It spreads by the fecal-oral route, either directly from person to person or indirectly through contaminated fluids from an environmental reservoir of varying duration, food, and potentially flies and fomites [2]. Epidemic cholera often occurs near watercourses when weather conditions favor bacterial growth, so that the aquatic environment and feces interact [2]. In response, prevention and control efforts have been deployed, including strengthening laboratory capacity, improving epidemiological surveillance systems, and promoting access to safe drinking water and sanitation [3]. Despite these initiatives, cholera remains endemic in many African countries. According to the World Health Organization (WHO), Africa is one of the regions most affected by the disease, with peaks in incidence often linked to poor health conditions and humanitarian crises [3]. The first cholera epidemics date back to 1817 in Africa, when the disease spread from the Indian subcontinent through maritime trade routes [4]. Since then, the continent has experienced seven major pandemics, with recurring outbreaks that have strained health systems [5]. Cholera remains a major health problem in Africa, with recurring epidemics affecting many countries, despite the prevention and control efforts implemented by the various organizations working towards its eradication [1]. Although cholera is preventable through simple measures such as improving access to drinking water and basic sanitation infrastructure, epidemics have persisted on the continent for decades [6].
Despite efforts in vaccination, awareness raising and improved health infrastructure, cholera outbreaks continue to occur at regular intervals, particularly in vulnerable areas [7]. Although progress has been made in combating the disease, it continues to affect millions of people each year in Africa, particularly in the poorest and most vulnerable regions. This situation leads us to ask the following question: what are the root causes of the recurrence of cholera in Africa, and how do these factors interact to maintain its spread? This research question focuses on identifying the underlying causes of cholera recurrence and seeks to understand the complex interactions between the different factors (demographic, environmental, socio-economic, and health) that contribute to the persistence of the disease in Africa. The bulk of the new cases and deaths have been in Malawi, which is facing its worst cholera outbreak in two decades. Malawi’s neighbors Mozambique and Zambia have also recently reported cases. In East Africa, Ethiopia, Kenya and Somalia are responding to outbreaks amid a prolonged and severe drought that has left millions in urgent need of humanitarian assistance. Burundi, Cameroon, the Democratic Republic of the Congo and Nigeria have also reported cases. “We are witnessing a worrying scenario in which conflict and extreme weather events are exacerbating cholera triggers and increasing the toll,” said Dr Matshidiso Moeti, World Health Organization (WHO) regional director for Africa. Western DRC is considered to be affected by cholera outbreaks along the Congo River and its tributaries [8]. In 2023, 225,857 cases and 3,167 deaths were reported in the 20 African countries facing outbreaks. In 2022, nearly 80,000 cases and 1,863 deaths were recorded in 15 affected countries.
If the current rapid upward trend continues, the number of cases could surpass that recorded in 2021, the worst year for cholera in Africa in nearly a decade. The average case fatality rate is currently close to 3%, above the 2.3% reached in 2022, and well above the acceptable level of less than 1% [9]. Cholera therefore remains a major cause of illness and death in Africa, where it continues to occur in several countries, despite having been largely eliminated from developed countries over a century ago. It disproportionately affects populations already strained by conflict, inadequate infrastructure, non-resilient health systems, and poverty [10]. The causes of cholera persistence in Africa are multiple and interconnected. They include environmental factors, such as water contamination and degraded sanitation infrastructure, as well as socio-economic determinants such as poverty, rapid urbanization and forced displacement [11]. In addition, the fragility of African health systems, characterized by limited access to care and a lack of resources, hampers efforts to prevent and respond to cholera outbreaks.
The main objective of this research is to identify and analyze the root causes of cholera recurrence in Africa, taking into account the demographic, health, socio-economic, and environmental factors that contribute to the spread of the disease. The study aims to better understand why cholera persists despite prevention efforts and infrastructure improvements, and to explore the interactions between these factors in order to propose sustainable solutions to reduce future epidemics. To achieve this main goal, three specific objectives have been defined: (1) analyze the environmental conditions and health infrastructure that promote cholera transmission, including access to drinking water and sanitation, as well as the impact of climate change; (2) analyze the socio-economic determinants that influence the vulnerability of populations to cholera, in particular inequalities in access to health services and precarious living conditions; (3) propose practical and appropriate solutions to limit the recurrence of cholera in Africa, based on the results of the analysis. This article explores the causes of the recurrence of cholera in Africa, particularly for the year 2023, by highlighting the dynamics that favor its spread and by discussing the measures necessary to mitigate its impact on the most vulnerable populations.

Loading data

To carry out our Principal Component Analysis, we first import the data into R.

setwd("C:/Users/HARISSE/Downloads/cours_2iE/S7_GEAAH/RTI/Maladi_hydriqu/Projet_RTI_Groupe_10_GEAAH/Donnee_ACP/")

data = read.csv(file ="data.csv", header = TRUE, sep = ";", quote = "\"",
                dec = ",", row.names = 1)
data[,1:11]
##              N_C P_0_15 P_15_64  den GIRE  hand   edu   san   wat rain tem
## Burundi     1394   45.2   54.85 0.42   47  6.30 75.54 45.69 62.44 1274  21
## Cameroun    6470   42.0   58.05 0.59   40 36.71 78.23 43.12 69.59 1604  25
## RDC        52654   46.5   53.46 0.48   32 19.35 80.54 16.17 35.12 1543  25
## Ethiopya   30389   39.3   60.70 0.49   41  8.32 51.77  9.34 51.51  848  24
## Kenya       8809   37.2   62.78 0.60   59 37.60 82.88 36.53 62.86  630  26
## Malawi     32530   43.3   57.95 0.51   55 15.31 68.80 49.24 71.87 1181  23
## Mozambique 39101   42.0   56.71 0.46   62 15.00 59.78 37.38 63.20 1032  24
## Nigeria     3457   42.8   57.24 0.55   44 31.08 62.02 46.57 79.64 1150  28
## Sud_Africa  1478   28.3   71.70 0.72   71 44.00 90.00 77.63 94.49  495  18
## Soudan_Sud  1471   43.0   57.04 0.38   43  5.60 34.52 16.06 41.19  900  29
## Tanzanie    1040   43.1   56.88 0.53   54 28.92 80.20 30.62 60.79 1071  23
## Zambie      4531   42.4   57.55 0.57   58 18.15 87.50 36.30 68.25 1020  23
## Zimbabwe   14148   40.3   59.72 0.55   63 42.46 89.85 34.62 62.29  657  22

Creation of correlation matrix

The corrplot, psych and Hmisc libraries were loaded to compute the correlation matrix and draw the graphs.

library(corrplot)
## Warning: package 'corrplot' was built under R version 4.4.2
## corrplot 0.95 loaded
library(psych)
library(Hmisc)
## Warning: package 'Hmisc' was built under R version 4.4.2
## 
## Attaching package: 'Hmisc'
## The following object is masked from 'package:psych':
## 
##     describe
## The following objects are masked from 'package:base':
## 
##     format.pval, units
mat_cor = cor(data)
col = colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))
corrplot(mat_cor, method="color", col=col(200),  
         type="upper", order="hclust", 
         addCoef.col = "black", # Add the correlation coefficient
         tl.col="black", tl.srt=90, # Rotate the text labels
         sig.level = 0.1, insig = "blank", 
         # Hide the correlation coefficients on the diagonal
         diag=FALSE)

To check the relevance of the correlations between variables, we computed their p-values with the rcorr function. For each pair of variables, the null hypothesis is that the correlation is zero, so a p-value below 5% indicates a significant correlation for that pair. (Testing whether the whole correlation matrix differs from the identity matrix, i.e. whether there are no correlations at all between the variables, is the role of a separate test, Bartlett's test of sphericity.) Note that the call below is applied to the correlation matrix mat_cor itself, which is why the output reports n = 11, the number of variables, rather than the 13 countries.

rcorr(as.matrix(mat_cor[,1:11]))
##           N_C P_0_15 P_15_64   den  GIRE  hand   edu   san   wat  rain   tem
## N_C      1.00   0.74   -0.74 -0.76 -0.73 -0.78 -0.59 -0.81 -0.84  0.69  0.56
## P_0_15   0.74   1.00   -1.00 -0.96 -0.95 -0.91 -0.75 -0.90 -0.93  0.95  0.84
## P_15_64 -0.74  -1.00    1.00  0.96  0.95  0.92  0.75  0.91  0.93 -0.95 -0.84
## den     -0.76  -0.96    0.96  1.00  0.91  0.98  0.88  0.94  0.95 -0.85 -0.87
## GIRE    -0.73  -0.95    0.95  0.91  1.00  0.87  0.79  0.92  0.93 -0.94 -0.90
## hand    -0.78  -0.91    0.92  0.98  0.87  1.00  0.88  0.88  0.90 -0.82 -0.81
## edu     -0.59  -0.75    0.75  0.88  0.79  0.88  1.00  0.84  0.81 -0.62 -0.91
## san     -0.81  -0.90    0.91  0.94  0.92  0.88  0.84  1.00  0.99 -0.79 -0.91
## wat     -0.84  -0.93    0.93  0.95  0.93  0.90  0.81  0.99  1.00 -0.82 -0.88
## rain     0.69   0.95   -0.95 -0.85 -0.94 -0.82 -0.62 -0.79 -0.82  1.00  0.73
## tem      0.56   0.84   -0.84 -0.87 -0.90 -0.81 -0.91 -0.91 -0.88  0.73  1.00
## 
## n= 11 
## 
## 
## P
##         N_C    P_0_15 P_15_64 den    GIRE   hand   edu    san    wat    rain  
## N_C            0.0096 0.0088  0.0061 0.0102 0.0050 0.0545 0.0025 0.0012 0.0186
## P_0_15  0.0096        0.0000  0.0000 0.0000 0.0000 0.0081 0.0001 0.0000 0.0000
## P_15_64 0.0088 0.0000         0.0000 0.0000 0.0000 0.0076 0.0001 0.0000 0.0000
## den     0.0061 0.0000 0.0000         0.0000 0.0000 0.0003 0.0000 0.0000 0.0009
## GIRE    0.0102 0.0000 0.0000  0.0000        0.0005 0.0035 0.0000 0.0000 0.0000
## hand    0.0050 0.0000 0.0000  0.0000 0.0005        0.0003 0.0003 0.0002 0.0021
## edu     0.0545 0.0081 0.0076  0.0003 0.0035 0.0003        0.0014 0.0027 0.0426
## san     0.0025 0.0001 0.0001  0.0000 0.0000 0.0003 0.0014        0.0000 0.0040
## wat     0.0012 0.0000 0.0000  0.0000 0.0000 0.0002 0.0027 0.0000        0.0018
## rain    0.0186 0.0000 0.0000  0.0009 0.0000 0.0021 0.0426 0.0040 0.0018       
## tem     0.0706 0.0013 0.0012  0.0004 0.0002 0.0028 0.0001 0.0001 0.0004 0.0100
##         tem   
## N_C     0.0706
## P_0_15  0.0013
## P_15_64 0.0012
## den     0.0004
## GIRE    0.0002
## hand    0.0028
## edu     0.0001
## san     0.0001
## wat     0.0004
## rain    0.0100
## tem

Examining the p-values, we notice that nearly all of them are below 5%; the only exceptions are the pairs N_C/edu (0.0545) and N_C/tem (0.0706). This allows us to continue our PCA with the collected data.
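The identity-matrix null hypothesis mentioned above corresponds to Bartlett's test of sphericity. The following is a minimal base-R sketch of that test computed on the raw observations, assuming the printed table is the complete content of data.csv. The object name cholera and the intermediate variable names are illustrative, not part of the original script. The psych package loaded earlier provides cortest.bartlett(R, n) for the same purpose, and rcorr(as.matrix(data)) would give the pairwise p-values on the 13 countries.

```r
# Data re-entered from the printed table (assumption: identical to data.csv)
cholera <- read.table(header = TRUE, row.names = 1, text = "
country      N_C P_0_15 P_15_64  den GIRE  hand   edu   san   wat rain tem
Burundi     1394   45.2   54.85 0.42   47  6.30 75.54 45.69 62.44 1274  21
Cameroun    6470   42.0   58.05 0.59   40 36.71 78.23 43.12 69.59 1604  25
RDC        52654   46.5   53.46 0.48   32 19.35 80.54 16.17 35.12 1543  25
Ethiopya   30389   39.3   60.70 0.49   41  8.32 51.77  9.34 51.51  848  24
Kenya       8809   37.2   62.78 0.60   59 37.60 82.88 36.53 62.86  630  26
Malawi     32530   43.3   57.95 0.51   55 15.31 68.80 49.24 71.87 1181  23
Mozambique 39101   42.0   56.71 0.46   62 15.00 59.78 37.38 63.20 1032  24
Nigeria     3457   42.8   57.24 0.55   44 31.08 62.02 46.57 79.64 1150  28
Sud_Africa  1478   28.3   71.70 0.72   71 44.00 90.00 77.63 94.49  495  18
Soudan_Sud  1471   43.0   57.04 0.38   43  5.60 34.52 16.06 41.19  900  29
Tanzanie    1040   43.1   56.88 0.53   54 28.92 80.20 30.62 60.79 1071  23
Zambie      4531   42.4   57.55 0.57   58 18.15 87.50 36.30 68.25 1020  23
Zimbabwe   14148   40.3   59.72 0.55   63 42.46 89.85 34.62 62.29  657  22
")

R <- cor(cholera)      # correlation matrix of the raw variables
n <- nrow(cholera)     # number of countries
p <- ncol(cholera)     # number of variables

# Bartlett's test of sphericity, H0: R is the identity matrix
log_det <- as.numeric(determinant(R, logarithm = TRUE)$modulus)
chi2    <- -(n - 1 - (2 * p + 5) / 6) * log_det
df      <- p * (p - 1) / 2
p_value <- pchisq(chi2, df = df, lower.tail = FALSE)
print(c(chi2 = chi2, df = df, p_value = p_value))

# Pairwise test on the raw data for one pair, H0: correlation is zero
print(cor.test(cholera$N_C, cholera$wat))
```

With only 13 observations for 11 variables the chi-square approximation is rough, so the result is best read as an indication rather than a firm conclusion.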

Launching the PCA

The FactoMineR package was used to run our PCA.

library(FactoMineR)
pca_1 = PCA(X = data, scale.unit = TRUE, ncp = 11, ind.sup = NULL, 
            quanti.sup = NULL, quali.sup = NULL, row.w = NULL, 
            col.w = NULL, graph = FALSE, axes = c(1,2))

Inertia and choice of main axes

library(ggplot2)
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
library(factoextra)
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
fviz_eig(pca_1, addlabels=TRUE, hjust = -0.3) +
  ylim(0, 65)

The first two axes of the analysis express 70.48% of the total inertia of the data set; that is, 70.48% of the information carried by our 11 variables is captured by these two axes. Axis 1 is predominant and alone explains 56.82% of the total variability of the data. Our analysis is therefore carried out on the first two axes.
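For a standardized PCA (scale.unit = TRUE), the inertia carried by each axis is the corresponding eigenvalue of the correlation matrix divided by the number of variables. The sketch below recomputes these percentages in base R, assuming the printed table is the complete content of data.csv; the names cholera and inertia are illustrative. If that assumption holds, the percentages should agree with those displayed by fviz_eig.

```r
# Data re-entered from the printed table (assumption: identical to data.csv)
cholera <- read.table(header = TRUE, row.names = 1, text = "
country      N_C P_0_15 P_15_64  den GIRE  hand   edu   san   wat rain tem
Burundi     1394   45.2   54.85 0.42   47  6.30 75.54 45.69 62.44 1274  21
Cameroun    6470   42.0   58.05 0.59   40 36.71 78.23 43.12 69.59 1604  25
RDC        52654   46.5   53.46 0.48   32 19.35 80.54 16.17 35.12 1543  25
Ethiopya   30389   39.3   60.70 0.49   41  8.32 51.77  9.34 51.51  848  24
Kenya       8809   37.2   62.78 0.60   59 37.60 82.88 36.53 62.86  630  26
Malawi     32530   43.3   57.95 0.51   55 15.31 68.80 49.24 71.87 1181  23
Mozambique 39101   42.0   56.71 0.46   62 15.00 59.78 37.38 63.20 1032  24
Nigeria     3457   42.8   57.24 0.55   44 31.08 62.02 46.57 79.64 1150  28
Sud_Africa  1478   28.3   71.70 0.72   71 44.00 90.00 77.63 94.49  495  18
Soudan_Sud  1471   43.0   57.04 0.38   43  5.60 34.52 16.06 41.19  900  29
Tanzanie    1040   43.1   56.88 0.53   54 28.92 80.20 30.62 60.79 1071  23
Zambie      4531   42.4   57.55 0.57   58 18.15 87.50 36.30 68.25 1020  23
Zimbabwe   14148   40.3   59.72 0.55   63 42.46 89.85 34.62 62.29  657  22
")

# Eigenvalues of the correlation matrix, returned in decreasing order
ev  <- eigen(cor(cholera), symmetric = TRUE)$values
pct <- 100 * ev / sum(ev)   # share of total inertia per axis

inertia <- data.frame(axis       = seq_along(ev),
                      eigenvalue = ev,
                      percent    = pct,
                      cumulative = cumsum(pct))
print(round(inertia, 2))
```

The eigenvalues sum to the number of variables (11), so an axis with an eigenvalue above 1 carries more information than a single original variable.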

Projections of variables on the axes:

library(ggplot2)
library(factoextra)
fviz_pca_var(pca_1, col.var = "cos2" , gradient.cols = c("blue" , "green" , "red"), repel = TRUE )

The correlation circle, colored by squared cosine (cos2), lets us visualize how well each variable is represented on the main axes. The variables are better represented on axis 1 than on axis 2, which is consistent with axis 1 carrying 56.82% of the variance against only 13.65% for axis 2.

Positions of individuals on the axes:

fviz_pca_ind(pca_1, col.ind = "contrib", gradient.cols = c("blue" , "green" , "red"), repel = TRUE)

fviz_pca_biplot(pca_1, repel = TRUE,col.var = "blue",col.ind = "red")

The graph shows that axis 1 opposes individuals such as South Africa to individuals such as South Sudan and Ethiopia. South Africa has high values for the variables population aged 15 to 64, population density and rate of access to drinking water, and low values for the variables population aged 0 to 15 and average annual temperature. Conversely, South Sudan and Ethiopia share low values for the variables literacy rate and sanitation access rate.

Description of variables and individuals in relation to the axes

  Description in relation to axis 1:
  
fviz_contrib(pca_1, choice = "ind", axes = 1)

fviz_contrib(pca_1, choice = "var", axes = 1)

Axis 1 accounts for 56.82% of the variance, making it the axis that carries the most information. The individuals that contribute more than 10% to its construction are South Africa, the Democratic Republic of Congo and South Sudan. The variables that contribute the most are population density, the shares of people aged 0 to 15 and 15 to 64, the rates of access to drinking water and sanitation, and the rate of implementation of Integrated Water Resources Management (IWRM, variable GIRE).

    Description in relation to axis 2:
fviz_contrib(pca_1, choice = "ind", axes = 2)

fviz_contrib(pca_1, choice = "var", axes = 2)

Axis 2 accounts for 13.65% of the variance. The individuals contributing most to its construction are South Sudan, Cameroon, Ethiopia and the Democratic Republic of Congo, while the variables contributing most are the average annual rainfall and the literacy rate.
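The contributions plotted by fviz_contrib can be recomputed by hand, which makes clear what "contributing to an axis" means. The sketch below uses base R's prcomp instead of FactoMineR and assumes the printed table is the complete content of data.csv; the names cholera, contrib_var, contrib_ind, thr_var and thr_ind are illustrative. A variable's contribution to an axis is 100 times its squared loading, and an individual's contribution is its squared coordinate as a percentage of the axis total. The thresholds are the contributions expected if all variables (or all individuals) contributed equally.

```r
# Data re-entered from the printed table (assumption: identical to data.csv)
cholera <- read.table(header = TRUE, row.names = 1, text = "
country      N_C P_0_15 P_15_64  den GIRE  hand   edu   san   wat rain tem
Burundi     1394   45.2   54.85 0.42   47  6.30 75.54 45.69 62.44 1274  21
Cameroun    6470   42.0   58.05 0.59   40 36.71 78.23 43.12 69.59 1604  25
RDC        52654   46.5   53.46 0.48   32 19.35 80.54 16.17 35.12 1543  25
Ethiopya   30389   39.3   60.70 0.49   41  8.32 51.77  9.34 51.51  848  24
Kenya       8809   37.2   62.78 0.60   59 37.60 82.88 36.53 62.86  630  26
Malawi     32530   43.3   57.95 0.51   55 15.31 68.80 49.24 71.87 1181  23
Mozambique 39101   42.0   56.71 0.46   62 15.00 59.78 37.38 63.20 1032  24
Nigeria     3457   42.8   57.24 0.55   44 31.08 62.02 46.57 79.64 1150  28
Sud_Africa  1478   28.3   71.70 0.72   71 44.00 90.00 77.63 94.49  495  18
Soudan_Sud  1471   43.0   57.04 0.38   43  5.60 34.52 16.06 41.19  900  29
Tanzanie    1040   43.1   56.88 0.53   54 28.92 80.20 30.62 60.79 1071  23
Zambie      4531   42.4   57.55 0.57   58 18.15 87.50 36.30 68.25 1020  23
Zimbabwe   14148   40.3   59.72 0.55   63 42.46 89.85 34.62 62.29  657  22
")

pca <- prcomp(cholera, scale. = TRUE)   # standardized PCA

# Contribution (%) of each variable to each axis: squared loadings
contrib_var <- 100 * pca$rotation^2

# Contribution (%) of each individual: squared score over the axis total
contrib_ind <- 100 * sweep(pca$x^2, 2, colSums(pca$x^2), "/")

# Contribution expected under equal shares
thr_var <- 100 / ncol(cholera)
thr_ind <- 100 / nrow(cholera)

print(round(sort(contrib_var[, "PC1"], decreasing = TRUE), 2))
print(names(which(contrib_ind[, "PC1"] > thr_ind)))
print(names(which(contrib_var[, "PC2"] > thr_var)))
```

Because contributions are built from squared quantities, they do not depend on the arbitrary sign of each axis.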

Correlation between variables

fviz_pca_var(pca_1)

Looking at the graph, we notice a positive correlation between the number of cholera cases and the variables rainfall, population aged 0 to 15 and average annual temperature. This suggests that these variables are associated with higher cholera recurrence, whereas the other parameters are associated with lower recurrence.

Classification of individuals

res.PCA<-PCA(data,graph=FALSE)
res.HCPC<-HCPC(res.PCA,nb.clust=3,consol=FALSE,graph=FALSE)
plot.HCPC(res.HCPC,choice='tree',title='Hierarchical tree')

plot.HCPC(res.HCPC,choice='map',draw.tree=FALSE,title='Factor map')

plot.HCPC(res.HCPC,choice='3D.map',ind.names=FALSE,centers.plot=FALSE,angle=60,title='Hierarchical tree on the factor map')

The classification of our individuals reveals three groups, each characterized by similar values for certain variables.

Group 1 consists of 6 countries: the Democratic Republic of Congo, Burundi, Malawi, Mozambique, Ethiopia and South Sudan. These individuals are characterized by high numbers of cholera cases and low values for the variables handwashing and hygiene rate, population density, literacy rate and rate of access to drinking water.

Group 2 consists of Cameroon, Tanzania, Zambia, Nigeria, Kenya and Zimbabwe. It is characterized by high values for the variable handwashing and hygiene rate.

Group 3 is composed of South Africa alone, which has high values for the variables population aged 15-64, population density and sanitation access rate.
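The principle behind this classification can be sketched in base R: Ward hierarchical clustering on the coordinates of the individuals on the first principal components, followed by a cut into three groups. This is an approximation of the HCPC call above, not a reimplementation. It assumes the printed table is the complete content of data.csv and keeps 5 components, the default number retained by PCA(). The names cholera, scores, tree and groups are illustrative, and the resulting partition may differ from the one reported here.

```r
# Data re-entered from the printed table (assumption: identical to data.csv)
cholera <- read.table(header = TRUE, row.names = 1, text = "
country      N_C P_0_15 P_15_64  den GIRE  hand   edu   san   wat rain tem
Burundi     1394   45.2   54.85 0.42   47  6.30 75.54 45.69 62.44 1274  21
Cameroun    6470   42.0   58.05 0.59   40 36.71 78.23 43.12 69.59 1604  25
RDC        52654   46.5   53.46 0.48   32 19.35 80.54 16.17 35.12 1543  25
Ethiopya   30389   39.3   60.70 0.49   41  8.32 51.77  9.34 51.51  848  24
Kenya       8809   37.2   62.78 0.60   59 37.60 82.88 36.53 62.86  630  26
Malawi     32530   43.3   57.95 0.51   55 15.31 68.80 49.24 71.87 1181  23
Mozambique 39101   42.0   56.71 0.46   62 15.00 59.78 37.38 63.20 1032  24
Nigeria     3457   42.8   57.24 0.55   44 31.08 62.02 46.57 79.64 1150  28
Sud_Africa  1478   28.3   71.70 0.72   71 44.00 90.00 77.63 94.49  495  18
Soudan_Sud  1471   43.0   57.04 0.38   43  5.60 34.52 16.06 41.19  900  29
Tanzanie    1040   43.1   56.88 0.53   54 28.92 80.20 30.62 60.79 1071  23
Zambie      4531   42.4   57.55 0.57   58 18.15 87.50 36.30 68.25 1020  23
Zimbabwe   14148   40.3   59.72 0.55   63 42.46 89.85 34.62 62.29  657  22
")

# Coordinates of the countries on the first 5 principal components
scores <- prcomp(cholera, scale. = TRUE)$x[, 1:5]

# Ward clustering on Euclidean distances between countries
tree <- hclust(dist(scores), method = "ward.D2")

# Cut the tree into 3 groups and list the members of each group
groups <- cutree(tree, k = 3)
print(split(names(groups), groups))
```

Calling plot(tree) draws the corresponding dendrogram, which helps judge whether three groups is a natural cut.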

Linear regression

library(car)
## Warning: package 'car' was built under R version 4.4.2
## Loading required package: carData
## Warning: package 'carData' was built under R version 4.4.2
## 
## Attaching package: 'car'
## The following object is masked from 'package:psych':
## 
##     logit
library(carData)
library(corrplot)
library("clusterSim")
## Warning: package 'clusterSim' was built under R version 4.4.2
## Loading required package: cluster
## Loading required package: MASS
library(DataExplorer)
## Warning: package 'DataExplorer' was built under R version 4.4.2
library(factoextra)
library(FactoInvestigate)
print(data)
##              N_C P_0_15 P_15_64  den GIRE  hand   edu   san   wat rain tem
## Burundi     1394   45.2   54.85 0.42   47  6.30 75.54 45.69 62.44 1274  21
## Cameroun    6470   42.0   58.05 0.59   40 36.71 78.23 43.12 69.59 1604  25
## RDC        52654   46.5   53.46 0.48   32 19.35 80.54 16.17 35.12 1543  25
## Ethiopya   30389   39.3   60.70 0.49   41  8.32 51.77  9.34 51.51  848  24
## Kenya       8809   37.2   62.78 0.60   59 37.60 82.88 36.53 62.86  630  26
## Malawi     32530   43.3   57.95 0.51   55 15.31 68.80 49.24 71.87 1181  23
## Mozambique 39101   42.0   56.71 0.46   62 15.00 59.78 37.38 63.20 1032  24
## Nigeria     3457   42.8   57.24 0.55   44 31.08 62.02 46.57 79.64 1150  28
## Sud_Africa  1478   28.3   71.70 0.72   71 44.00 90.00 77.63 94.49  495  18
## Soudan_Sud  1471   43.0   57.04 0.38   43  5.60 34.52 16.06 41.19  900  29
## Tanzanie    1040   43.1   56.88 0.53   54 28.92 80.20 30.62 60.79 1071  23
## Zambie      4531   42.4   57.55 0.57   58 18.15 87.50 36.30 68.25 1020  23
## Zimbabwe   14148   40.3   59.72 0.55   63 42.46 89.85 34.62 62.29  657  22
attach(data)
summary(data)
##       N_C            P_0_15         P_15_64           den        
##  Min.   : 1040   Min.   :28.30   Min.   :53.46   Min.   :0.3800  
##  1st Qu.: 1478   1st Qu.:40.30   1st Qu.:56.88   1st Qu.:0.4800  
##  Median : 6470   Median :42.40   Median :57.55   Median :0.5300  
##  Mean   :15190   Mean   :41.18   Mean   :58.82   Mean   :0.5269  
##  3rd Qu.:30389   3rd Qu.:43.10   3rd Qu.:59.72   3rd Qu.:0.5700  
##  Max.   :52654   Max.   :46.50   Max.   :71.70   Max.   :0.7200  
##       GIRE            hand            edu             san       
##  Min.   :32.00   Min.   : 5.60   Min.   :34.52   Min.   : 9.34  
##  1st Qu.:43.00   1st Qu.:15.00   1st Qu.:62.02   1st Qu.:30.62  
##  Median :54.00   Median :19.35   Median :78.23   Median :36.53  
##  Mean   :51.46   Mean   :23.75   Mean   :72.43   Mean   :36.87  
##  3rd Qu.:59.00   3rd Qu.:36.71   3rd Qu.:82.88   3rd Qu.:45.69  
##  Max.   :71.00   Max.   :44.00   Max.   :90.00   Max.   :77.63  
##       wat             rain           tem       
##  Min.   :35.12   Min.   : 495   Min.   :18.00  
##  1st Qu.:60.79   1st Qu.: 848   1st Qu.:23.00  
##  Median :62.86   Median :1032   Median :24.00  
##  Mean   :63.33   Mean   :1031   Mean   :23.92  
##  3rd Qu.:69.59   3rd Qu.:1181   3rd Qu.:25.00  
##  Max.   :94.49   Max.   :1604   Max.   :29.00
model1 = lm(formula = N_C ~ edu + san + hand + den + wat + GIRE + P_15_64 + tem + P_0_15 + rain,data= data)
summary(model1)
## 
## Call:
## lm(formula = N_C ~ edu + san + hand + den + wat + GIRE + P_15_64 + 
##     tem + P_0_15 + rain, data = data)
## 
## Residuals:
##    Burundi   Cameroun        RDC   Ethiopya      Kenya     Malawi Mozambique 
##      -3721      -8076      11406       9712       9088      14859      13995 
##    Nigeria Sud_Africa Soudan_Sud   Tanzanie     Zambie   Zimbabwe 
##       6274      -4732     -19056     -22383     -13134       5769 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  582765.24 2425475.09   0.240    0.833
## edu           -2034.27    3444.31  -0.591    0.615
## san            1014.72    3090.52   0.328    0.774
## hand           -121.58    1561.25  -0.078    0.945
## den          591585.77 1030677.55   0.574    0.624
## wat           -2549.07    3737.58  -0.682    0.566
## GIRE            526.94    2589.25   0.204    0.858
## P_15_64       -7909.58   27355.04  -0.289    0.800
## tem           -6731.46   11898.31  -0.566    0.629
## P_0_15          202.78   19354.45   0.010    0.993
## rain            -13.82     146.30  -0.094    0.933
## 
## Residual standard error: 31100 on 2 degrees of freedom
## Multiple R-squared:  0.469,  Adjusted R-squared:  -2.186 
## F-statistic: 0.1766 on 10 and 2 DF,  p-value: 0.9773
vif(model1)
##        edu        san       hand        den        wat       GIRE    P_15_64 
##  40.028512  36.435541   5.723006  99.811729  40.549970  10.475205 191.293649 
##        tem     P_0_15       rain 
##  14.479389  95.839302  29.355792
library(ggplot2)
donnees = data.frame(data)
data_vis= data.frame(valeurs_reelles=donnees$N_C,predictions=predict(model1))
ggplot(data_vis,aes(x=valeurs_reelles,y=predictions))+
  geom_point()+
  geom_smooth(method="lm",se=FALSE,color="blue")+
  labs(x="actual values",y="predictions")+
  ggtitle("model1: actual vs. predicted values")
## `geom_smooth()` using formula = 'y ~ x'

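Linear regression gives an R2 of 0.47 and an adjusted R2 of -2.186, with an overall F-test p-value of 0.9773; none of the coefficients is significant. With 10 predictors for 13 countries, only 2 residual degrees of freedom remain, and the variance inflation factors (up to 191 for P_15_64) indicate strong multicollinearity. The linear model is therefore not suitable for analyzing cholera recurrence, and another model should be chosen.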
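One possible alternative, given purely as an illustration and not as the model retained by this study, is a principal component regression: the 10 correlated predictors are replaced by their first two principal components, which are uncorrelated by construction, and the case counts are log-transformed to reduce the weight of the largest outbreaks. The sketch assumes the printed table is the complete content of data.csv; the names cholera, predictors, pcs, pcr_data and pcr_fit are hypothetical.

```r
# Data re-entered from the printed table (assumption: identical to data.csv)
cholera <- read.table(header = TRUE, row.names = 1, text = "
country      N_C P_0_15 P_15_64  den GIRE  hand   edu   san   wat rain tem
Burundi     1394   45.2   54.85 0.42   47  6.30 75.54 45.69 62.44 1274  21
Cameroun    6470   42.0   58.05 0.59   40 36.71 78.23 43.12 69.59 1604  25
RDC        52654   46.5   53.46 0.48   32 19.35 80.54 16.17 35.12 1543  25
Ethiopya   30389   39.3   60.70 0.49   41  8.32 51.77  9.34 51.51  848  24
Kenya       8809   37.2   62.78 0.60   59 37.60 82.88 36.53 62.86  630  26
Malawi     32530   43.3   57.95 0.51   55 15.31 68.80 49.24 71.87 1181  23
Mozambique 39101   42.0   56.71 0.46   62 15.00 59.78 37.38 63.20 1032  24
Nigeria     3457   42.8   57.24 0.55   44 31.08 62.02 46.57 79.64 1150  28
Sud_Africa  1478   28.3   71.70 0.72   71 44.00 90.00 77.63 94.49  495  18
Soudan_Sud  1471   43.0   57.04 0.38   43  5.60 34.52 16.06 41.19  900  29
Tanzanie    1040   43.1   56.88 0.53   54 28.92 80.20 30.62 60.79 1071  23
Zambie      4531   42.4   57.55 0.57   58 18.15 87.50 36.30 68.25 1020  23
Zimbabwe   14148   40.3   59.72 0.55   63 42.46 89.85 34.62 62.29  657  22
")

# PCA on the predictors only (the response N_C is excluded)
predictors <- cholera[, setdiff(names(cholera), "N_C")]
pcs <- prcomp(predictors, scale. = TRUE)$x[, 1:2]

# Regress log case counts on the first two components
pcr_data <- data.frame(log_cases = log(cholera$N_C), pcs)
pcr_fit  <- lm(log_cases ~ PC1 + PC2, data = pcr_data)
print(summary(pcr_fit))
```

This leaves 10 residual degrees of freedom instead of 2 and removes the collinearity between predictors, although 13 observations remain a very small sample for any regression.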

Conclusion

The principal component analysis (PCA) that we carried out on the factors contributing to the recurrence of cholera in Africa revealed several trends and correlations. The main variables associated with the recurrence of cholera in African countries include climatic parameters such as high rainfall and high temperatures. A large share of young people in the population is also associated with the recurrence of the disease. In addition, the recurrence of cholera is favored in countries with low values for health parameters (rates of access to water, sanitation, and handwashing and hygiene), environmental parameters (Integrated Water Resources Management) and socio-economic parameters (literacy rate). These results show the importance of a multisectoral approach that integrates interventions in public health, infrastructure improvement and water resource management in order to combat this disease that is raging in our countries. This involves implementing community awareness programs, improving access to safe drinking water sources, and strengthening sanitation systems. Finally, given the increasing influence of climate change on the recurrence of cholera, it is crucial to integrate environmental considerations into the strategies adopted, in order to minimize its impact on the transmission of the disease.