THE REMARKS THAT WERE MADE DURING THE PRESENTATION

ADD LINEAR REGRESSION

SPECIFY THE SOURCE AND DATE OF THE DATA OF THE VARIABLES IN THE MAPS

WRITE THE NAMES OF THE VARIABLES IN THE MAPS IN FULL: NO ABBREVIATIONS

ADD THE PCA WITH NIGERIA, WHICH IS AN ATYPICAL INDIVIDUAL

THE QUESTIONNAIRE WHICH IS PROVIDED BY A LINK

EXPLANATION OF GROUPS

SUMMARY

The study aims to analyze the impact of access to sanitation on public health, based on key indicators related to hygiene, sanitation, and health in different countries, particularly those in West Africa. This study explores the social, economic, and environmental factors influencing access to adequate sanitation services and the resulting health consequences.

A principal component analysis (PCA) was performed on several essential variables related to sanitation and public health.

PCA has made it possible to reduce the complexity of these variables while identifying the relationships between them and highlighting the most significant factors that influence sanitation and public health.

Based on the results of the PCA and the identified target populations (including rural households, vulnerable communities, children under 5 years old and pregnant women), questionnaires were formulated. These questionnaires seek to collect specific data regarding access to sanitation and public health.

A literature review was conducted. It helped us orient ourselves, approach our theme, and properly situate this broad study. The study also incorporates a Geographic Information System (GIS) that allows visualization of the collected data by mapping areas with limited or no access to sanitation. Maps were created to better understand the geographic distribution of populations exposed to sanitation-related health risks. This helps identify priority areas for health interventions and strengthen public policy planning.

INTRODUCTION

According to the UN (2013), sub-Saharan Africa lags far behind other regions of the world in achieving the Millennium Development Goals (MDGs), particularly with regard to sanitation. In 2011, 90% of the population of North Africa had access to improved sanitation, while only 30% of the population of sub-Saharan Africa did so. In addition, 26% of the population still practice open defecation. [1] However, a poor sanitation system leads to the proliferation of waterborne diseases, both direct and indirect, and can cause epidemics. Indeed, according to the United Nations (2007), of the 1.7 million people who die each year worldwide from diarrhea, 45% live in sub-Saharan Africa. [2]

In West Africa, hygiene, sanitation and health education face many challenges, mainly related to the lack of adequate sanitation infrastructure. There is a considerable gap in sanitation infrastructure, both in urban and rural areas, leading to numerous risks to public health and the environment. [3]

Despite the progress made by other regions, West Africa faces a considerable delay, particularly with regard to access to improved sanitation systems. The question that arises is: “To what extent does insufficient access to sanitation in West Africa contribute to the spread of diseases and hinder socio-economic development?” Based on this question, the following questions arise: How does the lack of access to sanitation affect public health in the different municipalities of West Africa? What is the link between morbidity rates related to waterborne diseases and the level of access to sanitation in West Africa? [4]

The overall objective of this theme is to study the impact of access to sanitation on public health in West Africa. Indeed, a precarious or non-existent sanitation system constitutes a fertile ground for the proliferation of waterborne diseases, often serious and potentially fatal. These diseases, such as diarrhea, cholera, dysentery, and hepatitis, particularly affect vulnerable populations and slow down the region’s economic development by increasing health care expenditure and reducing worker productivity [5] . In addition, the deterioration of sanitary conditions particularly affects women and girls, limiting their access to education and exacerbating gender inequalities. Access to quality sanitation infrastructure is therefore essential not only to improve public health, but also to promote sustainable social and economic development in West Africa. [6]

This issue allows us to understand how the lack of adequate sanitation directly affects the spread of diseases and to analyze its effects on the living conditions of populations. Access to quality sanitation is fundamental to preventing water-related diseases and improving public health in sub-Saharan Africa. It is also a question of examining the consequences for the educational, economic and social sectors, in order to propose solutions aimed at strengthening sanitation systems and improving the quality of life of the inhabitants of the region.

I. Materials and methods

1. Materials used for the project

For our project, we used a single site for collecting our data: OurWorldInData.com. This platform offers information on several themes and presents empirical research on the evolution of living conditions in the world. We considered the data from the year 2020. As target countries, we used those of West Africa, namely: Burkina Faso, Mali, Niger, Ivory Coast, Benin, Cape Verde, Gambia, Guinea, Guinea Bissau, Ghana, Liberia, Mauritania, Nigeria, Senegal, Sierra Leone and Togo.

For our bibliographic references, we used the site Scholar.google.com, a search engine that primarily indexes the content of academic periodicals and e-books from several commercial publishers, learned societies, and universities. In other words, it is a specialized search engine dedicated to scientific literature and a tool for finding articles. We also used software to facilitate data collection, processing, and analysis, as well as the mapping of our study area. These include:

 ZOTERO Software

It is a bibliographic management tool that allows us to organize, store, and cite the sources used in our research, while also building our own reference database. It allowed us to create and maintain this essential base for our research, integrate relevant references into our report, and generate our bibliography.

RStudio software

It is a software environment used for data analysis, visualization, and statistical modeling. It was used to conduct our principal component analysis with packages such as FactoMineR, factoextra, carData, cluster, corrplot, and FactoInvestigate.

QGIS software

It is a free and versatile Geographic Information System (GIS) software that allows the creation, analysis, visualization, and management of geographic and spatial data. It also supports a large number of vector data formats. This software allowed us to map our study area and develop other maps related to our work.

KOBOTOOLBOX

This is a mobile system that allows data collection. This software allowed us to generate the questionnaire to send to our various targets for data collection related to cholera. Link: https://kf.kobotoolbox.org/#/forms/amd3vLhf7ZHHjrheixdGmQ

2. Methodology

To better elucidate our work, we have adopted several approaches:

Variable Research and Data Collection

To search for our variables, we used the website OurWorldInData.com, where we carried out searches related to the chosen theme using keywords such as: Insufficient access to sanitation, Spread of diseases, Socio-economic development, Environment and sustainable development, Political and institutional challenges.

Once the variables were found, we extracted their data, favoring reasonably recent information: we opted for the year 2020 as the reference year. The choice of the countries studied was guided by the rate of access to sanitation and its impact on human health. Thus, we included countries such as Benin, Burkina Faso, Cape Verde, Ivory Coast, Mali, Niger, Nigeria, Sierra Leone, Senegal, Mauritania, Togo, Gambia, Ghana, Guinea, Guinea Bissau and Liberia. We obtained ten (10) variables for our study area, among which we have:

Share of the population practicing open defecation : This variable refers to the proportion of individuals living in rural and urban areas who resort to the practice of defecating outside any sanitary facility, i.e., in nature. According to the UN, 26% of populations practice open defecation [7] . This generally occurs in regions with insufficient or non-existent sanitation infrastructure. This practice may be due to a lack of sanitation infrastructure, awareness, or local traditions. Although open defecation is more common in rural areas, it can also occur in cities, often due to factors such as poverty, rapid urbanization, lack of adequate sanitation services, or lack of access to public toilets. [8] This situation can expose the population to serious health risks, including the spread of infectious diseases, water contamination, and environmental degradation.
GDP ( Gross Domestic Product ) per capita : A high GDP per capita is often associated with a better level of economic development, allowing governments and businesses to invest more in essential infrastructure such as sanitation (construction of toilets, wastewater treatment systems, etc.). In countries with low GDP per capita, financial resources to develop sanitation infrastructure are limited. This can lead to a lack of access to adequate toilets, wastewater management systems, and appropriate health services. [8] In these settings, populations may be forced to practice open defecation, which exposes society to significant health risks, such as the spread of infectious diseases like cholera, typhoid, diarrhea, etc. Thus, GDP per capita directly influences access to sanitation services and, consequently, the state of public health.
Share of population using limited sanitation services : This variable shows us the proportion of people who have access to improved sanitation facilities, such as toilets or latrines, but who share these facilities with other households or communities, which can reduce their effectiveness and pose hygiene and safety challenges. Sharing sanitation facilities can also lead to health risks, as common use by several people increases the chances of contamination and the spread of disease, especially if the facilities are not properly maintained. [10] Furthermore, it can limit the privacy and dignity of individuals, especially in environments where access to sanitation remains insufficient.
Estimated maternal mortality ratio : In the area of access to sanitation, a low maternal mortality ratio is often linked to good sanitation and adequate access to health infrastructure. Quality sanitation facilities (clean toilets, wastewater management, safe drinking water supply) directly contribute to the overall health of pregnant women, as the lack of adequate sanitation can lead to infections, complications from urinary or genital infections, or unsanitary conditions that increase the risk of maternal death.
Under-five mortality rate: this variable shows the proportion of children under five who die in a given population, usually measured over one year.
The total population not using improved water sources: it shows us the proportion of people who do not have access to drinking water from sources deemed reliable and safe for consumption. They are therefore exposed to certain waterborne diseases such as diarrhea, dysentery, typhoid. [10]
Total mortality due to diarrheal diseases in children under five: it gives the total number of deaths caused by diarrheal diseases, for both sexes combined (male and female). [11]
Human Development Index: helps understand how factors such as sanitation influence the overall development of a country and more specifically the health of its population. [13]
Share of population with access to a handwashing facility with soap and water at home: as its name indicates, this variable shows the share of people in a population who are able to wash their hands with soap and water at home.

Bibliographic research

To better orient ourselves and properly approach our theme, we carried out a review to understand the impact of sanitation on the health of populations.

Data processing

The collected data was processed using Excel, where we compiled the different individuals and variables chosen for our study. R software was used for our principal component analysis (PCA).

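# Set the working directory to the project folder (path specific to the authors' machine)
setwd("C:/Users/HP/Desktop/Projet_RTI_S7_GEAAH_Groupe3")
# Read the semicolon-separated file; the first column holds the country codes
donnees <- read.csv(file = "base_de_donnees.csv", header = TRUE, sep = ";", quote = "\"",
                    dec = ".", comment.char = "", row.names = 1)
donnees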

##          PIB     PTUSEA     PPDLM   PPUISNA      PPPDL       TMM       TME
## BEN 3317.860 3173668.00 12.033162 10.787372 50.1748100 522.60480  8.576023
## BFA 2380.932 4735030.50  9.008969  8.411009 36.9222870 263.75540  8.426225
## CPV 7207.869   16708.98  9.008569  2.054531 10.4456960  42.24144  1.337823
## CIV 5787.815 5109563.50 21.621376 15.678690 22.4624940 479.90674  7.375122
## GMB 2702.319  234129.81 12.922296 38.221790  0.3162697 458.24033  4.892720
## GHA 6412.685 2312691.20 41.542220 10.682980 17.5134000 263.10858  4.534027
## GIN 3630.993 2395168.80 20.476960 34.122230  8.9663150 553.40160 10.124377
## GNB 2360.531  490857.30 18.114985 46.923510  9.6910320 725.09240  7.704989
## LBR 1497.377  822987.94  2.907872 15.706154 37.0568800 652.33875  7.774502
## MLI 2348.882 3089584.80 17.025766 30.363256  5.6806574 440.22476  9.971262
## MRT 5963.235  686686.10 44.510517  9.071481 29.2484360 463.83212  4.184231
## SEN 4018.041 2142769.80 22.011423 16.104609  9.1116580 260.88060  3.997455
## SLE 2751.967 2429933.80 16.910929 27.874298 17.2321100 442.83040 10.824886
## TGO 2490.053 2119567.20 17.146687 12.476508 41.4806330 399.03983  6.440245
##          TMD   IDH     PPUSA
## BEN  6611.07 0.501 19.767100
## BFA 10247.79 0.446 31.346024
## CPV    43.09 0.649  6.581627
## CIV  8290.07 0.530 26.237747
## GMB   566.80 0.492 14.097363
## GHA  6476.92 0.601 45.624420
## GIN  8284.69 0.471 28.106920
## GNB   804.62 0.482 17.037231
## LBR  1514.61 0.483 25.896390
## MLI 12546.41 0.407 17.373495
## MRT   898.19 0.539 10.012347
## SEN  3407.89 0.514 16.685520
## SLE  4306.74 0.453 33.876880
## TGO  3461.86 0.540 27.103490

II. Presentation of data and mapping of the study area

For our data collection, we selected sixteen (16) countries described by ten (10) variables. The table below presents the data for our study area, along with its map.

III. Results and interpretations

Thematic maps

The maps below show the distribution of our different variables by country. They were designed using QGIS software.

Principal component analysis (PCA)

Analysis and interpretation of the results obtained at the end of our first study.

In our study, our dataset on the impact of sanitation on public health in West African countries includes 10 variables and 15 individuals.

Study of the correlation matrix

Table 1: Correlation matrix

             PIB  PTUSEA   PPDLM PPUISNA   PPPDL     TMM     TME     TMD     IDH   PPUSA
PIB        1.000  -0.065   0.571  -0.519  -0.166  -0.581  -0.664  -0.147   0.816  -0.114
PTUSEA    -0.065   1.000  -0.022  -0.212   0.328  -0.064   0.507   0.850  -0.347   0.506
PPDLM      0.571  -0.022   1.000  -0.111  -0.129  -0.084  -0.273  -0.001   0.313   0.175
PPUISNA   -0.519  -0.212  -0.111   1.000  -0.584   0.640   0.465  -0.017  -0.574  -0.055
PPPDL     -0.166   0.328  -0.129  -0.584   1.000   0.126   0.163   0.084   0.023   0.228
TMM       -0.581  -0.064  -0.084   0.640   0.126   1.000   0.596  -0.042  -0.562   0.038
TME       -0.664   0.507  -0.273   0.465   0.163   0.596   1.000   0.604  -0.835   0.420
TMD       -0.147   0.850  -0.001  -0.017   0.084  -0.042   0.604   1.000  -0.490   0.450
IDH        0.816  -0.347   0.313  -0.574   0.023  -0.562  -0.835  -0.490   1.000  -0.111
PPUSA     -0.114   0.506   0.175  -0.055   0.228   0.038   0.420   0.450  -0.111   1.000

The correlation matrix groups a set of values called correlation coefficients. These values show the relationship between the variables taken two by two. They vary between -1 and 1, and their absolute value indicates whether the relationship between the two variables is strong or weak. There are three types of correlation:

• Positive correlation: the two variables move in the same direction, i.e., an increase in one goes with an increase in the other and vice versa. The closer the value is to +1, the stronger the relationship. The most notable values are in red and greater than 0.5.

• Negative correlation: the two variables move in opposite directions, i.e., an increase in one goes with a decrease in the other and vice versa. The closer the value is to -1, the stronger the opposition.

• Zero correlation: the increase or decrease of one of the variables has no linear influence on the other.

In our correlation matrix we can describe the relationship between some variables. For example, the TME and HDI variables are negatively correlated. Thus, we can say that as the HDI increases, the TME decreases, and vice versa.

There is also a positive correlation between the PTUSEA and TMD variables: the more PTUSEA increases, the more TMD increases. In other words, the larger the population without improved water sources, the more deaths diarrheal diseases cause.
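As a quick check of the TME/IDH relationship described above, the sketch below recomputes one coefficient in base R. It assumes the TME and IDH values of the 14-country listing printed in the data processing section; the helper `describe_cor` and its 0.5 threshold argument are hypothetical names introduced only for illustration.

```r
# TME and IDH columns, copied from the 14-country listing printed earlier
tme <- c(8.576023, 8.426225, 1.337823, 7.375122, 4.892720, 4.534027,
         10.124377, 7.704989, 7.774502, 9.971262, 4.184231, 3.997455,
         10.824886, 6.440245)
idh <- c(0.501, 0.446, 0.649, 0.530, 0.492, 0.601, 0.471, 0.482,
         0.483, 0.407, 0.539, 0.514, 0.453, 0.540)

# Pearson correlation coefficient, always between -1 and 1
r <- cor(tme, idh, method = "pearson")
print(round(r, 3))

# Hypothetical helper: label a coefficient using the 0.5 threshold from the text
describe_cor <- function(r, threshold = 0.5) {
  if (r >= threshold) {
    "notable positive correlation"
  } else if (r <= -threshold) {
    "notable negative correlation"
  } else {
    "weak or no linear correlation"
  }
}
print(describe_cor(r))
```

The coefficient only measures a linear association between the two columns; it does not by itself show that one variable causes the other.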

Study of eigenvalues

       eigenvalue variance.percent cumulative.variance.percent
Dim.1       3.376           33.760                      33.760
Dim.2       2.793           27.934                      61.694
Dim.3       1.614           16.137                      77.830
Dim.4       1.105           11.052                      88.883
Dim.5       0.540            5.399                      94.282
Dim.6       0.340            3.400                      97.681
Dim.7       0.169            1.692                      99.374
Dim.8       0.043            0.426                      99.800
Dim.9       0.019            0.187                      99.987
Dim.10      0.001            0.013                     100.000

This table shows the eigenvalues with the percentage of variance carried by each axis, as well as the cumulative percentage of variance, which reaches 100% at the 10th axis. There are 10 eigenvalues because, in this PCA, the number of eigenvalues equals the number of variables. Each eigenvalue also corresponds to the variance of the cloud of individuals along the associated axis.

From these eigenvalues we can determine a priori the number of axes to retain using the Kaiser criterion, which says to retain the axes associated with eigenvalues greater than 1. We could therefore retain axes 1, 2, 3 and 4, which together account for 88.88% of the information. However, an analysis on 4 axes can be difficult, so we keep the first two axes, which account for 61.69% of the information, for easier analysis.
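The Kaiser rule described above can be sketched in a few lines of base R. The eigenvalues are copied from the table of this first study; the variable names `eig`, `pct`, `cum_pct` and `kept` are chosen here for illustration.

```r
# Eigenvalues copied from the eigenvalue table of the first study
eig <- c(3.376, 2.793, 1.614, 1.105, 0.540, 0.340, 0.169, 0.043, 0.019, 0.001)

# Percentage of variance carried by each axis, and its running total
pct <- 100 * eig / sum(eig)
cum_pct <- cumsum(pct)

# Kaiser criterion: keep the axes whose eigenvalue exceeds 1
kept <- which(eig > 1)
print(kept)                         # axes 1 to 4

# Cumulative variance after 2 and after 4 axes
print(round(cum_pct[c(2, 4)], 2))   # 61.69 and 88.88
```

Because the variables are standardized, each one contributes a variance of 1, so an eigenvalue above 1 means the axis carries more information than a single original variable.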

To support our choice of the first two axes, we can also rely on the elbow rule, i.e. the scree plot of the eigenvalues.

# Packages used for the analysis (the load warnings indicate they were built under R 4.4.3)
library(car)
library(carData)
library(cluster)
library(corrplot)
library(psych)
library(Hmisc)
library(ggplot2)
library(factoextra)
library(FactoMineR)
library(FactoInvestigate)
library(DataExplorer)

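# Set the working directory to the project folder (path specific to the authors' machine)
setwd("/Users/hp/Desktop/Projet_rti_groupe/projet")
# Read the semicolon-separated file; the first column holds the country codes
donnees <- read.csv(file = "base_de_donnees.csv", header = TRUE, sep = ";", quote = "\"",
                    dec = ".", comment.char = "", row.names = 1)
donnees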

##          PIB     PPDLM   PPUISNA      PPPDL        TMM       TME      PTUSEA
## BEN 3317.860 12.033162 10.787372 50.1748100  522.60480  8.576023  3173668.00
## BFA 2380.932  9.008969  8.411009 36.9222870  263.75540  8.426225  4735030.50
## CPV 7207.869  9.008569  2.054531 10.4456960   42.24144  1.337823    16708.98
## CIV 5787.815 21.621376 15.678690 22.4624940  479.90674  7.375122  5109563.50
## GMB 2702.319 12.922296 38.221790  0.3162697  458.24033  4.892720   234129.81
## GHA 6412.685 41.542220 10.682980 17.5134000  263.10858  4.534027  2312691.20
## GIN 3630.993 20.476960 34.122230  8.9663150  553.40160 10.124377  2395168.80
## GNB 2360.531 18.114985 46.923510  9.6910320  725.09240  7.704989   490857.30
## LBR 1497.377  2.907872 15.706154 37.0568800  652.33875  7.774502   822987.94
## MLI 2348.882 17.025766 30.363256  5.6806574  440.22476  9.971262  3089584.80
## MRT 5963.235 44.510517  9.071481 29.2484360  463.83212  4.184231   686686.10
## NGA 5410.693 30.839497 19.109652 19.0438580 1047.23540 11.394196 38726692.00
## SEN 4018.041 22.011423 16.104609  9.1116580  260.88060  3.997455  2142769.80
## SLE 2751.967 16.910929 27.874298 17.2321100  442.83040 10.824886  2429933.80
## TGO 2490.053 17.146687 12.476508 41.4806330  399.03983  6.440245  2119567.20
##           TMD   IDH     PPUSA
## BEN   6611.07 0.501 19.767100
## BFA  10247.79 0.446 31.346024
## CPV     43.09 0.649  6.581627
## CIV   8290.07 0.530 26.237747
## GMB    566.80 0.492 14.097363
## GHA   6476.92 0.601 45.624420
## GIN   8284.69 0.471 28.106920
## GNB    804.62 0.482 17.037231
## LBR   1514.61 0.483 25.896390
## MLI  12546.41 0.407 17.373495
## MRT    898.19 0.539 10.012347
## NGA 159784.88 0.539 17.495344
## SEN   3407.89 0.514 16.685520
## SLE   4306.74 0.453 33.876880
## TGO   3461.86 0.540 27.103490

res.pca1=PCA(donnees, scale.unit = TRUE, ncp = 10, ind.sup = NULL, 
            quanti.sup = NULL, quali.sup = NULL, row.w = NULL, 
            col.w = NULL, graph = FALSE, axes = c(1,2))
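As a rough illustration of what `scale.unit = TRUE` implies, the base R sketch below computes the eigenvalues of a correlation matrix directly. It is not the FactoMineR computation itself: to stay short it assumes only three of the ten columns (PIB, TME, IDH) of the 15-country listing above, so its eigenvalues differ from those of the full analysis.

```r
# Three columns copied from the 15-country listing above (a simplification:
# the full analysis uses all ten variables)
x <- data.frame(
  PIB = c(3317.860, 2380.932, 7207.869, 5787.815, 2702.319, 6412.685,
          3630.993, 2360.531, 1497.377, 2348.882, 5963.235, 5410.693,
          4018.041, 2751.967, 2490.053),
  TME = c(8.576023, 8.426225, 1.337823, 7.375122, 4.892720, 4.534027,
          10.124377, 7.704989, 7.774502, 9.971262, 4.184231, 11.394196,
          3.997455, 10.824886, 6.440245),
  IDH = c(0.501, 0.446, 0.649, 0.530, 0.492, 0.601, 0.471, 0.482,
          0.483, 0.407, 0.539, 0.539, 0.514, 0.453, 0.540)
)

# A standardized PCA amounts to an eigen-decomposition of the correlation matrix
ev <- eigen(cor(x))$values
print(round(ev, 3))

# Each standardized variable has variance 1, so the eigenvalues sum to ncol(x)
print(sum(ev))
```

Standardizing matters here because the columns are on very different scales (PIB in the thousands, IDH below 1); without it, the largest-scale variable would dominate the axes.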

fviz_eig(res.pca1, addlabels = TRUE, hjust = -0.4)  # scree plot (elbow diagram)

Figure 1: Inertia explained

The total inertia of the factorial axes indicates on the one hand whether the variables are structured and on the other hand suggests the number of principal components that is appropriate to study.

Looking at this figure, we see that the first two axes of the analysis express 61.69% of the total inertia of the data set, which means that 61.69% of the total variability of the cloud of individuals (or variables) is represented in this plane. This is a high percentage: the first plane captures a large part of the variability contained in the active data set. Since this value is higher than the reference value of 50%, the variability explained by this plane is significant.

Based on these observations, we can say that the first two axes carry real information. Consequently, we keep only these two axes for the description of the analysis. This confirms the choice of axes we made after applying the Kaiser criterion.

Presentation of individual graphs

Contributions of individuals to the formation of the factorial plane

fviz_contrib(res.pca1, choice = "ind", axes = c(1,2))

Figure 2: Contribution of individuals to the formation of the factorial plane

This graph shows the contribution of individuals (countries) to the first two dimensions of the PCA (Principal Component Analysis). The higher a country's contribution, the more it influences the construction of the factorial axes.

Reading the graph:

• NGA (Nigeria) is the country that contributes the most to the Dim1-Dim2 plane, with over 40%. It is a very influential individual in shaping the PCA axes.

• CPV (Cape Verde) comes next, with around 20% contribution.

• Other countries (like GHA, MRT, MLI, GNB…) have a much more moderate contribution.

• The dotted red line represents the expected average contribution. Countries above this line (NGA, CPV, GHA, etc.) play an important role in structuring the data.

• Countries with very low contributions (BEN, TGO, CV, etc.) have a low impact on the first two dimensions, so their position is less decisive in the analysis.
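The notions of contribution and expected average contribution used in this reading can be sketched in base R. This is only an illustration under stated assumptions: it uses `prcomp` instead of FactoMineR, keeps three of the ten columns (PIB, TME, IDH) and looks at the first axis only, so the percentages it prints are not those of Figure 2.

```r
countries <- c("BEN", "BFA", "CPV", "CIV", "GMB", "GHA", "GIN", "GNB",
               "LBR", "MLI", "MRT", "NGA", "SEN", "SLE", "TGO")
# Three columns copied from the 15-country listing (a simplification)
x <- data.frame(
  PIB = c(3317.860, 2380.932, 7207.869, 5787.815, 2702.319, 6412.685,
          3630.993, 2360.531, 1497.377, 2348.882, 5963.235, 5410.693,
          4018.041, 2751.967, 2490.053),
  TME = c(8.576023, 8.426225, 1.337823, 7.375122, 4.892720, 4.534027,
          10.124377, 7.704989, 7.774502, 9.971262, 4.184231, 11.394196,
          3.997455, 10.824886, 6.440245),
  IDH = c(0.501, 0.446, 0.649, 0.530, 0.492, 0.601, 0.471, 0.482,
          0.483, 0.407, 0.539, 0.539, 0.514, 0.453, 0.540),
  row.names = countries
)

# Standardized PCA; pca$x holds the coordinates of the countries on each axis
pca <- prcomp(x, center = TRUE, scale. = TRUE)
scores <- pca$x[, 1]

# Contribution of each country to axis 1: its share of the squared coordinates
contrib <- 100 * scores^2 / sum(scores^2)

# With equal weights, the expected average contribution is 100 / n
threshold <- 100 / length(contrib)

print(round(sort(contrib, decreasing = TRUE), 1))
print(names(contrib)[contrib > threshold])
```

Countries printed by the last line are those that would sit above the dotted reference line for this axis.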

Possible interpretation:

Nigeria is a very atypical country in the analysis. It strongly influences the structure of the factor space, which can be explained by extreme values on certain variables, for example, GDP or other socio-economic indicators.

Countries like Cape Verde and Ghana also contribute significantly, meaning they stand out in the multidimensional space, likely through specific levels of sanitation access or public health.

Conversely, countries like Benin or Togo have a low contribution: they are closer to the global average and do not strongly influence the construction of the axes.

Cloud of individuals in the factorial plane

fviz_pca_ind(res.pca1,repel = TRUE,col.ind = "red",label="ind")

Figure 3: Cloud of individuals on the factorial plane

The representation of individuals in the factorial plane was done according to the previously chosen axes.

The projection of individuals onto the factorial plane defined by Dim1 and Dim2 gives this graph, which gathers as much of the information from the initial data as can be visualized on a plane. The inertia explained by the factorial plane is 33.8% for Dim1 and 27.9% for Dim2, i.e. 61.7% of the information retained.

Individuals far from the origin are generally better represented on the plane, while those close to the origin are poorly represented. In our case, Nigeria and Cape Verde are better represented than the other individuals.
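The quality of representation mentioned above is usually measured by the squared cosine (cos2) of each individual on the plane. The base R sketch below shows the calculation under simplifying assumptions: `prcomp` replaces FactoMineR and only three columns (PIB, TME, IDH) of the 15-country listing are used, so the values are illustrative rather than those of Figure 3.

```r
countries <- c("BEN", "BFA", "CPV", "CIV", "GMB", "GHA", "GIN", "GNB",
               "LBR", "MLI", "MRT", "NGA", "SEN", "SLE", "TGO")
# Three columns copied from the 15-country listing (a simplification)
x <- data.frame(
  PIB = c(3317.860, 2380.932, 7207.869, 5787.815, 2702.319, 6412.685,
          3630.993, 2360.531, 1497.377, 2348.882, 5963.235, 5410.693,
          4018.041, 2751.967, 2490.053),
  TME = c(8.576023, 8.426225, 1.337823, 7.375122, 4.892720, 4.534027,
          10.124377, 7.704989, 7.774502, 9.971262, 4.184231, 11.394196,
          3.997455, 10.824886, 6.440245),
  IDH = c(0.501, 0.446, 0.649, 0.530, 0.492, 0.601, 0.471, 0.482,
          0.483, 0.407, 0.539, 0.539, 0.514, 0.453, 0.540),
  row.names = countries
)

scores <- prcomp(x, center = TRUE, scale. = TRUE)$x

# cos2 on the first plane: squared distance kept by axes 1-2
# divided by the squared distance to the origin in the full space
cos2_plane <- rowSums(scores[, 1:2]^2) / rowSums(scores^2)
print(round(sort(cos2_plane, decreasing = TRUE), 2))
```

A cos2 close to 1 means the country's position on the plane is faithful to its position in the full space; a value close to 0 means the plane hides most of what distinguishes it.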

• Nigeria (NGA) is very far from the center of the cloud (and other countries), especially on Dim2.

• This means that it has a very specific (even extreme) profile compared to other countries.

• This country could have very high or very low values on variables such as GDP, urban population with access to sanitation, or other health indicators.

• It is noted that Nigeria appears to be an atypical individual, which is probably explained by very distinct economic or health indicators.

Hierarchical Ascending Classification (HAC)

Classification is a method that aims to group individuals that share some or all of their characteristics. It can be supervised (knowing the number of classes desired, we seek the class to which each individual belongs) or unsupervised (the number of classes is not known in advance). In our case, we performed an unsupervised classification (a hierarchical ascending classification).

res.PCA <- PCA(donnees, graph = TRUE)

# Hierarchical clustering on the principal components, cut into 3 clusters
res.HCPC <- HCPC(res.PCA, nb.clust = 3, consol = FALSE, graph = FALSE)
plot.HCPC(res.HCPC, choice = 'tree', title = 'Hierarchical tree')

fviz_cluster(res.HCPC, repel = TRUE, show.clust.cent = TRUE)
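The idea behind the hierarchical ascending classification can also be sketched with base R alone. This is not the HCPC procedure itself: it assumes Ward linkage (`"ward.D2"`) applied directly to three standardized columns (PIB, TME, IDH) of the 15-country listing, so the groups it prints need not match those of the tree above.

```r
countries <- c("BEN", "BFA", "CPV", "CIV", "GMB", "GHA", "GIN", "GNB",
               "LBR", "MLI", "MRT", "NGA", "SEN", "SLE", "TGO")
# Three columns copied from the 15-country listing (a simplification)
x <- data.frame(
  PIB = c(3317.860, 2380.932, 7207.869, 5787.815, 2702.319, 6412.685,
          3630.993, 2360.531, 1497.377, 2348.882, 5963.235, 5410.693,
          4018.041, 2751.967, 2490.053),
  TME = c(8.576023, 8.426225, 1.337823, 7.375122, 4.892720, 4.534027,
          10.124377, 7.704989, 7.774502, 9.971262, 4.184231, 11.394196,
          3.997455, 10.824886, 6.440245),
  IDH = c(0.501, 0.446, 0.649, 0.530, 0.492, 0.601, 0.471, 0.482,
          0.483, 0.407, 0.539, 0.539, 0.514, 0.453, 0.540),
  row.names = countries
)

# Euclidean distances between countries on standardized variables
d <- dist(scale(x))

# Ascending (agglomerative) hierarchy: start from single countries, merge step by step
tree <- hclust(d, method = "ward.D2")

# Cut the tree into 3 groups, as with nb.clust = 3
groups <- cutree(tree, k = 3)
print(split(names(groups), groups))
```

Calling `plot(tree)` on the result draws the corresponding dendrogram.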

Principal component analysis (PCA) without Nigeria

Analysis and interpretation of the results obtained at the end of our second study.

In this second study, our dataset on the impact of sanitation on public health in West African countries includes 10 variables and 14 individuals, Nigeria being excluded.

Study of the correlation matrix

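# Set the working directory to the project folder (path specific to the authors' machine)
setwd("/Users/hp/Desktop/Projet_rti_groupe/projet")
# Read the data set without Nigeria; the first column holds the country codes
donneees <- read.csv(file = "base_de_donneees.csv", header = TRUE, sep = ";", quote = "\"",
                     dec = ".", comment.char = "", row.names = 1)
# Pairwise Pearson correlations between the 10 variables
mat_cor <- cor(donneees[, 1:10])
print(mat_cor)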

##                 PIB        PPDLM     PPUISNA       PPPDL         TMM        TME
## PIB      1.00000000  0.570856826 -0.51920313 -0.16582957 -0.58129391 -0.6639147
## PPDLM    0.57085683  1.000000000 -0.11074962 -0.12890418 -0.08383755 -0.2726739
## PPUISNA -0.51920313 -0.110749618  1.00000000 -0.58375737  0.64002523  0.4653139
## PPPDL   -0.16582957 -0.128904178 -0.58375737  1.00000000  0.12632193  0.1626644
## TMM     -0.58129391 -0.083837548  0.64002523  0.12632193  1.00000000  0.5959763
## TME     -0.66391466 -0.272673950  0.46531389  0.16266441  0.59597625  1.0000000
## PTUSEA  -0.06465608 -0.021699109 -0.21210016  0.32814061 -0.06417448  0.5072795
## TMD     -0.14694125 -0.001313632 -0.01746785  0.08443822 -0.04172368  0.6044514
## IDH      0.81616463  0.312972889 -0.57415607  0.02333737 -0.56229276 -0.8345751
## PPUSA   -0.11443333  0.175027836 -0.05457034  0.22807014  0.03821940  0.4198796
##              PTUSEA          TMD         IDH       PPUSA
## PIB     -0.06465608 -0.146941252  0.81616463 -0.11443333
## PPDLM   -0.02169911 -0.001313632  0.31297289  0.17502784
## PPUISNA -0.21210016 -0.017467846 -0.57415607 -0.05457034
## PPPDL    0.32814061  0.084438221  0.02333737  0.22807014
## TMM     -0.06417448 -0.041723675 -0.56229276  0.03821940
## TME      0.50727946  0.604451372 -0.83457514  0.41987955
## PTUSEA   1.00000000  0.849963173 -0.34733085  0.50555737
## TMD      0.84996317  1.000000000 -0.48974316  0.45021784
## IDH     -0.34733085 -0.489743164  1.00000000 -0.11109203
## PPUSA    0.50555737  0.450217844 -0.11109203  1.00000000

Correlation matrix

• Positive correlation: the two variables move in the same direction, i.e., an increase in one goes with an increase in the other and vice versa. The closer the value is to +1, the stronger the relationship. The most notable values are in red and greater than 0.5.

• Negative correlation: the two variables move in opposite directions, i.e., an increase in one goes with a decrease in the other and vice versa. The closer the value is to -1, the stronger the opposition.

• Zero correlation: the increase or decrease of one of the variables has no linear influence on the other.

In our correlation matrix we can describe the relationship between some variables. For example, the variables TME and GDP are negatively correlated. Thus, we can say that the more GDP increases, the more TME decreases and vice versa.

There is also a positive correlation between the PTUSEA and TMD variables: the more PTUSEA increases, the more TMD increases. In other words, the larger the population without improved water sources, the more deaths diarrheal diseases cause.

Study of eigenvalues

library(car)              # linear regression
library(carData)          # linear regression
library(cluster)
library(corrplot)
library(psych)
library(Hmisc)
library(ggplot2)          # for the plots
library(factoextra)       # plots
library(FactoMineR)
library(FactoInvestigate)
library(DataExplorer)

res.pca2=PCA(donneees, scale.unit = TRUE, ncp = 10, ind.sup = NULL, 
            quanti.sup = NULL, quali.sup = NULL, row.w = NULL, 
            col.w = NULL, graph = FALSE, axes = c(1,2))
print(res.pca2)

## **Results for the Principal Component Analysis (PCA)**
## The analysis was performed on 14 individuals, described by 10 variables
## *The results are available in the following objects:
## 
##    name               description                          
## 1  "$eig"             "eigenvalues"                        
## 2  "$var"             "results for the variables"          
## 3  "$var$coord"       "coord. for the variables"           
## 4  "$var$cor"         "correlations variables - dimensions"
## 5  "$var$cos2"        "cos2 for the variables"             
## 6  "$var$contrib"     "contributions of the variables"     
## 7  "$ind"             "results for the individuals"        
## 8  "$ind$coord"       "coord. for the individuals"         
## 9  "$ind$cos2"        "cos2 for the individuals"           
## 10 "$ind$contrib"     "contributions of the individuals"   
## 11 "$call"            "summary statistics"                 
## 12 "$call$centre"     "mean of the variables"              
## 13 "$call$ecart.type" "standard error of the variables"    
## 14 "$call$row.w"      "weights for the individuals"        
## 15 "$call$col.w"      "weights for the variables"

valeur_propre=get_eigenvalue(res.pca2)

round(valeur_propre,digits = 3)

##        eigenvalue variance.percent cumulative.variance.percent
## Dim.1       3.966           39.664                      39.664
## Dim.2       2.474           24.742                      64.406
## Dim.3       1.391           13.909                      78.315
## Dim.4       1.004           10.041                      88.356
## Dim.5       0.590            5.896                      94.252
## Dim.6       0.288            2.879                      97.131
## Dim.7       0.136            1.364                      98.495
## Dim.8       0.089            0.886                      99.380
## Dim.9       0.053            0.525                      99.906
## Dim.10      0.009            0.094                     100.000

This table presents the eigenvalue of each axis with its percentage of variance, as well as the cumulative percentage of variance, which reaches 100% at the 10th dimension. There are 10 eigenvalues because the PCA yields as many eigenvalues as there are variables. Each eigenvalue is the variance of the cloud of individuals along the corresponding axis.

From these eigenvalues we can determine a priori how many axes to retain using the Kaiser criterion, which keeps the axes whose eigenvalue is greater than 1. This leads us to retain axes 1, 2, 3 and 4 (axis 4 only just, with an eigenvalue of 1.004), which together account for 88.36% of the information. However, an analysis on four axes can be difficult, so we keep only the first two axes, which account for 64.4% of the information, for easier analysis.
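The Kaiser rule can be checked directly from the eigenvalues in the table above. The sketch below retypes those rounded eigenvalues by hand, so its percentages may differ from the table in the last decimal.

```r
# Eigenvalues copied (rounded) from the table above
eig <- c(3.966, 2.474, 1.391, 1.004, 0.590, 0.288, 0.136, 0.089, 0.053, 0.009)

pct     <- 100 * eig / sum(eig)   # share of total inertia carried by each axis
cum_pct <- cumsum(pct)            # cumulative share

# Kaiser criterion: keep the axes whose eigenvalue exceeds 1
kaiser_axes <- which(eig > 1)
print(kaiser_axes)
print(round(cum_pct[kaiser_axes], 1))
```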

To support our choice of the first two axes, we can also rely on the elbow rule applied to the scree plot of the eigenvalues.

fviz_eig(res.pca2, addlabels=TRUE, hjust = -0.4)  # scree plot (elbow diagram)

The total inertia carried by the factorial axes indicates, on the one hand, whether the variables are structured and, on the other hand, how many principal components it is appropriate to study.

Looking at this figure, we see that the first two axes express 64.4% of the total inertia of the data set, which means that 64.4% of the total variability of the cloud of individuals (or variables) is represented in this plane. This is a high percentage: it exceeds the reference value of 50%, so the first plane captures a large part of the variability contained in the active data set.

Based on these observations, we can say that the first two axes carry real information. Consequently, we retain only these two axes for the description of the analysis, which confirms the choice of axes made after applying the Kaiser criterion.

#Studies of variables

#Study of the correlation circle

This graph shows how the variables relate to the first two dimensions of the Principal Component Analysis (PCA):

• Dim-1 (39.7% of variance explained): Associated with socioeconomic development and health conditions.

• Dim-2 (24.7% of variance explained): Represents differences in access to sanitation and drinking water.

The variables are represented by arrows indicating their correlation with the axes:

• The longer the arrow, the better the variable is represented.

• Variables that are close to each other are correlated.

• Opposite variables are negatively correlated.
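The reading rules above follow from how the arrows are built: each arrow's coordinates are the correlations between the variable and the two axes, and its squared length is the quality of representation (cos2) in the plane. Here is a minimal sketch in base R, using `prcomp` on the built-in `USArrests` data as a stand-in for the study's data and for `PCA()` from FactoMineR.

```r
# Stand-in data: built-in USArrests (NOT the study's dataset), standardized
X   <- scale(USArrests)
pca <- prcomp(X)

# Variable coordinates = loadings scaled by the standard deviation of each axis;
# on standardized data these equal the correlations between variables and axes
coord <- sweep(pca$rotation, 2, pca$sdev, `*`)

# Quality of representation in the first plane = squared arrow length
cos2_plane   <- rowSums(coord[, 1:2]^2)
arrow_length <- sqrt(cos2_plane)

print(round(coord[, 1:2], 2))
print(round(cos2_plane, 2))
```

An arrow that almost reaches the unit circle (cos2 close to 1) is well represented in the plane; a short arrow should be interpreted with caution.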

#Analysis of Variable Groups

Development variables (GDP, HDI)

Position: strongly negative on Dim-1, weak on Dim-2.

Interpretation:

• GDP and HDI are opposed to the mortality and poor sanitation variables.

• A high HDI/GDP is associated with better access to sanitation and lower mortality.

Mortality variables

Position: positive on Dim-1, weak on Dim-2.

• They are positively correlated with each other, indicating that maternal, infant, and diarrheal disease mortality are linked.

• They are opposite to HDI and GDP, which shows that countries with high mortality are also those with low development.

Water and sanitation access variables

Position: positive on Dim-2 and Dim-1.

• PPUSA, PTUSEA and PPUISNA are correlated with each other and show a direct link between limited access to water and sanitation infrastructure and negative impacts on health.

• TMD (diarrheal disease mortality rate) is also close to these indicators, which supports the idea that the lack of basic infrastructure contributes to preventable diseases.

Conclusion and Interpretation

Dim-1 differentiates countries according to the level of socio-economic development.

Dim-2 represents access to sanitation infrastructure and its impact on public health.

Wealthier countries (high HDI, high GDP) have better health status, while those with low HDI have high mortality and poor access to infrastructure.

res.pca2=PCA(donneees, scale.unit = TRUE, ncp = 10, ind.sup = NULL, 
            quanti.sup = NULL, quali.sup = NULL, row.w = NULL, 
            col.w = NULL, graph = FALSE, axes = c(1,2))
fviz_contrib(res.pca2, choice = "var", axes = 1)

Variables contributing most to Dim-1:

TME (Child Mortality Rate) - Maximum Contribution

• This factor is the most influential in the structuring of Dim-1.

• It is strongly correlated with the other health indicators.

• Its likely opposition to HDI and GDP indicates that countries with high infant mortality are also those with low human and economic development.

HDI (Human Development Index) - Very influential

• Its strong contribution confirms that it is a key indicator of countries’ development.

• It is probably opposite to the mortality and poor sanitation variables, suggesting that countries with a better HDI have better health status.

GDP (Gross Domestic Product) - High Contribution

• Like the HDI, it is associated with the level of economic development.

• It plays a major role in differentiating countries according to their wealth and access to health infrastructure.

TMM (Maternal Mortality Rate) - Significant Contribution

• Closely linked to TME and TMD, it highlights the impact of sanitary conditions on maternal mortality.

TMD (Diarrheal Disease Mortality Rate) - Moderate contribution

• It is strongly influenced by access to sanitation infrastructure and drinking water.

• It shows the direct link between poor hygiene and mortality rates.

Secondary Variables (Low Contribution to Dim-1):

PPUSA, PTUSEA, PPUISNA, PPDO, PPDLM (Indicators of access to water and sanitation)

• These variables have a lower contribution to Dim-1, but remain relevant.

• They are probably more influential on Dim-2, which seems to represent access to hygiene infrastructure.

Conclusion and Interpretation

Dim-1 strongly differentiates countries according to their level of development and health status.

Countries with high HDI and GDP have lower infant and maternal mortality. Countries with high infant/maternal mortality have low human and economic development.

The variables of access to water and sanitation influence this dimension less but are surely more important on Dim-2.

fviz_contrib(res.pca2, choice = "var", axes = 2)

#General Reading

• Dim-2 (24.7% of the explained variance) appears to be strongly influenced by sanitation and hygiene conditions.

• The tallest bars indicate the variables that contribute most to Dim-2.

• The red line represents the expected average contribution (10%, that is, 100% divided by the 10 variables).

Most Contributing Variables to Dim-2

PTUSEA (Total Population Not Using Improved Water Sources) - Maximum Contribution

• This factor is the most influential on Dim-2.

• It reflects the level of access to safe drinking water, which is a key factor in health conditions.

• It is probably correlated with other indicators of sanitation and water-related diseases.

PPUISNA (Share of population practicing open defecation) - Very influential

• It illustrates the lack of sanitation infrastructure and its impact on public health.

• Its importance on Dim-2 shows that it plays a role in differentiating countries according to their level of public hygiene.

TMD (Diarrheal Disease Mortality Rate) - High Contribution

• Strongly influenced by problems of access to drinking water and sanitation infrastructure.

• It is directly linked to the variables of access to water and sanitation facilities.

PPUSA (Share of Population Using Limited Sanitation Services) - Significant Contribution

• Shows the level of sanitation insecurity and its impact on health.

• It is probably correlated with PTUSEA and PPUISNA, which suggests a classification of countries according to their access to basic services.

PPDLM (Proportion of the population with access to a handwashing facility with soap and water) - Significant contribution

• Key factor in the prevention of infectious diseases, particularly diarrheal diseases.

• Low coverage of handwashing facilities is associated with poor hygiene and high health risk.

Secondary Variables (Low Contribution to Dim-2): TMM (Maternal Mortality), GDP, PPDO, HDI, TME

• These variables are less influential on Dim-2.

• HDI and GDP were very influential on Dim-1, meaning that Dim-2 is less related to economic development and more focused on sanitation infrastructure and its effects on public health.

• TMM and TME (maternal and infant mortality) have a lesser role on this dimension, which supports the reading that Dim-2 differentiates countries mainly according to their sanitation conditions.

Conclusion and General Interpretation

Dim-2 differentiates countries based on their access to drinking water and sanitation infrastructure. Countries with poor access to drinking water and poor sanitation infrastructure have a higher risk of diarrheal diseases. Economic variables (HDI, GDP) have little influence on this dimension, meaning that countries with good economic development can still face health challenges.

fviz_contrib(res.pca2, choice = "var", axes = c(1,2))

General Reading

• Variables with a high contribution (above the red line) are the most influential in explaining the total variance of the data.

• Dim-1-2 combines socio-economic and health aspects to explain the differentiation of countries.

Most Contributing Variables to Dim-1-2:

TME (Child Mortality Rate) - Maximum Contribution

• Key indicator of public health, strongly influenced by access to health services and health infrastructure.

• Shows that countries with poor sanitation have high infant mortality rates.

HDI (Human Development Index) - Very influential

• Global factor that integrates health, education and standard of living.

• Its importance shows that more developed countries have a better health situation and a lower mortality rate.

PTUSEA (Population not using improved water sources) - Very influential

• Indicates access to drinking water, a key differentiator between countries in terms of public health.

• Shows a direct link with waterborne diseases and infant mortality.

GDP (Gross Domestic Product) - High contribution

• A strong economic indicator that directly influences access to health and medical infrastructure.

• The presence of GDP and HDI among the dominant variables shows that economic and social aspects are linked to health conditions.

PPUISNA (Share of population practicing open defecation) - High contribution

• Key indicator of insufficient sanitation infrastructure.

• Its influence shows that countries where open defecation is widespread are also those with poor public health indicators.

TMD (Diarrheal Disease Mortality Rate) - Significant Contribution

• Directly linked to sanitation conditions and access to drinking water.

• Strengthens the link between sanitation infrastructure and infant mortality.

Secondary Variables (Low Contribution to Dim-1-2): TMM (Maternal Mortality), PPUSA (Population using limited sanitation services), PPDLM (Access to handwashing)

• Less influential on the first plane, but still linked to hygiene and public health.

• PPDLM has a low contribution, suggesting that access to handwashing, although important, is not the main differentiating factor between countries.

**Conclusion and Interpretation**

Economic variables (HDI, GDP) and public health indicators (TME, TMD) are the most influential in differentiating countries. Countries with a low HDI, low GDP, and limited health infrastructure have high mortality rates (TME, TMD). Access to drinking water (PTUSEA) and sanitation (PPUISNA) are key factors in explaining the differences between countries in public health.

fviz_pca_biplot(res.pca2, repel = TRUE,col.var = "blue",col.ind = "red")

 **General Reading of the Chart**

• The Dim1 axis (39.7%) explains most of the variance in the data. It appears to differentiate countries according to their level of development and health conditions .

• The Dim2 axis (24.7%) explains a significant part of the variance, potentially linked to specific disparities in access to water and health infrastructure.

• Countries close to each other share similar characteristics in terms of sanitation and public health.

• The arrows represent the variables: their direction indicates how they influence the distribution of countries.

Interpretation of Variables (Blue Arrows)

Variables strongly projected on Dim1 (horizontal axis)

• HDI (Human Development Index) and GDP (Gross Domestic Product) are on the left, showing an opposition with health variables like TME (Infant Mortality), TMD (Mortality due to diarrheal diseases), and PPUISNA (Open Defecation), which are on the right.

• A country on the left has better economic development and improved health infrastructure, while a country on the right has poor health conditions.

Variables projected on Dim2 (vertical axis)

• PTUSEA (Population without access to an improved water source) and PPUSA (Population using limited sanitation) are at the top.

• This means that countries at the top of the chart have very limited access to clean water and sanitation infrastructure.

• At the bottom, PPUISNA (Open defecation) and TMM (Maternal mortality) are present, which can be another differentiating criterion.

Country Analysis (Red Dots):

Countries with poor sanitation and limited access to water (located right and above)

• SLE (Sierra Leone), MLI (Mali), BEN (Benin), CIV (Côte d’Ivoire), BFA (Burkina Faso) are strongly associated with high mortality rates and lack of access to drinking water.

• These countries need more effective health policies to improve sanitation infrastructure and access to drinking water.

Countries with higher development and better infrastructure (located left and bottom)

• CPV (Cape Verde), MRT (Mauritania), SEN (Senegal) are positioned closer to the HDI and GDP arrows, which means they have better economic and health conditions.

• This suggests better management of water and sanitation infrastructure, leading to a reduction in water-related diseases.

Countries in an intermediate situation (close to the center)

• GIN (Guinea), LBR (Liberia), TGO (Togo) are in the center of the graph, which indicates an intermediate situation where health infrastructure is present but not optimal.

**Conclusion and Recommendations**

Dim1 (horizontal axis) is an indicator of the level of development and quality of health infrastructure.

Dim2 (vertical axis) mainly reflects access to drinking water and the rates of associated diseases.

Countries on the right require priority interventions to improve access to drinking water and reduce infant mortality.

The countries on the left can be used as a benchmark to identify good practices in sanitation and public health.

Presentation of the graphs of individuals

fviz_contrib(res.pca2, choice = "ind", axes = 1)

This graph shows the contribution of each country to Dim-1 of our Principal Component Analysis (PCA).

Analysis:

• CPV (Cape Verde) largely dominates the contribution to Dim-1 with more than 40%. This means that this country plays a major role in defining this first component.

• MRT (Mauritania), MLI (Mali) and GHA (Ghana) also have notable contributions, but much weaker than CPV. They also take part in the construction of Dim-1.

• Other countries have smaller contributions, with decreasing influence as one moves towards the right of the graph.

The red line represents the expected average contribution . Countries above this line have a significant influence on the component, while those below this line have a lesser influence.
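To make the contribution bars and the red reference line concrete, here is a minimal sketch in base R. It assumes equal weights for all individuals and uses `prcomp` on the built-in `USArrests` data as a stand-in for the study's data; the expected average is simply 100% divided by the number of elements (variables or individuals).

```r
# Stand-in data: built-in USArrests (NOT the study's dataset), standardized
X   <- scale(USArrests)
pca <- prcomp(X)

# Contribution of each variable to each axis (%): squared loadings.
# Eigenvectors have unit norm, so each column sums to 100.
var_contrib <- 100 * pca$rotation^2

# Contribution of each individual to each axis (%), equal weights:
# squared score divided by the sum of squared scores on that axis
ind_contrib <- 100 * sweep(pca$x^2, 2, colSums(pca$x^2), `/`)

# Expected average contributions (the reference line under a uniform split)
expected_var <- 100 / ncol(X)
expected_ind <- 100 / nrow(X)

# Individuals that contribute more than average to the first axis
above_average <- names(which(ind_contrib[, 1] > expected_ind))
print(above_average)
```

With the 14 countries of this study, the same reasoning gives an expected average contribution of about 7.1% per country.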

**Interpretation:**

• CPV strongly influences Dim-1, meaning that it is strongly differentiated from other countries along this dimension.

• Countries like MRT, MLI and GHA also contribute noticeably, but with less weight.

• The countries on the right of the graph (BEN, GNB, TGO, etc.) have a low contribution, which means that they lie close to the average on the first dimension and have little influence on its interpretation.

Since the study of variables associates Dim-1 with socio-economic development and health conditions, these contributions indicate which countries drive that gradient.

fviz_contrib(res.pca2, choice = "ind", axes = 2)

Analysis of Contributions

Guinea-Bissau (GNB) and Gambia (GMB) have the highest contributions to Dim-2 (over 20% each). This means that these countries are the most differentiated along this dimension.

Burkina Faso (BFA), Ghana (GHA) and Cape Verde (CPV) follow with notable contributions (between 10 and 20%). They also play an important role in the construction of this component.

The other countries (Togo, Mauritania, Guinea, Sierra Leone, Mali) have low contributions. This means that these countries play little part in the construction of Dim-2.

The red line represents the expected average contribution. Countries above this line significantly influence Dim-2 and those below have a weaker impact on this dimension.

Potential Interpretation

The meaning of Dim-2 depends on the variables that are strongly associated with it.

If we assume that Dim-2 reflects a gradient related to sanitation and water, then:

• GNB and GMB could be characterized by poor access to sanitation (PPUSA, PPUISNA, PTUSEA) and high mortality rates (TMM, TME, TMD).

• Conversely, countries with a low contribution (MLI, SLE, GIN) could be more homogeneous or less differentiated according to these criteria.

fviz_contrib(res.pca2, choice = "ind", axes = c(1,2))

Analysis of Contributions

Cape Verde (CPV) has the highest contribution (> 20%):

• It plays a major role in differentiating countries according to Dim-1 and Dim-2.

• It is probably an extreme country for the variables analyzed (development, public health, sanitation).

Guinea-Bissau (GNB), Ghana (GHA), Burkina Faso (BFA), Gambia (GMB), Mauritania (MRT) and Mali (MLI) contribute significantly (> 10%). These countries strongly influence the analysis and present contrasting profiles.

Countries like Togo (TGO), Senegal (SEN), Benin (BEN) and Liberia (LBR) have a lower contribution. They are less differentiated and have little influence on the structuring of the main dimensions.

The red line shows the expected average contribution. Countries above this line are strongly differentiated according to Dim-1 and Dim-2. Those below are more homogeneous and close to the regional average.

Interpretation in relation to the Circle of Correlations

Looking at the variables associated with Dim-1 and Dim-2, we can assume that:

• Cape Verde (CPV) is probably a country with good socio-economic development (high HDI, GDP) and better sanitation infrastructure.

• Guinea-Bissau (GNB), Gambia (GMB) and Burkina Faso (BFA) could be characterized by a lack of access to drinking water, a high mortality rate and limited sanitation infrastructure.

• Countries like Senegal (SEN) and Togo (TGO) have more intermediate profiles and are less marked by the contrasts of the dimensions studied.

Partial conclusion

Countries with a high contribution (CPV, GNB, GMB, BFA) are those with extreme characteristics in terms of access to sanitation and impact on public health.

Countries with a low contribution are more homogeneous and do not strongly influence the structure of the data.

fviz_pca_ind(res.pca2, col.ind = "contrib",gradient.cols = c("blue" , "green" , "red"), repel = TRUE)

• Dim-1 (39.7% of the variance explained): it is linked to socio-economic development and health conditions.

• Dim-2 (24.7% of the variance explained): differentiates countries according to access to water, hygiene and mortality.

Each point represents a country, colored according to its contribution to the structuring of the axes (from blue = low contribution to red = high contribution).

Country Group Analysis

Cape Verde (CPV)

• Position: very negative on Dim-1 and slightly negative on Dim-2.

CPV is therefore opposed to the countries with poor sanitary conditions. It has a high GDP and HDI and better access to sanitation and drinking water, and it stands out strongly from the other countries.

Gambia (GMB) and Guinea-Bissau (GNB)

• Position: extreme on Dim-2 and slightly negative on Dim-1.

These countries suffer from poor access to drinking water and hygiene infrastructure (strong correlation with PTUSEA, PPUISNA).

Their high contribution to Dim-2 suggests that they are among the most affected by diarrheal diseases and infant/maternal mortality.

Ghana (GHA), Burkina Faso (BFA) and Côte d’Ivoire (CIV)

• Position: positive on Dim-1 and positive on Dim-2.

These countries are not the most vulnerable, but they remain in an intermediate situation.

They have average economic and social development, with challenges in sanitation and public health.

Other countries

• Position: close to the origin, low contribution.

These countries are closer to the average and do not present extreme situations. They are intermediate in terms of sanitation and public health.

Conclusion and General Interpretation

Cape Verde (CPV) is the most developed country with better sanitation infrastructure.

Gambia (GMB) and Guinea-Bissau (GNB) are the most affected by lack of access to drinking water and healthcare infrastructure. Ghana (GHA), Burkina Faso (BFA), and Côte d’Ivoire (CIV) are in an intermediate position.

The other countries have more homogeneous profiles, close to the average. Access to sanitation and drinking water is a key factor influencing mortality and human development in the region.

Classification

Classification is a method that groups individuals sharing some or all of their characteristics. It can be supervised (the classes are known in advance and we seek to determine which class an individual belongs to) or unsupervised (the classes are not known in advance and are derived from the data). In our case, we carried out an unsupervised classification: an ascending hierarchical classification (AHC).

# PCA on the data (graph=TRUE also draws the default PCA plots)
res.PCA1<-PCA(donneees,graph=TRUE)

# HCPC performs the AHC on the principal components of the data
res.HCPC1<-HCPC(res.PCA1,nb.clust=3,consol=FALSE,graph=FALSE)
plot.HCPC(res.HCPC1,choice='tree',title='Hierarchical tree')

This dendrogram represents the ascending hierarchical classification (AHC) of countries according to their similarities in socioeconomic, health, or environmental variables. It allows the identification of homogeneous groups of countries sharing common characteristics.

General Reading of the Dendrogram

• The vertical axis represents the distance (or dissimilarity) between groups: the higher a merge occurs in the graph, the more different the merged groups are. Conversely, a merge near the base indicates greater similarity between the grouped elements.

• The colored boxes represent the groups identified after cutting the tree: cutting the tree at a given height divides all countries into distinct clusters, each grouping together countries with similar profiles.
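The idea of building a tree and then cutting it into groups can be sketched with base R alone. This is only an approximation of what `HCPC` does: it assumes Ward's criterion on the first two principal components, and it uses the built-in `USArrests` data as a stand-in for the study's data.

```r
# Stand-in data: built-in USArrests (NOT the study's dataset), standardized
X <- scale(USArrests)

# Coordinates of the individuals on the first two principal components
scores <- prcomp(X)$x[, 1:2]

# Ascending (agglomerative) hierarchical clustering with Ward's criterion
tree <- hclust(dist(scores), method = "ward.D2")

# Cut the tree into 3 groups, as with nb.clust = 3 in HCPC
groups <- cutree(tree, k = 3)
print(table(groups))

# plot(tree); rect.hclust(tree, k = 3)   # uncomment to draw the dendrogram and boxes
```

Choosing `k = 3` plays the same role as choosing the height at which the dendrogram is cut: a lower cut gives more, smaller groups.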

Identification of Groups (Clusters)

Looking at the dendrogram, we can distinguish three main groups:

Group 1 (left - framed in red)

• Countries: CPV (Cape Verde), GHA (Ghana), MRT (Mauritania), SEN (Senegal)

• Possible features:

o Relatively more advanced level of development.

o Better access to sanitation and drinking water.

o Higher GDP and HDI than other groups.

o Lower infant and maternal mortality rates.

Group 2 (center - framed in pink)

• Countries: GMB (Gambia), GNB (Guinea-Bissau)

• Possible features:

o Countries in an intermediate situation, with moderate access to sanitation services.

o Living conditions less favorable than group 1, but better than group 3.

o Average level of economic and health development.

Group 3 (right - framed in green)

• Countries: LBR (Liberia), TGO (Togo), BEN (Benin), MLI (Mali), GIN (Guinea), SLE (Sierra Leone), CIV (Côte d’Ivoire), BFA (Burkina Faso)

• Possible features:

o These countries have the worst sanitary conditions and access to drinking water.

o High mortality rates (infant, maternal, diarrheal diseases).

o Low Human Development Index (HDI) and GDP.

o High vulnerability to health crises and increased need for investment in health and social infrastructure.

Interpretation and Implications

• Group 1 appears to be a model in health and economic development, and can serve as a reference for others.

• Group 2 is an intermediate category, requiring support to avoid a deterioration in living conditions.

• Group 3 is a priority for interventions because it includes countries with the most precarious conditions.

Thus, this classification makes it possible to:

✔ Identify action priorities in terms of investments in health and economic infrastructure.

✔ Target public policies adapted to each group.

✔ Facilitate cooperation between countries with similar characteristics to share effective solutions.

plot.HCPC(res.HCPC1,choice='map',draw.tree=FALSE,title='Factor map')

This graph shows the factor map of the principal component analysis (PCA), with the countries colored according to the clusters obtained from the ascending hierarchical classification (AHC).

Chart axes:

• Dim 1 (39.66%) : represents the first principal component, which captures almost 40% of the total information.

• Dim 2 (24.74%) : second principal component, explaining approximately 25% of the variance.

• Together, these two dimensions explain about 64.4% of the variability in the data , which is a good representation of the overall structure.

Identification of Groups (Clusters)

Three groups of countries are identified, each with different characteristics:

Cluster 1 (Red - GMB, GNB)

• This group is located on the right side of the chart.

• It is characterized by positive values in Dim 1, which means that it is associated with specific factors distinct from other groups.

• Possible interpretation:

o More difficult socio-economic conditions.

o Low levels of HDI and GDP.

o Sanitation problems and limited access to drinking water.

o Countries that require significant development efforts.

Cluster 2 (Black - CPV, MRT, SEN, GHA)

• This group is located on the left of the graph, with negative values in Dim 1.

• Possible interpretation:

o Countries with relatively better socio-economic conditions than others.

o More advanced level of infrastructure.

o Better access to public and economic services.

o Better HDI and GDP compared to other clusters.

Cluster 3 (Green - BFA, CIV, BEN, TGO, SLE, MLI, GIN, LBR)

• These countries are located in the upper right part of the graph.

• Possible interpretation:

o Countries with intermediate conditions.

o A combination of varied social and economic characteristics.

o They are not as disadvantaged as Cluster 1, but they also do not benefit from the advantages of Cluster 2.

o Require targeted development actions to improve their situation.

General Interpretation

• Dim 1 separates countries according to their level of socio-economic development:

o Left (CPV, MRT, SEN, GHA): relatively better conditions.

o Right (GMB, GNB): most vulnerable countries.

• Dim 2 appears to distinguish countries based on their potential for improvement:

o At the top (BFA, CIV, BEN, etc.): countries that could benefit from adapted policies.

o Bottom (GMB, GNB): countries with greater challenges to address.

Implications and Recommendations

Cluster 1 countries (red - GMB, GNB): urgent need for infrastructure investment and improvement of living conditions. Priority for humanitarian and economic programs.

Cluster 2 countries (black - CPV, MRT, SEN, GHA): these countries can serve as a model for others. Promote regional cooperation policies to help countries in difficulty.

Cluster 3 countries (green - BFA, CIV, BEN, etc.): implementation of targeted programs to prevent them from joining Cluster 1. Need for adapted strategies to promote growth and access to infrastructure.

fviz_cluster(res.HCPC1,repel = TRUE,show.clust.cent = TRUE)

Linear Regression

Multiple linear regression is a statistical technique used to model the linear relationship between a dependent variable and two or more explanatory variables.

# GDP (PIB) explained by HDI (IDH) and PPDLM; columns are taken from data=donneees
model1=lm(formula = PIB~IDH+PPDLM, data=donneees)
vif(model1)  # variance inflation factors (multicollinearity check)

##      IDH    PPDLM 
## 1.108588 1.108588
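The two identical values in the output above are expected: with only two explanatory variables, each variance inflation factor equals 1 / (1 - r²), where r is the correlation between the two predictors. Values close to 1 indicate little multicollinearity. The sketch below recomputes this by hand in base R on the built-in `mtcars` data, used here as a stand-in for the study's data.

```r
# Stand-in model on built-in mtcars (NOT the study's dataset)
fit <- lm(mpg ~ wt + hp, data = mtcars)
print(round(coef(fit), 3))

# VIF of a predictor = 1 / (1 - R^2), where R^2 comes from regressing
# that predictor on the other explanatory variables
r2_wt <- summary(lm(wt ~ hp, data = mtcars))$r.squared
r2_hp <- summary(lm(hp ~ wt, data = mtcars))$r.squared

vif_manual <- c(wt = 1 / (1 - r2_wt), hp = 1 / (1 - r2_hp))
print(round(vif_manual, 3))
```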

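dada=data.frame(donneees)
# Actual values of the response (PIB) against the values fitted by the model
data_vis=data.frame(valeurs_reelles=dada$PIB,predictions=predict(model1))
ggplot(data_vis,aes(x=valeurs_reelles,y=predictions))+
  geom_point()+
  geom_smooth(method ="lm",se=FALSE,color="red")+
  labs(x="valeurs_reelles",y="predictions")+
  ggtitle(deparse(formula(model1)))  # the title must be a string, not the lm object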

## `geom_smooth()` using formula = 'y ~ x'

Reading the graph

Alignment of the points with the red line:

o The points are very close to the red line, which indicates that the model fits the data well.

o The majority of predictions are close to the actual values, which supports the quality of the fit.

Good linear trend:

o There is a strong linear relationship between the explanatory variables and the dependent variable.

o This means that the explanatory variables chosen are relevant to explain the target variable.

Negative intercept:

o The model has an intercept of −7662.30, which indicates that when all explanatory variables are zero, the predicted value would be negative.

o This is not always directly interpretable, especially if the explanatory variables can never be zero in reality. This is not a problem as long as the model is good in the domain of the observed data.

Interpretation in relation to our theme: sanitation and public health

This graph comes from the following model: lm(formula = GDP ~ HDI + PPDLM).

• This means that GDP is predicted from HDI (Human Development Index) and PPDLM (proportion of the population with access to a handwashing facility with soap and water).

What this implies:

• A high HDI is strongly correlated with a high GDP: the more humanly developed a country is, the more it tends to have a high GDP.

• Better access to handwashing facilities (PPDLM) is also associated with a higher GDP, and therefore with economic development.

• This model is consistent with the idea that improved sanitation and hygiene go together with a stronger national economy, through better well-being and increased productivity. It shows an association, not a proven causal effect.

Conclusion

This multiple regression model suggests a strong linear relationship between GDP and variables such as HDI and PPDLM. It supports the link between human development, access to hygiene and sanitation, and economic growth.

#CONCLUSION

In conclusion, the project on “Access to sanitation and its impact on public health” has made it possible to understand the importance of access to adequate sanitation infrastructure for the health of populations. The results obtained through principal component analysis (PCA) have made it possible to better understand the key factors influencing access to sanitation and to identify the most vulnerable geographical areas. These results have also shown that social and economic factors play a determining role in the quality of sanitation and, consequently, on the public health of populations. Thus, this study shows that improving access to sanitation is essential to reduce the health burden linked to waterborne diseases. [14] It is becoming essential to promote better access to sanitation infrastructure, strengthen health education among populations and foster partnerships between public authorities, NGOs and local communities in order to guarantee a healthier future for all. To address these issues and improve access to sanitation, it is necessary to strengthen sanitation infrastructure. [15].

Indeed, it is imperative to invest in sustainable infrastructure, particularly for the poorest. This includes the construction of modern toilets, wastewater management, and the installation of water treatment plants. In addition, communities must be educated on good hygiene practices and the importance of sanitation through educational programs and awareness campaigns.

BIBLIOGRAPHIC REFERENCES

[1] MH Grelle, K. Kabeyne, K. Kenmogne, T. Tatietse, and GE Ekodeck, “Access to drinking water and sanitation in cities of developing countries: the case of Bafoussam (Cameroon),” VertigO - the electronic journal in environmental sciences, vol. 7, no. 2, 2006.

[2] United Nations, “Millennium Development Goals,” Annual Report, New York, vol. 75, 2015.

[3] A.S. Fall, A.T. Fall, R. Cissé, and L. Vidal, “Sanitation and Hygiene in West and Central Africa,” Strengthening Social Science Research in Support of the Regional Priorities of the UNICEF West and Central Africa Regional Office: Thematic Analyses, Dakar (SEN), pp. 87-98, 2017.

[4] World Health Organization, Sanitation Safety Management Planning: Step-by-Step Risk Management for Safely Managed Sanitation Systems. World Health Organization, 2023.

[5] O. Amadou, “Total health expenditure versus public health expenditure and health outcomes in West Africa,” The International Review of French-Speaking Economists, vol. 9, no. 1, 2024.

[6] B. Evans, L. Haller, and G. Hutton, “Closing the sanitation gap: the case for better public funding of sanitation and hygiene,” 2004.

[7] MH Grelle, K. Kabeyne, K. Kenmogne, T. Tatietse, and GE Ekodeck, “Access to drinking water and sanitation in cities of developing countries: the case of Bafoussam (Cameroon),” VertigO - the electronic journal in environmental sciences, vol. 7, no. 2, 2006.

[8] RA Akono, “Sub-Saharan African States facing the challenge of public toilets: between access crisis, sanitation crisis, risks and solutions,” Revue Francophone du Développement Durable, pp. 33-50, 2020.

[9] CH Kane and P.-E. Mandl, “Towards a rethinking of public health policies in West and Central Africa?,” Tiers Monde Review, pp. 135-147, 1973.

[10] P. Alagidede and A. Alagidede, “The public health effects of water and sanitation in selected West African countries,” Public Health, vol. 130, pp. 59-63, 2016.

[11] E. Nkiaka, RG Bryant, M. Okumah, and FF Gomo, “Water security in sub-Saharan Africa: Understanding the status of sustainable development goals 6,” WIREs Water, vol. 8, no. 6, p. e1552, Nov. 2021, doi: 10.1002/wat2.1552.

[12] BJF Eloundou, “Climate change, access to drinking water and public health: between realities and perspectives in Africa.”

[13] EA Addae, D. Sun, OJ Abban, and JF Addae, “Appraising the spillover effect of water use efficiency indicators in sub-Saharan Africa: A spatial econometric approach,” Heliyon, vol. 8, no. 11, Nov. 2022, doi: 10.1016/j.heliyon.2022.e11672.

[14] S. Jay, C. Jones, P. Slinn, and C. Wood, “Environmental impact assessment: Retrospect and prospect,” Environmental Impact Assessment Review, vol. 27, no. 4, pp. 287-300, 2007.

[15] S. Jay, C. Jones, P. Slinn, and C. Wood, “Environmental impact assessment: Retrospect and prospect,” Environmental Impact Assessment Review, vol. 27, no. 4, pp. 287-300, 2007.

Access to sanitation and impact on public health: the case of West Africa

SANOU Alidou 20221126 SAWADOGO Mady 20240803 ZOUNGRANA Oumaïma 20210123

2025-04-24

SUMMARY

INTRODUCTION

I. Materials and methods

1. Materials used for the project

II. Presentation of data and mapping of the study area