COMMENTS MADE DURING THE DEFENSE

ABSTRACT

This study investigates the underlying factors contributing to food insecurity in West Africa, a region severely affected by hunger despite its abundant natural and demographic resources. Drawing on data from sources such as Our World in Data, the FAO and the World Bank, the research combines quantitative and qualitative analyses, integrating statistical techniques such as Principal Component Analysis (PCA), Hierarchical Ascending Classification (HAC) and linear regression in RStudio. Complementary tools such as QGIS, KoboToolbox and Zotero supported data collection, spatial visualization and literature review.

The study identifies key drivers of food insecurity, including high population growth, poverty, food inflation, low literacy rates, and climatic factors such as precipitation and natural disasters. Nigeria emerged as an outlier due to its distinct socio-economic profile. By grouping countries into three classes based on structural similarities, the research highlights regional disparities in development, urbanization, and vulnerability to food crises. The results emphasize the need for targeted policies, investment in infrastructure, education, and improved governance to enhance food security and resilience in the region.

INTRODUCTION

In 2023, some 733 million people worldwide were suffering from hunger, including one in five in Africa, which represents around 146 million people on the continent [1]. Despite its natural and demographic wealth, Africa thus continues to face a major food challenge, with alarming figures for undernourishment and food insecurity. One of the most striking indicators of this food crisis in West Africa is the rate of severe acute malnutrition, which is reaching critical levels in several of the region’s countries, notably Burkina Faso, Niger, Chad and Mali.

According to official estimates, the number of chronically undernourished people could reach 582 million by 2030, half of them in Africa, if current trends continue [2].

Regional disparities are also a cause for concern: in 2022, the prevalence of undernourishment varied from 7.5% in North Africa to around 29% in Central and East Africa [3]. Furthermore, in West and Central Africa, over 40 million people were struggling to feed themselves in 2024, a figure that could rise to 52 million by mid-2025 [4].

In contrast, Asia, which once had almost 750 million food-insecure people, has succeeded in significantly reducing malnutrition since the 1990s, thanks to agricultural reforms and massive investment. This contrast underlines the urgency of understanding and acting on the underlying factors of food insecurity in Africa.

In this global and African context, the Sustainable Development Goals (SDGs), adopted by the United Nations General Assembly in 2015, have brought new momentum to the fight against food insecurity. SDG 2, “Zero Hunger”, aims to eradicate hunger, improve nutrition and promote sustainable agriculture by 2030. This global objective underlines the importance of concerted action to improve food production, reduce inequalities and strengthen resilience in the face of climate crises and conflicts.

As part of our Research and Information Processing project, we therefore set out to “determine and estimate the real impact of the factors discussed in the literature on food insecurity in West Africa”.

We will analyze and highlight the correlations between these factors and the prevailing food insecurity in West Africa.

Specifically, we will :

LITERATURE REVIEW

Food insecurity in West Africa is a complex problem affecting millions of people. The problem is intrinsically linked to armed conflict, climate change, desertification and economic inequality, according to the literature.

In the same vein, Mueller and Brockerhoff [2] present another aspect of armed conflict. They highlight a particular group: internally displaced people, estimated by the European Commission to number around 3.3 million in the Sahel region by 2020. These households, already in a precarious situation, are deprived of basic and even financial resources. They are thus exposed to a severe form of food insecurity.

In addition, the security situation is having a major impact on food prices. Traoré and Keita [3] note that prices have soared due to the war, making food inaccessible for a large part of the population.

In addition to security, there is a natural factor: climate change, which is worsening food insecurity, especially in West Africa. Climatic variability, including frequent droughts and floods, leads to significant reductions in agricultural yields, according to Jalloh and Nelson [4]. By way of illustration, the last decade in Burkina Faso, albeit humid, has seen disparities that led to yield reductions of around 20% nationwide and 40% in the southern zone [5]. Furthermore, Salack and Giannini [5] point out that extreme climatic conditions, such as prolonged drought, have devastating effects on food production. Ogunbameru and Adeola [6] propose adaptation strategies, such as the use of drought-resistant seeds and crop diversification, to strengthen people’s resilience in the face of these challenges. Diarra et B. [7] confirm that water resource management is crucial to cope with the effects of prolonged droughts, particularly in Mali.

Desertification, particularly in the Sahel region, contributes to land degradation and limits access to arable land, as Diouf and Sow [8] point out. They describe the impact of desertification on agricultural yields and food production capacity, and stress the need for appropriate policies to combat soil degradation and protect farmland from erosion.

Economic and social inequalities play an important role in food insecurity. Bates and Collier [9] have shown that economically unstable countries are often the most affected by malnutrition, due to insufficient resources devoted to agriculture. Smith and Haddad [10] stress the importance of post-conflict agricultural policy in stabilizing markets and restoring agricultural production. Furthermore, Traoré and Keita [3] analyze the impact of economic crises on food prices, concluding that rising prices increase the precariousness of the most vulnerable households.

Agricultural policies and financial investment are also key to tackling food insecurity. Nwosu and Odu [11] highlight the importance of microfinance mechanisms and international support to improve access to agricultural resources. The authors stress that investment in agricultural infrastructure is essential to improve long-term productivity. Smith and Haddad [10] add that financial incentives are needed to boost post-conflict agricultural production, particularly in regions that have suffered massive destruction.

The issue of food sovereignty is also crucial. Hellebrandt and Weitz [12] point out that dependence on food imports and loss of self-sufficiency increase the vulnerability of African countries to food crises. Pommier and Leblanc [13] insist on the need to promote food self-sufficiency by supporting local production and reducing external dependency.

International organizations also play an important role in the fight against food insecurity. The FAO [14] leads initiatives to improve agricultural productivity and support small-scale producers in regions particularly affected by malnutrition. Its efforts are aimed at strengthening local capacities and promoting sustainable agricultural practices to improve food security.

Research and development are also essential to solving the problem of long-term food insecurity. Jalloh and Nelson [4] suggest that innovation in agricultural technologies and drought-resistant seeds can improve the resilience of food systems to climate change.

Finally, another crucial aspect concerns the governance and implementation of agricultural policies. According to Duflo and Banerjee [15], improved local governance and good management of natural resources are essential factors in guaranteeing long-term food security in Africa.

In conclusion, food insecurity in West Africa is the result of a combination of factors linked to armed conflict, climate change, desertification and economic inequality. Research shows that to mitigate the effects of this insecurity, it is crucial to strengthen agricultural policies, encourage investment in infrastructure and promote community resilience. An integrated and sustainable approach is needed to improve food security in the region.

Methodology

I. Data search and description

This study is based on a dataset from the Our World in Data platform covering key variables related to food insecurity in West Africa. The data are for the year 2021 and include both quantitative and qualitative variables.

Source of variables: Our World in Data

Analysis period: 2021

Data type:

  • Quantitative: Population growth rate, Precipitation

  • Qualitative: Food losses, Coup d’état

1. Individuals studied:

The countries studied are the 16 countries of West Africa, including both Sahelian and coastal countries. We chose them in order to understand the causes of food insecurity in West African countries as a whole compared with the Sahelian countries.

These West African countries are: Benin, Burkina Faso, Cape Verde, Côte d’Ivoire, Gambia, Ghana, Guinea, Guinea-Bissau, Liberia, Mali, Mauritania, Niger, Nigeria, Senegal, Sierra Leone and Togo.

2. Variables:

The variables chosen for our study can be divided into two sub-groups: active variables and supplementary variables. Active variables are those directly involved in the construction of the PCA axes. These so-called active variables actually express the main causes of food insecurity in West African countries. The other so-called supplementary or passive variables are not involved in the construction of the PCA axes, but enrich the interpretation of the PCA results.

Active variables include: poverty rate, hunger index, GDP and population growth rate. Passive variables include: Coup d’Etat, food losses and Balance of Trade.

FIGURE 1 : WEST AFRICA MAP

3. List of individuals and variables

setwd("D:/MASTER 2024-2025/2IE/P_RTI/RTI-G02")

data = read.csv(file ="donnees4.csv", header = TRUE, sep = ";", quote = "\"",
                dec = ",", row.names = 1)
data
##     TCD  PRE TP IF TIA  PIB     DCN TA     TU       PA DIC CE    BC
## BEN   3 1049 13 22   6 3322    6900 47 0.0178  1175297  42  5 -3330
## BFA   2  831 25 25   7 2176    2400 34 0.0058  2086893   6  5  -995
## CPV   1  188 12 11   1 6357     750 91 0.0148    54765 100  0 -1055
## CIV   2 1299 10 22   8 5316    2500 90 0.0094   241095  45  1  2241
## GMB   2  996 17 18  10 2077    7000 59 0.0335  2555332  72  1  -464
## GHA   2 1210 25 15  10 5421    2700 80 0.0244  1311530  36  5  6219
## GIN   2 1791 14 28  16 2640     340 45 0.0057   197266  26  2  2073
## GNB   2 1649 26 28   5 1831     410 54 0.0070  2561140  19  4    63
## LBR   2 2450 28 33   7 1423    3700 48 0.0068   507043  61  2 -1036
## MLI   3  329 21 25   5 2121   24000 31 0.0012  2018765   4  3 -1617
## MRT   3  111 25 23   6 1652   23000 67 0.0004   450720  59  5    99
## NER   3  184 51 28   7 1187   23523 38 0.0010  2393877  14  4 -1926
## NGA   2 1187 31 28  20 4923 2437000 62 0.0277 37941470  18  7  8203
## SEN   2  723 10 16   3 3512   12000 58 0.0081  1622980  43  0 -3372
## SLE   2 2654 26 31  17 1615     800 49 0.0099   802371  39  0 -5457
## TGO   2 1217 27 24   9 2131   16000 67 0.0165   830017  21  1  -765

4. Variable maps

FIGURE 2 : COUP D’ETAT
FIGURE 3 : DCN
FIGURE 4 : DEPENDENCE ON FOOD IMPORT
FIGURE 5 : FOOD INFLATION RATE
FIGURE 6 : FOOD LOSSES
FIGURE 7 : GROSS DOMESTIC PRODUCT
FIGURE 8 : HUNGER INDEX
FIGURE 9 : LITERACY RATE
FIGURE 11 : PRECIPITATION
FIGURE 12 : TRADE BALANCE
FIGURE 14 : CLASSIFICATION (WITH NIGERIA)
FIGURE 15 : CLASSIFICATION WITHOUT NIGERIA

II. Analysis methods

1. PCA

PCA is a statistical method used in data analysis. It enables us to reduce the dimensions of the data while identifying the main factors explaining the differences between the different individuals studied. It condenses the information derived from the active variables into a limited number of principal axes, thereby preserving most of the variance.

The steps involved in PCA are as follows:

• Data pre-processing: In this first stage, the quantitative variables are standardized (centred and scaled to unit variance) so that variables measured in different units contribute comparably to the analysis. The analysis is carried out in R using the Factoshiny package;

• Extracting the principal axes: In this second stage, the eigenvalues and eigenvectors of the correlation matrix of the active variables are calculated. Each eigenvector defines a principal axis, and the associated eigenvalue gives the proportion of variance carried by that axis. The principal axes with the greatest proportion of variance are then selected.

• Axis selection: The selection of principal axes is based on the inertia of the factorial axes, which measures the percentage of variability expressed by each axis. We first assess the variance expressed by each axis, then retain the axes that together express a large enough share of the variability for the analysis. The inertia obtained is compared with a reference value obtained by random simulation of data sets of the same size, to ensure the reliability of the axes chosen.

• Displaying results: To display the results, we used the Factoshiny package to generate biplots representing the individuals studied and the variables. These graphs are used to identify countries that are strongly influenced by the main variables.
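
As an illustration, the steps above can be sketched in base R on four of the thirteen variables from the data table (TP, IF, PIB and TA). This is a minimal sketch under our own assumptions, not the Factoshiny workflow used in the study: the object names (dat, z, eig, coords) are ours, and scale() divides by n - 1 whereas, to our understanding, FactoMineR divides by n, so coordinates may differ slightly in scale.

```r
# Four of the 13 variables, copied from the data table (assumed subset for illustration)
dat <- data.frame(
  TP  = c(13, 25, 12, 10, 17, 25, 14, 26, 28, 21, 25, 51, 31, 10, 26, 27),
  IF  = c(22, 25, 11, 22, 18, 15, 28, 28, 33, 25, 23, 28, 28, 16, 31, 24),
  PIB = c(3322, 2176, 6357, 5316, 2077, 5421, 2640, 1831, 1423, 2121,
          1652, 1187, 4923, 3512, 1615, 2131),
  TA  = c(47, 34, 91, 90, 59, 80, 45, 54, 48, 31, 67, 38, 62, 58, 49, 67),
  row.names = c("BEN", "BFA", "CPV", "CIV", "GMB", "GHA", "GIN", "GNB",
                "LBR", "MLI", "MRT", "NER", "NGA", "SEN", "SLE", "TGO")
)

# Step 1: standardize each variable (mean 0, standard deviation 1)
z <- scale(dat)

# Step 2: eigenvalues and eigenvectors of the correlation matrix
eig <- eigen(cor(dat))

# Step 3: percentage of inertia per axis and cumulative percentage
inertia_pct <- 100 * eig$values / sum(eig$values)
cum_pct <- cumsum(inertia_pct)

# Step 4: coordinates of the countries on the first two principal axes
coords <- z %*% eig$vectors[, 1:2]
colnames(coords) <- c("Dim1", "Dim2")

print(round(inertia_pct, 1))
print(round(cum_pct, 1))
print(round(head(coords), 2))
```

Because the variables are standardized, the eigenvalues sum to the number of variables, which is why each eigenvalue divided by that sum can be read directly as a share of total inertia.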

2. HAC

HAC groups countries into homogeneous classes. This method complements PCA by classifying countries into distinct groups. The HAC steps are as follows:

• Calculating distances: This involves calculating a distance matrix (Euclidean distance) in R, based on the coordinates of the countries on the main axes of the PCA. This step measures the similarity between countries.

• Merging individuals: We have grouped the countries closest to each other in the space of the principal axes into clusters.

• Building a dendrogram: Finally, we generated a dendrogram to visualize the hierarchical groupings of countries.
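
The three steps above can be sketched in base R as follows. This is only an illustrative sketch on four of the thirteen variables: the use of prcomp(), Ward's criterion ("ward.D2") and a cut into three classes are our assumptions for the example, and the object names are hypothetical.

```r
# Four of the 13 variables, copied from the data table (assumed subset for illustration)
dat <- data.frame(
  TP  = c(13, 25, 12, 10, 17, 25, 14, 26, 28, 21, 25, 51, 31, 10, 26, 27),
  IF  = c(22, 25, 11, 22, 18, 15, 28, 28, 33, 25, 23, 28, 28, 16, 31, 24),
  PIB = c(3322, 2176, 6357, 5316, 2077, 5421, 2640, 1831, 1423, 2121,
          1652, 1187, 4923, 3512, 1615, 2131),
  TA  = c(47, 34, 91, 90, 59, 80, 45, 54, 48, 31, 67, 38, 62, 58, 49, 67),
  row.names = c("BEN", "BFA", "CPV", "CIV", "GMB", "GHA", "GIN", "GNB",
                "LBR", "MLI", "MRT", "NER", "NGA", "SEN", "SLE", "TGO")
)

# Coordinates of the countries on the first two principal axes (standardized PCA)
pca <- prcomp(dat, center = TRUE, scale. = TRUE)
coords <- pca$x[, 1:2]

# Calculating distances: Euclidean distance matrix between countries
d <- dist(coords, method = "euclidean")

# Merging individuals: agglomerative clustering (Ward's criterion, an assumption here)
tree <- hclust(d, method = "ward.D2")

# Cutting the tree into 3 classes, as in the study
classes <- cutree(tree, k = 3)

# List the countries belonging to each class
print(split(names(classes), classes))
```

The dendrogram itself can then be drawn with plot(tree). Classes obtained on this reduced set of variables need not match those reported for the full data set.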

3. Linear regression

Linear regression is a statistical method used to model and analyze the relationship between a dependent variable and one or more independent explanatory variables.
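
By way of example, a linear regression of this kind can be fitted in R with lm(). The sketch below is an assumption on our part: it takes the hunger index (IF) as the dependent variable and the poverty rate (TP), GDP (PIB) and literacy rate (TA) as explanatory variables, using values from the data table. The choice of variables is illustrative and is not presented as the model retained in the study.

```r
# Four of the 13 variables, copied from the data table (assumed subset for illustration)
dat <- data.frame(
  TP  = c(13, 25, 12, 10, 17, 25, 14, 26, 28, 21, 25, 51, 31, 10, 26, 27),
  IF  = c(22, 25, 11, 22, 18, 15, 28, 28, 33, 25, 23, 28, 28, 16, 31, 24),
  PIB = c(3322, 2176, 6357, 5316, 2077, 5421, 2640, 1831, 1423, 2121,
          1652, 1187, 4923, 3512, 1615, 2131),
  TA  = c(47, 34, 91, 90, 59, 80, 45, 54, 48, 31, 67, 38, 62, 58, 49, 67),
  row.names = c("BEN", "BFA", "CPV", "CIV", "GMB", "GHA", "GIN", "GNB",
                "LBR", "MLI", "MRT", "NER", "NGA", "SEN", "SLE", "TGO")
)

# Hypothetical model: hunger index explained by poverty, GDP and literacy
fit <- lm(IF ~ TP + PIB + TA, data = dat)

# Estimated coefficients with standard errors, t statistics and p-values
print(summary(fit)$coefficients)

# Share of the variance of IF explained by the model
r2 <- summary(fit)$r.squared
print(round(r2, 3))
```

With only 16 countries, such a model should include few explanatory variables, and its coefficients are best read as indicative associations.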

Proposed data collection questionnaire

This questionnaire has been designed as part of a study to analyze the factors influencing food security in several countries. The main objective is to collect precise, quantifiable data on key variables such as rainfall, population growth and socio-economic indicators. This information will be integrated into a Principal Component Analysis (PCA), a Hierarchical Ascending Classification (HAC) and a Linear Regression (LR) to identify underlying dynamics and group countries into homogeneous classes.

Interest and relevance of the survey

Collecting data via this questionnaire is essential to:

• Complement existing databases with accurate information, particularly where gaps or estimates are observed.

• Understand regional variations in food security.

• Provide a solid evidence base for recommendations tailored to each national context.

Survey targets

The questionnaire is aimed at different types of stakeholders involved in food security, including:

• Public institutions: Ministries of Water, Agriculture and the Environment.

• International organizations: FAO, UNESCO.

• Local experts: researchers, food resource managers, economists.

• Local communities: Representatives from the agricultural, industrial and domestic sectors.

Questionnaire structure

The questionnaire is structured into several sections covering the following areas:

General information: Identification of respondents and data sources.

Natural factors: Precise data on precipitation and renewable water resources.

Household information: Data on the characteristics, structure and economic conditions of households.

Socio-economic factors: Famine situation, gross domestic product, literacy, poverty and urbanization rates, hunger index, etc.

Implementation process

• The questionnaire will be distributed to identified targets via links generated on KoboToolbox.

• Short training courses will be given to interviewers to ensure consistent data collection.

• The data collected will be verified and centralized for integration into subsequent analyses.

This questionnaire is a key tool for filling gaps in current data and enhancing the quality of food security analyses. Thanks to its implementation on KoboToolbox, it supports reliable, rapid and secure data collection, while making it possible to reach a wide diversity of respondents in different national contexts. The results obtained will enable us to better understand the dynamics of food insecurity, and to guide policies adapted to the sustainable improvement of food security.

3.5 Equipment

To carry out this study, several tools were mobilized for data processing, analysis and bibliographic reference management. These tools include R software, QGIS software, the Zotero application and KoboToolbox, each meeting specific needs in the methodology adopted.

3.5.1 R software

R, run in RStudio (version 2024.04.2), was used to perform Principal Component Analysis (PCA), Hierarchical Ascending Classification (HAC) and linear regression.

R is a powerful programming language for statistical and graphical analysis, particularly suited to the processing of complex data. For this study, the Factoshiny package was used, offering an interactive interface that facilitates PCA, HAC and linear regression. The package was also used to generate visualizations, such as biplots and dendrograms, for clear interpretation of the results. The use of R supports the reproducibility of analyses and flexibility in data exploration.

3.5.2 QGIS

QGIS software (version 3.40.4) was used to process geospatial data and create thematic maps. This open-source geographic information system was used to visually represent the results of the study, in particular to illustrate differences in food insecurity between countries. The maps produced using QGIS integrated socio-economic and environmental data to provide a geographical perspective on the results obtained from the statistical analyses. This tool was essential for translating abstract data into understandable and intuitive visualizations.

3.5.3 Zotero

Bibliographic reference management was provided by Zotero (version 7.0.9), an open-source software package designed to collect, organize and cite academic sources. Zotero was used to centralize the references used in this study, whether scientific articles, reports or data from recognized sources such as Our World in Data. This software facilitated the automatic generation of citations and bibliographies in line with academic standards, while ensuring the traceability of sources.
The combined use of these tools ensured rigorous analysis, clear visualization of results, and efficient reference management, contributing to the quality and reproducibility of this study.

3.5.4 KoboToolbox

Data was collected via a structured questionnaire built with the ODK-based KoboToolbox application, an integrated set of free, open-source tools for mobile data collection that enables forms to be built and interview responses to be aggregated and analyzed.

It consists of an online platform (server) and a data collection application (KoboCollect).

The link to the questionnaire is given in the appendix.

Results and discussion

DATA ANALYSIS WITH NIGERIA

The aim of this study is to analyze a dataset comprising 16 individuals and 13 variables using Principal Component Analysis (PCA) and Hierarchical Ascending Classification (HAC), in order to identify relationships between variables, reduce dimensionality and group individuals into homogeneous classes.

1. Observation of extreme individuals

Initial analysis revealed the presence of one atypical individual, whose values deviate significantly from the general trend. This individual has a strong influence on Axis 1 and distorts the interpretation of the results.

2. Explanation of inertia distribution

Graph of variables and individuals

library(FactoMineR)
## Warning: package 'FactoMineR' was built under R version 4.4.3
library(factoextra)
## Warning: package 'factoextra' was built under R version 4.4.3
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.4.3
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
pca_1 = PCA(X = data, scale.unit = TRUE, ncp = 5, ind.sup = NULL, 
            quanti.sup = NULL, quali.sup = NULL, row.w = NULL, 
            col.w = NULL, graph = TRUE, axes = c(1,2))

-Inertia

# Scree plot of the percentage of inertia per axis; "+" attaches the y-axis limit to the plot
fviz_eig(pca_1, addlabels = TRUE, hjust = -0.3) +
  ylim(0, 65)

The inertia of the factorial axes indicates, on the one hand, whether the variables are structured and, on the other, suggests the appropriate number of principal components to study. A study of the eigenvalues shows that the first two axes explain 62% of the total inertia. The first axis remains the most explanatory, structuring an opposition between economies with high growth and economies with low industrial development. The cumulative inertia indicates that the information contained in the first axes is significant and not due to chance.

3. Decomposition of total inertia and choice of axes

An estimate of the relevant number of axes suggests restricting the analysis to the first 2 axes. Together, these components carry more inertia than the 0.95-quantile of the inertia obtained from random distributions (60% vs. 46.38%). This suggests that only these axes carry real information. Consequently, the analysis focuses on these two axes.
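
A reference value of this kind can be approximated by simulation. The sketch below assumes that the reference is built from tables of independent standard normal variables with the same dimensions as our data (16 individuals, 13 variables); the number of simulations, the seed and the object names are our own choices, so the quantile obtained will not exactly reproduce the 46.38% reported above.

```r
set.seed(123)   # arbitrary seed, for reproducibility of the sketch

n_ind <- 16     # number of individuals (countries)
n_var <- 13     # number of variables
n_sim <- 500    # number of random tables (assumed value)

# For each random table, percentage of inertia carried by the first two axes
sim_pct <- replicate(n_sim, {
  x  <- matrix(rnorm(n_ind * n_var), nrow = n_ind)
  ev <- eigen(cor(x), only.values = TRUE)$values
  100 * sum(ev[1:2]) / sum(ev)
})

# Reference value: 0.95-quantile of the simulated percentages
ref_q95 <- unname(quantile(sim_pct, 0.95))

# Inertia of the first two axes as reported in the text
observed_pct <- 60

# The first plane is kept if it carries more inertia than the random reference
keep_axes <- observed_pct > ref_q95

print(round(ref_q95, 2))
print(keep_axes)
```

The comparison asks whether the observed first plane carries more inertia than would be expected from unstructured data of the same size.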

Biplot of variables and individuals

fviz_pca_biplot(pca_1, repel = TRUE,col.var = "blue",col.ind = "red")

According to the figures above, there is a correlation between the variables and the individuals

The study will focus on two axes:

  • The Dim1 axis explains 34.4% of the total variance.

  • The Dim2 axis explains 29.6% of the total variance.

Together, these two axes explain 64% of the variance, which is a good representation of the data.

The PCA graph of individuals shows that:

  • Countries close to each other have similar profiles.
  • Countries far from each other differ strongly on the variables.
  • Nigeria is atypical (an outlier): it lies very far from the other countries, indicating a particular profile.

The PCA graph of variables shows that:

  • A variable that points in the direction of an axis is strongly correlated with that dimension. In our case, TCD, IF and TP are strongly positively correlated with Dim1, while DIC, TA and PIB are strongly negatively correlated with Dim1.

  • BC, DCN, PA and TIA are more correlated with Dim2.

Nigeria is far removed from the other countries, which means it has very different characteristics. It is positioned extremely to the right on Dim1, indicating that it has very high values of the variables that are correlated with this axis (GDP, BC, DCN, PA). This suggests that Nigeria has a higher level of development than the other countries represented.

Countries such as Mali, Niger and Guinea are far removed from Nigeria, indicating that they have very different profiles (probably very low urbanization rates and a persistently negative trade balance). Nigeria is an atypical individual, probably because it has a high GDP and urbanization rate (TU) compared with the other countries. Nigeria and Ghana are relatively close, suggesting that they have similar profiles.

In summary, Nigeria stands out strongly and appears to be the most developed of the countries analyzed.

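Hierarchical tree - Dendrogram

# PCA without plots, then hierarchical clustering on the principal components (3 classes)
res.PCA<-PCA(data,graph=FALSE)
res.HCPC<-HCPC(res.PCA,nb.clust=3,consol=FALSE,graph=FALSE)
# Draw the dendrogram
plot.HCPC(res.HCPC,choice='tree',title='Hierarchical tree')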

The classification performed on the individuals reveals 3 classes. The classification, constructed from the first two factorial axes, captures 58% to 62% of total inertia. It reveals three main groups of individuals, each characterized by specific economic and social dynamics.

Groups identified in the graph

Group 1: Countries with high economic and trade growth

Individuals concerned: Cape Verde (CPV), Côte d’Ivoire (CIV), Ghana (GHA).

Position on graph: Upper left quadrant, with significant projections on axis 1 and axis 2.

They are most notable for high GDP, urbanization rate (TU) and dependency on cereal imports (DIC), and a moderate hunger index (IF), together with strong integration into international trade and an advanced economic transition.

Group 2: Countries in demographic and economic transition

Individuals concerned: Mali (MLI), Niger (NER), Mauritania (MRT).

Position on graph: Lower right quadrant, with negative projections on axis 1 and axis 2.

They are characterized by low GDP, a low urbanization rate (TU) and a medium literacy rate (TA); they are largely agricultural economies with little industrialization.

Group 3: Nigeria, an atypical individual

Individual concerned: Nigeria (NGA).

Position on graph: Distinctly different from the other groups, with particularly strong projections on axis 1.

It shows high values for displacement due to natural disasters (DCN), balance of trade (BC), gross domestic product (GDP), hunger index (IF) and food inflation rate (TIA).

The graph of individuals reveals three main trends:

• Countries with high economic and trade growth (CIV, GHA, CPV), with high GDP and advanced urbanization.

• Countries in transition (MLI, NER, MRT), characterized by high fertility and limited industrialization.

• Nigeria as a special case, an atypical individual that strongly structures the factor analysis due to its resource-based economy and uneven development.

DATA ANALYSIS WITHOUT NIGERIA

visualisation

setwd("D:/MASTER 2024-2025/2IE/P_RTI/RTI-G02")

data_1 = read.csv(file ="dts - sans NGA.csv", header = TRUE, sep = ";", quote = "\"",
                dec = ",", row.names = 1)
data_1
##     TCD  PRE TP IF TIA  PIB   DCN TA     TU      PA DIC CE    BC
## BEN   3 1049 13 22   6 3322  6900 47 0.0178 1175297  42  5 -3330
## BFA   2  831 25 25   7 2176  2400 34 0.0058 2086893   6  5  -995
## CPV   1  188 12 11   1 6357   750 91 0.0148   54765 100  0 -1055
## CIV   2 1299 10 22   8 5316  2500 90 0.0094  241095  45  1  2241
## GMB   2  996 17 18  10 2077  7000 59 0.0335 2555332  72  1  -464
## GHA   2 1210 25 15  10 5421  2700 80 0.0244 1311530  36  5  6219
## GIN   2 1791 14 28  16 2640   340 45 0.0057  197266  26  2  2073
## GNB   2 1649 26 28   5 1831   410 54 0.0070 2561140  19  4    63
## LBR   2 2450 28 33   7 1423  3700 48 0.0068  507043  61  2 -1036
## MLI   3  329 21 25   5 2121 24000 31 0.0012 2018765   4  3 -1617
## MRT   3  111 25 23   6 1652 23000 67 0.0004  450720  59  5    99
## NER   3  184 51 28   7 1187 23523 38 0.0010 2393877  14  4 -1926
## SEN   2  723 10 16   3 3512 12000 58 0.0081 1622980  43  0 -3372
## SLE   2 2654 26 31  17 1615   800 49 0.0099  802371  39  0 -5457
## TGO   2 1217 27 24   9 2131 16000 67 0.0165  830017  21  1  -765

The aim of this analysis is to study a dataset comprising 15 individuals and 13 variables using Principal Component Analysis (PCA) and Hierarchical Ascending Classification (HAC). Nigeria, the atypical individual, was removed in order to carry out an in-depth analysis of the remaining individuals and see how they behave in relation to each other.

The aim was to identify relationships between variables, reduce dimensionality and group individuals into homogeneous classes. The PCA was recalculated to assess the impact of this removal on the explanation of variance and the structuring of groups.

correlation matrix

mat_cor = cor(data_1)
mat_cor
##             TCD         PRE          TP          IF         TIA        PIB
## TCD  1.00000000 -0.30180353  0.39129279  0.35586241 -0.01190898 -0.5225740
## PRE -0.30180353  1.00000000 -0.04765574  0.57311477  0.65200920 -0.2148863
## TP   0.39129279 -0.04765574  1.00000000  0.50314904  0.10892514 -0.5846163
## IF   0.35586241  0.57311477  0.50314904  1.00000000  0.46292311 -0.7783963
## TIA -0.01190898  0.65200920  0.10892514  0.46292311  1.00000000 -0.2819404
## PIB -0.52257396 -0.21488634 -0.58461628 -0.77839635 -0.28194045  1.0000000
## DCN  0.72971593 -0.62941902  0.43342005  0.05286490 -0.29229785 -0.4069611
## TA  -0.53978124 -0.10519827 -0.40680298 -0.64497699 -0.18868030  0.7743432
## TU  -0.37188601  0.12050120 -0.31346143 -0.53098349  0.14647634  0.3662973
## PA   0.31522483 -0.22466329  0.39496956  0.05905742 -0.17380813 -0.4224448
## DIC -0.47888328 -0.04633391 -0.41737597 -0.50264131 -0.22543127  0.4549337
## CE   0.60026636 -0.27212299  0.38823898  0.15037061 -0.12882620 -0.2089917
## BC  -0.17954182 -0.05434761 -0.07532454 -0.28218148  0.09566062  0.4456240
##            DCN         TA         TU          PA          DIC         CE
## TCD  0.7297159 -0.5397812 -0.3718860  0.31522483 -0.478883281  0.6002664
## PRE -0.6294190 -0.1051983  0.1205012 -0.22466329 -0.046333911 -0.2721230
## TP   0.4334201 -0.4068030 -0.3134614  0.39496956 -0.417375967  0.3882390
## IF   0.0528649 -0.6449770 -0.5309835  0.05905742 -0.502641310  0.1503706
## TIA -0.2922978 -0.1886803  0.1464763 -0.17380813 -0.225431272 -0.1288262
## PIB -0.4069611  0.7743432  0.3662973 -0.42244481  0.454933653 -0.2089917
## DCN  1.0000000 -0.2873434 -0.3724659  0.25384737 -0.282634952  0.2272825
## TA  -0.2873434  1.0000000  0.4109741 -0.52059632  0.642489426 -0.3250518
## TU  -0.3724659  0.4109741  1.0000000  0.10779722  0.428041712 -0.1912426
## PA   0.2538474 -0.5205963  0.1077972  1.00000000 -0.471579395  0.3341780
## DIC -0.2826350  0.6424894  0.4280417 -0.47157940  1.000000000 -0.4321974
## CE   0.2272825 -0.3250518 -0.1912426  0.33417804 -0.432197352  1.0000000
## BC  -0.2054331  0.4652408  0.2408949 -0.14883584 -0.004771674  0.3188592
##               BC
## TCD -0.179541815
## PRE -0.054347609
## TP  -0.075324537
## IF  -0.282181485
## TIA  0.095660622
## PIB  0.445624048
## DCN -0.205433064
## TA   0.465240769
## TU   0.240894941
## PA  -0.148835840
## DIC -0.004771674
## CE   0.318859206
## BC   1.000000000
library(corrplot)
## Warning: package 'corrplot' was built under R version 4.4.3
## corrplot 0.95 loaded
library(psych)
## Warning: package 'psych' was built under R version 4.4.3
## 
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha
library(Hmisc)
## Warning: package 'Hmisc' was built under R version 4.4.3
## 
## Attaching package: 'Hmisc'
## The following object is masked from 'package:psych':
## 
##     describe
## The following objects are masked from 'package:base':
## 
##     format.pval, units

Visualization

# Diverging palette: red for negative, white for zero, blue for positive correlations
col = colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))
corrplot(mat_cor, method="color", col=col(200),
         type="upper", order="hclust",
         addCoef.col = "black", # Add the correlation coefficients
         tl.col="black", tl.srt=90, # Rotate the text labels
         sig.level = 0.1, insig = "blank",
         # Hide the correlation coefficients on the diagonal
         diag=FALSE)

This correlation matrix presents the relationships between different variables through a gradient of colors ranging from red (strong negative correlation) to blue (strong positive correlation).

A strong positive correlation, close to 1, indicates that the two variables move in the same direction, while a strong negative correlation, close to -1, means that one variable decreases while the other increases. A weak or zero correlation, close to 0, shows that there is no significant linear relationship between the variables.
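To make the reading of a coefficient concrete, here is a minimal sketch of how Pearson's r is obtained. The two vectors are hypothetical values chosen for illustration, not the study's data, and the names `r_manual` and `r_builtin` are ours.

```r
# Hypothetical toy vectors (illustration only, not the study's data)
x <- c(31, 46, 56, 67, 91)             # a literacy-like variable
y <- c(1187, 1786, 2154, 3865, 6357)   # a GDP-like variable

# Pearson's r: co-variation of x and y divided by the product of their spreads
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# Same quantity computed by base R
r_builtin <- cor(x, y)

# +1 means the two variables move in the same direction, -1 in opposite directions
direction <- sign(r_manual)
```

A coefficient is a pure number between -1 and 1, so it should be reported without a percent sign.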

  • Strong positive correlations:

Highly positively correlated variables: TCD (0.76), IF (0.72), TP (0.69), DCN (0.57), PRE (0.95), TIA (0.77), IF (0.61).

  • Strong negative correlations:

Strongly negatively correlated variables: DIC (-0.73), TA (-0.85), PIB (-0.85), DCN (-0.61).

The correlation coefficient between GDP (PIB) and the literacy rate (TA) is 0.77, which implies that countries with a high GDP generally have a high literacy rate.

The correlation coefficient between the population growth rate (TCD) and displacement due to natural disasters (DCN) is 0.73, which implies that higher population growth is strongly linked to more disaster-related displacement. Overall, the matrix shows that GDP and the population growth rate relate to literacy and disaster displacement in opposite ways: GDP is positively correlated with TA (0.77) and negatively with DCN (-0.41), whereas TCD is negatively correlated with TA (-0.54) and positively with DCN (0.73). Poor literacy rates and limited access to infrastructure are associated with weaker development. These are associations rather than proven causal effects, but they suggest that policies which raise GDP and moderate population growth could be accompanied by higher literacy rates and fewer displacements due to natural disasters.
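With only a small number of countries, a coefficient can look large without being statistically distinguishable from zero. The sketch below builds a matrix of p-values with base R's `cor.test()`. The helper name `cor_pmat` and the use of the built-in `mtcars` data are assumptions for illustration. To our understanding, the `sig.level` and `insig` options of `corrplot()` only take effect when such a matrix is passed through its `p.mat` argument.

```r
# Sketch: p-value for every pairwise Pearson correlation (base R only).
# `cor_pmat` is a hypothetical helper, not part of any package.
cor_pmat <- function(df) {
  p <- ncol(df)
  pm <- matrix(NA_real_, p, p,
               dimnames = list(colnames(df), colnames(df)))
  for (i in seq_len(p)) {
    for (j in seq_len(p)) {
      if (i != j) pm[i, j] <- cor.test(df[[i]], df[[j]])$p.value
    }
  }
  diag(pm) <- 0  # a variable is perfectly correlated with itself
  pm
}

# Built-in stand-in data, used here instead of the study's table
toy <- mtcars[, c("mpg", "hp", "wt", "qsec")]
p_mat <- cor_pmat(toy)

# Pairs that pass a 10% significance threshold
significant <- p_mat < 0.1
```

The resulting `p_mat` has the same row and column names as the correlation matrix, so it could be supplied as `p.mat = p_mat` in a `corrplot()` call.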

Inertia graph

fviz_eig(pca_1, addlabels=TRUE, hjust = -0.3) + ylim(0, 65)

Inertia decomposition

Individual study

The graph above shows the number of axes to be selected by applying the elbow method, which will enable us to determine the optimum number of dimensions to retain.

The first two dimensions (Dim1 and Dim2) together explain 58% of the total variance:

  • Dim1 (37.4% of variance): this first dimension captures more than a third of the information present in the data. It is therefore the most important for summarizing the main trends.

  • Dim2 (20.6% of variance): it makes a significant contribution, but much less than Dim1, indicating that it represents a second source of variability in the data.

With 58% of variance explained, these two dimensions already provide a good representation of the data, although 42% of the information remains unexplained. In practice, visualization on a plane (Dim1-Dim2) provides a reasonable projection of individuals and variables, which is often sufficient to interpret underlying structures.

Limiting ourselves to Dim1 and Dim2 provides an efficient synthesis of the data, retaining the main sources of variation. This makes it possible to identify major trends without burdening the analysis.
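The percentages read off the inertia graph can be recomputed from the eigenvalues of the PCA. The following is a minimal sketch with base R's `prcomp()` on the built-in `USArrests` data, used as a stand-in for the study's table. The 80% threshold is an arbitrary value chosen for the example.

```r
# Standardized PCA on stand-in data (not the study's dataset)
pca_toy <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Eigenvalues are the squared standard deviations of the components
eig <- pca_toy$sdev^2

# Share of total inertia carried by each axis, and the running total
pct <- 100 * eig / sum(eig)
cum_pct <- cumsum(pct)

# Smallest number of axes whose cumulated share reaches the chosen threshold
threshold <- 80
n_axes <- which(cum_pct >= threshold)[1]
```

The elbow method looks at `pct` for the point where the drop between successive axes flattens, while a cumulative threshold looks at `cum_pct`. The two criteria do not always select the same number of axes.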

Description of variables and individuals in relation to the different axes:

fviz_contrib(pca_1, choice = "ind", axes = 1)

fviz_contrib(pca_1, choice = "var", axes = 1)

fviz_contrib(pca_1, choice = "ind", axes = 2)

fviz_contrib(pca_1, choice = "var", axes = 2)
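The contribution bars drawn by `fviz_contrib()` can be understood from first principles. The sketch below recomputes them with base R on the built-in `USArrests` data (a stand-in for the study's table), assuming a standardized PCA. The reference threshold corresponds to the contribution each variable would have if all contributed equally.

```r
# Standardized PCA on stand-in data (not the study's dataset)
pca_toy <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Contribution of each variable to axis 1 (in %): its squared loading
var_contrib <- 100 * pca_toy$rotation[, 1]^2

# Contribution of each individual to axis 1 (in %):
# its squared coordinate relative to the sum over all individuals
ind_contrib <- 100 * pca_toy$x[, 1]^2 / sum(pca_toy$x[, 1]^2)

# Expected contribution if every variable weighed the same
var_threshold <- 100 / ncol(USArrests)

# Variables contributing more than average to the axis
top_vars <- names(var_contrib)[var_contrib > var_threshold]
```

Individuals or variables above the average contribution are the ones that matter most when naming an axis.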

Visualization of individuals and variables on the factorial plane

Our study is based on the graph of individuals and the graph of variables.

Description of the 1:2 plane

fviz_pca_ind(pca_1, col.ind = "contrib", gradient.cols = c("blue" , "green" , "red"), repel = TRUE)

Plot of individuals on the factorial plane without NIGERIA

Labeled individuals are those with the greatest contribution to the construction of the plane.

Dimension 1 contrasts individuals such as NER, MLI, BFA and MRT (on the right of the graph, characterized by a strongly positive coordinate on the axis) with individuals such as CPV, CIV and GHA (on the left of the graph, characterized by a strongly negative coordinate on the axis).

The group to which NER, MLI, MRT and BFA belong (characterized by a positive coordinate on the axis) shares :

  • high values for the CE, TCD and DCN variables (from most extreme to least extreme).
  • low values for the DIC variable.

The group to which the CPV, CIV and GHA individuals belong (characterized by a negative coordinate on the axis) shares :

  • high values for the variables GDP, TA, TU and DIC (from the most extreme to the least extreme).
  • low values for the variable IF.

Dimension 2 contrasts individuals such as SLE, GIN and LBR (at the top of the graph, characterized by a strongly positive coordinate on the axis) with individuals such as NER, MLI and MRT (at the bottom of the graph, characterized by a strongly negative coordinate on the axis).

The group to which SLE, GIN and LBR belong (characterized by a positive coordinate on the axis) shares:

  • high values for the PRE, TIA and IF variables (from most extreme to least extreme).

The group to which NER, MLI and MRT belong (characterized by a negative coordinate on the axis) shares:

  • high values for the CE, TCD and DCN variables (from most extreme to least extreme).
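The position of an individual on an axis is a weighted sum of its standardized variables, which is why individuals on the same side of an axis share high or low values on the same variables. A minimal sketch, assuming a standardized PCA and using the built-in `USArrests` data as a stand-in for the study's table:

```r
# Standardize the variables (mean 0, standard deviation 1)
X <- scale(USArrests)

# Standardized PCA on the same stand-in data
pca_toy <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Coordinates of individuals: standardized data times the loadings
scores_manual <- X %*% pca_toy$rotation

# Individuals with a positive / negative coordinate on the first axis
positive_side <- rownames(X)[scores_manual[, 1] > 0]
negative_side <- rownames(X)[scores_manual[, 1] < 0]
```

The sign of an axis is arbitrary in PCA, so only the opposition between the two sides is meaningful, not which side is labeled positive.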

fviz_pca_biplot(pca_1, repel = TRUE,col.var = "blue",col.ind = "red")

Biplot of individuals and variables

This biplot from our Principal Component Analysis (PCA) shows simultaneously:

  • the variables (blue arrows) that influence the principal axes;
  • the individuals (countries in red) positioned according to their values on these dimensions.

Dim1 (37.4%) captures more than a third of the data variance, and Dim2 (20.6%) explains an additional share of about one fifth. These axes allow us to classify countries according to structural factors.

Description of the factorial plane (Axis 1 and Axis 2)

Interpretation of dimension 1

Axis 1 contrasts two groups of individuals. NER, MLI and MRT (located on the right of the graph) are characterized by high values for the CE, TCD and DCN variables and low values for the DIC variable.

CPV, CIV and GHA (located on the left) have high values for GDP, TA, TU and DIC and low values for IF.

The high correlation between these variables and the first component suggests that this axis can be interpreted as an opposition between countries with higher GDP, literacy and urbanization (CPV, CIV, GHA) and countries with faster population growth and more displacement due to natural disasters (NER, MLI, MRT).

Interpretation of dimension 2

Axis 2 contrasts SLE, GIN and LBR (at the top of the graph), characterized by high values for PRE, TIA and IF, with NER, MLI and MRT (at the bottom of the graph), which have high values for CE, TCD and DCN and low values for DIC.

This dimension seems to distinguish countries with high precipitation, food inflation and IF values (SLE, GIN, LBR) from countries whose profile is dominated by population growth and displacement due to natural disasters (NER, MLI, MRT).

Quality of variables

cos2 graph

fviz_pca_var(pca_1, col.var = "cos2" , gradient.cols = c("blue" , "green" , "red"), repel = TRUE )

This graph shows the links between variables and their contribution to the main axes (Dim1 and Dim2).

  • Dim1 (37.4%): it accounts for more than a third of the total information, which means it is the main direction of data variation.
  • Dim2 (20.6%): it represents about one fifth of the information, a second important direction, but less influential than Dim1.

The variables strongly projected on Dim1 (horizontal axis) are:

Positively correlated (right):

  • DCN (displacement due to natural disasters)
  • TCD (population growth rate)
  • TP (poverty rate)
  • IF (hunger index)

Negatively correlated (left):

  • GDP (gross domestic product)
  • TA (literacy rate)

The variables projected onto Dim2 (vertical axis) are:

Positively correlated (top):

  • TIA (food inflation rate)
  • PRE (precipitation)
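The cos2 coloring in the variable graph measures how well each variable is represented on the plane. As a minimal sketch, assuming a standardized PCA and using the built-in `USArrests` data in place of the study's table, cos2 can be computed as the squared correlation between a variable and a component:

```r
# Standardized stand-in data (not the study's dataset)
X <- scale(USArrests)
pca_toy <- prcomp(X)

# Correlation of every variable with every principal component
var_cor <- cor(X, pca_toy$x)

# cos2: squared correlation, i.e. share of the variable captured by each axis
cos2 <- var_cor^2

# Quality of representation on the Dim1-Dim2 plane (between 0 and 1)
quality_plane <- rowSums(cos2[, 1:2])
```

A variable whose `quality_plane` is close to 1 has an arrow almost touching the correlation circle and can be interpreted safely. A variable with a low value is mostly explained by axes that are not displayed.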

Classification of individuals

  • Ascending hierarchical classification of individuals.
res.PCA<-PCA(data_1,graph=FALSE)
res.HCPC<-HCPC(res.PCA,nb.clust=3,consol=FALSE,graph=FALSE)
plot.HCPC(res.HCPC,choice='map',draw.tree=FALSE,title='Factorial plane')

Hierarchical tree on the factorial plane.

plot.HCPC(res.HCPC,choice='3D.map',ind.names=FALSE,centers.plot=FALSE,angle=60,title='Hierarchical tree on the factorial plane')

library(corrplot)
library("clusterSim")
## Warning: package 'clusterSim' was built under R version 4.4.3
## Loading required package: cluster
## Loading required package: MASS

The classification carried out on the individuals reveals 3 classes.

Class 1 is made up of individuals such as CPV, CIV and GHA. This group is characterized by :

  • high values for the variables GDP, TA and BC (from the most extreme to the least extreme).
  • low values for the variable IF.

Class 2 is made up of individuals such as GIN, LBR and SLE. This group is characterized by: high values for the PRE variable.

Class 3 is made up of individuals such as MLI, MRT and NER. This group is characterized by :

  • high values for the DCN and TCD variables (from most extreme to least extreme).
  • low values for PRE and TU (from most extreme to least extreme).

Analysis and interpretation

Class 1 is made up of individuals such as CPV, CIV, GHA, GMB, BEN and SEN.

This group is characterized by:

  • high values for the variables GDP, TA and DIC (from the most extreme to the least extreme), due to their geographical position and openness to the sea.
  • low values for the variables IF and TP (from the most extreme to the least extreme).

Class 2 is made up of individuals such as MLI, GIN, TGO, MRT, SLE, LBR and NER.

This group is characterized by:

  • high values for the variable IF.
  • low values for the variables GDP, TU and TA (from the most extreme to the least extreme).

Class 3 is made up of NGA.

This group is characterized by:

  • high values for the variables DCN, PA, BC and TIA (from most extreme to least extreme).

Summary

This dendrogram is derived from a hierarchical ascending classification (HAC) and groups countries according to their similarities on several socioeconomic variables.

The tree reveals three main groups:

Cluster 1 (black) includes CPV, CIV, GHA

Cluster 2 (red) includes SLE, GIN, LBR, BFA, GNB, GMB, SEN, TGO, BEN

Cluster 3 (green) includes NER, MLI, MRT

The height of the merges indicates the similarity between countries:

  • Niger and Burkina Faso are very close, suggesting strong common features.
  • The red cluster (Mali, Côte d’Ivoire, Guinea, Gambia) is merged at an intermediate height, indicating moderate similarities.
  • The final clustering is at a high height, suggesting that the three major groups identified are quite distinct from each other.

Cutting the tree into three groups seems coherent, indicating that these countries have significant commonalities within each cluster.
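The logic of cutting a tree into three groups can be reproduced with base R. The sketch below is an approximation of what `HCPC()` does, under the assumption that it applies Ward's criterion to the PCA coordinates. It uses the built-in `USArrests` data as a stand-in and keeps only the first two axes for simplicity.

```r
# PCA coordinates of individuals on the first two axes (stand-in data)
scores <- prcomp(USArrests, center = TRUE, scale. = TRUE)$x[, 1:2]

# Ascending hierarchical classification with Ward's criterion
tree <- hclust(dist(scores), method = "ward.D2")

# Cut the tree into three classes
groups <- cutree(tree, k = 3)
class_sizes <- table(groups)

# Heights of the last merges: a large jump supports cutting just below it
last_heights <- tail(tree$height, 3)
```

Comparing `last_heights` gives a quick check on the choice of three classes. If the final two merges happen at much greater heights than the one before, the three groups are well separated.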

Linear regression

library(car)
## Warning: package 'car' was built under R version 4.4.3
## Loading required package: carData
## Warning: package 'carData' was built under R version 4.4.3
## 
## Attaching package: 'car'
## The following object is masked from 'package:psych':
## 
##     logit
library(carData)
library(corrplot)
library("clusterSim")
library(DataExplorer)
## Warning: package 'DataExplorer' was built under R version 4.4.3
library(factoextra)
library(FactoInvestigate)
## Warning: package 'FactoInvestigate' was built under R version 4.4.3
attach(data)
summary(data)
##       TCD             PRE               TP              IF       
##  Min.   :1.000   Min.   : 111.0   Min.   :10.00   Min.   :11.00  
##  1st Qu.:2.000   1st Qu.: 624.5   1st Qu.:13.75   1st Qu.:21.00  
##  Median :2.000   Median :1118.0   Median :25.00   Median :24.50  
##  Mean   :2.188   Mean   :1116.8   Mean   :22.56   Mean   :23.56  
##  3rd Qu.:2.250   3rd Qu.:1386.5   3rd Qu.:26.25   3rd Qu.:28.00  
##  Max.   :3.000   Max.   :2654.0   Max.   :51.00   Max.   :33.00  
##       TIA              PIB            DCN                TA      
##  Min.   : 1.000   Min.   :1187   Min.   :    340   Min.   :31.0  
##  1st Qu.: 5.750   1st Qu.:1786   1st Qu.:   2000   1st Qu.:46.5  
##  Median : 7.000   Median :2154   Median :   5300   Median :56.0  
##  Mean   : 8.562   Mean   :2982   Mean   : 160189   Mean   :57.5  
##  3rd Qu.:10.000   3rd Qu.:3865   3rd Qu.:  17750   3rd Qu.:67.0  
##  Max.   :20.000   Max.   :6357   Max.   :2437000   Max.   :91.0  
##        TU                 PA                DIC               CE       
##  Min.   :0.000400   Min.   :   54765   Min.   :  4.00   Min.   :0.000  
##  1st Qu.:0.005775   1st Qu.:  492962   1st Qu.: 18.75   1st Qu.:1.000  
##  Median :0.008750   Median : 1243414   Median : 37.50   Median :2.500  
##  Mean   :0.011875   Mean   : 3546910   Mean   : 37.81   Mean   :2.812  
##  3rd Qu.:0.016825   3rd Qu.: 2163639   3rd Qu.: 48.50   3rd Qu.:5.000  
##  Max.   :0.033500   Max.   :37941470   Max.   :100.00   Max.   :7.000  
##        BC          
##  Min.   :-5457.00  
##  1st Qu.:-1694.25  
##  Median : -880.00  
##  Mean   :  -69.94  
##  3rd Qu.:  592.50  
##  Max.   : 8203.00
# Fit the model and display its results
modele <- lm(formula=PIB~TCD+TP+TIA+TA+TU+DIC+CE , data = data)
summary(modele)
## 
## Call:
## lm(formula = PIB ~ TCD + TP + TIA + TA + TU + DIC + CE, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1437.2  -481.8   132.0   528.5  1163.6 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  2409.08    2574.90   0.936   0.3769  
## TCD          -819.97     746.58  -1.098   0.3040  
## TP            -53.20      34.14  -1.558   0.1578  
## TIA           -19.26      70.03  -0.275   0.7902  
## TA             57.61      22.96   2.510   0.0364 *
## TU          14560.75   39497.98   0.369   0.7220  
## DIC           -11.96      16.99  -0.704   0.5014  
## CE            248.21     177.87   1.395   0.2004  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1130 on 8 degrees of freedom
## Multiple R-squared:  0.7486, Adjusted R-squared:  0.5286 
## F-statistic: 3.403 on 7 and 8 DF,  p-value: 0.05373
vif(modele)
##      TCD       TP      TIA       TA       TU      DIC       CE 
## 1.937739 1.457476 1.521391 2.049455 1.775578 2.265896 1.845065
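library(ggplot2)
donnees = data.frame(data)
# Pair each observed GDP (PIB) value with the value fitted by the model
data_vis= data.frame(valeurs_reelles=donnees$PIB,predictions=predict(modele))
ggplot(data_vis,aes(x=valeurs_reelles,y=predictions))+
  geom_point()+
  geom_smooth(method="lm",se=FALSE,color="blue")+
  labs(x="Actual values",y="Predictions")+
  # ggtitle() expects a character string, not the fitted model object
  ggtitle("Predicted vs. actual GDP (PIB)")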
## `geom_smooth()` using formula = 'y ~ x'

The graph shows that the model predictions follow a linear trend in relation to the actual values. However, there are significant deviations at some points, indicating an imperfect fit.

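Blue line: straight line fitted through the points, summarizing how closely the predictions track the actual values.

Black dots: one point per country, placing the actual GDP value on the horizontal axis and the model's prediction on the vertical axis.

The explanatory variables retained in the model equation are TCD, TIA, TA and TU.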

The model equation is

$$Y = 2716.08 - 625.29\,\mathrm{TCD} - 54.08\,\mathrm{TIA} + 55.40\,\mathrm{TA} + 8168.40\,\mathrm{TU}$$

Multiple linear regression seeks to explain a dependent variable (or response) as a function of several explanatory variables (or predictors). The coefficients are interpreted under the economic principle of "all other things being equal", meaning that all parameters remain constant except the one under study.

In other words, each coefficient represents the effect of an explanatory variable when the others are held constant.

If a coefficient is positive, the dependent variable tends to increase when the explanatory variable increases. If it is negative, the dependent variable tends to decrease.

Our dependent variable is GDP. It increases with TU (Urbanization Rate) and TA (Literacy Rate) because they are assigned positive coefficients.
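The "all other things being equal" reading can be checked numerically with the model equation given above. In the sketch below, the function name `predict_gdp` is hypothetical, and the input values are the medians from the `summary(data)` output, used only as a plausible reference point.

```r
# Model equation from the text, wrapped in a hypothetical helper function
predict_gdp <- function(TCD, TIA, TA, TU) {
  2716.08 - 625.29 * TCD - 54.08 * TIA + 55.40 * TA + 8168.40 * TU
}

# Reference country profile: medians reported by summary(data)
base <- predict_gdp(TCD = 2, TIA = 7, TA = 56, TU = 0.00875)

# Same profile with the literacy rate one point higher, everything else fixed
plus_one_ta <- predict_gdp(TCD = 2, TIA = 7, TA = 57, TU = 0.00875)

# The difference is exactly the TA coefficient
effect_ta <- plus_one_ta - base
```

Whatever reference profile is chosen, `effect_ta` equals the coefficient of TA, which is what holding the other variables constant means in a linear model.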

Conversely, GDP tends to decrease when TCD and TIA increase, because they are assigned negative coefficients. TU (Urbanization Rate) and TCD (Demographic Growth Rate) carry the largest coefficients in absolute value, although the size of a coefficient also depends on the unit of its variable.

For the full model displayed above, the multiple R-squared is 0.7486 (adjusted R-squared: 0.5286), so the model explains about 75% of the variance of GDP. Only TA is significant at the 5% level (p = 0.0364), and the overall F-test is marginal (p = 0.054).

For reference, the first two axes of the PCA express 58% of the total inertia of the dataset.
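The fit statistics printed by `summary(modele)` are linked to one another, and the link can be verified from the reported multiple R-squared alone. The sketch below assumes 16 observations, which follows from the 8 residual degrees of freedom and the 7 predictors shown in the output.

```r
# Values read from the summary(modele) output
r2 <- 0.7486   # multiple R-squared
n  <- 16       # observations (8 residual df + 7 predictors + intercept)
p  <- 7        # number of predictors

# Residual degrees of freedom
df_resid <- n - p - 1

# Adjusted R-squared penalizes the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid

# Global F statistic and its p-value
f_stat  <- (r2 / p) / ((1 - r2) / df_resid)
p_value <- pf(f_stat, p, df_resid, lower.tail = FALSE)
```

Up to rounding, these values reproduce the adjusted R-squared, F statistic and p-value shown in the regression output. The gap between the two R-squared values reflects the large number of predictors relative to the number of countries.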

Summary

The PCA revealed two main axes explaining 58% to 62% of the total variability, depending on the inclusion or exclusion of the atypical individual.

The classification highlighted three distinct groups corresponding to industrialized economies, transitioning economies, or economies heavily reliant on natural resources.

The results emphasize the importance of integrating complementary methods such as Discriminant Analysis or supervised clustering models to validate the obtained segmentation and refine the interpretation of the economic and social relationships between individuals.

CONCLUSION

Our study examined the causes of the persistence of famine in West Africa at the environmental and socio-economic levels. It emerges that the variables GDP, TA, IF, TP and TCD strongly influence the persistence of famine in countries such as Cape Verde, Niger and Ghana, while variables such as PRE, IF and TIA drive it in Sierra Leone (SLE), Niger (NER), Guinea (GIN) and Liberia (LBR). As a perspective, data at the institutional and political levels should be collected to deepen the study.

Appendices

Questionnaire for Data Collection

Questionnaire link : https://ee.kobotoolbox.org/x/J8loFcLd