COMMENTS MADE DURING THE DEFENSE
This study investigates the underlying factors contributing to food insecurity in West Africa, a region severely affected by hunger despite its abundant natural and demographic resources. Drawing on data from sources such as Our World in Data, the FAO, and the World Bank, the research combines quantitative and qualitative analyses, using statistical techniques such as Principal Component Analysis (PCA), Hierarchical Ascending Classification (HAC), and linear regression in RStudio. Complementary tools such as QGIS, KoboToolbox, and Zotero supported spatial visualization, data collection, and literature review.
The study identifies key drivers of food insecurity, including high population growth, poverty, food inflation, low literacy rates, and climatic factors such as precipitation and natural disasters. Nigeria emerged as an outlier due to its distinct socio-economic profile. By grouping countries into three classes based on structural similarities, the research highlights regional disparities in development, urbanization, and vulnerability to food crises. The results emphasize the need for targeted policies, investment in infrastructure, education, and improved governance to enhance food security and resilience in the region.
In 2023, some 733 million people worldwide were suffering from hunger, including one in five in Africa, which represents around 146 million people on the continent [1]. Thus, despite its natural and demographic wealth, Africa continues to face a major food challenge, with alarming figures for undernourishment and food insecurity. One of the most striking indicators of this food crisis in West Africa is the rate of severe acute malnutrition, which is reaching critical levels in several of the region’s countries, notably Burkina Faso, Niger, Chad and Mali.
According to official estimates, the number of chronically undernourished people could reach 582 million by 2030, half of them in Africa [2] if current trends continue.
Regional disparities are also a cause for concern: in 2022, the prevalence of undernourishment varied from 7.5% in North Africa to around 29% in Central and East Africa [3]. Furthermore, in West and Central Africa, over 40 million people were struggling to feed themselves in 2024, a figure that could rise to 52 million by mid-2025 [4].
In contrast, Asia, which once had almost 750 million food-insecure people, has succeeded in significantly reducing malnutrition since the 1990s, thanks to agricultural reforms and massive investment. This contrast underlines the urgency of understanding and acting on the underlying factors of food insecurity in Africa.
In this global and African context, the Sustainable Development Goals (SDGs), adopted by the United Nations General Assembly in 2015, have brought new momentum to the fight against food insecurity. SDG 2, “Zero Hunger”, aims to eradicate hunger, improve nutrition and promote sustainable agriculture by 2030. This global objective underlines the importance of concerted action to improve food production, reduce inequalities and strengthen resilience in the face of climate crises and conflicts.
As part of our Research and Information Processing project, we have therefore decided to take an interest in this situation, and in particular to: “determine and estimate the real impact of the factors discussed in the literature on food insecurity in West Africa”.
We will analyze and highlight the correlations between these factors and the prevailing food insecurity in West Africa.
Specifically, we will:
• Identify factors through a literature review;
• Study the impact of the determining factors (population growth and urbanization, poverty level, food inflation, coups d’état and trade balance, educational levels);
• Analyze and interpret the results.
Food insecurity in West Africa is a complex problem affecting millions of people. The problem is intrinsically linked to armed conflict, climate change, desertification and economic inequality, according to the literature.
In the same vein, Mueller and Brockerhoff [2] present another aspect of the war. They highlight a particular group: the internally displaced, estimated by the European Commission at around 3.3 million people in the Sahel region by 2020. These households, already in a precarious situation, are deprived of basic and even financial resources. They are thus exposed to a severe form of insecurity.
In addition, the security situation is having a major impact on food prices. Traoré and Keita [3] note that prices have soared due to the war, making food inaccessible for a large part of the population.
In addition to security, there is a natural factor: climate change, which is worsening food insecurity, especially in West Africa. Climatic variability, including frequent droughts and floods, leads to significant reductions in agricultural yields, according to Jalloh and Nelson [4]. By way of illustration, the last decade in Burkina Faso, albeit humid, has seen disparities that have led to yield reductions of around 20% nationwide and 40% in the southern zone [5]. Furthermore, Salack and Giannini [5] point out that extreme climatic conditions, such as prolonged drought, have devastating effects on food production. Ogunbameru and Adeola [6] propose adaptation strategies, such as the use of drought-resistant seeds and crop diversification, to strengthen people’s resilience in the face of these challenges. Diarra and B. [7] confirm that water resource management is crucial to cope with the effects of prolonged droughts, particularly in Mali. Desertification, particularly in the Sahel region, contributes to land degradation and limits access to arable land, as Diouf and Sow [8] point out. They describe the impact of desertification on agricultural yields and food production capacity, and stress the need for appropriate policies to combat soil degradation and protect farmland from erosion.
Economic and social inequalities play an important role in food insecurity. Bates and Collier [9] have shown that economically unstable countries are often the most affected by malnutrition, due to insufficient resources devoted to agriculture. Smith and Haddad [10] stress the importance of post-conflict agricultural policy in stabilizing markets and restoring agricultural production. Furthermore, Traoré and Keita [3] analyze the impact of economic crises on food prices, concluding that rising prices increase the precariousness of the most vulnerable households.
Agricultural policies and financial investment are also key to tackling food insecurity. Nwosu and Odu [11] highlight the importance of microfinance mechanisms and international support to improve access to agricultural resources. The authors stress that investment in agricultural infrastructure is essential to improve long-term productivity. Smith and Haddad [10] add that financial incentives are needed to boost post-conflict agricultural production, particularly in regions that have suffered massive destruction.
The issue of food sovereignty is also crucial. Hellebrandt and Weitz [12] point out that dependence on food imports and loss of self-sufficiency increase the vulnerability of African countries to food crises. Pommier and Leblanc [13] insist on the need to promote food self-sufficiency by supporting local production and reducing external dependency.
International organizations also play an important role in the fight against food insecurity. The FAO [14] leads initiatives to improve agricultural productivity and support small-scale producers in regions particularly affected by malnutrition. The FAO’s efforts are aimed at strengthening local capacities and promoting sustainable agricultural practices to improve food security.
Research and development are also essential to solving the problem of long-term food insecurity. Jalloh and Nelson [4] suggest that innovation in agricultural technologies and drought-resistant seeds can improve the resilience of food systems to climate change.
Finally, another crucial aspect concerns the governance and implementation of agricultural policies. According to Duflo and Banerjee [15], improved local governance and good management of natural resources are essential factors in guaranteeing long-term food security in Africa.
In conclusion, food insecurity in West Africa is the result of a combination of factors linked to armed conflict, climate change, desertification and economic inequality. Research shows that to mitigate the effects of this insecurity, it is crucial to strengthen agricultural policies, encourage investment in infrastructure and promote community resilience. An integrated and sustainable approach is needed to improve food security in the region.
This study is based on a dataset from the Our World in Data platform covering key variables related to food insecurity in West Africa. The data retrieved from the website are for 2021 and include both quantitative and qualitative variables.
Source of variables: Our World in Data
Analysis period: 2021
Data type:
Quantitative: Population growth rate, Precipitation
Qualitative: Food losses, Coup d’état
The countries studied are the 16 countries of West Africa, including Sahelian and coastal countries. We chose these countries in order to understand the causes of food insecurity in West Africa as a whole and to compare them with those in the Sahelian countries.
These West African countries are: Benin, Burkina Faso, Cape Verde, Côte d’Ivoire, Gambia, Ghana, Guinea, Guinea-Bissau, Liberia, Mali, Mauritania, Niger, Nigeria, Senegal, Sierra Leone and Togo.
The variables chosen for our study can be divided into two sub-groups: active variables and supplementary variables. Active variables are those directly involved in the construction of the PCA axes. These so-called active variables actually express the main causes of food insecurity in West African countries. The other so-called supplementary or passive variables are not involved in the construction of the PCA axes, but enrich the interpretation of the PCA results.
Active variables include: poverty rate, hunger index, GDP and population growth rate. Passive variables include: Coup d’Etat, food losses and Balance of Trade.
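The difference between active and supplementary variables can be illustrated with a minimal base-R sketch. It assumes, for illustration only, four active columns (TP, IF, PIB, TA) and one supplementary column (BC), with values copied from the data table of this report. The object names `active`, `pca_active` and `supp_coord` are hypothetical, and `prcomp()` is swapped in for the FactoMineR function used by the authors.

```r
# Values copied from the data table of this report (16 countries).
countries <- c("BEN", "BFA", "CPV", "CIV", "GMB", "GHA", "GIN", "GNB",
               "LBR", "MLI", "MRT", "NER", "NGA", "SEN", "SLE", "TGO")
active <- data.frame(
  TP  = c(13, 25, 12, 10, 17, 25, 14, 26, 28, 21, 25, 51, 31, 10, 26, 27),
  IF  = c(22, 25, 11, 22, 18, 15, 28, 28, 33, 25, 23, 28, 28, 16, 31, 24),
  PIB = c(3322, 2176, 6357, 5316, 2077, 5421, 2640, 1831,
          1423, 2121, 1652, 1187, 4923, 3512, 1615, 2131),
  TA  = c(47, 34, 91, 90, 59, 80, 45, 54, 48, 31, 67, 38, 62, 58, 49, 67),
  row.names = countries
)
BC <- c(-3330, -995, -1055, 2241, -464, 6219, 2073, 63,
        -1036, -1617, 99, -1926, 8203, -3372, -5457, -765)

# The axes are built from the active variables only.
pca_active <- prcomp(active, center = TRUE, scale. = TRUE)

# A supplementary variable is positioned afterwards, through its correlation
# with the country scores; it has no influence on the axes themselves.
supp_coord <- cor(BC, pca_active$x[, 1:2])
print(round(supp_coord, 2))
```

In FactoMineR, the same idea is expressed by passing the column indices of the passive variables to the `quanti.sup` argument of `PCA()`.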
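# Set the working directory to the project folder (path specific to the authors' machine)
setwd("D:/MASTER 2024-2025/2IE/P_RTI/RTI-G02")
# Read the semicolon-separated file; decimals use a comma and the first column holds the country codes
data <- read.csv(file = "donnees4.csv", header = TRUE, sep = ";", quote = "\"",
                 dec = ",", row.names = 1)
data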
## TCD PRE TP IF TIA PIB DCN TA TU PA DIC CE BC
## BEN 3 1049 13 22 6 3322 6900 47 0.0178 1175297 42 5 -3330
## BFA 2 831 25 25 7 2176 2400 34 0.0058 2086893 6 5 -995
## CPV 1 188 12 11 1 6357 750 91 0.0148 54765 100 0 -1055
## CIV 2 1299 10 22 8 5316 2500 90 0.0094 241095 45 1 2241
## GMB 2 996 17 18 10 2077 7000 59 0.0335 2555332 72 1 -464
## GHA 2 1210 25 15 10 5421 2700 80 0.0244 1311530 36 5 6219
## GIN 2 1791 14 28 16 2640 340 45 0.0057 197266 26 2 2073
## GNB 2 1649 26 28 5 1831 410 54 0.0070 2561140 19 4 63
## LBR 2 2450 28 33 7 1423 3700 48 0.0068 507043 61 2 -1036
## MLI 3 329 21 25 5 2121 24000 31 0.0012 2018765 4 3 -1617
## MRT 3 111 25 23 6 1652 23000 67 0.0004 450720 59 5 99
## NER 3 184 51 28 7 1187 23523 38 0.0010 2393877 14 4 -1926
## NGA 2 1187 31 28 20 4923 2437000 62 0.0277 37941470 18 7 8203
## SEN 2 723 10 16 3 3512 12000 58 0.0081 1622980 43 0 -3372
## SLE 2 2654 26 31 17 1615 800 49 0.0099 802371 39 0 -5457
## TGO 2 1217 27 24 9 2131 16000 67 0.0165 830017 21 1 -765
PCA is a statistical method used in data analysis. It enables us to reduce the dimensions of the data while identifying the main factors explaining the differences between the different individuals studied. It condenses the information derived from the active variables into a limited number of principal axes, thereby preserving most of the variance.
The steps involved in PCA are as follows:
• Data pre-processing: In this first stage, the quantitative variables are standardized (centered and scaled to unit variance) so that variables measured in different units contribute comparably to the analysis. The analysis is carried out in R using the Factoshiny package;
• Extracting the principal axes: In this second stage, the eigenvalues and eigenvectors of the correlation matrix of the active variables are calculated. The eigenvectors give the directions of the principal axes, and the eigenvalues give the proportion of variance carried by each axis. The principal axes with the greatest proportion of variance are then selected;
• Axis selection: The selection of principal axes is based on the inertia of the factorial axes, which measures the percentage of variability expressed by each axis. We first assess the variance expressed by each axis, then retain the axes that together express a large enough share of the variability for the analysis. The inertia obtained is compared with a reference value obtained by random simulation, to ensure the reliability of the axes chosen;
• Displaying results: To display the results, we used the Factoshiny package to generate biplots representing the individuals studied and the variables. These graphs are used to identify countries that are strongly influenced by the main variables.
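The steps above can be sketched in base R. This is a minimal illustration and not the authors' Factoshiny workflow: it assumes a three-variable subset (IF, PIB, TA) copied from the data table above, and the names `X`, `Z`, `eig`, `inertia_pct` and `scores` are hypothetical.

```r
# Illustrative subset copied from the data table of this report (16 countries).
X <- cbind(
  IF  = c(22, 25, 11, 22, 18, 15, 28, 28, 33, 25, 23, 28, 28, 16, 31, 24),
  PIB = c(3322, 2176, 6357, 5316, 2077, 5421, 2640, 1831,
          1423, 2121, 1652, 1187, 4923, 3512, 1615, 2131),
  TA  = c(47, 34, 91, 90, 59, 80, 45, 54, 48, 31, 67, 38, 62, 58, 49, 67)
)

# Step 1: standardize each variable (mean 0, standard deviation 1).
Z <- scale(X)

# Step 2: eigen-decomposition of the correlation matrix.
eig <- eigen(cor(X))

# Step 3: share of the total inertia carried by each axis, in percent.
inertia_pct <- 100 * eig$values / sum(eig$values)

# Step 4: coordinates of the countries on the principal axes.
scores <- Z %*% eig$vectors

print(round(inertia_pct, 1))
```

Because the decomposition is done on a correlation matrix, the eigenvalues sum to the number of variables, which is why each one can be read directly as a share of the total inertia.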
HAC groups countries into homogeneous classes. This method complements PCA by classifying countries into distinct groups. The HAC steps are as follows:
• Calculating distances: A distance matrix (Euclidean distance) is calculated in R from the coordinates of the countries on the principal axes of the PCA. This step measures the dissimilarity between countries.
• Merging individuals: The countries closest to each other in the space of the principal axes are grouped step by step into clusters.
• Building a dendrogram: Finally, a dendrogram is generated to visualize the hierarchical groupings of countries.
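The three steps above can be sketched with base R functions. This is a minimal illustration under stated assumptions: a three-variable subset (IF, PIB, TA) copied from the data table, coordinates taken from `prcomp()`, and Ward's criterion chosen here as the merging rule. The object names are hypothetical.

```r
# Illustrative subset copied from the data table of this report (16 countries).
countries <- c("BEN", "BFA", "CPV", "CIV", "GMB", "GHA", "GIN", "GNB",
               "LBR", "MLI", "MRT", "NER", "NGA", "SEN", "SLE", "TGO")
X <- data.frame(
  IF  = c(22, 25, 11, 22, 18, 15, 28, 28, 33, 25, 23, 28, 28, 16, 31, 24),
  PIB = c(3322, 2176, 6357, 5316, 2077, 5421, 2640, 1831,
          1423, 2121, 1652, 1187, 4923, 3512, 1615, 2131),
  TA  = c(47, 34, 91, 90, 59, 80, 45, 54, 48, 31, 67, 38, 62, 58, 49, 67),
  row.names = countries
)

# Coordinates of the countries on the first two principal axes.
coords <- prcomp(X, scale. = TRUE)$x[, 1:2]

# Step 1: Euclidean distance matrix between countries.
d <- dist(coords, method = "euclidean")

# Step 2: merge the closest countries or clusters step by step
# (Ward's criterion is an assumption of this sketch).
tree <- hclust(d, method = "ward.D2")

# Step 3: draw the dendrogram and cut it into three classes.
plot(tree, main = "Dendrogram (illustrative)")
classes <- cutree(tree, k = 3)
print(classes)
```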
Linear regression is a statistical method used to model and analyze the relationship between a dependent variable and one or more independent explanatory variables.
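As an illustration of the method, the sketch below fits a simple linear regression of the hunger index (IF) on GDP (PIB) with `lm()`, using values copied from the data table above. The choice of these two variables is an assumption made for the example and is not presented as the model retained in this study.

```r
# Values copied from the data table of this report (16 countries).
d <- data.frame(
  IF  = c(22, 25, 11, 22, 18, 15, 28, 28, 33, 25, 23, 28, 28, 16, 31, 24),
  PIB = c(3322, 2176, 6357, 5316, 2077, 5421, 2640, 1831,
          1423, 2121, 1652, 1187, 4923, 3512, 1615, 2131)
)

# Model: IF = b0 + b1 * PIB + error, estimated by ordinary least squares.
fit <- lm(IF ~ PIB, data = d)

# Intercept (b0) and slope (b1).
print(coef(fit))

# Share of the variance of IF explained by the model.
print(summary(fit)$r.squared)
```

A negative slope means that, in this sample, a higher GDP is associated with a lower hunger index; `summary(fit)` also reports the standard errors and p-values needed to judge whether that association is statistically significant.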
This questionnaire has been designed as part of a study to analyze the factors influencing food security in several countries. The main objective is to collect precise, quantifiable data on key variables such as rainfall, population growth and socio-economic indicators. This information will be integrated into a Principal Component Analysis (PCA), a Hierarchical Ascending Classification (HAC) and a Linear Regression (LR) to identify underlying dynamics and group countries into homogeneous classes.
Interest and relevance of the survey
Collecting data via this questionnaire is essential for:
• Complementing existing databases with accurate information, particularly where gaps or estimates are observed.
• Understanding regional variations in food security.
• Providing a solid evidence base for recommendations tailored to each national context.
Survey targets
The questionnaire is aimed at different types of stakeholders involved in food security, including:
• Public institutions: Ministries of Water, Agriculture and the Environment.
• International organizations: FAO, UNESCO.
• Local experts: researchers, food resource managers, economists.
• Local communities: Representatives from the agricultural, industrial and domestic sectors.
Questionnaire structure
The questionnaire is structured into several sections covering the following areas:
• General information: Identification of respondents and data sources.
• Natural factors: Precise data on precipitation and renewable water resources.
• Household information: Data on the characteristics, structure, and economic conditions of households.
• Socio-economic factors: Famine situation, gross domestic product, literacy, poverty and urbanization rates, hunger index, etc.
Implementation process
• The questionnaire will be distributed to identified targets via links generated on KoboToolbox.
• Short training courses will be given to interviewers to ensure consistent data collection.
• The data collected will be verified and centralized for integration into subsequent analyses.
This questionnaire is a key tool for filling gaps in current data and enhancing the quality of the food security analyses. Thanks to its implementation on KoboToolbox, it supports reliable, rapid and secure data collection, while making it possible to reach a wide diversity of respondents in different national contexts. The results obtained will enable us to better understand the dynamics of food security, and to guide policies adapted to improving it in each national context.
3.5 Equipment
To carry out this study, several tools were mobilized for data processing, analysis and bibliographic reference management. These tools include R software, QGIS software, the Zotero application and KoboToolbox, each meeting specific needs in the methodology adopted.
3.5.1 R software
R software, run through RStudio (version 2024.04.2), was used to perform Principal Component Analysis (PCA), Hierarchical Ascending Classification (HAC) and linear regression.
R is a powerful programming language for statistical and graphical analysis, particularly suited to the processing of complex data. For this study, the Factoshiny package was used; it offers an interactive interface that facilitates PCA, HAC and LR. The package was also used to generate visualizations, such as biplots and dendrograms, for clear interpretation of the results. The use of R supports the reproducibility of analyses and flexibility in data exploration.
3.5.2 QGIS
QGIS software (version 3.40.4) was used to process geospatial data and create thematic maps. This open-source geographic information system was used to visually represent the results of the study, in particular to illustrate differences in food insecurity between countries. The maps produced using QGIS integrated socio-economic and water data to provide a geographical perspective on the results obtained from the statistical analyses. This tool was essential for translating abstract data into understandable and intuitive visualizations.
3.5.3 Zotero
Bibliographic reference management was provided by Zotero (version 7.0.9), an open-source software package designed to collect, organize and cite academic sources. Zotero was used to centralize the references used in this study, whether scientific articles, reports or data from recognized sources such as Our World in Data. This software facilitated the automatic generation of citations and bibliographies in line with academic standards, while ensuring the traceability of sources.
The combined use of these tools ensured rigorous analysis, clear visualization of results, and efficient reference management, contributing to the quality and reproducibility of this study.
3.5.4 KoboToolbox
Data was collected via a structured questionnaire built with the ODK-based KoboToolbox application, an integrated set of free, open-source tools for mobile data collection that enables forms to be built and interview responses to be aggregated and analyzed.
It consists of an online platform (server) and a data collection application (KoboCollect).
The link to the questionnaire is given in the appendix.
The aim of this study is to analyze a dataset comprising 16 individuals and 13 variables using Principal Component Analysis (PCA) and Hierarchical Ascending Classification (HAC), in order to identify relationships between variables, reduce dimensionality and group individuals into homogeneous classes. One atypical individual was detected.
1. Observation of extreme individuals
Initial analysis revealed the presence of an atypical individual, whose values deviate significantly from the general trend. This individual has a strong influence on Axis 1 and distorts the interpretation of the results.
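One simple way to see why an individual is atypical is to standardize a variable and look for values far from the mean. The sketch below applies this to DCN (displacement due to natural disasters), with values copied from the data table above. The threshold of 3 standard deviations is a common rule of thumb assumed here, not a criterion stated by the authors, who identified the atypical individual from the PCA.

```r
# DCN values copied from the data table of this report (16 countries).
DCN <- c(BEN = 6900, BFA = 2400, CPV = 750, CIV = 2500, GMB = 7000,
         GHA = 2700, GIN = 340, GNB = 410, LBR = 3700, MLI = 24000,
         MRT = 23000, NER = 23523, NGA = 2437000, SEN = 12000,
         SLE = 800, TGO = 16000)

# Standardized score: distance from the mean, in standard deviations.
z <- (DCN - mean(DCN)) / sd(DCN)

# Assumed rule of thumb: flag values more than 3 standard deviations away.
atypical <- names(z)[abs(z) > 3]
print(atypical)
```

On these values only Nigeria (NGA) is flagged, which is consistent with its isolated position on the first axis.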
library(FactoMineR)
## Warning: package 'FactoMineR' was built under R version 4.4.3
library(factoextra)
## Warning: package 'factoextra' was built under R version 4.4.3
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.4.3
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
pca_1 = PCA(X = data, scale.unit = TRUE, ncp = 5, ind.sup = NULL,
quanti.sup = NULL, quali.sup = NULL, row.w = NULL,
col.w = NULL, graph = TRUE, axes = c(1,2))
Inertia
# Scree plot of the percentage of inertia per axis; ylim() is joined with "+" so that it applies to the plot
fviz_eig(pca_1, addlabels = TRUE, hjust = -0.3) + ylim(0, 65)
The inertia of the factorial axes indicates, on the one hand, whether the variables are structured and, on the other, suggests the appropriate number of principal components to study. A study of the eigenvalues shows that the first two axes explain 62% of the total inertia. The first axis remains the most explanatory, structuring an opposition between economies with high growth and economies with low industrial development. The cumulative inertia indicates that the information contained in the first axes is significant and not due to chance.
An estimate of the relevant number of axes to be interpreted suggests restricting the analysis to the first 2 axes. These components carry a higher level of inertia than the 0.95-quantile of random distributions (60% vs. 46.38%). This observation suggests that only these axes carry real information. Consequently, the analysis will focus on these axes.
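The comparison with the 0.95-quantile of random distributions can be sketched as follows. This is a minimal illustration under stated assumptions: the reference tables are filled with independent standard normal values, have the same size as the dataset (16 individuals, 13 variables), and 500 of them are simulated. The helper `inertia_first_two` is hypothetical, and the exact procedure behind the 46.38% reported above may differ.

```r
set.seed(123)  # fixed seed so that the illustration is reproducible
n_ind <- 16    # number of individuals (countries)
n_var <- 13    # number of variables
n_sim <- 500   # number of simulated tables (assumption)

# Percentage of inertia carried by the first two axes of a standardized PCA.
inertia_first_two <- function(X) {
  ev <- eigen(cor(X), only.values = TRUE)$values
  100 * sum(ev[1:2]) / sum(ev)
}

# Reference distribution: the same statistic computed on tables of pure noise.
sim <- replicate(
  n_sim,
  inertia_first_two(matrix(rnorm(n_ind * n_var), nrow = n_ind, ncol = n_var))
)

# Inertia that noise alone reaches or exceeds only 5% of the time.
threshold <- quantile(sim, 0.95)
print(threshold)
```

If the inertia observed on the real data exceeds this threshold, the structure on the first two axes is unlikely to be due to chance alone.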
Biplot of variables and individuals
fviz_pca_biplot(pca_1, repel = TRUE, col.var = "blue", col.ind = "red")
According to the figures above, there is a correlation between the variables and the individuals
The study will focus on two axes:
- The Dim2 axis explains 29.6% of the total variance.
Together, these two axes explain 64% of the variance, which is a good representation of the data.
The PCA graph of variables shows that:
A variable that points along an axis is strongly correlated with that dimension. In our case, TCD, IF and TP are strongly positively correlated with Dim1, while DIC, TA and PIB are strongly negatively correlated with Dim1. DC, DCN, PA and TIA are more correlated with Dim2.
The PCA graph of individuals shows that:
Nigeria is far removed from the other countries, which means it has very different characteristics. It is positioned at the far right on Dim1, indicating that it has very high values of the variables that are correlated with this axis (GDP, BC, DN, PA). This suggests that Nigeria has a higher level of development than the other countries represented.
Countries such as Mali, Niger and Guinea are far removed from Nigeria, indicating that they have very different profiles (probably very low urbanization rates and a persistently negative trade balance). Nigeria is an atypical individual, probably because it has a high GDP and TU compared with the other countries. Nigeria and Ghana are relatively close, suggesting that they have similar profiles.
In summary, Nigeria stands out strongly and appears to be the most developed of the countries analyzed.
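Hierarchical tree - Dendrogram
# Re-run the PCA without plots, then cluster the countries into 3 classes from their PCA coordinates
res.PCA <- PCA(data, graph = FALSE)
res.HCPC <- HCPC(res.PCA, nb.clust = 3, consol = FALSE, graph = FALSE)
plot.HCPC(res.HCPC, choice = 'tree', title = 'Hierarchical tree')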
The classification performed on the individuals reveals 3 classes. The classification, constructed from the first two factorial axes, captures 58% to 62% of total inertia. It reveals three main groups of individuals, each characterized by specific economic and social dynamics.
Groups identified in the graph
Group 1: Countries with high economic and trade growth
Individuals concerned: Cape Verde (CPV), Côte d’Ivoire (CIV), Ghana (GHA).
Position on graph: Upper left quadrant, with significant projections on axis 1 and axis 2.
Notable features: High GDP, high urbanization rate (TU) and high dependency on cereal imports (DIC), with a moderate hunger index (IF). Strong integration into international trade and advanced economic transition.
Group 2: Countries in demographic and economic transition
Individuals concerned: Mali (MLI), Niger (NER), Mauritania (MRT).
Position on graph: Lower right quadrant, with negative projections on axis 1 and axis 2.
Notable features: Low GDP, low urbanization rate (TU) and medium literacy rate (TA); largely agricultural economies with little industrialization.
Group 3: Nigeria, an atypical individual
Individual concerned: Nigeria (NGA).
Position on graph: Distinctly different from the other groups, with particularly strong projections on axis 1.
Notable features: High values for displacement due to natural disasters (DCN), balance of trade (BC), gross domestic product (GDP), hunger index (IF) and food inflation rate (TIA).
The graph of individuals reveals three main trends:
• Countries with high economic and trade growth (CIV, GHA, CPV), with high GDP and advanced urbanization.
• Countries in transition (MLI, NER, MRT), characterized by high fertility and limited industrialization.
• Nigeria as a special case, an atypical individual that strongly structures the factor analysis due to its resource-based economy and uneven development.
# Same import as before, from the file that excludes Nigeria (NGA)
setwd("D:/MASTER 2024-2025/2IE/P_RTI/RTI-G02")
data_1 <- read.csv(file = "dts - sans NGA.csv", header = TRUE, sep = ";", quote = "\"",
                   dec = ",", row.names = 1)
data_1
## TCD PRE TP IF TIA PIB DCN TA TU PA DIC CE BC
## BEN 3 1049 13 22 6 3322 6900 47 0.0178 1175297 42 5 -3330
## BFA 2 831 25 25 7 2176 2400 34 0.0058 2086893 6 5 -995
## CPV 1 188 12 11 1 6357 750 91 0.0148 54765 100 0 -1055
## CIV 2 1299 10 22 8 5316 2500 90 0.0094 241095 45 1 2241
## GMB 2 996 17 18 10 2077 7000 59 0.0335 2555332 72 1 -464
## GHA 2 1210 25 15 10 5421 2700 80 0.0244 1311530 36 5 6219
## GIN 2 1791 14 28 16 2640 340 45 0.0057 197266 26 2 2073
## GNB 2 1649 26 28 5 1831 410 54 0.0070 2561140 19 4 63
## LBR 2 2450 28 33 7 1423 3700 48 0.0068 507043 61 2 -1036
## MLI 3 329 21 25 5 2121 24000 31 0.0012 2018765 4 3 -1617
## MRT 3 111 25 23 6 1652 23000 67 0.0004 450720 59 5 99
## NER 3 184 51 28 7 1187 23523 38 0.0010 2393877 14 4 -1926
## SEN 2 723 10 16 3 3512 12000 58 0.0081 1622980 43 0 -3372
## SLE 2 2654 26 31 17 1615 800 49 0.0099 802371 39 0 -5457
## TGO 2 1217 27 24 9 2131 16000 67 0.0165 830017 21 1 -765
The aim of this second analysis is to analyze a dataset comprising 15 individuals and 13 variables using Principal Component Analysis (PCA) and Hierarchical Ascending Classification (HAC). The atypical individual, Nigeria, was removed in order to carry out an in-depth analysis of the remaining individuals and see how they behave in relation to each other.
The aim was to identify relationships between variables, reduce dimensionality and group individuals into homogeneous classes. The PCA was recalculated to assess the impact of this removal on the explained variance and on the structure of the groups.
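The effect of removing the atypical individual can be checked by comparing the share of inertia carried by the first two axes with and without it. The sketch below is a base-R illustration on four columns (IF, PIB, DCN, TA) copied from the data table. The helper `pct_first_two` is hypothetical, and the authors' own recalculation on all 13 variables is not reproduced here.

```r
# Illustrative subset copied from the data table of this report (16 countries).
countries <- c("BEN", "BFA", "CPV", "CIV", "GMB", "GHA", "GIN", "GNB",
               "LBR", "MLI", "MRT", "NER", "NGA", "SEN", "SLE", "TGO")
X <- data.frame(
  IF  = c(22, 25, 11, 22, 18, 15, 28, 28, 33, 25, 23, 28, 28, 16, 31, 24),
  PIB = c(3322, 2176, 6357, 5316, 2077, 5421, 2640, 1831,
          1423, 2121, 1652, 1187, 4923, 3512, 1615, 2131),
  DCN = c(6900, 2400, 750, 2500, 7000, 2700, 340, 410,
          3700, 24000, 23000, 23523, 2437000, 12000, 800, 16000),
  TA  = c(47, 34, 91, 90, 59, 80, 45, 54, 48, 31, 67, 38, 62, 58, 49, 67),
  row.names = countries
)

# Percentage of inertia carried by the first two axes of a standardized PCA.
pct_first_two <- function(d) {
  ev <- prcomp(d, scale. = TRUE)$sdev^2
  100 * sum(ev[1:2]) / sum(ev)
}

with_nga    <- pct_first_two(X)
without_nga <- pct_first_two(X[rownames(X) != "NGA", ])
print(round(c(with_nga = with_nga, without_nga = without_nga), 1))
```

Comparing the two percentages, and the positions of the remaining countries on the new axes, shows how much of the initial structure was driven by the atypical individual.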
Correlation matrix
mat_cor = cor(data_1)
mat_cor
## TCD PRE TP IF TIA PIB
## TCD 1.00000000 -0.30180353 0.39129279 0.35586241 -0.01190898 -0.5225740
## PRE -0.30180353 1.00000000 -0.04765574 0.57311477 0.65200920 -0.2148863
## TP 0.39129279 -0.04765574 1.00000000 0.50314904 0.10892514 -0.5846163
## IF 0.35586241 0.57311477 0.50314904 1.00000000 0.46292311 -0.7783963
## TIA -0.01190898 0.65200920 0.10892514 0.46292311 1.00000000 -0.2819404
## PIB -0.52257396 -0.21488634 -0.58461628 -0.77839635 -0.28194045 1.0000000
## DCN 0.72971593 -0.62941902 0.43342005 0.05286490 -0.29229785 -0.4069611
## TA -0.53978124 -0.10519827 -0.40680298 -0.64497699 -0.18868030 0.7743432
## TU -0.37188601 0.12050120 -0.31346143 -0.53098349 0.14647634 0.3662973
## PA 0.31522483 -0.22466329 0.39496956 0.05905742 -0.17380813 -0.4224448
## DIC -0.47888328 -0.04633391 -0.41737597 -0.50264131 -0.22543127 0.4549337
## CE 0.60026636 -0.27212299 0.38823898 0.15037061 -0.12882620 -0.2089917
## BC -0.17954182 -0.05434761 -0.07532454 -0.28218148 0.09566062 0.4456240
## DCN TA TU PA DIC CE
## TCD 0.7297159 -0.5397812 -0.3718860 0.31522483 -0.478883281 0.6002664
## PRE -0.6294190 -0.1051983 0.1205012 -0.22466329 -0.046333911 -0.2721230
## TP 0.4334201 -0.4068030 -0.3134614 0.39496956 -0.417375967 0.3882390
## IF 0.0528649 -0.6449770 -0.5309835 0.05905742 -0.502641310 0.1503706
## TIA -0.2922978 -0.1886803 0.1464763 -0.17380813 -0.225431272 -0.1288262
## PIB -0.4069611 0.7743432 0.3662973 -0.42244481 0.454933653 -0.2089917
## DCN 1.0000000 -0.2873434 -0.3724659 0.25384737 -0.282634952 0.2272825
## TA -0.2873434 1.0000000 0.4109741 -0.52059632 0.642489426 -0.3250518
## TU -0.3724659 0.4109741 1.0000000 0.10779722 0.428041712 -0.1912426
## PA 0.2538474 -0.5205963 0.1077972 1.00000000 -0.471579395 0.3341780
## DIC -0.2826350 0.6424894 0.4280417 -0.47157940 1.000000000 -0.4321974
## CE 0.2272825 -0.3250518 -0.1912426 0.33417804 -0.432197352 1.0000000
## BC -0.2054331 0.4652408 0.2408949 -0.14883584 -0.004771674 0.3188592
## BC
## TCD -0.179541815
## PRE -0.054347609
## TP -0.075324537
## IF -0.282181485
## TIA 0.095660622
## PIB 0.445624048
## DCN -0.205433064
## TA 0.465240769
## TU 0.240894941
## PA -0.148835840
## DIC -0.004771674
## CE 0.318859206
## BC 1.000000000
library(corrplot)
## Warning: package 'corrplot' was built under R version 4.4.3
## corrplot 0.95 loaded
library(psych)
## Warning: package 'psych' was built under R version 4.4.3
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
library(Hmisc)
## Warning: package 'Hmisc' was built under R version 4.4.3
##
## Attaching package: 'Hmisc'
## The following object is masked from 'package:psych':
##
## describe
## The following objects are masked from 'package:base':
##
## format.pval, units
Visualization
# Red-to-blue palette (defined here but not passed to corrplot() below, which uses its default colors)
col <- colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))
corrplot(mat_cor, method = "color",
         type = "upper", order = "hclust",
         addCoef.col = "black",            # add the correlation coefficients
         tl.col = "black", tl.srt = 90,    # color and rotation of the text labels
         sig.level = 0.1, insig = "blank", # only take effect when a p.mat of p-values is supplied
         diag = FALSE)                     # hide the correlation coefficients on the diagonal
This correlation matrix presents the relationships between different variables through a gradient of colors ranging from red (strong negative correlation) to blue (strong positive correlation).
A strong positive correlation, close to 1, indicates that the two variables move in the same direction, while a strong negative correlation, close to -1, means that one variable decreases as the other increases. A weak or zero correlation, close to 0, indicates little or no linear relationship between the variables.
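A single coefficient of the matrix can be recomputed and tested as follows. The sketch uses the GDP (PIB) and literacy rate (TA) values of the 15 countries kept after removing Nigeria, copied from the data table. The significance test with `cor.test()` is an addition for illustration and is not part of the authors' output.

```r
# Values for the 15 countries kept after removing Nigeria (from the data table).
PIB <- c(3322, 2176, 6357, 5316, 2077, 5421, 2640, 1831,
         1423, 2121, 1652, 1187, 3512, 1615, 2131)
TA  <- c(47, 34, 91, 90, 59, 80, 45, 54, 48, 31, 67, 38, 58, 49, 67)

# Pearson coefficient: unitless, between -1 and 1 (not a percentage).
r <- cor(PIB, TA)

# Test of the null hypothesis that the true correlation is zero.
test <- cor.test(PIB, TA)

print(round(r, 2))   # about 0.77, as in the PIB/TA cell of the matrix
print(test$p.value)
```

A small p-value indicates that a coefficient of this size is unlikely to arise by chance in a sample of 15 countries; it does not by itself establish a causal link between the two variables.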
Highly positively correlated variables: TCD (0.76), IF (0.72), TP (0.69), DCN (0.57), PRE (0.95), TIA (0.77), IF (0.61).
Strongly negatively correlated variables: DIC (-0.73), TA (-0.85), PIB (-0.85), DCN (-0.61).
The correlation coefficient between GDP (PIB) and the literacy rate (TA) is 0.77, which implies that countries with a high GDP generally have a high literacy rate.
The correlation coefficient between the population growth rate (TCD) and DCN is 0.73, which implies that a higher population growth rate is strongly associated with more displacement due to natural disasters (DCN). This matrix shows that gross domestic product (GDP) is closely linked to the literacy rate, and the population growth rate (TCD) to disaster displacement. Conversely, poor literacy rates and limited access to infrastructure are associated with weaker development. These coefficients measure association rather than causation, but they suggest that policies supporting GDP growth and managing population growth could go hand in hand with better literacy rates and fewer problems of displacement due to natural disasters.
Inertia graph
fviz_eig(pca_1, addlabels=TRUE, hjust = -0.3) +
ylim(0, 65)
Individual study
The graph above shows the number of axes to be selected by applying the elbow method, which will enable us to determine the optimum number of dimensions to retain.
The first two dimensions (Dim1 and Dim2) together explain 58% of the total variance:
Dim1 (37.4% of the variance): this first dimension captures more than a third of the information present in the data. It is therefore the most important for summarizing the main trends.
Dim2 (20.6% of the variance): this dimension makes a significant contribution, but much less than Dim1; it represents a second source of variability in the data.
With 58% of the variance explained, these two dimensions already provide a reasonable representation of the data, although 42% of the information remains unexplained. In practice, visualization on the Dim1-Dim2 plane provides a usable projection of individuals and variables, which is often sufficient to interpret underlying structures.
Limiting ourselves to Dim1 and Dim2 provides an efficient synthesis of the data, retaining the main sources of variation. This makes it possible to identify major trends without overloading the analysis.
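The percentages discussed above come directly from the eigenvalues of the PCA. As a hedged illustration, the sketch below recomputes explained and cumulative variance by hand with base R's `prcomp()`. The built-in `USArrests` data is an assumption used as a stand-in, so the numbers differ from the 37.4% and 20.6% reported in this study.

```r
# Illustrative only: USArrests stands in for the study's data set.
pca_demo <- prcomp(USArrests, scale. = TRUE)

# Eigenvalues are the squared standard deviations of the components.
eig <- pca_demo$sdev^2

# Share of total variance carried by each axis, and the running total.
pct     <- 100 * eig / sum(eig)
cum_pct <- cumsum(pct)

scree <- data.frame(
  dim        = seq_along(eig),
  eigenvalue = round(eig, 3),
  pct        = round(pct, 1),
  cum_pct    = round(cum_pct, 1)
)
print(scree)

# Variance represented by the first factorial plane (Dim1 + Dim2).
plane_pct <- cum_pct[2]
```

With standardized variables the eigenvalues sum to the number of variables, which is why each percentage is simply the eigenvalue divided by that count.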
fviz_contrib(pca_1, choice = "ind", axes = 1)
fviz_contrib(pca_1, choice = "var", axes = 1)
fviz_contrib(pca_1, choice = "ind", axes = 2)
fviz_contrib(pca_1, choice = "var", axes = 2)
Our study is based on the graph of individuals and the graph of variables.
Description of the 1:2 plane
fviz_pca_ind(pca_1, col.ind = "contrib", gradient.cols = c("blue" , "green" , "red"), repel = TRUE)
Plot of individuals on the factorial plane without Nigeria
Labeled individuals are those with the greatest contribution to the construction of the plane.
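The contribution of an individual to an axis is its squared coordinate divided by the sum of squared coordinates on that axis. The sketch below works this out with base R, again on the stand-in `USArrests` data (an assumption, not the study's data), and compares each contribution with the average expected under equal contributions, 100/n.

```r
# Illustrative only: USArrests stands in for the study's data set.
pca_demo <- prcomp(USArrests, scale. = TRUE)

# Coordinates of the individuals on the principal axes.
coords <- pca_demo$x

# Contribution (in %) of each individual to each axis.
contrib <- 100 * sweep(coords^2, 2, colSums(coords^2), "/")

# Average contribution expected if all individuals contributed equally.
threshold <- 100 / nrow(coords)

# Individuals ranked by their contribution to the first axis.
top_dim1 <- sort(contrib[, 1], decreasing = TRUE)
print(round(head(top_dim1, 5), 2))

# Individuals contributing more than the average to Dim1.
above <- names(top_dim1)[top_dim1 > threshold]
print(above)
```

Individuals above the threshold are the ones that matter most when naming an axis.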
Dimension 1 contrasts individuals such as NER, MLI, BFA and MRT (on the right of the graph, characterized by a strongly positive coordinate on the axis) with individuals such as CPV, CIV and GHA (on the left of the graph, characterized by a strongly negative coordinate on the axis).
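The group to which NER, MLI, MRT and BFA belong (characterized by a positive coordinate on the axis) shares high values for the CE, TCD and DCN variables and low values for the DIC variable.
The group to which the CPV, CIV and GHA individuals belong (characterized by a negative coordinate on the axis) shares high values for the GDP, TA, TU and DIC variables and low values for the IF variable.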
Dimension 2 contrasts individuals such as SLE, GIN and LBR (at the top of the graph, characterized by a strongly positive coordinate on the axis) with individuals such as NER, MLI and MRT (at the bottom of the graph, characterized by a strongly negative coordinate on the axis).
The group to which SLE, GIN and LBR belong (characterized by a positive coordinate on the axis) shares: - high values for the PRE, TIA and IF variables (from most extreme to least extreme).
The group to which NER, MLI and MRT belong (characterized by a negative coordinate on the axis) shares: - high values for the CE, TCD and DCN variables (from most extreme to least extreme).
fviz_pca_biplot(pca_1, repel = TRUE,col.var = "blue",col.ind = "red")
Biplot of individuals and variables
This biplot from our Principal Component Analysis (PCA) shows simultaneously:
- the variables (blue arrows) that influence the principal axes;
- the individuals (countries, in red), positioned according to their values on these dimensions.
Dim1 (37.4%) captures more than a third of the variance of the data, and Dim2 (20.6%) explains an additional fifth. These axes allow us to classify countries according to structural factors.
Description of the factorial plane (Axis 1 and Axis 2)
Interpretation of dimension 1
Axis 1 contrasts two groups of individuals:
- NER, MLI and MRT (on the right of the graph), characterized by high values for the CE, TCD and DCN variables and low values for the DIC variable;
- CPV, CIV and GHA (on the left), with high values for GDP, TA, TU and DIC and low values for IF.
The high correlation between these variables and the first component suggests that this axis can be interpreted as an opposition between countries with high trade and economic density (CPV, CIV, GHA) and countries with low industrialization, marked instead by high population growth and disaster-related displacement (NER, MLI, MRT).
Interpretation of dimension 2
Axis 2 contrasts two groups of individuals:
- SLE, GIN and LBR (at the top of the graph), characterized by high values for PRE, TIA and IF;
- NER, MLI and MRT (at the bottom of the graph), with high values for CE, TCD and DCN and low values for DIC.
This dimension seems to distinguish countries where precipitation, food inflation and the IF index are high (SLE, GIN, LBR) from countries whose profile is driven more by population growth and displacement due to natural disasters (NER, MLI, MRT).
cos2 graph
fviz_pca_var(pca_1, col.var = "cos2", gradient.cols = c("blue", "green", "red"), repel = TRUE)
This graph shows the links between the variables and their quality of representation (cos2) on the main axes (Dim1 and Dim2).
- Dim1 (37.4%): it accounts for more than a third of the total information and is the main direction of variation in the data.
- Dim2 (20.6%): it represents about a fifth of the information, a second important direction, but less influential than Dim1.
Variables strongly projected onto Dim1 (horizontal axis):
Positively correlated (right):
- DCN (displacement due to natural disasters)
- TCD (population growth rate)
- TP (poverty rate)
- IF (hunger index)
Negatively correlated (left):
- GDP (Gross Domestic Product)
- TA (Literacy Rate)
Variables projected onto Dim2 (vertical axis):
Positively correlated (top):
- TIA (food inflation rate)
- PRE (precipitation)
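For a standardized PCA, the coordinate of a variable on an axis is its correlation with that axis, and cos2 is the square of that coordinate. The following base-R sketch illustrates the calculation on the stand-in `USArrests` data (an assumption); it is not the computation performed internally by factoextra, only the same quantity derived by hand.

```r
# Illustrative only: USArrests stands in for the study's data set.
pca_demo <- prcomp(USArrests, scale. = TRUE)

# Variable coordinates: loadings scaled by the standard deviation of each axis.
# For standardized data these equal the variable-axis correlations.
var_coord <- sweep(pca_demo$rotation, 2, pca_demo$sdev, "*")

# Quality of representation of each variable on each axis.
cos2 <- var_coord^2

# Quality of representation on the first factorial plane (Dim1 + Dim2).
cos2_plane <- rowSums(cos2[, 1:2])
print(round(var_coord[, 1:2], 2))
print(round(cos2_plane, 2))
```

The sign of a coordinate gives the side of the axis on which the variable lies, and a cos2 close to 1 on the plane means the variable is well represented there.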
## Classification of individuals
res.PCA<-PCA(data_1,graph=FALSE)
res.HCPC<-HCPC(res.PCA,nb.clust=3,consol=FALSE,graph=FALSE)
plot.HCPC(res.HCPC,choice='map',draw.tree=FALSE,title='Factorial plane')
Hierarchical tree on the factorial plane.
plot.HCPC(res.HCPC,choice='3D.map',ind.names=FALSE,centers.plot=FALSE,angle=60,title='Hierarchical tree on the factorial plane')
library(corrplot)
library("clusterSim")
## Warning: package 'clusterSim' was built under R version 4.4.3
## Loading required package: cluster
## Loading required package: MASS
The classification carried out on the individuals reveals 3 classes.
Class 1 is made up of individuals such as CPV, CIV and GHA. This group is characterized by :
Class 2 is made up of individuals such as GIN, LBR and SLE. This group is characterized by: high values for the PRE variable.
Class 3 is made up of individuals such as MLI, MRT and NER. This group is characterized by :
With Nigeria included in the analysis, Class 1 is made up of individuals such as CPV, CIV, GHA, GMB, BEN and SEN.
This group is characterized by:
- high values for the variables GDP, TA and DIC (from the most extreme to the least extreme), which may reflect their geographical position and openness to the sea;
- low values for the variables IF and TP (from the most extreme to the least extreme).
Class 2 is made up of individuals such as MLI, GIN, TGO, MRT, SLE, LBR and NER.
This group is characterized by:
- high values for the variable IF;
- low values for the variables GDP, TU and TA (from the most extreme to the least extreme).
Class 3 is made up of a single individual, NGA.
This group is characterized by:
- high values for the variables DCN, PA, BC and TIA (from the most extreme to the least extreme).
This dendrogram is derived from a hierarchical ascending classification (HAC) and groups countries according to their similarities on several socioeconomic variables.
The tree reveals three main groups:
Cluster 1 (black) includes CPV, CIV and GHA.
Cluster 2 (red) includes SLE, GIN, LBR, BFA, GNB, GMB, SEN, TGO and BEN.
Cluster 3 (green) includes NER, MLI and MRT.
The height of the merges indicates how dissimilar the countries are: the lower the merge, the more similar the countries.
- Niger and Burkina Faso are very close, suggesting strong common features.
- The red cluster (Mali, Côte d’Ivoire, Guinea, Gambia) is merged at an intermediate height, indicating moderate similarities.
- The final merge occurs at a large height, suggesting that the three major groups identified are quite distinct from each other.
Cutting the tree into three groups seems coherent, indicating that these countries have significant commonalities within each cluster.
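The cut into three groups can be illustrated with base R alone. The sketch below is a simplified stand-in, under stated assumptions: it clusters the standardized variables of the built-in `USArrests` data directly with Ward's criterion, whereas `HCPC()` in this study works on the PCA coordinates of the countries.

```r
# Illustrative only: USArrests stands in for the study's data set.
x <- scale(USArrests)

# Hierarchical ascending classification with Ward's criterion
# on Euclidean distances between standardized individuals.
tree <- hclust(dist(x), method = "ward.D2")

# Cut the tree into three classes, as in the text.
groups <- cutree(tree, k = 3)
print(table(groups))

# Mean of each standardized variable per class: values far from 0
# show which variables characterize a class.
profiles <- aggregate(as.data.frame(x), by = list(cluster = groups), FUN = mean)
print(profiles)

# Heights of the last merges: a large jump supports cutting just below it.
last_heights <- tail(tree$height, 3)
print(round(last_heights, 2))
```

Reading the class profiles next to the merge heights gives the same kind of description as the class summaries above: which variables are high or low in each group, and how distinct the groups are.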
library(car)
## Warning: package 'car' was built under R version 4.4.3
## Loading required package: carData
## Warning: package 'carData' was built under R version 4.4.3
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
library(carData)
library(corrplot)
library("clusterSim")
library(DataExplorer)
## Warning: package 'DataExplorer' was built under R version 4.4.3
library(factoextra)
library(FactoInvestigate)
## Warning: package 'FactoInvestigate' was built under R version 4.4.3
attach(data)
summary(data)
##       TCD             PRE               TP              IF
##  Min.   :1.000   Min.   : 111.0   Min.   :10.00   Min.   :11.00
##  1st Qu.:2.000   1st Qu.: 624.5   1st Qu.:13.75   1st Qu.:21.00
##  Median :2.000   Median :1118.0   Median :25.00   Median :24.50
##  Mean   :2.188   Mean   :1116.8   Mean   :22.56   Mean   :23.56
##  3rd Qu.:2.250   3rd Qu.:1386.5   3rd Qu.:26.25   3rd Qu.:28.00
##  Max.   :3.000   Max.   :2654.0   Max.   :51.00   Max.   :33.00
##       TIA              PIB            DCN                TA
##  Min.   : 1.000   Min.   :1187   Min.   :    340   Min.   :31.0
##  1st Qu.: 5.750   1st Qu.:1786   1st Qu.:   2000   1st Qu.:46.5
##  Median : 7.000   Median :2154   Median :   5300   Median :56.0
##  Mean   : 8.562   Mean   :2982   Mean   : 160189   Mean   :57.5
##  3rd Qu.:10.000   3rd Qu.:3865   3rd Qu.:  17750   3rd Qu.:67.0
##  Max.   :20.000   Max.   :6357   Max.   :2437000   Max.   :91.0
##        TU                 PA                 DIC               CE
##  Min.   :0.000400   Min.   :   54765   Min.   :  4.00   Min.   :0.000
##  1st Qu.:0.005775   1st Qu.:  492962   1st Qu.: 18.75   1st Qu.:1.000
##  Median :0.008750   Median : 1243414   Median : 37.50   Median :2.500
##  Mean   :0.011875   Mean   : 3546910   Mean   : 37.81   Mean   :2.812
##  3rd Qu.:0.016825   3rd Qu.: 2163639   3rd Qu.: 48.50   3rd Qu.:5.000
##  Max.   :0.033500   Max.   :37941470   Max.   :100.00   Max.   :7.000
##        BC
##  Min.   :-5457.00
##  1st Qu.:-1694.25
##  Median : -880.00
##  Mean   :  -69.94
##  3rd Qu.:  592.50
##  Max.   : 8203.00
# Fit the model and display its results
modele <- lm(formula=PIB~TCD+TP+TIA+TA+TU+DIC+CE , data = data)
summary(modele)
##
## Call:
## lm(formula = PIB ~ TCD + TP + TIA + TA + TU + DIC + CE, data = data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -1437.2  -481.8   132.0   528.5  1163.6
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)   2409.08    2574.90   0.936   0.3769
## TCD           -819.97     746.58  -1.098   0.3040
## TP             -53.20      34.14  -1.558   0.1578
## TIA            -19.26      70.03  -0.275   0.7902
## TA              57.61      22.96   2.510   0.0364 *
## TU           14560.75   39497.98   0.369   0.7220
## DIC            -11.96      16.99  -0.704   0.5014
## CE             248.21     177.87   1.395   0.2004
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1130 on 8 degrees of freedom
## Multiple R-squared: 0.7486, Adjusted R-squared: 0.5286
## F-statistic: 3.403 on 7 and 8 DF, p-value: 0.05373
vif(modele)
## TCD TP TIA TA TU DIC CE
## 1.937739 1.457476 1.521391 2.049455 1.775578 2.265896 1.845065
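The variance inflation factors printed above can be understood through their definition: for each predictor, regress it on the other predictors and compute 1 / (1 - R²). The sketch below applies this definition by hand on the built-in `mtcars` data, which is an assumption used for illustration, so the values are unrelated to those of the study's model.

```r
# Illustrative only: mtcars stands in for the study's data set.
predictors <- c("wt", "hp", "disp")

# VIF of each predictor: 1 / (1 - R^2) of the auxiliary regression
# of that predictor on all the other predictors.
vif_manual <- sapply(predictors, function(v) {
  others <- setdiff(predictors, v)
  aux <- lm(reformulate(others, response = v), data = mtcars)
  1 / (1 - summary(aux)$r.squared)
})
print(round(vif_manual, 2))

# Equivalent shortcut: diagonal of the inverse correlation matrix.
vif_check <- diag(solve(cor(mtcars[, predictors])))
print(round(vif_check, 2))
```

A VIF of 1 means a predictor is uncorrelated with the others; values well above 5 are commonly read as a sign of problematic multicollinearity.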
library(ggplot2)
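donnees = data.frame(data)
# Compare the actual GDP (PIB) values with the values fitted by the model
data_vis = data.frame(valeurs_reelles = donnees$PIB, predictions = predict(modele))
ggplot(data_vis, aes(x = valeurs_reelles, y = predictions)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "blue") +
labs(x = "Actual values", y = "Predictions") +
# ggtitle() expects a character string, not the fitted model object
ggtitle("Predicted vs. actual GDP")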
## `geom_smooth()` using formula = 'y ~ x'
The graph shows that the model predictions follow a linear trend in relation to the actual values. However, there are significant deviations at some points, indicating an imperfect fit.
Blue line: regression line representing the relationship estimated by the model.
Black dots: actual observations compared with predictions.
The explanatory variables retained are TCD, TIA, TA and TU.
The model equation is:
$$Y = 2716.08 - 625.29\,\mathrm{TCD} - 54.08\,\mathrm{TIA} + 55.40\,\mathrm{TA} + 8168.40\,\mathrm{TU}$$
The method used is multiple linear regression, which seeks to explain a dependent variable (the response) as a function of several explanatory variables (the predictors). The coefficients are read under the economic principle of "all other things being equal": every parameter is held constant except the one under study.
In other words, each coefficient represents the effect of an explanatory variable when the others are held constant.
If a coefficient is positive, the dependent variable tends to increase when the explanatory variable increases; if it is negative, the dependent variable tends to decrease.
Our dependent variable is GDP. It increases with TU (Urbanization Rate) and TA (Literacy Rate), because they are assigned positive coefficients.
Conversely, GDP decreases when TCD (Demographic Growth Rate) and TIA (Food Inflation Rate) increase, because they are assigned negative coefficients. According to the regression, TU and TCD have the largest coefficients in absolute value, although coefficients expressed in different units are not directly comparable.
The first two axes of the PCA express 58% of the total inertia of the dataset; this means that 58% of the total variability is represented in this plane.
R-squared: a value of 0.975, close to 1, is reported for this model, which would mean that the predictors explain most of the variance of the dependent variable. For comparison, the seven-variable model summarized above has a multiple R-squared of 0.7486 (adjusted 0.5286).
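The "all other things being equal" reading of a coefficient can be checked numerically. The sketch below is an illustration on the built-in `mtcars` data (an assumption, not the study's data): raising one predictor by one unit while holding the other fixed changes the prediction by exactly that predictor's coefficient.

```r
# Illustrative only: mtcars stands in for the study's data set.
fit <- lm(mpg ~ wt + hp, data = mtcars)
print(coef(fit))

# Two hypothetical observations that differ only by one unit of wt.
base <- data.frame(wt = 3, hp = 120)
plus <- data.frame(wt = 4, hp = 120)

# Change in the predicted response when wt rises by 1 and hp is held constant.
delta <- predict(fit, plus) - predict(fit, base)
print(unname(delta))

# Share of the variance of the response explained by the model,
# and its version penalized for the number of predictors.
r2     <- summary(fit)$r.squared
adj_r2 <- summary(fit)$adj.r.squared
print(round(c(r2 = r2, adj_r2 = adj_r2), 3))
```

The adjusted R-squared is the more cautious figure when the number of predictors is large relative to the number of observations, as is the case with 16 countries.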
The PCA revealed two main axes explaining 58% to 62% of the total variability, depending on the inclusion or exclusion of the atypical individual.
The classification highlighted three distinct groups corresponding to industrialized economies, transitioning economies, and economies heavily reliant on natural resources.
The results emphasize the importance of integrating complementary methods, such as Discriminant Analysis or supervised classification models, to validate the segmentation obtained and refine the interpretation of the economic and social relationships between individuals.
Our study examined the environmental and socio-economic causes of the persistence of famine in West Africa. It emerges that the variables GDP, TA, IF, TP and TCD strongly influence the persistence of famine in countries such as Cape Verde, Niger and Ghana, while PRE, IF and TIA are the main factors in Sierra Leone (SLE), Guinea (GIN) and Liberia (LBR). As a perspective, data on the institutional and political level should be collected to deepen the study.
Questionnaire link : https://ee.kobotoolbox.org/x/J8loFcLd