# Set options and load required libraries
knitr::opts_chunk$set(echo = TRUE)
library(FactoMineR)
library(factoextra)
## Loading required package: ggplot2
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
library(ggplot2)
library(psych)
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
library(Factoshiny)
## Loading required package: shiny
## Loading required package: FactoInvestigate
library(shiny)
library(FactoInvestigate)
library(DataExplorer)
## Warning: package 'DataExplorer' was built under R version 4.5.2
library(corrplot)
## corrplot 0.95 loaded
library(pander)
## Warning: package 'pander' was built under R version 4.5.2
##
## Attaching package: 'pander'
## The following object is masked from 'package:shiny':
##
## p
library(DT)
##
## Attaching package: 'DT'
## The following objects are masked from 'package:shiny':
##
## dataTableOutput, renderDataTable
library(rsconnect)
## Warning: package 'rsconnect' was built under R version 4.5.2
##
## Attaching package: 'rsconnect'
## The following object is masked from 'package:shiny':
##
## serverInfo
library(askpass)
library(VIM)
## Loading required package: colorspace
## Loading required package: grid
## VIM is ready to use.
## Suggestions and bug-reports can be submitted at: https://github.com/statistikat/VIM/issues
##
## Attaching package: 'VIM'
## The following object is masked from 'package:datasets':
##
## sleep
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
The database was entered into Excel and saved in CSV (semicolon-delimited) format. We then imported this file into R using the “read.csv()” function.
#Define the file path
file <- "C:/Users/LEGION/Desktop/Projet RTI S7A-2025/RTI_DATA_2022.csv"
# Load the data: semicolon separator, comma decimal mark, first column used as row names
donnees_csv <- read.csv(file, header = TRUE, sep = ";", dec = ",", row.names = 1, fileEncoding = "latin1")
datatable(donnees_csv, options = list(pageLength = 5, autoWidth = TRUE))
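The import settings above can be tried on a small inline example. This is a minimal sketch with made-up values (the `Country`, `Access` and `HDI` columns are hypothetical, not taken from the project file); it only illustrates how `sep = ";"` and `dec = ","` interact.

```r
# Toy semicolon-delimited text with comma decimal marks (made-up values)
csv_text <- "Country;Access;HDI
A;65,4;0,59
B;11,3;0,39"

# Same arguments as in the report, but reading from a string instead of a file
toy <- read.csv(text = csv_text, header = TRUE, sep = ";", dec = ",", row.names = 1)

str(toy)        # both columns should be parsed as numeric
rownames(toy)   # the first column has become the row names
```

Base R also provides `read.csv2()`, whose defaults are already `sep = ";"` and `dec = ","`.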
To make the column headers easier to read, we renamed the variables so that the units, given in parentheses in the original file, are no longer displayed. To do this, we assigned a new vector of names with the “colnames()” function.
#Rename the columns with the variable names without the units in parentheses
colnames(donnees_csv) <- c(
"Elect_Gen", # Elec_Gen (GWh)
"Access_Elect", # Access_Elec (% of pop)
"Access_Elect_Urbain", # Access_Elec_urban (% of urban pop)
"Access_Elect_Rural", # Access_Elec_Rural (% of rural pop)
"Elec_Demand", # Elec_Demand (GWh)
"Total_Pop", # Total_Pop (hbts)
"Rural_Pop", # Rural_Pop (% of total pop)
"Pop_Growth", # Pop_Growth (annual %)
"GDP_Per_Capita", # GDP_Per_Capita (current US$)
"HDI", # HDI
"Fossil_fuels_elect_gen", # Fossil fuels elect gen (billion kWh)
"Hydroelectricity_gen", # Hydroelectricity generation (billion kWh)
"Income_Class", # Income_Class
"Indust_Level" #Indust_Level
)
datatable(donnees_csv, options = list(pageLength = 5, autoWidth = TRUE))
In this section, we define a function that counts the missing values in each variable and computes their proportion relative to the number of observations. Running the code shows that a single value is missing, in the rural electrification rate (Access_Elect_Rural): 1 of 11 observations, a proportion of about 0.09. All other variables are complete.
# Function to calculate the proportion of missing values per variable
proportion_valeurs_manquantes <- function(data)
{
# Calculating the number of missing values per column
nb_valeurs_manquantes <- sapply(data, function(x) sum(is.na(x)))
# Calculating the proportion of missing values
proportion_manquantes <- nb_valeurs_manquantes / nrow(data)
# Creating a dataframe for the result
resultat <- data.frame(Nombre = nb_valeurs_manquantes, Proportion = proportion_manquantes)
return(resultat)
}
# Using the function with your database
resultat <- proportion_valeurs_manquantes(donnees_csv)
# Displaying the result
resultat
## Nombre Proportion
## Elect_Gen 0 0.00000000
## Access_Elect 0 0.00000000
## Access_Elect_Urbain 0 0.00000000
## Access_Elect_Rural 1 0.09090909
## Elec_Demand 0 0.00000000
## Total_Pop 0 0.00000000
## Rural_Pop 0 0.00000000
## Pop_Growth 0 0.00000000
## GDP_Per_Capita 0 0.00000000
## HDI 0 0.00000000
## Fossil_fuels_elect_gen 0 0.00000000
## Hydroelectricity_gen 0 0.00000000
## Income_Class 0 0.00000000
## Indust_Level 0 0.00000000
# Using the aggr() function to view missing values
aggr(donnees_csv, col=c('navyblue','yellow'), numbers=TRUE, sortVars=TRUE,
labels=names(donnees_csv), cex.axis=.7, gap=3, ylab=c("Histogram of missing data","Pattern"))
##
## Variables sorted by number of missings:
## Variable Count
## Access_Elect_Rural 0.09090909
## Elect_Gen 0.00000000
## Access_Elect 0.00000000
## Access_Elect_Urbain 0.00000000
## Elec_Demand 0.00000000
## Total_Pop 0.00000000
## Rural_Pop 0.00000000
## Pop_Growth 0.00000000
## GDP_Per_Capita 0.00000000
## HDI 0.00000000
## Fossil_fuels_elect_gen 0.00000000
## Hydroelectricity_gen 0.00000000
## Income_Class 0.00000000
## Indust_Level 0.00000000
Only one variable contains a missing value:
• Access_Elect_Rural: 1 missing value out of 11 observations (about 9.1% for this variable).
• All other variables have 0% missing values.
Interpretation
• The dataset is generally clean and fully usable.
• The single missing value does not invalidate the PCA: FactoMineR imputes it with the mean of the variable, although its warning recommends the imputePCA function of the missMDA package instead.
• With so few missing values, the imputation should have little effect on the PCA results.
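To make the mean imputation mentioned above concrete, here is a minimal base R sketch on a made-up vector. It is not FactoMineR's internal code, only an illustration of the principle and of its side effect on the variance.

```r
# Made-up values with one missing entry (hypothetical, not project data)
x <- c(12, 30, NA, 18)

# Replace the missing entry with the mean of the observed values
x_imputed <- ifelse(is.na(x), mean(x, na.rm = TRUE), x)

mean(x, na.rm = TRUE)   # mean of the observed values
mean(x_imputed)         # unchanged by the imputation
var(x, na.rm = TRUE)    # variance of the observed values
var(x_imputed)          # smaller: the imputed point sits exactly at the mean
```

The shrinking variance is one reason the warning points to a model-based alternative such as imputePCA.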
This description involves creating histograms and boxplots for each variable. These graphs allow us to examine the distribution of each variable: central tendency, spread, outliers, etc. We begin this step by identifying the columns containing quantitative variables using the “sapply()” function.
# Identify the quantitative columns
vars_quantitatives <- sapply(donnees_csv, is.numeric)
# Create a histogram for each quantitative variable
for (var in names(donnees_csv)[vars_quantitatives]) {
  # aes() with the .data pronoun replaces the deprecated aes_string()
  print(ggplot(donnees_csv, aes(x = .data[[var]])) +
    geom_histogram(bins = 30, fill = "blue", color = "black") +
    theme_minimal() +
    labs(title = paste("Histogram of", var), x = var, y = "Frequency"))
}
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_bin()`).
The histograms show very heterogeneous distributions, which is expected in an analysis of Central Africa, where countries have very different energy profiles.
Elect_Gen (total electricity generation)
• Highly asymmetrical distribution.
• Cameroon, Congo, and Gabon produce significantly more electricity than the Central African Republic (CAR) and Chad.
This indicates strong structural disparities between countries.
Access_Elect, Access_Elect_Urbain, Access_Elect_Rural
• High heterogeneity: very high access in Gabon and Cameroon, very low access in Chad and CAR.
• Rural access: low values almost everywhere, and very low for CAR and Chad.
This confirms that rural access is the main challenge in the region.
GDP_Per_Capita, HDI
• Highly dispersed distribution: Gabon and Equatorial Guinea are far ahead of the other countries.
• Wealthier countries tend to have better energy performance.
Fossil_fuels_elect_gen, Hydroelectricity_gen
• Some countries do not use fossil fuels or hydropower at all (Chad, Central African Republic).
• The energy mix also helps explain the disparities in electrification.
These descriptions may be confirmed or refuted by the PCA.
# Create a boxplot for each quantitative variable
for (var in names(donnees_csv)[vars_quantitatives]) {
print(ggplot(donnees_csv, aes(x = factor(1), y = .data[[var]])) +
geom_boxplot(fill = "skyblue", color = "darkblue") +
theme_minimal() +
labs(title = paste("Boxplot of", var), x = "", y = var))
}
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_boxplot()`).
The boxplots show strong outliers for:
• Elect_Gen (Cameroon very high)
• GDP_Per_Capita (Gabon and Equatorial Guinea well above the others)
• Access_Elect_Rural (Gabon well above the others)
This confirms that the sample contains very diverse countries, which supports the use of PCA to identify typical profiles.
# Function to create a barplot in proportions
creer_barplot_proportion <- function(data, column_name)
{
# Calculate the proportions
proportions <- data %>%
  count(.data[[column_name]]) %>%
  mutate(Proportion = n / sum(n))
# Create the barplot (aes() with the .data pronoun replaces the deprecated aes_string())
ggplot(proportions, aes(x = .data[[column_name]], y = Proportion, fill = .data[[column_name]])) +
  geom_bar(stat = "identity") +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(x = column_name, y = "Proportion (%)", fill = column_name) +
theme_minimal()
}
# Create a bar plot for the variable "Income_Class"
creer_barplot_proportion(donnees_csv, "Income_Class")
# Create a bar plot for the variable "Indust_Level"
creer_barplot_proportion(donnees_csv, "Indust_Level")
The income bracket chart shows that over 45% of countries have very low incomes and nearly 35% have middle incomes. Only slightly less than 20% of the countries studied have relatively high incomes.
The industrialization level chart shows that nearly 40% of Central African countries have low levels of industrialization, while nearly 40% have high levels of industrialization.
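We calculate the correlation matrix for the twelve quantitative variables and visualize it using a correlation plot. Because of the missing value, cor() is called with use = "complete.obs", so the coefficients are computed on the 10 countries with complete data. This helps in understanding relationships between variables before performing PCA.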
# Identify the quantitative columns
vars_quantitatives <- sapply(donnees_csv, is.numeric)
#Extraction of quantitative variables
donnees_quantitatives <- donnees_csv[, vars_quantitatives]
# Calculate the correlation matrix
matrice_correlation <- cor(donnees_quantitatives, use = "complete.obs")
datatable(matrice_correlation, options = list(pageLength = 6)) %>%
formatRound(columns = 1:ncol(matrice_correlation), digits = 2)
# Correlation plot (corrplot package)
corrplot(matrice_correlation, method = "color", type = "upper", tl.col = "black", tl.srt = 75)
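The choice of the `use` argument matters as soon as a value is missing. The sketch below uses a made-up data frame (hypothetical columns `a`, `b`, `c`) to compare listwise deletion (`"complete.obs"`, as in the report) with pairwise deletion.

```r
# Made-up data with one missing value in column c
toy <- data.frame(a = c(1, 2, 3, 4, 5),
                  b = c(2, 4, 6, 8, 11),
                  c = c(5, NA, 3, 2, 1))

# Listwise deletion: every row with an NA is dropped for all pairs
cor_complete <- cor(toy, use = "complete.obs")

# Pairwise deletion: each pair uses all rows where both values are present
cor_pairwise <- cor(toy, use = "pairwise.complete.obs")

sum(complete.cases(toy))   # number of rows actually used by "complete.obs"
cor_complete["a", "b"]     # computed on the complete rows only
cor_pairwise["a", "b"]     # computed on all five rows
```

Listwise deletion keeps the matrix internally consistent, at the cost of discarding one country for every coefficient.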
The correlation plot (heatmap) highlights the relationships between the variables associated with electrification in Central Africa. Several important links emerge:
Access_Elect, Access_Elect_Urbain, Access_Elect_Rural
These three electricity access variables are very strongly correlated with each other (high coefficients, in dark blue). This means that:
• when a country has good overall access to electricity,
• it also has good access in urban areas,
• and often better access in rural areas (even if the levels remain low).
This is logical: overall access is primarily driven by urban performance, and any improvement in rural access feeds directly into total access.
GDP_Per_Capita and HDI
The matrix shows one of its highest positive correlations (intense blue) between these two variables. This indicates that:
• countries with a high GDP per capita (Gabon, Equatorial Guinea)
• also have a higher Human Development Index.
This reflects a structural reality: the wealthier a country is, the better its performance in health, education, and infrastructure, and therefore in electrification.
Elect_Gen and Elec_Demand
The two variables are almost perfectly correlated. This means that:
• countries that produce a lot of electricity
• are also those that consume a lot of it.
This is normal behavior for energy systems: demand drives production, and production capacity depends on the level of industrialization and urbanization.
Fossil_fuels_elect_gen and Hydroelectricity_gen
• Countries with high hydroelectric generation (Cameroon, Gabon) are not those with high fossil fuel generation.
• The two variables therefore appear to be inversely related.
This suggests two types of energy profiles:
• “hydro-dependent” countries
• “fossil fuel-dependent” countries
# Center and reduce the data
donnees_centrees_reduites <- scale(donnees_quantitatives,center = TRUE,scale=TRUE)
datatable(donnees_centrees_reduites, options = list(pageLength = 5, autoWidth = TRUE))
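As a quick illustration of what centering and reducing does, the sketch below standardizes a small made-up matrix (hypothetical columns `x` and `y`) and checks the result.

```r
# Made-up matrix with two variables on very different scales
m <- cbind(x = c(10, 20, 30), y = c(1, 4, 10))

# Subtract each column mean and divide by each column standard deviation
z <- scale(m, center = TRUE, scale = TRUE)

colMeans(z)       # approximately 0 for every column
apply(z, 2, sd)   # 1 for every column (sample standard deviation, n - 1)
```

Since `PCA()` in FactoMineR standardizes the variables itself when `scale.unit = TRUE` (its default), scaling beforehand should not change the result. It is mainly useful here for displaying the standardized table.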
# Perform the PCA
resultat_acp <- PCA(donnees_centrees_reduites, axes = c(1, 2), graph = TRUE)
## Warning in PCA(donnees_centrees_reduites, axes = c(1, 2), graph = TRUE):
## Missing values are imputed by the mean of the variable: you should use the
## imputePCA function of the missMDA package
# Display the results of the PCA
print(resultat_acp)
## **Results for the Principal Component Analysis (PCA)**
## The analysis was performed on 11 individuals, described by 12 variables
## *The results are available in the following objects:
##
## name description
## 1 "$eig" "eigenvalues"
## 2 "$var" "results for the variables"
## 3 "$var$coord" "coord. for the variables"
## 4 "$var$cor" "correlations variables - dimensions"
## 5 "$var$cos2" "cos2 for the variables"
## 6 "$var$contrib" "contributions of the variables"
## 7 "$ind" "results for the individuals"
## 8 "$ind$coord" "coord. for the individuals"
## 9 "$ind$cos2" "cos2 for the individuals"
## 10 "$ind$contrib" "contributions of the individuals"
## 11 "$call" "summary statistics"
## 12 "$call$centre" "mean of the variables"
## 13 "$call$ecart.type" "standard error of the variables"
## 14 "$call$row.w" "weights for the individuals"
## 15 "$call$col.w" "weights for the variables"
The correlation circle identifies the structure of the first two axes of the PCA. The first dimension (Dimension 1), which explains 43.55% of the total variance, is strongly correlated with variables reflecting socioeconomic development and access to electricity. The vectors Access_Elect, HDI, Access_Elect_Urbain, GDP_Per_Capita and Access_Elect_Rural point in the positive direction of this axis (coordinates between 0.62 and 0.87 in the variable coordinates table), while Pop_Growth, Total_Pop and Rural_Pop point in the negative direction (-0.84 to -0.62). The generation and demand variables (Elect_Gen, Elec_Demand, Hydroelectricity_gen) have only moderate negative coordinates on this axis (about -0.45 to -0.54) and are better represented on Dimension 2. Dimension 1 therefore contrasts countries with higher wealth, human development, and electrification with countries under stronger demographic pressure and with lower access. Thus, this dimension can be named:
Axis 1: “Economic Development and Energy Performance”
The second dimension (Dimension 2), which explains 34.57% of the variance, is most strongly correlated with the size of the electricity system: Elect_Gen (0.88), Elec_Demand (0.85), Hydroelectricity_gen (0.82) and, to a lesser extent, Fossil_fuels_elect_gen (0.66) lie close to the vertical axis and therefore structure it strongly. Among the demographic variables, Total_Pop (0.50) and Pop_Growth (0.32) are projected on the positive side, while Rural_Pop (-0.60) is projected on the negative side. This structure contrasts populous countries with high generation and demand with countries with low generation and a predominantly rural population. Dimension 2 therefore expresses country size, population pressure, and territorial imbalances more than access to electricity. This dimension can be named:
Axis 2: “Demographic Dynamics and Territorial Structure”
# Perform the PCA with the qualitative variables as supplementary variables
resultat_acp <- PCA(donnees_csv, scale.unit = TRUE, ncp = 2, quali.sup = 13:14, graph = TRUE)
## Warning in PCA(donnees_csv, scale.unit = TRUE, ncp = 2, quali.sup = 13:14, :
## Missing values are imputed by the mean of the variable: you should use the
## imputePCA function of the missMDA package
# Display the results of the PCA
print(resultat_acp)
## **Results for the Principal Component Analysis (PCA)**
## The analysis was performed on 11 individuals, described by 14 variables
## *The results are available in the following objects:
##
## name description
## 1 "$eig" "eigenvalues"
## 2 "$var" "results for the variables"
## 3 "$var$coord" "coord. for the variables"
## 4 "$var$cor" "correlations variables - dimensions"
## 5 "$var$cos2" "cos2 for the variables"
## 6 "$var$contrib" "contributions of the variables"
## 7 "$ind" "results for the individuals"
## 8 "$ind$coord" "coord. for the individuals"
## 9 "$ind$cos2" "cos2 for the individuals"
## 10 "$ind$contrib" "contributions of the individuals"
## 11 "$quali.sup" "results for the supplementary categorical variables"
## 12 "$quali.sup$coord" "coord. for the supplementary categories"
## 13 "$quali.sup$v.test" "v-test of the supplementary categories"
## 14 "$call" "summary statistics"
## 15 "$call$centre" "mean of the variables"
## 16 "$call$ecart.type" "standard error of the variables"
## 17 "$call$row.w" "weights for the individuals"
## 18 "$call$col.w" "weights for the variables"
# Biplot visualization
fviz_pca_biplot(resultat_acp, repel = TRUE)
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## ℹ The deprecated feature was likely used in the ggpubr package.
## Please report the issue at <https://github.com/kassambara/ggpubr/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
This graph is a principal component analysis (PCA) biplot, which represents the relationships between countries (points) and variables (arrows) on the first two principal dimensions, Dim1 and Dim2. These dimensions capture 43.55% and 34.57% of the data variance, respectively, so together they account for 78.12% of the total variance. We can also make the following observations:
• The first axis, titled “Economic Development and Energy Performance,” reflects a clear opposition between two groups of countries. To the right of the axis are the countries with higher GDP per capita, a higher HDI, and better rates of access to electricity, in both urban and rural areas: Gabon, São Tomé and Príncipe, and Equatorial Guinea, followed at a distance by the Republic of the Congo, Rwanda, and Cameroon. Conversely, to the left of this axis appear countries with a lower level of development, stronger population growth, and reduced access to electricity: the DRC, Chad, Burundi, Angola, and the Central African Republic. The first axis thus reflects the overall gradient of development and electricity access among the countries studied.
• The second axis, called “Demographic Dynamics and Territorial Structure,” contrasts countries with high electricity generation and demand and, in most cases, large populations, such as Angola, the DRC, and Cameroon (positive coordinates), with countries with low generation and a predominantly rural population, such as the Central African Republic, Burundi, and Chad (negative coordinates). This axis therefore highlights the influence of rurality and country size on the challenges related to electrification. Chad, Burundi, and the Central African Republic, which also lie on the negative side of the first axis, combine rurality with the greatest difficulties in accessing energy services.
The eigenvalues indicate the amount of variance explained by each principal component.
# Extract and plot eigenvalues
val.propre <- get_eigenvalue(resultat_acp)
pander(val.propre)
| | eigenvalue | variance.percent | cumulative.variance.percent |
|---|---|---|---|
| Dim.1 | 5.226 | 43.55 | 43.55 |
| Dim.2 | 4.148 | 34.57 | 78.12 |
| Dim.3 | 1.044 | 8.696 | 86.81 |
| Dim.4 | 0.7268 | 6.056 | 92.87 |
| Dim.5 | 0.5651 | 4.709 | 97.58 |
| Dim.6 | 0.1574 | 1.312 | 98.89 |
| Dim.7 | 0.08546 | 0.7122 | 99.6 |
| Dim.8 | 0.03278 | 0.2732 | 99.88 |
| Dim.9 | 0.01335 | 0.1113 | 99.99 |
| Dim.10 | 0.001465 | 0.01221 | 100 |
fviz_eig(resultat_acp, addlabels = TRUE, ylim = c(0, 50))
## Warning in geom_bar(stat = "identity", fill = barfill, color = barcolor, :
## Ignoring empty aesthetic: `width`.
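The percentages in the eigenvalue table can be recomputed by hand. The sketch below copies the eigenvalues from the table and assumes, as is standard for a PCA on standardized variables, that the total inertia equals the number of active variables (12 here).

```r
# Eigenvalues copied from the table above
eig <- c(5.226, 4.148, 1.044, 0.7268, 0.5651, 0.1574,
         0.08546, 0.03278, 0.01335, 0.001465)

p <- 12                 # number of active (standardized) variables
sum(eig)                # close to 12, up to rounding in the table

pct <- 100 * eig / p    # share of variance carried by each dimension
round(pct[1:2], 2)      # Dim.1 and Dim.2
round(sum(pct[1:2]), 2) # cumulative share of the first factorial plane

which(eig > 1)          # dimensions retained by the Kaiser rule (eigenvalue > 1)
```

Only 10 eigenvalues appear for 12 variables because 11 centered individuals span at most 10 dimensions.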
We examine the contribution of each variable to the principal components.
# Get PCA variable results
resultat.var <- get_pca_var(resultat_acp)
pander(resultat.var$coord)
| | Dim.1 | Dim.2 |
|---|---|---|
| Elect_Gen | -0.4548 | 0.8823 |
| Access_Elect | 0.8706 | 0.4483 |
| Access_Elect_Urbain | 0.7557 | 0.3368 |
| Access_Elect_Rural | 0.6208 | 0.07858 |
| Elec_Demand | -0.4971 | 0.8549 |
| Total_Pop | -0.7399 | 0.5019 |
| Rural_Pop | -0.6205 | -0.6026 |
| Pop_Growth | -0.8368 | 0.319 |
| GDP_Per_Capita | 0.7041 | 0.3572 |
| HDI | 0.779 | 0.6015 |
| Fossil_fuels_elect_gen | 0.1798 | 0.6623 |
| Hydroelectricity_gen | -0.5388 | 0.8205 |
pander(resultat.var$cor)
| | Dim.1 | Dim.2 |
|---|---|---|
| Elect_Gen | -0.4548 | 0.8823 |
| Access_Elect | 0.8706 | 0.4483 |
| Access_Elect_Urbain | 0.7557 | 0.3368 |
| Access_Elect_Rural | 0.6208 | 0.07858 |
| Elec_Demand | -0.4971 | 0.8549 |
| Total_Pop | -0.7399 | 0.5019 |
| Rural_Pop | -0.6205 | -0.6026 |
| Pop_Growth | -0.8368 | 0.319 |
| GDP_Per_Capita | 0.7041 | 0.3572 |
| HDI | 0.779 | 0.6015 |
| Fossil_fuels_elect_gen | 0.1798 | 0.6623 |
| Hydroelectricity_gen | -0.5388 | 0.8205 |
pander(resultat.var$cos2)
| | Dim.1 | Dim.2 |
|---|---|---|
| Elect_Gen | 0.2069 | 0.7784 |
| Access_Elect | 0.7579 | 0.2009 |
| Access_Elect_Urbain | 0.5711 | 0.1135 |
| Access_Elect_Rural | 0.3854 | 0.006175 |
| Elec_Demand | 0.2471 | 0.7308 |
| Total_Pop | 0.5475 | 0.2519 |
| Rural_Pop | 0.385 | 0.3632 |
| Pop_Growth | 0.7002 | 0.1018 |
| GDP_Per_Capita | 0.4958 | 0.1276 |
| HDI | 0.6068 | 0.3618 |
| Fossil_fuels_elect_gen | 0.03233 | 0.4386 |
| Hydroelectricity_gen | 0.2903 | 0.6732 |
pander(resultat.var$contrib)
| | Dim.1 | Dim.2 |
|---|---|---|
| Elect_Gen | 3.958 | 18.77 |
| Access_Elect | 14.5 | 4.845 |
| Access_Elect_Urbain | 10.93 | 2.735 |
| Access_Elect_Rural | 7.374 | 0.1489 |
| Elec_Demand | 4.728 | 17.62 |
| Total_Pop | 10.48 | 6.074 |
| Rural_Pop | 7.367 | 8.756 |
| Pop_Growth | 13.4 | 2.453 |
| GDP_Per_Capita | 9.486 | 3.076 |
| HDI | 11.61 | 8.722 |
| Fossil_fuels_elect_gen | 0.6186 | 10.57 |
| Hydroelectricity_gen | 5.555 | 16.23 |
Let’s now visualize these contributions on the contribution graphs:
From the contribution table and the contribution graphs for the variables, it emerges that:
The variables that contribute most to the formation of dimension 1 are Access_Elect, Pop_Growth, HDI, Access_Elect_Urbain and Total_Pop
The variables that contribute most to the formation of dimension 2 are Elect_Gen, Elec_Demand, Hydroelectricity_gen and Fossil_fuels_elect_gen
Similarly, the variables Elect_Gen, Elec_Demand, HDI, Hydroelectricity_gen and Access_Elect contribute most to the formation of the factorial plane.
fviz_pca_var(resultat_acp, col.var = "contrib", gradient.cols = c("blue", "orange", "red"), repel = TRUE, title = "Contribution of Variables to Principal Components")
fviz_contrib(resultat_acp, choice = "var", axes = 1, top = 12)
fviz_contrib(resultat_acp, choice = "var", axes = 2, top = 12)
fviz_contrib(resultat_acp, choice = "var", axes = 1:2, top = 12)
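The coordinate, cos2 and contribution tables for the variables are linked by simple formulas: for a standardized PCA, cos2 is the squared coordinate, and the contribution (in %) is 100 × cos2 divided by the eigenvalue of the axis. The sketch below checks this on three values copied from the tables above.

```r
# Dim.1 coordinates copied from the variable coordinates table
coord_dim1 <- c(Elect_Gen = -0.4548, Access_Elect = 0.8706, Pop_Growth = -0.8368)
eig_dim1 <- 5.226   # eigenvalue of Dim.1 from the eigenvalue table

cos2_dim1 <- coord_dim1^2                    # quality of representation
contrib_dim1 <- 100 * cos2_dim1 / eig_dim1   # contribution in percent

round(cos2_dim1, 4)      # compare with the cos2 table
round(contrib_dim1, 2)   # compare with the contribution table

100 / 12   # average contribution if all 12 variables contributed equally
```

A variable whose contribution exceeds this average (about 8.33%) can be considered important for the axis. This is the reference level that the contribution graphs draw as a dashed line.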
In this section, we explore the coordinates, quality of representation, and contributions of individuals (observations) to the PCA axes.
# Get PCA individual results
resultat.ind <- get_pca_ind(resultat_acp)
pander(resultat.ind$coord)
| | Dim.1 | Dim.2 |
|---|---|---|
| Cameroon | 0.5026 | 1.469 |
| Republic of the Congo | 0.9198 | 0.3619 |
| DRC | -4.357 | 2.313 |
| Gabon | 3.612 | 1.262 |
| Chad | -2.188 | -2.291 |
| Central African Republic | -1.257 | -2.717 |
| Equatorial Guinea | 2.132 | 0.2254 |
| Angola | -1.384 | 3.906 |
| Rwanda | 0.7403 | -1.327 |
| Burundi | -1.613 | -2.485 |
| São Tomé and Príncipe | 2.894 | -0.7181 |
pander(resultat.ind$cos2)
| | Dim.1 | Dim.2 |
|---|---|---|
| Cameroon | 0.06392 | 0.5459 |
| Republic of the Congo | 0.1893 | 0.0293 |
| DRC | 0.6939 | 0.1956 |
| Gabon | 0.8161 | 0.09962 |
| Chad | 0.4192 | 0.4593 |
| Central African Republic | 0.1402 | 0.6543 |
| Equatorial Guinea | 0.5306 | 0.00593 |
| Angola | 0.1021 | 0.8129 |
| Rwanda | 0.08499 | 0.273 |
| Burundi | 0.2763 | 0.6559 |
| São Tomé and Príncipe | 0.585 | 0.03602 |
pander(resultat.ind$contrib)
| | Dim.1 | Dim.2 |
|---|---|---|
| Cameroon | 0.4394 | 4.728 |
| Republic of the Congo | 1.472 | 0.287 |
| DRC | 33.03 | 11.73 |
| Gabon | 22.7 | 3.491 |
| Chad | 8.33 | 11.5 |
| Central African Republic | 2.751 | 16.18 |
| Equatorial Guinea | 7.904 | 0.1113 |
| Angola | 3.334 | 33.45 |
| Rwanda | 0.9533 | 3.858 |
| Burundi | 4.526 | 13.54 |
| São Tomé and Príncipe | 14.57 | 1.13 |
Let’s now visualize these contributions on the contribution graphs:
From the analysis of the contribution graphs for the individuals, it emerges that:
The individuals that contribute most to the formation of dimension 1 are DRC, Gabon and São Tomé and Príncipe
The individuals that contribute most to the formation of dimension 2 are Angola and Central African Republic, followed by Burundi, DRC and Chad
Similarly, the individuals DRC, Angola and Gabon contribute most to the formation of the factorial plane.
fviz_pca_ind(resultat_acp, col.ind = "cos2", gradient.cols = c("blue", "orange", "red"), repel = TRUE)
fviz_contrib(resultat_acp, choice = "ind", axes = 1, top = 12)
fviz_contrib(resultat_acp, choice = "ind", axes = 2, top = 12)
fviz_contrib(resultat_acp, choice = "ind", axes = 1:2, top = 12)
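The ranking on the factorial plane (axes 1 and 2 together) can be checked by hand. The sketch below assumes that the combined contribution is the average of the per-axis contributions weighted by the eigenvalues, which is our understanding of what `fviz_contrib()` does when several axes are given. The values are copied from the tables above for four countries.

```r
# Eigenvalues of the first two dimensions
eig <- c(Dim.1 = 5.226, Dim.2 = 4.148)

# Per-axis contributions (%) copied from the individuals' contribution table
contrib <- rbind(
  DRC    = c(33.03, 11.73),
  Angola = c(3.334, 33.45),
  Gabon  = c(22.7, 3.491),
  Chad   = c(8.33, 11.5)
)

# Eigenvalue-weighted average of the two per-axis contributions
plane <- as.vector(contrib %*% eig) / sum(eig)
names(plane) <- rownames(contrib)

sort(round(plane, 1), decreasing = TRUE)
100 / 11   # average contribution if all 11 countries contributed equally
```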
# Perform HCPC
resultat.cah <- HCPC(resultat_acp, nb.clust = -1, consol = FALSE, graph = FALSE)
# Visualize hierarchical clustering
plot.HCPC(resultat.cah, choice = 'tree', title = 'Hierarchical Tree')
plot.HCPC(resultat.cah, choice = 'map', draw.tree = FALSE, title = 'Factor Map')
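The clustering step builds a hierarchical tree on the PCA coordinates of the countries. As a rough base R analogue (not the HCPC algorithm itself), the sketch below applies Ward clustering to the two-dimensional coordinates copied from the individuals' coordinate table. Cutting the tree into `k = 3` groups is an arbitrary choice for illustration; HCPC selects its own cut when `nb.clust = -1`.

```r
# Coordinates of the 11 countries on Dim.1 and Dim.2 (copied from the table above)
coords <- data.frame(
  Dim.1 = c(0.5026, 0.9198, -4.357, 3.612, -2.188, -1.257,
            2.132, -1.384, 0.7403, -1.613, 2.894),
  Dim.2 = c(1.469, 0.3619, 2.313, 1.262, -2.291, -2.717,
            0.2254, 3.906, -1.327, -2.485, -0.7181),
  row.names = c("Cameroon", "Republic of the Congo", "DRC", "Gabon", "Chad",
                "Central African Republic", "Equatorial Guinea", "Angola",
                "Rwanda", "Burundi", "São Tomé and Príncipe")
)

# Ward hierarchical clustering on Euclidean distances
tree <- hclust(dist(coords), method = "ward.D2")

# Cut the tree into k groups (k = 3 is an assumption, not the report's result)
groups <- cutree(tree, k = 3)
split(names(groups), groups)
```

`plot(tree)` draws the corresponding dendrogram.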
# Fit multiple linear regression
regression <- lm(Access_Elect ~ Total_Pop + Elect_Gen + Elec_Demand + Rural_Pop + Pop_Growth + HDI + GDP_Per_Capita + Fossil_fuels_elect_gen + Hydroelectricity_gen, data = donnees_csv)
print(summary(regression))
##
## Call:
## lm(formula = Access_Elect ~ Total_Pop + Elect_Gen + Elec_Demand +
## Rural_Pop + Pop_Growth + HDI + GDP_Per_Capita + Fossil_fuels_elect_gen +
## Hydroelectricity_gen, data = donnees_csv)
##
## Residuals:
## Cameroon Republic of the Congo DRC
## -0.45629 -0.16779 -0.08937
## Gabon Chad Central African Republic
## -0.26422 1.67162 1.38688
## Equatorial Guinea Angola Rwanda
## 0.02580 0.29476 2.39262
## Burundi São Tomé and Príncipe
## -4.32482 -0.46918
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.896e+00 3.970e+01 -0.249 0.844
## Total_Pop 1.964e-06 7.106e-07 2.763 0.221
## Elect_Gen -9.286e-02 3.373e-02 -2.753 0.222
## Elec_Demand 1.068e-02 1.192e-02 0.896 0.535
## Rural_Pop -5.320e-01 2.255e-01 -2.359 0.255
## Pop_Growth -2.065e+01 1.127e+01 -1.832 0.318
## HDI 2.361e+02 4.705e+01 5.017 0.125
## GDP_Per_Capita -1.845e-03 2.421e-03 -0.762 0.585
## Fossil_fuels_elect_gen 8.264e+01 2.925e+01 2.826 0.217
## Hydroelectricity_gen 8.124e+01 2.401e+01 3.384 0.183
##
## Residual standard error: 5.456 on 1 degrees of freedom
## Multiple R-squared: 0.9964, Adjusted R-squared: 0.9644
## F-statistic: 31.06 on 9 and 1 DF, p-value: 0.1384
regression5 <- lm(Access_Elect ~ Hydroelectricity_gen + HDI, data = donnees_csv)
print(summary(regression5))
##
## Call:
## lm(formula = Access_Elect ~ Hydroelectricity_gen + HDI, data = donnees_csv)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.8533 -5.4869 -0.4644 5.8651 13.6879
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -92.2424 15.1983 -6.069 0.000299 ***
## Hydroelectricity_gen -0.8533 0.5783 -1.476 0.178278
## HDI 262.3808 27.6107 9.503 1.24e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.183 on 8 degrees of freedom
## Multiple R-squared: 0.9192, Adjusted R-squared: 0.899
## F-statistic: 45.51 on 2 and 8 DF, p-value: 4.26e-05
# Regression diagnostics: residuals vs fitted values
plot(regression5,which = 1)
# Regression diagnostics: normal Q-Q plot of the residuals
plot(regression5,which = 2)
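The two regression summaries can be compared through their residual degrees of freedom and adjusted R-squared. The sketch below recomputes both from the reported R-squared values with the usual formula; `adj_r2` is a hypothetical helper name introduced only for this illustration.

```r
# Adjusted R-squared for n observations and p predictors (intercept excluded)
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 11        # countries in the sample

n - 9 - 1      # residual degrees of freedom of the full model (9 predictors)
n - 2 - 1      # residual degrees of freedom of the reduced model (2 predictors)

adj_r2(0.9964, n = n, p = 9)   # full model, compare with the reported 0.9644
adj_r2(0.9192, n = n, p = 2)   # reduced model, compare with the reported 0.899
```

With a single residual degree of freedom, the full model fits the sample almost exactly and none of its coefficients is significant. The reduced model keeps 8 degrees of freedom and shows a clearly significant effect for HDI.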
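We can use the model to obtain predicted values of Access_Elect from the predictor variables. Called without new data, predict() returns the fitted values for the 11 countries in the sample; here it is applied to the full model (regression).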
# Make predictions
predictions <- predict(regression)
pander(predictions)
| Cameroon | Republic of the Congo | DRC | Gabon | Chad |
|---|---|---|---|---|
| 71.46 | 50.77 | 21.59 | 93.76 | 10.03 |
| Central African Republic | Equatorial Guinea | Angola | Rwanda | Burundi |
|---|---|---|---|---|
| 14.31 | 66.97 | 48.21 | 48.21 | 14.62 |
| São Tomé and Príncipe |
|---|
| 78.47 |
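To predict for a country that is not in the sample, `predict()` needs a `newdata` data frame with the same predictor names. The sketch below uses a made-up data set and a hypothetical new HDI value; it is not fitted on the project data.

```r
# Made-up data (hypothetical values, for illustration only)
toy <- data.frame(HDI    = c(0.40, 0.45, 0.52, 0.58, 0.63, 0.70),
                  Access = c(10, 22, 41, 55, 68, 90))

fit <- lm(Access ~ HDI, data = toy)

# New observation: column names must match the predictors of the model
new_country <- data.frame(HDI = 0.55)

# Point prediction with a 95% prediction interval
pred <- predict(fit, newdata = new_country, interval = "prediction", level = 0.95)
pred
```

The `lwr` and `upr` columns give the bounds of the interval. With only a handful of observations this interval is wide, which would also hold for the 11-country sample.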
In this analysis, we examined the challenges of electrification in Central Africa by applying statistical techniques to a set of socioeconomic, demographic, and energy variables. Principal Component Analysis (PCA) reduced the complexity of the dataset and identified the major dimensions that structure regional disparities. The results highlight the crucial role of GDP per capita, electricity production, access to electricity (urban and rural), and demographic characteristics in differentiating the countries of the region.
The PCA revealed that the first two dimensions capture most of the variability between countries (about 78% of the total variance), contrasting, on the one hand, states with relatively high economic and energy capacity, and on the other hand, those facing structural weaknesses, high levels of rural population, or significant population growth. Cluster analysis, when combined with PCA results, reveals distinct national profiles, reflecting heterogeneous levels of electrification, economic development, and territorial organization. The regression results also suggest that the HDI, which is itself closely linked to GDP per capita, is the key explanatory factor for access to electricity in the region, thus confirming dynamics already observed in other African contexts.
Overall, these results provide crucial insights into the persistent disparities in electrification across Central Africa and underscore the need for targeted policies, particularly to strengthen rural electrification, improve energy efficiency, and diversify generation sources, especially through renewable energy. Future research could incorporate longitudinal data to analyze changes over time, or include institutional and policy variables to better understand the influence of governance on electrification progress.