Information Research and Processing: The Challenges of Electrification in Central Africa

INTRODUCTION

Central Africa, comprising primarily Angola, Burundi, Cameroon, the Central African Republic, Chad, the Republic of Congo, the Democratic Republic of Congo, Gabon, Equatorial Guinea, Rwanda, and São Tomé and Príncipe, is one of the least electrified regions in the world. This sub-region exhibits particularly alarming rates of access to electricity, with the situation even more critical in rural areas. This literature review aims to examine in depth the structural obstacles hindering electrification in Central Africa, while also exploring the transformative potential of renewable energy as a means of reducing spatial inequalities in energy access.

LOAD THE PACKAGES

# Set options and load required libraries
knitr::opts_chunk$set(echo = TRUE)
library(FactoMineR)
library(factoextra)

## Loading required package: ggplot2

## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa

library(ggplot2)
library(psych)

## 
## Attaching package: 'psych'

## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha

library(Factoshiny)

## Loading required package: shiny

## Loading required package: FactoInvestigate

library(shiny)
library(FactoInvestigate)

# Load the remaining libraries
library(DataExplorer)

## Warning: package 'DataExplorer' was built under R version 4.5.2

library(corrplot)

## corrplot 0.95 loaded

library(pander)

## Warning: package 'pander' was built under R version 4.5.2

## 
## Attaching package: 'pander'

## The following object is masked from 'package:shiny':
## 
##     p

library(DT)

## 
## Attaching package: 'DT'

## The following objects are masked from 'package:shiny':
## 
##     dataTableOutput, renderDataTable

library(rsconnect)

## Warning: package 'rsconnect' was built under R version 4.5.2

## 
## Attaching package: 'rsconnect'

## The following object is masked from 'package:shiny':
## 
##     serverInfo

library(askpass)
library(VIM)

## Loading required package: colorspace

## Loading required package: grid

## VIM is ready to use.

## Suggestions and bug-reports can be submitted at: https://github.com/statistikat/VIM/issues

## 
## Attaching package: 'VIM'

## The following object is masked from 'package:datasets':
## 
##     sleep

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

DATABASE IMPORT

The database was entered into Excel and saved in CSV (semicolon-delimited) format. We then imported this file into R using the “read.csv()” function.

# Define the file path
file <- "C:/Users/LEGION/Desktop/Projet RTI S7A-2025/RTI_DATA_2022.csv"

# Load the data: semicolon-delimited, decimal comma, first column used as row names
donnees_csv <- read.csv(file, header = TRUE, sep = ";", dec = ",", row.names = 1, fileEncoding = "latin1")
datatable(donnees_csv, options = list(pageLength = 5, autoWidth = TRUE))
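
The import settings can be checked on a small inline example. The sketch below is illustrative only: `csv_text` and its two rows are made-up stand-ins for the real file, chosen to mimic its layout (semicolon delimiter, decimal comma, country names in the first column).

```r
# Hypothetical two-row CSV in the same layout as the real file:
# semicolon-delimited, comma as decimal mark, country names in column 1.
csv_text <- "Country;Elect_Gen;Access_Elect
A;1234,5;45,2
B;678,9;12,8"

demo <- read.csv(text = csv_text, header = TRUE, sep = ";", dec = ",",
                 row.names = 1)

str(demo)        # both columns should be parsed as numeric
rownames(demo)   # the first column has become the row names
```

If `dec = ","` is omitted, values such as "1234,5" are read as text rather than numbers, which would silently break the later numeric analyses. Base R also provides `read.csv2()`, whose defaults are `sep = ";"` and `dec = ","`.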

To improve the readability of the database header, we renamed the variables so that the units given in parentheses are no longer displayed. To do this, we assigned new names with the “colnames()” function.

# Rename the columns, dropping the units given in parentheses
colnames(donnees_csv) <-  c(
  "Elect_Gen",                 # Elec_Gen (GWh)
  "Access_Elect",              # Access_Elec (% of pop)
  "Access_Elect_Urbain",       # Access_Elec_urban (% of urban pop)
  "Access_Elect_Rural",        # Access_Elec_Rural (% of rural pop)
  "Elec_Demand",               # Elec_Demand (GWh)
  "Total_Pop",                 # Total_Pop (inhabitants)
  "Rural_Pop",                 # Rural_Pop (% of total pop)
  "Pop_Growth",                # Pop_Growth (annual %)
  "GDP_Per_Capita",            # GDP_Per_Capita (current US$)
  "HDI",                       # HDI
  "Fossil_fuels_elect_gen",    # Fossil fuels elect gen (billion kWh)
  "Hydroelectricity_gen",      # Hydroelectricity generation (billion kWh)
  "Income_Class",              # Income_Class
  "Indust_Level"               # Indust_Level
)

datatable(donnees_csv, options = list(pageLength = 5, autoWidth = TRUE))

ANALYSIS OF MISSING DATA

In this section, we create a function that counts the missing values in each variable and computes their proportion relative to the number of observations. Running the code shows that a single value is missing, for the rural electrification rate (Access_Elect_Rural): 1 of 11 observations, a proportion of about 0.09 for this variable and 0 for all the others.

# Function to calculate the proportion of missing values per variable
proportion_valeurs_manquantes <- function(data) {
  # Calculating the number of missing values per column
  nb_valeurs_manquantes <- sapply(data, function(x) sum(is.na(x)))

  # Calculating the proportion of missing values
  proportion_manquantes <- nb_valeurs_manquantes / nrow(data)

  # Creating a dataframe for the result
  resultat <- data.frame(Nombre = nb_valeurs_manquantes, Proportion = proportion_manquantes)
  return(resultat)
}

# Apply the function to the database
resultat <- proportion_valeurs_manquantes(donnees_csv)

# Displaying the result
resultat

##                        Nombre Proportion
## Elect_Gen                   0 0.00000000
## Access_Elect                0 0.00000000
## Access_Elect_Urbain         0 0.00000000
## Access_Elect_Rural          1 0.09090909
## Elec_Demand                 0 0.00000000
## Total_Pop                   0 0.00000000
## Rural_Pop                   0 0.00000000
## Pop_Growth                  0 0.00000000
## GDP_Per_Capita              0 0.00000000
## HDI                         0 0.00000000
## Fossil_fuels_elect_gen      0 0.00000000
## Hydroelectricity_gen        0 0.00000000
## Income_Class                0 0.00000000
## Indust_Level                0 0.00000000

# Using the aggr() function to view missing values

aggr(donnees_csv, col=c('navyblue','yellow'), numbers=TRUE, sortVars=TRUE, 
     labels=names(donnees_csv), cex.axis=.7, gap=3, ylab=c("Histogram of missing data","Pattern"))

## 
##  Variables sorted by number of missings: 
##                Variable      Count
##      Access_Elect_Rural 0.09090909
##               Elect_Gen 0.00000000
##            Access_Elect 0.00000000
##     Access_Elect_Urbain 0.00000000
##             Elec_Demand 0.00000000
##               Total_Pop 0.00000000
##               Rural_Pop 0.00000000
##              Pop_Growth 0.00000000
##          GDP_Per_Capita 0.00000000
##                     HDI 0.00000000
##  Fossil_fuels_elect_gen 0.00000000
##    Hydroelectricity_gen 0.00000000
##            Income_Class 0.00000000
##            Indust_Level 0.00000000

Only one variable contains a missing value:
• Access_Elect_Rural: 1 missing value out of 11 observations (about 9% for this variable).
All other variables have 0% missing values.

Interpretation
• The dataset is generally clean and fully usable.
• The single missing value does not invalidate the PCA: FactoMineR automatically imputes it with the mean of the variable, although the warning it issues recommends the imputePCA function of the missMDA package.
• With so few missing values, imputation should have only a limited effect on the PCA results.
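
The effect of the mean imputation mentioned above can be illustrated on a toy vector. This is a minimal sketch with made-up values (the vector `x` is hypothetical, not the study data); it only shows what replacing one missing value by the mean does to a variable.

```r
# Made-up rural access rates for 11 countries, one of them missing.
x <- c(12, 5, 30, NA, 8, 2, 25, 40, 15, 3, 20)

mean(is.na(x))                 # share of missing values: 1/11

# Mean imputation: replace the NA by the mean of the observed values.
x_imputed <- x
x_imputed[is.na(x_imputed)] <- mean(x, na.rm = TRUE)

# The mean is unchanged, but the variance shrinks slightly,
# because the imputed point sits exactly at the centre of the distribution.
c(mean(x, na.rm = TRUE), mean(x_imputed))
c(var(x, na.rm = TRUE), var(x_imputed))
```

With a single missing value out of 11 the distortion is small, which is why mean imputation is tolerable here; with more missing data, a model-based imputation such as the one recommended in the FactoMineR warning would be preferable.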

DESCRIPTION OF QUANTITATIVE VARIABLES

This description involves creating histograms and boxplots for each variable. These graphs allow us to analyze the distribution of each variable: central tendency, spread, outliers, etc. We begin by identifying the columns that contain quantitative variables using the “sapply()” function.

# Identify the quantitative columns
vars_quantitatives <- sapply(donnees_csv, is.numeric)


# Create a histogram for each quantitative variable
for (var in names(donnees_csv)[vars_quantitatives]) {
  print(ggplot(donnees_csv, aes_string(x = var)) +
          geom_histogram(bins = 30, fill = "blue", color = "black") +
          theme_minimal() +
          labs(title = paste("Histogram of", var), x = var, y = "Frequency"))
}

## Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
## ℹ Please use tidy evaluation idioms with `aes()`.
## ℹ See also `vignette("ggplot2-in-packages")` for more information.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_bin()`).
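
The deprecation warning above suggests replacing `aes_string()` with `aes()` and tidy evaluation. One possible way to do this is the `.data` pronoun, sketched below on a small made-up data frame; `demo` and the helper `histogramme()` are hypothetical names introduced for illustration, not part of the original analysis.

```r
library(ggplot2)

# Small made-up data frame standing in for donnees_csv.
demo <- data.frame(Access_Elect = c(20, 35, 50, 90, 12),
                   HDI = c(0.40, 0.48, 0.55, 0.70, 0.39))

# Helper that draws the histogram of one column, given its name as a string.
histogramme <- function(data, var) {
  # .data[[var]] looks the column up by name inside aes()
  ggplot(data, aes(x = .data[[var]])) +
    geom_histogram(bins = 30, fill = "blue", color = "black") +
    theme_minimal() +
    labs(title = paste("Histogram of", var), x = var, y = "Frequency")
}

# One histogram per column, printed immediately as in the original loop.
for (var in names(demo)) {
  print(histogramme(demo, var))
}
```

Wrapping the plot in a function keeps each column name fixed for its own plot, which avoids surprises from lazy evaluation if the plots are stored and printed later.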

The histograms show very heterogeneous distributions, which is expected in an analysis of Central Africa, where countries have very different energy profiles.

Elect_Gen (total electricity generation)
• Highly asymmetric distribution.
• Cameroon, Congo, and Gabon produce significantly more electricity than the Central African Republic (CAR) and Chad.
This indicates strong structural disparities between countries.

Access_Elect, Access_Elect_Urbain, Access_Elect_Rural
• High heterogeneity: access is very high in Gabon and Cameroon and very low in Chad and the CAR.
• Rural access: very low values in most countries, especially the CAR and Chad.
This suggests that rural access is the main challenge in the region.

GDP_Per_Capita, HDI
• Highly dispersed distributions: Gabon and Equatorial Guinea are far ahead of the other countries.
Wealthier countries tend to have better energy performance.

Fossil_fuels_elect_gen, Hydroelectricity_gen
• Some countries do not use fossil fuels or hydropower at all (Chad, Central African Republic).
The energy mix therefore also helps explain the disparities in electrification.

These observations may be confirmed or refuted by the PCA.

# Create a boxplot for each quantitative variable
for (var in names(donnees_csv)[vars_quantitatives]) {
  print(ggplot(donnees_csv, aes_string(x = factor(1), y = var)) +
          geom_boxplot(fill = "skyblue", color = "darkblue") +
          theme_minimal() +
          labs(title = paste("Boxplot of", var), x = "", y = var))
}

## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_boxplot()`).

The boxplots show strong outliers for:
• Elect_Gen (Cameroon very high)
• GDP_Per_Capita (Gabon and Equatorial Guinea well above the others)
• Access_Elect_Rural (Gabon well above the others)
This confirms that the sample contains very diverse countries, which supports the use of PCA to identify typical profiles.

ANALYSIS OF QUALITATIVE VARIABLES

# Function to create a barplot in proportions
creer_barplot_proportion <- function(data, column_name) {
  # Calculate the proportions
  proportions <- data %>%
    count(.data[[column_name]]) %>%
    mutate(Proportion = n / sum(n))

  # Create the barplot
  ggplot(proportions, aes_string(x = column_name, y = "Proportion", fill = column_name)) +
    geom_bar(stat = "identity") +
    scale_y_continuous(labels = scales::percent_format()) +
    labs(x = column_name, y = "Proportion (%)") +
    theme_minimal()
}

# Create a bar plot for the variable "Income_Class"
creer_barplot_proportion(donnees_csv, "Income_Class")

# Create a bar plot for the variable "Indust_Level"
creer_barplot_proportion(donnees_csv, "Indust_Level")

The income-class chart shows that just over 45% of the countries fall in the low-income class and roughly 35% in the middle-income class. Slightly less than 20% of the countries studied have relatively high incomes.

The industrialization-level chart shows that nearly 40% of Central African countries have a low level of industrialization, while a similar share (nearly 40%) have a high level.

ANALYSIS OF CORRELATIONS BETWEEN QUANTITATIVE VARIABLES

We calculate the correlation matrix for the twelve quantitative variables and visualize it with a correlation plot. This helps in understanding the relationships between variables before performing the PCA.

# Identify the quantitative columns
vars_quantitatives <- sapply(donnees_csv, is.numeric)

# Extract the quantitative variables
donnees_quantitatives <- donnees_csv[, vars_quantitatives]

# Calculate the correlation matrix
matrice_correlation <- cor(donnees_quantitatives, use = "complete.obs")

datatable(matrice_correlation, options = list(pageLength = 6)) %>%
  formatRound(columns = 1:ncol(matrice_correlation), digits = 2)

# Correlation plot (corrplot package)
corrplot(matrice_correlation, method = "color", type = "upper", tl.col = "black", tl.srt = 75)
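
Because one country has no value for Access_Elect_Rural, the choice of the `use` argument in `cor()` matters: with `"complete.obs"`, that country is dropped from every correlation, so all coefficients are computed on 10 countries rather than 11. The sketch below contrasts this with `"pairwise.complete.obs"` on a made-up data frame (`demo` and its values are hypothetical).

```r
# Made-up data: 6 observations, one missing value in column b.
demo <- data.frame(a = c(1, 2, 3, 4, 5, 6),
                   b = c(2, 1, NA, 5, 4, 6),
                   c = c(6, 5, 3, 4, 2, 1))

# "complete.obs": any row containing an NA is dropped for every pair.
r_complete <- cor(demo, use = "complete.obs")

# "pairwise.complete.obs": a row is dropped only for pairs involving its NA.
r_pairwise <- cor(demo, use = "pairwise.complete.obs")

r_complete["a", "c"]   # computed on the 5 complete rows
r_pairwise["a", "c"]   # computed on all 6 rows
```

The pairwise option uses more data per coefficient, but the resulting matrix mixes coefficients computed on different samples; for a single missing value either choice is defensible.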

The correlation graph (heatmap) highlights the relationships between the different variables associated with electrification in Central Africa. Several important links clearly emerge:

Strong positive correlation between Access_Elect, Access_Elect_Urbain, and Access_Elect_Rural

These three electricity access variables are very strongly correlated with each other (high coefficients, in dark blue). When a country has good overall access to electricity, it also has good access in urban areas and usually better access in rural areas, even if rural levels remain low. This is logical: overall access is driven primarily by urban performance, but any improvement in rural access directly raises total access.

Very strong correlation between GDP_Per_Capita and HDI

This is one of the highest positive correlations in the matrix (intense blue): countries with a high GDP per capita (Gabon, Equatorial Guinea) also have a higher Human Development Index. This reflects a structural reality: the wealthier a country is, the better it tends to perform in health, education, and infrastructure, and therefore in electrification.

Positive Correlation Between Elect_Gen and Elec_Demand

The two variables are almost perfectly correlated: countries that produce a lot of electricity are also those that consume a lot of it. This is expected behavior for energy systems: demand drives production, and production capacity depends on the level of industrialization and urbanization.

Population Growth Negatively Correlated with HDI

Although this correlation is weaker, a negative link is observed between the population growth rate (Pop_Growth) and the level of human development (HDI). This suggests that countries with high population growth (e.g., Chad, Central African Republic) are also those with lower human development.

This can be explained by:
• pressure on public services,
• the difficulty of electrifying a rapidly growing population,
• infrastructure that cannot keep pace.

Rural Population Negatively Correlated with Access to Electricity

The share of the rural population (Rural_Pop) is inversely correlated with overall access to electricity: the more rural a country is, the lower its access to electricity.

This reflects a fundamental reality in Central Africa. Rural electrification is the main energy deficit because:
• distances are greater,
• infrastructure costs are higher,
• rural areas are less profitable for operators.

Relationships between energy sources: Hydro and Fossil fuels

According to the heatmap, countries with high hydroelectric production (Cameroon, Gabon) are not those with high fossil fuel production, so the two variables appear to be inversely correlated.

This points to two types of energy profiles:
• “hydro-dependent” countries
• “fossil fuel-dependent” countries

CENTER AND SCALE THE DATA

After the descriptive analysis of the variables and the examination of the correlation matrix, it becomes clear that the indicators in the database are not expressed on comparable scales. Some variables, such as electricity production or demand, take very high values expressed in gigawatt-hours, while others, such as electricity access rates or the share of the rural population, are expressed as percentages. Similarly, indicators such as the HDI and GDP per capita differ greatly in magnitude and units. In this context, a PCA requires harmonizing the scales to prevent variables with large magnitudes from dominating the analysis. This leads to the next step: centering and scaling (standardizing) the data, which makes all variables comparable and supports a reliable interpretation of the factorial axes.

# Center and scale the data
donnees_centrees_reduites <- scale(donnees_quantitatives, center = TRUE, scale = TRUE)
datatable(donnees_centrees_reduites, options = list(pageLength = 5, autoWidth = TRUE))
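
Standardization replaces each value x by z = (x - mean) / sd, so that every variable ends up with mean 0 and standard deviation 1. The following sketch checks, on a made-up vector, that `scale()` does exactly this (using the sample standard deviation, with n - 1 in the denominator).

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # made-up values

# Manual z-scores: subtract the mean, divide by the sample standard deviation
z_manual <- (x - mean(x)) / sd(x)

# scale() does the same column by column and returns a matrix
z_scale <- as.numeric(scale(x, center = TRUE, scale = TRUE))

all.equal(z_manual, z_scale)
c(mean(z_scale), sd(z_scale))    # approximately 0 and 1
```

Note that `PCA()` in FactoMineR standardizes the variables itself by default (`scale.unit = TRUE`), so running it on already standardized data should lead to the same axes as running it on the raw quantitative variables.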

FACTOR ANALYSIS (FA) / PRINCIPAL COMPONENT ANALYSIS (PCA)

Factor analysis is a family of statistical methods used to reduce the dimensionality of a dataset while retaining essential information. It involves identifying latent dimensions, called factors, that summarize the relationships between the initial variables. This approach is particularly relevant when working with a large number of interdependent variables, as is the case in the study of the determinants of electrification in Central Africa. Among the factor analysis methods, we have chosen Principal Component Analysis (PCA), which is well-suited to quantitative data. PCA transforms the set of initial variables into a smaller number of new, uncorrelated components, while maximizing the proportion of variance explained. Therefore, the implementation of PCA in our study aims primarily to simplify the data structure and identify the major components that summarize the energy, demographic, and socioeconomic characteristics of the countries analyzed.
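
To make the idea of "new, uncorrelated components that maximize explained variance" concrete, a standardized PCA can be reproduced with base R as an eigen-decomposition of the correlation matrix. The sketch below is not the FactoMineR implementation; it uses the built-in `USArrests` dataset purely as a stand-in for the study data.

```r
# USArrests ships with base R (4 numeric variables, 50 states); it is used
# here only as a stand-in for the study data.
X <- scale(USArrests)

# Eigen-decomposition of the correlation matrix
decomp <- eigen(cor(USArrests))
valeurs_propres <- decomp$values

# Component scores: standardized data projected onto the eigenvectors
scores <- X %*% decomp$vectors

# The eigenvalues sum to the number of variables (here 4)
sum(valeurs_propres)

# Percentage of variance carried by each component
round(100 * valeurs_propres / sum(valeurs_propres), 2)

# The components are mutually uncorrelated (identity matrix up to rounding)
round(cor(scores), 10)
```

Each eigenvalue is the variance of the corresponding component, which is why eigenvalues are read as "amount of variance explained" in the rest of the analysis.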

# Perform the PCA
resultat_acp <- PCA(donnees_centrees_reduites, axes = c(1, 2), graph = TRUE)

## Warning in PCA(donnees_centrees_reduites, axes = c(1, 2), graph = TRUE):
## Missing values are imputed by the mean of the variable: you should use the
## imputePCA function of the missMDA package

# Display the results of the PCA
print(resultat_acp)

## **Results for the Principal Component Analysis (PCA)**
## The analysis was performed on 11 individuals, described by 12 variables
## *The results are available in the following objects:
## 
##    name               description                          
## 1  "$eig"             "eigenvalues"                        
## 2  "$var"             "results for the variables"          
## 3  "$var$coord"       "coord. for the variables"           
## 4  "$var$cor"         "correlations variables - dimensions"
## 5  "$var$cos2"        "cos2 for the variables"             
## 6  "$var$contrib"     "contributions of the variables"     
## 7  "$ind"             "results for the individuals"        
## 8  "$ind$coord"       "coord. for the individuals"         
## 9  "$ind$cos2"        "cos2 for the individuals"           
## 10 "$ind$contrib"     "contributions of the individuals"   
## 11 "$call"            "summary statistics"                 
## 12 "$call$centre"     "mean of the variables"              
## 13 "$call$ecart.type" "standard error of the variables"    
## 14 "$call$row.w"      "weights for the individuals"        
## 15 "$call$col.w"      "weights for the variables"

Correlation Circle of Variables

The correlation circle analysis identifies the structure of the first two axes of the PCA. The first dimension (Dimension 1), which explains 43.41% of the total variance, is strongly correlated with variables reflecting the level of socioeconomic development and energy performance. We observe that the vectors GDP_Per_Capita, HDI, Access_Elect, Access_Elect_Urbain, as well as the variables related to electricity production and demand (Elect_Gen, Elec_Demand, Hydroelectricity_gen, Fossil_fuels_elect_gen) clearly point in the positive direction of this axis. This means that Dimension 1 contrasts countries with high levels of wealth, advanced electrification, and a more developed energy system with those with low economic and energy capacity. Thus, this dimension can be named:

Axis 1: “Economic Development and Energy Performance”

The second dimension (Dimension 2), which explains 36.60% of the variance, primarily contrasts demographic variables. The vectors Pop_Growth and Total_Pop are positively aligned on this axis, while Rural_Pop is projected onto the negative side. This structure reflects a contrast between, on the one hand, countries with high population growth or a large total population, and on the other hand, those where the population is predominantly rural. The positioning of variables such as Hydroelectricity_gen or Elec_Demand near the vertical axis indicates that they contribute moderately to this dimension, without strongly structuring it. Dimension 2 therefore expresses characteristics related to population pressure, urbanization, and territorial imbalances more than to energy performance. This dimension can be named:

Axis 2: “Demographic Dynamics and Territorial Structure”

Factor Map of Individuals

Analysis of the first two axes of the PCA highlights the main factors structuring energy differences between Central African countries. The first dimension, which explains 43.41% of the total variance, clearly distinguishes countries with strong economic and industrial capacity from those with a lower level of development. Positive values on this axis are associated with greater access to electricity, higher energy production, and a higher level of industrialization, encompassing countries such as Angola, Gabon, Equatorial Guinea, DRC and the Congo. Conversely, low-income countries, characterized by low electrification and limited energy infrastructure—notably Burundi, the Central African Republic, and Chad—are located on the negative end of this dimension. The second dimension, which explains 36.60% of the variance, introduces further differentiation based on structural characteristics related to demographics, institutional stability, and the development of public services. Countries at the top of the axis generally face more pronounced socio-economic challenges, while those at the bottom, such as São Tomé and Príncipe, Gabon, and Equatorial Guinea, are distinguished by particular economic structures or atypical demographic profiles. Thus, the two axes combined reveal a clear contrast between countries with strong economic and energy capacity and those experiencing structural vulnerability, while also reflecting the region’s internal diversity.

# Perform the PCA with the qualitative variables as supplementary variables
resultat_acp <- PCA(donnees_csv, scale.unit = TRUE, ncp = 2, quali.sup = 13:14, graph = TRUE)

## Warning in PCA(donnees_csv, scale.unit = TRUE, ncp = 2, quali.sup = 13:14, :
## Missing values are imputed by the mean of the variable: you should use the
## imputePCA function of the missMDA package

# Display the results of the PCA
print(resultat_acp)

## **Results for the Principal Component Analysis (PCA)**
## The analysis was performed on 11 individuals, described by 14 variables
## *The results are available in the following objects:
## 
##    name                description                                          
## 1  "$eig"              "eigenvalues"                                        
## 2  "$var"              "results for the variables"                          
## 3  "$var$coord"        "coord. for the variables"                           
## 4  "$var$cor"          "correlations variables - dimensions"                
## 5  "$var$cos2"         "cos2 for the variables"                             
## 6  "$var$contrib"      "contributions of the variables"                     
## 7  "$ind"              "results for the individuals"                        
## 8  "$ind$coord"        "coord. for the individuals"                         
## 9  "$ind$cos2"         "cos2 for the individuals"                           
## 10 "$ind$contrib"      "contributions of the individuals"                   
## 11 "$quali.sup"        "results for the supplementary categorical variables"
## 12 "$quali.sup$coord"  "coord. for the supplementary categories"            
## 13 "$quali.sup$v.test" "v-test of the supplementary categories"             
## 14 "$call"             "summary statistics"                                 
## 15 "$call$centre"      "mean of the variables"                              
## 16 "$call$ecart.type"  "standard error of the variables"                    
## 17 "$call$row.w"       "weights for the individuals"                        
## 18 "$call$col.w"       "weights for the variables"

The integration of the supplementary qualitative variables “Income_Class” and “Indust_Level” enriches the analysis by revealing key socio-economic dynamics within Central African countries. The categories projected onto the factorial plane align with the distribution of countries. The categories Income_Class_Low and Indust_Level_Bottom, located on the negative side of dimension 1, correspond to countries such as Burundi, the Central African Republic, and Chad, characterized by low income levels, limited industrialization, and underdeveloped energy infrastructure. Conversely, the categories Income_Class_U-M and Indust_Level_U-M, positioned on the positive side of this axis, correspond to countries with a higher level of economic and industrial development, notably Gabon, Equatorial Guinea, Congo, and especially the DRC and Angola, the latter standing out as an extreme case. Finally, intermediate categories, such as Income_Class_L-M or Indust_Level_L-M, are located near the center of the graph and reflect transitional economic profiles, encompassing countries like Cameroon, Rwanda, and the DRC. Overall, these qualitative categories are consistent with the resulting factor structure: income level and degree of industrialization play a decisive role in the energy contrasts observed between countries in the region.

# Biplot visualization
fviz_pca_biplot(resultat_acp, repel = TRUE)

## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## ℹ The deprecated feature was likely used in the ggpubr package.
##   Please report the issue at <https://github.com/kassambara/ggpubr/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

This graph is a principal component analysis (PCA) biplot, which represents the relationships between countries (points) and variables (arrows) on the first two principal dimensions, Dim1 and Dim2. These dimensions capture 43.55% and 34.57% of the variance of the data, respectively, so together they account for 78.12% of the total variance. We can also make the following observations:

• The first axis, titled “Economic Development and Energy Performance,” reflects a clear opposition between two groups of countries. To the right of the axis are the countries with a higher GDP per capita, a higher HDI, and better rates of access to electricity, in both urban and rural areas. These include Gabon, São Tomé and Príncipe, and Equatorial Guinea and, closer to the origin, the Republic of the Congo, Rwanda, and Cameroon. Conversely, to the left of this axis appear countries with lower access to electricity and faster population growth, such as the DRC, Chad, Burundi, Angola, and the Central African Republic. The first axis thus reflects the overall gradient of development and access to electricity among the countries studied.

• The second axis, called “Demographic Dynamics and Territorial Structure,” contrasts countries characterized by strong population growth and a predominantly rural population—such as Chad, the Central African Republic, and Burundi—with those whose demographic structure is more stable and more urbanized, such as Gabon, Equatorial Guinea, and São Tomé and Príncipe. This axis therefore highlights the influence of rurality and demographic pressure on the challenges related to electrification, showing that the more rural countries with high population growth are also those that encounter the greatest difficulties in accessing energy services.

EIGENVALUES

The eigenvalues indicate the amount of variance explained by each principal component.

# Extract and plot eigenvalues
val.propre <- get_eigenvalue(resultat_acp)
pander(val.propre)

	eigenvalue	variance.percent	cumulative.variance.percent
Dim.1	5.226	43.55	43.55
Dim.2	4.148	34.57	78.12
Dim.3	1.044	8.696	86.81
Dim.4	0.7268	6.056	92.87
Dim.5	0.5651	4.709	97.58
Dim.6	0.1574	1.312	98.89
Dim.7	0.08546	0.7122	99.6
Dim.8	0.03278	0.2732	99.88
Dim.9	0.01335	0.1113	99.99
Dim.10	0.001465	0.01221	100

fviz_eig(resultat_acp, addlabels = TRUE, ylim = c(0, 50))

## Warning in geom_bar(stat = "identity", fill = barfill, color = barcolor, :
## Ignoring empty aesthetic: `width`.

The eigenvalue analysis shows that the first three axes have eigenvalues greater than 1, which, according to Kaiser’s criterion, initially suggests retaining three components. However, the final decision also takes into account the scree test and considerations of parsimony and interpretability. The scree plot shows a clear flattening of the slope after the second axis: the first two axes together explain 78.12% of the total variance (43.55% and 34.57%), while the third adds only 8.7% (bringing the cumulative variance to 86.81%). In other words, the first two axes capture most of the structural information in the dataset. Furthermore, the third axis, whose eigenvalue (1.044) only barely exceeds 1, explains little additional variability and risks introducing a secondary component that is difficult to interpret robustly, especially with only 11 observations. For these reasons (a large share of variance explained by the first two axes, a marked elbow after axis 2, and the need for a concise, interpretable synthesis), we retain two factorial axes for the main analysis. The third axis may, however, be presented in an appendix if a more detailed exploration of the residual variation proves necessary.
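
The figures used in this discussion can be recomputed directly from the eigenvalue table. The sketch below copies the reported eigenvalues by hand (so results match the table only up to rounding) and derives the variance percentages, Kaiser's criterion, and the size of the drop between consecutive axes.

```r
# Eigenvalues as reported in the eigenvalue table
eig <- c(5.226, 4.148, 1.044, 0.7268, 0.5651, 0.1574,
         0.08546, 0.03278, 0.01335, 0.001465)

pct <- 100 * eig / sum(eig)   # share of variance per axis
cum <- cumsum(pct)            # cumulative share

round(pct[1:3], 2)
round(cum[1:3], 2)

# Kaiser's criterion: count the axes whose eigenvalue exceeds 1
sum(eig > 1)

# Drop in explained variance from one axis to the next (the "elbow")
round(-diff(pct)[1:3], 2)
```

The largest drop occurs between the second and third axes, which is the numerical counterpart of the elbow seen on the scree plot.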

CONTRIBUTION OF VARIABLES TO THE COMPONENTS

We examine the contribution of each variable to the principal components.

# Get PCA variable results
resultat.var <- get_pca_var(resultat_acp)
pander(resultat.var$coord)

	Dim.1	Dim.2
Elect_Gen	-0.4548	0.8823
Access_Elect	0.8706	0.4483
Access_Elect_Urbain	0.7557	0.3368
Access_Elect_Rural	0.6208	0.07858
Elec_Demand	-0.4971	0.8549
Total_Pop	-0.7399	0.5019
Rural_Pop	-0.6205	-0.6026
Pop_Growth	-0.8368	0.319
GDP_Per_Capita	0.7041	0.3572
HDI	0.779	0.6015
Fossil_fuels_elect_gen	0.1798	0.6623
Hydroelectricity_gen	-0.5388	0.8205

pander(resultat.var$cor)

	Dim.1	Dim.2
Elect_Gen	-0.4548	0.8823
Access_Elect	0.8706	0.4483
Access_Elect_Urbain	0.7557	0.3368
Access_Elect_Rural	0.6208	0.07858
Elec_Demand	-0.4971	0.8549
Total_Pop	-0.7399	0.5019
Rural_Pop	-0.6205	-0.6026
Pop_Growth	-0.8368	0.319
GDP_Per_Capita	0.7041	0.3572
HDI	0.779	0.6015
Fossil_fuels_elect_gen	0.1798	0.6623
Hydroelectricity_gen	-0.5388	0.8205

pander(resultat.var$cos2)

	Dim.1	Dim.2
Elect_Gen	0.2069	0.7784
Access_Elect	0.7579	0.2009
Access_Elect_Urbain	0.5711	0.1135
Access_Elect_Rural	0.3854	0.006175
Elec_Demand	0.2471	0.7308
Total_Pop	0.5475	0.2519
Rural_Pop	0.385	0.3632
Pop_Growth	0.7002	0.1018
GDP_Per_Capita	0.4958	0.1276
HDI	0.6068	0.3618
Fossil_fuels_elect_gen	0.03233	0.4386
Hydroelectricity_gen	0.2903	0.6732

pander(resultat.var$contrib)

	Dim.1	Dim.2
Elect_Gen	3.958	18.77
Access_Elect	14.5	4.845
Access_Elect_Urbain	10.93	2.735
Access_Elect_Rural	7.374	0.1489
Elec_Demand	4.728	17.62
Total_Pop	10.48	6.074
Rural_Pop	7.367	8.756
Pop_Growth	13.4	2.453
GDP_Per_Capita	9.486	3.076
HDI	11.61	8.722
Fossil_fuels_elect_gen	0.6186	10.57
Hydroelectricity_gen	5.555	16.23
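
The four tables are linked: in a standardized PCA where every variable has weight 1, cos2 is the squared coordinate, and the contribution (in %) is cos2 divided by the eigenvalue of the axis, times 100. The sketch below checks this on three Dim.1 values copied by hand from the tables, so small rounding differences are expected.

```r
# Dim.1 coordinates copied from the coordinate table (three variables only)
coord_dim1 <- c(Access_Elect = 0.8706, Pop_Growth = -0.8368, HDI = 0.779)
lambda1 <- 5.226            # eigenvalue of Dim.1

cos2_dim1 <- coord_dim1^2                    # quality of representation
contrib_dim1 <- 100 * cos2_dim1 / lambda1    # contribution in %

round(cos2_dim1, 4)
round(contrib_dim1, 2)

# Contribution expected if all 12 variables contributed equally
100 / 12
```

A variable whose contribution exceeds the equal-share level (about 8.3% here) can be regarded as contributing more than average to the axis, which gives a simple rule for reading the contribution tables.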

Let’s now visualize these contributions on the contribution graphs:

From the analysis of the contribution graphs for the variables, it emerges that:

The variables that contribute most to the formation of dimension 1 are Access_Elect, Pop_Growth, HDI, Access_Elect_Urbain, Total_Pop and GDP_Per_Capita. Each exceeds the 8.33% (100/12) contribution expected if all twelve variables contributed equally.

The variables that contribute most to the formation of dimension 2 are Elect_Gen, Elec_Demand, Hydroelectricity_gen and Fossil_fuels_elect_gen, followed by Rural_Pop and HDI, which sit just above the same threshold.

Similarly, the variables Elect_Gen, Elec_Demand, HDI, Hydroelectricity_gen, Access_Elect, Pop_Growth and Total_Pop contribute most to the formation of the factorial plane.
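As a cross-check on the tables above, a variable's contribution to an axis is its cos2 on that axis divided by the axis eigenvalue, expressed as a percentage. In a standardized PCA the eigenvalue equals the column sum of cos2. The sketch below recomputes the contributions from the rounded cos2 values printed earlier and lists the variables above the 100/12 threshold; the object names (cos2, contrib_check, threshold, top_dim1, top_dim2) are illustrative and not part of the original script.

```r
# Rounded cos2 values copied from the pander(resultat.var$cos2) output
vars <- c("Elect_Gen", "Access_Elect", "Access_Elect_Urbain", "Access_Elect_Rural",
          "Elec_Demand", "Total_Pop", "Rural_Pop", "Pop_Growth",
          "GDP_Per_Capita", "HDI", "Fossil_fuels_elect_gen", "Hydroelectricity_gen")
cos2 <- cbind(
  Dim.1 = c(0.2069, 0.7579, 0.5711, 0.3854, 0.2471, 0.5475,
            0.385, 0.7002, 0.4958, 0.6068, 0.03233, 0.2903),
  Dim.2 = c(0.7784, 0.2009, 0.1135, 0.006175, 0.7308, 0.2519,
            0.3632, 0.1018, 0.1276, 0.3618, 0.4386, 0.6732)
)
rownames(cos2) <- vars

# Eigenvalue of each axis = column sum of cos2 (standardized PCA)
eig <- colSums(cos2)

# Contribution (%) = 100 * cos2 / eigenvalue
contrib_check <- 100 * sweep(cos2, 2, eig, "/")

# Contribution expected if all variables contributed equally
threshold <- 100 / nrow(cos2)

# Variables above the threshold on each axis, sorted by decreasing contribution
d1 <- contrib_check[, "Dim.1"]
d2 <- contrib_check[, "Dim.2"]
top_dim1 <- names(sort(d1[d1 > threshold], decreasing = TRUE))
top_dim2 <- names(sort(d2[d2 > threshold], decreasing = TRUE))

print(round(contrib_check, 2))
print(top_dim1)
print(top_dim2)
```

Because the inputs are rounded, the recomputed percentages can differ from resultat.var$contrib in the last digit.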

fviz_pca_var(resultat_acp, col.var = "contrib", gradient.cols = c("blue", "orange", "red"), repel = TRUE, title = "Contribution of Variables to Principal Components")

fviz_contrib(resultat_acp, choice = "var", axes = 1, top = 12)

fviz_contrib(resultat_acp, choice = "var", axes = 2, top = 12)

fviz_contrib(resultat_acp, choice = "var", axes = 1:2, top = 12)
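When several axes are requested (axes = 1:2), fviz_contrib combines the per-axis contributions as an average weighted by the eigenvalues. The sketch below reproduces that calculation from the contribution table; the eigenvalues (about 5.226 and 4.148) are an assumption recovered from the column sums of the cos2 table, and the names contrib_plane and above_avg are illustrative.

```r
# Contributions (%) copied from the pander(resultat.var$contrib) output
vars <- c("Elect_Gen", "Access_Elect", "Access_Elect_Urbain", "Access_Elect_Rural",
          "Elec_Demand", "Total_Pop", "Rural_Pop", "Pop_Growth",
          "GDP_Per_Capita", "HDI", "Fossil_fuels_elect_gen", "Hydroelectricity_gen")
contrib <- cbind(
  Dim.1 = c(3.958, 14.5, 10.93, 7.374, 4.728, 10.48,
            7.367, 13.4, 9.486, 11.61, 0.6186, 5.555),
  Dim.2 = c(18.77, 4.845, 2.735, 0.1489, 17.62, 6.074,
            8.756, 2.453, 3.076, 8.722, 10.57, 16.23)
)
rownames(contrib) <- vars

# Approximate eigenvalues (assumption: column sums of the rounded cos2 table)
eig <- c(Dim.1 = 5.226, Dim.2 = 4.148)

# Eigenvalue-weighted average of the two per-axis contributions
contrib_plane <- as.vector(contrib %*% eig) / sum(eig)
names(contrib_plane) <- vars

# Variables above the equal-contribution reference (100 / 12)
above_avg <- names(contrib_plane)[contrib_plane > 100 / length(vars)]

print(round(sort(contrib_plane, decreasing = TRUE), 2))
print(above_avg)
```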

CONTRIBUTION OF INDIVIDUALS TO THE PRINCIPAL COMPONENTS

In this section, we explore the coordinates, quality of representation, and contributions of individuals (observations) to the PCA axes.

# Get PCA individual results
resultat.ind <- get_pca_ind(resultat_acp)
pander(resultat.ind$coord)

	Dim.1	Dim.2
Cameroon	0.5026	1.469
Republic of the Congo	0.9198	0.3619
DRC	-4.357	2.313
Gabon	3.612	1.262
Chad	-2.188	-2.291
Central African Republic	-1.257	-2.717
Ecuadorian Guinea	2.132	0.2254
Angola	-1.384	3.906
Rwanda	0.7403	-1.327
Burundi	-1.613	-2.485
São Tomé and Príncipe	2.894	-0.7181

pander(resultat.ind$cos2)

	Dim.1	Dim.2
Cameroon	0.06392	0.5459
Republic of the Congo	0.1893	0.0293
DRC	0.6939	0.1956
Gabon	0.8161	0.09962
Chad	0.4192	0.4593
Central African Republic	0.1402	0.6543
Ecuadorian Guinea	0.5306	0.00593
Angola	0.1021	0.8129
Rwanda	0.08499	0.273
Burundi	0.2763	0.6559
São Tomé and Príncipe	0.585	0.03602

pander(resultat.ind$contrib)

	Dim.1	Dim.2
Cameroon	0.4394	4.728
Republic of the Congo	1.472	0.287
DRC	33.03	11.73
Gabon	22.7	3.491
Chad	8.33	11.5
Central African Republic	2.751	16.18
Ecuadorian Guinea	7.904	0.1113
Angola	3.334	33.45
Rwanda	0.9533	3.858
Burundi	4.526	13.54
São Tomé and Príncipe	14.57	1.13

Let’s now visualize these contributions on the contribution graphs:

From the analysis of the contribution graphs for the individuals, it emerges that:

The individuals that contribute most to the formation of dimension 1 are DRC, Gabon and São Tomé and Príncipe.

The individuals that contribute most to the formation of dimension 2 are Angola and Central African Republic, followed by Burundi, DRC and Chad.

Similarly, the individuals DRC, Angola and Gabon contribute most to the formation of the factorial plane.
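The individual contributions can be recovered from the coordinates alone: with equal weights, an individual's contribution to an axis is its squared coordinate divided by the sum of squared coordinates on that axis. The sketch below assumes equal weights for the eleven countries (the FactoMineR default) and uses the rounded coordinates printed above; coord, contrib_ind and eig_check are illustrative names.

```r
# Coordinates copied from the pander(resultat.ind$coord) output;
# row labels are written as they appear in the dataset
countries <- c("Cameroon", "Republic of the Congo", "DRC", "Gabon", "Chad",
               "Central African Republic", "Ecuadorian Guinea", "Angola",
               "Rwanda", "Burundi", "São Tomé and Príncipe")
coord <- cbind(
  Dim.1 = c(0.5026, 0.9198, -4.357, 3.612, -2.188, -1.257,
            2.132, -1.384, 0.7403, -1.613, 2.894),
  Dim.2 = c(1.469, 0.3619, 2.313, 1.262, -2.291, -2.717,
            0.2254, 3.906, -1.327, -2.485, -0.7181)
)
rownames(coord) <- countries

# Contribution (%) = 100 * coord^2 / sum of coord^2 on the axis
contrib_ind <- 100 * sweep(coord^2, 2, colSums(coord^2), "/")

# Eigenvalue of each axis = mean squared coordinate (equal weights 1/n)
eig_check <- colSums(coord^2) / nrow(coord)

# Countries above the equal-contribution reference (100 / 11)
threshold_ind <- 100 / nrow(coord)
top_ind_dim1 <- rownames(coord)[contrib_ind[, "Dim.1"] > threshold_ind]
top_ind_dim2 <- rownames(coord)[contrib_ind[, "Dim.2"] > threshold_ind]

print(round(contrib_ind, 2))
print(round(eig_check, 3))
```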

fviz_pca_ind(resultat_acp, col.ind = "cos2", gradient.cols = c("blue", "orange", "red"), repel = TRUE)

fviz_contrib(resultat_acp, choice = "ind", axes = 1, top = 12)

fviz_contrib(resultat_acp, choice = "ind", axes = 2, top = 12)

fviz_contrib(resultat_acp, choice = "ind", axes = 1:2, top = 12)

# Perform HCPC
resultat.cah <- HCPC(resultat_acp, nb.clust = -1, consol = FALSE, graph = FALSE)

# Visualize hierarchical clustering
plot.HCPC(resultat.cah, choice = 'tree', title = 'Hierarchical Tree')

plot.HCPC(resultat.cah, choice = 'map', draw.tree = FALSE, title = 'Factor Map')
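HCPC runs a Ward hierarchical clustering on the PCA coordinates; nb.clust = -1 lets it cut the tree at its suggested level, and consol = FALSE skips the k-means consolidation step. As a rough illustration of the principle only, the sketch below applies base R Ward clustering to the first two coordinates printed earlier. It is not equivalent to the HCPC call, which works on all components retained by the PCA, and the choice of k = 3 clusters is an assumption made here for illustration.

```r
# Coordinates on the first two axes, copied from pander(resultat.ind$coord);
# row labels are written as they appear in the dataset
countries <- c("Cameroon", "Republic of the Congo", "DRC", "Gabon", "Chad",
               "Central African Republic", "Ecuadorian Guinea", "Angola",
               "Rwanda", "Burundi", "São Tomé and Príncipe")
coord <- cbind(
  Dim.1 = c(0.5026, 0.9198, -4.357, 3.612, -2.188, -1.257,
            2.132, -1.384, 0.7403, -1.613, 2.894),
  Dim.2 = c(1.469, 0.3619, 2.313, 1.262, -2.291, -2.717,
            0.2254, 3.906, -1.327, -2.485, -0.7181)
)
rownames(coord) <- countries

# Ward hierarchical clustering on Euclidean distances between countries
tree <- hclust(dist(coord), method = "ward.D2")

# Cut the tree into k groups (k = 3 is an illustrative assumption)
groups <- cutree(tree, k = 3)

# List the countries in each group
print(split(names(groups), groups))
```

In the fitted HCPC object itself, the assigned cluster of each country is stored in resultat.cah$data.clust$clust.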


MULTIPLE LINEAR REGRESSION

Finally, we fit a multiple linear regression model to explore the relationships between Access_Elect (electricity access) and various predictors.

Calculation of regression coefficients

# Fit multiple linear regression
regression <- lm(Access_Elect ~ Total_Pop + Elect_Gen + Elec_Demand + Rural_Pop + Pop_Growth + HDI + GDP_Per_Capita + Fossil_fuels_elect_gen + Hydroelectricity_gen, data = donnees_csv)
print(summary(regression))

## 
## Call:
## lm(formula = Access_Elect ~ Total_Pop + Elect_Gen + Elec_Demand + 
##     Rural_Pop + Pop_Growth + HDI + GDP_Per_Capita + Fossil_fuels_elect_gen + 
##     Hydroelectricity_gen, data = donnees_csv)
## 
## Residuals:
##                 Cameroon    Republic of the Congo                      DRC 
##                 -0.45629                 -0.16779                 -0.08937 
##                    Gabon                     Chad Central African Republic 
##                 -0.26422                  1.67162                  1.38688 
##        Ecuadorian Guinea                   Angola                   Rwanda 
##                  0.02580                  0.29476                  2.39262 
##                  Burundi    São Tomé and Príncipe 
##                 -4.32482                 -0.46918 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)
## (Intercept)            -9.896e+00  3.970e+01  -0.249    0.844
## Total_Pop               1.964e-06  7.106e-07   2.763    0.221
## Elect_Gen              -9.286e-02  3.373e-02  -2.753    0.222
## Elec_Demand             1.068e-02  1.192e-02   0.896    0.535
## Rural_Pop              -5.320e-01  2.255e-01  -2.359    0.255
## Pop_Growth             -2.065e+01  1.127e+01  -1.832    0.318
## HDI                     2.361e+02  4.705e+01   5.017    0.125
## GDP_Per_Capita         -1.845e-03  2.421e-03  -0.762    0.585
## Fossil_fuels_elect_gen  8.264e+01  2.925e+01   2.826    0.217
## Hydroelectricity_gen    8.124e+01  2.401e+01   3.384    0.183
## 
## Residual standard error: 5.456 on 1 degrees of freedom
## Multiple R-squared:  0.9964, Adjusted R-squared:  0.9644 
## F-statistic: 31.06 on 9 and 1 DF,  p-value: 0.1384

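Our full model is not statistically significant at the 5% level: its overall p-value is 0.1384. None of the individual predictors is significant either; HDI and Hydroelectricity_gen have the lowest p-values (0.125 and 0.183), both above 5%. With 11 observations and 9 predictors, only 1 residual degree of freedom remains, so the very high R² (0.9964) mostly reflects an almost saturated model. To obtain a more parsimonious and significant model, we successively remove the variables with the highest p-values.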
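The figures in the summary can be re-derived from the sample size alone, which makes the lack of residual degrees of freedom explicit. The sketch below uses only the rounded statistics printed in the output above; the variable names (n, p, df_resid, adj_r2, p_model, p_hdi, p_hydro) are illustrative.

```r
n <- 11   # countries
p <- 9    # predictors in the full model

# Residual degrees of freedom: n - p - 1
df_resid <- n - p - 1

# Adjusted R-squared from the reported multiple R-squared
r2 <- 0.9964
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid

# Overall p-value from the reported F-statistic (9 and 1 degrees of freedom)
p_model <- pf(31.06, df1 = p, df2 = df_resid, lower.tail = FALSE)

# Two-sided p-values for the two largest t-statistics
p_hdi   <- 2 * pt(-abs(5.017), df = df_resid)
p_hydro <- 2 * pt(-abs(3.384), df = df_resid)

print(c(df_resid = df_resid, adj_r2 = adj_r2, p_model = p_model,
        p_hdi = p_hdi, p_hydro = p_hydro))
```

With a single residual degree of freedom, even a t-statistic of about 5 does not reach the 5% level.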

regression5 <- lm(Access_Elect ~ Hydroelectricity_gen + HDI, data = donnees_csv)
print(summary(regression5))

## 
## Call:
## lm(formula = Access_Elect ~ Hydroelectricity_gen + HDI, data = donnees_csv)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.8533  -5.4869  -0.4644   5.8651  13.6879 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          -92.2424    15.1983  -6.069 0.000299 ***
## Hydroelectricity_gen  -0.8533     0.5783  -1.476 0.178278    
## HDI                  262.3808    27.6107   9.503 1.24e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.183 on 8 degrees of freedom
## Multiple R-squared:  0.9192, Adjusted R-squared:  0.899 
## F-statistic: 45.51 on 2 and 8 DF,  p-value: 4.26e-05

We obtained a significant model, with an overall p-value of 0.0000426. However, only HDI is significant at the 5% level (p = 1.24e-05); Hydroelectricity_gen is not (p = 0.178). The model achieves a coefficient of determination R² = 0.9192 (adjusted R² = 0.899), which indicates a good fit while leaving 8 residual degrees of freedom.
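The same arithmetic applied to the reduced model shows which terms clear the 5% threshold. The sketch below relies only on the rounded statistics in the regression5 summary; the object names are illustrative.

```r
n <- 11   # countries
p <- 2    # predictors kept in the reduced model

# Residual degrees of freedom: n - p - 1
df_resid <- n - p - 1

# Adjusted R-squared from the reported multiple R-squared
adj_r2 <- 1 - (1 - 0.9192) * (n - 1) / df_resid

# Overall p-value from the reported F-statistic (2 and 8 degrees of freedom)
p_model <- pf(45.51, df1 = p, df2 = df_resid, lower.tail = FALSE)

# Two-sided p-values from the reported t-statistics
p_values <- c(
  Hydroelectricity_gen = 2 * pt(-abs(-1.476), df = df_resid),
  HDI                  = 2 * pt(-abs(9.503), df = df_resid)
)

# Which coefficients are significant at the 5% level
signif_5pct <- p_values < 0.05

print(c(df_resid = df_resid, adj_r2 = adj_r2, p_model = p_model))
print(p_values)
print(signif_5pct)
```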

Regression graphs

# Residuals vs fitted values
plot(regression5, which = 1)

The points are scattered around zero without a clear pattern, which is consistent with the linearity assumption of the model. The presence of outliers, particularly Cameroon, Republic of the Congo, and Equatorial Guinea (labelled Ecuadorian Guinea in the dataset), suggests that additional investigation into these cases may be warranted.

# Normal Q-Q plot of the residuals
plot(regression5, which = 2)

We observe that the points generally follow a straight line, although there are some deviations, particularly for Cameroon, Republic of the Congo, and Equatorial Guinea. This suggests that the residuals are approximately normally distributed, which is consistent with the assumptions of the model; with only 11 observations, this visual check remains indicative.

Predictions

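We can use the model to compute predicted values of Access_Elect from the predictor variables. The call below uses the full nine-predictor model (regression), so the values are in-sample fitted values for the eleven countries.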

# Make predictions
predictions <- predict(regression)
pander(predictions)

	Prediction
Cameroon	71.46
Republic of the Congo	50.77
DRC	21.59
Gabon	93.76
Chad	10.03
Central African Republic	14.31
Ecuadorian Guinea	66.97
Angola	48.21
Rwanda	48.21
Burundi	14.62
São Tomé and Príncipe	78.47

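According to the residuals reported in the summary of the full model, the fitted values are within about half a point of the actual values for most countries (Cameroon, Republic of the Congo, DRC, Gabon, Equatorial Guinea, Angola, São Tomé and Príncipe). The gaps are larger for Burundi (-4.32), Rwanda (2.39), Chad (1.67) and Central African Republic (1.39). This close agreement is expected from a model with 9 predictors and 11 observations and should not be read as evidence of predictive accuracy on new data.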
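Since a residual is the observed value minus the fitted value, the observed Access_Elect of each country can be reconstructed from the two outputs already shown. The sketch below does this with the rounded predictions and the residuals of the full model; fitted_vals, resid_vals, observed and large_gap are illustrative names, and the 1-point cut-off is an arbitrary choice for display.

```r
# Row labels as they appear in the dataset
countries <- c("Cameroon", "Republic of the Congo", "DRC", "Gabon", "Chad",
               "Central African Republic", "Ecuadorian Guinea", "Angola",
               "Rwanda", "Burundi", "São Tomé and Príncipe")

# Fitted values copied from the pander(predictions) output
fitted_vals <- setNames(
  c(71.46, 50.77, 21.59, 93.76, 10.03, 14.31, 66.97, 48.21, 48.21, 14.62, 78.47),
  countries
)

# Residuals copied from summary(regression)
resid_vals <- setNames(
  c(-0.45629, -0.16779, -0.08937, -0.26422, 1.67162, 1.38688,
    0.02580, 0.29476, 2.39262, -4.32482, -0.46918),
  countries
)

# Observed value = fitted value + residual
observed <- fitted_vals + resid_vals

# Countries whose fitted value is more than 1 point away from the observed value
large_gap <- names(resid_vals)[abs(resid_vals) > 1]

print(round(cbind(observed, fitted = fitted_vals, residual = resid_vals), 2))
print(large_gap)
```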

CONCLUSION

In this analysis, we examined the challenges of electrification in Central Africa by applying statistical techniques to a set of socioeconomic, demographic, and energy variables. Principal Component Analysis (PCA) reduced the complexity of the dataset and identified the major dimensions that structure regional disparities. The results highlight the crucial role of GDP per capita, electricity production, access to electricity (urban and rural), and demographic characteristics in differentiating the countries of the region.

The PCA revealed that the first two dimensions capture most of the variability between countries, contrasting, on the one hand, states with relatively high economic and energy capacity, and on the other hand, those facing structural weaknesses, high levels of rural population, or significant population growth. Cluster analysis, when combined with PCA results, reveals distinct national profiles, reflecting heterogeneous levels of electrification, economic development, and territorial organization. The regression results also suggest that the Human Development Index (HDI), rather than GDP per capita, is the key explanatory factor for access to electricity in the region: it is the only predictor that remains significant in the reduced model.

Overall, these results provide crucial insights into the persistent disparities in electrification across Central Africa and underscore the need for targeted policies, particularly to strengthen rural electrification, improve energy efficiency, and diversify generation sources, especially through renewable energy. Future research could incorporate longitudinal data to analyze changes over time, or include institutional and policy variables to better understand the influence of governance on electrification progress.