INTRODUCTION

Central Africa, comprising primarily Angola, Burundi, Cameroon, the Central African Republic, Chad, the Republic of Congo, the Democratic Republic of Congo, Gabon, Equatorial Guinea, Rwanda, and São Tomé and Príncipe, is one of the least electrified regions in the world. This sub-region exhibits particularly alarming rates of access to electricity, with the situation even more critical in rural areas. This literature review aims to examine in depth the structural obstacles hindering electrification in Central Africa, while also exploring the transformative potential of renewable energy as a means of reducing spatial inequalities in energy access.





Overall Project Objective

The overall objective of the project is to analyze the obstacles that limit electrification in Central Africa, by examining the disparities between urban and rural areas, the socio-economic and energy determinants, and the potential role of renewable energies in the sustainable improvement of access to electricity.



Project Utility (Scientific Rationale)


This project is useful because it allows us to:

• identify the structural factors (economic, demographic, infrastructural) that explain low electrification rates;

• assess energy inequalities between countries and between urban and rural areas;

• guide public policies towards more effective and appropriate solutions;

• determine the true potential of renewable energies as a lever for reducing disparities in access;

• provide an essential quantitative basis to support energy planning, investments, and electrification strategies.



Methodology

The methodology is based on a quantitative and comparative approach, structured in several stages:

Stage 1: Data Collection

• Data from reliable sources: World Bank, UN, IEA, Our World in Data, AFRISTAT, national databases.

• Period studied: depending on availability (often 2000–2022).

Stage 2: Definition and Structuring of Variables

• Dependent variables: access to electricity (total, rural, urban).

• Explanatory variables: economic, demographic, energy, and infrastructural aspects.

Stage 3: Descriptive Analysis

• Descriptive statistics

• Comparison between countries

• Analysis of urban/rural disparities

Stage 4: Explanatory Analysis

• Correlations and visualizations

• Explanatory models (if necessary: multiple regression, factor analysis)

Stage 5: Interpretation

• Discussion of major challenges

• Identification of opportunities

• Policy or strategic recommendations



Context and Rationale for the Study

Central Africa remains one of the least electrified regions in the world, despite its considerable energy potential, particularly in hydropower and solar power. Electricity access rates vary significantly between countries and between urban and rural areas, revealing structural challenges related to inadequate infrastructure, economic constraints, low investment, and often unstable energy governance. Furthermore, population growth, rapid urbanization, and increasing energy demand are placing additional strain on already fragile power systems.

In this context, it is essential to conduct a scientific study to identify the major obstacles to electrification and assess the potential role of renewable energies in the sustainable improvement of electricity access. This research is justified by its contribution to the development of effective strategies tailored to the realities of the region for policymakers, regional institutions, energy stakeholders, and the scientific community.



Problem Statement and Research Questions

Problem Statement

What obstacles limit electrification in Central Africa, and to what extent can renewable energies mitigate inequalities in access between rural and urban areas?



Research Questions

  1. What are the main technical, economic, demographic, and institutional factors that explain the low electrification rates in the region?

  2. How do disparities in access to electricity manifest themselves between rural and urban areas?

  3. What is the real potential of renewable energies to improve access to electricity in Central African countries?

  4. What levers can be used to sustainably reduce energy inequalities?



Study Objectives

Overall Objective

To analyze the challenges hindering electrification in Central Africa and assess the potential role of renewable energy in reducing inequalities in access to electricity.

Specific Objectives

  1. To analyze the socio-economic, demographic, and energy determinants of access to electricity.

  2. To assess disparities in access between rural and urban areas.

  3. To examine the current and future contribution of renewable energy to the regional energy mix.

  4. To identify policies and strategies likely to strengthen sustainable electrification.



Working Hypotheses

  1. H1: Low levels of electrification in Central Africa are primarily linked to weak infrastructure, financial constraints, and high population growth.

  2. H2: Disparities between urban and rural areas result from high installation costs and a lack of targeted investment in sparsely populated areas.

  3. H3: Renewable energies, particularly solar and hydropower, constitute a viable alternative for reducing inequalities in access.

  4. H4: Strengthened energy governance and coherent policies would significantly improve regional electrification.



Methodological Approach

This study adopts a quantitative and analytical approach based on:

Data Collection

• Sources: World Bank, UN, IEA, Our World in Data, national reports.

• Variables: electricity production, access (urban/rural), population, population growth, GDP, income, HDI, industrialization, renewables, electricity demand.

Descriptive Analysis

• Basic Statistics

• Comparison Between Countries

• Analysis of Rural/Urban Disparities

• Summary Graphs and Tables

Explanatory Analysis

• Correlations Between Variables

• Analysis of Determinants of Access

• Identification of Relationships Between Economic, Demographic, and Energy Factors

Interpretation and Recommendations

• Discussion of Identified Issues

• Potential of Renewables

• Proposals for Sustainable Electrification Measures



Detailed Description and Explanation of Each Variable

  1. Electricity Generation
This is the total amount of electricity generated in a country. It allows us to assess the energy system’s capacity to meet demand. In Central Africa, low production is one of the main obstacles to universal access. This variable is relevant to the topic because it directly reflects the energy supply and its shortcomings.

  2. Access to Electricity (Urban and Rural)
These indicators measure the percentage of the population with an electricity connection. The urban/rural dimension reveals one of the region’s biggest challenges:

• urban areas are often better electrified,

• rural areas are largely neglected.

These variables are central because they measure inequalities in access, a fundamental element of the topic.

  3. Electricity Demand
This represents the energy needs of the population and economic sectors. In Central Africa, demand is increasing rapidly due to urbanization and economic growth. This variable is relevant for understanding the imbalance between supply and demand.

  4. Total Population
This determines the scale of the needs for electrical infrastructure. The larger the population, the greater the necessary investments. This variable is essential for analyzing the overall pressure on the energy system.

  5. Rural Population
This variable indicates the proportion of people living outside urban areas. Rural areas are where electrification is lowest. It therefore helps explain the geographical, logistical, and economic challenges of electrification.

  6. Population Growth
A high growth rate increases energy demand and complicates electrification efforts. In several Central African countries, the population is growing faster than energy capacity. This variable is directly relevant to the topic because it highlights the increasing pressure on the grid.

  7. GDP (Gross Domestic Product)
GDP measures a country’s economic wealth. Countries with low GDP have more difficulty financing electricity infrastructure. This variable is essential because the challenge of electrification is also an economic challenge.

  8. Human Development Index (HDI)
The HDI reflects the level of education, health, and income. A low HDI is often associated with limited access to electricity. This variable illustrates the link between the level of development and access to energy.

  9. Fossil-Fuel Electricity Generation
This shows the country’s capacity to produce electricity from oil, gas, or coal. In some countries, dependence on fuel imports weakens the energy system. This variable explains vulnerabilities related to non-renewable energy sources.

  10. Hydroelectricity Generation
Central Africa has significant hydropower potential (particularly the Congo). This variable measures the actual exploitation of this potential. It is crucial for identifying opportunities for sustainable energy development.

  11. Population Income Level
This indicates whether households can afford a connection and regular consumption. In some areas, even if electricity exists, it is not financially accessible. This variable helps explain the economic barriers to electrification.

  12. Level of Industrialization
Industrialization increases electricity demand and requires a reliable grid. A low level of industrialization limits investment, and therefore energy capacity. This variable is relevant because it highlights the link between economic structure and electrification capacity.



Relevance of the variables and justification for their selection

These variables were chosen because they:

• cover the essential dimensions of electrification challenges: economic, demographic, social, and energy-related;

• explain territorial inequalities, particularly between rural and urban areas;

• reflect the structural constraints specific to Central Africa;

• allow for a robust quantitative analysis based on available data;

• allow for an assessment of the potential role of renewable energies.

Together, these variables are well suited to the topic because they allow for the analysis of both obstacles (production, income, rural population) and opportunities (hydropower, energy transition).



LOAD THE PACKAGES

# Set options and load required libraries
knitr::opts_chunk$set(echo = TRUE)
library(FactoMineR)
library(factoextra)
## Loading required package: ggplot2
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
library(ggplot2)
library(psych)
## 
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha
library(Factoshiny)
## Loading required package: shiny
## Loading required package: FactoInvestigate
library(shiny)
library(FactoInvestigate)
library(DataExplorer)
## Warning: package 'DataExplorer' was built under R version 4.5.2
library(corrplot)
## corrplot 0.95 loaded
library(pander)
## Warning: package 'pander' was built under R version 4.5.2
## 
## Attaching package: 'pander'
## The following object is masked from 'package:shiny':
## 
##     p
library(DT)
## 
## Attaching package: 'DT'
## The following objects are masked from 'package:shiny':
## 
##     dataTableOutput, renderDataTable
library(rsconnect)
## Warning: package 'rsconnect' was built under R version 4.5.2
## 
## Attaching package: 'rsconnect'
## The following object is masked from 'package:shiny':
## 
##     serverInfo
library(askpass)
library(VIM)
## Loading required package: colorspace
## Loading required package: grid
## VIM is ready to use.
## Suggestions and bug-reports can be submitted at: https://github.com/statistikat/VIM/issues
## 
## Attaching package: 'VIM'
## The following object is masked from 'package:datasets':
## 
##     sleep
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

DATABASE IMPORT

The database was entered into Excel and saved in CSV (semicolon-delimited) format. We then imported this file into R using the “read.csv()” function.

# Define the file path
file <- "C:/Users/LEGION/Desktop/Projet RTI S7A-2025/RTI_DATA_2022.csv"

# Load the data (semicolon-delimited, decimal comma, first column as row names)
donnees_csv <- read.csv(file, header = TRUE, sep = ";", dec = ",", row.names = 1, fileEncoding = "latin1")
datatable(donnees_csv, options = list(pageLength = 5, autoWidth = TRUE))
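
To make the role of the import arguments concrete, here is a minimal sketch on a two-row extract. The extract and its country labels (Pays_A, Pays_B) are hypothetical and only mimic the layout of our file; they are not the real data.

```r
# Hypothetical two-row extract mimicking the CSV layout (not the real data)
csv_text <- "Country;Elec_Gen (GWh);Access_Elec (% of pop)
Pays_A;1234,5;45,2
Pays_B;987,0;12,8"

# sep = ";" sets the field delimiter, dec = "," the decimal mark,
# and row.names = 1 uses the first column as row labels
demo <- read.csv(text = csv_text, header = TRUE, sep = ";", dec = ",",
                 row.names = 1)

# Both remaining columns should have been parsed as numeric
sapply(demo, is.numeric)
```

If dec = "," were omitted, values such as "1234,5" would be read as text rather than numbers. read.csv2() uses the same separator and decimal mark by default.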

To improve the readability of the database header, we renamed the variables so that the units given in parentheses are no longer displayed. To do this, we applied the “colnames” function to our database.

# Rename the columns with the variable names without the units in parentheses
colnames(donnees_csv) <-  c(
  "Elect_Gen",                 # Elec_Gen (GWh)
  "Access_Elect",              # Access_Elec (% of pop)
  "Access_Elect_Urbain",       # Access_Elec_urban (% of urban pop)
  "Access_Elect_Rural",        # Access_Elec_Rural (% of rural pop)
  "Elec_Demand",               # Elec_Demand (GWh)
  "Total_Pop",                 # Total_Pop (hbts) 
  "Rural_Pop",                 # Rural_Pop (% of total pop)
  "Pop_Growth",                # Pop_Growth (annual %)
  "GDP_Per_Capita",            # GDP_Per_Capita (current US$)
  "HDI",                       # HDI
  "Fossil_fuels_elect_gen",    # Fossil fuels elect gen (billion kWh)
  "Hydroelectricity_gen",      # Hydroelectricity generation (billion kWh)
  "Income_Class",              # Income_Class
  "Indust_Level"               #Indust_Level
)

datatable(donnees_csv, options = list(pageLength = 5, autoWidth = TRUE))
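
As an alternative to typing the names by hand, the units could also be stripped programmatically. The sketch below works on an illustrative subset of raw headers and assumes they were kept unchanged at import (for example with check.names = FALSE in read.csv); it is not the method used in this report.

```r
# Raw headers as they might appear in the spreadsheet (illustrative subset)
raw_names <- c("Elec_Gen (GWh)", "Access_Elec (% of pop)", "HDI")

# Drop everything from the opening parenthesis onward, then trim spaces
clean_names <- trimws(sub("\\(.*$", "", raw_names))
clean_names
```

Manual renaming, as done above, keeps full control over the final names, which is why we retained it here.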



ANALYSIS OF MISSING DATA

In this section, we define a function that counts the missing values in each variable and calculates their proportion relative to the number of observations. After running the code, we see that a single value is missing, in the rural electrification rate (Access_Elect_Rural): 1 of 11 observations, a proportion of about 0.09 for that variable, while every other variable is complete.

# Function to calculate the proportion of missing values per variable
proportion_valeurs_manquantes <- function(data) {
  # Number of missing values per column
  nb_valeurs_manquantes <- sapply(data, function(x) sum(is.na(x)))

  # Proportion of missing values per column
  proportion_manquantes <- nb_valeurs_manquantes / nrow(data)

  # Data frame holding the result
  resultat <- data.frame(Nombre = nb_valeurs_manquantes, Proportion = proportion_manquantes)
  return(resultat)
}

# Apply the function to the dataset
resultat <- proportion_valeurs_manquantes(donnees_csv)

# Display the result
resultat
##                        Nombre Proportion
## Elect_Gen                   0 0.00000000
## Access_Elect                0 0.00000000
## Access_Elect_Urbain         0 0.00000000
## Access_Elect_Rural          1 0.09090909
## Elec_Demand                 0 0.00000000
## Total_Pop                   0 0.00000000
## Rural_Pop                   0 0.00000000
## Pop_Growth                  0 0.00000000
## GDP_Per_Capita              0 0.00000000
## HDI                         0 0.00000000
## Fossil_fuels_elect_gen      0 0.00000000
## Hydroelectricity_gen        0 0.00000000
## Income_Class                0 0.00000000
## Indust_Level                0 0.00000000
# Using the aggr() function to view missing values

aggr(donnees_csv, col=c('navyblue','yellow'), numbers=TRUE, sortVars=TRUE, 
     labels=names(donnees_csv), cex.axis=.7, gap=3, ylab=c("Histogram of missing data","Pattern"))

## 
##  Variables sorted by number of missings: 
##                Variable      Count
##      Access_Elect_Rural 0.09090909
##               Elect_Gen 0.00000000
##            Access_Elect 0.00000000
##     Access_Elect_Urbain 0.00000000
##             Elec_Demand 0.00000000
##               Total_Pop 0.00000000
##               Rural_Pop 0.00000000
##              Pop_Growth 0.00000000
##          GDP_Per_Capita 0.00000000
##                     HDI 0.00000000
##  Fossil_fuels_elect_gen 0.00000000
##    Hydroelectricity_gen 0.00000000
##            Income_Class 0.00000000
##            Indust_Level 0.00000000
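
The proportions reported above can be cross-checked with a one-line base R idiom. The sketch below uses a toy data frame with 11 rows and one NA (hypothetical values), so that the expected share is 1/11.

```r
# Toy data frame with 11 rows and a single NA (hypothetical values)
toy <- data.frame(a = 1:11, b = c(NA, 2:11))

# is.na() returns a logical matrix; its column means are the NA proportions
na_share <- colMeans(is.na(toy))
round(na_share, 4)
```

Applied to donnees_csv, the same call should reproduce the Proportion column of the table above.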

Only one variable contains a missing value:

• Access_Elect_Rural: 1 missing value out of 11 observations (about 9.1% for this variable).

• All other variables have 0% missing values.

Interpretation

• The dataset is generally clean and fully usable.

• The single missing value does not invalidate the PCA: FactoMineR replaces it with the variable mean by default, although the warning it issues recommends the imputePCA function of the missMDA package instead.

• With only one NA, the imputation should have little influence on the PCA results.
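
To show what mean imputation does in practice, here is a minimal base R sketch on a hypothetical column with one missing value. The commented line at the end indicates how the missMDA alternative might be called; the choice ncp = 2 is an assumption made for illustration, not a setting used in this report.

```r
# Hypothetical rural-access column with one missing value
x <- c(5.2, 12.0, NA, 30.5, 8.3)

# Mean imputation: replace the NA with the mean of the observed values
x_mean <- ifelse(is.na(x), mean(x, na.rm = TRUE), x)
x_mean

# PCA-based alternative suggested by the FactoMineR warning (assumed usage,
# requires the missMDA package):
# donnees_imputees <- missMDA::imputePCA(donnees_quantitatives, ncp = 2)$completeObs
```

Mean imputation pulls the imputed country toward the center of the variable, whereas PCA-based imputation uses the relationships with the other variables to estimate the missing value.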



DESCRIPTION OF QUANTITATIVE VARIABLES

This description involves creating histograms and boxplots for each variable. These graphs allow us to analyze and understand the distribution of each variable: its center, its spread, and any outliers. We begin this step by selecting the target columns and checking that they are numeric with the “sapply()” function.

# Select columns 1, 4, 9, 11 and 12
colonnes_cible <- c(1, 4, 9, 11, 12)

# Check that they are numeric
vars_quantitatives <- sapply(donnees_csv[, colonnes_cible], is.numeric)

# Loop to draw a histogram for each selected variable
# (aes() with .data[[var]] replaces the deprecated aes_string())
for (var in names(donnees_csv[, colonnes_cible])[vars_quantitatives]) {
  print(
    ggplot(donnees_csv, aes(x = .data[[var]])) +
      geom_histogram(bins = 30, fill = "blue", color = "black") +
      theme_minimal() +
      labs(title = paste("Histogram of", var),
           x = var,
           y = "Frequency")
  )
}

## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_bin()`).

The histograms show very heterogeneous distributions, which is expected in an analysis of Central Africa, where countries have very different energy profiles.

Elect_Gen (total electricity production)

• Highly asymmetrical distribution.

• Cameroon, Congo, and Gabon produce significantly more electricity than the Central African Republic (CAR) and Chad.

This indicates strong structural disparities between countries.

Access_Elect, Access_Elect_Urbain, Access_Elect_Rural

• High heterogeneity: Gabon and Cameroon have very high access, while Chad and CAR have very low access.

• Rural access: extremely low values everywhere (very low for CAR and Chad).

This confirms that rural access is the main challenge in the region.

GDP_Per_Capita, HDI

• Highly dispersed distribution: Gabon and Equatorial Guinea stand far above the other countries.

• Wealthier countries tend to have better energy performance.

Fossil_fuels_elect_gen and Hydroelectricity_gen

• Some countries do not use fossil fuels or hydropower at all (Chad, Central African Republic).

• The energy mix also helps explain the disparities in electrification.

These observations may be confirmed or refuted by the PCA.
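
With only 11 countries, a histogram with 30 bins leaves most bins empty, which can exaggerate the impression of heterogeneity. The sketch below, on hypothetical values, shows how a bin count adapted to the sample size could be obtained with Sturges' rule, and how comparing mean and median gives a quick hint of asymmetry.

```r
# Hypothetical right-skewed variable observed on 11 countries
x <- c(3, 5, 8, 12, 15, 20, 22, 40, 65, 90, 310)

# Sturges' rule: ceiling(log2(n) + 1) classes
n_bins <- nclass.Sturges(x)
n_bins

# A mean well above the median suggests a long right tail
c(mean = mean(x), median = median(x))
```

The resulting value could be passed to geom_histogram(bins = n_bins) in place of the fixed 30 bins.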



# Select columns 1, 4, 9, 11 and 12
colonnes_cible <- c(1, 4, 9, 11, 12)

# Check that these columns are numeric
vars_quantitatives <- sapply(donnees_csv[, colonnes_cible], is.numeric)

# Draw a boxplot for each selected variable
# (aes() with .data[[var]] replaces the deprecated aes_string())
for (var in names(donnees_csv[, colonnes_cible])[vars_quantitatives]) {
  print(
    ggplot(donnees_csv, aes(x = factor(1), y = .data[[var]])) +
      geom_boxplot(fill = "skyblue", color = "darkblue") +
      theme_minimal() +
      labs(title = paste("Boxplot of", var),
           x = "",
           y = var)
  )
}

## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_boxplot()`).

The boxplots show strong outliers for:

• Elect_Gen (Cameroon very high)

• GDP_Per_Capita (Gabon and Equatorial Guinea well above the others)

• Access_Elect_Rural (Gabon far above the others)

This confirms that the sample contains very diverse countries, which supports the use of PCA to identify typical profiles.
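
The points drawn as outliers in a boxplot follow the usual 1.5 × IQR rule, and they can be listed by name rather than read off the graph. The sketch below uses hypothetical values and country labels; boxplot.stats() is the base R function that computes the same fences as the plot.

```r
# Hypothetical values for 11 countries, one of them far above the rest
x <- setNames(c(1:10, 100), paste0("Pays_", LETTERS[1:11]))

# boxplot.stats() returns the values lying beyond 1.5 * IQR from the hinges
out_values <- boxplot.stats(x)$out

# Recover the names of the flagged countries
flagged <- names(x)[x %in% out_values]
flagged
```

Applied to a column such as donnees_csv$Elect_Gen (after attaching the row names), this would return the countries flagged in the corresponding boxplot.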



ANALYSIS OF QUALITATIVE VARIABLES

# Function to create a barplot in proportions
creer_barplot_proportion <- function(data, column_name) {
  # Calculate the proportions
  proportions <- data %>%
    count(.data[[column_name]]) %>%
    mutate(Proportion = n / sum(n))

  # Create the barplot (aes() with .data[[...]] replaces the deprecated aes_string())
  ggplot(proportions, aes(x = .data[[column_name]], y = Proportion, fill = .data[[column_name]])) +
    geom_bar(stat = "identity") +
    scale_y_continuous(labels = scales::percent_format()) +
    labs(x = column_name, y = "Proportion (%)", fill = column_name) +
    theme_minimal()
}

# Create a bar plot for the variable "Income_Class"
creer_barplot_proportion(donnees_csv, "Income_Class")

# Create a bar plot for the variable "Indust_Level"
creer_barplot_proportion(donnees_csv, "Indust_Level")

The income bracket chart shows that over 45% of countries have very low incomes and nearly 35% have middle incomes. Only slightly less than 20% of the countries studied have relatively high incomes.

The industrialization level chart shows that nearly 40% of Central African countries have low levels of industrialization, while nearly 40% have high levels of industrialization.
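
Because the sample contains only 11 countries, each country represents about 9.1 percentage points, so the proportions can take only a few distinct values. The sketch below illustrates this with hypothetical class counts (5, 4, and 2 countries); these counts are an assumption for illustration, not values read from the dataset.

```r
# Hypothetical income classes for 11 countries (illustrative counts only)
income <- factor(c(rep("Low", 5), rep("Middle", 4), rep("High", 2)))

# Share of each class, in percent
props <- 100 * prop.table(table(income))
round(props, 1)
```

On the real data, table(donnees_csv$Income_Class) would give the exact counts behind the percentages read from the chart.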



ANALYSIS OF CORRELATIONS BETWEEN QUANTITATIVE VARIABLES

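We calculate the correlation matrix for all quantitative variables (twelve here) and visualize it with a correlation plot. Because the matrix is computed with use = "complete.obs", the country with the missing rural access value is left out of this calculation. This helps in understanding relationships between variables before performing PCA.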

# Identify the quantitative columns
vars_quantitatives <- sapply(donnees_csv, is.numeric)

#Extraction of quantitative variables
donnees_quantitatives <- donnees_csv[, vars_quantitatives]

# Calculate the correlation matrix
matrice_correlation <- cor(donnees_quantitatives, use = "complete.obs")

datatable(matrice_correlation, options = list(pageLength = 6)) %>%
  formatRound(columns = 1:ncol(matrice_correlation), digits = 2)
# Correlation plot (corrplot package)
corrplot(matrice_correlation, method = "color", type = "upper", tl.col = "black", tl.srt = 75)

The correlation graph (heatmap) highlights the relationships between the different variables associated with electrification in Central Africa. Several important links clearly emerge:

Strong positive correlation between Access_Elect, Access_Elect_Urbain, and Access_Elect_Rural

These three electricity access variables are very strongly correlated with each other (high coefficients, in dark blue). This means that when a country has good overall access to electricity:

• it also has good access in urban areas,

• and often better access in rural areas (even if the levels remain low).

This is logical: overall access is primarily driven by urban performance, but any improvement in rural access also raises total access.

Very strong correlation between GDP_Per_Capita and HDI

The matrix shows one of the highest positive correlations (intense blue). This indicates that countries with a high GDP per capita (Gabon, Equatorial Guinea) also have a higher Human Development Index. This reflects a structural reality: the wealthier a country is, the better it tends to perform in health, education, and infrastructure, and therefore in electrification.

Positive correlation between Elect_Gen and Elec_Demand

The two variables are almost perfectly correlated: countries that produce a lot of electricity are also those that consume a lot of it. This is expected for energy systems: demand drives production, and production capacity depends on the level of industrialization and urbanization.

Negative correlation between Pop_Growth and HDI

Although this correlation is weaker, a negative link is observed between the population growth rate (Pop_Growth) and the level of human development (HDI). This suggests that countries with high population growth (e.g., Chad, Central African Republic) are also those with lower human development.

This can be explained by:

• pressure on public services,

• the difficulty of electrifying a rapidly growing population,

• infrastructure that cannot keep pace.

Negative correlation between Rural_Pop and Access_Elect

The proportion of the rural population is inversely correlated with overall access to electricity. In other words, the more rural a country is, the lower its access to electricity.

This reflects a fundamental reality in Central Africa: rural electrification is the main energy deficit because:

• distances are greater,

• infrastructure costs are higher,

• rural areas are less profitable for operators.

Relationships between energy sources: Hydroelectricity_gen and Fossil_fuels_elect_gen

• Countries with high hydroelectric production (Cameroon, Gabon) are not those with high fossil fuel production.

• The two variables are therefore generally inversely correlated.

This suggests two types of energy profiles:

• “hydro-dependent” countries

• “fossil fuel-dependent” countries
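
With so few countries, only fairly large coefficients can be distinguished from chance. As a rough guide, the sketch below computes the smallest absolute Pearson correlation that is significant at the 5% level (two-sided) for a given sample size, using the standard relation between r and Student's t. The helper name critical_r is hypothetical, introduced only for this illustration.

```r
# Smallest |r| significant at level alpha for n observations,
# from t = r * sqrt(df) / sqrt(1 - r^2) with df = n - 2
critical_r <- function(n, alpha = 0.05) {
  df <- n - 2
  t_crit <- qt(1 - alpha / 2, df)
  t_crit / sqrt(t_crit^2 + df)
}

r_11 <- critical_r(11)  # all 11 countries
r_10 <- critical_r(10)  # 10 countries, after dropping the incomplete row
round(c(n11 = r_11, n10 = r_10), 3)
```

Coefficients below these thresholds should be interpreted with caution; for a specific pair of variables, cor.test() gives the corresponding p-value and confidence interval.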



CENTER AND SCALE THE DATA

After the descriptive analysis of the variables and the examination of the correlation matrix, it becomes clear that the indicators used in the database are not expressed on comparable scales. Some variables, such as electricity production or demand, have very high values expressed in gigawatt-hours, while others, such as electricity access rates or the proportion of the rural population, are expressed as percentages. Similarly, indicators such as the HDI and GDP per capita differ greatly in magnitude and units. In this context, a PCA requires harmonizing the scales to prevent variables with large magnitudes from dominating the analysis. This leads to the next step: centering and scaling (standardizing) the data, which makes all variables comparable and supports a reliable interpretation of the factorial axes.

# Center and scale the data
donnees_centrees_reduites <- scale(donnees_quantitatives,center = TRUE,scale=TRUE)
datatable(donnees_centrees_reduites, options = list(pageLength = 5, autoWidth = TRUE))
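
To make the transformation explicit: each value is replaced by z = (x - mean) / sd, computed column by column. The sketch below checks this on a small hypothetical table with two variables on very different scales.

```r
# Hypothetical table: generation in GWh and access in percent
X <- data.frame(gen_gwh    = c(500, 1200, 8000, 300),
                access_pct = c(12, 45, 90, 8))

# Standardize with scale(), then recompute one column by hand
Z <- scale(X, center = TRUE, scale = TRUE)
manual <- (X$gen_gwh - mean(X$gen_gwh)) / sd(X$gen_gwh)

# The two computations should agree
all.equal(as.numeric(Z[, "gen_gwh"]), manual)

# After standardization every column has mean 0 and standard deviation 1
round(colMeans(Z), 10)
apply(Z, 2, sd)
```

Note that PCA() in FactoMineR also standardizes the variables by default (scale.unit = TRUE), so applying scale() beforehand mainly serves to inspect the standardized values.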



FACTOR ANALYSIS (FA) / PRINCIPAL COMPONENT ANALYSIS (PCA)

Factor analysis is a family of statistical methods used to reduce the dimensionality of a dataset while retaining essential information. It involves identifying latent dimensions, called factors, that summarize the relationships between the initial variables. This approach is particularly relevant when working with a large number of interdependent variables, as is the case in the study of the determinants of electrification in Central Africa. Among the factor analysis methods, we have chosen Principal Component Analysis (PCA), which is well-suited to quantitative data. PCA transforms the set of initial variables into a smaller number of new, uncorrelated components, while maximizing the proportion of variance explained. Therefore, the implementation of PCA in our study aims primarily to simplify the data structure and identify the major components that summarize the energy, demographic, and socioeconomic characteristics of the countries analyzed.
# Perform the PCA
resultat_acp <- PCA(donnees_centrees_reduites, axes = c(1, 2), graph = TRUE)
## Warning in PCA(donnees_centrees_reduites, axes = c(1, 2), graph = TRUE):
## Missing values are imputed by the mean of the variable: you should use the
## imputePCA function of the missMDA package

# Display the results of the PCA
print(resultat_acp)
## **Results for the Principal Component Analysis (PCA)**
## The analysis was performed on 11 individuals, described by 12 variables
## *The results are available in the following objects:
## 
##    name               description                          
## 1  "$eig"             "eigenvalues"                        
## 2  "$var"             "results for the variables"          
## 3  "$var$coord"       "coord. for the variables"           
## 4  "$var$cor"         "correlations variables - dimensions"
## 5  "$var$cos2"        "cos2 for the variables"             
## 6  "$var$contrib"     "contributions of the variables"     
## 7  "$ind"             "results for the individuals"        
## 8  "$ind$coord"       "coord. for the individuals"         
## 9  "$ind$cos2"        "cos2 for the individuals"           
## 10 "$ind$contrib"     "contributions of the individuals"   
## 11 "$call"            "summary statistics"                 
## 12 "$call$centre"     "mean of the variables"              
## 13 "$call$ecart.type" "standard error of the variables"    
## 14 "$call$row.w"      "weights for the individuals"        
## 15 "$call$col.w"      "weights for the variables"
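
The share of variance carried by each axis comes from the eigenvalues, which FactoMineR stores in the "$eig" object listed above. As an illustration of how these percentages are obtained, the sketch below runs a standardized PCA with base R's prcomp() on the built-in USArrests dataset, used here only as a stand-in for our data.

```r
# Standardized PCA on a built-in dataset (stand-in for donnees_quantitatives)
pc <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Eigenvalues are the squared standard deviations of the components
eig <- pc$sdev^2

# Percentage and cumulative percentage of variance per component
pct <- 100 * eig / sum(eig)
data.frame(eigenvalue = round(eig, 3),
           percent    = round(pct, 2),
           cumulative = round(cumsum(pct), 2))
```

For standardized data the eigenvalues sum to the number of variables, so an eigenvalue above 1 means the axis carries more information than a single original variable.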



Correlation Circle of Variables

The correlation circle analysis clearly identifies the structure of the first two axes of the PCA. The first dimension (Dimension 1), which explains 43.41% of the total variance, is strongly correlated with variables reflecting the level of socioeconomic development and energy performance. We observe that the vectors GDP_Per_Capita, HDI, Access_Elect, Access_Elect_Urbain, as well as the variables related to electricity production and demand (Elect_Gen, Elec_Demand, Hydroelectricity_gen, Fossil_fuels_elect_gen) clearly point in the positive direction of this axis. This means that Dimension 1 contrasts countries with high levels of wealth, advanced electrification, and a more developed energy system with those with low economic and energy capacity. Thus, this dimension can be named:

Axis 1: “Economic Development and Energy Performance”

The second dimension (Dimension 2), which explains 36.60% of the variance, primarily contrasts demographic variables. The vectors Pop_Growth and Total_Pop are positively aligned on this axis, while Rural_Pop is projected to the negative side. This structure reflects a contrast between, on the one hand, countries with high population growth or a large total population, and on the other hand, those where the population is predominantly rural. The positioning of variables such as Hydroelectricity_gen or Elec_Demand near the vertical axis indicates that they contribute moderately to this dimension, without strongly structuring it. Dimension 2 therefore expresses characteristics related to population pressure, urbanization, and territorial imbalances more than to energy performance. This dimension can be named:

Axis 2: “Demographic Dynamics and Territorial Structure”
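
The coordinates of the arrows in a correlation circle are the correlations between each variable and each axis. The sketch below illustrates this with base R's prcomp() on the built-in USArrests dataset (a stand-in for our data): the loadings multiplied by the component standard deviations are compared with the correlations computed directly.

```r
# Standardized PCA on a built-in dataset (stand-in for our data)
pc <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Variable coordinates: loadings scaled by the component standard deviations
coord <- sweep(pc$rotation, 2, pc$sdev, "*")

# Direct computation: correlation between each variable and each component
check <- cor(USArrests, pc$x)
max(abs(coord - check))

# cos2 measures how well each variable is represented on each axis;
# over all axes it sums to 1 for every variable
cos2 <- coord^2
rowSums(cos2)

# Coordinates on the first two axes, as plotted in the circle
round(coord[, 1:2], 2)
```

In FactoMineR the same quantities are available in "$var$coord" and "$var$cos2". An arrow close to the edge of the circle indicates a variable that is well represented on the plane, while a short arrow should not be over-interpreted.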



Factor Map of Individuals

Analysis of the first two axes of the PCA highlights the main factors structuring energy differences between Central African countries. The first dimension, which explains 43.41% of the total variance, clearly distinguishes countries with strong economic and industrial capacity from those with a lower level of development. Positive values on this axis are associated with greater access to electricity, higher energy production, and a higher level of industrialization, encompassing countries such as Angola, Gabon, Equatorial Guinea, the DRC, and the Congo. Conversely, low-income countries characterized by low electrification and limited energy infrastructure (notably Burundi, the Central African Republic, and Chad) are located on the negative end of this dimension.

The second dimension, which explains 36.60% of the variance, introduces further differentiation based on structural characteristics related to demographics, institutional stability, and the development of public services. Countries at the top of the axis generally face more pronounced socio-economic challenges, while those at the bottom, such as São Tomé and Príncipe, Gabon, and Equatorial Guinea, are distinguished by particular economic structures or atypical demographic profiles.

Thus, the two axes combined reveal a clear contrast between countries with strong economic and energy capacity and those experiencing structural vulnerability, while also reflecting the region’s internal diversity.
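
To judge which countries actually shape each axis, the position on the map can be complemented by each individual's contribution. The sketch below computes contributions with base R's prcomp() on the built-in USArrests dataset, again as a stand-in for our data: the contribution of an individual to an axis is its squared coordinate divided by the sum of squared coordinates on that axis.

```r
# Standardized PCA on a built-in dataset (stand-in for our data)
pc <- prcomp(USArrests, center = TRUE, scale. = TRUE)
scores <- pc$x[, 1:2]

# Contribution (in %) of each individual to each of the first two axes
contrib <- 100 * sweep(scores^2, 2, colSums(scores^2), "/")
colSums(contrib)

# Individuals contributing most to the first axis
head(contrib[order(-contrib[, 1]), ], 3)

# Average contribution if all individuals weighed equally
100 / nrow(scores)
```

In FactoMineR these values are stored in "$ind$contrib". With 11 countries, the average contribution is 100 / 11, about 9.1%, so a country well above that level has a strong influence on the axis.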



# Perform the PCA with supplementary qualitative variables (columns 13 and 14)
resultat_acp <- PCA(donnees_csv, scale.unit = TRUE, ncp = 2, quali.sup = 13:14, graph = TRUE)
## Warning in PCA(donnees_csv, scale.unit = TRUE, ncp = 2, quali.sup = 13:14, :
## Missing values are imputed by the mean of the variable: you should use the
## imputePCA function of the missMDA package

# Display the results of the PCA
print(resultat_acp)
## **Results for the Principal Component Analysis (PCA)**
## The analysis was performed on 11 individuals, described by 14 variables
## *The results are available in the following objects:
## 
##    name                description                                          
## 1  "$eig"              "eigenvalues"                                        
## 2  "$var"              "results for the variables"                          
## 3  "$var$coord"        "coord. for the variables"                           
## 4  "$var$cor"          "correlations variables - dimensions"                
## 5  "$var$cos2"         "cos2 for the variables"                             
## 6  "$var$contrib"      "contributions of the variables"                     
## 7  "$ind"              "results for the individuals"                        
## 8  "$ind$coord"        "coord. for the individuals"                         
## 9  "$ind$cos2"         "cos2 for the individuals"                           
## 10 "$ind$contrib"      "contributions of the individuals"                   
## 11 "$quali.sup"        "results for the supplementary categorical variables"
## 12 "$quali.sup$coord"  "coord. for the supplementary categories"            
## 13 "$quali.sup$v.test" "v-test of the supplementary categories"             
## 14 "$call"             "summary statistics"                                 
## 15 "$call$centre"      "mean of the variables"                              
## 16 "$call$ecart.type"  "standard error of the variables"                    
## 17 "$call$row.w"       "weights for the individuals"                        
## 18 "$call$col.w"       "weights for the variables"

The integration of the supplementary qualitative variables “Income_Class” and “Indust_Level” enriches the analysis by revealing key socio-economic dynamics within Central African countries. The categories projected onto the factorial plane align with the distribution of countries. The categories Income_Class_Low and Indust_Level_Bottom, located on the negative side of dimension 1, correspond to countries such as Burundi, the Central African Republic, and Chad, characterized by low income levels, limited industrialization, and underdeveloped energy infrastructure. Conversely, the categories Income_Class_U-M and Indust_Level_U-M, positioned on the positive side of this axis, correspond to countries with a higher level of economic and industrial development, notably Gabon, Equatorial Guinea, and the Republic of the Congo. Intermediate categories, such as Income_Class_L-M or Indust_Level_L-M, are located near the center of the graph and reflect transitional economic profiles, encompassing countries like Cameroon and Rwanda. The DRC and Angola stand out as extreme cases: according to the coordinates of the individuals, the DRC lies at the far negative end of dimension 1 and Angola at the top of dimension 2, which reflects their large populations and generation volumes rather than a high level of access. Overall, these qualitative categories confirm the consistency of the resulting factor structure: income level and degree of industrialization play a decisive role in the energy contrasts observed between countries in the region.



# Biplot visualization
fviz_pca_biplot(resultat_acp, repel = TRUE)
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## ℹ The deprecated feature was likely used in the ggpubr package.
##   Please report the issue at <https://github.com/kassambara/ggpubr/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

This graph is a principal component analysis (PCA) biplot, which represents the relationships between countries (points) and variables (arrows) on the first two principal dimensions, Dim1 and Dim2. These dimensions capture 43.55% and 34.57% of the data variance, respectively, so together they account for 78.12% of the total variance. We can also make the following observations:

• The first axis, titled “Economic Development and Energy Performance,” reflects a clear opposition between two groups of countries. To the right of the axis are the countries with high GDP per capita, a higher HDI, and better rates of access to electricity, in both urban and rural areas. According to the coordinates of the individuals, these are Gabon, São Tomé and Príncipe, and Equatorial Guinea, with the Republic of the Congo, Rwanda, and Cameroon closer to the origin. Conversely, to the left of this axis appear countries with a lower level of development, faster population growth, and reduced access to electricity: the DRC, Chad, Burundi, Angola, and the Central African Republic. Electricity generation and demand are negatively correlated with this axis (Elect_Gen -0.45, Elec_Demand -0.50), which is why large producers such as the DRC and Angola fall on the left despite their generation volumes. The first axis thus reflects the overall gradient of development and access to electricity among the countries studied.

• The second axis, called “Demographic Dynamics and Territorial Structure,” contrasts countries with a predominantly rural population and little generation capacity, such as Chad, the Central African Republic, and Burundi at the bottom of the axis, with countries that combine a large population with high electricity generation and demand, such as Angola and the DRC at the top. Cameroon and Gabon also sit on the positive side, while Equatorial Guinea and São Tomé and Príncipe lie closer to the origin. This axis therefore highlights the influence of rurality and demographic pressure on the challenges related to electrification: the most rural countries, which also have high population growth and low scores on the first axis, are those that encounter the greatest difficulties in accessing energy services.



EIGENVALUES

The eigenvalues indicate the amount of variance explained by each principal component.

# Extract and plot eigenvalues
val.propre <- get_eigenvalue(resultat_acp)
pander(val.propre)
         eigenvalue   variance.percent   cumulative.variance.percent
Dim.1    5.226        43.55              43.55
Dim.2    4.148        34.57              78.12
Dim.3    1.044        8.696              86.81
Dim.4    0.7268       6.056              92.87
Dim.5    0.5651       4.709              97.58
Dim.6    0.1574       1.312              98.89
Dim.7    0.08546      0.7122             99.6
Dim.8    0.03278      0.2732             99.88
Dim.9    0.01335      0.1113             99.99
Dim.10   0.001465     0.01221            100
fviz_eig(resultat_acp, addlabels = TRUE, ylim = c(0, 50))
## Warning in geom_bar(stat = "identity", fill = barfill, color = barcolor, :
## Ignoring empty aesthetic: `width`.

Eigenvalue analysis shows that the first three axes have eigenvalues greater than 1, which, according to Kaiser’s criterion, initially suggests retaining these three components. However, the final decision also takes into account the scree test and considerations of parsimony and interpretability. The scree plot shows a clear flattening of the slope after the second axis: the first two axes together explain 78.12% of the total variance (43.55% and 34.57%), while the third contributes only an additional 8.7% (bringing the cumulative variance to 86.81%). In other words, the first two axes capture most of the structural information in the dataset. Furthermore, the third axis, although its eigenvalue is greater than 1, contributes little to the remaining variability and risks introducing secondary components that are difficult to interpret robustly, especially with a small number of observations. For these reasons (a large proportion of variance explained by the first two axes, the presence of a marked bend after axis 2, and the need to produce an interpretable and concise synthesis), we retain two factorial axes for the main analysis. The third axis may, however, be presented in an appendix if a more detailed exploration of the residual variations proves necessary.
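
As a quick check of the retention rule discussed above, the following base R sketch recomputes the variance shares from the rounded eigenvalues reported in the table and applies Kaiser's criterion and a simple scree heuristic. The object names (eig, var_pct, drops) are illustrative and are not part of the original script; because the inputs are rounded, the percentages may differ from the table in the last digit.

```r
# Eigenvalues as reported in the eigenvalue table (rounded values)
eig <- c(5.226, 4.148, 1.044, 0.7268, 0.5651,
         0.1574, 0.08546, 0.03278, 0.01335, 0.001465)
names(eig) <- paste0("Dim.", seq_along(eig))

# Share of variance per axis and cumulative share, in percent
var_pct <- 100 * eig / sum(eig)
cum_pct <- cumsum(var_pct)

# Kaiser criterion: axes whose eigenvalue exceeds 1
kaiser_axes <- names(eig)[eig > 1]

# Scree heuristic: loss of explained variance from one axis to the next
drops <- -diff(var_pct)

print(round(data.frame(eigenvalue = eig, var_pct, cum_pct), 2))
print(kaiser_axes)
print(round(drops[1:3], 2))
```

The largest drop occurs between the second and third axes, which is the bend in the scree plot that motivates keeping two axes even though Kaiser's criterion alone would keep three.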



CONTRIBUTION OF VARIABLES TO THE COMPONENTS

We examine the contribution of each variable to the principal components.

# Get PCA variable results
resultat.var <- get_pca_var(resultat_acp)
pander(resultat.var$coord)
  Dim.1 Dim.2
Elect_Gen -0.4548 0.8823
Access_Elect 0.8706 0.4483
Access_Elect_Urbain 0.7557 0.3368
Access_Elect_Rural 0.6208 0.07858
Elec_Demand -0.4971 0.8549
Total_Pop -0.7399 0.5019
Rural_Pop -0.6205 -0.6026
Pop_Growth -0.8368 0.319
GDP_Per_Capita 0.7041 0.3572
HDI 0.779 0.6015
Fossil_fuels_elect_gen 0.1798 0.6623
Hydroelectricity_gen -0.5388 0.8205
pander(resultat.var$cor)
  Dim.1 Dim.2
Elect_Gen -0.4548 0.8823
Access_Elect 0.8706 0.4483
Access_Elect_Urbain 0.7557 0.3368
Access_Elect_Rural 0.6208 0.07858
Elec_Demand -0.4971 0.8549
Total_Pop -0.7399 0.5019
Rural_Pop -0.6205 -0.6026
Pop_Growth -0.8368 0.319
GDP_Per_Capita 0.7041 0.3572
HDI 0.779 0.6015
Fossil_fuels_elect_gen 0.1798 0.6623
Hydroelectricity_gen -0.5388 0.8205
pander(resultat.var$cos2)
  Dim.1 Dim.2
Elect_Gen 0.2069 0.7784
Access_Elect 0.7579 0.2009
Access_Elect_Urbain 0.5711 0.1135
Access_Elect_Rural 0.3854 0.006175
Elec_Demand 0.2471 0.7308
Total_Pop 0.5475 0.2519
Rural_Pop 0.385 0.3632
Pop_Growth 0.7002 0.1018
GDP_Per_Capita 0.4958 0.1276
HDI 0.6068 0.3618
Fossil_fuels_elect_gen 0.03233 0.4386
Hydroelectricity_gen 0.2903 0.6732
pander(resultat.var$contrib)
  Dim.1 Dim.2
Elect_Gen 3.958 18.77
Access_Elect 14.5 4.845
Access_Elect_Urbain 10.93 2.735
Access_Elect_Rural 7.374 0.1489
Elec_Demand 4.728 17.62
Total_Pop 10.48 6.074
Rural_Pop 7.367 8.756
Pop_Growth 13.4 2.453
GDP_Per_Capita 9.486 3.076
HDI 11.61 8.722
Fossil_fuels_elect_gen 0.6186 10.57
Hydroelectricity_gen 5.555 16.23

Let’s now visualize these contributions on the contribution graphs:

From the analysis of the contribution graphs for the variables, taking the expected average contribution (100/12 ≈ 8.33%) as the reference, it emerges that:

The variables that contribute most to the formation of dimension 1 are Access_Elect (14.50%), Pop_Growth (13.40%), HDI (11.61%), Access_Elect_Urbain (10.93%), Total_Pop (10.48%) and GDP_Per_Capita (9.49%).

The variables that contribute most to the formation of dimension 2 are Elect_Gen (18.77%), Elec_Demand (17.62%), Hydroelectricity_gen (16.23%) and Fossil_fuels_elect_gen (10.57%), followed by Rural_Pop (8.76%) and HDI (8.72%), which are just above the average.

Similarly, the variables Elect_Gen, Elec_Demand, HDI, Hydroelectricity_gen, Access_Elect, Pop_Growth and Total_Pop contribute most to the formation of the factorial plane.
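
The contributions can be recovered from the cos2 table, since for a variable the contribution to an axis equals 100 × cos2 / eigenvalue. The sketch below is a base R illustration using the rounded cos2 values and eigenvalues printed above; the names cos2, contrib and threshold are illustrative, and the 100/12 threshold is the expected average contribution under a uniform split across the 12 active variables.

```r
vars <- c("Elect_Gen", "Access_Elect", "Access_Elect_Urbain", "Access_Elect_Rural",
          "Elec_Demand", "Total_Pop", "Rural_Pop", "Pop_Growth",
          "GDP_Per_Capita", "HDI", "Fossil_fuels_elect_gen", "Hydroelectricity_gen")

# cos2 of the variables on the first two axes (rounded values from the output)
cos2 <- cbind(
  Dim.1 = c(0.2069, 0.7579, 0.5711, 0.3854, 0.2471, 0.5475,
            0.385, 0.7002, 0.4958, 0.6068, 0.03233, 0.2903),
  Dim.2 = c(0.7784, 0.2009, 0.1135, 0.006175, 0.7308, 0.2519,
            0.3632, 0.1018, 0.1276, 0.3618, 0.4386, 0.6732)
)
rownames(cos2) <- vars
eig <- c(Dim.1 = 5.226, Dim.2 = 4.148)

# Contribution (%) of each variable to each axis: 100 * cos2 / eigenvalue
contrib <- 100 * sweep(cos2, 2, eig, "/")

# Contribution (%) to the plane formed by the two axes
contrib_plane <- 100 * rowSums(cos2) / sum(eig)

# Expected average contribution if all variables contributed equally
threshold <- 100 / nrow(cos2)

top_dim1  <- names(which(contrib[, "Dim.1"] > threshold))
top_dim2  <- names(which(contrib[, "Dim.2"] > threshold))
top_plane <- names(which(contrib_plane > threshold))

print(round(cbind(contrib, Plane = contrib_plane), 2))
print(list(dim1 = top_dim1, dim2 = top_dim2, plane = top_plane))
```

Variables above the threshold are the ones that fviz_contrib() displays above its dashed reference line.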



fviz_pca_var(resultat_acp, col.var = "contrib", gradient.cols = c("blue", "orange", "red"), repel = TRUE, title = "Contribution of Variables to Principal Components")

fviz_contrib(resultat_acp, choice = "var", axes = 1, top = 12)

fviz_contrib(resultat_acp, choice = "var", axes = 2, top = 12)

fviz_contrib(resultat_acp, choice = "var", axes = 1:2, top = 12)



CONTRIBUTION OF INDIVIDUALS TO THE COMPONENTS

In this section, we explore the coordinates, quality of representation, and contributions of individuals (observations) to the PCA axes.

# Get PCA individual results
resultat.ind <- get_pca_ind(resultat_acp)
pander(resultat.ind$coord)
  Dim.1 Dim.2
Cameroon 0.5026 1.469
Republic of the Congo 0.9198 0.3619
DRC -4.357 2.313
Gabon 3.612 1.262
Chad -2.188 -2.291
Central African Republic -1.257 -2.717
Equatorial Guinea 2.132 0.2254
Angola -1.384 3.906
Rwanda 0.7403 -1.327
Burundi -1.613 -2.485
São Tomé and Príncipe 2.894 -0.7181
pander(resultat.ind$cos2)
  Dim.1 Dim.2
Cameroon 0.06392 0.5459
Republic of the Congo 0.1893 0.0293
DRC 0.6939 0.1956
Gabon 0.8161 0.09962
Chad 0.4192 0.4593
Central African Republic 0.1402 0.6543
Equatorial Guinea 0.5306 0.00593
Angola 0.1021 0.8129
Rwanda 0.08499 0.273
Burundi 0.2763 0.6559
São Tomé and Príncipe 0.585 0.03602
pander(resultat.ind$contrib)
  Dim.1 Dim.2
Cameroon 0.4394 4.728
Republic of the Congo 1.472 0.287
DRC 33.03 11.73
Gabon 22.7 3.491
Chad 8.33 11.5
Central African Republic 2.751 16.18
Equatorial Guinea 7.904 0.1113
Angola 3.334 33.45
Rwanda 0.9533 3.858
Burundi 4.526 13.54
São Tomé and Príncipe 14.57 1.13

Let’s now visualize these contributions on the contribution graphs:

From the analysis of the contribution graphs for the individuals, it emerges that:

The individuals that contribute most to the formation of dimension 1 are the DRC (33.03%), Gabon (22.70%) and São Tomé and Príncipe (14.57%).

The individuals that contribute most to the formation of dimension 2 are Angola (33.45%) and the Central African Republic (16.18%), followed by Burundi (13.54%), the DRC (11.73%) and Chad (11.50%).

Similarly, the individuals DRC, Angola and Gabon contribute most to the formation of the factorial plane.
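
For individuals with equal weights, the contribution to an axis is the squared coordinate divided by the sum of squared coordinates on that axis, and the mean squared coordinate gives back the eigenvalue. The base R sketch below applies this to the rounded coordinates printed above; the object names are illustrative and the equal-weight assumption is taken from the default behaviour of PCA(), not stated explicitly in the original script.

```r
countries <- c("Cameroon", "Republic of the Congo", "DRC", "Gabon", "Chad",
               "Central African Republic", "Equatorial Guinea", "Angola",
               "Rwanda", "Burundi", "São Tomé and Príncipe")

# Coordinates of the individuals on the first two axes (rounded values)
coord <- matrix(
  c( 0.5026,  1.469,
     0.9198,  0.3619,
    -4.357,   2.313,
     3.612,   1.262,
    -2.188,  -2.291,
    -1.257,  -2.717,
     2.132,   0.2254,
    -1.384,   3.906,
     0.7403, -1.327,
    -1.613,  -2.485,
     2.894,  -0.7181),
  ncol = 2, byrow = TRUE,
  dimnames = list(countries, c("Dim.1", "Dim.2"))
)

# With equal weights, the eigenvalue of an axis is the mean squared coordinate
eig_check <- colMeans(coord^2)

# Contribution (%) of each country to each axis and to the plane
contrib <- 100 * sweep(coord^2, 2, colSums(coord^2), "/")
contrib_plane <- 100 * rowSums(coord^2) / sum(coord^2)

print(round(eig_check, 3))
print(round(cbind(contrib, Plane = contrib_plane), 2))
```

The cos2 of the individuals cannot be rebuilt this way, because it requires the distance to the origin over all dimensions, not only the two retained ones.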



fviz_pca_ind(resultat_acp, col.ind = "cos2", gradient.cols = c("blue", "orange", "red"), repel = TRUE)

fviz_contrib(resultat_acp, choice = "ind", axes = 1, top = 12)

fviz_contrib(resultat_acp, choice = "ind", axes = 2, top = 12)

fviz_contrib(resultat_acp, choice = "ind", axes = 1:2, top = 12)



# Perform HCPC
resultat.cah <- HCPC(resultat_acp, nb.clust = -1, consol = FALSE, graph = FALSE)

# Visualize hierarchical clustering
plot.HCPC(resultat.cah, choice = 'tree', title = 'Hierarchical Tree')

plot.HCPC(resultat.cah, choice = 'map', draw.tree = FALSE, title = 'Factor Map')

The dendrogram derived from the hierarchical clustering highlights three main groups of countries with similar energy and socio-economic profiles. The first group comprises the DRC and Angola, two countries characterized by high energy production, large land area, large population, and significant electricity demand, which explains their proximity in the tree. The second group includes Chad, Burundi, and the Central African Republic, countries with low levels of development, a predominantly rural population, very limited access to electricity, and reduced generation capacity; these structural similarities justify their grouping. The third, more diverse group includes Gabon, Equatorial Guinea, São Tomé and Príncipe, Rwanda, Cameroon, and the Republic of the Congo. These countries exhibit either higher levels of development, more pronounced urbanization, and better energy performance (particularly Gabon and Equatorial Guinea), or intermediate but relatively stable energy trajectories (such as Cameroon and the Republic of the Congo). Thus, the hierarchical classification reflects the contrasts between highly disadvantaged, intermediate, and better-performing countries in the Central African region.
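
The grouping can be approximated without FactoMineR. The sketch below is a base R illustration, not the HCPC call itself: it runs Ward clustering with hclust() on the two retained PCA coordinates (rounded values from the individuals table) and cuts the tree into three groups. The choice k = 3 is an assumption taken from the interpretation above, whereas HCPC with nb.clust = -1 selects the cut automatically.

```r
countries <- c("Cameroon", "Republic of the Congo", "DRC", "Gabon", "Chad",
               "Central African Republic", "Equatorial Guinea", "Angola",
               "Rwanda", "Burundi", "São Tomé and Príncipe")

# Coordinates of the individuals on the two retained axes (rounded values)
coord <- matrix(
  c( 0.5026,  1.469,
     0.9198,  0.3619,
    -4.357,   2.313,
     3.612,   1.262,
    -2.188,  -2.291,
    -1.257,  -2.717,
     2.132,   0.2254,
    -1.384,   3.906,
     0.7403, -1.327,
    -1.613,  -2.485,
     2.894,  -0.7181),
  ncol = 2, byrow = TRUE,
  dimnames = list(countries, c("Dim.1", "Dim.2"))
)

# Ward's criterion on Euclidean distances between countries in the PCA plane
tree <- hclust(dist(coord), method = "ward.D2")

# Cut the tree into three groups (assumed number of clusters)
groups <- cutree(tree, k = 3)

print(split(names(groups), groups))
```

With these coordinates the cut separates the DRC and Angola, the trio Chad, Burundi and the Central African Republic, and the six remaining countries, which matches the three groups described above.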





MULTIPLE LINEAR REGRESSION

Finally, we fit a multiple linear regression model to explore the relationships between Access_Elect (electricity access) and various predictors.

Calculation of regression coefficients

# Fit multiple linear regression
regression <- lm(Access_Elect ~ Total_Pop + Elect_Gen + Elec_Demand + Rural_Pop + Pop_Growth + HDI + GDP_Per_Capita + Fossil_fuels_elect_gen + Hydroelectricity_gen, data = donnees_csv)
print(summary(regression))
## 
## Call:
## lm(formula = Access_Elect ~ Total_Pop + Elect_Gen + Elec_Demand + 
##     Rural_Pop + Pop_Growth + HDI + GDP_Per_Capita + Fossil_fuels_elect_gen + 
##     Hydroelectricity_gen, data = donnees_csv)
## 
## Residuals:
##                 Cameroon    Republic of the Congo                      DRC 
##                 -0.45629                 -0.16779                 -0.08937 
##                    Gabon                     Chad Central African Republic 
##                 -0.26422                  1.67162                  1.38688 
##        Equatorial Guinea                   Angola                   Rwanda 
##                  0.02580                  0.29476                  2.39262 
##                  Burundi    São Tomé and Príncipe 
##                 -4.32482                 -0.46918 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)
## (Intercept)            -9.896e+00  3.970e+01  -0.249    0.844
## Total_Pop               1.964e-06  7.106e-07   2.763    0.221
## Elect_Gen              -9.286e-02  3.373e-02  -2.753    0.222
## Elec_Demand             1.068e-02  1.192e-02   0.896    0.535
## Rural_Pop              -5.320e-01  2.255e-01  -2.359    0.255
## Pop_Growth             -2.065e+01  1.127e+01  -1.832    0.318
## HDI                     2.361e+02  4.705e+01   5.017    0.125
## GDP_Per_Capita         -1.845e-03  2.421e-03  -0.762    0.585
## Fossil_fuels_elect_gen  8.264e+01  2.925e+01   2.826    0.217
## Hydroelectricity_gen    8.124e+01  2.401e+01   3.384    0.183
## 
## Residual standard error: 5.456 on 1 degrees of freedom
## Multiple R-squared:  0.9964, Adjusted R-squared:  0.9644 
## F-statistic: 31.06 on 9 and 1 DF,  p-value: 0.1384
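
The full model is not statistically significant at the 5% level (F-test p-value = 0.1384), and none of its coefficients is individually significant: HDI (p = 0.125) and Hydroelectricity_gen (p = 0.183) merely have the smallest p-values. With 11 observations and nine predictors, only one residual degree of freedom remains, so the very high R² (0.9964) mainly reflects overfitting rather than explanatory power. To obtain a more parsimonious and interpretable model, we successively remove the variables with the highest p-values.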
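
The p-values in the summary can be re-derived from the reported statistics and the residual degrees of freedom. The following base R sketch does this for the overall F-test and for the two coefficients with the smallest p-values; the inputs are the rounded values printed above, and the object names are illustrative.

```r
n <- 11   # countries
p <- 9    # predictors in the full model

# Residual degrees of freedom: observations minus estimated coefficients
df_resid <- n - p - 1

# Overall F-test, from the reported F-statistic (31.06 on 9 and 1 DF)
p_F <- pf(31.06, df1 = p, df2 = df_resid, lower.tail = FALSE)

# Two-sided t-tests for the two largest t values in the coefficient table
t_vals <- c(HDI = 5.017, Hydroelectricity_gen = 3.384)
p_t <- 2 * pt(-abs(t_vals), df = df_resid)

print(df_resid)
print(round(p_F, 4))
print(round(p_t, 3))
```

With a single residual degree of freedom, even a t value of 5 does not reach the 5% threshold, which is why the full model cannot be used for inference.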



regression5 <- lm(Access_Elect ~ Hydroelectricity_gen + HDI, data = donnees_csv)
print(summary(regression5))
## 
## Call:
## lm(formula = Access_Elect ~ Hydroelectricity_gen + HDI, data = donnees_csv)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.8533  -5.4869  -0.4644   5.8651  13.6879 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          -92.2424    15.1983  -6.069 0.000299 ***
## Hydroelectricity_gen  -0.8533     0.5783  -1.476 0.178278    
## HDI                  262.3808    27.6107   9.503 1.24e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.183 on 8 degrees of freedom
## Multiple R-squared:  0.9192, Adjusted R-squared:  0.899 
## F-statistic: 45.51 on 2 and 8 DF,  p-value: 4.26e-05

The reduced model is highly significant overall (F-test p-value = 4.26e-05). However, only HDI is individually significant (p = 1.24e-05); Hydroelectricity_gen is not significant at the 5% level (p = 0.178) and could also be dropped. The model achieves a coefficient of determination R² = 0.9192 (adjusted R² = 0.899), which indicates a good fit with far fewer parameters than the full model.
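
To compare the two specifications on an equal footing, the adjusted R² penalizes the number of predictors. The base R sketch below recomputes it from the reported R² values and re-derives the p-values of the reduced model; adj_r2 is a hypothetical helper written for this illustration, and the inputs are the rounded figures from the two summaries.

```r
# Adjusted R-squared for n observations and p predictors (hypothetical helper)
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 11
adj_full    <- adj_r2(0.9964, n, p = 9)   # full model, 1 residual DF
adj_reduced <- adj_r2(0.9192, n, p = 2)   # reduced model, 8 residual DF

# Overall F-test of the reduced model (45.51 on 2 and 8 DF)
p_F_reduced <- pf(45.51, df1 = 2, df2 = n - 2 - 1, lower.tail = FALSE)

# Two-sided t-test for Hydroelectricity_gen in the reduced model (t = -1.476)
p_hydro <- 2 * pt(-abs(-1.476), df = n - 2 - 1)

print(round(c(full = adj_full, reduced = adj_reduced), 4))
print(signif(p_F_reduced, 3))
print(round(p_hydro, 3))
```

The reduced model gives up a few points of adjusted R² in exchange for eight residual degrees of freedom, which makes its tests meaningful.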



Regression graphs

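# Residuals vs fitted values
plot(regression5, which = 1)

In the residuals-versus-fitted plot, the points are scattered without an obvious pattern, which is consistent with the linearity and constant-variance assumptions, although a sample of 11 countries gives this check limited power. The largest residuals, in particular for Cameroon, the Republic of the Congo, and Equatorial Guinea, suggest that additional investigation into these cases may be warranted.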



# Normal Q-Q plot of the residuals
plot(regression5, which = 2)

We observe that the points generally follow the reference line of the normal Q-Q plot, although there are some deviations, particularly for Cameroon, the Republic of the Congo, and Equatorial Guinea. This suggests that the residuals are approximately normally distributed, which is consistent with the normality assumption of the model.



Predictions

We can use the model to make predictions for Access_Elect based on the values of the predictor variables.

# Fitted values from the full nine-predictor model (regression), not the reduced model (regression5)
predictions <- predict(regression)
pander(predictions)
Country                     Fitted Access_Elect
Cameroon                    71.46
Republic of the Congo       50.77
DRC                         21.59
Gabon                       93.76
Chad                        10.03
Central African Republic    14.31
Equatorial Guinea           66.97
Angola                      48.21
Rwanda                      48.21
Burundi                     14.62
São Tomé and Príncipe       78.47

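Because these predictions are the fitted values of the full nine-predictor model, they are very close to the observed values for most countries: the residuals reported in the model summary are below 0.5 points in absolute value for Cameroon, the Republic of the Congo, the DRC, Gabon, Equatorial Guinea, Angola, and São Tomé and Príncipe. The largest discrepancies concern Burundi (-4.32), Rwanda (2.39), Chad (1.67), and the Central African Republic (1.39). This closeness should be read with caution, since the full model leaves only one residual degree of freedom and therefore largely reproduces the data it was fitted on.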
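
Since a residual is the observed value minus the fitted value, the observed access rates can be reconstructed from the two outputs above. The base R sketch below combines the fitted values with the residuals printed in the full-model summary; the object names are illustrative, and the reconstructed values are approximate because both inputs are rounded.

```r
# Fitted values of the full model, as printed by pander(predictions)
fitted_full <- c(
  "Cameroon" = 71.46, "Republic of the Congo" = 50.77, "DRC" = 21.59,
  "Gabon" = 93.76, "Chad" = 10.03, "Central African Republic" = 14.31,
  "Equatorial Guinea" = 66.97, "Angola" = 48.21, "Rwanda" = 48.21,
  "Burundi" = 14.62, "São Tomé and Príncipe" = 78.47
)

# Residuals of the full model, in the same country order as the summary
resid_full <- c(-0.45629, -0.16779, -0.08937, -0.26422, 1.67162, 1.38688,
                0.02580, 0.29476, 2.39262, -4.32482, -0.46918)
names(resid_full) <- names(fitted_full)

# Observed value = fitted value + residual
observed <- fitted_full + resid_full

comparison <- data.frame(
  observed = round(observed, 1),
  fitted   = fitted_full,
  residual = round(resid_full, 2)
)

# Countries sorted from the largest to the smallest absolute residual
print(comparison[order(-abs(resid_full)), ])

# Root mean squared residual over the 11 countries
rmse <- sqrt(mean(resid_full^2))
print(round(rmse, 2))
```

Sorting by absolute residual makes it easy to see which countries the model reproduces least well, Burundi being the clearest case.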



CONCLUSION

In this analysis, we examined the challenges of electrification in Central Africa by applying statistical techniques to a set of socioeconomic, demographic, and energy variables. Principal Component Analysis (PCA) reduced the complexity of the dataset and identified the major dimensions that structure regional disparities. The results highlight the crucial role of GDP per capita, electricity production, access to electricity (urban and rural), and demographic characteristics in differentiating the countries of the region.

The PCA revealed that the first two dimensions capture most of the variability between countries, contrasting, on the one hand, states with relatively high economic and energy capacity, and on the other hand, those facing structural weaknesses, high levels of rural population, or significant population growth. Cluster analysis, when combined with PCA results, reveals distinct national profiles, reflecting heterogeneous levels of electrification, economic development, and territorial organization. The regression results also suggest that human development, as measured by the HDI, is the main explanatory factor for access to electricity in the region; GDP per capita, although strongly associated with the first PCA axis, was not retained in the final model.

Overall, these results provide crucial insights into the persistent disparities in electrification across Central Africa and underscore the need for targeted policies, particularly to strengthen rural electrification, improve energy efficiency, and diversify generation sources, especially through renewable energy. Future research could incorporate longitudinal data to analyze changes over time, or include institutional and policy variables to better understand the influence of governance on electrification progress.