Electric Energy in African Countries: Production, Consumption, Imports, and Inflation.

Author

Merveille Kuendzong

Published

April 14, 2024

Introduction

Countries across the world vary greatly in numerous aspects, including socioeconomic status, type of government, culture, currency, population diversity, and management of natural energy resources. This work focuses on electric energy in African countries, specifically examining energy production, consumption, and imports, and how they relate to inflation. This topic is meaningful to me as I am originally from Africa and am aware of the energy challenges faced by populations in many African countries.

Data

The dataset I am working with was obtained by web scraping and contains information on all the countries in the world. However, I have focused my work on the analysis of African countries. My original dataset contains 64 variables, but I have used 15 of them for this analysis. The production columns are spelled “electricty_…” in the dataset itself, and the names below follow that spelling.

“country”: The name of the country.
“region”: The region to which the country belongs.
“latitude”: The latitude of the country.
“longitude”: The longitude of the country.
“inflation”: The inflation rate in the country.
“internet_pct”: The percentage of the population in the country that has access to the internet.
“electricity_access_pct”: The percentage of the population in the country that has access to electricity.
“alternative_nuclear_energy_pct”: The percentage of total electricity consumption in the country that comes from alternative and nuclear energy sources.
“electricty_production_coal_pct”: The percentage of total electricity production in the country that comes from coal power.
“electricty_production_hydroelectric_pct”: The percentage of total electricity production in the country that comes from hydroelectric power.
“electricty_production_gas_pct”: The percentage of total electricity production in the country that comes from natural gas power.
“electricty_production_nuclear_pct”: The percentage of total electricity production in the country that comes from nuclear power.
“electricty_production_oil_pct”: The percentage of total electricity production in the country that comes from oil power.
“electricty_production_renewable_pct”: The percentage of total electricity production in the country that comes from renewable energy sources.
“energy_imports_pct”: The percentage of the country’s total energy that is imported from other countries. A negative percentage might indicate that the country is a net exporter of energy.

Load Libraries

library(tidyverse)
Warning: package 'dplyr' was built under R version 4.3.2
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(ggfortify)
library(RColorBrewer)
library(leaflet)
Warning: package 'leaflet' was built under R version 4.3.3
library(viridis)
Loading required package: viridisLite
library(plotly)
Warning: package 'plotly' was built under R version 4.3.2

Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout

Load Data

# set working directory
setwd("C:/Users/kmerv_6exilcx/Dropbox/SPRING 2024/Data 110/week11/project2")
countries <- read_csv('AllCountries.csv')

# display the first six rows
head(countries)
# A tibble: 6 × 64
  country   country_long currency capital_city region continent demonym latitude
  <chr>     <chr>        <chr>    <chr>        <chr>  <chr>     <chr>      <dbl>
1 Afghanis… Islamic Sta… Afghan … Kabul        South… Asia      Afghan      33  
2 Albania   Republic of… Albania… Tirana       South… Europe    Albani…     41  
3 Algeria   People's De… Algeria… Algiers      North… Africa    Algeri…     28  
4 Andorra   Principalit… Euro     Andorra la … South… Europe    Andorr…     42.5
5 Angola    People's Re… Angolan… Luanda       Middl… Africa    Angolan    -12.5
6 Antigua … Antigua and… East Ca… Saint John's Carib… Americas  Antigu…     17.0
# ℹ 56 more variables: longitude <dbl>, agricultural_land <dbl>,
#   forest_area <dbl>, land_area <dbl>, rural_land <dbl>, urban_land <dbl>,
#   central_government_debt_pct_gdp <dbl>, expense_pct_gdp <dbl>, gdp <dbl>,
#   inflation <dbl>, self_employed_pct <dbl>, tax_revenue_pct_gdp <dbl>,
#   unemployment_pct <dbl>, vulnerable_employment_pct <dbl>,
#   electricity_access_pct <dbl>, alternative_nuclear_energy_pct <dbl>,
#   electricty_production_coal_pct <dbl>, …

Filter data

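# filter African countries
af_countries <- countries |>
  filter(continent == "Africa")
# select only the columns I will work with, by name rather than by position
# (electricity_access_pct:energy_imports_pct covers the nine adjacent energy columns)
data <- af_countries |>
  select(country, region, latitude, longitude, inflation, internet_pct,
         electricity_access_pct:energy_imports_pct)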
head(data)
# A tibble: 6 × 15
  country      region          latitude longitude inflation internet_pct
  <chr>        <chr>              <dbl>     <dbl>     <dbl>        <dbl>
1 Algeria      Northern Africa     28        3         9.27        70.8 
2 Angola       Middle Africa      -12.5     18.5      25.8         32.6 
3 Benin        Western Africa       9.5      2.25      1.35        34.0 
4 Botswana     Southern Africa    -22       24        11.7         73.5 
5 Burkina Faso Western Africa      13       -2        14.3         21.6 
6 Burundi      Eastern Africa      -3.5     30        18.8          5.80
# ℹ 9 more variables: electricity_access_pct <dbl>,
#   alternative_nuclear_energy_pct <dbl>, electricty_production_coal_pct <dbl>,
#   electricty_production_hydroelectric_pct <dbl>,
#   electricty_production_gas_pct <dbl>,
#   electricty_production_nuclear_pct <dbl>,
#   electricty_production_oil_pct <dbl>,
#   electricty_production_renewable_pct <dbl>, energy_imports_pct <dbl>
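
Several later steps skip or drop missing values (na.rm = TRUE in the regional means, na.omit() before the heatmap), so it can help to see how many NAs each column holds before going further. The sketch below is only an illustration on a made-up three-row data frame named toy; its values are invented and are not taken from AllCountries.csv.

```r
# Toy stand-in for `data`; values invented for illustration only.
toy <- data.frame(
  country = c("A", "B", "C"),
  electricity_access_pct = c(48.2, NA, 99.1),
  energy_imports_pct = c(NA, NA, 12.5)
)

# Count missing values in each column.
na_counts <- colSums(is.na(toy))
print(na_counts)

# Number of rows that would survive na.omit(), as used before the heatmap.
complete_rows <- nrow(na.omit(toy))
print(complete_rows)
```

Running the same two calls on data would show how many countries are excluded from each summary.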

Relationship between Electricity Access and Internet Access in African Countries

Scatterplot of Electricity Access to Internet Access

m_plot <- ggplot(data, aes(x = electricity_access_pct, y = internet_pct, color = region)) +
  theme_minimal(base_size = 12, base_family = "serif") + 
  # the tooltip text is mapped on the points only: mapped at the plot level it
  # can split the data into one group per country, leaving geom_smooth() with
  # single-point groups instead of one trend line per region
  geom_point(aes(text = paste("country:", country)), size = 3, alpha = 0.5) +
  geom_smooth(method = lm, se = FALSE, lty = 5, linewidth = 0.2) +
  scale_color_brewer(palette = "Set1") +
  labs(x="Percentage of populations that have access to electricity", 
       y="Percentage of populations that have access to Internet",
       title = "Scatterplot of Electricity Access to Internet Access",
       caption = "Source: Web Scraping")
m_plot <- ggplotly(m_plot)
`geom_smooth()` using formula = 'y ~ x'
m_plot

Correlation and Linear Regression Model

cor(data$internet_pct, data$electricity_access_pct)
[1] 0.7972618
fit = lm(internet_pct ~ electricity_access_pct, data = data)
summary(fit)

Call:
lm(formula = internet_pct ~ electricity_access_pct, data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-33.776  -4.176   1.334   9.571  25.161 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)             0.98903    4.50272   0.220    0.827    
electricity_access_pct  0.68367    0.07178   9.524 5.44e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 14.02 on 52 degrees of freedom
Multiple R-squared:  0.6356,    Adjusted R-squared:  0.6286 
F-statistic: 90.71 on 1 and 52 DF,  p-value: 5.445e-13

The correlation coefficient is about 0.80 (0.7972618), indicating a strong positive linear relationship between the percentage of the population with access to electricity and the percentage with access to the Internet. However, correlation does not imply causation: the two variables are related, but this does not necessarily mean that a change in one directly causes a change in the other.

The model has the equation: internet_pct = 0.68367(electricity_access_pct) + 0.98903

The slope may be interpreted as follows: for each additional percentage point of electricity access (electricity_access_pct), the model predicts an increase of about 0.68 percentage points in Internet access (internet_pct).
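
To make the slope concrete, the fitted line can be evaluated by hand. The sketch below copies the two coefficients from the summary output above; predict_internet is a hypothetical helper name introduced here for illustration, and the input values 50 and 60 are arbitrary.

```r
# Coefficients copied from the summary(fit) output above.
intercept <- 0.98903
slope <- 0.68367

# Hypothetical helper: predicted Internet access (%) for a given
# electricity access (%), using the fitted line.
predict_internet <- function(electricity_access_pct) {
  intercept + slope * electricity_access_pct
}

p50 <- predict_internet(50)
p60 <- predict_internet(60)
print(c(p50, p60))

# A 10-point rise in electricity access shifts the prediction by 10 * slope.
print(p60 - p50)
```

With the fitted model object available, predict(fit, newdata = data.frame(electricity_access_pct = c(50, 60))) should give the same values up to rounding of the coefficients.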

The p-value for electricity_access_pct is very low (5.44e-13, marked with three asterisks), which indicates that it is a statistically significant predictor of internet_pct. The adjusted R-squared shows that about 62.86% of the variation in internet_pct is explained by the model, which is quite high for a model with a single predictor.
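
In a regression with one predictor, the multiple R-squared is the square of the correlation coefficient, so the two outputs above can be checked against each other. The sketch below uses only numbers printed earlier; the sample size n = 54 is inferred from the 52 residual degrees of freedom plus the two estimated coefficients.

```r
r <- 0.7972618          # cor() output above
r_squared <- r^2        # compare with "Multiple R-squared: 0.6356"

# Adjusted R-squared for n observations and p predictors.
n <- 54                 # 52 residual degrees of freedom + 2 coefficients
p <- 1
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

print(round(c(r_squared, adj_r_squared), 4))
```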

Diagnostic plots

autoplot(fit, 1:4, nrow=2, ncol=2)

The Residuals vs Fitted plot does not show a flat, horizontal trend, which may suggest that the linearity assumption, and possibly the assumption of constant variance, is violated. Observations 28 and 44 stand out in both the Residuals vs Fitted and Normal Q-Q plots, and they also have high values in the Scale-Location plot. These observations correspond to Libya and Somalia, whose internet_pct is much lower than their electricity_access_pct.
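
The fourth diagnostic panel plots Cook's distance, which can also be computed directly to name the most influential observations. The sketch below is an illustration on invented toy data (ten points labelled A to J, with J placed far below the trend), not on the African dataset; toy, toy_fit and toy_refit are hypothetical names.

```r
# Toy data, invented for illustration: nine points on a line plus one
# point ("J") whose y value falls far below the trend.
toy <- data.frame(
  x = seq(10, 100, by = 10),
  y = c(7, 14, 21, 28, 35, 42, 49, 56, 63, 10),
  row.names = LETTERS[1:10]
)

toy_fit <- lm(y ~ x, data = toy)

# Cook's distance combines residual size and leverage for each observation.
cd <- cooks.distance(toy_fit)
most_influential <- names(which.max(cd))
print(most_influential)

# Refit without that observation to see how much the slope moves.
toy_refit <- lm(y ~ x, data = toy[rownames(toy) != most_influential, ])
print(c(with = coef(toy_fit)[["x"]], without = coef(toy_refit)[["x"]]))
```

Applied to the real model, cooks.distance(fit) could be used in the same way to check whether Libya and Somalia materially change the slope.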

Regional Distribution of Energy Access and Energy Imports

# mean electricity_access_pct for each region
energy_access <- data |>
  group_by(region) |>
  summarise(mean_electricity_access = mean(electricity_access_pct, na.rm = TRUE))

ggplot(data = energy_access, aes(x = region, y = mean_electricity_access, fill = region)) +
  geom_bar(stat="identity")+
  labs(x = "Region", y = "Average Electricity Access (%)", fill = "Region",
         title = "Average Electricity Access by Region", caption = "Source: Web Scraping") +
  theme_dark() +
  scale_fill_brewer(palette = "Set1") 

This graph shows that Middle Africa has the lowest average electricity access of any region, with a mean below 50% across its countries, and that no region reaches an average of 100%.

# mean energy_imports_pct for each region
energy_imports <- data |>
  group_by(region) |>
  summarise(mean_energy_import = mean(energy_imports_pct, na.rm = TRUE))


ggplot(data = energy_imports, aes(x = region, y = mean_energy_import, fill = region)) +
  geom_bar(stat="identity")+
  labs(x = "Region", y = "Average Energy Imports (%)", fill = "Region",
         title = "Average Energy Imports by Region", caption = "Source: Web Scraping") +
  theme_dark()+
  scale_fill_brewer(palette = "Set1") 

This graph shows that average energy imports in Middle Africa are large and negative (nearly -400%), suggesting that countries in the region export, on average, about four times the energy they use. This is surprising given the low percentage of the population with access to electricity. Because the bar is a mean of country-level percentages, a few very large exporters can dominate it.
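
A value near -400% is easier to read with the arithmetic written out. The sketch below assumes that energy_imports_pct is defined as net imports (energy use minus production) as a percentage of energy use. That definition is an assumption, since the scraped source is not documented here. The function name net_imports_pct and all input figures are invented for illustration.

```r
# Assumed definition (not confirmed by the dataset's source):
# net imports as a percentage of energy use.
net_imports_pct <- function(energy_use, energy_production) {
  100 * (energy_use - energy_production) / energy_use
}

# Invented figures: a country that uses 1 unit and produces 5.
# Production is 5x use, so net exports are 4x use.
exporter <- net_imports_pct(energy_use = 1, energy_production = 5)
print(exporter)

# A country that produces 60% of what it uses imports the remaining 40%.
importer <- net_imports_pct(energy_use = 10, energy_production = 6)
print(importer)
```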

Energy Access, Production, and Imports Stats in African Countries

# select columns needed and rows containing non na values, and order the data based on countries names
countr <- data[,c(1, 7, 8, 9, 10, 11, 12, 13, 14, 15)]
countr <-na.omit(countr)
countr <- countr[order(countr$country),]
countr <- as.data.frame(countr)
# rename columns to give shorter names
countr <- countr |>
  rename(
    elAcc = electricity_access_pct, 
    altNucl = alternative_nuclear_energy_pct,
    pcoal = electricty_production_coal_pct, 
    phydro = electricty_production_hydroelectric_pct,
    pgas = electricty_production_gas_pct,
    pnucl = electricty_production_nuclear_pct,
    poil = electricty_production_oil_pct,
    prenew = electricty_production_renewable_pct,
    enImp = energy_imports_pct
  )

row.names(countr) <- countr$country
countr <- countr[,c(2:10)]
# matrix of data without nas
countr_matrix <- data.matrix(countr)
# Heatmap of energy stats
heatmap(countr_matrix, 
                       Rowv=NA, 
                       Colv=NA, 
                       col = viridis(30),  
                       scale="column", 
                       margins=c(5,10),
                       xlab = "Energy access and production Stats",
                       ylab = "African countries",
                       main = "Energy Stats in African Countries")

Among the countries with complete data, we observe that South Sudan has the lowest access to electricity and the lowest energy import percentage (indicating the highest net exports), which is surprising. Namibia has the highest percentage of electricity consumption coming from alternative and nuclear energy sources (altNucl). South Africa has a significant portion of its electricity production coming from nuclear power, in contrast to the other countries, where it is nonexistent. Kenya leads in the production of electricity from renewable energy sources.
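
The heatmap is drawn with scale = "column", so each column is centred and divided by its own standard deviation before colours are assigned. Colours are therefore comparable within a column but not across columns. The sketch below shows the same standardisation with scale() on a small invented matrix; the row labels and values are made up, and only the column names echo the renamed variables.

```r
# Toy matrix, invented for illustration: two columns on very different scales.
m <- matrix(
  c(10, 50, 90,      # elAcc-like column (percent)
    -500, 0, 20),    # enImp-like column (can be far below zero)
  ncol = 2,
  dimnames = list(c("A", "B", "C"), c("elAcc", "enImp"))
)

# Centre each column on its mean and divide by its standard deviation,
# which is what heatmap(scale = "column") does before mapping to colours.
z <- scale(m)
print(round(z, 2))

# After scaling, every column has mean 0 and standard deviation 1.
print(colMeans(z))
print(apply(z, 2, sd))
```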

Data Map: Inflation Rate in each African Country

# approximate longitude and latitude of Cameroon (Middle Africa), used to centre the map
long <- 12
lat <- 6
# popup text for each country on the map

pop <- paste0(
      "<b>Country: </b>", data$country, "<br>",
      "<b>Region: </b>", data$region, "<br>",
      "<b>Inflation: </b>", data$inflation, "<br>",
      "<b>Latitude: </b>", data$latitude, "<br>",
      "<b>Longitude: </b>", data$longitude, "<br>",
      "<b>Electricity access %: </b>", data$electricity_access_pct, "<br>",
      "<b>Energy import %: </b>", data$energy_imports_pct, "<br>"
    )
# map
my_map <- leaflet() |>
  setView(lng = long, lat = lat, zoom = 4) |>
  addProviderTiles("Esri.NatGeoWorldMap") 

palette <- colorFactor(palette = "Set1", domain = data$region)


my_map <- my_map|>
  addCircles(
    data = data,
    radius = data$inflation*3500, # radius is based on percentage of inflation
    color = ~palette(region), 
    fillColor = ~palette(region),
    #Add popup
    popup = pop
  )
Assuming "longitude" and "latitude" are longitude and latitude, respectively
# addlegend for colors
my_map <- addLegend(my_map, 
                    position = "bottomleft", 
                    pal = palette, 
                    values = data$region, 
                    title = "Region")

my_map

We can observe that some countries, such as Angola in Middle Africa, experience high inflation (about 25.8%) despite exporting a large share of their energy (energy imports of -541%) while having low electricity access (48%). High inflation is also seen in Sudan, in Northern Africa, at 138.8%.
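
The circle sizes follow directly from radius = data$inflation*3500, and addCircles() interprets the radius in metres. The sketch below works through that arithmetic for the two inflation values quoted above. The entry named Hypothetical and the pmax() guard are assumptions added for illustration, covering the case of a country with negative inflation.

```r
# Inflation values as displayed in the outputs and text above.
inflation <- c(Angola = 25.8, Sudan = 138.8)

# addCircles() takes its radius in metres, so inflation * 3500 gives:
radius_m <- inflation * 3500
print(radius_m / 1000)   # the same radii in kilometres

# Assumption: a country with deflation would get a negative radius;
# clamping at zero is one possible guard.
radius_safe <- pmax(c(inflation, Hypothetical = -1.2) * 3500, 0)
print(radius_safe)
```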

Conclusion

This dataset was suitable for analyzing the management of natural energy resources in specific regions of the world. The scatterplot and correlation coefficient showed a strong positive relationship between electricity access and internet access in African countries. The linear model further confirmed that the percentage of the population with access to electricity is a statistically significant predictor of the percentage of the population with access to the internet. The bar graphs revealed that Middle Africa, on average, has the lowest electricity access, yet surprisingly, it is also the region that exports the most energy. The heatmap revealed patterns across energy access, production, and imports in African countries, and the map depicted the variation in inflation rates among these countries.

Cleaning the dataset before working with it was unnecessary, as all variable names were already lowercase and clearly named. However, renaming them for better readability in the heatmap was required. I filtered the data to focus only on African countries and removed NA values before conducting computations and generating the heatmap. Working with this dataset provided an excellent opportunity for practicing visualization and gave me valuable insights.