World Energy Consumption

Submitted by : Michelle Vava (201706245), Francis Jose (202106946), Goutham Regu (202005622)

Team Roles

  • Michelle Vava (201706245) – Team Lead, delegating tasks, making edits to submissions
  • Francis Jose (202106946) – Data wrangling and analysis
  • Goutham Regu (202005622) – Data transform and visualization

Abstract

Energy is a basic human need, so it is important to find out where it comes from and whether it is accessible to everyone through electricity. No electricity means no refrigeration of food, no washing machine, and no light at night. In addition, the energy sources that currently supply electricity emit large amounts of carbon, which is driving climate change. This project investigates energy consumption by tidying, mutating, and visualizing data from various sources and by building models to assess whether all countries have access to electricity and whether we are doing anything to reduce carbon emissions.

Introduction

The world consumes about 580 million terajoules of energy a year. About 83% of this energy comes from fossil fuels, which produce greenhouse gas emissions. Furthermore, hundreds of millions of people lack access to energy entirely. Although countries are taking initiatives to increase the use of renewables, demand for fossil fuels continues to rise, and the world lacks safe, low-carbon, cheap, large-scale energy alternatives.

These observations raise the following questions about energy consumption:

  • Who has access to energy?
  • What share of people do not have access to electricity?
  • Which countries extract the most energy?
  • How much energy is used per capita?
  • How much carbon emissions per capita are there from the energy consumed?
  • What is the primary source of energy country by country?
  • What are the safest and cleanest sources of energy?
  • How has the mix of energy consumption changed over time?
  • What sources do we get energy from?
  • Is the world making progress decarbonizing energy?
  • How has the world’s energy sources changed over the last two centuries?
  • Which share of the primary energy source is fossil fuels or renewable or nuclear?
  • Can we reduce the ratio of energy to GDP?

We will then use the answers to these questions to address our main question: Is energy accessible to everyone, and how close are we to getting rid of fossil fuels?

Dataset

Our main data source is Kaggle. We selected this dataset because it has categorical columns that are useful for visualization and for drawing insights, and because the data is updated regularly by Our World in Data. The dataset has 122 columns with metrics such as primary energy per capita, growth rates, energy mix, and electricity mix. We are aware that some columns have null values and intend to handle them in our analysis. The data covers 1990 to 2020 and includes 242 unique countries. Our second data source is Our World in Data, which provides multiple datasets on which countries have access to electricity and on GDP per capita, with metrics such as electricity per capita and access to electricity. We included these to enrich our results and to produce more plots in support of a better conclusion about energy consumption.
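
Since some columns contain null values, a per-column count of missing values is one way to decide which columns are usable. The following is a minimal sketch in base R, not the handling used in this report. The data frame `energy_sample`, its values, and the 0.5 threshold are assumptions made up for illustration.

```r
# Hypothetical stand-in for the Kaggle data; values are illustrative only.
energy_sample <- data.frame(
  country = c("A", "B", "C", "D"),
  year = c(2019, 2019, 2020, 2020),
  primary_energy_per_capita = c(1200, NA, 3400, NA),
  solar_electricity = c(NA, NA, 2.5, NA)
)

# Share of missing values in each column, sorted from most to least complete.
missing_share <- sort(colMeans(is.na(energy_sample)))
print(missing_share)

# Keep only columns where at most half of the values are missing
# (the 0.5 threshold is an arbitrary choice for this sketch).
usable_columns <- names(missing_share)[missing_share <= 0.5]
energy_usable <- energy_sample[, usable_columns, drop = FALSE]
print(names(energy_usable))
```

Columns that fail the threshold can then be dropped or imputed, depending on the question being asked.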

Method of Analysis

  1. We will first determine the primary energy source from the 1800s to 2020. We will also illustrate this using a world map.
  2. The consumption of energy based on carbon emissions vs GDP per capita
  3. The number of people without electricity
  4. How much energy people consume per capita
  5. How many people do not have access to clean fuels for cooking?
  6. What are the safest and cleanest sources of energy?
  7. Electricity production from fossil and renewable sources
  8. Future trends in the electricity production and consumption in world and in Europe
  9. Usage of secondary energy sources in different countries
  10. Clustering of countries based on secondary sources

Results

Analysis 1: What is the primary source of energy?

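# Packages used throughout the Results section
library(tidyverse)  # readr, dplyr, tidyr, ggplot2, stringr
library(viridis)    # scale_fill_viridis()
library(maps)       # world map polygons and ISO codes
library(plotly)     # ggplotly(), plot_ly()

# energy consumed per year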
global_energy_substitution <- read_csv("World Energy Consumption Datasets/global-energy-substitution.csv")
energy_consumed <- select(global_energy_substitution,-Entity,-Code)
energy_consumed_result_line_graph <- energy_consumed %>%
  pivot_longer(-c(Year)) %>%
  ggplot(aes(x=Year,y=value,group=name, color=name, fill = name)) +
  geom_line() +  scale_x_continuous(breaks=seq(min(energy_consumed$Year),max(energy_consumed$Year),10),guide = guide_axis(check.overlap = TRUE) ) +
  labs(title="Global primary energy consumption by source", 
       caption="World Energy Consumption",y = "TWh",x="Year")

energy_consumed_result_area_graph <- energy_consumed %>%
  pivot_longer(-c(Year)) %>%
  ggplot(aes(x=Year,y=value,group=name, color=name, fill = name)) +
  geom_area() +  scale_x_continuous(breaks=seq(min(energy_consumed$Year),max(energy_consumed$Year),10),guide = guide_axis(check.overlap = TRUE) ) +
  labs(title="Global primary energy consumption by source", 
       caption="World Energy Consumption",y = "TWh",x="Year")
energy_consumed_result_area_graph

energy_consumed_result_line_graph

  • This is the primary energy consumed between 1800 and 2020.
  • Until the mid-19th century, traditional biomass (the burning of solid fuels such as wood, crop waste, or charcoal) was the dominant source of energy used across the world.
  • Nuclear energy came into use only in the 1960s, and renewables such as wind and solar in the 1980s.
  • The Industrial Revolution brought the rise of coal, followed by oil, gas, and later hydropower.
  • We get the largest amount of our energy from oil, followed by coal, gas, and then hydroelectric power, which shows that the global energy mix is still dominated by fossil fuels. Fossil fuels still accounted for 80% of the energy used in 2020.
  • It has taken many decades for any particular energy source to become dominant.
  • Further investigation: we want to assess which countries have transitioned between energy sources and whether that transition has been slow or fast.
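
The fossil fuel share quoted above is the sum of oil, coal, and gas divided by total consumption. The sketch below shows that calculation in base R. The vector `consumption_twh` holds made-up values for illustration, not figures from the dataset, and the source names are assumptions.

```r
# Hypothetical consumption by source for one year, in TWh (illustrative values only).
consumption_twh <- c(oil = 50, coal = 40, gas = 35, hydropower = 10,
                     nuclear = 7, wind = 4, solar = 2, biomass = 12)

fossil_sources <- c("oil", "coal", "gas")

# Share of each source in the total, as a percentage.
share_pct <- 100 * consumption_twh / sum(consumption_twh)

# Combined share of the fossil sources.
fossil_share_pct <- sum(share_pct[fossil_sources])
print(round(fossil_share_pct, 1))
```

Applied to one year of the pivoted `energy_consumed` data, the same two steps would give the share for that year.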

Analysis 2: The number of people without electricity more than halved over the last 20 years

#The number of people without electricity more than halved over the last 20 years

number_of_people_with_and_without_electricity_access <- read_csv("World Energy Consumption Datasets/number-of-people-with-and-without-electricity-access.csv")
number_of_people_with_and_without_electricity_access_filtered <- number_of_people_with_and_without_electricity_access %>%
  filter(Entity %in% c("World"))
number_of_people_with_and_without_electricity_access_display <- select(number_of_people_with_and_without_electricity_access_filtered,-Entity,-Code) %>%
  filter(Year %in% c(1998,2000,2002,2004,2006,2008,2010,2012,2014,2016,2019))  %>%
  pivot_longer(-c(Year)) %>%
  ggplot(aes(x=Year,y=value,group=name, color=name, fill = name)) +
  geom_bar(stat = "identity") +  scale_x_continuous(breaks=seq(min(number_of_people_with_and_without_electricity_access_filtered$Year),max(number_of_people_with_and_without_electricity_access_filtered$Year),10),guide = guide_axis(check.overlap = TRUE) ) +
  labs(title="Number of people with and without electricity access, World",y = "Billion",x="Year")

number_of_people_with_and_without_electricity_access_display

  • Twenty years ago, the number of people without access to electricity was more than double what it is today.
  • In 2019, an estimated 761 million people did not have electricity. Two decades ago, more than 1.6 billion people were in this position.
  • More than three-quarters of those who do not have access to electricity live in Sub-Saharan Africa.
  • Access to electricity as measured here does not guarantee that someone has the electricity needed to maintain a high standard of living, but it does provide a useful measure of how many people in the world have access to electricity for the most basic of uses.
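
The "more than halved" claim can be checked directly from the two figures quoted above. This is a small worked calculation in R. Because the earlier figure is given as "more than 1.6 billion", the result is a lower bound on the reduction.

```r
# Figures quoted in the text above.
without_2019 <- 761e6      # people without electricity in 2019
without_earlier <- 1.6e9   # two decades earlier ("more than 1.6 billion", so a lower bound)

# Ratio of the 2019 figure to the earlier figure, and the implied reduction.
ratio <- without_2019 / without_earlier
reduction_pct <- 100 * (1 - ratio)

cat(sprintf("2019 figure is %.1f%% of the earlier figure\n", 100 * ratio))
cat(sprintf("Reduction of at least %.1f%%\n", reduction_pct))
```

A ratio below 0.5 confirms that the number of people without electricity fell by more than half.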

Analysis 3: How much energy do people consume per capita

#how much energy do people consume per capita?

per_capita_energy_use_data <- read_csv("World Energy Consumption Datasets/per-capita-energy-use.csv")

world <- map_data('world') %>%
  filter(region != 'Antarctica')

gapminder_data_3 <- per_capita_energy_use_data  %>%
  inner_join(maps::iso3166 %>%
               select(a3, mapname), by= c(Code = "a3")) %>%
  mutate(mapname = str_remove(mapname, "\\(.*"))

per_capita_energy_use_result <- map_data("world") %>%
  as_tibble() %>%
  inner_join(gapminder_data_3, by=c(region= "mapname")) %>%
  filter(Year %in% c(2021)) %>%
  ggplot(aes(long, lat, group= group, fill= primary_energy_consumption_per_capita )) +
  geom_polygon(color = "white", size = 0.05, alpha = 0.8) +
  scale_fill_viridis(
    option= "magma",
    direction = -1,
    name = "kWh per person",
    guide =guide_colorbar(
      direction = "horizontal",
      barheight = unit(2, units = "mm"),
      barwidth = unit(50, units = "mm"),
      draw.ulim = F,
      title.position = "top",
      title.hjust = 0.5,
      label.hjust = 0.5
    )) +
  theme_void() +
  facet_wrap(~Year) +
  labs(title="Energy Use per person,2021")  +
  coord_fixed (ratio = 1.3) +
  theme(plot.title=element_text(size = 16,
                                hjust = 0.5),
        legend.position = "bottom")

per_capita_energy_use_result

# Plot curves of energy consumption per capita for selected countries
energy_consumed_curves_result <- read_csv("World Energy Consumption Datasets/per-capita-energy-use.csv") %>%
  filter(Entity %in% c("Sweden","Nigeria","India","United States","China","Brazil","World","United Kingdom")) %>%
  ggplot(aes(y = primary_energy_consumption_per_capita, x = Year,group=Entity, color=Entity, fill = Entity)) +
  geom_point() + geom_line() +
  labs(y = "kWh", x = "Year")

energy_consumed_curves_result

  • The average American consumes about the same amount of energy in one month as the average Indian consumes in an entire year. The average Brit consumes double that of the average Brazilian.
  • In the very poorest countries in the world, energy consumption is so low that it hardly registers.

Analysis 4 : Which countries extract the most energy based on GDP?

# access to electricity vs gdp per capita
access_to_electricity_vs_gdp_per_capita <- read_csv("World Energy Consumption Datasets/access-to-electricity-vs-gdp-per-capita.csv")
access_to_electricity_vs_gdp_per_capita_plot <- ggplot(data=na.omit(access_to_electricity_vs_gdp_per_capita),aes(x=gdp,y=access,color=Continent,size=`Population (historical estimates)`,label=Entity)) + geom_point() + 
  labs(x = "GDP per capita",  y = "Consumption-based emissions per capita", title = "Consumption-based CO₂ emissions per capita vs GDP per capita,2019")

# interactive version via ggplotly
access_to_electricity_vs_gdp_per_capita_display <- ggplotly(access_to_electricity_vs_gdp_per_capita_plot)

access_to_electricity_vs_gdp_per_capita_plot

This scatter plot shows carbon emissions per capita on the vertical axis against the average income of the specified country on the horizontal axis, with the size of each circle representing population size.

The link below leads to an interactive display of the scatter plot that shows the details of each country plotted.

Interactive Display Link : (https://rpubs.com/MichelleVava/960877)

Analysis :

  • People in the richest countries have the very highest emissions.
  • Greenhouse gas emissions are still too high for countries with incomes greater than $25,000 per capita, especially if we want to avoid severe climate change.
  • People in poor countries such as Ethiopia, Uganda, and Malawi have low emissions.
  • To bring climate change to an end, greenhouse gas emissions need to be reduced.

Next steps : We still want to investigate why emissions are low in poor countries. Does this mean the poor countries use clean energy? Or do they have access to modern energy and technology?

Analysis 5 : Share of population that has access to electricity and clean fuels

# share of population with access to electricity
share_of_the_population_with_access_to_electricity <- read_csv("World Energy Consumption Datasets/share-of-the-population-with-access-to-electricity.csv")


world <- map_data('world') %>%
  filter(region != 'Antarctica')

gapminder_data_1 <- share_of_the_population_with_access_to_electricity %>%
  inner_join(maps::iso3166 %>%
               select(a3, mapname), by= c(Code = "a3")) %>%
  mutate(mapname = str_remove(mapname, "\\(.*"))

access_to_electricity_map <- map_data("world") %>%
  as_tibble() %>%
  inner_join(gapminder_data_1, by=c(region= "mapname")) %>%
  filter(Year %in% c(2019)) %>%
  ggplot(aes(long, lat, group= group, fill= Access_to_electricity)) +
  geom_polygon(color = "white", size = 0.05, alpha = 0.8) +
  scale_fill_viridis(
    option= "magma",
    direction = -1,
    name = "Share of population (%)",
    guide =guide_colorbar(
      direction = "horizontal",
      barheight = unit(2, units = "mm"),
      barwidth = unit(50, units = "mm"),
      draw.ulim = F,
      title.position = "top",
      title.hjust = 0.5,
      label.hjust = 0.5
    )) +
  theme_void() +
  facet_wrap(~Year) +
  labs(title="Access to electricity")  +
  coord_fixed (ratio = 1.3) +
  theme(plot.title=element_text(size = 16,
                                hjust = 0.5),
        legend.position = "bottom")


access_to_electricity_map

# access to clean fuels for cooking?
access_to_clean_fuels_and_technologies_for_cooking <- read_csv("World Energy Consumption Datasets/access-to-clean-fuels-and-technologies-for-cooking.csv")
world <- map_data('world') %>%
  filter(region != 'Antarctica')

gapminder_data_2 <- access_to_clean_fuels_and_technologies_for_cooking  %>%
  inner_join(maps::iso3166 %>%
               select(a3, mapname), by= c(Code = "a3")) %>%
  mutate(mapname = str_remove(mapname, "\\(.*"))

access_to_clean_fuels_map <- map_data("world") %>%
  as_tibble() %>%
  inner_join(gapminder_data_2, by=c(region= "mapname")) %>%
  filter(Year %in% c(2020)) %>%
  ggplot(aes(long, lat, group= group, fill= Indicator)) +
  geom_polygon(color = "white", size = 0.05, alpha = 0.8) +
  scale_fill_viridis(
    option= "viridis",
    direction = -1,
    name = "Share of population (%)",
    guide =guide_colorbar(
      direction = "horizontal",
      barheight = unit(2, units = "mm"),
      barwidth = unit(50, units = "mm"),
      draw.ulim = F,
      title.position = "top",
      title.hjust = 0.5,
      label.hjust = 0.5
    )) +
  theme_void() +
  facet_wrap(~Year) +
  labs(title="Access to clean fuels for cooking")  +
  coord_fixed (ratio = 1.3) +
  theme(plot.title=element_text(size = 16,
                                hjust = 0.5),
        legend.position = "bottom")


#Number of people without access to clean fuels for cooking
number_without_clean_cooking_fuel <- read_csv("World Energy Consumption Datasets/number-without-clean-cooking-fuel.csv")
access_to_clean_fuels_display <- number_without_clean_cooking_fuel %>%
  filter(Entity %in% c("India","China","Nigeria","Kenya","Tunisia")) %>%
  ggplot(aes(y = number_without_clean_fuels_cooking, x = Year,group=Entity, color=Entity, fill = Entity)) +
  geom_point() + geom_line() +
  labs(y = "Number of people", x = "Year",title="Number of people without access to clean fuels for cooking")

access_to_clean_fuels_display

access_to_clean_fuels_map

  • 59% of the world population have access to clean fuels for cooking. The remaining 41% do not. This means 3 billion people do not have access to clean cooking fuels.
  • People who do not have access to clean cooking fuels have to rely on substitutes like wood and charcoal
  • Burning these fuels leads to millions of deaths. It emits harmful pollution into the home where it has significant health impacts for those who breathe it in.
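
The step from 41% to roughly 3 billion people is a single multiplication. The sketch below reproduces it in R. The world population of about 7.8 billion for 2020 is an assumption added here for illustration and is not taken from the dataset.

```r
share_without_pct <- 41     # share without clean cooking fuels, from the text
world_population <- 7.8e9   # assumption: approximate 2020 world population

# Number of people implied by the share.
people_without <- world_population * share_without_pct / 100
print(round(people_without / 1e9, 1))  # in billions
```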

Analysis 6 : What are the safest and cleanest sources of energy?

Before considering the long-term impacts of climate change, it is important to analyze how each source compares in terms of short-term health risks.

# death tolls
death_rates_from_energy_production_per_twh <- read_csv("World Energy Consumption Datasets/death-rates-from-energy-production-per-twh.csv")
death_rates_result <- ggplot(death_rates_from_energy_production_per_twh, aes(x=Entity, y=Deaths_per_TWh_of_electricity_production,fill=Entity)) + 
  geom_bar(stat = "identity") +
  coord_flip()

death_rates_result

  • Fossil fuels still dominate our global electricity mix, so we would expect them to kill more people.
  • Fossil fuels and biomass kill many more people than nuclear and modern renewables per unit of electricity.
  • Coal is, by far, the dirtiest.
  • Examples of nuclear accidents: Chernobyl in Ukraine in 1986 and Fukushima in Japan in 2011.
  • Finally, we have solar and wind. The death rates from both sources are low, but not zero. A small number of people die in accidents in supply chains, ranging from helicopter collisions with turbines to fires during the installation of turbines or panels.
  • Nuclear and modern renewable energy sources are vastly safer and cleaner than fossil fuels.
  • Unfortunately, the global electricity mix is still dominated by fossil fuels.

Analysis 7 : Electricity production from fossil and renewable sources

df <- read.csv("World Energy Consumption Datasets/electricity_production.csv")
df[is.na(df)] <- 0  # treat missing values as zero

electricity_production <- df[,c('continent','year','biofuel_electricity','hydro_electricity',
                                'nuclear_electricity','solar_electricity',
                                'wind_electricity','other_renewable_electricity',
                                "fossil_electricity","renewables_consumption",
                                "fossil_fuel_consumption","renewables_electricity",
                                "coal_electricity","gas_electricity","oil_electricity",
                                "coal_consumption","gas_consumption",'oil_consumption')]


electricity_production<-electricity_production %>%
  select( continent,year,
          biofuel_electricity,
          hydro_electricity,
          nuclear_electricity,
          solar_electricity,
          wind_electricity,
          other_renewable_electricity,
          renewables_electricity,
          fossil_electricity,
          coal_electricity,
          gas_electricity,
          oil_electricity,
          renewables_consumption,
          fossil_fuel_consumption,
          coal_consumption,
          gas_consumption,
          oil_consumption)

# total of the individual low-carbon sources (renewables plus nuclear_electricity)

df2 <- mutate(electricity_production,total_renewable = (biofuel_electricity + hydro_electricity + 
                                                          nuclear_electricity + solar_electricity + 
                                                          wind_electricity + other_renewable_electricity))
grouped <- df2 %>%
  group_by(continent,year) %>%
  summarise(renewable_production= sum(total_renewable))

oil_coal_gas <- mutate(electricity_production, total_fossil = oil_electricity+gas_electricity+coal_electricity)
grouped1 <- oil_coal_gas%>%
  group_by(continent,year)%>%
  summarise(production = sum(total_fossil))

accumulate_by <- function(dat, var) {
  var <- lazyeval::f_eval(var, dat)
  lvls <- plotly:::getLevels(var)
  dats <- lapply(seq_along(lvls), function(x) {
    cbind(dat[var %in% lvls[seq(1, x)], ], frame = lvls[[x]])
  })
  dplyr::bind_rows(dats)
}

# df3 <- grouped
# fig <- df3 %>%
#   filter(between(year,1980,2019), continent %in% c("Asia", "Europe","Africa","Americas","Oceania"))
# fig <- fig %>% accumulate_by(~year)

fig1 <- electricity_production %>%
  group_by(continent,year)%>%
  summarise(production = sum(renewables_electricity))
fig1<-fig1 %>%
  filter(between(year,1980,2019), continent %in% c("Asia", "Europe","Africa","Americas","Oceania"))%>%
  accumulate_by(~year)

fig2 <- electricity_production %>%
  group_by(continent,year)%>%
  summarise(consumption = sum(renewables_consumption))
fig2<-fig2 %>%
  filter(between(year,1980,2019), continent %in% c("Asia", "Europe","Africa","Americas","Oceania"))%>%
  accumulate_by(~year)

fig3 <- electricity_production %>%
  group_by(continent,year)%>%
  summarise(consumption = sum(fossil_fuel_consumption))
fig3<-fig3 %>%
  filter(between(year,1980,2019), continent %in% c("Asia", "Europe","Africa","Americas","Oceania"))%>%
  accumulate_by(~year)

fig4 <- electricity_production %>%
  group_by(continent,year)%>%
  summarise(production = sum(fossil_electricity))
fig4<-fig4 %>%
  filter(between(year,1980,2019), continent %in% c("Asia", "Europe","Africa","Americas","Oceania"))%>%
  accumulate_by(~year)

fig5 <- grouped1
# fig5 %>%filter(fossil_production_sum > 0)
fig5<-grouped1 %>%
  filter(between(year,1980,2019), continent %in% c("Asia", "Europe","Africa","Americas","Oceania"))%>%
  accumulate_by(~year)

pltly1 <-plot_ly() %>%
  add_trace(
    x = ~year,
    y = ~production,
    split = ~continent,
    frame = ~frame,
    type = 'scatter',
    mode = 'lines',
    data = fig4,
    opacity = 1.0
  ) %>%
  add_trace(
    x = ~year,
    y = ~production,
    split = ~continent,
    frame = ~frame,
    type = 'scatter',
    mode = 'lines',
    data = fig1,
    opacity = 0.5
  ) %>%
  animation_opts(
    frame = 100,
    transition = 0,
    redraw = FALSE
  ) %>%
  layout(title = "Electricity production from fossil and renewable sources")%>%
  animation_slider(
    hide = T
  ) %>%
  animation_button(
    x = 1, xanchor = "right", y = 0, yanchor = "bottom"
  )

# pltly1

pltly2 <-plot_ly() %>%
  add_trace(
    x = ~year,
    y = ~consumption,
    split = ~continent,
    frame = ~frame,
    type = 'scatter',
    mode = 'lines',
    data = fig2,
    opacity = 1.0
  ) %>%
  add_trace(
    x = ~year,
    y = ~consumption,
    split = ~continent,
    frame = ~frame,
    type = 'scatter',
    mode = 'lines',
    data = fig3,
    opacity = 0.5
  ) %>%
  animation_opts(
    frame = 100,
    transition = 0,
    redraw = FALSE
  ) %>%
  layout(title = "Electricity consumption from fossil and renewable sources")%>%
  animation_slider(
    hide = T
  ) %>%
  animation_button(
    x = 1, xanchor = "right", y = 0, yanchor = "bottom"
  )

# pltly2

Renewable and fossil electricity production

  • The analysis shows the trends in electricity production from fossil and renewable sources over the years.
  • Fossil fuel electricity production follows a different trend on each continent.
  • Population growth is likely a main driver of the increase in fossil fuel electricity production.
  • A notable trend appears in fossil fuel electricity production in Europe and the Americas.
  • Both continents show a decreasing trend in fossil fuel electricity production, which suggests they are preparing for a transition to renewable energy sources.
  • Of the two, Europe shows the more noticeable trend, with a large decline in fossil fuel electricity production.
  • Renewable production has increased sharply in recent years, which reflects both the need for more energy and a shift by countries toward greener energy.
  • Europe, Asia, and the Americas show a progressive trend in the production of electricity, while developing nations still need to make improvements.

Renewable and fossil electricity Consumption

  • Population is a main factor affecting energy consumption. Asia has the largest population, and its fossil fuel consumption surpasses its production, which indicates that it has to import energy from other countries to meet its needs.

  • The Americas and Europe show a decreasing trend in fossil fuel consumption.

  • Of the two, Europe's fossil fuel consumption shows the sharper decline in recent years.

  • On the other hand, renewable energy consumption shows an increasing trend.

  • Interactive Display : (https://rpubs.com/Francis2707/981100)
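
Absolute production can rise on every continent while the mix changes at very different speeds, so the renewable share is a useful companion to the trends above. The following is a minimal base R sketch of that calculation. The data frame `elec` and all of its values are hypothetical, and only the column names follow the dataset.

```r
# Hypothetical electricity production in TWh; values are illustrative only.
elec <- data.frame(
  continent = c("Europe", "Europe", "Asia", "Asia"),
  year = c(2000, 2019, 2000, 2019),
  renewables_electricity = c(20, 45, 15, 40),
  fossil_electricity = c(80, 55, 85, 160)
)

# Renewable share of (renewable + fossil) electricity, in percent.
elec$renewable_share_pct <- with(
  elec,
  100 * renewables_electricity / (renewables_electricity + fossil_electricity)
)

# Change in share between the first and last year for each continent.
share_change <- sapply(split(elec, elec$continent), function(d) {
  d <- d[order(d$year), ]
  d$renewable_share_pct[nrow(d)] - d$renewable_share_pct[1]
})
print(share_change)
```

In this made-up example both continents more than double renewable production, but the share moves far less where fossil production also grows.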

Analysis 9 : Usage of secondary energy sources in different countries

library(FactoMineR)
library(factoextra)

wenergy <- read.csv ("World Energy Consumption Datasets/electricity_production.csv")
w1 <- select(filter(wenergy,country == "Canada",year > 2001),year,renewables_electricity)

# ggplot(data = w1, mapping = aes(x = renewables_electricity, y = year, color = factor(renewables_electricity))) +
#   geom_point()+
#   ggtitle("Renewable Energy Usage in Canada from 2005-2020")

w2 <- select(filter(wenergy,year >2000),country,biofuel_electricity,
             coal_electricity,fossil_electricity,gas_electricity,
             hydro_electricity,nuclear_electricity,oil_electricity,
             other_renewable_electricity,other_renewable_exc_biofuel_electricity,renewables_electricity,solar_electricity)
colnames(w2)[1] <- "region"
w2 <- w2 %>% 
  mutate(region = ifelse(as.character(region) == "United States", "USA", as.character(region)))
w2 <- w2 %>% 
  mutate(region = ifelse(as.character(region) == "United Kingdom", "UK", as.character(region)))
w2 <- w2 %>% 
  mutate(region = ifelse(as.character(region) == "Democratic Republic of Congo", "Democratic Republic of the Congo", as.character(region)))
w3 <- subset(w2, select = -region)
w3$emax <- colnames(w3)[max.col(w3)]

w31 <- subset(w3, select = -emax)
w31$maxe <-do.call(pmax, c(w31, na.rm=TRUE))

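# Set the largest (primary) value in each row to zero so that the
# second-largest value becomes the row maximum.
source_cols <- setdiff(colnames(w31), "maxe")
for(j in source_cols){
  is_primary <- !is.na(w31[[j]]) & !is.na(w31$maxe) & w31[[j]] == w31$maxe
  w31[is_primary, j] <- 0
}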
   
#second highest
w32 <- subset(w31, select = -maxe)
w32$maxe2 <-do.call(pmax, c(w32, na.rm=TRUE))
w312 <- subset(w31, select = -maxe)
w32$e2max <- colnames(w31)[max.col(w312)]
w32 <- cbind(w2["region"], w32[])

w4 <- cbind(w2["region"], w3["emax"])
#view(w4)

#Secondary energy source
ggplot(data = w32, aes(x=e2max))+
  geom_bar(aes(fill = e2max))+ 
  ggtitle("Secondary source of energy since 2001")+
  theme(axis.text.x = element_text(angle = 90,hjust = 1))

We filtered the data for the secondary energy source by removing the primary source, since the primary source shows little variation across most of the results. We were interested in the type of secondary energy source and its production around the world, so we found the primary source in each row and set it to zero using a for loop. The bar chart shows the secondary source of energy around the world.
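
The step described above, finding the largest value in each row and then the second largest, can also be done without a loop by ranking the sources within each row. This is a sketch of that alternative, not the code used for the results. The data frame `production` and its values are hypothetical.

```r
# Hypothetical electricity production by source (TWh); values are illustrative only.
production <- data.frame(
  region = c("X", "Y", "Z"),
  coal_electricity = c(50, 5, NA),
  hydro_electricity = c(20, 30, 8),
  solar_electricity = c(10, 12, 3)
)

source_cols <- setdiff(names(production), "region")
values <- as.matrix(production[, source_cols])

# Rank the sources within each row, largest first; missing values go last.
ranked <- t(apply(values, 1, function(x) {
  order(x, decreasing = TRUE, na.last = TRUE)
}))

production$primary_source <- source_cols[ranked[, 1]]
production$secondary_source <- source_cols[ranked[, 2]]
print(production[, c("region", "primary_source", "secondary_source")])
```

If aggregate columns such as fossil_electricity or renewables_electricity are kept alongside their components, an aggregate is at least as large as each of its parts and will tend to rank first, so it may be worth ranking only the individual sources.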

Analysis 10 : Clustering of countries based on secondary sources

library(factoextra)
set.seed(123)
w33 <- subset(w32, select = -c(1,13,14))
# ?kmeans

numeric_round_func <- function(x){
  round(as.numeric(as.character(x)),2)
}
w34 <- w32
w34 <- w34 %>%
  mutate_at(vars(-one_of("region", "e2max")), numeric_round_func)


w35 <- w34%>%
  drop_na()

set.seed(1234)

# Cluster plot using kmeans
kmeans_b <- kmeans(w35[,2:12], centers = 11)
kmeans_table <- data.frame(kmeans_b$size, kmeans_b$centers)
kmeans_df <- data.frame(Cluster = kmeans_b$cluster, w35)
# first rows of the data frame with cluster assignments
head(kmeans_df)
# k-means on scaled data with 10 random starts
kmeans_f <- kmeans(scale(w35[,2:12]), 11 , nstart = 10)
# plotting clusters
fviz_cluster(kmeans_f, data = scale(w35[,2:12]), geom = c("point"),ellipse.type = "euclid")

# number of rows in each cluster, split by secondary source of electricity
ggplot(data = kmeans_df, aes(y = Cluster)) +
  geom_bar(aes(fill = e2max)) +
  ggtitle("Count of Clusters by Secondary Source of Electricity") +
  theme(plot.title = element_text(hjust = 0.5))

K-means clustering partitions the data into clusters by assigning each observation to the cluster with the nearest mean. We applied the K-means algorithm to the data frame for better analysis of the data. The head of the data frame shows that the secondary energy source of Afghanistan is renewables_electricity and that it belongs to cluster 4, depending on the amount in kWh produced. Here, K-means clustering was used to cluster the secondary energy source production in each country. Most of the data has low secondary source production, as most of the clusters sit toward the left-most end of the cluster plot. The cluster toward the right mostly comprises fossil electricity, which is the most produced electricity type. Clustering helps in analyzing the dataset in a way that sets outliers apart from the main clusters.

We carried out the clustering analysis using the factoextra library and functions such as kmeans and fviz_cluster. The bar chart visualizes the clusters and their counts by electricity type. Each bar shows the different electricity types contained in each cluster. Contrary to expectation, fossil electricity is still the second-highest source of energy in various countries.
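
The analysis above fixes the number of clusters at 11. One common heuristic for choosing that number is to compare the total within-cluster sum of squares across several values of k and look for the point where it stops dropping quickly. The sketch below illustrates the idea with base R's `kmeans` on simulated data. The matrix `toy` is made up and is not the energy dataset.

```r
set.seed(42)

# Hypothetical data: two well-separated groups of 20 points in two dimensions.
toy <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(40, mean = 5, sd = 0.5), ncol = 2)
)
toy_scaled <- scale(toy)

# Total within-cluster sum of squares for k = 1..6.
wss <- sapply(1:6, function(k) {
  kmeans(toy_scaled, centers = k, nstart = 10)$tot.withinss
})
print(round(wss, 2))

# Fit with the k suggested by the sharp drop between k = 1 and k = 2.
fit <- kmeans(toy_scaled, centers = 2, nstart = 10)
print(table(fit$cluster))
```

Scaling before clustering, as done here and in the `kmeans_f` fit above, keeps columns with large magnitudes from dominating the distance calculation.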

Conclusion

Based on our analysis, we can see that:

  • Fossil fuels are still the main energy source.
  • Countries with low GDP per capita have very small emissions.
  • Those that have access to energy produce greenhouse gas emissions that are too high.
  • The transition to more renewable energy sources has been slow.
  • More than three-quarters of those who do not have access to electricity live in Sub-Saharan Africa.
  • Fossil fuels and biomass kill many more people than nuclear and modern renewables per unit of electricity.
  • In the very poorest countries in the world, energy consumption is so low that it hardly registers.
  • Europe's renewable energy production is expected to rise sharply in future years to meet electricity needs.

References