The purpose of the project is to create an R Markdown document that presents collected data in visual form: maps, graphs and charts. We aim to explore the relationship between the different types of power plants and greenhouse gas emissions. We also use additional information for each of the countries we inspect, such as population, GDP and GDP per capita. With this information we will be able to, for example, relate a country’s economic status to the power plants it uses and its greenhouse gas emissions per capita.

1. Data

1.1. Data Description

In our project we use 5 different data sets. A detailed description of each set is provided below.

Global Power Plants data description:

This dataset is the Global Power Plant Database, an open-source database of power plants compiled by the World Resources Institute. It covers approximately 30,000 power plants from 164 countries and includes thermal plants (e.g. coal, gas, oil, nuclear, biomass, waste, geothermal) and renewables (e.g. hydro, wind, solar). Obtained from Kaggle.

Country: Country name

Powerplant Name: Name or title of the power plant, generally in Romanized form

gppd_idnr: 10- or 12-character identifier for the power plant

Capacity (MW): Electrical generating capacity in megawatts

Latitude: Geolocation in decimal degrees

Longitude: Geolocation in decimal degrees

Primary Fuel: Main energy source used in electricity generation or export

Owner: Majority shareholder of the power plant, generally in Romanized form

Source: Entity reporting the data; could be an organization, report, or document, generally in Romanized form

CO2 and GHG emission data description:

This dataset contains annual greenhouse gas emissions, measured in tons, for 232 countries since 1751. For the majority of the timeline there is naturally no data, so we will focus on modern times. The dataset consists of one time series per country, with no other variables. Obtained from Kaggle.

GDP, GDP per capita and population

We scraped the data on GDP, GDP per capita and population from websites:

Each of them contains a table listing the countries of the world and the corresponding values of the variables we were looking for.

1.2. Data Merging

# Importing data
emission_data       <- read.csv("emission_data.csv")
power_plant_data    <- read.csv("global_power_plant.csv")
gdp_per_capita_data <- read_excel("gdp_per_capita_data.xlsx")
gdp_data            <- read_excel("gdp_data.xlsx")
continents_data     <- read_excel("continents_regions.xlsx")

For the emission data we choose the years 2010-2017. Older records are incomplete for some countries, and we would like to focus on the most current data. We also keep only the countries that appear in all of our data sets.

# Keeping only emissions data for the years 2010-2017
emission_data <- emission_data[c("Country", "X2010", "X2011", "X2012", "X2013", "X2014", "X2015",
                                 "X2016", "X2017")]

# Creating a list of countries that are common for all data sets
common_countries <- Reduce(intersect, list(emission_data$Country,
                                           gdp_data$Country,
                                           gdp_per_capita_data$Country,
                                           unique(power_plant_data$Country),
                                           continents_data$Country))

# Dropping the observations for the countries outside of the common_countries list
emission_data <- emission_data[emission_data$Country %in% common_countries,]
gdp_data <- gdp_data[gdp_data$Country %in% common_countries,]
gdp_per_capita_data <- gdp_per_capita_data[gdp_per_capita_data$Country %in% common_countries,]
power_plant_data <- power_plant_data[power_plant_data$Country %in% common_countries,]
continents_data <- continents_data[continents_data$Country %in% common_countries,]
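The filtering step can be illustrated on toy data. This is only a sketch: the three vectors below are made up and are not taken from our data sets. Reduce(intersect, ...) keeps the names present in every vector, and setdiff() shows which names a given source loses, which helps to spot spelling mismatches or aggregate rows.

```r
# Toy country vectors (hypothetical values, for illustration only)
emission_countries <- c("Poland", "France", "United States", "World")
gdp_countries      <- c("Poland", "France", "USA")
plant_countries    <- c("France", "Poland", "United States")

# Names present in every vector
common <- Reduce(intersect, list(emission_countries, gdp_countries, plant_countries))

# Names each source loses because another source lacks them or spells them differently
dropped_emission <- setdiff(emission_countries, common)
dropped_gdp      <- setdiff(gdp_countries, common)

print(common)
print(dropped_emission)
print(dropped_gdp)
```

Here "United States" and "USA" do not match, so the country disappears from the common list even though every source contains it.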

The next step was to check if all our data frames are of the appropriate type:

# str(emission_data)       : dataframe
# str(gdp_data)            : tibble
# str(gdp_per_capita_data) : tibble
# str(power_plant_data)    : dataframe
# str(continents_data)     : tibble

# Transforming tibbles into dataframes
gdp_data <- as.data.frame(gdp_data)
gdp_per_capita_data <- as.data.frame(gdp_per_capita_data)
continents_data <- as.data.frame(continents_data)

Now we will merge some of the sets into bigger ones to make plotting the chosen information simpler.

# Ordering alphabetically by country
emission_data <- emission_data[order(emission_data$Country),]
gdp_data <- gdp_data[order(gdp_data$Country),]
gdp_per_capita_data <- gdp_per_capita_data[order(gdp_per_capita_data$Country),]

# Merging the data sets about GDP and emission
gdp <- merge(x = gdp_data, y = gdp_per_capita_data, by = "Country", all = TRUE)
continents_gdp <- merge(x = continents_data, y = gdp, by = "Country", all = TRUE)
total <- merge(x = continents_gdp, y = emission_data, by = "Country", all = TRUE)

# Dropping unneeded columns by position (indices refer to the data frame as it stands at each step)
total[6] <- NULL
total[8] <- NULL
power_plant_data[3] <- NULL

# Changing column names
colnames(total) <- c("Country", "ISO_code", "Region", "Continent","GDP_nominal", "GDP_growth",
                     "Population_2017", "GDP_worldshare", "GDP_pc_PPP", "GDP_pc_nominal",
                     "GDP_vs_world", "Emission_2010","Emission_2011","Emission_2012",
                     "Emission_2013","Emission_2014", "Emission_2015", "Emission_2016",
                     "Emission_2017")

colnames(power_plant_data) <- c("Country", "Name", "Capacity", "Latitude", "Longitude", "Primary_fuel",
                                "Owner", "Source")

# Merging all data sets into one long one
total_powerplant <- merge(x = power_plant_data, y = total, by = "Country", all = TRUE)
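Because the columns above are dropped by position, each removal shifts the columns that follow it. The sketch below uses a made-up data frame whose column names record their original positions to show the effect, and compares it with dropping by name (the names c1 to c9 are purely illustrative).

```r
# Toy data frame with nine columns named after their original positions
df <- as.data.frame(matrix(0, nrow = 1, ncol = 9))
colnames(df) <- paste0("c", 1:9)

df[6] <- NULL   # removes c6; c7, c8 and c9 shift to positions 6, 7 and 8
df[8] <- NULL   # position 8 now holds c9, so c9 is removed (not c8)

remaining <- colnames(df)
print(remaining)

# Selecting by name gives the same result without index arithmetic
df2 <- as.data.frame(matrix(0, nrow = 1, ncol = 9))
colnames(df2) <- paste0("c", 1:9)
df2 <- df2[setdiff(colnames(df2), c("c6", "c9"))]
print(colnames(df2))
```

Dropping by name is less fragile if the layout of a source file changes.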

Finally, for visualization we will use two data sets: total, which contains the information about the countries, and total_powerplant, which contains information about both the power plants and the countries.

str(total)

## 'data.frame':    151 obs. of  19 variables:
##  $ Country        : chr  "Afghanistan" "Albania" "Algeria" "Angola" ...
##  $ ISO_code       : chr  "AFG" "ALB" "DZA" "AGO" ...
##  $ Region         : chr  "Southern Asia" "Southern Europe" "Northern Africa" "Middle Africa" ...
##  $ Continent      : chr  "Asia" "Europe" "Africa" "Africa" ...
##  $ GDP_nominal    : num  1.95e+10 1.30e+10 1.68e+11 1.22e+11 6.37e+11 ...
##  $ GDP_growth     : num  0.0267 0.0384 0.016 -0.0015 0.0285 0.075 0.0196 0.0304 0.001 0.0388 ...
##  $ Population_2017: num  36296113 2884169 41389189 29816766 43937140 ...
##  $ GDP_worldshare : num  0.0002 0.0002 0.0021 0.0015 0.0079 0.0001 0.0164 0.0052 0.0005 0.0004 ...
##  $ GDP_pc_PPP     : num  1976 12943 15293 6658 20829 ...
##  $ GDP_pc_nominal : num  538 4521 4048 4096 14508 ...
##  $ GDP_vs_world   : num  0.12 0.76 0.89 0.39 1.22 0.57 2.89 3.15 1.02 2.79 ...
##  $ Emission_2010  : num  1.00e+08 2.38e+08 3.13e+09 3.88e+08 6.59e+09 ...
##  $ Emission_2011  : num  1.13e+08 2.43e+08 3.25e+09 4.18e+08 6.78e+09 ...
##  $ Emission_2012  : num  1.23e+08 2.48e+08 3.38e+09 4.51e+08 6.97e+09 ...
##  $ Emission_2013  : num  1.33e+08 2.53e+08 3.51e+09 4.84e+08 7.16e+09 ...
##  $ Emission_2014  : num  1.43e+08 2.59e+08 3.66e+09 5.18e+08 7.36e+09 ...
##  $ Emission_2015  : num  1.53e+08 2.65e+08 3.81e+09 5.53e+08 7.57e+09 ...
##  $ Emission_2016  : num  1.65e+08 2.71e+08 3.96e+09 5.88e+08 7.77e+09 ...
##  $ Emission_2017  : num  1.79e+08 2.77e+08 4.11e+09 6.24e+08 7.98e+09 ...

2. Visualizations

2.1. Types of Fuel

First, let’s analyze the power plant data set, more precisely which types of fuel are used around the world.

table(power_plant_data$Primary_fuel)

## 
##        Biomass           Coal   Cogeneration            Gas     Geothermal 
##           1395           2365             41           3861            189 
##          Hydro        Nuclear            Oil          Other        Petcoke 
##           7100            195           2275             44             13 
##          Solar        Storage          Waste Wave and Tidal           Wind 
##           5929             58           1087             10           5180

As we can see, there are 15 types of fuel, and there are no NA values in that column. The most frequent primary fuel is Hydro, meaning that the most common energy source among the listed plants is water.
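Note that table() omits missing values by default, so a frequency table on its own does not show whether a column contains NA. A minimal sketch of how this can be checked, using a made-up fuel vector rather than our data:

```r
# Toy fuel column with one missing value (illustrative only)
fuel <- c("Hydro", "Solar", NA, "Hydro", "Wind")

counts_default <- table(fuel)                   # NA is silently dropped
counts_with_na <- table(fuel, useNA = "ifany")  # NA gets its own cell
n_missing      <- sum(is.na(fuel))              # direct count of missing values

print(counts_default)
print(counts_with_na)
print(n_missing)
```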

Pie Chart

First, a data frame containing the frequency of every fuel type needs to be created. We transform the table into a data frame and arrange it by frequency. The next step is to group the 4 least frequent fuel types into the existing category Other. To draw the pointer lines that connect each slice to its label, a separate variable is created: pos holds the position along the circumference of the pie chart at which the tip of each label pointer is placed.

# Checking frequency of every type of fuel
fuel_types <- as.data.frame(table(total_powerplant$Primary_fuel))
colnames(fuel_types) <- c("Fuel", "Frequency")
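# Note: [-1,] drops the first row of the frequency table, i.e. the alphabetically first fuel (Biomass)
fuel_types <- fuel_types[-1,] %>% arrange(-Frequency)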

# Merging 4 types of fuel with the smallest frequencies with "Other"
fuels <- fuel_types
fuels$Fuel[c(10, 12, 13, 14)] <- "Other"
fuels <- fuels %>% group_by(Fuel) %>% summarise(Frequency = sum(Frequency)) %>% arrange(-Frequency)
fuels$Fuel <- factor(fuels$Fuel, levels = rev(as.character(fuels$Fuel)))

# Calculating the position of the label pointer
fuels$pos <- (cumsum(c(0, fuels$Frequency)) + c(fuels$Frequency / 2, .01))[1:nrow(fuels)]
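The expression for pos places each label at the midpoint of its slice on the cumulative scale. The sketch below only checks this arithmetic on three made-up frequencies; it is not part of the plotting code.

```r
# Toy slice sizes (illustrative only)
freq <- c(50, 30, 20)

# Same formula as above: start of each slice plus half of its size
pos <- (cumsum(c(0, freq)) + c(freq / 2, .01))[1:length(freq)]

# Equivalent formulation: end of each slice minus half of its size
pos_alt <- cumsum(freq) - freq / 2

print(pos)
print(pos_alt)
```

The slices cover 0-50, 50-80 and 80-100, so the midpoints are 25, 65 and 90. The trailing .01 only pads the helper vector to the right length and is discarded by the final subsetting.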

Pie Chart

# Plotting world share of every type of fuel
ggplot(fuels, aes(x = "", y = Frequency, fill = Fuel)) +
       geom_bar(stat = "identity", width = 1, color = "white") +
       geom_text_repel(aes(x = 1.4,
                           y = pos, label = paste0(round((Frequency/sum(Frequency)*100), 2), " %")), 
                       nudge_x = .3, 
                       segment.size = .7) +
       guides(fill = guide_legend(reverse = TRUE)) +
       labs(x = NULL, 
            y = NULL, 
            title = "Pie Chart", 
            subtitle = "Percentage share of powerplants by fuel\n") +
       scale_fill_brewer(palette = "Paired") +
       coord_polar(theta = "y") +
       theme_void()

Table

# Table with the fuel frequencies
kable(fuel_types)

Fuel	Frequency
Hydro	7100
Solar	5929
Wind	5180
Gas	3861
Coal	2365
Oil	2275
Waste	1087
Nuclear	195
Geothermal	189
Storage	58
Other	44
Cogeneration	41
Petcoke	13
Wave and Tidal	10

As can be observed on the pie chart, the 3 most common fuel types are Hydro, Solar and Wind. This is a positive observation: measured by the number of power plants, renewable sources dominate, which suggests that the world has invested in renewable energy.

Treemap Chart

The treemap chart presents similar information to the pie chart, but for every continent. We grouped the fuel types by continent and displayed them in two ways. On the first chart, the size of the main boxes represents total emissions, so the continent with the biggest emissions has the biggest rectangle. The second chart sizes the rectangles according to the total number of power plants located on that continent.

# Creating data frames for merging with types of fuel and emission
treemapset_fuel <- total_powerplant %>% select(Continent, Primary_fuel) %>%
                                        count(Continent, Primary_fuel)
treemapset_emission <- total %>% group_by(Continent) %>% summarise(Total_emission = sum(Emission_2017))

# Merging two data frames into one
treemapset <- merge(treemapset_fuel, treemapset_emission, by = "Continent", all = TRUE)

# Counting the number of powerplants per continent
treemapset_count <- treemapset %>% group_by(Continent) %>% summarise(No_power_plants = sum(n))

# Merging with previously created data frame
treemapset <- merge(treemapset, treemapset_count, by = "Continent", all = TRUE)
treemapset$Part_emission <- (treemapset$n / treemapset$No_power_plants) * treemapset$Total_emission
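Part_emission splits each continent's total emissions across fuel types in proportion to the number of plants. It is therefore an allocation by plant count, not a measurement of emissions per fuel. A small sketch on made-up numbers shows that the parts add back up to the continent totals:

```r
# Toy counts and continent totals (illustrative only)
toy <- data.frame(Continent      = c("A", "A", "B"),
                  Primary_fuel   = c("Coal", "Hydro", "Coal"),
                  n              = c(30, 10, 5),
                  Total_emission = c(400, 400, 100))

# Number of plants per continent, repeated on every row of that continent
toy$No_power_plants <- ave(toy$n, toy$Continent, FUN = sum)

# Share of plants times the continent total
toy$Part_emission <- (toy$n / toy$No_power_plants) * toy$Total_emission

# Summing the parts recovers the continent totals
check <- tapply(toy$Part_emission, toy$Continent, sum)
print(toy)
print(check)
```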

Emission

# Treemap with the size of boxes representing emissions
treemap1 <- treemap(treemapset,
                    index = c("Continent", "Primary_fuel"),
                    vSize = "Part_emission",
                    type = "index",
                    palette = "Set2",
                    border.col = "darkslategrey",
                    fontcolor.labels = "white",
                    title = "Types of fuel")

Count

# Treemap with the size of boxes representing the number of powerplants
treemap2 <- treemap(treemapset,
                   index = c("Continent", "Primary_fuel"),
                   vSize = "n",
                   type = "index",
                   palette = "Set2",
                   border.col = "darkslategrey",
                   fontcolor.labels = "white",
                   title = "Types of fuel")

We can see that, when it comes to emissions, the leaders are Europe and North America. An interesting observation concerns Asia: when we compare its share of emissions with its share of the number of power plants, we note that it has significantly fewer power plants than Europe or North America. Taking a closer look at the split of primary fuels in Asia, we can see that the main source of energy alongside water is coal. In Europe, the 3 major energy sources are water, wind and the Sun.

2.2. Map of Powerplants

Below, we created a map of power plants that displays their geographical coordinates. Every point on the map is labeled, and if you click on it, detailed information about that particular power plant is displayed.

Leaflet Map

# Leaflet map presenting the location of powerplants
map <- leaflet(power_plant_data, options = leafletOptions(minZoom = 2, maxZoom = 12)) %>%
       addTiles() %>% setView(lng = 2.34, lat = 48.85, zoom = 5) %>% addProviderTiles("CartoDB.Voyager") %>%
       addCircleMarkers(~Longitude, ~Latitude,
                        label = ~htmlEscape(Name), color = "#ADC70A",
                        fillOpacity = 0.6, stroke = FALSE,
                        popup = ~paste0("<strong>Name: </strong>", power_plant_data$Name, "<br/>",
                                        "<strong>Country: </strong>", power_plant_data$Country, "<br/>",
                                        "<strong>Capacity: </strong>", power_plant_data$Capacity, " MW", "<br/>",
                                        "<strong>Primary Fuel: </strong>", power_plant_data$Primary_fuel),
                        popupOptions = popupOptions(closeButton = FALSE),
                        clusterOptions = markerClusterOptions(
                                         iconCreateFunction = JS("function (cluster) {    
                                                            var childCount = cluster.getChildCount();
                                                            var c;
                                                            if (childCount < 60) {  
                                                              c = 'rgba(178, 223, 138, 0.8);'
                                                            } else if (childCount < 260) {  
                                                              c = 'rgba(245, 131, 131, 0.8);'  
                                                            } else if (childCount < 460) {  
                                                              c = 'rgba(166, 206, 227, 0.8);'  
                                                            } else { 
                                                              c = 'rgba(253, 191, 111, 0.8);'  
                                                            }    
                                                            return new L.DivIcon({ 
                                                            html: '<div style=\"background-color:'+ c +
                                                            '\"><span>' + childCount + '</span></div>',
                                                            className: 'marker-cluster',
                                                            iconSize: new L.Point(40, 40) });
                                                            }")))
  map

2.3. Total GHG Emission

Next, we present the change in emissions between 2010 and 2017 by region. Every country on our list is assigned a continent and a region name. The change is shown on the dumbbell chart. The labels on the right display the percentage change for the given region. Color indicates the continent.

Dumbbell Chart

# Grouping the data
dumbbellset <- total %>% group_by(Region) %>% 
                         summarise(Continent = unique(Continent),
                                   Emission_2010 = sum(Emission_2010),
                                   Emission_2017 = sum(Emission_2017),
                                   Change = (Emission_2017 - Emission_2010) / Emission_2010) %>%
                         arrange(-Emission_2017)

# Creating the dumbbell plot
dumbbelplot <- ggplot(dumbbellset,
                      aes(x = Emission_2010, xend = Emission_2017,
                          y = reorder(Region, Emission_2017), group = Region, colour = Continent)) + 
                      geom_dumbbell(size = 2, size_x = 3, size_xend = 3) + 
                      scale_colour_brewer(palette = "Set2") +
                      geom_text(aes(x = Emission_2017,
                                    y = Region,
                                    label = scales::percent(Change, accuracy = 1L)),
                                    color = "darkslategrey",
                                    hjust = -0.3,
                                    size = 4,
                                    nudge_x = 10) +
                      # Breaks are given in tons; the labels show the same values in billions of tons
                      scale_x_continuous(breaks = seq(0,600000000000,by=150000000000),
                                         labels = format(seq(0,600,by=150))) +
                      labs(x = "Emission per year (billion t)", 
                           y = NULL, 
                           title = "Dumbbell Chart", 
                           subtitle = "Change in GHG emissions: 2010 vs 2017") +
                      guides(colour = guide_legend(override.aes = list(shape = 16, alpha = 1, size = 3.5))) +
                      theme_minimal() +
                      theme(axis.text.x = element_text(size = 12),
                            axis.text.y = element_text(size = 12),
                            axis.title.x = element_text(size = 14),
                            plot.title = element_text(size = 18),
                            plot.subtitle = element_text(size = 14),
                            legend.text = element_text(size = 12),
                            legend.title = element_text(size = 12))

dumbbelplot
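The Change column is the relative change of the regional totals, which is not the same as the average of the country-level changes. A base R sketch on made-up numbers (not our data) illustrates the calculation and shows why a region with a small base can have the largest percentage change while its absolute change stays small:

```r
# Toy country-level emissions (illustrative only)
toy <- data.frame(Region        = c("R1", "R1", "R2"),
                  Emission_2010 = c(100, 300, 50),
                  Emission_2017 = c(150, 330, 100))

# Sum emissions per region, then compute the relative change of the sums
sums <- aggregate(cbind(Emission_2010, Emission_2017) ~ Region, data = toy, FUN = sum)
sums$Absolute <- sums$Emission_2017 - sums$Emission_2010
sums$Change   <- sums$Absolute / sums$Emission_2010
sums$Label    <- sprintf("%.0f%%", 100 * sums$Change)

# For comparison: the mean of the country-level changes in R1
country_change     <- (toy$Emission_2017 - toy$Emission_2010) / toy$Emission_2010
mean_of_changes_r1 <- mean(country_change[toy$Region == "R1"])

print(sums)
print(mean_of_changes_r1)
```

In this toy example R2 doubles its emissions (100%), while R1 grows by only 20% but adds more in absolute terms (80 against 50).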

The biggest increase in tons of GHG emissions was observed for Eastern Asia; however, in percentage terms the highest increase was in fact in Middle Africa. Compared to the emissions of Northern America, Europe and Asia, this increase is still barely a dot on the plot. Even though the change was not significant in absolute terms, it is still a negative phenomenon.

2.4. GDP PPP vs nominal

On the bar plot we present the difference between nominal and PPP GDP per capita for European countries. We divided them into regions: Northern, Eastern, Southern and Western Europe. The dark red dashed horizontal line indicates the world average PPP value (17,000 $).

Bar Plot

# Separating European countries
europe <- total %>% filter(Continent == "Europe") %>% select(Country, Region, GDP_pc_PPP, GDP_pc_nominal)
europe$Country <- factor(europe$Country)
colnames(europe) <- c("Country", "Region", "PPP", "nominal")

# Transforming into long format
europe_long <- gather(europe, GDP, Value, PPP:nominal, factor_key = TRUE)
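For readers unfamiliar with the long format, the sketch below reproduces the shape of the reshaped table in base R on two made-up countries. It is not the code used in the report, only an illustration of what the gather() call above is expected to return: one row per country and GDP type, with the PPP rows first.

```r
# Toy wide table (illustrative values only)
europe_toy <- data.frame(Country = c("X", "Y"),
                         Region  = c("Eastern Europe", "Eastern Europe"),
                         PPP     = c(100, 200),
                         nominal = c(40, 90))

# Stack the two value columns and record which column each value came from
long_toy <- data.frame(
  Country = rep(europe_toy$Country, times = 2),
  Region  = rep(europe_toy$Region, times = 2),
  GDP     = factor(rep(c("PPP", "nominal"), each = nrow(europe_toy)),
                   levels = c("PPP", "nominal")),
  Value   = c(europe_toy$PPP, europe_toy$nominal)
)

print(long_toy)
```

Keeping GDP as a factor with the levels in column order mirrors factor_key = TRUE and fixes the order of the bars within each country.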

Eastern Europe

# Bar plot for Eastern Europe
europe_long %>% filter(Region == "Eastern Europe") %>%
   ggplot(aes(x = reorder(Country, Value), y = Value, fill = GDP)) +
   geom_bar(stat = "identity", position = position_dodge(), colour = "grey30") +
   geom_hline(yintercept = 17000, linetype = "dashed", color = "darkred", size = 1) +
   geom_text(aes(x = "Moldova", y = 17000, label = "World Average PPP"),
             color = "darkred", hjust = 0.1, vjust = -1) +
   labs(x = NULL, 
        y = NULL, 
        title = "Bar Plot", 
        subtitle = "GDP per capita PPP vs GDP per capita nominal") +
   scale_fill_brewer(palette = "Paired") +
   theme_bw() +
   theme(axis.text.x = element_text(size = 11, angle = 90, vjust = 0.2, hjust = 0.85),
         axis.text.y = element_text(size = 10)) +
   scale_y_continuous(breaks = c(5000, 10000, 17000, 20000, 25000, 30000, 35000, 40000),
                      labels = c("5,000 $", "10,000 $", "17,000 $", "20,000 $",
                                 "25,000 $", "30,000 $", "35,000 $", "40,000 $"))

Northern Europe

# Bar plot for Northern Europe
europe_long %>% filter(Region == "Northern Europe") %>%
   ggplot(aes(x = reorder(Country, Value), y = Value, fill = GDP)) +
   geom_bar(stat = "identity", position = position_dodge(), colour = "grey30") +
   geom_hline(yintercept = 17000, linetype = "dashed", color = "darkred", size = 1) +
   geom_text(aes(x = "Latvia", y = 17000, label = "World Average PPP"),
             color = "darkred", hjust = 0.1, vjust = -1) +
   labs(x = NULL, 
        y = NULL, 
        title = "Bar Plot", 
        subtitle = "GDP per capita PPP vs GDP per capita nominal") +
   scale_fill_brewer(palette = "Paired") +
   theme_bw() +
   theme(axis.text.x = element_text(size = 11, angle = 90, vjust = 0.2, hjust = 0.85),
         axis.text.y = element_text(size = 10)) +
   scale_y_continuous(breaks = c(17000, 30000, 40000, 50000, 60000, 70000, 80000),
                      labels = c("17,000 $", "30,000 $", "40,000 $", "50,000 $",
                                 "60,000 $", "70,000 $", "80,000 $"))

Southern Europe

# Bar plot for Southern Europe
europe_long %>% filter(Region == "Southern Europe") %>%
   ggplot(aes(x = reorder(Country, Value), y = Value, fill = GDP)) +
   geom_bar(stat = "identity", position = position_dodge(), colour = "grey30") +
   geom_hline(yintercept = 17000, linetype = "dashed", color = "darkred", size = 1) +
   geom_text(aes(x = "Albania", y = 17000, label = "World Average PPP"),
             color = "darkred", hjust = 0.1, vjust = -1) +
   labs(x = NULL, 
        y = NULL, 
        title = "Bar Plot", 
        subtitle = "GDP per capita PPP vs GDP per capita nominal") +
   scale_fill_brewer(palette = "Paired") +
   theme_bw() +
   theme(axis.text.x = element_text(size = 11, angle = 90, vjust = 0.2, hjust = 0.85),
         axis.text.y = element_text(size = 10)) +
   scale_y_continuous(breaks = c(5000, 10000, 17000, 20000, 25000, 30000, 35000, 40000),
                      labels = c("5,000 $", "10,000 $", "17,000 $", "20,000 $",
                                 "25,000 $", "30,000 $", "35,000 $", "40,000 $"))

Western Europe

# Bar plot for Western Europe
europe_long %>% filter(Region == "Western Europe") %>%
   ggplot(aes(x = reorder(Country, Value), y = Value, fill = GDP)) +
   geom_bar(stat = "identity", position = position_dodge(), colour = "grey30") +
   geom_hline(yintercept = 17000, linetype = "dashed", color = "darkred", size = 1) +
   geom_text(aes(x = "France", y = 17000, label = "World Average PPP"),
             color = "darkred", hjust = 0.1, vjust = -1) +
   labs(x = NULL, 
        y = NULL, 
        title = "Bar Plot", 
        subtitle = "GDP per capita PPP vs GDP per capita nominal") +
   scale_fill_brewer(palette = "Paired") +
   theme_bw() +
   theme(axis.text.x = element_text(size = 11, angle = 90, vjust = 0.2, hjust = 0.85),
         axis.text.y = element_text(size = 10)) +
   scale_y_continuous(breaks = c(17000, 30000, 45000, 60000, 75000, 90000, 105000),
                      labels = c("17,000 $", "30,000 $","45,000 $", "60,000 $", "75,000 $", "90,000 $", "105,000 $"))

Unsurprisingly, the countries with the highest GDP per capita are located in Western and Northern Europe. For all the countries in Western Europe, both nominal and PPP GDP per capita are higher than the world average. In Southern Europe the poorest countries are Albania, Serbia, Macedonia, Montenegro, Croatia and Bosnia and Herzegovina. All of these except Albania belonged to Yugoslavia, and their low GDP may in part be a consequence of that. The poorest countries overall are located in Eastern Europe: Moldova, Ukraine and Belarus. For Poland, GDP per capita PPP is higher than the world average, but the nominal value is slightly below it.

2.5. GDP vs World Average

In this visualization we present every country’s GDP per capita (PPP) as a percentage of the world average. The “zero” line is set at 100%, meaning that GDP per capita at this point equals the world average of 17,000 $. The data was split by continent.

Diverging Bar Plot

european_countries <- total %>% filter(Continent == "Europe") %>% mutate(GDP_vs_world = round(GDP_vs_world, 2) - 1) %>% arrange(-GDP_vs_world)

asian_countries <- total %>% filter(Continent == "Asia") %>% mutate(GDP_vs_world = round(GDP_vs_world, 2) - 1) %>% arrange(-GDP_vs_world)

african_countries <- total %>% filter(Continent == "Africa") %>% mutate(GDP_vs_world = round(GDP_vs_world, 2) - 1) %>% arrange(-GDP_vs_world)

american_countries <- total %>% filter(Continent == "North America" | Continent == "South America") %>% mutate(GDP_vs_world = round(GDP_vs_world, 2) - 1) %>% arrange(-GDP_vs_world)

oceanian_countries <- total %>% filter(Continent == "Oceania") %>% mutate(GDP_vs_world = round(GDP_vs_world, 2) - 1) %>% arrange(-GDP_vs_world)
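Each of the data frames above applies the same transformation: the ratio to the world average is shifted by 1 so that the world average sits at zero, and the rows are sorted in descending order. The sketch below expresses this once as a helper and applies it per continent on made-up data. The names center_on_average and to_label are hypothetical and do not appear in the report; note also that split() produces one element per continent, whereas the report combines North and South America.

```r
# Toy data (illustrative values only)
toy <- data.frame(Country      = c("A", "B", "C"),
                  Continent    = c("Europe", "Europe", "Asia"),
                  GDP_vs_world = c(1.22, 0.39, 2.894))

# Hypothetical helper: center the ratio on the world average and sort descending
center_on_average <- function(df) {
  df$GDP_vs_world <- round(df$GDP_vs_world, 2) - 1
  df[order(-df$GDP_vs_world), ]
}

# One centered data frame per continent
by_continent <- lapply(split(toy, toy$Continent), center_on_average)

# Hypothetical helper: a centered value v corresponds to (v + 1) * 100 percent
to_label <- function(v) sprintf("%.0f %%", (v + 1) * 100)

print(by_continent$Europe)
print(to_label(c(-1, 0, 1)))
```

A centered value of 0.22 therefore means 22% above the world average, and -1 corresponds to 0% of it, which is how the axis labels in the plots below are assigned.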

Europe

# Diverging bar chart for Europe
ggplot(european_countries, aes(x = reorder(Country, GDP_vs_world),
                               y = GDP_vs_world, label = GDP_vs_world)) + 
       geom_bar(stat = 'identity', aes(fill = GDP_vs_world), width = 0.9) +
       scale_fill_gradientn(name = "% of World Average",
                            breaks = c(-1, 0, 1, 2, 3, 4, 5, 6, 7),
                            labels = c("0", "100", "200", "300", "400", "500", "600", "700", "800"),
                            colours = c("indianred4", "white", "olivedrab"),
                            values = scales::rescale(c(-1, -0.1, 0, 0.1, 7)),
                            limits = c(-1, 7)) +
       geom_hline(yintercept = 0) +
       labs(x = NULL, 
            y = "GDP (Percent of World Average)", 
            title = "Europe", 
            subtitle = "GDP per capita PPP vs World Average") +
       guides(color = guide_legend(title = "World Average Percent")) +
       coord_flip() + 
       theme_minimal() +
       scale_y_continuous(breaks = c(0, 1, 2, 3, 4, 5, 6, 7),
                          labels = c("100 %", "200 %", "300 %", "400 %", "500 %", "600 %", "700 %", "800 %")) +
       theme(axis.text.x = element_text(size = 12),
                            axis.text.y = element_text(size = 12),
                            axis.title.x = element_text(size = 14),
                            plot.title = element_text(size = 18),
                            plot.subtitle = element_text(size = 14),
                            legend.text = element_text(size = 12),
                            legend.title = element_text(size = 12))

Asia

# Diverging bar chart for Asia
ggplot(asian_countries, aes(x = reorder(Country, GDP_vs_world),
                            y = GDP_vs_world, label = GDP_vs_world)) + 
       geom_bar(stat = 'identity', aes(fill = GDP_vs_world), width = 0.9) +
       scale_fill_gradientn(name = "% of World Average",
                            breaks = c(-1, 0, 1, 2, 3, 4, 5, 6, 7),
                            labels = c("0", "100", "200", "300", "400", "500", "600", "700", "800"),
                            colours = c("indianred4", "white", "olivedrab"),
                            values = scales::rescale(c(-1, -0.1, 0, 0.1, 7)),
                            limits = c(-1, 7)) +
       geom_hline(yintercept = 0) +
       labs(x = NULL, 
            y = "GDP (Percent of World Average)", 
            title = "Asia", 
            subtitle = "GDP per capita PPP vs World Average") +
       guides(color = guide_legend(title = "World Average Percent")) +
       coord_flip() + 
       theme_minimal() +
       scale_y_continuous(breaks = c(-1, 0, 1, 2, 3, 4, 5, 6, 7),
                          labels = c("0", "100 %", "200 %", "300 %", "400 %", "500 %", "600 %", "700 %", "800 %")) +
       theme(axis.text.x = element_text(size = 12),
                            axis.text.y = element_text(size = 12),
                            axis.title.x = element_text(size = 14),
                            plot.title = element_text(size = 18),
                            plot.subtitle = element_text(size = 14),
                            legend.text = element_text(size = 12),
                            legend.title = element_text(size = 12))

Africa

# Diverging bar chart for Africa
ggplot(african_countries, aes(x = reorder(Country, GDP_vs_world),
                              y = GDP_vs_world, label = GDP_vs_world)) + 
       geom_bar(stat = 'identity', aes(fill = GDP_vs_world), width = 0.9) +
       scale_fill_gradientn(name = "% of World Average",
                            breaks = c(-1, -0.5, 0, 0.5, 1),
                            labels = c("0", "50", "100", "150", "200"),
                            colours = c("indianred4", "white", "olivedrab"),
                            values = scales::rescale(c(-1, -0.1, 0, 0.1, 1)),
                            limits = c(-1, 1)) +
       geom_hline(yintercept = 0) +
       labs(x = NULL, 
            y = "GDP (Percent of World Average)", 
            title = "Africa", 
            subtitle = "GDP per capita PPP vs World Average") +
       guides(color = guide_legend(title = "World Average Percent")) +
       coord_flip() + 
       theme_minimal() +
       scale_y_continuous(breaks = c(-1, -0.75, -0.5, -0.25, 0, 0.25, 0.5),
                          labels = c("0", "25 %", "50 %", "75 %", "100 %", "125 %", "150 %")) +
       theme(axis.text.x = element_text(size = 12),
                            axis.text.y = element_text(size = 12),
                            axis.title.x = element_text(size = 14),
                            plot.title = element_text(size = 18),
                            plot.subtitle = element_text(size = 14),
                            legend.text = element_text(size = 12),
                            legend.title = element_text(size = 12))

Americas

# Diverging bar chart for North and South America
ggplot(american_countries, aes(x = reorder(Country, GDP_vs_world),
                               y = GDP_vs_world, label = GDP_vs_world)) + 
       geom_bar(stat = 'identity', aes(fill = GDP_vs_world), width = 0.9) +
       scale_fill_gradientn(name = "% of World Average",
                            breaks = c(-1, 0, 1, 2, 3),
                            labels = c("0", "100", "200", "300", "400"),
                            colours = c("indianred4", "white", "olivedrab"),
                            values = scales::rescale(c(-1, -0.1, 0, 0.1, 3)),
                            limits = c(-1, 3)) +
       geom_hline(yintercept = 0) +
       labs(x = NULL, 
            y = "GDP (Percent of World Average)", 
            title = "North and South America", 
            subtitle = "GDP per capita PPP vs World Average") +
       guides(color = guide_legend(title = "World Average Percent")) +
       coord_flip() + 
       theme_minimal() +
       scale_y_continuous(breaks = c(-1, 0, 1, 2, 3),
                          labels = c("0", "100 %", "200 %", "300 %", "400 %"))

Oceania

# Diverging bar chart for Oceania
ggplot(oceanian_countries, aes(x = reorder(Country, GDP_vs_world),
                               y = GDP_vs_world, label = GDP_vs_world)) + 
       geom_bar(stat = 'identity', aes(fill = GDP_vs_world), width = 0.9) +
       scale_fill_gradientn(name = "% of World Average",
                            breaks = c(-1, 0, 1, 2, 3),
                            labels = c("0", "100", "200", "300", "400"),
                            colours = c("indianred4", "white", "olivedrab"),
                            values = scales::rescale(c(-1, -0.1, 0, 0.1, 3)),
                            limits = c(-1, 3)) +
       geom_hline(yintercept = 0) +
       labs(x = NULL, 
            y = "GDP per capita PPP (% of world average)", 
            title = "Australia and Oceania", 
            subtitle = "GDP per capita PPP vs World Average") +
       coord_flip() + 
       theme_minimal() +
       scale_y_continuous(breaks = c(-1, 0, 1, 2, 3),
                          labels = c("0", "100 %", "200 %", "300 %", "400 %"))

The poorest continent is clearly Africa: the GDP per capita of the great majority of its countries is far below the world average. Asia is the most diverse continent, with a very wide gap between its richest and its poorest countries.

2.6. Countries vs Total Capacity

These diverging lollipop charts show how strongly a few countries dominate total power capacity. Compared with the world average of total capacity per country, most countries fall well below it. This happens because the average is pulled upward by outliers such as Russia, the US, China and India.
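The pull that a few very large countries exert on the average can be illustrated with a small sketch. The capacities below are made-up values for six hypothetical countries, not figures from the dataset; the point is only to show why most observations end up below the mean while the median stays in the middle.

```r
# Made-up total capacities in GW for six hypothetical countries (not from the dataset)
capacity_gw <- c(A = 2, B = 5, C = 8, D = 12, E = 20, F = 1200)

mean_gw   <- mean(capacity_gw)    # pulled upward by the single very large value
median_gw <- median(capacity_gw)  # barely affected by that value

# Number of countries that fall below each reference value
below_mean   <- sum(capacity_gw < mean_gw)
below_median <- sum(capacity_gw < median_gw)
```

In this toy example five of the six countries lie below the mean but only three lie below the median, which is the same pattern the charts show against the world average.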

Diverging Lollipop Chart

# Attach each country's continent to the power plant records.
# merge() joins on the columns the two data frames share (here Country).
country_and_continent <- total %>% select(Country, Continent)

continent_and_power_plant <- merge(power_plant_data, country_and_continent)

# Mean plant capacity (MW) per continent; not used by the lollipop charts below
grouped_by_continent <- continent_and_power_plant %>% select(Continent, Capacity) %>% group_by(Continent) %>% 
                        summarise(mean_capacity = mean(Capacity))  

# World average of total capacity per country, converted from MW to GW
total_capacity_per_country <- power_plant_data %>% select(Country, Capacity) %>% 
                          group_by(Country) %>% summarise(total_capacity = sum(Capacity))

mean_world_capacity <- mean(total_capacity_per_country$total_capacity) / 1000

# Total capacity per country for each continent, converted from MW to GW
european_powerplants <- continent_and_power_plant %>% filter(Continent == "Europe") %>% select(Continent, Country, Capacity) %>% 
                          group_by(Country) %>% summarise(total_capacity = sum(Capacity))
european_powerplants$total_capacity <- european_powerplants$total_capacity / 1000

asian_powerplants <- continent_and_power_plant %>% filter(Continent == "Asia") %>% select(Continent, Country, Capacity) %>% 
                          group_by(Country) %>% summarise(total_capacity = sum(Capacity))
asian_powerplants$total_capacity <- asian_powerplants$total_capacity / 1000

african_powerplants <- continent_and_power_plant %>% filter(Continent == "Africa") %>% select(Continent, Country, Capacity) %>% 
                          group_by(Country) %>% summarise(total_capacity = sum(Capacity))
african_powerplants$total_capacity <- african_powerplants$total_capacity / 1000

# North and South America are shown together in one chart
american_powerplants <- continent_and_power_plant %>% filter(Continent == "North America" | Continent == "South America") %>% select(Continent, Country, Capacity) %>% 
                          group_by(Country) %>% summarise(total_capacity = sum(Capacity)) 
american_powerplants$total_capacity <- american_powerplants$total_capacity / 1000

oceanian_powerplants <- continent_and_power_plant %>% filter(Continent == "Oceania") %>% select(Continent, Country, Capacity) %>% 
                          group_by(Country) %>% summarise(total_capacity = sum(Capacity))
oceanian_powerplants$total_capacity <- oceanian_powerplants$total_capacity / 1000
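The five per-continent summaries above repeat the same filter, group and sum steps. One possible way to express them once is a small helper; the function name `capacity_by_country` and the toy data frame are hypothetical and only stand in for `continent_and_power_plant`, so this is a sketch rather than the code used in the report.

```r
library(dplyr)

# Hypothetical helper (not part of the original report):
# total capacity per country in GW for one or more continents
capacity_by_country <- function(data, continents) {
  data %>%
    filter(Continent %in% continents) %>%
    group_by(Country) %>%
    summarise(total_capacity = sum(Capacity) / 1000)  # MW -> GW
}

# Toy stand-in for continent_and_power_plant (capacities in MW are made up)
toy <- data.frame(
  Continent = c("Europe", "Europe", "Europe", "Asia", "North America", "South America"),
  Country   = c("Poland", "Poland", "France", "Japan", "Canada", "Brazil"),
  Capacity  = c(1000, 500, 2000, 3000, 4000, 2500)
)

toy_europe   <- capacity_by_country(toy, "Europe")
toy_americas <- capacity_by_country(toy, c("North America", "South America"))
```

Passing a vector of continents covers the combined Americas case without a separate `|` condition.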

Europe

theme_set(theme_bw())

european_powerplants %>% ggplot(aes(x = reorder(Country, total_capacity), y = total_capacity)) +
  geom_point(size = 2, colour = "blue") +
  geom_segment(aes(xend = Country, y = mean_world_capacity, yend = total_capacity), size = 1) +
  theme_minimal() + labs(y = "Total capacity (GW)", x = "Country") +
  coord_flip()

Asia

theme_set(theme_bw())

asian_powerplants %>% ggplot(aes(x = reorder(Country, total_capacity), y = total_capacity)) +
  geom_point(size = 2, colour = "blue") +
  geom_segment(aes(xend = Country, y = mean_world_capacity, yend = total_capacity), size = 1) +
  theme_minimal() + labs(y = "Total capacity (GW)", x = "Country") +
  coord_flip()

Africa

theme_set(theme_bw())

african_powerplants %>% ggplot(aes(x = reorder(Country, total_capacity), y = total_capacity)) +
  geom_point(size = 2, colour = "blue") +
  geom_segment(aes(xend = Country, y = mean_world_capacity, yend = total_capacity), size = 1) +
  theme_minimal() + labs(y = "Total capacity (GW)", x = "Country") +
  coord_flip()

Americas

theme_set(theme_bw())

american_powerplants %>% ggplot(aes(x = reorder(Country, total_capacity), y = total_capacity)) +
  geom_point(size = 2, colour = "blue") +
  geom_segment(aes(xend = Country, y = mean_world_capacity, yend = total_capacity), size = 1) +
  theme_minimal() + labs(y = "Total capacity (GW)", x = "Country") +
  coord_flip()

Oceania

theme_set(theme_bw())

oceanian_powerplants %>% ggplot(aes(x = reorder(Country, total_capacity), y = total_capacity)) +
  geom_point(size = 2, colour = "blue") +
  geom_segment(aes(xend = Country, y = mean_world_capacity, yend = total_capacity), size = 1) +
  theme_minimal() + labs(y = "Total capacity (GW)", x = "Country") +
  coord_flip()

2.7. Population vs GDP vs Power

This graph compares population, GDP and total power capacity. To make all countries visible, we applied a log transformation to population and GDP. That way, countries that are not world powers are spread across the plot instead of being clustered in one corner. China and the US stand out. Interestingly, their total power capacity is similar, despite their different GDP and drastically different populations.

Bubble Chart

country_and_continent_gdp <- total %>% select(Country, Continent, GDP_nominal, Population_2017)

total_capacity_by_country <- power_plant_data %>% select(Country, Capacity) %>% group_by(Country) %>% summarise(total_capacity = sum(Capacity))

final_dataset <- merge(country_and_continent_gdp, total_capacity_by_country)

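# Natural-log transform so that smaller countries are not squeezed into one corner
final_dataset$Population_2017 <- log(final_dataset$Population_2017)
final_dataset$GDP_nominal <- log(final_dataset$GDP_nominal)

# Bubble size shows total capacity (MW), colour shows the continent
ggplot(final_dataset, aes(x = Population_2017, y = GDP_nominal, size = total_capacity, color = Continent)) +
  geom_point(alpha = 0.5) +
  labs(x = "Population, 2017 (log)", y = "Nominal GDP (log)", size = "Total capacity (MW)") +
  theme_minimal()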
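As an alternative to transforming the columns, ggplot2 can apply the log scale on the axes themselves with `scale_x_log10()` and `scale_y_log10()`, which keeps the tick labels in the original units. The sketch below uses a made-up data frame as a stand-in for `final_dataset` before any `log()` call; it uses base 10 rather than the natural log, which changes the tick values but not the relative positions of the points.

```r
library(ggplot2)

# Toy stand-in for final_dataset before any log() call (all values are made up)
toy <- data.frame(
  Country         = c("A", "B", "C", "D"),
  Continent       = c("Europe", "Asia", "Asia", "Africa"),
  Population_2017 = c(5e5, 4e7, 1.4e9, 2e8),
  GDP_nominal     = c(1e10, 1.5e12, 1.2e13, 4e11),
  total_capacity  = c(800, 60000, 1400000, 12000)
)

p <- ggplot(toy, aes(x = Population_2017, y = GDP_nominal,
                     size = total_capacity, color = Continent)) +
  geom_point(alpha = 0.5) +
  scale_x_log10() +  # log10 spacing, tick labels stay in people
  scale_y_log10() +  # log10 spacing, tick labels stay in currency units
  labs(x = "Population, 2017 (log scale)", y = "Nominal GDP (log scale)",
       size = "Total capacity (MW)") +
  theme_minimal()
```

Printing `p` draws the chart; the data frame itself is left untouched, so the raw values remain available for later steps.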

2.8. Wordcloud

Wordcloud

Lastly, we present countries by their number of power plants in a wordcloud, simply to avoid using another bar plot. Although less precise than a bar plot, the wordcloud shows what matters most. We can see that the US has nearly three times as many power plants as China. This means that the two biggest greenhouse gas emitters, which still rely heavily on gas, oil and coal and are also the two biggest contributors by power capacity, have very different structures. Both of these players should gradually move toward the progressive European approach based on renewable energy. The question is, for which one will it be more difficult?

countries_counted <- power_plant_data %>% count(Country)
wordcloud2(data = countries_counted, size = 0.3)
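Because a wordcloud only conveys relative sizes, a comparison such as the one between the US and China is easier to check from the counts themselves. The sketch below uses a made-up data frame with placeholder country names as a stand-in for `power_plant_data`; `count(..., sort = TRUE)` is the same dplyr call as above with the rows ordered by frequency.

```r
library(dplyr)

# Toy stand-in for power_plant_data: one row per plant (counts are made up)
toy_plants <- data.frame(
  Country = c(rep("Country A", 6), rep("Country B", 2), rep("Country C", 1))
)

# Same count as above, sorted so the countries with the most plants come first
toy_counted <- toy_plants %>% count(Country, sort = TRUE)

# Ratio of plant counts between the two largest countries
ratio_top_two <- toy_counted$n[1] / toy_counted$n[2]
```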

It would be interesting to see which of the two countries faces the more difficult task in changing its energy strategy, given the number of power plants it already operates.

Power Plants Emissions

Aleksandra Tomczak, Konrad Lewszyk

1/16/2022
