The purpose of this project is to create an R Markdown document that presents collected data in visual form: maps, graphs and charts. We aim to explore the relationship between the different types of power plants and greenhouse gas emissions. We also use additional information for each of the countries we inspect, such as population, GDP and GDP per capita. With this information we can, for example, relate a country's economic status to the power plants it uses and to its greenhouse gas emissions per capita.
In our project we use 5 different data sets. A detailed description of each set is provided below.
This dataset is the Global Power Plant Database compiled by the World Resources Institute. It covers approximately 30,000 power plants from 164 countries and includes thermal plants (e.g. coal, gas, oil, nuclear, biomass, waste, geothermal) and renewables (e.g. hydro, wind, solar). Obtained from Kaggle.
Country: Country name
Powerplant Name: Name or title of the power plant, generally in Romanized form
gppd_idnr: 10- or 12-character identifier for the power plant
Capacity (MW): Electrical generating capacity in megawatts
Latitude: Geolocation in decimal degrees
Longitude: Geolocation in decimal degrees
Primary Fuel: Main energy source used in electricity generation or export
Owner: Majority shareholder of the power plant, generally in Romanized form
Source: Entity reporting the data; could be an organization, report, or document, generally in Romanized form
This dataset contains annual greenhouse gas emissions, measured in tons, for 232 countries since 1751. Data is naturally missing for most of that timeline, so we focus on modern times. The dataset consists of one time series per country, with no other variables. Obtained from Kaggle.
We scraped the data on GDP, GDP per capita and population from websites:
Each of them contains a table listing the world's countries and the corresponding values of the variables we were looking for.
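# Loading the packages used below (assumed; the original setup chunk is not shown)
pkgs <- c("readxl", "dplyr", "tidyr", "ggplot2", "ggrepel", "knitr", "treemap",
"leaflet", "htmltools", "ggalt", "wordcloud2")
invisible(lapply(pkgs, library, character.only = TRUE))
# Importing data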
emission_data <- read.csv("emission_data.csv")
power_plant_data <- read.csv("global_power_plant.csv")
gdp_per_capita_data <- read_excel("gdp_per_capita_data.xlsx")
gdp_data <- read_excel("gdp_data.xlsx")
continents_data <- read_excel("continents_regions.xlsx")
For the emission data we choose the years 2010-2017. Older records are incomplete for some countries, and we would like to focus on the most recent data. We also keep only the countries that appear in all of the data sets, including the table of continents and regions.
# Keeping only emissions data for the years 2010-2017
emission_data <- emission_data[c("Country", "X2010", "X2011", "X2012", "X2013", "X2014", "X2015",
"X2016", "X2017")]
# Creating a list of countries that are common for all data sets
common_countries <- Reduce(intersect, list(emission_data$Country, gdp_data$Country, gdp_per_capita_data$Country, unique(power_plant_data$Country), continents_data$Country))
# Dropping the observations for the countries outside of the common_countries list
emission_data <- emission_data[emission_data$Country %in% common_countries,]
gdp_data <- gdp_data[gdp_data$Country %in% common_countries,]
gdp_per_capita_data <- gdp_per_capita_data[gdp_per_capita_data$Country %in% common_countries,]
power_plant_data <- power_plant_data[power_plant_data$Country %in% common_countries,]
continents_data <- continents_data[continents_data$Country %in% common_countries,]
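The two steps above, selecting the year columns and keeping only the countries shared by all tables, can be tried out on toy data. This is a minimal sketch: the data frames `emissions`, `gdp` and `plants` and all their values are invented for illustration and are not the project's data.

```r
# Toy stand-ins for the real tables (all values are made up)
emissions <- data.frame(
  Country = c("Poland", "France", "Narnia"),
  X2009 = c(1, 2, 3),
  X2010 = c(4, 5, 6),
  X2011 = c(7, 8, 9)
)
gdp <- data.frame(Country = c("France", "Poland", "Atlantis"), GDP = c(10, 20, 30))
plants <- data.frame(Country = c("Poland", "Poland", "France"), Capacity = c(100, 50, 75))

# Build the year column names programmatically instead of typing them out
year_cols <- paste0("X", 2010:2011)
emissions <- emissions[c("Country", year_cols)]

# Countries present in every table
common <- Reduce(intersect, list(emissions$Country, gdp$Country, unique(plants$Country)))

# Keep only the shared countries in each table
emissions <- emissions[emissions$Country %in% common, ]
gdp <- gdp[gdp$Country %in% common, ]
plants <- plants[plants$Country %in% common, ]

print(common)
```

Countries that are spelled differently across sources would silently drop out of `common`, so it may be worth inspecting the excluded names before filtering.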
The next step was to check if all our data frames are of the appropriate type:
# str(emission_data) : dataframe
# str(gdp_data) : tibble
# str(gdp_per_capita_data) : tibble
# str(power_plant_data) : dataframe
# str(continents_data) : tibble
# Transforming tibbles into dataframes
gdp_data <- as.data.frame(gdp_data)
gdp_per_capita_data <- as.data.frame(gdp_per_capita_data)
continents_data <- as.data.frame(continents_data)
Now we will merge some of the sets into bigger ones to make plotting the chosen information simpler.
# Ordering alphabetically by country
emission_data <- emission_data[order(emission_data$Country),]
gdp_data <- gdp_data[order(gdp_data$Country),]
gdp_per_capita_data <- gdp_per_capita_data[order(gdp_per_capita_data$Country),]
# Merging the data sets about GDP and emission
gdp <- merge(x = gdp_data, y = gdp_per_capita_data, by = "Country", all = TRUE)
continents_gdp <- merge(x = continents_data, y = gdp, by = "Country", all = TRUE)
total <- merge(x = continents_gdp, y = emission_data, by = "Country", all = TRUE)
# Dropping unneeded columns
total[6] <- NULL
total[8] <- NULL
power_plant_data[3] <- NULL
# Changing column names
colnames(total) <- c("Country", "ISO_code", "Region", "Continent","GDP_nominal", "GDP_growth",
"Population_2017", "GDP_worldshare", "GDP_pc_PPP", "GDP_pc_nominal",
"GDP_vs_world", "Emission_2010","Emission_2011","Emission_2012",
"Emission_2013","Emission_2014", "Emission_2015", "Emission_2016",
"Emission_2017")
colnames(power_plant_data) <- c("Country", "Name", "Capacity", "Latitude", "Longitude", "Primary_fuel",
"Owner", "Source")
# Merging all data sets into one long one
total_powerplant <- merge(x = power_plant_data, y = total, by = "Country", all = TRUE)
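To illustrate what `merge(..., all = TRUE)` does in the step above, here is a small sketch on invented data. The tables `countries` and `plants` and the country "Narnia" are hypothetical; in the report itself all tables were already restricted to the common countries, so no unmatched rows would be expected there.

```r
# Toy country-level and plant-level tables (values are made up)
countries <- data.frame(Country = c("Poland", "France"),
                        Continent = c("Europe", "Europe"))
plants <- data.frame(
  Country = c("Poland", "Poland", "France", "Narnia"),
  Capacity = c(100, 50, 75, 10)
)

# all = TRUE keeps unmatched rows from both sides and fills the gaps with NA
merged <- merge(x = plants, y = countries, by = "Country", all = TRUE)

# One row per plant; country-level columns are repeated for each plant
print(merged)

# Rows that found no partner show up as NA in the columns of the other table
unmatched <- as.character(merged$Country[is.na(merged$Continent)])
print(unmatched)
```

A check like `unmatched` is a cheap way to confirm that the long table contains no country that is missing its country-level information.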
Finally, for visualization we will use two data sets: total that contains the information about the countries and total_powerplant that contains information about the power plants and the countries.
str(total)
## 'data.frame': 151 obs. of 19 variables:
## $ Country : chr "Afghanistan" "Albania" "Algeria" "Angola" ...
## $ ISO_code : chr "AFG" "ALB" "DZA" "AGO" ...
## $ Region : chr "Southern Asia" "Southern Europe" "Northern Africa" "Middle Africa" ...
## $ Continent : chr "Asia" "Europe" "Africa" "Africa" ...
## $ GDP_nominal : num 1.95e+10 1.30e+10 1.68e+11 1.22e+11 6.37e+11 ...
## $ GDP_growth : num 0.0267 0.0384 0.016 -0.0015 0.0285 0.075 0.0196 0.0304 0.001 0.0388 ...
## $ Population_2017: num 36296113 2884169 41389189 29816766 43937140 ...
## $ GDP_worldshare : num 0.0002 0.0002 0.0021 0.0015 0.0079 0.0001 0.0164 0.0052 0.0005 0.0004 ...
## $ GDP_pc_PPP : num 1976 12943 15293 6658 20829 ...
## $ GDP_pc_nominal : num 538 4521 4048 4096 14508 ...
## $ GDP_vs_world : num 0.12 0.76 0.89 0.39 1.22 0.57 2.89 3.15 1.02 2.79 ...
## $ Emission_2010 : num 1.00e+08 2.38e+08 3.13e+09 3.88e+08 6.59e+09 ...
## $ Emission_2011 : num 1.13e+08 2.43e+08 3.25e+09 4.18e+08 6.78e+09 ...
## $ Emission_2012 : num 1.23e+08 2.48e+08 3.38e+09 4.51e+08 6.97e+09 ...
## $ Emission_2013 : num 1.33e+08 2.53e+08 3.51e+09 4.84e+08 7.16e+09 ...
## $ Emission_2014 : num 1.43e+08 2.59e+08 3.66e+09 5.18e+08 7.36e+09 ...
## $ Emission_2015 : num 1.53e+08 2.65e+08 3.81e+09 5.53e+08 7.57e+09 ...
## $ Emission_2016 : num 1.65e+08 2.71e+08 3.96e+09 5.88e+08 7.77e+09 ...
## $ Emission_2017 : num 1.79e+08 2.77e+08 4.11e+09 6.24e+08 7.98e+09 ...
Firstly, let’s analyze the set about power plants, more precisely what types of fuels are consumed in the world.
table(power_plant_data$Primary_fuel)
##
## Biomass Coal Cogeneration Gas Geothermal
## 1395 2365 41 3861 189
## Hydro Nuclear Oil Other Petcoke
## 7100 195 2275 44 13
## Solar Storage Waste Wave and Tidal Wind
## 5929 58 1087 10 5180
As we can see, there are 15 types of fuel and no NA values in that column. The most frequent primary fuel is Hydro, meaning that water is the most common energy source by number of plants.
First, a data frame containing the frequency of every fuel type needs to be created. We transform the table into a data frame and arrange it by frequency. Next, we group the 4 least frequent fuel types into the existing category Other. To draw the pointers that connect each slice to its label, a separate variable is created: pos gives the position along the circumference of the pie, at the middle of each slice, where the label pointer ends.
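# Checking frequency of every type of fuel
fuel_types <- as.data.frame(table(total_powerplant$Primary_fuel))
colnames(fuel_types) <- c("Fuel", "Frequency")
# Note: [-1, ] drops the first row of the table, which is "Biomass" (alphabetically first),
# so Biomass does not appear in the pie chart or in the frequency table below
fuel_types <- fuel_types[-1,] %>% arrange(-Frequency)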
# Merging 4 types of fuel with the smallest frequencies with "Other"
fuels <- fuel_types
fuels$Fuel[c(10, 12, 13, 14)] <- "Other"
fuels <- fuels %>% group_by(Fuel) %>% summarise(Frequency = sum(Frequency)) %>% arrange(-Frequency)
fuels$Fuel <- factor(fuels$Fuel, levels = rev(as.character(fuels$Fuel)))
# Calculating the position of the label pointer
fuels$pos <- (cumsum(c(0, fuels$Frequency)) + c(fuels$Frequency / 2, .01))[1:nrow(fuels)]
# Plotting world share of every type of fuel
ggplot(fuels, aes(x = "", y = Frequency, fill = Fuel)) +
geom_bar(stat = "identity", width = 1, color = "white") +
geom_text_repel(aes(x = 1.4,
y = pos, label = paste0(round((Frequency/sum(Frequency)*100), 2), " %")),
nudge_x = .3,
segment.size = .7) +
guides(fill = guide_legend(reverse = TRUE)) +
labs(x = NULL,
y = NULL,
title = "Pie Chart",
subtitle = "Percentage share of powerplants by fuel\n") +
scale_fill_brewer(palette = "Paired") +
coord_polar(theta = "y") +
theme_void()
# Table with the fuels frequencies
kable(fuel_types)
| Fuel | Frequency |
|---|---|
| Hydro | 7100 |
| Solar | 5929 |
| Wind | 5180 |
| Gas | 3861 |
| Coal | 2365 |
| Oil | 2275 |
| Waste | 1087 |
| Nuclear | 195 |
| Geothermal | 189 |
| Storage | 58 |
| Other | 44 |
| Cogeneration | 41 |
| Petcoke | 13 |
| Wave and Tidal | 10 |
As can be observed on the pie chart, the 3 most common fuel types are Hydro, Solar and Wind. This is a positive observation: by number of plants, renewable sources dominate, which suggests that the world has invested in renewable energy.
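The label position used for the pie chart is simply the midpoint of each slice along the stacked axis. The following sketch checks that arithmetic on three invented frequencies; the vectors `fuel` and `freq` are hypothetical and only serve to make the numbers easy to follow.

```r
# Toy frequencies (made up), already sorted from largest to smallest
fuel <- c("Hydro", "Solar", "Wind")
freq <- c(50, 30, 20)

# Start of each slice along the stacked axis: 0, 50, 80
slice_start <- cumsum(c(0, freq))[seq_along(freq)]

# Midpoint of each slice = its start plus half of its own size
pos <- slice_start + freq / 2

# The one-line formula from the report, written out for comparison
pos_report <- (cumsum(c(0, freq)) + c(freq / 2, .01))[seq_along(freq)]

# Percentage labels attached at those positions
labels <- paste0(round(freq / sum(freq) * 100, 2), " %")

print(data.frame(fuel, pos, pos_report, labels))
```

Both formulas give the same midpoints; the extra `.01` in the report's version only pads the vector to the right length and is discarded by the subsetting.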
The treemap charts present similar information to the pie chart, but for every continent. We grouped the fuel types by continent and displayed them in two ways. On the first chart, the size of the main boxes represents total emissions, so the continent with the highest emissions has the biggest rectangle; within a continent, emissions are split between fuel types in proportion to the number of plants. On the second chart, the size of the rectangles follows the total number of power plants located on that continent.
# Creating data frames for merging with types of fuel and emission
treemapset_fuel <- total_powerplant %>% select(Continent, Primary_fuel) %>%
count(Continent, Primary_fuel)
treemapset_emission <- total %>% group_by(Continent) %>% summarise(Total_emission = sum(Emission_2017))
# Merging two data frames into one
treemapset <- merge(treemapset_fuel, treemapset_emission, by = "Continent", all = TRUE)
# Counting the number of powerplants per continent
treemapset_count <- treemapset %>% group_by(Continent) %>% summarise(No_power_plants = sum(n))
# Merging with previously created data frame
treemapset <- merge(treemapset, treemapset_count, by = "Continent", all = TRUE)
treemapset$Part_emission <- (treemapset$n / treemapset$No_power_plants) * treemapset$Total_emission
# Treemap with the size of boxes representing emissions
treemap_by_emission <- treemap(treemapset,
index = c("Continent", "Primary_fuel"),
vSize = "Part_emission",
type = "index",
palette = "Set2",
border.col = "darkslategrey",
fontcolor.labels = "white",
title = "Types of fuel")
# Treemap with the size of boxes representing the number of powerplants
treemap2 <- treemap(treemapset,
index = c("Continent", "Primary_fuel"),
vSize = "n",
type = "index",
palette = "Set2",
border.col = "darkslategrey",
fontcolor.labels = "white",
title = "Types of fuel")
We can see that when it comes to emissions, the leaders are Europe and North America. An interesting observation concerns Asia: comparing its share of emissions with its share of the number of power plants, we can see that it has significantly fewer power plants than Europe or North America. Taking a closer look at the split of primary fuels in Asia, the main source of energy alongside water is coal. In Europe, the 3 major energy sources are water, wind and solar.
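The box sizes in the emission treemap come from splitting each continent's total emissions across fuel types in proportion to plant counts. The sketch below reproduces that allocation on invented numbers; the continents "A" and "B" and all values are hypothetical. Note that this rule implicitly treats every plant as emitting the same amount regardless of fuel, which is a simplification.

```r
# Toy plant counts per continent and fuel (values are made up)
fuel_counts <- data.frame(
  Continent = c("A", "A", "B", "B"),
  Primary_fuel = c("Coal", "Hydro", "Coal", "Wind"),
  n = c(30, 10, 5, 15)
)
continent_emission <- data.frame(Continent = c("A", "B"),
                                 Total_emission = c(800, 200))

alloc <- merge(fuel_counts, continent_emission, by = "Continent")

# Plants per continent, repeated on every row of that continent
alloc$No_power_plants <- ave(alloc$n, alloc$Continent, FUN = sum)

# Each fuel receives a share of emissions equal to its share of plants
alloc$Part_emission <- alloc$n / alloc$No_power_plants * alloc$Total_emission

print(alloc)

# The parts add back up to each continent's total
print(tapply(alloc$Part_emission, alloc$Continent, sum))
```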
Below, we created a map of power plants that displays their geographical locations. Every point on the map is labeled, and clicking on it displays detailed information about that particular power plant.
# Leaflet map presenting the location of powerplants
map <- leaflet(power_plant_data, options = leafletOptions(minZoom = 2, maxZoom = 12)) %>%
addTiles() %>% setView(lng = 2.34, lat = 48.85, zoom = 5) %>% addProviderTiles("CartoDB.Voyager") %>%
addCircleMarkers(~Longitude, ~Latitude,
label = ~htmlEscape(Name), color = "#ADC70A",
fillOpacity = 0.6, stroke = FALSE,
popup = ~paste0("<strong>Name: </strong>", power_plant_data$Name, "<br/>",
"<strong>Country: </strong>", power_plant_data$Country, "<br/>",
"<strong>Capacity: </strong>", power_plant_data$Capacity, " MW", "<br/>",
"<strong>Primary Fuel: </strong>", power_plant_data$Primary_fuel),
popupOptions = popupOptions(closeButton = FALSE),
clusterOptions = markerClusterOptions(
iconCreateFunction = JS("function (cluster) {
var childCount = cluster.getChildCount();
if (childCount < 60) {
c = 'rgba(178, 223, 138, 0.8);'
} else if (childCount < 260) {
c = 'rgba(245, 131, 131, 0.8);'
} else if (childCount < 460) {
c = 'rgba(166, 206, 227, 0.8);'
} else {
c = 'rgba(253, 191, 111, 0.8);'
}
return new L.DivIcon({
html: '<div style=\"background-color:'+ c +
'\"><span>' + childCount + '</span></div>',
className: 'marker-cluster',
iconSize: new L.Point(40, 40) });
}")))
map
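The colour thresholds in the JavaScript icon function above can be mirrored in R to check which colour a cluster of a given size would receive. This is only a sketch: `cluster_colour` is a hypothetical helper written for illustration and is not part of the report or of the leaflet package.

```r
# Hypothetical helper mirroring the if/else chain of the JS icon function
cluster_colour <- function(child_count) {
  colours <- c("rgba(178, 223, 138, 0.8)",  # fewer than 60 markers
               "rgba(245, 131, 131, 0.8)",  # 60 to 259
               "rgba(166, 206, 227, 0.8)",  # 260 to 459
               "rgba(253, 191, 111, 0.8)")  # 460 or more
  # findInterval() returns 0 below 60, 1 in [60, 260), 2 in [260, 460), 3 from 460 up
  colours[findInterval(child_count, c(60, 260, 460)) + 1]
}

print(cluster_colour(c(10, 60, 300, 1000)))
```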
Next, we present the change in emissions between 2010 and 2017 by region. Every country on our list has an assigned continent and region. The change is shown on a dumbbell chart: the labels on the right display the percentage change for the given region, and the color indicates the continent.
# Grouping the data
dumbbellset <- total %>% group_by(Region) %>%
summarise(Continent = unique(Continent),
Emission_2010 = sum(Emission_2010),
Emission_2017 = sum(Emission_2017),
Change = (Emission_2017 - Emission_2010) / Emission_2010) %>% arrange(-Emission_2017)
# Creating the dumbbell plot
dumbbelplot <- ggplot(dumbbellset,
aes(x = Emission_2010, xend = Emission_2017,
y = reorder(Region, Emission_2017), group = Region, colour = Continent)) +
geom_dumbbell(size = 2, size_x = 3, size_xend = 3) +
scale_colour_brewer(palette = "Set2") +
geom_text(aes(x = Emission_2017,
y = Region,
label = scales::percent(Change, accuracy = 1L)),
color = "darkslategrey",
hjust = -0.3,
size = 4,
nudge_x = 10) +
scale_x_continuous(breaks = seq(0,600000000000,by=150000000000),
labels = format(seq(0,600,by=150))) +
labs(x = "Emission per year (billion t)",
y = NULL,
title = "Dumbbell Chart",
subtitle = "Change in GHG emissions: 2010 vs 2017") +
guides(colour = guide_legend(override.aes = list(shape = 16, alpha = 1, size = 3.5))) +
theme_minimal() +
theme(axis.text.x = element_text(size = 12),
axis.text.y = element_text(size = 12),
axis.title.x = element_text(size = 14),
plot.title = element_text(size = 18),
plot.subtitle = element_text(size = 14),
legend.text = element_text(size = 12),
legend.title = element_text(size = 12))
dumbbelplot
The biggest increase in tons of GHG emissions was observed for Eastern Asia; however, in percentage terms the highest increase was in fact in Middle Africa. Compared to the emissions of Northern America, Europe and Asia, this increase is still barely visible on the plot. Even though the change was not significant in absolute terms, it is still a negative phenomenon.
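The difference between the largest absolute and the largest relative increase can be seen on a two-row example. The regions and numbers below are invented purely to show how the two rankings can disagree.

```r
# Toy regional totals (values are made up)
regions <- data.frame(
  Region = c("Large region", "Small region"),
  Emission_2010 = c(1000, 10),
  Emission_2017 = c(1200, 20)
)

# Absolute change in tons and relative change as a fraction of the 2010 value
regions$Abs_change <- regions$Emission_2017 - regions$Emission_2010
regions$Rel_change <- regions$Abs_change / regions$Emission_2010

largest_abs <- regions$Region[which.max(regions$Abs_change)]
largest_rel <- regions$Region[which.max(regions$Rel_change)]

print(regions)
print(c(largest_abs = largest_abs, largest_rel = largest_rel))
```

Here the large region adds 200 units (a 20 % rise) while the small one adds only 10 units but doubles, which is the same pattern as Eastern Asia versus Middle Africa on the chart.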
On the bar plots we present the difference between nominal and PPP GDP per capita for European countries. We divided them into regions: Northern, Eastern, Southern and Western Europe. The dashed red horizontal line indicates the world average.
# Separating European countries
europe <- total %>% filter(Continent == "Europe") %>% select(Country, Region, GDP_pc_PPP, GDP_pc_nominal)
europe$Country <- factor(europe$Country)
colnames(europe) <- c("Country", "Region", "PPP", "nominal")
# Transforming into long format
europe_long <- gather(europe, GDP, Value, PPP:nominal, factor_key = TRUE)
# Bar plot for Eastern Europe
europe_long %>% filter(Region == "Eastern Europe") %>%
ggplot(aes(x = reorder(Country, Value), y = Value, fill = GDP)) +
geom_bar(stat = "identity", position = position_dodge(), colour = "grey30") +
geom_hline(yintercept = 17000, linetype = "dashed", color = "darkred", size = 1) +
geom_text(aes(x = "Moldova", y = 17000, label = "World Average PPP"),
color = "darkred", hjust = 0.1, vjust = -1) +
labs(x = NULL,
y = NULL,
title = "Bar Plot",
subtitle = "GDP per capita PPP vs GDP per capita nominal") +
scale_fill_brewer(palette = "Paired") +
theme_bw() +
theme(axis.text.x = element_text(size = 11, angle = 90, vjust = 0.2, hjust = 0.85),
axis.text.y = element_text(size = 10)) +
scale_y_continuous(breaks = c(5000, 10000, 17000, 20000, 25000, 30000, 35000, 40000),
labels = c("5,000 $", "10,000 $","17,000 $", "20,000 $","25,000 $", "30,000 $", "35,000 $", "40,000 $"))
# Bar plot for Northern Europe
europe_long %>% filter(Region == "Northern Europe") %>%
ggplot(aes(x = reorder(Country, Value), y = Value, fill = GDP)) +
geom_bar(stat = "identity", position = position_dodge(), colour = "grey30") +
geom_hline(yintercept = 17000, linetype = "dashed", color = "darkred", size = 1) +
geom_text(aes(x = "Latvia", y = 17000, label = "World Average PPP"),
color = "darkred", hjust = 0.1, vjust = -1) +
labs(x = NULL,
y = NULL,
title = "Bar Plot",
subtitle = "GDP per capita PPP vs GDP per capita nominal") +
scale_fill_brewer(palette = "Paired") +
theme_bw() +
theme(axis.text.x = element_text(size = 11, angle = 90, vjust = 0.2, hjust = 0.85),
axis.text.y = element_text(size = 10)) +
scale_y_continuous(breaks = c(17000, 30000, 40000, 50000, 60000, 70000, 80000),
labels = c("17,000 $","30,000 $", "40,000 $", "50,000 $", "60,000 $", "70,000 $", "80,000 $"))
# Bar plot for Southern Europe
europe_long %>% filter(Region == "Southern Europe") %>%
ggplot(aes(x = reorder(Country, Value), y = Value, fill = GDP)) +
geom_bar(stat = "identity", position = position_dodge(), colour = "grey30") +
geom_hline(yintercept = 17000, linetype = "dashed", color = "darkred", size = 1) +
geom_text(aes(x = "Albania", y = 17000, label = "World Average PPP"),
color = "darkred", hjust = 0.1, vjust = -1) +
labs(x = NULL,
y = NULL,
title = "Bar Plot",
subtitle = "GDP per capita PPP vs GDP per capita nominal") +
scale_fill_brewer(palette = "Paired") +
theme_bw() +
theme(axis.text.x = element_text(size = 11, angle = 90, vjust = 0.2, hjust = 0.85),
axis.text.y = element_text(size = 10)) +
scale_y_continuous(breaks = c(5000, 10000, 17000, 20000, 25000, 30000, 35000, 40000),
labels = c("5,000 $", "10,000 $","17,000 $", "20,000 $","25,000 $", "30,000 $", "35,000 $", "40,000 $"))
# Bar plot for Western Europe
europe_long %>% filter(Region == "Western Europe") %>%
ggplot(aes(x = reorder(Country, Value), y = Value, fill = GDP)) +
geom_bar(stat = "identity", position = position_dodge(), colour = "grey30") +
geom_hline(yintercept = 17000, linetype = "dashed", color = "darkred", size = 1) +
geom_text(aes(x = "France", y = 17000, label = "World Average PPP"),
color = "darkred", hjust = 0.1, vjust = -1) +
labs(x = NULL,
y = NULL,
title = "Bar Plot",
subtitle = "GDP per capita PPP vs GDP per capita nominal") +
scale_fill_brewer(palette = "Paired") +
theme_bw() +
theme(axis.text.x = element_text(size = 11, angle = 90, vjust = 0.2, hjust = 0.85),
axis.text.y = element_text(size = 10)) +
scale_y_continuous(breaks = c(17000, 30000, 45000, 60000, 75000, 90000, 105000),
labels = c("17,000 $", "30,000 $","45,000 $", "60,000 $", "75,000 $", "90,000 $", "105,000 $"))
Unsurprisingly, the countries with the highest GDP per capita are located in Western and Northern Europe. For all the countries in Western Europe, both nominal and PPP GDP per capita are higher than the world average. In Southern Europe the poorest countries are Albania, Serbia, Macedonia, Montenegro, Croatia and Bosnia and Herzegovina. All of them except Albania belonged to Yugoslavia, which may partly explain their low GDP. The poorest countries overall are located in Eastern Europe: Moldova, Ukraine and Belarus. For Poland, GDP per capita PPP is higher than the world average, but the nominal value is slightly below it.
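The grouped bars above rely on the data being in long format, with one row per country and GDP measure. As a rough sketch of that reshaping step, the example below builds the long table by hand in base R instead of using `gather`; the countries "X" and "Y" and their values are invented.

```r
# Toy wide table: one row per country (values are made up)
europe <- data.frame(
  Country = c("X", "Y"),
  PPP = c(30000, 12000),
  nominal = c(25000, 6000)
)

# Long format: one row per country and measure, as the grouped bar plot expects
long <- data.frame(
  Country = rep(europe$Country, times = 2),
  GDP = factor(rep(c("PPP", "nominal"), each = nrow(europe)),
               levels = c("PPP", "nominal")),
  Value = c(europe$PPP, europe$nominal)
)
print(long)

# Ratio of PPP to nominal per country, a compact summary of the gap between the bars
europe$PPP_to_nominal <- europe$PPP / europe$nominal
print(europe)
```

A ratio well above 1, as for country "Y", corresponds to the pattern seen for the poorer countries on the charts, where the PPP bar is much taller than the nominal one.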
In this visualization we present each country's GDP per capita as a percentage of the world average. The “zero” line is set at 100%, meaning that GDP per capita at this point is equal to the world average of $17,000. The data is split by continent.
european_countries <- total %>% filter(Continent == "Europe") %>% mutate(GDP_vs_world = round(GDP_vs_world, 2) - 1) %>% arrange(-GDP_vs_world)
asian_countries <- total %>% filter(Continent == "Asia") %>% mutate(GDP_vs_world = round(GDP_vs_world, 2) - 1) %>% arrange(-GDP_vs_world)
african_countries <- total %>% filter(Continent == "Africa") %>% mutate(GDP_vs_world = round(GDP_vs_world, 2) - 1) %>% arrange(-GDP_vs_world)
american_countries <- total %>% filter(Continent == "North America" | Continent == "South America") %>% mutate(GDP_vs_world = round(GDP_vs_world, 2) - 1) %>% arrange(-GDP_vs_world)
oceanian_countries <- total %>% filter(Continent == "Oceania") %>% mutate(GDP_vs_world = round(GDP_vs_world, 2) - 1) %>% arrange(-GDP_vs_world)
# Diverging bar chart for Europe
ggplot(european_countries, aes(x = reorder(Country, GDP_vs_world),
y = GDP_vs_world, label = GDP_vs_world)) +
geom_bar(stat = 'identity', aes(fill = GDP_vs_world), width = 0.9) +
scale_fill_gradientn(name = "% of World Average",
breaks = c(-1, 0, 1, 2, 3, 4, 5, 6, 7),
labels = c("0", "100", "200", "300", "400", "500", "600", "700", "800"),
colours = c("indianred4", "white", "olivedrab"),
values = scales::rescale(c(-1, -0.1, 0, 0.1, 7)),
limits = c(-1, 7)) +
geom_hline(yintercept = 0) +
labs(x = NULL,
y = "GDP (Percent of World Average)",
title = "Europe",
subtitle = "GDP per capita PPP vs World Average") +
guides(color = guide_legend(title = "World Average Percent")) +
coord_flip() +
theme_minimal() +
scale_y_continuous(breaks = c(0, 1, 2, 3, 4, 5, 6, 7),
labels = c("100 %", "200 %", "300 %", "400 %", "500 %", "600 %", "700 %", "800 %")) +
theme(axis.text.x = element_text(size = 12),
axis.text.y = element_text(size = 12),
axis.title.x = element_text(size = 14),
plot.title = element_text(size = 18),
plot.subtitle = element_text(size = 14),
legend.text = element_text(size = 12),
legend.title = element_text(size = 12))
# Diverging bar chart for Asia
ggplot(asian_countries, aes(x = reorder(Country, GDP_vs_world),
y = GDP_vs_world, label = GDP_vs_world)) +
geom_bar(stat = 'identity', aes(fill = GDP_vs_world), width = 0.9) +
scale_fill_gradientn(name = "% of World Average",
breaks = c(-1, 0, 1, 2, 3, 4, 5, 6, 7),
labels = c("0", "100", "200", "300", "400", "500", "600", "700", "800"),
colours = c("indianred4", "white", "olivedrab"),
values = scales::rescale(c(-1, -0.1, 0, 0.1, 7)),
limits = c(-1, 7)) +
geom_hline(yintercept = 0) +
labs(x = NULL,
y = "GDP (Percent of World Average)",
title = "Asia",
subtitle = "GDP per capita PPP vs World Average") +
guides(color = guide_legend(title = "World Average Percent")) +
coord_flip() +
theme_minimal() +
scale_y_continuous(breaks = c(-1, 0, 1, 2, 3, 4, 5, 6, 7),
labels = c("0", "100 %", "200 %", "300 %", "400 %", "500 %", "600 %", "700 %", "800 %")) +
theme(axis.text.x = element_text(size = 12),
axis.text.y = element_text(size = 12),
axis.title.x = element_text(size = 14),
plot.title = element_text(size = 18),
plot.subtitle = element_text(size = 14),
legend.text = element_text(size = 12),
legend.title = element_text(size = 12))
# Diverging bar chart for Africa
ggplot(african_countries, aes(x = reorder(Country, GDP_vs_world),
y = GDP_vs_world, label = GDP_vs_world)) +
geom_bar(stat = 'identity', aes(fill = GDP_vs_world), width = 0.9) +
scale_fill_gradientn(name = "% of World Average",
breaks = c(-1, -0.5, 0, 0.5, 1),
labels = c("0", "50", "100", "150", "200"),
colours = c("indianred4", "white", "olivedrab"),
values = scales::rescale(c(-1, -0.1, 0, 0.1, 1)),
limits = c(-1, 1)) +
geom_hline(yintercept = 0) +
labs(x = NULL,
y = "GDP (Percent of World Average)",
title = "Africa",
subtitle = "GDP per capita PPP vs World Average") +
guides(color = guide_legend(title = "World Average Percent")) +
coord_flip() +
theme_minimal() +
scale_y_continuous(breaks = c(-1, -0.75, -0.5, -0.25, 0, 0.25, 0.5),
labels = c("0", "25 %", "50 %", "75 %", "100 %", "125 %", "150 %")) +
theme(axis.text.x = element_text(size = 12),
axis.text.y = element_text(size = 12),
axis.title.x = element_text(size = 14),
plot.title = element_text(size = 18),
plot.subtitle = element_text(size = 14),
legend.text = element_text(size = 12),
legend.title = element_text(size = 12))
# Diverging bar chart for North and South America
ggplot(american_countries, aes(x = reorder(Country, GDP_vs_world),
y = GDP_vs_world, label = GDP_vs_world)) +
geom_bar(stat = 'identity', aes(fill = GDP_vs_world), width = 0.9) +
scale_fill_gradientn(name = "% of World Average",
breaks = c(-1, 0, 1, 2, 3),
labels = c("0", "100", "200", "300", "400"),
colours = c("indianred4", "white", "olivedrab"),
values = scales::rescale(c(-1, -0.1, 0, 0.1, 3)),
limits = c(-1, 3)) +
geom_hline(yintercept = 0) +
labs(x = NULL,
y = "GDP (Percent of World Average)",
title = "North and South America",
subtitle = "GDP per capita PPP vs World Average") +
guides(color = guide_legend(title = "World Average Percent")) +
coord_flip() +
theme_minimal() +
scale_y_continuous(breaks = c(-1, 0, 1, 2, 3),
labels = c("0", "100 %", "200 %", "300 %", "400 %"))
# Diverging bar chart for Oceania
ggplot(oceanian_countries, aes(x = reorder(Country, GDP_vs_world),
y = GDP_vs_world, label = GDP_vs_world)) +
geom_bar(stat = 'identity', aes(fill = GDP_vs_world), width = 0.9) +
scale_fill_gradientn(name = "% of World Average",
breaks = c(-1, 0, 1, 2, 3),
labels = c("0", "100", "200", "300", "400"),
colours = c("indianred4", "white", "olivedrab"),
values = scales::rescale(c(-1, -0.1, 0, 0.1, 3)),
limits = c(-1, 3)) +
geom_hline(yintercept = 0) +
labs(x = NULL,
y = "GDP (Percent of World Average)",
title = "Australia and Oceania",
subtitle = "GDP per capita PPP vs World Average") +
guides(color = guide_legend(title = "World Average Percent")) +
coord_flip() +
theme_minimal() +
scale_y_continuous(breaks = c(-1, 0, 1, 2, 3),
labels = c("0", "100 %", "200 %", "300 %", "400 %"))
The poorest continent is without any doubt Africa: the GDP per capita of the great majority of its countries is far below the world average. The most diverse continent is Asia, where we find a huge gap between the richest and the poorest country.
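The diverging charts work by subtracting 1 from the ratio to the world average, so that the average sits at zero, and then relabelling the axis in percent. A small sketch of that mapping, with invented ratios for three hypothetical countries:

```r
# Toy ratios of GDP per capita to the world average (values are made up)
gdp_vs_world <- c(Rich = 2.5, Average = 1.0, Poor = 0.4)

# Offset from the world average: 0 is the average, negative values are below it
offset <- round(gdp_vs_world, 2) - 1

# Axis breaks on the offset scale and the percentage labels they stand for
breaks <- c(-1, 0, 1, 2)
labels <- paste0((breaks + 1) * 100, " %")

print(offset)
print(data.frame(breaks, labels))
```

Generating the labels from the breaks in this way keeps the two vectors in sync, which is easy to get wrong when both are typed by hand.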
In these diverging lollipop graphs we can see that a few countries clearly dominate in total installed capacity. Compared to the world average total capacity per country, most countries fall well below the average. This is due to the outlier effect of countries such as Russia, the US, China or India.
country_and_continent <- total %>% select(Country, Continent)
continent_and_power_plant <- merge(power_plant_data, country_and_continent)
grouped_by_continent <- continent_and_power_plant %>% select(Continent, Capacity) %>% group_by(Continent) %>%
summarise(mean_capacity = mean(Capacity))
mean_world_capacity <- power_plant_data %>% select(Country, Capacity) %>%
group_by(Country) %>% summarise(total_capacity = sum(Capacity))
mean_world_capacity <- mean(mean_world_capacity$total_capacity)
mean_world_capacity <- mean_world_capacity / 1000
european_powerplants <- continent_and_power_plant %>% filter(Continent == "Europe") %>% select(Continent, Country, Capacity) %>%
group_by(Country) %>% summarise(total_capacity = sum(Capacity))
european_powerplants$total_capacity <- european_powerplants$total_capacity / 1000
asian_powerplants <- continent_and_power_plant %>% filter(Continent == "Asia") %>% select(Continent, Country, Capacity) %>%
group_by(Country) %>% summarise(total_capacity = sum(Capacity))
asian_powerplants$total_capacity <- asian_powerplants$total_capacity / 1000
african_powerplants <- continent_and_power_plant %>% filter(Continent == "Africa") %>% select(Continent, Country, Capacity) %>%
group_by(Country) %>% summarise(total_capacity = sum(Capacity))
african_powerplants$total_capacity <- african_powerplants$total_capacity / 1000
american_powerplants <- continent_and_power_plant %>% filter(Continent == "North America" | Continent == "South America") %>% select(Continent, Country, Capacity) %>%
group_by(Country) %>% summarise(total_capacity = sum(Capacity))
american_powerplants$total_capacity <- american_powerplants$total_capacity / 1000
oceanian_powerplants <- continent_and_power_plant %>% filter(Continent == "Oceania") %>% select(Continent, Country, Capacity) %>%
group_by(Country) %>% summarise(total_capacity = sum(Capacity))
oceanian_powerplants$total_capacity <- oceanian_powerplants$total_capacity / 1000
theme_set(theme_bw())
european_powerplants %>% ggplot(aes(x = reorder(Country, total_capacity), y = total_capacity)) +
geom_point(size = 2, colour = "blue") +
geom_segment(aes(xend = Country, y = mean_world_capacity, yend = total_capacity), size = 1) +
theme_minimal() + labs(y = "Total capacity by country (GW)", x = "Country") +
coord_flip()
theme_set(theme_bw())
asian_powerplants %>% ggplot(aes(x = reorder(Country, total_capacity), y = total_capacity)) +
geom_point(size = 2, colour = "blue") +
geom_segment(aes(xend = Country, y = mean_world_capacity, yend = total_capacity), size = 1) +
theme_minimal() + labs(y = "Total capacity by country (GW)", x = "Country") +
coord_flip()
theme_set(theme_bw())
african_powerplants %>% ggplot(aes(x = reorder(Country, total_capacity), y = total_capacity)) +
geom_point(size = 2, colour = "blue") +
geom_segment(aes(xend = Country, y = mean_world_capacity, yend = total_capacity), size = 1) +
theme_minimal() + labs(y = "Total capacity by country (GW)", x = "Country") +
coord_flip()
theme_set(theme_bw())
american_powerplants %>% ggplot(aes(x = reorder(Country, total_capacity), y = total_capacity)) +
geom_point(size = 2, colour = "blue") +
geom_segment(aes(xend = Country, y = mean_world_capacity, yend = total_capacity), size = 1) +
theme_minimal() + labs(y = "Total capacity by country (GW)", x = "Country") +
coord_flip()
theme_set(theme_bw())
oceanian_powerplants %>% ggplot(aes(x = reorder(Country, total_capacity), y = total_capacity)) +
geom_point(size = 2, colour = "blue") +
geom_segment(aes(xend = Country, y = mean_world_capacity, yend = total_capacity), size = 1) +
theme_minimal() + labs(y = "Total capacity by country (GW)", x = "Country") +
coord_flip()
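The outlier effect mentioned for the lollipop charts, where a few very large countries pull the mean far above the typical value, can be illustrated with five invented capacities. The country labels A to E and the numbers are hypothetical.

```r
# Toy total capacities in GW (values are made up); E plays the role of an outlier
total_capacity <- c(A = 2, B = 3, C = 4, D = 5, E = 986)

mean_capacity <- mean(total_capacity)
median_capacity <- median(total_capacity)

# Countries that fall below the mean
below_mean <- names(total_capacity)[total_capacity < mean_capacity]

print(c(mean = mean_capacity, median = median_capacity))
print(below_mean)
```

With one dominant country, four of the five fall below the mean even though they are close to the median, so a median reference line might be a useful complement to the mean on such charts.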
On this graph we compare population, GDP and total power capacity. To be able to distinguish the countries, we applied a log transformation to population and GDP. That way, the countries that are not world powers are also visible instead of being clustered in the corner. China and the US stand out. Interestingly, their total power capacity is similar, despite their different GDP and drastically different populations.
country_and_continent_gdp <- total %>% select(Country, Continent, GDP_nominal, Population_2017)
total_capacity_by_country <- power_plant_data %>% select(Country, Capacity) %>% group_by(Country) %>% summarise(total_capacity = sum(Capacity))
final_dataset <- merge(country_and_continent_gdp, total_capacity_by_country)
final_dataset$Population_2017 <- log(final_dataset$Population_2017)
final_dataset$GDP_nominal <- log(final_dataset$GDP_nominal)
ggplot(final_dataset, aes(x=Population_2017, y=GDP_nominal, size = total_capacity, color = Continent)) +
geom_point(alpha=0.5) +
theme_minimal()
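Why the log transformation spreads the smaller countries out can be seen on three invented population sizes, each 100 times larger than the previous one. The names and values below are hypothetical.

```r
# Toy populations (values are made up), each 100 times the previous one
population <- c(Small = 1e5, Medium = 1e7, Large = 1e9)

# On the raw scale the two smaller values are squeezed next to zero
raw_share <- population / max(population)

# On the log scale equal ratios become equal distances
log_pop <- log(population)
log_steps <- diff(log_pop)

print(raw_share)
print(log_steps)
```

An alternative that would keep the axis labels in the original units is ggplot2's `scale_x_log10()` and `scale_y_log10()`, applied to the untransformed columns.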
Lastly, we present countries by the number of power plants using a word cloud, just to avoid using another bar plot. Although less professional than a bar plot, the word cloud shows what is most important. We can see that the US has nearly three times as many power plants as China. This means that the two biggest greenhouse gas emitters, which still rely heavily on gas, oil and coal and are also the two biggest contributors by power capacity, have very different structures. Both of these players should slowly converge toward the European, progressive approach of renewable energy. The question is, for which one will it be more difficult?
countries_counted <- power_plant_data %>% count(Country)
wordcloud2(data = countries_counted, size = 0.3)
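Since a word cloud does not show exact numbers, the counts behind it can also be inspected as a sorted table. The sketch below does this in base R on an invented list of plants; the data frame `plants` is hypothetical and much smaller than the real one.

```r
# Toy plant list with one row per plant (values are made up)
plants <- data.frame(Country = c("US", "China", "US", "US", "China", "Poland"))

# Number of plants per country, sorted from most to fewest
counts <- as.data.frame(table(plants$Country), stringsAsFactors = FALSE)
names(counts) <- c("Country", "n")
counts <- counts[order(-counts$n), ]
print(counts)

# Ratio between two countries of interest
ratio <- counts$n[counts$Country == "US"] / counts$n[counts$Country == "China"]
print(ratio)
```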
It would be interesting to see which country faces a more difficult task in converting their energy strategy based on the number of existing powerplants.
https://www.kaggle.com/srikantsahu/co2-and-ghg-emission-data
https://www.worldometers.info/world-population/population-by-country/
http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html
https://www.datanovia.com/en/blog/ggplot-colors-best-tricks-you-will-love/
http://www.sthda.com/english/wiki/ggplot2-axis-ticks-a-guide-to-customize-tick-marks-and-labels
https://statisticsglobe.com/change-font-size-of-ggplot2-plot-in-r-axis-text-main-title-legend