Source: International Energy Statistics, Global trade and production 1990-2014
Dataset: www.kaggle.com/unitednations/international-energy-statistics
install.packages("tidyverse")
##
## The downloaded binary packages are in
## /var/folders/f2/clwdhl813kv3ls5yc62jk9v40000gn/T//RtmpHeXB5x/downloaded_packages
install.packages("readr")
##
## The downloaded binary packages are in
## /var/folders/f2/clwdhl813kv3ls5yc62jk9v40000gn/T//RtmpHeXB5x/downloaded_packages
install.packages("Metrics")
##
## The downloaded binary packages are in
## /var/folders/f2/clwdhl813kv3ls5yc62jk9v40000gn/T//RtmpHeXB5x/downloaded_packages
library(tidyverse)
library(dplyr)
library(ggplot2)
library(plotly)
library(RColorBrewer)
library(viridis)
library(Metrics)
library(splines)
library(mgcv)
Contains the energy sectors, sources, transactions such as exports and imports, fuels in transit in ships or airplanes, consumption by various sectors (power, manufacturing industry, commercial, residential, transport), known reserves in the case of fossil and nuclear resources, and built capacity in the case of electrical and nuclear power plant. In this project we will work primarily with the calculation of “total energy supply”. From the database, it is implied that energy supply is a combination of other commodities: Total Energy Supply = Production + Imports - Exports - Marine or Aviation in transit - Change in Stock
energy <- read.csv("/Users/raulmiranda/Desktop/DATA 110 Fall 2020/Databases/Kaggle-IntlEnergyStats90-14/all_energy_statistics.csv")
head(energy)
## country_or_area commodity_transaction year
## 1 Austria Additives and Oxygenates - Exports 1996
## 2 Austria Additives and Oxygenates - Exports 1995
## 3 Belgium Additives and Oxygenates - Exports 2014
## 4 Belgium Additives and Oxygenates - Exports 2013
## 5 Belgium Additives and Oxygenates - Exports 2012
## 6 Belgium Additives and Oxygenates - Exports 2011
## unit quantity quantity_footnotes category
## 1 Metric tons, thousand 5 NA additives_and_oxygenates
## 2 Metric tons, thousand 17 NA additives_and_oxygenates
## 3 Metric tons, thousand 0 NA additives_and_oxygenates
## 4 Metric tons, thousand 0 NA additives_and_oxygenates
## 5 Metric tons, thousand 35 NA additives_and_oxygenates
## 6 Metric tons, thousand 25 NA additives_and_oxygenates
str(energy)
## 'data.frame': 1189482 obs. of 7 variables:
## $ country_or_area : chr "Austria" "Austria" "Belgium" "Belgium" ...
## $ commodity_transaction: chr "Additives and Oxygenates - Exports" "Additives and Oxygenates - Exports" "Additives and Oxygenates - Exports" "Additives and Oxygenates - Exports" ...
## $ year : int 1996 1995 2014 2013 2012 2011 2010 2009 1998 1995 ...
## $ unit : chr "Metric tons, thousand" "Metric tons, thousand" "Metric tons, thousand" "Metric tons, thousand" ...
## $ quantity : num 5 17 0 0 35 25 22 45 1 7 ...
## $ quantity_footnotes : int NA NA NA NA NA NA NA NA NA NA ...
## $ category : chr "additives_and_oxygenates" "additives_and_oxygenates" "additives_and_oxygenates" "additives_and_oxygenates" ...
USenergy2014 <- filter (energy, country_or_area == "United States", year == "2014")
USenergy2014cat <- filter(USenergy2014, grepl(paste(c("production$", "final energy consumption", "total energy supply","imports","exports","main activity$"),collapse = "|"), commodity_transaction, ignore.case = TRUE))
#clean to avoid duplication - double counting of some energy commodities
USenergy2014cat <- filter(USenergy2014cat, !grepl ("gross production", commodity_transaction, ignore.case=TRUE)) # remove gross production; duplication with net production
USenergy2014cat <- filter(USenergy2014cat, !grepl("capacity", commodity_transaction, ignore.case=TRUE)) # remove capacity, not an actual supply
USenergy2014cat <- filter(USenergy2014cat, !grepl("Uranium - production", commodity_transaction, ignore.case=TRUE)) # remove U production, not an actual supply
USenergy2014cat <- filter(USenergy2014cat, !grepl("Of which", commodity_transaction, ignore.case=TRUE)) # remove "of which", duplicates other commodities
Each type of energy source has different energy content (heating value), requiring different conversion factors. For simplification, all hydrocarbon liquids are equivalent to oils, all hydrocarbon gases and light liquids are equivalent to natural gas, all solid hydrocarbons are equivalent to coals. 1 TOE = 41.868 GJ (metric ton of oil equivalent) 1 TCE = 29.3076 GJ (metric ton of coal equivalent) 1 MWH = 3.6 GJ (megawatt-hour)
1 QUAD = 10^15 BTU = 1.055x10^18 joules (1.055 EJ)
Conversion factors used: oils - 41.868 coals - 29.3076 bitumen - 40.2 gas - 52.9369 electricity - 3.6 fuelwood - 14.286
USenergy2014catUC <- within(USenergy2014cat, {
f <- unit == 'Metric tons, thousand' & grepl(paste( c('additives_and_oxygenates','aviation_gasoline','biodiesel',
'biogasoline','conventional_crude_oil','fuel_oil',
'gas_oil_diesel_oil','kerosene_type_jet_fuel','lubricants',
'motor_gasoline','naphtha','other_hydrocarbons',
'other_kerosene','other_liquid_biofuels','other_oil_products_n_e_c',
'paraffin_waxes','refinery_feedstocks'),
collapse="|"), category)
quantity[f] <- quantity[f]*41.868
unit[f] <- 'Terajoules' })
USenergy2014catUC <- within(USenergy2014catUC, {
f <- unit == 'Metric tons, thousand' & grepl(paste( c('anthracite','brown_coal','charcoal','coking_coal',
'hard_coal','lignite','other_bituminous_coal',
'petroleum_coke','sub_bituminous_coal'), collapse="|"), category)
quantity[f] <- quantity[f]*29.3076
unit[f] <- 'Terajoules' })
USenergy2014catUC <- within(USenergy2014catUC, {
f <- unit == 'Metric tons, thousand' & category =='bagasse'
quantity[f] <- quantity[f]*9.5
unit[f] <- 'Terajoules' })
USenergy2014catUC <- within(USenergy2014catUC, {
f <- unit == 'Metric tons, thousand' & category =='bitumen'
quantity[f] <- quantity[f]*40.2
unit[f] <- 'Terajoules' })
USenergy2014catUC <- within(USenergy2014catUC, {
f <- unit == 'Metric tons, thousand' & grepl(paste( c('ethane','liquified_petroleum_gas','natural_gas_liquids',
'refinery_gas','white_spirit_and_special_boiling_point_industrial_spirits'), collapse="|"), category)
quantity[f] <- quantity[f]*52.9369
unit[f] <- 'Terajoules' })
USenergy2014catUC <- within(USenergy2014catUC, {
f <- unit == 'Cubic metres, thousand' & category =='fuelwood'
quantity[f] <- quantity[f]*14.286
unit[f] <- 'Terajoules' })
USenergy2014catUC <- within(USenergy2014catUC, {
f <- unit == 'Kilowatt-hours, million'
quantity[f] <- quantity[f]*3.6
unit[f] <- 'Terajoules' })
# clean up the resulting dataframe
USenergy2014catUC$f <- NULL # remove the added column 'f'
USenergy2014catUC$quantity_footnotes <- NULL # remove the footnote column
USenergy2014catUC$commodity_transaction <- gsub("White spirit and special boiling point industrial spirits", "White spirit", USenergy2014catUC$commodity_transaction) # shorten the long name
USenergy2014supply <- filter(USenergy2014catUC, grepl ("Total energy supply", commodity_transaction, ignore.case = TRUE)) %>% arrange(quantity) %>% mutate(cumulative = cumsum(quantity))
#USenergy2014supply$commodity_transaction <- factor(USenergy2014supply$commodity_transaction, levels = USenergy2014supply$commodity_transaction)
#Converting to factor leaves the order in the x-axis as appears in the dataframe; otherwise geom_col reorders the string labels alphabetically; below I've used scale_x_discrete (limits) to do the same without converting commodity_transactions to factors.
p1 <- ggplot(data=USenergy2014supply, aes(x=commodity_transaction, y=quantity)) +
geom_col(stat = "identity", color="red") + # barplot
coord_flip() + theme_classic() + # flip coordinates and display classic theme
scale_x_discrete(limits=USenergy2014supply$commodity_transaction, # for x-axis ticks, follow the order of dataframe
label=function(x) gsub("- total energy supply", "",x, ignore.case = TRUE) ) + # for labels, remove unnecessary wording
labs(title="Fig. 1 - US Energy Supplies in 2014 (in TeraJoules)", subtitle = "1.e7 TJ = 9.5 Quads= 9.5e15 BTU", # title and subtitle
caption = "Source: Dataset from International Energy Statistics, Global trade and production 1990-2014\n www.kaggle.com/unitednations/international-energy-statistics") + # add a caption
ylab("Energy Supplied by Each Source (TJ)") + theme(axis.title.y = element_blank()) # label the y axis - will display as x axis
p1 # plot shows all labels
ggplotly(p1) # doesn't show subtitle or caption
p2 <- ggplot(data=USenergy2014supply, aes(x=commodity_transaction, y=quantity)) +
geom_col(stat = "identity", color="red") + # barplot
coord_flip() + theme_classic() + # flip coordinates and display classic theme
geom_point(aes(y=cumulative/4)) +
scale_y_continuous(sec.axis= sec_axis(~.*4, name ="Cumulative Energy, TJ")) + # add a second y axis for cumulative
scale_x_discrete(limits=USenergy2014supply$commodity_transaction, # for x-axis ticks, follow the order of dataframe
label=function(x) gsub("- total energy supply", "",x, ignore.case = TRUE) ) + # for labels, remove unnecessary wording
labs(title="Fig. 2 - US Energy Supplies in 2014 (in TeraJoules)", subtitle = "1.e7 TJ = 9.5 Quads= 9.5e15 BTU",#title and subtitle
caption = "Source: Dataset from International Energy Statistics, Global trade and production 1990-2014\n www.kaggle.com/unitednations/international-energy-statistics") + # add a caption
ylab("Energy Supplied by Each Source (TJ)") + theme(axis.title.y = element_blank()) # label the y axis - will display as x axis
p2 # plot shows all labels
ggplotly(p2) # doesn't show subtitle or caption
USenergy <- filter (energy, country_or_area == "United States")
USenergycat <- filter(USenergy, grepl(paste(c("production$", "final energy consumption", "total energy supply","imports","exports","main activity$"),collapse = "|"), commodity_transaction, ignore.case = TRUE))
#clean to avoid duplication - double counting of some energy commodities
USenergycat <- filter(USenergycat, !grepl ("gross production", commodity_transaction, ignore.case=TRUE)) # remove gross production; duplication with net production
USenergycat <- filter(USenergycat, !grepl("capacity", commodity_transaction, ignore.case=TRUE)) # remove capacity, not an actual supply
USenergycat <- filter(USenergycat, !grepl("Uranium - production", commodity_transaction, ignore.case=TRUE)) # remove U production, not an actual supply
USenergycat <- filter(USenergycat, !grepl("Of which", commodity_transaction, ignore.case=TRUE)) # remove "of which", duplicates other commodities
USenergycatUC <- within(USenergycat, {
f <- unit == 'Metric tons, thousand' & grepl(paste( c('additives_and_oxygenates','aviation_gasoline','biodiesel',
'biogasoline','conventional_crude_oil','fuel_oil',
'gas_oil_diesel_oil','kerosene_type_jet_fuel','lubricants',
'motor_gasoline','naphtha','other_hydrocarbons',
'other_kerosene','other_liquid_biofuels','other_oil_products_n_e_c',
'paraffin_waxes','refinery_feedstocks',
'gasoline_type_jet_fuel'),
collapse="|"), category)
quantity[f] <- quantity[f]*41.868
unit[f] <- 'Terajoules' })
USenergycatUC <- within(USenergycatUC, {
f <- unit == 'Metric tons, thousand' & grepl(paste( c('anthracite','brown_coal','charcoal','coking_coal',
'hard_coal','lignite','other_bituminous_coal',
'petroleum_coke','sub_bituminous_coal',
'patent_fuel','peat'),
collapse="|"), category)
quantity[f] <- quantity[f]*29.3076
unit[f] <- 'Terajoules' })
USenergycatUC <- within(USenergycatUC, {
f <- unit == 'Metric tons, thousand' & category =='bagasse'
quantity[f] <- quantity[f]*9.5
unit[f] <- 'Terajoules' })
USenergycatUC <- within(USenergycatUC, {
f <- unit == 'Metric tons, thousand' & category =='bitumen'
quantity[f] <- quantity[f]*40.2
unit[f] <- 'Terajoules' })
USenergycatUC <- within(USenergycatUC, {
f <- unit == 'Metric tons, thousand' & grepl(paste( c('ethane','liquified_petroleum_gas','natural_gas_liquids',
'refinery_gas','white_spirit_and_special_boiling_point_industrial_spirits'), collapse="|"), category)
quantity[f] <- quantity[f]*52.9369
unit[f] <- 'Terajoules' })
USenergycatUC <- within(USenergycatUC, {
f <- unit == 'Cubic metres, thousand' & category =='fuelwood'
quantity[f] <- quantity[f]*14.286
unit[f] <- 'Terajoules' })
USenergycatUC <- within(USenergycatUC, {
f <- unit == 'Kilowatt-hours, million'
quantity[f] <- quantity[f]*3.6
unit[f] <- 'Terajoules' })
# clean up the resulting dataframe
USenergycatUC$f <- NULL # remove the added column 'f'
USenergycatUC$quantity_footnotes <- NULL # remove the footnote column
USenergycatUC$commodity_transaction <- gsub("White spirit and special boiling point industrial spirits", "White spirit", USenergycatUC$commodity_transaction) # shorten the long name
USenergysupply <- filter(USenergycatUC, grepl ("Total energy supply", commodity_transaction, ignore.case = TRUE))
USenergysupplyannual <- group_by(USenergysupply, year) %>% summarise(totquantity = sum(quantity), .groups = "keep")
p3 <- ggplot(USenergysupplyannual) +
geom_point(aes(year, totquantity), color="turquoise1", size=2) +
geom_smooth (aes(year, totquantity), color="blue", se = FALSE ) +
theme_dark() +
labs(title="Fig. 3 - Total Energy Supply", subtitle = ,
caption = "Units: 1.e7 TJ = 9.5 Quads= 9.5e15 BTU\nSource: Dataset from International Energy Statistics, Global trade and production 1990-2014\n www.kaggle.com/unitednations/international-energy-statistics") +
ylab("Energy Supplied (TJ)") + theme(axis.title.x = element_blank())
p3
ggplotly(p3)
y= Energy Supplied, x= Year
y <- USenergysupplyannual$totquantity
x <- USenergysupplyannual$year
print (paste("Correlation coeff= ", as.character(cor(y, x)))) # get an idea of a linear correlation y ~ x
## [1] "Correlation coeff= 0.650545053850859"
model1 <- lm(totquantity ~ year, data = USenergysupplyannual) # linear model
summary(model1)
##
## Call:
## lm(formula = totquantity ~ year, data = USenergysupplyannual)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12561453 -7173754 262660 6821474 11727736
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.591e+09 4.247e+08 -3.747 0.00105 **
## year 8.714e+05 2.121e+05 4.108 0.00043 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7648000 on 23 degrees of freedom
## Multiple R-squared: 0.4232, Adjusted R-squared: 0.3981
## F-statistic: 16.88 on 1 and 23 DF, p-value: 0.0004301
print ("Confidence interval")
## [1] "Confidence interval"
confint(model1)
## 2.5 % 97.5 %
## (Intercept) -2469637961.6 -712591066
## year 432604.6 1310245
prediction1 <- model1 %>% predict(USenergysupplyannual)
print (paste ("RMSE = ", as.character(rmse(prediction1, y))))
## [1] "RMSE = 7336069.57310496"
# quick plot of fitted model
ggplot(data=USenergysupplyannual, aes(x,y)) +
geom_point() +
stat_smooth(method = lm, formula = y ~ x)
# quick plot of residuals
ggplot(data=USenergysupplyannual, aes(x,model1$residuals)) +
geom_point()
# other stats:
# coef(model1)
# sigma(model)*100/mean(USenergysupplyannual$totquantity)
# function for RMSE: RMSE <- function(error) {sqrt(mean(error^2))}
# call as RMSE(model$residuals)
# functions for R2 and R2 adj: R2 <- summary(model)$r.squared; R2adj <- summary(model)$adj.r.squared
# another way to calc: R2 <- function(preds, actual) {return(1 - sum((preds - actual)^2) / sum(actual - mean(actual))^2) }
# call as R2(model$residuals,USenergysupplyannual$totquantity)
Generative additive models optimize the knots for a spline regression fit y= Energy Supplied, x= Year
model2 <- gam(y ~ s(x), data=USenergysupplyannual) # generative additive model
summary(model2)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ s(x)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 153477718 459825 333.8 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(x) 6.441 7.584 55.23 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.946 Deviance explained = 96%
## GCV = 7.5261e+12 Scale est. = 5.286e+12 n = 25
prediction2 <- model2 %>% predict(USenergysupplyannual)
## Warning in predict.gam(., USenergysupplyannual): not all required variables have been supplied in newdata!
print (paste ("RMSE = ", as.character(rmse(prediction2, y))))
## [1] "RMSE = 1926809.16112615"
#quick plot
ggplot(USenergysupplyannual, aes(x,y)) +
geom_point() +
stat_smooth(method = gam, formula = y ~ s(x))
# quick plot of residuals
ggplot(data=USenergysupplyannual, aes(x,model2$residuals)) +
geom_point()
### Conclusion from the Regression Analysis
Clearly the GAM, Model 2, fits much better than the linear Model 1. Model 1: adjR2=.398, P-value=.0004, t-value=4.1, F=17, RMSE=7336069 Model 2: adjR2=.946, P-value=2e-16, t-value=333, F=55, RMSE=1926809
The R2 close to 1, P closer to 0, RMSE smaller, and t and F higher, represent a “better” fit for Model 2.
Group by category (that is, types of sources of energy) and display
USenergysuppcatann <- group_by(USenergysupply, category, year) %>% summarise(totquantity = sum(quantity), .groups = "keep")
p4 <- ggplot(USenergysuppcatann) +
geom_point(aes(year, totquantity, colour = category)) +
geom_smooth (aes(year, totquantity, colour = category), se = FALSE ) +
theme_bw() +
theme(legend.position="bottom", legend.title = element_blank()) +
labs(fill= "Energy Source", title="Fig. 4 - US Fossil Sources are Diversified", subtitle = ,
caption = "Units: 1.e7 TJ = 9.5 Quads= 9.5e15 BTU\nSource: Dataset from International Energy Statistics, Global trade and production 1990-2014\n www.kaggle.com/unitednations/international-energy-statistics") +
ylab("Energy Supplied (TJ)") + theme(axis.title.x = element_blank())
p4
ggplotly(p4)
USenergysuppcatannarg <-arrange(USenergysuppcatann, desc(totquantity))
mycolors <- colorRampPalette(brewer.pal(8, "Dark2")) (length(unique (USenergysuppcatannarg$category)))
p6 <- ggplot(USenergysuppcatannarg, aes(x=year, y=totquantity, fill= factor(category))) +
geom_area(stat="identity") +
scale_fill_manual(values = mycolors) +
theme_classic() +
theme(legend.position="bottom", legend.title = element_blank()) +
labs(fill= "Energy Source", title="Fig. 5 - US Fossil Sources are Diversified", subtitle = ,
caption = "Units: 1.e7 TJ = 9.5 Quads= 9.5e15 BTU\nSource: Dataset from International Energy Statistics, Global trade and production 1990-2014\n www.kaggle.com/unitednations/international-energy-statistics") +
ylab("Energy Supplied (TJ)") + theme(axis.title.x = element_blank())
p6
ggplotly(p6)
Hypothesis: are renewables increasing?
USenergyelectr <- filter(USenergycatUC, grepl ("electricity", category, ignore.case = TRUE) & grepl(paste(c("production","main activity"),collapse = "|"), commodity_transaction, ignore.case = TRUE)) %>%
filter(!grepl(paste(c("total production, main activity","net production"),collapse = "|"), commodity_transaction, ignore.case = TRUE)) %>%
filter(!grepl(paste(c("Solar – Main activity","Electricity - total"),collapse = "|"), commodity_transaction, ignore.case = TRUE))
USenergyelectcatann <- group_by(USenergyelectr, commodity_transaction, year) %>% summarise(totquantity = sum(quantity), .groups = "keep")
mycolors <- colorRampPalette(brewer.pal(8, "Accent")) (length(unique (USenergyelectcatann$commodity_transaction)))
p7 <- ggplot(USenergyelectcatann) +
geom_point(aes(year, totquantity, colour = commodity_transaction)) +
geom_smooth (aes(year, totquantity, colour = commodity_transaction), se = FALSE ) +
theme_dark() +
theme(legend.position="bottom", legend.title = element_blank()) +
labs(fill= "Energy Source", title="Fig. 6 - Electrical Energy from Multiple Sources", subtitle = ,
caption = "Units: 1.e7 TJ = 9.5 Quads= 9.5e15 BTU\nSource: Dataset from International Energy Statistics, Global trade and production 1990-2014\n www.kaggle.com/unitednations/international-energy-statistics") +
ylab("Energy from Each Source (TJ)") + theme(axis.title.x = element_blank())
p7
ggplotly(p7)
mycolors <- colorRampPalette(brewer.pal(8, "Paired")) (length(unique (USenergyelectcatann$commodity_transaction)))
p8 <- ggplot(USenergyelectcatann, aes(x=year, y=totquantity, fill= factor(commodity_transaction))) +
geom_area(stat="identity") +
scale_fill_manual(values = mycolors) +
theme_dark() +
theme(legend.position="bottom", legend.title = element_blank()) +
labs(fill= "Energy Source", title="Fig. 7 - Total Electrical Energy Share is Growing", subtitle = ,
caption = "Units: 1.e7 TJ = 9.5 Quads= 9.5e15 BTU\nSource: Dataset from International Energy Statistics, Global trade and production 1990-2014\n www.kaggle.com/unitednations/international-energy-statistics") +
ylab("Energy Supply (TJ)") + theme(axis.title.x = element_blank())
p8
ggplotly(p8)
USenergyelectannual <- group_by(USenergyelectcatann, year) %>% summarise(totelect = sum(totquantity), .groups = "keep")
USenergyannual <- merge(USenergysupplyannual, USenergyelectannual, by="year") %>% group_by(year) %>% mutate(totenergy = totquantity + totelect) %>% gather(key="energytype", value="totenergy", totquantity, totelect)
p9 <- ggplot(USenergyannual, aes(x=year, y=totenergy, fill= factor(energytype), text=energytype)) +
geom_area(stat="identity") +
scale_fill_manual(values = c("green2","blue4"), labels=c("Electrical","Oil, Gas, Biomass")) +
theme_classic() +
theme(legend.position="right", legend.title = element_blank()) +
labs(fill= "Energy Type", title="Fig. 8 - US Energy Relies on Fossil and Biomass Sources", subtitle = ,
caption = "Units: 1.e7 TJ = 9.5 Quads= 9.5e15 BTU\nSource: Dataset from International Energy Statistics, Global trade and production 1990-2014\n www.kaggle.com/unitednations/international-energy-statistics") +
ylab("Total Energy Supplied (TJ)") + theme(axis.title.x = element_blank())
p9
ggplotly(p9)
### Change to long and calculate total energy
USenergyannual2 <- spread(USenergyannual, key = "energytype", value = "totenergy") %>% mutate(totenergy= totquantity+totelect)
p10 <- ggplot(USenergyannual2, aes(x=year, y=totenergy, fill=totelect)) +
geom_bar(stat = "identity") +
theme_classic() +
labs(fill= "Electrical Energy (TJ)", title="Fig. 9 - US Energy Supply is Becoming Greener", subtitle = ,
caption = "Units: 1.e7 TJ = 9.5 Quads= 9.5e15 BTU\nSource: Dataset from International Energy Statistics, Global trade and production 1990-2014\n www.kaggle.com/unitednations/international-energy-statistics") +
ylab("Total Energy Supplied (TJ)") + theme(axis.title.x = element_blank()) +
scale_fill_gradient(low= "darkgreen", high= "green2")
p10 # plot shows all labels
ggplotly(p10) # doesn't show subtitle or caption
R. Miranda November 17, 2020
International Energy Statistics, Global trade and production 1990-2014 www.kaggle.com/unitednations/international-energy-statistics
The rapid rise of energy consumption is a mark of the successful expansion of humanity’s standard of living after the industrial revolution of the 19th century. Increasing demand for modern industries, transportation, housing, etc., has pressed communities and countries to extract more natural resources to supply their energy needs. Although we have not yet exhausted our resources, the rapidly increasing demand has put pressures on the planet that need to be dealt with now. The United Nations has called for action and declared seventeen Sustainable Development Goals to address all global challenges necessitating urgent solutions [1]. Goal 7 is Affordable and Clean Energy, noting that 3 billion people on this planet still lack clean cooking fuels and technologies.
Although this project does nothing to address such challenge, it brings attention to the variety of energy sources in use today in the U.S. Accounting for those supplies is the first step to understanding where the U.S. is taking action and could be taking more action towards one day achieving the goal of totally clean energy.
The Energy dataset describes transactions between different types of energy sources and consumption points for 180 countries during the period of 1990 to 2014. It includes all energy sectors, types of energy sources, transactions such as exports and imports, fuels during transit on ships or airplanes, consumption by various sectors (power, manufacturing industry, commercial, residential, transportation), known reserves in the case of fossil and nuclear resources, and built capacity in the case of electrical and nuclear power plants.
The dataset contains 1,189,482 records with 7 variables in a long format, which include Country, Year, Commodity or Transaction, Quantity of Energy, Unit, Category, and Footnotes. Year and Quantity are numerical variables and the others are categorical variables. The Commodity or Transaction variable describes the type of energy source (such as a specific type of coal), and whether it pertains to production, a consumption point (such as industrial manufacturing), or fuel in transport on ships or airplanes, or if it is imported, exported, or taken out of reserves (stocks). The Category variable aggregates all commodities of the same type, such as a specific type of coal, regardless of the type of transaction or whether it is a source or consumption point.
To understand the relationship between the various types of Commodity or Transactions, I had to spread that variable from long to wide, which originated 275 variables –not all of which are independent– and 4325 records. After examination, I concluded that Total Energy Supply is a combination of other transactions as follows:
Total Energy Supply = Production + Imports - Exports - Change in Stock - in transit (Marine or Aviation)
Analysis of the meaning of each type of Commodity or Transaction and interpretation of the data took the majority of the time in this project.
An additional complication was the variety of Units of energy used, which differed according to type of source (coal vs. hydrocarbons vs. gas vs. electrical, etc.) and even between different grades of coal (anthracite vs. bitumen) or biomass (bagasse vs. fuelwood). I decided to convert all units to Tera Joules (TJ or 10^12 J), the SI (Systeme Internationale) unit of energy, in order to make meaningful comparisons among various types of source. I must note that large quantities of energy in the U.S. literature are measured in Quad (quadrillion or 10^15) BTUs [2], with the equivalence of 1 Quad = 1.055x10^6 TJ. The charts and visualizations have scales divided into 10^7 TJ, that is, each division being approximately equal to 10 Quad.
The conversion factors used assumed that the energy content of all hydrocarbon liquids is similar to oils, all hydrocarbon gases and light liquids similar to natural gas, and all solid hydrocarbons similar to coal [3].
1 TOE = 41.868 GJ (metric ton of oil equivalent), i.e., 1000 TOE = 41.868 TJ 1 TCE = 29.3076 GJ (metric ton of coal equivalent), i.e., 1000 TCE = 29.3076 TJ 1 MWH = 3.6 GJ (megawatt-hour), i.e., 1000 MWH = 3.6 TJ
1 QUAD = 10^15 BTU = 1.055x10^18 joules = 1.055 EJ = 1.055x10^6 TJ For reference: the world energy consumption is about 575 Quad = 6.07x10^8 TJ, and the US consumption is about 100 Quad = 1.06x10^8 TJ [4]
Conversion factors used: oils - 41.868 coals - 29.3076 bitumen - 40.2 gas - 52.9369 electricity - 3.6 fuelwood - 14.286
An initial step for exploration of the data, was to reduce the dataset to just the U.S., 19847 records, and one year, 2014. To make this analysis feasible in the limited time available, only a few of the Commodity or Transactions were selected: all those containing the terms Production, Final Energy Consumption, Total Energy Supply, Imports, Exports, and Main activity.
As denoted above Total Energy Supply depends upon Production, Imports, Exports, and constitutes a reasonable measure for further exploration. Before proceeding, we had to convert units for all commodities to TJ, and arranged all commodities from large to small.
Fig. 1 shows that the largest sources in the U.S. are of fossil origin: hydrocarbons, natural gas, and various types of coals. The fourth most important source is biomass-derived fuels of various types. Some interesting sources such as waste of various types appear, but not prominently.
Fig. 2 adds a second y-axis (or x-axis in the flipped graph) to show cumulative energy. The U.S. collects about 1.5x10^8 TJ of energy measured by “Total Energy Supply”. This is 50% larger than the total consumption. The likely explanation is that a significant part of the energy supplied is actually wasted due to inefficiency and not being able to be consumed. This points out the fact that a large “source” of energy is the improvement of energy efficiency. That is a major scientific and technological target in the U.S.[5]
We increased the subset of U.S. data to include all years, and repeated the downselection of Commodity or Transactions, and converted all units to TJ, as before. For exploration, we grouped all sources (Commodity or Transactions) by year in order to display Total Energy Supply by year.
Fig. 3 shows the surprising result that Total Energy Supply peaked at 1.65x10^8 TJ around 2005 and underwent a decrease after that. We explored the possible reason later in the project.
This figure was a good candidate for some statistical analysis.
We carried out a linear regression analysis using the lm() function (Model 1), calculating predicted values, and displaying quick charts of fit and residuals. Obviously the linear regression is a very poor fit as shown visually and by the statistics calculated. Then we resorted to a generative additive model (Model 2), which is an optimized splines fit, using the gam() function. Again we calculated predicted values, displayed quick charts of fit and residuals. This gam is a much better fit as shown by the statistics:
Model 1 (linear): adjR2=.398, P-value=.0004, t-value=4.1, F=17, RMSE=7336069 Model 2 (GAM): adjR2=.946, P-value=2e-16, t-value=333, F=55, RMSE=1926809 The R^2 close to 1, P closer to 0, RMSE smaller, and t and F higher, represent a “better” fit for Model 2.
We re-grouped all sources by Category (types of sources) and by year, to show the diversity of sources.
Fig. 4, a geom_point and smoothed line chart, displays the evolution of the various sources during the lapse.
Fig. 5 is the same data but in a geom_area plot, which accumulates the sources by year and shows each contribution. It confirms the statement that the dominant fossil sources are hydrocarbons, natural gas, coals; biofuels are also dominant.
There’s is still a nagging question of why there is a peak in 2005.
A first assumption (hypothesis) was that renewable energy was not being property measured by “Total Energy Supply”. A larger fraction of renewable energy is expected over the past few years.
An examination of the data showed that renewable energy is mainly shown under Production and conversion of various sources into electricity. Thus nuclear, solar, wind, geothermal, hydroelectric and also thermal power plants, are all converted into Electrical Energy. Thus we filtered all commodities that included production or conversion of electrical energy, and re-grouped by Commodity or Transaction and year.
Fig. 6 now shows the various types of electrical energy sources in a geom_point and line format.
Fig. 7 shows the same data in an area plot, showing the fractional contributions per year. It confirms (or reveals) that the main sources are thermal, nuclear, hydroelectric, and wind. As shown, solar was still in its infancy in 2014. Point to note: thermal (by combustion) is actually considered “cleaner” than distributed combustion, because the combustion is made in controlled units where CO2 and other exhausts can be collected. This is the so-called “clean-coal technology.” Thus thermal electricity is considered clean.
The other important observation is that the electrical energy source was increasing in the 1990-2010 period, and stayed about constant afterwards.
The datasets of fossil plus biomass measured by “total energy supply” and the “electrical energy supply” were merged, grouped by year and displayed.
Fig. 8 shows now the total energy supply that includes fossil, biomass, and all electrical sources over time. The contribution of electrical (clean) energy is now apparent. The U.S. still depends primarily on fossil and biomass . Fig. 9 replots the same data but depicting the “greening” of the energy over time in a more optimistic and glowing fashion.
A lot can. The visualizations shown here are basic in terms of graphics. Not enough time was allocated to that aspect. Understanding the data was a major challenge. Many variables could have been explored further of course, in order to attain a complete “energy balance”. We did not answer satisfactorily the question of why all energy supply still appears to decay after 2005. It may be a deficiency in the dataset but that cannot be ascertained without a complete energy balance. For instance, we did not attempt to calculate all consumption. The flow of energy from source through use points (such as manufacturing and other industries) is potentially recoverable from the full dataset. Perhaps alluvial plots would be feasible, but getting the data appropriately sorted and arranged is a major challenge (I guess that’s why they featured this dataset in Kaggle…) The other study possible here is the global energy source evolution, and even that for particular countries of interest, with some geographical visualizations. This appears to be a major undertaking.