Energy Sector Evolution in the US, 1990-2014

Source: International Energy Statistics, Global trade and production 1990-2014

Dataset: www.kaggle.com/unitednations/international-energy-statistics

Load packages and libraries

install.packages("tidyverse")
## 
## The downloaded binary packages are in
##  /var/folders/f2/clwdhl813kv3ls5yc62jk9v40000gn/T//RtmpHeXB5x/downloaded_packages
install.packages("readr")
## 
## The downloaded binary packages are in
##  /var/folders/f2/clwdhl813kv3ls5yc62jk9v40000gn/T//RtmpHeXB5x/downloaded_packages
install.packages("Metrics")
## 
## The downloaded binary packages are in
##  /var/folders/f2/clwdhl813kv3ls5yc62jk9v40000gn/T//RtmpHeXB5x/downloaded_packages
library(tidyverse)
library(dplyr)
library(ggplot2)
library(plotly)
library(RColorBrewer)
library(viridis)
library(Metrics)
library(splines)
library(mgcv)

Read the dataset - over a million records in a long format, or about 300 variables in a wide format

Contains the energy sectors, sources, transactions such as exports and imports, fuels in transit in ships or airplanes, consumption by various sectors (power, manufacturing industry, commercial, residential, transport), known reserves in the case of fossil and nuclear resources, and built capacity in the case of electrical and nuclear power plant. In this project we will work primarily with the calculation of “total energy supply”. From the database, it is implied that energy supply is a combination of other commodities: Total Energy Supply = Production + Imports - Exports - Marine or Aviation in transit - Change in Stock

energy <- read.csv("/Users/raulmiranda/Desktop/DATA 110 Fall 2020/Databases/Kaggle-IntlEnergyStats90-14/all_energy_statistics.csv")
head(energy)
##   country_or_area              commodity_transaction year
## 1         Austria Additives and Oxygenates - Exports 1996
## 2         Austria Additives and Oxygenates - Exports 1995
## 3         Belgium Additives and Oxygenates - Exports 2014
## 4         Belgium Additives and Oxygenates - Exports 2013
## 5         Belgium Additives and Oxygenates - Exports 2012
## 6         Belgium Additives and Oxygenates - Exports 2011
##                     unit quantity quantity_footnotes                 category
## 1 Metric tons,  thousand        5                 NA additives_and_oxygenates
## 2 Metric tons,  thousand       17                 NA additives_and_oxygenates
## 3 Metric tons,  thousand        0                 NA additives_and_oxygenates
## 4 Metric tons,  thousand        0                 NA additives_and_oxygenates
## 5 Metric tons,  thousand       35                 NA additives_and_oxygenates
## 6 Metric tons,  thousand       25                 NA additives_and_oxygenates
str(energy)
## 'data.frame':    1189482 obs. of  7 variables:
##  $ country_or_area      : chr  "Austria" "Austria" "Belgium" "Belgium" ...
##  $ commodity_transaction: chr  "Additives and Oxygenates - Exports" "Additives and Oxygenates - Exports" "Additives and Oxygenates - Exports" "Additives and Oxygenates - Exports" ...
##  $ year                 : int  1996 1995 2014 2013 2012 2011 2010 2009 1998 1995 ...
##  $ unit                 : chr  "Metric tons,  thousand" "Metric tons,  thousand" "Metric tons,  thousand" "Metric tons,  thousand" ...
##  $ quantity             : num  5 17 0 0 35 25 22 45 1 7 ...
##  $ quantity_footnotes   : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ category             : chr  "additives_and_oxygenates" "additives_and_oxygenates" "additives_and_oxygenates" "additives_and_oxygenates" ...

Filter only the US and year 2014 data

USenergy2014 <- filter (energy, country_or_area == "United States", year == "2014")

Categorize types of energy commodity or transaction - select a few commodities - clean up

USenergy2014cat <- filter(USenergy2014, grepl(paste(c("production$", "final energy consumption", "total energy supply","imports","exports","main activity$"),collapse = "|"), commodity_transaction, ignore.case = TRUE))

#clean to avoid duplication - double counting of some energy commodities
USenergy2014cat <- filter(USenergy2014cat, !grepl ("gross production", commodity_transaction, ignore.case=TRUE))  # remove gross production; duplication with net production
USenergy2014cat <- filter(USenergy2014cat, !grepl("capacity", commodity_transaction, ignore.case=TRUE)) # remove capacity, not an actual supply
USenergy2014cat <- filter(USenergy2014cat, !grepl("Uranium - production", commodity_transaction, ignore.case=TRUE)) # remove U production, not an actual supply
USenergy2014cat <- filter(USenergy2014cat, !grepl("Of which", commodity_transaction, ignore.case=TRUE)) # remove "of which", duplicates other commodities

Convert all units to TeraJoules in order to make meaningful comparisons among various energy sectors

Each type of energy source has different energy content (heating value), requiring different conversion factors. For simplification, all hydrocarbon liquids are equivalent to oils, all hydrocarbon gases and light liquids are equivalent to natural gas, all solid hydrocarbons are equivalent to coals. 1 TOE = 41.868 GJ (metric ton of oil equivalent) 1 TCE = 29.3076 GJ (metric ton of coal equivalent) 1 MWH = 3.6 GJ (megawatt-hour)

1 QUAD = 10^15 BTU = 1.055x10^18 joules (1.055 EJ)

Conversion factors used: oils - 41.868 coals - 29.3076 bitumen - 40.2 gas - 52.9369 electricity - 3.6 fuelwood - 14.286

USenergy2014catUC <- within(USenergy2014cat, {
  f <- unit == 'Metric tons,  thousand' & grepl(paste( c('additives_and_oxygenates','aviation_gasoline','biodiesel',
                                                         'biogasoline','conventional_crude_oil','fuel_oil',
                                                         'gas_oil_diesel_oil','kerosene_type_jet_fuel','lubricants',
                                                         'motor_gasoline','naphtha','other_hydrocarbons',
                                                         'other_kerosene','other_liquid_biofuels','other_oil_products_n_e_c',
                                                         'paraffin_waxes','refinery_feedstocks'),
                                                       collapse="|"), category)
      quantity[f] <- quantity[f]*41.868
      unit[f] <- 'Terajoules' })

USenergy2014catUC <- within(USenergy2014catUC, {
  f <- unit == 'Metric tons,  thousand' & grepl(paste( c('anthracite','brown_coal','charcoal','coking_coal',
                                                         'hard_coal','lignite','other_bituminous_coal',
                                                         'petroleum_coke','sub_bituminous_coal'), collapse="|"), category)
      quantity[f] <- quantity[f]*29.3076
      unit[f] <- 'Terajoules' })

USenergy2014catUC <- within(USenergy2014catUC, {
  f <- unit == 'Metric tons,  thousand' & category =='bagasse'
      quantity[f] <- quantity[f]*9.5
      unit[f] <- 'Terajoules' })

USenergy2014catUC <- within(USenergy2014catUC, {
  f <- unit == 'Metric tons,  thousand' & category =='bitumen'
      quantity[f] <- quantity[f]*40.2
      unit[f] <- 'Terajoules' })

USenergy2014catUC <- within(USenergy2014catUC, {
  f <- unit == 'Metric tons,  thousand' & grepl(paste( c('ethane','liquified_petroleum_gas','natural_gas_liquids',
                     'refinery_gas','white_spirit_and_special_boiling_point_industrial_spirits'), collapse="|"), category)
      quantity[f] <- quantity[f]*52.9369
      unit[f] <- 'Terajoules' })

USenergy2014catUC <- within(USenergy2014catUC, {
  f <- unit == 'Cubic metres, thousand' & category =='fuelwood'
      quantity[f] <- quantity[f]*14.286
      unit[f] <- 'Terajoules' })

USenergy2014catUC <- within(USenergy2014catUC, {
  f <- unit == 'Kilowatt-hours, million'
      quantity[f] <- quantity[f]*3.6
      unit[f] <- 'Terajoules' })

# clean up the resulting dataframe
USenergy2014catUC$f <- NULL # remove the added column 'f'
USenergy2014catUC$quantity_footnotes <- NULL  # remove the footnote column
USenergy2014catUC$commodity_transaction <- gsub("White spirit and special boiling point industrial spirits", "White spirit", USenergy2014catUC$commodity_transaction)    # shorten the long name

Select to visualize only Total Energy Supply for the US in 2014, arranging the sources from largest to smallest

USenergy2014supply <- filter(USenergy2014catUC, grepl ("Total energy supply", commodity_transaction, ignore.case = TRUE)) %>% arrange(quantity) %>% mutate(cumulative = cumsum(quantity))

#USenergy2014supply$commodity_transaction <- factor(USenergy2014supply$commodity_transaction, levels = USenergy2014supply$commodity_transaction)
#Converting to factor leaves the order in the x-axis as appears in the dataframe; otherwise geom_col reorders the string labels alphabetically; below I've used scale_x_discrete (limits) to do the same without converting commodity_transactions to factors.

p1 <- ggplot(data=USenergy2014supply, aes(x=commodity_transaction, y=quantity)) +
       geom_col(stat = "identity", color="red") +                                 # barplot
                 coord_flip() + theme_classic() +                                 # flip coordinates and display classic theme
       scale_x_discrete(limits=USenergy2014supply$commodity_transaction,          # for x-axis ticks, follow the order of dataframe
       label=function(x) gsub("- total energy supply", "",x, ignore.case = TRUE) ) +   # for labels, remove unnecessary wording
       labs(title="Fig. 1 - US Energy Supplies in 2014 (in TeraJoules)", subtitle = "1.e7 TJ = 9.5 Quads= 9.5e15 BTU",  # title and subtitle
       caption = "Source: Dataset from International Energy Statistics, Global trade and production 1990-2014\n www.kaggle.com/unitednations/international-energy-statistics") +                  # add a caption
       ylab("Energy Supplied by Each Source (TJ)") + theme(axis.title.y = element_blank())   # label the y axis - will display as x axis
p1             # plot shows all labels

ggplotly(p1)   # doesn't show subtitle or caption

Overlay the Cumulative Energy

p2 <- ggplot(data=USenergy2014supply, aes(x=commodity_transaction, y=quantity)) +
       geom_col(stat = "identity", color="red") +                                 # barplot
                 coord_flip() + theme_classic() +                                 # flip coordinates and display classic theme
       geom_point(aes(y=cumulative/4)) +
       scale_y_continuous(sec.axis= sec_axis(~.*4, name ="Cumulative Energy, TJ")) +        # add a second y axis for cumulative
       scale_x_discrete(limits=USenergy2014supply$commodity_transaction,          # for x-axis ticks, follow the order of dataframe
       label=function(x) gsub("- total energy supply", "",x, ignore.case = TRUE) ) +   # for labels, remove unnecessary wording
       labs(title="Fig. 2 - US Energy Supplies in 2014 (in TeraJoules)", subtitle = "1.e7 TJ = 9.5 Quads= 9.5e15 BTU",#title and subtitle
       caption = "Source: Dataset from International Energy Statistics, Global trade and production 1990-2014\n www.kaggle.com/unitednations/international-energy-statistics") +                  # add a caption
       ylab("Energy Supplied by Each Source (TJ)") + theme(axis.title.y = element_blank())   # label the y axis - will display as x axis

p2           # plot shows all labels

ggplotly(p2)   # doesn't show subtitle or caption

Filter for all energy transactions and commodities for the US during the 1990-2014 period

USenergy <- filter (energy, country_or_area == "United States")

Repeat the categorization by types of energy commodity or transaction - select a few commodities - clean up

USenergycat <- filter(USenergy, grepl(paste(c("production$", "final energy consumption", "total energy supply","imports","exports","main activity$"),collapse = "|"), commodity_transaction, ignore.case = TRUE))

#clean to avoid duplication - double counting of some energy commodities
USenergycat <- filter(USenergycat, !grepl ("gross production", commodity_transaction, ignore.case=TRUE))  # remove gross production; duplication with net production
USenergycat <- filter(USenergycat, !grepl("capacity", commodity_transaction, ignore.case=TRUE)) # remove capacity, not an actual supply
USenergycat <- filter(USenergycat, !grepl("Uranium - production", commodity_transaction, ignore.case=TRUE)) # remove U production, not an actual supply
USenergycat <- filter(USenergycat, !grepl("Of which", commodity_transaction, ignore.case=TRUE)) # remove "of which", duplicates other commodities

Convert all units to TeraJoules in order to make meaningful comparisons among various energy sectors

USenergycatUC <- within(USenergycat, {
  f <- unit == 'Metric tons,  thousand' & grepl(paste( c('additives_and_oxygenates','aviation_gasoline','biodiesel',
                                                         'biogasoline','conventional_crude_oil','fuel_oil',
                                                         'gas_oil_diesel_oil','kerosene_type_jet_fuel','lubricants',
                                                         'motor_gasoline','naphtha','other_hydrocarbons',
                                                         'other_kerosene','other_liquid_biofuels','other_oil_products_n_e_c',
                                                         'paraffin_waxes','refinery_feedstocks',
                                                         'gasoline_type_jet_fuel'),
                                                       collapse="|"), category)
      quantity[f] <- quantity[f]*41.868
      unit[f] <- 'Terajoules' })

USenergycatUC <- within(USenergycatUC, {
  f <- unit == 'Metric tons,  thousand' & grepl(paste( c('anthracite','brown_coal','charcoal','coking_coal',
                                                         'hard_coal','lignite','other_bituminous_coal',
                                                         'petroleum_coke','sub_bituminous_coal',
                                                         'patent_fuel','peat'), 
                                                        collapse="|"), category)
      quantity[f] <- quantity[f]*29.3076
      unit[f] <- 'Terajoules' })

USenergycatUC <- within(USenergycatUC, {
  f <- unit == 'Metric tons,  thousand' & category =='bagasse'
      quantity[f] <- quantity[f]*9.5
      unit[f] <- 'Terajoules' })

USenergycatUC <- within(USenergycatUC, {
  f <- unit == 'Metric tons,  thousand' & category =='bitumen'
      quantity[f] <- quantity[f]*40.2
      unit[f] <- 'Terajoules' })

USenergycatUC <- within(USenergycatUC, {
  f <- unit == 'Metric tons,  thousand' & grepl(paste( c('ethane','liquified_petroleum_gas','natural_gas_liquids',
                     'refinery_gas','white_spirit_and_special_boiling_point_industrial_spirits'), collapse="|"), category)
      quantity[f] <- quantity[f]*52.9369
      unit[f] <- 'Terajoules' })

USenergycatUC <- within(USenergycatUC, {
  f <- unit == 'Cubic metres, thousand' & category =='fuelwood'
      quantity[f] <- quantity[f]*14.286
      unit[f] <- 'Terajoules' })

USenergycatUC <- within(USenergycatUC, {
  f <- unit == 'Kilowatt-hours, million'
      quantity[f] <- quantity[f]*3.6
      unit[f] <- 'Terajoules' })

# clean up the resulting dataframe
USenergycatUC$f <- NULL # remove the added column 'f'
USenergycatUC$quantity_footnotes <- NULL  # remove the footnote column
USenergycatUC$commodity_transaction <- gsub("White spirit and special boiling point industrial spirits", "White spirit", USenergycatUC$commodity_transaction)    # shorten the long name

Select only Total Energy Supply for the US during 1990-2014, and group by year

USenergysupply <- filter(USenergycatUC, grepl ("Total energy supply", commodity_transaction, ignore.case = TRUE))
USenergysupplyannual <- group_by(USenergysupply, year) %>% summarise(totquantity = sum(quantity), .groups = "keep")

Display annual supply

p3 <- ggplot(USenergysupplyannual) +
   geom_point(aes(year, totquantity), color="turquoise1", size=2) +
   geom_smooth (aes(year, totquantity), color="blue", se = FALSE ) +
      theme_dark() +
      labs(title="Fig. 3 - Total Energy Supply", subtitle = ,
       caption = "Units: 1.e7 TJ = 9.5 Quads= 9.5e15 BTU\nSource: Dataset from International Energy Statistics, Global trade and production 1990-2014\n www.kaggle.com/unitednations/international-energy-statistics") +
       ylab("Energy Supplied (TJ)") + theme(axis.title.x = element_blank())
p3

ggplotly(p3)

Do regression analysis with a linear model

y= Energy Supplied, x= Year

y <- USenergysupplyannual$totquantity
x <- USenergysupplyannual$year
print (paste("Correlation coeff= ", as.character(cor(y, x))))  # get an idea of a linear correlation y ~ x
## [1] "Correlation coeff=  0.650545053850859"
model1 <- lm(totquantity ~ year, data = USenergysupplyannual)  # linear model
summary(model1)
## 
## Call:
## lm(formula = totquantity ~ year, data = USenergysupplyannual)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -12561453  -7173754    262660   6821474  11727736 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.591e+09  4.247e+08  -3.747  0.00105 ** 
## year         8.714e+05  2.121e+05   4.108  0.00043 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7648000 on 23 degrees of freedom
## Multiple R-squared:  0.4232, Adjusted R-squared:  0.3981 
## F-statistic: 16.88 on 1 and 23 DF,  p-value: 0.0004301
print ("Confidence interval")
## [1] "Confidence interval"
confint(model1)
##                     2.5 %     97.5 %
## (Intercept) -2469637961.6 -712591066
## year             432604.6    1310245
prediction1 <- model1 %>% predict(USenergysupplyannual)
print (paste ("RMSE = ", as.character(rmse(prediction1, y))))
## [1] "RMSE =  7336069.57310496"
# quick plot of fitted model
ggplot(data=USenergysupplyannual, aes(x,y)) +
  geom_point() +
  stat_smooth(method = lm, formula = y ~ x)

# quick plot of residuals
ggplot(data=USenergysupplyannual, aes(x,model1$residuals)) +
  geom_point()

# other stats: 
# coef(model1)
# sigma(model)*100/mean(USenergysupplyannual$totquantity)
# function for RMSE:  RMSE <- function(error) {sqrt(mean(error^2))}
#      call as RMSE(model$residuals)
# functions for R2 and R2 adj:  R2 <- summary(model)$r.squared;  R2adj <- summary(model)$adj.r.squared
# another way to calc: R2 <- function(preds, actual) {return(1 - sum((preds - actual)^2) / sum(actual - mean(actual))^2) }
#       call as R2(model$residuals,USenergysupplyannual$totquantity)

Do regression analysis with a generative additive model

Generative additive models optimize the knots for a spline regression fit y= Energy Supplied, x= Year

model2 <- gam(y ~ s(x), data=USenergysupplyannual)     # generative additive model
summary(model2)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## y ~ s(x)
## 
## Parametric coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 153477718     459825   333.8   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##        edf Ref.df     F p-value    
## s(x) 6.441  7.584 55.23  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.946   Deviance explained =   96%
## GCV = 7.5261e+12  Scale est. = 5.286e+12  n = 25
prediction2 <- model2 %>% predict(USenergysupplyannual)
## Warning in predict.gam(., USenergysupplyannual): not all required variables have been supplied in  newdata!
print (paste ("RMSE = ", as.character(rmse(prediction2, y))))
## [1] "RMSE =  1926809.16112615"
#quick plot
ggplot(USenergysupplyannual, aes(x,y)) +
  geom_point() +
  stat_smooth(method = gam, formula = y ~ s(x))

# quick plot of residuals
ggplot(data=USenergysupplyannual, aes(x,model2$residuals)) +
  geom_point()

### Conclusion from the Regression Analysis

Clearly the GAM, Model 2, fits much better than the linear Model 1. Model 1: adjR2=.398, P-value=.0004, t-value=4.1, F=17, RMSE=7336069 Model 2: adjR2=.946, P-value=2e-16, t-value=333, F=55, RMSE=1926809

The R2 close to 1, P closer to 0, RMSE smaller, and t and F higher, represent a “better” fit for Model 2.

————-

Identify the sources of energy in Total Energy Supply

Group by category (that is, types of sources of energy) and display

USenergysuppcatann <- group_by(USenergysupply, category, year) %>% summarise(totquantity = sum(quantity), .groups = "keep")

p4 <- ggplot(USenergysuppcatann) +
   geom_point(aes(year, totquantity, colour = category)) +
   geom_smooth (aes(year, totquantity, colour = category), se = FALSE ) +
    theme_bw() +
    theme(legend.position="bottom", legend.title = element_blank()) +
        labs(fill= "Energy Source", title="Fig. 4 - US Fossil Sources are Diversified", subtitle = ,
       caption = "Units: 1.e7 TJ = 9.5 Quads= 9.5e15 BTU\nSource: Dataset from International Energy Statistics, Global trade and production 1990-2014\n www.kaggle.com/unitednations/international-energy-statistics") +
       ylab("Energy Supplied (TJ)") + theme(axis.title.x = element_blank())
p4

ggplotly(p4)

Visualize with an area plot

USenergysuppcatannarg <-arrange(USenergysuppcatann, desc(totquantity))
mycolors <- colorRampPalette(brewer.pal(8, "Dark2")) (length(unique (USenergysuppcatannarg$category)))
p6 <- ggplot(USenergysuppcatannarg, aes(x=year, y=totquantity, fill= factor(category))) +
  geom_area(stat="identity") +
  scale_fill_manual(values = mycolors) +
      theme_classic() +
      theme(legend.position="bottom", legend.title = element_blank()) +
      labs(fill= "Energy Source", title="Fig. 5 - US Fossil Sources are Diversified", subtitle = ,
       caption = "Units: 1.e7 TJ = 9.5 Quads= 9.5e15 BTU\nSource: Dataset from International Energy Statistics, Global trade and production 1990-2014\n www.kaggle.com/unitednations/international-energy-statistics") +
       ylab("Energy Supplied (TJ)") + theme(axis.title.x = element_blank()) 
p6

ggplotly(p6)

Why is there a decrease starting around 2005?

Hypothesis: are renewables increasing?

USenergyelectr <- filter(USenergycatUC, grepl ("electricity", category, ignore.case = TRUE) & grepl(paste(c("production","main activity"),collapse = "|"), commodity_transaction, ignore.case = TRUE)) %>%
  filter(!grepl(paste(c("total production, main activity","net production"),collapse = "|"), commodity_transaction, ignore.case = TRUE)) %>%
  filter(!grepl(paste(c("Solar – Main activity","Electricity - total"),collapse = "|"), commodity_transaction, ignore.case = TRUE))

USenergyelectcatann <- group_by(USenergyelectr, commodity_transaction, year) %>% summarise(totquantity = sum(quantity), .groups = "keep")

Display types of electrical energy sources

mycolors <- colorRampPalette(brewer.pal(8, "Accent")) (length(unique (USenergyelectcatann$commodity_transaction)))
p7 <- ggplot(USenergyelectcatann) +
   geom_point(aes(year, totquantity, colour = commodity_transaction)) +
   geom_smooth (aes(year, totquantity, colour = commodity_transaction), se = FALSE ) +
    theme_dark() +
    theme(legend.position="bottom", legend.title = element_blank()) +
    labs(fill= "Energy Source", title="Fig. 6 - Electrical Energy from Multiple Sources", subtitle = ,
       caption = "Units: 1.e7 TJ = 9.5 Quads= 9.5e15 BTU\nSource: Dataset from International Energy Statistics, Global trade and production 1990-2014\n www.kaggle.com/unitednations/international-energy-statistics") +
       ylab("Energy from Each Source (TJ)") + theme(axis.title.x = element_blank())
p7

ggplotly(p7)

Display with area plot the cumulative electrical energy supply

mycolors <- colorRampPalette(brewer.pal(8, "Paired")) (length(unique (USenergyelectcatann$commodity_transaction)))
p8 <- ggplot(USenergyelectcatann, aes(x=year, y=totquantity, fill= factor(commodity_transaction))) +
  geom_area(stat="identity") +
   scale_fill_manual(values = mycolors) + 
      theme_dark() +
      theme(legend.position="bottom", legend.title = element_blank())  +
      labs(fill= "Energy Source", title="Fig. 7 - Total Electrical Energy Share is Growing", subtitle = ,
       caption = "Units: 1.e7 TJ = 9.5 Quads= 9.5e15 BTU\nSource: Dataset from International Energy Statistics, Global trade and production 1990-2014\n www.kaggle.com/unitednations/international-energy-statistics") +
       ylab("Energy Supply (TJ)") + theme(axis.title.x = element_blank())
p8

ggplotly(p8)

Add up all energy supply from fossil, biomass and electrical sources

USenergyelectannual <- group_by(USenergyelectcatann, year) %>% summarise(totelect = sum(totquantity), .groups = "keep")
USenergyannual <- merge(USenergysupplyannual, USenergyelectannual, by="year") %>% group_by(year) %>% mutate(totenergy = totquantity + totelect) %>% gather(key="energytype", value="totenergy", totquantity, totelect)

Display all energy

p9 <- ggplot(USenergyannual, aes(x=year, y=totenergy, fill= factor(energytype), text=energytype)) +
  geom_area(stat="identity") +
  scale_fill_manual(values = c("green2","blue4"), labels=c("Electrical","Oil, Gas, Biomass")) +
      theme_classic() +
      theme(legend.position="right", legend.title = element_blank()) +
  labs(fill= "Energy Type", title="Fig. 8 - US Energy Relies on Fossil and Biomass Sources", subtitle = ,
       caption = "Units: 1.e7 TJ = 9.5 Quads= 9.5e15 BTU\nSource: Dataset from International Energy Statistics, Global trade and production 1990-2014\n www.kaggle.com/unitednations/international-energy-statistics") +
       ylab("Total Energy Supplied (TJ)") + theme(axis.title.x = element_blank())
p9

ggplotly(p9)

### Change to long and calculate total energy

USenergyannual2 <- spread(USenergyannual, key = "energytype", value = "totenergy") %>% mutate(totenergy= totquantity+totelect)

Display the greenness of US energy supply

p10 <- ggplot(USenergyannual2, aes(x=year, y=totenergy, fill=totelect)) +
       geom_bar(stat = "identity") +
       theme_classic()  +
       labs(fill= "Electrical Energy (TJ)", title="Fig. 9 - US Energy Supply is Becoming Greener", subtitle = ,
       caption = "Units: 1.e7 TJ = 9.5 Quads= 9.5e15 BTU\nSource: Dataset from International Energy Statistics, Global trade and production 1990-2014\n www.kaggle.com/unitednations/international-energy-statistics") +
       ylab("Total Energy Supplied (TJ)") + theme(axis.title.x = element_blank()) +
       scale_fill_gradient(low= "darkgreen", high= "green2")
p10             # plot shows all labels

ggplotly(p10)   # doesn't show subtitle or caption

ESSAY

The Emergence of Clean Energy in the U.S.

R. Miranda November 17, 2020

Origin of the dataset

International Energy Statistics, Global trade and production 1990-2014 www.kaggle.com/unitednations/international-energy-statistics

Importance of the topic

The rapid rise of energy consumption is a mark of the successful expansion of humanity’s standard of living after the industrial revolution of the 19th century. Increasing demand for modern industries, transportation, housing, etc., has pressed communities and countries to extract more natural resources to supply their energy needs. Although we have not yet exhausted our resources, the rapidly increasing demand has put pressures on the planet that need to be dealt with now. The United Nations has called for action and declared seventeen Sustainable Development Goals to address all global challenges necessitating urgent solutions [1]. Goal 7 is Affordable and Clean Energy, noting that 3 billion people on this planet still lack clean cooking fuels and technologies.

Although this project does nothing to address such challenge, it brings attention to the variety of energy sources in use today in the U.S. Accounting for those supplies is the first step to understanding where the U.S. is taking action and could be taking more action towards one day achieving the goal of totally clean energy.

Description of the dataset and data manipulation

The Energy dataset describes transactions between different types of energy sources and consumption points for 180 countries during the period of 1990 to 2014. It includes all energy sectors, types of energy sources, transactions such as exports and imports, fuels during transit on ships or airplanes, consumption by various sectors (power, manufacturing industry, commercial, residential, transportation), known reserves in the case of fossil and nuclear resources, and built capacity in the case of electrical and nuclear power plants.

The dataset contains 1,189,482 records with 7 variables in a long format, which include Country, Year, Commodity or Transaction, Quantity of Energy, Unit, Category, and Footnotes. Year and Quantity are numerical variables and the others are categorical variables. The Commodity or Transaction variable describes the type of energy source (such as a specific type of coal), and whether it pertains to production, a consumption point (such as industrial manufacturing), or fuel in transport on ships or airplanes, or if it is imported, exported, or taken out of reserves (stocks). The Category variable aggregates all commodities of the same type, such as a specific type of coal, regardless of the type of transaction or whether it is a source or consumption point.

To understand the relationship between the various types of Commodity or Transactions, I had to spread that variable from long to wide, which originated 275 variables –not all of which are independent– and 4325 records. After examination, I concluded that Total Energy Supply is a combination of other transactions as follows:

Total Energy Supply = Production + Imports - Exports - Change in Stock - in transit (Marine or Aviation)

Analysis of the meaning of each type of Commodity or Transaction and interpretation of the data took the majority of the time in this project.

An additional complication was the variety of Units of energy used, which differed according to type of source (coal vs. hydrocarbons vs. gas vs. electrical, etc.) and even between different grades of coal (anthracite vs. bitumen) or biomass (bagasse vs. fuelwood). I decided to convert all units to Tera Joules (TJ or 10^12 J), the SI (Systeme Internationale) unit of energy, in order to make meaningful comparisons among various types of source. I must note that large quantities of energy in the U.S. literature are measured in Quad (quadrillion or 10^15) BTUs [2], with the equivalence of 1 Quad = 1.055x10^6 TJ. The charts and visualizations have scales divided into 10^7 TJ, that is, each division being approximately equal to 10 Quad.

The conversion factors used assumed that the energy content of all hydrocarbon liquids is similar to oils, all hydrocarbon gases and light liquids similar to natural gas, and all solid hydrocarbons similar to coal [3].

1 TOE = 41.868 GJ (metric ton of oil equivalent), i.e., 1000 TOE = 41.868 TJ 1 TCE = 29.3076 GJ (metric ton of coal equivalent), i.e., 1000 TCE = 29.3076 TJ 1 MWH = 3.6 GJ (megawatt-hour), i.e., 1000 MWH = 3.6 TJ

1 QUAD = 10^15 BTU = 1.055x10^18 joules = 1.055 EJ = 1.055x10^6 TJ For reference: the world energy consumption is about 575 Quad = 6.07x10^8 TJ, and the US consumption is about 100 Quad = 1.06x10^8 TJ [4]

Conversion factors used: oils - 41.868 coals - 29.3076 bitumen - 40.2 gas - 52.9369 electricity - 3.6 fuelwood - 14.286

Description of the data reduction

An initial step for exploration of the data, was to reduce the dataset to just the U.S., 19847 records, and one year, 2014. To make this analysis feasible in the limited time available, only a few of the Commodity or Transactions were selected: all those containing the terms Production, Final Energy Consumption, Total Energy Supply, Imports, Exports, and Main activity.

As denoted above Total Energy Supply depends upon Production, Imports, Exports, and constitutes a reasonable measure for further exploration. Before proceeding, we had to convert units for all commodities to TJ, and arranged all commodities from large to small.

Exploration of Total Energy Supply in the U.S. in 2014

Fig. 1 shows that the largest sources in the U.S. are of fossil origin: hydrocarbons, natural gas, and various types of coals. The fourth most important source is biomass-derived fuels of various types. Some interesting sources such as waste of various types appear, but not prominently.

Fig. 2 adds a second y-axis (or x-axis in the flipped graph) to show cumulative energy. The U.S. collects about 1.5x10^8 TJ of energy measured by “Total Energy Supply”. This is 50% larger than the total consumption. The likely explanation is that a significant part of the energy supplied is actually wasted due to inefficiency and not being able to be consumed. This points out the fact that a large “source” of energy is the improvement of energy efficiency. That is a major scientific and technological target in the U.S.[5]

Exploration of Total Energy Supply in the U.S. during 1990-2014

We increased the subset of U.S. data to include all years, and repeated the downselection of Commodity or Transactions, and converted all units to TJ, as before. For exploration, we grouped all sources (Commodity or Transactions) by year in order to display Total Energy Supply by year.

Fig. 3 shows the surprising result that Total Energy Supply peaked at 1.65x10^8 TJ around 2005 and underwent a decrease after that. We explored the possible reason later in the project.

This figure was a good candidate for some statistical analysis.

Analysis of Total Energy Supply by year from Fig. 3

We carried out a linear regression analysis using the lm() function (Model 1), calculating predicted values, and displaying quick charts of fit and residuals. Obviously the linear regression is a very poor fit as shown visually and by the statistics calculated. Then we resorted to a generative additive model (Model 2), which is an optimized splines fit, using the gam() function. Again we calculated predicted values, displayed quick charts of fit and residuals. This gam is a much better fit as shown by the statistics:

Model 1 (linear): adjR2=.398, P-value=.0004, t-value=4.1, F=17, RMSE=7336069 Model 2 (GAM): adjR2=.946, P-value=2e-16, t-value=333, F=55, RMSE=1926809 The R^2 close to 1, P closer to 0, RMSE smaller, and t and F higher, represent a “better” fit for Model 2.

Identification of the Sources of Energy for U.S. 1990-2014 as measured by Total Energy Supply

We re-grouped all sources by Category (types of sources) and by year, to show the diversity of sources.

Fig. 4, a geom_point and smoothed line chart, displays the evolution of the various sources during the lapse.

Fig. 5 is the same data but in a geom_area plot, which accumulates the sources by year and shows each contribution. It confirms the statement that the dominant fossil sources are hydrocarbons, natural gas, coals; biofuels are also dominant.

There’s is still a nagging question of why there is a peak in 2005.

Why is there a decrease in Total Energy Supply after 2005?

A first assumption (hypothesis) was that renewable energy was not being property measured by “Total Energy Supply”. A larger fraction of renewable energy is expected over the past few years.

An examination of the data showed that renewable energy is mainly shown under Production and conversion of various sources into electricity. Thus nuclear, solar, wind, geothermal, hydroelectric and also thermal power plants, are all converted into Electrical Energy. Thus we filtered all commodities that included production or conversion of electrical energy, and re-grouped by Commodity or Transaction and year.

Fig. 6 now shows the various types of electrical energy sources in a geom_point and line format.

Fig. 7 shows the same data in an area plot, showing the fractional contributions per year. It confirms (or reveals) that the main sources are thermal, nuclear, hydroelectric, and wind. As shown, solar was still in its infancy in 2014. Point to note: thermal (by combustion) is actually considered “cleaner” than distributed combustion, because the combustion is made in controlled units where CO2 and other exhausts can be collected. This is the so-called “clean-coal technology.” Thus thermal electricity is considered clean.

The other important observation is that the electrical energy source was increasing in the 1990-2010 period, and stayed about constant afterwards.

Adding up the various sources for 1990-2014

The datasets of fossil plus biomass measured by “total energy supply” and the “electrical energy supply” were merged, grouped by year and displayed.

Fig. 8 shows now the total energy supply that includes fossil, biomass, and all electrical sources over time. The contribution of electrical (clean) energy is now apparent. The U.S. still depends primarily on fossil and biomass . Fig. 9 replots the same data but depicting the “greening” of the energy over time in a more optimistic and glowing fashion.

What can be done better?

A lot can. The visualizations shown here are basic in terms of graphics. Not enough time was allocated to that aspect. Understanding the data was a major challenge. Many variables could have been explored further of course, in order to attain a complete “energy balance”. We did not answer satisfactorily the question of why all energy supply still appears to decay after 2005. It may be a deficiency in the dataset but that cannot be ascertained without a complete energy balance. For instance, we did not attempt to calculate all consumption. The flow of energy from source through use points (such as manufacturing and other industries) is potentially recoverable from the full dataset. Perhaps alluvial plots would be feasible, but getting the data appropriately sorted and arranged is a major challenge (I guess that’s why they featured this dataset in Kaggle…) The other study possible here is the global energy source evolution, and even that for particular countries of interest, with some geographical visualizations. This appears to be a major undertaking.

References

  1. https://www.un.org/sustainabledevelopment/sustainable-development-goals/
  2. François Cardarelli (Author), M.J. Shields (Translator) (2004). Encyclopaedia of Scientific Units, Weights and Measures: Their SI Equivalences and Origins, 1st Edition. Springer. ISBN 1-85233-682-X, quoted in web article titled “U.S. Customary Units”, at https://www.tau.ac.il/~tsirel/dump/Static/knowino.org/wiki/U.S-2.html
  3. https://www.eia.gov/energyexplained/units-and-calculators/
  4. https://www.eia.gov/energyexplained/us-energy-facts/
  5. https://www.energy.gov/science-innovation/energy-efficiency