Production, Trade and Supply of Energy

The dataset has 5 columns:

  1. Region code
  2. Region/Country/Area
  3. Year
  4. Value
  5. Footnotes
  6. Source

Following is the summary of the data:

# Importing data from .csv file

dat <- read_csv("G:/Project/IITB_FLOSS/Data/Data/SYB62_263_201904_Production, Trade and Supply of Energy.csv", 
    col_names = FALSE) # Removing unnecessary columns irrelevant for DA



dat <- dat[-c(1,2), -c(6,7)] 
colnames(dat) <- c("area_code", "region", "year", "series", "value") # New column names

# Renaming a few countries from dataset for Map plotting (under EDA)

dat$region[dat$region == "Russian Federation"] <- "Russia"
dat$region[dat$region == "United States of America"] <- "USA"
dat$region[dat$region == "Bolivia (Plurin. State of)"] <- "Bolivia"
dat$region[dat$region == "Venezuela (Boliv. Rep. of)"] <- "Venezuela"
dat$region[dat$region == "Iran (Islamic Republic of)"] <- "Iran"
dat$region[dat$region == "Congo"] <- "Democratic Republic of the Congo"
dat$region[dat$region == "United Rep. of Tanzania"] <- "Tanzania"
dat$region[dat$region == "Republic of Korea"] <- "South Korea"
dat$region[dat$region == "Dem. People's Rep. Korea"] <- "North Korea"

# Structure of the data

str(dat) 
## tibble [8,293 x 5] (S3: tbl_df/tbl/data.frame)
##  $ area_code: chr [1:8293] "1" "1" "1" "1" ...
##  $ region   : chr [1:8293] "Total, all countries or areas" "Total, all countries or areas" "Total, all countries or areas" "Total, all countries or areas" ...
##  $ year     : chr [1:8293] "1990" "1995" "2000" "2005" ...
##  $ series   : chr [1:8293] "Primary energy production (petajoules)" "Primary energy production (petajoules)" "Primary energy production (petajoules)" "Primary energy production (petajoules)" ...
##  $ value    : chr [1:8293] "360,903" "380,713" "412,155" "476,738" ...
# Changing data types wherever required

dat$area_code <- as.numeric(dat$area_code)
dat$region <- as.factor(dat$region)
dat$year <- as.numeric(dat$year)
dat$series <- as.factor(dat$series)               
dat$value <- as.numeric(gsub(",","",dat$value))

# Data summary

skim(dat) 
Data summary
Name dat
Number of rows 8293
Number of columns 5
_______________________
Column type frequency:
factor 2
numeric 3
________________________
Group variables None

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
region 0 1 FALSE 238 Afr: 40, Alg: 40, Arg: 40, Asi: 40
series 0 1 FALSE 5 Tot: 1802, Sup: 1801, Net: 1795, Pri: 1707

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
area_code 0 1 415.47 256.50 1 192 412 634 894 ▇▇▇▆▆
year 0 1 2006.12 8.99 1990 2000 2010 2015 2016 ▅▂▂▂▇
value 0 1 2718.62 24728.96 -26726 3 41 252 570980 ▇▁▁▁▁

Introduction

Exploratory Data Analysis (EDA) and comparison of different regression models is done for the data described in the previous section. A variety of graphical techniques are used to perform EDA primarily using dplyr and ggplot2 functions, and a few more available in Base R. Following are the details of the packages, R (and RStudio) versions used in this project:

sessionInfo() #Version information of R and packages
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18363)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_India.1252  LC_CTYPE=English_India.1252   
## [3] LC_MONETARY=English_India.1252 LC_NUMERIC=C                  
## [5] LC_TIME=English_India.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ggpubr_0.4.0      skimr_2.1.2       readr_1.4.0       viridis_0.5.1    
## [5] viridisLite_0.3.0 maps_3.3.0        ggplot2_3.3.2     dplyr_1.0.2      
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.1.0  xfun_0.18         repr_1.1.0        purrr_0.3.4      
##  [5] haven_2.3.1       carData_3.0-4     colorspace_1.4-1  vctrs_0.3.4      
##  [9] generics_0.0.2    htmltools_0.5.0   yaml_2.2.1        base64enc_0.1-3  
## [13] rlang_0.4.7       pillar_1.4.6      foreign_0.8-80    glue_1.4.2       
## [17] withr_2.3.0       readxl_1.3.1      lifecycle_0.2.0   stringr_1.4.0    
## [21] munsell_0.5.0     ggsignif_0.6.0    gtable_0.3.0      cellranger_1.1.0 
## [25] zip_2.1.1         evaluate_0.14     knitr_1.30        rio_0.5.16       
## [29] forcats_0.5.0     curl_4.3          fansi_0.4.1       highr_0.8        
## [33] broom_0.7.2       Rcpp_1.0.5        scales_1.1.1      backports_1.1.10 
## [37] jsonlite_1.7.1    abind_1.4-5       gridExtra_2.3     hms_0.5.3        
## [41] digest_0.6.25     stringi_1.5.3     openxlsx_4.2.3    rstatix_0.6.0    
## [45] grid_4.0.2        cli_2.0.2         tools_4.0.2       magrittr_1.5     
## [49] tibble_3.0.3      crayon_1.3.4      tidyr_1.1.2       car_3.0-10       
## [53] pkgconfig_2.0.3   ellipsis_0.3.1    data.table_1.13.0 rstudioapi_0.11  
## [57] assertthat_0.2.1  rmarkdown_2.5     R6_2.4.1          compiler_4.0.2

Exploratory Data analysis

a. Country-wise outlook

# Creating subsets (for EDA) of original dataset

## For Production

prodn2015 <- dat %>% filter(!region %in% c("Total, all countries or areas", "Asia", "Africa", "North America", "South America", "Europe") & series == "Primary energy production (petajoules)" & year == 2015) %>% select(region, value)

prodn2010 <- dat %>% filter(!region %in% c("Total, all countries or areas", "Asia", "Africa", "North America", "South America", "Europe") & series == "Primary energy production (petajoules)" & year == 2010) %>% select(region, value)

prodn2005 <- dat %>% filter(!region %in% c("Total, all countries or areas", "Asia", "Africa", "North America", "South America", "Europe") & series == "Primary energy production (petajoules)" & year == 2005) %>% select(region, value)

prodn2000 <- dat %>% filter(!region %in% c("Total, all countries or areas", "Asia", "Africa", "North America", "South America", "Europe") & series == "Primary energy production (petajoules)" & year == 2000) %>% select(region, value)

prodn1995 <- dat %>% filter(!region %in% c("Total, all countries or areas", "Asia", "Africa", "North America", "South America", "Europe") & series == "Primary energy production (petajoules)" & year == 1995) %>% select(region, value)

prodn1990 <- dat %>% filter(!region %in% c("Total, all countries or areas", "Asia", "Africa", "North America", "South America", "Europe") & series == "Primary energy production (petajoules)" & year == 1990) 


## For Trade

trade2015 <- dat %>% filter(!region %in% c("Total, all countries or areas", "Asia", "Africa", "North America", "South America", "Europe") & series == "Net imports [Imports - Exports - Bunkers] (petajoules)" & year == 2015) %>% select(region, value)

trade2010 <- dat %>% filter(!region %in% c("Total, all countries or areas", "Asia", "Africa", "North America", "South America", "Europe") & series == "Net imports [Imports - Exports - Bunkers] (petajoules)" & year == 2010) %>% select(region, value)

trade2005 <- dat %>% filter(!region %in% c("Total, all countries or areas", "Asia", "Africa", "North America", "South America", "Europe") & series == "Net imports [Imports - Exports - Bunkers] (petajoules)" & year == 2005) %>% select(region, value)

trade2000 <- dat %>% filter(!region %in% c("Total, all countries or areas", "Asia", "Africa", "North America", "South America", "Europe") & series == "Net imports [Imports - Exports - Bunkers] (petajoules)" & year == 2000) %>% select(region, value)

trade1995 <- dat %>% filter(!region %in% c("Total, all countries or areas", "Asia", "Africa", "North America", "South America", "Europe") & series == "Net imports [Imports - Exports - Bunkers] (petajoules)" & year == 1995) %>% select(region, value)

trade1990 <- dat %>% filter(!region %in% c("Total, all countries or areas", "Asia", "Africa", "North America", "South America", "Europe") & series == "Net imports [Imports - Exports - Bunkers] (petajoules)" & year == 1990) 


## For Supply

supply2015 <- dat %>% filter(!region %in% c("Total, all countries or areas", "Asia", "Africa", "North America", "South America", "Europe") & series == "Total supply (petajoules)" & year == 2015) %>% select(region, value)

supply2010 <- dat %>% filter(!region %in% c("Total, all countries or areas", "Asia", "Africa", "North America", "South America", "Europe") & series == "Total supply (petajoules)" & year == 2010) %>% select(region, value)

supply2005 <- dat %>% filter(!region %in% c("Total, all countries or areas", "Asia", "Africa", "North America", "South America", "Europe") & series == "Total supply (petajoules)" & year == 2005) %>% select(region, value)

supply2000 <- dat %>% filter(!region %in% c("Total, all countries or areas", "Asia", "Africa", "North America", "South America", "Europe") & series == "Total supply (petajoules)" & year == 2000) %>% select(region, value)

supply1995 <- dat %>% filter(!region %in% c("Total, all countries or areas", "Asia", "Africa", "North America", "South America", "Europe") & series == "Total supply (petajoules)" & year == 1995) %>% select(region, value)

supply1990 <- dat %>% filter(!region %in% c("Total, all countries or areas", "Asia", "Africa", "North America", "South America", "Europe") & series == "Total supply (petajoules)" & year == 1990)


# For EDA (plotting on map)
world <- map_data("world")  # Function from "maps" package

#Setting theme for map
plain <- theme(
  axis.text = element_blank(),
  axis.line = element_blank(),
  axis.ticks = element_blank(),
  panel.border = element_blank(),
  panel.grid = element_blank(),
  axis.title = element_blank(),
  panel.background = element_rect(fill = "white"),
  plot.title = element_text(hjust = 0.5)
)

i. Production

# Data Visualizations

## Initializing ggplot functions for the maps

attach(prodn1990)

map <- world %>% inner_join(prodn1990, by = "region")

one <- map %>% ggplot(aes(x = long, y = lat, group = group)) +
  coord_fixed(1.5) +
  geom_polygon(aes(fill = value)) +
  scale_fill_gradient(low = "steelblue   ", high = "darkolivegreen3", limits = c(min(value), max(value))) +
  plain + labs(fill = "Production (in PJ)", title = "Total production 1990")

detach(prodn1990)

attach(prodn1995)

map <- world %>% inner_join(prodn1995, by = "region")

two <- map %>% ggplot(aes(x = long, y = lat, group = group)) +
  coord_fixed(1.5) +
  geom_polygon(aes(fill = value)) +
  scale_fill_gradient(low = "steelblue   ", high = "darkolivegreen3", limits = c(min(value), max(value))) +
  plain + labs(fill = "Production (in PJ)", title = "Total production 1995")

detach(prodn1995)

attach(prodn2000)

map <- world %>% inner_join(prodn2000, by = "region")

three <- map %>% ggplot(aes(x = long, y = lat, group = group)) +
  coord_fixed(1.5) +
  geom_polygon(aes(fill = value)) +
  scale_fill_gradient(low = "steelblue   ", high = "darkolivegreen3", limits = c(min(value), max(value))) +
  plain + labs(fill = "Production (in PJ)", title = "Total production 2000")

detach(prodn2000)

attach(prodn2005)

map <- world %>% inner_join(prodn2005, by = "region")

four <- map %>% ggplot(aes(x = long, y = lat, group = group)) +
  coord_fixed(1.5) +
  geom_polygon(aes(fill = value)) +
  scale_fill_gradient(low = "steelblue   ", high = "darkolivegreen3", limits = c(min(value), max(value))) +
  plain + labs(fill = "Production (in PJ)", title = "Total production 2005")

detach(prodn2005)

attach(prodn2010)

map <- world %>% inner_join(prodn2010, by = "region")

five <- map %>% ggplot(aes(x = long, y = lat, group = group)) +
  coord_fixed(1.5) +
  geom_polygon(aes(fill = value)) +
  scale_fill_gradient(low = "steelblue   ", high = "darkolivegreen3", limits = c(min(value), max(value))) +
  plain + labs(fill = "Production (in PJ)", title = "Total production 2010")

detach(prodn2010)

attach(prodn2015)

map <- world %>% inner_join(prodn2015, by = "region")

six <- map %>% ggplot(aes(x = long, y = lat, group = group)) +
  coord_fixed(1.5) +
  geom_polygon(aes(fill = value)) +
  scale_fill_gradient(low = "steelblue", high = "darkolivegreen3", limits = c(min(value), max(value))) +
  plain + labs(fill = "Production (in PJ)", title = "Total production 2015")

detach(prodn2015)

# Calling the ggplot functions
one

two

three

four

five

six

ii. Trade

# Data visualization

## Initializing ggplot functions for the maps

attach(trade1990)

map <- world %>% inner_join(trade1990, by = "region")

one <- map %>% ggplot(aes(x = long, y = lat, group = group)) +
  coord_fixed(1.5) +
  geom_polygon(aes(fill = value)) +
  scale_fill_gradient(low = "steelblue   ", high = "firebrick1", limits = c(min(value), max(value))) +
  plain + labs(fill = "Petajoules", title = "Net imports 1990")

detach(trade1990)

attach(trade1995)

map <- world %>% inner_join(trade1995, by = "region")

two <- map %>% ggplot(aes(x = long, y = lat, group = group)) +
  coord_fixed(1.5) +
  geom_polygon(aes(fill = value)) +
  scale_fill_gradient(low = "steelblue   ", high = "firebrick1", limits = c(min(value), max(value))) +
  plain + labs(fill = "Petajoules", title = "Net imports 1995")

detach(trade1995)

attach(trade2000)

map <- world %>% inner_join(trade2000, by = "region")

three <- map %>% ggplot(aes(x = long, y = lat, group = group)) +
  coord_fixed(1.5) +
  geom_polygon(aes(fill = value)) +
  scale_fill_gradient(low = "steelblue   ", high = "firebrick1", limits = c(min(value), max(value))) +
  plain + labs(fill = "Petajoules", title = "Net imports 2000")

detach(trade2000)

attach(trade2005)

map <- world %>% inner_join(trade2005, by = "region")

four <- map %>% ggplot(aes(x = long, y = lat, group = group)) +
  coord_fixed(1.5) +
  geom_polygon(aes(fill = value)) +
  scale_fill_gradient(low = "steelblue   ", high = "firebrick1", limits = c(min(value), max(value))) +
  plain + labs(fill = "Petajoules", title = "Net imports 2005")

detach(trade2005)

attach(trade2010)

map <- world %>% inner_join(trade2010, by = "region")

five <- map %>% ggplot(aes(x = long, y = lat, group = group)) +
  coord_fixed(1.5) +
  geom_polygon(aes(fill = value)) +
  scale_fill_gradient(low = "steelblue   ", high = "firebrick1", limits = c(min(value), max(value))) +
  plain + labs(fill = "Petajoules", title = "Net imports 2010")

detach(trade2010)

attach(trade2015)

map <- world %>% inner_join(trade2015, by = "region")

six <- map %>% ggplot(aes(x = long, y = lat, group = group)) +
  coord_fixed(1.5) +
  geom_polygon(aes(fill = value)) +
  scale_fill_gradient(low = "steelblue", high = "firebrick1", limits = c(min(value), max(value))) +
  plain + labs(fill = "Petajoules", title = "Net imports 2015")

detach(trade2015)

# Calling the ggplot functions

one

two

three

four

five

six

iii. Total supply

# Data visualization

## Initializing ggplot functions for the maps

attach(supply1990)

map <- world %>% inner_join(supply1990, by = "region")

one <- map %>% ggplot(aes(x = long, y = lat, group = group)) +
  coord_fixed(1.5) +
  geom_polygon(aes(fill = value)) +
  scale_fill_gradient(low = "yellow", high = "red", limits = c(min(value), max(value))) +
  plain + labs(fill = "Petajoules", title = "Total supply 1990")

detach(supply1990)

attach(supply1995)

map <- world %>% inner_join(supply1995, by = "region")

two <- map %>% ggplot(aes(x = long, y = lat, group = group)) +
  coord_fixed(1.5) +
  geom_polygon(aes(fill = value)) +
  scale_fill_gradient(low = "yellow", high = "red", limits = c(min(value), max(value))) +
  plain + labs(fill = "Petajoules", title = "Total supply 1995")

detach(supply1995)

attach(supply2000)

map <- world %>% inner_join(supply2000, by = "region")

three <- map %>% ggplot(aes(x = long, y = lat, group = group)) +
  coord_fixed(1.5) +
  geom_polygon(aes(fill = value)) +
  scale_fill_gradient(low = "yellow", high = "red", limits = c(min(value), max(value))) +
  plain + labs(fill = "Petajoules", title = "Total supply 2000")

detach(supply2000)

attach(supply2005)

map <- world %>% inner_join(supply2005, by = "region")

four <- map %>% ggplot(aes(x = long, y = lat, group = group)) +
  coord_fixed(1.5) +
  geom_polygon(aes(fill = value)) +
  scale_fill_gradient(low = "yellow", high = "red", limits = c(min(value), max(value))) +
  plain + labs(fill = "Petajoules", title = "Total supply 2005")

detach(supply2005)

attach(supply2010)

map <- world %>% inner_join(supply2010, by = "region")

five <- map %>% ggplot(aes(x = long, y = lat, group = group)) +
  coord_fixed(1.5) +
  geom_polygon(aes(fill = value)) +
  scale_fill_gradient(low = "yellow", high = "red", limits = c(min(value), max(value))) +
  plain + labs(fill = "Petajoules", title = "Total supply 2010")

detach(supply2010)

attach(supply2015)

map <- world %>% inner_join(supply2015, by = "region")

six <- map %>% ggplot(aes(x = long, y = lat, group = group)) +
  coord_fixed(1.5) +
  geom_polygon(aes(fill = value)) +
  scale_fill_gradient(low = "yellow", high = "red", limits = c(min(value), max(value))) +
  plain + labs(fill = "Petajoules", title = "Total supply 2015")

detach(supply2015)

# Calling the created ggplot functions

one

two

three

four

five

six

b. Global outlook

# Subsetting relevant data based on the factors "region" and "series"

attach(dat)

energy_total <- dat %>% filter(region == "Total, all countries or areas" &  series == "Primary energy production (petajoules)")
imports_total <- dat %>% filter(region == "Total, all countries or areas" &  series == "Net imports [Imports - Exports - Bunkers] (petajoules)")
stocks_total <- dat %>% filter(region == "Total, all countries or areas" &  series == "Changes in stocks (petajoules)")
supply_total <- dat %>% filter(region == "Total, all countries or areas" &  series == "Total supply (petajoules)")
supply_pc_total <- dat %>% filter(region == "Total, all countries or areas" &  series == "Supply per capita (gigajoules)")

detach(dat)
# Data visualization

a <- energy_total %>% ggplot(aes(year, value)) + geom_line(col = 2) + geom_point() + labs(x = "Year", y = "Petajoules") + theme_minimal() 
b <- imports_total %>% ggplot(aes(year, value)) + geom_line(col = 3) + geom_point()+ labs(x = "Year", y = "Petajoules") + theme_minimal()
c <- stocks_total %>% ggplot(aes(year, value)) + geom_line(col = 4) + geom_point() + labs(x = "Year", y = "Petajoules") + theme_minimal()
d <- supply_total %>% ggplot(aes(year, value)) + geom_line(col = 5) + geom_point() + labs(x = "Year", y = "Petajoules") + theme_minimal()
e <- supply_pc_total %>% ggplot(aes(year, value)) + geom_line(col = 6) + geom_point() + labs(x = "Year", y = "Gigajoules") + theme_minimal()

# Arranging and calling all the plots together

ggarrange(a + labs(title = "Production"), 
          b + labs(title = "Net import"), 
          c + labs(title = "Changes in stocks"), 
          d + labs(title = "Total Supply"), 
          e + labs(title = "Supply per capita"), ncol = 3, nrow = 2)

Regression analysis

Linear regression vs LOESS regression vs Quadratic regression

# Creating ggplot functions of the plots

# LOESS models

f <- a + geom_smooth(formula = "y ~ x", method = loess, col = "firebrick1") +
  labs(subtitle = "LOESS")
g <- b + geom_smooth(formula = "y ~ x", method = loess, col = "firebrick1") +
  labs(subtitle = "LOESS")
h <- c + geom_smooth(formula = "y ~ x", method = loess, col = "firebrick1") +
  labs(subtitle = "LOESS")
i <- d + geom_smooth(formula = "y ~ x", method = loess, col = "firebrick1") +
  labs(subtitle = "LOESS")
j <- e + geom_smooth(formula = "y ~ x", method = loess, col = "firebrick1") +
  labs(subtitle = "LOESS")

# Linear models

k <- a + geom_smooth(formula = "y ~ x", method = lm, col = "seagreen") +
  labs(subtitle = "Linear")
l <- b + geom_smooth(formula = "y ~ x", method = lm, col = "seagreen") +
  labs(subtitle = "Linear")
m <- c + geom_smooth(formula = "y ~ x", method = lm, col = "seagreen") +
  labs(subtitle = "Linear")
n <- d + geom_smooth(formula = "y ~ x", method = lm, col = "seagreen") +
  labs(subtitle = "Linear")
o <- e + geom_smooth(formula = "y ~ x", method = lm, col = "seagreen") +
  labs(subtitle = "Linear")


# Quadratic models

p <- a + geom_smooth(method="lm", formula = y ~ x + I(x^2)) +
  labs(subtitle = "Quadratic")
q <- b + geom_smooth(method="lm", formula = y ~ x + I(x^2)) +
  labs(subtitle = "Quadratic")
r <- c + geom_smooth(method="lm", formula = y ~ x + I(x^2)) +
  labs(subtitle = "Quadratic")
s <- d + geom_smooth(method="lm", formula = y ~ x + I(x^2)) +
  labs(subtitle = "Quadratic")
t <- e + geom_smooth(method="lm", formula = y ~ x + I(x^2)) +
  labs(subtitle = "Quadratic")

i. Total production

# Calling and arranging required ggplot functions together where series == Total production

ggarrange(k,f,p, ncol = 1, nrow = 3) %>% 
  annotate_figure(top = text_grob("Primary energy production", 
                                  face = "bold", size = 15))

ii. Net import

# Calling and arranging required ggplot functions together where series == Net imports

ggarrange(l,g,q, ncol = 1, nrow = 3) %>% 
  annotate_figure(top = text_grob("Net Imports", 
                                  face = "bold", size = 15))

iii. Changes in stocks

# Calling and arranging required ggplot functions together where series == Changes in stocks

ggarrange(m,h,r, ncol = 1, nrow = 3) %>% 
  annotate_figure(top = text_grob("Changes in stocks", 
                                  face = "bold", size = 15))

iv. Total supply

# Calling and arranging required ggplot functions together where series == Total supply

ggarrange(n,i,s, ncol = 1, nrow = 3) %>% 
  annotate_figure(top = text_grob("Total supply", 
                                  face = "bold", size = 15))

v. Supply per capita

# Calling and arranging required ggplot functions together where series == Supply per capita

ggarrange(o,j,t, ncol = 1, nrow = 3) %>% 
  annotate_figure(top = text_grob("Supply per capita", 
                                  face = "bold", size = 15))

# Comparing the regression models

# R^2 values for fitted linear models

r1 <- summary(lm(energy_total$value~energy_total$year))$r.squared
r2 <- summary(lm(imports_total$value~imports_total$year))$r.squared
r3 <- summary(lm(supply_total$value~supply_total$year))$r.squared
r4 <- summary(lm(supply_pc_total$value~supply_pc_total$year))$r.squared

# Printing these R^2 values
cat("LINEAR REGRESSION\nCoeff. of det (Total prodn): ", r1,
    "\nCoeff. of det (Net imports): ", r2,
    "\nCoeff. of det (Total supply): ", r3,
    "\nCoeff. of det (Supply per capita): ", r4)
## LINEAR REGRESSION
## Coeff. of det (Total prodn):  0.9817323 
## Coeff. of det (Net imports):  0.8402147 
## Coeff. of det (Total supply):  0.9862789 
## Coeff. of det (Supply per capita):  0.9570919
# R^2 values to compare between linear, quadratic, and LOESS models and check
# for the best possible fit for non-linear data

# 1. Changes in stocks

stocks.lo <- loess(stocks_total$value~stocks_total$year)
r.lo <- cor(stocks_total$value, predict(stocks.lo))

r.lm <- summary(lm(stocks_total$value~stocks_total$year))$r.squared

r.qm <- summary(lm(stocks_total$value~stocks_total$year+I(stocks_total$year^2)))$r.squared

cat("Coeffs. of determination for variable 'Changes in Stocks'
    \nLinear regression: ", r.lm,
    "\nQuadratic regression: ", r.qm,
    "\nLOESS regression: ", r.lo)
## Coeffs. of determination for variable 'Changes in Stocks'
##     
## Linear regression:  0.04427113 
## Quadratic regression:  0.1889551 
## LOESS regression:  0.7909008
# 2. Net imports

imports.lo <- loess(imports_total$value~imports_total$year)
r1.lo <- cor(imports_total$value, predict(imports.lo))

r1.lm <- summary(lm(imports_total$value~imports_total$year))$r.squared

r1.qm <- summary(lm(imports_total$value~imports_total$year+I(imports_total$year^2)))$r.squared

cat("Coeffs. of determination for variable 'Net imports'
    \nLinear regression: ", r1.lm,
    "\nQuadratic regression: ", r1.qm,
    "\nLOESS regression: ", r1.lo)
## Coeffs. of determination for variable 'Net imports'
##     
## Linear regression:  0.8402147 
## Quadratic regression:  0.8983214 
## LOESS regression:  0.9939503

Results and discussions

EDA

The exploratory data analysis done for the dataset has a few limitations. The analysis could not capture information of countries whose data was unavailable (Russia in 1990). The dataset was not linked/joined with the data of map (used from package “maps”) due to conflicts in a few countries’ political boundaries. This resulted in missing out on information of those countries (not major when it comes to economy and power in production, supply, and trade) for analysis.

Country-wise outlook

When we look at changes in world wide production, trade and supply, it is evident that over the years USA has deemed itself worthy of being called a superpower nation. Constantly, from 1990 to 2015, USA has been one of the very few leading producers, importer and exporter of energy.

China, on the other hand, has had the most significant growth over the years which ultimately resulted in USA becoming second to it. By mid 2000s, China overtook USA to become the leading primary energy producer. By early 2010s, China had already gained the reputation and had become the biggest supplier. In the following years, by 2015, the country rose to become the biggest importer as well, beating USA in the business that they once owned like gods.

Russia has constantly been significantly important as well when it comes to total production; but is second to almost every other country in terms of net imports and supply. Over the years, the country has become one of the least importers; perhaps because of its home production whereas most of the other countries like India have been importing more and more energy over the years. Overall, apart from USA and China, all the other countries are significantly less involved in the activities, be it production, trade or supply.

Global outlook

By the line graphs of the global activities, it is quite obvious that total production and supply has been linearly growing (thanks to China and USA). On the contrary to that, net import has been constantly declining, except for a small rise between 2005 and 2010. This could possibly be because most of the countries in need of resources have been tending to look for self-production of the same. This results in significant development in terms of national savings.

As total supply (in PJ) has been on a rise, supply per capita (in GJ) has also been increasing over the years.

Changes in stocks has been unpredictable. Over the years from 1990 to 2000, there was a steep decline and then for the following 10 years, a incline. Since 2014, the decline makes a show again and the curve of change has been negative ever since then.

Regression

Total Production, Total Supply, and Supply per capita are linear in nature (as discussed in EDA). Hence, fitting a linear model would be ideal for those variables.

On fitting linear models as mentioned above, the coeffs. of determination are 0.9817323, 0.9862789, 0.9570919 respectively (all close to 1, which determines the goodness of fit). This implies that our assumption for modelling was right and hence, linear modelling is better than other alternatives like Quadratic and LOESS regression.

Now, since Net imports and Changes in Stocks are not linear in nature (discovered by EDA and \(R^{2}\)), we proceed and look for other alternatives. As discussed in the previous section,the \(R^{2}\) value determines LOESS regression to be the best fit for both Net Imports and Changes in Stocks since both the values are 0.7909008 and 0.9939503 respectively (relatively much better than that of Linear and Quadratic models [in that particular order]). Thus, we may proceed and declare that LOESS regression is the best option for predictive analytics on the two variables considered.

Conclusion

Through sophisticated exploratory data analysis, trends in growth of production, trade, and supply activities worldwide. The race for development between USA and China is significantly evident in the stretch of years since 1990. Apart from the two global superpowers, India and Russia share major portion of the market.

Increasing trend in primary energy production, and total and per capita supply of energy over decades suggest the ever increasing demand and supply chain of energy business. Developing countries like India have and will be producing and importing for years to come. The trend can be understood very well with the help of numerous statistical models. To understand linearly increasing/decreasing trends, linear models are handy, and to understand non-linear patterns, LOESS and Quadratic models suffice.

References

Error: Aesthetics must be either length 1 or the same as the data (2): fill. (2018, October 4). RStudio Community. https://community.rstudio.com/t/error-aesthetics-must-be-either-length-1-or-the-same-as-the-data-2-fill/15579

Roberts, D. (n.d.). Statistic 2 - Comparing Exponential and Power Regression Models. MathBits.Com - Fred and Donna Roberts. Retrieved November 7, 2020, from https://mathbits.com/MathBits/TISection/Statistics2/compareExpPower.html

R help - Non-linear Regression best-fit line. (n.d.). Nabble Forum. Retrieved November 6, 2020, from https://r.789695.n4.nabble.com/Non-linear-Regression-best-fit-line-td3606769.html

UCLA: Statistical Consulting Group. (2011). How can I explore different smooths in ggplot2? UCLA Stats. https://stats.idre.ucla.edu/r/faq/how-can-i-explore-different-smooths-in-ggplot2/

ggplot2 colors : How to change colors automatically and manually? - Easy Guides - Wiki - STHDA. (n.d.). Http://Www.Sthda.Com/. Retrieved November 5, 2020, from http://www.sthda.com/english/wiki/ggplot2-colors-how-to-change-colors-automatically-and-manually

Power regression in R similar to excel. (2013, August 19). Stack Overflow. https://stackoverflow.com/questions/18305852/power-regression-in-r-similar-to-excel

Get Initial Parameter Estimates for Nonlinear Least Squares Models. (n.d.). GetInitial Blog. Retrieved November 7, 2020, from https://docs.tibco.com/pub/enterprise-runtime-for-R/5.1.0/doc/html/Language_Reference/stats/getInitial.html

Does lack of seasonality imply random time series? (2019, April 18). Cross Validated. https://stats.stackexchange.com/questions/403819/does-lack-of-seasonality-imply-random-time-series

Alice, M. (2017, April 28). Fitting Polynomial Regression in R. DataScience+. https://datascienceplus.com/fitting-polynomial-regression-r/

Fitting Nonlinear Mixed-Effects Models. (2020, November 7). SpringerLink. https://link.springer.com/chapter/10.1007%2F0-387-22747-4_8