The dataset has 5 columns:
Following is the summary of the data:
# Importing data from .csv file
dat <- read_csv("G:/Project/IITB_FLOSS/Data/Data/SYB62_263_201904_Production, Trade and Supply of Energy.csv",
col_names = FALSE) # Removing unnecessary columns irrelevant for DA
dat <- dat[-c(1,2), -c(6,7)]
colnames(dat) <- c("area_code", "region", "year", "series", "value") # New column names
# Renaming a few countries from dataset for Map plotting (under EDA)
dat$region[dat$region == "Russian Federation"] <- "Russia"
dat$region[dat$region == "United States of America"] <- "USA"
dat$region[dat$region == "Bolivia (Plurin. State of)"] <- "Bolivia"
dat$region[dat$region == "Venezuela (Boliv. Rep. of)"] <- "Venezuela"
dat$region[dat$region == "Iran (Islamic Republic of)"] <- "Iran"
dat$region[dat$region == "Congo"] <- "Democratic Republic of the Congo"
dat$region[dat$region == "United Rep. of Tanzania"] <- "Tanzania"
dat$region[dat$region == "Republic of Korea"] <- "South Korea"
dat$region[dat$region == "Dem. People's Rep. Korea"] <- "North Korea"
# Structure of the data
str(dat)
## tibble [8,293 x 5] (S3: tbl_df/tbl/data.frame)
## $ area_code: chr [1:8293] "1" "1" "1" "1" ...
## $ region : chr [1:8293] "Total, all countries or areas" "Total, all countries or areas" "Total, all countries or areas" "Total, all countries or areas" ...
## $ year : chr [1:8293] "1990" "1995" "2000" "2005" ...
## $ series : chr [1:8293] "Primary energy production (petajoules)" "Primary energy production (petajoules)" "Primary energy production (petajoules)" "Primary energy production (petajoules)" ...
## $ value : chr [1:8293] "360,903" "380,713" "412,155" "476,738" ...
# Changing data types wherever required
dat$area_code <- as.numeric(dat$area_code)
dat$region <- as.factor(dat$region)
dat$year <- as.numeric(dat$year)
dat$series <- as.factor(dat$series)
dat$value <- as.numeric(gsub(",","",dat$value))
# Data summary
skim(dat)
| Name | dat |
| Number of rows | 8293 |
| Number of columns | 5 |
| _______________________ | |
| Column type frequency: | |
| factor | 2 |
| numeric | 3 |
| ________________________ | |
| Group variables | None |
Variable type: factor
| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
|---|---|---|---|---|---|
| region | 0 | 1 | FALSE | 238 | Afr: 40, Alg: 40, Arg: 40, Asi: 40 |
| series | 0 | 1 | FALSE | 5 | Tot: 1802, Sup: 1801, Net: 1795, Pri: 1707 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| area_code | 0 | 1 | 415.47 | 256.50 | 1 | 192 | 412 | 634 | 894 | ▇▇▇▆▆ |
| year | 0 | 1 | 2006.12 | 8.99 | 1990 | 2000 | 2010 | 2015 | 2016 | ▅▂▂▂▇ |
| value | 0 | 1 | 2718.62 | 24728.96 | -26726 | 3 | 41 | 252 | 570980 | ▇▁▁▁▁ |
Exploratory Data Analysis (EDA) and comparison of different regression models is done for the data described in the previous section. A variety of graphical techniques are used to perform EDA primarily using dplyr and ggplot2 functions, and a few more available in Base R. Following are the details of the packages, R (and RStudio) versions used in this project:
sessionInfo() #Version information of R and packages
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18363)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_India.1252 LC_CTYPE=English_India.1252
## [3] LC_MONETARY=English_India.1252 LC_NUMERIC=C
## [5] LC_TIME=English_India.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggpubr_0.4.0 skimr_2.1.2 readr_1.4.0 viridis_0.5.1
## [5] viridisLite_0.3.0 maps_3.3.0 ggplot2_3.3.2 dplyr_1.0.2
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.1.0 xfun_0.18 repr_1.1.0 purrr_0.3.4
## [5] haven_2.3.1 carData_3.0-4 colorspace_1.4-1 vctrs_0.3.4
## [9] generics_0.0.2 htmltools_0.5.0 yaml_2.2.1 base64enc_0.1-3
## [13] rlang_0.4.7 pillar_1.4.6 foreign_0.8-80 glue_1.4.2
## [17] withr_2.3.0 readxl_1.3.1 lifecycle_0.2.0 stringr_1.4.0
## [21] munsell_0.5.0 ggsignif_0.6.0 gtable_0.3.0 cellranger_1.1.0
## [25] zip_2.1.1 evaluate_0.14 knitr_1.30 rio_0.5.16
## [29] forcats_0.5.0 curl_4.3 fansi_0.4.1 highr_0.8
## [33] broom_0.7.2 Rcpp_1.0.5 scales_1.1.1 backports_1.1.10
## [37] jsonlite_1.7.1 abind_1.4-5 gridExtra_2.3 hms_0.5.3
## [41] digest_0.6.25 stringi_1.5.3 openxlsx_4.2.3 rstatix_0.6.0
## [45] grid_4.0.2 cli_2.0.2 tools_4.0.2 magrittr_1.5
## [49] tibble_3.0.3 crayon_1.3.4 tidyr_1.1.2 car_3.0-10
## [53] pkgconfig_2.0.3 ellipsis_0.3.1 data.table_1.13.0 rstudioapi_0.11
## [57] assertthat_0.2.1 rmarkdown_2.5 R6_2.4.1 compiler_4.0.2
# Creating subsets (for EDA) of original dataset
## For Production
prodn2015 <- dat %>% filter(!region %in% c("Total, all countries or areas", "Asia", "Africa", "North America", "South America", "Europe") & series == "Primary energy production (petajoules)" & year == 2015) %>% select(region, value)
prodn2010 <- dat %>% filter(!region %in% c("Total, all countries or areas", "Asia", "Africa", "North America", "South America", "Europe") & series == "Primary energy production (petajoules)" & year == 2010) %>% select(region, value)
prodn2005 <- dat %>% filter(!region %in% c("Total, all countries or areas", "Asia", "Africa", "North America", "South America", "Europe") & series == "Primary energy production (petajoules)" & year == 2005) %>% select(region, value)
prodn2000 <- dat %>% filter(!region %in% c("Total, all countries or areas", "Asia", "Africa", "North America", "South America", "Europe") & series == "Primary energy production (petajoules)" & year == 2000) %>% select(region, value)
prodn1995 <- dat %>% filter(!region %in% c("Total, all countries or areas", "Asia", "Africa", "North America", "South America", "Europe") & series == "Primary energy production (petajoules)" & year == 1995) %>% select(region, value)
prodn1990 <- dat %>% filter(!region %in% c("Total, all countries or areas", "Asia", "Africa", "North America", "South America", "Europe") & series == "Primary energy production (petajoules)" & year == 1990)
## For Trade
trade2015 <- dat %>% filter(!region %in% c("Total, all countries or areas", "Asia", "Africa", "North America", "South America", "Europe") & series == "Net imports [Imports - Exports - Bunkers] (petajoules)" & year == 2015) %>% select(region, value)
trade2010 <- dat %>% filter(!region %in% c("Total, all countries or areas", "Asia", "Africa", "North America", "South America", "Europe") & series == "Net imports [Imports - Exports - Bunkers] (petajoules)" & year == 2010) %>% select(region, value)
trade2005 <- dat %>% filter(!region %in% c("Total, all countries or areas", "Asia", "Africa", "North America", "South America", "Europe") & series == "Net imports [Imports - Exports - Bunkers] (petajoules)" & year == 2005) %>% select(region, value)
trade2000 <- dat %>% filter(!region %in% c("Total, all countries or areas", "Asia", "Africa", "North America", "South America", "Europe") & series == "Net imports [Imports - Exports - Bunkers] (petajoules)" & year == 2000) %>% select(region, value)
trade1995 <- dat %>% filter(!region %in% c("Total, all countries or areas", "Asia", "Africa", "North America", "South America", "Europe") & series == "Net imports [Imports - Exports - Bunkers] (petajoules)" & year == 1995) %>% select(region, value)
trade1990 <- dat %>% filter(!region %in% c("Total, all countries or areas", "Asia", "Africa", "North America", "South America", "Europe") & series == "Net imports [Imports - Exports - Bunkers] (petajoules)" & year == 1990)
## For Supply
supply2015 <- dat %>% filter(!region %in% c("Total, all countries or areas", "Asia", "Africa", "North America", "South America", "Europe") & series == "Total supply (petajoules)" & year == 2015) %>% select(region, value)
supply2010 <- dat %>% filter(!region %in% c("Total, all countries or areas", "Asia", "Africa", "North America", "South America", "Europe") & series == "Total supply (petajoules)" & year == 2010) %>% select(region, value)
supply2005 <- dat %>% filter(!region %in% c("Total, all countries or areas", "Asia", "Africa", "North America", "South America", "Europe") & series == "Total supply (petajoules)" & year == 2005) %>% select(region, value)
supply2000 <- dat %>% filter(!region %in% c("Total, all countries or areas", "Asia", "Africa", "North America", "South America", "Europe") & series == "Total supply (petajoules)" & year == 2000) %>% select(region, value)
supply1995 <- dat %>% filter(!region %in% c("Total, all countries or areas", "Asia", "Africa", "North America", "South America", "Europe") & series == "Total supply (petajoules)" & year == 1995) %>% select(region, value)
supply1990 <- dat %>% filter(!region %in% c("Total, all countries or areas", "Asia", "Africa", "North America", "South America", "Europe") & series == "Total supply (petajoules)" & year == 1990)
# For EDA (plotting on map)
world <- map_data("world") # Function from "maps" package
#Setting theme for map
plain <- theme(
axis.text = element_blank(),
axis.line = element_blank(),
axis.ticks = element_blank(),
panel.border = element_blank(),
panel.grid = element_blank(),
axis.title = element_blank(),
panel.background = element_rect(fill = "white"),
plot.title = element_text(hjust = 0.5)
)
# Data Visualizations
## Initializing ggplot functions for the maps
attach(prodn1990)
map <- world %>% inner_join(prodn1990, by = "region")
one <- map %>% ggplot(aes(x = long, y = lat, group = group)) +
coord_fixed(1.5) +
geom_polygon(aes(fill = value)) +
scale_fill_gradient(low = "steelblue ", high = "darkolivegreen3", limits = c(min(value), max(value))) +
plain + labs(fill = "Production (in PJ)", title = "Total production 1990")
detach(prodn1990)
attach(prodn1995)
map <- world %>% inner_join(prodn1995, by = "region")
two <- map %>% ggplot(aes(x = long, y = lat, group = group)) +
coord_fixed(1.5) +
geom_polygon(aes(fill = value)) +
scale_fill_gradient(low = "steelblue ", high = "darkolivegreen3", limits = c(min(value), max(value))) +
plain + labs(fill = "Production (in PJ)", title = "Total production 1995")
detach(prodn1995)
attach(prodn2000)
map <- world %>% inner_join(prodn2000, by = "region")
three <- map %>% ggplot(aes(x = long, y = lat, group = group)) +
coord_fixed(1.5) +
geom_polygon(aes(fill = value)) +
scale_fill_gradient(low = "steelblue ", high = "darkolivegreen3", limits = c(min(value), max(value))) +
plain + labs(fill = "Production (in PJ)", title = "Total production 2000")
detach(prodn2000)
attach(prodn2005)
map <- world %>% inner_join(prodn2005, by = "region")
four <- map %>% ggplot(aes(x = long, y = lat, group = group)) +
coord_fixed(1.5) +
geom_polygon(aes(fill = value)) +
scale_fill_gradient(low = "steelblue ", high = "darkolivegreen3", limits = c(min(value), max(value))) +
plain + labs(fill = "Production (in PJ)", title = "Total production 2005")
detach(prodn2005)
attach(prodn2010)
map <- world %>% inner_join(prodn2010, by = "region")
five <- map %>% ggplot(aes(x = long, y = lat, group = group)) +
coord_fixed(1.5) +
geom_polygon(aes(fill = value)) +
scale_fill_gradient(low = "steelblue ", high = "darkolivegreen3", limits = c(min(value), max(value))) +
plain + labs(fill = "Production (in PJ)", title = "Total production 2010")
detach(prodn2010)
attach(prodn2015)
map <- world %>% inner_join(prodn2015, by = "region")
six <- map %>% ggplot(aes(x = long, y = lat, group = group)) +
coord_fixed(1.5) +
geom_polygon(aes(fill = value)) +
scale_fill_gradient(low = "steelblue", high = "darkolivegreen3", limits = c(min(value), max(value))) +
plain + labs(fill = "Production (in PJ)", title = "Total production 2015")
detach(prodn2015)
# Calling the ggplot functions
one
two
three
four
five
six
# Data visualization
## Initializing ggplot functions for the maps
attach(trade1990)
map <- world %>% inner_join(trade1990, by = "region")
one <- map %>% ggplot(aes(x = long, y = lat, group = group)) +
coord_fixed(1.5) +
geom_polygon(aes(fill = value)) +
scale_fill_gradient(low = "steelblue ", high = "firebrick1", limits = c(min(value), max(value))) +
plain + labs(fill = "Petajoules", title = "Net imports 1990")
detach(trade1990)
attach(trade1995)
map <- world %>% inner_join(trade1995, by = "region")
two <- map %>% ggplot(aes(x = long, y = lat, group = group)) +
coord_fixed(1.5) +
geom_polygon(aes(fill = value)) +
scale_fill_gradient(low = "steelblue ", high = "firebrick1", limits = c(min(value), max(value))) +
plain + labs(fill = "Petajoules", title = "Net imports 1995")
detach(trade1995)
attach(trade2000)
map <- world %>% inner_join(trade2000, by = "region")
three <- map %>% ggplot(aes(x = long, y = lat, group = group)) +
coord_fixed(1.5) +
geom_polygon(aes(fill = value)) +
scale_fill_gradient(low = "steelblue ", high = "firebrick1", limits = c(min(value), max(value))) +
plain + labs(fill = "Petajoules", title = "Net imports 2000")
detach(trade2000)
attach(trade2005)
map <- world %>% inner_join(trade2005, by = "region")
four <- map %>% ggplot(aes(x = long, y = lat, group = group)) +
coord_fixed(1.5) +
geom_polygon(aes(fill = value)) +
scale_fill_gradient(low = "steelblue ", high = "firebrick1", limits = c(min(value), max(value))) +
plain + labs(fill = "Petajoules", title = "Net imports 2005")
detach(trade2005)
attach(trade2010)
map <- world %>% inner_join(trade2010, by = "region")
five <- map %>% ggplot(aes(x = long, y = lat, group = group)) +
coord_fixed(1.5) +
geom_polygon(aes(fill = value)) +
scale_fill_gradient(low = "steelblue ", high = "firebrick1", limits = c(min(value), max(value))) +
plain + labs(fill = "Petajoules", title = "Net imports 2010")
detach(trade2010)
attach(trade2015)
map <- world %>% inner_join(trade2015, by = "region")
six <- map %>% ggplot(aes(x = long, y = lat, group = group)) +
coord_fixed(1.5) +
geom_polygon(aes(fill = value)) +
scale_fill_gradient(low = "steelblue", high = "firebrick1", limits = c(min(value), max(value))) +
plain + labs(fill = "Petajoules", title = "Net imports 2015")
detach(trade2015)
# Calling the ggplot functions
one
two
three
four
five
six
# Data visualization
## Initializing ggplot functions for the maps
attach(supply1990)
map <- world %>% inner_join(supply1990, by = "region")
one <- map %>% ggplot(aes(x = long, y = lat, group = group)) +
coord_fixed(1.5) +
geom_polygon(aes(fill = value)) +
scale_fill_gradient(low = "yellow", high = "red", limits = c(min(value), max(value))) +
plain + labs(fill = "Petajoules", title = "Total supply 1990")
detach(supply1990)
attach(supply1995)
map <- world %>% inner_join(supply1995, by = "region")
two <- map %>% ggplot(aes(x = long, y = lat, group = group)) +
coord_fixed(1.5) +
geom_polygon(aes(fill = value)) +
scale_fill_gradient(low = "yellow", high = "red", limits = c(min(value), max(value))) +
plain + labs(fill = "Petajoules", title = "Total supply 1995")
detach(supply1995)
attach(supply2000)
map <- world %>% inner_join(supply2000, by = "region")
three <- map %>% ggplot(aes(x = long, y = lat, group = group)) +
coord_fixed(1.5) +
geom_polygon(aes(fill = value)) +
scale_fill_gradient(low = "yellow", high = "red", limits = c(min(value), max(value))) +
plain + labs(fill = "Petajoules", title = "Total supply 2000")
detach(supply2000)
attach(supply2005)
map <- world %>% inner_join(supply2005, by = "region")
four <- map %>% ggplot(aes(x = long, y = lat, group = group)) +
coord_fixed(1.5) +
geom_polygon(aes(fill = value)) +
scale_fill_gradient(low = "yellow", high = "red", limits = c(min(value), max(value))) +
plain + labs(fill = "Petajoules", title = "Total supply 2005")
detach(supply2005)
attach(supply2010)
map <- world %>% inner_join(supply2010, by = "region")
five <- map %>% ggplot(aes(x = long, y = lat, group = group)) +
coord_fixed(1.5) +
geom_polygon(aes(fill = value)) +
scale_fill_gradient(low = "yellow", high = "red", limits = c(min(value), max(value))) +
plain + labs(fill = "Petajoules", title = "Total supply 2010")
detach(supply2010)
attach(supply2015)
map <- world %>% inner_join(supply2015, by = "region")
six <- map %>% ggplot(aes(x = long, y = lat, group = group)) +
coord_fixed(1.5) +
geom_polygon(aes(fill = value)) +
scale_fill_gradient(low = "yellow", high = "red", limits = c(min(value), max(value))) +
plain + labs(fill = "Petajoules", title = "Total supply 2015")
detach(supply2015)
# Calling the created ggplot functions
one
two
three
four
five
six
# Subsetting relevant data based on the factors "region" and "series"
attach(dat)
energy_total <- dat %>% filter(region == "Total, all countries or areas" & series == "Primary energy production (petajoules)")
imports_total <- dat %>% filter(region == "Total, all countries or areas" & series == "Net imports [Imports - Exports - Bunkers] (petajoules)")
stocks_total <- dat %>% filter(region == "Total, all countries or areas" & series == "Changes in stocks (petajoules)")
supply_total <- dat %>% filter(region == "Total, all countries or areas" & series == "Total supply (petajoules)")
supply_pc_total <- dat %>% filter(region == "Total, all countries or areas" & series == "Supply per capita (gigajoules)")
detach(dat)
# Data visualization
a <- energy_total %>% ggplot(aes(year, value)) + geom_line(col = 2) + geom_point() + labs(x = "Year", y = "Petajoules") + theme_minimal()
b <- imports_total %>% ggplot(aes(year, value)) + geom_line(col = 3) + geom_point()+ labs(x = "Year", y = "Petajoules") + theme_minimal()
c <- stocks_total %>% ggplot(aes(year, value)) + geom_line(col = 4) + geom_point() + labs(x = "Year", y = "Petajoules") + theme_minimal()
d <- supply_total %>% ggplot(aes(year, value)) + geom_line(col = 5) + geom_point() + labs(x = "Year", y = "Petajoules") + theme_minimal()
e <- supply_pc_total %>% ggplot(aes(year, value)) + geom_line(col = 6) + geom_point() + labs(x = "Year", y = "Gigajoules") + theme_minimal()
# Arranging and calling all the plots together
ggarrange(a + labs(title = "Production"),
b + labs(title = "Net import"),
c + labs(title = "Changes in stocks"),
d + labs(title = "Total Supply"),
e + labs(title = "Supply per capita"), ncol = 3, nrow = 2)
# Creating ggplot functions of the plots
# LOESS models
f <- a + geom_smooth(formula = "y ~ x", method = loess, col = "firebrick1") +
labs(subtitle = "LOESS")
g <- b + geom_smooth(formula = "y ~ x", method = loess, col = "firebrick1") +
labs(subtitle = "LOESS")
h <- c + geom_smooth(formula = "y ~ x", method = loess, col = "firebrick1") +
labs(subtitle = "LOESS")
i <- d + geom_smooth(formula = "y ~ x", method = loess, col = "firebrick1") +
labs(subtitle = "LOESS")
j <- e + geom_smooth(formula = "y ~ x", method = loess, col = "firebrick1") +
labs(subtitle = "LOESS")
# Linear models
k <- a + geom_smooth(formula = "y ~ x", method = lm, col = "seagreen") +
labs(subtitle = "Linear")
l <- b + geom_smooth(formula = "y ~ x", method = lm, col = "seagreen") +
labs(subtitle = "Linear")
m <- c + geom_smooth(formula = "y ~ x", method = lm, col = "seagreen") +
labs(subtitle = "Linear")
n <- d + geom_smooth(formula = "y ~ x", method = lm, col = "seagreen") +
labs(subtitle = "Linear")
o <- e + geom_smooth(formula = "y ~ x", method = lm, col = "seagreen") +
labs(subtitle = "Linear")
# Quadratic models
p <- a + geom_smooth(method="lm", formula = y ~ x + I(x^2)) +
labs(subtitle = "Quadratic")
q <- b + geom_smooth(method="lm", formula = y ~ x + I(x^2)) +
labs(subtitle = "Quadratic")
r <- c + geom_smooth(method="lm", formula = y ~ x + I(x^2)) +
labs(subtitle = "Quadratic")
s <- d + geom_smooth(method="lm", formula = y ~ x + I(x^2)) +
labs(subtitle = "Quadratic")
t <- e + geom_smooth(method="lm", formula = y ~ x + I(x^2)) +
labs(subtitle = "Quadratic")
# Calling and arranging required ggplot functions together where series == Total production
ggarrange(k,f,p, ncol = 1, nrow = 3) %>%
annotate_figure(top = text_grob("Primary energy production",
face = "bold", size = 15))
# Calling and arranging required ggplot functions together where series == Net imports
ggarrange(l,g,q, ncol = 1, nrow = 3) %>%
annotate_figure(top = text_grob("Net Imports",
face = "bold", size = 15))
# Calling and arranging required ggplot functions together where series == Changes in stocks
ggarrange(m,h,r, ncol = 1, nrow = 3) %>%
annotate_figure(top = text_grob("Changes in stocks",
face = "bold", size = 15))
# Calling and arranging required ggplot functions together where series == Total supply
ggarrange(n,i,s, ncol = 1, nrow = 3) %>%
annotate_figure(top = text_grob("Total supply",
face = "bold", size = 15))
# Calling and arranging required ggplot functions together where series == Supply per capita
ggarrange(o,j,t, ncol = 1, nrow = 3) %>%
annotate_figure(top = text_grob("Supply per capita",
face = "bold", size = 15))
# Comparing the regression models
# R^2 values for fitted linear models
r1 <- summary(lm(energy_total$value~energy_total$year))$r.squared
r2 <- summary(lm(imports_total$value~imports_total$year))$r.squared
r3 <- summary(lm(supply_total$value~supply_total$year))$r.squared
r4 <- summary(lm(supply_pc_total$value~supply_pc_total$year))$r.squared
# Printing these R^2 values
cat("LINEAR REGRESSION\nCoeff. of det (Total prodn): ", r1,
"\nCoeff. of det (Net imports): ", r2,
"\nCoeff. of det (Total supply): ", r3,
"\nCoeff. of det (Supply per capita): ", r4)
## LINEAR REGRESSION
## Coeff. of det (Total prodn): 0.9817323
## Coeff. of det (Net imports): 0.8402147
## Coeff. of det (Total supply): 0.9862789
## Coeff. of det (Supply per capita): 0.9570919
# R^2 values to compare between linear, quadratic, and LOESS models and check
# for the best possible fit for non-linear data
# 1. Changes in stocks
stocks.lo <- loess(stocks_total$value~stocks_total$year)
r.lo <- cor(stocks_total$value, predict(stocks.lo))
r.lm <- summary(lm(stocks_total$value~stocks_total$year))$r.squared
r.qm <- summary(lm(stocks_total$value~stocks_total$year+I(stocks_total$year^2)))$r.squared
cat("Coeffs. of determination for variable 'Changes in Stocks'
\nLinear regression: ", r.lm,
"\nQuadratic regression: ", r.qm,
"\nLOESS regression: ", r.lo)
## Coeffs. of determination for variable 'Changes in Stocks'
##
## Linear regression: 0.04427113
## Quadratic regression: 0.1889551
## LOESS regression: 0.7909008
# 2. Net imports
imports.lo <- loess(imports_total$value~imports_total$year)
r1.lo <- cor(imports_total$value, predict(imports.lo))
r1.lm <- summary(lm(imports_total$value~imports_total$year))$r.squared
r1.qm <- summary(lm(imports_total$value~imports_total$year+I(imports_total$year^2)))$r.squared
cat("Coeffs. of determination for variable 'Net imports'
\nLinear regression: ", r1.lm,
"\nQuadratic regression: ", r1.qm,
"\nLOESS regression: ", r1.lo)
## Coeffs. of determination for variable 'Net imports'
##
## Linear regression: 0.8402147
## Quadratic regression: 0.8983214
## LOESS regression: 0.9939503
The exploratory data analysis done for the dataset has a few limitations. The analysis could not capture information of countries whose data was unavailable (Russia in 1990). The dataset was not linked/joined with the data of map (used from package “maps”) due to conflicts in a few countries’ political boundaries. This resulted in missing out on information of those countries (not major when it comes to economy and power in production, supply, and trade) for analysis.
When we look at changes in world wide production, trade and supply, it is evident that over the years USA has deemed itself worthy of being called a superpower nation. Constantly, from 1990 to 2015, USA has been one of the very few leading producers, importer and exporter of energy.
China, on the other hand, has had the most significant growth over the years which ultimately resulted in USA becoming second to it. By mid 2000s, China overtook USA to become the leading primary energy producer. By early 2010s, China had already gained the reputation and had become the biggest supplier. In the following years, by 2015, the country rose to become the biggest importer as well, beating USA in the business that they once owned like gods.
Russia has constantly been significantly important as well when it comes to total production; but is second to almost every other country in terms of net imports and supply. Over the years, the country has become one of the least importers; perhaps because of its home production whereas most of the other countries like India have been importing more and more energy over the years. Overall, apart from USA and China, all the other countries are significantly less involved in the activities, be it production, trade or supply.
By the line graphs of the global activities, it is quite obvious that total production and supply has been linearly growing (thanks to China and USA). On the contrary to that, net import has been constantly declining, except for a small rise between 2005 and 2010. This could possibly be because most of the countries in need of resources have been tending to look for self-production of the same. This results in significant development in terms of national savings.
As total supply (in PJ) has been on a rise, supply per capita (in GJ) has also been increasing over the years.
Changes in stocks has been unpredictable. Over the years from 1990 to 2000, there was a steep decline and then for the following 10 years, a incline. Since 2014, the decline makes a show again and the curve of change has been negative ever since then.
Total Production, Total Supply, and Supply per capita are linear in nature (as discussed in EDA). Hence, fitting a linear model would be ideal for those variables.
On fitting linear models as mentioned above, the coeffs. of determination are 0.9817323, 0.9862789, 0.9570919 respectively (all close to 1, which determines the goodness of fit). This implies that our assumption for modelling was right and hence, linear modelling is better than other alternatives like Quadratic and LOESS regression.
Now, since Net imports and Changes in Stocks are not linear in nature (discovered by EDA and \(R^{2}\)), we proceed and look for other alternatives. As discussed in the previous section,the \(R^{2}\) value determines LOESS regression to be the best fit for both Net Imports and Changes in Stocks since both the values are 0.7909008 and 0.9939503 respectively (relatively much better than that of Linear and Quadratic models [in that particular order]). Thus, we may proceed and declare that LOESS regression is the best option for predictive analytics on the two variables considered.
Through sophisticated exploratory data analysis, trends in growth of production, trade, and supply activities worldwide. The race for development between USA and China is significantly evident in the stretch of years since 1990. Apart from the two global superpowers, India and Russia share major portion of the market.
Increasing trend in primary energy production, and total and per capita supply of energy over decades suggest the ever increasing demand and supply chain of energy business. Developing countries like India have and will be producing and importing for years to come. The trend can be understood very well with the help of numerous statistical models. To understand linearly increasing/decreasing trends, linear models are handy, and to understand non-linear patterns, LOESS and Quadratic models suffice.
Error: Aesthetics must be either length 1 or the same as the data (2): fill. (2018, October 4). RStudio Community. https://community.rstudio.com/t/error-aesthetics-must-be-either-length-1-or-the-same-as-the-data-2-fill/15579
Roberts, D. (n.d.). Statistic 2 - Comparing Exponential and Power Regression Models. MathBits.Com - Fred and Donna Roberts. Retrieved November 7, 2020, from https://mathbits.com/MathBits/TISection/Statistics2/compareExpPower.html
R help - Non-linear Regression best-fit line. (n.d.). Nabble Forum. Retrieved November 6, 2020, from https://r.789695.n4.nabble.com/Non-linear-Regression-best-fit-line-td3606769.html
UCLA: Statistical Consulting Group. (2011). How can I explore different smooths in ggplot2? UCLA Stats. https://stats.idre.ucla.edu/r/faq/how-can-i-explore-different-smooths-in-ggplot2/
ggplot2 colors : How to change colors automatically and manually? - Easy Guides - Wiki - STHDA. (n.d.). Http://Www.Sthda.Com/. Retrieved November 5, 2020, from http://www.sthda.com/english/wiki/ggplot2-colors-how-to-change-colors-automatically-and-manually
Power regression in R similar to excel. (2013, August 19). Stack Overflow. https://stackoverflow.com/questions/18305852/power-regression-in-r-similar-to-excel
Get Initial Parameter Estimates for Nonlinear Least Squares Models. (n.d.). GetInitial Blog. Retrieved November 7, 2020, from https://docs.tibco.com/pub/enterprise-runtime-for-R/5.1.0/doc/html/Language_Reference/stats/getInitial.html
Does lack of seasonality imply random time series? (2019, April 18). Cross Validated. https://stats.stackexchange.com/questions/403819/does-lack-of-seasonality-imply-random-time-series
Alice, M. (2017, April 28). Fitting Polynomial Regression in R. DataScience+. https://datascienceplus.com/fitting-polynomial-regression-r/
Fitting Nonlinear Mixed-Effects Models. (2020, November 7). SpringerLink. https://link.springer.com/chapter/10.1007%2F0-387-22747-4_8