The world needs energy because it is a basic human need. As a result, it is important to find out where energy comes from and whether it is accessible to everyone through electricity. No electricity means no refrigeration of food, no washing machine, and no light at night. Second, even where electricity is available, the energy sources that currently produce it emit large amounts of carbon, which is driving climate change. This project investigates energy consumption by tidying, mutating, and visualizing data from several sources. It also builds models to assess whether all countries have access to electricity and whether we are doing anything to reduce carbon emissions.
The world consumes about 580 million terajoules of energy a year. 83% of this energy comes from fossil fuels, which produce greenhouse gas emissions. Furthermore, hundreds of millions of people lack access to energy entirely. Although countries are taking initiatives to increase the use of renewables, demand for fossil fuels continues to rise, and the world lacks safe, low-carbon, cheap, large-scale energy alternatives.
We will then use the answers to these questions to answer our main question: Is energy accessible to everyone, and how close are we to getting rid of fossil fuels?
Our main data source is Kaggle. We selected this dataset because its categorical columns are useful for data visualization and for extracting insights, and because Our World in Data updates the data regularly. The dataset has 122 columns with metrics such as primary energy per capita, growth rates, energy mix, and electricity mix. We are aware that some columns contain null values, and we handle these cases in our analysis. The data date from 1990 to 2020 and include 242 unique countries. Our second data source is Our World in Data, with several datasets on which countries have access to electricity and on GDP per capita, including metrics such as electricity per capita and access to electricity. We included these to enrich our results and to produce more plots that support a better-founded conclusion about energy consumption.
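The text quotes world energy use in terajoules (TJ), while the charts and forecasts that follow use terawatt-hours (TWh). A quick conversion using only the two figures stated above puts them on the same scale (1 TWh = 3.6e15 J = 3600 TJ):

```r
# Convert the annual consumption figure quoted above from terajoules (TJ)
# to terawatt-hours (TWh), the unit used in the rest of this report.
# 1 TWh = 1e12 Wh * 3600 J/Wh = 3.6e15 J = 3600 TJ
tj_per_twh <- 3600

world_consumption_tj  <- 580e6   # ~580 million TJ per year (figure from the text)
world_consumption_twh <- world_consumption_tj / tj_per_twh

fossil_share           <- 0.83   # 83% from fossil fuels (figure from the text)
fossil_consumption_twh <- world_consumption_twh * fossil_share

round(world_consumption_twh)    # 161111
round(fossil_consumption_twh)   # 133722
```

That comes to roughly 161,000 TWh a year, of which about 134,000 TWh is fossil. This is the same order of magnitude as the world fossil consumption figures in the forecasts later in this report.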
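Before the null values are replaced, it helps to measure the share of missing values in each column so that sparse metrics are visible. The sketch below uses a small hypothetical data frame. Its column names mirror the Kaggle file, but its values are made up. It also shows one way to record which zeros were originally missing, because later steps in this report replace every NA with 0.

```r
# Hypothetical toy data standing in for a few of the dataset's columns;
# the column names mirror the Kaggle file, the values are invented.
energy <- data.frame(
  country = c("A", "A", "B", "B"),
  year = c(2019, 2020, 2019, 2020),
  solar_electricity = c(1.5, NA, NA, 0.2),
  coal_electricity = c(10, 9, NA, NA)
)

# Share of missing values per column, highest first
na_share <- sort(colMeans(is.na(energy)), decreasing = TRUE)
na_share

# Replacing NA with 0 treats "not reported" as "zero production".
# A logical indicator column keeps that distinction after the replacement.
energy$solar_missing <- is.na(energy$solar_electricity)
energy$solar_electricity[is.na(energy$solar_electricity)] <- 0
```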
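library(tidyverse)  # readr, dplyr, tidyr, ggplot2, stringr
library(viridis)    # scale_fill_viridis()
library(plotly)     # ggplotly(), plot_ly(), animation helpers
library(maps)       # world map polygons and maps::iso3166
# energy consumed per year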
global_energy_substitution <- read_csv("World Energy Consumption Datasets/global-energy-substitution.csv")
energy_consumed <- select(global_energy_substitution,-Entity,-Code)
energy_consumed_result_line_graph <- energy_consumed %>%
pivot_longer(-c(Year)) %>%
ggplot(aes(x=Year,y=value,group=name, color=name, fill = name)) +
geom_line() + scale_x_continuous(breaks=seq(min(energy_consumed$Year),max(energy_consumed$Year),10),guide = guide_axis(check.overlap = TRUE) ) +
labs(title="Global primary energy consumption by source",
caption = "World Energy Consumption", y = "TWh", x = "Year")
energy_consumed_result_area_graph <- energy_consumed %>%
pivot_longer(-c(Year)) %>%
ggplot(aes(x=Year,y=value,group=name, color=name, fill = name)) +
geom_area() + scale_x_continuous(breaks=seq(min(energy_consumed$Year),max(energy_consumed$Year),10),guide = guide_axis(check.overlap = TRUE) ) +
labs(title="Global primary energy consumption by source",
caption = "World Energy Consumption", y = "TWh", x = "Year")
energy_consumed_result_area_graph
energy_consumed_result_line_graph
#The number of people without electricity more than halved over the last 20 years
number_of_people_with_and_without_electricity_access <- read_csv("World Energy Consumption Datasets/number-of-people-with-and-without-electricity-access.csv")
number_of_people_with_and_without_electricity_access_filtered <- number_of_people_with_and_without_electricity_access %>%
filter(Entity %in% c("World"))
number_of_people_with_and_without_electricity_access_display <- select(number_of_people_with_and_without_electricity_access_filtered,-Entity,-Code) %>%
filter(Year %in% c(1998,2000,2002,2004,2006,2008,2010,2012,2014,2016,2019)) %>%
pivot_longer(-c(Year)) %>%
ggplot(aes(x=Year,y=value,group=name, color=name, fill = name)) +
geom_bar(stat = "identity") + scale_x_continuous(breaks=seq(min(number_of_people_with_and_without_electricity_access_filtered$Year),max(number_of_people_with_and_without_electricity_access_filtered$Year),10),guide = guide_axis(check.overlap = TRUE) ) +
labs(title="Number of people with and without electricity access, World",y = "Billion",x="Year")
number_of_people_with_and_without_electricity_access_display
#how much energy do people consume per capita?
per_capita_energy_use_data <- read_csv("World Energy Consumption Datasets/per-capita-energy-use.csv")
world <- map_data('world') %>%
filter(region != 'Antarctica')
gapminder_data_3 <- per_capita_energy_use_data %>%
inner_join(maps::iso3166 %>%
select(a3, mapname), by= c(Code = "a3")) %>%
mutate(mapname = str_remove(mapname, "\\(.*"))
per_capita_energy_use_result <- world %>%
as_tibble() %>%
inner_join(gapminder_data_3, by=c(region= "mapname")) %>%
filter(Year %in% c(2021)) %>%
ggplot(aes(long, lat, group= group, fill= primary_energy_consumption_per_capita )) +
geom_polygon(color = "white", size = 0.05, alpha = 0.8) +
scale_fill_viridis(
option= "magma",
direction = -1,
name = "kWh per person",
guide =guide_colorbar(
direction = "horizontal",
barheight = unit(2, units = "mm"),
barwidth = unit(50, units = "mm"),
draw.ulim = F,
title.position = "top",
title.hjust = 0.5,
label.hjust = 0.5
)) +
theme_void() +
facet_wrap(~Year) +
labs(title = "Energy use per person, 2021") +
coord_fixed (ratio = 1.3) +
theme(plot.title=element_text(size = 16,
hjust = 0.5),
legend.position = "bottom")
per_capita_energy_use_result
# Plot energy consumption per capita over time for selected countries
energy_consumed_curves_result <- read_csv("World Energy Consumption Datasets/per-capita-energy-use.csv") %>%
filter(Entity %in% c("Sweden","Nigeria","India","United States","China","Brazil","World","United Kingdom")) %>%
ggplot(aes(y = primary_energy_consumption_per_capita, x = Year,group=Entity, color=Entity, fill = Entity)) +
geom_point() + geom_line() +
labs(y = "kWh", x = "Year")
energy_consumed_curves_result
# access to electricity vs GDP per capita
access_to_electricity_vs_gdp_per_capita <- read_csv("World Energy Consumption Datasets/access-to-electricity-vs-gdp-per-capita.csv")
access_to_electricity_vs_gdp_per_capita_plot <- ggplot(data = na.omit(access_to_electricity_vs_gdp_per_capita), aes(x = gdp, y = access, color = Continent, size = `Population (historical estimates)`, label = Entity)) + geom_point() +
labs(x = "GDP per capita", y = "Share of population with access to electricity", title = "Access to electricity vs GDP per capita")
# Interactive version of the same plot via plotly::ggplotly()
access_to_electricity_vs_gdp_per_capita_display <- ggplotly(access_to_electricity_vs_gdp_per_capita_plot)
access_to_electricity_vs_gdp_per_capita_plot
This scatter plot shows the share of the population with access to electricity on the vertical axis against the average income (GDP per capita) of each country on the horizontal axis. The size of each circle represents the country's population.
Attached is a link to an interactive version of the scatter plot that shows the information for each plotted country.
Interactive Display Link : (https://rpubs.com/MichelleVava/960877)
Analysis:
Next steps: We still want to investigate why emissions are low in poor countries. Does this mean that poor countries use clean energy? Or does it reflect whether they have access to modern energy and technology?
Before turning to the long-term impacts of climate change, it is important to analyze how each energy source compares in terms of short-term health risks.
# death rates from energy production, per TWh of electricity, by source
death_rates_from_energy_production_per_twh <- read_csv("World Energy Consumption Datasets/death-rates-from-energy-production-per-twh.csv")
death_rates_result <- ggplot(death_rates_from_energy_production_per_twh, aes(x=Entity, y=Deaths_per_TWh_of_electricity_production,fill=Entity)) +
geom_bar(stat = "identity") +
coord_flip()
death_rates_result
df <- read.csv("World Energy Consumption Datasets/electricity_production.csv")
df[is.na(df)] <- 0  # treat missing values as zero production/consumption
electricity_production <- df[,c('continent','year','biofuel_electricity','hydro_electricity',
'nuclear_electricity','solar_electricity',
'wind_electricity','other_renewable_electricity',
"fossil_electricity","renewables_consumption",
"fossil_fuel_consumption","renewables_electricity",
"coal_electricity","gas_electricity","oil_electricity",
"coal_consumption","gas_consumption",'oil_consumption')]
electricity_production<-electricity_production %>%
select( continent,year,
biofuel_electricity,
hydro_electricity,
nuclear_electricity,
solar_electricity,
wind_electricity,
other_renewable_electricity,
renewables_electricity,
fossil_electricity,
coal_electricity,
gas_electricity,
oil_electricity,
renewables_consumption,
fossil_fuel_consumption,
coal_consumption,
gas_consumption,
oil_consumption)
# renewable electricity production (this total also includes nuclear, which is low-carbon but not renewable)
df2 <- mutate(electricity_production,total_renewable = (biofuel_electricity + hydro_electricity +
nuclear_electricity + solar_electricity +
wind_electricity + other_renewable_electricity))
grouped <- df2 %>%
group_by(continent,year) %>%
summarise(renewable_production= sum(total_renewable))
oil_coal_gas <- mutate(electricity_production, total_fossil = oil_electricity+gas_electricity+coal_electricity)
grouped1 <- oil_coal_gas%>%
group_by(continent,year)%>%
summarise(production = sum(total_fossil))
accumulate_by <- function(dat, var) {
var <- lazyeval::f_eval(var, dat)
lvls <- plotly:::getLevels(var)
dats <- lapply(seq_along(lvls), function(x) {
cbind(dat[var %in% lvls[seq(1, x)], ], frame = lvls[[x]])
})
dplyr::bind_rows(dats)
}
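`accumulate_by()` relies on `plotly:::getLevels`, an unexported plotly internal (note the triple colon) that may change between plotly versions. The idea itself is simple: for every frame value, keep all rows up to and including that value, so each animation frame redraws the full history. Below is a base-R sketch of that idea, not a drop-in copy. It assumes the key column is sortable (for example numeric years) and takes the column name as a string rather than a formula.

```r
# Base-R sketch of cumulative animation frames:
# for each frame value f, keep every row whose key is <= f and tag it with f.
accumulate_frames <- function(dat, var) {
  lvls <- sort(unique(dat[[var]]))
  frames <- lapply(lvls, function(f) {
    sub <- dat[dat[[var]] <= f, , drop = FALSE]
    sub$frame <- f
    sub
  })
  do.call(rbind, frames)
}

# Hypothetical example: two continents over three years
toy <- data.frame(
  continent = rep(c("Asia", "Europe"), times = 3),
  year = rep(2017:2019, each = 2),
  production = c(5, 3, 6, 3, 7, 2)
)
acc <- accumulate_frames(toy, "year")
table(acc$frame)  # 2 rows in frame 2017, 4 in 2018, 6 in 2019
```

The output grows quadratically with the number of frames. This is why the animated charts in this report restrict `year` to 1980 to 2019 before accumulating.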
# df3 <- grouped
# fig <- df3 %>%
# filter(between(year,1980,2019), continent %in% c("Asia", "Europe","Africa","Americas","Oceania"))
# fig <- fig %>% accumulate_by(~year)
fig1 <- electricity_production %>%
group_by(continent,year)%>%
summarise(production = sum(renewables_electricity))
fig1<-fig1 %>%
filter(between(year,1980,2019), continent %in% c("Asia", "Europe","Africa","Americas","Oceania"))%>%
accumulate_by(~year)
fig2 <- electricity_production %>%
group_by(continent,year)%>%
summarise(consumption = sum(renewables_consumption))
fig2<-fig2 %>%
filter(between(year,1980,2019), continent %in% c("Asia", "Europe","Africa","Americas","Oceania"))%>%
accumulate_by(~year)
fig3 <- electricity_production %>%
group_by(continent,year)%>%
summarise(consumption = sum(fossil_fuel_consumption))
fig3<-fig3 %>%
filter(between(year,1980,2019), continent %in% c("Asia", "Europe","Africa","Americas","Oceania"))%>%
accumulate_by(~year)
fig4 <- electricity_production %>%
group_by(continent,year)%>%
summarise(production = sum(fossil_electricity))
fig4<-fig4 %>%
filter(between(year,1980,2019), continent %in% c("Asia", "Europe","Africa","Americas","Oceania"))%>%
accumulate_by(~year)
fig5 <- grouped1
# fig5 %>%filter(fossil_production_sum > 0)
fig5<-grouped1 %>%
filter(between(year,1980,2019), continent %in% c("Asia", "Europe","Africa","Americas","Oceania"))%>%
accumulate_by(~year)
pltly1 <-plot_ly() %>%
# add_trace(
# x = ~year,
# y = ~consumption,
# split = ~continent,
# frame = ~frame,
# type = 'scatter',
# mode = 'lines',
# data = fig2,
# opacity = 1.0
# ) %>%
# add_trace(
# x = ~year,
# y = ~consumption,
# split = ~continent,
# frame = ~frame,
# type = 'scatter',
# mode = 'lines',
# data = fig3,
# opacity = 0.5
# ) %>%
add_trace(
x = ~year,
y = ~production,
split = ~continent,
frame = ~frame,
type = 'scatter',
mode = 'lines',
data = fig4,
opacity = 1.0
) %>%
add_trace(
x = ~year,
y = ~production,
split = ~continent,
frame = ~frame,
type = 'scatter',
mode = 'lines',
data = fig1,
opacity = 0.5
) %>%
animation_opts(
frame = 100,
transition = 0,
redraw = FALSE
) %>%
layout(title = "Electricity production from fossil and renewable sources")%>%
animation_slider(
hide = T
) %>%
animation_button(
x = 1, xanchor = "right", y = 0, yanchor = "bottom"
)
# pltly1
pltly2 <-plot_ly() %>%
add_trace(
x = ~year,
y = ~consumption,
split = ~continent,
frame = ~frame,
type = 'scatter',
mode = 'lines',
data = fig2,
opacity = 1.0
) %>%
add_trace(
x = ~year,
y = ~consumption,
split = ~continent,
frame = ~frame,
type = 'scatter',
mode = 'lines',
data = fig3,
opacity = 0.5
) %>%
# add_trace(
# x = ~year,
# y = ~production,
# split = ~continent,
# frame = ~frame,
# type = 'scatter',
# mode = 'lines',
# data = fig4,
# opacity = 1.0
# ) %>%
# add_trace(
# x = ~year,
# y = ~production,
# split = ~continent,
# frame = ~frame,
# type = 'scatter',
# mode = 'lines',
# data = fig1,
# opacity = 0.5
# ) %>%
animation_opts(
frame = 100,
transition = 0,
redraw = FALSE
) %>%
layout(title = "Electricity consumption from fossil and renewable sources")%>%
animation_slider(
hide = T
) %>%
animation_button(
x = 1, xanchor = "right", y = 0, yanchor = "bottom"
)
# pltly2
Renewable and fossil electricity production
Renewable and fossil electricity consumption
Population is a major factor affecting energy consumption. Asia, the most populous continent, has fossil fuel consumption that surpasses its fossil electricity production. This suggests that it has to import energy from other countries to meet its needs. Part of this gap, however, reflects what the two measures count: fossil fuel consumption is primary energy for all uses (including transport and heating), whereas fossil electricity production counts only electricity generation.
The Americas and Europe show a decreasing trend in fossil fuel consumption.
Of the two, Europe's fossil fuel consumption has declined more sharply in recent years.
Meanwhile, renewable energy consumption shows an increasing trend.
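Rising renewable consumption alongside falling fossil consumption is easier to track as a single number: renewable consumption as a percentage of renewable plus fossil consumption. This is one way to gauge "how close we are to getting rid of fossil fuels". The sketch computes that share and its average yearly change with a linear fit. The figures are hypothetical and not from the dataset. The share ignores any consumption outside these two columns, such as nuclear.

```r
# Hypothetical yearly figures (TWh) for one continent; not from the dataset.
toy <- data.frame(
  year = 2015:2019,
  renewables_consumption = c(100, 110, 125, 135, 150),
  fossil_fuel_consumption = c(900, 890, 880, 860, 850)
)

# Renewable share of (renewables + fossil), in percent
toy$renewable_share <- with(
  toy,
  100 * renewables_consumption / (renewables_consumption + fossil_fuel_consumption)
)

# Average change in the share, in percentage points per year (OLS slope)
pp_per_year <- unname(coef(lm(renewable_share ~ year, data = toy))[2])

round(toy$renewable_share, 1)  # 10.0 11.0 12.4 13.6 15.0
```

Applied per continent (for example by splitting `fig2` and `fig3` by `continent`), the slope gives a rough pace of substitution. It is a descriptive trend, not a forecast.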
Interactive Display : (https://rpubs.com/Francis2707/981100)
library(forecast)
library(tseries)
df <- read.csv("World Energy Consumption Datasets/electricity_production.csv")
df[is.na(df)] <- 0  # treat missing values as zero production/consumption
electricity_production <- df[,c('continent','year','biofuel_electricity','hydro_electricity','nuclear_electricity','solar_electricity',
'wind_electricity','other_renewable_electricity',"fossil_electricity",
"renewables_consumption","fossil_fuel_consumption","renewables_electricity")]
electricity_production <- electricity_production %>%
select( continent,year,
biofuel_electricity,
hydro_electricity,
nuclear_electricity,
solar_electricity,
wind_electricity,
other_renewable_electricity,
renewables_electricity,
fossil_electricity,
renewables_consumption,
fossil_fuel_consumption)
# Note: this total includes nuclear, which is low-carbon but not renewable
df2 <- mutate(electricity_production, total_renewable = (biofuel_electricity + hydro_electricity + nuclear_electricity + solar_electricity + wind_electricity + other_renewable_electricity))
grouped <- df2 %>%
group_by(continent,year) %>%
summarise(renewable_production_sum = sum(total_renewable))
world_renewable_production <- grouped %>%
group_by(year)%>%
summarise(total_renewable_production = sum(renewable_production_sum))%>%
filter(total_renewable_production > 0 & year < 2020)
# View(world_renewable_production)
fig3 <- electricity_production %>%
group_by(continent,year)%>%
summarise(production = sum(fossil_electricity))
world_fossil_production <- fig3 %>%
group_by(year)%>%
summarise(total_fossil_production = sum(production))%>%
filter(total_fossil_production > 0 & year <2020)
# View(world_fossil_production)
fig2 <- electricity_production %>%
group_by(continent,year)%>%
summarise(consumption = sum(renewables_consumption))
world_renewable_consumption <- fig2 %>%
group_by(year)%>%
summarise(total_renewable_consumption = sum(consumption))%>%
filter(total_renewable_consumption > 0 & year <2020)
fig1 <- electricity_production %>%
group_by(continent,year)%>%
summarise(consumption= sum(fossil_fuel_consumption))
world_fossil_consumption <- fig1 %>%
group_by(year)%>%
summarise(total_fossil_consumption = sum(consumption))%>%
filter(total_fossil_consumption > 0 & year <2020)
continent_name <- "Europe"
fossil_consumption    <- fig1 %>% filter(continent == continent_name & consumption > 0)
renewable_consumption <- fig2 %>% filter(continent == continent_name & consumption > 0)
fossil_production     <- fig3 %>% filter(continent == continent_name & production > 0)
renewable_production  <- grouped %>% filter(continent == continent_name & renewable_production_sum > 0 & year < 2020)
# Take each series' start year from the data rather than hard-coding 1985:
# ts() labels observations by counting forward from `start`, so a series that
# begins earlier (such as the consumption series) would get shifted year labels.
z <- ts(renewable_production$renewable_production_sum, start = min(renewable_production$year))
x <- ts(fossil_consumption$consumption, start = min(fossil_consumption$year))
y <- ts(renewable_consumption$consumption, start = min(renewable_consumption$year))
w <- ts(fossil_production$production, start = min(fossil_production$year))
a <- ts(world_renewable_production$total_renewable_production, start = min(world_renewable_production$year))
b <- ts(world_fossil_production$total_fossil_production, start = min(world_fossil_production$year))
c <- ts(world_renewable_consumption$total_renewable_consumption, start = min(world_renewable_consumption$year))
d <- ts(world_fossil_consumption$total_fossil_consumption, start = min(world_fossil_consumption$year))
fit_z <- auto.arima(z)
fit_x <- auto.arima(x)
fit_y <- auto.arima(y)
fit_w <- auto.arima(w)
fit_a <-auto.arima(a)
fit_b <- auto.arima(b)
fit_c <- auto.arima(c)
fit_d <- auto.arima(d)
plot(forecast(fit_a, 12), xlab = "Year", ylab = "Production (TWh)", main = "Predicted world renewable energy production")
pred_values_a <- forecast(fit_a,12)
pred_values_a
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2020 10488.78 10337.00 10640.56 10256.66 10720.90
## 2021 10946.51 10719.18 11173.84 10598.84 11294.19
## 2022 11404.30 11078.70 11729.90 10906.33 11902.27
## 2023 11862.07 11432.02 12292.12 11204.36 12519.77
## 2024 12319.84 11775.13 12864.55 11486.78 13152.90
## 2025 12777.61 12110.14 13445.09 11756.80 13798.43
## 2026 13235.39 12437.10 14033.68 12014.51 14456.27
## 2027 13693.16 12756.60 14629.72 12260.81 15125.51
## 2028 14150.93 13068.97 15232.89 12496.22 15805.65
## 2029 14608.71 13374.56 15842.85 12721.25 16496.16
## 2030 15066.48 13673.65 16459.30 12936.34 17196.62
## 2031 15524.25 13966.50 17082.00 13141.87 17906.63
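The point forecasts above rise by an almost constant amount each year: about 458 TWh for world renewable production, and about 300 TWh for fossil production below. A model with a drift term produces this pattern. The output does not show which ARIMA order `auto.arima()` selected, so treat the drift reading as an assumption; printing `fit_a` would confirm it. The sketch reproduces the simplest such model, a random walk with drift. Here the h-step forecast is the last value plus h times the average historical change.

```r
# Random walk with drift: the h-step point forecast is the last observation
# plus h times the average historical change, (y_n - y_1) / (n - 1).
drift_forecast <- function(y, h) {
  n <- length(y)
  drift <- (y[n] - y[1]) / (n - 1)
  y[n] + drift * seq_len(h)
}

drift_forecast(c(10, 12, 15, 16), h = 3)  # 18 20 22

# Point forecasts copied from the output above (world renewable production):
# their year-on-year steps are nearly constant, as a drift model implies.
reported <- c(10488.78, 10946.51, 11404.30, 11862.07, 12319.84)
round(diff(reported))  # 458 458 458 458
```

A straight-line forecast like this extends the past average change indefinitely. It cannot capture an acceleration or a policy-driven break in the trend.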
plot(forecast(fit_b, 12), xlab = "Year", ylab = "Production (TWh)", main = "Predicted world fossil energy production")
pred_values_b <- forecast(fit_b,12)
pred_values_b
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2020 15891.80 15567.03 16216.56 15395.11 16388.48
## 2021 16192.28 15732.99 16651.56 15489.86 16894.69
## 2022 16492.76 15930.25 17055.27 15632.48 17353.04
## 2023 16793.24 16143.72 17442.77 15799.88 17786.61
## 2024 17093.72 16367.53 17819.92 15983.11 18204.34
## 2025 17394.21 16598.70 18189.71 16177.59 18610.82
## 2026 17694.69 16835.45 18553.93 16380.59 19008.79
## 2027 17995.17 17076.60 18913.74 16590.34 19400.00
## 2028 18295.65 17321.36 19269.94 16805.61 19785.70
## 2029 18596.13 17569.14 19623.13 17025.49 20166.78
## 2030 18896.62 17819.50 19973.73 17249.31 20543.92
## 2031 19197.10 18072.09 20322.11 17476.54 20917.66
plot(forecast(fit_c, 12), xlab = "Year", ylab = "Consumption (TWh)", main = "Predicted world renewable energy consumption")
pred_values_c <- forecast(fit_c,12)
pred_values_c
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2040 18221.35 17996.99 18445.71 17878.22 18564.48
## 2041 19135.79 18787.58 19484.01 18603.25 19668.34
## 2042 20037.80 19512.92 20562.67 19235.06 20840.53
## 2043 20945.56 20234.27 21656.85 19857.74 22033.38
## 2044 21850.66 20929.41 22771.91 20441.72 23259.59
## 2045 22756.99 21610.47 23903.51 21003.54 24510.44
## 2046 23662.75 22274.13 25051.36 21539.05 25786.45
## 2047 24568.77 22923.44 26214.11 22052.45 27085.09
## 2048 25474.68 23558.32 27391.03 22543.87 28405.49
## 2049 26380.63 24179.89 28581.38 23014.89 29746.38
## 2050 27286.57 24788.58 29784.55 23466.23 31106.90
## 2051 28192.51 25385.01 31000.02 23898.81 32486.22
plot(forecast(fit_d, 12), xlab = "Year", ylab = "Consumption (TWh)", main = "Predicted world fossil energy consumption")
pred_values_d <- forecast(fit_d,12)
pred_values_d
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2040 130245.7 127273.4 133218.0 125700.0 134791.4
## 2041 132025.3 127821.8 136228.7 125596.7 138453.9
## 2042 133804.9 128656.7 138953.0 125931.5 141678.3
## 2043 135584.5 129639.9 141529.1 126493.1 144675.9
## 2044 137364.1 130717.9 144010.3 127199.6 147528.7
## 2045 139143.7 131863.2 146424.3 128009.0 150278.4
## 2046 140923.3 133059.4 148787.3 128896.5 152950.2
## 2047 142703.0 134296.1 151109.8 129845.7 155560.2
## 2048 144482.6 135565.7 153399.4 130845.4 158119.7
## 2049 146262.2 136863.0 155661.4 131887.4 160637.0
## 2050 148041.8 138183.8 157899.7 132965.4 163118.2
## 2051 149821.4 139525.1 160117.7 134074.6 165568.2
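The consumption forecasts above are labelled 2040 to 2051, even though the world series were filtered to years before 2020. These labels are consistent with a hard-coded `start = c(1985)`. `ts()` assigns years purely by counting forward from `start`, so a series with more observations than 1985 to 2019 spans is labelled past 2019. The sketch demonstrates the effect. The 55-observation length is inferred from where these forecasts begin, and the 1965 to 2019 year range in the second half is a hypothetical illustration.

```r
# ts() labels observations only from `start` and the series length.
# 55 annual values declared to start in 1985 are labelled 1985..2039,
# so a 12-step forecast would be labelled 2040..2051.
n_obs <- 55
mislabelled <- ts(seq_len(n_obs), start = 1985)
end(mislabelled)   # 2039 1

# Deriving the start from the data keeps the labels aligned with the years;
# here the hypothetical years vector runs 1965..2019.
years <- 1965:2019
relabelled <- ts(seq_len(n_obs), start = min(years))
end(relabelled)    # 2019 1
```

With annual data (frequency 1) and no seasonal terms, the start year only changes the time labels. The fitted model and the point forecast values are the same either way.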
plot(forecast(fit_z, 12), xlab = "Year", ylab = "Production (TWh)", main = "Predicted renewable energy production in Europe")
pred_values_z <- forecast(fit_z,12)
pred_values_z
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2020 2440.996 2390.140 2491.851 2363.219 2518.772
## 2021 2479.899 2417.043 2542.756 2383.769 2576.030
## 2022 2518.803 2445.895 2591.711 2407.300 2630.306
## 2023 2557.707 2475.974 2639.439 2432.708 2682.706
## 2024 2596.610 2506.918 2686.303 2459.437 2733.784
## 2025 2635.514 2538.512 2732.516 2487.162 2783.866
## 2026 2674.418 2570.620 2778.216 2515.672 2833.163
## 2027 2713.322 2603.146 2823.497 2544.822 2881.821
## 2028 2752.225 2636.021 2868.429 2574.507 2929.944
## 2029 2791.129 2669.195 2913.063 2604.647 2977.611
## 2030 2830.033 2702.625 2957.440 2635.180 3024.885
## 2031 2868.936 2736.282 3001.591 2666.059 3071.814
plot(forecast(fit_w, 12), xlab = "Year", ylab = "Production (TWh)", main = "Predicted fossil energy production in Europe")
pred_values_w <- forecast(fit_w,12)
pred_values_w
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2021 1228.1396 1102.40277 1353.876 1035.84172 1420.437
## 2022 1157.5562 956.68010 1358.432 850.34271 1464.770
## 2023 1086.9728 812.13374 1361.812 666.64273 1507.303
## 2024 1016.3894 665.28004 1367.499 479.41397 1553.365
## 2025 945.8060 515.12380 1376.488 287.13441 1604.478
## 2026 875.2226 361.33859 1389.107 89.30483 1661.140
## 2027 804.6392 203.84077 1405.438 -114.20270 1723.481
## 2028 734.0558 42.64765 1425.464 -323.36171 1791.473
## 2029 663.4724 -122.17913 1449.124 -538.07793 1865.023
## 2030 592.8890 -290.55891 1476.337 -758.22798 1944.006
## 2031 522.3056 -462.40398 1507.015 -983.67775 2028.289
## 2032 451.7222 -637.62565 1541.070 -1214.29157 2117.736
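The lower bounds for European fossil electricity production above become negative: from 2027 for the 95% interval and from 2029 for the 80% interval. Electricity generation cannot be negative. A common remedy is to model the logarithm of the series and back-transform, so every forecast stays positive. In the forecast package, passing `lambda = 0` to `auto.arima()` applies a log (Box-Cox) transform. The base-R sketch below shows the idea with a log-scale random walk with drift on a hypothetical declining series, not the Europe data. Modelling on the log scale assumes a constant percentage change per year rather than a constant absolute change, which is itself a modelling assumption.

```r
# Hypothetical declining series (TWh); not the Europe data.
y <- c(2000, 1800, 1650, 1500, 1400, 1250)

# Random walk with drift on log(y), then back-transform with exp().
# exp() of any real number is positive, so forecasts never drop below zero;
# they decline geometrically instead of linearly.
log_drift_forecast <- function(y, h) {
  ly <- log(y)
  n <- length(ly)
  drift <- (ly[n] - ly[1]) / (n - 1)
  exp(ly[n] + drift * seq_len(h))
}

fc <- log_drift_forecast(y, h = 20)
```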
plot(forecast(fit_y, 12), xlab = "Year", ylab = "Consumption (TWh)", main = "Predicted renewable energy consumption in Europe")
pred_values_y <- forecast(fit_y,12)
pred_values_y
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2040 3739.719 3633.461 3845.978 3577.211 3902.228
## 2041 3869.301 3726.178 4012.424 3650.414 4088.189
## 2042 3998.883 3815.617 4182.149 3718.602 4279.164
## 2043 4128.465 3901.996 4354.934 3782.111 4474.819
## 2044 4258.047 3985.503 4530.591 3841.227 4674.867
## 2045 4387.629 4066.299 4708.959 3896.197 4879.061
## 2046 4517.211 4144.521 4889.900 3947.231 5087.190
## 2047 4646.793 4220.290 5073.295 3994.514 5299.072
## 2048 4776.375 4293.711 5259.038 4038.205 5514.545
## 2049 4905.957 4364.877 5447.036 4078.447 5733.466
## 2050 5035.538 4433.870 5637.207 4115.366 5955.711
## 2051 5165.120 4500.766 5829.475 4149.078 6181.163
plot(forecast(fit_x, 12), xlab = "Year", ylab = "Consumption (TWh)", main = "Predicted fossil energy consumption in Europe")
pred_values_x <- forecast(fit_x,12)
pred_values_x
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2040 15190.54 14628.36 15752.72 14330.761 16050.32
## 2041 15049.85 14177.17 15922.53 13715.207 16384.50
## 2042 14909.16 13743.52 16074.81 13126.460 16691.86
## 2043 14768.47 13309.12 16227.83 12536.583 17000.36
## 2044 14627.78 12868.04 16387.53 11936.490 17319.08
## 2045 14487.10 12417.84 16556.36 11322.435 17651.76
## 2046 14346.41 11957.41 16735.41 10692.747 18000.07
## 2047 14205.72 11486.28 16925.15 10046.704 18364.73
## 2048 14065.03 11004.31 17125.75 9384.065 18745.99
## 2049 13924.34 10511.49 17337.19 8704.839 19143.84
## 2050 13783.65 10007.93 17559.37 8009.178 19558.12
## 2051 13642.96 9493.76 17792.16 7297.306 19988.62
#
# qqnorm(fit$residuals)
# qqline(fit$residuals)
# Box.test(fit_d$residuals,type = "Ljung-Box")
# fig_1 <- plot_ly(forecast(fit,12),type = 'scatter',mode = 'lines+markers')
library(FactoMineR)
library(factoextra)
wenergy <- read.csv ("World Energy Consumption Datasets/electricity_production.csv")
w1 <- select(filter(wenergy,country == "Canada",year > 2001),year,renewables_electricity)
# ggplot(data = w1, mapping = aes(x = renewables_electricity, y = year, color = factor(renewables_electricity))) +
# geom_point()+
# ggtitle("Renewable Energy Usage in Canada from 2005-2020")
w2 <- select(filter(wenergy,year >2000),country,biofuel_electricity,
coal_electricity,fossil_electricity,gas_electricity,
hydro_electricity,nuclear_electricity,oil_electricity,
other_renewable_electricity,other_renewable_exc_biofuel_electricity,renewables_electricity,solar_electricity)
colnames(w2)[1] <- "region"
w2 <- w2 %>%
mutate(region = ifelse(as.character(region) == "United States", "USA", as.character(region)))
w2 <- w2 %>%
mutate(region = ifelse(as.character(region) == "United Kingdom", "UK", as.character(region)))
w2 <- w2 %>%
mutate(region = ifelse(as.character(region) == "Democratic Republic of Congo", "Democratic Republic of the Congo", as.character(region)))
w3 <- subset(w2, select = -region)
w3$emax <- colnames(w3)[max.col(w3)]
w31 <- subset(w3, select = -emax)
w31$maxe <-do.call(pmax, c(w31, na.rm=TRUE))
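# Zero out each row's largest value (the primary source) so that the
# next-largest value becomes the row maximum in the step below.
# (The column name must be used as the variable j, not the string "j".)
for (i in seq_len(nrow(w31))) {
  for (j in setdiff(colnames(w31), "maxe")) {
    # isTRUE() treats NA comparisons as "not the maximum"
    if (isTRUE(w31[i, j] == w31[i, "maxe"])) {
      w31[i, j] <- 0
    }
  }
}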
#second highest
w32 <- subset(w31, select = -maxe)
w32$maxe2 <-do.call(pmax, c(w32, na.rm=TRUE))
w312 <- subset(w31, select = -maxe)
w32$e2max <- colnames(w31)[max.col(w312)]
w32 <- cbind(w2["region"], w32[])
w4 <- cbind(w2["region"], w3["emax"])
#view(w4)
#Secondary energy source
ggplot(data = w32, aes(x=e2max))+
geom_bar(aes(fill = e2max))+
ggtitle("Secondary source of energy since 2001")+
theme(axis.text.x = element_text(angle = 90,hjust = 1))
We filtered the data down to each country's secondary energy source by removing the primary source, since the results for primary sources would mostly not differ. Because we were interested in the type of secondary energy source and its production around the world, we found each row's primary source and set it to zero using a for loop. The bar chart above visualizes the secondary source of energy around the world.
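The for loop zeroes out each row's maximum so that the next-largest value can be found. The same answer can be computed without modifying the data, by ranking each row and taking the second position. The sketch below uses hypothetical values. It uses only component columns, because the dataset's aggregate columns such as `fossil_electricity` and `renewables_electricity` appear to be sums of their components. The code earlier in this report builds `total_fossil` the same way. Mixed with their components, the aggregates would tend to crowd out those components.

```r
# Hypothetical production values (TWh) for three countries
prod <- data.frame(
  coal_electricity  = c(50, 1, 8),
  gas_electricity   = c(30, 2, 9),
  hydro_electricity = c(5, 40, 1),
  solar_electricity = c(NA, 3, 0.5)
)

# For each row, name the column holding the second-largest value.
# order(..., na.last = NA) drops NAs; ties keep column order.
second_source <- unname(apply(prod, 1, function(row) {
  ranked <- order(row, decreasing = TRUE, na.last = NA)
  if (length(ranked) < 2) NA_character_ else names(row)[ranked[2]]
}))

second_source  # "gas_electricity" "solar_electricity" "coal_electricity"
```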
library(factoextra)
set.seed(123)
w33 <- subset(w32, select = -c(1,13,14))
# ?kmeans
numeric_round_func <- function(x){
round(as.numeric(as.character(x)),2)
}
w34 <- w32
w34 <- w34 %>%
mutate_at(vars(-one_of("region", "e2max")), numeric_round_func)
w35 <- w34%>%
drop_na()
set.seed(1234)
# Cluster plot using kmeans
kmeans_b <- kmeans(w35[,2:12], centers = 11)
kmeans_table <- data.frame(kmeans_b$size, kmeans_b$centers)
kmeans_df <- data.frame(Cluster = kmeans_b$cluster, w35)
#description 1
# head of dataframe after kmeans
head(kmeans_df)
#description2
# kmeans fancy
kmeans_f <- kmeans(scale(w35[,2:12]), 11 , nstart = 10)
# plotting clusters
fviz_cluster(kmeans_f, data = scale(w35[,2:12]), geom = c("point"),ellipse.type = "euclid")
#description 3
#plotting each type of electricity in each cluster
ggplot(data = kmeans_df, aes(y = Cluster)) +
geom_bar(aes(fill = e2max)) +
ggtitle("Count of Clusters by Secondary Source of Electricity") +
theme(plot.title = element_text(hjust = 0.5))
K-means clustering partitions the data into k clusters and assigns each observation to the cluster with the nearest mean (centroid). We applied k-means to the data frame to cluster each country's secondary energy source production. The head of the resulting data frame shows, for example, that Afghanistan's secondary energy source is renewables_electricity, and that it belongs to cluster 4 based on the amount produced (in TWh). Most of the data have low secondary-source production, as most of the clusters lie towards the left-most end of the cluster plot. The cluster towards the right consists mostly of fossil electricity, which is the most produced electricity type. Note that k-means does not exclude outliers: every observation is assigned to some cluster, and extreme values can pull a centroid towards them. We clustered with base R's kmeans() and visualized the result with fviz_cluster() from the factoextra library.
The bar chart of cluster counts shows, for each cluster, how many observations have each secondary electricity type. Contrary to our expectations, fossil electricity is still the second-highest source of electricity in many countries.
Based on our analysis, we can see that:
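K-means minimizes the total within-cluster sum of squared Euclidean distances. As a result, columns with large ranges dominate unless the data are standardized, and the result depends on the random starting centroids. The sketch uses synthetic data, not the energy dataset, to show the two safeguards used in the report's second fit: `scale()` and `nstart`. In the report itself, `kmeans_df` takes its cluster labels from the unscaled fit `kmeans_b`, while `fviz_cluster()` plots the scaled fit `kmeans_f`, so cluster numbers in the two outputs need not correspond.

```r
set.seed(42)

# Synthetic data: two well-separated groups on very different scales
x <- rbind(
  data.frame(coal = rnorm(20, 100, 5), solar = rnorm(20, 1, 0.2)),
  data.frame(coal = rnorm(20, 10, 5),  solar = rnorm(20, 5, 0.2))
)

# scale() gives each column mean 0 and sd 1 so no single column dominates
# the Euclidean distance; nstart = 10 keeps the best of 10 random starts
# (the one with the lowest total within-cluster sum of squares).
km <- kmeans(scale(x), centers = 2, nstart = 10)

km$size                   # observations per cluster
table(km$cluster[1:20])   # the first synthetic group falls in one cluster
```

Cluster numbers are arbitrary labels, so "cluster 4" in one run may be "cluster 7" in another unless the seed and inputs are fixed.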
References