Since the invention of electricity, electricity has been the most important type of energy source in human histroy. As technology goes by and more advance these days, the use of electricity has been very crucial in almost all aspects of our everyday life.
Bussiness Requirements:
The main data on this report is gathered some insights and make some simple visual representation chart and graph from the worldwide electricity production based on its categories. This report try to gather on how the electricity has been used so far, based on its categories. As there are many categories where the electicity is produced from, we will only select the top 5 categories from what available.
Furthermore, as a preliminary report, this report wil only focused on the ASEAN Countries, where the report will try to present some insights on how the electricity is produced in these ASEAN Countries, with some of coprison values on it.
From our inspection we can conclude : - power data contain 1189482 of rows and 7 of columns.
energy <- read.csv("energy1.csv")
head(energy)tail(energy)dim(energy)## [1] 1048575 7
str(energy)## 'data.frame': 1048575 obs. of 7 variables:
## $ country_or_area : chr "Austria" "Austria" "Belgium" "Belgium" ...
## $ commodity_transaction: chr "Additives and Oxygenates - Exports" "Additives and Oxygenates - Exports" "Additives and Oxygenates - Exports" "Additives and Oxygenates - Exports" ...
## $ year : int 1996 1995 2014 2013 2012 2011 2010 2009 1998 1995 ...
## $ unit : chr "Metric tons" "Metric tons" "Metric tons" "Metric tons" ...
## $ quantity : num 5 17 0 0 35 25 22 45 1 7 ...
## $ quantity_footnotes : int NA NA NA NA NA NA NA NA NA NA ...
## $ category : chr "additives_and_oxygenates" "additives_and_oxygenates" "additives_and_oxygenates" "additives_and_oxygenates" ...
Dictionary of Data: - country_or_area : type chr - commodity_transaction : type chr - year : type int - unit : type chr - quantity : type num - quantity_footnotes : type int - category : type chr
As some of data types are not in the corect types. We need to convert it into the corect data-type to enable further data exploration.
Data Type to be corrected: - Country or Area (country_or_area) as Factor - Commodity Transaction (commodity_transaction) as Factor - Category (category) as Factor - Unit (unit) as Factor
In this report, we will also delete data that has null value.
any(is.na(energy))## [1] TRUE
na.omit(energy)energy$country_or_area <- as.factor(energy$country_or_area)
energy$commodity_transaction <- as.factor(energy$commodity_transaction)
energy$category <- as.factor(energy$category)
energy$unit <- as.factor(energy$unit)
options(scipen = 1234)
str(energy)## 'data.frame': 1048575 obs. of 7 variables:
## $ country_or_area : Factor w/ 243 levels "Afghanistan",..: 14 14 21 21 21 21 21 21 58 58 ...
## $ commodity_transaction: Factor w/ 2336 levels "Additives and Oxygenates - Exports",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ year : int 1996 1995 2014 2013 2012 2011 2010 2009 1998 1995 ...
## $ unit : Factor w/ 1 level "Metric tons": 1 1 1 1 1 1 1 1 1 1 ...
## $ quantity : num 5 17 0 0 35 25 22 45 1 7 ...
## $ quantity_footnotes : int NA NA NA NA NA NA NA NA NA NA ...
## $ category : Factor w/ 67 levels "additives_and_oxygenates",..: 1 1 1 1 1 1 1 1 1 1 ...
levels(energy$category)## [1] "additives_and_oxygenates"
## [2] "animal_waste"
## [3] "anthracite"
## [4] "aviation_gasoline"
## [5] "bagasse"
## [6] "biodiesel"
## [7] "biogases"
## [8] "biogasoline"
## [9] "bitumen"
## [10] "black_liquor"
## [11] "blast_furnace_gas"
## [12] "brown_coal"
## [13] "brown_coal_briquettes"
## [14] "charcoal"
## [15] "coal_tar"
## [16] "coke_oven_coke"
## [17] "coking_coal"
## [18] "conventional_crude_oil"
## [19] "direct_use_of_geothermal_heat"
## [20] "direct_use_of_solar_thermal_heat"
## [21] "electricity_net_installed_capacity_of_electric_power_plants"
## [22] "ethane"
## [23] "falling_water"
## [24] "fuel_oil"
## [25] "fuelwood"
## [26] "gas_coke"
## [27] "gas_oil_diesel_oil"
## [28] "gasoline_type_jet_fuel"
## [29] "gasworks_gas"
## [30] "geothermal"
## [31] "hard_coal"
## [32] "heat"
## [33] "hydro"
## [34] "industrial_waste"
## [35] "kerosene_type_jet_fuel"
## [36] "lignite"
## [37] "liquified_petroleum_gas"
## [38] "lubricants"
## [39] "motor_gasoline"
## [40] "municipal_wastes"
## [41] "naphtha"
## [42] "natural_gas_including_lng"
## [43] "natural_gas_liquids"
## [44] "nuclear_electricity"
## [45] "of_which_biodiesel"
## [46] "of_which_biogasoline"
## [47] "oil_shale_oil_sands"
## [48] "other_bituminous_coal"
## [49] "other_coal_products"
## [50] "other_hydrocarbons"
## [51] "other_kerosene"
## [52] "other_liquid_biofuels"
## [53] "other_oil_products_n_e_c"
## [54] "other_recovered_gases"
## [55] "other_vegetal_material_and_residues"
## [56] "paraffin_waxes"
## [57] "patent_fuel"
## [58] "peat"
## [59] "peat_products"
## [60] "petroleum_coke"
## [61] "refinery_feedstocks"
## [62] "refinery_gas"
## [63] "solar_electricity"
## [64] "sub_bituminous_coal"
## [65] "thermal_electricity"
## [66] "tide_wave_and_ocean_electricity"
## [67] "total_electricity"
From confirmation using str(), we can confirm that all of the Data Type to be changed has been corrected properly: - Country or Area (country_or_area) as Factor - Commodity Transaction (commodity_transaction) as Factor - Category (category) as Factor - Unit (unit) as Factor
In this report, data will be subsetted based on following conditions: - Only focused on ASEAN Region, with member = Brunei Darussalam, Cambodia, Indonesia, Laos, Malaysia, Myanmar, Phillipines, Singapore, Thailand, and Vietnam
As it will not be needed on this report, we will also delete column and Quantity that will not be needed on the report (Quantity Footnotes)
Find each Total Electricity produced per year for each ASEAN Countries.
country_asean <- c("Brunei Darussalam", "Cambodia", "Indonesia", "Laos", "Malaysia", "Myanmar", "Phillipines", "Singapore", "Thailand", "Vietnam")
energy_asean <- energy[energy$country_or_area == country_asean,]## Warning in `==.default`(energy$country_or_area, country_asean): longer object
## length is not a multiple of shorter object length
## Warning in is.na(e1) | is.na(e2): longer object length is not a multiple of
## shorter object length
energy_asean$quantity_footnotes <- NULL
sample(energy_asean)From the data sampling using sample() function, we can conclude that the data has been subsetted only to include ASEAN Countries as mentioned on object country_asean, where the column quantity_footnotes has also been excluded.
To get more understanding of the data, we will also want to do aggregate the Data of Quantity based on Year, Category, and Country:
We will also want to know the Total Qty of Electricity Produce by each Countries.
library("dplyr")##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
tot_elect_asean_agg1 <- aggregate(quantity ~ year + country_or_area + category ,energy_asean,sum)
tot_elect_asean_agg1tot_elect_asean_agg2 <- aggregate(quantity ~ country_or_area ,energy_asean,sum)
tot_elect_asean_agg2tot_elect_asean_agg3 <- aggregate(quantity ~ category ,energy_asean, sum)
tot_elect_asean_agg3top10_country <- top_n(tot_elect_asean_agg2, 10, quantity)
top10_countryAs for Category, we also want to filter/ aggregate the data to know the top 5 category of total quantity of electricity been produced.
top5_category <- top_n(tot_elect_asean_agg3, 5, quantity)
top5_category[order(top5_category$quantity, decreasing = T),]top5_categoryThe top 5 Categories are: - natural_gas_including_lng - peat
- hard_coal - brown_coal - other_vegetal_material_and_residues
As there are too many Categories with too dispersed/ un-even quantity values, in this report we will only focus on Top 5 Categories:
energy_asean_top5cat <- energy_asean[energy_asean$category == top5_category$category,]
sample(energy_asean_top5cat)library(ggplot2)
ggplot(data = energy_asean_top5cat, mapping = aes( y = country_or_area, x = quantity)) +
geom_col(aes(fill = category)) +
labs(title = "Electricity Produce By ASEAN Countries",
y = "Country",
x = "Quantity") +
theme_minimal() +
theme(legend.position = "bottom")ggplot(data = top5_category, mapping = aes(x = quantity, y = category)) +
geom_col(aes(fill = category)) +
theme(legend.position = "bottom") +
labs(title = "Electricity Produce By Category",
y = "Category",
x = "Quantity") +
theme_minimal() +
theme(legend.position = "bottom")To make it easier in reading and understanding the data, we will sort the visual data to show from highest to lowest, and give the vertical line for average quantity value for the electricity graph by Categories:
ggplot(data = energy_asean_top5cat, mapping = aes( y = reorder(country_or_area, quantity), x = quantity)) +
geom_col(aes(fill = category)) +
labs(title = "Electricity Produce By ASEAN Countries",
y = "Country",
x = "Quantity") +
theme_minimal() +
theme(legend.position = "bottom")ggplot(data = top5_category, mapping = aes(x = quantity, y = reorder(category, quantity))) +
geom_col(aes(fill = category), position = "stack", show.legend = F) +
geom_vline(xintercept = mean(top5_category$quantity), col = "firebrick") +
theme(legend.position = "bottom") +
labs(title = "Top 5 Electricity Produce By Category",
y = "Category",
x = "Quantity") +
theme_minimal() +
theme(legend.position = "bottom")From the data, we can conclude as follows: - Top 5 of Category = - natural_gas_including_lng
- peat
- hard_coal - brown_coal - other_vegetal_material_and_residues
Last, from the top 5 Category, we will try to visualize each of the category when compared with quantity and year:
library(lattice)
top5_cat <- c("hard_coal" , "natural_gas_including_lng" , "peat" , "other_vegetal_material_and_residues" , "brown_coal")
ggplot(data = tot_elect_asean_agg1[tot_elect_asean_agg1$category == top5_category$category,], mapping = aes(x = year, y = quantity)) +
geom_col(mapping = aes(fill = year), position = "dodge", show.legend = F) + facet_wrap(~category, scales = "free_y") + theme(axis.text.x = element_text(angle = 90))+
labs(title = "Comparison Between Top 5 Electricity Categories in ASEAN Countries",
y = "Category",
x = "Year") ## Warning in `==.default`(tot_elect_asean_agg1$category, top5_category$category):
## longer object length is not a multiple of shorter object length
## Warning in is.na(e1) | is.na(e2): longer object length is not a multiple of
## shorter object length
agg_top5cat <- tot_elect_asean_agg1[tot_elect_asean_agg1$category == top5_category$category,]## Warning in `==.default`(tot_elect_asean_agg1$category, top5_category$category):
## longer object length is not a multiple of shorter object length
## Warning in `==.default`(tot_elect_asean_agg1$category, top5_category$category):
## longer object length is not a multiple of shorter object length
xyplot(quantity ~ year | category , data=agg_top5cat, pch=20 , cex=3 , col=rgb(0.2,0.4,0.8,0.5)) Each of the data above show that there are no correlation between year and the quantity that the electricity been produced when compared to its category. This may due to each of the ASEAN Countries governments may produce the electricity based on their specific needs per year and per its category.