Since it was first harnessed, electricity has become one of the most important forms of energy in human history. As technology advances, electricity has become crucial to almost every aspect of everyday life.
The data in this report covers worldwide energy production, broken down by category. The report explores how electricity has been produced so far across those categories. Because electricity is produced from many categories, we select only the top 5 of those available.
Furthermore, as a preliminary report, it focuses only on the ASEAN countries and presents some insights into how electricity is produced there, with some comparisons between them.
From our inspection, we can conclude that the data contains 1,189,482 rows and 7 columns.
energy <- read.csv("energy.csv")
head(energy)
tail(energy)
dim(energy)
## [1] 1189482 7
Data dictionary:
- country_or_area: chr
- commodity_transaction: chr
- year: int
- unit: chr
- quantity: num
- quantity_footnotes: int
- category: chr
First, we check the data type of each column using the str() function:
str(energy)
## 'data.frame': 1189482 obs. of 7 variables:
## $ country_or_area : chr "Austria" "Austria" "Belgium" "Belgium" ...
## $ commodity_transaction: chr "Additives and Oxygenates - Exports" "Additives and Oxygenates - Exports" "Additives and Oxygenates - Exports" "Additives and Oxygenates - Exports" ...
## $ year : int 1996 1995 2014 2013 2012 2011 2010 2009 1998 1995 ...
## $ unit : chr "Metric tons, thousand" "Metric tons, thousand" "Metric tons, thousand" "Metric tons, thousand" ...
## $ quantity : num 5 17 0 0 35 25 22 45 1 7 ...
## $ quantity_footnotes : int NA NA NA NA NA NA NA NA NA NA ...
## $ category : chr "additives_and_oxygenates" "additives_and_oxygenates" "additives_and_oxygenates" "additives_and_oxygenates" ...
Some columns are not stored in the correct type, so we convert them to enable further data exploration.
The columns to be converted are as follows:
- Country or Area (country_or_area) as Factor
- Commodity Transaction (commodity_transaction) as Factor
- Category (category) as Factor
energy$country_or_area <- as.factor(energy$country_or_area)
energy$commodity_transaction <- as.factor(energy$commodity_transaction)
energy$category <- as.factor(energy$category)
str(energy)
## 'data.frame': 1189482 obs. of 7 variables:
## $ country_or_area : Factor w/ 243 levels "Afghanistan",..: 14 14 21 21 21 21 21 21 58 58 ...
## $ commodity_transaction: Factor w/ 2452 levels "Additives and Oxygenates - Exports",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ year : int 1996 1995 2014 2013 2012 2011 2010 2009 1998 1995 ...
## $ unit : chr "Metric tons, thousand" "Metric tons, thousand" "Metric tons, thousand" "Metric tons, thousand" ...
## $ quantity : num 5 17 0 0 35 25 22 45 1 7 ...
## $ quantity_footnotes : int NA NA NA NA NA NA NA NA NA NA ...
## $ category : Factor w/ 71 levels "additives_and_oxygenates",..: 1 1 1 1 1 1 1 1 1 1 ...
levels(energy$category)
## [1] "additives_and_oxygenates"
## [2] "animal_waste"
## [3] "anthracite"
## [4] "aviation_gasoline"
## [5] "bagasse"
## [6] "biodiesel"
## [7] "biogases"
## [8] "biogasoline"
## [9] "bitumen"
## [10] "black_liquor"
## [11] "blast_furnace_gas"
## [12] "brown_coal"
## [13] "brown_coal_briquettes"
## [14] "charcoal"
## [15] "coal_tar"
## [16] "coke_oven_coke"
## [17] "coking_coal"
## [18] "conventional_crude_oil"
## [19] "direct_use_of_geothermal_heat"
## [20] "direct_use_of_solar_thermal_heat"
## [21] "electricity_net_installed_capacity_of_electric_power_plants"
## [22] "ethane"
## [23] "falling_water"
## [24] "fuel_oil"
## [25] "fuelwood"
## [26] "gas_coke"
## [27] "gas_oil_diesel_oil"
## [28] "gasoline_type_jet_fuel"
## [29] "gasworks_gas"
## [30] "geothermal"
## [31] "hard_coal"
## [32] "heat"
## [33] "hydro"
## [34] "industrial_waste"
## [35] "kerosene_type_jet_fuel"
## [36] "lignite"
## [37] "liquified_petroleum_gas"
## [38] "lubricants"
## [39] "motor_gasoline"
## [40] "municipal_wastes"
## [41] "naphtha"
## [42] "natural_gas_including_lng"
## [43] "natural_gas_liquids"
## [44] "nuclear_electricity"
## [45] "of_which_biodiesel"
## [46] "of_which_biogasoline"
## [47] "oil_shale_oil_sands"
## [48] "other_bituminous_coal"
## [49] "other_coal_products"
## [50] "other_hydrocarbons"
## [51] "other_kerosene"
## [52] "other_liquid_biofuels"
## [53] "other_oil_products_n_e_c"
## [54] "other_recovered_gases"
## [55] "other_vegetal_material_and_residues"
## [56] "paraffin_waxes"
## [57] "patent_fuel"
## [58] "peat"
## [59] "peat_products"
## [60] "petroleum_coke"
## [61] "refinery_feedstocks"
## [62] "refinery_gas"
## [63] "solar_electricity"
## [64] "sub_bituminous_coal"
## [65] "thermal_electricity"
## [66] "tide_wave_and_ocean_electricity"
## [67] "total_electricity"
## [68] "total_refinery_output"
## [69] "uranium"
## [70] "white_spirit_and_special_boiling_point_industrial_spirits"
## [71] "wind_electricity"
Using str() again, we can confirm that the data types have been corrected:
- Country or Area (country_or_area) is now a Factor
- Commodity Transaction (commodity_transaction) is now a Factor
- Category (category) is now a Factor
For this report, we focus only on electricity produced in the ASEAN region.
We use the following ASEAN member countries:
- Brunei Darussalam
- Cambodia
- Indonesia
- Laos
- Malaysia
- Myanmar
- Philippines
- Singapore
- Thailand
- Vietnam
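The three conversions can also be written as a single step. The following is a minimal sketch on a small made-up data frame (`toy` and its values are illustrative stand-ins, not rows from energy.csv), assuming the same three column names:

```r
# Toy stand-in for the energy data frame (values are illustrative only).
toy <- data.frame(
  country_or_area = c("Austria", "Belgium", "Austria"),
  commodity_transaction = c("Fuel oil - Exports", "Fuel oil - Exports",
                            "Fuel oil - Imports"),
  year = c(1996L, 2014L, 1995L),
  category = c("fuel_oil", "fuel_oil", "fuel_oil"),
  stringsAsFactors = FALSE
)

# Columns that should be treated as categorical.
factor_cols <- c("country_or_area", "commodity_transaction", "category")

# Convert all of them in one assignment.
toy[factor_cols] <- lapply(toy[factor_cols], as.factor)

# Confirm the resulting class of every column.
col_classes <- sapply(toy, class)
print(col_classes)
```

Listing the column names in one vector makes it easy to add or remove a column later without repeating the conversion line.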
country_asean <- c("Brunei Darussalam", "Cambodia", "Indonesia", "Laos", "Malaysia", "Myanmar", "Phillipines", "Singapore", "Thailand", "Vietnam")
energy_asean <- energy[energy$country_or_area == country_asean,]
## Warning in `==.default`(energy$country_or_area, country_asean): longer object
## length is not a multiple of shorter object length
## Warning in is.na(e1) | is.na(e2): longer object length is not a multiple of
## shorter object length
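energy_asean[sample(nrow(energy_asean), 6), ]
A random sample of rows shows that the subset contains only countries listed in country_asean. The warnings above, however, indicate that `==` recycled country_asean along the column, so a row is kept only where its country happens to line up with the recycled name; a membership test with `%in%` would keep every matching row.
Not all of the information is needed, so we wrangle the data as follows:
- Drop the quantity_footnotes column, as it is not needed in the report.
- Drop the factor levels for countries that no longer appear in the subset.
- Keep only rows with a quantity above zero.
- Sort the rows by quantity, largest first.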
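The "longer object length is not a multiple of shorter object length" warning comes from comparing a long column with a short vector using `==`. The sketch below contrasts `==` with `%in%` on a made-up data frame (`toy` and `wanted` are illustrative names and values, not part of the report's data):

```r
# Toy data: country names are illustrative, not taken from energy.csv.
toy <- data.frame(
  country_or_area = c("Thailand", "Austria", "Indonesia", "Thailand", "Belgium"),
  quantity = c(10, 20, 30, 40, 50)
)
wanted <- c("Indonesia", "Thailand")

# `==` recycles `wanted` along the column: row 1 is compared with "Indonesia",
# row 2 with "Thailand", row 3 with "Indonesia", and so on.
by_equal <- suppressWarnings(toy[toy$country_or_area == wanted, ])

# `%in%` asks, for every row, whether its country appears anywhere in `wanted`.
by_in <- toy[toy$country_or_area %in% wanted, ]

nrow(by_equal)
nrow(by_in)
```

In this toy case the `==` subset misses the first Thailand row because it was compared against "Indonesia", while `%in%` keeps all three matching rows.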
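# Remove the footnotes column
energy_asean$quantity_footnotes <- NULL
# Drop country levels that no longer occur in the subset
energy_asean$country_or_area <- droplevels(energy_asean$country_or_area)
# Keep rows with a positive quantity
energy_asean <- energy_asean[energy_asean$quantity > 0,]
# Sort by quantity, largest first
energy_asean <- energy_asean[order(energy_asean$quantity, decreasing = TRUE),]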
head(energy_asean)
tail(energy_asean)
# freq takes a single logical; TRUE plots the number of records per year
hist(energy_asean$year, freq = TRUE)
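hist() counts how many records fall in each year bin, and its freq argument expects a single TRUE or FALSE rather than a vector of weights. If the aim is to show the quantity per year instead, one option is to sum the quantity within each year and draw a bar plot. This is a sketch on made-up values (`toy` is illustrative), and on the real data it only makes sense within a single category and unit:

```r
# Toy records (illustrative values only): several rows per year.
toy <- data.frame(
  year = c(1990, 1990, 1991, 1991, 1991, 1992),
  quantity = c(5, 7, 4, 6, 10, 30)
)

# Sum quantity within each year; the result is a vector named by year.
total_by_year <- tapply(toy$quantity, toy$year, sum)

# Bar heights are the yearly totals, not record counts.
barplot(total_by_year, xlab = "Year", ylab = "Total quantity")
```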
summary(energy_asean)
## country_or_area
## Brunei Darussalam:343
## Cambodia :240
## Indonesia :766
## Malaysia :609
## Myanmar :520
## Singapore :489
## Thailand :703
## commodity_transaction
## Electricity - net installed capacity of electric power plants, public combustible fuels: 20
## Gas Oil/ Diesel Oil - Total energy supply : 20
## Electricity - Consumption by other manuf., const. and non-fuel ind. : 19
## Electricity - total production, main activity : 19
## From combustible fuels – Main activity : 19
## Fuelwood - Total energy supply : 19
## (Other) :3554
## year unit quantity
## Min. :1990 Length:3670 Min. : 0
## 1st Qu.:1997 Class :character 1st Qu.: 50
## Median :2003 Mode :character Median : 553
## Mean :2003 Mean : 198500
## 3rd Qu.:2009 3rd Qu.: 5419
## Max. :2014 Max. :200000000
##
## category
## total_electricity : 358
## gas_oil_diesel_oil : 309
## natural_gas_including_lng: 274
## fuel_oil : 250
## liquified_petroleum_gas : 214
## motor_gasoline : 192
## (Other) :2073
From the summary above, we can conclude the following:
- Indonesia has the most records in the ASEAN subset (766), followed by Thailand (703) and Malaysia (609). These are row counts, not production volumes.
- The five countries with the most records are Indonesia, Thailand, Malaysia, Myanmar, and Singapore.
- Only seven of the ten countries in country_asean appear in the subset. Laos, the Philippines, and Vietnam are absent, possibly because their names are spelled differently in the data.
- The data spans 1990 to 2014.
Next, we look for further insights by aggregating the data.
Total electricity by year:
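The summary lists seven countries although country_asean names ten, so it is worth checking which names have no counterpart in the data. The sketch below uses a hypothetical set of levels (`available` stands in for levels(energy$country_or_area); the spellings shown are assumptions, not confirmed values from energy.csv):

```r
# Hypothetical factor levels standing in for levels(energy$country_or_area).
available <- c("Cambodia", "Indonesia", "Lao People's Dem. Rep.", "Thailand")

# Names we want to select; one of them is spelled differently in `available`.
wanted <- c("Cambodia", "Indonesia", "Laos", "Thailand")

# Names that can never match, however the subset is written.
missing_names <- setdiff(wanted, available)
print(missing_names)

# A case-insensitive partial search helps locate the spelling actually used.
candidates <- grep("lao", available, ignore.case = TRUE, value = TRUE)
print(candidates)
```

Running the same two calls against the real factor levels would show whether the missing countries are absent from the data or only named differently.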
tot_energy_asean <- energy_asean[energy_asean$category == "total_electricity",]
agg1 <- aggregate(tot_energy_asean$quantity~tot_energy_asean$year,tot_energy_asean,sum)
head(agg1)
tail(agg1)
summary(agg1)
## tot_energy_asean$year tot_energy_asean$quantity
## Min. :1990 Min. : 86216
## 1st Qu.:1996 1st Qu.:168177
## Median :2002 Median :244410
## Mean :2002 Mean :290187
## 3rd Qu.:2008 3rd Qu.:370163
## Max. :2014 Max. :894132
Insight:
- Overall, the yearly total for the ASEAN region trends upward, with yearly totals ranging from 86,216 to 894,132 between 1990 and 2014.
Next, we look more closely at the top 3 countries. A box plot lets us compare Indonesia, Thailand, and Malaysia:
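The upward trend can be checked numerically rather than read off the printed rows. The following sketch uses made-up yearly records (`toy` is illustrative) and passes `data =` to aggregate() so that the result columns are simply named year and quantity:

```r
# Toy yearly records for two countries (illustrative values only).
toy <- data.frame(
  year = c(1990, 1990, 1991, 1991, 1992, 1992),
  quantity = c(10, 20, 15, 25, 30, 40)
)

# Using `data =` keeps the result columns named `year` and `quantity`.
yearly <- aggregate(quantity ~ year, data = toy, FUN = sum)

# Year-over-year change; all positive means the total rose every year.
change <- diff(yearly$quantity)
rose_every_year <- all(change > 0)

# The slope of a linear fit gives the average change per year.
fit <- lm(quantity ~ year, data = yearly)
slope <- coef(fit)[["year"]]

print(yearly)
print(rose_every_year)
print(slope)
```

A positive slope with some negative year-over-year changes would indicate an overall rise with occasional dips, which is a weaker statement than a rise in every year.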
indo_energy <- tot_energy_asean[tot_energy_asean$country_or_area == "Indonesia",]
thai_energy <- tot_energy_asean[tot_energy_asean$country_or_area == "Thailand",]
malay_energy <- tot_energy_asean[tot_energy_asean$country_or_area == "Malaysia",]
indo_energy[sample(nrow(indo_energy), 6), ]
boxplot(indo_energy$quantity, thai_energy$quantity, malay_energy$quantity, names=c("Indonesia","Thailand","Malaysia"))
Insights from the chart:
- All three countries show outliers in total electricity, which seem to represent years with unusually high values.
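To see which years the outliers belong to, the values that boxplot() draws as individual points can be retrieved with boxplot.stats(). This is a sketch on made-up totals for a single country (`toy` is illustrative, not data from the report):

```r
# Toy yearly totals for one country (illustrative values only).
toy <- data.frame(
  year = 2001:2008,
  quantity = c(100, 105, 110, 108, 112, 115, 109, 400)
)

# boxplot.stats() returns the points drawn as outliers in `$out`: values
# lying more than 1.5 times the box length beyond the hinges.
outlier_values <- boxplot.stats(toy$quantity)$out

# Look up the years those values belong to.
outlier_years <- toy$year[toy$quantity %in% outlier_values]
print(outlier_years)
```

Applying the same two steps to indo_energy, thai_energy, and malay_energy would list the years behind each outlier in the chart.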