LBB-P4DS: Energy Producers in ASEAN

Wayan K

3/26/2021


1 About the Data

Since electricity was first harnessed, it has become one of the most important forms of energy in human history. As technology advances, electricity has become crucial in almost every aspect of our everyday life.

The main data for this report comes from worldwide energy statistics, broken down by category. The report tries to draw insights on how electricity has been produced so far, based on those categories. As electricity is produced from many categories, we select only the top 5 of the available categories.

Furthermore, as a preliminary report, this report focuses only on the ASEAN countries and presents some insights on how electricity is produced in them, with some comparison values.

2 Data Input and Inspection

From our inspection we can conclude that the energy data contains 1189482 rows and 7 columns.

energy <- read.csv("energy.csv") 

head(energy)
tail(energy)
dim(energy)
## [1] 1189482       7

Dictionary of Data:

  • country_or_area : type chr
  • commodity_transaction : type chr
  • year : type int
  • unit : type chr
  • quantity : type num
  • quantity_footnotes : type int
  • category : type chr

3 Data Cleansing & Coercion

First, we check the data type of each column using the str() function:

str(energy)
## 'data.frame':    1189482 obs. of  7 variables:
##  $ country_or_area      : chr  "Austria" "Austria" "Belgium" "Belgium" ...
##  $ commodity_transaction: chr  "Additives and Oxygenates - Exports" "Additives and Oxygenates - Exports" "Additives and Oxygenates - Exports" "Additives and Oxygenates - Exports" ...
##  $ year                 : int  1996 1995 2014 2013 2012 2011 2010 2009 1998 1995 ...
##  $ unit                 : chr  "Metric tons,  thousand" "Metric tons,  thousand" "Metric tons,  thousand" "Metric tons,  thousand" ...
##  $ quantity             : num  5 17 0 0 35 25 22 45 1 7 ...
##  $ quantity_footnotes   : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ category             : chr  "additives_and_oxygenates" "additives_and_oxygenates" "additives_and_oxygenates" "additives_and_oxygenates" ...

Some columns are not stored in the correct type, so we convert them to the correct data type to enable further data exploration.

The data types to be corrected are as follows:

  • Country or Area (country_or_area) as Factor
  • Commodity Transaction (commodity_transaction) as Factor
  • Category (category) as Factor

energy$country_or_area <- as.factor(energy$country_or_area)
energy$commodity_transaction <- as.factor(energy$commodity_transaction)
energy$category <- as.factor(energy$category)

str(energy)
## 'data.frame':    1189482 obs. of  7 variables:
##  $ country_or_area      : Factor w/ 243 levels "Afghanistan",..: 14 14 21 21 21 21 21 21 58 58 ...
##  $ commodity_transaction: Factor w/ 2452 levels "Additives and Oxygenates - Exports",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ year                 : int  1996 1995 2014 2013 2012 2011 2010 2009 1998 1995 ...
##  $ unit                 : chr  "Metric tons,  thousand" "Metric tons,  thousand" "Metric tons,  thousand" "Metric tons,  thousand" ...
##  $ quantity             : num  5 17 0 0 35 25 22 45 1 7 ...
##  $ quantity_footnotes   : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ category             : Factor w/ 71 levels "additives_and_oxygenates",..: 1 1 1 1 1 1 1 1 1 1 ...
levels(energy$category)
##  [1] "additives_and_oxygenates"                                   
##  [2] "animal_waste"                                               
##  [3] "anthracite"                                                 
##  [4] "aviation_gasoline"                                          
##  [5] "bagasse"                                                    
##  [6] "biodiesel"                                                  
##  [7] "biogases"                                                   
##  [8] "biogasoline"                                                
##  [9] "bitumen"                                                    
## [10] "black_liquor"                                               
## [11] "blast_furnace_gas"                                          
## [12] "brown_coal"                                                 
## [13] "brown_coal_briquettes"                                      
## [14] "charcoal"                                                   
## [15] "coal_tar"                                                   
## [16] "coke_oven_coke"                                             
## [17] "coking_coal"                                                
## [18] "conventional_crude_oil"                                     
## [19] "direct_use_of_geothermal_heat"                              
## [20] "direct_use_of_solar_thermal_heat"                           
## [21] "electricity_net_installed_capacity_of_electric_power_plants"
## [22] "ethane"                                                     
## [23] "falling_water"                                              
## [24] "fuel_oil"                                                   
## [25] "fuelwood"                                                   
## [26] "gas_coke"                                                   
## [27] "gas_oil_diesel_oil"                                         
## [28] "gasoline_type_jet_fuel"                                     
## [29] "gasworks_gas"                                               
## [30] "geothermal"                                                 
## [31] "hard_coal"                                                  
## [32] "heat"                                                       
## [33] "hydro"                                                      
## [34] "industrial_waste"                                           
## [35] "kerosene_type_jet_fuel"                                     
## [36] "lignite"                                                    
## [37] "liquified_petroleum_gas"                                    
## [38] "lubricants"                                                 
## [39] "motor_gasoline"                                             
## [40] "municipal_wastes"                                           
## [41] "naphtha"                                                    
## [42] "natural_gas_including_lng"                                  
## [43] "natural_gas_liquids"                                        
## [44] "nuclear_electricity"                                        
## [45] "of_which_biodiesel"                                         
## [46] "of_which_biogasoline"                                       
## [47] "oil_shale_oil_sands"                                        
## [48] "other_bituminous_coal"                                      
## [49] "other_coal_products"                                        
## [50] "other_hydrocarbons"                                         
## [51] "other_kerosene"                                             
## [52] "other_liquid_biofuels"                                      
## [53] "other_oil_products_n_e_c"                                   
## [54] "other_recovered_gases"                                      
## [55] "other_vegetal_material_and_residues"                        
## [56] "paraffin_waxes"                                             
## [57] "patent_fuel"                                                
## [58] "peat"                                                       
## [59] "peat_products"                                              
## [60] "petroleum_coke"                                             
## [61] "refinery_feedstocks"                                        
## [62] "refinery_gas"                                               
## [63] "solar_electricity"                                          
## [64] "sub_bituminous_coal"                                        
## [65] "thermal_electricity"                                        
## [66] "tide_wave_and_ocean_electricity"                            
## [67] "total_electricity"                                          
## [68] "total_refinery_output"                                      
## [69] "uranium"                                                    
## [70] "white_spirit_and_special_boiling_point_industrial_spirits"  
## [71] "wind_electricity"

Using the str() function, we can confirm that all of the data types have been corrected, as follows:

  • Country or Area (country_or_area) is now a Factor
  • Commodity Transaction (commodity_transaction) is now a Factor
  • Category (category) is now a Factor

For this report, we focus only on electricity produced in the ASEAN region.

Here, we use the following ASEAN member countries:

  • Brunei Darussalam
  • Cambodia
  • Indonesia
  • Laos
  • Malaysia
  • Myanmar
  • Philippines
  • Singapore
  • Thailand
  • Vietnam
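
The same conversion and check can also be done in one pass. The following is only a minimal sketch on a made-up data frame: toy, factor_cols, and col_classes are hypothetical names, and the values are illustrative rather than taken from energy.csv.

```r
# Toy stand-in for the energy data frame (illustrative values only)
toy <- data.frame(
  country_or_area = c("Austria", "Belgium", "Austria"),
  commodity_transaction = c("A - Exports", "A - Exports", "B - Imports"),
  year = c(1996L, 2014L, 1995L),
  category = c("a", "a", "b"),
  stringsAsFactors = FALSE
)

# List the columns once so the conversion and the check share them
factor_cols <- c("country_or_area", "commodity_transaction", "category")
toy[factor_cols] <- lapply(toy[factor_cols], as.factor)

# Check the classes programmatically instead of reading str() by eye
col_classes <- sapply(toy, class)
print(col_classes)
```

Keeping the column names in a single vector means that adding another column to convert requires a change in only one place.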

country_asean <- c("Brunei Darussalam", "Cambodia", "Indonesia", "Laos", "Malaysia", "Myanmar", "Phillipines", "Singapore", "Thailand", "Vietnam")

energy_asean <- energy[energy$country_or_area == country_asean,]
## Warning in `==.default`(energy$country_or_area, country_asean): longer object
## length is not a multiple of shorter object length
## Warning in is.na(e1) | is.na(e2): longer object length is not a multiple of
## shorter object length
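energy_asean[sample(nrow(energy_asean), 6), ]   # print 6 randomly chosen rows

From this random sample of rows, we can see that the subset contains only ASEAN countries listed in the object country_asean. Note, however, that the warnings above show that == recycled country_asean along the rows instead of testing membership, so only rows whose position happens to line up with the matching name are kept. The %in% operator is the usual way to keep every row whose country appears in country_asean.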
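
The difference between == and %in% is easiest to see on a small vector. This is a sketch with made-up vectors (countries and wanted are hypothetical names, not objects from this report):

```r
countries <- c("Indonesia", "Thailand", "Austria", "Indonesia", "Malaysia")
wanted <- c("Thailand", "Indonesia")

# == recycles `wanted` along `countries`: element 1 is compared with
# "Thailand", element 2 with "Indonesia", element 3 with "Thailand", ...
# (the length-mismatch warning is suppressed here on purpose)
recycled <- suppressWarnings(countries == wanted)

# %in% asks, for every element, whether it occurs anywhere in `wanted`
member <- countries %in% wanted

print(recycled)
print(member)
print(countries[member])
```

With ==, a row is kept only when its country happens to sit at the matching recycled position, so most matching rows are silently dropped. With %in%, every row whose country is in the list is kept.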

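Not all of the information is needed, so we wrangle the data as follows:

  • Drop the quantity_footnotes column (the 6th column), as it is not needed in the report.
  • Drop factor levels for countries that no longer appear in the subset.
  • Keep only rows with a quantity greater than 0.
  • Sort the rows by quantity in decreasing order.

# Remove the footnotes column, which is not used in this report
energy_asean$quantity_footnotes <- NULL

# Drop country levels that no longer occur in the subset
energy_asean$country_or_area <- droplevels(energy_asean$country_or_area)
# Keep positive quantities only, then sort from largest to smallest
energy_asean <- energy_asean[energy_asean$quantity > 0, ]
energy_asean <- energy_asean[order(energy_asean$quantity, decreasing = TRUE), ]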

head(energy_asean)
tail(energy_asean)
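
Why droplevels() is needed after subsetting can be illustrated on a small factor. This is a sketch only: f, sub, and sub_clean are hypothetical names with made-up values.

```r
f <- factor(c("Austria", "Indonesia", "Thailand", "Indonesia"))

# Subsetting a factor keeps all original levels, even unused ones
sub <- f[f != "Austria"]
print(nlevels(sub))

# droplevels() removes the levels that no longer occur in the data
sub_clean <- droplevels(sub)
print(levels(sub_clean))
```

Without this step, functions such as summary() and table() would still list the unused countries with a count of 0.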

4 Data Analysis of Electricity Production in the ASEAN Region

# freq expects a single TRUE/FALSE; TRUE plots the number of records per year
hist(energy_asean$year, freq = TRUE)
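
hist() counts rows per bin and has no argument for weighting by another column, so a histogram of year shows how many records exist per year, not how much was produced. If the aim is the quantity per year, one possible approach is to sum the quantity within each year and draw a bar chart. The sketch below uses a made-up data frame (toy and qty_by_year are hypothetical names):

```r
toy <- data.frame(
  year = c(1990, 1990, 1991, 1992, 1992),
  quantity = c(10, 5, 20, 7, 13)
)

# Sum the quantity within each year; the result is named by year
qty_by_year <- tapply(toy$quantity, toy$year, sum)
print(qty_by_year)

# One bar per year, with height equal to the summed quantity
barplot(qty_by_year, xlab = "Year", ylab = "Total quantity")
```

On the real data, such a sum is only meaningful within a single category and unit, since the quantity column mixes several units.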

summary(energy_asean)
##           country_or_area
##  Brunei Darussalam:343   
##  Cambodia         :240   
##  Indonesia        :766   
##  Malaysia         :609   
##  Myanmar          :520   
##  Singapore        :489   
##  Thailand         :703   
##                                                                              commodity_transaction
##  Electricity - net installed capacity of electric power plants, public combustible fuels:  20     
##  Gas Oil/ Diesel Oil - Total energy supply                                              :  20     
##  Electricity - Consumption by other manuf., const. and non-fuel ind.                    :  19     
##  Electricity - total production, main activity                                          :  19     
##  From combustible fuels – Main activity                                                 :  19     
##  Fuelwood - Total energy supply                                                         :  19     
##  (Other)                                                                                :3554     
##       year          unit              quantity        
##  Min.   :1990   Length:3670        Min.   :        0  
##  1st Qu.:1997   Class :character   1st Qu.:       50  
##  Median :2003   Mode  :character   Median :      553  
##  Mean   :2003                      Mean   :   198500  
##  3rd Qu.:2009                      3rd Qu.:     5419  
##  Max.   :2014                      Max.   :200000000  
##                                                       
##                       category   
##  total_electricity        : 358  
##  gas_oil_diesel_oil       : 309  
##  natural_gas_including_lng: 274  
##  fuel_oil                 : 250  
##  liquified_petroleum_gas  : 214  
##  motor_gasoline           : 192  
##  (Other)                  :2073

From the summary above, we can conclude as follows:

  • Indonesia has the most records (766 rows), followed by Thailand (703) and Malaysia (609). These are row counts, so they indicate how much data each country has rather than how much electricity it produces.
  • The top 5 countries by number of records are Indonesia, Thailand, Malaysia, Myanmar, and Singapore.
  • Only seven countries appear in the subset. Laos, the Philippines, and Vietnam are absent, which suggests that their names in country_asean may not match the spelling used in the dataset.
  • The data covers 1990 to 2014.

To explore more insights, we aggregate the data.

Total electricity by year:
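
Row counts and summed quantities can rank countries differently. The following sketch shows this on a made-up data frame (toy, totals, ranked, and all values are hypothetical, not results from energy.csv):

```r
toy <- data.frame(
  country_or_area = c("Indonesia", "Indonesia", "Thailand",
                      "Malaysia", "Malaysia", "Malaysia"),
  quantity = c(100, 120, 150, 30, 40, 50)
)

# Number of records per country (what summary() reports for a factor)
row_counts <- table(toy$country_or_area)
print(row_counts)

# Summed quantity per country, sorted from largest to smallest
totals <- aggregate(quantity ~ country_or_area, data = toy, FUN = sum)
ranked <- totals[order(totals$quantity, decreasing = TRUE), ]
print(ranked)
```

In this toy data Malaysia has the most rows but the smallest total, so a ranking of producers should be based on the summed quantity of one category and unit.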

tot_energy_asean <- energy_asean[energy_asean$category == "total_electricity",]
agg1 <- aggregate(tot_energy_asean$quantity~tot_energy_asean$year,tot_energy_asean,sum)

head(agg1)
tail(agg1)
summary(agg1)
##  tot_energy_asean$year tot_energy_asean$quantity
##  Min.   :1990          Min.   : 86216           
##  1st Qu.:1996          1st Qu.:168177           
##  Median :2002          Median :244410           
##  Mean   :2002          Mean   :290187           
##  3rd Qu.:2008          3rd Qu.:370163           
##  Max.   :2014          Max.   :894132

Insight:

  • Overall, electricity production in the ASEAN region shows an increasing trend year by year.

Next, we explore the top 3 countries in more detail. Using a boxplot, we compare Indonesia, Thailand, and Malaysia:
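
A trend claim like this can be checked numerically as well as by reading head() and tail(). The sketch below assumes a yearly aggregate shaped like agg1 but uses made-up numbers (agg_toy, yoy, and slope are hypothetical names):

```r
agg_toy <- data.frame(
  year = 1990:1995,
  quantity = c(86, 95, 110, 108, 130, 150)
)

# Year-over-year change: positive values mean growth on the previous year
yoy <- diff(agg_toy$quantity)
print(yoy)

# Slope of a straight-line fit: the average change per year over the period
fit <- lm(quantity ~ year, data = agg_toy)
slope <- unname(coef(fit)["year"])
print(slope)
```

A positive slope together with mostly positive year-over-year changes supports an increasing trend, even if individual years dip.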

indo_energy <- tot_energy_asean[tot_energy_asean$country_or_area == "Indonesia",]
thai_energy <- tot_energy_asean[tot_energy_asean$country_or_area == "Thailand",]
malay_energy <- tot_energy_asean[tot_energy_asean$country_or_area == "Malaysia",]

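# Print up to 5 randomly chosen rows (sample() on a data frame would shuffle columns)
indo_energy[sample(nrow(indo_energy), min(5, nrow(indo_energy))), ]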
boxplot(indo_energy$quantity, thai_energy$quantity, malay_energy$quantity, names=c("Indonesia","Thailand","Malaysia"))

Insights that we can conclude from the chart are as follows:

  • All of the top 3 countries have outliers in total electricity, which seem to represent above-normal values in certain years.
  • All 3 countries seem to have a similar spread (the boxes, which span the IQR, are of similar size). However, only Indonesia seems to have a roughly symmetric distribution, with the median near the middle of the box, whereas Thailand and Malaysia seem to have asymmetric spreads of electricity values.
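
The quantities that a boxplot draws can also be inspected as numbers with boxplot.stats(). The sketch below uses a made-up vector (x, stats, and box_height are hypothetical names, and the values are not from the report's data):

```r
x <- c(10, 12, 13, 14, 15, 16, 18, 60)

stats <- boxplot.stats(x)

# stats$stats holds: lower whisker, lower hinge, median, upper hinge, upper whisker
print(stats$stats)

# Points beyond 1.5 times the box height from the hinges are flagged as outliers
print(stats$out)

# Height of the box (distance between the hinges), comparable across countries
box_height <- stats$stats[4] - stats$stats[2]
print(box_height)
```

Comparing the distance from the median to each hinge gives a quick numeric check of whether a box is roughly symmetric.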