Geo Dipa Energy’s Geothermal Power Plant, located at Dieng, Central Java, Indonesia
A power plant is an industrial facility that generates electricity from primary energy. Most power plants use one or more generators that convert mechanical energy into electrical energy. The energy source harnessed to turn the generator varies widely. Most power plants in the world burn fossil fuels such as coal, oil, and natural gas to generate electricity. Low-carbon power sources include nuclear power, and an increasing use of renewables such as solar, wind, geothermal, and hydroelectric.
The dataset includes power plants from around the world, classified by primary and secondary fuel (oil, hydro, gas, solar, and so on). The objective of the analysis is to understand what the dataset represents, to compare countries by power capacity in MW, and to compare power plants by primary fuel. These insights can feed further analysis, such as comparing capacity against each country's power needs.
The raw data is inspected first to understand its structure. Before analysis, the data needs to be cleaned so that it is easier to work with.
#load library for data preparation
library(dplyr)
#read csv data into data frame
power_plant <- read.csv("power_plant.csv", na.strings = "")
#inspect few first rows of the data frame
head(power_plant)
#inspect data
glimpse(power_plant)
#> Rows: 34,936
#> Columns: 16
#> $ country.code <chr> "AFG", "AFG", "AFG", "AFG", "AFG", "AFG"~
#> $ country <chr> "Afghanistan", "Afghanistan", "Afghanist~
#> $ name.of.powerplant <chr> "Kajaki Hydroelectric Power Plant Afghan~
#> $ capacity.in.MW <dbl> 33.00, 10.00, 10.00, 66.00, 100.00, 11.5~
#> $ latitude <dbl> 32.3220, 31.6700, 31.6230, 34.5560, 34.6~
#> $ longitude <dbl> 65.1190, 65.7950, 65.7920, 69.4787, 69.7~
#> $ primary_fuel <chr> "Hydro", "Solar", "Solar", "Hydro", "Hyd~
#> $ secondary.fuel <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ~
#> $ other_fuel.1 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ~
#> $ other_fuel.2 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ~
#> $ start.date <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 1965~
#> $ owner.of.plant <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ~
#> $ geolocation_source <chr> "GEODB", "Wiki-Solar", "Wiki-Solar", "GE~
#> $ generation_gwh_2020 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ~
#> $ generation_data_source <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ~
#> $ estimated_generation_gwh_2020 <dbl> 119.50, 18.29, 18.72, 174.91, 350.80, 46~
#checking missing value
colSums(is.na(power_plant))
#>                  country.code                       country
#>                             0                             0
#>            name.of.powerplant                capacity.in.MW
#>                             0                             0
#>                      latitude                     longitude
#>                             0                             0
#>                  primary_fuel                secondary.fuel
#>                             0                         32992
#>                  other_fuel.1                  other_fuel.2
#>                         34660                         34844
#>                    start.date                owner.of.plant
#>                         17489                         14068
#>            geolocation_source           generation_gwh_2020
#>                           419                         25277
#>        generation_data_source estimated_generation_gwh_2020
#>                         23536                          1798
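Raw counts are easier to judge as a share of all rows. The following is a minimal sketch on a small made-up data frame (`toy` and its values are hypothetical, not taken from the dataset). It relies on `colMeans()` of a logical matrix giving the proportion of `TRUE` values per column:

```r
# Hypothetical toy data standing in for power_plant; values are invented
toy <- data.frame(
  primary_fuel   = c("Hydro", "Solar", "Gas", "Coal"),
  secondary.fuel = c(NA, NA, "Oil", NA),
  start.date     = c(NA, 1965, 2001, NA)
)

# Share of missing values per column, in percent, largest first
missing_pct <- sort(round(100 * colMeans(is.na(toy)), 1), decreasing = TRUE)
missing_pct
```

Applied to the counts above, 32,992 of 34,936 rows lack a secondary fuel (roughly 94%), and 25,277 rows lack generation_gwh_2020 (roughly 72%).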
From the missing-value counts and the data inspection, some columns are unnecessary and can be removed:
country.code: Already represented in country
secondary.fuel: Only the primary fuel is needed to classify
other_fuel.1, other_fuel.2: Only the primary fuel is needed to classify
start.date: The start date of the plant is not needed in this analysis
owner.of.plant: Additional information, not necessary
geolocation_source: Location source, not necessary in the analysis
generation_data_source: Not necessary for the analysis
generation_gwh_2020: Removed because too many values are missing
#removing unnecessary data
power_plant_clean <- power_plant %>%
select(-c(country.code,
secondary.fuel,
other_fuel.1,
other_fuel.2,
start.date,
owner.of.plant,
geolocation_source,
generation_data_source,
generation_gwh_2020))
#checking cleaned data
power_plant_clean
The data is now prepared and ready for analysis. The following are just a few examples of analyses of this data, with insights.
#Sorting the power plants data by Capacity in MW
power_plant_clean[order(power_plant_clean$capacity.in.MW,
                        decreasing = TRUE), ]
#Sorting the power plants data by Estimated Generation in 2020
power_plant_clean[order(power_plant_clean$estimated_generation_gwh_2020,
                        decreasing = TRUE), ]
# Power Plants capacity based on its primary fuel
#formula is passed unnamed because the name of this argument differs across R versions
power_plant_capacity <- aggregate(capacity.in.MW ~ primary_fuel,
                                  data = power_plant_clean,
                                  FUN = sum)
power_plant_capacity[order(power_plant_capacity$capacity.in.MW,
                           decreasing = TRUE), ]
#Number of power plants per country
power_plant_country <- as.data.frame(table(power_plant_clean$country))
power_plant_country[order(power_plant_country$Freq,
                          decreasing = TRUE), ]
Insights:
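The plant count per country says nothing about size, while the stated objective is to compare each country's capacity in MW. A minimal sketch of that comparison follows, on a made-up data frame (`toy`, the country labels "A" and "B", and all values are hypothetical; only the column names follow power_plant_clean):

```r
# Hypothetical toy data; column names follow power_plant_clean, values are invented
toy <- data.frame(
  country        = c("A", "A", "B", "B", "B"),
  capacity.in.MW = c(100, 50, 30, 20, 10)
)

# Total capacity per country
capacity_country <- aggregate(capacity.in.MW ~ country, data = toy, FUN = sum)

# Add the number of plants so both measures sit side by side
plant_counts <- table(toy$country)
capacity_country$plants <- as.vector(
  plant_counts[as.character(capacity_country$country)]
)

# Largest total capacity first
capacity_country[order(capacity_country$capacity.in.MW, decreasing = TRUE), ]
```

In this toy case, country "B" has more plants but less total capacity than "A", which is the kind of difference a count alone would hide.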
#Power Plants filtered only located in Indonesia
indonesia <- power_plant_clean[power_plant_clean$country == "Indonesia", ]
indonesia
#Indonesia Power Plant sorted by capacity in MW
indonesia[order(indonesia$capacity.in.MW,
                decreasing = TRUE), ]
#Indonesia Power Plant primary fuel comparison
table(indonesia$primary_fuel)
#>
#>       Coal        Gas Geothermal      Hydro        Oil
#>         70         41         10         41         16
#Indonesia Power Plant primary fuel comparison as proportions
prop.table(table(indonesia$primary_fuel))
#>
#>       Coal        Gas Geothermal      Hydro        Oil
#> 0.39325843 0.23033708 0.05617978 0.23033708 0.08988764
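`prop.table()` returns fractions between 0 and 1 rather than percentages. A minimal sketch of the conversion, with the counts typed in by hand from the `table()` output above so that it runs without the dataset (`fuel_counts` and `fuel_pct` are names introduced here for illustration):

```r
# Counts copied from the table(indonesia$primary_fuel) output
fuel_counts <- c(Coal = 70, Gas = 41, Geothermal = 10, Hydro = 41, Oil = 16)

# Scale the fractions to percent and round to one decimal place
fuel_pct <- round(100 * prop.table(fuel_counts), 1)
fuel_pct
```

By plant count, coal accounts for about 39.3% of Indonesia's plants in this dataset, gas and hydro about 23.0% each, oil about 9.0%, and geothermal about 5.6%. These are shares of the number of plants, not of installed capacity.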
Insights: