1 Global Power Plant

Geo Dipa Energy’s Geothermal Power Plant, located at Dieng, Central Java, Indonesia

1.1 Brief Explanation

A power plant is an industrial facility that generates electricity from primary energy. Most power plants use one or more generators that convert mechanical energy into electrical energy. The energy source harnessed to turn the generator varies widely. Most power plants in the world burn fossil fuels such as coal, oil, and natural gas to generate electricity. Low-carbon power sources include nuclear power and, increasingly, renewables such as solar, wind, geothermal, and hydroelectric power.

1.2 Analysis Objective

The dataset covers power plants from all over the world, classified by their primary and secondary fuel (oil, hydro, gas, solar, etc.). The objective of this analysis is to understand what the dataset represents, to compare power capacity in MW across countries, and to compare power plants by primary fuel. The results can support further analysis, such as comparing capacity against each country's power needs.

2 Data Preparation

The raw data is inspected first to understand its structure. It is then cleaned so that it is easier to use in the analysis.

2.1 Data Inspection

#load library for data preparation
library(dplyr)
#read csv data into data frame
power_plant <- read.csv("power_plant.csv", na.strings = "")
#inspect the first few rows of the data frame
head(power_plant)
#inspect data
glimpse(power_plant)
#> Rows: 34,936
#> Columns: 16
#> $ country.code                  <chr> "AFG", "AFG", "AFG", "AFG", "AFG", "AFG"~
#> $ country                       <chr> "Afghanistan", "Afghanistan", "Afghanist~
#> $ name.of.powerplant            <chr> "Kajaki Hydroelectric Power Plant Afghan~
#> $ capacity.in.MW                <dbl> 33.00, 10.00, 10.00, 66.00, 100.00, 11.5~
#> $ latitude                      <dbl> 32.3220, 31.6700, 31.6230, 34.5560, 34.6~
#> $ longitude                     <dbl> 65.1190, 65.7950, 65.7920, 69.4787, 69.7~
#> $ primary_fuel                  <chr> "Hydro", "Solar", "Solar", "Hydro", "Hyd~
#> $ secondary.fuel                <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ~
#> $ other_fuel.1                  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ~
#> $ other_fuel.2                  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ~
#> $ start.date                    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 1965~
#> $ owner.of.plant                <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ~
#> $ geolocation_source            <chr> "GEODB", "Wiki-Solar", "Wiki-Solar", "GE~
#> $ generation_gwh_2020           <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ~
#> $ generation_data_source        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ~
#> $ estimated_generation_gwh_2020 <dbl> 119.50, 18.29, 18.72, 174.91, 350.80, 46~

2.2 Data Cleaning

#counting missing values per column
colSums(is.na(power_plant))
#>                  country.code                       country 
#>                             0                             0 
#>            name.of.powerplant                capacity.in.MW 
#>                             0                             0 
#>                      latitude                     longitude 
#>                             0                             0 
#>                  primary_fuel                secondary.fuel 
#>                             0                         32992 
#>                  other_fuel.1                  other_fuel.2 
#>                         34660                         34844 
#>                    start.date                owner.of.plant 
#>                         17489                         14068 
#>            geolocation_source           generation_gwh_2020 
#>                           419                         25277 
#>        generation_data_source estimated_generation_gwh_2020 
#>                         23536                          1798
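Raw counts are easier to judge as a share of all rows. The following is a minimal sketch of that conversion, assuming a small made-up data frame `toy` in place of `power_plant`; the values are hypothetical and only the column names are borrowed from the dataset.

```r
# Hypothetical toy data standing in for power_plant (values are made up)
toy <- data.frame(
  primary_fuel        = c("Hydro", "Solar", "Coal", "Gas"),
  secondary.fuel      = c(NA, NA, "Oil", NA),
  generation_gwh_2020 = c(NA, 12.5, NA, NA)
)

# is.na() gives a logical matrix; colMeans() turns it into the
# fraction of missing values per column, shown here as a percentage
missing_pct <- round(100 * colMeans(is.na(toy)), 1)
missing_pct
```

Applied to the real data frame, the same call would be `colMeans(is.na(power_plant))`.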

Based on the missing-value counts and the data inspection, the following columns are not needed and can be removed:

  • country.code: Already represented in country
  • secondary.fuel: Only the primary fuel is needed to classify the plants
  • other_fuel.1, other_fuel.2: Only the primary fuel is needed to classify the plants
  • start.date: The start date of the plant is not needed in this analysis
  • owner.of.plant: Additional information, not needed
  • geolocation_source: Source of the location data, not needed in this analysis
  • generation_data_source: Not needed in this analysis
  • generation_gwh_2020: Removed because about 72% of its values are missing (25,277 of 34,936 rows)
#removing unnecessary data
power_plant_clean <- power_plant %>% 
    select(-c(country.code,
              secondary.fuel, 
              other_fuel.1, 
              other_fuel.2,
              start.date,
              owner.of.plant,
              geolocation_source,
              generation_data_source,
              generation_gwh_2020))
#checking cleaned data
power_plant_clean

3 Data Analysis

The data has been prepared and is ready for analysis. The following are a few example analyses of this data, each followed by insights.

3.1 Global Analysis

#Sorting the power plants data by Capacity in MW
power_plant_clean[order(power_plant_clean$capacity.in.MW,
                        decreasing = TRUE), ]
#Sorting the power plants data by Estimated Generation in 2020
power_plant_clean[order(power_plant_clean$estimated_generation_gwh_2020,
                        decreasing = TRUE), ]
#Power Plants capacity based on its primary fuel
#(the formula is passed unnamed because its argument name changed in R 4.2.0)
power_plant_capacity <- aggregate(capacity.in.MW ~ primary_fuel,
                                  data = power_plant_clean,
                                  FUN = sum)

power_plant_capacity[order(power_plant_capacity$capacity.in.MW,
                           decreasing = TRUE), ]
#Number of power plants per country
power_plant_country <- as.data.frame(table(power_plant_clean$country))

power_plant_country[order(power_plant_country$Freq,
                          decreasing = TRUE), ]
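Total capacity per fuel can also be expressed as a share of the overall capacity, which makes fuels easier to compare. This is a minimal sketch in base R, assuming a made-up data frame `toy` whose column names mirror `power_plant_clean`; the numbers are hypothetical and are not taken from the dataset.

```r
# Hypothetical toy data; column names mirror power_plant_clean
toy <- data.frame(
  primary_fuel   = c("Coal", "Hydro", "Coal", "Solar", "Hydro"),
  capacity.in.MW = c(1000, 400, 600, 50, 200)
)

# Total capacity per primary fuel, largest first
capacity_by_fuel <- sort(tapply(toy$capacity.in.MW, toy$primary_fuel, sum),
                         decreasing = TRUE)

# Each fuel's share of the total capacity, in percent
capacity_share <- round(100 * capacity_by_fuel / sum(capacity_by_fuel), 1)
capacity_share
```

In this toy data, Coal ranks first with 1600 of 2250 MW. `tapply()` is used here only as a compact alternative to `aggregate()`; both compute the same group sums.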

Insights:

  • The power plant with the highest capacity and the highest estimated generation in the world is the Three Gorges Dam, located in China.
  • Power plants with coal as their primary fuel have the highest total capacity worldwide.
  • The United States has the most power plants, with 9833 in total.
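Insights like these can also be read off directly instead of scanning a sorted table. The following is a minimal sketch, assuming a made-up data frame `toy` with hypothetical plant names, countries, and capacities; only the column names follow `power_plant_clean`.

```r
# Hypothetical toy data; names and values are made up for illustration
toy <- data.frame(
  name.of.powerplant = c("Plant A", "Plant B", "Plant C"),
  country            = c("X", "Y", "X"),
  capacity.in.MW     = c(120, 950, 430)
)

# Row of the plant with the largest capacity
top_plant <- toy[which.max(toy$capacity.in.MW), ]

# Country with the most plants (the first one if there is a tie)
top_country <- names(which.max(table(toy$country)))

top_plant
top_country
```

Note that `which.max()` returns only the first maximum, so ties would need a separate check.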

3.2 Indonesia Analysis

#Power Plants filtered only located in Indonesia
indonesia <- power_plant_clean[power_plant_clean$country == "Indonesia",]
indonesia
#Indonesia Power Plant sorted by capacity in MW
indonesia[order(indonesia$capacity.in.MW,
                decreasing = TRUE), ]
#Indonesia Power Plant primary fuel comparison
table(indonesia$primary_fuel)
#> 
#>       Coal        Gas Geothermal      Hydro        Oil 
#>         70         41         10         41         16
#Indonesia Power Plant primary fuel comparison in percentage
prop.table(table(indonesia$primary_fuel))
#> 
#>       Coal        Gas Geothermal      Hydro        Oil 
#> 0.39325843 0.23033708 0.05617978 0.23033708 0.08988764

Insights:

  • The highest-capacity power plant in Indonesia is PLTU Paiton I Unit 7 & 8, with a capacity of 5355 MW.
  • Indonesia has 70 power plants with coal as their primary fuel, 41 with gas, 41 with hydro, 16 with oil, and 10 with geothermal.
  • Coal-fired plants make up almost 40% (70 of 178) of the power plants in Indonesia by count.
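The "almost 40%" figure can be checked from the counts alone. This sketch re-enters the counts printed by `table(indonesia$primary_fuel)` as a named vector (`fuel_counts` is a name introduced here for illustration) and converts them to rounded percentages.

```r
# Plant counts per primary fuel, copied from the table() output above
fuel_counts <- c(Coal = 70, Gas = 41, Geothermal = 10, Hydro = 41, Oil = 16)

# Share of all plants per fuel, in percent, rounded to one decimal
fuel_pct <- round(100 * prop.table(fuel_counts), 1)
fuel_pct
# Coal comes out at 39.3, i.e. 70 of 178 plants
```

These shares are by number of plants, not by capacity, so a fuel with a few very large plants can look smaller here than it would in a capacity comparison.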