UK Energy Generation Data

We will take a look at the breakdown of Renewable Generators (“Anaerobic digestion”, “Hydro”, “Micro CHP”, “Photovoltaic”, “Wind”) in the different regions of the UK.

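library(tidyverse)  # as_tibble(), ggplot2, dplyr, tidyr
UK.Renewables <- as_tibble(read.csv("https://raw.githubusercontent.com/JMawyin/MSDS2019-607/master/UKRenewables.csv", stringsAsFactors = TRUE))  # factors as shown below (needed on R >= 4.0)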

As the output below shows, the data frame loads as a mix of numeric and factor columns: the two capacity columns are numeric, and the remaining 13 columns, including both date columns, are factors.

str(UK.Renewables)
## Classes 'tbl_df', 'tbl' and 'data.frame':    19723 obs. of  15 variables:
##  $ FITID                     : Factor w/ 19543 levels "FIT00000001",..: 743 1867 5937 5981 7521 9587 14088 19052 545 826 ...
##  $ PostCode                  : Factor w/ 2339 levels "AB","AB12","AB15",..: 4 6 5 3 2 1 3 3 21 9 ...
##  $ TechnologyTypeName        : Factor w/ 5 levels "Anaerobic digestion",..: 4 4 4 4 5 5 4 4 4 4 ...
##  $ InstallationTypeName      : Factor w/ 4 levels "Community","Domestic",..: 2 2 2 2 2 4 2 2 2 2 ...
##  $ InstalledCapacity         : num  3.96 1.5 2 2.1 50 11 1.72 3.69 4.32 2.59 ...
##  $ DeclaredNetCapacity       : num  3.96 1.5 2 2.1 50 11 1.72 3.69 4.32 2.38 ...
##  $ ApplicationDate           : Factor w/ 275 levels "10/1/10","10/10/10",..: 181 136 93 93 93 93 267 266 107 93 ...
##  $ CommissionedDate          : Factor w/ 1509 levels "1/1/00","1/1/01",..: 847 58 1379 384 730 178 1476 1434 738 348 ...
##  $ ExportStatusTypeName      : Factor w/ 5 levels "Export (deemed)",..: 1 1 1 1 2 1 1 1 1 1 ...
##  $ TariffCode                : Factor w/ 20 levels "AD/0-500/01",..: 9 9 13 9 20 16 9 9 12 9 ...
##  $ Description               : Factor w/ 20 levels "Anaerobic Digestion (<=500kW)-2010/11",..: 11 11 4 11 5 17 11 11 14 11 ...
##  $ CountryName               : Factor w/ 4 levels "England","NULL",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ GovernmentOfficeRegionName: Factor w/ 10 levels "East Midlands",..: 6 6 6 6 6 6 6 6 6 6 ...
##  $ LocalAuthorityName        : Factor w/ 379 levels "Aberdeen City",..: 1 1 1 1 1 1 1 1 2 2 ...
##  $ AccreditationNo           : Factor w/ 19662 levels "FAD00003EN","FAD00004EN",..: 1724 897 15223 15515 16041 15947 9033 9832 703 730 ...

Below we can see the different variables contained in the dataset.

colnames(UK.Renewables)
##  [1] "FITID"                      "PostCode"                  
##  [3] "TechnologyTypeName"         "InstallationTypeName"      
##  [5] "InstalledCapacity"          "DeclaredNetCapacity"       
##  [7] "ApplicationDate"            "CommissionedDate"          
##  [9] "ExportStatusTypeName"       "TariffCode"                
## [11] "Description"                "CountryName"               
## [13] "GovernmentOfficeRegionName" "LocalAuthorityName"        
## [15] "AccreditationNo"

How spread out are the installations by rated capacity? As the summary below shows, half of the installations are rated between 1.8 and 3.33 kilowatts, with a median of 2.52 kilowatts. There are extreme outliers, however: the largest system is rated at 2000 kilowatts, almost 3 orders of magnitude above the median.

summary(UK.Renewables$InstalledCapacity)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##    0.000    1.800    2.520    3.669    3.330 2000.000
Install.Cap <- UK.Renewables$InstalledCapacity
# A log axis cannot show zero, so zero-rated installations are plotted at 1 kW
Install.Cap[Install.Cap == 0] <- 1
boxplot(Install.Cap, log = "y", col = "gold",
        ylab = "Installed Capacity (kW)",
        main = "Variation in Renewable Generator Installed Capacity")
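
The zero replacement above is needed because a logarithmic axis cannot display zero. As a minimal sketch on a made-up vector (the values in `cap` are hypothetical, not taken from the dataset), the substitution can be compared with the alternative of dropping the zero entries:

```r
# Hypothetical capacities in kW; not taken from UK.Renewables
cap <- c(0, 1.8, 2.52, 3.33, 2000)

# log10(0) is -Inf, which a log-scaled axis cannot draw
log_cap <- log10(cap)

# Option 1 (the approach used above): replace zeros with 1 kW
cap_replaced <- cap
cap_replaced[cap_replaced == 0] <- 1

# Option 2: keep only strictly positive values
cap_positive <- cap[cap > 0]

print(is.finite(log_cap))
print(range(log10(cap_replaced)))
print(length(cap_positive))
```

Replacing zeros keeps every installation in the plot but draws those entries at 1 kW, while filtering avoids plotting a value that was never recorded.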

The table below shows that the vast majority of installations are domestic (19,104 of 19,723, about 97%), followed by commercial, community, and industrial installations.

table(UK.Renewables$InstallationTypeName)
## 
##                 Community                  Domestic 
##                       276                     19104 
## Non Domestic (Commercial) Non Domestic (Industrial) 
##                       321                        22
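
The dominance of domestic installations is easier to read as percentages. The sketch below re-enters the four counts printed above by hand (the names `install_counts` and `install_share` are introduced here for illustration) and converts them to shares with `prop.table()`:

```r
# Counts copied from the table() output above
install_counts <- c(
  "Community"                 = 276,
  "Domestic"                  = 19104,
  "Non Domestic (Commercial)" = 321,
  "Non Domestic (Industrial)" = 22
)

# Share of each installation type in percent, largest first
install_share <- sort(round(100 * prop.table(install_counts), 1),
                      decreasing = TRUE)
print(install_share)
```

With the data frame loaded, the same shares should follow from `prop.table(table(UK.Renewables$InstallationTypeName))` without retyping the counts.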

By technology, most installations are Photovoltaic (18,404 of the 19,723, per the region table below); by installation type, most are domestic.

T.Tech.Installs <- table(UK.Renewables$TechnologyTypeName)

What is the breakdown of installations per Region and per Technology Type?

ByRegion.Tech <- table(UK.Renewables$GovernmentOfficeRegionName,UK.Renewables$TechnologyTypeName)
ByRegion.Tech
##                           
##                            Anaerobic digestion Hydro Micro CHP
##   East Midlands                              0     7         2
##   East of England                            0     6         4
##   London                                     0     0         1
##   North East                                 0     4         0
##   North West                                 0    11         5
##   NULL                                       2    94         1
##   South East                                 0     6        10
##   South West                                 0    39         3
##   West Midlands                              0     2         1
##   Yorkshire and The Humber                   0     7         3
##                           
##                            Photovoltaic Wind
##   East Midlands                    1546   72
##   East of England                  2507   75
##   London                            891    4
##   North East                        320   34
##   North West                        916   79
##   NULL                             1240  466
##   South East                       3943   42
##   South West                       3452  149
##   West Midlands                    1192   46
##   Yorkshire and The Humber         2397  144
library(data.table)  # provides setDT(), used below
# Convert from table to data frame format
DF.ByRegion.Tech <- as.data.frame.matrix(ByRegion.Tech)
# Extract the row names and add them to the data frame as the first column, "Region"
DF.ByRegion.Tech <- setDT(DF.ByRegion.Tech, keep.rownames = "Region")[]
DF.ByRegion.Tech

How are the renewable generators distributed across the regions of the UK?

dfm.ByRegion.Tech <- melt(DF.ByRegion.Tech[,c("Region", "Anaerobic digestion", "Hydro", "Micro CHP", "Photovoltaic", "Wind")],id.vars = 1)
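
The wide-then-melt route above ends in a long table with one row per region and technology. As a sketch of a possible shortcut, base R's `as.data.frame()` applied to a two-way `table` returns that long layout directly. The toy vectors `region` and `tech` below are hypothetical stand-ins for the dataset columns:

```r
# Hypothetical stand-ins for GovernmentOfficeRegionName and TechnologyTypeName
region <- c("London", "London", "North East", "South West", "South West")
tech   <- c("Photovoltaic", "Wind", "Photovoltaic", "Hydro", "Photovoltaic")

counts <- table(Region = region, Technology = tech)

# One row per (Region, Technology) cell, including cells with a zero count
long_counts <- as.data.frame(counts, responseName = "value")
print(long_counts)
```

The resulting columns (`Region`, `Technology`, `value`) play the same role as `Region`, `variable`, and `value` in the melted table.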

# reorder(Region, -value) orders the regions by their mean count, from high to low.
ggplot(dfm.ByRegion.Tech, aes(x = reorder(Region, -value), y = value)) +
    geom_bar(aes(fill = variable), stat = "identity", position = "dodge") +
    theme(axis.text.x = element_text(angle = 90)) +
    ylab("Installations") + xlab("Regions in the UK") +
    ggtitle("Number of Installations per Region")
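
In the plot above, `reorder(Region, -value)` does not sort individual bars: it summarises `-value` within each region, using `mean` by default, and orders the regions by that summary. A small sketch with hypothetical counts (the data frame `d` is made up for illustration) shows the default next to an ordering by each region's largest bar:

```r
# Hypothetical long-format counts; not the real dataset
d <- data.frame(
  Region = factor(c("A", "A", "B", "B", "C", "C")),
  value  = c(10, 2, 50, 1, 30, 40)
)

# Default FUN = mean: regions ordered by mean of -value (largest mean first)
by_mean <- levels(reorder(d$Region, -d$value))

# FUN = min applied to -value equals -max(value): largest single bar first
by_max <- levels(reorder(d$Region, -d$value, FUN = min))

print(by_mean)
print(by_max)
```

Which summary is appropriate depends on what the chart should emphasise: the default favours regions that are high across all technologies, while the second variant favours regions with one dominant technology.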

Calculating Installed Generation Capacity per Region and per Technology Type

library(data.table)
DT <- data.table(UK.Renewables)
# Sum the installed capacity (kW) for each region and technology pair
Region.Sums <- DT[ , .(Installed.Capacity = sum(InstalledCapacity)), by = .(GovernmentOfficeRegionName, TechnologyTypeName)]
# Sort the rows by region name
Region.Sums <- arrange(Region.Sums, GovernmentOfficeRegionName)
Region.Sums
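
To make the grouped sum above easier to follow, here is a minimal sketch of the same `data.table` pattern on a hypothetical four-row table. The column names mirror the dataset, but the values and the names `toy` and `toy_sums` are invented for illustration:

```r
library(data.table)

# Hypothetical installations; column names mirror the dataset
toy <- data.table(
  GovernmentOfficeRegionName = c("London", "London", "London", "South West"),
  TechnologyTypeName         = c("Photovoltaic", "Photovoltaic", "Wind", "Hydro"),
  InstalledCapacity          = c(2.5, 3.0, 6.0, 50.0)
)

# One row per (region, technology) pair with the summed capacity
toy_sums <- toy[, .(Installed.Capacity = sum(InstalledCapacity)),
                by = .(GovernmentOfficeRegionName, TechnologyTypeName)]
print(toy_sums)
```

Note that pairs with no installations, such as Hydro in London here, produce no row at all rather than a row with a sum of zero.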

Using spread to tidy up the previous data frame, with one column per technology.

spread.Regions.Sum <- spread(Region.Sums, TechnologyTypeName, Installed.Capacity)
dfm.Region.Sums <- melt(spread.Regions.Sum[,c("GovernmentOfficeRegionName", "Anaerobic digestion", "Hydro", "Micro CHP", "Photovoltaic", "Wind")],id.vars = 1)

# reorder(x, -value) orders the regions by mean value, high to low; regions with any NA value sort last.
ggplot(dfm.Region.Sums, aes(x = reorder(GovernmentOfficeRegionName, -value), y = value)) +
    geom_bar(aes(fill = variable), stat = "identity", position = "dodge") +
    theme(axis.text.x = element_text(angle = 90)) +
    ylab("Installed Capacity (kW)") + xlab("Regions in the UK") +
    ggtitle("Installed Capacity per Region")
## Warning: Removed 11 rows containing missing values (geom_bar).
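
The warning most likely comes from region and technology pairs that have no installations: the count table earlier has 11 zero cells (9 for Anaerobic digestion, plus Hydro in London and Micro CHP in the North East), and `spread()` fills absent pairs with `NA`. As a sketch on hypothetical sums (the data frame `sums` and its column names are invented here), the `fill` argument of `spread()` turns those gaps into zeros instead:

```r
library(tidyr)

# Hypothetical grouped sums; London has no Hydro row, South West has no Wind row
sums <- data.frame(
  Region     = c("London", "London", "South West", "South West"),
  Technology = c("Photovoltaic", "Wind", "Photovoltaic", "Hydro"),
  Capacity   = c(5.5, 6.0, 12.0, 50.0)
)

wide_na   <- spread(sums, Technology, Capacity)            # missing pairs become NA
wide_zero <- spread(sums, Technology, Capacity, fill = 0)  # missing pairs become 0
print(wide_zero)
```

Whether zero is the right fill is a judgment call: it is reasonable for summed capacity, where no installations means no capacity, but it would be misleading for a quantity such as an average.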

The analysis above shows that Photovoltaic is both the most common technology by number of installations and the technology with the most installed generation capacity.