We will take a look at the breakdown of Renewable Generators (“Anaerobic digestion”, “Hydro”, “Micro CHP”, “Photovoltaic”, “Wind”) in the different regions of the UK.
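# Packages used throughout this analysis
library(tibble)      # as_tibble()
library(data.table)  # setDT(), melt(), data.table()
library(dplyr)       # arrange()
library(tidyr)       # spread()
library(ggplot2)     # ggplot()
# stringsAsFactors = TRUE reproduces the factor columns shown below on R >= 4.0
UK.Renewables <- as_tibble(read.csv("https://raw.githubusercontent.com/JMawyin/MSDS2019-607/master/UKRenewables.csv", stringsAsFactors = TRUE))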
As the output below shows, the data frame holds two numeric columns (the capacities) and 13 factor columns. Note that both date columns were loaded as factors rather than as dates.
str(UK.Renewables)
## Classes 'tbl_df', 'tbl' and 'data.frame': 19723 obs. of 15 variables:
## $ FITID : Factor w/ 19543 levels "FIT00000001",..: 743 1867 5937 5981 7521 9587 14088 19052 545 826 ...
## $ PostCode : Factor w/ 2339 levels "AB","AB12","AB15",..: 4 6 5 3 2 1 3 3 21 9 ...
## $ TechnologyTypeName : Factor w/ 5 levels "Anaerobic digestion",..: 4 4 4 4 5 5 4 4 4 4 ...
## $ InstallationTypeName : Factor w/ 4 levels "Community","Domestic",..: 2 2 2 2 2 4 2 2 2 2 ...
## $ InstalledCapacity : num 3.96 1.5 2 2.1 50 11 1.72 3.69 4.32 2.59 ...
## $ DeclaredNetCapacity : num 3.96 1.5 2 2.1 50 11 1.72 3.69 4.32 2.38 ...
## $ ApplicationDate : Factor w/ 275 levels "10/1/10","10/10/10",..: 181 136 93 93 93 93 267 266 107 93 ...
## $ CommissionedDate : Factor w/ 1509 levels "1/1/00","1/1/01",..: 847 58 1379 384 730 178 1476 1434 738 348 ...
## $ ExportStatusTypeName : Factor w/ 5 levels "Export (deemed)",..: 1 1 1 1 2 1 1 1 1 1 ...
## $ TariffCode : Factor w/ 20 levels "AD/0-500/01",..: 9 9 13 9 20 16 9 9 12 9 ...
## $ Description : Factor w/ 20 levels "Anaerobic Digestion (<=500kW)-2010/11",..: 11 11 4 11 5 17 11 11 14 11 ...
## $ CountryName : Factor w/ 4 levels "England","NULL",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ GovernmentOfficeRegionName: Factor w/ 10 levels "East Midlands",..: 6 6 6 6 6 6 6 6 6 6 ...
## $ LocalAuthorityName : Factor w/ 379 levels "Aberdeen City",..: 1 1 1 1 1 1 1 1 2 2 ...
## $ AccreditationNo : Factor w/ 19662 levels "FAD00003EN","FAD00004EN",..: 1724 897 15223 15515 16041 15947 9033 9832 703 730 ...
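Because ApplicationDate and CommissionedDate are factors, they cannot be sorted or differenced as dates. The following is a minimal sketch of a conversion, not part of the original analysis. It assumes the strings are day/month/two-digit-year, which should be checked against the source file; the `parse_fit_date` helper and the sample values are hypothetical.

```r
# Hypothetical helper: convert a factor of date strings to Date.
# ASSUMPTION: day/month/two-digit-year; use "%m/%d/%y" if the file is month-first.
parse_fit_date <- function(x, format = "%d/%m/%y") {
  as.Date(as.character(x), format = format)
}

# Illustrative values in the same shape as the levels shown by str()
sample_dates <- factor(c("10/1/10", "10/10/10", "1/1/01"))
parsed <- parse_fit_date(sample_dates)
parsed
```

On the real data the same helper could be applied as `parse_fit_date(UK.Renewables$ApplicationDate)`, once the date ordering has been confirmed.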
Below we can see the different variables contained in the dataset.
colnames(UK.Renewables)
## [1] "FITID" "PostCode"
## [3] "TechnologyTypeName" "InstallationTypeName"
## [5] "InstalledCapacity" "DeclaredNetCapacity"
## [7] "ApplicationDate" "CommissionedDate"
## [9] "ExportStatusTypeName" "TariffCode"
## [11] "Description" "CountryName"
## [13] "GovernmentOfficeRegionName" "LocalAuthorityName"
## [15] "AccreditationNo"
How spread out are the installations by rated capacity? As the summary below shows, the typical installation is small: the median is 2.52 kW and half of all installations fall between 1.8 and 3.33 kW. There are extreme outliers, however, such as a system rated at 2000 kW, almost three orders of magnitude larger than the median.
summary(UK.Renewables$InstalledCapacity)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.800 2.520 3.669 3.330 2000.000
Install.Cap <- UK.Renewables$InstalledCapacity
# A log axis cannot show zeros, so zero capacities are set to 1 kW for plotting only.
Install.Cap[Install.Cap == 0] <- 1
boxplot(Install.Cap, log = "y", col = "gold", ylab = "Installed Capacity (kW)", main = "Variation in Renewable Generator Installed Capacity")
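Setting zeros to 1 kW places those records among the real installations on the plot. One alternative, sketched here on made-up capacities rather than the real column, is to drop non-positive values and report how many were omitted. The `caps` vector and the variable names are hypothetical.

```r
# Hypothetical capacities in kW, including a zero that a log axis cannot show
caps <- c(0, 1.8, 2.52, 3.33, 50, 2000)

# Keep only values that can be drawn on a log scale
positive <- caps[caps > 0]
n_dropped <- length(caps) - length(positive)

# Gap between the largest and the median value, in orders of magnitude
magnitude_span <- log10(max(positive) / median(positive))

boxplot(positive, log = "y", col = "gold",
        ylab = "Installed Capacity (kW)",
        main = sprintf("Installed capacity (%d zero values omitted)", n_dropped))
```

For the figures reported by `summary()` above, `log10(2000 / 2.52)` is about 2.9, which is the "almost three orders of magnitude" gap between the largest system and the median.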
We can see below that most of the installations are of the domestic type, followed by commercial, community, and industrial installations. Note that these counts cover all technologies, not only PV systems.
table(UK.Renewables$InstallationTypeName)
##
## Community Domestic
## 276 19104
## Non Domestic (Commercial) Non Domestic (Industrial)
## 321 22
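The counts are easier to compare as percentages. A short sketch, with the counts copied by hand from the output above and hypothetical variable names:

```r
# Counts copied from the table() output above
install_counts <- c(
  "Community" = 276,
  "Domestic" = 19104,
  "Non Domestic (Commercial)" = 321,
  "Non Domestic (Industrial)" = 22
)

# Share of each installation type, in percent, rounded to one decimal
install_share <- round(100 * prop.table(install_counts), 1)
sort(install_share, decreasing = TRUE)
```

On the full data frame the same shares should follow from `prop.table(table(UK.Renewables$InstallationTypeName))`.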
Most installations are Photovoltaic by technology type (18,404 of 19,723, about 93%) and domestic by installation type (19,104, about 97%).
T.Tech.Installs <- table(UK.Renewables$TechnologyTypeName)
What is the breakdown of installations per Region and per Technology Type?
ByRegion.Tech <- table(UK.Renewables$GovernmentOfficeRegionName,UK.Renewables$TechnologyTypeName)
ByRegion.Tech
##
## Anaerobic digestion Hydro Micro CHP
## East Midlands 0 7 2
## East of England 0 6 4
## London 0 0 1
## North East 0 4 0
## North West 0 11 5
## NULL 2 94 1
## South East 0 6 10
## South West 0 39 3
## West Midlands 0 2 1
## Yorkshire and The Humber 0 7 3
##
## Photovoltaic Wind
## East Midlands 1546 72
## East of England 2507 75
## London 891 4
## North East 320 34
## North West 916 79
## NULL 1240 466
## South East 3943 42
## South West 3452 149
## West Midlands 1192 46
## Yorkshire and The Humber 2397 144
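Raw counts hide how the technology mix differs between regions, so row-wise shares can help. The sketch below uses two rows copied from the table above; the variable names are hypothetical. The NULL row collects records with no Government Office Region. Since these regions cover England only, it presumably holds installations elsewhere in the UK (the first records, from Aberdeen City, fall in it).

```r
# Two rows copied from the cross-tabulation above
counts <- rbind(
  "London" = c(0, 0, 1, 891, 4),
  "NULL"   = c(2, 94, 1, 1240, 466)
)
colnames(counts) <- c("Anaerobic digestion", "Hydro", "Micro CHP",
                      "Photovoltaic", "Wind")

# margin = 1 normalises each row, giving the technology mix within a region
row_share <- round(100 * prop.table(counts, margin = 1), 1)
row_share
```

On the full table this would be `prop.table(ByRegion.Tech, margin = 1)`. In these two rows, Photovoltaic makes up about 99% of London's installations, while Wind accounts for roughly a quarter of the NULL group.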
# Convert the contingency table to a wide data frame (one column per technology)
DF.ByRegion.Tech <- as.data.frame.matrix(ByRegion.Tech)
# Move the row names into a first column called "Region" (setDT() is from data.table)
DF.ByRegion.Tech <- setDT(DF.ByRegion.Tech, keep.rownames = "Region")[]
DF.ByRegion.Tech
How are the Renewable Generators distributed by Region in the UK?
dfm.ByRegion.Tech <- melt(DF.ByRegion.Tech[,c("Region", "Anaerobic digestion", "Hydro", "Micro CHP", "Photovoltaic", "Wind")],id.vars = 1)
# reorder(Region, -value) sorts the regions by their mean count across technologies, high to low.
ggplot(dfm.ByRegion.Tech, aes(x = reorder(Region, -value), y = value)) +
  geom_bar(aes(fill = variable), stat = "identity", position = "dodge") +
  theme(axis.text.x = element_text(angle = 90)) +
  ylab("Installations") + xlab("Regions in the UK") +
  ggtitle("Number of Installations per Region")
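The route from contingency table to plotting data (`as.data.frame.matrix()`, `setDT()`, then `melt()`) can be shortened, because base R's `as.data.frame()` on a table already returns one row per cell. A sketch on made-up records, with `toy`, `tab` and `long` as hypothetical names:

```r
# Toy records standing in for UK.Renewables (values are made up)
toy <- data.frame(
  Region = c("London", "London", "South West", "South West", "South West"),
  Technology = c("Photovoltaic", "Wind", "Photovoltaic", "Photovoltaic", "Hydro")
)

tab <- table(toy$Region, toy$Technology)

# One row per Region/Technology cell; the count goes in a column named "value"
long <- as.data.frame(tab, responseName = "value", stringsAsFactors = FALSE)
names(long)[1:2] <- c("Region", "variable")
long
```

The resulting columns (`Region`, `variable`, `value`) match what the `ggplot()` call expects, so the plotting code would not need to change.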
Calculating Installed Generation Capacity per Region and per Technology Type
library(data.table)
DT <- data.table(UK.Renewables)
Region.Sums <- DT[ , .(Installed.Capacity = sum(InstalledCapacity)), by = .(GovernmentOfficeRegionName, TechnologyTypeName)]
Region.Sums <- arrange(Region.Sums, GovernmentOfficeRegionName)
Region.Sums
Using spread() to tidy the previous data frame.
spread.Regions.Sum <- spread(Region.Sums, TechnologyTypeName, Installed.Capacity)
dfm.Region.Sums <- melt(spread.Regions.Sum[,c("GovernmentOfficeRegionName", "Anaerobic digestion", "Hydro", "Micro CHP", "Photovoltaic", "Wind")],id.vars = 1)
# reorder(GovernmentOfficeRegionName, -value) sorts the regions by their mean capacity across technologies, high to low.
ggplot(dfm.Region.Sums, aes(x = reorder(GovernmentOfficeRegionName, -value), y = value)) +
  geom_bar(aes(fill = variable), stat = "identity", position = "dodge") +
  theme(axis.text.x = element_text(angle = 90)) +
  ylab("Installed Capacity (kW)") + xlab("Regions in the UK") +
  ggtitle("Installed Capacity per Region")
## Warning: Removed 11 rows containing missing values (geom_bar).
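The warning most likely comes from region and technology pairs with no installations. The count table shows 11 such pairs out of 50: Anaerobic digestion in nine regions, Hydro in London, and Micro CHP in the North East. These pairs have no row in the grouped sums, so `spread()` fills them with NA. One way to get zeros instead is base R's `xtabs()`, sketched below on made-up records with hypothetical names; the `fill` argument of `spread()` is another option.

```r
# Toy records standing in for UK.Renewables (values are made up)
toy <- data.frame(
  Region = c("London", "London", "South West", "South West"),
  Technology = c("Photovoltaic", "Photovoltaic", "Photovoltaic", "Wind"),
  InstalledCapacity = c(2.0, 3.5, 4.0, 11.0)
)

# xtabs() sums the left-hand side within each cell; empty cells become 0, not NA
capacity <- xtabs(InstalledCapacity ~ Region + Technology, data = toy)

# Long format for plotting: one row per Region/Technology pair
capacity_long <- as.data.frame(capacity, responseName = "value")
capacity_long
```

Here London has no Wind records, and its cell is reported as 0 rather than dropped from the plot.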
The analysis above shows that Photovoltaic is both the most common technology type and the technology with the most installed generation capacity.