Processing Monthly CER data
Last updated - 22 July 2022
This needs refinement to publish as RMarkdown doc
Will also need to include instructions & explanation
code has been taken from CER_tidydata.R
Function to process CER data types
09 May 2022 - check state of latest SGU-Solar csv file
Checked csv file and header for Feb 22 and Mar 22 <Feb 2022 - SGU
Rated Output In kW> Note the TWO spaces between Feb & 2022, also
the same for Mar & 22
will cause the text to date function to miss Feb-22 & Mar-22
Temparary fix is to remove extra space for Feb & Mar 22 from the csv
file Used TextEdit
Might have to adjust the parsing code :-( Will also contact them
again !
29 Mar 2022 - check processing of PV_kW data for Feb 2022
Local_Generation.Rmd is showing 0 kW installed for Feb 2022
Checked csv file and header for <Feb 2022 - SGU Rated Output In
kW> Note the extra space between Feb & 2022, which I guess will
cause the text to date function to miss that this is Feb-22 Temparary
fix is to remove this space from the csv file
Run with corrected header data for Feb-22 and now shows non-zero kW
for Feb-22
Will also provide feedback to CER
29 Mar 2022 - CER have provided 2020 historical data
Refer to next heading para….. CER have now provided the historical
data for 2020 So now use the CER provided csv files like <Postcode
data for small-scale installations 2020 - SGU-Solar.csv> But have
kept the renamed files - now have extension “.temporary”
Ran test - 2020 data not read Checked filenames <Postcode data for
small-scale installations 2020 - SGU - Solar.csv> so they have
inserted spaces in SGU-Solar Simple hack to get it working…. Remove
spaces from “2020 - SGU - Solar” to be “2020 - SGU-Solar” And if they
persist with different format, then change the code :-)
Also they have changed filenames for ASHP from <Postcode data for
small-scale installations - SWH-Air source heat pump.csv> to
<Postcode data for small-scale installations 2020 - SWH -
ASHP.csv> Which is wierd, cos the “current” datafile still has
unabbreviated filename
So temporary solution is to rename each of the 2020 historical
filenames to match the older historical filename convention
Also noticed CER have changed filename for ASHP “current” This
changed late last year was <Postcode data for small-scale
installations 2019 - SWH-Air-source-heat-pump.csv> now <Postcode
data for small-scale installations - SWH-Air source heat pump.csv> so
note “-” is now ” ” (space) Temporary fix is to put “-” back in filename
for “current”
Problem to solve with CER data with cutover to include 2022
data
The “current” csv files contain - cumulative data for previous years
- monthly data for the current year and the previous year So as at March
2022, data is up to date to Jan 2022 - so cumulative data up to end 2020
- monthly data Jan 2021 to Jan 2022 Calculations assume that historical
data is available for previous years, ie not included in current csv
file So there should be a file <Postcode data for small-scale
installations 2020 - SWH-Solar.csv> which would contain monthly data
for Jan 2019 to Dec 2020 As at Mar 15 2022, CER have not published this
So the calculation algorithm will be excluding Jan 2020 to Dec 2020
data
Process at the end of a year, when Jan data is made available - get
new historical data file for year now “bumped” and add it to
folder
Possible solutions: 1. Wait until CER provide the historical data
file 2. Hack the algorithm - rename the “current” csv that had data to
end Dec 2021 to include 2020 in its filename - the algorithm will just
get column data for the filename year - try this and check
Ran test - now PV kW & PV qty look to be ok - now do same hack
for SWH & ASHPSWH
Feedback provided to CER
From Feedback form on this page http://www.cleanenergyregulator.gov.au/RET/Forms-and-resources/Postcode-data-for-small-scale-installations/historical-postcode-data-for-small-scale-installations
Done as 3 feedbacks as feedback textbox is limited in number of chars
allowed Submitted 30-Mar-2022
Request to keep historical filenames and columns headers consistent.
I use the current and historical CER csv files to track progress of
renewable installations over time using programs written in the R
language. Really appreciate having access to the data ! The programs
depend on the filename conventions and csv header conventions remaining
consistent.
I source the data from http://www.cleanenergyregulator.gov.au/RET/Forms-and-resources/Postcode-data-for-small-scale-installations
- Consistent csv headers
For the data as at 24 March 2022, in the file “Postcode data for
small-scale installations - SGU-Solar.csv” the header <Feb 2022 - SGU
Rated Output In kW> has 2 space characters between and
<2022> which required adjustment to my algorithm. Request is to
keep the same format as
- Consistent filenames - SGU-Solar
The filename <Postcode data for small-scale installations 2020 -
SGU - Solar.csv> is inconsistent from all previous historical files,
eg <Postcode data for small-scale installations 2019 -
SGU-Solar.csv> Note that “SGU-Solar” has changed to “SGU - Solar”
Request is to keep a consistent format
- Consistent filenames - SWH - Air Source Heat Pump
The latest historical filename is <Postcode data for small-scale
installations 2020 - SWH - ASHP.csv> This differs from all previous
historical filenames, e.g. <Postcode data for small-scale
installations 2019 - SWH-Air-source-heat-pump.csv> Also the “current”
SWH-ASHP csv filename convention is different to the historical data
filenames from 2019 and earlier, ie <Postcode data for small-scale
installations - SWH-Air source heat pump.csv> versus <Postcode
data for small-scale installations 2019 -
SWH-Air-source-heat-pump.csv> Note that the dash “-” has become a
space “ “ Request is to keep the file naming conventions consistent
# ================================== F U N C T I O N =================
process_CER_raw_data <- function(CER_type, CER_value) {
print("process_CER_raw_data")
column_pattern_mask = "No column pattern"
write_filename_mask = "No write filename"
if (CER_type =="SGU_Solar") {
file_pattern = "SGU-Solar.csv"
if (CER_value == "PV_qty") {
column_pattern_mask = "...Installations.Quantity"
write_filename_mask = "CER_PVqty"
}
if (CER_value == "PV_kW") {
column_pattern_mask = "...SGU.Rated.Output.In.kW"
write_filename_mask = "CER_PVkW"
}
}
if (CER_type =="SWH_Solar") {
file_pattern = "SWH-Solar.csv"
column_pattern_mask = "...Installations.Quantity"
write_filename_mask = "CER_SWHqty"
}
# EXCEPT - CER has changed the file mask for ASHP- current csv doesn't use some dashes
# so create a special filelist for ASHP files
if (CER_type =="SWH_ASHP") {
file_pattern = "SWH-Air-source-heat-pump.csv"
column_pattern_mask = "...Installations.Quantity"
write_filename_mask = "CER_SWHASHPqty"
file_list <- append(
as.list(list.files(path=CER_data_folder,pattern = "SWH-Air source heat pump.csv")),
as.list(list.files(path=CER_data_folder,pattern = file_pattern))
)
}
else {
# create list of files matching the pattern
file_list <- as.list(list.files(path=CER_data_folder,pattern = file_pattern))
}
# This will get patterns such as "SGU-Solar.csv"
# print(file_list)
for (filename in file_list) {
print(sprintf("Processing %s", filename))
# check if filename has "yyyy" in it - no year = current
filename_year <- str_match(filename,"[0-9]+")
if(!is.na(filename_year)) {
# makes write_filename like "CER_PVqty_",filename_year,".rds"
write_filename = paste(write_filename_mask,"_",filename_year,".rds",sep = "")
} else {
write_filename = paste(write_filename_mask,".rds",sep = "")
}
data <- read.csv(paste(CER_data_folder,filename,sep = ""))
# as of CER data supplied for end Sep 2021, non-column data was
# added to start of csv, so changed csv read to fread from data.table package
# to handdle skip of lines until column row
# data <- fread(paste(CER_data_folder,filename,sep = ""),
# skip = "Small Unit Installation Postcode",
# check.names = TRUE,data.table = FALSE)
print(sprintf("Number of rows = %f", nrow(data)))
# now make all columns numeric, except first one (postcode)
all_columns <- colnames(data)
all_columns_except_first <- all_columns[-1]
# ========================
# Removing commas from selected columns
# Prior to setting to numeric, otherwise will get NA when casting
# https://statisticsglobe.com/modify-numbers-with-comma-as-thousand-separator-in-r
data[ , all_columns_except_first] <-
lapply(data[ , all_columns_except_first],
function(x){ as.numeric(as.character(gsub(",", "", x))) })
# now cast to numeric
data[all_columns_except_first] <- sapply(data[all_columns_except_first],as.numeric)
# rename columns now, so can more easily remove duplicate rows (min value)
data <- rename(data, Postcode = 1,
Previous = 2)
# check if any duplicate postcodes
my_duplicated <- duplicated(data[,1])
sum_my_duplicates = sum(my_duplicated)
print(sprintf("Duplicate Postcode = %f", sum_my_duplicates))
# make column one (postcode) to character
# ================ try as integer instead ===================
# data[1] <- sapply(data[1],as.character)
# do this later just before saving......
if(sum_my_duplicates > 0) {
my_duplicated_table <- data[my_duplicated,]
# https://www.biostars.org/p/304213/
# remove duplicates - keep highest value of a column
data_before_remove <- data
data <- data[order(data$Postcode, -abs(data$Previous) ), ] ### sort first
data <- data[ !duplicated(data$Postcode), ] ### Keep highest
# browser(text="Found duplicates")
}
# ====== This will be the place to remove redundant space between Month(Mmm) and Year(yy)
# ====== Since Feb 2022, for SGU Solar, and extra space has been inserted
# ====== so remove it, so that the column name in the csv file will match
# ====== the column_pattern_mask
# ====== N O T E T O DO ========
# maybe use something like str_squish ???? see https://statisticsglobe.com/replace-multiple-spaces-by-single-space-in-r
# ====== OR maybe this article is better
# ===== https://www.codingprof.com/5-easy-ways-to-replace-blanks-in-column-names-in-r-examples/
# ===== suggest try it out by just reading in an offending csv file.....
if(is.na(filename_year)) {
# current data doesn't have yyyy in filename
# column_pattern will be like "[a-zA-Z]+.[0-9]+...Installations.Quantity"
column_pattern = paste("[a-zA-Z]+.[0-9]+",column_pattern_mask, sep = "")
}
else {
# just interested in getting column data for the year of the filename
# as historical files have data for 2 year spread
column_pattern <- paste("[a-zA-Z]+.",filename_year,column_pattern_mask, sep = "")
}
# get a vector just containing column names we are interested in
column_list_vector <- as.vector(
na.omit(
str_match(colnames(data),column_pattern)))
# for now, values_to generic "CER_value_col"
# and then rename the column just before saving
# Note that pivot_longer is preferred over gather - https://tidyr.tidyverse.org/reference/pivot_longer.html
data <- pivot_longer(data,
cols = all_of(column_list_vector),
names_to = "year_month_string",
values_to = "CER_value_col")
# now add "dd=01" to the year_month_string so we can force it to as.Date
# then add 1 month
# and subtract a day so we get end-of-month date
# as that is how the CER data is reported
# see https://statisticsglobe.com/add-subtract-months-years-from-date-in-r
data <- mutate(data,
year_month =
(as.Date(
paste0(sub("^([^.]*.[^.]*).*", "\\1",
year_month_string),".01"),
format = "%B.%Y.%d" )
%m+% months(1))
%m-% days(1)
)
# rename the postcode column
# already done above
# data <- rename(data, Postcode = Small.Unit.Installation.Postcode)
# and make Postcode column integer
data <- data %>% mutate(Postcode = as.integer((Postcode)))
# swhashp_totals <- data.frame(year = numeric(),
# year_total = numeric(),
# total_qty = numeric() )
if(CER_value == "SWHASHP_qty") {
# before we remove columns, get the sum of last column & save it
# browser(text="about to SWHASHP adding rows for years")
swhashp_totals <- swhashp_totals %>% add_row(year=as.integer(filename_year),
year_total = sum(data$CER_value_col),
total_qty = sum(data$Installations.Quantity.Total))
# browser(text="SWHASHP adding rows for years")
}
# now just retain the columns we want
data <- data[,names(data) %in% c("Postcode",
"year_month","CER_value_col")]
# rename the CER_value_col
if(CER_value == "PV_qty") {
# rename the CER_value_col column
data <- rename(data, PV_qty = CER_value_col)
# get the columns in the right order
data <- relocate(data, PV_qty, .after = year_month)
}
if(CER_value == "PV_kW") {
# rename the CER_value_col column
data <- rename(data, PV_kW = CER_value_col)
# get the columns in the right order
data <- relocate(data, PV_kW, .after = year_month)
}
if(CER_value == "SWH_qty") {
# rename the CER_value_col column
data <- rename(data, SWH_qty = CER_value_col)
# get the columns in the right order
data <- relocate(data, SWH_qty, .after = year_month)
}
if(CER_value == "SWHASHP_qty") {
# rename the CER_value_col column
data <- rename(data, SWHASHP_qty = CER_value_col)
# get the columns in the right order
data <- relocate(data, SWHASHP_qty, .after = year_month)
}
# Sort by postcode, then year_month
data <- data[
with(data, order(Postcode, year_month)),
]
# View(data)
# save the tidy data format as .rds
print(sprintf("Saving %s",write_filename))
saveRDS(data, paste(tmp_folder,write_filename,sep = ""))
}
# E N D O F L O O P - should now have rds files for each year + current
# Now create a consolidated file of all years data for eg PVqty, PVkW, etc
# Read all the CER_PVqty/kW, etc rds files into a dataframe
# should give us a list matching patterns like "^CER_PVqty.*\\.rds$"
setwd(tmp_folder)
tidy_file_list <- as.list(list.files(pattern= paste("^",write_filename_mask,".*\\.rds$", sep = "")))
# bind all the matching files into a data frame
my_tidy_data <- do.call(rbind, lapply(tidy_file_list, readRDS))
# now set working directory back to base
setwd(base_folder)
# Sort by postcode, then year_month
my_tidy_data <- my_tidy_data[
with(my_tidy_data, order(Postcode, year_month)),
]
# save this to a file like CER_PVqty_all_years.rds in tmp folder
print(sprintf("Writing %s", paste(write_filename_mask,"_all_years.rds", sep = "")))
saveRDS(my_tidy_data, file = paste(tmp_folder,write_filename_mask,"_all_years.rds", sep = ""))
return(paste("====> exiting function: ",write_filename_mask))
}
# ================================== F U N C T I O N =================
Test functions
# ================================== F U N C T I O N =================
test_CER_PVkW <- function() {
# Now do a test
# Try doing test for CER_PVkW_all_years.rds
# and check totals against csv file in CER_data
# Then repeat for other data - PVqty, SWH, SWHASHP
print("========== S Y S T E M T E S T - PVkW =========")
print("readRDS CER_PVkW_all_years.rds")
read_data <- readRDS(file=paste(tmp_folder,"CER_PVkW_all_years.rds",sep = ""))
print("Calculate totals")
calc_PV_kW_total <- sum(read_data$PV_kW,na.rm=TRUE)
print("read current - Postcode data for small-scale installations - SGU-Solar.csv")
print("and extract totals")
data_SGU = read.csv(paste(CER_data_folder,"Postcode data for small-scale installations - SGU-Solar.csv",sep = ""))
# Removing commas from selected columns
# Prior to setting to numeric
# https://statisticsglobe.com/modify-numbers-with-comma-as-thousand-separator-in-r
col_conv <- setdiff(names(data_SGU),"Small.Unit.Installation.Postcode")
data_SGU[ , col_conv] <- lapply(data_SGU[ , col_conv], # Convert data
function(x){ as.numeric(as.character(gsub(",", "", x))) })
# set the last 2 columns as numeric
data_SGU$SGU.Rated.Output.In.kW.Total <- as.numeric(as.character(data_SGU$SGU.Rated.Output.In.kW.Total))
raw_PV_kW_total <- sum(data_SGU$SGU.Rated.Output.In.kW.Total,na.rm=TRUE)
print("===== Totals =====")
print(sprintf("Calc PV kW = %f", calc_PV_kW_total))
print(sprintf("Raw PV kW = %f", raw_PV_kW_total))
print(sprintf("Difference = %f", calc_PV_kW_total - raw_PV_kW_total))
print(sprintf("Difference percent = %f", 100*(calc_PV_kW_total-raw_PV_kW_total)/raw_PV_kW_total))
}
# ================================== F U N C T I O N =================
test_CER_PVqty <- function() {
# Now do a test
# Try doing test for CER_PVqty_all_years.rds
# and check totals against csv file in CER_data
# Then repeat for other data - PVqty, SWH, SWHASHP
print("========== S Y S T E M T E S T - PVqty =========")
print("readRDS CER_PVqty_all_years.rds")
read_data <- readRDS(file=paste(tmp_folder,"CER_PVqty_all_years.rds",sep = ""))
print("Calculate totals")
calc_PV_qty_total <- sum(read_data$PV_qty,na.rm=TRUE)
print("read current - Postcode data for small-scale installations - SGU-Solar.csv")
print("and extract totals")
data_SGU = read.csv(paste(CER_data_folder,"Postcode data for small-scale installations - SGU-Solar.csv",sep = ""))
# Removing commas from selected columns
# Prior to setting to numeric
# https://statisticsglobe.com/modify-numbers-with-comma-as-thousand-separator-in-r
col_conv <- setdiff(names(data_SGU),"Small.Unit.Installation.Postcode")
data_SGU[ , col_conv] <- lapply(data_SGU[ , col_conv], # Convert data
function(x){ as.numeric(as.character(gsub(",", "", x))) })
# set the last 2 columns as numeric
data_SGU$Installations.Quantity.Total <- as.numeric(as.character(data_SGU$Installations.Quantity.Total))
# data_SGU$SGU.Rated.Output.In.kW.Total <- as.numeric(as.character(data_SGU$SGU.Rated.Output.In.kW.Total))
raw_PV_qty_total <- sum(data_SGU$Installations.Quantity.Total,na.rm=TRUE)
# raw_PV_kW_total <- sum(data_SGU$SGU.Rated.Output.In.kW.Total,na.rm=TRUE)
print("===== Totals =====")
print(sprintf("Calc PV total qty = %f", calc_PV_qty_total))
print(sprintf("Raw PV total qty = %f", raw_PV_qty_total))
# print(sprintf("Calc PV kW = %f", calc_PV_kW_total))
# print(sprintf("Raw PV kW = %f", raw_PV_kW_total))
print(sprintf("Difference = %f", calc_PV_qty_total - raw_PV_qty_total))
print(sprintf("Difference percent = %f", 100*(calc_PV_qty_total-raw_PV_qty_total)/raw_PV_qty_total))
}
# ================================== F U N C T I O N =================
test_CER_SWH <- function() {
# Now do a test
# Try doing test for CER_SWH_all_years.rds
# and check totals against csv file in CER_data
# Then repeat for other data - PVqty, SWH, SWHASHP
print("========== S Y S T E M T E S T - SWH =========")
print("readRDS CER_SWHqty_all_years.rds")
read_data <- readRDS(file=paste(tmp_folder,"CER_SWHqty_all_years.rds",sep = ""))
print("Calculate totals")
calc_SWH_qty_total <- sum(read_data$SWH_qty,na.rm=TRUE)
print("read current - Postcode data for small-scale installations - SWH-Solar.csv")
print("and extract totals")
data_SGU = read.csv(paste(CER_data_folder,"Postcode data for small-scale installations - SWH-Solar.csv",sep = ""))
# Removing commas from selected columns
# Prior to setting to numeric
# https://statisticsglobe.com/modify-numbers-with-comma-as-thousand-separator-in-r
col_conv <- setdiff(names(data_SGU),"Small.Unit.Installation.Postcode")
data_SGU[ , col_conv] <- lapply(data_SGU[ , col_conv], # Convert data
function(x){ as.numeric(as.character(gsub(",", "", x))) })
# set the last column as numeric
data_SGU$Installations.Quantity.Total <- as.numeric(as.character(data_SGU$Installations.Quantity.Total))
# data_SGU$SGU.Rated.Output.In.kW.Total <- as.numeric(as.character(data_SGU$SGU.Rated.Output.In.kW.Total))
raw_SWH_qty_total <- sum(data_SGU$Installations.Quantity.Total,na.rm=TRUE)
# raw_PV_kW_total <- sum(data_SGU$SGU.Rated.Output.In.kW.Total,na.rm=TRUE)
print("===== Totals =====")
print(sprintf("Calc SWH total qty = %f", calc_SWH_qty_total))
print(sprintf("Raw SWH total qty = %f", raw_SWH_qty_total))
print(sprintf("Difference = %f", calc_SWH_qty_total - raw_SWH_qty_total))
print(sprintf("Difference percent = %f", 100*(calc_SWH_qty_total-raw_SWH_qty_total)/raw_SWH_qty_total))
}
# ================================== F U N C T I O N =================
test_CER_SWHASHP <- function() {
# Now do a test
# Try doing test for CER_SWH_all_years.rds
# and check totals against csv file in CER_data
# Then repeat for other data - PVqty, SWH, SWHASHP
print("========== S Y S T E M T E S T - SWHASHP =========")
print("readRDS CER_SWHASHPqty_all_years.rds")
read_data <- readRDS(file=paste(tmp_folder,"CER_SWHASHPqty_all_years.rds",sep = ""))
print("Calculate totals")
calc_SWHASHP_qty_total <- sum(read_data$SWHASHP_qty,na.rm=TRUE)
# ======= note changed mask from "SWH-Air source heat pump"
# ======= to "SWH-Air-source-heat-pump"- 7 June 2022
print("read current - Postcode data for small-scale installations - SWH-Air-source-heat-pump.csv")
print("and extract totals")
data_SGU = read.csv(paste(CER_data_folder,"Postcode data for small-scale installations - SWH-Air-source-heat-pump.csv",sep = ""))
# Removing commas from selected columns
# Prior to setting to numeric
# https://statisticsglobe.com/modify-numbers-with-comma-as-thousand-separator-in-r
col_conv <- setdiff(names(data_SGU),"Small.Unit.Installation.Postcode")
data_SGU[ , col_conv] <- lapply(data_SGU[ , col_conv], # Convert data
function(x){ as.numeric(as.character(gsub(",", "", x))) })
# set the last column as numeric
data_SGU$Installations.Quantity.Total <- as.numeric(as.character(data_SGU$Installations.Quantity.Total))
# data_SGU$SGU.Rated.Output.In.kW.Total <- as.numeric(as.character(data_SGU$SGU.Rated.Output.In.kW.Total))
raw_SWHASHP_qty_total <- sum(data_SGU$Installations.Quantity.Total,na.rm=TRUE)
# raw_PV_kW_total <- sum(data_SGU$SGU.Rated.Output.In.kW.Total,na.rm=TRUE)
print("===== Totals =====")
print(sprintf("Calc SWHASHP total qty = %f", calc_SWHASHP_qty_total))
print(sprintf("Raw SWHASHP total qty = %f", raw_SWHASHP_qty_total))
print(sprintf("Difference = %f", calc_SWHASHP_qty_total - raw_SWHASHP_qty_total))
print(sprintf("Difference percent = %f", 100*(calc_SWHASHP_qty_total-raw_SWHASHP_qty_total)/raw_SWHASHP_qty_total))
# browser(text="End of test_CER_SWHASHP function")
}
Main processing
# ================ T H I S I S T H E R E A L P R O C E S S =============
# define some global variables for folders
base_folder <- getwd()
CER_data_folder <- paste(base_folder,"/CER_data/",sep = "")
tmp_folder <- paste(base_folder,"/tmp/",sep = "")
output_folder <- paste(base_folder,"/Output/",sep = "")
# test_folder <- paste(getwd(),"/test/",sep = "")
run_time <- sprintf("%s",Sys.time())
# Specify empty vectors in data.frame
swhashp_totals <- data.frame(year = numeric(),
year_total = numeric(),
total_qty = numeric() )
# = clear out RDS files in tmp folder
# options
# https://stackoverflow.com/questions/9296377/automatically-delete-files-folders
# do.call(file.remove, list(list.files("C:/Temp", full.names = TRUE)))
#list.files(tmp_folder)
do.call(file.remove, list(list.files(tmp_folder, full.names = TRUE)))
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [76] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#list.files(tmp_folder)
#sink(sprintf("CER tidydata:%s.log",run_time))
print("CER tidydata Log")
## [1] "CER tidydata Log"
print("================")
## [1] "================"
print(sprintf("Run on %s",run_time))
## [1] "Run on 2022-07-22 12:26:32"
print("Processing SGU_Solar - PV_kW.......")
## [1] "Processing SGU_Solar - PV_kW......."
process_CER_raw_data("SGU_Solar", "PV_kW")
## [1] "process_CER_raw_data"
## [1] "Processing Postcode data for small-scale installations - SGU-Solar.csv"
## [1] "Number of rows = 2805.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVkW.rds"
## [1] "Processing Postcode data for small-scale installations 2001 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVkW_2001.rds"
## [1] "Processing Postcode data for small-scale installations 2002 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVkW_2002.rds"
## [1] "Processing Postcode data for small-scale installations 2003 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVkW_2003.rds"
## [1] "Processing Postcode data for small-scale installations 2004 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVkW_2004.rds"
## [1] "Processing Postcode data for small-scale installations 2005 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVkW_2005.rds"
## [1] "Processing Postcode data for small-scale installations 2006 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVkW_2006.rds"
## [1] "Processing Postcode data for small-scale installations 2007 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVkW_2007.rds"
## [1] "Processing Postcode data for small-scale installations 2008 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVkW_2008.rds"
## [1] "Processing Postcode data for small-scale installations 2009 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVkW_2009.rds"
## [1] "Processing Postcode data for small-scale installations 2010 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVkW_2010.rds"
## [1] "Processing Postcode data for small-scale installations 2011 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVkW_2011.rds"
## [1] "Processing Postcode data for small-scale installations 2012 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVkW_2012.rds"
## [1] "Processing Postcode data for small-scale installations 2013 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVkW_2013.rds"
## [1] "Processing Postcode data for small-scale installations 2014 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVkW_2014.rds"
## [1] "Processing Postcode data for small-scale installations 2015 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVkW_2015.rds"
## [1] "Processing Postcode data for small-scale installations 2016 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVkW_2016.rds"
## [1] "Processing Postcode data for small-scale installations 2017 - SGU-Solar.csv"
## [1] "Number of rows = 2800.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVkW_2017.rds"
## [1] "Processing Postcode data for small-scale installations 2018 - SGU-Solar.csv"
## [1] "Number of rows = 2800.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVkW_2018.rds"
## [1] "Processing Postcode data for small-scale installations 2019 - SGU-Solar.csv"
## [1] "Number of rows = 2800.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVkW_2019.rds"
## [1] "Processing Postcode data for small-scale installations 2020 - SGU-Solar.csv"
## [1] "Number of rows = 2800.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVkW_2020.rds"
## [1] "Writing CER_PVkW_all_years.rds"
## [1] "====> exiting function: CER_PVkW"
test_CER_PVkW()
## [1] "========== S Y S T E M T E S T - PVkW ========="
## [1] "readRDS CER_PVkW_all_years.rds"
## [1] "Calculate totals"
## [1] "read current - Postcode data for small-scale installations - SGU-Solar.csv"
## [1] "and extract totals"
## [1] "===== Totals ====="
## [1] "Calc PV kW = 17481939.378999"
## [1] "Raw PV kW = 17482411.107000"
## [1] "Difference = -471.728001"
## [1] "Difference percent = -0.002698"
print("Processing SGU_Solar - PV_qty.......")
## [1] "Processing SGU_Solar - PV_qty......."
process_CER_raw_data("SGU_Solar", "PV_qty")
## [1] "process_CER_raw_data"
## [1] "Processing Postcode data for small-scale installations - SGU-Solar.csv"
## [1] "Number of rows = 2805.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVqty.rds"
## [1] "Processing Postcode data for small-scale installations 2001 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVqty_2001.rds"
## [1] "Processing Postcode data for small-scale installations 2002 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVqty_2002.rds"
## [1] "Processing Postcode data for small-scale installations 2003 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVqty_2003.rds"
## [1] "Processing Postcode data for small-scale installations 2004 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVqty_2004.rds"
## [1] "Processing Postcode data for small-scale installations 2005 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVqty_2005.rds"
## [1] "Processing Postcode data for small-scale installations 2006 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVqty_2006.rds"
## [1] "Processing Postcode data for small-scale installations 2007 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVqty_2007.rds"
## [1] "Processing Postcode data for small-scale installations 2008 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVqty_2008.rds"
## [1] "Processing Postcode data for small-scale installations 2009 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVqty_2009.rds"
## [1] "Processing Postcode data for small-scale installations 2010 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVqty_2010.rds"
## [1] "Processing Postcode data for small-scale installations 2011 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVqty_2011.rds"
## [1] "Processing Postcode data for small-scale installations 2012 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVqty_2012.rds"
## [1] "Processing Postcode data for small-scale installations 2013 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVqty_2013.rds"
## [1] "Processing Postcode data for small-scale installations 2014 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVqty_2014.rds"
## [1] "Processing Postcode data for small-scale installations 2015 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVqty_2015.rds"
## [1] "Processing Postcode data for small-scale installations 2016 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVqty_2016.rds"
## [1] "Processing Postcode data for small-scale installations 2017 - SGU-Solar.csv"
## [1] "Number of rows = 2800.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVqty_2017.rds"
## [1] "Processing Postcode data for small-scale installations 2018 - SGU-Solar.csv"
## [1] "Number of rows = 2800.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVqty_2018.rds"
## [1] "Processing Postcode data for small-scale installations 2019 - SGU-Solar.csv"
## [1] "Number of rows = 2800.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVqty_2019.rds"
## [1] "Processing Postcode data for small-scale installations 2020 - SGU-Solar.csv"
## [1] "Number of rows = 2800.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_PVqty_2020.rds"
## [1] "Writing CER_PVqty_all_years.rds"
## [1] "====> exiting function: CER_PVqty"
test_CER_PVqty()
## [1] "========== S Y S T E M T E S T - PVqty ========="
## [1] "readRDS CER_PVqty_all_years.rds"
## [1] "Calculate totals"
## [1] "read current - Postcode data for small-scale installations - SGU-Solar.csv"
## [1] "and extract totals"
## [1] "===== Totals ====="
## [1] "Calc PV total qty = 3191103.000000"
## [1] "Raw PV total qty = 3191201.000000"
## [1] "Difference = -98.000000"
## [1] "Difference percent = -0.003071"
print("Processing SWH_Solar - SWHqty.......")
## [1] "Processing SWH_Solar - SWHqty......."
process_CER_raw_data("SWH_Solar", "SWH_qty")
## [1] "process_CER_raw_data"
## [1] "Processing Postcode data for small-scale installations - SWH-Solar.csv"
## [1] "Number of rows = 2963.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHqty.rds"
## [1] "Processing Postcode data for small-scale installations 2001 - SWH-Solar.csv"
## [1] "Number of rows = 2952.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHqty_2001.rds"
## [1] "Processing Postcode data for small-scale installations 2002 - SWH-Solar.csv"
## [1] "Number of rows = 2952.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHqty_2002.rds"
## [1] "Processing Postcode data for small-scale installations 2003 - SWH-Solar.csv"
## [1] "Number of rows = 2952.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHqty_2003.rds"
## [1] "Processing Postcode data for small-scale installations 2004 - SWH-Solar.csv"
## [1] "Number of rows = 2952.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHqty_2004.rds"
## [1] "Processing Postcode data for small-scale installations 2005 - SWH-Solar.csv"
## [1] "Number of rows = 2952.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHqty_2005.rds"
## [1] "Processing Postcode data for small-scale installations 2006 - SWH-Solar.csv"
## [1] "Number of rows = 2952.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHqty_2006.rds"
## [1] "Processing Postcode data for small-scale installations 2007 - SWH-Solar.csv"
## [1] "Number of rows = 2952.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHqty_2007.rds"
## [1] "Processing Postcode data for small-scale installations 2008 - SWH-Solar.csv"
## [1] "Number of rows = 2952.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHqty_2008.rds"
## [1] "Processing Postcode data for small-scale installations 2009 - SWH-Solar.csv"
## [1] "Number of rows = 2952.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHqty_2009.rds"
## [1] "Processing Postcode data for small-scale installations 2010 - SWH-Solar.csv"
## [1] "Number of rows = 2952.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHqty_2010.rds"
## [1] "Processing Postcode data for small-scale installations 2011 - SWH-Solar.csv"
## [1] "Number of rows = 2952.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHqty_2011.rds"
## [1] "Processing Postcode data for small-scale installations 2012 - SWH-Solar.csv"
## [1] "Number of rows = 2952.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHqty_2012.rds"
## [1] "Processing Postcode data for small-scale installations 2013 - SWH-Solar.csv"
## [1] "Number of rows = 2952.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHqty_2013.rds"
## [1] "Processing Postcode data for small-scale installations 2014 - SWH-Solar.csv"
## [1] "Number of rows = 2952.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHqty_2014.rds"
## [1] "Processing Postcode data for small-scale installations 2015 - SWH-Solar.csv"
## [1] "Number of rows = 2952.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHqty_2015.rds"
## [1] "Processing Postcode data for small-scale installations 2016 - SWH-Solar.csv"
## [1] "Number of rows = 2952.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHqty_2016.rds"
## [1] "Processing Postcode data for small-scale installations 2017 - SWH-Solar.csv"
## [1] "Number of rows = 2963.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHqty_2017.rds"
## [1] "Processing Postcode data for small-scale installations 2018 - SWH-Solar.csv"
## [1] "Number of rows = 2963.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHqty_2018.rds"
## [1] "Processing Postcode data for small-scale installations 2019 - SWH-Solar.csv"
## [1] "Number of rows = 2963.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHqty_2019.rds"
## [1] "Processing Postcode data for small-scale installations 2020 - SWH-Solar.csv"
## [1] "Number of rows = 2962.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHqty_2020.rds"
## [1] "Writing CER_SWHqty_all_years.rds"
## [1] "====> exiting function: CER_SWHqty"
test_CER_SWH()
## [1] "========== S Y S T E M T E S T - SWH ========="
## [1] "readRDS CER_SWHqty_all_years.rds"
## [1] "Calculate totals"
## [1] "read current - Postcode data for small-scale installations - SWH-Solar.csv"
## [1] "and extract totals"
## [1] "===== Totals ====="
## [1] "Calc SWH total qty = 1047172.000000"
## [1] "Raw SWH total qty = 1047169.000000"
## [1] "Difference = 3.000000"
## [1] "Difference percent = 0.000286"
print("Processing SWH_ASHP - SWHASHPqty.......")
## [1] "Processing SWH_ASHP - SWHASHPqty......."
process_CER_raw_data("SWH_ASHP", "SWHASHP_qty")
## [1] "process_CER_raw_data"
## [1] "Processing Postcode data for small-scale installations - SWH-Air-source-heat-pump.csv"
## [1] "Number of rows = 2603.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHASHPqty.rds"
## [1] "Processing Postcode data for small-scale installations 2005 - SWH-Air-source-heat-pump.csv"
## [1] "Number of rows = 2548.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHASHPqty_2005.rds"
## [1] "Processing Postcode data for small-scale installations 2006 - SWH-Air-source-heat-pump.csv"
## [1] "Number of rows = 2548.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHASHPqty_2006.rds"
## [1] "Processing Postcode data for small-scale installations 2007 - SWH-Air-source-heat-pump.csv"
## [1] "Number of rows = 2548.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHASHPqty_2007.rds"
## [1] "Processing Postcode data for small-scale installations 2008 - SWH-Air-source-heat-pump.csv"
## [1] "Number of rows = 2548.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHASHPqty_2008.rds"
## [1] "Processing Postcode data for small-scale installations 2009 - SWH-Air-source-heat-pump.csv"
## [1] "Number of rows = 2548.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHASHPqty_2009.rds"
## [1] "Processing Postcode data for small-scale installations 2010 - SWH-Air-source-heat-pump.csv"
## [1] "Number of rows = 2548.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHASHPqty_2010.rds"
## [1] "Processing Postcode data for small-scale installations 2011 - SWH-Air-source-heat-pump.csv"
## [1] "Number of rows = 2548.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHASHPqty_2011.rds"
## [1] "Processing Postcode data for small-scale installations 2012 - SWH-Air-source-heat-pump.csv"
## [1] "Number of rows = 2548.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHASHPqty_2012.rds"
## [1] "Processing Postcode data for small-scale installations 2013 - SWH-Air-source-heat-pump.csv"
## [1] "Number of rows = 2548.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHASHPqty_2013.rds"
## [1] "Processing Postcode data for small-scale installations 2014 - SWH-Air-source-heat-pump.csv"
## [1] "Number of rows = 2548.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHASHPqty_2014.rds"
## [1] "Processing Postcode data for small-scale installations 2016 - SWH-Air-source-heat-pump.csv"
## [1] "Number of rows = 2548.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHASHPqty_2016.rds"
## [1] "Processing Postcode data for small-scale installations 2017 - SWH-Air-source-heat-pump.csv"
## [1] "Number of rows = 2583.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHASHPqty_2017.rds"
## [1] "Processing Postcode data for small-scale installations 2018 - SWH-Air-source-heat-pump.csv"
## [1] "Number of rows = 2583.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHASHPqty_2018.rds"
## [1] "Processing Postcode data for small-scale installations 2019 - SWH-Air-source-heat-pump.csv"
## [1] "Number of rows = 2583.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHASHPqty_2019.rds"
## [1] "Processing Postcode data for small-scale installations 2020 - SWH-Air-source-heat-pump.csv"
## [1] "Number of rows = 2583.000000"
## [1] "Duplicate Postcode = 0.000000"
## [1] "Saving CER_SWHASHPqty_2020.rds"
## [1] "Writing CER_SWHASHPqty_all_years.rds"
## [1] "====> exiting function: CER_SWHASHPqty"
test_CER_SWHASHP()
## [1] "========== S Y S T E M T E S T - SWHASHP ========="
## [1] "readRDS CER_SWHASHPqty_all_years.rds"
## [1] "Calculate totals"
## [1] "read current - Postcode data for small-scale installations - SWH-Air-source-heat-pump.csv"
## [1] "and extract totals"
## [1] "===== Totals ====="
## [1] "Calc SWHASHP total qty = 383469.000000"
## [1] "Raw SWHASHP total qty = 393400.000000"
## [1] "Difference = -9931.000000"
## [1] "Difference percent = -2.524403"
Consolidate - separate chunk so can set directly
# now consolidate.......
# read in all_years for PVqty, PVkW, SWHqty & SWHASHPqty
# join & save as .rds
print("read in all_years for PVqty, PVkW, SWHqty & SWHASHPqty, join & save as .rds")
## [1] "read in all_years for PVqty, PVkW, SWHqty & SWHASHPqty, join & save as .rds"
# https://www.statology.org/join-multiple-data-frames-dplyr/
#setwd(tmp_folder)
data <- readRDS(file=paste0(tmp_folder,"CER_PVqty_all_years.rds")) %>%
full_join(readRDS(file=paste0(tmp_folder,"CER_PVkW_all_years.rds")),by = c("Postcode","year_month")) %>%
full_join(readRDS(file=paste0(tmp_folder,"CER_SWHqty_all_years.rds")), by = c("Postcode","year_month")) %>%
full_join(readRDS(file=paste0(tmp_folder,"CER_SWHASHPqty_all_years.rds")), by = c("Postcode","year_month"))
#setwd(base_folder)
# The joins may result in some numeric columns having NA values
# So convert all NA values in numeric columns to zero
# https://www.delftstack.com/howto/r/replace-na-with-0-in-r/
data <- mutate_if(data, is.numeric, ~replace(., is.na(.), 0))
# Sort by postcode, then year_month - just in case
data <- data[
with(data, order(Postcode, year_month)),
]
# save joined data to a file - CER_PVqty_kW_all_years.rds
print("save joined data to a file - CER_PVqty_kW_SWH_SWHASHP_all_years.rds")
## [1] "save joined data to a file - CER_PVqty_kW_SWH_SWHASHP_all_years.rds"
# ============ >>>>> save in safe, accessible place.....
# check if file exists, and if so delete
rds_filename = paste(output_folder,"CER_PVqty_kW_SWH_SWHASHP_all_years.rds",sep = "")
if(file.exists(rds_filename)) {
file.remove(rds_filename)
}
## [1] TRUE
saveRDS(data,file = rds_filename)
# and save a copy with a date/time in filename
saveRDS(data,file = paste(output_folder,"CER_PVqty_kW_SWH_SWHASHP_all_years - ",run_time ,".rds",sep = ""))
System Test
# ====================== S Y S T E M T E S T =========
# Test that it works !
# *************** Need to update for SWH & SWHASHP data ****************
print("====================== S Y S T E M T E S T =========")
## [1] "====================== S Y S T E M T E S T ========="
print("readRDS CER_PVqty_kW_SWH_SWHASHP_all_years.rds")
## [1] "readRDS CER_PVqty_kW_SWH_SWHASHP_all_years.rds"
read_data <- readRDS(file=paste(output_folder,"CER_PVqty_kW_SWH_SWHASHP_all_years.rds",sep = ""))
print("Calculate totals")
## [1] "Calculate totals"
calc_PV_qty_total <- sum(read_data$PV_qty,na.rm=TRUE)
calc_PV_kW_total <- sum(read_data$PV_kW,na.rm=TRUE)
calc_SWH_qty_total <- sum(read_data$SWH_qty,na.rm=TRUE)
calc_SWHASHP_qty_total <- sum(read_data$SWHASHP_qty,na.rm=TRUE)
how_many_duplicates <- sum(duplicated(read_data[,1:2]))
print("read current - Postcode data for small-scale installations - SGU-Solar.csv")
## [1] "read current - Postcode data for small-scale installations - SGU-Solar.csv"
print("and extract totals")
## [1] "and extract totals"
data_SGU = read.csv(paste(CER_data_folder,"Postcode data for small-scale installations - SGU-Solar.csv",sep = ""))
# Removing commas from selected columns
# Prior to setting to numeric
# https://statisticsglobe.com/modify-numbers-with-comma-as-thousand-separator-in-r
col_conv <- setdiff(names(data_SGU),"Small.Unit.Installation.Postcode")
data_SGU[ , col_conv] <- lapply(data_SGU[ , col_conv], # Convert data
function(x){ as.numeric(as.character(gsub(",", "", x))) })
# set the last 2 columns as numeric
data_SGU$Installations.Quantity.Total <- as.numeric(as.character(data_SGU$Installations.Quantity.Total))
data_SGU$SGU.Rated.Output.In.kW.Total <- as.numeric(as.character(data_SGU$SGU.Rated.Output.In.kW.Total))
raw_PV_qty_total <- sum(data_SGU$Installations.Quantity.Total,na.rm=TRUE)
raw_PV_kW_total <- sum(data_SGU$SGU.Rated.Output.In.kW.Total,na.rm=TRUE)
print("===== Totals =====")
## [1] "===== Totals ====="
print(sprintf("Calc PV total qty = %f", calc_PV_qty_total))
## [1] "Calc PV total qty = 3191103.000000"
print(sprintf("Raw PV total qty = %f", raw_PV_qty_total))
## [1] "Raw PV total qty = 3191201.000000"
print(sprintf("Calc PV kW = %f", calc_PV_kW_total))
## [1] "Calc PV kW = 17481939.378999"
print(sprintf("Raw PV kW = %f", raw_PV_kW_total))
## [1] "Raw PV kW = 17482411.107000"
# Coolio ! - there's a very small discrepancy....
# aggregate the sum of qty & kW for each postcode in the consolidated .rds
print("aggregate the sum of qty & kW for each postcode in the consolidated .rds")
## [1] "aggregate the sum of qty & kW for each postcode in the consolidated .rds"
calc_SGU_PV_qty <- setNames(
aggregate(x = read_data$PV_qty, # Specify data column
by = list(read_data$Postcode), # Specify group indicator
FUN = sum, # Specify function (i.e. sum)
na.rm=TRUE),
c("Postcode","Calc_PV_qty")
)
calc_SGU_PV_kW <- setNames(
aggregate(x = read_data$PV_kW, # Specify data column
by = list(read_data$Postcode), # Specify group indicator
FUN = sum, # Specify function (i.e. sum)
na.rm=TRUE),
c("Postcode","Calc_PV_kW")
)
# and join them into a single table
calc_SGU_PV_qty_kW <- full_join(calc_SGU_PV_qty, calc_SGU_PV_kW, by = "Postcode")
# now get the current CER SGU csv
# rename Postcode column for joining & just keep totals columns
data_SGU = read.csv(paste(CER_data_folder,"Postcode data for small-scale installations - SGU-Solar.csv",sep = ""))
# rename the Postcode column
# better to use rename rather than relocate
data_SGU <- rename(data_SGU, Postcode = Small.Unit.Installation.Postcode)
# now just retain the columns we want
data_SGU <- data_SGU[,names(data_SGU) %in% c("Postcode",
"Installations.Quantity.Total",
"SGU.Rated.Output.In.kW.Total")]
# make the Postcode column - character
# ====== make as integer instead ================= ?????
data_SGU$Postcode <- as.integer(data_SGU$Postcode)
# mtcars %<>% mutate(qsec = as.integer(qsec))
# data_SGU$Postcode <- as.character(data_SGU$Postcode)
# make the columns numeric
# but make sure we strip out commas first !
data_SGU$Installations.Quantity.Total <- as.numeric(gsub(",","",
data_SGU$Installations.Quantity.Total))
data_SGU$SGU.Rated.Output.In.kW.Total <- as.numeric(gsub(",","",
data_SGU$SGU.Rated.Output.In.kW.Total))
# now join all
compare_SGU_data <- full_join(calc_SGU_PV_qty_kW, data_SGU, by = "Postcode")
# may have to clear out N/A ????????
# save the test results
print("save test results")
## [1] "save test results"
saveRDS(compare_SGU_data,file = paste("Test : ",Sys.time(),".rds",sep = ""))
# and look for differences
compare_SGU_data$qty_check <- compare_SGU_data$Calc_PV_qty -
compare_SGU_data$Installations.Quantity.Total
compare_SGU_data$kW_check <- compare_SGU_data$Calc_PV_kW -
compare_SGU_data$SGU.Rated.Output.In.kW.Total
# can also check by viewing the dataframe & sorting compare columns
print("====== Results of checks at postcode level =====")
## [1] "====== Results of checks at postcode level ====="
print(sprintf("Sum of SGU qty checks : %f",sum(compare_SGU_data$qty_check,na.rm=TRUE)))
## [1] "Sum of SGU qty checks : -98.000000"
print(sprintf("Sum of SGU kW checks : %f",sum(compare_SGU_data$kW_check,na.rm=TRUE)))
## [1] "Sum of SGU kW checks : -471.728000"
#sink()
# ============== E N D O F S Y S T E M T E S T ============