Processing Monthly CER data

Last updated - 22 July 2022

This needs refinement to publish as RMarkdown doc

Will also need to include instructions & explanation

code has been taken from CER_tidydata.R

Function to process CER data types

27 June 2022 - SGU-Solar csv file - extra spaces for dates Feb, Mar, Apr 2022

Extra spaces have crept back in for Feb, Mar & Apr 2022 Manually deleted extra spaces in csv Notified CER - Bryant, Darren

07 June 2022 - SGU-Solar csv file - no extra spaces for dates !

Can use csv file without modification of headers :-) But more filename changes ! in test_CER_SWHASHP <- function() note changed mask from “SWH-Air source heat pump” to “SWH-Air-source-heat-pump”- 7 June 2022

09 May 2022 - check state of latest SGU-Solar csv file

Checked csv file and header for Feb 22 and Mar 22 <Feb 2022 - SGU Rated Output In kW> Note the TWO spaces between Feb & 2022, also the same for Mar & 22

will cause the text to date function to miss Feb-22 & Mar-22 Temparary fix is to remove extra space for Feb & Mar 22 from the csv file Used TextEdit

Might have to adjust the parsing code :-( Will also contact them again !

29 Mar 2022 - check processing of PV_kW data for Feb 2022

Local_Generation.Rmd is showing 0 kW installed for Feb 2022

Checked csv file and header for <Feb 2022 - SGU Rated Output In kW> Note the extra space between Feb & 2022, which I guess will cause the text to date function to miss that this is Feb-22 Temparary fix is to remove this space from the csv file

Run with corrected header data for Feb-22 and now shows non-zero kW for Feb-22

Will also provide feedback to CER

29 Mar 2022 - CER have provided 2020 historical data

Refer to next heading para….. CER have now provided the historical data for 2020 So now use the CER provided csv files like <Postcode data for small-scale installations 2020 - SGU-Solar.csv> But have kept the renamed files - now have extension “.temporary”

Ran test - 2020 data not read Checked filenames <Postcode data for small-scale installations 2020 - SGU - Solar.csv> so they have inserted spaces in SGU-Solar Simple hack to get it working…. Remove spaces from “2020 - SGU - Solar” to be “2020 - SGU-Solar” And if they persist with different format, then change the code :-)

Also they have changed filenames for ASHP from <Postcode data for small-scale installations - SWH-Air source heat pump.csv> to <Postcode data for small-scale installations 2020 - SWH - ASHP.csv> Which is wierd, cos the “current” datafile still has unabbreviated filename

So temporary solution is to rename each of the 2020 historical filenames to match the older historical filename convention

Also noticed CER have changed filename for ASHP “current” This changed late last year was <Postcode data for small-scale installations 2019 - SWH-Air-source-heat-pump.csv> now <Postcode data for small-scale installations - SWH-Air source heat pump.csv> so note “-” is now ” ” (space) Temporary fix is to put “-” back in filename for “current”

Problem to solve with CER data with cutover to include 2022 data

The “current” csv files contain - cumulative data for previous years - monthly data for the current year and the previous year So as at March 2022, data is up to date to Jan 2022 - so cumulative data up to end 2020 - monthly data Jan 2021 to Jan 2022 Calculations assume that historical data is available for previous years, ie not included in current csv file So there should be a file <Postcode data for small-scale installations 2020 - SWH-Solar.csv> which would contain monthly data for Jan 2019 to Dec 2020 As at Mar 15 2022, CER have not published this So the calculation algorithm will be excluding Jan 2020 to Dec 2020 data

Process at the end of a year, when Jan data is made available - get new historical data file for year now “bumped” and add it to folder

Possible solutions: 1. Wait until CER provide the historical data file 2. Hack the algorithm - rename the “current” csv that had data to end Dec 2021 to include 2020 in its filename - the algorithm will just get column data for the filename year - try this and check

Ran test - now PV kW & PV qty look to be ok - now do same hack for SWH & ASHPSWH

Feedback provided to CER

From Feedback form on this page http://www.cleanenergyregulator.gov.au/RET/Forms-and-resources/Postcode-data-for-small-scale-installations/historical-postcode-data-for-small-scale-installations Done as 3 feedbacks as feedback textbox is limited in number of chars allowed Submitted 30-Mar-2022

Request to keep historical filenames and columns headers consistent. I use the current and historical CER csv files to track progress of renewable installations over time using programs written in the R language. Really appreciate having access to the data ! The programs depend on the filename conventions and csv header conventions remaining consistent.
I source the data from http://www.cleanenergyregulator.gov.au/RET/Forms-and-resources/Postcode-data-for-small-scale-installations

  1. Consistent csv headers

For the data as at 24 March 2022, in the file “Postcode data for small-scale installations - SGU-Solar.csv” the header <Feb 2022 - SGU Rated Output In kW> has 2 space characters between and <2022> which required adjustment to my algorithm. Request is to keep the same format as

  1. Consistent filenames - SGU-Solar

The filename <Postcode data for small-scale installations 2020 - SGU - Solar.csv> is inconsistent from all previous historical files, eg <Postcode data for small-scale installations 2019 - SGU-Solar.csv> Note that “SGU-Solar” has changed to “SGU - Solar” Request is to keep a consistent format

  1. Consistent filenames - SWH - Air Source Heat Pump

The latest historical filename is <Postcode data for small-scale installations 2020 - SWH - ASHP.csv> This differs from all previous historical filenames, e.g. <Postcode data for small-scale installations 2019 - SWH-Air-source-heat-pump.csv> Also the “current” SWH-ASHP csv filename convention is different to the historical data filenames from 2019 and earlier, ie <Postcode data for small-scale installations - SWH-Air source heat pump.csv> versus <Postcode data for small-scale installations 2019 - SWH-Air-source-heat-pump.csv> Note that the dash “-” has become a space “ “ Request is to keep the file naming conventions consistent

# ================================== F U N C T I O N =================
process_CER_raw_data <- function(CER_type, CER_value) {
  print("process_CER_raw_data")
  column_pattern_mask = "No column pattern"
  write_filename_mask = "No write filename"
if (CER_type =="SGU_Solar") {
  file_pattern = "SGU-Solar.csv"
  if (CER_value == "PV_qty") {
    column_pattern_mask = "...Installations.Quantity"
    write_filename_mask = "CER_PVqty"
  }
  if (CER_value == "PV_kW") {
    column_pattern_mask = "...SGU.Rated.Output.In.kW"
    write_filename_mask = "CER_PVkW"
  }
}
if (CER_type =="SWH_Solar") {
  file_pattern = "SWH-Solar.csv"
  column_pattern_mask = "...Installations.Quantity"
  write_filename_mask = "CER_SWHqty"
}
# EXCEPT - CER has changed the file mask for ASHP- current csv doesn't use some dashes
# so create a special filelist for ASHP files
  if (CER_type =="SWH_ASHP") {
    file_pattern = "SWH-Air-source-heat-pump.csv"
    column_pattern_mask = "...Installations.Quantity"
    write_filename_mask = "CER_SWHASHPqty" 
    file_list <- append(
      as.list(list.files(path=CER_data_folder,pattern = "SWH-Air source heat pump.csv")),
      as.list(list.files(path=CER_data_folder,pattern = file_pattern))
    )
  }
  else {
    # create list of files matching the pattern
    file_list <- as.list(list.files(path=CER_data_folder,pattern = file_pattern))  
  }
# This will get patterns such as "SGU-Solar.csv"
#  print(file_list)
  for (filename in file_list) {
    print(sprintf("Processing %s", filename))
    #  check if filename has "yyyy" in it - no year = current
    filename_year <- str_match(filename,"[0-9]+")
    if(!is.na(filename_year))  {
#     makes write_filename like "CER_PVqty_",filename_year,".rds"
      write_filename = paste(write_filename_mask,"_",filename_year,".rds",sep = "")
    } else {
      write_filename = paste(write_filename_mask,".rds",sep = "")
    }
    data <- read.csv(paste(CER_data_folder,filename,sep = ""))
# as of CER data supplied for end Sep 2021, non-column data was
# added to start of csv, so changed csv read to fread from data.table package
# to handdle skip of lines until column row   
#    data <- fread(paste(CER_data_folder,filename,sep = ""),
#            skip = "Small Unit Installation Postcode",      
#            check.names = TRUE,data.table = FALSE)
    print(sprintf("Number of rows = %f", nrow(data)))
    # now make all columns numeric, except first one (postcode)
    all_columns <- colnames(data)
    all_columns_except_first <- all_columns[-1]
    # ========================
    # Removing commas from selected columns
    # Prior to setting to numeric, otherwise will get NA when casting
    # https://statisticsglobe.com/modify-numbers-with-comma-as-thousand-separator-in-r
    data[ , all_columns_except_first] <- 
      lapply(data[ , all_columns_except_first],
      function(x){ as.numeric(as.character(gsub(",", "", x))) })
    # now cast to numeric
    data[all_columns_except_first] <- sapply(data[all_columns_except_first],as.numeric)
# rename columns now, so can more easily remove duplicate rows (min value)
    data <- rename(data, Postcode = 1,
                 Previous = 2)      
  
    # check if any duplicate postcodes
    my_duplicated <- duplicated(data[,1])
    sum_my_duplicates = sum(my_duplicated)
    print(sprintf("Duplicate Postcode         = %f", sum_my_duplicates))
        # make column one (postcode) to character
    # ================ try as integer instead ===================
#    data[1] <- sapply(data[1],as.character)
    # do this later just before saving......
    if(sum_my_duplicates > 0) {
        my_duplicated_table <- data[my_duplicated,]
        # https://www.biostars.org/p/304213/
        # remove duplicates - keep highest value of a column
        data_before_remove <- data
        data <- data[order(data$Postcode, -abs(data$Previous) ), ] ### sort first
        data <- data[ !duplicated(data$Postcode), ]  ### Keep highest
#        browser(text="Found duplicates")
    }

# ====== This will be the place to remove redundant space between Month(Mmm) and Year(yy)
# ====== Since Feb 2022, for SGU Solar, and extra space has been inserted
# ====== so remove it, so that the column name in the csv file will match
# ====== the column_pattern_mask
# ======     N O T E       T O   DO   ========
# maybe use something like str_squish   ????  see https://statisticsglobe.com/replace-multiple-spaces-by-single-space-in-r
# ====== OR maybe this article is better
# ===== https://www.codingprof.com/5-easy-ways-to-replace-blanks-in-column-names-in-r-examples/
# ===== suggest try it out by just reading in an offending csv file.....    
    
        
    if(is.na(filename_year)) {
      # current data doesn't have yyyy in filename
      # column_pattern will be like  "[a-zA-Z]+.[0-9]+...Installations.Quantity"
      column_pattern = paste("[a-zA-Z]+.[0-9]+",column_pattern_mask, sep = "")
    }
    else {
      # just interested in getting column data for the year of the filename
      # as historical files have data for 2 year spread
      column_pattern <- paste("[a-zA-Z]+.",filename_year,column_pattern_mask, sep = "")
    }
    # get a vector just containing column names we are interested in
    column_list_vector <- as.vector(
      na.omit(
        str_match(colnames(data),column_pattern)))
    # for now, values_to generic "CER_value_col"
    # and then rename the column just before saving
    # Note that pivot_longer is preferred over gather - https://tidyr.tidyverse.org/reference/pivot_longer.html
    data <- pivot_longer(data,
                         cols = all_of(column_list_vector),
                         names_to = "year_month_string",
                         values_to = "CER_value_col")
    # now add "dd=01" to the year_month_string so we can force it to as.Date
    # then add 1 month
    # and subtract a day so we get end-of-month date
    # as that is how the CER data is reported
    # see https://statisticsglobe.com/add-subtract-months-years-from-date-in-r
    data <- mutate(data,
                   year_month =
                     (as.Date(
                       paste0(sub("^([^.]*.[^.]*).*", "\\1",
                                  year_month_string),".01"),
                       format = "%B.%Y.%d" )
                      %m+% months(1))
                   %m-% days(1)
    )
    # rename the postcode column
    # already done above
#    data <- rename(data, Postcode = Small.Unit.Installation.Postcode)
    # and make Postcode column integer
    data <- data %>% mutate(Postcode = as.integer((Postcode)))  

    # swhashp_totals <- data.frame(year = numeric(),          
    #                              year_total = numeric(),
    #                              total_qty = numeric() )       
    if(CER_value == "SWHASHP_qty") {
      # before we remove columns, get the sum of last column & save it
#      browser(text="about to SWHASHP adding rows for years")
      swhashp_totals <- swhashp_totals %>% add_row(year=as.integer(filename_year),
              year_total = sum(data$CER_value_col),
              total_qty = sum(data$Installations.Quantity.Total))
#      browser(text="SWHASHP adding rows for years")
    }
    # now just retain the columns we want
    data <- data[,names(data) %in% c("Postcode",
                                     "year_month","CER_value_col")]
    # rename the CER_value_col
    if(CER_value == "PV_qty") {
      # rename the CER_value_col column
      data <- rename(data, PV_qty = CER_value_col)
      # get the columns in the right order
      data <- relocate(data, PV_qty, .after = year_month)
    }
    if(CER_value == "PV_kW") {
      # rename the CER_value_col column
      data <- rename(data, PV_kW = CER_value_col)
      # get the columns in the right order
      data <- relocate(data, PV_kW, .after = year_month)
    }
    if(CER_value == "SWH_qty") {
      # rename the CER_value_col column
      data <- rename(data, SWH_qty = CER_value_col)
      # get the columns in the right order
      data <- relocate(data, SWH_qty, .after = year_month)
    }
    if(CER_value == "SWHASHP_qty") {
      # rename the CER_value_col column
      data <- rename(data, SWHASHP_qty = CER_value_col)
      # get the columns in the right order
      data <- relocate(data, SWHASHP_qty, .after = year_month)
    }
# Sort by postcode, then year_month
    data <- data[
      with(data, order(Postcode, year_month)),
    ]
#    View(data)
    # save the tidy data format as .rds
    print(sprintf("Saving %s",write_filename)) 
    saveRDS(data, paste(tmp_folder,write_filename,sep = ""))
 }
# E N D   O F   L O O P  - should now have rds files for each year + current
# Now create a consolidated file of all years data for eg PVqty, PVkW, etc
# Read all the CER_PVqty/kW, etc rds files into a dataframe
# should give us a list matching patterns like "^CER_PVqty.*\\.rds$"
  setwd(tmp_folder)
  tidy_file_list <- as.list(list.files(pattern= paste("^",write_filename_mask,".*\\.rds$", sep = "")))
  # bind all the matching files into a data frame
  my_tidy_data <- do.call(rbind, lapply(tidy_file_list, readRDS))
  # now set working directory back to base
  setwd(base_folder)
  # Sort by postcode, then year_month
  my_tidy_data <- my_tidy_data[
    with(my_tidy_data, order(Postcode, year_month)),
  ]
  # save this to a file like CER_PVqty_all_years.rds in tmp folder
  print(sprintf("Writing %s", paste(write_filename_mask,"_all_years.rds", sep = "")))
  saveRDS(my_tidy_data, file = paste(tmp_folder,write_filename_mask,"_all_years.rds", sep = ""))
return(paste("====> exiting function: ",write_filename_mask))
}


# ================================== F U N C T I O N =================

Test functions

# ================================== F U N C T I O N =================
test_CER_PVkW <- function() {

# Now do a test  
# Try doing test for CER_PVkW_all_years.rds
# and check totals against csv file in CER_data
# Then repeat for other data - PVqty, SWH, SWHASHP
print("========== S Y S T E M    T E S T - PVkW =========")
print("readRDS CER_PVkW_all_years.rds")
read_data <- readRDS(file=paste(tmp_folder,"CER_PVkW_all_years.rds",sep = ""))
print("Calculate totals")
calc_PV_kW_total <- sum(read_data$PV_kW,na.rm=TRUE)
print("read current - Postcode data for small-scale installations - SGU-Solar.csv")
print("and extract totals")
data_SGU = read.csv(paste(CER_data_folder,"Postcode data for small-scale installations - SGU-Solar.csv",sep = ""))
# Removing commas from selected columns
# Prior to setting to numeric
# https://statisticsglobe.com/modify-numbers-with-comma-as-thousand-separator-in-r
col_conv <- setdiff(names(data_SGU),"Small.Unit.Installation.Postcode")
data_SGU[ , col_conv] <- lapply(data_SGU[ , col_conv],  # Convert data
                                function(x){ as.numeric(as.character(gsub(",", "", x))) })
# set the last 2 columns as numeric
data_SGU$SGU.Rated.Output.In.kW.Total <- as.numeric(as.character(data_SGU$SGU.Rated.Output.In.kW.Total))

raw_PV_kW_total <- sum(data_SGU$SGU.Rated.Output.In.kW.Total,na.rm=TRUE)
print("===== Totals =====")
print(sprintf("Calc PV kW         = %f", calc_PV_kW_total))
print(sprintf("Raw PV kW          = %f", raw_PV_kW_total))
print(sprintf("Difference         = %f", calc_PV_kW_total - raw_PV_kW_total))
print(sprintf("Difference percent = %f", 100*(calc_PV_kW_total-raw_PV_kW_total)/raw_PV_kW_total))
}

# ================================== F U N C T I O N =================
test_CER_PVqty <- function() {
  
  # Now do a test  
  # Try doing test for CER_PVqty_all_years.rds
  # and check totals against csv file in CER_data
  # Then repeat for other data - PVqty, SWH, SWHASHP
  print("========== S Y S T E M    T E S T - PVqty =========")
  print("readRDS CER_PVqty_all_years.rds")
  read_data <- readRDS(file=paste(tmp_folder,"CER_PVqty_all_years.rds",sep = ""))
  print("Calculate totals")
  calc_PV_qty_total <- sum(read_data$PV_qty,na.rm=TRUE)
  
  print("read current - Postcode data for small-scale installations - SGU-Solar.csv")
  print("and extract totals")
  data_SGU = read.csv(paste(CER_data_folder,"Postcode data for small-scale installations - SGU-Solar.csv",sep = ""))
  # Removing commas from selected columns
  # Prior to setting to numeric
  # https://statisticsglobe.com/modify-numbers-with-comma-as-thousand-separator-in-r
  col_conv <- setdiff(names(data_SGU),"Small.Unit.Installation.Postcode")
  data_SGU[ , col_conv] <- lapply(data_SGU[ , col_conv],  # Convert data
                                  function(x){ as.numeric(as.character(gsub(",", "", x))) })
  # set the last 2 columns as numeric
  data_SGU$Installations.Quantity.Total <- as.numeric(as.character(data_SGU$Installations.Quantity.Total))
#  data_SGU$SGU.Rated.Output.In.kW.Total <- as.numeric(as.character(data_SGU$SGU.Rated.Output.In.kW.Total))
  
  raw_PV_qty_total <- sum(data_SGU$Installations.Quantity.Total,na.rm=TRUE)
#  raw_PV_kW_total <- sum(data_SGU$SGU.Rated.Output.In.kW.Total,na.rm=TRUE)
  print("===== Totals =====")
  print(sprintf("Calc PV total qty = %f", calc_PV_qty_total))
  print(sprintf("Raw PV total qty  = %f", raw_PV_qty_total))
#  print(sprintf("Calc PV kW        = %f", calc_PV_kW_total))
#  print(sprintf("Raw PV kW         = %f", raw_PV_kW_total))
  
  print(sprintf("Difference         = %f", calc_PV_qty_total - raw_PV_qty_total))
  print(sprintf("Difference percent = %f", 100*(calc_PV_qty_total-raw_PV_qty_total)/raw_PV_qty_total))
}

# ================================== F U N C T I O N =================
test_CER_SWH <- function() {
  
  # Now do a test  
  # Try doing test for CER_SWH_all_years.rds
  # and check totals against csv file in CER_data
  # Then repeat for other data - PVqty, SWH, SWHASHP
  print("========== S Y S T E M    T E S T - SWH =========")
  print("readRDS CER_SWHqty_all_years.rds")
  read_data <- readRDS(file=paste(tmp_folder,"CER_SWHqty_all_years.rds",sep = ""))
  print("Calculate totals")
  calc_SWH_qty_total <- sum(read_data$SWH_qty,na.rm=TRUE)
  
  print("read current - Postcode data for small-scale installations - SWH-Solar.csv")
  print("and extract totals")
  data_SGU = read.csv(paste(CER_data_folder,"Postcode data for small-scale installations - SWH-Solar.csv",sep = ""))
  # Removing commas from selected columns
  # Prior to setting to numeric
  # https://statisticsglobe.com/modify-numbers-with-comma-as-thousand-separator-in-r
  col_conv <- setdiff(names(data_SGU),"Small.Unit.Installation.Postcode")
  data_SGU[ , col_conv] <- lapply(data_SGU[ , col_conv],  # Convert data
                                  function(x){ as.numeric(as.character(gsub(",", "", x))) })
  # set the last column as numeric
  data_SGU$Installations.Quantity.Total <- as.numeric(as.character(data_SGU$Installations.Quantity.Total))
  #  data_SGU$SGU.Rated.Output.In.kW.Total <- as.numeric(as.character(data_SGU$SGU.Rated.Output.In.kW.Total))
  
  raw_SWH_qty_total <- sum(data_SGU$Installations.Quantity.Total,na.rm=TRUE)
  #  raw_PV_kW_total <- sum(data_SGU$SGU.Rated.Output.In.kW.Total,na.rm=TRUE)
  print("===== Totals =====")
  print(sprintf("Calc SWH total qty = %f", calc_SWH_qty_total))
  print(sprintf("Raw SWH total qty  = %f", raw_SWH_qty_total))

  print(sprintf("Difference         = %f", calc_SWH_qty_total - raw_SWH_qty_total))
  print(sprintf("Difference percent = %f", 100*(calc_SWH_qty_total-raw_SWH_qty_total)/raw_SWH_qty_total))
}

# ================================== F U N C T I O N =================
test_CER_SWHASHP <- function() {
  
  # Now do a test  
  # Try doing test for CER_SWH_all_years.rds
  # and check totals against csv file in CER_data
  # Then repeat for other data - PVqty, SWH, SWHASHP
  print("========== S Y S T E M    T E S T - SWHASHP =========")
  print("readRDS CER_SWHASHPqty_all_years.rds")
  read_data <- readRDS(file=paste(tmp_folder,"CER_SWHASHPqty_all_years.rds",sep = ""))
  print("Calculate totals")
  calc_SWHASHP_qty_total <- sum(read_data$SWHASHP_qty,na.rm=TRUE)
  
# ======= note changed mask from "SWH-Air source heat pump" 
# ======= to "SWH-Air-source-heat-pump"- 7 June 2022
  print("read current - Postcode data for small-scale installations - SWH-Air-source-heat-pump.csv")
  print("and extract totals")
  data_SGU = read.csv(paste(CER_data_folder,"Postcode data for small-scale installations - SWH-Air-source-heat-pump.csv",sep = ""))
  # Removing commas from selected columns
  # Prior to setting to numeric
  # https://statisticsglobe.com/modify-numbers-with-comma-as-thousand-separator-in-r
  col_conv <- setdiff(names(data_SGU),"Small.Unit.Installation.Postcode")
  data_SGU[ , col_conv] <- lapply(data_SGU[ , col_conv],  # Convert data
                                  function(x){ as.numeric(as.character(gsub(",", "", x))) })
  # set the last column as numeric
  data_SGU$Installations.Quantity.Total <- as.numeric(as.character(data_SGU$Installations.Quantity.Total))
  #  data_SGU$SGU.Rated.Output.In.kW.Total <- as.numeric(as.character(data_SGU$SGU.Rated.Output.In.kW.Total))
  
  raw_SWHASHP_qty_total <- sum(data_SGU$Installations.Quantity.Total,na.rm=TRUE)
  #  raw_PV_kW_total <- sum(data_SGU$SGU.Rated.Output.In.kW.Total,na.rm=TRUE)
  print("===== Totals =====")
  print(sprintf("Calc SWHASHP total qty = %f", calc_SWHASHP_qty_total))
  print(sprintf("Raw SWHASHP total qty  = %f", raw_SWHASHP_qty_total))
  
  print(sprintf("Difference         = %f", calc_SWHASHP_qty_total - raw_SWHASHP_qty_total))
  print(sprintf("Difference percent = %f", 100*(calc_SWHASHP_qty_total-raw_SWHASHP_qty_total)/raw_SWHASHP_qty_total))

#  browser(text="End of test_CER_SWHASHP function")  
  
  }

Main processing

# ================ T H I S    I S   T H E    R E A L   P R O C E S S =============
# define some global variables for folders
base_folder <- getwd()
CER_data_folder <- paste(base_folder,"/CER_data/",sep = "")
tmp_folder <- paste(base_folder,"/tmp/",sep = "")
output_folder <- paste(base_folder,"/Output/",sep = "")
# test_folder <- paste(getwd(),"/test/",sep = "")
run_time <- sprintf("%s",Sys.time())
# Specify empty vectors in data.frame
swhashp_totals <- data.frame(year = numeric(),          
                     year_total = numeric(),
                     total_qty = numeric() )

# = clear out RDS files in tmp folder
# options
# https://stackoverflow.com/questions/9296377/automatically-delete-files-folders
# do.call(file.remove, list(list.files("C:/Temp", full.names = TRUE)))

#list.files(tmp_folder)
do.call(file.remove, list(list.files(tmp_folder, full.names = TRUE)))
##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [76] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#list.files(tmp_folder)

#sink(sprintf("CER tidydata:%s.log",run_time))
print("CER tidydata Log")
## [1] "CER tidydata Log"
print("================")
## [1] "================"
print(sprintf("Run on %s",run_time))
## [1] "Run on 2022-07-22 12:26:32"
print("Processing SGU_Solar - PV_kW.......")
## [1] "Processing SGU_Solar - PV_kW......."
process_CER_raw_data("SGU_Solar", "PV_kW")
## [1] "process_CER_raw_data"
## [1] "Processing Postcode data for small-scale installations - SGU-Solar.csv"
## [1] "Number of rows = 2805.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVkW.rds"
## [1] "Processing Postcode data for small-scale installations 2001 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVkW_2001.rds"
## [1] "Processing Postcode data for small-scale installations 2002 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVkW_2002.rds"
## [1] "Processing Postcode data for small-scale installations 2003 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVkW_2003.rds"
## [1] "Processing Postcode data for small-scale installations 2004 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVkW_2004.rds"
## [1] "Processing Postcode data for small-scale installations 2005 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVkW_2005.rds"
## [1] "Processing Postcode data for small-scale installations 2006 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVkW_2006.rds"
## [1] "Processing Postcode data for small-scale installations 2007 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVkW_2007.rds"
## [1] "Processing Postcode data for small-scale installations 2008 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVkW_2008.rds"
## [1] "Processing Postcode data for small-scale installations 2009 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVkW_2009.rds"
## [1] "Processing Postcode data for small-scale installations 2010 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVkW_2010.rds"
## [1] "Processing Postcode data for small-scale installations 2011 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVkW_2011.rds"
## [1] "Processing Postcode data for small-scale installations 2012 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVkW_2012.rds"
## [1] "Processing Postcode data for small-scale installations 2013 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVkW_2013.rds"
## [1] "Processing Postcode data for small-scale installations 2014 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVkW_2014.rds"
## [1] "Processing Postcode data for small-scale installations 2015 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVkW_2015.rds"
## [1] "Processing Postcode data for small-scale installations 2016 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVkW_2016.rds"
## [1] "Processing Postcode data for small-scale installations 2017 - SGU-Solar.csv"
## [1] "Number of rows = 2800.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVkW_2017.rds"
## [1] "Processing Postcode data for small-scale installations 2018 - SGU-Solar.csv"
## [1] "Number of rows = 2800.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVkW_2018.rds"
## [1] "Processing Postcode data for small-scale installations 2019 - SGU-Solar.csv"
## [1] "Number of rows = 2800.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVkW_2019.rds"
## [1] "Processing Postcode data for small-scale installations 2020 - SGU-Solar.csv"
## [1] "Number of rows = 2800.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVkW_2020.rds"
## [1] "Writing CER_PVkW_all_years.rds"
## [1] "====> exiting function:  CER_PVkW"
test_CER_PVkW()
## [1] "========== S Y S T E M    T E S T - PVkW ========="
## [1] "readRDS CER_PVkW_all_years.rds"
## [1] "Calculate totals"
## [1] "read current - Postcode data for small-scale installations - SGU-Solar.csv"
## [1] "and extract totals"
## [1] "===== Totals ====="
## [1] "Calc PV kW         = 17481939.378999"
## [1] "Raw PV kW          = 17482411.107000"
## [1] "Difference         = -471.728001"
## [1] "Difference percent = -0.002698"
print("Processing SGU_Solar - PV_qty.......")
## [1] "Processing SGU_Solar - PV_qty......."
process_CER_raw_data("SGU_Solar", "PV_qty")
## [1] "process_CER_raw_data"
## [1] "Processing Postcode data for small-scale installations - SGU-Solar.csv"
## [1] "Number of rows = 2805.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVqty.rds"
## [1] "Processing Postcode data for small-scale installations 2001 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVqty_2001.rds"
## [1] "Processing Postcode data for small-scale installations 2002 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVqty_2002.rds"
## [1] "Processing Postcode data for small-scale installations 2003 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVqty_2003.rds"
## [1] "Processing Postcode data for small-scale installations 2004 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVqty_2004.rds"
## [1] "Processing Postcode data for small-scale installations 2005 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVqty_2005.rds"
## [1] "Processing Postcode data for small-scale installations 2006 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVqty_2006.rds"
## [1] "Processing Postcode data for small-scale installations 2007 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVqty_2007.rds"
## [1] "Processing Postcode data for small-scale installations 2008 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVqty_2008.rds"
## [1] "Processing Postcode data for small-scale installations 2009 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVqty_2009.rds"
## [1] "Processing Postcode data for small-scale installations 2010 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVqty_2010.rds"
## [1] "Processing Postcode data for small-scale installations 2011 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVqty_2011.rds"
## [1] "Processing Postcode data for small-scale installations 2012 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVqty_2012.rds"
## [1] "Processing Postcode data for small-scale installations 2013 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVqty_2013.rds"
## [1] "Processing Postcode data for small-scale installations 2014 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVqty_2014.rds"
## [1] "Processing Postcode data for small-scale installations 2015 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVqty_2015.rds"
## [1] "Processing Postcode data for small-scale installations 2016 - SGU-Solar.csv"
## [1] "Number of rows = 2795.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVqty_2016.rds"
## [1] "Processing Postcode data for small-scale installations 2017 - SGU-Solar.csv"
## [1] "Number of rows = 2800.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVqty_2017.rds"
## [1] "Processing Postcode data for small-scale installations 2018 - SGU-Solar.csv"
## [1] "Number of rows = 2800.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVqty_2018.rds"
## [1] "Processing Postcode data for small-scale installations 2019 - SGU-Solar.csv"
## [1] "Number of rows = 2800.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVqty_2019.rds"
## [1] "Processing Postcode data for small-scale installations 2020 - SGU-Solar.csv"
## [1] "Number of rows = 2800.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_PVqty_2020.rds"
## [1] "Writing CER_PVqty_all_years.rds"
## [1] "====> exiting function:  CER_PVqty"
test_CER_PVqty()
## [1] "========== S Y S T E M    T E S T - PVqty ========="
## [1] "readRDS CER_PVqty_all_years.rds"
## [1] "Calculate totals"
## [1] "read current - Postcode data for small-scale installations - SGU-Solar.csv"
## [1] "and extract totals"
## [1] "===== Totals ====="
## [1] "Calc PV total qty = 3191103.000000"
## [1] "Raw PV total qty  = 3191201.000000"
## [1] "Difference         = -98.000000"
## [1] "Difference percent = -0.003071"
print("Processing SWH_Solar - SWHqty.......")
## [1] "Processing SWH_Solar - SWHqty......."
process_CER_raw_data("SWH_Solar", "SWH_qty")
## [1] "process_CER_raw_data"
## [1] "Processing Postcode data for small-scale installations - SWH-Solar.csv"
## [1] "Number of rows = 2963.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHqty.rds"
## [1] "Processing Postcode data for small-scale installations 2001 - SWH-Solar.csv"
## [1] "Number of rows = 2952.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHqty_2001.rds"
## [1] "Processing Postcode data for small-scale installations 2002 - SWH-Solar.csv"
## [1] "Number of rows = 2952.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHqty_2002.rds"
## [1] "Processing Postcode data for small-scale installations 2003 - SWH-Solar.csv"
## [1] "Number of rows = 2952.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHqty_2003.rds"
## [1] "Processing Postcode data for small-scale installations 2004 - SWH-Solar.csv"
## [1] "Number of rows = 2952.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHqty_2004.rds"
## [1] "Processing Postcode data for small-scale installations 2005 - SWH-Solar.csv"
## [1] "Number of rows = 2952.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHqty_2005.rds"
## [1] "Processing Postcode data for small-scale installations 2006 - SWH-Solar.csv"
## [1] "Number of rows = 2952.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHqty_2006.rds"
## [1] "Processing Postcode data for small-scale installations 2007 - SWH-Solar.csv"
## [1] "Number of rows = 2952.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHqty_2007.rds"
## [1] "Processing Postcode data for small-scale installations 2008 - SWH-Solar.csv"
## [1] "Number of rows = 2952.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHqty_2008.rds"
## [1] "Processing Postcode data for small-scale installations 2009 - SWH-Solar.csv"
## [1] "Number of rows = 2952.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHqty_2009.rds"
## [1] "Processing Postcode data for small-scale installations 2010 - SWH-Solar.csv"
## [1] "Number of rows = 2952.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHqty_2010.rds"
## [1] "Processing Postcode data for small-scale installations 2011 - SWH-Solar.csv"
## [1] "Number of rows = 2952.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHqty_2011.rds"
## [1] "Processing Postcode data for small-scale installations 2012 - SWH-Solar.csv"
## [1] "Number of rows = 2952.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHqty_2012.rds"
## [1] "Processing Postcode data for small-scale installations 2013 - SWH-Solar.csv"
## [1] "Number of rows = 2952.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHqty_2013.rds"
## [1] "Processing Postcode data for small-scale installations 2014 - SWH-Solar.csv"
## [1] "Number of rows = 2952.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHqty_2014.rds"
## [1] "Processing Postcode data for small-scale installations 2015 - SWH-Solar.csv"
## [1] "Number of rows = 2952.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHqty_2015.rds"
## [1] "Processing Postcode data for small-scale installations 2016 - SWH-Solar.csv"
## [1] "Number of rows = 2952.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHqty_2016.rds"
## [1] "Processing Postcode data for small-scale installations 2017 - SWH-Solar.csv"
## [1] "Number of rows = 2963.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHqty_2017.rds"
## [1] "Processing Postcode data for small-scale installations 2018 - SWH-Solar.csv"
## [1] "Number of rows = 2963.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHqty_2018.rds"
## [1] "Processing Postcode data for small-scale installations 2019 - SWH-Solar.csv"
## [1] "Number of rows = 2963.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHqty_2019.rds"
## [1] "Processing Postcode data for small-scale installations 2020 - SWH-Solar.csv"
## [1] "Number of rows = 2962.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHqty_2020.rds"
## [1] "Writing CER_SWHqty_all_years.rds"
## [1] "====> exiting function:  CER_SWHqty"
test_CER_SWH()
## [1] "========== S Y S T E M    T E S T - SWH ========="
## [1] "readRDS CER_SWHqty_all_years.rds"
## [1] "Calculate totals"
## [1] "read current - Postcode data for small-scale installations - SWH-Solar.csv"
## [1] "and extract totals"
## [1] "===== Totals ====="
## [1] "Calc SWH total qty = 1047172.000000"
## [1] "Raw SWH total qty  = 1047169.000000"
## [1] "Difference         = 3.000000"
## [1] "Difference percent = 0.000286"
print("Processing SWH_ASHP - SWHASHPqty.......")
## [1] "Processing SWH_ASHP - SWHASHPqty......."
process_CER_raw_data("SWH_ASHP", "SWHASHP_qty")
## [1] "process_CER_raw_data"
## [1] "Processing Postcode data for small-scale installations - SWH-Air-source-heat-pump.csv"
## [1] "Number of rows = 2603.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHASHPqty.rds"
## [1] "Processing Postcode data for small-scale installations 2005 - SWH-Air-source-heat-pump.csv"
## [1] "Number of rows = 2548.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHASHPqty_2005.rds"
## [1] "Processing Postcode data for small-scale installations 2006 - SWH-Air-source-heat-pump.csv"
## [1] "Number of rows = 2548.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHASHPqty_2006.rds"
## [1] "Processing Postcode data for small-scale installations 2007 - SWH-Air-source-heat-pump.csv"
## [1] "Number of rows = 2548.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHASHPqty_2007.rds"
## [1] "Processing Postcode data for small-scale installations 2008 - SWH-Air-source-heat-pump.csv"
## [1] "Number of rows = 2548.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHASHPqty_2008.rds"
## [1] "Processing Postcode data for small-scale installations 2009 - SWH-Air-source-heat-pump.csv"
## [1] "Number of rows = 2548.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHASHPqty_2009.rds"
## [1] "Processing Postcode data for small-scale installations 2010 - SWH-Air-source-heat-pump.csv"
## [1] "Number of rows = 2548.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHASHPqty_2010.rds"
## [1] "Processing Postcode data for small-scale installations 2011 - SWH-Air-source-heat-pump.csv"
## [1] "Number of rows = 2548.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHASHPqty_2011.rds"
## [1] "Processing Postcode data for small-scale installations 2012 - SWH-Air-source-heat-pump.csv"
## [1] "Number of rows = 2548.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHASHPqty_2012.rds"
## [1] "Processing Postcode data for small-scale installations 2013 - SWH-Air-source-heat-pump.csv"
## [1] "Number of rows = 2548.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHASHPqty_2013.rds"
## [1] "Processing Postcode data for small-scale installations 2014 - SWH-Air-source-heat-pump.csv"
## [1] "Number of rows = 2548.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHASHPqty_2014.rds"
## [1] "Processing Postcode data for small-scale installations 2016 - SWH-Air-source-heat-pump.csv"
## [1] "Number of rows = 2548.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHASHPqty_2016.rds"
## [1] "Processing Postcode data for small-scale installations 2017 - SWH-Air-source-heat-pump.csv"
## [1] "Number of rows = 2583.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHASHPqty_2017.rds"
## [1] "Processing Postcode data for small-scale installations 2018 - SWH-Air-source-heat-pump.csv"
## [1] "Number of rows = 2583.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHASHPqty_2018.rds"
## [1] "Processing Postcode data for small-scale installations 2019 - SWH-Air-source-heat-pump.csv"
## [1] "Number of rows = 2583.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHASHPqty_2019.rds"
## [1] "Processing Postcode data for small-scale installations 2020 - SWH-Air-source-heat-pump.csv"
## [1] "Number of rows = 2583.000000"
## [1] "Duplicate Postcode         = 0.000000"
## [1] "Saving CER_SWHASHPqty_2020.rds"
## [1] "Writing CER_SWHASHPqty_all_years.rds"
## [1] "====> exiting function:  CER_SWHASHPqty"
test_CER_SWHASHP()
## [1] "========== S Y S T E M    T E S T - SWHASHP ========="
## [1] "readRDS CER_SWHASHPqty_all_years.rds"
## [1] "Calculate totals"
## [1] "read current - Postcode data for small-scale installations - SWH-Air-source-heat-pump.csv"
## [1] "and extract totals"
## [1] "===== Totals ====="
## [1] "Calc SWHASHP total qty = 383469.000000"
## [1] "Raw SWHASHP total qty  = 393400.000000"
## [1] "Difference         = -9931.000000"
## [1] "Difference percent = -2.524403"

Consolidate - separate chunk so can set directly

# now consolidate.......
# read in all_years for PVqty, PVkW, SWHqty & SWHASHPqty
# join & save as .rds
print("read in all_years for PVqty, PVkW, SWHqty & SWHASHPqty, join & save as .rds")
## [1] "read in all_years for PVqty, PVkW, SWHqty & SWHASHPqty, join & save as .rds"
# https://www.statology.org/join-multiple-data-frames-dplyr/
#setwd(tmp_folder)
data <- readRDS(file=paste0(tmp_folder,"CER_PVqty_all_years.rds")) %>%
    full_join(readRDS(file=paste0(tmp_folder,"CER_PVkW_all_years.rds")),by = c("Postcode","year_month")) %>%
    full_join(readRDS(file=paste0(tmp_folder,"CER_SWHqty_all_years.rds")), by = c("Postcode","year_month")) %>%
    full_join(readRDS(file=paste0(tmp_folder,"CER_SWHASHPqty_all_years.rds")), by = c("Postcode","year_month"))
#setwd(base_folder)  
# The joins may result in some numeric columns having NA values
# So convert all NA values in numeric columns to zero
# https://www.delftstack.com/howto/r/replace-na-with-0-in-r/
data <- mutate_if(data, is.numeric, ~replace(., is.na(.), 0))
# Sort by postcode, then year_month - just in case
data <- data[
  with(data, order(Postcode, year_month)),
]
# save joined data to a file - CER_PVqty_kW_all_years.rds
print("save joined data to a file - CER_PVqty_kW_SWH_SWHASHP_all_years.rds")
## [1] "save joined data to a file - CER_PVqty_kW_SWH_SWHASHP_all_years.rds"
# ============ >>>>> save in safe, accessible place.....

# check if file exists, and if so delete
rds_filename = paste(output_folder,"CER_PVqty_kW_SWH_SWHASHP_all_years.rds",sep = "")
if(file.exists(rds_filename)) {
  file.remove(rds_filename)
}
## [1] TRUE
saveRDS(data,file = rds_filename)
# and save a copy with a date/time in filename
saveRDS(data,file = paste(output_folder,"CER_PVqty_kW_SWH_SWHASHP_all_years - ",run_time ,".rds",sep = ""))

System Test

# ====================== S Y S T E M    T E S T =========  
# Test that it works ! 
# *************** Need to update for SWH & SWHASHP data ****************
print("====================== S Y S T E M    T E S T =========")
## [1] "====================== S Y S T E M    T E S T ========="
print("readRDS CER_PVqty_kW_SWH_SWHASHP_all_years.rds")
## [1] "readRDS CER_PVqty_kW_SWH_SWHASHP_all_years.rds"
read_data <- readRDS(file=paste(output_folder,"CER_PVqty_kW_SWH_SWHASHP_all_years.rds",sep = ""))
print("Calculate totals")
## [1] "Calculate totals"
calc_PV_qty_total <- sum(read_data$PV_qty,na.rm=TRUE)
calc_PV_kW_total <- sum(read_data$PV_kW,na.rm=TRUE)
calc_SWH_qty_total <- sum(read_data$SWH_qty,na.rm=TRUE)
calc_SWHASHP_qty_total <- sum(read_data$SWHASHP_qty,na.rm=TRUE)

how_many_duplicates <- sum(duplicated(read_data[,1:2]))


print("read current - Postcode data for small-scale installations - SGU-Solar.csv")
## [1] "read current - Postcode data for small-scale installations - SGU-Solar.csv"
print("and extract totals")
## [1] "and extract totals"
data_SGU = read.csv(paste(CER_data_folder,"Postcode data for small-scale installations - SGU-Solar.csv",sep = ""))
# Removing commas from selected columns
# Prior to setting to numeric
# https://statisticsglobe.com/modify-numbers-with-comma-as-thousand-separator-in-r
col_conv <- setdiff(names(data_SGU),"Small.Unit.Installation.Postcode")
data_SGU[ , col_conv] <- lapply(data_SGU[ , col_conv],  # Convert data
                                function(x){ as.numeric(as.character(gsub(",", "", x))) })
# set the last 2 columns as numeric
data_SGU$Installations.Quantity.Total <- as.numeric(as.character(data_SGU$Installations.Quantity.Total))
data_SGU$SGU.Rated.Output.In.kW.Total <- as.numeric(as.character(data_SGU$SGU.Rated.Output.In.kW.Total))

raw_PV_qty_total <- sum(data_SGU$Installations.Quantity.Total,na.rm=TRUE)
raw_PV_kW_total <- sum(data_SGU$SGU.Rated.Output.In.kW.Total,na.rm=TRUE)
print("===== Totals =====")
## [1] "===== Totals ====="
print(sprintf("Calc PV total qty = %f", calc_PV_qty_total))
## [1] "Calc PV total qty = 3191103.000000"
print(sprintf("Raw PV total qty  = %f", raw_PV_qty_total))
## [1] "Raw PV total qty  = 3191201.000000"
print(sprintf("Calc PV kW        = %f", calc_PV_kW_total))
## [1] "Calc PV kW        = 17481939.378999"
print(sprintf("Raw PV kW         = %f", raw_PV_kW_total))
## [1] "Raw PV kW         = 17482411.107000"
# Coolio ! - there's a very small discrepancy....





# aggregate the sum of qty & kW for each postcode in the consolidated .rds
print("aggregate the sum of qty & kW for each postcode in the consolidated .rds")
## [1] "aggregate the sum of qty & kW for each postcode in the consolidated .rds"
calc_SGU_PV_qty <- setNames(
          aggregate(x = read_data$PV_qty,                # Specify data column
          by = list(read_data$Postcode),      # Specify group indicator
          FUN = sum,                          # Specify function (i.e. sum)
          na.rm=TRUE),
          c("Postcode","Calc_PV_qty")
)
calc_SGU_PV_kW <- setNames(
  aggregate(x = read_data$PV_kW,                # Specify data column
            by = list(read_data$Postcode),      # Specify group indicator
            FUN = sum,                          # Specify function (i.e. sum)
            na.rm=TRUE),
  c("Postcode","Calc_PV_kW")
)
# and join them into a single table
calc_SGU_PV_qty_kW <- full_join(calc_SGU_PV_qty, calc_SGU_PV_kW, by = "Postcode")

# now get the current CER SGU csv
# rename Postcode column for joining & just keep totals columns
data_SGU = read.csv(paste(CER_data_folder,"Postcode data for small-scale installations - SGU-Solar.csv",sep = ""))
# rename the Postcode column
# better to use rename rather than relocate
data_SGU <- rename(data_SGU, Postcode = Small.Unit.Installation.Postcode)
# now just retain the columns we want
data_SGU <- data_SGU[,names(data_SGU) %in% c("Postcode",
                                 "Installations.Quantity.Total",
                                 "SGU.Rated.Output.In.kW.Total")]
# make the Postcode column - character
# ====== make as integer instead ================= ?????
data_SGU$Postcode <- as.integer(data_SGU$Postcode)

# mtcars %<>% mutate(qsec = as.integer(qsec))


# data_SGU$Postcode <- as.character(data_SGU$Postcode)
# make the columns numeric
# but make sure we strip out commas first !
data_SGU$Installations.Quantity.Total <- as.numeric(gsub(",","",
                data_SGU$Installations.Quantity.Total))
data_SGU$SGU.Rated.Output.In.kW.Total <- as.numeric(gsub(",","",
                data_SGU$SGU.Rated.Output.In.kW.Total))
# now join all
compare_SGU_data <- full_join(calc_SGU_PV_qty_kW, data_SGU, by = "Postcode")
# may have to clear out N/A ????????
# save the test results
print("save test results")
## [1] "save test results"
saveRDS(compare_SGU_data,file = paste("Test : ",Sys.time(),".rds",sep = ""))
# and look for differences
compare_SGU_data$qty_check <- compare_SGU_data$Calc_PV_qty -
                              compare_SGU_data$Installations.Quantity.Total
compare_SGU_data$kW_check <- compare_SGU_data$Calc_PV_kW -
  compare_SGU_data$SGU.Rated.Output.In.kW.Total
# can also check by viewing the dataframe & sorting compare columns
print("====== Results of checks at postcode level =====")
## [1] "====== Results of checks at postcode level ====="
print(sprintf("Sum of SGU qty checks : %f",sum(compare_SGU_data$qty_check,na.rm=TRUE)))
## [1] "Sum of SGU qty checks : -98.000000"
print(sprintf("Sum of SGU kW checks  : %f",sum(compare_SGU_data$kW_check,na.rm=TRUE)))
## [1] "Sum of SGU kW checks  : -471.728000"
#sink()
# ============== E N D   O F  S Y S T E M   T E S T ============