| Last modified: 12/08/2017 05:15 |
Solar energy is becoming an increasingly important component of the electric grid. From 2011 to 2016, total installed solar capacity in New York grew by more than 800%. New York energy policy requires that at least 50% of the energy consumed in the state come from renewable sources by 2030, up from 25% in 2017, which will further increase the amount of solar installed across the state.
The New York Power Authority (NYPA) works with its customers to install large-scale and distributed solar, and has been a leader in advanced clean energy projects for decades. NYPA installed one of the first rooftop solar projects in New York State on its headquarters in White Plains and currently monitors the energy production at 1-minute intervals.
Solar production is highly dependent on weather, in both the intensity of the solar radiation on the panels and the cloud cover. Significant events such as a full or partial solar eclipse will also significantly affect production from the panels. It is critical that system designers and operators understand the impact of weather on solar production, so that the growth of intermittent resources can be managed effectively to provide a reliable energy supply to all customers.
library(RCurl)
library(tidyr)
library(dplyr)
library(ggplot2)
library(DT)
library(knitr)
library(stringr)
Sunshine minutes in each hour
sunshine_url <- getURL("https://raw.githubusercontent.com/jillenergy/Solar-Weather/master/SunshineMinAug2017.csv")
sunshine_raw <- read.csv(text = sunshine_url)
head(sunshine_raw)
## Station Scenario IntervalStartDt IntervalEndDt Sunshine
## 1 White Plains ACTUAL 8/1/2017 0:00 8/1/2017 1:00 0
## 2 White Plains ACTUAL 8/1/2017 1:00 8/1/2017 2:00 0
## 3 White Plains ACTUAL 8/1/2017 2:00 8/1/2017 3:00 0
## 4 White Plains ACTUAL 8/1/2017 3:00 8/1/2017 4:00 0
## 5 White Plains ACTUAL 8/1/2017 4:00 8/1/2017 5:00 9
## 6 White Plains ACTUAL 8/1/2017 5:00 8/1/2017 6:00 60
Cloud cover minutes in each hour
cloud_url <- getURL("https://raw.githubusercontent.com/jillenergy/Solar-Weather/master/CloudCoverMinAug2017.csv")
cloud_raw <- read.csv(text = cloud_url)
head(cloud_raw)
## Station Scenario IntervalStartDt IntervalEndDt CloudCover
## 1 White Plains ACTUAL 08/01/2017 00:00 08/01/2017 01:00 0
## 2 White Plains ACTUAL 08/01/2017 01:00 08/01/2017 02:00 0
## 3 White Plains ACTUAL 08/01/2017 02:00 08/01/2017 03:00 0
## 4 White Plains ACTUAL 08/01/2017 03:00 08/01/2017 04:00 0
## 5 White Plains ACTUAL 08/01/2017 04:00 08/01/2017 05:00 0
## 6 White Plains ACTUAL 08/01/2017 05:00 08/01/2017 06:00 0
Solar production (kWh) in each minute from the two solar panels located at NYPA's White Plains office (WPO) headquarters
solar_raw <- read.csv("/Users/emiliembolduc/CUNY Data 607/Final Project/Data/WPOsolarPHA+PHB_08.01-31.2017.csv", header = TRUE, stringsAsFactors = FALSE)
head(solar_raw)
## Device Standard.Name Display.Name Units Timestamp
## 1 WPO Solar PHA em_ActEnergyDlvd Active Energy Delivered kWh 8/1/17 0:00
## 2 WPO Solar PHA em_ActEnergyDlvd Active Energy Delivered kWh 8/1/17 0:01
## 3 WPO Solar PHA em_ActEnergyDlvd Active Energy Delivered kWh 8/1/17 0:02
## 4 WPO Solar PHA em_ActEnergyDlvd Active Energy Delivered kWh 8/1/17 0:03
## 5 WPO Solar PHA em_ActEnergyDlvd Active Energy Delivered kWh 8/1/17 0:04
## 6 WPO Solar PHA em_ActEnergyDlvd Active Energy Delivered kWh 8/1/17 0:05
## Timezone Value
## 1 undefined 0
## 2 undefined 0
## 3 undefined 0
## 4 undefined 0
## 5 undefined 0
## 6 undefined 0
Separate the date and minute intervals into two columns, and create a data frame with the columns: PanelName / Date / Minutes / kWh
solar_df <- as.data.frame(solar_raw, stringsAsFactors = FALSE)
solar_df$Date <- sapply(strsplit(as.character(solar_df$Timestamp), " "), "[", 1)
solar_df$Minutes <- sapply(strsplit(as.character(solar_df$Timestamp), " "), "[", 2)
new_solar <- data.frame(solar_df$Device, solar_df$Date, solar_df$Minutes, solar_df$Value, stringsAsFactors = FALSE)
colnames(new_solar) <- c("PanelName", "Date", "Minutes", "kWh")
head(new_solar)
## PanelName Date Minutes kWh
## 1 WPO Solar PHA 8/1/17 0:00 0
## 2 WPO Solar PHA 8/1/17 0:01 0
## 3 WPO Solar PHA 8/1/17 0:02 0
## 4 WPO Solar PHA 8/1/17 0:03 0
## 5 WPO Solar PHA 8/1/17 0:04 0
## 6 WPO Solar PHA 8/1/17 0:05 0
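As an alternative to splitting the text on the space, the timestamp could be parsed once and then formatted into the two columns. The sketch below uses a few made-up timestamps in the same layout as the `Timestamp` column; the `tz = "UTC"` setting is an assumption, since the export reports the timezone as "undefined".

```r
# Hypothetical sample in the same "m/d/yy H:MM" layout as the Timestamp column
timestamps <- c("8/1/17 0:00", "8/1/17 0:01", "8/21/17 14:44")

# Parse once; tz = "UTC" is an assumption because the file's Timezone is "undefined"
parsed <- as.POSIXct(timestamps, format = "%m/%d/%y %H:%M", tz = "UTC")

parts <- data.frame(
  Date    = format(parsed, "%Y-%m-%d"),  # ISO 8601 date
  Minutes = format(parsed, "%H:%M"),     # zero-padded time of day
  stringsAsFactors = FALSE
)
parts
```

A side effect of this approach is that times come out zero-padded, which would put the "0:00" style of the sunshine file and the "00:00" style of the cloud cover file on the same footing before any matching.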
Use the tidyr ‘spread’ function to turn each PanelName value into its own column, so that the production of the two panels can then be combined row by row.
new_solar <- spread(new_solar, "PanelName", "kWh")
dplyr::tbl_df(new_solar)
## # A tibble: 44,640 x 4
## Date Minutes `WPO Solar PHA` `WPO Solar PHB`
## * <chr> <chr> <chr> <chr>
## 1 8/1/17 0:00 0 0
## 2 8/1/17 0:01 0 0
## 3 8/1/17 0:02 0 0
## 4 8/1/17 0:03 0 0
## 5 8/1/17 0:04 0 0
## 6 8/1/17 0:05 0 0
## 7 8/1/17 0:06 0 0
## 8 8/1/17 0:07 0 0
## 9 8/1/17 0:08 0 0
## 10 8/1/17 0:09 0 0
## # ... with 44,630 more rows
colnames(new_solar) <- c("Date", "Minutes", "PHA_kWh", "PHB_kWh")
head(new_solar)
## Date Minutes PHA_kWh PHB_kWh
## 1 8/1/17 0:00 0 0
## 2 8/1/17 0:01 0 0
## 3 8/1/17 0:02 0 0
## 4 8/1/17 0:03 0 0
## 5 8/1/17 0:04 0 0
## 6 8/1/17 0:05 0 0
sapply(new_solar, class)
## Date Minutes PHA_kWh PHB_kWh
## "character" "character" "character" "character"
Convert the kWh columns for the two solar panels from character to numeric so that they can be added together.
cols.num <- c("PHA_kWh","PHB_kWh")
new_solar[cols.num] <- sapply(new_solar[cols.num],as.numeric)
sapply(new_solar, class)
## Date Minutes PHA_kWh PHB_kWh
## "character" "character" "numeric" "numeric"
There were a couple of "null" entries that were transformed to NA values, which were a bit of a challenge to work with later.
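If the "null" strings are what caused the kWh columns to be read as character, one option would be to declare them as missing values when the file is read. The sketch below uses a small made-up CSV excerpt, not the actual NYPA file, and the `na.strings` argument of `read.csv`.

```r
# Hypothetical excerpt in which one reading was exported as the string "null"
csv_text <- "Timestamp,Value
8/1/17 6:00,0.01265
8/1/17 6:01,null
8/1/17 6:02,0.01301"

# Treat "null" (as well as the default "NA") as missing while reading
readings <- read.csv(text = csv_text, na.strings = c("NA", "null"),
                     stringsAsFactors = FALSE)

class(readings$Value)       # column type after reading
sum(is.na(readings$Value))  # number of missing readings
```

With the missing values declared up front, the column is numeric from the start and the later `as.numeric` conversion would not be needed.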
new_solar <- new_solar %>% mutate_if(is.numeric, round, digits = 5)
Reference: https://stackoverflow.com/questions/27613310/rounding-selected-columns-of-data-table-in-r
new_solar$SolarSum <- rowSums(new_solar[,3:4], na.rm = TRUE)
tail(new_solar)
## Date Minutes PHA_kWh PHB_kWh SolarSum
## 44635 8/9/17 9:54 0.03795 0.05112 0.08907
## 44636 8/9/17 9:55 0.03795 0.05112 0.08907
## 44637 8/9/17 9:56 0.05284 0.05835 0.11119
## 44638 8/9/17 9:57 0.05284 0.05835 0.11119
## 44639 8/9/17 9:58 0.05284 0.05835 0.11119
## 44640 8/9/17 9:59 0.05284 0.05835 0.11119
new_solar$Date <- format(as.Date(new_solar$Date, format = "%m/%d/%y"))
tail(new_solar)
## Date Minutes PHA_kWh PHB_kWh SolarSum
## 44635 2017-08-09 9:54 0.03795 0.05112 0.08907
## 44636 2017-08-09 9:55 0.03795 0.05112 0.08907
## 44637 2017-08-09 9:56 0.05284 0.05835 0.11119
## 44638 2017-08-09 9:57 0.05284 0.05835 0.11119
## 44639 2017-08-09 9:58 0.05284 0.05835 0.11119
## 44640 2017-08-09 9:59 0.05284 0.05835 0.11119
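The `tail()` output above ends on August 9 rather than August 31, most likely because the rows were ordered while the dates were still plain text. The sketch below, using a handful of made-up dates, illustrates why text in m/d/yy form sorts out of calendar order while ISO 8601 text does not.

```r
# Hypothetical dates in the original "m/d/yy" text format
dates_text <- c("8/9/17", "8/31/17", "8/10/17", "8/1/17")

# Text is compared character by character, so the 9th sorts after the 31st
sort(dates_text)

# ISO 8601 strings sort chronologically even as plain text
dates_iso <- format(as.Date(dates_text, format = "%m/%d/%y"))
sort(dates_iso)
```

Sorting on the converted `Date` column (for example with `order()`) would restore calendar order if it is needed for plotting.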
Separate the date and hour start times into two columns so that solar production can be matched to the weather, and create a data frame with the columns: Date / HourBegin / SunshineMinutes. Convert "Date" to the ISO 8601 standard date format.
sunshine_df <- as.data.frame(sunshine_raw)
sunshine_df$Date <- sapply(strsplit(as.character(sunshine_df$IntervalStartDt), " "), "[", 1)
sunshine_df$HourBegin <- sapply(strsplit(as.character(sunshine_df$IntervalStartDt), " "), "[", 2)
new_sunshine <- data.frame(sunshine_df$Date,sunshine_df$HourBegin,sunshine_df$Sunshine)
colnames(new_sunshine) <- c("Date", "HourBegin", "SunshineMinutes")
new_sunshine$Date <- format(as.Date(new_sunshine$Date, format = "%m/%d/%Y"))
head(new_sunshine)
## Date HourBegin SunshineMinutes
## 1 2017-08-01 0:00 0
## 2 2017-08-01 1:00 0
## 3 2017-08-01 2:00 0
## 4 2017-08-01 3:00 0
## 5 2017-08-01 4:00 9
## 6 2017-08-01 5:00 60
Separate the date and hour start times into two columns so that solar production can be matched to the weather, and create a data frame with the columns: Date / HourBegin / PercentCloudCover. Convert "Date" to the ISO 8601 standard date format.
cloud_df <- as.data.frame(cloud_raw)
cloud_df$Date <- sapply(strsplit(as.character(cloud_df$IntervalStartDt), " "), "[", 1)
cloud_df$HourBegin <- sapply(strsplit(as.character(cloud_df$IntervalStartDt), " "), "[", 2)
new_cloud <- data.frame(cloud_df$Date,cloud_df$HourBegin,cloud_df$CloudCover)
colnames(new_cloud) <- c("Date", "HourBegin", "PercentCloudCover")
new_cloud$Date <- format(as.Date(new_cloud$Date, format = "%m/%d/%Y"))
head(new_cloud)
## Date HourBegin PercentCloudCover
## 1 2017-08-01 00:00 0
## 2 2017-08-01 01:00 0
## 3 2017-08-01 02:00 0
## 4 2017-08-01 03:00 0
## 5 2017-08-01 04:00 0
## 6 2017-08-01 05:00 0
Aggregate the solar, sunshine and cloud cover data sets into one point for each day in the month to see whether sunshine and cloud cover are correlated with solar production.
onedate <- new_cloud[c(TRUE,rep(FALSE,23)), ]
head(onedate)
## Date HourBegin PercentCloudCover
## 1 2017-08-01 00:00 0
## 25 2017-08-02 00:00 0
## 49 2017-08-03 00:00 30
## 73 2017-08-04 00:00 30
## 97 2017-08-05 00:00 70
## 121 2017-08-06 00:00 0
SunshineDaily <- round(colMeans(matrix(new_sunshine$SunshineMinutes, nrow=24)), digits=0)
SunshineDaily_df <- data.frame(onedate$Date,SunshineDaily)
colnames(SunshineDaily_df) <- c("Date","SunshineMinutes")
SunshineDaily_df
## Date SunshineMinutes
## 1 2017-08-01 28
## 2 2017-08-02 23
## 3 2017-08-03 26
## 4 2017-08-04 18
## 5 2017-08-05 14
## 6 2017-08-06 26
## 7 2017-08-07 8
## 8 2017-08-08 14
## 9 2017-08-09 29
## 10 2017-08-10 29
## 11 2017-08-11 22
## 12 2017-08-12 19
## 13 2017-08-13 23
## 14 2017-08-14 22
## 15 2017-08-15 8
## 16 2017-08-16 22
## 17 2017-08-17 21
## 18 2017-08-18 4
## 19 2017-08-19 21
## 20 2017-08-20 25
## 21 2017-08-21 29
## 22 2017-08-22 24
## 23 2017-08-23 21
## 24 2017-08-24 26
## 25 2017-08-25 20
## 26 2017-08-26 25
## 27 2017-08-27 27
## 28 2017-08-28 23
## 29 2017-08-29 14
## 30 2017-08-30 19
## 31 2017-08-31 24
CloudCoverDaily <- round(colMeans(matrix(new_cloud$PercentCloudCover, nrow=24)), digits=0)
CloudCoverDaily_df <- data.frame(onedate$Date,CloudCoverDaily)
colnames(CloudCoverDaily_df) <- c("Date","PercentCloudCover")
CloudCoverDaily_df
## Date PercentCloudCover
## 1 2017-08-01 12
## 2 2017-08-02 25
## 3 2017-08-03 20
## 4 2017-08-04 43
## 5 2017-08-05 50
## 6 2017-08-06 14
## 7 2017-08-07 67
## 8 2017-08-08 55
## 9 2017-08-09 12
## 10 2017-08-10 9
## 11 2017-08-11 38
## 12 2017-08-12 55
## 13 2017-08-13 35
## 14 2017-08-14 25
## 15 2017-08-15 77
## 16 2017-08-16 39
## 17 2017-08-17 26
## 18 2017-08-18 75
## 19 2017-08-19 30
## 20 2017-08-20 19
## 21 2017-08-21 9
## 22 2017-08-22 27
## 23 2017-08-23 38
## 24 2017-08-24 18
## 25 2017-08-25 41
## 26 2017-08-26 14
## 27 2017-08-27 11
## 28 2017-08-28 20
## 29 2017-08-29 50
## 30 2017-08-30 43
## 31 2017-08-31 16
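The `matrix(..., nrow = 24)` approach used for the daily means relies on every day having exactly 24 hourly rows in order. A sketch of an alternative that groups on the `Date` column instead is shown below; the data are made up, with one hour deliberately missing on the second day.

```r
# Hypothetical hourly data: two days, with one hour missing on the second day
hourly <- data.frame(
  Date = c(rep("2017-08-01", 24), rep("2017-08-02", 23)),
  SunshineMinutes = c(rep(c(0, 60), each = 12), rep(20, 23)),
  stringsAsFactors = FALSE
)

# Group by the Date column instead of assuming 24 rows per day
daily <- aggregate(SunshineMinutes ~ Date, data = hourly, FUN = mean)
daily$SunshineMinutes <- round(daily$SunshineMinutes, digits = 0)
daily
```

Because the grouping key travels with the values, this version also removes the need for a separate `onedate` lookup to recover the dates.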
SolarDaily_df <- aggregate(new_solar$SolarSum, list(Day = new_solar$Date), sum, na.rm = TRUE)
colnames(SolarDaily_df) <- c("Date", "kWh Produced")
SolarDaily_df
## Date kWh Produced
## 1 2017-08-01 63.30620
## 2 2017-08-02 27.23490
## 3 2017-08-03 56.57205
## 4 2017-08-04 55.31905
## 5 2017-08-05 33.19475
## 6 2017-08-06 50.65345
## 7 2017-08-07 4.78915
## 8 2017-08-08 37.15740
## 9 2017-08-09 60.30955
## 10 2017-08-10 56.19985
## 11 2017-08-11 44.07650
## 12 2017-08-12 37.47890
## 13 2017-08-13 68.29645
## 14 2017-08-14 39.45370
## 15 2017-08-15 10.70985
## 16 2017-08-16 41.72928
## 17 2017-08-17 6.58500
## 18 2017-08-18 2.60136
## 19 2017-08-19 11.71333
## 20 2017-08-20 12.65800
## 21 2017-08-21 9.00394
## 22 2017-08-22 10.37621
## 23 2017-08-23 11.34888
## 24 2017-08-24 12.59796
## 25 2017-08-25 9.50188
## 26 2017-08-26 13.09008
## 27 2017-08-27 12.57221
## 28 2017-08-28 10.87120
## 29 2017-08-29 1.71491
## 30 2017-08-30 12.88457
## 31 2017-08-31 9.46295
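To check for the correlation mentioned earlier, the three daily tables can be placed side by side and passed to `cor()`. The sketch below re-enters the daily values printed above so that it runs on its own; in the actual workflow the three data frames could instead be joined with `merge(..., by = "Date")`. The name `daily` is only illustrative.

```r
# Daily values transcribed from the three tables above (August 1 to 31, 2017)
daily <- data.frame(
  Date = seq(as.Date("2017-08-01"), by = "day", length.out = 31),
  SunshineMinutes = c(28, 23, 26, 18, 14, 26, 8, 14, 29, 29, 22, 19, 23, 22, 8, 22,
                      21, 4, 21, 25, 29, 24, 21, 26, 20, 25, 27, 23, 14, 19, 24),
  PercentCloudCover = c(12, 25, 20, 43, 50, 14, 67, 55, 12, 9, 38, 55, 35, 25, 77, 39,
                        26, 75, 30, 19, 9, 27, 38, 18, 41, 14, 11, 20, 50, 43, 16),
  kWh = c(63.30620, 27.23490, 56.57205, 55.31905, 33.19475, 50.65345, 4.78915,
          37.15740, 60.30955, 56.19985, 44.07650, 37.47890, 68.29645, 39.45370,
          10.70985, 41.72928, 6.58500, 2.60136, 11.71333, 12.65800, 9.00394,
          10.37621, 11.34888, 12.59796, 9.50188, 13.09008, 12.57221, 10.87120,
          1.71491, 12.88457, 9.46295)
)

# Pearson correlation of daily production with each weather measure
cor(daily$kWh, daily$SunshineMinutes)
cor(daily$kWh, daily$PercentCloudCover)

# Full correlation matrix of the three numeric columns
cor(daily[, c("SunshineMinutes", "PercentCloudCover", "kWh")])
```

A scatter plot of `kWh` against `SunshineMinutes` would be a natural companion to these coefficients, since a single number can hide patterns that differ between parts of the month.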
We expected solar production from the two panels at NYPA's White Plains office to be highly dependent on weather, in both the intensity of the solar radiation on the panels and the cloud cover.
On August 21, the day of the partial solar eclipse, we did see a significant impact on production from the panels.
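One way to look at a single day such as August 21 in more detail would be to roll the minute-level readings up to hourly totals, which can then be plotted or compared with the hourly weather data. The sketch below uses made-up readings laid out like `new_solar`; the values and the names `minute_df` and `hourly_kwh` are hypothetical.

```r
# Hypothetical minute-level readings for two hours of one day, laid out like new_solar
minute_df <- data.frame(
  Date     = "2017-08-21",
  Minutes  = sprintf("%d:%02d", rep(12:13, each = 60), rep(0:59, times = 2)),
  SolarSum = c(rep(0.02, 60), rep(0.01, 60)),
  stringsAsFactors = FALSE
)

# Keep the part of "H:MM" before the colon as the hour of day
minute_df$Hour <- as.integer(sub(":.*$", "", minute_df$Minutes))

# Total kWh per hour for the chosen day
one_day <- minute_df[minute_df$Date == "2017-08-21", ]
hourly_kwh <- aggregate(SolarSum ~ Hour, data = one_day, FUN = sum)
hourly_kwh
```

The result could be drawn with, for example, `plot(hourly_kwh$Hour, hourly_kwh$SolarSum, type = "b")`, and the same steps repeated for a neighboring day to give a point of comparison.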
Understanding the impact of weather on solar production is therefore critical for system designers and operators, so that the growth of intermittent resources can be managed effectively to provide a reliable energy supply to all customers.