Last modified: 12/08/2017 05:15

Business Problem and Motivation for Project

Solar energy is becoming an increasingly important component of the electric grid. From 2011 to 2016, New York saw an increase in the total solar installed of more than 800%. New York energy policy requires at least 50% of the energy consumed in the state by 2030, up from the level of 25% in 2017, which will further increase the amount of solar installed across the state.

The New York Power Authority works with its customers to install large-scale and distributed solar, and has been a leader in advanced clean energy projects for decades. NYPA installed one of the first rooftop solar projects in New York State on its headquarters in White Plains and currently monitors the energy production on 1-minute intervals.

Solar production is highly dependent on weather, both the intensity of the solar radiation on the panels and also the cloud cover. Further, significant events such as full or partial solar eclipse, will also significantly impact the production from panels. It is critical that the impact of weather on solar production is understood by system designers and operators to ensure the increase of intermittent resources can be effectively managed to provide reliable energy supply to all customers.

Data Acquisition

Data Sources

  1. Interval (every minute) energy production data in kilowatt-hours (kWh) from a rooftop solar installation on the headquarters of the New York Power Authority’s (NYPA) headquarters at 123 Main Street, White Plains, NY. This data is loaded into NYPA’s New York Energy Manager customer data platform and can be extracted to a CSV file.
  2. Weather data from the National Oceanic Atmospheric Administration (NOAA) for White Plains providing solar incidence and cloud cover data.

Libraries

library(RCurl)
library(tidyr)
library(dplyr)
library(ggplot2)
library(DT)
library(knitr)
library(stringr)
library(reshape2)

Raw Data

Sunshine minutes in each hour

sunshine_url <- getURL("https://raw.githubusercontent.com/jillenergy/Solar-Weather/master/SunshineMinAug2017.csv")
sunshine_raw <- read.csv(text = sunshine_url)
head(sunshine_raw)
##        Station Scenario IntervalStartDt IntervalEndDt Sunshine
## 1 White Plains   ACTUAL   8/1/2017 0:00 8/1/2017 1:00        0
## 2 White Plains   ACTUAL   8/1/2017 1:00 8/1/2017 2:00        0
## 3 White Plains   ACTUAL   8/1/2017 2:00 8/1/2017 3:00        0
## 4 White Plains   ACTUAL   8/1/2017 3:00 8/1/2017 4:00        0
## 5 White Plains   ACTUAL   8/1/2017 4:00 8/1/2017 5:00        9
## 6 White Plains   ACTUAL   8/1/2017 5:00 8/1/2017 6:00       60

Cloud cover minutes in each hour

cloud_url <- getURL("https://raw.githubusercontent.com/jillenergy/Solar-Weather/master/CloudCoverMinAug2017.csv")
cloud_raw <- read.csv(text = cloud_url)
head(cloud_raw)
##        Station Scenario    IntervalStartDt      IntervalEndDt CloudCover
## 1 White Plains   ACTUAL 08/01/2017  00:00  08/01/2017  01:00           0
## 2 White Plains   ACTUAL 08/01/2017  01:00  08/01/2017  02:00           0
## 3 White Plains   ACTUAL 08/01/2017  02:00  08/01/2017  03:00           0
## 4 White Plains   ACTUAL 08/01/2017  03:00  08/01/2017  04:00           0
## 5 White Plains   ACTUAL 08/01/2017  04:00  08/01/2017  05:00           0
## 6 White Plains   ACTUAL 08/01/2017  05:00  08/01/2017  06:00           0

Solar production (kWh) in each minute from 2 solar panels located at NYPA's headquarters in WPO

solar_raw <- read.csv("https://raw.githubusercontent.com/jillenergy/Solar-Weather/master/WPOsolarPHA%2BPHB_08.01-31.2017.csv", header = TRUE, stringsAsFactors = FALSE)
head(solar_raw)
##          Device    Standard.Name            Display.Name Units   Timestamp
## 1 WPO Solar PHA em_ActEnergyDlvd Active Energy Delivered   kWh 8/1/17 0:00
## 2 WPO Solar PHA em_ActEnergyDlvd Active Energy Delivered   kWh 8/1/17 0:01
## 3 WPO Solar PHA em_ActEnergyDlvd Active Energy Delivered   kWh 8/1/17 0:02
## 4 WPO Solar PHA em_ActEnergyDlvd Active Energy Delivered   kWh 8/1/17 0:03
## 5 WPO Solar PHA em_ActEnergyDlvd Active Energy Delivered   kWh 8/1/17 0:04
## 6 WPO Solar PHA em_ActEnergyDlvd Active Energy Delivered   kWh 8/1/17 0:05
##    Timezone Value
## 1 undefined     0
## 2 undefined     0
## 3 undefined     0
## 4 undefined     0
## 5 undefined     0
## 6 undefined     0

Data Transformation and Clean-up

Solar

Separate the date and minute intervals into two columns, and create dataframe with the columns: Panel Name / Date / Time / kWh

solar_df <- as.data.frame(solar_raw, stringsAsFactors = FALSE)
solar_df$Date <- sapply(strsplit(as.character(solar_df$Timestamp), " "), "[", 1)
solar_df$Minutes <- sapply(strsplit(as.character(solar_df$Timestamp), " "), "[", 2)
new_solar <- data.frame(solar_df$Device, solar_df$Date, solar_df$Minutes, solar_df$Value, stringsAsFactors = FALSE)
colnames(new_solar) <- c("PanelName", "Date", "Minutes", "kWh")
head(new_solar)
##       PanelName   Date Minutes kWh
## 1 WPO Solar PHA 8/1/17    0:00   0
## 2 WPO Solar PHA 8/1/17    0:01   0
## 3 WPO Solar PHA 8/1/17    0:02   0
## 4 WPO Solar PHA 8/1/17    0:03   0
## 5 WPO Solar PHA 8/1/17    0:04   0
## 6 WPO Solar PHA 8/1/17    0:05   0

Make variables of the PanelNames from observations with tidyr ‘spread’ function, in order to then take the mean of both.

new_solar <- spread(new_solar, "PanelName", "kWh")
dplyr::tbl_df(new_solar)
## # A tibble: 44,640 x 4
##      Date Minutes `WPO Solar PHA` `WPO Solar PHB`
##  *  <chr>   <chr>           <chr>           <chr>
##  1 8/1/17    0:00               0               0
##  2 8/1/17    0:01               0               0
##  3 8/1/17    0:02               0               0
##  4 8/1/17    0:03               0               0
##  5 8/1/17    0:04               0               0
##  6 8/1/17    0:05               0               0
##  7 8/1/17    0:06               0               0
##  8 8/1/17    0:07               0               0
##  9 8/1/17    0:08               0               0
## 10 8/1/17    0:09               0               0
## # ... with 44,630 more rows
colnames(new_solar) <- c("Date", "Minutes", "PHA_kWh", "PHB_kWh")
head(new_solar)
##     Date Minutes PHA_kWh PHB_kWh
## 1 8/1/17    0:00       0       0
## 2 8/1/17    0:01       0       0
## 3 8/1/17    0:02       0       0
## 4 8/1/17    0:03       0       0
## 5 8/1/17    0:04       0       0
## 6 8/1/17    0:05       0       0
sapply(new_solar, class)
##        Date     Minutes     PHA_kWh     PHB_kWh 
## "character" "character" "character" "character"

Take the mean of the kWh produced by the two solar panels.

  1. Change data in columns "PHA_kWh" and "PHB_kWh" from character class to numeric.
cols.num <- c("PHA_kWh","PHB_kWh")
new_solar[cols.num] <- sapply(new_solar[cols.num],as.numeric)
sapply(new_solar, class)
##        Date     Minutes     PHA_kWh     PHB_kWh 
## "character" "character"   "numeric"   "numeric"

There were a couple of "null" entries that were trasnformed to "NAs," which posed a bit of challenge to work with later.

  1. Remove scientific notation and round to 5 digits after decimal.
new_solar <- new_solar %>% mutate_if(is.numeric, funs(round(., 5)))

Reference: https://stackoverflow.com/questions/27613310/rounding-selected-columns-of-data-table-in-r

  1. Take the sum of kWh produced by the two solar panels (accounting for the NAs in the data using na.rm=TRUE)
new_solar$SolarSum <- rowSums(new_solar[,3:4], na.rm = TRUE)
tail(new_solar)
##         Date Minutes PHA_kWh PHB_kWh SolarSum
## 44635 8/9/17    9:54 0.03795 0.05112  0.08907
## 44636 8/9/17    9:55 0.03795 0.05112  0.08907
## 44637 8/9/17    9:56 0.05284 0.05835  0.11119
## 44638 8/9/17    9:57 0.05284 0.05835  0.11119
## 44639 8/9/17    9:58 0.05284 0.05835  0.11119
## 44640 8/9/17    9:59 0.05284 0.05835  0.11119
  1. Convert "Date" to ISO 8601 standard date format.
new_solar$Date <- format(as.Date(new_solar$Date, format = "%m/%d/%y"))
tail(new_solar)
##             Date Minutes PHA_kWh PHB_kWh SolarSum
## 44635 2017-08-09    9:54 0.03795 0.05112  0.08907
## 44636 2017-08-09    9:55 0.03795 0.05112  0.08907
## 44637 2017-08-09    9:56 0.05284 0.05835  0.11119
## 44638 2017-08-09    9:57 0.05284 0.05835  0.11119
## 44639 2017-08-09    9:58 0.05284 0.05835  0.11119
## 44640 2017-08-09    9:59 0.05284 0.05835  0.11119

Sunshine

Separate the date and hour start times into two columns in order to be able to match solar production and weather and create dataframe with the columns: Date / HourBegin / SunshineMinutes. Convert "Date" to ISO 8601 standard date format.

sunshine_df <- as.data.frame(sunshine_raw)
sunshine_df$Date <- sapply(strsplit(as.character(sunshine_df$IntervalStartDt), " "), "[", 1)
sunshine_df$HourBegin <- sapply(strsplit(as.character(sunshine_df$IntervalStartDt), " "), "[", 2)
new_sunshine <- data.frame(sunshine_df$Date,sunshine_df$HourBegin,sunshine_df$Sunshine)
colnames(new_sunshine) <- c("Date", "HourBegin", "SunshineMinutes")
new_sunshine$Date <- format(as.Date(new_sunshine$Date, format = "%m/%d/%Y"))
head(new_sunshine)
##         Date HourBegin SunshineMinutes
## 1 2017-08-01      0:00               0
## 2 2017-08-01      1:00               0
## 3 2017-08-01      2:00               0
## 4 2017-08-01      3:00               0
## 5 2017-08-01      4:00               9
## 6 2017-08-01      5:00              60

Create an alternate look at sunshine as a percentage of the hour

new_sunshine$PercentSunshine <- round(((new_sunshine$SunshineMinutes/60) * 100), digits = 0)
head(new_sunshine)
##         Date HourBegin SunshineMinutes PercentSunshine
## 1 2017-08-01      0:00               0               0
## 2 2017-08-01      1:00               0               0
## 3 2017-08-01      2:00               0               0
## 4 2017-08-01      3:00               0               0
## 5 2017-08-01      4:00               9              15
## 6 2017-08-01      5:00              60             100

Cloud Cover

Separate the date and hour start times into two columns in order to be able to match solar production and weather and create dataframe with the columns: Date / HourBegin / PercentCloudCover. Convert "Date" to ISO 8601 standard date format.

cloud_df <- as.data.frame(cloud_raw)
cloud_df$Date <- sapply(strsplit(as.character(cloud_df$IntervalStartDt), " "), "[", 1)
cloud_df$HourBegin <- sapply(strsplit(as.character(cloud_df$IntervalStartDt), "  "), "[", 2)
new_cloud <- data.frame(cloud_df$Date,cloud_df$HourBegin,cloud_df$CloudCover)
colnames(new_cloud) <- c("Date", "HourBegin", "PercentCloudCover")
new_cloud$Date <- format(as.Date(new_cloud$Date, format = "%m/%d/%Y"))
head(new_cloud)
##         Date HourBegin PercentCloudCover
## 1 2017-08-01    00:00                  0
## 2 2017-08-01    01:00                  0
## 3 2017-08-01    02:00                  0
## 4 2017-08-01    03:00                  0
## 5 2017-08-01    04:00                  0
## 6 2017-08-01    05:00                  0

Data Analysis

Aggregate the Solar, Sunshine and Cloud Cover data set into one point for each day in the month to see if there is a correlation between sunshine and cloud cover and solar production.

onedate <- new_cloud[c(TRUE,rep(FALSE,23)), ]
head(onedate)
##           Date HourBegin PercentCloudCover
## 1   2017-08-01    00:00                  0
## 25  2017-08-02    00:00                  0
## 49  2017-08-03    00:00                 30
## 73  2017-08-04    00:00                 30
## 97  2017-08-05    00:00                 70
## 121 2017-08-06    00:00                  0

Daily sunshine percentage

SunshineDaily <- round(colMeans(matrix(new_sunshine$PercentSunshine, nrow=24)), digits=1)
SunshineDaily_df <- data.frame(onedate$Date,SunshineDaily)
colnames(SunshineDaily_df) <- c("Date","PercentSunshine")
head(SunshineDaily_df) 
##         Date PercentSunshine
## 1 2017-08-01            46.2
## 2 2017-08-02            38.6
## 3 2017-08-03            43.1
## 4 2017-08-04            29.7
## 5 2017-08-05            23.6
## 6 2017-08-06            42.9

Daily cloud cover percentage

CloudCoverDaily <- round(colMeans(matrix(new_cloud$PercentCloudCover, nrow=24)), digits=1)
CloudCoverDaily_df <- data.frame(onedate$Date,CloudCoverDaily)
colnames(CloudCoverDaily_df) <- c("Date","PercentCloudCover")
head(CloudCoverDaily_df) 
##         Date PercentCloudCover
## 1 2017-08-01              12.5
## 2 2017-08-02              25.0
## 3 2017-08-03              20.0
## 4 2017-08-04              43.3
## 5 2017-08-05              50.0
## 6 2017-08-06              13.8

Daily solar production total

SolarDaily_df <- aggregate(new_solar$SolarSum, list(Day = new_solar$Date), sum, na.rm = TRUE)
colnames(SolarDaily_df) <- c("Date", "kWhProduced")
SolarDaily_df[,-1] <-round(SolarDaily_df[,-1],1)
head(SolarDaily_df)
##         Date kWhProduced
## 1 2017-08-01        63.3
## 2 2017-08-02        27.2
## 3 2017-08-03        56.6
## 4 2017-08-04        55.3
## 5 2017-08-05        33.2
## 6 2017-08-06        50.7

Combine data into one dataframe

AllData_df <- data.frame("Date" = SunshineDaily_df$Date, 
                  "PercentSunshine" = SunshineDaily_df$PercentSunshine, 
                  "PercentCloudCover" = CloudCoverDaily_df$PercentCloudCover,
                  "kWhProduced" = SolarDaily_df$kWhProduced)
head(AllData_df)
##         Date PercentSunshine PercentCloudCover kWhProduced
## 1 2017-08-01            46.2              12.5        63.3
## 2 2017-08-02            38.6              25.0        27.2
## 3 2017-08-03            43.1              20.0        56.6
## 4 2017-08-04            29.7              43.3        55.3
## 5 2017-08-05            23.6              50.0        33.2
## 6 2017-08-06            42.9              13.8        50.7

Combine Sunshine Percentage and Solar Production into one dataframe for future plotting

SunshineSolar_df <- data.frame(SunshineDaily_df$Date,
                  "PercentSunshine" = SunshineDaily_df$PercentSunshine, 
                  "kWhProduced" = SolarDaily_df$kWhProduced)
colnames(SunshineSolar_df) <- c("Date","PercentSunshine","kWhProduced")
head(SunshineSolar_df)
##         Date PercentSunshine kWhProduced
## 1 2017-08-01            46.2        63.3
## 2 2017-08-02            38.6        27.2
## 3 2017-08-03            43.1        56.6
## 4 2017-08-04            29.7        55.3
## 5 2017-08-05            23.6        33.2
## 6 2017-08-06            42.9        50.7

Combine Sunshine Percentage and Cloud Percentage into one dataframe for future plotting

SunshineCloud_df <- data.frame(SunshineDaily_df$Date,
                  "PercentSunshine" = SunshineDaily_df$PercentSunshine, 
                  "PercentCloudCover" = CloudCoverDaily_df$PercentCloudCover)
colnames(SunshineCloud_df) <- c("Date","PercentSunshine","PercentCloudCover")
head(SunshineCloud_df)
##         Date PercentSunshine PercentCloudCover
## 1 2017-08-01            46.2              12.5
## 2 2017-08-02            38.6              25.0
## 3 2017-08-03            43.1              20.0
## 4 2017-08-04            29.7              43.3
## 5 2017-08-05            23.6              50.0
## 6 2017-08-06            42.9              13.8

For multi-variable plotting, use melt to create one column with the date and one column with all of the data for plotting

SunshineCloudMelt_df <- melt(SunshineCloud_df, id.vars = 'Date')
head(SunshineCloudMelt_df)
##         Date        variable value
## 1 2017-08-01 PercentSunshine  46.2
## 2 2017-08-02 PercentSunshine  38.6
## 3 2017-08-03 PercentSunshine  43.1
## 4 2017-08-04 PercentSunshine  29.7
## 5 2017-08-05 PercentSunshine  23.6
## 6 2017-08-06 PercentSunshine  42.9

Visualize the data in graphs

Visualize the Percentage of Cloud Cover Data as an average per day

CloudPlot <- ggplot(data=AllData_df, aes(x=Date, y=PercentCloudCover, group=1)) +
  geom_point(color="#0066ff", size=2) +
  ggtitle("Percent Cloud Cover by Hour in White Plains, NY for Each Day August 2017") +
  labs(x="Date in August 2017", y="Average Percent of Cloud Cover per Day") +
  theme(axis.title.y = element_text(size=12, color="#666666")) +
  theme(legend.position = "none",  
        axis.text.x = element_text(angle = 65, hjust = 1)) +
  theme(axis.text = element_text(size=12, family="Trebuchet MS")) +
  theme(plot.title = element_text(size=12, family="Trebuchet MS", face="bold", hjust=0, color="#666666"))
CloudPlot

Visualize the Percentage of Sunshine in each hour as an average per day

SunshinePlot <- ggplot(data=SunshineDaily_df, aes(x=Date, y=PercentSunshine, group=1)) +
  geom_point(color="blue", size=2) +
  ggtitle("Average Percentage of Sunshine by Hour in White Plains, NY for Each Day August 2017") +
  labs(x="Date in August 2017", y="Average Percentage of Sunshine per Hour each Day") +
  theme(axis.title.y = element_text(size=12, color="#666666")) +
  theme(legend.position = "none",  
        axis.text.x = element_text(angle = 65, hjust = 1)) +
  theme(axis.text = element_text(size=12, family="Trebuchet MS")) +
  theme(plot.title = element_text(size=12, family="Trebuchet MS", face="bold", hjust=0, color="#666666"))
SunshinePlot

Visualize the kWh solar produced each day

SolarPlot <- ggplot(data=AllData_df, aes(Date, y=kWhProduced)) +
  geom_point(color="red", size=2) +
  ggtitle("Total kWh Produced in White Plains, NY for Each Day August 2017") +
  labs(x="Date in August 2017", y="Total Solar Produced in kWh each Day") +
  theme(axis.title.y = element_text(size=12, color="#666666")) +
  theme(legend.position = "none",  
        axis.text.x = element_text(angle = 65, hjust = 1)) +
  theme(axis.text = element_text(size=12, family="Trebuchet MS")) +
  theme(plot.title = element_text(size=12, family="Trebuchet MS", face="bold", hjust=0, color="#666666"))
SolarPlot

Visualize Sunshine Percent and Cloud Cover Percent graphed together on one graph as bars

SunCloudPlot <- ggplot(SunshineCloudMelt_df, aes(x=Date, y = value, fill = variable)) +
  geom_bar(stat='identity') +
  ggtitle("Percentage of Cloud Cover and Sunshine Minutes in White Plains, NY for Each Day August 2017") +
  labs(x="Date in August 2017", y="Percentage") +
  theme(axis.title.y = element_text(size=12, color="#666666")) +
  theme(legend.position = 'bottom',  
        axis.text.x = element_text(angle = 65, hjust = 1)) +
  theme(axis.text = element_text(size=12, family="Trebuchet MS")) +
  theme(plot.title = element_text(size=12, family="Trebuchet MS", face="bold", hjust=0, color="#666666"))
SunCloudPlot

Solar production and sunshine percentage graphed togather (line)

SunshineSolarPlot <- ggplot(SunshineSolar_df, aes(Date, group=1)) + 
  geom_line(aes(y = kWhProduced, colour = "SolarProduced")) +
  geom_line(aes(y = PercentSunshine, colour = "PercentSunshine")) + 
  ggtitle("Total Solar Produced and Average Percent Sunshine in White Plains, NY for Each Day August 2017") +
  labs(x="Date in August 2017", y="Percentage") +
  theme(axis.title.y = element_text(size=12, color="#666666")) +
  theme(legend.position = 'bottom',  
        axis.text.x = element_text(angle = 65, hjust = 1)) +
  theme(axis.text = element_text(size=12, family="Trebuchet MS")) +
  theme(plot.title = element_text(size=12, family="Trebuchet MS", face="bold", hjust=0, color="#666666"))
SunshineSolarPlot

All data graphed togather as lines

AllDataPlot <- ggplot(AllData_df, aes(Date, group=1)) + 
  geom_line(aes(y = kWhProduced, colour = "SolarProduced")) +
  geom_line(aes(y = PercentCloudCover, colour = "PercentCloudCover")) +
  geom_line(aes(y = PercentSunshine, colour = "PercentSunshine")) + 
  ggtitle("Total Solar Produced and Average Percent Sunshine and Cloud Cover in White Plains, NY for Each Day August 2017") +
  labs(x="Date in August 2017", y="Percentage") +
  theme(axis.title.y = element_text(size=12, color="#666666")) +
  theme(legend.position = 'bottom',  
        axis.text.x = element_text(angle = 65, hjust = 1)) +
  theme(axis.text = element_text(size=12, family="Trebuchet MS")) +
  theme(plot.title = element_text(size=12, family="Trebuchet MS", face="bold", hjust=0, color="#666666"))
  
AllDataPlot

Conclusions

We expected to see the solar production data from the two panels at NYPA's White Plain office to be highly dependent on weather, both the intensity of the solar radiation on the panels represented in this data set as sunshine minutes and also the cloud cover percentage.

SUNSHINE MINUTES ON SOLAR PRODUCTION We observed a strong correlation between solar production and the percentage of sunshine minutes in an hour throughout the month. As sunshine is high as it was August 9 and 10, there is alos a high produciton of solar. There is a spreading of the lines in the last third of the month when looking at solar production and sunshine, but the corrleation still exists, as demonstrated on August 29 with a dip in production on the same day as a dip in sunshine minutes.

CLOUD COVER PERCENTAGE ON SOLAR PRODUCTION Similarly, there is an inverse correlation between cloud cover and solar produced on most days. For example, August 15 the cloud cover is very high and the solar production and sunshine minutes are both low as one would expect. There are several observations where the correlation does not hold. For example, the dip in cloud cover on August 17 does not result in a spike of either sunshine or solar production, suggestion that the intermittent and variable density nature of cloud cover makes it a imperfect predictor of solar production.

The better engineers and electric grid system operators are able to understand the impact of weather on solar production, the easier it will be to deploy and manage more solar resources throughout the system.