Last modified: 12/08/2017 05:15

Business Problem and Motivation for Project

Solar energy is becoming an increasingly important component of the electric grid. From 2011 to 2016, total installed solar capacity in New York increased by more than 800%. New York energy policy requires that at least 50% of the electricity consumed in the state come from renewable sources by 2030, up from a level of 25% in 2017, which will further increase the amount of solar installed across the state.

The New York Power Authority works with its customers to install large-scale and distributed solar, and has been a leader in advanced clean energy projects for decades. NYPA installed one of the first rooftop solar projects in New York State on its headquarters in White Plains and currently monitors its energy production at 1-minute intervals.

Solar production is highly dependent on weather, including both the intensity of the solar radiation on the panels and the cloud cover. Events such as a full or partial solar eclipse also sharply reduce production from the panels. System designers and operators must understand the impact of weather on solar production so that the growing share of intermittent resources can be managed effectively and all customers continue to receive a reliable energy supply.

Data Acquisition

Data Sources

  1. Interval (every minute) energy production data in kilowatt-hours (kWh) from the rooftop solar installation on the New York Power Authority's (NYPA) headquarters at 123 Main Street, White Plains, NY. This data is loaded into NYPA's New York Energy Manager customer data platform and can be extracted to a CSV file.
  2. Hourly weather data for White Plains from the National Oceanic and Atmospheric Administration (NOAA), providing sunshine minutes and cloud cover.

Libraries

library(RCurl)
library(tidyr)
library(dplyr)
library(ggplot2)
library(DT)
library(knitr)
library(stringr)

Raw Data

Sunshine minutes in each hour

sunshine_url <- getURL("https://raw.githubusercontent.com/jillenergy/Solar-Weather/master/SunshineMinAug2017.csv")
sunshine_raw <- read.csv(text = sunshine_url)
head(sunshine_raw)
##        Station Scenario IntervalStartDt IntervalEndDt Sunshine
## 1 White Plains   ACTUAL   8/1/2017 0:00 8/1/2017 1:00        0
## 2 White Plains   ACTUAL   8/1/2017 1:00 8/1/2017 2:00        0
## 3 White Plains   ACTUAL   8/1/2017 2:00 8/1/2017 3:00        0
## 4 White Plains   ACTUAL   8/1/2017 3:00 8/1/2017 4:00        0
## 5 White Plains   ACTUAL   8/1/2017 4:00 8/1/2017 5:00        9
## 6 White Plains   ACTUAL   8/1/2017 5:00 8/1/2017 6:00       60

Cloud cover minutes in each hour

cloud_url <- getURL("https://raw.githubusercontent.com/jillenergy/Solar-Weather/master/CloudCoverMinAug2017.csv")
cloud_raw <- read.csv(text = cloud_url)
head(cloud_raw)
##        Station Scenario    IntervalStartDt      IntervalEndDt CloudCover
## 1 White Plains   ACTUAL 08/01/2017  00:00  08/01/2017  01:00           0
## 2 White Plains   ACTUAL 08/01/2017  01:00  08/01/2017  02:00           0
## 3 White Plains   ACTUAL 08/01/2017  02:00  08/01/2017  03:00           0
## 4 White Plains   ACTUAL 08/01/2017  03:00  08/01/2017  04:00           0
## 5 White Plains   ACTUAL 08/01/2017  04:00  08/01/2017  05:00           0
## 6 White Plains   ACTUAL 08/01/2017  05:00  08/01/2017  06:00           0

Solar production (kWh) in each minute from the two solar panels (PHA and PHB) at NYPA's White Plains office (WPO)

solar_raw <- read.csv("/Users/emiliembolduc/CUNY Data 607/Final Project/Data/WPOsolarPHA+PHB_08.01-31.2017.csv", header = TRUE, stringsAsFactors = FALSE)
head(solar_raw)
##          Device    Standard.Name            Display.Name Units   Timestamp
## 1 WPO Solar PHA em_ActEnergyDlvd Active Energy Delivered   kWh 8/1/17 0:00
## 2 WPO Solar PHA em_ActEnergyDlvd Active Energy Delivered   kWh 8/1/17 0:01
## 3 WPO Solar PHA em_ActEnergyDlvd Active Energy Delivered   kWh 8/1/17 0:02
## 4 WPO Solar PHA em_ActEnergyDlvd Active Energy Delivered   kWh 8/1/17 0:03
## 5 WPO Solar PHA em_ActEnergyDlvd Active Energy Delivered   kWh 8/1/17 0:04
## 6 WPO Solar PHA em_ActEnergyDlvd Active Energy Delivered   kWh 8/1/17 0:05
##    Timezone Value
## 1 undefined     0
## 2 undefined     0
## 3 undefined     0
## 4 undefined     0
## 5 undefined     0
## 6 undefined     0

Data Transformation and Clean-up

Solar

Split the timestamp into separate date and minute columns, and create a data frame with the columns PanelName / Date / Minutes / kWh.

solar_df <- as.data.frame(solar_raw, stringsAsFactors = FALSE)
solar_df$Date <- sapply(strsplit(as.character(solar_df$Timestamp), " "), "[", 1)
solar_df$Minutes <- sapply(strsplit(as.character(solar_df$Timestamp), " "), "[", 2)
new_solar <- data.frame(solar_df$Device, solar_df$Date, solar_df$Minutes, solar_df$Value, stringsAsFactors = FALSE)
colnames(new_solar) <- c("PanelName", "Date", "Minutes", "kWh")
head(new_solar)
##       PanelName   Date Minutes kWh
## 1 WPO Solar PHA 8/1/17    0:00   0
## 2 WPO Solar PHA 8/1/17    0:01   0
## 3 WPO Solar PHA 8/1/17    0:02   0
## 4 WPO Solar PHA 8/1/17    0:03   0
## 5 WPO Solar PHA 8/1/17    0:04   0
## 6 WPO Solar PHA 8/1/17    0:05   0
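Splitting the timestamp on a space keeps Date and Minutes as text. A minimal sketch of an alternative, assuming the Timestamp strings follow the "8/1/17 0:00" pattern shown above, is to parse them once into date-times with base R's as.POSIXct() and derive the text columns from the parsed values. The sample timestamps and the America/New_York time zone are assumptions (the export reports its Timezone column as "undefined"), and HourBegin is a hypothetical extra key for matching against the hourly weather data.

```r
# Hypothetical sample of Timestamp strings in the same format as solar_raw
timestamps <- c("8/1/17 0:00", "8/9/17 9:59", "8/21/17 14:30")

# Parse date and time together; the time zone is an assumption, because
# the export reports its Timezone column as "undefined"
ts <- as.POSIXct(timestamps, format = "%m/%d/%y %H:%M", tz = "America/New_York")

# Derive text columns from the parsed values; format() zero-pads the hour,
# so "0:00" becomes "00:00"
solar_times <- data.frame(
  Date      = format(ts, "%Y-%m-%d"),
  Minutes   = format(ts, "%H:%M"),
  HourBegin = format(ts, "%H:00"),   # hypothetical hourly join key
  stringsAsFactors = FALSE
)
solar_times
```

Because parsed date-times sort chronologically, this also avoids the character ordering visible in the outputs below, where "8/9/17" sorts after "8/10/17".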

Use the tidyr ‘spread’ function to turn each PanelName into its own variable, so that the readings from both panels can then be combined minute by minute.

new_solar <- spread(new_solar, "PanelName", "kWh")
dplyr::tbl_df(new_solar)
## # A tibble: 44,640 x 4
##      Date Minutes `WPO Solar PHA` `WPO Solar PHB`
##  *  <chr>   <chr>           <chr>           <chr>
##  1 8/1/17    0:00               0               0
##  2 8/1/17    0:01               0               0
##  3 8/1/17    0:02               0               0
##  4 8/1/17    0:03               0               0
##  5 8/1/17    0:04               0               0
##  6 8/1/17    0:05               0               0
##  7 8/1/17    0:06               0               0
##  8 8/1/17    0:07               0               0
##  9 8/1/17    0:08               0               0
## 10 8/1/17    0:09               0               0
## # ... with 44,630 more rows
colnames(new_solar) <- c("Date", "Minutes", "PHA_kWh", "PHB_kWh")
head(new_solar)
##     Date Minutes PHA_kWh PHB_kWh
## 1 8/1/17    0:00       0       0
## 2 8/1/17    0:01       0       0
## 3 8/1/17    0:02       0       0
## 4 8/1/17    0:03       0       0
## 5 8/1/17    0:04       0       0
## 6 8/1/17    0:05       0       0
sapply(new_solar, class)
##        Date     Minutes     PHA_kWh     PHB_kWh 
## "character" "character" "character" "character"
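For readers without tidyr, the same long-to-wide step can be sketched with base R's reshape(). The four-row sample below is hypothetical and keyed by a single Timestamp column for simplicity. It also shows why the spread columns come out as character: the kWh values were already character (including a "null" entry) before reshaping.

```r
# Hypothetical long-format sample: one row per panel per minute
long <- data.frame(
  Timestamp = c("8/1/17 12:00", "8/1/17 12:00", "8/1/17 12:01", "8/1/17 12:01"),
  PanelName = c("WPO Solar PHA", "WPO Solar PHB", "WPO Solar PHA", "WPO Solar PHB"),
  kWh       = c("0.05284", "0.05835", "null", "0.05112"),
  stringsAsFactors = FALSE
)

# Base-R counterpart of spread(long, PanelName, kWh): one row per Timestamp,
# one "kWh.<PanelName>" column per panel
wide <- reshape(long, idvar = "Timestamp", timevar = "PanelName",
                direction = "wide")
rownames(wide) <- NULL
wide

# The values are still character, as they are after spread()
sapply(wide, class)
```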

Sum the kWh produced by the two solar panels.

  1. Change the data in columns "PHA_kWh" and "PHB_kWh" from character to numeric class.
cols.num <- c("PHA_kWh","PHB_kWh")
new_solar[cols.num] <- sapply(new_solar[cols.num],as.numeric)
sapply(new_solar, class)
##        Date     Minutes     PHA_kWh     PHB_kWh 
## "character" "character"   "numeric"   "numeric"

A few "null" entries were coerced to NA in this step (as.numeric() returns NA, with a warning, for text it cannot parse), which required some care later on.
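One way to make those NAs explicit from the start is read.csv()'s na.strings argument, which turns the listed placeholder strings into NA at import, so the column is read as numeric without a coercion warning. A minimal sketch, assuming the placeholder is the literal text null; the three-row CSV is hypothetical and stands in for the NYPA export.

```r
# Hypothetical excerpt of the minute-level export with one "null" reading
csv_text <- "Timestamp,Value
8/1/17 12:00,0.05284
8/1/17 12:01,null
8/1/17 12:02,0.05112"

# na.strings maps "null" to NA, so Value is read as numeric
solar_sample <- read.csv(text = csv_text, na.strings = "null",
                         stringsAsFactors = FALSE)
str(solar_sample)

# Count the missing minutes so the gaps are documented, not silently dropped
sum(is.na(solar_sample$Value))
```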

  2. Remove scientific notation and round to 5 digits after the decimal point.
new_solar <- new_solar %>% mutate_if(is.numeric, round, digits = 5)

Reference: https://stackoverflow.com/questions/27613310/rounding-selected-columns-of-data-table-in-r

  3. Take the sum of kWh produced by the two solar panels, accounting for the NAs in the data with na.rm = TRUE.
new_solar$SolarSum <- rowSums(new_solar[,3:4], na.rm = TRUE)
tail(new_solar)
##         Date Minutes PHA_kWh PHB_kWh SolarSum
## 44635 8/9/17    9:54 0.03795 0.05112  0.08907
## 44636 8/9/17    9:55 0.03795 0.05112  0.08907
## 44637 8/9/17    9:56 0.05284 0.05835  0.11119
## 44638 8/9/17    9:57 0.05284 0.05835  0.11119
## 44639 8/9/17    9:58 0.05284 0.05835  0.11119
## 44640 8/9/17    9:59 0.05284 0.05835  0.11119
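One caveat of na.rm = TRUE in rowSums(): a minute in which both panels are missing is summed to 0 kWh rather than NA, which can look like a genuine zero-production reading. A minimal sketch of keeping such minutes as NA, using hypothetical readings:

```r
# Hypothetical minute readings for the two panels, with gaps
m <- cbind(PHA_kWh = c(0.05284, NA, NA),
           PHB_kWh = c(0.05835, 0.05112, NA))

# With na.rm = TRUE, the third minute (both panels missing) becomes 0
rowSums(m, na.rm = TRUE)

# Keep NA when no panel reported a value, so true gaps stay visible
n_reported <- rowSums(!is.na(m))
solar_sum  <- ifelse(n_reported == 0, NA, rowSums(m, na.rm = TRUE))
solar_sum
```

Because the daily totals are later computed with sum(..., na.rm = TRUE), they come out the same either way; the difference matters only for minute-level plots or counts of zero-production minutes.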
  4. Convert "Date" to ISO 8601 standard date format.
new_solar$Date <- format(as.Date(new_solar$Date, format = "%m/%d/%y"))
tail(new_solar)
##             Date Minutes PHA_kWh PHB_kWh SolarSum
## 44635 2017-08-09    9:54 0.03795 0.05112  0.08907
## 44636 2017-08-09    9:55 0.03795 0.05112  0.08907
## 44637 2017-08-09    9:56 0.05284 0.05835  0.11119
## 44638 2017-08-09    9:57 0.05284 0.05835  0.11119
## 44639 2017-08-09    9:58 0.05284 0.05835  0.11119
## 44640 2017-08-09    9:59 0.05284 0.05835  0.11119
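The solar export writes dates with a two-digit year (%y), while the NOAA files use a four-digit year (%Y). With %y, R reads 00 to 68 as 20xx, so 17 becomes 2017. A short check, using one date in each source's format, that both format strings map to the same calendar day, so the ISO 8601 strings can serve as a common key:

```r
# Same day in the solar format (two-digit year) and the weather format (four-digit year)
solar_date   <- as.Date("8/9/17",     format = "%m/%d/%y")
weather_date <- as.Date("08/09/2017", format = "%m/%d/%Y")

identical(solar_date, weather_date)   # TRUE
format(solar_date)                    # "2017-08-09"
```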

Sunshine

Split the interval start time into separate date and hour-beginning columns so that the weather data can be matched to solar production, and create a data frame with the columns Date / HourBegin / SunshineMinutes. Then convert "Date" to ISO 8601 standard date format.

sunshine_df <- as.data.frame(sunshine_raw)
sunshine_df$Date <- sapply(strsplit(as.character(sunshine_df$IntervalStartDt), " "), "[", 1)
sunshine_df$HourBegin <- sapply(strsplit(as.character(sunshine_df$IntervalStartDt), " "), "[", 2)
new_sunshine <- data.frame(sunshine_df$Date,sunshine_df$HourBegin,sunshine_df$Sunshine)
colnames(new_sunshine) <- c("Date", "HourBegin", "SunshineMinutes")
new_sunshine$Date <- format(as.Date(new_sunshine$Date, format = "%m/%d/%Y"))
head(new_sunshine)
##         Date HourBegin SunshineMinutes
## 1 2017-08-01      0:00               0
## 2 2017-08-01      1:00               0
## 3 2017-08-01      2:00               0
## 4 2017-08-01      3:00               0
## 5 2017-08-01      4:00               9
## 6 2017-08-01      5:00              60

Cloud Cover

Split the interval start time into separate date and hour-beginning columns so that the weather data can be matched to solar production, and create a data frame with the columns Date / HourBegin / PercentCloudCover. Then convert "Date" to ISO 8601 standard date format.

cloud_df <- as.data.frame(cloud_raw)
# The cloud file separates date and time with two spaces, so split on any run of whitespace
cloud_df$Date <- sapply(strsplit(as.character(cloud_df$IntervalStartDt), "\\s+"), "[", 1)
cloud_df$HourBegin <- sapply(strsplit(as.character(cloud_df$IntervalStartDt), "\\s+"), "[", 2)
new_cloud <- data.frame(cloud_df$Date,cloud_df$HourBegin,cloud_df$CloudCover)
colnames(new_cloud) <- c("Date", "HourBegin", "PercentCloudCover")
new_cloud$Date <- format(as.Date(new_cloud$Date, format = "%m/%d/%Y"))
head(new_cloud)
##         Date HourBegin PercentCloudCover
## 1 2017-08-01    00:00                  0
## 2 2017-08-01    01:00                  0
## 3 2017-08-01    02:00                  0
## 4 2017-08-01    03:00                  0
## 5 2017-08-01    04:00                  0
## 6 2017-08-01    05:00                  0
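The two weather tables format HourBegin differently ("4:00" for sunshine, "04:00" for cloud cover), so their rows would not line up if joined on Date and HourBegin. A minimal sketch of normalizing the hour before merging, using the 4:00 and 5:00 rows for August 1 from the outputs above; pad_hour() is a hypothetical helper, not part of the original code.

```r
# The 4:00 and 5:00 rows for 2017-08-01, in each table's own hour format
sunshine <- data.frame(Date = c("2017-08-01", "2017-08-01"),
                       HourBegin = c("4:00", "5:00"),
                       SunshineMinutes = c(9, 60),
                       stringsAsFactors = FALSE)
cloud <- data.frame(Date = c("2017-08-01", "2017-08-01"),
                    HourBegin = c("04:00", "05:00"),
                    PercentCloudCover = c(0, 0),
                    stringsAsFactors = FALSE)

# Hypothetical helper: zero-pad the hour so "4:00" and "04:00" match
pad_hour <- function(x) sprintf("%02d:00", as.integer(sub(":.*$", "", x)))
sunshine$HourBegin <- pad_hour(sunshine$HourBegin)
cloud$HourBegin    <- pad_hour(cloud$HourBegin)

# Join the hourly weather tables on the shared keys
hourly_weather <- merge(sunshine, cloud, by = c("Date", "HourBegin"))
hourly_weather
```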

Data Analysis

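Aggregate the solar, sunshine, and cloud cover data sets to one value for each day of the month to see whether sunshine and cloud cover are correlated with solar production. As a first step, keep the first hourly row of each day (every 24th row of the cloud cover data) to get one Date value per day.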

onedate <- new_cloud[c(TRUE,rep(FALSE,23)), ]
head(onedate)
##           Date HourBegin PercentCloudCover
## 1   2017-08-01    00:00                  0
## 25  2017-08-02    00:00                  0
## 49  2017-08-03    00:00                 30
## 73  2017-08-04    00:00                 30
## 97  2017-08-05    00:00                 70
## 121 2017-08-06    00:00                  0
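The index c(TRUE, rep(FALSE, 23)) works through R's recycling of logical indices: the 24-element pattern repeats over all rows, keeping rows 1, 25, 49, and so on. The sketch below checks that it matches an explicit seq() index; both assume exactly 24 rows per day in date order, so a single missing hour would shift every later day.

```r
# One row per hour for the 31 days of August (744 rows)
row_id <- seq_len(24 * 31)

# Guard the assumption both approaches rely on
stopifnot(length(row_id) %% 24 == 0)

# Recycled logical index vs. explicit every-24th-row index
first_hour_recycled <- row_id[c(TRUE, rep(FALSE, 23))]
first_hour_seq      <- row_id[seq(1, length(row_id), by = 24)]

identical(first_hour_recycled, first_hour_seq)   # TRUE
head(first_hour_recycled)                        # 1 25 49 73 97 121
```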

Average sunshine minutes per hour for each day in August 2017

SunshineDaily <- round(colMeans(matrix(new_sunshine$SunshineMinutes, nrow=24)), digits=0)
SunshineDaily_df <- data.frame(onedate$Date,SunshineDaily)
colnames(SunshineDaily_df) <- c("Date","SunshineMinutes")
SunshineDaily_df 
##          Date SunshineMinutes
## 1  2017-08-01              28
## 2  2017-08-02              23
## 3  2017-08-03              26
## 4  2017-08-04              18
## 5  2017-08-05              14
## 6  2017-08-06              26
## 7  2017-08-07               8
## 8  2017-08-08              14
## 9  2017-08-09              29
## 10 2017-08-10              29
## 11 2017-08-11              22
## 12 2017-08-12              19
## 13 2017-08-13              23
## 14 2017-08-14              22
## 15 2017-08-15               8
## 16 2017-08-16              22
## 17 2017-08-17              21
## 18 2017-08-18               4
## 19 2017-08-19              21
## 20 2017-08-20              25
## 21 2017-08-21              29
## 22 2017-08-22              24
## 23 2017-08-23              21
## 24 2017-08-24              26
## 25 2017-08-25              20
## 26 2017-08-26              25
## 27 2017-08-27              27
## 28 2017-08-28              23
## 29 2017-08-29              14
## 30 2017-08-30              19
## 31 2017-08-31              24
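matrix(x, nrow = 24) fills column by column, so each column holds one day's 24 hourly values and colMeans() returns the average per hour for each day; multiplying by 24 (or summing) gives the daily total. A minimal sketch with two hypothetical days, compared against a grouped tapply(), which keys on the date itself and so does not depend on every day having exactly 24 rows:

```r
# Hypothetical hourly sunshine minutes for two days, in date order
sunshine_minutes <- c(rep(0, 6), rep(60, 12), rep(0, 6),   # day 1
                      rep(0, 6), rep(30, 12), rep(0, 6))   # day 2
dates <- rep(c("2017-08-01", "2017-08-02"), each = 24)

# Column-wise fill: column 1 = day 1, column 2 = day 2
by_matrix <- colMeans(matrix(sunshine_minutes, nrow = 24))

# Grouped equivalents keyed on the date
avg_per_hour <- tapply(sunshine_minutes, dates, mean)
daily_total  <- tapply(sunshine_minutes, dates, sum)

by_matrix      # 30 and 15
avg_per_hour
daily_total    # 720 and 360
```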

Average percent cloud cover for each day in August 2017

CloudCoverDaily <- round(colMeans(matrix(new_cloud$PercentCloudCover, nrow=24)), digits=0)
CloudCoverDaily_df <- data.frame(onedate$Date,CloudCoverDaily)
colnames(CloudCoverDaily_df) <- c("Date","PercentCloudCover")
CloudCoverDaily_df 
##          Date PercentCloudCover
## 1  2017-08-01                12
## 2  2017-08-02                25
## 3  2017-08-03                20
## 4  2017-08-04                43
## 5  2017-08-05                50
## 6  2017-08-06                14
## 7  2017-08-07                67
## 8  2017-08-08                55
## 9  2017-08-09                12
## 10 2017-08-10                 9
## 11 2017-08-11                38
## 12 2017-08-12                55
## 13 2017-08-13                35
## 14 2017-08-14                25
## 15 2017-08-15                77
## 16 2017-08-16                39
## 17 2017-08-17                26
## 18 2017-08-18                75
## 19 2017-08-19                30
## 20 2017-08-20                19
## 21 2017-08-21                 9
## 22 2017-08-22                27
## 23 2017-08-23                38
## 24 2017-08-24                18
## 25 2017-08-25                41
## 26 2017-08-26                14
## 27 2017-08-27                11
## 28 2017-08-28                20
## 29 2017-08-29                50
## 30 2017-08-30                43
## 31 2017-08-31                16

Solar energy (kWh) produced per day at NYPA's White Plains office in August 2017

SolarDaily_df <- aggregate(new_solar$SolarSum, list(Day = new_solar$Date), sum, na.rm = TRUE)
colnames(SolarDaily_df) <- c("Date", "kWh Produced")
SolarDaily_df
##          Date kWh Produced
## 1  2017-08-01     63.30620
## 2  2017-08-02     27.23490
## 3  2017-08-03     56.57205
## 4  2017-08-04     55.31905
## 5  2017-08-05     33.19475
## 6  2017-08-06     50.65345
## 7  2017-08-07      4.78915
## 8  2017-08-08     37.15740
## 9  2017-08-09     60.30955
## 10 2017-08-10     56.19985
## 11 2017-08-11     44.07650
## 12 2017-08-12     37.47890
## 13 2017-08-13     68.29645
## 14 2017-08-14     39.45370
## 15 2017-08-15     10.70985
## 16 2017-08-16     41.72928
## 17 2017-08-17      6.58500
## 18 2017-08-18      2.60136
## 19 2017-08-19     11.71333
## 20 2017-08-20     12.65800
## 21 2017-08-21      9.00394
## 22 2017-08-22     10.37621
## 23 2017-08-23     11.34888
## 24 2017-08-24     12.59796
## 25 2017-08-25      9.50188
## 26 2017-08-26     13.09008
## 27 2017-08-27     12.57221
## 28 2017-08-28     10.87120
## 29 2017-08-29      1.71491
## 30 2017-08-30     12.88457
## 31 2017-08-31      9.46295
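The three daily tables share a Date column, so one way to check for the correlation described above is to merge them and compute a correlation matrix with cor(). A minimal sketch using only the first five days of the tables above; the lowercase data frame names are hypothetical (chosen so they do not overwrite the real ones), and kWh stands in for the "kWh Produced" column, renamed here to avoid backticks. Five points illustrate the mechanics only; the full 31-day tables would be passed in the same way, and base R's cor.test() can test a single pair for significance.

```r
# First five days copied from the daily tables above
sunshine_daily <- data.frame(Date = sprintf("2017-08-%02d", 1:5),
                             SunshineMinutes = c(28, 23, 26, 18, 14),
                             stringsAsFactors = FALSE)
cloud_daily <- data.frame(Date = sprintf("2017-08-%02d", 1:5),
                          PercentCloudCover = c(12, 25, 20, 43, 50),
                          stringsAsFactors = FALSE)
solar_daily <- data.frame(Date = sprintf("2017-08-%02d", 1:5),
                          kWh = c(63.30620, 27.23490, 56.57205, 55.31905, 33.19475),
                          stringsAsFactors = FALSE)

# Join all three tables on Date
daily <- Reduce(function(x, y) merge(x, y, by = "Date"),
                list(sunshine_daily, cloud_daily, solar_daily))

# Pairwise Pearson correlations between weather and production
daily_cor <- cor(daily[, c("SunshineMinutes", "PercentCloudCover", "kWh")])
round(daily_cor, 2)
```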

GRAPHS/PLOTS - JILL

Conclusions - EB & JA based on Analysis

We expected solar production from the two panels at NYPA's White Plains office to be highly dependent on weather, including both the intensity of the solar radiation on the panels and the cloud cover.

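On August 21, the day of the partial solar eclipse, we did see a significant impact on production from the panels: the daily total of 9.00 kWh was lower than on August 20 (12.66 kWh), even though August 21 had more sunshine (an average of 29 vs. 25 minutes per hour) and less cloud cover (9% vs. 19%).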
