| Last modified: 12/08/2017 05:15 |
Solar energy is becoming an increasingly important component of the electric grid. From 2011 to 2016, New York saw an increase in the total solar installed of more than 800%. New York energy policy requires at least 50% of the energy consumed in the state by 2030, up from the level of 25% in 2017, which will further increase the amount of solar installed across the state.
The New York Power Authority works with its customers to install large-scale and distributed solar, and has been a leader in advanced clean energy projects for decades. NYPA installed one of the first rooftop solar projects in New York State on its headquarters in White Plains and currently monitors the energy production on 1-minute intervals.
Solar production is highly dependent on weather, both the intensity of the solar radiation on the panels and also the cloud cover. Further, significant events such as full or partial solar eclipse, will also significantly impact the production from panels. It is critical that the impact of weather on solar production is understood by system designers and operators to ensure the increase of intermittent resources can be effectively managed to provide reliable energy supply to all customers.
library(RCurl)
library(tidyr)
library(dplyr)
library(ggplot2)
library(DT)
library(knitr)
library(stringr)
library(reshape2)
Sunshine minutes in each hour
sunshine_url <- getURL("https://raw.githubusercontent.com/jillenergy/Solar-Weather/master/SunshineMinAug2017.csv")
sunshine_raw <- read.csv(text = sunshine_url)
head(sunshine_raw)
## Station Scenario IntervalStartDt IntervalEndDt Sunshine
## 1 White Plains ACTUAL 8/1/2017 0:00 8/1/2017 1:00 0
## 2 White Plains ACTUAL 8/1/2017 1:00 8/1/2017 2:00 0
## 3 White Plains ACTUAL 8/1/2017 2:00 8/1/2017 3:00 0
## 4 White Plains ACTUAL 8/1/2017 3:00 8/1/2017 4:00 0
## 5 White Plains ACTUAL 8/1/2017 4:00 8/1/2017 5:00 9
## 6 White Plains ACTUAL 8/1/2017 5:00 8/1/2017 6:00 60
Cloud cover minutes in each hour
cloud_url <- getURL("https://raw.githubusercontent.com/jillenergy/Solar-Weather/master/CloudCoverMinAug2017.csv")
cloud_raw <- read.csv(text = cloud_url)
head(cloud_raw)
## Station Scenario IntervalStartDt IntervalEndDt CloudCover
## 1 White Plains ACTUAL 08/01/2017 00:00 08/01/2017 01:00 0
## 2 White Plains ACTUAL 08/01/2017 01:00 08/01/2017 02:00 0
## 3 White Plains ACTUAL 08/01/2017 02:00 08/01/2017 03:00 0
## 4 White Plains ACTUAL 08/01/2017 03:00 08/01/2017 04:00 0
## 5 White Plains ACTUAL 08/01/2017 04:00 08/01/2017 05:00 0
## 6 White Plains ACTUAL 08/01/2017 05:00 08/01/2017 06:00 0
Solar production (kWh) in each minute from 2 solar panels located at NYPA's headquarters in WPO
solar_raw <- read.csv("https://raw.githubusercontent.com/jillenergy/Solar-Weather/master/WPOsolarPHA%2BPHB_08.01-31.2017.csv", header = TRUE, stringsAsFactors = FALSE)
head(solar_raw)
## Device Standard.Name Display.Name Units Timestamp
## 1 WPO Solar PHA em_ActEnergyDlvd Active Energy Delivered kWh 8/1/17 0:00
## 2 WPO Solar PHA em_ActEnergyDlvd Active Energy Delivered kWh 8/1/17 0:01
## 3 WPO Solar PHA em_ActEnergyDlvd Active Energy Delivered kWh 8/1/17 0:02
## 4 WPO Solar PHA em_ActEnergyDlvd Active Energy Delivered kWh 8/1/17 0:03
## 5 WPO Solar PHA em_ActEnergyDlvd Active Energy Delivered kWh 8/1/17 0:04
## 6 WPO Solar PHA em_ActEnergyDlvd Active Energy Delivered kWh 8/1/17 0:05
## Timezone Value
## 1 undefined 0
## 2 undefined 0
## 3 undefined 0
## 4 undefined 0
## 5 undefined 0
## 6 undefined 0
Separate the date and minute intervals into two columns, and create dataframe with the columns: Panel Name / Date / Time / kWh
solar_df <- as.data.frame(solar_raw, stringsAsFactors = FALSE)
solar_df$Date <- sapply(strsplit(as.character(solar_df$Timestamp), " "), "[", 1)
solar_df$Minutes <- sapply(strsplit(as.character(solar_df$Timestamp), " "), "[", 2)
new_solar <- data.frame(solar_df$Device, solar_df$Date, solar_df$Minutes, solar_df$Value, stringsAsFactors = FALSE)
colnames(new_solar) <- c("PanelName", "Date", "Minutes", "kWh")
head(new_solar)
## PanelName Date Minutes kWh
## 1 WPO Solar PHA 8/1/17 0:00 0
## 2 WPO Solar PHA 8/1/17 0:01 0
## 3 WPO Solar PHA 8/1/17 0:02 0
## 4 WPO Solar PHA 8/1/17 0:03 0
## 5 WPO Solar PHA 8/1/17 0:04 0
## 6 WPO Solar PHA 8/1/17 0:05 0
Make variables of the PanelNames from observations with tidyr ‘spread’ function, in order to then take the mean of both.
new_solar <- spread(new_solar, "PanelName", "kWh")
dplyr::tbl_df(new_solar)
## # A tibble: 44,640 x 4
## Date Minutes `WPO Solar PHA` `WPO Solar PHB`
## * <chr> <chr> <chr> <chr>
## 1 8/1/17 0:00 0 0
## 2 8/1/17 0:01 0 0
## 3 8/1/17 0:02 0 0
## 4 8/1/17 0:03 0 0
## 5 8/1/17 0:04 0 0
## 6 8/1/17 0:05 0 0
## 7 8/1/17 0:06 0 0
## 8 8/1/17 0:07 0 0
## 9 8/1/17 0:08 0 0
## 10 8/1/17 0:09 0 0
## # ... with 44,630 more rows
colnames(new_solar) <- c("Date", "Minutes", "PHA_kWh", "PHB_kWh")
head(new_solar)
## Date Minutes PHA_kWh PHB_kWh
## 1 8/1/17 0:00 0 0
## 2 8/1/17 0:01 0 0
## 3 8/1/17 0:02 0 0
## 4 8/1/17 0:03 0 0
## 5 8/1/17 0:04 0 0
## 6 8/1/17 0:05 0 0
sapply(new_solar, class)
## Date Minutes PHA_kWh PHB_kWh
## "character" "character" "character" "character"
Take the mean of the kWh produced by the two solar panels.
cols.num <- c("PHA_kWh","PHB_kWh")
new_solar[cols.num] <- sapply(new_solar[cols.num],as.numeric)
sapply(new_solar, class)
## Date Minutes PHA_kWh PHB_kWh
## "character" "character" "numeric" "numeric"
There were a couple of "null" entries that were trasnformed to "NAs," which posed a bit of challenge to work with later.
new_solar <- new_solar %>% mutate_if(is.numeric, funs(round(., 5)))
Reference: https://stackoverflow.com/questions/27613310/rounding-selected-columns-of-data-table-in-r
new_solar$SolarSum <- rowSums(new_solar[,3:4], na.rm = TRUE)
tail(new_solar)
## Date Minutes PHA_kWh PHB_kWh SolarSum
## 44635 8/9/17 9:54 0.03795 0.05112 0.08907
## 44636 8/9/17 9:55 0.03795 0.05112 0.08907
## 44637 8/9/17 9:56 0.05284 0.05835 0.11119
## 44638 8/9/17 9:57 0.05284 0.05835 0.11119
## 44639 8/9/17 9:58 0.05284 0.05835 0.11119
## 44640 8/9/17 9:59 0.05284 0.05835 0.11119
new_solar$Date <- format(as.Date(new_solar$Date, format = "%m/%d/%y"))
tail(new_solar)
## Date Minutes PHA_kWh PHB_kWh SolarSum
## 44635 2017-08-09 9:54 0.03795 0.05112 0.08907
## 44636 2017-08-09 9:55 0.03795 0.05112 0.08907
## 44637 2017-08-09 9:56 0.05284 0.05835 0.11119
## 44638 2017-08-09 9:57 0.05284 0.05835 0.11119
## 44639 2017-08-09 9:58 0.05284 0.05835 0.11119
## 44640 2017-08-09 9:59 0.05284 0.05835 0.11119
Separate the date and hour start times into two columns in order to be able to match solar production and weather and create dataframe with the columns: Date / HourBegin / SunshineMinutes. Convert "Date" to ISO 8601 standard date format.
sunshine_df <- as.data.frame(sunshine_raw)
sunshine_df$Date <- sapply(strsplit(as.character(sunshine_df$IntervalStartDt), " "), "[", 1)
sunshine_df$HourBegin <- sapply(strsplit(as.character(sunshine_df$IntervalStartDt), " "), "[", 2)
new_sunshine <- data.frame(sunshine_df$Date,sunshine_df$HourBegin,sunshine_df$Sunshine)
colnames(new_sunshine) <- c("Date", "HourBegin", "SunshineMinutes")
new_sunshine$Date <- format(as.Date(new_sunshine$Date, format = "%m/%d/%Y"))
head(new_sunshine)
## Date HourBegin SunshineMinutes
## 1 2017-08-01 0:00 0
## 2 2017-08-01 1:00 0
## 3 2017-08-01 2:00 0
## 4 2017-08-01 3:00 0
## 5 2017-08-01 4:00 9
## 6 2017-08-01 5:00 60
new_sunshine$PercentSunshine <- round(((new_sunshine$SunshineMinutes/60) * 100), digits = 0)
head(new_sunshine)
## Date HourBegin SunshineMinutes PercentSunshine
## 1 2017-08-01 0:00 0 0
## 2 2017-08-01 1:00 0 0
## 3 2017-08-01 2:00 0 0
## 4 2017-08-01 3:00 0 0
## 5 2017-08-01 4:00 9 15
## 6 2017-08-01 5:00 60 100
Separate the date and hour start times into two columns in order to be able to match solar production and weather and create dataframe with the columns: Date / HourBegin / PercentCloudCover. Convert "Date" to ISO 8601 standard date format.
cloud_df <- as.data.frame(cloud_raw)
cloud_df$Date <- sapply(strsplit(as.character(cloud_df$IntervalStartDt), " "), "[", 1)
cloud_df$HourBegin <- sapply(strsplit(as.character(cloud_df$IntervalStartDt), " "), "[", 2)
new_cloud <- data.frame(cloud_df$Date,cloud_df$HourBegin,cloud_df$CloudCover)
colnames(new_cloud) <- c("Date", "HourBegin", "PercentCloudCover")
new_cloud$Date <- format(as.Date(new_cloud$Date, format = "%m/%d/%Y"))
head(new_cloud)
## Date HourBegin PercentCloudCover
## 1 2017-08-01 00:00 0
## 2 2017-08-01 01:00 0
## 3 2017-08-01 02:00 0
## 4 2017-08-01 03:00 0
## 5 2017-08-01 04:00 0
## 6 2017-08-01 05:00 0
Aggregate the Solar, Sunshine and Cloud Cover data set into one point for each day in the month to see if there is a correlation between sunshine and cloud cover and solar production.
onedate <- new_cloud[c(TRUE,rep(FALSE,23)), ]
head(onedate)
## Date HourBegin PercentCloudCover
## 1 2017-08-01 00:00 0
## 25 2017-08-02 00:00 0
## 49 2017-08-03 00:00 30
## 73 2017-08-04 00:00 30
## 97 2017-08-05 00:00 70
## 121 2017-08-06 00:00 0
SunshineDaily <- round(colMeans(matrix(new_sunshine$PercentSunshine, nrow=24)), digits=1)
SunshineDaily_df <- data.frame(onedate$Date,SunshineDaily)
colnames(SunshineDaily_df) <- c("Date","PercentSunshine")
head(SunshineDaily_df)
## Date PercentSunshine
## 1 2017-08-01 46.2
## 2 2017-08-02 38.6
## 3 2017-08-03 43.1
## 4 2017-08-04 29.7
## 5 2017-08-05 23.6
## 6 2017-08-06 42.9
CloudCoverDaily <- round(colMeans(matrix(new_cloud$PercentCloudCover, nrow=24)), digits=1)
CloudCoverDaily_df <- data.frame(onedate$Date,CloudCoverDaily)
colnames(CloudCoverDaily_df) <- c("Date","PercentCloudCover")
head(CloudCoverDaily_df)
## Date PercentCloudCover
## 1 2017-08-01 12.5
## 2 2017-08-02 25.0
## 3 2017-08-03 20.0
## 4 2017-08-04 43.3
## 5 2017-08-05 50.0
## 6 2017-08-06 13.8
SolarDaily_df <- aggregate(new_solar$SolarSum, list(Day = new_solar$Date), sum, na.rm = TRUE)
colnames(SolarDaily_df) <- c("Date", "kWhProduced")
SolarDaily_df[,-1] <-round(SolarDaily_df[,-1],1)
head(SolarDaily_df)
## Date kWhProduced
## 1 2017-08-01 63.3
## 2 2017-08-02 27.2
## 3 2017-08-03 56.6
## 4 2017-08-04 55.3
## 5 2017-08-05 33.2
## 6 2017-08-06 50.7
AllData_df <- data.frame("Date" = SunshineDaily_df$Date,
"PercentSunshine" = SunshineDaily_df$PercentSunshine,
"PercentCloudCover" = CloudCoverDaily_df$PercentCloudCover,
"kWhProduced" = SolarDaily_df$kWhProduced)
head(AllData_df)
## Date PercentSunshine PercentCloudCover kWhProduced
## 1 2017-08-01 46.2 12.5 63.3
## 2 2017-08-02 38.6 25.0 27.2
## 3 2017-08-03 43.1 20.0 56.6
## 4 2017-08-04 29.7 43.3 55.3
## 5 2017-08-05 23.6 50.0 33.2
## 6 2017-08-06 42.9 13.8 50.7
SunshineSolar_df <- data.frame(SunshineDaily_df$Date,
"PercentSunshine" = SunshineDaily_df$PercentSunshine,
"kWhProduced" = SolarDaily_df$kWhProduced)
colnames(SunshineSolar_df) <- c("Date","PercentSunshine","kWhProduced")
head(SunshineSolar_df)
## Date PercentSunshine kWhProduced
## 1 2017-08-01 46.2 63.3
## 2 2017-08-02 38.6 27.2
## 3 2017-08-03 43.1 56.6
## 4 2017-08-04 29.7 55.3
## 5 2017-08-05 23.6 33.2
## 6 2017-08-06 42.9 50.7
SunshineCloud_df <- data.frame(SunshineDaily_df$Date,
"PercentSunshine" = SunshineDaily_df$PercentSunshine,
"PercentCloudCover" = CloudCoverDaily_df$PercentCloudCover)
colnames(SunshineCloud_df) <- c("Date","PercentSunshine","PercentCloudCover")
head(SunshineCloud_df)
## Date PercentSunshine PercentCloudCover
## 1 2017-08-01 46.2 12.5
## 2 2017-08-02 38.6 25.0
## 3 2017-08-03 43.1 20.0
## 4 2017-08-04 29.7 43.3
## 5 2017-08-05 23.6 50.0
## 6 2017-08-06 42.9 13.8
SunshineCloudMelt_df <- melt(SunshineCloud_df, id.vars = 'Date')
head(SunshineCloudMelt_df)
## Date variable value
## 1 2017-08-01 PercentSunshine 46.2
## 2 2017-08-02 PercentSunshine 38.6
## 3 2017-08-03 PercentSunshine 43.1
## 4 2017-08-04 PercentSunshine 29.7
## 5 2017-08-05 PercentSunshine 23.6
## 6 2017-08-06 PercentSunshine 42.9
CloudPlot <- ggplot(data=AllData_df, aes(x=Date, y=PercentCloudCover, group=1)) +
geom_point(color="#0066ff", size=2) +
ggtitle("Percent Cloud Cover by Hour in White Plains, NY for Each Day August 2017") +
labs(x="Date in August 2017", y="Average Percent of Cloud Cover per Day") +
theme(axis.title.y = element_text(size=12, color="#666666")) +
theme(legend.position = "none",
axis.text.x = element_text(angle = 65, hjust = 1)) +
theme(axis.text = element_text(size=12, family="Trebuchet MS")) +
theme(plot.title = element_text(size=12, family="Trebuchet MS", face="bold", hjust=0, color="#666666"))
CloudPlot
SunshinePlot <- ggplot(data=SunshineDaily_df, aes(x=Date, y=PercentSunshine, group=1)) +
geom_point(color="blue", size=2) +
ggtitle("Average Percentage of Sunshine by Hour in White Plains, NY for Each Day August 2017") +
labs(x="Date in August 2017", y="Average Percentage of Sunshine per Hour each Day") +
theme(axis.title.y = element_text(size=12, color="#666666")) +
theme(legend.position = "none",
axis.text.x = element_text(angle = 65, hjust = 1)) +
theme(axis.text = element_text(size=12, family="Trebuchet MS")) +
theme(plot.title = element_text(size=12, family="Trebuchet MS", face="bold", hjust=0, color="#666666"))
SunshinePlot
SolarPlot <- ggplot(data=AllData_df, aes(Date, y=kWhProduced)) +
geom_point(color="red", size=2) +
ggtitle("Total kWh Produced in White Plains, NY for Each Day August 2017") +
labs(x="Date in August 2017", y="Total Solar Produced in kWh each Day") +
theme(axis.title.y = element_text(size=12, color="#666666")) +
theme(legend.position = "none",
axis.text.x = element_text(angle = 65, hjust = 1)) +
theme(axis.text = element_text(size=12, family="Trebuchet MS")) +
theme(plot.title = element_text(size=12, family="Trebuchet MS", face="bold", hjust=0, color="#666666"))
SolarPlot
SunCloudPlot <- ggplot(SunshineCloudMelt_df, aes(x=Date, y = value, fill = variable)) +
geom_bar(stat='identity') +
ggtitle("Percentage of Cloud Cover and Sunshine Minutes in White Plains, NY for Each Day August 2017") +
labs(x="Date in August 2017", y="Percentage") +
theme(axis.title.y = element_text(size=12, color="#666666")) +
theme(legend.position = 'bottom',
axis.text.x = element_text(angle = 65, hjust = 1)) +
theme(axis.text = element_text(size=12, family="Trebuchet MS")) +
theme(plot.title = element_text(size=12, family="Trebuchet MS", face="bold", hjust=0, color="#666666"))
SunCloudPlot
SunshineSolarPlot <- ggplot(SunshineSolar_df, aes(Date, group=1)) +
geom_line(aes(y = kWhProduced, colour = "SolarProduced")) +
geom_line(aes(y = PercentSunshine, colour = "PercentSunshine")) +
ggtitle("Total Solar Produced and Average Percent Sunshine in White Plains, NY for Each Day August 2017") +
labs(x="Date in August 2017", y="Percentage") +
theme(axis.title.y = element_text(size=12, color="#666666")) +
theme(legend.position = 'bottom',
axis.text.x = element_text(angle = 65, hjust = 1)) +
theme(axis.text = element_text(size=12, family="Trebuchet MS")) +
theme(plot.title = element_text(size=12, family="Trebuchet MS", face="bold", hjust=0, color="#666666"))
SunshineSolarPlot
AllDataPlot <- ggplot(AllData_df, aes(Date, group=1)) +
geom_line(aes(y = kWhProduced, colour = "SolarProduced")) +
geom_line(aes(y = PercentCloudCover, colour = "PercentCloudCover")) +
geom_line(aes(y = PercentSunshine, colour = "PercentSunshine")) +
ggtitle("Total Solar Produced and Average Percent Sunshine and Cloud Cover in White Plains, NY for Each Day August 2017") +
labs(x="Date in August 2017", y="Percentage") +
theme(axis.title.y = element_text(size=12, color="#666666")) +
theme(legend.position = 'bottom',
axis.text.x = element_text(angle = 65, hjust = 1)) +
theme(axis.text = element_text(size=12, family="Trebuchet MS")) +
theme(plot.title = element_text(size=12, family="Trebuchet MS", face="bold", hjust=0, color="#666666"))
AllDataPlot
We expected to see the solar production data from the two panels at NYPA's White Plain office to be highly dependent on weather, both the intensity of the solar radiation on the panels represented in this data set as sunshine minutes and also the cloud cover percentage.
SUNSHINE MINUTES ON SOLAR PRODUCTION We observed a strong correlation between solar production and the percentage of sunshine minutes in an hour throughout the month. As sunshine is high as it was August 9 and 10, there is alos a high produciton of solar. There is a spreading of the lines in the last third of the month when looking at solar production and sunshine, but the corrleation still exists, as demonstrated on August 29 with a dip in production on the same day as a dip in sunshine minutes.
CLOUD COVER PERCENTAGE ON SOLAR PRODUCTION Similarly, there is an inverse correlation between cloud cover and solar produced on most days. For example, August 15 the cloud cover is very high and the solar production and sunshine minutes are both low as one would expect. There are several observations where the correlation does not hold. For example, the dip in cloud cover on August 17 does not result in a spike of either sunshine or solar production, suggestion that the intermittent and variable density nature of cloud cover makes it a imperfect predictor of solar production.
The better engineers and electric grid system operators are able to understand the impact of weather on solar production, the easier it will be to deploy and manage more solar resources throughout the system.