Objective

This report investigates how precipitation (and, later, other weather variables) can improve solar forecasting with our argmax probability model. The data in this report come from Schofield Barracks (SCBH1).

At SCBH1, solar radiation is sampled hourly, and the record spans 2003 to 2014. The first rows of the raw data are:

MON DAY YEAR HR MIN TMZN TMPF RELH SKNT GUST DRCT QFLG SOLR TLKE PREC SINT FT FM PEAK HI24 LO24 PDIR VOLT DWPF
1 1 2003 0 55 HST 57 92 5 5 240 0 0 NA 18.94 NA 57 20 NA NA NA NA NA 54.7
1 1 2003 1 55 HST 59 87 3 6 240 0 0 NA 18.94 NA 55 21 NA NA NA NA NA 55.1
1 1 2003 2 55 HST 58 92 3 3 230 0 0 NA 18.94 NA 55 22 NA NA NA NA NA 55.7
1 1 2003 3 55 HST 57 92 3 5 250 0 0 NA 18.94 NA 56 23 NA NA NA NA NA 54.7
1 1 2003 4 55 HST 58 92 2 4 230 0 0 NA 18.94 NA 56 24 NA NA NA NA NA 55.7
1 1 2003 5 55 HST 57 94 3 5 240 0 0 NA 18.94 NA 54 26 NA NA NA NA NA 55.3

Pre-Processing

Missing Data-Points

Currently, we remove any daily vector that contains a missing value between 8h and 17h. Furthermore, the probability model discards any window that contains a missing day.
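The two filters above can be sketched as follows. This is a minimal illustration, not the report's actual code: the function names are hypothetical, missing values are assumed to be encoded as None or NaN, and a "window" is assumed to be a run of consecutive days, since its exact definition is not given in this section.

```python
import math

def is_missing(value):
    """Treat None and NaN as missing (assumed encoding of the raw 'NA')."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def drop_incomplete_days(vectors):
    """Keep only days whose 8h-17h vector has no missing value.

    `vectors` maps (year, month, day) -> list of readings for hours 8..17.
    """
    return {
        date: values
        for date, values in vectors.items()
        if not any(is_missing(x) for x in values)
    }

def complete_windows(dates, kept, size):
    """Yield each run of `size` consecutive entries of `dates`
    in which every day survived the filter (is a key of `kept`)."""
    for i in range(len(dates) - size + 1):
        window = dates[i:i + size]
        if all(d in kept for d in window):
            yield window
```

Here `dates` is expected to be the full, ordered list of calendar days, so that a removed day breaks every window that would have included it.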

Vectorization

We vectorize each weather variable of interest (currently solar and precipitation) in the dataset into daily numeric vectors. Each row of the resulting dataset is a day of the year, and its 24 columns hold the variable's value (here, solar radiation) at hours 0, 1, …, 23. For instance, the first few rows of solar radiation (only the columns for hours 8 to 17 are shown) are:

YEAR MON DAY 8 9 10 11 12 13 14 15 16 17
2003 1 1 322 507 746 790 770 874 645 402 94 7
2003 1 2 131 236 197 225 213 179 145 57 27 5
2003 1 3 314 166 288 470 323 194 161 423 46 7
2003 1 4 64 130 808 793 807 752 150 68 20 3
2003 1 5 53 644 691 821 244 269 236 404 54 10
2003 1 6 226 730 970 500 785 402 632 403 199 10

Since the early and late hours of the day contain little to no solar radiation, we keep only the hours from 8 to 17 for all weather variables of interest. For instance, for solar:

YEAR MON DAY 8 9 10 11 12 13 14 15 16 17
2003 1 1 322 507 746 790 770 874 645 402 94 7
2003 1 2 131 236 197 225 213 179 145 57 27 5
2003 1 3 314 166 288 470 323 194 161 423 46 7
2003 1 4 64 130 808 793 807 752 150 68 20 3
2003 1 5 53 644 691 821 244 269 236 404 54 10
2003 1 6 226 730 970 500 785 402 632 403 199 10
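As an illustration of the vectorization step, the sketch below turns whitespace-separated hourly rows (in the raw layout shown in the Objective section) into one vector per day, restricted to hours 8 to 17. The function names are hypothetical and the handling of absent hours (filled with None) is an assumption, not the report's documented behaviour.

```python
from collections import defaultdict

def parse_value(token):
    """Convert a raw token to float; the string 'NA' becomes None."""
    return None if token == "NA" else float(token)

def vectorize(lines, variable, hours=range(8, 18)):
    """Build daily vectors for one column of the hourly table.

    `lines[0]` is the header row; `variable` is a column name such as
    'SOLR' or 'PREC'. Returns {(year, month, day): [value per hour]}.
    Hours with no record are filled with None (an assumption).
    """
    header = lines[0].split()
    col = {name: i for i, name in enumerate(header)}
    by_day = defaultdict(dict)
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip malformed rows
        date = (int(fields[col["YEAR"]]),
                int(fields[col["MON"]]),
                int(fields[col["DAY"]]))
        by_day[date][int(fields[col["HR"]])] = parse_value(fields[col[variable]])
    return {date: [obs.get(h) for h in hours] for date, obs in by_day.items()}
```

Calling `vectorize(lines, "SOLR")` and `vectorize(lines, "PREC")` on the same rows would give the two sets of daily vectors used in the rest of the report.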

Discretization

To calculate the probability distributions, we first need to discretize each weather variable. The discretization method depends on the variable.

Solar - 5-Means Clustering

For the solar radiation vectors, we apply 5-means clustering. We chose 5 clusters because this gave a clear separation between the daily solar radiation profiles over the years in the dataset.

The ids of the centroids (1 to 5) are ordered by the total solar intensity of each centroid (5 is highest, 1 is lowest). The centroids, listed from id 1 to id 5, are:

8 9 10 11 12 13 14 15 16 17
96.45584 145.7208 184.0028 208.9416 225.6738 235.2094 239.6068 238.3433 203.0014 111.5228
275.89212 431.7226 542.3784 549.3476 472.7911 394.2021 338.9863 254.0959 164.6301 67.0274
171.47927 302.4083 428.9764 542.8919 646.5695 679.3775 633.5432 530.5714 347.3910 149.9687
362.70761 583.1298 739.1073 815.1869 784.6263 689.3010 566.5087 413.2837 266.8270 109.6263
355.95481 572.0663 756.5471 896.4277 969.5854 957.8625 856.8865 690.5565 459.9856 198.4452
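The clustering and the id ordering can be sketched as below. The report does not say which k-means implementation or initialization was used, so this is a plain Lloyd's algorithm with random initial centroids and a fixed seed, written only to show how cluster ids can be reassigned by total centroid intensity; all names are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def kmeans_ordered(vectors, k=5, iterations=100, seed=0):
    """Cluster daily vectors, then relabel clusters 1..k by the sum of
    each centroid (1 = lowest total intensity, k = highest).

    Returns (centroids ordered by id, list of ids, one per input vector).
    """
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)  # random initial centroids
    assign = None
    for _ in range(iterations):
        new_assign = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j]))
            for v in vectors
        ]
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
        for j in range(k):
            members = [v for v, a in zip(vectors, assign) if a == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = mean_vector(members)
    # Rank clusters by total intensity and map old index -> new id.
    order = sorted(range(k), key=lambda j: sum(centroids[j]))
    new_id = {old: rank + 1 for rank, old in enumerate(order)}
    return [centroids[j] for j in order], [new_id[a] for a in assign]
```

With `k=5` and the 8h to 17h solar vectors as input, each day would receive an id from 1 (lowest total radiation) to 5 (highest). Results depend on the initialization, so a different seed or library may give slightly different centroids than those listed above.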

Precipitation - Binning

For precipitation, we observed almost no variation in the values throughout the year compared to SOLR: