Executive Summary
While several forms of alternative (non-fossil fuel) energy have been in the headlines for years, the solar power industry has recently received much attention due to numerous changes in the industry as well as recent weather initiated power failures.
The National Renewable Energy Lab (NREL) publicly available data serve as the foundation, giving detailed intra-day measurements. These data are used to explore potential geo-related detractors from consumer perceived energy savings.
These results are packaged in an R tool (Shiny application), in which the user can select a location to view the details. The bulk of this work is written and tested using SAS 9.4.
Main purpose
The goal of this document is to highlight SAS language skills…the R functionality is limited to generating this HTML document (written in markdown language), reading the raw SAS programs, and creating the Shiny app. As such, the SAS code can be copied and applied to other data from NREL in a standalone SAS session.
The hope is to qualify/confirm
- efficient programming techniques – look for SAS Code Notes for commentary around the code
- excellent communication skills
- topical flexibility – the author has never worked in the Solar space and has an Economics background
Please note, in terms of sophistication, I have left out – but have experience with and can provide – more complex solutions to bigger data challenges. These other, deep experiences, are left for discussion. For example, there are no SAS ARRAY statements, which is an approach I take often (data just did not suit in this case); when possible, it may appear I use an approach not really necessary, but do so in an effort to convey knowledge (e.g., hash lookup of OIN feature).
Data
The NREL data comes from either:
- a web-based API where the user sets several parameters and downloads area-based data
- program a different API to download the data
Option two was used to use data to drive the process. Code was written in R to take a user input – latitude and longitude coordinates – and generate data for the nearest cell (a cell in this case means geographic area), plus the cells nearby (therefore a total of 9 cells come back from the pull).
These extracts are on an annual basis, therefore many files are involved for each user location. These numerous files are an excellent candidate for SAS macro language. Below are the details for the import_nrel macro:
%let folder="e:/";
%let fnx=".csv";
%macro import_nrel (fnx);
proc import file="&folder/&fnx" dbms=CSV replace out=&dsout;
run;
proc append base=&location data=&dsout;
run;
%mend import_nrel;
* obtain list of available .csv files for the location...;
In the code above, each file will be imported and appended to the final dataset (which will have the same name as location).
SAS Code Notes
- A GLOBAL macro variable
folder defines the location of the raw CSV files. There are 90 files per location (10 years X (1 coordinate + surrounding 8 coordinates)).
- PROC APPEND is used to avoid repeatedly reading the base file (as in using a SET statement). This also requires much less typing.
The next step is to create some variables that will make interpretation easier. For example, features in metric units (cm) are converted to english (inches). The main “energy” variable is GHI (global horizontal irradiance), which is provided as a variable, but a second version is calculated using the industry standard formula as a verification step (100% agreement).
As the topic of interest is reduction to expected energy output, ancillary data is gathered describing
| US Forest Fires |
Smoke derates GHI |
SW Colorado affected by Forest Fire smoke in areas West |
| El Nino/La Nino oceanic effects |
Cloudiness derates GHI |
SW Colorado generally sees more cloudiness near El Nino events |
These ancillary data are brought in to examine any correlation between GHI and large atmospheric conditions changes signaled by presence of El Nino or major forest fires. In Colorado, the prevailing winds come from the West – smoke from forest fire activity in California (for example) has been observed by the author on several, multi-day occasions throughout the past decade. Also related to southwest Colorado, the area is home to the San Juan mountain range, which generally sees a rise in cloudiness and humidity during El Nino events.
The code below uses SAS hash language to read in the OIN data into a hash table – this keeps the lookup values in memory, allowing for one pass through the data and avoiding sorts/merges.
proc import file="&folder/&fnx" dbms=CSV replace out=&dsout;
run;
proc append base=&location data=&dsout;
run;
%mend import_nrel;
* obtain list of available .csv files for the location...;
Exploratory Data Analysis
The following section goes into EDA, but we will spend some time covering some advanced SAS plotting techniques (PROC SGPLOT). These output will be compared to that of R. The plots were created in SAS and saved to disk, then read into this markdown document.

Model
App
Conclusion
Appendix A: Citations
Appendix B: SAS Code
Appendix C: R Code
Appendix D: Processing System