Introduction

Solar Resource data is about various solar data types for a location. Specifically, I will work with these output fields:

The goal is to query the solar resource data API and extract the result into an easily manipulable R object. The result of this project will be a function that can be readily used to extract dataframes from complex lists.

#Libraries
library(httr)
library(dplyr)
library(ggplot2)
library(tidyr)
url <- "https://developer.nrel.gov/api/solar/solar_resource/v1.json"
parameters_list <- list(api_key=my_api_key, lat=41, lon=-75)
#Query API
response <- GET(url, query = parameters_list)
#Check for errors
if(http_error(response)){
  print(status_code(response))
  stop("Something went wrong", .call = FALSE)
}
response_content <- content(response, "text")
#Convert response to an R object
json_lists <- jsonlite::fromJSON(response_content)
str(json_lists)
## List of 6
##  $ version : chr "1.0.0"
##  $ warnings: list()
##  $ errors  : list()
##  $ metadata:List of 1
##   ..$ sources: chr "Perez-SUNY/NREL, 2012"
##  $ inputs  :List of 2
##   ..$ lat: chr "41"
##   ..$ lon: chr "-75"
##  $ outputs :List of 3
##   ..$ avg_dni     :List of 2
##   .. ..$ annual : num 3.69
##   .. ..$ monthly:List of 12
##   .. .. ..$ jan: num 3.12
##   .. .. ..$ feb: num 3.36
##   .. .. ..$ mar: num 4.1
##   .. .. ..$ apr: num 4.07
##   .. .. ..$ may: num 4.15
##   .. .. ..$ jun: num 4.17
##   .. .. ..$ jul: num 4.6
##   .. .. ..$ aug: num 4.14
##   .. .. ..$ sep: num 4.02
##   .. .. ..$ oct: num 3.26
##   .. .. ..$ nov: num 2.58
##   .. .. ..$ dec: num 2.72
##   ..$ avg_ghi     :List of 2
##   .. ..$ annual : num 3.87
##   .. ..$ monthly:List of 12
##   .. .. ..$ jan: num 1.97
##   .. .. ..$ feb: num 2.69
##   .. .. ..$ mar: num 3.86
##   .. .. ..$ apr: num 4.7
##   .. .. ..$ may: num 5.45
##   .. .. ..$ jun: num 5.78
##   .. .. ..$ jul: num 5.98
##   .. .. ..$ aug: num 5.14
##   .. .. ..$ sep: num 4.23
##   .. .. ..$ oct: num 2.94
##   .. .. ..$ nov: num 1.99
##   .. .. ..$ dec: num 1.67
##   ..$ avg_lat_tilt:List of 2
##   .. ..$ annual : num 4.52
##   .. ..$ monthly:List of 12
##   .. .. ..$ jan: num 3.55
##   .. .. ..$ feb: num 4.04
##   .. .. ..$ mar: num 4.86
##   .. .. ..$ apr: num 4.97
##   .. .. ..$ may: num 5.18
##   .. .. ..$ jun: num 5.24
##   .. .. ..$ jul: num 5.58
##   .. .. ..$ aug: num 5.24
##   .. .. ..$ sep: num 5
##   .. .. ..$ oct: num 4.11
##   .. .. ..$ nov: num 3.26
##   .. .. ..$ dec: num 3.13

How to Create Dataframe from Complex List

Extracting the required lists and creating a data frame out of this complex JSON response can be done in many ways.

I will first try to extract the lists separately.

#Extracting required lists to create data frame
outputs <- json_lists$outputs
str(outputs)
## List of 3
##  $ avg_dni     :List of 2
##   ..$ annual : num 3.69
##   ..$ monthly:List of 12
##   .. ..$ jan: num 3.12
##   .. ..$ feb: num 3.36
##   .. ..$ mar: num 4.1
##   .. ..$ apr: num 4.07
##   .. ..$ may: num 4.15
##   .. ..$ jun: num 4.17
##   .. ..$ jul: num 4.6
##   .. ..$ aug: num 4.14
##   .. ..$ sep: num 4.02
##   .. ..$ oct: num 3.26
##   .. ..$ nov: num 2.58
##   .. ..$ dec: num 2.72
##  $ avg_ghi     :List of 2
##   ..$ annual : num 3.87
##   ..$ monthly:List of 12
##   .. ..$ jan: num 1.97
##   .. ..$ feb: num 2.69
##   .. ..$ mar: num 3.86
##   .. ..$ apr: num 4.7
##   .. ..$ may: num 5.45
##   .. ..$ jun: num 5.78
##   .. ..$ jul: num 5.98
##   .. ..$ aug: num 5.14
##   .. ..$ sep: num 4.23
##   .. ..$ oct: num 2.94
##   .. ..$ nov: num 1.99
##   .. ..$ dec: num 1.67
##  $ avg_lat_tilt:List of 2
##   ..$ annual : num 4.52
##   ..$ monthly:List of 12
##   .. ..$ jan: num 3.55
##   .. ..$ feb: num 4.04
##   .. ..$ mar: num 4.86
##   .. ..$ apr: num 4.97
##   .. ..$ may: num 5.18
##   .. ..$ jun: num 5.24
##   .. ..$ jul: num 5.58
##   .. ..$ aug: num 5.24
##   .. ..$ sep: num 5
##   .. ..$ oct: num 4.11
##   .. ..$ nov: num 3.26
##   .. ..$ dec: num 3.13
avg_dni_monthly <- outputs$avg_dni$monthly
avg_ghi_monthly <- outputs$avg_ghi$monthly
avg_lat_monthly <- outputs$avg_lat_tilt$monthly

dataframe <- tibble::tibble(month = month.abb, avg_dni = avg_dni_monthly, avg_ghi = avg_ghi_monthly, avg_lat = avg_lat_monthly)
print(dataframe)
## # A tibble: 12 x 4
##    month avg_dni      avg_ghi      avg_lat     
##    <chr> <named list> <named list> <named list>
##  1 Jan   <dbl [1]>    <dbl [1]>    <dbl [1]>   
##  2 Feb   <dbl [1]>    <dbl [1]>    <dbl [1]>   
##  3 Mar   <dbl [1]>    <dbl [1]>    <dbl [1]>   
##  4 Apr   <dbl [1]>    <dbl [1]>    <dbl [1]>   
##  5 May   <dbl [1]>    <dbl [1]>    <dbl [1]>   
##  6 Jun   <dbl [1]>    <dbl [1]>    <dbl [1]>   
##  7 Jul   <dbl [1]>    <dbl [1]>    <dbl [1]>   
##  8 Aug   <dbl [1]>    <dbl [1]>    <dbl [1]>   
##  9 Sep   <dbl [1]>    <dbl [1]>    <dbl [1]>   
## 10 Oct   <dbl [1]>    <dbl [1]>    <dbl [1]>   
## 11 Nov   <dbl [1]>    <dbl [1]>    <dbl [1]>   
## 12 Dec   <dbl [1]>    <dbl [1]>    <dbl [1]>

The data columns still show up as lists. For future use of this data frame, these will need to be converted to numeric data type.

Another approach to simplify the complex list is by extracting the outputs list and using unlist() to flatten them out, creating a matrix of desired columns and converting to data frame.

unlist_outputs <- unlist(json_lists$outputs)
matrix_outputs <- matrix(unlist_outputs, nrow = 13)
matrix_outputs <- matrix_outputs[-1,]
df_outputs <- as.data.frame(matrix_outputs)
print(df_outputs)
##      V1   V2   V3
## 1  3.12 1.97 3.55
## 2  3.36 2.69 4.04
## 3  4.10 3.86 4.86
## 4  4.07 4.70 4.97
## 5  4.15 5.45 5.18
## 6  4.17 5.78 5.24
## 7  4.60 5.98 5.58
## 8  4.14 5.14 5.24
## 9  4.02 4.23 5.00
## 10 3.26 2.94 4.11
## 11 2.58 1.99 3.26
## 12 2.72 1.67 3.13

The values for each month is now appearing correctly in numeric form.

Creating Function that Returns Perfect Dataframe

Now a function can be created which performs this task perfectly. It can be reused in all future scripts that query APIs and extract the response as a manipulable R object.

solar_resource_api <- function(endpoint, queries=list()){
  url <- modify_url("https://developer.nrel.gov", path=endpoint)
  response <- GET(url, query = queries)
  #Check for errors
  if(http_error(response)){
    print(status_code(response))
    stop("Something went wrong", .call = FALSE)
  }
  response_content <- content(response, "text")
  #Convert response to an R object
  json_lists <- jsonlite::fromJSON(response_content)
  
  unlist_outputs <- unlist(json_lists$outputs)
  matrix_outputs <- matrix(unlist_outputs, nrow = 13)
  matrix_outputs <- matrix_outputs[-1,]
  df_outputs <- as.data.frame(matrix_outputs)
  #Add months column
  month <- month.abb
  df_outputs <- cbind(month, df_outputs)
  dimnames(df_outputs)[[2]] <- c("month", "avg_dni", "avg_ghi", "avg_lat_tilt")
  return(df_outputs)
}

solar_resource_df <- solar_resource_api("api/solar/solar_resource/v1.json", list(api_key=my_api_key, lat=41, lon=-75))
solar_resource_df <- solar_resource_df %>% mutate(month = factor(month, levels = month.abb))
print(solar_resource_df)
##    month avg_dni avg_ghi avg_lat_tilt
## 1    Jan    3.12    1.97         3.55
## 2    Feb    3.36    2.69         4.04
## 3    Mar    4.10    3.86         4.86
## 4    Apr    4.07    4.70         4.97
## 5    May    4.15    5.45         5.18
## 6    Jun    4.17    5.78         5.24
## 7    Jul    4.60    5.98         5.58
## 8    Aug    4.14    5.14         5.24
## 9    Sep    4.02    4.23         5.00
## 10   Oct    3.26    2.94         4.11
## 11   Nov    2.58    1.99         3.26
## 12   Dec    2.72    1.67         3.13

Visualizing New York City Solar Resource Data

ggplot(data = solar_resource_df,
       aes(x = month, y = avg_dni, group = 1)) +
  geom_line() +
  geom_point() +
  theme_bw() +
  ggtitle("New York Solar Direct Irradiation for Each Month") +
  xlab("Month") +
  ylab("Average DNI")

We notice that solar irradiation is the highest in New York City around July. This period corresponds to summer.