Introduction

This tutorial outlines options for querying data from the aWhere API using the R programming language. It details how to query data for one location or for a set of locations, and for flexible time periods.

While it is possible that readers without any R experience can follow along, this tutorial assumes a relatively advanced level of understanding of R and the RStudio Integrated Development Environment (IDE). This understanding should include a grasp of topics like basic data structures in R, working with data frames, and loading packages.

Note that in all of the following examples, code inputs are given in a gray box, and output from the code being run immediately follows the gray-boxed code.

Setting up Your Environment

In order to perform queries, you should first install the aWhereAPI R package (installation is a one-time operation), load the library and any supplementary libraries you may need, and authenticate yourself to the aWhere API.


#install the aWhere R package and its dependencies - only needs to be done once
#(uncomment the three lines below the first time you run this script)
#install.packages('devtools')
#install.packages(c('chron', 'magrittr', 'bitops', 'DBI', 'assertthat', 'Rcpp', 'tibble'))
#devtools::install_github("aWhereAPI/aWhere-R-Library")

#load libraries that will be needed
suppressWarnings(suppressPackageStartupMessages(library(tidyr)))
suppressWarnings(suppressPackageStartupMessages(library(dplyr)))
suppressWarnings(suppressPackageStartupMessages(library(aWhereAPI)))

#set the path to your working directory
setwd("~/R Working Directory/API")

#authenticate yourself to the aWhere API - your key and secret are unique to your account
#replace the placeholder strings below with your own key and secret
key <- "API consumer key"
secret <- "API consumer secret"

#send your credentials to the API to receive an access token
get_token(key, secret)
## $error
## [1] FALSE
## 
## $error_message
## NULL
## 
## $token
## [1] "0oREvcoXAFV3xxwTAfA1OVQeGddn"
#set some data query parameters
lat <-        -20.465133
lon <-        -42.081328
day_start <-  "2017-06-01"
day_end <-    "2017-12-15"
monthday_start <- "06-01"
monthday_end <- "12-15"
year_start <- 2008
year_end <-   2016

Each API user sets their own username and password when they register an account at developer.awhere.com. The aWhere system then generates a unique key and secret for each App created under the user account. This means a user may have multiple apps under their account, each with its own key/secret combo. The key and secret are the credentials you use to authenticate yourself to the API.

Many companies have APIs from which users can pull data, and setting up your credentials correctly can be technically challenging. Fortunately, the aWhereAPI package contains the useful get_token() command, which takes your key and secret as inputs and performs all of the necessary technical operations behind the scenes. If successful, the command returns and stores a “token” in your environment, meaning a time-limited authorization code. Tokens are only valid for one hour, but the get_token() command only needs to be run successfully once in a single R session - your token will be automatically refreshed and included in API queries from then on.

The code above also tells R the latitude, longitude, start date, and end date of the data query you wish to run, and the start year and end year to use for long-term normal calculations. Now you’re ready to start running basic commands from the aWhereAPI package.
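
As an alternative to typing the key and secret directly into a script, the credentials could be read from environment variables. The sketch below is one possible approach, not part of the aWhereAPI package: the helper name read_awhere_credentials() and the variable names AWHERE_KEY and AWHERE_SECRET are assumptions made for this example.

```r
# Read API credentials from environment variables instead of hard-coding them.
# AWHERE_KEY and AWHERE_SECRET are hypothetical names chosen for this sketch;
# they could be defined, for example, in a ~/.Renviron file.
read_awhere_credentials <- function(key_var = "AWHERE_KEY",
                                    secret_var = "AWHERE_SECRET") {
  key <- Sys.getenv(key_var, unset = "")
  secret <- Sys.getenv(secret_var, unset = "")
  # Stop early with a readable message if either value is missing
  if (!nzchar(key) || !nzchar(secret)) {
    stop("Set ", key_var, " and ", secret_var, " before requesting a token.")
  }
  list(key = key, secret = secret)
}

# Usage (commented out because it contacts the API):
# creds <- read_awhere_credentials()
# get_token(creds$key, creds$secret)
```

Keeping credentials out of the script makes it easier to share the code without exposing the key and secret.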

Creating Fields

Fields are an optional way for users to register points in the aWhere API that they intend to pull data for repeatedly. Registering a field is a one-time operation using the command create_field(), and once registered, you will be able to use the field_id to query data just as you would a latitude/longitude combo.


################FIELDS#######################
###Create Field
#This will create a field in your aWhere account at the following location.  
#Update location and ID to create additional fields.

aWhereAPI::create_field(field_id = "test_kmpala", 
             latitude = "0.403444", longitude = "32.560327", 
             farm_id = "Test")
## No encoding supplied: defaulting to UTF-8.
## Operation Complete
aWhereAPI::create_field(field_id = "test_kampala", 
             latitude = "0.403444", longitude = "32.560327", 
             farm_id = "Test")
## No encoding supplied: defaulting to UTF-8.
## Operation Complete
###Get Fields List
#This will output a list of all the fields currently stored in your aWhere account.
#Note: The parentheses are empty, as the API only needs your stored credentials

get_fields()
###Delete a Field
#This will delete a field from your list (referenced by field_id)

delete_field("test_kmpala")
## No encoding supplied: defaulting to UTF-8.
## Operation Complete

You can check which fields are already registered under your account with the command get_fields() (run with no parameters inside the parentheses), and you can delete a field (for example, if you misspell the ID you wished to assign to it, as in the example above) by passing its field_id to the command delete_field().

NOTE: Users may query data using a latitude/longitude point without ever registering it as a field. However, if users anticipate querying the same location (or locations) repeatedly, it may be useful to register the field and assign a common name to it - this aids in the ease of querying and interpretability of results.
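
When several locations need to be registered, the create_field() call shown above could be wrapped in a loop. The following is a minimal sketch under stated assumptions: the helper name register_fields(), the column names of the locations table, and the second field ID are all hypothetical. The create_fun argument exists only so the loop can be tried with a stand-in function before touching a real account.

```r
# Register one field per row of a data frame.
# Assumed columns: field_id, latitude, longitude (names chosen for this sketch).
register_fields <- function(locations, farm_id,
                            create_fun = aWhereAPI::create_field) {
  stopifnot(all(c("field_id", "latitude", "longitude") %in% names(locations)))
  for (i in seq_len(nrow(locations))) {
    # Coordinates are passed as strings, as in the create_field() example above
    create_fun(field_id  = locations$field_id[i],
               latitude  = as.character(locations$latitude[i]),
               longitude = as.character(locations$longitude[i]),
               farm_id   = farm_id)
  }
  invisible(locations$field_id)
}

# Example table: the first row reuses the coordinates from this tutorial,
# the second field ID is made up for illustration.
locations <- data.frame(
  field_id  = c("test_kampala", "test_site_2"),
  latitude  = c(0.403444, -20.465133),
  longitude = c(32.560327, -42.081328),
  stringsAsFactors = FALSE
)

# Usage (commented out because it modifies your aWhere account):
# register_fields(locations, farm_id = "Test")
```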

Querying Daily Weather Data

Daily weather information is available in the API via three endpoints: the Observations endpoint (for historical weather data like precipitation and temperature), the Agronomics endpoint (for historical calculated variables like potential evapotranspiration and growing degree days), and the Forecasts endpoint (for data for the current date plus 7 days into the future). Calls to the Observations endpoint can be made with either the daily_observed_fields() command or the daily_observed_latlng() command, depending on how the user chooses to reference the location.


################DATA########################
####Daily observed weather data - input name/id of field or latitude/longitude points
#This pulls the data and creates datasets in R titled "obs1" and "obs2" that can be viewed later.
#The two lines below retrieve the same weather data, by field ID/name or by coordinates.
#The calls reference the lat, lon, day_start, and day_end objects you created above. They
#can also reference a field you have created. Note that the function names are different
#depending on which way you choose to reference the location.

obs1 <- daily_observed_fields("test_kampala", 
                              day_start = day_start, 
                              day_end = day_end)

obs2 <- daily_observed_latlng(lat = lat, lon = lon, 
                              day_start = day_start, 
                              day_end = day_end)

head(obs1)

As with all R Packages, documentation to help you use custom commands like these from the aWhereAPI package can be accessed through standard R help functions like ??, and by utilizing the Packages & Help panels in the bottom right of your RStudio session.

The commands above return data frames of daily observed weather data, such as maximum and minimum temperature, humidity, and precipitation. Each data frame includes data for the full span of days from your chosen day_start to day_end. Similar queries can be sent to the Agronomics endpoint, as in the following examples.


###Daily observed agronomic data
#This creates datasets in R titled "agro1" and "agro2"

agro1 <- agronomic_values_fields("test_kampala", 
                                 day_start = day_start, 
                                 day_end = day_end)

agro2 <- agronomic_values_latlng(lat, lon, 
                                day_start = day_start, 
                                day_end = day_end)

head(agro1)

Data returned by the above commands covers variables like PET (potential evapotranspiration) which is calculated based on the values from the observed dataset, and which gives insight into the agronomic conditions on the ground conducive to plant growth. High PET, for example, may indicate conditions too hot and dry for plant growth.
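
The introduction mentions querying a set of locations. One way to do this is to repeat the latitude/longitude call for each row of a table and stack the results, as in the sketch below. It is only an illustration under assumptions: the helper name query_locations(), the columns id, lat, and lon, and the added location_id column are hypothetical, and the result of each query is assumed to be a plain data frame with the same columns.

```r
# Run the same daily query for every row of a locations table and
# stack the results into one data frame.
query_locations <- function(locations, day_start, day_end,
                            query_fun = aWhereAPI::daily_observed_latlng) {
  results <- lapply(seq_len(nrow(locations)), function(i) {
    # Same call shape as the single-location examples above
    df <- query_fun(locations$lat[i], locations$lon[i],
                    day_start = day_start, day_end = day_end)
    # Tag each row so the source location is still known after stacking
    df$location_id <- locations$id[i]
    df
  })
  do.call(rbind, results)
}

# Example table reusing coordinates from this tutorial; the ids are made up.
sites <- data.frame(
  id  = c("site_a", "site_b"),
  lat = c(-20.465133, 0.403444),
  lon = c(-42.081328, 32.560327),
  stringsAsFactors = FALSE
)

# Usage (commented out because it contacts the API):
# obs_all <- query_locations(sites, day_start = "2017-06-01", day_end = "2017-12-15")
```

Passing agronomic_values_latlng as query_fun should work the same way, since the tutorial calls it with the same arguments.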

Finally, the queries below show how to retrieve data from the Forecasts endpoint.


###Forecast data - customize call as needed
#This creates datasets in R titled "fcst1" and "fcst2" that can be saved/charted/analyzed.
#Note: day_start must be today or a day in the near future.
#Note: Forecasts are available in hourly increments. The block_size parameter 
#allows you to set how many hours of the day should be aggregated in the forecast.

fcst1 <- forecasts_fields("test_kampala", 
                          day_start = as.character(Sys.Date()), 
                          day_end = as.character(Sys.Date()+7), block_size = 24)

fcst2 <- forecasts_latlng(lat, lon, 
                         day_start = as.character(Sys.Date()), 
                         day_end = as.character(Sys.Date()+7), block_size = 24)

head(fcst2)

The commands above return a data frame with forecast data. Forecasts are available in the aWhere API starting from the current date to seven days from the current date. This may seem a bit strange - we tend to think of forecasted weather as being for tomorrow onward. However, the satellites and weather stations that provide the raw inputs for aWhere’s data generation system need time to record and process the tremendous amount of data they collect every day on the world’s weather, and aWhere’s system needs several hours to ingest the raw data and produce a continuous weather grid for the entire globe.

As a result, queries for forecast data will typically start from today’s date and end within 1-7 days in the future. The base R command Sys.Date() is a useful tool for setting this start date and end date in a responsive way. If you use this method, then you will be able to run the same command each day with the day_start and day_end of the query automatically updated. Otherwise, you will have to manually set those dates each time - e.g. changing “2018-01-01” to “2018-01-02” the next day.

Users should also note that forecast data is available on an hourly basis. Setting the block_size parameter allows users to aggregate data up to different increments, such as every 3 hours, 6 hours, or 24 hours.
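
The Sys.Date() pattern described above can be collected into a small helper so that the date arithmetic lives in one place. This is a sketch only: the function name forecast_window() and its arguments are hypothetical, and the 7-day limit simply mirrors the forecast range described in this tutorial.

```r
# Build the day_start / day_end strings for a forecast query.
# 'today' defaults to the current date but can be fixed for testing.
forecast_window <- function(days_ahead = 7, today = Sys.Date()) {
  # The tutorial describes forecasts as covering today plus up to 7 days
  stopifnot(days_ahead >= 0, days_ahead <= 7)
  list(day_start = as.character(today),
       day_end   = as.character(today + days_ahead))
}

# With a fixed date the result is reproducible:
w <- forecast_window(3, today = as.Date("2018-01-01"))
w$day_start  # "2018-01-01"
w$day_end    # "2018-01-04"

# Usage with a forecast call (commented out because it contacts the API):
# w <- forecast_window()
# fcst <- forecasts_latlng(lat, lon, day_start = w$day_start,
#                          day_end = w$day_end, block_size = 24)
```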

Querying Long-Term Normal Data

Unlike daily data, data on long-term normals is not intended to represent the true conditions on the ground for a single date. Rather, it represents the calculated “normal conditions” on the ground for that date. These norms are calculated as an average across multiple years of observations on that same date of the year, and can be very useful for understanding if current ground conditions are dangerously unusual.

Long-term normal data (or LTN) is available through two endpoints: Weather Norms and Agronomic Norms. An example of Weather Norms:


###Long-term norm data - norms determined based on month-day (MM-DD) spans,   
###with default as 10-year norms. Can customize years and exclude years.
#This pulls the data and creates a dataset in R titled "ltn1" that can
#be viewed, saved, charted, or analyzed.

ltn1 <- weather_norms_fields("test_kampala", 
                             monthday_start = monthday_start, 
                             monthday_end = monthday_end, 
                             year_start = year_start, 
                             year_end = year_end)

##custom-year norms
ltn2 <- weather_norms_latlng(lat, lon, 
                             monthday_start = monthday_start, 
                             monthday_end = monthday_end, 
                             year_start = year_start, 
                             year_end = year_end, 
                             exclude_years = c("2011", "2016"))

head(ltn2)

Note that these queries include parameters for monthday_start and monthday_end, as well as year_start and year_end. This is because the Weather Norms endpoint cannot return different values for January 1 2015 and for January 1 2016. By definition the LTN values for January 1 must be the same.

In addition, the years included in the LTN calculations matter. Ten years is a standard length of time to consider for LTN. However, users may wish to restrict the year span, for example in order to understand if the LTN over the past 3 years is significantly different from the 10-year LTN (a common indication of increased weather volatility). Users may also want to exclude a particular year (using the exclude_years parameter) if they know that year was an extreme outlier in the weather it experienced.
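
To make the effect of excluding a year concrete, the averaging described above can be worked through on a toy table. The numbers below are made up for illustration, the column names year, monthday, and precip are hypothetical, and the helper ltn_mean() only reproduces the arithmetic of a multi-year average. It is not a description of how the aWhere API computes norms internally.

```r
# Toy values for one calendar date across five years (made-up numbers).
toy <- data.frame(
  year     = 2008:2012,
  monthday = "06-01",
  precip   = c(0, 2, 4, 30, 4),
  stringsAsFactors = FALSE
)

# Average the value for each month-day across years, optionally
# leaving out some years first.
ltn_mean <- function(daily, exclude_years = NULL) {
  kept <- daily[!(daily$year %in% exclude_years), ]
  aggregate(precip ~ monthday, data = kept, FUN = mean)
}

ltn_mean(toy)$precip                          # 8   (40 / 5)
ltn_mean(toy, exclude_years = "2011")$precip  # 2.5 (10 / 4)
```

In this toy case a single wet year lifts the five-year average from 2.5 to 8, which is the kind of distortion that exclude_years is meant to let users avoid.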

The Agronomic Norms endpoint operates similarly:


###Agronomic Norms
##Calculated multiyear normals for a particular span of dates 
#(for example, January 15 - February 15)

agro_ltn1 <- agronomic_norms_fields("test_kampala", 
                                    month_day_start = monthday_start, 
                                    month_day_end = monthday_end, 
                                    year_start = year_start, 
                                    year_end = year_end)

agro_ltn2 <- agronomic_norms_latlng(lat, lon, 
                                   month_day_start = monthday_start, 
                                   month_day_end = monthday_end, 
                                   year_start = year_start, 
                                   year_end = year_end)

Saving Datasets to Disk

As with all data frames in R, you can save the datasets you retrieve to your hard drive using base R commands like write.csv(). An example of saving the daily observed dataset obs1 is shown below, and users should investigate how to save to different formats like Excel workbooks, .txt files, and more.


###Save & export data into .csv file
#You can change which dataset you want to export - here we're exporting "obs1"

write.csv(obs1, file = "weather_dataset.csv")
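
For the other text formats mentioned above, a small wrapper can pick the writer from the file extension. This is a sketch using only base R. The helper name save_dataset() is hypothetical, and the toy data frame stands in for a dataset such as obs1. Excel workbooks are not covered here because writing them requires an add-on package.

```r
# Save a data frame as .csv (comma-separated) or .txt (tab-separated),
# chosen from the file extension. Row names are left out of the file.
save_dataset <- function(df, path) {
  ext <- tolower(tools::file_ext(path))
  if (ext == "csv") {
    write.csv(df, file = path, row.names = FALSE)
  } else if (ext == "txt") {
    write.table(df, file = path, sep = "\t", row.names = FALSE, quote = FALSE)
  } else {
    stop("Unsupported extension: ", ext)
  }
  invisible(path)
}

# Toy stand-in for a queried dataset (made-up values)
toy_obs <- data.frame(date = c("2017-06-01", "2017-06-02"),
                      precip = c(0.0, 1.5),
                      stringsAsFactors = FALSE)

# Write to a temporary file and read it back to check the round trip
csv_path <- save_dataset(toy_obs, tempfile(fileext = ".csv"))
read.csv(csv_path)
```

Setting row.names = FALSE avoids the extra unnamed index column that write.csv() adds by default.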

Pulling Combined Dataframes via a Custom Function

Users will often want to combine different datasets together and filter out particular variables and observations of interest to them to analyze or chart. This data combination and cleaning work can often be tricky to do in R in a reproducible way, and writing custom functions can help to automate this process. aWhere previously created one example of such a function, called generateaWhereDataset(). This function is not part of the aWhereAPI package, but can be loaded into the R environment by sourcing a file in which the function code is written. An example of how this is done:


#source helper function - be sure the pathway given to the file as an input is correct
source("./function_generateaWhereDataset.R")

###Pull entire dataset
weather_df <- generateaWhereDataset(lat = lat, lon = lon, 
                                    day_start = day_start, 
                                    day_end = day_end, 
                                    year_start = year_start, 
                                    year_end = year_end)

head(weather_df)

aWhere clients can ask for the source file for the function demonstrated here, or request customized functions, or even create their own functions for their specific use cases. This specific function was created to work well with additional custom functions created by aWhere, such as generateaWhereChart(). These other custom functions are displayed in a supplementary tutorial, Creating charts with aWhere’s data.
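
For users who want to write their own combining function, a minimal starting point is a merge of two query results on their date column. The sketch below is not the code of generateaWhereDataset(), whose source is not shown in this tutorial. The helper name combine_by_date(), the column names, and the toy values are all assumptions, and the inputs are assumed to be plain data frames.

```r
# Join observed and agronomic data frames on a shared date column.
# Columns that appear in both inputs (other than the key) are kept
# from 'observed' only, to avoid duplicated .x / .y columns.
combine_by_date <- function(observed, agronomic, by = "date") {
  stopifnot(by %in% names(observed), by %in% names(agronomic))
  shared <- setdiff(intersect(names(observed), names(agronomic)), by)
  agro_only <- agronomic[, setdiff(names(agronomic), shared), drop = FALSE]
  merge(observed, agro_only, by = by)
}

# Toy inputs with made-up values and hypothetical column names
toy_obs <- data.frame(date = c("2017-06-01", "2017-06-02"),
                      latitude = -20.465133,
                      precip = c(0.0, 1.5),
                      stringsAsFactors = FALSE)
toy_agro <- data.frame(date = c("2017-06-01", "2017-06-02"),
                       latitude = -20.465133,
                       pet = c(3.1, 2.8),
                       stringsAsFactors = FALSE)

combined <- combine_by_date(toy_obs, toy_agro)
names(combined)  # "date" "latitude" "precip" "pet"
```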

Remember that R, RStudio, and the suite of R packages are free, open-source tools to help you query data, build visualizations, and replicate your work for other datasets, locations, and time periods. This tutorial is intended to help users quickly get started using R to query and process aWhere weather data into insights for decision-making. The scripts which contain the custom functions created by aWhere and demonstrated in this tutorial are available upon request by contacting your organization’s aWhere representative, or emailing beawhere@awhere.com.