Introduction

The swnsmodelr package was designed for users to easily model SWNS weather station data with rasters, stations, input variable(s) and models of their chosing. The package contains functions and weather station data, and input rasters are contained within the package’s R project. This document follows a workflow of preparing data, generating models, and generating outputs. The objects generated at the various steps in the example are stored in swnsmodelr.

Exract Rasters to Weather Station Data

The first step is to extract raster values to corresponding weather station in the weather station data frame. For many rasters, (ex: elevation, proximity to coast, aspect) the value will be same every day for a station. Other rasters (ex: solar radiation) change on a temporal basis, and will have a unique value for each station every day. The two type of rasters, constant, and temporally changing, will be treated differently in the value extract process.

Weather Stations Data Frame

The data frame, swns_stations_df, stored within swnsmodelr, contains SWNS daily temperature data from 2012 - 2017. The data frame has columns for: station I.D. (stationid), date (date_time), minimum temperature (temp_min), maximum temperature (temp_max), and mean temperature (temp_mean). One row of data represents one daily record for one station. Use str(), to see the column names and variables types.

str(swns_stations_df)
## 'data.frame':    188163 obs. of  7 variables:
##  $ stationid: chr  "27141" "27141" "27141" "27141" ...
##  $ date_time: Date, format: "2012-01-01" "2012-01-02" ...
##  $ temp_min : num  -0.8 1 -9.8 -12.2 -8 -10.5 -5.4 -2.4 -9.6 -8.1 ...
##  $ temp_mean: num  2.9 6.4 -3.9 -10 -3.9 -7.8 -0.5 1.1 -5.9 -1.7 ...
##  $ temp_max : num  6.5 11.7 2.1 -7.8 0.3 -5 4.4 4.5 -2.2 4.7 ...
##  $ EASTING  : num  383226 383226 383226 383226 383226 ...
##  $ NORTHING : num  4991426 4991426 4991426 4991426 4991426 ...

Constant Rasters

The constant rasters should be stored as a list of raster objects. The code below is an example of how to create a list of raster objects from rasters available in the package project. The rasters are from the 200 subdirectory within the Rasters folder. In this subdirectory, rasters have a spatial resolution of 200m.

# Names of rasters in Rasters directory (everything before extension)
rasters_names_list <- list("dem",  # elevation
                          "ptoc", # proxmity to coast
                          "east", # easting
                          "north",# northing
                          "asp",  # aspect
                           "tpi", # topographic position index
                          "slope") # slope
# Make into rasters objects list
rasters_list <- lapply(FUN = raster,
                       X   = paste0("F:\\Rasters\\200\\",rasters_names_list, ".tif"))

# Plot brick of rasters
rasters_list %>% brick() %>% plot()

The constant raster values are added to the swns_stations_df table with extract_constant_rasters_values(). A new frame will be made called swns_stations_df_200, which is the original table plus constant raster values at 200m resolution

swns_stations_df_200 <- extract_constant_raster_values(swns_stations_df, rasters_list)
str(swns_stations_df_200)
## 'data.frame':    188163 obs. of  14 variables:
##  $ stationid: chr  "27141" "27141" "27141" "27141" ...
##  $ date_time: Date, format: "2012-01-01" "2012-01-02" ...
##  $ temp_min : num  -0.8 1 -9.8 -12.2 -8 -10.5 -5.4 -2.4 -9.6 -8.1 ...
##  $ temp_mean: num  2.9 6.4 -3.9 -10 -3.9 -7.8 -0.5 1.1 -5.9 -1.7 ...
##  $ temp_max : num  6.5 11.7 2.1 -7.8 0.3 -5 4.4 4.5 -2.2 4.7 ...
##  $ EASTING  : num  383226 383226 383226 383226 383226 ...
##  $ NORTHING : num  4991426 4991426 4991426 4991426 4991426 ...
##  $ dem      : num  59.2 59.2 59.2 59.2 59.2 ...
##  $ ptoc     : num  17782 17782 17782 17782 17782 ...
##  $ east     : num  383177 383177 383177 383177 383177 ...
##  $ north    : num  4991432 4991432 4991432 4991432 4991432 ...
##  $ asp      : num  131 131 131 131 131 ...
##  $ tpi      : num  0.796 0.796 0.796 0.796 0.796 ...
##  $ slope    : num  1.83 1.83 1.83 1.83 1.83 ...

Temporally Changing Rasters

In this modelling procedure, a solar radiation raster for each day is used an input into the model. The large quantity of rasters makes it problematic to store rasters as a list of raster objects, as with the constant rasters. To overcome this issue, there are two steps to adding temporally changing rasters to swns_stations_df. In the first step, the file path to each raster is stored as a row in a new dataframe. The date is then parsed from the file path name, to a new column of corresponding dates. This way, rasters can be selected and converted to rasters objects based on dates. The data frame, referred to as a ‘temporal rasters data frame’, is made with make_temporal_rasters_df(). The user must define the “date characters” or date_chars argrument, which is the index of characters that contain the date in each file path. The date format (date_format agrument) must also be defined (see ?as.Date). The user also defines a start and end date of rasters to include, and the extension of the rasters. Using the file “Sum_Irradiance_2012_1.tif” as an example: the date is formed with the characters 16 to -5, and the format is “%Y_%j”. A sample use of the function for the solar radiation rasters is shown below:

solar_irradiance_rasters_df <- make_temporal_raster_df(
  in_folder = "F:\\rasters\\GOES_200m_2",
  start_date = ymd('2012-01-01'),
  end_date   = ymd('2012-12-31'),
  date_chars = c(16,-5),
  date_format = "%Y_%j",
  extension = ".tif")

head(solar_irradiance_rasters_df)
##                                              path_field  date_time
## 1   F:\\rasters\\GOES_200m_2\\Sum_Irradiance_2012_1.tif 2012-01-01
## 2  F:\\rasters\\GOES_200m_2\\Sum_Irradiance_2012_10.tif 2012-01-10
## 3 F:\\rasters\\GOES_200m_2\\Sum_Irradiance_2012_100.tif 2012-04-09
## 4 F:\\rasters\\GOES_200m_2\\Sum_Irradiance_2012_101.tif 2012-04-10
## 5 F:\\rasters\\GOES_200m_2\\Sum_Irradiance_2012_102.tif 2012-04-11
## 6 F:\\rasters\\GOES_200m_2\\Sum_Irradiance_2012_103.tif 2012-04-12

Finally, to add the values of each temporal raster to the weather station data, use extract_temporal_raster_values(). This may take a while if there are many temporal rasters. The function add a column for the set of temporally changing rasters. The argument col_name is used to specify the name of the column. By setting verbose to true, the function will print a completion report for every raster that has been succesfully extracted to the weather stations data frame. The output data frame was named swns_stations_df_200, because it has the raster values extracted from 200m resolution.

swns_stations_df_200  <- extract_temporal_raster_values(temporal_rasters_df = solar_irradiance_rasters_df,
                               temperatures_df = swns_stations_df_200,
                               col_name = "sum_irradiance",
                               verbose = FALSE)
head(swns_stations_df_200)
##   stationid  date_time temp_min temp_mean temp_max  EASTING NORTHING    dem
## 1     27141 2012-01-01     -0.8       2.9      6.5 383226.3  4991426 59.225
## 2     27141 2012-01-02      1.0       6.4     11.7 383226.3  4991426 59.225
## 3     27141 2012-01-03     -9.8      -3.9      2.1 383226.3  4991426 59.225
## 4     27141 2012-01-04    -12.2     -10.0     -7.8 383226.3  4991426 59.225
## 5     27141 2012-01-05     -8.0      -3.9      0.3 383226.3  4991426 59.225
## 6     27141 2012-01-06    -10.5      -7.8     -5.0 383226.3  4991426 59.225
##       ptoc     east   north      asp       tpi    slope sum_irradiance
## 1 17782.27 383177.3 4991432 130.8961 0.7963742 1.832468       533.4339
## 2 17782.27 383177.3 4991432 130.8961 0.7963742 1.832468       441.0299
## 3 17782.27 383177.3 4991432 130.8961 0.7963742 1.832468       990.5778
## 4 17782.27 383177.3 4991432 130.8961 0.7963742 1.832468       909.8625
## 5 17782.27 383177.3 4991432 130.8961 0.7963742 1.832468       388.7767
## 6 17782.27 383177.3 4991432 130.8961 0.7963742 1.832468      1445.6809

Separate Stations into Modelling and Validating Stations

There are weather stations from AGRG (study stations), ECAN and DNR (external stations) in the data frame. Ideally, the study stations would be used to model the data, and the external stations to validate the models. To be able to differeniate at any time between study (AGRG) and external (ECAN and DNR) stations, the station I.D.’s were sorted into two lists: study_stations_list and ext_stations_list. In the code below, the lists were used to separate the stations, and they were plotted to show their locations.

ggplot(data = swns_stations_df %>%
         filter(!duplicated(stationid)) %>%
         dplyr::mutate(origin = if_else(stationid %in% study_stations_list,
                                                               "study",
                                                               "ext"))) +
  geom_point(aes(x = EASTING, y = NORTHING, colour = origin)) +
  coord_fixed()

The study stations alone do not provide sufficient coverage in the eastern most portion surrounding the Halifax area, and the Annapolis valley. There are external stations in those areas which were included in the model stations along with the study stations. The external stations that were included as model stations were ‘6354’ in Greenwood, ‘6456’ north is St. Margaret’s Bay, ‘47187’ in Halifax, ‘DN025’ south of Digby and ‘6497’ near Berwick. A number of study stations were used in validating the models rather than generating them, these stations were: “YA4”,“S160”,“AN3”,“CL4”,“KE5”,“WE5”,“WE2”,“LI2”,“CH4”. These stations are located in areas represented by model stations but not so close that their modelling values were essentially the same.

The following code was used to separate the stations, and can be easily altered to exclude or include stations in either category:

# Choose modelling stations
model_stations_df <- swns_stations_df_200 %>%
          # 1. Include all study stations
  filter(stationid %in% study_stations_list |
          # 2. Included these ext stations
          stationid %in% c("6354","47187","6456","DNR025","6497")) %>% 
          # 3. Remove these study stations
  filter((stationid %in% c("YA4","S160","AN3","CL4","KE5","WE5","WE2","LI2","CH4")) == FALSE)

# Choose validation stations
val_stations_df   <- swns_stations_df_200 %>% 
          # 1. Include all ext stations
  filter(stationid %in% ext_stations_list  |    
          # 2. Add these study stations
           stationid %in% c("YA4","S160","AN3","CL4","KE5","WE5","WE2","LI2","CH4")) %>%
          # 3. Remove these ext stations
  filter(!(stationid %in% c("6354", "47187", "6456","27141", "DLG006", "DNR024",
                            "6501", "DNR025", "DNR003", "6497")             
           )
         ) 

This is the resulting modelling and validation stations:

ggplot(data = swns_stations_df %>%
         filter(!duplicated(stationid)) %>%
         dplyr::mutate(role = if_else(stationid %in% unique(model_stations_df$stationid),
                                                               "model",
                                                               "validate"))) +
  geom_point(aes(x = EASTING, y = NORTHING, colour = role)) +
  coord_fixed()

Examine Raster Variable Relationship with Weather Station Temperature

Weather Station Coverage of Raster Variability

model_stations_sp <- model_stations_df %>% 
  filter(!duplicated(stationid)) %>%
  dplyr::select(stationid, EASTING, NORTHING)
coordinates(model_stations_sp) = ~ EASTING + NORTHING

par(mfrow = c(2,2))
for(i in seq_along(rasters_list)){
  raster_values <- raster::extract(rasters_list[[i]], model_stations_sp) 
  if(names(rasters_list[[i]]) == "slope"){
    plot(x = 1:(length(raster_values)),
       y = raster_values[sort.list(raster_values)], 
       ylab = names(rasters_list[[i]]),
       xlab = "Stations",
       ylim = c(0,90))
  }else{
  plot(x = 1:(length(raster_values)),
       y = raster_values[sort.list(raster_values)], 
       ylab = names(rasters_list[[i]]),
       xlab = "Stations",
       ylim = c(minValue(rasters_list[[i]]), maxValue(rasters_list[[i]])))
}
  
}

Generate and Validate Models

A set of functions in swnsmodelr generate GAMs based on a model station data, then apply the models to validation station data, and computes the error at each validation record. The output from the functions is the validation station data frame with columns for the error (resid), GCV score(gcv), and adjusted R square () added. There is a function written for the three timeframes, daily, weekly, and monthly, respectively: validate_daily_GAMs(), validate_weekly_GAMs(), validate_monthly_GAMs().

The functions allow the user to specify an alternative formula, with the argument alt_formula. This feature was specifically implemented for the case that solar radiation data is missing for a particular date. The function will fail if the data is missing, and fall back on the alternative formula. If there is no suspiscion that a missing variable will cause the function to fail, this argument can be left blank.

daily_val_df <- validate_daily_GAMs(model_stations_df,
                                     val_stations_df,
                                     years = 2012,
                                     days = 1:3,
                                    formula = "temp_mean ~ s(dem) + s(asp) + s(tpi) 
                                    + s(ptoc, k= 3) + s(east,north) + s(sum_irradiance)",
                                    alt_formula ="temp_mean ~ s(dem) + s(asp) + s(tpi) 
                                    + s(ptoc, k = 3) + s(east,north)")
## Error in validate_daily_GAMs(model_stations_df, val_stations_df, years = 2012, : could not find function "validate_daily_GAMs"

Two ways of visualizing the errors are by histogram, and a scatterplot against the dependent variable.

ggplot(data = daily_val_df, aes(resid))+
  labs(x = "Residuals",y = "Count") +
  geom_histogram(position = "dodge", colour = "black", fill = "pink",bins = 30)
## Error in ggplot(data = daily_val_df, aes(resid)): object 'daily_val_df' not found
ggplot(data = daily_val_df, aes(x = temp_mean, y = resid)) +
  geom_point() +
  geom_smooth()
## Error in ggplot(data = daily_val_df, aes(x = temp_mean, y = resid)): object 'daily_val_df' not found

Generating Output

Overview

With the combination of a model stations data frame, model formula, and input rasters, functions in swnsmodelr generate different types of raster outputs. The functions daily_model_predict_rasters(), and weekly_model_predict_rasters() generates daily rasters of the dependent variable with the specified model applied in daily or weekly timeframes, respectively.

The prediction functions will fail if the rasters in the rasters_list, and the rasters pointed to in the temporal_rasters_df, do not have the same extents, and number of rows and columns. The function raster::compareRaster() will be return TRUE for the rasters_list if the properties are the same. If not, the function raster::resample() can be used to correct the issue.

The function generate_mean_rasters(), calculates mean rasters for sets of maximum and minimum rasters that correpsond by date. This function was written incase a user would rather model daily minimum and maximum temperatures before calculating daily mean temperature.

Finally, the function generate_output_gdd(), generates GDD rasters from daily mean temperature rasters. The user can specify the GDD base, and frequency of outputs (daily, weekly, monthly or yearly). There is also an agrument called growing_season that when set to TRUE, only accumulates GDD for April to November.

Predict Daily Mean Temperature Rasters

Continuing from the previous section, the code below demonstrates how to use daily_model_predict_rasters().As with the model validation, a formula and an alternative formula (alt_formula) can be specified. The alt_formula is used if the formula causes gam() to fail. This would occur is a variable is a missing for the date being modelling, which is the case randomly with sum_irradiance.

# Predict daily mean temperature
daily_model_predict_rasters(start_date = ymd('2012-01-01'),
                               end_date = ymd('2017-12-31'),
                               formula = "temp_mean ~
                                          s(east,north) +
                                          s(dem) +
                                          s(sum_irradiance) +
                                          s(ptoc, k= 3)",
                               alt_formula = "temp_mean ~
                                          s(east,north) +
                                          s(dem) +
                                          s(ptoc, k= 3)",
                               temperatures_df = model_stations_df,
                               input_rasters = rasters_list,
                               temporal_rasters_df = solar_irradiance_rasters_df,
                               output_folder = "F://daily_models//mean_temperature",
                               output_ext = "tif",
                               verbose = TRUE)

Calculate Growing Degree Day (GDD) Rasters

Rasters of GDD are calculated by raster algebra with the daily mean temperature rasters. The first step is to make a data frame with make_temporal_rasters_df() that contains the file paths to the daily mean temperature rasters, and their corresponding date. The function output_gdd_rasters() applies the specified GDD base to the mean temperature rasters, accumulates them daily, and sends them to the output folder at the specified timeframe, output_timeframe. The accumulation only occurs within a year, and can be set to only include April 1st - November 30th by setting growing_season to TRUE. The code below demonstrates how to use the two functions to generate GDD rasters.

temp_mean_df <- make_temporal_raster_df("F://daily_models//mean_temperature",
                                        start_date = ymd('2012-01-01'),
                                        end_date   = ymd('2017-12-31'),
                                        date_chars = c(10,19),
                                        date_format = "%Y-%m-%d")

generate_gdd_output(temp_mean_df,
                    gdd_base = 5,
                    start_date = start_date,
                    end_date = end_date,
                    output_time_slice = "monthly",
                    growing_season = TRUE,
                    output_folder = "Z:\\Dana\\Weekly\\Monthly_GDD_200_7",
                    plot_gdd_raster = TRUE)