This is my portion of a collaborative project with a group of 10 scientists around the world (folks from Germany, New Zealand, Houston, Phoenix, and Santa Barbara!). One of my roles in this ongoing project was to find open source global climate data and extract temperature and precipitation for specific geographic sites for which we have biological data.

This document gives an overview of the process to download global climate data, download location data for the relevant sites, and stitch those together into a tidy data frame to share with my collaborators.

Load Packages

library(tidyverse) # a suite of packages for wrangling and tidying data
library(prism)     # package to access and download climate data
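library(raster)    # the climate data comes in raster files; this package helps process those
library(popler)    # package to access and download biological data
library(stringr)   # string helpers (already attached by tidyverse, listed here for clarity)
library(magrittr)  # pipe operators such as %>%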

Fetch Climate Data

Prism is an open climate database from the PRISM Climate Group at Oregon State University. It compiles data from a range of monitoring networks and uses sophisticated modeling techniques to interpolate across different spatial and temporal resolutions. Note that its grids cover the conterminous United States rather than the whole globe. Here, I’ve extracted monthly mean temperature and precipitation data from 1990-2016.

# First, set a file path where prism data will be stored
options(prism.path = 'C:\\Users\\Shannon\\Documents\\F18 Topics in Ecology\\prism.path')

# Now, select the type of data (mean temperature and precipitation for us) and date range
get_prism_monthlys(type = 'tmean', years = 1990:2016, mo = 1:12, keepZip = FALSE)
get_prism_monthlys(type = 'ppt', years = 1990:2016, mo = 1:12, keepZip = FALSE)

Process Climate Data

The data is downloaded as a zip folder of raster files. Luckily, R’s ‘prism’ package has some simple functions to compile the files into a format we can work with. Here, I first stack the raster data and then pull out the coordinate reference system (CRS) the stack uses.

# Grab the prism data and compile the files
climate_data <- ls_prism_data() %>%  
  prism_stack(.)  

# Extract the coordinate reference system (CRS) string from the raster stack
climate_crs <- climate_data@crs@projargs

Fetch & Process LTER Data

Now I have climate data for the whole Prism grid, but I only need the data for the specific locations where we have biological data. These locations are Long Term Ecological Research (LTER) sites. This network of 30 sites, most of them in North America, supports a huge amount of ecological research. I think this project is really neat because we don’t collect any data for it: it’s all coming from open sources! At each site, there is census data for a range of different animals, everything from bison to penguins to reef fish. Eventually, this project aims to connect population sizes of these animals to climate, but we’re not there yet.

At this stage, I’m only pulling the LTER data so I can extract the location coordinates to connect with the climate data. Here, I download all LTER site metadata from a database called “popler”. From that metadata, I extract the coordinates for each LTER site, label them with the same coordinate reference system (CRS) as the Prism climate data, and match them.

# First, pull the metadata for all LTER sites using pplr_browse
lter_sites <- popler::pplr_browse()

# Select just the lat/long and site ID (3-letter code for site) columns and make a df
lter_sites <- lter_sites %>%
  dplyr::select(lng_lter,lat_lter, lterid)
lter_sites <- as.data.frame(lter_sites)

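# Convert these locations to a spatial object that can be matched to the Prism climate data
# Note: proj4string<- only labels the points with the Prism CRS; it does not reproject them,
# so this assumes the popler coordinates are already longitude/latitude in decimal degrees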
coordinates(lter_sites) <- c('lng_lter', 'lat_lter')
proj4string(lter_sites) <- CRS(climate_crs)
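
Because assigning a CRS only labels the points, it can be worth confirming that the coordinates really are longitude/latitude in decimal degrees before doing so. Below is a minimal sketch of such a check in base R. The helper name is_lonlat and the second toy row are hypothetical; only the AND coordinates come from this project’s output.

```r
# Hypothetical helper: TRUE where a coordinate pair is a plausible lon/lat in degrees
is_lonlat <- function(lng, lat) {
  !is.na(lng) & !is.na(lat) & abs(lng) <= 180 & abs(lat) <= 90
}

# Toy data: AND uses the coordinates shown in the final output;
# "BAD" is a made-up row with projected (metre-like) coordinates
sites_toy <- data.frame(
  lterid   = c("AND", "BAD"),
  lng_lter = c(-122.26, 553000),
  lat_lter = c(44.21, 4895000)
)

ok <- is_lonlat(sites_toy$lng_lter, sites_toy$lat_lter)

# Rows that fail the check would need reprojecting before a lon/lat CRS is assigned
sites_toy[!ok, ]
```

If any rows fail, labelling them with the Prism CRS would silently place them in the wrong location, so they should be reprojected or corrected first.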

Join Climate & LTER Data

The Prism climate data comes in pretty nasty shape, so here I wrangle that into a manageable format that I can join with the LTER data.

# Extract the climate values from the raster stack at those sites
# (raster::extract is called explicitly because tidyr also exports an extract())
climate_lter <- data.frame(coordinates(lter_sites), 
                   lter_sites$lterid, 
                   raster::extract(climate_data, lter_sites))

# Reshape data. Cols 1:3 are long, lat, and site ID. Cols 4:ncol are climate data
# Column headers include date and climate type info
climate_lter <- climate_lter %>% 
  gather(date, value, 4:ncol(climate_lter))

# The column header includes the date and data type, but also some other metadata that we don't need
# Here, I remove the extra info from the column header
climate_lter$date <- gsub('PRISM_', '', climate_lter$date) %>% 
  gsub('stable_4kmM3_', '', .) %>% 
  gsub('stable_4kmM2_', '', .) %>%
  gsub('_bil', '', .)

# Split header into type (precipitation or temperature), year, and month
climate_lter <- separate(climate_lter, 'date', 
                 into = c('type', 'YearMonth'), 
                 sep = '_')
climate_lter <- separate(climate_lter, 'YearMonth',
                 into = c('year', 'month'),
                 sep = 4)

# Reshape data-- make a separate column for temperature and precipitation
climate_lter <- unique(climate_lter)
climate_lter <- climate_lter %>% 
  spread(type, value) %>%
  rename(lng = lng_lter, lat = lat_lter, lterid = lter_sites.lterid)

# Make year a numeric variable (month stays a zero-padded character string, e.g. "01")
climate_lter$year  <- as.numeric(climate_lter$year)

# Order data by LTER site
climate_lter <- climate_lter[order(climate_lter$lterid),]
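
The chain of gsub calls above depends on knowing every version tag (M2, M3) ahead of time. As an alternative sketch, the pieces of each layer name can be pulled out with one regular expression. The layer names below are illustrative examples that follow the pattern implied by the code above, not a listing of the real files, and the variable names are my own.

```r
# Illustrative layer names (assumed to follow the PRISM_<type>_..._<YYYYMM>_bil pattern)
layer_names <- c("PRISM_tmean_stable_4kmM3_199001_bil",
                 "PRISM_ppt_stable_4kmM3_199001_bil",
                 "PRISM_ppt_stable_4kmM2_199012_bil")

# Capture groups: climate type, 4-digit year, 2-digit month
pattern <- "^PRISM_([a-z]+)_.*_([0-9]{4})([0-9]{2})_bil$"
parts   <- regmatches(layer_names, regexec(pattern, layer_names))

# Each element of parts is c(full match, type, year, month)
parsed <- do.call(rbind, lapply(parts, function(p) {
  data.frame(type  = p[2],
             year  = as.numeric(p[3]),
             month = p[4],
             stringsAsFactors = FALSE)
}))

parsed
```

Because the middle of the name is matched with a wildcard, a new version tag would not require another gsub line. This is one possible design, not what the pipeline above does.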

View Data

The final product is a long format dataset with lat/long, LTER site ID code, year and month, precipitation (in mm), and mean temperature (in °C).

head(climate_lter, 15) # view the first 15 rows of data
##         lng   lat lterid year month    ppt  tmean
## 973 -122.26 44.21    AND 1990    01 476.16  3.370
## 974 -122.26 44.21    AND 1990    02 298.48  1.255
## 975 -122.26 44.21    AND 1990    03 111.90  6.930
## 976 -122.26 44.21    AND 1990    04 232.78 11.530
## 977 -122.26 44.21    AND 1990    05 120.62 11.800
## 978 -122.26 44.21    AND 1990    06  97.38 15.705
## 979 -122.26 44.21    AND 1990    07  20.63 20.925
## 980 -122.26 44.21    AND 1990    08  67.54 20.120
## 981 -122.26 44.21    AND 1990    09  14.92 18.580
## 982 -122.26 44.21    AND 1990    10 204.72  9.985
## 983 -122.26 44.21    AND 1990    11 313.74  5.225
## 984 -122.26 44.21    AND 1990    12 157.49 -1.485
## 985 -122.26 44.21    AND 1991    01 208.61  1.860
## 986 -122.26 44.21    AND 1991    02 195.04  6.925
## 987 -122.26 44.21    AND 1991    03 195.84  4.485
str(climate_lter)      # view the structure of the data
## 'data.frame':    8100 obs. of  7 variables:
##  $ lng   : num  -122 -122 -122 -122 -122 ...
##  $ lat   : num  44.2 44.2 44.2 44.2 44.2 ...
##  $ lterid: Factor w/ 25 levels "AND","ARC","BNZ",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ year  : num  1990 1990 1990 1990 1990 1990 1990 1990 1990 1990 ...
##  $ month : chr  "01" "02" "03" "04" ...
##  $ ppt   : num  476 298 112 233 121 ...
##  $ tmean : num  3.37 1.25 6.93 11.53 11.8 ...

Plots

These plots aren’t super informative or attractive, but are useful for confirming that everything went well. I used these to make sure the climate data was in the range expected and spot check some sites.

First, precipitation: CAP, JRN, and SEV are 3 sites in Arizona/New Mexico, and as expected, they look quite dry. The site with the most rainfall, AND, is in the Cascade Range of western Oregon. So it looks like the precipitation data worked well. The sites that don’t have data are either in Antarctica or on oceanic islands, which fall outside the area Prism covers. So my next step is to find a workaround or alternative data source for those sites.

Now, temperature: Here, I colored points by month so we can spot check by seasonal trends as well as sites. It’s clear that the blue points that represent summer months are warmer than red/orange points that represent winter months. Good. The hottest sites, CAP and FCE, are in Phoenix and the Florida Everglades.
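
The missing values at some sites are consistent with how raster extraction treats points that fall outside a raster’s extent: they come back as NA rather than raising an error. Here is a small self-contained sketch using a made-up 2 x 2 raster. All values, layer names, and points are toy assumptions, not Prism data.

```r
library(raster)

# Toy 2 x 2 raster covering x and y from 0 to 2 (values are made up)
r1 <- raster(nrows = 2, ncols = 2, xmn = 0, xmx = 2, ymn = 0, ymx = 2)
values(r1) <- 1:4            # cells are filled row by row from the top left
r2 <- r1 * 10                # a second layer so we can build a stack

toy_stack <- stack(r1, r2)
names(toy_stack) <- c("layer_a", "layer_b")

# Two points inside the raster and one well outside it
pts <- cbind(x = c(0.5, 1.5, 5), y = c(1.5, 0.5, 5))

# One row per point, one column per layer; the outside point is all NA
vals <- raster::extract(toy_stack, pts)
vals
```

A quick way to list affected sites in the real data would be to look for site IDs whose temperature and precipitation values are all NA.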