This is my portion of a collaborative project with a group of 10 scientists around the world (folks from Germany, New Zealand, Houston, Phoenix, and Santa Barbara!). One of my roles in this ongoing project was to find open source global climate data and extract temperature and precipitation for specific geographic sites for which we have biological data.
This document gives an overview of the process to download global climate data, download location data for the relevant sites, and stitch those together into a tidy data frame to share with my collaborators.
library(tidyverse) # a suite of packages for wrangling and tidying data
library(prism) # package to access and download climate data
library(raster) # the climate data comes in raster files- this package helps process those
library(popler) # package to access and download biological data
library(stringr) # string helpers (also attached by tidyverse)
library(magrittr) # pipe operators such as %>%

Prism is an open source climate database that compiles data from a range of monitoring networks and uses sophisticated modeling techniques to interpolate across different spatial and temporal resolutions. Here, I’ve extracted monthly mean temperature and precipitation data from 2000-2016.
# First, set a file path where prism data will be stored
options(prism.path = 'C:\\Users\\Shannon\\Documents\\F18 Topics in Ecology\\prism.path')
# Now, select the type of data (mean temperature and precipitation for us) and date range
get_prism_monthlys(type = 'tmean', years = 2000:2016, mon = 1:12, keepZip = FALSE)
get_prism_monthlys(type = 'ppt', years = 2000:2016, mon = 1:12, keepZip = FALSE)

The data is downloaded as a zip folder of raster files. Luckily, R’s ‘prism’ package has some simple functions to compile the files into a format we can work with. Here, I first stack the raster data and then extract its coordinate reference system (CRS).
# Grab the prism data and compile the files
climate_data <- ls_prism_data() %>%
  prism_stack(.)
# Extract the coordinate reference system (CRS) from the raster stack
climate_crs <- climate_data@crs@projargs

Now I have gridded climate data for Prism’s full extent, but I only need the data for specific locations for which we have biological data. These locations are Long Term Ecological Research (LTER) sites. This network of roughly 30 sites, most of them in North America, supports a huge amount of ecological research. I think this project is really neat because we don’t collect any data for it; it’s all coming from open sources! At each site, there is census data for a range of different animals, everything from bison to penguins to reef fish. Eventually, this project aims to connect population sizes of these animals to climate. But we’re not there yet. At this stage, I’m only pulling the LTER data so I can extract the location coordinates to connect with the climate data. Here, I download all LTER site metadata from a database called “popler”. From that metadata, I extract the coordinates for each LTER site, put them in the same coordinate reference system (CRS) as the Prism climate data, and match them.
# First, pull the metadata for all LTER sites using pplr_browse
lter_sites <- popler::pplr_browse()
# Select just the lat/long and site ID (3-letter code for site) columns and make a df
lter_sites <- lter_sites %>%
  dplyr::select(lng_lter, lat_lter, lterid)
lter_sites <- as.data.frame(lter_sites)
# Convert these locations to a spatial format that can be matched to the Prism climate data
coordinates(lter_sites) <- c('lng_lter', 'lat_lter')
proj4string(lter_sites) <- CRS(climate_crs)

The Prism climate data comes in pretty nasty shape, so here I wrangle it into a manageable format that I can join with the LTER data.
# Extract the climate values from the raster stack for those sites
# (raster::extract is spelled out because tidyr also exports an extract function)
climate_lter <- data.frame(coordinates(lter_sites),
                           lter_sites$lterid,
                           raster::extract(climate_data, lter_sites))
# Reshape data. Col 1:3 are long, lat, and site ID. Col 4:ncol are climate data
# Column headers include date and climate type info
climate_lter <- climate_lter %>%
  gather(date, value, 4:ncol(climate_lter))
# The column header includes the date and data type, but also some other metadata that we don't need
# Here, I remove the extra info from the column header
climate_lter$date <- gsub('PRISM_', '', climate_lter$date) %>%
  gsub('stable_4kmM3_', '', .) %>%
  gsub('stable_4kmM2_', '', .) %>%
  gsub('_bil', '', .)
# Split header into type (precipitation or temperature), year, and month
climate_lter <- separate(climate_lter, 'date',
                         into = c('type', 'YearMonth'),
                         sep = '_')
climate_lter <- separate(climate_lter, 'YearMonth',
                         into = c('year', 'month'),
                         sep = 4)
# Reshape data: make a separate column for temperature and precipitation
# Drop duplicate rows first, since spread() needs each site/date/type combination to be unique
climate_lter <- unique(climate_lter)
climate_lter <- climate_lter %>%
  spread(type, value) %>%
  rename(lng = lng_lter, lat = lat_lter, lterid = lter_sites.lterid)
# Make year a numeric variable (month stays a zero-padded character string)
climate_lter$year <- as.numeric(climate_lter$year)
# Order data by LTER site
climate_lter <- climate_lter[order(climate_lter$lterid), ]

The final product is a long format dataset with lat/long, LTER site ID code, year and month, precipitation (in mm), and mean temperature (in °C).
head(climate_lter, 15) # view the first 15 rows of data
##         lng   lat lterid year month    ppt  tmean
## 973 -122.26 44.21    AND 1990    01 476.16  3.370
## 974 -122.26 44.21    AND 1990    02 298.48  1.255
## 975 -122.26 44.21    AND 1990    03 111.90  6.930
## 976 -122.26 44.21    AND 1990    04 232.78 11.530
## 977 -122.26 44.21    AND 1990    05 120.62 11.800
## 978 -122.26 44.21    AND 1990    06  97.38 15.705
## 979 -122.26 44.21    AND 1990    07  20.63 20.925
## 980 -122.26 44.21    AND 1990    08  67.54 20.120
## 981 -122.26 44.21    AND 1990    09  14.92 18.580
## 982 -122.26 44.21    AND 1990    10 204.72  9.985
## 983 -122.26 44.21    AND 1990    11 313.74  5.225
## 984 -122.26 44.21    AND 1990    12 157.49 -1.485
## 985 -122.26 44.21    AND 1991    01 208.61  1.860
## 986 -122.26 44.21    AND 1991    02 195.04  6.925
## 987 -122.26 44.21    AND 1991    03 195.84  4.485
str(climate_lter) # view the structure of the data
## 'data.frame': 8100 obs. of 7 variables:
## $ lng : num -122 -122 -122 -122 -122 ...
## $ lat : num 44.2 44.2 44.2 44.2 44.2 ...
## $ lterid: Factor w/ 25 levels "AND","ARC","BNZ",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ year : num 1990 1990 1990 1990 1990 1990 1990 1990 1990 1990 ...
## $ month : chr "01" "02" "03" "04" ...
## $ ppt : num 476 298 112 233 121 ...
## $ tmean : num 3.37 1.25 6.93 11.53 11.8 ...
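The reshaping steps above can be sketched end to end on a toy table. This is a minimal sketch, not the code used for the project: the `toy` data frame and its values are invented, the column names only mimic the Prism layer names cleaned up earlier, and it swaps `gather()`/`spread()` for their newer tidyr replacements, `pivot_longer()` and `pivot_wider()` (this assumes tidyr 1.0.0 or later).

```r
library(tidyr) # pivot_longer() and pivot_wider()

# Toy stand-in for the extracted table: two sites, two months, invented values.
# Column names imitate Prism layer names such as PRISM_ppt_stable_4kmM3_200001_bil.
toy <- data.frame(
  lng_lter = c(-122.26, -112.07),
  lat_lter = c(44.21, 33.43),
  lterid   = c("AND", "CAP"),
  PRISM_ppt_stable_4kmM3_200001_bil   = c(400.5, 10.2),
  PRISM_ppt_stable_4kmM3_200002_bil   = c(300.1, 5.0),
  PRISM_tmean_stable_4kmM3_200001_bil = c(3.4, 12.9),
  PRISM_tmean_stable_4kmM3_200002_bil = c(4.1, 15.3)
)

# One step replaces gather + gsub + separate: the regular expression pulls
# the climate type, 4-digit year, and 2-digit month out of each column name.
toy_long <- pivot_longer(
  toy,
  cols          = starts_with("PRISM_"),
  names_to      = c("type", "year", "month"),
  names_pattern = "^PRISM_([a-z]+)_.*_(\\d{4})(\\d{2})_bil$",
  values_to     = "value"
)

# Spread the climate type back out into one column per variable (ppt, tmean)
toy_wide <- pivot_wider(toy_long, names_from = type, values_from = value)
toy_wide$year <- as.numeric(toy_wide$year)

print(toy_wide)
```

Each site and month ends up on one row with its own `ppt` and `tmean` columns, which is the same shape as `climate_lter`.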
These plots aren’t super informative or attractive, but are useful for confirming that everything went well. I used these to make sure the climate data was in the range expected and spot check some sites.
First, precipitation: CAP, JRN, and SEV are 3 sites in Arizona/New Mexico, and as expected, they look quite dry. The site with the most rainfall, AND, is in western Oregon. So it looks like the precipitation data worked well. The sites that don’t have data are either in Antarctica or on oceanic islands. A likely reason is that Prism’s grids only cover the contiguous United States, so my next step is to confirm that and find a workaround or alternative data source for those sites.
Now, temperature: Here, I colored points by month so we can spot check seasonal trends as well as sites. It’s clear that the blue points, which represent summer months, are warmer than the red/orange points, which represent winter months. Good. The hottest sites, CAP and FCE, are in Phoenix and the Florida Everglades, respectively.
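The same spot checks can also be done numerically rather than by eye. Here is a minimal sketch in base R, assuming a data frame shaped like `climate_lter`. The `toy` table and all of its values are made up for illustration, with `NA` standing in for sites where the extraction returned nothing, and the plausibility bounds are arbitrary choices.

```r
# Toy stand-in for climate_lter: three sites, three months, invented values.
# The all-NA site mimics a location that falls outside the climate grid.
toy <- data.frame(
  lterid = rep(c("AND", "CAP", "MCM"), each = 3),
  month  = rep(c("01", "04", "07"), times = 3),
  ppt    = c(476.2, 232.8, 20.6, 25.1, 5.3, 22.0, NA, NA, NA),
  tmean  = c(3.4, 11.5, 20.9, 13.2, 22.4, 35.1, NA, NA, NA)
)

# Share of missing temperature values per site; 1 means no data at all
missing_share <- tapply(is.na(toy$tmean), toy$lterid, mean)
sites_without_data <- names(missing_share)[missing_share == 1]

# Mean precipitation and temperature per site
# (the formula interface of aggregate() drops rows containing NA by default)
site_summary <- aggregate(cbind(ppt, tmean) ~ lterid, data = toy, FUN = mean)
driest_site <- as.character(site_summary$lterid[which.min(site_summary$ppt)])

# Rough plausibility bounds (arbitrary): monthly means in degrees C and mm
stopifnot(all(toy$tmean > -60 & toy$tmean < 60, na.rm = TRUE))
stopifnot(all(toy$ppt >= 0, na.rm = TRUE))

print(sites_without_data)
print(site_summary)
print(driest_site)
```

On the real data, swapping `toy` for `climate_lter` would list the sites that need an alternative data source and flag any values outside the expected range.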