To review/streamline Nicole’s work on data cleaning and Allen’s work on DAYMET integration here after directory/file cleanup.
1.1 Internal body temperatures
1.2 DAYMET weather data
2 Quality control
To translate Allen’s work on outlier detection for this section.
3 Active season identification
Inference of active season dates relies only on location and internal body temperature data.
Code
# read in data and clean up columnsraw_season <- raw |>select(loc, snake_id, datetime, tb)# previewraw_season |>head() |>pander("Table: example data rows for season identification.")
Table: example data rows for season identification.
loc
snake_id
datetime
tb
Eatonton
Arnold
2021-10-08 23:16:06
22.01
Eatonton
Arnold
2021-10-09 00:00:06
21.89
Eatonton
Arnold
2021-10-09 01:00:06
22.01
Eatonton
Arnold
2021-10-09 02:00:06
22.01
Eatonton
Arnold
2021-10-09 03:00:06
22.01
Eatonton
Arnold
2021-10-09 04:00:06
22.01
3.1 Diurnal variation
On days when snakes are inactive, the variance in internal body temperatures is expected to be low. Series of daily standard deviations over the days of the year are shown for each snake by location below. The locations are ordered by latitude.
Code
# compute daily variance per snake and convert to functional datadvar_indiv <- raw_season |>group_by(snake_id, loc, yr =year(datetime), day =yday(datetime)) |>summarize(tb.sd =sd(tb), .groups ='drop') |>mutate(fid =consecutive_id(snake_id, loc, yr)) |>tf_nest(tb.sd, .id ='fid', .arg ='day') # plot daily variance seriesdvar_indiv |>ggplot(aes(y = tb.sd)) +geom_spaghetti(alpha =0.6, linewidth =0.2) +facet_wrap(~loc) +labs(y ='SD(temperature)', x ='day of year') + ggthm
Some series are quite short and data are noisy. Snakes with fewer than 30 days’ worth of data are removed and the series are smoothed using a simple 7-day rolling average.
Code
# drop cases with fewer than 30d and smoothdvar_indiv |>mutate(tfe =tf_count(tb.sd)) |>unnest(tfe) |>filter(tfe >30) |>mutate(tb.sd =tf_smooth(tb.sd, 'rollmean', k =7)) |>ggplot(aes(y = tb.sd)) +geom_spaghetti(alpha =0.6, linewidth =0.2) +facet_wrap(~loc) +labs(y ='SD(temperature)', x ='day of year') + ggthm
3.2 Location averages
The smoothed series are aggregated by location (i.e., across snake and year). The resulting series show median diurnal variation in body temperature at a given locale on a given day of the year. Note that missing data are replaced by linear interpolations.
Code
# same as above then aggregate by locationdvar_indiv |>mutate(tfe =tf_count(tb.sd)) |>unnest(tfe) |>filter(tfe >30) |>mutate(tb.sd =tf_smooth(tb.sd, 'rollmean', k =7)) |>group_by(loc) |>mutate(tb.sd =mean(tb.sd, depth ='pointwise', na.rm = T)) |>ggplot(aes(y = tb.sd)) +geom_spaghetti() +facet_wrap(~loc) +labs(y ='median SD(temperature)', x ='day of year') + ggthm
3.3 Changepoint detection
Mean changepoint detection via binary segmentation does a fairly good job at identifying apparent start and end dates for active seasons by location. There is some noise in the data that produces inference of superfluous changepoints. Nonetheless, a simple thresholding of inferred means at 1.5 (dashed horizontal line) adequately identifies start and end dates.
Code
# store seriesdvar_loc <- dvar_indiv |>mutate(tfe =tf_count(tb.sd)) |>unnest(tfe) |>filter(tfe >30) |>select(-tfe) |>mutate(tb.sd =tf_smooth(tb.sd, 'rollmean', k =7)) |>group_by(loc) |>mutate(tb.sd =mean(tb.sd, depth ='pointwise', na.rm = T)) |>distinct(loc, tb.sd) |>ungroup() |>tf_unnest(tb.sd) |>rename(day = tb.sd_arg) |>nest(c(day, tb.sd_value))# # for testing# j <- 6# .vals <- sd_series$data[[j]]$tb.sd_value# .args <- sd_series$data[[j]]$day# wrapper around changepoint detectioncp.fn <-function(.args, .vals, return ='means', q =4){# estimate changepoints via binary segmentation cpt.out <-cpt.mean(.vals,method ='BinSeg', Q = q) cp.ix <-cpts(cpt.out) cp.x <- cpt.out@param.est$mean# option 1: output estimated mean for each dayif(return =='means'){ out <-cut(.args, breaks = .args[c(1, cp.ix, length(.args))],labels =round(cp.x, 5), right = F, include.lowest = T) |>as.character() |>as.numeric() }# option 2: output days of year for each change and change typeif(return =='points'){ cp.delta <-diff(cp.x) out <-tibble(day = .args[cp.ix],delta = cp.delta,type =if_else(cp.delta >0, 'increase', 'decrease')) }return(out)}# # for testing# cp.fn(.args, .vals)# compute changepointschangepts <- dvar_loc |>mutate(tb.sd_mean =map(data, ~cp.fn(.x$day, .x$tb.sd_value, q =3))) |>unnest(everything())# inspect changepoints and threshold meanschangepts |>ggplot(aes(x = day, y = tb.sd_value)) +geom_path() +geom_path(aes(y = tb.sd_mean), color ='red', linewidth =0.5, alpha =0.5) +facet_wrap(~loc) +geom_hline(aes(yintercept =1.5), linetype ='dashed', linewidth =0.25) +labs(y ='median SD(temperature)', x ='day of year') + ggthm
Comments:
In Lordsburg, there is a changepoint in the middle of the end-of-season transition period; the threshold was set to exclude the tapering off. This is easily changed by reducing the threshold to 1.25.
See plots below. In Sedgwick and Montana de Oro, inferred start dates are unreliable due to missing/sparse data.
Figure: Inferred changepoints and individual-level diurnal variation series for MdO and Sedgwick.
In Montana de Oro, data were collected spanning 2017-2018 (from mid April to late February); a single snake appeared to be active for about a week from approximately 2/2/18 until 2/9/18, and then reverted to inactivity until the study endpoint on 2/21/18. The inferred changepoint is simply the start of data collection, and ignores the single active snake from early 2018.
In Sedgwick, data were collected spanning 2015-2016 (from mid May to late March), but only one snake was recorded during the 2016 calendar year; the inferred changepoint is the time at which that snake became active. It is notable that in a later year (2017), data are available for a single snake starting in mid April, and appear to show that snake as inactive until late April.
Code
# inspect start and end dates by year for mdo and sedgwickraw_season |>filter(loc %in%c('Montana', 'Sedgwick')) |>group_by(loc, yr =year(datetime)) |>summarize(first.obs =min(datetime) |>date(),last.obs =max(datetime) |>date()) |>pander(caption ="Table: Sampling dates for MdO and Sedgwick.")
Table: Sampling dates for MdO and Sedgwick.
loc
yr
first.obs
last.obs
Sedgwick
2015
2015-05-16
2015-12-31
Sedgwick
2016
2016-01-01
2016-03-24
Sedgwick
2017
2017-04-15
2017-12-11
Montana
2014
2014-04-16
2014-10-07
Montana
2017
2017-04-15
2017-12-31
Montana
2018
2018-01-01
2018-02-21
Options in these locations:
manually specify season start dates
use inferred changepoints as heuristic only
Tentatively, active season classifications are assigned using option (2). These are shown below for all locations with inferred changes (i.e., excluding Rodeo, McIntyre, and Chimineas, at which locations all data was classified as observed during active season). Note again that locations are ordered by (increasing) latitude.
52 snakes were identified as having multiple observations recorded at the same time for at least one day. Such days are tentatively removed, but this issue should be resolved during data cleanup.
The analysis developed here uses latitude, elevation, and temperature range as site-level attributes and internal body temperature and mass as snake-level attributes. Internal temperatures are converted to daily functional observations on the 0-24h domain. For each snake, days with sparse data (fewer than 10 observations) are removed from the analysis.