Rattlesnake thermoregulation

Analysis of internal body temperatures

Updated

July 1, 2026

Comments look like this.

To-do items look like this.

1 Datasets

Study data are organized across three primary sources:

  1. data/site-metadata.csv — geographic and collaboration information for each of the 19 field sites.
  2. data/snake-metadata.csv — physical and biological attributes for each of the 210 individual snakes, including a path to the corresponding body temperature file.
  3. data/_raw/ — one CSV per snake with timestamped internal body temperature recordings from iButton sensors.

An integrated, quality-controlled workspace is provided as data/processed/tbdata.RData. It contains tb (body temperature time series), meta_snake, and meta_site.

tbdata.RData is produced in two sequential steps. analysis/processing/data-processing.r reads the raw CSV files, joins metadata, and detects basic integrity issues (duplicate timestamps, corrupted observations). analysis/processing/outliers.r then applies biological bounds filtering, removes short discontinuous segments, and flags rolling-window outliers as NA. See the Outlier detection section for details.

1.1 Internal body temperatures

Body temperature data from 210 snakes were collected at 19 sites across western North America, spanning 2013–2025.

Sampling locations and snake counts.
locale state country lat lon elev n
Alberta AB CAN 50.85 -110 710 2
Bouse AZ USA 33.84 -114 360 8
Chimineas Ranch CA USA 35.17 -119.9 744 23
Eatonton GA USA 33.23 -83.52 202 3
Indio Mountains Research Station TX USA 30.75 -105 1250 22
Kamloops-Thompson BC CAN 50.75 -120.5 697 8
Lordsburg NM USA 32.24 -108.9 1274 15
McIntyre Landing MI USA 44.65 -84.86 NA 8
Montana de Oro CA USA 35.26 -120.9 24 15
Norden NE USA 42.8 -100.1 714 1
Owl Head Buttes AZ USA 32.6 -111.1 840 5
Powerlines AZ USA 32.62 -111.3 635 3
Quartzite AZ USA 33.78 -114.1 360 5
Rattlesnake Butte CO USA 40.57 -107 2200 18
Rodeo NM USA 31.89 -109 1257 17
Sedgwick Reserve CA USA 34.69 -120 342 28
Vandenberg Airforce Base CA USA 34.74 -120.6 429 19
Youngblood Place TX USA 30.33 -103.1 1330 10

Sampled individuals belonged to 11 species. Most were represented at a single site.

Sample sizes by species.
spp n_indiv n_sites
crotalus oreganus 85 4
crotalus viridis 43 5
crotalus atrox 25 5
crotalus scutulatus 16 1
crotalus oreganus oreganus 8 1
sistrurus catenatus 8 1
crotaus pyrrhus 7 2
crotalus lepidus 6 1
crotalus ornatus 5 1
crotalus scutulatus x viridis 4 2
crotalus horridus 3 1
Sampling date ranges by site.
locale state start end years
Chimineas Ranch CA Apr 2010 Oct 2017 2
Vandenberg Airforce Base CA Apr 2012 Nov 2017 3
Montana de Oro CA Apr 2014 Feb 2018 3
Eatonton GA Apr 2021 Apr 2021 1
Owl Head Buttes AZ Apr 2022 Aug 2022 1
Quartzite AZ Apr 2023 Aug 2023 1
Kamloops-Thompson BC Jul 2005 Apr 2006 2
Alberta AB Jul 2021 Jul 2022 2
Rattlesnake Butte CO Jun 2017 May 2021 4
Lordsburg NM Jun 2020 Sep 2021 2
Rodeo NM Jun 2021 Sep 2021 1
Youngblood Place TX Jun 2021 Aug 2022 2
McIntyre Landing MI May 2014 Aug 2014 1
Sedgwick Reserve CA May 2015 Dec 2017 3
Indio Mountains Research Station TX May 2019 Oct 2023 5
Norden NE May 2022 Jun 2022 1
Powerlines AZ May 2022 Aug 2023 2
Bouse AZ May 2023 Aug 2023 1

Sampling sites across North America.

1.2 DAYMET weather data

Daily weather covariates (minimum and maximum air temperature) are sourced from the ORNL Daymet 1-km gridded product, fetched via API by site coordinate and covering 1980–2024.

Example rows from DAYMET data.
lat lon date year yday dayl prcp tmax tmin vp
30.33 -103.1 1980-01-01 1980 1 36293 0 18.54 -3.63 423.8
30.33 -103.1 1980-01-02 1980 2 36321 0 17.27 1.12 641.6
30.33 -103.1 1980-01-03 1980 3 36351 0 12.22 -1.7 539.1
30.33 -103.1 1980-01-04 1980 4 36383 0 17.16 -2.94 456.1
30.33 -103.1 1980-01-05 1980 5 36418 0 19.23 -3.21 431.4
30.33 -103.1 1980-01-06 1980 6 36454 0 19.27 -1.04 513.5

2 Exploratory Data Analysis

2.1 Outlier detection

Quality control is implemented in analysis/processing/outliers.r and proceeds in three stages.

1. Biological bounds. Observations with \[T_b < -5°C\] or \[T_b > 40°C\] are removed as physiologically implausible.

2. Discontinuity removal. Time gaps larger than 24 hours define segment boundaries within each snake’s series. Segments shorter than 7 days are dropped, as they are too short to contribute reliable information and may reflect instrument artifacts or re-deployment gaps.

3. Rolling-window outlier detection. For each observation a local mean (0.5-day window) and local SD (4-day window) are estimated using centered rolling statistics, with fallback to forward- or backward-looking windows near series endpoints. An observation is flagged if it deviates from the rolling mean by more than 6 SD and by more than 1°C. Flagged values are set to NA; rows are preserved.

The code block below shows the core logic of each stage.

Code
n_days_mean <- 0.5  # rolling-mean window
n_days_sd   <- 4    # rolling-SD window
n_sd        <- 6    # outlier threshold

# Stages 1-2: biological bounds + discontinuity removal
tb_qc <- tb |>
  filter(tb > -5 & tb < 40) |>
  group_by(file) |>
  arrange(datetime) |>
  mutate(
    lagged            = datetime - lag(datetime, default = first(datetime)),
    is_not_contiguous = cumsum(duration(24, "hours") < lagged)
  ) |>
  group_by(file, is_not_contiguous) |>
  mutate(chunk_length = max(last(datetime) - first(datetime))) |>
  filter(chunk_length > duration(7, "days")) |>
  ungroup()

# Stage 3: per-file rolling mean and SD, then flag deviations
#   NB: outliers.r additionally uses forward/backward windows near series
#   endpoints; this shows the centered-window core for brevity.
tb_aug <- tb_qc |>
  group_by(file) |>
  group_modify(function(tbdata, ...) {
    samp_freq <- median(diff(as.numeric(tbdata$datetime)), na.rm = TRUE)
    w_mean    <- ceiling(as.numeric(duration(n_days_mean, "days")) / samp_freq)
    w_sd      <- ceiling(as.numeric(duration(n_days_sd,   "days")) / samp_freq)
    tbdata |>
      mutate(
        rmean = data.table::frollmean(tb, w_mean, align = "center", fill = NA),
        rsd   = data.table::frollapply(tb, w_sd, FUN = sd, align = "center", fill = NA)
      )
  }) |>
  ungroup()

# flagged values set to NA; the full pipeline writes the result back to tbdata.RData
tb_adj <- tb_aug |>
  mutate(
    tb = if_else((abs(tb - rmean) > n_sd * rsd) & (abs(tb - rmean) > 1), NA_real_, tb)
  ) |>
  select(file, datetime, tb)

Diagnostic plots showing a ±7-day window around each flagged observation are written to analysis/processing/_diagnostics/_outliers/ for manual review.

2.2 Active season identification

Active season dates are inferred from internal body temperatures using changepoint detection on daily diurnal variance. The intuition is that thermoregulating snakes show greater daily temperature variance when active; variance is low and flat during inactivity.

Season classification is a standalone analysis. The output data/processed/seasons.rds is produced by analysis/processing/season-identification.R and is not currently consumed by the main modeling pipeline.

2.2.1 Diurnal variation

Daily standard deviations in body temperature are computed per snake per day of year. Snakes with fewer than 30 days of data are excluded, and series are smoothed with a 7-day rolling mean.

2.2.2 Location averages

Smoothed series are aggregated to locale-level pointwise medians to give the representative seasonal profile for each site.

2.2.3 Changepoint detection

Mean changepoints are detected via binary segmentation (up to \(Q = 3\) segments). A threshold of 1.5°C on segment-level mean distinguishes active from inactive periods.

Comments:

  1. In Lordsburg, a changepoint falls in the middle of the end-of-season transition. The 1.5°C threshold excludes the tapering period; reducing it to 1.25 would include it.

  2. In Montana de Oro and Sedgwick Reserve, inferred start dates are unreliable due to sparse data (see below).

Inferred changepoints and individual-level diurnal variation for Montana de Oro and Sedgwick Reserve.

In Montana de Oro, data span 2017–2018 (mid-April to late February). A single snake was active for roughly one week in early February 2018; the inferred changepoint simply marks the start of data collection. In Sedgwick, data span 2015–2017 but with only one snake recorded in 2016; the changepoint reflects that individual’s activity onset.

Sampling dates for Montana de Oro and Sedgwick Reserve.
locale yr first.obs last.obs
Montana de Oro 2014 2014-04-16 2014-10-07
Montana de Oro 2017 2017-04-15 2017-12-31
Montana de Oro 2018 2018-01-01 2018-02-21
Sedgwick Reserve 2015 2015-05-16 2015-12-31
Sedgwick Reserve 2016 2016-01-01 2016-03-24
Sedgwick Reserve 2017 2017-04-15 2017-12-11

2.2.4 Inferred active seasons

Active season classification uses the changepoint-inferred segment means thresholded at 1.5°C (daily SD). Locales for which all data fell within the active season (Rodeo, McIntyre, Chimineas) are excluded from the table.

Inferred active season changepoints.
location start end
Vandenberg Airforce Base May 5 Oct 16
Montana de Oro Jan 30 Oct 26
Lordsburg Mar 28 Nov 12
Alberta May 1 Sep 8
Youngblood Place Mar 25 Nov 10
Rattlesnake Butte May 3 Sep 22
Indio Mountains Research Station Mar 16 Nov 10
Kamloops-Thompson Mar 23 Sep 27
Sedgwick Reserve Feb 12 Oct 24

3 Statistical Modeling

The analysis models how continuous body temperature profiles (\(T_b\) as a function of time of day) vary with site-level environmental covariates across months, using a pooled cyclic functional regression fit across all months simultaneously.

3.1 Notation and covariates

Symbol Meaning
\[T_b(t)\] Body temperature at hour \[t \in [0, 24)\]
\[x\] Covariate (tmin, tmax, elev, or lat), z-scored globally
\[m\] Month (\[1, \ldots, 12\]), treated as a continuous cyclic variable
\[\ell\] Locale (field site)
\[i\] Curve index — one snake-day
\[\phi^{(1)}_k(t)\] Level-1 (between-locale) eigenfunction, PC \[k\]
\[\phi^{(2)}_k(t)\] Level-2 (within-locale) eigenfunction, PC \[k\]

Four covariates are supported. Each is z-scored globally (across all locale-dates) so that a 1-SD shift is interpretable as a population-level contrast regardless of site.

Variable Units Contrast Notes
tmin °C 5°C
tmax °C 5°C
elev m 100 m Rattlesnake Butte excluded (extreme outlier)
lat degrees

3.2 Data preparation

The package function prep_tf_data() loads the processed data files, joins snake and site metadata with Daymet weather, constructs functional data objects (one curve per snake-day on the 0–24h domain), and z-scores the four covariates globally. Sparse curves (fewer than 10 hourly observations) are retained at this stage and filtered downstream.

Diurnal body temperature profiles across all sites and months. Each curve represents one snake on one calendar day. Red curves fall during the inferred inactive season for that locale.

Season classification is determined at the locale level, not individually. Some curves appear mismatched because individual snakes can be active or inactive outside the typical window for their site.

3.3 Model specification

3.3.1 Pooled cyclic model (primary)

A single model is fit per covariate across all 12 months. The cyclic cubic spline basis on month — with knots at \(m = 0.5\) and \(m = 12.5\) — ensures a smooth December → January transition.

Stage 1 — Initial BAM:

\[ T_b(t, m) = f_0(t) + g_0(m) + h_0(t, m) + x \cdot h_1(t, m) + \varepsilon(t, m) \]

  • \(f_0(t)\): s(t, bs = "cc", k = 20), knots at \(t = 0, 23\)
  • \(g_0(m)\): s(month, bs = "cc", k = 12), knots at \(m = 0.5, 12.5\)
  • \(h_0(t, m)\): te(t, month, bs = c("cc", "cc"), k = c(12, 8))
  • \(h_1(t, m)\): te(t, month, bs = c("cc", "cc"), k = c(12, 8), by = x)

Fit via mgcv::bam() with method = "fREML" and discrete = TRUE. The tensor product basis dimensions k = c(12, 8) were increased from c(10, 6) to reduce EDF saturation.

Stage 2 — Multilevel FPCA:

Stage 1 residuals are averaged to snake-level mean curves (one per individual, pooled across all months) and decomposed into between-locale and within-locale components:

\[ \hat{\varepsilon}_{\ell i}(t) = \sum_{k=1}^{K_1} \xi_{\ell k}^{(1)} \phi_k^{(1)}(t) + \sum_{k=1}^{K_2} \xi_{\ell i, k}^{(2)} \phi_k^{(2)}(t) + e_{\ell i}(t) \]

Estimation uses run_mfpca() with \(K_1 = K_2 = 1\) component retained at each level. Missing hourly values are linearly interpolated before MFPCA.

Stage 3 — Full BAM:

\[ T_b(t, m) = f_0(t) + g_0(m) + h_0(t, m) + x \cdot h_1(t, m) + \phi_1^{(1)}(t) \cdot b_\ell + \phi_1^{(2)}(t) \cdot b_{\ell i} \]

where \(b_\ell \sim N(0, \sigma_1^2)\) (locale-level) and \(b_{\ell i} \sim N(0, \sigma_2^2)\) (snake-level) are eigenfunction-loaded random intercepts. Fit with select = TRUE in addition to the Stage 1 options.

Stage 4 — Population predictions:

Population-level predictions set \(b_\ell = b_{\ell i} = 0\):

\[ \hat{T}_b(t, m \mid x) = \hat{f}_0(t) + \hat{g}_0(m) + \hat{h}_0(t, m) + x \cdot \hat{h}_1(t, m) \]

Confidence bands use the lpmatrix approach (random-effect columns zeroed in the prediction matrix). Predictions span: \(t \in \{0, 0.1, \ldots, 23.9\}\), \(m \in \{1, \ldots, 12\}\), and two contrast levels (\(z = 0\) vs. \(z = c / \hat{\sigma}_x\)).

The full implementation is in analysis/modeling/fit-models.R (local) and analysis/modeling/tide/fit-pooled.R (HPC entrypoint). Key model-fitting code:

Code
prepped <- prep_tf_data(data_dir = "data/processed")

gam_data <- prepped$tf_df |>
  filter(tf_count(tb) > 10, !is.na(tmin)) |>
  select(fid, short.id, locale, tb, date) |>
  mutate(month = lubridate::month(date)) |>
  left_join(prepped$scaled, by = join_by(locale, date, short.id)) |>
  tf_unnest(tb) |>
  rename(t = tb_arg, tb = tb_value)

# Stage 1
fit_init <- bam(
  tb ~ s(t, bs = "cc", k = 20) +
       s(month, bs = "cc", k = 12) +
       te(t, month, bs = c("cc", "cc"), k = c(12, 8)) +
       te(t, month, bs = c("cc", "cc"), k = c(12, 8), by = tmin.z),
  data   = gam_data,
  knots  = list(t = c(0, 23), month = c(0.5, 12.5)),
  method = "fREML",
  discrete = TRUE
)

# Stage 2
fit_mfpca <- run_mfpca(Y_fill, id = locale_vec, visit = visit_vec)

# Stages 3–4: see analysis/modeling/fit-models.R for complete code

3.4 Results

Pre-computed model outputs are stored as lightweight extracts (_lite.rds) in analysis/rslt/model/. The package function reconstruct_predictions() regenerates population prediction grids at any desired contrast value.

3.4.1 Predicted curves

Predicted body temperature curves are shown at the baseline covariate level (\(z = 0\)) and at the shifted level (\(z = c / \hat{\sigma}_x\), corresponding to the contrasts in the table above).

Predicted body temperature curves at baseline and shifted covariate values.

3.4.2 Covariate effects

Pointwise differences between predicted curves (shifted minus baseline) reveal when during the day each covariate has the largest influence on body temperature.

Pointwise covariate effects by hour and month.

3.4.3 Effect sizes

The area between predicted curves (integrated absolute pointwise difference, normalized to an average hourly value in °C) summarizes the magnitude of each covariate’s effect per month.

3.5 Bootstrap inference

Model-based confidence intervals from vcov() are anti-conservative because within-curve residual autocorrelation is approximately 0.8 at lag-1, violating the independence assumption embedded in the model’s standard errors. The snake-level block bootstrap provides valid inference by resampling the set of snakes (\(n = 131\)) with replacement, preserving within-snake and within-curve dependence. For each bootstrap draw, the Stage-3 BAM is refit with Stages 1 and 2 held fixed, and the area between the two predicted population curves is computed per month:

\[ A_m = \int_0^{24} \left| \hat{T}_b(t, m \mid z = c/\hat{\sigma}) - \hat{T}_b(t, m \mid z = 0) \right| \, dt \]

Both absolute and signed areas are computed; the annualized summary is the mean across all 12 months. Bootstrap inference is applied to pooled models only. Results are based on 1000 draws per variable (50 HPC chunks × 20 draws). See analysis/posthoc/posthoc-spec.md for script-level details and execution order.

Pooled model — average hourly absolute effect size with 95% bootstrap CIs.

Pooled model — signed average hourly effect with 95% bootstrap CIs.

Annualized effect size comparison across covariates (95% bootstrap CI).