Galerucella Phenology

Summary
Field observations
Combining sites
Substage parameters
Comparison with current model parameters
Conclusion

Summary

I analyze field observations from different sites and years (and methods) in Oregon in order to estimate distributions of overwintering adult emergence and oviposition. These distributions can then be used to include variation in phenology between substages (or cohorts) in voltinism models.

Field observations

I extracted data from “Galerucella phenology 7-30-13.csv” for 4 sites in 2009 and 2013. Cumulative degree-days (10C/37.8C) are from nearby weather stations using uspest.org.

dat <- read.csv("GCA_pheno_2013.csv", header = TRUE)
dat$Date <- mdy(dat$Date)
dat <- dat %>% 
  group_by(Site, Stage) %>% 
  mutate(RelProp = MeanCount / max(MeanCount, na.rm = TRUE),
         CDF = cumsum(MeanCount) / sum(MeanCount, na.rm = TRUE),
         Count = ifelse(Stage == "adult",   ### this only works for Fritzi's data
                        round(MeanCount * 15), round(MeanCount * 150)))

Plots of raw data suggest that we have the full shape of the adult emergence distribution, but might want to truncate the long right tail. Egg distributions could be estimated by combining datasets, since two studies cover different times. Below, cumulative distribution functions show the variation between sites.

plt <- ggplot(dat, aes(x = GDD, y = CDF, group = Stage, color = Stage)) + 
  geom_point() +
  geom_line() +
  facet_grid(Site~Stage) #+
  # theme_bw() #+ 
  # ggtitle("Cumulative distribution function of observations")
plt

Combining sites

Smoothing with a loess function approximates average phenology, but does not take into account distribution for count data (and allows estimates below 0).

In all plots below, I scale observations at different sites for easier comparisons. All counts are scaled by the site’s maximum count to give values between 0 and 1.

plt <- ggplot(dat, aes(x = GDD, y = RelProp, group = Stage, color = Site)) + 
  geom_point() +
  geom_smooth() +
  facet_wrap(~Stage, ncol = 1) #+
  # theme_bw() #+
  # ggtitle("Smoothed raw data with loess")
plt

Smoothing with a GAM might be better. Here it is for adults, accounting for site population sizes. Data were modeled as counts with a Poisson distributions, but plotted with each count scaled by the site’s maximum count.

stagedat <- dat %>% filter(Stage == "adult")

mod <- gam(Count ~ s(GDD, k = 10) + s(Site, bs = "re"), method = "REML",
           family = poisson, data = stagedat)

xrange <- range(stagedat$GDD)
xseq <- seq(from=xrange[1], to=xrange[2], length=1000)
preddat <- data.frame(GDD = xseq, Site = "Sutherlin")
preddat$y <- predict(mod, newdata = preddat, exclude = "s(Site)", type = "response")
preddat$RelProp <- preddat$y / max(preddat$y, na.rm = TRUE)
adultpred <- preddat #for later chunk

plt_adult <- ggplot(stagedat, aes(x = GDD, y = RelProp, color = Site)) + 
  geom_point() +
  geom_path(data = preddat) +
  ggtitle("Overwintering adults")
  # facet_wrap(~Stage, ncol = 1) +
  # theme_bw()
plt_adult

When the same model is used for eggs, we see one problems with a GAM smooth. The smoothing needs to be adjusted to avoid overfitting, here shown with the default 10 knots.

stagedat <- dat %>% filter(Stage == "egg")

mod <- gam(Count ~ s(GDD, k = 10) + s(Site, bs = "re"), method = "REML",
           family = poisson, data = stagedat)

xrange <- range(stagedat$GDD)
xseq <- seq(from=xrange[1], to=xrange[2], length=1000)
preddat <- data.frame(GDD = xseq, Site = "Sutherlin")
preddat$y <- predict(mod, newdata = preddat, exclude = "s(Site)", type = "response")
preddat$RelProp <- preddat$y / max(preddat$y, na.rm = TRUE)

plt <- ggplot(stagedat, aes(x = GDD, y = RelProp, color = Site)) + 
  geom_point() +
  geom_path(data = preddat) +
  ggtitle("F1 Eggs")
  # facet_wrap(~Stage, ncol = 1) +
  # theme_bw()
plt

Even with a less flexible model (knots = 5), Garono’s counts stay high all the way to the emergence of F1 adults, which we might want to change. At least we can see where the peak is for the egg distribution when combining both datasets.

stagedat <- dat %>% filter(Stage == "egg")

mod <- gam(Count ~ s(GDD, k = 5) + s(Site, bs = "re"), method = "REML",
           family = poisson, data = stagedat)

xrange <- range(stagedat$GDD)
xseq <- seq(from=xrange[1], to=xrange[2], length=1000)
preddat <- data.frame(GDD = xseq, Site = "Sutherlin")
preddat$y <- predict(mod, newdata = preddat, exclude = "s(Site)", type = "response")
preddat$RelProp <- preddat$y / max(preddat$y, na.rm = TRUE)
eggpred <- preddat # for later analysis

plt_egg <- ggplot(stagedat, aes(x = GDD, y = RelProp, color = Site)) + 
  geom_point() +
  geom_path(data = preddat) +   
  ggtitle("F1 Eggs")
  # facet_wrap(~Stage, ncol = 1) +
  # theme_bw()
plt_egg

Substage parameters

From the GAM predictions, I make a CDF which will be split up to derive substage parameters: mean GDD for emergence and the weight (% of total population). This functions can take any shape of model predictions for the distribution and splits it into any number of substages using a defined proportion of the whole distribution to cut off extreme values (must be between 0 and 1).

# distribution needs x and cdf
gamdist <- adultpred %>% 
  mutate(CDF = cumsum(y) / sum(y, na.rm = TRUE),
         x = GDD)

SubstageDistrib <- function(dist, numstage, perc){
  # this is an approximation from GAM predictions, could make more precise
  # with more predicted values or interpolation, but doesn't seem necessary
  ReturnClosestValue <- function(dist, xval){
    out <- dist$CDF[which(dist$x > xval)[1]]
  }
  
  low <- (1 - perc)/2
  high <- 1 - (1 - perc) / 2
  low <- dist$x[which(dist$CDF > low)[1]]
  high <- dist$x[which(dist$CDF > high)[1]]
  
  bounds <- seq(low, high, length.out = numstage + 1)
  means <- (bounds[1:numstage] + bounds[2:(numstage + 1)]) / 2
  weights <- diff(sapply(X = bounds, FUN = ReturnClosestValue, dist = dist), lag = 1)
  return(data.frame(means, weights))
}

Here are results with adult emergence with different numbers of substages and percentages of the whole distribution included. Bars represent substages and the red line shows the predicted distribution from the GAM.

newdat <- expand.grid(NumStage = c(5, 7, 9),
                      PercIncl = c(.9, .95, .99))
test <- newdat %>% 
  group_by(NumStage, PercIncl) %>% 
  do(SubstageDistrib(dist = gamdist, numstage = .$NumStage, perc = .$PercIncl)) %>% 
  mutate(GDD = means, 
         RelProp = weights / max(weights),
         BarWidth = diff(means[1:2]))

plt <- ggplot(data = test, aes(x = GDD, y = RelProp)) +
  geom_col(aes(width = BarWidth), color = "black", alpha = 0) +
  geom_path(data = adultpred, color = "red", alpha = .5, size = 1.5) +
  facet_grid(NumStage ~ PercIncl) +
  ggtitle("Overwintering adults")
plt

Comparison with current model parameters

DDRP model is limited to phenology from McAvoy & Kok 2007, which only has 1-3 observations per year for adults and 1-5 observations for eggs per year from 1993-95 from one site in Coeburn, VA.

Here I plot the adult and egg distributions from Oregon data compared to the DDRP estimates for start and peak of oviposition based on McAvoy & Kok (red triangles). I added an estimate of end of oviposition as peak + 1/2 oviposition period length, which is same way DDRP estimates the start of oviposition.

plt_ad <- plt_adult +
  geom_point(aes(x = 256.2, y = 1, color = "Coeburn"), shape = 17, size = 3) +
  geom_point(aes(x = 108, y = 0, color = "Coeburn"), shape = 17, size = 3) +
  geom_point(aes(x = 404.4, y = 0, color = "Coeburn"), shape = 17, size = 3) +
  scale_x_continuous(limit = c(0, 600)) +
  # theme(legend.position = "none") +
  ggtitle("Overwintering adults")

plt_eg <- plt_egg +
  geom_point(aes(x = 256.2, y = 1, color = "Coeburn"), shape = 17, size = 3) +
  geom_point(aes(x = 108, y = 0, color = "Coeburn"), shape = 17, size = 3) +  
  geom_point(aes(x = 404.4, y = 0, color = "Coeburn"), shape = 17, size = 3) +
  scale_x_continuous(limit = c(0, 600)) +
  # theme(legend.position = "none") +
  ggtitle("F1 Eggs")

multiplot(plt_ad, plt_eg, cols = 1)

Conclusion

The DDRP estimates, based on observations from mid-1990s in Virginia, match well with egg observations from 2010s in Oregon. The adult peak and egg peak do not match well. Even with a pre-oviposition period, I think that egg observations give an estimation of oviposition that is biased late. This could be due to sampling being days after oviposition, early eggs hatching, or counting eggs after they have hatched.

For the substage model, what matters is the timing of oviposition. I plan to truncate the late season Garono egg observations so that the distribution tapers to zero. The taper will be informed by the end of the adult distribution, after which oviposition cannot occur. The beginning of the oviposition curve probably lays somewhere between the adult and egg distributions, due to the pre-oviposition period and egg counts being a biased proxy for oviposition. I am not sure how I’ll combine these distributions yet.