New England Phenology

Note 2/26/2019: Starting with a clean document. See ne_flora.rmd for archived version.

1) Hervey data

There are 1280 taxon names in the Hervey data. Here’s a summary of the number of species that occur (are marked with an “X”) in each of the Hervey-defined periods.

Table 1: Hervey date ranges, and the number of species recorded within each. For our purposes, I’ve kept the period names from Hervey, but the actual periods end one day sooner to avoid overlap (e.g. “Apr 1-Apr 15” goes from day 91 to day 104).

## # A tibble: 18 x 2
##    date             `N taxa`
##    <chr>               <int>
##  1 Mar 15-Apr1             9
##  2 Apr 1-Apr 15           36
##  3 Apr 15-May 1           32
##  4 May 1-May 10          141
##  5 May 10-May 20          79
##  6 May 20-June 1         145
##  7 June 1-June 10        122
##  8 June 10-June 20       138
##  9 June 20-July 1        157
## 10 July 1-July 10        145
## 11 July 10-July 20       141
## 12 July 20-Aug 1         159
## 13 Aug. 1-Aug. 10        196
## 14 Aug. 10-Aug. 20       163
## 15 Aug. 20-Sept. 1       144
## 16 Sept. 1-Sept. 15      123
## 17 October                92
## 18 Nov                     4
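The "end one day sooner" rule can be sketched as below. This is only an illustration: the `bounds` data frame, the `hervey_periods` name, and the 1901 reference year are my assumptions, and only the first four period boundaries are included.

```r
# Period boundaries as (month, day); only the first four Hervey boundaries
# are listed to keep the sketch short.
bounds <- data.frame(
  month = c("Mar", "Apr", "Apr", "May"),
  day   = c(15L, 1L, 15L, 1L)
)

# Day of year in a non-leap reference year (1901 is an arbitrary choice).
# match() against month.abb avoids locale-dependent month-name parsing.
dates <- as.Date(sprintf("1901-%02d-%02d",
                         match(bounds$month, month.abb), bounds$day))
doy <- as.integer(format(dates, "%j"))

n <- length(doy)
labels <- paste(bounds$month, bounds$day)
hervey_periods <- data.frame(
  period = paste(labels[-n], labels[-1], sep = "-"),
  begin  = doy[-n],
  end    = doy[-1] - 1L  # stop one day before the next period starts
)
hervey_periods
```

With these inputs the “Apr 1-Apr 15” row runs from day 91 to day 104, matching the example in the Table 1 caption.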

2) Contemporary data

There are 3531 observations of 764 species in the contemporary data.

Table 2: Contemporary data. A summary of the number of observations (1 count per observation of a species on a date in a year), the number of taxa, and the range of dates on which flowering was observed in each year:

## # A tibble: 5 x 4
##   year_of_obs `N observations` `N taxa` daterange              
##         <dbl>            <int>    <int> <chr>                  
## 1        2014              364      254 2014-03-05 - 2014-09-17
## 2        2015              458      284 2015-03-06 - 2015-12-27
## 3        2016              919      419 2016-01-01 - 2016-12-31
## 4        2017              892      423 2017-02-19 - 2017-11-25
## 5        2018              898      409 2018-02-09 - 2018-11-19

Presentation Figure 2 (in the original numbering scheme)

3) Introduced vs. non-introduced

Hervey: no significant difference in flowering midpoints between introduced and non-introduced species (mean day 189.5 vs. 191.1, p = 0.44).

## 
##  Welch Two Sample t-test
## 
## data:  dhervey_tidy[dhervey_tidy$is_introduced == "introduced", "midpoints"] and dhervey_tidy[dhervey_tidy$is_introduced == "native", "midpoints"]
## t = -0.7688, df = 1548.3, p-value = 0.4421
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -5.681056  2.481706
## sample estimates:
## mean of x mean of y 
##  189.4564  191.0561

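Contemporary: significant difference in observation day between introduced and non-introduced species. Introduced species were observed about 26 days earlier on average (mean day 178.0 vs. 203.8, p < 2.2e-16).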

## 
##  Welch Two Sample t-test
## 
## data:  contemporary_dates[contemporary_dates$is_introduced == "introduced", "day_of_year"][[1]] and contemporary_dates[contemporary_dates$is_introduced == "native", "day_of_year"][[1]]
## t = -12.822, df = 2637.9, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -29.65884 -21.79075
## sample estimates:
## mean of x mean of y 
##  178.0299  203.7547
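For reference, here is a minimal sketch of the same kind of comparison using the formula interface of `t.test()`, which prints a shorter `data:` line than the indexed call. The `toy` data frame and its values are invented for illustration and are not the real observations.

```r
# Toy data only: these day-of-year values are made up.
toy <- data.frame(
  is_introduced = rep(c("introduced", "native"), each = 5),
  day_of_year   = c(150, 160, 170, 180, 190, 180, 195, 205, 215, 230)
)

# t.test() applies the Welch correction by default (var.equal = FALSE).
fit <- t.test(day_of_year ~ is_introduced, data = toy)

fit$method    # which test was run
fit$estimate  # group means, introduced first (alphabetical order)
fit$conf.int  # 95% CI for the difference in means
```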

Presentation Figure 1. (in the original numbering scheme, slightly redrawn)

Below, I’ve done two new figures that pull apart Figure 1 by breaking it into species that overlap between the Hervey and contemporary observation periods (“retained”), were found only in the Hervey data (“lost”), or were found only in the contemporary data (“gained”). I’ve plotted them both as densities (black and white), for better comparison among groups, and as histograms (blue and red), so that you can see actual units. One obvious result seems to be that newly introduced species are notably early.
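The retained/lost/gained split is just set membership. A minimal sketch, with placeholder taxon names and hypothetical vector names (`hervey_taxa`, `contemporary_taxa`, `taxon_status`):

```r
# Placeholder taxon names, not real records.
hervey_taxa       <- c("Taxon A", "Taxon B", "Taxon C")
contemporary_taxa <- c("Taxon A", "Taxon B", "Taxon D")

all_taxa  <- union(hervey_taxa, contemporary_taxa)
in_hervey <- all_taxa %in% hervey_taxa
in_contem <- all_taxa %in% contemporary_taxa

taxon_status <- data.frame(
  taxon  = all_taxa,
  status = ifelse(in_hervey & in_contem, "retained",
                  ifelse(in_hervey, "lost", "gained"))
)
table(taxon_status$status)
```

This assumes taxon names have already been reconciled between the two data sets. Any synonymy left unresolved would show up as a spurious "lost" plus "gained" pair.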

Figure 1b.

Figure 1c.

4) Duration/variability

Presentation Figure 3. (in the previous numbering)

It certainly looks like the contemporary data shows a longer duration (or higher variability), with earlier and later flowering. However, this is looking at all contemporary and all historical data: 1499 taxa. Could part of the difference come from the different taxa being observed? As the figure above suggests, even only considering “retained” species, contemporary observations are more variable/of longer duration.

Figure 3b (new version)

We can also compare species pairwise: for each species, take the latest flowering date minus the earliest flowering date across all contemporary observations, and compare this to t1_end minus t1_begin for the Hervey data.
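A rough sketch of that paired comparison is below. The column names `t1_begin` and `t1_end` come from the Hervey data; the `obs`, `hervey`, `paired`, and `duration_change` names and all values are invented for illustration.

```r
# Toy contemporary observations and toy Hervey ranges (values invented).
obs <- data.frame(
  taxon       = c("A", "A", "A", "B", "B"),
  day_of_year = c(100, 120, 140, 200, 210)
)
hervey <- data.frame(
  taxon    = c("A", "B"),
  t1_begin = c(105, 196),
  t1_end   = c(120, 212)
)

# Contemporary duration: latest minus earliest observation per taxon.
contemporary <- aggregate(day_of_year ~ taxon, data = obs,
                          FUN = function(d) max(d) - min(d))
names(contemporary)[2] <- "contemporary_duration"

# Historical duration from the Hervey period bounds.
hervey$hervey_duration <- hervey$t1_end - hervey$t1_begin

# Keep only taxa present in both data sets, then take the difference.
paired <- merge(hervey, contemporary, by = "taxon")
paired$duration_change <- paired$contemporary_duration - paired$hervey_duration
paired
```

Positive `duration_change` means the contemporary range is wider than the Hervey range for that taxon.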

Presentation Figure 5. (in previous numbering) If we compare the temporal range of each of the 545 species from the Hervey data that show up in at least one of the contemporary years to their temporal range across the contemporary years, it’s clear that the modern period (2014-2018) shows a greater variation/duration than the 1860-1911 period.

Is this just variation among contemporary years? No, it looks like if we compare each of the contemporary years against the Hervey records of species that match it, most of the individual years show a similar pattern:

Presentation Figure 4. (in previous numbering)

5) Temperature in contemporary years

We use CRUTEM4 for 1850-2016 and add Wunderground monthly data for 2017 and 2018 from New Bedford (Station 720223): https://crudata.uea.ac.uk/cru/data/crutem/ge/crutem4-2018-06/N42.5W072.5/720223_data.txt.

Flowering was significantly earlier in 2016. Here I’m comparing across years among different species by converting mean flowering day for a year into deviation from the mean for the species across years: +1 indicates that a species flowered 1 day later in the given year than the mean across years; -1 indicates that a species flowered 1 day earlier in the given year than the mean across years.
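The deviation described above can be computed per species with `ave()`. This is a sketch on invented values: `species_mean` and `mean_difference` echo names in the model output, but the `yearly` data frame and `mean_day` column are hypothetical.

```r
# Toy per-species, per-year mean flowering days (values invented).
yearly <- data.frame(
  taxon    = c("A", "A", "A", "B", "B"),
  year     = c(2014, 2015, 2016, 2014, 2016),
  mean_day = c(120, 118, 110, 200, 204)
)

# ave() returns the group mean aligned to each row (default FUN is mean).
yearly$species_mean <- ave(yearly$mean_day, yearly$taxon)

# Positive = later than the species' own mean; negative = earlier.
yearly$mean_difference <- yearly$mean_day - yearly$species_mean
yearly
```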

Difference in flowering time among years - boxplot

Difference in flowering time among years - ANOVA with post hoc Tukey HSD and Nemenyi tests

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = mean_difference ~ as.factor(year), data = contemporary_and_temp)
## 
## $`as.factor(year)`
##                 diff          lwr        upr     p adj
## 2015-2014 -3.0271739 -10.15609748  4.1017497 0.7724764
## 2016-2014 -7.6919411 -14.82086463 -0.5630175 0.0271001
## 2017-2014 -2.6885219  -9.81744543  4.4404017 0.8400477
## 2018-2014 -0.6532353  -7.78215889  6.4756882 0.9991237
## 2016-2015 -4.6647671 -11.79369071  2.4641564 0.3794188
## 2017-2015  0.3386520  -6.79027152  7.4675756 0.9999353
## 2018-2015  2.3739386  -4.75498498  9.5028622 0.8922522
## 2017-2016  5.0034192  -2.12550437 12.1323428 0.3069702
## 2018-2016  7.0387057  -0.09021783 14.1676293 0.0548862
## 2018-2017  2.0352865  -5.09363702  9.1642101 0.9357876

## 
##  Pairwise comparisons using Tukey and Kramer (Nemenyi) test  
##                    with Tukey-Dist approximation for independent samples 
## 
## data:  contemporary_differences$mean_difference and contemporary_differences$year 
## 
##      2014   2015   2016   2017  
## 2015 0.4270 -      -      -     
## 2016 0.0092 0.5179 -      -     
## 2017 0.8686 0.9469 0.1418 -     
## 2018 1.0000 0.3716 0.0067 0.8259
## 
## P value adjustment method: none

This difference among years is significantly related to the annual temperature of the year, although the explanatory power of temperature is low.

Difference in flowering time with annual temperature - ANOVA. 2016 is significantly different.

Difference in flowering time with annual temperature - linear model

## 
## Call:
## lm(formula = mean_difference ~ Annual, data = contemporary_and_temp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -77.927  -9.691   0.207   8.377 108.493 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   44.877     15.979   2.809  0.00519 **
## Annual        -4.071      1.438  -2.830  0.00485 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.65 on 458 degrees of freedom
## Multiple R-squared:  0.01719,    Adjusted R-squared:  0.01505 
## F-statistic: 8.011 on 1 and 458 DF,  p-value: 0.004853

Spring and Fall differences

Elsewhere we’ve shown that early plants (days 72-172) tend to flower earlier and later plants (days 173-305) tend to flower later (across different years). To compare these, we regress them against the average temperature of the preceding 6 (or 3) months; here we report early species versus 6- and 3-month temperature and late species versus 6- and 3-month temperature.

Y axis: phenological deviation, i.e. the mean flowering day within a year minus the mean across 2014-2018.

X axis: temperature deviation, i.e. the mean temperature for a 6-month (or 3-month) period within a year minus the mean temperature for that period across years 1850-2018.
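As a sketch of the temperature deviation on the X axis: the function below averages the k months before a reference month and subtracts the long-term mean for the same months. The reference month, the `window_deviation()` helper, and all temperatures are assumptions for illustration; only the `p3mo` and `p6mo` names come from the models that follow.

```r
# Invented monthly mean temperatures (deg C), January to December.
# long_term stands in for the 1850-2018 monthly means.
long_term <- c(-2, -1, 3, 8, 14, 19, 22, 21, 17, 11, 6, 0)
this_year <- c(0, 1, 5, 9, 15, 20, 23, 22, 18, 12, 7, 1)

# Assumption: the window is the k months immediately before end_month,
# all within the same calendar year.
window_deviation <- function(year_temps, baseline, end_month, k) {
  months <- seq(end_month - k, end_month - 1)
  stopifnot(min(months) >= 1, max(months) <= 12)
  mean(year_temps[months]) - mean(baseline[months])
}

p3mo <- window_deviation(this_year, long_term, end_month = 7, k = 3)
p6mo <- window_deviation(this_year, long_term, end_month = 7, k = 6)
c(p3mo = p3mo, p6mo = p6mo)
```

A window that reaches back into the previous calendar year would need the previous year's values appended first; this sketch deliberately refuses that case rather than guessing.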

Early-flowering spp (before day 173) versus previous 6 and previous 3 months

## 
## Call:
## lm(formula = mean_difference ~ p6mo, data = contemporary_and_temp_with_window[contemporary_and_temp_with_window$species_mean < 
##     173, ])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -70.397  -8.805  -1.121   6.810  92.378 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   1.5448     1.2752   1.211  0.22703   
## p6mo         -3.0125     0.9049  -3.329  0.00102 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.09 on 219 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.04817,    Adjusted R-squared:  0.04382 
## F-statistic: 11.08 on 1 and 219 DF,  p-value: 0.001022

## 
## Call:
## lm(formula = mean_difference ~ p3mo, data = contemporary_and_temp_with_window[contemporary_and_temp_with_window$species_mean < 
##     173, ])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -71.245  -8.741  -0.976   6.822  93.220 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   0.8854     1.2755   0.694   0.4883  
## p3mo         -2.2664     1.0452  -2.168   0.0312 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.26 on 223 degrees of freedom
## Multiple R-squared:  0.02065,    Adjusted R-squared:  0.01626 
## F-statistic: 4.702 on 1 and 223 DF,  p-value: 0.03119

Late-flowering spp (after day 172) versus previous 6 and previous 3 months

## 
## Call:
## lm(formula = mean_difference ~ p6mo, data = contemporary_and_temp_with_window[contemporary_and_temp_with_window$species_mean > 
##     172, ])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -78.036 -10.691   0.312   8.784 107.490 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -0.5227     1.5167  -0.345    0.731
## p6mo          0.3751     1.3966   0.269    0.788
## 
## Residual standard error: 18.22 on 233 degrees of freedom
## Multiple R-squared:  0.0003095,  Adjusted R-squared:  -0.003981 
## F-statistic: 0.07215 on 1 and 233 DF,  p-value: 0.7885

## 
## Call:
## lm(formula = mean_difference ~ p3mo, data = contemporary_and_temp_with_window[contemporary_and_temp_with_window$species_mean > 
##     172, ])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -78.850 -10.673   0.617   8.247 109.509 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   0.7936     1.5242   0.521    0.603
## p3mo         -1.1686     1.0528  -1.110    0.268
## 
## Residual standard error: 18.17 on 233 degrees of freedom
## Multiple R-squared:  0.00526,    Adjusted R-squared:  0.0009905 
## F-statistic: 1.232 on 1 and 233 DF,  p-value: 0.2682

Previous 6 months

6) Adding in Hervey temperatures

## 
##  Pairwise comparisons using Tukey and Kramer (Nemenyi) test  
##                    with Tukey-Dist approximation for independent samples 
## 
## data:  both_with_window$species_deviation and both_with_window$year 
## 
##      1860  2014  2015  2016  2017 
## 2014 0.015 -     -     -     -    
## 2015 0.532 0.631 -     -     -    
## 2016 1.000 0.022 0.619 -     -    
## 2017 0.203 0.927 0.992 0.263 -    
## 2018 0.020 1.000 0.697 0.031 0.954
## 
## P value adjustment method: none

## 
## Call:
## lm(formula = species_deviation ~ as.factor(year), data = both_with_window)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -64.190 -10.022  -0.109   8.879  94.212 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           -6.005      2.027  -2.963 0.003216 ** 
## as.factor(year)2014   10.358      2.866   3.614 0.000337 ***
## as.factor(year)2015    6.227      2.866   2.173 0.030350 *  
## as.factor(year)2016    2.060      2.866   0.719 0.472630    
## as.factor(year)2017    7.445      2.866   2.597 0.009713 ** 
## as.factor(year)2018    9.938      2.866   3.467 0.000578 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.43 on 438 degrees of freedom
## Multiple R-squared:  0.04671,    Adjusted R-squared:  0.03583 
## F-statistic: 4.292 on 5 and 438 DF,  p-value: 0.0008018

## 
## Call:
## lm(formula = species_deviation ~ Annual, data = both_with_window)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -63.735 -10.127   0.158   8.777  94.868 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   19.556     16.121   1.213    0.226
## Annual        -1.780      1.465  -1.215    0.225
## 
## Residual standard error: 17.75 on 442 degrees of freedom
## Multiple R-squared:  0.003327,   Adjusted R-squared:  0.001072 
## F-statistic: 1.476 on 1 and 442 DF,  p-value: 0.2251

## 
## Call:
## lm(formula = species_deviation ~ p6mo, data = both_with_window)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -63.544  -9.488   0.142   8.742  98.115 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   1.3753     0.9547   1.441  0.15039   
## p6mo         -2.5817     0.8677  -2.975  0.00309 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.6 on 442 degrees of freedom
## Multiple R-squared:  0.01964,    Adjusted R-squared:  0.01742 
## F-statistic: 8.853 on 1 and 442 DF,  p-value: 0.003087

## 
## Call:
## lm(formula = species_deviation ~ p3mo, data = both_with_window)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -63.402  -9.527  -0.101   8.775 100.124 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   1.6373     0.9758   1.678  0.09406 . 
## p3mo         -2.6643     0.8248  -3.230  0.00133 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.57 on 442 degrees of freedom
## Multiple R-squared:  0.02307,    Adjusted R-squared:  0.02086 
## F-statistic: 10.44 on 1 and 442 DF,  p-value: 0.001328

## 
## Call:
## lm(formula = species_deviation ~ Annual, data = both_with_window[both_with_window$species_mean < 
##     173, ])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -62.805  -8.510  -0.439   7.649  61.733 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    1.274     21.339    0.06    0.952
## Annual        -0.116      1.940   -0.06    0.952
## 
## Residual standard error: 15.92 on 202 degrees of freedom
## Multiple R-squared:  1.771e-05,  Adjusted R-squared:  -0.004933 
## F-statistic: 0.003577 on 1 and 202 DF,  p-value: 0.9524

## 
## Call:
## lm(formula = species_deviation ~ Annual, data = both_with_window[both_with_window$species_mean > 
##     172, ])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -64.494 -11.701   1.134   9.807  95.212 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   35.095     23.710   1.480     0.14
## Annual        -3.194      2.155  -1.482     0.14
## 
## Residual standard error: 19.19 on 238 degrees of freedom
## Multiple R-squared:  0.009146,   Adjusted R-squared:  0.004983 
## F-statistic: 2.197 on 1 and 238 DF,  p-value: 0.1396

## 
## Call:
## lm(formula = species_deviation ~ p6mo, data = both_with_window[both_with_window$species_mean < 
##     173, ])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -61.404  -8.636  -0.876   7.311  60.086 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    1.265      1.195   1.058  0.29136   
## p6mo          -2.580      0.974  -2.649  0.00871 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.65 on 202 degrees of freedom
## Multiple R-squared:  0.03357,    Adjusted R-squared:  0.02879 
## F-statistic: 7.017 on 1 and 202 DF,  p-value: 0.008714

## 
## Call:
## lm(formula = species_deviation ~ p6mo, data = both_with_window[both_with_window$species_mean > 
##     172, ])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -63.642 -11.237   1.246   9.333  98.042 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)    1.477      1.520   0.972   0.3322  
## p6mo          -2.596      1.552  -1.673   0.0957 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 19.17 on 238 degrees of freedom
## Multiple R-squared:  0.01162,    Adjusted R-squared:  0.007466 
## F-statistic: 2.798 on 1 and 238 DF,  p-value: 0.09571

## 
## Call:
## lm(formula = species_deviation ~ p3mo, data = both_with_window[both_with_window$species_mean < 
##     173, ])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -62.771  -8.427  -0.579   7.680  63.061 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   0.9009     1.1940   0.754   0.4514  
## p3mo         -2.3781     1.1998  -1.982   0.0488 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.77 on 202 degrees of freedom
## Multiple R-squared:  0.01908,    Adjusted R-squared:  0.01422 
## F-statistic: 3.929 on 1 and 202 DF,  p-value: 0.04882

## 
## Call:
## lm(formula = species_deviation ~ p3mo, data = both_with_window[both_with_window$species_mean > 
##     172, ])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -62.795 -10.944   1.298  10.055 100.374 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    2.501      1.558   1.605   0.1098   
## p3mo          -3.069      1.179  -2.604   0.0098 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 19.01 on 238 degrees of freedom
## Multiple R-squared:  0.0277, Adjusted R-squared:  0.02361 
## F-statistic:  6.78 on 1 and 238 DF,  p-value: 0.009796

Variation/duration in relation to temperature

## 
## Call:
## lm(formula = longer ~ as.factor(year_of_obs), data = duration_temp_with_window)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -142.852  -24.653   -4.963   20.940  164.222 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)   
## (Intercept)                  -5.537      7.985  -0.693  0.48908   
## as.factor(year_of_obs)2014  -16.778     11.293  -1.486  0.13937   
## as.factor(year_of_obs)2015  -12.444     11.293  -1.102  0.27216   
## as.factor(year_of_obs)2016   22.815     11.293   2.020  0.04506 * 
## as.factor(year_of_obs)2017   31.000     11.293   2.745  0.00676 **
## as.factor(year_of_obs)2018    8.630     11.293   0.764  0.44591   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 41.49 on 156 degrees of freedom
## Multiple R-squared:  0.1538, Adjusted R-squared:  0.1267 
## F-statistic: 5.672 on 5 and 156 DF,  p-value: 7.787e-05

7) Seasonal differences

Do spring days tend to have flowers that have moved earlier and fall days have flowers that have moved later? Yes.

Here I took each day_of_year for which you have contemporary data, and asked whether species observations made on that day tended to come before (negative), during (0), or after (positive) the historical range:

Presentation Figure 9. (in previous numbering)
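The before/during/after score described above can be written as a signed offset from the historical range. This is a sketch under my own assumptions: `range_offset()` is a hypothetical helper, and the `before_after` variable used in the model further down may or may not have been built this way.

```r
# Signed offset of an observation relative to a historical range:
# negative = days before t1_begin, 0 = inside the range,
# positive = days after t1_end.
range_offset <- function(day_of_year, t1_begin, t1_end) {
  ifelse(day_of_year < t1_begin, day_of_year - t1_begin,
         ifelse(day_of_year > t1_end, day_of_year - t1_end, 0))
}

# Three toy observations against a historical range of days 105-120.
offsets <- range_offset(c(95, 110, 130), t1_begin = 105, t1_end = 120)
offsets
```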

Previously, we asked whether the day of year on which an observation was made was related to how likely it was to come before, after, or during the historical period, and got this nice relationship.

Presentation Figure 6. (In previous numbering)

I’m now certain this is an artefact; if we use the Hervey dates as our x axis instead, the pattern is reversed.

We can see this more simply if we plot the actual observations.

However, we can maybe remove this artefact by using the mean of the Hervey dates and our dates as the x axis. In which case, for species means, we still get an effect!

## 
## Call:
## lm(formula = before_after ~ t1_t2_mean, data = test_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -132.771  -13.525   -3.864    9.659  117.065 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -10.51290    4.82890  -2.177   0.0299 *  
## t1_t2_mean    0.10939    0.02507   4.362 1.54e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 25.42 on 541 degrees of freedom
## Multiple R-squared:  0.03398,    Adjusted R-squared:  0.03219 
## F-statistic: 19.03 on 1 and 541 DF,  p-value: 1.541e-05

This suggests a new version of presentation figure 6.

8) Phylogeny

We want to count, for each species, the number of observations in each category: off (< -50 or > 75), early (-50 to 0), late (0 to 75), and during (0).

(Or we can use the 5% and 95% quantiles: < -31 or > 84 for off, -31 to 0 for early, 0 to 84 for late.)
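A minimal sketch of that tally, assuming each observation already has a signed offset from the historical range. `classify_offset()`, the `offsets` data frame, and the toy values are hypothetical, and I am treating exactly 0 as "during" and the cutoffs themselves as not "off".

```r
# Classify a signed offset (days) into off / early / during / late.
# Defaults use the fixed cutoffs; the quantile version would pass
# lower = -31, upper = 84.
classify_offset <- function(offset, lower = -50, upper = 75) {
  ifelse(offset < lower | offset > upper, "off",
         ifelse(offset < 0, "early",
                ifelse(offset > 0, "late", "during")))
}

# Toy offsets for two placeholder taxa (values invented).
offsets <- data.frame(
  taxon  = c("A", "A", "A", "B", "B"),
  offset = c(-60, -10, 0, 20, 90)
)
offsets$class <- factor(classify_offset(offsets$offset),
                        levels = c("off", "early", "during", "late"))

# One row per taxon, one column per category, sorted by "early".
counts <- as.data.frame.matrix(table(offsets$taxon, offsets$class))
counts[order(-counts$early), ]
```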

Eventually, we will compare this with phylogeny. For the moment, I’ll arrange these in descending order by the column “early”