map
For temperature, I used temperature data from the CRUTEM4 dataset (1850-2016; https://crudata.uea.ac.uk/cru/data/crutem/ge/crutem4-2018-06/N42.5W072.5/720223_data.txt) supplemented by Wunderground monthly data (for 2017 and 2018) from New Bedford Regional Airport.
Precipitation data from the US Historical Climate Network is available for most of the same period. Missing years are 2002-2016 and must be supplemented by precip data from the New Bedford Regional Airport station.
However, because of the difference in stations, there’s difference in precipitation. We can see this for the years 1997-2002 (where the two records overlap): the aiport consistently had ~200 mm less / year (17 mm less / month) than the network station.
So we might “correct” the data by adding 17 mm / month to the New Bedford Regional Airport data, or just use the raw data. Below, I’ve used the corrected data (yrs 2002-2016).
(Jan will add)
There are 1280 taxon names in the Hervey data, with phenological periods indicated.
There are 764 species in the contemporary data, with 3531 observations.
Here’s a summary of the # of species which occur (have an “X” marked) in various Hervey-defined periods.
Table 1: Contemporary and Hervey date ranges. Hervey date ranges, and the number of species recorded wihin each. For our purposes, I’ve still used the period names from Hervey, but the actual periods end one day sooner to avoid overlap: e.g. the “Apr 1-Apr 15” goes from day 91 to day 104). The contemporary data is any species for which an observation is recorded in any of the contemporary years.
## date N taxa 1860-1911 N taxa 2014-2018
## 1 Jan 1 - Mar 15 NA 23
## 2 Mar 15-Apr1 9 21
## 3 Apr 1-Apr 15 36 44
## 4 Apr 15-May 1 32 109
## 5 May 1-May 10 141 117
## 6 May 10-May 20 79 139
## 7 May 20-June 1 145 158
## 8 June 1-June 10 122 107
## 9 June 10-June 20 138 101
## 10 June 20-July 1 157 97
## 11 July 1-July 10 145 54
## 12 July 10-July 20 141 212
## 13 July 20-Aug 1 159 117
## 14 Aug. 1-Aug. 10 196 175
## 15 Aug. 10-Aug. 20 163 133
## 16 Aug. 20-Sept. 1 144 150
## 17 Sept. 1-Sept. 15 123 173
## 18 Sept. 15-Oct 1 NA 128
## 19 Oct. 1-Oct 31 92 99
## 20 Nov 1-Dec 31 4 36
There are 545 species of 341 genera that were recorded by Hervey and in at least one contemporary year. Of these,74 species of 66 genera were recorded by Hervey and in every contemporary year.
There are 735 species of 403 genera that were recorded by Hervey but not in the contemporary data; and 219 species of 154 genera that were recorded in the contemporary data but not by Hervey. Within this, 241 genera recorded by Hervey were never recorded in the contemporary data and 49 genera were newly recorded in the contemporary data but not by Hervey.
Figure: Comparing Hervey and contemporary observations, there is not a simple sign of earlier contemporary observations
Comparing as a whole the total 1499 across both Hervey and contemporary studies, there is not a simple earlier record of contemporary observations versus the mean (=midpoint, in this case) of Hervey’s species ranges.
Rather, it looks like 2014 and 2015 are signifcantly earlier, followed by Hervey, followed by 2016 and then 2017-2018, which are significantly later. Here are the year-to-year statistics:
## Df Sum Sq Mean Sq F value Pr(>F)
## year 5 248437 49687 17.51 <2e-16 ***
## Residuals 5553 15755970 2837
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: aov(formula = day_of_year ~ year, data = joined)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## 2014 - 1860 == 0 -11.634 3.032 -3.837 0.00165 **
## 2015 - 1860 == 0 -10.324 2.756 -3.746 0.00247 **
## 2016 - 1860 == 0 5.153 2.118 2.433 0.13857
## 2017 - 1860 == 0 6.616 2.140 3.092 0.02329 *
## 2018 - 1860 == 0 11.139 2.135 5.217 < 0.001 ***
## 2015 - 2014 == 0 1.310 3.740 0.350 0.99926
## 2016 - 2014 == 0 16.787 3.299 5.089 < 0.001 ***
## 2017 - 2014 == 0 18.250 3.313 5.509 < 0.001 ***
## 2018 - 2014 == 0 22.774 3.310 6.881 < 0.001 ***
## 2016 - 2015 == 0 15.477 3.047 5.080 < 0.001 ***
## 2017 - 2015 == 0 16.940 3.062 5.532 < 0.001 ***
## 2018 - 2015 == 0 21.464 3.059 7.018 < 0.001 ***
## 2017 - 2016 == 0 1.463 2.504 0.584 0.99160
## 2018 - 2016 == 0 5.986 2.499 2.395 0.15099
## 2018 - 2017 == 0 4.523 2.518 1.796 0.45580
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
Figure: Comparing Hervey records to records in each of the 5 Contemporary years
Could part of the difference come from the different taxa being observed between Hervey and modern periods? Below, we do the same for just the 545 shared taxa.
Below, can look at these 545 shared species among different years. Again, 2014 and 2015 are significantly earlier (the post-hoc test puts them in the “a” group), followed by Hervey, which - for this species set - is not significantly different from the other years (the post-hoc test puts them in the “b” group).
## Df Sum Sq Mean Sq F value Pr(>F)
## year 5 156362 31272 10.84 2.26e-10 ***
## Residuals 3804 10974138 2885
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: aov(formula = day_of_year ~ year, data = joined_ret)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## 2014 - 1860 == 0 -14.2127 3.6657 -3.877 0.00141 **
## 2015 - 1860 == 0 -13.1884 3.3088 -3.986 < 0.001 ***
## 2016 - 1860 == 0 3.1727 2.5893 1.225 0.81914
## 2017 - 1860 == 0 2.3349 2.6163 0.892 0.94641
## 2018 - 1860 == 0 6.4583 2.6163 2.469 0.12886
## 2015 - 2014 == 0 1.0243 4.3452 0.236 0.99990
## 2016 - 2014 == 0 17.3854 3.8258 4.544 < 0.001 ***
## 2017 - 2014 == 0 16.5477 3.8441 4.305 < 0.001 ***
## 2018 - 2014 == 0 20.6711 3.8441 5.377 < 0.001 ***
## 2016 - 2015 == 0 16.3611 3.4853 4.694 < 0.001 ***
## 2017 - 2015 == 0 15.5233 3.5054 4.428 < 0.001 ***
## 2018 - 2015 == 0 19.6467 3.5054 5.605 < 0.001 ***
## 2017 - 2016 == 0 -0.8378 2.8362 -0.295 0.99969
## 2018 - 2016 == 0 3.2856 2.8362 1.158 0.85179
## 2018 - 2017 == 0 4.1234 2.8608 1.441 0.69401
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
We still want to look for differnces between the different seasons in phenological change. I worked with Ivan to address my concerns about this analysis, and (to cut a long story short) what we were able to come up with is that for this data, regressing the mean shift of a species (difference between the species in contemporary and Hervey) on the mean flowering of a species (based on Hervey day of year) is not biased, whereas the other methods we considered (regressing on mean flowering of a species from contemporary observations or on an average of contemporary and Hervey) both are biased.
To demonstrate this, we did a simulation where our actual species mean shift values were randomly shuffled among species 100 times, and were able to show that the intercepts, slopes, and significances of these randomized simiulations looked very different than our actual data.
I’m not sure that we need to include any of this except for the model of our acutal data, but for your information, here’s one example simulation, our acutal data, and the slope and intercepts of the 100 simulations compared to the acutal data.
The outcome here is: yes: we may say that early-flowering species show a greater phenological delay, while late-flowering species show less or no change.
Full model for actual data:
##
## Call:
## lm(formula = before_after ~ t1_mean, data = test_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -121.476 -13.392 -4.318 10.149 118.569
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 26.70264 4.66838 5.720 1.76e-08 ***
## t1_mean -0.09148 0.02472 -3.701 0.000237 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 25.57 on 543 degrees of freedom
## Multiple R-squared: 0.0246, Adjusted R-squared: 0.0228
## F-statistic: 13.7 on 1 and 543 DF, p-value: 0.0002369
For the modern years, there is a small but significant pattern of earlier flowering in warmer years (A, solid blue line). However, our temperature and phenology data for the Hervey dataset do not follow this pattern, to an extent that there is no significant relationship when it is added (A, dotted black line).
In contrast, the small but significant pattern of LATER flower in modern wetter years (B, solid blue line) is in line with the Hervey data, and more significant when the Hervey period data added (B dotted black line).
##
## Call:
## lm(formula = species_deviation ~ mean_temp, data = both[both$year !=
## 1860, ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -66.590 -10.041 0.101 8.501 93.848
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 51.333 17.317 2.964 0.00323 **
## mean_temp -4.519 1.559 -2.899 0.00397 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.15 on 368 degrees of freedom
## Multiple R-squared: 0.02232, Adjusted R-squared: 0.01967
## F-statistic: 8.403 on 1 and 368 DF, p-value: 0.00397
##
## Call:
## lm(formula = species_deviation ~ mean_temp, data = both)
##
## Residuals:
## Min 1Q Median 3Q Max
## -62.712 -10.368 -0.016 8.536 94.398
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.1309 12.9869 -0.087 0.931
## mean_temp 0.1039 1.1911 0.087 0.931
##
## Residual standard error: 17.78 on 442 degrees of freedom
## Multiple R-squared: 1.723e-05, Adjusted R-squared: -0.002245
## F-statistic: 0.007615 on 1 and 442 DF, p-value: 0.9305
##
## Call:
## lm(formula = species_deviation ~ sum_precip, data = both[both$year !=
## 1860, ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -63.544 -10.378 -0.454 8.037 95.506
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -16.754469 6.500049 -2.578 0.01034 *
## sum_precip 0.013224 0.004742 2.789 0.00557 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.17 on 368 degrees of freedom
## Multiple R-squared: 0.0207, Adjusted R-squared: 0.01804
## F-statistic: 7.777 on 1 and 368 DF, p-value: 0.005566
##
## Call:
## lm(formula = species_deviation ~ sum_precip, data = both)
##
## Residuals:
## Min 1Q Median 3Q Max
## -63.113 -10.105 -0.194 9.109 96.364
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -21.210036 5.236805 -4.050 6.04e-05 ***
## sum_precip 0.016257 0.003963 4.102 4.88e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.45 on 442 degrees of freedom
## Multiple R-squared: 0.03667, Adjusted R-squared: 0.03449
## F-statistic: 16.82 on 1 and 442 DF, p-value: 4.881e-05
Spring-flowering species (those with species mean flowering date <183) are significantly advanced (that is, flower before the long-term mean date for their species) following warmer six-month periods (A). Responses of these species to precipitation are non-significant. In contrast, fall-flowering species (those with species mean flowering date >182) do not have a significant response to temperature but are significaly delayed following wetter six-month periods (B).
##
## Call:
## lm(formula = species_deviation ~ p6mo, data = both_with_window[both_with_window$species_mean <
## 183, ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -61.182 -8.525 -0.887 7.430 60.342
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.9804 1.1763 0.833 0.40556
## p6mo -2.5507 0.9562 -2.668 0.00824 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16.19 on 208 degrees of freedom
## Multiple R-squared: 0.03308, Adjusted R-squared: 0.02843
## F-statistic: 7.117 on 1 and 208 DF, p-value: 0.008239
##
## Call:
## lm(formula = species_deviation ~ p6mo_pr, data = both_with_window[both_with_window$species_mean >
## 182, ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -57.591 -10.076 1.239 8.699 94.258
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.27390 1.40520 -2.330 0.0207 *
## p6mo_pr 0.05238 0.01195 4.382 1.78e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 18.21 on 232 degrees of freedom
## Multiple R-squared: 0.07644, Adjusted R-squared: 0.07246
## F-statistic: 19.2 on 1 and 232 DF, p-value: 1.782e-05
In other words, more precipitation drives later fall flowering, and the fact that it’s much wetter now (in the months experienced by all but a few species in 2015) explains why fall flowering is later.
We can look at variability comparing latest flowering date minus earliest flowering date for each species for contemporary observations, and for Hervey data, and then taking the difference of these two durations. Positive values indicate duration of the species has lengthened, negative that it has shortened.
If we compare the temporal range of each of the 545 species from the Hervey data that show up in at least one of the contemporary years to their temporal range across the contemporary years, it’s clear that the modern period (2014-2018) shows a greater variation/duration than the 1860-1911 period.
##
## Wilcoxon rank sum test with continuity correction
##
## data: rjoined$t1_duration and rjoined$t2_duration
## W = 126270, p-value = 1.702e-05
## alternative hypothesis: true location shift is not equal to 0
And this isn’t just greater variability among contemporary years: there is greater duration in most contemorary years individually.
Contemporary years do differ in observed within-year duration.
##
## Call:
## lm(formula = duration ~ year, data = both_with_window)
##
## Residuals:
## Min 1Q Median 3Q Max
## -57.07 -27.57 -20.07 19.93 289.93
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 27.568 5.811 4.744 2.84e-06 ***
## year2014 -7.081 8.217 -0.862 0.389313
## year2015 -1.608 8.217 -0.196 0.844939
## year2016 29.500 8.217 3.590 0.000368 ***
## year2017 28.554 8.217 3.475 0.000562 ***
## year2018 18.054 8.217 2.197 0.028539 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 49.98 on 438 degrees of freedom
## Multiple R-squared: 0.08119, Adjusted R-squared: 0.0707
## F-statistic: 7.741 on 5 and 438 DF, p-value: 5.428e-07
Years with warmer temperatures are associated with greater within-year duration.
##
## Call:
## lm(formula = duration ~ p6mo, data = both_with_window)
##
## Residuals:
## Min 1Q Median 3Q Max
## -53.82 -33.73 -21.33 22.28 300.10
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 36.298 2.656 13.667 <2e-16 ***
## p6mo 5.805 2.390 2.429 0.0155 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 51.57 on 442 degrees of freedom
## Multiple R-squared: 0.01317, Adjusted R-squared: 0.01094
## F-statistic: 5.899 on 1 and 442 DF, p-value: 0.01555
In the 1860-1911 time period, there are 1266 native species, 737 introduced species (not including those classifed as invasive) and 24 invasive species.
In this data, there is no significant difference in the midpoint of flowering between native and introduced species, but invasive species are significantly earlier.
## Df Sum Sq Mean Sq F value Pr(>F)
## is_introduced 2 18494 9247 4.519 0.011 *
## Residuals 2024 4141599 2046
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 1 observation deleted due to missingness
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: aov(formula = midpoints ~ is_introduced, data = dhervey_tidy)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## introduced - native == 0 -0.8763 2.0959 -0.418 0.89982
## invasive - native == 0 -27.9759 9.3208 -3.001 0.00639 **
## invasive - introduced == 0 -27.0996 9.3828 -2.888 0.00889 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
In the 2014-2018 time period, there are 2184 native species, 1128 introduced species (not including those classifed as invasive) and 182 invasive species.
In this data, both introduced and invasive species have significantly earlier midpoints than native.
## Df Sum Sq Mean Sq F value Pr(>F)
## is_introduced 2 555415 277707 86.54 <2e-16 ***
## Residuals 3491 11202168 3209
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 37 observations deleted due to missingness
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: aov(formula = day_of_year ~ is_introduced, data = contemporary_dates)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## introduced - native == 0 -26.951 2.077 -12.976 <0.001 ***
## invasive - native == 0 -18.538 4.370 -4.242 <0.001 ***
## invasive - introduced == 0 8.413 4.525 1.859 0.141
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
This effect is unchanged if we look just at the 545 shared species monitored in both periods, which suggests that the introduced species have shifted more over time.
## Df Sum Sq Mean Sq F value Pr(>F)
## is_introduced 2 371891 185946 68.663 < 2e-16 ***
## source 1 14789 14789 5.461 0.0195 *
## type 2 182515 91258 33.698 2.84e-15 ***
## is_introduced:source 2 186302 93151 34.397 1.43e-15 ***
## is_introduced:type 4 258062 64515 23.823 < 2e-16 ***
## Residuals 5509 14918978 2708
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 38 observations deleted due to missingness
Taking a look at other species traits from the data Emily assembled for 280 species, which together account for 2990 observations (4-49 observations each). Annual/perennial/biennial, color, cultivation, growth habit, reproduction, and uses all show differences in day of year, but NOT in day of year * source interaction (that is, now difference in shift). In contrast, native range shows a significant interaction, but the species that seem to be driving this are few.
## Df Sum Sq Mean Sq F value Pr(>F)
## value 2 261772 130886 46.23 <2e-16 ***
## Residuals 3647 10325250 2831
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Df Sum Sq Mean Sq F value Pr(>F)
## value 9 447444 49716 17.47 <2e-16 ***
## Residuals 2900 8253863 2846
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Df Sum Sq Mean Sq F value Pr(>F)
## value 1 14309 14309 4.921 0.0266 *
## Residuals 4463 12977982 2908
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Df Sum Sq Mean Sq F value Pr(>F)
## value 5 1352285 270457 99.33 <2e-16 ***
## Residuals 5118 13935378 2723
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Df Sum Sq Mean Sq F value Pr(>F)
## num_range_categories 1 6778 6778 2.757 0.0971 .
## source 1 15529 15529 6.316 0.0121 *
## num_range_categories:source 1 2182 2182 0.887 0.3464
## Residuals 1268 3117653 2459
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Df Sum Sq Mean Sq F value Pr(>F)
## value 2 320846 160423 54.208 <2e-16 ***
## source 1 691 691 0.234 0.629
## value:source 2 8602 4301 1.453 0.234
## Residuals 3616 10701127 2959
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Df Sum Sq Mean Sq F value Pr(>F)
## value 9 485329 53925 18.250 <2e-16 ***
## source 1 3050 3050 1.032 0.31
## Residuals 5162 15252464 2955
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1