Note 2/26/2019: Starting with a clean document. See ne_flora.rmd for archived version.
There are 1280 taxon names in the Hervey data. Here’s a summary of the # of species which occur (have an “X” marked) in various Hervey-defined periods.
Table 1: Hervey date ranges and the number of species recorded within each. For our purposes, I’ve kept the period names from Hervey, but the actual periods end one day sooner to avoid overlap (e.g. the “Apr 1-Apr 15” period runs from day 91 to day 104).
## # A tibble: 18 x 2
## date `N taxa`
## <chr> <int>
## 1 Mar 15-Apr1 9
## 2 Apr 1-Apr 15 36
## 3 Apr 15-May 1 32
## 4 May 1-May 10 141
## 5 May 10-May 20 79
## 6 May 20-June 1 145
## 7 June 1-June 10 122
## 8 June 10-June 20 138
## 9 June 20-July 1 157
## 10 July 1-July 10 145
## 11 July 10-July 20 141
## 12 July 20-Aug 1 159
## 13 Aug. 1-Aug. 10 196
## 14 Aug. 10-Aug. 20 163
## 15 Aug. 20-Sept. 1 144
## 16 Sept. 1-Sept. 15 123
## 17 October 92
## 18 Nov 4
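The one-day adjustment described for Table 1 can be sketched as follows. This is a minimal illustration, assuming a non-leap reference year (1901, chosen arbitrarily) and hand-entered boundary dates; the `periods` data frame and the `doy()` helper are hypothetical and are not the code used to build the table.

```r
# Convert Hervey period labels to non-overlapping day-of-year ranges.
# Assumption: non-leap reference year, boundaries typed in by hand.
periods <- data.frame(
  label = c("Mar 15-Apr 1", "Apr 1-Apr 15", "Apr 15-May 1"),
  start = as.Date(c("1901-03-15", "1901-04-01", "1901-04-15")),
  end   = as.Date(c("1901-04-01", "1901-04-15", "1901-05-01"))
)

# Day of year (1-365) for a Date vector
doy <- function(d) as.integer(format(d, "%j"))

periods$t1_begin <- doy(periods$start)
periods$t1_end   <- doy(periods$end) - 1L  # end one day sooner to avoid overlap

periods[, c("label", "t1_begin", "t1_end")]
```

With these inputs the “Apr 1-Apr 15” row runs from day 91 to day 104, matching the example in the caption.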
There are 3531 observations of 764 species in the contemporary data.
Table 2: Contemporary data. A summary of the # of observations (1 count per observation of a species on a date in a year), the number of taxa, and the range of dates in which flowering was observed for each year:
## # A tibble: 5 x 4
## year_of_obs `N observations` `N taxa` daterange
## <dbl> <int> <int> <chr>
## 1 2014 364 254 2014-03-05 - 2014-09-17
## 2 2015 458 284 2015-03-06 - 2015-12-27
## 3 2016 919 419 2016-01-01 - 2016-12-31
## 4 2017 892 423 2017-02-19 - 2017-11-25
## 5 2018 898 409 2018-02-09 - 2018-11-19
Presentation Figure 2 (in the original numbering scheme)
Hervey: no significant difference in midpoints between introduced and non-introduced.
##
## Welch Two Sample t-test
##
## data: dhervey_tidy[dhervey_tidy$is_introduced == "introduced", "midpoints"] and dhervey_tidy[dhervey_tidy$is_introduced == "native", "midpoints"]
## t = -0.7688, df = 1548.3, p-value = 0.4421
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -5.681056 2.481706
## sample estimates:
## mean of x mean of y
## 189.4564 191.0561
Contemporary: significant difference in observation dates between introduced and non-introduced.
##
## Welch Two Sample t-test
##
## data: contemporary_dates[contemporary_dates$is_introduced == "introduced", "day_of_year"][[1]] and contemporary_dates[contemporary_dates$is_introduced == "native", "day_of_year"][[1]]
## t = -12.822, df = 2637.9, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -29.65884 -21.79075
## sample estimates:
## mean of x mean of y
## 178.0299 203.7547
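Both comparisons above use `t.test()`, which defaults to the Welch (unequal-variance) form. A minimal sketch of the same call using the formula interface, on made-up data (the `toy` data frame and its values are invented purely for illustration and say nothing about the real results):

```r
# Welch two-sample t-test on simulated flowering dates.
# Assumption: toy data only; group means and spreads are arbitrary.
set.seed(1)
toy <- data.frame(
  is_introduced = rep(c("introduced", "native"), each = 50),
  day_of_year   = c(rnorm(50, mean = 180, sd = 40),
                    rnorm(50, mean = 200, sd = 50))
)

# var.equal = FALSE is the default, which gives the Welch test
res <- t.test(day_of_year ~ is_introduced, data = toy)
res$method
res$conf.int
```

The formula interface avoids the long subsetting expressions that appear in the `data:` lines of the output above.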
Presentation Figure 1. (in the original numbering scheme, slightly redrawn)
Below, I’ve done two new figures that pull apart Figure 1 by breaking it into species that overlap between the Hervey and contemporary observation periods (“retained”), were found only in the Hervey data (“lost”), or were found only in the contemporary data (“gained”). I’ve plotted them both as densities (black and white), for better comparison among groups, and as histograms (blue and red), so that you can see actual units. One obvious result seems to be that newly introduced species are notably early.
Figure 1b.
Figure 1c.
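The retained/lost/gained split used in Figures 1b and 1c is a set comparison between the two taxon lists. A minimal sketch, assuming two character vectors of taxon names (the `sp_*` names are placeholders, not real taxa from either data set):

```r
# Classify taxa by presence in the Hervey and contemporary lists.
# Assumption: placeholder taxon names; real names would need to be
# harmonised before matching.
hervey_taxa       <- c("sp_A", "sp_B", "sp_C")
contemporary_taxa <- c("sp_A", "sp_C", "sp_D")

all_taxa  <- union(hervey_taxa, contemporary_taxa)
in_hervey <- all_taxa %in% hervey_taxa
in_contem <- all_taxa %in% contemporary_taxa

status <- ifelse(in_hervey & in_contem, "retained",
          ifelse(in_hervey, "lost", "gained"))
names(status) <- all_taxa

table(status)
```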
Presentation Figure 3. (in the previous numbering)
It certainly looks like the contemporary data shows a longer duration (or higher variability), with earlier and later flowering. However, this is looking at all contemporary and all historical data: 1499 taxa. Could part of the difference come from the different taxa being observed? As the figure above suggests, even only considering “retained” species, contemporary observations are more variable/of longer duration.
Figure 3b (new version)
We can also look at this by comparing species pairs: taking the latest flowering date minus the earliest flowering date for each species across the contemporary observations as a whole, and comparing this to t1_end minus t1_begin for the Hervey data.
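A minimal sketch of that per-species comparison, assuming one row per contemporary observation and one row per species in the Hervey data. The toy values and the column names `t2_duration`, `t1_duration`, and `duration_diff` are hypothetical:

```r
# Compare per-species flowering duration between periods.
# Assumption: toy data; only t1_begin, t1_end and day_of_year mirror
# names used in these notes.
contemporary <- data.frame(
  species     = c("sp_A", "sp_A", "sp_A", "sp_B", "sp_B"),
  day_of_year = c(100, 120, 150, 200, 210)
)
hervey <- data.frame(
  species  = c("sp_A", "sp_B"),
  t1_begin = c(105, 196),
  t1_end   = c(120, 212)
)

# Contemporary duration: latest minus earliest observation per species
rng <- aggregate(day_of_year ~ species, data = contemporary,
                 FUN = function(x) max(x) - min(x))
names(rng)[2] <- "t2_duration"

# Historical duration: t1_end minus t1_begin
hervey$t1_duration <- hervey$t1_end - hervey$t1_begin

both <- merge(hervey, rng, by = "species")
both$duration_diff <- both$t2_duration - both$t1_duration
both
```

Positive values of `duration_diff` mean the contemporary range is longer than the Hervey range for that species.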
Presentation Figure 5. (in previous numbering) If we compare the temporal range of each of the 545 species from the Hervey data that show up in at least one of the contemporary years to their temporal range across the contemporary years, it’s clear that the modern period (2014-2018) shows a greater variation/duration than the 1860-1911 period.
Is this just variation among contemporary years? No, it looks like if we compare each of the contemporary years against the Hervey records of species that match it, most of the individual years show a similar pattern:
Presentation Figure 4. (in previous numbering)
We use CRUTEM4 for 1850-2016 and add Wunderground monthly data for 2017 and 2018 from New Bedford (Station 720223): https://crudata.uea.ac.uk/cru/data/crutem/ge/crutem4-2018-06/N42.5W072.5/720223_data.txt.
Flowering was significantly earlier in 2016. Here I’m comparing across years among different species by converting mean flowering day for a year into deviation from the mean for the species across years: +1 indicates that a species flowered 1 day later in the given year than the mean across years; -1 indicates that a species flowered 1 day earlier in the given year than the mean across years.
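The deviation described here can be computed by subtracting each species' across-year mean from its yearly mean. A minimal sketch on toy values (the `yearly` data frame and `mean_day` column are hypothetical; `species_mean` and `mean_difference` follow the names in the model output):

```r
# Deviation of yearly mean flowering day from the species' overall mean.
# Assumption: toy data with one mean flowering day per species per year.
yearly <- data.frame(
  species  = rep(c("sp_A", "sp_B"), each = 3),
  year     = rep(2014:2016, times = 2),
  mean_day = c(120, 124, 116, 200, 203, 197)
)

# ave() returns the group mean aligned to each row
yearly$species_mean    <- ave(yearly$mean_day, yearly$species)
yearly$mean_difference <- yearly$mean_day - yearly$species_mean
yearly
```

Here `sp_A` in 2015 gets +4 (four days later than its mean) and in 2016 gets -4 (four days earlier).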
Difference in flowering time among years - boxplot
Difference in flowering time among years - ANOVA with post hoc Tukey HSD and Nemenyi tests
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = mean_difference ~ as.factor(year), data = contemporary_and_temp)
##
## $`as.factor(year)`
## diff lwr upr p adj
## 2015-2014 -3.0271739 -10.15609748 4.1017497 0.7724764
## 2016-2014 -7.6919411 -14.82086463 -0.5630175 0.0271001
## 2017-2014 -2.6885219 -9.81744543 4.4404017 0.8400477
## 2018-2014 -0.6532353 -7.78215889 6.4756882 0.9991237
## 2016-2015 -4.6647671 -11.79369071 2.4641564 0.3794188
## 2017-2015 0.3386520 -6.79027152 7.4675756 0.9999353
## 2018-2015 2.3739386 -4.75498498 9.5028622 0.8922522
## 2017-2016 5.0034192 -2.12550437 12.1323428 0.3069702
## 2018-2016 7.0387057 -0.09021783 14.1676293 0.0548862
## 2018-2017 2.0352865 -5.09363702 9.1642101 0.9357876
##
## Pairwise comparisons using Tukey and Kramer (Nemenyi) test
## with Tukey-Dist approximation for independent samples
##
## data: contemporary_differences$mean_difference and contemporary_differences$year
##
## 2014 2015 2016 2017
## 2015 0.4270 - - -
## 2016 0.0092 0.5179 - -
## 2017 0.8686 0.9469 0.1418 -
## 2018 1.0000 0.3716 0.0067 0.8259
##
## P value adjustment method: none
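The first block of output above has the shape produced by `aov()` followed by `TukeyHSD()`. A minimal sketch on simulated deviations (the `toy` data are invented; the Nemenyi table appears to come from an add-on package and is not reproduced here):

```r
# One-way ANOVA across years with Tukey HSD pairwise comparisons.
# Assumption: simulated deviations; the year effects are arbitrary.
set.seed(42)
toy <- data.frame(
  year = rep(2014:2018, each = 30),
  mean_difference = rnorm(150,
                          mean = rep(c(2, 0, -6, 0, 2), each = 30),
                          sd = 15)
)

fit <- aov(mean_difference ~ as.factor(year), data = toy)
summary(fit)

# All 10 pairwise year contrasts with family-wise 95% intervals
tukey <- TukeyHSD(fit)
tukey[["as.factor(year)"]]
```

Each row of the Tukey table is one year pair, with the difference in means, its interval, and an adjusted p-value, as in the output above.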
This difference among years is significantly related to the annual temperature of the year, although the explanatory power of temperature is low.
Difference in flowering time with annual temperature - ANOVA. 2016 is significantly different.
Difference in flowering time with annual temperature - linear model
##
## Call:
## lm(formula = mean_difference ~ Annual, data = contemporary_and_temp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -77.927 -9.691 0.207 8.377 108.493
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 44.877 15.979 2.809 0.00519 **
## Annual -4.071 1.438 -2.830 0.00485 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.65 on 458 degrees of freedom
## Multiple R-squared: 0.01719, Adjusted R-squared: 0.01505
## F-statistic: 8.011 on 1 and 458 DF, p-value: 0.004853
Spring and Fall differences
Elsewhere we’ve shown that early plants (days 72-172) tend to flower earlier and later plants (days 173-305) tend to flower later (across different years). To compare these, we regress them against the average temperature of the preceding 6 (or 3) months; here we report early species versus 6- and 3-month temperature and late species versus 6- and 3-month temperature.
Y axis: phenological deviation, i.e. the mean flowering day within a year minus the mean across 2014-2018.
X axis: temperature deviation, i.e. the mean temperature for a 6 month (or 3 month) period within a year minus that temperature across years 1850-2018.
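One way the temperature deviation on the X axis could be computed is sketched below. This is only an illustration under stated assumptions: the `monthly` layout, the `window_mean()` helper, the choice of June as the month the window ends on, and the toy temperatures are all hypothetical, and the baseline here is the toy years rather than 1850-2018.

```r
# Mean temperature over the n months ending in a given month, then its
# deviation from the across-year mean of the same window.
# Assumption: toy monthly series; window end month chosen arbitrarily.
monthly <- expand.grid(month = 1:12, year = 2014:2016)
monthly$temp <- monthly$month + (monthly$year - 2015)

window_mean <- function(monthly, year, end_month, n_months) {
  idx <- monthly$year * 12 + monthly$month  # running month index
  end <- year * 12 + end_month
  mean(monthly$temp[idx > end - n_months & idx <= end])
}

years <- 2014:2016
p3 <- sapply(years, function(y) window_mean(monthly, y, end_month = 6, n_months = 3))

# Deviation from the baseline (here: the mean over the toy years)
p3mo <- p3 - mean(p3)
p3mo
```

Because the running month index crosses year boundaries, a 6-month window ending in March would reach back into the previous autumn without special handling.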
Early-flowering spp (before day 173) versus previous 6 and previous 3 months
##
## Call:
## lm(formula = mean_difference ~ p6mo, data = contemporary_and_temp_with_window[contemporary_and_temp_with_window$species_mean <
## 173, ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -70.397 -8.805 -1.121 6.810 92.378
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.5448 1.2752 1.211 0.22703
## p6mo -3.0125 0.9049 -3.329 0.00102 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.09 on 219 degrees of freedom
## (4 observations deleted due to missingness)
## Multiple R-squared: 0.04817, Adjusted R-squared: 0.04382
## F-statistic: 11.08 on 1 and 219 DF, p-value: 0.001022
##
## Call:
## lm(formula = mean_difference ~ p3mo, data = contemporary_and_temp_with_window[contemporary_and_temp_with_window$species_mean <
## 173, ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -71.245 -8.741 -0.976 6.822 93.220
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.8854 1.2755 0.694 0.4883
## p3mo -2.2664 1.0452 -2.168 0.0312 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.26 on 223 degrees of freedom
## Multiple R-squared: 0.02065, Adjusted R-squared: 0.01626
## F-statistic: 4.702 on 1 and 223 DF, p-value: 0.03119
Late-flowering spp (after day 172) versus previous 6 and previous 3 months
##
## Call:
## lm(formula = mean_difference ~ p6mo, data = contemporary_and_temp_with_window[contemporary_and_temp_with_window$species_mean >
## 172, ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -78.036 -10.691 0.312 8.784 107.490
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.5227 1.5167 -0.345 0.731
## p6mo 0.3751 1.3966 0.269 0.788
##
## Residual standard error: 18.22 on 233 degrees of freedom
## Multiple R-squared: 0.0003095, Adjusted R-squared: -0.003981
## F-statistic: 0.07215 on 1 and 233 DF, p-value: 0.7885
##
## Call:
## lm(formula = mean_difference ~ p3mo, data = contemporary_and_temp_with_window[contemporary_and_temp_with_window$species_mean >
## 172, ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -78.850 -10.673 0.617 8.247 109.509
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.7936 1.5242 0.521 0.603
## p3mo -1.1686 1.0528 -1.110 0.268
##
## Residual standard error: 18.17 on 233 degrees of freedom
## Multiple R-squared: 0.00526, Adjusted R-squared: 0.0009905
## F-statistic: 1.232 on 1 and 233 DF, p-value: 0.2682
Previous 6 months
##
## Pairwise comparisons using Tukey and Kramer (Nemenyi) test
## with Tukey-Dist approximation for independent samples
##
## data: both_with_window$species_deviation and both_with_window$year
##
## 1860 2014 2015 2016 2017
## 2014 0.015 - - - -
## 2015 0.532 0.631 - - -
## 2016 1.000 0.022 0.619 - -
## 2017 0.203 0.927 0.992 0.263 -
## 2018 0.020 1.000 0.697 0.031 0.954
##
## P value adjustment method: none
##
## Call:
## lm(formula = species_deviation ~ as.factor(year), data = both_with_window)
##
## Residuals:
## Min 1Q Median 3Q Max
## -64.190 -10.022 -0.109 8.879 94.212
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.005 2.027 -2.963 0.003216 **
## as.factor(year)2014 10.358 2.866 3.614 0.000337 ***
## as.factor(year)2015 6.227 2.866 2.173 0.030350 *
## as.factor(year)2016 2.060 2.866 0.719 0.472630
## as.factor(year)2017 7.445 2.866 2.597 0.009713 **
## as.factor(year)2018 9.938 2.866 3.467 0.000578 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.43 on 438 degrees of freedom
## Multiple R-squared: 0.04671, Adjusted R-squared: 0.03583
## F-statistic: 4.292 on 5 and 438 DF, p-value: 0.0008018
##
## Call:
## lm(formula = species_deviation ~ Annual, data = both_with_window)
##
## Residuals:
## Min 1Q Median 3Q Max
## -63.735 -10.127 0.158 8.777 94.868
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 19.556 16.121 1.213 0.226
## Annual -1.780 1.465 -1.215 0.225
##
## Residual standard error: 17.75 on 442 degrees of freedom
## Multiple R-squared: 0.003327, Adjusted R-squared: 0.001072
## F-statistic: 1.476 on 1 and 442 DF, p-value: 0.2251
##
## Call:
## lm(formula = species_deviation ~ p6mo, data = both_with_window)
##
## Residuals:
## Min 1Q Median 3Q Max
## -63.544 -9.488 0.142 8.742 98.115
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.3753 0.9547 1.441 0.15039
## p6mo -2.5817 0.8677 -2.975 0.00309 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.6 on 442 degrees of freedom
## Multiple R-squared: 0.01964, Adjusted R-squared: 0.01742
## F-statistic: 8.853 on 1 and 442 DF, p-value: 0.003087
##
## Call:
## lm(formula = species_deviation ~ p3mo, data = both_with_window)
##
## Residuals:
## Min 1Q Median 3Q Max
## -63.402 -9.527 -0.101 8.775 100.124
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.6373 0.9758 1.678 0.09406 .
## p3mo -2.6643 0.8248 -3.230 0.00133 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.57 on 442 degrees of freedom
## Multiple R-squared: 0.02307, Adjusted R-squared: 0.02086
## F-statistic: 10.44 on 1 and 442 DF, p-value: 0.001328
##
## Call:
## lm(formula = species_deviation ~ Annual, data = both_with_window[both_with_window$species_mean <
## 173, ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -62.805 -8.510 -0.439 7.649 61.733
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.274 21.339 0.06 0.952
## Annual -0.116 1.940 -0.06 0.952
##
## Residual standard error: 15.92 on 202 degrees of freedom
## Multiple R-squared: 1.771e-05, Adjusted R-squared: -0.004933
## F-statistic: 0.003577 on 1 and 202 DF, p-value: 0.9524
##
## Call:
## lm(formula = species_deviation ~ Annual, data = both_with_window[both_with_window$species_mean >
## 172, ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -64.494 -11.701 1.134 9.807 95.212
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 35.095 23.710 1.480 0.14
## Annual -3.194 2.155 -1.482 0.14
##
## Residual standard error: 19.19 on 238 degrees of freedom
## Multiple R-squared: 0.009146, Adjusted R-squared: 0.004983
## F-statistic: 2.197 on 1 and 238 DF, p-value: 0.1396
##
## Call:
## lm(formula = species_deviation ~ p6mo, data = both_with_window[both_with_window$species_mean <
## 173, ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -61.404 -8.636 -0.876 7.311 60.086
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.265 1.195 1.058 0.29136
## p6mo -2.580 0.974 -2.649 0.00871 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.65 on 202 degrees of freedom
## Multiple R-squared: 0.03357, Adjusted R-squared: 0.02879
## F-statistic: 7.017 on 1 and 202 DF, p-value: 0.008714
##
## Call:
## lm(formula = species_deviation ~ p6mo, data = both_with_window[both_with_window$species_mean >
## 172, ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -63.642 -11.237 1.246 9.333 98.042
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.477 1.520 0.972 0.3322
## p6mo -2.596 1.552 -1.673 0.0957 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 19.17 on 238 degrees of freedom
## Multiple R-squared: 0.01162, Adjusted R-squared: 0.007466
## F-statistic: 2.798 on 1 and 238 DF, p-value: 0.09571
##
## Call:
## lm(formula = species_deviation ~ p3mo, data = both_with_window[both_with_window$species_mean <
## 173, ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -62.771 -8.427 -0.579 7.680 63.061
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.9009 1.1940 0.754 0.4514
## p3mo -2.3781 1.1998 -1.982 0.0488 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.77 on 202 degrees of freedom
## Multiple R-squared: 0.01908, Adjusted R-squared: 0.01422
## F-statistic: 3.929 on 1 and 202 DF, p-value: 0.04882
##
## Call:
## lm(formula = species_deviation ~ p3mo, data = both_with_window[both_with_window$species_mean >
## 172, ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -62.795 -10.944 1.298 10.055 100.374
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.501 1.558 1.605 0.1098
## p3mo -3.069 1.179 -2.604 0.0098 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 19.01 on 238 degrees of freedom
## Multiple R-squared: 0.0277, Adjusted R-squared: 0.02361
## F-statistic: 6.78 on 1 and 238 DF, p-value: 0.009796
Variation/duration in relation to temperature
##
## Call:
## lm(formula = longer ~ as.factor(year_of_obs), data = duration_temp_with_window)
##
## Residuals:
## Min 1Q Median 3Q Max
## -142.852 -24.653 -4.963 20.940 164.222
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.537 7.985 -0.693 0.48908
## as.factor(year_of_obs)2014 -16.778 11.293 -1.486 0.13937
## as.factor(year_of_obs)2015 -12.444 11.293 -1.102 0.27216
## as.factor(year_of_obs)2016 22.815 11.293 2.020 0.04506 *
## as.factor(year_of_obs)2017 31.000 11.293 2.745 0.00676 **
## as.factor(year_of_obs)2018 8.630 11.293 0.764 0.44591
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 41.49 on 156 degrees of freedom
## Multiple R-squared: 0.1538, Adjusted R-squared: 0.1267
## F-statistic: 5.672 on 5 and 156 DF, p-value: 7.787e-05
Do spring days tend to have flowers that have moved earlier and fall days have flowers that have moved later? Yes.
Here I took each day_of_year for which you have contemporary data, and asked whether species observations made on that day tended to come before (negative), during (0), or after (positive) the historical range:
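A minimal sketch of that before/during/after score, assuming it is measured in days from the nearest edge of the historical range (the function below is hypothetical; only the names `before_after`, `t1_begin`, and `t1_end` come from these notes):

```r
# Signed distance (in days) of an observation from the historical range:
# negative = before t1_begin, 0 = within range, positive = after t1_end.
# Assumption: distance is taken from the nearest range boundary.
before_after <- function(day_of_year, t1_begin, t1_end) {
  ifelse(day_of_year < t1_begin, day_of_year - t1_begin,
         ifelse(day_of_year > t1_end, day_of_year - t1_end, 0))
}

# Three observations of a species with a historical range of days 91-104
before_after(c(85, 100, 130), t1_begin = 91, t1_end = 104)
```

The three observations score -6, 0, and 26: six days before the range, inside it, and 26 days after it.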
Presentation Figure 9. (in previous numbering)
Previously, we asked whether the day of year on which an observation was made related to how likely it was to come before, after, or during the historical period, and got this nice relationship.
Presentation Figure 6. (In previous numbering)
I’m now certain this is an artefact; if we use the Hervey dates as our x axis instead, the pattern is reversed.
We can see this more simply if we plot the actual observations.
However, we can maybe remove this by using the mean of the Hervey dates and our dates as the x axis. In that case, for species means, we still get an effect!
##
## Call:
## lm(formula = before_after ~ t1_t2_mean, data = test_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -132.771 -13.525 -3.864 9.659 117.065
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -10.51290 4.82890 -2.177 0.0299 *
## t1_t2_mean 0.10939 0.02507 4.362 1.54e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 25.42 on 541 degrees of freedom
## Multiple R-squared: 0.03398, Adjusted R-squared: 0.03219
## F-statistic: 19.03 on 1 and 541 DF, p-value: 1.541e-05
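The artefact described above can be illustrated with a small simulation in which there is no real shift at all: two noisy measurements of the same underlying date, with their difference regressed on each possible x axis. This is a sketch under invented assumptions (sample size, spreads, and variable names are arbitrary) and is not a reanalysis of the data.

```r
# Regressing a difference (t2 - t1) on t2, on t1, or on their mean,
# when t1 and t2 are equally noisy measurements of the same true date.
# Assumption: simulated data with no true shift between periods.
set.seed(2019)
n <- 2000
true_day <- rnorm(n, mean = 180, sd = 40)
t1 <- true_day + rnorm(n, sd = 15)  # "historical" measurement
t2 <- true_day + rnorm(n, sd = 15)  # "contemporary" measurement
shift <- t2 - t1

slope <- function(y, x) unname(coef(lm(y ~ x))[2])

slopes <- c(
  vs_t2   = slope(shift, t2),
  vs_t1   = slope(shift, t1),
  vs_mean = slope(shift, (t1 + t2) / 2)
)
slopes
```

In this setup the slope is positive against `t2`, negative against `t1`, and close to zero against the mean, which is why a slope that survives on the mean axis is harder to dismiss as an artefact.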
This suggests a new version of presentation figure 6.
We want to count, for each species, the number of observations that fall into each category: off (< -50 or > 75), early (-50 to 0), late (0 to 75), or during (0).
(Alternatively, we can use the 5% and 95% quantiles: < -31 or > 84 for off, -31 to 0 for early, 0 to 84 for late.)
Eventually, we will compare this with phylogeny. For the moment, I’ll arrange these in descending order by the column “early”.
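A minimal sketch of that tally, assuming a per-observation `before_after` score in days. The `categorize()` helper, the toy `obs` data, and the treatment of the boundary values (-50 and 75 count as early and late) are hypothetical choices:

```r
# Bin observations into off / early / during / late and count per species.
# Assumption: toy scores; cut-offs default to the fixed thresholds above.
categorize <- function(before_after, lower = -50, upper = 75) {
  ifelse(before_after < lower | before_after > upper, "off",
  ifelse(before_after < 0, "early",
  ifelse(before_after > 0, "late", "during")))
}

obs <- data.frame(
  species      = c("sp_A", "sp_A", "sp_A", "sp_B", "sp_B"),
  before_after = c(-60, -10, 0, 20, 90)
)
obs$category <- factor(categorize(obs$before_after),
                       levels = c("off", "early", "during", "late"))

# One row per species, one column per category
counts <- as.data.frame.matrix(table(obs$species, obs$category))

# Arrange in descending order by the "early" column
counts[order(-counts$early), ]
```

For the quantile-based version, the same helper could be called as `categorize(x, lower = -31, upper = 84)`.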