mydataindex = read.table("./Hymenoptera.csv", header = TRUE, sep = ",", na.strings = "NA",
dec = ".", strip.white = TRUE)
mydataindex
Year Hymenoptera SunAPR
1 1970 NA 128.7
2 1971 NA 115.7
3 1972 NA 130.7
4 1973 NA 150.0
5 1974 NA 149.5
6 1975 NA 125.7
7 1976 NA 152.6
8 1977 NA 157.4
9 1978 NA 110.4
10 1979 NA 121.5
11 1980 100.0 166.6
12 1981 96.1 129.0
13 1982 94.5 173.1
14 1983 97.3 136.0
15 1984 99.6 217.1
16 1985 100.7 128.1
17 1986 98.3 127.8
18 1987 96.2 154.2
19 1988 94.5 129.6
20 1989 93.4 132.1
21 1990 95.2 206.5
22 1991 95.7 148.8
23 1992 98.8 124.4
24 1993 100.7 113.7
25 1994 101.8 166.1
26 1995 102.8 173.1
27 1996 105.3 131.6
28 1997 104.5 165.3
29 1998 98.1 120.0
30 1999 95.7 154.9
31 2000 95.1 136.7
32 2001 94.4 134.0
33 2002 90.8 193.5
34 2003 88.9 192.0
35 2004 88.0 135.8
36 2005 90.7 147.4
37 2006 92.8 159.9
38 2007 93.5 215.6
39 2008 89.7 153.0
40 2009 82.9 172.0
41 2010 82.3 201.3
42 2011 84.9 218.3
43 2012 84.2 129.4
44 2013 83.2 168.4
45 2014 80.1 146.4
46 2015 78.0 220.7
47 2016 77.6 168.3
'data.frame': 47 obs. of 3 variables:
$ Year : int 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 ...
$ Hymenoptera: num NA NA NA NA NA NA NA NA NA NA ...
$ SunAPR : num 129 116 131 150 150 ...
Call:
lm(formula = Hymenoptera ~ Year, data = mydataindex)
Residuals:
Min 1Q Median 3Q Max
-7.123 -3.793 -1.203 3.017 11.097
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1152.11166 145.81814 7.901 2.73e-09 ***
Year -0.53001 0.07298 -7.262 1.76e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.74 on 35 degrees of freedom
(10 observations deleted due to missingness)
Multiple R-squared: 0.6011, Adjusted R-squared: 0.5897
F-statistic: 52.74 on 1 and 35 DF, p-value: 1.758e-08
par(mfrow = c(2, 2))
plot(RegModel.1)
mtext("Hymenoptera ~ Year", side = 3, line = -2, outer = TRUE, cex = 1.2, font = 2,
col = "red")
Call:
lm(formula = Hymenoptera ~ Year + I(Year^2), data = mydataindex)
Residuals:
Min 1Q Median 3Q Max
-5.5962 -2.5437 -0.1513 1.7272 7.4874
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.298e+05 2.134e+04 -6.083 6.72e-07 ***
Year 1.306e+02 2.136e+01 6.112 6.16e-07 ***
I(Year^2) -3.281e-02 5.347e-03 -6.137 5.72e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.312 on 34 degrees of freedom
(10 observations deleted due to missingness)
Multiple R-squared: 0.8107, Adjusted R-squared: 0.7996
F-statistic: 72.83 on 2 and 34 DF, p-value: 5.124e-13
par(mfrow = c(2, 2))
plot(RegModel.2, col = "blue", lwd = 2.5, lty = 6)
mtext("Hymenoptera ~ Year + I(Year^2)", side = 3, line = -2, outer = TRUE, cex = 1.2,
font = 2, col = "red")
scatterplot(Hymenoptera ~ Year, regLine = FALSE, smooth = FALSE, boxplots = FALSE,
xlim = c(1980, 2020), ylim = c(70, 110), ylab = "Hymenoptera, index value", cex = 1.3,
pch = 16, data = mydataindex, col = "green")
xx <- seq(1980, 2020, length = 10)
lines(xx, predict(RegModel.1, data.frame(Year = xx)), col = "blue", lwd = 2.5, lty = 6)
lines(xx, predict(RegModel.2, data.frame(Year = xx)), col = "red", lwd = 2.5, lty = 6)
legend("bottomleft", legend = c("Linear Regression: Hymenoptera ~ Year", "Polynomial regression: Hymenoptera ~ Year + I(Year^2)"),
col = c("blue", "red"), lty = c(6, 6), cex = 0.9)
Family: gaussian
Link function: identity
Formula:
Hymenoptera ~ s(Year)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 93.1432 0.3408 273.3 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(Year) 8.351 8.883 48.07 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.922 Deviance explained = 94%
GCV = 5.7515 Scale est. = 4.2979 n = 37
Method: GCV Optimizer: magic
Smoothing parameter selection converged after 7 iterations.
The RMS GCV score gradient at convergence was 1.756672e-06 .
The Hessian was positive definite.
Model rank = 10 / 10
Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.
k' edf k-index p-value
s(Year) 9.00 8.35 0.67 0.015 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Family: gaussian
Link function: identity
Formula:
Hymenoptera ~ s(Year, k = 14)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 93.1432 0.2988 311.7 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(Year) 11.11 12.37 45.66 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.94 Deviance explained = 95.8%
GCV = 4.9125 Scale est. = 3.3044 n = 37
Method: GCV Optimizer: magic
Smoothing parameter selection converged after 9 iterations.
The RMS GCV score gradient at convergence was 3.356308e-05 .
The Hessian was positive definite.
Model rank = 14 / 14
Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.
k' edf k-index p-value
s(Year) 13.0 11.1 0.78 0.05 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Analysis of Variance Table
Model 1: Hymenoptera ~ Year + I(Year^2)
Model 2: Hymenoptera ~ s(Year)
Res.Df RSS Df Sum of Sq F Pr(>F)
1 34.000 373.05
2 27.649 118.83 6.3508 254.22 9.3136 1.02e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Analysis of Variance Table
Model 1: Hymenoptera ~ Year + I(Year^2)
Model 2: Hymenoptera ~ s(Year, k = 14)
Res.Df RSS Df Sum of Sq F Pr(>F)
1 34.000 373.05
2 24.888 82.24 9.1116 290.81 9.6588 3.362e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Results:
par(mfrow = c(1, 1))
xx = seq(1980, 2016, length = 10)
prd = data.frame(Year = xx)
err <- predict(RegModel.2, newdata = prd, se.fit = TRUE)
prd$lci <- err$fit - 1.96 * err$se.fit
prd$fit <- err$fit
prd$uci <- err$fit + 1.96 * err$se.fit
plot(Hymenoptera ~ Year, pch = 16, data = mydataindex, xlim = c(1980, 2016), cex = 1.3,
col = "green")
lines(xx, prd$lci, lty = 5, col = "red")
lines(xx, prd$fit, lty = 5, col = "blue")
lines(xx, prd$uci, lty = 5, col = "red")
grid()
plot(gamyear, residuals = T, pch = 16, shift = mean(mydataindex$Hymenoptera, na.rm = T),
ylab = "Hymenoptera")
Family: gaussian
Link function: identity
Formula:
Hymenoptera ~ s(SunAPR)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 93.143 1.157 80.47 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(SunAPR) 1 1 4.769 0.0356 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.0948 Deviance explained = 12%
GCV = 52.399 Scale est. = 49.566 n = 37
plot(gamsun_01, residuals = T, pch = 16, shift = mean(mydataindex$Hymenoptera, na.rm = T),
ylab = "Hymenoptera")
Method: GCV Optimizer: magic
Smoothing parameter selection converged after 14 iterations.
The RMS GCV score gradient at convergence was 5.263863e-06 .
The Hessian was positive definite.
Model rank = 10 / 10
Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.
k' edf k-index p-value
s(SunAPR) 9 1 0.79 0.07 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
gamyearsun_02 <- gam(Hymenoptera ~ s(Year, k = 14) + s(SunAPR), data = mydataindex)
summary(gamyearsun_02)
Family: gaussian
Link function: identity
Formula:
Hymenoptera ~ s(Year, k = 14) + s(SunAPR)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 93.143 0.302 308.5 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(Year) 11.13 12.37 38.764 <2e-16 ***
s(SunAPR) 1.00 1.00 0.246 0.625
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.938 Deviance explained = 95.9%
GCV = 5.2283 Scale est. = 3.3735 n = 37
plot(gamyearsun_02, residuals = T, pch = 16, shift = mean(mydataindex$Hymenoptera,
na.rm = T), ylab = "Hymenoptera", pages = 1)
Method: GCV Optimizer: magic
Smoothing parameter selection converged after 17 iterations.
The RMS GCV score gradient at convergence was 6.360075e-07 .
The Hessian was positive definite.
Model rank = 23 / 23
Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.
k' edf k-index p-value
s(Year) 13.0 11.1 0.78 0.045 *
s(SunAPR) 9.0 1.0 1.07 0.655
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Analysis of Deviance Table
Model 1: Hymenoptera ~ s(Year, k = 14) + s(SunAPR)
Model 2: Hymenoptera ~ s(Year)
Resid. Df Resid. Dev Df Deviance F Pr(>F)
1 22.627 80.54
2 27.117 118.83 -4.4901 -38.294 2.5281 0.06331 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Analysis of Deviance Table
Model 1: Hymenoptera ~ s(Year, k = 14) + s(SunAPR)
Model 2: Hymenoptera ~ s(SunAPR)
Resid. Df Resid. Dev Df Deviance F Pr(>F)
1 22.627 80.54
2 35.000 1734.82 -12.373 -1654.3 39.633 2.238e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
gamdatacsv = read.table("./gamdata.csv", header = TRUE, sep = ",", na.strings = "NA",
dec = ".", strip.white = TRUE)
head(gamdatacsv, 30)
resp wave hill
1 19 0.0 7.04
2 10 0.1 9.04
3 13 0.2 5.48
4 6 0.3 5.24
5 20 0.4 8.60
6 12 0.5 7.32
7 9 0.6 8.76
8 9 0.7 10.00
9 9 0.8 9.00
10 10 0.9 7.80
11 14 1.0 6.16
12 9 1.1 6.08
13 11 1.2 5.52
14 15 1.3 7.96
15 11 1.4 8.48
16 14 1.5 8.40
17 11 1.6 9.44
18 12 1.7 7.84
19 15 1.8 7.88
20 7 1.9 5.72
21 13 2.0 6.96
22 8 2.1 9.48
23 11 2.2 7.40
24 8 2.3 9.32
25 4 2.4 9.96
26 15 2.5 7.92
27 7 2.6 8.72
28 3 2.7 5.32
29 5 2.8 9.52
30 5 2.9 5.40
'data.frame': 126 obs. of 3 variables:
$ resp: int 19 10 13 6 20 12 9 9 9 10 ...
$ wave: num 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 ...
$ hill: num 7.04 9.04 5.48 5.24 8.6 7.32 8.76 10 9 7.8 ...
glmodel.A1 <- glm(resp ~ wave + hill, family = poisson(link = log), data = gamdatacsv)
summary(glmodel.A1)
Call:
glm(formula = resp ~ wave + hill, family = poisson(link = log),
data = gamdatacsv)
Deviance Residuals:
Min 1Q Median 3Q Max
-4.1532 -0.7368 -0.0819 0.6189 3.6487
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.334230 0.161659 14.439 <2e-16 ***
wave -0.017758 0.007982 -2.225 0.0261 *
hill 0.002789 0.019940 0.140 0.8888
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 193.89 on 125 degrees of freedom
Residual deviance: 188.88 on 123 degrees of freedom
AIC: 699.01
Number of Fisher Scoring iterations: 4
\[\frac{\mathrm{Residual\_Deviance}}{\mathrm{DOF}} = \frac{188.88}{123} = 1.53561 >> 1 \]
glmodel.B1 <- glm(resp ~ wave + hill, family = quasipoisson(link = log), data = gamdatacsv)
summary(glmodel.B1)
Call:
glm(formula = resp ~ wave + hill, family = quasipoisson(link = log),
data = gamdatacsv)
Deviance Residuals:
Min 1Q Median 3Q Max
-4.1532 -0.7368 -0.0819 0.6189 3.6487
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.334230 0.196143 11.901 <2e-16 ***
wave -0.017758 0.009685 -1.834 0.0691 .
hill 0.002789 0.024193 0.115 0.9084
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for quasipoisson family taken to be 1.472122)
Null deviance: 193.89 on 125 degrees of freedom
Residual deviance: 188.88 on 123 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 4
\[\frac{\mathrm{Residual\_Deviance}}{\mathrm{DOF}} = \frac{188.88}{123} = 1.53561 > 1.472122 \]
Analysis of Deviance Table (Type II tests)
Response: resp
Error estimate based on Pearson residuals
Sum Sq Df F value Pr(>F)
wave 4.956 1 3.3665 0.06895 .
hill 0.020 1 0.0133 0.90840
Residuals 181.070 123
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Analysis of Deviance Table (Type II tests)
Response: resp
Error estimate based on Pearson residuals
Sum Sq Df F value Pr(>F)
wave 4.956 1 3.3665 0.06895 .
hill 0.020 1 0.0133 0.90840
Residuals 181.070 123
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
gam_resp_wave_hill_01 <- gam(resp ~ s(wave) + s(hill), data = gamdatacsv)
summary(gam_resp_wave_hill_01)
Family: gaussian
Link function: identity
Formula:
resp ~ s(wave) + s(hill)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.4524 0.2416 39.12 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(wave) 6.056 7.208 5.278 2.42e-05 ***
s(hill) 3.377 4.184 20.625 5.21e-14 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.476 Deviance explained = 51.6%
GCV = 8.0183 Scale est. = 7.3543 n = 126
gam_resp_wave_hill_02 <- gam(resp ~ s(wave) + s(hill), family = poisson(link = log),
data = gamdatacsv)
summary(gam_resp_wave_hill_02)
Family: poisson
Link function: log
Formula:
resp ~ s(wave) + s(hill)
Parametric coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.20920 0.03001 73.63 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df Chi.sq p-value
s(wave) 5.805 6.949 29.57 0.000125 ***
s(hill) 3.593 4.452 66.89 4.52e-13 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.465 Deviance explained = 51.4%
UBRE = -0.087376 Scale est. = 1 n = 126
Analysis of Deviance Table
Model 1: resp ~ s(wave) + s(hill)
Model 2: resp ~ wave + hill
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 113.6 94.195
2 126.0 188.880 -12.4 -94.685 9.356e-15 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Analysis of Deviance Table
Model 1: resp ~ s(wave) + s(hill)
Model 2: resp ~ s(wave) + s(hill)
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 113.60 94.19
2 113.61 849.92 -0.0080872 -755.72 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Crawley, Michael J. 2013. “The R Book Second Edition.” John Wiley & Sons.