library(faraway)
data(aatemp)
aatemp
## year temp
## 1 1854 49.15
## 2 1855 46.52
## 3 1871 48.80
## 4 1881 47.95
## 5 1882 47.31
## 6 1883 44.64
## 7 1884 45.93
## 8 1885 43.41
## 9 1890 48.09
## 10 1891 47.71
## 11 1892 46.13
## 12 1893 45.10
## 13 1894 48.32
## 14 1895 46.39
## 15 1896 47.70
## 16 1897 47.61
## 17 1898 48.89
## 18 1899 48.29
## 19 1900 48.79
## 20 1901 47.30
## 21 1902 47.69
## 22 1903 46.54
## 23 1904 44.61
## 24 1905 46.57
## 25 1906 48.42
## 26 1907 44.62
## 27 1908 47.50
## 28 1909 46.53
## 29 1910 46.93
## 30 1911 48.23
## 31 1912 45.22
## 32 1913 48.37
## 33 1914 46.58
## 34 1915 46.21
## 35 1916 46.58
## 36 1917 43.48
## 37 1918 46.88
## 38 1919 48.15
## 39 1920 46.09
## 40 1921 49.97
## 41 1922 48.37
## 42 1923 46.74
## 43 1924 45.21
## 44 1925 47.06
## 45 1926 45.33
## 46 1927 48.18
## 47 1928 47.28
## 48 1929 46.55
## 49 1930 48.92
## 50 1931 51.17
## 51 1932 48.55
## 52 1933 49.40
## 53 1934 48.66
## 54 1935 47.26
## 55 1936 47.55
## 56 1937 47.73
## 57 1938 50.08
## 58 1939 49.58
## 59 1940 46.71
## 60 1941 49.91
## 61 1942 48.29
## 62 1943 47.12
## 63 1944 48.83
## 64 1945 47.63
## 65 1946 49.45
## 66 1947 46.83
## 67 1948 48.17
## 68 1949 50.19
## 69 1951 47.58
## 70 1952 50.13
## 71 1953 50.88
## 72 1956 48.60
## 73 1957 48.49
## 74 1958 47.38
## 75 1959 48.79
## 76 1960 47.04
## 77 1962 48.21
## 78 1963 48.12
## 79 1964 49.66
## 80 1965 47.84
## 81 1966 48.61
## 82 1967 48.04
## 83 1968 49.09
## 84 1969 48.94
## 85 1970 49.15
## 86 1971 49.48
## 87 1972 47.34
## 88 1973 50.41
## 89 1974 48.10
## 90 1975 47.99
## 91 1976 46.50
## 92 1977 47.17
## 93 1978 45.75
## 94 1979 46.10
## 95 1980 46.43
## 96 1981 47.09
## 97 1982 47.07
## 98 1983 48.19
## 99 1984 47.94
## 100 1985 47.44
## 101 1986 48.16
## 102 1987 49.99
## 103 1988 48.11
## 104 1989 46.50
## 105 1990 49.49
## 106 1991 49.98
## 107 1992 47.01
## 108 1993 47.00
## 109 1994 47.86
## 110 1995 47.73
## 111 1996 46.44
## 112 1997 46.83
## 113 1998 51.89
## 114 1999 49.69
## 115 2000 48.22
attach(aatemp)
lmod<-lm(temp~year)
summary(lmod)
##
## Call:
## lm(formula = temp ~ year)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.9843 -0.9113 -0.0820 0.9946 3.5343
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24.005510 7.310781 3.284 0.00136 **
## year 0.012237 0.003768 3.247 0.00153 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.466 on 113 degrees of freedom
## Multiple R-squared: 0.08536, Adjusted R-squared: 0.07727
## F-statistic: 10.55 on 1 and 113 DF, p-value: 0.001533
par(mfrow=c(2,2))
plot(lmod)
par(mfrow=c(1,1))
plot(temp~year)
abline(coefficients(lmod))
the data shows a linear trend as the data follows the line on Q-Q plot.
b.Report a graphical check for correlated errors. Comment.
plot(residuals(lmod) ~ year, na.omit(aatemp), ylab="Residuals")
abline(h=0)
n <- length(residuals(lmod))
plot(tail(residuals(lmod),n-1) ~ head(residuals(lmod),n-1), xlab=
expression(hat(epsilon)[i]),ylab=expression(hat(epsilon)[i+1]))
abline(h=0,v=0,col=grey(0.75))
In this plot, we see long sequences of points above or below the line. This is an indication of positive serial correlation.
require(nlme)
## Loading required package: nlme
glmod <- gls(temp~year, correlation=corAR1 (form=~year), data=na.omit(aatemp))
summary(glmod)
## Generalized least squares fit by REML
## Model: temp ~ year
## Data: na.omit(aatemp)
## AIC BIC logLik
## 426.5694 437.479 -209.2847
##
## Correlation Structure: ARMA(1,0)
## Formula: ~year
## Parameter estimate(s):
## Phi1
## 0.2303887
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 25.18407 8.971864 2.807006 0.0059
## year 0.01164 0.004626 2.516015 0.0133
##
## Correlation:
## (Intr)
## year -1
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -2.7230803 -0.6321970 -0.0520135 0.6645795 2.3775123
##
## Residual standard error: 1.475718
## Degrees of freedom: 115 total; 113 residual
intervals(glmod,which="var-cov")
## Approximate 95% confidence intervals
##
## Correlation structure:
## lower est. upper
## Phi1 0.02920118 0.2303887 0.4136364
## attr(,"label")
## [1] "Correlation structure:"
##
## Residual standard error:
## lower est. upper
## 1.284091 1.475718 1.695942
the value of estimated correlation 0.23. The intervals show positive correaltion. Also p-value is still significant suggesting a linear trend even after autocorrelation.
(d)Fit a polynomial model with degree 8 and use backward elimination to reduce the degree of the model. Plot your fitted model on top of the data. Use this model to predict the temperature in 2023.
mean(aatemp$year)
## [1] 1939.739
lmod3<-lm(temp~poly(year-1939,8),data=aatemp)
summary(lmod3)
##
## Call:
## lm(formula = temp ~ poly(year - 1939, 8), data = aatemp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.6086 -0.8600 -0.2385 1.0608 3.3975
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 47.7426 0.1313 363.579 < 2e-16 ***
## poly(year - 1939, 8)1 4.7616 1.4082 3.381 0.00101 **
## poly(year - 1939, 8)2 -0.9071 1.4082 -0.644 0.52085
## poly(year - 1939, 8)3 -3.3132 1.4082 -2.353 0.02047 *
## poly(year - 1939, 8)4 2.4383 1.4082 1.732 0.08626 .
## poly(year - 1939, 8)5 3.3824 1.4082 2.402 0.01805 *
## poly(year - 1939, 8)6 1.2124 1.4082 0.861 0.39118
## poly(year - 1939, 8)7 -0.9373 1.4082 -0.666 0.50713
## poly(year - 1939, 8)8 -1.1011 1.4082 -0.782 0.43600
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.408 on 106 degrees of freedom
## Multiple R-squared: 0.2086, Adjusted R-squared: 0.1489
## F-statistic: 3.494 on 8 and 106 DF, p-value: 0.001284
lmod4<-lm(temp~poly(year-1939,7),data=aatemp)
summary(lmod4)
##
## Call:
## lm(formula = temp ~ poly(year - 1939, 7), data = aatemp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.5922 -0.9032 -0.2322 0.9880 3.2941
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 47.7426 0.1311 364.241 < 2e-16 ***
## poly(year - 1939, 7)1 4.7616 1.4056 3.388 0.000988 ***
## poly(year - 1939, 7)2 -0.9071 1.4056 -0.645 0.520083
## poly(year - 1939, 7)3 -3.3132 1.4056 -2.357 0.020234 *
## poly(year - 1939, 7)4 2.4383 1.4056 1.735 0.085672 .
## poly(year - 1939, 7)5 3.3824 1.4056 2.406 0.017828 *
## poly(year - 1939, 7)6 1.2124 1.4056 0.863 0.390303
## poly(year - 1939, 7)7 -0.9373 1.4056 -0.667 0.506341
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.406 on 107 degrees of freedom
## Multiple R-squared: 0.2041, Adjusted R-squared: 0.152
## F-statistic: 3.919 on 7 and 107 DF, p-value: 0.0007651
lmod5<-lm(temp~poly(year-1939,6),data=aatemp)
summary(lmod5)
##
## Call:
## lm(formula = temp ~ poly(year - 1939, 6), data = aatemp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.6846 -0.8825 -0.1428 0.9388 3.2950
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 47.7426 0.1307 365.181 < 2e-16 ***
## poly(year - 1939, 6)1 4.7616 1.4020 3.396 0.000957 ***
## poly(year - 1939, 6)2 -0.9071 1.4020 -0.647 0.518996
## poly(year - 1939, 6)3 -3.3132 1.4020 -2.363 0.019905 *
## poly(year - 1939, 6)4 2.4383 1.4020 1.739 0.084851 .
## poly(year - 1939, 6)5 3.3824 1.4020 2.413 0.017527 *
## poly(year - 1939, 6)6 1.2124 1.4020 0.865 0.389067
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.402 on 108 degrees of freedom
## Multiple R-squared: 0.2008, Adjusted R-squared: 0.1564
## F-statistic: 4.522 on 6 and 108 DF, p-value: 0.0003978
lmod6<-lm(temp~poly(year-1939,5),data=aatemp)
summary(lmod6)
##
## Call:
## lm(formula = temp ~ poly(year - 1939, 5), data = aatemp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.7142 -0.9198 -0.1420 0.9903 3.2364
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 47.7426 0.1306 365.604 < 2e-16 ***
## poly(year - 1939, 5)1 4.7616 1.4004 3.400 0.000942 ***
## poly(year - 1939, 5)2 -0.9071 1.4004 -0.648 0.518500
## poly(year - 1939, 5)3 -3.3132 1.4004 -2.366 0.019749 *
## poly(year - 1939, 5)4 2.4383 1.4004 1.741 0.084470 .
## poly(year - 1939, 5)5 3.3824 1.4004 2.415 0.017384 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.4 on 109 degrees of freedom
## Multiple R-squared: 0.1952, Adjusted R-squared: 0.1583
## F-statistic: 5.289 on 5 and 109 DF, p-value: 0.0002176
lmod7<-lm(temp~poly(year-1939,4),data=aatemp)
summary(lmod7)
##
## Call:
## lm(formula = temp ~ poly(year - 1939, 4), data = aatemp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.0085 -0.9618 -0.0913 0.9926 3.7370
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 47.7426 0.1334 357.827 < 2e-16 ***
## poly(year - 1939, 4)1 4.7616 1.4308 3.328 0.00119 **
## poly(year - 1939, 4)2 -0.9071 1.4308 -0.634 0.52741
## poly(year - 1939, 4)3 -3.3132 1.4308 -2.316 0.02243 *
## poly(year - 1939, 4)4 2.4383 1.4308 1.704 0.09117 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.431 on 110 degrees of freedom
## Multiple R-squared: 0.1522, Adjusted R-squared: 0.1213
## F-statistic: 4.936 on 4 and 110 DF, p-value: 0.001068
lmod8<-lm(temp~poly(year-1939,3),data=aatemp)
summary(lmod8)
##
## Call:
## lm(formula = temp ~ poly(year - 1939, 3), data = aatemp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.8557 -0.9646 -0.1552 1.0485 4.1538
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 47.7426 0.1346 354.796 <2e-16 ***
## poly(year - 1939, 3)1 4.7616 1.4430 3.300 0.0013 **
## poly(year - 1939, 3)2 -0.9071 1.4430 -0.629 0.5309
## poly(year - 1939, 3)3 -3.3132 1.4430 -2.296 0.0236 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.443 on 111 degrees of freedom
## Multiple R-squared: 0.1298, Adjusted R-squared: 0.1063
## F-statistic: 5.518 on 3 and 111 DF, p-value: 0.001436
lmod9<-lm(temp~poly(year-1939,2),data=aatemp)
summary(lmod9)
##
## Call:
## lm(formula = temp ~ poly(year - 1939, 2), data = aatemp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.0412 -0.9538 -0.0624 0.9959 3.5820
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 47.7426 0.1371 348.218 < 2e-16 ***
## poly(year - 1939, 2)1 4.7616 1.4703 3.239 0.00158 **
## poly(year - 1939, 2)2 -0.9071 1.4703 -0.617 0.53851
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.47 on 112 degrees of freedom
## Multiple R-squared: 0.08846, Adjusted R-squared: 0.07218
## F-statistic: 5.434 on 2 and 112 DF, p-value: 0.005591
lmod10<-lm(temp~poly(year-1939,1),data=aatemp)
summary(lmod10)
##
## Call:
## lm(formula = temp ~ poly(year - 1939, 1), data = aatemp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.9843 -0.9113 -0.0820 0.9946 3.5343
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 47.7426 0.1367 349.176 < 2e-16 ***
## poly(year - 1939, 1) 4.7616 1.4663 3.247 0.00153 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.466 on 113 degrees of freedom
## Multiple R-squared: 0.08536, Adjusted R-squared: 0.07727
## F-statistic: 10.55 on 1 and 113 DF, p-value: 0.001533
lmodx<-lm(temp~poly(year,8))
summary(lmodx)
##
## Call:
## lm(formula = temp ~ poly(year, 8))
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.6086 -0.8600 -0.2385 1.0608 3.3975
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 47.7426 0.1313 363.579 < 2e-16 ***
## poly(year, 8)1 4.7616 1.4082 3.381 0.00101 **
## poly(year, 8)2 -0.9071 1.4082 -0.644 0.52085
## poly(year, 8)3 -3.3132 1.4082 -2.353 0.02047 *
## poly(year, 8)4 2.4383 1.4082 1.732 0.08626 .
## poly(year, 8)5 3.3824 1.4082 2.402 0.01805 *
## poly(year, 8)6 1.2124 1.4082 0.861 0.39118
## poly(year, 8)7 -0.9373 1.4082 -0.666 0.50713
## poly(year, 8)8 -1.1011 1.4082 -0.782 0.43600
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.408 on 106 degrees of freedom
## Multiple R-squared: 0.2086, Adjusted R-squared: 0.1489
## F-statistic: 3.494 on 8 and 106 DF, p-value: 0.001284
model with degree 5 is the highest polynomial model as P value is significant. We can plot the the fitted model.
plot(temp~year)
points(year, fitted(lmod6),col="orange", pch=20)
(e) Suppose someone claims that the temperature maintained a linear trend until 1930 and then became constant. Fit a model corresponding to this claim. Plot the fitted model on top of the data. What does the fitted model say about this claim?
g1<-lm(temp~year, aatemp,subset=(year<=1930))
g2<-lm(temp~year, aatemp, subset=(year>1930))
plot(g1)
plot(temp~year,aatemp,xlab="Mean temp in Ann Arbor",ylab="Year", main="temp in ann arbor michigan")
abline(v=1930, lty=5)
lhs<-function(x)ifelse(x<=1930, 1930-x,0)
rhs<-function(x)ifelse(x>1930, x-1930,0)
gmod<-lm(temp~lhs(year)+rhs(year),aatemp)
x<-seq(1854,2000,by=1)
py<-gmod$coef[1]+gmod$coef[2]*lhs(x)+gmod$coef[3]*rhs(x)
lines(x,py,lty=2)
g1<-lm(temp~year, aatemp,subset=(year<=1930))
g2<-lm(temp~year, aatemp, subset=(year>1930))
plot(g1)
plot(temp~year,aatemp,xlab="Mean temp in Ann Arbor",ylab="Year", main="temp in ann arbor michigan")
abline(v=1930, lty=5)
lhs<-function(x)ifelse(x<=1930, 47.74,0)
rhs<-function(x)ifelse(x>1930, x-1930,0)
gmod<-lm(temp~lhs(year)+rhs(year),aatemp)
x<-seq(1854,2000,by=1)
py<-gmod$coef[1]+gmod$coef[2]*lhs(x)+gmod$coef[3]*rhs(x)
lines(x,py,lty=2)
summary(g1)
##
## Call:
## lm(formula = temp ~ year, data = aatemp, subset = (year <= 1930))
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.6618 -0.7444 0.1448 1.2600 3.0391
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 54.452092 22.918820 2.376 0.0216 *
## year -0.003915 0.012036 -0.325 0.7464
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.495 on 47 degrees of freedom
## Multiple R-squared: 0.002247, Adjusted R-squared: -0.01898
## F-statistic: 0.1058 on 1 and 47 DF, p-value: 0.7464
summary(g2)
##
## Call:
## lm(formula = temp ~ year, data = aatemp, subset = (year > 1930))
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.3978 -0.9834 -0.1354 0.8804 3.9925
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 72.909528 15.287079 4.769 1.11e-05 ***
## year -0.012519 0.007775 -1.610 0.112
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.302 on 64 degrees of freedom
## Multiple R-squared: 0.03893, Adjusted R-squared: 0.02392
## F-statistic: 2.593 on 1 and 64 DF, p-value: 0.1123
in the second model the constant is average of the temperature. Based on the model, the model seems to have some error.
(e). Make a cubic spline fit with 8 basis functions evenly spaced on the range. Visualize the basis functions. Plot the fit in comparison to the previous fits. Does this model fit better than the straight-line model?
attach(aatemp)
## The following objects are masked from aatemp (pos = 4):
##
## temp, year
require(splines)
## Loading required package: splines
knots<-c(1850,1850,1850,1850,1868.75,1906.25,1943.75,1981.25,2000,2000,2000,2000)
byear<-splineDesign(knots,year)
lmodb<-lm(temp~byear-1)
matplot(year, byear, type="l", col=1)
matplot(year, cbind(temp,lmodb$fit), type="pl", ylab="year", pch=20,lty=1,col=1)
from the graph, this model does not fit better than the straight-line model.