1. The aatemp data come from the U.S. Historical Climatology Network. They are the annual mean temperatures (in degrees F) in Ann Arbor, Michigan going back about 150 years.
  1. Is there a linear trend?
library(faraway) 
data(aatemp) 
aatemp
##     year  temp
## 1   1854 49.15
## 2   1855 46.52
## 3   1871 48.80
## 4   1881 47.95
## 5   1882 47.31
## 6   1883 44.64
## 7   1884 45.93
## 8   1885 43.41
## 9   1890 48.09
## 10  1891 47.71
## 11  1892 46.13
## 12  1893 45.10
## 13  1894 48.32
## 14  1895 46.39
## 15  1896 47.70
## 16  1897 47.61
## 17  1898 48.89
## 18  1899 48.29
## 19  1900 48.79
## 20  1901 47.30
## 21  1902 47.69
## 22  1903 46.54
## 23  1904 44.61
## 24  1905 46.57
## 25  1906 48.42
## 26  1907 44.62
## 27  1908 47.50
## 28  1909 46.53
## 29  1910 46.93
## 30  1911 48.23
## 31  1912 45.22
## 32  1913 48.37
## 33  1914 46.58
## 34  1915 46.21
## 35  1916 46.58
## 36  1917 43.48
## 37  1918 46.88
## 38  1919 48.15
## 39  1920 46.09
## 40  1921 49.97
## 41  1922 48.37
## 42  1923 46.74
## 43  1924 45.21
## 44  1925 47.06
## 45  1926 45.33
## 46  1927 48.18
## 47  1928 47.28
## 48  1929 46.55
## 49  1930 48.92
## 50  1931 51.17
## 51  1932 48.55
## 52  1933 49.40
## 53  1934 48.66
## 54  1935 47.26
## 55  1936 47.55
## 56  1937 47.73
## 57  1938 50.08
## 58  1939 49.58
## 59  1940 46.71
## 60  1941 49.91
## 61  1942 48.29
## 62  1943 47.12
## 63  1944 48.83
## 64  1945 47.63
## 65  1946 49.45
## 66  1947 46.83
## 67  1948 48.17
## 68  1949 50.19
## 69  1951 47.58
## 70  1952 50.13
## 71  1953 50.88
## 72  1956 48.60
## 73  1957 48.49
## 74  1958 47.38
## 75  1959 48.79
## 76  1960 47.04
## 77  1962 48.21
## 78  1963 48.12
## 79  1964 49.66
## 80  1965 47.84
## 81  1966 48.61
## 82  1967 48.04
## 83  1968 49.09
## 84  1969 48.94
## 85  1970 49.15
## 86  1971 49.48
## 87  1972 47.34
## 88  1973 50.41
## 89  1974 48.10
## 90  1975 47.99
## 91  1976 46.50
## 92  1977 47.17
## 93  1978 45.75
## 94  1979 46.10
## 95  1980 46.43
## 96  1981 47.09
## 97  1982 47.07
## 98  1983 48.19
## 99  1984 47.94
## 100 1985 47.44
## 101 1986 48.16
## 102 1987 49.99
## 103 1988 48.11
## 104 1989 46.50
## 105 1990 49.49
## 106 1991 49.98
## 107 1992 47.01
## 108 1993 47.00
## 109 1994 47.86
## 110 1995 47.73
## 111 1996 46.44
## 112 1997 46.83
## 113 1998 51.89
## 114 1999 49.69
## 115 2000 48.22
attach(aatemp)
lmod<-lm(temp~year)
summary(lmod)
## 
## Call:
## lm(formula = temp ~ year)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9843 -0.9113 -0.0820  0.9946  3.5343 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 24.005510   7.310781   3.284  0.00136 **
## year         0.012237   0.003768   3.247  0.00153 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.466 on 113 degrees of freedom
## Multiple R-squared:  0.08536,    Adjusted R-squared:  0.07727 
## F-statistic: 10.55 on 1 and 113 DF,  p-value: 0.001533
par(mfrow=c(2,2))
plot(lmod)

par(mfrow=c(1,1))
plot(temp~year)
abline(coefficients(lmod))

the data shows a linear trend as the data follows the line on Q-Q plot.

b.Report a graphical check for correlated errors. Comment.

plot(residuals(lmod) ~ year, na.omit(aatemp), ylab="Residuals")
abline(h=0)

n <- length(residuals(lmod))
plot(tail(residuals(lmod),n-1) ~ head(residuals(lmod),n-1), xlab=
expression(hat(epsilon)[i]),ylab=expression(hat(epsilon)[i+1]))
abline(h=0,v=0,col=grey(0.75))

In this plot, we see long sequences of points above or below the line. This is an indication of positive serial correlation.

  1. Fit a model which allows for serial correlation with an AR(1) model for the errors. What is the estimated correlation and is it significant? Does this model change your opinion of the trend? Speculate why there might be correlation in the errors.
require(nlme)
## Loading required package: nlme
glmod <- gls(temp~year, correlation=corAR1 (form=~year), data=na.omit(aatemp))
summary(glmod)
## Generalized least squares fit by REML
##   Model: temp ~ year 
##   Data: na.omit(aatemp) 
##        AIC     BIC    logLik
##   426.5694 437.479 -209.2847
## 
## Correlation Structure: ARMA(1,0)
##  Formula: ~year 
##  Parameter estimate(s):
##      Phi1 
## 0.2303887 
## 
## Coefficients:
##                Value Std.Error  t-value p-value
## (Intercept) 25.18407  8.971864 2.807006  0.0059
## year         0.01164  0.004626 2.516015  0.0133
## 
##  Correlation: 
##      (Intr)
## year -1    
## 
## Standardized residuals:
##        Min         Q1        Med         Q3        Max 
## -2.7230803 -0.6321970 -0.0520135  0.6645795  2.3775123 
## 
## Residual standard error: 1.475718 
## Degrees of freedom: 115 total; 113 residual
intervals(glmod,which="var-cov")
## Approximate 95% confidence intervals
## 
##  Correlation structure:
##           lower      est.     upper
## Phi1 0.02920118 0.2303887 0.4136364
## attr(,"label")
## [1] "Correlation structure:"
## 
##  Residual standard error:
##    lower     est.    upper 
## 1.284091 1.475718 1.695942

the value of estimated correlation 0.23. The intervals show positive correaltion. Also p-value is still significant suggesting a linear trend even after autocorrelation.

(d)Fit a polynomial model with degree 8 and use backward elimination to reduce the degree of the model. Plot your fitted model on top of the data. Use this model to predict the temperature in 2023.

mean(aatemp$year)
## [1] 1939.739
lmod3<-lm(temp~poly(year-1939,8),data=aatemp)
summary(lmod3)
## 
## Call:
## lm(formula = temp ~ poly(year - 1939, 8), data = aatemp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.6086 -0.8600 -0.2385  1.0608  3.3975 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            47.7426     0.1313 363.579  < 2e-16 ***
## poly(year - 1939, 8)1   4.7616     1.4082   3.381  0.00101 ** 
## poly(year - 1939, 8)2  -0.9071     1.4082  -0.644  0.52085    
## poly(year - 1939, 8)3  -3.3132     1.4082  -2.353  0.02047 *  
## poly(year - 1939, 8)4   2.4383     1.4082   1.732  0.08626 .  
## poly(year - 1939, 8)5   3.3824     1.4082   2.402  0.01805 *  
## poly(year - 1939, 8)6   1.2124     1.4082   0.861  0.39118    
## poly(year - 1939, 8)7  -0.9373     1.4082  -0.666  0.50713    
## poly(year - 1939, 8)8  -1.1011     1.4082  -0.782  0.43600    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.408 on 106 degrees of freedom
## Multiple R-squared:  0.2086, Adjusted R-squared:  0.1489 
## F-statistic: 3.494 on 8 and 106 DF,  p-value: 0.001284
lmod4<-lm(temp~poly(year-1939,7),data=aatemp)
summary(lmod4)
## 
## Call:
## lm(formula = temp ~ poly(year - 1939, 7), data = aatemp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.5922 -0.9032 -0.2322  0.9880  3.2941 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            47.7426     0.1311 364.241  < 2e-16 ***
## poly(year - 1939, 7)1   4.7616     1.4056   3.388 0.000988 ***
## poly(year - 1939, 7)2  -0.9071     1.4056  -0.645 0.520083    
## poly(year - 1939, 7)3  -3.3132     1.4056  -2.357 0.020234 *  
## poly(year - 1939, 7)4   2.4383     1.4056   1.735 0.085672 .  
## poly(year - 1939, 7)5   3.3824     1.4056   2.406 0.017828 *  
## poly(year - 1939, 7)6   1.2124     1.4056   0.863 0.390303    
## poly(year - 1939, 7)7  -0.9373     1.4056  -0.667 0.506341    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.406 on 107 degrees of freedom
## Multiple R-squared:  0.2041, Adjusted R-squared:  0.152 
## F-statistic: 3.919 on 7 and 107 DF,  p-value: 0.0007651
lmod5<-lm(temp~poly(year-1939,6),data=aatemp)
summary(lmod5)
## 
## Call:
## lm(formula = temp ~ poly(year - 1939, 6), data = aatemp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.6846 -0.8825 -0.1428  0.9388  3.2950 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            47.7426     0.1307 365.181  < 2e-16 ***
## poly(year - 1939, 6)1   4.7616     1.4020   3.396 0.000957 ***
## poly(year - 1939, 6)2  -0.9071     1.4020  -0.647 0.518996    
## poly(year - 1939, 6)3  -3.3132     1.4020  -2.363 0.019905 *  
## poly(year - 1939, 6)4   2.4383     1.4020   1.739 0.084851 .  
## poly(year - 1939, 6)5   3.3824     1.4020   2.413 0.017527 *  
## poly(year - 1939, 6)6   1.2124     1.4020   0.865 0.389067    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.402 on 108 degrees of freedom
## Multiple R-squared:  0.2008, Adjusted R-squared:  0.1564 
## F-statistic: 4.522 on 6 and 108 DF,  p-value: 0.0003978
lmod6<-lm(temp~poly(year-1939,5),data=aatemp)
summary(lmod6)
## 
## Call:
## lm(formula = temp ~ poly(year - 1939, 5), data = aatemp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.7142 -0.9198 -0.1420  0.9903  3.2364 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            47.7426     0.1306 365.604  < 2e-16 ***
## poly(year - 1939, 5)1   4.7616     1.4004   3.400 0.000942 ***
## poly(year - 1939, 5)2  -0.9071     1.4004  -0.648 0.518500    
## poly(year - 1939, 5)3  -3.3132     1.4004  -2.366 0.019749 *  
## poly(year - 1939, 5)4   2.4383     1.4004   1.741 0.084470 .  
## poly(year - 1939, 5)5   3.3824     1.4004   2.415 0.017384 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.4 on 109 degrees of freedom
## Multiple R-squared:  0.1952, Adjusted R-squared:  0.1583 
## F-statistic: 5.289 on 5 and 109 DF,  p-value: 0.0002176
lmod7<-lm(temp~poly(year-1939,4),data=aatemp)
summary(lmod7)
## 
## Call:
## lm(formula = temp ~ poly(year - 1939, 4), data = aatemp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.0085 -0.9618 -0.0913  0.9926  3.7370 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            47.7426     0.1334 357.827  < 2e-16 ***
## poly(year - 1939, 4)1   4.7616     1.4308   3.328  0.00119 ** 
## poly(year - 1939, 4)2  -0.9071     1.4308  -0.634  0.52741    
## poly(year - 1939, 4)3  -3.3132     1.4308  -2.316  0.02243 *  
## poly(year - 1939, 4)4   2.4383     1.4308   1.704  0.09117 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.431 on 110 degrees of freedom
## Multiple R-squared:  0.1522, Adjusted R-squared:  0.1213 
## F-statistic: 4.936 on 4 and 110 DF,  p-value: 0.001068
lmod8<-lm(temp~poly(year-1939,3),data=aatemp)
summary(lmod8)
## 
## Call:
## lm(formula = temp ~ poly(year - 1939, 3), data = aatemp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.8557 -0.9646 -0.1552  1.0485  4.1538 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            47.7426     0.1346 354.796   <2e-16 ***
## poly(year - 1939, 3)1   4.7616     1.4430   3.300   0.0013 ** 
## poly(year - 1939, 3)2  -0.9071     1.4430  -0.629   0.5309    
## poly(year - 1939, 3)3  -3.3132     1.4430  -2.296   0.0236 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.443 on 111 degrees of freedom
## Multiple R-squared:  0.1298, Adjusted R-squared:  0.1063 
## F-statistic: 5.518 on 3 and 111 DF,  p-value: 0.001436
lmod9<-lm(temp~poly(year-1939,2),data=aatemp)
summary(lmod9)
## 
## Call:
## lm(formula = temp ~ poly(year - 1939, 2), data = aatemp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.0412 -0.9538 -0.0624  0.9959  3.5820 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            47.7426     0.1371 348.218  < 2e-16 ***
## poly(year - 1939, 2)1   4.7616     1.4703   3.239  0.00158 ** 
## poly(year - 1939, 2)2  -0.9071     1.4703  -0.617  0.53851    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.47 on 112 degrees of freedom
## Multiple R-squared:  0.08846,    Adjusted R-squared:  0.07218 
## F-statistic: 5.434 on 2 and 112 DF,  p-value: 0.005591
lmod10<-lm(temp~poly(year-1939,1),data=aatemp)
summary(lmod10)
## 
## Call:
## lm(formula = temp ~ poly(year - 1939, 1), data = aatemp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9843 -0.9113 -0.0820  0.9946  3.5343 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           47.7426     0.1367 349.176  < 2e-16 ***
## poly(year - 1939, 1)   4.7616     1.4663   3.247  0.00153 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.466 on 113 degrees of freedom
## Multiple R-squared:  0.08536,    Adjusted R-squared:  0.07727 
## F-statistic: 10.55 on 1 and 113 DF,  p-value: 0.001533
lmodx<-lm(temp~poly(year,8))
summary(lmodx)
## 
## Call:
## lm(formula = temp ~ poly(year, 8))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.6086 -0.8600 -0.2385  1.0608  3.3975 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     47.7426     0.1313 363.579  < 2e-16 ***
## poly(year, 8)1   4.7616     1.4082   3.381  0.00101 ** 
## poly(year, 8)2  -0.9071     1.4082  -0.644  0.52085    
## poly(year, 8)3  -3.3132     1.4082  -2.353  0.02047 *  
## poly(year, 8)4   2.4383     1.4082   1.732  0.08626 .  
## poly(year, 8)5   3.3824     1.4082   2.402  0.01805 *  
## poly(year, 8)6   1.2124     1.4082   0.861  0.39118    
## poly(year, 8)7  -0.9373     1.4082  -0.666  0.50713    
## poly(year, 8)8  -1.1011     1.4082  -0.782  0.43600    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.408 on 106 degrees of freedom
## Multiple R-squared:  0.2086, Adjusted R-squared:  0.1489 
## F-statistic: 3.494 on 8 and 106 DF,  p-value: 0.001284

model with degree 5 is the highest polynomial model as P value is significant. We can plot the the fitted model.

plot(temp~year)
points(year, fitted(lmod6),col="orange", pch=20)

(e) Suppose someone claims that the temperature maintained a linear trend until 1930 and then became constant. Fit a model corresponding to this claim. Plot the fitted model on top of the data. What does the fitted model say about this claim?

g1<-lm(temp~year, aatemp,subset=(year<=1930))
g2<-lm(temp~year, aatemp, subset=(year>1930))
plot(g1)

plot(temp~year,aatemp,xlab="Mean temp in Ann Arbor",ylab="Year", main="temp in ann arbor michigan")
abline(v=1930, lty=5)
lhs<-function(x)ifelse(x<=1930, 1930-x,0)
rhs<-function(x)ifelse(x>1930, x-1930,0)
gmod<-lm(temp~lhs(year)+rhs(year),aatemp)
x<-seq(1854,2000,by=1)
py<-gmod$coef[1]+gmod$coef[2]*lhs(x)+gmod$coef[3]*rhs(x)
lines(x,py,lty=2)

g1<-lm(temp~year, aatemp,subset=(year<=1930))
g2<-lm(temp~year, aatemp, subset=(year>1930))
plot(g1)

plot(temp~year,aatemp,xlab="Mean temp in Ann Arbor",ylab="Year", main="temp in ann arbor michigan")
abline(v=1930, lty=5)
lhs<-function(x)ifelse(x<=1930, 47.74,0)
rhs<-function(x)ifelse(x>1930, x-1930,0)
gmod<-lm(temp~lhs(year)+rhs(year),aatemp)
x<-seq(1854,2000,by=1)
py<-gmod$coef[1]+gmod$coef[2]*lhs(x)+gmod$coef[3]*rhs(x)
lines(x,py,lty=2)

summary(g1)
## 
## Call:
## lm(formula = temp ~ year, data = aatemp, subset = (year <= 1930))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.6618 -0.7444  0.1448  1.2600  3.0391 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 54.452092  22.918820   2.376   0.0216 *
## year        -0.003915   0.012036  -0.325   0.7464  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.495 on 47 degrees of freedom
## Multiple R-squared:  0.002247,   Adjusted R-squared:  -0.01898 
## F-statistic: 0.1058 on 1 and 47 DF,  p-value: 0.7464
summary(g2)
## 
## Call:
## lm(formula = temp ~ year, data = aatemp, subset = (year > 1930))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.3978 -0.9834 -0.1354  0.8804  3.9925 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 72.909528  15.287079   4.769 1.11e-05 ***
## year        -0.012519   0.007775  -1.610    0.112    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.302 on 64 degrees of freedom
## Multiple R-squared:  0.03893,    Adjusted R-squared:  0.02392 
## F-statistic: 2.593 on 1 and 64 DF,  p-value: 0.1123

in the second model the constant is average of the temperature. Based on the model, the model seems to have some error.

(e). Make a cubic spline fit with 8 basis functions evenly spaced on the range. Visualize the basis functions. Plot the fit in comparison to the previous fits. Does this model fit better than the straight-line model?

attach(aatemp)
## The following objects are masked from aatemp (pos = 4):
## 
##     temp, year
require(splines)
## Loading required package: splines
knots<-c(1850,1850,1850,1850,1868.75,1906.25,1943.75,1981.25,2000,2000,2000,2000)
byear<-splineDesign(knots,year)
lmodb<-lm(temp~byear-1)
matplot(year, byear, type="l", col=1)

matplot(year, cbind(temp,lmodb$fit), type="pl", ylab="year", pch=20,lty=1,col=1)

from the graph, this model does not fit better than the straight-line model.