**4.6)** A one-unit increase in pctUrban is associated with a 0.01 decrease in fertility.
5.4) 1)
library(alr4)
## Loading required package: car
## Loading required package: effects
## Loading required package: lattice
## Loading required package: grid
## Loading required package: colorspace
##
## Attaching package: 'effects'
##
## The following object is masked from 'package:car':
##
## Prestige
data(MinnLand)
boxplot(log(acrePrice)~year, data=MinnLand)
From 2002 to 2006, sale prices in Minnesota show many outliers at both the bottom and the top. From 2007 to 2011, most of the outliers are at the bottom. The sale price in Minnesota increases from 2002 to 2008 and then appears roughly constant from 2008 to 2011.
2)……
a1<-as.factor(MinnLand$year)
a2<-lm(log(acrePrice)~a1,data=MinnLand)
summary(a2)
##
## Call:
## lm(formula = log(acrePrice) ~ a1, data = MinnLand)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.950 -0.379 0.130 0.435 2.346
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.27175 0.02848 255.34 < 2e-16 ***
## a12003 -0.00155 0.03206 -0.05 0.96
## a12004 0.14794 0.03155 4.69 2.8e-06 ***
## a12005 0.36026 0.03176 11.34 < 2e-16 ***
## a12006 0.39392 0.03195 12.33 < 2e-16 ***
## a12007 0.47682 0.03186 14.97 < 2e-16 ***
## a12008 0.68364 0.03162 21.62 < 2e-16 ***
## a12009 0.71407 0.03355 21.28 < 2e-16 ***
## a12010 0.75733 0.03260 23.23 < 2e-16 ***
## a12011 0.72071 0.03527 20.44 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.678 on 18690 degrees of freedom
## Multiple R-squared: 0.129, Adjusted R-squared: 0.129
## F-statistic: 308 on 9 and 18690 DF, p-value: <2e-16
The intercept, which represents the estimated mean of log(acrePrice) for 2002, is about 7.27. Each of the other coefficients estimates the difference in mean log(acrePrice) between that year and 2002. From the t-statistics, every year except 2003 (t = -0.05, p = 0.96) has a mean log(acrePrice) that differs significantly from the 2002 mean.
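The baseline interpretation can be checked on a small example. The sketch below uses made-up toy data (the `toy` data frame and its values are hypothetical, not MinnLand) and compares the fitted coefficients with differences of group means.

```r
# Toy data (hypothetical values, not MinnLand): three "years" of log prices
set.seed(1)
toy <- data.frame(
  year = factor(rep(c(2002, 2003, 2004), times = c(5, 8, 6))),
  y    = rnorm(19, mean = 7.5, sd = 0.7)
)

# Default treatment coding: the first level (2002) is the baseline
fit <- lm(y ~ year, data = toy)
group_means <- tapply(toy$y, toy$year, mean)

# The intercept equals the baseline mean; every other coefficient equals
# that year's mean minus the baseline mean
coef(fit)
group_means - group_means["2002"]
```

The two printed vectors should agree in every position except the first, where the intercept is the 2002 mean itself.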
3)….
a3<-lm(log(acrePrice)~a1-1,data=MinnLand)
summary(a3)
##
## Call:
## lm(formula = log(acrePrice) ~ a1 - 1, data = MinnLand)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.950 -0.379 0.130 0.435 2.346
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## a12002 7.2717 0.0285 255 <2e-16 ***
## a12003 7.2702 0.0147 493 <2e-16 ***
## a12004 7.4197 0.0136 546 <2e-16 ***
## a12005 7.6320 0.0141 543 <2e-16 ***
## a12006 7.6657 0.0145 529 <2e-16 ***
## a12007 7.7486 0.0143 542 <2e-16 ***
## a12008 7.9554 0.0137 579 <2e-16 ***
## a12009 7.9858 0.0177 450 <2e-16 ***
## a12010 8.0291 0.0159 506 <2e-16 ***
## a12011 7.9925 0.0208 384 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.678 on 18690 degrees of freedom
## Multiple R-squared: 0.992, Adjusted R-squared: 0.992
## F-statistic: 2.42e+05 on 10 and 18690 DF, p-value: <2e-16
table(MinnLand$year)
##
## 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011
## 566 2114 2490 2321 2187 2248 2431 1459 1823 1061
n<- with(MinnLand, tapply(log(acrePrice),year,length))
means<- with(MinnLand, tapply(log(acrePrice),year,mean))
SDs <- with(MinnLand, tapply(log(acrePrice),year,sd))
data.frame(n,df=n-1, means,SDs, ses=SDs/sqrt(n))
## n df means SDs ses
## 2002 566 565 7.272 0.6350 0.02669
## 2003 2114 2113 7.270 0.7584 0.01649
## 2004 2490 2489 7.420 0.7356 0.01474
## 2005 2321 2320 7.632 0.7086 0.01471
## 2006 2187 2186 7.666 0.6617 0.01415
## 2007 2248 2247 7.749 0.6425 0.01355
## 2008 2431 2430 7.955 0.6209 0.01259
## 2009 1459 1458 7.986 0.6113 0.01600
## 2010 1823 1822 8.029 0.6360 0.01490
## 2011 1061 1060 7.992 0.7012 0.02153
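After omitting the intercept, the mean function is E(Y|X) = β1x1 + β2x2 + … + β10x10, where xj is the 0/1 indicator for the j-th year. For a sale in 2002, x1 = 1 and all the other x's are 0, so E(Y|X) = β1; the same holds for each of the other years. As a result, each parameter estimates the mean of log(acrePrice) for its year.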
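To make the indicator structure concrete, here is a minimal sketch with a made-up factor (`yr` and `y` are hypothetical stand-ins, not the MinnLand variables). It prints the design matrix of the no-intercept model and compares the coefficients with the per-year means.

```r
# Toy factor and response (hypothetical values) standing in for year and log price
yr <- factor(c(2002, 2002, 2003, 2003, 2003, 2004))
y  <- c(7.1, 7.3, 7.0, 7.4, 7.5, 7.9)

# Without an intercept, the design matrix has one 0/1 indicator column per year
X <- model.matrix(~ yr - 1)
X

# Each row contains exactly one 1, so the fitted coefficient for a year
# is just the mean response in that year
fit0 <- lm(y ~ yr - 1)
cbind(coef = coef(fit0), mean = tapply(y, yr, mean))
```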
From the R output, the standard errors computed as SD/sqrt(n) differ from the standard errors of the regression coefficients. Both divide by the square root of the number of sales in that year, but they use different estimates of the standard deviation. The formula uses the sample SD of that year alone, with n-1 degrees of freedom. The regression uses the residual standard error, 0.678, which pools the variation from all 18700 observations and has 18690 degrees of freedom. For example, for 2002 the regression gives 0.678/sqrt(566) ≈ 0.0285, while the formula gives 0.6350/sqrt(566) ≈ 0.0267.
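As a check on where the regression standard errors come from, the sketch below divides the reported residual standard error by the square root of each year's sample size. All figures are copied from the output above; the variable names are only illustrative.

```r
# Figures copied from the R output above
sigma_hat <- 0.678   # residual standard error of the fitted model (18690 df)
n <- c(566, 2114, 2490, 2321, 2187, 2248, 2431, 1459, 1823, 1061)
names(n) <- 2002:2011
reported_se <- c(0.0285, 0.0147, 0.0136, 0.0141, 0.0145,
                 0.0143, 0.0137, 0.0177, 0.0159, 0.0208)

# Standard error of each year's mean in the no-intercept regression:
# pooled residual standard error divided by sqrt(n_j)
pooled_se <- sigma_hat / sqrt(n)
round(cbind(pooled_se, reported_se), 4)
```

Small differences in the last digit are expected because 0.678 is itself a rounded value.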