add your names here!
We're going to pull out a small subset of the CPS85 data in order to explore model vectors and their relationship to model terms.
First, to construct the small data set, follow these instructions exactly.
set.seed(1234)
small <- droplevels(sample(CPS85, size = 10))
small <- transform(small, ageGrp = as.character(ntiles(age, 3)))
rownames(small) <- NULL
In this small sample, you can see how many levels there are of each categorical variable with the command with( small, levels( variableName ) )
with(small, levels(sector))
## [1] "clerical" "manag" "manuf" "service"
with(small, levels(union))
## [1] "Not" "Union"
For each of the following models, your job is
small data set.model.matrix() command.model.matrix().How many explanatory model vectors in the model wage ~ 1?
Prediction: There's just one, the intercept. A vector of all 1s.
Confirming
mod = lm(wage ~ 1, data = small)
model.matrix(mod)
## (Intercept)
## 1 1
## 2 1
## 3 1
## 4 1
## 5 1
## 6 1
## 7 1
## 8 1
## 9 1
## 10 1
## attr(,"assign")
## [1] 0
The prediction was right.
The standard deviation of residuals is
sd(resid(mod))
## [1] 3.381
Not at all a perfect fit. Indeed, the model “explains” nothing, since the residuals have a standard deviation that is every bit as large as the response variable itself:
sd(wage, data = small)
## [1] 3.381
wage ~ sexWe predict there will be 2 or 3 vectors.
mod = lm(wage ~ sex, data = small)
model.matrix(mod)
## (Intercept) sexM
## 1 1 0
## 2 1 1
## 3 1 1
## 4 1 1
## 5 1 0
## 6 1 0
## 7 1 0
## 8 1 1
## 9 1 0
## 10 1 1
## attr(,"assign")
## [1] 0 1
## attr(,"contrasts")
## attr(,"contrasts")$sex
## [1] "contr.treatment"
sd(resid(mod))
## [1] 2.912
Put your commentary here. Work as group. Write down interesting things. Then move to the next model.
wage ~ sectorwage ~ unionwage ~ ageGrpwage ~ educ + sexwage ~ age + ageGrpwage ~ educ * sexwage ~ age * sectorwage ~ age * sector + age * sexwage ~ ageGrp*sector + ageGrp*sexmod10 = lm(wage ~ ageGrp * sector + ageGrp * sex, data = small)
sd(resid(mod10))
## [1] 0.6293
model.matrix(mod10)
## (Intercept) ageGrp2nd ageGrp3rd sectormanag sectormanuf sectorservice
## 1 1 0 1 0 0 0
## 2 1 1 0 0 1 0
## 3 1 0 0 0 1 0
## 4 1 1 0 0 0 1
## 5 1 1 0 0 0 0
## 6 1 0 1 0 0 1
## 7 1 0 1 0 0 0
## 8 1 1 0 1 0 0
## 9 1 0 0 0 0 0
## 10 1 0 0 0 0 0
## sexM ageGrp2nd:sectormanag ageGrp3rd:sectormanag ageGrp2nd:sectormanuf
## 1 0 0 0 0
## 2 1 0 0 1
## 3 1 0 0 0
## 4 1 0 0 0
## 5 0 0 0 0
## 6 0 0 0 0
## 7 0 0 0 0
## 8 1 1 0 0
## 9 0 0 0 0
## 10 1 0 0 0
## ageGrp3rd:sectormanuf ageGrp2nd:sectorservice ageGrp3rd:sectorservice
## 1 0 0 0
## 2 0 0 0
## 3 0 0 0
## 4 0 1 0
## 5 0 0 0
## 6 0 0 1
## 7 0 0 0
## 8 0 0 0
## 9 0 0 0
## 10 0 0 0
## ageGrp2nd:sexM ageGrp3rd:sexM
## 1 0 0
## 2 1 0
## 3 0 0
## 4 1 0
## 5 0 0
## 6 0 0
## 7 0 0
## 8 1 0
## 9 0 0
## 10 0 0
## attr(,"assign")
## [1] 0 1 1 2 2 2 3 4 4 4 4 4 4 5 5
## attr(,"contrasts")
## attr(,"contrasts")$ageGrp
## [1] "contr.treatment"
##
## attr(,"contrasts")$sector
## [1] "contr.treatment"
##
## attr(,"contrasts")$sex
## [1] "contr.treatment"
Try fitting the last model to CPS85 (rather than small) and look at the standard deviation of the residuals. Comment on the difference from the residuals fitted to small).