Counting Model Vectors

Group Members

add your names here!

Introduction

We're going to pull out a small subset of the CPS85 data in order to explore model vectors and their relationship to model terms.

First, to construct the small data set, follow these instructions exactly.

set.seed(1234)
small <- droplevels(sample(CPS85, size = 10))
small <- transform(small, ageGrp = as.character(ntiles(age, 3)))
rownames(small) <- NULL

In this small sample, you can see how many levels there are of each categorical variable with the command with( small, levels( variableName ) )

with(small, levels(sector))
## [1] "clerical" "manag"    "manuf"    "service"
with(small, levels(union))
## [1] "Not"   "Union"

For each of the following models, your job is

EXAMPLE:

How many explanatory model vectors in the model wage ~ 1?

mod = lm(wage ~ 1, data = small)
model.matrix(mod)
##    (Intercept)
## 1            1
## 2            1
## 3            1
## 4            1
## 5            1
## 6            1
## 7            1
## 8            1
## 9            1
## 10           1
## attr(,"assign")
## [1] 0

The prediction was right.

The standard deviation of residuals is

sd(resid(mod))
## [1] 3.381

Not at all a perfect fit. Indeed, the model “explains” nothing, since the residuals have a standard deviation that is every bit as large as the response variable itself:

sd(wage, data = small)
## [1] 3.381

Model 1: wage ~ sex

We predict there will be 2 or 3 vectors.

mod = lm(wage ~ sex, data = small)
model.matrix(mod)
##    (Intercept) sexM
## 1            1    0
## 2            1    1
## 3            1    1
## 4            1    1
## 5            1    0
## 6            1    0
## 7            1    0
## 8            1    1
## 9            1    0
## 10           1    1
## attr(,"assign")
## [1] 0 1
## attr(,"contrasts")
## attr(,"contrasts")$sex
## [1] "contr.treatment"
sd(resid(mod))
## [1] 2.912

Put your commentary here. Work as group. Write down interesting things. Then move to the next model.

Model 2: wage ~ sector

Model 3: wage ~ union

Model 4: wage ~ ageGrp

Model 5: wage ~ educ + sex

Model 6: wage ~ age + ageGrp

Model 7: wage ~ educ * sex

Model 8: wage ~ age * sector

Model 9: wage ~ age * sector + age * sex

Model 10: wage ~ ageGrp*sector + ageGrp*sex

mod10 = lm(wage ~ ageGrp * sector + ageGrp * sex, data = small)
sd(resid(mod10))
## [1] 0.6293
model.matrix(mod10)
##    (Intercept) ageGrp2nd ageGrp3rd sectormanag sectormanuf sectorservice
## 1            1         0         1           0           0             0
## 2            1         1         0           0           1             0
## 3            1         0         0           0           1             0
## 4            1         1         0           0           0             1
## 5            1         1         0           0           0             0
## 6            1         0         1           0           0             1
## 7            1         0         1           0           0             0
## 8            1         1         0           1           0             0
## 9            1         0         0           0           0             0
## 10           1         0         0           0           0             0
##    sexM ageGrp2nd:sectormanag ageGrp3rd:sectormanag ageGrp2nd:sectormanuf
## 1     0                     0                     0                     0
## 2     1                     0                     0                     1
## 3     1                     0                     0                     0
## 4     1                     0                     0                     0
## 5     0                     0                     0                     0
## 6     0                     0                     0                     0
## 7     0                     0                     0                     0
## 8     1                     1                     0                     0
## 9     0                     0                     0                     0
## 10    1                     0                     0                     0
##    ageGrp3rd:sectormanuf ageGrp2nd:sectorservice ageGrp3rd:sectorservice
## 1                      0                       0                       0
## 2                      0                       0                       0
## 3                      0                       0                       0
## 4                      0                       1                       0
## 5                      0                       0                       0
## 6                      0                       0                       1
## 7                      0                       0                       0
## 8                      0                       0                       0
## 9                      0                       0                       0
## 10                     0                       0                       0
##    ageGrp2nd:sexM ageGrp3rd:sexM
## 1               0              0
## 2               1              0
## 3               0              0
## 4               1              0
## 5               0              0
## 6               0              0
## 7               0              0
## 8               1              0
## 9               0              0
## 10              0              0
## attr(,"assign")
##  [1] 0 1 1 2 2 2 3 4 4 4 4 4 4 5 5
## attr(,"contrasts")
## attr(,"contrasts")$ageGrp
## [1] "contr.treatment"
## 
## attr(,"contrasts")$sector
## [1] "contr.treatment"
## 
## attr(,"contrasts")$sex
## [1] "contr.treatment"

Finally …

Try fitting the last model to CPS85 (rather than small) and look at the standard deviation of the residuals. Comment on the difference from the residuals fitted to small).