Counting Model Vectors

Group Members

add your names here!

Introduction

We're going to pull out a small subset of the CPS85 data in order to explore model vectors and their relationship to model terms.

First, to construct the small data set, follow these instructions exactly.

set.seed(1234)
small <- droplevels(sample(CPS85, size = 10))
small <- transform(small, ageGrp = as.character(ntiles(age, 3)))
rownames(small) <- NULL

In this small sample, you can see how many levels there are of each categorical variable with the command with( small, levels( variableName ) )

with(small, levels(sector))

## [1] "clerical" "manag"    "manuf"    "service"

with(small, levels(union))

## [1] "Not"   "Union"

For each of the following models, your job is

FIRST, to predict how many explanatory model vectors there will be in the model for this small data set.
Second, confirm or refute your prediction by using the model.matrix() command.
Third, explain any discrepancy between your prediction and the results of model.matrix().
Fourth, find the standard deviation of the residuals. If the model is a “perfect” fit, explain why.

EXAMPLE:

How many explanatory model vectors in the model wage ~ 1?

Prediction: There's just one, the intercept. A vector of all 1s.
Confirming

mod = lm(wage ~ 1, data = small)
model.matrix(mod)

##    (Intercept)
## 1            1
## 2            1
## 3            1
## 4            1
## 5            1
## 6            1
## 7            1
## 8            1
## 9            1
## 10           1
## attr(,"assign")
## [1] 0

The prediction was right.

The standard deviation of residuals is

sd(resid(mod))

## [1] 3.381

Not at all a perfect fit. Indeed, the model “explains” nothing, since the residuals have a standard deviation that is every bit as large as the response variable itself:

sd(wage, data = small)

## [1] 3.381

Model 1: `wage ~ sex`

We predict there will be 2 or 3 vectors.

mod = lm(wage ~ sex, data = small)
model.matrix(mod)

##    (Intercept) sexM
## 1            1    0
## 2            1    1
## 3            1    1
## 4            1    1
## 5            1    0
## 6            1    0
## 7            1    0
## 8            1    1
## 9            1    0
## 10           1    1
## attr(,"assign")
## [1] 0 1
## attr(,"contrasts")
## attr(,"contrasts")$sex
## [1] "contr.treatment"

sd(resid(mod))

## [1] 2.912

Put your commentary here. Work as group. Write down interesting things. Then move to the next model.

Model 2: `wage ~ sector`

Model 3: `wage ~ union`

Model 4: `wage ~ ageGrp`

Model 5: `wage ~ educ + sex`

Model 6: `wage ~ age + ageGrp`

Model 7: `wage ~ educ * sex`

Model 8: `wage ~ age * sector`

Model 9: `wage ~ age * sector + age * sex`

Model 10: `wage ~ ageGrpsector + ageGrpsex`

mod10 = lm(wage ~ ageGrp * sector + ageGrp * sex, data = small)
sd(resid(mod10))

## [1] 0.6293

model.matrix(mod10)

##    (Intercept) ageGrp2nd ageGrp3rd sectormanag sectormanuf sectorservice
## 1            1         0         1           0           0             0
## 2            1         1         0           0           1             0
## 3            1         0         0           0           1             0
## 4            1         1         0           0           0             1
## 5            1         1         0           0           0             0
## 6            1         0         1           0           0             1
## 7            1         0         1           0           0             0
## 8            1         1         0           1           0             0
## 9            1         0         0           0           0             0
## 10           1         0         0           0           0             0
##    sexM ageGrp2nd:sectormanag ageGrp3rd:sectormanag ageGrp2nd:sectormanuf
## 1     0                     0                     0                     0
## 2     1                     0                     0                     1
## 3     1                     0                     0                     0
## 4     1                     0                     0                     0
## 5     0                     0                     0                     0
## 6     0                     0                     0                     0
## 7     0                     0                     0                     0
## 8     1                     1                     0                     0
## 9     0                     0                     0                     0
## 10    1                     0                     0                     0
##    ageGrp3rd:sectormanuf ageGrp2nd:sectorservice ageGrp3rd:sectorservice
## 1                      0                       0                       0
## 2                      0                       0                       0
## 3                      0                       0                       0
## 4                      0                       1                       0
## 5                      0                       0                       0
## 6                      0                       0                       1
## 7                      0                       0                       0
## 8                      0                       0                       0
## 9                      0                       0                       0
## 10                     0                       0                       0
##    ageGrp2nd:sexM ageGrp3rd:sexM
## 1               0              0
## 2               1              0
## 3               0              0
## 4               1              0
## 5               0              0
## 6               0              0
## 7               0              0
## 8               1              0
## 9               0              0
## 10              0              0
## attr(,"assign")
##  [1] 0 1 1 2 2 2 3 4 4 4 4 4 4 5 5
## attr(,"contrasts")
## attr(,"contrasts")$ageGrp
## [1] "contr.treatment"
## 
## attr(,"contrasts")$sector
## [1] "contr.treatment"
## 
## attr(,"contrasts")$sex
## [1] "contr.treatment"

Finally …

Try fitting the last model to CPS85 (rather than small) and look at the standard deviation of the residuals. Comment on the difference from the residuals fitted to small).

Counting Model Vectors

Group Members

Introduction

EXAMPLE:

Model 1: wage ~ sex

Model 2: wage ~ sector

Model 3: wage ~ union

Model 4: wage ~ ageGrp

Model 5: wage ~ educ + sex

Model 6: wage ~ age + ageGrp

Model 7: wage ~ educ * sex

Model 8: wage ~ age * sector

Model 9: wage ~ age * sector + age * sex

Model 10: wage ~ ageGrp*sector + ageGrp*sex