We saw a general strategy for modeling when the concern is with effect size:

- Decide which sort of effect you want to study: total or partial.
- Construct a model that includes both the variable you are interested in and any covariates that you want to hold constant. What sort of model terms (e.g. interactions) you want to include is another matter, you have a choice.
- Compute a partial derivative with respect to the variable whose effect size you want to study holding the other variables constant.

IMPORTANT: Leaving a covariate **out** of a model does not hold it constant. Ignoring is not the same thing as holding constant.

It can occasionally happen that you want to study the effect of a variable that necessarily has an effect on another variable. For instance, suppose we want to study the effect of education on wages. But each year of education necessarily leads to a year less of “experience.” So, include both in the model:

```
mod = lm( wage ~ sex*poly(exper,2)*educ + sector*educ, data=CPS85)
f = makeFun(mod)
f(sex="M", exper=9, educ=15,sector="prof") - f( sex="M", exper=10, educ=14, sector="prof")
```

```
## 1
## 0.4116
```

You could also do this by constructing the differential involving both experience and education, but this will do.

The in-class activity:

Total-vs-partial In-class activity

Running data: Compare the cross-sectional to the longitudinal data to get at how tie changes versus age. Question: hold individual constant or not.

```
f = fetchData("Cherry-Blossom-Long.csv")
```

```
## Retrieving from
## http://www.mosaic-web.org/go/datasets/Cherry-Blossom-Long.csv
```

```
nrow(f)
```

```
## [1] 41248
```

```
sample(f, size = 5)
```

```
## name.yob age net gun sex year previous nruns
## 12117 erin alexander 1982 25 NA 111.38 F 2007 0 2
## 40268 vincent guerrieri 1943 59 118.20 121.65 M 2002 3 6
## 16003 jane schultz 1964 37 77.68 79.13 F 2001 1 7
## 18058 joan mooney 1964 38 92.20 94.27 F 2002 2 3
## 36705 stephanie brown 1964 36 110.95 114.05 F 2000 1 4
## orig.ids
## 12117 12117
## 40268 40268
## 16003 16003
## 18058 18058
## 36705 36705
```

```
f = subset(f, nruns > 5)
nrow(f)
```

```
## [1] 6186
```

Two models

```
mod1 = lm(net ~ age, data = f)
mod2 = lm(net ~ age + name.yob, data = f)
```

```
head(coef(mod2))
```

```
## (Intercept) age
## 59.0891 0.8393
## name.yobabiy zewde 1967 name.yobadam anthony 1966
## 13.5079 -8.3130
## name.yobadam stolzberg 1976 name.yobai-mei chang 1964
## -11.3294 4.6991
```

```
mod1
```

```
##
## Call:
## lm(formula = net ~ age, data = f)
##
## Coefficients:
## (Intercept) age
## 72.051 0.305
##
```

Note how substantially the age dependence differs depending on whether you are looking longitudinally or cross-sectionally.

```
grades = fetchData("grades.csv")
```

```
## Retrieving from http://www.mosaic-web.org/go/datasets/grades.csv
```

```
g2pt = fetchData("grade-to-number.csv")
```

```
## Retrieving from http://www.mosaic-web.org/go/datasets/grade-to-number.csv
```

```
courses = fetchData("courses.csv")
```

```
## Retrieving from http://www.mosaic-web.org/go/datasets/courses.csv
```

```
grades = merge(grades, g2pt) # Convert letter grade to number
grades = merge(grades, courses)
```

Compute the GPA in the ordinary way:

```
options(na.rm = TRUE)
head(mean(gradepoint ~ sid, data = grades))
```

```
## S31185 S31188 S31191 S31194 S31197 S31200
## 2.413 3.018 3.212 3.359 3.331 2.186
```

… or by a model

```
conventional = coef(lm(gradepoint ~ sid - 1, data = grades))
head(conventional)
```

```
## sidS31185 sidS31188 sidS31191 sidS31194 sidS31197 sidS31200
## 2.413 3.018 3.212 3.359 3.331 2.186
```

What can we hold constant? Department, level, class enrollment?

```
adjusted = coef(lm(gradepoint ~ sid - 1 + dept + level + enroll, data = grades))
head(adjusted)
```

```
## sidS31185 sidS31188 sidS31191 sidS31194 sidS31197 sidS31200
## 2.519 3.190 3.137 3.461 3.356 2.142
```

How do they compare?

```
xyplot(conventional ~ adjusted[1:443], pch = 20)
```

Or in terms of class rank:

```
xyplot(rank(conventional) ~ rank(adjusted[1:443]), pch = 20)
```

Suppose the cut-off for class rank was to be \( \geq 150 \). There are students who pass by the adjusted criteria but fail by the unadjusted.

```
xyplot(rank(conventional) ~ rank(adjusted[1:443]), pch = 20)
plotFun(y >= 150 ~ x & y, add = TRUE)
plotFun(x >= 150 ~ x & y, add = TRUE)
```

The “takes easy courses” index: a positive number indicates taking easy courses.

```
densityplot(~rank(conventional) - rank(adjusted[1:443]))
```

Cross the IDs of the people taking easy courses with the department … to be done.