It would be very nice to have the capability to name the levels of `model.matrix`

-generated input variables better. For example, the input variables that are generated when `contr.sum()`

is used are just numbered, not named. I think there is some magic going on within `model.matrix`

that allows the contrasts to be sensibly named *only* if the default contrasts are used (i.e. no contrasts are explicitly specified). For example, here are the names of the input variables generated by the default contrasts:

```
set.seed(102)
d <- data.frame(y = runif(100), f = factor(sample(c("low", "medium", "high"),
100, replace = TRUE), levels = c("low", "medium", "high")))
colnames(model.matrix(y ~ f, data = d))
```

```
## [1] "(Intercept)" "fmedium" "fhigh"
```

But if we explicitly specify treatment contrasts, even though we get the same model matrix (not shown), the names are now useless:

```
colnames(model.matrix(y ~ f, data = d, contrasts = list(f = contr.treatment)))
```

```
## [1] "(Intercept)" "f2" "f3"
```

Similarly, if we ask for

```
colnames(model.matrix(y ~ f, data = d, contrasts = list(f = contr.sum)))
```

```
## [1] "(Intercept)" "f1" "f2"
```

What if we set the default contrasts using `options()`

?

```
old_opts <- options(contrasts = c("contr.treatment", "contr.poly"))
colnames(model.matrix(y ~ f, data = d))
```

```
## [1] "(Intercept)" "fmedium" "fhigh"
```

```
options(contrasts = c("contr.sum", "contr.poly"))
colnames(model.matrix(y ~ f, data = d))
```

```
## [1] "(Intercept)" "f1" "f2"
```

```
options(old_opts)
```

The code for creating the column names of the model matrix is actually at lines 578ff of https://github.com/wch/r-source/blob/trunk/src/library/stats/src/model.c … it would require either a lot more staring at this code, or source-level debugging, t find out exactly where the difference in behaviour (i.e., why default contrasts get sensible names and non-default contrasts only get numbers) is coming from.

The `car`

package defines a `contr.Sum()`

function that gives *slightly* improved input variable names, but they're only a little bit better. The main point is that they indicate clearly that sum-to-zero contrasts are being used, but they don't help with guessing which levels are being compared.

```
library("car")
colnames(model.matrix(y ~ f, data = d, contrasts = list(f = contr.Sum)))
```

```
## [1] "(Intercept)" "f[S.1]" "f[S.2]"
```

A clunky workaround is to specify a contrast *matrix* with defined column names:

```
cm <- contr.sum(3)
colnames(cm) <- c("a", "b")
colnames(model.matrix(y ~ f, data = d, contrasts = list(f = cm)))
```

```
## [1] "(Intercept)" "fa" "fb"
```

We could even cheat and add a separator:

```
colnames(cm) <- paste0(".", c("a", "b"))
colnames(model.matrix(y ~ f, data = d, contrasts = list(f = cm)))
```

```
## [1] "(Intercept)" "f.a" "f.b"
```

I've defined a `contr.Sum2()`

(not shown) (I called it `contr.Sum2`

to differentiate it from the `car`

version) that automates this somewhat: unfortunately,
it can't be specified simply as `contr.Sum2`

– the optional `factor`

or `levels`

arguments must be provided …

```
colnames(m <- model.matrix(y ~ f, data = d, contrasts = list(f = contr.Sum2(factor = d$f))))
```

```
## [1] "(Intercept)" "f[S.low]" "f[S.medium]"
```

I don't see an easy way to make this more convenient without fairly deep hacking …

What does a weighted contrast matrix look like? Do we have to mess with the intercept column? The easy way to do it (presumably) is to center the design matrix:

```
colMeans(m)
```

```
## (Intercept) f[S.low] f[S.medium]
## 1.00 0.07 0.09
```

```
m[, -1] <- scale(m[, -1], center = TRUE, scale = FALSE)
zapsmall(colMeans(m))
```

```
## (Intercept) f[S.low] f[S.medium]
## 1 0 0
```

```
head(m)
```

```
## (Intercept) f[S.low] f[S.medium]
## 1 1 -1.07 -1.09
## 2 1 -0.07 0.91
## 3 1 -1.07 -1.09
## 4 1 -0.07 0.91
## 5 1 0.93 -0.09
## 6 1 -1.07 -1.09
```

```
prop.table(table(d$f))
```

```
##
## low medium high
## 0.35 0.37 0.28
```

```
apply(m[, -1], 2, unique)
```

```
## f[S.low] f[S.medium]
## [1,] -1.07 -1.09
## [2,] -0.07 0.91
## [3,] 0.93 -0.09
```

If I could do the simple algebra to figure out where these numbers come from, this could fairly easily be incorporated in `contr.Sum2()`

…