2024-01-29
A single categorical predictor with three levels: comparing effect of three kinds of fertiliser on yield (see Grafen & Hails, Section 3.2 Expressing all models as linear equations).
The R model would be
lm(yield ~ treatment)
Grafen & Hails ‘pseudo-matrix’ notation (as in Ch. 5.2, p.87) would be
\[y = \mu + \begin{bmatrix} \mathrm{Treatment} & \mathrm{Difference} \\ 1 & \alpha_1 \\ 2 & \alpha_2 \\ 3 & -\alpha_1-\alpha_2 \end{bmatrix} + \epsilon \]
Take our \(n\) measurements of \(y\), with \(i = 1\ldots n\).
Then, the value of the \(i^\mathrm{th}\) observation \(y_i\) is:
treatment = Fertiliser 1;treatment = Fertiliser 2;treatment = Fertiliser 3.Can we write this in a single equation that reflects the “if fertiliser X” bit? — yes:
We need ‘
We make the if fertiliser is X ‘switches’ by introducing ‘
We set the ‘switches’ like this:
\(\mathbf C=\left[\begin{array}{rr} 1 & 0 \\ 0 & 1 \\ -1 & -1 \end{array}\right]\)
This is the “
We can now write a single linear equation for the linear model:
\[y_i = \mu + \alpha_1 x_{1,i} + \alpha_2 x_{2,i} + \epsilon_i\]
For Fertiliser 1 \((x_1 = 1, x_2=0)\), we get
\(y_i = \mu + (\alpha_1\times 1) + (\alpha_2\times 0) + \epsilon_i =\)
\(y_i =\thinspace\)
For Fertiliser 2 \((x_1 = 0, x_2=1)\), we get
\(y_i = \mu + (\alpha_1\times 0) + (\alpha_2\times 1) + \epsilon_i =\)
\(y_i =\thinspace\)
For Fertiliser 3 \((x_1 = -1, x_2 = -1)\), we get
\(y_i = \mu + (\alpha_1\times -1) + (\alpha_2\times -1) + \epsilon_i =\)
\(y_i =\thinspace\)
\[ \begin{array}{lrrr} &x_1&x_2&x_3\\ \mathrm{Group\ A:}&1&0&0 \\ \mathrm{Group\ B:}&0&1&0 \\ \mathrm{Group\ C:}&0&0&1 \\ \mathrm{Group\ D:}&-1&-1&-1 \end{array} \]
\[y_i = \mu + \alpha_1 x_{1,i} + \alpha_2 x_{2,i} + \alpha_3 x_{3,i} + \epsilon_i\]
For God hath not given us the spirit of fear; but of power, and of love, and of a sound mind.
— 2 Timothy 1:7
We have \(n\) equations that all look the same!
Time for more elegant notation.
Let’s clean up this redundancy and…
\[\mathbf y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix},\ \mathbf X = \begin{bmatrix} 1 &x_{1,1} & x_{2,1}\\ 1 & x_{1,2} & x_{2,2} \\ \vdots & \vdots & \vdots \\ 1 & x_{1,n} & x_{2,n} \end{bmatrix},\ \boldsymbol\beta = \begin{bmatrix} \mu \\ \alpha_1 \\ \alpha_2 \end{bmatrix},\ \boldsymbol\epsilon = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix} \]
We can now state the whole model thus
\[ \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} 1 &x_{1,1} & x_{2,1}\\ 1 & x_{1,2} & x_{2,2} \\ \vdots & \vdots & \vdots \\ 1 & x_{1,n} & x_{2,n} \end{bmatrix} \begin{bmatrix} \mu \\ \alpha_1 \\ \alpha_2 \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix} \]
Coefficients are often called ‘betas’: the “intercept” (Grafen & Hails’ \(\mu\)) becomes \(\beta_0\) and the remaining coefficients \(\beta_1, \beta_2, \ldots\)
This works for any number of \(m\) coefficients — for multiple dummy-coded factors and continuous variables:
\[ \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} 1 & x_{1,1} & \ldots & x_{m,1}\\ 1 & x_{1,2} & \ldots & x_{m,2} \\ \vdots & \vdots & \ddots & \vdots\\ 1 & x_{1,n} & \ldots & x_{m,n} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_m \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix} \]
The coding scheme for categorical \(X\) used in Grafen & Hails is known as
\(\mathbf C=\left[\begin{array}{rr} 1&0\\0&1 \\ -1&-1 \end{array}\right]\)
Note how the columns of the contrast matrix sum to zero.
You set sum contrasts like this in R:
group <- factor(c("A", "B", "C"))
contrasts(group) <- contr.sum # set contrasts
contrasts(group) # contrast matrix
## [,1] [,2] ## A 1 0 ## B 0 1 ## C -1 -1
The default coding in R is different! R uses
\(\mathbf C=\left[\begin{array}{rr} 0&0 \\1&0 \\0&1 \end{array}\right]\)
Note how the columns do not sum to zero.
This is what you get unless you tell R otherwise!
group <- factor(c("A", "B", "C")) # no explicit contrasts set
contrasts(group) # contrast matrix is for 'treatment contrasts'!
## B C ## A 0 0 ## B 1 0 ## C 0 1
The model equation is still
but the interpretation of the coefficients \(\boldsymbol\beta\) is very different:
The ‘meaning’ of the \(P\)-values that come with the coefficient estimates also changes when the coding scheme changes:
But you don’t get a test comparing B and C… for this, you’d need post-hoc.
Three groups A, B, C, with means of 0, 1, 2:
y <- c(rnorm(n=100, mean=0), rnorm(n=100, mean=1), rnorm(n=100, mean=2))
x <- factor(rep(c("A", "B", "C"), each=100))
lm(y~x) %>% summary %>% coefficients %>% round(3)
## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -0.157 0.100 -1.569 0.118 ## xB 1.198 0.141 8.477 0.000 ## xC 2.311 0.141 16.354 0.000
contrasts(x) <- contr.sum lm(y~x) %>% summary %>% coefficients %>% round(3)
## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 1.013 0.058 17.557 0.00 ## x1 -1.170 0.082 -14.336 0.00 ## x2 0.028 0.082 0.346 0.73