What is a fixed effect?

Author

Ralph Porneso (University of Oslo)

If you work in behavioral or statistical genetics, where the term is often used, you might find yourself confused when talking to other academics. Researchers in this field have diverse academic backgrounds – from biology and psychology to economics and the social sciences. While this offers an opportunity to share best practices and learn from each other, it may also pose a challenge. This is especially true because different fields may name or write the same statistical concepts and models differently. For example, in a simple linear regression model,

\[Y = \beta X + \epsilon\] \[\epsilon \sim N(0,\sigma^2)\]

\(\beta\) is called the fixed effect, or the effect of \(X\) on \(Y\), and \(\epsilon\) is the residual or error term.¹ Economists, however, may simply call \(\beta\) the coefficient and \(\epsilon\) the idiosyncratic error. The assumption that the residuals are independent may be referred to as exogeneity (and its opposite, dependence or nesting, as endogeneity). The convention for writing a simple regression model may also differ. For instance, the above equation may be written as²

\[Y = \beta X + \mu\] \[\mu \sim N(0,\sigma^2).\]

These minor differences in language and writing style may be innocuous and something that is easily overcome, but confusion may arise in situations that involve more complex data structures. The same statistical tools that academics are familiar with may be used in a more nuanced manner by economists. As such, the equations and the terminologies they employ might confuse anyone outside of economics.

For example, in a scenario where multiple measures are taken from the same entity, \(i\), a researcher might be interested in observing the effect of \(i\) on \(Y\). Alternatively, a researcher may be interested in the effect of \(X\) while controlling for the effect of \(i\) on \(Y\). Both of these scenarios are familiar to most academics and are readily addressed by statistical tools known to them. Economists, however, may be interested in observing the effect of \(X\) while controlling for the effect of \(i\) on \(Y\) without having to observe \(i\). Here, economists may use fixed effect estimators.

Note: In the next section, the writing style will follow how economists typically write their models. See footnote 2.

Fixed effect estimators

In economics, one way to control for unobserved heterogeneity in a dataset with a suspected structure, like in the example given above, is to cancel out the potential “unintended” source of variation in \(Y\). In this particular case, the data generation process is known, and the source is suspected to be \(i\), which may or may not be observable. This may be modeled as

\[Y_{ij} = \beta X_{ij} + \alpha_i + \mu_{ij}\]

where \(X\) is an explanatory variable that changes over \(i\) and \(j\), \(\alpha_i\) is the unobserved heterogeneity unique to \(i\), and \(\mu_{ij}\) is the idiosyncratic error term. To cancel out \(\alpha_i\), economists may choose to employ a procedure called demeaning:

\[Y_{ij} - \bar{Y}_i = \beta (X_{ij} - \bar{X}_i) + (\alpha_i - \alpha_i) + (\mu_{ij} - \bar{\mu}_i)\]

where

\[\bar{Y}_i = \frac{1}{J} \sum_{j=1}^{J} Y_{ij}, \qquad \bar{X}_i = \frac{1}{J} \sum_{j=1}^{J} X_{ij}, \qquad \bar{\mu}_i = \frac{1}{J} \sum_{j=1}^{J} \mu_{ij}.\]

Simplifying gives us:

\[Y_{ij} - \bar{Y}_i = \beta (X_{ij} - \bar{X}_i) + (\mu_{ij} - \bar{\mu}_i)\]

where the estimate of \(\beta\) from this regression, \(\hat{\beta}_{FE}\), is the fixed effect estimate of \(X\) on \(Y\): since the source of unobserved heterogeneity, \(\alpha_i\), is removed, pooled ordinary least squares (POLS) on the demeaned data yields a consistent estimate of \(\beta\).
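The demeaning steps above can be sketched numerically. This is a minimal simulation, not a definitive implementation: the sample sizes, the variances, the true \(\beta = 2\), and the helper names `group_means` and `slope` are all assumptions made for illustration. \(X\) is deliberately built to correlate with \(\alpha_i\), so that ignoring \(\alpha_i\) biases a naive pooled regression.

```python
import numpy as np

rng = np.random.default_rng(0)
n_entities, n_obs, beta = 200, 5, 2.0      # hypothetical sizes and true effect

ids = np.repeat(np.arange(n_entities), n_obs)   # entity index i for every row
alpha = rng.normal(0.0, 2.0, n_entities)        # unobserved heterogeneity alpha_i
# X is correlated with alpha_i (assumption made to create the bias)
x = alpha[ids] + rng.normal(0.0, 1.0, ids.size)
y = beta * x + alpha[ids] + rng.normal(0.0, 1.0, ids.size)

def group_means(v, ids):
    """Mean of v within each entity, broadcast back to every row."""
    return (np.bincount(ids, weights=v) / np.bincount(ids))[ids]

def slope(x, y):
    """OLS slope of y on x through the origin (inputs already centered)."""
    return float(x @ y / (x @ x))

# Naive pooled OLS on grand-mean-centered data: alpha_i stays in the error term
beta_pooled = slope(x - x.mean(), y - y.mean())

# Within (demeaned) estimator: alpha_i cancels when entity means are subtracted
beta_fe = slope(x - group_means(x, ids), y - group_means(y, ids))
```

Under these assumptions, `beta_fe` should land close to the simulated \(\beta\), while `beta_pooled` should be pulled away from it by the correlation between \(X\) and \(\alpha_i\).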

Another approach that economists may employ is first differences (FD). Using the same example above, it is formulated as:

\[Y_{ij} = \beta X_{ij} + \alpha_i + \mu_{ij}\] \[Y_{ij2} - Y_{ij1} = \beta (X_{ij2} - X_{ij1}) + (\alpha_i - \alpha_i) + (\mu_{ij2} - \mu_{ij1})\] \[\Delta Y_{ij} = \beta \Delta X_{ij} + \Delta\mu_{ij}.\]

Here, the estimate of \(\beta\) from this regression, \(\hat{\beta}_{FD}\), is the fixed effect estimate of \(X\) on \(Y\): the source of unobserved heterogeneity, \(\alpha_i\), is similarly removed, and pooled ordinary least squares (POLS) on the differenced data yields a consistent estimate of \(\beta\), just like above.
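First differencing can be sketched in the same way. The example below is a hedged illustration with two measurements per entity; the sample size, variances, and the true \(\beta = 2\) are assumptions. It also computes the demeaned estimate on the same data for comparison.

```python
import numpy as np

rng = np.random.default_rng(1)
n_entities, beta = 500, 2.0                 # hypothetical size and true effect

alpha = rng.normal(0.0, 2.0, n_entities)    # unobserved alpha_i
# Two measurements per entity; columns hold the first and second measurement
x = alpha[:, None] + rng.normal(0.0, 1.0, (n_entities, 2))
y = beta * x + alpha[:, None] + rng.normal(0.0, 1.0, (n_entities, 2))

# First differences: alpha_i drops out of both sides
dx = x[:, 1] - x[:, 0]
dy = y[:, 1] - y[:, 0]
beta_fd = float(dx @ dy / (dx @ dx))        # OLS through the origin

# Demeaning the same data, for comparison
x_w = x - x.mean(axis=1, keepdims=True)
y_w = y - y.mean(axis=1, keepdims=True)
beta_fe = float((x_w * y_w).sum() / (x_w * x_w).sum())
```

With exactly two measurements per entity, the demeaned values are plus or minus half the difference, so `beta_fd` and `beta_fe` coincide. The two estimators can differ once there are more than two measurements.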

There are other approaches in the economist's toolkit that aim to provide consistent fixed effect estimates in a dataset where structure is present. One example is least squares dummy variable, or LSDV, where the levels of \(\alpha_i\) are dummy coded. This allows \(Y\) to have a different intercept at each level of \(\alpha_i\). These fixed effect estimators possess different properties (e.g., \(1 - \beta_{FE}\) > \(1 - \beta_{FD}\), \(\hat{\beta}_{FE} \sim \hat{\beta}_{LSDV}\), etc.) but they all follow the same set of assumptions which, for brevity, may be succinctly put as exogeneity, i.e.,

\[Cov(\mu_{ij}, X_{ij}) = 0\] \[Cov(\mu_{ij1}, \mu_{ij2}) = 0.\]

To the extent that the above models follow these assumptions, ALL the equations noted are linear (not mixed effects) models. They allow economists to estimate a consistent \(\beta\) across different levels of \(\alpha_i\).
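The relationship between LSDV and demeaning mentioned above can be checked with a small sketch. The data are simulated under assumed sizes and variances, and the design matrix layout (the \(X\) column followed by one dummy per entity, no separate intercept) is one possible way to set up LSDV, not the only one.

```python
import numpy as np

rng = np.random.default_rng(2)
n_entities, n_obs, beta = 30, 4, 2.0        # hypothetical sizes and true effect

ids = np.repeat(np.arange(n_entities), n_obs)
alpha = rng.normal(0.0, 2.0, n_entities)
x = alpha[ids] + rng.normal(0.0, 1.0, ids.size)
y = beta * x + alpha[ids] + rng.normal(0.0, 1.0, ids.size)

# LSDV: regress Y on X plus one dummy column per entity
dummies = (ids[:, None] == np.arange(n_entities)[None, :]).astype(float)
design = np.column_stack([x, dummies])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
beta_lsdv = float(coef[0])                  # coefficient on X

# Demeaning: subtract entity means, then regress through the origin
counts = np.bincount(ids)
x_w = x - (np.bincount(ids, weights=x) / counts)[ids]
y_w = y - (np.bincount(ids, weights=y) / counts)[ids]
beta_fe = float(x_w @ y_w / (x_w @ x_w))
```

The two slope estimates agree up to floating point error, which is why LSDV and demeaning are often treated as interchangeable for \(\beta\). LSDV additionally returns one estimated intercept per entity in `coef[1:]`.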

Such sophisticated approaches that allow one to estimate a consistent \(\beta\) may drive confusion among academics in other fields. One reason is \(\alpha_i\) looks like a random term. It only becomes clear that these equations are all linear models when the assumptions are spelled out explicitly. In addition, since \(\alpha_i\) looks like a random term, it is odd for other disciplines that it simply “disappears” by applying simple mathematical operations on the dataset. It may also seem odd that \(\alpha_i\) is not estimated because random effects are typically of interest to biologists and psychologists.

Note: In all succeeding sections, the writing style will follow how biologists and psychologists write their models.

Random effect

Given the same data structure as above, biologists and psychologists will typically add a new coefficient with its associated variance term to the model:

\[Y_{ij} = \beta X_{ij} + \gamma_j + \epsilon_{ij}\] \[\gamma_j \sim N (0, \sigma_{\gamma}^2)\] \[\epsilon_{ij} \sim N (0, \sigma^2)\]

where \(\gamma_j\) is the effect of grouping variable \(j\), with \(\sigma_{\gamma}^2\) the variance in outcome \(Y\) attributable to it.

Biologists and psychologists refer to \(\sigma_{\gamma}^2\) as a random effect, where \(\gamma_j\) is the random intercept or the structured error term. It is called random because the estimates are assumed to have been drawn from a distribution and, as such, shrinkage is applied to the error terms:

\[r_{ij} = Y_{ij} - \hat{Y}_{ij}\] \[\begin{align} \bar{r}_j = \frac{1}{n_j} \sum_{i=1}^{n_j}r_{ij} && ; && k = \frac{\hat{\sigma}_\gamma^2}{\hat{\sigma}_\gamma^2 + \hat{\sigma}^2 / n_j} \end{align}\] \[\hat{\gamma}_j = \bar{r}_j \times k\] \[\hat{\epsilon}_{ij} = r_{ij} - \hat{\gamma}_j\]

where \(\hat{Y}_{ij}\) is the prediction from the fixed part of the model, \(\bar{r}_j\) is the mean residual for group \(j\), \(n_j\) is the number of observations in group \(j\), and \(k\) is the shrinkage factor. The assumptions in this model are:

\[\begin{align} && Cov(\gamma_{j1}, \gamma_{j2}) = 0 && Cov(\gamma_{j1}, \epsilon_{i1j1}) = 0 && Cov(\epsilon_{i1j1}, \epsilon_{i2j1}) = 0 \end{align}\] \[\begin{align} && && && && && && Cov(\gamma_{j1}, \epsilon_{i1j2}) = 0 && Cov(\epsilon_{i1j1}, \epsilon_{i2j2}) = 0 \end{align}\] \[\begin{align} && Cov(\gamma_{j1}, X_{ij}) = 0 && && && && && && Cov(\epsilon_{ij}, X_{ij}) = 0 \end{align}\]
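The shrinkage step described above can be sketched as a small function. This is an illustration under stated assumptions: the variance components are taken as already estimated, \(n_j\) is read as the number of observations in group \(j\), group labels are integers starting at 0, and the function name and toy numbers are hypothetical.

```python
import numpy as np

def shrunken_intercepts(resid, groups, var_gamma, var_eps):
    """Return (gamma_hat per group, eps_hat per row).

    resid     : residuals r_ij from the fixed part of the model
    groups    : integer group label (0, 1, 2, ...) for each residual
    var_gamma : estimated variance of the random intercept
    var_eps   : estimated residual variance
    """
    n_j = np.bincount(groups)                         # observations per group
    r_bar = np.bincount(groups, weights=resid) / n_j  # mean residual per group
    k = var_gamma / (var_gamma + var_eps / n_j)       # shrinkage factor per group
    gamma_hat = r_bar * k                             # shrunken random intercepts
    eps_hat = resid - gamma_hat[groups]               # remaining residuals
    return gamma_hat, eps_hat

# Toy residuals: group 0 has 2 observations, group 1 has 4
resid = np.array([1.0, 3.0, -2.0, -2.0, -2.0, -2.0])
groups = np.array([0, 0, 1, 1, 1, 1])
gamma_hat, eps_hat = shrunken_intercepts(resid, groups, var_gamma=1.0, var_eps=2.0)
```

In this toy case the smaller group is shrunk harder (\(k = 1/2\)) than the larger group (\(k = 2/3\)), because its mean residual is based on fewer observations.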

Note that the above assumptions are typically not tested. Economists are stricter about when to use fixed effect estimation versus random effects. The Hausman test is a tool that economists routinely use to choose between a fixed and a random effect. An adjustment of the random effect, similar to the shrinkage shown above, is employed using

\[\lambda = 1 - \frac{\sigma^2}{\sigma^2 + J \sigma_{\gamma}^2}.\]
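Assuming \(J\) here denotes the number of observations per group (an assumption, since \(J\) is not redefined in this section), the adjustment as written can be rearranged to make the similarity with the shrinkage factor \(k\) explicit:

\[\lambda = 1 - \frac{\sigma^2}{\sigma^2 + J \sigma_{\gamma}^2} = \frac{J \sigma_{\gamma}^2}{\sigma^2 + J \sigma_{\gamma}^2} = \frac{\sigma_{\gamma}^2}{\sigma_{\gamma}^2 + \sigma^2 / J}\]

This has the same form as \(k\), with \(J\) in the position of \(n_j\). It equals 0 when \(\sigma_{\gamma}^2 = 0\) and approaches 1 as \(J \sigma_{\gamma}^2\) grows relative to \(\sigma^2\).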

Since the above model contains both a fixed and a random effect, it is called a mixed effect model.

To expand on this mixed effect model further, a researcher may also want to observe how the effect of the grouping variable \(j\) on outcome \(Y\) changes over the value of \(X\). In this scenario, another coefficient with its associated variance term needs to be added. Since there are 2 random terms, a variance-covariance matrix is also modeled.

\[Y_{ij} = \beta X_{ij} + \gamma_{0j} + \gamma_{1j} X_{ij} + \epsilon_{ij}\] \[\begin{align} \begin{bmatrix} \gamma_{0j} \\ \gamma_{1j} \\ \end{bmatrix} \sim N (0, \Omega_\gamma) && ; && \Omega_\gamma = \begin{bmatrix} \sigma^2_{\gamma0} & \sigma_{\gamma01} \\ \sigma_{\gamma01} & \sigma^2_{\gamma1} \\ \end{bmatrix} \end{align}\] \[\epsilon_{ij} \sim N (0, \sigma^2)\]

where \(\gamma_{1j}\) is the random slope and \(\Omega_\gamma\) is the variance-covariance matrix of the random terms. This model is referred to as a mixed effects model with a random slope and is typically found in growth models. The focus of such models is to estimate the effects of the random terms.
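One way to see why the covariance \(\sigma_{\gamma01}\) needs to be modeled is to write out the variance of the outcome implied by the random slope model. Assuming \(\epsilon_{ij}\) is independent of the random terms and treating \(X_{ij}\) as given:

\[Var(Y_{ij} \mid X_{ij}) = \sigma^2_{\gamma0} + 2 X_{ij} \sigma_{\gamma01} + X_{ij}^2 \sigma^2_{\gamma1} + \sigma^2\]

The variance attributable to the grouping variable therefore changes with the value of \(X\), and the term involving \(\sigma_{\gamma01}\) only drops out if the random intercept and random slope are uncorrelated.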

Multilevel models

Whereas economists have a special focus on getting consistent \(\beta\) estimates, biologists and psychologists have a special interest in variation in the outcome that is due to the variation in a grouping variable (i.e., random effects). Hence, psychologists employ different types of mixed models to observe these effects. One such model is a multilevel model:

\[Y_{ijk} = \beta X_{ijk} + \delta_k + \gamma_{jk} + \epsilon_{ijk}\] \[L3: \delta_k \sim N (0, \sigma_{\delta}^2)\] \[L2: \gamma_{jk} \sim N (0, \sigma_{\gamma}^2)\] \[L1: \epsilon_{ijk} \sim N (0, \sigma^2)\]

where \(\gamma_{jk}\) is a grouping variable with its variance term \(\sigma_{\gamma}^2\) and \(\delta_k\) is an additional grouping variable with its variance term, \(\sigma_{\delta}^2\). Both \(\sigma_{\gamma}^2\) and \(\sigma_{\delta}^2\) are random effects.

Biologists and psychologists call the above model a multilevel model because it has more than 2 levels. It is also called a hierarchical model to indicate the type of structure in the data. In this model, \(\gamma_{jk}\) is fully nested in \(\delta_k\) such that \(\sigma_{\delta}^2\) reflects the variance in \(Y\) above and beyond \(\sigma_{\gamma}^2\). \(\sigma_\gamma^2\), on the other hand, represents the between group variation in \(\gamma_{jk}\) within \(\delta_k\). The total between group variation due to \(\gamma_{jk}\) is therefore \(\sigma_{\gamma}^2 + \sigma_{\delta}^2\).
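The variance decomposition described above can be made explicit by writing out the covariances that the nested model implies, assuming all random terms are mutually independent and conditioning on \(X\):

\[Var(Y_{ijk}) = \sigma_{\delta}^2 + \sigma_{\gamma}^2 + \sigma^2\] \[Cov(Y_{ijk}, Y_{i'jk}) = \sigma_{\delta}^2 + \sigma_{\gamma}^2 \quad (i \neq i')\] \[Cov(Y_{ijk}, Y_{i'j'k}) = \sigma_{\delta}^2 \quad (j \neq j')\] \[Cov(Y_{ijk}, Y_{i'j'k'}) = 0 \quad (k \neq k')\]

Two observations in the same lower-level group share both \(\delta_k\) and \(\gamma_{jk}\), while two observations that only share the higher-level group share \(\delta_k\) alone.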

A slightly different specification is required for data structures that are not fully nested. Psychologists refer to this as a crossed multilevel model written as

\[Y_{ijk} = \beta X_{ijk} + \delta_k + \gamma_j + \epsilon_{ijk}.\]

Given that the error terms are uncorrelated, then

\[L2: \delta_k \sim N (0, \sigma_{\delta}^2)\] \[L2: \gamma_j \sim N (0, \sigma_{\gamma}^2)\] \[L1: \epsilon_{ijk} \sim N (0, \sigma^2).\]

Here, even though there are 2 grouping variables (similar to the hierarchical model), there are only 2 levels instead of 3.
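Whether two grouping variables are nested or crossed is a property of the data, and it can be checked before choosing a specification. The sketch below is a minimal illustration; the function name `is_nested` and the labels (classes, schools, subjects, items) are hypothetical. It assumes lower-level labels are unique across upper-level groups, i.e., two different classes in different schools do not share the same label.

```python
from collections import defaultdict

def is_nested(lower, upper):
    """True if every lower-level label occurs under exactly one upper-level label."""
    seen = defaultdict(set)
    for low, up in zip(lower, upper):
        seen[low].add(up)
    return all(len(ups) == 1 for ups in seen.values())

# Hypothetical nested data: each class belongs to exactly one school
classes = ["c1", "c1", "c2", "c2", "c3", "c3"]
schools = ["s1", "s1", "s1", "s1", "s2", "s2"]
nested_case = is_nested(classes, schools)            # True

# Hypothetical crossed data: every subject responds to every item
subjects = ["p1", "p1", "p2", "p2", "p3", "p3"]
items = ["a", "b", "a", "b", "a", "b"]
crossed_case = (not is_nested(subjects, items)
                and not is_nested(items, subjects))  # True
```

If the labels are reused across upper-level groups (for example, every school numbering its classes 1, 2, 3), this check reports the data as not nested even when they are, which is one of the ways the two specifications get confused in practice.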

For a thorough illustration of the impact of mis-specifying a crossed model as a nested multilevel model, see https://stats.stackexchange.com/questions/228800/crossed-vs-nested-random-effects-how-do-they-differ-and-how-are-they-specified.

Summary: What is driving the confusion?

Fixed effect estimators, like FE and FD, are called linear models despite having what looks to biologists and psychologists like a random term in the equation, i.e. \(\alpha_i\).
The \(\alpha_i\) coefficient appears in the FE and FD equations but is not in the estimated model explicitly. It is instead assumed to exist and is then removed by the operations performed on \(Y\) and \(X\).
Some economists refer to \(\alpha_i\) as fixed effect. It may be a convention that is widely accepted in economics because it is used to get \(\hat{\beta}_{FE}\) or \(\hat{\beta}_{FD}\) – the actual fixed effect that demeaning and differencing make possible to estimate without modeling a random term. Such convention may have resulted from shortening the name of fixed effect estimators to simply fixed effect. While perhaps practical, to conclude that \(\alpha_i\) is the fixed effect in the equation goes against the reason for having fixed effect estimators to begin with. This convention drives confusion.
The assumptions in all the fixed effect estimators are often not stated explicitly but assumed to be known.
Economists and biologists/psychologists differ in interest or focus: consistent \(\hat{\beta}\) for the former, random terms for the latter. Note that both groups want reliable fixed effect and random term estimates.
What biologists and psychologists consider a random effect may not be considered a random effect by economists because of their more rigorous requirements.
To a lesser extent: slight difference in terminologies and style in writing models among disciplines.

Autocorrelation and the pedigree model

In certain scenarios, the terms and the use of statistical tools are similar among researchers from different fields. One such example is when the error term contains a structure, such as autocorrelation (serial correlation). We use a simple linear model for illustration:

\[Y = \beta X + \epsilon\] \[\epsilon \sim N (0, \sigma^2 R)\]

where

\[R = \begin{bmatrix} 1 & \rho & \rho^2 \\ \rho & 1 & \rho \\ \rho^2 & \rho & 1 \\ \end{bmatrix}\]

and \(\rho\) is the correlation between adjacent observations. It is called serial because the correlation between two observations decreases (\(\rho\), \(\rho^2\), ...) as the entries move away from the diagonal of 1s, i.e., as the observations lie further apart in the sequence.
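The correlation matrix shown above follows a simple rule: the entry in row \(s\) and column \(t\) is \(\rho^{|s-t|}\). A short sketch that builds it for any size; the function name `ar1_correlation` and the value \(\rho = 0.5\) are assumptions for illustration.

```python
import numpy as np

def ar1_correlation(n, rho):
    """n x n matrix whose (s, t) entry is rho ** |s - t|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

corr = ar1_correlation(3, 0.5)
# rows: [1, 0.5, 0.25], [0.5, 1, 0.5], [0.25, 0.5, 1]
```

Each step away from the diagonal multiplies the correlation by another factor of \(\rho\), which is the decay pattern the 3x3 matrix displays.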

In statistical and behavioral genetics, a person’s kinship (or genetic relatedness) is modeled in a similar fashion. This allows behavioral and statistical geneticists to estimate the variation in trait \(Y\) that is due to a person’s relationship with other individuals in the sample. The expectation is that individuals who are related will have a larger \(\rho\) than those who are not and are therefore expected to be more similar phenotypically. A pedigree model is specified as

\[Y_i = \beta X_i + \eta_i + \epsilon_i\] \[\eta_i \sim N (0, \sigma_F^2 \Phi)\] \[\epsilon_i \sim N (0, \sigma^2)\]

where \(\Phi\) is twice the kinship matrix calculated from pedigree information. In this model, the correlation structure is introduced in \(\eta_i\).
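To make \(\Phi\) concrete, the sketch below writes out the kinship matrix for a hypothetical nuclear family of two unrelated, non-inbred parents and their two children, using the standard kinship coefficients for these relationships (0.5 with oneself, 0.25 for parent-offspring and full siblings, 0 for unrelated individuals). The family, the ordering of individuals, and the value chosen for \(\sigma_F^2\) are assumptions for illustration.

```python
import numpy as np

# Order of individuals: father, mother, child 1, child 2 (hypothetical family)
kinship = np.array([
    [0.50, 0.00, 0.25, 0.25],
    [0.00, 0.50, 0.25, 0.25],
    [0.25, 0.25, 0.50, 0.25],
    [0.25, 0.25, 0.25, 0.50],
])

phi = 2.0 * kinship          # Phi: twice the kinship matrix
var_f = 0.6                  # hypothetical value for sigma_F^2
cov_eta = var_f * phi        # implied covariance matrix of eta
```

Here `phi` has 1s on the diagonal, 0.5 between first-degree relatives, and 0 between the unrelated parents, so it plays the same role for related individuals that the serial correlation matrix plays for observations ordered in a sequence.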

Footnotes

¹ The error term is typically assumed to come from a normal distribution, but this is not a strict assumption in a simple linear regression for both biologists/psychologists and economists.
² The use of \(\mu\) to represent the error term is from English economist, Ben Lambert: https://www.youtube.com/watch?v=EbdBHJYbOrg.