Overview

By the end of these workshops and your own studies, you should be able to

  • Formally describe a normal linear model
  • Implement normal models (continuous, categorical predictors)
  • Conceptually understand maximum likelihood estimation (how lm determines model coefficients)
  • Interpret model coefficients and quantities related to null hypothesis testing (confidence interval, t-value, p-value)
  • Predict hypothetical outcomes given a statistical model

We will get there using a combination of theoretical descriptions, practical exercises, implementations, and simulations.

Before we start \(\dots\)

Go to the NOW learning room of psyc30815 and download exercises.zip from the week 2 unit.

Open the folder and copy the R-scripts into the “exercises” folder inside the R-projects folder from last week.

If you don’t have an “exercises” folder, create one, or move the “exercises” folder from the zip archive into your R project.

What are normal linear models?

  • Often introduced as fitting lines to points.
  • A limited perspective that makes more complex regression models, like generalised linear models, hard to understand.
  • Extends to generalised and multilevel linear models; special cases include multiple / simple linear regression, AN(C)OVAs, and MAN(C)OVAs.
  • The backbone of statistical modelling: machine learning, time series models, path analysis, structural equation models, factor analysis, signal detection models

What are normal linear models?

  • A model of how the probability distribution of one variable, known as the outcome variable, varies as a function of other variables, known as the explanatory or predictor variables.
  • The most basic type of regression model is the normal linear model.
  • In normal models, we assume that the outcome variable is normally distributed and that its mean varies linearly with changes in a set of predictor variables.
  • By understanding the normal linear model thoroughly, we can see how it can be extended to deal with data and problems beyond those that it is designed for.

Normal linear model (formal description)

\[ \begin{aligned} y_i &\sim \mathcal{N}(\mu, \sigma^2) \end{aligned} \]

  • \(y\): observed outcome variable
  • Each observation \(y_i\) is a sample (indexed by \(i\))
  • Lower case letters indicate observed and known variables.
  • “\(\sim\)”: “is distributed as”; the left-hand side is a sample from the distribution on the right.
  • \(\mathcal{N}\): univariate normal probability model, i.e. the underlying process that we assume generates our data.
  • Population parameters are represented as Greek letters; i.e. mean \(\mu\) and variance \(\sigma^2\).

Simulation of normally distributed data

  • rnorm generates unimodal, symmetrically distributed values.
  • r in rnorm = random; norm = normal

# Generate data (try n = 10, 100, and 1000, and compare the results)
y <- rnorm(n = 10, mean = 550, sd = 10)

Simulation: exercise

Work through the script

  • normal_model_simulation_1.R

You will need to complete the code in the script, i.e. replace “---” with the correct information.

Then observe how changing the number of observations N affects the model estimates.

Normal linear model (formal description)

\[ \begin{aligned} y_i &\sim \mathcal{N}(\mu_i, \sigma^2)\\ \mu_i &= \beta_0 + \beta_1 \cdot x_{1i} \\ \end{aligned} \]

Value of \(\mu_i\) can be decomposed using a linear regression equation.

  • \(\mu_i\) is a deterministic (“\(=\)”) function of one or more predictors \(x\) and the unknowns \(\beta_0\) (intercept) and \(\beta_1\) (slope).
  • \(\beta_0\) is a constant: the value of \(\mu_i\) when all predictors are 0.
  • The contribution of \(\beta_1\) depends on the explanatory variable \(x_{1i}\).
  • If \(x_{1i}\) is 0, then \(\mu_i = \beta_0\).
  • If \(x_{1i}\) is 1, then \(\mu_i = \beta_0 + \beta_1\).
  • So \(\beta_1\) is the difference in \(\mu_i\) between \(x_{1i}=1\) and \(x_{1i}=0\).

Simulation of normally distributed data

We can use lm to implement such a model.

model <- lm(y ~ x, data = data)

  • How do we know that lm returns accurate estimates for our model parameters?
  • We can simulate data that exactly meet the model assumptions (normal distribution, equal variance, independence, etc.) with known model parameters.

Simulation of normally distributed data

We can decompose the mean into an intercept term and a slope term:
# Decompose the mean
beta_0 = 500
beta_1 = 50
# Generate data
y <- rnorm(n = 1000, mean = beta_0 + beta_1, sd = 10)

Simulation of normally distributed data

library(tidyverse) # for tibble, pivot_longer, glimpse

# Set parameter values
n <- 1000
beta_0 <- 500 
beta_1 <- 50
sigma <- 100
# Random data for group 1
group_1 <- rnorm(n = n/2, mean = beta_0, sd = sigma)
# Random data for group 2
group_2 <- rnorm(n = n/2, mean = beta_0 + beta_1, sd = sigma)
# Generate data
sim_data <- tibble(group_1 = group_1,
                   group_2 = group_2) 

# Preview data
glimpse(sim_data)
Rows: 500
Columns: 2
$ group_1 <dbl> 467, 477, 459, 590, 392, 640, 559, 558, 528, 557, 591, 514, 48…
$ group_2 <dbl> 416, 642, 547, 668, 610, 566, 388, 613, 511, 542, 498, 584, 64…
# Change data format
sim_data <- pivot_longer(sim_data, cols = c(group_1, group_2), names_to = "x", values_to = "y")

# Preview data
glimpse(sim_data)
Rows: 1,000
Columns: 2
$ x <chr> "group_1", "group_2", "group_1", "group_2", "group_1", "group_2", "g…
$ y <dbl> 467, 416, 477, 642, 459, 547, 590, 668, 392, 610, 640, 566, 559, 388…

Simulation of normally distributed data

# As a reminder, these are the parameter values
beta_0 <- 500 
beta_1 <- 50
sigma <- 100
# Normal linear model of simulated data
model <- lm(y ~ x, data = sim_data)

coef(model) # Model coefficients
(Intercept)    xgroup_2 
      503.7        43.5 
sigma(model) # Residual standard deviation
[1] 99.4

Simulation: exercise

Work through the script

  • normal_model_simulation_2.R

You will need to complete the code in the script, i.e. replace “---” with the correct information.

Then observe how changing the number of observations N affects the model estimates.

Normal linear model (example)

A model of how the mean of the distribution of reaction times (rt) varies as a function of age and sex.

\[ \begin{align} \text{rt}_i &\sim \mathcal{N}(\mu_i, \sigma^2)\\ \mu_i &= \beta_0 + \beta_\text{age} \cdot \text{age}_i + \beta_\text{sex} \cdot \text{sex}_i \end{align} \]

For \(i \in 1\dots n\), where \(n\) is the total number of observations.

  • Write down (on paper) a formal model description for a hypothetical normal linear model of blood pressure; assume that blood pressure varies with nutrition, age, and level of physical activity (a sketch follows below).
  • We will then write down the model description in RMarkdown.
  • Adapt the model description to your own dataset.
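
A sketch of what such a formal description could look like (the predictor names are illustrative placeholders, not variables from an actual dataset):

\[ \begin{aligned} \text{bp}_i &\sim \mathcal{N}(\mu_i, \sigma^2)\\ \mu_i &= \beta_0 + \beta_\text{nutr} \cdot \text{nutr}_i + \beta_\text{age} \cdot \text{age}_i + \beta_\text{act} \cdot \text{act}_i \end{aligned} \]

for \(i \in 1\dots n\).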

Real data

  • In real life we don’t know the underlying process that generates the data or the true parameter values.
  • For the simulated data we know these:
    • the process we used was a normal distribution because rnorm samples normally distributed data.
    • we determined the parameters of the mean, \(\beta_0\) and \(\beta_1\).
    • both groups share the same variance \(\sigma^2\).
  • We can use linear models to estimate parameter values from the data (i.e. we make assumptions about the process that generates the data).
  • Maximum likelihood estimation asks: for which set of hypothetical parameter values are the observed data most likely?

Real data

  • Data may appear non-normally distributed in small samples.

  • That’s fine if we have reason to believe the data come from a process that is normally distributed.

  • The data here are by-participant means, meaning the central limit theorem applies.

  • Yet reaction times are bounded at zero, so the process that generates these data cannot be truly normal (a normal distribution has support on all real numbers).

  • (Also, the psychological distance between equal units of time might not be linear.)

Real data

  • Log-normal distributions are bounded below by zero.
  • We assume that the \(\text{rt}\) values are sampled from a log-normal distribution with fixed but unknown parameters \(\mu\) and \(\sigma\) (the mean and standard deviation on the log scale).

\[ \log(\text{rt}_i) \sim \mathcal{N}(\mu_i, \sigma^2) \]
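
To see the zero bound in action, here is a small illustration (the parameter values are arbitrary):

# Log-normal samples: always positive, right-skewed on the raw scale
y <- rlnorm(n = 1000, meanlog = 6.4, sdlog = 0.2)
min(y) > 0 # TRUE: no values at or below zero
hist(log(y)) # approximately normal on the log scale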

Real data: exercise

  • Complete exercise scripts
    • normal_model_blomkvist_1.R
    • normal_model_blomkvist_2.R
  • Make sure you understand how the model coefficients relate to the visualisation.

Real data

Predictor variable sex is denoted as \(\text{sex}_1, \text{sex}_2, \dots \text{sex}_n\), \(\text{ for } i \in 1\dots n\)

\[ \log(\text{rt}_i) \sim \mathcal{N}(\mu_i, \sigma^2)\\ \mu_i = \beta_0 + \beta_\text{sex} \cdot \text{sex}_i \]

where \(\text{sex}_i\) takes on \(0\) for females and \(1\) for males.

\[ \text{sex}_i = \left\{ \begin{array}{ll} 0, \text{ if sex}_i = \texttt{female}\\ 1, \text{ if sex}_i = \texttt{male}\\ \end{array} \right. \]

The variance is assumed to be the same for males and females.

  • \(\beta_0\) is the average \(\log(\text{rt})\) for females because \(\mu_i = \beta_0 + \beta_\text{sex} \cdot 0 = \beta_0\).
  • Because \(\mu_i = \beta_0 + \beta_\text{sex} \cdot 1 = \beta_0 + \beta_\text{sex}\) is the average \(\log(\text{rt})\) for males, \(\beta_\text{sex}\) is the difference in average \(\log(\text{rt})\) between males and females.

Compare simulated and real data

# Set parameter values
beta_0 <- 500 
beta_1 <- 50
sigma <- 100
# Inspect simulated data
glimpse(sim_data, width = 50)
Rows: 1,000
Columns: 2
$ x <chr> "group_1", "group_2", "group_1", "grou…
$ y <dbl> 469, 504, 494, 409, 471, 583, 506, 552…
# Specify model
model <- lm(y ~ x, data = sim_data)
# Model coefficients
coef(model)
(Intercept)    xgroup_2 
      505.6        48.4 
# Load and transform data
blomkvist <- read_csv("../data/blomkvist.csv") %>% 
  mutate(log_rt = log(rt_hand_d)) %>% 
  select(sex, log_rt)
# Inspect data
glimpse(blomkvist, width = 50)
Rows: 267
Columns: 2
$ sex    <chr> "male", "female", "female", "fema…
$ log_rt <dbl> 6.55, 6.15, 6.46, 6.56, 6.41, 6.2…
# Specify model
model <- lm(log_rt ~ sex, data = blomkvist)
# Model coefficients
coef(model)
(Intercept)     sexmale 
     6.4602     -0.0943 

Interpreting coefficients on the log scale

A slope coefficient on the log scale is not additive on the original scale but multiplicative (exponential scale).

# Specify model
model <- lm(log_rt ~ sex, data = blomkvist)

# Model coefficients
(coefs <- as.vector(coef(model)))
[1]  6.4602 -0.0943

I.e. on the original (msec) scale, the rt for males is \(\exp(\beta_1)\) times the rt for females.

# Converting coefficients to msecs
intercept <- coefs[1]
slope <- coefs[2]

The exponential function exp back-transforms (“un-logs”) the coefficients.

exp(intercept) # female group in msecs
[1] 639
exp(intercept + slope) # male group in msecs
[1] 582
exp(intercept + slope) - exp(intercept) # in msecs
[1] -57.5
Exercise: work through the scripts

  • interpreting_coefficients_1.R
  • interpreting_coefficients_2.R
  • interpreting_coefficients_with_emmeans.R

Maximum likelihood estimation

If we assume the model

\[ y_i \sim \mathcal{N}(\mu_i, \sigma^2) \text{, for } i \in 1\dots n\\ \mu_i = \beta_0 + \beta_1 \cdot x_{1i} \]

We estimate the values of three unknowns (the intercept \(\beta_0\), one coefficient per predictor, here \(\beta_1\), and the standard deviation \(\sigma\)) using maximum likelihood estimation. The estimates are denoted by

\[ \hat\beta_0,\hat\beta_1,\hat\sigma. \]

MLEs are the set of values that make the observed data most probable.
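
Concretely, for this model the log-likelihood being maximised is

\[ \log L(\beta_0, \beta_1, \sigma \mid y) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - \mu_i)^2, \]

so, for any fixed \(\sigma\), maximising the likelihood over \(\beta_0, \beta_1\) amounts to minimising the sum of squared residuals (next slide).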

Maximum likelihood estimation

The maximum likelihood values of the coefficients, \(\hat\beta_0,\hat\beta_1\), are those values of \(\beta_0,\beta_1\) that minimise the residual sum of squares

\[ \text{RSS} = \sum_{i=1}^n \mid y_i - \mu_i\mid^2, \]

where \(\mu_i\) is

\[ \mu_i = \beta_0 + \beta_1 \cdot x_{1i} \]

For models with small sets of unknowns, this can be illustrated using grid approximation: see script mle.R (a sketch of the idea follows below).
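
A minimal grid-approximation sketch, assuming the sim_data tibble from the simulation above and a loaded tidyverse (the mle.R script may be structured differently):

# Grid of candidate parameter values around the true values
grid <- expand_grid(beta_0 = seq(480, 520, by = 1),
                    beta_1 = seq(30, 70, by = 1))
# RSS for each candidate pair of values
grid <- mutate(grid,
               rss = map2_dbl(beta_0, beta_1, function(b0, b1) {
                 mu <- b0 + b1 * (sim_data$x == "group_2")
                 sum((sim_data$y - mu)^2)
               }))
# The pair that minimises the RSS approximates the MLEs
slice_min(grid, rss)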

Fitting a normal linear model using lm

For larger sets of parameters we can use maximum likelihood estimation as implemented in lm.

model <- lm(rt ~ sex + age, data = blomkvist)

We’ll continue with the rt data (instead of log rt) for simplicity.

The maximum likelihood estimators of \(\beta_0\), \(\beta_\text{sex}\) and \(\beta_\text{age}\) are as follows:

(coefs <- coef(model))
(Intercept)     sexmale         age 
     338.64      -58.93        5.75 

Fitting a normal linear model using lm

Once we have \(\hat\mu\), the estimate of \(\sigma\) (as returned by sigma()), denoted as \(\hat\sigma\), is:

\[ \hat\sigma = \sqrt{\frac{1}{n-K-1} \cdot \sum_{i=1}^n\mid y_i-\hat\mu_i\mid^2} \]

  • \(\sum_{i=1}^n\mid y_i-\hat\mu_i\mid^2\) is the RSS (\(y_i\): observed data; \(\hat\mu_i\): model prediction).
  • \(K\) is the number of predictors
  • \(n\) is the number of observations
  • Complete script calculating_sigma.R (a quick check follows below).
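
As a quick check (not the calculating_sigma.R script itself, and assuming no missing values in the modelled columns), \(\hat\sigma\) can be computed by hand from the residuals:

# Hand-compute the residual standard deviation
n <- nrow(blomkvist)
K <- 2 # two predictors: sex and age
rss <- sum(residuals(model)^2)
sqrt(rss / (n - K - 1)) # should match sigma(model)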

Before we move on…

  1. Construct a theoretical model for your own data (formal description and a draft of verbal description).
  2. Implement your theoretical model using lm().
  3. Work out the maximum likelihood estimators obtained by lm(), but instead of summary(), use the broom package and run
tidy(my_model, conf.int = TRUE)

Hypothesis testing: t-values

If \(\beta_1\) is the hypothesized true value of the coefficient, and \(\hat\beta_1\) is the estimated coefficient, then

\[ \frac{\hat\beta_1 - \beta_1}{\hat{\text{se}}_1} \sim t_{n-K-1}, \]

which tells us that if the true value of the coefficient was some hypothesized value \(\beta_1\), then we expect the t-statistic to have a certain range of values with high probability.

Hypothesis testing: t-values

We will test a hypothesis about the coefficient for sex in the model above.

# Extract the coefficients table from summary
coefs <- summary(model)$coefficients
# MLE for sex
(beta_sex <- coefs['sexmale', 'Estimate'])
[1] -58.9
# Standard error
(se_sex <- coefs['sexmale', 'Std. Error'])
[1] 18.5

If we hypothesize the true value of the coefficient is exactly 10, then our t-statistic is

(t <- (beta_sex - 10)/se_sex)
[1] -3.73

\[ \frac{\hat\beta_\text{sex} - \beta_\text{sex}}{\hat{\text{se}}_\text{sex}} = \frac{-58.93 - 10}{18.47} = -3.73 \]

Null hypothesis significance testing: t-values

If we hypothesize that the true value of the coefficient is exactly 0.0, then our t-statistic is

\[ \frac{\hat\beta_k - \beta_k}{\hat{\text{se}}_k} = \frac{-58.93 - 0.0}{18.47} = -3.19 \]

(t <- (beta_sex - 0.0)/se_sex)
[1] -3.19

Null hypothesis significance testing: p-values

n <- nrow(blomkvist)
K <- 2

Probability of a value smaller than or equal to absolute t.

pt(abs(t), df = n - K - 1) 
[1] 0.999


Null hypothesis significance testing: p-values

n <- nrow(blomkvist)
K <- 2

Probability of a value larger than or equal to absolute t.

1 - pt(abs(t), df = n - K - 1) 
[1] 0.000797


Null hypothesis significance testing: p-values

n <- nrow(blomkvist)
K <- 2

Probability of a value more extreme or equal to t.

2 * (1 - pt(abs(t), df = n - K - 1))
[1] 0.00159


Null hypothesis significance testing: p-values

If we hypothesize that the true value of the coefficient is exactly 0.0, then the probability of obtaining a value more extreme or equal to the t-value we obtained is:

2 * (1 - pt(abs(t), df = n - K - 1))
[1] 0.00159

Under the null hypothesis, data as extreme as ours have a very low probability, below the conventional threshold (\(\alpha = .05\)), so the null hypothesis can be rejected.


p-values

  • The p-value for the hypothesis is the probability of getting a value as extreme as, or more extreme than, the absolute value of the t-statistic under the t-distribution.
  • This is the area under the curve of the t-distribution that is as extreme as, or more extreme than, the absolute value of the t-statistic.
n <- nrow(blomkvist)
K <- 2
pt(abs(t), df = n - K - 1, lower.tail = FALSE) * 2
[1] 0.00159

p-values

  • Work through script exercise/t_values.R
  • Calculate t and p-value for \(\hat\beta_\text{age}\) testing the null hypothesis that the true value \(\beta_\text{age}\) is 0.0.
  • Verify your calculation using
summary(model)$coefficients
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   338.64     27.359   12.38 4.91e-28
sexmale       -58.93     18.473   -3.19 1.59e-03
age             5.75      0.439   13.09 1.72e-30
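
One possible solution sketch for the age coefficient (the t_values.R script may take a different route):

coefs <- summary(model)$coefficients
beta_age <- coefs['age', 'Estimate']
se_age <- coefs['age', 'Std. Error']
(t_age <- (beta_age - 0) / se_age) # compare with the t value above
2 * pt(abs(t_age), df = n - K - 1, lower.tail = FALSE) # compare with Pr(>|t|)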

t-distribution

(Figures: t-distribution plots.)

Confidence intervals

The set of all hypothesized values that we do not rule out at the \(\alpha = 0.05\) level of significance is known as the 95% confidence interval.

\[ 95\% \text{ CI} = \hat\beta_1 \pm \tau_\nu \cdot \hat{\text{se}}_1 \]

\(\pm\tau_\nu\) are the values of the t-distribution (with \(\nu\) degrees of freedom) that enclose 95% of the area under the curve; \(\tau\approx1.96\) for large degrees of freedom.


Calculate the 95% confidence intervals

We can calculate \(\tau\) using qt:

(tau <- qt(.975, df = n - K - 1))
[1] 1.97

where n is the number of participants and K is the number of predictors.

Calculate the 95% confidence intervals

(coefs <- coef(summary(model)))
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   338.64     27.359   12.38 4.91e-28
sexmale       -58.93     18.473   -3.19 1.59e-03
age             5.75      0.439   13.09 1.72e-30
beta_sex <- coefs[2,1] # Slope coef of "sex"
se_sex <- coefs[2,2] # SE of "sex" coef
# Lower and upper bound of 95% CI
c(beta_sex - se_sex * tau,
  beta_sex + se_sex * tau)
[1] -95.3 -22.6


Calculate the 95% confidence intervals

confint(model, parm = 'sexmale',  level = .95)
        2.5 % 97.5 %
sexmale -95.3  -22.6

Open and complete script confidence_interval.R

Confidence intervals: a simulation

  • Work through script confidence_interval_simulation.R
  • Change the number of simulations, one at a time, to n_sims = 20, 50, 100, 500, 1000.
  • Notice how often the CI does not contain the true parameter value \(\mu\).
  • What percentage of the simulated CIs contain \(\mu\)? (A sketch of such a simulation follows below.)
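
A minimal sketch of what such a simulation could look like (the parameter values are hypothetical; the actual script may differ):

library(tidyverse)
n_sims <- 1000 # number of simulated experiments
mu <- 550; sigma <- 10; n <- 50 # true mean, true sd, sample size
sims <- map_dfr(1:n_sims, function(sim) {
  y <- rnorm(n, mean = mu, sd = sigma)
  ci <- confint(lm(y ~ 1)) # 95% CI for the mean
  tibble(sim = sim,
         mu_in_ci = ifelse(ci[1] <= mu & mu <= ci[2], "yes", "no"))
})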

Confidence intervals: a simulation

count(sims, mu_in_ci) %>% 
  mutate(prop = n / n_sims)
# A tibble: 2 × 3
  mu_in_ci     n  prop
  <chr>    <int> <dbl>
1 no          51 0.051
2 yes        949 0.949
  • As the number of experiments approaches \(\infty\), if we calculate the 95% CI for each experiment, 95% of these CIs contain \(\mu\).
  • Note though, 5% of the time, the confidence interval does not contain \(\mu\)!
  • What does this mean for real (as opposed to simulated) experiments?

Varying intercepts model

When we include sex as an explanatory variable in addition to a continuous predictor variable (e.g. age), the model is as follows.

\[ \text{rt}_i \sim \mathcal{N}(\mu_i, \sigma^2)\\ \mu_i = \beta_0 + \beta_\text{sex} \cdot \text{sex}_{i} + \beta_\text{age} \cdot \text{age}_{i} \]

With sexes denoted by \(\text{sex}_{1},\text{sex}_{2}\dots \text{sex}_{i}\dots \text{sex}_{n}\) and age by \(\text{age}_{1},\text{age}_{2}\dots \text{age}_{i}\dots \text{age}_{n}\) \(\text{ for } i \in 1\dots n\).

Varying intercepts model

To implement this model using lm we would do the following.

model_2 <- lm(rt ~ sex + age, data = blomkvist)
coef(model_2)
(Intercept)     sexmale         age 
     338.64      -58.93        5.75 
  • Note that sex is dichotomous categorical and age is continuous.
  • If sex changes by one unit, when age is held constant, then the average value of the corresponding distribution of rt changes by \(\beta_\text{sex}\).
  • Conversely, if age changes by one unit, when sex is held constant, the average value of the distribution of rt changes by \(\beta_\text{age}\).

Varying intercepts model

Given that \(\text{sex}_{i}\) takes the value 0 when sex is female and 1 when sex is male, this model can be written as follows.

\[ \text{rt}_i \sim \mathcal{N}(\mu_i, \sigma^2) \\ \mu_i = \beta_0 + \beta_\text{sex} \cdot \text{sex}_{i} + \beta_\text{age} \cdot \text{age}_{i}\\ \mu_i = \left\{ \begin{array}{ll} \beta_0 + \beta_\text{age} \cdot \text{age}_{i}, \text{ if sex}_i = 0 \texttt{ [female]}\\ (\beta_0 + \beta_\text{sex}) + \beta_\text{age} \cdot \text{age}_{i}, \text{ if sex}_i = 1\texttt{ [male]} \end{array} \right. \]

This is a varying intercepts model.

Assumption: age slopes do not co-vary by sex!

(Figures: fixed intercept model (simple linear regression) vs. varying intercepts model.)
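
In place of the figures, a sketch of how the varying-intercepts fit could be visualised (assumes blomkvist and model_2 from the slide above, and no missing values in the modelled columns):

library(ggplot2)
ggplot(blomkvist, aes(x = age, y = rt, colour = sex)) +
  geom_point(alpha = .3) +
  # two parallel lines: same age slope, different intercepts per sex
  geom_line(aes(y = fitted(model_2)))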

Varying intercepts and varying slopes model

The varying-intercepts model is written as follows, for \(i \in 1\dots n\):

\[ \begin{aligned} \text{rt}_i &\sim \mathcal{N}(\mu_i, \sigma^2)\\ \mu_i &= \beta_0 + \beta_\text{sex} \cdot \text{sex}_{i} + \beta_\text{age} \cdot \text{age}_{i}\\ \end{aligned} \]

The following is a varying intercepts and varying slopes model:

\[ \begin{aligned} \text{rt}_i &\sim \mathcal{N}(\mu_i, \sigma^2)\\ \mu_i &= \beta_0 + \beta_\text{sex} \cdot \text{sex}_{i} + \beta_\text{age} \cdot \text{age}_{i} + \underbrace{\beta_\text{sex,age} \cdot \text{sex}_{i} \cdot \text{age}_{i}}_\text{interaction} \end{aligned} \]

We effectively have a third predictor that is the product of \(\text{sex}_{i} \cdot \text{age}_{i}\); i.e. the interaction.

Varying intercepts and varying slopes model

Using sex and age as predictors, we can do the following to perform a varying-intercepts and varying-slopes model:

model_3 <- lm(rt ~ sex + age + sex:age, data = blomkvist)

Or slightly shorter:

model_3 <- lm(rt ~ sex * age, data = blomkvist)

Assumption: age slopes do co-vary by sex!

In comparison

(Figure: the model fits compared.)

Varying intercepts and varying slopes model

model_3 <- lm(rt ~ sex * age, data = blomkvist)
(betas <- coef(model_3))
(Intercept)     sexmale         age sexmale:age 
     308.46       17.15        6.28       -1.36 

From the varying-intercepts and varying-slopes model, the intercept term represents the intercept for females; the intercept for males is

# males
betas[1] + betas[2]
[1] 326

The slope terms are as follows:

# age slope for females
betas[3]
[1] 6.28
# age slope for males
betas[3] + betas[4]
[1] 4.93
# sex difference for age slopes
betas[4]
[1] -1.36

Varying intercepts and varying slopes model

Work through scripts

  • varying_intercepts_model.R
  • varying_slopes_model.R

Polychotomous predictors

Categorical predictor variables with more than two levels (i.e. not dichotomous like sex).

Using smoker as a single categorical predictor variable, the linear model is as follows.

\[ \text{rt}_i \sim \mathcal{N}(\mu_i, \sigma^2), \mu_i = \beta_0 + \beta_\text{smoker[1]} \cdot \text{smoker}_{1i} + \beta_\text{smoker[2]} \cdot \text{smoker}_{2i}, \text{ for } i \in 1\dots n. \]

where \(\text{smoker}_{1i}\), \(\text{smoker}_{2i}\) are as follows.

\[ \text{smoker}_{1i}, \text{smoker}_{2i} = \left\{ \begin{array}{ll} 0,0 \text{ if smoker}_i = \texttt{former}\\ 1,0 \text{ if smoker}_i = \texttt{no}\\ 0,1 \text{ if smoker}_i = \texttt{yes}\\ \end{array} \right. \]

A categorical predictor with \(K\) levels requires \(K-1\) coefficients.

# Categories of smoker
levels(blomkvist$smoker) #  3 levels
[1] "former" "no"     "yes"   

Polychotomous predictors

Using lm, we would simply do as follows.

model_smoker <- lm(rt ~ smoker, data = blomkvist)
(betas <- coef(model_smoker))
(Intercept)    smokerno   smokeryes 
      652.5       -18.0       -19.1 

Default contrasts for polychotomous predictor used by lm.

contrasts(blomkvist$smoker)
       no yes
former  0   0
no      1   0
yes     0   1

The intercept term is the predicted average when smoker is former (i.e. the row with only 0s).
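
To see how the coefficients map onto the data, we can compare them against the raw group means (a quick check, assuming dplyr is loaded and using the rt column from the models above):

# Sample mean rt per smoker group; "former" should match the intercept
blomkvist %>%
  group_by(smoker) %>%
  summarise(mean_rt = mean(rt, na.rm = TRUE))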

Polychotomous predictors

The predicted mean of the rt distribution when smoker is no is

\[ \hat\beta_0 + \hat\beta_1 \cdot 1 + \hat\beta_2 \cdot 0 \]

betas[1] + betas[2] * 1 + betas[3] * 0
[1] 635

When smoker is yes, the predicted mean of the rt distribution is

\[ \hat\beta_0 + \hat\beta_1 \cdot 0 + \hat\beta_2 \cdot 1 \]

betas[1] + betas[2] * 0 + betas[3] * 1
[1] 633

Exercise: work through script polychotomous_predictor.R

Contrast coding

Change the contrast coding so that the intercept represents the average across groups and so that you get the desired comparisons (for discussion see Schad et al. 2020; Brehm and Alday 2022).

coef(model_smoker)
(Intercept)    smokerno   smokeryes 
      652.5       -18.0       -19.1 
# Default: treatment contrast
contrasts(blomkvist$smoker)
       no yes
former  0   0
no      1   0
yes     0   1
contr.treatment(3, base = 2) # change base level
       former yes
former      1   0
no          0   0
yes         0   1
contr.helmert(3) / 2 # Helmert contrast
       former vs no former / no vs yes
former           -1                 -1
no                1                 -1
yes               0                  2
contr.sum(3) / 2 # sum contrast
       former vs yes no vs yes
former             1         0
no                 0         1
yes               -1        -1
# Assign sum contrasts before re-fitting (the script also gives the
# contrast columns the readable names shown below)
contrasts(blomkvist$smoker) <- contr.sum(3) / 2
model <- lm(rt ~ smoker, data = blomkvist) # Re-fit model with sum contrasts
coef(model)
        (Intercept) smokerformer vs yes     smokerno vs yes 
             640.17               12.34               -5.61 
coef(model_smoker) # model with treatment contrasts
(Intercept)    smokerno   smokeryes 
      652.5       -18.0       -19.1 

Exercise: work through script contrast_coding.R

Predictions

Communicating the inferential results for your data: what does your model predict about how your outcome variable is affected by your predictor variables?

Using \(\hat\beta_0,\hat\beta_1\) (etc.) and \(\hat\sigma^2\), for any new vector of predictor values \(x_{1i'}\), the corresponding \(y_{i'}\) is

\[ y_{i'} \sim \mathcal{N}(\hat\mu_{i'}, \hat\sigma^2)\\ \hat\mu_{i'} = \hat\beta_0 + \hat\beta_1 \cdot x_{1i'} \]

Predictions

model <- lm(rt ~ sex + age, data = blomkvist)
coef(model)
(Intercept)     sexmale         age 
     338.64      -58.93        5.75 

We can predict rts for a hypothetical woman of age 95.

\[ \begin{aligned} \hat\mu_i &= \hat\beta_0 + \hat\beta_\text{sex} \cdot \text{sex}_{i} + \hat\beta_\text{age} \cdot \text{age}_{i}\\ &=338.64 + -58.93 \cdot \text{sex}_{i} + 5.75 \cdot \text{age}_{i}\\ &=338.64 + -58.93 \cdot 0 + 5.75 \cdot 95\\ &=338.64 + 5.75 \cdot 95\\ &=885 \end{aligned} \]

To calculate \(\hat\mu_{i'}\) for any vector of predictor values, we can use predict:

predict(model, newdata = tibble(sex = "female", age = 95)) 
  1 
885 

Prediction confidence intervals

  • We can calculate confidence intervals for the predicted values of \(\mu_{i'}\).
  • The confidence interval for \(\mu_{i'}\) is as follows:

\[ \hat\mu_{i'} \pm \tau_\nu \cdot \hat{\text{se}}_{\mu_{i'}} \]

where \(\hat{\text{se}}_{\mu_{i'}}\) is the standard error term that corresponds to \(\hat\mu_{i'}\).

We can obtain these by using interval = 'confidence' in the predict function.

predict(model, interval = 'confidence', newdata = tibble(sex = "female", age = 95))
  fit lwr upr
1 885 845 925

Exercise: extract and plot model predictions

Work through scripts

  • predictions.R
  • predictions_with_cis.R
  • predictions_with_emmeans.R

Recap

  • The normal model assumes a single, normally distributed data-generating process.
  • MLE provides estimates for unknown population parameters.
  • We can specify models with continuous and / or categorical predictors.
  • Combinations of these yield varying-intercepts and / or varying-slopes models.

Homework

  • For the formative assessment

Using a data-set of your own choice, perform and report a linear regression analysis to address a psychologically relevant question.

  • Find a suitable data set you can use for the formative assessment.
  • Construct (and implement using lm()) two “nested” models (see the sketch below), so
    1. A model with 1 or 2 predictors.
    2. A model with the same predictors as the previous one plus another predictor.
  • The former model is “nested” in the latter.
  • Read lecture notes on NOW (also Andrews 2021).
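
A purely hypothetical sketch of such a pair of nested models (outcome, predictor_1, predictor_2, and my_data are placeholders for your own variables):

# Model 1: one or two predictors
model_a <- lm(outcome ~ predictor_1, data = my_data)
# Model 2: the same predictors plus one more; model_a is nested in model_b
model_b <- lm(outcome ~ predictor_1 + predictor_2, data = my_data)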

References

Andrews, Mark. 2021. Doing Data Science in R: An Introduction for Social Scientists. SAGE Publications Ltd.

Brehm, Laurel, and Phillip M. Alday. 2022. “Contrast Coding Choices in a Decade of Mixed Models.” Journal of Memory and Language 125: 104334.

Schad, Daniel J., Shravan Vasishth, Sven Hohenstein, and Reinhold Kliegl. 2020. “How to Capitalize on a Priori Contrasts in Linear (Mixed) Models: A Tutorial.” Journal of Memory and Language 110: 104038.