General Linear Model

  • The General Linear Model (GLM) is a mathematical model that describes the relation between some outcome or variable of interest Y and one or several predictor variables.
  • The GLM in its most basic form is no different from the linear equation you learned in high school, y = mx + b, where the relation between two variables y and x is described using an intercept (b) and a slope (m).
  • Just to shake it up, we will change the terminology a bit for the GLM we use in research.

The GLM, continued

y = b0 + b1x

  • y is the value of the outcome or dependent variable
  • The expected value of y can be predicted by some mean or base level, b0 (the intercept), plus the value of predictor x multiplied by a coefficient b1 (the slope)
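As a rough illustration of this equation in R, the sketch below simulates data and fits the model with lm. The variable names, the seed, and the true values b0 = 2 and b1 = 0.5 are illustrative assumptions, not course data.

```r
# Simulated data with a known intercept (b0 = 2) and slope (b1 = 0.5)
set.seed(1)
x <- rnorm(200)
y <- 2 + 0.5 * x + rnorm(200, sd = 0.3)

# Fit y = b0 + b1*x
fit <- lm(y ~ x)
b <- coef(fit)  # b[1] is the estimated intercept b0, b[2] the estimated slope b1

# Expected value of y when x = 1, computed directly as b0 + b1 * 1
y_hat <- unname(b[1] + b[2] * 1)
```

Here y_hat should agree with what predict(fit, data.frame(x = 1)) returns, since both apply the same fitted equation.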

The GLM, continued

This linear model equation probably looks familiar to you, because it is used in regression. In fact, the general linear model is the same model you learned as the linear regression model.

Typically regression is taught as a separate and distinct method from t-tests and ANOVAs. As a result, students often miss the fact that all of these methods are just slightly different variations of the same model: the General Linear Model.
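One way to see this claim in action is to run an independent-samples t-test and a regression on the same data. The sketch below uses simulated data (the group sizes, means, and variable names are assumptions made for illustration) and compares a pooled-variance t-test with lm using a 0/1 dummy predictor.

```r
# Two simulated groups of 30, coded 0 and 1 (illustrative values only)
set.seed(2)
group <- rep(c(0, 1), each = 30)
y <- 10 + 1.5 * group + rnorm(60)

# Classic t-test assuming equal variances
tt <- t.test(y ~ group, var.equal = TRUE)

# The same comparison written as a linear model: y = b0 + b1*group
fit <- lm(y ~ group)
lm_t <- summary(fit)$coefficients["group", "t value"]
lm_p <- summary(fit)$coefficients["group", "Pr(>|t|)"]

# The slope b1 is the difference between the two group means
mean_diff <- mean(y[group == 1]) - mean(y[group == 0])
```

The two t statistics match in absolute value (t.test subtracts the groups in the opposite order, so the sign flips), and the p-values are the same.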

Correlation Reminders

  • Value ranging from -1 to +1 used to describe the magnitude and direction of a relation between two numeric variables
  • Pearson's r is commonly used to calculate this linear relationship
  • Test whether the correlation coefficient is significantly different from 0 using a corresponding t-test

Calculation of Pearson's r

One of the equations for the calculation of Pearson's r is shown below:

\(r_{x,y} = \frac{Cov(x,y)}{\sigma_{x}\sigma_{y}}\)
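The formula can be checked by hand in R. The sketch below uses simulated x and y (illustrative values, not course data) and compares the manual calculation with the built-in cor function.

```r
# Simulated pair of numeric variables (illustrative only)
set.seed(3)
x <- rnorm(50)
y <- 0.6 * x + rnorm(50)

# Pearson's r from the formula: covariance divided by the product of the SDs
r_manual <- cov(x, y) / (sd(x) * sd(y))

# Built-in calculation (Pearson is the default method)
r_builtin <- cor(x, y)
```

Both cov and sd use the n - 1 denominator, so the sample-size terms cancel and the two results agree.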

Linear Correlation as GLM

  • This correlation coefficient can also be estimated using the GLM, y = b0 + b1x.
  • Specifically, Pearson's r between x and y equals the slope term b1 in this equation when both variables are standardized.
  • Here is why:

\(r_{x,y} = \frac{Cov(x,y)}{\sigma_{x}\sigma_{y}}\)

\(\beta_{1} = \frac{Cov(x,y)}{\sigma_{x}^2}\)

Linear Correlation as GLM, cont

\(r_{x,y} = \frac{Cov(x,y)}{\sigma_{x}\sigma_{y}}\)

\(\beta_{1} = \frac{Cov(x,y)}{\sigma_{x}^2}\)

  • Numerators are identical
  • Denominators are only slightly different - how?

Linear Correlation as GLM, cont

  • If x and y had the same standard deviation, these equations would produce the exact same result.
  • How could we force x and y to have the same SD?
  • Recall that when you standardize a variable with a Z transformation, the mean is always zero and the SD is always 1
  • If we then regressed Zy on Zx, the estimated slope would be equivalent to Pearson's r.
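The argument above can be sketched numerically. Comparing the two formulas gives \(\beta_{1} = r_{x,y}\frac{\sigma_{y}}{\sigma_{x}}\), so the raw slope differs from r only by the ratio of the SDs, and that ratio is 1 once both variables are standardized. The data below are simulated and the names are illustrative assumptions.

```r
# Simulated predictor and outcome on different scales (illustrative only)
set.seed(4)
x <- rnorm(100, mean = 30, sd = 8)
y <- 1 + 0.05 * x + rnorm(100)

r <- cor(x, y)

# Unstandardized slope: equals Cov(x, y) / Var(x), i.e. r * sd(y) / sd(x)
b1_raw <- unname(coef(lm(y ~ x))["x"])

# Standardize both variables, then regress Zy on Zx
zx <- (x - mean(x)) / sd(x)
zy <- (y - mean(y)) / sd(y)
b1_std <- unname(coef(lm(zy ~ zx))["zx"])
```

In this sketch b1_raw matches r * sd(y) / sd(x), and b1_std matches r itself.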

Example: Correlation and Regression in R

  • Last week we looked at the subscale measuring Perceived Peer Support (PPS).
  • The subscale has 8 items, each measured on a Likert-type scale ranging from 0-4.
  • What is the relation between age and overall Perceived Peer Support?

Example: Correlation and Regression in R

# count NAs per row across the scale item columns
PPSdat$NAcount <- rowSums(is.na(PPSdat[, 4:12]))
# keep only rows with fewer than 3 items missing
PPSdat <- subset(PPSdat, NAcount < 3)
# check variable names/locations
names(PPSdat)
# calculate mean PPS score for each person, ignoring missing items
PPSdat$PPS_TOT <- rowMeans(PPSdat[, 4:12], na.rm = TRUE)

Example: Correlation and Regression in R

First let's use the cor.test function:

cor.test(PPSdat$Age, PPSdat$PPS_TOT)

Example: Correlation and Regression in R

Okay, now for the magic!!

First let's standardize both variables. Recall the equation of a z-score: (x-M)/SD.

Z_Age <- (PPSdat$Age - mean(PPSdat$Age))/sd(PPSdat$Age)
Z_PPS <- (PPSdat$PPS_TOT - mean(PPSdat$PPS_TOT))/sd(PPSdat$PPS_TOT)

Note: These z-values could also be obtained more simply using the scale function.
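A brief sketch of that alternative, using a simulated age-like variable (the values and names are assumptions for illustration): scale centers and divides by the SD by default, and returns a one-column matrix, so as.numeric is used here to get a plain vector.

```r
# Simulated age-like variable (illustrative only)
set.seed(5)
age <- rnorm(40, mean = 35, sd = 10)

# Manual z-scores: (x - M) / SD
z_manual <- (age - mean(age)) / sd(age)

# Same result from scale(); convert its matrix output to a vector
z_scale <- as.numeric(scale(age))
```

Either version has a mean of 0 and an SD of 1, up to floating-point rounding.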

Example: Correlation and Regression in R

Use the z-scores in a linear regression with the lm function:

model1 <- lm(Z_PPS ~ Z_Age)
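To see what such a model should return, the sketch below repeats the steps on simulated data, since PPSdat is not reproduced here. The variables age and pps, the seed, and the generating values are hypothetical stand-ins. It compares the slope row of the regression summary with the output of cor.test.

```r
# Hypothetical stand-ins for Age and PPS_TOT (simulated, not course data)
set.seed(6)
age <- rnorm(80, mean = 35, sd = 10)
pps <- 2 + 0.02 * age + rnorm(80, sd = 0.8)

# Standardize both variables and fit the regression
z_age <- as.numeric(scale(age))
z_pps <- as.numeric(scale(pps))
fit <- lm(z_pps ~ z_age)

# Slope row: estimate, standard error, t value, p-value
slope <- summary(fit)$coefficients["z_age", ]

# Correlation test on the original (unstandardized) variables
ct <- cor.test(age, pps)
```

In this sketch the slope estimate equals the correlation reported by cor.test, the slope's t value and p-value equal the t statistic and p-value of the correlation test, and the intercept is effectively zero because both variables are centered.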