Today, we’ll make sure that we understand the tremendous usefulness of general linear models. Then, you’ll work in teams to refine your hypotheses.
Quiz: Comprehension Questions
In a sentence or formula, what is the general linear model? (You may want to consider what two basic elements of the plot on the right a general linear model would attempt to capture.)
Why might you want to transform data (e.g., by creating Z-scores or standardized scores)?
Quiz: Application Question
Which of the following is the best “model” of a bull? Explain in 3-4 sentences. (I’m more interested in your justification than the answer itself; there isn’t a clearly correct answer.)
What are Statistical Models?
Fundamentally, statistics attempts to understand a particular outcome in terms of a simplified model and the predictive error that results from simplifying the messiness of reality into a model.
Data are messy and complex; statistics allows us to gain an understanding of the world by simplifying it and extracting the most important signals.
The best models have low prediction error and high generalizability to new datasets.
Simplifications are Always Incorrect
Models are mere representations of actual data; they always simplify and thus distort the truth.
Models ignore details that are considered to be non-essential to roughly predict the future and explain the past.
Linear Models
Assuming a linear relationship between predictors and an outcome variable, linear models predict an outcome of interest by fitting a line (or multiple lines) onto the data. Each line can be characterized by a slope and a y-intercept.
outcome = intercept + slope*predictors + error
or, in mathematical notation, y = a + bx + e
Surprise: You’ve Actually Been Using Linear Models All Along!
Most Statistical Tests are Various Formulations of Linear Models
One-sample t-test
y = intercept + error
Independent-samples t-test
y = intercept + slope*group + error
ANOVA
y = intercept + slope1*group1 + slope2*group2 + error
More complex statistics often don’t use wholly new model types; they just add additional model components.
Moving Toward Multivariate Models
Fitting a single line is not sufficient for modeling certain relationships; in such cases, multiple variables need to be taken into account (while taking into account the tradeoff between fit and simplicity/generalizability).
Groupwork
Share your hypotheses and background research with your teammates. Then, work to converge upon a shared question that is (a) feasible, (b) novel, and (c) testable with an “advanced” statistical model. Call me over to help as needed!
Please prepare sufficiently for our group meetings to make them worthwhile. I encourage you to refer to the syllabus to see my expectations for these weekly meetings.
I’ve realized that it will be easiest to meet in my office (LSP 132D), since it will often be helpful to look at my monitor.