Two Ways Not to Use Regression Analysis

Kirby Arinder
2023/08/09

Introduction

What are we doing?

Why do we care?

At whom is this aimed?

Outline

  • Linear regression: A minimal-math introduction
    • As a descriptive method
    • As an inferential method
  • Two common uses of linear regression
    • Forecasting
    • Causal inference

Linear regression: A minimal-math introduction

The core of linear regression is simple.

It's a method of assessing the central tendency of multidimensional data and making certain inferences based on that assessment.

So let's start at the beginning.

Here's a number line.

It has one dimension.

It also has an average (or central tendency).

Which is a point, i.e., zero-dimensional!

This average or central tendency

…serves as an expected value of sorts for, well, entities like whatever this number line represents!

Let's think informally for now: If the numbers on the previous slide represented the output of some process, then ceteris paribus you'd expect future processes like this to also have this average, over the long term!
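To make that concrete, here's a minimal sketch with made-up numbers (the values are hypothetical, not data from the talk). It computes the average of a one-dimensional set and checks, informally, that nudging the point in either direction makes the total squared distance to the data worse.

```python
from statistics import fmean

# Hypothetical outputs of some process, laid out on a number line.
values = [2.0, 3.0, 5.0, 7.0, 8.0]

# The average is a single point (zero-dimensional) summarizing the line.
center = fmean(values)


def sum_sq(c):
    """Total squared distance from every value to a candidate point c."""
    return sum((v - c) ** 2 for v in values)


print(center)                                # 5.0
print(sum_sq(center) < sum_sq(center - 0.5))  # nudging left is worse
print(sum_sq(center) < sum_sq(center + 0.5))  # nudging right is worse
```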

So let's add just a little complexity.

We can easily envision a two-dimensional set of points.

And this set, too...

has a central tendency! (Actually many, but…)

A brief digression...

That blue line looks wonky, you might say!

This line minimizes the sum of the squared vertical (Y-axis) distances between every point in the set and the central tendency (or regression surface – hey, there's our word!).
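A minimal sketch of that idea, using the textbook closed-form least-squares formulas on hypothetical points (the data and the helper names `fit_line` and `sum_sq_vertical` are invented for illustration):

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y ~ intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def sum_sq_vertical(xs, ys, slope, intercept):
    """Sum of squared vertical distances from the points to a line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical two-dimensional set of points.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

slope, intercept = fit_line(xs, ys)
best = sum_sq_vertical(xs, ys, slope, intercept)

# Any other line we try does worse on this criterion.
tilted = sum_sq_vertical(xs, ys, slope + 0.1, intercept)
shifted = sum_sq_vertical(xs, ys, slope, intercept + 0.1)
print(slope, intercept, best < tilted, best < shifted)
```

Note that the distances are measured straight up and down, not perpendicular to the line, which is part of why the fitted line can look wonky.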

So let's reiterate

The central tendency under discussion is identical to the expected value of the set.

That is: the value you'd expect new members of this set, generated by the same process, to assume, on average, in the long term!

A rule:

The C.T.U.D. (central tendency under discussion) has one dimension fewer than its set.

  • A line gets a point; a plane gets a line!
  • In general, an n-dimensional set gets an (n-1)-dimensional hyperplane!
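In symbols, using notation assumed here rather than taken from the slides: with one dependent variable y and n-1 independent variables, the regression surface is the hyperplane below, with coefficients chosen to minimize the sum of squared vertical distances.

```latex
\hat{y} = \beta_0 + \beta_1 x_1 + \dots + \beta_{n-1} x_{n-1},
\qquad
(\beta_0, \dots, \beta_{n-1}) = \arg\min_{\beta} \sum_{i} \bigl( y_i - \hat{y}_i \bigr)^2
```

With n = 1 this collapses to a single point (the ordinary average); with n = 2 it is the familiar line.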

But:

High-dimensional predictions aren't typically very useful.

Even a two-dimensional expectation can

  • strain our imagination and
  • be too imprecise to guide action!

Conditional expectations

By holding a number of dimensions constant, we can reduce the dimensionality of our expected value!

This creates a conditional expectation.

In theory, this practice is very useful:

  • We can make tractable, low-dimensional predictions
  • We can isolate the variables we don't care about from the variables we do!

How does THAT work?

First, let's introduce the ideas of:

  • independent variables (let's think of them as those we can control) and
  • dependent variables (those we can't, and thus are typically interested in predicting).

Traditionally, in two dimensions, the independent variable is depicted on the x axis and the dependent on the y.

Aaand?

With that understanding, let's just hold the x axis constant!

What does "holding constant" mean in the real world?

Right now, nothing at all! It's a purely logical stipulation: If P, then Q.
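Here's a rough sketch of that stipulation, on hypothetical points: "if x is 4, then the expected y is whatever the line says at 4." The data and the helper names are invented for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y ~ intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def conditional_expectation(x0, slope, intercept):
    """Expected y given that x is held at x0: a single point on the line."""
    return intercept + slope * x0


# Hypothetical two-dimensional set of points.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
slope, intercept = fit_line(xs, ys)

# If P (x = 4), then Q (expected y is about 4.6).
expected_y = conditional_expectation(4.0, slope, intercept)
print(expected_y)
```

Holding x constant turns the one-dimensional line back into a zero-dimensional point, which is the dimensionality reduction described above.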

But that changes when we think about regression as an inferential method. Which we will shortly!

But first, a step back!

You now understand linear regression as a descriptive technique.

It's a high-dimensional average that can be simplified to give conditional expectations for the output of some function.

Linear regression as an inferential technique

But more typically, we want to use linear regression to make inferences.

These often take two forms:

  • Forecasting a future value
  • Making a causal inference.

Inference and randomization

Either of these forms of inference requires a little more than what we've seen so far.

We need not just a conditional expectation, but a distribution of error around that expectation.

This is where things like confidence intervals and p-values come in.
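The raw material for that distribution of error is the set of residuals: how far each observed point sits from its conditional expectation. A minimal sketch on hypothetical points follows. It stops at the spread of the residuals and does not compute confidence intervals or p-values; the data and names are invented for illustration.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope and intercept for y ~ intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical two-dimensional set of points.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
slope, intercept = fit_line(xs, ys)

# Residual: observed y minus the conditional expectation at that x.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Residual standard error: typical size of a miss, using n - 2 degrees
# of freedom because two quantities (slope, intercept) were estimated.
n = len(xs)
residual_se = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))

print(residuals)
print(residual_se)
```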

For example:

Here's the standard 95% confidence interval around our old friend.

Which I won't talk about!

Or at least, I'm not going to talk about how to do them.

Which may be unnecessary anyway; if I compute a regression line using R's standard functions, I'm going to get CIs and p-values whether or not I know how to use them or even want them!

But I will say...

The standard inferential mechanism for linear regression is based on an assumption of random sampling. It assumes:

  • Your starting dataset is a random sample from a population;
  • You are making inferences about that same population;
  • Your conditional expectations refer to the results of further random samples from that population, with the draws that don't meet the condition discarded.

Your CIs and p-values are only meaningful in this context.

But this is almost never the case in our work!

If your data aren't from a random sample at all, knowing the probability that a random sample would produce values at least as extreme as the ones you observed tells you nothing!
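One hypothetical way this can bite, sketched as a simulation: the population, the noise level, and the selection rule (we only ever see cases with y above 11) are all assumptions invented for illustration. The fitted slope from the selected cases drifts away from the population slope, and nothing in the standard output would warn you.

```python
import random


def fit_line(xs, ys):
    """Least-squares slope and intercept for y ~ intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: y = 1 + 2x + noise.
population = []
for _ in range(5000):
    x = random.uniform(0.0, 10.0)
    y = 1.0 + 2.0 * x + random.gauss(0.0, 3.0)
    population.append((x, y))

# A non-random "sample": only cases with y above 11 reach our dataset.
selected = [(x, y) for x, y in population if y > 11.0]

full_slope, _ = fit_line([p[0] for p in population],
                         [p[1] for p in population])
selected_slope, _ = fit_line([p[0] for p in selected],
                             [p[1] for p in selected])

# The selected data understate the relationship in the population.
print(full_slope, selected_slope)
```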

With that in mind, let's finally move on to…

Two common inferential uses of linear regression

A.k.a., what not to do:

  • Forecasting: Predicting events that haven't happened yet
  • Causal inference: Isolating variables to declare some remainder causally influential

Forecasting

Two problems with using linear regression for this purpose:

  • Regressions predict the results of random draws, not causal processes; and
  • By definition, future events don't belong to the population from which you drew your sample (which you did, right?)
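A toy illustration of the second problem, with everything invented for the purpose: the observed periods follow one process, and the hypothetical function `future_process` stands in for a future that follows a different one. The regression has no way to know.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y ~ intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Observed periods 0..9: a hypothetical process that grows by 2 per period.
periods = [float(t) for t in range(10)]
observed = [10.0 + 2.0 * t for t in periods]
slope, intercept = fit_line(periods, observed)


def future_process(t):
    """Hypothetical future: after period 10 the process starts declining."""
    return 30.0 - 1.0 * (t - 10.0)


# The line extrapolates the past; the future belongs to another population.
forecast = intercept + slope * 12.0
actual = future_process(12.0)
print(forecast, actual, forecast - actual)
```

The fit to the observed periods is perfect here, and the forecast is still wrong, because the miss comes from the change in process rather than from noise.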

Causal inference

Before I say anything at all about this, a quote:

“No causal claim can be established by a purely statistical method, be it propensity scores, regression, stratification, or any other distribution-based design.”

-Judea Pearl, “Causality,” p. 350


But disregarding that looming fact, we've still got our more specific problems:

  • Linear regression is blind to causal processes, and doesn't even necessarily decrease bias!
  • And once again, regression makes probabilistic predictions, not causal or counterfactual ones.
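A simulated sketch of the first point, under assumptions invented for illustration: a hidden variable z drives both x and y, and x has no effect on y at all. The regression of y on x still reports a healthy slope. Adjusting for z removes it here, but only because we built the data and know that z is the right thing to adjust for; the regression itself can't tell us that.

```python
import random


def fit_line(xs, ys):
    """Least-squares slope and intercept for y ~ intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def residuals(xs, ys):
    """What is left of ys after removing its linear fit on xs."""
    slope, intercept = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical causal structure: z -> x and z -> y, but no x -> y arrow.
zs = [random.gauss(0.0, 1.0) for _ in range(5000)]
xs = [z + random.gauss(0.0, 1.0) for z in zs]
ys = [2.0 * z + random.gauss(0.0, 1.0) for z in zs]

# Naive regression: a clearly nonzero slope, with no causal effect behind it.
naive_slope, _ = fit_line(xs, ys)

# Adjusting for z (regress the z-residuals of y on the z-residuals of x).
adjusted_slope, _ = fit_line(residuals(zs, xs), residuals(zs, ys))

print(naive_slope, adjusted_slope)
```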

This presentation is the barest introduction

It just covers the simplest form of linear regression and its formal interpretation!

There are many, many considerations not mentioned here!

So what's our takeaway?

Don't be dissuaded from forecasting, or from making causal claims.

But beware of using a method developed for one specific context in a different context entirely!

If you want to break the rules, have an argument ready that your method works in your novel context.

If you want to find this presentation again

Questions?