Some Title About Regression

Kirby Arinder
2023/08/09

Introduction

There's probably a lot here.

The goals of this presentation:

Answer followup questions
Present recommendations

Here is a set of points on a number line.

It has one dimension.

It also has an average (or central tendency).

Which is a point, i.e., zero-dimensional!

From there...

We can easily envision a two-dimensional set of points.

And this set, too...

has a central tendency! (Actually many, but…)

Expected value

The central tendency shown above is (trivially) identical to the expected value of the set.

Which is to say, it's the value that new members of this set, generated by the same process, would take on average over the long term!
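To make the "long term" part concrete, here is a minimal sketch. The process (`draw_point`, normal with mean 5 and spread 2) is a made-up stand-in, not the data on the slide; the point is only that the average of more and more draws settles near the expected value.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable


def draw_point():
    """Hypothetical generating process: values scattered around 5."""
    return random.gauss(5.0, 2.0)


samples = [draw_point() for _ in range(100_000)]

# The average of the first n draws drifts toward the expected value as n grows.
for n in (10, 1_000, 100_000):
    print(n, round(statistics.fmean(samples[:n]), 3))

long_run_average = statistics.fmean(samples)
```

Any single draw can land far from 5; it is the average over many draws that settles down.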

Two things to notice:

- The central tendency under discussion has one dimension less than its set.

A line gets a point; a plane gets a line!

- If we fix certain values, we can decrease the dimensionality of our expected values!

This is called a conditional expectation!

How does THAT work?

It's a lot simpler than it sounds.

First, let's introduce the ideas of:

independent variables (let's think of them as those we can control) and
dependent variables (those we can't control, and are thus typically interested in predicting).

Traditionally, in two dimensions, the independent variable is depicted on the x axis and the dependent on the y.

Aaand?

With that understanding, let's just hold the x axis constant!
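Holding x constant can be sketched in a few lines. The cloud of points below is invented for illustration (y = 2 + 0.5x plus noise), and `conditional_mean` is a hypothetical helper that averages y over the points whose x sits in a narrow window around the value we fix.

```python
import random
import statistics

random.seed(1)

# Hypothetical two-dimensional cloud: y depends on x, plus noise.
points = []
for _ in range(50_000):
    x = random.uniform(0.0, 10.0)
    y = 2.0 + 0.5 * x + random.gauss(0.0, 1.0)
    points.append((x, y))


def conditional_mean(points, x_fixed, half_width=0.25):
    """Average y over the points whose x lies within half_width of x_fixed."""
    ys = [y for x, y in points if abs(x - x_fixed) <= half_width]
    return statistics.fmean(ys)


overall_mean_y = statistics.fmean([y for _, y in points])
mean_y_at_2 = conditional_mean(points, 2.0)
mean_y_at_8 = conditional_mean(points, 8.0)
print(overall_mean_y, mean_y_at_2, mean_y_at_8)
```

Fixing x turns the question "what is the average of everything?" into "what is the average y here?", and the answer changes as the fixed value moves.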

A brief digression

That blue line looks wonky, you might say!

Fair. It seems to us like the central tendency ought to run straight down the diagonal, right?

The short answer: This is really a central tendency for distance on the Y axis.

In other words?

This line minimizes the sum of the squared vertical (Y-axis) distances between each point in the set and the central tendency (or regression surface – hey, there's our word!).
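Here is a rough sketch of why the line can look "wonky". It assumes the usual least-squares fit and reads "the diagonal" as the line through the means whose slope is the ratio of the two spreads; both the data and that reading are assumptions made for illustration.

```python
import random
import statistics

random.seed(2)

# Hypothetical cloud with a moderate positive relationship.
xs = [random.gauss(0.0, 1.0) for _ in range(2_000)]
ys = [0.6 * x + random.gauss(0.0, 0.8) for x in xs]


def fit_least_squares(xs, ys):
    """Slope and intercept minimizing the sum of squared vertical distances."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def sum_squared_vertical(xs, ys, slope, intercept):
    """Total squared Y-axis distance from the points to a candidate line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


slope, intercept = fit_least_squares(xs, ys)
best = sum_squared_vertical(xs, ys, slope, intercept)

# The "diagonal": same center, slope equal to spread of y over spread of x.
diagonal_slope = statistics.stdev(ys) / statistics.stdev(xs)
diagonal_intercept = statistics.fmean(ys) - diagonal_slope * statistics.fmean(xs)
diagonal = sum_squared_vertical(xs, ys, diagonal_slope, diagonal_intercept)

print(slope, diagonal_slope, best, diagonal)
```

The fitted slope comes out shallower than the diagonal, and the diagonal pays for its steeper slope with a larger total squared vertical distance.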

That's not super important right now, and there are lots of ways of drawing that line. I'm being as imprecise as I can get away with, because I want to be as general as possible.

End of digression...

The main takeaway:

Yeah, the line doesn't look the way it seems it should, but it's measuring a specific central tendency that's precisely defined and serves a purpose.

... and a return to the main point.

We said before that the central tendency under discussion has one dimension less than the set to which it belongs.

This principle extends upwards!

A three-dimensional set has an average that is a two-dimensional plane.
In general, any n-dimensional set has an average that is an (n-1)-dimensional hyperplane!

These can be hard to imagine (or depict).

But here's where it gets good!

By holding a number of dimensions constant, we can reduce the dimensionality of our expected value!

In theory, this has a number of useful properties. Notably:

We can make tractable, low-dimensional predictions
We can isolate the variables we care about from the ones we don't!
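A minimal sketch of the same idea in three dimensions, with invented data (y = 1 + 2x + 3z plus noise). `fit_plane` and `line_at_z` are hypothetical helpers: the first fits a least-squares plane, the second holds z constant so that the plane collapses to a line in x.

```python
import random
import statistics

random.seed(3)

# Hypothetical three-dimensional cloud: y depends on both x and z.
rows = []
for _ in range(5_000):
    x = random.gauss(0.0, 1.0)
    z = random.gauss(0.0, 1.0)
    y = 1.0 + 2.0 * x + 3.0 * z + random.gauss(0.0, 1.0)
    rows.append((x, z, y))


def fit_plane(rows):
    """Least-squares plane y = a + b_x * x + b_z * z via the normal equations."""
    mx = statistics.fmean([r[0] for r in rows])
    mz = statistics.fmean([r[1] for r in rows])
    my = statistics.fmean([r[2] for r in rows])
    sxx = sum((x - mx) ** 2 for x, _, _ in rows)
    szz = sum((z - mz) ** 2 for _, z, _ in rows)
    sxz = sum((x - mx) * (z - mz) for x, z, _ in rows)
    sxy = sum((x - mx) * (y - my) for x, _, y in rows)
    szy = sum((z - mz) * (y - my) for _, z, y in rows)
    det = sxx * szz - sxz ** 2
    b_x = (sxy * szz - sxz * szy) / det
    b_z = (sxx * szy - sxz * sxy) / det
    return my - b_x * mx - b_z * mz, b_x, b_z


a, b_x, b_z = fit_plane(rows)


def line_at_z(z_fixed):
    """Hold z constant: the plane reduces to (intercept, slope) of a line in x."""
    return a + b_z * z_fixed, b_x


intercept_at_z1, slope_at_z1 = line_at_z(1.0)
print(a, b_x, b_z, intercept_at_z1, slope_at_z1)
```

Fixing x as well would reduce the line to a single predicted point.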

NOW YOU UNDERSTAND REGRESSION ANALYSIS.

And things like contemporary AI, at least if I've done my job.

It's a high-dimensional average which can be simplified down to set conditional expectations for the output of some function.

Which is less complicated than it sounds.

It's just like predicting the next entity in a sequence by picking the average of the sequence up to now.

You might not be right any particular time… but:

over the long term,
if nothing changes,
you'll be right on average.
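As a sketch of "right on average", suppose the sequence comes from an unchanging process (an invented one here: normal draws around 10). Predicting each next value with the average so far gives errors that are individually large but that cancel out over the long run.

```python
import random
import statistics

random.seed(4)

# Hypothetical sequence from a process that does not change over time.
sequence = [random.gauss(10.0, 3.0) for _ in range(20_000)]

errors = []
running_total = sequence[0]
for i in range(1, len(sequence)):
    prediction = running_total / i          # average of everything seen so far
    errors.append(sequence[i] - prediction)
    running_total += sequence[i]

mean_error = statistics.fmean(errors)                        # signed errors
mean_abs_error = statistics.fmean([abs(e) for e in errors])  # size of each miss
print(mean_error, mean_abs_error)
```

The "if nothing changes" condition is doing real work: if the process drifted partway through, the signed errors would stop cancelling.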

An obligatory caveat

Or rather, initial caveat, before all the other caveats I'm about to give you.

The principle of regression analysis is simple, but everything hinges on the execution.

I've intentionally spoken generally because I want to dispel any mystery around the principles.

But there are lots of ways of doing even the stuff I've talked about so far, and I've only talked about the simplest version applied to the most tractable data!

How do we actually use it, though?

A regression surface is just a descriptive statistic.

It can be used in any context in which the central tendency you choose is informative.

The relevant question is about utility, not truth!

But more typically...

We want to use it to make inferences.

These often take two forms:

Forecasting a future value
Making a causal inference.

Inference and randomization

Either of these forms of inference requires a little more than what we've seen so far.

We need not just a conditional expectation, but a distribution around that expectation.

This is where things like confidence intervals and p-values come in.

For example:

Here's the standard 95% confidence interval around our old friend.
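Assuming the band shown is the usual t-based interval for the mean response in a simple linear regression (the slide does not say which construction was used), it would take the following form. The symbols are notation introduced here, not taken from the slides.

```latex
\hat{y}(x_0) \pm t_{0.975,\,n-2}\; s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}},
\qquad
\hat{y}(x_0) = \hat{\beta}_0 + \hat{\beta}_1 x_0,\quad
s^2 = \frac{1}{n-2}\sum_{i=1}^{n}\bigl(y_i - \hat{y}(x_i)\bigr)^2,\quad
S_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2
```

The band is narrowest at the mean of x and widens toward the edges of the data, which is why it is usually drawn with a bow-tie shape.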

But beware!

This confidence interval is based on an assumption of random sampling. So are the p-values you'd get from it, if you're interested in testing a hypothesis!

That is, it assumes that you got your data from a random sample from the population of interest…

And that you're trying to predict the results of future random samples!

In other words

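We are tempted to interpret this sort of confidence interval as though the confidence level were the probability that a new output value will fall in the range we've marked out.

But it's not! The confidence level describes how often intervals built this way, across repeated random samples, would contain the true conditional expectation.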
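A small simulation can make the difference concrete. Everything in it is an assumption made for illustration: the true line (`true_mean`), the noise, and the use of the normal quantile (about 1.96) in place of the t quantile. It draws many random samples, builds the band at one x value each time, and counts how often the band contains the true average versus a fresh individual value.

```python
import math
import random
import statistics

random.seed(5)
Z = statistics.NormalDist().inv_cdf(0.975)  # normal stand-in for the t quantile


def true_mean(x):
    """Hypothetical true conditional expectation of y given x."""
    return 1.0 + 2.0 * x


def draw(x):
    """One new observation at x from the same random process."""
    return true_mean(x) + random.gauss(0.0, 1.0)


def band_at(xs, ys, x0):
    """Approximate 95% interval for the mean response at x0."""
    n = len(xs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    center = intercept + slope * x0
    half = Z * s * math.sqrt(1 / n + (x0 - mx) ** 2 / sxx)
    return center - half, center + half


x0, reps, n = 0.5, 2_000, 100
covers_mean = covers_new = 0
for _ in range(reps):
    xs = [random.uniform(0.0, 1.0) for _ in range(n)]
    ys = [draw(x) for x in xs]
    low, high = band_at(xs, ys, x0)
    covers_mean += low <= true_mean(x0) <= high   # does the band catch the average?
    covers_new += low <= draw(x0) <= high         # does it catch one new value?

mean_coverage = covers_mean / reps
new_value_coverage = covers_new / reps
print(mean_coverage, new_value_coverage)
```

Under these assumptions the band catches the true average close to 95% of the time, but catches a single new value far less often. And even the first figure depends on the samples really being random draws.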

But notice:

None of that stuff may actually apply to your data.

At least in MS, it almost never does!

The data I showed you weren't a random sample at all; they were the whole population!

So what's my main takeaway regarding prediction?

Beware of using a method developed for one specific context in a different context entirely!

If you want to break the rules, have an argument ready that the method is equally applicable in your novel context.

But there are other takeaways.

Do I even have time for this slide?

Ask the right questions
Beware overfitting
Your line will not be sensitive to changes in the world!

So how about

For example:

OLS is best
Beware better-fitting lines
Randomization – this is huge. We assume random samples
We also assume that the new process is a random draw – NOT a manipulation!
This is not for causal inference. It's completely silent on causal relationships
The models are fragile – best fit line changes, sometimes in response to stuff you haven't measured!
Your “prediction” may not capture what matters practically! Disadvantaged kids and scores.
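The fragility point can be sketched with invented data. In both samples below, y is built the same way from x and an unmeasured variable z; the only difference is whether z happens to move with x. `sample` and `slope_of_y_on_x` are hypothetical helpers, and the regression only ever sees x and y.

```python
import random
import statistics

random.seed(6)


def slope_of_y_on_x(xs, ys):
    """Least-squares slope of y on x alone (z is never seen)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sum((x - mx) ** 2 for x in xs)


def sample(n, z_tracks_x):
    """Draw (x, y) pairs; y always equals x + 2z + noise, with z unmeasured."""
    xs, ys = [], []
    for _ in range(n):
        x = random.gauss(0.0, 1.0)
        if z_tracks_x:
            z = x + random.gauss(0.0, 0.5)   # z moves with x in this world
        else:
            z = random.gauss(0.0, 1.0)       # z unrelated to x in this one
        xs.append(x)
        ys.append(1.0 * x + 2.0 * z + random.gauss(0.0, 1.0))
    return xs, ys


slope_when_z_tracks_x = slope_of_y_on_x(*sample(20_000, True))
slope_when_z_is_unrelated = slope_of_y_on_x(*sample(20_000, False))
print(slope_when_z_tracks_x, slope_when_z_is_unrelated)
```

The recipe for y never changed, yet the best-fit line did. Neither slope, taken alone, says what would happen to y if x were manipulated.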

Questions?

I think what I need to talk about is

randomization (in front of prediction; then one slide on other issues, e.g., fragility)

and then

selection inference vs causal inference (and then talk about other issues).

Questions?

Overview

Notes about the above:

An average doesn't really reflect what we want for short-term purposes if we're forecasting.
So we add a confidence interval!
But, 1: Assumes randomization
But, 2: Completely ignorant of causes

A relevant quote

“… the golden rule of causal analysis: No causal claim can be established by a purely statistical method, be it propensity scores, regression, stratification, or any other distribution-based design.”

-Judea Pearl, “Causality,” p. 350

Research links!

Now this is links.

Some important links, maybe not formatted this way:

Closing comments?

And this is a bibliography.

Coalition for Evidence-Based Policy (2013). Randomized Controlled Trials Commissioned by the Institute of Education Sciences Since 2002: How Many Found Positive Versus Weak or No Effects. Retrieved from http://coalition4evidence.org/wp-content/uploads/2013/06/IES-Commissioned-RCTs-positive-vs-weak-or-null-findings-7-2013.pdf

Farrington, D.P., Gottfredson, D.C., Sherman, L.W., & Welsh, B.C. (2002). The Maryland Scientific Methods Scale. In Farrington, D.P., MacKenzie, D.L., Sherman, L.W., & Welsh, B.C. (Eds.), Evidence-Based Crime Prevention (pp. 13-21). London: Routledge.

Ioannidis, J.P.A. (2005). Contradicted and Initially Stronger Effects in Highly Cited Clinical Research. Journal of the American Medical Association, 294(2), 218-228.

Manzi, J. (2012). Uncontrolled: The Surprising Payoff of Trial-and-Error for Business, Politics, and Society. New York: Perseus Books Group.

Pearl, J. (2009). Causality (2nd ed.). Cambridge: Cambridge University Press.

Zia, M. I., Siu, L. L., Pond, G. R., & Chen, E. X. (2005). Comparison of Outcomes of Phase II Studies and Subsequent Randomized Control Studies Using Identical Chemotherapeutic Regimens. Journal of Clinical Oncology, 23(28), 6982-6991.