Kirby Arinder
2023/08/09
There's probably a lot here.
The goals of this presentation:
Picture a simple set of points. It has one dimension.
Its central tendency is a point, i.e., zero-dimensional!
We can easily envision a two-dimensional set of points.
It, too, has a central tendency! (Actually many, but…)
The central tendency shown above is (trivially) identical to the expected value of the set.
Which is to say, it's the value you'd expect new members of this set, generated by the same process, to assume, on average, in the long term!
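(If code helps, here's a minimal sketch of that idea in Python, on invented numbers rather than anything from the slides: the mean of a one-dimensional set is both its central tendency and the best long-run guess for new members generated by the same process.)

    import random

    random.seed(0)
    points = [random.gauss(10, 2) for _ in range(1000)]    # a one-dimensional set of points

    mean = sum(points) / len(points)                       # its central tendency: a single point (0-D)

    # New members generated by the same process land near that mean, on average.
    new_points = [random.gauss(10, 2) for _ in range(1000)]
    print(round(mean, 2), round(sum(new_points) / len(new_points), 2))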
-The central tendency under discussion is one dimension less than its set.
-If we fix certain values, we can decrease the dimensionality of our expected values!
It's a lot simpler than it sounds.
First, let's introduce the ideas of independent and dependent variables.
Traditionally, in two dimensions, the independent variable is depicted on the x axis and the dependent on the y.
With that understanding, let's just hold the value on the x axis constant!
That blue line looks wonky, you might say!
Fair. It seems to us like the central tendency ought to run straight down the diagonal, right?
The short answer: this line is really a central tendency for values along the Y axis.
It minimizes the sum of the squared Y-axis distances from every point in the set to the central tendency itself (or regression surface – hey, there's our word!).
That's not super important right now, and there are lots of ways of drawing that line. I'm being as imprecise as I can get away with, because I want to be as general as possible.
The main takeaway:
Yeah, the line doesn't look the way it seems like it should, but it's measuring a specific central tendency that's precisely defined and serves a purpose.
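For anyone who wants the nuts and bolts, here's a minimal sketch of one common way of drawing that line (ordinary least squares, on made-up data rather than the figure's): it picks the slope and intercept that minimize the summed squared Y-axis distances.

    import random

    random.seed(1)
    xs = [random.uniform(0, 10) for _ in range(200)]
    ys = [2.0 * x + 1.0 + random.gauss(0, 3) for x in xs]     # a noisy two-dimensional set

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x                       # the fitted line: expected y at each x

    def sum_sq_error(a, b):
        """Summed squared vertical distances from the points to the line y = a*x + b."""
        return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))

    # Nudging the fit away from the least-squares solution can only increase the total.
    print(sum_sq_error(slope, intercept) <= sum_sq_error(slope + 0.1, intercept))  # True

Again, least squares is just one choice; the point is only that the wonky-looking line is optimizing something specific.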
We said before that the central tendency under discussion has one dimension less than the set to which it belongs.
This principle extends upwards!
These higher-dimensional sets can be hard to imagine (or depict).
By holding a number of dimensions constant, we can reduce the dimensionality of our expected value!
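Here's a hedged sketch of that, again on invented data (and leaning on numpy, which the talk doesn't assume): a three-dimensional set whose regression surface is a plane; fix one input and you get a line, fix both and you get a single expected value.

    import numpy as np

    rng = np.random.default_rng(2)
    x1 = rng.uniform(0, 10, 500)
    x2 = rng.uniform(0, 10, 500)
    y = 3.0 * x1 - 1.5 * x2 + 4.0 + rng.normal(0, 2, 500)     # a three-dimensional set of points

    X = np.column_stack([x1, x2, np.ones_like(x1)])           # design matrix with an intercept column
    (b1, b2, b0), *_ = np.linalg.lstsq(X, y, rcond=None)      # least-squares plane: 2-D surface in a 3-D set

    def plane(a, b):
        return b1 * a + b2 * b + b0                           # the full regression surface

    def line_at_x1_equals_5(b):
        return plane(5.0, b)                                  # hold x1 constant: down to a 1-D line

    point = line_at_x1_equals_5(2.0)                          # hold x2 constant too: a 0-D expected value
    print(round(point, 2))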
In theory, this has a number of useful properties. Notably:
And things like contemporary AI, at least if I've done my job.
It's a high-dimensional average that can be simplified down to give conditional expectations for the output of some function.
It's just like predicting the next entity in a sequence by picking the average of the sequence up to now.
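A toy sketch of that intuition (an invented sequence, nothing from the slides): predict each next element as the average of everything so far, and compare that with some other fixed guess.

    import random

    random.seed(3)
    sequence = [random.gauss(5, 1) for _ in range(2000)]

    def avg_squared_error(predict):
        errors = [(sequence[i] - predict(sequence[:i])) ** 2 for i in range(1, len(sequence))]
        return sum(errors) / len(errors)

    def running_mean(history):
        return sum(history) / len(history)                 # the average of the sequence up to now

    def rival_guess(history):
        return 4.0                                         # an arbitrary alternative prediction

    # Any single prediction can miss, but the running mean wins on average squared error.
    print(avg_squared_error(running_mean) < avg_squared_error(rival_guess))  # True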
You might not be right any particular time… but:
Or rather, an initial caveat, before all the other caveats I'm about to give you.
The principle of regression analysis is simple, but everything hinges on the execution.
I've intentionally spoken generally because I want to dispel any mystery around the principles.
But there are lots of ways of doing even the stuff I've talked about so far, and I've only talked about the simplest version applied to the most tractable data!
A regression surface is just a descriptive statistic.
It can be used in any context in which the central tendency you choose is informative.
The relevant question is about utility, not truth!
We want to use it to make inferences.
These often take two forms:
Either of these forms of inference requires a little more than what we've seen so far.
We need not just a conditional expectation, but a distribution around that expectation.
This is where things like confidence intervals and p-values come in.
Here's the standard 95% confidence interval around our old friend.
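For concreteness, here's a hand-rolled sketch of that textbook interval (toy data, not the figure's, and leaning on scipy for the t quantile): the 95% confidence interval for the conditional mean of Y at a chosen X, under the classical assumptions.

    import math
    import random
    from scipy.stats import t

    random.seed(4)
    xs = [random.uniform(0, 10) for _ in range(100)]
    ys = [2.0 * x + 1.0 + random.gauss(0, 3) for x in xs]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x

    resid_var = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / (n - 2)

    x0 = 5.0                                               # where we want the band
    y0_hat = slope * x0 + intercept                        # the conditional expectation at x0
    se = math.sqrt(resid_var * (1 / n + (x0 - mean_x) ** 2 / sxx))
    half_width = t.ppf(0.975, n - 2) * se                  # 95% interval for the *mean* of Y at x0

    print(round(y0_hat - half_width, 2), round(y0_hat + half_width, 2))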
This confidence interval is based on an assumption of random sampling. So are the p-values you'd get from it, if you're interested in testing a hypothesis!
That is, it assumes that you got your data from a random sample from the population of interest…
And that you're trying to predict the results of future random samples!
We are tempted to interpret this sort of confidence interval as though the confidence level is the likelihood that an output value will fall in the range we've marked out.
But it's not!
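Continuing the confidence-interval sketch from a couple of slides back (it reuses x0, y0_hat, and half_width from that code), you can check the misreading directly: the 95% band for the mean captures far fewer than 95% of individual outcomes.

    # New outcomes drawn from the same process at x0 land inside the band far
    # less than 95% of the time: the band describes uncertainty about the
    # line, not the spread of individual Y values.
    new_ys = [2.0 * x0 + 1.0 + random.gauss(0, 3) for _ in range(10_000)]
    coverage = sum(y0_hat - half_width <= y <= y0_hat + half_width for y in new_ys) / len(new_ys)
    print(round(coverage, 2))                              # typically well below 0.95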
None of that stuff may actually apply to your data.
At least in MS, it almost never does!
The data I showed you weren't a random sample at all; they were the whole population!
Beware of using a method developed for one specific context, in a different context entirely!
If you want to break the rules, have an argument ready that the method is equally applicable in your novel context.
Do I even have time for this slide?
I think what I need to talk about is
and then
Notes about the above:
“… the golden rule of causal analysis: No causal claim can be established by a purely statistical method, be it propensity scores, regression, stratification, or any other distribution-based design.”
-Judea Pearl, “Causality,” p. 350
Now for the links.
Some important links, maybe not formatted this way:
Closing comments?
Coalition for Evidence-Based Policy (2013). Randomized Controlled Trials Commissioned by the Institute of Education Sciences Since 2002: How Many Found Positive Versus Weak or No Effects. Retrieved from http://coalition4evidence.org/wp-content/uploads/2013/06/IES-Commissioned-RCTs-positive-vs-weak-or-null-findings-7-2013.pdf
Farrington, D.P., Gottfredson, D.C., Sherman, L.W., & Welsh, B.C. (2002). The Maryland Scientific Methods Scale. In Farrington, D.P., MacKenzie, D.L., Sherman, L.W., & Welsh, B.C. (Eds.), Evidence-Based Crime Prevention (pp. 13-21). London: Routledge.
Ioannidis, J.P.A. (2005). Contradicted and Initially Stronger Effects in Highly Cited Clinical Research. Journal of the American Medical Association, 294(2), 218-228.
Manzi, J. (2012). Uncontrolled: The Surprising Payoff of Trial-and-Error for Business, Politics, and Society. New York: Perseus Books Group.
Pearl, J. (2009). Causality (2nd ed.). Cambridge: Cambridge University Press.
Zia, M. I., Siu, L. L., Pond, G. R., & Chen, E. X. (2005). Comparison of Outcomes of Phase II Studies and Subsequent Randomized Control Studies Using Identical Chemotherapeutic Regimens. Journal of Clinical Oncology, 23(28), 6982-6991.