Causal Inference, Hypothesis Testing, Z-scores
POLS 3316: Statistics for Political Scientists

Tom Hanna

2023-10-28

Where we were

  • Standard Errors - how far sample means typically fall from the population mean

  • Z-scores - number of standard errors between a sample mean and the population mean

              + Z-score tables - convert a Z-score into a probability

Today

  • Look at Z-tables and formula
  • Talk about statistics and causal inference

Statistics: Cause and effect

  • What is a cause?
  • Causes are complicated!
  • The Fundamental Problem
  • Hypothesis Testing
  • So what does a hypothesis test tell us?
  • Bayes Rule Again

What is a cause?

  • the thing that Y would not happen without
  • If X does not exist, Y will not happen

Except…

Causes are complicated!

  • Multiple causes
  • Causes may be necessary but not sufficient (conditional)
  • Causes may be sufficient but not necessary (more than one possible cause)
  • Some causes may be neither
  • The randomness factor (stochastic factor)
  • Intermediate steps (mediating or moderating variables)
  • Reverse causation (endogeneity)

These are easy compared to the big issue…

The Fundamental Problem of Causal Inference

The Fundamental Problem

  • We can’t observe the “what if”?
  • Technical term: counterfactual
  • We don’t know if Y would have happened without X because X happened
  • Huge problem for observational studies
  • Experimental design manipulates the data generation: partial solution
  • Observational studies rely on our treatment of the data: partial solution
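A rough simulation can make the counterfactual idea concrete. This is only a sketch under made-up assumptions: the effect size, the outcome distribution, and the coin-flip assignment are all hypothetical. Each unit has two potential outcomes, but we only ever get to see one of them. With random assignment, the difference in group means still lands close to the true average effect.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

N = 10_000
TRUE_EFFECT = 2.0  # hypothetical effect of X on Y, chosen for illustration

units = []
for _ in range(N):
    y_without_x = random.gauss(10, 3)      # outcome if X does not happen
    y_with_x = y_without_x + TRUE_EFFECT   # outcome if X happens
    treated = random.random() < 0.5        # coin-flip assignment of X
    # Only one potential outcome is observed; the other is the counterfactual.
    observed = y_with_x if treated else y_without_x
    units.append((treated, observed))

treated_outcomes = [y for t, y in units if t]
control_outcomes = [y for t, y in units if not t]

estimated_effect = (sum(treated_outcomes) / len(treated_outcomes)
                    - sum(control_outcomes) / len(control_outcomes))
print(f"true average effect: {TRUE_EFFECT}, estimated: {estimated_effect:.2f}")
```

Notice that the code never compares `y_with_x` and `y_without_x` for the same unit after assignment. That comparison is exactly what real data never gives us.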

Approach hypothesis testing

  • Our approach to hypothesis testing is part of the solution to the fundamental problem
  • Our interpretation of hypothesis testing is driven by the fundamental problem

Hypothesis Testing

  • Null hypothesis: any apparent effect on Y is due only to random chance
  • Null hypothesis: as if X didn’t exist
  • design the model so that: null hypothesis ~ the counterfactual

That’s aspirational

Standard Errors, Z-Scores, and Z-Tables

  • Standard error: Standard deviation of the sampling distribution of the mean

SE = \(\frac{\sigma}{\sqrt{n}}\)

  • Z-score: number of standard errors from the mean

\(Z = \frac{\bar{x}-\mu}{SE}\) or \(Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}\)

Z-Tables

Z-table

Example:

  • Mean height of UH students is 5’10”
  • Standard deviation of height is 3”
  • Sample of 100 students
  • Mean height of sample is 5’9”

Is the sample mean height significantly shorter than the population mean height?

Is this a one-tailed or two-tailed test?

  • A one-tailed test is directional
  • For example, is the sample mean greater than the population mean?
  • A two-tailed test is non-directional
  • For example, is the sample mean different from the population mean?
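Here is one way to see the difference between the two kinds of test in code. It is a sketch that uses Python's standard-library `NormalDist`. The function name `tail_probabilities` and the example Z-score of -2.0 are made up for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def tail_probabilities(z):
    """Return lower-tail, upper-tail, and two-tailed probabilities for a Z-score."""
    lower = standard_normal.cdf(z)      # one-tailed: is the sample mean smaller?
    upper = 1 - lower                   # one-tailed: is the sample mean greater?
    two_tailed = 2 * min(lower, upper)  # two-tailed: is the sample mean different?
    return lower, upper, two_tailed

# -2.0 is an arbitrary example value, not a number from the lecture
lower, upper, two_tailed = tail_probabilities(-2.0)
print(f"lower: {lower:.4f}, upper: {upper:.4f}, two-tailed: {two_tailed:.4f}")
```

For the same Z-score, the two-tailed probability is twice the smaller one-tailed probability, which is why a two-tailed test is the harder one to pass.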

What probability are we looking for?

  • Our required confidence level is 95%
  • Our required significance level is 5%
  • We are looking for a probability of 5% or less
  • Also called a p-value of 0.05 or less
  • The 0.05 cutoff itself is called \(\alpha\) (alpha)

p < .05

What is our Standard Error?

  • SE = \(\frac{\sigma}{\sqrt{n}}\)

  • \(\frac{3}{\sqrt{100}} = 0.3\)
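The same arithmetic as a small Python sketch. The helper name `standard_error` is made up, and the sample sizes 25 and 400 are added only for comparison with the example's 100.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: population standard deviation over sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 3  # standard deviation of height in inches, from the example

# 100 is the example's sample size; 25 and 400 are added for comparison
for n in (25, 100, 400):
    print(f"n = {n}: SE = {standard_error(sigma, n):.2f}")
```

Quadrupling the sample size only cuts the standard error in half, because n sits under a square root.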

What is our Z-score?

  • \(Z = \frac{\bar{x}-\mu}{SE}\) or \(Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}\)

  • \(Z = \frac{69 - 70}{0.3} = -3.33\) (heights in inches: 5'9" = 69, 5'10" = 70)
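A sketch of the whole height example in Python, with heights converted to inches. The variable names are made up. The lower-tail probability comes from the standard library's normal distribution, so it may differ slightly from a printed Z-table because of rounding.

```python
import math
from statistics import NormalDist

population_mean = 5 * 12 + 10  # 5'10" = 70 inches
sample_mean = 5 * 12 + 9       # 5'9"  = 69 inches
sigma = 3                      # population standard deviation in inches
n = 100                        # sample size

se = sigma / math.sqrt(n)                 # standard error of the mean
z = (sample_mean - population_mean) / se  # standard errors from the population mean

# Lower-tail probability: chance of a sample mean this low or lower
# if the sample really came from the stated population.
p_lower = NormalDist().cdf(z)
print(f"SE = {se:.2f}, Z = {z:.2f}, lower-tail probability = {p_lower:.5f}")
```

The probability is far below 0.05, so a sample mean this low would be very unusual if the sample really came from that population.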

Z-Table

So what does a hypothesis test tell us?

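Critical Z-values at the 5% significance level (95% confidence):

  • Two-tailed test: reject the null if Z < -1.96 or Z > 1.96
  • One-tailed test: reject the null if Z > 1.65 (upper tail) or Z < -1.65 (lower tail), depending on the direction tested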
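The critical values do not have to be memorized; they come from the inverse of the normal curve. Below is a sketch using Python's standard library. The function names are made up, and the cutoffs assume a 5% significance level.

```python
from statistics import NormalDist

alpha = 0.05
standard_normal = NormalDist()

# Two-tailed: split alpha across both tails, so look up the 97.5th percentile.
two_tailed_cutoff = standard_normal.inv_cdf(1 - alpha / 2)
# One-tailed: all of alpha sits in one tail, so look up the 95th percentile.
one_tailed_cutoff = standard_normal.inv_cdf(1 - alpha)

def reject_two_tailed(z):
    """Reject the null when Z is far from zero in either direction."""
    return abs(z) > two_tailed_cutoff

def reject_lower_tail(z):
    """Reject the null when Z is far below zero (a 'smaller than' test)."""
    return z < -one_tailed_cutoff

print(f"two-tailed cutoff: {two_tailed_cutoff:.3f}")
print(f"one-tailed cutoff: {one_tailed_cutoff:.3f}")
```

The one-tailed cutoff is about 1.645, which Z-tables usually round to 1.64 or 1.65.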

So what does a hypothesis test tell us?

  • |Z| < 1.96: “the null hypothesis is retained”

      - The Theory is Wrong
      - As written
      - In some way
  • |Z| > 1.96: “the null hypothesis is rejected”

      - The Theory is Right??

NO!!!!!!

So what does a hypothesis test tell us?

  • |Z| > 1.96: “the null hypothesis is rejected”

The evidence supports the hypothesis.

The evidence is consistent with the theory.

The null hypothesis is rejected and the evidence is consistent with the hypothesized effect.

What about certainty and proof?

Back to Bayes Rule

Bayes Rule

What does this tell us?

  • We need to be precise about what we mean by a cause

  • We need to understand what statistics can tell us about causation and what it can’t

      - Correlation does not *prove* causation 
      - but correlation can *help establish* causation
    
      - We need to understand the limits of data and statistics
      - We also need to understand the capabilities of data and statistics

Everything after here is draft notes for your reading. Beware of typos, etc.

Some of the things in these notes are from courses I took, some are from assorted books, some are from these two sources which are at least somewhat readable and free:

https://egap.org/resource/10-things-to-know-about-hypothesis-testing/

https://egap.org/resource/10-things-to-know-about-causal-inference/

1 - Correlation \(\neq\) causation.

  • Correlation does imply a relationship
  • Relationship may involve some cause and effect somewhere
  • The relationship could go either direction
  • The relationship could involve other variables
  • Lack of correlation doesn’t necessarily mean there is no causal relationship: correlation measures linear association, and causal effects are not always linear
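A quick sketch of that last point, using made-up data where Y is completely determined by X but the relationship is a curve instead of a line. The helper `pearson_r` is a hand-written version of the usual correlation formula.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient, written out from its definition."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

x = list(range(-5, 6))           # -5, -4, ..., 5
y = [value ** 2 for value in x]  # Y depends entirely on X, but not linearly

r = pearson_r(x, y)
print(f"Pearson correlation: {r:.3f}")
```

The correlation comes out at zero even though X fully determines Y, because the rise on one side of the curve cancels the fall on the other.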

2 - A cause is a claim about something that did not happen

  • If we say X caused Y, we mean: if X had not happened, Y would not have happened, everything else being held the same.

3. The Fundamental Problem of Causal Inference
  • Our proposed cause, which did happen, is the factual
  • The thing that didn’t happen is called the counterfactual
  • We can’t actually observe the thing that didn’t happen
  • The inability to observe the counterfactual is the fundamental problem of causal inference
  • Experiments are a potential way around this

4. A cause has to involve a possible manipulation of circumstances under which the counterfactual could have occurred

5. Statistics looks for average causal effects

Statistics are about average causal effects, not single data points or individual effects. The average effects may conflict with anecdotal evidence. This is partially because…

6. There can be multiple causes.

The technical phrase here is: Causes are non-rival.
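A toy illustration of how an average effect can sit alongside a contrary anecdote when other causes are also at work. Everything here is hypothetical: the effect size, the eight units, and the "other causes" contributions are made-up numbers.

```python
EFFECT_OF_X = 2  # hypothetical effect of X on Y

# Each unit: (has X?, contribution of all other causes to Y). Made-up numbers.
units = [
    (True, 5), (True, 1), (True, 3), (True, -3),
    (False, 5), (False, 1), (False, 3), (False, -3),
]

# Y is the sum of X's effect (if X is present) and the other causes.
outcomes = [(has_x, (EFFECT_OF_X if has_x else 0) + other) for has_x, other in units]

with_x = [y for has_x, y in outcomes if has_x]
without_x = [y for has_x, y in outcomes if not has_x]

average_difference = sum(with_x) / len(with_x) - sum(without_x) / len(without_x)
print(f"average difference: {average_difference}")
print(f"worst outcome with X: {min(with_x)}, best outcome without X: {max(without_x)}")
```

The average difference matches the effect of X, yet one unit with X ends up worse off than a unit without it. That single case is the anecdote, and it does not contradict the average.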

7. Causes can be…

  • necessary
  • sufficient
  • neither
  • or both

and still be causes
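One way to keep necessary and sufficient straight is to write the definitions as checks over a list of cases. This is a sketch with hypothetical cases and made-up function names. It only describes the pattern in the cases listed and says nothing about cases we have not seen.

```python
def is_necessary(cases):
    """X looks necessary if Y never occurs without X."""
    return all(x for x, y in cases if y)

def is_sufficient(cases):
    """X looks sufficient if Y always occurs when X does."""
    return all(y for x, y in cases if x)

# Each pair is (X present?, Y occurred?). All cases are hypothetical.
necessary_only = [(True, True), (True, False), (False, False)]
sufficient_only = [(True, True), (False, True), (False, False)]
neither = [(True, True), (True, False), (False, True), (False, False)]
both = [(True, True), (False, False)]

for label, cases in [("necessary only", necessary_only),
                     ("sufficient only", sufficient_only),
                     ("neither", neither),
                     ("both", both)]:
    print(label, is_necessary(cases), is_sufficient(cases))
```

In the "neither" cases X could still be a cause: it might raise the chances of Y without guaranteeing it or being required for it.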

8. Measuring effects is easy

It’s a lot easier to measure effects than to find causes.

What can statistics do for us?

  • The Null Hypothesis and counterfactuals

              + We can measure how likely a result at least as extreme as ours would be if only random chance were at work (that is, if the null hypothesis were true)
              + Formal hypothesis tests give us this value, the *p-value*
              + Theory provides an *alternative hypothesis* which we believe to be true based on the theory
              + Well-designed hypotheses can help with the unobserved counterfactual
              + When we reject the null, we can say that "the evidence is consistent with the alternative hypothesis" and the theory
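The idea of a p-value can also be sketched by brute force: build the world where the null hypothesis is true, draw many samples from it, and count how often chance alone produces a result as extreme as the one observed. The numbers below reuse the height example (population mean 70 inches, standard deviation 3, sample of 100, observed mean 69). The number of repetitions and the assumption of normally distributed heights are choices made for this illustration.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

POP_MEAN, POP_SD, N = 70, 3, 100  # heights in inches, from the height example
OBSERVED_MEAN = 69                # the sample mean from the example
REPS = 5_000                      # number of simulated samples (arbitrary choice)

def simulated_sample_mean():
    """Mean of one sample drawn from the null-hypothesis population."""
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    return sum(sample) / N

# Count simulated samples at least as low as the observed mean (one-tailed).
as_extreme = sum(simulated_sample_mean() <= OBSERVED_MEAN for _ in range(REPS))
p_value = as_extreme / REPS
print(f"simulated p-value: {p_value:.4f}")
```

The simulated value is only an approximation and changes with the seed and the number of repetitions, but it should come out well under 0.05, in line with the Z-table approach.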

Authorship, License, Credits

Creative Commons License

Z-Table image from: https://byjus.com/maths/z-score-table/

Full Z-Table from unknown course I took sometime in the last 8 years

Other images referenced in previous lectures