Lecture 4

Administrative Miscellanea

  • Homework 4 due Friday
  • Quiz 2 today
  • Problem Set 1 Graded - Partial Credit available until Friday
  • Exam next Wednesday
    • Study materials will be available this week

Hypothesis testing: motivating question

  • Run the regression \(y_i=\beta_0+\beta_1 x_i + \varepsilon_i\)
  • Suppose the true value is \(\beta_1=0\). Will we obtain \(\hat\beta_1=0\)?
  • No. Even without bias we will still have some random variation due to sampling
  • We know that \(\hat\beta_1\sim N\left(\beta_1,\frac{\sigma_\varepsilon^2}{n\sigma_x^2}\right)\)
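  • A minimal simulation sketch in R (hypothetical data, not from the course) makes this concrete: even with \(\beta_1=0\), the estimate comes out nonzero

set.seed(1)
# true model: beta1 = 0, so x has no effect on y
x <- rnorm(100)
y <- 2 + 0 * x + rnorm(100)
coef(lm(y ~ x))["x"]  # small but nonzero, purely from sampling variation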

Probabilities under continuous random variables

  • This is a probability density function
  • What is the probability that \(X=0\)? It’s exactly 0
    • We’ll see why in a second

Probabilities under continuous random variables

  • We want to find the area of the shaded region (between -1 and 1)
  • The function pnorm \((=\Phi(z))\) gives \(P(X\le z)\)
  • How do we calculate \(P(-1\le X\le 1)\)?

Probabilities under continuous random variables

  • We grab the full area to the left of 1 (pnorm(1) = green + red area), then subtract out the area to the left of -1 (pnorm(-1))
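  • In R, assuming a standard normal:

pnorm(1) - pnorm(-1)  # area between -1 and 1
[1] 0.6826895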

Question: Probabilities

  • \(\hat\beta_1\sim N(0,1)\). What is the probability that \(\hat\beta_1\) is between -2 and 1, \(P(-2<\hat\beta_1<1)\)?
       x area_to_left
 1: -3.0        0.001
 2: -2.5        0.006
 3: -2.0        0.023
 4: -1.5        0.067
 5: -1.0        0.159
 6: -0.5        0.309
 7:  0.0        0.500
 8:  0.5        0.691
 9:  1.0        0.841
10:  1.5        0.933
11:  2.0        0.977
12:  2.5        0.994
13:  3.0        0.999

Answer: Probabilities

pnorm(1)-pnorm(-2)
[1] 0.8185946

Question: calculating z scores

  • Given a standard normal (\(Z \sim N(0,1)\)) we can calculate \(P(a<Z<b)\) with \(\Phi(b)-\Phi(a)\), where \(\Phi\) is the pnorm function we used earlier
  • If \(X\sim N(\mu,\sigma)\), how do we calculate \(P(a<X<b)\)?
  • Standardize: subtract the mean and divide by the standard deviation to obtain the z score. This is still a normal random variable
  • \(P(a<X<b) = P(\frac{a-\mu}{\sigma} < \frac{X-\mu}{\sigma} < \frac{b-\mu}{\sigma}) = P(z_a<Z<z_b)\)
  • \(= \Phi(z_b)-\Phi(z_a)\)
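  • A worked example with made-up numbers: if \(X\sim N(3,2)\), then \(P(1<X<5)\) standardizes to \(P(-1<Z<1)\)

# z_a = (1 - 3)/2 = -1, z_b = (5 - 3)/2 = 1
pnorm(1) - pnorm(-1)
[1] 0.6826895
# equivalently, pnorm accepts the mean and sd directly
pnorm(5, mean = 3, sd = 2) - pnorm(1, mean = 3, sd = 2)
[1] 0.6826895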

Question: z scores normal

  • What is the probability that \(\hat\beta_1\) is between -2 and 2, but now \(\hat\beta_1\sim N(1,2)\), i.e. it now has mean 1 and standard error of 2? First step?
  • \((\hat\beta_1 -1)/2 \equiv z\sim N(0,1)\)
  • Now translate the probability: \(-2<\hat\beta_1<2\implies -1.5<z<0.5\)

Answer: z scores normal

       x area_to_left
 1: -3.0        0.001
 2: -2.5        0.006
 3: -2.0        0.023
 4: -1.5        0.067
 5: -1.0        0.159
 6: -0.5        0.309
 7:  0.0        0.500
 8:  0.5        0.691
 9:  1.0        0.841
10:  1.5        0.933
11:  2.0        0.977
12:  2.5        0.994
13:  3.0        0.999
pnorm(0.5) - pnorm(-1.5)
[1] 0.6246553

t distribution vs Z

  • Asymptotically (as \(n\to\infty\)) \(\hat\beta_1\) is normally distributed, but for small \(n\) it actually follows a t distribution
  • The t distribution has a mean and standard deviation, but also degrees of freedom (for simple regression, \(df=n-2\): the number of observations minus the number of estimated coefficients)
  • Nothing actually changes, except when we look up a table or calculate in R we use a slightly different function
  • In R: pt(z,df) instead of pnorm(z)
  • It is rare to have small samples in econometrics, and at typical sample sizes the two distributions give basically the same result
  • R regression output gives you t values by default
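  • A quick comparison in R (98 degrees of freedom here is just an example):

pnorm(1.96)        # standard normal: about .975
pt(1.96, df = 98)  # t distribution: nearly identical at this df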

T distribution: graph

Flipping the question

  • We know that even if \(\beta_1=0\) we may still get nonzero estimates \(\hat\beta_1\)
  • If we obtain \(\hat\beta_1=1\), what is the probability that \(\beta_1=0\)? i.e. what is the probability there is actually no effect but we obtain a nonzero result?
  • This is impossible to answer, but we can get a suggestive answer from the prior exercise
  • We assume the null hypothesis (\(\beta_1=0\)) and calculate the probability that we obtain a value at least as extreme as \(\hat\beta_1=1\): with \(\hat\beta_1\sim N(0,1)\), \(p=P(|\hat\beta_1|>1)=2(1-\Phi(1))\approx .32\)

Adding Hypothesis Testing Jargon

  • We are testing the hypothesis that x causes y, i.e. that \(\beta_1\neq0\) (we are explicitly assuming exogeneity at the moment). We label these hypotheses.
    • \(H_0\) is always the null hypothesis: that no effect exists. Here \(H_0: \beta_1=0\)
    • \(H_1\) is the alternative hypothesis, here that there is an effect: \(H_1: \beta_1\neq0\). We could have also explicitly tested \(\beta_1>0\) or \(\beta_1<0\)

Adding Hypothesis Testing Jargon

  • We don’t have enough information to calculate the probability that \(\beta_1=0\) directly. Instead we ask: if \(H_0\) is true, what is the probability that we observe a result as extreme as \(\hat\beta_1\)? Suppose we obtained an estimate of \(\hat\beta_1=1\)
    • \(P(|\hat\beta_1|>1\ |\ \beta_1=0)\)
    • This is called the p value, here \(p=1-.68=.32\)
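  • Computing this p value in R (a sketch assuming \(se(\hat\beta_1)=1\), as above):

1 - (pnorm(1) - pnorm(-1))  # P(|beta1_hat| > 1) under the null
[1] 0.3173105
2 * (1 - pnorm(1))          # equivalent, by symmetry
[1] 0.3173105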

Adding Hypothesis Testing Jargon

  • We set up a criterion for rejecting \(H_0\), which we call the significance level \(\alpha\) (normally .05 or .01). If the probability of obtaining such a result under the null is small, we reject the null and conclude the effect is not statistical noise; otherwise we conservatively fail to reject \(H_0\)
    • Here \(p=.32 > \alpha=.05\) so we fail to reject \(H_0\)
    • There could still be an effect (on average it’s 1), it’s just too small to confidently conclude it wasn’t random chance

Adding Hypothesis Testing Jargon

  • Under this setup, we will reject \(H_0\) in \(\alpha=5\%\) of cases where \(\beta_1=0\), purely by chance (the type I error rate, or false positive rate)
  • We can also fail to reject \(H_0\) (and conclude there is no effect) even when \(\beta_1\neq 0\), called a type II error or false negative
    • This occurs with probability \(\beta\), which requires additional assumptions to calculate

Hypothesis testing: Type 1 and Type 2 errors

  • We can either reject or fail to reject \(H_0\); additionally, \(H_0\) can either be true or false (note that we can never observe this). This leads to 4 possibilities

Hypothesis testing: Type 1 and Type 2 errors

  • \(H_0\) is true and you fail to reject \(H_0\): correct decision
  • \(H_0\) is true and you reject \(H_0\): false positive (type I error). This happens with probability \(\alpha\)
  • \(H_0\) is false and you fail to reject \(H_0\): false negative (type II error). This happens with probability \(\beta\), which requires additional assumptions to calculate
  • \(H_0\) is false and you reject \(H_0\): correct decision

Power Curve

  • Let \(\alpha=.05\), \(H_0: \hat\beta_1\sim N(0,1)\). Then we fail to reject \(H_0\) if \(-1.96<\hat\beta_1<1.96\)
  • The probability of a type I error is .05. The probability of a type II error depends on what \(\beta_1\) is
  • Calculate this for each value of \(\beta_1\)
  • The power curve graphs the “power”, \(1-\beta\) (so that a power of 1 corresponds to a type II error rate of 0)
  • Because this is symmetric we normally only graph the right hand side (which we can reference as \(|\beta_1|\))
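  • A sketch of this calculation in R, assuming \(se(\hat\beta_1)=1\) so that \(\hat\beta_1\sim N(\beta_1,1)\) under the true effect:

# power: probability beta1_hat lands outside (-1.96, 1.96) given the true beta1
power <- function(b1) 1 - (pnorm(1.96 - b1) - pnorm(-1.96 - b1))
power(0)  # .05: when H0 is true, the rejection rate is alpha
power(1)  # about .17: small effects are usually missed
power(3)  # about .85: large effects are usually detected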

Power Curve: Graph

Power: Implications

  • Suppose my outcome variable is number of years of education achieved. A result of .0001 years may end up being statistically significant, but not practically significant
  • Suppose I determine that anything below 1 year is not practically significant. Then I can look up \(\beta_1=1\) in the power curve and determine my probability of detecting an effect of that size
  • In the prior graph I only have a 20% chance of being able to detect such a change. I need an effect of around 3 just to have an 80% chance of detecting an effect

Power: Implications

  • Given power, we can now interpret results of hypothesis testing. Assuming all of our modeling assumptions hold, then:
  • If we reject \(H_0\) then we know this is likely a real effect, though we must interpret it since it could be incredibly small in practice
  • If we fail to reject \(H_0\) then it may have just been random noise, but there also could have been a sizeable effect that was missed due to low power (e.g. \(\beta_1=1\))

Power: Implications

  • If we have a large sample size then our interpretation is usually clear: we know whether there is an effect, and we can judge the size based on practical significance
  • This is conditional on all other assumptions! This is why economic papers focus most of their effort on methodology and very little on p values

Hypothesis Testing: Example

  • We randomly assign students to classrooms with either 10 or 20 students in an experiment to determine the effect of class size on test scores. We run the OLS regression \(score_i=\beta_0+\beta_1 size_i + \varepsilon_i\) and obtain \(\hat\beta_1 = 0.05\), \(se(\hat\beta_1)= 0.04\). We wish to know whether class size affects student test scores, and use a significance level of \(\alpha=.05\)
  • Write \(H_0, H_1\) using a two tailed test
  • Calculate the standardized values (z scores) for the area you’re testing

Hypothesis Testing: Example

  • We randomly assign students to classrooms with either 10 or 20 students in an experiment to determine the effect of class size on test scores. We run the OLS regression \(score_i=\beta_0+\beta_1 size_i + \varepsilon_i\) and obtain \(\hat\beta_1 = 0.05\), \(se(\hat\beta_1)= 0.04\). We wish to know whether class size affects student test scores, and use a significance level of \(\alpha=.05\)
  • Compute the p value
  • Determine your decision

Hypothesis Testing: Example

  • \(H_0:\beta_1=0, H_1: \beta_1\neq 0\)
  • \(z=.05/.04=1.25\); we need \(P(|Z|\ge 1.25)\)
  • \(p=1-(\Phi(1.25)-\Phi(-1.25)) = 1- .789=.211\)
  • Fail to reject \(H_0\), since \(p=.211>\alpha=.05\)
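  • Checking the p value in R:

1 - (pnorm(1.25) - pnorm(-1.25))
[1] 0.2112995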

Errors: Examples; categorization

  • You are given the following output from R on a regression of y vs x. Do you reject \(H_0: \beta_1=0\) at \(\alpha=.05\)?

Call:
lm(formula = y ~ x)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.21741 -0.65004 -0.04366  0.47332  3.08603 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept) -0.18679    0.09869  -1.893   0.0613 .
x            0.01065    0.10383   0.103   0.9185  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.986 on 98 degrees of freedom
Multiple R-squared:  0.0001074, Adjusted R-squared:  -0.0101 
F-statistic: 0.01053 on 1 and 98 DF,  p-value: 0.9185

Confidence Intervals

  • Suppose we reject \(H_0\) and have \(\hat\beta_1=1, se(\hat\beta_1)=0.1\).
  • We know that there is likely an effect (if all assumptions are met), and that the most likely value of \(\beta_1\) is 1
  • We would like to know the likely range of values \(\beta_1\) could actually be
  • Instead of testing a hypothesis, we can form an interval around \(\hat\beta_1\) that captures 95% (or 99%, etc) of the likely range of values it can take on.

Confidence Intervals

  • The calculation is essentially the same, and we call it a confidence interval: \(CI_{0.95} = \hat\beta_1 \pm z_{.95}se(\hat\beta_1)\)
    • \(-1.96<z<1.96\) covers an area of \(95\%\), so we have \(CI_{0.95} = 1 \pm 1.96\times 0.1 = 1\pm .196 = (.804,1.196)\)
    • How do we obtain 1.96? qnorm(.975). \(.975=1-.05/2\), i.e. we take \(\alpha\), divide it between our two tails, and look up the value in our table
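  • In R:

qnorm(0.975)  # the 1.96 critical value
[1] 1.959964
1 + c(-1, 1) * qnorm(0.975) * 0.1  # interval for beta1_hat = 1, se = .1
[1] 0.8040036 1.1959964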

Confidence Intervals

  • We say that we are 95% confident that the true value of \(\beta_1\) is between .804 and 1.196
    • We don’t say that there’s a 95% chance that \(\beta_1\) is in this range, because that statement isn’t true: \(\beta_1\) is a fixed number, and it is the interval, not the parameter, that varies from sample to sample. We’re also making a lot of assumptions when calculating this value

Confidence Intervals: Example

  • \(\hat\beta_1=3\), \(se(\hat\beta_1)=2\). Calculate a 99% confidence interval for \(\beta_1\)
  • \(3\pm 2.576*2=3\pm 5.152 = (-2.152,8.152)\)
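  • In R (using the exact critical value rather than the rounded 2.576):

qnorm(0.995)
[1] 2.575829
3 + c(-1, 1) * qnorm(0.995) * 2
[1] -2.151659  8.151659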

When Hypothesis testing goes wrong

  • The tea example is quite compelling, but in the real world hypothesis testing is widely misinterpreted
  • Some journals are now banning significance based hypothesis testing, and the American Statistical Association has had to put out statements regarding the misuse of p values
  • When we run a regression model, the hypothesis we reject is not simply that \(\beta_1=0\) (if it were, rejecting it would imply \(\beta_1\neq0\), i.e. a causal relationship). Rather, we are rejecting the joint claim that \(\beta_1=0\) and all of our additional modeling assumptions are correct

When Hypothesis testing goes wrong

  • In particular, we assumed in calculating the distribution of \(\hat\beta_1\) that this was an unbiased estimate (exogeneity), and we calculated the variance using a formula that assumed uncorrelated error terms.
  • When we reject \(H_0\) we are just saying that at least one of these assumptions is (statistically) incorrect
  • Note that if we have an extremely large sample size even slight differences will be significant (e.g. \(\beta_1=.001\neq0\)), but this means even very small modeling assumption errors will also result in significance. On big data you will almost always get p<.01, but that means virtually nothing

Cluster correlation

When Hypothesis testing goes wrong: real data


Call:
lm(formula = grade ~ attendance, data = dt)

Residuals:
    Min      1Q  Median      3Q     Max 
-63.514  -8.129   0.613   9.083  40.188 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   54.011      3.703  14.588  < 2e-16 ***
attendance    38.164      4.377   8.719 9.41e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 15.85 on 206 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.2695,    Adjusted R-squared:  0.266 
F-statistic: 76.02 on 1 and 206 DF,  p-value: 9.408e-16

When Hypothesis testing goes wrong: real data

  • Our p value is 0.000000000000000941. Highly Significant!
  • We reject \(H_0\) that \(\beta_1=0\), but we absolutely cannot claim that this is a causal effect because there is no exogeneity.
  • Rather, we’re concluding that the assumptions cannot all hold simultaneously: it almost certainly cannot be the case that \(\beta_1=0\) AND we have exogeneity in our model AND our error terms are uncorrelated AND our true model is linear.
  • Note that this says almost nothing!

Yet another type of wrong

So is there any use for hypothesis testing?

  • First, you must show that your modeling assumptions are valid. For example, claim exogeneity by using a randomized control trial or other compelling quasi-experimental design
  • Second, make conservative estimates in your other assumptions. e.g. overstating the variance in your model will be more convincing than understating your variance
  • Third, interpret your results within the context of how precise your estimate is
  • Significance testing is a small but important part of research

Hypothesis testing: Summary of steps

  • Calculate \(\hat\beta_1\) and \(se(\hat\beta_1) (=\sigma_{\hat\beta_1})\) (these will be given to you in the output of a regression)
  • Compute the z-score (the number of standard deviations from the mean) : \(z = \frac{\hat\beta_1-0}{se(\hat\beta_1)}\)
  • Calculate \(P(|Z|\ge z)\), where \(Z\sim N(0,1)\)
    • Look it up in a normal table, or use 1-(pnorm(z)-pnorm(-z)) in R
  • Compare p to \(\alpha\). If \(p<\alpha\), reject \(H_0\), otherwise fail to reject \(H_0\)
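  • A minimal sketch of the whole procedure in R, reusing the illustrative class-size numbers from earlier:

beta1_hat <- 0.05
se <- 0.04
z <- (beta1_hat - 0) / se        # z score: 1.25
p <- 1 - (pnorm(z) - pnorm(-z))  # two-sided p value: about .211
p < 0.05                         # FALSE, so we fail to reject H0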

Hypothesis testing: Summary of steps

  • For a confidence interval, instead calculate \(\hat\beta_1 \pm z_\alpha \cdot se(\hat\beta_1)\)
    • \(z_\alpha\) is looked up via a table, or with qnorm(1-\(\alpha\)/2) in R. It’s 1.96 for 95% confidence