Wednesday April 3 Class Notes

Review of Vocabulary

A graphical way of thinking about hypothesis testing.

App for playing with Significance and Power

fetchData("mHypTest.R")
mHypTest()  # by default, a coefficient

Power

How did I know that I could reject the Null in the shuffling problem? I did a little simulation.

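mysim <- function(n = 15) {
    # Pick n days of the month at random, with replacement
    days = resample(1:31, size = n)
    # For each day, draw a whole number between 1 and that day
    nums = ceiling(runif(n, min = 0, max = days))
    # Regress the drawn number on the day
    mod = lm(nums ~ days)
    # Return R^2 and the p-value on the days coefficient
    list(r2 = r.squared(mod), p = summary(mod)$coef[2, 4])
}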
s15 = do(1000) * mysim(24)  # typical R^2 is about 0.4
mean(~r2, data = s15)

## [1] 0.2384

tally(~p < 0.05, data = s15, format = "proportion")

## 
##  TRUE FALSE Total 
## 0.959 0.041 1.000
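
The same simulation can be repeated at several sample sizes to see how power grows with n. The following is a minimal sketch in base R, so sample() stands in for resample() and replicate() for do(). The helper names sim_p and power_at, the seed, and the particular sample sizes are illustrative choices, not part of the class code.

```r
# Base-R sketch: sample() replaces mosaic's resample(), replicate() replaces do()
sim_p <- function(n) {
    days <- sample(1:31, size = n, replace = TRUE)   # random days of the month
    nums <- ceiling(runif(n, min = 0, max = days))   # whole number between 1 and the day
    summary(lm(nums ~ days))$coef[2, 4]              # p-value on the days coefficient
}

# Estimated power: fraction of simulated samples that reject the Null at 0.05
power_at <- function(n, trials = 500) {
    mean(replicate(trials, sim_p(n)) < 0.05)
}

set.seed(101)                                        # arbitrary seed, for repeatability
sizes <- c(8, 15, 24, 48)
power <- sapply(sizes, power_at)
data.frame(n = sizes, power = power)
```

Reading down the power column shows how large a sample is needed before the Null is rejected most of the time.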

Hypothesis tests on individual coefficients

The p-value on an individual coefficient.

Simulate life on Planet Null by shuffling the variables involved in the coefficient.

What's the distribution of p-values on Planet Null?

summary(lm(width ~ sex + length, data = KidsFeet))$coef

##             Estimate Std. Error t value  Pr(>|t|)
## (Intercept)   3.6412     1.2506   2.912 6.139e-03
## sexG         -0.2325     0.1293  -1.798 8.055e-02
## length        0.2210     0.0497   4.447 8.015e-05

summary(lm(width ~ shuffle(sex) + length, data = KidsFeet))$coef

##               Estimate Std. Error t value  Pr(>|t|)
## (Intercept)    2.86848    1.22346  2.3446 2.468e-02
## shuffle(sex)G -0.03776    0.12864 -0.2935 7.708e-01
## length         0.24844    0.04944  5.0252 1.391e-05

summary(lm(width ~ shuffle(sex) + length, data = KidsFeet))$coef[2, 4]

## [1] 0.878

s = do(1000) * summary(lm(width ~ shuffle(sex) + length, data = KidsFeet))$coef[2, 4]
densityplot(~result, data = s)

[Plot: density of the 1000 p-values on the shuffled sex coefficient]
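
On Planet Null, p-values should be spread evenly between 0 and 1, so about 5% of them fall below 0.05. A minimal base-R sketch of that check follows. It uses made-up data in place of KidsFeet (the values and the relationship between the variables are invented for illustration), with sample() standing in for shuffle(). The name null_p is a hypothetical helper.

```r
set.seed(202)                                  # arbitrary seed
n <- 39                                        # same number of rows as KidsFeet
fake <- data.frame(
    sex    = factor(rep(c("B", "G"), length.out = n)),
    length = rnorm(n, mean = 25, sd = 1.3)     # invented values, not the real data
)
fake$width <- 3 + 0.25 * fake$length + rnorm(n, sd = 0.4)

# One trip to Planet Null: shuffle sex, refit, keep the p-value on the sex coefficient
null_p <- function() {
    summary(lm(width ~ sample(sex) + length, data = fake))$coef[2, 4]
}
p <- replicate(1000, null_p())

mean(p < 0.05)                                 # should be near 0.05
hist(p, breaks = 20)                           # should look roughly flat
```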

R² as a test statistic

What's the distribution of R² under the Null Hypothesis?

What do we mean by the Null Hypothesis?

All explanatory variables are unrelated to the response.
Just one explanatory variable is related?
Some subset of explanatory variables are related?

We're going to study the “Whole Model” Null Hypothesis.

Show the distribution of R².

It's nice to have some theory.

What's the typical value of R² if you make up \( m-1 \) random vectors to explain \( n \) data points? (We'll always include an intercept in the model in addition to the random vectors.)

kids = fetchData("KidsFeet")

## Data KidsFeet found in package.

r.squared(lm(width ~ rand(10), data = kids))

## [1] 0.1922

s = do(1000) * r.squared(lm(width ~ rand(10), data = kids))
densityplot(~result, data = s)

[Plot: density of R² from 1000 models with 10 random vectors]

Do it for no random vectors.
Do it for 40 random vectors. R² is always 1.
What's the smallest \( m-1 \) can be to get an R² of 1?
Do it for 1, 2, 3, 4, 10, 20, and 30 random vectors. What's the pattern?
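
One way to check the pattern against theory: with \( m-1 \) random vectors plus an intercept and \( n \) cases, the average R² under the whole-model Null is \( (m-1)/(n-1) \). The following base-R sketch compares that value with simulation, using a made-up response of n = 39 cases (the number in KidsFeet). Columns of rnorm() stand in for rand(), and mean_r2 is a hypothetical helper name.

```r
set.seed(303)                       # arbitrary seed
n <- 39                             # number of cases, as in KidsFeet
y <- rnorm(n)                       # made-up response, not the real width data

# Average R^2 over many models that use k random vectors plus an intercept
mean_r2 <- function(k, trials = 500) {
    mean(replicate(trials, {
        X <- matrix(rnorm(n * k), nrow = n)   # k random explanatory vectors
        summary(lm(y ~ X))$r.squared
    }))
}

ks  <- c(1, 2, 4, 10, 20, 30)
sim <- sapply(ks, mean_r2)
data.frame(k = ks, simulated = sim, theory = ks / (n - 1))
```

The simulated and theory columns should agree closely, which gives a quick way to predict how many random vectors are needed for any target R².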

Now do it for the CPS85 data. Tell me how many random vectors you would need to get a typical R² of 0.2. Then simulate this and confirm your answer.

Do mHypTest(TRUE), setting the “effect size” to about 0.4.

Translation to F

s = do(1000) * r.squared(lm(width ~ rand(10), data = kids))
s = transform(s, F = (result/10)/((1 - result)/(38 - 10)))
densityplot(~F, data = s)
plotFun(df(x, 10, 28) ~ x, add = TRUE, col = "red")
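
The transform above is the usual conversion from R² to F. With \( m-1 = 10 \) random vectors and \( n = 39 \) cases, \( F = \frac{R^2/(m-1)}{(1-R^2)/(n-m)} \), where \( n-m = 28 \) is what the code writes as 38 - 10. A small base-R sketch, using made-up data rather than KidsFeet, checks that this agrees with the F statistic reported by summary() and converts it to a p-value with pf(). The variable names are illustrative.

```r
set.seed(404)                                 # arbitrary seed
n <- 39; k <- 10                              # cases and random vectors, as above
y <- rnorm(n)                                 # made-up response
X <- matrix(rnorm(n * k), nrow = n)           # stands in for rand(10)
fit <- summary(lm(y ~ X))

r2 <- fit$r.squared
F_from_r2 <- (r2 / k) / ((1 - r2) / (n - 1 - k))   # same formula as the transform()

F_from_r2                                     # F computed from R^2
fit$fstatistic["value"]                       # F reported by summary()
pf(F_from_r2, k, n - 1 - k, lower.tail = FALSE)    # p-value from the F(10, 28) distribution
```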