Wednesday April 3 Class Notes

Review of Vocabulary

A graphical way of thinking about hypothesis testing.

App for playing with Significance and Power

fetchData("mHypTest.R")
mHypTest()  # by default, a coefficient

Power

How did I know that I could reject the Null in the shuffling problem? I did a little simulation.

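mysim <- function(n = 15) {
    # Draw n days of the month at random, with replacement
    days = resample(1:31, size = n)
    # For each day, pick a whole number uniformly between 1 and that day
    nums = ceiling(runif(n, min = 0, max = days))
    mod = lm(nums ~ days)
    # Return R^2 and the p-value on the days coefficient
    list(r2 = r.squared(mod), p = summary(mod)$coef[2, 4])
}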
s15 = do(1000) * mysim(24)  # typical R^2 is about 0.4
mean(~r2, data = s15)
## [1] 0.2384
tally(~p < 0.05, data = s15, format = "proportion")
## 
##  TRUE FALSE Total 
## 0.959 0.041 1.000
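
The same idea can be sketched without the mosaic package: power is the fraction of simulated samples in which the test rejects the Null. This is only a minimal base-R version. The names `sim_p` and `estimate_power` are made up here, and the sample sizes 8 and 40 are arbitrary choices for illustration.

```r
# One simulated sample: return the p-value on the days coefficient.
sim_p <- function(n) {
  days <- sample(1:31, size = n, replace = TRUE)   # base-R stand-in for resample()
  nums <- ceiling(runif(n, min = 0, max = days))
  summary(lm(nums ~ days))$coefficients[2, 4]
}

# Power = proportion of simulated samples with p < alpha.
estimate_power <- function(n, reps = 500, alpha = 0.05) {
  p <- replicate(reps, sim_p(n))
  mean(p < alpha)
}

set.seed(1)
power_small <- estimate_power(8)    # few data points: low power
power_large <- estimate_power(40)   # more data points: higher power
c(n8 = power_small, n40 = power_large)
```

Running it for a range of sample sizes shows how quickly power grows with n for this particular effect.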

Hypothesis tests on individual coefficients

The p-value on an individual coefficient.

Simulate life on Planet Null by shuffling the variables involved in the coefficient.

What's the distribution of p-values on Planet Null?

summary(lm(width ~ sex + length, data = KidsFeet))$coef
##             Estimate Std. Error t value  Pr(>|t|)
## (Intercept)   3.6412     1.2506   2.912 6.139e-03
## sexG         -0.2325     0.1293  -1.798 8.055e-02
## length        0.2210     0.0497   4.447 8.015e-05
summary(lm(width ~ shuffle(sex) + length, data = KidsFeet))$coef
##               Estimate Std. Error t value  Pr(>|t|)
## (Intercept)    2.86848    1.22346  2.3446 2.468e-02
## shuffle(sex)G -0.03776    0.12864 -0.2935 7.708e-01
## length         0.24844    0.04944  5.0252 1.391e-05
summary(lm(width ~ shuffle(sex) + length, data = KidsFeet))$coef[2, 4]
## [1] 0.878
s = do(1000) * summary(lm(width ~ shuffle(sex) + length, data = KidsFeet))$coef[2, 4]
densityplot(~result, data = s)
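
The shuffling simulation above can also be sketched in base R to check what "Planet Null" looks like: when the shuffled variable has no real connection to the response, the p-values should be spread roughly evenly between 0 and 1, so about 5% fall below 0.05. The data here are synthetic stand-ins for KidsFeet (an assumption, not the real data), and the variable names are invented for the sketch.

```r
set.seed(2)
n <- 39                                          # same number of rows as KidsFeet
length_cm <- rnorm(n, mean = 25, sd = 1.3)       # made-up foot lengths
sex <- factor(sample(c("B", "G"), n, replace = TRUE))
width_cm <- 3.6 + 0.22 * length_cm + rnorm(n, sd = 0.4)

# Shuffle sex 1000 times and record the p-value on its coefficient each time.
null_p <- replicate(1000, {
  shuffled <- sample(sex)                        # base-R stand-in for shuffle()
  summary(lm(width_cm ~ shuffled + length_cm))$coefficients[2, 4]
})

frac_below_05 <- mean(null_p < 0.05)             # should be near 0.05
frac_below_05
hist(null_p, breaks = 20)                        # roughly flat histogram
```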


\( R^2 \) as a test statistic

What's the distribution of \( R^2 \) under the Null Hypothesis?

What do we mean by the Null Hypothesis?

We're going to study the “Whole Model” Null Hypothesis.

Show distribution of \( R^2 \).

It's nice to have some theory.

What's the typical value of \( R^2 \) if you make up \( m-1 \) random vectors to explain \( n \) data points? (We'll always include an intercept in the model in addition to the random vectors.)

kids = fetchData("KidsFeet")
## Data KidsFeet found in package.
r.squared(lm(width ~ rand(10), data = kids))
## [1] 0.1922
s = do(1000) * r.squared(lm(width ~ rand(10), data = kids))
densityplot(~result, data = s)
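
A standard result is that, under the Null, the average \( R^2 \) from \( m-1 \) random vectors plus an intercept is \( (m-1)/(n-1) \). The sketch below checks that against a simulation in base R. It uses a made-up response of length 39 rather than the real KidsFeet widths (an assumption for the sake of a self-contained example), and `matrix(rnorm(...))` stands in for `rand(10)`.

```r
set.seed(3)
n <- 39                    # number of data points (KidsFeet has 39 rows)
k <- 10                    # number of random vectors, i.e. m - 1
y <- rnorm(n)              # made-up response; any fixed response works

r2 <- replicate(1000, {
  X <- matrix(rnorm(n * k), nrow = n)   # k random explanatory vectors
  summary(lm(y ~ X))$r.squared
})

mean_r2 <- mean(r2)
expected_r2 <- k / (n - 1)              # (m - 1) / (n - 1)
c(simulated = mean_r2, theory = expected_r2)
```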


Now do it for the CPS85 data. Tell me how many random vectors you would need to get a typical \( R^2 \) of 0.2. Then simulate this and confirm your answer.
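
One way to approach this, sketched in base R: if the typical \( R^2 \) is about (number of random vectors)/(\( n-1 \)), then the number needed is roughly the target \( R^2 \) times \( n-1 \). The sketch assumes CPS85 has 534 rows and uses a made-up response of that length instead of the real wage data. The helper name `vectors_needed` is invented here.

```r
# How many random vectors give a typical R^2 of `target` with n data points?
vectors_needed <- function(target, n) round(target * (n - 1))

n_cps <- 534                         # ASSUMED number of rows in CPS85
k <- vectors_needed(0.2, n_cps)      # about 0.2 * 533

set.seed(4)
y <- rnorm(n_cps)                    # made-up response standing in for wage
r2 <- replicate(200, {
  X <- matrix(rnorm(n_cps * k), nrow = n_cps)
  summary(lm(y ~ X))$r.squared
})
mean_r2 <- mean(r2)
c(vectors = k, simulated_mean_r2 = mean_r2)
```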

Do mHypTest(TRUE), setting the “effect size” to about 0.4.

Translation to F

s = do(1000) * r.squared(lm(width ~ rand(10), data = kids))
s = transform(s, F = (result/10)/((1 - result)/(38 - 10)))
densityplot(~F, data = s)
plotFun(df(x, 10, 28) ~ x, add = TRUE, col = "red")
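
The transformation used above is F = (R²/(m-1)) / ((1-R²)/(n-m)), with m-1 = 10 random vectors and n-m = 28 residual degrees of freedom. As a quick check, the base-R sketch below computes F this way for one model and compares it with the F statistic that `summary()` reports. The data are made up for the example.

```r
set.seed(5)
n <- 39                              # data points
k <- 10                              # random vectors (m - 1)
y <- rnorm(n)                        # made-up response
X <- matrix(rnorm(n * k), nrow = n)

fit <- summary(lm(y ~ X))
r2 <- fit$r.squared

# F from R^2: numerator df = k, denominator df = n - 1 - k = 28
F_from_r2 <- (r2 / k) / ((1 - r2) / (n - 1 - k))
F_from_lm <- unname(fit$fstatistic["value"])

# p-value from the F distribution with (10, 28) degrees of freedom
p_value <- pf(F_from_r2, df1 = k, df2 = n - 1 - k, lower.tail = FALSE)
c(F_from_r2 = F_from_r2, F_from_lm = F_from_lm, p = p_value)
```

The two F values agree, which is why the simulated F values line up with the red `df(x, 10, 28)` curve.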
