Monday April 1 Class Notes

Review of Vocabulary

The Summary Report and the p-value

Show the t-statistic and the translation into a p-value.
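
The translation from a t-statistic to a p-value can be sketched in base R. This is only an illustration: the built-in mtcars data and the model mpg ~ wt are assumptions standing in for whatever model was shown in class.

```r
# Assumption: built-in mtcars data stands in for the class data set.
mod <- lm(mpg ~ wt, data = mtcars)
coefs <- summary(mod)$coefficients

# t-statistic = coefficient estimate divided by its standard error
t_stat <- coefs["wt", "Estimate"] / coefs["wt", "Std. Error"]

# Two-sided p-value from the t distribution with the residual degrees of freedom
p_val <- 2 * pt(-abs(t_stat), df = df.residual(mod))

# Compare the hand calculation with the value printed in the summary report
c(by_hand = p_val, reported = coefs["wt", "Pr(>|t|)"])
```

The two numbers should agree, since the report does the same calculation.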

Show the p-value reported on R^2.
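
The p-value on R^2 comes from an F statistic. A minimal sketch of that calculation follows, again assuming the built-in mtcars data and an arbitrary two-term model rather than the model used in class.

```r
# Assumption: mtcars and the model mpg ~ wt + hp are illustrative choices.
mod <- lm(mpg ~ wt + hp, data = mtcars)
s <- summary(mod)

r2 <- s$r.squared
k <- length(coef(mod)) - 1   # number of model terms, not counting the intercept
n <- nrow(mtcars)            # number of cases

# F compares R^2 per model term with the unexplained fraction per residual df
F_stat <- (r2 / k) / ((1 - r2) / (n - k - 1))
p_val <- pf(F_stat, k, n - k - 1, lower.tail = FALSE)

# F_stat should match s$fstatistic["value"] from the summary report
c(F_by_hand = F_stat, p_by_hand = p_val)
```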

Notice that the alternative hypothesis plays no role whatsoever in the regression report.

The Multi-World Metaphor for Statistical Inference

Motto: Always know what world you are thinking about.

Planet Earth

We want to know which hypotheses are true on Earth and which are false.

The planets involved in statistical inference are:

Planet Sample, Planet Null, and Planet Alt

How to travel to the different worlds …

Other hypotheses, with content

Side-by-side comparison: http://en.wikipedia.org/wiki/File:PileatedIvoryWoodpecker.svg

Some of the photographic evidence

The Ivory-billed Woodpecker from a hypothesis-testing perspective (second page)

App for playing with Significance and Power

fetchData("mHypTest.R")
mHypTest()  # by default, a coefficient

Stocks on Planet Null

fetchData("getDJIAdata.R")
## Retrieving from http://www.mosaic-web.org/go/datasets/getDJIAdata.R
## [1] TRUE
djia = getDJIAdata()  # djia-2011.csv is the basic file
## Retrieving from http://www.mosaic-web.org/go/datasets/djia-2011.csv
xyplot(Close ~ Date, data = djia)

[Figure: DJIA closing price (Close) plotted against Date]

Look at the day-to-day differences in log prices:

dd = with(djia, diff(log(Close)))
mean(dd)
## [1] 0.000191

Subtract out the mean, shuffle, cumulative sum, and exponentiate to create a realization:

ddnull = dd - mean(dd)
sim = exp(cumsum(shuffle(ddnull)))
xyplot(sim ~ Date, data = djia)

[Figure: one simulated Planet Null realization (sim) plotted against Date]
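
If the DJIA file is not available, the same recipe can be tried in base R. In this sketch the price series is simulated (an assumption, not the real data), and sample() takes the place of shuffle() from the mosaic package.

```r
set.seed(1)  # only so that the sketch is reproducible

# Assumption: simulated closing prices stand in for djia$Close
close <- 100 * exp(cumsum(rnorm(250, mean = 0.0005, sd = 0.01)))

dd <- diff(log(close))               # day-to-day differences in log prices
ddnull <- dd - mean(dd)              # subtract the mean: no drift on Planet Null
sim <- exp(cumsum(sample(ddnull)))   # shuffle, cumulative sum, exponentiate

# sim is a price relative to the starting price. Shuffling does not change
# the sum of ddnull (which is zero), so every realization ends at a ratio of 1.
tail(sim, 1)
plot(sim, type = "l")
```

Each run of the sample() line gives a different path between the same two endpoints, which is what makes it a realization from Planet Null.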

Shuffling

A proof for the existence of Extra-Sensory Perception! If I can get you to focus on a number, I can predict, to some extent, your thought process.

Your birthday is a number that plays an important part in your thought process. Generate a random number between 0 and your birthday (the day of the month).

Permutation test by hand

Spreadsheet reading command:

esp = fetchGoogle("https://docs.google.com/spreadsheet/pub?key=0Am13enSalO74dE5iMjZrcGFjTUtJSjg0T05NLW84Mmc&single=true&gid=0&output=csv")
## Loading required package: RCurl
## Warning: package 'RCurl' was built under R version 2.15.2
## Loading required package: bitops
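
A permutation test on data of this kind could be sketched as follows. The class spreadsheet's column names are not shown here, so the sketch simulates responses instead; the names day and num, the sample size, and the choice of R^2 as the test statistic are all assumptions.

```r
set.seed(2)  # only so that the sketch is reproducible

# Assumption: simulated responses; `day` and `num` are hypothetical names
n <- 20
day <- sample(1:31, size = n, replace = TRUE)   # day of the month of each birthday
num <- ceiling(runif(n, min = 0, max = day))    # "random" number each person picked

# Test statistic on Planet Sample
observed <- summary(lm(num ~ day))$r.squared

# Planet Null: shuffling `day` breaks any connection between day and num
null_r2 <- replicate(1000, summary(lm(num ~ sample(day)))$r.squared)

# p-value: fraction of shuffled trials with R^2 at least as large as observed
p_val <- mean(null_r2 >= observed)
p_val
```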

Power

How did I know that I could reject the Null in the shuffling problem? I did a little simulation.

mysim <- function(n = 15) {
    days = resample(1:31, size = n)
    nums = ceiling(runif(n, min = 0, max = days))
    mod = lm(nums ~ days)
    list(r2 = r.squared(mod), p = summary(mod)$coef[2, 4])
}
s15 = do(1000) * mysim(24)  # typical R^2 is about 0.4
mean(~r2, data = s15)
## [1] 0.4169
tally(~p < 0.05, data = s15, format = "proportion")
## 
##  TRUE FALSE Total 
##  0.97  0.03  1.00

Run mHypTest(TRUE), setting the “effect size” to about 0.4.