At the very start of the course, we introduced ways to quantify the amount of variation.
Then we started building models. Two basic quantities that we calculated: model coefficients and \( R^2 \).
Now we're translating these into new quantities, ones that summarize the information in model coefficients and \( R^2 \) in different ways.
Think of the p-value as a way of putting the various statistics — model coefficients, \( R^2 \), t, F — onto a common scale, where they are easily interpreted.
Remember that p doesn't carry all the information that's in the coefficient or \( R^2 \). It's a summary that indicates the strength of evidence for rejecting the Null Hypothesis.
fetchData("mHypTest.R")
mHypTest() # by default, a coefficient
Do mHypTest(TRUE), setting the “effect size” to about 0.4.
Translation to F.
Calculation of p-value using pt() and pf().
Fit a model, do summary, then reproduce the p-value.
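For instance, a sketch using the KidsFeet model shown below: pull out the t statistic for the length coefficient and reproduce its p-value with pt(), then again with pf() (anticipating the rule \( t^2 = F \) discussed later).
mod = lm(width ~ sex + length, data = KidsFeet)
tstat = summary(mod)$coef["length", "t value"]  # the t statistic for length
df = mod$df.residual  # residual degrees of freedom: 39 cases minus 3 coefficients = 36
2 * pt(-abs(tstat), df)  # two-sided p-value; matches Pr(>|t|) in the summary table
pf(tstat^2, df1 = 1, df2 = df, lower.tail = FALSE)  # the same p-value via the F distribution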
What's the distribution of p-values on Planet Null?
summary(lm(width ~ sex + length, data = KidsFeet))$coef
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.6412 1.2506 2.912 6.139e-03
## sexG -0.2325 0.1293 -1.798 8.055e-02
## length 0.2210 0.0497 4.447 8.015e-05
summary(lm(width ~ shuffle(sex) + length, data = KidsFeet))$coef
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.87819 1.22887 2.3422 2.482e-02
## shuffle(sex)G 0.01965 0.12990 0.1513 8.806e-01
## length 0.24692 0.04992 4.9461 1.771e-05
summary(lm(width ~ shuffle(sex) + length, data = KidsFeet))$coef[2, 4]
## [1] 0.1495
s = do(1000) * summary(lm(width ~ shuffle(sex) + length, data = KidsFeet))$coef[2, 4]
densityplot(~result, data = s)
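A quick check (a sketch): if the shuffled p-values are uniform, about 5% of them should fall below 0.05.
tally(~result < 0.05, data = s)  # expect roughly 50 TRUE out of 1000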
Are there easy-grading professors?
grades = fetchData("grades.csv")
## Retrieving from http://www.mosaic-web.org/go/datasets/grades.csv
g2n = fetchData("grade-to-number.csv")
## Retrieving from http://www.mosaic-web.org/go/datasets/grade-to-number.csv
courses = fetchData("courses.csv")
## Retrieving from http://www.mosaic-web.org/go/datasets/courses.csv
all = merge(grades, g2n)
all = merge(all, courses)
names(all)
## [1] "sessionID" "grade" "sid" "gradepoint" "dept"
## [6] "level" "sem" "enroll" "iid"
mod1 = lm(gradepoint ~ iid - 1, data = all)  # one coefficient per instructor; the -1 drops the intercept
Look at the distribution of p-values:
pvals1 = summary(mod1)$coef[, 4] # grab the 4th column
tally(~pvals1 < 0.05)
##
## TRUE FALSE Total
## 359 0 359
Every one of the p-values is less than 0.05. Why? Because the model has no intercept, each coefficient is compared to a null value of zero, and no instructor's mean gradepoint is anywhere near zero.
Tweak: Center the grades on zero. That is, subtract out the mean grade so that the null hypothesis value of zero is meaningful.
options(na.rm = TRUE)  # let mean() and friends ignore missing data by default
all = transform(all, GP = gradepoint - mean(gradepoint))
mod2 = lm(GP ~ iid - 1, data = all)
pvals2 = summary(mod2)$coef[, 4] # grab the 4th column
tally(~pvals2 < 0.05)
##
## TRUE FALSE Total
## 89 270 359
Ah! We've got 89 significant p-values. Let's look at the distribution of the instructor coefficients (each instructor's mean centered gradepoint), split according to significance.
densityplot(~coef(mod2), groups = pvals2 < 0.05)
Things to point out:
The meaning of the t-statistic — how far from zero the coefficient is, measured in units of the standard error. Demonstrate this from the regression table.
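For example (a sketch with the KidsFeet regression table from above): dividing the Estimate column by the Std. Error column reproduces the t value column.
ctab = summary(lm(width ~ sex + length, data = KidsFeet))$coef
ctab[, "Estimate"] / ctab[, "Std. Error"]  # reproduces the "t value" column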
Relationship between t and p-value. There's something called the t-distribution, which has one parameter: the degrees of freedom.
When we calculate a p-value, there's a choice to be made — should we count just one side or both sides in evaluating “as big or more extreme”? Many statistics texts spend a lot of time on this issue. We're going to ignore it and follow the Food and Drug Administration's rule: surprising is surprising, whichever way it goes.
Historically, t came first, in 1908 (W. S. Gosset, publishing as “Student”). F was introduced in the 1920s, by R. A. Fisher, as a generalization of t.
In econometrics, researchers tend to use t statistics to evaluate significance. I'll push F statistics. The F is a generalization of the t. The rule is \( t^2 = F \). To demonstrate this:
ts = rt(10000, df = 10)  # random draws from t with 10 degrees of freedom
Fs = rf(10000, df1 = 1, df2 = 10)  # random draws from F with 1 and 10 degrees of freedom
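One way to see the match (a sketch): square the t draws and compare their distribution to the F draws with a quantile-quantile plot.
qqplot(ts^2, Fs)  # quantiles of the squared t draws against quantiles of the F draws
abline(0, 1)  # the points fall near this line: ts^2 and Fs match in distribution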
With t, you need to look at both tails; the software does this automatically. With F, only the upper tail (values \( > 1 \)) is meaningful, but that single tail corresponds to both tails of the t, since \( t^2 \) is large whenever t is far from zero in either direction.
For both distributions, the parameter (df for t, df2 for F) counts the dimension of the residual subspace. This is, effectively, how much data you have available for estimating the size of the residual vector and therefore the standard error of the statistic.
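Concretely (a sketch with the KidsFeet model): 39 cases minus 3 model coefficients leaves a 36-dimensional residual subspace.
nrow(KidsFeet)  # 39 cases
lm(width ~ sex + length, data = KidsFeet)$df.residual  # 39 - 3 = 36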
Should we take every p-value less than 0.05 to indicate a professor who is significantly easy or hard grading? Let's look at the distribution of p-values on Planet Null.
mod3 = lm(GP ~ shuffle(iid) - 1, data = all)
pvals3 = summary(mod3)$coef[, 4] # grab the 4th column
histogram(~pvals3)
tally(~pvals3 < 0.05)
##
## TRUE FALSE Total
## 10 351 361
On Planet Null, the p-values are uniformly distributed between 0 and 1.
The Bonferroni correction. To make up for the fact that there are inevitably false-positive rejections of the null on Planet Null, the Bonferroni correction adjusts the threshold for rejection based on the number of tests. We have 359 professors, so the Bonferroni-adjusted threshold should be 0.05/359, about 0.00014.
densityplot(~coef(mod2), groups = pvals2 < 0.05/359)
tally(~pvals2 < 0.05/359)
##
## TRUE FALSE Total
## 18 341 359
Are these results to be trusted? They would be if the assumptions behind the techniques used to calculate probabilities matched the reality behind the data. One can get misled. A simple way to avoid the most egregious violations of the assumptions is to use the rank transform on quantitative data.
goo = with(all, rank(GP))  # rank-transform the centered gradepoints
mod4 = lm(goo - mean(goo) ~ shuffle(iid) - 1, data = all)  # center the ranks, shuffle the instructors
pvals4 = summary(mod4)$coef[, 4] # grab the 4th column
histogram(~pvals4)
tally(~pvals4 < 0.05)
##
## TRUE FALSE Total
## 18 346 364
An article in the British Journal of Nutrition about an organic-food diet and the health of young children.
Table 3 shows odds ratios for five different foods, for both moderately organic and strictly organic diets: 10 comparisons, one of which is significant at \( p=0.02 \). A Bonferroni adjustment suggests an appropriate threshold of 0.05/10 = 0.005.
Print out a list of 20 p-values for each student. Ask: who has the first one < 0.05? Who has any of them below 0.05? What's the smallest p-value in the class?
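One way to generate each list (a sketch, using the fact that Planet-Null p-values are uniform between 0 and 1):
runif(20)  # 20 simulated Planet-Null p-values
With 20 independent tests, the chance of at least one p-value below 0.05 is \( 1 - 0.95^{20} \approx 0.64 \), so most students will have one.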
Simulation activity:
fetchData("simulate.r")
## Retrieving from http://www.mosaic-web.org/go/datasets/simulate.r
## [1] TRUE
d = run.sim(bogus.groups, ngroups = 5, 10)
bwplot(val ~ group, data = d)
dd = subset(d, group %in% c("A", "E")) # or whatever are the furthest-apart groups
Carry out a hypothesis test on a subset containing just the best and worst groups. How often do people get a p-value less than 0.05?
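A sketch of that test (assuming, as above, that A and E were the furthest-apart groups):
dd = droplevels(dd)  # drop unused group levels, in case group is a factor
summary(lm(val ~ group, data = dd))$coef  # look at the p-value on the group coefficient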
Problem 14.04.
Publication bias. The “file drawer” problem.