Homework Solutions (5:30 PM - 6:00 PM)

Basic 4.30 Sunlight Protection Factor

Rebecca Jenkins

Annie O’Connell

Basic 4.31 Effect of Group Therapy on Survival of Breast Cancer Patients

Jaxon Horn3e

Supplemental 4.19 Bumpus’s Study

Jacob Levenson

Supplemental 4.29 Salvage Logging

Volunteers for next week?

Master Problem Exact Permutation Test of Sleuth Case 4.1.1 Challenger Data

Yibiao Liang


Outline of Class 3 on Sleuth Chapter 5 Comparisons Among Several Samples


R. A. Fisher, Inventor of ANOVA

R. A. Fisher, Inventor of ANOVA ## 25 terms invented by Fisher (courtesy of HA David & David Bee (AP Stat group)

variance 1918a; analysis of variance 1918b; likelihood 1921; maximum likelihood 1922a; parameter 1922a; statistic 1922a; consistency 1922a; efficiency 1922a; sufficiency 1922a; degrees of freedom 1922b; Student’s t, t 1924; z-distribution 1924; test of significance 1925a; level of significance 1925a; sufficient statistic 1925b; randomization 1926; randomized blocks 1926; confounding 1926; interaction 1926; sampling distribution 1928; covariance 1930; Greco-Latin square (with F. Yates) 1934; null hypothesis 1935; Yates’s correction for continuity 1936; normal score (with F. Yates) 1938

Most experimental & survey design is based on an ANOVA framework

  • One can’t really appreciate the need for proper replication without considering the implications for testing treatment effects with ANOVA

  • Hurlbert’s (1984) monograph criticizing statistics in ecological papers is largely a criticism of inappropriate ANOVA design

    • Hurlbert’s pseudoreplication is Tony Underwood’s ‘model misspecification’ and both are largely based on using an inappropriate ANOVA model
  • While ANOVA is a proper subset of the general linear model (GLM) and regression, as we’ll see, the concepts involving design and partitioning degrees of freedom are more evident in ANOVA models

Ehinger’s partition of statistical methods


image source: Dani Servén Marín (one of the developers of pyGAM)


Major types of ANOVA (there are many more)


Steps in doing an ANOVA-based experiment or survey


Case 5.1 Diet Restriction & Longevity

Sleuth3 Display 5.2


Sleuth3 Display 5.2 ***

5 planned contrasts out of choose (6,2) or 15 possible two-way tests.

Sleuth3 Display 5.3


If hypotheses are specified in advance, then you can test at a pre-set α level, without a posteriori (or post hoc, multiple comparison) adjustment of α levels. Recall that alpha = P(Type I error)

  • See Cook & Farewell (1996, J. Roy. Stat. Assoc. A; Posted in Blackboard). In dose-response studies, no need to adjust for number of dose treatments.

  • One large design allows the use of a more precise estimate of the error variance

    • Separate control vs. treatment t tests are not powerful

    • If interaction effects are evident, separate tests can be misleading. They can miss interaction effects.

Summary of Statistical Findings

  • There is overwhelming evidence that mean lifetimes in the six groups are different (p-value < 0.001); analysis of variance F-test).

  • Analysis of the 5 particular questions are

    1. There is convincing evidence that lifetime increases as a result of restricting the diet from 85 kcal/wk to 50 kcal/wk (1-sided p-value < 0.0001; t test)

    2. There is no evidence that reducing the calories before weaning increased lifespan, when the caloric intake after weaning is 50 kcal/wk (1-sided p value = 0.32, t test). A 95% CI for the amount by which the lifetime under the R/R50 diet exceeds the lifetime under the N/R50 diet is - 1.7 to 2.9 months.

    3. Further restriction of the diet from 50 to 40 kcal/wk increases lifetime by an estimated 2.8 months (95% CI: 0.5 to 5.1 months). The evidence that this effect is greater than zero is moderate (p=0.017, t test).

    4. There was moderate evidence that lifetime was decreased by the lowering of protein in addition to the 50 kcal/wk diet (2-sided p value =0.024; t-test)

    5. There is convincing evidence that the control mice live longer than the mice on the non-purified (NP) diet (1-sided p-value <0.0001)

  • Note that all 5 of these hypotheses can be tested as Tukey Least Significant Difference (LSD) tests (or linear contrasts) see Sleuth Ch 6


Case 0501 Box et al. (2005) Graphical ANOVA


Case 5.2 The Spock Conspiracy Trial—An Observational Study

Sleuth, page 117: Dr. Spock’s venire contained only 1 woman, who was released by the prosecution

Sleuth3 Display 5.4

Key questions

  • Is there evidence that women were underrepresented on the Spock judge’s venires

  • Is there evidence that there are differences in women’s representation on the other juries?


Tsarnaev Jury: Was it a biased sample of the eastern Massachusetts population?

NBC News 2/27/2015

The defense said 1,373 people, summoned from a population of about 5 million in eastern Massachusetts, were originally given numbers based on a random pool order list. New numbers were later assigned, based on when the jurors reported to court to complete written questionnaires.

The defense argued that the reordering undermined the randomness of the selection process and pushed certain groups — including blacks, people under 30 and people who live in Boston — down on the list and made them less likely to be chosen for the jury.


Sleuth3 Case 0502. See Display 5.5 in Sleuth3

Statistical Summary

  • The percentage of women on the Spock judge’s venires were substantially lower than the other judges (t test of Spock judge vs.‘Other judges’)

  • There is little evidence to reject the null hypothesis of no difference in female representation among the other judges p=0.32 (1-way ANOVA)

  • The percentage of women is 15% less on the Spock judge’s venires (95% CI: 10% to 20%)

Case0502 Graphical ANOVA, Box et al. (2005)


Case Study 5.1 Diet Restriction and Longevity—A Randomized Experiment: R analysis

First look at the structure of the data

str(case0501)
## 'data.frame':    349 obs. of  2 variables:
##  $ Lifetime: num  35.5 35.4 34.9 34.8 33.8 33.5 32.6 32.4 31.8 31.6 ...
##  $ Diet    : Factor w/ 6 levels "N/N85","N/R40",..: 4 4 4 4 4 4 4 4 4 4 ...
favstats(Lifetime ~ Diet, data = case0501)  # from Horton's mosaic
##    Diet  min     Q1 median    Q3  max     mean       sd  n missing
## 1 N/N85 17.9 31.400  33.10 36.40 42.3 32.69123 5.125297 57       0
## 2 N/R40 19.6 42.275  46.05 50.35 54.6 45.11667 6.703406 60       0
## 3 N/R50 18.6 37.950  43.90 48.20 51.9 42.29718 7.768195 71       0
## 4    NP  6.4 24.800  28.90 31.40 35.5 27.40204 6.133701 49       0
## 5 R/R50 24.2 39.150  43.95 48.35 50.7 42.88571 6.683152 56       0
## 6 lopro 23.4 35.000  41.05 46.45 49.7 39.68571 6.991695 56       0

Do a graphical analysis to evaluate outliers and homogeneity of variance

# Re-order levels for better boxplot organization:
myDiet <- factor(case0501$Diet, levels=c("NP","N/N85","N/R50","R/R50","lopro","N/R40") )
myNames <- c("NP(49)","N/N85(57)","N/R50(71)","R/R50(56)","lopro(56)",
             "N/R40(60)") # Make these for boxplot labeling.
boxplot(Lifetime ~ myDiet, ylab= "Lifetime (months)", names=myNames,
        xlab="Treatment (and sample size)", data=case0501)

densityplot(~Lifetime, groups = Diet, auto.key = TRUE, data=case0501)


Do the analysis of variance (ANOVA) on the Case0501 longevity data

myAov1 <- aov(Lifetime ~ Diet, data=case0501) # One-way analysis of variance
myAov1
## Call:
##    aov(formula = Lifetime ~ Diet, data = case0501)
## 
## Terms:
##                     Diet Residuals
## Sum of Squares  12733.94  15297.42
## Deg. of Freedom        5       343
## 
## Residual standard error: 6.678239
## Estimated effects may be unbalanced
summary(myAov1)
##              Df Sum Sq Mean Sq F value Pr(>F)    
## Diet          5  12734  2546.8    57.1 <2e-16 ***
## Residuals   343  15297    44.6                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary.lm(myAov1)
## 
## Call:
## aov(formula = Lifetime ~ Diet, data = case0501)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -25.5167  -3.3857   0.8143   5.1833  10.0143 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  32.6912     0.8846  36.958  < 2e-16 ***
## DietN/R40    12.4254     1.2352  10.059  < 2e-16 ***
## DietN/R50     9.6060     1.1877   8.088 1.06e-14 ***
## DietNP       -5.2892     1.3010  -4.065 5.95e-05 ***
## DietR/R50    10.1945     1.2565   8.113 8.88e-15 ***
## Dietlopro     6.9945     1.2565   5.567 5.25e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.678 on 343 degrees of freedom
## Multiple R-squared:  0.4543, Adjusted R-squared:  0.4463 
## F-statistic:  57.1 on 5 and 343 DF,  p-value: < 2.2e-16
attributes(myAov1)
## $names
##  [1] "coefficients"  "residuals"     "effects"       "rank"         
##  [5] "fitted.values" "assign"        "qr"            "df.residual"  
##  [9] "contrasts"     "xlevels"       "call"          "terms"        
## [13] "model"        
## 
## $class
## [1] "aov" "lm"

Analyze the residuals to examine homogeneity of variance and outliers

# Residual vs. fitted plot
plot(myAov1, which=1) # Plot residuals versus estimated means.

summary(myAov1)
##              Df Sum Sq Mean Sq F value Pr(>F)    
## Diet          5  12734  2546.8    57.1 <2e-16 ***
## Residuals   343  15297    44.6                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The residual plot indicates no violation of assumptions.

Plot the Honestly Significant Differences (Sleuth Ch 6). Not as powerful as testing a priori hypotheses

TukeyHSD(myAov1)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Lifetime ~ Diet, data = case0501)
## 
## $Diet
##                    diff        lwr         upr     p adj
## N/R40-N/N85  12.4254386   8.885436  15.9654413 0.0000000
## N/R50-N/N85   9.6059550   6.202170  13.0097399 0.0000000
## NP-N/N85     -5.2891873  -9.017748  -1.5606269 0.0008380
## R/R50-N/N85  10.1944862   6.593417  13.7955556 0.0000000
## lopro-N/N85   6.9944862   3.393417  10.5955556 0.0000008
## N/R50-N/R40  -2.8194836  -6.175736   0.5367684 0.1564608
## NP-N/R40    -17.7146259 -21.399845 -14.0294069 0.0000000
## R/R50-N/R40  -2.2309524  -5.787127   1.3252222 0.4684413
## lopro-N/R40  -5.4309524  -8.987127  -1.8747778 0.0002306
## NP-N/R50    -14.8951423 -18.449713 -11.3405719 0.0000000
## R/R50-N/R50   0.5885312  -2.832070   4.0091319 0.9963976
## lopro-N/R50  -2.6114688  -6.032070   0.8091319 0.2460200
## R/R50-NP     15.4836735  11.739756  19.2275913 0.0000000
## lopro-NP     12.2836735   8.539756  16.0275913 0.0000000
## lopro-R/R50  -3.2000000  -6.816968   0.4169683 0.1167873
plot(TukeyHSD(myAov1))


Box et al. (2005) Graphical ANOVA see Display 5.16, 5.17, p 134 & 135

anovaPlot(myAov1, stacked = TRUE, base = TRUE, axes = TRUE,
          faclab = TRUE, labels = TRUE, cex = par("cex"),
          cex.lab = par("cex.lab"))

The graphical ANOVA shows the strong differences between most of the diets, especially NP and N/R40


All Pairwise t tests with pooled sd, as in Ramsey & Schafer (2013)

pairwise.t.test(case0501$Lifetime,case0501$Diet, pool.SD=TRUE, p.adj="none")
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  case0501$Lifetime and case0501$Diet 
## 
##       N/N85   N/R40   N/R50   NP      R/R50
## N/R40 < 2e-16 -       -       -       -    
## N/R50 1.1e-14 0.017   -       -       -    
## NP    5.9e-05 < 2e-16 < 2e-16 -       -    
## R/R50 8.9e-15 0.073   0.622   < 2e-16 -    
## lopro 5.2e-08 1.6e-05 0.029   < 2e-16 0.012
## 
## P value adjustment method: none

General Linear Hypothesis Testing (glht package) of Specified Means

p-VALUES AND CONFIDENCE INTERVALS FOR SPECIFIED COMPARISONS OF MEANS

if(require(multcomp)){
  diet <- factor(case0501$Diet,labels=c("lopro", "NN85", "NR40", "NR50", "NP", "RR50"))
  
  myAov2 <- aov(Lifetime ~ diet - 1, data=case0501)  # Note the -1 is in the Sleuth3 vignette
  # It drops the dietlopro diet from the analysis, making it the llinear model
  # reference group
  summary(myAov2)
  myComparisons <- glht(myAov2,
                        linfct=c("dietNR50 - dietNN85 = 0",
                                 "dietNR40 - dietNR50 = 0",
                                 "dietRR50 - dietNR50 = 0",
                                 "dietlopro - dietNR50 = 0",
                                 "dietNN85 - dietNP = 0") )
  summary(myComparisons,test=adjusted("none")) # No multiple comparison adjust.
  confint(myComparisons, calpha = univariate_calpha()) # No adjustment
}
## 
##   Simultaneous Confidence Intervals
## 
## Fit: aov(formula = Lifetime ~ diet - 1, data = case0501)
## 
## Quantile = 1.9669
## 95% confidence level
##  
## 
## Linear Hypotheses:
##                           Estimate lwr      upr     
## dietNR50 - dietNN85 == 0  -17.7146 -20.2438 -15.1854
## dietNR40 - dietNR50 == 0   14.8951  12.4556  17.3347
## dietRR50 - dietNR50 == 0   12.2837   9.7142  14.8532
## dietlopro - dietNR50 == 0   5.2892   2.7302   7.8481
## dietNN85 - dietNP == 0      2.2310  -0.2097   4.6716

Case Study 5.2 The Spock Conspiracy Trial—An Observational Study: R analysis

First look at the structure of the data

str(case0502)
## 'data.frame':    46 obs. of  2 variables:
##  $ Percent: num  6.4 8.7 13.3 13.6 15 15.2 17.7 18.6 23.1 16.8 ...
##  $ Judge  : Factor w/ 7 levels "A","B","C","D",..: 7 7 7 7 7 7 7 7 7 1 ...
favstats(Percent ~ Judge, data = case0502)  # from Horton's mosaic
##     Judge  min     Q1 median     Q3  max     mean        sd n missing
## 1       A 16.8 30.800  33.60 40.500 48.9 34.12000 11.941817 5       0
## 2       B 27.0 29.675  32.35 34.800 45.6 33.61667  6.582224 6       0
## 3       C 21.0 27.500  30.50 32.500 33.8 29.10000  4.592929 9       0
## 4       D 24.3 25.650  27.00 28.350 29.7 27.00000  3.818377 2       0
## 5       E 17.7 20.150  24.70 33.075 40.2 26.96667  9.010142 6       0
## 6       F 16.5 23.500  26.70 29.800 36.2 26.80000  5.968878 9       0
## 7 Spock's  6.4 13.300  15.00 17.700 23.1 14.62222  5.038794 9       0

Do a graphical analysis with a boxplot to evaluate outliers and homogeneity of variance

# Re-order levels for better boxplot organization:
## Make new factor level names (with sample sizes) and analyze boxplots
myNames <- c("A (5)", "B (6)", "C (9)", "D (2)", "E (6)", "F (9)", "Spock's (9)")

boxplot(Percent ~ Judge, ylab = "Percent of Women on Judges' Venires",
        names = myNames, xlab = "Judge (and number of venires)",
        main = "Percent Women on Venires of 7 Massachusetts Judges", 
        data=case0502)


Plot a colored boxplot

boxplot(Percent ~ Judge,  ylab= "Percent of Women on Judges' Venires",
        names=myNames, xlab="Judge (and number of venires)",
        main= "Percent Women on Venires of 7 Massachusetts Judges",
        col="green", boxlwd=2,  medlwd=2,  whisklty=1,  whisklwd=2,
        staplewex=.2, staplelwd=2,  outlwd=2,  outpch=21, outbg="green",
        outcex=1.5, data=case0502)

Do the analysis of variance (ANOVA) for Case0502 Spock data

myAov2  <- aov(Percent ~ Judge, data=case0502)
summary(myAov2) # Initial screening. Any evidence of judge differences? (yes)
##             Df Sum Sq Mean Sq F value  Pr(>F)    
## Judge        6   1927   321.2   6.718 6.1e-05 ***
## Residuals   39   1864    47.8                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary.lm(myAov2)
## 
## Call:
## aov(formula = Percent ~ Judge, data = case0502)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -17.320  -4.367  -0.250   3.319  14.780 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   34.1200     3.0921  11.034 1.47e-13 ***
## JudgeB        -0.5033     4.1868  -0.120   0.9049    
## JudgeC        -5.0200     3.8566  -1.302   0.2007    
## JudgeD        -7.1200     5.7848  -1.231   0.2258    
## JudgeE        -7.1533     4.1868  -1.709   0.0955 .  
## JudgeF        -7.3200     3.8566  -1.898   0.0651 .  
## JudgeSpock's -19.4978     3.8566  -5.056 1.05e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.914 on 39 degrees of freedom
## Multiple R-squared:  0.5083, Adjusted R-squared:  0.4326 
## F-statistic: 6.718 on 6 and 39 DF,  p-value: 6.096e-05
attributes(myAov2)
## $names
##  [1] "coefficients"  "residuals"     "effects"       "rank"         
##  [5] "fitted.values" "assign"        "qr"            "df.residual"  
##  [9] "contrasts"     "xlevels"       "call"          "terms"        
## [13] "model"        
## 
## $class
## [1] "aov" "lm"

Analyze the residuals to examine homogeneity of variance and outliers

# Residual vs. fitted plot
plot(myAov2, which=1) # Plot residuals versus estimated means.

summary(myAov2)
##             Df Sum Sq Mean Sq F value  Pr(>F)    
## Judge        6   1927   321.2   6.718 6.1e-05 ***
## Residuals   39   1864    47.8                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The residual plot indicates no apparent violation of assumptions.


Analysis 1. TWO-SAMPLE t-TEST (ASSUMING NON-SPOCK JUDGES HAVE A COMMON MEAN)

SpockOrOther <- factor(ifelse(case0502$Judge=="Spock's","Spock","Other"))                    
aovFull      <- aov(Percent ~ Judge, data=case0502)
aovReduced   <- aov(Percent ~ SpockOrOther, data=case0502) 
anova(aovReduced,aovFull) #Any evidence that 7 means fit better than 2 means?  
## Analysis of Variance Table
## 
## Model 1: Percent ~ SpockOrOther
## Model 2: Percent ~ Judge
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     44 2190.9                           
## 2     39 1864.5  5    326.46 1.3658 0.2582
 # Evidence that 2 means differ?
t.test(Percent ~ SpockOrOther, var.equal=TRUE, data=case0502)
## 
##  Two Sample t-test
## 
## data:  Percent by SpockOrOther
## t = 5.6697, df = 44, p-value = 1.03e-06
## alternative hypothesis: true difference in means between group Other and group Spock is not equal to 0
## 95 percent confidence interval:
##   9.584045 20.155294
## sample estimates:
## mean in group Other mean in group Spock 
##            29.49189            14.62222

There is strong evidence that the Spock judge differs from the other 6. There is little evidence for a difference among the 6 non-Spock judges

Graphical ANOVA from Box, Hunter & Hunter, 2nd edition.

anovaPlot(aovFull, stacked = TRUE, base = TRUE, axes = TRUE,
          faclab = TRUE, labels = TRUE, cex = par("cex"),
          cex.lab = par("cex.lab"))

There is strong evidence from the Box et al. (2005) graphical ANOVA that the Spock juries differs from the other 6 in the percentage of females.


ANALYSIS 2. COMPARE SPOCK MEAN TO AVERAGE OF OTHER MEANS, using multcomp::glht

myAov3 <- aov(Percent ~ Judge - 1, data = case0502)

myContrast    <- rbind(c(1/6, 1/6, 1/6, 1/6, 1/6, 1/6, - 1))
if(require(multcomp)) {    
  myComparison  <- glht(myAov3, linfct=myContrast) 
  summary(myComparison, test=adjusted("none"))
  confint(myComparison)
}
## 
##   Simultaneous Confidence Intervals
## 
## Fit: aov(formula = Percent ~ Judge - 1, data = case0502)
## 
## Quantile = 2.0227
## 95% family-wise confidence level
##  
## 
## Linear Hypotheses:
##        Estimate lwr     upr    
## 1 == 0 14.9783   9.6348 20.3219

myAov3 <- aov(Percent ~ Judge - 1)

myContrast <- rbind(c(1/6, 1/6, 1/6, 1/6, 1/6, 1/6, - 1)) if(require(multcomp)){ # use multcomp library myComparison <- glht(myAov3, linfct=myContrast) summary(myComparison, test=adjusted(“none”))
confint(myComparison) }


Sleuth3 5.2 Comparing any two of several means

Comparisons among means in ANOVA can be analyzed using t statistics, with a new, more precise estimate of pooled error. It is that pooling, with higher df, that makes ANOVA a more powerful method than multiple t tests.

Sleuth3 Display 0506 sp = √ Error Mean Square = √ Within Groups MS


Case 5.1 Pooled standard error


Extra Sum of Squares F test


F distribution


F distribution


Display 5.10


Case0502 ANOVA table


Case0502 ANOVA linear contrast


Testing linear contrasts with R’s multcomp

myContrast    <- rbind(c(1/6, 1/6, 1/6, 1/6, 1/6, 1/6, - 1))
if(require(multcomp)){  # use multcomp library
  myComparison  <- glht(myAov3, linfct=myContrast) 
  summary(myComparison, test=adjusted("none"))   
  confint(myComparison) 
}
## 
##   Simultaneous Confidence Intervals
## 
## Fit: aov(formula = Percent ~ Judge - 1, data = case0502)
## 
## Quantile = 2.0227
## 95% family-wise confidence level
##  
## 
## Linear Hypotheses:
##        Estimate lwr     upr    
## 1 == 0 14.9783   9.6348 20.3219

When is an effect Random?

See Sleuth Page 137-139 in 3rd ed, 136-138 in 2nd: ‘The Random Effects model’

Quinn and Keough on random effects

Spock judge versus the other six judges. Is the judge effect fixed or random?

  • Model I ANOVA: Fixed effects ANOVA: test for differences in the averages among groups

  • Model II ANOVA: Random effects ANOVA: test differences in variances due to the group classification

  • Model III model: Fixed & random factors

Note: The calculations are often identical for random and fixed-effects ANOVA, but the interpretations are different. With Factorial ANOVA (>1 factor), the F statistics differ among models, with a different denominator mean square for random factors. The inference allowed differs among models

ANOVA Robustness of Assumptions

Sleuth3 Display 5.13 p 131


Sleuth3 Display 5.15 p 133


Sleuth3 Display 5.19 p 139


Cumming and Finch (2005) Inference by eye


Cumming and Finch (2005) Inference by eye

Statistical versus scientific significance

Always report the effect size and NEVER report ‘significant’ or ‘not significant’

Buhl-Mortensen (1996) The Precautionary Principle


Type II error and Power


Nonparametric 1-way ANOVA Kruskal Wallis test

Type II error and Power

Conclusions

Assumptions

  • Homoscedasticity or equal variance among groups

    • Levene’s test a rough guide

    • Boxplots or residual plots are the standard tools for assessing homoscedasticity (equal variance among groups)

    • Spread vs. Level plots

  • Independence of errors among groups a key ANOVA assumption

  • Normally distributed errors (not underlying data) not crucial

An ANOVA is more efficient & powerful than multiple, separate t tests

  • The ANOVA error MS (=within groups MS) provides a more precise estimate of the population standard deviation [It is not a smaller estimate of error {it is an unbiased estimator})

Kruskal-Wallis ANOVA is the rank-based analogue of 1-way ANOVA and is resistant to outliers but not unequal spread

  • Ties correction must be used

  • Effect sizes, hierarchic structure, and covariates difficult to handle

ANOVA tests for difference in means (fixed effect) or whether σ2= 0 (random effect) or both (mixed model)

  • Fixed vs. random effects

    • The choice of fixed vs. random effects is often crucial and depends on whether the factor levels (judges in the Spock example) represent a random or representative sample from some larger statistical population

    • The F statistics and interpretation of the results sometimes change depending on whether fixed or random effects are chosen