Fisher’s contributions
Major types of ANOVA
Steps in doing ANOVA
Case Study 5.1 Diet Restriction & Longevity
5 a priori hypotheses
boxplots indicate no problem with heteroscedasticity
Statistical summary & evaluation of the 5 hypotheses
Graphical ANOVA
Case Study 5.2: The Spock trial;
boxplots indicate no problem with heteroscedasticity
Statistical Summary
Graphical ANOVA
Comparing 2 means with the pooled estimate of the sd
Testing a priori hypotheses with t statistic with pooled sd from error mean square
ANOVA can be performed given just means and variances
Extra sum of squares F test
The F distribution
The ANOVA table
Fixed effects ANOVA
nested ANOVA table
Type II & Type III sums of squares
ANOVA Models I, II, and III
Linear contrasts
Random vs. Fixed Effects
ANOVA assumptions
Kruskal-Wallis nonparametric ANOVA
Interpreting Confidence Limits
Statistical vs. Scientific significance
Spock Judge vs. the others: Fixed or Random Effect?
Conclusions
## 25 terms invented by Fisher (courtesy of HA David & David Bee (AP Stat group)
variance 1918a; analysis of variance 1918b; likelihood 1921; maximum likelihood 1922a; parameter 1922a; statistic 1922a; consistency 1922a; efficiency 1922a; sufficiency 1922a; degrees of freedom 1922b; Student’s t, t 1924; z-distribution 1924; test of significance 1925a; level of significance 1925a; sufficient statistic 1925b; randomization 1926; randomized blocks 1926; confounding 1926; interaction 1926; sampling distribution 1928; covariance 1930; Greco-Latin square (with F. Yates) 1934; null hypothesis 1935; Yates’s correction for continuity 1936; normal score (with F. Yates) 1938
One can’t really appreciate the need for proper replication without considering the implications for testing treatment effects with ANOVA
Hurlbert’s (1984) monograph criticizing statistics in ecological papers is largely a criticism of inappropriate ANOVA design
While ANOVA is a proper subset of the general linear model (GLM) and regression, as we’ll see, the concepts involving design and partitioning degrees of freedom are more evident in ANOVA models
Ehinger’s partition of statistical methods
image source: Dani Servén Marín (one of the developers of pyGAM)
One-factor or one-way ANOVA
1 factor with 2 or more levels
If there is just 1 factor with just 2 levels, ANOVA is identical to an independent samples Student’s t test
Randomized block ANOVA
Factorial ANOVA
2 or more factors with 2 or more levels of each factor
24 factorial: 4 factors, each with two levels
Can assess the interactions among variables, for example salinity and temperature effects on marine animal growth
Split-plot ANOVA
Full replication of factors isn’t possible, so factors divided into whole plots and subplots
Experiments in greenhouses or fields; large wafers used to print integrated circuits
Nested or hierarchical ANOVA
cores within quadrats, replicate quadrats within treatments, fish within aquaria
What is the experimental unit? It’s the smallest unit to which a treatment can be applied
For example, for fish within aquaria in an experiment on ocean acidity , the experimental units on which degrees of freedom are alloted are the aquaria not the fish
Mixed model ANOVA: random and fixed factors, often with nested factors
Repeated measures ANOVA
Repeated sampling of the same plots or individuals
Now handled as longitudinal models, usually as a type of mixed model
R’s lme4 & lme
Design the experiment or survey after identifying the hypotheses and describe statistical tests BEFORE collecting data (a priori vs . a posteriori hypotheses)
Do a pre-test or preliminary survey
If the variance is unknown, consider doing a preliminary experiment to calculate power for the full analysis
Also, a pre-test will allow covariates and problems to be identified
Endeavor to create balanced designs with equal number of replicates at each combination of treatment and block levels
Decide whether your factors are fixed or random
ANOVA tests for difference in means (fixed effect) or whether σ2= 0 (random effect) or both (mixed model ANOVA)
The choice of fixed vs. random effects is often crucial and depends on whether the factor levels represent a random or representative sample from some larger statistical population
The F statistics, the interpretation of the results, and the extent of statistical inference often change markedly depending on whether factors are fixed or random
Avoid pseudoreplication (Hurlbert 1984)
Pseudoreplication only occurs after an incorrect analysis has been presented: no analysis, no pseudoreplication
Pseudoreplication has two causes: inadequate replication at the design stage, or using an inappropriate model especially the wrong ANOVA model with an inappropriate error mean square and error d.f.
Examples of model misspecification
Failing to use a repeated measures design for longitudinal data
Confusing factorial and nested ANOVA
Inappropriately pooling terms in a nested, randomized block, or factorial ANOVA
Avoid pseudofactorialism (Hurlbert & White 1993): Using a factorial ANOVA when separate ANOVAs are appropriate
Set the alpha level for hypothesis tests (i.e., the critical values) in advance. Tests and hypothesis, as far as possible, should be specified in advance. A priori hypotheses, if a small subset of possible tests, can be tested at the experiment-wise alpha level, usually α=0.05.
Note that Mead, Gelman and Hurlbert all reject the use of multiple comparison tests
Patterns which reveal themselves after the data have been analyzed, or just graphed, must be assessed using an appropriate multiple comparison procedure that reduces the test α to maintain the experiment-wise or family-wise α level
Sample the appropriate populations & tabulate the data
Graph the data and critically look for violations of the assumptions, especially unequal spread
Unequal variance = heteroscedasticity = heteroskedacity = lack of homoscedasticity
Unequal variance is best revealed by box plots
Unequal spread can be tested with Levene’s test (there are 5 different versions of Levene’s test)
Transform the data to correct unequal spread
√ transform for Poisson-distributed counts, log (X+1) for logarithmically or log-normally distributed data
Logit (log (p/(1-p)) transform or arcsin √P for binomial data
Note that Zuur in his many R-based books rarely if ever transforms data, preferring to use analyses that assume non-normal error structure
Perform the ANOVA
Assess higher order interaction effects and analyze the influence of outliers
Graphically display residuals vs. Expected values & assess heteroscedasticity (again) and effects of outliers. Note that an outlier is only an outlier when compared to an underlying probability model
Use appropriate rules for pooling sums of squares to produce more powerful tests of lower order interactions & main effects
Evaluate null hypotheses, report p values & effect sizes
Examine the data for outliers, but never remove outliers without strong justification
Examine data notebooks to find out if there were conditions that justify treating outliers as a different statistical population (e.g., different analyst or different analytical instrument)
If the outlier’s removal might be justified, do the analysis with and without the outlier
If the conclusion remains the same, leave the outlier in, unless it has caused a major violation in assumptions
If the conclusion differs, drop the outlier and all similar data
If there is no reason for removing the outlier
Use rank-based methods, like Kruskal-Wallis or Friedman’s ANOVA which are resistant to outlier effects
Report the results with and without the oultier(s)
Multiple comparisons procedures, from most to least conservative: Sleuth Chapter 6 (next week)
Scheffé: must be used whenever more than one group is combined in a linear contrast, more conservative than Bonferroni
Bonferroni
Tukey’s Honestly Significant Difference (HSD), also called Tukey-Kramer if sample sizes are unequal
Student-Newman-Keuls More powerful than HSD
Benjamini–Hochberg False Discovery Rate
Dunnet’s, appropriate if there is a control group
Tukey’s LSD with F-protection: Use LSD if the overall F statistic is significant; not sufficiently conservative
Report all relevant p values and df needed to reconstruct the ANOVA table
Hurlbert (1984): it wasn’t clear in the majority of ecological studies what test was performed
Avoid the significant/non-significant dichotomy (see Sterne & Smith 2001, Hurlbert et al. 2019, Gelman et al. 2021)
Summarize the results of the ANOVA in the text, table or figure. It is unlikely that a journal will allow both a table and figure, but summary in the text is essential
Report the effect size (i.e., difference in means with 95% confidence intervals, or estimate and standard error)
Report negative results, (e.g., failure to reject the null hypothesis)
Sleuth3 Display 5.2
***
Sleuth3 Display 5.3
If hypotheses are specified in advance, then you can test at a pre-set α level, without a posteriori (or post hoc, multiple comparison) adjustment of α levels. Recall that alpha = P(Type I error)
See Cook & Farewell (1996, J. Roy. Stat. Assoc. A; Posted in Blackboard). In dose-response studies, no need to adjust for number of dose treatments.
One large design allows the use of a more precise estimate of the error variance
Separate control vs. treatment t tests are not powerful
If interaction effects are evident, separate tests can be misleading. They can miss interaction effects.
There is overwhelming evidence that mean lifetimes in the six groups are different (p-value < 0.001); analysis of variance F-test).
Analysis of the 5 particular questions are
There is convincing evidence that lifetime increases as a result of restricting the diet from 85 kcal/wk to 50 kcal/wk (1-sided p-value < 0.0001; t test)
There is no evidence that reducing the calories before weaning increased lifespan, when the caloric intake after weaning is 50 kcal/wk (1-sided p value = 0.32, t test). A 95% CI for the amount by which the lifetime under the R/R50 diet exceeds the lifetime under the N/R50 diet is - 1.7 to 2.9 months.
Further restriction of the diet from 50 to 40 kcal/wk increases lifetime by an estimated 2.8 months (95% CI: 0.5 to 5.1 months). The evidence that this effect is greater than zero is moderate (p=0.017, t test).
There was moderate evidence that lifetime was decreased by the lowering of protein in addition to the 50 kcal/wk diet (2-sided p value =0.024; t-test)
There is convincing evidence that the control mice live longer than the mice on the non-purified (NP) diet (1-sided p-value <0.0001)
Note that all 5 of these hypotheses can be tested as Tukey Least Significant Difference (LSD) tests (or linear contrasts) see Sleuth Ch 6
Case 0501 Box et al. (2005) Graphical ANOVA
Sleuth, page 117: Dr. Spock’s venire contained only 1 woman, who was released by the prosecution
Sleuth3 Display 5.4
Is there evidence that women were underrepresented on the Spock judge’s venires
Is there evidence that there are differences in women’s representation on the other juries?
NBC News 2/27/2015
The defense said 1,373 people, summoned from a population of about 5 million in eastern Massachusetts, were originally given numbers based on a random pool order list. New numbers were later assigned, based on when the jurors reported to court to complete written questionnaires.
The defense argued that the reordering undermined the randomness of the selection process and pushed certain groups — including blacks, people under 30 and people who live in Boston — down on the list and made them less likely to be chosen for the jury.
Sleuth3 Case 0502. See Display 5.5 in Sleuth3
The percentage of women on the Spock judge’s venires were substantially lower than the other judges (t test of Spock judge vs.‘Other judges’)
There is little evidence to reject the null hypothesis of no difference in female representation among the other judges p=0.32 (1-way ANOVA)
The percentage of women is 15% less on the Spock judge’s venires (95% CI: 10% to 20%)
Case0502 Graphical ANOVA, Box et al. (2005)
str(case0501)
## 'data.frame': 349 obs. of 2 variables:
## $ Lifetime: num 35.5 35.4 34.9 34.8 33.8 33.5 32.6 32.4 31.8 31.6 ...
## $ Diet : Factor w/ 6 levels "N/N85","N/R40",..: 4 4 4 4 4 4 4 4 4 4 ...
favstats(Lifetime ~ Diet, data = case0501) # from Horton's mosaic
## Diet min Q1 median Q3 max mean sd n missing
## 1 N/N85 17.9 31.400 33.10 36.40 42.3 32.69123 5.125297 57 0
## 2 N/R40 19.6 42.275 46.05 50.35 54.6 45.11667 6.703406 60 0
## 3 N/R50 18.6 37.950 43.90 48.20 51.9 42.29718 7.768195 71 0
## 4 NP 6.4 24.800 28.90 31.40 35.5 27.40204 6.133701 49 0
## 5 R/R50 24.2 39.150 43.95 48.35 50.7 42.88571 6.683152 56 0
## 6 lopro 23.4 35.000 41.05 46.45 49.7 39.68571 6.991695 56 0
# Re-order levels for better boxplot organization:
myDiet <- factor(case0501$Diet, levels=c("NP","N/N85","N/R50","R/R50","lopro","N/R40") )
myNames <- c("NP(49)","N/N85(57)","N/R50(71)","R/R50(56)","lopro(56)",
"N/R40(60)") # Make these for boxplot labeling.
boxplot(Lifetime ~ myDiet, ylab= "Lifetime (months)", names=myNames,
xlab="Treatment (and sample size)", data=case0501)
densityplot(~Lifetime, groups = Diet, auto.key = TRUE, data=case0501)
myAov1 <- aov(Lifetime ~ Diet, data=case0501) # One-way analysis of variance
myAov1
## Call:
## aov(formula = Lifetime ~ Diet, data = case0501)
##
## Terms:
## Diet Residuals
## Sum of Squares 12733.94 15297.42
## Deg. of Freedom 5 343
##
## Residual standard error: 6.678239
## Estimated effects may be unbalanced
summary(myAov1)
## Df Sum Sq Mean Sq F value Pr(>F)
## Diet 5 12734 2546.8 57.1 <2e-16 ***
## Residuals 343 15297 44.6
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary.lm(myAov1)
##
## Call:
## aov(formula = Lifetime ~ Diet, data = case0501)
##
## Residuals:
## Min 1Q Median 3Q Max
## -25.5167 -3.3857 0.8143 5.1833 10.0143
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 32.6912 0.8846 36.958 < 2e-16 ***
## DietN/R40 12.4254 1.2352 10.059 < 2e-16 ***
## DietN/R50 9.6060 1.1877 8.088 1.06e-14 ***
## DietNP -5.2892 1.3010 -4.065 5.95e-05 ***
## DietR/R50 10.1945 1.2565 8.113 8.88e-15 ***
## Dietlopro 6.9945 1.2565 5.567 5.25e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.678 on 343 degrees of freedom
## Multiple R-squared: 0.4543, Adjusted R-squared: 0.4463
## F-statistic: 57.1 on 5 and 343 DF, p-value: < 2.2e-16
attributes(myAov1)
## $names
## [1] "coefficients" "residuals" "effects" "rank"
## [5] "fitted.values" "assign" "qr" "df.residual"
## [9] "contrasts" "xlevels" "call" "terms"
## [13] "model"
##
## $class
## [1] "aov" "lm"
# Residual vs. fitted plot
plot(myAov1, which=1) # Plot residuals versus estimated means.
summary(myAov1)
## Df Sum Sq Mean Sq F value Pr(>F)
## Diet 5 12734 2546.8 57.1 <2e-16 ***
## Residuals 343 15297 44.6
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The residual plot indicates no violation of assumptions.
TukeyHSD(myAov1)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Lifetime ~ Diet, data = case0501)
##
## $Diet
## diff lwr upr p adj
## N/R40-N/N85 12.4254386 8.885436 15.9654413 0.0000000
## N/R50-N/N85 9.6059550 6.202170 13.0097399 0.0000000
## NP-N/N85 -5.2891873 -9.017748 -1.5606269 0.0008380
## R/R50-N/N85 10.1944862 6.593417 13.7955556 0.0000000
## lopro-N/N85 6.9944862 3.393417 10.5955556 0.0000008
## N/R50-N/R40 -2.8194836 -6.175736 0.5367684 0.1564608
## NP-N/R40 -17.7146259 -21.399845 -14.0294069 0.0000000
## R/R50-N/R40 -2.2309524 -5.787127 1.3252222 0.4684413
## lopro-N/R40 -5.4309524 -8.987127 -1.8747778 0.0002306
## NP-N/R50 -14.8951423 -18.449713 -11.3405719 0.0000000
## R/R50-N/R50 0.5885312 -2.832070 4.0091319 0.9963976
## lopro-N/R50 -2.6114688 -6.032070 0.8091319 0.2460200
## R/R50-NP 15.4836735 11.739756 19.2275913 0.0000000
## lopro-NP 12.2836735 8.539756 16.0275913 0.0000000
## lopro-R/R50 -3.2000000 -6.816968 0.4169683 0.1167873
plot(TukeyHSD(myAov1))
anovaPlot(myAov1, stacked = TRUE, base = TRUE, axes = TRUE,
faclab = TRUE, labels = TRUE, cex = par("cex"),
cex.lab = par("cex.lab"))
The graphical ANOVA shows the strong differences between most of the diets, especially NP and N/R40
pairwise.t.test(case0501$Lifetime,case0501$Diet, pool.SD=TRUE, p.adj="none")
##
## Pairwise comparisons using t tests with pooled SD
##
## data: case0501$Lifetime and case0501$Diet
##
## N/N85 N/R40 N/R50 NP R/R50
## N/R40 < 2e-16 - - - -
## N/R50 1.1e-14 0.017 - - -
## NP 5.9e-05 < 2e-16 < 2e-16 - -
## R/R50 8.9e-15 0.073 0.622 < 2e-16 -
## lopro 5.2e-08 1.6e-05 0.029 < 2e-16 0.012
##
## P value adjustment method: none
if(require(multcomp)){
diet <- factor(case0501$Diet,labels=c("lopro", "NN85", "NR40", "NR50", "NP", "RR50"))
myAov2 <- aov(Lifetime ~ diet - 1, data=case0501) # Note the -1 is in the Sleuth3 vignette
# It drops the dietlopro diet from the analysis, making it the llinear model
# reference group
summary(myAov2)
myComparisons <- glht(myAov2,
linfct=c("dietNR50 - dietNN85 = 0",
"dietNR40 - dietNR50 = 0",
"dietRR50 - dietNR50 = 0",
"dietlopro - dietNR50 = 0",
"dietNN85 - dietNP = 0") )
summary(myComparisons,test=adjusted("none")) # No multiple comparison adjust.
confint(myComparisons, calpha = univariate_calpha()) # No adjustment
}
##
## Simultaneous Confidence Intervals
##
## Fit: aov(formula = Lifetime ~ diet - 1, data = case0501)
##
## Quantile = 1.9669
## 95% confidence level
##
##
## Linear Hypotheses:
## Estimate lwr upr
## dietNR50 - dietNN85 == 0 -17.7146 -20.2438 -15.1854
## dietNR40 - dietNR50 == 0 14.8951 12.4556 17.3347
## dietRR50 - dietNR50 == 0 12.2837 9.7142 14.8532
## dietlopro - dietNR50 == 0 5.2892 2.7302 7.8481
## dietNN85 - dietNP == 0 2.2310 -0.2097 4.6716
str(case0502)
## 'data.frame': 46 obs. of 2 variables:
## $ Percent: num 6.4 8.7 13.3 13.6 15 15.2 17.7 18.6 23.1 16.8 ...
## $ Judge : Factor w/ 7 levels "A","B","C","D",..: 7 7 7 7 7 7 7 7 7 1 ...
favstats(Percent ~ Judge, data = case0502) # from Horton's mosaic
## Judge min Q1 median Q3 max mean sd n missing
## 1 A 16.8 30.800 33.60 40.500 48.9 34.12000 11.941817 5 0
## 2 B 27.0 29.675 32.35 34.800 45.6 33.61667 6.582224 6 0
## 3 C 21.0 27.500 30.50 32.500 33.8 29.10000 4.592929 9 0
## 4 D 24.3 25.650 27.00 28.350 29.7 27.00000 3.818377 2 0
## 5 E 17.7 20.150 24.70 33.075 40.2 26.96667 9.010142 6 0
## 6 F 16.5 23.500 26.70 29.800 36.2 26.80000 5.968878 9 0
## 7 Spock's 6.4 13.300 15.00 17.700 23.1 14.62222 5.038794 9 0
# Re-order levels for better boxplot organization:
## Make new factor level names (with sample sizes) and analyze boxplots
myNames <- c("A (5)", "B (6)", "C (9)", "D (2)", "E (6)", "F (9)", "Spock's (9)")
boxplot(Percent ~ Judge, ylab = "Percent of Women on Judges' Venires",
names = myNames, xlab = "Judge (and number of venires)",
main = "Percent Women on Venires of 7 Massachusetts Judges",
data=case0502)
boxplot(Percent ~ Judge, ylab= "Percent of Women on Judges' Venires",
names=myNames, xlab="Judge (and number of venires)",
main= "Percent Women on Venires of 7 Massachusetts Judges",
col="green", boxlwd=2, medlwd=2, whisklty=1, whisklwd=2,
staplewex=.2, staplelwd=2, outlwd=2, outpch=21, outbg="green",
outcex=1.5, data=case0502)
myAov2 <- aov(Percent ~ Judge, data=case0502)
summary(myAov2) # Initial screening. Any evidence of judge differences? (yes)
## Df Sum Sq Mean Sq F value Pr(>F)
## Judge 6 1927 321.2 6.718 6.1e-05 ***
## Residuals 39 1864 47.8
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary.lm(myAov2)
##
## Call:
## aov(formula = Percent ~ Judge, data = case0502)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.320 -4.367 -0.250 3.319 14.780
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34.1200 3.0921 11.034 1.47e-13 ***
## JudgeB -0.5033 4.1868 -0.120 0.9049
## JudgeC -5.0200 3.8566 -1.302 0.2007
## JudgeD -7.1200 5.7848 -1.231 0.2258
## JudgeE -7.1533 4.1868 -1.709 0.0955 .
## JudgeF -7.3200 3.8566 -1.898 0.0651 .
## JudgeSpock's -19.4978 3.8566 -5.056 1.05e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.914 on 39 degrees of freedom
## Multiple R-squared: 0.5083, Adjusted R-squared: 0.4326
## F-statistic: 6.718 on 6 and 39 DF, p-value: 6.096e-05
attributes(myAov2)
## $names
## [1] "coefficients" "residuals" "effects" "rank"
## [5] "fitted.values" "assign" "qr" "df.residual"
## [9] "contrasts" "xlevels" "call" "terms"
## [13] "model"
##
## $class
## [1] "aov" "lm"
# Residual vs. fitted plot
plot(myAov2, which=1) # Plot residuals versus estimated means.
summary(myAov2)
## Df Sum Sq Mean Sq F value Pr(>F)
## Judge 6 1927 321.2 6.718 6.1e-05 ***
## Residuals 39 1864 47.8
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The residual plot indicates no apparent violation of assumptions.
SpockOrOther <- factor(ifelse(case0502$Judge=="Spock's","Spock","Other"))
aovFull <- aov(Percent ~ Judge, data=case0502)
aovReduced <- aov(Percent ~ SpockOrOther, data=case0502)
anova(aovReduced,aovFull) #Any evidence that 7 means fit better than 2 means?
## Analysis of Variance Table
##
## Model 1: Percent ~ SpockOrOther
## Model 2: Percent ~ Judge
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 44 2190.9
## 2 39 1864.5 5 326.46 1.3658 0.2582
# Evidence that 2 means differ?
t.test(Percent ~ SpockOrOther, var.equal=TRUE, data=case0502)
##
## Two Sample t-test
##
## data: Percent by SpockOrOther
## t = 5.6697, df = 44, p-value = 1.03e-06
## alternative hypothesis: true difference in means between group Other and group Spock is not equal to 0
## 95 percent confidence interval:
## 9.584045 20.155294
## sample estimates:
## mean in group Other mean in group Spock
## 29.49189 14.62222
There is strong evidence that the Spock judge differs from the other 6. There is little evidence for a difference among the 6 non-Spock judges
anovaPlot(aovFull, stacked = TRUE, base = TRUE, axes = TRUE,
faclab = TRUE, labels = TRUE, cex = par("cex"),
cex.lab = par("cex.lab"))
There is strong evidence from the Box et al. (2005) graphical ANOVA that the Spock juries differs from the other 6 in the percentage of females.
myAov3 <- aov(Percent ~ Judge - 1, data = case0502)
myContrast <- rbind(c(1/6, 1/6, 1/6, 1/6, 1/6, 1/6, - 1))
if(require(multcomp)) {
myComparison <- glht(myAov3, linfct=myContrast)
summary(myComparison, test=adjusted("none"))
confint(myComparison)
}
##
## Simultaneous Confidence Intervals
##
## Fit: aov(formula = Percent ~ Judge - 1, data = case0502)
##
## Quantile = 2.0227
## 95% family-wise confidence level
##
##
## Linear Hypotheses:
## Estimate lwr upr
## 1 == 0 14.9783 9.6348 20.3219
myAov3 <- aov(Percent ~ Judge - 1)
myContrast <- rbind(c(1/6, 1/6, 1/6, 1/6, 1/6, 1/6, - 1)) if(require(multcomp)){ # use multcomp library myComparison <- glht(myAov3, linfct=myContrast) summary(myComparison, test=adjusted(“none”))
confint(myComparison) }
Comparisons among means in ANOVA can be analyzed using t statistics, with a new, more precise estimate of pooled error. It is that pooling, with higher df, that makes ANOVA a more powerful method than multiple t tests.
sp = √ Error Mean Square = √ Within Groups MS
Case 5.1 Pooled standard error
Extra Sum of Squares F test
F distribution
F distribution
Display 5.10
Case0502 ANOVA table
Case0502 ANOVA linear contrast
myContrast <- rbind(c(1/6, 1/6, 1/6, 1/6, 1/6, 1/6, - 1))
if(require(multcomp)){ # use multcomp library
myComparison <- glht(myAov3, linfct=myContrast)
summary(myComparison, test=adjusted("none"))
confint(myComparison)
}
##
## Simultaneous Confidence Intervals
##
## Fit: aov(formula = Percent ~ Judge - 1, data = case0502)
##
## Quantile = 2.0227
## 95% family-wise confidence level
##
##
## Linear Hypotheses:
## Estimate lwr upr
## 1 == 0 14.9783 9.6348 20.3219
See Sleuth Page 137-139 in 3rd ed, 136-138 in 2nd: ‘The Random Effects model’
The differences among subgroup means is NOT of intrinsic interest (not the case with Judges)
If the number of levels of a factor is small relative to the total possible levels of a factor (not the case with district Judges since ALL were sampled)
Are the subgroups a representative or random sample of some larger group? (No for judges)
Quinn and Keough on random effects
Model I ANOVA: Fixed effects ANOVA: test for differences in the averages among groups
Model II ANOVA: Random effects ANOVA: test differences in variances due to the group classification
Model III model: Fixed & random factors
Note: The calculations are often identical for random and fixed-effects ANOVA, but the interpretations are different. With Factorial ANOVA (>1 factor), the F statistics differ among models, with a different denominator mean square for random factors. The inference allowed differs among models
Normality is not critical. Extremely long-tailed distributions or skewed distributions, coupled with different sample sizes present the only serious distributional problems
The assumptions of independence within and across groups is critical
The assumption of equal standard deviations in the populations is crucial. Also called the equal variance, equal spread, or homoscedasticity assumption (vs. Heterorscedasticity). Winer et al. (1991) argue that ANOVA is robust to violations of equal variance if the sample sizes are equal.
The tools are not resistant to severely outlying observations
Check the assumptions by analyzing the residuals with Levene’s test and boxplots
Sleuth3 Display 5.13 p 131
Sleuth3 Display 5.15 p 133
Sleuth3 Display 5.19 p 139
Cumming and Finch (2005) Inference by eye
Cumming and Finch (2005) Inference by eye
Always report the effect size and NEVER report ‘significant’ or ‘not significant’
Deming: report effect sizes for tests
Many statistically significant results are trivial ecologically (environmentally, chemically or socially)
Most null hypotheses (μ1 = μ2) are false and the p-value is often dependent on the sample size
Gelman never reports p-values in his statistics books, usually effect and standard error
e.g., a p value of 0.00001 may not be ecologically meaningful if there is only a minor difference in effect and a much larger difference causes meaningful changes in the ecosystem
Test statistics with large p values (>0.1) but with broad 95% confidence intervals may be consistent with important ecological or other effects
Buhl-Mortensen (1996) The Precautionary Principle
Type II error and Power
Type II error and Power
Homoscedasticity or equal variance among groups
Levene’s test a rough guide
Boxplots or residual plots are the standard tools for assessing homoscedasticity (equal variance among groups)
Spread vs. Level plots
Independence of errors among groups a key ANOVA assumption
Normally distributed errors (not underlying data) not crucial
Ties correction must be used
Effect sizes, hierarchic structure, and covariates difficult to handle
Fixed vs. random effects
The choice of fixed vs. random effects is often crucial and depends on whether the factor levels (judges in the Spock example) represent a random or representative sample from some larger statistical population
The F statistics and interpretation of the results sometimes change depending on whether fixed or random effects are chosen