Analysis of Variance (ANOVA) uses the group means to split the total sum of squared deviations into a between-groups SS and a within-groups SS. It is the conventional choice of test statistic when:
1. Normality within each group can be assumed. Check it with the Shapiro-Wilk test and histograms.
2. The within-group variances are about equal. A largest/smallest variance ratio ≤ 2 is OK; otherwise use Levene's or Bartlett's test.
3. Subjects are i.i.d. (independent and identically distributed).
(Check for outliers: questionable observations can undermine these assumptions.)
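The checks listed above can be sketched in base R on the mtcars data used later in this post (mpg by number of cylinders). This is a minimal sketch, not the post's own code. It uses only stats functions. Levene's test is left out because it needs the car package (`car::leveneTest()`), and the ≤ 2 variance-ratio threshold is the post's rule of thumb, not a formal test.

```r
# Sketch: checking the ANOVA assumptions for mpg by cyl (base R only)
data(mtcars)
g <- factor(mtcars$cyl)

# 1. Normality within each group: Shapiro-Wilk p-value per group
shapiro_p <- tapply(mtcars$mpg, g, function(x) shapiro.test(x)$p.value)
print(shapiro_p)

# 2. Equal variances: largest / smallest within-group variance (rule of thumb: <= 2)
vars <- tapply(mtcars$mpg, g, var)
var_ratio <- max(vars) / min(vars)
print(var_ratio)

# ... and Bartlett's test of homogeneity of variances (sensitive to non-normality)
bart <- bartlett.test(mtcars$mpg, g)
print(bart)
```

On mtcars, the 4-cylinder group is much more spread out than the 6-cylinder group. The variance ratio therefore comes out well above 2, which is one reason to look at `oneway.test()` later on.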
CAUTIONS:
In practice, however, losing a subject or missing some data points is very common in experiments. The result is unbalanced group counts (n_j), which decreases the power of the ANOVA. A certain amount of imbalance is acceptable, but huge discrepancies in the group counts erode the test's validity:
With strongly unbalanced n_j, differences between the within-group variances also weigh much more heavily on the F test, so a violation of assumption 2 (equal variances) becomes critical. (The "oneway" test accommodates this by adjusting the degrees of freedom and the F statistic. How it does so will be covered in a later chapter of the blog.)
Always pair your ANOVA with a non-parametric test (NPT).
Normality, equal variances and i.i.d. observations are assumed here!
ANOVA BY HAND
Your roommate comes and tells you: "I've got the figures from your survey or experiment. Here are the n, sd and mean of each group. I want an ANOVA test to see whether there is an effect of the presumed grouping…"
You tell him: "What the hell is that? Where is the raw data, the CSV file???" This is not the proper way to do it.
Your boss/roommate replies: "I don't know where this s… CSV is, probably at the UNIVERSITY that conducted the study, but the guy who did it left with his computer at UNIGE… (true story). No CSV, sorry!"
Waiting would take too long, and your boss/roommate implores you…
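This is the roommate's scenario: only n, mean and sd per group are available. In that case the whole one-way ANOVA can still be computed, because the within SS is sum((n_j - 1) * sd_j^2) and the between SS is sum(n_j * (mean_j - grand mean)^2), with the grand mean weighted by n_j. The sketch below wraps this in a hypothetical helper, `anova_from_summary()`, which is not a function from any package. To make the result checkable, it feeds the helper summaries derived from mtcars.

```r
# Hypothetical helper: one-way ANOVA from group summaries (n, mean, sd) only
anova_from_summary <- function(n, m, s) {
  k <- length(n)                    # number of groups
  N <- sum(n)                       # total number of observations
  grand <- sum(n * m) / N           # grand mean, weighted by n_j
  ssb <- sum(n * (m - grand)^2)     # between-groups SS
  ssw <- sum((n - 1) * s^2)         # within-groups SS
  df1 <- k - 1
  df2 <- N - k
  f <- (ssb / df1) / (ssw / df2)    # F = MSG / MSR
  p <- pf(f, df1, df2, lower.tail = FALSE)
  c(SSB = ssb, SSW = ssw, df1 = df1, df2 = df2, F = f, p = p)
}

# The kind of summaries a roommate might hand you (derived here from mtcars)
n <- tapply(mtcars$mpg, mtcars$cyl, length)
m <- tapply(mtcars$mpg, mtcars$cyl, mean)
s <- tapply(mtcars$mpg, mtcars$cyl, sd)
res <- anova_from_summary(n, m, s)
print(signif(res, 5))
```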
# Ensure that the groups are factors in R
### SUM OF SQUARES RESIDUAL (WITHIN-GROUPS VARIATION) ###
# SSR = sum((n_j - 1) * var(X_j))
RSS4 = (length(mtcars$mpg[mtcars$cyl == 4]) - 1) * var(mtcars$mpg[mtcars$cyl == 4])
RSS6 = (length(mtcars$mpg[mtcars$cyl == 6]) - 1) * var(mtcars$mpg[mtcars$cyl == 6])
RSS8 = (length(mtcars$mpg[mtcars$cyl == 8]) - 1) * var(mtcars$mpg[mtcars$cyl == 8])
SSRT = sum(RSS4, RSS8, RSS6)
SSRT
[1] 301.2626
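The three copy-pasted lines above can be collapsed into one `tapply()` call. This is a small sketch. It relies on the identity sum((x - mean(x))^2) = (n_j - 1) * var(x), so it should reproduce the same within-groups SS for any number of groups.

```r
# Sketch: within-groups SS without one line per group
ss_within <- function(x) sum((x - mean(x))^2)   # equals (n_j - 1) * var(x)
SSR_compact <- sum(tapply(mtcars$mpg, factor(mtcars$cyl), ss_within))
SSR_compact
```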
### SUM OF SQUARES TOTAL
SST = sum((mtcars$mpg - 20.09)^2)  # each X_i minus the grand mean
SST
[1] 1126.047
### SUM OF SQUARES GROUPS (BETWEEN-GROUPS VARIATION)
## SSG
A = tapply(mtcars$mpg, mtcars$cyl, mean)  # group means
library(knitr)  # kable() comes from the knitr package
kable(A, col.names = "mean nj")
sum((mtcars$mpg - 20.0885)^2)  ### total sum of squares over each X_i
[1] 1126.047
SSGT + SSRT  ## between plus within SS
[1] 1125.535
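SSGT + SSRT above (1125.535) falls slightly short of SST (1126.047). R's own table reports a between-groups SS of 824.7846, so the gap presumably comes from computing SSGT with rounded group means. The sketch below recomputes all three sums of squares from unrounded means. With those, the identity SST = SSG + SSR should hold up to floating-point error. `ave()` returns each observation's group mean.

```r
# Sketch: exact SS decomposition with unrounded means
y <- mtcars$mpg
g <- factor(mtcars$cyl)
grand <- mean(y)                            # grand mean (weighted by n_j by construction)
n_j <- tapply(y, g, length)
mean_j <- tapply(y, g, mean)
SSG_exact <- sum(n_j * (mean_j - grand)^2)  # between groups
SSR_exact <- sum((y - ave(y, g))^2)         # within: each X_i minus its own group mean
SST_exact <- sum((y - grand)^2)             # total
c(SSG = SSG_exact, SSR = SSR_exact, SST = SST_exact)
```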
### R GIVES YOU:
aov(lm(mtcars$mpg ~ factor(mtcars$cyl)))  # standard sequential ANOVA, type I SS
Call:
aov(formula = lm(mtcars$mpg ~ factor(mtcars$cyl)))
Terms:
                factor(mtcars$cyl) Residuals
Sum of Squares            824.7846  301.2626
Deg. of Freedom                  2        29
Residual standard error: 3.223099
Estimated effects may be unbalanced
library(car)  # Anova() comes from the car package (type II and III SS)
Anova(lm(mtcars$mpg ~ factor(mtcars$cyl)))  # type II tests by default
Anova Table (Type II tests)
Response: mtcars$mpg
                   Sum Sq Df F value    Pr(>F)
factor(mtcars$cyl) 824.78  2  39.697 4.979e-09 ***
Residuals          301.26 29
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
NOTE: A COMMON MISTAKE IN (UNBALANCED) GRAND-MEAN CALCULATIONS:
#### NOTE: GRAND-MEAN COMMON MISTAKE!
(11*26.66 + 7*19.74 + 14*15.1)/32  # weighted mean
[1] 20.08875
(15.1 + 26.66 + 19.74)/3  # sum of the means / 3 is NOT the ANOVA grand mean: the weighted mean depends on n_j
[1] 20.5
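The same point can be checked without typing the rounded means by hand. The sketch below weights each group mean by its n_j using `weighted.mean()`. That should reproduce `mean(mtcars$mpg)` exactly, while the plain mean of the three group means drifts away because the groups have 11, 7 and 14 cars.

```r
# Sketch: weighted vs unweighted grand mean for unbalanced groups
n_j <- tapply(mtcars$mpg, mtcars$cyl, length)   # group sizes
mean_j <- tapply(mtcars$mpg, mtcars$cyl, mean)  # group means
gm_weighted <- weighted.mean(mean_j, n_j)       # the ANOVA grand mean
gm_mean <- mean(mtcars$mpg)                     # same value, from the raw data
gm_unweighted <- mean(mean_j)                   # mean of means: not the grand mean
c(weighted = gm_weighted, raw = gm_mean, unweighted = gm_unweighted)
```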
AN F-TEST
MSG = SSGT/(3 - 1)  ## df = 2
MSG
[1] 412.1364
MSR = SSRT/(32 - 3)  ## df = 32 minus the number of groups
FT = MSG/MSR
FT
[1] 39.67288
1-pf(39.67,2,29)
[1] 5.015716e-09
pf(39.67, 2, 29, lower.tail = FALSE)  ## equivalent
[1] 5.015716e-09
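The same decision can be reached through the critical value instead of the p-value. The idea is to reject equal group means when the observed F exceeds the (1 - alpha) quantile of F(2, 29). The sketch assumes alpha = 0.05, which is a conventional choice rather than something the post fixes.

```r
# Sketch: F-test decision via the critical value (alpha = 0.05 assumed)
alpha <- 0.05
F_crit <- qf(1 - alpha, df1 = 2, df2 = 29)  # 95th percentile of F(2, 29)
F_obs <- 39.67                              # observed F from the calculation above
F_crit
F_obs > F_crit                              # TRUE means: reject H0 of equal means
```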
#### UNBALANCED GROUP TEST BY F, TYPE III SS!
### Use the Anova() function from library(car) for two-way unbalanced tests
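The post points to car's `Anova()` for unbalanced two-way designs. The sketch below stays in base R so it runs without extra packages. It uses sum-to-zero contrasts, set per factor through `lm(contrasts = ...)`, together with `drop1(..., scope = . ~ .)`, which drops each term from the full model in turn. I believe that combination yields type III F tests, and with car installed, `car::Anova(fit, type = 3)` should report the same F values. The cyl × am design in mtcars is used only as an example of unequal cell counts.

```r
# Sketch: type III F tests for an unbalanced two-way design (cyl x am)
d <- transform(mtcars, cyl = factor(cyl), am = factor(am))
print(table(d$cyl, d$am))   # unequal cell counts: the design is unbalanced

# Type III SS need sum-to-zero contrasts for the tests to be meaningful
fit <- lm(mpg ~ cyl * am, data = d,
          contrasts = list(cyl = "contr.sum", am = "contr.sum"))

# Drop every term (main effects included) from the full model -> type III tests
type3 <- drop1(fit, scope = . ~ ., test = "F")
print(type3)
```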
WHEN VARIANCE HOMOGENEITY ASSUMPTIONS ARE UNMET:
oneway.test(mtcars$mpg~factor(mtcars$cyl))
One-way analysis of means (not assuming equal variances)
data: mtcars$mpg and factor(mtcars$cyl)
F = 31.624, num df = 2.000, denom df = 18.032, p-value = 1.271e-06
Coming soon: how the oneway test is constructed.
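Until that chapter arrives, here is a minimal sketch of Welch's heteroscedastic one-way ANOVA, which is what `oneway.test()` runs when `var.equal = FALSE` (its default). Each group gets the weight w_j = n_j / s_j^2. The F statistic is then rescaled, and the denominator degrees of freedom shrink according to how unequal the weights are. `welch_anova()` is a hypothetical helper written for this illustration. The test checks it against `oneway.test()`, which gave F = 31.624 on 2 and 18.032 df above.

```r
# Hypothetical helper: Welch's one-way ANOVA (unequal variances) from raw data
welch_anova <- function(y, g) {
  g <- factor(g)
  n <- tapply(y, g, length)
  m <- tapply(y, g, mean)
  v <- tapply(y, g, var)
  k <- nlevels(g)
  w <- n / v                                   # precision weights n_j / s_j^2
  W <- sum(w)
  m_w <- sum(w * m) / W                        # variance-weighted grand mean
  A <- sum(w * (m - m_w)^2) / (k - 1)          # weighted between-groups mean square
  lambda <- sum((1 - w / W)^2 / (n - 1))
  B <- 1 + 2 * (k - 2) / (k^2 - 1) * lambda    # correction factor
  F_stat <- A / B
  df1 <- k - 1
  df2 <- (k^2 - 1) / (3 * lambda)              # adjusted denominator df
  p <- pf(F_stat, df1, df2, lower.tail = FALSE)
  c(F = F_stat, df1 = df1, df2 = df2, p = p)
}

welch <- welch_anova(mtcars$mpg, mtcars$cyl)
welch
```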
WHEN EVERYTHING IN ANOVA GOES WRONG
When there are severe departures from the assumptions, revert to a non-parametric test (e.g. Kruskal-Wallis).
kruskal.test(mtcars$mpg~mtcars$cyl)
Kruskal-Wallis rank sum test
data: mtcars$mpg by mtcars$cyl
Kruskal-Wallis chi-squared = 25.746, df = 2, p-value = 2.566e-06
pchisq(25.746, 2, lower.tail = FALSE)  ## upper tail: the p-value is P(chi-squared > 25.746)
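To see where the Kruskal-Wallis chi-squared comes from, the sketch below computes H by hand from the ranks of mpg: H = 12 / (N(N + 1)) * sum(R_j^2 / n_j) - 3(N + 1). Because mtcars$mpg contains tied values, H is then divided by the usual tie correction 1 - sum(t^3 - t) / (N^3 - N). As far as I know, `kruskal.test()` applies the same correction, and the test compares the two results.

```r
# Sketch: Kruskal-Wallis H by hand, with the correction for ties
y <- mtcars$mpg
g <- factor(mtcars$cyl)
N <- length(y)
r <- rank(y)                                       # midranks for tied values
R_j <- tapply(r, g, sum)                           # rank sum per group
n_j <- tapply(r, g, length)                        # group sizes
H <- 12 / (N * (N + 1)) * sum(R_j^2 / n_j) - 3 * (N + 1)
ties <- table(y)                                   # size of each group of tied values
H_corr <- H / (1 - sum(ties^3 - ties) / (N^3 - N)) # tie-corrected statistic
p <- pchisq(H_corr, df = nlevels(g) - 1, lower.tail = FALSE)
c(H = H_corr, p = p)
```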