ANOVA BY HAND

Author

Mudry JM MDY STATER

ABOUT ANOVA: GOOD PRACTICES AND TRICKS

Analysis of Variance (ANOVA), which in fact uses the group means to split the sum of squared deviations into Between and Within sums of squares (SS), is a conventional choice of test when:

  1. Normality within each group is assumed; check it with a Shapiro test and histograms.

  2. Within-group variances are about equal (a variance ratio Var1/Var2 <= 2 is OK; check it with a Levene or Bartlett test).

  3. Subjects are i.i.d. (independent and identically distributed).

  4. No questionable outlying observations (a milder assumption).
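Assumptions 1 and 2 can be checked with base R alone. The following is a minimal sketch on the mtcars data used later in this post; the object names (g, shapiro_p, group_var, var_ratio, bartlett_p) are illustrative, and Levene's test is left out because it needs leveneTest() from the car package.

```r
# Sketch: quick checks of assumptions 1 and 2 (mpg grouped by cyl)
g <- factor(mtcars$cyl)

# 1. Normality within each group: Shapiro-Wilk p-value per group
shapiro_p <- tapply(mtcars$mpg, g, function(x) shapiro.test(x)$p.value)

# 2. Equal variances: largest-to-smallest variance ratio and Bartlett test
group_var  <- tapply(mtcars$mpg, g, var)
var_ratio  <- max(group_var) / min(group_var)
bartlett_p <- bartlett.test(mtcars$mpg, g)$p.value

shapiro_p
var_ratio
bartlett_p
```

For this data set the variance ratio comes out well above the rule-of-thumb limit of 2, so assumption 2 is doubtful here.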

CAUTIONS:

In practice, however, losing a subject or missing a data point is very common in experiments. This results in unbalanced group counts (nj), which decreases the power of the ANOVA. A certain amount of imbalance is acceptable, but huge discrepancies in group counts deteriorate its validity:

Unbalanced nj might also noticeably increase the within-group variance, ruling out assumption 2. (The “oneway” test accommodates this by adjusting the degrees of freedom and the F statistic; how it does so will be covered in a later chapter of the blog.)

Always couple your ANOVA with an NPT.

Normality, equal variances, and i.i.d. observations are assumed here!

ANOVA BY HAND

When your roommate comes and tells you: “I've got the figures from your survey or experiment. Here are the n, sd, and mean of the groups. I want an ANOVA test to see a possible effect of the presumed grouping…”

You tell him: “What the hell is that? Where is the raw CSV data file???” This is not the proper way to do it:

your boss

He tells you: “I don’t know where this s… CSV is. It is probably at the university that conducted the study, but the guy who did it left with his computer at UNIGE… (true story). No CSV, sorry!”

Too long to wait for: your boss-roommate implores you…

Do the calculation please!

How to get rid of that stuff rapidly:

ANOVA BY HAND in R:

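SS WITHIN: \[ \sum_{j = 1}^{k}\sum_{i = 1}^{n_j}(x_{ji} - \bar{x}_j)^2=\sum_{j = 1}^{k}(n_j-1)\,s^2_j \]

SS BETWEEN: \[ \sum_{j = 1}^{k} n_j(\bar{x}_j - \bar{\bar{x}})^2 \]

Here k is the number of groups, n_j the count in group j, x_{ji} observation i of group j, \bar{x}_j and s^2_j the mean and sample variance of group j, and \bar{\bar{x}} the grand mean.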
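For reference, the within and between sums of squares add up to the total sum of squares, and the F statistic is the ratio of the two mean squares. The notation below is an assumption made for this note: k groups, n_j observations in group j, N observations in total, x_{ji} for observation i of group j, and \bar{\bar{x}} for the grand mean.

```latex
\[ SS_{total} = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ji}-\bar{\bar{x}})^2 = SS_{between} + SS_{within} \]
\[ F = \frac{SS_{between}/(k-1)}{SS_{within}/(N-k)}, \qquad N=\sum_{j=1}^{k} n_j \]
```

Under the assumptions listed earlier, F follows an F distribution with k-1 and N-k degrees of freedom when all group means are equal.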

#ensure that groups are factored in R

#SUM OF SQUARE RESIDUAL (WITHIN GROUPS VAR)
###SSR = Sum( (nj-1)*varXj)



RSS4=(length(mtcars$mpg[mtcars$cyl==4])-1)*var(mtcars$mpg[mtcars$cyl==4])
RSS6=(length(mtcars$mpg[mtcars$cyl==6])-1)*var(mtcars$mpg[mtcars$cyl==6])
RSS8=(length(mtcars$mpg[mtcars$cyl==8])-1)*var(mtcars$mpg[mtcars$cyl==8])
SSRT=sum(RSS4,RSS8,RSS6)
SSRT
[1] 301.2626
###Sum of Square Total
SST = sum((mtcars$mpg-20.09)^2)#each Xi minus Grand-mean
SST
[1] 1126.047
###SUM OF SQUARE GROUPS (Between Variance)
##SSG
A=tapply(mtcars$mpg,mtcars$cyl,mean)
library(knitr)#provides kable()
kable(A,col.names = "mean nj")
mean nj
4 26.66364
6 19.74286
8 15.10000
Nj=tapply(mtcars$mpg,mtcars$cyl,length)
kable(Nj,col.names = "count nj")
count nj
4 11
6 7
8 14
SSG4=11*((26.66-20.09)^2)
SSG6=7*((19.74-20.09)^2)
SSG8=14*((15.10-20.09)^2)
SSGT=sum(SSG4,SSG6,SSG8)##SS groups (between)
SSGT
[1] 824.2728
sum((mtcars$mpg-20.0885)^2)###Total sum of sq each Xis
[1] 1126.047
SSGT+SSRT## Between plus within SS; slightly below the total SS because the group means were rounded to 2 decimals
[1] 1125.535
###R GIVES YOU:
aov(lm(mtcars$mpg~factor(mtcars$cyl)))#std sequential anova type I
Call:
   aov(formula = lm(mtcars$mpg ~ factor(mtcars$cyl)))

Terms:
                factor(mtcars$cyl) Residuals
Sum of Squares            824.7846  301.2626
Deg. of Freedom                  2        29

Residual standard error: 3.223099
Estimated effects may be unbalanced
library(car)#provides Anova()
Anova(lm(mtcars$mpg~factor(mtcars$cyl)))#car::Anova gives type II and III SS tests
Anova Table (Type II tests)

Response: mtcars$mpg
                   Sum Sq Df F value    Pr(>F)    
factor(mtcars$cyl) 824.78  2  39.697 4.979e-09 ***
Residuals          301.26 29                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

NOTE A COMMON MISTAKE IN (UNBALANCED) GRAND MEAN CALCULATIONS:

####NOTE for GRAND MEAN COMMON MISTAKE!
(11*26.66+7*19.74+14*15.1)/32#Weighted mean
[1] 20.08875
(15.10+26.66+19.74)/3#sum of the means/3 is not equal to the GRAND mean of the ANOVA: the weighted mean depends on nj
[1] 20.5
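The same comparison can be made without typing the rounded means by hand. This is a small sketch using weighted.mean() from base R; the variable names (grp_mean, grp_n, weighted_gm, unweighted_gm) are illustrative.

```r
# Sketch: grand mean from group summaries of mtcars (mpg by cyl)
grp_mean <- as.vector(tapply(mtcars$mpg, mtcars$cyl, mean))
grp_n    <- as.vector(tapply(mtcars$mpg, mtcars$cyl, length))

weighted_gm   <- weighted.mean(grp_mean, grp_n)  # each group mean weighted by nj
unweighted_gm <- mean(grp_mean)                  # ignores the group sizes

weighted_gm
unweighted_gm
mean(mtcars$mpg)  # the grand mean of the raw data, equal to weighted_gm
```

Only with balanced groups (all nj equal) do the two versions coincide.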

AN F-TEST

MSG=SSGT/(3-1)##df=2
MSG
[1] 412.1364
MSR=SSRT/(32-3)##df = 32 minus the number of groups
FT=MSG/MSR
FT
[1] 39.67288
1-pf(39.67,2,29)
[1] 5.015716e-09
pf(39.67,2,29,lower.tail = FALSE)##equivalent
[1] 5.015716e-09
####UNBALANCED GROUPS: TEST BY F WITH TYPE III SS!!!
###Use Anova() from library car for two-way unbalanced tests
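In the roommate scenario only n, mean, and sd per group are available, so the steps above can be wrapped into one small function. This is a minimal sketch, assuming a one-way design; anova_from_summary() is a hypothetical helper written for this post, not a function from base R or car.

```r
# Sketch: one-way ANOVA from summary statistics only (n, mean, sd per group)
anova_from_summary <- function(n, m, s) {
  k   <- length(n)                      # number of groups
  N   <- sum(n)                         # total sample size
  gm  <- sum(n * m) / N                 # weighted grand mean
  ssb <- sum(n * (m - gm)^2)            # between-group SS
  ssw <- sum((n - 1) * s^2)             # within-group SS
  f   <- (ssb / (k - 1)) / (ssw / (N - k))
  p   <- pf(f, k - 1, N - k, lower.tail = FALSE)
  c(SSB = ssb, SSW = ssw, df1 = k - 1, df2 = N - k, F = f, p = p)
}

# Check on mtcars, feeding in unrounded group summaries
n <- as.vector(tapply(mtcars$mpg, mtcars$cyl, length))
m <- as.vector(tapply(mtcars$mpg, mtcars$cyl, mean))
s <- as.vector(tapply(mtcars$mpg, mtcars$cyl, sd))
res <- anova_from_summary(n, m, s)
res
```

With unrounded inputs the result should agree with the aov table; with summaries rounded to two decimals, expect small deviations such as the 824.27 obtained by hand.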

WHEN VARIANCE HOMOGENEITY ASSUMPTIONS ARE UNMET:

oneway.test(mtcars$mpg~factor(mtcars$cyl))

    One-way analysis of means (not assuming equal variances)

data:  mtcars$mpg and factor(mtcars$cyl)
F = 31.624, num df = 2.000, denom df = 18.032, p-value = 1.271e-06

Coming soon : How the oneway test is constructed.
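Until then, here is a minimal sketch of the Welch correction, which is the unequal-variance statistic that oneway.test reports by default. It again needs only n, mean, and sd per group; welch_from_summary() is a hypothetical helper, and the variable names are illustrative.

```r
# Sketch: Welch's F for a one-way layout from group summaries
welch_from_summary <- function(n, m, s) {
  k   <- length(n)                      # number of groups
  w   <- n / s^2                        # precision weights nj / var_j
  W   <- sum(w)
  mw  <- sum(w * m) / W                 # variance-weighted grand mean
  tmp <- sum((1 - w / W)^2 / (n - 1)) / (k^2 - 1)
  f   <- sum(w * (m - mw)^2) / ((k - 1) * (1 + 2 * (k - 2) * tmp))
  df2 <- 1 / (3 * tmp)                  # adjusted denominator df
  p   <- pf(f, k - 1, df2, lower.tail = FALSE)
  c(F = f, df1 = k - 1, df2 = df2, p = p)
}

n <- as.vector(tapply(mtcars$mpg, mtcars$cyl, length))
m <- as.vector(tapply(mtcars$mpg, mtcars$cyl, mean))
s <- as.vector(tapply(mtcars$mpg, mtcars$cyl, sd))
welch <- welch_from_summary(n, m, s)
welch
```

Groups with a small variance get a large weight, and the denominator degrees of freedom shrink below N-k, which is why the oneway.test output shows a fractional denom df.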


WHEN EVERYTHING IN ANOVA GOES WRONG

When there are severe departures from the assumptions, revert to a non-parametric test (e.g. Kruskal-Wallis).

kruskal.test(mtcars$mpg~mtcars$cyl)

    Kruskal-Wallis rank sum test

data:  mtcars$mpg by mtcars$cyl
Kruskal-Wallis chi-squared = 25.746, df = 2, p-value = 2.566e-06
pchisq(25.746,2,lower.tail = FALSE)##the p-value is the upper tail, hence lower.tail = FALSE
[1] 2.566416e-06
1-pchisq(25.746,2,lower.tail = FALSE)##this is the lower tail, not the p-value
[1] 0.9999974
##similar conclusion as ANOVA
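The Kruskal-Wallis statistic can also be rebuilt by hand, although it needs the raw data because it works on ranks. The sketch below uses the usual H formula with a tie correction; the variable names are illustrative.

```r
# Sketch: Kruskal-Wallis H by hand on mtcars (mpg by cyl)
x <- mtcars$mpg
g <- factor(mtcars$cyl)
N <- length(x)

r  <- rank(x)                         # mid-ranks are used for tied values
Rj <- tapply(r, g, sum)               # rank sum per group
nj <- tapply(r, g, length)            # group counts

H <- 12 / (N * (N + 1)) * sum(Rj^2 / nj) - 3 * (N + 1)

ties <- table(x)                      # sizes of the tied blocks
H <- H / (1 - sum(ties^3 - ties) / (N^3 - N))   # tie correction

p_kw <- pchisq(H, df = nlevels(g) - 1, lower.tail = FALSE)
H
p_kw
```

H is referred to a chi-squared distribution with (number of groups - 1) degrees of freedom, which is the same upper-tail pchisq() call shown in the output.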
ABBREVIATIONS:


- NPT: Non parametric Test

- SS: Sum of Squares

REFERENCES

Anova by simulation

An R and S-Plus Companion to Applied Regression, John Fox · 2002