Hulk loves stats! He does so much of it that he has turned green with rage!

ASSUMPTIONS (fr::Critères) FOR ANOVA

Note: “i” indexes the individual observations; “j” indexes the group to which each “i” belongs.

Note: if X is a random variable (RV) and a is a constant, then VAR(aX) = \(a^2\)VAR(X).
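The scaling rule above can be checked numerically. This is a minimal sketch; the names `x` and `a`, the seed and the sample size are illustrative assumptions, not values from the course.

```r
# Hypothetical sample: 1000 draws from a normal distribution
set.seed(1)
x <- rnorm(1000, mean = 5, sd = 2)
a <- 3  # arbitrary constant

var_scaled <- var(a * x)    # variance of the rescaled variable
var_rule   <- a^2 * var(x)  # what the rule VAR(aX) = a^2 VAR(X) predicts

# The two agree up to floating-point error
all.equal(var_scaled, var_rule)
```

The identity holds exactly for the sample variance too, so it does not depend on the sample being large.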

Note: \(s_j^2\) = VAR(Xj), the variance within group j.

Population parameters = the statistics of your table, obtained from sampling:

Then simulate one rnorm vector per group, of size nj, for the 4 groups j=1,2,3,4

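library(knitr)   # kable()
library(psych)   # describe(), describeBy()
library(gplots)  # plotmeans()
n=c(19,20,15,17)
mean1=c(6.1,6.6,7.7,8.7)# never give a vector the name of a function (hence mean1, not mean)
stdevj1234=c(1.5,1.8,2.0,1.5)
mytable=data.frame("n"=n,"mean~j"=mean1,"sd~j"=stdevj1234)
kable(mytable)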
|  n| mean.j| sd.j|
|--:|------:|----:|
| 19|    6.1|  1.5|
| 20|    6.6|  1.8|
| 15|    7.7|  2.0|
| 17|    8.7|  1.5|
# build 4 vectors with rnorm, using the parameters from the table
set.seed(123)
V1=rnorm(19,6.1,1.5)# parameters from table row 1
V2=rnorm(20,6.6,1.8)# parameters from table row 2
V3=rnorm(15,7.7,2)
V4=rnorm(17,8.7,1.5)
#make a long vector
V1234=c(V1,V2,V3,V4)
mean(V1234)#this is your Grand Mean
## [1] 7.303119
describe(V1234)# describe() comes from the psych package
##    vars  n mean   sd median trimmed  mad  min   max range skew kurtosis   se
## X1    1 71  7.3 1.84   7.17    7.26 1.66 3.15 12.04  8.89 0.22    -0.12 0.22
# you can also describe by group; type ?describe
# a trimmed mean is a mean computed without the tails (outliers); describe() trims 10% from each end by default (trim=0.1)
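To make the trimmed mean concrete, here is a small sketch with base R's `mean(x, trim = ...)`. The vector `x` is a made-up example with one obvious outlier, not course data.

```r
# Hypothetical data: nine ordinary values and one outlier
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

plain_mean   <- mean(x)              # pulled upward by the outlier
trimmed_mean <- mean(x, trim = 0.1)  # drops 10% of the values at each end

plain_mean    # 14.5
trimmed_mean  # 5.5, i.e. the mean of 2..9
```

With 10 values and `trim = 0.1`, exactly one value is removed from each end, so the outlier 100 no longer affects the result.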

Preparing for ANOVA

ANOVA needs a grouping variable: here we create a FACTOR with 4 groups

Myf=c(rep("0mat",19),rep("1er",20),rep("2er",15),rep("3er",17))
# why do I put a 0 in front of mat?
length(Myf)
## [1] 71
str(Myf)
##  chr [1:71] "0mat" "0mat" "0mat" "0mat" "0mat" "0mat" "0mat" "0mat" "0mat" ...
# 71 observations: the count is OK
# Myf must be a grouping variable, i.e. a factor
Myf=factor(Myf)# gl() is an equivalent, older function inherited from S-PLUS
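The question about the leading 0 comes down to how `factor()` orders its levels: by default they are sorted, so a label starting with a digit 0 sorts before "1er". The sketch below contrasts that prefix trick with the alternative of passing `levels=` explicitly. The vectors `x`, `f_prefix` and `f_explicit` are illustrative names, not objects from the course.

```r
# Prefix trick: default levels are sorted, so "0mat" comes first
f_prefix <- factor(c("1er", "0mat", "3er", "2er"))
levels(f_prefix)

# Alternative: keep the plain label "mat" and state the order explicitly
x <- c("1er", "mat", "3er", "2er", "mat")
f_explicit <- factor(x, levels = c("mat", "1er", "2er", "3er"))
levels(f_explicit)

# The first level is the reference group in lm()/aov() contrasts
levels(f_explicit)[1]
```

Setting `levels=` avoids encoding the order in the label itself, at the cost of one more argument.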

CREATE A LONG FORMAT dataframe with Groups (Time mat 1er…)

simu181=data.frame("measure"=V1234,"Groupsj"=Myf)
summary(simu181)
##     measure       Groupsj  
##  Min.   : 3.150   0mat:19  
##  1st Qu.: 6.137   1er :20  
##  Median : 7.172   2er :15  
##  Mean   : 7.303   3er :17  
##  3rd Qu.: 8.400            
##  Max.   :12.038
kable(tapply(simu181$measure,simu181$Groupsj,sd),caption="sd")# check the SDs
Table: sd

|     |        x|
|:----|--------:|
|0mat | 1.482317|
|1er  | 1.497816|
|2er  | 1.893889|
|3er  | 1.368663|
kable(tapply(simu181$measure,simu181$Groupsj,mean),caption="mean")# check the MEANS
Table: mean

|     |        x|
|:----|--------:|
|0mat | 6.360942|
|1er  | 6.499428|
|2er  | 7.844384|
|3er  | 8.824073|
# or, with psych (describe.by is deprecated; use describeBy)
describeBy(simu181$measure,simu181$Groupsj)
## 
##  Descriptive statistics by group 
## group: 0mat
##    vars  n mean   sd median trimmed  mad  min  max range  skew kurtosis   se
## X1    1 19 6.36 1.48   6.29    6.41 1.28 3.15 8.78  5.63 -0.14    -0.57 0.34
## ------------------------------------------------------------ 
## group: 1er
##    vars  n mean  sd median trimmed  mad  min  max range  skew kurtosis   se
## X1    1 20  6.5 1.5   6.35    6.55 2.03 3.56 8.86  5.29 -0.18    -1.23 0.33
## ------------------------------------------------------------ 
## group: 2er
##    vars  n mean   sd median trimmed  mad  min   max range skew kurtosis   se
## X1    1 15 7.84 1.89   7.53    7.73 1.14 5.17 12.04  6.87 0.65    -0.53 0.49
## ------------------------------------------------------------ 
## group: 3er
##    vars  n mean   sd median trimmed  mad  min   max range skew kurtosis   se
## X1    1 17 8.82 1.37   8.89    8.79 1.02 6.38 11.78   5.4 0.28    -0.42 0.33

Important note: the data cannot simply be put in wide format because the vectors do not have the same length (the shorter columns would have to be padded with NA); see [http://www.cookbook-r.com/Manipulating_data/Converting_data_between_wide_and_long_format/]
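For illustration only, here is one way to force unequal-length vectors into a wide table by padding with NA, then return to long format. The vectors `a` and `b` are hypothetical stand-ins for the group vectors, and this is a sketch of the idea rather than a recommended workflow.

```r
# Hypothetical groups of unequal size
a <- c(1.2, 3.4, 5.6)
b <- c(2.1, 4.3)

# Indexing past the end of a vector returns NA, which pads the short column
n_max <- max(length(a), length(b))
wide <- data.frame(a = a[seq_len(n_max)], b = b[seq_len(n_max)])
wide

# Back to long format: stack() gives columns 'values' and 'ind';
# na.omit() drops the padding rows
long <- na.omit(stack(wide))
long
```

The long format is what `aov()` expects, which is why the data frame is built that way from the start.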

SOME PLOTS TO SEE WHAT HAPPENS

# normality is hard to judge from histograms; run a Shapiro-Wilk test (shapiro.test) instead
par(mfrow=c(2,2))
hist(V1)
hist(V2)
hist(V3)
hist(V4)
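The Shapiro-Wilk test mentioned in the comment above can be run on each group with base R's `shapiro.test()`. The sketch regenerates the four vectors with the same calls as in the text so that it is self-contained; the list name `groups` and the vector `p_shapiro` are illustrative.

```r
# Same simulation calls as in the text
set.seed(123)
V1 <- rnorm(19, 6.1, 1.5)
V2 <- rnorm(20, 6.6, 1.8)
V3 <- rnorm(15, 7.7, 2)
V4 <- rnorm(17, 8.7, 1.5)

# One Shapiro-Wilk test per group; keep only the p-values
groups <- list("0mat" = V1, "1er" = V2, "2er" = V3, "3er" = V4)
p_shapiro <- sapply(groups, function(v) shapiro.test(v)$p.value)
p_shapiro

# A small p-value (e.g. below 0.05) would cast doubt on normality for that group
p_shapiro < 0.05
```

With samples this small the test has little power, so a large p-value is weak evidence rather than proof of normality.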

# normality is hard to see for Student-distributed RVs
par(mfrow=c(1,1))
boxplot(simu181$measure~simu181$Groupsj,col=rainbow(4))

stripchart(V1234~(Myf),col=rainbow(4),cex=2,pch=20,main="Yij / 4 groups")

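# groups j=3 and j=2 have the largest within-group SDs
# note: plotmeans() (gplots) draws confidence-interval bars (95% by default), not SDs
plotmeans(simu181$measure~simu181$Groupsj,main="Graph 1: Mean and 95% CI by group",xlab="Time Group",ylab="mean",cex=3,col=rainbow(4),pch=20)
abline(h=mean(V1234),col=2,lty=2)# grand mean = 7.30
text(0.7,mean(V1234),"Grand-Mean",col=2,pos=3)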

# NOTE: without the 0 prefix, "mat" would come last among the factor levels

ANOVA TEST IN R

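aov(lm(simu181$measure~simu181$Groupsj))# aov() calls lm() internally, so the lm() wrapper is optional: aov(measure~Groupsj,data=simu181) gives the same fit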
## Call:
##    aov(formula = lm(simu181$measure ~ simu181$Groupsj))
## 
## Terms:
##                 simu181$Groupsj Residuals
## Sum of Squares         73.50526 162.36354
## Deg. of Freedom               3        67
## 
## Residual standard error: 1.556707
## Estimated effects may be unbalanced
AOV181=aov(lm(simu181$measure~simu181$Groupsj))#Anova object
summary(AOV181)
##                 Df Sum Sq Mean Sq F value  Pr(>F)    
## simu181$Groupsj  3  73.51  24.502   10.11 1.4e-05 ***
## Residuals       67 162.36   2.423                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
AOV181$df.residual# df2 = n_total - number of groups = 71 - 4
## [1] 67
AOV181$xlevels # 4 levels, so df1 = 4 - 1 = 3
## $`simu181$Groupsj`
## [1] "0mat" "1er"  "2er"  "3er"
f=24.5/2.42# F = MS between / MS within (rounded values from the table)
f
## [1] 10.12397
pval=pf(10.12,3,67,lower.tail = FALSE)# the F test reads only the right (upper) tail of the distribution
pval
## [1] 1.3893e-05
1-pf(10.11,3,67)# same thing, because the total area under a probability density is 1: P(F > f) = 1 - P(F <= f)
## [1] 1.403281e-05
# critical value at alpha = 5%: find the decision threshold F(0.95,3,67)
Fcrit=qf(0.95,3,67)
Fcrit
## [1] 2.741574

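P(F > f = 10.11) <<< 0.0001, and the observed f is far above Fcrit = 2.74, so the hypothesis of equal group means is rejected at the 5% level. F is built from the residual MSQ and the group (between) MSQ: F = MS between / MS within. Course vocabulary: en: MSQ == fr: SCM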
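The two equivalent decision rules (compare F with the critical value, or compare the p-value with alpha) can be sketched as follows. The numbers are copied from the ANOVA table above; the variable names are illustrative.

```r
# Values read from the ANOVA table in the text
f_obs <- 10.11
df1   <- 3     # groups - 1
df2   <- 67    # n_total - groups
alpha <- 0.05

f_crit <- qf(1 - alpha, df1, df2)                  # decision threshold
p_val  <- pf(f_obs, df1, df2, lower.tail = FALSE)  # upper-tail area beyond f_obs

reject_by_f <- f_obs > f_crit  # rule 1: observed F beyond the threshold
reject_by_p <- p_val < alpha   # rule 2: p-value below alpha
c(reject_by_f = reject_by_f, reject_by_p = reject_by_p)

# By construction, the upper-tail area beyond f_crit is exactly alpha
pf(f_crit, df1, df2, lower.tail = FALSE)
```

Both rules always give the same decision, because `qf` and `pf` are inverse functions of each other.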

F distribution: the ratio of two variances, i.e. of two mean squares (MSQ)
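To see where the two mean squares come from, the F ratio can be rebuilt by hand and compared with `aov()`. This is a sketch on data simulated with the same parameters as the table at the top; the names `y`, `g`, `ss_between` and `ss_within` are illustrative, and passing the formula directly to `aov()` is a choice made here for brevity.

```r
# Simulate 4 groups with the sizes, means and SDs from the parameter table
set.seed(123)
n  <- c(19, 20, 15, 17)
mu <- c(6.1, 6.6, 7.7, 8.7)
s  <- c(1.5, 1.8, 2.0, 1.5)
y  <- unlist(Map(rnorm, n, mu, s))
g  <- factor(rep(c("0mat", "1er", "2er", "3er"), times = n))

k <- nlevels(g)   # number of groups
N <- length(y)    # total sample size

group_means <- tapply(y, g, mean)
group_sizes <- tapply(y, g, length)

# Between: spread of the group means around the grand mean
ss_between <- sum(group_sizes * (group_means - mean(y))^2)
# Within: spread of each observation around its own group mean
ss_within  <- sum((y - ave(y, g))^2)

ms_between <- ss_between / (k - 1)
ms_within  <- ss_within / (N - k)
f_manual   <- ms_between / ms_within

# Same quantity as reported by aov()
f_aov <- summary(aov(y ~ g))[[1]][["F value"]][1]
c(manual = f_manual, aov = f_aov)
```

The between and within sums of squares add up to the total sum of squares around the grand mean, which is the decomposition ANOVA is named after.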

To find which groups differ:

MULTIPLE COMPARISONS: the p-values must be adjusted, e.g. with the HOLM or BONFERRONI methods

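TRICK: an “easy” rule for the critical alpha: take 5% and divide it by the number of tests to run (this is the Bonferroni correction). With 4 groups there are 6 pairwise tests, so each is judged at 0.05/6 ≈ 0.0083.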
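The "divide alpha by the number of tests" rule is equivalent to multiplying each p-value by the number of tests, which is what base R's `p.adjust()` does for Bonferroni. The raw p-values in `p_raw` below are made up for illustration; they are not results from the course data.

```r
# Hypothetical raw (unadjusted) p-values from 4 tests
p_raw <- c(0.01, 0.02, 0.03, 0.20)
m <- length(p_raw)

# Bonferroni: multiply by the number of tests (capped at 1)
p_bonf <- p.adjust(p_raw, method = "bonferroni")
# Holm: step-down version, never more conservative than Bonferroni
p_holm <- p.adjust(p_raw, method = "holm")

rbind(raw = p_raw, bonferroni = p_bonf, holm = p_holm)

# Comparing raw p-values with alpha/m gives the same decisions as
# comparing Bonferroni-adjusted p-values with alpha
identical(p_raw < 0.05 / m, p_bonf < 0.05)
```

Holm is the default in `pairwise.t.test()`, which is why the second comparison table in this section reports "P value adjustment method: holm".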

#1
TukeyHSD(AOV181)#ANOVA OBJECT
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = lm(simu181$measure ~ simu181$Groupsj))
## 
## $`simu181$Groupsj`
##               diff         lwr      upr     p adj
## 1er-0mat 0.1384861 -1.17546321 1.452435 0.9924539
## 2er-0mat 1.4834415  0.06681570 2.900067 0.0366339
## 3er-0mat 2.4631310  1.09386410 3.832398 0.0000674
## 2er-1er  1.3449554 -0.05595873 2.745869 0.0644222
## 3er-1er  2.3246448  0.97163951 3.677650 0.0001456
## 3er-2er  0.9796895 -0.47323487 2.432614 0.2935677
# 0mat and 3er are very different (p adj = 0.0000674)
#2
pairwise.t.test(simu181$measure,simu181$Groupsj)# same conclusion
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  simu181$measure and simu181$Groupsj 
## 
##     0mat    1er     2er    
## 1er 0.78211 -       -      
## 2er 0.02988 0.04135 -      
## 3er 6.9e-05 0.00013 0.16037
## 
## P value adjustment method: holm

Why can we make predictions with an ANOVA?

In fact, inside aov you have the lm function (linear regression, which is a straight (affine) line of