Hulk loves stats! He does so many that he has turned green with rage!
Note: “i” indexes the individual; “j” indexes the group that individual “i” belongs to.
Sample: the individuals are independent, said to be i.i.d. (independent and identically distributed). For example, a brother and a sister are not independent, and individuals measured twice are not independent either (correlated measures, loosely called PAIRED).
Normally distributed: as a rule of thumb, if n > 30 treating the data as normal is OK; if n << 30, use Student's t. Consider Student as a Normal with mu and sd as parameters. Run a shapiro.test to check, because with n < 30 normality is not visible on a histogram.
Within-group variance about equal (Fr: variance intra-groupe). Rule of thumb: if \(sd_{j=1} \le 2.5 \times sd_{j=2}\), then the ANOVA assumptions are OK.
Note: if X is a random variable (RV), then \(\mathrm{VAR}(aX) = a^2\,\mathrm{VAR}(X)\).
Note: \(s_j^2 = \mathrm{VAR}(X_j)\).
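The assumptions and the variance note above can be checked quickly in R. The sketch below is illustrative only: the small sample `x_small` and the scale factor `a` are made-up values, while `sd_j` reuses the group standard deviations from the parameter table of this example.

```r
# Illustrative checks; x_small and a are assumed values, not course data
set.seed(1)
x_small <- rnorm(20, mean = 6, sd = 1.5)   # hypothetical small sample (n < 30)

# 1) Normality: Shapiro-Wilk test (H0: the sample comes from a normal distribution)
sw <- shapiro.test(x_small)
sw$p.value                                  # a large p-value gives no evidence against normality

# 2) Equal within-group variance: rule of thumb on the group standard deviations
sd_j <- c(1.5, 1.8, 2.0, 1.5)
sd_ratio <- max(sd_j) / min(sd_j)           # largest sd over smallest sd
sd_ok <- sd_ratio <= 2.5                    # 2.0 / 1.5 is about 1.33, so the rule holds

# 3) VAR(aX) = a^2 VAR(X), checked numerically
a <- 3
scaling_holds <- isTRUE(all.equal(var(a * x_small), a^2 * var(x_small)))
```

Comparing the largest sd with the smallest is the strictest way to apply the rule: if that pair passes, every other pair of groups passes too.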
library(knitr)# kable()
library(psych)# describe(), describeBy()
library(gplots)# plotmeans()
n=c(19,20,15,17)
mean1=c(6.1,6.6,7.7,8.7)# never name a vector after a function (e.g. mean)
stdevj1234=c(1.5,1.8,2.0,1.5)
mytable=data.frame("n"=n,"mean~j"=mean1,"sd~j"=stdevj1234)
kable(mytable)
| n | mean.j | sd.j |
|---|---|---|
| 19 | 6.1 | 1.5 |
| 20 | 6.6 | 1.8 |
| 15 | 7.7 | 2.0 |
| 17 | 8.7 | 1.5 |
# make 4 vectors with rnorm() using the parameters from the table
set.seed(123)
V1=rnorm(19,6.1,1.5)# parameters from table row 1
V2=rnorm(20,6.6,1.8)# parameters from table row 2
V3=rnorm(15,7.7,2)
V4=rnorm(17,8.7,1.5)
#make a long vector
V1234=c(V1,V2,V3,V4)
mean(V1234)#this is your Grand Mean
## [1] 7.303119
describe(V1234)# describe() comes from the psych package
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 71 7.3 1.84 7.17 7.26 1.66 3.15 12.04 8.89 0.22 -0.12 0.22
# you can also describe by group; type ?describe
# a trimmed mean drops the extreme 10% in each tail (outliers) before averaging (psych default trim = 0.1)
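The grand mean is the mean of all 71 values pooled together, which equals the mean of the group means weighted by group size. A small sketch with the simulation parameters from the table (the names `n_j`, `mu_j` and `grand_mu` are illustrative): the parameters imply a grand mean of about 7.2, whereas the simulated sample gives 7.30.

```r
n_j  <- c(19, 20, 15, 17)          # group sizes from the table
mu_j <- c(6.1, 6.6, 7.7, 8.7)      # group means used as rnorm() parameters

# Size-weighted mean of the group means = grand mean implied by the parameters
grand_mu <- sum(n_j * mu_j) / sum(n_j)
grand_mu                            # about 7.2

# The same identity holds on a sample: pooled mean = weighted mean of group means
set.seed(123)
x <- c(rnorm(19, 6.1, 1.5), rnorm(20, 6.6, 1.8), rnorm(15, 7.7, 2), rnorm(17, 8.7, 1.5))
g <- factor(rep(c("0mat", "1er", "2er", "3er"), times = n_j))
m_j <- tapply(x, g, mean)           # sample group means
identity_holds <- isTRUE(all.equal(mean(x), sum(n_j * m_j) / sum(n_j)))
```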
Myf=c(rep("0mat",19),rep("1er",20),rep("2er",15),rep("3er",17))
# why do I put 0 in front of mat?
length(Myf)
## [1] 71
str(Myf)
## chr [1:71] "0mat" "0mat" "0mat" "0mat" "0mat" "0mat" "0mat" "0mat" "0mat" ...
# 71 observations: the count is OK
# Myf must be a grouping variable, i.e. a factor
Myf=factor(Myf)# gl() is an equivalent (older function from S-PLUS)
simu181=data.frame("measure"=V1234,"Groupsj"=Myf)
summary(simu181)
## measure Groupsj
## Min. : 3.150 0mat:19
## 1st Qu.: 6.137 1er :20
## Median : 7.172 2er :15
## Mean : 7.303 3er :17
## 3rd Qu.: 8.400
## Max. :12.038
kable(tapply(simu181$measure,simu181$Groupsj,sd),caption="sd")# check the SDs
| group | sd |
|---|---|
| 0mat | 1.482317 |
| 1er | 1.497816 |
| 2er | 1.893889 |
| 3er | 1.368663 |
kable(tapply(simu181$measure,simu181$Groupsj,mean),caption="mean")# check the means
| group | mean |
|---|---|
| 0mat | 6.360942 |
| 1er | 6.499428 |
| 2er | 7.844384 |
| 3er | 8.824073 |
#or
describeBy(simu181$measure,simu181$Groupsj)# describe.by() is deprecated in psych; describeBy() replaces it
##
## Descriptive statistics by group
## group: 0mat
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 19 6.36 1.48 6.29 6.41 1.28 3.15 8.78 5.63 -0.14 -0.57 0.34
## ------------------------------------------------------------
## group: 1er
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 20 6.5 1.5 6.35 6.55 2.03 3.56 8.86 5.29 -0.18 -1.23 0.33
## ------------------------------------------------------------
## group: 2er
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 15 7.84 1.89 7.53 7.73 1.14 5.17 12.04 6.87 0.65 -0.53 0.49
## ------------------------------------------------------------
## group: 3er
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 17 8.82 1.37 8.89 8.79 1.02 6.38 11.78 5.4 0.28 -0.42 0.33
Important note: the data cannot be put in wide format because the vectors do not have the same length; see [http://www.cookbook-r.com/Manipulating_data/Converting_data_between_wide_and_long_format/]
## Normality is hard to see on histograms; run a Shapiro test instead
par(mfrow=c(2,2))
hist(V1)
hist(V2)
hist(V3)
hist(V4)
# normality is hard to see for small samples (Student-distributed RV)
par(mfrow=c(1,1))
boxplot(simu181$measure~simu181$Groupsj,col=rainbow(4))
stripchart(V1234~(Myf),col=rainbow(4),cex=2,pch=20,main="Yij / 4 groups")
# groups j=3 and j=2 have the largest within-group sd
# note: by default plotmeans() draws 95% confidence intervals around each mean, not SDs
plotmeans(simu181$measure~simu181$Groupsj,main="Graph 1:Mean and SD by groups",xlab="Time Group",ylab="mean-sd",cex=3,col=rainbow(4),pch=20)
abline(h=7.2,col=2)# 7.2 is the grand mean implied by the table parameters; the simulated grand mean is 7.30
text(0.7,7.2,"Grand-Mean",col=2,lty=2)
# NOTE: without the 0 prefix, mat would come last among the factor levels
aov(lm(simu181$measure~simu181$Groupsj))# the lm() wrapper is optional: aov() also accepts the formula directly
## Call:
## aov(formula = lm(simu181$measure ~ simu181$Groupsj))
##
## Terms:
## simu181$Groupsj Residuals
## Sum of Squares 73.50526 162.36354
## Deg. of Freedom 3 67
##
## Residual standard error: 1.556707
## Estimated effects may be unbalanced
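Wrapping the formula in lm() is one way to call aov(), but aov() also accepts the formula directly, and anova() on an lm fit reports the same F test. A minimal sketch on re-simulated data (the data frame `d` and the object names are illustrative, not the author's):

```r
set.seed(123)
n_j <- c(19, 20, 15, 17)
d <- data.frame(
  measure = c(rnorm(19, 6.1, 1.5), rnorm(20, 6.6, 1.8),
              rnorm(15, 7.7, 2), rnorm(17, 8.7, 1.5)),
  Groupsj = factor(rep(c("0mat", "1er", "2er", "3er"), times = n_j))
)

fit_aov_lm <- aov(lm(d$measure ~ d$Groupsj))     # style used in the text
fit_aov    <- aov(measure ~ Groupsj, data = d)   # formula passed directly
fit_lm     <- lm(measure ~ Groupsj, data = d)    # plain linear model

# Extract the F statistic of the group effect from each fit
F_aov_lm <- summary(fit_aov_lm)[[1]][["F value"]][1]
F_aov    <- summary(fit_aov)[[1]][["F value"]][1]
F_lm     <- anova(fit_lm)[["F value"]][1]
c(F_aov_lm, F_aov, F_lm)                         # the three values agree
```

Passing `data = d` keeps the term label short (`Groupsj` instead of `simu181$Groupsj`), which makes the printed tables easier to read.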
AOV181=aov(lm(simu181$measure~simu181$Groupsj))#Anova object
summary(AOV181)
## Df Sum Sq Mean Sq F value Pr(>F)
## simu181$Groupsj 3 73.51 24.502 10.11 1.4e-05 ***
## Residuals 67 162.36 2.423
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
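The ANOVA table can be rebuilt by hand, which makes the roles of the between-group and within-group sums of squares explicit. The sketch below re-simulates the data with the same seed; the object names (`SSB`, `SSW`, `F_hand`) are illustrative choices.

```r
set.seed(123)
n_j <- c(19, 20, 15, 17)
x <- c(rnorm(19, 6.1, 1.5), rnorm(20, 6.6, 1.8), rnorm(15, 7.7, 2), rnorm(17, 8.7, 1.5))
g <- factor(rep(c("0mat", "1er", "2er", "3er"), times = n_j))

grand <- mean(x)                          # grand mean
m_j   <- tapply(x, g, mean)               # group means

SSB <- sum(n_j * (m_j - grand)^2)         # between-group sum of squares
SSW <- sum((x - m_j[as.integer(g)])^2)    # within-group (residual) sum of squares
df1 <- nlevels(g) - 1                     # 4 - 1 = 3
df2 <- length(x) - nlevels(g)             # 71 - 4 = 67

F_hand <- (SSB / df1) / (SSW / df2)       # MS between / MS within
p_hand <- pf(F_hand, df1, df2, lower.tail = FALSE)

tab <- summary(aov(x ~ g))[[1]]           # reference table from aov()
```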
AOV181$df.residual# df2 = n_total - number of groups = 71 - 4
## [1] 67
AOV181$xlevels # 4 levels, so df1 = 4 - 1 = 3
## $`simu181$Groupsj`
## [1] "0mat" "1er" "2er" "3er"
f=24.5/2.42
f
## [1] 10.12397
pval=pf(10.12,3,67,lower.tail = FALSE)# the F test is one-sided: only the upper (right) tail is used
pval
## [1] 1.3893e-05
1-pf(10.11,3,67)# same result, because the total area under the distribution is 1: P(F > f) = 1 - P(F <= f)
## [1] 1.403281e-05
# critical value at alpha = 5%: find the decision threshold F(0.95,3,67)
Fcrit=qf(0.95,3,67)
Fcrit
## [1] 2.741574
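The observed F = 10.11 is far above the critical value 2.74, so the p-value P(F > 10.11), about 1.4e-05, is well below 0.0001. F is built from the residual MSQ and the group MSQ: F = MS between / MS within. Course vocabulary: En: MSQ == Fr: SCM.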
F distribution: the ratio of two variances, i.e. of two mean squares (MSQ).
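The decision can be phrased in two equivalent ways: compare the observed F with the critical value, or compare the p-value with alpha. A short sketch using the rounded values printed above (`F_obs`, `alpha` and the other names are illustrative):

```r
F_obs <- 10.11; df1 <- 3; df2 <- 67; alpha <- 0.05

F_crit <- qf(1 - alpha, df1, df2)                  # upper-tail critical value
p_val  <- pf(F_obs, df1, df2, lower.tail = FALSE)  # P(F > F_obs)

reject_by_F <- F_obs > F_crit                      # decision from the statistic
reject_by_p <- p_val < alpha                       # decision from the p-value

# qf() and pf() are inverses: the area to the right of F_crit equals alpha
tail_at_crit <- pf(F_crit, df1, df2, lower.tail = FALSE)
```

Both rules always agree, because F exceeds the critical value exactly when the upper-tail area beyond it is smaller than alpha.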
TRICK: an easy way to set the critical alpha: take 5% and divide it by the number of tests to run (this is the Bonferroni correction).
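A minimal sketch of this trick for the four groups here, where all pairwise comparisons give choose(4, 2) = 6 tests. The raw p-values in `p_raw` are made-up numbers for illustration only, not results from this analysis.

```r
alpha   <- 0.05
m_tests <- choose(4, 2)               # 6 pairwise comparisons among 4 groups
alpha_per_test <- alpha / m_tests     # threshold applied to each single test

p_raw <- c(0.001, 0.004, 0.02, 0.03, 0.2, 0.7)   # hypothetical raw p-values

# Equivalent view: inflate the p-values and keep alpha at 5%
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Both views lead to the same reject / do-not-reject decisions
same_decision <- all((p_raw < alpha_per_test) == (p_bonf < alpha))
```

The pairwise.t.test() function accepts the same correction through its `p.adjust.method` argument; its default is "holm", which is somewhat less conservative.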
#1
TukeyHSD(AOV181)#ANOVA OBJECT
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = lm(simu181$measure ~ simu181$Groupsj))
##
## $`simu181$Groupsj`
## diff lwr upr p adj
## 1er-0mat 0.1384861 -1.17546321 1.452435 0.9924539
## 2er-0mat 1.4834415 0.06681570 2.900067 0.0366339
## 3er-0mat 2.4631310 1.09386410 3.832398 0.0000674
## 2er-1er 1.3449554 -0.05595873 2.745869 0.0644222
## 3er-1er 2.3246448 0.97163951 3.677650 0.0001456
## 3er-2er 0.9796895 -0.47323487 2.432614 0.2935677
# 0mat and 3er are very different
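For readers curious where these intervals come from, here is a hedged sketch of the Tukey-Kramer computation for unequal group sizes, checked against TukeyHSD() on re-simulated data. The object names are illustrative, and only the 3er-0mat comparison is rebuilt.

```r
set.seed(123)
n_j <- c(19, 20, 15, 17)
x <- c(rnorm(19, 6.1, 1.5), rnorm(20, 6.6, 1.8), rnorm(15, 7.7, 2), rnorm(17, 8.7, 1.5))
g <- factor(rep(c("0mat", "1er", "2er", "3er"), times = n_j))
fit <- aov(x ~ g)

MSE <- sum(residuals(fit)^2) / fit$df.residual   # residual mean square
m_j <- tapply(x, g, mean)                        # group means

# 3er vs 0mat: difference, standard error and half-width of the 95% interval
diff_41 <- unname(m_j["3er"] - m_j["0mat"])
se_41   <- sqrt(MSE / 2 * (1 / n_j[4] + 1 / n_j[1]))
half    <- qtukey(0.95, nmeans = nlevels(g), df = fit$df.residual) * se_41
ci_hand <- c(diff = diff_41, lwr = diff_41 - half, upr = diff_41 + half)

# Adjusted p-value from the studentized range distribution
p_hand <- ptukey(abs(diff_41) / se_41, nmeans = nlevels(g),
                 df = fit$df.residual, lower.tail = FALSE)

ref <- TukeyHSD(fit)$g["3er-0mat", ]             # diff, lwr, upr, p adj
```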
#2
pairwise.t.test(simu181$measure,simu181$Groupsj)# similar conclusion, except 2er vs 1er (p = 0.041 here, 0.064 with Tukey)
##
## Pairwise comparisons using t tests with pooled SD
##
## data: simu181$measure and simu181$Groupsj
##
## 0mat 1er 2er
## 1er 0.78211 - -
## 2er 0.02988 0.04135 -
## 3er 6.9e-05 0.00013 0.16037
##
## P value adjustment method: holm
In fact, inside aov you have the lm function (linear regression, which is an affine line of