Well, here are some tricks to bring along to the exam, plus a few clarifications.
The t test is the most widely used statistical test worldwide (biology, pharma, etc.) because of:
Its power
Its simplicity
Its robustness to departures from normality.
Student based his distribution on the Gaussian (normal) one: instead of counting 100 or 1000 seeds, he modified the N(0,1) into the t distribution, which stays accurate when the group size n is below about 30.
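To see how the t distribution widens the normal one for small samples, here is a minimal sketch comparing two-sided 5% critical values. The sample sizes are illustrative choices, not values from these notes.

```r
# Two-sided 5% critical values: Student t versus standard normal.
n <- c(5, 10, 30, 100, 1000)        # illustrative sample sizes (assumption)
t_crit <- qt(0.975, df = n - 1)     # t quantile with n - 1 degrees of freedom
z_crit <- qnorm(0.975)              # normal quantile, about 1.96

# The t critical value is always larger and shrinks towards z as n grows.
crit_table <- data.frame(n = n,
                         t_crit = round(t_crit, 3),
                         z_crit = round(z_crit, 3))
print(crit_table)
```

The gap between the two columns is large for n = 5 and becomes small around n = 30, which is where the usual rule of thumb comes from.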
The two-sample t test only tests the difference in MEANS, nothing else!
Note: make sure that this estimator (the mean) correctly represents the phenomenon under study!
But some ASSUMPTIONS must be stated and verified:
NORMALITY
EQUAL WITHIN GROUP VARIANCE
I.I.D Sampling
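The first two assumptions can be checked roughly as sketched below. The data are simulated here (x_sim and y_sim are hypothetical names), and shapiro.test is one possible normality check among others. The i.i.d. assumption cannot be tested this way; it comes from the study design.

```r
# Simulated data standing in for two study groups (assumption).
set.seed(1)
x_sim <- rnorm(50, mean = 20, sd = 2)
y_sim <- rnorm(50, mean = 20, sd = 2)

# NORMALITY: Shapiro-Wilk test in each group (H0: data are normal).
norm_x <- shapiro.test(x_sim)
norm_y <- shapiro.test(y_sim)

# EQUAL WITHIN-GROUP VARIANCE: F test (H0: variance ratio equals 1).
var_xy <- var.test(x_sim, y_sim)

# Large p-values mean no evidence against the assumption.
round(c(shapiro_x = norm_x$p.value,
        shapiro_y = norm_y$p.value,
        f_test    = var_xy$p.value), 3)
```

With large samples these tests flag tiny departures, so it is usually worth looking at hist() or qqnorm() as well.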
T test in R
TWO SAMPLE T TEST with equal variance (pooled variance).
In R: var.equal = TRUE (equality of variances checked with var.test or car::leveneTest)
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 100 20.18 1.83 20.12 20.16 1.78 15.38 24.37 8.99 0.06 -0.22 0.18
describe(Y)
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 100 19.78 1.93 19.55 19.67 1.93 15.89 26.48 10.59 0.63 0.58 0.19
library(skimr)
Warning: package 'skimr' was built under R version 4.2.3
skim(data.frame(X))
Data summary
Name                     data.frame(X)
Number of rows           100
Number of columns        1
_______________________
Column type frequency:
  numeric                1
________________________
Group variables          None

Variable type: numeric
skim_variable  n_missing  complete_rate  mean   sd    p0     p25    p50    p75    p100   hist
X              0          1              20.18  1.83  15.38  19.01  20.12  21.38  24.37  ▁▃▇▅▂
hist(X)
hist(Y)
var.test(X, Y)  ## F test of equal variances on two separate (ungrouped) vectors
F test to compare two variances
data: X and Y
F = 0.8911, num df = 99, denom df = 99, p-value = 0.5673
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.5995678 1.3243799
sample estimates:
ratio of variances
0.891098
(2.21*2.21)/(1.83*1.83)  # rule of thumb: a variance ratio below 2 is OK!
[1] 1.458419
t.test(X, Y, var.equal = TRUE)  # pooled TWO SAMPLE T TEST, NOT Welch!
Two Sample t-test
data: X and Y
t = 1.4886, df = 198, p-value = 0.1382
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.1285617 0.9203725
sample estimates:
mean of x mean of y
20.18081 19.78491
2*(1-pt(1.4886,198))##pval by hand
[1] 0.1381837
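The whole pooled test, not only the p-value, can be rebuilt by hand. The sketch below uses simulated data (x_sim and y_sim are hypothetical stand-ins for X and Y, whose generating code is not shown in these notes) and compares the result with t.test.

```r
# Simulated stand-ins for the two groups (assumption).
set.seed(42)
x_sim <- rnorm(100, mean = 20, sd = 2)
y_sim <- rnorm(100, mean = 20, sd = 2)
n1 <- length(x_sim)
n2 <- length(y_sim)

# Pooled variance: weighted average of the two sample variances.
sp2 <- ((n1 - 1) * var(x_sim) + (n2 - 1) * var(y_sim)) / (n1 + n2 - 2)

# Standard error of the difference in means, then the t statistic.
se_diff <- sqrt(sp2 * (1 / n1 + 1 / n2))
t_hand  <- (mean(x_sim) - mean(y_sim)) / se_diff
df_hand <- n1 + n2 - 2
p_hand  <- 2 * pt(-abs(t_hand), df = df_hand)   # two-sided p-value

# Reference result from R.
fit <- t.test(x_sim, y_sim, var.equal = TRUE)
c(t_hand = t_hand, t_R = unname(fit$statistic), p_hand = p_hand, p_R = fit$p.value)
```

Using pt(-abs(t), df) instead of 1 - pt(t, df) gives the right two-sided p-value whatever the sign of t.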
## The difference in means is small relative to the variability, so the t test is non-significant: NO signal.
NOTE: ONE-SAMPLE T TEST BY HAND
kable(round(describe(X),2))
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 100 20.18 1.83 20.12 20.16 1.78 15.38 24.37 8.99 0.06 -0.22 0.18
# testing whether the population mean mu equals 20
t.test(X, mu = 20)  # H1: mu differs from 20
One Sample t-test
data: X
t = 0.99041, df = 99, p-value = 0.3244
alternative hypothesis: true mean is not equal to 20
95 percent confidence interval:
19.81857 20.54306
sample estimates:
mean of x
20.18081
t = (20.18 - 20) / 0.18  # (sample mean - mu) / standard error
t
[1] 1
2*(1-pnorm(1))#BASED on N(0,1)
[1] 0.3173105
2*(1-pnorm(20.18,20,0.18))  # same computation on the original scale: mean 20, standard error 0.18
[1] 0.3173105
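# SAME RESULT from both normal-based computations, and close to the 0.3244 given by t.test (which uses the t distribution with 99 df).
## DO NOT FORGET to multiply the tail probability by 2 for a two-sided test (almost always the case).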
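The by-hand computation above uses the normal distribution; replacing pnorm with pt reproduces t.test exactly. A minimal sketch on simulated data (x_sim and mu0 are hypothetical names):

```r
# Simulated sample standing in for X (assumption).
set.seed(7)
x_sim <- rnorm(100, mean = 20, sd = 2)
mu0   <- 20                                   # hypothesised population mean
n     <- length(x_sim)

se     <- sd(x_sim) / sqrt(n)                 # standard error of the mean
t_hand <- (mean(x_sim) - mu0) / se            # one-sample t statistic

p_t <- 2 * pt(-abs(t_hand), df = n - 1)       # exact: Student t with n - 1 df
p_z <- 2 * pnorm(-abs(t_hand))                # approximation: N(0,1)

fit <- t.test(x_sim, mu = mu0)
c(p_t = p_t, p_z = p_z, p_R = fit$p.value)
```

The normal p-value is always a little smaller than the t one because the t distribution has heavier tails; with n = 100 the gap is small.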
TWO SAMPLE T TEST with unequal variance (Welch t test).
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 100 20.04 3.54 19.85 20.05 3.96 10.13 27.72 17.59 -0.05 -0.24 0.35
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 100 19.81 1.81 19.75 19.79 1.77 15.28 23.76 8.49 0.05 -0.56 0.18
F test to compare two variances
data: X and Y
F = 3.8157, num df = 99, denom df = 99, p-value = 1.363e-10
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
2.567392 5.671090
sample estimates:
ratio of variances
3.815745
Welch Two Sample t-test
data: X and Y
t = 0.58262, df = 147.56, p-value = 0.561
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.553478 1.016273
sample estimates:
mean of x mean of y
20.03848 19.80708
[1] 0.5608255
As you can see, when n >> 30 the pooled and Welch t tests give similar results, even when the group variances differ.
It is when n is below about 30 that the choice makes a difference.
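A small simulated case makes the difference visible. In this sketch the group sizes (10 and 25) and standard deviations are illustrative assumptions; the unequal group sizes are added here because they make the gap between the two tests easier to see. The Welch degrees of freedom are also recomputed by hand with the Welch-Satterthwaite formula.

```r
# Small groups with clearly different variances (assumption).
set.seed(99)
x_small <- rnorm(10, mean = 20, sd = 4)
y_small <- rnorm(25, mean = 20, sd = 1)

pooled <- t.test(x_small, y_small, var.equal = TRUE)    # Student, pooled variance
welch  <- t.test(x_small, y_small, var.equal = FALSE)   # Welch (R default)

# Welch-Satterthwaite degrees of freedom by hand.
v1 <- var(x_small) / length(x_small)
v2 <- var(y_small) / length(y_small)
df_welch <- (v1 + v2)^2 /
  (v1^2 / (length(x_small) - 1) + v2^2 / (length(y_small) - 1))

data.frame(test = c("pooled", "welch"),
           t    = c(pooled$statistic, welch$statistic),
           df   = c(pooled$parameter, welch$parameter),
           p    = c(pooled$p.value, welch$p.value))
```

The Welch df can never exceed n1 + n2 - 2, so the Welch test is the more cautious of the two when variances differ.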
PAIRED DATA
When your data are correlated (longitudinal study, family study, ...), the alternative to the t test is still a t test: taking the difference within each pair (Xi - Yi), rather than the difference of the means, generates a new variable that removes the correlation.
Remember that the variance of the difference of two correlated random variables is given by:
Var(Xi - Yi) = Var(Xi) + Var(Yi) - 2 Cov(Xi, Yi)
where Cov(Xi, Yi) = cor(X, Y) * sd(X) * sd(Y): the 2 Cov term is where the correlation between X and Y enters.
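The identity can be checked numerically. The sketch below builds two positively correlated variables through a shared component (an illustrative construction, with hypothetical names) and verifies both the variance formula and the link between covariance and correlation.

```r
# Correlated pairs: each pair shares a common "family" effect (assumption).
set.seed(123)
shared <- rnorm(200)
x_pair <- 20 + 2 * shared + rnorm(200)
y_pair <- 20 + 2 * shared + rnorm(200)

# Var(X - Y) computed directly and through the formula.
lhs <- var(x_pair - y_pair)
rhs <- var(x_pair) + var(y_pair) - 2 * cov(x_pair, y_pair)

# Covariance rebuilt from the correlation and the standard deviations.
cov_from_cor <- cor(x_pair, y_pair) * sd(x_pair) * sd(y_pair)

c(lhs = lhs, rhs = rhs, cov = cov(x_pair, y_pair), cov_from_cor = cov_from_cor)
```

With a positive covariance, the variance of the differences is smaller than the sum of the two variances, which is why pairing gains precision.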
set.seed(234)
Xbrothe = rnorm(100, 20, 4)  # SIM PAIRED: brothers family study...
Ybrother = rnorm(100, 20, 2)
var.test(Xbrothe, Ybrother)  ## F test of equal variances on two separate (ungrouped) vectors
F test to compare two variances
data: Xbrothe and Ybrother
F = 3.4456, num df = 99, denom df = 99, p-value = 2.538e-09
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
2.318372 5.121030
sample estimates:
ratio of variances
3.445643
## PAIRED T TEST (there is no Welch version: var.equal is ignored when paired = TRUE)
t.test(X, Y, var.equal = FALSE, paired = TRUE)  # PAIRED DATA T TEST!
Paired t-test
data: X and Y
t = 0.60629, df = 99, p-value = 0.5457
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
-0.5258984 0.9886937
sample estimates:
mean difference
0.2313976
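A paired t test is simply a one-sample t test on the within-pair differences. The sketch below checks this on simulated correlated pairs (hypothetical names; the shared component is an assumption used to create the correlation) and compares the confidence interval with the unpaired Welch test.

```r
# Correlated pairs, e.g. two brothers sharing a family effect (assumption).
set.seed(2024)
shared <- rnorm(100, mean = 0, sd = 3)
x_pair <- 20 + shared + rnorm(100)
y_pair <- 20 + shared + rnorm(100)

paired_fit <- t.test(x_pair, y_pair, paired = TRUE)   # paired t test
diff_fit   <- t.test(x_pair - y_pair, mu = 0)         # one-sample test on differences
welch_fit  <- t.test(x_pair, y_pair)                  # ignores the pairing

# Same statistic and p-value for the first two; wider interval for Welch.
c(t_paired = unname(paired_fit$statistic),
  t_diff   = unname(diff_fit$statistic),
  ci_width_paired = diff(paired_fit$conf.int),
  ci_width_welch  = diff(welch_fit$conf.int))
```

When pairs are strongly positively correlated, ignoring the pairing throws away precision, as the wider Welch interval shows.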