T test Empire Vampire STAT II SSP

Author

Mudry Jm Stater

T TEST (STUDENT T TEST)

Well, here are some tricks to bring along to the exam, plus a few clarifications.

The t test is the most widely used statistical test worldwide (biology, pharma, etc.), because of:

  • Its power

  • Its simplicity

  • Its robustness to departures from normality.

Student based his distribution on the Gaussian (normal) one: instead of counting hundreds or thousands of seeds, he modified the N(0,1) into a t distribution that stays accurate when the group size n is below about 30.
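To make the "n below about 30" rule concrete, here is a minimal sketch comparing the two-sided 5% critical values of the t distribution and of N(0,1). The group sizes are illustrative choices, not values from these notes.

```r
# Two-sided 5% critical values: Student t versus N(0,1).
n <- c(5, 10, 30, 100, 1000)        # illustrative group sizes (assumption)
crit_t <- qt(0.975, df = n - 1)     # t critical value with n - 1 degrees of freedom
crit_z <- qnorm(0.975)              # normal critical value, about 1.96

comparison <- data.frame(
  n      = n,
  t_crit = round(crit_t, 3),
  z_crit = round(crit_z, 3)
)
print(comparison)
```

The t critical value is always above the normal one and shrinks toward it as n grows, which is why the correction matters mostly for small groups.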

The two-sample test only tests the difference in MEANS, nothing else!

Note: make sure that this estimator (the mean) correctly represents the phenomenon under study!

But some ASSUMPTIONS must hold and must be verified:

  • NORMALITY

  • EQUAL WITHIN GROUP VARIANCE

  • I.I.D Sampling
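The first two assumptions above can be checked with base R, as in the hedged sketch below. The groups `g1` and `g2` are hypothetical simulated data, and the Shapiro-Wilk test is only one possible normality check (these notes do not prescribe one). I.I.D. sampling cannot be tested from the data: it has to come from the study design.

```r
set.seed(1)                            # arbitrary seed, for reproducibility only
g1 <- rnorm(25, mean = 20, sd = 2)     # hypothetical group 1
g2 <- rnorm(25, mean = 20, sd = 2)     # hypothetical group 2

# NORMALITY: Shapiro-Wilk test within each group (H0: data are normal)
norm_g1 <- shapiro.test(g1)
norm_g2 <- shapiro.test(g2)

# EQUAL WITHIN-GROUP VARIANCE: F test on the ratio of the two variances
eq_var <- var.test(g1, g2)

checks <- c(
  shapiro_g1 = norm_g1$p.value,
  shapiro_g2 = norm_g2$p.value,
  var_ratio  = eq_var$p.value
)
print(round(checks, 3))                # large p-values: no evidence against the assumption
```

A non-significant p-value does not prove the assumption; with small groups these checks have little power, so a histogram or Q-Q plot is a useful complement.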

T test in R

TWO SAMPLE T TEST With equal variance (pooled variance).

R: var.equal = TRUE (equal variances checked with var.test or car::leveneTest)

library(psych)
library(knitr)
set.seed(123)
X=rnorm(100,20,2)
Y=rnorm(100,20,2)
describe(X)
   vars   n  mean   sd median trimmed  mad   min   max range skew kurtosis   se
X1    1 100 20.18 1.83  20.12   20.16 1.78 15.38 24.37  8.99 0.06    -0.22 0.18
describe(Y)
   vars   n  mean   sd median trimmed  mad   min   max range skew kurtosis   se
X1    1 100 19.78 1.93  19.55   19.67 1.93 15.89 26.48 10.59 0.63     0.58 0.19
library(skimr)
Warning: package 'skimr' was built under R version 4.2.3
skim(data.frame(X))
Data summary
Name                 data.frame(X)
Number of rows       100
Number of columns    1
Column type frequency:
  numeric            1
Group variables      None

Variable type: numeric

skim_variable n_missing complete_rate  mean   sd    p0   p25   p50   p75  p100 hist
X                     0             1 20.18 1.83 15.38 19.01 20.12 21.38 24.37 ▁▃▇▅▂
hist(X)

hist(Y)

var.test(X,Y)## F test of equal variances on two ungrouped vectors

    F test to compare two variances

data:  X and Y
F = 0.8911, num df = 99, denom df = 99, p-value = 0.5673
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.5995678 1.3243799
sample estimates:
ratio of variances 
          0.891098 
(1.93*1.93)/(1.83*1.83)# larger variance over smaller variance, using the sd values from describe(): ratio is lower than two, OK!
[1] 1.112276
t.test(X,Y,var.equal = TRUE)#TWO SAMPLE T TEST !NOT WELCH

    Two Sample t-test

data:  X and Y
t = 1.4886, df = 198, p-value = 0.1382
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.1285617  0.9203725
sample estimates:
mean of x mean of y 
 20.18081  19.78491 
2*(1-pt(1.4886,198))##pval by hand
[1] 0.1381837
## The difference in means is small relative to the variability, so the t test must be non-significant: NO signal
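The p-value above is computed by hand from the t value that t.test already printed. As a sketch of the full calculation, the pooled-variance statistic itself can be rebuilt from the means and variances. The simulation repeats the seed and parameters used earlier in these notes; the names `sp2`, `se_diff`, `t_stat` and `p_val` are illustrative.

```r
set.seed(123)
X <- rnorm(100, 20, 2)
Y <- rnorm(100, 20, 2)

n1 <- length(X)
n2 <- length(Y)

# Pooled variance: average of the two variances weighted by their degrees of freedom
sp2 <- ((n1 - 1) * var(X) + (n2 - 1) * var(Y)) / (n1 + n2 - 2)

# Standard error of the difference in means, then the t statistic
se_diff <- sqrt(sp2 * (1 / n1 + 1 / n2))
t_stat  <- (mean(X) - mean(Y)) / se_diff
df      <- n1 + n2 - 2

# Two-sided p-value
p_val <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)

# Reference: the built-in pooled test should give the same numbers
ref <- t.test(X, Y, var.equal = TRUE)
print(c(t_by_hand = t_stat, t_builtin = unname(ref$statistic)))
print(c(p_by_hand = p_val, p_builtin = ref$p.value))
```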

## NOTE: ONE-SAMPLE T TEST BY HAND

kable(round(describe(X),2))
   vars   n  mean   sd median trimmed  mad   min   max range skew kurtosis   se
X1    1 100 20.18 1.83  20.12   20.16 1.78 15.38 24.37  8.99 0.06    -0.22 0.18
#testing mu population=20
t.test(X,mu=20)#testing mu diff 20

    One Sample t-test

data:  X
t = 0.99041, df = 99, p-value = 0.3244
alternative hypothesis: true mean is not equal to 20
95 percent confidence interval:
 19.81857 20.54306
sample estimates:
mean of x 
 20.18081 
t=(20.18-20)/0.18
t
[1] 1
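2*(1-pnorm(1))# based on N(0,1): a normal approximation, close to the t.test p-value (0.3244) because n = 100
[1] 0.3173105
2*(1-pnorm(20.18,20,0.18))# based on the estimated parameters (mean 20, standard error 0.18)
[1] 0.3173105
# SAME RESULT: standardizing first, or passing the mean and standard error to pnorm, is equivalent
## DO NOT FORGET to multiply the one-tail p-value by 2 for a two-sided test (almost always the case)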
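The hand calculation above uses the normal distribution, which is why its p-value is slightly lower than the one printed by t.test. A minimal sketch of the same calculation with the t distribution (n - 1 degrees of freedom) follows; it reuses the seed and parameters from earlier in these notes, and the variable names are illustrative.

```r
set.seed(123)
X   <- rnorm(100, 20, 2)
mu0 <- 20                                  # hypothesized population mean

n      <- length(X)
se     <- sd(X) / sqrt(n)                  # standard error of the mean
t_stat <- (mean(X) - mu0) / se

# Two-sided p-values: exact t version and normal approximation
p_t <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)
p_z <- 2 * pnorm(abs(t_stat), lower.tail = FALSE)

ref <- t.test(X, mu = mu0)                 # built-in reference
print(c(p_t = p_t, p_z = p_z, p_builtin = ref$p.value))
```

The t version matches t.test; the normal version is always a little smaller because N(0,1) has thinner tails.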

With UNEQUAL variance (WELCH T TEST).

   vars   n  mean   sd median trimmed  mad   min   max range  skew kurtosis   se
X1    1 100 20.04 3.54  19.85   20.05 3.96 10.13 27.72 17.59 -0.05    -0.24 0.35
   vars   n  mean   sd median trimmed  mad   min   max range skew kurtosis   se
X1    1 100 19.81 1.81  19.75   19.79 1.77 15.28 23.76  8.49 0.05    -0.56 0.18


    F test to compare two variances

data:  X and Y
F = 3.8157, num df = 99, denom df = 99, p-value = 1.363e-10
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 2.567392 5.671090
sample estimates:
ratio of variances 
          3.815745 

    Welch Two Sample t-test

data:  X and Y
t = 0.58262, df = 147.56, p-value = 0.561
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.553478  1.016273
sample estimates:
mean of x mean of y 
 20.03848  19.80708 
[1] 0.5608255

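As you can see, when n is well above 30, the pooled and Welch t tests give similar results even when the group variances differ.

It is when n is below 30 (or when the group sizes are unequal) that the choice makes a difference.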
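The code for the Welch run above is not shown in these notes, so the following is only a sketch under assumed parameters (standard deviations of 4 and 2, an arbitrary seed). It runs the pooled and Welch tests side by side for a large and a small group size; `compare_tests` is a hypothetical helper.

```r
# Run the pooled and Welch t tests on the same simulated data.
# sd_x, sd_y and seed are illustrative assumptions.
compare_tests <- function(n, sd_x = 4, sd_y = 2, seed = 42) {
  set.seed(seed)
  x <- rnorm(n, 20, sd_x)
  y <- rnorm(n, 20, sd_y)
  pooled <- t.test(x, y, var.equal = TRUE)
  welch  <- t.test(x, y, var.equal = FALSE)
  data.frame(
    n         = n,
    df_pooled = unname(pooled$parameter),
    df_welch  = unname(welch$parameter),
    p_pooled  = pooled$p.value,
    p_welch   = welch$p.value
  )
}

results <- rbind(compare_tests(100), compare_tests(10))
print(results)
```

With equal group sizes the two t statistics coincide and only the degrees of freedom differ. That loss of degrees of freedom hardly changes the p-value at n = 100, but it weighs more at n = 10.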

PAIRED DATA

When your data are correlated (longitudinal study, family study, ...), the alternative to the two-sample t test is still a t test: taking the difference within each pair (Xi - Yi), rather than the difference of the means, generates a new variable that accounts for the correlation.

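Remember that the variance of the difference of two correlated random variables is given by:

VAR(Xi - Yi) = VAR(Xi) + VAR(Yi) - 2 COV(Xi, Yi)

The term 2 COV(Xi, Yi) carries the correlation between X and Y, since COV(Xi, Yi) = COR(Xi, Yi) * SD(Xi) * SD(Yi).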
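The variance formula above can be checked numerically. In this sketch the correlation is induced by a hypothetical shared family effect (`shared`); all parameters are illustrative assumptions.

```r
set.seed(234)
n <- 100
shared <- rnorm(n, 0, 2)               # hypothetical family effect common to both brothers
x <- 20 + shared + rnorm(n, 0, 1)
y <- 20 + shared + rnorm(n, 0, 1)

lhs <- var(x - y)                          # variance of the pairwise differences
rhs <- var(x) + var(y) - 2 * cov(x, y)     # formula for correlated variables

# Covariance expressed through the correlation
cov_from_cor <- cor(x, y) * sd(x) * sd(y)

print(c(lhs = lhs, rhs = rhs))
print(c(cov = cov(x, y), cov_from_cor = cov_from_cor))
```

Because the covariance is positive here, the variance of the differences is smaller than VAR(x) + VAR(y), which is what gives the paired test its extra power.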

set.seed(234)
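Xbrothe=rnorm(100,20,4)# simulated "paired" brothers (family study); the two vectors are drawn independently, so they are not actually correlated
Ybrother=rnorm(100,20,2)
var.test(Xbrothe,Ybrother)## F test of equal variances on the two vectors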

    F test to compare two variances

data:  Xbrothe and Ybrother
F = 3.4456, num df = 99, denom df = 99, p-value = 2.538e-09
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 2.318372 5.121030
sample estimates:
ratio of variances 
          3.445643 
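## PAIRED T TEST (there is no "Welch" paired test: var.equal is ignored when paired = TRUE)
t.test(X,Y,var.equal = FALSE,paired = TRUE)# paired t test; note that X and Y here are the vectors from the Welch section above, not Xbrothe and Ybrother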

    Paired t-test

data:  X and Y
t = 0.60629, df = 99, p-value = 0.5457
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -0.5258984  0.9886937
sample estimates:
mean difference 
      0.2313976 
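2*(pt(0.6029,198,lower.tail = FALSE))# p-value by hand; careful: the paired test above has t = 0.60629 and df = n - 1 = 99, which gives its p-value of 0.5457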
[1] 0.5472651
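A paired t test is a one-sample t test on the pairwise differences, with n - 1 degrees of freedom (n being the number of pairs). The sketch below shows this by hand on hypothetical correlated data; the shared family effect and all parameters are assumptions made for illustration.

```r
set.seed(234)
n <- 100
shared <- rnorm(n, 0, 2)               # hypothetical family effect inducing correlation
x <- 20 + shared + rnorm(n, 0, 1)
y <- 20 + shared + rnorm(n, 0, 1)

d <- x - y                             # one difference per pair

# By hand: one-sample t statistic on the differences, df = n - 1
t_stat <- mean(d) / (sd(d) / sqrt(n))
p_val  <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

# Built-in equivalents
paired     <- t.test(x, y, paired = TRUE)
one_sample <- t.test(d, mu = 0)

print(c(by_hand = p_val, paired = paired$p.value, one_sample = one_sample$p.value))
```

All three p-values should agree, which confirms that the hand calculation needs df = n - 1 and not n1 + n2 - 2.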