1. T-TEST

Usage

Analyzes the difference between two groups or treatments using the t value: the difference between the two group means divided by the noise (the standard error of the difference).

Assumptions

  1. The data in each group should follow a normal distribution
  2. The 2 groups should be independent
  3. Observations should come from random sampling
  4. Sample size in each group > 30 observations (a rule of thumb)

# Boxplot of weight gain by hemorrhage status, with the raw points jittered on top
boxplot(`WG (g)` ~ Hemorrhage, data = Raw_data, lwd = 2, ylab = 'WG')
stripchart(`WG (g)` ~ Hemorrhage, data = Raw_data, vertical = TRUE,
    method = "jitter", add = TRUE, pch = 20, col = 'blue')

library(psych)  # provides describeBy(); describe.by() is the deprecated name
describeBy(Raw_data$`WG (g)`, Raw_data$Hemorrhage, range = FALSE)
## 
##  Descriptive statistics by group 
## group: 0
##    vars  n mean    sd skew kurtosis   se
## X1    1 26 51.1 16.05 0.09    -1.48 3.15
## ------------------------------------------------------------ 
## group: 1
##    vars  n  mean    sd skew kurtosis   se
## X1    1 34 43.23 12.87 0.59    -0.81 2.21
library(beeswarm)  # provides beeswarm()
beeswarm(`WG (g)` ~ Hemorrhage, data = Raw_data, col = 20, pch = 16)
boxplot(`WG (g)` ~ Hemorrhage, data = Raw_data, add = TRUE) # overlay the boxplot on the beeswarm plot

t.test(Raw_data$`WG (g)`~Raw_data$Hemorrhage)
## 
##  Welch Two Sample t-test
## 
## data:  Raw_data$`WG (g)` by Raw_data$Hemorrhage
## t = 2.0478, df = 47.018, p-value = 0.04618
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##   0.1388506 15.6034452
## sample estimates:
## mean in group 0 mean in group 1 
##        51.09697        43.22582

t value = 2.05 means that the difference (mean1 - mean2) is 2.05 times as large as the noise (standard error of the difference)
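As a rough check, the t value, degrees of freedom, and confidence interval can be recomputed from the group summaries printed above. This is only a sketch: the variable names (n0, m0, s0, and so on) are made up for illustration, and because the SDs are taken from the rounded describeBy output, the results agree with t.test only to about two decimals.

```r
# Group summaries copied from the describeBy and t.test output above
n0 <- 26; m0 <- 51.09697; s0 <- 16.05   # group 0
n1 <- 34; m1 <- 43.22582; s1 <- 12.87   # group 1

# Signal: difference between the two means
diff_means <- m0 - m1

# Noise: standard error of the difference (Welch, unequal variances)
v0 <- s0^2 / n0
v1 <- s1^2 / n1
se_diff <- sqrt(v0 + v1)

# t value = signal / noise
t_value <- diff_means / se_diff

# Welch-Satterthwaite degrees of freedom
df_welch <- (v0 + v1)^2 / (v0^2 / (n0 - 1) + v1^2 / (n1 - 1))

# Two-sided p-value and 95% confidence interval for the difference
p_value <- 2 * pt(-abs(t_value), df = df_welch)
ci <- diff_means + c(-1, 1) * qt(0.975, df = df_welch) * se_diff

round(c(t = t_value, df = df_welch, p = p_value, lower = ci[1], upper = ci[2]), 3)
```

The values should land close to the t.test output (t = 2.0478, df = 47.018, CI 0.139 to 15.603).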

p value = 0.046 is statistically significant at the 0.05 level

95% CI: if this study were repeated 100 times, about 95 of the resulting intervals would contain the true difference in means. Here the interval runs from 0.14 to 15.60
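The "repeat the study many times" reading of a confidence interval can be illustrated with a small simulation. This is a sketch under assumed values: the population means, SDs, and true difference below are invented (loosely modelled on the sample above), not estimates from Raw_data.

```r
set.seed(42)     # arbitrary seed so the simulation is reproducible
true_diff <- 8   # assumed true difference in population means

# Repeat the "study" 2000 times and record whether each 95% CI contains true_diff
covered <- replicate(2000, {
  g0 <- rnorm(26, mean = 51, sd = 16)               # simulated group 0
  g1 <- rnorm(34, mean = 51 - true_diff, sd = 13)   # simulated group 1
  ci <- t.test(g0, g1)$conf.int                     # Welch 95% CI for mean(g0) - mean(g1)
  ci[1] <= true_diff && true_diff <= ci[2]
})

# Proportion of intervals that captured the true difference
coverage <- mean(covered)
coverage
```

The proportion should come out near 0.95. Each individual interval differs from run to run; the 95% describes how often the procedure captures the true difference.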

The means in group 0 (no hemorrhage) and group 1 (hemorrhage) are 51.10 and 43.23, respectively

2. Normal distribution theory

  1. Symmetrical distribution: mean = median = mode
  2. Negatively skewed distribution: mean < median < mode
  3. Positively skewed distribution: mean > median > mode

If the data follow a standard normal distribution (mean = 0, SD = 1), 68% of values fall between -1 and +1 (obtained by integrating the density), 95% between -2 and 2, and 99.7% between -3 and 3.
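The 68%, 95%, and 99.7% figures can be checked numerically for the standard normal distribution (mean 0, SD 1). A minimal sketch using base R only; the variable names are illustrative.

```r
# Probability mass of the standard normal within k SDs of the mean, for k = 1, 2, 3
k <- 1:3
coverage <- pnorm(k) - pnorm(-k)
round(coverage, 4)
# 0.6827 0.9545 0.9973

# The same area for k = 1, obtained by integrating the density directly
area_1sd <- integrate(dnorm, lower = -1, upper = 1)$value
area_1sd
```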

Checking normality with qqnorm and qqline

qqnorm(Raw_data$`WG (g)`)
qqline(Raw_data$`WG (g)`,col=3)

plot(density(Raw_data$`WG (g)`)) 

Checking normality with statistical tests

shapiro.test(Raw_data$`WG (g)`)
## 
##  Shapiro-Wilk normality test
## 
## data:  Raw_data$`WG (g)`
## W = 0.92827, p-value = 0.001678
library(nortest)  # provides ad.test()
ad.test(Raw_data$`WG (g)`)
## 
##  Anderson-Darling normality test
## 
## data:  Raw_data$`WG (g)`
## A = 1.7083, p-value = 0.0001972

Note that these statistical tests can be overly sensitive. For example, with 1000 observations and only a few outliers, the test may still return a p value < 0.05. In that case, also inspect the data with qqnorm and qqline before deciding whether the distribution is close enough to normal.
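The sensitivity described above can be reproduced with simulated data. This is a sketch on made-up numbers, not on Raw_data: 1000 draws from a normal distribution plus five artificial outliers chosen for illustration.

```r
set.seed(1)  # arbitrary seed so the example is reproducible

# 1000 genuinely normal observations plus 5 artificial outliers
x <- c(rnorm(1000), 6, 7, -6.5, 8, 7.5)

# The test rejects normality because of the handful of extreme points
p_value <- shapiro.test(x)$p.value
p_value

# The Q-Q plot shows the bulk of the points on the line, with only the outliers off it
qqnorm(x)
qqline(x, col = 3)
```

Here the Q-Q plot gives a more useful picture than the p value alone: nearly all points follow the reference line, and the departure is confined to the few extreme values.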