Measuring every member of a population is ideal but rarely practical. Suppose a new medicine had to be tested on one million patients: could you test every one of them? Statistical sampling was developed to solve exactly this problem. The practical approach is to measure just a sample of the population and use it to draw conclusions about the whole. Several methods test hypotheses by comparing means.
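There are four tests for comparing means: (a) z-test, (b) t-test (single sample), (c) t-test (dependent), (d) t-test (independent). One question comes up immediately: when should each be used? In short, the z-test applies when the population standard deviation is known, the single-sample t-test when it has to be estimated from the sample, the dependent t-test when the same subjects are measured twice, and the independent t-test when two separate groups are compared.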
All four tests share the same basic formula: (a) compare the observed value to the expected value, (b) relative to the standard error (SE).
z or t = (Observed (sample mean) - Expected (population mean)) / Standard Error
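The shared formula can be written as a tiny helper. This is only a sketch: the function name `test_statistic` and the example vector are made up for illustration and are not part of the Jordan data.

```r
# Hypothetical helper: (observed - expected) / standard error
test_statistic <- function(observed, expected, se) {
  (observed - expected) / se
}

# Made-up sample, for illustration only
x <- c(20, 25, 30, 35, 40)
se_x <- sd(x) / sqrt(length(x))   # standard error of the mean
stat <- test_statistic(mean(x), expected = 28, se = se_x)
stat
```

The same helper works for a z or a t statistic; what changes is where the standard error comes from and which distribution the result is compared against.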
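# Load Jordan's 1984-85 game scores
jordan_1984_85 <- read.csv(file = "Data Files/jordan_1984_85.csv", stringsAsFactors = FALSE)
# Copy the column explicitly instead of attach(), which can mask other objects
scores <- jordan_1984_85$scores
scores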
## [1] 16 21 37 25 17 25 33 27 45 27 16 34 35 23 30 13 22 20 20 20 21 20 27
## [24] 21 34 14 18 34 32 45 21 25 22 42 36 23 24 35 27 25 38 29 22 45 26 38
## [47] 31 41 23 49 17 26 16 26 38 28 24 21 37 37 33 26 28 21 32 16 27 32 31
## [70] 38 20 26 38 35 38 31 25 40 33 22 28 29
# Draw two random samples of 30 games (results change on every run unless set.seed() is called)
sample_scores_1 <- sample(x = scores, size = 30)
sample_scores_2 <- sample(x = scores, size = 30)
sample_mean_1 <- mean(sample_scores_1)
sample_mean_2 <- mean(sample_scores_2)
# The full season is treated as the population
population_mean <- mean(scores)
# Standard error, estimated here from each sample's own variance
SE_1 <- sqrt(var(sample_scores_1) / length(sample_scores_1))
SE_2 <- sqrt(var(sample_scores_2) / length(sample_scores_2))
# Test statistic: (observed - expected) / SE
z_1 <- (sample_mean_1 - population_mean) / SE_1
z_2 <- (sample_mean_2 - population_mean) / SE_2
jordan_z_test_1 <- data.frame(
  category = c("sample_1", "sample_2"),
  mean_sample = c(sample_mean_1, sample_mean_2),
  mean_ppl = c(population_mean, population_mean),
  Standard_error = c(SE_1, SE_2),
  Z_test = c(z_1, z_2)
)
jordan_z_test_1
##   category mean_sample mean_ppl Standard_error     Z_test
## 1 sample_1    27.70000 28.20732       1.432906 -0.3540478
## 2 sample_2    27.43333 28.20732       1.332773 -0.5807319
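To judge whether these z values are unusual, they can be converted to p-values under the standard normal distribution. The snippet below is a sketch that copies the two z values from the table above; the two-sided test and the 0.05 level are assumptions for illustration.

```r
# z statistics copied from the table above
z <- c(sample_1 = -0.3540478, sample_2 = -0.5807319)

# Two-sided p-value under the standard normal distribution
p_two_sided <- 2 * pnorm(-abs(z))
round(p_two_sided, 3)

# Does either |z| exceed the two-sided critical value at alpha = 0.05?
abs(z) > qnorm(0.975)
```

Both samples sit well inside the critical value, which is what we expect when the samples are drawn from the very population they are compared against.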
# Draw three samples of increasing size
sample_scores_3 <- sample(x = scores, size = 30)
sample_scores_4 <- sample(x = scores, size = 45)
sample_scores_5 <- sample(x = scores, size = 60)
sample_mean_3 <- mean(sample_scores_3)
sample_mean_4 <- mean(sample_scores_4)
sample_mean_5 <- mean(sample_scores_5)
population_mean <- mean(scores)
# Standard error of each sample mean
SE_3 <- sqrt(var(sample_scores_3) / length(sample_scores_3))
SE_4 <- sqrt(var(sample_scores_4) / length(sample_scores_4))
SE_5 <- sqrt(var(sample_scores_5) / length(sample_scores_5))
z_3 <- (sample_mean_3 - population_mean) / SE_3
z_4 <- (sample_mean_4 - population_mean) / SE_4
z_5 <- (sample_mean_5 - population_mean) / SE_5
jordan_z_test_2 <- data.frame(
  category = c("sample_3", "sample_4", "sample_5"),
  sample_size = c(30, 45, 60),
  mean_sample = c(sample_mean_3, sample_mean_4, sample_mean_5),
  mean_ppl = c(population_mean, population_mean, population_mean),
  Standard_error = c(SE_3, SE_4, SE_5),
  Z_test = c(z_3, z_4, z_5)
)
jordan_z_test_2
##   category sample_size mean_sample mean_ppl Standard_error     Z_test
## 1 sample_3          30    26.76667 28.20732       1.538534 -0.9363784
## 2 sample_4          45    29.75556 28.20732       1.172563  1.3203880
## 3 sample_5          60    28.36667 28.20732       1.022064  0.1559097
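The standard error in the table shrinks as the sample size grows. A quick sketch shows why: for a fixed spread, SE = sd / sqrt(n). The standard deviation of 8 used here is a made-up value for illustration, not one computed from the Jordan data.

```r
sd_assumed <- 8            # hypothetical spread, for illustration only
n <- c(30, 45, 60)         # the three sample sizes used above
se <- sd_assumed / sqrt(n) # standard error for each sample size
data.frame(sample_size = n, standard_error = round(se, 3))
```

Doubling the sample size from 30 to 60 does not halve the standard error; it only divides it by the square root of 2.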
# Single-sample t statistic: mean of sample_scores_1 against the full-season mean
single_t_value <- (mean(sample_scores_1) - mean(jordan_1984_85$scores)) / sqrt(var(sample_scores_1) / length(sample_scores_1))
single_t_value
## [1] -0.3540478
t.test(sample_scores_1, mu = 30, alternative = "less", conf.level = 0.95)
##
## One Sample t-test
##
## data: sample_scores_1
## t = -1.6051, df = 29, p-value = 0.05965
## alternative hypothesis: true mean is less than 30
## 95 percent confidence interval:
## -Inf 30.13469
## sample estimates:
## mean of x
## 27.7
From the output, we can see that Jordan's mean score in sample_scores_1 is 27.7. The one-sided 95% confidence interval tells us that the mean score is likely to be less than 30.13. The p-value of 0.05965 tells us that if Jordan's true mean score were 30, the probability of selecting a sample with a mean less than or equal to this one would be approximately 6%.
Since the p-value is not less than the significance level of 0.05, we can't reject the null hypothesis that the mean score is equal to 30. This means that the sample does not provide sufficient evidence that the mean score is below 30.
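The p-value and the confidence bound can be reproduced by hand from the numbers in the t.test() output. This is a sketch that plugs in the printed values (t = -1.6051, df = 29, sample mean 27.7) together with the standard error 1.432906 shown for sample_1 in the first table.

```r
# Values copied from the output above
t_value     <- -1.6051
df          <- 29
sample_mean <- 27.7
se          <- 1.432906   # standard error of sample_1

# One-sided (lower-tail) p-value: P(T <= t_value)
p_less <- pt(t_value, df = df)

# Upper bound of the one-sided 95% confidence interval
upper <- sample_mean + qt(0.95, df = df) * se

c(p_value = p_less, upper_bound = upper)
```

Because the alternative is "less", only the lower tail of the t distribution counts toward the p-value, and the interval is open on the left.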
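For the dependent t-test, the same formula is applied to difference scores.
Observed value is the sample mean of the difference scores (e.g. each subject's post-test score minus pre-test score).
Expected value is the population mean of the difference scores, which is 0 under the null hypothesis.
SE value is the standard error of the mean difference.
The dependent t-test is popularly known as a pre-test vs. post-test comparison on the same subjects.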
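Put differently, a dependent t-test is just a single-sample t-test on the difference scores. The sketch below checks this on made-up pre and post scores for five subjects; the numbers are hypothetical and only serve to show that both calls agree.

```r
# Made-up scores for five subjects, for illustration only
pre  <- c(10, 12, 9, 11, 8)
post <- c(13, 15, 10, 14, 12)
diff_scores <- post - pre

paired  <- t.test(post, pre, paired = TRUE)   # dependent t-test
one_smp <- t.test(diff_scores, mu = 0)        # single-sample test on the differences

# Both calls yield the same t statistic
c(paired = unname(paired$statistic), one_sample = unname(one_smp$statistic))
```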
Let’s look at the summary statistics and a boxplot below.
wm <- read.csv(file = "Data Files/wm.csv", stringsAsFactors = FALSE)
wm_t <- subset(wm, wm$train == 1)
# summary statistics
library(psych)
describe(wm_t)
## Warning: NAs introduced by coercion
## Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning
## Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
##       vars  n  mean   sd median trimmed  mad min  max range skew kurtosis   se
## cond*    1 80   NaN   NA     NA     NaN   NA Inf -Inf  -Inf   NA       NA   NA
## pre      2 80 10.03 1.37     10   10.03 1.48   8   12     4 0.10    -1.24 0.15
## post     3 80 13.51 2.54     14   13.50 2.97   7   19    12 0.00    -0.24 0.28
## gain     4 80  3.49 2.15      3    3.41 1.48  -1    9    10 0.34    -0.25 0.24
## train    5 80  1.00 0.00      1    1.00 0.00   1    1     0  NaN      NaN 0.00
# Create a boxplot with pre- and post-training groups
boxplot(wm_t$pre, wm_t$post, main = "Boxplot",
        xlab = "Pre- and Post-Training", ylab = "Intelligence Score",
        col = c("red", "green"))
In statistics, the main question to answer is “did this happen by chance or not?” Let us find out.
To do so, we compare the observed t-value to the critical value.
In our case, the null hypothesis is that the training has no effect, i.e. the mean difference score is 0.
# Define the sample size
n <- nrow(wm_t)
# Mean of the difference scores
mean_diff <- sum(wm_t$gain) / n # same as mean(wm_t$gain)
# Standard deviation of the difference scores
sd_diff <- sqrt(sum((mean_diff - wm_t$gain)^2) / (n - 1))
# Observed t-value
t_obs <- mean_diff / (sd_diff / sqrt(n))
t_obs
## [1] 14.49238
# Compute the two-sided critical value at a significance level of 0.05
t_crit <- qt(0.975, df = n - 1)
# Print the critical value
t_crit
## [1] 1.99045
# Print the observed t-value to compare
t_obs
## [1] 14.49238
# Compute Cohen's d
cohens_d <- mean_diff / sd_diff
# View Cohen's d
cohens_d
## [1] 1.620297
Now we have both values, so let’s compare them. The observed t-value is 14.49238, and the critical value is 1.99045. The observed t-value is far larger than the critical value, which tells us that the difference is significant at a significance level of 0.05.
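The comparison can be written as an explicit decision rule. This sketch plugs in the observed t-value printed above and assumes a two-sided test at alpha = 0.05 with 79 degrees of freedom.

```r
t_obs  <- 14.49238                  # observed t-value printed above
df     <- 79                        # n - 1 for 80 paired observations
alpha  <- 0.05
t_crit <- qt(1 - alpha / 2, df = df)

reject_null <- abs(t_obs) > t_crit  # two-sided decision rule
p_value     <- 2 * pt(-abs(t_obs), df = df)

reject_null
```

Comparing against the critical value and comparing the p-value against alpha always lead to the same decision; they are two views of the same test.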
Cohen’s d is 1.620297. A Cohen’s d of 1.62 means that the intelligence scores of our subjects changed by 1.62 standard deviations of the difference scores, which is a very large effect.
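To put "very large" in context, Cohen's conventional cutoffs are 0.2 (small), 0.5 (medium) and 0.8 (large). The helper below is a hypothetical sketch that labels a value of d using those cutoffs; the name `label_effect` and the "negligible" label for values under 0.2 are my own choices, not part of any package.

```r
# Hypothetical helper using Cohen's conventional cutoffs (0.2, 0.5, 0.8)
label_effect <- function(d) {
  cut(abs(d),
      breaks = c(0, 0.2, 0.5, 0.8, Inf),
      labels = c("negligible", "small", "medium", "large"),
      right = FALSE)   # intervals are closed on the left, e.g. [0.8, Inf)
}

label_effect(c(0.1, 0.3, 0.6, 1.620297))
```

These cutoffs are rules of thumb, so treat the labels as a rough guide rather than a verdict.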
# Apply the t.test function
t.test(wm_t$post, wm_t$pre, paired = TRUE)
##
## Paired t-test
##
## data: wm_t$post and wm_t$pre
## t = 14.492, df = 79, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 3.008511 3.966489
## sample estimates:
## mean of the differences
## 3.4875
# Calculate Cohen's d
# install.packages("lsr")
library(lsr)
cohensD(wm_t$post, wm_t$pre, method = "paired")
## [1] 1.620297
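As a final check, the 95% confidence interval from the paired t-test can be rebuilt from the numbers already printed. This sketch recovers the standard deviation of the differences from Cohen's d (since d = mean difference / sd) and assumes the same two-sided 95% level that t.test() used.

```r
mean_diff <- 3.4875      # mean of the differences printed above
cohens_d  <- 1.620297    # Cohen's d printed above
n         <- 80          # number of paired observations (df = 79)

sd_diff <- mean_diff / cohens_d   # because d = mean_diff / sd_diff
se_diff <- sd_diff / sqrt(n)      # standard error of the mean difference

# mean difference plus or minus critical value times SE
ci <- mean_diff + c(-1, 1) * qt(0.975, df = n - 1) * se_diff
ci
```

The interval excludes 0, which agrees with the tiny p-value: a mean gain of zero is not plausible for these data.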