In this markdown I apply t-tests across multiple random vectors to analyze differences in means and assess the statistical significance of each difference, or lack thereof.
# ?rnorm()
set.seed(454524)
x_sample <- rnorm(100,55,1)
set.seed(787722)
y_sample <- rnorm(100,49,1)
samples_df <- data.frame(x_sample,y_sample)
Data must:
1.) Be normally distributed
2.) Have similar or equal variances
3.) Be independently sampled
4.) Be randomly sampled
5.) Be continuous
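Before checking these assumptions formally, a quick base-R summary gives a first look at the means, variances, and lengths of both samples (a minimal sketch; the objects come from the chunk above).
# Quick numeric preview of both samples (base R only)
summary(samples_df)
sapply(samples_df, var)      # the sample variances should be roughly similar
sapply(samples_df, length)   # equal-length continuous numeric vectors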
Some variance tests such as the F-Test require normality, so I’ll check for normality first.
hist(x_sample, border = "cyan3", col= "azure")
Histogram of x_sample seems normal, though slightly right-skewed.
hist(y_sample, border = "darkgrey", col= "whitesmoke")
Histogram of y_sample looks normally distributed as well.
I’ll use a density plot to get a better view of the density curves for each variable.
# ?geom_density()
library(ggplot2)
ggplot(samples_df, aes(x = x_sample)) +
  geom_density(aes(x = x_sample, y = after_stat(density)),
               fill = "cyan3", alpha = 0.3) +
  geom_label(aes(x = 53.5, y = 0.4, label = "x_sample"),
             color = "grey") +
  geom_density(aes(x = y_sample, y = -after_stat(density)),
               fill = "black", alpha = 0.1) +
  geom_label(aes(x = 47, y = -0.4, label = "y_sample"),
             color = "grey") +
  xlab("Test for Normality") +
  xlim(46, 59) + ylim(-0.6, 0.6) + theme_bw()
I’m definitely having too much fun with this plot; regardless, the density curves of both samples look bell-shaped and normally distributed. Next, I’ll look at a qq-plot to confirm my assumption.
library(ggpubr)
x_qqplot <- ggqqplot(samples_df, x = "x_sample", color = "cyan3",
                     title = "QQ-Plot Test for Normality", ggtheme = theme_bw(),
                     xlab = "x_sample", ylab = FALSE)
y_qqplot <- ggqqplot(samples_df, x = "y_sample", color = "darkgrey",
                     title = "QQ-Plot Test for Normality", ggtheme = theme_bw(),
                     xlab = "y_sample", ylab = FALSE)
qqtest <- ggarrange(x_qqplot, y_qqplot + rremove("x.text"),
                    labels = c("A", "B"),
                    ncol = 2, nrow = 1)
qqtest
I’ll assume normality of both x and y samples as all of the points fall approximately along my qq-lines of normality.
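As a numeric complement to the histograms and qq-plots (not part of the original workflow, just a sketch), the Shapiro-Wilk test checks the same assumption; a p-value above 5% means we fail to reject normality.
# ?shapiro.test()
shapiro.test(x_sample)
shapiro.test(y_sample)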
# ?var.test()
var.test(x_sample, y_sample, alternative = "two.sided")
##
## F test to compare two variances
##
## data: x_sample and y_sample
## F = 1.3904, num df = 99, denom df = 99, p-value = 0.1027
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.9355143 2.0664489
## sample estimates:
## ratio of variances
## 1.390393
Null H0 states true ratio of variances is equal to 1.
Alternative H1 states true ratio of variances is not equal to 1.
Results show p-value > 5%.
Fail to reject null, true ratio of variances may be equal to 1.
There is no significant difference between the two variances. Therefore, I can assume the variances of the two samples are equal or similar.
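The F statistic above is just the ratio of the two sample variances; as a minimal sketch, it can be reproduced by hand along with its two-sided p-value.
# Reproduce the F ratio and its two-sided p-value by hand
f_ratio <- var(x_sample) / var(y_sample)
f_ratio
df1 <- length(x_sample) - 1
df2 <- length(y_sample) - 1
# two-sided p-value: twice the smaller tail of the F(df1, df2) distribution
2 * min(pf(f_ratio, df1, df2),
        pf(f_ratio, df1, df2, lower.tail = FALSE))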
Both samples were randomly generated and independently sampled, and both are continuous numerical vectors. Therefore, I can assume that all five assumptions of the t-test have been met. Now I’ll begin running t-tests.
# ?t.test()
# 1 sample, two-sided, independent (fixed value)
t.test(x_sample, mu= 10, alternative = "two.sided")
##
## One Sample t-test
##
## data: x_sample
## t = 466.33, df = 99, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 10
## 95 percent confidence interval:
## 54.85465 55.23799
## sample estimates:
## mean of x
## 55.04632
Null H0 states x mean is same as fixed value mean of 10.
Alternative H1 states x mean is different from 10.
Results show p-value < 5%.
Reject null, x mean is different from 10.
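Under the hood, the one-sample statistic is t = (sample mean − mu) / (sd / sqrt(n)); here is a minimal base-R sketch that reproduces the reported t and two-sided p-value.
# Reproduce the one-sample t statistic and two-sided p-value by hand
n <- length(x_sample)
t_stat <- (mean(x_sample) - 10) / (sd(x_sample) / sqrt(n))
t_stat
2 * pt(-abs(t_stat), df = n - 1)   # two-sided p-value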
# 1 sample, right-tailed, independent (fixed value)
t.test(x_sample, mu= 10, alternative = "greater")
##
## One Sample t-test
##
## data: x_sample
## t = 466.33, df = 99, p-value < 2.2e-16
## alternative hypothesis: true mean is greater than 10
## 95 percent confidence interval:
## 54.88593 Inf
## sample estimates:
## mean of x
## 55.04632
Null H0 states x mean ≤ fixed value mean of 10.
Alternative H1 states that x mean is > 10.
Results show p-value < 5%.
Reject null, x mean > 10.
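For the right-tailed version the statistic is the same; only the p-value changes to the upper-tail probability. A quick sketch:
# Right-tailed p-value for the same one-sample statistic
t_stat <- (mean(x_sample) - 10) / (sd(x_sample) / sqrt(length(x_sample)))
pt(t_stat, df = length(x_sample) - 1, lower.tail = FALSE)   # right-tailed p-value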
# 2 sample, two-sided, independent
t.test(x_sample, y_sample, alternative = "two.sided", var.equal = TRUE)
##
## Two Sample t-test
##
## data: x_sample and y_sample
## t = 48.001, df = 198, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 5.829915 6.329457
## sample estimates:
## mean of x mean of y
## 55.04632 48.96664
Null H0 states the difference in means is equal to zero.
Alternative H1 states difference in means is not zero.
Results show p-value < 5%.
Reject null, difference in means is not 0.
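With var.equal = TRUE, the test pools the two sample variances. A minimal sketch of the pooled calculation behind the reported statistic:
# Pooled-variance two-sample t statistic by hand
n_x <- length(x_sample); n_y <- length(y_sample)
sp2 <- ((n_x - 1) * var(x_sample) + (n_y - 1) * var(y_sample)) / (n_x + n_y - 2)
t_stat <- (mean(x_sample) - mean(y_sample)) / sqrt(sp2 * (1 / n_x + 1 / n_y))
t_stat
2 * pt(-abs(t_stat), df = n_x + n_y - 2)   # two-sided p-value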
# 2 sample, right-tailed, independent (fixed value)
t.test(x_sample, y_sample, mu = 15, alternative = "greater", var.equal = TRUE)
##
## Two Sample t-test
##
## data: x_sample and y_sample
## t = -70.429, df = 198, p-value = 1
## alternative hypothesis: true difference in means is greater than 15
## 95 percent confidence interval:
## 5.870374 Inf
## sample estimates:
## mean of x mean of y
## 55.04632 48.96664
Null H0 states difference in means is ≤ 15.
Alternative H1 states difference in means is > 15.
Results show p-value > 5%.
Fail to reject null.
Difference in means may be ≤ 15.
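The observed difference in means is only about 6, so it is no surprise that the test finds no evidence it exceeds 15. A sketch of the same pooled statistic shifted by the hypothesized difference of 15:
# Same pooled statistic, shifted by the hypothesized difference of 15
n_x <- length(x_sample); n_y <- length(y_sample)
sp2 <- ((n_x - 1) * var(x_sample) + (n_y - 1) * var(y_sample)) / (n_x + n_y - 2)
t_stat <- (mean(x_sample) - mean(y_sample) - 15) / sqrt(sp2 * (1 / n_x + 1 / n_y))
pt(t_stat, df = n_x + n_y - 2, lower.tail = FALSE)   # right-tailed p-value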
set.seed(112233)
xdep <- rnorm(100,49,1)
# 1 sample, two-sided, dependent-paired (2 samples, same population)
t.test(x_sample, xdep, alternative = "two.sided", paired = TRUE)  # var.equal has no effect when paired = TRUE
##
## Paired t-test
##
## data: x_sample and xdep
## t = 41.369, df = 99, p-value < 2.2e-16
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## 5.731589 6.309102
## sample estimates:
## mean difference
## 6.020345
Null H0 states mean difference is 0.
Alternative H1 states mean difference is not 0.
Results show p-value < 5%.
Reject null, mean difference is not 0.
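A paired t-test is equivalent to a one-sample t-test on the vector of differences; a quick sketch confirming the equivalence:
# Equivalent one-sample test on the paired differences
diffs <- x_sample - xdep
t.test(diffs, mu = 0, alternative = "two.sided")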
# 1 sample, left-tailed, dependent-paired (2 samples, same population, fixed value)
t.test(x_sample, xdep, mu = 10, alternative = "less", paired = TRUE)
##
## Paired t-test
##
## data: x_sample and xdep
## t = -27.347, df = 99, p-value < 2.2e-16
## alternative hypothesis: true mean difference is less than 10
## 95 percent confidence interval:
## -Inf 6.261976
## sample estimates:
## mean difference
## 6.020345
Null H0 states the mean difference of x_sample and xdep is ≥ 10.
Alternative H1 states mean difference is < 10.
Results show p-value < 5%.
Reject null, mean difference is < 10.
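The same left-tailed statistic can be reproduced by hand from the differences, with 10 as the hypothesized mean difference (a minimal sketch):
# Left-tailed paired t statistic by hand, hypothesized mean difference of 10
diffs <- x_sample - xdep
t_stat <- (mean(diffs) - 10) / (sd(diffs) / sqrt(length(diffs)))
t_stat
pt(t_stat, df = length(diffs) - 1)   # left-tailed p-value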
# The underlying theme of t-test results is that
# if the p-value is less than 5%, we can reject the null
# H0 in favor of the alternative H1. If the p-value is
# greater than 5%, then we cannot reject the null H0,
# and we cannot accept the alternative H1; further
# analysis is needed.
# A p-value asks: "If the null were true, what is the
# probability, on a scale of 0 to 100 percent, that I
# would see a difference in sample means at least as
# large as the one calculated right now?" The p-value
# is that percentage. A p-value of less than 5% shows
# that the sample means we're seeing would be unlikely
# if the null were true. The difference in means is
# statistically significant and unlikely to be due to
# chance, and we'd likely see similar results if we
# took another sample from the same population.
# Therefore, we can reject the null hypothesis in
# favor of the alternative hypothesis.
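# To make this decision rule concrete, here is a tiny
# hypothetical helper (not part of the original analysis)
# that applies the 5% cutoff to any htest object:
decide <- function(test_result, alpha = 0.05) {
  # test_result is an "htest" object, e.g. output of t.test() or var.test()
  if (test_result$p.value < alpha) {
    "Reject the null in favor of the alternative"
  } else {
    "Fail to reject the null; further analysis needed"
  }
}
decide(t.test(x_sample, y_sample, alternative = "two.sided", var.equal = TRUE))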