T-Tests for Statistical Significance

Using T-Tests to Assess Statistical Significance

In this markdown I run t-tests on several random vectors to analyze subtle differences in means, and the statistical significance of each difference or lack thereof.

Create random samples

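# ?rnorm()
# rnorm(n, mean, sd): 100 draws per sample, with a seed set before each draw for reproducibility
set.seed(454524)
x_sample <- rnorm(100, mean = 55, sd = 1)
set.seed(787722)
y_sample <- rnorm(100, mean = 49, sd = 1)
samples_df <- data.frame(x_sample, y_sample)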
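
Before checking assumptions, a quick numeric summary can confirm the samples look the way `rnorm()` was asked to build them. This is a minimal sketch, not part of the original analysis; `sample_summary` is a name I'm introducing here for illustration.

```r
# Recreate the samples exactly as above
set.seed(454524)
x_sample <- rnorm(100, 55, 1)
set.seed(787722)
y_sample <- rnorm(100, 49, 1)

# Summarize each sample: size, mean, and standard deviation
# (sample_summary is a hypothetical helper name, not from the original)
sample_summary <- data.frame(
  sample = c("x_sample", "y_sample"),
  n      = c(length(x_sample), length(y_sample)),
  mean   = c(mean(x_sample), mean(y_sample)),
  sd     = c(sd(x_sample), sd(y_sample))
)
sample_summary
```

The means should sit close to 55 and 49, and both standard deviations should sit close to 1, since those are the parameters passed to `rnorm()`.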

T-Test Assumptions

Data must:

1.) Be Normally Distributed

2.) Have similar or Equal Variance

3.) Be Independently Sampled

4.) Be Randomly Sampled

5.) Be Continuous

Check all 5 T-test assumptions

Some variance tests such as the F-Test require normality, so I’ll check for normality first.

Check for normality

hist(x_sample, border = "cyan3", col= "azure")

Histogram of x_sample seems normal, though slightly right-skewed.

hist(y_sample, border = "darkgrey", col= "whitesmoke")

Histogram of y_sample looks normally distributed as well.

I’ll use a density plot to get a better view of the density curves for each variable.

# ?geom_density()
library(ggplot2)

# Mirrored density plot: x_sample above the axis, y_sample below it.
# annotate() draws each label once instead of once per data row.
ggplot(samples_df, aes(x = x_sample)) +
  geom_density( aes(x = x_sample, y = after_stat(density)),
                fill = "cyan3", alpha = 0.3 ) +
  annotate( "label", x = 53.5, y = 0.4, label = "x_sample",
            color = "grey") +
  geom_density( aes(x = y_sample, y = -after_stat(density)),
                fill = "black", alpha = 0.1) +
  annotate( "label", x = 47, y = -0.4, label = "y_sample",
            color = "grey") +
  xlab("Test for Normality") +
  xlim(46, 59) + ylim(-.6, .6) + theme_bw()

I’m definitely having too much fun with this plot. Still, the density curves of both samples look bell-shaped and approximately normal. Next, I’ll look at a qq-plot to confirm the normality assumption.

Plot qq-plot to check for normality

library(ggpubr)  # provides ggqqplot(), ggarrange() and rremove()

x_qqplot <- ggqqplot(samples_df, x = "x_sample", color = "cyan3", 
         title = "QQ-Plot Test for Normality", ggtheme = theme_bw(),
         xlab = "x_sample", ylab = FALSE)

y_qqplot <- ggqqplot(samples_df, x = "y_sample", color = "darkgrey",
         title = "QQ-Plot Test for Normality", ggtheme = theme_bw(),
         xlab = "y_sample", ylab = FALSE)

# Arrange both qq-plots side by side, labeled A and B
qqtest <- ggarrange(x_qqplot, y_qqplot + rremove("x.text"), 
          labels = c("A", "B"),
          ncol = 2, nrow = 1)
qqtest

I’ll assume normality of both x and y samples as all of the points fall approximately along my qq-lines of normality.
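
The plots above are visual checks. As a numeric complement, here is a minimal sketch using the Shapiro-Wilk test via `shapiro.test()` from base R's stats package. This test is my addition and not part of the original analysis; `x_shapiro` and `y_shapiro` are illustrative names.

```r
# Recreate the samples exactly as above
set.seed(454524)
x_sample <- rnorm(100, 55, 1)
set.seed(787722)
y_sample <- rnorm(100, 49, 1)

# Shapiro-Wilk test: H0 is that the sample comes from a normal distribution
x_shapiro <- shapiro.test(x_sample)
y_shapiro <- shapiro.test(y_sample)

# A p-value above 0.05 means normality is not rejected at the 5% level
c(x_sample = x_shapiro$p.value, y_sample = y_shapiro$p.value)
```

Failing to reject here would agree with the qq-plots, though it would not prove normality. It only means the data show no strong evidence against it.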

Check for equal variance

# ?var.test()
var.test(x_sample, y_sample, alternative = "two.sided")

## 
##  F test to compare two variances
## 
## data:  x_sample and y_sample
## F = 1.3904, num df = 99, denom df = 99, p-value = 0.1027
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.9355143 2.0664489
## sample estimates:
## ratio of variances 
##           1.390393

Null H0 states true ratio of variances is equal to 1.

Alternative H1 states true ratio of variances is not equal to 1.

Results show p-value > 5%.

Fail to reject null, true ratio of variances may be equal to 1.

There is no significant difference between the two variances. Therefore, I can assume variances of the 2 samples to be equal or similar.
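
To see where the F statistic and p-value come from, the same test can be sketched by hand: the statistic is the ratio of the two sample variances, compared against an F distribution with n - 1 degrees of freedom for each sample. The variable names below are mine, chosen for illustration.

```r
# Recreate the samples exactly as above
set.seed(454524)
x_sample <- rnorm(100, 55, 1)
set.seed(787722)
y_sample <- rnorm(100, 49, 1)

# F statistic: ratio of the sample variances
f_stat <- var(x_sample) / var(y_sample)
df_x <- length(x_sample) - 1
df_y <- length(y_sample) - 1

# Two-sided p-value: double the smaller tail probability
p_lower <- pf(f_stat, df_x, df_y)
p_two_sided <- 2 * min(p_lower, 1 - p_lower)

# Compare the manual values against var.test()
builtin <- var.test(x_sample, y_sample, alternative = "two.sided")
c(manual_F = f_stat, builtin_F = unname(builtin$statistic))
c(manual_p = p_two_sided, builtin_p = builtin$p.value)
```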

These samples were randomly generated and independently sampled, and both are continuous numeric vectors. Therefore, I can assume that all 5 assumptions of the t-test have been met. Now, I’ll begin running t-tests.

Run most common types of T-tests and interpret results

# ?t.test()
# 1 sample, two-sided, independent(fixed value)
t.test(x_sample, mu= 10, alternative = "two.sided")

## 
##  One Sample t-test
## 
## data:  x_sample
## t = 466.33, df = 99, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 10
## 95 percent confidence interval:
##  54.85465 55.23799
## sample estimates:
## mean of x 
##  55.04632

Null H0 states x mean is the same as the fixed value of 10.

Alternative H1 states x mean is different from 10.

Results show p-value < 5%.

Reject null, x mean is different from 10.
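
The one-sample result can be reproduced by hand, which shows what the t statistic measures: how many standard errors the sample mean sits from the hypothesized value. This is a sketch with illustrative names (`mu0`, `std_error`, `ci_95`), not the internals of `t.test()` copied verbatim.

```r
# Recreate the sample exactly as above
set.seed(454524)
x_sample <- rnorm(100, 55, 1)

mu0 <- 10                                # hypothesized mean under H0
n <- length(x_sample)
std_error <- sd(x_sample) / sqrt(n)      # standard error of the mean

# t statistic: distance from mu0 measured in standard errors
t_stat <- (mean(x_sample) - mu0) / std_error

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_two_sided <- 2 * pt(-abs(t_stat), df = n - 1)

# 95% confidence interval for the mean
ci_95 <- mean(x_sample) + c(-1, 1) * qt(0.975, df = n - 1) * std_error

# Compare against t.test()
builtin <- t.test(x_sample, mu = mu0, alternative = "two.sided")
c(manual_t = t_stat, builtin_t = unname(builtin$statistic))
```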

# 1 sample, right-tailed, independent(fixed value)
t.test(x_sample, mu= 10, alternative = "greater")

## 
##  One Sample t-test
## 
## data:  x_sample
## t = 466.33, df = 99, p-value < 2.2e-16
## alternative hypothesis: true mean is greater than 10
## 95 percent confidence interval:
##  54.88593      Inf
## sample estimates:
## mean of x 
##  55.04632

Null H0 states x mean ≤ fixed value mean of 10.

Alternative H1 states that x mean is > 10.

Results show p-value < 5%.

Reject null, x mean > 10.
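
The right-tailed test uses the same t statistic as the two-sided test. Only the tail used for the p-value and the shape of the confidence interval change. Here is a small sketch of that relationship, with variable names of my own choosing.

```r
# Recreate the sample exactly as above
set.seed(454524)
x_sample <- rnorm(100, 55, 1)

fit_greater <- t.test(x_sample, mu = 10, alternative = "greater")
t_stat <- unname(fit_greater$statistic)
df <- unname(fit_greater$parameter)

# Right-tailed p-value: probability of a t value at least this large under H0
p_greater <- pt(t_stat, df, lower.tail = FALSE)

# One-sided 95% interval: a lower bound only, so the upper limit is Inf
std_error <- sd(x_sample) / sqrt(length(x_sample))
lower_bound <- mean(x_sample) - qt(0.95, df) * std_error

c(manual_lower = lower_bound, builtin_lower = fit_greater$conf.int[1])
```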

# 2 sample, two-sided, independent
t.test(x_sample, y_sample, alternative = "two.sided", var.equal = TRUE)

## 
##  Two Sample t-test
## 
## data:  x_sample and y_sample
## t = 48.001, df = 198, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  5.829915 6.329457
## sample estimates:
## mean of x mean of y 
##  55.04632  48.96664

Null H0 states difference in means is equal to zero.

Alternative H1 states difference in means is not zero.

Results show p-value < 5%.

Reject null, difference in means is not 0.
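
With `var.equal = TRUE`, the two-sample test pools the two sample variances into one estimate. The sketch below rebuilds the t statistic from that pooled variance. The names (`pooled_var`, `std_error`) are illustrative assumptions, not taken from the original.

```r
# Recreate the samples exactly as above
set.seed(454524)
x_sample <- rnorm(100, 55, 1)
set.seed(787722)
y_sample <- rnorm(100, 49, 1)

nx <- length(x_sample)
ny <- length(y_sample)

# Pooled variance: the two sample variances weighted by their degrees of freedom
pooled_var <- ((nx - 1) * var(x_sample) + (ny - 1) * var(y_sample)) / (nx + ny - 2)

# Standard error of the difference in means
std_error <- sqrt(pooled_var * (1 / nx + 1 / ny))

# t statistic for H0: difference in means = 0
t_stat <- (mean(x_sample) - mean(y_sample)) / std_error
df <- nx + ny - 2

# Compare against t.test()
builtin <- t.test(x_sample, y_sample, alternative = "two.sided", var.equal = TRUE)
c(manual_t = t_stat, builtin_t = unname(builtin$statistic), df = df)
```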

# 2 sample, right-tailed, independent(fixed value)
t.test(x_sample, y_sample, mu = 15, alternative = "greater", var.equal = TRUE)

## 
##  Two Sample t-test
## 
## data:  x_sample and y_sample
## t = -70.429, df = 198, p-value = 1
## alternative hypothesis: true difference in means is greater than 15
## 95 percent confidence interval:
##  5.870374      Inf
## sample estimates:
## mean of x mean of y 
##  55.04632  48.96664

Null H0 states difference in means is ≤ 15.

Alternative H1 states difference in means is > 15.

Results show p-value > 5%.

Fail to reject null.

Difference in means may be ≤ 15.
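
The large negative t value here follows from the arithmetic: the hypothesized difference of 15 is subtracted from the observed difference before dividing by the standard error. A minimal sketch, again with names I'm introducing for illustration:

```r
# Recreate the samples exactly as above
set.seed(454524)
x_sample <- rnorm(100, 55, 1)
set.seed(787722)
y_sample <- rnorm(100, 49, 1)

mu0 <- 15                                     # hypothesized difference under H0
nx <- length(x_sample)
ny <- length(y_sample)
observed_diff <- mean(x_sample) - mean(y_sample)

# Pooled standard error, as in the equal-variance two-sample test
pooled_var <- ((nx - 1) * var(x_sample) + (ny - 1) * var(y_sample)) / (nx + ny - 2)
std_error <- sqrt(pooled_var * (1 / nx + 1 / ny))

# The observed difference is far below 15, so t is strongly negative
t_stat <- (observed_diff - mu0) / std_error

# Right-tailed p-value: nearly all of the t distribution lies above t_stat
p_greater <- pt(t_stat, df = nx + ny - 2, lower.tail = FALSE)

builtin <- t.test(x_sample, y_sample, mu = mu0, alternative = "greater", var.equal = TRUE)
c(manual_t = t_stat, builtin_t = unname(builtin$statistic), p = p_greater)
```

A p-value near 1 in a right-tailed test means the data point in the opposite direction from the alternative.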

Create dummy dependent x-sample for Paired T-Test

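set.seed(112233)
# Note: xdep is drawn independently of x_sample, so it only stands in
# for a second measurement on the same units in a real paired design
xdep <- rnorm(100, mean = 49, sd = 1)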

Run Paired T-Test

# 1 sample, two-sided, dependent-paired(2 samples, same population)
# var.equal has no effect when paired = TRUE, so it is omitted
t.test(x_sample, xdep, alternative = "two.sided", paired = TRUE)

## 
##  Paired t-test
## 
## data:  x_sample and xdep
## t = 41.369, df = 99, p-value < 2.2e-16
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  5.731589 6.309102
## sample estimates:
## mean difference 
##        6.020345

Null H0 states mean difference is 0.

Alternative H1 states mean difference is not 0.

Results show p-value < 5%.

Reject null, mean difference is not 0.
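
The comment above calls the paired test a "1 sample" test because it works on the per-pair differences. The sketch below checks that a paired t-test gives the same result as a one-sample t-test on those differences. `pair_diffs` is an illustrative name of mine.

```r
# Recreate the samples exactly as above
set.seed(454524)
x_sample <- rnorm(100, 55, 1)
set.seed(112233)
xdep <- rnorm(100, 49, 1)

# Per-pair differences
pair_diffs <- x_sample - xdep

# Paired test on the two vectors vs. one-sample test on the differences
paired_fit <- t.test(x_sample, xdep, alternative = "two.sided", paired = TRUE)
diff_fit   <- t.test(pair_diffs, mu = 0, alternative = "two.sided")

c(paired_t = unname(paired_fit$statistic), diff_t = unname(diff_fit$statistic))
```

Because only the differences enter the calculation, the equal-variance assumption between the two vectors plays no role in a paired test.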

# 1 sample, left-tailed, dependent-paired(2 samples, same population, fixed value)
# var.equal has no effect when paired = TRUE, so it is omitted
t.test(x_sample, xdep, mu = 10, alternative = "less", paired = TRUE)

## 
##  Paired t-test
## 
## data:  x_sample and xdep
## t = -27.347, df = 99, p-value < 2.2e-16
## alternative hypothesis: true mean difference is less than 10
## 95 percent confidence interval:
##      -Inf 6.261976
## sample estimates:
## mean difference 
##        6.020345

Null H0 states mean difference between x_sample and xdep is ≥ 10.

Alternative H1 states mean difference is < 10.

Results show p-value < 5%.

Reject null, mean difference is < 10.

# The underlying theme of t-test results is that
# if the p-value is less than 5%, we reject the null H0
# in favor of the alternative H1. If the p-value is
# greater than 5%, we fail to reject the null H0,
# which is not the same as proving it true. Further
# analysis is needed.

# A p-value asks: "If the null were true, what is the
# probability, on a scale of 0 to 100 percent, that I
# would see a difference in sample means at least as
# large as the one I'm calculating right now?" The
# p-value is that percentage. A p-value of less than 5%
# shows that the sample means we're seeing would be
# unlikely if the null were true. The difference is
# statistically significant, meaning it is hard to
# explain by sampling chance alone. Note that the
# p-value is not the probability that the null is true,
# nor a guarantee that another sample would give the
# same result. With a p-value below 5%, we reject the
# null hypothesis in favor of the alternative hypothesis.
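
One way to make the 5% threshold concrete is a small simulation in which the null is true by construction. This is a sketch under my own assumptions: both groups are drawn from the same normal distribution, and the seed, the 2000 repetitions, and the mean of 50 are arbitrary choices, not values from the analysis above.

```r
set.seed(2024)   # arbitrary seed, used only so the sketch is reproducible
n_sims <- 2000   # arbitrary number of repeated experiments

# Each repetition draws two samples from the SAME population (H0 is true)
# and records the two-sample t-test p-value
p_values <- replicate(n_sims, {
  a <- rnorm(100, mean = 50, sd = 1)
  b <- rnorm(100, mean = 50, sd = 1)
  t.test(a, b, alternative = "two.sided", var.equal = TRUE)$p.value
})

# Share of experiments that reject H0 at the 5% level even though H0 is true
false_positive_rate <- mean(p_values < 0.05)
false_positive_rate
```

The rejection rate should land near 5%. That is what the threshold means: when the null is true, roughly 1 in 20 experiments will still produce a p-value below 0.05 by chance.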