2024-03-15

T-Testing

What is T-Testing

A t-test is a hypothesis test for evaluating the means of one or two populations. It can determine whether a sample mean differs from a known value, whether the means of two independent samples differ from each other, or whether there is a significant difference between paired measurements. A t-test can compare at most two groups.

Assumptions of t-Tests

When we’re doing a t-Test, we must assume the following:

- The data are continuous.
- The sample data are randomly selected.
- The variability is similar in each group.
- The data are approximately normally distributed.

Types of t-Tests

There are three types of t-Tests, depending on the number of samples and how they are related:

- One-Sample t-Test: Used to compare one sample against a known value.
- Two-Sample t-Test: Used with two independent samples.
- Paired t-Test: Used with two dependent samples.

Null and Alternative Hypotheses

When doing t-Tests, you must define your Null and Alternative Hypotheses. In general, the Null Hypothesis states that there is no difference or no effect.

If the test finds a significant difference, the Null Hypothesis is rejected in favor of the Alternative Hypothesis (there is a significant effect).

If not, we fail to reject the Null Hypothesis: the data do not provide enough evidence of an effect, which is not the same as proving that there is none.

One-Sample t-Test

Problem

Let’s say that we have a random sample of 30 energy bars whose label claims 20 grams of protein per bar. The measured protein content of each bar, in grams, is:

##       [,1]  [,2]  [,3]  [,4]  [,5]
## [1,] 20.70 27.46 22.15 19.85 21.29
## [2,] 24.75 20.75 22.91 25.34 20.33
## [3,] 21.54 21.08 22.14 19.56 21.10
## [4,] 18.04 24.12 19.95 19.72 18.28
## [5,] 16.26 17.46 20.53 22.12 25.06
## [6,] 22.44 19.08 19.88 21.39 22.23

As you can see, the protein content varies from bar to bar, which might lead people to believe that the bars do not actually contain 20 grams on average.

Expressing the Null Hypothesis

We can express the Null Hypothesis with the following equation: \(H_0 : \mu = 20\). The Null Hypothesis states that the mean protein amount is 20 grams.

The Alternative Hypothesis can be expressed as \(H_a : \mu \neq 20\). It states that the mean protein amount is not 20 grams.

Deeper analysis of the data

##       Mean Standard_Deviation Standard_Error Sample_Size
## 1 21.25033           2.447252      0.4468051          30
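The summary above could be produced along the following lines. This is a minimal sketch, not necessarily the original code: the vector name `energyBars` is taken from the t-test output in this document, while `summarizeSample` and `barStats` are hypothetical names introduced here for illustration.

```r
# Protein content (grams) of the 30 bars, copied row by row from the table above.
# The order of the values does not affect the summary statistics.
energyBars <- c(20.70, 27.46, 22.15, 19.85, 21.29,
                24.75, 20.75, 22.91, 25.34, 20.33,
                21.54, 21.08, 22.14, 19.56, 21.10,
                18.04, 24.12, 19.95, 19.72, 18.28,
                16.26, 17.46, 20.53, 22.12, 25.06,
                22.44, 19.08, 19.88, 21.39, 22.23)

# Hypothetical helper: one-row summary of a numeric sample
summarizeSample <- function(x) {
  n <- length(x)
  data.frame(Mean = mean(x),
             Standard_Deviation = sd(x),            # sample SD (n - 1 denominator)
             Standard_Error = sd(x) / sqrt(n),      # SD of the sample mean
             Sample_Size = n)
}

barStats <- summarizeSample(energyBars)
print(barStats)
```

The standard error is the quantity that later appears in the denominator of the test statistic, so it is worth computing alongside the mean and standard deviation.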

t-Test

After we analyze the data, we can perform the t-Test as shown here.

## [1] "Critical Value"
## [1] 2.04523
## 
##  One Sample t-test
## 
## data:  energyBars
## t = 2.7984, df = 29, p-value = 0.009034
## alternative hypothesis: true mean is not equal to 20
## 95 percent confidence interval:
##  20.33651 22.16415
## sample estimates:
## mean of x 
##  21.25033

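Since the p-value (0.009034) is less than 0.05, we reject the Null Hypothesis. Equivalently, the test statistic (2.7984) exceeds the critical value (2.04523), and the 95 percent confidence interval (20.33651 to 22.16415) does not contain 20. The data suggest that the mean protein amount is not 20 grams; the sample mean is about 21.25 grams.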
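A call along the following lines should reproduce the critical value and the test output shown above. It is a sketch that assumes the data are stored in a numeric vector named `energyBars` (the name shown in the output) and that a 5% significance level is used; `alpha`, `criticalValue`, and `result` are illustrative names.

```r
# Protein content (grams) of the 30 bars, copied from the table above
energyBars <- c(20.70, 27.46, 22.15, 19.85, 21.29,
                24.75, 20.75, 22.91, 25.34, 20.33,
                21.54, 21.08, 22.14, 19.56, 21.10,
                18.04, 24.12, 19.95, 19.72, 18.28,
                16.26, 17.46, 20.53, 22.12, 25.06,
                22.44, 19.08, 19.88, 21.39, 22.23)

alpha <- 0.05  # assumed significance level

# Two-sided critical value for n - 1 degrees of freedom
criticalValue <- qt(1 - alpha / 2, df = length(energyBars) - 1)
print("Critical Value")
print(criticalValue)

# One-sample t-test of H0: mu = 20 against Ha: mu != 20
result <- t.test(energyBars, mu = 20)
print(result)

# Two equivalent ways to state the decision rule: reject H0 when either is TRUE
print(abs(result$statistic) > criticalValue)
print(result$p.value < alpha)
```

For a two-sided test, comparing the absolute test statistic with the critical value and comparing the p-value with `alpha` always lead to the same decision.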

t-Distribution chart

We can better understand this result with a t-Distribution chart for df = 29, with the critical values (±2.04523) marked alongside the observed test statistic (t = 2.7984).
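A chart of this kind can be sketched with base R graphics. The degrees of freedom and the test statistic below are copied from the t-test output above; the plotting choices (axis range, colors, line types) and the names `degFree`, `tStat`, and `criticalValue` are assumptions made for illustration.

```r
degFree <- 29        # degrees of freedom from the output above
tStat   <- 2.7984    # observed test statistic from the output above

# Two-sided critical value at the 5% level
criticalValue <- qt(0.975, df = degFree)

# Density of the t-distribution with 29 degrees of freedom
curve(dt(x, df = degFree), from = -4, to = 4,
      xlab = "t", ylab = "Density",
      main = "t-Distribution, df = 29")

# Dashed lines mark the rejection boundaries; the solid line is the observed t
abline(v = c(-criticalValue, criticalValue), lty = 2, col = "red")
abline(v = tStat, lty = 1, col = "blue")
legend("topleft",
       legend = c("Critical values", "Observed t"),
       lty = c(2, 1), col = c("red", "blue"))
```

An observed statistic that falls outside the two dashed lines lies in the rejection region.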

Doing the Test By Hand

Here is a deeper explanation of how this t-test is performed.

First, we calculate the sample mean \(\overline{x}\) and subtract the hypothesized population mean \(\mu\):

\(\overline{x} - \mu\)

Next, we compute the standard error, where \(s\) is the sample standard deviation and \(n\) is the sample size: \(\frac{s}{\sqrt{n}}\)

Finally, the test statistic is the ratio of the two: \(t = \frac{\overline{x}-\mu}{s/\sqrt{n}}\)
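The steps above can be checked numerically. The sketch below plugs in the summary statistics reported earlier in this document (mean 21.25033, standard deviation 2.447252, sample size 30); the variable names are illustrative, and the p-value step uses `pt()`, which goes slightly beyond the three steps listed.

```r
xBar <- 21.25033   # sample mean from the summary table
mu0  <- 20         # hypothesized population mean
s    <- 2.447252   # sample standard deviation
n    <- 30         # sample size

# Step 1: difference between the sample mean and the hypothesized mean
difference <- xBar - mu0

# Step 2: standard error of the mean
standardError <- s / sqrt(n)

# Step 3: test statistic
tStat <- difference / standardError

# Two-sided p-value from the t-distribution with n - 1 degrees of freedom
pValue <- 2 * pt(-abs(tStat), df = n - 1)

print(c(t = tStat, p = pValue))
```

Up to rounding of the inputs, this matches the `t` and `p-value` reported by the One Sample t-test output.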

Two-Sample t-Test

Example

Let’s do another t-Test, this time with two samples. In this case, the samples are the body fat percentages of men and women. The question we’re answering is whether the two groups have the same mean body fat percentage.

## Men
##  [1] 13.3  6.0 20.0  8.0 14.0 19.0 18.0 25.0 16.0 24.0 15.0  1.0 15.0
## Women
##      [,1] [,2] [,3] [,4] [,5]
## [1,]   22   16 21.7   21   30
## [2,]   26   12 23.2   28   23

Barplots

[Figure: barplots of the body fat percentages for men and women]

Deeper Analysis

Here are the summary statistics for the two samples (men first, then women):

##       Mean Standard_Deviation Standard_Error Sample_Size
## 1 14.94615           6.842589       1.897793          13
##    Mean Standard_Deviation Standard_Error Sample_Size
## 1 22.29            5.31966       1.682224          10

Two-Sample t-Test

## 
##  Welch Two Sample t-test
## 
## data:  menFat and womenFat
## t = -2.8958, df = 20.989, p-value = 0.00865
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -12.618000  -2.069692
## sample estimates:
## mean of x mean of y 
##  14.94615  22.29000

In this case, since the p-value (0.00865) is less than 0.05, we reject the Null Hypothesis. The mean body fat percentages of the two groups are significantly different.
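The Welch output above can be reproduced with a sketch like the following, assuming the samples are stored in vectors named `menFat` and `womenFat` as the output suggests. In R, `t.test()` uses `var.equal = FALSE` by default, which is why the output is labeled "Welch Two Sample t-test"; this variant does not require equal variances. The by-hand part and the names `vMen`, `vWomen`, `tStat`, `dfWelch`, and `criticalValue` are added here for illustration.

```r
# Body fat percentages copied from the listings above
menFat   <- c(13.3, 6, 20, 8, 14, 19, 18, 25, 16, 24, 15, 1, 15)
womenFat <- c(22, 16, 21.7, 21, 30, 26, 12, 23.2, 28, 23)

# Two-sample t-test; the default var.equal = FALSE gives Welch's test
welch <- t.test(menFat, womenFat)
print(welch)

# The same statistic by hand: squared standard error of each sample mean
nMen   <- length(menFat)
nWomen <- length(womenFat)
vMen   <- var(menFat) / nMen
vWomen <- var(womenFat) / nWomen

tStat <- (mean(menFat) - mean(womenFat)) / sqrt(vMen + vWomen)

# Welch-Satterthwaite approximation of the degrees of freedom
dfWelch <- (vMen + vWomen)^2 /
  (vMen^2 / (nMen - 1) + vWomen^2 / (nWomen - 1))

# Two-sided critical value at the 5% level for these degrees of freedom
criticalValue <- qt(0.975, df = dfWelch)

print(c(t = tStat, df = dfWelch, critical = criticalValue))
```

The non-integer degrees of freedom in the output come from the Welch-Satterthwaite approximation rather than from a simple count of observations.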

Distribution Chart

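For a two-sided test at the 5% level with df = 20.989, the critical values are approximately ±2.08. The observed t = -2.8958 lies beyond the lower critical value.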

Paired t-Test

Example

Finally, here is an example of a paired t-Test. Let’s say there are two exams, and each student reports their score on both. The professor wants to find out whether the two exams are equally difficult.

## Test 1
##      [,1] [,2] [,3] [,4]
## [1,]   63   65   56  100
## [2,]   88   83   77   92
## [3,]   90   84   68   74
## [4,]   87   64   71   88

## Test 2
##      [,1] [,2] [,3] [,4]
## [1,]   69   65   62   91
## [2,]   78   87   79   88
## [3,]   85   92   69   81
## [4,]   84   75   84   82

Analysis

Summary statistics for Test 1, then Test 2:

##     Mean Standard_Deviation Standard_Error Sample_Size
## 1 78.125           12.66425       3.166064          16
##      Mean Standard_Deviation Standard_Error Sample_Size
## 1 79.4375           9.150364       2.287591          16

t-Test

## 
##  Paired t-test
## 
## data:  test1 and test2
## t = -0.74978, df = 15, p-value = 0.465
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -5.043647  2.418647
## sample estimates:
## mean difference 
##         -1.3125

In this case, the p-value (0.465) is greater than the 0.05 significance level. Thus, we fail to reject the Null Hypothesis: there is no significant difference in difficulty between the two tests.
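The paired output above can be reproduced with a sketch like the following, assuming the scores are stored in vectors named `test1` and `test2` (the names shown in the output) and that the values at the same position belong to the same student. The second call, on the per-student differences, is added here to illustrate that a paired t-test is a one-sample t-test on the differences; `paired` and `onDifferences` are illustrative names.

```r
# Scores copied row by row from the two tables above.
# Any order works as long as both vectors use the same one,
# so that each position refers to the same student.
test1 <- c(63, 65, 56, 100,
           88, 83, 77, 92,
           90, 84, 68, 74,
           87, 64, 71, 88)
test2 <- c(69, 65, 62, 91,
           78, 87, 79, 88,
           85, 92, 69, 81,
           84, 75, 84, 82)

# Paired t-test of H0: mean difference = 0
paired <- t.test(test1, test2, paired = TRUE)
print(paired)

# Equivalent one-sample t-test on the per-student differences
onDifferences <- t.test(test1 - test2, mu = 0)
print(onDifferences$statistic)
```

Because the test works on differences within each pair, the degrees of freedom are the number of pairs minus one (15), not the total number of scores.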

t-Distribution

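For a two-sided test at the 5% level with df = 15, the critical values are approximately ±2.131. The observed t = -0.74978 lies between them.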