2024-03-15
A t-Test is a hypothesis test for evaluating the means of one or two populations. It checks whether a sample mean differs from a known value, whether the means of two independent samples differ from each other, or whether there is a significant difference between paired measurements. A t-Test compares at most two groups.
When we’re doing a t-Test, we must assume the following:
- The data is continuous.
- The sample data is randomly selected.
- The variability is similar in each group being compared.
- The data is approximately normally distributed.

There are three types of t-Tests, and the right one depends on how many samples you have and how they are related:
- One-Sample t-Test: used for one sample compared against a known value
- Two-Sample t-Test: used with two independent samples
- Paired t-Test: used with two dependent samples
When doing t-Tests, you must define your Null and Alternative Hypotheses. In general, the Null Hypothesis states that there is no difference or no significant effect.
If the sample shows a significant difference, the Null Hypothesis is rejected in favor of the Alternative Hypothesis (there is a significant effect).
If not, we fail to reject the Null Hypothesis: the data does not provide enough evidence of a significant effect.
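The decision rule above can be sketched as a tiny R helper. The function name `decide`, the default significance level of 0.05, and the two p-values passed to it are illustrative assumptions, not part of the original analysis.

```r
# Hypothetical helper: turn a p-value into the decision described above.
# alpha is the significance level; 0.05 is assumed here as a common default.
decide <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) "reject H0" else "fail to reject H0"
}

decide(0.01)  # small p-value: evidence against the Null Hypothesis
decide(0.40)  # large p-value: not enough evidence to reject it
```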
Let’s say that we have a random sample of 30 energy bars, each of which claims to have 20 grams of protein.
##       [,1]  [,2]  [,3]  [,4]  [,5]
## [1,] 20.70 27.46 22.15 19.85 21.29
## [2,] 24.75 20.75 22.91 25.34 20.33
## [3,] 21.54 21.08 22.14 19.56 21.10
## [4,] 18.04 24.12 19.95 19.72 18.28
## [5,] 16.26 17.46 20.53 22.12 25.06
## [6,] 22.44 19.08 19.88 21.39 22.23
As you can see here, the measured protein varies from bar to bar, which might lead people to suspect that the average is not actually 20 grams per bar.
We can express the Null Hypothesis with the following equation: \(H_0 : \mu = 20\). The Null Hypothesis states that the mean protein amount is 20 grams.
The Alternative Hypothesis can be expressed as \(H_a : \mu \neq 20\). It states that the mean protein amount is not 20 grams.
##       Mean Standard_Deviation Standard_Error Sample_Size
## 1 21.25033           2.447252      0.4468051          30
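A summary like the one above could be produced along these lines. The variable name `energyBars` comes from the t-test output in this article; the helper `summarize_sample` is a hypothetical name, since the original code is not shown.

```r
# The 30 protein measurements from the printed matrix
# (the order does not affect the summary statistics).
energyBars <- c(20.70, 27.46, 22.15, 19.85, 21.29,
                24.75, 20.75, 22.91, 25.34, 20.33,
                21.54, 21.08, 22.14, 19.56, 21.10,
                18.04, 24.12, 19.95, 19.72, 18.28,
                16.26, 17.46, 20.53, 22.12, 25.06,
                22.44, 19.08, 19.88, 21.39, 22.23)

# Hypothetical helper; the column names mirror the summary table above.
summarize_sample <- function(x) {
  n <- length(x)
  data.frame(Mean = mean(x),
             Standard_Deviation = sd(x),
             Standard_Error = sd(x) / sqrt(n),  # s / sqrt(n)
             Sample_Size = n)
}

summarize_sample(energyBars)
```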
After we analyze the data, we can perform the t-Test as shown here.
## [1] "Critical Value"
## [1] 2.04523
##
##  One Sample t-test
##
## data:  energyBars
## t = 2.7984, df = 29, p-value = 0.009034
## alternative hypothesis: true mean is not equal to 20
## 95 percent confidence interval:
##  20.33651 22.16415
## sample estimates:
## mean of x
##  21.25033
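The critical value and test output above can be reproduced with base R's `qt` and `t.test`. This is a sketch that assumes a two-sided test at the 0.05 significance level, which matches the 95 percent confidence interval in the output.

```r
# The 30 protein measurements (grams) from the printed matrix.
energyBars <- c(20.70, 27.46, 22.15, 19.85, 21.29,
                24.75, 20.75, 22.91, 25.34, 20.33,
                21.54, 21.08, 22.14, 19.56, 21.10,
                18.04, 24.12, 19.95, 19.72, 18.28,
                16.26, 17.46, 20.53, 22.12, 25.06,
                22.44, 19.08, 19.88, 21.39, 22.23)

# Two-sided critical value at alpha = 0.05 with n - 1 degrees of freedom.
critical_value <- qt(0.975, df = length(energyBars) - 1)

# One-sample t-test against the claimed mean of 20 grams.
result <- t.test(energyBars, mu = 20)
result

# TRUE when the test statistic lies beyond the critical value.
abs(unname(result$statistic)) > critical_value
```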
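Since the p-value (0.009034) is less than 0.05, and the test statistic (2.7984) exceeds the critical value (2.04523), we reject the Null Hypothesis. The claimed value of 20 also falls outside the 95 percent confidence interval (20.33651 to 22.16415). This means the sample gives significant evidence that the mean amount is not 20 grams of protein.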
We can better understand this using a t-Distribution chart built from the t-test results, with the critical value marked on it.
Here is a deeper explanation of how this t-test is performed.
First, we calculate the sample average \(\overline{x}\) and subtract the population mean \(\mu\):
\(\overline{x} - \mu\)
Next, we calculate the standard error, where \(s\) is the sample standard deviation and \(n\) is the sample size: \(\frac{s}{\sqrt{n}}\)
The final test statistic is the ratio of the two: \(t = \frac{\overline{x}-\mu}{s/\sqrt{n}}\)
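As a rough check, the three steps can be worked through in R using the summary statistics reported for the energy bars. The variable names are illustrative, and the two-sided p-value from `pt` is an extra step added for comparison with the `t.test` output.

```r
# Summary statistics taken from the energy bar example.
x_bar <- 21.25033  # sample mean
mu0   <- 20        # mean claimed under the Null Hypothesis
s     <- 2.447252  # sample standard deviation
n     <- 30        # sample size

difference     <- x_bar - mu0               # step 1: x_bar - mu
standard_error <- s / sqrt(n)               # step 2: s / sqrt(n)
t_stat         <- difference / standard_error  # step 3: test statistic

# Two-sided p-value from the t-distribution with n - 1 degrees of freedom.
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

c(t = t_stat, se = standard_error, p = p_value)
```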
Let’s do another t-Test, but this time with two samples. In this case, the samples are the body fat percentages of men and women. The question we’re answering is whether the two groups have the same mean body fat percentage.
## Men
## [1] 13.3 6.0 20.0 8.0 14.0 19.0 18.0 25.0 16.0 24.0 15.0 1.0 15.0
## Women
##      [,1] [,2] [,3] [,4] [,5]
## [1,]   22   16 21.7   21   30
## [2,]   26   12 23.2   28   23
Here is a deeper look at the two groups through their summary statistics, men first and then women.
##       Mean Standard_Deviation Standard_Error Sample_Size
## 1 14.94615           6.842589       1.897793          13
##    Mean Standard_Deviation Standard_Error Sample_Size
## 1 22.29            5.31966       1.682224          10
##
##  Welch Two Sample t-test
##
## data:  menFat and womenFat
## t = -2.8958, df = 20.989, p-value = 0.00865
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -12.618000  -2.069692
## sample estimates:
## mean of x mean of y
##  14.94615  22.29000
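The Welch output above is what base R's `t.test` returns by default for two samples, since `var.equal` defaults to `FALSE`. The sketch below also recomputes the statistic by hand; `se_diff`, `t_manual`, and `critical_value` are illustrative names, and the 0.05 significance level is an assumption.

```r
# Body fat percentages from the printed output.
menFat   <- c(13.3, 6.0, 20.0, 8.0, 14.0, 19.0, 18.0, 25.0, 16.0, 24.0, 15.0, 1.0, 15.0)
womenFat <- c(22, 16, 21.7, 21, 30, 26, 12, 23.2, 28, 23)

# Two independent samples; the default var.equal = FALSE gives Welch's test.
welch <- t.test(menFat, womenFat)
welch

# Manual check: difference in means over the combined standard error.
se_diff  <- sqrt(var(menFat) / length(menFat) + var(womenFat) / length(womenFat))
t_manual <- (mean(menFat) - mean(womenFat)) / se_diff

# Two-sided critical value at alpha = 0.05 using Welch's degrees of freedom.
critical_value <- qt(0.975, df = unname(welch$parameter))
abs(t_manual) > critical_value
```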
In this case, since the p-value (0.00865) is less than 0.05, we reject the null hypothesis. The mean body fat percentages of the two groups are significantly different.
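For a two-sided test at the 0.05 level with df = 20.989, the critical values are approximately ±2.08, and the test statistic of -2.8958 lies beyond them.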
Finally, here is an example of a paired t-Test. Let’s say there are two exams, and each student submits their scores on both. The professor wants to find out whether the two exams are equally difficult.
## Test 1
##      [,1] [,2] [,3] [,4]
## [1,]   63   65   56  100
## [2,]   88   83   77   92
## [3,]   90   84   68   74
## [4,]   87   64   71   88
## Test 2
##      [,1] [,2] [,3] [,4]
## [1,]   69   65   62   91
## [2,]   78   87   79   88
## [3,]   85   92   69   81
## [4,]   84   75   84   82
##     Mean Standard_Deviation Standard_Error Sample_Size
## 1 78.125           12.66425       3.166064          16
##      Mean Standard_Deviation Standard_Error Sample_Size
## 1 79.4375           9.150364       2.287591          16
##
##  Paired t-test
##
## data:  test1 and test2
## t = -0.74978, df = 15, p-value = 0.465
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -5.043647  2.418647
## sample estimates:
## mean difference
##        -1.3125
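The paired output above can be reproduced with `t.test(..., paired = TRUE)`. This sketch assumes that scores in the same position of the two printed matrices belong to the same student; both vectors below are read in the same order so the pairs stay aligned. It also shows that a paired t-test is equivalent to a one-sample t-test on the per-student differences.

```r
# Exam scores; position i in both vectors is assumed to be the same student.
test1 <- c(63, 65, 56, 100, 88, 83, 77, 92, 90, 84, 68, 74, 87, 64, 71, 88)
test2 <- c(69, 65, 62, 91, 78, 87, 79, 88, 85, 92, 69, 81, 84, 75, 84, 82)

# Paired t-test on the two sets of scores.
paired <- t.test(test1, test2, paired = TRUE)
paired

# Equivalent view: one-sample t-test on the differences against 0.
on_differences <- t.test(test1 - test2, mu = 0)

# Two-sided critical value at alpha = 0.05 with n - 1 = 15 degrees of freedom.
critical_value <- qt(0.975, df = length(test1) - 1)
abs(unname(paired$statistic)) > critical_value
```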
In this case, the p-value (0.465) is greater than the 0.05 significance level. Thus, we fail to reject the Null Hypothesis. There is no significant evidence of a difference in difficulty between the two tests.
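For a two-sided test at the 0.05 level with df = 15, the critical values are approximately ±2.131, and the test statistic of -0.74978 falls between them.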