In a two-sample t-test, we compare the means of two independent groups to determine whether there is a significant difference between them. We start by assuming normality and constant variance within the populations. We will also explore the case where we don’t have constant variance; in that case, a Variance Stabilizing Transformation (VST) is needed. The test requires exactly two independent populations; with more than two groups, ANOVA should be used instead.
Summary:
Use a t-test if we have constant variance and normality
If we don’t have constant variance, we use a VST (e.g., Box-Cox)
If normality is not met, use a non-parametric test (Mann-Whitney U-test)
We mentioned previously the assumptions of constant variance and normality; we discuss the implications of each below.
Constant variance: This is a strong assumption. The variances of the two populations should be roughly equal. We can check this using boxplots and residual plots (see the sketch after this list). If the assumption fails, a VST is required before proceeding.
Normality: For the standard t-test to be accurate, the data should be approximately normal; markedly non-normal data are better handled with non-parametric tests. We check normality with QQ plots and histograms.
Independence: The two samples must come from separate sources; observations in one group must not influence the other.
Randomness: The data must be collected at random so that the results can be generalized to the broader population.
Sample size: A sample size of about 30 per group is typically desired; with large samples, the Central Limit Theorem (CLT) makes the test robust to mild departures from normality.
Note: If the sample size is too small, the t-test may not be an appropriate statistical tool.
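As a quick numeric check of the equal-variance assumption, R’s var.test performs an F test of equal variances. This is a sketch with placeholder data; var.test is not part of the original write-up, just an additional check:

set.seed(1)
data1 <- rnorm(30, mean = 50, sd = 5)   # placeholder group 1
data2 <- rnorm(30, mean = 52, sd = 9)   # placeholder group 2 with larger spread
var.test(data1, data2)                  # small p-value suggests unequal variances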
Alternatives:
Mann-Whitney U-test: Also known as the Wilcoxon rank-sum test. This is a non-parametric statistical test that can be used as an alternative when the assumptions of constant variance and normality are not met.
Note: In the case of more than two populations, use the Kruskal-Wallis test instead.
VST: We will explore in depth the effects of using a transformation in an attempt to stabilize the variance. There are several possible transformations (e.g., log, square root, arcsine). In this section, we will use the Box-Cox transformation, a power transformation whose exponent λ is chosen as the best fit for a given data set.
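As a rough sketch of how these alternatives look in R (placeholder data; wilcox.test is in base R and boxcox is in the MASS package):

library(MASS)
set.seed(1)
data1 <- rexp(30, rate = 1/50)   # placeholder group with positive, skewed values
data2 <- rexp(30, rate = 1/70)
wilcox.test(data1, data2)        # Mann-Whitney / Wilcoxon rank-sum test
y <- c(data1, data2)
g <- factor(rep(c("A", "B"), each = 30))
boxcox(lm(y ~ g))                # profile log-likelihood; the peak suggests lambda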
Before we begin the experiment and hypothesis testing, it’s important to know how many samples are needed prior to collecting data. Sample size directly affects the power of a study, that is, the probability of correctly rejecting a false null hypothesis, and helps control Type I and Type II errors. It’s also important to know whether the amount of data needed for the experiment is realistic to obtain. Finally, budget constraints may limit the amount of data that can be collected.
Note: Type I Error (α): This occurs when the null hypothesis is incorrectly rejected. [Conclude there is a difference when there is none]
Note: Type II Error (β): This occurs when the null hypothesis is incorrectly not rejected. [Conclude there is no difference when there is]
In R, we use the pwr package to determine the sample size. As an example, the following parameters lead to a required sample size of 26 per group.
d: effect size (Cohen’s d, the difference between the means divided by the pooled standard deviation)
sig.level: significance level (Type I error rate)
power: power of the test (1 minus the Type II error rate)
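The output below is consistent with a call along these lines (the exact code was not shown in the original, so treat this as a sketch):

library(pwr)
# Sample size for a two-sample t-test: large effect (d = 0.8),
# 5% significance level, 80% power
pwr.t.test(d = 0.8, sig.level = 0.05, power = 0.8,
           type = "two.sample", alternative = "two.sided")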
##
## Two-sample t test power calculation
##
## n = 25.52458
## d = 0.8
## sig.level = 0.05
## power = 0.8
## alternative = two.sided
##
## NOTE: n is number in *each* group
Next, we generate the experimental design with a modified OutDesign function and export it to a CSV file for data collection. After the response values are recorded in the CSV, we read the completed data back into R with the InDesign function.
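A minimal base-R sketch of this round trip, assuming a simple design data frame (the OutDesign and InDesign helpers are course-specific and not reproduced here):

design <- data.frame(Run = 1:60, Location = rep(c("Nevada", "Texas"), each = 30))
write.csv(design, "Runfile.csv", row.names = FALSE)  # export the design for data collection
# ...record the observed response column in the CSV, then read it back:
pulldata <- read.csv("Runfile.csv")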
Histogram - This plot is a graphical representation of the data distribution. A histogram of normally distributed data is symmetric and bell-shaped, with minimal skewness, as shown below.
QQ Plot - The quantile-quantile plot is another graphical check for normality. We look for the data to follow a straight-line pattern, with minimal skewness and few outliers. The graph below shows a QQ plot of normally distributed data.
Boxplot - Also known as the box-and-whisker plot, this is a graphical representation of the distribution of the data set. We look primarily at the interquartile range (IQR): if the heights of the boxes are similar, we can conclude that the variances are roughly equal.
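All three diagnostic plots are available in base R; a sketch with placeholder data:

set.seed(1)
y <- rnorm(60, mean = 50, sd = 5)            # placeholder observations
g <- rep(c("A", "B"), each = 30)             # placeholder grouping
hist(y, main = "Histogram of Observations")  # look for a symmetric bell shape
qqnorm(y); qqline(y)                         # points should track the reference line
boxplot(y ~ g, ylab = "Observation")         # compare box heights (IQRs)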
The primary objective of the two-sample t-test is to test whether the means of two populations differ. We state the null and alternative hypotheses as follows:
\[ H_0 : \mu_1 = \mu_2 \]
\[ H_a : \mu_1 \neq \mu_2 \]
Equations
\[ t_{0}=\frac{\bar{y}_{1}-\bar{y}_{2}}{\sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}}} \]
\[ S_{p}^{2}=\frac{(n_{1}-1)s_{1}^{2}+(n_{2}-1)s_{2}^{2}}{n_{1}+n_{2}-2} \]
\[ SE(\bar{y}_{1}-\bar{y}_{2})=\sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}} \]
\[ df=n_{1}+n_{2}-2 \]
\[ P=2\,P(T>|t_{0}|) \]
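As a sanity check, the pooled-variance quantities above can be computed by hand in R (illustrative data, not the drive-thru data used later):

y1 <- c(12.1, 11.4, 13.0, 12.7, 11.9)
y2 <- c(10.2, 11.1, 10.8, 10.5, 11.6)
n1 <- length(y1); n2 <- length(y2)
sp2 <- ((n1 - 1) * var(y1) + (n2 - 1) * var(y2)) / (n1 + n2 - 2)  # pooled variance
se  <- sqrt(sp2 * (1/n1 + 1/n2))   # standard error of the mean difference
t0  <- (mean(y1) - mean(y2)) / se  # test statistic
df  <- n1 + n2 - 2
p   <- 2 * pt(abs(t0), df, lower.tail = FALSE)  # two-sided P-value
c(t0 = t0, df = df, p = p)         # matches t.test(y1, y2, var.equal = TRUE)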
In conventional hypothesis testing, a fixed significance level (often α=0.05) is used. For instance, if a test statistic t0 falls beyond the critical value of 2.101 (for a two-tailed test with 18 degrees of freedom), the null hypothesis (H0:μ1=μ2) would be rejected at the 0.05 level of significance. However, this method doesn’t convey the strength of evidence against H0. The P-value approach, on the other hand, calculates the probability of obtaining a test statistic at least as extreme as t0, assuming H0 is true.
Linear Effects Equation
\[ Y_{ij}=\mu +\tau_{i}+\epsilon_{ij} \] Where μ = overall mean, τ_i = treatment effect, ε_ij = random error
Linear Effects Equation after VST \[ Y_{ij}^{\lambda} =\mu +\tau_{i}+\epsilon_{ij} \]
Where ε_ij is now approximately N(0, σ²)
We use the Residuals vs Fitted plot to check for constant variance. A residual is the difference between an observation and the estimated mean of its group. We look for an even horizontal band in the spread of the residuals to verify the constant-variance assumption; a sketch follows.
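A sketch of that check for a one-factor model, using placeholder data (plot(fit, which = 1) draws the residuals-vs-fitted panel):

set.seed(1)
y <- rnorm(60, mean = rep(c(50, 60), each = 30), sd = 5)  # placeholder data
g <- factor(rep(c("A", "B"), each = 30))
fit <- lm(y ~ g)
plot(fit, which = 1)   # look for an even band of residuals around zero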
If we reject H0 in an ANOVA, the next step is to determine which pairs of means differ significantly. In a two-sample t-test, however, there are only two populations, so rejecting H0 already tells us which means differ.
Note: Possible methods include the LSD (Least Significant Difference; not conservative) and Tukey’s test (more conservative), as sketched below.
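A sketch of Tukey’s test with three groups, where pairwise comparisons actually apply (placeholder data):

set.seed(1)
y <- rnorm(90, mean = rep(c(50, 55, 60), each = 30), sd = 5)
g <- factor(rep(c("A", "B", "C"), each = 30))
TukeyHSD(aov(y ~ g))   # pairwise mean differences with family-wise intervals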
A McDonald’s in Texas claims that its drive-thru system has an average total service time of 75 seconds, measured from the moment a car approaches the speaker box to order until the moment the customer receives the food and leaves. This location claims that the time should hold for all McDonald’s in the USA. We conduct a two-sample t-test to verify the validity of this claim.
We collected 30 data points at each of two McDonald’s locations, one in Texas and one in Nevada.
We will begin our analysis by examining graphs to assess the assumptions of normality and equal variance.
The QQ plot shown above gives a modest indication of normality, since the data roughly follow the fitted line. It’s not a perfect fit, but the assumption of normality is reasonable.
The boxplot shows that the means are different. More importantly, it shows that the variance at the Nevada location is slightly higher than at the Texas location, since the heights of the boxes differ. Because of this, we will stabilize the variance with a Box-Cox VST.
The graph shown above indicates an approximate lambda value of 0.8-0.9. This is close to 1, which is consistent with the slight variance difference seen in the earlier boxplot. We choose 0.88 as the lambda value, within the plot’s 95% confidence interval.
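One way to extract λ numerically from the Box-Cox profile (a sketch; dataplot and x are assumed to hold the combined observations and location labels, as in the code further below):

library(MASS)
bc <- boxcox(lm(dataplot ~ x), plotit = FALSE)  # profile log-likelihood over lambda
lambda <- bc$x[which.max(bc$y)]                 # lambda at the peak of the curve
lambda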
After the transformation, the Box-Cox plot shows a lambda value close to 1, which is what we are looking for. We will examine the boxplot of the transformed data.
The boxplot now shows more similar variances between the two populations, and the scale of the values has decreased substantially. We can now conduct a t-test on the transformed data to obtain an accurate result.
lambda <- 0.88           # lambda chosen from the Box-Cox plot above
data1 <- data1^lambda    # transform the Nevada observations
data2 <- data2^lambda    # transform the Texas observations
t.test(data1, data2)     # Welch two-sample t-test on the transformed data
##
## Welch Two Sample t-test
##
## data: data1 and data2
## t = 4.4509, df = 54.35, p-value = 4.286e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 7.371924 19.453532
## sample estimates:
## mean of x mean of y
## 55.36014 41.94741
With a p-value of 4.286e-05 (< 0.05), we reject the null hypothesis in favor of the alternative: the true difference in means is not 0.
As an additional step, we can explore a different transformation and compare the results. Here we use the square-root transformation and check whether the conclusions of the analysis agree.
# Transform 2: square-root transformation as an alternative to Box-Cox
pulldata <- read.csv("https://raw.githubusercontent.com/omarttuedu/MidtermData/main/Runfile.csv")
data1 <- sqrt(pulldata$Nevada)   # Nevada observations
data2 <- sqrt(pulldata$Texas)    # Texas observations
# Rebuild the combined vector and location factor for plotting
# (assumed to mirror the structure used for the earlier boxplots)
dataplot <- c(data1, data2)
x <- rep(c("Nevada", "Texas"), each = 30)
boxplot(dataplot ~ x, xlab = "Mcdonald's Location", ylab = "observation",
        main = "Boxplot of Observations")
t.test(data1, data2)
##
## Welch Two Sample t-test
##
## data: data1 and data2
## t = 4.5283, df = 50.824, p-value = 3.617e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.8279127 2.1468711
## sample estimates:
## mean of x mean of y
## 9.744576 8.257184
The plot shown above provides a similar observation to the Box-Cox approach: the boxplot still shows a slight difference in variance, but the magnitude has decreased substantially. Additionally, conducting the t-test on the square-root-transformed values, we again reject the null hypothesis in favor of the alternative.