In this demonstration we’ll conduct a hypothesis test for the difference between two population means by hand, using R only for the step-by-step calculations.

First, let’s take a look at the built-in mtcars dataset.

head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Suppose we are interested in whether the mean mpg differs between cars with 3 gears and cars with 4 gears. We can split mtcars into two groups using the subset function, as shown below:

fr <- subset(mtcars, gear == 4, c("mpg"))
th <- subset(mtcars, gear == 3, c("mpg"))

Now let’s find the mean of the 4-geared cars as well as the mean of 3-geared cars.

fm <- mean(fr$mpg)
fm
## [1] 24.53333
tm <- mean(th$mpg)
tm
## [1] 16.10667

Before proceeding further, note that the variances of both populations are unknown and we do not assume they are equal. Hence, we must use Welch’s unequal-variances t-test rather than the pooled-variance Student’s t-test. Recall that Welch’s t-statistic is given by

\[\frac{(\overline{X}-\overline{Y}) - (\mu_X - \mu_Y)}{\sqrt{\frac{S^2_X}{n}+\frac{S^2_Y}{m}}} \overset{\text{approx.}}{\sim} T_\nu\] and \[\nu = \frac{\left(\frac{s^2_X}{n} + \frac{s^2_Y}{m}\right)^2}{\frac{s^4_X}{n^2(n-1)}+\frac{s^4_Y}{m^2(m-1)}}\]

We will test the following hypotheses at the \(\alpha = 0.05\) level of significance:

\[H_0: \mu_{X} = \mu_{Y}\] \[H_1: \mu_{X} \neq \mu_{Y}\]

where \(\mu_{X}\) and \(\mu_{Y}\) are the population mean MPG of cars with 3 and 4 gears, respectively.

Analyzing our vectors of data, we get

\(\overline{x} = 16.10667\), \(\overline{y} = 24.53333\), \(s_x = 3.371618\), \(s_y = 5.276764\), \(n = 15\), and \(m = 12\).
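
These summary statistics can be reproduced directly from the two subsets. The snippet below is a minimal sketch that rebuilds `th` and `fr` so it runs on its own; the names `x_bar`, `y_bar`, `s_x`, `s_y`, `n`, and `m` are chosen here to mirror the notation above and do not appear in the original code.

```r
# Rebuild the two groups so this snippet is self-contained
th <- subset(mtcars, gear == 3, c("mpg"))  # X: 3-geared cars
fr <- subset(mtcars, gear == 4, c("mpg"))  # Y: 4-geared cars

# Sample means
x_bar <- mean(th$mpg)
y_bar <- mean(fr$mpg)

# Sample standard deviations (sd() uses the n - 1 denominator)
s_x <- sd(th$mpg)
s_y <- sd(fr$mpg)

# Sample sizes
n <- nrow(th)
m <- nrow(fr)

# Show everything side by side
c(x_bar = x_bar, y_bar = y_bar, s_x = s_x, s_y = s_y, n = n, m = m)
```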

Our next step is to determine the degrees of freedom of the t-distribution that approximates the distribution of our test statistic.

That is given by \(T_\nu\) where \[\nu = \frac{\left(\frac{3.37^2}{15} + \frac{5.28^2}{12}\right)^2}{\frac{3.37^4}{15^2(15-1)}+\frac{5.28^4}{12^2(12-1)}} \approx 17.9\]

Rounding to the nearest integer, we will use \(T_{18}\) to model our test. Next, we must find our test statistic to place on our distribution, which is given by

\[t = \frac{(16.107-24.5333) - 0}{\sqrt{\frac{3.37^2}{15}+\frac{5.28^2}{12}}} \approx -4.80\]

We can calculate our two-sided p-value using

2*pt(-4.80,18,lower.tail = TRUE)

which evaluates to roughly 0.00014.

Since \(0.00014 < \alpha\), we reject \(H_0\) and conclude that the mean MPG differs between 3- and 4-geared cars. Note that the test itself does not establish that 4-geared cars have higher MPG than 3-geared cars, since it was a two-sided test for a difference in means. However, given that the sample mean of the 4-geared cars (24.53) is well above that of the 3-geared cars (16.11), the evidence points toward 4-geared cars having the higher mean MPG.
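
As a check on the hand arithmetic, the same quantities can be computed in R from the unrounded sample statistics. This is a sketch rather than part of the original walkthrough: the variable names (`se2_x`, `se2_y`, `nu`, `t_stat`, `p_val`) are illustrative, and `nu` is left fractional because `pt` accepts non-integer degrees of freedom, so the printed values need not match figures obtained from rounded inputs or an integer number of degrees of freedom.

```r
# Rebuild the two groups so this snippet is self-contained
th <- subset(mtcars, gear == 3, c("mpg"))  # X: 3-geared cars
fr <- subset(mtcars, gear == 4, c("mpg"))  # Y: 4-geared cars

# Per-group variance of the sample mean: s^2 / sample size
se2_x <- var(th$mpg) / nrow(th)
se2_y <- var(fr$mpg) / nrow(fr)

# Welch-Satterthwaite degrees of freedom (same formula as above,
# written in terms of s^2 / n and s^2 / m)
nu <- (se2_x + se2_y)^2 /
  (se2_x^2 / (nrow(th) - 1) + se2_y^2 / (nrow(fr) - 1))

# Test statistic under H0: mu_X - mu_Y = 0
# Note the square root in the denominator.
t_stat <- (mean(th$mpg) - mean(fr$mpg) - 0) / sqrt(se2_x + se2_y)

# Two-sided p-value: twice the lower-tail area below -|t|
p_val <- 2 * pt(-abs(t_stat), df = nu)

c(nu = nu, t = t_stat, p = p_val)
```

Keeping the intermediate quantities `se2_x` and `se2_y` makes it easy to see that the same two terms appear in both the degrees of freedom and the standard error.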

Note that we could obtain essentially the same result using the built-in t.test function, which performs Welch’s test by default.
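
For reference, a call along the following lines runs the test in one step. `t.test` defaults to `var.equal = FALSE` and a two-sided alternative; the arguments are spelled out here only for clarity, and the name `res` is an arbitrary choice. The function works with fractional degrees of freedom rather than a rounded integer, so its p-value need not match a hand calculation digit for digit.

```r
# Rebuild the two groups so this snippet is self-contained
th <- subset(mtcars, gear == 3, c("mpg"))  # X: 3-geared cars
fr <- subset(mtcars, gear == 4, c("mpg"))  # Y: 4-geared cars

# Welch two-sample t-test of H0: mu_X - mu_Y = 0
res <- t.test(th$mpg, fr$mpg,
              alternative = "two.sided",
              mu = 0,
              var.equal = FALSE,
              conf.level = 0.95)
res

# The individual pieces are available on the returned object
res$statistic  # t statistic
res$parameter  # degrees of freedom
res$p.value    # two-sided p-value
```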

Thank you for reading. This is not meant to be a substitute for lecture notes or classroom learning in any way.