Two Population Hypothesis Tests

Rasim Muzaffer Musal

Goals

Discuss the two population hypothesis tests
Independent Samples t-tests
Paired Samples t-tests
Construction of Step 1
Evaluate and discuss Excel output

Two population hypothesis tests

The motivation for two population hypothesis tests arises from the need to compare the mean of one population to the mean of another.
Example: Is the average sales of agents trained using system X larger than that of agents trained using system Y?
Example: Is the average heart rate of the group that took drug A lower than that of the group that did not?
Example: Is the average salary of an Analytics major more than the average salary of a Finance major?

Independent Samples t-test

You are interested in making a statement about whether one population’s average is greater than, less than, or different from the other’s.
The key is that there are two separate populations from which you observe 2 separate samples.
Ex: Is the average salary of an Analytics major more than the average salary of a Finance major?
There is the population of Analytics Majors and the population of Finance Majors. You will observe 2 separate samples from these populations.

Independent Samples t-test

Ex: Is the average salary of an Analytics major more than the average salary of a Finance major?

\[\begin{align} H_{0}:& \mu_{A} - \mu_{F} \le 0 \\ H_{a}:& \mu_{A} - \mu_{F} > 0 \end{align}\]

Ex: Is the average salary of an Analytics major more than 10K higher than the average salary of a Finance major?

\[\begin{align} H_{0}:& \mu_{A} - \mu_{F} \le 10{,}000 \\ H_{a}:& \mu_{A} - \mu_{F} > 10{,}000 \end{align}\]
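
The hypothesized difference (0 in the first example, 10,000 in the second) enters the test statistic directly. The following is a minimal sketch, assuming the unequal variances (Welch) form of the statistic. The function name `welch_t` is hypothetical and the salary figures are made up purely for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample1, sample2, hypothesized_diff=0.0):
    """Return (t_computed, degrees_of_freedom) for the unequal variances t-test."""
    n1, n2 = len(sample1), len(sample2)
    # variance() is the sample variance (divides by n - 1)
    v1 = variance(sample1) / n1
    v2 = variance(sample2) / n2
    t = (mean(sample1) - mean(sample2) - hypothesized_diff) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Made-up salaries, for illustration only
analytics = [72000, 68000, 75000, 81000, 70000]
finance = [60000, 58000, 65000, 62000, 59000]

t0, df0 = welch_t(analytics, finance)              # H0: mu_A - mu_F <= 0
t10k, df10k = welch_t(analytics, finance, 10000)   # H0: mu_A - mu_F <= 10,000
print(t0, t10k, df0)
```

A larger hypothesized difference lowers \(t_{computed}\) while leaving the degrees of freedom unchanged, which is why the second claim is harder to support with the same data.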

Independent Samples t-test

Ex: Is the average yield of Yuca per acre different in Nicaragua compared to Honduras?

\[\begin{align} H_{0}:& \mu_{Nic} - \mu_{Hon} = 0 \\ H_{a}:& \mu_{Nic} - \mu_{Hon} \neq 0 \end{align}\]

Ex: Is the average life expectancy of dogs that are fed only meat lower than that of dogs that are fed a balanced diet?

\[\begin{align} H_{0}:& \mu_{\text{Meat only}} - \mu_{\text{Balan.}} \ge 0 \\ H_{a}:& \mu_{\text{Meat only}} - \mu_{\text{Balan.}} < 0 \end{align}\]

Paired Samples t-test

Scientists would like to test whether drug A lowers resting heart rate. To control for metabolic variation between individuals, the same group of individuals will have their resting heart rates measured before and after taking the drug.

\[\begin{align} H_{0}:& \mu_{Before-After} \leq 0 \\ H_{a}:& \mu_{Before-After} > 0 \end{align}\]

Paired Samples t-test

Why does \(\mu_{Before-After}\) have the \(>\) sign in the alternative hypothesis? If heart rates decrease after taking the drug, the average of the Before minus After differences is positive. As a hypothetical example, if the average resting heart rate is 48 before the drug is taken and 45 after, the difference is 48 - 45 = 3, which is positive.

Note that the above is also equivalent to

\[\begin{align} H_{0}:& \mu_{After-Before} \ge 0 \\ H_{a}:& \mu_{After-Before} < 0 \end{align}\]
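
The equivalence of the two formulations can be checked numerically: reversing the order of subtraction only flips the sign of \(t_{computed}\). This is a minimal sketch; the function name `paired_t` is hypothetical and the heart rate readings are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """t_computed for the mean of the pairwise differences first - second,
    tested against a hypothesized mean difference of 0."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    # stdev() is the sample standard deviation (divides by n - 1)
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Made-up resting heart rates for the same five individuals
before = [48, 52, 50, 47, 55]
after = [45, 50, 49, 46, 51]

t_before_after = paired_t(before, after)   # positive when heart rates drop
t_after_before = paired_t(after, before)   # same magnitude, opposite sign
print(t_before_after, t_after_before)
```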

Some comments before looking at output.

Sometimes students are confused about whether they should choose paired samples t-test or independent samples t-test in a given problem.
If there are 2 separate groups from which you observe 2 samples, this leads to an independent samples t-test.
If the same set of individuals (not necessarily human beings) are observed twice, this leads to a paired samples t-test.

Some comments before looking at output.

The principles we learned in single population hypothesis tests are still relevant and true.

- If the population means are known, there are no grounds for hypothesis testing.
- You only compute \(t_{computed}\) if the sample means are not consistent with the null hypothesis. If they are consistent with it, the null hypothesis cannot be rejected and there is no reason to carry out the test.
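
The second principle amounts to a quick check made before any calculation. A minimal sketch of that check follows; the function name, the labels for the alternatives, and the observed difference are all hypothetical.

```python
def sample_contradicts_null(sample_diff, alternative, hypothesized_diff=0.0):
    """Return True when the observed difference of sample means falls on the
    side of the alternative hypothesis, so computing t_computed is worthwhile."""
    if alternative == "greater":      # Ha: mu_1 - mu_2 > hypothesized_diff
        return sample_diff > hypothesized_diff
    if alternative == "less":         # Ha: mu_1 - mu_2 < hypothesized_diff
        return sample_diff < hypothesized_diff
    if alternative == "different":    # Ha: mu_1 - mu_2 != hypothesized_diff
        return sample_diff != hypothesized_diff
    raise ValueError(f"unknown alternative: {alternative!r}")


# Made-up difference of sample means, for illustration only
observed = 2.5
proceed_greater = sample_contradicts_null(observed, "greater")
proceed_less = sample_contradicts_null(observed, "less")
print(proceed_greater, proceed_less)
```

With a positive observed difference, only the "greater" and "different" alternatives call for further computation; for the "less" alternative we simply fail to reject the null hypothesis.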

Some comments before looking at output.

Statistical significance does not mean practical significance.
If the average yield of Yuca in Nicaragua is only 0.000001 tons more per acre than the average yield of Yuca in Honduras, the difference will most likely have no practical significance, even if it is statistically significant.

Business Case:

A real estate business is being sued for underselling houses. The adjudicator has the sale prices of houses sold by this company, and the sale prices of a set of houses sold by other companies, stored in an Excel sheet.

What the people who are suing claim:

\[\begin{align} H_{0}:& \mu_{NotSued} - \mu_{Sued} \leq 0 \\ H_{a}:& \mu_{NotSued} - \mu_{Sued} > 0 \end{align}\]

\(\mu\) represents the average price of the houses sold by the group specified in the subscript.
This is a directional hypothesis test. The p-value we will look at is called the 1 tail p-value.

Note on 1 tail p value in Excel output:

Excel calculates the \(t_{computed}\).
If \(t_{computed}\) is positive as in this case, 1 tail p value is calculated as the area to the right of \(t_{computed}\) under the curve.
If \(t_{computed}\) is negative, 1 tail p value is calculated as the area to the left of \(t_{computed}\) under the curve.
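
The rule above can be written as a single formula. Here \(T\) denotes a random variable following the t distribution with the test's degrees of freedom; this notation is introduced only for illustration and does not appear in the Excel output.

\[ p_{1\,tail} = \begin{cases} P(T \ge t_{computed}) & \text{if } t_{computed} > 0 \\ P(T \le t_{computed}) & \text{if } t_{computed} < 0 \end{cases} \]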

Excel Set-up Indep. Samples.

This is where we set up the ranges of data that contain sample 1 and sample 2, as well as alpha and the hypothesized mean difference.

Excel Output Indep. Samples.

This output contains results from 2 hypothesis tests. The p values, \(t_{computed}\), and the \(t_{critical}\) values are all here.

Discussion of output

The output in Excel contains results from multiple hypothesis tests. You should be able to pick the relevant set.
Excel always calculates the 1 tail p value in the tail where \(\bar{Var}_{1}-\bar{Var}_{2}\) lands. In this notation, \(\bar{Var}_{1}\) is the sample mean of the Variable 1 range of values you entered in Section 14, and \(\bar{Var}_{2}\) is the sample mean of the Variable 2 range of values you entered in Section 14.

Discussion of output

If the difference of sample means is positive, the p value is the area to the right of \(t_{computed}\). If the difference of sample means is negative, the p value is the area to the left of \(t_{computed}\).
In our example the difference of sample means was positive, so Excel in effect assumed the alternative hypothesis with the \(>\) sign for us when calculating the p value.
Var1 and Var2 consist of 469 and 49 observations respectively. These are the values you entered in the Variable 1 and Variable 2 ranges. We assume unequal variances since the sample sizes and sample variances are so different from each other.

Discussion of output.

If we stick with this case, where the difference of sample means is positive, the p value of the 1 tail test is 0.002. In this case Excel's 1 tail p value always corresponds to an alternative hypothesis with the \(>\) sign; recall Section 13 for the discussion about this. The p value is smaller than an alpha of 0.05 (remember that alpha is a decision variable, not something computed from data). Therefore we reject the null hypothesis and accept the alternative that the company that is not being sued has a larger average sale price than the company that is being sued.

Discussion of output.

What if we were researching whether the company NOT being sued has a lower average sale price compared to the company being sued?

\[\begin{align} H_{0}:& \mu_{NotSued} - \mu_{Sued} \geq 0 \\ H_{a}:& \mu_{NotSued} - \mu_{Sued} < 0 \end{align}\]

There would be no reason to look at p values.
The difference of sample means is positive, which is consistent with the null hypothesis, so all we need to say is that we cannot reject the null hypothesis.

Discussion of output.

What if we were interested in whether the average prices were different between these two companies?

\[\begin{align} H_{0}:& \mu_{NotSued} - \mu_{Sued} = 0 \\ H_{a}:& \mu_{NotSued} - \mu_{Sued} \neq 0 \end{align}\]

We would need to look at the 2 tail p-value, which is 0.003. Since this is smaller than alpha, we reach the same conclusion (reject the null hypothesis) if our alpha value is 0.05, or even if it were 0.01.

Discussion of the output

Note that the output contains the \(t_{computed}\) of 3.079. You can compare this number against the appropriate \(t_{critical}\), either the 1 tail value (1.669) or the 2 tail value (1.998), to reach exactly the same conclusion as the comparison of the p value with alpha.
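
Both decision rules can be sketched in a few lines using the numbers reported in the output. The function names and the labels for the alternatives are hypothetical; the critical value is passed in as a positive number, as Excel reports it.

```python
def reject_by_critical_value(t_computed, t_critical, alternative):
    """Critical value rule; t_critical is given as a positive number."""
    if alternative == "greater":
        return t_computed > t_critical
    if alternative == "less":
        return t_computed < -t_critical
    if alternative == "different":
        return abs(t_computed) > t_critical
    raise ValueError(f"unknown alternative: {alternative!r}")


def reject_by_p_value(p_value, alpha):
    """p value rule: reject the null hypothesis when p is smaller than alpha."""
    return p_value < alpha


t_computed = 3.079   # from the Excel output
alpha = 0.05         # chosen by the analyst, not computed from data

one_tail = (reject_by_critical_value(t_computed, 1.669, "greater"),
            reject_by_p_value(0.002, alpha))
two_tail = (reject_by_critical_value(t_computed, 1.998, "different"),
            reject_by_p_value(0.003, alpha))
print(one_tail, two_tail)
```

In both cases the two rules agree, as they should for the same test and the same alpha.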

Paired sample t-test

I would like to determine whether the average test scores of students have increased after training via a module (ignoring the retesting effect).

\[\begin{align} H_{0}:& \mu_{After-Before} \leq 0 \\ H_{a}:& \mu_{After-Before} > 0 \end{align}\]

equivalent to

\[\begin{align} H_{0}:& \mu_{Before-After} \geq 0 \\ H_{a}:& \mu_{Before-After} < 0 \end{align}\]

Excel Set up

Unlike in the Independent Samples t-test, in the Paired Samples t-test the pairs of values need to be matched row by row in the Excel columns, in this case Columns B and C.
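
To see why the row matching matters, the sketch below computes the paired \(t_{computed}\) twice: once with each student's scores on the same row, and once with the second column in the wrong order. The function name `paired_t` is hypothetical and the test scores are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """t_computed for the mean of the row-by-row differences first - second."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Made-up scores; row i holds the same student's score in both lists
before = [60, 70, 80, 90]
after = [65, 74, 86, 95]

t_matched = paired_t(after, before)
# Same numbers, but the rows no longer refer to the same student
t_misaligned = paired_t(list(reversed(after)), before)
print(t_matched, t_misaligned)
```

The mean difference is the same in both cases, but the misaligned differences vary far more, so \(t_{computed}\) shrinks and the test loses the benefit of pairing.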

Excel Set up

Similar to Independent Samples t-test
