Part 8: Making Statistical Decisions about Two Population Means

When would we want to compare two population parameters?

Example: Consider a study designed to compare male and female college students to see which gender consumed more alcoholic beverages in the past week.

The two groups being compared - males and females - are the categories of a binary variable. There are two possible outcomes!

Most comparisons of groups use independent samples. Sometimes a dependent sample is more appropriate.

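Independent samples: the observations in one sample are unrelated to the observations in the other, such as separate groups of male and female students.

Dependent samples: each observation in one sample is paired with an observation in the other, such as two measurements taken on the same person.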

If time permits, we'll return to dependent samples as a special topic at the end of the class.


Independent samples are usually easier to make statements about than dependent samples. Each of these comparisons is similar to hypothesis tests we have already learned!

Rather than covering the specific sampling distribution in each scenario, we'll skip straight to R. Here's what we need to know:

Comparing Two Independent Population Means

For a quantitative response variable with independent samples, like the number of alcoholic beverages consumed per week, we'll break the observations into two groups.

To compare these two groups, we'll modify our t-test.

The null hypothesis for comparing two independent population means is:

\[ H_0: \mu_1-\mu_2=0 \]

The alternative hypothesis can be either one-sided or two-sided:

\[ H_a: \mu_1-\mu_2>0 \] \[ H_a: \mu_1-\mu_2<0 \] \[ H_a: \mu_1-\mu_2\ne 0 \]


Example: The data set Drinking records the number of alcoholic beverages consumed per week (Alcohol) and the gender (Gender) of 236 college students at a large state university.

Once we've identified the null and alternative hypotheses, we need to use R to do the test. We'll still use the t.test function, but the form will change:

t.test(variable ~ group, data = , alternative = )

In this function, variable will be the variable of interest. For the Drinking data set, that's Alcohol. group will be the variable that contains the groups we want to split the data on. In our case, group is Gender.
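To make the form of the call concrete, here is a minimal sketch on simulated data. The data frame toy and its columns score and grp are hypothetical names made up for this illustration; they are not part of the Drinking data. The sketch shows how each value of alternative lines up with a direction for the difference between group 1 and group 2.

```r
# Hypothetical data, simulated only to illustrate the call (not the Drinking data).
set.seed(1)
toy <- data.frame(
  score = c(rnorm(20, mean = 5, sd = 2), rnorm(20, mean = 7, sd = 2)),
  grp   = rep(c("A", "B"), each = 20)
)

# R sorts the group labels alphabetically, so "A" is group 1 and "B" is group 2.
# The difference being tested is therefore (mean of A) - (mean of B).
two_sided <- t.test(score ~ grp, data = toy, alternative = "two.sided")  # H_a: difference != 0
lower     <- t.test(score ~ grp, data = toy, alternative = "less")       # H_a: difference < 0
upper     <- t.test(score ~ grp, data = toy, alternative = "greater")    # H_a: difference > 0

# The labels on the sample estimates show which group R treated as group 1.
names(two_sided$estimate)
```

Checking the order of the sample estimates before choosing "less" or "greater" is a cheap way to avoid testing the opposite direction by accident.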

Once we have the output, we still need to:

  1. Find and interpret the p-value.
  2. Reject or fail to reject \( H_0 \).
  3. Write a short interpretation of the results in context (what do they mean?).
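The first two steps can also be carried out in code, as in the sketch below. It again uses simulated data: toy, hours, and group are hypothetical names, and the 0.05 cutoff is an assumed significance level. The third step, interpreting the result in context, still has to be written by hand.

```r
# Hypothetical data, simulated only for illustration.
set.seed(42)
toy <- data.frame(
  hours = c(rnorm(30, mean = 6.5), rnorm(30, mean = 7.5)),
  group = rep(c("first", "second"), each = 30)
)
alpha <- 0.05  # assumed significance level

# Store the test result instead of only printing it.
result <- t.test(hours ~ group, data = toy, alternative = "less")

# Step 1: find the p-value.
p_value <- result$p.value

# Step 2: reject or fail to reject the null hypothesis.
decision <- if (p_value < alpha) "reject H0" else "fail to reject H0"

cat("p-value:", signif(p_value, 4), "\n")
cat("decision at alpha =", alpha, ":", decision, "\n")
```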

Example (continued): For the Drinking data, we test whether female students consume fewer alcoholic beverages per week on average than male students. R orders the groups alphabetically (Female, then Male), so the alternative is "less".

library(mosaic)
data(Drinking)
t.test(Alcohol ~ Gender, data = Drinking, alternative = "less")
## 
##  Welch Two Sample t-test
## 
## data:  Alcohol by Gender
## t = -4.287, df = 96.21, p-value = 2.154e-05
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##    -Inf -2.933
## sample estimates:
## mean in group Female   mean in group Male 
##                2.842                7.630

We reject the null hypothesis (p-value = 0.00002154 < 0.05). There is enough evidence to conclude that male college students consume more alcoholic beverages per week on average than female college students.


Example: Would students with a higher GPA be more likely to report cheating? The Bodyimage data set you used in Homework 6 contains a lot more data!

data(Bodyimage)
head(Bodyimage)
##   Gender Height  GPA HS_GPA Seat  WtFeel Cheat
## 1 Female     64 2.60   2.63    M AboutRt    No
## 2   Male     69 2.70   3.72    M AboutRt    No
## 3 Female     66 3.00   3.44    F AboutRt    No
## 4 Female     63 3.11   2.73    F AboutRt    No
## 5   Male     72 3.40   2.35    B  OverWt    No
## 6 Female     67 3.43   3.84    M AboutRt    No

Let group 1 be students who would not tell an instructor if they saw someone cheating on an exam (Cheat=No) and group 2 be students who would tell an instructor if they saw someone cheating on an exam (Cheat=Yes). Test the hypothesis that students who would tell the instructor have a higher GPA than those who would not.
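
One way to write the hypotheses for this exercise is sketched below. The subscripts No and Yes are notation introduced here for illustration and stand for the two values of Cheat, with group 1 = No and group 2 = Yes as defined above:

\[ H_0: \mu_{\text{No}}-\mu_{\text{Yes}}=0 \] \[ H_a: \mu_{\text{No}}-\mu_{\text{Yes}}<0 \]

Assuming Cheat takes only the values No and Yes, R's alphabetical ordering would also put No first, so this direction would correspond to alternative = "less" in t.test.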


Cellphones data revisited!

Example: At the start of the class we looked at some histograms to see whether college students who own a cell phone spend more hours sleeping in a typical day. The histogram we used is below.

xhistogram(~Sleep | Cell, data = Cellphones, type = "count")

[Figure: histograms of Sleep (counts), one panel for each value of Cell]

t.test(Sleep ~ Cell, data = Cellphones, alternative = "greater")

## 
##  Welch Two Sample t-test
## 
## data:  Sleep by Cell
## t = 0.3166, df = 90.39, p-value = 0.3762
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -0.3001     Inf
## sample estimates:
##  mean in group no mean in group yes 
##             7.254             7.183

Try it with a partner!

Example: Low birth weight is an outcome that physicians are very concerned about. Infant mortality rates and birth defect rates are very high for babies with low birth weight. A woman's behavior and medical history before and during pregnancy can greatly alter the chances of delivering a baby with normal birth weight.

Load the LBW data set. Use this data set to answer the questions below.