Part 8: Making Statistical Decisions about Two Population Means

When would we want to compare two population parameters?

Example: Consider a study designed to compare male and female college students to see which gender consumed more alcoholic beverages in the past week.

The two groups being compared - males and females - are the categories of a binary variable. There are two possible outcomes!

For college students, what other groups (binary variables or not) could we make comparisons on?

Most comparisons of groups use independent samples. Sometimes a dependent sample is more appropriate.

Independent samples:

Experiments that randomly assign subjects to treatment groups
Observational studies that separate subjects into groups based on an explanatory variable

Dependent samples:

Studying the effect of different exercise regimens on twins - each twin does a different workout
Measuring blood pressure before and after a treatment on the same subject

If time permits, we'll return to dependent samples as a special topic at the end of the class.

Independent samples are usually easier to make statements about than dependent samples. Each of these comparisons is similar to hypothesis tests we have already learned!

Rather than covering the specific sampling distribution in each scenario, we'll skip straight to R. Here's what we need to know:

We expect our sample means to fall close to the true population means
We expect that our sample means will vary from sample to sample – the typical “deviation” or “error” is the standard error
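To see both ideas at once, here is a small simulation sketch. The population means, standard deviation, and sample sizes are made-up values chosen only for illustration; they do not come from any of the class data sets.

```r
set.seed(1)                        # arbitrary seed so the sketch is reproducible

mu1 <- 3; mu2 <- 7                 # hypothetical population means
sigma <- 4                         # hypothetical common population SD
n1 <- 50; n2 <- 50                 # hypothetical sample sizes

# Draw many pairs of samples and record the difference in sample means
diffs <- replicate(5000, mean(rnorm(n1, mu1, sigma)) - mean(rnorm(n2, mu2, sigma)))

mean(diffs)                        # lands close to mu1 - mu2 = -4
sd(diffs)                          # empirical standard error of the difference
sqrt(sigma^2 / n1 + sigma^2 / n2)  # theoretical standard error: 0.8
```

The simulated differences center near the true difference, and their spread is roughly the standard error.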

Comparing Two Independent Population Means

For a quantitative response variable with independent samples, like the number of alcoholic beverages consumed per week, we'll break the observations into two groups.

We'll compare the two groups by testing their difference: \[ \mu_1-\mu_2 \]
To estimate this difference, we'll use: \[ \bar{x}_1-\bar{x}_2 \]

To compare these two groups, we'll modify our t-test.
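For the curious, the sketch below shows what the modified test does under the hood, assuming R's default unequal-variance (Welch) version of the two-sample t-test. The vectors x1 and x2 are simulated, hypothetical groups, not class data. The statistic is the difference in sample means divided by its standard error.

```r
set.seed(2)                          # arbitrary seed for reproducibility
x1 <- rnorm(40, mean = 3, sd = 3)    # hypothetical group 1
x2 <- rnorm(60, mean = 5, sd = 5)    # hypothetical group 2

# Standard error of the difference in sample means (unequal variances)
se <- sqrt(var(x1) / length(x1) + var(x2) / length(x2))

# Test statistic under H0: mu1 - mu2 = 0
t_by_hand <- (mean(x1) - mean(x2)) / se

fit <- t.test(x1, x2)                # Welch test is the default (var.equal = FALSE)

t_by_hand
unname(fit$statistic)                # should match the hand computation
```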

The null hypothesis for comparing two independent population means is:

\[ H_0: \mu_1-\mu_2=0 \]

How could you rewrite this hypothesis?

The alternative hypothesis can be either one-sided or two-sided:

\[ H_a: \mu_1-\mu_2>0 \] \[ H_a: \mu_1-\mu_2<0 \] \[ H_a: \mu_1-\mu_2\ne 0 \]

Example: The data set Drinking records the number of alcoholic beverages consumed per week (Alcohol) and the gender (Gender) of 236 college students at a large state university.

Let group 1 be females and group 2 be males. Write the null and alternative hypotheses to see whether men consume more alcoholic beverages on average per week than women.

Once we've identified the null and alternative hypotheses, we need to use R to do the test. We'll still use the t.test function, but the form will change:

t.test(variable ~ group, data = , alternative = )

In this function, variable will be the variable of interest. For the Drinking data set, that's Alcohol. group will be the variable that contains the groups we want to split the data on. In our case, group is Gender.
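One detail worth checking before choosing alternative: R treats the first level of the grouping variable (alphabetical by default) as group 1 and tests group 1 minus group 2. The sketch below uses a made-up data frame, toy, with hypothetical columns score and group, just to show the pattern.

```r
set.seed(3)                          # arbitrary seed for reproducibility

# Hypothetical data: 30 observations in each of two groups
toy <- data.frame(
  score = c(rnorm(30, mean = 70, sd = 10), rnorm(30, mean = 75, sd = 10)),
  group = factor(rep(c("A", "B"), each = 30))
)

levels(toy$group)                    # "A" "B": A is group 1, B is group 2

# H_a: mu_A - mu_B < 0, i.e. group B has the larger population mean
fit <- t.test(score ~ group, data = toy, alternative = "less")
fit
```

If the group we care about comes second alphabetically, "is it larger?" becomes alternative = "less", because the difference is taken as first level minus second level.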

Once we have the output, we still need to:

Find and interpret the p-value.
Reject or fail to reject \( H_0 \)
Write a short interpretation of the results in context (what do they mean?)
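These steps can also be carried out on the saved test result. Below is a minimal sketch using simulated, hypothetical data (toy, y, and g are illustrative names) and assuming a significance level of 0.05.

```r
set.seed(4)                          # arbitrary seed for reproducibility

# Hypothetical data: a control group and a treatment group
toy <- data.frame(
  y = c(rnorm(25, mean = 10, sd = 2), rnorm(25, mean = 12, sd = 2)),
  g = rep(c("ctrl", "trt"), each = 25)
)

# H_a: mu_ctrl - mu_trt < 0
fit <- t.test(y ~ g, data = toy, alternative = "less")

alpha <- 0.05                        # assumed significance level
p <- fit$p.value                     # step 1: find the p-value

# Step 2: reject or fail to reject H0
decision <- if (p < alpha) "reject H0" else "fail to reject H0"
decision
```

The last step, interpreting the result in context, still has to be written in words.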

Example: The data set Drinking records the number of alcoholic beverages consumed per week (Alcohol) and the gender (Gender) of 236 college students at a large state university.

library(mosaic)
data(Drinking)
t.test(Alcohol ~ Gender, data = Drinking, alternative = "less")

## 
##  Welch Two Sample t-test
## 
## data:  Alcohol by Gender
## t = -4.287, df = 96.21, p-value = 2.154e-05
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##    -Inf -2.933
## sample estimates:
## mean in group Female   mean in group Male 
##                2.842                7.630

We reject the null hypothesis (p-value = 0.00002154 < 0.05). There is enough evidence to conclude that male college students consume more alcoholic beverages per week on average than female college students.

Example: Would students with a higher GPA be more likely to report cheating? The Bodyimage data set you used in Homework 6 contains a lot more data!

data(Bodyimage)
head(Bodyimage)

##   Gender Height  GPA HS_GPA Seat  WtFeel Cheat
## 1 Female     64 2.60   2.63    M AboutRt    No
## 2   Male     69 2.70   3.72    M AboutRt    No
## 3 Female     66 3.00   3.44    F AboutRt    No
## 4 Female     63 3.11   2.73    F AboutRt    No
## 5   Male     72 3.40   2.35    B  OverWt    No
## 6 Female     67 3.43   3.84    M AboutRt    No

Let group 1 be students who would not tell an instructor if they saw someone cheating on an exam (Cheat=No) and group 2 be students who would tell an instructor if they saw someone cheating on an exam (Cheat=Yes). Test the hypothesis that students who would tell the instructor have a higher GPA than those who would not.

Cellphones data revisited!

Example: At the start of the class we looked at some histograms to see whether college students who own a cell phone spend more hours sleeping in a typical day. The histogram we used is below.

histogram(~Sleep | Cell, data = Cellphones, type = "count")

[Figure: histograms of Sleep (counts), one panel for each level of Cell]

Based on these histograms, we decided that there was not much of a difference between students who have cell phones and those who don't. Do you think that the population mean hours of sleep are different for group 1 (those without cell phones) and group 2 (those with cell phones)? Why or why not?
Use R to do a hypothesis test to see whether students without cell phones sleep more than students who have them. The R output is provided below to check your work.

## 
##  Welch Two Sample t-test
## 
## data:  Sleep by Cell
## t = 0.3166, df = 90.39, p-value = 0.3762
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -0.3001     Inf
## sample estimates:
##  mean in group no mean in group yes 
##             7.254             7.183

Interpret this hypothesis test.
How do you think the standard deviations of these two groups would compare?

Try it with a partner!

Example: Low birth weight is an outcome that physicians are very concerned about. Infant mortality rates and birth defect rates are very high for babies with low birth weight. A woman's behavior and medical history before and during pregnancy can greatly alter the chances of delivering a baby with normal birth weight.

Load the LBW data set. Use this data set to answer the questions below.

Write the null and alternative hypotheses to test whether patients with no history of hypertension (hypertension=0) tend to have babies with a higher birth weight than patients who have had hypertension before or during their pregnancy (hypertension=1).
Use R to do this hypothesis test. What's the p-value?
Is there enough evidence to say that mothers who haven't experienced hypertension have babies with a higher average birth weight than mothers who have experienced hypertension?
Low birth weight is classified as less than 2,500 g. Do one, both, or neither of these groups have low birth weight on average?