When would we want to compare two population parameters?
Example: Consider a study designed to compare male and female college students to see which gender consumed more alcoholic beverages in the past week.
The two groups being compared, males and females, are the categories of a binary variable: a variable with exactly two possible outcomes.
Most comparisons of groups use independent samples, but sometimes a dependent sample is more appropriate.
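Independent samples: the observations in one sample are not related to the observations in the other (for example, separate random samples of male and female students).
Dependent samples: each observation in one sample is matched with an observation in the other (for example, the same students measured before and after an intervention).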
If time permits, we'll return to dependent samples as a special topic at the end of the class.
Independent samples are usually easier to analyze than dependent samples. Each of these comparisons is similar to a hypothesis test we have already learned!
Rather than covering the specific sampling distribution in each scenario, we'll skip straight to R. Here's what we need to know:
For a quantitative response variable with independent samples, like the number of alcoholic beverages consumed per week, we'll break the observations into two groups.
To compare these two groups, we'll modify our t-test.
The null hypothesis for comparing two independent population means is:
\[ H_0: \mu_1-\mu_2=0 \]
The alternative hypothesis can be either one-sided or two-sided:
\[ H_a: \mu_1-\mu_2>0 \] \[ H_a: \mu_1-\mu_2<0 \] \[ H_a: \mu_1-\mu_2\ne 0 \]
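As a sketch of how each alternative hypothesis maps onto the `alternative` argument of `t.test()`, the block below runs all three versions on simulated data. The vectors `group1` and `group2` are made up for illustration; `group1` plays the role of population 1, so each test is about \( \mu_1-\mu_2 \).

```r
# Simulated data for illustration only (not from the course data sets)
set.seed(1)
group1 <- rnorm(40, mean = 5, sd = 2)
group2 <- rnorm(40, mean = 6, sd = 2)

# Each alternative hypothesis corresponds to one value of `alternative`:
#   Ha: mu1 - mu2 > 0   ->  "greater"
#   Ha: mu1 - mu2 < 0   ->  "less"
#   Ha: mu1 - mu2 != 0  ->  "two.sided" (the default)
alternatives <- c("greater", "less", "two.sided")
p_values <- sapply(alternatives, function(alt) {
  t.test(group1, group2, alternative = alt)$p.value
})
print(round(p_values, 4))
```

For the same data, the "greater" and "less" p-values add to 1, and the two-sided p-value is twice the smaller of the two.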
Example: The data set Drinking records the number of alcoholic beverages consumed per week (Alcohol) and the gender (Gender) of 236 college students at a large state university.
Once we've identified the null and alternative hypotheses, we need to use R to do the test. We'll still use the t.test function, but the form will change:
t.test(variable ~ group, data = , alternative = )
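In this function, variable is the response variable of interest; for the Drinking data set, that's Alcohol. group is the variable that contains the groups we want to split the data on; in our case, group is Gender. R orders the two groups by the levels of group (alphabetically, by default), so here μ1 is the mean for females and μ2 is the mean for males.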
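To see what the formula form does, here is a minimal sketch on simulated data. The data frame `scores` and its columns `score` and `group` are hypothetical. It compares the formula call with the two-vector form `t.test(x, y)`, where `x` holds the observations from the first group level and `y` those from the second.

```r
# Simulated data in "long" format: one response column and one grouping column.
# The data frame `scores` and its columns are hypothetical.
set.seed(2)
scores <- data.frame(
  score = c(rnorm(30, mean = 10, sd = 3), rnorm(30, mean = 12, sd = 3)),
  group = rep(c("A", "B"), each = 30)
)

# Formula form: response ~ grouping variable
res_formula <- t.test(score ~ group, data = scores, alternative = "less")

# Two-vector form: x = first level ("A"), y = second level ("B"),
# so both calls estimate mean(A) - mean(B)
res_vectors <- t.test(scores$score[scores$group == "A"],
                      scores$score[scores$group == "B"],
                      alternative = "less")

print(c(formula = res_formula$p.value, vectors = res_vectors$p.value))
```

Both calls give the same t statistic and p-value; the formula form simply does the splitting for us.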
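Once we have the output, we still need to compare the p-value to the significance level (α = 0.05 in the examples below), decide whether to reject the null hypothesis, and state a conclusion in the context of the problem.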
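One way to script the decision step is to pull the p-value out of the object that `t.test()` returns (its `p.value` component) and compare it with α. The helper `decide()` below is hypothetical, written only for illustration, and runs on simulated data. It does not write the conclusion in context; that still has to be stated in words.

```r
# Hypothetical helper (not part of any package): turns a t.test() result
# into a reject / fail-to-reject decision at significance level alpha.
decide <- function(test_result, alpha = 0.05) {
  p <- test_result$p.value
  decision <- if (p < alpha) "Reject H0" else "Fail to reject H0"
  sprintf("%s (p-value = %.4g, alpha = %g)", decision, p, alpha)
}

# Simulated example data (illustration only)
set.seed(3)
x <- rnorm(50, mean = 3, sd = 2)
y <- rnorm(50, mean = 7, sd = 4)
result <- t.test(x, y, alternative = "less")
print(decide(result))
```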
library(mosaic)
data(Drinking)
t.test(Alcohol ~ Gender, data = Drinking, alternative = "less")
##
## Welch Two Sample t-test
##
## data: Alcohol by Gender
## t = -4.287, df = 96.21, p-value = 2.154e-05
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -2.933
## sample estimates:
## mean in group Female mean in group Male
## 2.842 7.630
We reject the null hypothesis (p-value = 0.00002154 < 0.05). There is enough evidence to conclude that male college students consume more alcoholic beverages per week on average than female college students.
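The output is labeled "Welch Two Sample t-test" because `t.test()` does not assume equal population variances by default (`var.equal = FALSE`). As a sketch, the block below computes Welch's statistic, \( t = (\bar{x}_1-\bar{x}_2)/\sqrt{s_1^2/n_1 + s_2^2/n_2} \), and its approximate degrees of freedom by hand on simulated counts (not the Drinking data), then checks them against `t.test()`. The degrees-of-freedom formula is also why df = 96.21 in the output above is not a whole number.

```r
# Simulated counts standing in for two groups (not the Drinking data)
set.seed(4)
x <- rpois(60, lambda = 3)   # group 1
y <- rpois(80, lambda = 7)   # group 2

n1 <- length(x); n2 <- length(y)
v1 <- var(x) / n1            # squared standard error of mean(x)
v2 <- var(y) / n2            # squared standard error of mean(y)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat   <- (mean(x) - mean(y)) / sqrt(v1 + v2)
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
p_less   <- pt(t_stat, df = df_welch)   # one-sided p-value for Ha: mu1 - mu2 < 0

builtin <- t.test(x, y, alternative = "less")
print(c(t = t_stat, df = df_welch, p = p_less))
print(c(t = unname(builtin$statistic), df = unname(builtin$parameter),
        p = builtin$p.value))
```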
Example: Would students with a higher GPA be more likely to report cheating? The Bodyimage data set you used in Homework 6 contains a lot more data!
data(Bodyimage)
head(Bodyimage)
## Gender Height GPA HS_GPA Seat WtFeel Cheat
## 1 Female 64 2.60 2.63 M AboutRt No
## 2 Male 69 2.70 3.72 M AboutRt No
## 3 Female 66 3.00 3.44 F AboutRt No
## 4 Female 63 3.11 2.73 F AboutRt No
## 5 Male 72 3.40 2.35 B OverWt No
## 6 Female 67 3.43 3.84 M AboutRt No
Let group 1 be students who would not tell an instructor if they saw someone cheating on an exam (Cheat=No) and group 2 be students who would tell an instructor if they saw someone cheating on an exam (Cheat=Yes). Test the hypothesis that students who would tell the instructor have a higher GPA than those who would not.
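Before running a test like this, it helps to confirm which group R treats as group 1. The sketch below uses made-up values in a hypothetical data frame `survey` that borrows the Bodyimage column names `GPA` and `Cheat`. It sets the level order explicitly with `factor()`, then shows that reversing the order with `relevel()` flips the sign of t, so the matching `alternative` flips as well. Since "No" comes before "Yes" alphabetically, the default order already makes Cheat=No the first group.

```r
# Simulated stand-in for two Bodyimage variables (values are made up)
set.seed(5)
survey <- data.frame(
  GPA   = round(runif(40, min = 2, max = 4), 2),
  Cheat = sample(c("No", "Yes"), size = 40, replace = TRUE)
)

# Make the group order explicit: group 1 = "No", group 2 = "Yes"
survey$Cheat <- factor(survey$Cheat, levels = c("No", "Yes"))
print(levels(survey$Cheat))

# With this order t.test() works with mu_No - mu_Yes, so
# Ha: mu_Yes > mu_No is the same as Ha: mu_No - mu_Yes < 0 -> "less"
res_no_first <- t.test(GPA ~ Cheat, data = survey, alternative = "less")

# Reversing the level order flips the sign of t, so the alternative flips too
survey$Cheat2 <- relevel(survey$Cheat, ref = "Yes")
res_yes_first <- t.test(GPA ~ Cheat2, data = survey, alternative = "greater")
```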
Cellphones data revisited!
Example: At the start of the class we looked at some histograms to see whether college students who own a cell phone spend fewer hours sleeping in a typical day. The histogram we used is below.
xhistogram(~Sleep | Cell, data = Cellphones, type = "count")
t.test(Sleep ~ Cell, data = Cellphones, alternative = "greater")
##
## Welch Two Sample t-test
##
## data: Sleep by Cell
## t = 0.3166, df = 90.39, p-value = 0.3762
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## -0.3001 Inf
## sample estimates:
## mean in group no mean in group yes
## 7.254 7.183
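The p-value in this output can be reproduced from the printed t statistic and degrees of freedom with `pt()`, the cumulative distribution function of the t distribution in base R. This is a sketch using the rounded values shown above, so the result should match 0.3762 only up to rounding.

```r
# Values copied from the Cellphones output above (rounded as printed)
t_obs  <- 0.3166
df_obs <- 90.39

# For alternative = "greater", the p-value is the upper-tail area beyond t
p_greater <- pt(t_obs, df = df_obs, lower.tail = FALSE)

# The two-sided p-value doubles the smaller tail area
p_two_sided <- 2 * pt(-abs(t_obs), df = df_obs)

print(round(c(greater = p_greater, two.sided = p_two_sided), 4))
```

Because the sample means are so close, t is near 0 and roughly a third of the t distribution lies beyond it.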
Try it with a partner!
Example: Low birth weight is an outcome that physicians are very concerned about. Infant mortality rates and birth defect rates are very high for babies with low birth weight. A woman's behavior and medical history before and during pregnancy can greatly alter the chances of delivering a baby with normal birth weight.
Load the LBW data set. Use this data set to answer the questions below.
Test the hypothesis that patients who did not have hypertension before or during their pregnancy (hypertension=0) tend to have babies with a higher birth weight than patients who have had hypertension before or during their pregnancy (hypertension=1).
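A sketch of the setup when the grouping variable is coded 0/1, as `hypertension` is: `t.test()` turns the grouping variable into a factor, so the 0 group comes first and the estimated difference is \( \mu_0-\mu_1 \). The data frame `lbw_sim` and its `weight` column are hypothetical stand-ins with made-up values, not the LBW data; check the actual variable names with `names()` after loading LBW.

```r
# Simulated stand-in for the LBW data; `lbw_sim`, `weight`, and all values are hypothetical
set.seed(6)
lbw_sim <- data.frame(
  hypertension = rep(c(0, 1), times = c(170, 15)),
  weight       = c(rnorm(170, mean = 3000, sd = 700),
                   rnorm(15,  mean = 2600, sd = 900))
)

# A 0/1 numeric grouping variable works in the formula: t.test() converts it
# to a factor with levels "0" then "1", so Ha: mu_0 - mu_1 > 0 -> "greater"
res <- t.test(weight ~ hypertension, data = lbw_sim, alternative = "greater")
print(res$estimate)   # sample means for group 0 and group 1
print(res$p.value)
```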