Comparing Two Means

Let’s compare mean values.

Mean values refer to metric (continuous) variables.
Mean values can be compared across groups.
If there are two groups, t-test is the parametric test you need.
If there are more than two groups, use one-way ANOVA (coming soon).
If the assumptions of t-test do not hold, use the non-parametric Mann-Whitney’s U/ Wilcoxon’s Rank Sum test.

In the general case, 1) start with exploratory statistics (get the feel of your data); 2) check assumptions; 3) if assumptions hold, perform the t-test, else run the non-parametric test.

Start with data inspection and assumptions to decide on the test

library(foreign)
a <- read.spss("C:/Users/lssi5/Dropbox/Data Analysis II (2019)/data/alcohol.sav", use.value.labels = T, to.data.frame = T)
a$sex <- as.factor(a$sex)
a$sex <- relevel(a$sex, ref = "female")
a$smoke <- as.numeric(a$smoke)

Inspect the Distribution

descriptive statistics by groups

library(psych)
describeBy(a, a$sex)

## 
##  Descriptive statistics by group 
## group: female
##           vars     n  mean    sd median trimmed   mad min max range  skew
## sex*         1 10274  1.00  0.00      1    1.00  0.00   1   1     0   NaN
## age          2 10274 40.69 22.99     40   40.45 26.69   0 101   101  0.07
## material*    3  8579  3.49  1.16      4    3.53  1.48   1   5     4 -0.29
## marital*     4  8669  2.80  1.46      2    2.74  1.48   1   6     5  0.40
## alcohol*     5  3088  5.03  0.92      5    5.12  1.48   1   6     5 -0.86
## smoke        6  1177 15.98  5.91     15   14.98  2.97   3  51    48  2.31
##           kurtosis   se
## sex*           NaN 0.00
## age          -0.93 0.23
## material*    -1.03 0.01
## marital*     -1.27 0.02
## alcohol*      0.92 0.02
## smoke         7.26 0.17
## -------------------------------------------------------- 
## group: male
##           vars    n  mean    sd median trimmed   mad min max range  skew
## sex*         1 8098  2.00  0.00      2    2.00  0.00   2   2     0   NaN
## age          2 8098 34.72 21.22     33   34.00 25.20   0  94    94  0.24
## material*    3 6364  3.48  1.14      4    3.51  1.48   1   5     4 -0.24
## marital*     4 6469  2.11  1.06      2    1.96  1.48   1   6     5  1.16
## alcohol*     5 3465  4.35  1.13      4    4.40  1.48   1   6     5 -0.61
## smoke        6 3075 13.61  3.99     13   13.45  2.97   1  50    49  1.99
##           kurtosis   se
## sex*           NaN 0.00
## age          -0.77 0.24
## material*    -1.04 0.01
## marital*      1.00 0.01
## alcohol*      0.41 0.02
## smoke        12.60 0.07

Inspect the Distribution

histograms by groups

library(ggplot2)
ggplot(a, aes(x = smoke, color = sex, fill = sex)) +
      geom_histogram(aes(y=..density..), position = "identity", alpha = 0.5) +
      labs(title = "Smoke by Sex histogram",x = "Started Smoking (age)", y = "Density") +
      theme_classic() +
      facet_grid(. ~ sex)

If you are in a desert with no time for ggplot, this will also do:

par(mfrow = c(1, 2))
hist(a$smoke[a$sex == "female"], ylim = c(0, 2100))
hist(a$smoke[a$sex == "male"], ylim = c(0, 2100))

Inspect the Distribution

boxplots

boxplot(a$smoke ~ a$sex, 
        xlab = "Sex", 
        ylab = "Age Started Smoking", 
        par(mar = c(4,5,1,1)+.5), 
        names = c("Females", "Males"))

ggplot(a, aes(x = sex, y = smoke)) + 
  geom_boxplot() +
  stat_summary(fun.y = mean, geom = "point", shape = 4, size = 4) +
  theme_classic() +
  ggtitle("Age Started Smoking by Sex of Respondent")

## Warning: Removed 14120 rows containing non-finite values (stat_boxplot).

## Warning: Removed 14120 rows containing non-finite values (stat_summary).

Results of Exploratory Analysis:

both groups have thousands of observations
it means normality is not that much of a concern
but there are many outliers
you can bootstrap standard errors if distributions are not normal
we might also want to do both parametric and non-parametric tests to compare

Let’s check the assumptions of t-test:

Normal distribution of DV by group;
Homogeneous variances in the groups.

There are formal tests to check normality (Shapiro-Wilk, Kolmogorov-Smirnov test), but they are oversensitive (= tell that distribution is not normal even with slight deviations from normality). In addition, we care less about normality for large representative samples (over 300 obs.), following the Central Limit Theorem. See: http://thestatsgeek.com/2013/09/28/the-t-test-and-robustness-to-non-normality/

As an alternative to the formal test of normality, you can draw a Q-Q (quantile-quantile) plot.

Q-Q plot shows your variable’s distribution against a theoretical (normal) distribution.
See: normal probability plot (https://goo.gl/ZyGjzK).
In a normal distribution, there will be a straight line going from one corner to the opposite one.
In a skewed distribution (check out skew and kurtosis in descriptive stats above), the line will resemble a snake hanging from a tree branch (as one quant psychologist once put it).

QQ-plot for normality check

qqnorm(a$smoke[a$sex == "female"]); qqline(a$smoke[a$sex == "female"], col= 2)

qqnorm(a$smoke[a$sex == "male"]); qqline(a$smoke[a$sex == "female"], col= 2) #these two are not straight lines

Homogeneity (equality) of variances of DV across groups

If variances are not equal, Welch’s correction should be applied for t-test to be reliable.
the regular t.test in R accounts for it by default (Yeah!)
formal tests also exist, e.g. Levene’s test or Bartlett’s test but they are not needed

t.test(a$smoke ~ as.factor(a$sex), var.equal = F) # Compare with:

## 
##  Welch Two Sample t-test
## 
## data:  a$smoke by as.factor(a$sex)
## t = 12.712, df = 1603.4, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.005746 2.737652
## sample estimates:
## mean in group female   mean in group male 
##             15.98471             13.61301

t.test(a$smoke ~ as.factor(a$sex))

## 
##  Welch Two Sample t-test
## 
## data:  a$smoke by as.factor(a$sex)
## t = 12.712, df = 1603.4, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.005746 2.737652
## sample estimates:
## mean in group female   mean in group male 
##             15.98471             13.61301

Conclusion: on average, males start to smoke at 13.6, while females at 16.0 years. The t-statistic is equal to 12.7 (p-value < 0.001), which means that the observed difference in means is statistically significant across the two groups.

Non-parametric Test for Two Independent Samples:

no assumptions apply
slightly less powerful than t-test if assumptions hold

wilcox.test(smoke ~ sex, data = a)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  smoke by sex
## W = 2252600, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0

# wilcox.test(a$smoke ~ a$sex)

Conclusion: the Wilcoxon rank sum W = 2252600 (p < 0.001), which means that the males and females start smoking at different ages and this difference is statistically significant.

(If you are in doubt, e.g. you have two groups of 29 observations, and reach differenct conclusions with non-parametric and parametric tests, stick to the non-parametric outcome.)

Paired Samples t-test

Example: alcohol consumption on weekdays and on weekends (same respondents). Data: ESS 2014, UK sample.

library(foreign)
GBdata <- read.spss("C:/Users/lssi5/Dropbox/Data Analysis II (2019)/data/ESS7GB.sav", to.data.frame = T, use.value.labels = T)
GBdata$alcwkdy <- as.numeric(as.character(GBdata$alcwkdy))
GBdata$alcwknd <- as.numeric(as.character(GBdata$alcwknd))
summary(GBdata$alcwkdy) # How many grams of alcohol respondent consumes on an average weekday

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     0.0     0.0    18.0    36.3    48.0   632.0     461

summary(GBdata$alcwknd) # How many grams of alcohol respondent consumes on an average weekend day

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00   17.00   40.00   62.43   77.75 1948.00     462

Draw histograms to get the idea of the distributions:

#hist(GBdata$alcwkdy)
#hist(GBdata$alcwknd)
ggplot(GBdata, aes(x = alcwkdy)) +
      geom_histogram(aes(y=..density..), position = "identity", alpha = 0.5) +
      geom_density(alpha=0.6) +
      labs(title = "Alcohol Consumption on Weekday, UK",x = "Alcohol consumed in grams", y = "Density") +
      theme_classic()

ggplot(GBdata, aes(x = alcwknd)) +
      geom_histogram(aes(y=..density..), position = "identity", alpha = 0.5) +
      geom_density(alpha=0.6) +
      labs(title = "Alcohol Consumption on Weekend Day, UK",x = "Alcohol consumed in grams", y = "Density") +
      theme_classic()

ggplot(GBdata, aes(x = alcwknd - alcwkdy)) +
      geom_histogram(aes(y=..density..), position = "identity", alpha = 0.5) +
      geom_density(alpha=0.6) +
      labs(title = "The difference between alcohol consumed during weekends and weekdays in grams, UK",x = "The difference in grams", y = "Density") +
      theme_classic()

Now, the paired t-test itself

let’s compare whether respondents drink on an average weekend day MORE than on an average weekday.

mean(GBdata$alcwkdy - GBdata$alcwknd, na.rm = T) # average difference in grams

## [1] -26.14159

t.test(GBdata$alcwknd, GBdata$alcwkdy, paired = T)

## 
##  Paired t-test
## 
## data:  GBdata$alcwknd and GBdata$alcwkdy
## t = 13.119, df = 1800, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  22.23332 30.04986
## sample estimates:
## mean of the differences 
##                26.14159

Conclusion: The mean amount of alcohol consumed on a weekend day is 20 grams higher than on a weekday. The paired t(1180) = 10.248 (p < 0.001), with 95%CI = [15; 22] Therefore, we conclude that the difference is statistically significant and alcohol consumption on weekends is, indeed, higher.

Non-parametric equivalent:

wilcox.test(x = GBdata$alcwknd, y = GBdata$alcwkdy, paired = TRUE, alternative = "greater")

## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  GBdata$alcwknd and GBdata$alcwkdy
## V = 771290, p-value < 2.2e-16
## alternative hypothesis: true location shift is greater than 0

Conclusion: Wilcoxon’s signed rank V = 277510 (p < 0.001), which means that alcohol consumption on weekend days is greater than on an average weekday, and this difference is statistically significant.

Some extras: bootstrapped CI and effect size for parametric t-test

Field (2016, p.558): “When testing the difference between independent means, Yuen proposed a test that compares trimmed means and uses a bootstrap to compute confidence intervals. The trimming reduces the impact of outliers. We can implement this method thanks to a function by Wilcox34 in the R software package.”

# Try this at home:
# library(yuen)
# yuenbt(smoke ~ as.factor(sex), data = a)

effect size of the t-test shows how strong the difference is, given the sample sizes
a popular measure in the literature is Cohen’s D
For interpretation, see: http://rpsychologist.com/d3/cohend/

With a Cohen’s d of 0.5, 69 % of the treatment group will be above the mean of the control group (Cohen’s U3), 80 % of the two groups will overlap, and there is a 64 % chance that a person picked at random from the treatment group will have a higher score than a person picked at random from the control group (probability of superiority).

library(lsr)

## Warning: package 'lsr' was built under R version 3.5.2

cohensD(a$smoke ~ a$sex)

## [1] 0.5156865

Comparing Two Means

Let’s compare mean values.

Start with data inspection and assumptions to decide on the test

Inspect the Distribution

Inspect the Distribution

Inspect the Distribution

Results of Exploratory Analysis:

Let’s check the assumptions of t-test:

QQ-plot for normality check

Homogeneity (equality) of variances of DV across groups

Non-parametric Test for Two Independent Samples:

Paired Samples t-test

Draw histograms to get the idea of the distributions:

Now, the paired t-test itself

Non-parametric equivalent:

Some extras: bootstrapped CI and effect size for parametric t-test

The end!