This is the homework version of the Running a T-test Lab, using my cleaned project dataset and research hypothesis.

1 Loading Libraries

#install.packages("car")
#install.packages("effsize")

library(psych)
library(car)
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:psych':
## 
##     logit
library(effsize)
## 
## Attaching package: 'effsize'
## The following object is masked from 'package:psych':
## 
##     cohen.d

2 Importing Data

d <- read.csv(file="Data/projectdata.csv", header=T)

3 State Your Hypothesis

There will be a significant difference in subjective wellbeing between males and females. Specifically, females will report lower subjective wellbeing than males because they experience higher levels of perceived stress during emerging adulthood.

Note: Subjective wellbeing was measured as a composite score across multiple items in this dataset.

4 Check Your Variables

## Checking the Categorical variable (IV)

str(d$gender)
##  chr [1:3162] "f" "m" "m" "f" "m" "f" "f" "f" "f" "f" "f" "m" "f" "m" "m" ...
d$gender <- as.factor(d$gender)

str(d$gender)
##  Factor w/ 3 levels "f","m","nb": 1 2 2 1 2 1 1 1 1 1 ...
table(d$gender, useNA = "always")
## 
##    f    m   nb <NA> 
## 2320  788   54    0
## Checking the Continuous variable (DV)

describe(d$swb)
##    vars    n mean   sd median trimmed  mad min max range  skew kurtosis   se
## X1    1 3162 4.48 1.32   4.67    4.53 1.48   1   7     6 -0.36    -0.45 0.02
hist(d$swb, main = "Histogram of Subjective Wellbeing", xlab = "Subjective Wellbeing")

describeBy(d$swb, group = d$gender)
## 
##  Descriptive statistics by group 
## group: f
##    vars    n mean  sd median trimmed  mad min max range  skew kurtosis   se
## X1    1 2320 4.48 1.3   4.58    4.54 1.36   1   7     6 -0.38    -0.45 0.03
## ------------------------------------------------------------ 
## group: m
##    vars   n mean   sd median trimmed  mad min max range  skew kurtosis   se
## X1    1 788 4.53 1.36   4.67    4.57 1.48   1   7     6 -0.34    -0.45 0.05
## ------------------------------------------------------------ 
## group: nb
##    vars  n mean   sd median trimmed  mad min  max range  skew kurtosis   se
## X1    1 54 3.73 1.29   3.75    3.75 1.61   1 6.83  5.83 -0.04    -0.62 0.18
boxplot(d$swb ~ d$gender, xlab = "Gender", ylab = "Subjective Wellbeing")

5 Check Your Assumptions

5.1 T-test Assumptions

  • IV must have exactly 2 levels
  • Data values must be independent (independent t-test only)
  • Data obtained via a random sample
  • Dependent variable must be normally distributed
  • Variances of the two groups are approx. equal
d <- subset(d, gender != "nb")

table(d$gender, useNA = "always")
## 
##    f    m   nb <NA> 
## 2320  788    0    0
d$gender <- droplevels(d$gender)

table(d$gender, useNA = "always")
## 
##    f    m <NA> 
## 2320  788    0
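
The normality assumption listed above can also be checked numerically, group by group. The sketch below uses simulated scores (not the project data) and a hand-written helper, skew_kurt, which is hypothetical and not part of psych. It computes simple moment-based skewness and excess kurtosis, which should be near 0 for roughly normal data; describe() may apply a slightly different correction.

```r
# Simulated stand-in for the project data (NOT the real swb scores)
set.seed(1)
sim <- data.frame(
  gender = factor(rep(c("f", "m"), times = c(300, 100))),
  swb    = c(rnorm(300, mean = 4.5, sd = 1.3), rnorm(100, mean = 4.5, sd = 1.4))
)

# Hypothetical helper: moment-based skewness and excess kurtosis
skew_kurt <- function(x) {
  z <- (x - mean(x)) / sd(x)
  c(skew = mean(z^3), kurtosis = mean(z^4) - 3)
}

# One row per group, one column per statistic
shape_by_group <- t(sapply(split(sim$swb, sim$gender), skew_kurt))
round(shape_by_group, 2)
```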

5.2 Testing Homogeneity of Variance with Levene’s Test

We can test whether the variances of our two groups are equal using Levene’s test. The NULL hypothesis is that the two groups have equal variances, which is the result we WANT. So when running Levene’s test we’re hoping for a NON-SIGNIFICANT result!

leveneTest(swb ~ gender, data = d)
## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value Pr(>F)
## group    1  1.7608 0.1846
##       3106

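Levene’s test found no significant difference in the variance of subjective wellbeing between our two comparison groups, females and males, F(1, 3106) = 1.76, p = 0.18. A non-significant result does not prove the variances are identical, but it gives us no evidence that the homogeneity assumption is violated.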
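
With center = median, as in the output above, Levene’s test amounts to a one-way ANOVA on each score’s absolute deviation from its own group median. The sketch below reproduces that logic in base R on simulated data (not the project data), so the numbers will not match the leveneTest() output; the column name abs_dev is made up for illustration.

```r
# Simulated stand-in for the project data (NOT the real swb scores)
set.seed(2)
sim <- data.frame(
  gender = factor(rep(c("f", "m"), times = c(300, 100))),
  swb    = c(rnorm(300, mean = 4.5, sd = 1.3), rnorm(100, mean = 4.5, sd = 1.4))
)

# Absolute deviation of each score from its own group's median
sim$abs_dev <- abs(sim$swb - ave(sim$swb, sim$gender, FUN = median))

# One-way ANOVA on those deviations: the F test here is the Levene statistic
lev_table <- anova(lm(abs_dev ~ gender, data = sim))
lev_table
```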

When running a t-test, we can account for heterogeneity of variance by using Welch’s t-test, which does not assume equal variances the way Student’s t-test does. R’s t.test() defaults to Welch’s t-test, so this doesn’t require any changes on our part! Even if your data has no issues with homogeneity of variance, you’ll still use Welch’s t-test: it handles potential variance issues well, and it costs very little when the variances really are equal.
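
To see the difference between the two tests, the sketch below runs both on simulated data with deliberately unequal variances (not the project data). The only change needed to get Student’s t-test from t.test() is var.equal = TRUE.

```r
# Simulated groups with clearly unequal spread (NOT the real swb scores)
set.seed(3)
sim <- data.frame(
  gender = factor(rep(c("f", "m"), times = c(300, 100))),
  swb    = c(rnorm(300, mean = 4.5, sd = 1.0), rnorm(100, mean = 4.5, sd = 2.0))
)

welch   <- t.test(swb ~ gender, data = sim)                    # default: var.equal = FALSE
student <- t.test(swb ~ gender, data = sim, var.equal = TRUE)  # pooled variance

# Student's df is n1 + n2 - 2; Welch's df is adjusted downward
c(welch_df = unname(welch$parameter), student_df = unname(student$parameter))
```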

5.3 Issues with My Data

My independent variable has 3 levels. To proceed with this analysis, I will drop the non-binary (nb) participants from my sample. I will make a note to discuss this issue in my methods section write-up and in my discussion section as a limitation of my study.

My data has no issue regarding homogeneity of variance, as Levene’s test was non-significant (p = 0.18). To be consistent with best practices, I will still use Welch’s t-test instead of Student’s t-test in my analysis.

6 Run a T-test

t_output <- t.test(swb ~ gender, data = d)

7 View Test Output

t_output
## 
##  Welch Two Sample t-test
## 
## data:  swb by gender
## t = -0.91817, df = 1308.9, p-value = 0.3587
## alternative hypothesis: true difference in means between group f and group m is not equal to 0
## 95 percent confidence interval:
##  -0.15997786  0.05797148
## sample estimates:
## mean in group f mean in group m 
##        4.475647        4.526650
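
As a sanity check, the Welch statistic and its degrees of freedom can be recomputed by hand from the group summaries reported above. This is a rough sketch: the SDs come from the describeBy() output and are rounded to two decimals, so the results should land close to, but not exactly on, the t.test() values.

```r
# Summary values copied from the describeBy() and t.test() output above.
# The SDs are rounded, so the results match only approximately.
n_f <- 2320; mean_f <- 4.475647; sd_f <- 1.30
n_m <- 788;  mean_m <- 4.526650; sd_m <- 1.36

# Squared standard error contributed by each group
v_f <- sd_f^2 / n_f
v_m <- sd_m^2 / n_m

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat   <- (mean_f - mean_m) / sqrt(v_f + v_m)
df_welch <- (v_f + v_m)^2 / (v_f^2 / (n_f - 1) + v_m^2 / (n_m - 1))

# Two-sided p-value
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

round(c(t = t_stat, df = df_welch, p = p_value), 3)
```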

8 Calculate Cohen’s d - Effect Size

# We **only** calculate effect size if the test is significant!
# Our t-test was non-significant (p = 0.36), so we do NOT calculate Cohen's d.

# d_output <- cohen.d(swb ~ gender, data = d)

9 View Effect Size

# Effect size was not calculated because the t-test result was non-significant.
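
For reference only, the sketch below shows the pooled-SD form of Cohen’s d on simulated groups with a built-in mean difference. It is not applied to the project data, since the t-test above was non-significant. The variable names are made up for illustration, and cohen.d() from effsize has options that can change the exact value.

```r
# Simulated groups with a true mean difference (NOT the project data)
set.seed(4)
group_a <- rnorm(200, mean = 5.5, sd = 1.3)
group_b <- rnorm(200, mean = 4.5, sd = 1.3)

n_a <- length(group_a)
n_b <- length(group_b)

# Pooled standard deviation across the two groups
pooled_sd <- sqrt(((n_a - 1) * var(group_a) + (n_b - 1) * var(group_b)) /
                    (n_a + n_b - 2))

# Cohen's d: mean difference in pooled-SD units
cohens_d <- (mean(group_a) - mean(group_b)) / pooled_sd
cohens_d
```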

10 Write Up Results

To test our hypothesis that females in our sample would report significantly lower levels of subjective wellbeing than males, we used an independent samples t-test. This required us to drop non-binary participants from our sample, as this test is limited to a two-group comparison. We tested homogeneity of variance with Levene’s test and found no evidence of heterogeneity (F(1, 3106) = 1.76, p = 0.18). Our data met all other assumptions of an independent samples t-test.

Contrary to our hypothesis, there was not a significant difference in subjective wellbeing between females (M = 4.48, SD = 1.30) and males (M = 4.53, SD = 1.36); t(1308.9) = -0.92, p = 0.36 (see Figure 1). Given the non-significant result, effect size was not calculated.

References

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.