This is the homework version of the Running a T-test Lab, using my cleaned project dataset and research hypothesis.
#install.packages("car")
#install.packages("effsize")
library(psych)
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
library(effsize)
##
## Attaching package: 'effsize'
## The following object is masked from 'package:psych':
##
## cohen.d
d <- read.csv(file = "Data/projectdata.csv", header = TRUE)
There will be a significant difference in subjective wellbeing between males and females. Specifically, females will report lower subjective wellbeing than males because they experience higher levels of perceived stress during emerging adulthood.
Note: Subjective wellbeing was measured as a composite score across multiple items in this dataset.
## Checking the Categorical variable (IV)
str(d$gender)
## chr [1:3162] "f" "m" "m" "f" "m" "f" "f" "f" "f" "f" "f" "m" "f" "m" "m" ...
d$gender <- as.factor(d$gender)
str(d$gender)
## Factor w/ 3 levels "f","m","nb": 1 2 2 1 2 1 1 1 1 1 ...
table(d$gender, useNA = "always")
##
##    f    m   nb <NA>
## 2320  788   54    0
## Checking the Continuous variable (DV)
describe(d$swb)
##    vars    n mean   sd median trimmed  mad min max range  skew kurtosis   se
## X1    1 3162 4.48 1.32   4.67    4.53 1.48   1   7     6 -0.36    -0.45 0.02
hist(d$swb, main = "Histogram of Subjective Wellbeing", xlab = "Subjective Wellbeing")
describeBy(d$swb, group = d$gender)
##
##  Descriptive statistics by group
## group: f
##    vars    n mean  sd median trimmed  mad min max range  skew kurtosis   se
## X1    1 2320 4.48 1.3   4.58    4.54 1.36   1   7     6 -0.38    -0.45 0.03
## ------------------------------------------------------------
## group: m
##    vars   n mean   sd median trimmed  mad min max range  skew kurtosis   se
## X1    1 788 4.53 1.36   4.67    4.57 1.48   1   7     6 -0.34    -0.45 0.05
## ------------------------------------------------------------
## group: nb
##    vars  n mean   sd median trimmed  mad min  max range  skew kurtosis   se
## X1    1 54 3.73 1.29   3.75    3.75 1.61   1 6.83  5.83 -0.04    -0.62 0.18
boxplot(d$swb ~ d$gender, xlab = "Gender", ylab = "Subjective Wellbeing")
d <- subset(d, gender != "nb")
table(d$gender, useNA = "always")
##
##    f    m   nb <NA>
## 2320  788    0    0
d$gender <- droplevels(d$gender)
table(d$gender, useNA = "always")
##
##    f    m <NA>
## 2320  788    0
We can test whether the variances of our two groups are equal using Levene’s test. The NULL hypothesis is that the variances of the two groups are equal, which is the result we WANT. So when running Levene’s test we’re hoping for a NON-SIGNIFICANT result!
leveneTest(swb ~ gender, data = d)
## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value Pr(>F)
## group    1  1.7608 0.1846
##       3106
Levene’s test did not detect a significant difference in the variance of subjective wellbeing between our two comparison groups, females and males, F(1, 3106) = 1.76, p = 0.18. We therefore treat the homogeneity of variance assumption as met.
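To make the logic of the test concrete, here is a minimal sketch of what Levene’s test with `center = median` computes: a one-way ANOVA on each score’s absolute deviation from its own group’s median. It uses simulated data rather than the project dataset; the data frame `sim` and its columns `group`, `score`, and `abs_dev` are made up for illustration.

```r
# Simulated two-group data (hypothetical; NOT the project dataset)
set.seed(1)
sim <- data.frame(
  group = factor(rep(c("a", "b"), times = c(40, 60))),
  score = c(rnorm(40, mean = 5, sd = 1), rnorm(60, mean = 5, sd = 1.5))
)

# Absolute deviation of each score from the median of its own group
sim$abs_dev <- abs(sim$score - ave(sim$score, sim$group, FUN = median))

# One-way ANOVA on those deviations; the F and p values for 'group'
# are the median-centered Levene statistic
levene_by_hand <- anova(lm(abs_dev ~ group, data = sim))
levene_by_hand
```

If the car package is loaded, `leveneTest(score ~ group, data = sim)` should report the same F value, which is one way to check the sketch.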
When running a t-test, we can account for heterogeneity of variance by using Welch’s t-test, which does not assume equal variances the way Student’s t-test does. R’s t.test() runs Welch’s t-test by default (var.equal = FALSE), so this doesn’t require any changes on our part! Even if your data has no issues with homogeneity of variance, you’ll still use Welch’s t-test: it handles unequal variances well and loses very little power when the variances are in fact equal.
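The difference between the two tests can be seen in a small sketch on simulated data (again, not the project dataset; `sim`, `welch`, and `student` are illustrative names). The only change in the call is the `var.equal` argument of `t.test()`.

```r
# Simulated groups with unequal sizes and unequal spread (hypothetical data)
set.seed(42)
sim <- data.frame(
  group = factor(rep(c("a", "b"), times = c(20, 40))),
  score = c(rnorm(20, mean = 5, sd = 2), rnorm(40, mean = 5, sd = 1))
)

# Default call: Welch's t-test (var.equal = FALSE)
welch <- t.test(score ~ group, data = sim)

# Student's t-test: pools the two variances and uses n1 + n2 - 2 df
student <- t.test(score ~ group, data = sim, var.equal = TRUE)

# Compare the degrees of freedom used by each test
c(welch_df = unname(welch$parameter), student_df = unname(student$parameter))
```

Student’s test always uses n1 + n2 - 2 degrees of freedom (58 here), while Welch’s test adjusts the degrees of freedom downward according to how unequal the group variances and sizes are.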
My independent variable has 3 levels. To proceed with this analysis, I will drop the non-binary (nb) participants from my sample. I will make a note to discuss this issue in my methods section write-up and in my discussion section as a limitation of my study.
My data has no issue regarding homogeneity of variance, as Levene’s test was non-significant (p = 0.18). To be consistent with best practices, I will still use Welch’s t-test instead of Student’s t-test in my analysis.
t_output <- t.test(swb ~ gender, data = d)
t_output
##
## Welch Two Sample t-test
##
## data: swb by gender
## t = -0.91817, df = 1308.9, p-value = 0.3587
## alternative hypothesis: true difference in means between group f and group m is not equal to 0
## 95 percent confidence interval:
## -0.15997786 0.05797148
## sample estimates:
## mean in group f mean in group m
## 4.475647 4.526650
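As a rough sanity check, the Welch statistic can be reproduced by hand from the group summaries reported above. This sketch copies the means from the t-test output and the SDs and group sizes from the describeBy() output; because those SDs are rounded to two decimals, the results should land close to, but not exactly on, the values R printed (t = -0.92, df ≈ 1309, p ≈ 0.36). The variable names are my own.

```r
# Group summaries copied from the output above (SDs are rounded)
m_f <- 4.475647; sd_f <- 1.30; n_f <- 2320   # females
m_m <- 4.526650; sd_m <- 1.36; n_m <- 788    # males

# Squared standard error contributed by each group
v_f <- sd_f^2 / n_f
v_m <- sd_m^2 / n_m

# Welch t statistic: mean difference over its (unpooled) standard error
t_welch <- (m_f - m_m) / sqrt(v_f + v_m)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v_f + v_m)^2 / (v_f^2 / (n_f - 1) + v_m^2 / (n_m - 1))

# Two-sided p value
p_welch <- 2 * pt(-abs(t_welch), df = df_welch)

round(c(t = t_welch, df = df_welch, p = p_welch), 2)
```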
# We **only** calculate effect size if the test is significant!
# Our t-test was non-significant (p = 0.36), so we do NOT calculate Cohen's d.
# d_output <- cohen.d(swb ~ gender, data = d)
To test our hypothesis that females in our sample would report significantly lower levels of subjective wellbeing than males, we used an independent samples t-test. This required us to drop our non-binary participants from our sample, as we are limited to a two-group comparison when using this test. We tested the homogeneity of variance with Levene’s test and found no evidence of heterogeneity of variance, F(1, 3106) = 1.76, p = 0.18. Our data met all other assumptions of an independent samples t-test.
Contrary to our hypothesis, there was not a significant difference in subjective wellbeing between females (M = 4.48, SD = 1.30) and males (M = 4.53, SD = 1.36), t(1308.9) = -0.92, p = 0.36 (see Figure 1). Given the non-significant result, effect size was not calculated.
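The statistics in a sentence like the one above can be pulled straight from the object that t.test() returns, which avoids copying errors. The sketch below is one possible way to do it; it uses R’s built-in `sleep` dataset so that it runs on its own, and the names `demo_output` and `report` are illustrative.

```r
# Welch t-test on the built-in 'sleep' data (stand-in for the project data)
demo_output <- t.test(extra ~ group, data = sleep)

# Build a "t(df) = x, p = y" string from the stored results:
# $parameter holds the df, $statistic the t value, $p.value the p value
report <- sprintf(
  "t(%.1f) = %.2f, p = %.2f",
  demo_output$parameter,
  demo_output$statistic,
  demo_output$p.value
)
report
```

Swapping `demo_output` for the `t_output` object created earlier would format the project’s own result the same way.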
References
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. New York, NY: Routledge Academic.