t-Test HW
Loading Libraries
library(psych) # for the describe() command
library(car) # for the leveneTest() command
library(effsize) # for the cohen.d() command
library(ggplot2)
Importing Data
# UPDATE THIS FOR HW!!!
d <- read.csv(file="Data/mydata.csv", header=TRUE)
State Your Hypothesis - PART OF YOUR WRITEUP
Men will report significantly lower rates of social media usage than women.
State your t-test hypothesis. Remember, a t-test has one continuous variable as the dependent variable, and one categorical variable with two levels as the independent variable. If your IV of choice has more than two levels, you will need to pick two levels to compare and drop the rest, or combine levels until you only have two left.
Check Your Assumptions
T-test Assumptions
- Data values must be independent (independent t-test only) (confirmed by data report)
- Data obtained via a random sample (confirmed by data report)
- IV must have two levels (will check below)
- Dependent variable must be normally distributed (will check below. if issues, note and proceed)
- Variances of the two groups must be approximately equal, aka ‘homogeneity of variance’. Lacking this makes our results inaccurate (will check below - this really only applies to Student’s t-test, but we’ll check it anyway)
Checking IV levels
# preview the levels and counts for your IV
table(d$gender, useNA = "always")
f m nb <NA>
2278 777 54 0
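In this dataset `gender` is still stored as character text when the non-binary rows are removed (see the `str()` output below), so calling `as.factor()` afterwards yields exactly two levels. If the column had already been a factor, the dropped level would stay behind and still appear in `table()` output with a count of 0. A minimal base-R sketch with made-up toy data (the `toy` data frame is hypothetical) shows the effect and the `droplevels()` fix:

```r
# Hypothetical toy data: gender is already a factor BEFORE subsetting
toy <- data.frame(
  gender = factor(c("f", "m", "nb", "f", "m")),
  socmeduse = c(35, 30, 40, 38, 28)
)

# Drop the non-binary rows, mirroring the subset() call used on the real data
toy <- subset(toy, gender != "nb")
table(toy$gender)   # "nb" is still a level, now with a count of 0

# droplevels() removes factor levels that no longer have any observations
toy$gender <- droplevels(toy$gender)
table(toy$gender)   # only "f" and "m" remain
```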
# # note that the table() output shows you exactly how the levels of your variable are written. when recoding, make sure you are spelling them exactly as they appear
#
# # to drop levels from your variable
# # this subsets the data and says that any participant who is coded as 'nb' should be removed
d <- subset(d, gender != "nb")
# # preview your changes and make sure everything is correct
#
# # check your variable types
str(d)
'data.frame': 3055 obs. of 6 variables:
$ swb : num 4.33 4.17 1.83 5.17 3.67 ...
$ gender : chr "f" "m" "m" "f" ...
$ race_rc : chr "white" "white" "white" "other" ...
$ socmeduse : int 47 23 34 35 37 13 37 43 37 29 ...
$ moa_safety: num 2.75 3.25 3 1.25 2.25 2.5 4 3.25 2.75 3.5 ...
$ exploit : num 2 3.67 4.33 1.67 4 ...
# # make sure that your IV is recognized as a factor by R
d$gender <- as.factor(d$gender)
Testing Homogeneity of Variance with Levene’s Test
We can test whether the variances of our two groups are equal using Levene’s test. The null hypothesis is that the two groups have equal variances, which is the result we want. So when running Levene’s test we’re hoping for a non-significant result!
# # use the leveneTest() command from the car package to test homogeneity of variance
# # uses the same 'formula' setup that we'll use for our t-test: formula is y~x, where y is our DV and x is our IV
leveneTest(socmeduse~gender, data = d)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 13.112 0.0002982 ***
3053
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
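With `center = median` (shown in the output header), Levene’s test amounts to a one-way ANOVA on each score’s absolute distance from its own group’s median, a variant also known as the Brown-Forsythe test. The base-R sketch below assumes that this is what `leveneTest()` computes; it runs on simulated toy data (`y` and `g` are hypothetical stand-ins for `d$socmeduse` and `d$gender`):

```r
set.seed(1)
# Hypothetical toy data: two groups with different spreads
y <- c(rnorm(60, mean = 35, sd = 8), rnorm(40, mean = 32, sd = 12))
g <- factor(rep(c("f", "m"), times = c(60, 40)))

# Absolute deviation of each score from its own group's median
abs_dev <- abs(y - ave(y, g, FUN = median))

# One-way ANOVA on those deviations; its F test is the median-centred Levene statistic
lev <- anova(lm(abs_dev ~ g))
lev
```

If the assumption holds, substituting the real variables should reproduce the F value of 13.112 on 1 and 3053 degrees of freedom reported above.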
This is more of a formality in our case, because we are using Welch’s t-test, which, unlike the classic Student’s t-test, does not assume that the two groups have equal variances. R’s t.test() uses Welch’s t-test by default (var.equal = FALSE), so this doesn’t require any extra effort on our part!
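The choice between the two tests comes down to the `var.equal` argument of `t.test()`. A short sketch on simulated toy data (hypothetical `y` and `g`) runs both versions side by side. Student’s test uses df = n1 + n2 - 2, while Welch’s test adjusts the df downward, usually to a non-integer value like the 1228.9 in the output below:

```r
set.seed(2)
# Hypothetical toy data with unequal group variances and sizes
y <- c(rnorm(80, mean = 35, sd = 8), rnorm(30, mean = 32, sd = 12))
g <- factor(rep(c("f", "m"), times = c(80, 30)))

welch   <- t.test(y ~ g)                   # default: var.equal = FALSE (Welch)
student <- t.test(y ~ g, var.equal = TRUE) # classic Student's t-test

welch$method
student$method
# Student's df is n1 + n2 - 2 = 108; Welch's df is smaller and usually non-integer
c(welch = unname(welch$parameter), student = unname(student$parameter))
```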
Check Normality
# you only need to check the variables you're using in the current analysis
# although you checked them previously, it's always a good idea to look them over again and be sure that everything is correct
# you can use the describe() command on an entire dataframe (d) or just on a single variable (d$pss)
# use it to check the skew and kurtosis of your DV
describe(d$socmeduse)
   vars    n  mean   sd median trimmed  mad min max range  skew kurtosis   se
X1    1 3055 34.45 8.58     35   34.72 7.41  11  55    44 -0.31     0.25 0.16
# can use the describeBy() command to view the means and standard deviations by group
# it's very similar to the describe() command but splits the dataframe according to the 'group' variable
describeBy(d$socmeduse, group=d$gender)
Descriptive statistics by group
group: f
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 2278 35.17 8.24 36 35.38 7.41 11 55 44 -0.28 0.34 0.17
------------------------------------------------------------
group: m
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 777 32.34 9.18 33 32.64 8.9 11 55 44 -0.24 -0.08 0.33
# also use a histogram to examine your continuous variable
hist(d$socmeduse)
# last, use a boxplot to examine your continuous and categorical variables together
# in the formula, the continuous DV goes on the left of ~ and the categorical IV on the right
boxplot(d$socmeduse~d$gender)
Issues with My Data - PART OF YOUR WRITEUP
Before proceeding with analysis, we checked the t-test assumptions. We dropped non-binary participants because of the small size of that group (n = 54). Levene’s test found significant heterogeneity of variance (p = 0.0003). As a result, Welch’s t-test will be used. Our dependent variable was approximately normally distributed (skew = -0.31 and kurtosis = 0.25, both between -2 and +2).
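The -2 to +2 rule of thumb for skew and kurtosis can be checked in code. The sketch below uses the plain moment formulas for skewness and excess kurtosis. By default, `psych::describe()` applies a small-sample adjustment (controlled by its `type` argument), so its values may differ slightly in the later decimals. The function names `skew_kurt()` and `within_pm2()` are hypothetical:

```r
# Moment-based skewness and excess kurtosis (population-moment formulas)
skew_kurt <- function(x) {
  x <- x[!is.na(x)]
  dev <- x - mean(x)
  m2 <- mean(dev^2)
  c(skew = mean(dev^3) / m2^1.5,
    kurtosis = mean(dev^4) / m2^2 - 3)
}

# TRUE when both skew and kurtosis fall inside the -2 to +2 rule of thumb
within_pm2 <- function(x) all(abs(skew_kurt(x)) <= 2)

sk <- skew_kurt(c(1, 2, 3, 4, 5))
sk                               # skew = 0, kurtosis = -1.3
within_pm2(c(1, 2, 3, 4, 5))
```

Given the reported skew (-0.31) and kurtosis (0.25), `within_pm2(d$socmeduse)` should return TRUE for the real data.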
Run a T-test
# # very simple! we specify the dataframe alongside the variables instead of having a separate argument for the dataframe like we did for leveneTest()
t_output <- t.test(d$socmeduse~d$gender)
View Test Output
t_output
Welch Two Sample t-test
data: d$socmeduse by d$gender
t = 7.6126, df = 1228.9, p-value = 5.329e-14
alternative hypothesis: true difference in means between group f and group m is not equal to 0
95 percent confidence interval:
2.101448 3.560662
sample estimates:
mean in group f mean in group m
35.17340 32.34234
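Each piece of the Welch output can be rebuilt from the group means, standard deviations, and sizes. The t statistic divides the mean difference by its standard error. The df comes from the Welch-Satterthwaite approximation, and the confidence interval is the mean difference plus or minus a t critical value times the standard error. A sketch, with a hypothetical helper `welch_from_summary()`, checked against `t.test()` on simulated data:

```r
# Welch's t, Welch-Satterthwaite df, and CI from group summary statistics
welch_from_summary <- function(m1, s1, n1, m2, s2, n2, conf = 0.95) {
  v1 <- s1^2 / n1                      # squared standard error, group 1
  v2 <- s2^2 / n2                      # squared standard error, group 2
  se <- sqrt(v1 + v2)                  # standard error of the mean difference
  t  <- (m1 - m2) / se
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  margin <- qt(1 - (1 - conf) / 2, df) * se
  list(t = t, df = df, ci = c(m1 - m2 - margin, m1 - m2 + margin))
}

# Compare the hand computation with t.test() on simulated toy data
set.seed(3)
x <- rnorm(50, mean = 35, sd = 8)
y <- rnorm(20, mean = 32, sd = 10)
hand <- welch_from_summary(mean(x), sd(x), length(x), mean(y), sd(y), length(y))
ref  <- t.test(x, y)   # Welch by default

all.equal(hand$t, unname(ref$statistic))
all.equal(hand$df, unname(ref$parameter))
all.equal(hand$ci, as.numeric(ref$conf.int))
```

Plugging in the rounded statistics from describeBy() (35.17, 8.24, 2278 for f; 32.34, 9.18, 777 for m) gives values close to the reported t = 7.61, df of about 1229, and CI [2.10, 3.56]. The small differences come from rounding.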
Calculate Cohen’s d
# # once again, we use our formula to calculate cohen's d
d_output <- cohen.d(d$socmeduse~d$gender)
View Effect Size
- Trivial: < .2
- Small: between .2 and .5
- Medium: between .5 and .8
- Large: > .8
d_output
Cohen's d
d estimate: 0.333554 (small)
95 percent confidence interval:
lower upper
0.2516665 0.4154415
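The d estimate above appears to use the pooled standard deviation of the two groups, which we understand to be the `effsize` default. The sketch below computes d that way and applies the size labels listed above. The function names are hypothetical:

```r
# Cohen's d using the pooled standard deviation of two groups
cohens_d_pooled <- function(x, y) {
  n1 <- length(x); n2 <- length(y)
  sp <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))
  (mean(x) - mean(y)) / sp
}

# Map |d| onto the trivial / small / medium / large cut-offs listed above
label_d <- function(d) {
  cut(abs(d), breaks = c(0, 0.2, 0.5, 0.8, Inf),
      labels = c("trivial", "small", "medium", "large"),
      right = FALSE)
}

d_toy <- cohens_d_pooled(c(1, 2, 3), c(3, 4, 5))
d_toy              # -2: the means differ by 2 and the pooled SD is 1
label_d(d_toy)     # large
```

With the describeBy() statistics, the pooled SD is about 8.49, so d is about 2.83 / 8.49 = 0.33. This agrees with the reported estimate and its "small" label.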
Write Up Results
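We tested our hypothesis that men would report significantly less social media usage than women using a Welch two-sample t-test. Our data did not meet all of the assumptions of the t-test (Levene’s test indicated unequal variances), so we used Welch’s test, which does not assume equal variances. We found a significant difference, consistent with our hypothesis: women (M = 35.17, SD = 8.24) reported more social media usage than men (M = 32.34, SD = 9.18), t(1228.9) = 7.61, p < .001, d = 0.33, 95% CI of the mean difference [2.10, 3.56] (refer to Figure 1).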
Our effect size was small, according to Cohen (1988).
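Copying numbers out of the console by hand invites errors, such as writing a p-value as a long string of zeros. The values can instead be read directly from the object that `t.test()` returns. A sketch with a hypothetical helper `format_t()`, demonstrated on simulated toy data:

```r
# Build an APA-style results string directly from a t.test() result
format_t <- function(tt) {
  p <- tt$p.value
  p_txt <- if (p < .001) "p < .001" else paste("p =", sub("^0", "", sprintf("%.3f", p)))
  sprintf("t(%.1f) = %.2f, %s, 95%% CI [%.2f, %.2f]",
          unname(tt$parameter), unname(tt$statistic), p_txt,
          tt$conf.int[1], tt$conf.int[2])
}

set.seed(4)
toy <- t.test(rnorm(40, mean = 35, sd = 8), rnorm(40, mean = 30, sd = 8))
format_t(toy)
```

Applied to the real result, `format_t(t_output)` should reproduce the statistics reported above, with the p-value shown as p < .001.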
References
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.