t-Test Lab
Loading Libraries
library(psych) # for the describe() command
library(car) # for the leveneTest() command
library(effsize) # for the cohen.d() command
Importing Data
# UPDATE THIS FOR HW
d <- read.csv(file="Data/mydata.csv", header = TRUE)
State Your Hypothesis - PART OF YOUR WRITEUP
Female participants will report higher levels of social support than male participants.
Check Your Assumptions
T-test Assumptions
- Data values must be independent (independent t-test only) (confirmed by data report)
- Data obtained via a random sample (confirmed by data report)
- IV must have two levels (will check below)
- Dependent variable must be normally distributed (will check below; if there are issues, note them and proceed)
- Variances of the two groups must be approximately equal, aka ‘homogeneity of variance’. Violating this assumption can make the results of Student’s t-test inaccurate (will check below; this really only applies to Student’s t-test, but we’ll check it anyway)
Checking IV levels
# preview the levels and counts for your IV
table(d$gender, useNA = "always")
female  I use another term  male  Prefer not to say  <NA>
  1011                  31   199                 21     0
# note that the table() output shows you exactly how the levels of your variable are written. when recoding, make sure you are spelling them exactly as they appear
# to drop levels from your variable
# this subsets the data and says that any participant who is coded as 'LEVEL BAD' should be removed
# # if you don't need this for the homework, comment it out (add a # at the beginning of the line)
d <- subset(d, gender != "Prefer not to say")
d <- subset(d, gender != "I use another term")
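If you need to drop several levels at once, one alternative is to keep only the rows whose value is in an allowed set using `%in%`. If the variable is already a factor, `subset()` removes the rows but leaves the empty levels behind, so `droplevels()` is needed as well. This is a minimal sketch on a hypothetical toy dataframe (`toy`, `keep` and the values are made up); with your own data you would apply the same idea to `d$gender`.

```r
# Hypothetical toy data standing in for the lab dataframe `d`
toy <- data.frame(
  gender  = c("female", "male", "Prefer not to say", "female", "I use another term"),
  support = c(3.5, 4.0, 2.5, 3.0, 4.5)
)

# Suppose gender was already converted to a factor: it starts with 4 levels
toy$gender <- factor(toy$gender)

# Keep only rows whose gender is one of the allowed levels
# (spelling must match the table() output exactly)
keep <- c("female", "male")
toy <- subset(toy, gender %in% keep)

# subset() removes the rows but not the unused factor levels, so drop them explicitly
toy$gender <- droplevels(toy$gender)

table(toy$gender, useNA = "always")
```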
table(d$relationship_status, useNA = "always")
In a relationship/married and cohabiting    286
In a relationship/married but living apart   96
Prefer not to say                            76
Single, divorced or widowed                  42
Single, never married                       710
<NA>                                          0
# to combine levels
# this says that where any participant is coded as 'LEVEL BAD' it should be replaced by 'LEVEL GOOD'
# you can repeat this as needed, changing 'LEVEL BAD' if you have multiple levels that you want to combine into a single level
# # if you don't need this for the homework, comment it out (add a # at the beginning of the line)
d$relationship_status_ab[d$relationship_status == "In a relationship/married and cohabiting"] <- "taken"
d$relationship_status_ab[d$relationship_status == "In a relationship/married but living apart"] <- "taken"
d$relationship_status_ab[d$relationship_status == "Prefer not to say"] <- "not taken"
d$relationship_status_ab[d$relationship_status == "Single, divorced or widowed"] <- "not taken"
d$relationship_status_ab[d$relationship_status == "Single, never married"] <- "not taken"
table(d$relationship_status_ab, useNA = "always")
not taken taken <NA>
828 382 0
table(d$relationship_status, d$relationship_status_ab, useNA = "always")
not taken taken <NA>
In a relationship/married and cohabiting 0 286 0
In a relationship/married but living apart 0 96 0
Prefer not to say 76 0 0
Single, divorced or widowed 42 0 0
Single, never married 710 0 0
<NA> 0 0 0
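The line-by-line recoding above can also be written as a single vectorized step with `ifelse()` and `%in%`. This is a sketch under the assumption that every level not listed as "taken" should become "not taken"; the vector `status` and the name `taken_levels` are hypothetical stand-ins for `d$relationship_status`.

```r
# Hypothetical example values; in the lab these come from d$relationship_status
status <- c("In a relationship/married and cohabiting",
            "Prefer not to say",
            "Single, never married",
            "In a relationship/married but living apart",
            "Single, divorced or widowed")

# Levels that should be collapsed into "taken"; everything else becomes "not taken"
taken_levels <- c("In a relationship/married and cohabiting",
                  "In a relationship/married but living apart")

status_ab <- ifelse(status %in% taken_levels, "taken", "not taken")

# Cross-tabulate original against recoded values to confirm the mapping
table(status, status_ab, useNA = "always")
```

One design difference worth knowing: the line-by-line approach leaves any unmatched value as `NA`, whereas this version sends every value not in `taken_levels` (including misspellings and `NA`) to "not taken". That makes the cross-tabulation check especially important.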
# # preview your changes and make sure everything is correct
table(d$gender, useNA = "always")
female male <NA>
1011 199 0
table(d$relationship_status_ab, useNA = "always")
not taken taken <NA>
828 382 0
#
# # check your variable types
str(d)
'data.frame': 1210 obs. of 7 variables:
$ big5_con : num 6 3.33 5.33 5.67 6 ...
$ big5_neu : num 6 6.67 4 4 2.67 ...
$ isolation : num 2.25 3.5 1 2.5 1.75 2 1.25 1 3 1.25 ...
$ support : num 2.5 2.17 5 2.5 3.67 ...
$ relationship_status : chr "In a relationship/married and cohabiting" "Prefer not to say" "Prefer not to say" "In a relationship/married and cohabiting" ...
$ gender : chr "female" "male" "female" "female" ...
$ relationship_status_ab: chr "taken" "not taken" "not taken" "taken" ...
#
# # make sure that your IV is recognized as a factor by R
d$gender <- as.factor(d$gender)
d$relationship_status_ab <- as.factor(d$relationship_status_ab)
Testing Homogeneity of Variance with Levene’s Test
We can test whether the variances of our two groups are equal using Levene’s test. The null hypothesis is that the two groups have equal variances, which is the result we want. So when running Levene’s test we’re hoping for a non-significant result!
# # use the leveneTest() command from the car package to test homogeneity of variance
# # uses the same 'formula' setup that we'll use for our t-test: formula is y~x, where y is our DV and x is our IV
leveneTest(support~gender, data = d)
Levene's Test for Homogeneity of Variance (center = median)
        Df F value Pr(>F)  
group    1  5.2675 0.0219 *
      1208                 
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
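To see what the test is doing, the median-centred version reported above (`center = median`) can be sketched in base R as a one-way ANOVA on each score's absolute deviation from its own group's median; that is why the output shows 1 and 1208 degrees of freedom (two groups, 1210 participants). The function name `levene_median` and the simulated data are hypothetical; we assume, but have not checked here, that it reproduces `car::leveneTest()` with its default settings.

```r
# Median-centred Levene's test (Brown-Forsythe variant), written in base R
levene_median <- function(y, group) {
  group <- factor(group)
  # absolute deviation of each score from its own group's median
  abs_dev <- abs(y - ave(y, group, FUN = median))
  # one-way ANOVA on those deviations; its F test is the Levene statistic
  fit <- anova(lm(abs_dev ~ group))
  c(F = fit[["F value"]][1], df1 = fit$Df[1], df2 = fit$Df[2], p = fit[["Pr(>F)"]][1])
}

set.seed(1)
y <- c(rnorm(50, sd = 1), rnorm(30, sd = 2))   # simulated data with unequal spread
g <- rep(c("a", "b"), times = c(50, 30))
levene_median(y, g)
```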
Check Normality
# you only need to check the variables you're using in the current analysis
# although you checked them previously, it's always a good idea to look them over again and be sure that everything is correct
# you can use the describe() command on an entire dataframe (d) or just on a single variable (d$support)
# use it to check the skew and kurtosis of your DV
describe(d$support)
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 1210 3.6 0.94 3.67 3.65 0.99 1 5 4 -0.45 -0.54 0.03
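For reference, a minimal sketch of how skew and kurtosis can be computed by hand and checked against the -2 to +2 rule of thumb used in this lab. It assumes the "type 3" definitions (third and fourth central moments divided by the sample SD cubed and to the fourth, with 3 subtracted from kurtosis), which we believe `psych::describe()` uses by default; `skew_kurt` and `roughly_normal` are hypothetical helper names.

```r
# Skew and excess kurtosis using the "type 3" formulas (assumed to match psych's default)
skew_kurt <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  s <- sd(x)                          # sample SD (n - 1 denominator)
  skew <- mean((x - m)^3) / s^3
  kurt <- mean((x - m)^4) / s^4 - 3   # excess kurtosis: 0 for a normal distribution
  c(skew = skew, kurtosis = kurt)
}

# Rule of thumb used in this lab: both values between -2 and +2
roughly_normal <- function(x) all(abs(skew_kurt(x)) <= 2)

skew_kurt(c(1, 2, 3, 4, 5))   # skew = 0, kurtosis = -1.912
```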
# can use the describeBy() command to view the means and standard deviations by group
# it's very similar to the describe() command but splits the dataframe according to the 'group' variable
describeBy(d$support, group=d$gender)
Descriptive statistics by group
group: female
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 1011 3.59 0.95 3.67 3.65 0.99 1 5 4 -0.44 -0.59 0.03
------------------------------------------------------------
group: male
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 199 3.63 0.86 3.67 3.68 0.99 1 5 4 -0.48 -0.34 0.06
# also use a histogram to examine your continuous variable
hist(d$support)
# last, use a boxplot to examine your continuous and categorical variables together
# in the formula, the categorical IV goes on the right of the ~ and the continuous DV goes on the left
boxplot(d$support~d$gender)
Issues with My Data - PART OF YOUR WRITEUP
Briefly describe any issues with your data and how you’ve resolved them. For instance, if you are using a gender variable that has three levels, you should say which level you dropped, or which two levels you combined, for your analysis. This should be written in an appropriate scientific tone.
A note that might be helpful: the opposite of ‘homogeneity of variance’ (the thing we want) is ‘heterogeneity of variance’ (the thing we don’t want). So, you could say something like this, if needed:
“Before proceeding with analysis, we checked all t-test assumptions. Levene’s test found significant heterogeneity of variance (p = .###). As a result, Welch’s t-test will be used.”
We dropped participants who reported a gender other than female or male (i.e., “I use another term” and “Prefer not to say”). Levene’s test found significant heterogeneity of variance (p = .022), so we used Welch’s t-test, which does not assume equal variances. Our dependent variable was approximately normally distributed (skew and kurtosis between -2 and +2).
Run a T-test
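# # very simple! we specify the dataframe alongside the variables (d$...) instead of having a separate argument for the dataframe like we did for leveneTest()
# # note: t.test() runs Welch's t-test by default (var.equal = FALSE), which does not assume equal variances
t_output <- t.test(d$support~d$gender)
View Test Output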
t_output
Welch Two Sample t-test
data: d$support by d$gender
t = -0.5969, df = 300.84, p-value = 0.551
alternative hypothesis: true difference in means between group female and group male is not equal to 0
95 percent confidence interval:
-0.17480874 0.09344246
sample estimates:
mean in group female mean in group male
3.592483 3.633166
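The fractional degrees of freedom (300.84) come from the Welch-Satterthwaite approximation. A minimal sketch, with hypothetical names (`welch_t`) and simulated scores rather than the lab data, that computes the statistic by hand and compares it with `t.test()`. Plugging in the rounded group summaries reported earlier (SDs of 0.95 and 0.86, ns of 1011 and 199) gives roughly 301, close to the reported value once rounding is taken into account.

```r
# Welch's t statistic and Welch-Satterthwaite degrees of freedom, computed by hand
welch_t <- function(x, y) {
  vx <- var(x) / length(x)   # squared standard error of the first mean
  vy <- var(y) / length(y)   # squared standard error of the second mean
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  df_w   <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  c(t = t_stat, df = df_w)
}

set.seed(42)
x <- rnorm(40, mean = 3.6, sd = 0.95)  # simulated scores, not the lab data
y <- rnorm(25, mean = 3.6, sd = 0.85)

welch_t(x, y)
t.test(x, y)   # var.equal = FALSE by default, so this is also Welch's test
```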
Calculate Cohen’s d
# # once again, we use our formula to calculate cohen's d
d_output <- cohen.d(d$support~d$gender)
View Effect Size
- Trivial: < .2
- Small: between .2 and .5
- Medium: between .5 and .8
- Large: > .8
d_output
Cohen's d
d estimate: -0.04334107 (negligible)
95 percent confidence interval:
lower upper
-0.1955016 0.1088195
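As a cross-check on the output above and the cut-offs listed earlier, here is a sketch of Cohen's d computed with a pooled standard deviation, plus a helper that labels |d|. We assume the pooled-SD formula is what `cohen.d()` uses by default; with the rounded group summaries it gives about -0.043, consistent with the estimate above. The names `cohens_d` and `label_d` are hypothetical, and a value of exactly .8 is treated as large here since the listed cut-offs leave that boundary ambiguous.

```r
# Cohen's d with a pooled standard deviation
cohens_d <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  pooled_sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / pooled_sd
}

# Label |d| using the cut-offs listed in this lab
label_d <- function(d) {
  a <- abs(d)
  if (a < 0.2) "trivial" else if (a < 0.5) "small" else if (a < 0.8) "medium" else "large"
}

cohens_d(c(1, 2, 3), c(2, 3, 4))   # -1
label_d(-0.04334107)               # "trivial"
```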
Write Up Results
Write up your results. Again, make sure to maintain an appropriate tone, and follow APA guidelines for reporting statistical results. I recommend following the below outline:
- Briefly restate your hypothesis
- Describe any issues with your data (you can copy/paste from above, just make sure everything flows).
- Report your results. Remember to include means of your groups, your t-value, your degrees of freedom, your p-value, your d-value, and your confidence interval.
- If your test is significant, interpret your effect size (trivial, small, medium, or large) and include the citation.
- Make sure to include a reference to Figure 1 (created using the code below)
“We tested our hypothesis that female participants would report significantly higher levels of social support than male participants using an independent-samples t-test. Because Levene’s test found significant heterogeneity of variance, we used Welch’s t-test. We did not find a significant difference in social support between female (M = 3.59, SD = 0.95) and male (M = 3.63, SD = 0.86) participants, t(300.84) = -0.60, p = .551, d = -0.04, 95% CI of the mean difference [-0.17, 0.09] (see Figure 1).”
If needed: “Our effect size was large according to Cohen (1988).”
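To reduce copy errors when transcribing numbers into the write-up, a sketch that formats the fields of a `t.test()` result into an APA-style string (two decimals for t, df and the CI; p to three decimals with no leading zero). The helpers `format_p` and `apa_t` are hypothetical, and the exact formatting conventions are an assumption you should check against your APA guide; with the lab data you would call `apa_t(t_output)`.

```r
# Format a p-value APA-style: no leading zero, "p < .001" for very small values
format_p <- function(p) {
  if (p < .001) return("p < .001")
  paste0("p = ", sub("^0", "", sprintf("%.3f", p)))
}

# Build an APA-style result string from an htest object returned by t.test()
apa_t <- function(tt) {
  sprintf("t(%.2f) = %.2f, %s, 95%% CI [%.2f, %.2f]",
          tt$parameter, tt$statistic, format_p(tt$p.value),
          tt$conf.int[1], tt$conf.int[2])
}

set.seed(7)
apa_t(t.test(rnorm(30), rnorm(30)))   # simulated example
```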
References
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. New York, NY: Routledge Academic.