t-Test Lab

Author

Autumn Morgan

Loading Libraries

library(psych) # for the describe() command
library(car) # for the leveneTest() command
library(effsize) # for the cohen.d() command

Importing Data

d <- read.csv(file= "Data/mydata.csv", header=T)

State Your Hypothesis - PART OF YOUR WRITEUP

There is a significant difference in self-esteem (RSE scores) between male and female participants.

Check Your Assumptions

T-test Assumptions

  • Data values must be independent (independent t-test only) (confirmed by data report)
  • Data obtained via a random sample (confirmed by data report)
  • IV must have two levels (will check below)
  • Dependent variable must be normally distributed (will check below. if issues, note and proceed)
  • Variances of the two groups must be approximately equal, aka ‘homogeneity of variance’. Lacking this makes our results inaccurate (will check below - this really only applies to Student’s t-test, but we’ll check it anyway)

Checking IV levels

# # preview the levels and counts for your IV
table(d$gender, useNA = "always")

            female I use another term               male  Prefer not to say 
               764                 21                143                 11 
              <NA> 
                 0 
# 
# # note that the table() output shows you exactly how the levels of your variable are written. when recoding, make sure you are spelling them exactly as they appear
# 
# # to drop levels from your variable
# # this subsets the data and says that any participant who is coded as 'LEVEL BAD' should be removed
# # if you don't need this for the homework, comment it out (add a # at the beginning of the line)

 
# 
# # to combine levels
# # this says that where any participant is coded as 'LEVEL BAD' it should be replaced by 'LEVEL GOOD'
# # you can repeat this as needed, changing 'LEVEL BAD' if you have multiple levels that you want to combine into a single level
# # if you don't need this for the homework, comment it out (add a # at the beginning of the line)
### d$IV[d$IV == "LEVEL BAD"] <- "LEVEL GOOD"
# 
# # preview your changes and make sure everything is correct

# # check your variable types
str(d)
'data.frame':   939 obs. of  6 variables:
 $ age   : chr  "1 under 18" "1 under 18" "4 between 36 and 45" "4 between 36 and 45" ...
 $ gender: chr  "male" "female" "female" "female" ...
 $ rse   : num  1.6 3.9 1.7 3.9 1.8 3.5 3 3.5 2.5 3.4 ...
 $ pss   : num  3.75 1 3.25 2 4 1.25 2.5 2 3.75 2.75 ...
 $ phq   : num  3.33 1 2.33 1.11 2.33 ...
 $ edeq12: num  1.83 1 1.67 1.42 3.17 ...
# 
# # make sure that your IV is recognized as a factor by R
d$gender <- as.factor(d$gender)
levels(d$gender)
[1] "female"             "I use another term" "male"              
[4] "Prefer not to say" 

Testing Homogeneity of Variance with Levene’s Test

We can test whether the variances of our groups are equal using Levene’s test. The null hypothesis is that the group variances are equal, which is the result we want, so when running Levene’s test we’re hoping for a non-significant result.

# # use the leveneTest() command from the car package to test homogeneity of variance
# # uses the same 'formula' setup that we'll use for our t-test: formula is y~x, where y is our DV and x is our IV
leveneTest(rse ~ gender, data = d)
Levene's Test for Homogeneity of Variance (center = median)
       Df F value Pr(>F)
group   3  1.7716 0.1509
      935               
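
For intuition, Levene's test with center = median amounts to a one-way ANOVA on each score's absolute deviation from its own group median. The sketch below reproduces that idea in base R on simulated data. The `toy` dataframe and its column names are made up for illustration and are not part of the lab data.

```r
# Simulated two-group data (hypothetical; not the lab dataset)
set.seed(42)
toy <- data.frame(
  score = c(rnorm(40, mean = 2.6, sd = 0.7), rnorm(40, mean = 2.9, sd = 0.7)),
  group = factor(rep(c("female", "male"), each = 40))
)

# Absolute deviation of each score from its own group's median
toy$abs_dev <- abs(toy$score - ave(toy$score, toy$group, FUN = median))

# One-way ANOVA on the absolute deviations:
# the F test on 'group' plays the role of the Levene statistic
levene_by_hand <- anova(lm(abs_dev ~ group, data = toy))
levene_by_hand
```

If the car package is loaded, leveneTest(score ~ group, data = toy) should report the same F value and p-value for this toy data.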

Check Normality

# # you only need to check the variables you're using in the current analysis
# # although you checked them previously, it's always a good idea to look them over again and be sure that everything is correct
# 
# # you can use the describe() command on an entire dataframe (d) or just on a single variable (d$pss)
# # use it to check the skew and kurtosis of your DV
describe(d$rse)
   vars   n mean   sd median trimmed  mad min max range skew kurtosis   se
X1    1 939 2.65 0.71    2.7    2.67 0.74   1   4     3 -0.2    -0.75 0.02
describeBy(d$rse, group=d$gender)

 Descriptive statistics by group 
group: female
   vars   n mean   sd median trimmed  mad min max range  skew kurtosis   se
X1    1 764 2.64 0.69    2.7    2.65 0.74   1   4     3 -0.17     -0.7 0.03
------------------------------------------------------------ 
group: I use another term
   vars  n mean   sd median trimmed  mad min max range skew kurtosis  se
X1    1 21 1.79 0.47    1.9    1.78 0.44 1.1 2.8   1.7 0.09    -0.86 0.1
------------------------------------------------------------ 
group: male
   vars   n mean   sd median trimmed  mad min max range  skew kurtosis   se
X1    1 143 2.92 0.69      3    2.97 0.74 1.2   4   2.8 -0.54    -0.51 0.06
------------------------------------------------------------ 
group: Prefer not to say
   vars  n mean   sd median trimmed  mad min max range skew kurtosis   se
X1    1 11 2.05 0.73    1.8    1.98 0.74 1.2 3.5   2.3 0.58    -1.06 0.22
# 
# # can use the describeBy() command to view the means and standard deviations by group
# # it's very similar to the describe() command but splits the dataframe according to the 'group' variable
# 
# # also use a histogram to examine your continuous variable
hist(d$rse)

# 
# # last, use a boxplot to examine your continuous and categorical variables together
 boxplot(rse ~ gender, data=d)
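
To make the skew and kurtosis columns less of a black box, here is a small sketch of one common way to compute them from the moments of a variable. The helper names `skew_by_hand` and `kurtosis_by_hand` are hypothetical, and different software applies slightly different small-sample corrections, so values may not match describe() to the last decimal.

```r
# Skew: average cubed deviation from the mean, scaled by the cubed SD
skew_by_hand <- function(x) {
  x <- x[!is.na(x)]
  mean((x - mean(x))^3) / sd(x)^3
}

# Excess kurtosis: average fourth-power deviation, scaled by SD^4, minus 3
# (subtracting 3 makes a normal distribution score roughly 0)
kurtosis_by_hand <- function(x) {
  x <- x[!is.na(x)]
  mean((x - mean(x))^4) / sd(x)^4 - 3
}

x_sym   <- c(1, 2, 3, 4, 5)   # symmetric values
x_right <- c(1, 1, 1, 2, 10)  # one large value pulls the tail to the right

skew_by_hand(x_sym)      # symmetric data: skew of zero
skew_by_hand(x_right)    # right-tailed data: positive skew
kurtosis_by_hand(x_sym)  # flat, evenly spread data: negative excess kurtosis
```

Negative skew, as in the RSE output above, means the longer tail is on the low end of the scale.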

Issues with My Data - PART OF YOUR WRITEUP

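The IV (gender) has four levels, so the two smallest groups, "I use another term" (n = 21) and "Prefer not to say" (n = 11), are removed in the code below before running the t-test. Levene’s test across all four groups was non-significant, F(3, 935) = 1.77, p = .151. Skew (-0.20) and kurtosis (-0.75) for RSE are both small, so normality was not flagged as a concern.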

Run a T-test

# # the IV must have exactly two levels, so first remove participants in the two groups we are not comparing
# # then run the t-test with the same formula and data arguments that we used for leveneTest()
d <- d[d$gender != "Prefer not to say", ]
d <- d[d$gender != "I use another term", ]
t_output <- t.test(rse ~ gender, data = d, var.equal = FALSE)
levels(d$gender)
[1] "female"             "I use another term" "male"              
[4] "Prefer not to say" 
table(d$gender)

            female I use another term               male  Prefer not to say 
               764                  0                143                  0 
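
The zero counts in the table above show that subsetting removes the rows but leaves the empty levels attached to the factor. Below is a minimal sketch of clearing them with base R's droplevels(), using a small made-up dataframe (`toy`) in place of the lab data.

```r
# Hypothetical mini dataset with the same four gender labels
toy <- data.frame(
  rse = c(2.1, 3.0, 2.8, 1.9, 3.4, 2.5),
  gender = factor(c("female", "male", "female",
                    "Prefer not to say", "male", "I use another term"))
)

# Keep only the two groups being compared
toy <- toy[toy$gender %in% c("female", "male"), ]
levels(toy$gender)  # still lists all four levels

# Remove levels that no longer have any observations
toy$gender <- droplevels(toy$gender)
levels(toy$gender)  # now only the two levels in use
table(toy$gender)
```

Running droplevels() right after subsetting keeps later functions from seeing groups that no longer contain any participants.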

View Test Output

t_output

    Welch Two Sample t-test

data:  rse by gender
t = -4.4903, df = 198.65, p-value = 1.204e-05
alternative hypothesis: true difference in means between group female and group male is not equal to 0
95 percent confidence interval:
 -0.4084374 -0.1591678
sample estimates:
mean in group female   mean in group male 
            2.635079             2.918881 
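
The Welch statistic and its degrees of freedom in the output above are built from the group means, variances, and sample sizes. The sketch below recomputes both by hand on simulated data and compares them with t.test(). The `toy` dataframe is hypothetical and only stands in for the lab data.

```r
# Simulated data with unequal group sizes (hypothetical; not the lab dataset)
set.seed(7)
toy <- data.frame(
  rse = c(rnorm(50, mean = 2.6, sd = 0.7), rnorm(20, mean = 2.9, sd = 0.7)),
  gender = factor(rep(c("female", "male"), times = c(50, 20)))
)

# Group means, variances, and sizes
groups <- split(toy$rse, toy$gender)
m <- sapply(groups, mean)
v <- sapply(groups, var)
n <- sapply(groups, length)

# Each group's squared standard error of the mean
se2 <- v / n

# Welch t: difference in means over the combined standard error
t_by_hand <- (m[["female"]] - m[["male"]]) / sqrt(sum(se2))

# Welch-Satterthwaite approximation for the degrees of freedom
df_by_hand <- sum(se2)^2 / sum(se2^2 / (n - 1))

# Compare with R's built-in Welch test
welch <- t.test(rse ~ gender, data = toy, var.equal = FALSE)
c(t_by_hand = t_by_hand, t_test = unname(welch$statistic))
c(df_by_hand = df_by_hand, df_test = unname(welch$parameter))
```

Because the degrees of freedom are approximated from the two variances, they are usually not a whole number, which is why the output above reports df = 198.65.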

Calculate Cohen’s d

# # once again, we use our formula to calculate cohen's d
 d_output <- cohen.d(d$rse ~ d$gender)
Warning in cohen.d.default(d, f, subject = subject, ...): Factor with multiple
levels, using only the two actually present in data

View Effect Size

  • Trivial: < .2
  • Small: between .2 and .5
  • Medium: between .5 and .8
  • Large: > .8
d_output

Cohen's d

d estimate: NaN (NA)
95 percent confidence interval:
lower upper 
  NaN   NaN 
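
The NaN estimate above is most likely a side effect of the unused factor levels mentioned in the warning, and not a property of the data. As a cross-check, Cohen's d can be computed directly as the difference in group means divided by the pooled standard deviation. The sketch below does this in base R on a made-up dataframe (`toy`); it assumes the pooled-SD definition of d.

```r
# Hypothetical two-group data (not the lab dataset)
toy <- data.frame(
  rse = c(2.1, 2.8, 2.4, 3.0, 2.6, 3.2, 3.5, 2.9, 3.1, 3.4),
  gender = factor(rep(c("female", "male"), each = 5))
)

x1 <- toy$rse[toy$gender == "female"]
x2 <- toy$rse[toy$gender == "male"]
n1 <- length(x1)
n2 <- length(x2)

# Pooled SD: the two group variances weighted by their degrees of freedom
pooled_sd <- sqrt(((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2))

# Cohen's d: mean difference expressed in pooled-SD units
d_by_hand <- (mean(x1) - mean(x2)) / pooled_sd
d_by_hand
```

A negative d means the first group (female) scored lower, which follows the same sign convention as the t statistic. The thresholds listed above apply to the absolute value of d.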

Write Up Results

A Welch’s independent-samples t-test was conducted to compare self-esteem scores (RSE) between female and male participants. The test revealed a statistically significant difference between the two groups, t(198.65) = -4.49, p < .001, 95% CI for the difference in means [-0.41, -0.16], with female participants (M = 2.64, SD = 0.69) reporting significantly lower self-esteem than male participants (M = 2.92, SD = 0.69). Cohen’s d could not be estimated from the output above (it returned NaN), so no effect size is reported.

References

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.