t-Test Lab

Author

Cora Craft

Loading Libraries

library(psych) # for the describe() command
library(car) # for the leveneTest() command
library(effsize) # for the cohen.d() command

Importing Data

# Update THIS FOR HOMEWORK - INSERT HW DATA
d <- read.csv(file = "Data/eammi2_data_final.csv", header = TRUE)

State Your Hypothesis - PART OF YOUR WRITEUP

We hypothesize that there will be a significant difference in stress levels between Black and White participants.

Check Your Assumptions

T-test Assumptions

Data values must be independent (independent t-test only) (confirmed by data report)
Data obtained via a random sample (confirmed by data report)
IV must have two levels (will check below)
Dependent variable must be normally distributed (will check below; if there are issues, note them and proceed)
Variances of the two groups must be approximately equal, aka ‘homogeneity of variance’. Lacking this makes our results inaccurate (will check below - this really only applies to Student’s t-test, but we’ll check it anyway)

Checking IV levels

# # preview the levels and counts for your IV
 table(d$race_rc, useNA = "always")


      asian       black    hispanic multiracial  nativeamer       other 
        210         249         286         293          12          97 
      white        <NA> 
       2026           9

# 
# # note that the table() output shows you exactly how the levels of your variable are written. when recoding, make sure you are spelling them exactly as they appear
# 
# # to drop levels from your variable
# # this subsets the data and says that any participant who is coded as 'LEVEL BAD' should be removed
# # if you don't need this for the homework, comment it out (add a # at the beginning of the line)
d <- subset(d, race_rc != "asian")
d <- subset(d, race_rc != "hispanic")
d <- subset(d, race_rc != "multiracial")
d <- subset(d, race_rc != "nativeamer")
d <- subset(d, race_rc != "other")
 
table(d$edu, useNA = "always")


     1 High school diploma or less, and NO COLLEGE 
                                                41 
                            2 Currently in college 
                                              1834 
3 Completed some college, but no longer in college 
                                                16 
                  4 Complete 2 year College degree 
                                               120 
                      5 Completed Bachelors Degree 
                                               111 
                 6 Currently in graduate education 
                                               112 
                  7 Completed some graduate degree 
                                                41 
                                              <NA> 
                                                 0
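The repeated subset() calls above can also be written as a single filter that names the levels to keep. The sketch below uses a small made-up data frame (toy, keep_levels, and toy_two are hypothetical names, not part of the lab data) so it runs on its own.

```r
# Toy data standing in for d; only the race_rc column matters here
toy <- data.frame(
  race_rc = c("asian", "black", "white", "other", NA, "white"),
  stress  = c(3.1, 2.9, 3.4, 2.7, 3.0, 3.3)
)

# Name the levels to keep instead of dropping the others one by one
keep_levels <- c("black", "white")
toy_two <- subset(toy, race_rc %in% keep_levels)

# Rows with a missing race_rc are dropped too, because NA %in% keep_levels is FALSE
table(toy_two$race_rc, useNA = "always")
```

Either approach removes participants with a missing race_rc, which is why the <NA> count falls to 0 after subsetting.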

# # to combine levels
# # this says that where any participant is coded as 'LEVEL BAD' it should be replaced by 'LEVEL GOOD'
# # you can repeat this as needed, changing 'LEVEL BAD' if you have multiple levels that you want to combine into a single level
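# # if you don't need this for the homework, comment it out (add a # at the beginning of the line)
# # NOTE: == needs an exact match. the labels below leave off the numeric prefixes shown in the table() output above (e.g. "2 Currently in college"), and "Completed 2 year college degree" is not spelled like "4 Complete 2 year College degree"
# # so none of these lines match any rows, and edu_rc stays NA for every participant (see the tables that follow). edu_rc is not used in the t-test, so the analysis is unaffected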
 d$edu_rc[d$edu == "High school diploma or less"] <- "Education Level"
 d$edu_rc[d$edu == "Currently in college"] <- "Education Level"
 d$edu_rc[d$edu == "Completed some college, but no longer in college"] <- "Education Level"
 d$edu_rc[d$edu == "Completed 2 year college degree"] <- "Education Level"
 d$edu_rc[d$edu == "Completed Bachelors Degree"] <- "Education Level"
 d$edu_rc[d$edu == "Currently in graduate education"] <- "Education Level"
 d$edu_rc[d$edu == "Completed some graduate degree"] <- "Education Level"
d$edu_rc[d$edu == "None or NA"] <- "No education level"
# 
 table(d$edu_rc, useNA = "always")


<NA> 
2275

table(d$edu,d$edu_rc, useNA = "always")

                                                    
                                                     <NA>
  1 High school diploma or less, and NO COLLEGE        41
  2 Currently in college                             1834
  3 Completed some college, but no longer in college   16
  4 Complete 2 year College degree                    120
  5 Completed Bachelors Degree                        111
  6 Currently in graduate education                   112
  7 Completed some graduate degree                     41
  <NA>                                                  0
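The cross-tab above shows every participant with a missing edu_rc, which is what happens when the recode strings do not match the stored labels exactly. Here is a minimal sketch of the difference, using a made-up data frame and hypothetical new labels ("in college", "finished a degree") chosen only for illustration.

```r
# Toy data using labels copied exactly from the table() output
toy <- data.frame(
  edu = c("2 Currently in college",
          "5 Completed Bachelors Degree",
          "2 Currently in college",
          "7 Completed some graduate degree"),
  stringsAsFactors = FALSE
)

# A label without the numeric prefix matches no rows, so nothing is recoded
toy$edu_bad <- NA_character_
toy$edu_bad[toy$edu == "Currently in college"] <- "in college"

# The full label matches as intended; %in% combines several levels at once
toy$edu_rc <- NA_character_
toy$edu_rc[toy$edu == "2 Currently in college"] <- "in college"
toy$edu_rc[toy$edu %in% c("5 Completed Bachelors Degree",
                          "7 Completed some graduate degree")] <- "finished a degree"

# Cross-tab the old and new variables to confirm every level landed somewhere
table(toy$edu, toy$edu_rc, useNA = "always")
```

A cross-tab like this is a quick check after any recode: any count left in the <NA> column marks a level that was not matched.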

# # preview your changes and make sure everything is correct
table(d$race_rc, useNA = "always")


black white  <NA> 
  249  2026     0

table(d$edu_rc, useNA = "always")


<NA> 
2275

# 
# # check your variable types
 str(d)

'data.frame':   2275 obs. of  28 variables:
 $ ResponseId      : chr  "R_BJN3bQqi1zUMid3" "R_2TGbiBXmAtxywsD" "R_12G7bIqN2wB2N65" "R_1QiKb2LdJo1Bhvv" ...
 $ gender          : chr  "f" "m" "m" "m" ...
 $ race_rc         : chr  "white" "white" "white" "white" ...
 $ age             : chr  "1 between 18 and 25" "1 between 18 and 25" "1 between 18 and 25" "1 between 18 and 25" ...
 $ income          : chr  "1 low" "1 low" "rather not say" "2 middle" ...
 $ edu             : chr  "2 Currently in college" "5 Completed Bachelors Degree" "2 Currently in college" "2 Currently in college" ...
 $ sibling         : chr  "at least one sibling" "at least one sibling" "at least one sibling" "at least one sibling" ...
 $ party_rc        : chr  "democrat" "independent" "apolitical" "apolitical" ...
 $ disability      : chr  NA NA "psychiatric" NA ...
 $ marriage5       : chr  "are currently divorced from one another" "are currently married to one another" "are currently married to one another" "are currently married to one another" ...
 $ phys_sym        : chr  "high number of symptoms" "high number of symptoms" "high number of symptoms" "low number of symptoms" ...
 $ pipwd           : num  NA NA 2.33 NA NA ...
 $ moa_independence: num  3.67 3.67 3.5 3.83 3.5 ...
 $ moa_role        : num  3 2.67 2.5 2.67 3.33 ...
 $ moa_safety      : num  2.75 3.25 3 2.25 2.5 4 3.25 3 3.75 3 ...
 $ moa_maturity    : num  3.67 3.33 3.67 3.67 4 ...
 $ idea            : num  3.75 3.88 3.75 3.5 3.25 ...
 $ swb             : num  4.33 4.17 1.83 3.67 4 ...
 $ mindful         : num  2.4 1.8 2.2 3.2 3.4 ...
 $ belong          : num  2.8 4.2 3.6 3.4 4.2 3.9 3.6 4.1 3.1 2.4 ...
 $ efficacy        : num  3.4 3.4 2.2 3 2.4 2.3 3 3.3 3 3.1 ...
 $ support         : num  6 6.75 5.17 6 4.5 ...
 $ socmeduse       : int  47 23 34 37 13 37 43 35 27 37 ...
 $ usdream         : chr  "american dream is important and achievable for me" "american dream is important and achievable for me" "american dream is not important and maybe not achievable for me" "not sure if american dream important" ...
 $ npi             : num  0.6923 0.1538 0.0769 0.7692 0.2308 ...
 $ exploit         : num  2 3.67 4.33 4 1.33 ...
 $ stress          : num  3.3 3.3 4 3.1 3.5 3.3 2.4 3.5 2.5 3.5 ...
 $ edu_rc          : chr  NA NA NA NA ...

# 
# # make sure that your IV is recognized as a factor by R
d$race_rc <- as.factor(d$race_rc)
d$edu_rc <- as.factor(d$edu_rc)

Testing Homogeneity of Variance with Levene’s Test

We can test whether the variances of our two groups are equal using Levene’s test. The null hypothesis is that the variances of the two groups are equal, which is the result we want. So when running Levene’s test we’re hoping for a non-significant result!

# # use the leveneTest() command from the car package to test homogeneity of variance
# # uses the same 'formula' setup that we'll use for our t-test: formula is y~x, where y is our DV and x is our IV
leveneTest(stress~race_rc, data = d)

Levene's Test for Homogeneity of Variance (center = median)
        Df F value   Pr(>F)   
group    1  9.1578 0.002504 **
      2271                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
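To see what the "center = median" version of Levene’s test is doing, it can be reproduced in base R as a one-way ANOVA on each score’s absolute deviation from its own group median. The sketch below uses simulated data (sim, abs_dev, and the group labels are made up), so its numbers will not match the output above.

```r
set.seed(1)

# Simulated scores: group "b" is built to be more spread out than group "a"
sim <- data.frame(
  group = factor(rep(c("a", "b"), times = c(40, 60))),
  y     = c(rnorm(40, mean = 3, sd = 0.5), rnorm(60, mean = 3, sd = 1.0))
)

# Absolute deviation of each score from its own group's median
sim$abs_dev <- abs(sim$y - ave(sim$y, sim$group, FUN = median))

# One-way ANOVA on those deviations; the F test for 'group' is the Levene statistic
levene_tab <- anova(lm(abs_dev ~ group, data = sim))
levene_F <- levene_tab[["F value"]][1]
levene_p <- levene_tab[["Pr(>F)"]][1]

levene_tab
```

A small p-value here means the typical distance from the median differs between groups, which is another way of saying the spreads are unequal.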

This is more of a formality in our case, because we are using Welch’s t-test, which does not make the equal-variance assumption that Student’s t-test (the classic version) does. Our Levene’s test above is significant (p = .003), so the two groups’ variances differ and Welch’s test is the appropriate choice. R’s t.test() defaults to Welch’s t-test, so this doesn’t require any extra effort on our part!
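The difference between the two tests is controlled by the var.equal argument of t.test(). Here is a short sketch on simulated data (g1 and g2 are made-up groups with deliberately unequal spread and size) comparing the default call with the Student’s version.

```r
set.seed(2)

# Two simulated groups with unequal variances and unequal sample sizes
g1 <- rnorm(30,  mean = 3.0, sd = 0.5)
g2 <- rnorm(200, mean = 3.0, sd = 1.0)

welch   <- t.test(g1, g2)                    # default: var.equal = FALSE
student <- t.test(g1, g2, var.equal = TRUE)  # classic pooled-variance test

welch$method        # names the test that was run
student$parameter   # Student's df is n1 + n2 - 2 = 228
welch$parameter     # Welch's df is adjusted downward and is usually not a whole number
```

A fractional df in the output, like the 329.02 reported later in this lab, is a quick sign that Welch’s test was used.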

Check Normality

# you only need to check the variables you're using in the current analysis
# although you checked them previously, it's always a good idea to look them over again and be sure that everything is correct

# you can use the describe() command on an entire dataframe (d) or just on a single variable (d$stress)
# use it to check the skew and kurtosis of your DV
describe(d$stress)

   vars    n mean  sd median trimmed  mad min max range skew kurtosis   se
X1    1 2273 3.04 0.6      3    3.04 0.59 1.3 4.7   3.4 0.04    -0.25 0.01

# can use the describeBy() command to view the means and standard deviations by group
# it's very similar to the describe() command but splits the dataframe according to the 'group' variable
describeBy(d$stress, group=d$race_rc)


 Descriptive statistics by group 
group: black
   vars   n mean   sd median trimmed  mad min max range skew kurtosis   se
X1    1 248 3.08 0.54    3.1    3.06 0.44 1.7 4.7     3 0.25     0.19 0.03
------------------------------------------------------------ 
group: white
   vars    n mean   sd median trimmed  mad min max range skew kurtosis   se
X1    1 2025 3.04 0.61      3    3.04 0.59 1.3 4.7   3.4 0.02     -0.3 0.01

# also use a histogram to examine your continuous variable
hist(d$stress)

# last, use a boxplot to examine your continuous and categorical variables together
# in the formula, the categorical / IV goes to the right of the ~ and the continuous / DV goes to the left
boxplot(d$stress~d$race_rc)
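For a sense of what the skew and kurtosis columns measure, both can be computed by hand from the third and fourth powers of the deviations from the mean. The helper below (skew_kurt is a hypothetical name) is a sketch assuming the simple moment-based definitions; describe() may apply a slightly different small-sample correction, so tiny differences from its output are possible.

```r
# Moment-based skew and excess kurtosis (0 for a normal distribution)
skew_kurt <- function(x) {
  x   <- x[!is.na(x)]     # drop missing values first
  n   <- length(x)
  dev <- x - mean(x)      # deviations from the mean
  s   <- sd(x)
  c(skew     = sum(dev^3) / (n * s^3),
    kurtosis = sum(dev^4) / (n * s^4) - 3)
}

sym        <- c(1, 2, 3, 4, 5)    # symmetric, so skew is 0
right_tail <- c(1, 1, 1, 2, 10)   # one large value pulls the skew positive

skew_kurt(sym)
skew_kurt(right_tail)
```

Values between -2 and +2, the cutoff used in this lab, indicate the variable is close enough to normal to proceed.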

Issues with My Data - PART OF YOUR WRITEUP

We dropped participants who identified with racial groups other than “Black” and “White.” Levene’s test was significant (p = .003), so the assumption of homogeneity of variance was not met; because Welch’s t-test does not assume equal variances, we proceeded with it. Our dependent variable is approximately normally distributed (skew and kurtosis between -2 and +2).

Run a T-test

# # very simple! we specify the dataframe alongside the variables instead of having a separate argument for the dataframe like we did for leveneTest()
 t_output <- t.test(d$stress~d$race_rc)

View Test Output

 t_output


    Welch Two Sample t-test

data:  d$stress by d$race_rc
t = 0.95005, df = 329.02, p-value = 0.3428
alternative hypothesis: true difference in means between group black and group white is not equal to 0
95 percent confidence interval:
 -0.03755085  0.10769860
sample estimates:
mean in group black mean in group white 
           3.076210            3.041136
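The numbers in this output can be approximately reproduced by hand from the group means printed above and the standard deviations and sample sizes from describeBy(). This is a sketch of the standard Welch formulas; because the SDs are rounded to two decimals, the results agree with t.test() only to about two decimal places.

```r
# Summary statistics taken from the output earlier in this lab
m1 <- 3.076210; s1 <- 0.54; n1 <- 248    # Black participants
m2 <- 3.041136; s2 <- 0.61; n2 <- 2025   # White participants

# Each group's contribution to the squared standard error of the difference
v1 <- s1^2 / n1
v2 <- s2^2 / n2
se_diff <- sqrt(v1 + v2)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat   <- (m1 - m2) / se_diff
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Two-sided p-value and 95% confidence interval for the mean difference
p_value <- 2 * pt(-abs(t_stat), df = df_welch)
ci <- (m1 - m2) + c(-1, 1) * qt(0.975, df = df_welch) * se_diff

round(c(t = t_stat, df = df_welch, p = p_value, lower = ci[1], upper = ci[2]), 3)
```

Note that the confidence interval describes the difference in means (Black minus White), not the effect size; it includes 0, which is consistent with the non-significant p-value.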

Calculate Cohen’s d

# # once again, we use our formula to calculate cohen's d
 d_output <- cohen.d(d$stress~d$race_rc)

View Effect Size

Trivial: < .2
Small: between .2 and .5
Medium: between .5 and .8
Large: > .8

d_output


Cohen's d

d estimate: 0.0581756 (negligible)
95 percent confidence interval:
      lower       upper 
-0.07376446  0.19011567
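Cohen’s d is the mean difference divided by the pooled standard deviation, so it can also be approximated from the summary statistics reported earlier. The sketch below assumes the usual pooled-SD formula and uses the rounded SDs from describeBy(); label_d is a hypothetical helper that applies the cutoffs listed above.

```r
# Summary statistics taken from the output earlier in this lab
m1 <- 3.076210; s1 <- 0.54; n1 <- 248    # Black participants
m2 <- 3.041136; s2 <- 0.61; n2 <- 2025   # White participants

# Pooled standard deviation, weighting each group's variance by its df
sd_pooled <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))

# Cohen's d: standardized mean difference
d_est <- (m1 - m2) / sd_pooled

# Label an effect size using the cutoffs from this lab
label_d <- function(d) {
  size <- abs(d)   # the sign only shows direction, not magnitude
  if (size < 0.2) {
    "trivial"
  } else if (size < 0.5) {
    "small"
  } else if (size < 0.8) {
    "medium"
  } else {
    "large"
  }
}

round(d_est, 2)
label_d(d_est)
```

The hand calculation lands very close to the cohen.d() estimate, and both fall in the trivial range.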

Write Up Results

We tested the hypothesis that stress levels would differ significantly between Black and White participants. We dropped participants who did not identify as “Black” or “White.” Levene’s test was significant (p = .003), so we could not assume homogeneity of variance; our dependent variable was approximately normally distributed (skew and kurtosis between -2 and +2). We therefore used Welch’s independent-samples t-test, which does not assume equal variances. The test revealed no significant difference in stress levels between Black participants (M = 3.08, SD = 0.54) and White participants (M = 3.04, SD = 0.61), t(329.02) = 0.95, p = .343, d = 0.06, 95% CI of the mean difference [-0.04, 0.11] (refer to Figure 1). Our hypothesis was not supported.

References

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.