t-Test Lab HW

Author

Angela Vazquez

Loading Libraries

library(psych) # for the describe() command
library(car) # for the leveneTest() command
library(effsize) # for the cohen.d() command

Importing Data

# update this file path for the homework
d <- read.csv(file="Data/mydata.csv", header=TRUE)

State Your Hypothesis - PART OF YOUR WRITEUP

Racial and ethnic minority individuals in the United States experience higher rates of generalized anxiety disorder compared to white individuals.

State your t-test hypothesis. Remember, a t-test has one continuous variable as the dependent variable, and one categorical variable with two levels as the independent variable. If your IV of choice has more than two levels, you will need to pick two levels to compare and drop the rest, or combine levels until you only have two left.

Check Your Assumptions

T-test Assumptions

  • Data values must be independent (independent t-test only) (confirmed by data report)
  • Data obtained via a random sample (confirmed by data report)
  • IV must have two levels (will check below)
  • Dependent variable must be normally distributed (will check below. if issues, note and proceed)
  • Variances of the two groups must be approximately equal, aka ‘homogeneity of variance’. Lacking this makes our results inaccurate (will check below - this really only applies to Student’s t-test, but we’ll check it anyway)

Checking IV levels

# preview the levels and counts for your IV
table(d$ethnicity, useNA = "always")

 Asian/Asian British - Indian, Pakistani, Bangladeshi, other 
                                                         119 
             Black/Black British - Caribbean, African, other 
                                                          23 
                                     Chinese/Chinese British 
                                                          10 
Middle Eastern/Middle Eastern British - Arab, Turkish, other 
                                                          11 
                                          Mixed race - other 
                                                          40 
                  Mixed race - White and Black/Black British 
                                                          20 
                                          Other ethnic group 
                                                          10 
                                           Prefer not to say 
                                                          23 
                               White - British, Irish, other 
                                                         948 
                                                        <NA> 
                                                           0 
# # note that the table() output shows you exactly how the levels of your variable are written. when recoding, make sure you spell them exactly as they appear (including spaces)
# 
# # to drop levels from your variable
# # this subsets the data and says that any participant who is coded as 'LEVEL BAD' should be removed
# # if you don't need this for the homework, comment it out (add a # at the beginning of the line)
# d <- subset(d, ethnicity != "Asian")
# d <- subset(d, ethnicity != "Black")
# d <- subset(d, ethnicity != "Mixed Race")
# d <- subset(d, ethnicity != "White")
# d <- subset(d, ethnicity != "Chinese")
# d <- subset(d, ethnicity != "Middle Eastern")
# d <- subset(d, ethnicity != "other ethic groups")


# # to combine levels
# # this says that where any participant is coded as 'LEVEL BAD' it should be replaced by 'LEVEL GOOD'
# # you can repeat this as needed, changing 'LEVEL BAD' if you have multiple levels that you want to combine into a single level
# # if you don't need this for the homework, comment it out (add a # at the beginning of the line)
d$ethnicity_rc[d$ethnicity == "Asian/Asian British - Indian, Pakistani, Bangladeshi, other"] <- "POC"
d$ethnicity_rc[d$ethnicity == "Black/Black British - Caribbean, African, other"] <- "POC"
d$ethnicity_rc[d$ethnicity == "Chinese/Chinese British"] <- "POC"
d$ethnicity_rc[d$ethnicity == "Middle Eastern/Middle Eastern British - Arab, Turkish, other"] <- "POC"
d$ethnicity_rc[d$ethnicity == "Mixed race - other"] <- "POC"
d$ethnicity_rc[d$ethnicity == "Mixed race - White and Black/Black British "] <- "POC"
d$ethnicity_rc[d$ethnicity == " Other ethnic group"] <- "POC"
d$ethnicity_rc[d$ethnicity == "White - British, Irish, other"] <- "White"

table(d$ethnicity_rc, useNA = "always")  

  POC White  <NA> 
  203   948    53 
#table(d$ethnicity, d$ethnicity_rc, useNA = "always") 

# # check your variable types
str(d)
'data.frame':   1204 obs. of  7 variables:
 $ edeq12      : num  1.58 1.83 1 1.67 1.42 ...
 $ isolation   : num  2.25 3.5 1 2.5 1.75 2 1.25 1 1.25 3 ...
 $ support     : num  2.5 2.17 5 2.5 3.67 ...
 $ gad         : num  1.86 3.86 1.14 2 1.43 ...
 $ ethnicity   : chr  "White - British, Irish, other" "White - British, Irish, other" "White - British, Irish, other" "White - British, Irish, other" ...
 $ education   : chr  "6 graduate degree or higher" "prefer not to say" "2 equivalent to high school completion" "5 undergraduate degree" ...
 $ ethnicity_rc: chr  "White" "White" "White" "White" ...
# # make sure that your IV is recognized as a factor by R

d$ethnicity_rc <- as.factor(d$ethnicity_rc)
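
A note on the recode above: table(d$ethnicity_rc) reports 53 missing values, while only 23 participants chose "Prefer not to say". One possible explanation is that a comparison string did not match its level exactly, for example because of a leading or trailing space. The sketch below shows one way to guard against that: trim whitespace, recode with %in%, and cross-tabulate old against new. The toy vector, its counts, and the names eth_clean and poc_levels are made up for illustration and are not the homework data.

```r
# Hypothetical miniature of the ethnicity column; counts are NOT the real data.
ethnicity <- c(
  rep("White - British, Irish, other", 5),
  rep("Mixed race - other", 2),
  rep("Other ethnic group", 2),
  rep("Prefer not to say", 1)
)

# Labels to collapse into one level. The second one has a stray leading
# space on purpose; trimws() removes it so the match still succeeds.
poc_levels <- trimws(c("Mixed race - other", " Other ethnic group"))

# Trim the data as well, then recode with %in% instead of repeated ==.
eth_clean <- trimws(ethnicity)
ethnicity_rc <- rep(NA_character_, length(eth_clean))
ethnicity_rc[eth_clean %in% poc_levels] <- "POC"
ethnicity_rc[eth_clean == "White - British, Irish, other"] <- "White"

# Convert to a factor so R treats it as a grouping variable.
ethnicity_rc <- factor(ethnicity_rc, levels = c("POC", "White"))

# Cross-tabulate old against new: each original level should land where intended.
print(table(eth_clean, ethnicity_rc, useNA = "always"))

# Only the level left unmapped on purpose should remain NA.
n_na <- sum(is.na(ethnicity_rc))
n_expected_na <- sum(eth_clean == "Prefer not to say")
stopifnot(n_na == n_expected_na)
```

If the final check fails on real data, the cross-tab shows which original level ended up in the NA column.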

Testing Homogeneity of Variance with Levene’s Test

We can test whether the variances of our two groups are equal using Levene’s test. The null hypothesis is that the two groups have equal variances, which is the result we want. So when running Levene’s test, we’re hoping for a non-significant result!

# # use the leveneTest() command from the car package to test homogeneity of variance
# # uses the same 'formula' setup that we'll use for our t-test: formula is y~x, where y is our DV and x is our IV
leveneTest(gad~ethnicity_rc, data = d)
Levene's Test for Homogeneity of Variance (center = median)
        Df F value Pr(>F)
group    1  0.2339 0.6287
      1149               

This is more of a formality in our case, because we are using Welch’s t-test, which, unlike Student’s t-test (the classic version), does not assume equal variances. R’s t.test() defaults to Welch’s t-test, so this doesn’t require any extra effort on our part!
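
To make the test less of a black box: with center = median (as shown in the output header above), Levene’s test amounts to a one-way ANOVA on each observation’s absolute deviation from its group median. The sketch below reproduces that logic in base R on simulated data. The names dv and grp and all simulated values are placeholders, not the homework data.

```r
# Simulated stand-in data: two groups of different sizes (hypothetical values).
set.seed(1)
dv  <- c(rnorm(30, mean = 2, sd = 1), rnorm(40, mean = 2, sd = 1))
grp <- factor(rep(c("A", "B"), times = c(30, 40)))

# Step 1: absolute deviation of each score from its own group's median.
abs_dev <- abs(dv - ave(dv, grp, FUN = median))

# Step 2: one-way ANOVA on those deviations; the F test is the Levene test.
lev <- anova(lm(abs_dev ~ grp))
print(lev)

# Extract the p-value; a value above .05 gives no evidence of unequal variances.
levene_p <- lev[["Pr(>F)"]][1]
```

The degrees of freedom follow the same pattern as in the output above: number of groups minus one, and number of non-missing cases minus number of groups.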

Check Normality

# you only need to check the variables you're using in the current analysis
# although you checked them previously, it's always a good idea to look them over again and be sure that everything is correct

# you can use the describe() command on an entire dataframe (d) or just on a single variable (d$gad)
# use it to check the skew and kurtosis of your DV
describe(d$gad)
   vars    n mean   sd median trimmed  mad min max range skew kurtosis   se
X1    1 1204 2.05 0.91   1.71    1.96 0.85   1   4     3 0.67    -0.74 0.03
# can use the describeBy() command to view the means and standard deviations by group
# it's very similar to the describe() command but splits the dataframe according to the 'group' variable
describeBy(d$gad, group=d$ethnicity_rc)

 Descriptive statistics by group 
group: POC
   vars   n mean   sd median trimmed  mad min max range skew kurtosis   se
X1    1 203 2.03 0.92   1.71    1.94 0.85   1   4     3 0.66    -0.78 0.06
------------------------------------------------------------ 
group: White
   vars   n mean   sd median trimmed  mad min max range skew kurtosis   se
X1    1 948 2.04 0.91   1.71    1.95 0.85   1   4     3 0.69     -0.7 0.03
# also use a histogram to examine your continuous variable
hist(d$gad)

# last, use a boxplot to examine your continuous and categorical variables together
# formula is DV ~ IV: continuous DV on the left, categorical IV on the right
boxplot(d$gad~d$ethnicity_rc)
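
The skew and kurtosis columns can also be reproduced by hand. The sketch below uses one common definition (third and fourth central moments scaled by the sample standard deviation, with 3 subtracted for excess kurtosis). describe() supports several variants through its type argument, so an exact match with its output is an assumption. The function name skew_kurt and the example vectors are hypothetical.

```r
# Skew and excess kurtosis from central moments (one common definition).
skew_kurt <- function(x) {
  x <- x[!is.na(x)]          # drop missing values first
  n <- length(x)
  m <- mean(x)
  s <- sd(x)
  c(skew     = sum((x - m)^3) / (n * s^3),
    kurtosis = sum((x - m)^4) / (n * s^4) - 3)
}

# Toy vectors (not the homework data): one symmetric, one with a long right tail.
symmetric    <- c(1, 2, 3, 4, 5)
right_skewed <- c(1, 1, 1, 2, 10)

res_sym  <- skew_kurt(symmetric)
res_skew <- skew_kurt(right_skewed)
print(round(res_sym, 2))
print(round(res_skew, 2))

# The rule of thumb used in this lab: both values within -2 to +2.
within_range <- all(abs(res_sym) <= 2)
```

A symmetric variable has skew near zero, and a long right tail pushes skew positive, which is the direction seen for gad above.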

Issues with My Data - PART OF YOUR WRITEUP

One issue with my data was that the ethnicity variable had more than two levels. To get a two-level IV, I combined all racial and ethnic minority groups into a single category (POC) and kept white individuals as the second category. Fifty-three participants had no value on the recoded variable and were therefore left out of the group comparison. Before proceeding with the analysis, I checked the t-test assumptions. Levene’s test indicated no significant heterogeneity of variance (p = 0.6287); Welch’s t-test was used regardless, because it does not assume equal variances. The dependent variable was treated as approximately normal, with skewness (0.67) and kurtosis (-0.74) falling within the acceptable range of -2 to +2.

Run a T-test

# very simple! we specify the dataframe alongside the variables instead of having a separate argument for the dataframe like we did for leveneTest()
t_output <- t.test(d$gad~d$ethnicity_rc)

View Test Output

t_output

    Welch Two Sample t-test

data:  d$gad by d$ethnicity_rc
t = -0.16777, df = 291.96, p-value = 0.8669
alternative hypothesis: true difference in means between group POC and group White is not equal to 0
95 percent confidence interval:
 -0.1519099  0.1280455
sample estimates:
  mean in group POC mean in group White 
           2.032372            2.044304 
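
For reference, the t and df values in this output come from Welch’s formulas: the difference in group means divided by the square root of the summed per-group variance/n terms, with the Welch-Satterthwaite approximation for the degrees of freedom. The sketch below checks a hand calculation against t.test() on simulated data. The data frame d_sim and its columns are hypothetical. It also uses the data = argument, which t.test() accepts as an alternative to writing d$ in front of each variable.

```r
# Simulated stand-in data (hypothetical; not the homework data).
set.seed(42)
d_sim <- data.frame(
  score = c(rnorm(40, mean = 2.0, sd = 0.9), rnorm(120, mean = 2.1, sd = 0.9)),
  group = factor(rep(c("A", "B"), times = c(40, 120)))
)

# Welch's t-test is the default (var.equal = FALSE).
fit <- t.test(score ~ group, data = d_sim)

# Per-group mean, variance, and sample size.
m <- tapply(d_sim$score, d_sim$group, mean)
v <- tapply(d_sim$score, d_sim$group, var)
n <- tapply(d_sim$score, d_sim$group, length)

# Squared standard error contributed by each group.
se2 <- as.numeric(v / n)

# Welch's t: mean difference (first level minus second) over the combined SE.
t_manual <- as.numeric(m["A"] - m["B"]) / sqrt(sum(se2))

# Welch-Satterthwaite degrees of freedom (usually not a whole number).
df_manual <- sum(se2)^2 / sum(se2^2 / (as.numeric(n) - 1))

# Both should agree with what t.test() reports.
stopifnot(isTRUE(all.equal(t_manual, unname(fit$statistic))))
stopifnot(isTRUE(all.equal(df_manual, unname(fit$parameter))))
```

The sign of t depends on the order of the factor levels: the first level's mean minus the second's, which is why t is negative above when the POC mean is slightly lower than the White mean.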

Calculate Cohen’s d

# once again, we use our formula to calculate cohen's d
d_output <- cohen.d(d$gad~d$ethnicity_rc)

View Effect Size

  • Trivial: < .2
  • Small: between .2 and .5
  • Medium: between .5 and .8
  • Large: > .8
d_output

Cohen's d

d estimate: -0.01310285 (negligible)
95 percent confidence interval:
     lower      upper 
-0.1648407  0.1386350 
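
As a rough check on this estimate, Cohen’s d can be computed from the group descriptives as the mean difference divided by the pooled standard deviation. The sketch below assumes that is how cohen.d() computes its default estimate, and it plugs in the rounded SDs from describeBy() above, so it only approximates the reported value. The function names cohens_d_pooled and label_effect are hypothetical helpers, and the cutoffs mirror the list above.

```r
# Cohen's d from summary statistics, using the pooled standard deviation.
cohens_d_pooled <- function(m1, s1, n1, m2, s2, n2) {
  sd_pooled <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
  (m1 - m2) / sd_pooled
}

# Verbal label based on the absolute size of d (cutoffs from the list above).
label_effect <- function(d) {
  size <- abs(d)
  if (size < 0.2) {
    "trivial"
  } else if (size < 0.5) {
    "small"
  } else if (size < 0.8) {
    "medium"
  } else {
    "large"
  }
}

# Group means from the t-test output; SDs (rounded) and ns from describeBy().
d_approx <- cohens_d_pooled(m1 = 2.032372, s1 = 0.92, n1 = 203,
                            m2 = 2.044304, s2 = 0.91, n2 = 948)
print(round(d_approx, 3))
print(label_effect(d_approx))
```

The sign of d follows the same level ordering as the t statistic, so a negative d here means the POC mean is slightly below the White mean.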

Write Up Results

We tested our hypothesis that racial and ethnic minority individuals in the United States experience higher rates of generalized anxiety disorder compared to white individuals using an independent samples t-test. While our data met all the assumptions for a t-test, the results did not reveal a significant difference between POC participants (M = 2.03, SD = 0.92) and white participants (M = 2.04, SD = 0.91), t(291.96) = -0.17, p = 0.867, d = -0.01, 95% CI [-0.16, 0.14] (see Figure 1). According to Cohen (1988), the effect size was negligible.

References

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.