t-Test HW

Author

Elliot Wilson

Loading Libraries

library(psych) # for the describe() command
library(car) # for the leveneTest() command
library(effsize) # for the cohen.d() command

Importing Data

#UPDATE THIS FOR HW
d <- read.csv(file = "Data/eammi2_data_final.csv", header = TRUE)

State Your Hypothesis - PART OF YOUR WRITEUP

Emerging adults will have higher “need to belong” scores than young adults.

Check Your Assumptions

T-test Assumptions

Data values must be independent (independent t-test only) (confirmed by data report)
Data obtained via a random sample (confirmed by data report)
IV must have two levels (will check below)
Dependent variable must be normally distributed (will check below. if issues, note and proceed)
Variances of the two groups must be approximately equal, aka ‘homogeneity of variance’. Lacking this makes our results inaccurate (will check below - this really only applies to Student’s t-test, but we’ll check it anyway)

Checking IV levels

# preview the levels and counts for your IV
table(d$age, useNA = "always")


1 between 18 and 25 2 between 26 and 35 3 between 36 and 45           4 over 45 
               1997                 116                  38                  18 
               <NA> 
               1013

# # note that the table() output shows you exactly how the levels of your variable are written. when recoding, make sure you are spelling them exactly as they appear
# 
# # to drop levels from your variable
# # this subsets the data and says that any participant who is coded as 'LEVEL BAD' should be removed
# # if you don't need this for the homework, comment it out (add a # at the beginning of the line)
# d <- subset(d, pet != "bird")
# d <- subset(d, pet != "cat and dog")
# d <- subset(d, pet != "fish")
# d <- subset(d, pet != "multiple types of pet")
# d <- subset(d, pet != "no pets")
# d <- subset(d, pet != "other")

d$age_rc[d$age == "2 between 26 and 35"] <- "emerging adult"
d$age_rc[d$age == "1 between 18 and 25"] <- "young adult"
d <- subset(d, age != "3 between 36 and 45")
d <- subset(d, age != "4 over 45")


table(d$age_rc, useNA = "always")


emerging adult    young adult           <NA> 
           116           1997              0
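Note that the 1,013 missing values in the first table are gone after the subset() calls. The sketch below uses a small made-up data frame (the `toy` object and its values are assumptions, not the homework data) to show why: subset() keeps only rows where the condition is TRUE, so rows where the comparison evaluates to NA are dropped as well.

```r
# Toy data (hypothetical values, not the homework dataset)
toy <- data.frame(
  id  = 1:5,
  age = c("1 between 18 and 25", "2 between 26 and 35",
          "3 between 36 and 45", NA, "1 between 18 and 25")
)

# subset() keeps only rows where the condition is TRUE;
# for the NA row, age != "..." evaluates to NA, so that row is dropped too
kept <- subset(toy, age != "3 between 36 and 45")

# Alternative that keeps the missing values explicitly
kept_with_na <- subset(toy, is.na(age) | age != "3 between 36 and 45")

nrow(kept)          # rows with a non-missing age other than the dropped level
nrow(kept_with_na)  # the same rows plus the row with missing age
```

For a t-test the rows with a missing grouping variable cannot be used anyway, so the default behavior is usually what you want, but it is worth knowing that the sample size shrinks for this reason.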

# # to combine levels
# # this says that where any participant is coded as 'LEVEL BAD' it should be replaced by 'LEVEL GOOD'
# # you can repeat this as needed, changing 'LEVEL BAD' if you have multiple levels that you want to combine into a single level
# # if you don't need this for the homework, comment it out (add a # at the beginning of the line)
# d$mhealth_rc[d$mhealth == "anxiety disorder"] <- "mental health diagnosis"

table(d$belong, useNA = "always")


 1.3  1.4  1.5  1.6  1.7  1.8  1.9    2  2.1  2.2  2.3  2.4  2.5  2.6  2.7  2.8 
   2    3    6    8   11   12   19   17   25   36   32   56   56   88   71   98 
 2.9    3  3.1  3.2  3.3  3.4  3.5  3.6  3.7  3.8  3.9    4  4.1  4.2  4.3  4.4 
 116  135  121  144  147  124  130  134  116   95   70   62   61   49   25   19 
 4.5  4.6  4.7  4.9    5 <NA> 
   9   11    1    1    1    2

# # preview your changes and make sure everything is correct
table(d$age_rc, useNA = "always")


emerging adult    young adult           <NA> 
           116           1997              0

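# # mhealth_rc is not created in this analysis (the recode above is commented out), so this preview is skipped
# table(d$mhealth_rc, useNA = "always")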

# 
# check your variable types
str(d)

'data.frame':   2113 obs. of  28 variables:
 $ ResponseId      : chr  "R_BJN3bQqi1zUMid3" "R_2TGbiBXmAtxywsD" "R_12G7bIqN2wB2N65" "R_39pldNoon8CePfP" ...
 $ gender          : chr  "f" "m" "m" "f" ...
 $ race_rc         : chr  "white" "white" "white" "other" ...
 $ age             : chr  "1 between 18 and 25" "1 between 18 and 25" "1 between 18 and 25" "1 between 18 and 25" ...
 $ income          : chr  "1 low" "1 low" "rather not say" "rather not say" ...
 $ edu             : chr  "2 Currently in college" "5 Completed Bachelors Degree" "2 Currently in college" "2 Currently in college" ...
 $ sibling         : chr  "at least one sibling" "at least one sibling" "at least one sibling" "at least one sibling" ...
 $ party_rc        : chr  "democrat" "independent" "apolitical" "apolitical" ...
 $ disability      : chr  NA NA "psychiatric" NA ...
 $ marriage5       : chr  "are currently divorced from one another" "are currently married to one another" "are currently married to one another" "are currently married to one another" ...
 $ phys_sym        : chr  "high number of symptoms" "high number of symptoms" "high number of symptoms" "high number of symptoms" ...
 $ pipwd           : num  NA NA 2.33 NA NA ...
 $ moa_independence: num  3.67 3.67 3.5 3 3.83 ...
 $ moa_role        : num  3 2.67 2.5 2 2.67 ...
 $ moa_safety      : num  2.75 3.25 3 1.25 2.25 2.5 4 3.25 2.75 3.5 ...
 $ moa_maturity    : num  3.67 3.33 3.67 3 3.67 ...
 $ idea            : num  3.75 3.88 3.75 3.75 3.5 ...
 $ swb             : num  4.33 4.17 1.83 5.17 3.67 ...
 $ mindful         : num  2.4 1.8 2.2 2.2 3.2 ...
 $ belong          : num  2.8 4.2 3.6 4 3.4 4.2 3.9 3.6 2.9 2.5 ...
 $ efficacy        : num  3.4 3.4 2.2 2.8 3 2.4 2.3 3 3 3.7 ...
 $ support         : num  6 6.75 5.17 5.58 6 ...
 $ socmeduse       : int  47 23 34 35 37 13 37 43 37 29 ...
 $ usdream         : chr  "american dream is important and achievable for me" "american dream is important and achievable for me" "american dream is not important and maybe not achievable for me" "american dream is not important and maybe not achievable for me" ...
 $ npi             : num  0.6923 0.1538 0.0769 0.0769 0.7692 ...
 $ exploit         : num  2 3.67 4.33 1.67 4 ...
 $ stress          : num  3.3 3.3 4 3.2 3.1 3.5 3.3 2.4 2.9 2.7 ...
 $ age_rc          : chr  "young adult" "young adult" "young adult" "young adult" ...

# 
# # make sure that your IV is recognized as a factor by R
d$age_rc <- as.factor(d$age_rc)
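The order of the factor levels matters later, because t.test() reports the difference as the first level minus the second. A minimal sketch, using made-up labels that mirror the recode above, of how as.factor() orders levels and how the order could be set by hand if a different direction is preferred:

```r
# Hypothetical labels mirroring the recode above
age_rc <- c("young adult", "emerging adult", "young adult")

# as.factor() sorts the levels alphabetically by default
f_default <- as.factor(age_rc)
levels(f_default)

# factor() with explicit levels puts the chosen group first
f_custom <- factor(age_rc, levels = c("young adult", "emerging adult"))
levels(f_custom)
```

With the default order, "emerging adult" comes first, so a negative t-value means the emerging adult mean is the lower one.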

Testing Homogeneity of Variance with Levene’s Test

We can test whether the variances of our two groups are equal using Levene’s test. The null hypothesis is that the two groups have equal variances, which is the result we want. So when running Levene’s test we’re hoping for a non-significant result!

# # use the leveneTest() command from the car package to test homogeneity of variance
# # uses the same 'formula' setup that we'll use for our t-test: formula is y~x, where y is our DV and x is our IV
leveneTest(belong~age_rc, data = d)

Levene's Test for Homogeneity of Variance (center = median)
        Df F value  Pr(>F)  
group    1  3.7295 0.05359 .
      2109                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This is more of a formality in our case, because we are using Welch’s t-test, which, unlike the classic Student’s t-test, does not assume equal variances. R’s t.test() command defaults to Welch’s t-test, so this doesn’t require any extra effort on our part!
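To see what the median-centered Levene’s test is doing, it can be written out in base R as a one-way ANOVA on each score’s absolute deviation from its group median. This is only a sketch: `levene_median` is a hypothetical helper name (not part of the car package) and the data are made up.

```r
# Median-centered Levene test written out in base R.
# `levene_median` is a hypothetical helper name, not part of the car package.
levene_median <- function(y, g) {
  g <- as.factor(g)
  keep <- !is.na(y) & !is.na(g)
  y <- y[keep]
  g <- droplevels(g[keep])
  # absolute deviation of each score from its own group's median
  z <- abs(y - ave(y, g, FUN = median))
  # one-way ANOVA on those deviations
  tab <- anova(lm(z ~ g))
  list(F = tab[["F value"]][1], p = tab[["Pr(>F)"]][1])
}

# Toy data (made up): same spread vs. very different spread
same_spread <- levene_median(c(1, 2, 3, 4, 5, 11, 12, 13, 14, 15),
                             rep(c("a", "b"), each = 5))
diff_spread <- levene_median(c(1:10 / 10, 1:10 * 10),
                             rep(c("a", "b"), each = 10))
same_spread$p   # large p-value: no evidence the spreads differ
diff_spread$p   # small p-value: the spreads differ
```

A large p-value here means there is no evidence that the group variances differ, which is the outcome we hope for.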

Check Normality

# # you only need to check the variables you're using in the current analysis
# # although you checked them previously, it's always a good idea to look them over again and be sure that everything is correct
# 
# # you can use the describe() command on an entire dataframe (d) or just on a single variable (d$belong)
# # use it to check the skew and kurtosis of your DV
describe(d$belong)

   vars    n mean   sd median trimmed  mad min max range  skew kurtosis   se
X1    1 2111 3.22 0.61    3.2    3.24 0.59 1.3   5   3.7 -0.29    -0.07 0.01

# 
# # can use the describeBy() command to view the means and standard deviations by group
# # it's very similar to the describe() command but splits the dataframe according to the 'group' variable
describeBy(d$belong, group=d$age_rc)


 Descriptive statistics by group 
group: emerging adult
   vars   n mean   sd median trimmed  mad min max range skew kurtosis   se
X1    1 116 2.86 0.65    2.8    2.86 0.74 1.4 4.3   2.9 0.03     -0.8 0.06
------------------------------------------------------------ 
group: young adult
   vars    n mean  sd median trimmed  mad min max range  skew kurtosis   se
X1    1 1995 3.24 0.6    3.3    3.26 0.59 1.3   5   3.7 -0.29        0 0.01

# 
# # also use a histogram to examine your continuous variable
hist(d$belong)

# 
# # last, use a boxplot to examine your continuous and categorical variables together
boxplot(d$belong~d$age_rc)
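For intuition about the skew and kurtosis columns, here is a rough base R sketch of one common moment-based definition. `skew_kurt` is a hypothetical helper, and packages differ slightly in the exact formula they use, so small differences from the describe() output are possible.

```r
# Moment-based skew and excess kurtosis in base R.
# `skew_kurt` is a hypothetical helper, not a psych function.
skew_kurt <- function(x) {
  x <- x[!is.na(x)]
  dev <- x - mean(x)
  s <- sd(x)
  c(skew = mean(dev^3) / s^3,
    kurtosis = mean(dev^4) / s^4 - 3)  # excess kurtosis: near 0 for normal data
}

skew_kurt(c(1, 2, 3, 4, 5))        # symmetric toy data: skew of 0
skew_kurt(c(1, 1, 1, 2, 2, 3, 9))  # long right tail: positive skew
```

Values close to zero on both measures, like those reported above for belong, suggest the distribution is roughly symmetric with tails similar to a normal curve.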

Issues with My Data - PART OF YOUR WRITEUP

To make the data suitable for a t-test, I dropped two levels of the “age” variable (“3 between 36 and 45” and “4 over 45”) so that the independent variable had only two levels. Dropping these levels also removed the 1,013 participants with missing age values. For clarity, I recoded “2 between 26 and 35” as “emerging adult” and “1 between 18 and 25” as “young adult”, as in the recode above.

Before proceeding with the analysis, I checked the t-test assumptions. The skew (-0.29) and kurtosis (-0.07) of the dependent variable were small. Levene’s test was not significant, F(1, 2109) = 3.73, p = .054, so the homogeneity of variance assumption was not violated, although only by a slim margin. To be safe, Welch’s t-test will be used.

Run a T-test

# # very simple! we specify the dataframe alongside the variables instead of having a separate argument for the dataframe like we did for leveneTest()
t_output <- t.test(d$belong~d$age_rc)

View Test Output

t_output


    Welch Two Sample t-test

data:  d$belong by d$age_rc
t = -6.1259, df = 126.47, p-value = 1.056e-08
alternative hypothesis: true difference in means between group emerging adult and group young adult is not equal to 0
95 percent confidence interval:
 -0.5020736 -0.2568962
sample estimates:
mean in group emerging adult    mean in group young adult 
                    2.862069                     3.241554
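As a check on where these numbers come from, the Welch t statistic and its degrees of freedom can be approximately reconstructed from the group summaries. This is a sketch under the assumption that the rounded standard deviations from the describeBy() output are close enough; because of that rounding, the results will match the t.test() output only to about one decimal place.

```r
# Summary statistics copied from the describeBy() and t.test() output above
m1 <- 2.862069; s1 <- 0.65; n1 <- 116    # emerging adult
m2 <- 3.241554; s2 <- 0.60; n2 <- 1995   # young adult

# squared standard error of each group mean
v1 <- s1^2 / n1
v2 <- s2^2 / n2

# Welch t statistic: difference in means over the unpooled standard error
t_stat <- (m1 - m2) / sqrt(v1 + v2)

# Welch-Satterthwaite approximation for the degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# two-sided p-value
p_val <- 2 * pt(-abs(t_stat), df = df_welch)

round(c(t = t_stat, df = df_welch), 2)
```

The degrees of freedom land near the size of the smaller group rather than the total sample because most of the uncertainty comes from the 116 emerging adults.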

Calculate Cohen’s d

# # once again, we use our formula to calculate cohen's d
d_output <- cohen.d(d$belong~d$age_rc)

View Effect Size

Trivial: < .2
Small: between .2 and .5
Medium: between .5 and .8
Large: > .8

d_output


Cohen's d

d estimate: -0.6328007 (medium)
95 percent confidence interval:
     lower      upper 
-0.8210734 -0.4445281
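The estimate can be approximately reproduced by hand, assuming the pooled-standard-deviation definition of Cohen’s d and the rounded standard deviations from the describeBy() output. The `label_d` helper is hypothetical and simply applies the thresholds listed above.

```r
# Summary statistics copied from the output above (SDs are rounded)
m1 <- 2.862069; s1 <- 0.65; n1 <- 116    # emerging adult
m2 <- 3.241554; s2 <- 0.60; n2 <- 1995   # young adult

# pooled standard deviation across the two groups
s_pooled <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))

# Cohen's d: mean difference in pooled-SD units
d_est <- (m1 - m2) / s_pooled

# hypothetical helper applying the thresholds listed above
label_d <- function(d) {
  a <- abs(d)
  if (a < 0.2) "trivial" else if (a < 0.5) "small" else if (a < 0.8) "medium" else "large"
}

round(d_est, 2)
label_d(d_est)
```

The sign of d only reflects which group is listed first; the size label depends on the absolute value.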

Write Up Results

Write up your results. Again, make sure to maintain an appropriate tone, and follow APA guidelines for reporting statistical results. I recommend following the below outline:

Briefly restate your hypothesis
Describe any issues with your data (you can copy/paste from above, just make sure everything flows).
Report your results. Remember to include means of your groups, your t-value, your degrees of freedom, your p-value, your d-value, and your confidence interval.
If your test is significant, interpret your effect size (trivial, small, medium, or large) and include the citation.
Make sure to include a reference to Figure 1 (created using the code below)

My hypothesis was that emerging adults would have a higher need to belong than young adults. Ultimately, my hypothesis was not supported. To make the data suitable for a t-test, I dropped two levels of the “age” variable (“3 between 36 and 45” and “4 over 45”) so that the independent variable had only two levels, and recoded “2 between 26 and 35” as “emerging adult” and “1 between 18 and 25” as “young adult” for clarity. Levene’s test was not significant, F(1, 2109) = 3.73, p = .054, but because the margin was slim, Welch’s t-test was used.

Emerging adults (M = 2.86, SD = 0.65) reported a lower need to belong than young adults (M = 3.24, SD = 0.60), Welch’s t(126.47) = -6.13, p < .001, 95% CI of the mean difference [-0.50, -0.26]. The effect size was medium, d = -0.63, 95% CI [-0.82, -0.44] (Cohen, 1988).
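R prints very small p-values in scientific notation, while APA style reports exact p-values to three decimals without a leading zero and uses “p < .001” below that. A small sketch of a formatting helper (`fmt_p` is a hypothetical name, not from any package used here):

```r
# Hypothetical helper: format a p-value in APA style
fmt_p <- function(p) {
  if (p < .001) return("p < .001")
  # three decimals, then strip the leading zero
  paste0("p = ", sub("^0", "", sprintf("%.3f", p)))
}

fmt_p(1.056e-08)  # p-value from the t-test output
fmt_p(0.05359)    # p-value from Levene's test
```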

References

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.