t-Test HW

Author

Michelle Karpinski

Loading Libraries

library(psych) # for the describe() command
library(car) # for the leveneTest() command
library(effsize) # for the cohen.d() command

Importing Data

#Update for homework
d <- read.csv(file="Data/mydata.csv", header=T)

State Your Hypothesis - PART OF YOUR WRITEUP

Participants who have completed a Bachelor’s degree or more will report higher levels of efficacy than those who have not.

Check Your Assumptions

T-test Assumptions

  • Data values must be independent (independent t-test only) (confirmed by data report)
  • Data obtained via a random sample (confirmed by data report)
  • IV must have two levels (will check below)
  • Dependent variable must be normally distributed (will check below. if issues, note and proceed)
  • Variances of the two groups must be approximately equal, aka ‘homogeneity of variance’. Lacking this makes our results inaccurate (will check below - this really only applies to Student’s t-test, but we’ll check it anyway)

Checking IV levels

# # preview the levels and counts for your IV
table(d$edu, useNA = "always")

     1 High school diploma or less, and NO COLLEGE 
                                                58 
                            2 Currently in college 
                                              2548 
3 Completed some college, but no longer in college 
                                                34 
                  4 Complete 2 year College degree 
                                               179 
                      5 Completed Bachelors Degree 
                                               135 
                 6 Currently in graduate education 
                                               134 
                  7 Completed some graduate degree 
                                                60 
                                              <NA> 
                                                 0 
#
# # note that the table() output shows you exactly how the levels of your variable are written. when recoding, make sure you are spelling them exactly as they appear
# 
# I did not drop levels from my variable.
# # to drop levels from your variable
# # this subsets the data and says that any participant who is coded as 'LEVEL BAD' should be removed
# # if you don't need this for the homework, comment it out (add a # at the beginning of the line)
#d <- subset(d, pet != "bird")

# # to combine levels
# # this says that where any participant is coded as 'LEVEL BAD' it should be replaced by 'LEVEL GOOD'
# # you can repeat this as needed, changing 'LEVEL BAD' if you have multiple levels that you want to combine into a single level
# # if you don't need this for the homework, comment it out (add a # at the beginning of the line)
d$edu_rc[d$edu == "1 High school diploma or less, and NO COLLEGE"] <- "No Bachelor's Degree"
d$edu_rc[d$edu == "2 Currently in college"] <- "No Bachelor's Degree"
d$edu_rc[d$edu == "3 Completed some college, but no longer in college"] <- "No Bachelor's Degree"
d$edu_rc[d$edu == "4 Complete 2 year College degree"] <- "No Bachelor's Degree"
d$edu_rc[d$edu == "5 Completed Bachelors Degree"] <- "Bachelor's Degree or More"
d$edu_rc[d$edu == "6 Currently in graduate education"] <- "Bachelor's Degree or More"
d$edu_rc[d$edu == "7 Completed some graduate degree"] <- "Bachelor's Degree or More"
#
# # preview your changes and make sure everything is correct
table(d$edu_rc, useNA = "always")

Bachelor's Degree or More      No Bachelor's Degree                      <NA> 
                      329                      2819                         0 
#
# # check your variable types
str(d)
'data.frame':   3148 obs. of  7 variables:
 $ income  : chr  "1 low" "1 low" "rather not say" "rather not say" ...
 $ edu     : chr  "2 Currently in college" "5 Completed Bachelors Degree" "2 Currently in college" "2 Currently in college" ...
 $ swb     : num  4.33 4.17 1.83 5.17 3.67 ...
 $ efficacy: num  3.4 3.4 2.2 2.8 3 2.4 2.3 3 3 3.7 ...
 $ exploit : num  2 3.67 4.33 1.67 4 ...
 $ stress  : num  3.3 3.3 4 3.2 3.1 3.5 3.3 2.4 2.9 2.7 ...
 $ edu_rc  : chr  "No Bachelor's Degree" "Bachelor's Degree or More" "No Bachelor's Degree" "No Bachelor's Degree" ...
# 
# # make sure that your IV is recognized as a factor by R
d$edu_rc <- as.factor(d$edu_rc)
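The seven line-by-line assignments above can also be written as a single membership test. The sketch below is only an illustration on a small made-up data frame (`toy` and `ba_levels` are hypothetical names, not part of the homework data). Note that `%in%` returns FALSE for missing values, so any NA in `edu` would end up in the "No Bachelor's Degree" group; that is harmless here only because the table above shows 0 NAs.

```r
# Hypothetical toy data that mimics the structure of d$edu (not the real dataset)
toy <- data.frame(
  edu = c("2 Currently in college",
          "5 Completed Bachelors Degree",
          "4 Complete 2 year College degree",
          "7 Completed some graduate degree"),
  stringsAsFactors = FALSE
)

# Levels that count as "Bachelor's Degree or More", spelled exactly as in table()
ba_levels <- c("5 Completed Bachelors Degree",
               "6 Currently in graduate education",
               "7 Completed some graduate degree")

# ifelse() picks one of two labels depending on membership in ba_levels;
# factor() stores the result as a two-level factor in one step
toy$edu_rc <- factor(ifelse(toy$edu %in% ba_levels,
                            "Bachelor's Degree or More",
                            "No Bachelor's Degree"))

# Same check as in the homework: confirm the counts and look for NAs
table(toy$edu_rc, useNA = "always")
```

The explicit version used in the homework is longer but leaves unmatched values as NA, which makes spelling mistakes easier to spot.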

Testing Homogeneity of Variance with Levene’s Test

We can test whether the variances of our two groups are equal using Levene’s test. The null hypothesis is that the variances of the two groups are equal, which is the result we want. So when running Levene’s test we’re hoping for a non-significant result!

Homogeneity Discussion - LAB ONLY

Only for explaining.

# # use the leveneTest() command from the car package to test homogeneity of variance
# # uses the same 'formula' setup that we'll use for our t-test: formula is y~x, where y is our DV and x is our IV
leveneTest(efficacy~edu_rc, data = d)
Levene's Test for Homogeneity of Variance (center = median)
        Df F value Pr(>F)
group    1  0.5544 0.4566
      3146               
# not significant! = Homogeneity!

This is more of a formality in our case, because we are using Welch’s t-test, which does not assume equal variances the way the classic Student’s t-test does. R’s t.test() runs Welch’s t-test by default, so this doesn’t require any extra effort on our part!
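To see the difference between the two tests, the sketch below runs both on simulated data (the `sim` data frame, its group labels, and the seed are made up for illustration and have nothing to do with the homework dataset). The only change between the two calls is the `var.equal` argument of `t.test()`.

```r
set.seed(1)  # hypothetical simulated data, not the homework dataset
sim <- data.frame(
  score = c(rnorm(30, mean = 3.2, sd = 0.45),
            rnorm(200, mean = 3.1, sd = 0.45)),
  group = factor(rep(c("A", "B"), times = c(30, 200)))
)

# Default call: var.equal = FALSE, so R runs Welch's t-test
welch <- t.test(score ~ group, data = sim)

# Setting var.equal = TRUE requests the pooled-variance (Student's) t-test
student <- t.test(score ~ group, data = sim, var.equal = TRUE)

welch$method       # "Welch Two Sample t-test"
welch$parameter    # Welch degrees of freedom, usually not a whole number
student$parameter  # Student degrees of freedom: n1 + n2 - 2
```

When Levene’s test is non-significant, as it is here, the two versions usually give very similar results.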

Check Normality

# # you only need to check the variables you're using in the current analysis
# # although you checked them previously, it's always a good idea to look them over again and be sure that everything is correct
# 
# # you can use the describe() command on an entire dataframe (d) or just on a single variable (d$pss)
# # use it to check the skew and kurtosis of your DV
describe(d$efficacy)
   vars    n mean   sd median trimmed  mad min max range  skew kurtosis   se
X1    1 3148 3.13 0.45    3.1    3.13 0.44 1.1   4   2.9 -0.24     0.45 0.01
#Kurtosis=0.45, Skew=-0.24
# 
# # can use the describeBy() command to view the means and standard deviations by group
# # it's very similar to the describe() command but splits the dataframe according to the 'group' variable
describeBy(d$efficacy, group=d$edu_rc)

 Descriptive statistics by group 
group: Bachelor's Degree or More
   vars   n mean   sd median trimmed  mad min max range  skew kurtosis   se
X1    1 329  3.2 0.45    3.2    3.21 0.44 1.6   4   2.4 -0.14    -0.09 0.02
------------------------------------------------------------ 
group: No Bachelor's Degree
   vars    n mean   sd median trimmed  mad min max range  skew kurtosis   se
X1    1 2819 3.12 0.45    3.1    3.12 0.44 1.1   4   2.9 -0.26     0.51 0.01
#No Bachelor's: Kurtosis=0.51, Skew=-0.26
#Bachelor's+: Kurtosis=-0.09, Skew=-0.14
# 
# # also use a histogram to examine your continuous variable
hist(d$efficacy)

# 
# # last, use a boxplot to examine your continuous and categorical variables together
boxplot(d$efficacy~d$edu_rc)

#only slight difference observable
# variable~group (categorical/IV right, continuous/DV left)

Issues with My Data - PART OF YOUR WRITEUP

We combined groups to form the two categories, “No Bachelor’s Degree” (e.g. High school diploma or less, and no college and Complete 2 year College degree) and “Bachelor’s Degree or More” (e.g. Completed Bachelors Degree and Completed some graduate degree). We also confirmed homogeneity of variance using Levene’s test (p = .457) and that our dependent variable is approximately normally distributed (skew and kurtosis between -2 and +2).

Run a T-test

# # very simple! we specify the dataframe alongside the variables instead of having a separate argument for the dataframe like we did for leveneTest()
t_output <- t.test(d$efficacy~d$edu_rc)
#categorical on right

View Test Output

t_output

    Welch Two Sample t-test

data:  d$efficacy by d$edu_rc
t = 3.3146, df = 408.36, p-value = 0.0009998
alternative hypothesis: true difference in means between group Bachelor's Degree or More and group No Bachelor's Degree is not equal to 0
95 percent confidence interval:
 0.03501674 0.13708831
sample estimates:
mean in group Bachelor's Degree or More      mean in group No Bachelor's Degree 
                               3.203647                                3.117595 
#p-value = 0.0009998
# mean in bachelor's+ = 3.20
# mean in no bachelor's = 3.12
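The non-integer degrees of freedom in the output come from the Welch-Satterthwaite approximation. As a rough sketch of what t.test() computes, the hypothetical helper `welch_t()` below reproduces the statistic and the degrees of freedom by hand; it is checked against t.test() on simulated data (`a` and `b` are made-up vectors, not the homework data).

```r
# Hypothetical helper: Welch's t statistic and Welch-Satterthwaite df
welch_t <- function(x, y) {
  vx <- var(x) / length(x)  # squared standard error of group x's mean
  vy <- var(y) / length(y)  # squared standard error of group y's mean
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  df <- (vx + vy)^2 /
    (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  c(t = t_stat, df = df)
}

set.seed(42)  # simulated groups with unequal sizes, for illustration only
a <- rnorm(40, mean = 3.2, sd = 0.45)
b <- rnorm(300, mean = 3.1, sd = 0.45)

by_hand  <- welch_t(a, b)
built_in <- t.test(a, b)  # Welch's test is the default

by_hand
c(built_in$statistic, built_in$parameter)  # should match by_hand
```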

Calculate Cohen’s d

# # once again, we use our formula to calculate cohen's d
d_output <- cohen.d(d$efficacy~d$edu_rc)

View Effect Size

  • Trivial: < .2
  • Small: between .2 and .5
  • Medium: between .5 and .8
  • Large: > .8
d_output

Cohen's d

d estimate: 0.1931292 (negligible)
95 percent confidence interval:
     lower      upper 
0.07879773 0.30746065 
#negligible (0.19)
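For intuition, Cohen’s d is the difference between the group means divided by a standard deviation. The sketch below assumes the pooled-SD definition, which to my knowledge is what cohen.d() uses by default; `cohens_d()` is a hypothetical helper written for illustration, not a function from the effsize package.

```r
# Hypothetical helper: Cohen's d using the pooled standard deviation
cohens_d <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  pooled_sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / pooled_sd
}

# Tiny made-up example: means differ by 1 and both SDs are 1, so d is -1
cohens_d(c(1, 2, 3), c(2, 3, 4))

# Rough check against the summary statistics reported above:
# both groups have sd = 0.45 (rounded), so the pooled SD is about 0.45
(3.203647 - 3.117595) / 0.45  # about 0.19
```

The small gap between this rough check and the 0.193 reported by cohen.d() comes from using the rounded SDs.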

Write Up Results

We tested our hypothesis that those with a Bachelor’s degree or more would report significantly higher efficacy than those without a Bachelor’s degree using an independent samples t-test. Our data met all of the assumptions of a t-test, and we found a significant difference, t(408.36) = 3.31, p = .001, d = .19, 95% CI of the mean difference [0.04, 0.14], in efficacy between those with a Bachelor’s degree or more (M = 3.20) and those without a Bachelor’s degree (M = 3.12) (refer to Figure 1).

Our effect size was negligible according to Cohen (1988).

References

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.