t-Test HW

Author

Kayla Teipen

Loading Libraries

library(psych) # for the describe() command
library(car) # for the leveneTest() command
library(effsize) # for the cohen.d() command

Importing Data

# Education is the independent categorical variable (recoded below into two levels) and IOU is the dependent continuous variable.
d <- read.csv(file = "Data/mydata.csv", header = TRUE)

State Your Hypothesis - PART OF YOUR WRITEUP

People with an undergraduate degree or higher will report higher scores on the intolerance of uncertainty scale than people with less than an undergraduate degree.

Check Your Assumptions

T-test Assumptions

Data values must be independent (independent t-test only) (confirmed by data report)
Data obtained via a random sample (confirmed by data report)
IV must have two levels (will check below)
Dependent variable must be normally distributed (will check below; if there are issues, note them and proceed)
Variances of the two groups must be approximately equal, aka ‘homogeneity of variance’. Lacking this makes our results inaccurate (will check below - this really only applies to Student’s t-test, but we’ll check it anyway)

Checking IV levels

 # preview the levels and counts for your IV
 table(d$education, useNA = "always")


             1 equivalent to not completing high school 
                                                    275 
                 2 equivalent to high school completion 
                                                    271 
3 equivalent to vocational/technical program completion 
                                                     19 
                       4 equivalent to AP/IB completion 
                                                    109 
                                 5 undergraduate degree 
                                                    110 
                            6 graduate degree or higher 
                                                     84 
                                      prefer not to say 
                                                     58 
                                                   <NA> 
                                                      0

# # note that the table() output shows you exactly how the levels of your variable are written. when recoding, make sure you are spelling them exactly as they appear
# 
# # to drop levels from your variable
# # this subsets the data and says that any participant who is coded as 'LEVEL BAD' should be removed
# # if you don't need this for the homework, comment it out (add a # at the beginning of the line)
#d <- subset(d, pet != "bird")



# # to combine levels
# # this says that where any participant is coded as 'LEVEL BAD' it should be replaced by 'LEVEL GOOD'
# # you can repeat this as needed, changing 'LEVEL BAD' if you have multiple levels that you want to combine into a single level
# # if you don't need this for the homework, comment it out (add a # at the beginning of the line)
d$education_rc[d$education == "1 equivalent to not completing high school"] <- "Less than undergraduate"
d$education_rc[d$education == "2 equivalent to high school completion"] <- "Less than undergraduate"
d$education_rc[d$education == "3 equivalent to vocational/technical program completion"] <- "Less than undergraduate"
d$education_rc[d$education == "4 equivalent to AP/IB completion"] <- "Less than undergraduate"
d$education_rc[d$education == "prefer not to say"] <- "Less than undergraduate"
d$education_rc[d$education == "5 undergraduate degree"] <- "Undergraduate or higher"
d$education_rc[d$education == "6 graduate degree or higher"] <- "Undergraduate or higher"
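The same recode can also be written more compactly with %in% and ifelse(). The sketch below uses a small made-up vector so that it runs on its own; education_demo, higher_levels, and education_demo_rc are illustrative names, not part of the homework data. It is one possible approach, not a required one.

```r
# hypothetical toy data standing in for d$education
education_demo <- c("2 equivalent to high school completion",
                    "5 undergraduate degree",
                    "prefer not to say",
                    "6 graduate degree or higher")

# levels that should end up in the "Undergraduate or higher" group
higher_levels <- c("5 undergraduate degree", "6 graduate degree or higher")

# anything listed in higher_levels gets one label, everything else gets the other
# note: with this approach NA values would also be labelled "Less than undergraduate",
# so check table(..., useNA = "always") before relying on it
education_demo_rc <- ifelse(education_demo %in% higher_levels,
                            "Undergraduate or higher",
                            "Less than undergraduate")

# preview the recoded counts
table(education_demo_rc, useNA = "always")
```

Listing only the "higher" levels keeps the recode short, but it also means any level you forget to list silently lands in the other group, so the preview table is worth checking.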

# 
# # preview your changes and make sure everything is correct
table(d$education, useNA = "always")


             1 equivalent to not completing high school 
                                                    275 
                 2 equivalent to high school completion 
                                                    271 
3 equivalent to vocational/technical program completion 
                                                     19 
                       4 equivalent to AP/IB completion 
                                                    109 
                                 5 undergraduate degree 
                                                    110 
                            6 graduate degree or higher 
                                                     84 
                                      prefer not to say 
                                                     58 
                                                   <NA> 
                                                      0

table(d$education_rc, useNA = "always")


Less than undergraduate Undergraduate or higher                    <NA> 
                    732                     194                       0

# # check your variable types
str(d)

'data.frame':   926 obs. of  7 variables:
 $ age         : chr  "1 under 18" "1 under 18" "4 between 36 and 45" "4 between 36 and 45" ...
 $ education   : chr  "prefer not to say" "2 equivalent to high school completion" "5 undergraduate degree" "6 graduate degree or higher" ...
 $ pswq        : num  0.851 -1.124 1.163 -0.342 -0.127 ...
 $ iou         : num  4 1.59 3.37 1.7 1.11 ...
 $ phq         : num  3.33 1 2.33 1.11 2.33 ...
 $ rse         : num  1.6 3.9 1.7 3.9 1.8 3.5 3 3.5 2.5 3.4 ...
 $ education_rc: chr  "Less than undergraduate" "Less than undergraduate" "Undergraduate or higher" "Undergraduate or higher" ...

# 
# # make sure that your IV is recognized as a factor by R
d$education_rc <- as.factor(d$education_rc)

Testing Homogeneity of Variance with Levene’s Test

We can test whether the variances of our two groups are equal using Levene’s test. The null hypothesis is that the variances of the two groups are equal, which is the result we want. So when running Levene’s test we’re hoping for a non-significant result!

# # we do NOT want Levene's test to be statistically significant
# # Student's t-test assumes homogeneity of variance, so we use Levene's test to check that assumption
# # use the leveneTest() command from the car package to test homogeneity of variance
# # uses the same 'formula' setup that we'll use for our t-test: formula is y~x, where y is our DV and x is our IV
leveneTest(iou~education_rc, data = d)

Levene's Test for Homogeneity of Variance (center = median)
       Df F value    Pr(>F)    
group   1  34.095 7.263e-09 ***
      924                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This is more of a formality in our case, because we are using Welch’s t-test, which does not make the equal-variance assumption that Student’s t-test (the classic form of the t-test) makes. R’s t.test() command defaults to Welch’s t-test, so this doesn’t require any extra effort on our part!
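To see what the Welch default means in practice, here is a minimal sketch on simulated data. The data frame sim and its columns score and grp are made up for illustration. Setting var.equal = TRUE in t.test() requests Student's t-test; leaving the argument at its default gives Welch's.

```r
set.seed(123)  # make the simulated example reproducible

# two hypothetical groups with unequal sizes and unequal spread
sim <- data.frame(
  score = c(rnorm(80, mean = 2.6, sd = 0.9), rnorm(20, mean = 2.1, sd = 0.6)),
  grp   = factor(rep(c("A", "B"), times = c(80, 20)))
)

welch   <- t.test(score ~ grp, data = sim)                    # default: Welch's t-test
student <- t.test(score ~ grp, data = sim, var.equal = TRUE)  # Student's t-test

# Student's df is always n1 + n2 - 2
# Welch's df is estimated from the data and is never larger than that
student$parameter
welch$parameter
```

The two tests share the same difference in means; what changes is the standard error and the degrees of freedom, which is why the choice matters most when group sizes and variances are both unequal.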

Check Normality

# # you only need to check the variables you're using in the current analysis
# # although you checked them previously, it's always a good idea to look them over again and be sure that everything is correct
# 
# # you can use the describe() command on an entire dataframe (d) or just on a single variable (d$iou)
# # use it to check the skew and kurtosis of your DV
describe(d$iou)

   vars   n mean   sd median trimmed  mad  min  max range skew kurtosis   se
X1    1 926 2.54 0.89   2.41    2.48 0.99 1.04 4.89  3.85  0.5    -0.62 0.03

# 
# # can use the describeBy() command to view the means and standard deviations by group
# # it's very similar to the describe() command but splits the dataframe according to the 'group' variable
describeBy(d$iou, group=d$education_rc)


 Descriptive statistics by group 
group: Less than undergraduate
   vars   n mean   sd median trimmed  mad  min  max range skew kurtosis   se
X1    1 732 2.65 0.92   2.52     2.6 1.04 1.04 4.89  3.85 0.37     -0.8 0.03
------------------------------------------------------------ 
group: Undergraduate or higher
   vars   n mean   sd median trimmed  mad  min  max range skew kurtosis   se
X1    1 194 2.14 0.67   2.04    2.09 0.66 1.04 4.26  3.22 0.71     0.06 0.05

# 
# # also use a histogram to examine your continuous variable
hist(d$iou)

# 
# # last, use a boxplot to examine your continuous and categorical variables together
# # in the formula, the continuous DV goes on the left of the ~ and the categorical IV goes on the right
boxplot(d$iou~d$education_rc)

Issues with My Data - PART OF YOUR WRITEUP

I combined the categorical levels “equivalent to not completing high school”, “equivalent to high school completion”, “equivalent to vocational/technical program completion”, “equivalent to AP/IB completion”, and “prefer not to say” into one category, “Less than undergraduate”. I combined the levels “undergraduate degree” and “graduate degree or higher” into the other category, “Undergraduate or higher”. I tested homogeneity of variance using Levene’s test and found significant heterogeneity of variance (p < .001). As a result, Welch’s t-test will be used.

Run a T-test

# # very simple! we name the dataframe inside each variable (d$iou, d$education_rc) instead of having a separate argument for the dataframe like we did for leveneTest()
t_output <- t.test(d$iou~d$education_rc)

View Test Output

t_output


    Welch Two Sample t-test

data:  d$iou by d$education_rc
t = 8.7255, df = 408.17, p-value < 2.2e-16
alternative hypothesis: true difference in means between group Less than undergraduate and group Undergraduate or higher is not equal to 0
95 percent confidence interval:
 0.3958495 0.6260845
sample estimates:
mean in group Less than undergraduate mean in group Undergraduate or higher 
                             2.648806                              2.137839
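As a rough check, the Welch statistic and its degrees of freedom can be approximated from the group means in this output and the rounded standard deviations and group sizes reported by describeBy() earlier. This is a sketch using the standard Welch-Satterthwaite formulas, with illustrative variable names; because the standard deviations are rounded to two decimals, the results only approximately match the t.test() output.

```r
# group summaries copied from the output above (SDs are rounded to two decimals)
m1 <- 2.648806; s1 <- 0.92; n1 <- 732   # Less than undergraduate
m2 <- 2.137839; s2 <- 0.67; n2 <- 194   # Undergraduate or higher

# squared standard error contributed by each group
v1 <- s1^2 / n1
v2 <- s2^2 / n2

# Welch's t: difference in means divided by the unpooled standard error
t_welch <- (m1 - m2) / sqrt(v1 + v2)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

round(c(t = t_welch, df = df_welch), 2)
```

Both values should land close to the reported t = 8.7255 and df = 408.17; the small gap comes from the rounding of the standard deviations.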

Calculate Cohen’s d

# # once again, we use our formula to calculate cohen's d
d_output <- cohen.d(d$iou~d$education_rc)

View Effect Size

Trivial: < .2
Small: between .2 and .5
Medium: between .5 and .8
Large: > .8

d_output


Cohen's d

d estimate: 0.5876257 (medium)
95 percent confidence interval:
    lower     upper 
0.4268989 0.7483525
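The estimate above can be roughly reproduced by hand. The sketch below assumes the pooled-standard-deviation form of Cohen's d (which, as far as I know, is what cohen.d() uses by default) and plugs in the group means from the t-test output and the rounded standard deviations from describeBy(). The variable names are illustrative, and the rounding means the result will only be close to the reported estimate.

```r
# group summaries copied from earlier output (SDs are rounded to two decimals)
m1 <- 2.648806; s1 <- 0.92; n1 <- 732   # Less than undergraduate
m2 <- 2.137839; s2 <- 0.67; n2 <- 194   # Undergraduate or higher

# pooled standard deviation: weighted average of the two group variances
sd_pooled <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))

# Cohen's d: difference in means expressed in pooled-SD units
d_by_hand <- (m1 - m2) / sd_pooled

round(d_by_hand, 2)
```

The sign of d follows the order of the groups: it is positive here because the "Less than undergraduate" group, which has the higher mean, comes first.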

Write Up Results

I tested my hypothesis that people with an undergraduate degree or higher would report higher scores on the intolerance of uncertainty scale using an independent-samples t-test. I combined the categorical levels “equivalent to not completing high school”, “equivalent to high school completion”, “equivalent to vocational/technical program completion”, “equivalent to AP/IB completion”, and “prefer not to say” into one category, “Less than undergraduate”. I combined the levels “undergraduate degree” and “graduate degree or higher” into the other category, “Undergraduate or higher”. I tested homogeneity of variance using Levene’s test and found significant heterogeneity of variance (p < .001). As a result, I used Welch’s t-test, which does not assume equal variances. The data met the remaining assumptions of a t-test. I found a significant difference between the groups, t(408.17) = 8.73, p < .001, d = .59, 95% CI [0.43, 0.75] (refer to Figure 1). However, the difference was in the opposite direction from my hypothesis: participants with less than an undergraduate degree reported higher intolerance of uncertainty (M = 2.65, SD = 0.92) than participants with an undergraduate degree or higher (M = 2.14, SD = 0.67), so my hypothesis was not supported. The effect size was medium according to Cohen (1988).

References

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.