ANOVA HW

Author

Kate Curtis

Loading Libraries

library(afex) # to run the ANOVA and plot results
library(psych) # for the describe() command
library(ggplot2) # to visualize our results
library(expss) # for the cross_cases() command
library(car) # for the leveneTest() command
library(emmeans) # for posthoc tests

Importing Data

d <- read.csv(file="Data/mydata.csv", header=T)
# 
# # new code! this adds a column with a number for each row. it acts as an identifier for our participations
d$row_id <- 1:nrow(d)

State Your Hypothesis - PART OF YOUR WRITEUP

Note: You can chose to run either a one-way ANOVA (a single IV with more than 3 levels) or a two-way/factorial ANOVA (at least two IVs) for the homework. You will need to specify your hypothesis and customize your code based on the choice you make. I will run both versions of the test here for illustrative purposes.

Women will feel a greater need to belong than men or those who do not identify as male or female.

Check Your Assumptions

ANOVA Assumptions

Independence of observations (confirmed by data report)
All levels of the IVs should have equal number of cases (ideally; in the real world, this varies) and there should be no empty cells. Cells with low numbers increase chance of Type II error. (we will check this below)
Homogeneity of variance should be assured (we will check this below)
Outliers should be identified and removed (we will check this below)
DV should be normally distributed for each level of the IV (we will check this below)

Check levels of IVs

# # for a one-way ANOVA
# # COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
table(d$gender, useNA = "always")


   f    m   nb <NA> 
1582  543   31    0

# 

# 
# # preview your changes and make sure everything is correct
table(d$gender, useNA = "always")


   f    m   nb <NA> 
1582  543   31    0

# 
# # check your variable types
str(d)

'data.frame':   2156 obs. of  7 variables:
 $ gender   : chr  "f" "m" "m" "f" ...
 $ age      : chr  "1 between 18 and 25" "1 between 18 and 25" "1 between 18 and 25" "1 between 18 and 25" ...
 $ swb      : num  4.33 4.17 1.83 5.17 3.67 ...
 $ mindful  : num  2.4 1.8 2.2 2.2 3.2 ...
 $ belong   : num  2.8 4.2 3.6 4 3.4 4.2 3.9 3.6 2.9 2.5 ...
 $ socmeduse: int  47 23 34 35 37 13 37 43 37 29 ...
 $ row_id   : int  1 2 3 4 5 6 7 8 9 10 ...

# 
# # make sure that your IV is recognized as a factor by R
d$gender <- as.factor(d$gender)

Check homogeneity of variance

# # use the leveneTest() command from the car package to test homogeneity of variance
# # uses the 'formula' setup: formula is y~x1*x2, where y is our DV and x1 is our first IV and x2 is our second IV
# 
# # for a one-way ANOVA
# # COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
#non significant is good 
leveneTest(belong~gender, data = d)

Levene's Test for Homogeneity of Variance (center = median)
        Df F value  Pr(>F)  
group    2  3.5412 0.02915 *
      2153                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Check for outliers using Cook’s distance and Residuals vs Leverage plot

Run a Regression

# # use this commented out section only if you need to remove outliers
# # to drop a single outlier, remove the # at the beginning of the line and use this code:
# # d <- subset(d, row_id!=c(1108))

# to drop multiple outliers, remove the # at the beginning of the line and use this code:
# d <- subset(d, row_id!=c(1108) & row_id!=c(602))

# use the lm() command to run the regression
# formula is y~x1*x2 + c, where y is our DV, x1 is our first IV, x2 is our second IV, and c is our covariate
# 
#  for a one-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
reg_model <- lm(belong ~ gender, data = d) 
#for one-way

Check for outliers

# for a one-way ANOVA
 # COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
# Cook's distance (cut off is .5 not .05)
plot(reg_model, 4)

# 
# Residuals vs Leveragem-- look for wobble in the red line or any participants outside the dashed lines for cooks distance 
plot(reg_model, 5)

Check Your Variables

# we'll use the describeBy() command to view skew and kurtosis across our IVs and make sure the DV is normally distributed across all of the levels
describeBy(d$belong, group = d$gender)


 Descriptive statistics by group 
group: f
   vars    n mean   sd median trimmed  mad min max range  skew kurtosis   se
X1    1 1582 3.26 0.59    3.3    3.27 0.59 1.3   5   3.7 -0.25    -0.21 0.01
------------------------------------------------------------ 
group: m
   vars   n mean   sd median trimmed  mad min max range  skew kurtosis   se
X1    1 543 3.06 0.65    3.1    3.08 0.59 1.3 4.9   3.6 -0.22    -0.05 0.03
------------------------------------------------------------ 
group: nb
   vars  n mean  sd median trimmed  mad min max range skew kurtosis   se
X1    1 31 3.34 0.5    3.3    3.33 0.44 2.3 4.6   2.3 0.29    -0.09 0.09

Issues with My Data - PART OF YOUR WRITEUP

We did not drop any participants since there were already three levels being tested. Levene’s test found significant homogeneity of variance (p =.029). There were no outliers in the data. We found no skew or kurtosis in the data.

Run an ANOVA

# for a one-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
aov_model <- aov_ez(data = d,
                    id = "row_id",
                    between = c("gender"),
                    dv = "belong",
                    anova_table = list(es = "pes"))

Contrasts set to contr.sum for the following variables: gender

View Output

Effect size cutoffs from Cohen (1988):

ηp2 = 0.01 indicates a small effect
ηp2 = 0.06 indicates a medium effect
ηp2 = 0.14 indicates a large effect

# for a one-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
nice(aov_model)

Anova Table (Type 3 tests)

Response: belong
  Effect      df  MSE         F  pes p.value
1 gender 2, 2153 0.36 23.09 *** .021   <.001
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1

Visualize Results

# for a one-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
afex_plot(aov_model, x = "gender")

Run Posthoc Tests

Only run posthocs if the test is significant! E.g., only run the posthoc tests on gender if there is a main effect for gender.

# for a one-way ANOVA
# COMMENT THIS SECTION OUR FROM THE HW IF YOU DO NOT USE IT
emmeans(aov_model, specs="gender", adjust="tukey")

Note: adjust = "tukey" was changed to "sidak"
because "tukey" is only appropriate for one set of pairwise comparisons

 gender emmean     SE   df lower.CL upper.CL
 f        3.26 0.0151 2153     3.22     3.30
 m        3.06 0.0258 2153     3.00     3.12
 nb       3.34 0.1080 2153     3.08     3.60

Confidence level used: 0.95 
Conf-level adjustment: sidak method for 3 estimates

pairs(emmeans(aov_model, specs="gender", adjust="tukey"))

 contrast estimate     SE   df t.ratio p.value
 f - m      0.1998 0.0299 2153   6.684  <.0001
 f - nb    -0.0828 0.1090 2153  -0.760  0.7277
 m - nb    -0.2826 0.1110 2153  -2.546  0.0295

P value adjustment: tukey method for comparing a family of 3 estimates

Write Up Results

One-Way ANOVA

We hypothesized that women would feel a greater need to belong than men and those who identify as non-binary. We did not drop any participants since there were already three levels being tested. Levene’s test found significant heterogeneity of variance (p =.029). This could mean our results are exagerated and we have to interpret with caution. There were no outliers in the data, and we found no skew or kurtosis in the data. The results are significant F(2,2153) = 23.09, p < .001, ηp2 = .02. We found that women are higher in need to belong than men (p <.001), non-binary feel a greater sense of need to belong than men (p = .030), and there is no difference between in need to belong between women and non-binary (p =.728). Overall there was a small effect on gender on need to belong, see Figure 1 (Choen, 1998).

References

Cohen J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.