library(psych) # for the describe() command
library(ggplot2) # to visualize our results
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
library(expss) # for the cross_cases() command
## Loading required package: maditr
##
## Attaching package: 'maditr'
## The following object is masked from 'package:base':
##
## sort_by
##
## Attaching package: 'expss'
## The following object is masked from 'package:ggplot2':
##
## vars
library(car) # for the leveneTest() command
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:expss':
##
## recode
## The following object is masked from 'package:psych':
##
## logit
library(afex) # to run the ANOVA and plot results
## Loading required package: lme4
## Loading required package: Matrix
##
## Attaching package: 'lme4'
## The following object is masked from 'package:expss':
##
## dummy
##
## Attaching package: 'afex'
## The following object is masked from 'package:lme4':
##
## lmer
library(emmeans) # for posthoc tests
# import the dataset you cleaned previously
# this will be the dataset you'll use throughout the rest of the semester
# use ARC data
d <- read.csv(file="Data/mydata.csv", header=T)
# new code! this adds a column with a number for each row. it makes it easier when we drop outliers later
d$row_id <- 1:nrow(d)
One-Way: I predict that there will be a significant effect of education level on independence, as measured by the Markers of Adulthood-Importance scale (MoA-Importance).
# you only need to check the variables you're using in the current analysis
# although you checked them previously, it's always a good idea to look them over again and be sure that everything is correct
str(d)
## 'data.frame': 3045 obs. of 7 variables:
## $ edu : chr "2 Currently in college" "5 Completed Bachelors Degree" "2 Currently in college" "2 Currently in college" ...
## $ marriage5 : chr "are currently divorced from one another" "are currently married to one another" "are currently married to one another" "are currently married to one another" ...
## $ moa_independence: num 3.67 3.67 3.5 3 3.83 ...
## $ moa_role : num 3 2.67 2.5 2 2.67 ...
## $ mindful : num 2.4 1.8 2.2 2.2 3.2 ...
## $ efficacy : num 3.4 3.4 2.2 2.8 3 2.4 2.3 3 3 3.7 ...
## $ row_id : int 1 2 3 4 5 6 7 8 9 10 ...
# make our categorical variables factors
d$edu <- as.factor(d$edu)
d$row_id <- as.factor(d$row_id)
# you can use the describe() command on an entire dataframe (d) or just on a single variable
describe(d$moa_independence)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 3045 3.54 0.46 3.67 3.61 0.49 1 4 3 -1.43 2.47 0.01
# we'll use the describeBy() command to view skew and kurtosis across the IV
describeBy(d$moa_independence, group = d$edu)
##
## Descriptive statistics by group
## group: 1 High school diploma or less, and NO COLLEGE
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 53 3.51 0.54 3.67 3.6 0.49 2 4 2 -1.34 1.03 0.07
## ------------------------------------------------------------
## group: 2 Currently in college
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 2460 3.54 0.46 3.67 3.61 0.49 1 4 3 -1.48 2.71 0.01
## ------------------------------------------------------------
## group: 3 Completed some college, but no longer in college
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 35 3.56 0.45 3.67 3.62 0.49 2.5 4 1.5 -0.9 -0.15 0.08
## ------------------------------------------------------------
## group: 4 Complete 2 year College degree
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 174 3.6 0.42 3.67 3.66 0.49 2.17 4 1.83 -1.14 0.79 0.03
## ------------------------------------------------------------
## group: 5 Completed Bachelors Degree
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 135 3.51 0.48 3.67 3.58 0.49 1.5 4 2.5 -1.33 2.12 0.04
## ------------------------------------------------------------
## group: 6 Currently in graduate education
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 132 3.48 0.46 3.5 3.53 0.49 2 4 2 -0.99 0.75 0.04
## ------------------------------------------------------------
## group: 7 Completed some graduate degree
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 56 3.4 0.45 3.5 3.44 0.49 1.67 4 2.33 -1.25 2.41 0.06
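describeBy() reports skew and kurtosis separately for each level of the IV. As far as I understand psych's documentation, its default (type = 3) formulas are skew = sum((x - mean)^3) / (n * s^3) and excess kurtosis = sum((x - mean)^4) / (n * s^4) - 3, where s is the sample standard deviation. The base-R sketch below reimplements those two formulas so you can see what the columns measure; the helper names skew3() and kurt3() and the toy data are hypothetical, so check against ?psych::skew before relying on it.

```r
# Hypothetical helpers reimplementing the "type 3" skew and kurtosis
# formulas assumed to be psych's defaults (see ?psych::skew)
skew3 <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  sum((x - mean(x))^3) / (n * sd(x)^3)
}

kurt3 <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  # subtracting 3 gives excess kurtosis: about 0 for a normal distribution
  sum((x - mean(x))^4) / (n * sd(x)^4) - 3
}

# Hypothetical toy data standing in for d$moa_independence and d$edu:
# group B has a long lower tail, like scores bunched near the scale maximum
set.seed(1)
toy <- data.frame(
  score = c(rnorm(50, mean = 3.5, sd = 0.4), 4 - rexp(50, rate = 2)),
  group = rep(c("A", "B"), each = 50)
)

# skew and kurtosis within each level of the grouping variable
by_group <- sapply(split(toy$score, toy$group),
                   function(x) c(skew = skew3(x), kurtosis = kurt3(x)))
round(by_group, 2)
```

Negative skew means a tail toward low scores; positive excess kurtosis means a sharper peak and heavier tails than a normal curve.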
# also use histograms to examine your continuous variable
hist(d$moa_independence)
# One-way ANOVA: check the number of participants in each level of the IV
table(d$edu)
##
## 1 High school diploma or less, and NO COLLEGE
## 53
## 2 Currently in college
## 2460
## 3 Completed some college, but no longer in college
## 35
## 4 Complete 2 year College degree
## 174
## 5 Completed Bachelors Degree
## 135
## 6 Currently in graduate education
## 132
## 7 Completed some graduate degree
## 56
# use the leveneTest() command from the car package to test homogeneity of variance
# uses the 'formula' setup: formula is y~x1*x2, where y is our DV and x1 is our first IV and x2 is our second IV
leveneTest(moa_independence~edu, data = d)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 6 0.2866 0.9436
## 3038
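With center = median (the default shown in the output header), Levene's test is the Brown-Forsythe variant: it runs a one-way ANOVA on each observation's absolute deviation from its own group's median. The base-R sketch below illustrates that idea; levene_median() is a hypothetical helper and the toy vectors are made up, so keep using leveneTest() for the real analysis.

```r
# Hypothetical helper: Brown-Forsythe version of Levene's test
# (one-way ANOVA on absolute deviations from each group's median)
levene_median <- function(y, group) {
  group <- as.factor(group)
  # ave() returns each observation's group median, aligned with y
  abs_dev <- abs(y - ave(y, group, FUN = median))
  anova(lm(abs_dev ~ group))
}

# Toy data: group "b" is twice as spread out as group "a"
y <- c(1, 2, 3, 4, 5, 2, 4, 6, 8, 10)
g <- rep(c("a", "b"), each = 5)
levene_median(y, g)
```

A non-significant F, like the p = .94 above, means the spread of the DV does not differ detectably across groups.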
# use the lm() command to fit a regression model, which we use to check for outliers
# formula is y~x1*x2 + c, where y is our DV, x1 is our first IV, x2 is our second IV, and c is our covariate
reg_model <- lm(moa_independence ~ edu, data = d) #for one-way
# Cook's distance
plot(reg_model, 4)
# Residuals vs Leverage
plot(reg_model, 5)
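plot(reg_model, 4) draws each observation's Cook's distance. To list the observations behind any tall spikes, you can pull the values with cooks.distance(). The 4/n cutoff below is a common rule of thumb, not something this assignment specifies, and the toy data (with one planted extreme score) is hypothetical.

```r
# Toy one-way design: three groups of 20, with one planted extreme score
set.seed(42)
toy <- data.frame(
  group = factor(rep(c("g1", "g2", "g3"), each = 20)),
  score = rnorm(60, mean = 3.5, sd = 0.4)
)
toy$score[5] <- -5   # hypothetical outlier in row 5

toy_model <- lm(score ~ group, data = toy)

# Cook's distance for every row, compared to an assumed 4/n cutoff
cd <- cooks.distance(toy_model)
cutoff <- 4 / nrow(toy)
flagged <- which(cd > cutoff)
flagged
```

In the real data, flagged rows could be matched to d$row_id before deciding whether anything should actually be dropped; a value above the cutoff is a prompt to look, not an automatic removal.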
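The kurtosis values for those currently in college (2.71) and those who have completed some graduate degree (2.41) are above the cut-off. This means the DV is more peaked and heavy-tailed than a normal distribution in these groups. Combined with the negative skew in every group, it indicates that most scores cluster near the top of the scale, with a tail of lower scores.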
The cell sizes are very unbalanced, with the majority of participants currently in college. The small sample sizes for some levels of the IV (e.g., n = 35 for those who completed some college) limit power and increase the Type II error rate.
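To make the power point concrete: power.anova.test() in base R assumes equal group sizes, so a common rough approximation for an unbalanced design is to plug in the harmonic mean of the cell sizes, which is pulled toward the smallest groups. The sketch below uses the cell sizes from table(d$edu), takes between.var from the rounded group means in the describeBy() output, and takes within.var from the MSE (0.21) in the ANOVA table later in this section. Treating these sample values as population values is an assumption made purely for illustration, so read the resulting power figures as rough.

```r
# Cell sizes from table(d$edu)
n <- c(53, 2460, 35, 174, 135, 132, 56)

# Harmonic mean of the cell sizes: dominated by the smallest groups
n_h <- length(n) / sum(1 / n)

# Rounded group means from describeBy() and the MSE from the ANOVA table,
# treated as population values for illustration only (an assumption)
group_means <- c(3.51, 3.54, 3.56, 3.60, 3.51, 3.48, 3.40)
between <- var(group_means)
within <- 0.21

# Approximate power: unbalanced design vs. the same total N split evenly
power_unbalanced <- power.anova.test(groups = 7, n = n_h,
                                     between.var = between,
                                     within.var = within)$power
power_balanced <- power.anova.test(groups = 7, n = sum(n) / 7,
                                   between.var = between,
                                   within.var = within)$power
round(c(n_h = n_h, unbalanced = power_unbalanced, balanced = power_balanced), 2)
```

The harmonic mean comes out near 81 per group, far below the 435 per group an even split of 3,045 participants would give, which is why the unbalanced design buys less power.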
aov_model <- aov_ez(data = d,
id = "row_id",
between = c("edu"),
dv = "moa_independence",
anova_table = list(es = "pes"))
## Contrasts set to contr.sum for the following variables: edu
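With a single between-subjects factor, the Type 3 test that aov_ez() reports should match base R's aov(): there is only one effect, so the order of terms cannot change its sum of squares. Partial eta squared then reduces to SS_effect / (SS_effect + SS_error). A minimal sketch on hypothetical toy data, not the ARC dataset:

```r
# Hypothetical toy data: three unequal-sized groups
set.seed(7)
toy <- data.frame(
  group = factor(rep(c("a", "b", "c"), times = c(40, 15, 10))),
  score = rnorm(65, mean = 3.5, sd = 0.45)
)

# One-way ANOVA in base R
fit <- aov(score ~ group, data = toy)
tab <- summary(fit)[[1]]
tab

# Partial eta squared from the sums of squares
ss_effect <- tab[1, "Sum Sq"]
ss_error  <- tab[2, "Sum Sq"]
pes <- ss_effect / (ss_effect + ss_error)
pes
```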
Effect size cutoffs from Cohen (1988): η² = .01 indicates a small effect, .06 a medium effect, and .14 a large effect.
nice(aov_model)
## Anova Table (Type 3 tests)
##
## Response: moa_independence
## Effect df MSE F pes p.value
## 1 edu 6, 3038 0.21 1.95 + .004 .070
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1
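For a one-way design, ηp² can also be recovered from the reported F and its degrees of freedom, since ηp² = (F * df_effect) / (F * df_effect + df_error), and the p value comes from the upper tail of the F distribution. The sketch below plugs in the rounded values from the table above, so its results only approximate what nice() printed. The labels use Cohen's (1988) benchmarks of .01, .06, and .14, with anything below .01 treated as negligible, as in the write-up.

```r
# Rounded values from the nice() output above
f_value   <- 1.95
df_effect <- 6
df_error  <- 3038

# Partial eta squared from F and its degrees of freedom
pes <- (f_value * df_effect) / (f_value * df_effect + df_error)

# p value: upper tail of the F distribution
p_value <- pf(f_value, df_effect, df_error, lower.tail = FALSE)

# Label against Cohen's (1988) benchmarks; below .01 is treated as negligible
size <- cut(pes, breaks = c(0, 0.01, 0.06, 0.14, 1),
            labels = c("negligible", "small", "medium", "large"),
            right = FALSE)
c(pes = round(pes, 3), p = round(p_value, 3))
as.character(size)
```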
afex_plot(aov_model, x = "edu")
To test the hypothesis that education level would have a significant effect on independence, as measured by the Markers of Adulthood-Importance scale (MoA-Importance), I used a one-way ANOVA. The data were unbalanced: many more current college students participated in the survey (n = 2460) than all other participants combined (n = 585). The smallest group was those who had completed some college but were no longer in college (n = 35). This imbalance substantially reduces the power of the test and increases the chance of a Type II error.
Visual inspection of the Cook's distance and Residuals vs Leverage plots did not identify any outliers that needed to be removed. Levene's test was not significant, F(6, 3038) = 0.29, p = .94, indicating that the data do not violate the assumption of homogeneity of variance. I did not find a significant effect of education on independence, F(6, 3038) = 1.95, p = .070, ηp² = .004 (negligible effect size; Cohen, 1988; see Figure 1), so no post hoc tests were conducted.
Figure 1. Independence (MoA-Importance) by education level. 1 = High school diploma or less, and NO COLLEGE; 2 = Currently in college; 3 = Completed some college, but no longer in college; 4 = Complete 2 year College degree; 5 = Completed Bachelors Degree; 6 = Currently in graduate education; 7 = Completed some graduate degree
References
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. New York, NY: Routledge Academic.