library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(infer)
library(EnvStats)
##
## Attaching package: 'EnvStats'
##
## The following objects are masked from 'package:stats':
##
## predict, predict.lm
##
## The following object is masked from 'package:base':
##
## print.default
msdlabs <- read.csv("msd_labs.csv")
#msdlabs$race <- ifelse (msdlabs$AFAM == 1,1,NA )
##msdlabs$race <- ifelse (msdlabs$WHITE == 1,1)
msdlabs_race <- unite(msdlabs, race, AFAM, WHITE, OTHRACE)
### No variable in the file had more than 2 levels, so I tried to combine the race codes
### into 1 variable that I could use in the ANOVA. I first tried the ifelse command; it
### didn't work. I then tried the unite command. It worked, but gave a result I could not
### use. Somehow that code was deleted. In my R code, there is a small blue box in the chunk
### above this one. The unite code should be in that chunk. I took the result of the unite
### function, imported it into NCSS, edited the race column in NCSS, exported it to .csv,
### then read it into R and ran the ANOVA. As I explain in the write-up at the end of the
### code, something didn't work in R, but when I ran the recoded file in NCSS, it worked
### as expected. Go figure.
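### A minimal sketch of one way to build a single race factor directly in R, without the
### NCSS round trip. The toy data frame and the level labels are hypothetical; only the
### column names AFAM, WHITE, and OTHRACE come from the file, and they are assumed here to
### be 0/1 indicators. Note that ifelse() needs both a "yes" and a "no" value.
toy <- data.frame(AFAM    = c(1, 0, 0, 0),
                  WHITE   = c(0, 1, 0, 0),
                  OTHRACE = c(0, 0, 1, 0))
toy$race <- factor(
  ifelse(toy$AFAM == 1, "Black",
  ifelse(toy$WHITE == 1, "White",
  ifelse(toy$OTHRACE == 1, "Other", NA))),   # rows with no indicator set become NA
  levels = c("Black", "White", "Other")
)
table(toy$race, useNA = "ifany")             # one count per level, plus any NA rows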
write.csv(msdlabs_race, file = "msdlabs_race.csv")
msdlabs_recoded <- read.csv("msdlabs_recoded.csv")
###install.packages("lawstat")
###install.packages("sjstats")
library(lawstat)
library(sjstats)
##
## Attaching package: 'sjstats'
## The following object is masked from 'package:EnvStats':
##
## cv
## The following object is masked from 'package:infer':
##
## p_value
race_anova <- aov(formula = HOUSEHOLD ~ I6, data = msdlabs_recoded)
summary(race_anova)
## Df Sum Sq Mean Sq F value Pr(>F)
## I6 1 21144 21144 11.67 0.000652 ***
## Residuals 1554 2815846 1812
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
levene.test(msdlabs_recoded$HOUSEHOLD, msdlabs_recoded$I6)
##
## Modified robust Brown-Forsythe Levene-type test based on the absolute
## deviations from the median
##
## data: msdlabs_recoded$HOUSEHOLD
## Test Statistic = 3.3522, p-value = 0.01833
effectsize::cohens_f(race_anova)
## For one-way between subjects designs, partial eta squared is equivalent
## to eta squared. Returning eta squared.
## # Effect Size for ANOVA
##
## Parameter | Cohen's f | 95% CI
## -----------------------------------
## I6 | 0.09 | [0.04, Inf]
##
## - One-sided CIs: upper bound fixed at [Inf].
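### As a rough check, Cohen's f can be recomputed by hand from the sums of squares in the
### ANOVA table above, using f = sqrt(eta^2 / (1 - eta^2)). The variable names below are
### illustrative and are not part of the original analysis.
ss_effect <- 21144                               # Sum Sq for I6
ss_resid  <- 2815846                             # Sum Sq for Residuals
eta_sq    <- ss_effect / (ss_effect + ss_resid)  # proportion of variance explained
cohens_f  <- sqrt(eta_sq / (1 - eta_sq))         # same as sqrt(ss_effect / ss_resid)
round(cohens_f, 2)                               # agrees with the 0.09 reported above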
### TukeyHSD(race_anova, conf.level=.95)
### A one-way ANOVA was run to determine whether mean household income differed
### significantly among 4 groups of students: Black, White, other races, and AIAN and API
### together. The ANOVA was statistically significant (F = 11.67, df = 1, 1554, p = .000652).
### This suggests that at least one of the 4 groups had a mean income that differed
### significantly from the others. Cohen's f (0.09) indicates a weak effect.
### A problem: I do not understand why there was only 1 degree of freedom when
### there were 4 groups. There should have been 3 degrees of freedom. I ran the same test
### in NCSS and the results were as expected: 3 df, and the Tukey HSD showed statistically
### significant differences among the groups. The Tukey HSD did not run in R because R said
### "Error in TukeyHSD.aov(race_anova, conf.level = 0.95) : no factors in the fitted model."
### I do not understand what went wrong.
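### A likely explanation, sketched on made-up data: when the grouping column is stored as
### numbers (1, 2, 3, 4), aov() treats it as one continuous predictor and fits a single
### slope. That gives 1 df and leaves TukeyHSD() with no factor to compare. Converting the
### column with factor() gives k - 1 = 3 df. The data frame `demo` and its values are
### hypothetical; whether I6 in msdlabs_recoded.csv is really numeric is an assumption that
### str(msdlabs_recoded$I6) would confirm.
set.seed(1)
demo <- data.frame(
  grp = rep(1:4, each = 25),                                    # numeric group codes
  y   = rnorm(100, mean = rep(c(50, 55, 60, 52), each = 25), sd = 10)
)
demo$grp_f <- factor(demo$grp)                                  # same codes, stored as a factor
fit_numeric <- aov(y ~ grp,   data = demo)                      # codes treated as a number
fit_factor  <- aov(y ~ grp_f, data = demo)                      # codes treated as 4 groups
summary(fit_numeric)[[1]]$Df                                    # first entry: df for grp
summary(fit_factor)[[1]]$Df                                     # first entry: df for grp_f
TukeyHSD(fit_factor, conf.level = 0.95)                         # runs: the model has a factor
### For the real data, the analogous change would be to run
### msdlabs_recoded$I6 <- factor(msdlabs_recoded$I6) before calling aov().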