Last knit on March 17, 2024 at 10:00
This is an R Markdown Notebook. It has captured my learning process on understanding data analysis and so is very rough. It contains my code, with results of executing the code beneath each chunk. Pieces of this will be converted into my Results chapter.
For my Dissertation Committee: Use the TOC links above and skip ahead to the Descriptive Statistics section, there is a lot of housekeeping code before then.
Some questions are highlighted in yellow…
3/17/24 Arcenis consult - GOALS
This study has proposed a model that captures the relationships between three white racial affects (emotions) and antiracist self-efficacy, on the dependent variable antiracist accountability. Each of these variables has an instrument associated with it with Likert-style items. The mean of each scale is calculated in the script below, which are the IVs and DV (WTGUILT, WTFEAR, WTANGER, SELFEFF, ACCOUNT).
My full dataset is about 140 complete/valid observations, however the population of interest is counselors and that subset is 62 observations.
library(DiagrammeR)
#| label: apafg-conceptualmodel
#| apa-cap: Conceptual Model (DiagrammeR package)
#| apa-note: Conceptual model of hypothesis exploring the effect of white racial affects (white guilt, white fear, white anger) and mediator variable antiracist self-efficacy on antiracist accountability.
proposedpath <- create_graph() %>%
add_node(label = "White Anger") %>%
add_node(label = "White Guilt") %>%
add_node(label = "White Fear") %>%
add_node(label = "Antiracist Self-Efficacy") %>%
add_node(label = "Antiracist Accountability") %>%
add_edge(from = 1, to = 4) %>%
add_edge(from = 2, to = 4) %>%
add_edge(from = 3, to = 4) %>%
add_edge(from = 4, to = 5) %>%
set_node_attrs(
node_attr = "shape",
values = "square"
) %>%
render_graph(layout = "tree")
proposedpath
#install.packages("psych")
library(diagram)
## Loading required package: shape
library(psych)
#| label: apafg-conceptualmodel-old
#| apa-cap: Conceptual Model
#| apa-note: Conceptual model of hypothesis exploring the effect of white racial affects (white guilt, white fear, white anger) and mediator variable antiracist self-efficacy on antiracist accountability.
#first, show the primitives
xlim=c(-2,10)
ylim=c(0,10)
plot(NA,xlim=xlim,ylim=ylim,main="",axes=FALSE,xlab="",ylab="")
wg <- dia.rect(1,9,labels="white guilt",xlim=xlim,ylim=ylim)
wf <- dia.rect(1,5,"white fear",xlim=xlim,ylim=ylim)
wa <- dia.rect(1,1,labels="white anger",xlim=xlim,ylim=ylim)
se <- dia.rect(4,5,"antiracist\nself-efficacy",xlim=xlim,ylim=ylim)
aa <- dia.rect(9,5,"antiracist\naccountability",xlim=xlim,ylim=ylim)
dia.arrow(from=wg$right,to=se$top,labels="")
dia.arrow(from=wf$right,to=se$left,labels="")
dia.arrow(wa$right,to=se$bottom,labels="")
#dia.arrow(from=se,to=aa,labels="")
dia.arrow(se,aa)
A variety of clean-up tasks need to be done on the dataset before analysis:
In the R script, these options are configured to allow for different outputs of the analysis. Warning: This is brittle, all options not fully built out in code below. Can currently do all responses, counseling professionals only, or therapists only.
# Set all the below options to TRUE if limiting the analysis only to the study criteria
# If analyzing other professions, set counselingprofessionalsonly to FALSE and then configure them individually below
# counselingprofessionalsonly is counselors and counselor educators
counselingprofessionalsonly = TRUE
include.woman = TRUE
include.white = TRUE
include.alternatives = FALSE
# Note that some respondents used the open-text fields to indicate alternative answers for race and gender identity. These can either be excluded as a whole (meaning, only those who answered definitively as "white" and as "woman" in the radio button question on race and gender), or they can be conditionally included, based on code in the Data Cleanup section below.
# reset all variables - default is to include all professions
# IMPORTANT: The counselingprofessionalsonly ABOVE flag overrides the individual settings (probably bad design)
include.counselors = TRUE
include.counseloreducators = TRUE
include.therapists = FALSE
include.socialworkers = FALSE
include.psychologists = FALSE
# The 3 obs from psychologists are explicitly excluded below regardless of what is answered here
#if get kableExtra error when knitting pdf, install via devtools
library(kableExtra)
library(vtable)
library(tibble)
library(readr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:kableExtra':
##
## group_rows
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyverse)
## Warning: package 'ggplot2' was built under R version 4.3.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ purrr 1.0.2
## ✔ ggplot2 3.5.0 ✔ stringr 1.5.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ ggplot2::%+%() masks psych::%+%()
## ✖ ggplot2::alpha() masks psych::alpha()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::group_rows() masks kableExtra::group_rows()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(ggstats)
library(psych)
library(labelled)
library(rstatix)
##
## Attaching package: 'rstatix'
##
## The following object is masked from 'package:stats':
##
## filter
library(performance)
library(report)
library(rempsyc) # this can output "report" objects (from report package) and it gives me the apa tables I need
## Suggested APA citation: Thériault, R. (2023). rempsyc: Convenience functions for psychology.
## Journal of Open Source Software, 8(87), 5466. https://doi.org/10.21105/joss.05466
# https://cran.r-project.org/web/packages/rempsyc/vignettes/table.html
# https://remi-theriault.com/tutorials/
library(conflicted)
library(flextable)
## Warning: package 'flextable' was built under R version 4.3.3
library(ftExtra)
## Warning: package 'ftExtra' was built under R version 4.3.3
## Registered S3 method overwritten by 'ftExtra':
## method from
## as_flextable.data.frame flextable
library(officer)
## Warning: package 'officer' was built under R version 4.3.3
library(knitr)
conflicts_prefer(dplyr::filter, .quiet = TRUE)
conflicts_prefer(flextable::separate_header, .quiet = TRUE)
# Note: sjPlot may be a package to look into - data vis for stats in social sciences
# http://www.strengejacke.de/sjPlot/
# for power analysis - monte carlo simulations for lavaan
# abandoned, could not figure it out
#library(simsem)
library(seminr) #also abandoned but it has potential? sits on top of lavaan
## Warning: package 'seminr' was built under R version 4.3.3
The read_csv command is also set to identify default datatypes per column.
# reset memory
cleandata <- NULL
surveymonkeyraw <- NULL
# The export from SM should be: All Responses Data, CSV, Condensed, Original View (no rules applied), Actual Answer Text. There is no need for the PageOrder.csv file
# Below command throws a parsing issue warning but it's about row 2, a second header row output by SurveyMonkey, which gets discarded
# read the SurveyMonkey export and set datatypes; read_csv creates a tibble (not a dataframe)
surveymonkeyraw <- read_csv("data\\Lisa Wenninger - Dissertation Research.csv", col_types = cols(.default = "c", 'Start Date' = "D", 'End Date' = "D", '...11' = "c", '...14' = "c", '...17' = "c"))
## New names:
## • `` -> `...11`
## • `` -> `...14`
## • `` -> `...17`
## • `` -> `...23`
## • `` -> `...24`
## • `` -> `...25`
## • `` -> `...26`
## • `` -> `...27`
## • `` -> `...28`
## • `` -> `...29`
## • `` -> `...30`
## • `` -> `...31`
## • `` -> `...32`
## • `` -> `...33`
## • `` -> `...34`
## • `` -> `...35`
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
## dat <- vroom(...)
## problems(dat)
Remove the unnecessary columns added by SurveyMonkey, and rename columns to usable shortnames for purpose of these scripts.
The raw CSV file output from SurveyMonkey used the prompts from each survey item as the column header. These prompts are full sentence questions, and are quite unwieldy to use in programmatic analysis. Therefore, the columns will be renamed.
# First remove extraneous columns - these were all blank
cleandata <- select(surveymonkeyraw, -c('IP Address', 'Email Address', 'Collector ID', 'First Name', 'Last Name', 'Custom Data 1'))
#remove row 2 which is SM descriptor, not data
# This puts NA in the row
cleandata <- cleandata[-c(1), ]
# grab original SurveyMonkey column names as labels
originalsm.labels <- colnames(cleandata)
# renaming cols from long descriptions to simple var names by column index
cleandata <- cleandata %>%
rename(gender = 4, genderdescrip = 5, trans = 6, race = 7, raceother = 8, agecat = 9, sexuality = 10, sexualityother = 11, relationstatus = 12, region = 13, education = 14, class = 15, profID = 30, yrsprac = 31)
# rename the multiple-choice religion columns
# not currently doing any analysis on this data, maybe circle back later to output
# summary statistics on it? very low priority
cleandata <- cleandata %>%
rename(religprotestant = 16, religcatholic= 17, religchristian = 18, religjudaism = 19, religislam = 20, religbuddhism = 21, relighinduism = 22, relignativeamerican = 23, religpagan = 24, religwiccan = 25, religinterdenom = 26, relignotreligious = 27, religdidnotdisclose = 28, religotherspecified = 29 )
# Rename cols for each instrument
# Cols 32-47 are the 16 PCRW questions
cleandata <- cleandata %>%
rename(PCRWQ01 = 32, PCRWQ02 = 33, PCRWQ03 = 34, PCRWQ04 = 35, PCRWQ05 = 36, PCRWQ06 = 37, PCRWQ07 = 38, PCRWQ08 = 39, PCRWQ09 = 40, PCRWQ10 = 41, PCRWQ11 = 42, PCRWQ12 = 43, PCRWQ13 = 44, PCRWQ14 = 45, PCRWQ15 = 46, PCRWQ16 = 47)
# Cols 48-59 are the 12 WPAS questions
cleandata <- cleandata %>%
rename(WPASQ01 = 48, WPASQ02 = 49, WPASQ03 = 50, WPASQ04 = 51, WPASQ05 = 52, WPASQ06 = 53, WPASQ07 = 54, WPASQ08 = 55, WPASQ09 = 56, WPASQ10 = 57, WPASQ11 = 58, WPASQ12 = 59)
# Col 60 is attention check for switch of axes on A-RES after the 12 WPAS questions
cleandata <- cleandata %>%
rename(attentioncheck = 60)
# Cols 61-64 are the A-RES questions
cleandata <- cleandata %>%
rename(ARESQ01 = 61, ARESQ02 = 62, ARESQ03 = 63, ARESQ04 = 64)
# convert datatypes for instrument items from character to factor
# 6-level Likert-type values in PCRW and WPAS scales
likert_levels_6 <- c(
"Strongly Agree",
"Moderately Agree",
"Slightly Agree",
"Slightly Disagree",
"Moderately Disagree",
"Strongly Disagree"
)
# 4-level Likert-type values in A-RES scale
likert_levels_4 <- c(
"Agree strongly",
"Agree",
"Disagree",
"Disagree strongly"
)
# Apply 6 level factor to all items in both PCRW and WPAS
cleandata <- cleandata %>%
mutate(across(PCRWQ01:WPASQ12, ~ factor(.x, levels = likert_levels_6)))
# Apply 4 level factor to items in ARES
cleandata <- cleandata %>%
mutate(across(ARESQ01:ARESQ04, ~ factor(.x, levels = likert_levels_4)))
# assign the SM labels
# This has to happen after the mutate factor step which would otherwise
# wipe out the labels
cleandata <- cleandata %>%
set_variable_labels(.labels = originalsm.labels )
vt(cleandata)
| Name | Class | Label | Values |
|---|---|---|---|
| Respondent ID | character | Respondent ID | |
| Start Date | Date | Start Date | |
| End Date | Date | End Date | |
| gender | character | Gender: How do you primarily identify? | |
| genderdescrip | character | …11 | |
| trans | character | Do you identify as trans? (optional) | |
| race | character | Which race/ethnicity best describes you? (Please choose only one.) | |
| raceother | character | …14 | |
| agecat | character | Which category below includes your age? | |
| sexuality | character | What best describes how you identify your sexual or romantic interests or affiliation/orientation? | |
| sexualityother | character | …17 | |
| relationstatus | character | Which of the following best describes your current relationship status? | |
| region | character | In which region of the United States do you primarily live? (Or, where did you live for the longest time period, if you are not currently a U.S. resident?) | |
| education | character | What is the highest level of school you have completed or highest degree you have received? | |
| class | character | How would you identify your current social class? | |
| religprotestant | character | Do you identify with any of the following religions? (Please select all that apply.) | |
| religcatholic | character | …23 | |
| religchristian | character | …24 | |
| religjudaism | character | …25 | |
| religislam | character | …26 | |
| religbuddhism | character | …27 | |
| relighinduism | character | …28 | |
| relignativeamerican | character | …29 | |
| religpagan | character | …30 | |
| religwiccan | character | …31 | |
| religinterdenom | character | …32 | |
| relignotreligious | character | …33 | |
| religdidnotdisclose | character | …34 | |
| religotherspecified | character | …35 | |
| profID | character | What is your primary professional identity in the mental health field? If you are pre-licensed, please answer according to the field you are studying in or what license you will qualify for when training is done. | |
| yrsprac | character | How many years have you been practicing mental health counseling? Include your pre-licensed period of training in your total. | |
| PCRWQ01 | factor | When I hear about acts of racial violence, I become angry or depressed. | ‘Strongly Agree’ ‘Moderately Agree’ ‘Slightly Agree’ ‘Slightly Disagree’ ‘Moderately Disagree’ and 1 more |
| PCRWQ02 | factor | I feel safe in most neighborhoods, regardless of the racial composition. | ‘Strongly Agree’ ‘Moderately Agree’ ‘Slightly Agree’ ‘Slightly Disagree’ ‘Moderately Disagree’ and 1 more |
| PCRWQ03 | factor | I feel helpless about not being able to eliminate racism. | ‘Strongly Agree’ ‘Moderately Agree’ ‘Slightly Agree’ ‘Slightly Disagree’ ‘Moderately Disagree’ and 1 more |
| PCRWQ04 | factor | Sometimes I feel guilty about being White. | ‘Strongly Agree’ ‘Moderately Agree’ ‘Slightly Agree’ ‘Slightly Disagree’ ‘Moderately Disagree’ and 1 more |
| PCRWQ05 | factor | I have very few friends of other races. | ‘Strongly Agree’ ‘Moderately Agree’ ‘Slightly Agree’ ‘Slightly Disagree’ ‘Moderately Disagree’ and 1 more |
| PCRWQ06 | factor | I become sad when I think about racial injustice. | ‘Strongly Agree’ ‘Moderately Agree’ ‘Slightly Agree’ ‘Slightly Disagree’ ‘Moderately Disagree’ and 1 more |
| PCRWQ07 | factor | Being White makes me feel personally responsible for racism. | ‘Strongly Agree’ ‘Moderately Agree’ ‘Slightly Agree’ ‘Slightly Disagree’ ‘Moderately Disagree’ and 1 more |
| PCRWQ08 | factor | I never feel ashamed about being White. | ‘Strongly Agree’ ‘Moderately Agree’ ‘Slightly Agree’ ‘Slightly Disagree’ ‘Moderately Disagree’ and 1 more |
| PCRWQ09 | factor | I am fearful that racial minority populations are rapidly increasing in the U.S., and my group will no longer be the numerical majority. | ‘Strongly Agree’ ‘Moderately Agree’ ‘Slightly Agree’ ‘Slightly Disagree’ ‘Moderately Disagree’ and 1 more |
| PCRWQ10 | factor | I am angry that racism exists. | ‘Strongly Agree’ ‘Moderately Agree’ ‘Slightly Agree’ ‘Slightly Disagree’ ‘Moderately Disagree’ and 1 more |
| PCRWQ11 | factor | I am distrustful of people of other races. | ‘Strongly Agree’ ‘Moderately Agree’ ‘Slightly Agree’ ‘Slightly Disagree’ ‘Moderately Disagree’ and 1 more |
| PCRWQ12 | factor | I feel good about being White. | ‘Strongly Agree’ ‘Moderately Agree’ ‘Slightly Agree’ ‘Slightly Disagree’ ‘Moderately Disagree’ and 1 more |
| PCRWQ13 | factor | I often find myself fearful of people of other races. | ‘Strongly Agree’ ‘Moderately Agree’ ‘Slightly Agree’ ‘Slightly Disagree’ ‘Moderately Disagree’ and 1 more |
| PCRWQ14 | factor | Racism is dehumanizing to people of all races, including Whites. | ‘Strongly Agree’ ‘Moderately Agree’ ‘Slightly Agree’ ‘Slightly Disagree’ ‘Moderately Disagree’ and 1 more |
| PCRWQ15 | factor | I am afraid that I abuse my power and privilege as a White person. | ‘Strongly Agree’ ‘Moderately Agree’ ‘Slightly Agree’ ‘Slightly Disagree’ ‘Moderately Disagree’ and 1 more |
| PCRWQ16 | factor | It disturbs me when people express racist views. | ‘Strongly Agree’ ‘Moderately Agree’ ‘Slightly Agree’ ‘Slightly Disagree’ ‘Moderately Disagree’ and 1 more |
| WPASQ01 | factor | I plan to work to change our unfair social structure that promotes racism. | ‘Strongly Agree’ ‘Moderately Agree’ ‘Slightly Agree’ ‘Slightly Disagree’ ‘Moderately Disagree’ and 1 more |
| WPASQ02 | factor | I take action against racism with people I know. | ‘Strongly Agree’ ‘Moderately Agree’ ‘Slightly Agree’ ‘Slightly Disagree’ ‘Moderately Disagree’ and 1 more |
| WPASQ03 | factor | I accept responsibility to change racism. | ‘Strongly Agree’ ‘Moderately Agree’ ‘Slightly Agree’ ‘Slightly Disagree’ ‘Moderately Disagree’ and 1 more |
| WPASQ04 | factor | I have not done anything about racism. | ‘Strongly Agree’ ‘Moderately Agree’ ‘Slightly Agree’ ‘Slightly Disagree’ ‘Moderately Disagree’ and 1 more |
| WPASQ05 | factor | I look forward to creating a more racially equitable society. | ‘Strongly Agree’ ‘Moderately Agree’ ‘Slightly Agree’ ‘Slightly Disagree’ ‘Moderately Disagree’ and 1 more |
| WPASQ06 | factor | I intend to work toward dismantling racism. | ‘Strongly Agree’ ‘Moderately Agree’ ‘Slightly Agree’ ‘Slightly Disagree’ ‘Moderately Disagree’ and 1 more |
| WPASQ07 | factor | I don’t care to explore how I supposedly have unearned benefits from being White. | ‘Strongly Agree’ ‘Moderately Agree’ ‘Slightly Agree’ ‘Slightly Disagree’ ‘Moderately Disagree’ and 1 more |
| WPASQ08 | factor | I am curious about how to communicate effectively to break down racism. | ‘Strongly Agree’ ‘Moderately Agree’ ‘Slightly Agree’ ‘Slightly Disagree’ ‘Moderately Disagree’ and 1 more |
| WPASQ09 | factor | I’m glad to explore my racism. | ‘Strongly Agree’ ‘Moderately Agree’ ‘Slightly Agree’ ‘Slightly Disagree’ ‘Moderately Disagree’ and 1 more |
| WPASQ10 | factor | I want to begin the process of eliminating racism. | ‘Strongly Agree’ ‘Moderately Agree’ ‘Slightly Agree’ ‘Slightly Disagree’ ‘Moderately Disagree’ and 1 more |
| WPASQ11 | factor | I take action to dismantle racism. | ‘Strongly Agree’ ‘Moderately Agree’ ‘Slightly Agree’ ‘Slightly Disagree’ ‘Moderately Disagree’ and 1 more |
| WPASQ12 | factor | I am eager to find out more about letting go of racism. | ‘Strongly Agree’ ‘Moderately Agree’ ‘Slightly Agree’ ‘Slightly Disagree’ ‘Moderately Disagree’ and 1 more |
| attentioncheck | character | Please respond to this item by marking the option “Agree strongly.” | |
| ARESQ01 | factor | People like you can influence the way racism affects others. | ‘Agree strongly’ ‘Agree’ ‘Disagree’ ‘Disagree strongly’ |
| ARESQ02 | factor | You consider yourself well qualified to participate in movements related to racism. | ‘Agree strongly’ ‘Agree’ ‘Disagree’ ‘Disagree strongly’ |
| ARESQ03 | factor | You have a pretty good understanding of the important issues facing our country around racism. | ‘Agree strongly’ ‘Agree’ ‘Disagree’ ‘Disagree strongly’ |
| ARESQ04 | factor | People like you don’t have any say about racism. | ‘Agree strongly’ ‘Agree’ ‘Disagree’ ‘Disagree strongly’ |
Make the dataset a tad easier to parse - this is intended as temporary code, eval=false later. It also means that index-level column references WILL NOT WORK below this point.
cleandata <- cleandata %>%
select(-starts_with('relig'))
cleandata <- cleandata %>%
select(-contains('date'))
The options offered to respondents in the SurveyMonkey form were descriptive and often long (such as “White / European American”). This step replaces the long version with shorter versions (such as “white”) for improved readability in charts/graphs output.
library(dplyr)
# recode values in rows
cleandata <- cleandata %>%
mutate(gender = recode(gender, 'Woman or female' = 'woman',
'Non-binary or gender nonconforming' = 'nonbinary',
'Man or male' = 'man',
'Prefer to self-describe\\, below' = 'self-describe'))
cleandata <- cleandata %>%
mutate(relationstatus = recode(relationstatus, "Cohabitating with a significant other or in a domestic partnership" = "Domestic Partner",
"Single, never married, currently unpartnered" = "Single",
"Dating one or more partners (not cohabitating)" = "Dating",
"Skip/Prefer not to say" = "Did not disclose"))
cleandata <- cleandata %>%
mutate(region = recode(region, "South (AL, AK, DC, DE, FL, GA, KY, LA, MD, MS, NC, OK, SC, TN, TX, VA, WV)" = "South",
"West (AK, AZ, CA, CO, HI, ID, MT, NV, NM, OR, UT, WA, WY)" = "West",
"Midwest (IL, IN, IA, KS, MI, MN, MO, NE, ND, OH, SD, WI)" = "Midwest",
"Northeast (CT, ME, MA, NH, NJ, NY, PA, RI, VT)" = "Northeast",
# "U.S. territory, U.S. military base, etc (American Samoa, Guam, PR, US-VI, etc.)" = "US territory"
))
cleandata <- cleandata %>%
mutate(profID = recode(profID, "Counselor (including Clinical Counselor, Career Counselor, Rehab Counselor, School Counselor, Mental Health Counselor, etc)" = "counselor",
"Counselor Educator, teaching in a counselor education program at either the master's or doctoral level" = "counselor educator",
"Therapist (including Marriage and Family Therapist, Mental Health Therapist, etc)" = "therapist",
"Social Worker (including Clinical Social Worker, Master Social Worker, etc)" = "social worker",
"Psychologist (including Clinical Psychologist)" = "psychologist",
"Psychiatrist (including MD, DO, etc)" = "psychiatrist",
"other allied mental health professional (including RN, PA, etc)" = "other",
"none of these, I am not a mental health professional" = 'none' ))
The raw dataset needs to be examined to be sure that invalid and incomplete responses are excluded from the analysis, and to confirm that the resulting subset of data meets parametric assumptions (Flamez et al., 2017) before analysis is conducted.
Remove any incomplete rows, where the participant exited the study before completing it, or entered a disqualifying response on the gender or race questions.
Remove responses that failed the attention check that preceded the last instrument. The reason for the attention check was that the last instrument’s scale is flipped compared to the previous 2 instruments (“strongly agree” on the left of the screen instead of on the right).
attncheckfails = 0
incompletes = 0
numrawobs = 0
observations = 0
#preserve the full dataset in case it's needed
fulldata <- cleandata
#continue with operations on cleandata as the baseline of actual data to be used for analysis
# 1) Cull incompletes
# Check for final question in the whole survey being blank
# Later analysis: Where did people drop off?
incompletes <- cleandata %>%
filter(
(is.na(ARESQ04) | ARESQ04 == "")
)
cleandata <- cleandata %>%
filter(
(!is.na(ARESQ04) | ARESQ04 != "")
)
# 2) Failed attention check
attncheckfails <- cleandata %>%
filter(
(attentioncheck != 'Agree strongly')
)
cleandata <- cleandata %>%
filter(
(attentioncheck == 'Agree strongly')
)
numrawobs = nrow(fulldata)
numcleanobs = nrow(cleandata)
The “gender” and “race” questions allowed participants to self-identify in alternative ways. Some of these answers may indicate a qualifying response, some will indicate disqualifying. These questions were included in order to make the research inclusive and supportive of identities.
The code below was developed after manually inspecting these fields and identifying values that can be included or should be excluded. If more observations are gathered, then those subsequent records would need to also be manually inspected.
For a strict interpretation of respondent answers, use the flag at top of the notebook to turn include.alternatives to FALSE.
# A value was only entered in the raceother column if the respondent chose Altnerative for the
# Race question
altrace = NULL
#This entire section will be skipped if include.alternatives flag set at top is false
if (include.alternatives == TRUE) {
# First display the data in these alternative fields
altrace <- cleandata %>%
select(raceother) %>%
filter((!is.na(raceother) | raceother != "")
)
# Based on manual inspection, exclude the "White & Asian" answer, mixed or biracial does not meet study criteria
cleandata <- cleandata %>%
filter((!str_detect(raceother, 'Asian')) | is.na(raceother))
cleandata <- cleandata %>%
mutate(race = recode(race, "Alternate or self-identified answer:" = "Other White"))
numcleanobs = nrow(cleandata)
}
### BELOW NOT WORKING
### New to fix the sexuality fields
cleandata <- cleandata %>%
mutate(
sexualty = case_when(
sexualityother == "Asexual/panromantic" ~ "Asexual",
sexualityother == "Demi-pansexual" ~ "Pansexual"
)
)
Values from the fill-in “other race” field in the currently selected data subset are:
The respondents who indicated “Jewish” as part of their alternative answer are included in the study, as long as that is the sole explanatory answer.
The respondent who indicated “Asian” as part of their alternative answer is excluded.
For gender identity, the SurveyMonkey survey was set up to disqualify all answers except for “woman or female.”
None of the respondents who answered “woman or female” also answered “Yes” to the question asking if they identify as trans. One respondent answered “Non-binary or gender nonconforming” to the gender question, then answered “Yes” to trans, however they did not complete the instruments due to the “non-binary” answer for gender. The survey was set up to allow a respondent to answer “Yes” for the trans question, but they would have had to self-identify as “woman or female” in order to proceed with the survey instruments.
# Save the records
psychobs <- cleandata %>%
filter(
(profID == 'psychologist')
)
cleandata <- cleandata %>%
filter(
(profID != 'psychologist')
)
numcleanobs = nrow(cleandata)
numpsychobs = nrow(psychobs)
The full dataset included 3 observations from respondents who identify primarily as psychologists.
Psychologists have different training and hold a different professional identity, and were not targeted as part of this study. In addition,3 is a very small number and does not allow for meaningful analysis or comparisons to other groups. Therefore observations from respondents identifying as psychologists as primary professional identity will be removed.
The original dataset from SurveyMonkey had 149 rows (not including 2 header rows).
14 respondents entered disqualifying identity responses or quit before completing the survey and their partial entries are excluded.
3 observations failed the attention check and are excluded.
3 respondents identified as psychologists and are excluded.
The dataset is now at 129 rows.
Run some statistics on the data that remains in order to check assumptions in advance of parametric analysis.
df_gender <- cleandata %>%
group_by(gender) %>%
summarise(counts = n())
df_gender
# ggplot(df_gender, aes(x=gender, y=counts)) +
# geom_bar(fill = "#0073C2FF", stat="identity") +
# theme_minimal()
Note: The raw dataset did include several respondents who self-identified as non-binary, however that was a disqualifying response (along with self-identifying as male). Those partial responses were excluded.
By contrast, one respondent left the yes/no question asking if trans blank, and gave a description of their identity. Since they self-identified as female in the main gender question, their response is included.
If on the gender or race question, they selected “Other (describe)” then their response was not automatically excluded by SurveyMonkey, but instead was manually examined to determine if the self-identification description was sufficient to meet the study criteria or not. This needs to be done outside this script (unless there’s a way to create a prompt with a yes/no answer during the knit process?). For now, such answers are being excluded further down the script; this section is only highlighting that they exist in the pared-down dataset.
df_trans <- cleandata %>%
group_by(trans) %>%
summarise(counts = n())
df_trans
ggplot(df_trans, aes(x=trans, y=counts)) +
geom_bar(fill = "#0073C2FF", stat="identity") +
theme_minimal()
df_race <- cleandata %>%
group_by(race) %>%
summarise(counts = n())
df_race
ggplot(df_race, aes(x=race, y=counts)) +
geom_bar(fill = "#0073C2FF", stat="identity") +
theme_minimal()
df_age <- cleandata %>%
group_by(agecat) %>%
summarise(counts = n())
df_age
ggplot(df_age, aes(x=agecat, y=counts)) +
geom_bar(fill = "#0073C2FF", stat="identity") +
theme_minimal()
df_profID <- cleandata %>%
group_by(profID) %>%
summarise(counts = n())
df_profID
ggplot(df_profID, aes(x=profID, y=counts)) +
geom_bar(fill = "#0073C2FF", stat="identity") +
theme_minimal()
df_yrsprac <- cleandata %>%
group_by(yrsprac) %>%
summarise(counts = n())
df_yrsprac
ggplot(df_yrsprac, aes(x=yrsprac, y=counts)) +
geom_bar(fill = "#0073C2FF", stat="identity") +
theme_minimal()
df_region <- cleandata %>%
group_by(region) %>%
summarise(counts = n())
df_region
ggplot(df_region, aes(x=region, y=counts)) +
geom_bar(fill = "#0073C2FF", stat="identity") +
theme_minimal()
df_relationstatus <- cleandata %>%
group_by(relationstatus) %>%
summarise(counts = n())
df_relationstatus
ggplot(df_relationstatus, aes(x=relationstatus, y=counts)) +
geom_bar(fill = "#0073C2FF", stat="identity") +
theme_minimal()
(this is the section that got moved to earlier in the script, had to rename coredata to cleandata)
Now we will score each respondent’s answers on the three instruments.
Note: The SELFEFF A-RES Self-Efficacy score for antiracist action is scaled such that LOWER numbers mean higher levels of self-efficacy. From the original paper: “Items were answered using a 4-point, fully-anchored, Likert-type scale ranging from Agree strongly (1) to Disagree strongly (4). Items were rescored so a higher number would indicate a more positively-valanced response.” Because of this, the items were similarly rescored here.
#### Recode the Likert responses ####
## from https://towardsdatascience.com/working-with-survey-data-clean-and-visualize-likert-scale-questions-in-r-6a78e3b9c7b2
## USED FOR PCRW & WPAS
likert_recode <- function(x) {
as.numeric(case_when(
x == "Strongly Disagree" ~ 1,
x == "Moderately Disagree" ~ 2,
x == "Slightly Disagree" ~ 3,
x == "Slightly Agree" ~ 4,
x == "Moderately Agree" ~ 5,
x == "Strongly Agree" ~ 6,
))
}
# Negative Recode
likert_recode_negative <- function(x) {
as.numeric(case_when(
x == "Strongly Disagree" ~ 6,
x == "Moderately Disagree" ~ 5,
x == "Slightly Disagree" ~ 4,
x == "Slightly Agree" ~ 3,
x == "Moderately Agree" ~ 2,
x == "Strongly Agree" ~ 1,
))
}
#######
# Used for A-RES
# See note above from original published paper saying rescored
likert_recode_4 <- function(x) {
as.numeric(case_when(
x == "Agree strongly" ~ 4,
x == "Agree" ~ 3,
x == "Disagree" ~ 2,
x == "Disagree strongly" ~ 1,
))
}
# Negative Recode
likert_recode_negative_4 <- function(x) {
as.numeric(case_when(
x == "Agree strongly" ~ 1,
x == "Agree" ~ 2,
x == "Disagree" ~ 3,
x == "Disagree strongly" ~ 4,
))
}
## PCRW
# This is the Respondent ID col1, plus the PCRW questions 01-16 columns
#pcrw_subset <- cleandata[ , c(1,PCRWQ01:PCRWQ16)]
pcrw_subset <- cleandata %>%
select("Respondent ID", PCRWQ01:PCRWQ16)
# Questions 2, 8, and 12 are reverse-scored
# "Higher scores should indicate greater levels of costs of racism. There is no total score, so please use only subscale scores."
# Apply the recoding
pcrw_positive <- pcrw_subset %>%
select(-'Respondent ID', -PCRWQ02, -PCRWQ08, -PCRWQ12) %>%
mutate_all(likert_recode)
pcrw_negative <- pcrw_subset %>%
select(PCRWQ02, PCRWQ08, PCRWQ12) %>%
mutate_all(likert_recode_negative)
#Recombine data - but this just merges them without putting them in the correct order by Q#
pcrw_subset_recoded <- cbind(pcrw_positive, pcrw_negative)
#pcrw_subset_recoded %>% select(order(colnames(pcrw_subset_recoded )))
pcrw_subset_recoded <- pcrw_subset_recoded %>%
select(sort(tidyselect::peek_vars()))
# pcrw_wtanger_items <- pcrw_subset_recoded[ , c(1,3,6,10,14,16)]
# pcrw_wtguilt_items <- pcrw_subset_recoded[ , c(4,7,8,12,15)]
# pcrw_wtfear_items <- pcrw_subset_recoded[ , c(2,5,9,11,13)]
#### Calculate Totals ####
# Subscale Totals
#cleandata$pcrwsub1total <- rowSums(pcrw_subset_recoded) #Create column of Subscale 1 total per person
# White Empathy aka WTANGER in my study
#cleandata$WTANGER <- rowSums(pcrw_subset_recoded[ , c(1,3,6,10,14,16)], na.rm=TRUE)
#cleandata$WTANGER <- rowMeans(pcrw_subset_recoded[ , c(1,3,6,10,14,16)], na.rm=TRUE)
cleandata$WTANGER <- rowMeans(pcrw_subset_recoded[c("PCRWQ01", "PCRWQ03", "PCRWQ06", "PCRWQ10", "PCRWQ14", "PCRWQ16")], na.rm=TRUE)
#Create column of PCRW Subscale 1 total per respondent
# White Guilt aka WTGUILT in my study
#cleandata$WTGUILT <- rowSums(pcrw_subset_recoded[ , c(4,7,8,12,15)], na.rm=TRUE)
#cleandata$WTGUILT <- rowMeans(pcrw_subset_recoded[ , c(4,7,8,12,15)], na.rm=TRUE)
cleandata$WTGUILT <- rowMeans(pcrw_subset_recoded[c("PCRWQ04", "PCRWQ07", "PCRWQ08", "PCRWQ12", "PCRWQ15")], na.rm=TRUE)
#Create column of PCRW Subscale 2 total per respondent
# White Fear aka WTFEAR in my study
#cleandata$WTFAR <- rowSums(pcrw_subset_recoded[ , c(2,5,9,11,13)], na.rm=TRUE)
#cleandata$WTFEAR <- rowMeans(pcrw_subset_recoded[ , c(2,5,9,11,13)], na.rm=TRUE) #Create column of PCRW
cleandata$WTFEAR <- rowMeans(pcrw_subset_recoded[c("PCRWQ02", "PCRWQ05", "PCRWQ09", "PCRWQ11", "PCRWQ13")], na.rm=TRUE)
# don't know if I need this?
#pcrw_subset_recoded <- cbind(id = pcrw_subset$'Respondent ID', pcrw_subset_recoded) #Recombine data with Respondent IDs
## A-RES: COMPETENCE
# This is the Respondent ID col1, plus the A-RES questions 01-02 columns
# Only the Competence subscale is being used in this study
#aresc_subset <- cleandata[ , c(1,61:62)]
aresc_subset <- cleandata %>%
select('Respondent ID', ARESQ01:ARESQ02)
# This instrument is scaled opposite to the others
# Apply the recoding
aresc_subset <- aresc_subset %>%
select(-'Respondent ID') %>%
mutate_all(likert_recode_4)
# no negative-scored items in the Competence subscale
## A-RES: IMPACT
# This is the Respondent ID col1, plus the A-RES Impact questions 03-04 columns
#aresi_subset <- cleandata[ , c(1,63:64)]
aresi_subset <- cleandata %>%
select('Respondent ID', ARESQ03:ARESQ04)
# This instrument is scaled opposite to the others
# Question 4 is reverse-scored
# Apply the recoding
aresi_positive <- aresi_subset %>%
select(-'Respondent ID', -ARESQ04) %>%
mutate_all(likert_recode_4)
aresi_negative <- aresi_subset %>%
select(ARESQ04) %>%
mutate_all(likert_recode_negative_4)
#Recombine data
aresi_subset_recoded <- cbind(aresi_positive, aresi_negative)
###NOT USING THIS 'IMPACT' VARIABLE - NOT RECORDING IT INTO THE OUTPUT
head(aresi_subset_recoded)
#### Calculate Totals ####
# Subscale Totals
#cleandata$SELFEFF <- rowSums(aresc_subset)
cleandata$SELFEFF <- rowMeans(aresc_subset)
#Create column of subscale COMPETENCE total per person
##############-----------------------
## WPAS
# Dependent (Outcome) Variable
# This is the Respondent ID col1, plus the WPAS questions 01-12 columns
#wpas_subset <- cleandata[ , c(1,48:59)]
wpas_subset <- cleandata %>%
select('Respondent ID', WPASQ01:WPASQ12)
# This study uses Subscale 1 only, Confronting White Privilege, renamed to Confronting Racism
# Questions 4 and 7 are reverse-scored
# "Higher scores correspond with higher levels of acknowledgement of [racism]."
# Apply the recoding
wpas_positive <- wpas_subset %>%
select(-'Respondent ID', -WPASQ04, -WPASQ07) %>%
mutate_all(likert_recode)
wpas_negative <- wpas_subset %>%
select(WPASQ04, WPASQ07) %>%
mutate_all(likert_recode_negative)
#Recombine data
wpas_subset_recoded <- cbind(wpas_positive, wpas_negative)
#### Calculate Totals ####
# Subscale Totals
#cleandata$ACCOUNT <- rowSums(wpas_subset_recoded)
cleandata$ACCOUNT <- rowMeans(wpas_subset_recoded)
#Create column of Subscale 1 total per person
# Set labels on the data
cleandata <- cleandata %>%
set_variable_labels(WTANGER = "PCRW White Anger", WTGUILT = "PCRW White Guilt", WTFEAR = "PCRW White Fear", SELFEFF = "A-RES Antiracist Self Efficacy", ACCOUNT = "WPAS Antiracist Accountability")
This may be useful for a future study on the same dataset, looking at differences between the different professional identities.
#do the analysis of variance
aov.prof = aov(ACCOUNT~profID,data = cleandata)
#show the summary table
summary(aov.prof)
## Df Sum Sq Mean Sq F value Pr(>F)
## profID 3 0.61 0.2029 0.603 0.614
## Residuals 125 42.07 0.3366
print(model.tables(aov.prof,"means"),digits=3)
## Tables of means
## Grand mean
##
## 5.393411
##
## profID
## counselor counselor educator social worker therapist
## 5.45 5.48 5.24 5.35
## rep 55.00 11.00 10.00 53.00
#report the means and the number of subjects/cell
boxplot(ACCOUNT~profID,data = cleandata)
#graphical summary appears in graphics window
# Using package report
# gender is col 4, omitting it as all are women
# spell_n is about output numbers as words vs digits
paste(
report_participants(data = subset(cleandata[ c(7,9,10,12, 13, 14, 15)], age = agecat, group = region, spell_n = TRUE),
"were recruited in the study by means of torture and coercion.")
)
## Warning: In subset.data.frame(cleandata[c(7, 9, 10, 12, 13, 14, 15)], age = agecat,
## group = region, spell_n = TRUE) :
## extra arguments 'age', 'group', 'spell_n' will be disregarded
## [1] "129 participants (Education: Bachelor degree, 7.75%; Doctorate degree, 14.73%; Master's degree, 76.74%; Skip/Prefer not to say, 0.78%)"
# <!--
#
# The dataset was comprised of `r report_participants(cleandata, age = agecat, group = region, spell_n = TRUE)`.
#
# -->
report_sample(cleandata, group_by = "profID", select = c("WTANGER", "WTGUILT", "WTFEAR", "SELFEFF", "ACCOUNT"))
library(apaTables)
table1 <- apa.cor.table(cleandata,
table.number=1)
print(table1)
##
##
## Table 1
##
## Descriptive Statistics and Correlations
##
##
## Variable N M SD 1 2 3 4
## 1. WTANGER 129 5.32 0.52
##
##
##
## 2. WTGUILT 129 3.81 1.07 .21*
## [.04, .37]
## p = .017
##
## 3. WTFEAR 129 2.07 0.66 .04 .19*
## [-.13, .21] [.02, .35]
## p = .660 p = .031
##
## 4. SELFEFF 129 3.33 0.46 .11 .01 -.37**
## [-.07, .28] [-.16, .18] [-.51, -.21]
## p = .223 p = .896 p < .001
##
## 5. ACCOUNT 129 5.39 0.58 .30** .26** -.33** .53**
## [.13, .45] [.09, .42] [-.47, -.16] [.39, .64]
## p < .001 p = .003 p < .001 p < .001
##
##
## Note. N = number of cases. M = mean. SD = standard deviation.
## Values in square brackets indicate the 95% confidence interval.
## * indicates p < .05. ** indicates p < .01.
##
# apa.save(filename = "table1.doc",
# table1)
#apa.knit.table.for.pdf(table1)
Here is a correlations table
cleandata %>%
select(WTANGER, WTGUILT, WTFEAR, SELFEFF, ACCOUNT) %>%
apa.cor.table()
##
##
## Table 0
##
## Descriptive Statistics and Correlations
##
##
## Variable N M SD 1 2 3 4
## 1. WTANGER 129 5.32 0.52
##
##
##
## 2. WTGUILT 129 3.81 1.07 .21*
## [.04, .37]
## p = .017
##
## 3. WTFEAR 129 2.07 0.66 .04 .19*
## [-.13, .21] [.02, .35]
## p = .660 p = .031
##
## 4. SELFEFF 129 3.33 0.46 .11 .01 -.37**
## [-.07, .28] [-.16, .18] [-.51, -.21]
## p = .223 p = .896 p < .001
##
## 5. ACCOUNT 129 5.39 0.58 .30** .26** -.33** .53**
## [.13, .45] [.09, .42] [-.47, -.16] [.39, .64]
## p < .001 p = .003 p < .001 p < .001
##
##
## Note. N = number of cases. M = mean. SD = standard deviation.
## Values in square brackets indicate the 95% confidence interval.
## * indicates p < .05. ** indicates p < .01.
##
# code from https://github.com/rempsyc/rempsyc/issues/9
conflicts_prefer(dplyr::lag)
## [conflicted] Will prefer dplyr::lag over any other package.
cleandata %>%
pivot_longer(cols = c(agecat, sexuality, relationstatus, education, class, profID, yrsprac), names_to = "Characteristics", values_to = "Category") %>%
count(Characteristics, Category, name = "Frequency") %>%
arrange(factor(Characteristics, levels = c("Age", "Sexual Identity", "Relationship Status", "Highest Education Completed", "SES", "Professional Identity", "Years Practicing"))) %>%
mutate(Percentage = prop.table(Frequency) * 100,
Category = as.character(Category),
Characteristics = ifelse(Characteristics == lag(
Characteristics, default = ""), "", Characteristics)) %>%
# nice_table(title = c("Table 1", "Sample demographic characteristics (N = 877)")) %>%
as_flextable() %>%
save_as_image(path = "sampledemographics2.png")
## Warning: ftExtra:::as_flextable.data.frame is deprecated and will be removed in
## the future release. Consider using flextalbe's implementation by running
## `.S3method("as_flextable", "data.frame", flextable:::as_flextable.data.frame)`
## [1] "sampledemographics2.png"
Next we will choose which responses to filter out. Choices coded here support the study criteria: self identified white, self identified woman, self-identified counselors. These can be reconfigured using the flags at the top of the script and the analysis re-run accordingly. (Possible to do later: Set up parameterized reporting)
## LIMIT THE DATASET
# This data will be reduced to a subset called coredata for this configured analysis
coredata <- cleandata
# Next, remove disqualifying responses
# Use flags at top of the script to turn these on or off
# 1) Not self-identifying as woman
if(include.woman == TRUE) {
coredata <- coredata %>%
filter(
(gender == 'woman')
)
}
print(nrow(coredata))
## [1] 129
# 2) Not self-identifying solely as white
if(include.white == TRUE && include.alternatives == FALSE) {
coredata <- coredata %>%
filter(
(race == 'White / European-American')
)
} else if(include.white == TRUE && include.alternatives == TRUE) {
coredata <- coredata %>%
filter(
(race == 'White / European-American' | race == 'Other White')
)
}
# 3) Not identifying as counselor and/or counselor educator
if(counselingprofessionalsonly == TRUE) {
coredata <- coredata %>%
filter(
(profID == 'counselor' | profID == 'counselor educator')
)
} else {
# pull in only the ones flagged at top of script
if(include.counselors != TRUE) {
coredata <- coredata %>%
filter(
(profID != 'counselor')
)
}
if(include.counseloreducators != TRUE) {
coredata <- coredata %>%
filter(
(profID != 'counselor educator')
)
}
if(include.therapists != TRUE) {
coredata <- coredata %>%
filter(
(profID != 'therapist')
)
}
if(include.psychologists != TRUE) {
coredata <- coredata %>%
filter(
(profID != 'psychologist')
)
}
if(include.socialworkers != TRUE) {
coredata <- coredata %>%
filter(
(profID != 'social worker')
)
}
}
observations = nrow(coredata)
print(observations)
## [1] 64
For the remainder of this output, the number of observations included in the analysis is: 64
This current dataset includes:
df_trans2 <- coredata %>%
group_by(trans) %>%
summarise(counts = n())
df_trans2
ggplot(df_trans2, aes(x=trans, y=counts)) +
geom_bar(fill = "#0073C2FF", stat="identity") +
theme_minimal()
df_race2 <- coredata %>%
group_by(race) %>%
summarise(counts = n())
df_race2
ggplot(df_race2, aes(x=race, y=counts)) +
geom_bar(fill = "#0073C2FF", stat="identity") +
theme_minimal()
df_age2 <- coredata %>%
group_by(agecat) %>%
summarise(counts = n())
df_age2
ggplot(df_age2, aes(x=agecat, y=counts)) +
geom_bar(fill = "#0073C2FF", stat="identity") +
theme_minimal()
df_yrsprac2 <- coredata %>%
group_by(yrsprac) %>%
summarise(counts = n())
df_yrsprac2
ggplot(df_yrsprac2, aes(x=yrsprac, y=counts)) +
geom_bar(fill = "#0073C2FF", stat="identity") +
theme_minimal()
df_region2 <- coredata %>%
group_by(region) %>%
summarise(counts = n())
df_region2
ggplot(df_region2, aes(x=region, y=counts)) +
geom_bar(fill = "#0073C2FF", stat="identity") +
theme_minimal()
df_relationstatus2 <- coredata %>%
group_by(relationstatus) %>%
summarise(counts = n())
df_relationstatus2
ggplot(df_relationstatus2, aes(x=relationstatus, y=counts)) +
geom_bar(fill = "#0073C2FF", stat="identity") +
theme_minimal()
They found 5 clusters in Spanierman et al. (2006).
The below analysis is done to follow the standard set by Soble et al. (2011) in how they did their analysis of the PCRW instrument as part of their study. It is intended to check whether the current sample matches the characteristics identified from the sample used in development of the PCRW.
From Spanierman et al. (2012): “Similar to previous studies (e.g., Spanierman et al. 2006), a cluster group’s subscale score is considered “high” if it falls within the upper one third of the overall distribution of sample subscale scores. A “moderate” subscale score is one that falls within the middle third of the overall distribution of subscale scores, and a subscale score that falls within the bottom third of the distribution is con- sidered “low” (Spanierman et al. 2006). Although not iden- tical, the five cluster groups from initial studies and the present study displayed similar overall mean patterns of White empathy, White guilt, and White fear. We therefore continued our investigation to examine differences in cluster types by gender.”
“To perform k-means clustering in R we can use the built-in kmeans() function”.
library(factoextra)
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
library(cluster)
# from https://www.statology.org/k-means-clustering-in-r/
# 1. Number of Clusters vs. the Total Within Sum of Squares
matrixdata <- select(coredata, c('WTANGER':'WTFEAR'))
fviz_nbclust(matrixdata, kmeans, method = "wss")
# Perform K-Means Clustering with Optimal K
#make this example reproducible
set.seed(1)
# this line frmo https://stackoverflow.com/questions/21858124/r-error-in-glmnet-na-nan-inf-in-foreign-function-call
# this would work on the individual PCRW items
#xsub <- model.matrix( ~ ., cleandata[ , c(32:47)])
# this works on just the raw score means (i.e., scale score divided by number of items in scale) for each subscale) - this is done in @spanierman2006
#xsub <- model.matrix( ~ ., cleandata[ , c(65:67)])
xsub <- model.matrix( ~ ., matrixdata)
#perform k-means clustering with k = 5 clusters on the PCRW items only
km <- kmeans(xsub, centers = 5, nstart = 25)
#view results
km
## K-means clustering with 5 clusters of sizes 6, 4, 26, 16, 12
##
## Cluster means:
## (Intercept) WTANGER WTGUILT WTFEAR
## 1 1 4.472222 4.766667 2.133333
## 2 1 5.291667 3.350000 3.400000
## 3 1 5.416667 3.523077 1.669231
## 4 1 5.604167 4.887500 1.887500
## 5 1 5.013889 2.000000 1.783333
##
## Clustering vector:
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
## 4 3 3 1 3 3 3 5 1 3 3 2 3 4 3 3 5 5 1 5 3 5 5 5 4 4
## 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52
## 5 4 3 4 3 3 4 3 4 2 4 5 3 2 3 3 3 5 4 5 1 4 4 3 3 1
## 53 54 55 56 57 58 59 60 61 62 63 64
## 3 3 5 3 4 1 2 4 3 3 4 4
##
## Within cluster sum of squares by cluster:
## [1] 2.665370 3.541944 10.957650 8.170278 11.868796
## (between_SS / total_SS = 69.1 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss" "tot.withinss"
## [6] "betweenss" "size" "iter" "ifault"
# WTANGER
quantile(coredata$WTANGER, probs = seq(0, 1, 1/3))
## 0% 33.33333% 66.66667% 100%
## 3.500000 5.166667 5.500000 6.000000
# WTGUILT
quantile(coredata$WTGUILT, probs = seq(0, 1, 1/3))
## 0% 33.33333% 66.66667% 100%
## 1.2 3.2 4.4 5.8
#WTFEAR
quantile(coredata$WTFEAR, probs = seq(0, 1, 1/3))
## 0% 33.33333% 66.66667% 100%
## 1.0 1.6 2.0 4.0
Cluster means: ## WTANGER WTGUILT WTFEAR ## 1 Insensitive and afraid (4) 5.291667 = moderate 3.350000 = moderate 3.400000 = high ## 2 Unempathic and unaware (12) 5.013889 = low 2.000000 = low 1.783333 = low ## 3 Empathic but unaccountable. (26) 5.429487 = moderate 3.476923 = moderate 1.684615 = low ## 4 Fearful guilt (6) 4.472222 = low 4.766667 = high 2.133333 = high ## 5 Informed empathy and guilt. (17) 5.588235 = high 4.847059 = high 1.882353 = moderate
Coefficient alpha is a measure of internal consistency, or how closely related a set of items are as a group.
NO - SKIP THIS Another measure of internal consistency promoted by the psych package authors is McDonald’s coefficient omega
See https://www.personality-project.org/r/psych/HowTo/omega.pdf linked from https://www.personality-project.org/r/psych/
Find omega on the A-RES scale
omega(wpas_subset_recoded, check.keys=F)
calculate Cronbach’s alpha of the three PCRW subscale responses.
COME BACK TO THIS
# pcrw_wtanger_items <- pcrw_subset_recoded[ , c(1,3,6,10,14,16)]
# pcrw_wtguilt_items <- pcrw_subset_recoded[ , c(4,7,8,12,15)]
# pcrw_wtfear_items <- pcrw_subset_recoded[ , c(2,5,9,11,13)]
head(pcrw_wtanger_items)
CronbachWTANGER <- psych::alpha(pcrw_wtanger_items, check.keys=T)
CronbachWTGUILT <- psych::alpha(pcrw_wtguilt_items, check.keys=T)
CronbachWTFEAR <- psych::alpha(pcrw_wtfear_items, check.keys=T)
# inspect results
CronbachWTANGER
CronbachWTGUILT
CronbachWTFEAR
calculate Cronbach’s alpha of the WPAS instrument responses.
This is particularly important due to substitution of wording from white privilege in original items of validated WPAS instrument, to the term racism in current study, however maybe not necessary since the WPAS was validated and reliabiity confirmed when developed by original authors?
Cronbach <- psych::alpha(wpas_subset_recoded, check.keys=F)
# Cronbach <- psych::alpha(coredata[c("WPASQ01",
# "WPASQ02",
# "WPASQ03",
# "WPASQ04",
# "WPASQ05",
# "WPASQ06",
# "WPASQ07",
# "WPASQ08",
# "WPASQ09",
# "WPASQ10",
# "WPASQ11",
# "WPASQ12")], check.keys=F)
# inspect results
Cronbach
##
## Reliability analysis
## Call: psych::alpha(x = wpas_subset_recoded, check.keys = F)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.9 0.91 0.93 0.44 9.5 0.014 5.4 0.58 0.46
##
## 95% confidence boundaries
## lower alpha upper
## Feldt 0.87 0.9 0.92
## Duhachek 0.87 0.9 0.92
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## WPASQ01 0.88 0.89 0.92 0.43 8.3 0.016 0.023 0.44
## WPASQ02 0.89 0.90 0.93 0.46 9.2 0.015 0.022 0.48
## WPASQ03 0.88 0.89 0.92 0.43 8.2 0.016 0.023 0.42
## WPASQ05 0.88 0.89 0.92 0.43 8.3 0.015 0.023 0.44
## WPASQ06 0.88 0.89 0.92 0.43 8.1 0.016 0.023 0.43
## WPASQ08 0.89 0.90 0.92 0.44 8.8 0.015 0.022 0.47
## WPASQ09 0.89 0.90 0.93 0.46 9.3 0.014 0.021 0.47
## WPASQ10 0.89 0.90 0.92 0.44 8.6 0.015 0.021 0.46
## WPASQ11 0.89 0.90 0.92 0.45 8.9 0.015 0.021 0.47
## WPASQ12 0.89 0.90 0.92 0.44 8.6 0.015 0.022 0.46
## WPASQ04 0.90 0.90 0.92 0.46 9.4 0.013 0.019 0.48
## WPASQ07 0.89 0.90 0.92 0.45 9.1 0.014 0.023 0.47
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## WPASQ01 129 0.80 0.77 0.76 0.74 5.2 0.90
## WPASQ02 129 0.64 0.62 0.57 0.55 5.3 0.84
## WPASQ03 129 0.79 0.79 0.78 0.74 5.2 0.80
## WPASQ05 129 0.75 0.77 0.75 0.70 5.7 0.66
## WPASQ06 129 0.81 0.81 0.80 0.76 5.4 0.82
## WPASQ08 129 0.65 0.69 0.66 0.59 5.7 0.64
## WPASQ09 129 0.57 0.59 0.55 0.48 5.5 0.80
## WPASQ10 129 0.68 0.73 0.71 0.63 5.7 0.63
## WPASQ11 129 0.71 0.67 0.65 0.63 5.0 0.97
## WPASQ12 129 0.69 0.73 0.71 0.63 5.6 0.74
## WPASQ04 129 0.64 0.58 0.55 0.52 4.9 1.25
## WPASQ07 129 0.64 0.64 0.60 0.55 5.6 0.93
##
## Non missing response frequency for each item
## 1 2 3 4 5 6 miss
## WPASQ01 0.01 0.00 0.04 0.14 0.40 0.42 0
## WPASQ02 0.00 0.02 0.01 0.14 0.37 0.47 0
## WPASQ03 0.00 0.02 0.01 0.11 0.45 0.42 0
## WPASQ05 0.00 0.01 0.01 0.04 0.17 0.78 0
## WPASQ06 0.00 0.01 0.01 0.15 0.29 0.54 0
## WPASQ08 0.00 0.01 0.01 0.03 0.17 0.78 0
## WPASQ09 0.00 0.01 0.02 0.08 0.24 0.65 0
## WPASQ10 0.00 0.00 0.02 0.04 0.21 0.74 0
## WPASQ11 0.00 0.03 0.02 0.20 0.38 0.36 0
## WPASQ12 0.00 0.00 0.02 0.08 0.19 0.71 0
## WPASQ04 0.02 0.04 0.07 0.15 0.31 0.41 0
## WPASQ07 0.01 0.02 0.01 0.06 0.15 0.75 0
Internal consistency confirmed. Results show Cronbach’s alpha > 0.7 so items in the WPAS instrument are internally reliable (even with the modification).
The diagnostic plots to check if the assumptions for regression are met show several outliers. An examination of these identified observations themselves did not reveal any evidence that they are errors. For example, the start time and end time of the submissions are similar to other observations, so there is no indication that the respondent sped through quickly without actually reading the item, such as might occur with a participant interested in completing in order to enter the raffle.
Let’s examine the answers to each of the 12 WPAS questions. Questions 4 and 7 are reverse-coded.
# https://larmarange.github.io/ggstats/articles/gglikert.html
# This mutate step is required to put the levels in the correct order
# Don't do it across all columns though - this makes all datatypes into factors
# coredata2 <- coredata %>%
# mutate(across(everything(), ~ factor(.x, levels = likert_levels_plot)))
# coredata2 <- coredata %>%
# mutate(across(WPASQ01:WPASQ12, ~ factor(.x, levels = likert_levels_plot)))
# Below code superseded by global labels function implemented in earlier chunk
# coredata2 <- coredata2 %>%
# set_variable_labels(
# WPASQ01 = paste("Q1 ", originalsm.labels[48]),
# WPASQ02 = paste("Q2 ", originalsm.labels[49]),
# WPASQ03 = paste("Q3 ", originalsm.labels[50]),
# WPASQ04 = paste("Q4 ", originalsm.labels[51]), #reverse
# WPASQ05 = paste("Q5 ", originalsm.labels[52]),
# WPASQ06 = paste("Q6 ", originalsm.labels[53]),
# WPASQ07 = paste("Q7 ", originalsm.labels[54]), #reverse
# WPASQ08 = paste("Q8 ", originalsm.labels[55]),
# WPASQ09 = paste("Q9 ", originalsm.labels[56]),
# WPASQ10 = paste("Q10 ", originalsm.labels[57]),
# WPASQ11 = paste("Q11 ", originalsm.labels[58]),
# WPASQ12 = paste("Q12 ", originalsm.labels[59])
# )
#
# vt(coredata2)
if(counselingprofessionalsonly) {
counsonly <- coredata %>%
filter(profID == "counselor" | profID == "counselor educator")
## below is one question at a time, with breakdown by couns vs counsed
# gglikert(counsonly,
# include = c(WPASQ01),
# y = "profID",
# facet_rows = vars(.question),
# labels_size = 3,
# facet_label_wrap = 15) +
# ggtitle("WPAS Q1 only") +
# scale_fill_brewer(palette = "RdYlBu", direction =-1)
# Below is full set of WPAS questions, for counselors and counselor educators only
# Use this line if want to do the group-by on profID
#gglikert(counsonly, include = c(WPASQ01:WPASQ03, WPASQ05:WPASQ06, WPASQ08:WPASQ12), facet_rows = vars(profID)) +
gglikert(counsonly, include = c(WPASQ01:WPASQ03, WPASQ05:WPASQ06, WPASQ08:WPASQ12)) +
ggtitle(paste("Items comprising the ACCOUNT var,", nrow(counsonly), "obs"), subtitle = "Standard-coded questions") +
# scale_x_continuous ( labels = label_percent_abs(),
# expand = expansion(0,.2))
scale_fill_brewer(palette = "RdYlBu", direction =-1)
gglikert(coredata, include = c(WPASQ04,WPASQ07)) +
ggtitle(paste("Items comprising the ACCOUNT var,", nrow(counsonly), "obs", subtitle = "Reverse-coded questions")) +
scale_fill_brewer(palette = "RdYlBu", direction =1)
}
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
Per Flamez et al. (2017), before parametric analysis, the dataset needs to be examined to confirm it meets model assumptions, specifically for participant independence, that the dataset is normally distributed, and also homogeneity of covariances.
I understand now that I don’t need to worry about whether the sample variables are normally distributed or not. I am keeping this in this working notebook just to preserve the code/my learning process. The main interesting one is ACCOUNT, the last one, which shows some observations in the tail (left side of x axis) which are worth investigating further.
p <- ggplot(coredata, aes(x=WTANGER)) + geom_histogram()
p+ geom_vline(aes(xintercept=mean(WTANGER)),
color="blue", linetype="dashed", linewidth=1)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# Histogram with density plot
ggplot(coredata, aes(x=WTANGER)) +
geom_histogram(aes(y=after_stat(density)), colour="black", fill="white")+
geom_density(alpha=.2, fill="#FF6666")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
hist(coredata$WTANGER)
hist(coredata$WTGUILT)
hist(coredata$WTFEAR)
hist(coredata$SELFEFF)
hist(coredata$ACCOUNT)
descriptivestats <- report_sample(coredata, select = c("WTANGER", "WTGUILT", "WTFEAR", "SELFEFF", "ACCOUNT"))
descriptivestats %>%
flextable() %>%
set_header_labels(Summary = "Mean (Std Dev)", values = NULL)
Variable | Mean (Std Dev) |
|---|---|
Mean WTANGER (SD) | 5.29 (0.55) |
Mean WTGUILT (SD) | 3.68 (1.12) |
Mean WTFEAR (SD) | 1.90 (0.60) |
Mean SELFEFF (SD) | 3.34 (0.45) |
Mean ACCOUNT (SD) | 5.46 (0.59) |
coredata %>%
select(WTANGER:ACCOUNT) %>%
describe(fast = TRUE) %>%
round(digits = 2) %>%
flextable()
vars | n | mean | sd | min | max | range | se |
|---|---|---|---|---|---|---|---|
1 | 64 | 5.29 | 0.55 | 3.50 | 6.0 | 2.50 | 0.07 |
2 | 64 | 3.68 | 1.12 | 1.20 | 5.8 | 4.60 | 0.14 |
3 | 64 | 1.90 | 0.60 | 1.00 | 4.0 | 3.00 | 0.08 |
4 | 64 | 3.34 | 0.45 | 2.00 | 4.0 | 2.00 | 0.06 |
5 | 64 | 5.46 | 0.59 | 2.83 | 6.0 | 3.17 | 0.07 |
Turned eval=FALSE for these because they aren’t running in this notebook, the code does work in the Results chapter. This had been in the Linear Regression section.
[Editing Note: Including the plots below, unsure if I will include them in the actual dissertation. Thoughts? I don’t think I can easily extract them to create their own figure #s and recreating each from scratch would be quite a pain. Can I just say that I ran the diagnostics and describe my assessment?]
[Editing Note: again including these?]
I’m trying to figure out what type of regression to run, so here’s what I understand so far. I figured out that it’s not a multinomial regression, since I only have one DV. I was told to do a hierarchical regression, which I understand to be a LM where the IVs are named in order of importance, based on my understanding of the subject of study. I also discovered stepwise regression which seems to be an avenue to explore? See below for some of that.
What type of variables are these? Since each IV and the DV are calculated from the Likert-style ordinal values on the items on each instrument, are my IV and DV also ordinal, even though they are now represented as integers?
I looked at a lot of sources to figure this out and got conflicting information. For example, here is a discussion of when to use Likert scale responses as ordinal or interval: (https://stats.stackexchange.com/questions/10/under-what-conditions-should-likert-scales-be-used-as-ordinal-or-interval-data). The Likert responses in this study’s instruments are calculated as averages of the scores by scale. My takeaway from that post is that the difference between each possible item on the Likert scales cannot be measured, which makes these data ordinal. (It would only be interval if it could be precisely measured and was the same between each.)
See this: “When using summated scale scores (i.e., we add up score on each item to compute a”total score”), usual statistics may be applied, but you have to keep in mind that you are now working with a latent variable so the underlying construct should make sense! In psychometrics, we generally check that (1) unidimensionnality of the scale holds, (2) scale reliability is sufficient. When comparing two such scale scores (for two different instruments), we might even consider using attenuated correlation measures instead of classical Pearson correlation coefficient. ”
“The model fit was evaluated by examining the Chi-squared (χ2) statistic with a p-value greater than 0.05, the comparative fit index (CFI), the goodness of fit index (GFI), the root mean square error of approximation (RMSEA) and the χ2/df ratio (Hooper et al., 2008). The bootstrap method was employed for significance testing of the total effect, direct effect and total indirect effect.” from Yao et al. (2021) who proposed a model and then used SEM to validate.
This site specifically talks about analyzing Likert data in R with many useful code snippets.
Conclusion: Because the outcome variable is ordinal, and there are multiple independent variables, then this study requires multiple ordinal regression. (Verify this.) (Learning resource: Multiple ordinal regression https://www.youtube.com/watch?v=aiQcFCFtT5k )
ACTUALLY the exact-opposite argument is made here, that summed totals of individual Likert-type questions become interval data. https://stats.stackexchange.com/questions/398046/does-the-total-of-ordinal-scores-on-a-questionnaire-become-scale-in-spss
Here is a quote from an answer on that post: “The upshot here is that in order to do the addition, you had to implicitly convert those categories to numeric values. Often, if we analyzing individual Likert-type items we try to treat the data ordinal, but if we have several items summed into a scale, we treat the result as interval.” So that states that all of my variables are interval. HELP.
Should I run a cumulative odds ratio to understand how much one variable impacts the outcome variable when holding others constant. If that’s a good idea, what will it tell me?
Question: Because the scale averages are calculated from the item responses, each of my variables is a latent variable (constructed variable). Right? Or are they observed? Do I care? I think this only matters if I do path analysis (?).
#justvars <- coredata[ , c(65:69)]
justvars <- coredata %>%
select(WTANGER:ACCOUNT)
head(justvars)
round(describe(justvars),2)
write_csv(justvars, "data\\justvars.csv")
round(cor(justvars),3)
## WTANGER WTGUILT WTFEAR SELFEFF ACCOUNT
## WTANGER 1.000 0.168 -0.023 0.232 0.419
## WTGUILT 0.168 1.000 0.105 0.003 0.186
## WTFEAR -0.023 0.105 1.000 -0.329 -0.395
## SELFEFF 0.232 0.003 -0.329 1.000 0.617
## ACCOUNT 0.419 0.186 -0.395 0.617 1.000
There appears to be a decent correlation between SELFEFF and ACCOUNT in every analysis run, regardless of which subset of the data. WTANGER and WTGUILT also have a high correlation which is not surprising given what they measure, though might be an issue in the model. Later on I do an assessment for multicollinearity which does not appear to be a problem.
# Basic scatter plots
ggplot(justvars, aes(x=WTFEAR, y=ACCOUNT)) +
geom_point() +
geom_smooth(method='lm')
## `geom_smooth()` using formula = 'y ~ x'
ggplot(justvars, aes(x=WTGUILT, y=ACCOUNT)) +
geom_point() +
geom_smooth(method='lm')
## `geom_smooth()` using formula = 'y ~ x'
ggplot(justvars, aes(x=WTANGER, y=ACCOUNT)) +
geom_point() +
geom_smooth(method='lm')
## `geom_smooth()` using formula = 'y ~ x'
ggplot(justvars, aes(x=SELFEFF, y=ACCOUNT)) +
geom_point() +
geom_smooth(method='lm')
## `geom_smooth()` using formula = 'y ~ x'
This is the standard LM function in R with all independent variables regressed against Antiracist Accountability (ACCOUNT).
First look at specifically the Residuals vs Fitted.
Now with model performance
## Running the lm using absolute value of WTGUILT was explored based on suggestion of chatbot, thinking that scatterplot of ACCOUNT to WTGUILT was v-shaped. The results were identical with abs WTGUILT. Then tried the transformation which did give different results however WTGUILT not statistically significant in that model, and R-sq went down. Skipping that transformation as not additive.
# coredata <- coredata %>%
# mutate(abs_WTGUILT = abs(WTGUILT))
#
# plot(coredata$abs_WTGUILT, coredata$ACCOUNT)
#
# coredata <- coredata %>%
# mutate(reciprocal_WTGUILT = 1/WTGUILT)
#
# antiracism.lm <- lm(ACCOUNT ~ SELFEFF + WTANGER + reciprocal_WTGUILT + WTFEAR, data = coredata)
# HERE IS MY STANDARD LM
antiracism.lm <- lm(ACCOUNT ~ SELFEFF + WTANGER + WTGUILT + WTFEAR, data = coredata)
model_performance(antiracism.lm)
summary(antiracism.lm)
##
## Call:
## lm(formula = ACCOUNT ~ SELFEFF + WTANGER + WTGUILT + WTFEAR,
## data = coredata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.49631 -0.22641 0.03488 0.31135 0.73861
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.94789 0.65421 2.977 0.00421 **
## SELFEFF 0.62341 0.12829 4.859 9.07e-06 ***
## WTANGER 0.29915 0.10040 2.979 0.00419 **
## WTGUILT 0.08707 0.04797 1.815 0.07462 .
## WTFEAR -0.24593 0.09254 -2.657 0.01011 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4165 on 59 degrees of freedom
## Multiple R-squared: 0.5355, Adjusted R-squared: 0.504
## F-statistic: 17.01 on 4 and 59 DF, p-value: 2.515e-09
residuals <- resid(antiracism.lm)
report(antiracism.lm)
## We fitted a linear model (estimated using OLS) to predict ACCOUNT with SELFEFF,
## WTANGER, WTGUILT and WTFEAR (formula: ACCOUNT ~ SELFEFF + WTANGER + WTGUILT +
## WTFEAR). The model explains a statistically significant and substantial
## proportion of variance (R2 = 0.54, F(4, 59) = 17.01, p < .001, adj. R2 = 0.50).
## The model's intercept, corresponding to SELFEFF = 0, WTANGER = 0, WTGUILT = 0
## and WTFEAR = 0, is at 1.95 (95% CI [0.64, 3.26], t(59) = 2.98, p = 0.004).
## Within this model:
##
## - The effect of SELFEFF is statistically significant and positive (beta = 0.62,
## 95% CI [0.37, 0.88], t(59) = 4.86, p < .001; Std. beta = 0.47, 95% CI [0.28,
## 0.66])
## - The effect of WTANGER is statistically significant and positive (beta = 0.30,
## 95% CI [0.10, 0.50], t(59) = 2.98, p = 0.004; Std. beta = 0.28, 95% CI [0.09,
## 0.46])
## - The effect of WTGUILT is statistically non-significant and positive (beta =
## 0.09, 95% CI [-8.93e-03, 0.18], t(59) = 1.81, p = 0.075; Std. beta = 0.16, 95%
## CI [-0.02, 0.35])
## - The effect of WTFEAR is statistically significant and negative (beta = -0.25,
## 95% CI [-0.43, -0.06], t(59) = -2.66, p = 0.010; Std. beta = -0.25, 95% CI
## [-0.44, -0.06])
##
## Standardized parameters were obtained by fitting the model on a standardized
## version of the dataset. 95% Confidence Intervals (CIs) and p-values were
## computed using a Wald t-distribution approximation.
# Below works just fine, but the standard plot outputs gives more detail
# plot(fitted(antiracism.lm), residuals)
Is this model a better fit?
Now with model performance
antiracismwithoutguilt.lm<-lm(ACCOUNT ~ SELFEFF + WTANGER + WTFEAR, data = coredata)
model_performance(antiracismwithoutguilt.lm)
summary(antiracismwithoutguilt.lm)
##
## Call:
## lm(formula = ACCOUNT ~ SELFEFF + WTANGER + WTFEAR, data = coredata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.52698 -0.20677 0.03187 0.29712 0.80873
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.07535 0.66275 3.131 0.00269 **
## SELFEFF 0.62314 0.13072 4.767 1.23e-05 ***
## WTANGER 0.32959 0.10087 3.267 0.00180 **
## WTFEAR -0.22845 0.09378 -2.436 0.01784 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4244 on 60 degrees of freedom
## Multiple R-squared: 0.5096, Adjusted R-squared: 0.4851
## F-statistic: 20.78 on 3 and 60 DF, p-value: 2.362e-09
residuals <- resid(antiracismwithoutguilt.lm)
report(antiracismwithoutguilt.lm)
## We fitted a linear model (estimated using OLS) to predict ACCOUNT with SELFEFF,
## WTANGER and WTFEAR (formula: ACCOUNT ~ SELFEFF + WTANGER + WTFEAR). The model
## explains a statistically significant and substantial proportion of variance (R2
## = 0.51, F(3, 60) = 20.78, p < .001, adj. R2 = 0.49). The model's intercept,
## corresponding to SELFEFF = 0, WTANGER = 0 and WTFEAR = 0, is at 2.08 (95% CI
## [0.75, 3.40], t(60) = 3.13, p = 0.003). Within this model:
##
## - The effect of SELFEFF is statistically significant and positive (beta = 0.62,
## 95% CI [0.36, 0.88], t(60) = 4.77, p < .001; Std. beta = 0.47, 95% CI [0.27,
## 0.67])
## - The effect of WTANGER is statistically significant and positive (beta = 0.33,
## 95% CI [0.13, 0.53], t(60) = 3.27, p = 0.002; Std. beta = 0.30, 95% CI [0.12,
## 0.49])
## - The effect of WTFEAR is statistically significant and negative (beta = -0.23,
## 95% CI [-0.42, -0.04], t(60) = -2.44, p = 0.018; Std. beta = -0.23, 95% CI
## [-0.43, -0.04])
##
## Standardized parameters were obtained by fitting the model on a standardized
## version of the dataset. 95% Confidence Intervals (CIs) and p-values were
## computed using a Wald t-distribution approximation.
Now look at the full suite of standard plots from the original basic LM including WTGUILT.
Question In the Q-Q plot of errors, the residuals on both tails are straying from the line, which could indicate that they are not normally distributed.
—– from coaching D1: The Q-Q of residuals actually visually appears pretty normal (the points are not that far from the line). At the time of that consult, the calculations for the derived values had errors in their formulas which resulted in a low p-value on the Shapiro-Wilk test which seemed to indicate that the residuals were not normal, but this was due to those formula errors. When re-run on corrected formulas for derived values, the Shapiro-Wilk p-value > 0.05.
Coaching D1 also confirmed that it does not matter if the variables in the dataset are normal. The test for normality is only done on the residuals (errors) after running the regression.
Learning resource - cook’s distance and diagnostic plots in general
Check for Heteroscedasticity
In order for the model to be confirmed as a fit and that the results are trustworthy, then we need to evaluate that the data is homoscedastic.
Question In Scale-Location, the red line is not really horizontal (??). Is that a problem?
Maybe a regular (OLS) linear regression (that assumes normal residuals) is not correct since the DV isn’t really continuous, it is actually discrete. Therefore, try a Poisson regression (“In fact, a zero-inflated Poisson, negative binomial, or zero-inflated negative binomial are more likely what you will end up needing.” (https://stats.stackexchange.com/questions/103466/heteroscedasticity-in-residuals-vs-fitted-plot))
plot(antiracism.lm)
For the subset of data of only counselors and counselor educators, there are two observations that appear to be influential points: Observation #17 and #33 in the plots. On Residuals vs. Fitted, Observation #17 is clearly influencing the line. On Q-Q Residuals, Observation #17 is not even on the line. On Residuals vs. Leverage. Observation #17 is within Cook’s Distance. These plots indicate that this influential point is affecting the model.
#Create density plot of residuals
plot(density(residuals))
shapiro.test(residuals)
##
## Shapiro-Wilk normality test
##
## data: residuals
## W = 0.95639, p-value = 0.02387
2/19/24 with closed study and final dataset - Ran the basic LM of all 4 IVs against DV for the entire dataset including outliers of 128 and Shapiro-Wilk is infintesimal of 0.000005 something. One point was within Cook’s distance on Residuals vs Leverage. Q-Q residuals did not look that bad though. Residuals vs Fitted was mostly a horizontal line.
Re-ran with the limited dataset of only counselor/counselor ed, none of the alternatives on the race field, and no extremely influentials, which gave a sample of 63. Shapiro-Wilk is 0.26. The Q-Q Residuals looks quite good.
When the influential point is included, Shapiro-Wilk is 0.027
Even with the influential point identified above, the Shapiro Wilk test indicates that the residuals from the sample follow a normal distribution.
The Shapiro Wilk test checks normality of the data. The null hypothesis for the Shapiro-Wilk test is that the residuals fit a normal distribution. If the p-value of the Shapiro Wilk test is less than .05, then the null hypothesis is rejected, which is saying that the sample does not come from a normal population. In this case, the p-value is > .05, so the null hypothesis is not rejected - so the residuals do fit a normal distribution.
If the null hypothesis (that the residuals are normal) on the Shapiro Wilk is rejected, then that would mean the residuals were not normal and the dataset was non-parametric (which is not the case here). Linear regression is for parametric data. Per coaching D1, there is not a straight substitution of a different analysis than LM for nonparametric data, so instead, bootstrapping would be an option to pursue. Not needed on the current dataset though.
Ran the code below with outliers2 on the 149 dataset subsetted to only couns/counsed, same obs id as above but not extreme in the selfeff; disabling this code in case it’s causing problems but keep the analysis
# https://rpkgs.datanovia.com/rstatix/index.html
# This adds 2 columns, one for is.outlier and one for is.extreme
# Not sure what it will do if I run the same identify_outliers function on another column - maybe add another pair of same-named columns?
# Create separate tibble of just the identified outliers
# coredataoutliers <- coredata %>%
# identify_outliers(ACCOUNT)
# head(coredataoutliers)
# returns a new tibble with 2 columns
myoutliers <- coredata %>%
identify_outliers(ACCOUNT)
head(myoutliers)
# Ran the code below on the 149 dataset subsetted to only couns/counsed, same obs id as above but not extreme in the selfeff; disabling this code in case it's causing problems but keep the analysis
# myoutliers2 <- coredata %>%
# identify_outliers(SELFEFF)
# head(myoutliers2)
# identify the extremely influential per the rstatix function
# create new column extremelyinfluential to capture true/false
coredata <- coredata %>%
mutate(extremelyinfluential = is_extreme(ACCOUNT))
# added Tues 2/6 but seemed to cause problems down-code?
# re-added 2/10
coredata <- coredata %>%
mutate(influential = is_outlier(ACCOUNT))
head(coredata)
# Needed to confirm that the observation is still intact- this shows it is (this line works but is not needed in reality)
#coredata[coredata$"Respondent ID" == "114516188154", ]
When run on counselors only (including counselor educators), the identify_outliers() function identifies two observations, with the one containing Respondent ID of 114516188154 identified as “extreme.” That “extreme” datapoint is Observation __.
3/15/24 Arcenis recommended using Kendall’s \(\tau\) to evaluate whether to include the outlier or not. This was done by doing correlation via cor.test using the Kendall method first on SELFEFF and ACCOUNT in the full dataset, and compare to the same variables in the dataset with the outlier omitted.
## Kendall on SELFEFF and ACCOUNT full dataset
cor.test(coredata$SELFEFF, coredata$ACCOUNT, method = "kendall")
##
## Kendall's rank correlation tau
##
## data: coredata$SELFEFF and coredata$ACCOUNT
## z = 4.4415, p-value = 8.934e-06
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau
## 0.4436783
kendallnooutlierself <- coredata %>%
filter(extremelyinfluential == FALSE) %>%
.$SELFEFF
kendallnooutlieraccount <- coredata %>%
filter(extremelyinfluential == FALSE) %>%
.$ACCOUNT
## Kendall on SELFEFF and ACCOUNT without the outlier
cor.test(kendallnooutlierself, kendallnooutlieraccount, method = "kendall")
##
## Kendall's rank correlation tau
##
## data: kendallnooutlierself and kendallnooutlieraccount
## z = 4.161, p-value = 3.169e-05
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau
## 0.4212329
Results show a slightly higher tau when the outlier is included (.444) compared to when outlier excluded (.421). Arcenis advised that a tau > .5 is very strong; anything above .2 is strong enough for publication.
For Arcenis: Does the higher Kendall’s tau result for the model with the outlier included provide further reinforcement of my decision to include the outlier? Also do you happen to have a citation that I can use for this? This is not a standard statistic in my field, I don’t think.
Even though the residuals are shown to be normal based on the Shapiro Wilk test above, the plots still are visually indicating relationships between the observations in the dataset that need further attention.
Aguinis et al. (2013) presented recommendations on how to handle outliers. They commented that studying the outliers is important.
Below is the same regression with diagnostics plots on the residuals on the dataset without Observation __.
The model_performance() output gives a lower AIC than the LM with the extremely influential included.
The Shapiro wilk shows it as not normal.
# This is brute force just deleting specific rows
# But don't do that as it messes up the observation numbering
# coredatanoextreme <- coredata[-c(17), ]
antiracism2.lm<-lm(ACCOUNT ~ WTFEAR + WTGUILT + WTANGER + SELFEFF, data = subset(coredata, extremelyinfluential == FALSE))
# from performance package
model_performance(antiracism2.lm)
summary(antiracism2.lm)
##
## Call:
## lm(formula = ACCOUNT ~ WTFEAR + WTGUILT + WTANGER + SELFEFF,
## data = subset(coredata, extremelyinfluential == FALSE))
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.43581 -0.23386 0.05376 0.27226 0.74150
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.81929 0.69752 4.042 0.000158 ***
## WTFEAR -0.21743 0.08843 -2.459 0.016949 *
## WTGUILT 0.06924 0.04599 1.506 0.137601
## WTANGER 0.20147 0.10173 1.980 0.052409 .
## SELFEFF 0.52666 0.12676 4.155 0.000108 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3953 on 58 degrees of freedom
## Multiple R-squared: 0.396, Adjusted R-squared: 0.3543
## F-statistic: 9.506 on 4 and 58 DF, p-value: 5.581e-06
plot(antiracism2.lm)
residuals2 <- resid(antiracism2.lm)
shapiro.test(residuals2)
##
## Shapiro-Wilk normality test
##
## data: residuals2
## W = 0.95656, p-value = 0.02608
Residuals vs Fitted is showing how another point, Observation 32 (the datapoint identified as 33 in the prior plots, not sure why the index changed, probably because I lost the row of labels in one of my manipulations?), is also having an effect on the line, however clearly not as extreme as Observation 17. This observation will not be removed from the dataset.
The Scale-Location plot still has a slope however it is much more horizontal without the influential observation.
Q-Q Residuals meet the colloquial “fat pencil test” where a fat pencil overlaid on top would capture all the points.
The Shapiro-Wilk again shows a p-value > .05 which means the null hypothesis of the Shapiro-Wilk test is not rejected, which means that the sample comes from a population that is normally distributed.
Run the regression on ACCOUNT (excluding the influential point).
Get VIF values for just the 3 affect variables. A VIF close to 1 means not collinear.
Each of the three white racial affect variables are unique in what they are contributing to the regression.
library(car)
## Loading required package: carData
whiteracialaffectsonaccount.lm <- lm(ACCOUNT ~ SELFEFF + WTGUILT + WTFEAR + WTANGER, data = subset(coredata, extremelyinfluential == FALSE))
vif(whiteracialaffectsonaccount.lm)
## SELFEFF WTGUILT WTFEAR WTANGER
## 1.101855 1.032837 1.111659 1.027722
A follow-up analysis looked at the Variance Inflation Factor (VIF) test statistic. This assessment examines whether variables in a model show collinearity, which can affect the results. The VIF values for all four independent variables are near 1, so multicollinearity was determined to not be present.
The extremely influential point is omitted.
The other three constructs of white racial affect are omitted.
The purpose of this section is to see if this simpler model (residuals from) fit a normal distribution. I hope to get a higher p-value from the Shapiro wilk test.
Even this basic LM is showing that the residuals are not normal - the dataset is not normal (I think)
OK so what this says is that the assumption of linearity is not here - the relationship between ACCOUNT and SELFEFF cannot be expressed by a straight line.
Is this because they each are interval datapoints???
Help from Devin:
Here’s what I know: 1. Doing this simple OLS regression gives not-normal residuals per Shapiro-Wilk, whether I do it on my full sample or on the full sample with extremely influential points omitted.
antiracism.lm<-lm(ACCOUNT ~ WTFEAR + WTGUILT + WTANGER + SELFEFF, data = coredata)
All of these variables are averages, computed from answers to Likert-style questions on instruments. All except for SELFEFF are 1-6, SELFEFF is 1-4. Does this mean they are interval data? We cannot actually know the distance between the points. If they are interval data, does OLS LM not work because of this?
Looking at the scatter plot just between ACCOUNT and SELFEFF shows that there really is no way to fit a line to this data. Is this what’s causing me problems?
Let’s please look at the <a href=“#spearmans-rhoSpearman’s rho together. I see that there are no statistically significant relationships with the emotional variables and ACCOUNT. HOWEVER: My original model proposed that perhaps SELFEFF was a mediating variable. Can I make any conclusions from the (admittedly non statistically significant) inverse relationships revealed between SELFEFF and the 3 white racial affects???
Look at scatter plot of ACCOUNT and SELFEFF Because they are both intervals (are they???), does that mean that I cannot do a LM? Does this somehow say that the dataset is not normal?
What if I calculated totals instead of means for the ACCOUNT and SELFEFF scores - these variables are derived from Likert style answers on the underlying instruments. It would still be interval data but it would be different - would it? Would I measure the exact-same relationships if looking at sum vs looking at average? I don’t really understand data.
Please help me with terminology. My study is specifically looking at counselors (which also includes counselor educators). However, when I did data collection, I recruited from a variety of listservs for mental health practitioners, including social workers, therapists, and psychologists. Therefore, my entire dataset has responses from people who professionally identify in all of these ways.
When I’m talking about the entire dataset, with counselors, therapists, etc., is that a different population than when I’m talking about the subset of data from only counselors?
# antiracismselfeffonly.lm<-lm(ACCOUNT ~ SELFEFF, data = subset(coredata, extremelyinfluential == FALSE))
antiracismselfeffonly.lm<-lm(ACCOUNT ~ SELFEFF, data = coredata)
model_performance(antiracismselfeffonly.lm)
summary(antiracismselfeffonly.lm)
##
## Call:
## lm(formula = ACCOUNT ~ SELFEFF, data = coredata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.53781 -0.17516 0.06018 0.31018 0.81018
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.7338 0.4460 6.130 6.71e-08 ***
## SELFEFF 0.8187 0.1325 6.177 5.57e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4691 on 62 degrees of freedom
## Multiple R-squared: 0.381, Adjusted R-squared: 0.371
## F-statistic: 38.16 on 1 and 62 DF, p-value: 5.573e-08
plot(antiracismselfeffonly.lm)
residualsselfeff <- resid(antiracismselfeffonly.lm)
shapiro.test(residualsselfeff)
##
## Shapiro-Wilk normality test
##
## data: residualsselfeff
## W = 0.92066, p-value = 0.0005295
Cannot really examine this output since the data does not meet assumptions for a linear model.
When I removed the extremely influential outlier, the Multiple R-squared went down from close to 50% to under 40%.
This influential point is apparently a model fit outlier (Aguinis et al., 2013) as it affected R-squared. Figure out how to ggplot these both on the same graph.
Notes: Here is what Aguinis et al. (2013) says to do: “Report findings with and without either of the following approaches: -Respecify model -Remove outlier -Robust approaches (e.g., absolute deviation, least trimmed squares, M-estimation, Bayesian statistics):
***TO DO - LOOK AT THOSE ROBUST APPROACHES ABOVE.
# from https://statisticsglobe.com/plot-predicted-vs-actual-values-in-r
subcore <- coredata %>%
filter(extremelyinfluential == FALSE)
# Create data for ggplot2
data_mod <- data.frame(Predicted = predict(antiracism2.lm),
Observed = subcore$ACCOUNT)
# Draw plot using ggplot2 package
ggplot(data_mod,
aes(x = Predicted,
y = Observed)) +
geom_point() +
geom_abline(intercept = 0,
slope = 1,
color = "red",
linewidth = 2) +
ggtitle("R-squared Fitted vs Observed - full data")
Per this websiteThe assumptions for the data when using this analysis are that the dependent variable be either interval or ratio (yes); the observations should be independent (yes), the DV should be normally distributed (yes mostly), and there should be no outliers (does not fully meet this). This is used to test a hypothesis: is the average value for the DV X? I don’t really care what the mean is though, I’m looking at relationships. I don’t think this is the right analysis to perform.
Used for nonparametric correlation calculations.
# keeping snippet - from psych package - found simpler way below instead
corr1 <- cor2(coredata$ACCOUNT, coredata$SELFEFF, digits=2, use="pairwise", method = 'spearman', cor="cor", show=TRUE)
From the rstatix package.
corout <- coredata %>%
cor_test(
vars = "ACCOUNT",
vars2 = c("SELFEFF", "WTGUILT", "WTANGER", "WTFEAR"),
method = "spearman"
)
corout$p <- formatC(corout$p, digits=4)
corout
Based on these low p-values, there are many significant relationships between all four independent variables and ACCOUNT, though the only one with a larger correlation is SELFEFF.
corout <- coredata %>%
cor_test(
vars = "SELFEFF",
vars2 = c("WTGUILT", "WTANGER", "WTFEAR"),
method = "spearman"
)
corout$p <- formatC(corout$p, digits=4)
corout
What’s interesting here is, while there are no statistically significant relationships demonstrated, the direction of the calculated correlation is negative. This actually matches my hypothesized relationships in the model - I think.
(Pulled from manuscript - this is model assumptions stuff - placed here to preserve it, needs to be below the Model Assumptions header because this is where the lm is available)
And of course at the very end after I had done everything manually, I discovered this R function to report on it in one table.
Model | Normality (Shapiro-Wilk) | Homoscedasticity (Breusch-Pagan) | Autocorrelation of residuals (Durbin-Watson) | Diagnostic |
|---|---|---|---|---|
ACCOUNT ~ SELFEFF + WTANGER + WTGUILT + WTFEAR | 0.03 | 0.61 | 0.78 | 1.00 |
The p value of < 0.05 indicate that assumption is not met. The Diagnostic column indicates how many of the assumptions are violated. In this output, the Shapiro-Wilk test of normality again shows the p-value of < 0.05.
This output doesn’t go to Word nicely from the qmd files so not including it now - but definitely try to use this in the future! It is a much simpler way of evaluating everything that I spent so long doing manually!!
I tried the iRegression package (installed it and looked through) but it’s using two alternative formulas with a min and a max, does not seem relevant at all.
Individual items of the A-RES regressed against ACCOUNT
antiracismselfeffQs.lm<-lm(ACCOUNT ~ ARESQ01 + ARESQ02 + ARESQ03 + ARESQ04, data = subset(coredata, extremelyinfluential == FALSE))
summary(antiracismselfeffQs.lm)
##
## Call:
## lm(formula = ACCOUNT ~ ARESQ01 + ARESQ02 + ARESQ03 + ARESQ04,
## data = subset(coredata, extremelyinfluential == FALSE))
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.41187 -0.18860 0.04657 0.26843 0.66576
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.54285 0.26752 20.719 < 2e-16 ***
## ARESQ01Agree -0.23810 0.11458 -2.078 0.04248 *
## ARESQ02Agree -0.23275 0.13673 -1.702 0.09446 .
## ARESQ02Disagree -0.42756 0.21386 -1.999 0.05062 .
## ARESQ03Agree -0.01362 0.11686 -0.117 0.90762
## ARESQ03Disagree -0.14683 0.20863 -0.704 0.48460
## ARESQ03Disagree strongly -1.35976 0.45918 -2.961 0.00455 **
## ARESQ04Disagree 0.23258 0.26775 0.869 0.38889
## ARESQ04Disagree strongly 0.35177 0.24798 1.419 0.16178
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4007 on 54 degrees of freedom
## Multiple R-squared: 0.422, Adjusted R-squared: 0.3364
## F-statistic: 4.929 on 8 and 54 DF, p-value: 0.000132
#residuals plots - can skip these unless it's a different dataset
plot(antiracismselfeffQs.lm)
## Warning: not plotting observations with leverage one:
## 35
residualsselfeffQs <- resid(antiracismselfeffQs.lm)
shapiro.test(residualsselfeffQs)
##
## Shapiro-Wilk normality test
##
## data: residualsselfeffQs
## W = 0.94695, p-value = 0.008799
Individual items of the PCRW White Guilt subscale regressed against ACCOUNT
arwtguilt.lm<-lm(ACCOUNT ~ PCRWQ04 + PCRWQ07 + PCRWQ08 + PCRWQ12 + PCRWQ15, data = subset(coredata, extremelyinfluential == FALSE))
summary(arwtguilt.lm)
##
## Call:
## lm(formula = ACCOUNT ~ PCRWQ04 + PCRWQ07 + PCRWQ08 + PCRWQ12 +
## PCRWQ15, data = subset(coredata, extremelyinfluential ==
## FALSE))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.7698 -0.1681 -0.0076 0.2337 0.9052
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.29198 0.49265 12.772 3.93e-15 ***
## PCRWQ04Moderately Agree -0.12068 0.25762 -0.468 0.64221
## PCRWQ04Slightly Agree 0.19762 0.30614 0.646 0.52256
## PCRWQ04Slightly Disagree -0.40432 0.32684 -1.237 0.22386
## PCRWQ04Moderately Disagree 0.11409 0.31484 0.362 0.71914
## PCRWQ04Strongly Disagree -0.02370 0.33389 -0.071 0.94379
## PCRWQ07Moderately Agree -0.01286 0.25564 -0.050 0.96014
## PCRWQ07Slightly Agree -0.23741 0.29488 -0.805 0.42590
## PCRWQ07Slightly Disagree -0.22762 0.29888 -0.762 0.45115
## PCRWQ07Moderately Disagree -0.48544 0.33545 -1.447 0.15628
## PCRWQ07Strongly Disagree -0.53988 0.34905 -1.547 0.13044
## PCRWQ08Moderately Agree -0.99932 0.46715 -2.139 0.03908 *
## PCRWQ08Slightly Agree -1.65232 0.56628 -2.918 0.00596 **
## PCRWQ08Slightly Disagree -1.08514 0.50691 -2.141 0.03895 *
## PCRWQ08Moderately Disagree -1.22786 0.52914 -2.321 0.02593 *
## PCRWQ08Strongly Disagree -1.19611 0.53130 -2.251 0.03039 *
## PCRWQ12Moderately Agree 0.84870 0.43385 1.956 0.05802 .
## PCRWQ12Slightly Agree 1.12917 0.39599 2.852 0.00708 **
## PCRWQ12Slightly Disagree 0.96254 0.42528 2.263 0.02957 *
## PCRWQ12Moderately Disagree 0.83788 0.42333 1.979 0.05526 .
## PCRWQ12Strongly Disagree 0.84120 0.45147 1.863 0.07038 .
## PCRWQ15Moderately Agree -0.45247 0.27993 -1.616 0.11452
## PCRWQ15Slightly Agree -0.31772 0.27501 -1.155 0.25538
## PCRWQ15Slightly Disagree -0.27565 0.30843 -0.894 0.37725
## PCRWQ15Moderately Disagree -0.01853 0.28554 -0.065 0.94860
## PCRWQ15Strongly Disagree -0.25358 0.30154 -0.841 0.40578
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4528 on 37 degrees of freedom
## Multiple R-squared: 0.4944, Adjusted R-squared: 0.1528
## F-statistic: 1.447 on 25 and 37 DF, p-value: 0.1505
plot(arwtguilt.lm)
residualswtg <- resid(arwtguilt.lm)
shapiro.test(residualswtg)
##
## Shapiro-Wilk normality test
##
## data: residualswtg
## W = 0.98608, p-value = 0.6978
Individual items of the PCRW White Fear subscale regressed against ACCOUNT
arwtfear.lm<-lm(ACCOUNT ~ PCRWQ02 + PCRWQ05 + PCRWQ09 + PCRWQ11 + PCRWQ13, data = subset(coredata, extremelyinfluential == FALSE))
summary(arwtfear.lm)
##
## Call:
## lm(formula = ACCOUNT ~ PCRWQ02 + PCRWQ05 + PCRWQ09 + PCRWQ11 +
## PCRWQ13, data = subset(coredata, extremelyinfluential ==
## FALSE))
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.0133 -0.2156 0.0000 0.2392 0.5220
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.487366 0.641296 6.997 1.3e-08 ***
## PCRWQ02Moderately Agree -0.451594 0.237889 -1.898 0.06438 .
## PCRWQ02Slightly Agree -0.466731 0.241747 -1.931 0.06014 .
## PCRWQ02Slightly Disagree -0.177519 0.284487 -0.624 0.53593
## PCRWQ02Moderately Disagree -0.536264 0.268273 -1.999 0.05196 .
## PCRWQ02Strongly Disagree -1.412104 0.416077 -3.394 0.00149 **
## PCRWQ05Moderately Agree 1.059297 0.533395 1.986 0.05344 .
## PCRWQ05Slightly Agree 0.228788 0.344704 0.664 0.51041
## PCRWQ05Slightly Disagree 0.007781 0.307558 0.025 0.97993
## PCRWQ05Moderately Disagree -0.059705 0.339573 -0.176 0.86126
## PCRWQ05Strongly Disagree 0.055130 0.306129 0.180 0.85793
## PCRWQ09Slightly Disagree -0.715113 0.629977 -1.135 0.26260
## PCRWQ09Moderately Disagree -0.545595 0.477719 -1.142 0.25974
## PCRWQ09Strongly Disagree -0.285772 0.427783 -0.668 0.50768
## PCRWQ11Slightly Agree -0.141294 0.811773 -0.174 0.86264
## PCRWQ11Slightly Disagree -1.280774 1.172059 -1.093 0.28059
## PCRWQ11Moderately Disagree 0.001476 1.031874 0.001 0.99887
## PCRWQ11Strongly Disagree -0.066097 1.002196 -0.066 0.94772
## PCRWQ13Slightly Agree NA NA NA NA
## PCRWQ13Moderately Disagree 1.797549 0.883728 2.034 0.04814 *
## PCRWQ13Strongly Disagree 1.823666 0.859564 2.122 0.03967 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4043 on 43 degrees of freedom
## Multiple R-squared: 0.5315, Adjusted R-squared: 0.3245
## F-statistic: 2.567 on 19 and 43 DF, p-value: 0.005245
plot(arwtfear.lm)
## Warning: not plotting observations with leverage one:
## 12, 18, 35, 39, 58
residualswtf <- resid(arwtfear.lm)
shapiro.test(residualswtf)
##
## Shapiro-Wilk normality test
##
## data: residualswtf
## W = 0.95653, p-value = 0.02598
Individual items of the PCRW White Anger subscale regressed against ACCOUNT
arwtanger.lm<-lm(ACCOUNT ~ PCRWQ01 + PCRWQ03 + PCRWQ06 + PCRWQ10 + PCRWQ14 + PCRWQ16, data = subset(coredata, extremelyinfluential == FALSE))
summary(arwtanger.lm)
##
## Call:
## lm(formula = ACCOUNT ~ PCRWQ01 + PCRWQ03 + PCRWQ06 + PCRWQ10 +
## PCRWQ14 + PCRWQ16, data = subset(coredata, extremelyinfluential ==
## FALSE))
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.24238 -0.10946 0.00051 0.18010 0.81706
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.492382 0.126659 43.364 < 2e-16 ***
## PCRWQ01Moderately Agree -0.049882 0.140558 -0.355 0.724540
## PCRWQ01Slightly Agree -1.282855 0.299505 -4.283 0.000112 ***
## PCRWQ01Slightly Disagree -0.645132 0.439605 -1.468 0.150052
## PCRWQ01Moderately Disagree -0.343116 0.703278 -0.488 0.628298
## PCRWQ03Moderately Agree 0.279691 0.164149 1.704 0.096160 .
## PCRWQ03Slightly Agree 0.231272 0.173110 1.336 0.189108
## PCRWQ03Slightly Disagree 0.340439 0.220769 1.542 0.130933
## PCRWQ03Moderately Disagree 0.391191 0.227441 1.720 0.093167 .
## PCRWQ03Strongly Disagree 0.279873 0.368582 0.759 0.452111
## PCRWQ06Moderately Agree -0.192020 0.142417 -1.348 0.185149
## PCRWQ06Slightly Agree 0.049148 0.366575 0.134 0.894018
## PCRWQ06Slightly Disagree NA NA NA NA
## PCRWQ10Moderately Agree -0.121756 0.195005 -0.624 0.535929
## PCRWQ10Slightly Agree 0.704609 0.479662 1.469 0.149662
## PCRWQ14Moderately Agree 0.004812 0.186062 0.026 0.979495
## PCRWQ14Slightly Agree 0.173311 0.202413 0.856 0.396972
## PCRWQ14Slightly Disagree 0.198846 0.351501 0.566 0.574754
## PCRWQ14Moderately Disagree -0.359996 0.454683 -0.792 0.433176
## PCRWQ14Strongly Disagree 0.276346 0.417498 0.662 0.511824
## PCRWQ16Moderately Agree -0.545528 0.217339 -2.510 0.016219 *
## PCRWQ16Slightly Agree -0.136832 0.370480 -0.369 0.713824
## PCRWQ16Moderately Disagree -1.301692 0.685464 -1.899 0.064795 .
## PCRWQ16Strongly Disagree -1.056518 0.609491 -1.733 0.090721 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3943 on 40 degrees of freedom
## Multiple R-squared: 0.5856, Adjusted R-squared: 0.3576
## F-statistic: 2.569 on 22 and 40 DF, p-value: 0.004661
plot(arwtanger.lm)
## Warning: not plotting observations with leverage one:
## 8, 13, 17, 19, 41, 54, 57
residualswta <- resid(arwtanger.lm)
shapiro.test(residualswta)
##
## Shapiro-Wilk normality test
##
## data: residualswta
## W = 0.93244, p-value = 0.001887
https://www.youtube.com/watch?v=Gtz8ca_4hVg
Wed morning made eval false here, this code is not running
library(quantreg)
#plot ordinary and median regressions
ggplot(coredata, aes(SELFEFF, ACCOUNT)) +
geom_point() +
geom_smooth(method = lm, se = F, color = "red", ) +
geom_quantile(quantiles = 0.5)
lr <- lm(ACCOUNT ~ SELFEFF, data = coredata)
check_heteroscedasticity(lr)
qm50 <- rq(ACCOUNT ~ SELFEFF, data = subset(coredata, extremelyinfluential == FALSE), tau = 0.5)
AIC(lr, qm50)
check_outliers(lr)
check_normality(lr)
#check_homogeneity(lr)
# multivariable regression
q <- rq(ACCOUNT ~ SELFEFF + WTANGER, data = subset(coredata, extremelyinfluential == FALSE),
tau = seq(.05, .95, by = 0.05))
summary(q) %>%
plot(c("SELFEFF", "WTANGER"))
library(quantregGrowth)
set.seed(1)
o <-gcrq(ACCOUNT ~ ps(SELFEFF),
data = coredata %>% sample_n(100), tau=seq(.10,.90,l=3))
# par(mfrow=c(1,2)) # for several plots
plot(o, legend=TRUE, conf.level = .95, shade=TRUE, lty = 1, lwd = 3, col = -1, res=TRUE)
# nonparametric multiple regression
mod.lo <- loess(ACCOUNT ~ SELFEFF + WTANGER, span=.5, degree=1, data = subset(coredata, extremelyinfluential == FALSE))
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
## : pseudoinverse used at 11.431 15.903
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
## : neighborhood radius 1.633
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
## : reciprocal condition number 1.3479e-19
summary(mod.lo)
## Call:
## loess(formula = ACCOUNT ~ SELFEFF + WTANGER, data = subset(coredata,
## extremelyinfluential == FALSE), span = 0.5, degree = 1)
##
## Number of Observations: 63
## Equivalent Number of Parameters: 7.77
## Residual Standard Error: 0.3778
## Trace of smoother matrix: 10.19 (exact)
##
## Control settings:
## span : 0.5
## degree : 1
## family : gaussian
## surface : interpolate cell = 0.2
## normalize: TRUE
## parametric: FALSE FALSE
## drop.square: FALSE FALSE
selfe <- with(subset(coredata, extremelyinfluential == FALSE), seq(min(SELFEFF), max(SELFEFF), len=25))
wtang <- with(subset(coredata, extremelyinfluential == FALSE), seq(min(WTANGER), max(WTANGER), len=25))
newdata <- expand.grid(SELFEFF=selfe, WTANGER=wtang)
fit.account <- matrix(predict(mod.lo, newdata), 25, 25)
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : eval 8.165 11.084
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : lowerlimit 8.1405
## 11.431
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : extrapolation not
## allowed with blending
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : eval 8.3691 11.084
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : lowerlimit 8.1405
## 11.431
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : extrapolation not
## allowed with blending
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : eval 8.5732 11.084
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : lowerlimit 8.1405
## 11.431
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : extrapolation not
## allowed with blending
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : eval 8.7773 11.084
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : lowerlimit 8.1405
## 11.431
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : extrapolation not
## allowed with blending
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : eval 8.9815 11.084
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : lowerlimit 8.1405
## 11.431
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : extrapolation not
## allowed with blending
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : eval 9.1856 11.084
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : lowerlimit 8.1405
## 11.431
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : extrapolation not
## allowed with blending
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : eval 9.3897 11.084
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : lowerlimit 8.1405
## 11.431
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : extrapolation not
## allowed with blending
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : eval 9.5938 11.084
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : lowerlimit 8.1405
## 11.431
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : extrapolation not
## allowed with blending
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : eval 9.798 11.084
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : lowerlimit 8.1405
## 11.431
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : extrapolation not
## allowed with blending
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : eval 8.165 11.345
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : lowerlimit 8.1405
## 11.431
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : extrapolation not
## allowed with blending
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : eval 8.3691 11.345
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : lowerlimit 8.1405
## 11.431
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : extrapolation not
## allowed with blending
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : eval 8.5732 11.345
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : lowerlimit 8.1405
## 11.431
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : extrapolation not
## allowed with blending
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : eval 8.7773 11.345
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : lowerlimit 8.1405
## 11.431
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : extrapolation not
## allowed with blending
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : eval 8.9815 11.345
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : lowerlimit 8.1405
## 11.431
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : extrapolation not
## allowed with blending
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : eval 9.1856 11.345
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : lowerlimit 8.1405
## 11.431
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : extrapolation not
## allowed with blending
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : eval 9.3897 11.345
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : lowerlimit 8.1405
## 11.431
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : extrapolation not
## allowed with blending
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : eval 9.5938 11.345
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : lowerlimit 8.1405
## 11.431
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : extrapolation not
## allowed with blending
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : eval 9.798 11.345
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : lowerlimit 8.1405
## 11.431
## Warning in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x
## else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : extrapolation not
## allowed with blending
persp(selfe, wtang, fit.account, theta=45, phi=30, ticktype="detailed",
xlab="SELFEFF", ylab="WTANGER", zlab="ACCOUNT", expand=2/3,
shade=0.5)
library(glmulti)
# This did not work, the code runs but cannot get the output to display.
glmulti.output <- glmulti(ACCOUNT ~ SELFEFF + WTGUILT + WTANGER + WTFEAR,
data = subset(coredata, extremelyinfluential == FALSE),
crit = aicc, # AICC corrected AIC for small samples
level = 2, # 2 with interactions, 1 without
method = "d", # "d", or "h", or "g" for genetic which is v slow
family = gaussian,
plotty = TRUE,
fitfunction = glm, # Type of model (LM, GLM etc.)
confsetsize = 100) # Keep 100 best models
# optimal_model_glmulti_d <- glmulti.output@objects[[1]]
# performance(optimal_model_glmulti_d)
# optimal_model_glmulti_d$formula
print(glmulti.output)
summary(glmulti.output)
USELESS!
The low p-value for self efficacy correlated to accountability means the null hypothesis should be rejected; the correlation between these two constructs is stastically significant _if the dataset were normal, but it’s not__.
And, at least in this subset of the full dataset, the null hypothesis cannot be rejected about any other relationships.
My proposed model said that the three white racial affects (emotions) influence antiracist self-efficacy, which influences antiracist accountability. (I think this means that SELFEFF is a mediator variable - correct? Does that determine how the regression should be set up - or even that path analysis is needed?)
The Multiple R-squared calculated in the regression of all four IVs against ACCOUNT in the counselors only dataset is: 0.3789
This means that about 38% of the variation in ACCOUNT is explained by the variables. This is considered generally to be a very low effect size; however, this post by a statistician indicates that when studying humans, R squared values are often < .5; need to find an academic citation for this.
When I run only SELFEFF against ACCOUNT, the R-squared goes down.
antiracism3.lm<-lm(ACCOUNT ~ SELFEFF + WTGUILT, data = subset(coredata, extremelyinfluential == FALSE))
summary(antiracism3.lm)
##
## Call:
## lm(formula = ACCOUNT ~ SELFEFF + WTGUILT, data = subset(coredata,
## extremelyinfluential == FALSE))
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.34037 -0.16587 0.07884 0.23893 0.68828
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.13065 0.47950 6.529 1.58e-08 ***
## SELFEFF 0.63564 0.12816 4.960 6.12e-06 ***
## WTGUILT 0.06528 0.04802 1.359 0.179
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4185 on 60 degrees of freedom
## Multiple R-squared: 0.2995, Adjusted R-squared: 0.2762
## F-statistic: 12.83 on 2 and 60 DF, p-value: 2.301e-05
#residuals plots - can skip these unless it's a different dataset
#plot(antiracism3.lm)
# residuals3 <- resid(antiracism3.lm)
# shapiro.test(residuals3)
# Create data for ggplot2
data_mod <- data.frame(Predicted = predict(antiracism3.lm),
Observed = subcore$ACCOUNT)
# Draw plot using ggplot2 package
ggplot(data_mod,
aes(x = Predicted,
y = Observed)) +
geom_point() +
geom_abline(intercept = 0,
slope = 1,
color = "red",
linewidth = 2) +
ggtitle("R-squared Fitted vs Observed - only SELFEFF")
Let’s look at just the white racial affects on self-efficacy as a DV.
This is the standard LM function in R with with SELFEFF as DV, excluding the extremely influential observation.
The p-values show only WTFEAR having a (negative) effect on SELFEFF.
The Residuals vs Fitted does not look at all like a cone here.
But what’s going on with the Q-Q Residuals??
#antiracistselfeff.lm<-lm(SELFEFF ~ WTANGER + WTGUILT + WTFEAR, data = subset(coredata, extremelyinfluential == FALSE))
antiracistselfeff.lm<-lm(SELFEFF ~ WTFEAR + WTGUILT + WTANGER, data = coredata)
summary(antiracistselfeff.lm)
##
## Call:
## lm(formula = SELFEFF ~ WTFEAR + WTGUILT + WTANGER, data = coredata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.79186 -0.35362 0.02283 0.21280 0.87491
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.8170242 0.5487552 5.133 3.24e-06 ***
## WTFEAR -0.2384423 0.0878873 -2.713 0.00869 **
## WTGUILT -0.0004311 0.0482747 -0.009 0.99291
## WTANGER 0.1838356 0.0982087 1.872 0.06610 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4192 on 60 degrees of freedom
## Multiple R-squared: 0.1585, Adjusted R-squared: 0.1165
## F-statistic: 3.768 on 3 and 60 DF, p-value: 0.01516
plot(antiracistselfeff.lm)
residualsselfeff <- resid(antiracistselfeff.lm)
model_performance((antiracistselfeff.lm))
report(antiracistselfeff.lm)
## We fitted a linear model (estimated using OLS) to predict SELFEFF with WTFEAR,
## WTGUILT and WTANGER (formula: SELFEFF ~ WTFEAR + WTGUILT + WTANGER). The model
## explains a statistically significant and moderate proportion of variance (R2 =
## 0.16, F(3, 60) = 3.77, p = 0.015, adj. R2 = 0.12). The model's intercept,
## corresponding to WTFEAR = 0, WTGUILT = 0 and WTANGER = 0, is at 2.82 (95% CI
## [1.72, 3.91], t(60) = 5.13, p < .001). Within this model:
##
## - The effect of WTFEAR is statistically significant and negative (beta = -0.24,
## 95% CI [-0.41, -0.06], t(60) = -2.71, p = 0.009; Std. beta = -0.32, 95% CI
## [-0.56, -0.08])
## - The effect of WTGUILT is statistically non-significant and negative (beta =
## -4.31e-04, 95% CI [-0.10, 0.10], t(60) = -8.93e-03, p = 0.993; Std. beta =
## -1.08e-03, 95% CI [-0.24, 0.24])
## - The effect of WTANGER is statistically non-significant and positive (beta =
## 0.18, 95% CI [-0.01, 0.38], t(60) = 1.87, p = 0.066; Std. beta = 0.23, 95% CI
## [-0.02, 0.47])
##
## Standardized parameters were obtained by fitting the model on a standardized
## version of the dataset. 95% Confidence Intervals (CIs) and p-values were
## computed using a Wald t-distribution approximation.
shapiro.test(residualsselfeff)
##
## Shapiro-Wilk normality test
##
## data: residualsselfeff
## W = 0.96818, p-value = 0.09721
antiracistselfeff.lm<-lm(SELFEFF ~ WTFEAR + WTGUILT + WTANGER, data = coredata)
summary(antiracistselfeff.lm)
##
## Call:
## lm(formula = SELFEFF ~ WTFEAR + WTGUILT + WTANGER, data = coredata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.79186 -0.35362 0.02283 0.21280 0.87491
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.8170242 0.5487552 5.133 3.24e-06 ***
## WTFEAR -0.2384423 0.0878873 -2.713 0.00869 **
## WTGUILT -0.0004311 0.0482747 -0.009 0.99291
## WTANGER 0.1838356 0.0982087 1.872 0.06610 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4192 on 60 degrees of freedom
## Multiple R-squared: 0.1585, Adjusted R-squared: 0.1165
## F-statistic: 3.768 on 3 and 60 DF, p-value: 0.01516
plot(antiracistselfeff.lm)
residualsselfeff <- resid(antiracistselfeff.lm)
model_performance((antiracistselfeff.lm))
report(antiracistselfeff.lm)
## We fitted a linear model (estimated using OLS) to predict SELFEFF with WTFEAR,
## WTGUILT and WTANGER (formula: SELFEFF ~ WTFEAR + WTGUILT + WTANGER). The model
## explains a statistically significant and moderate proportion of variance (R2 =
## 0.16, F(3, 60) = 3.77, p = 0.015, adj. R2 = 0.12). The model's intercept,
## corresponding to WTFEAR = 0, WTGUILT = 0 and WTANGER = 0, is at 2.82 (95% CI
## [1.72, 3.91], t(60) = 5.13, p < .001). Within this model:
##
## - The effect of WTFEAR is statistically significant and negative (beta = -0.24,
## 95% CI [-0.41, -0.06], t(60) = -2.71, p = 0.009; Std. beta = -0.32, 95% CI
## [-0.56, -0.08])
## - The effect of WTGUILT is statistically non-significant and negative (beta =
## -4.31e-04, 95% CI [-0.10, 0.10], t(60) = -8.93e-03, p = 0.993; Std. beta =
## -1.08e-03, 95% CI [-0.24, 0.24])
## - The effect of WTANGER is statistically non-significant and positive (beta =
## 0.18, 95% CI [-0.01, 0.38], t(60) = 1.87, p = 0.066; Std. beta = 0.23, 95% CI
## [-0.02, 0.47])
##
## Standardized parameters were obtained by fitting the model on a standardized
## version of the dataset. 95% Confidence Intervals (CIs) and p-values were
## computed using a Wald t-distribution approximation.
shapiro.test(residualsselfeff)
##
## Shapiro-Wilk normality test
##
## data: residualsselfeff
## W = 0.96818, p-value = 0.09721
Let’s look at just the white racial affects on ACCOUNT.
#antiracistaffectsonly.lm<-lm(ACCOUNT ~ WTANGER + WTGUILT + WTFEAR, data = subset(coredata, extremelyinfluential == FALSE))
antiracistaffectsonly.lm<-lm(ACCOUNT ~ WTANGER + WTGUILT + WTFEAR, data = coredata)
summary(antiracistaffectsonly.lm)
##
## Call:
## lm(formula = ACCOUNT ~ WTANGER + WTGUILT + WTFEAR, data = coredata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.45991 -0.23845 0.01118 0.33163 0.88462
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.70406 0.63989 5.789 2.76e-07 ***
## WTANGER 0.41375 0.11452 3.613 0.000620 ***
## WTGUILT 0.08680 0.05629 1.542 0.128351
## WTFEAR -0.39457 0.10248 -3.850 0.000289 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4888 on 60 degrees of freedom
## Multiple R-squared: 0.3496, Adjusted R-squared: 0.3171
## F-statistic: 10.75 on 3 and 60 DF, p-value: 9.448e-06
plot(antiracistaffectsonly.lm)
#antiracistaffectsonly.lm<-lm(ACCOUNT ~ WTFEAR + WTGUILT + WTANGER, data = subset(coredata, extremelyinfluential == FALSE))
realmodelmaybe.lm<-lm(ACCOUNT ~ SELFEFF + WTFEAR + WTANGER, data = coredata)
summary(realmodelmaybe.lm)
##
## Call:
## lm(formula = ACCOUNT ~ SELFEFF + WTFEAR + WTANGER, data = coredata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.52698 -0.20677 0.03187 0.29712 0.80873
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.07535 0.66275 3.131 0.00269 **
## SELFEFF 0.62314 0.13072 4.767 1.23e-05 ***
## WTFEAR -0.22845 0.09378 -2.436 0.01784 *
## WTANGER 0.32959 0.10087 3.267 0.00180 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4244 on 60 degrees of freedom
## Multiple R-squared: 0.5096, Adjusted R-squared: 0.4851
## F-statistic: 20.78 on 3 and 60 DF, p-value: 2.362e-09
plot(realmodelmaybe.lm)
Per Dr Patrick conversation 3/6/24: Selection Regression? Sequential/ hierarchical / clockwise? I investigated R seqest package, to understand the effects from the emotion variables to SELFEFF, then to ACCOUNT. (The docs for this package very hard to understand.) It has a seq_cat_model() function for sequential logistic regression on categorical predictors, which is not my data. Also seq_GEE_model() function for Generalized Estimating Equations, which is either correlated data, or data that is grouped, which also does not seem to be a match for my data (??).
Discuss with Arcenis.
Also looked at ase_seq_logit() which sounded perfect: Variable selection and stopping criterion in logistic regression model. This sounded exactlylike what Dr. Patrick was suggesting. However my data is not binary so logistic regression not appropriate (??). And, I then realized that it’s not actually showing mediation. It’s just letting me confirm that all of the variables are legitimate to include in the regression, because they indicate direct effect on ACCOUNT. I actually want to be testing for indirect effect though (see use of mediation package below, which I think I cannot use because of small sample?).
The code below from leaps package reinforces the same info that I got from stepwise regression, and confirms that all of the independent variables belong in the regression model with the DV ACCOUNT. However, this does not actually tell me about mediating role of SELFEFF, it just says that all variables are relevant when regressed on ACCOUNT.
#from https://advstats.psychstat.org/book/mregression/selection.php
library(leaps)
allvars <- regsubsets(ACCOUNT ~ SELFEFF + WTANGER + WTGUILT + WTFEAR, data = coredata, nbest=1, nvmax=5)
info <- summary(allvars)
cbind(info$which, round(cbind(rsq=info$rsq, adjr2=info$adjr2, cp=info$cp, bic=info$bic, rss=info$rss), 3))
## (Intercept) SELFEFF WTANGER WTGUILT WTFEAR rsq adjr2 cp bic rss
## 1 1 1 0 0 0 0.381 0.371 18.632 -22.377 13.643
## 2 1 1 1 0 0 0.461 0.443 10.455 -27.089 11.877
## 3 1 1 1 0 1 0.510 0.485 6.294 -28.966 10.808
## 4 1 1 1 1 1 0.536 0.504 5.000 -28.284 10.237
#plot(2:8, info$cp, xlab='P (# of predictors + 1)', ylab='Cp')
#abline(a=0,b=1)
This type of regression is used to figure out which of the variables actually contribute to the dependent variable. A stepwise procedure
First is a Forward Stepwise Selection AIC is Akaike Information Criterion which is a goodness-of-fit measure, when used to compare to another model (a sole AIC score independently is meaningless). A smaller AIC is preferable to larger.
#define intercept-only model
#intercept_only <- lm(ACCOUNT ~ 1, data = subset(coredata, extremelyinfluential == FALSE))
intercept_only <- lm(ACCOUNT ~ 1, data = coredata)
#define model with all predictors
#all <- lm(ACCOUNT ~ WTFEAR + WTGUILT + WTANGER + SELFEFF, data = subset(coredata, extremelyinfluential == FALSE))
all <- lm(ACCOUNT ~ WTFEAR + WTGUILT + WTANGER + SELFEFF, data = coredata)
#perform forward stepwise regression
#trace=0 will suppress the full results of the stepwise to save space in the output
forward <- step(intercept_only, direction='forward', scope=formula(all), trace=0)
#forward <- step(intercept_only, direction='forward', scope=formula(all))
#view results of forward stepwise regression
forward$anova
#view final model
forward$coefficients
## (Intercept) SELFEFF WTANGER WTFEAR WTGUILT
## 1.94789385 0.62341231 0.29914801 -0.24592658 0.08706655
(When the full dataset of all professional identities is included in this analysis (not just counselors), the AIC scores have much greater variability between the additional three variables, which is interesting and probably deserves further exploration - though, might only be an aspect of the larger dataset.)
Let’s do the same thing with a backwards stepwise selection:
#define intercept-only model
intercept_only <- lm(ACCOUNT ~ 1, data = coredata)
#define model with all predictors
all <- lm(ACCOUNT ~ WTFEAR + WTGUILT + WTANGER + SELFEFF, data = coredata)
#perform both-directions stepwise regression
backwardsw <- step(intercept_only, direction='backward', scope=formula(all), trace=0)
#view results of forward stepwise regression
backwardsw$anova
#view final model
backwardsw$coefficients
## (Intercept)
## 5.464844
Test for mediation:
summary(antiracistaffectsonly.lm)
##
## Call:
## lm(formula = ACCOUNT ~ WTANGER + WTGUILT + WTFEAR, data = coredata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.45991 -0.23845 0.01118 0.33163 0.88462
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.70406 0.63989 5.789 2.76e-07 ***
## WTANGER 0.41375 0.11452 3.613 0.000620 ***
## WTGUILT 0.08680 0.05629 1.542 0.128351
## WTFEAR -0.39457 0.10248 -3.850 0.000289 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4888 on 60 degrees of freedom
## Multiple R-squared: 0.3496, Adjusted R-squared: 0.3171
## F-statistic: 10.75 on 3 and 60 DF, p-value: 9.448e-06
WTGUILT p-value not less than .05, so no significant evidence that it
is associated with ACCOUNT when SELFEFF not in the model, and the effect
size is small anyway.
summary(antiracistselfeff.lm)
##
## Call:
## lm(formula = SELFEFF ~ WTFEAR + WTGUILT + WTANGER, data = coredata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.79186 -0.35362 0.02283 0.21280 0.87491
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.8170242 0.5487552 5.133 3.24e-06 ***
## WTFEAR -0.2384423 0.0878873 -2.713 0.00869 **
## WTGUILT -0.0004311 0.0482747 -0.009 0.99291
## WTANGER 0.1838356 0.0982087 1.872 0.06610 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4192 on 60 degrees of freedom
## Multiple R-squared: 0.1585, Adjusted R-squared: 0.1165
## F-statistic: 3.768 on 3 and 60 DF, p-value: 0.01516
Only WTFEAR is has a statistically significant effect, with moderate effect size of -0.24. WTANGER is approaching significance, with a smaller effect of 0.18. WTGUILT cannot be said to have a significant effect, with p-value approaching 1, and the beta is tiny. The adjusted R squared of this model is quite low, at .116. It is not established that there is a relationship between WTGUILT and SELFEFF. HOW DOES THIS AFFECT MY MEDIATION ANALYSIS?
summary(antiracism.lm)
##
## Call:
## lm(formula = ACCOUNT ~ SELFEFF + WTANGER + WTGUILT + WTFEAR,
## data = coredata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.49631 -0.22641 0.03488 0.31135 0.73861
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.94789 0.65421 2.977 0.00421 **
## SELFEFF 0.62341 0.12829 4.859 9.07e-06 ***
## WTANGER 0.29915 0.10040 2.979 0.00419 **
## WTGUILT 0.08707 0.04797 1.815 0.07462 .
## WTFEAR -0.24593 0.09254 -2.657 0.01011 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4165 on 59 degrees of freedom
## Multiple R-squared: 0.5355, Adjusted R-squared: 0.504
## F-statistic: 17.01 on 4 and 59 DF, p-value: 2.515e-09
All 4 of the IVs are shown to be associated to ACCOUNT. So suddenly here, WTGUILT shows up (though p-value is .07 which is higher than my .05 alpha).
WTANGER and WTFEAR p-values increase when SELFEFF is in the model.
WITH SELFEFF WTANGER 0.29915 WTFEAR -0.24593
WITHOUT SELFEFF WTANGER 0.41375 WTFEAR -0.39457
For both of those, the coefficients are lower when SELFEFF is in the model, which implies that SELFEFF is a mediator on both.
With SELFEFF in the model, WTANGER is lower at 0.299 (compared to 0.41375 without SELFEFF), and p-value increases but still significant. WTFEAR also decreases (from -0.24593 with SELFEFF, to -0.39457 without) and p-value also increases but still significant. This suggests that SELFEFF is mediating the relationship between these variables and ACCOUNT.
A mediator variable is one that explains the how or why of a observed relationship between an independent variable and dependent variable. So, in this case, SELFEFF appears to be explaining part of the relationship between WTANGER and WTFEAR and ACCOUNT.
WITH SELFEFF
WITHOUT SELFEFF WTGUILT .086 and p-value .13
WITH SELFEFF WTGUILT 0.08707 and p-value = .07 which is approaching significance.
WTGUILT is somewhat different: When SELFEFF is in the model, the relationship of WTGUILT on ACCOUNT improves slightly. However, WTGUILT has no association with SELFEFF when they are examined separately. The predictive influence of WTGUILT on ACCOUNT improves marginally when SELFEFF is in the model, though it still is not at my alpha of .05. When SELFEFF is not in the model, the coeff for WTGUILT is .086 and p-value .13. When SELFEFF is in the model, WTGUILT bumps up a tiny bit to .087 and the p-value approaches statistical significance, at .07. When I regress WTGUILT on SELFEFF, then the coeff is tiny (-0.0004311) and not statistically significant (p-value nearly 1.0).
This could suggest that ‘SELFEFF’ is acting as a suppressor variable. A suppressor variable is one that increases the predictive validity of another variable (or variables) by its inclusion in a regression equation.
The ‘yhat’ package in R has a function called ‘suppressorEffect()’ that can help you test for suppressor effects.
Another option is the ‘rms’ package. It provides a function called ‘vif()’, which can calculate the variance inflation factor (VIF). A low VIF for ‘SELFEFF’ might indicate it’s acting as a suppressor.
I got VIF values for all 4 of my IVs to test for collinearity using this formula: ACCOUNT ~ SELFEFF + WTGUILT + WTFEAR + WTANGER
Can I use those same VIF values to evaluate if SELFEF is a suppressor? Or is it a different function?
For a partial mediation model, you’d typically follow these steps: 1. First, run a regression with just the independent variables (IV) and dependent variable (DV). This gives you the total effect of the IV on the DV. 2. Next, run a regression with the IV and the mediator. This gives you the effect of the IV on the mediator. 3. Then, run a regression with the mediator and the DV, controlling for the IV. This gives you the effect of the mediator on the DV, controlling for the IV. 4. Finally, run a regression with both the IV and the mediator on the DV. This gives you the direct effect of the IV on the DV, controlling for the mediator.
(After sweating through the full bootstrap thing at the end of this file, that guy said “if you have real outliers or very influential observations in your data, they will be given more weight then needed. In this case you would use a robust regression,” which brings me back to thist)
library(MASS)
#fit robust regression model
#antiracism.robust <- rlm(ACCOUNT ~ WTFEAR + WTGUILT + WTANGER + SELFEFF, data=subset(coredata, extremelyinfluential == FALSE))
# since it's robust, and since the extreme influential point actually is not a data entry error that I can tell, include the full dataset here
antiracism.robust <- rlm(ACCOUNT ~ WTFEAR + WTGUILT + WTANGER + SELFEFF, data=coredata)
summary(antiracism.robust)
##
## Call: rlm(formula = ACCOUNT ~ WTFEAR + WTGUILT + WTANGER + SELFEFF,
## data = coredata)
## Residuals:
## Min 1Q Median 3Q Max
## -1.50995 -0.24448 0.02026 0.28598 0.70805
##
## Coefficients:
## Value Std. Error t value
## (Intercept) 2.1614 0.6453 3.3494
## WTFEAR -0.2227 0.0913 -2.4399
## WTGUILT 0.0705 0.0473 1.4889
## WTANGER 0.2824 0.0990 2.8519
## SELFEFF 0.5987 0.1265 4.7314
##
## Residual standard error: 0.3856 on 59 degrees of freedom
plot(antiracism.robust)
Now compare to the ordinary least squares regression of the model, with the influential outlier removed. The residual standard error returned next will measure the standard deviation of the residuals in each of the models. A lower RSE means a better fit to the data.
#find residual standard error of ols model with outliers removed
summary(antiracism2.lm)$sigma
## [1] 0.3952902
#find residual standard error of robust model
summary(antiracism.robust)$sigma
## [1] 0.3855842
visualize the results
coredata$robust_fitted <- fitted(antiracism.robust)
coredata$classical_fitted <- fitted(antiracism.lm)
ggplot(coredata, aes(x = ACCOUNT, y = SELFEFF)) +
geom_point() +
geom_line(aes(y = robust_fitted), color = "red") +
geom_line(aes(y = classical_fitted), color = "blue") +
ggtitle("Comparison of Robust vs. Classical Regression")
The null hypothesis of these one-way ANOVA calculations is that the group means are all equal.
# from https://personality-project.org/r/r.guide.html#descriptive
aov.ex1 = aov(ACCOUNT~sexuality,data = subset(coredata, extremelyinfluential == FALSE)) #do the analysis of variance
summary(aov.ex1) #show the summary table
## Df Sum Sq Mean Sq F value Pr(>F)
## sexuality 7 3.873 0.5533 2.734 0.0165 *
## Residuals 55 11.131 0.2024
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
print(model.tables(aov.ex1,"means"),digits=3)
## Tables of means
## Grand mean
##
## 5.506614
##
## sexuality
## Alternate or self-identified answer Bisexual Gay Heterosexual or straight
## 5.71 5.41 3.75 5.51
## rep 2.00 8.00 1.00 40.00
## Lesbian Pansexual Queer Undecided/Exploring
## 6 5.81 5.54 5.83
## rep 1 3.00 7.00 1.00
#report the means and the number of subjects/cell
boxplot(ACCOUNT~sexuality,data = subset(coredata, extremelyinfluential == FALSE))
#graphical summary appears in graphics window
The p-value is < .05 which means that the null hypothesis - that the groups are equivalent on ACCOUNT - is rejected. (Need to fix the labels on the box plots, but they show it too.)
aov.age = aov(ACCOUNT~agecat,data = subset(coredata, extremelyinfluential == FALSE)) #do the analysis of variance
summary(aov.age) #show the summary table
## Df Sum Sq Mean Sq F value Pr(>F)
## agecat 4 0.72 0.1801 0.731 0.574
## Residuals 58 14.28 0.2463
print(model.tables(aov.age,"means"),digits=3)
## Tables of means
## Grand mean
##
## 5.506614
##
## agecat
## 18 to 24 25 to 34 35 to 44 45 to 54 55 to 64
## 5.06 5.48 5.52 5.58 5.53
## rep 3.00 14.00 24.00 16.00 6.00
#report the means and the number of subjects/cell
boxplot(ACCOUNT~agecat,data = subset(coredata, extremelyinfluential == FALSE))
#graphical summary appears in graphics window
The box plots and the p-value > .05 actually both agree that there are not significant differences in ACCOUNT by age groups in the sample.
aov.region = aov(ACCOUNT~region,data = subset(coredata, extremelyinfluential == FALSE)) #do the analysis of variance
summary(aov.region) #show the summary table
## Df Sum Sq Mean Sq F value Pr(>F)
## region 3 0.912 0.3042 1.273 0.292
## Residuals 59 14.092 0.2388
print(model.tables(aov.region,"means"),digits=3)
## Tables of means
## Grand mean
##
## 5.506614
##
## region
## Midwest Northeast South West
## 5.29 5.63 5.56 5.57
## rep 15.00 8.00 26.00 14.00
#report the means and the number of subjects/cell
boxplot(ACCOUNT~region,data = subset(coredata, extremelyinfluential == FALSE))
#graphical summary appears in graphics window
aov.class = aov(ACCOUNT~class,data = subset(coredata, extremelyinfluential == FALSE)) #do the analysis of variance
summary(aov.class) #show the summary table
## Df Sum Sq Mean Sq F value Pr(>F)
## class 4 0.557 0.1393 0.559 0.693
## Residuals 58 14.447 0.2491
print(model.tables(aov.class,"means"),digits=3)
## Tables of means
## Grand mean
##
## 5.506614
##
## class
## Lower middle class Skip/Prefer not to say Upper class Upper middle class
## 5.5 5.92 5.92 5.47
## rep 17.0 1.00 2.00 38.00
## Working class
## 5.55
## rep 5.00
#report the means and the number of subjects/cell
boxplot(ACCOUNT~class,data = subset(coredata, extremelyinfluential == FALSE))
#graphical summary appears in graphics window
aov.yrsprac = aov(ACCOUNT~yrsprac,data = subset(coredata, extremelyinfluential == FALSE)) #do the analysis of variance
summary(aov.yrsprac) #show the summary table
## Df Sum Sq Mean Sq F value Pr(>F)
## yrsprac 5 1.964 0.3928 1.717 0.145
## Residuals 57 13.040 0.2288
print(model.tables(aov.yrsprac,"means"),digits=3)
## Tables of means
## Grand mean
##
## 5.506614
##
## yrsprac
## 0-1 16-24 2-4 25 or more 5-8 9-15
## 5.3 5.55 5.8 5.39 5.53 5.43
## rep 15.0 7.00 14.0 3.00 9.00 15.00
#report the means and the number of subjects/cell
boxplot(ACCOUNT~yrsprac,data = subset(coredata, extremelyinfluential == FALSE))
#graphical summary appears in graphics window
SKIP EVERYTHING BELOW
Resampling. Recommended by coaching D1 for working with nonparametric data. Likely due to small sample size.
Learning resource - bootstrapping using the boot package
(Abandoned below…)
# built from example at https://www.statmethods.net/advstats/bootstrapping.html
library(boot)
# function to obtain regression weights
# bs <- function(formula, data, indices)
# {
# d <- data[indices,] # allows boot to select sample
# fit <- lm(formula, data=d)
# return(coef(fit))
# }
# # bootstrapping with 1000 replications
# results <- boot(data = subset(coredata, extremelyinfluential == FALSE), statistic=bs,
# R=1000, formula=ACCOUNT ~WTFEAR + WTGUILT + WTANGER + SELFEFF)
#
# # view results
# results
# plot(results, index=1) # intercept
# plot(results, index=2) # WTGUILT
# plot(results, index=3) # WTGUILT
# plot(results, index=4) # WTANGER
# plot(results, index=5) # SELFEFF
#
# # get 95% confidence intervals
# boot.ci(results, type="bca", index=1) # intercept
# boot.ci(results, type="bca", index=2) # WTGUILT
# boot.ci(results, type="bca", index=3) # WTGUILT
# boot.ci(results, type="bca", index=4) # WTANGER
# boot.ci(results, type="bca", index=5) # SELFEFF
Let’s start fresh… Mon Feb 4.
Learning resource - non parametric boostrapped regression via tidymodels His blog post
Bootstrapping do not have any assumptions of the data to meet.
Works better for small samples.
Describes variation in the data better than standard lm.
library(tidymodels)
# this makes the boostrap sampling reproducible, it always will have the same data based on this call (the 108 is random, could be anything)
set.seed(108)
# Bootstrap the data with resampling
boot_data <- bootstraps(data = subset(coredata, extremelyinfluential == FALSE & influential == FALSE), times = 2000)
# bootstrap new regressions and unnest all fits
boot_models <- boot_data %>%
mutate(model = map(splits, ~lm(ACCOUNT ~ SELFEFF + WTANGER + WTGUILT + WTFEAR, data = subset(coredata, extremelyinfluential == FALSE))),
augmented = map(model, augment) )
boot_models
boot_aug <- boot_models %>%
unnest(augmented) %>%
dplyr::select(-splits, -model)
boot_aug
#### uSE BOOT_AUG FOR THE LM
ggplot(data = boot_aug, aes(x = SELFEFF, y = ACCOUNT)) +
geom_line(aes(y = .fitted, group = id), col = "grey", size = 0.25)
# # Plot estimates distribution of 1000 estimates
# boot_coefs <- boot_models %>%
# unnest(coefs) %>%
# dplyr::select(term, estimate)
#
#
#
# boot_coefs
#
# boot_meds <- boot_coefs %>%
# group_by(term) %>%
# summarise(
# med_est = median(estimate),
# quantile = quantile(estimate, 0.5 ),
# conf.low = quantile(estimate, 0.025),
# conf.high = quantile(estimate, 0.975),
# conf.25 = quantile(estimate, 0.25 ),
# conf.75 = quantile(estimate, 0.75 ))
#
# boot_meds
#
#
#
# boot_coefs %>%
# ggplot(aes(x = estimate)) +
# geom_rect(aes(x = med_est, xmin = conf.low, xmax = conf.high, ymin = 0, ymax = Inf), data = boot_meds, alpha = 0.1, fill = "green") +
# geom_rect(aes(x = med_est, xmin = conf.25, xmax = conf.75, ymin = 0, ymax = Inf), data = boot_meds, alpha = 0.3, fill = "green") +
# geom_density() +
# geom_vline(data = boot_meds, aes(xintercept = med_est))+
# facet_wrap(~term, scales = "free")
#
#
# boot_models <- bootstraps(data = subset(coredata, extremelyinfluential == FALSE), times = 10000) %>%
# mutate(model = map(splits, ~lm(ACCOUNT ~ 0 + SELFEFF + WTANGER + WTGUILT + WTFEAR, data = subset(coredata, extremelyinfluential == FALSE))),
# coefs = map(model, tidy) ) # optional "conf.int = TRUE"
#
# boot_coefs <- boot_models %>%
# unnest(coefs) %>%
# dplyr::select(term, p.value)
#
#
#
# boot_coefs %>%
# ggplot(aes(x = p.value)) +
# geom_histogram() +
# facet_wrap(~term, scales = "free")
#
# boot_aug <- boot_models %>%
# mutate(augmented = map(model, augment)) %>%
# unnest(augmented) %>%
# dplyr::select(-splits, -model)
#
#
#
# # get median estimates, median 95% CIs
# nonpar_med_boot_preds <- boot_aug %>%
# group_by(SELFEFF) %>%
# summarise(
# predicted = median(.fitted ),
# conf.low = quantile(.fitted, 0.025),
# conf.high = quantile(.fitted, 0.975)) %>%
# mutate(model = "bootstrapped") %>%
# dplyr::select(model, everything())
#
# nonpar_med_boot_preds
# Note that this is the first lm function in this code with influentials also omitted
# antiracismbooted.lm<-lm(ACCOUNT ~ WTFEAR + WTGUILT + WTANGER + SELFEFF, data = subset(coredata, extremelyinfluential == FALSE & influential == FALSE))
antiracismbooted.lm<-lm(ACCOUNT ~ WTFEAR + WTGUILT + WTANGER + SELFEFF, data = coredata)
model_performance(antiracismbooted.lm)
summary(antiracismbooted.lm)
residualsbooted <- resid(antiracismbooted.lm)
report(antiracismbooted.lm)
# Below works just fine, but the standard plot outputs gives more detail
# plot(fitted(antiracism.lm), residuals)
Plots of bootstrapped LM
plot(antiracismbooted.lm)
#Create density plot of residuals
plot(density(residualsbooted))
Shapiro-Wilk text on residuals from LM of booted dataset
shapiro.test(residualsbooted)
library(robustbase)
rm <- lmrob(ACCOUNT ~ SELFEFF + WTANGER + WTGUILT + WTFEAR, data = subset(coredata, extremelyinfluential == FALSE))
summary(rm)
Keeping this in here for now, though further reading indicates that Poisson is best for a “count” dependent variable (such as, the number of people who die each year from bee stings). My DV is not a count.
# from https://www.geeksforgeeks.org/poisson-regression-in-r-programming/
# output <-glm(formula = breaks ~ wool + tension,
# data = warpbreaks, family = poisson)
# print(summary(output))
# the below runs but I don't think it's a valid function to run on this data
outputpo <-glm(formula = ACCOUNT ~ WTFEAR + WTGUILT + WTANGER + SELFEFF, data = subset(coredata, extremelyinfluential == FALSE), family = poisson)
print(summary(output))
plot(outputpo)
An ordinal logistic regression is for a dependent variable with at least three groups in a natural order, which the ACCOUNTABILITY measure seems to have (given the available answers to respondents of Agree through Disagree - wait, no: it’s a total score of the scale which is actually a continuous variable right?). See types of regression here: https://statisticsbyjim.com/regression/choosing-regression-analysis/
I’m really stuck on whether the variables, as summations of ordered values, are actually ordinal or if they are interval. :-(
From https://towardsdatascience.com/generalized-linear-models-9ec4dfe3dc3f
I considered this however just could not figure out which family to use. It’s not count so not Poisson. It’s not binomial. I don’t think it’s Gaussian. Blech.
output.glm <-glm(formula = ACCOUNT ~ WTFEAR + WTGUILT + WTANGER + SELFEFF, data = subset(coredata, extremelyinfluential == FALSE), family = poisson)
print(summary(output))
summary(output.glm)
plot(output.glm)
Two-Stage Least-Squares Regression with Diagnostics This uses instrumental variables (which I’m doubtful that my variables actually are) Subsequently from Arcenis 3/15/24 (see Wyzant for message): “I looked through some literature on 2-stage least squares (2SLS) and found that it really is meant for use with instrumental variables. Your model would ideally be run as a sturctural equation model (SEM) meaning the {lavaan} package. The good news is that I think that the specification error in your package likely happened because of misspecification. You might have set it to test for covariances between/among all of your explanatory variables, which would give you fewer degrees of freedom. That’s easily addressable if it is the problem.
If that doesn’t work, you can still try 2SLS, but you’d have to specify the first stage first, then use those estimates to use as an OLS regressor (very doable), you’d just have to ensure that the outcome variable is not highly correlated with your DV. I can help you with that. If you do it this way, then 2SLS is valid.”
My interpretation: This doesn’t work based on my variable types. Variables are definitely not instrumental, there is direct effect of all of them on ACCOUNT.
Also, if I’m understanding the second recommendation: The outcome variable of the first stage would be SELFEFF, and it is actually highly correlated with the DV ACCOUNT. So I think it wouldn’t work for that reason also.
# ivreg package
library(ivreg)
## Warning: package 'ivreg' was built under R version 4.3.3
m_iv <- ivreg(ACCOUNT ~ SELFEFF | WTANGER + WTGUILT + WTFEAR, data = coredata)
summary(m_iv)
##
## Call:
## ivreg(formula = ACCOUNT ~ SELFEFF | WTANGER + WTGUILT + WTFEAR,
## data = coredata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.53007 -0.62336 0.06414 0.49168 1.15835
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.7236 1.5671 -0.462 0.6459
## SELFEFF 1.8551 0.4691 3.954 0.0002 ***
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 3 60 3.768 0.01516 *
## Wu-Hausman 1 61 13.923 0.00042 ***
## Sargan 2 NA 2.057 0.35753
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6611 on 62 degrees of freedom
## Multiple R-Squared: -0.2296, Adjusted R-squared: -0.2494
## Wald test: 15.64 on 1 and 62 DF, p-value: 0.0001996
This type of analysis is frequently done in clinical trials. It includes sensitivity assessments. It appears to be somewhat of an alternative to path analysis/SEM.
Rule of thumb for mediation analysis is ten observations per variable. Could not find a direct statement of this, however Fritz & MacKinnon (2007) reported on a metaanalysis of different types of mediation analyses. Hair et al. (2011) gives rules of thumb that may be applicable.
The mediation package in R primarily uses regression-based methods for mediation analysis (it does not use structural equation modeling). The main function in the package, “mediate”, estimates the causal mediation effects using regression models for the mediator and the outcome, via tests of the indirect effect (i.e., non-SEM product-of-coefficients tests)
From Fritz & MacKinnon (2007): “For example, consider a researcher interested in testing whether intentions mediate the relation between attitudes and behaviors, as proposed by the theory of planned behavior (Ajzen, 1985). The researcher believes that in the study, the effect of attitudes on intentions will be of medium size, the effect of in- tentions on behaviors will be small, and intentions will com- pletely mediate the effect of attitudes on behavior. Using the empirical power tables, the researcher can see that for the joint significance test, a sample size of 405, or a larger sample if measurement error is present, is required for .8 power.”
Converting that to my study: I now know for example (given that I have analysis underway) that total effect of WTANGER on ACCOUNT is of .4 which is medium size (alpha path), the indirect effect of WTANGER on ACCOUNT through SELFEFF is small at .11 (beta path), and SELFEFF does not completely mediate the effect of the WTANGER on ACCOUNT. Using the table in Fritz & MacKinnon (2007) to estimate sample size needed for .8 power, the alpha path is M, the beta path is S, the method is PRODCLIN (indirect product of coefficients) which gives sample size needed of 404 (if I’m doing it correctly).
For future analyses where using this, see Ato García et al. (2014) for guidance in interpreting results - for example their paragraph here which describes their own analysis using mediation package that closely resembles what I have done:
“The two equations were estimated with the R-lm function, the mediation effects with the R-mediate function and the sensitivity analysis with the R-medsens function of the mediation package. For the random replicate used, we found a non-significant indirect effect (ACME = 0.0755, p = .08) and a mediated proportion of 0.115 (p = 0.08), so the presence of a mediating effect was discarded. The second part of Output 1 (Table 1) is a sensitivity analysis with the medsens function, which reveals that the confidence interval of indirect effects includes zero in the entire spectrum of the sensitivity parameter ρ (from - 0.9 to + 0.9). In fact, sensitivity analysis is not required when the indirect effect is not significant. Figure 4 is a plot of sensitivity analysis and shows, together with the axes of indirect effect and ρ, the observed mediating effect (dashed line) and the values that the indirect effect would reach varying the sensitivity parameter (solid curved line) along its range in steps of .10. Note that the confidence interval (limits represented with a grey background) always includes zero for the indirect effect whatever the value of ρ is. This result is expected as a consequence of imposing a zero correlation between the residuals of Equations (1) and (2).”
(My commentary: Remember, if zero correlation between the residuals, or rho = 0, that means that there is no identifiable relationship between the way the predictors move in their respective model equations)
Note that in each of the Sensitivity Region matrixes, it reports on Rho at which ACME = 0: 0.5. This is not an output of the sensitivity calculation, it’s saying how the calculation was done. That rho is not a reportable number relevant to the variables.
library(mediation)
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
## Loading required package: mvtnorm
## Loading required package: sandwich
## mediation: Causal Mediation Analysis
## Version: 4.5.0
# The psych package also has a mediate() function
# https://cran.r-project.org/web/packages/mediation/vignettes/mediation.pdf
# Model Based Causal Mediation Analysis
# mediator model
model.m <- lm(SELFEFF ~ WTANGER + WTGUILT + WTFEAR, data=coredata)
# outcome model
model.y <- lm(ACCOUNT ~ SELFEFF + WTANGER + WTGUILT + WTFEAR, data = coredata)
# # mediator model
# model.m <- lm(SELFEFF ~ WTANGER, data=coredata)
#
# # outcome model
# model.y <- lm(ACCOUNT ~ SELFEFF + WTANGER, data = coredata)
# causal mediation analysis - WTANGER
m1.out <- mediation::mediate(model.m, model.y, sims=1000, treat="WTANGER", mediator="SELFEFF")
summary(m1.out)
##
## Causal Mediation Analysis
##
## Quasi-Bayesian Confidence Intervals
##
## Estimate 95% CI Lower 95% CI Upper p-value
## ACME 0.11310 -0.00122 0.25 0.054 .
## ADE 0.30112 0.10233 0.49 0.004 **
## Total Effect 0.41421 0.19696 0.63 <2e-16 ***
## Prop. Mediated 0.26653 -0.00319 0.61 0.054 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Sample Size Used: 64
##
##
## Simulations: 1000
#plot(m1.out)
# sensitivity analysis - WTANGER
s1.out <- medsens(m1.out)
summary(s1.out)
##
## Mediation Sensitivity Analysis for Average Causal Mediation Effect
##
## Sensitivity Region
##
## Rho ACME 95% CI Lower 95% CI Upper R^2_M*R^2_Y* R^2_M~R^2_Y~
## [1,] -0.9 0.4886 -0.0127 0.9899 0.81 0.3166
## [2,] -0.8 0.3561 -0.0105 0.7228 0.64 0.2501
## [3,] -0.7 0.2922 -0.0097 0.5941 0.49 0.1915
## [4,] -0.6 0.2505 -0.0093 0.5103 0.36 0.1407
## [5,] -0.5 0.2192 -0.0092 0.4476 0.25 0.0977
## [6,] -0.4 0.1937 -0.0092 0.3966 0.16 0.0625
## [7,] -0.3 0.1716 -0.0094 0.3525 0.09 0.0352
## [8,] -0.2 0.1516 -0.0096 0.3128 0.04 0.0156
## [9,] -0.1 0.1328 -0.0101 0.2757 0.01 0.0039
## [10,] 0.0 0.1146 -0.0108 0.2400 0.00 0.0000
## [11,] 0.1 0.0964 -0.0118 0.2046 0.01 0.0039
## [12,] 0.2 0.0776 -0.0134 0.1687 0.04 0.0156
## [13,] 0.3 0.0576 -0.0163 0.1316 0.09 0.0352
## [14,] 0.4 0.0355 -0.0221 0.0932 0.16 0.0625
## [15,] 0.5 0.0100 -0.0359 0.0559 0.25 0.0977
## [16,] 0.6 -0.0213 -0.0710 0.0285 0.36 0.1407
## [17,] 0.7 -0.0630 -0.1413 0.0154 0.49 0.1915
## [18,] 0.8 -0.1269 -0.2641 0.0103 0.64 0.2501
## [19,] 0.9 -0.2594 -0.5282 0.0094 0.81 0.3166
##
## Rho at which ACME = 0: 0.5
## R^2_M*R^2_Y* at which ACME = 0: 0.25
## R^2_M~R^2_Y~ at which ACME = 0: 0.0977
plot(s1.out)
# summarize results
#Causal Mediation Analysis - WTGUILT
m2.out <- mediation::mediate(model.m, model.y, sims=1000, treat="WTGUILT", mediator="SELFEFF")
summary(m2.out)
##
## Causal Mediation Analysis
##
## Quasi-Bayesian Confidence Intervals
##
## Estimate 95% CI Lower 95% CI Upper p-value
## ACME 0.00101 -0.05790 0.07 0.976
## ADE 0.08478 -0.00629 0.17 0.064 .
## Total Effect 0.08579 -0.01801 0.19 0.102
## Prop. Mediated 0.04260 -3.11795 2.50 0.890
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Sample Size Used: 64
##
##
## Simulations: 1000
#plot(m2.out)
# sensitivity analysis - WTGUILT
s2.out <- medsens(m2.out)
summary(s2.out)
##
## Mediation Sensitivity Analysis for Average Causal Mediation Effect
##
## Sensitivity Region
##
## Rho ACME 95% CI Lower 95% CI Upper R^2_M*R^2_Y* R^2_M~R^2_Y~
## [1,] -0.9 -0.0011 -0.2466 0.2443 0.81 0.3166
## [2,] -0.8 -0.0008 -0.1797 0.1780 0.64 0.2501
## [3,] -0.7 -0.0007 -0.1474 0.1461 0.49 0.1915
## [4,] -0.6 -0.0006 -0.1264 0.1252 0.36 0.1407
## [5,] -0.5 -0.0005 -0.1106 0.1096 0.25 0.0977
## [6,] -0.4 -0.0005 -0.0977 0.0968 0.16 0.0625
## [7,] -0.3 -0.0004 -0.0866 0.0858 0.09 0.0352
## [8,] -0.2 -0.0004 -0.0765 0.0758 0.04 0.0156
## [9,] -0.1 -0.0003 -0.0670 0.0664 0.01 0.0039
## [10,] 0.0 -0.0003 -0.0578 0.0573 0.00 0.0000
## [11,] 0.1 -0.0002 -0.0486 0.0482 0.01 0.0039
## [12,] 0.2 -0.0002 -0.0392 0.0388 0.04 0.0156
## [13,] 0.3 -0.0001 -0.0291 0.0288 0.09 0.0352
## [14,] 0.4 -0.0001 -0.0179 0.0178 0.16 0.0625
## [15,] 0.5 0.0000 -0.0051 0.0050 0.25 0.0977
## [16,] 0.6 0.0000 -0.0106 0.0107 0.36 0.1407
## [17,] 0.7 0.0001 -0.0315 0.0318 0.49 0.1915
## [18,] 0.8 0.0003 -0.0635 0.0641 0.64 0.2501
## [19,] 0.9 0.0006 -0.1297 0.1309 0.81 0.3166
##
## Rho at which ACME = 0: 0.5
## R^2_M*R^2_Y* at which ACME = 0: 0.25
## R^2_M~R^2_Y~ at which ACME = 0: 0.0977
plot(s2.out)
#Causal Mediation Analysis - WTFEAR
m3.out <- mediation::mediate(model.m, model.y, sims=1000, treat="WTFEAR", mediator="SELFEFF")
summary(m3.out)
##
## Causal Mediation Analysis
##
## Quasi-Bayesian Confidence Intervals
##
## Estimate 95% CI Lower 95% CI Upper p-value
## ACME -0.149 -0.291 -0.04 0.008 **
## ADE -0.244 -0.431 -0.08 0.006 **
## Total Effect -0.394 -0.604 -0.19 <2e-16 ***
## Prop. Mediated 0.374 0.119 0.71 0.008 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Sample Size Used: 64
##
##
## Simulations: 1000
#plot(m3.out)
# sensitivity analysis - WTFEAR
s3.out <- medsens(m3.out)
summary(s3.out)
##
## Mediation Sensitivity Analysis for Average Causal Mediation Effect
##
## Sensitivity Region
##
## Rho ACME 95% CI Lower 95% CI Upper R^2_M*R^2_Y* R^2_M~R^2_Y~
## [1,] 0.3 -0.0748 -0.1531 0.0036 0.09 0.0352
## [2,] 0.4 -0.0461 -0.1126 0.0204 0.16 0.0625
## [3,] 0.5 -0.0130 -0.0717 0.0458 0.25 0.0977
## [4,] 0.6 0.0276 -0.0336 0.0888 0.36 0.1407
## [5,] 0.7 0.0817 -0.0001 0.1634 0.49 0.1915
##
## Rho at which ACME = 0: 0.5
## R^2_M*R^2_Y* at which ACME = 0: 0.25
## R^2_M~R^2_Y~ at which ACME = 0: 0.0977
plot(s3.out)
(Reflecting my current knowledge/understanding…)
3/11/24 This analysis is not appropriate for this dataset; not enough observations, underpowered. The attempts at using these functions also returned low-quality results (poor goodness of fit measures). Dr. Patrick commented that this study was not set up to do this type of analysis. I am keeping this in the notebook for historical purposes, however this will not be used in the dissertation.
Note: I also wrote up a lot of these analytics and transferred the code into Ch 4 before realizing that it is underpowered. All of that effort has been cut from the ms and placed in the ch1-snippets.RMD file. There are some good diagrams there (they’re just not valid). Maybe a future study.
3/15/24 Arcenis says that SEM may be appropriate and recommended that lavaan may actually be a valid package to use. (First began discussing using two-stage least squares regression, which is used when there is an instrumental variable; he had misunderstood my “IV” notation as that, when I meant it as “independent variables.” The white racial affects are NOT instrumental variables; instrumental variables are correlated with the predictor but not correlated with the outcome variable, and that’s not the case here, the emotions show a relationship to selfeff and also directly on account.)
See here for pros of SEM and when to avoid: https://stats.stackexchange.com/questions/11255/whether-to-use-structural-equation-modelling-to-analyse-observational-studies-in/187406#187406
https://stats.stackexchange.com/questions/187924/sample-size-for-sem-with-ordinal-measures
# uses seminr package
measurements <- constructs(
reflective("Accountability", single_item("ACCOUNT")),
reflective("White Anger", single_item("WTANGER")),
reflective("White Guilt", single_item("WTGUILT")),
reflective("White Fear", single_item("WTFEAR")),
reflective("Self Efficacy", single_item("SELFEFF"))
)
# create multiple paths "from" and "to" sets of constructs
structure <- relationships(
paths(from = c("WTANGER", "WTGUILT", "WTFEAR"), to = "SELFEFF"),
paths(from = "SELFEFF", to = c("ACCOUNT"))
)
cbsem_model <- estimate_cbsem(data = coredata, measurements, structure)
Resulted in warning: Error in try_or_stop(lavaan::sem(model = lavaan_model, data = data, std.lv = TRUE, : Could not estimating CBSEM using Lavaan: lavaan WARNING: The variance-covariance matrix of the estimated parameters (vcov) does not appear to be positive definite! The smallest eigenvalue (= -3.988603e-03) is smaller than zero. This may be a symptom that the model is not identified.
GOOD TUTORIAL ON PATH https://rpubs.com/tbihansk/302732
See here for description of fit statistics https://davidakenny.net/cm/fit.htm
The problem is, in this first output using the cfa() function, the CFI .710 and TLI .324 are not showing goodness of fit, those statistics indicate a poor model.
library(lavaan)
## This is lavaan 0.6-17
## lavaan is FREE software! Please report any bugs.
library(lavaanPlot)
library(semPlot)
library(OpenMx)
library(tidyverse)
library(knitr)
library(kableExtra)
library(GGally)
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
pathmodel <- '
ACCOUNT ~ SELFEFF
SELFEFF ~ WTANGER + WTGUILT + WTFEAR
'
fit <- cfa(pathmodel, data = coredata)
summary(fit, fit.measures = TRUE, standardized=T,rsquare=T)
## lavaan 0.6.17 ended normally after 2 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 6
##
## Number of observations 64
##
## Model Test User Model:
##
## Test statistic 18.384
## Degrees of freedom 3
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 60.126
## Degrees of freedom 7
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.710
## Tucker-Lewis Index (TLI) 0.324
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -74.448
## Loglikelihood unrestricted model (H1) -65.256
##
## Akaike (AIC) 160.896
## Bayesian (BIC) 173.849
## Sample-size adjusted Bayesian (SABIC) 154.965
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.283
## 90 Percent confidence interval - lower 0.168
## 90 Percent confidence interval - upper 0.414
## P-value H_0: RMSEA <= 0.050 0.001
## P-value H_0: RMSEA >= 0.080 0.997
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.099
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## ACCOUNT ~
## SELFEFF 0.819 0.130 6.276 0.000 0.819 0.617
## SELFEFF ~
## WTANGER 0.184 0.095 1.933 0.053 0.184 0.225
## WTGUILT -0.000 0.047 -0.009 0.993 -0.000 -0.001
## WTFEAR -0.238 0.085 -2.802 0.005 -0.238 -0.323
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .ACCOUNT 0.213 0.038 5.657 0.000 0.213 0.619
## .SELFEFF 0.165 0.029 5.657 0.000 0.165 0.841
##
## R-Square:
## Estimate
## ACCOUNT 0.381
## SELFEFF 0.159
pathresult <- lavaanPlot(model = fit, node_options = list(shape = "box", fontname = "Helvetica"), edge_options = list(color = "grey"), coefs = TRUE, sig = 0.054)
pathresult
save_png(pathresult, "fig-pathresults.png")
model_performance(fit)
Now, the output using this sem() function shows 0 degrees of freedom, which is a just-identified model. Also not valid results.
# from https://advstats.psychstat.org/book/path/index.php
mediationfear <- '
ACCOUNT ~ b*SELFEFF + cp*WTFEAR
SELFEFF ~ a*WTFEAR
ab := a*b
total := a*b + cp
'
mediationfear.res <- sem(mediationfear, data = coredata)
summary(mediationfear.res)
## lavaan 0.6.17 ended normally after 1 iteration
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 5
##
## Number of observations 64
##
## Model Test User Model:
##
## Test statistic 0.000
## Degrees of freedom 0
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## ACCOUNT ~
## SELFEFF (b) 0.725 0.133 5.433 0.000
## WTFEAR (cp) -0.211 0.098 -2.141 0.032
## SELFEFF ~
## WTFEAR (a) -0.242 0.087 -2.784 0.005
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .ACCOUNT 0.199 0.035 5.657 0.000
## .SELFEFF 0.175 0.031 5.657 0.000
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|)
## ab -0.176 0.071 -2.477 0.013
## total -0.386 0.112 -3.439 0.001
The model fits measures of CFI and TLI do not indicate goodness of fit (see writeup in ch 4 already)
This model shows only how WTFEAR may be mediated by SELFEFF on ACCOUNT. In interpreting those results: The output of the mediator model with SELFEFF as mediator between WTFEAR and ACCOUNT showed that if WTFEAR is regressed against SELFEFF, p-value is 0.005 and the coefficient estimate beta is -0,242. However, when included with SELFEFF in the regression against ACCOUNT, the coefficient beta decreases to -0.211. This indicates that SELFEFF mediates the relationship between WTFEAR and ACCOUNT. This also is confirmed by the ab and total: Both have low p-values, indicating the relationships are statistically significant.
This is different when we run the same thing on WTANGER:
# from https://advstats.psychstat.org/book/path/index.php
mediationanger <- '
ACCOUNT ~ b*SELFEFF + cp*WTANGER
SELFEFF ~ a*WTANGER
ab := a*b
total := a*b + cp
'
mediationanger.res <- sem(mediationanger, data = coredata)
summary(mediationanger.res)
## lavaan 0.6.17 ended normally after 1 iteration
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 5
##
## Number of observations 64
##
## Model Test User Model:
##
## Test statistic 0.000
## Degrees of freedom 0
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## ACCOUNT ~
## SELFEFF (b) 0.729 0.125 5.826 0.000
## WTANGER (cp) 0.315 0.102 3.085 0.002
## SELFEFF ~
## WTANGER (a) 0.190 0.099 1.910 0.056
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .ACCOUNT 0.186 0.033 5.657 0.000
## .SELFEFF 0.185 0.033 5.657 0.000
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|)
## ab 0.138 0.076 1.815 0.069
## total 0.454 0.123 3.688 0.000
The results looking solely at WTANGER being mediated by SELFEFF show a non-statistically significant ab result which is the path from the emotion through SELFEFF to ACCOUNT. The beta for WTANGER also INCREASES in the full model.
And with WTGUILT the effects are actually even more pronounced:
# from https://advstats.psychstat.org/book/path/index.php
mediationguilt <- '
ACCOUNT ~ b*SELFEFF + cp*WTGUILT
SELFEFF ~ a*WTGUILT
ab := a*b
total := a*b + cp
'
mediationguilt.res <- sem(mediationguilt, data = coredata)
summary(mediationguilt.res)
## lavaan 0.6.17 ended normally after 7 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 5
##
## Number of observations 64
##
## Model Test User Model:
##
## Test statistic 0.000
## Degrees of freedom 0
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## ACCOUNT ~
## SELFEFF (b) 0.818 0.127 6.450 0.000
## WTGUILT (cp) 0.097 0.051 1.924 0.054
## SELFEFF ~
## WTGUILT (a) 0.001 0.050 0.022 0.983
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .ACCOUNT 0.202 0.036 5.657 0.000
## .SELFEFF 0.196 0.035 5.657 0.000
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|)
## ab 0.001 0.041 0.022 0.983
## total 0.098 0.065 1.511 0.131
WTGUILT has an almost 0 coefficient estimate when regressed on SELFEFF and a very high p-value of almost 1. WTGUILT also does not appear to be a statistically significant influence on ACCOUNT at all. Only the coefficients with p < 0.05 are included in the below diagram.
And try it with all 3
library(lavaanPlot)
# https://www.alexlishinski.com/post/lavaanplot-0-5-1/
mediationall <- '
# direct effect
ACCOUNT ~ d*SELFEFF + ep*WTANGER + fp*WTGUILT + gp*WTFEAR
# mediator
SELFEFF ~ a*WTANGER + b*WTGUILT + c*WTFEAR
# indirect effect of the white racial affects
ad := a*d #WTANGER
bd := b*d #WTGUILT
cd := c*d #WTFEAR
# total effect
total := a*d + b*d + c*d + ep + fp + gp
'
mediationall.res <- sem(mediationall, data = coredata)
summary(mediationall.res)
## lavaan 0.6.17 ended normally after 1 iteration
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 9
##
## Number of observations 64
##
## Model Test User Model:
##
## Test statistic 0.000
## Degrees of freedom 0
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## ACCOUNT ~
## SELFEFF (d) 0.623 0.123 5.061 0.000
## WTANGER (ep) 0.299 0.096 3.103 0.002
## WTGUILT (fp) 0.087 0.046 1.890 0.059
## WTFEAR (gp) -0.246 0.089 -2.768 0.006
## SELFEFF ~
## WTANGER (a) 0.184 0.095 1.933 0.053
## WTGUILT (b) -0.000 0.047 -0.009 0.993
## WTFEAR (c) -0.238 0.085 -2.802 0.005
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .ACCOUNT 0.160 0.028 5.657 0.000
## .SELFEFF 0.165 0.029 5.657 0.000
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|)
## ad 0.115 0.063 1.806 0.071
## bd -0.000 0.029 -0.009 0.993
## cd -0.149 0.061 -2.451 0.014
## total 0.106 0.151 0.703 0.482
confirmedpath <- lavaanPlot(model = mediationall.res, node_options = list(shape = "box", fontname = "Helvetica"), edge_options = list(color = "grey"), coefs = TRUE,covs = TRUE, stars = c("regress"))
confirmedpath
save_png(confirmedpath, "fig-confirmedpath.png")
model_performance(mediationall.res)
The Model Test User Model output by lavaan of Test statistic 0.000 and Degrees of freedom 0 indicate that this is a just-identified model which is a sign that it may be overfit to the data.
Including all three white racial affects in the mediation model gives same results, at least based on p-values: Only WTFEAR is mediated by SELFEFF (p = .005 in regression against ACCOUNT; the other two p > .05, though WTANGER p = .053 which rounds to .05 so I don’t know how to consider that??). WTGUILT is not an important element in the model at all.
There are correlation coefficients between the other two white racial affects, as shown in the path diagram, but they are not statistically significant.
The influence of SELFEFF on ACCOUNT is lessened with all three emotions in the model - the beta here for SELFEFF on ACCOUNT is 0.623, it has a much higher influence when only one of the emotions are included. So this means that… what? That the emotions are actually moderating the influence of SELFEFF?
Test with no direct effect - this test actually shows that SELFEFF as a mediator variable is not sufficient, that the direct effect of the emotions is also needed to improve the fit of the model.
The code below shows the model without any direct effects from any of the emotions to ACCOUNT. This is only showing effect of the emotions through SELFEFF. This is from this tutorial
From the text of that tutorial: “To evaluate the hypothesis, we can check the model fit. The null hypothesis is “H0: The model fits the data well or the model is supported”. The alternative hypothesis is “H1: The model does not fit the data or the model is rejected”. The model with the direct effect fits the data perfectly. Therefore, if the current model also fits the data well, we fail to reject the null hypothesis. Otherwise, we reject it. The test of the model can be conducted based on a chi-squared test.”
Applying it to my results: From the output, the Chi-square is 18.384 with 3 degree of freedom. The p-value is about 0. Therefore, the null hypothesis is rejected. This indicates that the model without direct effect is not a good model.
# https://advstats.psychstat.org/book/path/index.php
mediationnodirecte <- '
# direct effect
ACCOUNT ~ d*SELFEFF
# mediator
SELFEFF ~ a*WTANGER + b*WTGUILT + c*WTFEAR
'
mediationnodirecte.res <- sem(mediationnodirecte, data = coredata)
summary(mediationnodirecte.res)
## lavaan 0.6.17 ended normally after 2 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 6
##
## Number of observations 64
##
## Model Test User Model:
##
## Test statistic 18.384
## Degrees of freedom 3
## P-value (Chi-square) 0.000
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## ACCOUNT ~
## SELFEFF (d) 0.819 0.130 6.276 0.000
## SELFEFF ~
## WTANGER (a) 0.184 0.095 1.933 0.053
## WTGUILT (b) -0.000 0.047 -0.009 0.993
## WTFEAR (c) -0.238 0.085 -2.802 0.005
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .ACCOUNT 0.213 0.038 5.657 0.000
## .SELFEFF 0.165 0.029 5.657 0.000
checkwithoutdirecte <- lavaanPlot(model = mediationnodirecte.res, node_options = list(shape = "box", fontname = "Helvetica"), edge_options = list(color = "grey"), coefs = TRUE, sig = 0.054)
checkwithoutdirecte
save_png(checkwithoutdirecte, "fig-checkwithoutdirecteffect.png")
Below meets goodness-of-fit measures and shows both direct and indirect relationships. This is different from what was originally proposed.
# trying to tweak things based on information from chatbot that it's
# a just-identified model. Removed bd := b*d #WTGUILT
mediation2 <- '
# direct effect
ACCOUNT ~ d*SELFEFF + ep*WTANGER + fp*WTGUILT + gp*WTFEAR
# mediator
SELFEFF ~ a*WTANGER + c*WTFEAR
# indirect effect of the white racial affects
#ad := a*d #WTANGER
#cd := c*d #WTFEAR
# total effect
total := a*d + c*d + ep + fp + gp
'
mediation2.res <- sem(mediation2, data = coredata)
#mediation2.res <- sem(mediation2, data = subset(coredata, extremelyinfluential == FALSE))
summary(mediation2.res)
## lavaan 0.6.17 ended normally after 1 iteration
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 8
##
## Number of observations 64
##
## Model Test User Model:
##
## Test statistic 0.000
## Degrees of freedom 1
## P-value (Chi-square) 0.993
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## ACCOUNT ~
## SELFEFF (d) 0.623 0.123 5.061 0.000
## WTANGER (ep) 0.299 0.096 3.103 0.002
## WTGUILT (fp) 0.087 0.046 1.890 0.059
## WTFEAR (gp) -0.246 0.089 -2.768 0.006
## SELFEFF ~
## WTANGER (a) 0.184 0.094 1.961 0.050
## WTFEAR (c) -0.239 0.085 -2.820 0.005
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .ACCOUNT 0.160 0.028 5.657 0.000
## .SELFEFF 0.165 0.029 5.657 0.000
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|)
## total 0.106 0.150 0.706 0.480
model_performance(mediation2.res)
# can also add covs=TRUE to show beween-emotions numbers
confirmedpath2 <- lavaanPlot(model = mediation2.res, node_options = list(shape = "box", fontname = "Helvetica"), edge_options = list(color = "grey"), coefs = TRUE, stars = c("regress"))
confirmedpath2
save_png(confirmedpath2, "fig-confirmedpath2.png")
3/3/24 Did comparisons of the same equation with: - outlier included and not: results are very similar, except effect size is markedly greater on several relationship with the outlier included - counselors only vs MFTs: Also similar, however WTGUILT was increased and was statistically significant; WTANGER was reduced effect AND no longer a statistically significant result.
Let’s do that same model with cfa() instead of sem()
fitcfarev <- cfa(mediation2, data = coredata)
summary(fitcfarev, fit.measures = TRUE, standardized=T,rsquare=T)
## lavaan 0.6.17 ended normally after 1 iteration
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 8
##
## Number of observations 64
##
## Model Test User Model:
##
## Test statistic 0.000
## Degrees of freedom 1
## P-value (Chi-square) 0.993
##
## Model Test Baseline Model:
##
## Test statistic 60.126
## Degrees of freedom 7
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 1.000
## Tucker-Lewis Index (TLI) 1.132
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -65.256
## Loglikelihood unrestricted model (H1) -65.256
##
## Akaike (AIC) 146.512
## Bayesian (BIC) 163.783
## Sample-size adjusted Bayesian (SABIC) 138.605
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.000
## P-value H_0: RMSEA <= 0.050 0.993
## P-value H_0: RMSEA >= 0.080 0.006
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.000
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## ACCOUNT ~
## SELFEFF (d) 0.623 0.123 5.061 0.000 0.623 0.470
## WTANGER (ep) 0.299 0.096 3.103 0.002 0.299 0.276
## WTGUILT (fp) 0.087 0.046 1.890 0.059 0.087 0.164
## WTFEAR (gp) -0.246 0.089 -2.768 0.006 -0.246 -0.251
## SELFEFF ~
## WTANGER (a) 0.184 0.094 1.961 0.050 0.184 0.225
## WTFEAR (c) -0.239 0.085 -2.820 0.005 -0.239 -0.323
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .ACCOUNT 0.160 0.028 5.657 0.000 0.160 0.464
## .SELFEFF 0.165 0.029 5.657 0.000 0.165 0.841
##
## R-Square:
## Estimate
## ACCOUNT 0.536
## SELFEFF 0.159
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## total 0.106 0.150 0.706 0.480 0.106 0.143
pathresultcfarev <- lavaanPlot(model = fitcfarev, node_options = list(shape = "box", fontname = "Helvetica"), edge_options = list(color = "grey"), coefs = TRUE, sig = 0.054)
pathresultcfarev
save_png(pathresultcfarev, "fig-fitcfarev.png")
model_performance(fitcfarev)
This also appears to not be a good fitting model. The CFI should be > 0.95 (hu) but the SRMR = 0 which indicates perfect fit. See https://ecologicalprocesses.springeropen.com/articles/10.1186/s13717-016-0063-3
So I am overfitting to the data.
What happens when I run all of this with the full dataset???
SELFEFF being influenced by ACCOUNTABILITY and ACCOUNTABILITY being influenced by the emotions?
pathmodel2 <- '
SELFEFF ~ ACCOUNT
ACCOUNT ~ WTFEAR + WTGUILT + WTANGER
'
fit2 <- cfa(pathmodel2, data = coredata)
summary(fit2, fit.measures = TRUE, standardized=T,rsquare=T)
## lavaan 0.6.17 ended normally after 2 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 6
##
## Number of observations 64
##
## Model Test User Model:
##
## Test statistic 1.896
## Degrees of freedom 3
## P-value (Chi-square) 0.594
##
## Model Test Baseline Model:
##
## Test statistic 60.126
## Degrees of freedom 7
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 1.000
## Tucker-Lewis Index (TLI) 1.048
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -66.204
## Loglikelihood unrestricted model (H1) -65.256
##
## Akaike (AIC) 144.408
## Bayesian (BIC) 157.362
## Sample-size adjusted Bayesian (SABIC) 138.478
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.177
## P-value H_0: RMSEA <= 0.050 0.654
## P-value H_0: RMSEA >= 0.080 0.269
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.037
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## SELFEFF ~
## ACCOUNT 0.465 0.074 6.276 0.000 0.465 0.617
## ACCOUNT ~
## WTFEAR -0.395 0.099 -3.976 0.000 -0.395 -0.403
## WTGUILT 0.087 0.055 1.592 0.111 0.087 0.164
## WTANGER 0.414 0.111 3.731 0.000 0.414 0.382
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .SELFEFF 0.121 0.021 5.657 0.000 0.121 0.619
## .ACCOUNT 0.224 0.040 5.657 0.000 0.224 0.650
##
## R-Square:
## Estimate
## SELFEFF 0.381
## ACCOUNT 0.350
skip for now
# from https://advstats.psychstat.org/book/path/index.php
library(lavaan)
#usedata('coredata')
# mediation <- '
# + ACCOUNT ~ b*SELFEFF + cp*WTANGER
# + SELFEFF ~ a*WTANGER
# + ab := a*b
# + total := a*b + cp
# + '
#
# mediation.res<-sem(mediation, data=coredata)
# summary(mediation.res)
mymodel <- '
# regressions
ACCOUNT ~ SELFEFF
SELFEFF ~ WTGUILT + WTANGER + WTFEAR
# residual correlations
# y1 ~~ y5
# y2 ~~ y4 + y6
# y3 ~~ y7
# y4 ~~ y8
# y6 ~~ y8
'
fit <- sem(mymodel, data = coredata)
summary(fit, standardized = TRUE)
And try it with all 3
semselfeff <- '
# direct effect
SELFEFF ~ d*ACCOUNT + ep*WTANGER + fp*WTGUILT + gp*WTFEAR
# mediator
ACCOUNT ~ a*WTANGER + b*WTGUILT + c*WTFEAR
# indirect effect of the white racial affects
ad := a*d #WTANGER
bd := b*d #WTGUILT
cd := c*d #WTFEAR
# total effect
total := a*d + b*d + c*d + ep + fp + gp
'
semselfeff.res <- sem(semselfeff, data = coredata)
summary(semselfeff.res)
## lavaan 0.6.17 ended normally after 1 iteration
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 9
##
## Number of observations 64
##
## Model Test User Model:
##
## Test statistic 0.000
## Degrees of freedom 0
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## SELFEFF ~
## ACCOUNT (d) 0.458 0.091 5.061 0.000
## WTANGER (ep) -0.006 0.089 -0.066 0.947
## WTGUILT (fp) -0.040 0.040 -0.999 0.318
## WTFEAR (gp) -0.058 0.080 -0.716 0.474
## ACCOUNT ~
## WTANGER (a) 0.414 0.111 3.731 0.000
## WTGUILT (b) 0.087 0.055 1.592 0.111
## WTFEAR (c) -0.395 0.099 -3.976 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .SELFEFF 0.118 0.021 5.657 0.000
## .ACCOUNT 0.224 0.040 5.657 0.000
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|)
## ad 0.190 0.063 3.003 0.003
## bd 0.040 0.026 1.519 0.129
## cd -0.181 0.058 -3.127 0.002
## total -0.055 0.129 -0.425 0.670
selfeffpath <- lavaanPlot(model = semselfeff.res, node_options = list(shape = "box", fontname = "Helvetica"), edge_options = list(color = "grey"), coefs = TRUE,covs = TRUE, stars = c("regress"))
selfeffpath
save_png(selfeffpath, "fig-selfeffpath.png")
model_performance(semselfeff.res)
Nope, it’s a just-identified model.
But, some remarks on what it showed: WTANGER and WTGUILT totally not showing direct effects on SELFEFF. But they DO show direct effects on ACCOUNT through the original proposed model (per regressions). WTFEAR is the only one that has a valid direct effect, and it’s smaller in this model (-0.128, compared to the -0.24 we were seeing with the ACCOUNT DV models). What this does seem to emphasize though: The three white racial affects have a DIRECT EFFECT on ACCOUNT.
So my thinking was, these racialized emotions affect people’s ability to feel like they have self-efficacy around antiracism, but really they are affecting people’s ability to even feel ACCOUNTABLE for antiracism.
WTFEAR makes your accountability go DOWN.
OMG THAT SEEMS HUGE?!??
library(readr)
if (counselingprofessionalsonly == TRUE) {
write_csv(cleandata, "data\\output-counselors-cleandata.csv")
write_csv(coredata, "data\\output-counselors-coredata.csv")
}
write_csv(cleandata, "data\\output-cleandata.csv")
write_csv(coredata, "data\\output-coredata.csv")
write_csv(pcrw_subset_recoded, "data\\pcrw.csv")
write_csv(justvars, "data\\justvars.csv")
#library(webshot)
flextable(lm(ACCOUNT ~ WTFEAR + WTGUILT + WTANGER + SELFEFF, data = coredata)) %>%
set_caption(caption = "Table 1") %>%
save_as_image("tmp1.png")
flextable(lm(ACCOUNT ~ WTFEAR + WTGUILT + WTANGER + SELFEFF, data = subset(coredata, extremelyinfluential == FALSE))) %>%
set_caption(caption = "Table 2") %>% save_as_image("tmp2.png")
knitr::include_graphics(c("tmp1.png", "tmp2.png"))
library(stargazer)
# this library seems to suck
modelwithextreme = lm(ACCOUNT ~ WTFEAR + WTGUILT + WTANGER + SELFEFF, data = coredata)
modelwithoutextreme = lm(ACCOUNT ~ WTFEAR + WTGUILT + WTANGER + SELFEFF, data = subset(coredata, extremelyinfluential == FALSE))
#stargazer(modelwithextreme, modelwithoutextreme, title="Regression Results", align=TRUE,type="html")
stargazer(modelwithextreme, modelwithoutextreme, title="Regression Results")