First, I filter out the ASU exclusions.
data <- data %>% filter(exclude != 1 | is.na(exclude))
ASUExclusions <- nrow(data)
Then I filter out the JJ exclusions.
data <- data %>%
  filter(!str_detect(PCODE, "EXCLUDE")) %>%
  filter(ExcludeSurveyData == "No" | is.na(ExcludeSurveyData))
JJExclusions <- nrow(data)
Then I filter out the JJ coding exclusions.
data <- data %>% filter(ExcludeVideoData == "No" | is.na(ExcludeVideoData))
JJCodingExclusions <- nrow(data)
Then I filter out the ASU coding exclusions.
data <- data %>% filter(videoCoded == 1 | is.na(videoCoded))
ASUCodingExclusions <- nrow(data)
After processing all exclusions, the final sample is 223.
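The running row counts saved after each filter can be turned into a small attrition table. The sketch below is illustrative only: it uses base R and a made-up toy data frame (`toy` and its values are assumptions, not the study data), and it reproduces just two of the four filters to show the pattern.

```r
# Toy data standing in for the real file; all values are invented for illustration.
toy <- data.frame(
  exclude    = c(0, 1, NA, 0, 0),
  videoCoded = c(1, 1, 1, 0, NA)
)

n_start <- nrow(toy)

# Step 1: drop rows flagged exclude == 1, keeping unflagged (NA) rows.
toy <- toy[toy$exclude != 1 | is.na(toy$exclude), , drop = FALSE]
n_after_exclude <- nrow(toy)

# Step 2: keep rows whose video was coded (1) or whose flag is missing (NA).
toy <- toy[toy$videoCoded == 1 | is.na(toy$videoCoded), , drop = FALSE]
n_after_coded <- nrow(toy)

# One row per step: how many rows remain and how many that step removed.
attrition <- data.frame(
  step      = c("start", "exclude flag", "video not coded"),
  remaining = c(n_start, n_after_exclude, n_after_coded)
)
attrition$removed <- c(0, -diff(attrition$remaining))

print(attrition)
```

Keeping the counts in one table makes it easier to check that the steps account for every dropped row.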
The initial preference data was MESSY. Some reminders:
Coders could select multiple initial preferences, including:
Our two datasets handled missing values differently.
The data was initially split into eight individual dummy-coded
columns (one for each potential preference).
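In the code below, I do the following: replace the missing-value codes (-99, 99, and NA) with 0, recode stray text entries (e.g. "Emily: 1", "Reject", "Not Visible") as numbers, convert each column to numeric, and combine the eight columns into a single comma-separated column.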
#### Function to clean a column by replacing -99, -99.0, 99, 99.0, and NA values with 0
convertToZero <- function(column) {
ifelse(column %in% c("-99", "-99.0", "99", "99.0") | is.na(column), 0, column)
}
#### Clean Initial_Pref_1 Column ####
data$Initial_Pref_1 <- convertToZero(data$Initial_Pref_1)
data <- data %>% mutate(Initial_Pref_1 = ifelse(Initial_Pref_1 == "Emily: 1", "1", Initial_Pref_1))
data <- data %>% mutate(Initial_Pref_1 = as.numeric(Initial_Pref_1))
#### Clean Initial_Pref_2 Column ####
data$Initial_Pref_2 <- convertToZero(data$Initial_Pref_2)
data <- data %>% mutate(Initial_Pref_2 = as.numeric(Initial_Pref_2))
#### Clean Initial_Pref_3 Column ####
data$Initial_Pref_3 <- convertToZero(data$Initial_Pref_3)
data <- data %>% mutate(Initial_Pref_3 = as.numeric(Initial_Pref_3))
#### Clean Initial_Pref_4 Column ####
data$Initial_Pref_4 <- convertToZero(data$Initial_Pref_4)
data <- data %>% mutate(Initial_Pref_4 = ifelse(Initial_Pref_4 == "Emily: 4", "4", Initial_Pref_4))
data <- data %>% mutate(Initial_Pref_4 = as.numeric(Initial_Pref_4))
#### Clean Initial_Pref_5 Column ####
data$Initial_Pref_5 <- convertToZero(data$Initial_Pref_5)
data <- data %>% mutate(Initial_Pref_5 = as.numeric(Initial_Pref_5))
#### Clean Initial_Pref_6 Column ####
data$Initial_Pref_6 <- convertToZero(data$Initial_Pref_6)
data <- data %>% mutate(Initial_Pref_6 = as.numeric(Initial_Pref_6))
#### Clean Initial_Pref_7 Column ####
data$Initial_Pref_7 <- convertToZero(data$Initial_Pref_7)
#### Clean Initial_Pref_8 Column ####
data$Initial_Pref_8 <- convertToZero(data$Initial_Pref_8)
#### Convert text to numeric ####
data <- data %>% mutate(
  Initial_Pref_7 = ifelse(Initial_Pref_7 == "Reject", 7,
                          ifelse(Initial_Pref_7 == "0", 0, NA)),
  Initial_Pref_8 = ifelse(Initial_Pref_8 == "Not Visible", 8,
                          ifelse(Initial_Pref_8 == "0", 0, NA))
)
## Combine into a single column with comma separators
data <- data %>% mutate(
  Combined_Initial_Pref = apply(select(., starts_with("Initial_Pref_")), 1, function(x) {
    paste(sort(na.omit(as.character(x[x != 0]))), collapse = ",")
  })
)
## Rows with no initial preference at all become NA instead of ""
data <- data %>%
  mutate(
    Combined_Initial_Pref = ifelse(Combined_Initial_Pref == "", NA, Combined_Initial_Pref)
  )
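To make the cleaning steps concrete, here is a minimal end-to-end sketch on a toy data frame. It reuses convertToZero as defined above, but everything else is an assumption for illustration: the toy values are invented, only three of the eight columns are shown, and the helper `combine_row` is a hypothetical name. It is written in base R so it runs without extra packages.

```r
# Same helper as in the cleaning code: missing-value codes and NA become 0.
convertToZero <- function(column) {
  ifelse(column %in% c("-99", "-99.0", "99", "99.0") | is.na(column), 0, column)
}

# Toy data (invented); three of the eight dummy-coded columns.
toy <- data.frame(
  Initial_Pref_1 = c("1", "-99", "Emily: 1", NA),
  Initial_Pref_2 = c("-99.0", "2", "2", "99"),
  Initial_Pref_7 = c("Reject", "-99", NA, "-99"),
  stringsAsFactors = FALSE
)

pref_cols <- grep("^Initial_Pref_", names(toy), value = TRUE)

# Apply the same cleaner to every preference column in one pass.
toy[pref_cols] <- lapply(toy[pref_cols], convertToZero)

# Recode the stray text entries, then convert everything to numeric.
toy$Initial_Pref_1[toy$Initial_Pref_1 == "Emily: 1"] <- "1"
toy$Initial_Pref_7[toy$Initial_Pref_7 == "Reject"] <- "7"
toy[pref_cols] <- lapply(toy[pref_cols], as.numeric)

# Hypothetical helper: keep the non-zero codes in a row and join them with commas.
combine_row <- function(x) {
  kept <- x[!is.na(x) & x != 0]
  if (length(kept) == 0) NA_character_ else paste(sort(kept), collapse = ",")
}

toy$Combined_Initial_Pref <- apply(toy[pref_cols], 1, combine_row)
print(toy)
```

Looping over the column names with lapply avoids repeating the same two lines for each of the eight columns, at the cost of handling the column-specific text recodes separately.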
There were several ID variables, including decision, decision2, and Final_ID. For JJ at least, the resolved final ID coder data was in the Final_ID column. We know, however, that there were some admin/coder discrepancies, so the numbers below may be subject to change.
The code below standardizes the Final_ID column, ensuring consistent numeric formatting, removing invalid entries, and preparing it for analysis.
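This includes stripping a trailing ".0" from whole-number strings, recoding "Reject" as 7, setting the one unresolved coder disagreement ("Coder A: 3 | Coder B: 6") to NA, and converting the column to numeric.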
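## Strip a trailing ".0" from whole-number strings (e.g. "3.0" becomes "3")
data <- data %>%
  mutate(
    Final_ID = ifelse(grepl("^\\d+\\.0$", Final_ID), gsub("\\.0$", "", Final_ID), Final_ID)
  )
## Recode a rejection as option 7
data <- data %>%
  mutate(
    Final_ID = ifelse(Final_ID == "Reject", 7, Final_ID)
  )
## Set the unresolved coder disagreement to NA
data <- data %>%
  mutate(
    Final_ID = ifelse(Final_ID == "Coder A: 3 | Coder B: 6", NA, Final_ID)
  )
## Convert to numeric now that only digit strings and NA remain
data$Final_ID <- as.numeric(data$Final_ID)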
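The same standardization can be sketched as a single function and checked on a toy vector. This is an illustration under stated assumptions, not the code used for the results: `standardize_final_id` is a hypothetical name, the input values are invented, and the rule that any remaining non-digit entry becomes NA is a generalization of the one specific string handled above.

```r
# Hypothetical helper mirroring the steps applied to Final_ID.
standardize_final_id <- function(x) {
  x <- as.character(x)

  # "3.0" -> "3": only touch strings that are digits followed by ".0".
  float_like <- grepl("^\\d+\\.0$", x)
  x[float_like] <- sub("\\.0$", "", x[float_like])

  # A rejection is stored as option 7.
  x[x %in% "Reject"] <- "7"

  # Assumption: anything that is still not a plain digit string is treated as
  # unresolved and set to NA (the original code targets one specific string).
  x[!is.na(x) & !grepl("^\\d+$", x)] <- NA

  as.numeric(x)
}

# Toy input (invented values).
raw_ids <- c("3.0", "Reject", "Coder A: 3 | Coder B: 6", "5", NA)
clean_ids <- standardize_final_id(raw_ids)
print(clean_ids)
```

Setting unresolved entries to NA before calling as.numeric means the conversion does not have to rely on coercion warnings to flag them.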
I created several indicator variables to track patterns in decision-making:
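Consistency Between Initial and Final Choices: Match and Mismatch.
Suspect Inclusion: Suspect_In_Initial_Pref and Suspect_In_Final_ID.
Rejection Inclusion: Rejection_In_Initial_Pref and Rejection_In_Final_ID.
Changes Over Time: counts of suspects and rejections that were dropped or added between the initial preferences and the final ID.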
## Match: Dummy variable for whether the final ID was present in the initial preferences
data <- data %>%
  mutate(
    Match = str_detect(Combined_Initial_Pref, as.character(Final_ID))
  )
## Mismatch: The inverse of the one above (final ID not present in initial preferences)
data <- data %>%
  mutate(
    Mismatch = !str_detect(Combined_Initial_Pref, as.character(Final_ID))
  )
#### Extra Ones Below ####
## Dummy variable for whether the suspect is present in the initial preferences
data <- data %>%
  mutate(
    Suspect_In_Initial_Pref = str_detect(Combined_Initial_Pref, paste0("\\b", Order, "\\b"))
  )
## Dummy variable for whether the suspect is the final ID
data <- data %>%
  mutate(
    Suspect_In_Final_ID = Order == Final_ID
  )
## Dummy variable for whether a rejection is present in the initial preferences
data <- data %>%
  mutate(
    Rejection_In_Initial_Pref = str_detect(Combined_Initial_Pref, "\\b7\\b")
  )
## Suspect dropped (number of people who had the suspect in initial preference but not in final ID)
suspect_dropped_count <- data %>%
  summarize(Count = sum(Suspect_In_Initial_Pref & !Suspect_In_Final_ID, na.rm = TRUE))
## Suspect added (number of people who DID NOT have the suspect in initial preference but had it in final ID)
suspect_added_count <- data %>%
  summarize(Count = sum(!Suspect_In_Initial_Pref & Suspect_In_Final_ID, na.rm = TRUE))
## Rejection dropped (number of people who had rejection in initial preference but not in final ID)
rejection_dropped_count <- data %>%
  summarize(Count = sum(str_detect(Combined_Initial_Pref, "\\b7\\b") & Final_ID != 7, na.rm = TRUE))
## Rejection added (number of people who DID NOT have rejection in initial preference but had it in final ID)
rejection_added_count <- data %>%
  summarize(Count = sum(!str_detect(Combined_Initial_Pref, "\\b7\\b") & Final_ID == 7, na.rm = TRUE))
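Because Match and Mismatch are both NA whenever the combined initial preference or the final ID is missing, such rows are counted in neither group, so the two counts need not sum to the sample size. The sketch below illustrates this on invented values. It swaps the regex match for a token-based check (split on commas, then test membership), which is an alternative technique rather than the code above; `final_in_initial` is a hypothetical helper name.

```r
# Hypothetical helper: is the final ID one of the comma-separated initial codes?
# Returns NA when either input is missing, mirroring the NA behaviour of the
# regex-based indicators.
final_in_initial <- function(combined, final_id) {
  if (is.na(combined) || is.na(final_id)) return(NA)
  as.character(final_id) %in% strsplit(combined, ",", fixed = TRUE)[[1]]
}

# Toy inputs (invented values).
combined <- c("1,3", "2", NA, "4,7")
final_id <- c(3, 5, 2, NA)

match_flag <- unname(mapply(final_in_initial, combined, final_id))

counts <- c(
  match    = sum(match_flag, na.rm = TRUE),
  mismatch = sum(!match_flag, na.rm = TRUE),
  missing  = sum(is.na(match_flag))
)
print(counts)
```

Reporting the missing count alongside the other two makes it explicit how many rows fall outside both categories.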
| Indicator Variable | Number of Observations |
|---|---|
| Match | 204 |
| Mismatch | 17 |

Note: Match refers to consistency between initial and final preferences.

| Indicator Variable | Number of Observations |
|---|---|
| Suspect_In_Initial_Pref | 88 |
| Suspect_In_Final_ID | 73 |

| Indicator Variable | Number of Observations |
|---|---|
| Rejection_In_Initial_Pref | 23 |
| Rejection_In_Final_ID | 21 |

| Indicator Variable | Number of Observations |
|---|---|
| Suspect_Dropped | 19 |
| Suspect_Added | 4 |

Note: Dropped indicates that suspect was in initial preference but not final ID. Added indicates suspect was the final ID but not among initial preferences.

| Indicator Variable | Number of Observations |
|---|---|
| Rejection_Dropped | 7 |
| Rejection_Added | 5 |

Note: Dropped indicates that rejection was in initial preference but not final ID. Added indicates rejection was in final ID but not among initial preferences.
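The dropped and added counts are the off-diagonal cells of a cross-tabulation of the initial and final indicators. A minimal sketch on invented logical vectors (the names `suspect_initial` and `suspect_final` and all values are assumptions for illustration):

```r
# Toy indicators (invented); NA marks a row with no usable initial preference.
suspect_initial <- c(TRUE, TRUE, FALSE, FALSE, TRUE, NA)
suspect_final   <- c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE)

# Same logic as the summarize() calls: NA comparisons are dropped by na.rm.
dropped <- sum(suspect_initial & !suspect_final, na.rm = TRUE)
added   <- sum(!suspect_initial & suspect_final, na.rm = TRUE)

# The full cross-tab shows all four transitions plus the rows with missing data.
transitions <- table(
  initial = suspect_initial,
  final   = suspect_final,
  useNA   = "ifany"
)
print(transitions)
```

In this toy example the row with a missing initial indicator contributes to neither count, which the cross-tab makes visible as a separate NA row.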