Summary of Findings

Scores on the Thurstone Scales:

FORA is a continuum from extreme openness (M = 1) to extreme selectivity (M = 11) with regard to the fairness of specialized, advanced art programs.

EHDP is a continuum from extreme focus on the happiness of and benefit to the individual (M = 1) to extreme focus on the benefit to society (M = 11) as the justification for specialized, advanced art programs.
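For illustration, a respondent's score on either scale is the mean of the scale values of the statements they endorse (a minimal sketch with a hypothetical response string; item codes such as 10.2 share the scale value 10, which is why the scoring code later uses trunc()):

# Hypothetical endorsed-items string in the survey's comma-separated format
endorsed <- "2.1,5,9,10.2"
scale_values <- trunc(as.numeric(strsplit(endorsed, ",")[[1]]))
mean(scale_values)  # 6.5, just above the scale midpoint of 6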

Grouping variables include interests in and experience with each type of specialized, advanced art program, race/ethnicity variables, grade levels taught, locale and poverty level of school, etc.

1. Description of Respondents’ Experiences

The most common experiences reported are Art Club and Awards. The least common are IB, Magnet School, and None.

2. Central Views are Flexible Views

a. A reasonably strong association (R-squared = .32), with distance from the EHDP mean a significant predictor of the number of EHDP items endorsed. The more extreme a respondent’s views about EHDP, the fewer items they agree with; conversely, the more central their views, the more items they select on the scale.

b. There is also a significant but small negative relationship between extremity of views on the EHDP variable and the breadth of statements endorsed.

This means that average scores on EHDP come not from selecting only a few statements in the middle, but from selecting a balance of statements on both sides of the mean.

With more data, this pattern may support the hypothesis that those in the middle have flexible views, endorsing statements on both sides of the continuum. By definition, respondents at the extremes of the EHDP score cannot have endorsed many items, since endorsing few items is the only way to keep a score at an extreme; it is notable, though, that narrow endorsement is not common in the middle.

3. Interpretation of Principal Components

Principal components analysis was conducted to abstract from the scaled continuous and ordinal variables.

PC1: Outsider skeptic

Eigenvalue of 2.2; explains a substantial portion of the variance. Much lower agreeableness and flexibility, and the second-lowest feedback rating. Experience with, and especially interest in, fewer types of programs than others. Fewer years of teaching than average, but not the fewest. The most rural schools; average school poverty.

PC2: Open, positive, & fresh

Eigenvalue of 1.4; explains a substantial amount of the variance, but less than PC1. Highest agreeableness, flexibility, and feedback ratings. Interested in more types of programs than the few they have experience with (the fewest of all groups). Newest to teaching art. A little below average rurality and the lowest school poverty level (suburban).

PC3: Cautious and mid-career, at rural and the poorest schools

Eigenvalue of 1.2; explains a substantial amount of the variance, but less than PC1 and PC2. Above-average agreeableness and flexibility; the lowest feedback ratings. Experienced with an average number of program types. Interested in the greatest number of program types. Second-fewest years of teaching among the groups. At the poorest and second-most rural schools.

PC4: Focused, experienced, more urban and less poor schools

Eigenvalue of 0.9, just below the threshold for usefulness in abstraction. About average agreeableness, flexibility, and feedback ratings. Interested in fewer program types than they have experience with. Greatest number of years of teaching. Working at the most urban schools with above-average poverty.

5. FORA is predicted by PCs 1-3.

FORA decreases by .45 units with a one-unit increase on PC1 (outsider) and increases by .43 units with a one-unit increase on PC2 (early and open), both significant (p < .001). A considerable amount of the variation in FORA score is explained by the PC predictors (adjusted R-squared of 0.56). A respondent at the mean of both principal components had a FORA score of 4.73 (p < .001), a little toward the openness-as-fairness side.

PC1 ~ lower FORA

PC2 ~ higher FORA

PC3 ~ a bit higher FORA
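The model behind these estimates is not reproduced in the code below; a minimal sketch, assuming the PC scores have been attached to the cleaned data as columns PC1-PC3 (as is done in the “Do grouping variables predict PCs?” section):

model_fora <- lm(FORA_M ~ PC1 + PC2 + PC3, data = data_clean)
summary(model_fora)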

# Load necessary libraries
library(tidyr)
library(dplyr)
library(psych)
library(gridExtra)
library(ggplot2)
library(caret)
library(car)
library(readr)
library(quantreg)
library(tibble)
library(plotly)
library(reshape2)
library(clipr)
library(lavaan)
library(semTools)

Set the date for use in writing filenames

# Get the current date in the format YYYYMMDD
date_suffix <- format(Sys.Date(), "%Y%m%d")

Load data and trim columns

data <- read.csv("asap-aq-responses.csv", colClasses = c(interests = "character", experience = "character"))


data <- data %>%
  filter(Q_RelevantIDDuplicateScore < 80,
         interests != "")
  
#still 87 responses okay go with it.


#remove unused columns from survey export
data[, 1:22] <- NULL


data[, 17:20] <- NULL

#16 useful variables remain

Create and transform useful variables

#binary variables for eligibility categories that are not mutually exclusive

data$preservice <- ifelse(grepl("1", data$eligible), 1, 0)
data$practicing <- ifelse(grepl("2", data$eligible), 1, 0)
data$researcher <- ifelse(grepl("3", data$eligible), 1, 0)
data$eligible <- NULL

#binary variables for race and ethnicity

data$raceWhite <- ifelse(grepl("1", data$race), 1, 0)
data$raceBlack <- ifelse(grepl("2", data$race), 1, 0)
data$raceNative <- ifelse(grepl("3", data$race), 1, 0)
data$raceAsian <- ifelse(grepl("4", data$race), 1, 0)
data$racePacific <- ifelse(grepl("5", data$race), 1, 0)
data$raceOther <- ifelse(grepl("6", data$race), 1, 0)
data$raceDecline <- ifelse(grepl("7", data$race), 1, 0)
data$race <- NULL
data$ethnHispanic <- ifelse(grepl("1", data$ethnicity), 1, 0)
data$ethnicity <- NULL

#binary variable for those with graduate degree
data$grad_degree <- ifelse(is.na(data$education), 0, ifelse(data$education > 5, 1, 0))

#binary variable for practicing teachers
data$teaching <- ifelse(is.na(data$school_type), 0, 1)

#binary variable for above-the-median years of experience
data$teach_experience <- ifelse(data$years > median(data$years, na.rm = TRUE), 1, 0)

#minimum grade level currently taught
data$grade_min <- sapply(data$grade_levels, function(x) {
  # Split the string into a list of character numbers
  l <- strsplit(x, ",")[[1]]
  # Check if the list is empty or contains only NA values
  if (all(is.na(l)) || length(l) == 0) {
    return(NA)  # Return NA if no valid numbers
  } else {
    return(min(as.numeric(l), na.rm = TRUE))
  }
})

#maximum grade level currently taught
data$grade_max <- sapply(data$grade_levels, function(x) {
  l <- strsplit(x, ",")[[1]]
  # Check if the list is empty or contains only NA values
  if (all(is.na(l)) || length(l) == 0) {
    return(NA)  # Return NA if no valid numbers
  } else {
    return(max(as.numeric(l), na.rm = TRUE))
  }
})

#grade_range
data$grade_range <- sapply(data$grade_levels, function(x) {
  l <- strsplit(x, ",")[[1]]
  # Check if the list is empty or contains only NA values
  if (all(is.na(l)) || length(l) == 0) {
    return(NA)  # Return NA if no valid numbers
  } else {
    return(diff(range(as.numeric(l), na.rm = TRUE)))
  }
})
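The three blocks above repeat the same split-and-guard parsing; a compact helper could factor this out (a sketch only, not used in the code below):

# Helper: parse a comma-separated grade string and apply a summary function
grade_stat <- function(x, f) {
  l <- as.numeric(strsplit(x, ",")[[1]])
  if (length(l) == 0 || all(is.na(l))) NA else f(l, na.rm = TRUE)
}
# e.g., data$grade_min <- sapply(data$grade_levels, grade_stat, f = min)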


#Create a high school code and an elementary school code
data$hs_grades <- ifelse(is.na(data$grade_min), 0, ifelse(data$grade_min > 8, 1, 0))

data$es_grades <- ifelse(is.na(data$grade_min), 0, ifelse(data$grade_max < 7, 1, 0))


# Function to calculate the mean of the comma-separated numbers in each record of grade_levels
calculate_mean <- function(x) {
  num_values <- as.numeric(unlist(strsplit(x, ",")))  # Split, unlist, and convert to numeric
  mean(num_values)  # Calculate mean
}

# Apply the function to each element of data$grade_levels
data$grade_levels_M <- ifelse(data$grade_levels == "", NA, sapply(data$grade_levels, calculate_mean))

#remove the grade levels variable
data$grade_levels <- NULL

#simplify school type into four categories
data$school_type_cat <- ifelse(is.na(data$school_type), "None",
                               ifelse(data$school_type < 10, "Public",
                                      ifelse(data$school_type == 10, "BIE",
                                             ifelse(data$school_type < 30, "Private",
                                                    ifelse(data$school_type == 30, "MusOrg", data$school_type)))))


#binary variable for school poverty at 50% or greater as reported
data$school_poverty_high <- ifelse(is.na(data$school_poverty), 0, ifelse(data$school_poverty > 2, 1, 0))

#binary variable for publicly funded schools including BIE-funded
data$school_pub <- ifelse(is.na(data$school_type), 0, ifelse(data$school_type < 20, 1, 0))

#35 variables now

Interests and Experience

Number of interests and number of experience categories are also of interest. Create new variables for each and one for the difference between interests and experience categories.

data$inter_n <- sapply(data$interests, function(x) {
  l <- strsplit(x, ",")[[1]]
  length(l)
})

data$exper_n <- sapply(data$experience, function(x) {
  l <- strsplit(x, ",")[[1]]
  length(l)
})

data$IEdiff <- data$inter_n - data$exper_n

Scores

Create binary variables for the presence of each choice in EHDP and FORA scores.

all_items <-  unlist(strsplit("1.1,1.2,2.1,2.2,3,4,5,6,7,8,9,10.1,10.2,11.1,11.2", ","))
 
# Create binary variables for each unique value in EHDP
for (val in all_items) {
  data[[paste0("EHDP_",gsub("\\.", "_", val))]] <- sapply(data$EHDP, function(x) as.integer(val %in% strsplit(x, ",")[[1]]))
}

# Create binary variables for each unique value in FORA
for (val in all_items) {
  data[[paste0("FORA_",gsub("\\.", "_", val))]] <- sapply(data$FORA, function(x) as.integer(val %in% strsplit(x, ",")[[1]]))
}

Next, find the n, mean, and SD of both score variables for each respondent.

#EHDP Mean
data$EHDP_M <- sapply(data$EHDP, function(x) {
  # Split the string into a list of character numbers
  l <- strsplit(x, ",")[[1]]
  # Convert the character list to numeric, trim decimals, and calculate the mean
  mean(trunc(as.numeric(l)), na.rm = TRUE)
})

#FORA Mean
data$FORA_M <- sapply(data$FORA, function(x) {
  l <- strsplit(x, ",")[[1]]
  mean(trunc(as.numeric(l)), na.rm = TRUE)
})

#EHDP n
data$EHDP_n <- sapply(data$EHDP, function(x) {
  # Split the string into a list of character numbers
  l <- strsplit(x, ",")[[1]]
  # Count the n of items in the list
  length(l)
})

#FORA n
data$FORA_n <- sapply(data$FORA, function(x) {
  l <- strsplit(x, ",")[[1]]
  length(l)
})

#EHDP standard deviation
data$EHDP_SD <- sapply(data$EHDP, function(x) {
  # Split the string into a list of character numbers
  l <- strsplit(x, ",")[[1]]
  # Convert the character list to numeric, trim decimals, and calculate the SD
  sd(trunc(as.numeric(l)), na.rm = TRUE)
})

#FORA standard deviation
data$FORA_SD <- sapply(data$FORA, function(x) {
  l <- strsplit(x, ",")[[1]]
  sd(trunc(as.numeric(l)), na.rm = TRUE)
})

#Okay to remove EHDP and FORA variables which were comma-separated lists of type char
data[, 1:2] <- NULL

#standardize scores
data$EHDP_M_z <- scale(data$EHDP_M)
data$FORA_M_z <- scale(data$FORA_M)

#combine flexibility in thinking about both constructs into one variable
#add FORA_SD and EHDP_SD
data$flexibility <- data$FORA_SD + data$EHDP_SD

#combine n for both constructs into one variable -- agreeableness
data$agreeableness <- data$FORA_n + data$EHDP_n

Interests and Experience Factors

Binary variables for experience and interest are:

  1. art club

  2. honor society

  3. award

  4. AP

  5. IB

  6. GT

  7. magnet

  8. postsec credit

  9. mentorship/internship

  10. other
    0 = none of the above.

#here is a list of all the interests and experience codes
all_items <-  c(0,1,2,3,4,5,6,7,8,9,10)
 
# Create binary variables for each unique value in interests
for (val in all_items) {
  data[[paste0("inter_", val)]] <- sapply(data$interests, function(x) as.integer(val %in% strsplit(x, ",")[[1]]))
}

# Create binary variables for each unique value in experience
for (val in all_items) {
  data[[paste0("exper_", val)]] <- sapply(data$experience, function(x) as.integer(val %in% strsplit(x, ",")[[1]]))
}

Histogram of Scores

hist(data$EHDP_M)

hist(data$FORA_M)

The distribution of EHDP scores is leptokurtic; the distribution of FORA scores is platykurtic (flatter) by comparison.

Describe score variables

describe(data$EHDP_M)
##    vars  n mean   sd median trimmed  mad  min  max range skew kurtosis   se
## X1    1 87 6.41 0.83   6.36    6.39 0.46 3.33 10.5  7.17 0.86     7.21 0.09
describe(data$FORA_M)
##    vars  n mean   sd median trimmed  mad  min  max range  skew kurtosis   se
## X1    1 87 4.75 1.15   4.92    4.83 1.19 1.33 7.08  5.75 -0.76     0.65 0.12

Histogram of Score SD

hist(data$EHDP_SD)

hist(data$FORA_SD)

List potential grouping variables

ordinal_variables <- c('education','years', 'school_poverty', 'locale')
nominal_variables <-c('school_type_cat')
binary_group_variables <- c('preservice',
                       'practicing',
                       'researcher',
                       'raceWhite',
                       'raceBlack',
                       'raceNative',
                       'raceAsian',
                       'racePacific',
                       'raceOther',
                       'ethnHispanic',
                       'grad_degree',
                       'teaching',
                       'teach_experience',
                       'hs_grades',
                       'es_grades',
                       'school_poverty_high',
                       'school_pub'
                       )
binary_exper_variables <- grep("^exper_[0-9]", colnames(data), value = TRUE)
continuous_variables <-c('grade_range',
                         'grade_levels_M',
                         'inter_n',
                         'exper_n',
                         'IEdiff',
                         'EHDP_M',
                         'FORA_M',
                         'flexibility',
                         'agreeableness',
                         'feedback_1'
                         )

Count the number of each experience variable.

# Use lapply to sum each column specified in binary_exper_variables
sums_list <- lapply(binary_exper_variables, function(var) {
  sum(data[[var]], na.rm = TRUE)
})

# Combine the results into a dataframe
sums_df <- data.frame(
  Variable = binary_exper_variables,
  Count = unlist(sums_list)
)

names <- c("None", "Art Club", "Honor Society", "Awards", "AP", "IB","Gifted & Talented","Magnet School", "Dual Credit", "Mentorship/Internship", "Other")


sums_df$Label <- names
sums_df$Proportion <- round(sums_df$Count / nrow(data),2) 
sums_df <- sums_df %>% arrange(desc(Proportion))
# Display the resulting dataframe
print(sums_df)
##    Variable Count                 Label Proportion
## 1   exper_1    65              Art Club       0.75
## 2   exper_3    54                Awards       0.62
## 3   exper_9    38 Mentorship/Internship       0.44
## 4   exper_4    37                    AP       0.43
## 5   exper_2    31         Honor Society       0.36
## 6   exper_6    30     Gifted & Talented       0.34
## 7   exper_8    22           Dual Credit       0.25
## 8   exper_5    10                    IB       0.11
## 9   exper_7     8         Magnet School       0.09
## 10  exper_0     6                  None       0.07
## 11 exper_10     5                 Other       0.06
write.csv(sums_df, paste0("Experience Counts ", date_suffix, ".csv"))

Finding 1: Descriptive

The table above counts the frequency of each type of experience selected by respondents. The greatest proportions are Art Club and Awards (> 50%), followed by Mentorship and AP (> 40%), then Honor Society and G&T (> 30%).

Visualize FORA by Grouping Variables

First the experience groups.

# List of grouping variables
group_vars <- binary_exper_variables

# Initialize an empty list to store plots
plot_list <- list()

# Loop through each grouping variable and create plots
for (i in seq_along(group_vars)) {
  group_var <- group_vars[i]
  
  # Create the plot for each grouping variable
  p <- ggplot(data, aes(x = factor(.data[[group_var]]), y = FORA_M)) +
    geom_boxplot() +
    labs(x = group_var,
         y = "FORA") +
    theme_minimal()
  
  # Store the plot in the list
  plot_list[[i]] <- p
}
# Arrange all plots in a grid (4 columns)
plot_grid <- grid.arrange(grobs = plot_list, ncol = 4)

Next, the other grouping variables

# List of grouping variables
group_vars <- binary_group_variables

# Initialize an empty list to store plots
plot_list <- list()

# Loop through each grouping variable and create plots
for (i in seq_along(group_vars)) {
  group_var <- group_vars[i]
  
  # Create the plot for each grouping variable
  p <- ggplot(data, aes(x = factor(.data[[group_var]]), y = FORA_M)) +
    geom_boxplot() +
    labs(x = group_var,
         y = "FORA") +
    theme_minimal()
  
  # Store the plot in the list
  plot_list[[i]] <- p
}

# Arrange all plots in a grid (4 columns)
plot_grid <- grid.arrange(grobs = plot_list, ncol = 4)

Visualize EHDP by Group

First the experience groups

# List of grouping variables
group_vars <- binary_exper_variables

# Initialize an empty list to store plots
plot_list <- list()

# Loop through each grouping variable and create plots
for (i in seq_along(group_vars)) {
  group_var <- group_vars[i]
  
  # Create the plot for each grouping variable
  p <- ggplot(data, aes(x = factor(.data[[group_var]]), y = EHDP_M)) +
    geom_boxplot() +
    labs(x = group_var,
         y = "EHDP") +
    theme_minimal()
  
  # Store the plot in the list
  plot_list[[i]] <- p
  
  
}

# Arrange all plots in a grid
plot_grid <- grid.arrange(grobs = plot_list, ncol = 6)

Then the other grouping variables

# List of grouping variables
group_vars <- binary_group_variables

# Initialize an empty list to store plots
plot_list <- list()

# Loop through each grouping variable and create plots
for (i in seq_along(group_vars)) {
  group_var <- group_vars[i]
  
  # Create the plot for each grouping variable
  p <- ggplot(data, aes(x = factor(.data[[group_var]]), y = EHDP_M)) +
    geom_boxplot() +
    labs(x = group_var,
         y = "EHDP") +
    theme_minimal()
  
  # Store the plot in the list
  plot_list[[i]] <- p
}

# Arrange all plots in a grid
plot_grid <- grid.arrange(grobs = plot_list, ncol = 6)

Test whether continuous variables differ between respondents in each group and those not in the group.

Loop through the grouping variables to see which, if any, continuous variables differ significantly by group.

Experience groups

outcomes <- continuous_variables
groups <- binary_exper_variables

# Initialize a list to store results
results_list <- list()

# Loop over each outcome variable
for (outcome in outcomes) {
  
  # Initialize a list to store results for the current outcome
  outcome_results_list <- list()
  
  
  # Loop over each group variable
  for (group in groups) {
    
    # Perform a t-test comparing means of the current outcome variable across the grouping variable
    t_test_result <- t.test(data[[outcome]] ~ data[[group]])

  
    # Create a dataframe with the result for this outcome-group pair
    results <- data.frame(
      Outcome = outcome,
      Group = group,
      P_Value = round(t_test_result$p.value, 3)
      )
  
    # Append the result to the outcome's results list
    outcome_results_list[[group]] <- results
  }

  # Combine all results for this outcome into a single dataframe
  outcome_results_df <- do.call(rbind, outcome_results_list)
    
  # Append the combined results to the main results list
  results_list[[outcome]] <- outcome_results_df
}

# Combine all results into a single data frame
final_results <- do.call(rbind, results_list)

significant_results_experience <- final_results %>%
  filter(P_Value < .05)

print(significant_results_experience)
##                               Outcome    Group P_Value
## grade_range.exper_6       grade_range  exper_6   0.010
## grade_range.exper_9       grade_range  exper_9   0.047
## grade_levels_M.exper_2 grade_levels_M  exper_2   0.001
## grade_levels_M.exper_3 grade_levels_M  exper_3   0.029
## grade_levels_M.exper_4 grade_levels_M  exper_4   0.000
## inter_n.exper_2               inter_n  exper_2   0.017
## inter_n.exper_4               inter_n  exper_4   0.048
## exper_n.exper_0               exper_n  exper_0   0.000
## exper_n.exper_1               exper_n  exper_1   0.001
## exper_n.exper_2               exper_n  exper_2   0.000
## exper_n.exper_3               exper_n  exper_3   0.000
## exper_n.exper_4               exper_n  exper_4   0.000
## exper_n.exper_6               exper_n  exper_6   0.000
## exper_n.exper_7               exper_n  exper_7   0.020
## exper_n.exper_8               exper_n  exper_8   0.001
## exper_n.exper_9               exper_n  exper_9   0.000
## IEdiff.exper_0                 IEdiff  exper_0   0.010
## IEdiff.exper_3                 IEdiff  exper_3   0.003
## IEdiff.exper_4                 IEdiff  exper_4   0.014
## IEdiff.exper_7                 IEdiff  exper_7   0.043
## IEdiff.exper_8                 IEdiff  exper_8   0.009
## IEdiff.exper_9                 IEdiff  exper_9   0.040
## flexibility.exper_10      flexibility exper_10   0.011
## agreeableness.exper_2   agreeableness  exper_2   0.013
## feedback_1.exper_9         feedback_1  exper_9   0.029

Next, other binary grouping variables

outcomes <- continuous_variables
groups <- c('preservice',
           'researcher',
           'raceWhite',
           'raceNative',
           'raceOther',
           'ethnHispanic',
           'grad_degree',
           'hs_grades',
           'es_grades',
           'school_poverty_high',
           'school_pub'
           )
#groups <- binary_group_variables


# Initialize a list to store results
results_list <- list()

# Loop over each outcome variable
for (outcome in outcomes) {
  
  # Initialize a list to store results for the current outcome
  outcome_results_list <- list()
  
  
  # Loop over each group variable
  for (group in groups) {
    
    # Perform a t-test comparing means of the current outcome variable across the grouping variable
    t_test_result <- t.test(data[[outcome]] ~ as.factor(data[[group]]))

  
    # Create a dataframe with the result for this outcome-group pair
    results <- data.frame(
      Outcome = outcome,
      Group = group,
      P_Value = round(t_test_result$p.value, 3)
      )
  
    # Append the result to the outcome's results list
    outcome_results_list[[group]] <- results
  }

  # Combine all results for this outcome into a single dataframe
  outcome_results_df <- do.call(rbind, outcome_results_list)
    
  # Append the combined results to the main results list
  results_list[[outcome]] <- outcome_results_df
}

# Combine all results into a single data frame
final_results <- do.call(rbind, results_list)

significant_results_group <- final_results[final_results$P_Value < 0.05, ]


print(significant_results_group)
##                                  Outcome      Group P_Value
## grade_range.hs_grades        grade_range  hs_grades   0.000
## grade_levels_M.raceWhite  grade_levels_M  raceWhite   0.030
## grade_levels_M.raceNative grade_levels_M raceNative   0.037
## grade_levels_M.hs_grades  grade_levels_M  hs_grades   0.000
## grade_levels_M.es_grades  grade_levels_M  es_grades   0.000
## exper_n.es_grades                exper_n  es_grades   0.011
## IEdiff.raceNative                 IEdiff raceNative   0.026
## FORA_M.preservice                 FORA_M preservice   0.001
## flexibility.raceWhite        flexibility  raceWhite   0.011
## agreeableness.preservice   agreeableness preservice   0.018

Linear models of effect for significant differences

Use this dataframe, filtered by p-value < .05, to examine any linear relationship between each outcome variable and group membership.

# Step 1: Filter the results to keep only rows with p-value < 0.05
significant_results <- rbind(significant_results_experience, significant_results_group)

# Initialize a list to store regression results
regression_results_list <- list()

# Step 2: Perform linear regression for each significant outcome and exper variable combination
for (i in 1:nrow(significant_results)) {
  
  # Extract the outcome variable and exper variable names
  outcome_var <- significant_results$Outcome[i]
  exper_var <- significant_results$Group[i]
  
  # Create a linear model formula dynamically
  formula <- as.formula(paste(outcome_var, "~", exper_var))
  
  # Fit the linear model using lm()
  model <- lm(formula, data = data)
  
  # Store the summary of the model results
  model_summary <- summary(model)
  
  # Extract coefficients, R-squared, and p-value
  coef <- model_summary$coefficients
  r_squared <- model_summary$r.squared
  p_value <- round(coef[2, 4], 3) # p-value for the slope (the group variable)
  
  # Create a dataframe for the regression results
  regression_result <- data.frame(
    Outcome = outcome_var,
    Group = exper_var,
    Coefficient = coef[2, 1], # slope coefficient
    R_Squared = r_squared,
    P_Value = p_value
  )
  
  # Add the result to the list
  regression_results_list[[i]] <- regression_result
}

# Combine all regression results into one dataframe
all_regression_results <- do.call(rbind, regression_results_list)

all_regression_results <- all_regression_results %>%
  arrange(Outcome, desc(R_Squared))

print(all_regression_results)
##           Outcome      Group Coefficient   R_Squared P_Value
## 1          FORA_M preservice   0.9633753 0.031134439   0.102
## 2          IEdiff    exper_3  -1.6919192 0.101365876   0.003
## 3          IEdiff    exper_7  -2.8022152 0.098615434   0.003
## 4          IEdiff    exper_8  -1.7174825 0.083819590   0.007
## 5          IEdiff    exper_0   2.7716049 0.074186310   0.011
## 6          IEdiff    exper_4  -1.3648649 0.068481913   0.014
## 7          IEdiff    exper_9  -1.1654135 0.050253306   0.037
## 8          IEdiff raceNative  -1.2976190 0.008431791   0.398
## 9   agreeableness    exper_2   2.6877880 0.060950714   0.021
## 10  agreeableness preservice   5.6596386 0.051683810   0.034
## 11        exper_n    exper_4   2.3918919 0.431828708   0.000
## 12        exper_n    exper_9   2.1186896 0.341013072   0.000
## 13        exper_n    exper_2   2.1532258 0.328386702   0.000
## 14        exper_n    exper_3   2.0050505 0.292290716   0.000
## 15        exper_n    exper_6   1.9578947 0.267443149   0.000
## 16        exper_n    exper_8   1.6804196 0.164751051   0.000
## 17        exper_n    exper_7   2.4588608 0.155898393   0.000
## 18        exper_n    exper_0  -2.7037037 0.144947640   0.000
## 19        exper_n    exper_1   1.4223776 0.118038192   0.001
## 20        exper_n  es_grades  -1.1546218 0.064726659   0.017
## 21     feedback_1    exper_9 -10.0069686 0.065270744   0.026
## 22    flexibility  raceWhite  -0.5543625 0.070013930   0.013
## 23    flexibility   exper_10   0.5097435 0.019541704   0.197
## 24 grade_levels_M  hs_grades   5.5528205 0.685449277   0.000
## 25 grade_levels_M  es_grades  -5.9377204 0.572440936   0.000
## 26 grade_levels_M    exper_4   2.7946886 0.178256089   0.000
## 27 grade_levels_M    exper_2   2.3974224 0.126258099   0.002
## 28 grade_levels_M    exper_3   1.7676104 0.066679507   0.025
## 29 grade_levels_M  raceWhite   1.9251282 0.054925764   0.043
## 30 grade_levels_M raceNative  -3.5780983 0.045537855   0.066
## 31    grade_range  hs_grades  -3.1555556 0.211875716   0.000
## 32    grade_range    exper_6   2.5298273 0.128512963   0.002
## 33    grade_range    exper_9   1.6774194 0.060491370   0.033
## 34        inter_n    exper_2   1.2759217 0.062064443   0.020
## 35        inter_n    exper_4   1.0270270 0.042852937   0.054

Results of these early linear models reveal little explanatory power for single binary group membership on the outcome variables, suggesting that abstraction with PCA may be worthwhile.

Loop through each outcome variable for each group and compute descriptive statistics for in-group and out-of-group respondents

# Initialize a list to store descriptive statistics results
descriptive_stats_list <- list()

groups <- c(binary_exper_variables, binary_group_variables)

# Outer loop: iterate over each outcome variable
for (outcome in outcomes) {
  
  # Inner loop: iterate over each experience group variable
  for (group in groups) {
    
    # Filter data for group membership = 1 and calculate descriptive statistics
    group_1_stats <- data %>%
      filter(get(group) == 1) %>%
      summarize(
        Outcome = outcome,
        Group_Name = group,
        Group = 1,
        Mean = mean(get(outcome), na.rm = TRUE),
        SD = sd(get(outcome), na.rm = TRUE),
        Median = median(get(outcome), na.rm = TRUE),
        Min = min(get(outcome), na.rm = TRUE),
        Max = max(get(outcome), na.rm = TRUE),
        N = n()
      )
    
    # Filter data for group membership = 0 and calculate descriptive statistics
    group_0_stats <- data %>%
      filter(get(group) == 0) %>%
      summarize(
        Outcome = outcome,
        Group_Name = group,
        Group = 0,
        Mean = mean(get(outcome), na.rm = TRUE),
        SD = sd(get(outcome), na.rm = TRUE),
        Median = median(get(outcome), na.rm = TRUE),
        Min = min(get(outcome), na.rm = TRUE),
        Max = max(get(outcome), na.rm = TRUE),
        N = n()
      )
    
    # Combine statistics for both groups
    combined_stats <- bind_rows(group_1_stats, group_0_stats)
    
    # Append the result to the list
    descriptive_stats_list[[length(descriptive_stats_list) + 1]] <- combined_stats
  }
}

# Combine all descriptive statistics results into one dataframe
final_descriptive_stats <- bind_rows(descriptive_stats_list)

#write this to csv for later reference

write_csv(final_descriptive_stats, paste0("Describe by Group ", date_suffix,".csv"))

Scores-to-Scores: EHDP-to-FORA Association

Plot

data_clean <- data %>%
  filter(EHDP_M > 4, EHDP_M < 8)
model_scores <- lm(EHDP_M ~ FORA_M, data = data_clean)
r_squared_scores <- summary(model_scores)$r.squared

ggplot(data_clean, aes(x = FORA_M, y = EHDP_M)) +
  geom_point(size = 3, alpha = 0.7) +  # Scatter plot points
  geom_smooth(method = "lm", se = FALSE) +  # Add trend lines
  labs(title = "Scatterplot of Scores with Trend Lines",
       x = "FORA_M",
       y = "EHDP_M") +
  theme_minimal() +
  annotate("text", x = max(data_clean$FORA_M) * 0.7, y = max(data_clean$EHDP_M), 
           label = paste("R²:", round(r_squared_scores, 3)), 
           color = "blue", hjust = 0)
## `geom_smooth()` using formula = 'y ~ x'

Consider each outlier and whether or not there is justification to remove it from the model.

Outliers are EHDP_M < 4 or EHDP_M > 8.

Even after cleaning out these outliers, the two variables have minimal explanatory value for each other.

Association of scores to scores

summary(model_scores)
## 
## Call:
## lm(formula = EHDP_M ~ FORA_M, data = data_clean)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.42042 -0.22931  0.02374  0.33957  1.18654 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  6.58265    0.27424  24.003   <2e-16 ***
## FORA_M      -0.04807    0.05570  -0.863    0.391    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5632 on 81 degrees of freedom
## Multiple R-squared:  0.00911,    Adjusted R-squared:  -0.003123 
## F-statistic: 0.7447 on 1 and 81 DF,  p-value: 0.3907

FORA score is not a significant predictor of EHDP score.
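As a quick check on the full sample, before the EHDP_M outliers were trimmed (a sketch):

cor.test(data$FORA_M, data$EHDP_M)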

EHDP: examine agreeableness/flexibility and extremity of scores

model <- lm(EHDP_n ~ abs(EHDP_M_z), data = data)

summary(model)
## 
## Call:
## lm(formula = EHDP_n ~ abs(EHDP_M_z), data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.0299 -1.7976  0.7144  1.9693  4.9701 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    11.1793     0.3705  30.176  < 2e-16 ***
## abs(EHDP_M_z)  -2.3352     0.3726  -6.267 1.46e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.679 on 85 degrees of freedom
## Multiple R-squared:  0.316,  Adjusted R-squared:  0.308 
## F-statistic: 39.28 on 1 and 85 DF,  p-value: 1.461e-08

Finding 2a: Association

A reasonably strong association (R-squared = .32), with distance from the EHDP mean a significant predictor of the number of EHDP items endorsed. The more extreme a respondent’s views about EHDP, the fewer items they agree with; conversely, the more central their views, the more items they select on the scale.

model <- lm(EHDP_SD ~ abs(EHDP_M_z), data = data)
summary(model)
## 
## Call:
## lm(formula = EHDP_SD ~ abs(EHDP_M_z), data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.6791 -0.3103 -0.1255  0.1852  3.2423 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    4.00332    0.08295   48.26  < 2e-16 ***
## abs(EHDP_M_z) -0.35455    0.08343   -4.25 5.46e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5998 on 85 degrees of freedom
## Multiple R-squared:  0.1752, Adjusted R-squared:  0.1655 
## F-statistic: 18.06 on 1 and 85 DF,  p-value: 5.457e-05

Finding 2b: Association

This reveals a significant but small negative relationship between extremity of views on the EHDP variable and the breadth of statements endorsed.

This means that average scores on EHDP come not from selecting only a few statements in the middle, but from selecting a balance of statements on both sides of the mean.

With more data, this pattern may support the hypothesis that those in the middle have flexible views, endorsing statements on both sides of the continuum. By definition, respondents at the extremes of the EHDP score cannot have endorsed many items, since endorsing few items is the only way to keep a score at an extreme; it is notable, though, that narrow endorsement is not common in the middle.
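One way to test this directly with more data (a sketch using the binary EHDP item columns created earlier; the |z| < 0.5 cutoff for "near the mean" is an arbitrary choice):

# Items with truncated scale values below / above the scale midpoint of 6
low_items  <- paste0("EHDP_", c("1_1", "1_2", "2_1", "2_2", "3", "4", "5"))
high_items <- paste0("EHDP_", c("7", "8", "9", "10_1", "10_2", "11_1", "11_2"))
both_sides <- rowSums(data[low_items]) > 0 & rowSums(data[high_items]) > 0
# Proportion of near-mean respondents endorsing statements on both sides
mean(both_sides[abs(data$EHDP_M_z) < 0.5])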

FORA: examine agreeableness/flexibility and extremity of scores

model <- lm(FORA_n ~ abs(FORA_M_z), data = data)
summary(model)
## 
## Call:
## lm(formula = FORA_n ~ abs(FORA_M_z), data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.9977 -1.4152 -0.0182  1.4303  6.2044 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    10.0445     0.4604  21.815   <2e-16 ***
## abs(FORA_M_z)  -1.1521     0.4631  -2.488   0.0148 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.625 on 85 degrees of freedom
## Multiple R-squared:  0.06787,    Adjusted R-squared:  0.0569 
## F-statistic: 6.189 on 1 and 85 DF,  p-value: 0.01481

Distance from the mean of FORA_M is not a good predictor of the number of items endorsed on the construct: significant, but with minimal explanatory power.

model <- lm(FORA_SD ~ abs(FORA_M_z), data = data)
summary(model)
## 
## Call:
## lm(formula = FORA_SD ~ abs(FORA_M_z), data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.2027 -0.4397 -0.1036  0.3602  1.3258 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    3.64613    0.09948  36.651  < 2e-16 ***
## abs(FORA_M_z) -0.69534    0.10006  -6.949 6.98e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5671 on 85 degrees of freedom
## Multiple R-squared:  0.3623, Adjusted R-squared:  0.3548 
## F-statistic: 48.29 on 1 and 85 DF,  p-value: 6.982e-10

Finding: Association

The extremity of FORA views provides a significant but small explanation of the range of FORA items endorsed. As with EHDP, this means that scores at the grand mean come not from agreeing with just a few middle-scoring items but from agreeing with items on opposite ends of the continuum.

Subset the data for correlations: scores, SD, n, interests, experience, . . .

data_subset <- data %>%
  select(inter_n, 
         exper_n,
         flexibility, 
         agreeableness,
         years,
         school_poverty,
         locale,
         feedback_1
         )

#remove NAs - still 66 obs as of 11/3/24
data_subset <- na.omit(data_subset)

This subset includes the following variables for each of the 66 individuals retained after filtering:

col_names <- names(data_subset)
summary <- data.frame()
for (col_name in col_names) {
  summary <- rbind(summary, data.frame(describe(data_subset[col_name])))
}
summary <- summary[2:nrow(summary),]
print(summary)
##                vars  n      mean         sd    median   trimmed        mad
## exper_n           1 66  3.666667  1.8088103  3.000000  3.574074  1.4826000
## flexibility       1 66  6.802487  0.8430139  7.018548  6.937982  0.5071747
## agreeableness     1 66 19.378788  5.0314859 21.000000 19.574074  5.1891000
## years             1 66 15.621212  5.0801731 19.000000 16.537037  0.0000000
## school_poverty    1 66  2.166667  1.3540064  2.000000  2.203704  1.4826000
## locale            1 66  2.348485  0.9998834  2.000000  2.314815  0.0000000
## feedback_1        1 66 73.636364 19.6614706 80.000000 75.000000 14.8260000
##                     min        max      range        skew    kurtosis        se
## exper_n        1.000000   9.000000   8.000000  0.49440656 -0.08015521 0.2226492
## flexibility    3.137345   7.910654   4.773309 -2.15063142  5.89958990 0.1037678
## agreeableness  7.000000  29.000000  22.000000 -0.37332793 -0.66149077 0.6193331
## years          3.000000  19.000000  16.000000 -1.31177570  0.35674953 0.6253261
## school_poverty 0.000000   4.000000   4.000000  0.03119674 -1.20730028 0.1666667
## locale         1.000000   4.000000   3.000000  0.54970941 -0.84566509 0.1230771
## feedback_1     0.000000 100.000000 100.000000 -1.01923596  1.37918843 2.4201597

Examine correlations between this subset of variables

cor_matrix <- cor(data_subset)
print(cor_matrix)
##                    inter_n      exper_n flexibility agreeableness       years
## inter_n         1.00000000  0.419594726  0.26540763    0.45373522  0.08844577
## exper_n         0.41959473  1.000000000  0.07904636    0.17636826  0.40293134
## flexibility     0.26540763  0.079046361  1.00000000    0.54225915 -0.12672422
## agreeableness   0.45373522  0.176368259  0.54225915    1.00000000  0.11945548
## years           0.08844577  0.402931336 -0.12672422    0.11945548  1.00000000
## school_poverty  0.15376824  0.067004137 -0.09349146    0.01091482 -0.06225190
## locale         -0.06089231 -0.164456473 -0.23009648   -0.19177522 -0.12807804
## feedback_1      0.11125273 -0.008651823  0.10845763    0.21447036  0.07869294
##                school_poverty      locale   feedback_1
## inter_n            0.15376824 -0.06089231  0.111252725
## exper_n            0.06700414 -0.16445647 -0.008651823
## flexibility       -0.09349146 -0.23009648  0.108457629
## agreeableness      0.01091482 -0.19177522  0.214470358
## years             -0.06225190 -0.12807804  0.078692939
## school_poverty     1.00000000  0.02462117 -0.144474080
## locale             0.02462117  1.00000000 -0.182835998
## feedback_1        -0.14447408 -0.18283600  1.000000000

How high are the max correlations?

# Find pairs with correlation above 0.3
high_corr_pairs <- which(abs(cor_matrix) > 0.3 & upper.tri(cor_matrix), arr.ind = TRUE)

# Create a dataframe of these pairs with their correlation coefficients
high_cor_list <- data.frame(
  Var1 = rownames(cor_matrix)[high_corr_pairs[, 1]],
  Var2 = colnames(cor_matrix)[high_corr_pairs[, 2]],
  Correlation = cor_matrix[high_corr_pairs]
)

# Print the list
print(high_cor_list)
##          Var1          Var2 Correlation
## 1     inter_n       exper_n   0.4195947
## 2     inter_n agreeableness   0.4537352
## 3 flexibility agreeableness   0.5422592
## 4     exper_n         years   0.4029313

The highest is between flexibility and agreeableness, at 0.54, which is still not especially strong.
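For a visual overview of the full matrix, a heatmap sketch using the already-loaded reshape2 and ggplot2:

cor_melt <- melt(cor_matrix)  # long format: Var1, Var2, value
ggplot(cor_melt, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  scale_fill_gradient2(limits = c(-1, 1), name = "r") +
  labs(x = NULL, y = NULL, title = "Correlations Among Subset Variables") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))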

PCA of standardized subset of variables

These are just the non-score variables that preliminary analysis has shown may be of some value in distinguishing individuals.

data_standardized <- scale(data_subset)
pca_result <- prcomp(data_standardized, center = TRUE, scale. = TRUE)

# Summary of PCA results
summary(pca_result)
## Importance of components:
##                           PC1    PC2    PC3    PC4    PC5     PC6     PC7
## Standard deviation     1.4871 1.1802 1.1068 0.9474 0.9261 0.79554 0.67367
## Proportion of Variance 0.2764 0.1741 0.1531 0.1122 0.1072 0.07911 0.05673
## Cumulative Proportion  0.2764 0.4505 0.6037 0.7159 0.8231 0.90217 0.95890
##                           PC8
## Standard deviation     0.5734
## Proportion of Variance 0.0411
## Cumulative Proportion  1.0000
# View the loadings (coefficients) of the original variables on the principal components
print(pca_result$rotation)
##                         PC1         PC2         PC3         PC4         PC5
## inter_n        -0.471650124 -0.14594319  0.33795308  0.21339357  0.22546797
## exper_n        -0.377112630 -0.53607205 -0.02160338  0.08038653 -0.15787730
## flexibility    -0.416871326  0.45782162  0.14902993  0.06230727 -0.34398719
## agreeableness  -0.526363718  0.22460559  0.13019678  0.13687767  0.02196996
## years          -0.213657961 -0.54441097 -0.42690411  0.09141006 -0.03642895
## school_poverty -0.009102141 -0.26038916  0.60421907 -0.56656538  0.32656601
## locale          0.294298875 -0.08367648  0.32637284  0.76833144  0.29960951
## feedback_1     -0.228357961  0.24513561 -0.43853490 -0.07548624  0.77954525
##                       PC6         PC7           PC8
## inter_n         0.3817902  0.49219911  0.3970366694
## exper_n         0.4071517 -0.48995582 -0.3643208857
## flexibility    -0.1234681 -0.50352646  0.4510156134
## agreeableness  -0.4526083  0.24584206 -0.6091236360
## years          -0.5754286  0.05775334  0.3627443125
## school_poverty -0.2958805 -0.22063754  0.0562677782
## locale         -0.1975636 -0.28414210  0.0001722078
## feedback_1      0.1032975 -0.26637008  0.0208796348
pca_result_rotation_df <- data.frame(pca_result$rotation)

Scree plot to examine PCs

screeplot(pca_result, type = "lines")

A biplot shows the direction of each variable along axes defined by the first two PCs, and also locates each individual on the plot.

biplot(pca_result, scale = 0)

Loadings of PCs

print(pca_result$rotation[, 1:5])  # Show loadings for the first 5 components
##                         PC1         PC2         PC3         PC4         PC5
## inter_n        -0.471650124 -0.14594319  0.33795308  0.21339357  0.22546797
## exper_n        -0.377112630 -0.53607205 -0.02160338  0.08038653 -0.15787730
## flexibility    -0.416871326  0.45782162  0.14902993  0.06230727 -0.34398719
## agreeableness  -0.526363718  0.22460559  0.13019678  0.13687767  0.02196996
## years          -0.213657961 -0.54441097 -0.42690411  0.09141006 -0.03642895
## school_poverty -0.009102141 -0.26038916  0.60421907 -0.56656538  0.32656601
## locale          0.294298875 -0.08367648  0.32637284  0.76833144  0.29960951
## feedback_1     -0.228357961  0.24513561 -0.43853490 -0.07548624  0.77954525

They include:

Variable                               M
Agreeableness                       18.9
Experience with advanced programs    3.5
Flexibility                          6.9
Interests in advanced programs       5.5
Locale rurality                      2.3
School poverty level                 2.2
Years of teaching art               15.8

# Convert the PCA rotation matrix to a data frame for easier manipulation
rotation_df <- as.data.frame(pca_result$rotation[, 1:4])

# Ensure row names (variables) are added as a column to the data frame
rotation_df$Variable <- rownames(rotation_df)

# Reshape the data for ggplot2 (using 'melt' function)
rotation_melt <- melt(rotation_df, id.vars = "Variable", variable.name = "PC", value.name = "Loading")

# Plot
ggplot(rotation_melt, aes(x = Variable, y = Loading, fill = PC)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(x = "Variables", y = "Contribution to PC", title = "PCA Component Contributions") +
  theme_minimal() +
  scale_fill_discrete(name = "Principal Components") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

eigenvalues <- (pca_result$sdev)^2

print(eigenvalues)
## [1] 2.2113382 1.3928608 1.2250429 0.8976274 0.8576443 0.6328768 0.4538251
## [8] 0.3287845
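By the Kaiser criterion (retain components with eigenvalues greater than 1), the first three components qualify, consistent with the interpretations below; a one-line check:

sum(eigenvalues > 1)  # 3 components exceed the Kaiser threshold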

Finding 3: Interpretation of Principal Components

Principal components analysis was conducted to abstract from the scaled continuous and ordinal variables.

PC1: Outsider skeptic

Eigenvalue of 2.2; explains a substantial portion of the variance. Much lower agreeableness and flexibility, and the second-lowest feedback rating. Experience with, and especially interest in, fewer types of programs than others. Fewer years of teaching than average, but not the fewest. The most rural schools; average school poverty.

PC2: Open, positive, & fresh

Eigenvalue of 1.4; explains a substantial amount of the variance, but less than PC1. Highest agreeableness, flexibility, and feedback ratings. Interested in more types of programs than the few they have experience with (the fewest of all groups). Newest to teaching art. A little below average rurality and the lowest school poverty level (suburban).

PC3: Cautious and mid-career, at rural and the poorest schools

Eigenvalue of 1.2; explains a substantial amount of the variance, but less than PC1 and PC2. Above-average agreeableness and flexibility; the lowest feedback ratings. Experienced with an average number of program types. Interested in the greatest number of program types. Second-fewest years of teaching among the groups. At the poorest and second-most rural schools.

PC4: Focused, experienced, more urban and less poor schools

Eigenvalue of 0.9, just below the threshold for usefulness in abstraction. About average agreeableness, flexibility, and feedback ratings. Interested in fewer program types than they have experience with. Greatest number of years of teaching. Working at the most urban schools with above-average poverty.

Do grouping variables predict PCs?

First prepare the dataset

# Convert row names to a column called "Row_Name"
data <- data %>%
  tibble::rownames_to_column(var = "Row_Name")

data_subset <- data_subset %>%
  tibble::rownames_to_column(var = "Row_Name")

# Now perform the semi_join on the new "Row_Name" column
data_clean <- data %>%
  semi_join(data_subset, by = "Row_Name")

data_clean$PC1 <- pca_result$x[,1]
data_clean$PC2 <- pca_result$x[,2]
data_clean$PC3 <- pca_result$x[,3]
data_clean$PC4 <- pca_result$x[,4]

# Combine all predictor vectors into one
all_predictors <- c(binary_exper_variables, binary_group_variables)

# Create the formula as a string
predictors_formula <- paste(all_predictors, collapse = " + ")

PC1 ~

# Create the full formula for the linear model
model_formula <- as.formula(paste("PC1 ~", predictors_formula))
model <- lm(model_formula, data = data_clean)
summary(model)
## 
## Call:
## lm(formula = model_formula, data = data_clean)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.6055 -0.5787 -0.0665  0.5828  3.4923 
## 
## Coefficients: (5 not defined because of singularities)
##                     Estimate Std. Error t value Pr(>|t|)  
## (Intercept)          2.08243    1.12644   1.849   0.0716 .
## exper_0             -1.86795    0.98786  -1.891   0.0655 .
## exper_1             -0.79137    0.53274  -1.485   0.1449  
## exper_2             -0.94256    0.41454  -2.274   0.0282 *
## exper_3             -0.04742    0.51875  -0.091   0.9276  
## exper_4             -0.08206    0.45647  -0.180   0.8582  
## exper_5              0.48457    0.69467   0.698   0.4893  
## exper_6             -0.89593    0.43743  -2.048   0.0468 *
## exper_7             -0.11603    0.75372  -0.154   0.8784  
## exper_8             -0.45093    0.46122  -0.978   0.3338  
## exper_9             -0.21517    0.45073  -0.477   0.6356  
## exper_10            -0.38811    0.77372  -0.502   0.6186  
## preservice          -1.36554    1.07717  -1.268   0.2119  
## practicing                NA         NA      NA       NA  
## researcher          -0.16535    0.59978  -0.276   0.7841  
## raceWhite           -0.50569    0.69019  -0.733   0.4678  
## raceBlack                 NA         NA      NA       NA  
## raceNative          -1.25929    1.04505  -1.205   0.2349  
## raceAsian           -0.58047    1.30522  -0.445   0.6588  
## racePacific               NA         NA      NA       NA  
## raceOther           -1.61449    1.24443  -1.297   0.2016  
## ethnHispanic        -0.11670    0.59315  -0.197   0.8450  
## grad_degree         -0.62189    0.40062  -1.552   0.1281  
## teaching                  NA         NA      NA       NA  
## teach_experience          NA         NA      NA       NA  
## hs_grades            0.21954    0.47698   0.460   0.6477  
## es_grades            0.75582    0.52100   1.451   0.1543  
## school_poverty_high  0.24739    0.37187   0.665   0.5095  
## school_pub           0.36747    0.68875   0.534   0.5965  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.273 on 42 degrees of freedom
## Multiple R-squared:  0.5265, Adjusted R-squared:  0.2672 
## F-statistic:  2.03 on 23 and 42 DF,  p-value: 0.02281

Finding 4a: PC1 ~ -Honor Society & -GT

The “Outsider Skeptic” is unlikely to have experience with Honor Society and Gifted & Talented programs.

model <- lm(PC1 ~ exper_2 + exper_6, data = data_clean)
summary(model)
## 
## Call:
## lm(formula = PC1 ~ exper_2 + exper_6, data = data_clean)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.2204 -0.9533 -0.1760  0.8984  3.4324 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.8426     0.2208   3.817 0.000311 ***
## exper_2      -1.2644     0.3202  -3.948 0.000201 ***
## exper_6      -0.8948     0.3273  -2.734 0.008118 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.252 on 63 degrees of freedom
## Multiple R-squared:  0.3126, Adjusted R-squared:  0.2908 
## F-statistic: 14.33 on 2 and 63 DF,  p-value: 7.445e-06

PC2 ~

model_formula <- as.formula(paste("PC2 ~", predictors_formula))
model <- lm(model_formula, data = data_clean)
summary(model)
## 
## Call:
## lm(formula = model_formula, data = data_clean)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.05284 -0.49795 -0.06932  0.51223  1.55966 
## 
## Coefficients: (5 not defined because of singularities)
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          2.56907    0.78594   3.269 0.002159 ** 
## exper_0              0.05623    0.68925   0.082 0.935369    
## exper_1             -0.63510    0.37171  -1.709 0.094911 .  
## exper_2              0.06430    0.28924   0.222 0.825160    
## exper_3             -0.47701    0.36194  -1.318 0.194672    
## exper_4             -0.95640    0.31849  -3.003 0.004491 ** 
## exper_5             -0.50234    0.48469  -1.036 0.305940    
## exper_6             -0.03623    0.30521  -0.119 0.906067    
## exper_7              0.11492    0.52589   0.219 0.828084    
## exper_8              0.07832    0.32180   0.243 0.808890    
## exper_9             -1.15102    0.31449  -3.660 0.000699 ***
## exper_10            -0.42418    0.53984  -0.786 0.436428    
## preservice           0.99573    0.75157   1.325 0.192376    
## practicing                NA         NA      NA       NA    
## researcher          -0.17142    0.41848  -0.410 0.684172    
## raceWhite           -0.25307    0.48156  -0.526 0.601994    
## raceBlack                 NA         NA      NA       NA    
## raceNative           0.52114    0.72916   0.715 0.478743    
## raceAsian            1.36019    0.91069   1.494 0.142759    
## racePacific               NA         NA      NA       NA    
## raceOther           -0.06796    0.86827  -0.078 0.937982    
## ethnHispanic        -0.30714    0.41386  -0.742 0.462132    
## grad_degree          0.24231    0.27952   0.867 0.390945    
## teaching                  NA         NA      NA       NA    
## teach_experience          NA         NA      NA       NA    
## hs_grades            0.16097    0.33280   0.484 0.631124    
## es_grades           -0.16691    0.36351  -0.459 0.648481    
## school_poverty_high -0.57952    0.25946  -2.234 0.030897 *  
## school_pub          -0.66462    0.48056  -1.383 0.173968    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8882 on 42 degrees of freedom
## Multiple R-squared:  0.634,  Adjusted R-squared:  0.4336 
## F-statistic: 3.163 on 23 and 42 DF,  p-value: 0.0005882

Finding 4b: PC2 ~ -AP, -Mentorship, Low Poverty

model <- lm(PC2 ~ exper_4 + exper_9 + school_poverty_high, data = data_clean)
summary(model)
## 
## Call:
## lm(formula = PC2 ~ exper_4 + exper_9 + school_poverty_high, data = data_clean)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.84607 -0.57835 -0.05833  0.46166  2.52490 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           1.0638     0.1939   5.487 8.04e-07 ***
## exper_4              -0.9120     0.2302  -3.961 0.000195 ***
## exper_9              -0.9785     0.2303  -4.249 7.34e-05 ***
## school_poverty_high  -0.6031     0.2357  -2.558 0.012978 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9084 on 62 degrees of freedom
## Multiple R-squared:  0.4349, Adjusted R-squared:  0.4076 
## F-statistic: 15.91 on 3 and 62 DF,  p-value: 8.833e-08

The Open-Positive-Fresh respondent does not have experience with AP or Mentorship and identifies with a school that is not high poverty.

PC3 ~

model_formula <- as.formula(paste("PC3 ~", predictors_formula))
model <- lm(model_formula, data = data_clean)
summary(model)
## 
## Call:
## lm(formula = model_formula, data = data_clean)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.41309 -0.45045  0.00631  0.45118  1.87007 
## 
## Coefficients: (5 not defined because of singularities)
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -0.38877    0.72737  -0.534  0.59582    
## exper_0              0.63228    0.63789   0.991  0.32726    
## exper_1              0.19020    0.34401   0.553  0.58326    
## exper_2             -0.17120    0.26768  -0.640  0.52594    
## exper_3              0.43995    0.33497   1.313  0.19618    
## exper_4             -0.48560    0.29475  -1.647  0.10692    
## exper_5             -0.18927    0.44857  -0.422  0.67522    
## exper_6             -0.17524    0.28246  -0.620  0.53835    
## exper_7              0.14806    0.48670   0.304  0.76247    
## exper_8             -0.01601    0.29782  -0.054  0.95738    
## exper_9              0.04153    0.29105   0.143  0.88722    
## exper_10             1.12432    0.49961   2.250  0.02972 *  
## preservice          -0.07474    0.69555  -0.107  0.91494    
## practicing                NA         NA      NA       NA    
## researcher          -0.41844    0.38729  -1.080  0.28612    
## raceWhite            0.76314    0.44567   1.712  0.09421 .  
## raceBlack                 NA         NA      NA       NA    
## raceNative           0.28408    0.67482   0.421  0.67591    
## raceAsian            1.88087    0.84282   2.232  0.03103 *  
## racePacific               NA         NA      NA       NA    
## raceOther            1.03191    0.80356   1.284  0.20612    
## ethnHispanic        -0.33682    0.38302  -0.879  0.38419    
## grad_degree         -0.73834    0.25869  -2.854  0.00667 ** 
## teaching                  NA         NA      NA       NA    
## teach_experience          NA         NA      NA       NA    
## hs_grades           -0.24325    0.30800  -0.790  0.43409    
## es_grades           -0.34118    0.33642  -1.014  0.31632    
## school_poverty_high  1.06512    0.24013   4.436  6.5e-05 ***
## school_pub          -0.14112    0.44474  -0.317  0.75258    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.822 on 42 degrees of freedom
## Multiple R-squared:  0.6436, Adjusted R-squared:  0.4484 
## F-statistic: 3.297 on 23 and 42 DF,  p-value: 0.0003892

Finding 4c: PC3 ~ Other Experience, No grad degree, high poverty school

The Cautious/Mid-career respondent is more likely to enter “other” for experience, to have no graduate degree, and to identify with a high-poverty school.

# Reduced model with the key predictors from the full fit
model <- lm(PC3 ~ exper_10 + grad_degree + school_poverty_high, data = data_clean)
summary(model)
## 
## Call:
## lm(formula = PC3 ~ exper_10 + grad_degree + school_poverty_high, 
##     data = data_clean)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.50732 -0.48404 -0.01803  0.51646  1.96670 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          -0.0203     0.2129  -0.095 0.924315    
## exper_10              1.3054     0.3557   3.670 0.000506 ***
## grad_degree          -0.7217     0.2204  -3.275 0.001734 ** 
## school_poverty_high   1.2874     0.1953   6.592 1.09e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7603 on 62 degrees of freedom
## Multiple R-squared:   0.55,  Adjusted R-squared:  0.5282 
## F-statistic: 25.26 on 3 and 62 DF,  p-value: 8.505e-11

PC4 ~

# Fit the full model predicting PC4 from all candidate predictors
model_formula <- as.formula(paste("PC4 ~", predictors_formula))
model <- lm(model_formula, data = data_clean)
summary(model)
## 
## Call:
## lm(formula = model_formula, data = data_clean)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.68245 -0.49891  0.01864  0.40548  1.48472 
## 
## Coefficients: (5 not defined because of singularities)
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          0.858244   0.714448   1.201   0.2364    
## exper_0              0.546320   0.626553   0.872   0.3882    
## exper_1             -0.038037   0.337896  -0.113   0.9109    
## exper_2             -0.338366   0.262927  -1.287   0.2052    
## exper_3             -0.025553   0.329017  -0.078   0.9385    
## exper_4             -0.161716   0.289516  -0.559   0.5794    
## exper_5             -0.497157   0.440599  -1.128   0.2656    
## exper_6              0.552367   0.277444   1.991   0.0530 .  
## exper_7             -0.079973   0.478053  -0.167   0.8679    
## exper_8              0.191294   0.292530   0.654   0.5167    
## exper_9              0.255548   0.285878   0.894   0.3765    
## exper_10             0.027024   0.490732   0.055   0.9563    
## preservice           0.770620   0.683198   1.128   0.2657    
## practicing                 NA         NA      NA       NA    
## researcher          -0.008833   0.380414  -0.023   0.9816    
## raceWhite            0.541757   0.437756   1.238   0.2227    
## raceBlack                  NA         NA      NA       NA    
## raceNative          -0.476137   0.662827  -0.718   0.4765    
## raceAsian            1.923431   0.827845   2.323   0.0251 *  
## racePacific                NA         NA      NA       NA    
## raceOther            0.958515   0.789286   1.214   0.2314    
## ethnHispanic        -0.005235   0.376211  -0.014   0.9890    
## grad_degree         -0.742075   0.254096  -2.920   0.0056 ** 
## teaching                   NA         NA      NA       NA    
## teach_experience           NA         NA      NA       NA    
## hs_grades           -0.174329   0.302529  -0.576   0.5675    
## es_grades           -0.570475   0.330445  -1.726   0.0916 .  
## school_poverty_high -1.210828   0.235859  -5.134 6.89e-06 ***
## school_pub          -0.291351   0.436842  -0.667   0.5085    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8074 on 42 degrees of freedom
## Multiple R-squared:  0.5307, Adjusted R-squared:  0.2737 
## F-statistic: 2.065 on 23 and 42 DF,  p-value: 0.02034

Finding 4d: PC4 ~ No grad degree & not high-poverty school

The Focused/Experienced respondent tends not to identify with a high-poverty school and tends not to have a graduate degree.

# Reduced model with the key predictors from the full fit
model <- lm(PC4 ~ grad_degree + school_poverty_high, data = data_clean)
summary(model)
## 
## Call:
## lm(formula = PC4 ~ grad_degree + school_poverty_high, data = data_clean)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.04413 -0.57356 -0.08624  0.45400  1.91125 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           0.8106     0.2206   3.675 0.000494 ***
## grad_degree          -0.6015     0.2320  -2.592 0.011835 *  
## school_poverty_high  -0.9759     0.2067  -4.721 1.35e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8048 on 63 degrees of freedom
## Multiple R-squared:  0.3007, Adjusted R-squared:  0.2785 
## F-statistic: 13.55 on 2 and 63 DF,  p-value: 1.279e-05

Do PCs Predict Scores?

EHDP ~ PCs regression model

# Regress mean EHDP score on the first two principal component scores
model <- lm(data_clean$EHDP_M ~ pca_result$x[, 1] + pca_result$x[, 2])
summary(model)
## 
## Call:
## lm(formula = data_clean$EHDP_M ~ pca_result$x[, 1] + pca_result$x[, 
##     2])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.3845 -0.3448  0.0113  0.3991  3.6367 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        6.42642    0.11036  58.233   <2e-16 ***
## pca_result$x[, 1]  0.10142    0.07478   1.356    0.180    
## pca_result$x[, 2] -0.09714    0.09422  -1.031    0.307    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8965 on 63 degrees of freedom
## Multiple R-squared:  0.04404,    Adjusted R-squared:  0.01369 
## F-statistic: 1.451 on 2 and 63 DF,  p-value: 0.242

Neither PC1 nor PC2 is a significant predictor of EHDP score (p = .18 and p = .31, respectively), and the model explains essentially none of the variance (Adjusted R-squared = .01).
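
As a quick confirmation (a minimal sketch using the model object just fitted), the 95% confidence intervals for both PC coefficients straddle zero:

# 95% CIs for the EHDP ~ PC1 + PC2 fit; both PC intervals include 0
confint(model)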

FORA ~ PCs regression model

# Regress mean FORA score on the first three principal component scores
model <- lm(data_clean$FORA_M ~ pca_result$x[, 1] + pca_result$x[, 2] + pca_result$x[, 3])
summary(model)
## 
## Call:
## lm(formula = data_clean$FORA_M ~ pca_result$x[, 1] + pca_result$x[, 
##     2] + pca_result$x[, 3])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.4960 -0.4962 -0.1366  0.3614  2.1034 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        4.73742    0.09398  50.407  < 2e-16 ***
## pca_result$x[, 1] -0.45196    0.06369  -7.097 1.46e-09 ***
## pca_result$x[, 2]  0.43360    0.08024   5.403 1.10e-06 ***
## pca_result$x[, 3]  0.19593    0.08556   2.290   0.0254 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7635 on 62 degrees of freedom
## Multiple R-squared:  0.5777, Adjusted R-squared:  0.5572 
## F-statistic: 28.27 on 3 and 62 DF,  p-value: 1.214e-11

Finding 5: FORA is predicted by PCs 1-3.

FORA decreases by .45 units per unit increase on PC1 (outsider) and increases by .43 units per unit increase on PC2 (early and open), both significant (p < .001); PC3 adds a smaller positive effect (.20 units per unit increase, p = .025). The PC predictors together explain a considerable share of the variation in FORA (Adjusted R-squared = .56). A respondent at the mean of all three principal components has a predicted FORA score of 4.73 (p < .001), slightly toward the openness-is-fairness end of the scale.

PC1 ~ lower FORA

PC2 ~ higher FORA

PC3 ~ a bit higher FORA
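
To make those directions concrete, a predicted FORA score can be read straight off the fitted coefficients (a minimal sketch using the FORA model above; the PC values are hypothetical, chosen only for illustration):

# Predicted FORA for a hypothetical respondent one unit above the mean
# on PC1 (outsider) and at the mean on PC2 and PC3
b <- coef(model)  # intercept, then PC1-PC3 slopes
b[1] + b[2] * 1 + b[3] * 0 + b[4] * 0  # about 4.74 - 0.45 = 4.29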

Next:

Add the PC score variables to the dataframe for use in the plots below.

# Add the first principal component scores (PC1) to the data
data_clean$PC1 <- pca_result$x[, 1]

# Add the second principal component scores (PC2) to the data
data_clean$PC2 <- pca_result$x[, 2]

# Add the third principal component scores (PC3) to the data
data_clean$PC3 <- pca_result$x[, 3]
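
Equivalently, all three score columns can be added in one step (a minor variant of the assignments above):

# One-step version: copy the first three PC score columns into the data
data_clean[, c("PC1", "PC2", "PC3")] <- pca_result$x[, 1:3]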

3D Scatterplots of three PCs by group
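
The plots below repeat one plot_ly call with only the color variable changing. A small helper function (a sketch, not part of the original analysis; it assumes plotly is loaded and the PC columns were just added) would keep each title in sync with the grouping variable:

# Hypothetical helper: 3D scatterplot of PC1-PC3 colored by a grouping variable
plot_pcs <- function(data, group_var, label) {
  plot_ly(data = data,
          x = ~PC1, y = ~PC2, z = ~PC3,
          color = as.factor(data[[group_var]]),
          type = "scatter3d", mode = "markers") %>%
    layout(scene = list(xaxis = list(title = "PC1"),
                        yaxis = list(title = "PC2"),
                        zaxis = list(title = "PC3")),
           title = paste("3D Scatterplot of PC1, PC2, and PC3 Coded by", label))
}

# Example call: plot_pcs(data_clean, "exper_2", "exper_2 (Honor Society)")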

Honor Society Experience

# Change the `color` grouping variable and the title for each plot below

# Create a 3D scatter plot colored by Honor Society experience
plot_ly(data = data_clean,
        x = ~PC1,
        y = ~PC2,
        z = ~PC3,
        color = ~as.factor(exper_2),
        type = "scatter3d",
        mode = "markers") %>%
  layout(scene = list(xaxis = list(title = "PC1"),
                      yaxis = list(title = "PC2"),
                      zaxis = list(title = "PC3")),
         title = "3D Scatterplot of PC1, PC2, and PC3 Coded by exper_2 (Honor Society)")
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels

Gifted & Talented Experience

# Create a 3D scatter plot colored by Gifted & Talented experience
plot_ly(data = data_clean,
        x = ~PC1,
        y = ~PC2,
        z = ~PC3,
        color = ~as.factor(exper_6),
        type = "scatter3d",
        mode = "markers") %>%
  layout(scene = list(xaxis = list(title = "PC1"),
                      yaxis = list(title = "PC2"),
                      zaxis = list(title = "PC3")),
         title = "3D Scatterplot of PC1, PC2, and PC3 Coded by exper_6 (Gifted & Talented)")
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels

AP Studio Art Experience

# Create a 3D scatter plot colored by AP Studio Art experience
plot_ly(data = data_clean,
        x = ~PC1,
        y = ~PC2,
        z = ~PC3,
        color = ~as.factor(exper_4),
        type = "scatter3d",
        mode = "markers") %>%
  layout(scene = list(xaxis = list(title = "PC1"),
                      yaxis = list(title = "PC2"),
                      zaxis = list(title = "PC3")),
         title = "3D Scatterplot of PC1, PC2, and PC3 Coded by exper_4 (AP Studio Art)")
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels

Art Mentorship Experience

# Create a 3D scatter plot colored by Art Mentorship experience
plot_ly(data = data_clean,
        x = ~PC1,
        y = ~PC2,
        z = ~PC3,
        color = ~as.factor(exper_9),
        type = "scatter3d",
        mode = "markers") %>%
  layout(scene = list(xaxis = list(title = "PC1"),
                      yaxis = list(title = "PC2"),
                      zaxis = list(title = "PC3")),
         title = "3D Scatterplot of PC1, PC2, and PC3 Coded by exper_9 (Art Mentorship)")
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels

School Poverty Identification

# Create a 3D scatter plot colored by school poverty identification
plot_ly(data = data_clean,
        x = ~PC1,
        y = ~PC2,
        z = ~PC3,
        color = ~as.factor(school_poverty_high),
        type = "scatter3d",
        mode = "markers") %>%
  layout(scene = list(xaxis = list(title = "PC1"),
                      yaxis = list(title = "PC2"),
                      zaxis = list(title = "PC3")),
         title = "3D Scatterplot of PC1, PC2, and PC3 Coded by school_poverty_high")
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels

‘Other’ Experience

# Create a 3D scatter plot colored by 'Other' experience
plot_ly(data = data_clean,
        x = ~PC1,
        y = ~PC2,
        z = ~PC3,
        color = ~as.factor(exper_10),
        type = "scatter3d",
        mode = "markers") %>%
  layout(scene = list(xaxis = list(title = "PC1"),
                      yaxis = list(title = "PC2"),
                      zaxis = list(title = "PC3")),
         title = "3D Scatterplot of PC1, PC2, and PC3 Coded by exper_10 ('Other')")
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels

Grad Degree

# Create a 3D scatter plot colored by graduate degree
plot_ly(data = data_clean,
        x = ~PC1,
        y = ~PC2,
        z = ~PC3,
        color = ~as.factor(grad_degree),
        type = "scatter3d",
        mode = "markers") %>%
  layout(scene = list(xaxis = list(title = "PC1"),
                      yaxis = list(title = "PC2"),
                      zaxis = list(title = "PC3")),
         title = "3D Scatterplot of PC1, PC2, and PC3 Coded by grad_degree")
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
