Scores on the Thurstone Scales:
FORA is a continuum from extreme openness (M = 1) to extreme selectivity (M = 11) with regards to the fairness of specialized, advanced art programs.
EHDP is a continuum from extreme focus on the happiness and benefit to the individual (M = 1) to extreme focus on the benefit to society (M = 11) as justification for specialized, advanced art programs.
Grouping variables include interests in and experience with each type of specialized, advanced art program, race/ethnicity variables, grade levels taught, locale and poverty level of school, etc.
The most common experience reported are Art Club and Awards. The least common are IB, Magnet School, and None.
a. Reasonably strong association at R-squared of .32 with distance from mean of EDHP being a significant predictor of EHDP number. The more extreme your views about EHDP the fewer items you agree with, and more importantly, the more central your views about EHDP the more agreeable you are in selecting more items on the scale.
b. Also a significant but small negative relationships between extremity of views about the EHDP variable and breadth of statements endorsed.
This just means that average scores on EHDP are not made by selecting only a few statements in the middle, but by selecting a balance of statements on both sides of the mean.
With more data, this pattern may support the hypothesis that those in the middle have flexible views, endorsing statements on both sides of the continuum. It’s a given that those on the extremes of EHDP score would not have endorsed as many items as it’s the only way to keep your score on the extreme, but it is interesting to know that that’s not common in the middle.
Principal components analysis was conducted to abstract from the scaled continuous and ordinal variables.
Eigenvalue of 2.2 explains a substantial portion of the variance. Much lower agreeableness and flexibility and second-lowest feedback rating. Experience and especially interest in fewer types of programs than others Fewer years than average, but not the fewest. Most rural schools, average school poverty.
Eigenvalue of 1.4 explains a substantial amount of the variance, but less than PC1. Highest agreeableness, flexibility, and feedback ratings Interested in more types of programs than the few types of programs they have experience with - the fewest of all groups. Newest to teaching art. A little below average rurality and lowest school poverty level (suburban)
Eigenvalue of 1.2 explains a substantial amount of the variance, but less than PC1 & PC2. Above-average agreeableness and flexibility, lowest feedback ratings. Experienced with average number of types of programs Interested in the greatest number of types of programs Second-least number of years of teaching than than other groups. At schools that are the most poor and second-most rural
Eigenvalue of 0.9 just below the threshold for usefulness in abstraction. About average agreeableness, flexibility, and feedback ratings Interested in fewer types of programs than has experience with Greatest number of years of teaching Working at the most urban schools with above-average poverty
FORA decreases by .45 units with a unit of increase on PC1 (outsider) and increases by .43 units with a unit increase in PC2 (early and open), both significant (p < .001). A considerable amount of the variation in FORA score is explained by PC predictors (Adjusted R-squared of 0.56). A respondent at the mean of both principal components had a FORA score of 4.73 (p < .001), which is a little on the openness=fairness side.
PC1 ~ lower FORA
PC2 ~ higher FORA
PC3 ~ a bit higher FORA
# Load necessary libraries
library(tidyr)
library(dplyr)
library(psych)
library(gridExtra)
library(ggplot2)
library(caret)
library(car)
library(readr)
library(quantreg)
library(tibble)
library(plotly)
library(reshape2)
library(clipr)
library(lavaan)
library(semTools)
Set the date for use in writing filenames
# Get the current date in the format YYYYMMDD
date_suffix <- format(Sys.Date(), "%Y%m%d")
data <- read.csv("asap-aq-responses.csv", colClasses = c(interests = "character", experience = "character"))
data <- data %>%
filter(Q_RelevantIDDuplicateScore < 80,
interests != "")
#still 87 responses okay go with it.
#remove unused columns from survey export
data[, 1:22] <- NULL
data[, 17:20] <- NULL
#16 useful variables remain
#binary variables for eligibility categories that are not mutually exclusive
data$preservice <- ifelse(grepl("1", data$eligible), 1, 0)
data$practicing <- ifelse(grepl("2", data$eligible), 1, 0)
data$researcher <- ifelse(grepl("3", data$eligible), 1, 0)
data$eligible <- NULL
#binary variables for race and ethnicity
data$raceWhite <- ifelse(grepl("1", data$race), 1, 0)
data$raceBlack <- ifelse(grepl("2", data$race), 1, 0)
data$raceNative <- ifelse(grepl("3", data$race), 1, 0)
data$raceAsian <- ifelse(grepl("4", data$race), 1, 0)
data$racePacific <- ifelse(grepl("5", data$race), 1, 0)
data$raceOther <- ifelse(grepl("6", data$race), 1, 0)
data$raceDecline <- ifelse(grepl("7", data$race), 1, 0)
data$race <- NULL
data$ethnHispanic <- ifelse(grepl("1", data$ethnicity), 1, 0)
data$ethnicity <- NULL
#binary variable for those with graduate degree
data$grad_degree <- ifelse(is.na(data$education), 0, ifelse(data$education > 5, 1, 0))
#binary variable for practicing teachers
data$teaching <- ifelse(is.na(data$school_type), 0, 1)
#binar variable for above-the-median years of experience
data$teach_experience <- ifelse(data$years > median(data$years, na.rm = TRUE), 1, 0)
#minimum grade level currently taught
data$grade_min <- sapply(data$grade_levels, function(x) {
# Split the string into a list of character numbers
l <- strsplit(x, ",")[[1]]
# Check if the list is empty or contains only NA values
if (all(is.na(l)) || length(l) == 0) {
return(NA) # Return NA if no valid numbers
} else {
return(min(as.numeric(l), na.rm = TRUE))
}
})
#maximum grade level currently taught
data$grade_max <- sapply(data$grade_levels, function(x) {
l <- strsplit(x, ",")[[1]]
# Check if the list is empty or contains only NA values
if (all(is.na(l)) || length(l) == 0) {
return(NA) # Return NA if no valid numbers
} else {
return(max(as.numeric(l), na.rm = TRUE))
}
})
#grade_range
data$grade_range <- sapply(data$grade_levels, function(x) {
l <- strsplit(x, ",")[[1]]
# Check if the list is empty or contains only NA values
if (all(is.na(l)) || length(l) == 0) {
return(NA) # Return NA if no valid numbers
} else {
return(diff(range(as.numeric(l), na.rm = TRUE)))
}
})
#Create a high school code and an elementary school code
data$hs_grades <- ifelse(is.na(data$grade_min), 0, ifelse(data$grade_min > 8, 1, 0))
data$es_grades <- ifelse(is.na(data$grade_min), 0, ifelse(data$grade_max < 7, 1, 0))
# Function to calculate the mean of the comma-separated numbers in each record of grade_levels
calculate_mean <- function(x) {
num_values <- as.numeric(unlist(strsplit(x, ","))) # Split, unlist, and convert to numeric
mean(num_values) # Calculate mean
}
# Apply the function to each element of data$grade_levels
data$grade_levels_M <- ifelse(data$grade_levels == "", NA, sapply(data$grade_levels, calculate_mean))
#remove the grade levels variable
data$grade_levels <- NULL
#simplify school type into four categories
data$school_type_cat <- ifelse(is.na(data$school_type), "None",
ifelse(data$school_type < 10, "Public",
ifelse(data$school_type == 10, "BIE",
ifelse(data$school_type < 30, "Private",
ifelse(data$school_type == 30, "MusOrg", data$school_type)))))
#binary variable for school poverty at 50% or greater as reported
data$school_poverty_high <- ifelse(is.na(data$school_poverty), 0, ifelse(data$school_poverty > 2, 1, 0))
#binary variable for publicly funded schools including BIE-funded
data$school_pub <- ifelse(is.na(data$school_type), 0, ifelse(data$school_type < 20, 1, 0))
#35 variables now
Number of interests and number of experience categories are also of interest. Create new variables for each and one for the difference between interests and experience categories.
data$inter_n <- sapply(data$interests, function(x) {
l <- strsplit(x, ",")[[1]]
length(l)
})
data$exper_n <- sapply(data$experience, function(x) {
l <- strsplit(x, ",")[[1]]
length(l)
})
data$IEdiff <- data$inter_n - data$exper_n
Create binary variables for the presence of each choice in EHDP and FORA scores.
all_items <- unlist(strsplit("1.1,1.2,2.1,2.2,3,4,5,6,7,8,9,10.1,10.2,11.1,11.2", ","))
# Create binary variables for each unique value in EHDP
for (val in all_items) {
data[[paste0("EHDP_",gsub("\\.", "_", val))]] <- sapply(data$EHDP, function(x) as.integer(val %in% strsplit(x, ",")[[1]]))
}
# Create binary variables for each unique value in FORA
for (val in all_items) {
data[[paste0("FORA_",gsub("\\.", "_", val))]] <- sapply(data$FORA, function(x) as.integer(val %in% strsplit(x, ",")[[1]]))
}
Next, find the n, mean, and SD of both score variables for each respondent.
#EHDP Mean
data$EHDP_M <- sapply(data$EHDP, function(x) {
# Split the string into a list of character numbers
l <- strsplit(x, ",")[[1]]
# Convert the character list to numeric, trim decimals, and calculate the mean
mean(trunc(as.numeric(l)), na.rm = TRUE)
})
#FORA Mean
data$FORA_M <- sapply(data$FORA, function(x) {
l <- strsplit(x, ",")[[1]]
mean(trunc(as.numeric(l)), na.rm = TRUE)
})
#EHDP n
data$EHDP_n <- sapply(data$EHDP, function(x) {
# Split the string into a list of character numbers
l <- strsplit(x, ",")[[1]]
# Count the n of items in the list
length(l)
})
#FORA n
data$FORA_n <- sapply(data$FORA, function(x) {
l <- strsplit(x, ",")[[1]]
length(l)
})
#EHDP standard deviation
data$EHDP_SD <- sapply(data$EHDP, function(x) {
# Split the string into a list of character numbers
l <- strsplit(x, ",")[[1]]
# Convert the character list to numeric, trim decimals, and calculate the SD
sd(trunc(as.numeric(l)), na.rm = TRUE)
})
#FORA stamdard devitation
data$FORA_SD <- sapply(data$FORA, function(x) {
l <- strsplit(x, ",")[[1]]
sd(trunc(as.numeric(l)), na.rm = TRUE)
})
#Okay to remove EHDP and FORA variables which were comma-separated lists of type char
data[, 1:2] <- NULL
#standardize scores
data$EHDP_M_z <- scale(data$EHDP_M)
data$FORA_M_z <- scale(data$FORA_M)
#combine flexibility in thinking about both constructs into one variable
#add FORA_SD and EHDP_SD
data$flexibility <- data$FORA_SD + data$EHDP_SD
#combine n for both constructs into one variable -- agreeableness
data$agreeableness <- data$FORA_n + data$EHDP_n
Binary variables for experience and interest are:
art club
honor society
award
AP
IB
GT
magnet
postsec credit
mentorship/internship
other
0 = none of the above.
#here is a list of all the interests and experience codes
all_items <- c(0,1,2,3,4,5,6,7,8,9,10)
# Create binary variables for each unique value in interests
for (val in all_items) {
data[[paste0("inter_", val)]] <- sapply(data$interests, function(x) as.integer(val %in% strsplit(x, ",")[[1]]))
}
# Create binary variables for each unique value in experience
for (val in all_items) {
data[[paste0("exper_", val)]] <- sapply(data$experience, function(x) as.integer(val %in% strsplit(x, ",")[[1]]))
}
hist(data$EHDP_M)
hist(data$FORA_M)
Leptokuric distribution of EHDP scores. Platykuric distribution of FORA scores.
Describe score variables
describe(data$EHDP_M)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 87 6.41 0.83 6.36 6.39 0.46 3.33 10.5 7.17 0.86 7.21 0.09
describe(data$FORA_M)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 87 4.75 1.15 4.92 4.83 1.19 1.33 7.08 5.75 -0.76 0.65 0.12
hist(data$EHDP_SD)
hist(data$FORA_SD)
ordinal_variables <- c('education','years', 'school_poverty', 'locale')
nominal_variables <-c('school_type_cat')
binary_group_variables <- c('preservice',
'practicing',
'researcher',
'raceWhite',
'raceBlack',
'raceNative',
'raceAsian',
'racePacific',
'raceOther',
'ethnHispanic',
'grad_degree',
'teaching',
'teach_experience',
'hs_grades',
'es_grades',
'school_poverty_high',
'school_pub'
)
binary_exper_variables <- grep("^exper_[0-9]", colnames(data), value = TRUE)
continuous_variables <-c('grade_range',
'grade_levels_M',
'inter_n',
'exper_n',
'IEdiff',
'EHDP_M',
'FORA_M',
'flexibility',
'agreeableness',
'feedback_1'
)
Count the number of each experience variable.
# Use lapply to sum each column specified in binary_exper_variables
sums_list <- lapply(binary_exper_variables, function(var) {
sum(data[[var]], na.rm = TRUE)
})
# Combine the results into a dataframe
sums_df <- data.frame(
Variable = binary_exper_variables,
Count = unlist(sums_list)
)
names <- c("None", "Art Club", "Honor Society", "Awards", "AP", "IB","Gifted & Talented","Magnet School", "Dual Credit", "Mentorship/Internship", "Other")
sums_df$Label <- names
sums_df$Proportion <- round(sums_df$Count / nrow(data),2)
sums_df <- sums_df %>% arrange(desc(Proportion))
# Display the resulting dataframe
print(sums_df)
## Variable Count Label Proportion
## 1 exper_1 65 Art Club 0.75
## 2 exper_3 54 Awards 0.62
## 3 exper_9 38 Mentorship/Internship 0.44
## 4 exper_4 37 AP 0.43
## 5 exper_2 31 Honor Society 0.36
## 6 exper_6 30 Gifted & Talented 0.34
## 7 exper_8 22 Dual Credit 0.25
## 8 exper_5 10 IB 0.11
## 9 exper_7 8 Magnet School 0.09
## 10 exper_0 6 None 0.07
## 11 exper_10 5 Other 0.06
write.csv(sums_df, paste0("Experience Counts ", date_suffix, ".csv"))
The table above counts the frequency of each type of experience selected by respondents. Greatest proportion are Art Club & Awards > 50%. Next areMentorship & AP > 40%, then Honor Society and G&T > 30%.
First the experience groups.
# List of grouping variables
group_vars <- binary_exper_variables
# Initialize an empty list to store plots
plot_list <- list()
# Loop through each grouping variable and create plots
for (i in seq_along(group_vars)) {
group_var <- group_vars[i]
# Create the plot for each grouping variable
p <- ggplot(data, aes_string(x = paste0("as.factor(", group_var, ")"), y = "FORA_M")) +
geom_boxplot() +
labs(x = group_var,
y = "FORA") +
theme_minimal()
# Store the plot in the list
plot_list[[i]] <- p
}
## Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
## ℹ Please use tidy evaluation idioms with `aes()`.
## ℹ See also `vignette("ggplot2-in-packages")` for more information.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
# Arrange all plots in a grid (2 rows by 5 columns as an example)
plot_grid <- grid.arrange(grobs = plot_list, ncol = 4)
Next other grouping variables
# List of grouping variables
group_vars <- binary_group_variables
# Initialize an empty list to store plots
plot_list <- list()
# Loop through each grouping variable and create plots
for (i in seq_along(group_vars)) {
group_var <- group_vars[i]
# Create the plot for each grouping variable
p <- ggplot(data, aes_string(x = paste0("as.factor(", group_var, ")"), y = "FORA_M")) +
geom_boxplot() +
labs(x = group_var,
y = "FORA") +
theme_minimal()
# Store the plot in the list
plot_list[[i]] <- p
}
# Arrange all plots in a grid (2 rows by 5 columns as an example)
plot_grid <- grid.arrange(grobs = plot_list, ncol = 4)
First the experience groups
# List of grouping variables
group_vars <- binary_exper_variables
# Initialize an empty list to store plots
plot_list <- list()
group_count <- data.frame()
# Loop through each grouping variable and create plots
for (i in seq_along(group_vars)) {
group_var <- group_vars[i]
# Create the plot for each grouping variable
p <- ggplot(data, aes_string(x = paste0("as.factor(", group_var, ")"), y = "EHDP_M")) +
geom_boxplot() +
labs(x = group_var,
y = "EHDP") +
theme_minimal()
# Store the plot in the list
plot_list[[i]] <- p
}
# Arrange all plots in a grid
plot_grid <- grid.arrange(grobs = plot_list, ncol = 6)
Then the other grouping variables
# List of grouping variables
group_vars <- binary_group_variables
# Initialize an empty list to store plots
plot_list <- list()
# Loop through each grouping variable and create plots
for (i in seq_along(group_vars)) {
group_var <- group_vars[i]
# Create the plot for each grouping variable
p <- ggplot(data, aes_string(x = paste0("as.factor(", group_var, ")"), y = "EHDP_M")) +
geom_boxplot() +
labs(x = group_var,
y = "EHDP") +
theme_minimal()
# Store the plot in the list
plot_list[[i]] <- p
}
# Arrange all plots in a grid
plot_grid <- grid.arrange(grobs = plot_list, ncol = 6)
outcomes <- continuous_variables
groups <- binary_exper_variables
# Initialize a list to store results
results_list <- list()
# Loop over each outcome variable
for (outcome in outcomes) {
# Initialize a list to store results for the current outcome
outcome_results_list <- list()
# Loop over each group variable
for (group in groups) {
# Perform a t-test comparing the means of the current outcome variable grouping variable
t_test_result <- t.test(data[[outcome]] ~ data[[group]])
# Create a dataframe with the results for this outcome variable for all groups
results <- data.frame(
Outcome = outcome,
Group = group,
P_Value = round(t_test_result$p.value, 3)
)
# Append the result to the outcome's results list
outcome_results_list[[group]] <- results
}
# Combine all results for this outcome into a single dataframe
outcome_results_df <- do.call(rbind, outcome_results_list)
# Append the combined results to the main results list
results_list[[outcome]] <- outcome_results_df
}
# Combine all results into a single data frame
final_results <- do.call(rbind, results_list)
significant_results_experience <- final_results %>%
filter(P_Value < .05)
print(significant_results_experience)
## Outcome Group P_Value
## grade_range.exper_6 grade_range exper_6 0.010
## grade_range.exper_9 grade_range exper_9 0.047
## grade_levels_M.exper_2 grade_levels_M exper_2 0.001
## grade_levels_M.exper_3 grade_levels_M exper_3 0.029
## grade_levels_M.exper_4 grade_levels_M exper_4 0.000
## inter_n.exper_2 inter_n exper_2 0.017
## inter_n.exper_4 inter_n exper_4 0.048
## exper_n.exper_0 exper_n exper_0 0.000
## exper_n.exper_1 exper_n exper_1 0.001
## exper_n.exper_2 exper_n exper_2 0.000
## exper_n.exper_3 exper_n exper_3 0.000
## exper_n.exper_4 exper_n exper_4 0.000
## exper_n.exper_6 exper_n exper_6 0.000
## exper_n.exper_7 exper_n exper_7 0.020
## exper_n.exper_8 exper_n exper_8 0.001
## exper_n.exper_9 exper_n exper_9 0.000
## IEdiff.exper_0 IEdiff exper_0 0.010
## IEdiff.exper_3 IEdiff exper_3 0.003
## IEdiff.exper_4 IEdiff exper_4 0.014
## IEdiff.exper_7 IEdiff exper_7 0.043
## IEdiff.exper_8 IEdiff exper_8 0.009
## IEdiff.exper_9 IEdiff exper_9 0.040
## flexibility.exper_10 flexibility exper_10 0.011
## agreeableness.exper_2 agreeableness exper_2 0.013
## feedback_1.exper_9 feedback_1 exper_9 0.029
outcomes <- continuous_variables
groups <- c('preservice',
'researcher',
'raceWhite',
'raceNative',
'raceOther',
'ethnHispanic',
'grad_degree',
'hs_grades',
'es_grades',
'school_poverty_high',
'school_pub'
)
#groups <- binary_group_variables
# Initialize a list to store results
results_list <- list()
# Loop over each outcome variable
for (outcome in outcomes) {
# Initialize a list to store results for the current outcome
outcome_results_list <- list()
# Loop over each group variable
for (group in groups) {
# Perform a t-test comparing the means of the current outcome variable grouping variable
t_test_result <- t.test(data[[outcome]] ~ as.factor(data[[group]]))
# Create a dataframe with the results for this outcome variable for all groups
results <- data.frame(
Outcome = outcome,
Group = group,
P_Value = round(t_test_result$p.value, 3)
)
# Append the result to the outcome's results list
outcome_results_list[[group]] <- results
}
# Combine all results for this outcome into a single dataframe
outcome_results_df <- do.call(rbind, outcome_results_list)
# Append the combined results to the main results list
results_list[[outcome]] <- outcome_results_df
}
# Combine all results into a single data frame
final_results <- do.call(rbind, results_list)
significant_results_group <- significant_results_group <- final_results[final_results$P_Value < 0.05, ]
print(significant_results_group)
## Outcome Group P_Value
## grade_range.hs_grades grade_range hs_grades 0.000
## grade_levels_M.raceWhite grade_levels_M raceWhite 0.030
## grade_levels_M.raceNative grade_levels_M raceNative 0.037
## grade_levels_M.hs_grades grade_levels_M hs_grades 0.000
## grade_levels_M.es_grades grade_levels_M es_grades 0.000
## exper_n.es_grades exper_n es_grades 0.011
## IEdiff.raceNative IEdiff raceNative 0.026
## FORA_M.preservice FORA_M preservice 0.001
## flexibility.raceWhite flexibility raceWhite 0.011
## agreeableness.preservice agreeableness preservice 0.018
Use this dataframe filtered by P-value <.05 to examine the any linear relationship between the outcome variable and the group belonging
# Step 1: Filter the results to keep only rows with p-value < 0.05
significant_results <- rbind(significant_results_experience, significant_results_group)
# Initialize a list to store regression results
regression_results_list <- list()
# Step 2: Perform linear regression for each significant outcome and exper variable combination
for (i in 1:nrow(significant_results)) {
# Extract the outcome variable and exper variable names
outcome_var <- significant_results$Outcome[i]
exper_var <- significant_results$Group[i]
# Create a linear model formula dynamically
formula <- as.formula(paste(outcome_var, "~", exper_var))
# Fit the linear model using lm()
model <- lm(formula, data = data)
# Store the summary of the model results
model_summary <- summary(model)
# Extract coefficients, R-squared, and p-value
coef <- model_summary$coefficients
r_squared <- model_summary$r.squared
p_value <- round(coef[2, 4], 3) # p-value for the slope (exper_n variable)
# Create a dataframe for the regression results
regression_result <- data.frame(
Outcome = outcome_var,
Group = exper_var,
Coefficient = coef[2, 1], # slope coefficient
R_Squared = r_squared,
P_Value = p_value
)
# Add the result to the list
regression_results_list[[i]] <- regression_result
}
# Combine all regression results into one dataframe
all_regression_results <- do.call(rbind, regression_results_list)
all_regression_results <- all_regression_results %>%
arrange(Outcome, desc(R_Squared))
print(all_regression_results)
## Outcome Group Coefficient R_Squared P_Value
## 1 FORA_M preservice 0.9633753 0.031134439 0.102
## 2 IEdiff exper_3 -1.6919192 0.101365876 0.003
## 3 IEdiff exper_7 -2.8022152 0.098615434 0.003
## 4 IEdiff exper_8 -1.7174825 0.083819590 0.007
## 5 IEdiff exper_0 2.7716049 0.074186310 0.011
## 6 IEdiff exper_4 -1.3648649 0.068481913 0.014
## 7 IEdiff exper_9 -1.1654135 0.050253306 0.037
## 8 IEdiff raceNative -1.2976190 0.008431791 0.398
## 9 agreeableness exper_2 2.6877880 0.060950714 0.021
## 10 agreeableness preservice 5.6596386 0.051683810 0.034
## 11 exper_n exper_4 2.3918919 0.431828708 0.000
## 12 exper_n exper_9 2.1186896 0.341013072 0.000
## 13 exper_n exper_2 2.1532258 0.328386702 0.000
## 14 exper_n exper_3 2.0050505 0.292290716 0.000
## 15 exper_n exper_6 1.9578947 0.267443149 0.000
## 16 exper_n exper_8 1.6804196 0.164751051 0.000
## 17 exper_n exper_7 2.4588608 0.155898393 0.000
## 18 exper_n exper_0 -2.7037037 0.144947640 0.000
## 19 exper_n exper_1 1.4223776 0.118038192 0.001
## 20 exper_n es_grades -1.1546218 0.064726659 0.017
## 21 feedback_1 exper_9 -10.0069686 0.065270744 0.026
## 22 flexibility raceWhite -0.5543625 0.070013930 0.013
## 23 flexibility exper_10 0.5097435 0.019541704 0.197
## 24 grade_levels_M hs_grades 5.5528205 0.685449277 0.000
## 25 grade_levels_M es_grades -5.9377204 0.572440936 0.000
## 26 grade_levels_M exper_4 2.7946886 0.178256089 0.000
## 27 grade_levels_M exper_2 2.3974224 0.126258099 0.002
## 28 grade_levels_M exper_3 1.7676104 0.066679507 0.025
## 29 grade_levels_M raceWhite 1.9251282 0.054925764 0.043
## 30 grade_levels_M raceNative -3.5780983 0.045537855 0.066
## 31 grade_range hs_grades -3.1555556 0.211875716 0.000
## 32 grade_range exper_6 2.5298273 0.128512963 0.002
## 33 grade_range exper_9 1.6774194 0.060491370 0.033
## 34 inter_n exper_2 1.2759217 0.062064443 0.020
## 35 inter_n exper_4 1.0270270 0.042852937 0.054
Results of early linear models reveal little explanatory power of single binary groups and outcome variables. Suggest abstraction with PCA may be worthwhile.
# Initialize a list to store descriptive statistics results
descriptive_stats_list <- list()
groups <- c(binary_exper_variables, binary_group_variables)
# Outer loop: iterate over each outcome variable
for (outcome in outcomes) {
# Inner loop: iterate over each experience group variable
for (group in groups) {
# Filter data for exper_n = 1 and calculate descriptive statistics
group_1_stats <- data %>%
filter(get(group) == 1) %>%
summarize(
Outcome = outcome,
Group_Name = group,
Group = 1,
Mean = mean(get(outcome), na.rm = TRUE),
SD = sd(get(outcome), na.rm = TRUE),
Median = median(get(outcome), na.rm = TRUE),
Min = min(get(outcome), na.rm = TRUE),
Max = max(get(outcome), na.rm = TRUE),
N = n()
)
# Filter data for exper_n = 0 and calculate descriptive statistics
group_0_stats <- data %>%
filter(get(group) == 0) %>%
summarize(
Outcome = outcome,
Group_Name = group,
Group = 0,
Mean = mean(get(outcome), na.rm = TRUE),
SD = sd(get(outcome), na.rm = TRUE),
Median = median(get(outcome), na.rm = TRUE),
Min = min(get(outcome), na.rm = TRUE),
Max = max(get(outcome), na.rm = TRUE),
N = n()
)
# Combine statistics for both groups
combined_stats <- bind_rows(group_1_stats, group_0_stats)
# Append the result to the list
descriptive_stats_list[[length(descriptive_stats_list) + 1]] <- combined_stats
}
}
# Combine all descriptive statistics results into one dataframe
final_descriptive_stats <- bind_rows(descriptive_stats_list)
#write this to csv for later reference
write_csv(final_descriptive_stats, paste0("Describe by Group ", date_suffix,".csv"))
Plot
data_clean <- data %>%
filter(EHDP_M > 4, EHDP_M < 8)
model_scores <- lm(EHDP_M ~ FORA_M, data = data_clean)
r_squared_scores <- summary(model_scores)$r.squared
ggplot(data_clean, aes(x = FORA_M, y = EHDP_M)) +
geom_point(size = 3, alpha = 0.7) + # Scatter plot points
geom_smooth(method = "lm", se = FALSE) + # Add trend lines
labs(title = "Scatterplot of Scores with Trend Lines",
x = "FORA_M",
y = "EHDP_M") +
theme_minimal() +
annotate("text", x = max(data_clean$FORA_M) * 0.7, y = max(data_clean$EHDP_M),
label = paste("R²:", round(r_squared_scores, 3)),
color = "blue", hjust = 0)
## `geom_smooth()` using formula = 'y ~ x'
Consider each outlier and whether or not there is justification to remove from the model.
Outliers are 4 > EHDP_M > 8
Even after cleaning out these outliers, the the two variables have minimal explanatory value for each other.
summary(model_scores)
##
## Call:
## lm(formula = EHDP_M ~ FORA_M, data = data_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.42042 -0.22931 0.02374 0.33957 1.18654
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.58265 0.27424 24.003 <2e-16 ***
## FORA_M -0.04807 0.05570 -0.863 0.391
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5632 on 81 degrees of freedom
## Multiple R-squared: 0.00911, Adjusted R-squared: -0.003123
## F-statistic: 0.7447 on 1 and 81 DF, p-value: 0.3907
FORA score not a significant predictor of variance in EHDP score.
model <- lm(EHDP_n ~ abs(EHDP_M_z), data = data)
summary(model)
##
## Call:
## lm(formula = EHDP_n ~ abs(EHDP_M_z), data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.0299 -1.7976 0.7144 1.9693 4.9701
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.1793 0.3705 30.176 < 2e-16 ***
## abs(EHDP_M_z) -2.3352 0.3726 -6.267 1.46e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.679 on 85 degrees of freedom
## Multiple R-squared: 0.316, Adjusted R-squared: 0.308
## F-statistic: 39.28 on 1 and 85 DF, p-value: 1.461e-08
Reasonably strong association at R-squared of .32 with distance from mean of EDHP being a significant predictor of EHDP number. The more extreme your views about EHDP the fewer items you agree with, and more importantly, the more central your views about EHDP the more agreeable you are in selecting more items on the scale.
model <- lm(EHDP_SD ~ abs(EHDP_M_z), data = data)
summary(model)
##
## Call:
## lm(formula = EHDP_SD ~ abs(EHDP_M_z), data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.6791 -0.3103 -0.1255 0.1852 3.2423
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.00332 0.08295 48.26 < 2e-16 ***
## abs(EHDP_M_z) -0.35455 0.08343 -4.25 5.46e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5998 on 85 degrees of freedom
## Multiple R-squared: 0.1752, Adjusted R-squared: 0.1655
## F-statistic: 18.06 on 1 and 85 DF, p-value: 5.457e-05
This reveals significant but small negative relationships between extremity of views about the EHDP variable and breadth of statements endorsed.
This just means that average scores on EHDP are not made by selecting only a few statements in the middle, but by selecting a balance of statements on both sides of the mean.
With more data, this pattern may support the hypothesis that those in the middle have flexible views, endorsing statements on both sides of the continuum. It’s a given that those on the extremes of EHDP score would not have endorsed as many items as it’s the only way to keep your score on the extreme, but it is interesting to know that that’s not common in the middle.
model <- lm(FORA_n ~ abs(FORA_M_z), data = data)
summary(model)
##
## Call:
## lm(formula = FORA_n ~ abs(FORA_M_z), data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.9977 -1.4152 -0.0182 1.4303 6.2044
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.0445 0.4604 21.815 <2e-16 ***
## abs(FORA_M_z) -1.1521 0.4631 -2.488 0.0148 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.625 on 85 degrees of freedom
## Multiple R-squared: 0.06787, Adjusted R-squared: 0.0569
## F-statistic: 6.189 on 1 and 85 DF, p-value: 0.01481
Distance from mean of FORA_M is not a good predictor of number of items endorsed on construct. Significant but minimal explanatory.
model <- lm(FORA_SD ~ abs(FORA_M_z), data = data)
summary(model)
##
## Call:
## lm(formula = FORA_SD ~ abs(FORA_M_z), data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.2027 -0.4397 -0.1036 0.3602 1.3258
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.64613 0.09948 36.651 < 2e-16 ***
## abs(FORA_M_z) -0.69534 0.10006 -6.949 6.98e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5671 on 85 degrees of freedom
## Multiple R-squared: 0.3623, Adjusted R-squared: 0.3548
## F-statistic: 48.29 on 1 and 85 DF, p-value: 6.982e-10
Significant but small explanation of the range of FORA items endorsed by the extremity of FORA views. Like EHDP, this means that scores at the grand mean are not just agreeing to a few middle-scoring items but agreeing to more items on opposite ends of the continuum.
data_subset <- data %>%
select(inter_n,
exper_n,
flexibility,
agreeableness,
years,
school_poverty,
locale,
feedback_1
)
#remove NAs - still 66 obs as of 11/3/24
data_subset <- na.omit(data_subset)
This subset includes the following variables for each of 66 individuals filtered:
col_names <- names(data_subset)
summary <- data.frame()
for (col_name in col_names) {
summary <- rbind(summary, data.frame(describe(data_subset[col_name])))
}
summary <- summary[2:nrow(summary),]
print(summary)
## vars n mean sd median trimmed mad
## exper_n 1 66 3.666667 1.8088103 3.000000 3.574074 1.4826000
## flexibility 1 66 6.802487 0.8430139 7.018548 6.937982 0.5071747
## agreeableness 1 66 19.378788 5.0314859 21.000000 19.574074 5.1891000
## years 1 66 15.621212 5.0801731 19.000000 16.537037 0.0000000
## school_poverty 1 66 2.166667 1.3540064 2.000000 2.203704 1.4826000
## locale 1 66 2.348485 0.9998834 2.000000 2.314815 0.0000000
## feedback_1 1 66 73.636364 19.6614706 80.000000 75.000000 14.8260000
## min max range skew kurtosis se
## exper_n 1.000000 9.000000 8.000000 0.49440656 -0.08015521 0.2226492
## flexibility 3.137345 7.910654 4.773309 -2.15063142 5.89958990 0.1037678
## agreeableness 7.000000 29.000000 22.000000 -0.37332793 -0.66149077 0.6193331
## years 3.000000 19.000000 16.000000 -1.31177570 0.35674953 0.6253261
## school_poverty 0.000000 4.000000 4.000000 0.03119674 -1.20730028 0.1666667
## locale 1.000000 4.000000 3.000000 0.54970941 -0.84566509 0.1230771
## feedback_1 0.000000 100.000000 100.000000 -1.01923596 1.37918843 2.4201597
Examine correlations between this subset of variables
cor_matrix <- cor(data_subset)
print(cor_matrix)
## inter_n exper_n flexibility agreeableness years
## inter_n 1.00000000 0.419594726 0.26540763 0.45373522 0.08844577
## exper_n 0.41959473 1.000000000 0.07904636 0.17636826 0.40293134
## flexibility 0.26540763 0.079046361 1.00000000 0.54225915 -0.12672422
## agreeableness 0.45373522 0.176368259 0.54225915 1.00000000 0.11945548
## years 0.08844577 0.402931336 -0.12672422 0.11945548 1.00000000
## school_poverty 0.15376824 0.067004137 -0.09349146 0.01091482 -0.06225190
## locale -0.06089231 -0.164456473 -0.23009648 -0.19177522 -0.12807804
## feedback_1 0.11125273 -0.008651823 0.10845763 0.21447036 0.07869294
## school_poverty locale feedback_1
## inter_n 0.15376824 -0.06089231 0.111252725
## exper_n 0.06700414 -0.16445647 -0.008651823
## flexibility -0.09349146 -0.23009648 0.108457629
## agreeableness 0.01091482 -0.19177522 0.214470358
## years -0.06225190 -0.12807804 0.078692939
## school_poverty 1.00000000 0.02462117 -0.144474080
## locale 0.02462117 1.00000000 -0.182835998
## feedback_1 -0.14447408 -0.18283600 1.000000000
How high are the max correlations?
# Find pairs with correlation above 0.3
high_corr_pairs <- which(abs(cor_matrix) > 0.3 & upper.tri(cor_matrix), arr.ind = TRUE)
# Create a dataframe of these pairs with their correlation coefficients
high_cor_list <- data.frame(
Var1 = rownames(cor_matrix)[high_corr_pairs[, 1]],
Var2 = colnames(cor_matrix)[high_corr_pairs[, 2]],
Correlation = cor_matrix[high_corr_pairs]
)
# Print the list
print(high_cor_list)
## Var1 Var2 Correlation
## 1 inter_n exper_n 0.4195947
## 2 inter_n agreeableness 0.4537352
## 3 flexibility agreeableness 0.5422592
## 4 exper_n years 0.4029313
Highest is flexibility and agreeableness which is at 0.54, still not too strong.
These are just the non-score variables that preliminary analysis has shown may be of somevalue in distinguishing individuals.
data_standardized <- scale(data_subset)
pca_result <- prcomp(data_standardized, center = TRUE, scale. = TRUE)
# Summary of PCA results
summary(pca_result)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 1.4871 1.1802 1.1068 0.9474 0.9261 0.79554 0.67367
## Proportion of Variance 0.2764 0.1741 0.1531 0.1122 0.1072 0.07911 0.05673
## Cumulative Proportion 0.2764 0.4505 0.6037 0.7159 0.8231 0.90217 0.95890
## PC8
## Standard deviation 0.5734
## Proportion of Variance 0.0411
## Cumulative Proportion 1.0000
# View the loadings (coefficients) of the original variables on the principal components
print(pca_result$rotation)
## PC1 PC2 PC3 PC4 PC5
## inter_n -0.471650124 -0.14594319 0.33795308 0.21339357 0.22546797
## exper_n -0.377112630 -0.53607205 -0.02160338 0.08038653 -0.15787730
## flexibility -0.416871326 0.45782162 0.14902993 0.06230727 -0.34398719
## agreeableness -0.526363718 0.22460559 0.13019678 0.13687767 0.02196996
## years -0.213657961 -0.54441097 -0.42690411 0.09141006 -0.03642895
## school_poverty -0.009102141 -0.26038916 0.60421907 -0.56656538 0.32656601
## locale 0.294298875 -0.08367648 0.32637284 0.76833144 0.29960951
## feedback_1 -0.228357961 0.24513561 -0.43853490 -0.07548624 0.77954525
## PC6 PC7 PC8
## inter_n 0.3817902 0.49219911 0.3970366694
## exper_n 0.4071517 -0.48995582 -0.3643208857
## flexibility -0.1234681 -0.50352646 0.4510156134
## agreeableness -0.4526083 0.24584206 -0.6091236360
## years -0.5754286 0.05775334 0.3627443125
## school_poverty -0.2958805 -0.22063754 0.0562677782
## locale -0.1975636 -0.28414210 0.0001722078
## feedback_1 0.1032975 -0.26637008 0.0208796348
pca_result_rotation_df <- data.frame(pca_result$rotation)
Scree plot to examine PCs
screeplot(pca_result, type = "lines")
Biplot of PCs will show the directionality of components on axis that are the first two PCs, also locating each individual on the plot.
biplot(pca_result, scale = 0)
Loadings of PCs
print(pca_result$rotation[, 1:5]) # Show loadings for the first 5 components
## PC1 PC2 PC3 PC4 PC5
## inter_n -0.471650124 -0.14594319 0.33795308 0.21339357 0.22546797
## exper_n -0.377112630 -0.53607205 -0.02160338 0.08038653 -0.15787730
## flexibility -0.416871326 0.45782162 0.14902993 0.06230727 -0.34398719
## agreeableness -0.526363718 0.22460559 0.13019678 0.13687767 0.02196996
## years -0.213657961 -0.54441097 -0.42690411 0.09141006 -0.03642895
## school_poverty -0.009102141 -0.26038916 0.60421907 -0.56656538 0.32656601
## locale 0.294298875 -0.08367648 0.32637284 0.76833144 0.29960951
## feedback_1 -0.228357961 0.24513561 -0.43853490 -0.07548624 0.77954525
They include:
| Variable | M |
|---|---|
| Agreeableness | 18.9 |
| Experience with advanced programs | 3.5 |
| Flexibility | 6.9 |
| Interests in advanced programs | 5.5 |
| Locale rurality | 2.3 |
| School poverty level | 2.2 |
| Years of teaching art | 15.8 |
# Convert the PCA rotation matrix to a data frame for easier manipulation
rotation_df <- as.data.frame(pca_result$rotation[, 1:4])
# Ensure row names (variables) are added as a column to the data frame
rotation_df$Variable <- rownames(rotation_df)
# Reshape the data for ggplot2 (using 'melt' function)
rotation_melt <- melt(rotation_df, id.vars = "Variable", variable.name = "PC", value.name = "Loading")
# Plot
ggplot(rotation_melt, aes(x = Variable, y = Loading, fill = PC)) +
geom_bar(stat = "identity", position = "dodge") +
labs(x = "Variables", y = "Contribution to PC", title = "PCA Component Contributions") +
theme_minimal() +
scale_fill_discrete(name = "Principal Components") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
eigenvalues <- (pca_result$sdev)^2
print(eigenvalues)
## [1] 2.2113382 1.3928608 1.2250429 0.8976274 0.8576443 0.6328768 0.4538251
## [8] 0.3287845
Principal components analysis was conducted to abstract from the scaled continuous and ordinal variables.
Eigenvalue of 2.2 explains a substantial portion of the variance. Much lower agreeableness and flexibility and second-lowest feedback rating. Experience and especially interest in fewer types of programs than others Fewer years than average, but not the fewest. Most rural schools, average school poverty.
Eigenvalue of 1.4 explains a substantial amount of the variance, but less than PC1. Highest agreeableness, flexibility, and feedback ratings Interested in more types of programs than the few types of programs they have experience with - the fewest of all groups. Newest to teaching art. A little below average rurality and lowest school poverty level (suburban)
Eigenvalue of 1.2 explains a substantial amount of the variance, but less than PC1 & PC2. Above-average agreeableness and flexibility, lowest feedback ratings. Experienced with average number of types of programs Interested in the greatest number of types of programs Second-least number of years of teaching than than other groups. At schools that are the most poor and second-most rural
Eigenvalue of 0.9 just below the threshold for usefulness in abstraction. About average agreeableness, flexibility, and feedback ratings Interested in fewer types of programs than has experience with Greatest number of years of teaching Working at the most urban schools with above-average poverty
First prepare the dataset
# Convert row names to a column called "Row_Name"
data <- data %>%
tibble::rownames_to_column(var = "Row_Name")
data_subset <- data_subset %>%
tibble::rownames_to_column(var = "Row_Name")
# Now perform the semi_join on the new "Row_Name" column
data_clean <- data %>%
semi_join(data_subset, by = "Row_Name")
data_clean$PC1 <- pca_result$x[,1]
data_clean$PC2 <- pca_result$x[,2]
data_clean$PC3 <- pca_result$x[,3]
data_clean$PC4 <- pca_result$x[,4]
# Combine all predictor vectors into one
all_predictors <- c(binary_exper_variables, binary_group_variables)
# Create the formula as a string
predictors_formula <- paste(all_predictors, collapse = " + ")
# Create the full formula for the linear model
model_formula <- as.formula(paste("PC1 ~", predictors_formula))
model <- lm(model_formula, data = data_clean)
summary(model)
##
## Call:
## lm(formula = model_formula, data = data_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.6055 -0.5787 -0.0665 0.5828 3.4923
##
## Coefficients: (5 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.08243 1.12644 1.849 0.0716 .
## exper_0 -1.86795 0.98786 -1.891 0.0655 .
## exper_1 -0.79137 0.53274 -1.485 0.1449
## exper_2 -0.94256 0.41454 -2.274 0.0282 *
## exper_3 -0.04742 0.51875 -0.091 0.9276
## exper_4 -0.08206 0.45647 -0.180 0.8582
## exper_5 0.48457 0.69467 0.698 0.4893
## exper_6 -0.89593 0.43743 -2.048 0.0468 *
## exper_7 -0.11603 0.75372 -0.154 0.8784
## exper_8 -0.45093 0.46122 -0.978 0.3338
## exper_9 -0.21517 0.45073 -0.477 0.6356
## exper_10 -0.38811 0.77372 -0.502 0.6186
## preservice -1.36554 1.07717 -1.268 0.2119
## practicing NA NA NA NA
## researcher -0.16535 0.59978 -0.276 0.7841
## raceWhite -0.50569 0.69019 -0.733 0.4678
## raceBlack NA NA NA NA
## raceNative -1.25929 1.04505 -1.205 0.2349
## raceAsian -0.58047 1.30522 -0.445 0.6588
## racePacific NA NA NA NA
## raceOther -1.61449 1.24443 -1.297 0.2016
## ethnHispanic -0.11670 0.59315 -0.197 0.8450
## grad_degree -0.62189 0.40062 -1.552 0.1281
## teaching NA NA NA NA
## teach_experience NA NA NA NA
## hs_grades 0.21954 0.47698 0.460 0.6477
## es_grades 0.75582 0.52100 1.451 0.1543
## school_poverty_high 0.24739 0.37187 0.665 0.5095
## school_pub 0.36747 0.68875 0.534 0.5965
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.273 on 42 degrees of freedom
## Multiple R-squared: 0.5265, Adjusted R-squared: 0.2672
## F-statistic: 2.03 on 23 and 42 DF, p-value: 0.02281
The “Outsider Skeptic” not likely to have experience in Honor Society and Gifted & Talented programs.
model <- lm(PC1 ~ exper_2 + exper_6, data = data_clean)
summary(model)
##
## Call:
## lm(formula = PC1 ~ exper_2 + exper_6, data = data_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.2204 -0.9533 -0.1760 0.8984 3.4324
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.8426 0.2208 3.817 0.000311 ***
## exper_2 -1.2644 0.3202 -3.948 0.000201 ***
## exper_6 -0.8948 0.3273 -2.734 0.008118 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.252 on 63 degrees of freedom
## Multiple R-squared: 0.3126, Adjusted R-squared: 0.2908
## F-statistic: 14.33 on 2 and 63 DF, p-value: 7.445e-06
model_formula <- as.formula(paste("PC2 ~", predictors_formula))
model <- lm(model_formula, data = data_clean)
summary(model)
##
## Call:
## lm(formula = model_formula, data = data_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.05284 -0.49795 -0.06932 0.51223 1.55966
##
## Coefficients: (5 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.56907 0.78594 3.269 0.002159 **
## exper_0 0.05623 0.68925 0.082 0.935369
## exper_1 -0.63510 0.37171 -1.709 0.094911 .
## exper_2 0.06430 0.28924 0.222 0.825160
## exper_3 -0.47701 0.36194 -1.318 0.194672
## exper_4 -0.95640 0.31849 -3.003 0.004491 **
## exper_5 -0.50234 0.48469 -1.036 0.305940
## exper_6 -0.03623 0.30521 -0.119 0.906067
## exper_7 0.11492 0.52589 0.219 0.828084
## exper_8 0.07832 0.32180 0.243 0.808890
## exper_9 -1.15102 0.31449 -3.660 0.000699 ***
## exper_10 -0.42418 0.53984 -0.786 0.436428
## preservice 0.99573 0.75157 1.325 0.192376
## practicing NA NA NA NA
## researcher -0.17142 0.41848 -0.410 0.684172
## raceWhite -0.25307 0.48156 -0.526 0.601994
## raceBlack NA NA NA NA
## raceNative 0.52114 0.72916 0.715 0.478743
## raceAsian 1.36019 0.91069 1.494 0.142759
## racePacific NA NA NA NA
## raceOther -0.06796 0.86827 -0.078 0.937982
## ethnHispanic -0.30714 0.41386 -0.742 0.462132
## grad_degree 0.24231 0.27952 0.867 0.390945
## teaching NA NA NA NA
## teach_experience NA NA NA NA
## hs_grades 0.16097 0.33280 0.484 0.631124
## es_grades -0.16691 0.36351 -0.459 0.648481
## school_poverty_high -0.57952 0.25946 -2.234 0.030897 *
## school_pub -0.66462 0.48056 -1.383 0.173968
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8882 on 42 degrees of freedom
## Multiple R-squared: 0.634, Adjusted R-squared: 0.4336
## F-statistic: 3.163 on 23 and 42 DF, p-value: 0.0005882
model <- lm(PC2 ~ exper_4 + exper_9 + school_poverty_high, data = data_clean)
summary(model)
##
## Call:
## lm(formula = PC2 ~ exper_4 + exper_9 + school_poverty_high, data = data_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.84607 -0.57835 -0.05833 0.46166 2.52490
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.0638 0.1939 5.487 8.04e-07 ***
## exper_4 -0.9120 0.2302 -3.961 0.000195 ***
## exper_9 -0.9785 0.2303 -4.249 7.34e-05 ***
## school_poverty_high -0.6031 0.2357 -2.558 0.012978 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9084 on 62 degrees of freedom
## Multiple R-squared: 0.4349, Adjusted R-squared: 0.4076
## F-statistic: 15.91 on 3 and 62 DF, p-value: 8.833e-08
The Open-Positive-Fresh respondent does not have experience with AP or Mentorship and identifies with a school that is not high poverty.
model_formula <- as.formula(paste("PC3 ~", predictors_formula))
model <- lm(model_formula, data = data_clean)
summary(model)
##
## Call:
## lm(formula = model_formula, data = data_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.41309 -0.45045 0.00631 0.45118 1.87007
##
## Coefficients: (5 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.38877 0.72737 -0.534 0.59582
## exper_0 0.63228 0.63789 0.991 0.32726
## exper_1 0.19020 0.34401 0.553 0.58326
## exper_2 -0.17120 0.26768 -0.640 0.52594
## exper_3 0.43995 0.33497 1.313 0.19618
## exper_4 -0.48560 0.29475 -1.647 0.10692
## exper_5 -0.18927 0.44857 -0.422 0.67522
## exper_6 -0.17524 0.28246 -0.620 0.53835
## exper_7 0.14806 0.48670 0.304 0.76247
## exper_8 -0.01601 0.29782 -0.054 0.95738
## exper_9 0.04153 0.29105 0.143 0.88722
## exper_10 1.12432 0.49961 2.250 0.02972 *
## preservice -0.07474 0.69555 -0.107 0.91494
## practicing NA NA NA NA
## researcher -0.41844 0.38729 -1.080 0.28612
## raceWhite 0.76314 0.44567 1.712 0.09421 .
## raceBlack NA NA NA NA
## raceNative 0.28408 0.67482 0.421 0.67591
## raceAsian 1.88087 0.84282 2.232 0.03103 *
## racePacific NA NA NA NA
## raceOther 1.03191 0.80356 1.284 0.20612
## ethnHispanic -0.33682 0.38302 -0.879 0.38419
## grad_degree -0.73834 0.25869 -2.854 0.00667 **
## teaching NA NA NA NA
## teach_experience NA NA NA NA
## hs_grades -0.24325 0.30800 -0.790 0.43409
## es_grades -0.34118 0.33642 -1.014 0.31632
## school_poverty_high 1.06512 0.24013 4.436 6.5e-05 ***
## school_pub -0.14112 0.44474 -0.317 0.75258
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.822 on 42 degrees of freedom
## Multiple R-squared: 0.6436, Adjusted R-squared: 0.4484
## F-statistic: 3.297 on 23 and 42 DF, p-value: 0.0003892
The Cautious/Mid-career respondent is more likely to enter “other” for experience, have no graduate degree, and identify with a high-poverty school
model <- lm(PC3 ~ exper_10 + grad_degree + school_poverty_high, data = data_clean)
summary(model)
##
## Call:
## lm(formula = PC3 ~ exper_10 + grad_degree + school_poverty_high,
## data = data_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.50732 -0.48404 -0.01803 0.51646 1.96670
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.0203 0.2129 -0.095 0.924315
## exper_10 1.3054 0.3557 3.670 0.000506 ***
## grad_degree -0.7217 0.2204 -3.275 0.001734 **
## school_poverty_high 1.2874 0.1953 6.592 1.09e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7603 on 62 degrees of freedom
## Multiple R-squared: 0.55, Adjusted R-squared: 0.5282
## F-statistic: 25.26 on 3 and 62 DF, p-value: 8.505e-11
model_formula <- as.formula(paste("PC4 ~", predictors_formula))
model <- lm(model_formula, data = data_clean)
summary(model)
##
## Call:
## lm(formula = model_formula, data = data_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.68245 -0.49891 0.01864 0.40548 1.48472
##
## Coefficients: (5 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.858244 0.714448 1.201 0.2364
## exper_0 0.546320 0.626553 0.872 0.3882
## exper_1 -0.038037 0.337896 -0.113 0.9109
## exper_2 -0.338366 0.262927 -1.287 0.2052
## exper_3 -0.025553 0.329017 -0.078 0.9385
## exper_4 -0.161716 0.289516 -0.559 0.5794
## exper_5 -0.497157 0.440599 -1.128 0.2656
## exper_6 0.552367 0.277444 1.991 0.0530 .
## exper_7 -0.079973 0.478053 -0.167 0.8679
## exper_8 0.191294 0.292530 0.654 0.5167
## exper_9 0.255548 0.285878 0.894 0.3765
## exper_10 0.027024 0.490732 0.055 0.9563
## preservice 0.770620 0.683198 1.128 0.2657
## practicing NA NA NA NA
## researcher -0.008833 0.380414 -0.023 0.9816
## raceWhite 0.541757 0.437756 1.238 0.2227
## raceBlack NA NA NA NA
## raceNative -0.476137 0.662827 -0.718 0.4765
## raceAsian 1.923431 0.827845 2.323 0.0251 *
## racePacific NA NA NA NA
## raceOther 0.958515 0.789286 1.214 0.2314
## ethnHispanic -0.005235 0.376211 -0.014 0.9890
## grad_degree -0.742075 0.254096 -2.920 0.0056 **
## teaching NA NA NA NA
## teach_experience NA NA NA NA
## hs_grades -0.174329 0.302529 -0.576 0.5675
## es_grades -0.570475 0.330445 -1.726 0.0916 .
## school_poverty_high -1.210828 0.235859 -5.134 6.89e-06 ***
## school_pub -0.291351 0.436842 -0.667 0.5085
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8074 on 42 degrees of freedom
## Multiple R-squared: 0.5307, Adjusted R-squared: 0.2737
## F-statistic: 2.065 on 23 and 42 DF, p-value: 0.02034
The focused/experienced respondent tends not to identify with high poverty school and does not have a grad degree.
model <- lm(PC4 ~ grad_degree + school_poverty_high, data = data_clean)
summary(model)
##
## Call:
## lm(formula = PC4 ~ grad_degree + school_poverty_high, data = data_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.04413 -0.57356 -0.08624 0.45400 1.91125
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.8106 0.2206 3.675 0.000494 ***
## grad_degree -0.6015 0.2320 -2.592 0.011835 *
## school_poverty_high -0.9759 0.2067 -4.721 1.35e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8048 on 63 degrees of freedom
## Multiple R-squared: 0.3007, Adjusted R-squared: 0.2785
## F-statistic: 13.55 on 2 and 63 DF, p-value: 1.279e-05
model <- lm(data_clean$EHDP_M ~ pca_result$x[, 1] + pca_result$x[, 2])
summary(model)
##
## Call:
## lm(formula = data_clean$EHDP_M ~ pca_result$x[, 1] + pca_result$x[,
## 2])
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.3845 -0.3448 0.0113 0.3991 3.6367
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.42642 0.11036 58.233 <2e-16 ***
## pca_result$x[, 1] 0.10142 0.07478 1.356 0.180
## pca_result$x[, 2] -0.09714 0.09422 -1.031 0.307
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8965 on 63 degrees of freedom
## Multiple R-squared: 0.04404, Adjusted R-squared: 0.01369
## F-statistic: 1.451 on 2 and 63 DF, p-value: 0.242
No PCs are a good predictor of EHDP score.
model <- lm(data_clean$FORA_M ~ pca_result$x[, 1] + pca_result$x[, 2] + pca_result$x[, 3])
summary(model)
##
## Call:
## lm(formula = data_clean$FORA_M ~ pca_result$x[, 1] + pca_result$x[,
## 2] + pca_result$x[, 3])
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.4960 -0.4962 -0.1366 0.3614 2.1034
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.73742 0.09398 50.407 < 2e-16 ***
## pca_result$x[, 1] -0.45196 0.06369 -7.097 1.46e-09 ***
## pca_result$x[, 2] 0.43360 0.08024 5.403 1.10e-06 ***
## pca_result$x[, 3] 0.19593 0.08556 2.290 0.0254 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7635 on 62 degrees of freedom
## Multiple R-squared: 0.5777, Adjusted R-squared: 0.5572
## F-statistic: 28.27 on 3 and 62 DF, p-value: 1.214e-11
FORA decreases by .45 units with a unit of increase on PC1 (outsider) and increases by .43 units with a unit increase in PC2 (early and open), both significant (p < .001). A considerable amount of the variation in FORA score is explained by PC predictors (Adjusted R-squared of 0.56). A respondent at the mean of both principal components had a FORA score of 4.73 (p < .001), which is a little on the openness=fairness side.
PC1 ~ lower FORA
PC2 ~ higher FORA
PC3 ~ a bit higher FORA
Next:
Add PC variables to dataframe.
# Assign the first principal component loadings (PC1) to the data
data_clean$PC1 <- pca_result$x[, 1]
# Assign the second principal component loadings (PC2) to the data
data_clean$PC2 <- pca_result$x[, 2]
# Assign the second principal component loadings (PC2) to the data
data_clean$PC3 <- pca_result$x[, 3]
#change the grouping variable
# Create a 3D scatter plot
plot_ly(data = data_clean,
x = ~data_clean$PC1,
y = ~data_clean$PC2,
z = ~data_clean$PC3,
color = ~as.factor(exper_2),
type = "scatter3d",
mode = "markers") %>%
layout(scene = list(xaxis = list(title = "PC1"),
yaxis = list(title = "PC2"),
zaxis = list(title = "PC3")),
title = "3D Scatterplot of PC1, PC2, and PC3 Coded by exper_4")
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
# Create a 3D scatter plot
plot_ly(data = data_clean,
x = ~data_clean$PC1,
y = ~data_clean$PC2,
z = ~data_clean$PC3,
color = ~as.factor(exper_6),
type = "scatter3d",
mode = "markers") %>%
layout(scene = list(xaxis = list(title = "PC1"),
yaxis = list(title = "PC2"),
zaxis = list(title = "PC3")),
title = "3D Scatterplot of PC1, PC2, and PC3 Coded by exper_4")
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
# Create a 3D scatter plot
plot_ly(data = data_clean,
x = ~data_clean$PC1,
y = ~data_clean$PC2,
z = ~data_clean$PC3,
color = ~as.factor(exper_4),
type = "scatter3d",
mode = "markers") %>%
layout(scene = list(xaxis = list(title = "PC1"),
yaxis = list(title = "PC2"),
zaxis = list(title = "PC3")),
title = "3D Scatterplot of PC1, PC2, and PC3 Coded by exper_4")
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
# Create a 3D scatter plot
plot_ly(data = data_clean,
x = ~data_clean$PC1,
y = ~data_clean$PC2,
z = ~data_clean$PC3,
color = ~as.factor(exper_9),
type = "scatter3d",
mode = "markers") %>%
layout(scene = list(xaxis = list(title = "PC1"),
yaxis = list(title = "PC2"),
zaxis = list(title = "PC3")),
title = "3D Scatterplot of PC1, PC2, and PC3 Coded by exper_4")
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
# Create a 3D scatter plot
plot_ly(data = data_clean,
x = ~data_clean$PC1,
y = ~data_clean$PC2,
z = ~data_clean$PC3,
color = ~as.factor(school_poverty_high),
type = "scatter3d",
mode = "markers") %>%
layout(scene = list(xaxis = list(title = "PC1"),
yaxis = list(title = "PC2"),
zaxis = list(title = "PC3")),
title = "3D Scatterplot of PC1, PC2, and PC3 Coded by exper_4")
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
# Create a 3D scatter plot
plot_ly(data = data_clean,
x = ~data_clean$PC1,
y = ~data_clean$PC2,
z = ~data_clean$PC3,
color = ~as.factor(exper_10),
type = "scatter3d",
mode = "markers") %>%
layout(scene = list(xaxis = list(title = "PC1"),
yaxis = list(title = "PC2"),
zaxis = list(title = "PC3")),
title = "3D Scatterplot of PC1, PC2, and PC3 Coded by exper_4")
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
# Create a 3D scatter plot
plot_ly(data = data_clean,
x = ~data_clean$PC1,
y = ~data_clean$PC2,
z = ~data_clean$PC3,
color = ~as.factor(grad_degree),
type = "scatter3d",
mode = "markers") %>%
layout(scene = list(xaxis = list(title = "PC1"),
yaxis = list(title = "PC2"),
zaxis = list(title = "PC3")),
title = "3D Scatterplot of PC1, PC2, and PC3 Coded by exper_4")
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
# Create a 3D scatter plot
plot_ly(data = data_clean,
x = ~data_clean$PC1,
y = ~data_clean$PC2,
z = ~data_clean$PC3,
color = ~as.factor(school_poverty_high),
type = "scatter3d",
mode = "markers") %>%
layout(scene = list(xaxis = list(title = "PC1"),
yaxis = list(title = "PC2"),
zaxis = list(title = "PC3")),
title = "3D Scatterplot of PC1, PC2, and PC3 Coded by exper_4")
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels