R Packages Used

library(readr)
library(ggplot2)
library(caret)
library(naivebayes)
library(tibble)
library(knitr)
library(kableExtra)
library(dplyr)

Load and Prepare Data

# Load cleaned data
vesas_data <- read_csv("PostBayesIPFall_251127b.csv")
## Rows: 37 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ResponseId
## dbl (9): PartID, PerceptGr, Duration, Sex, FinalRaceEthic, CT_E, CT_V, CTEz,...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Convert group variable to factor
vesas_data$PerceptGr <- as.factor(vesas_data$PerceptGr)

# Check
head(vesas_data)
## # A tibble: 6 × 10
##   PartID PerceptGr Duration ResponseId      Sex FinalRaceEthic  CT_E  CT_V  CTEz
##    <dbl> <fct>        <dbl> <chr>         <dbl>          <dbl> <dbl> <dbl> <dbl>
## 1      2 2             2056 R_1mEkj5SYHi…     2              4    31    37  0.8 
## 2      8 2              910 R_0OjMW2JR2E…     1              5    25    32 -0.04
## 3     10 2              474 R_1OvsOCBOrH…     1              4    18    26 -1.04
## 4     12 2             1162 R_2uy37qvVVI…     1              1    33    38  0.97
## 5     13 2             1814 R_2Vg6LcYueo…     2              4    25    40  1.3 
## 6     15 2              885 R_O9zkIookFp…     2              5    26    29 -0.54
## # ℹ 1 more variable: CTVz <dbl>

Background

This analysis is part of a broader study investigating how students’ interpretations of academic challenges relate to their motivation for pursuing STEM. Specifically, the study examined how students label past science-related experiences—as failures, successes, or neutral events—and how those labels align with their self-reported expectations for success and perceived value of STEM fields.

As part of the data collection, each student responded to two open-ended prompts:

“Tell me about a challenge you faced in a science class. Was this a failure, success, or neither? Please explain why.”

“Tell me about a challenge you faced in a math class. Was this a failure, success, or neither? Please explain why.”

Each student’s qualitative responses were coded to capture their general perception of the experience. Each mention of failure received a score of -1, each mention of success a score of +1, and neutral responses a score of 0. These scores were summed across both prompts to create a cumulative label score. Students with positive scores were assigned to the Success group (n=20; 54.1%), those with negative scores to the Failure group (n=10; 27.0%), and those with scores of zero to the Neutral group (n=7; 18.9%). Students who did not provide responses (n=3) were classified as “Did Not State” and excluded from further analysis.
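
To make the scoring-and-grouping logic concrete, a minimal sketch is shown below. The column names science_code and math_code are hypothetical stand-ins for the per-prompt qualitative codes; the comments note the numeric group codes (0 = Failure, 1 = Neutral, 2 = Success) used in the dataset.

# Hypothetical sketch of the cumulative label score and group assignment
# described above; 'science_code' and 'math_code' are illustrative names,
# each holding the -1 / 0 / +1 code for one prompt's response.
library(dplyr)

coded <- tibble::tibble(
  PartID       = c(101, 102, 103, 104),   # illustrative IDs
  science_code = c(-1,   1,   0,   NA),
  math_code    = c(-1,   1,   0,   NA)
)

coded %>%
  mutate(
    label_score = science_code + math_code,
    PerceptGr = case_when(
      is.na(label_score) ~ "Did Not State",  # excluded from analysis
      label_score < 0    ~ "Failure",        # coded 0 in the dataset
      label_score > 0    ~ "Success",        # coded 2 in the dataset
      TRUE               ~ "Neutral"         # coded 1 in the dataset
    )
  )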

In addition to the open-ended responses, students completed self-report surveys (VESAS) measuring two key aspects of motivation: Expectancy (confidence in succeeding in STEM) and Value (importance placed on STEM). Since the VESAS scores were NOT used to create the perception groups, they provided an independent measure for validating the groupings. A scatterplot of standardized (z-scored) VESAS scores revealed meaningful differences consistent with motivational theory: students in the Failure group tended to have lower expectancy and value scores compared to those in the Success and Neutral groups.

# Create non-Bayes scatterplot; standardized VESAS scores are 'CTEz' (Expectancy) and 'CTVz' (Value)
ggplot(vesas_data, aes(x = CTEz, y = CTVz, color = PerceptGr)) +
  geom_point(size = 3, alpha = 0.7) +
  scale_color_manual(
    values = c("0" = "red", "1" = "gray", "2" = "blue"),
    labels = c("0" = "Failure", "1" = "Neutral", "2" = "Success"),
    name = "Perception Group"
  ) +
  labs(
    title = "Scatterplot of Standardized Expectancy & Value Scores by Perception Group",
    x = "Standardized Expectancy Score (CTEz)",
    y = "Standardized Value Score (CTVz)"
  ) +
  theme_minimal() +
  theme(
    legend.position = "right",
    plot.title = element_text(hjust = 0.5)
  )

Purpose of Naive Bayes Model

Reviewer feedback was mixed, highlighting both strengths and limitations of the initial validation approach. Although visual inspection of standardized VESAS scores offered preliminary support for the perception groupings, this method alone lacks the statistical rigor required for formal validation. Further, the underlying qualitative categorization was systematic but not theoretically grounded. A Naive Bayes posterior-probability analysis was therefore conducted to formally assess the alignment between the qualitative groupings and the independent motivational measures.

Naive Bayes is a probabilistic classification approach based on Bayes’ theorem that estimates the likelihood of class membership given observed predictors. Because it accommodates categorical outcomes and continuous predictors, and provides posterior probability estimates for individual cases, it is well suited for evaluating alignment between motivational measures and perception-based groupings in small samples.
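
As a rough illustration of what this computation looks like, the sketch below applies Bayes’ theorem by hand for a single hypothetical student (the raw scores CT_E = 25 and CT_V = 32 are illustrative), assuming Gaussian class-conditional densities and conditional independence of the two predictors, the same assumptions made by the non-kernel model fitted below.

# Hand-computed Gaussian Naive Bayes posterior for one hypothetical student;
# purely illustrative of what predict(type = "prob") returns later.
prior <- prop.table(table(vesas_data$PerceptGr))         # class priors
lik <- sapply(levels(vesas_data$PerceptGr), function(g) {
  sub <- vesas_data[vesas_data$PerceptGr == g, ]
  # conditional independence: multiply per-predictor normal densities
  dnorm(25, mean(sub$CT_E), sd(sub$CT_E)) *
    dnorm(32, mean(sub$CT_V), sd(sub$CT_V))
})
posterior <- prior * lik / sum(prior * lik)              # Bayes' theorem
round(posterior, 4)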

To examine whether quantitative motivational profiles could reproduce the qualitative perception groups, a supervised Naive Bayes classifier was trained using students’ VESAS Expectancy and Value scores, with perception-group classifications derived from qualitative coding serving as the outcome labels. The goal of this analysis was not to maximize predictive accuracy, but to assess the extent to which motivational magnitude aligns with students’ qualitative interpretations of academic challenge.

Train Naive Bayes Model

Model performance was estimated using 5-fold cross-validation. After resampling, the final Naive Bayes model was refit on the full dataset and used to generate predicted classes and posterior probabilities. Expectancy and Value scores were used as predictors, and hyperparameters were held constant due to sample size constraints.

train_control <- trainControl(method = "cv", number = 5)

model <- train(
  PerceptGr ~ CT_E + CT_V,
  data = vesas_data,
  method = "naive_bayes",
  trControl = train_control,
  tuneGrid = expand.grid(
    usekernel = FALSE,
    laplace = 0,
    adjust = 1
  )
)
# Print the model summary
print(model)
## Naive Bayes 
## 
## 37 samples
##  2 predictor
##  3 classes: '0', '1', '2' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 30, 30, 30, 29, 29 
## Resampling results:
## 
##   Accuracy   Kappa    
##   0.6535714  0.3436029
## 
## Tuning parameter 'laplace' was held constant at a value of 0
## Tuning
##  parameter 'usekernel' was held constant at a value of FALSE
## Tuning
##  parameter 'adjust' was held constant at a value of 1

Generate Predictions and Posterior Probabilities

# Predicted groups
predicted_classes <- predict(model, vesas_data, type = "raw")
# Posterior probabilities
posterior_probs <- predict(model, vesas_data, type = "prob")
# Add predicted group back into data
vesas_data$PredictedGroup <- predicted_classes

Final Results Table

For each student, the Naive Bayes model calculated posterior probabilities representing the likelihood of belonging to each perception group based on their VESAS Expectancy and Value scores. The highest posterior probability indicates the group to which the student is most likely to belong according to the model. This predicted group assignment was then compared to the student’s original perception group—based on qualitative sorting—to assess the consistency and validity of the initial classification.
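
As a quick sanity check (a sketch, assuming the objects created in the previous chunk), the predicted class can be recovered manually as the column with the largest posterior probability:

# The predicted class is simply the argmax over posterior probabilities;
# max.col() breaks exact ties at random (ties.method = "random"), which
# is not a concern with these continuous densities.
idx <- max.col(posterior_probs)
all(colnames(posterior_probs)[idx] == as.character(predicted_classes))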

library(kableExtra)

# Get predicted classes
vesas_data$PredictedGroup <- predict(model, vesas_data, type = "raw")

# Get posterior probabilities
posterior_probs <- predict(model, vesas_data, type = "prob")

# Combine into final table
results_tbl <- vesas_data %>%
  mutate(
    Prob_Group0 = posterior_probs[, "0"],
    Prob_Group1 = posterior_probs[, "1"],
    Prob_Group2 = posterior_probs[, "2"]
  ) %>%
  select(PartID, PerceptGr, PredictedGroup, Prob_Group0, Prob_Group1, Prob_Group2)

# Display using kable and scroll box
results_tbl %>%
  kable("html", digits = 4, col.names = c("PartID", "Actual Group", "Predicted Group", "Prob 0", "Prob 1", "Prob 2")) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = FALSE) %>%
  scroll_box(width = "100%", height = "400px")
PartID  Actual Group  Predicted Group  Prob 0  Prob 1  Prob 2
     2             2                2  0.0261  0.3717  0.6022
     8             2                2  0.1637  0.1375  0.6988
    10             2                0  0.8731  0.0002  0.1267
    12             2                2  0.0213  0.3257  0.6530
    13             2                2  0.0770  0.1300  0.7930
    15             2                2  0.2242  0.1905  0.5853
    20             2                2  0.1181  0.3167  0.5652
    21             2                2  0.1940  0.0497  0.7563
    23             2                2  0.0261  0.3717  0.6022
    27             2                2  0.0470  0.2422  0.7107
    28             2                2  0.3378  0.0783  0.5838
    29             2                2  0.0812  0.3165  0.6023
    30             2                2  0.0203  0.3221  0.6577
    35             2                0  0.9327  0.0001  0.0672
    41             2                2  0.0210  0.2814  0.6976
    46             2                2  0.0246  0.2888  0.6866
    48             2                2  0.1292  0.1983  0.6725
    50             2                2  0.0226  0.3605  0.6169
    57             2                2  0.1239  0.0863  0.7898
    59             2                2  0.0212  0.2271  0.7518
    32             1                2  0.0200  0.2780  0.7020
    45             1                2  0.0297  0.3659  0.6044
    14             1                2  0.0265  0.3584  0.6151
    26             1                2  0.0728  0.3967  0.5305
    43             1                2  0.0361  0.3842  0.5797
    56             1                2  0.1215  0.2607  0.6177
    60             1                0  0.4770  0.1025  0.4205
     9             0                2  0.1102  0.1981  0.6917
    22             0                2  0.1403  0.1379  0.7218
    24             0                2  0.4005  0.0733  0.5263
    31             0                0  0.9968  0.0000  0.0032
    36             0                0  1.0000  0.0000  0.0000
    39             0                0  0.9062  0.0002  0.0937
    47             0                0  0.5483  0.0039  0.4478
    49             0                2  0.0411  0.3778  0.5811
    51             0                0  0.9740  0.0000  0.0260
    55             0                2  0.0866  0.2597  0.6537

Model Accuracy

# Compare predicted group to actual group
match_logical <- vesas_data$PredictedGroup == vesas_data$PerceptGr

# Calculate percent match
percent_match <- mean(match_logical) * 100
percent_match <- round(percent_match, 2)

# Format to show two decimal places
formatted_match <- formatC(percent_match, format = "f", digits = 2)

# Show the result
paste0("The model correctly classified ", formatted_match, "% of the cases.")
## [1] "The model correctly classified 62.16% of the cases."

Results Summary

# Create a variable indicating whether prediction matches actual
vesas_data <- vesas_data %>% 
  mutate(Match = ifelse(PredictedGroup == PerceptGr, "Match", "Mismatch"))

# Map numeric groups to labels for clarity
group_labels <- c("0" = "Failure", "1" = "Neutral", "2" = "Success")
vesas_data$ActualLabel <- factor(vesas_data$PerceptGr, levels = names(group_labels), labels = group_labels)
vesas_data$PredictedLabel <- factor(vesas_data$PredictedGroup, levels = names(group_labels), labels = group_labels)

# Plot with color = Actual group, shape = Match status
ggplot(vesas_data, aes(x = CTEz, y = CTVz, color = ActualLabel, shape = Match)) +
  geom_point(size = 3, alpha = 0.8) +
  scale_color_manual(
    values = c("Failure" = "red", "Neutral" = "gray", "Success" = "blue")
  ) +
  labs(
    title = "VESAS Scores by Sorted Perception Group with Prediction Match Status",
    x = "Standardized Expectancy Score (CTEz)",
    y = "Standardized Value Score (CTVz)",
    color = "Actual Group",
    shape = "Prediction Match"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    legend.position = "right"
  )

Misclassification Patterns by Perception Group

library(dplyr)
library(knitr)
library(kableExtra)

# Total sample size
N_total <- nrow(vesas_data)

# Group sizes
group_sizes <- vesas_data %>%
  count(ActualLabel, name = "Group_n")

# Misclassified cases only
misclass_summary <- vesas_data %>%
  filter(PredictedGroup != PerceptGr) %>%
  count(ActualLabel, PredictedLabel, name = "Misclass_n") %>%
  left_join(group_sizes, by = "ActualLabel") %>%
  mutate(
    Percent_of_Group = round(Misclass_n / Group_n * 100, 1),
    Percent_of_All_Misclass = round(Misclass_n / sum(Misclass_n) * 100, 1)
  ) %>%
  arrange(desc(Misclass_n))

misclass_summary %>%
  kable(
    format = "html",
    col.names = c(
      "Actual Group",
      "Predicted As",
      "Misclassified (n)",
      "Group Size (n)",
      "% of Group Misclassified",
      "% of All Misclassifications"
    ),
    caption = "<strong><span style='font-size:18px;color:black;'>Patterns of Misclassified Cases Across Perception Groups</span></strong>"
  ) %>%
  kable_styling(full_width = FALSE)
Patterns of Misclassified Cases Across Perception Groups

Actual Group  Predicted As  Misclassified (n)  Group Size (n)  % of Group Misclassified  % of All Misclassifications
Neutral       Success                       6               7                      85.7                         42.9
Failure       Success                       5              10                      50.0                         35.7
Success       Failure                       2              20                      10.0                         14.3
Neutral       Failure                       1               7                      14.3                          7.1

The classifier correctly predicted perception-group membership for approximately 62% of cases, leaving 38% misclassified. Classification accuracy was not evenly distributed across groups. When Expectancy and Value scores were extreme, the model reliably identified students in the failure and success groups, often assigning high posterior probabilities to these classifications.

Misclassification was concentrated disproportionately among students qualitatively categorized as neutral. Of the 14 misclassified cases, seven (50% of all misclassifications) originated from the neutral group, despite this group comprising a much smaller proportion of the analytic sample (n=7). Notably, none of the seven neutral cases were predicted as neutral by the classifier: six were classified as success and one as failure. In contrast, five misclassified cases (approximately 36% of all misclassifications; 50% of all failure cases) came from the failure group, and two (approximately 14% of all misclassifications; 10% of all success cases) from the success group.
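
These counts can be cross-checked compactly with a confusion matrix; the sketch below assumes the objects created earlier, with rows giving the model-predicted groups and columns the qualitatively coded groups.

# Cross-check the misclassification counts above with a confusion matrix
conf_mat <- caret::confusionMatrix(
  data      = vesas_data$PredictedGroup,   # model-predicted groups
  reference = vesas_data$PerceptGr         # qualitatively coded groups
)
conf_mat$table   # rows = predicted class, columns = actual class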

Discussion

The present analysis provides initial evidence that probabilistic classification can serve as a meaningful bridge between qualitative interpretations of academic challenge and quantitative measures of motivation. Using students’ Expectancy and Value scores as predictors, the Naive Bayes classifier recovered perception-group membership at rates substantially above chance, particularly for students in the failure and success groups. This pattern suggests that motivational magnitude aligns closely with how students interpret challenge at the extremes, where perceptions are more evaluative and motivational profiles are more distinct.

Classification accuracy was not evenly distributed across groups. The classifier’s failure to recover the neutral category—even though it was trained on the qualitative labels themselves—indicates that the neutral group is not distinguished by motivational magnitude alone. Instead, these students’ Expectancy–Value profiles closely resemble those of the success group, despite qualitatively distinct interpretations. Neutral qualitative responses were characterized by non-evaluative language, flexible interpretations of difficulty, and an emphasis on learning or adjustment rather than success or failure. Although participants in the success perception group had comparable motivational levels, their qualitative responses used more evaluative language and included more mentions of self-doubt. This distinction may explain why the Expectancy–Value measures, as operationalized in the VESAS scale, were unable to reliably differentiate neutral from success cases. While expectancy-value theory (EVT) effectively captures the magnitude of motivation—how strongly students value STEM and expect to succeed—it does not directly measure the degree of judgment, pressure, or self-evaluative framing through which challenges are interpreted. As a result, motivational profiles that appear similar quantitatively may diverge meaningfully depending on how a challenge is perceived.

In contrast, the failure group displayed a different pattern. Although misclassification occurred for some failure cases, these errors were largely confined to students whose Expectancy and Value scores fell near the sample mean. When motivational scores were low or clearly negative, the classifier reliably identified failure-group membership. This suggests that failure-based interpretations are more tightly coupled with motivational magnitude than neutral interpretations. Qualitatively, failure responses were marked by strong evaluative language, fixed attributions, and clearer judgments of ability or belonging—features that align more directly with lowered expectancy and value.

Conclusion

This study demonstrates the potential of supervised Bayesian classification as a method for integrating qualitative and quantitative approaches to studying motivation and persistence in STEM. Rather than serving as a definitive validation, the Naive Bayes analysis functions as a form of probabilistic interrater assessment, evaluating the extent to which independent motivational measures align with qualitative perceptions of challenge. The results show meaningful convergence at the extremes and theoretically informative divergence in the middle, particularly among students who adopt neutral, non-evaluative appraisals of difficulty.

Importantly, these findings suggest that the original −1, 0, +1 qualitative coding scheme captures cognitively meaningful distinctions that are not reducible to motivation magnitude alone. The neutral category, in particular, appears to reflect adaptive regulatory processes that are largely invisible to traditional motivational scales but may be critical for long-term persistence in high-attrition domains. This insight represents a key contribution of the mixed-methods approach: it not only tests alignment across methods but also reveals where dominant frameworks fall short.

Future work should extend this approach using larger samples, alternative classification methods, and additional constructs related to self-regulation, judgment, and cognitive load. In particular, the neutral group warrants deeper investigation, as its members may represent a resilient yet understudied profile with important implications for intervention design. With further development, this analytic strategy has the potential to refine how motivation is measured, how qualitative insights are validated, and how persistence is supported in demanding educational pathways.