SABER West 2026

Author

Cindy Trussell

Published

January 18, 2026

The Mentee’s Voice: Unpacking Perceptions of Mentoring That Foster Resilience and Self-Efficacy

This talk, given in January 2026, focused on data from cohort 4 of AK UNiTE in the summer of 2025.

For the first step in the analysis, I loaded the appropriate packages.

Show the code

library(tidyverse)
library(lme4)
library(lmerTest)
library(emmeans)
library(lattice)
library(plotly)
library(censReg)
library(broom)

Bring in the Data for Analysis

I used the Cohort4 data from the Summer 2025 Data Set. Read in the raw data Cohort4 from the data-raw folder into the environment.

I then wanted to review the data to ensure we know which types of data are in each column.

Show the code

glimpse(cohort4)

Rows: 77
Columns: 46
$ StudyNumber             <dbl> 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6…
$ ProgramMonth            <chr> "August 2025", "July 2025", "June 2025", "Augu…
$ APPType                 <chr> "Document the mentoring experience from the me…
$ PreviousPart            <chr> "Yes", "Yes", "No", "No", "No", "No", "No", "N…
$ PreviousCohort          <chr> "Cohort 3 mentoring participant", "Cohort 3 me…
$ YearsResearchTraining   <chr> "less than 1 year", "less than 1 year", "less …
$ ResearchTitle           <chr> "Undergraduate researcher", "Undergraduate res…
$ W1_AchieveGoals         <chr> "Somewhat agree", "Agree", "Agree", "Agree", "…
$ W1_AccomplishTasks      <chr> "Somewhat agree", "Somewhat agree", "Somewhat …
$ W1_ObtainOutcomes       <chr> "Agree", "Somewhat agree", "Agree", "Agree", "…
$ W1_Succeed              <chr> "Agree", "Agree", "Somewhat agree", "Somewhat …
$ W1_Overcome             <chr> "Somewhat agree", "Agree", "Somewhat agree", "…
$ W1_ConfidentTasks       <chr> "Somewhat agree", "Somewhat agree", "Somewhat …
$ W1_PeerComparison       <chr> "Agree", "Somewhat agree", "Somewhat agree", "…
$ W1_ToughPerformance     <chr> "Somewhat agree", "Somewhat agree", "Somewhat …
$ W1_AdaptChange          <chr> "Often true", "Often true", "Often true", "Oft…
$ W1_Resilience           <chr> "Often true", "Often true", "Often true", "Oft…
$ W1_ResearchExpectations <chr> "In my current research project I am working w…
$ W1_SuccessPotential     <chr> "In my undergraduate research project I’ve got…
$ W1_MentorSupport        <chr> "The biggest thing mentors can do to support t…
$ W1_OpenResponse         <chr> "I had a picnic with my friends.", "I am going…
$ W2_DefineAmbigousR      <chr> "A ambiguous situation in research is one that…
$ W2_HandleAmbigousR      <chr> "Referring to previous literature is a good pl…
$ W2_MentoringAmbigousR   <chr> "Mentors can provide moral support as well as …
$ W2_MentoringNG          <chr> "I think not being straightforward is the bigg…
$ W2_OpenResponse         <chr> "I had a game night with my best friends", "I …
$ W3_OutcomeEffort        <chr> "Proving the hypothesis and/or learning new in…
$ W3_Motivation           <chr> "Knowing that the research you are conducting …
$ W3_Support              <chr> "My personal values are important in preservin…
$ W3_OpenResponse         <chr> "I started a new book!", "I went to the arcade…
$ W4_AchieveGoals         <chr> "Agree", "Somewhat agree", "Agree", "Agree", "…
$ W4_AccomplishTasks      <chr> "Agree", "Agree", "Somewhat agree", "Agree", "…
$ W4_ObtainOutcomes       <chr> "Agree", "Somewhat agree", "Agree", "Agree", "…
$ W4_Succeed              <chr> "Agree", "Agree", "Somewhat agree", "Somewhat …
$ W4_Overcome             <chr> "Somewhat agree", "Agree", "Agree", "Agree", "…
$ W4_ConfidentTasks       <chr> "Somewhat agree", "Somewhat agree", "Agree", "…
$ W4_PeerComparison       <chr> "Somewhat agree", "Somewhat agree", "Agree", "…
$ W4_ToughPerformance     <chr> "Somewhat agree", "Agree", "Agree", "Somewhat …
$ W4_AdaptChange          <chr> "Often true", "Often true", "Often true", "Oft…
$ W4_Resilience           <chr> "Sometimes true", "Sometimes true", "Often tru…
$ W4_1Scientist           <chr> "Being in the lab for the first time in underg…
$ W4_MeaningfulMentor     <chr> "I am fortunate enough to have great mentors f…
$ W4_BelongNG             <chr> "I’ve had a couple of experiences where I didn…
$ W4_MentorDescription    <chr> "Mentoring has played an important role in my …
$ W4_MentorExpectations   <chr> "I expect my mentor to show interest in my goa…
$ W4_OpenResponse         <chr> "I learned how to make lasagna!", "I am planni…

Data Cleaning & Value Mapping

Next, I needed to assign numbers to the Likert scale choices for the New General Self-efficacy Scale (6-point) and the Connor-Davidson Resilience Scale (5-point). This code maps strongly agree, agree, etc., to the appropriate numbers.

Show the code

# Define the Likert scale mappings for reuse
likert_6_map <- c(
  "Strongly disagree" = 1, "Disagree" = 2, "Somewhat disagree" = 3,
  "Somewhat agree" = 4, "Agree" = 5, "Strongly agree" = 6
)

likert_5_map <- c(
  "Never true" = 1, "Rarely true" = 2, "Sometimes true" = 3,
  "Often true" = 4, "True nearly all the time" = 5
)

Create New Data Table with Composite Scores, Categories, and Normalized Gain

I then wanted to create a new data object called cohort4_with_composite that includes composite scores for self-efficacy and resilience. To get a more stable measure of a person’s mindset, the code averages individual questions into a single score for that specific week (Modified New General Self-Efficacy and Conor Davidson Resilience).

Self-Efficacy: Computes the mean of 8 items for Week 1 and Week 4.
Resilience: Computes the mean of the 2 resilience items for Week 1 and Week 4.
The final step calculates Hake’s Normalized Gain. This measures how much a participant improved from Week 1 to Week 4 relative to how much room they had for improvement.
How it handles the “Ceiling Effect”: If a participant started Week 1 with a perfect score (a 6 for Self-Efficacy or a 5 for Resilience), they cannot “improve” on the scale. The code identifies these “Ceiling” cases and manually sets their gain to 0. Without this ifelse logic, the formula would try to divide by zero, resulting in NaN (Not a Number) errors in your dataset.

Creating Analysis Categories

Temporal Sequencing: Maps the ProgramMonth (June, July, August 2025) to numeric values (1, 2, 3) to track progress over the summer.
Experience Levels: Collapses “Years of Research Training” into a numeric scale (1–6).
Role Assignment: Sorts participants into Mentees or Mentors based on their application type (APPType).
Professional Tiering: Groups various titles into three main buckets: Undergraduates, Graduate Students/Post Docs, and Primary Investigators/Faculty.

View the Cohort4_with_composite data table, which also includes the net gains for each participant for each month. In other words, how much each participant changed just during each month.

Show the code

glimpse(cohort4_with_composite)

Rows: 77
Columns: 56
$ StudyNumber                <fct> 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5…
$ ProgramMonth               <chr> "August 2025", "July 2025", "June 2025", "A…
$ APPType                    <chr> "Document the mentoring experience from the…
$ PreviousPart               <chr> "Yes", "Yes", "No", "No", "No", "No", "No",…
$ PreviousCohort             <chr> "Cohort 3 mentoring participant", "Cohort 3…
$ YearsResearchTraining      <dbl> 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 2, 1, 2, 2, 2…
$ ResearchTitle              <chr> "Undergraduate researcher", "Undergraduate …
$ W1_AchieveGoals            <dbl> 4, 5, 5, 5, 4, 4, 5, 5, 5, 5, 6, 5, 6, 5, 5…
$ W1_AccomplishTasks         <dbl> 4, 4, 4, 5, 5, 4, 5, 5, 6, 5, 5, 5, 6, 6, 6…
$ W1_ObtainOutcomes          <dbl> 5, 4, 5, 5, 5, 5, 5, 4, 5, 5, 6, 6, 6, 5, 5…
$ W1_Succeed                 <dbl> 5, 5, 4, 4, 3, 4, 6, 5, 4, 5, 6, 5, 5, 5, 4…
$ W1_Overcome                <dbl> 4, 5, 4, 5, 6, 5, 6, 5, 5, 5, 5, 5, 6, 6, 6…
$ W1_ConfidentTasks          <dbl> 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 5, 6…
$ W1_PeerComparison          <dbl> 5, 4, 4, 4, 4, 4, 5, 5, 5, 5, 4, 5, 6, 5, 5…
$ W1_ToughPerformance        <dbl> 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 6, 5, 6, 6, 6…
$ W1_AdaptChange             <dbl> 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 4, 5, 5, 5…
$ W1_Resilience              <dbl> 4, 4, 4, 4, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5…
$ W1_ResearchExpectations    <chr> "In my current research project I am workin…
$ W1_SuccessPotential        <chr> "In my undergraduate research project I’ve …
$ W1_MentorSupport           <chr> "The biggest thing mentors can do to suppor…
$ W1_OpenResponse            <chr> "I had a picnic with my friends.", "I am go…
$ W2_DefineAmbigousR         <chr> "A ambiguous situation in research is one t…
$ W2_HandleAmbigousR         <chr> "Referring to previous literature is a good…
$ W2_MentoringAmbigousR      <chr> "Mentors can provide moral support as well …
$ W2_MentoringNG             <chr> "I think not being straightforward is the b…
$ W2_OpenResponse            <chr> "I had a game night with my best friends", …
$ W3_OutcomeEffort           <chr> "Proving the hypothesis and/or learning new…
$ W3_Motivation              <chr> "Knowing that the research you are conducti…
$ W3_Support                 <chr> "My personal values are important in preser…
$ W3_OpenResponse            <chr> "I started a new book!", "I went to the arc…
$ W4_AchieveGoals            <dbl> 5, 4, 5, 5, 5, 5, 5, 5, 5, 6, 5, 5, 5, 4, 5…
$ W4_AccomplishTasks         <dbl> 5, 5, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6…
$ W4_ObtainOutcomes          <dbl> 5, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 4, 4, 5…
$ W4_Succeed                 <dbl> 5, 5, 4, 4, 3, 4, 6, 6, 5, 6, 5, 6, 4, 5, 5…
$ W4_Overcome                <dbl> 4, 5, 5, 5, 5, 5, 5, 5, 5, 6, 5, 6, 6, 6, 6…
$ W4_ConfidentTasks          <dbl> 4, 4, 5, 5, 4, 5, 5, 5, 6, 6, 6, 5, 5, 6, 6…
$ W4_PeerComparison          <dbl> 4, 4, 5, 5, 4, 4, 5, 5, 5, 6, 6, 4, 5, 6, 5…
$ W4_ToughPerformance        <dbl> 4, 5, 5, 4, 5, 4, 5, 5, 5, 6, 6, 5, 6, 6, 6…
$ W4_AdaptChange             <dbl> 4, 4, 4, 4, 5, 5, 4, 4, 4, 5, 5, 3, 5, 5, 5…
$ W4_Resilience              <dbl> 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5…
$ W4_1Scientist              <chr> "Being in the lab for the first time in und…
$ W4_MeaningfulMentor        <chr> "I am fortunate enough to have great mentor…
$ W4_BelongNG                <chr> "I’ve had a couple of experiences where I d…
$ W4_MentorDescription       <chr> "Mentoring has played an important role in …
$ W4_MentorExpectations      <chr> "I expect my mentor to show interest in my …
$ W4_OpenResponse            <chr> "I learned how to make lasagna!", "I am pla…
$ Composite_Self_Efficacy_W1 <dbl> 4.375, 4.375, 4.250, 4.750, 4.625, 4.500, 5…
$ Composite_Self_Efficacy_W4 <dbl> 4.500, 4.500, 4.750, 4.750, 4.500, 4.625, 5…
$ Composite_Resilience_W1    <dbl> 4.0, 4.0, 4.0, 4.0, 3.5, 3.5, 4.0, 4.0, 4.0…
$ Composite_Resilience_W4    <dbl> 3.5, 3.5, 4.0, 4.0, 4.5, 4.5, 4.0, 4.0, 4.0…
$ Month_Number               <dbl> 3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2, 1…
$ Years_Factor               <fct> 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 2, 1, 2, 2, 2…
$ Mentoring_Category         <fct> mentee, mentee, mentee, mentee, mentee, men…
$ Research_Category          <chr> "Undergraduates", "Undergraduates", "Underg…
$ g_SelfEfficacy             <dbl> 0.07692308, 0.07692308, 0.28571429, 0.00000…
$ g_Resilience               <dbl> -0.5000000, -0.5000000, 0.0000000, 0.000000…

There are still a few more steps to prepare the data for the analyses we want to perform. To run the linear mixed-effects model, we pivot the columns to make the data long rather than wide. We also ensure that pre comes before post and that both the resilience composite score and the self-efficacy composite score are numeric variables.

Since we are also interested in whether the length of participation influenced gains in self-efficacy and resilience, we calculate this for each participant.

Show the code

#--------------Step 2-------------
# Define the composite score columns to pivot
score_cols_to_pivot <- c(
  "Composite_Self_Efficacy_W1", "Composite_Self_Efficacy_W4",
  "Composite_Resilience_W1", "Composite_Resilience_W4"
)

cohort4_long_final <- cohort4_with_composite %>%
  # Pivot the 4 composite columns longer
  pivot_longer(
    cols = all_of(score_cols_to_pivot), 
    names_to = c(".value", "TimePoint"), 
    names_sep = "_W",
    names_prefix = "Composite_"
  ) %>%
  
  # === Final LMEM Variable Setup ===
  mutate(
    # Create the key Pre/Post factor with correct levels
    Measure_Type = factor(
      ifelse(TimePoint == "1", "Pre", "Post"), 
      levels = c("Pre", "Post")
    ),
    # Ensure scores are numeric
    Resilience = as.numeric(Resilience),
    Self_Efficacy = as.numeric(Self_Efficacy)
  ) %>%
  # Calculate Length of Participation (count of Pre/Post measures)
  group_by(StudyNumber) %>%
  mutate(
    LengthParticipation = n() 
  ) %>%
  ungroup() %>%
  # Remove NAs and unnecessary columns
  filter(!is.na(Self_Efficacy) | !is.na(Resilience)) %>%
  select(-TimePoint)

The way these data are organized allows us to see whether net gains changed over one month or from the beginning to the end of the period during which a participant was engaged in the reflections.

Correlation of Self-efficacy to Resilience

Others have found that self-efficacy and resilience are correlated Abdolrezapour, Ganjeh, and Ghanbari (2023), so we wanted to analyze our participants to see whether we also find a similar correlation.

At this point, I exported the newly created data table to an R script named cor_self_and_res.

For this first correlation between self-efficacy and resilience, we are using each participant’s pre-score regardless of when they started. This first segment of code creates an object/data frame that filters the necessary comparisons.

Show the code

first_baseline_correlation <- cohort4_long_final %>%
  # 1. Filter only for baseline entries
  filter(Measure_Type == "Pre") %>%
  
  # 2. Group by participant
  group_by(StudyNumber) %>%
  
  # 3. Select the row with the smallest Month_Number (the earliest month)
  # If you have month names, ensure Month_Number is numeric (1=June, 2=July, etc.)
  slice_min(order_by = Month_Number, n = 1, with_ties = FALSE) %>%
  
  # 4. Cleanup and Grouping
  ungroup() %>%
  select(StudyNumber, Self_Efficacy, Resilience, Mentoring_Category, Month_Number)

Now it is time to run the correlation and visualize it.

Show the code

# Calculate the correlation coefficient
cor_val <- cor.test(first_baseline_correlation$Self_Efficacy, 
               first_baseline_correlation$Resilience, 
               method = "pearson")

print(cor_val)


    Pearson's product-moment correlation

data:  first_baseline_correlation$Self_Efficacy and first_baseline_correlation$Resilience
t = 4.5648, df = 38, p-value = 5.118e-05
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.3481325 0.7648295
sample estimates:
      cor 
0.5951048

A Pearson correlation coefficient was computed to assess the linear relationship between self-efficacy and resilience using the first time they completed the instrument. There was a positive correlation between the two variables, r(38) = .595, p <.001.

Show the code

ggplot(first_baseline_correlation, aes(x = Self_Efficacy, y = Resilience)) +
  # 1. Add jittered points to see density of Likert responses
  geom_jitter(width = 0.1, height = 0.1, alpha = 0.5, size = 2.5, color = "#2c3e50") +
  
  # 2. Add the Pearson regression line with Confidence Intervals (se = TRUE)
  geom_smooth(method = "lm", se = TRUE, fullrange = TRUE, color = "black", fill = "#bdc3c7", alpha = 0.4) +
  
  # 3. Standardize the axes to your survey scales
  scale_x_continuous(breaks = seq(1, 6, 0.5)) + 
  scale_y_continuous(breaks = seq(1, 5, 0.5)) +
  
  # 4. Styling and Labels
  theme_minimal(base_size = 14) +
  labs(
    title = "Baseline Correlation: Self-Efficacy vs. Resilience",
    subtitle = paste("Pearson's r =", round(cor_val$estimate, 2), "| Shaded area represents 95% Confidence Interval"),
    x = "Baseline Self-Efficacy (1-6 Scale)",
    y = "Baseline Resilience (1-5 Scale)",
    caption = "Data: First entry 'Pre' scores only"
  )
#export figure
ggsave(
  filename = "figs/BaselineCorreleation.svg", 
  device = "svg",
  width = 8,      # Width in inches
  height = 6,     # Height in inches
  units = "in"
)

Figure 1: Correlation of Self-efficacy to Reslience using first response score.

Show the code

first_baseline_correlation %>%
  count(Month_Number, name = "Frequency") %>%
  rename("Program Month" = Month_Number) %>%
  knitr::kable()

Table 1: Sample Size by Month

Program Month	Frequency
1	13
2	15
3	12

Normalized gains for the length of the program

Because some participants are in the program for one month and others for three, you want to capture their growth from the moment they entered (True Baseline) to the moment they left (Final Exit).

\[ g = \frac{\text{Last Post} - \text{First Pre}}{\text{Max Possible Score} - \text{First Pre}}\]What it represents: It tells us what percentage of the “available room for growth” the participant actually covered. A student who starts at a 5 and moves to a 6 has a “perfect” gain (1.0), just like a student moving from a 3 to a 6.

Show the code

cohort4_final_gains <- cohort4_long_final %>%
  group_by(StudyNumber) %>%
  summarize(
    # 1. Participation Length
    Months_Participated = n_distinct(Month_Number),
    
    # 2. Self-Efficacy: First Pre and Last Post
    First_Pre_SE  = Self_Efficacy[Measure_Type == "Pre" & Month_Number == min(Month_Number)][1],
    Last_Post_SE  = Self_Efficacy[Measure_Type == "Post" & Month_Number == max(Month_Number)][1],
    
    # 3. Resilience: First Pre and Last Post
    First_Pre_Res = Resilience[Measure_Type == "Pre" & Month_Number == min(Month_Number)][1],
    Last_Post_Res = Resilience[Measure_Type == "Post" & Month_Number == max(Month_Number)][1],
    
    # Keep metadata
    Mentoring_Category = first(Mentoring_Category),
    Research_Category = first(Research_Category),
    .groups = "drop"
  ) %>%
  # 4. Calculate Normalized Gains (Hake's g)
  mutate(
    # Gain for Self-Efficacy (1-6 scale)
    Gain_SelfEfficacy = (Last_Post_SE - First_Pre_SE) / (6 - First_Pre_SE),
    Gain_SelfEfficacy = ifelse(First_Pre_SE == 6, 0, Gain_SelfEfficacy),
    
    # Gain for Resilience (1-5 scale)
    Gain_Resilience = (Last_Post_Res - First_Pre_Res) / (5 - First_Pre_Res),
    Gain_Resilience = ifelse(First_Pre_Res == 5, 0, Gain_Resilience)
  )
glimpse(cohort4_final_gains)

Rows: 40
Columns: 10
$ StudyNumber         <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,…
$ Months_Participated <int> 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 3, 2, 2, 2, 2,…
$ First_Pre_SE        <dbl> 4.250, 4.500, 5.000, 5.250, 5.375, 5.250, 5.500, 4…
$ Last_Post_SE        <dbl> 4.500, 4.750, 5.125, 6.000, 5.125, 5.000, 5.000, 5…
$ First_Pre_Res       <dbl> 4.0, 3.5, 4.0, 4.5, 5.0, 4.0, 5.0, 3.5, 4.0, 5.0, …
$ Last_Post_Res       <dbl> 3.5, 4.0, 4.0, 5.0, 5.0, 3.5, 4.0, 3.5, 4.0, 5.0, …
$ Mentoring_Category  <fct> mentee, mentee, mentee, mentee, mentee, mentee, me…
$ Research_Category   <chr> "Undergraduates", "Undergraduates", "Undergraduate…
$ Gain_SelfEfficacy   <dbl> 0.1428571, 0.1666667, 0.1250000, 1.0000000, -0.400…
$ Gain_Resilience     <dbl> -0.5000000, 0.3333333, 0.0000000, 1.0000000, 0.000…

This cohort4_final_gains data frame allows us to ask, “Does a participant who participated for three months have a greater gain than someone who participated for one month?” To ensure that we do the right kind of statistical test, we first want to see if our data are normally distributed

Show the code

# 1. Visual Check: Histogram of Self Efficacy
ggplot(cohort4_final_gains, aes(x = Gain_SelfEfficacy)) +
  geom_histogram(fill = "#7b178e", color = "white", bins = 15) +
  theme_minimal() +
  labs(title = "Distribution of Normalized Gains for Self-efficacy")

Show the code

# 1. Visual Check: Histogram of Resilience
ggplot(cohort4_final_gains, aes(x = Gain_Resilience)) +
  geom_histogram(fill = "#7b178e", color = "white", bins = 15) +
  theme_minimal() +
  labs(title = "Distribution of Normalized Gains for Resilience")

Show the code

# 2. Statistical Check: Shapiro-Wilk Test
# p > 0.05 means your data is likely Normal.
shapiro.test(cohort4_final_gains$Gain_SelfEfficacy)


    Shapiro-Wilk normality test

data:  cohort4_final_gains$Gain_SelfEfficacy
W = 0.95083, p-value = 0.08096

Show the code

#shapiro wilk for resilience
shapiro.test(cohort4_final_gains$Gain_Resilience)


    Shapiro-Wilk normality test

data:  cohort4_final_gains$Gain_Resilience
W = 0.80281, p-value = 7.758e-06

Since the Shapiro test p-value is greater than .05 for self-efficacy, we can use parametric statistics. But it is less that .05 for Resilience so a Kruskal-Wallis test is more appropriate.

ANOVA Results Self-Efficacy - One Month, Two Months, Three Months

Show the code

# Run ANOVA
anova_final_gains <- aov(Gain_SelfEfficacy ~ as.factor(Months_Participated), data = cohort4_final_gains)

# View results
summary(anova_final_gains)

                               Df Sum Sq Mean Sq F value Pr(>F)
as.factor(Months_Participated)  2  0.123 0.06173   0.213  0.809
Residuals                      37 10.701 0.28920

Show the code

# 1. Ensure Months_Participated is a factor so it creates separate boxes
cohort4_final_gains <- cohort4_final_gains %>%
  mutate(Months_Factor = as.factor(Months_Participated))

# 2. Create the Plot
ggplot(cohort4_final_gains, aes(x = Months_Factor, y = Gain_SelfEfficacy, fill = Months_Factor)) +
  # Add the boxplot for the summary statistics
  geom_boxplot(alpha = 0.5, outlier.shape = NA) + 
  
  # Add each StudyNumber as a jittered dot
  # width = 0.2 prevents the dots from spreading too far horizontally
  geom_jitter(width = 0.2, size = 2, alpha = 0.7, color = "#2c3e50") +
  
  # Styling
  theme_minimal(base_size = 14) +
  scale_fill_brewer(palette = "Greens") + # Subtle green gradient for time
  labs(
    title = "Self-Efficacy Growth by Program Duration",
    subtitle = "Normalized Gain (First Pre to Last Post)",
    x = "Number of Months Participated",
    y = "Normalized Gain (Hake's g)",
    caption = "Each dot represents one participant (StudyNumber)"
  ) +
  theme(legend.position = "none") # Legend is redundant since x-axis is labeled

#export figure
ggsave(
  filename = "figs/SelfEfficacy_Gains.svg", 
  device = "svg",
  width = 8,      # Width in inches
  height = 6,     # Height in inches
  units = "in"
)

Figure 2: Lack of net gains in self-efficacy by length of participation in the program.A one-way ANOVA was conducted to compare the effect of participation length on normalized self-efficacy gains. The results indicated that there were no significant differences between those who participated for 1, 2, or 3 months, \(F(2, 37) = 0.21, p = .809\).

A one-way ANOVA was conducted to compare the effect of participation length on normalized self-efficacy gains. The results indicated that there were no significant differences between those who participated for 1, 2, or 3 months, \(F(2, 37) = 0.21, p = .809\).

Gains over baseline self-efficacy?

Show the code

# Filter and test for each month group
t_test_results <- cohort4_final_gains %>%
  group_by(Months_Participated) %>%
  summarise(
    Mean_Gain = mean(Gain_SelfEfficacy, na.rm = TRUE),
    P_Value = t.test(Gain_SelfEfficacy, mu = 0)$p.value
  )

print(t_test_results)

# A tibble: 3 × 3
  Months_Participated Mean_Gain P_Value
                <int>     <dbl>   <dbl>
1                   1  -0.00825   0.953
2                   2  -0.0476    0.732
3                   3   0.0929    0.612

Kruskal-Wallis Test for Normalized Gains in Resilience

Show the code

resilience_kw <- kruskal.test(Gain_Resilience ~ as.factor(Months_Participated), 
                              data = cohort4_final_gains)

# View the results
print(resilience_kw)


    Kruskal-Wallis rank sum test

data:  Gain_Resilience by as.factor(Months_Participated)
Kruskal-Wallis chi-squared = 0.52091, df = 2, p-value = 0.7707

A Kruskal-Wallis test showed that normalized gains in resilience did not significantly differ by length of participation, \(\chi^2(2) = .521, p = .775\).

Show the code

ggplot(cohort4_final_gains, aes(x = Months_Factor, y = Gain_Resilience)) +
  # 1. Add the boxplot with the Purples palette
  geom_boxplot(aes(fill = Months_Factor), alpha = 0.5, outlier.shape = NA) + 
  scale_fill_brewer(palette = "Purples") + # Added missing + here

  # 2. Add StudyNumbers as dots, colored by the Ceiling condition
  geom_jitter(aes(color = (First_Pre_Res >= 5 | Last_Post_Res >= 5)), 
              width = 0.2, size = 2.5, alpha = 0.8) +

  # 3. Apply your specific purple colors
  scale_color_manual(values = c("#987eab", "#5e17eb"), 
                     labels = c("Below Ceiling", "At Ceiling")) +
  
  # 4. Styling, Labels, and REMOVING LEGEND
  theme_minimal(base_size = 15) + 
  theme(legend.position = "none") + # This hides all legends
  labs(title = "Resilience Normalized Gains by Participation Length",
       subtitle = "Darker purple dots indicate participants at the scale ceiling (5.0)",
       x = "Months of Participation",
       y = "Normalized Gain (Resilience)")
  
#export figure
ggsave(
  filename = "figs/Resilience_Gains.svg", 
  device = "svg",
  width = 8,      # Width in inches
  height = 6,     # Height in inches
  units = "in"
)

Figure 3: A Kruskal-Wallis test showed that normalized gains in resilience did not significantly differ by length of participation, X2(2) = .521, p = .775.

Gains over baseline resilience?

While we did not find that the normalized gains in resilience differed by length of time spent with us, we also wanted to know if there might have been significant gains over the baseline. So we ran a Wilcoxon signed-rank test.

Show the code

# Non-parametric test for resilience
wilcox_results <- cohort4_final_gains %>%
  group_by(Months_Participated) %>%
  summarise(
    Median_Gain = median(Gain_Resilience, na.rm = TRUE),
    P_Value = wilcox.test(Gain_Resilience, mu = 0)$p.value
  )

print(wilcox_results)

# A tibble: 3 × 3
  Months_Participated Median_Gain P_Value
                <int>       <dbl>   <dbl>
1                   1           0   0.850
2                   2           0   0.387
3                   3           0   0.586

Interactive Visualization of Self-efficacy Trajectories

Since the baseline remained unchanged regardless of how long participants engaged in the program, we thought it would be interesting to examine some individual trajectories. The code below creates an interactive chart showing the actual score for each and every time a participant answered the survey questions.

Show the code

# Create a chronological ordering factor
indiv_traject_data <- cohort4_long_final %>%
  mutate(
    # Create a label like "M1-Pre", "M1-Post", etc.
    Time_Label = paste0("M", Month_Number, "-", Measure_Type),
    # Create a numeric sequence for the x-axis
    # (Month 1 Pre = 1, Month 1 Post = 2, Month 2 Pre = 3, etc.)
    Timeline = (Month_Number - 1) * 2 + ifelse(Measure_Type == "Pre", 1, 2)
  ) %>%
  mutate(Time_Label = reorder(Time_Label, Timeline)) # Force R to keep them in order

# Create a list of IDs who stayed for 3 months
completers_3mo <- cohort4_long_final %>%
  group_by(StudyNumber) %>%
  filter(max(Month_Number) == 3) %>%
  pull(StudyNumber) %>%
  unique()

indiv_traject_data <- indiv_traject_data %>%
  filter(StudyNumber %in% completers_3mo)

# 1. Create a "Shared Data" object
shared_data <- highlight_key(indiv_traject_data, ~StudyNumber)

# 2. Build the plot using the shared data
p_interactive <- ggplot(shared_data, aes(x = Timeline, y = Self_Efficacy, group = StudyNumber)) +
  geom_line(color = "steelblue", alpha = 0.3) +
  geom_point(alpha = 0.5) +
  scale_x_continuous(breaks = 1:6, 
                     labels = c("M1-Pre", "M1-Post", "M2-Pre", "M2-Post", "M3-Pre", "M3-Post")) +
  theme_minimal()

# 3. Use ggplotly with 'highlight' options
ggplotly(p_interactive) %>%
  highlight(
    on = "plotly_hover", # Line lights up when you hover
    color = "#45185e",       # The color it turns when highlighted
    selectize = TRUE,    # Adds a dropdown menu to select a specific StudyNumber
    persistent = FALSE
  )

Who falls into a “high”, “medium”, and “low” self-efficacy categories?

Show the code

# 1. Calculate the Global Mean and Standard Deviation for all rows
# We assume your column is named 'Composite_Self_Efficacy'
global_stats <- cohort4_long_final %>%
  summarise(
    Mean_SE = mean(Self_Efficacy, na.rm = TRUE),
    SD_SE = sd(Self_Efficacy, na.rm = TRUE)
  )

# Save these as variables for the thresholds
m <- global_stats$Mean_SE
s <- global_stats$SD_SE
high_thresh <- m + s
low_thresh  <- m - s

# 2. Identify participants based on their average score
# We group by StudyNumber so each person has one "dot" on the plot
participant_summary <- cohort4_long_final %>%
  group_by(StudyNumber, Research_Category) %>%
  summarise(Avg_SE = mean(Self_Efficacy, na.rm = TRUE), .groups = "drop") %>%
  mutate(
    Outlier_Status = case_when(
      Avg_SE >= high_thresh ~ "High (>1 SD)",
      Avg_SE <= low_thresh  ~ "Low (<1 SD)",
      TRUE ~ "Within 1 SD"
    )
  )

# Print the outliers to the console
print("Participants falling 1 SD or more above the mean:")

[1] "Participants falling 1 SD or more above the mean:"

Show the code

print(participant_summary %>% filter(Outlier_Status == "High (>1 SD)"))

# A tibble: 5 × 4
  StudyNumber Research_Category Avg_SE Outlier_Status
  <fct>       <chr>              <dbl> <chr>         
1 12          Undergraduates      5.88 High (>1 SD)  
2 25          Undergraduates      5.97 High (>1 SD)  
3 27          Undergraduates      5.66 High (>1 SD)  
4 33          Graduate Students   5.75 High (>1 SD)  
5 36          Undergraduates      5.81 High (>1 SD)

Show the code

print("Participants falling 1 SD or more below the mean:")

[1] "Participants falling 1 SD or more below the mean:"

Show the code

print(participant_summary %>% filter(Outlier_Status == "Low (<1 SD)"))

# A tibble: 7 × 4
  StudyNumber Research_Category Avg_SE Outlier_Status
  <fct>       <chr>              <dbl> <chr>         
1 1           Undergraduates      4.46 Low (<1 SD)   
2 2           Undergraduates      4.62 Low (<1 SD)   
3 9           Graduate Students   4.21 Low (<1 SD)   
4 19          Undergraduates      4.62 Low (<1 SD)   
5 23          Undergraduates      4.62 Low (<1 SD)   
6 28          Graduate Students   4.47 Low (<1 SD)   
7 40          Graduate Students   4.56 Low (<1 SD)

Show the code

# 1. SET THE ORDER: Reorder Research_Category as a factor
participant_summary$Research_Category <- factor(
  participant_summary$Research_Category, 
  levels = c("Undergraduates", "Graduate Students", "Primary Investigators")
)

# 3. Graph the results
ggplot(participant_summary, aes(x = Research_Category, y = Avg_SE, color = Research_Category)) +
  # Add the Mean and SD threshold lines
  geom_hline(yintercept = m, color = "black", size = 1) +
  geom_hline(yintercept = c(high_thresh, low_thresh), linetype = "dashed", color = "#45185e") +
  
  # Add the participants as dots
  geom_jitter(width = 0.2, size = 3, alpha = 0.8) +
  scale_color_manual(values = c(
    "Undergraduates" = "#1b9e77", 
    "Graduate Students" = "#d95f02", 
    "Primary Investigators" = "#7570b3"
  )) +
  
  # Label the dots with StudyNumber for those outside 1 SD
  geom_text(aes(label = ifelse(Outlier_Status != "Within 1 SD", StudyNumber, "")), 
            vjust = -1, size = 3, fontface = "bold", show.legend = FALSE) +
  
  # Professional Formatting
  theme_classic(base_size = 14) +
  labs(
    title = "Self-Efficacy Distribution by Research Category",
    subtitle = paste0("Solid line: Mean (", round(m, 2), ") | Dashed lines: ±1 Standard Deviation"),
    y = "Average Self-Efficacy Score",
    x = "Research Category",
    caption = "Study Numbers are labeled for participants >1 SD from the mean."
  ) +
  theme(legend.position = "none")

Show the code

ggsave(
  filename = "figs/ResearchType_ActualScores.svg", 
  device = "svg",
  width = 8,      # Width in inches
  height = 6,     # Height in inches
  units = "in"
)

Years in Research Training

We first need to add the Years_Factor of years in research training to our data frame

Show the code

# 1. Join only the Years_Factor column into your gains data frame
cohort4_final_gains <- cohort4_final_gains %>%
  left_join(cohort4_with_composite %>% 
              dplyr::select(StudyNumber, Years_Factor), # Explicitly tell R to use dplyr
            by = "StudyNumber")

Visualize Gains in Self-Efficacy by Years in Research Training

Show the code

ggplot(cohort4_final_gains, aes(x = Years_Factor, y = Gain_SelfEfficacy)) +
  # Boxplot with Green palette
  geom_boxplot(aes(fill = Years_Factor), alpha = 0.5, outlier.shape = NA) + 
  scale_fill_brewer(palette = "Greens") +

  # Add individual points
  geom_jitter(width = 0.2, size = 2.5, alpha = 0.7, color = "#2c3e50") +
  
  theme_minimal(base_size = 15) + 
  theme(legend.position = "none") + 
  labs(title = "Self-Efficacy Gains by Years of Experience",
       x = "Years of Experience (Category)",
       y = "Normalized Gain (Self-Efficacy)")

Show the code

#export figure
ggsave(
  filename = "figs/SE_Gains_YrsExp.svg", 
  device = "svg",
  width = 8,      # Width in inches
  height = 6,     # Height in inches
  units = "in"
)

visualize the gains in resilience by years of research training

Show the code

ggplot(cohort4_final_gains, aes(x = Years_Factor, y = Gain_Resilience)) +
  # Boxplot with Purple palette
  geom_boxplot(aes(fill = Years_Factor), alpha = 0.5, outlier.shape = NA) + 
  scale_fill_brewer(palette = "Purples") +

  # Add individual points, highlighting the ceiling group in dark purple
  geom_jitter(aes(color = (First_Pre_Res >= 5 | Last_Post_Res >= 5)), 
              width = 0.2, size = 2.5, alpha = 0.8) +
  scale_color_manual(values = c("black", "#7b178e")) +
  
  theme_minimal(base_size = 15) + 
  theme(legend.position = "none") + 
  labs(title = "Resilience Gains by Years of Experience",
       subtitle = "Darker purple dots indicate participants at the scale ceiling",
       x = "Years of Experience (Category)",
       y = "Normalized Gain (Resilience)")

Show the code

#export figure
ggsave(
  filename = "figs/Res_Gains_YrsExp.svg", 
  device = "svg",
  width = 8,      # Width in inches
  height = 6,     # Height in inches
  units = "in"
)

Run the Tobit for Years in Research Experience

Show the code

# Tobit for Self-Efficacy (Ceiling = 6)
tobit_se_years <- censReg(Last_Post_SE ~ First_Pre_SE + Years_Factor, 
                    right = 6, 
                    data = cohort4_final_gains)

summary(tobit_se_years)


Call:
censReg(formula = Last_Post_SE ~ First_Pre_SE + Years_Factor, 
    right = 6, data = cohort4_final_gains)

Observations:
         Total  Left-censored     Uncensored Right-censored 
            77              0             72              5 

Coefficients:
              Estimate Std. error t value  Pr(> t)    
(Intercept)    2.28680    0.41257   5.543 2.98e-08 ***
First_Pre_SE   0.60358    0.07943   7.599 2.98e-14 ***
Years_Factor2 -0.13485    0.10690  -1.261  0.20714    
Years_Factor3 -0.36349    0.13236  -2.746  0.00603 ** 
Years_Factor4 -0.21031    0.22713  -0.926  0.35447    
Years_Factor5 -0.49220    0.25751  -1.911  0.05596 .  
Years_Factor6 -0.07648    0.11558  -0.662  0.50817    
logSigma      -1.06135    0.08488 -12.504  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Newton-Raphson maximisation, 6 iterations
Return code 8: successive function values within relative tolerance limit (reltol)
Log-likelihood: -31.91955 on 8 Df

Show the code

group_summary <- cohort4_final_gains %>%
  group_by(Years_Factor) %>%
  summarise(
    n = n(),
    mean_pre = mean(First_Pre_SE, na.rm = TRUE),
    mean_post = mean(Last_Post_SE, na.rm = TRUE),
    mean_gain = mean(Gain_SelfEfficacy, na.rm = TRUE)
  ) %>%
  mutate(across(where(is.numeric), round, 2))

print(group_summary)

# A tibble: 6 × 5
  Years_Factor     n mean_pre mean_post mean_gain
  <fct>        <dbl>    <dbl>     <dbl>     <dbl>
1 1               19     5.1       5.35      0.25
2 2               24     5.05      5.19      0.05
3 3               11     5.22      5.06     -0.23
4 4                3     5.71      5.46     -0.53
5 5                2     5         4.81     -0.54
6 6               18     4.88      5.15      0.06

Show the code

tobit_res_years <- censReg(Last_Post_Res ~ First_Pre_Res + Years_Factor, 
                    right = 5, 
                    data = cohort4_final_gains)

summary(tobit_res_years)


Call:
censReg(formula = Last_Post_Res ~ First_Pre_Res + Years_Factor, 
    right = 5, data = cohort4_final_gains)

Observations:
         Total  Left-censored     Uncensored Right-censored 
            77              0             46             31 

Coefficients:
              Estimate Std. error t value  Pr(> t)    
(Intercept)   -0.74299    0.78121  -0.951    0.342    
First_Pre_Res  1.23420    0.18309   6.741 1.57e-11 ***
Years_Factor2 -0.04031    0.20612  -0.196    0.845    
Years_Factor3 -0.00807    0.26734  -0.030    0.976    
Years_Factor4 -0.09393    0.46368  -0.203    0.839    
Years_Factor5  0.22080    0.48602   0.454    0.650    
Years_Factor6 -0.04283    0.21849  -0.196    0.845    
logSigma      -0.48546    0.11425  -4.249 2.15e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Newton-Raphson maximisation, 7 iterations
Return code 1: gradient close to zero (gradtol)
Log-likelihood: -63.05633 on 8 Df

Show the code

# Pulling the results into a data frame
tobit_coeffs <- as.data.frame(summary(tobit_res_years)$estimate)
tobit_coeffs$Variable <- rownames(tobit_coeffs)

# Filtering for just the Years_Factor rows
tobit_plot_data <- tobit_coeffs %>% 
  filter(grepl("Years_Factor", Variable))

ggplot(tobit_plot_data, aes(x = Variable, y = Estimate)) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") + # Zero line
  geom_point(size = 4, color = "#2c3e50") +
  geom_errorbar(aes(ymin = Estimate - `Std. error`, ymax = Estimate + `Std. error`), width = 0.2) +
  theme_minimal() +
  coord_flip() + # Makes it easier to read variable names
  labs(title = "Tobit Coefficients: Effect of Years of Experience",
       subtitle = "Reference Group: Years_Factor 1 | Error bars represent Std. Error",
       y = "Coefficient Estimate (Relative to Group 1)",
       x = "Experience Group")

Show the code

# Pulling the results into a data frame
tobit_coeffs_se <- as.data.frame(summary(tobit_se_years)$estimate)
tobit_coeffs_se$Variable <- rownames(tobit_coeffs_se)

# Filtering for just the Years_Factor rows
tobit_seplot_data <- tobit_coeffs_se %>% 
  filter(grepl("Years_Factor", Variable))

ggplot(tobit_seplot_data, aes(x = Variable, y = Estimate)) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") + # Zero line
  geom_point(size = 4, color = "#2c3e50") +
  geom_errorbar(aes(ymin = Estimate - `Std. error`, ymax = Estimate + `Std. error`), width = 0.2) +
  theme_minimal() +
  coord_flip() + # Makes it easier to read variable names
  labs(title = "Tobit Coefficients: Effect of Years of Experience on Self-Efficacy",
       subtitle = "Reference Group: Years_Factor 1 | Error bars represent Std. Error",
       y = "Coefficient Estimate (Relative to Group 1)",
       x = "Experience Group")

Are gains better than zero?

Show the code

# For Self-Efficacy
se_tests <- cohort4_final_gains %>%
  group_by(Years_Factor) %>%
  summarise(
    n = n(),
    mean_gain = mean(Gain_SelfEfficacy, na.rm = TRUE),
    p_value = t.test(Gain_SelfEfficacy, mu = 0)$p.value
  )
summary(se_tests)

 Years_Factor       n           mean_gain           p_value       
 1:1          Min.   : 2.00   Min.   :-0.54167   Min.   :0.03131  
 2:1          1st Qu.: 5.00   1st Qu.:-0.45395   1st Qu.:0.10306  
 3:1          Median :14.50   Median :-0.08952   Median :0.43888  
 4:1          Mean   :12.83   Mean   :-0.15590   Mean   :0.37040  
 5:1          3rd Qu.:18.75   3rd Qu.: 0.05815   3rd Qu.:0.60277  
 6:1          Max.   :24.00   Max.   : 0.25338   Max.   :0.66479

Show the code

# For Resilience
resilience_tests <- cohort4_final_gains %>%
  group_by(Years_Factor) %>%
  summarise(
    n = n(),
    mean_gain = mean(Gain_Resilience, na.rm = TRUE),
    p_value = t.test(Gain_Resilience, mu = 0)$p.value  # Testing if mean is different from 0
  )
summary(resilience_tests)

 Years_Factor       n           mean_gain           p_value      
 1:1          Min.   : 2.00   Min.   :-0.16667   Min.   :0.1227  
 2:1          1st Qu.: 5.00   1st Qu.: 0.03289   1st Qu.:0.1764  
 3:1          Median :14.50   Median : 0.13871   Median :0.3137  
 4:1          Mean   :12.83   Mean   : 0.09043   Mean   :0.4520  
 5:1          3rd Qu.:18.75   3rd Qu.: 0.17282   3rd Qu.:0.7020  
 6:1          Max.   :24.00   Max.   : 0.25000   Max.   :1.0000

References

Abdolrezapour, Parisa, Sahar Jahanbakhsh Ganjeh, and Nasim Ghanbari. 2023. “Self-Efficacy and Resilience as Predictors of Students’ Academic Motivation in Online Education.” PLoS One 18 (5): e0285984. https://doi.org/10.1371/journal.pone.0285984.