Predicting Student Dropout Risk Using Synthetic Data

Victoria Fashina

2024-12-31

Project Overview

🖥️ Live App: View Student Dropout Predictor Dashboard

Objective: Build a tool to predict university student dropout risk after their second semester.

Built with:
  • R, ggplot2, plotly, Shiny
  • ML models: Random Forest, Logistic Regression, GBM
  • SMOTE for class imbalance

Data: Synthetic, designed to replicate academic and behavioral student signals.

Spring 2025 Proposal

I Promised To:
  • ✅ Build a predictive AI based on attendance and Fitbit activity
  • ✅ Address cybersecurity and ethics in education
  • ✅ Deliver a working demo with interpretable features
  • ✅ Reflect on spiritual implications of AI (briefly)

The Dataset

  • 150 synthetic students
  • Variables include:
    • Course_Count, GPA, EFC
    • Fitbit-like data: Fatburn, Cardio, Cardio Peak
    • Dropout: defined as no enrollment after the 2nd semester

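Preprocessing: Reshaped to wide format (one row per student, with semester-suffixed columns such as GPA1 and GPA2), then added engineered change features.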
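Reshaping to wide format turns one row per student per semester into one row per student. The project did this in R; the sketch below shows the same pivot in plain Python for illustration only, assuming a long-format input with hypothetical `student_id` and `semester` fields (the real schema is not shown on the slides).

```python
from collections import defaultdict

# Hypothetical long-format input: one row per student per semester.
# Field names follow the slide's variables; the real schema is not shown.
long_rows = [
    {"student_id": "S001", "semester": 1, "GPA": 3.0, "Course_Count": 5},
    {"student_id": "S001", "semester": 2, "GPA": 2.5, "Course_Count": 4},
    {"student_id": "S002", "semester": 1, "GPA": 2.8, "Course_Count": 4},
    {"student_id": "S002", "semester": 2, "GPA": 3.2, "Course_Count": 5},
]


def to_wide(rows, measures):
    """Pivot long rows into one record per student, suffixing each measure with its semester number."""
    wide = defaultdict(dict)
    for row in rows:
        for m in measures:
            wide[row["student_id"]][f"{m}{row['semester']}"] = row[m]
    return dict(wide)


wide = to_wide(long_rows, ["GPA", "Course_Count"])
print(wide["S001"])  # {'GPA1': 3.0, 'Course_Count1': 5, 'GPA2': 2.5, 'Course_Count2': 4}
```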

Feature Engineering

  • GPA_Change = GPA2 - GPA1
  • Attendance_Change = Course_Count2 - Course_Count1
  • EFC_Change, Cardio_Change, etc.

These change features capture whether a student is gaining momentum or declining from one semester to the next.
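Each change feature is a simple difference between the semester-2 and semester-1 values of a wide-format record. A minimal Python sketch, assuming per-semester columns with a `1`/`2` suffix; `add_change_features` and `CHANGE_FEATURES` are hypothetical names, and the EFC and Cardio column names are inferred from the "etc." on the slide rather than confirmed.

```python
# Mapping from engineered feature name to the base measure it differences.
# GPA and Attendance follow the slide; the EFC and Cardio entries assume
# per-semester columns named EFC1/EFC2 and Cardio1/Cardio2.
CHANGE_FEATURES = {
    "GPA_Change": "GPA",
    "Attendance_Change": "Course_Count",
    "EFC_Change": "EFC",
    "Cardio_Change": "Cardio",
}


def add_change_features(record, spec=CHANGE_FEATURES):
    """Return a copy of a wide-format record with each change = semester-2 value minus semester-1 value."""
    out = dict(record)
    for name, base in spec.items():
        out[name] = record[f"{base}2"] - record[f"{base}1"]
    return out


# Illustrative values only, not taken from the project's dataset.
student = {
    "GPA1": 3.0, "GPA2": 2.5,
    "Course_Count1": 5, "Course_Count2": 4,
    "EFC1": 4000, "EFC2": 3500,
    "Cardio1": 30, "Cardio2": 18,
}
features = add_change_features(student)
print({k: features[k] for k in CHANGE_FEATURES})
# {'GPA_Change': -0.5, 'Attendance_Change': -1, 'EFC_Change': -500, 'Cardio_Change': -12}
```

Negative values flag a decline, such as a lower GPA or one fewer course in semester 2.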

Modeling Strategy

  • Models Tested:
    • ✅ Random Forest
    • Logistic Regression (baseline)
    • Gradient Boosting (best AUC)
  • Imbalance Handling:
    • SMOTE to oversample the minority (dropout) class
    • Repeated 5-fold cross-validation for more stable performance estimates
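The key detail when combining SMOTE with repeated cross-validation is ordering: oversampling must happen inside each training fold, otherwise synthetic points interpolated from test-fold students leak into training and inflate the scores. The Python sketch below is a minimal, dependency-free version of both pieces, not the project's R code; the 30/120 class split and the two random features are illustrative assumptions, and `smote` and `repeated_stratified_kfold` are hypothetical helper names.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Minimal SMOTE: each synthetic point lies on the segment between a
    random minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = rng or random.Random(0)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x by Euclidean distance (x itself excluded)
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


def repeated_stratified_kfold(labels, n_splits=5, n_repeats=3, seed=0):
    """Yield (train_idx, test_idx) pairs; every repeat reshuffles each class
    and deals it round-robin across folds so class ratios stay balanced."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for cls in sorted(set(labels)):
            idx = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(idx)
            for j, i in enumerate(idx):
                folds[j % n_splits].append(i)
        for f in range(n_splits):
            test = sorted(folds[f])
            train = sorted(i for g in range(n_splits) if g != f for i in folds[g])
            yield train, test


# Illustrative data: 150 students with 2 features, 30 of them dropouts (label 1).
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(150)]
y = [1] * 30 + [0] * 120

fold_balance = []
for train, test in repeated_stratified_kfold(y, n_splits=5, n_repeats=2):
    minority = [X[i] for i in train if y[i] == 1]
    n_majority = len(train) - len(minority)
    # Oversample the training fold only; the test fold keeps its real class ratio.
    synthetic = smote(minority, n_majority - len(minority), rng=rng)
    fold_balance.append((len(minority) + len(synthetic), n_majority))
    # ...fit a model on the training rows plus `synthetic`, then score it on `test`...

print(fold_balance[0])  # (96, 96)
```

Stratifying the folds keeps the dropout rate roughly equal in every test fold, which matters when there are only 150 students to split.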

Model Performance Summary

Model                 Accuracy   AUC    Kappa
Random Forest         36.7%      0.60   -0.28
Logistic Regression   53.3%      0.54   ~0.03
Gradient Boosting     46.7%      0.67   -0.09

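Insight: GBM had the highest AUC (0.67), i.e. the best ability to rank dropouts above non-dropouts, and Logistic Regression served as the baseline. All three models remain weak, however: Random Forest and GBM score below 50% accuracy, and every Kappa is at or below ~0.03, meaning agreement with actual outcomes is no better than chance. With only 150 synthetic students, these results are best read as a proof of concept.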
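The three columns measure different things. Accuracy and Kappa depend on a decision threshold, and Kappa additionally corrects accuracy for the agreement expected by chance; AUC ignores the threshold and scores only how well the predicted probabilities rank dropouts above non-dropouts. That is how a model can lead on AUC yet trail on accuracy. The dependency-free Python sketch below computes all three on a toy example; the 0.5 threshold and the data are assumptions for illustration, not project values.

```python
def accuracy(y_true, y_pred):
    """Share of students whose predicted label matches the actual outcome."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e): accuracy corrected for the
    agreement expected by chance from each side's label frequencies."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    labels = set(y_true) | set(y_pred)
    p_e = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    return (p_o - p_e) / (1 - p_e)


def roc_auc(y_true, scores):
    """AUC as the probability that a random dropout gets a higher score than
    a random non-dropout (ties count as half)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy example (not project data): 1 = dropout, scores = predicted dropout probability.
y_true = [1, 1, 0, 0, 0, 0]
scores = [0.8, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 decision threshold

print(round(accuracy(y_true, y_pred), 3))      # 0.667
print(round(cohens_kappa(y_true, y_pred), 3))  # 0.25
print(roc_auc(y_true, scores))                 # 0.875
```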

📊 Visualizing Risk Levels

Interpreting the Risk Plot

  • Bars show the predicted dropout probability for the 20 highest-risk students
  • Color legend:
    • 🟢 True Positive: correctly predicted dropout
    • 🔴 False Negative: dropout missed by model
    • 🔵 True Negative: correctly predicted to stay
    • 🟠 False Positive: flagged dropout, but stayed
  • Helps advisors review individual cases and intervene early
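Each bar's colour is simply the confusion-matrix cell for that student at a chosen decision threshold. A Python sketch of that labelling step, assuming a 0.5 threshold and hypothetical field names (`prob`, `dropout`); it prepares rows for plotting and does not reproduce the app's ggplot2/plotly code.

```python
# Legend colours from the risk plot, keyed by confusion-matrix outcome.
LEGEND = {
    "True Positive": "green",
    "False Negative": "red",
    "True Negative": "blue",
    "False Positive": "orange",
}


def outcome(actual, predicted):
    """Name the confusion-matrix cell for one student (1 = dropout)."""
    if actual == 1:
        return "True Positive" if predicted == 1 else "False Negative"
    return "False Positive" if predicted == 1 else "True Negative"


def risk_rows(students, threshold=0.5, top_n=20):
    """Rank students by predicted dropout probability, keep the top_n, and
    attach the outcome label and legend colour used to fill each bar."""
    ranked = sorted(students, key=lambda s: s["prob"], reverse=True)[:top_n]
    rows = []
    for s in ranked:
        label = outcome(s["dropout"], int(s["prob"] >= threshold))
        rows.append((s["id"], s["prob"], label, LEGEND[label]))
    return rows


# Hypothetical students; "prob" is a model's predicted dropout probability.
students = [
    {"id": "S01", "prob": 0.82, "dropout": 1},
    {"id": "S02", "prob": 0.64, "dropout": 0},
    {"id": "S03", "prob": 0.41, "dropout": 1},
    {"id": "S04", "prob": 0.12, "dropout": 0},
]
for row in risk_rows(students, top_n=3):
    print(row)
# ('S01', 0.82, 'True Positive', 'green')
# ('S02', 0.64, 'False Positive', 'orange')
# ('S03', 0.41, 'False Negative', 'red')
```

Because the list is ranked by probability rather than by predicted label, a student just below the threshold can still appear among the highest-risk bars as a red False Negative.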

Ethics & Cybersecurity

  • All data pseudonymized
  • Protected processing and secure data handling
  • Ethical reflections:
    • AI should support human care, not replace it
    • Christian ethics: fairness, transparency, stewardship

“AI should serve the student, not replace the counselor.”
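The slides do not say how pseudonymization was done, and this dataset is synthetic. For real student records, one common approach is a keyed hash: HMAC-SHA256 with a secret key gives each student a stable pseudonym without storing a lookup table, and unlike a plain hash it cannot be reversed by hashing every possible ID. A minimal Python sketch using only the standard library; the `STU-` prefix, the 12-character truncation, and the sample ID are illustrative assumptions.

```python
import hashlib
import hmac
import secrets


def pseudonymize(student_id: str, key: bytes) -> str:
    """Replace a real student ID with a stable pseudonym via HMAC-SHA256.
    The same ID and key always give the same pseudonym, so records can still
    be joined, but without the key the pseudonym cannot feasibly be linked back."""
    digest = hmac.new(key, student_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return "STU-" + digest[:12]


# The key must be stored separately from the dataset (e.g. in a secrets manager).
key = secrets.token_bytes(32)
first = pseudonymize("A0012345", key)   # hypothetical student ID
again = pseudonymize("A0012345", key)
print(first == again)  # True
```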

Future Work 🚀

  • Predict dropout during first semester
    • Requires daily logs: LMS logins, activity, device data
  • Add real-time flags in the dashboard
  • Train models on institutional data for higher fidelity

Thank You!

Let’s build systems that notice students before they disappear.

🧠 Questions? 💬 Contact: [vfashina@oru.edu]

🖥️ Launch the Live Shiny App