1 INTRODUCTION

Passing the bar exam is a critical milestone for law school graduates and a key indicator of academic readiness and program effectiveness. Within a traditional three-year law curriculum, students advance from foundational legal instruction in the first and second years (1L and 2L) to more specialized preparation in the final year (3L), culminating in a standardized licensure exam.

As part of an internal academic review, this study analyzes data from students who took the bar exam between 2021 and 2024. The dataset includes both successful and unsuccessful candidates, offering a comprehensive view of academic, preparatory, and background factors potentially associated with bar passage.

The primary objective is to build a logistic regression model that identifies statistically significant predictors of exam success. To achieve this, the analysis is structured as follows:

  • Data Loading and Preparation: The dataset is cleaned and preprocessed by transforming character fields into categorical variables, resolving inconsistencies, creating binary indicators for program participation, and removing records with critical missing values.

  • Exploratory and Correlation Analysis: Summary statistics, data type inspection, and correlation matrices are used to assess redundancy among variables, evaluate scale and distribution, and guide the selection of valid predictors by avoiding post-outcome variables.

  • Model Construction: A logistic regression model is specified, including both main effects and theoretically justified interaction terms, to quantify the relationship between key predictors (e.g., GPA, LSAT, academic support) and bar passage.

  • Model Optimization: A backward stepwise selection process based on the Akaike Information Criterion (AIC) is employed to refine the model, retaining only the most informative and statistically robust terms.

This structured approach enables an interpretable analysis of bar exam performance across multiple student groups.


2 EXPLORATORY DATA ANALYSIS (EDA)

This section presents an exploratory data analysis (EDA) to prepare the dataset for modeling. It begins with loading and inspecting the raw data, identifying data types, and evaluating the presence of missing values. Subsequent steps involve data cleaning and transformation to correct formatting issues, recode inconsistent levels, and generate derived variables that enhance interpretability. Relevant features are then selected and renamed to align with the project’s structure. Finally, correlation analysis is conducted to assess linear relationships among numerical variables, helping to identify redundant predictors and ensuring the inclusion of only valid predictors.


2.1 DATA LOADING AND INITIAL ANALYSIS

In this section, we load and prepare the dataset for analysis. The process involves importing the data from an external source, inspecting its structure, identifying missing values, and converting key variables to their appropriate types. These steps ensure the dataset is properly formatted and ready for further statistical exploration and modeling.

DATA LOADING

# Create the dataframe
url <- "https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/Updated_Bar_Data_For_Review_Final.csv"
data <- read.csv(url)
rmarkdown::paged_table(data)


INITIAL DATA ANALYSIS

# Initial Data Exploration
str(data)
## 'data.frame':    476 obs. of  28 variables:
##  $ Year                       : int  2021 2021 2021 2021 2021 2021 2021 2021 2021 2021 ...
##  $ PassFail                   : chr  "F" "F" "F" "F" ...
##  $ Age                        : num  29.1 29.6 29 36.2 28.9 30.8 29.1 42.9 28.3 27.1 ...
##  $ LSAT                       : int  152 155 157 156 145 154 149 160 152 150 ...
##  $ UGPA                       : num  3.42 2.82 3.46 3.13 3.49 2.85 3.43 3.29 3.62 3.07 ...
##  $ CivPro                     : chr  "B+" "B+" "C" "D+" ...
##  $ LPI                        : chr  "A" "B" "B" "C" ...
##  $ LPII                       : chr  "A" "B" "B" "C+" ...
##  $ GPA_1L                     : num  3.21 2.43 2.62 2.27 2.29 ...
##  $ GPA_Final                  : num  3.29 3.2 2.91 2.77 2.9 2.82 3 3.09 3.21 2.74 ...
##  $ FinalRankPercentile        : num  0.46 0.33 0.08 0.02 0.08 0.05 0.15 0.22 0.34 0.01 ...
##  $ Accommodations             : chr  "N" "Y" "N" "N" ...
##  $ Probation                  : chr  "N" "Y" "N" "Y" ...
##  $ LegalAnalysis_TexasPractice: chr  "Y" "Y" "Y" "Y" ...
##  $ AdvLegalPerfSkills         : chr  "Y" "Y" "Y" "Y" ...
##  $ AdvLegalAnalysis           : chr  "Y" "Y" "Y" "Y" ...
##  $ BarPrepCompany             : chr  "Barbri" "Barbri" "Barbri" "Barbri" ...
##  $ BarPrepCompletion          : num  0.96 0.98 0.48 1 0.77 0.02 0.9 0.76 0.77 0.88 ...
##  $ OptIntoWritingGuide        : chr  "" "" "" "" ...
##  $ X.LawSchoolBarPrepWorkshops: int  3 0 3 0 5 1 5 5 1 5 ...
##  $ StudentSuccessInitiative   : chr  "N" "Cochran" "Smith" "Baldwin" ...
##  $ BarPrepMentor              : chr  "N" "N" "N" "N" ...
##  $ MPRE                       : num  103 76 99 81 99 NA 90 97 100 78 ...
##  $ MPT                        : num  3 3 3 2.5 3.5 3 2.5 2.5 3 2.5 ...
##  $ MEE                        : num  2.67 3.17 2.67 3 2.67 2 3.5 3 2.67 3.83 ...
##  $ WrittenScaledScore         : num  126 133 126 126 130 ...
##  $ MBE                        : num  133 133 118 140 125 ...
##  $ UBE                        : num  259 266 244 266 256 ...
summary(data)
##       Year        PassFail              Age             LSAT      
##  Min.   :2021   Length:476         Min.   :23.10   Min.   :141.0  
##  1st Qu.:2022   Class :character   1st Qu.:26.70   1st Qu.:153.0  
##  Median :2023   Mode  :character   Median :28.20   Median :156.0  
##  Mean   :2023                      Mean   :29.13   Mean   :155.3  
##  3rd Qu.:2024                      3rd Qu.:30.10   3rd Qu.:157.0  
##  Max.   :2024                      Max.   :65.70   Max.   :168.0  
##                                                                   
##       UGPA          CivPro              LPI                LPII          
##  Min.   :2.010   Length:476         Length:476         Length:476        
##  1st Qu.:3.250   Class :character   Class :character   Class :character  
##  Median :3.490   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :3.451                                                           
##  3rd Qu.:3.710                                                           
##  Max.   :4.140                                                           
##                                                                          
##      GPA_1L        GPA_Final    FinalRankPercentile Accommodations    
##  Min.   :2.200   Min.   :2.44   Min.   :0.0000      Length:476        
##  1st Qu.:2.781   1st Qu.:3.05   1st Qu.:0.2600      Class :character  
##  Median :3.083   Median :3.27   Median :0.5150      Mode  :character  
##  Mean   :3.086   Mean   :3.28   Mean   :0.5067                        
##  3rd Qu.:3.383   3rd Qu.:3.52   3rd Qu.:0.7500                        
##  Max.   :4.000   Max.   :3.99   Max.   :0.9900                        
##  NA's   :4                                                            
##   Probation         LegalAnalysis_TexasPractice AdvLegalPerfSkills
##  Length:476         Length:476                  Length:476        
##  Class :character   Class :character            Class :character  
##  Mode  :character   Mode  :character            Mode  :character  
##                                                                   
##                                                                   
##                                                                   
##                                                                   
##  AdvLegalAnalysis   BarPrepCompany     BarPrepCompletion OptIntoWritingGuide
##  Length:476         Length:476         Min.   :0.0200    Length:476         
##  Class :character   Class :character   1st Qu.:0.8000    Class :character   
##  Mode  :character   Mode  :character   Median :0.8900    Mode  :character   
##                                        Mean   :0.8635                       
##                                        3rd Qu.:0.9800                       
##                                        Max.   :1.0000                       
##                                        NA's   :23                           
##  X.LawSchoolBarPrepWorkshops StudentSuccessInitiative BarPrepMentor     
##  Min.   :0.000               Length:476               Length:476        
##  1st Qu.:0.000               Class :character         Class :character  
##  Median :0.000               Mode  :character         Mode  :character  
##  Mean   :1.532                                                          
##  3rd Qu.:3.000                                                          
##  Max.   :5.000                                                          
##                                                                         
##       MPRE             MPT             MEE        WrittenScaledScore
##  Min.   : 76.00   Min.   :1.000   Min.   :2.000   Min.   :111.7     
##  1st Qu.: 89.50   1st Qu.:3.000   1st Qu.:3.330   1st Qu.:138.0     
##  Median : 99.00   Median :3.500   Median :3.670   Median :146.9     
##  Mean   : 99.46   Mean   :3.651   Mean   :3.719   Mean   :146.6     
##  3rd Qu.:107.00   3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:155.7     
##  Max.   :145.00   Max.   :5.500   Max.   :5.330   Max.   :181.2     
##  NA's   :273                                                        
##       MBE             UBE       
##  Min.   :103.6   Min.   :227.3  
##  1st Qu.:138.7   1st Qu.:278.5  
##  Median :147.1   Median :293.5  
##  Mean   :146.2   Mean   :292.9  
##  3rd Qu.:154.0   3rd Qu.:306.8  
##  Max.   :187.9   Max.   :358.7  
## 
colSums(is.na(data))
##                        Year                    PassFail 
##                           0                           0 
##                         Age                        LSAT 
##                           0                           0 
##                        UGPA                      CivPro 
##                           0                           0 
##                         LPI                        LPII 
##                           0                           0 
##                      GPA_1L                   GPA_Final 
##                           4                           0 
##         FinalRankPercentile              Accommodations 
##                           0                           0 
##                   Probation LegalAnalysis_TexasPractice 
##                           0                           0 
##          AdvLegalPerfSkills            AdvLegalAnalysis 
##                           0                           0 
##              BarPrepCompany           BarPrepCompletion 
##                           0                          23 
##         OptIntoWritingGuide X.LawSchoolBarPrepWorkshops 
##                           0                           0 
##    StudentSuccessInitiative               BarPrepMentor 
##                           0                           0 
##                        MPRE                         MPT 
##                         273                           0 
##                         MEE          WrittenScaledScore 
##                           0                           0 
##                         MBE                         UBE 
##                           0                           0
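
Raw counts are easier to judge when expressed as a share of the rows. The following is a minimal sketch of that conversion, using a small made-up data frame (`toy`, not the bar dataset) so that it runs on its own:

```r
# Toy data frame standing in for the bar dataset (values are made up)
toy <- data.frame(
  GPA_1L = c(3.2, NA, 2.9, 3.5),
  MPRE   = c(NA, NA, 99, NA),
  LSAT   = c(152L, 155L, 157L, 156L)
)

# colMeans() of a logical matrix gives the share of TRUE values per column
pct_missing <- round(100 * colMeans(is.na(toy)), 1)

# Sort so the most incomplete columns come first
pct_missing <- sort(pct_missing, decreasing = TRUE)
print(pct_missing)
```

Applied to the real data, the same calculation would put MPRE at 273 of 476 rows, or roughly 57% missing.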

Define the Target Variable

# Define the Target Variable (PassFail)
data$PassFail <- factor(data$PassFail, levels = c("F", "P"))
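
The order of the levels matters for the later modeling step: when a factor is used as the response in a binomial `glm()`, R treats the first level as the failure class and the remaining level as the success class. The sketch below illustrates the resulting 0/1 coding on a short hypothetical vector (`pass_chr` is invented for this example):

```r
# Hypothetical outcome vector; only the level order matters here
pass_chr <- c("P", "F", "P", "P", "F")

# Listing "F" first makes it the reference (failure) level
pass_fct <- factor(pass_chr, levels = c("F", "P"))

print(levels(pass_fct))            # first level is the reference
print(as.integer(pass_fct) - 1L)   # 0 = "F", 1 = "P"
```

With this ordering, positive coefficients in a fitted model correspond to higher odds of passing.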

Convert Key Categorical Variables to Factors

# Convert Key Categorical Variables to Factors
categorical_vars <- c("Accommodations", "Probation", 
                      "LegalAnalysis_TexasPractice", 
                      "AdvLegalPerfSkills", "AdvLegalAnalysis", 
                      "BarPrepCompany", "StudentSuccessInitiative", "BarPrepMentor",
                      "CivPro", "LPI", "LPII")
data[categorical_vars] <- lapply(data[categorical_vars], factor)

Check Levels of Categorical Variables

# Check Levels of Categorical Variables
str(data)
## 'data.frame':    476 obs. of  28 variables:
##  $ Year                       : int  2021 2021 2021 2021 2021 2021 2021 2021 2021 2021 ...
##  $ PassFail                   : Factor w/ 2 levels "F","P": 1 1 1 1 1 1 1 1 1 2 ...
##  $ Age                        : num  29.1 29.6 29 36.2 28.9 30.8 29.1 42.9 28.3 27.1 ...
##  $ LSAT                       : int  152 155 157 156 145 154 149 160 152 150 ...
##  $ UGPA                       : num  3.42 2.82 3.46 3.13 3.49 2.85 3.43 3.29 3.62 3.07 ...
##  $ CivPro                     : Factor w/ 9 levels "","A","B","B+",..: 4 4 5 8 5 4 5 5 6 5 ...
##  $ LPI                        : Factor w/ 9 levels "","A","B","B+",..: 2 3 3 5 6 9 5 6 3 3 ...
##  $ LPII                       : Factor w/ 9 levels "","A","B","B+",..: 2 3 3 6 6 7 3 3 3 5 ...
##  $ GPA_1L                     : num  3.21 2.43 2.62 2.27 2.29 ...
##  $ GPA_Final                  : num  3.29 3.2 2.91 2.77 2.9 2.82 3 3.09 3.21 2.74 ...
##  $ FinalRankPercentile        : num  0.46 0.33 0.08 0.02 0.08 0.05 0.15 0.22 0.34 0.01 ...
##  $ Accommodations             : Factor w/ 2 levels "N","Y": 1 2 1 1 1 1 1 1 1 1 ...
##  $ Probation                  : Factor w/ 3 levels "N","N ","Y": 1 3 1 3 3 1 3 3 1 3 ...
##  $ LegalAnalysis_TexasPractice: Factor w/ 2 levels "N","Y": 2 2 2 2 2 2 2 2 2 2 ...
##  $ AdvLegalPerfSkills         : Factor w/ 2 levels "N","Y": 2 2 2 2 2 2 2 2 2 2 ...
##  $ AdvLegalAnalysis           : Factor w/ 2 levels "N","Y": 2 2 2 2 2 2 2 2 2 2 ...
##  $ BarPrepCompany             : Factor w/ 7 levels "","Barbri","Helix",..: 2 2 2 2 7 7 7 7 7 7 ...
##  $ BarPrepCompletion          : num  0.96 0.98 0.48 1 0.77 0.02 0.9 0.76 0.77 0.88 ...
##  $ OptIntoWritingGuide        : chr  "" "" "" "" ...
##  $ X.LawSchoolBarPrepWorkshops: int  3 0 3 0 5 1 5 5 1 5 ...
##  $ StudentSuccessInitiative   : Factor w/ 22 levels "Arrington","Aycock",..: 16 7 21 3 3 17 19 6 16 6 ...
##  $ BarPrepMentor              : Factor w/ 68 levels "AbbeyCoufal",..: 49 49 49 49 49 49 49 49 49 49 ...
##  $ MPRE                       : num  103 76 99 81 99 NA 90 97 100 78 ...
##  $ MPT                        : num  3 3 3 2.5 3.5 3 2.5 2.5 3 2.5 ...
##  $ MEE                        : num  2.67 3.17 2.67 3 2.67 2 3.5 3 2.67 3.83 ...
##  $ WrittenScaledScore         : num  126 133 126 126 130 ...
##  $ MBE                        : num  133 133 118 140 125 ...
##  $ UBE                        : num  259 266 244 266 256 ...
sapply(data[, c("Accommodations", "Probation", 
                "LegalAnalysis_TexasPractice", "AdvLegalPerfSkills", "AdvLegalAnalysis", 
                "BarPrepCompany", "StudentSuccessInitiative", "BarPrepMentor", "CivPro", "LPI", "LPII")], levels)
## $Accommodations
## [1] "N" "Y"
## 
## $Probation
## [1] "N"  "N " "Y" 
## 
## $LegalAnalysis_TexasPractice
## [1] "N" "Y"
## 
## $AdvLegalPerfSkills
## [1] "N" "Y"
## 
## $AdvLegalAnalysis
## [1] "N" "Y"
## 
## $BarPrepCompany
## [1] ""            "Barbri"      "Helix"       "JD Advising" "Kaplan"     
## [6] "Quimbee"     "Themis"     
## 
## $StudentSuccessInitiative
##  [1] "Arrington"   "Aycock"      "Baldwin"     "Beyer"       "Chapman"    
##  [6] "Christopher" "Cochran"     "Corn"        "Gonzalez"    "Hardberger" 
## [11] "Humphrey"    "Keffer"      "Lauriat"     "Lux"         "McDonald"   
## [16] "N"           "Rosen"       "RSherwin"    "Saavedra"    "Sherwin"    
## [21] "Smith"       "Stafford"   
## 
## $BarPrepMentor
##  [1] "AbbeyCoufal"        "AmberBeard"         "AmberRich"         
##  [4] "AshleyPirtle"       "AshleySanders"      "BenIvey"           
##  [7] "BrendaJohnson"      "BryanGreer"         "CadyMello"         
## [10] "ChrisRhodes"        "ClayElliott"        "ColeShooter"       
## [13] "ColleenByrom"       "ColleenElbe(Potts)" "ColleenPotts"      
## [16] "DanielleSaavedra"   "DavidHutchens"      "DavidRice"         
## [19] "DeirdreWard"        "DenetteVaughn"      "DolphWenzel"       
## [22] "GrantCoffey"        "HaleyHickey"        "HolleyMcDaniel"    
## [25] "HollyHaseloff"      "HoltonWestbrook"    "JacquelynnMayes"   
## [28] "JessicaAycock"      "JohnMoore"          "JordanChavez"      
## [31] "JosephAustin"       "JulieDavis"         "JustinPlescha"     
## [34] "KathleenGoegel"     "KatyCrocker"        "KimberlyKelley"    
## [37] "LauraFidelie"       "LauraMcDivitt"      "LaurenWelch"       
## [40] "LeenaAl-Souki"      "MadelynDeviney"     "MariaOviedo"       
## [43] "MelissaWaggoner"    "MerylBenham"        "MichaelEconomidis" 
## [46] "MikelaBryant"       "MistyPratt"         "MonicaReyes"       
## [49] "N"                  "PaulaMilan"         "PaulaMillan"       
## [52] "PaulBarkhurst"      "QuentinWetsel"      "RebekahLuna"       
## [55] "RebekaLuna"         "ReidLollis"         "SaraThornton"      
## [58] "ScottKeffer"        "ScoutBlosser"       "TasiaEaslon"       
## [61] "TomHall"            "TravisWeibold"      "TylynnPayne"       
## [64] "VictoriaWhitehead"  "VictorMellinger"    "WilliamWells"      
## [67] "WillRaftis"         "Y-DanielleSaavedra"
## 
## $CivPro
## [1] ""   "A"  "B"  "B+" "C"  "C+" "D"  "D+" "F" 
## 
## $LPI
## [1] ""   "A"  "B"  "B+" "C"  "C+" "D"  "D+" "F" 
## 
## $LPII
## [1] ""   "A"  "B"  "B+" "C"  "C+" "CR" "D"  "D+"

Check Number of Unique Values for Numerical Variables

# Check Number of Unique Values for Numerical Variables
sapply(data[, c("LSAT", "UGPA", "GPA_1L", "GPA_Final", "FinalRankPercentile",
                "BarPrepCompletion", "X.LawSchoolBarPrepWorkshops",
                "MPRE", "MPT", "MEE", "MBE", "UBE")], function(x) length(unique(x)))
##                        LSAT                        UGPA 
##                          25                         143 
##                      GPA_1L                   GPA_Final 
##                         220                         133 
##         FinalRankPercentile           BarPrepCompletion 
##                         100                          56 
## X.LawSchoolBarPrepWorkshops                        MPRE 
##                           6                          54 
##                         MPT                         MEE 
##                           9                          21 
##                         MBE                         UBE 
##                         173                         348

Observations

  • The dataset contains multiple variable types, including integers, numeric (floating point), and characters that represent categorical data.
  • The target variable PassFail was initially read as a character and was explicitly converted into a binary factor with levels “F” and “P”.
  • Several academic and administrative attributes (e.g., CivPro, LPI, LPII, Probation, BarPrepCompany) were originally stored as character strings and were correctly converted to factors to support categorical analysis.
  • Some variables contain missing values, particularly GPA_1L (4), BarPrepCompletion (23), and MPRE (273); these may require imputation or exclusion strategies in further analysis.
  • Inconsistencies in categorical levels were identified (e.g., a trailing space in the Probation level “N ” and empty-string levels in CivPro, LPI, LPII, and BarPrepCompany), which need to be cleaned or recoded.
  • Variables such as BarPrepMentor (68 levels) and StudentSuccessInitiative (22 levels) are high-cardinality categorical variables. Some of their levels also look like spelling variants of the same name (e.g., “PaulaMilan” and “PaulaMillan”, “RebekahLuna” and “RebekaLuna”).
  • Numerical variables show a wide range in the number of unique values, indicating a mix of continuous and discrete characteristics. This may influence variable treatment during modeling.
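
Level problems like the ones noted above can also be found programmatically rather than by reading the printed levels. The following is a rough sketch on illustrative data; `suspicious_levels()` is a hypothetical helper written for this example, not a function from any package:

```r
# Toy factors reproducing the two problems noted above (values are illustrative)
toy <- data.frame(
  Probation = factor(c("N", "N ", "Y", "N")),
  CivPro    = factor(c("A", "", "B+", "C")),
  Accom     = factor(c("N", "Y", "N", "N"))
)

# Hypothetical helper: return levels that are empty or change when trimmed
suspicious_levels <- function(f) {
  lv <- levels(f)
  lv[lv == "" | lv != trimws(lv)]
}

flagged <- lapply(toy, suspicious_levels)
# Keep only the columns that have at least one flagged level
flagged <- flagged[lengths(flagged) > 0]
print(flagged)
```

This check only catches whitespace and empty strings; near-duplicate spellings of a name would need a separate comparison.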


2.2 DATA FILTERING AND PREPARATION

In this section, we perform essential data cleaning and preparation tasks to ensure consistency and analytical integrity. This includes correcting issues in categorical variables, generating new binary indicators for student participation, selecting relevant features, renaming variables for clarity, and addressing missing values by removing irrelevant columns and incomplete records.

2.2.1 Clean Specific Issues in Categorical Variables

In this section, we apply targeted data cleaning and transformation steps to improve variable consistency and prepare the dataset for analysis. This includes trimming whitespace, recoding empty strings, handling missing values in key categorical variables, and creating new binary indicators to better capture student participation in mentoring and support programs.

Clean Probation Variable

# Clean Probation Variable
data$Probation <- trimws(data$Probation)
data$Probation <- factor(data$Probation)

Replace Empty Strings in BarPrepCompany

# Replace Empty Strings in BarPrepCompany
levels(data$BarPrepCompany) <- c(levels(data$BarPrepCompany), "None")
data$BarPrepCompany[data$BarPrepCompany == ""] <- "None"
data$BarPrepCompany <- factor(data$BarPrepCompany)

Handle Missing Values in CivPro, LPI, and LPII

# Handle Missing Values in CivPro, LPI, and LPII
data$CivPro[data$CivPro == ""] <- NA
data$LPI[data$LPI == ""] <- NA
data$LPII[data$LPII == ""] <- NA

data$CivPro <- factor(data$CivPro)
data$LPI <- factor(data$LPI)
data$LPII <- factor(data$LPII)

Create Binary Variables for Mentor and Student Support Participation

# Create Binary Variables for Mentor and Student Support Participation
data$HadMentor <- ifelse(data$BarPrepMentor == "N", "No", "Yes")
data$HadMentor <- factor(data$HadMentor)

data$StudentSuccessParticipated <- ifelse(data$StudentSuccessInitiative == "N", "No", "Yes")
data$StudentSuccessParticipated <- as.factor(data$StudentSuccessParticipated)
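
After a recode like this, a cross-tabulation of the original levels against the new indicator is a quick way to confirm the rule behaved as intended. The sketch below uses a short illustrative mentor vector (not the real column) and the same `ifelse()` rule:

```r
# Illustrative mentor column: "N" means no mentor, anything else is a name
mentor <- factor(c("N", "TomHall", "N", "JohnMoore", "TomHall"))

# Same recoding rule as above
had_mentor <- factor(ifelse(mentor == "N", "No", "Yes"))

# Cross-tabulate original levels against the new indicator
check <- table(mentor, had_mentor)
print(check)
```

Every row whose original level is "N" should fall in the "No" column, and every named mentor in the "Yes" column. Note that under this rule any non-"N" entry, including variants such as "Y-DanielleSaavedra", counts as "Yes".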

2.2.2 Create Filtered Dataset by Selecting Relevant Variables

In this section, we refine the dataset by selecting only the variables relevant to the analysis and renaming them according to the project’s naming convention. We also assess the presence of missing values across all selected features and apply cleaning steps accordingly. This includes removing irrelevant columns and excluding incomplete records for important variables.

Select and Rename Relevant Variables

# Select and Rename Relevant Variables
data2 <- data[c(
  "Year", "PassFail", "Age", "LSAT", "UGPA",
  "CivPro", "LPI", "LPII", "GPA_1L", "GPA_Final", "FinalRankPercentile",
  "Accommodations", "Probation",
  "LegalAnalysis_TexasPractice", "AdvLegalPerfSkills", "AdvLegalAnalysis",
  "BarPrepCompany", "BarPrepCompletion",
  "OptIntoWritingGuide", "X.LawSchoolBarPrepWorkshops",
  "MPRE", "MPT", "MEE", "WrittenScaledScore", "MBE", "UBE",
  "StudentSuccessParticipated", "HadMentor"
)]

# Rename Variables in data2 According to Provided Naming Convention
colnames(data2) <- c(
  "Class", "Pass", "Age", "LSAT", "UGPA",
  "CivPro", "LP1", "LP2", "OneCum", "FGPA", "FinalRankPercentile",
  "Accom", "Probation",
  "LegalAnalysis", "AdvLegalPerf", "AdvLegalAnalysis",
  "BarPrep", "PctBarPrepComplete",
  "OptIntoWritingGuide", "NumPrepWorkshops",
  "MPRE", "MPT", "MEE", "WrittenScaledScore", "MBE", "UBE",
  "StudentSuccessInitiative", "BarPrepMentor"
)

Assess Missing Values

# Assess Missing Values
colSums(is.na(data2))
##                    Class                     Pass                      Age 
##                        0                        0                        0 
##                     LSAT                     UGPA                   CivPro 
##                        0                        0                        4 
##                      LP1                      LP2                   OneCum 
##                        5                        3                        4 
##                     FGPA      FinalRankPercentile                    Accom 
##                        0                        0                        0 
##                Probation            LegalAnalysis             AdvLegalPerf 
##                        0                        0                        0 
##         AdvLegalAnalysis                  BarPrep       PctBarPrepComplete 
##                        0                        0                       23 
##      OptIntoWritingGuide         NumPrepWorkshops                     MPRE 
##                        0                        0                      273 
##                      MPT                      MEE       WrittenScaledScore 
##                        0                        0                        0 
##                      MBE                      UBE StudentSuccessInitiative 
##                        0                        0                        0 
##            BarPrepMentor 
##                        0

Observations

  • The variable MPRE has approximately 57% missing values. It is recommended to remove this column entirely to avoid introducing incomplete or unreliable information into the model.
  • The variable OptIntoWritingGuide exists in the dataset but is not part of the required variables for the project analysis. It is recommended to remove this column as it is irrelevant to the project’s objectives.
  • The variable WrittenScaledScore is present but was not listed among the variables required for the analysis. It is recommended to remove this column to maintain focus on the specified features.
  • The variables CivPro, LP1, LP2, OneCum, and PctBarPrepComplete have a small number of missing values. It is recommended to remove rows with missing data in any of these variables to ensure that all records used in the modeling are complete and reliable.
  • The variable FinalRankPercentile is present but was not explicitly listed among the required variables. It is recommended to remove this column as well.

# Remove unnecessary columns: MPRE, OptIntoWritingGuide, WrittenScaledScore, and FinalRankPercentile
data2 <- subset(data2, select = -c(MPRE, OptIntoWritingGuide, WrittenScaledScore, FinalRankPercentile))

# Remove rows with missing values in critical academic and preparation variables
critical_vars <- c("CivPro", "LP1", "LP2", "OneCum", "PctBarPrepComplete")
data2 <- data2[complete.cases(data2[, critical_vars]), ]

# Verify that all missing values are removed
colSums(is.na(data2))
##                    Class                     Pass                      Age 
##                        0                        0                        0 
##                     LSAT                     UGPA                   CivPro 
##                        0                        0                        0 
##                      LP1                      LP2                   OneCum 
##                        0                        0                        0 
##                     FGPA                    Accom                Probation 
##                        0                        0                        0 
##            LegalAnalysis             AdvLegalPerf         AdvLegalAnalysis 
##                        0                        0                        0 
##                  BarPrep       PctBarPrepComplete         NumPrepWorkshops 
##                        0                        0                        0 
##                      MPT                      MEE                      MBE 
##                        0                        0                        0 
##                      UBE StudentSuccessInitiative            BarPrepMentor 
##                        0                        0                        0
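
Because listwise deletion shrinks the sample, it is worth recording how many rows the filter removes. The following sketch shows one way to do that on a made-up data frame; the column choice in `critical` is an assumption for the example only:

```r
# Toy data with a few gaps in the "critical" columns (made-up values)
toy <- data.frame(
  CivPro = c("A", NA, "B", "C", "B+"),
  OneCum = c(3.1, 2.8, NA, 3.0, 3.4),
  LSAT   = c(150, 152, 155, NA, 158)   # not treated as critical here
)
critical <- c("CivPro", "OneCum")

# TRUE for rows with no NA in any critical column
keep <- complete.cases(toy[, critical])
cleaned <- toy[keep, ]

cat("rows before: ", nrow(toy), "\n")
cat("rows after:  ", nrow(cleaned), "\n")
cat("rows dropped:", sum(!keep), "\n")
```

Only the listed columns drive the filter, so a missing value in a non-critical column (LSAT in this toy example) does not remove the row.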


2.3 CORRELATION ANALYSIS

In this section, we examine the relationships among numeric variables through a correlation matrix. This analysis helps identify redundant predictors and flag variables that would be invalid as predictors, such as scores that are only known once the bar exam has been taken.

# Correlation Matrix Visualization
library(ggcorrplot)  # provides ggcorrplot(); assumed to be installed
numeric_vars <- data2[, sapply(data2, is.numeric)]
cor_matrix <- cor(numeric_vars, use = "complete.obs")  

ggcorrplot(cor_matrix, 
           lab = TRUE,                           
           lab_size = 4,                       
           colors = c("red", "white", "#4A90E2"), 
           outline.color = "black",             
           show.legend = TRUE,                   
           title = "Correlation Matrix of Numeric Variables",
           ggtheme = ggplot2::theme_minimal()
)

Observations

  • The variables OneCum and FGPA exhibit a strong correlation of 0.87, indicating redundancy between first-year and final GPA. It is recommended to retain FGPA and exclude OneCum.
  • The variables MPT, MEE, MBE, and UBE show moderate to high intercorrelations (e.g., MBE and UBE: r = 0.87), which is expected given that UBE is a composite score derived from the others.
  • The variables MPT, MEE, MBE, and UBE are scores from the bar examination itself. They become available only when the outcome of interest (Pass) is determined and in effect define it, so using them as predictors would leak outcome information into the model. Such a model could not be applied before the exam, which compromises its validity as a predictive tool.
  • Recommendation:
    • Exclude from the model: OneCum, MPT, MEE, MBE, and UBE.
    • It is recommended not to include the variable Class in the model, as it merely identifies the student’s entry year and shows no significant association with bar exam outcomes. Its inclusion could introduce noise without contributing predictive value.
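
Reading redundant pairs off a plot becomes error-prone as the number of variables grows. As a rough sketch, highly correlated pairs can be extracted from the matrix directly; the data below are simulated and the 0.8 cutoff is an arbitrary assumption, not a threshold used in this report:

```r
set.seed(1)
# Simulated numeric columns; y is built to track x closely, z is independent
x <- rnorm(200)
toy <- data.frame(x = x, y = x + rnorm(200, sd = 0.3), z = rnorm(200))

cm <- cor(toy)

# Look only at the upper triangle so each pair is reported once
idx <- which(abs(cm) > 0.8 & upper.tri(cm), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]],
  r    = cm[idx]
)
print(high_pairs)
```

Each row of `high_pairs` names one pair of candidate redundant predictors, from which one member would typically be dropped.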



3 MODEL IMPLEMENTATION

This section outlines the development and refinement of a logistic regression model designed to predict exam outcomes based on academic, preparatory, and support-related variables. We first construct a comprehensive model that incorporates both main effects and theoretically motivated interaction terms. Given the potential for overfitting, a stepwise selection process is then employed to simplify the model while maintaining predictive performance. The goal is to identify the most influential predictors and interactions that explain students’ likelihood of passing the exam.


3.1 INITIAL MODEL

Logistic regression is a widely used statistical method for modeling the relationship between a binary outcome variable and one or more independent variables. It allows us to estimate the probability of a specific outcome (e.g., passing the exam) as a function of several predictors, while interpreting the effect of each variable in terms of odds.
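
To make the odds interpretation concrete, the sketch below fits a one-predictor logistic regression to simulated data and converts the coefficients to odds ratios. The variable names and the simulated effect size are assumptions chosen for illustration and are unrelated to the bar dataset:

```r
set.seed(42)
# Simulated data: the true log-odds of passing rise by 2 per GPA point
n   <- 500
gpa <- runif(n, 2.5, 4.0)
p   <- plogis(-6 + 2 * gpa)
pass <- factor(ifelse(rbinom(n, 1, p) == 1, "P", "F"), levels = c("F", "P"))

fit <- glm(pass ~ gpa, family = binomial)

# exp() turns a log-odds coefficient into an odds ratio per one-unit increase
odds_ratio <- exp(coef(fit))
print(odds_ratio)
```

An odds ratio above 1 for `gpa` means that each additional GPA point multiplies the odds of passing by that factor; a value below 1 would indicate lower odds.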

Beyond the individual effects of each predictor, certain combinations of factors may interact and jointly influence the probability that a student passes the bar exam. Therefore, interaction terms were included in the logistic regression model based on theoretical reasoning and the academic context.

The selected interactions fall into the following conceptual categories:

  • Academic Preparation × Academic Performance:
    • PctBarPrepComplete * FGPA
    • NumPrepWorkshops * FGPA
    • BarPrep * FGPA
  • Academic Support × At-Risk Academic History:
    • Probation * StudentSuccessInitiative
    • Probation * BarPrepMentor
    • Probation * LegalAnalysis
  • Elective Courses × Performance or Preparation:
    • AdvLegalPerf * FGPA
    • AdvLegalAnalysis * PctBarPrepComplete
    • LegalAnalysis * BarPrepMentor
  • Accommodations × Additional Support:
    • Accom * BarPrepMentor
    • Accom * NumPrepWorkshops
  • Age × Key Preparation Factors:
    • Age * BarPrep
    • Age * AdvLegalPerf
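
In R's formula syntax, `A * B` is shorthand for both main effects plus their interaction (`A + B + A:B`). The short sketch below shows the expansion using placeholder variable names; no data are needed to inspect a formula:

```r
# Variable names are placeholders; no data is needed to inspect a formula
f <- y ~ a + b + a * b

# terms() expands the operators; "term.labels" lists the resulting model terms
term_labels <- attr(terms(f), "term.labels")
print(term_labels)
# "a" "b" "a:b"
```

Duplicated terms are dropped during expansion, so writing a main effect on its own and again inside a `*` term is redundant but harmless.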


The general form of the logistic regression model is:

\[ \log\left(\frac{P}{1 - P}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \beta_a (X_i \cdot X_j) + \beta_b (X_k \cdot X_l) + \cdots \]

Where:

  • \(P\) is the probability that a student passes the bar exam (i.e., Pass = “P”, the second factor level, which glm() treats as the success outcome).
  • \(\log\left(\frac{P}{1 - P}\right)\) is the logit function, which maps a probability in (0, 1) onto the whole real line.
  • \(\beta_0\) is the intercept, representing the log-odds of passing when all numeric predictors are zero and all categorical predictors are at their reference levels.
  • \(\beta_1, \beta_2, \dots, \beta_p\) are the regression coefficients; each gives the change in the log-odds of passing for a one-unit increase in the corresponding predictor, holding the others fixed.
  • \(X_1, X_2, \dots, X_p\) are the independent variables, including:
    • Academic indicators (e.g., LSAT, UGPA, FGPA, CivPro, LP1, LP2)
    • Support-related variables (e.g., Accom, Probation, LegalAnalysis, AdvLegalPerf, AdvLegalAnalysis)
    • Preparation metrics (e.g., BarPrep, PctBarPrepComplete, NumPrepWorkshops)
    • Participation in support programs (e.g., StudentSuccessInitiative, BarPrepMentor)
  • \(\beta_{a}, \beta_{b}, \dots\) are the coefficients of the interaction terms; each measures how the effect of one variable on the log-odds changes with the value of the other.
  • \((X_i \cdot X_j), (X_k \cdot X_l), \dots\) are products of pairs of variables, such as:
    • PctBarPrepComplete × FGPA
    • Probation × BarPrepMentor
    • AdvLegalPerf × Age
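
As a worked illustration of what an interaction coefficient means (a sketch in the notation above, assuming \(X_i\) enters only the single interaction \(X_i \cdot X_j\)), collecting the terms that involve \(X_i\) shows that its effect on the log-odds is no longer constant but depends on \(X_j\):

\[ \frac{\partial}{\partial X_i} \log\left(\frac{P}{1 - P}\right) = \beta_i + \beta_a X_j \]

The probability itself is recovered by inverting the logit, where \(\eta\) is shorthand introduced here for the full right-hand side of the model equation:

\[ P = \frac{1}{1 + e^{-\eta}} \]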

We fit a logistic regression model with all of the main effects and interaction terms listed above to examine how academic and preparatory factors relate to the probability of passing the bar exam:

model <- glm(
  Pass ~ LSAT + UGPA + FGPA +
    CivPro + LP1 + LP2 +
    Accom + Probation +
    LegalAnalysis + AdvLegalPerf + AdvLegalAnalysis +
    BarPrep + PctBarPrepComplete + NumPrepWorkshops +
    StudentSuccessInitiative + BarPrepMentor + Age +
    PctBarPrepComplete * FGPA +
    NumPrepWorkshops * FGPA +
    BarPrep * FGPA +
    Probation * StudentSuccessInitiative +
    Probation * BarPrepMentor +
    Probation * LegalAnalysis +
    AdvLegalPerf * FGPA +
    AdvLegalAnalysis * PctBarPrepComplete +
    LegalAnalysis * BarPrepMentor +
    Accom * BarPrepMentor +
    Accom * NumPrepWorkshops +
    Age * BarPrep +
    Age * AdvLegalPerf,
    
  family = binomial,
  data = data2
)

summary(model)
## 
## Call:
## glm(formula = Pass ~ LSAT + UGPA + FGPA + CivPro + LP1 + LP2 + 
##     Accom + Probation + LegalAnalysis + AdvLegalPerf + AdvLegalAnalysis + 
##     BarPrep + PctBarPrepComplete + NumPrepWorkshops + StudentSuccessInitiative + 
##     BarPrepMentor + Age + PctBarPrepComplete * FGPA + NumPrepWorkshops * 
##     FGPA + BarPrep * FGPA + Probation * StudentSuccessInitiative + 
##     Probation * BarPrepMentor + Probation * LegalAnalysis + AdvLegalPerf * 
##     FGPA + AdvLegalAnalysis * PctBarPrepComplete + LegalAnalysis * 
##     BarPrepMentor + Accom * BarPrepMentor + Accom * NumPrepWorkshops + 
##     Age * BarPrep + Age * AdvLegalPerf, family = binomial, data = data2)
## 
## Coefficients: (2 not defined because of singularities)
##                                          Estimate Std. Error z value Pr(>|z|)
## (Intercept)                            -121.11168   32.06595  -3.777 0.000159
## LSAT                                      0.40098    0.09879   4.059 4.92e-05
## UGPA                                      1.13283    0.86303   1.313 0.189311
## FGPA                                     16.61607    8.23642   2.017 0.043655
## CivProB                                  -0.24246    1.69992  -0.143 0.886584
## CivProB+                                 -0.91517    1.76946  -0.517 0.605016
## CivProC                                  -0.21065    1.75584  -0.120 0.904505
## CivProC+                                 -1.87381    1.79275  -1.045 0.295923
## CivProD                                 -18.83462 3956.18089  -0.005 0.996201
## CivProD+                                  1.22296    2.25717   0.542 0.587949
## CivProF                                   0.57058 3956.18255   0.000 0.999885
## LP1B                                      3.09940    1.12904   2.745 0.006048
## LP1B+                                     0.90304    1.04438   0.865 0.387223
## LP1C                                      2.19372    1.34275   1.634 0.102312
## LP1C+                                     1.78451    1.09281   1.633 0.102477
## LP1D                                     17.32865  313.11414   0.055 0.955865
## LP1D+                                     5.17139    2.27400   2.274 0.022958
## LP1F                                     -5.50236 3956.18199  -0.001 0.998890
## LP2B                                      0.55607    1.11542   0.499 0.618110
## LP2B+                                     1.38538    1.19002   1.164 0.244357
## LP2C                                      1.59572    1.48744   1.073 0.283362
## LP2C+                                     2.13734    1.36794   1.562 0.118182
## LP2CR                                     1.03683    1.32008   0.785 0.432204
## LP2D                                     17.19828 1734.11582   0.010 0.992087
## LP2D+                                    19.23622 1553.99695   0.012 0.990124
## AccomY                                   -1.88392    0.86315  -2.183 0.029064
## ProbationY                                8.14584    2.76681   2.944 0.003239
## LegalAnalysisY                           -2.84095    1.12966  -2.515 0.011908
## AdvLegalPerfY                            17.98095    9.54971   1.883 0.059717
## AdvLegalAnalysisY                        -1.34158    3.15973  -0.425 0.671137
## BarPrepHelix                             15.95202 3956.18087   0.004 0.996783
## BarPrepKaplan                            -9.63249   26.93799  -0.358 0.720658
## BarPrepThemis                             3.59691    7.83419   0.459 0.646141
## PctBarPrepComplete                       26.36757   26.31618   1.002 0.316366
## NumPrepWorkshops                         -2.55186    1.98903  -1.283 0.199504
## StudentSuccessInitiativeYes               0.59664    0.84688   0.705 0.481110
## BarPrepMentorYes                         -4.02863    1.36571  -2.950 0.003179
## Age                                      -0.07865    0.14412  -0.546 0.585243
## FGPA:PctBarPrepComplete                  -5.70865    8.83416  -0.646 0.518149
## FGPA:NumPrepWorkshops                     0.83027    0.66560   1.247 0.212246
## FGPA:BarPrepHelix                              NA         NA      NA       NA
## FGPA:BarPrepKaplan                       -5.08698    5.79570  -0.878 0.380098
## FGPA:BarPrepThemis                       -3.77535    2.63088  -1.435 0.151283
## ProbationY:StudentSuccessInitiativeYes   -3.24184    1.91577  -1.692 0.090610
## ProbationY:BarPrepMentorYes              -2.64625    2.15073  -1.230 0.218549
## ProbationY:LegalAnalysisY                -6.27705    2.66977  -2.351 0.018715
## FGPA:AdvLegalPerfY                       -1.02776    3.16203  -0.325 0.745157
## AdvLegalAnalysisY:PctBarPrepComplete      2.63073    3.85218   0.683 0.494657
## LegalAnalysisY:BarPrepMentorYes           3.98768    1.50018   2.658 0.007858
## AccomY:BarPrepMentorYes                   3.96404    4.25750   0.931 0.351817
## AccomY:NumPrepWorkshops                   1.49991    0.83849   1.789 0.073645
## BarPrepHelix:Age                               NA         NA      NA       NA
## BarPrepKaplan:Age                         0.84048    0.80120   1.049 0.294164
## BarPrepThemis:Age                         0.36413    0.15942   2.284 0.022366
## AdvLegalPerfY:Age                        -0.52593    0.20203  -2.603 0.009235
##                                           
## (Intercept)                            ***
## LSAT                                   ***
## UGPA                                      
## FGPA                                   *  
## CivProB                                   
## CivProB+                                  
## CivProC                                   
## CivProC+                                  
## CivProD                                   
## CivProD+                                  
## CivProF                                   
## LP1B                                   ** 
## LP1B+                                     
## LP1C                                      
## LP1C+                                     
## LP1D                                      
## LP1D+                                  *  
## LP1F                                      
## LP2B                                      
## LP2B+                                     
## LP2C                                      
## LP2C+                                     
## LP2CR                                     
## LP2D                                      
## LP2D+                                     
## AccomY                                 *  
## ProbationY                             ** 
## LegalAnalysisY                         *  
## AdvLegalPerfY                          .  
## AdvLegalAnalysisY                         
## BarPrepHelix                              
## BarPrepKaplan                             
## BarPrepThemis                             
## PctBarPrepComplete                        
## NumPrepWorkshops                          
## StudentSuccessInitiativeYes               
## BarPrepMentorYes                       ** 
## Age                                       
## FGPA:PctBarPrepComplete                   
## FGPA:NumPrepWorkshops                     
## FGPA:BarPrepHelix                         
## FGPA:BarPrepKaplan                        
## FGPA:BarPrepThemis                        
## ProbationY:StudentSuccessInitiativeYes .  
## ProbationY:BarPrepMentorYes               
## ProbationY:LegalAnalysisY              *  
## FGPA:AdvLegalPerfY                        
## AdvLegalAnalysisY:PctBarPrepComplete      
## LegalAnalysisY:BarPrepMentorYes        ** 
## AccomY:BarPrepMentorYes                   
## AccomY:NumPrepWorkshops                .  
## BarPrepHelix:Age                          
## BarPrepKaplan:Age                         
## BarPrepThemis:Age                      *  
## AdvLegalPerfY:Age                      ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 317.60  on 447  degrees of freedom
## Residual deviance: 130.56  on 395  degrees of freedom
## AIC: 236.56
## 
## Number of Fisher Scoring iterations: 16
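
Because the coefficients are on the log-odds scale, exponentiating them gives odds ratios, which are often easier to read. The sketch below does this for LSAT only, using the estimate and standard error copied from the table above; the 95% interval is a Wald approximation chosen here for illustration, not something reported in the original analysis.

```r
# Values copied from the LSAT row of the summary output above.
est <- 0.40098
se  <- 0.09879

z_value    <- est / se      # Wald z statistic, should match the table (4.059)
odds_ratio <- exp(est)      # multiplicative change in the odds per LSAT point

# Approximate 95% Wald interval on the odds-ratio scale.
wald_ci <- exp(est + c(-1, 1) * qnorm(0.975) * se)

print(round(c(z = z_value, OR = odds_ratio,
              lower = wald_ci[1], upper = wald_ci[2]), 3))

# With the fitted object available, the analogous calls would be
# exp(coef(model)) and exp(confint.default(model)).
```

The odds ratio comes out at roughly 1.49, meaning that each additional LSAT point is associated with about 49% higher odds of passing, conditional on the other terms in the model.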

Observations

  • Strong individual predictors:
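    • LSAT (p < 0.001) and FGPA (p = 0.044) both show significant positive coefficients. Because FGPA also enters several interactions, its coefficient applies when the interacting variables are at zero or at their reference levels.
    • AccomY, ProbationY, LegalAnalysisY, and BarPrepMentorYes are also significant (p < 0.05), suggesting that these indicators of academic standing and support are relevant. Each of them enters at least one interaction, so the sign of its main-effect coefficient should not be read as an overall effect.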
  • Significant interactions:
    • Probation × LegalAnalysis (p = 0.0187)
    • LegalAnalysis × BarPrepMentor (p = 0.0079)
    • AdvLegalPerf × Age (p = 0.0092)
    • BarPrepThemis × Age (p = 0.0224)
  • Potential overfitting or weak predictors:
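    • Several grade categories (e.g., CivProD, CivProF, LP1F, LP2D, LP2D+) and BarPrepHelix have standard errors in the thousands and p-values close to 1. This pattern typically signals sparse categories or (quasi-)complete separation rather than a reliably estimated null effect.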
    • Interaction terms like FGPA:BarPrepKaplan and FGPA:AdvLegalPerfY are not significant and might not contribute meaningfully to the model.
  • Collinearity/singularity issues:
    • Two interaction terms (FGPA:BarPrepHelix and BarPrepHelix:Age) are marked as NA, meaning they were dropped due to perfect multicollinearity or lack of variability.
  • Model fit:
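    • Residual deviance is 130.56 on 395 degrees of freedom (null deviance: 317.60 on 447), with AIC = 236.56.
    • With 53 coefficients, many of them non-significant, the model is likely over-parameterized, which motivates simplifying it through stepwise selection.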
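
One mechanism that can produce NA coefficients of the kind noted above is a factor level with a single observation: its interaction column is then an exact multiple of its main-effect column. The toy example below illustrates this with simulated data and a linear model; the names `toy`, `provider`, and `age` are hypothetical, and whether this is what happened for Helix in `data2` is an assumption that would need to be checked.

```r
set.seed(1)
n <- 40

# Simulated data: level "H" of the factor has exactly one observation.
toy <- data.frame(
  provider = factor(c(rep("A", 20), rep("B", 19), "H")),
  age      = round(runif(n, 24, 40)),
  y        = rnorm(n)
)

fit <- lm(y ~ provider * age, data = toy)

# providerH:age equals (age of the single H row) * providerH, so the two
# columns are linearly dependent and R reports the later one as NA.
print(coef(fit))
print(table(toy$provider))
```

On the actual data, `table(data2$BarPrep)` would show whether the Helix group is similarly sparse.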


3.2 MODEL REFINEMENT USING STEPWISE SELECTION

To simplify the model, a stepwise selection procedure was applied. Specifically, we used backward elimination, which begins with the full model (all main effects and theoretically justified interaction terms) and, at each step, removes the term whose deletion lowers the Akaike Information Criterion (AIC) the most, stopping when no deletion lowers it further. This process balances model fit against complexity, aiming to retain only the most informative variables.
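
As a reference for reading the AIC values, the criterion is \(\mathrm{AIC} = -2\log L + 2k\), where \(L\) is the maximized likelihood and \(k\) the number of estimated coefficients. For a binary (0/1) response the residual deviance equals \(-2\log L\), so the reported values can be checked by hand. The full model has 448 observations and 395 residual degrees of freedom, hence \(k = 53\):

\[ \mathrm{AIC}_{\text{full}} = 130.56 + 2 \times 53 = 236.56 \]

Dropping CivPro removes 7 coefficients and raises the deviance to 140.01, which still lowers the AIC:

\[ \mathrm{AIC}_{-\text{CivPro}} = 140.01 + 2 \times (53 - 7) = 232.01 \]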

modelo_stepwise <- step(model, direction = "backward")
## Start:  AIC=236.56
## Pass ~ LSAT + UGPA + FGPA + CivPro + LP1 + LP2 + Accom + Probation + 
##     LegalAnalysis + AdvLegalPerf + AdvLegalAnalysis + BarPrep + 
##     PctBarPrepComplete + NumPrepWorkshops + StudentSuccessInitiative + 
##     BarPrepMentor + Age + PctBarPrepComplete * FGPA + NumPrepWorkshops * 
##     FGPA + BarPrep * FGPA + Probation * StudentSuccessInitiative + 
##     Probation * BarPrepMentor + Probation * LegalAnalysis + AdvLegalPerf * 
##     FGPA + AdvLegalAnalysis * PctBarPrepComplete + LegalAnalysis * 
##     BarPrepMentor + Accom * BarPrepMentor + Accom * NumPrepWorkshops + 
##     Age * BarPrep + Age * AdvLegalPerf
## 
##                                       Df Deviance    AIC
## - CivPro                               7   140.01 232.01
## - LP2                                  7   141.17 233.17
## - FGPA:AdvLegalPerf                    1   130.67 234.67
## - FGPA:BarPrep                         2   132.95 234.95
## - FGPA:PctBarPrepComplete              1   131.00 235.00
## - AdvLegalAnalysis:PctBarPrepComplete  1   131.03 235.03
## - Probation:BarPrepMentor              1   132.13 236.13
## - Accom:BarPrepMentor                  1   132.16 236.16
## - UGPA                                 1   132.32 236.32
## - FGPA:NumPrepWorkshops                1   132.32 236.32
## <none>                                     130.56 236.56
## - LP1                                  7   145.35 237.35
## - Probation:StudentSuccessInitiative   1   134.07 238.07
## - BarPrep:Age                          2   137.63 239.63
## - Probation:LegalAnalysis              1   136.06 240.06
## - AdvLegalPerf:Age                     1   138.41 242.41
## - LegalAnalysis:BarPrepMentor          1   138.68 242.68
## - Accom:NumPrepWorkshops               1   138.97 242.97
## - LSAT                                 1   152.92 256.92
## 
## Step:  AIC=232.01
## Pass ~ LSAT + UGPA + FGPA + LP1 + LP2 + Accom + Probation + LegalAnalysis + 
##     AdvLegalPerf + AdvLegalAnalysis + BarPrep + PctBarPrepComplete + 
##     NumPrepWorkshops + StudentSuccessInitiative + BarPrepMentor + 
##     Age + FGPA:PctBarPrepComplete + FGPA:NumPrepWorkshops + FGPA:BarPrep + 
##     Probation:StudentSuccessInitiative + Probation:BarPrepMentor + 
##     Probation:LegalAnalysis + FGPA:AdvLegalPerf + AdvLegalAnalysis:PctBarPrepComplete + 
##     LegalAnalysis:BarPrepMentor + Accom:BarPrepMentor + Accom:NumPrepWorkshops + 
##     BarPrep:Age + AdvLegalPerf:Age
## 
##                                       Df Deviance    AIC
## - LP2                                  7   148.44 226.44
## - LP1                                  7   150.75 228.75
## - FGPA:BarPrep                         2   141.20 229.20
## - FGPA:PctBarPrepComplete              1   140.34 230.34
## - FGPA:AdvLegalPerf                    1   140.43 230.43
## - UGPA                                 1   140.59 230.59
## - AdvLegalAnalysis:PctBarPrepComplete  1   140.86 230.86
## - Accom:BarPrepMentor                  1   141.52 231.52
## - FGPA:NumPrepWorkshops                1   141.56 231.56
## <none>                                     140.01 232.01
## - Probation:BarPrepMentor              1   142.38 232.38
## - BarPrep:Age                          2   145.87 233.87
## - Probation:StudentSuccessInitiative   1   143.96 233.96
## - Probation:LegalAnalysis              1   145.54 235.54
## - AdvLegalPerf:Age                     1   145.56 235.56
## - LegalAnalysis:BarPrepMentor          1   146.34 236.34
## - Accom:NumPrepWorkshops               1   147.84 237.84
## - LSAT                                 1   158.28 248.28
## 
## Step:  AIC=226.44
## Pass ~ LSAT + UGPA + FGPA + LP1 + Accom + Probation + LegalAnalysis + 
##     AdvLegalPerf + AdvLegalAnalysis + BarPrep + PctBarPrepComplete + 
##     NumPrepWorkshops + StudentSuccessInitiative + BarPrepMentor + 
##     Age + FGPA:PctBarPrepComplete + FGPA:NumPrepWorkshops + FGPA:BarPrep + 
##     Probation:StudentSuccessInitiative + Probation:BarPrepMentor + 
##     Probation:LegalAnalysis + FGPA:AdvLegalPerf + AdvLegalAnalysis:PctBarPrepComplete + 
##     LegalAnalysis:BarPrepMentor + Accom:BarPrepMentor + Accom:NumPrepWorkshops + 
##     BarPrep:Age + AdvLegalPerf:Age
## 
##                                       Df Deviance    AIC
## - FGPA:BarPrep                         2   149.11 223.11
## - UGPA                                 1   148.62 224.62
## - FGPA:AdvLegalPerf                    1   148.79 224.79
## - FGPA:PctBarPrepComplete              1   148.90 224.90
## - AdvLegalAnalysis:PctBarPrepComplete  1   148.98 224.98
## - LP1                                  7   162.38 226.38
## <none>                                     148.44 226.44
## - FGPA:NumPrepWorkshops                1   150.77 226.77
## - Accom:BarPrepMentor                  1   151.06 227.06
## - Probation:BarPrepMentor              1   151.66 227.66
## - BarPrep:Age                          2   153.88 227.88
## - Probation:StudentSuccessInitiative   1   153.35 229.35
## - AdvLegalPerf:Age                     1   153.88 229.88
## - LegalAnalysis:BarPrepMentor          1   156.29 232.29
## - Probation:LegalAnalysis              1   157.84 233.84
## - Accom:NumPrepWorkshops               1   158.56 234.56
## - LSAT                                 1   169.72 245.72
## 
## Step:  AIC=223.11
## Pass ~ LSAT + UGPA + FGPA + LP1 + Accom + Probation + LegalAnalysis + 
##     AdvLegalPerf + AdvLegalAnalysis + BarPrep + PctBarPrepComplete + 
##     NumPrepWorkshops + StudentSuccessInitiative + BarPrepMentor + 
##     Age + FGPA:PctBarPrepComplete + FGPA:NumPrepWorkshops + Probation:StudentSuccessInitiative + 
##     Probation:BarPrepMentor + Probation:LegalAnalysis + FGPA:AdvLegalPerf + 
##     AdvLegalAnalysis:PctBarPrepComplete + LegalAnalysis:BarPrepMentor + 
##     Accom:BarPrepMentor + Accom:NumPrepWorkshops + BarPrep:Age + 
##     AdvLegalPerf:Age
## 
##                                       Df Deviance    AIC
## - FGPA:PctBarPrepComplete              1   149.25 221.25
## - UGPA                                 1   149.26 221.26
## - FGPA:AdvLegalPerf                    1   149.45 221.45
## - AdvLegalAnalysis:PctBarPrepComplete  1   149.61 221.61
## <none>                                     149.11 223.11
## - FGPA:NumPrepWorkshops                1   151.22 223.22
## - LP1                                  7   163.50 223.50
## - Accom:BarPrepMentor                  1   151.71 223.71
## - BarPrep:Age                          2   153.90 223.90
## - Probation:BarPrepMentor              1   152.02 224.02
## - Probation:StudentSuccessInitiative   1   154.58 226.58
## - AdvLegalPerf:Age                     1   154.64 226.64
## - LegalAnalysis:BarPrepMentor          1   157.01 229.01
## - Probation:LegalAnalysis              1   157.88 229.88
## - Accom:NumPrepWorkshops               1   159.02 231.02
## - LSAT                                 1   170.22 242.22
## 
## Step:  AIC=221.25
## Pass ~ LSAT + UGPA + FGPA + LP1 + Accom + Probation + LegalAnalysis + 
##     AdvLegalPerf + AdvLegalAnalysis + BarPrep + PctBarPrepComplete + 
##     NumPrepWorkshops + StudentSuccessInitiative + BarPrepMentor + 
##     Age + FGPA:NumPrepWorkshops + Probation:StudentSuccessInitiative + 
##     Probation:BarPrepMentor + Probation:LegalAnalysis + FGPA:AdvLegalPerf + 
##     AdvLegalAnalysis:PctBarPrepComplete + LegalAnalysis:BarPrepMentor + 
##     Accom:BarPrepMentor + Accom:NumPrepWorkshops + BarPrep:Age + 
##     AdvLegalPerf:Age
## 
##                                       Df Deviance    AIC
## - UGPA                                 1   149.42 219.42
## - AdvLegalAnalysis:PctBarPrepComplete  1   149.62 219.62
## - FGPA:AdvLegalPerf                    1   149.64 219.64
## - FGPA:NumPrepWorkshops                1   151.23 221.23
## <none>                                     149.25 221.25
## - LP1                                  7   163.60 221.60
## - Accom:BarPrepMentor                  1   151.90 221.90
## - BarPrep:Age                          2   153.90 221.90
## - Probation:BarPrepMentor              1   152.12 222.12
## - AdvLegalPerf:Age                     1   154.68 224.68
## - Probation:StudentSuccessInitiative   1   154.93 224.93
## - LegalAnalysis:BarPrepMentor          1   157.17 227.17
## - Probation:LegalAnalysis              1   157.89 227.89
## - Accom:NumPrepWorkshops               1   159.07 229.07
## - LSAT                                 1   170.28 240.28
## 
## Step:  AIC=219.42
## Pass ~ LSAT + FGPA + LP1 + Accom + Probation + LegalAnalysis + 
##     AdvLegalPerf + AdvLegalAnalysis + BarPrep + PctBarPrepComplete + 
##     NumPrepWorkshops + StudentSuccessInitiative + BarPrepMentor + 
##     Age + FGPA:NumPrepWorkshops + Probation:StudentSuccessInitiative + 
##     Probation:BarPrepMentor + Probation:LegalAnalysis + FGPA:AdvLegalPerf + 
##     AdvLegalAnalysis:PctBarPrepComplete + LegalAnalysis:BarPrepMentor + 
##     Accom:BarPrepMentor + Accom:NumPrepWorkshops + BarPrep:Age + 
##     AdvLegalPerf:Age
## 
##                                       Df Deviance    AIC
## - AdvLegalAnalysis:PctBarPrepComplete  1   149.83 217.83
## - FGPA:AdvLegalPerf                    1   149.85 217.85
## - FGPA:NumPrepWorkshops                1   151.40 219.40
## <none>                                     149.42 219.42
## - LP1                                  7   163.67 219.67
## - Accom:BarPrepMentor                  1   152.08 220.08
## - BarPrep:Age                          2   154.37 220.37
## - Probation:BarPrepMentor              1   152.47 220.47
## - AdvLegalPerf:Age                     1   154.81 222.81
## - Probation:StudentSuccessInitiative   1   155.16 223.16
## - LegalAnalysis:BarPrepMentor          1   157.51 225.51
## - Probation:LegalAnalysis              1   158.83 226.83
## - Accom:NumPrepWorkshops               1   159.37 227.37
## - LSAT                                 1   170.49 238.49
## 
## Step:  AIC=217.83
## Pass ~ LSAT + FGPA + LP1 + Accom + Probation + LegalAnalysis + 
##     AdvLegalPerf + AdvLegalAnalysis + BarPrep + PctBarPrepComplete + 
##     NumPrepWorkshops + StudentSuccessInitiative + BarPrepMentor + 
##     Age + FGPA:NumPrepWorkshops + Probation:StudentSuccessInitiative + 
##     Probation:BarPrepMentor + Probation:LegalAnalysis + FGPA:AdvLegalPerf + 
##     LegalAnalysis:BarPrepMentor + Accom:BarPrepMentor + Accom:NumPrepWorkshops + 
##     BarPrep:Age + AdvLegalPerf:Age
## 
##                                      Df Deviance    AIC
## - AdvLegalAnalysis                    1   149.97 215.97
## - FGPA:AdvLegalPerf                   1   150.30 216.30
## <none>                                    149.83 217.83
## - FGPA:NumPrepWorkshops               1   151.96 217.96
## - LP1                                 7   164.28 218.28
## - Accom:BarPrepMentor                 1   152.51 218.51
## - BarPrep:Age                         2   155.01 219.01
## - Probation:BarPrepMentor             1   153.01 219.01
## - AdvLegalPerf:Age                    1   154.82 220.82
## - Probation:StudentSuccessInitiative  1   155.28 221.28
## - LegalAnalysis:BarPrepMentor         1   157.97 223.97
## - Probation:LegalAnalysis             1   159.59 225.59
## - Accom:NumPrepWorkshops              1   159.67 225.67
## - LSAT                                1   170.49 236.49
## - PctBarPrepComplete                  1   185.01 251.01
## 
## Step:  AIC=215.97
## Pass ~ LSAT + FGPA + LP1 + Accom + Probation + LegalAnalysis + 
##     AdvLegalPerf + BarPrep + PctBarPrepComplete + NumPrepWorkshops + 
##     StudentSuccessInitiative + BarPrepMentor + Age + FGPA:NumPrepWorkshops + 
##     Probation:StudentSuccessInitiative + Probation:BarPrepMentor + 
##     Probation:LegalAnalysis + FGPA:AdvLegalPerf + LegalAnalysis:BarPrepMentor + 
##     Accom:BarPrepMentor + Accom:NumPrepWorkshops + BarPrep:Age + 
##     AdvLegalPerf:Age
## 
##                                      Df Deviance    AIC
## - FGPA:AdvLegalPerf                   1   150.43 214.43
## <none>                                    149.97 215.97
## - FGPA:NumPrepWorkshops               1   152.01 216.01
## - LP1                                 7   164.28 216.28
## - Accom:BarPrepMentor                 1   152.75 216.75
## - BarPrep:Age                         2   155.07 217.07
## - Probation:BarPrepMentor             1   153.46 217.46
## - AdvLegalPerf:Age                    1   154.87 218.87
## - Probation:StudentSuccessInitiative  1   155.46 219.46
## - LegalAnalysis:BarPrepMentor         1   157.99 221.99
## - Accom:NumPrepWorkshops              1   159.72 223.72
## - Probation:LegalAnalysis             1   159.90 223.90
## - LSAT                                1   170.69 234.69
## - PctBarPrepComplete                  1   185.30 249.30
## 
## Step:  AIC=214.43
## Pass ~ LSAT + FGPA + LP1 + Accom + Probation + LegalAnalysis + 
##     AdvLegalPerf + BarPrep + PctBarPrepComplete + NumPrepWorkshops + 
##     StudentSuccessInitiative + BarPrepMentor + Age + FGPA:NumPrepWorkshops + 
##     Probation:StudentSuccessInitiative + Probation:BarPrepMentor + 
##     Probation:LegalAnalysis + LegalAnalysis:BarPrepMentor + Accom:BarPrepMentor + 
##     Accom:NumPrepWorkshops + BarPrep:Age + AdvLegalPerf:Age
## 
##                                      Df Deviance    AIC
## <none>                                    150.43 214.43
## - FGPA:NumPrepWorkshops               1   152.55 214.55
## - LP1                                 7   165.14 215.14
## - Accom:BarPrepMentor                 1   153.24 215.24
## - BarPrep:Age                         2   155.93 215.93
## - Probation:BarPrepMentor             1   153.98 215.98
## - Probation:StudentSuccessInitiative  1   155.58 217.58
## - AdvLegalPerf:Age                    1   156.93 218.93
## - LegalAnalysis:BarPrepMentor         1   158.13 220.13
## - Probation:LegalAnalysis             1   160.63 222.63
## - Accom:NumPrepWorkshops              1   160.66 222.66
## - LSAT                                1   171.04 233.04
## - PctBarPrepComplete                  1   186.82 248.82

summary(modelo_stepwise)
## 
## Call:
## glm(formula = Pass ~ LSAT + FGPA + LP1 + Accom + Probation + 
##     LegalAnalysis + AdvLegalPerf + BarPrep + PctBarPrepComplete + 
##     NumPrepWorkshops + StudentSuccessInitiative + BarPrepMentor + 
##     Age + FGPA:NumPrepWorkshops + Probation:StudentSuccessInitiative + 
##     Probation:BarPrepMentor + Probation:LegalAnalysis + LegalAnalysis:BarPrepMentor + 
##     Accom:BarPrepMentor + Accom:NumPrepWorkshops + BarPrep:Age + 
##     AdvLegalPerf:Age, family = binomial, data = data2)
## 
## Coefficients: (1 not defined because of singularities)
##                                          Estimate Std. Error z value Pr(>|z|)
## (Intercept)                             -77.26688   16.45683  -4.695 2.66e-06
## LSAT                                      0.32312    0.07873   4.104 4.06e-05
## FGPA                                      7.62862    1.90623   4.002 6.28e-05
## LP1B                                      2.45949    0.85710   2.870 0.004111
## LP1B+                                     0.86828    0.84795   1.024 0.305847
## LP1C                                      2.25387    1.12478   2.004 0.045088
## LP1C+                                     1.53055    0.82329   1.859 0.063018
## LP1D                                     12.77530  182.11723   0.070 0.944075
## LP1D+                                     4.52225    1.75318   2.579 0.009896
## LP1F                                     -6.86233 1455.39855  -0.005 0.996238
## AccomY                                   -2.02271    0.74651  -2.710 0.006737
## ProbationY                                8.93183    2.41907   3.692 0.000222
## LegalAnalysisY                           -2.05678    0.85430  -2.408 0.016060
## AdvLegalPerfY                            10.24174    4.08860   2.505 0.012247
## BarPrepHelix                             14.70974 1455.39790   0.010 0.991936
## BarPrepKaplan                           -23.65909   18.72741  -1.263 0.206467
## BarPrepThemis                            -4.76433    3.46276  -1.376 0.168860
## PctBarPrepComplete                        9.73694    1.89435   5.140 2.75e-07
## NumPrepWorkshops                         -2.43445    1.73304  -1.405 0.160101
## StudentSuccessInitiativeYes               0.56483    0.76293   0.740 0.459097
## BarPrepMentorYes                         -3.14512    1.10756  -2.840 0.004516
## Age                                      -0.10560    0.12115  -0.872 0.383411
## FGPA:NumPrepWorkshops                     0.77463    0.57763   1.341 0.179908
## ProbationY:StudentSuccessInitiativeYes   -3.46016    1.70536  -2.029 0.042460
## ProbationY:BarPrepMentorYes              -3.49753    1.89029  -1.850 0.064275
## ProbationY:LegalAnalysisY                -6.16617    2.07631  -2.970 0.002980
## LegalAnalysisY:BarPrepMentorYes           3.52039    1.32205   2.663 0.007749
## AccomY:BarPrepMentorYes                   4.04478    2.93011   1.380 0.167458
## AccomY:NumPrepWorkshops                   1.49695    0.82649   1.811 0.070106
## BarPrepHelix:Age                               NA         NA      NA       NA
## BarPrepKaplan:Age                         0.79700    0.68414   1.165 0.244031
## BarPrepThemis:Age                         0.24211    0.11986   2.020 0.043388
## AdvLegalPerfY:Age                        -0.35209    0.14393  -2.446 0.014431
##                                           
## (Intercept)                            ***
## LSAT                                   ***
## FGPA                                   ***
## LP1B                                   ** 
## LP1B+                                     
## LP1C                                   *  
## LP1C+                                  .  
## LP1D                                      
## LP1D+                                  ** 
## LP1F                                      
## AccomY                                 ** 
## ProbationY                             ***
## LegalAnalysisY                         *  
## AdvLegalPerfY                          *  
## BarPrepHelix                              
## BarPrepKaplan                             
## BarPrepThemis                             
## PctBarPrepComplete                     ***
## NumPrepWorkshops                          
## StudentSuccessInitiativeYes               
## BarPrepMentorYes                       ** 
## Age                                       
## FGPA:NumPrepWorkshops                     
## ProbationY:StudentSuccessInitiativeYes *  
## ProbationY:BarPrepMentorYes            .  
## ProbationY:LegalAnalysisY              ** 
## LegalAnalysisY:BarPrepMentorYes        ** 
## AccomY:BarPrepMentorYes                   
## AccomY:NumPrepWorkshops                .  
## BarPrepHelix:Age                          
## BarPrepKaplan:Age                         
## BarPrepThemis:Age                      *  
## AdvLegalPerfY:Age                      *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 317.60  on 447  degrees of freedom
## Residual deviance: 150.43  on 416  degrees of freedom
## AIC: 214.43
## 
## Number of Fisher Scoring iterations: 14
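
One way to check that the reduction did not discard much explanatory power is a likelihood ratio test comparing the stepwise model with the full model. The sketch below uses only the deviances and degrees of freedom reported in the two summaries; it is an illustrative check rather than part of the original analysis, and p-values computed after stepwise selection should be read as descriptive.

```r
# Residual deviances and degrees of freedom copied from the two summaries.
dev_full <- 130.56; df_full <- 395   # full model
dev_step <- 150.43; df_step <- 416   # model chosen by step()

lr_stat <- dev_step - dev_full       # increase in deviance from dropping terms
lr_df   <- df_step - df_full         # number of coefficients removed

# Upper-tail chi-squared probability for the deviance difference.
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
print(c(statistic = lr_stat, df = lr_df, p = p_value))

# With both fitted objects available, the equivalent call would be
# anova(modelo_stepwise, model, test = "Chisq").
```

The statistic (19.87) is smaller than its degrees of freedom (21), so the p-value is well above 0.05: the terms removed by the stepwise procedure do not jointly improve the fit by a detectable amount.
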

# Refit the specification chosen by step() explicitly, so that the final
# formula is visible in the report; the fit is identical to modelo_stepwise.
model_final <- glm(formula = Pass ~ LSAT + FGPA + LP1 + Accom + Probation + 
    LegalAnalysis + AdvLegalPerf + BarPrep + PctBarPrepComplete + 
    NumPrepWorkshops + StudentSuccessInitiative + BarPrepMentor + 
    Age + FGPA:NumPrepWorkshops + Probation:StudentSuccessInitiative + 
    Probation:BarPrepMentor + Probation:LegalAnalysis + LegalAnalysis:BarPrepMentor + 
    Accom:BarPrepMentor + Accom:NumPrepWorkshops + BarPrep:Age + 
    AdvLegalPerf:Age, family = binomial, data = data2)

summary(model_final)
## 
## Call:
## glm(formula = Pass ~ LSAT + FGPA + LP1 + Accom + Probation + 
##     LegalAnalysis + AdvLegalPerf + BarPrep + PctBarPrepComplete + 
##     NumPrepWorkshops + StudentSuccessInitiative + BarPrepMentor + 
##     Age + FGPA:NumPrepWorkshops + Probation:StudentSuccessInitiative + 
##     Probation:BarPrepMentor + Probation:LegalAnalysis + LegalAnalysis:BarPrepMentor + 
##     Accom:BarPrepMentor + Accom:NumPrepWorkshops + BarPrep:Age + 
##     AdvLegalPerf:Age, family = binomial, data = data2)
## 
## Coefficients: (1 not defined because of singularities)
##                                          Estimate Std. Error z value Pr(>|z|)
## (Intercept)                             -77.26688   16.45683  -4.695 2.66e-06
## LSAT                                      0.32312    0.07873   4.104 4.06e-05
## FGPA                                      7.62862    1.90623   4.002 6.28e-05
## LP1B                                      2.45949    0.85710   2.870 0.004111
## LP1B+                                     0.86828    0.84795   1.024 0.305847
## LP1C                                      2.25387    1.12478   2.004 0.045088
## LP1C+                                     1.53055    0.82329   1.859 0.063018
## LP1D                                     12.77530  182.11723   0.070 0.944075
## LP1D+                                     4.52225    1.75318   2.579 0.009896
## LP1F                                     -6.86233 1455.39855  -0.005 0.996238
## AccomY                                   -2.02271    0.74651  -2.710 0.006737
## ProbationY                                8.93183    2.41907   3.692 0.000222
## LegalAnalysisY                           -2.05678    0.85430  -2.408 0.016060
## AdvLegalPerfY                            10.24174    4.08860   2.505 0.012247
## BarPrepHelix                             14.70974 1455.39790   0.010 0.991936
## BarPrepKaplan                           -23.65909   18.72741  -1.263 0.206467
## BarPrepThemis                            -4.76433    3.46276  -1.376 0.168860
## PctBarPrepComplete                        9.73694    1.89435   5.140 2.75e-07
## NumPrepWorkshops                         -2.43445    1.73304  -1.405 0.160101
## StudentSuccessInitiativeYes               0.56483    0.76293   0.740 0.459097
## BarPrepMentorYes                         -3.14512    1.10756  -2.840 0.004516
## Age                                      -0.10560    0.12115  -0.872 0.383411
## FGPA:NumPrepWorkshops                     0.77463    0.57763   1.341 0.179908
## ProbationY:StudentSuccessInitiativeYes   -3.46016    1.70536  -2.029 0.042460
## ProbationY:BarPrepMentorYes              -3.49753    1.89029  -1.850 0.064275
## ProbationY:LegalAnalysisY                -6.16617    2.07631  -2.970 0.002980
## LegalAnalysisY:BarPrepMentorYes           3.52039    1.32205   2.663 0.007749
## AccomY:BarPrepMentorYes                   4.04478    2.93011   1.380 0.167458
## AccomY:NumPrepWorkshops                   1.49695    0.82649   1.811 0.070106
## BarPrepHelix:Age                               NA         NA      NA       NA
## BarPrepKaplan:Age                         0.79700    0.68414   1.165 0.244031
## BarPrepThemis:Age                         0.24211    0.11986   2.020 0.043388
## AdvLegalPerfY:Age                        -0.35209    0.14393  -2.446 0.014431
##                                           
## (Intercept)                            ***
## LSAT                                   ***
## FGPA                                   ***
## LP1B                                   ** 
## LP1B+                                     
## LP1C                                   *  
## LP1C+                                  .  
## LP1D                                      
## LP1D+                                  ** 
## LP1F                                      
## AccomY                                 ** 
## ProbationY                             ***
## LegalAnalysisY                         *  
## AdvLegalPerfY                          *  
## BarPrepHelix                              
## BarPrepKaplan                             
## BarPrepThemis                             
## PctBarPrepComplete                     ***
## NumPrepWorkshops                          
## StudentSuccessInitiativeYes               
## BarPrepMentorYes                       ** 
## Age                                       
## FGPA:NumPrepWorkshops                     
## ProbationY:StudentSuccessInitiativeYes *  
## ProbationY:BarPrepMentorYes            .  
## ProbationY:LegalAnalysisY              ** 
## LegalAnalysisY:BarPrepMentorYes        ** 
## AccomY:BarPrepMentorYes                   
## AccomY:NumPrepWorkshops                .  
## BarPrepHelix:Age                          
## BarPrepKaplan:Age                         
## BarPrepThemis:Age                      *  
## AdvLegalPerfY:Age                      *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 317.60  on 447  degrees of freedom
## Residual deviance: 150.43  on 416  degrees of freedom
## AIC: 214.43
## 
## Number of Fisher Scoring iterations: 14

Observations

  • Stepwise Logistic Regression Model:

    • The model selection began with a comprehensive model including all main effects and theoretically motivated interaction terms, yielding an initial AIC of 236.56.
    • Using backward elimination guided by the Akaike Information Criterion (AIC), variables that did not substantially contribute to the model’s fit were progressively removed.
    • After multiple steps, the final model reached an AIC of 214.43, indicating improved balance between fit and complexity.
  • Key insights from the final model:

    • Strong Predictors (statistically significant, p < 0.05):
      • LSAT, FGPA: Strong academic indicators.
      • LP1B, LP1C, and LP1D+: Some levels of Legal Process 1 performance differ significantly from the reference grade.
      • Accom, Probation, LegalAnalysis, AdvLegalPerf: Student support and academic flags remain influential.
      • PctBarPrepComplete: Highly significant preparation metric (p < 0.001).
      • BarPrepMentor, LegalAnalysis × BarPrepMentor, Probation × LegalAnalysis, Probation × StudentSuccessInitiative: Main and interaction effects suggest that support outcomes differ across student groups.
      • AdvLegalPerf × Age, BarPrepThemis × Age: Show age-related performance interactions.
    • Marginal Predictors (p between 0.05 and 0.10): LP1C+, Accom × NumPrepWorkshops, Probation × BarPrepMentor.

  • Removed variables: The procedure eliminated UGPA, CivPro, LP2, AdvLegalAnalysis, and several interaction terms (e.g., FGPA × BarPrep), indicating that they added little predictive value beyond the retained terms in the current dataset.

  • Model Quality:

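    • Null deviance = 317.60 on 447 degrees of freedom.
    • Residual deviance = 150.43 on 416 degrees of freedom, a considerable reduction relative to the null model.
    • Final AIC = 214.43, down from 236.56 for the full model, indicating a better trade-off between complexity and goodness-of-fit.
    • Caveats: The BarPrepHelix × Age coefficient could not be estimated because of a singularity, and the very large standard errors for LP1D, LP1F, and BarPrepHelix suggest sparsely populated categories, so these estimates should be interpreted with caution.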
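
The model-quality figures above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative and not part of the original analysis: it hard-codes the deviances and degrees of freedom printed by summary(model_final) and derives a likelihood-ratio statistic, McFadden's pseudo-R², and the AIC. All variable names are chosen here for illustration, and the shortcuts assume an ungrouped binary response, for which the deviance equals minus twice the log-likelihood.

```r
# Values copied from the summary(model_final) output
null_dev  <- 317.60; null_df  <- 447
resid_dev <- 150.43; resid_df <- 416

# Likelihood-ratio test of the fitted model against the intercept-only model
lr_stat <- null_dev - resid_dev   # drop in deviance
lr_df   <- null_df - resid_df     # estimated coefficients, excluding the intercept
lr_p    <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# McFadden's pseudo-R^2 (valid here because deviance = -2 * logLik
# for an ungrouped binary response)
mcfadden <- 1 - resid_dev / null_dev

# AIC = deviance + 2 * (number of estimated parameters, intercept included)
n_params  <- (null_df + 1) - resid_df
aic_check <- resid_dev + 2 * n_params

cat("LR statistic:", lr_stat, "on", lr_df, "df; p-value:", format.pval(lr_p), "\n")
cat("McFadden pseudo-R^2:", round(mcfadden, 3), "\n")
cat("Reconstructed AIC:", aic_check, "\n")
```

With these inputs the deviance drop is 167.17 on 31 degrees of freedom, the pseudo-R² is about 0.526, and the reconstructed AIC matches the reported 214.43.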



4 CONCLUSIONS

  • Data Loading and Preparation
    • The original dataset included multiple variable types (numeric, integer, categorical) and required extensive cleaning and recoding to enable proper modeling.
    • Variables such as MPRE and the post-outcome exam scores (MPT, MEE, MBE, UBE) were excluded from the set of predictors to avoid outcome leakage and preserve the integrity of the model.
    • Redundant features (e.g., OneCum, FinalRankPercentile) and irrelevant columns (e.g., OptIntoWritingGuide) were removed. Observations with missing values in key variables were also excluded to ensure data completeness.
  • Exploratory and Correlation Analysis
    • High correlations (e.g., OneCum and FGPA: r = 0.87) led to the removal of collinear predictors.
    • Several variables showed distributional irregularities or inconsistent categorical levels, which were cleaned prior to modeling.
    • The correlation structure informed decisions about variable selection and interaction exploration, avoiding variables with embedded outcome information.
  • Initial Model Implementation
    • A full logistic regression model was built including all main effects and theoretically justified interaction terms.
    • Strong predictors included LSAT, FGPA, Probation, and LegalAnalysis, each significantly associated with bar exam outcomes (p < 0.05).
    • Several interaction effects (e.g., Probation × LegalAnalysis, LegalAnalysis × BarPrepMentor, AdvLegalPerf × Age) highlighted important dependencies among support and academic variables.
    • The model exhibited a residual deviance of 130.56 and an AIC of 236.56 but also revealed potential overfitting, multicollinearity issues (e.g., NA coefficients), and non-significant terms.
  • Model Refinement and Selection
    • A backward stepwise selection process guided by AIC was used to iteratively remove non-informative terms, optimizing the balance between simplicity and fit.
    • The final model achieved a reduced AIC of 214.43 and retained key predictors and interactions with clear academic or support relevance.
    • Redundant variables such as UGPA, CivPro, LP2, and several non-contributing interactions were eliminated, improving generalizability.
  • Final Model Insights
    • Significant Predictors: LSAT, FGPA, LP1 performance levels, Probation, Accommodations, LegalAnalysis, AdvLegalPerf, PctBarPrepComplete, and BarPrepMentor participation.
    • Key Interactions: Probation × LegalAnalysis, LegalAnalysis × BarPrepMentor, AdvLegalPerf × Age, and BarPrepThemis × Age, indicating that academic history and support services jointly shape outcomes.
    • Marginal Predictors: Terms like Accom × NumPrepWorkshops and Probation × BarPrepMentor exhibited near-significance, suggesting areas for further exploration.
    • The model substantially reduced deviance compared to the null model (from 317.60 to 150.43), supporting its explanatory strength.
    • The final logistic regression model offers a reliable and interpretable tool for identifying students at risk of bar exam failure. It highlights the interplay between academic preparation, institutional support, and individual characteristics such as age and prior academic standing.
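
Because the coefficients are estimated on the log-odds scale, exponentiating them can make the conclusions above easier to communicate. The following sketch is an illustration, not part of the original analysis: it hard-codes two estimates and standard errors from the summary(model_final) output and converts them to odds ratios with approximate 95% Wald intervals. The data frame coefs, its column names, and the 0.1 increment for PctBarPrepComplete (which assumes the variable is stored as a proportion) are assumptions introduced here.

```r
# Estimates and standard errors copied from summary(model_final).
# Only terms that do not appear in any interaction of the final model are
# used, so each odds ratio has a single, unconditional interpretation.
coefs <- data.frame(
  term     = c("LSAT", "PctBarPrepComplete"),
  estimate = c(0.32312, 9.73694),
  std_err  = c(0.07873, 1.89435),
  # Increment used for interpretation. The 0.1 step for PctBarPrepComplete
  # ASSUMES the variable is stored as a proportion between 0 and 1.
  unit     = c(1, 0.1)
)

z <- qnorm(0.975)  # about 1.96 for a 95% Wald interval

coefs$odds_ratio <- exp(coefs$unit * coefs$estimate)
coefs$ci_lower   <- exp(coefs$unit * (coefs$estimate - z * coefs$std_err))
coefs$ci_upper   <- exp(coefs$unit * (coefs$estimate + z * coefs$std_err))

print(coefs[, c("term", "unit", "odds_ratio", "ci_lower", "ci_upper")], digits = 3)
```

Under these inputs, each additional LSAT point multiplies the odds of passing by roughly 1.38, holding the other predictors fixed. With the fitted object available, exp(coef(model_final)) gives the same point estimates for every term, but terms involved in interactions (e.g., FGPA, Probation) need to be read conditionally on the interacting variable.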



5 COMPLETE R-CODE

# SECTION 1: EXPLORATORY DATA ANALYSIS (EDA) ----------------------------------------------------------
# DATA LOADING AND INITIAL ANALYSIS
# Create the dataframe
url <- "https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/Updated_Bar_Data_For_Review_Final.csv"
data <- read.csv(url)
head(data)  # preview the first rows rather than printing every record

# Initial Data Exploration
str(data)
summary(data)
colSums(is.na(data))

# Define the Target Variable (PassFail)
# "F" is the reference level, so glm() models the probability of "P" (pass)
data$PassFail <- factor(data$PassFail, levels = c("F", "P"))

# Convert Key Categorical Variables to Factors
categorical_vars <- c("Accommodations", "Probation", 
                      "LegalAnalysis_TexasPractice", 
                      "AdvLegalPerfSkills", "AdvLegalAnalysis", 
                      "BarPrepCompany", "StudentSuccessInitiative", "BarPrepMentor",
                      "CivPro", "LPI", "LPII")
data[categorical_vars] <- lapply(data[categorical_vars], factor)

# Check Levels of Categorical Variables
str(data)
sapply(data[, c("Accommodations", "Probation", 
                "LegalAnalysis_TexasPractice", "AdvLegalPerfSkills", "AdvLegalAnalysis", 
                "BarPrepCompany", "StudentSuccessInitiative", "BarPrepMentor", "CivPro", "LPI", "LPII")], levels)

# Check Number of Unique Values for Numerical Variables
sapply(data[, c("LSAT", "UGPA", "GPA_1L", "GPA_Final", "FinalRankPercentile",
                "BarPrepCompletion", "X.LawSchoolBarPrepWorkshops",
                "MPRE", "MPT", "MEE", "MBE", "UBE")], function(x) length(unique(x)))


# DATA FILTERING AND PREPARATION
# Clean Probation Variable
data$Probation <- trimws(data$Probation)
data$Probation <- factor(data$Probation)

# Replace Empty Strings in BarPrepCompany
levels(data$BarPrepCompany) <- c(levels(data$BarPrepCompany), "None")
data$BarPrepCompany[data$BarPrepCompany == ""] <- "None"
data$BarPrepCompany <- factor(data$BarPrepCompany)

# Handle Missing Values in CivPro, LPI, and LPII
data$CivPro[data$CivPro == ""] <- NA
data$LPI[data$LPI == ""] <- NA
data$LPII[data$LPII == ""] <- NA
data$CivPro <- factor(data$CivPro)
data$LPI <- factor(data$LPI)
data$LPII <- factor(data$LPII)

# Create Binary Variables for Mentor and Student Support Participation
data$HadMentor <- ifelse(data$BarPrepMentor == "N", "No", "Yes")
data$HadMentor <- factor(data$HadMentor)
data$StudentSuccessParticipated <- ifelse(data$StudentSuccessInitiative == "N", "No", "Yes")
data$StudentSuccessParticipated <- as.factor(data$StudentSuccessParticipated)

# Create Filtered Dataset data2 by Selecting Relevant Variables
# Select and Rename Relevant Variables
data2 <- data[c(
  "Year", "PassFail", "Age", "LSAT", "UGPA",
  "CivPro", "LPI", "LPII", "GPA_1L", "GPA_Final", "FinalRankPercentile",
  "Accommodations", "Probation",
  "LegalAnalysis_TexasPractice", "AdvLegalPerfSkills", "AdvLegalAnalysis",
  "BarPrepCompany", "BarPrepCompletion",
  "OptIntoWritingGuide", "X.LawSchoolBarPrepWorkshops",
  "MPRE", "MPT", "MEE", "WrittenScaledScore", "MBE", "UBE",
  "StudentSuccessParticipated", "HadMentor"
)]
# Rename Variables in data2 According to Provided Naming Convention
colnames(data2) <- c(
  "Class", "Pass", "Age", "LSAT", "UGPA",
  "CivPro", "LP1", "LP2", "OneCum", "FGPA", "FinalRankPercentile",
  "Accom", "Probation",
  "LegalAnalysis", "AdvLegalPerf", "AdvLegalAnalysis",
  "BarPrep", "PctBarPrepComplete",
  "OptIntoWritingGuide", "NumPrepWorkshops",
  "MPRE", "MPT", "MEE", "WrittenScaledScore", "MBE", "UBE",
  "StudentSuccessInitiative", "BarPrepMentor"
)

# Assess Missing Values
colSums(is.na(data2))

# Remove unnecessary columns: MPRE, OptIntoWritingGuide, WrittenScaledScore and FinalRankPercentile
data2 <- subset(data2, select = -c(MPRE, OptIntoWritingGuide, WrittenScaledScore, FinalRankPercentile))
# Remove rows with missing values in critical academic and preparation variables
critical_vars <- c("CivPro", "LP1", "LP2", "OneCum", "PctBarPrepComplete")
data2 <- data2[complete.cases(data2[, critical_vars]), ]
# Verify that all missing values are removed
colSums(is.na(data2))

# CORRELATION ANALYSIS
library(ggcorrplot)  # provides ggcorrplot(); built on ggplot2
# Correlation Matrix Visualization
numeric_vars <- data2[, sapply(data2, is.numeric)]
cor_matrix <- cor(numeric_vars, use = "complete.obs")  

ggcorrplot(cor_matrix, 
           lab = TRUE,                           
           lab_size = 4,                       
           colors = c("red", "white", "#4A90E2"), 
           outline.color = "black",             
           show.legend = TRUE,                   
           title = "Correlation Matrix of Numeric Variables",
           ggtheme = ggplot2::theme_minimal()
)
# -----------------------------------------------------------------------------------------------------


# SECTION 2: MODEL IMPLEMENTATION ---------------------------------------------------------------------
# Initial model
model <- glm(
  Pass ~ LSAT + UGPA + FGPA +
    CivPro + LP1 + LP2 +
    Accom + Probation +
    LegalAnalysis + AdvLegalPerf + AdvLegalAnalysis +
    BarPrep + PctBarPrepComplete + NumPrepWorkshops +
    StudentSuccessInitiative + BarPrepMentor + Age +
    PctBarPrepComplete * FGPA +
    NumPrepWorkshops * FGPA +
    BarPrep * FGPA +
    Probation * StudentSuccessInitiative +
    Probation * BarPrepMentor +
    Probation * LegalAnalysis +
    AdvLegalPerf * FGPA +
    AdvLegalAnalysis * PctBarPrepComplete +
    LegalAnalysis * BarPrepMentor +
    Accom * BarPrepMentor +
    Accom * NumPrepWorkshops +
    Age * BarPrep +
    Age * AdvLegalPerf,
    
  family = binomial,
  data = data2
)

summary(model)

# MODEL REFINEMENT USING STEPWISE SELECTION
modelo_stepwise <- step(model, direction = "backward")
summary(modelo_stepwise)

# Final Model
model_final <- glm(formula = Pass ~ LSAT + FGPA + LP1 + Accom + Probation + 
    LegalAnalysis + AdvLegalPerf + BarPrep + PctBarPrepComplete + 
    NumPrepWorkshops + StudentSuccessInitiative + BarPrepMentor + 
    Age + FGPA:NumPrepWorkshops + Probation:StudentSuccessInitiative + 
    Probation:BarPrepMentor + Probation:LegalAnalysis + LegalAnalysis:BarPrepMentor + 
    Accom:BarPrepMentor + Accom:NumPrepWorkshops + BarPrep:Age + 
    AdvLegalPerf:Age, family = binomial, data = data2)

summary(model_final)

# -----------------------------------------------------------------------------------------------------
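
As a self-contained illustration of the backward selection step used in Section 2, the sketch below runs step() on simulated data rather than on the bar exam dataset. Everything in it (the toy data frame, the variable names x1, x2, noise1, noise2, and the chosen effect sizes) is hypothetical and serves only to show the mechanics: step() drops a term only when doing so lowers the AIC, so the reduced model never has a higher AIC than the full one.

```r
set.seed(42)

# Simulated data: two informative predictors and two pure-noise predictors
n <- 300
toy <- data.frame(
  x1     = rnorm(n),
  x2     = rnorm(n),
  noise1 = rnorm(n),
  noise2 = rnorm(n)
)
toy$y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * toy$x1 - 0.8 * toy$x2))

# Full logistic regression with all candidate terms
full <- glm(y ~ x1 + x2 + noise1 + noise2, family = binomial, data = toy)

# Backward elimination by AIC; trace = 0 suppresses the step-by-step log
reduced <- step(full, direction = "backward", trace = 0)

# Compare the two fits and inspect the retained formula
print(AIC(full, reduced))
print(formula(reduced))
```

Setting trace = 1 (the default) prints each elimination step, which is how the sequence of AIC values from the full model to the final model can be followed.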