library(dplyr)
library(ggplot2)
library(broom)
library(car)
library(pROC)
library(caret)
library(knitr)
library(kableExtra)

options(scipen = 999)
theme_set(theme_minimal())

Introduction

The TTU Law School provided a de-identified dataset of bar examination outcomes for recent cohorts from 2021 through 2025. The goal of this analysis is to identify actionable predictors that distinguish students who pass the Uniform Bar Examination from those who do not.

The main research question is:

Which pre-admission, law-school performance, and bar-preparation variables are associated with bar exam success?

The primary response variable is PassFail, because the school’s main administrative goal is to increase bar passage rates. I also model UBE as a continuous response because it gives more information than a simple pass/fail classification.

Prior Hypotheses

Before fitting models, I expect the following:

  1. LSAT and UGPA will be positively related to UBE and passing because they measure pre-law-school academic preparation.
  2. 1L GPA and final law school GPA will be stronger predictors than LSAT and UGPA because they measure performance during law school.
  3. BarPrepCompletion and MPRE will be positively associated with passing because they reflect bar readiness and preparation behavior.
  4. Probation will be negatively associated with passing because it reflects prior academic difficulty.
  5. Bar component scores such as MBE and WrittenScaledScore will strongly predict UBE, but they are final exam components, so they are less useful for early intervention.

Data and Methods

df <- read.csv(
  "https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/BarData_2025.csv",
  check.names = FALSE
)

str(df)
## 'data.frame':    600 obs. of  28 variables:
##  $ Year                       : int  2021 2021 2021 2021 2021 2021 2021 2021 2021 2021 ...
##  $ PassFail                   : chr  "F" "F" "F" "F" ...
##  $ Age                        : num  29.1 29.6 29 36.2 28.9 30.8 29.1 42.9 28.3 27.1 ...
##  $ LSAT                       : int  152 155 157 156 145 154 149 160 152 150 ...
##  $ UGPA                       : chr  "3.42" "2.82" "3.46" "3.13" ...
##  $ CivPro                     : chr  "B+" "B+" "C" "D+" ...
##  $ LPI                        : chr  "A" "B" "B" "C" ...
##  $ LPII                       : chr  "A" "B" "B" "C+" ...
##  $ GPA_1L                     : num  3.21 2.43 2.62 2.27 2.29 ...
##  $ GPA_Final                  : num  3.29 3.2 2.91 2.77 2.9 2.82 3 3.09 3.21 2.74 ...
##  $ FinalRankPercentile        : num  0.46 0.33 0.08 0.02 0.08 0.05 0.15 0.22 0.34 0.01 ...
##  $ Accommodations             : chr  "N" "Y" "N" "N" ...
##  $ Probation                  : chr  "N" "Y" "N" "Y" ...
##  $ LegalAnalysis_TexasPractice: chr  "Y" "Y" "Y" "Y" ...
##  $ AdvLegalPerfSkills         : chr  "Y" "Y" "Y" "Y" ...
##  $ AdvLegalAnalysis           : chr  "Y" "Y" "Y" "Y" ...
##  $ BarPrepCompany             : chr  "Barbri" "Barbri" "Barbri" "Barbri" ...
##  $ BarPrepCompletion          : num  0.96 0.98 0.48 1 0.77 0.02 0.9 0.76 0.77 0.88 ...
##  $ OptIntoWritingGuide        : chr  "" "" "" "" ...
##  $ #LawSchoolBarPrepWorkshops : int  3 0 3 0 5 1 5 5 1 5 ...
##  $ StudentSuccessInitiative   : chr  "N" "Cochran" "Smith" "Baldwin" ...
##  $ BarPrepMentor              : chr  "N" "N" "N" "N" ...
##  $ MPRE                       : num  103 76 99 81 99 NA 90 97 100 78 ...
##  $ MPT                        : num  3 3 3 2.5 3.5 3 2.5 2.5 3 2.5 ...
##  $ MEE                        : num  2.67 3.17 2.67 3 2.67 2 3.5 3 2.67 3.83 ...
##  $ WrittenScaledScore         : num  126 133 126 126 130 ...
##  $ MBE                        : num  133 133 118 140 125 ...
##  $ UBE                        : num  259 266 244 266 256 ...
head(df)
##   Year PassFail  Age LSAT UGPA CivPro LPI LPII GPA_1L GPA_Final
## 1 2021        F 29.1  152 3.42     B+   A    A  3.206      3.29
## 2 2021        F 29.6  155 2.82     B+   B    B  2.431      3.20
## 3 2021        F 29.0  157 3.46      C   B    B  2.620      2.91
## 4 2021        F 36.2  156 3.13     D+   C   C+  2.275      2.77
## 5 2021        F 28.9  145 3.49      C  C+   C+  2.293      2.90
## 6 2021        F 30.8  154 2.85     B+   F   CR  2.538      2.82
##   FinalRankPercentile Accommodations Probation LegalAnalysis_TexasPractice
## 1                0.46              N         N                           Y
## 2                0.33              Y         Y                           Y
## 3                0.08              N         N                           Y
## 4                0.02              N         Y                           Y
## 5                0.08              N         Y                           Y
## 6                0.05              N         N                           Y
##   AdvLegalPerfSkills AdvLegalAnalysis BarPrepCompany BarPrepCompletion
## 1                  Y                Y         Barbri              0.96
## 2                  Y                Y         Barbri              0.98
## 3                  Y                Y         Barbri              0.48
## 4                  Y                Y         Barbri              1.00
## 5                  Y                Y         Themis              0.77
## 6                  Y                Y         Themis              0.02
##   OptIntoWritingGuide #LawSchoolBarPrepWorkshops StudentSuccessInitiative
## 1                                              3                        N
## 2                                              0                  Cochran
## 3                                              3                    Smith
## 4                                              0                  Baldwin
## 5                                              5                  Baldwin
## 6                                              1                    Rosen
##   BarPrepMentor MPRE MPT  MEE WrittenScaledScore   MBE   UBE
## 1             N  103 3.0 2.67              125.5 133.3 258.8
## 2             N   76 3.0 3.17              133.1 132.7 265.8
## 3             N   99 3.0 2.67              125.5 118.2 243.7
## 4             N   81 2.5 3.00              125.5 140.1 265.6
## 5             N   99 3.5 2.67              130.5 125.4 255.9
## 6             N   NA 3.0 2.00              115.4 113.5 228.9
colnames(df)
##  [1] "Year"                        "PassFail"                   
##  [3] "Age"                         "LSAT"                       
##  [5] "UGPA"                        "CivPro"                     
##  [7] "LPI"                         "LPII"                       
##  [9] "GPA_1L"                      "GPA_Final"                  
## [11] "FinalRankPercentile"         "Accommodations"             
## [13] "Probation"                   "LegalAnalysis_TexasPractice"
## [15] "AdvLegalPerfSkills"          "AdvLegalAnalysis"           
## [17] "BarPrepCompany"              "BarPrepCompletion"          
## [19] "OptIntoWritingGuide"         "#LawSchoolBarPrepWorkshops" 
## [21] "StudentSuccessInitiative"    "BarPrepMentor"              
## [23] "MPRE"                        "MPT"                        
## [25] "MEE"                         "WrittenScaledScore"         
## [27] "MBE"                         "UBE"

Data Cleaning

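The dataset contains numeric variables, letter grades, yes/no indicators, other categorical variables, and missing values. UGPA is stored as text and is converted to numeric; the coercion warning shows that one entry could not be parsed and became NA. Letter grades in CivPro, LPI, and LPII are converted to GPA-style numeric scores on the standard 4.0 scale. Entries with no letter-grade equivalent, such as the "CR" shown for LPII in row 6 of head(df), become NA. The Y/N indicators, BarPrepCompany, and OptIntoWritingGuide are converted to factors. The response variable PassFail is coded with F as the reference level and P as the success outcome, and PassBinary stores the same outcome as 1 (pass) or 0 (fail).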

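# UGPA is stored as text; entries that cannot be parsed become NA (one in this dataset)
df$UGPA <- as.numeric(df$UGPA)
## Warning: NAs introduced by coercion
# F is the reference level, so models estimate the probability of passing ("P")
df$PassFail <- factor(df$PassFail, levels = c("F", "P"))

# Standard 4.0 grade scale; any grade not listed here (for example "CR") maps to NA
grade_map <- c(
  "A" = 4.0, "A-" = 3.7,
  "B+" = 3.3, "B" = 3.0, "B-" = 2.7,
  "C+" = 2.3, "C" = 2.0, "C-" = 1.7,
  "D+" = 1.3, "D" = 1.0, "D-" = 0.7,
  "F" = 0
)

# trimws() removes stray spaces; unname() drops the grade labels from the result
df$CivPro_Num <- unname(grade_map[trimws(df$CivPro)])
df$LPI_Num <- unname(grade_map[trimws(df$LPI)])
df$LPII_Num <- unname(grade_map[trimws(df$LPII)])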

df$Accommodations <- factor(df$Accommodations)
df$Probation <- factor(df$Probation)
df$LegalAnalysis_TexasPractice <- factor(df$LegalAnalysis_TexasPractice)
df$AdvLegalPerfSkills <- factor(df$AdvLegalPerfSkills)
df$AdvLegalAnalysis <- factor(df$AdvLegalAnalysis)
df$OptIntoWritingGuide <- factor(df$OptIntoWritingGuide)
df$BarPrepCompany <- factor(df$BarPrepCompany)

df$PassBinary <- ifelse(df$PassFail == "P", 1, 0)
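Indexing grade_map with a grade that is not one of its names returns NA. This probably explains why LPII_Num has 56 missing values even though LPII itself has none. The minimal sketch below uses a small hypothetical vector of grades rather than the real data. The "CR" entry mimics the value printed in row 6 of head(df). The sketch also lists which grades failed to map, which is a quick check to run before modeling.

```r
# Hypothetical grades for illustration; "CR" mimics the non-letter entry in head(df)
grades <- c("B+", " A", "CR", "C-")

grade_map <- c(
  "A" = 4.0, "A-" = 3.7,
  "B+" = 3.3, "B" = 3.0, "B-" = 2.7,
  "C+" = 2.3, "C" = 2.0, "C-" = 1.7,
  "D+" = 1.3, "D" = 1.0, "D-" = 0.7,
  "F" = 0
)

# trimws() removes the stray space in " A"; names not in grade_map return NA
grade_num <- unname(grade_map[trimws(grades)])
print(grade_num)   # 3.3, 4.0, NA, 1.7

# Distinct grades that failed to map, so they can be reviewed or recoded
unmapped <- unique(trimws(grades)[is.na(grade_num)])
print(unmapped)    # "CR"
```

On the full data, the same check would be unique(trimws(df$LPII)[is.na(df$LPII_Num)]).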

Missing Value Review

missing_table <- data.frame(
  Variable = names(df),
  Missing = unname(colSums(is.na(df)))  # unname() keeps variable names from also printing as row names
) %>%
  arrange(desc(Missing))

kable(missing_table, caption = "Missing Values by Variable")
Missing Values by Variable
Variable Missing
MPRE 397
LPII_Num 56
BarPrepCompletion 26
LPI_Num 9
GPA_1L 8
CivPro_Num 7
UGPA 1
Year 0
PassFail 0
Age 0
LSAT 0
CivPro 0
LPI 0
LPII 0
GPA_Final 0
FinalRankPercentile 0
Accommodations 0
Probation 0
LegalAnalysis_TexasPractice 0
AdvLegalPerfSkills 0
AdvLegalAnalysis 0
BarPrepCompany 0
OptIntoWritingGuide 0
#LawSchoolBarPrepWorkshops 0
StudentSuccessInitiative 0
BarPrepMentor 0
MPT 0
MEE 0
WrittenScaledScore 0
MBE 0
UBE 0
PassBinary 0

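Rows with missing values are not dropped from the full dataset. Instead, each model uses the complete cases for the variables it includes, which preserves as much usable information as possible for each model. The trade-off is that the models are fit to different subsets of students. MPRE is missing for 397 of the 600 students, so the models that include it (Models 3 and 4) use only 194 complete cases. For this reason, fit statistics are not directly comparable across models.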
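A model's sample size can be checked before fitting it. model.frame() with na.action = na.omit drops incomplete rows for just the variables in a formula, which is what lm() and glm() do by default. The minimal sketch below uses the first six rows printed by head(df) above. n_used() is a hypothetical helper name. On the full data, the same helper should report the 599, 541, and 194 observations that appear in the model summaries.

```r
# UBE, LSAT, and MPRE for the first six rows printed by head(df)
toy <- data.frame(
  UBE  = c(258.8, 265.8, 243.7, 265.6, 255.9, 228.9),
  LSAT = c(152, 155, 157, 156, 145, 154),
  MPRE = c(103, 76, 99, 81, 99, NA)
)

# Rows a model with this formula would use under complete-case analysis
n_used <- function(formula, data) {
  nrow(model.frame(formula, data = data, na.action = na.omit))
}

n_used(UBE ~ LSAT, toy)         # all 6 rows
n_used(UBE ~ LSAT + MPRE, toy)  # 5 rows: row 6 has no MPRE score
```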

Descriptive Statistics

df %>%
  summarise(
    N = n(),
    Pass_Rate = mean(PassFail == "P", na.rm = TRUE),
    Mean_UBE = mean(UBE, na.rm = TRUE),
    SD_UBE = sd(UBE, na.rm = TRUE),
    Mean_LSAT = mean(LSAT, na.rm = TRUE),
    Mean_UGPA = mean(UGPA, na.rm = TRUE),
    Mean_GPA_1L = mean(GPA_1L, na.rm = TRUE),
    Mean_GPA_Final = mean(GPA_Final, na.rm = TRUE),
    Mean_BarPrepCompletion = mean(BarPrepCompletion, na.rm = TRUE)
  ) %>%
  kable(digits = 3, caption = "Summary Statistics")
Summary Statistics
N Pass_Rate Mean_UBE SD_UBE Mean_LSAT Mean_UGPA Mean_GPA_1L Mean_GPA_Final Mean_BarPrepCompletion
600 0.898 294.709 21.435 155.628 3.478 3.091 3.275 0.865
df %>%
  count(PassFail) %>%
  mutate(Percent = n / sum(n)) %>%
  kable(digits = 3, caption = "Pass/Fail Counts")
Pass/Fail Counts
PassFail n Percent
F 61 0.102
P 539 0.898

Exploratory Data Analysis

ggplot(df, aes(x = PassFail, y = UBE)) +
  geom_boxplot() +
  labs(
    title = "UBE Score by Pass/Fail Status",
    x = "Pass/Fail",
    y = "UBE Score"
  )

ggplot(df, aes(x = LSAT, y = UBE, color = PassFail)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    title = "LSAT and UBE Score",
    x = "LSAT",
    y = "UBE Score"
  )
## `geom_smooth()` using formula = 'y ~ x'

ggplot(df, aes(x = GPA_Final, y = UBE, color = PassFail)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    title = "Final Law School GPA and UBE Score",
    x = "Final Law School GPA",
    y = "UBE Score"
  )
## `geom_smooth()` using formula = 'y ~ x'

ggplot(df, aes(x = BarPrepCompletion, y = UBE, color = PassFail)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    title = "Bar Prep Completion and UBE Score",
    x = "Bar Prep Completion Proportion",
    y = "UBE Score"
  )
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 26 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 26 rows containing missing values or values outside the scale range
## (`geom_point()`).

Model 1: Admission Predictors of UBE

The first model uses only pre-admission predictors: LSAT and UGPA.

model1 <- lm(UBE ~ LSAT + UGPA, data = df)
summary(model1)
## 
## Call:
## lm(formula = UBE ~ LSAT + UGPA, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -66.221 -13.466   1.022  14.406  54.180 
## 
## Coefficients:
##             Estimate Std. Error t value         Pr(>|t|)    
## (Intercept)  14.1607    36.1339   0.392            0.695    
## LSAT          1.5557     0.2183   7.125 0.00000000000302 ***
## UGPA         11.0483     2.2807   4.844 0.00000162235350 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.41 on 596 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.09799,    Adjusted R-squared:  0.09496 
## F-statistic: 32.37 on 2 and 596 DF,  p-value: 0.00000000000004499
tidy(model1) %>%
  kable(digits = 4, caption = "Model 1 Coefficients")
Model 1 Coefficients
term estimate std.error statistic p.value
(Intercept) 14.1607 36.1339 0.3919 0.6953
LSAT 1.5557 0.2183 7.1248 0.0000
UGPA 11.0483 2.2807 4.8442 0.0000
glance(model1) %>%
  kable(digits = 4, caption = "Model 1 Fit Statistics")
Model 1 Fit Statistics
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
0.098 0.095 20.4068 32.373 0 2 -2654.946 5317.892 5335.473 248197.2 596 599

Model 1 Interpretation

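This model tests whether entering academic credentials predict UBE score. Both coefficients are positive and highly significant. Holding UGPA constant, each additional LSAT point is associated with about 1.56 more UBE points. Holding LSAT constant, each additional UGPA point is associated with about 11.0 more UBE points. However, the two predictors together explain only about 10% of the variation in UBE (R-squared = 0.098), and the residual standard error is about 20 points. The model is also limited because it does not include law school performance or bar preparation behavior.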
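As a worked example, the fitted Model 1 equation can be evaluated directly from the rounded coefficients in the table above. The student profiles and the helper name predict_ube_m1() are hypothetical. With a residual standard error of about 20 points, individual predictions are imprecise even when the arithmetic is exact.

```r
# Rounded Model 1 coefficients from the coefficient table above
b <- c(intercept = 14.1607, LSAT = 1.5557, UGPA = 11.0483)

# Predicted UBE = intercept + b_LSAT * LSAT + b_UGPA * UGPA
predict_ube_m1 <- function(lsat, ugpa) {
  unname(b["intercept"] + b["LSAT"] * lsat + b["UGPA"] * ugpa)
}

# Hypothetical student with LSAT 155 and UGPA 3.5: about 294 points
predict_ube_m1(155, 3.5)

# Two hypothetical students 10 LSAT points apart with the same UGPA:
# the gap is 10 * 1.5557, about 15.6 UBE points
predict_ube_m1(160, 3.5) - predict_ube_m1(150, 3.5)
```

With the fitted object, predict(model1, newdata = data.frame(LSAT = 155, UGPA = 3.5)) gives the same calculation using unrounded coefficients.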

Model 2: Academic Performance Predictors of UBE

The second model adds 1L and final law school performance variables.

model2 <- lm(
  UBE ~ LSAT + UGPA + GPA_1L + GPA_Final + FinalRankPercentile + CivPro_Num + LPI_Num + LPII_Num,
  data = df
)

summary(model2)
## 
## Call:
## lm(formula = UBE ~ LSAT + UGPA + GPA_1L + GPA_Final + FinalRankPercentile + 
##     CivPro_Num + LPI_Num + LPII_Num, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -55.599 -10.535   0.027  11.175  47.349 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          50.1499    48.6794   1.030 0.303380    
## LSAT                  0.7859     0.2066   3.803 0.000159 ***
## UGPA                  4.2020     2.1366   1.967 0.049739 *  
## GPA_1L               13.3904     4.7221   2.836 0.004747 ** 
## GPA_Final            28.0801    12.0307   2.334 0.019965 *  
## FinalRankPercentile   2.1709    13.5915   0.160 0.873159    
## CivPro_Num           -0.3857     1.5371  -0.251 0.801977    
## LPI_Num              -5.8481     1.5274  -3.829 0.000144 ***
## LPII_Num             -2.4745     1.5765  -1.570 0.117106    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16.88 on 532 degrees of freedom
##   (59 observations deleted due to missingness)
## Multiple R-squared:  0.3878, Adjusted R-squared:  0.3786 
## F-statistic: 42.13 on 8 and 532 DF,  p-value: < 0.00000000000000022
tidy(model2) %>%
  kable(digits = 4, caption = "Model 2 Coefficients")
Model 2 Coefficients
term estimate std.error statistic p.value
(Intercept) 50.1499 48.6794 1.0302 0.3034
LSAT 0.7859 0.2066 3.8033 0.0002
UGPA 4.2020 2.1366 1.9667 0.0497
GPA_1L 13.3904 4.7221 2.8357 0.0047
GPA_Final 28.0801 12.0307 2.3340 0.0200
FinalRankPercentile 2.1709 13.5915 0.1597 0.8732
CivPro_Num -0.3857 1.5371 -0.2509 0.8020
LPI_Num -5.8481 1.5274 -3.8288 0.0001
LPII_Num -2.4745 1.5765 -1.5696 0.1171
glance(model2) %>%
  kable(digits = 4, caption = "Model 2 Fit Statistics")
Model 2 Fit Statistics
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
0.3878 0.3786 16.8835 42.1279 0 8 -2292.157 4604.314 4647.248 151648.6 532 541

Multicollinearity Check

model2_vif <- lm(
  UBE ~ LSAT + UGPA + GPA_1L + GPA_Final + FinalRankPercentile,
  data = df
)

vif(model2_vif)
##                LSAT                UGPA              GPA_1L           GPA_Final 
##            1.113699            1.113260            4.385148           28.848300 
## FinalRankPercentile 
##           29.033362

Model 2 Diagnostics

par(mfrow = c(2, 2))
plot(model2)

par(mfrow = c(1, 1))

Model 2 Interpretation

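Model 2 is more useful than Model 1 because it includes law school performance. GPA_1L (13.39, p = 0.005) and GPA_Final (28.08, p = 0.020) are both positive and significant, so students who perform better in law school tend to earn higher UBE scores. LSAT remains significant after controlling for law school grades (0.79 per point, p < 0.001), although its coefficient is about half its Model 1 value. Pre-admission preparation therefore still contributes independently to bar performance. Adjusted R-squared rises from 0.095 to 0.379, although Model 2 is fit on 541 rather than 599 students.

Course-grade coefficients should be interpreted carefully because these grades are components of GPA_1L and GPA_Final. The negative and significant LPI_Num coefficient (-5.85, p < 0.001) does not mean that better Legal Practice I grades lower UBE scores. It describes LPI holding both GPAs fixed, which is hard to interpret when the grade is part of the GPA. The VIF output confirms strong overlap. GPA_Final (28.8) and FinalRankPercentile (29.0) are far above the common rule-of-thumb threshold of 10, most likely because rank percentile is computed from GPA. This redundancy also explains why FinalRankPercentile is not significant (p = 0.87). Keeping only one of the two would give more stable estimates. The VIF check omits the course-grade variables, so running vif(model2) would cover the full model.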
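The VIF for a predictor is 1 / (1 - R-squared), where R-squared comes from regressing that predictor on all the other predictors. The minimal sketch below computes this by hand on simulated data, not the bar data. In the simulation, a hypothetical rank_pct variable is the percentile rank of gpa_final, which mimics the GPA_Final and FinalRankPercentile pair. For models with only numeric predictors, this should match car::vif().

```r
set.seed(42)
n <- 500

# Simulated predictors: rank_pct is derived from gpa_final, lsat is independent
lsat      <- rnorm(n, mean = 155, sd = 5)
gpa_final <- rnorm(n, mean = 3.3, sd = 0.3)
rank_pct  <- rank(gpa_final) / n
X <- data.frame(lsat, gpa_final, rank_pct)

# VIF_j = 1 / (1 - R^2_j), regressing predictor j on all the others
vif_manual <- function(X) {
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    fit <- lm(reformulate(others, response = v), data = X)
    1 / (1 - summary(fit)$r.squared)
  })
}

vifs <- vif_manual(X)
round(vifs, 2)   # gpa_final and rank_pct far above 10; lsat near 1
```

Because rank_pct is almost a linear function of gpa_final, each explains roughly 95% of the other's variance. This produces VIFs of the same order as those reported above.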

Model 3: Preparation and Support Predictors of UBE

The third model focuses on variables that are more directly actionable by the law school.

model3_data <- na.omit(df[, c(
  "UBE",
  "BarPrepCompletion",
  "#LawSchoolBarPrepWorkshops",
  "MPRE"
)])

model3 <- lm(
  UBE ~ BarPrepCompletion + `#LawSchoolBarPrepWorkshops` + MPRE,
  data = model3_data
)
summary(model3)
## 
## Call:
## lm(formula = UBE ~ BarPrepCompletion + `#LawSchoolBarPrepWorkshops` + 
##     MPRE, data = model3_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -59.959 -11.044  -0.199  12.812  42.658 
## 
## Coefficients:
##                               Estimate Std. Error t value             Pr(>|t|)
## (Intercept)                  194.25361   12.66956  15.332 < 0.0000000000000002
## BarPrepCompletion             35.48324   10.54834   3.364              0.00093
## `#LawSchoolBarPrepWorkshops`  -0.59809    0.63609  -0.940              0.34827
## MPRE                           0.68081    0.09857   6.907      0.0000000000725
##                                 
## (Intercept)                  ***
## BarPrepCompletion            ***
## `#LawSchoolBarPrepWorkshops`    
## MPRE                         ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.33 on 190 degrees of freedom
## Multiple R-squared:  0.2638, Adjusted R-squared:  0.2521 
## F-statistic: 22.69 on 3 and 190 DF,  p-value: 0.000000000001337

Model 3 Interpretation

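This model focuses on preparation and support variables. Because MPRE is missing for most students, it is fit to only 194 complete cases.

OptIntoWritingGuide was left out because, after missing values were removed, it did not have at least two usable factor levels.

BarPrepCompletion has a positive, significant coefficient (35.48, p < 0.001). Because completion is recorded as a proportion, this is the estimated difference between 0% and 100% completion. A 10-percentage-point increase corresponds to about 3.5 more UBE points.

MPRE is also positive and significant (0.68 UBE points per MPRE point, p < 0.001), so higher MPRE scores are associated with stronger bar exam performance. The number of law school bar prep workshops attended is not significant (p = 0.35). Together the three predictors explain about 26% of the variation in UBE (R-squared = 0.264).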

Model 4: Logistic Regression Predicting Pass/Fail

Because the primary institutional goal is to improve passage rates, the main model is a logistic regression predicting PassFail.

model4_data <- df %>%
  select(
    PassFail,
    LSAT,
    UGPA,
    GPA_Final,
    BarPrepCompletion,
    `#LawSchoolBarPrepWorkshops`,
    MPRE,
    Probation
  ) %>%
  na.omit()

model4 <- glm(
  PassFail ~ LSAT + UGPA + GPA_Final +
    BarPrepCompletion + `#LawSchoolBarPrepWorkshops` +
    MPRE + Probation,
  data = model4_data,
  family = binomial
)

summary(model4)
## 
## Call:
## glm(formula = PassFail ~ LSAT + UGPA + GPA_Final + BarPrepCompletion + 
##     `#LawSchoolBarPrepWorkshops` + MPRE + Probation, family = binomial, 
##     data = model4_data)
## 
## Coefficients:
##                               Estimate Std. Error z value Pr(>|z|)   
## (Intercept)                  -45.35126   15.27108  -2.970  0.00298 **
## LSAT                           0.17021    0.08222   2.070  0.03844 * 
## UGPA                           0.26005    0.83588   0.311  0.75572   
## GPA_Final                      3.57463    1.31159   2.725  0.00642 **
## BarPrepCompletion              3.08364    1.86934   1.650  0.09903 . 
## `#LawSchoolBarPrepWorkshops`   0.06478    0.13644   0.475  0.63493   
## MPRE                           0.06574    0.02871   2.289  0.02205 * 
## ProbationY                    -0.84753    0.69908  -1.212  0.22538   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 145.21  on 193  degrees of freedom
## Residual deviance: 100.47  on 186  degrees of freedom
## AIC: 116.47
## 
## Number of Fisher Scoring iterations: 6
odds_table <- tidy(model4, exponentiate = TRUE, conf.int = TRUE)

kable(odds_table, digits = 4, caption = "Model 4 Odds Ratios")
Model 4 Odds Ratios
term estimate std.error statistic p.value conf.low conf.high
(Intercept) 0.0000 15.2711 -2.9697 0.0030 0.0000 0.0000
LSAT 1.1856 0.0822 2.0702 0.0384 1.0148 1.4053
UGPA 1.2970 0.8359 0.3111 0.7557 0.2421 6.6665
GPA_Final 35.6816 1.3116 2.7254 0.0064 3.0764 560.4948
BarPrepCompletion 21.8378 1.8693 1.6496 0.0990 0.5352 896.4957
#LawSchoolBarPrepWorkshops 1.0669 0.1364 0.4748 0.6349 0.8224 1.4117
MPRE 1.0679 0.0287 2.2895 0.0221 1.0127 1.1343
ProbationY 0.4285 0.6991 -1.2124 0.2254 0.1092 1.7492

Likelihood Ratio Test

model4_null <- glm(
  PassFail ~ 1,
  data = model4_data,
  family = binomial
)

anova(model4_null, model4, test = "Chisq")
## Analysis of Deviance Table
## 
## Model 1: PassFail ~ 1
## Model 2: PassFail ~ LSAT + UGPA + GPA_Final + BarPrepCompletion + `#LawSchoolBarPrepWorkshops` + 
##     MPRE + Probation
##   Resid. Df Resid. Dev Df Deviance     Pr(>Chi)    
## 1       193     145.21                             
## 2       186     100.47  7   44.736 0.0000001539 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Pseudo R-Squared

# McFadden's pseudo R-squared: 1 - residual deviance / null deviance
pseudo_r2 <- 1 - model4$deviance / model4$null.deviance
pseudo_r2
## [1] 0.3080773
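Both the likelihood ratio test and McFadden's pseudo R-squared depend only on the null and residual deviances. As a check on the arithmetic, the sketch below recomputes them from the rounded deviances printed in summary(model4). Small differences in the last digits come from that rounding.

```r
# Rounded deviances and degrees of freedom from summary(model4)
null_dev  <- 145.21; null_df  <- 193
resid_dev <- 100.47; resid_df <- 186

# Likelihood ratio statistic: the drop in deviance, compared to a chi-square
lr_stat <- null_dev - resid_dev          # about 44.74
lr_df   <- null_df - resid_df            # 7 coefficients beyond the intercept
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# McFadden's pseudo R-squared: share of the null deviance explained
mcfadden_r2 <- 1 - resid_dev / null_dev  # about 0.308

c(lr_stat = lr_stat, df = lr_df, p_value = p_value, mcfadden_r2 = mcfadden_r2)
```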

Prediction Accuracy

model4_data$predicted_probability <- predict(
  model4,
  newdata = model4_data,
  type = "response"
)

model4_data$predicted_class <- ifelse(
  model4_data$predicted_probability >= 0.50,
  "P",
  "F"
)

model4_data$predicted_class <- factor(
  model4_data$predicted_class,
  levels = c("F", "P")
)

confusionMatrix(
  model4_data$predicted_class,
  model4_data$PassFail,
  positive = "P"
)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   F   P
##          F   7   3
##          P  17 167
##                                           
##                Accuracy : 0.8969          
##                  95% CI : (0.8453, 0.9359)
##     No Information Rate : 0.8763          
##     P-Value [Acc > NIR] : 0.22610         
##                                           
##                   Kappa : 0.3656          
##                                           
##  Mcnemar's Test P-Value : 0.00365         
##                                           
##             Sensitivity : 0.9824          
##             Specificity : 0.2917          
##          Pos Pred Value : 0.9076          
##          Neg Pred Value : 0.7000          
##              Prevalence : 0.8763          
##          Detection Rate : 0.8608          
##    Detection Prevalence : 0.9485          
##       Balanced Accuracy : 0.6370          
##                                           
##        'Positive' Class : P               
## 
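Failures are relatively rare, so the 0.50 cutoff flags few of the students who actually failed (7 of 24 in the confusion matrix above). The sketch below uses a small set of hypothetical predicted probabilities, not model output. It shows that raising the cutoff for predicting a pass trades some sensitivity for higher specificity, meaning more failing students are correctly flagged. classify_at() is a hypothetical helper name.

```r
# Hypothetical predicted pass probabilities and actual outcomes (not model output)
prob   <- c(0.95, 0.92, 0.88, 0.80, 0.70, 0.60, 0.97, 0.45)
actual <- c("P",  "P",  "F",  "P",  "F",  "F",  "P",  "F")

# Sensitivity and specificity when "P" is predicted for prob >= threshold
classify_at <- function(prob, actual, threshold) {
  predicted <- ifelse(prob >= threshold, "P", "F")
  c(threshold   = threshold,
    sensitivity = mean(predicted[actual == "P"] == "P"),  # passers predicted to pass
    specificity = mean(predicted[actual == "F"] == "F"))  # failers correctly flagged
}

# Compare the default 0.50 cutoff with a stricter 0.90 cutoff
t(sapply(c(0.50, 0.90), function(th) classify_at(prob, actual, th)))
# At 0.50: sensitivity 1.00, specificity 0.25
# At 0.90: sensitivity 0.75, specificity 1.00
```

For the fitted model, pROC::coords(roc_obj, x = "all") lists sensitivity and specificity at every threshold. This would let the school choose a cutoff based on how many at-risk students it needs to catch.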

ROC Curve and AUC

roc_obj <- roc(
  model4_data$PassFail,
  model4_data$predicted_probability,
  levels = c("F", "P")
)
## Setting direction: controls < cases
plot(roc_obj, main = "ROC Curve for Logistic Regression Model")

auc(roc_obj)
## Area under the curve: 0.8824
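The AUC has a direct interpretation. It is the probability that a randomly chosen student who passed receives a higher predicted probability than a randomly chosen student who failed, with ties counting one half. The minimal sketch below computes it by comparing every pass/fail pair on a small hypothetical set of probabilities. auc_pairwise() is a hypothetical helper name.

```r
# Hypothetical predicted pass probabilities and actual outcomes (not model output)
prob   <- c(0.95, 0.92, 0.88, 0.80, 0.70, 0.60, 0.97, 0.45)
actual <- c("P",  "P",  "F",  "P",  "F",  "F",  "P",  "F")

auc_pairwise <- function(prob, actual) {
  pass <- prob[actual == "P"]
  fail <- prob[actual == "F"]
  # Compare every passer with every failer; ties count as half a win
  wins <- outer(pass, fail, ">") + 0.5 * outer(pass, fail, "==")
  mean(wins)
}

auc_pairwise(prob, actual)   # 15 of 16 pairs ordered correctly: 0.9375
```

Applied to model4_data$predicted_probability and model4_data$PassFail, the same pairwise calculation should reproduce the empirical AUC reported by auc(roc_obj).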

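Logistic Model Interpretation

The logistic model estimates the probability of passing the bar exam. An odds ratio above 1 means a higher value of the predictor is associated with higher odds of passing, holding the other predictors constant. An odds ratio below 1 means lower odds. In the odds-ratio table, only the estimates and confidence limits are exponentiated; std.error and statistic remain on the log-odds scale. The model is fit to 194 students, only 24 of whom failed, so the confidence intervals are wide.

Key results:

  • LSAT: Odds ratio 1.19 (p = 0.038; 95% CI 1.01 to 1.41). Each additional LSAT point is associated with about 19% higher odds of passing.
  • GPA_Final: Odds ratio 35.7 per full grade point (p = 0.006; 95% CI 3.1 to 560). Final law school performance is the strongest predictor in the model, although the very wide interval reflects the small number of failures.
  • BarPrepCompletion: Odds ratio 21.8 for moving from 0% to 100% completion. The association is only marginal (p = 0.099), and the confidence interval (0.54 to 896) includes 1.
  • MPRE: Odds ratio 1.07 per MPRE point (p = 0.022; 95% CI 1.01 to 1.13), so higher MPRE scores are associated with higher passage odds.
  • Probation: Odds ratio 0.43, in the expected direction, but not statistically significant (p = 0.225; 95% CI 0.11 to 1.75). After controlling for the other variables, the data do not show a clear probation effect.
  • UGPA and #LawSchoolBarPrepWorkshops are not significant (p = 0.76 and p = 0.63).

Classification performance should be read with caution. At the 0.50 cutoff, the model correctly identifies only 7 of the 24 students who failed (specificity 0.29). Its overall accuracy (0.897) is not significantly better than the no-information rate of 0.876, which is the accuracy of always predicting a pass (p = 0.226). The AUC of 0.88 shows that the model ranks students reasonably well. However, all of these measures are computed on the same data used to fit the model, so they are likely optimistic.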
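The odds ratios in the table are per one-unit change. For GPA_Final and BarPrepCompletion, a one-unit change spans nearly the whole plausible range. The odds ratio for a change of d units is exp(beta * d), not the one-unit odds ratio multiplied by d. The sketch below rescales the log-odds coefficients from summary(model4). The step sizes are hypothetical choices made for readability.

```r
# Log-odds coefficients copied from summary(model4)
beta <- c(LSAT = 0.17021, GPA_Final = 3.57463,
          BarPrepCompletion = 3.08364, MPRE = 0.06574)

# Hypothetical step sizes, in each predictor's own units
step <- c(LSAT = 5, GPA_Final = 0.1,
          BarPrepCompletion = 0.10,  # 10 percentage points of a 0-1 proportion
          MPRE = 10)

# Odds ratio for a change of `step` units is exp(beta * step)
ors <- exp(beta * step)
round(ors, 2)   # LSAT 2.34, GPA_Final 1.43, BarPrepCompletion 1.36, MPRE 1.93
```

The same rescaling applies to the confidence limits: multiply the log-scale limits by the step size before exponentiating.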

Model 5: Bar Component Model

This model uses final exam components to confirm how UBE is constructed. It is not ideal for early intervention because MBE and WrittenScaledScore are known only at the end of the exam process.

model5 <- lm(UBE ~ MBE + WrittenScaledScore, data = df)
summary(model5)
## 
## Call:
## lm(formula = UBE ~ MBE + WrittenScaledScore, data = df)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.039048 -0.001787 -0.000894 -0.000023  0.297714 
## 
## Coefficients:
##                      Estimate Std. Error   t value            Pr(>|t|)    
## (Intercept)        0.01835752 0.00980995     1.871              0.0618 .  
## MBE                0.99993300 0.00007063 14158.090 <0.0000000000000002 ***
## WrittenScaledScore 0.99994874 0.00006626 15091.411 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01737 on 597 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 4.561e+08 on 2 and 597 DF,  p-value: < 0.00000000000000022
tidy(model5) %>%
  kable(digits = 4, caption = "Component Model Coefficients")
Component Model Coefficients
term estimate std.error statistic p.value
(Intercept) 0.0184 0.0098 1.8713 0.0618
MBE 0.9999 0.0001 14158.0901 0.0000
WrittenScaledScore 0.9999 0.0001 15091.4115 0.0000

Both coefficients are approximately 1 (0.9999), the intercept is essentially 0, and R-squared is 1. This confirms that UBE is constructed as:

UBE = MBE + WrittenScaledScore
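The identity can also be spot-checked on the six rows printed by head(df). Those values are rounded to one decimal place, so this is a consistency check rather than a proof.

```r
# MBE, WrittenScaledScore, and UBE for the six rows printed by head(df)
mbe     <- c(133.3, 132.7, 118.2, 140.1, 125.4, 113.5)
written <- c(125.5, 133.1, 125.5, 125.5, 130.5, 115.4)
ube     <- c(258.8, 265.8, 243.7, 265.6, 255.9, 228.9)

# all.equal() tolerates floating-point rounding in the sums
all.equal(mbe + written, ube)   # TRUE
```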

Discussion

The analysis supports the idea that bar passage is not determined by any single variable. Pre-admission academic measures such as LSAT and UGPA provide some predictive information. However, law school performance and preparation variables explain more of the variation and are more useful for intervention.

Model 1 shows that students with stronger entering credentials tend to score higher on the UBE, but LSAT and UGPA explain only about 10% of the variation. Model 2 adds law school performance and raises adjusted R-squared to about 0.38. Model 3 shows that bar prep completion and MPRE scores are associated with higher UBE scores among the 194 students with complete data. Model 4 is the most important model for the school because it directly predicts the probability of passing. In Model 4, LSAT, GPA_Final, and MPRE are statistically significant.

The most defensible findings are those that are statistically significant, have practical effect sizes, and represent variables the school can act on before the bar exam.

Recommendations

Recommendation 1: Create an Early Bar-Risk Flag

The school should create an early warning system using LSAT, UGPA, 1L GPA, final GPA trajectory, and probation status. Students with weaker academic indicators should be identified before bar preparation begins.

This is actionable because the school already has these variables before the bar exam. The intervention could include required advising, academic coaching, and structured study planning.
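One way to implement such a flag is to fit a logistic model on variables available before bar preparation and flag the students with the highest predicted probability of failing. The sketch below is hypothetical throughout. The data are simulated rather than drawn from the TTU dataset, the predictors are a subset of those named above, and flag_bar_risk() and n_flag are illustrative names. In practice, the model would be fit on past cohorts and then applied to the current class.

```r
set.seed(1)
n <- 200

# Simulated students; values are illustrative, not drawn from the bar data
students <- data.frame(
  id        = sprintf("S%03d", seq_len(n)),
  LSAT      = round(rnorm(n, mean = 155, sd = 5)),
  GPA_1L    = pmin(4, pmax(1.5, rnorm(n, mean = 3.1, sd = 0.35))),
  Probation = factor(sample(c("N", "Y"), n, replace = TRUE, prob = c(0.85, 0.15)))
)
eta <- -30 + 0.15 * students$LSAT + 2.5 * students$GPA_1L -
  0.8 * (students$Probation == "Y")
students$PassFail <- factor(ifelse(runif(n) < plogis(eta), "P", "F"),
                            levels = c("F", "P"))

# Fit on pre-bar-prep variables only
risk_model <- glm(PassFail ~ LSAT + GPA_1L + Probation,
                  data = students, family = binomial)

# Return the n_flag students with the highest predicted probability of failing
flag_bar_risk <- function(fit, newdata, n_flag = 20) {
  # predict() gives P(PassFail == "P") because "F" is the reference level
  p_fail <- 1 - predict(fit, newdata = newdata, type = "response")
  out <- data.frame(id = newdata$id, p_fail = p_fail)
  head(out[order(out$p_fail, decreasing = TRUE), ], n_flag)
}

flagged <- flag_bar_risk(risk_model, students, n_flag = 20)
head(flagged)
```

Ranking students by predicted risk, rather than applying a fixed probability cutoff, lets available advising capacity determine how many students are flagged.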

Recommendation 2: Increase Monitoring of Bar Prep Completion

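BarPrepCompletion is positive and significant in the UBE model (Model 3) and positive but only marginally significant in the pass/fail model (Model 4, p = 0.099). Because it is also one of the few predictors the school can influence, the school should track commercial bar prep completion weekly during the preparation period. Students falling below expected completion levels should receive immediate outreach.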

This is practical because bar prep completion is a behavior that can be changed before the exam.
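Weekly monitoring could compare each student's completion with where a simple linear schedule says they should be. In the minimal sketch below, the data, the 10-week course length, the 10-point margin, and the function name flag_behind_pace() are all hypothetical. A bar prep program's own recommended schedule could replace the linear one.

```r
# Hypothetical weekly snapshot of bar prep completion (proportion of course finished)
progress <- data.frame(
  id         = c("S001", "S002", "S003", "S004"),
  completion = c(0.55, 0.30, 0.62, 0.41)
)

# Flag students more than `margin` behind a linear schedule
# (hypothetical helper; total_weeks and margin are assumptions)
flag_behind_pace <- function(progress, week, total_weeks = 10, margin = 0.10) {
  expected <- week / total_weeks
  progress$expected <- expected
  progress$behind   <- progress$completion < expected - margin
  progress
}

flag_behind_pace(progress, week = 5)   # only S002 (30% vs. 50% expected) is flagged
```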

Recommendation 3: Use MPRE and Writing Support as Readiness Indicators

MPRE is significant in both Model 3 and Model 4, so the school should use MPRE scores as an early readiness indicator. Students with lower MPRE scores should be encouraged to participate in additional bar-focused workshops, writing practice, and mentor meetings.

This recommendation is useful because MPRE is completed before the bar exam and can identify students needing support.

Limitations

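This analysis is observational, so the models show association rather than causation. Several predictors overlap strongly: GPA_Final and FinalRankPercentile have VIFs near 29, and course grades are components of 1L GPA. Missing values also affect the models because each model uses complete cases for its own variables. In particular, MPRE is missing for 397 of 600 students, so Models 3 and 4 rest on only 194 students, who may not be representative of the full group. Model 4 has only 24 failures for seven predictors, which makes its estimates unstable. Its accuracy and AUC are also evaluated on the data used to fit it rather than on a held-out sample.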

Another limitation is that some variables, such as MBE, MEE, MPT, WrittenScaledScore, and UBE, are exam outcomes or components of the final score. They are useful for understanding performance but less useful for designing early interventions.

Conclusion

The most useful model for school decision-making is the logistic regression model predicting PassFail. The results should guide targeted intervention: identify academically at-risk students early, monitor bar preparation completion, and use MPRE and writing-related support as readiness signals. These recommendations are practical, evidence-based, and feasible for the law school to implement before the next bar preparation cycle.