library(dplyr)
library(ggplot2)
library(broom)
library(car)
library(pROC)
library(caret)
library(knitr)
library(kableExtra)
options(scipen = 999)
theme_set(theme_minimal())
The TTU Law School provided a de-identified dataset of bar examination outcomes for recent cohorts from 2021 through 2025. The goal of this analysis is to identify actionable predictors that distinguish students who pass the Uniform Bar Examination from those who do not.
The main research question is:
Which pre-admission, law-school performance, and bar-preparation variables are associated with bar exam success?
The primary response variable is PassFail, because the school’s main administrative goal is to increase bar passage rates. I also model UBE as a continuous response because it gives more information than a simple pass/fail classification.
Before fitting models, I expect the following:
df <- read.csv(
"https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/BarData_2025.csv",
check.names = FALSE
)
str(df)
## 'data.frame': 600 obs. of 28 variables:
## $ Year : int 2021 2021 2021 2021 2021 2021 2021 2021 2021 2021 ...
## $ PassFail : chr "F" "F" "F" "F" ...
## $ Age : num 29.1 29.6 29 36.2 28.9 30.8 29.1 42.9 28.3 27.1 ...
## $ LSAT : int 152 155 157 156 145 154 149 160 152 150 ...
## $ UGPA : chr "3.42" "2.82" "3.46" "3.13" ...
## $ CivPro : chr "B+" "B+" "C" "D+" ...
## $ LPI : chr "A" "B" "B" "C" ...
## $ LPII : chr "A" "B" "B" "C+" ...
## $ GPA_1L : num 3.21 2.43 2.62 2.27 2.29 ...
## $ GPA_Final : num 3.29 3.2 2.91 2.77 2.9 2.82 3 3.09 3.21 2.74 ...
## $ FinalRankPercentile : num 0.46 0.33 0.08 0.02 0.08 0.05 0.15 0.22 0.34 0.01 ...
## $ Accommodations : chr "N" "Y" "N" "N" ...
## $ Probation : chr "N" "Y" "N" "Y" ...
## $ LegalAnalysis_TexasPractice: chr "Y" "Y" "Y" "Y" ...
## $ AdvLegalPerfSkills : chr "Y" "Y" "Y" "Y" ...
## $ AdvLegalAnalysis : chr "Y" "Y" "Y" "Y" ...
## $ BarPrepCompany : chr "Barbri" "Barbri" "Barbri" "Barbri" ...
## $ BarPrepCompletion : num 0.96 0.98 0.48 1 0.77 0.02 0.9 0.76 0.77 0.88 ...
## $ OptIntoWritingGuide : chr "" "" "" "" ...
## $ #LawSchoolBarPrepWorkshops : int 3 0 3 0 5 1 5 5 1 5 ...
## $ StudentSuccessInitiative : chr "N" "Cochran" "Smith" "Baldwin" ...
## $ BarPrepMentor : chr "N" "N" "N" "N" ...
## $ MPRE : num 103 76 99 81 99 NA 90 97 100 78 ...
## $ MPT : num 3 3 3 2.5 3.5 3 2.5 2.5 3 2.5 ...
## $ MEE : num 2.67 3.17 2.67 3 2.67 2 3.5 3 2.67 3.83 ...
## $ WrittenScaledScore : num 126 133 126 126 130 ...
## $ MBE : num 133 133 118 140 125 ...
## $ UBE : num 259 266 244 266 256 ...
head(df)
## Year PassFail Age LSAT UGPA CivPro LPI LPII GPA_1L GPA_Final
## 1 2021 F 29.1 152 3.42 B+ A A 3.206 3.29
## 2 2021 F 29.6 155 2.82 B+ B B 2.431 3.20
## 3 2021 F 29.0 157 3.46 C B B 2.620 2.91
## 4 2021 F 36.2 156 3.13 D+ C C+ 2.275 2.77
## 5 2021 F 28.9 145 3.49 C C+ C+ 2.293 2.90
## 6 2021 F 30.8 154 2.85 B+ F CR 2.538 2.82
## FinalRankPercentile Accommodations Probation LegalAnalysis_TexasPractice
## 1 0.46 N N Y
## 2 0.33 Y Y Y
## 3 0.08 N N Y
## 4 0.02 N Y Y
## 5 0.08 N Y Y
## 6 0.05 N N Y
## AdvLegalPerfSkills AdvLegalAnalysis BarPrepCompany BarPrepCompletion
## 1 Y Y Barbri 0.96
## 2 Y Y Barbri 0.98
## 3 Y Y Barbri 0.48
## 4 Y Y Barbri 1.00
## 5 Y Y Themis 0.77
## 6 Y Y Themis 0.02
## OptIntoWritingGuide #LawSchoolBarPrepWorkshops StudentSuccessInitiative
## 1 3 N
## 2 0 Cochran
## 3 3 Smith
## 4 0 Baldwin
## 5 5 Baldwin
## 6 1 Rosen
## BarPrepMentor MPRE MPT MEE WrittenScaledScore MBE UBE
## 1 N 103 3.0 2.67 125.5 133.3 258.8
## 2 N 76 3.0 3.17 133.1 132.7 265.8
## 3 N 99 3.0 2.67 125.5 118.2 243.7
## 4 N 81 2.5 3.00 125.5 140.1 265.6
## 5 N 99 3.5 2.67 130.5 125.4 255.9
## 6 N NA 3.0 2.00 115.4 113.5 228.9
colnames(df)
## [1] "Year" "PassFail"
## [3] "Age" "LSAT"
## [5] "UGPA" "CivPro"
## [7] "LPI" "LPII"
## [9] "GPA_1L" "GPA_Final"
## [11] "FinalRankPercentile" "Accommodations"
## [13] "Probation" "LegalAnalysis_TexasPractice"
## [15] "AdvLegalPerfSkills" "AdvLegalAnalysis"
## [17] "BarPrepCompany" "BarPrepCompletion"
## [19] "OptIntoWritingGuide" "#LawSchoolBarPrepWorkshops"
## [21] "StudentSuccessInitiative" "BarPrepMentor"
## [23] "MPRE" "MPT"
## [25] "MEE" "WrittenScaledScore"
## [27] "MBE" "UBE"
The dataset contains numeric variables, letter grades, yes/no indicators, categorical variables, and missing values. Letter grades are converted to GPA-style numeric scores. Y/N variables are converted to factors. The response variable PassFail is coded with F as the reference level and P as the success outcome.
df$UGPA <- as.numeric(df$UGPA)
## Warning: NAs introduced by coercion
df$PassFail <- factor(df$PassFail, levels = c("F", "P"))
grade_map <- c(
"A" = 4.0, "A-" = 3.7,
"B+" = 3.3, "B" = 3.0, "B-" = 2.7,
"C+" = 2.3, "C" = 2.0, "C-" = 1.7,
"D+" = 1.3, "D" = 1.0, "D-" = 0.7,
"F" = 0
)
df$CivPro_Num <- grade_map[trimws(df$CivPro)]
df$LPI_Num <- grade_map[trimws(df$LPI)]
df$LPII_Num <- grade_map[trimws(df$LPII)]
df$Accommodations <- factor(df$Accommodations)
df$Probation <- factor(df$Probation)
df$LegalAnalysis_TexasPractice <- factor(df$LegalAnalysis_TexasPractice)
df$AdvLegalPerfSkills <- factor(df$AdvLegalPerfSkills)
df$AdvLegalAnalysis <- factor(df$AdvLegalAnalysis)
df$OptIntoWritingGuide <- factor(df$OptIntoWritingGuide)
df$BarPrepCompany <- factor(df$BarPrepCompany)
df$PassBinary <- ifelse(df$PassFail == "P", 1, 0)
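One consequence of the lookup-based conversion is worth checking: a grade string that is not a name in grade_map, such as the "CR" that appears in the LPII column of the head(df) output, becomes NA instead of raising an error. The sketch below illustrates this on a small made-up vector; `example_grades` and the shortened map are illustrative assumptions, not rows from the dataset.

```r
# Same letter-to-points mapping idea as above, shortened for illustration.
grade_map <- c("A" = 4.0, "B+" = 3.3, "C" = 2.0, "F" = 0)

# Hypothetical input: padded whitespace, a credit-only grade, and a blank.
example_grades <- c("A", " B+", "CR", "", "F")

# Character indexing returns NA for names that are not in the map.
example_num <- unname(grade_map[trimws(example_grades)])
print(example_num)

# List the distinct labels that failed to convert so they can be reviewed.
unmapped <- unique(trimws(example_grades)[is.na(example_num)])
print(unmapped)
```

Running the same `unique(...)` check on the real grade columns would show which labels account for the missing values in CivPro_Num, LPI_Num, and LPII_Num.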
missing_table <- data.frame(
Variable = names(df),
Missing = colSums(is.na(df))
) %>%
arrange(desc(Missing))
kable(missing_table, caption = "Missing Values by Variable")
| Variable | Missing |
|---|---|
| MPRE | 397 |
| LPII_Num | 56 |
| BarPrepCompletion | 26 |
| LPI_Num | 9 |
| GPA_1L | 8 |
| CivPro_Num | 7 |
| UGPA | 1 |
| Year | 0 |
| PassFail | 0 |
| Age | 0 |
| LSAT | 0 |
| CivPro | 0 |
| LPI | 0 |
| LPII | 0 |
| GPA_Final | 0 |
| FinalRankPercentile | 0 |
| Accommodations | 0 |
| Probation | 0 |
| LegalAnalysis_TexasPractice | 0 |
| AdvLegalPerfSkills | 0 |
| AdvLegalAnalysis | 0 |
| BarPrepCompany | 0 |
| OptIntoWritingGuide | 0 |
| #LawSchoolBarPrepWorkshops | 0 |
| StudentSuccessInitiative | 0 |
| BarPrepMentor | 0 |
| MPT | 0 |
| MEE | 0 |
| WrittenScaledScore | 0 |
| MBE | 0 |
| UBE | 0 |
| PassBinary | 0 |
Rows with missing values are not deleted from the full dataset immediately. Instead, each model uses complete cases for the variables included in that specific model. This preserves as much usable information as possible.
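As a rough illustration of why per-model complete cases keep more data than deleting incomplete rows up front, the sketch below counts usable rows for different variable sets. The `toy` data frame and the `n_complete` helper are hypothetical and exist only for this example. The same mechanism is why the models in this report end up with different sample sizes (599, 541, and 194 observations).

```r
# Toy data with scattered missing values (hypothetical, not the bar dataset).
toy <- data.frame(
  y  = c(10, 12, 11, 15, 14, 13),
  x1 = c(1, 2, NA, 4, 5, 6),
  x2 = c(NA, 1, 1, NA, 0, 1),
  x3 = c(5, 5, 6, 7, NA, 8)
)

# Hypothetical helper: number of rows usable for a given set of model variables.
n_complete <- function(data, vars) {
  sum(complete.cases(data[, vars, drop = FALSE]))
}

n_small <- n_complete(toy, c("y", "x1"))              # model with one predictor
n_large <- n_complete(toy, c("y", "x1", "x2", "x3"))  # model with all predictors

print(c(small = n_small, large = n_large))

# lm() drops incomplete rows the same way under the default na.action
# (na.omit, unless options("na.action") has been changed).
fit_small <- lm(y ~ x1, data = toy)
print(nobs(fit_small))
```

One design consequence: models fit on different subsets are not strictly comparable, so differences in fit statistics can partly reflect which students were dropped.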
df %>%
summarise(
N = n(),
Pass_Rate = mean(PassFail == "P", na.rm = TRUE),
Mean_UBE = mean(UBE, na.rm = TRUE),
SD_UBE = sd(UBE, na.rm = TRUE),
Mean_LSAT = mean(LSAT, na.rm = TRUE),
Mean_UGPA = mean(UGPA, na.rm = TRUE),
Mean_GPA_1L = mean(GPA_1L, na.rm = TRUE),
Mean_GPA_Final = mean(GPA_Final, na.rm = TRUE),
Mean_BarPrepCompletion = mean(BarPrepCompletion, na.rm = TRUE)
) %>%
kable(digits = 3, caption = "Summary Statistics")
| N | Pass_Rate | Mean_UBE | SD_UBE | Mean_LSAT | Mean_UGPA | Mean_GPA_1L | Mean_GPA_Final | Mean_BarPrepCompletion |
|---|---|---|---|---|---|---|---|---|
| 600 | 0.898 | 294.709 | 21.435 | 155.628 | 3.478 | 3.091 | 3.275 | 0.865 |
df %>%
count(PassFail) %>%
mutate(Percent = n / sum(n)) %>%
kable(digits = 3, caption = "Pass/Fail Counts")
| PassFail | n | Percent |
|---|---|---|
| F | 61 | 0.102 |
| P | 539 | 0.898 |
ggplot(df, aes(x = PassFail, y = UBE)) +
geom_boxplot() +
labs(
title = "UBE Score by Pass/Fail Status",
x = "Pass/Fail",
y = "UBE Score"
)
ggplot(df, aes(x = LSAT, y = UBE, color = PassFail)) +
geom_point(alpha = 0.7) +
geom_smooth(method = "lm", se = FALSE) +
labs(
title = "LSAT and UBE Score",
x = "LSAT",
y = "UBE Score"
)
## `geom_smooth()` using formula = 'y ~ x'
ggplot(df, aes(x = GPA_Final, y = UBE, color = PassFail)) +
geom_point(alpha = 0.7) +
geom_smooth(method = "lm", se = FALSE) +
labs(
title = "Final Law School GPA and UBE Score",
x = "Final Law School GPA",
y = "UBE Score"
)
## `geom_smooth()` using formula = 'y ~ x'
ggplot(df, aes(x = BarPrepCompletion, y = UBE, color = PassFail)) +
geom_point(alpha = 0.7) +
geom_smooth(method = "lm", se = FALSE) +
labs(
title = "Bar Prep Completion and UBE Score",
x = "Bar Prep Completion Proportion",
y = "UBE Score"
)
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 26 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 26 rows containing missing values or values outside the scale range
## (`geom_point()`).
The first model uses only pre-admission predictors: LSAT and UGPA.
model1 <- lm(UBE ~ LSAT + UGPA, data = df)
summary(model1)
##
## Call:
## lm(formula = UBE ~ LSAT + UGPA, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -66.221 -13.466 1.022 14.406 54.180
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 14.1607 36.1339 0.392 0.695
## LSAT 1.5557 0.2183 7.125 0.00000000000302 ***
## UGPA 11.0483 2.2807 4.844 0.00000162235350 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 20.41 on 596 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.09799, Adjusted R-squared: 0.09496
## F-statistic: 32.37 on 2 and 596 DF, p-value: 0.00000000000004499
tidy(model1) %>%
kable(digits = 4, caption = "Model 1 Coefficients")
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 14.1607 | 36.1339 | 0.3919 | 0.6953 |
| LSAT | 1.5557 | 0.2183 | 7.1248 | 0.0000 |
| UGPA | 11.0483 | 2.2807 | 4.8442 | 0.0000 |
glance(model1) %>%
kable(digits = 4, caption = "Model 1 Fit Statistics")
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual | nobs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.098 | 0.095 | 20.4068 | 32.373 | 0 | 2 | -2654.946 | 5317.892 | 5335.473 | 248197.2 | 596 | 599 |
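This model tests whether entering academic credentials predict final UBE score. Both coefficients are positive and statistically significant: each additional LSAT point is associated with about 1.56 more UBE points, and each additional UGPA point with about 11.0 more UBE points, holding the other predictor fixed. However, the model explains only about 10% of the variation in UBE (R-squared = 0.098), and it does not include law school performance or bar preparation behavior.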
The second model adds 1L and final law school performance variables.
model2 <- lm(
UBE ~ LSAT + UGPA + GPA_1L + GPA_Final + FinalRankPercentile + CivPro_Num + LPI_Num + LPII_Num,
data = df
)
summary(model2)
##
## Call:
## lm(formula = UBE ~ LSAT + UGPA + GPA_1L + GPA_Final + FinalRankPercentile +
## CivPro_Num + LPI_Num + LPII_Num, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -55.599 -10.535 0.027 11.175 47.349
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 50.1499 48.6794 1.030 0.303380
## LSAT 0.7859 0.2066 3.803 0.000159 ***
## UGPA 4.2020 2.1366 1.967 0.049739 *
## GPA_1L 13.3904 4.7221 2.836 0.004747 **
## GPA_Final 28.0801 12.0307 2.334 0.019965 *
## FinalRankPercentile 2.1709 13.5915 0.160 0.873159
## CivPro_Num -0.3857 1.5371 -0.251 0.801977
## LPI_Num -5.8481 1.5274 -3.829 0.000144 ***
## LPII_Num -2.4745 1.5765 -1.570 0.117106
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16.88 on 532 degrees of freedom
## (59 observations deleted due to missingness)
## Multiple R-squared: 0.3878, Adjusted R-squared: 0.3786
## F-statistic: 42.13 on 8 and 532 DF, p-value: < 0.00000000000000022
tidy(model2) %>%
kable(digits = 4, caption = "Model 2 Coefficients")
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 50.1499 | 48.6794 | 1.0302 | 0.3034 |
| LSAT | 0.7859 | 0.2066 | 3.8033 | 0.0002 |
| UGPA | 4.2020 | 2.1366 | 1.9667 | 0.0497 |
| GPA_1L | 13.3904 | 4.7221 | 2.8357 | 0.0047 |
| GPA_Final | 28.0801 | 12.0307 | 2.3340 | 0.0200 |
| FinalRankPercentile | 2.1709 | 13.5915 | 0.1597 | 0.8732 |
| CivPro_Num | -0.3857 | 1.5371 | -0.2509 | 0.8020 |
| LPI_Num | -5.8481 | 1.5274 | -3.8288 | 0.0001 |
| LPII_Num | -2.4745 | 1.5765 | -1.5696 | 0.1171 |
glance(model2) %>%
kable(digits = 4, caption = "Model 2 Fit Statistics")
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual | nobs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.3878 | 0.3786 | 16.8835 | 42.1279 | 0 | 8 | -2292.157 | 4604.314 | 4647.248 | 151648.6 | 532 | 541 |
model2_vif <- lm(
UBE ~ LSAT + UGPA + GPA_1L + GPA_Final + FinalRankPercentile,
data = df
)
vif(model2_vif)
## LSAT UGPA GPA_1L GPA_Final
## 1.113699 1.113260 4.385148 28.848300
## FinalRankPercentile
## 29.033362
par(mfrow = c(2, 2))
plot(model2)
par(mfrow = c(1, 1))
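Model 2 is more useful than Model 1 because it includes law school performance: R-squared rises from 0.098 to 0.388, although the two models are fit on slightly different samples (599 versus 541 complete cases). GPA_1L (13.4 UBE points per grade point, p = 0.005) and GPA_Final (28.1, p = 0.020) are both positive and significant, so students who perform better in law school tend to earn higher UBE scores. LSAT remains significant after controlling for law school GPA (0.79 UBE points per LSAT point, p < 0.001), so pre-admission preparation still contributes independently to bar performance.
Course-grade coefficients should be interpreted carefully because grades are likely correlated with GPA_1L and GPA_Final. The negative LPI_Num coefficient (-5.85, p < 0.001) most plausibly reflects this overlap rather than a harmful effect of a higher LPI grade. The VIF output confirms strong overlap among the cumulative measures: GPA_Final (28.8) and FinalRankPercentile (29.0) are far above the common rule-of-thumb threshold of 10, so they carry nearly the same information and their individual coefficients are unstable.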
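For readers unfamiliar with VIF, the sketch below computes it from its definition, VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing predictor j on the remaining predictors. It uses simulated data and a hypothetical helper `vif_by_hand`; it is meant to show what the statistic measures, not to replace `car::vif()`.

```r
set.seed(1)  # simulated data only; not the bar dataset

n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.2)   # nearly a copy of x1 (strong overlap)
x3 <- rnorm(n)                  # unrelated predictor
sim <- data.frame(x1, x2, x3)

# Hypothetical helper: VIF_j = 1 / (1 - R^2_j), where R^2_j comes from
# regressing predictor j on all the other predictors.
vif_by_hand <- function(data) {
  sapply(names(data), function(v) {
    others <- setdiff(names(data), v)
    r2 <- summary(lm(reformulate(others, response = v), data = data))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif_by_hand(sim)
print(round(vifs, 2))
```

In this simulation the two overlapping predictors get large VIFs while the unrelated one stays near 1, which mirrors the pattern in the output above, where GPA_Final and FinalRankPercentile are both close to 29.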
The third model focuses on variables that are more directly actionable by the law school.
model3_data <- na.omit(df[, c(
"UBE",
"BarPrepCompletion",
"#LawSchoolBarPrepWorkshops",
"MPRE"
)])
model3 <- lm(
UBE ~ BarPrepCompletion + `#LawSchoolBarPrepWorkshops` + MPRE,
data = model3_data
)
summary(model3)
##
## Call:
## lm(formula = UBE ~ BarPrepCompletion + `#LawSchoolBarPrepWorkshops` +
## MPRE, data = model3_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -59.959 -11.044 -0.199 12.812 42.658
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 194.25361 12.66956 15.332 < 0.0000000000000002
## BarPrepCompletion 35.48324 10.54834 3.364 0.00093
## `#LawSchoolBarPrepWorkshops` -0.59809 0.63609 -0.940 0.34827
## MPRE 0.68081 0.09857 6.907 0.0000000000725
##
## (Intercept) ***
## BarPrepCompletion ***
## `#LawSchoolBarPrepWorkshops`
## MPRE ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.33 on 190 degrees of freedom
## Multiple R-squared: 0.2638, Adjusted R-squared: 0.2521
## F-statistic: 22.69 on 3 and 190 DF, p-value: 0.000000000001337
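This model focuses on preparation and support variables. Because MPRE is missing for 397 students, it is fit on only 194 complete cases.
OptIntoWritingGuide was removed because after removing missing values it did not have at least two usable factor levels.
The BarPrepCompletion coefficient is positive and significant (35.5, p < 0.001): moving from 0% to 100% completion is associated with about 35 more UBE points, or roughly 3.5 points per 10 percentage points of completion.
The MPRE coefficient is also positive and significant (0.68 UBE points per MPRE point, p < 0.001), so higher MPRE scores are associated with stronger bar exam performance. The number of law school workshops attended is not a significant predictor in this model (p = 0.35).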
Because the primary institutional goal is to improve passage rates, the main model is a logistic regression predicting PassFail.
model4_data <- df %>%
select(
PassFail,
LSAT,
UGPA,
GPA_Final,
BarPrepCompletion,
`#LawSchoolBarPrepWorkshops`,
MPRE,
Probation
) %>%
na.omit()
model4 <- glm(
PassFail ~ LSAT + UGPA + GPA_Final +
BarPrepCompletion + `#LawSchoolBarPrepWorkshops` +
MPRE + Probation,
data = model4_data,
family = binomial
)
summary(model4)
##
## Call:
## glm(formula = PassFail ~ LSAT + UGPA + GPA_Final + BarPrepCompletion +
## `#LawSchoolBarPrepWorkshops` + MPRE + Probation, family = binomial,
## data = model4_data)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -45.35126 15.27108 -2.970 0.00298 **
## LSAT 0.17021 0.08222 2.070 0.03844 *
## UGPA 0.26005 0.83588 0.311 0.75572
## GPA_Final 3.57463 1.31159 2.725 0.00642 **
## BarPrepCompletion 3.08364 1.86934 1.650 0.09903 .
## `#LawSchoolBarPrepWorkshops` 0.06478 0.13644 0.475 0.63493
## MPRE 0.06574 0.02871 2.289 0.02205 *
## ProbationY -0.84753 0.69908 -1.212 0.22538
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 145.21 on 193 degrees of freedom
## Residual deviance: 100.47 on 186 degrees of freedom
## AIC: 116.47
##
## Number of Fisher Scoring iterations: 6
odds_table <- tidy(model4, exponentiate = TRUE, conf.int = TRUE)
kable(odds_table, digits = 4, caption = "Model 4 Odds Ratios")
| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | 0.0000 | 15.2711 | -2.9697 | 0.0030 | 0.0000 | 0.0000 |
| LSAT | 1.1856 | 0.0822 | 2.0702 | 0.0384 | 1.0148 | 1.4053 |
| UGPA | 1.2970 | 0.8359 | 0.3111 | 0.7557 | 0.2421 | 6.6665 |
| GPA_Final | 35.6816 | 1.3116 | 2.7254 | 0.0064 | 3.0764 | 560.4948 |
| BarPrepCompletion | 21.8378 | 1.8693 | 1.6496 | 0.0990 | 0.5352 | 896.4957 |
| #LawSchoolBarPrepWorkshops | 1.0669 | 0.1364 | 0.4748 | 0.6349 | 0.8224 | 1.4117 |
| MPRE | 1.0679 | 0.0287 | 2.2895 | 0.0221 | 1.0127 | 1.1343 |
| ProbationY | 0.4285 | 0.6991 | -1.2124 | 0.2254 | 0.1092 | 1.7492 |
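The odds ratios in the table describe a one-unit change, which is a very large step for some predictors: a full grade point for GPA_Final, or going from 0% to 100% for BarPrepCompletion. A minimal sketch of rescaling to more practical increments follows. The coefficients are copied from the Model 4 summary; the helper `or_for_change` and the chosen increments (0.1 grade points, 10 percentage points) are assumptions made for illustration.

```r
# Log-odds coefficients copied from the Model 4 summary above.
b_gpa_final  <- 3.57463
b_completion <- 3.08364

# Odds ratio for a one-unit change (matches the table up to rounding).
or_gpa_unit <- exp(b_gpa_final)

# Hypothetical helper: odds ratio for a change of `delta` units in a predictor.
or_for_change <- function(beta, delta) exp(beta * delta)

or_gpa_tenth     <- or_for_change(b_gpa_final, 0.1)   # +0.1 grade points
or_completion_10 <- or_for_change(b_completion, 0.1)  # +10 percentage points

print(round(c(gpa_unit = or_gpa_unit,
              gpa_tenth = or_gpa_tenth,
              completion_10pts = or_completion_10), 3))
```

The rescaled values are point estimates only. The wide confidence intervals in the table, especially for BarPrepCompletion, apply to them as well.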
model4_null <- glm(
PassFail ~ 1,
data = model4_data,
family = binomial
)
anova(model4_null, model4, test = "Chisq")
## Analysis of Deviance Table
##
## Model 1: PassFail ~ 1
## Model 2: PassFail ~ LSAT + UGPA + GPA_Final + BarPrepCompletion + `#LawSchoolBarPrepWorkshops` +
## MPRE + Probation
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 193 145.21
## 2 186 100.47 7 44.736 0.0000001539 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
pseudo_r2 <- 1 - model4$deviance / model4$null.deviance
pseudo_r2
## [1] 0.3080773
model4_data$predicted_probability <- predict(
model4,
newdata = model4_data,
type = "response"
)
model4_data$predicted_class <- ifelse(
model4_data$predicted_probability >= 0.50,
"P",
"F"
)
model4_data$predicted_class <- factor(
model4_data$predicted_class,
levels = c("F", "P")
)
confusionMatrix(
model4_data$predicted_class,
model4_data$PassFail,
positive = "P"
)
## Confusion Matrix and Statistics
##
## Reference
## Prediction F P
## F 7 3
## P 17 167
##
## Accuracy : 0.8969
## 95% CI : (0.8453, 0.9359)
## No Information Rate : 0.8763
## P-Value [Acc > NIR] : 0.22610
##
## Kappa : 0.3656
##
## Mcnemar's Test P-Value : 0.00365
##
## Sensitivity : 0.9824
## Specificity : 0.2917
## Pos Pred Value : 0.9076
## Neg Pred Value : 0.7000
## Prevalence : 0.8763
## Detection Rate : 0.8608
## Detection Prevalence : 0.9485
## Balanced Accuracy : 0.6370
##
## 'Positive' Class : P
##
roc_obj <- roc(
model4_data$PassFail,
model4_data$predicted_probability,
levels = c("F", "P")
)
## Setting direction: controls < cases
plot(roc_obj, main = "ROC Curve for Logistic Regression Model")
auc(roc_obj)
## Area under the curve: 0.8824
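The confusion matrix above uses a 0.50 cutoff, and its counts show that only 7 of the 24 students who failed were flagged. The sketch below first reproduces sensitivity and specificity from those counts, then shows on simulated probabilities how the two rates trade off as the cutoff moves. The helper `rates_at` and the simulated `prob` values are hypothetical and are not Model 4 output.

```r
# Counts copied from the confusion matrix above (0.50 cutoff).
true_fail_pred_fail <- 7
true_fail_pred_pass <- 17
true_pass_pred_fail <- 3
true_pass_pred_pass <- 167

sensitivity <- true_pass_pred_pass / (true_pass_pred_pass + true_pass_pred_fail)
specificity <- true_fail_pred_fail / (true_fail_pred_fail + true_fail_pred_pass)
print(round(c(sensitivity = sensitivity, specificity = specificity), 4))

# Hypothetical helper: recompute both rates at any cutoff.
rates_at <- function(prob, actual, cutoff) {
  pred <- ifelse(prob >= cutoff, "P", "F")
  c(cutoff = cutoff,
    sensitivity = mean(pred[actual == "P"] == "P"),
    specificity = mean(pred[actual == "F"] == "F"))
}

# Simulated probabilities for illustration only (not Model 4 output).
set.seed(42)
actual <- rep(c("P", "F"), times = c(170, 24))
prob   <- c(rbeta(170, 9, 1), rbeta(24, 4, 2))

cutoff_table <- t(sapply(c(0.5, 0.7, 0.9), rates_at, prob = prob, actual = actual))
print(round(cutoff_table, 3))
```

If the school cares more about catching students at risk of failing than about avoiding false alarms, a cutoff above 0.50 is a reasonable thing to explore using the real `model4_data$predicted_probability` values.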
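The logistic model estimates the probability of passing the bar exam. An odds ratio greater than 1 means that higher values of the predictor are associated with higher odds of passing, while an odds ratio below 1 means lower odds.
In this model, GPA_Final (odds ratio 35.7, p = 0.006), LSAT (1.19 per point, p = 0.038), and MPRE (1.07 per point, p = 0.022) are significant at the 0.05 level. BarPrepCompletion has a large estimated odds ratio (21.8) but a very wide confidence interval (0.54 to 896) and is significant only at the 0.10 level. UGPA, workshop attendance, and probation status are not significant. The AUC of 0.88 indicates good discrimination, but at the 0.50 cutoff the model flags only 7 of the 24 students who failed (specificity 0.29).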
This model uses final exam components to confirm how UBE is constructed. It is not ideal for early intervention because MBE and WrittenScaledScore are known only at the end of the exam process.
model5 <- lm(UBE ~ MBE + WrittenScaledScore, data = df)
summary(model5)
##
## Call:
## lm(formula = UBE ~ MBE + WrittenScaledScore, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.039048 -0.001787 -0.000894 -0.000023 0.297714
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.01835752 0.00980995 1.871 0.0618 .
## MBE 0.99993300 0.00007063 14158.090 <0.0000000000000002 ***
## WrittenScaledScore 0.99994874 0.00006626 15091.411 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01737 on 597 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 4.561e+08 on 2 and 597 DF, p-value: < 0.00000000000000022
tidy(model5) %>%
kable(digits = 4, caption = "Component Model Coefficients")
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 0.0184 | 0.0098 | 1.8713 | 0.0618 |
| MBE | 0.9999 | 0.0001 | 14158.0901 | 0.0000 |
| WrittenScaledScore | 0.9999 | 0.0001 | 15091.4115 | 0.0000 |
The coefficients should be approximately 1 for both MBE and WrittenScaledScore because UBE is constructed as:
UBE = MBE + WrittenScaledScore
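A quick way to see this identity without fitting a regression is to compare the reported total with the sum of its parts. The sketch below does so for the six rows printed in the head(df) output, using a tolerance because the printed scores are rounded to one decimal place. The `first_rows` data frame is just a hand-typed copy of those rows.

```r
# First six rows of the component scores, copied from the head(df) output.
first_rows <- data.frame(
  WrittenScaledScore = c(125.5, 133.1, 125.5, 125.5, 130.5, 115.4),
  MBE                = c(133.3, 132.7, 118.2, 140.1, 125.4, 113.5),
  UBE                = c(258.8, 265.8, 243.7, 265.6, 255.9, 228.9)
)

# Difference between the reported total and the sum of its components.
gap <- first_rows$UBE - (first_rows$MBE + first_rows$WrittenScaledScore)

# Compare with a tolerance: the scores are decimals, so exact == can fail
# because of floating-point rounding.
print(max(abs(gap)))
print(all(abs(gap) < 0.05))
```

Applied to the full data, the same check would also locate the record behind the largest Model 5 residual (about 0.30), which suggests at least one score does not match the identity exactly.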
The analysis supports the idea that bar passage is not determined by one single variable. Pre-admission academic measures such as LSAT and UGPA provide some predictive information, but law school performance and preparation variables are more useful for intervention.
Model 1 shows that students with stronger entering credentials tend to score higher on the UBE, although LSAT and UGPA alone explain only about 10% of the variation in UBE scores. Model 2 adds law school performance and explains about 39%. Model 3 focuses on bar preparation and readiness. Model 4 is the most important model for the school because it directly predicts the probability of passing.
The most defensible findings are those that are statistically significant, have practical effect sizes, and represent variables the school can act on before the bar exam.
The school should create an early warning system using LSAT, UGPA, 1L GPA, final GPA trajectory, and probation status. Students with weaker academic indicators should be identified before bar preparation begins.
This is actionable because the school already has these variables before the bar exam. The intervention could include required advising, academic coaching, and structured study planning.
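As a rough sketch of how such an early warning score could be computed, the example below turns the Model 4 coefficients into a predicted pass probability for a student profile. The coefficients are copied from the Model 4 summary; the helper `pass_probability`, the shortened name `Workshops`, and both student profiles are invented for illustration.

```r
# Coefficients copied from the Model 4 summary above (log-odds scale).
coefs <- c(
  intercept         = -45.35126,
  LSAT              = 0.17021,
  UGPA              = 0.26005,
  GPA_Final         = 3.57463,
  BarPrepCompletion = 3.08364,
  Workshops         = 0.06478,
  MPRE              = 0.06574,
  ProbationY        = -0.84753
)

# Hypothetical helper: predicted pass probability for one student profile.
# `profile` is a named numeric vector; ProbationY is 1 for "Y" and 0 for "N".
pass_probability <- function(profile, coefs) {
  linear_predictor <- coefs[["intercept"]] +
    sum(coefs[names(profile)] * profile)
  plogis(linear_predictor)  # inverse logit
}

# Two invented profiles that differ only in bar prep completion.
student_a <- c(LSAT = 155, UGPA = 3.4, GPA_Final = 3.2, BarPrepCompletion = 0.85,
               Workshops = 3, MPRE = 95, ProbationY = 0)
student_b <- replace(student_a, "BarPrepCompletion", 0.40)

p_a <- pass_probability(student_a, coefs)
p_b <- pass_probability(student_b, coefs)
print(round(c(student_a = p_a, student_b = p_b), 3))
```

With the fitted object available, `predict(model4, newdata = ..., type = "response")` gives the same kind of probability directly. Note that Model 4 needs an MPRE score and a bar prep completion value, so a score based on it can only be produced once those are recorded.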
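BarPrepCompletion is positive and significant in the UBE model (Model 3, p < 0.001) and positive but only marginally significant in the pass/fail model (Model 4, p = 0.099). Given this evidence, the school should track commercial bar prep completion weekly during the preparation period. Students falling below expected completion levels should receive immediate outreach.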
This is practical because bar prep completion is a behavior that can be changed before the exam.
MPRE is significant in both Model 3 and Model 4, so the school should use MPRE scores as an early readiness indicator. Students with lower MPRE scores should be encouraged to participate in additional bar-focused workshops, writing practice, and mentor meetings.
This recommendation is useful because MPRE is completed before the bar exam and can identify students needing support.
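This analysis is observational, so the models show association rather than definite causation. Some variables are strongly correlated with each other, such as 1L GPA, final GPA, rank percentile, and course grades, which makes their individual coefficients unstable. Missing values also affect some models because each model uses complete cases for its included variables: Models 3 and 4 are fit on only 194 of the 600 students, mainly because MPRE is missing for 397 of them, and this subset may not be representative of the full cohort.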
Another limitation is that some variables, such as MBE, MEE, MPT, WrittenScaledScore, and UBE, are exam outcomes or components of the final score. They are useful for understanding performance but less useful for designing early interventions.
The most useful model for school decision-making is the logistic regression model predicting PassFail. The results should guide targeted intervention: identify academically at-risk students early, monitor bar preparation completion, and use MPRE and writing-related support as readiness signals. These recommendations are practical, evidence-based, and feasible for the law school to implement before the next bar preparation cycle.