library(dplyr)
library(ggplot2)
library(broom)
library(car)
library(pROC)
library(caret)
library(knitr)
library(kableExtra)

options(scipen = 999)
theme_set(theme_minimal())

Introduction

The TTU Law School provided a de-identified dataset of bar examination outcomes for recent cohorts from 2021 through 2025. The goal of this analysis is to identify actionable predictors that distinguish students who pass the Uniform Bar Examination from those who do not.

The main research question is:

Which pre-admission, law-school performance, and bar-preparation variables are associated with bar exam success?

The primary response variable is PassFail, because the school’s main administrative goal is to increase bar passage rates. I also model UBE as a continuous response because it gives more information than a simple pass/fail classification.

Prior Hypotheses

Before fitting models, I expect the following:

LSAT and UGPA will be positively related to UBE and passing because they measure pre-law-school academic preparation.
1L GPA and final law school GPA will be stronger predictors than LSAT and UGPA because they measure performance during law school.
BarPrepCompletion and MPRE will be positively associated with passing because they reflect bar readiness and preparation behavior.
Probation will be negatively associated with passing because it reflects prior academic difficulty.
Bar component scores such as MBE and WrittenScaledScore will strongly predict UBE, but they are final exam components, so they are less useful for early intervention.

Data and Methods

df <- read.csv(
  "https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/BarData_2025.csv",
  check.names = FALSE
)

str(df)

## 'data.frame':    600 obs. of  28 variables:
##  $ Year                       : int  2021 2021 2021 2021 2021 2021 2021 2021 2021 2021 ...
##  $ PassFail                   : chr  "F" "F" "F" "F" ...
##  $ Age                        : num  29.1 29.6 29 36.2 28.9 30.8 29.1 42.9 28.3 27.1 ...
##  $ LSAT                       : int  152 155 157 156 145 154 149 160 152 150 ...
##  $ UGPA                       : chr  "3.42" "2.82" "3.46" "3.13" ...
##  $ CivPro                     : chr  "B+" "B+" "C" "D+" ...
##  $ LPI                        : chr  "A" "B" "B" "C" ...
##  $ LPII                       : chr  "A" "B" "B" "C+" ...
##  $ GPA_1L                     : num  3.21 2.43 2.62 2.27 2.29 ...
##  $ GPA_Final                  : num  3.29 3.2 2.91 2.77 2.9 2.82 3 3.09 3.21 2.74 ...
##  $ FinalRankPercentile        : num  0.46 0.33 0.08 0.02 0.08 0.05 0.15 0.22 0.34 0.01 ...
##  $ Accommodations             : chr  "N" "Y" "N" "N" ...
##  $ Probation                  : chr  "N" "Y" "N" "Y" ...
##  $ LegalAnalysis_TexasPractice: chr  "Y" "Y" "Y" "Y" ...
##  $ AdvLegalPerfSkills         : chr  "Y" "Y" "Y" "Y" ...
##  $ AdvLegalAnalysis           : chr  "Y" "Y" "Y" "Y" ...
##  $ BarPrepCompany             : chr  "Barbri" "Barbri" "Barbri" "Barbri" ...
##  $ BarPrepCompletion          : num  0.96 0.98 0.48 1 0.77 0.02 0.9 0.76 0.77 0.88 ...
##  $ OptIntoWritingGuide        : chr  "" "" "" "" ...
##  $ #LawSchoolBarPrepWorkshops : int  3 0 3 0 5 1 5 5 1 5 ...
##  $ StudentSuccessInitiative   : chr  "N" "Cochran" "Smith" "Baldwin" ...
##  $ BarPrepMentor              : chr  "N" "N" "N" "N" ...
##  $ MPRE                       : num  103 76 99 81 99 NA 90 97 100 78 ...
##  $ MPT                        : num  3 3 3 2.5 3.5 3 2.5 2.5 3 2.5 ...
##  $ MEE                        : num  2.67 3.17 2.67 3 2.67 2 3.5 3 2.67 3.83 ...
##  $ WrittenScaledScore         : num  126 133 126 126 130 ...
##  $ MBE                        : num  133 133 118 140 125 ...
##  $ UBE                        : num  259 266 244 266 256 ...

head(df)

##   Year PassFail  Age LSAT UGPA CivPro LPI LPII GPA_1L GPA_Final
## 1 2021        F 29.1  152 3.42     B+   A    A  3.206      3.29
## 2 2021        F 29.6  155 2.82     B+   B    B  2.431      3.20
## 3 2021        F 29.0  157 3.46      C   B    B  2.620      2.91
## 4 2021        F 36.2  156 3.13     D+   C   C+  2.275      2.77
## 5 2021        F 28.9  145 3.49      C  C+   C+  2.293      2.90
## 6 2021        F 30.8  154 2.85     B+   F   CR  2.538      2.82
##   FinalRankPercentile Accommodations Probation LegalAnalysis_TexasPractice
## 1                0.46              N         N                           Y
## 2                0.33              Y         Y                           Y
## 3                0.08              N         N                           Y
## 4                0.02              N         Y                           Y
## 5                0.08              N         Y                           Y
## 6                0.05              N         N                           Y
##   AdvLegalPerfSkills AdvLegalAnalysis BarPrepCompany BarPrepCompletion
## 1                  Y                Y         Barbri              0.96
## 2                  Y                Y         Barbri              0.98
## 3                  Y                Y         Barbri              0.48
## 4                  Y                Y         Barbri              1.00
## 5                  Y                Y         Themis              0.77
## 6                  Y                Y         Themis              0.02
##   OptIntoWritingGuide #LawSchoolBarPrepWorkshops StudentSuccessInitiative
## 1                                              3                        N
## 2                                              0                  Cochran
## 3                                              3                    Smith
## 4                                              0                  Baldwin
## 5                                              5                  Baldwin
## 6                                              1                    Rosen
##   BarPrepMentor MPRE MPT  MEE WrittenScaledScore   MBE   UBE
## 1             N  103 3.0 2.67              125.5 133.3 258.8
## 2             N   76 3.0 3.17              133.1 132.7 265.8
## 3             N   99 3.0 2.67              125.5 118.2 243.7
## 4             N   81 2.5 3.00              125.5 140.1 265.6
## 5             N   99 3.5 2.67              130.5 125.4 255.9
## 6             N   NA 3.0 2.00              115.4 113.5 228.9

colnames(df)

##  [1] "Year"                        "PassFail"                   
##  [3] "Age"                         "LSAT"                       
##  [5] "UGPA"                        "CivPro"                     
##  [7] "LPI"                         "LPII"                       
##  [9] "GPA_1L"                      "GPA_Final"                  
## [11] "FinalRankPercentile"         "Accommodations"             
## [13] "Probation"                   "LegalAnalysis_TexasPractice"
## [15] "AdvLegalPerfSkills"          "AdvLegalAnalysis"           
## [17] "BarPrepCompany"              "BarPrepCompletion"          
## [19] "OptIntoWritingGuide"         "#LawSchoolBarPrepWorkshops" 
## [21] "StudentSuccessInitiative"    "BarPrepMentor"              
## [23] "MPRE"                        "MPT"                        
## [25] "MEE"                         "WrittenScaledScore"         
## [27] "MBE"                         "UBE"

Data Cleaning

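The dataset contains numeric variables, letter grades, yes/no indicators, categorical variables, and missing values. UGPA is read as text and converted to numeric, which turns one non-numeric entry into NA and produces the coercion warning below. Letter grades are converted to GPA-style numeric scores; grades that are not in the grade map, such as CR, become NA. Y/N variables are converted to factors. The response variable PassFail is coded with F as the reference level and P as the success outcome.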

df$UGPA <- as.numeric(df$UGPA)

## Warning: NAs introduced by coercion

df$PassFail <- factor(df$PassFail, levels = c("F", "P"))

grade_map <- c(
  "A" = 4.0, "A-" = 3.7,
  "B+" = 3.3, "B" = 3.0, "B-" = 2.7,
  "C+" = 2.3, "C" = 2.0, "C-" = 1.7,
  "D+" = 1.3, "D" = 1.0, "D-" = 0.7,
  "F" = 0
)

df$CivPro_Num <- grade_map[trimws(df$CivPro)]
df$LPI_Num <- grade_map[trimws(df$LPI)]
df$LPII_Num <- grade_map[trimws(df$LPII)]

df$Accommodations <- factor(df$Accommodations)
df$Probation <- factor(df$Probation)
df$LegalAnalysis_TexasPractice <- factor(df$LegalAnalysis_TexasPractice)
df$AdvLegalPerfSkills <- factor(df$AdvLegalPerfSkills)
df$AdvLegalAnalysis <- factor(df$AdvLegalAnalysis)
df$OptIntoWritingGuide <- factor(df$OptIntoWritingGuide)
df$BarPrepCompany <- factor(df$BarPrepCompany)

df$PassBinary <- ifelse(df$PassFail == "P", 1, 0)
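
Because any grade missing from grade_map silently becomes NA, it can help to list the unmapped values before modeling. The following is a minimal sketch on a toy vector; `grades` and `find_unmapped` are hypothetical names introduced for illustration and are not part of the original analysis.

```r
# Grade-to-points map, same values as in the cleaning step above.
grade_map <- c(
  "A" = 4.0, "A-" = 3.7,
  "B+" = 3.3, "B" = 3.0, "B-" = 2.7,
  "C+" = 2.3, "C" = 2.0, "C-" = 1.7,
  "D+" = 1.3, "D" = 1.0, "D-" = 0.7,
  "F" = 0
)

# Hypothetical helper: return the distinct non-empty values that have
# no entry in the map and would therefore be converted to NA.
find_unmapped <- function(x, map) {
  x <- trimws(x)
  setdiff(unique(x[!is.na(x) & x != ""]), names(map))
}

# Toy input (assumed values, including a "CR" grade and stray whitespace).
grades <- c("A", "B+", " C ", "CR", "F", "CR", "")
unmapped <- find_unmapped(grades, grade_map)
print(unmapped)
```

Running the same helper on df$CivPro, df$LPI, and df$LPII would show which grade codes account for the NA counts in the numeric grade columns.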

Missing Value Review

missing_table <- data.frame(
  Variable = names(df),
  Missing = colSums(is.na(df)),
  row.names = NULL  # avoid printing the variable names twice
) %>%
  arrange(desc(Missing))

kable(missing_table, caption = "Missing Values by Variable")

Missing Values by Variable
Variable	Missing
MPRE	397
LPII_Num	56
BarPrepCompletion	26
LPI_Num	9
GPA_1L	8
CivPro_Num	7
UGPA	1
Year	0
PassFail	0
Age	0
LSAT	0
CivPro	0
LPI	0
LPII	0
GPA_Final	0
FinalRankPercentile	0
Accommodations	0
Probation	0
LegalAnalysis_TexasPractice	0
AdvLegalPerfSkills	0
AdvLegalAnalysis	0
BarPrepCompany	0
OptIntoWritingGuide	0
#LawSchoolBarPrepWorkshops	0
StudentSuccessInitiative	0
BarPrepMentor	0
MPT	0
MEE	0
WrittenScaledScore	0
MBE	0
UBE	0
PassBinary	0

Rows with missing values are not deleted from the full dataset immediately. Instead, each model uses complete cases for the variables included in that specific model. This preserves as much usable information as possible.
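
The per-model complete-case approach means each model can end up with a different sample size. A minimal sketch of how those sizes could be checked ahead of time is shown below; the `toy` data frame and the `n_complete` helper are assumptions made for illustration, not part of the original analysis.

```r
# Toy data frame (assumed values) with missingness in two columns.
toy <- data.frame(
  UBE  = c(259, 266, 244, 266, 256, 229),
  LSAT = c(152, 155, 157, 156, 145, 154),
  MPRE = c(103, 76, 99, NA, 99, NA),
  BarPrepCompletion = c(0.96, NA, 0.48, 1.00, 0.77, 0.02)
)

# Hypothetical helper: number of rows with no NA in the given columns.
n_complete <- function(data, vars) {
  sum(complete.cases(data[, vars, drop = FALSE]))
}

n_admission <- n_complete(toy, c("UBE", "LSAT"))
n_prep <- n_complete(toy, c("UBE", "BarPrepCompletion", "MPRE"))
print(c(admission = n_admission, prep = n_prep))
```

Applied to df, the same check would show in advance how many students each model specification retains.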

Descriptive Statistics

df %>%
  summarise(
    N = n(),
    Pass_Rate = mean(PassFail == "P", na.rm = TRUE),
    Mean_UBE = mean(UBE, na.rm = TRUE),
    SD_UBE = sd(UBE, na.rm = TRUE),
    Mean_LSAT = mean(LSAT, na.rm = TRUE),
    Mean_UGPA = mean(UGPA, na.rm = TRUE),
    Mean_GPA_1L = mean(GPA_1L, na.rm = TRUE),
    Mean_GPA_Final = mean(GPA_Final, na.rm = TRUE),
    Mean_BarPrepCompletion = mean(BarPrepCompletion, na.rm = TRUE)
  ) %>%
  kable(digits = 3, caption = "Summary Statistics")

Summary Statistics
N	Pass_Rate	Mean_UBE	SD_UBE	Mean_LSAT	Mean_UGPA	Mean_GPA_1L	Mean_GPA_Final	Mean_BarPrepCompletion
600	0.898	294.709	21.435	155.628	3.478	3.091	3.275	0.865

df %>%
  count(PassFail) %>%
  mutate(Percent = n / sum(n)) %>%
  kable(digits = 3, caption = "Pass/Fail Counts")

Pass/Fail Counts
PassFail	n	Percent
F	61	0.102
P	539	0.898

Exploratory Data Analysis

ggplot(df, aes(x = PassFail, y = UBE)) +
  geom_boxplot() +
  labs(
    title = "UBE Score by Pass/Fail Status",
    x = "Pass/Fail",
    y = "UBE Score"
  )

ggplot(df, aes(x = LSAT, y = UBE, color = PassFail)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    title = "LSAT and UBE Score",
    x = "LSAT",
    y = "UBE Score"
  )

## `geom_smooth()` using formula = 'y ~ x'

ggplot(df, aes(x = GPA_Final, y = UBE, color = PassFail)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    title = "Final Law School GPA and UBE Score",
    x = "Final Law School GPA",
    y = "UBE Score"
  )

## `geom_smooth()` using formula = 'y ~ x'

ggplot(df, aes(x = BarPrepCompletion, y = UBE, color = PassFail)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    title = "Bar Prep Completion and UBE Score",
    x = "Bar Prep Completion Proportion",
    y = "UBE Score"
  )

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 26 rows containing non-finite outside the scale range
## (`stat_smooth()`).

## Warning: Removed 26 rows containing missing values or values outside the scale range
## (`geom_point()`).

Model 1: Admission Predictors of UBE

The first model uses only pre-admission predictors: LSAT and UGPA.

model1 <- lm(UBE ~ LSAT + UGPA, data = df)
summary(model1)

## 
## Call:
## lm(formula = UBE ~ LSAT + UGPA, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -66.221 -13.466   1.022  14.406  54.180 
## 
## Coefficients:
##             Estimate Std. Error t value         Pr(>|t|)    
## (Intercept)  14.1607    36.1339   0.392            0.695    
## LSAT          1.5557     0.2183   7.125 0.00000000000302 ***
## UGPA         11.0483     2.2807   4.844 0.00000162235350 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.41 on 596 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.09799,    Adjusted R-squared:  0.09496 
## F-statistic: 32.37 on 2 and 596 DF,  p-value: 0.00000000000004499

tidy(model1) %>%
  kable(digits = 4, caption = "Model 1 Coefficients")

Model 1 Coefficients
term	estimate	std.error	statistic	p.value
(Intercept)	14.1607	36.1339	0.3919	0.6953
LSAT	1.5557	0.2183	7.1248	0.0000
UGPA	11.0483	2.2807	4.8442	0.0000

glance(model1) %>%
  kable(digits = 4, caption = "Model 1 Fit Statistics")

Model 1 Fit Statistics
r.squared	adj.r.squared	sigma	statistic	p.value	df	logLik	AIC	BIC	deviance	df.residual	nobs
0.098	0.095	20.4068	32.373	0	2	-2654.946	5317.892	5335.473	248197.2	596	599

Model 1 Interpretation

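This model tests whether entering academic credentials predict final UBE score. Both coefficients are positive and statistically significant: each additional LSAT point is associated with about 1.56 more UBE points, and each additional UGPA point with about 11.0 more UBE points, holding the other predictor fixed. The fit is weak, however. With an adjusted R-squared of about 0.095, admission credentials alone explain under 10% of the variation in UBE scores. The model is also limited because it does not include law school performance or bar preparation behavior.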
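
To make the Model 1 coefficients concrete, the sketch below computes a fitted UBE score by hand. The coefficients are copied from the Model 1 output; the student profile (LSAT 155, UGPA 3.4) is a hypothetical example chosen near the sample means.

```r
# Coefficients copied from the Model 1 summary output.
b0 <- 14.1607
b_lsat <- 1.5557
b_ugpa <- 11.0483

# Hypothetical student profile (assumed values, close to the sample means).
lsat <- 155
ugpa <- 3.4

# Fitted value from the linear model: intercept plus each slope times its input.
predicted_ube <- b0 + b_lsat * lsat + b_ugpa * ugpa

# Difference in expected UBE associated with a 5-point LSAT gap, UGPA held fixed.
lsat_gap_effect <- 5 * b_lsat

print(round(c(predicted_ube = predicted_ube, lsat_gap_effect = lsat_gap_effect), 2))
```

With a residual standard error of about 20 points, any single prediction of this kind carries wide uncertainty.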

Model 2: Academic Performance Predictors of UBE

The second model adds 1L and final law school performance variables.

model2 <- lm(
  UBE ~ LSAT + UGPA + GPA_1L + GPA_Final + FinalRankPercentile + CivPro_Num + LPI_Num + LPII_Num,
  data = df
)

summary(model2)

## 
## Call:
## lm(formula = UBE ~ LSAT + UGPA + GPA_1L + GPA_Final + FinalRankPercentile + 
##     CivPro_Num + LPI_Num + LPII_Num, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -55.599 -10.535   0.027  11.175  47.349 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          50.1499    48.6794   1.030 0.303380    
## LSAT                  0.7859     0.2066   3.803 0.000159 ***
## UGPA                  4.2020     2.1366   1.967 0.049739 *  
## GPA_1L               13.3904     4.7221   2.836 0.004747 ** 
## GPA_Final            28.0801    12.0307   2.334 0.019965 *  
## FinalRankPercentile   2.1709    13.5915   0.160 0.873159    
## CivPro_Num           -0.3857     1.5371  -0.251 0.801977    
## LPI_Num              -5.8481     1.5274  -3.829 0.000144 ***
## LPII_Num             -2.4745     1.5765  -1.570 0.117106    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16.88 on 532 degrees of freedom
##   (59 observations deleted due to missingness)
## Multiple R-squared:  0.3878, Adjusted R-squared:  0.3786 
## F-statistic: 42.13 on 8 and 532 DF,  p-value: < 0.00000000000000022

tidy(model2) %>%
  kable(digits = 4, caption = "Model 2 Coefficients")

Model 2 Coefficients
term	estimate	std.error	statistic	p.value
(Intercept)	50.1499	48.6794	1.0302	0.3034
LSAT	0.7859	0.2066	3.8033	0.0002
UGPA	4.2020	2.1366	1.9667	0.0497
GPA_1L	13.3904	4.7221	2.8357	0.0047
GPA_Final	28.0801	12.0307	2.3340	0.0200
FinalRankPercentile	2.1709	13.5915	0.1597	0.8732
CivPro_Num	-0.3857	1.5371	-0.2509	0.8020
LPI_Num	-5.8481	1.5274	-3.8288	0.0001
LPII_Num	-2.4745	1.5765	-1.5696	0.1171

glance(model2) %>%
  kable(digits = 4, caption = "Model 2 Fit Statistics")

Model 2 Fit Statistics
r.squared	adj.r.squared	sigma	statistic	p.value	df	logLik	AIC	BIC	deviance	df.residual	nobs
0.3878	0.3786	16.8835	42.1279	0	8	-2292.157	4604.314	4647.248	151648.6	532	541

Multicollinearity Check

model2_vif <- lm(
  UBE ~ LSAT + UGPA + GPA_1L + GPA_Final + FinalRankPercentile,
  data = df
)

vif(model2_vif)

##                LSAT                UGPA              GPA_1L           GPA_Final 
##            1.113699            1.113260            4.385148           28.848300 
## FinalRankPercentile 
##           29.033362

Model 2 Diagnostics

par(mfrow = c(2, 2))
plot(model2)

par(mfrow = c(1, 1))

Model 2 Interpretation

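Model 2 is more useful than Model 1 because it includes law school performance. Adjusted R-squared rises from about 0.095 to about 0.379, although the two models are fit on different complete-case samples (599 versus 541 observations). GPA_1L (13.39, p = 0.005) and GPA_Final (28.08, p = 0.020) are both positive and significant, which suggests that students who perform better in law school tend to earn higher UBE scores. LSAT remains significant after controlling for law school GPA (0.79, p < 0.001), so pre-admission preparation still contributes independently to bar performance, although its coefficient is about half its Model 1 value.

Course-grade coefficients should be interpreted carefully because the grades are correlated with GPA_1L and GPA_Final. The negative, significant coefficient on LPI_Num (-5.85) is more plausibly an artifact of that overlap than evidence that higher LPI grades lower UBE scores. The VIF check, which was run on the model without the course grades, shows severe collinearity between GPA_Final (28.8) and FinalRankPercentile (29.0), both well above the common rule-of-thumb threshold of 10. GPA_1L is moderate (4.4), and LSAT and UGPA (about 1.1) are not a concern. Dropping either GPA_Final or FinalRankPercentile would likely stabilize the remaining estimates.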
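
For readers unfamiliar with VIF, the sketch below computes it from its definition, 1 / (1 - R-squared), where R-squared comes from regressing one predictor on all the others. It uses simulated data, not the bar exam dataset, and `vif_by_hand` is a hypothetical helper written for illustration; car::vif reports the same quantity for numeric predictors.

```r
set.seed(42)  # simulated data, not the bar exam dataset

n <- 200
x1 <- rnorm(n)                 # stands in for one GPA-like measure
x2 <- x1 + rnorm(n, sd = 0.1)  # nearly a copy of x1 (collinear)
x3 <- rnorm(n)                 # unrelated predictor
sim <- data.frame(x1, x2, x3)

# Hypothetical helper: VIF of one predictor from its auxiliary regression
# on all the other predictors in the data frame.
vif_by_hand <- function(data, target) {
  others <- setdiff(names(data), target)
  aux <- lm(reformulate(others, response = target), data = data)
  1 / (1 - summary(aux)$r.squared)
}

vifs <- sapply(names(sim), vif_by_hand, data = sim)
print(round(vifs, 2))
```

The two near-duplicate predictors get very large VIFs while the unrelated one stays near 1, which is the same pattern that GPA_Final and FinalRankPercentile show relative to LSAT and UGPA.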

Model 3: Preparation and Support Predictors of UBE

The third model focuses on variables that are more directly actionable by the law school.

model3_data <- na.omit(df[, c(
  "UBE",
  "BarPrepCompletion",
  "#LawSchoolBarPrepWorkshops",
  "MPRE"
)])

model3 <- lm(
  UBE ~ BarPrepCompletion + `#LawSchoolBarPrepWorkshops` + MPRE,
  data = model3_data
)
summary(model3)

## 
## Call:
## lm(formula = UBE ~ BarPrepCompletion + `#LawSchoolBarPrepWorkshops` + 
##     MPRE, data = model3_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -59.959 -11.044  -0.199  12.812  42.658 
## 
## Coefficients:
##                               Estimate Std. Error t value             Pr(>|t|)
## (Intercept)                  194.25361   12.66956  15.332 < 0.0000000000000002
## BarPrepCompletion             35.48324   10.54834   3.364              0.00093
## `#LawSchoolBarPrepWorkshops`  -0.59809    0.63609  -0.940              0.34827
## MPRE                           0.68081    0.09857   6.907      0.0000000000725
##                                 
## (Intercept)                  ***
## BarPrepCompletion            ***
## `#LawSchoolBarPrepWorkshops`    
## MPRE                         ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.33 on 190 degrees of freedom
## Multiple R-squared:  0.2638, Adjusted R-squared:  0.2521 
## F-statistic: 22.69 on 3 and 190 DF,  p-value: 0.000000000001337

Model 3 Interpretation

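This model focuses on preparation and support variables. Because MPRE is missing for 397 of the 600 students, the complete-case sample has only 194 observations.

OptIntoWritingGuide was removed because after removing missing values it did not have at least two usable factor levels.

BarPrepCompletion has a positive, significant coefficient (35.48, p < 0.001), so higher completion of bar preparation is associated with higher UBE scores. Because BarPrepCompletion is a proportion, a 10-percentage-point increase in completion corresponds to about 3.5 more UBE points.

MPRE also has a positive, significant coefficient (0.68 UBE points per MPRE point, p < 0.001), so higher MPRE scores are associated with stronger bar exam performance. The number of law school bar prep workshops attended is not a significant predictor (p = 0.348).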

Model 4: Logistic Regression Predicting Pass/Fail

Because the primary institutional goal is to improve passage rates, the main model is a logistic regression predicting PassFail.

model4_data <- df %>%
  select(
    PassFail,
    LSAT,
    UGPA,
    GPA_Final,
    BarPrepCompletion,
    `#LawSchoolBarPrepWorkshops`,
    MPRE,
    Probation
  ) %>%
  na.omit()

model4 <- glm(
  PassFail ~ LSAT + UGPA + GPA_Final +
    BarPrepCompletion + `#LawSchoolBarPrepWorkshops` +
    MPRE + Probation,
  data = model4_data,
  family = binomial
)

summary(model4)

## 
## Call:
## glm(formula = PassFail ~ LSAT + UGPA + GPA_Final + BarPrepCompletion + 
##     `#LawSchoolBarPrepWorkshops` + MPRE + Probation, family = binomial, 
##     data = model4_data)
## 
## Coefficients:
##                               Estimate Std. Error z value Pr(>|z|)   
## (Intercept)                  -45.35126   15.27108  -2.970  0.00298 **
## LSAT                           0.17021    0.08222   2.070  0.03844 * 
## UGPA                           0.26005    0.83588   0.311  0.75572   
## GPA_Final                      3.57463    1.31159   2.725  0.00642 **
## BarPrepCompletion              3.08364    1.86934   1.650  0.09903 . 
## `#LawSchoolBarPrepWorkshops`   0.06478    0.13644   0.475  0.63493   
## MPRE                           0.06574    0.02871   2.289  0.02205 * 
## ProbationY                    -0.84753    0.69908  -1.212  0.22538   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 145.21  on 193  degrees of freedom
## Residual deviance: 100.47  on 186  degrees of freedom
## AIC: 116.47
## 
## Number of Fisher Scoring iterations: 6

odds_table <- tidy(model4, exponentiate = TRUE, conf.int = TRUE)

kable(odds_table, digits = 4, caption = "Model 4 Odds Ratios")

Model 4 Odds Ratios
term	estimate	std.error	statistic	p.value	conf.low	conf.high
(Intercept)	0.0000	15.2711	-2.9697	0.0030	0.0000	0.0000
LSAT	1.1856	0.0822	2.0702	0.0384	1.0148	1.4053
UGPA	1.2970	0.8359	0.3111	0.7557	0.2421	6.6665
GPA_Final	35.6816	1.3116	2.7254	0.0064	3.0764	560.4948
BarPrepCompletion	21.8378	1.8693	1.6496	0.0990	0.5352	896.4957
`#LawSchoolBarPrepWorkshops`	1.0669	0.1364	0.4748	0.6349	0.8224	1.4117
MPRE	1.0679	0.0287	2.2895	0.0221	1.0127	1.1343
ProbationY	0.4285	0.6991	-1.2124	0.2254	0.1092	1.7492
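
The odds ratios in the table are per one-unit change, which is a very large step for GPA_Final (one full grade point) and BarPrepCompletion (0% to 100%). The sketch below rescales them to smaller increments using exp(beta * delta). The coefficients are copied from the Model 4 summary; the increments are assumptions chosen for illustration.

```r
# Log-odds coefficients copied from the Model 4 summary output.
coefs <- c(LSAT = 0.17021, GPA_Final = 3.57463,
           BarPrepCompletion = 3.08364, MPRE = 0.06574)

# Increments chosen for illustration (assumptions, not from the report):
# 1 LSAT point, 0.1 GPA points, 10 percentage points of completion,
# and 10 MPRE points.
increments <- c(LSAT = 1, GPA_Final = 0.1,
                BarPrepCompletion = 0.10, MPRE = 10)

# Odds ratio for a change of `delta` units is exp(beta * delta).
odds_ratios <- exp(coefs * increments)
print(round(odds_ratios, 3))
```

Rescaling changes only the unit of comparison; the significance of each predictor is the same as in the table above.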

Likelihood Ratio Test

model4_null <- glm(
  PassFail ~ 1,
  data = model4_data,
  family = binomial
)

anova(model4_null, model4, test = "Chisq")

## Analysis of Deviance Table
## 
## Model 1: PassFail ~ 1
## Model 2: PassFail ~ LSAT + UGPA + GPA_Final + BarPrepCompletion + `#LawSchoolBarPrepWorkshops` + 
##     MPRE + Probation
##   Resid. Df Resid. Dev Df Deviance     Pr(>Chi)    
## 1       193     145.21                             
## 2       186     100.47  7   44.736 0.0000001539 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Pseudo R-Squared

pseudo_r2 <- 1 - model4$deviance / model4$null.deviance
pseudo_r2

## [1] 0.3080773

Prediction Accuracy

model4_data$predicted_probability <- predict(
  model4,
  newdata = model4_data,
  type = "response"
)

model4_data$predicted_class <- ifelse(
  model4_data$predicted_probability >= 0.50,
  "P",
  "F"
)

model4_data$predicted_class <- factor(
  model4_data$predicted_class,
  levels = c("F", "P")
)

confusionMatrix(
  model4_data$predicted_class,
  model4_data$PassFail,
  positive = "P"
)

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   F   P
##          F   7   3
##          P  17 167
##                                           
##                Accuracy : 0.8969          
##                  95% CI : (0.8453, 0.9359)
##     No Information Rate : 0.8763          
##     P-Value [Acc > NIR] : 0.22610         
##                                           
##                   Kappa : 0.3656          
##                                           
##  Mcnemar's Test P-Value : 0.00365         
##                                           
##             Sensitivity : 0.9824          
##             Specificity : 0.2917          
##          Pos Pred Value : 0.9076          
##          Neg Pred Value : 0.7000          
##              Prevalence : 0.8763          
##          Detection Rate : 0.8608          
##    Detection Prevalence : 0.9485          
##       Balanced Accuracy : 0.6370          
##                                           
##        'Positive' Class : P               
##
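
The headline metrics can be recomputed directly from the four cell counts, which makes it easier to see what each one measures. This is a minimal sketch using the counts printed in the confusion matrix above; the variable names are introduced here for illustration.

```r
# Counts copied from the confusion matrix above
# (rows = prediction, columns = reference).
cm <- matrix(c(7, 17, 3, 167), nrow = 2,
             dimnames = list(Prediction = c("F", "P"),
                             Reference = c("F", "P")))

tp <- cm["P", "P"]  # passed, predicted pass
tn <- cm["F", "F"]  # failed, predicted fail
fp <- cm["P", "F"]  # failed, predicted pass
fn <- cm["F", "P"]  # passed, predicted fail

accuracy    <- (tp + tn) / sum(cm)
sensitivity <- tp / (tp + fn)              # share of passers predicted to pass
specificity <- tn / (tn + fp)              # share of failers predicted to fail
nir         <- max(colSums(cm)) / sum(cm)  # accuracy of always predicting the majority class

print(round(c(accuracy = accuracy, sensitivity = sensitivity,
              specificity = specificity, nir = nir), 4))
```

Specificity is the metric that matters most for intervention, because it measures how many of the students who failed were flagged by the model.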

ROC Curve and AUC

roc_obj <- roc(
  model4_data$PassFail,
  model4_data$predicted_probability,
  levels = c("F", "P")
)

## Setting direction: controls < cases

plot(roc_obj, main = "ROC Curve for Logistic Regression Model")

auc(roc_obj)

## Area under the curve: 0.8824

Logistic Model Interpretation

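The logistic model estimates the probability of passing the bar exam. An odds ratio greater than 1 means the predictor is associated with higher odds of passing; an odds ratio below 1 means lower odds. The model is fit on the 194 students with complete data for all predictors.

The key results are:

LSAT: The odds ratio of 1.19 (p = 0.038) means each additional LSAT point is associated with about 19% higher odds of passing, holding the other predictors fixed.
GPA_Final: The odds ratio of 35.7 per GPA point (p = 0.006) makes final law school performance the most significant predictor of passage in this model, although its confidence interval (3.1 to 560.5) is very wide.
BarPrepCompletion: The coefficient is positive (odds ratio 21.8 for moving from 0% to 100% completion), but it is not significant at the 0.05 level (p = 0.099) and its confidence interval includes 1.
MPRE: The odds ratio of 1.07 per point (p = 0.022) means higher MPRE scores are associated with higher passage odds.
Probation: The odds ratio of 0.43 points toward lower odds of passing for students who experienced academic probation, after controlling for other variables, but the estimate is not significant (p = 0.225).

The classification results need a caveat. Accuracy at the 0.50 cutoff (0.897) is not significantly better than always predicting a pass (no-information rate 0.876, p = 0.226), and specificity is only 0.29: the model flags 7 of the 24 students who failed. The AUC of 0.88 indicates that the predicted probabilities rank students well, so a cutoff above 0.50 would likely flag more at-risk students at the cost of more false alarms. All of these metrics are computed on the same data used to fit the model, so they are likely optimistic.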

Model 5: Bar Component Model

This model uses final exam components to confirm how UBE is constructed. It is not ideal for early intervention because MBE and WrittenScaledScore are known only at the end of the exam process.

model5 <- lm(UBE ~ MBE + WrittenScaledScore, data = df)
summary(model5)

## 
## Call:
## lm(formula = UBE ~ MBE + WrittenScaledScore, data = df)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.039048 -0.001787 -0.000894 -0.000023  0.297714 
## 
## Coefficients:
##                      Estimate Std. Error   t value            Pr(>|t|)    
## (Intercept)        0.01835752 0.00980995     1.871              0.0618 .  
## MBE                0.99993300 0.00007063 14158.090 <0.0000000000000002 ***
## WrittenScaledScore 0.99994874 0.00006626 15091.411 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01737 on 597 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 4.561e+08 on 2 and 597 DF,  p-value: < 0.00000000000000022

tidy(model5) %>%
  kable(digits = 4, caption = "Component Model Coefficients")

Component Model Coefficients
term	estimate	std.error	statistic	p.value
(Intercept)	0.0184	0.0098	1.8713	0.0618
MBE	0.9999	0.0001	14158.0901	0.0000
WrittenScaledScore	0.9999	0.0001	15091.4115	0.0000

Both coefficients are approximately 1 (0.9999), the intercept is close to 0, and R-squared is 1, which confirms that UBE is constructed as:

UBE = MBE + WrittenScaledScore
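
The same identity can be checked row by row without fitting a regression. The sketch below uses the six rows shown in the head(df) output; the `scores` data frame and `max_gap` are names introduced here for illustration.

```r
# First six rows of the three score columns, copied from the head(df) output.
scores <- data.frame(
  WrittenScaledScore = c(125.5, 133.1, 125.5, 125.5, 130.5, 115.4),
  MBE = c(133.3, 132.7, 118.2, 140.1, 125.4, 113.5),
  UBE = c(258.8, 265.8, 243.7, 265.6, 255.9, 228.9)
)

# Largest absolute gap between the reported UBE and the sum of its components.
max_gap <- max(abs(scores$UBE - (scores$MBE + scores$WrittenScaledScore)))
print(max_gap)
```

On the full dataset, the Model 5 residuals (maximum about 0.30) suggest that a few rows differ slightly from the exact sum, so running this check on df would identify which ones.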

Discussion

The analysis supports the idea that bar passage is not determined by one single variable. Pre-admission academic measures such as LSAT and UGPA provide some predictive information, but law school performance and preparation variables are more useful for intervention.

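Model 1 shows that students with stronger entering credentials tend to score higher on the UBE, but it explains only about 10% of the variation in scores. Model 2 adds law school performance and explains about 39%. Model 3 focuses on bar preparation and readiness and explains about 26% on the smaller sample for which MPRE is available. Model 4 is the most important model for the school because it directly predicts the probability of passing; in it, GPA_Final, MPRE, and LSAT are the significant predictors. Because each model is fit on a different complete-case sample, these fit statistics are indicative rather than strictly comparable.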

The most defensible findings are those that are statistically significant, have practical effect sizes, and represent variables the school can act on before the bar exam.

Recommendations

Recommendation 1: Create an Early Bar-Risk Flag

The school should create an early warning system using LSAT, UGPA, 1L GPA, final GPA trajectory, and probation status. Students with weaker academic indicators should be identified before bar preparation begins.

This is actionable because the school already has these variables before the bar exam. The intervention could include required advising, academic coaching, and structured study planning.
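
One possible shape for such a flag is sketched below: fit a logistic model on variables available early, then flag students whose predicted pass probability falls below a cutoff. Everything here is an assumption for illustration: the data are simulated, the predictors are limited to LSAT and GPA_1L, and the 0.80 cutoff is hypothetical and would need validation on held-out cohorts.

```r
set.seed(1)  # simulated students, not the bar exam dataset

n <- 300
students <- data.frame(
  LSAT = round(rnorm(n, mean = 155, sd = 5)),
  GPA_1L = pmin(4, pmax(2, rnorm(n, mean = 3.1, sd = 0.4)))
)

# Assumed data-generating process, for illustration only.
log_odds <- -24 + 0.1 * students$LSAT + 3.5 * students$GPA_1L
students$Pass <- rbinom(n, size = 1, prob = plogis(log_odds))

# Fit the risk model on early-available predictors.
risk_model <- glm(Pass ~ LSAT + GPA_1L, data = students, family = binomial)
students$p_pass <- predict(risk_model, type = "response")

# Hypothetical cutoff: flag anyone with predicted pass probability below 0.80.
risk_cutoff <- 0.80
students$flag <- students$p_pass < risk_cutoff

print(table(flag = students$flag, pass = students$Pass))
```

Using a cutoff well above 0.50 reflects the class imbalance in the real data, where roughly 90% of students pass and a 0.50 cutoff flags very few of those who fail.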

Recommendation 2: Increase Monitoring of Bar Prep Completion

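BarPrepCompletion is positive and significant in the UBE model (Model 3) and positive but only marginally significant in the pass/fail model (Model 4, p = 0.099). On that basis, the school should track commercial bar prep completion weekly during the preparation period. Students falling below expected completion levels should receive immediate outreach.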

This is practical because bar prep completion is a behavior that can be changed before the exam.
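
A weekly check of this kind could be as simple as comparing each student's completion with an expected pace. The sketch below is one hypothetical version: the student records, the 10-week schedule, the linear pacing rule, and the 10-point tolerance are all assumptions, not values from the dataset or from any bar prep provider.

```r
# Hypothetical weekly snapshot: proportion of the course completed so far.
progress <- data.frame(
  student = c("s1", "s2", "s3", "s4"),
  completion = c(0.55, 0.30, 0.48, 0.10)
)

week <- 5          # current week of the prep period (assumed)
total_weeks <- 10  # assumed length of the prep period
tolerance <- 0.10  # assumed allowable shortfall before outreach

# Linear pacing assumption: expected completion grows evenly each week.
expected <- week / total_weeks
progress$behind <- progress$completion < expected - tolerance

# Students who would receive outreach this week.
print(progress[progress$behind, ])
```

The pacing rule and tolerance would need to be set with the bar prep providers, since real courses may not be evenly paced.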

Recommendation 3: Use MPRE and Writing Support as Readiness Indicators

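MPRE is significant in both Model 3 and Model 4, so the school should use MPRE scores as an early readiness indicator. Students with lower MPRE scores should be encouraged to participate in additional bar-focused workshops, writing practice, and mentor meetings. Because MPRE is missing for 397 of the 600 students in this dataset, the school would also need to collect MPRE scores more consistently for this indicator to cover a full cohort.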

This recommendation is useful because MPRE is completed before the bar exam and can identify students needing support.

Limitations

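This analysis is observational, so the models show association rather than definite causation. Some variables are strongly correlated with each other, such as 1L GPA, final GPA, rank percentile, and course grades, which makes individual coefficients unstable. Missing values also affect some models because each model uses complete cases for its included variables; Models 3 and 4 use only 194 of the 600 students because MPRE is missing for 397. The pass/fail model also has few failures to learn from (61 overall, 24 in the Model 4 sample), which widens its confidence intervals.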

Another limitation is that some variables, such as MBE, MEE, MPT, WrittenScaledScore, and UBE, are exam outcomes or components of the final score. They are useful for understanding performance but less useful for designing early interventions.

Conclusion

The most useful model for school decision-making is the logistic regression model predicting PassFail. The results should guide targeted intervention: identify academically at-risk students early, monitor bar preparation completion, and use MPRE and writing-related support as readiness signals. These recommendations are practical, evidence-based, and feasible for the law school to implement before the next bar preparation cycle.

Statistical Analysis of Bar Exam Performance

Sribabu Chipurupalli

2026-05-04