The Uniform Bar Examination (UBE) is a standardized licensing exam used in Texas and many other jurisdictions. It combines multiple‑choice, essay, and performance components into a single 400‑point score. Because bar passage determines whether graduates can enter the profession, understanding what predicts UBE performance is essential for improving student outcomes. In this analysis, I examine which academic and preparation‑related factors best predict bar exam performance. I focus on two outcomes: the continuous UBE score and the Pass/Fail classification. These outcomes capture both overall performance and the practical threshold for licensure. Before modeling, I expected that LSAT, undergraduate GPA, and performance in key 1L courses would be positively associated with bar outcomes. These variables reflect early academic preparation and foundational skills that map closely onto bar exam demands.
df <- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/BarData_2025.csv")
str(df)
## 'data.frame': 600 obs. of 28 variables:
## $ Year : int 2021 2021 2021 2021 2021 2021 2021 2021 2021 2021 ...
## $ PassFail : chr "F" "F" "F" "F" ...
## $ Age : num 29.1 29.6 29 36.2 28.9 30.8 29.1 42.9 28.3 27.1 ...
## $ LSAT : int 152 155 157 156 145 154 149 160 152 150 ...
## $ UGPA : chr "3.42" "2.82" "3.46" "3.13" ...
## $ CivPro : chr "B+" "B+" "C" "D+" ...
## $ LPI : chr "A" "B" "B" "C" ...
## $ LPII : chr "A" "B" "B" "C+" ...
## $ GPA_1L : num 3.21 2.43 2.62 2.27 2.29 ...
## $ GPA_Final : num 3.29 3.2 2.91 2.77 2.9 2.82 3 3.09 3.21 2.74 ...
## $ FinalRankPercentile : num 0.46 0.33 0.08 0.02 0.08 0.05 0.15 0.22 0.34 0.01 ...
## $ Accommodations : chr "N" "Y" "N" "N" ...
## $ Probation : chr "N" "Y" "N" "Y" ...
## $ LegalAnalysis_TexasPractice: chr "Y" "Y" "Y" "Y" ...
## $ AdvLegalPerfSkills : chr "Y" "Y" "Y" "Y" ...
## $ AdvLegalAnalysis : chr "Y" "Y" "Y" "Y" ...
## $ BarPrepCompany : chr "Barbri" "Barbri" "Barbri" "Barbri" ...
## $ BarPrepCompletion : num 0.96 0.98 0.48 1 0.77 0.02 0.9 0.76 0.77 0.88 ...
## $ OptIntoWritingGuide : chr "" "" "" "" ...
## $ X.LawSchoolBarPrepWorkshops: int 3 0 3 0 5 1 5 5 1 5 ...
## $ StudentSuccessInitiative : chr "N" "Cochran" "Smith" "Baldwin" ...
## $ BarPrepMentor : chr "N" "N" "N" "N" ...
## $ MPRE : num 103 76 99 81 99 NA 90 97 100 78 ...
## $ MPT : num 3 3 3 2.5 3.5 3 2.5 2.5 3 2.5 ...
## $ MEE : num 2.67 3.17 2.67 3 2.67 2 3.5 3 2.67 3.83 ...
## $ WrittenScaledScore : num 126 133 126 126 130 ...
## $ MBE : num 133 133 118 140 125 ...
## $ UBE : num 259 266 244 266 256 ...
summary(df)
## Year PassFail Age LSAT
## Min. :2021 Length:600 Min. :22.80 Min. :141.0
## 1st Qu.:2022 Class :character 1st Qu.:26.30 1st Qu.:153.0
## Median :2023 Mode :character Median :27.85 Median :156.0
## Mean :2023 Mean :28.71 Mean :155.6
## 3rd Qu.:2024 3rd Qu.:29.52 3rd Qu.:158.0
## Max. :2025 Max. :65.70 Max. :171.0
##
## UGPA CivPro LPI LPII
## Length:600 Length:600 Length:600 Length:600
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## GPA_1L GPA_Final FinalRankPercentile Accommodations
## Min. :2.200 Min. :2.440 Min. :0.0000 Length:600
## 1st Qu.:2.783 1st Qu.:3.050 1st Qu.:0.2600 Class :character
## Median :3.084 Median :3.263 Median :0.5100 Mode :character
## Mean :3.091 Mean :3.275 Mean :0.5059
## 3rd Qu.:3.383 3rd Qu.:3.500 3rd Qu.:0.7500
## Max. :4.000 Max. :3.990 Max. :0.9900
## NA's :8
## Probation LegalAnalysis_TexasPractice AdvLegalPerfSkills
## Length:600 Length:600 Length:600
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## AdvLegalAnalysis BarPrepCompany BarPrepCompletion OptIntoWritingGuide
## Length:600 Length:600 Min. :0.000 Length:600
## Class :character Class :character 1st Qu.:0.800 Class :character
## Mode :character Mode :character Median :0.900 Mode :character
## Mean :0.865
## 3rd Qu.:0.980
## Max. :1.000
## NA's :26
## X.LawSchoolBarPrepWorkshops StudentSuccessInitiative BarPrepMentor
## Min. :0.000 Length:600 Length:600
## 1st Qu.:0.000 Class :character Class :character
## Median :1.000 Mode :character Mode :character
## Mean :1.588
## 3rd Qu.:3.000
## Max. :5.000
##
## MPRE MPT MEE WrittenScaledScore
## Min. : 76.00 Min. :1.000 Min. :2.00 Min. :111.7
## 1st Qu.: 89.50 1st Qu.:3.000 1st Qu.:3.33 1st Qu.:139.7
## Median : 99.00 Median :3.500 Median :3.83 Median :148.2
## Mean : 99.46 Mean :3.649 Mean :3.74 Mean :147.4
## 3rd Qu.:107.00 3rd Qu.:4.000 3rd Qu.:4.17 3rd Qu.:156.5
## Max. :145.00 Max. :5.500 Max. :5.33 Max. :181.2
## NA's :397
## MBE UBE
## Min. :103.6 Min. :227.3
## 1st Qu.:139.4 1st Qu.:280.4
## Median :147.9 Median :295.3
## Mean :147.3 Mean :294.7
## 3rd Qu.:155.4 3rd Qu.:309.6
## Max. :187.9 Max. :358.7
##
The UBE score is created by combining the scaled written score with the scaled MBE score. The written score is based on a weighted combination of MEE essays and MPT tasks, which are scaled each year to align with the MBE’s 200-point metric. A candidate passes if their UBE score is 270 or higher. The dataset includes variables organized by the student's progression through law school: pre-admission metrics, 1L grades, cumulative GPA and rank, status indicators, bar-aligned electives, commercial bar prep engagement, institutional support programs, and bar exam component scores. This structure helps guide which predictors are appropriate for each model. Letter grades were converted to numeric values, PassFail was recoded as a binary factor, and rows with missing values in the key variables were removed. I used linear regression to predict UBE scores and logistic regression to predict Pass/Fail, so that the continuous and the binary outcome are each modeled with a suitable method.
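As a minimal sketch of the scoring rule described above, the snippet below adds the written and MBE scaled scores and applies the 270 cutoff. The four values are the rounded scores that `str(df)` prints for the first four rows, so the totals are approximate. The names `written`, `mbe`, `ube_total`, and `passed` are illustrative and not part of the dataset.

```r
# Rounded scaled scores for the first four rows, as printed by str(df)
written <- c(126, 133, 126, 126)   # WrittenScaledScore
mbe     <- c(133, 133, 118, 140)   # MBE

# The UBE total is the sum of the two scaled components
ube_total <- written + mbe

# Passing threshold used in this analysis
pass_cutoff <- 270
passed <- ifelse(ube_total >= pass_cutoff, "P", "F")

data.frame(written, mbe, ube_total, passed)
```

All four totals fall below 270, which agrees with the "F" values shown for these rows in the PassFail column.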
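# UGPA was read as character; entries that cannot be parsed become NA
df$UGPA <- as.numeric(df$UGPA)
# Use "F" as the reference level so the logistic model predicts passing
df$PassFail <- factor(df$PassFail, levels = c("F", "P"))
# Map letter grades to a 4.0 scale; labels not listed here become NA
grademap <- c(
  "A" = 4.0, "A-" = 3.7, "B+" = 3.3, "B" = 3.0, "B-" = 2.7,
  "C+" = 2.3, "C" = 2.0, "C-" = 1.7, "D+" = 1.3, "D" = 1.0,
  "D-" = 0.7, "F" = 0
)
# unname() drops the grade labels that indexing would otherwise attach
df$CivPro_Num <- unname(grademap[df$CivPro])
df$LPI_Num <- unname(grademap[df$LPI])
df$LPII_Num <- unname(grademap[df$LPII])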
colSums(is.na(df))
## Year PassFail
## 0 0
## Age LSAT
## 0 0
## UGPA CivPro
## 1 0
## LPI LPII
## 0 0
## GPA_1L GPA_Final
## 8 0
## FinalRankPercentile Accommodations
## 0 0
## Probation LegalAnalysis_TexasPractice
## 0 0
## AdvLegalPerfSkills AdvLegalAnalysis
## 0 0
## BarPrepCompany BarPrepCompletion
## 0 26
## OptIntoWritingGuide X.LawSchoolBarPrepWorkshops
## 0 0
## StudentSuccessInitiative BarPrepMentor
## 0 0
## MPRE MPT
## 397 0
## MEE WrittenScaledScore
## 0 0
## MBE UBE
## 0 0
## CivPro_Num LPI_Num
## 7 9
## LPII_Num
## 56
df_model <- df[!is.na(df$LSAT) & !is.na(df$UGPA) & !is.na(df$UBE), ]
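The raw grade columns report no missing values, yet the converted columns do (7, 9, and 56). This suggests that some grade labels are absent from `grademap` and silently become NA. A hedged sketch for listing such labels follows; `unmapped_labels` is a hypothetical helper, and the toy vector is made up for illustration rather than taken from the data.

```r
grademap <- c(
  "A" = 4.0, "A-" = 3.7, "B+" = 3.3, "B" = 3.0, "B-" = 2.7,
  "C+" = 2.3, "C" = 2.0, "C-" = 1.7, "D+" = 1.3, "D" = 1.0,
  "D-" = 0.7, "F" = 0
)

# Hypothetical helper: tabulate labels that have no entry in the grade map
unmapped_labels <- function(grades, map) {
  table(grades[!(grades %in% names(map))], useNA = "ifany")
}

# Toy input (made up): "" and "P" are not in grademap
toy_grades <- c("A", "B+", "", "P", "C", "")
unmapped_labels(toy_grades, grademap)
```

Running `unmapped_labels(df$LPII, grademap)` on the real column would show which labels account for the 56 missing values, and whether dropping those rows is appropriate.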
Descriptive statistics and plots showed the expected separation between passing and failing students, since Pass/Fail is defined by the UBE score. LSAT, UGPA, and several course grades displayed positive relationships with bar performance. In the linear model, LSAT, UGPA, Civil Procedure, and LPII grades were significant predictors of UBE score, while LPI was not; the model explained about 22% of the variance (adjusted R-squared = 0.2165). In the logistic model, LSAT, UGPA, and Civil Procedure grades were associated with higher odds of passing, but neither LPI nor LPII was significant. Diagnostic plots indicated no major assumption violations.
summary(df_model[, c("UBE","PassFail","LSAT","UGPA","CivPro_Num","LPI_Num","LPII_Num")])
## UBE PassFail LSAT UGPA CivPro_Num
## Min. :227.3 F: 61 Min. :141.0 Min. :2.010 Min. :0.000
## 1st Qu.:280.3 P:538 1st Qu.:153.0 1st Qu.:3.280 1st Qu.:2.300
## Median :295.1 Median :156.0 Median :3.540 Median :3.000
## Mean :294.7 Mean :155.6 Mean :3.478 Mean :2.985
## 3rd Qu.:309.7 3rd Qu.:158.0 3rd Qu.:3.740 3rd Qu.:3.300
## Max. :358.7 Max. :171.0 Max. :4.140 Max. :4.000
## NA's :7
## LPI_Num LPII_Num
## Min. :0.000 Min. :1.000
## 1st Qu.:2.300 1st Qu.:2.300
## Median :3.000 Median :3.000
## Mean :2.957 Mean :3.007
## 3rd Qu.:3.300 3rd Qu.:3.300
## Max. :4.000 Max. :4.000
## NA's :9 NA's :56
# UBE by Pass/Fail
boxplot(UBE ~ PassFail, data = df_model,
main = "UBE Score by Pass/Fail",
xlab = "Pass/Fail", ylab = "UBE Score",
col = c("red", "green"))
# Histogram of UBE
hist(df_model$UBE, breaks = 30, main = "Distribution of UBE Scores",
xlab = "UBE Score")
# LSAT vs UBE
plot(df_model$LSAT, df_model$UBE,
xlab = "LSAT", ylab = "UBE",
main = "LSAT vs UBE")
abline(lm(UBE ~ LSAT, data = df_model), col = "blue")
Model 1: Linear Regression for UBE
model1 <- lm(UBE ~ LSAT + UGPA + CivPro_Num + LPI_Num + LPII_Num, data=df_model)
summary(model1)
##
## Call:
## lm(formula = UBE ~ LSAT + UGPA + CivPro_Num + LPI_Num + LPII_Num,
## data = df_model)
##
## Residuals:
## Min 1Q Median 3Q Max
## -65.280 -12.279 1.659 12.895 54.470
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 53.6273 36.8556 1.455 0.14624
## LSAT 1.1218 0.2242 5.004 7.63e-07 ***
## UGPA 7.5707 2.3606 3.207 0.00142 **
## CivPro_Num 9.2092 1.3328 6.910 1.38e-11 ***
## LPI_Num -1.1107 1.6351 -0.679 0.49726
## LPII_Num 5.5260 1.5623 3.537 0.00044 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 18.96 on 535 degrees of freedom
## (58 observations deleted due to missingness)
## Multiple R-squared: 0.2238, Adjusted R-squared: 0.2165
## F-statistic: 30.85 on 5 and 535 DF, p-value: < 2.2e-16
par(mfrow=c(2,2))
plot(model1)
par(mfrow=c(1,1))
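To make the coefficient table concrete, the sketch below plugs the printed Model 1 estimates into the regression equation for a hypothetical student. The student's values are invented for illustration; with the fitted object in memory, `predict(model1, newdata = ...)` would do the same job.

```r
# Coefficients copied from the summary(model1) output above
coefs <- c(
  "(Intercept)" = 53.6273, LSAT = 1.1218, UGPA = 7.5707,
  CivPro_Num = 9.2092, LPI_Num = -1.1107, LPII_Num = 5.5260
)

# Hypothetical student: LSAT 156, UGPA 3.5, and a B (3.0) in each course
student <- c("(Intercept)" = 1, LSAT = 156, UGPA = 3.5,
             CivPro_Num = 3.0, LPI_Num = 3.0, LPII_Num = 3.0)

# Predicted UBE is the sum of coefficient * value
predicted_ube <- sum(coefs * student[names(coefs)])
predicted_ube

# Moving from B to B+ in CivPro (0.3 grade points) shifts the prediction
# by 0.3 times the CivPro_Num coefficient, holding the rest fixed
0.3 * coefs[["CivPro_Num"]]
```

The residual standard error of 18.96 is a reminder that any single prediction of this kind carries wide uncertainty.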
Model 2: Logistic Regression for PassFail
model2 <- glm(PassFail ~ LSAT + UGPA + CivPro_Num + LPI_Num + LPII_Num,
data=df_model, family=binomial)
summary(model2)
##
## Call:
## glm(formula = PassFail ~ LSAT + UGPA + CivPro_Num + LPI_Num +
## LPII_Num, family = binomial, data = df_model)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -25.53222 7.59696 -3.361 0.000777 ***
## LSAT 0.13591 0.04469 3.041 0.002358 **
## UGPA 0.93384 0.45865 2.036 0.041742 *
## CivPro_Num 1.10112 0.24188 4.552 5.31e-06 ***
## LPI_Num 0.14823 0.29275 0.506 0.612620
## LPII_Num 0.02427 0.28816 0.084 0.932887
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 337.92 on 540 degrees of freedom
## Residual deviance: 290.95 on 535 degrees of freedom
## (58 observations deleted due to missingness)
## AIC: 302.95
##
## Number of Fisher Scoring iterations: 6
anova(model2, test="Chisq")
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: PassFail
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 540 337.92
## LSAT 1 12.6672 539 325.25 0.0003721 ***
## UGPA 1 6.7939 538 318.46 0.0091469 **
## CivPro_Num 1 27.1068 537 291.35 1.925e-07 ***
## LPI_Num 1 0.3894 536 290.96 0.5325998
## LPII_Num 1 0.0071 535 290.95 0.9329294
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
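Logistic coefficients are on the log-odds scale, so exponentiating them gives odds ratios that are easier to read. The sketch below does this with the estimates printed by `summary(model2)`; with the fitted object in memory, `exp(coef(model2))` is the equivalent call.

```r
# Log-odds estimates copied from the summary(model2) output
log_odds <- c(LSAT = 0.13591, UGPA = 0.93384, CivPro_Num = 1.10112,
              LPI_Num = 0.14823, LPII_Num = 0.02427)

# Odds ratio: multiplicative change in the odds of passing
# for a one-unit increase in the predictor, other predictors held fixed
odds_ratios <- exp(log_odds)
round(odds_ratios, 2)
```

An odds ratio close to 1, as for LPII_Num, indicates little change in the odds. Only LSAT, UGPA, and CivPro_Num were statistically significant in this model.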
Model 3: A More Targeted Model for the Written Score
model3 <- lm(WrittenScaledScore ~ LSAT + UGPA + CivPro_Num + LPI_Num + LPII_Num,
data=df_model)
summary(model3)
##
## Call:
## lm(formula = WrittenScaledScore ~ LSAT + UGPA + CivPro_Num +
## LPI_Num + LPII_Num, data = df_model)
##
## Residuals:
## Min 1Q Median 3Q Max
## -32.757 -7.400 0.586 7.420 31.263
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 41.9941 22.7889 1.843 0.065920 .
## LSAT 0.4713 0.1386 3.400 0.000723 ***
## UGPA 3.7216 1.4596 2.550 0.011057 *
## CivPro_Num 4.5566 0.8241 5.529 5.04e-08 ***
## LPI_Num -1.0667 1.0111 -1.055 0.291905
## LPII_Num 2.9905 0.9660 3.096 0.002067 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.72 on 535 degrees of freedom
## (58 observations deleted due to missingness)
## Multiple R-squared: 0.1459, Adjusted R-squared: 0.1379
## F-statistic: 18.28 on 5 and 535 DF, p-value: < 2.2e-16
par(mfrow=c(2,2))
plot(model3)
par(mfrow=c(1,1))
Diagnostics and Assumptions
# Keep only rows with complete data for every modeled variable
df_model <- df[complete.cases(df[, c("PassFail","LSAT","UGPA",
"CivPro_Num","LPI_Num","LPII_Num",
"UBE","WrittenScaledScore")]), ]
# Refit on the complete cases so predictions align row for row with df_model
model2 <- glm(PassFail ~ LSAT + UGPA + CivPro_Num + LPI_Num + LPII_Num,
data = df_model, family = binomial)
# Classify as passing when the predicted probability exceeds 0.5
df_model$pred_prob <- predict(model2, type="response")
df_model$pred_class <- ifelse(df_model$pred_prob > 0.5, "P", "F")
# In-sample classification accuracy
mean(df_model$pred_class == df_model$PassFail)
## [1] 0.9001848
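Accuracy is easier to judge against the share of the majority class. As a rough check, the sketch below compares the reported accuracy with the pass rate from the earlier summary table (538 passes, 61 fails). Those counts come from the 599-row data before complete-case filtering, so the comparison is approximate. `confusion_table` is a hypothetical helper, demonstrated here on made-up labels.

```r
# Counts from the PassFail column of the earlier summary output
n_pass <- 538
n_fail <- 61

# Accuracy obtained by always predicting "P"
baseline_accuracy <- n_pass / (n_pass + n_fail)
model_accuracy <- 0.9001848   # value reported above

c(baseline = baseline_accuracy, model = model_accuracy)

# Hypothetical helper: cross-tabulate actual vs. predicted classes
confusion_table <- function(actual, predicted) {
  table(Actual    = factor(actual,    levels = c("F", "P")),
        Predicted = factor(predicted, levels = c("F", "P")))
}

# Toy labels (made up) to show the shape of the output
confusion_table(actual    = c("P", "P", "F", "F", "P"),
                predicted = c("P", "P", "P", "F", "P"))
```

On the real data, `confusion_table(df_model$PassFail, df_model$pred_class)` would show how many failing students the model actually identifies, which accuracy alone does not reveal when about nine in ten students pass.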
Overall, the results supported the initial hypotheses. Admissions metrics and early doctrinal performance were consistent predictors of bar outcomes, suggesting that foundational academic preparation continues to influence performance at graduation. Writing-focused coursework played a more limited role: LPII predicted UBE and written scores in the linear models, but neither LPI nor LPII predicted Pass/Fail. The models explain a modest share of the variation in scores, and they did not include the bar prep engagement variables available in the dataset; the data also lack information on study habits and personal circumstances. The models therefore identify associations rather than causal effects. Based on the findings, the school could strengthen bar outcomes by providing early academic support for students entering with lower LSAT scores and UGPAs and by enhancing writing-focused practice opportunities. These recommendations align with the predictors that showed the strongest effects in the models.