The Texas Tech University School of Law wants to identify the characteristics that distinguish students who pass the Uniform Bar Examination (UBE) from those who fail. The institution’s main objective is not only to identify the characteristics of passing students but also to find those it can act on within the existing admission policy and program capacity. For this reason, UBE score is treated as the primary response variable and PassFail as a secondary one.
UBE score is the most appropriate primary measure because it is continuous and therefore carries more information than the dichotomous pass/fail outcome. For instance, a student who scores 269 and one who scores 230 both fail, but the two failures call for very different interventions. Modeling UBE lets the school estimate how far each factor moves a student toward or away from the 270 cutoff.
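To make the 269 versus 230 comparison concrete, the short sketch below computes each score's distance to the cutoff. The vector names are illustrative and are not part of the dataset.

```r
# Two failing scores from the example above; names are illustrative only
cutoff <- 270
scores <- c(near_miss = 269, far_miss = 230)

# Points still needed to reach the cutoff
gap <- cutoff - scores

# Both candidates fail, so the binary outcome cannot tell them apart
passed <- scores >= cutoff

print(gap)
print(passed)
```

The binary outcome is identical for both candidates, while the gap (1 point versus 40 points) is what would guide the type of intervention.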
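Before model estimation, I anticipated that LSAT, UGPA, final law school GPA, bar prep completion, bar-aligned electives, workshop attendance, and support-program participation would each be positively associated with UBE score.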
The data file includes 600 candidate records from bar administrations in 2021 through 2025. UBE is an aggregate score out of 400 points and equals WrittenScaledScore + MBE. A UBE score of at least 270 is required to pass in Texas. The bar exam components (MBE, MEE, MPT, WrittenScaledScore) and UBE itself are end points or intermediate outcomes, so they cannot be used as predictors in our models.
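Because UBE is defined as the sum of two components, it can be worth confirming the identity before excluding those components as predictors. The following is a minimal sketch on made-up toy values (not the bar data); the 0.05 tolerance is an assumption that scores are reported to one decimal place.

```r
# Toy records (hypothetical values); the third row is deliberately inconsistent
toy <- data.frame(
  WrittenScaledScore = c(126.0, 133.2, 140.5),
  MBE                = c(133.0, 132.8, 151.3),
  UBE                = c(259.0, 266.0, 290.0)
)

# Flag rows where the reported UBE differs from the component sum
component_sum <- toy$WrittenScaledScore + toy$MBE
mismatch <- abs(toy$UBE - component_sum) > 0.05  # tolerance is an assumption

print(which(mismatch))
```

On the real data frame the same comparison could be run on df$UBE, df$WrittenScaledScore, and df$MBE.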
df <- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/BarData_2025.csv",
stringsAsFactors = FALSE)
str(df)
## 'data.frame': 600 obs. of 28 variables:
## $ Year : int 2021 2021 2021 2021 2021 2021 2021 2021 2021 2021 ...
## $ PassFail : chr "F" "F" "F" "F" ...
## $ Age : num 29.1 29.6 29 36.2 28.9 30.8 29.1 42.9 28.3 27.1 ...
## $ LSAT : int 152 155 157 156 145 154 149 160 152 150 ...
## $ UGPA : chr "3.42" "2.82" "3.46" "3.13" ...
## $ CivPro : chr "B+" "B+" "C" "D+" ...
## $ LPI : chr "A" "B" "B" "C" ...
## $ LPII : chr "A" "B" "B" "C+" ...
## $ GPA_1L : num 3.21 2.43 2.62 2.27 2.29 ...
## $ GPA_Final : num 3.29 3.2 2.91 2.77 2.9 2.82 3 3.09 3.21 2.74 ...
## $ FinalRankPercentile : num 0.46 0.33 0.08 0.02 0.08 0.05 0.15 0.22 0.34 0.01 ...
## $ Accommodations : chr "N" "Y" "N" "N" ...
## $ Probation : chr "N" "Y" "N" "Y" ...
## $ LegalAnalysis_TexasPractice: chr "Y" "Y" "Y" "Y" ...
## $ AdvLegalPerfSkills : chr "Y" "Y" "Y" "Y" ...
## $ AdvLegalAnalysis : chr "Y" "Y" "Y" "Y" ...
## $ BarPrepCompany : chr "Barbri" "Barbri" "Barbri" "Barbri" ...
## $ BarPrepCompletion : num 0.96 0.98 0.48 1 0.77 0.02 0.9 0.76 0.77 0.88 ...
## $ OptIntoWritingGuide : chr "" "" "" "" ...
## $ X.LawSchoolBarPrepWorkshops: int 3 0 3 0 5 1 5 5 1 5 ...
## $ StudentSuccessInitiative : chr "N" "Cochran" "Smith" "Baldwin" ...
## $ BarPrepMentor : chr "N" "N" "N" "N" ...
## $ MPRE : num 103 76 99 81 99 NA 90 97 100 78 ...
## $ MPT : num 3 3 3 2.5 3.5 3 2.5 2.5 3 2.5 ...
## $ MEE : num 2.67 3.17 2.67 3 2.67 2 3.5 3 2.67 3.83 ...
## $ WrittenScaledScore : num 126 133 126 126 130 ...
## $ MBE : num 133 133 118 140 125 ...
## $ UBE : num 259 266 244 266 256 ...
Several variables needed processing before model fitting:
df$UGPA <- as.numeric(df$UGPA)
df$PassFail <- factor(df$PassFail, levels = c("F", "P"))
df$Pass <- ifelse(df$PassFail == "P", 1, 0)
df$Year <- factor(df$Year)
# Map letter grades to grade points; entries not in the map (such as blanks) become NA
grademap <- c("A" = 4.0, "A-" = 3.7, "B+" = 3.3, "B" = 3.0, "B-" = 2.7,
              "C+" = 2.3, "C" = 2.0, "C-" = 1.7, "D+" = 1.3,
              "D" = 1.0, "D-" = 0.7, "F" = 0)
df$CivPro_Num <- grademap[df$CivPro]
df$LPI_Num <- grademap[df$LPI]
df$LPII_Num <- grademap[df$LPII]
yn_vars <- c("Accommodations", "Probation", "LegalAnalysis_TexasPractice",
             "AdvLegalPerfSkills", "AdvLegalAnalysis")
for (v in yn_vars) {
  df[[paste0(v, "_Y")]] <- ifelse(df[[v]] == "Y", 1, 0)
}
df$SSI_Y <- ifelse(df$StudentSuccessInitiative != "N", 1, 0)
df$Mentor_Y <- ifelse(df$BarPrepMentor != "N", 1, 0)
df$WritingGuide_Y <- ifelse(df$OptIntoWritingGuide == "Y" & !is.na(df$OptIntoWritingGuide), 1, 0)
names(df)[grepl("LawSchoolBarPrepWorkshops", names(df))] <- "LawSchoolBarPrepWorkshops"
missing_summary <- data.frame(
variable = names(df),
missing = sapply(df, function(x) sum(is.na(x)))
)
missing_summary[missing_summary$missing > 0, ]
## variable missing
## UGPA UGPA 1
## GPA_1L GPA_1L 8
## BarPrepCompletion BarPrepCompletion 26
## MPRE MPRE 397
## CivPro_Num CivPro_Num 7
## LPI_Num LPI_Num 9
## LPII_Num LPII_Num 56
table(df$PassFail)
##
## F P
## 61 539
prop.table(table(df$PassFail))
##
## F P
## 0.1016667 0.8983333
aggregate(UBE ~ PassFail, data = df,
FUN = function(x) c(mean = mean(x), sd = sd(x), min = min(x), max = max(x), n = length(x)))
## PassFail UBE.mean UBE.sd UBE.min UBE.max UBE.n
## 1 F 255.36721 10.86143 227.30000 269.40000 61.00000
## 2 P 299.16160 17.40792 269.50000 358.70000 539.00000
aggregate(cbind(UBE, LSAT, UGPA, GPA_Final, BarPrepCompletion, LawSchoolBarPrepWorkshops) ~ PassFail,
data = df, FUN = mean, na.rm = TRUE)
## PassFail UBE LSAT UGPA GPA_Final BarPrepCompletion
## 1 F 255.6586 153.7931 3.381379 2.945259 0.7486500
## 2 P 299.1412 155.8233 3.494097 3.316210 0.8778724
## LawSchoolBarPrepWorkshops
## 1 1.775862
## 2 1.584466
In this dataset, 539 of 600 candidates passed, an overall pass rate of about 89.8%. The mean UBE score was about 255.4 for unsuccessful candidates and about 299.2 for successful candidates. On average, unsuccessful candidates had lower bar prep completion, final law school GPA, LSAT, and undergraduate GPA than successful candidates, although they attended slightly more law school workshops (1.78 versus 1.58). A gap in UBE scores between the two groups is expected because PassFail is derived from UBE. The highest failing score was 269.4 and the lowest passing score was 269.5, which suggests that the 270 cutoff is applied after rounding.
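The two summary tables above report slightly different mean UBE scores for failing candidates (255.37 in the first, 255.66 in the second). A likely explanation is that aggregate() with a formula drops every row that has a missing value in any listed variable (its default na.action is na.omit), so the multi-variable table is computed on complete cases only. The sketch below reproduces the effect on made-up toy values.

```r
# Toy data (hypothetical values) with one missing UGPA among the failing group
toy <- data.frame(
  PassFail = c("F", "F", "P", "P"),
  UBE      = c(250, 260, 290, 300),
  UGPA     = c(3.0, NA, 3.5, 3.6)
)

# Single-variable summary: uses every row with a UBE value
per_var <- aggregate(UBE ~ PassFail, data = toy, FUN = mean)

# Multi-variable summary: the row with missing UGPA is dropped for UBE as well
complete_case <- aggregate(cbind(UBE, UGPA) ~ PassFail, data = toy, FUN = mean)

print(per_var)
print(complete_case)
```

If per-variable means on all available rows are wanted, one option is to summarize each variable separately, for example with tapply(x, group, mean, na.rm = TRUE).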
boxplot(UBE ~ PassFail, data = df,
main = "UBE Scores by Pass/Fail Outcome",
xlab = "Pass/Fail", ylab = "UBE Score",
col = c("tomato", "darkseagreen3"))
abline(h = 270, lty = 2)
plot(df$GPA_Final, df$UBE,
main = "UBE Score by Final Law School GPA",
xlab = "Final Law School GPA", ylab = "UBE Score")
abline(lm(UBE ~ GPA_Final, data = df), lwd = 2)
abline(h = 270, lty = 2)
plot(df$BarPrepCompletion, df$UBE,
main = "UBE Score by Bar Prep Completion",
xlab = "Bar Prep Completion Proportion", ylab = "UBE Score")
abline(lm(UBE ~ BarPrepCompletion, data = df), lwd = 2)
abline(h = 270, lty = 2)
cor_vars <- c("LSAT", "UGPA", "GPA_1L", "GPA_Final", "FinalRankPercentile",
"BarPrepCompletion", "LawSchoolBarPrepWorkshops", "UBE")
round(cor(df[cor_vars], use = "pairwise.complete.obs"), 3)
## LSAT UGPA GPA_1L GPA_Final FinalRankPercentile
## LSAT 1.000 -0.162 0.209 0.140 0.152
## UGPA -0.162 1.000 0.178 0.214 0.227
## GPA_1L 0.209 0.178 1.000 0.870 0.871
## GPA_Final 0.140 0.214 0.870 1.000 0.982
## FinalRankPercentile 0.152 0.227 0.871 0.982 1.000
## BarPrepCompletion -0.097 0.138 0.177 0.268 0.268
## LawSchoolBarPrepWorkshops -0.127 0.033 -0.180 -0.081 -0.089
## UBE 0.250 0.145 0.525 0.574 0.573
## BarPrepCompletion LawSchoolBarPrepWorkshops UBE
## LSAT -0.097 -0.127 0.250
## UGPA 0.138 0.033 0.145
## GPA_1L 0.177 -0.180 0.525
## GPA_Final 0.268 -0.081 0.574
## FinalRankPercentile 0.268 -0.089 0.573
## BarPrepCompletion 1.000 0.070 0.323
## LawSchoolBarPrepWorkshops 0.070 1.000 -0.035
## UBE 0.323 -0.035 1.000
The exploratory correlations guided the modeling choices. GPA_Final and FinalRankPercentile each correlate with UBE at about 0.57, but they also correlate with each other at 0.982, so they measure essentially the same thing from different perspectives. To avoid multicollinearity, I used GPA_Final and left out FinalRankPercentile. BarPrepCompletion is moderately correlated with UBE (r = 0.323), whereas LawSchoolBarPrepWorkshops is essentially uncorrelated with UBE (r = -0.035).
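The size of the collinearity problem can be illustrated from the reported correlation alone. In a model with only two predictors, each predictor's variance inflation factor equals 1 / (1 - r^2), and adding further predictors cannot lower it. A small sketch using the 0.982 correlation between GPA_Final and FinalRankPercentile:

```r
# Pairwise correlation between GPA_Final and FinalRankPercentile (from the matrix above)
r <- 0.982

# VIF each variable would have in a two-predictor model containing both
vif_pair <- 1 / (1 - r^2)

print(round(vif_pair, 2))  # 28.03
```

A VIF near 28 would inflate each coefficient's standard error by roughly a factor of five (the square root of the VIF), which supports keeping only one of the two variables.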
I estimated several models rather than one oversized model. This keeps each model connected to a specific research question.
The first model examines whether admission variables predict UBE scores. It is useful as a baseline but cannot drive recommendations, because the school needs recommendations that do not change its admission requirements.
model1 <- lm(UBE ~ LSAT + UGPA + Age + Year, data = df)
summary(model1)
##
## Call:
## lm(formula = UBE ~ LSAT + UGPA + Age + Year, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -68.658 -12.783 0.793 14.457 53.985
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 45.4750 38.8812 1.170 0.242638
## LSAT 1.4560 0.2258 6.447 0.000000000237 ***
## UGPA 9.2586 2.4478 3.782 0.000171 ***
## Age -0.2771 0.2265 -1.223 0.221725
## Year2022 -5.4000 2.7310 -1.977 0.048476 *
## Year2023 -1.1389 2.6669 -0.427 0.669506
## Year2024 -3.5755 2.6970 -1.326 0.185439
## Year2025 1.7521 2.8046 0.625 0.532398
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 20.31 on 591 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.1142, Adjusted R-squared: 0.1037
## F-statistic: 10.88 on 7 and 591 DF, p-value: 0.0000000000005944
Model 1 interpretation. LSAT and UGPA were both significant positive predictors of UBE score. Holding the other variables constant, one additional LSAT point was associated with about 1.46 more UBE points on average, and a one-point increase in UGPA was associated with about 9.26 more UBE points. Age was not significant, and among the year indicators only 2022 differed from the 2021 reference at the 5% level. The full model, including Age and Year, explained about 11.4% of the variation in UBE score.
In Model 2, I introduced law school academic achievement. GPA_Final was the key cumulative measure of academic performance, with GPA_1L and probation status as additional academic measures. Year served as the cohort control.
model2 <- lm(UBE ~ GPA_Final + GPA_1L + Probation_Y + Year, data = df)
summary(model2)
##
## Call:
## lm(formula = UBE ~ GPA_Final + GPA_1L + Probation_Y + Year, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -58.947 -10.495 0.295 10.923 49.639
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 162.9314 8.3230 19.576 < 0.0000000000000002 ***
## GPA_Final 35.6814 4.6948 7.600 0.000000000000118 ***
## GPA_1L 3.8663 3.7666 1.026 0.3051
## Probation_Y 0.6297 2.9577 0.213 0.8315
## Year2022 -4.9692 2.3419 -2.122 0.0343 *
## Year2023 3.8682 2.2573 1.714 0.0871 .
## Year2024 4.4415 2.2411 1.982 0.0480 *
## Year2025 10.3195 2.2827 4.521 0.000007460377877 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16.94 on 584 degrees of freedom
## (8 observations deleted due to missingness)
## Multiple R-squared: 0.385, Adjusted R-squared: 0.3776
## F-statistic: 52.22 on 7 and 584 DF, p-value: < 0.00000000000000022
Model 2 interpretation. Final GPA was the most important academic predictor of UBE score. All else being equal, a one-point rise in GPA_Final was associated with an average rise of about 35.7 UBE points. GPA_1L was not statistically significant once GPA_Final was in the model, so final GPA captured most of the predictive information in GPA_1L (the two correlate at 0.870). Probation was also not significant after accounting for GPA, which is plausible because probation status itself depends on academic performance. This model explained approximately 38.5% of the variation in UBE scores.
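Whether an earlier GPA measure adds anything beyond a later, highly correlated one can be checked with a nested-model F test. The sketch below uses simulated data, not the bar data; the variable names, coefficients, and noise levels are all assumptions chosen so that the two GPA measures are strongly correlated and only the final GPA drives the score.

```r
set.seed(1)
n <- 500

# Simulated (hypothetical) GPAs: final GPA is mostly determined by first-year GPA
gpa_1l    <- rnorm(n, mean = 3.0, sd = 0.4)
gpa_final <- 0.3 + 0.9 * gpa_1l + rnorm(n, sd = 0.15)

# Simulated score that depends on final GPA only
ube <- 170 + 40 * gpa_final + rnorm(n, sd = 17)
sim <- data.frame(ube, gpa_1l, gpa_final)

# Nested comparison: does adding gpa_1l improve on gpa_final alone?
reduced <- lm(ube ~ gpa_final, data = sim)
full    <- lm(ube ~ gpa_final + gpa_1l, data = sim)
cmp     <- anova(reduced, full)

pair_cor <- cor(sim$gpa_1l, sim$gpa_final)
print(cmp)
print(round(pair_cor, 3))
```

On the real data the analogous comparison would be between a model with GPA_Final and one with both GPA_Final and GPA_1L, fitted to the same rows so the models stay nested.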
Model 3 adds bar preparation, bar-related coursework, and support program variables while controlling for final law school GPA. As before, MBE, MEE, MPT, and WrittenScaledScore are excluded because they are components of the UBE score itself.
model3 <- lm(UBE ~ GPA_Final + BarPrepCompletion + LawSchoolBarPrepWorkshops +
LegalAnalysis_TexasPractice_Y + AdvLegalPerfSkills_Y + AdvLegalAnalysis_Y +
WritingGuide_Y + SSI_Y + Mentor_Y + Probation_Y + Year,
data = df)
summary(model3)
##
## Call:
## lm(formula = UBE ~ GPA_Final + BarPrepCompletion + LawSchoolBarPrepWorkshops +
## LegalAnalysis_TexasPractice_Y + AdvLegalPerfSkills_Y + AdvLegalAnalysis_Y +
## WritingGuide_Y + SSI_Y + Mentor_Y + Probation_Y + Year, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -55.81 -10.88 0.01 10.79 53.55
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 118.02494 12.66024 9.322 < 0.0000000000000002
## GPA_Final 41.78432 3.22045 12.975 < 0.0000000000000002
## BarPrepCompletion 26.51375 4.80576 5.517 0.0000000528
## LawSchoolBarPrepWorkshops -0.04181 0.40464 -0.103 0.917747
## LegalAnalysis_TexasPractice_Y 2.38792 2.87231 0.831 0.406125
## AdvLegalPerfSkills_Y 5.44006 2.53610 2.145 0.032379
## AdvLegalAnalysis_Y 4.25267 1.85770 2.289 0.022439
## WritingGuide_Y -3.78314 2.12825 -1.778 0.076016
## SSI_Y 5.05622 2.31211 2.187 0.029168
## Mentor_Y 0.26997 1.66812 0.162 0.871488
## Probation_Y -1.67731 2.82516 -0.594 0.552951
## Year2022 -4.24671 2.29186 -1.853 0.064416
## Year2023 15.77974 4.12829 3.822 0.000147
## Year2024 16.26014 4.15730 3.911 0.000103
## Year2025 23.06337 4.26667 5.405 0.0000000959
##
## (Intercept) ***
## GPA_Final ***
## BarPrepCompletion ***
## LawSchoolBarPrepWorkshops
## LegalAnalysis_TexasPractice_Y
## AdvLegalPerfSkills_Y *
## AdvLegalAnalysis_Y *
## WritingGuide_Y .
## SSI_Y *
## Mentor_Y
## Probation_Y
## Year2022 .
## Year2023 ***
## Year2024 ***
## Year2025 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16.36 on 559 degrees of freedom
## (26 observations deleted due to missingness)
## Multiple R-squared: 0.4194, Adjusted R-squared: 0.4048
## F-statistic: 28.84 on 14 and 559 DF, p-value: < 0.00000000000000022
Model 3 interpretation. The most important result is that bar prep completion remains significantly associated with UBE score even after controlling for GPA and year effects. A one-unit increase in the completion proportion, that is, moving from 0% to 100% completion, was associated with a 26.5-point increase in UBE score.
In this year-controlled regression, the two bar-aligned electives also had positive coefficients. Advanced Legal Performance Skills was associated with a UBE score roughly 5.44 points higher, and Advanced Legal Analysis with a score roughly 4.25 points higher, holding other variables constant. Student Success Initiative participation was also positive, at roughly 5.06 points, although this must be viewed with caution because students are unlikely to be assigned to a support program at random.
Attending law school bar prep workshops was not significant in this regression. The coefficient was nearly zero (about -0.04 points per workshop), so the workshop count is not associated with higher scores once GPA, completion rate, and the other variables are accounted for. This does not necessarily mean that workshops have no benefit.
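Because BarPrepCompletion is stored as a proportion between 0 and 1, its coefficient describes the full jump from 0% to 100%. The sketch below rescales the Model 3 estimate to more realistic changes; the chosen step sizes are illustrative, and the results describe associations rather than guaranteed gains.

```r
# Model 3 estimate for BarPrepCompletion (per unit of proportion, 0 to 1)
b_completion <- 26.51375

# Illustrative changes in completion: 5, 10, and 20 percentage points
delta <- c(0.05, 0.10, 0.20)

# Associated difference in predicted UBE score, other variables held fixed
gain <- b_completion * delta

print(data.frame(completion_change = delta, ube_points = round(gain, 2)))
```

A 10-percentage-point difference corresponds to about 2.65 UBE points, which is small in general but can matter for candidates within a few points of the cutoff.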
vif_base <- function(model) {
  # Design matrix without the intercept column
  X <- model.matrix(model)
  X <- X[, colnames(X) != "(Intercept)", drop = FALSE]
  out <- numeric(ncol(X))
  names(out) <- colnames(X)
  for (i in seq_len(ncol(X))) {
    # Regress each column on all the others; VIF = 1 / (1 - R^2)
    r2 <- summary(lm(X[, i] ~ X[, -i]))$r.squared
    out[i] <- 1 / (1 - r2)
  }
  sort(out, decreasing = TRUE)
}
vif_base(model3)
## Year2025 Year2024
## 6.496296 6.494613
## Year2023 LegalAnalysis_TexasPractice_Y
## 5.741221 4.398149
## AdvLegalPerfSkills_Y WritingGuide_Y
## 3.426916 2.244883
## GPA_Final SSI_Y
## 2.235070 2.145044
## AdvLegalAnalysis_Y Year2022
## 1.775052 1.608092
## Probation_Y LawSchoolBarPrepWorkshops
## 1.236972 1.234018
## BarPrepCompletion Mentor_Y
## 1.122662 1.100718
Every predictor other than the 2023 to 2025 year indicators had a VIF below 5, a commonly used threshold for concern; the largest was about 4.40 for LegalAnalysis_TexasPractice_Y. The 2023 to 2025 year indicators had VIFs between about 5.7 and 6.5, which is elevated but still below the looser threshold of 10. I did not include both GPA_Final and FinalRankPercentile because these variables are near substitutes.
par(mfrow = c(2, 2))
plot(model3)
par(mfrow = c(1, 1))
cooks <- cooks.distance(model3)
plot(cooks, type = "h", main = "Cook's Distance for Model 3", ylab = "Cook's distance")
abline(h = 4 / length(cooks), lty = 2)
sum(cooks > 4 / length(cooks))
## [1] 38
max(cooks)
## [1] 0.02941939
The residual plots should be inspected in the rendered output. For Model 3, 38 observations had a Cook’s distance above the conventional 4/n cutoff (about 0.007 with n = 574), but the largest Cook’s distance was only about 0.029, so no single observation exerts undue influence on the model. The residuals-versus-fitted plot may show slight heteroskedasticity, which is common in score data but is still a limitation of the analysis.
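A visual impression of heteroskedasticity can be backed up with a formal check. The sketch below computes the studentized (Koenker) form of the Breusch-Pagan test in base R: the squared residuals are regressed on the predictors, and n times the R-squared of that auxiliary regression is compared with a chi-squared distribution. It uses simulated data in which the error spread grows with the predictor; it was not run on the bar data, and all names and values are assumptions.

```r
set.seed(42)
n <- 400

# Simulated (hypothetical) predictor and a response whose error sd grows with x
x <- runif(n, 2, 4)
y <- 150 + 45 * x + rnorm(n, sd = 5 * x)

fit <- lm(y ~ x)

# Auxiliary regression of squared residuals on the same predictor
aux <- lm(resid(fit)^2 ~ x)

# Test statistic: n * R^2, chi-squared with df = number of predictors (1 here)
lm_stat <- n * summary(aux)$r.squared
p_value <- pchisq(lm_stat, df = 1, lower.tail = FALSE)

print(c(statistic = lm_stat, p_value = p_value))
```

For a model with several predictors, the auxiliary regression would include all of them and the degrees of freedom would equal their number. A small p-value would argue for reporting heteroskedasticity-robust standard errors alongside the ordinary ones.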
The following table summarizes the main models from my run of the cleaned data.
| Model | Response | Purpose | Key results | Fit |
|---|---|---|---|---|
| Model 1 | UBE | Admissions baseline | LSAT and UGPA significant and positive | R² ≈ 0.114 |
| Model 2 | UBE | Academic performance | GPA_Final significant; GPA_1L and Probation not significant after GPA_Final | R² ≈ 0.385 |
| Model 3 | UBE | Actionable preparation/support | GPA_Final, BarPrepCompletion, Advanced Legal Performance Skills, Advanced Legal Analysis, and SSI positive in year-controlled model | R² ≈ 0.419 |
The strongest and most consistent predictors across models are final law school GPA and bar prep completion. The academic model shows that cumulative law school performance is much more predictive than admissions variables alone. The actionable model suggests that commercial bar prep completion and certain bar-aligned courses are associated with score gains even after accounting for final GPA.
Final GPA. Final law school GPA is the strongest predictor of UBE score. In Model 3, a one-point increase in GPA_Final was associated with about a 41.8-point increase in UBE score. This does not mean the school can instantly raise a student’s GPA, but it confirms that cumulative academic performance is a powerful risk indicator. Students with lower final GPA should be identified early for targeted bar support.
BarPrepCompletion. BarPrepCompletion is the most actionable predictor. In Model 3, moving from 0% to 100% completion was associated with about 26.5 UBE points. A more realistic 10-percentage-point increase was associated with about 2.65 UBE points. For students near the passing line, a few points can be decisive. This result supports creating structures that help students reach high completion levels before the exam.
Bar-aligned electives. Advanced Legal Performance Skills and Advanced Legal Analysis were positively associated with UBE score in the year-controlled UBE model. The estimated coefficients were about 5.4 and 4.3 UBE points, respectively. These are small effects, but they are statistically significant at the 5% level and could matter for students at the margin of passing. Because these are course-taking variables rather than randomized treatments, the results should be interpreted as associations rather than proof of causality.
Workshops. The number of law school bar prep workshops was not statistically significant. This suggests that simply counting workshop attendance may not be enough. The school should examine workshop design, timing, attendance requirements, and whether the workshops produce measurable practice completion or feedback.
Support programs. Student Success Initiative participation had a positive coefficient in Model 3, but support-program effects must be interpreted cautiously. Students are probably not assigned support at random. If higher-risk students are more likely to receive support, a simple regression coefficient can understate the value of the program. A stronger future analysis would compare similar-risk students with and without support.
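The idea of comparing similar-risk students can be sketched by stratifying on final GPA before comparing supported and unsupported students. The example below uses simulated data, not the bar data. It assumes that lower-GPA students are more likely to receive support and that support has a true effect of 5 points; both assumptions are chosen only to illustrate how a naive comparison can understate a program's value.

```r
set.seed(7)
n <- 600

# Simulated (hypothetical) final GPA
gpa <- round(runif(n, 2.3, 4.0), 2)

# Assumption: lower-GPA students are more likely to receive support
support <- rbinom(n, 1, plogis(4 - 1.5 * gpa))

# Assumption: a true support effect of +5 points, for illustration only
ube <- 170 + 40 * gpa + 5 * support + rnorm(n, sd = 16)

# Naive comparison ignores that supported students start from lower GPAs
naive_gap <- mean(ube[support == 1]) - mean(ube[support == 0])

# Compare within GPA quartiles so similar-risk students are compared
band <- cut(gpa, breaks = quantile(gpa, probs = 0:4 / 4),
            include.lowest = TRUE, labels = paste0("Q", 1:4))
band_means <- tapply(ube, list(band, support), mean)
within_gap <- band_means[, "1"] - band_means[, "0"]

print(round(naive_gap, 1))
print(round(within_gap, 1))
```

In this simulation the naive gap is pulled down by the GPA difference between the groups, while the within-band gaps sit closer to the assumed effect. Matching or a regression with risk-by-support interactions would be more refined versions of the same idea.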
Most prior hypotheses were supported. LSAT, UGPA, final GPA, and BarPrepCompletion were positive predictors as expected. Bar-aligned electives were positive in the main UBE model, which also matched expectations. The workshop hypothesis was not supported because the workshop count was not significant. The support-program hypothesis was mixed: SSI was positive in the UBE model, while mentoring was not significant.
This analysis is observational, so the coefficients should not be interpreted as causal effects. Students choose courses, opt into resources, complete bar prep at different rates, and are assigned support programs for reasons that may also relate to bar outcomes. Missing data also matters, especially for MPRE and OptIntoWritingGuide. MPRE was excluded from the main models because most records were missing MPRE values. Finally, PassFail is imbalanced because most candidates passed, which makes logistic regression less informative than the continuous UBE model for estimating practical effect sizes.
It would be reasonable for the school to monitor students’ bar prep completion weekly and hold mandatory meetings for students who fall below a completion threshold. BarPrepCompletion is the strongest actionable predictor in the models: each 10-percentage-point increase in completion was associated with about 2.65 additional UBE points. Alerts for students who have completed less than 70%, 80%, or 90% of the required materials would therefore be helpful.
This recommendation is justified because it targets something the school can influence directly. A limitation is that completion may partly reflect students’ motivation or time management rather than a separate effect of the coursework. Nevertheless, completion is easy to monitor, which makes it a very actionable measure.
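The 70%, 80%, and 90% alert levels can be turned into a simple tiering rule. The sketch below is one possible implementation; the completion values are illustrative, the tier labels are hypothetical, and a value exactly at a threshold is placed in the higher tier.

```r
# Illustrative completion proportions for eight students (hypothetical values)
completion <- c(0.96, 0.98, 0.48, 1.00, 0.77, 0.02, 0.90, 0.76)

# Tiers are left-closed: exactly 0.90 counts as "90% or more"
tier <- cut(completion,
            breaks = c(-Inf, 0.70, 0.80, 0.90, Inf),
            right  = FALSE,
            labels = c("below 70%", "70-80%", "80-90%", "90% or more"))

print(table(tier))

# Students who would trigger the most urgent alert
print(which(tier == "below 70%"))
```

Run weekly against the bar prep provider's completion report, a table like this would show how many students sit in each alert tier and which ones need a meeting first.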
In the main UBE model, Advanced Legal Performance Skills was associated with an average of 5.44 additional UBE points and Advanced Legal Analysis with 4.25 additional UBE points, adjusting for GPA, bar prep completion, support indicators, and academic year. These courses relate to the written components of the bar exam, so they may help struggling students become familiar with the applied analysis required in the MPT and MEE. These are associations and do not establish causation.
There is no evidence to suggest that the workshop count has any significance as a predictor of the UBE score when other variables have been controlled for. Consequently, there should be no expectation that an increase in the number of workshops will result in higher passage rates. Rather, the workshops should be reconfigured to focus on the following measurable outcomes: essays, MPT assignments, MBE sets timed to reflect examination conditions, personalized feedback, and completion checks.
This recommendation follows from the analysis since there is no indication that the count variable makes a measurable contribution, while BarPrepCompletion does.
This analysis shows that bar exam performance is best predicted by cumulative law school performance and bar preparation completion. Admissions variables matter, but they explain far less of the variation in UBE score than law school GPA and preparation behaviors. The most defensible interventions are to monitor bar prep completion, prioritize bar-aligned skills courses for students near risk thresholds, and redesign workshops to produce measurable practice outputs. These steps are realistic for the law school to implement without changing admissions criteria or expanding the overall program.