Bar Exam Analysis

Introduction

This analysis examines which measurable factors are associated with bar exam outcomes and translates those findings into guidance that could be useful to the law school. From an institutional standpoint, bar passage is a binary outcome, so a logistic regression model with “PassFail” as the response variable is appropriate. This allows the predictors to be interpreted in terms of increased or decreased odds of passing, which is a natural framework for understanding risk.

A linear regression model is also fit using the Uniform Bar Examination score “UBE” as the response. Because “UBE” is a continuous measure, this model allows predictor effects to be discussed in terms of score changes. This helps clarify the magnitude of each effect and supports interpretation of the logistic results.

Predictors are selected based on their timing and instructional relevance. “GPA_Final” and “FinalRankPercentile” summarize cumulative law school performance. “BarPrepCompletion” reflects student engagement with commercial bar preparation. “MPRE” is included as an early standardized exam taken prior to the bar and may serve as a signal of exam readiness. Variables that directly determine the outcome, such as “MBE” or “WrittenScaledScore”, are excluded because their relationship with the response is built in.

Data and Methods

The data include bar exam outcomes for multiple graduating groups. The dataset is imported using “read.csv()”. The response variable “PassFail” is converted into a factor with “F” set as the reference level so that positive coefficients correspond to increased odds of passing. Observations with missing data on any modeling variable are removed using “complete.cases()”, which leaves 194 observations for modeling. Two models are estimated using base R. A logistic regression is used for bar passage, and a linear regression is used for total UBE score.

Computation: Data Import, Cleaning, and Model

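# Import the bar exam data
df <- read.csv("BarData_2025.csv")

# Code the outcome as a factor with "F" as the reference level
df$PassFail <- factor(df$PassFail, levels = c("F", "P"))

# Keep only rows that are complete on every modeling variable
vars_used <- c("PassFail", "UBE", "GPA_Final", "FinalRankPercentile", "BarPrepCompletion", "MPRE")

df <- df[complete.cases(df[, vars_used]), ]

# Logistic regression for pass/fail
model_logit <- glm(PassFail ~ GPA_Final + FinalRankPercentile + BarPrepCompletion + MPRE, family = binomial, data = df)

# Linear regression for total UBE score
model_lm <- lm(UBE ~ GPA_Final + FinalRankPercentile + BarPrepCompletion + MPRE, data = df)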

summary(model_logit)

## 
## Call:
## glm(formula = PassFail ~ GPA_Final + FinalRankPercentile + BarPrepCompletion + 
##     MPRE, family = binomial, data = df)
## 
## Coefficients:
##                      Estimate Std. Error z value Pr(>|z|)  
## (Intercept)         -23.98713   12.75039  -1.881   0.0599 .
## GPA_Final             6.01103    4.45012   1.351   0.1768  
## FinalRankPercentile  -2.10805    4.69558  -0.449   0.6535  
## BarPrepCompletion     2.45690    1.81187   1.356   0.1751  
## MPRE                  0.05735    0.02709   2.117   0.0343 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 145.21  on 193  degrees of freedom
## Residual deviance: 107.04  on 189  degrees of freedom
## AIC: 117.04
## 
## Number of Fisher Scoring iterations: 6

summary(model_lm)

## 
## Call:
## lm(formula = UBE ~ GPA_Final + FinalRankPercentile + BarPrepCompletion + 
##     MPRE, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -56.951  -9.378   0.738   9.299  38.497 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)   
## (Intercept)          70.51552   53.75772   1.312  0.19120   
## GPA_Final            54.56708   19.61674   2.782  0.00596 **
## FinalRankPercentile -19.03818   20.41881  -0.932  0.35233   
## BarPrepCompletion    22.70495    8.87133   2.559  0.01127 * 
## MPRE                  0.29366    0.09306   3.155  0.00186 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.48 on 189 degrees of freedom
## Multiple R-squared:  0.4888, Adjusted R-squared:  0.478 
## F-statistic: 45.18 on 4 and 189 DF,  p-value: < 2.2e-16

cor(df[, c("GPA_Final","FinalRankPercentile","BarPrepCompletion","MPRE")])

##                     GPA_Final FinalRankPercentile BarPrepCompletion      MPRE
## GPA_Final           1.0000000           0.9829186         0.1858068 0.4683340
## FinalRankPercentile 0.9829186           1.0000000         0.1847957 0.4360027
## BarPrepCompletion   0.1858068           0.1847957         1.0000000 0.1201107
## MPRE                0.4683340           0.4360027         0.1201107 1.0000000

par(mfrow = c(2,2))

plot(model_lm)

par(mfrow = c(1,1))

Results

Bar Passage Regression

The logistic regression model evaluates how academic performance, bar preparation, and standardized test performance relate to the probability of passing the bar exam. The model reduces the deviance from 145.21 (null deviance, 193 degrees of freedom) to 107.04 (residual deviance, 189 degrees of freedom) using four predictors. This drop of 38.17 indicates that the model fits substantially better than an intercept‑only model. The model converged normally in six Fisher scoring iterations and has an AIC of 117.04.
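The deviance comparison can be turned into a formal likelihood‑ratio test. The following is a minimal sketch that uses only the deviances and degrees of freedom printed in the model summary; the variable names are illustrative, and the pseudo R‑squared shown is McFadden's version, which is not part of the original analysis.

```r
# Deviances and degrees of freedom copied from summary(model_logit)
null_dev  <- 145.21; null_df  <- 193
resid_dev <- 107.04; resid_df <- 189

# Likelihood-ratio statistic: the drop in deviance from adding the predictors
lr_stat <- null_dev - resid_dev
lr_df   <- null_df - resid_df

# Upper-tail chi-square probability for the joint test of all four slopes
lr_p <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# McFadden's pseudo R-squared: proportional reduction in deviance
# (valid here because the outcome is ungrouped binary data)
mcfadden_r2 <- 1 - resid_dev / null_dev

print(c(LR = lr_stat, df = lr_df, p = lr_p, McFadden_R2 = mcfadden_r2))
```

With the fitted object available, the same test should be obtainable from anova(model_logit, test = "Chisq") or by comparing against an intercept‑only glm.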

Among the predictors, MPRE is the only variable that is statistically significant at the 5% level (Estimate = 0.0574, p = 0.0343). Holding the other predictors constant, each additional MPRE point is associated with roughly 6% higher odds of passing the bar exam (exp(0.0574) ≈ 1.06). Because MPRE is taken prior to the bar exam, this result is consistent with its role as an early indicator of readiness.
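Because logistic coefficients are on the log‑odds scale, exponentiating the MPRE estimate gives an odds ratio that is easier to communicate. The sketch below uses the estimate and standard error from the summary output; the ten‑point step is an illustrative choice, and the interval is an approximate Wald interval rather than a profile‑likelihood interval.

```r
# Estimate and standard error for MPRE, copied from summary(model_logit)
b_mpre  <- 0.05735
se_mpre <- 0.02709

# Odds ratio for a one-point increase in MPRE
or_1 <- exp(b_mpre)

# Odds ratio for a ten-point increase (illustrative step size)
or_10 <- exp(10 * b_mpre)

# Approximate 95% Wald interval, computed on the log-odds scale and exponentiated
z <- qnorm(0.975)
or_ci <- exp(b_mpre + c(-1, 1) * z * se_mpre)

print(round(c(OR_1pt = or_1, OR_10pt = or_10, lower = or_ci[1], upper = or_ci[2]), 3))
```

The lower bound of the interval sits just above 1, which matches a p-value that is below 0.05 but not by a wide margin.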

GPA_Final and BarPrepCompletion both have positive estimated coefficients (6.01 and 2.46 on the log‑odds scale), indicating that higher GPA and greater bar prep completion are associated with higher odds of passing. Neither variable reaches conventional levels of statistical significance in this model (p ≈ 0.18 for both). These results suggest positive directional effects that may be practically meaningful but are estimated with considerable uncertainty. FinalRankPercentile has a negative coefficient and is not statistically significant. This result is likely influenced by the extremely high correlation between FinalRankPercentile and GPA_Final (r ≈ 0.98) shown in the correlation matrix.

UBE Score Regression

The linear regression model explains nearly half of the variation in UBE scores (R^2 = 0.4888; adjusted R^2 = 0.478). The overall F‑statistic is 45.18 on 4 and 189 degrees of freedom, with a p-value below 2.2e‑16, indicating that the predictors jointly explain a statistically significant share of the variability in UBE scores. GPA_Final has a strong and statistically significant association with UBE score (Estimate = 54.57, p = 0.00596): holding the other predictors constant, a one‑point increase in final law school GPA is associated with an average increase of approximately 55 UBE points. BarPrepCompletion is also statistically significant (Estimate = 22.70, p = 0.0113). Students who complete more of their bar preparation program tend to earn higher UBE scores, even after controlling for cumulative academic performance. MPRE is again statistically significant (Estimate = 0.294, p = 0.00186), indicating that each additional MPRE point is associated with an average increase of about 0.29 UBE points.
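Point estimates are easier to weigh alongside their uncertainty. As a rough sketch, 95% confidence intervals for the three significant coefficients can be rebuilt from the printed estimates and standard errors; the 0.1‑point GPA step at the end is an illustrative rescaling, not something taken from the analysis.

```r
# Estimates and standard errors copied from summary(model_lm)
est <- c(GPA_Final = 54.56708, BarPrepCompletion = 22.70495, MPRE = 0.29366)
se  <- c(GPA_Final = 19.61674, BarPrepCompletion = 8.87133,  MPRE = 0.09306)

# Critical t value for a 95% interval with 189 residual degrees of freedom
t_crit <- qt(0.975, df = 189)

ci <- cbind(estimate = est,
            lower    = est - t_crit * se,
            upper    = est + t_crit * se)
print(round(ci, 2))

# Expected UBE difference for a 0.1-point GPA difference (illustrative step size)
gpa_step <- 0.1
print(round(ci["GPA_Final", ] * gpa_step, 2))
```

With the fitted object in memory, confint(model_lm) should give the same intervals directly.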

FinalRankPercentile is not statistically significant in the linear model (p = 0.35), despite being theoretically relevant. This aligns with the correlation matrix, which shows an extremely high correlation (~ 0.98) between GPA_Final and FinalRankPercentile, suggesting overlapping explanatory content.

Collinearity Assessment

The correlation matrix confirms very high correlation between GPA_Final and FinalRankPercentile (r ≈ 0.983). Correlation this strong inflates the standard errors of both coefficients and likely explains why rank does not appear statistically significant once GPA is included in the model. Correlations between MPRE and GPA are moderate (~ 0.47), while BarPrepCompletion shows low correlation with academic variables, supporting its interpretation as a distinct factor.
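Pairwise correlations can be summarized per predictor as variance inflation factors (VIFs). For a linear model with an intercept, the VIFs equal the diagonal of the inverse of the predictor correlation matrix, so they can be sketched from the printed matrix alone. This calculation is an addition for illustration and was not part of the original analysis.

```r
vars <- c("GPA_Final", "FinalRankPercentile", "BarPrepCompletion", "MPRE")

# Correlation matrix copied from the cor() output
R <- matrix(c(
  1.0000000, 0.9829186, 0.1858068, 0.4683340,
  0.9829186, 1.0000000, 0.1847957, 0.4360027,
  0.1858068, 0.1847957, 1.0000000, 0.1201107,
  0.4683340, 0.4360027, 0.1201107, 1.0000000
), nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

# VIF_j is the j-th diagonal element of the inverse correlation matrix
vif <- diag(solve(R))
print(round(vif, 2))
```

A common rule of thumb treats VIFs above 10 as a sign of serious collinearity. GPA_Final and FinalRankPercentile fall well above that threshold, while BarPrepCompletion stays close to 1.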

Model Diagnostics

The Residuals vs Fitted plot shows no strong nonlinear pattern, indicating that the linearity assumption is reasonable. Residuals are centered around zero across the range of fitted values, with no clear curvature.

The Normal Q–Q plot shows deviations in the tails, but the bulk of residuals follow the theoretical line closely. Given the sample size (n = 194), minor departures from normality are not unexpected and are not sufficient to invalidate inference from the model.

The Scale–Location plot does not show a systematic increase or decrease in residual spread across fitted values, suggesting approximate homogeneity of variance.

The Residuals vs Leverage plot identifies a few observations with moderate leverage, but none fall outside the Cook’s distance contours. This indicates that no single observation is unduly influencing the model estimates.
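The visual check for influential points can be backed by the underlying numbers. The sketch below runs on simulated stand‑in data, not the bar exam data, so it only illustrates the mechanics; the variable names, coefficients, and cutoffs are assumptions chosen for the example.

```r
set.seed(1)

# Simulated stand-in data: these values are NOT the bar exam data
n <- 194
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- 270 + 10 * sim$x1 + 5 * sim$x2 + rnorm(n, sd = 14)

fit <- lm(y ~ x1 + x2, data = sim)

# Influence measures for every observation
cd  <- cooks.distance(fit)
lev <- hatvalues(fit)

# Common rules of thumb (conventions, not hard cutoffs)
p <- length(coef(fit))              # number of estimated coefficients
high_leverage <- which(lev > 2 * p / n)
influential   <- which(cd > 0.5)    # plot.lm draws contours at 0.5 and 1 by default

print(high_leverage)
print(influential)
print(round(max(cd), 3))
```

Replacing fit with model_lm would apply the same checks to the UBE model.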

Discussion

The analysis shows that cumulative academic performance is among the strongest predictors of bar exam performance. Final GPA is statistically significant in the UBE model and is positively, though not significantly, associated with bar passage probability in the logistic model. This is consistent with the view that the bar exam largely tests skills and knowledge developed throughout law school.

Bar preparation completion emerges as an important and actionable factor. Although not statistically significant in the logistic model at the 5% level, BarPrepCompletion is significant in the linear model and has a positive sign in both models. Given its relatively low correlation with GPA and rank, this variable captures behavior that is not redundant with academic performance.

MPRE performance is statistically significant in both models. Since the MPRE is administered prior to bar preparation, this result supports the idea that early standardized assessment can identify students who may struggle later.

The lack of significance for FinalRankPercentile once GPA is included is explained by collinearity. When predictors are highly correlated, statistical significance should be interpreted cautiously, and substantive meaning should guide variable selection. Overall, the models are statistically adequate, and the diagnostic plots do not raise serious concerns about violations of linear model assumptions.

Recommendations

Monitor and enforce bar preparation completion

BarPrepCompletion has a statistically significant association with UBE score and a positive association with bar passage. Because this variable reflects student behavior rather than fixed ability, the law school should track completion rates and intervene early when students fall behind.

Use MPRE performance as an early warning signal

MPRE is significant in both models and is observed before bar preparation begins. Students with low MPRE scores should be targeted for supplemental academic and bar‑skills support well in advance of the exam.
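To show how an early warning flag might work in practice, the sketch below converts the logistic coefficients into predicted pass probabilities over a range of MPRE scores. The student profile is hypothetical. It also assumes that FinalRankPercentile and BarPrepCompletion are stored as proportions between 0 and 1, which the report does not state, so the probabilities are illustrative only.

```r
# Coefficients copied from summary(model_logit)
b <- c(Intercept = -23.98713, GPA_Final = 6.01103,
       FinalRankPercentile = -2.10805, BarPrepCompletion = 2.45690,
       MPRE = 0.05735)

# Hypothetical student profile; ASSUMES rank percentile and completion are
# proportions between 0 and 1 (not confirmed by the report)
profile <- c(GPA_Final = 3.0, FinalRankPercentile = 0.5, BarPrepCompletion = 0.8)

# Hypothetical range of MPRE scores
mpre <- seq(75, 110, by = 5)

# Linear predictor on the log-odds scale, then the inverse logit
eta <- b[["Intercept"]] + sum(b[names(profile)] * profile) + b[["MPRE"]] * mpre
p_pass <- plogis(eta)

print(data.frame(MPRE = mpre, p_pass = round(p_pass, 3)))
```

With the fitted model available, predict(model_logit, newdata = ..., type = "response") would be the more direct route and would use the variables on their actual scales.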

Prioritize resources for students with weaker cumulative academic performance

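GPA_Final is statistically significant in the UBE model and has a positive coefficient in the logistic model. Because it is nearly collinear with class rank, rank adds little information once GPA is included. Support programs should be more intensive for students with lower GPAs, where marginal improvements are most likely to change the outcome.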

5344 Assignment 17: Bar Exam Analysis

Austen Ruby, R11935023

2026-04-29