In the provided data set, one of the key variables is the overall UBE score. It is the sum of the two scaled component scores, WrittenScaledScore and MBE, and it is the score that determines a pass or fail, which makes it a good starting point for analysis. Modeling it also lets us quantify how strongly each input variable is associated with an increase or decrease in the overall score (e.g., a final GPA of 3.7 might tend to raise scores by about 10 points). The more specific component variables, WrittenScaledScore and MBE, can be analyzed separately at a later date.
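As a quick sanity check on the relationship described above, the sketch below rebuilds UBE from its two components using the six rows printed by `head(all_data)` further down in this report, and applies the 270-point cutoff that the analysis uses for pass/fail. The data frame name `first_rows` and the `Passed` column are made up for illustration.

```r
# First six rows of the data, copied from the head(all_data) output in this report.
first_rows <- data.frame(
  WrittenScaledScore = c(125.5, 133.1, 125.5, 125.5, 130.5, 115.4),
  MBE                = c(133.3, 132.7, 118.2, 140.1, 125.4, 113.5),
  UBE                = c(258.8, 265.8, 243.7, 265.6, 255.9, 228.9)
)

# UBE should equal the sum of the two scaled components (up to rounding).
rebuilt <- first_rows$WrittenScaledScore + first_rows$MBE
components_match <- isTRUE(all.equal(rebuilt, first_rows$UBE))

# Pass/fail flag using the 270 cutoff applied elsewhere in this analysis.
first_rows$Passed <- first_rows$UBE >= 270

print(components_match)
print(first_rows)
```

Running the same comparison on the full data frame would confirm whether the identity holds for every row rather than just these six.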
I have two hypotheses for which of the provided factors may be contributing to the overall UBE scores:
1. Participation in the support programs (student success initiatives, bar prep mentors, and bar prep workshops) should help to increase scores, as these programs should solidify knowledge and prepare students for the upcoming exams.
2. A student’s performance in one individual class may not give a clear idea of their overall performance and mastery, and overall understanding of the material is what these tests are about, so overall GPA should be a better predictor than individual course grades.
getwd()
## [1] "C:/Users/Ringi/OneDrive/Desktop/Spring2026/DataAnalysis"
all_data <- read.csv("BarPass_2025.csv")
str(all_data)
## 'data.frame': 600 obs. of 28 variables:
## $ Year : int 2021 2021 2021 2021 2021 2021 2021 2021 2021 2021 ...
## $ PassFail : chr "F" "F" "F" "F" ...
## $ Age : num 29.1 29.6 29 36.2 28.9 30.8 29.1 42.9 28.3 27.1 ...
## $ LSAT : int 152 155 157 156 145 154 149 160 152 150 ...
## $ UGPA : chr "3.42" "2.82" "3.46" "3.13" ...
## $ CivPro : chr "B+" "B+" "C" "D+" ...
## $ LPI : chr "A" "B" "B" "C" ...
## $ LPII : chr "A" "B" "B" "C+" ...
## $ GPA_1L : num 3.21 2.43 2.62 2.27 2.29 ...
## $ GPA_Final : num 3.29 3.2 2.91 2.77 2.9 2.82 3 3.09 3.21 2.74 ...
## $ FinalRankPercentile : num 0.46 0.33 0.08 0.02 0.08 0.05 0.15 0.22 0.34 0.01 ...
## $ Accommodations : chr "N" "Y" "N" "N" ...
## $ Probation : chr "N" "Y" "N" "Y" ...
## $ LegalAnalysis_TexasPractice: chr "Y" "Y" "Y" "Y" ...
## $ AdvLegalPerfSkills : chr "Y" "Y" "Y" "Y" ...
## $ AdvLegalAnalysis : chr "Y" "Y" "Y" "Y" ...
## $ BarPrepCompany : chr "Barbri" "Barbri" "Barbri" "Barbri" ...
## $ BarPrepCompletion : num 0.96 0.98 0.48 1 0.77 0.02 0.9 0.76 0.77 0.88 ...
## $ OptIntoWritingGuide : chr "" "" "" "" ...
## $ X.LawSchoolBarPrepWorkshops: int 3 0 3 0 5 1 5 5 1 5 ...
## $ StudentSuccessInitiative : chr "N" "Cochran" "Smith" "Baldwin" ...
## $ BarPrepMentor : chr "N" "N" "N" "N" ...
## $ MPRE : num 103 76 99 81 99 NA 90 97 100 78 ...
## $ MPT : num 3 3 3 2.5 3.5 3 2.5 2.5 3 2.5 ...
## $ MEE : num 2.67 3.17 2.67 3 2.67 2 3.5 3 2.67 3.83 ...
## $ WrittenScaledScore : num 126 133 126 126 130 ...
## $ MBE : num 133 133 118 140 125 ...
## $ UBE : num 259 266 244 266 256 ...
grademap <- c("A"=4.0, "A-"=3.7, "B+"=3.3,"B"=3.0,"B-"=2.7,"C+"=2.3,"C"=2.0,"C-"=1.7,"D+"=1.3,"D"=1.0,"D-"=0.7,"F"=0)
all_data$UGPA<-as.numeric(all_data$UGPA)
## Warning: NAs introduced by coercion
all_data$CivPro_Num <- grademap[all_data$CivPro]
all_data$LPI_Num <- grademap[all_data$LPI]
all_data$LPII_Num <- grademap[all_data$LPII]
all_data$StudentSuccessInitiative[all_data$StudentSuccessInitiative != "N"] <- "Y"
all_data$StudentSuccessInitiative <- factor(all_data$StudentSuccessInitiative, levels = c("N", "Y"))
all_data$BarPrepMentor[all_data$BarPrepMentor != "N"] <- "Y"
all_data$BarPrepMentor <- factor(all_data$BarPrepMentor, levels = c("N", "Y"))
From the structure of the data shown above, we can see that several variables relevant to the hypotheses need to be transformed before they can be modeled. UGPA is read in as text and must be converted to numeric. The course grades are letter grades, so they are mapped to grade points in order to compare them numerically; any entry that is not in the map (such as the "CR" visible in the LPII column) becomes NA. We aren’t looking to test specific success initiatives or mentors, so any value that isn’t an N for no is changed to Y for yes, meaning the student did participate in that particular program.
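The two recoding steps can be seen in isolation in the minimal sketch below. The vectors `letters_in` and `program` are hypothetical inputs, `grade_points` is a stand-in for the lookup table, and the value 0.7 for "D-" is the conventional grade-point value, assumed here.

```r
# Grade-point lookup; 0.7 for "D-" is the conventional value and is assumed here.
grade_points <- c("A" = 4.0, "A-" = 3.7, "B+" = 3.3, "B" = 3.0, "B-" = 2.7,
                  "C+" = 2.3, "C" = 2.0, "C-" = 1.7, "D+" = 1.3, "D" = 1.0,
                  "D-" = 0.7, "F" = 0)

# Hypothetical letter grades, including a "CR" entry that is not in the lookup.
letters_in <- c("B+", "F", "CR", "D-")
points_out <- unname(grade_points[letters_in])   # unmapped names come back as NA

# Hypothetical program column: any value other than "N" counts as participation.
program <- c("N", "Cochran", "Smith", "N")
participated <- factor(ifelse(program == "N", "N", "Y"), levels = c("N", "Y"))

print(points_out)
print(table(participated))
```

Because unmapped grades silently become NA, it is worth counting them (for example with `sum(is.na(points_out))`) before fitting any model that uses the converted column.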
subpass <- subset(all_data,UBE >= 270)
subfail <- subset(all_data,UBE < 270)
subinit <- subset(all_data,StudentSuccessInitiative!="N")
subNOinit <- subset(all_data,StudentSuccessInitiative=="N")
subment <- subset(all_data,BarPrepMentor!="N")
subNOment <- subset(all_data,BarPrepMentor=="N")
These subsets will be used to generate boxplots of values for comparisons later on.
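An alternative to building one subset per group is the formula interface of `boxplot()`, which splits on a factor directly. The sketch below uses a small invented data frame (`toy`, with made-up UBE values) rather than the real data.

```r
# Toy data, invented purely for illustration (not taken from BarPass_2025.csv).
toy <- data.frame(
  UBE    = c(255, 262, 271, 280, 290, 301, 268, 310),
  Mentor = factor(c("N", "N", "N", "N", "Y", "Y", "Y", "Y"), levels = c("N", "Y"))
)

# The formula interface groups by the factor, so no per-group subsets are needed.
# plot = FALSE returns the summary statistics; drop it to draw the boxplot.
bp <- boxplot(UBE ~ Mentor, data = toy, plot = FALSE)

# Group medians, computed independently as a cross-check.
group_medians <- tapply(toy$UBE, toy$Mentor, median)

print(bp$names)
print(group_medians)
```

Grouping on the column itself also makes it harder to plot the wrong variable by accident, since the grouping factor appears in the plotting call.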
str(all_data)
## 'data.frame': 600 obs. of 31 variables:
## $ Year : int 2021 2021 2021 2021 2021 2021 2021 2021 2021 2021 ...
## $ PassFail : chr "F" "F" "F" "F" ...
## $ Age : num 29.1 29.6 29 36.2 28.9 30.8 29.1 42.9 28.3 27.1 ...
## $ LSAT : int 152 155 157 156 145 154 149 160 152 150 ...
## $ UGPA : num 3.42 2.82 3.46 3.13 3.49 2.85 3.43 3.29 3.62 3.07 ...
## $ CivPro : chr "B+" "B+" "C" "D+" ...
## $ LPI : chr "A" "B" "B" "C" ...
## $ LPII : chr "A" "B" "B" "C+" ...
## $ GPA_1L : num 3.21 2.43 2.62 2.27 2.29 ...
## $ GPA_Final : num 3.29 3.2 2.91 2.77 2.9 2.82 3 3.09 3.21 2.74 ...
## $ FinalRankPercentile : num 0.46 0.33 0.08 0.02 0.08 0.05 0.15 0.22 0.34 0.01 ...
## $ Accommodations : chr "N" "Y" "N" "N" ...
## $ Probation : chr "N" "Y" "N" "Y" ...
## $ LegalAnalysis_TexasPractice: chr "Y" "Y" "Y" "Y" ...
## $ AdvLegalPerfSkills : chr "Y" "Y" "Y" "Y" ...
## $ AdvLegalAnalysis : chr "Y" "Y" "Y" "Y" ...
## $ BarPrepCompany : chr "Barbri" "Barbri" "Barbri" "Barbri" ...
## $ BarPrepCompletion : num 0.96 0.98 0.48 1 0.77 0.02 0.9 0.76 0.77 0.88 ...
## $ OptIntoWritingGuide : chr "" "" "" "" ...
## $ X.LawSchoolBarPrepWorkshops: int 3 0 3 0 5 1 5 5 1 5 ...
## $ StudentSuccessInitiative : Factor w/ 2 levels "N","Y": 1 2 2 2 2 2 2 2 1 2 ...
## $ BarPrepMentor : Factor w/ 2 levels "N","Y": 1 1 1 1 1 1 1 1 1 1 ...
## $ MPRE : num 103 76 99 81 99 NA 90 97 100 78 ...
## $ MPT : num 3 3 3 2.5 3.5 3 2.5 2.5 3 2.5 ...
## $ MEE : num 2.67 3.17 2.67 3 2.67 2 3.5 3 2.67 3.83 ...
## $ WrittenScaledScore : num 126 133 126 126 130 ...
## $ MBE : num 133 133 118 140 125 ...
## $ UBE : num 259 266 244 266 256 ...
## $ CivPro_Num : num 3.3 3.3 2 1.3 2 3.3 2 2 2.3 2 ...
## $ LPI_Num : num 4 3 3 2 2.3 0 2 2.3 3 3 ...
## $ LPII_Num : num 4 3 3 2.3 2.3 NA 3 3 3 2 ...
head(all_data)
## Year PassFail Age LSAT UGPA CivPro LPI LPII GPA_1L GPA_Final
## 1 2021 F 29.1 152 3.42 B+ A A 3.206 3.29
## 2 2021 F 29.6 155 2.82 B+ B B 2.431 3.20
## 3 2021 F 29.0 157 3.46 C B B 2.620 2.91
## 4 2021 F 36.2 156 3.13 D+ C C+ 2.275 2.77
## 5 2021 F 28.9 145 3.49 C C+ C+ 2.293 2.90
## 6 2021 F 30.8 154 2.85 B+ F CR 2.538 2.82
## FinalRankPercentile Accommodations Probation LegalAnalysis_TexasPractice
## 1 0.46 N N Y
## 2 0.33 Y Y Y
## 3 0.08 N N Y
## 4 0.02 N Y Y
## 5 0.08 N Y Y
## 6 0.05 N N Y
## AdvLegalPerfSkills AdvLegalAnalysis BarPrepCompany BarPrepCompletion
## 1 Y Y Barbri 0.96
## 2 Y Y Barbri 0.98
## 3 Y Y Barbri 0.48
## 4 Y Y Barbri 1.00
## 5 Y Y Themis 0.77
## 6 Y Y Themis 0.02
## OptIntoWritingGuide X.LawSchoolBarPrepWorkshops StudentSuccessInitiative
## 1 3 N
## 2 0 Y
## 3 3 Y
## 4 0 Y
## 5 5 Y
## 6 1 Y
## BarPrepMentor MPRE MPT MEE WrittenScaledScore MBE UBE CivPro_Num LPI_Num
## 1 N 103 3.0 2.67 125.5 133.3 258.8 3.3 4.0
## 2 N 76 3.0 3.17 133.1 132.7 265.8 3.3 3.0
## 3 N 99 3.0 2.67 125.5 118.2 243.7 2.0 3.0
## 4 N 81 2.5 3.00 125.5 140.1 265.6 1.3 2.0
## 5 N 99 3.5 2.67 130.5 125.4 255.9 2.0 2.3
## 6 N NA 3.0 2.00 115.4 113.5 228.9 3.3 0.0
## LPII_Num
## 1 4.0
## 2 3.0
## 3 3.0
## 4 2.3
## 5 2.3
## 6 NA
hyp1 <- lm(UBE~StudentSuccessInitiative*BarPrepMentor*X.LawSchoolBarPrepWorkshops, data=all_data)
plot(hyp1,1:2)
summary(hyp1)
##
## Call:
## lm(formula = UBE ~ StudentSuccessInitiative * BarPrepMentor *
## X.LawSchoolBarPrepWorkshops, data = all_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -70.438 -13.007 -0.449 13.082 59.593
##
## Coefficients:
## Estimate
## (Intercept) 299.1072
## StudentSuccessInitiativeY -17.0119
## BarPrepMentorY 2.7899
## X.LawSchoolBarPrepWorkshops 0.4307
## StudentSuccessInitiativeY:BarPrepMentorY -8.2936
## StudentSuccessInitiativeY:X.LawSchoolBarPrepWorkshops -1.3841
## BarPrepMentorY:X.LawSchoolBarPrepWorkshops -1.3459
## StudentSuccessInitiativeY:BarPrepMentorY:X.LawSchoolBarPrepWorkshops 4.4876
## Std. Error
## (Intercept) 1.3512
## StudentSuccessInitiativeY 2.7162
## BarPrepMentorY 3.1309
## X.LawSchoolBarPrepWorkshops 0.6464
## StudentSuccessInitiativeY:BarPrepMentorY 6.6269
## StudentSuccessInitiativeY:X.LawSchoolBarPrepWorkshops 1.1009
## BarPrepMentorY:X.LawSchoolBarPrepWorkshops 1.1454
## StudentSuccessInitiativeY:BarPrepMentorY:X.LawSchoolBarPrepWorkshops 2.2796
## t value
## (Intercept) 221.357
## StudentSuccessInitiativeY -6.263
## BarPrepMentorY 0.891
## X.LawSchoolBarPrepWorkshops 0.666
## StudentSuccessInitiativeY:BarPrepMentorY -1.252
## StudentSuccessInitiativeY:X.LawSchoolBarPrepWorkshops -1.257
## BarPrepMentorY:X.LawSchoolBarPrepWorkshops -1.175
## StudentSuccessInitiativeY:BarPrepMentorY:X.LawSchoolBarPrepWorkshops 1.969
## Pr(>|t|)
## (Intercept) < 2e-16
## StudentSuccessInitiativeY 7.26e-10
## BarPrepMentorY 0.3733
## X.LawSchoolBarPrepWorkshops 0.5055
## StudentSuccessInitiativeY:BarPrepMentorY 0.2112
## StudentSuccessInitiativeY:X.LawSchoolBarPrepWorkshops 0.2091
## BarPrepMentorY:X.LawSchoolBarPrepWorkshops 0.2404
## StudentSuccessInitiativeY:BarPrepMentorY:X.LawSchoolBarPrepWorkshops 0.0495
##
## (Intercept) ***
## StudentSuccessInitiativeY ***
## BarPrepMentorY
## X.LawSchoolBarPrepWorkshops
## StudentSuccessInitiativeY:BarPrepMentorY
## StudentSuccessInitiativeY:X.LawSchoolBarPrepWorkshops
## BarPrepMentorY:X.LawSchoolBarPrepWorkshops
## StudentSuccessInitiativeY:BarPrepMentorY:X.LawSchoolBarPrepWorkshops *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 19.79 on 592 degrees of freedom
## Multiple R-squared: 0.1578, Adjusted R-squared: 0.1479
## F-statistic: 15.85 on 7 and 592 DF, p-value: < 2.2e-16
hyp1_2 <- glm(UBE~StudentSuccessInitiative, data=all_data)
The Residuals vs Fitted and Q-Q plots suggest that the residuals of the first model are approximately normally distributed. In the summary, however, only Student Success Initiative participation (p < 0.001) and the three-way interaction (p = 0.0495, only marginally below 0.05) are statistically significant. The reduced model therefore keeps only the Student Success Initiative term.
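One way to check whether dropping the other terms loses explanatory power would be a partial F-test between the nested models. The sketch below runs on simulated data; the column names `Initiative`, `Mentor`, and `Workshops` and every number in it are made up for illustration.

```r
set.seed(1)  # simulated data for illustration only

n <- 200
sim <- data.frame(
  Initiative = factor(sample(c("N", "Y"), n, replace = TRUE), levels = c("N", "Y")),
  Mentor     = factor(sample(c("N", "Y"), n, replace = TRUE), levels = c("N", "Y")),
  Workshops  = sample(0:5, n, replace = TRUE)
)
# Simulated outcome: only Initiative shifts the mean score.
sim$UBE <- 300 - 15 * (sim$Initiative == "Y") + rnorm(n, sd = 20)

full    <- lm(UBE ~ Initiative * Mentor * Workshops, data = sim)
reduced <- lm(UBE ~ Initiative, data = sim)

# Partial F-test: does the full model explain significantly more than the reduced one?
comparison <- anova(reduced, full)
print(comparison)
```

A large p-value in the second row of the table would support using the simpler model; a small one would suggest that some of the dropped terms matter jointly.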
boxplot(subinit$UBE,subNOinit$UBE,main="Side-by-side boxplots of UBE scores by participation in student success initiative programs by yes/no",
xlab="participation",ylab="UBE scores",names=c("Yes","No"),
col=c("green","red"))
boxplot(subment$UBE,subNOment$UBE,main="Side-by-side boxplots of UBE scores by students having mentors by yes/no",
xlab="having a mentor",ylab="UBE scores",names=c("Yes","No"),
col=c("green","red"))
plot(all_data$X.LawSchoolBarPrepWorkshops,all_data$UBE)
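Interestingly, participation in student success initiatives is associated with lower UBE scores (roughly 17 to 19 points lower in the models here) rather than higher ones, while the mentor coefficient is not statistically significant (p = 0.37). This is an association in observational data and does not by itself show that the programs lower scores.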
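Because the question of interest is ultimately whether a student passes, one way to examine it directly would be a logistic regression on a pass indicator instead of a linear model on the score. This is a minimal sketch on simulated data; the columns `Initiative` and `Pass` and all numbers are made up.

```r
set.seed(2)  # simulated data for illustration only

n <- 300
sim <- data.frame(
  Initiative = factor(sample(c("N", "Y"), n, replace = TRUE), levels = c("N", "Y"))
)
sim$UBE  <- 300 - 15 * (sim$Initiative == "Y") + rnorm(n, sd = 20)
sim$Pass <- as.integer(sim$UBE >= 270)   # 1 = pass at the 270 cutoff

# Logistic regression: log-odds of passing as a function of participation.
pass_model <- glm(Pass ~ Initiative, data = sim, family = binomial)

# Exponentiated coefficients: baseline odds (intercept) and odds ratio (InitiativeY).
odds_ratios <- exp(coef(pass_model))
print(summary(pass_model)$coefficients)
print(odds_ratios)
```

An odds ratio below 1 for `InitiativeY` would mean lower odds of passing among participants, which is a statement about association, not about what the program causes.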
hyp2_1 <- lm(UBE ~ UGPA+GPA_Final+CivPro_Num+LPI_Num+LPII_Num, data=all_data)
plot(hyp2_1,1:2)
summary(hyp2_1)
##
## Call:
## lm(formula = UBE ~ UGPA + GPA_Final + CivPro_Num + LPI_Num +
## LPII_Num, data = all_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -62.172 -10.860 0.196 12.019 49.990
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 164.530 9.729 16.912 < 2e-16 ***
## UGPA 2.714 2.132 1.273 0.203526
## GPA_Final 40.584 3.532 11.492 < 2e-16 ***
## CivPro_Num 2.864 1.372 2.087 0.037359 *
## LPI_Num -5.346 1.535 -3.482 0.000538 ***
## LPII_Num -1.302 1.546 -0.842 0.400334
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.36 on 535 degrees of freedom
## (59 observations deleted due to missingness)
## Multiple R-squared: 0.3493, Adjusted R-squared: 0.3432
## F-statistic: 57.44 on 5 and 535 DF, p-value: < 2.2e-16
hyp2_2 <- lm(UBE ~ UGPA+GPA_Final+CivPro_Num+LPI_Num,data=all_data)
summary(hyp2_2)
##
## Call:
## lm(formula = UBE ~ UGPA + GPA_Final + CivPro_Num + LPI_Num, data = all_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -61.91 -10.82 0.27 11.95 50.73
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 160.785 9.355 17.188 < 2e-16 ***
## UGPA 2.963 2.034 1.457 0.14573
## GPA_Final 39.918 3.161 12.628 < 2e-16 ***
## CivPro_Num 2.014 1.320 1.525 0.12770
## LPI_Num -4.424 1.364 -3.244 0.00124 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.53 on 585 degrees of freedom
## (10 observations deleted due to missingness)
## Multiple R-squared: 0.3403, Adjusted R-squared: 0.3358
## F-statistic: 75.44 on 4 and 585 DF, p-value: < 2.2e-16
hyp2_3 <- lm(UBE ~ GPA_Final+LPI_Num,data=all_data)
summary(hyp2_3)
##
## Call:
## lm(formula = UBE ~ GPA_Final + LPI_Num, data = all_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -62.730 -10.996 -0.097 11.820 50.216
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 166.800 7.490 22.269 < 2e-16 ***
## GPA_Final 42.694 2.692 15.862 < 2e-16 ***
## LPI_Num -4.006 1.342 -2.984 0.00296 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.56 on 588 degrees of freedom
## (9 observations deleted due to missingness)
## Multiple R-squared: 0.3351, Adjusted R-squared: 0.3328
## F-statistic: 148.2 on 2 and 588 DF, p-value: < 2.2e-16
The Residuals vs Fitted and Q-Q plots suggest that the residuals are approximately normally distributed. The summaries, however, show that undergraduate GPA and LPII are not statistically significant, and that CivPro loses significance once LPII is dropped (p = 0.037 in the first model, p = 0.128 in the second), so all three are excluded from the final model.
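The three summaries above were fit on different numbers of rows (59, 10, and 9 observations were dropped for missingness), so their fit statistics are not strictly comparable. One way to put them on equal footing, sketched here on simulated data with made-up columns `Course_A` and `Course_B`, is to restrict to complete cases first and then compare the nested fits.

```r
set.seed(3)  # simulated data for illustration only

n <- 150
grade_values <- c(2.0, 2.3, 2.7, 3.0, 3.3, 3.7, 4.0)
sim <- data.frame(
  GPA_Final = round(runif(n, 2.3, 3.9), 2),
  Course_A  = sample(grade_values, n, replace = TRUE),
  Course_B  = sample(grade_values, n, replace = TRUE)
)
sim$UBE <- 170 + 38 * sim$GPA_Final + rnorm(n, sd = 17)
sim$Course_B[sample(n, 15)] <- NA   # mimic grades that could not be mapped

# Keep only rows that are complete for every variable under consideration.
vars <- c("UBE", "GPA_Final", "Course_A", "Course_B")
cc   <- sim[complete.cases(sim[, vars]), ]

big   <- lm(UBE ~ GPA_Final + Course_A + Course_B, data = cc)
small <- lm(UBE ~ GPA_Final, data = cc)

# Both fits now use the same rows, so anova() and AIC() are comparable.
print(nobs(big) == nobs(small))
print(anova(small, big))
print(AIC(small, big))
```

The trade-off is that the complete-case subset can be noticeably smaller than the full data, so the simplest model should be refit on all rows once the variable selection is settled.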
boxplot(subfail$GPA_Final,subpass$GPA_Final,main="Side-by-side boxplots of student final GPAs by pass/fail",
xlab="Score groups <270 (failing) and >=270 (passing)",ylab="Final GPA",names=c("Fail","Pass"),
col=c("red","green"))
boxplot(subfail$LPI_Num,subpass$LPI_Num,main="Side-by-side boxplots of students LPI GPAs by pass/fail",
xlab="Score groups <270 (failing) and >=270 (passing)",ylab="LPI GPA",names=c("Fail","Pass"),
col=c("red","green"))
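As the boxplots above show, LPI grade points are distributed similarly in the failing and passing groups, whereas Final GPA is noticeably higher in the passing group. For that reason LPI was excluded from the final model, even though its coefficient was statistically significant in the regressions above.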
hyp1_2 <- glm(UBE~StudentSuccessInitiative, data=all_data)
summary(hyp1_2)
##
## Call:
## glm(formula = UBE ~ StudentSuccessInitiative, data = all_data)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 299.6883 0.9368 319.90 <2e-16 ***
## StudentSuccessInitiativeY -19.0786 1.8432 -10.35 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 390.5415)
##
## Null deviance: 275388 on 599 degrees of freedom
## Residual deviance: 233544 on 598 degrees of freedom
## AIC: 5287.2
##
## Number of Fisher Scoring iterations: 2
final_hyp2 <- lm(UBE ~ GPA_Final,data=all_data)
summary(final_hyp2)
##
## Call:
## lm(formula = UBE ~ GPA_Final, data = all_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -63.166 -11.345 -0.094 11.512 50.889
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 169.176 7.425 22.79 <2e-16 ***
## GPA_Final 38.346 2.256 16.99 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.62 on 598 degrees of freedom
## Multiple R-squared: 0.3257, Adjusted R-squared: 0.3245
## F-statistic: 288.8 on 1 and 598 DF, p-value: < 2.2e-16
plot(all_data$GPA_Final,all_data$UBE)
abline(final_hyp2,col="red")
From what we can see in the data, the programs currently available to students do not seem to be having a positive effect on overall UBE scores, but students who graduate with a higher final GPA tend to earn higher scores than those with lower final GPAs.
It is recommended that the student success initiatives and mentor programs for the UBE be reworked in order to make them more effective. It is also recommended that advisers take a closer look at a student’s current overall GPA before deciding whether the student is ready for the exam.
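To make the GPA recommendation concrete, the fitted line from `summary(final_hyp2)` can be used to estimate expected scores. The sketch below hard-codes the two reported coefficients; the helper `predict_ube` and the variable `gpa_at_cutoff` are made-up names for illustration.

```r
# Coefficients copied from summary(final_hyp2) in this report.
intercept <- 169.176
slope     <- 38.346

# Fitted UBE for a given final GPA under the simple linear model.
predict_ube <- function(gpa) intercept + slope * gpa

# Final GPA at which the fitted line crosses the 270 cutoff.
gpa_at_cutoff <- (270 - intercept) / slope

print(predict_ube(c(2.5, 3.0, 3.5)))
print(round(gpa_at_cutoff, 2))
```

On this fit the line crosses 270 near a final GPA of 2.63, but the residual standard error of 17.62 points means individual scores scatter widely around that line, so the value is a rough guide and not a hard threshold.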