Bar Exam Performance Analysis

Introduction

I chose the UBE score as the main response variable because it is a continuous measure. I also modeled Pass/Fail using logistic regression because that outcome determines whether a student gets licensed.

Moreover, because the UBE can be calculated directly from the component scores, I did not model those component scores separately.
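
The link between the UBE and its components can be checked with a short sketch. The six rows are copied from the head(df) output in this report (values as printed, so rounded to one decimal place). The assumption being illustrated is that UBE equals WrittenScaledScore plus MBE; the names scores, component_gap, and ube_is_sum are introduced here for illustration only.

```r
# First six rows as printed by head(df); values are rounded to one decimal place
scores <- data.frame(
  WrittenScaledScore = c(125.5, 133.1, 125.5, 125.5, 130.5, 115.4),
  MBE                = c(133.3, 132.7, 118.2, 140.1, 125.4, 113.5),
  UBE                = c(258.8, 265.8, 243.7, 265.6, 255.9, 228.9)
)

# Difference between the reported UBE and the sum of its two components
component_gap <- scores$UBE - (scores$WrittenScaledScore + scores$MBE)

# TRUE when every row matches to within rounding error
ube_is_sum <- all(abs(component_gap) < 0.05)
print(ube_is_sum)
```

The same comparison could be run on the full data frame by replacing scores with df.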

Hypotheses

I expect that LSAT and UGPA will predict UBE score because they measure academic ability before law school.

Final law school GPA should predict UBE better than the admission scores because it reflects all three years of law school.

I also expect that students who completed more of their bar prep program will have a higher chance of passing because they spent more time preparing.

df<-read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/BarData_2025.csv")
str(df)

## 'data.frame':    600 obs. of  28 variables:
##  $ Year                       : int  2021 2021 2021 2021 2021 2021 2021 2021 2021 2021 ...
##  $ PassFail                   : chr  "F" "F" "F" "F" ...
##  $ Age                        : num  29.1 29.6 29 36.2 28.9 30.8 29.1 42.9 28.3 27.1 ...
##  $ LSAT                       : int  152 155 157 156 145 154 149 160 152 150 ...
##  $ UGPA                       : chr  "3.42" "2.82" "3.46" "3.13" ...
##  $ CivPro                     : chr  "B+" "B+" "C" "D+" ...
##  $ LPI                        : chr  "A" "B" "B" "C" ...
##  $ LPII                       : chr  "A" "B" "B" "C+" ...
##  $ GPA_1L                     : num  3.21 2.43 2.62 2.27 2.29 ...
##  $ GPA_Final                  : num  3.29 3.2 2.91 2.77 2.9 2.82 3 3.09 3.21 2.74 ...
##  $ FinalRankPercentile        : num  0.46 0.33 0.08 0.02 0.08 0.05 0.15 0.22 0.34 0.01 ...
##  $ Accommodations             : chr  "N" "Y" "N" "N" ...
##  $ Probation                  : chr  "N" "Y" "N" "Y" ...
##  $ LegalAnalysis_TexasPractice: chr  "Y" "Y" "Y" "Y" ...
##  $ AdvLegalPerfSkills         : chr  "Y" "Y" "Y" "Y" ...
##  $ AdvLegalAnalysis           : chr  "Y" "Y" "Y" "Y" ...
##  $ BarPrepCompany             : chr  "Barbri" "Barbri" "Barbri" "Barbri" ...
##  $ BarPrepCompletion          : num  0.96 0.98 0.48 1 0.77 0.02 0.9 0.76 0.77 0.88 ...
##  $ OptIntoWritingGuide        : chr  "" "" "" "" ...
##  $ X.LawSchoolBarPrepWorkshops: int  3 0 3 0 5 1 5 5 1 5 ...
##  $ StudentSuccessInitiative   : chr  "N" "Cochran" "Smith" "Baldwin" ...
##  $ BarPrepMentor              : chr  "N" "N" "N" "N" ...
##  $ MPRE                       : num  103 76 99 81 99 NA 90 97 100 78 ...
##  $ MPT                        : num  3 3 3 2.5 3.5 3 2.5 2.5 3 2.5 ...
##  $ MEE                        : num  2.67 3.17 2.67 3 2.67 2 3.5 3 2.67 3.83 ...
##  $ WrittenScaledScore         : num  126 133 126 126 130 ...
##  $ MBE                        : num  133 133 118 140 125 ...
##  $ UBE                        : num  259 266 244 266 256 ...

df$UGPA<-as.numeric(df$UGPA)

## Warning: NAs introduced by coercion

df$PassFail<-factor(df$PassFail, levels = c("F", "P"))

# Map letter grades to grade points on a 4.0 scale; grades not listed here (e.g. "CR") become NA
grademap<-c("A"=4.0, "A-"=3.7, "B+"=3.3, "B"=3.0, "B-"=2.7, "C+"=2.3, "C"=2.0, "C-"=1.7, "D+"=1.3, "D"=1.0, "D-"=0.7, "F"=0)

df$CivPro_Num<-grademap[df$CivPro]
df$LPI_Num<-grademap[df$LPI]
df$LPII_Num<-grademap[df$LPII]

df$BarPrepCompletion<-as.numeric(df$BarPrepCompletion)

head(df)

##   Year PassFail  Age LSAT UGPA CivPro LPI LPII GPA_1L GPA_Final
## 1 2021        F 29.1  152 3.42     B+   A    A  3.206      3.29
## 2 2021        F 29.6  155 2.82     B+   B    B  2.431      3.20
## 3 2021        F 29.0  157 3.46      C   B    B  2.620      2.91
## 4 2021        F 36.2  156 3.13     D+   C   C+  2.275      2.77
## 5 2021        F 28.9  145 3.49      C  C+   C+  2.293      2.90
## 6 2021        F 30.8  154 2.85     B+   F   CR  2.538      2.82
##   FinalRankPercentile Accommodations Probation LegalAnalysis_TexasPractice
## 1                0.46              N         N                           Y
## 2                0.33              Y         Y                           Y
## 3                0.08              N         N                           Y
## 4                0.02              N         Y                           Y
## 5                0.08              N         Y                           Y
## 6                0.05              N         N                           Y
##   AdvLegalPerfSkills AdvLegalAnalysis BarPrepCompany BarPrepCompletion
## 1                  Y                Y         Barbri              0.96
## 2                  Y                Y         Barbri              0.98
## 3                  Y                Y         Barbri              0.48
## 4                  Y                Y         Barbri              1.00
## 5                  Y                Y         Themis              0.77
## 6                  Y                Y         Themis              0.02
##   OptIntoWritingGuide X.LawSchoolBarPrepWorkshops StudentSuccessInitiative
## 1                                               3                        N
## 2                                               0                  Cochran
## 3                                               3                    Smith
## 4                                               0                  Baldwin
## 5                                               5                  Baldwin
## 6                                               1                    Rosen
##   BarPrepMentor MPRE MPT  MEE WrittenScaledScore   MBE   UBE CivPro_Num LPI_Num
## 1             N  103 3.0 2.67              125.5 133.3 258.8        3.3     4.0
## 2             N   76 3.0 3.17              133.1 132.7 265.8        3.3     3.0
## 3             N   99 3.0 2.67              125.5 118.2 243.7        2.0     3.0
## 4             N   81 2.5 3.00              125.5 140.1 265.6        1.3     2.0
## 5             N   99 3.5 2.67              130.5 125.4 255.9        2.0     2.3
## 6             N   NA 3.0 2.00              115.4 113.5 228.9        3.3     0.0
##   LPII_Num
## 1      4.0
## 2      3.0
## 3      3.0
## 4      2.3
## 5      2.3
## 6       NA

colnames(df)

##  [1] "Year"                        "PassFail"                   
##  [3] "Age"                         "LSAT"                       
##  [5] "UGPA"                        "CivPro"                     
##  [7] "LPI"                         "LPII"                       
##  [9] "GPA_1L"                      "GPA_Final"                  
## [11] "FinalRankPercentile"         "Accommodations"             
## [13] "Probation"                   "LegalAnalysis_TexasPractice"
## [15] "AdvLegalPerfSkills"          "AdvLegalAnalysis"           
## [17] "BarPrepCompany"              "BarPrepCompletion"          
## [19] "OptIntoWritingGuide"         "X.LawSchoolBarPrepWorkshops"
## [21] "StudentSuccessInitiative"    "BarPrepMentor"              
## [23] "MPRE"                        "MPT"                        
## [25] "MEE"                         "WrittenScaledScore"         
## [27] "MBE"                         "UBE"                        
## [29] "CivPro_Num"                  "LPI_Num"                    
## [31] "LPII_Num"

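I converted letter grades to numbers using a grade map so they can be used in regression; grades that are not in the map, such as CR, become NA. UGPA was read as text, so I converted it to numeric, and one entry that could not be parsed became NA. BarPrepCompletion was already numeric, so that conversion leaves it unchanged.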
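
The lookup and coercion behavior can be seen on a few made-up values. This is a minimal sketch: the grades and UGPA strings are hypothetical, the map assumes the conventional 4.0 scale in which D- is 0.7, and the names grade_points, grade_num, and ugpa_num are introduced here for illustration.

```r
# Grade-point map on a 4.0 scale (assumed convention: D- = 0.7)
grade_points <- c("A" = 4.0, "A-" = 3.7, "B+" = 3.3, "B" = 3.0, "B-" = 2.7,
                  "C+" = 2.3, "C" = 2.0, "C-" = 1.7, "D+" = 1.3, "D" = 1.0,
                  "D-" = 0.7, "F" = 0)

# Hypothetical grades; "CR" (credit) has no entry in the map
grades <- c("B+", "A", "D-", "CR")

# Look up by name; unname() drops the names that the lookup carries over
grade_num <- unname(grade_points[grades])
print(grade_num)

# Text that is not a number becomes NA; suppressWarnings() hides the coercion warning
ugpa_num <- suppressWarnings(as.numeric(c("3.42", "2.82", "n/a")))
print(ugpa_num)
```

Rows with NA in a predictor are dropped when a model is fit, which is why the model summaries report one deleted observation.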

Summary Statistics

summary(df[, c("UBE", "LSAT", "UGPA", "GPA_Final", "BarPrepCompletion")])

##       UBE             LSAT            UGPA         GPA_Final    
##  Min.   :227.3   Min.   :141.0   Min.   :2.010   Min.   :2.440  
##  1st Qu.:280.4   1st Qu.:153.0   1st Qu.:3.280   1st Qu.:3.050  
##  Median :295.3   Median :156.0   Median :3.540   Median :3.263  
##  Mean   :294.7   Mean   :155.6   Mean   :3.478   Mean   :3.275  
##  3rd Qu.:309.6   3rd Qu.:158.0   3rd Qu.:3.740   3rd Qu.:3.500  
##  Max.   :358.7   Max.   :171.0   Max.   :4.140   Max.   :3.990  
##                                  NA's   :1                      
##  BarPrepCompletion
##  Min.   :0.000    
##  1st Qu.:0.800    
##  Median :0.900    
##  Mean   :0.865    
##  3rd Qu.:0.980    
##  Max.   :1.000    
##  NA's   :26

The summary gives the minimum, quartiles, mean, and maximum of each variable. UGPA has 1 missing value and BarPrepCompletion has 26.

Pass/Fail count

table(df$PassFail)

## 
##   F   P 
##  61 539
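
The counts above translate into an overall pass rate. A small sketch, using the two counts from the table rather than the full data (the names pass_fail_counts and pass_fail_share are introduced here for illustration):

```r
# Counts copied from table(df$PassFail)
pass_fail_counts <- c(F = 61, P = 539)

# Share of students in each group
pass_fail_share <- prop.table(pass_fail_counts)
print(round(pass_fail_share, 3))
```

About 90% of students passed, so the two groups are unbalanced, which is worth keeping in mind when reading the logistic regression results.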

df_pass<-subset(df, PassFail == "P")
df_fail<-subset(df, PassFail == "F")

boxplot(df_pass$UBE, df_fail$UBE, main="Side-by-side Boxplot of UBE Scores by Pass/Fail",xlab= "Pass/Fail of Exam", ylab="UBE Score", names=c("Pass","Fail"), col=c("green", "red"))

The boxplot shows a clear difference in UBE scores between students who passed and students who failed.

Scatter plots

plot(df$LSAT, df$UBE)
abline(lm(df$UBE~df$LSAT))

plot(df$UGPA, df$UBE)
abline(lm(df$UBE~df$UGPA))

plot(df$GPA_Final, df$UBE)
abline(lm(df$UBE~df$GPA_Final))

Models

Model 1 - LSAT and UGPA Predicting UBE Score (Interaction)

I hypothesize that LSAT and UGPA together predict UBE score. I first tested whether the two variables interact with each other.

model1<-lm(UBE~LSAT*UGPA, data=df)
summary(model1)

## 
## Call:
## lm(formula = UBE ~ LSAT * UGPA, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -66.392 -13.162   0.862  14.074  53.757 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -618.7953   349.9206  -1.768   0.0775 .
## LSAT           5.5922     2.2303   2.507   0.0124 *
## UGPA         188.4714    97.5901   1.931   0.0539 .
## LSAT:UGPA     -1.1317     0.6223  -1.819   0.0695 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.37 on 595 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.103,  Adjusted R-squared:  0.09845 
## F-statistic: 22.77 on 3 and 595 DF,  p-value: 5.777e-14

plot(model1, 1)

plot(model1, 2)

The interaction term is not significant at the 0.05 level (p = 0.0695), so I will try the simpler model without the interaction.
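
The interaction test can be reproduced from the numbers in the summary output. For a single added term, the t-test on its coefficient is equivalent to the nested-model F-test comparing the interaction model with the additive one, with F equal to t squared. A sketch using the reported t value and residual degrees of freedom (both copied from the output, so rounded); the variable names are introduced here for illustration.

```r
# Values copied from summary(model1)
t_interaction <- -1.819   # t value for LSAT:UGPA
df_residual   <- 595      # residual degrees of freedom

# Two-sided p-value from the t distribution
p_from_t <- 2 * pt(-abs(t_interaction), df = df_residual)

# Same test expressed as an F statistic with 1 numerator degree of freedom
f_interaction <- t_interaction^2
p_from_f <- pf(f_interaction, df1 = 1, df2 = df_residual, lower.tail = FALSE)

print(c(p_from_t = p_from_t, p_from_f = p_from_f))
```

Both p-values land close to the 0.0695 in the coefficient table; small differences come from rounding of the printed t value.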

Model 2 - LSAT and UGPA Predicting UBE Score (Additive)

model2<-lm(UBE~LSAT+UGPA, data=df)
summary(model2)

## 
## Call:
## lm(formula = UBE ~ LSAT + UGPA, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -66.221 -13.466   1.022  14.406  54.180 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  14.1607    36.1339   0.392    0.695    
## LSAT          1.5557     0.2183   7.125 3.02e-12 ***
## UGPA         11.0483     2.2807   4.844 1.62e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.41 on 596 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.09799,    Adjusted R-squared:  0.09496 
## F-statistic: 32.37 on 2 and 596 DF,  p-value: 4.499e-14

Both LSAT and UGPA are significant predictors of UBE score.

The multiple R-squared of 0.098 means that LSAT and UGPA together explain about 10% of the variation in UBE score. These results support my first hypothesis.
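
Adjusted R-squared penalizes R-squared for the number of predictors. A sketch that recomputes it from the values reported for Model 2, assuming n = 599 because one of the 600 observations was dropped for missingness; the variable names are introduced here for illustration.

```r
# Values copied from summary(model2)
r_squared    <- 0.09799
n_obs        <- 599   # 600 rows minus 1 deleted due to missingness
n_predictors <- 2     # LSAT and UGPA

# Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)
adj_r_squared <- 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)
print(round(adj_r_squared, 5))
```

With only two predictors and about 600 observations, the adjustment is small, so the two R-squared values tell the same story.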

Model 3 - Final GPA Predicting UBE Score

I want to see if final law school GPA predicts bar performance better than admission scores since it covers all three years of law school.

df_m3<-df[complete.cases(df[, c("UBE", "GPA_Final")]), ]
model3<-lm(UBE~GPA_Final, data=df_m3)
summary(model3)

## 
## Call:
## lm(formula = UBE ~ GPA_Final, data = df_m3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -63.100 -11.370  -0.115  11.630  51.048 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  168.344      7.399   22.75   <2e-16 ***
## GPA_Final     38.585      2.249   17.16   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.56 on 598 degrees of freedom
## Multiple R-squared:  0.3299, Adjusted R-squared:  0.3288 
## F-statistic: 294.5 on 1 and 598 DF,  p-value: < 2.2e-16

plot(model3, 1)

plot(model3, 2)

GPA_Final is a significant predictor (p < 2e-16). The R-squared of 0.33 is higher than the 0.098 from Model 2, which means final law school GPA explains more of the variation in UBE score than the admission scores do. This supports my second hypothesis.
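
The fitted line from Model 3 can be used to translate a final GPA into an expected UBE score. A sketch using the coefficients reported above; the GPA values are hypothetical inputs chosen for illustration, and predict_ube is a helper name introduced here.

```r
# Coefficients copied from summary(model3)
intercept <- 168.344
slope_gpa <- 38.585

# Expected UBE score for a given final GPA under the fitted line
predict_ube <- function(gpa_final) {
  intercept + slope_gpa * gpa_final
}

# Hypothetical GPAs for illustration
example_gpa <- c(2.5, 3.0, 3.5)
expected_ube <- predict_ube(example_gpa)
print(data.frame(GPA_Final = example_gpa, Expected_UBE = expected_ube))
```

Each additional 0.1 of final GPA corresponds to about 3.9 more UBE points on average. The residual standard error of 17.56 means individual scores still vary widely around these expected values.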

Model 4 - LSAT and UGPA Predicting Pass/Fail (Interaction)

Here I use logistic regression to predict whether a student passes or fails directly. I first test the interaction model.

model4<-glm(PassFail~LSAT*UGPA, data=df, family=binomial)
anova(model4, test="Chisq")

## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: PassFail
## 
## Terms added sequentially (first to last)
## 
## 
##           Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                        598     394.26              
## LSAT       1  17.3830       597     376.88 3.056e-05 ***
## UGPA       1   6.6384       596     370.24   0.00998 ** 
## LSAT:UGPA  1   0.1542       595     370.09   0.69455    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The interaction term is not significant (p = 0.69), so I will try the simpler model.

Model 5 - LSAT and UGPA Predicting Pass/Fail (Additive)

model5<-glm(PassFail~LSAT+UGPA, data=df, family=binomial)
summary(model5)

## 
## Call:
## glm(formula = PassFail ~ LSAT + UGPA, family = binomial, data = df)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -27.32850    6.24420  -4.377 1.21e-05 ***
## LSAT          0.16872    0.03717   4.539 5.66e-06 ***
## UGPA          0.98478    0.37537   2.624   0.0087 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 394.26  on 598  degrees of freedom
## Residual deviance: 370.24  on 596  degrees of freedom
##   (1 observation deleted due to missingness)
## AIC: 376.24
## 
## Number of Fisher Scoring iterations: 5

anova(model5, test = "Chisq")

## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: PassFail
## 
## Terms added sequentially (first to last)
## 
## 
##      Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                   598     394.26              
## LSAT  1  17.3830       597     376.88 3.056e-05 ***
## UGPA  1   6.6384       596     370.24   0.00998 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

null_dev<-model5$null.deviance
resid_dev<-model5$deviance

1-pchisq(null_dev-resid_dev, df=2)

## [1] 6.0788e-06

1-resid_dev/null_dev

## [1] 0.06092768

The chi-square p-value (6.08e-06) shows the model is statistically significant overall. The pseudo R-squared of 0.061 is the proportional reduction in deviance relative to the intercept-only model, so LSAT and UGPA improve on the null model only modestly. Both LSAT and UGPA are significant predictors of passing the bar exam.
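
The Model 5 coefficients are on the log-odds scale, so exponentiating them gives odds ratios, and the inverse logit turns a linear predictor into a probability. A sketch using the reported coefficients; the LSAT and UGPA values passed in are hypothetical, and pass_probability is a helper name introduced here.

```r
# Coefficients copied from summary(model5)
coefs <- c(intercept = -27.32850, LSAT = 0.16872, UGPA = 0.98478)

# Odds ratios: multiplicative change in the odds of passing per one-unit increase
odds_ratios <- exp(coefs[c("LSAT", "UGPA")])
print(round(odds_ratios, 3))

# Predicted probability of passing for a hypothetical student
pass_probability <- function(lsat, ugpa) {
  log_odds <- coefs[["intercept"]] + coefs[["LSAT"]] * lsat + coefs[["UGPA"]] * ugpa
  plogis(log_odds)  # inverse logit
}
p_example <- pass_probability(lsat = 155, ugpa = 3.5)
print(round(p_example, 3))
```

Under this model, each additional LSAT point multiplies the odds of passing by about 1.18, holding UGPA fixed.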

Discussion

Models 1 and 2 tested whether admission criteria predict UBE score. The interaction between LSAT and UGPA was not significant at the 0.05 level, so I prefer the simpler additive model. Both LSAT and UGPA have positive, significant coefficients, which supports my first hypothesis.

Model 3 showed that final GPA is a stronger predictor of UBE score, with a higher R-squared than Model 2. This supports my second hypothesis. Performance during law school appears to be more informative about bar results than the credentials a student came in with.

Models 4 and 5 used logistic regression to predict pass/fail. LSAT and UGPA are both significant. The pseudo R-squared is small, which tells me that admission criteria alone are not enough to explain who passes.

Recommendations

Students with lower LSAT scores should be connected to academic support early in the first year. Models 2 and 5 both show that LSAT is a significant predictor of bar performance. This is an observational analysis, so we cannot say that a low LSAT means failure, but it is an early warning sign.
The school should monitor GPA throughout the program. Model 3 shows that GPA_Final is the strongest predictor of UBE score among the variables I tested. Students below a certain GPA could be automatically enrolled in a bar prep support program before graduation.
Students should be encouraged to complete their bar prep program fully. I did not fit a model with BarPrepCompletion, so this recommendation rests on my third hypothesis rather than on a tested result. The school could check in with students midway through bar prep season to make sure they are on track.
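
The third hypothesis (bar prep completion and passing) could be examined with a logistic regression of the same form as Models 4 and 5. The following is a minimal sketch on simulated data, not a result from the bar data: the data frame sim, the assumed relationship between completion and passing, and the helper fit_completion_model are all hypothetical and introduced only for illustration.

```r
# Simulated stand-in for the bar data; these numbers are NOT from BarData_2025.csv
set.seed(1)
n_students <- 300
sim <- data.frame(BarPrepCompletion = runif(n_students, min = 0.2, max = 1))

# Assumed relationship, for illustration only: higher completion raises the pass probability
passed <- rbinom(n_students, size = 1, prob = plogis(-1 + 3 * sim$BarPrepCompletion))
sim$PassFail <- factor(ifelse(passed == 1, "P", "F"), levels = c("F", "P"))

# Same model form as Model 5, with completion as the only predictor
fit_completion_model <- function(data) {
  glm(PassFail ~ BarPrepCompletion, data = data, family = binomial)
}

model_completion <- fit_completion_model(sim)
print(coef(summary(model_completion)))
```

With the real data the call would be fit_completion_model(df); under R's default NA handling, the 26 rows with a missing BarPrepCompletion would be dropped before fitting.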

Complete Code

df<-read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/BarData_2025.csv")
str(df)

df$UGPA<-as.numeric(df$UGPA)

df$PassFail<-factor(df$PassFail, levels=c("F", "P"))

# Map letter grades to grade points on a 4.0 scale; grades not listed here (e.g. "CR") become NA
grademap<-c("A"=4.0, "A-"=3.7, "B+"=3.3, "B"=3.0, "B-"=2.7, "C+"=2.3, "C"=2.0, "C-"=1.7, "D+"=1.3, "D"=1.0, "D-"=0.7, "F"=0)

df$CivPro_Num<-grademap[df$CivPro]
df$LPI_Num<-grademap[df$LPI]
df$LPII_Num<-grademap[df$LPII]

df$BarPrepCompletion<-as.numeric(df$BarPrepCompletion)

head(df)

colnames(df)

summary(df[, c("UBE", "LSAT", "UGPA", "GPA_Final", "BarPrepCompletion")])

table(df$PassFail)

df_pass<-subset(df, PassFail == "P")
df_fail<-subset(df, PassFail == "F")

boxplot(df_pass$UBE, df_fail$UBE, main = "Side-by-side Boxplot of UBE Scores by Pass/Fail", xlab = "Pass/Fail of Exam", ylab = "UBE Score", names = c("Pass", "Fail"), col = c("green", "red"))

plot(df$LSAT, df$UBE)
abline(lm(df$UBE~df$LSAT))

plot(df$UGPA, df$UBE)
abline(lm(df$UBE~df$UGPA))

plot(df$GPA_Final, df$UBE)
abline(lm(df$UBE~df$GPA_Final))

model1<-lm(UBE~LSAT*UGPA, data=df)
summary(model1)

plot(model1, 1)

plot(model1, 2)

model2<-lm(UBE~LSAT+UGPA, data=df)
summary(model2)

df_m3<-df[complete.cases(df[, c("UBE", "GPA_Final")]), ]
model3<-lm(UBE~GPA_Final, data=df_m3)
summary(model3)

plot(model3, 1)

plot(model3, 2)

model4<-glm(PassFail~LSAT*UGPA, data=df, family=binomial)
anova(model4, test="Chisq")

model5<-glm(PassFail~LSAT+UGPA, data=df, family=binomial)
summary(model5)

anova(model5, test="Chisq")

null_dev<-model5$null.deviance
resid_dev<-model5$deviance

1-pchisq(null_dev-resid_dev, df = 2)

1-resid_dev/null_dev
