I chose the UBE score as the main response variable because it is a continuous measure. I also modeled Pass/Fail using logistic regression because that outcome determines whether a student gets licensed.
Because the UBE can be calculated directly from its component scores (in the rows printed below, UBE equals WrittenScaledScore plus MBE), I did not model those component scores separately.
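A quick way to confirm that the UBE is just the sum of its components is to check the rows printed by head(df). This is a minimal sketch that retypes those six rows by hand; the 0.05 tolerance is my assumption to allow for rounding in the printed values.

```r
# First six rows of the three score columns, copied from the head(df) output
scores <- data.frame(
  WrittenScaledScore = c(125.5, 133.1, 125.5, 125.5, 130.5, 115.4),
  MBE                = c(133.3, 132.7, 118.2, 140.1, 125.4, 113.5),
  UBE                = c(258.8, 265.8, 243.7, 265.6, 255.9, 228.9)
)

# Gap between the reported UBE and the sum of its two components
scores$gap <- scores$UBE - (scores$WrittenScaledScore + scores$MBE)

# TRUE when every row matches to within rounding (0.05 is an assumed tolerance)
all(abs(scores$gap) < 0.05)
```

Running the same check on the full data frame would show whether the relationship holds for all 600 students and not only for these six rows.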
Hypotheses
I expect LSAT and UGPA to predict UBE score because they measure academic ability before law school.
I also expect final law school GPA to predict UBE better than the admission scores because it reflects all three years of law school.
Finally, I expect that students who completed more of their bar prep program will have a higher chance of passing because they spent more time preparing.
df<-read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/BarData_2025.csv")
str(df)
## 'data.frame': 600 obs. of 28 variables:
## $ Year : int 2021 2021 2021 2021 2021 2021 2021 2021 2021 2021 ...
## $ PassFail : chr "F" "F" "F" "F" ...
## $ Age : num 29.1 29.6 29 36.2 28.9 30.8 29.1 42.9 28.3 27.1 ...
## $ LSAT : int 152 155 157 156 145 154 149 160 152 150 ...
## $ UGPA : chr "3.42" "2.82" "3.46" "3.13" ...
## $ CivPro : chr "B+" "B+" "C" "D+" ...
## $ LPI : chr "A" "B" "B" "C" ...
## $ LPII : chr "A" "B" "B" "C+" ...
## $ GPA_1L : num 3.21 2.43 2.62 2.27 2.29 ...
## $ GPA_Final : num 3.29 3.2 2.91 2.77 2.9 2.82 3 3.09 3.21 2.74 ...
## $ FinalRankPercentile : num 0.46 0.33 0.08 0.02 0.08 0.05 0.15 0.22 0.34 0.01 ...
## $ Accommodations : chr "N" "Y" "N" "N" ...
## $ Probation : chr "N" "Y" "N" "Y" ...
## $ LegalAnalysis_TexasPractice: chr "Y" "Y" "Y" "Y" ...
## $ AdvLegalPerfSkills : chr "Y" "Y" "Y" "Y" ...
## $ AdvLegalAnalysis : chr "Y" "Y" "Y" "Y" ...
## $ BarPrepCompany : chr "Barbri" "Barbri" "Barbri" "Barbri" ...
## $ BarPrepCompletion : num 0.96 0.98 0.48 1 0.77 0.02 0.9 0.76 0.77 0.88 ...
## $ OptIntoWritingGuide : chr "" "" "" "" ...
## $ X.LawSchoolBarPrepWorkshops: int 3 0 3 0 5 1 5 5 1 5 ...
## $ StudentSuccessInitiative : chr "N" "Cochran" "Smith" "Baldwin" ...
## $ BarPrepMentor : chr "N" "N" "N" "N" ...
## $ MPRE : num 103 76 99 81 99 NA 90 97 100 78 ...
## $ MPT : num 3 3 3 2.5 3.5 3 2.5 2.5 3 2.5 ...
## $ MEE : num 2.67 3.17 2.67 3 2.67 2 3.5 3 2.67 3.83 ...
## $ WrittenScaledScore : num 126 133 126 126 130 ...
## $ MBE : num 133 133 118 140 125 ...
## $ UBE : num 259 266 244 266 256 ...
df$UGPA<-as.numeric(df$UGPA)
## Warning: NAs introduced by coercion
df$PassFail<-factor(df$PassFail, levels = c("F", "P"))
grademap<-c("A"=4.0, "A-"=3.7, "B+"=3.3, "B"=3.0, "B-"=2.7,"C+"=2.3, "C"=2.0, "C-"=1.7, "D+"=1.3, "D"=1.0, "D-"=0.7, "F"=0)
df$CivPro_Num<-grademap[df$CivPro]
df$LPI_Num<-grademap[df$LPI]
df$LPII_Num<-grademap[df$LPII]
df$BarPrepCompletion<-as.numeric(df$BarPrepCompletion)
head(df)
## Year PassFail Age LSAT UGPA CivPro LPI LPII GPA_1L GPA_Final
## 1 2021 F 29.1 152 3.42 B+ A A 3.206 3.29
## 2 2021 F 29.6 155 2.82 B+ B B 2.431 3.20
## 3 2021 F 29.0 157 3.46 C B B 2.620 2.91
## 4 2021 F 36.2 156 3.13 D+ C C+ 2.275 2.77
## 5 2021 F 28.9 145 3.49 C C+ C+ 2.293 2.90
## 6 2021 F 30.8 154 2.85 B+ F CR 2.538 2.82
## FinalRankPercentile Accommodations Probation LegalAnalysis_TexasPractice
## 1 0.46 N N Y
## 2 0.33 Y Y Y
## 3 0.08 N N Y
## 4 0.02 N Y Y
## 5 0.08 N Y Y
## 6 0.05 N N Y
## AdvLegalPerfSkills AdvLegalAnalysis BarPrepCompany BarPrepCompletion
## 1 Y Y Barbri 0.96
## 2 Y Y Barbri 0.98
## 3 Y Y Barbri 0.48
## 4 Y Y Barbri 1.00
## 5 Y Y Themis 0.77
## 6 Y Y Themis 0.02
## OptIntoWritingGuide X.LawSchoolBarPrepWorkshops StudentSuccessInitiative
## 1 3 N
## 2 0 Cochran
## 3 3 Smith
## 4 0 Baldwin
## 5 5 Baldwin
## 6 1 Rosen
## BarPrepMentor MPRE MPT MEE WrittenScaledScore MBE UBE CivPro_Num LPI_Num
## 1 N 103 3.0 2.67 125.5 133.3 258.8 3.3 4.0
## 2 N 76 3.0 3.17 133.1 132.7 265.8 3.3 3.0
## 3 N 99 3.0 2.67 125.5 118.2 243.7 2.0 3.0
## 4 N 81 2.5 3.00 125.5 140.1 265.6 1.3 2.0
## 5 N 99 3.5 2.67 130.5 125.4 255.9 2.0 2.3
## 6 N NA 3.0 2.00 115.4 113.5 228.9 3.3 0.0
## LPII_Num
## 1 4.0
## 2 3.0
## 3 3.0
## 4 2.3
## 5 2.3
## 6 NA
colnames(df)
## [1] "Year" "PassFail"
## [3] "Age" "LSAT"
## [5] "UGPA" "CivPro"
## [7] "LPI" "LPII"
## [9] "GPA_1L" "GPA_Final"
## [11] "FinalRankPercentile" "Accommodations"
## [13] "Probation" "LegalAnalysis_TexasPractice"
## [15] "AdvLegalPerfSkills" "AdvLegalAnalysis"
## [17] "BarPrepCompany" "BarPrepCompletion"
## [19] "OptIntoWritingGuide" "X.LawSchoolBarPrepWorkshops"
## [21] "StudentSuccessInitiative" "BarPrepMentor"
## [23] "MPRE" "MPT"
## [25] "MEE" "WrittenScaledScore"
## [27] "MBE" "UBE"
## [29] "CivPro_Num" "LPI_Num"
## [31] "LPII_Num"
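I converted letter grades to numbers using a grade map so they can be used in regression. Grades that are not in the map, such as "CR", become NA. UGPA was stored as text, so I converted it to numeric, and any entry that is not a number became NA (this is the coercion warning above). BarPrepCompletion was already numeric, so converting it changes nothing.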
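Since both conversions can silently create NA values, it helps to list which raw entries were lost. The sketch below uses small hypothetical vectors (`grades`, `ugpa_raw`) rather than the real columns, and it sets D- to 0.7, which I assume is the intended value on a 4.0 scale.

```r
# Grade map on a 4.0 scale; D- = 0.7 is an assumed standard value
grademap <- c("A" = 4.0, "A-" = 3.7, "B+" = 3.3, "B" = 3.0, "B-" = 2.7,
              "C+" = 2.3, "C" = 2.0, "C-" = 1.7, "D+" = 1.3, "D" = 1.0,
              "D-" = 0.7, "F" = 0)

# Hypothetical grade vector; "CR" (credit) has no entry in the map
grades <- c("B+", "F", "CR", "D-")
grade_num <- unname(grademap[grades])
grade_num

# Grades that were present in the data but could not be mapped
unmapped <- unique(grades[is.na(grade_num) & !is.na(grades)])
unmapped

# Hypothetical UGPA strings; find the ones as.numeric() cannot parse
ugpa_raw <- c("3.42", "2.82", "n/a", "3.13")
ugpa_num <- suppressWarnings(as.numeric(ugpa_raw))
bad_ugpa <- ugpa_raw[is.na(ugpa_num) & !is.na(ugpa_raw)]
bad_ugpa
```

Applying the same two checks to df$LPII and to the raw UGPA column would show exactly which values turned into NA before any model is fit.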
summary(df[, c("UBE", "LSAT", "UGPA", "GPA_Final", "BarPrepCompletion")])
## UBE LSAT UGPA GPA_Final
## Min. :227.3 Min. :141.0 Min. :2.010 Min. :2.440
## 1st Qu.:280.4 1st Qu.:153.0 1st Qu.:3.280 1st Qu.:3.050
## Median :295.3 Median :156.0 Median :3.540 Median :3.263
## Mean :294.7 Mean :155.6 Mean :3.478 Mean :3.275
## 3rd Qu.:309.6 3rd Qu.:158.0 3rd Qu.:3.740 3rd Qu.:3.500
## Max. :358.7 Max. :171.0 Max. :4.140 Max. :3.990
## NA's :1
## BarPrepCompletion
## Min. :0.000
## 1st Qu.:0.800
## Median :0.900
## Mean :0.865
## 3rd Qu.:0.980
## Max. :1.000
## NA's :26
The summary gives the minimum, maximum, mean, and quartiles of each variable. UGPA has 1 missing value and BarPrepCompletion has 26.
Pass/Fail count
table(df$PassFail)
##
## F P
## 61 539
df_pass<-subset(df, PassFail == "P")
df_fail<-subset(df, PassFail == "F")
boxplot(df_pass$UBE, df_fail$UBE, main="Side-by-side Boxplot of UBE Scores by Pass/Fail",xlab= "Pass/Fail of Exam", ylab="UBE Score", names=c("Pass","Fail"), col=c("green", "red"))
The boxplot shows a clear difference in UBE scores between students who passed and students who failed.
Scatter plots
plot(df$LSAT, df$UBE)
abline(lm(df$UBE~df$LSAT))
plot(df$UGPA, df$UBE)
abline(lm(df$UBE~df$UGPA))
plot(df$GPA_Final, df$UBE)
abline(lm(df$UBE~df$GPA_Final))
I hypothesize that LSAT and UGPA together predict UBE score. I first tested whether the two variables interact with each other.
model1<-lm(UBE~LSAT*UGPA, data=df)
summary(model1)
##
## Call:
## lm(formula = UBE ~ LSAT * UGPA, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -66.392 -13.162 0.862 14.074 53.757
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -618.7953 349.9206 -1.768 0.0775 .
## LSAT 5.5922 2.2303 2.507 0.0124 *
## UGPA 188.4714 97.5901 1.931 0.0539 .
## LSAT:UGPA -1.1317 0.6223 -1.819 0.0695 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 20.37 on 595 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.103, Adjusted R-squared: 0.09845
## F-statistic: 22.77 on 3 and 595 DF, p-value: 5.777e-14
plot(model1, 1)
plot(model1, 2)
The interaction term is not significant at the 0.05 level (p = 0.0695), so I will try the simpler model without the interaction.
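The same decision can be framed as a comparison of two nested models with a partial F-test. This is a minimal sketch on simulated data (the data frame `sim` and its coefficients are made up for illustration and are not the bar data); it also shows that, for a single added term, the F statistic equals the square of that term's t value.

```r
set.seed(1)
n <- 200

# Simulated stand-in data; the generating values are arbitrary assumptions
sim <- data.frame(
  LSAT = round(rnorm(n, mean = 155, sd = 5)),
  UGPA = round(runif(n, min = 2.5, max = 4), 2)
)
sim$UBE <- 20 + 1.5 * sim$LSAT + 11 * sim$UGPA + rnorm(n, sd = 20)

additive <- lm(UBE ~ LSAT + UGPA, data = sim)
with_int <- lm(UBE ~ LSAT * UGPA, data = sim)

# Partial F-test: does adding LSAT:UGPA reduce the residual sum of squares?
cmp <- anova(additive, with_int)
cmp

# With one added term, the F statistic is the squared t value of that term
f_stat <- cmp$F[2]
t_stat <- coef(summary(with_int))["LSAT:UGPA", "t value"]
c(F = f_stat, t_squared = t_stat^2)
```

On the real data, anova(model2, model1) would therefore give the same p-value as the LSAT:UGPA row of summary(model1).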
model2<-lm(UBE~LSAT+UGPA, data=df)
summary(model2)
##
## Call:
## lm(formula = UBE ~ LSAT + UGPA, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -66.221 -13.466 1.022 14.406 54.180
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 14.1607 36.1339 0.392 0.695
## LSAT 1.5557 0.2183 7.125 3.02e-12 ***
## UGPA 11.0483 2.2807 4.844 1.62e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 20.41 on 596 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.09799, Adjusted R-squared: 0.09496
## F-statistic: 32.37 on 2 and 596 DF, p-value: 4.499e-14
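Both LSAT and UGPA are significant predictors of UBE score.
The R-squared of 0.098 means the model explains about 10% of the variation in UBE score. The positive, significant coefficients support my first hypothesis, although most of the variation remains unexplained.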
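To make the meaning of R-squared concrete, it can be computed by hand as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below does this on simulated data (again a made-up `sim` data frame, not the bar data) and compares the result with the value reported by summary().

```r
set.seed(2)
n <- 200

# Simulated stand-in data; the generating values are arbitrary assumptions
sim <- data.frame(
  LSAT = round(rnorm(n, mean = 155, sd = 5)),
  UGPA = round(runif(n, min = 2.5, max = 4), 2)
)
sim$UBE <- 20 + 1.5 * sim$LSAT + 11 * sim$UGPA + rnorm(n, sd = 20)

fit <- lm(UBE ~ LSAT + UGPA, data = sim)

# Residual and total sums of squares
rss <- sum(residuals(fit)^2)
tss <- sum((sim$UBE - mean(sim$UBE))^2)

# Share of the variation in UBE that the model accounts for
r2_manual <- 1 - rss / tss
c(manual = r2_manual, reported = summary(fit)$r.squared)
```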
I want to see if final law school GPA predicts bar performance better than admission scores since it covers all three years of law school.
df_m3<-df[complete.cases(df[, c("UBE", "GPA_Final")]), ]
model3<-lm(UBE~GPA_Final, data=df_m3)
summary(model3)
##
## Call:
## lm(formula = UBE ~ GPA_Final, data = df_m3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -63.100 -11.370 -0.115 11.630 51.048
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 168.344 7.399 22.75 <2e-16 ***
## GPA_Final 38.585 2.249 17.16 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.56 on 598 degrees of freedom
## Multiple R-squared: 0.3299, Adjusted R-squared: 0.3288
## F-statistic: 294.5 on 1 and 598 DF, p-value: < 2.2e-16
plot(model3, 1)
plot(model3, 2)
GPA_Final is significant (p < 2e-16). The R-squared of 0.33 is higher than the 0.098 for Model 2, which means final law school GPA explains more of the variation in UBE score than the admission scores do. This supports my second hypothesis.
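Model 2 and Model 3 are not nested and were fit on slightly different numbers of rows (599 and 600), so a fairer comparison fits both on the same complete cases and looks at adjusted R-squared and AIC. This is a sketch on simulated data; the data frame `sim` and the names `admission` and `law_gpa` are hypothetical.

```r
set.seed(3)
n <- 200

# Simulated stand-in data; the generating values are arbitrary assumptions
sim <- data.frame(
  LSAT      = round(rnorm(n, mean = 155, sd = 5)),
  UGPA      = round(runif(n, min = 2.5, max = 4), 2),
  GPA_Final = round(runif(n, min = 2.4, max = 4), 2)
)
sim$UBE <- 100 + 0.5 * sim$LSAT + 5 * sim$UGPA + 30 * sim$GPA_Final +
  rnorm(n, sd = 17)
sim$UGPA[1] <- NA  # mimic the single missing UGPA

# Keep only rows that are complete for every variable used by either model
vars <- c("UBE", "LSAT", "UGPA", "GPA_Final")
same_rows <- sim[complete.cases(sim[, vars]), ]

admission <- lm(UBE ~ LSAT + UGPA, data = same_rows)
law_gpa   <- lm(UBE ~ GPA_Final, data = same_rows)

# Side-by-side comparison on identical observations
comparison <- data.frame(
  model  = c("LSAT + UGPA", "GPA_Final"),
  n      = c(nobs(admission), nobs(law_gpa)),
  adj_r2 = c(summary(admission)$adj.r.squared, summary(law_gpa)$adj.r.squared),
  AIC    = c(AIC(admission), AIC(law_gpa))
)
comparison
```

A higher adjusted R-squared and a lower AIC for the same rows would point to the better-fitting model.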
Here I use logistic regression to predict whether a student passes or fails directly. I first test the interaction model.
model4<-glm(PassFail~LSAT*UGPA, data=df, family=binomial)
anova(model4, test="Chisq")
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: PassFail
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 598 394.26
## LSAT 1 17.3830 597 376.88 3.056e-05 ***
## UGPA 1 6.6384 596 370.24 0.00998 **
## LSAT:UGPA 1 0.1542 595 370.09 0.69455
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The interaction term is not significant (p = 0.695), so let's try the simpler model.
model5<-glm(PassFail~LSAT+UGPA, data=df, family=binomial)
summary(model5)
##
## Call:
## glm(formula = PassFail ~ LSAT + UGPA, family = binomial, data = df)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -27.32850 6.24420 -4.377 1.21e-05 ***
## LSAT 0.16872 0.03717 4.539 5.66e-06 ***
## UGPA 0.98478 0.37537 2.624 0.0087 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 394.26 on 598 degrees of freedom
## Residual deviance: 370.24 on 596 degrees of freedom
## (1 observation deleted due to missingness)
## AIC: 376.24
##
## Number of Fisher Scoring iterations: 5
anova(model5, test = "Chisq")
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: PassFail
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 598 394.26
## LSAT 1 17.3830 597 376.88 3.056e-05 ***
## UGPA 1 6.6384 596 370.24 0.00998 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
null_dev<-model5$null.deviance
resid_dev<-model5$deviance
1-pchisq(null_dev-resid_dev, df=2)
## [1] 6.0788e-06
1-resid_dev/null_dev
## [1] 0.06092768
The chi-square p-value (6.08e-06) shows the model is statistically significant overall. The pseudo R-squared of 0.061 is the proportional reduction in deviance relative to the intercept-only model, so LSAT and UGPA account for only about 6% of the deviance. Both LSAT and UGPA are significant predictors of passing the bar exam.
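The numbers in this step can be reproduced from the rounded values printed in the Model 5 summary, and the coefficients are easier to read as odds ratios. This sketch types those printed values in by hand, so the results agree with the output above only up to rounding.

```r
# Deviances as printed in the Model 5 summary (rounded to two decimals)
null_dev  <- 394.26
resid_dev <- 370.24

# Likelihood ratio test against the intercept-only model (2 added terms)
lr_stat <- null_dev - resid_dev
p_value <- pchisq(lr_stat, df = 2, lower.tail = FALSE)

# Deviance-based pseudo R-squared: share of the null deviance removed
pseudo_r2 <- 1 - resid_dev / null_dev

# Coefficients as printed in the Model 5 summary, turned into odds ratios
odds_ratios <- exp(c(LSAT = 0.16872, UGPA = 0.98478))

p_value
pseudo_r2
round(odds_ratios, 2)
```

Read this way, each additional LSAT point multiplies the odds of passing by about 1.18, and each additional UGPA point multiplies them by about 2.68, holding the other variable fixed.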
Models 1 and 2 tested whether admission criteria predict UBE score. The interaction between LSAT and UGPA was not significant, so I prefer the simpler additive model. Both LSAT and UGPA are positive predictors, which supports my first hypothesis.
Model 3 showed that final GPA is a stronger predictor of UBE score, with a higher R-squared than Model 2. This supports my second hypothesis. It suggests that what a student does during law school matters more than what they came in with.
Models 4 and 5 used logistic regression to predict pass/fail. LSAT and UGPA are both significant. The pseudo R-squared is small which tells me admission criteria alone are not enough to fully explain who passes.
Students with lower LSAT scores should be connected to academic support early in their first year. Models 2 and 5 both show LSAT is a significant predictor of bar performance. This is observational data, so we cannot say a low LSAT means failure, but it is an early warning sign.
The school should monitor law school GPA throughout the program. Model 3 shows GPA_Final is the strongest predictor of UBE score among the variables I tested. Students below a certain GPA could be automatically enrolled in a bar prep support program before graduation.
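Students should be encouraged to complete their bar prep program fully. I hypothesized that higher completion raises the chance of passing, but the boxplot above compares UBE scores by Pass/Fail and none of the models include BarPrepCompletion, so this recommendation still needs to be checked against the data. The school could check in with students midway through bar prep season to make sure they are on track.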
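One way the bar prep completion hypothesis could be tested is a logistic regression of PassFail on BarPrepCompletion. The sketch below runs on simulated data only: the data frame `sim`, the pass probabilities, and the ten missing values are all made up for illustration, so its output says nothing about the real students.

```r
set.seed(4)
n <- 300

# Simulated completion rates between 0 and 1
sim <- data.frame(BarPrepCompletion = round(runif(n), 2))

# Hypothetical relationship used only to generate example outcomes
p_pass <- plogis(-1 + 3 * sim$BarPrepCompletion)
sim$PassFail <- factor(ifelse(runif(n) < p_pass, "P", "F"),
                       levels = c("F", "P"))

# Mimic missing completion records; glm() drops these rows by default
sim$BarPrepCompletion[1:10] <- NA

# Logistic regression for the probability of "P" (the second factor level)
fit <- glm(PassFail ~ BarPrepCompletion, data = sim, family = binomial)
summary(fit)$coefficients

# Number of students actually used in the fit
nobs(fit)
```

On the real data, the same call with data=df would use only the students with a recorded completion rate, since 26 values of BarPrepCompletion are missing.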