This report presents a logistic regression analysis predicting bar passage (PassFail) using data from a law school. The analysis includes data cleaning, model building, interpretation of significant predictors, visualizations, model diagnostics, and evidence-based recommendations.
#install.packages("caret", dependencies = TRUE, type = "binary")
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.4.3
## Warning: package 'ggplot2' was built under R version 4.4.3
## Warning: package 'tibble' was built under R version 4.4.3
## Warning: package 'tidyr' was built under R version 4.4.3
## Warning: package 'readr' was built under R version 4.4.3
## Warning: package 'purrr' was built under R version 4.4.3
## Warning: package 'dplyr' was built under R version 4.4.3
## Warning: package 'stringr' was built under R version 4.4.3
## Warning: package 'forcats' was built under R version 4.4.3
## Warning: package 'lubridate' was built under R version 4.4.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(GGally)
## Warning: package 'GGally' was built under R version 4.4.3
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
library(caret)
## Warning: package 'caret' was built under R version 4.4.3
## Loading required package: lattice
##
## Attaching package: 'caret'
##
## The following object is masked from 'package:purrr':
##
## lift
library(MASS)
##
## Attaching package: 'MASS'
##
## The following object is masked from 'package:dplyr':
##
## select
library(pscl)
## Warning: package 'pscl' was built under R version 4.4.3
## Classes and Methods for R originally developed in the
## Political Science Computational Laboratory
## Department of Political Science
## Stanford University (2002-2015),
## by and under the direction of Simon Jackman.
## hurdle and zeroinfl functions by Achim Zeileis.
library(dplyr)
The dataset was imported from a CSV file hosted on GitHub. The structure of the data was examined to understand the variables and their types.
#import the dataset
data <- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/refs/heads/main/Updated_Bar_Data_For_Review_Final.csv")
str(data)
## 'data.frame': 476 obs. of 28 variables:
## $ Year : int 2021 2021 2021 2021 2021 2021 2021 2021 2021 2021 ...
## $ PassFail : chr "F" "F" "F" "F" ...
## $ Age : num 29.1 29.6 29 36.2 28.9 30.8 29.1 42.9 28.3 27.1 ...
## $ LSAT : int 152 155 157 156 145 154 149 160 152 150 ...
## $ UGPA : num 3.42 2.82 3.46 3.13 3.49 2.85 3.43 3.29 3.62 3.07 ...
## $ CivPro : chr "B+" "B+" "C" "D+" ...
## $ LPI : chr "A" "B" "B" "C" ...
## $ LPII : chr "A" "B" "B" "C+" ...
## $ GPA_1L : num 3.21 2.43 2.62 2.27 2.29 ...
## $ GPA_Final : num 3.29 3.2 2.91 2.77 2.9 2.82 3 3.09 3.21 2.74 ...
## $ FinalRankPercentile : num 0.46 0.33 0.08 0.02 0.08 0.05 0.15 0.22 0.34 0.01 ...
## $ Accommodations : chr "N" "Y" "N" "N" ...
## $ Probation : chr "N" "Y" "N" "Y" ...
## $ LegalAnalysis_TexasPractice: chr "Y" "Y" "Y" "Y" ...
## $ AdvLegalPerfSkills : chr "Y" "Y" "Y" "Y" ...
## $ AdvLegalAnalysis : chr "Y" "Y" "Y" "Y" ...
## $ BarPrepCompany : chr "Barbri" "Barbri" "Barbri" "Barbri" ...
## $ BarPrepCompletion : num 0.96 0.98 0.48 1 0.77 0.02 0.9 0.76 0.77 0.88 ...
## $ OptIntoWritingGuide : chr "" "" "" "" ...
## $ X.LawSchoolBarPrepWorkshops: int 3 0 3 0 5 1 5 5 1 5 ...
## $ StudentSuccessInitiative : chr "N" "Cochran" "Smith" "Baldwin" ...
## $ BarPrepMentor : chr "N" "N" "N" "N" ...
## $ MPRE : num 103 76 99 81 99 NA 90 97 100 78 ...
## $ MPT : num 3 3 3 2.5 3.5 3 2.5 2.5 3 2.5 ...
## $ MEE : num 2.67 3.17 2.67 3 2.67 2 3.5 3 2.67 3.83 ...
## $ WrittenScaledScore : num 126 133 126 126 130 ...
## $ MBE : num 133 133 118 140 125 ...
## $ UBE : num 259 266 244 266 256 ...
The following variables were selected for analysis:
Numeric Variables: LSAT, UGPA, GPA_1L, GPA_Final, FinalRankPercentile, BarPrepCompletion, X.LawSchoolBarPrepWorkshops
Categorical Variables: Accommodations, Probation, LegalAnalysis_TexasPractice, AdvLegalPerfSkills, AdvLegalAnalysis, BarPrepCompany, StudentSuccessInitiative, BarPrepMentor, CivPro, LPI, LPII
The target variable is PassFail.
# data clean
numeric_vars <- c("LSAT", "UGPA", "GPA_1L", "GPA_Final", "FinalRankPercentile", "BarPrepCompletion", "X.LawSchoolBarPrepWorkshops",
"LSAT", "UGPA")
categorical_vars <- c("Accommodations", "Probation", "LegalAnalysis_TexasPractice",
"AdvLegalPerfSkills", "AdvLegalAnalysis", "BarPrepCompany",
"StudentSuccessInitiative", "BarPrepMentor", "CivPro", "LPI", "LPII")
data <- data %>%
dplyr::select(all_of(c(numeric_vars, categorical_vars, "PassFail")))
The PassFail variable was converted to a factor to represent the binary outcome (pass/fail). Additionally, the StudentSuccessInitiative and BarPrepMentor variables were transformed into binary factors (0/1) based on their values.
# transform data
data$PassFail <- as.factor(data$PassFail)
data <- data %>%
mutate(
StudentSuccessInitiative = ifelse(toupper(StudentSuccessInitiative) %in% c("N", "NO"), 0, 1),
BarPrepMentor = ifelse(toupper(BarPrepMentor) %in% c("N", "NO"), 0, 1)
) %>%
mutate(across(c(StudentSuccessInitiative, BarPrepMentor), as.factor))
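Rows with missing values (NA) were removed from the dataset to ensure data integrity for the analysis. Note that na.omit() drops only NA values: any blank strings in the character columns are kept and become their own factor level.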
data_clean<-na.omit(data)
data_clean[categorical_vars] <- lapply(data_clean[categorical_vars], as.factor)
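Blank strings are easy to miss at this step. The following is a minimal sketch on a made-up data frame (the values and the helper name blank_to_na are hypothetical, not taken from the bar data) of how blanks could be converted to NA before rows are dropped:

```r
# Toy data frame (hypothetical values) with one NA and one blank string
toy <- data.frame(
  LSAT = c(152, 155, NA, 149),
  Probation = c("N", "", "Y", "N"),
  stringsAsFactors = FALSE
)

# na.omit() alone removes only the row that contains the NA
nrow(na.omit(toy))

# Hypothetical helper: turn blank or whitespace-only strings into NA
blank_to_na <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], function(x) {
    x[!is.na(x) & trimws(x) == ""] <- NA
    x
  })
  df
}

toy_clean <- na.omit(blank_to_na(toy))
toy_clean$Probation <- factor(toy_clean$Probation)
levels(toy_clean$Probation)  # the blank level is gone
```

Whether this is appropriate for the real data depends on what a blank means in each column, so it is shown only as an option.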
Before building the logistic regression model, I performed several exploratory data analysis (EDA) steps to understand the bar exam outcomes and the factors affecting them.
The first visualization is a bar chart that shows the distribution of bar exam outcomes (Pass/Fail). It illustrates that the majority of students passed the bar exam, with approximately 400 students passing and around 50 failing. This indicates a relatively high overall pass rate but also reveals an imbalance in the dataset, which could affect the modeling process.
ggplot(data_clean, aes(x = PassFail)) +
geom_bar(fill = "steelblue") +
theme_minimal() +
labs(title = "Distribution of Bar Exam Outcomes")
The pairs plot provides a matrix of scatterplots showing the relationships between LSAT scores, UGPA, GPA_1L, GPA_Final, and PassFail. This visualization helps identify correlations between these numeric variables and their relationship with the PassFail outcome. It reveals patterns such as the positive correlation between LSAT scores and PassFail, suggesting that higher LSAT scores are associated with a higher likelihood of passing the bar.
pairs(data_clean[, c("LSAT", "UGPA", "GPA_1L", "GPA_Final", "PassFail")],
lower.panel = NULL)
A stacked bar chart illustrates the pass rates grouped by the grades received in the CivPro course. It shows that students with higher CivPro grades (such as A and B) have a higher pass rate, while those with lower grades (such as D and F) have a lower pass rate. This suggests that performance in the CivPro course is associated with bar exam success.
ggplot(data_clean, aes(x = CivPro, fill = PassFail)) +
geom_bar(position = "fill") +
theme_minimal() +
labs(title = "Pass rates grouped by CivPro course grades")
The histogram displays the distribution of LSAT scores, filled by PassFail status. It shows that students with higher LSAT scores are more likely to pass the bar exam, with the pass rate increasing as LSAT scores rise. This visualization reinforces the importance of LSAT scores as a predictor of bar passage.
ggplot(data_clean, aes(x = LSAT, fill = PassFail)) +
geom_histogram(position = "fill", bins = 20) +
theme_minimal() +
labs(title = "The relationship between LSAT scores and pass rates")
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_bar()`).
The dataset was split into training and testing sets using an 80-20 split to evaluate the model’s performance.
# train data and test data
set.seed(123)
train_index <- createDataPartition(data_clean$PassFail, p = 0.8, list = FALSE)
train_data <- data_clean[train_index, ]
test_data <- data_clean[-train_index, ]
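With a factor outcome, createDataPartition() samples within each class, so the pass/fail proportions stay similar in both sets. The idea can be sketched in base R on a hypothetical outcome vector (the 400/50 counts only mimic the imbalance described earlier and are not the real data):

```r
set.seed(123)
# Hypothetical outcome vector with roughly the imbalance seen in the bar chart
outcome <- factor(c(rep("P", 400), rep("F", 50)))

# Stratified 80/20 split: sample 80% of the row indices within each class
train_idx <- unlist(lapply(split(seq_along(outcome), outcome), function(idx) {
  idx[sample.int(length(idx), size = round(0.8 * length(idx)))]
}))

train_y <- outcome[train_idx]
test_y  <- outcome[-train_idx]

# Class proportions should be close to identical in both parts
prop.table(table(train_y))
prop.table(table(test_y))
```

This is an illustration of stratified sampling in general, not a reproduction of caret's internal algorithm.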
A base logistic regression model was fit using the glm() function with family = binomial. The model includes multiple predictors such as LSAT, UGPA, GPA_1L, GPA_Final, and others.
# define base formula
base_formula <- as.formula(
PassFail ~ LSAT + UGPA + GPA_1L + GPA_Final + FinalRankPercentile +
Accommodations + Probation + BarPrepCompany +
BarPrepCompletion + X.LawSchoolBarPrepWorkshops + StudentSuccessInitiative +
LegalAnalysis_TexasPractice+AdvLegalPerfSkills+AdvLegalAnalysis+
BarPrepMentor + CivPro + LPI + LPII
)
train_data$CivPro <- factor(train_data$CivPro)
levels(train_data$CivPro)
## [1] "A" "B" "B+" "C" "C+" "D" "D+" "F"
# base model
base_model <- glm(base_formula, data = train_data, family = binomial)
# model summary
summary(base_model)
##
## Call:
## glm(formula = base_formula, family = binomial, data = train_data)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -104.60414 26.82664 -3.899 9.65e-05 ***
## LSAT 0.36833 0.10132 3.635 0.000278 ***
## UGPA 1.99226 1.01718 1.959 0.050158 .
## GPA_1L 1.01417 1.87737 0.540 0.589055
## GPA_Final 10.42274 5.12917 2.032 0.042148 *
## FinalRankPercentile -7.38388 5.77946 -1.278 0.201388
## AccommodationsY -0.08459 0.76237 -0.111 0.911652
## ProbationN 3.66255 2.43391 1.505 0.132375
## ProbationY -0.31466 0.92319 -0.341 0.733225
## BarPrepCompanyHelix 20.10688 3956.18073 0.005 0.995945
## BarPrepCompanyKaplan -1.68251 1.74302 -0.965 0.334401
## BarPrepCompanyThemis 1.99871 0.72372 2.762 0.005750 **
## BarPrepCompletion 11.32612 2.50556 4.520 6.17e-06 ***
## X.LawSchoolBarPrepWorkshops -0.14763 0.14938 -0.988 0.323001
## StudentSuccessInitiative1 -0.75186 0.90683 -0.829 0.407042
## LegalAnalysis_TexasPracticeY -2.17007 0.95140 -2.281 0.022553 *
## AdvLegalPerfSkillsY 0.54675 1.03335 0.529 0.596734
## AdvLegalAnalysisY 0.06820 0.86919 0.078 0.937456
## BarPrepMentor1 -0.45159 0.78578 -0.575 0.565497
## CivProB 1.56339 1.51427 1.032 0.301866
## CivProB+ 2.18549 1.62478 1.345 0.178593
## CivProC -0.44261 1.61990 -0.273 0.784674
## CivProC+ -0.49890 1.47655 -0.338 0.735452
## CivProD -17.86884 3956.18076 -0.005 0.996396
## CivProD+ 0.15341 1.85498 0.083 0.934089
## CivProF 11.31209 3956.18100 0.003 0.997719
## LPIB 1.25940 1.15632 1.089 0.276088
## LPIB+ -0.02529 1.14724 -0.022 0.982411
## LPIC 1.33607 1.44735 0.923 0.355947
## LPIC+ 0.98988 1.17358 0.843 0.398967
## LPID 4.06440 2.97710 1.365 0.172184
## LPID+ -0.50200 2.23772 -0.224 0.822495
## LPIIB 0.22056 1.28326 0.172 0.863535
## LPIIB+ 0.85044 1.31570 0.646 0.518033
## LPIIC 2.96086 1.79341 1.651 0.098746 .
## LPIIC+ 0.82102 1.38844 0.591 0.554304
## LPIICR 0.44386 1.44942 0.306 0.759429
## LPIID 20.49104 2071.19633 0.010 0.992106
## LPIID+ 18.87357 1587.41827 0.012 0.990514
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 255.29 on 359 degrees of freedom
## Residual deviance: 111.69 on 321 degrees of freedom
## AIC: 189.69
##
## Number of Fisher Scoring iterations: 16
# correlation
# numeric_vars lists LSAT and UGPA twice; unique() avoids duplicated LSAT.1 / UGPA.1 rows and columns
cor_matrix <- cor(train_data[, unique(numeric_vars)])
print(cor_matrix)
## LSAT UGPA GPA_1L GPA_Final
## LSAT 1.00000000 -0.19714583 0.2047858 0.09983408
## UGPA -0.19714583 1.00000000 0.1982986 0.28036589
## GPA_1L 0.20478577 0.19829855 1.0000000 0.86321663
## GPA_Final 0.09983408 0.28036589 0.8632166 1.00000000
## FinalRankPercentile 0.12107579 0.28664445 0.8660633 0.98042850
## BarPrepCompletion -0.14275201 0.14829724 0.1561402 0.25454658
## X.LawSchoolBarPrepWorkshops -0.14601247 0.05345049 -0.2079524 -0.09012703
## FinalRankPercentile BarPrepCompletion
## LSAT 0.1210758 -0.1427520
## UGPA 0.2866445 0.1482972
## GPA_1L 0.8660633 0.1561402
## GPA_Final 0.9804285 0.2545466
## FinalRankPercentile 1.0000000 0.2421480
## BarPrepCompletion 0.2421480 1.0000000
## X.LawSchoolBarPrepWorkshops -0.1068939 0.1197472
## X.LawSchoolBarPrepWorkshops
## LSAT -0.14601247
## UGPA 0.05345049
## GPA_1L -0.20795243
## GPA_Final -0.09012703
## FinalRankPercentile -0.10689386
## BarPrepCompletion 0.11974716
## X.LawSchoolBarPrepWorkshops 1.00000000
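The matrix is easier to act on when the strongly correlated pairs are listed explicitly. The sketch below copies four of the variables from the output above (rounded to three decimals) and flags pairs above a cutoff of 0.8; the cutoff is my assumption, not a value taken from the analysis.

```r
# Correlations copied from the printed matrix (rounded to three decimals)
vars <- c("GPA_1L", "GPA_Final", "FinalRankPercentile", "LSAT")
r <- matrix(c(
  1.000, 0.863, 0.866, 0.205,
  0.863, 1.000, 0.980, 0.100,
  0.866, 0.980, 1.000, 0.121,
  0.205, 0.100, 0.121, 1.000
), nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

# Keep each pair once (upper triangle) and flag |r| above the chosen cutoff
cutoff <- 0.8  # assumed threshold
idx <- which(abs(r) > cutoff & upper.tri(r), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(r)[idx[, "row"]],
  var2 = colnames(r)[idx[, "col"]],
  r = r[idx]
)
high_pairs
```

GPA_1L, GPA_Final, and FinalRankPercentile are all flagged against each other, which is consistent with stepwise selection later keeping only GPA_Final from this group.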
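Stepwise selection by AIC (stepAIC() from MASS, searching in both directions) was used to refine the model. At each step the term whose removal or addition lowers AIC the most is dropped or added, until no change improves AIC.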
# step model
step_model <- stepAIC(base_model, direction = "both")
## Start: AIC=189.69
## PassFail ~ LSAT + UGPA + GPA_1L + GPA_Final + FinalRankPercentile +
## Accommodations + Probation + BarPrepCompany + BarPrepCompletion +
## X.LawSchoolBarPrepWorkshops + StudentSuccessInitiative +
## LegalAnalysis_TexasPractice + AdvLegalPerfSkills + AdvLegalAnalysis +
## BarPrepMentor + CivPro + LPI + LPII
##
## Df Deviance AIC
## - LPI 6 116.61 182.61
## - AdvLegalAnalysis 1 111.69 187.69
## - Accommodations 1 111.70 187.70
## - AdvLegalPerfSkills 1 111.97 187.97
## - GPA_1L 1 111.98 187.99
## - BarPrepMentor 1 112.02 188.02
## - LPII 7 124.13 188.13
## - StudentSuccessInitiative 1 112.39 188.39
## - CivPro 7 124.60 188.60
## - X.LawSchoolBarPrepWorkshops 1 112.67 188.67
## - Probation 2 114.77 188.77
## - FinalRankPercentile 1 113.36 189.36
## <none> 111.69 189.69
## - UGPA 1 115.61 191.61
## - GPA_Final 1 116.17 192.17
## - LegalAnalysis_TexasPractice 1 117.36 193.36
## - BarPrepCompany 3 123.48 195.49
## - LSAT 1 128.08 204.08
## - BarPrepCompletion 1 147.25 223.25
##
## Step: AIC=182.61
## PassFail ~ LSAT + UGPA + GPA_1L + GPA_Final + FinalRankPercentile +
## Accommodations + Probation + BarPrepCompany + BarPrepCompletion +
## X.LawSchoolBarPrepWorkshops + StudentSuccessInitiative +
## LegalAnalysis_TexasPractice + AdvLegalPerfSkills + AdvLegalAnalysis +
## BarPrepMentor + CivPro + LPII
##
## Df Deviance AIC
## - GPA_1L 1 116.61 180.62
## - Accommodations 1 116.62 180.62
## - AdvLegalAnalysis 1 116.64 180.64
## - LPII 7 128.95 180.95
## - BarPrepMentor 1 117.20 181.20
## - AdvLegalPerfSkills 1 117.37 181.37
## - FinalRankPercentile 1 117.53 181.53
## - StudentSuccessInitiative 1 118.05 182.05
## - X.LawSchoolBarPrepWorkshops 1 118.16 182.16
## <none> 116.61 182.61
## - CivPro 7 130.66 182.66
## - Probation 2 121.00 183.00
## - UGPA 1 119.53 183.53
## - GPA_Final 1 120.06 184.06
## - LegalAnalysis_TexasPractice 1 123.66 187.66
## - BarPrepCompany 3 129.44 189.44
## + LPI 6 111.69 189.69
## - LSAT 1 133.50 197.50
## - BarPrepCompletion 1 151.85 215.85
##
## Step: AIC=180.61
## PassFail ~ LSAT + UGPA + GPA_Final + FinalRankPercentile + Accommodations +
## Probation + BarPrepCompany + BarPrepCompletion + X.LawSchoolBarPrepWorkshops +
## StudentSuccessInitiative + LegalAnalysis_TexasPractice +
## AdvLegalPerfSkills + AdvLegalAnalysis + BarPrepMentor + CivPro +
## LPII
##
## Df Deviance AIC
## - Accommodations 1 116.62 178.62
## - AdvLegalAnalysis 1 116.64 178.64
## - BarPrepMentor 1 117.21 179.21
## - AdvLegalPerfSkills 1 117.38 179.38
## - FinalRankPercentile 1 117.53 179.53
## - StudentSuccessInitiative 1 118.05 180.05
## - LPII 7 130.19 180.19
## - X.LawSchoolBarPrepWorkshops 1 118.23 180.23
## <none> 116.61 180.62
## - Probation 2 121.00 181.00
## - UGPA 1 119.54 181.54
## - CivPro 7 131.92 181.92
## - GPA_Final 1 120.17 182.17
## + GPA_1L 1 116.61 182.61
## - LegalAnalysis_TexasPractice 1 123.98 185.98
## - BarPrepCompany 3 129.51 187.51
## + LPI 6 111.98 187.99
## - LSAT 1 134.27 196.27
## - BarPrepCompletion 1 151.85 213.85
##
## Step: AIC=178.62
## PassFail ~ LSAT + UGPA + GPA_Final + FinalRankPercentile + Probation +
## BarPrepCompany + BarPrepCompletion + X.LawSchoolBarPrepWorkshops +
## StudentSuccessInitiative + LegalAnalysis_TexasPractice +
## AdvLegalPerfSkills + AdvLegalAnalysis + BarPrepMentor + CivPro +
## LPII
##
## Df Deviance AIC
## - AdvLegalAnalysis 1 116.64 176.64
## - BarPrepMentor 1 117.22 177.22
## - AdvLegalPerfSkills 1 117.40 177.40
## - FinalRankPercentile 1 117.55 177.55
## - StudentSuccessInitiative 1 118.07 178.07
## - X.LawSchoolBarPrepWorkshops 1 118.26 178.26
## - LPII 7 130.38 178.38
## <none> 116.62 178.62
## - Probation 2 121.02 179.02
## - UGPA 1 119.54 179.54
## - CivPro 7 131.99 179.99
## - GPA_Final 1 120.21 180.21
## + Accommodations 1 116.61 180.62
## + GPA_1L 1 116.62 180.62
## - LegalAnalysis_TexasPractice 1 124.18 184.18
## - BarPrepCompany 3 129.53 185.53
## + LPI 6 112.02 186.02
## - LSAT 1 134.27 194.27
## - BarPrepCompletion 1 151.90 211.90
##
## Step: AIC=176.64
## PassFail ~ LSAT + UGPA + GPA_Final + FinalRankPercentile + Probation +
## BarPrepCompany + BarPrepCompletion + X.LawSchoolBarPrepWorkshops +
## StudentSuccessInitiative + LegalAnalysis_TexasPractice +
## AdvLegalPerfSkills + BarPrepMentor + CivPro + LPII
##
## Df Deviance AIC
## - BarPrepMentor 1 117.26 175.26
## - FinalRankPercentile 1 117.58 175.58
## - AdvLegalPerfSkills 1 117.85 175.85
## - StudentSuccessInitiative 1 118.15 176.15
## - X.LawSchoolBarPrepWorkshops 1 118.27 176.27
## - LPII 7 130.38 176.38
## <none> 116.64 176.64
## - UGPA 1 119.55 177.55
## - Probation 2 121.57 177.57
## - GPA_Final 1 120.22 178.22
## - CivPro 7 132.58 178.58
## + AdvLegalAnalysis 1 116.62 178.62
## + Accommodations 1 116.64 178.64
## + GPA_1L 1 116.64 178.64
## - LegalAnalysis_TexasPractice 1 124.18 182.18
## + LPI 6 112.05 184.05
## - BarPrepCompany 3 131.25 185.25
## - LSAT 1 134.43 192.43
## - BarPrepCompletion 1 152.40 210.40
##
## Step: AIC=175.26
## PassFail ~ LSAT + UGPA + GPA_Final + FinalRankPercentile + Probation +
## BarPrepCompany + BarPrepCompletion + X.LawSchoolBarPrepWorkshops +
## StudentSuccessInitiative + LegalAnalysis_TexasPractice +
## AdvLegalPerfSkills + CivPro + LPII
##
## Df Deviance AIC
## - FinalRankPercentile 1 118.15 174.15
## - StudentSuccessInitiative 1 118.62 174.62
## - AdvLegalPerfSkills 1 118.73 174.73
## - LPII 7 130.93 174.93
## <none> 117.26 175.26
## - X.LawSchoolBarPrepWorkshops 1 119.56 175.56
## - UGPA 1 119.88 175.88
## - Probation 2 122.08 176.08
## - GPA_Final 1 120.64 176.64
## + BarPrepMentor 1 116.64 176.64
## - CivPro 7 132.89 176.89
## + AdvLegalAnalysis 1 117.22 177.22
## + GPA_1L 1 117.24 177.24
## + Accommodations 1 117.26 177.26
## - LegalAnalysis_TexasPractice 1 125.53 181.53
## + LPI 6 112.59 182.59
## - BarPrepCompany 3 132.23 184.23
## - LSAT 1 134.67 190.67
## - BarPrepCompletion 1 152.75 208.75
##
## Step: AIC=174.15
## PassFail ~ LSAT + UGPA + GPA_Final + Probation + BarPrepCompany +
## BarPrepCompletion + X.LawSchoolBarPrepWorkshops + StudentSuccessInitiative +
## LegalAnalysis_TexasPractice + AdvLegalPerfSkills + CivPro +
## LPII
##
## Df Deviance AIC
## - StudentSuccessInitiative 1 118.98 172.98
## - LPII 7 131.63 173.63
## <none> 118.15 174.15
## - X.LawSchoolBarPrepWorkshops 1 120.21 174.21
## - UGPA 1 120.29 174.29
## - AdvLegalPerfSkills 1 120.44 174.44
## - Probation 2 122.71 174.71
## + FinalRankPercentile 1 117.26 175.26
## + BarPrepMentor 1 117.58 175.58
## - CivPro 7 133.68 175.68
## + AdvLegalAnalysis 1 118.10 176.10
## + Accommodations 1 118.14 176.14
## + GPA_1L 1 118.14 176.14
## - GPA_Final 1 124.10 178.10
## - LegalAnalysis_TexasPractice 1 126.44 180.44
## + LPI 6 114.11 182.11
## - BarPrepCompany 3 133.79 183.79
## - LSAT 1 134.69 188.69
## - BarPrepCompletion 1 153.06 207.06
##
## Step: AIC=172.98
## PassFail ~ LSAT + UGPA + GPA_Final + Probation + BarPrepCompany +
## BarPrepCompletion + X.LawSchoolBarPrepWorkshops + LegalAnalysis_TexasPractice +
## AdvLegalPerfSkills + CivPro + LPII
##
## Df Deviance AIC
## - LPII 7 132.00 172.00
## - AdvLegalPerfSkills 1 120.91 172.91
## - UGPA 1 120.94 172.94
## <none> 118.98 172.98
## - Probation 2 123.14 173.14
## - X.LawSchoolBarPrepWorkshops 1 121.52 173.51
## + StudentSuccessInitiative 1 118.15 174.15
## + BarPrepMentor 1 118.53 174.53
## + FinalRankPercentile 1 118.62 174.62
## + AdvLegalAnalysis 1 118.88 174.88
## - CivPro 7 134.88 174.88
## + GPA_1L 1 118.96 174.96
## + Accommodations 1 118.97 174.97
## - LegalAnalysis_TexasPractice 1 127.56 179.56
## + LPI 6 114.45 180.45
## - BarPrepCompany 3 133.96 181.96
## - LSAT 1 135.48 187.48
## - GPA_Final 1 135.92 187.92
## - BarPrepCompletion 1 154.40 206.40
##
## Step: AIC=172
## PassFail ~ LSAT + UGPA + GPA_Final + Probation + BarPrepCompany +
## BarPrepCompletion + X.LawSchoolBarPrepWorkshops + LegalAnalysis_TexasPractice +
## AdvLegalPerfSkills + CivPro
##
## Df Deviance AIC
## <none> 132.00 172.00
## - UGPA 1 134.59 172.59
## - Probation 2 136.90 172.90
## + LPII 7 118.98 172.98
## + GPA_1L 1 130.99 172.99
## - AdvLegalPerfSkills 1 135.09 173.09
## - X.LawSchoolBarPrepWorkshops 1 135.19 173.19
## + BarPrepMentor 1 131.48 173.48
## + StudentSuccessInitiative 1 131.63 173.63
## + FinalRankPercentile 1 131.66 173.66
## + Accommodations 1 131.80 173.80
## + AdvLegalAnalysis 1 132.00 174.00
## - CivPro 7 149.26 175.26
## + LPI 6 125.67 177.68
## - BarPrepCompany 3 144.03 178.03
## - LegalAnalysis_TexasPractice 1 143.28 181.28
## - GPA_Final 1 147.62 185.62
## - LSAT 1 150.31 188.31
## - BarPrepCompletion 1 164.88 202.88
# print result
print(step_model)
##
## Call: glm(formula = PassFail ~ LSAT + UGPA + GPA_Final + Probation +
## BarPrepCompany + BarPrepCompletion + X.LawSchoolBarPrepWorkshops +
## LegalAnalysis_TexasPractice + AdvLegalPerfSkills + CivPro,
## family = binomial, data = train_data)
##
## Coefficients:
## (Intercept) LSAT
## -69.76127 0.30106
## UGPA GPA_Final
## 1.20996 4.37048
## ProbationN ProbationY
## 3.56786 0.46041
## BarPrepCompanyHelix BarPrepCompanyKaplan
## 17.91199 -2.10463
## BarPrepCompanyThemis BarPrepCompletion
## 1.42878 9.12065
## X.LawSchoolBarPrepWorkshops LegalAnalysis_TexasPracticeY
## -0.22325 -2.38996
## AdvLegalPerfSkillsY CivProB
## 1.14968 1.72565
## CivProB+ CivProC
## 2.19544 -0.47593
## CivProC+ CivProD
## 0.13886 -16.93465
## CivProD+ CivProF
## -0.09315 10.44633
##
## Degrees of Freedom: 359 Total (i.e. Null); 340 Residual
## Null Deviance: 255.3
## Residual Deviance: 132 AIC: 172
The final model was refit using the predictors retained by stepwise AIC selection. Not all of the retained predictors are individually significant.
# final model
final_formula <- step_model$call$formula
final_model <- glm(final_formula, data = train_data, family = binomial)
summary(final_model)
##
## Call:
## glm(formula = final_formula, family = binomial, data = train_data)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -69.76127 15.18731 -4.593 4.36e-06 ***
## LSAT 0.30106 0.07666 3.927 8.60e-05 ***
## UGPA 1.20996 0.75358 1.606 0.10836
## GPA_Final 4.37048 1.19666 3.652 0.00026 ***
## ProbationN 3.56786 1.83827 1.941 0.05227 .
## ProbationY 0.46041 0.64796 0.711 0.47736
## BarPrepCompanyHelix 17.91199 2399.54498 0.007 0.99404
## BarPrepCompanyKaplan -2.10463 1.29610 -1.624 0.10441
## BarPrepCompanyThemis 1.42878 0.55756 2.563 0.01039 *
## BarPrepCompletion 9.12065 1.91429 4.765 1.89e-06 ***
## X.LawSchoolBarPrepWorkshops -0.22325 0.12557 -1.778 0.07542 .
## LegalAnalysis_TexasPracticeY -2.38996 0.75701 -3.157 0.00159 **
## AdvLegalPerfSkillsY 1.14968 0.66044 1.741 0.08172 .
## CivProB 1.72565 1.31153 1.316 0.18826
## CivProB+ 2.19544 1.46578 1.498 0.13419
## CivProC -0.47593 1.28004 -0.372 0.71003
## CivProC+ 0.13886 1.24161 0.112 0.91095
## CivProD -16.93465 2399.54515 -0.007 0.99437
## CivProD+ -0.09315 1.46006 -0.064 0.94913
## CivProF 10.44633 2399.54521 0.004 0.99653
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 255.29 on 359 degrees of freedom
## Residual deviance: 132.00 on 340 degrees of freedom
## AIC: 172
##
## Number of Fisher Scoring iterations: 15
anova(base_model, final_model, test = "Chisq")
## Analysis of Deviance Table
##
## Model 1: PassFail ~ LSAT + UGPA + GPA_1L + GPA_Final + FinalRankPercentile +
## Accommodations + Probation + BarPrepCompany + BarPrepCompletion +
## X.LawSchoolBarPrepWorkshops + StudentSuccessInitiative +
## LegalAnalysis_TexasPractice + AdvLegalPerfSkills + AdvLegalAnalysis +
## BarPrepMentor + CivPro + LPI + LPII
## Model 2: PassFail ~ LSAT + UGPA + GPA_Final + Probation + BarPrepCompany +
## BarPrepCompletion + X.LawSchoolBarPrepWorkshops + LegalAnalysis_TexasPractice +
## AdvLegalPerfSkills + CivPro
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 321 111.69
## 2 340 132.00 -19 -20.311 0.3761
The final logistic regression model was fit using the formula derived from stepwise selection. The analysis-of-deviance comparison with the base model gives p = 0.3761 (deviance difference 20.311 on 19 degrees of freedom), so the 19 dropped parameters do not significantly improve the fit. The model includes the following significant predictors (p < 0.05):
LSAT: Higher LSAT scores are associated with an increased likelihood of passing the bar (p-value = 8.60e-05).
GPA_Final: A higher final GPA significantly increases the likelihood of passing (p-value = 0.00026).
BarPrepCompletion: Completing a larger share of the bar preparation course is associated with higher chances of passing (p-value = 1.89e-06).
LegalAnalysis_TexasPracticeY: Students who took LegalAnalysis_TexasPractice have lower log-odds of passing, holding the other predictors fixed (coefficient = -2.39, p-value = 0.00159). This is an association and may reflect which students enroll rather than an effect of the course itself.
BarPrepCompanyThemis: Using Themis rather than Barbri (the reference level) is positively associated with passing (p-value = 0.01039).
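The coefficients are on the log-odds scale, so exponentiating them gives odds ratios that are easier to read. This is a small sketch using the estimates copied from the summary above. With the fitted object available, exp(coef(final_model)) would give the same numbers. The 0.1 step for BarPrepCompletion assumes the variable is a proportion between 0 and 1, as the str() output suggests.

```r
# Coefficients (log-odds) copied from the final model summary
coefs <- c(
  LSAT = 0.30106,
  GPA_Final = 4.37048,
  BarPrepCompletion = 9.12065,
  LegalAnalysis_TexasPracticeY = -2.38996,
  BarPrepCompanyThemis = 1.42878
)

# Odds ratio for a one-unit increase in each predictor
odds_ratios <- exp(coefs)
round(odds_ratios, 3)

# BarPrepCompletion appears to be a proportion, so a 0.1 step is easier to read
or_completion_10pct <- exp(0.1 * coefs[["BarPrepCompletion"]])
or_completion_10pct
```

For example, each additional LSAT point multiplies the odds of passing by about 1.35, holding the other predictors fixed.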
Null Deviance: 255.29 on 359 degrees of freedom
Residual Deviance: 132.00 on 340 degrees of freedom
AIC: 172
The residual deviance is 123.29 lower than the null deviance at a cost of 19 degrees of freedom, indicating that the predictors contribute meaningfully to explaining bar passage outcomes. AIC penalizes deviance for the number of parameters, with lower values indicating a better trade-off between fit and complexity; it fell from 189.69 for the base model to 172 for the final model.
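The deviance figures can be turned into a likelihood-ratio test against the intercept-only model and a McFadden pseudo R-squared. The sketch below uses only the numbers reported in the summary. For ungrouped binary data, the ratio of deviances equals the ratio of log-likelihoods, which is the assumption behind the last formula.

```r
# Values copied from the final model summary
null_dev  <- 255.29; null_df  <- 359
resid_dev <- 132.00; resid_df <- 340

# Likelihood-ratio test of the final model against the intercept-only model
lr_stat <- null_dev - resid_dev
lr_df   <- null_df - resid_df
lr_p    <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# McFadden's pseudo R-squared: 1 - (residual deviance / null deviance)
mcfadden_r2 <- 1 - resid_dev / null_dev

c(lr_stat = lr_stat, lr_df = lr_df, lr_p = lr_p, mcfadden_r2 = mcfadden_r2)
```

The pscl package loaded at the top of the report provides pR2(), which should report the same McFadden value when applied to the fitted model.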
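# Model diagnostics
# Residuals vs fitted plot: deviance residuals against fitted probabilities
# (an explicit data frame avoids relying on ggplot2 to fortify the glm object)
diag_df <- data.frame(
  fitted = fitted(final_model),
  resid = residuals(final_model, type = "deviance")
)
residuals_vs_fitted <- ggplot(diag_df, aes(x = fitted, y = resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(title = "Residuals vs Fitted Values", x = "Fitted Values", y = "Residuals")
print(residuals_vs_fitted)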
The Residuals vs Fitted Values plot for the logistic regression model shows a curved pattern in the residuals. For a binary outcome this is expected: every observation is either a pass or a fail, so the residuals fall on two curves (positive for passes, negative for fails) whose shape is fixed by the fitted probability. The pattern by itself therefore does not establish model misspecification or non-linear relationships. A binned residual plot or a calibration check is more informative for that purpose.
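A binned residual plot is one common alternative for logistic models: observations are grouped by fitted probability and the residuals are averaged within each group. The following is a minimal base R sketch on simulated data (the data, the single predictor x, and the choice of 10 bins are all assumptions for illustration, not part of the bar analysis).

```r
set.seed(1)
# Simulated data (hypothetical): one predictor, binary outcome
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))
fit <- glm(y ~ x, family = binomial)

# Response residuals: observed 0/1 minus fitted probability
p_hat <- fitted(fit)
res <- y - p_hat

# Group observations into 10 equally sized bins by fitted probability
bin <- cut(rank(p_hat, ties.method = "first"), breaks = 10, labels = FALSE)
binned <- data.frame(
  mean_fitted = as.vector(tapply(p_hat, bin, mean)),
  mean_resid = as.vector(tapply(res, bin, mean)),
  n = as.vector(table(bin))
)

# Bin averages near zero suggest the fitted probabilities are well calibrated
plot(binned$mean_fitted, binned$mean_resid,
     xlab = "Mean fitted probability", ylab = "Mean residual",
     main = "Binned residuals")
abline(h = 0, lty = 2)
```

Applied to the real model, fit would be replaced by final_model and y by the 0/1 coding of PassFail in the training data.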
To evaluate the performance of the final model, I computed a confusion matrix on the test dataset.
test_data$PassFail <- factor(test_data$PassFail, levels = c("F", "P"))
test_data$CivPro <- as.character(test_data$CivPro)
test_data$CivPro[test_data$CivPro == ""] <- NA
test_data$CivPro <- factor(test_data$CivPro)
test_data <- na.omit(test_data)
predictions <- predict(final_model, newdata = test_data, type = "response")
predictions <- ifelse(predictions > 0.5, "P", "F")
predictions <- factor(predictions, levels = c("F", "P"))
conf_matrix <- confusionMatrix(predictions, test_data$PassFail)
conf_matrix
## Confusion Matrix and Statistics
##
## Reference
## Prediction F P
## F 1 7
## P 9 71
##
## Accuracy : 0.8182
## 95% CI : (0.7216, 0.8924)
## No Information Rate : 0.8864
## P-Value [Acc > NIR] : 0.9803
##
## Kappa : 0.0112
##
## Mcnemar's Test P-Value : 0.8026
##
## Sensitivity : 0.10000
## Specificity : 0.91026
## Pos Pred Value : 0.12500
## Neg Pred Value : 0.88750
## Prevalence : 0.11364
## Detection Rate : 0.01136
## Detection Prevalence : 0.09091
## Balanced Accuracy : 0.50513
##
## 'Positive' Class : F
##
The model achieved an overall accuracy of 81.82% on the test set. This is below the No Information Rate of 88.64%, the accuracy obtained by predicting “Pass” for every student, so accuracy alone overstates the model's performance. Sensitivity was very low at 10%: only 1 of the 10 students who failed was identified as a “Fail” case. Specificity was high at 91.03%, showing the model correctly classified most “Pass” cases. The balanced accuracy of 50.51% and the Kappa statistic of 0.0112 indicate performance close to chance on this test set. One reason for the low balanced accuracy is that the data are imbalanced. These results suggest that there is significant room for improvement, particularly in detecting “Fail” cases.
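The reported statistics can be checked by hand from the four counts in the confusion matrix. The sketch below recomputes them in base R, with “F” as the positive class to match the caret output; the variable names are my own.

```r
# Counts copied from the confusion matrix (rows = predicted, columns = actual)
cm <- matrix(c(1, 9, 7, 71), nrow = 2,
             dimnames = list(Prediction = c("F", "P"), Reference = c("F", "P")))

n <- sum(cm)
accuracy    <- sum(diag(cm)) / n
sensitivity <- cm["F", "F"] / sum(cm[, "F"])  # share of actual fails detected
specificity <- cm["P", "P"] / sum(cm[, "P"])  # share of actual passes detected
balanced    <- (sensitivity + specificity) / 2
nir         <- max(colSums(cm)) / n           # accuracy of always predicting "P"

# Cohen's kappa: agreement beyond what the marginal totals alone would give
expected    <- sum(rowSums(cm) * colSums(cm)) / n^2
cohen_kappa <- (accuracy - expected) / (1 - expected)

round(c(accuracy = accuracy, nir = nir, sensitivity = sensitivity,
        specificity = specificity, balanced = balanced, kappa = cohen_kappa), 4)
```

Laying the numbers out this way makes it clear that the accuracy figure is driven almost entirely by the 71 correctly predicted passes.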
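Based on my final model and analysis, I propose the following evidence-based recommendations to increase the law school’s bar passage rate:
First, enhance bar preparation programs. BarPrepCompletion is the predictor whose removal raises AIC the most (from 172.00 to 202.88), so the school should strengthen its bar preparation courses and workshops and encourage students to complete them.
Second, support Student Success Initiatives. Expand initiatives aimed at student success, such as mentorship programs and academic support services. StudentSuccessInitiative and BarPrepMentor were dropped during stepwise selection, so this recommendation is not directly supported by the model, and the effect of these programs should be tracked.
Third, focus on academic performance. Final law school GPA is a significant predictor of bar passage (p-value = 0.00026), so academic support during law school is the actionable lever. Undergraduate GPA was retained in the model but is not significant (p-value = 0.108) and is fixed at admission.
Fourth, use predictive analytics. Use the identified predictors to flag at-risk students early and provide targeted interventions. Because the current model detects only 10% of failing students in the test set, it should be improved before it is relied on for this purpose.
Finally, improve CivPro course outcomes. The exploratory chart shows lower pass rates among students with lower CivPro grades, and CivPro was retained by stepwise selection, although no individual grade level is statistically significant in the final model. Enhancing the CivPro curriculum and supporting students in this course may help.
The logistic regression model provides valuable insights into the factors associated with bar passage. By addressing the significant predictors in the final model and implementing the recommended strategies, the law school can work towards improving its bar passage rate.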