Abstract

This work examines changes in the racial and ethnic composition of admissions at Texas A&M University following the 1996 Hopwood decision, which imposed a judicial ban on affirmative action. We estimate the extent to which universities used affirmative action before the ban, and we examine how admissions officers adjusted the relative weights given to key applicant characteristics while the ban was in effect. After examining whether these changes in relative weights favored minority applicants, we model the extent to which the new admissions rules preserved minority admission rates at pre-Hopwood levels. We find that most colleges complied with the Hopwood ruling, so that direct advantages given to black and Hispanic applicants vanished (and, in some cases, became disadvantages). Although there is evidence that universities changed the weights they placed on applicant characteristics other than race and ethnicity in ways that aided underrepresented minority applicants, these changes in the admissions process were unable to maintain the share of admitted students who were black or Hispanic. As a result, these alternative admissions procedures have not been a reliable proxy for race and ethnicity.

Introduction

Despite some claims to the contrary, it is widely acknowledged that a gap in access to higher education still exists, especially for minority students. Many have nevertheless objected to affirmative action policies as a solution, and in 1996 the Fifth Circuit Court of Appeals outlawed the use of race in college admissions in Texas. After a noticeable drop in minority student enrollment, the governor of Texas signed a new bill as a compromise: any high school student finishing in the top decile of their class was guaranteed admission to a Texas public university, such as Texas A&M University. Instead of competing with applicants from the entire state, students only had to compete with their immediate classmates for access to higher education.

Literature review

Discuss how other researchers have addressed similar problems, what their achievements are, and what the advantage and drawbacks of each reviewed approach are. Explain how your investigation is similar or different to the state-of-the-art. Please cite the relevant papers where appropriate.

Methodology

Discuss the key aspects of your problem, data set and regression model(s). Given that you are working on real-world data, explain at a high-level your exploratory data analysis, how you prepared the data for regression modeling, your process for building regression models, and your model selection.

Data

For this analysis we use administrative records from Texas A&M University (1992-2002) that include information about admissions selectivity, public/private status, and the ethno-racial composition of the student body. Importantly, the period covered includes years both before and after the judicial ban on affirmative action. This is significant because, while the judicial ban applied to all schools in the Fifth Circuit, the top 10% policy applied only to public colleges and universities. These records contain extensive information about the applicant pool and have been standardized where necessary and checked for consistency.

The application dataset contained 163,027 observations of 24 variables, and the transcripts dataset contained 637,028 observations of 10 variables. Each record in the application dataset represented an individual applicant. The variables describe items typically found on a college admission application, such as the year and term in which an applicant wished to enroll, applicant demographics, applicant academic characteristics, and high school characteristics.

Unfortunately, the data often lack information about a student’s high school academics or application essays. We keep these data constraints in mind when evaluating the results.

Each record in the application dataset included the response variable “admit” (the institution’s admission decision), a boolean in which “1” indicated that the applicant was admitted.

A sample of the data appears in the following table:

While exploring this data, we made the following observations for the applications data:

  • 21 variables were categorical, and 3 were numeric.
  • The actR, gradyear, hscentury, and hslos variables were missing more than 50% of their values.
variable         complete_rate  n_missing   min   max
Texas_resident            1.00          0    NA    NA
US_Citizen                1.00          0    NA    NA
actR                      0.46      72133     1    24
admit                     1.00          0    NA    NA
admit_prov                1.00          0    NA    NA
decileR                   1.00          0    NA    NA
enroll                    1.00          0    NA    NA
ethnic                    1.00         50    NA    NA
gradyear                  0.35      86914     2    22
hscentury                 0.32      91636    NA    NA
hseconstatus              0.72      38094    NA    NA
hsinstate                 1.00        103    NA    NA
hslos                     0.41      79635    NA    NA
hsprivate                 0.98       2375    NA    NA
hstypeR                   0.91      12338    NA    NA
major_field               1.00          2    NA    NA
male                      1.00         52    NA    NA
quartile                  1.00         10    NA    NA
satR                      1.00          0     3   111
studentid                 1.00          0    NA    NA
studentid_uniq            1.00          0    NA    NA
termdes                   1.00          0    NA    NA
testscoreR                1.00          0     3   111
yeardes                   1.00          0  1992  2002

76% of all the students were admitted to the university.

We checked the frequency of each value for the variables with few unique values (“male”, “ethnic”, “citizenship”, “restype”, “decileR”, “major_field”). As the graph below demonstrates, more than 50% of the values of the citizenship, restype, and ethnic variables were US Citizen, Texas Resident, and White, Non-Hispanic, respectively:

Transform Data

Students with missing values for the ethnic variable are combined with “White, Non-Hispanic” students, which yields conservative estimates of policy effects.

## The original levels Black, Non-Hispanic Hispanic American Indian/Alaskan Native Asian or Pacific Islander White, Non-Hispanic International Other None 
## have been replaced by Black, Non-Hispanic Hispanic American Indian/Alaskan Native Asian or Pacific Islander International Other White, Non-Hispanic
variable         complete_rate  n_missing   min   max
Texas_resident               1          0    NA    NA
US_Citizen                   1          0    NA    NA
admit                        1          0    NA    NA
enroll                       1          0    NA    NA
ethnic                       1          0    NA    NA
hsinstate                    1          0    NA    NA
hsprivate                    1          0    NA    NA
male                         1          0    NA    NA
satR                         1          0     3   111
termdes                      1          0    NA    NA
testscoreR                   1          0     3   111
top10                        1          0    NA    NA
yeardes                      1          0  1992  2002
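
The recoding of missing ethnicity values can be sketched in base R. This is a minimal illustration on a toy vector, not the project's actual code: the vector `ethnic_raw` is hypothetical, and it is an assumption that missing values appear either as NA or as the level "None".

```r
# Toy ethnicity vector (hypothetical values, for illustration only).
ethnic_raw <- c("Hispanic", NA, "White, Non-Hispanic", "None",
                "Black, Non-Hispanic")

# Assumption: missing ethnicity is stored either as NA or as "None".
# Both are folded into "White, Non-Hispanic".
ethnic_chr <- ifelse(is.na(ethnic_raw) | ethnic_raw == "None",
                     "White, Non-Hispanic", ethnic_raw)

# Listing "Black, Non-Hispanic" first makes it the reference level
# when the factor is later used in a regression.
ethnic <- factor(ethnic_chr,
                 levels = c("Black, Non-Hispanic", "Hispanic",
                            "White, Non-Hispanic"))

table(ethnic)
```

Because applicants of unknown ethnicity are counted with the largest group, any estimated difference between that group and the others is, if anything, understated.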

Models

As a first step, we fit a generalized linear model to applications for terms before fall 1998, using the dataset in which some numeric variables had been converted to categorical.

variable         complete_rate  n_missing   min   max
Texas_resident               1          0    NA    NA
US_Citizen                   1          0    NA    NA
admit                        1          0    NA    NA
enroll                       1          0    NA    NA
ethnic                       1          0    NA    NA
hsinstate                    1          0    NA    NA
hsprivate                    1          0    NA    NA
male                         1          0    NA    NA
satR                         1          0     3   111
termdes                      1          0    NA    NA
testscoreR                   1          0     3   111
top10                        1          0    NA    NA
yeardes                      1          0  1992  1998

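Since our dependent variable is binary (0 and 1), we used a binary-response regression, fitted with the function glm() and family = binomial. As the call below shows, the link was probit, so this is a probit rather than a logistic (logit) regression. Initially, the variables below were included.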

## 
## Call:
## glm(formula = admit ~ . - enroll - yeardes - termdes, family = binomial(link = "probit"), 
##     data = pre_df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.8936   0.0305   0.1906   0.6786   2.7660  
## 
## Coefficients: (1 not defined because of singularities)
##                                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                          -1.0649721  0.0569552 -18.698  < 2e-16 ***
## male1                                -0.1443959  0.0131538 -10.978  < 2e-16 ***
## ethnicHispanic                       -0.0258000  0.0370032  -0.697   0.4857    
## ethnicAmerican Indian/Alaskan Native -0.8790031  0.0939285  -9.358  < 2e-16 ***
## ethnicAsian or Pacific Islander      -0.9181691  0.0427157 -21.495  < 2e-16 ***
## ethnicInternational                  -0.4846015  0.0896390  -5.406 6.44e-08 ***
## ethnicOther                          -0.9894696  0.0689488 -14.351  < 2e-16 ***
## ethnicWhite, Non-Hispanic            -0.7983680  0.0325750 -24.509  < 2e-16 ***
## US_Citizen1                           0.0766258  0.0420904   1.821   0.0687 .  
## Texas_resident1                       0.7330951  0.0335813  21.830  < 2e-16 ***
## satR                                  0.0282780  0.0003536  79.970  < 2e-16 ***
## testscoreR                                   NA         NA      NA       NA    
## top10TRUE                             1.4734079  0.0196313  75.054  < 2e-16 ***
## hsprivatePrivate                     -0.1714509  0.0257750  -6.652 2.89e-11 ***
## hsprivateNone                        -0.2848834  0.0473206  -6.020 1.74e-09 ***
## hsinstateYes                          0.0653105  0.0361204   1.808   0.0706 .  
## hsinstateNone                        -0.2056722  0.2039254  -1.009   0.3132    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 73803  on 69018  degrees of freedom
## Residual deviance: 50413  on 69003  degrees of freedom
## AIC: 50445
## 
## Number of Fisher Scoring iterations: 6

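We then applied the stepAIC() function (MASS package) for stepwise selection. The selected model, summarized below, keeps every estimable predictor, and its AIC (50,444.85) is essentially the same as that of the full model (50,445).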

Observations               69019
Dependent variable         admit
Type                       Generalized linear model
Family                     binomial
Link                       probit

χ²(15)                     23390.53
Pseudo-R² (Cragg-Uhler)    0.44
Pseudo-R² (McFadden)       0.32
AIC                        50444.85
BIC                        50591.12

                                          Est.    S.E.   z val.      p
(Intercept)                              -1.06    0.06   -18.70   0.00
male1                                    -0.14    0.01   -10.98   0.00
ethnicHispanic                           -0.03    0.04    -0.70   0.49
ethnicAmerican Indian/Alaskan Native     -0.88    0.09    -9.36   0.00
ethnicAsian or Pacific Islander          -0.92    0.04   -21.49   0.00
ethnicInternational                      -0.48    0.09    -5.41   0.00
ethnicOther                              -0.99    0.07   -14.35   0.00
ethnicWhite, Non-Hispanic                -0.80    0.03   -24.51   0.00
US_Citizen1                               0.08    0.04     1.82   0.07
Texas_resident1                           0.73    0.03    21.83   0.00
satR                                      0.03    0.00    79.97   0.00
top10TRUE                                 1.47    0.02    75.05   0.00
hsprivatePrivate                         -0.17    0.03    -6.65   0.00
hsprivateNone                            -0.28    0.05    -6.02   0.00
hsinstateYes                              0.07    0.04     1.81   0.07
hsinstateNone                            -0.21    0.20    -1.01   0.31
Standard errors: MLE
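
The fitting and selection steps can be sketched as follows. This is a hedged illustration on simulated data, not the project's code: the data frame `sim`, its coefficients, and the `noise` column are invented, and base R's step() stands in for MASS::stepAIC(), which performs a similar AIC-based search.

```r
set.seed(42)
n <- 2000

# Simulated applicants (all values are invented for illustration).
sim <- data.frame(
  satR  = round(runif(n, min = 3, max = 111)),  # same range as satR above
  top10 = runif(n) < 0.3,                       # about 30% in the top decile
  noise = rnorm(n)                              # unrelated to admission
)

# Probit data-generating process: P(admit = 1) = pnorm(linear predictor).
eta <- -1 + 0.03 * sim$satR + 1.5 * sim$top10
sim$admit <- rbinom(n, size = 1, prob = pnorm(eta))

# Full model with a probit link, mirroring the call shown above.
full <- glm(admit ~ satR + top10 + noise,
            family = binomial(link = "probit"), data = sim)

# Backward search by AIC (the default direction when no scope is given);
# trace = 0 suppresses the step-by-step log.
selected <- step(full, trace = 0)

summary(selected)$coefficients
c(full = AIC(full), selected = AIC(selected))
```

A backward search can only keep or lower the AIC, so an unchanged AIC after selection simply means that no predictor was worth dropping.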

Model Results

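The ethnicity coefficients are measured relative to the omitted reference level, Black, Non-Hispanic. Relative to that group, every other ethnic category except Hispanic had a significantly lower admission index before fall 1998, and these were the most negative effects in the model. In contrast, being a Texas resident or graduating in the top 10% of the class increased the chance of admission.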
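
To make the size of these effects concrete, the probit coefficients can be converted into admission probabilities. The sketch below uses coefficients copied from the glm() output above; the applicant profile (satR = 60, and so on) is hypothetical, and the reference levels named in the comments are assumptions inferred from which levels are absent from the coefficient table.

```r
# Probit coefficients copied from the glm() output above.
b <- c(intercept = -1.0649721, white = -0.7983680, us_citizen = 0.0766258,
       texas_resident = 0.7330951, satR = 0.0282780, top10 = 1.4734079,
       hsinstate_yes = 0.0653105)

# Hypothetical applicant: female, US citizen, Texas resident, in-state
# high school, satR = 60, not in the top 10%. Assumption: the omitted
# (reference) levels are male = 0, Black/Non-Hispanic, and the first
# hsprivate level.
eta_base <- unname(b["intercept"] + b["us_citizen"] + b["texas_resident"] +
                   b["satR"] * 60 + b["hsinstate_yes"])

# pnorm() is the inverse probit link: it maps the index to a probability.
p_black       <- pnorm(eta_base)
p_white       <- pnorm(eta_base + unname(b["white"]))
p_black_top10 <- pnorm(eta_base + unname(b["top10"]))

round(c(black = p_black, white = p_white, black_top10 = p_black_top10), 3)
```

Because the probit link is nonlinear, the same coefficient shifts the probability by different amounts depending on the applicant's baseline index, so these figures describe this one profile only.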

The null deviance of 73,803.38 measures how well the response variable is predicted by a model with only an intercept term.

The residual deviance of 50,412.85 measures how well the response variable is predicted by the AIC-selected model with the predictor variables listed above. The lower the value, the better the model’s predictions of the response variable.

The difference between the two deviances is the chi-square statistic, χ²(15) = 23,390.53. The associated p-value is effectively 0 (less than .05), so the model fits significantly better than the intercept-only model.

The Akaike information criterion (AIC) was 50,444.85. The lower the AIC value, the better the trade-off between fit and model complexity.
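
These fit statistics are all derived from the two deviances, which the following sketch reproduces. It assumes an ungrouped binary response, so that the residual deviance equals minus twice the log-likelihood; the variable names are illustrative.

```r
null_dev  <- 73803.38   # null deviance (intercept-only model)
resid_dev <- 50412.85   # residual deviance (fitted model)
k         <- 16         # estimated coefficients, including the intercept
n         <- 69019      # observations

# Likelihood-ratio chi-square statistic and its p-value on k - 1 = 15 df.
chi_sq  <- null_dev - resid_dev
p_value <- pchisq(chi_sq, df = k - 1, lower.tail = FALSE)

# Information criteria and McFadden's pseudo-R-squared.
aic      <- resid_dev + 2 * k
bic      <- resid_dev + log(n) * k
mcfadden <- 1 - resid_dev / null_dev

round(c(chi_sq = chi_sq, p_value = p_value, AIC = aic, BIC = bic,
        McFadden = mcfadden), 2)
```

The results agree with the χ²(15), AIC, BIC, and McFadden values in the model summary.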

Checking Model Assumptions

The resulting plots show a picture similar to that of model 1: under-fitting at lower predicted values, where the predicted proportions are larger than the observed proportions, and over-fitting for a couple of predictor patterns at higher predicted values, where the predicted proportions are much larger than the observed proportions. If the model fits, the standardized Pearson residuals should follow an approximately standard normal distribution. The model seems good in the middle range, but there are extremes on the right and left sides. For a binomial regression model, approximate normality of these residuals is simply evidence of a decent fit.

Checking the linearity assumption, we see that only rm appears to have a linear relationship with the logit.

##   1   2   3   4   5   6 
## "1" "1" "1" "1" "0" "1"

The Standardized Residuals plot seems to have a constant variance though there are some outliers.

The marginal model plots below show reasonable agreement across the two sets of fits indicating that model_2_aic is a valid model.

In terms of multicollinearity, all variables have a VIF less than 5. As a result, multicollinearity shouldn’t be a problem for our model.

##                    GVIF Df GVIF^(1/(2*Df))
## male           1.051668  1        1.025509
## ethnic         1.994261  6        1.059209
## US_Citizen     1.546489  1        1.243579
## Texas_resident 3.086787  1        1.756926
## satR           1.197600  1        1.094349
## top10          1.018228  1        1.009073
## hsprivate      1.383635  2        1.084564
## hsinstate      3.311345  2        1.348966
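
For terms with more than one degree of freedom, such as ethnic, the raw GVIF is not directly comparable to an ordinary VIF. The last column of the output adjusts for this, and the sketch below reproduces it from the GVIF and Df columns. Comparing the squared adjusted value with the usual threshold of 5 is a common rule of thumb rather than part of the original analysis.

```r
# GVIF and Df columns copied from the output above.
gvif <- c(male = 1.051668, ethnic = 1.994261, US_Citizen = 1.546489,
          Texas_resident = 3.086787, satR = 1.197600, top10 = 1.018228,
          hsprivate = 1.383635, hsinstate = 3.311345)
df   <- c(male = 1, ethnic = 6, US_Citizen = 1, Texas_resident = 1,
          satR = 1, top10 = 1, hsprivate = 2, hsinstate = 2)

# GVIF^(1/(2*Df)) puts terms with different degrees of freedom on a
# common scale; squaring it gives a value comparable to an ordinary VIF.
adj <- gvif^(1 / (2 * df))

round(cbind(GVIF = gvif, Df = df, adjusted = adj, adjusted_sq = adj^2), 3)
```

For one-degree-of-freedom terms the squared adjusted value is just the GVIF itself, so the adjustment only changes the picture for ethnic, hsprivate, and hsinstate.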

Predict

The prediction period begins in fall 1998, once the uniform admission law was fully in force.

Lasso

## 17 x 1 sparse Matrix of class "dgCMatrix"
##                                               s1
## (Intercept)                          -1.83381217
## male1                                -0.24603449
## ethnicHispanic                        .         
## ethnicAmerican Indian/Alaskan Native -1.42324025
## ethnicAsian or Pacific Islander      -1.53683380
## ethnicInternational                  -0.79574261
## ethnicOther                          -1.59897292
## ethnicWhite, Non-Hispanic            -1.32162247
## US_Citizen1                           0.12029934
## Texas_resident1                       1.24391271
## satR                                  0.04823873
## testscoreR                            .         
## top10TRUE                             2.73974407
## hsprivatePrivate                     -0.28104105
## hsprivateNone                        -0.48453373
## hsinstateYes                          0.08437993
## hsinstateNone                        -0.28174646
## 17 x 1 sparse Matrix of class "dgCMatrix"
##                                                 s1
## (Intercept)                          -1.906999e+00
## male1                                -2.045992e-01
## ethnicHispanic                        1.628669e-01
## ethnicAmerican Indian/Alaskan Native -9.314747e-01
## ethnicAsian or Pacific Islander      -1.193380e+00
## ethnicInternational                  -4.823771e-01
## ethnicOther                          -1.163898e+00
## ethnicWhite, Non-Hispanic            -1.033448e+00
## US_Citizen1                           8.443794e-02
## Texas_resident1                       1.205775e+00
## satR                                  4.561657e-02
## testscoreR                            8.843864e-17
## top10TRUE                             2.627511e+00
## hsprivatePrivate                     -2.075913e-01
## hsprivateNone                        -4.242971e-01
## hsinstateYes                          6.028163e-02
## hsinstateNone                         .
## 
## Call:  cv.glmnet(x = X, y = Y, nfolds = 5, family = "binomial", link = "logit",      standardize = TRUE, alpha = 1) 
## 
## Measure: Binomial Deviance 
## 
##        Lambda Index Measure       SE Nonzero
## min 0.0004846    64  0.7325 0.001479      14
## 1se 0.0025862    46  0.7338 0.001570      15

## $deviance
## lambda.1se 
##  0.7332928 
## attr(,"measure")
## [1] "Binomial Deviance"
## 
## $class
## lambda.1se 
##  0.1826743 
## attr(,"measure")
## [1] "Misclassification Error"
## 
## $auc
## [1] 0.8654476
## attr(,"measure")
## [1] "AUC"
## 
## $mse
## lambda.1se 
##  0.2410274 
## attr(,"measure")
## [1] "Mean-Squared Error"
## 
## $mae
## lambda.1se 
##  0.4864324 
## attr(,"measure")
## [1] "Mean Absolute Error"
## $AICc
## [1] -23258.91
## 
## $BIC
## [1] -23130.93
## $AICc
## [1] -23162.24
## 
## $BIC
## [1] -23025.12
Model Coefficients
                                           Est
top10TRUE                                2.740
Texas_resident1                          1.244
US_Citizen1                              0.120
hsinstateYes                             0.084
satR                                     0.048
ethnicHispanic                           0.000
testscoreR                               0.000
male1                                   -0.246
hsprivatePrivate                        -0.281
hsinstateNone                           -0.282
hsprivateNone                           -0.485
ethnicInternational                     -0.796
ethnicWhite, Non-Hispanic               -1.322
ethnicAmerican Indian/Alaskan Native    -1.423
ethnicAsian or Pacific Islander         -1.537
ethnicOther                             -1.599
(Intercept)                             -1.834

Model Results

The coefficients extracted at the lambda.min value are used to predict admission for the testing data set. The confusion matrix shows an accuracy of 91%.

Analysis

glm() Admission Coefficients
coef                                    < 1998    1998    1999    2000    2001    2002
(Intercept)                             -1.065  -1.437  -1.719  -1.277  -1.408  -1.699
male1                                   -0.144  -0.399  -0.252  -0.145  -0.114  -0.087
ethnicHispanic                          -0.026   0.117  -0.116  -0.100   0.074   0.173
ethnicAmerican Indian/Alaskan Native    -0.879   0.346   0.095  -0.047   0.081   0.089
ethnicAsian or Pacific Islander         -0.918  -0.194  -0.383  -0.407  -0.421  -0.211
ethnicInternational                     -0.485  -0.181   0.069  -0.061  -0.293  -0.114
ethnicOther                             -0.989  -0.300   0.023  -0.367  -0.369  -0.184
ethnicWhite, Non-Hispanic               -0.798   0.204   0.070  -0.246   0.058   0.181
US_Citizen1                              0.077      NA   0.251      NA      NA      NA
Texas_resident1                          0.733   0.836   0.514   0.627   0.376   0.574
satR                                     0.028   0.031   0.019   0.021   0.021   0.021
top10TRUE                                1.473   2.001   1.916   2.591   3.052   3.276
hsprivatePrivate                        -0.171  -0.162  -0.157      NA  -0.123  -0.242
hsprivateNone                           -0.285  -0.487  -0.144      NA  -0.087  -0.185
hsinstateYes                             0.065      NA   0.443  -0.210      NA      NA
hsinstateNone                           -0.206      NA   0.388   0.326      NA      NA

Lasso Admission Coefficients
coef                                    < 1998    1998    1999    2000    2001    2002
top10TRUE                                2.740   3.899   3.399   4.899   5.815   6.375
Texas_resident1                          1.244   1.225   0.791   0.820   0.510   0.968
US_Citizen1                              0.120  -0.242   0.436   0.051   0.174  -0.044
hsinstateYes                             0.084   0.312   0.581  -0.245   0.117  -0.064
satR                                     0.048   0.056   0.027   0.029   0.028   0.027
male1                                   -0.246  -0.706  -0.365  -0.199  -0.130  -0.108
hsprivatePrivate                        -0.281  -0.229  -0.201  -0.066  -0.188  -0.348
hsinstateNone                           -0.282      NA      NA   0.431  -0.340   0.527
hsprivateNone                           -0.485  -0.700  -0.179  -0.115  -0.093  -0.326
ethnicInternational                     -0.796  -0.701      NA  -0.043  -0.360  -0.088
ethnicWhite, Non-Hispanic               -1.322   0.341   0.100  -0.349   0.012   0.266
ethnicAmerican Indian/Alaskan Native    -1.423   0.579      NA      NA   0.003   0.061
ethnicAsian or Pacific Islander         -1.537  -0.376  -0.590  -0.567  -0.690  -0.353
ethnicOther                             -1.599  -0.510      NA  -0.483  -0.442  -0.304
(Intercept)                             -1.834  -2.420  -2.441  -1.741  -2.071  -2.299
ethnicHispanic                              NA   0.149  -0.101      NA   0.111   0.332
testscoreR                                  NA   0.000   0.000   0.000   0.000   0.000
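
When reading the two coefficient tables side by side, note that the glm() fit used a probit link, while the lasso fit (cv.glmnet with family = "binomial") is on the logit scale. Raw magnitudes are therefore not directly comparable. The sketch below, using a few pre-1998 coefficients copied from the tables, illustrates the scale difference. The selection of coefficients is arbitrary, and the lasso penalty also shrinks estimates, so the ratios are only indicative.

```r
# Pre-1998 coefficients copied from the two tables above.
probit <- c(top10TRUE = 1.473, Texas_resident1 = 0.733, satR = 0.028,
            male1 = -0.144)
logit  <- c(top10TRUE = 2.740, Texas_resident1 = 1.244, satR = 0.048,
            male1 = -0.246)

# Logit coefficients are typically somewhat less than twice their probit
# counterparts, so compare the ratio rather than the raw magnitudes.
ratio <- logit / probit
round(ratio, 2)

# On the logit scale, exp(coefficient) is an odds ratio. Here: the odds
# multiplier for top-10% applicants, before fall 1998 and in 2002.
odds_ratio_top10 <- exp(c(pre_1998 = 2.740, y2002 = 6.375))
round(odds_ratio_top10, 1)
```

Within either table the year-to-year trend is still meaningful: on both scales the top10 coefficient more than doubles between the pre-1998 period and 2002.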

Discussion and Conclusions

Conclude your findings, limitations, and suggest areas for future work.