Academic Progress Rate: How does progress vary when looking at schools versus individual sports?

Universities, educators, and parents are concerned about the academic track of student atheletes. The National Collegiate Athletic Association designed the Academic Progress Rate (APR) that creates a score for athletes of which informs them of their likelihood to graduate school. The academic progress report is a compilation of several rates: their academic eligibility (did they earn passing marks to compete in championships), retention (were they retained as athletes in the following semester/year), and if they are on track for graduation. For my research, I am using the “Academic Scores for NCAA Athletic Programs” dataset that I downloaded from Kaggle.

I chose to conduct a multilevel analysis using academic progress rate as my continuous dependent variable. I will be looking at academic progress rate according to two levels: the schools and the individual sports teams. I will also determine how gender and conference-region impacts academic progress rate.

Packages Used

library(readr)
library(nlme)
library(lme4)
## Loading required package: Matrix
## 
## Attaching package: 'lme4'
## The following object is masked from 'package:nlme':
## 
##     lmList
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:nlme':
## 
##     collapse
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(lmerTest)
## 
## Attaching package: 'lmerTest'
## The following object is masked from 'package:lme4':
## 
##     lmer
## The following object is masked from 'package:stats':
## 
##     step
library(ggplot2)

Importing Dataset

apr <-read_csv("C:/Users/Skippz/Desktop/database.csv", col_names = TRUE)
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   SCHOOL_NAME = col_character(),
##   SPORT_NAME = col_character(),
##   NCAA_CONFERENCE = col_character()
## )
## See spec(...) for full column specifications.
apr1<-apr%>%
  select(SCHOOL_NAME, SPORT_NAME, SCHOOL_TYPE, FOURYEAR_SCORE, FOURYEAR_ATHLETES, FOURYEAR_ELIGIBILITY, FOURYEAR_RETENTION)%>%
  filter(!is.na(FOURYEAR_SCORE), !is.na(FOURYEAR_ATHLETES), !is.na(FOURYEAR_ELIGIBILITY), !is.na(FOURYEAR_RETENTION))
head(apr1)
## # A tibble: 6 x 7
##   SCHOOL_NAME SPORT_NAME SCHOOL_TYPE FOURYEAR_SCORE FOURYEAR_ATHLET~
##   <chr>       <chr>            <dbl>          <dbl>            <dbl>
## 1 Alabama A&~ Baseball             0            931               80
## 2 Alabama A&~ Football             0            932              321
## 3 Alabama A&~ Men's Bas~           0            964               43
## 4 Alabama A&~ Men's Golf           0            898               22
## 5 Alabama A&~ Men's Ten~           0            988               12
## 6 Alabama A&~ Men's Tra~           0            932               62
## # ... with 2 more variables: FOURYEAR_ELIGIBILITY <dbl>,
## #   FOURYEAR_RETENTION <dbl>

Analysis by School Type (Complete Pooling)

There are 2 different types of schools: public and private. According to complete pooling, public school has an average APR score of 975.4496 and private school has a 6.1180 increase in APR score beyond that.This suggests that private schools are more successful at keeping student-athletes on a strong academic track.

length(unique(apr1$SCHOOL_TYPE))
## [1] 2
cpool <- lm(FOURYEAR_SCORE ~ SCHOOL_TYPE, data = apr1)
summary(cpool)
## 
## Call:
## lm(formula = FOURYEAR_SCORE ~ SCHOOL_TYPE, data = apr1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1080.57    -8.45     6.55    16.43    24.55 
## 
## Coefficients:
##             Estimate Std. Error  t value Pr(>|t|)    
## (Intercept) 975.4496     0.7424 1313.935  < 2e-16 ***
## SCHOOL_TYPE   6.1180     1.2507    4.892 1.02e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 48.21 on 6509 degrees of freedom
## Multiple R-squared:  0.003663,   Adjusted R-squared:  0.00351 
## F-statistic: 23.93 on 1 and 6509 DF,  p-value: 1.024e-06

Analysis by School Level (Ecological Regression)

The ecological regression below states that an increase in the mean of private schools has an increase in the mean of APR scores by 7.0996. This data continues to support the idea that private schools have higher APR scores than their public counterparts.

length(unique(apr1$SCHOOL_NAME))
## [1] 385
aprmean <- apr1 %>%
  group_by(SCHOOL_NAME) %>%
  summarise(m_score = mean(FOURYEAR_SCORE, na.rm=TRUE), m_schooltype = mean(SCHOOL_TYPE, na.rm = TRUE))
head(aprmean)
## # A tibble: 6 x 3
##   SCHOOL_NAME                    m_score m_schooltype
##   <chr>                            <dbl>        <dbl>
## 1 Abilene Christian University      978.            1
## 2 Alabama A&M University            956.            0
## 3 Alabama State University          950.            0
## 4 Alcorn State University           956.            0
## 5 American International College    993             1
## 6 American University               986.            1
er <- lm(m_score ~ m_schooltype, data=aprmean)
summary(er)
## 
## Call:
## lm(formula = m_score ~ m_schooltype, data = aprmean)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -79.983  -3.441   3.288   8.348  22.725 
## 
## Coefficients:
##              Estimate Std. Error  t value Pr(>|t|)    
## (Intercept)  974.2746     0.9141 1065.819  < 2e-16 ***
## m_schooltype   7.0996     1.5437    4.599 5.77e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.45 on 383 degrees of freedom
## Multiple R-squared:  0.05234,    Adjusted R-squared:  0.04986 
## F-statistic: 21.15 on 1 and 383 DF,  p-value: 5.774e-06

No Pooling Model

Private schools always have a significantly higher APR score than public schools.

dcoef  <- apr1 %>% 
    group_by(SCHOOL_NAME) %>% 
    do(mod = lm(FOURYEAR_SCORE ~ SCHOOL_TYPE, data = .))
coef <- dcoef %>% do(data.frame(SCHOOL_TYPE = coef(.$mod)[1]))
ggplot(coef, aes(x = SCHOOL_TYPE)) + geom_histogram(fill = "#3366ff")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(coef, aes(x = SCHOOL_TYPE)) + geom_density(fill="#ffcc33")+xlab("APR Scores Across 385 Schools ")

#Finding the Slope There does not seem to be a significant difference between individual sports by public and private and their impact on APR scores.

dcoef <- apr1%>% 
    group_by(SPORT_NAME) %>% 
    do(mod = lm(FOURYEAR_SCORE ~ SCHOOL_TYPE, data = .))
coef <- dcoef %>% do(data.frame(SCHOOL_TYPE= coef(.$mod)[2.]))
ggplot(coef, aes(x = SCHOOL_TYPE)) + geom_histogram(fill="#34d000")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Random Intercept Model

According to this random intercept model, public schools had a standard deviation of 8.783339 and APR on average is 975.112. However, the standard deviation for private schools is 47.3848 and on average, they will score higher on APR than public schools by 6.1426.

random_i <- lme(FOURYEAR_SCORE ~ SCHOOL_TYPE, data = apr1, random = ~1|SCHOOL_NAME, method = "ML")
summary(random_i)
## Linear mixed-effects model fit by maximum likelihood
##  Data: apr1 
##        AIC      BIC    logLik
##   68900.87 68927.99 -34446.43
## 
## Random effects:
##  Formula: ~1 | SCHOOL_NAME
##         (Intercept) Residual
## StdDev:    8.783339  47.3848
## 
## Fixed effects: FOURYEAR_SCORE ~ SCHOOL_TYPE 
##                Value Std.Error   DF   t-value p-value
## (Intercept) 975.1120 0.9299801 6126 1048.5300   0e+00
## SCHOOL_TYPE   6.1426 1.5819439  383    3.8829   1e-04
##  Correlation: 
##             (Intr)
## SCHOOL_TYPE -0.588
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -22.4330095  -0.1632614   0.1193379   0.2902233   1.1395799 
## 
## Number of Observations: 6511
## Number of Groups: 385

Random Slope Model

random_s <- lme(FOURYEAR_SCORE ~ SCHOOL_TYPE, data = apr1, random = ~1|SCHOOL_NAME, method = "ML")
summary(random_s)
## Linear mixed-effects model fit by maximum likelihood
##  Data: apr1 
##        AIC      BIC    logLik
##   68900.87 68927.99 -34446.43
## 
## Random effects:
##  Formula: ~1 | SCHOOL_NAME
##         (Intercept) Residual
## StdDev:    8.783339  47.3848
## 
## Fixed effects: FOURYEAR_SCORE ~ SCHOOL_TYPE 
##                Value Std.Error   DF   t-value p-value
## (Intercept) 975.1120 0.9299801 6126 1048.5300   0e+00
## SCHOOL_TYPE   6.1426 1.5819439  383    3.8829   1e-04
##  Correlation: 
##             (Intr)
## SCHOOL_TYPE -0.588
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -22.4330095  -0.1632614   0.1193379   0.2902233   1.1395799 
## 
## Number of Observations: 6511
## Number of Groups: 385

Findings & Results

Private schools have significantly higher academic progress rates than public schools. These findings suggest that public schools may need more gudiance and programming on how to better motivate and train their athletes to do well both as an athlete and as an academic. Its extremely important to ensure that athletes have a solid educational background in case they do not go on a professional career track or are unable to grab attention of scouts for teams. There is no guarantee that an student athlete will have a successful sports career. College education should be giving them more opportunities in their careers and to improve their livelihood.