Our goal is to get a better understanding of the relationship between U.S. colleges’ selectivity and their graduate outcomes.
Is there evidence that the selectivity of an institution’s admissions benefits graduation outcomes (income)?
How well do standardized test scores predict graduate outcomes (income)?
For this project, we chose the College Scorecard data set, which is maintained by the U.S. Department of Education, because of the sheer amount of data it offers.
The College Scorecard is an online tool that aims to let people compare the cost and value of U.S. colleges by displaying data about cost, graduation, demographics, aid, debt, income, and more for most colleges in the nation (around 6,500 institutions).
It also releases a data set with around 3,200 variables per college across many categories. One example is the median income of students (who received aid) 6 years after enrollment.
For our exploratory analysis, our sample was every college in the United States, as this data is sourced from the government, which has oversight over all of them.
For income and aid related data, the source relies on data from students who receive financial aid, which is important to keep in mind when drawing conclusions.
The sample we used for our regression analysis was all medium sized schools (5,000-15,000 students) in the United States.
We defined small schools as ones with fewer than 5,000 students, medium schools as ones with 5,000-15,000 students, and large schools as ones with more than 15,000 students.
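The size cutoffs above can be written as a small helper. This is a Python sketch rather than the code used in the analysis; the function name `size_category` and the sample enrollment figures are hypothetical, and it assumes both 5,000 and 15,000 count as medium.

```python
def size_category(enrollment: int) -> str:
    """Classify a school by enrollment using the cutoffs described above."""
    if enrollment < 5_000:
        return "small"
    if enrollment <= 15_000:
        return "medium"
    return "large"


# Hypothetical enrollment figures, for illustration only.
for n in (1_200, 5_000, 15_000, 42_000):
    print(n, size_category(n))
```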
We can see that there is very little difference between small and large schools in terms of selectivity, even throughout the years.
Medium sized schools were the exception to this: they diverged from the others in every year until recently.
Small and large sized schools had a change of -1% in admission rate from 2002 to 2022, while medium sized schools had a +3% change.
We can observe that all types of schools were very similar, with average SAT scores around 1050, except for medium schools around a specific time.
A noticeable difference appears around 2016, when the SAT was restructured. Higher scores apparently became easier to get, given that the average jumped by nearly 100 points.
In 2022, the average score for large and small schools was around 1170, which is a 7.5% increase compared to 20 years earlier (2002).
After the new SAT came out, medium sized schools started to diverge from the other types and had higher averages.
There is an extreme difference between medium sized schools and small and large schools with regard to average income after graduation. This is interestingly similar to the first graph, where the medium sized schools were the outlier compared to the other sizes.
Around 2008, income drops for several years, and knowing recent history, that could be related to the 2008 recession.
From 2004 to 2020, small and large schools had very similar increases in income of +3,800 USD and +3,937 USD, while medium sized schools had an increase of +5,370 USD.
After 2013, all average incomes had a similar year by year increase, with medium schools following similar slopes with overall higher income in all years.
This exploratory data analysis showed us that overall earnings dropped around 2008, which aligns with the housing crisis and recession that happened in the U.S.
We have also noticed that admission rates and graduate earnings follow very similar patterns, so we want to analyze this phenomenon.
We believe there could be a relationship between how selective an institution is and the income of its graduates. We can use the same data set and perform regression analysis to investigate this.
RQ: Is there evidence that the selectivity of an institution’s admissions benefits graduate outcomes (income)?
What is the relationship between colleges’ admissions rates and graduate earnings?
How is this relationship shaped over time?
How has this relationship changed—if at all—since the 2008 recession?
\[\hat{Y} = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_3 + \beta_4x_1x_3\]
\(\hat{Y} =\) Median earnings 6 years after enrollment
\(x_1 =\) Admissions Rate of a given college
\(x_2 =\) Year, from 2004 to 2019
\(x_3 =\) Indicator for whether the year is before 2008
\(\beta_4 =\) Coefficient on the interaction between admissions rate and our pre/post 2008 condition (\(x_1x_3\)).
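Writing the model out separately for each value of \(x_3\) may make the role of \(\beta_4\) clearer. This is a sketch that assumes \(x_3 = 1\) for years before 2008 and \(x_3 = 0\) otherwise, and that the interaction term is \(\beta_4x_1x_3\):
\[x_3 = 0:\quad \hat{Y} = \beta_0 + \beta_1x_1 + \beta_2x_2\]
\[x_3 = 1:\quad \hat{Y} = (\beta_0 + \beta_3) + (\beta_1 + \beta_4)x_1 + \beta_2x_2\]
Under that assumption, the slope on admissions rate is \(\beta_1\) from 2008 onward and \(\beta_1 + \beta_4\) before 2008, so \(\beta_4\) measures how much the slope differs between the two periods.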
By understanding how admission rates have affected earnings from 2004 to 2019, before and after 2008, we can gain a better understanding of the strength of that relationship and its trend over time.
##
## Call:
## lm(formula = MD_EARN_WNE_P6 ~ ADM_RATE + year + pre_2008 * ADM_RATE,
## data = combined_df_mediumLM)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -33939  -5121   -498   4832  39753
##
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)
## (Intercept)       -2.006e+06  9.401e+04 -21.342  < 2e-16 ***
## ADM_RATE          -2.410e+04  8.407e+02 -28.663  < 2e-16 ***
## year               1.023e+03  4.669e+01  21.916  < 2e-16 ***
## pre_2008           5.553e+02  1.453e+03   0.382    0.702
## ADM_RATE:pre_2008  1.005e+04  2.018e+03   4.979 6.73e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9005 on 3124 degrees of freedom
## Multiple R-squared: 0.302, Adjusted R-squared: 0.3011
## F-statistic: 337.9 on 4 and 3124 DF, p-value: < 2.2e-16
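To see how the fitted coefficients combine into a prediction, the following Python sketch (not the R code used for the analysis) plugs the printed estimates into the model. It assumes `ADM_RATE` is a proportion between 0 and 1 and that `pre_2008` is 1 for years before 2008 and 0 otherwise; the function name `predicted_earnings` and the example inputs are hypothetical. The printed estimates are rounded to four significant figures, and the year slope is multiplied by a value near 2,000, so predictions may be off by roughly a thousand dollars; treat the result as illustrative only.

```python
# Point estimates copied from the regression output above (rounded as printed).
COEF = {
    "intercept": -2.006e6,
    "adm_rate": -2.410e4,
    "year": 1.023e3,
    "pre_2008": 5.553e2,
    "adm_rate:pre_2008": 1.005e4,
}


def predicted_earnings(adm_rate: float, year: int, pre_2008: int) -> float:
    """Predicted median earnings (USD) from the rounded estimates.

    Assumes adm_rate is a proportion in [0, 1] and pre_2008 is 0 or 1.
    """
    return (
        COEF["intercept"]
        + COEF["adm_rate"] * adm_rate
        + COEF["year"] * year
        + COEF["pre_2008"] * pre_2008
        + COEF["adm_rate:pre_2008"] * adm_rate * pre_2008
    )


# Hypothetical school admitting half of its applicants.
print(predicted_earnings(adm_rate=0.5, year=2015, pre_2008=0))
print(predicted_earnings(adm_rate=0.5, year=2006, pre_2008=1))
```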
Our regression analysis found that there is a negative correlation between admissions rate and earnings, meaning that on average, the lower the admissions rate of a college, the higher the earnings are for graduates of that college.
Additionally:
In our analysis, we assumed that there is no change in the relationship between admissions rate and graduate earnings before and after 2008. We can represent our null hypothesis as such:
\[H_0: \beta_1 = \beta_1+\beta_4\text{, or equivalently }\beta_4=0\]
Our alternative hypothesis is that there is a difference in the effect of admissions rate on graduate earnings:
\[H_a: \beta_1 \neq \beta_1+\beta_4\text{, or equivalently }\beta_4\neq0\]
In doing this test, we would like to assess the true effect of the 2008 recession on the relationship between admissions rate and earnings.
We can assume independence in our sample, because although colleges as businesses and research institutions all influence one another in our economy and academia at large, we can treat each one as a separate entity.
Our sample is large enough for us to have confidence in the results of our hypothesis test as well, comprising more than 30 observations. In fact, our sample comprises 3,129 observations (3,124 residual degrees of freedom plus 5 estimated coefficients).
Type I error: Mistakenly rejecting the assumption that there is no difference in the relationship between our explanatory variables and earnings from before 2008 to after.
This type of error could overstate the impact of the 2008 financial crisis and warp our understanding of its historical relevance.
Type II error: Failing to reject the null hypothesis, and mistakenly retaining the assumption that the relationship between our explanatory variables and earnings did not change from before 2008 to after.
This type of error could unfairly skew a prospective student’s choice to attend an institution based on known data.
A significance level of 5%, or α = 0.05, was used to limit the probability of a Type I error.
Testing the interaction coefficient (ADM_RATE:pre_2008) against our null hypothesis, we obtained a p-value of 6.73e-07, which is below 0.0001.
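As a rough check on that p-value, the test statistic for the interaction term can be recomputed from its printed estimate and standard error. The Python sketch below is not the R code used for the analysis, and it swaps in a standard normal approximation for the t distribution with 3,124 degrees of freedom that R uses, so the result should be close to, but not identical to, the reported 6.73e-07.

```python
import math

estimate = 1.005e4   # ADM_RATE:pre_2008 estimate, as printed in the output
std_error = 2.018e3  # its standard error, as printed in the output

# Test statistic for H0: beta_4 = 0.
t_stat = estimate / std_error

# Two-sided p-value under a standard normal approximation (an assumption;
# reasonable here because the residual degrees of freedom are large).
p_value = math.erfc(abs(t_stat) / math.sqrt(2))

print(f"t = {t_stat:.3f}, p = {p_value:.2e}")
print("reject H0 at alpha = 0.05:", p_value < 0.05)
```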
With a significance level of 5%, these results suggest a significant difference in the effect of admissions rate on graduate earnings before and after the 2008 financial crisis.
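Because the interaction estimate is positive while the admissions rate estimate is negative, the pre-2008 slope (\(\beta_1+\beta_4\)) is smaller in magnitude than the post-2008 slope (\(\beta_1\)), assuming pre_2008 is coded 1 for years before 2008. This means that admissions rate showed a weaker effect on graduate earnings before 2008, and a stronger effect post-2008.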
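The slope on admissions rate in each period can be worked out from the printed estimates. This Python sketch (not the authors' R code) assumes `pre_2008` is coded 1 for years before 2008 and 0 otherwise, and that `ADM_RATE` is a proportion, so a change of 0.10 corresponds to 10 percentage points.

```python
beta_1 = -2.410e4  # ADM_RATE estimate, as printed in the output
beta_4 = 1.005e4   # ADM_RATE:pre_2008 estimate, as printed in the output

slope_post_2008 = beta_1          # pre_2008 = 0
slope_pre_2008 = beta_1 + beta_4  # pre_2008 = 1 (assumed coding)

# Estimated change in median earnings (USD) for a 10 percentage point
# increase in admissions rate, holding year fixed.
for label, slope in (("pre-2008", slope_pre_2008), ("post-2008", slope_post_2008)):
    print(f"{label}: slope = {slope:.0f}, per 10 points = {0.10 * slope:.0f}")
```

Under that coding, the slope is larger in magnitude for the group with pre_2008 = 0.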
We have chosen a 95% confidence level for the confidence interval of each coefficient.
Our sample size is large enough to be generally confident in our results; however, because the sample contains only medium sized schools, generalizing these trends to all colleges in the US would require data on that wider population of schools.
##                           2.5 %       97.5 %
## (Intercept)       -2190676.452 -1822025.515
## ADM_RATE            -25746.623   -22449.713
## year                   931.732     1114.828
## pre_2008             -2292.862     3403.423
## ADM_RATE:pre_2008     6090.385    14002.875
Each confidence interval above represents the range of plausible values for the coefficient (“slope”) of each variable in our multiple linear regression model.
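These intervals can be approximately reproduced from the printed estimates and standard errors as estimate ± critical value × standard error. The Python sketch below is not the R code used for the analysis; it uses a normal critical value instead of the t quantile with 3,124 degrees of freedom, and works from rounded estimates, so its endpoints differ slightly from the output above. The helper name `conf_int` is hypothetical.

```python
from statistics import NormalDist

# (estimate, standard error) pairs copied from the regression output.
FIT = {
    "ADM_RATE": (-2.410e4, 8.407e2),
    "year": (1.023e3, 4.669e1),
    "pre_2008": (5.553e2, 1.453e3),
    "ADM_RATE:pre_2008": (1.005e4, 2.018e3),
}


def conf_int(estimate: float, std_error: float, level: float = 0.95):
    """Approximate confidence interval using a normal critical value."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


for name, (est, se) in FIT.items():
    lo, hi = conf_int(est, se)
    print(f"{name:>18}: [{lo:.1f}, {hi:.1f}]  contains 0: {lo <= 0 <= hi}")
```

An interval that contains zero, as for pre_2008, is consistent with that coefficient not being significant at the 5% level.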
R.Q. : Is there evidence that the selectivity of an institution’s admissions benefits graduation outcomes (income)?
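From our regression analysis, we found that there is a negative correlation between admissions rate and earnings.
From our hypothesis test, we found that the effect of admissions rate on graduate earnings differed significantly before and after 2008; the positive interaction estimate implies a weaker effect before 2008 than after, assuming pre_2008 is coded 1 for years before 2008.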
Why does admissions rate drop at the same time earnings drop?
Were colleges accepting fewer students in reaction to the crash?
What other factors associated with admissions rate might better explain graduate earnings for each college?
Presence of legacy admissions? Regional advantages?