The cost of a college education is a matter of great importance to thousands of U.S. high school students who embark on their journey to higher education each year. However, as the price of a college degree increases and the student debt crisis mounts, its return on investment coming into question. As a consequence, higher education institutions are seeing significant drops in enrollment as Gen Z becomes disillusioned.
Using data from the U.S. Department of Education’s College Scorecard, this analysis will explore the link between tuition rates, graduation rates, admission rate, and future earnings for thousands of colleges in the United States. Uncovering the relationships between these factors could be helpful for students who are exploring their college options and want the best opportunity for a return on their investment.
The data is compiled by universities nationwide the U.S. Department of Education (USDE) and made available on their website. The file is titled “Most Recent Institution-Level Data.” Data is reported to the USDE for students who were eligible for some form of federal aid.
Link to data: https://collegescorecard.ed.gov/data
# Reading in the data
colleges <- read.csv(file = 'Most-Recent-Cohorts-Institution.csv')
# head(colleges)
# Install and reading in color library
#install.packages('viridis')
library('viridis')
## Warning: package 'viridis' was built under R version 4.2.3
## Loading required package: viridisLite
library(car)
## Loading required package: carData
# Retrieve the number of rows and columns
dim(colleges)
## [1] 6681 2989
A college degree cannot help students earn a higher income if they do not graduate. Furthermore, failing to graduate from college after taking out student loans can have a negative impact on a student’s available income, making it harder to afford necessities like rent or groceries as they pay off their debt.
This section will examine whether universities that charge higher levels of tuition also have better graduation rates. It utilizes the 6-year graduation rate, which is generally considered in higher education to be the true average time it takes a student to complete their undergraduate studies rather than four years.
# Selecting only institution names, the cost of attendance and graduation rate
cost <- colleges[, c('INSTNM', 'COSTT4_A', 'C150_4')]
cost$COSTT4_A <- as.numeric(cost$COSTT4_A)
## Warning: NAs introduced by coercion
cost$C150_4 <- as.numeric(cost$C150_4)
## Warning: NAs introduced by coercion
colnames(cost) <- c('Institution Name', 'Cost of Attendance', 'Graduation Rate')
head(cost)
## Institution Name Cost of Attendance Graduation Rate
## 1 Alabama A & M University 23445 0.2866
## 2 University of Alabama at Birmingham 25542 0.6117
## 3 Amridge University 20100 0.2500
## 4 University of Alabama in Huntsville 24861 0.5714
## 5 Alabama State University 21892 0.3177
## 6 The University of Alabama 30016 0.7214
tail(cost)
## Institution Name Cost of Attendance
## 6676 Pennsylvania State University-Penn State Shenango NA
## 6677 Pennsylvania State University-Penn State Wilkes-Barre NA
## 6678 Pennsylvania State University-Penn State York NA
## 6679 Pennsylvania State University-Penn State Great Valley NA
## 6680 Pennsylvania State University-Penn State Harrisburg NA
## 6681 Pennsylvania State University-Penn State Brandywine NA
## Graduation Rate
## 6676 NA
## 6677 NA
## 6678 NA
## 6679 NA
## 6680 NA
## 6681 NA
# Checking for NAs
sum(is.na(cost))
## [1] 7769
# Removing rows with NAs.
cost <- cost[complete.cases(cost), ]
sum(is.na(cost))
## [1] 0
# Checking the dimensions of the remaining rows
dim(cost)
## [1] 2139 3
head(cost)
## Institution Name Cost of Attendance Graduation Rate
## 1 Alabama A & M University 23445 0.2866
## 2 University of Alabama at Birmingham 25542 0.6117
## 3 Amridge University 20100 0.2500
## 4 University of Alabama in Huntsville 24861 0.5714
## 5 Alabama State University 21892 0.3177
## 6 The University of Alabama 30016 0.7214
tail(cost)
## Institution Name
## 5977 Indiana Institute of Technology-College of Professional Studies
## 5996 Yeshiva of Ocean
## 5997 Congregation Talmidei Mesivta Tiferes Shmiel Aleksander
## 6007 Yeshiva Gedolah of Woodlake Village
## 6046 Urshan College
## 6106 The Pennsylvania State University
## Cost of Attendance Graduation Rate
## 5977 22827 0.1185
## 5996 14100 0.8333
## 5997 23300 0.1136
## 6007 14300 0.5000
## 6046 11054 0.1176
## 6106 33960 0.7281
We can see plainly that while the Graduation Rate almost normally distributed and slightly left skewed, the Cost of Attendance is skewed right. Those values are gathered around the lower end of the cost scale.
# Examining the data with histograms
hist(cost$`Cost of Attendance`)
hist(cost$`Graduation Rate`)
# Plotting the data shows a somewhat linear relationship
plot(cost$`Cost of Attendance`, cost$`Graduation Rate`, xlab="Cost of Attendance", ylab="Graduation Rate")
cost.lm <- lm(cost$`Graduation Rate` ~ cost$`Cost of Attendance`)
summary(cost.lm)
##
## Call:
## lm(formula = cost$`Graduation Rate` ~ cost$`Cost of Attendance`)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.58940 -0.10638 0.00005 0.10263 0.61930
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.784e-01 8.416e-03 33.08 <2e-16 ***
## cost$`Cost of Attendance` 7.005e-06 2.183e-07 32.09 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1754 on 2137 degrees of freedom
## Multiple R-squared: 0.3252, Adjusted R-squared: 0.3249
## F-statistic: 1030 on 1 and 2137 DF, p-value: < 2.2e-16
plot(cost$`Cost of Attendance`, cost$`Graduation Rate`, xlab="Cost of Attendance", ylab="Graduation Rate")
abline(cost.lm)
Based on the coefficients and p-value, there appears to be a linear relationship between the cost of tuition and graduation rate where a $1 increase in tuition leads to a minuscule increase in the graduate rate. But a plot of residuals reveals that there is a non-constant error variance. Therefore, a log transformation is needed to see if the model can be improved.
par(mfrow = c(2, 2))
plot(cost.lm)
# Performing a log transformation on Cost of Attendance
plot(cost$`Graduation Rate` ~ log(cost$`Cost of Attendance`), xlab="Cost of Attendance", ylab="Graduation Rate")
cost_log.lm <- lm(cost$`Graduation Rate` ~ log(cost$`Cost of Attendance`))
summary(cost_log.lm)
##
## Call:
## lm(formula = cost$`Graduation Rate` ~ log(cost$`Cost of Attendance`))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.60561 -0.10510 0.00225 0.11275 0.64335
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.790086 0.076905 -23.28 <2e-16 ***
## log(cost$`Cost of Attendance`) 0.223880 0.007445 30.07 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.179 on 2137 degrees of freedom
## Multiple R-squared: 0.2973, Adjusted R-squared: 0.297
## F-statistic: 904.2 on 1 and 2137 DF, p-value: < 2.2e-16
# Plotting the model
plot(log(cost$`Cost of Attendance`), cost$`Graduation Rate`, xlab="Cost of Attendance", ylab="Graduation Rate")
abline(cost_log.lm)
# Plotting the residuals
par(mfrow = c(2, 2))
plot(cost_log.lm)
The log transformation has improved the model, and we an observe much more randomness in the plot of the residuals. We can say there is a linear correlation between universities’ tuition and graduation rates, where a 1 unit increase in tuition will lead to a .22% increase in graduation rate.
This is perhaps because universities that charge more money may have more student and academic support resources.
High-cost and prestigious universities tend to be concentrated on the East and West Coasts, which could lead to the assumption that attending college in these areas will give students the best chance of later earning a high salary. This section will look at whether the geographic region is correlated with students’ starting salaries.
# Selecting only institution names, the cost of attendance, median earnings, region and graduation rate
geo_region <- colleges[, c('INSTNM', 'COSTT4_A', 'C150_4', 'MD_EARN_WNE_P6' , 'REGION')]
geo_region$COSTT4_A <- as.numeric(geo_region$COSTT4_A)
## Warning: NAs introduced by coercion
geo_region$C150_4 <- as.numeric(geo_region$C150_4)
## Warning: NAs introduced by coercion
geo_region$MD_EARN_WNE_P6 <- as.numeric(geo_region$MD_EARN_WNE_P6)
## Warning: NAs introduced by coercion
colnames(geo_region) <- c('Institution Name', 'Cost of Attendance', 'Graduation Rate', 'Median Earnings', 'Region')
head(geo_region)
## Institution Name Cost of Attendance Graduation Rate
## 1 Alabama A & M University 23445 0.2866
## 2 University of Alabama at Birmingham 25542 0.6117
## 3 Amridge University 20100 0.2500
## 4 University of Alabama in Huntsville 24861 0.5714
## 5 Alabama State University 21892 0.3177
## 6 The University of Alabama 30016 0.7214
## Median Earnings Region
## 1 28704 5
## 2 39271 5
## 3 32939 5
## 4 47533 5
## 5 25949 5
## 6 44485 5
tail(geo_region)
## Institution Name Cost of Attendance
## 6676 Pennsylvania State University-Penn State Shenango NA
## 6677 Pennsylvania State University-Penn State Wilkes-Barre NA
## 6678 Pennsylvania State University-Penn State York NA
## 6679 Pennsylvania State University-Penn State Great Valley NA
## 6680 Pennsylvania State University-Penn State Harrisburg NA
## 6681 Pennsylvania State University-Penn State Brandywine NA
## Graduation Rate Median Earnings Region
## 6676 NA NA 2
## 6677 NA NA 2
## 6678 NA NA 2
## 6679 NA NA 2
## 6680 NA NA 2
## 6681 NA NA 2
# Removing NAs
geo_region <- geo_region[complete.cases(geo_region), ]
sum(is.na(geo_region))
## [1] 0
head(geo_region)
## Institution Name Cost of Attendance Graduation Rate
## 1 Alabama A & M University 23445 0.2866
## 2 University of Alabama at Birmingham 25542 0.6117
## 3 Amridge University 20100 0.2500
## 4 University of Alabama in Huntsville 24861 0.5714
## 5 Alabama State University 21892 0.3177
## 6 The University of Alabama 30016 0.7214
## Median Earnings Region
## 1 28704 5
## 2 39271 5
## 3 32939 5
## 4 47533 5
## 5 25949 5
## 6 44485 5
tail(geo_region)
## Institution Name
## 5711 Husson University
## 5777 Indiana Wesleyan University-National & Global
## 5814 Carolina University
## 5871 Purdue University Northwest
## 5974 Drury University-College of Continuing Professional Studies
## 5977 Indiana Institute of Technology-College of Professional Studies
## Cost of Attendance Graduation Rate Median Earnings Region
## 5711 31303 0.5764 38215 1
## 5777 23591 0.3833 49124 3
## 5814 22830 0.4615 28038 5
## 5871 17408 0.4192 38320 3
## 5974 20867 0.3125 32347 4
## 5977 22827 0.1185 35502 3
dim(geo_region)
## [1] 2031 5
# Renaming columns
geo_regions <- geo_region[, c('Institution Name', 'Region', 'Median Earnings')]
geo_regions$Region[geo_regions$Region == 1] <- 'New England'
geo_regions$Region[geo_regions$Region == 2] <- 'Mid East'
geo_regions$Region[geo_regions$Region == 3] <- 'Great Lakes'
geo_regions$Region[geo_regions$Region == 4] <- 'Plains'
geo_regions$Region[geo_regions$Region == 5] <- 'Southeast'
geo_regions$Region[geo_regions$Region == 6] <- 'Southwest'
geo_regions$Region[geo_regions$Region == 7] <- 'Rocky Mountains'
geo_regions$Region[geo_regions$Region == 8] <- 'Far West'
geo_regions$Region[geo_regions$Region == 9] <- 'Outlying Areas'
head(geo_regions)
## Institution Name Region Median Earnings
## 1 Alabama A & M University Southeast 28704
## 2 University of Alabama at Birmingham Southeast 39271
## 3 Amridge University Southeast 32939
## 4 University of Alabama in Huntsville Southeast 47533
## 5 Alabama State University Southeast 25949
## 6 The University of Alabama Southeast 44485
# Calculating the median earnings per region
group_region <- aggregate(geo_regions$`Median Earnings`, list(geo_regions$Region), median)
colnames(group_region) <- c("Region", "Median")
group_region <- group_region[-1,]
group_region$Region <- as.character(group_region$Region)
geo_region <- geo_region[complete.cases(geo_region), ]
group_region
## Region Median
## 2 Far West 41851.0
## 3 Great Lakes 39914.0
## 4 Mid East 43284.5
## 5 New England 44057.0
## 6 Outlying Areas 19917.0
## 7 Plains 39996.0
## 8 Rocky Mountains 35365.0
## 9 Southeast 34078.0
## 10 Southwest 35876.0
# Visualizing the results
barplot(group_region$Median, col=viridis(9), xlab = 'Regions', ylab = 'Student Median Earnings')
legend("bottomright", fill = viridis(9), legend = group_region$Region, xpd=TRUE, cex=.6, bg = 'white')
We can see that New England, Mid East, and Far West (which include the states New York and California) have the highest median earnings for students. The Southeast and Outlying Areas, which include poor states like Louisiana and U.S. territories like Guam, have the lowest median earnings for students.
# Getting correlations for all columns, excluding the names column
cor(geo_region[,-1])
## Cost of Attendance Graduation Rate Median Earnings
## Cost of Attendance 1.0000000 0.5934087 0.4936390
## Graduation Rate 0.5934087 1.0000000 0.5593289
## Median Earnings 0.4936390 0.5593289 1.0000000
## Region -0.2985036 -0.2278097 -0.2273482
## Region
## Cost of Attendance -0.2985036
## Graduation Rate -0.2278097
## Median Earnings -0.2273482
## Region 1.0000000
# Generating a heat map
heatmap(abs(cor(geo_region[,-1])), cexRow = .7, cexCol = .7)
The heatmap of correlations is not showing a strong relationship between Earnings and the Region of a college or university. However, it reveals a correlation between Median Earnings and Graduation Rate. This deserves further investigation.
These histograms show that Graduation Rate is fairly normally distributed, but Median Earnings is right-skewed. This may call for a log transformation.
hist(geo_region$`Graduation Rate`)
hist(geo_region$`Median Earnings`)
# Plotting the data without log regression
earnings_grad.lm <- lm(geo_regions$`Median Earnings` ~ geo_region$`Graduation Rate`)
plot(geo_region$`Graduation Rate`, geo_regions$`Median Earnings`)
abline(earnings_grad.lm)
# Residuals without log regression
par(mfrow = c(2, 2))
plot(earnings_grad.lm)
# Plot with log regression on
earnings_grad.lm <- lm(log(geo_regions$`Median Earnings`) ~ geo_region$`Graduation Rate`)
plot(geo_region$`Graduation Rate`, log(geo_regions$`Median Earnings`))
abline(earnings_grad.lm)
par(mfrow = c(2, 2))
plot(earnings_grad.lm)
summary(earnings_grad.lm)
##
## Call:
## lm(formula = log(geo_regions$`Median Earnings`) ~ geo_region$`Graduation Rate`)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.30847 -0.10641 0.01391 0.12497 1.19952
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.13555 0.01456 695.95 <2e-16 ***
## geo_region$`Graduation Rate` 0.78816 0.02591 30.42 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2426 on 2029 degrees of freedom
## Multiple R-squared: 0.3132, Adjusted R-squared: 0.3129
## F-statistic: 925.3 on 1 and 2029 DF, p-value: < 2.2e-16
While the model isn’t perfect, we do see some correlation between Median Earnings and Graduation Rate. As the Graduation Rate increases by 1 percent, Median Earnings will increase by 0.79 units.
People think that an IT background earns college graduates a lot a high starting salary. We did a regression analysis on the this topic to find out if a correlation exists between having a STEM degree and high earnings.
We took the percentage of students who were awarded degrees in all kinds of technology programs (including Computer Engineering, Communication Technology, Science and Technology) from all universities in the data set. Additional data includes the median earnings after 10 years and the cost of attendance of every university, categorized based on whether their cost of attendance was above or below the mean tuition rate.
What we found is interesting! The increased percentage of technology graduates in U.S. doesn’t account for the increase in median earnings 10 years after graduation.
This shows that IT graduates are not really getting paid higher salaries than their peers!
# Subsetting the data
earningsTech <- colleges[, c('INSTNM', 'COSTT4_A', 'MD_EARN_WNE_P10', 'PCIP10','PCIP15','PCIP41')]
earningsTech$COSTT4_A <- as.numeric(earningsTech$COSTT4_A)
## Warning: NAs introduced by coercion
earningsTech$MD_EARN_WNE_P10 <- as.numeric(earningsTech$MD_EARN_WNE_P10)
## Warning: NAs introduced by coercion
earningsTech$PCIP10 <- as.numeric(earningsTech$PCIP10)
## Warning: NAs introduced by coercion
earningsTech$PCIP15 <- as.numeric(earningsTech$PCIP15)
## Warning: NAs introduced by coercion
earningsTech$PCIP41 <- as.numeric(earningsTech$PCIP41)
## Warning: NAs introduced by coercion
colnames(earningsTech) <- c('Institution Name', 'Cost of Attendance', 'Earnings', 'CommTech', 'EngTech', 'SciTech')
head(earningsTech)
## Institution Name Cost of Attendance Earnings CommTech
## 1 Alabama A & M University 23445 36339 0.0393
## 2 University of Alabama at Birmingham 25542 46990 0.0000
## 3 Amridge University 20100 37895 0.0000
## 4 University of Alabama in Huntsville 24861 54361 0.0000
## 5 Alabama State University 21892 32084 0.0000
## 6 The University of Alabama 30016 52751 0.0000
## EngTech SciTech
## 1 0.0154 0e+00
## 2 0.0000 4e-04
## 3 0.0000 0e+00
## 4 0.0000 0e+00
## 5 0.0000 0e+00
## 6 0.0000 0e+00
#Removing Na's
earningsTech <- earningsTech[complete.cases(earningsTech), ]
sum(is.na(earningsTech))
## [1] 0
#Combined the percentage of all Tech related degrees.
earningsTech$TotalTech <- earningsTech$CommTech + earningsTech$EngTech + earningsTech$SciTech
#log transformation for the percentage of degrees
earningsTech$TotalTech <- log(earningsTech$TotalTech)
#Removing infinite values after log transformation.
for (i in 1:nrow(earningsTech)){
if(is.infinite(earningsTech$TotalTech[i])){
earningsTech <- earningsTech[-i,]
}
}
for (i in 1:nrow(earningsTech)){
if(is.infinite(earningsTech$TotalTech[i])){
earningsTech <- earningsTech[-i,]
}
}
for (i in 1:nrow(earningsTech)){
if(is.infinite(earningsTech$TotalTech[i])){
earningsTech <- earningsTech[-i,]
}
}
for (i in 1:nrow(earningsTech)){
if(is.infinite(earningsTech$TotalTech[i])){
earningsTech <- earningsTech[-i,]
}
}
for (i in 1:nrow(earningsTech)){
if(is.infinite(earningsTech$TotalTech[i])){
earningsTech <- earningsTech[-i,]
}
}
# Categorizes universities as having a high or not high tuition rate
earningsTech$Inst_Range <- ifelse(earningsTech$`Cost of Attendance` >=
mean(earningsTech$`Cost of Attendance`), "AboveAvg", "BelowAvg")
#Creating a model
techModel <- lm(earningsTech$Earnings ~ earningsTech$TotalTech + earningsTech$`Cost of Attendance`+earningsTech$Inst_Range)
#Full Model
techmodel1 <- lm(earningsTech$Earnings ~ earningsTech$TotalTech + earningsTech$`Cost of Attendance`+earningsTech$Inst_Range + earningsTech$TotalTech: earningsTech$Inst_Range)
#Summary of the Model
summary(techModel)
##
## Call:
## lm(formula = earningsTech$Earnings ~ earningsTech$TotalTech +
## earningsTech$`Cost of Attendance` + earningsTech$Inst_Range)
##
## Residuals:
## Min 1Q Median 3Q Max
## -26098 -4800 -504 4084 47282
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.056e+04 1.159e+03 26.363 < 2e-16 ***
## earningsTech$TotalTech -1.115e+03 1.783e+02 -6.252 5.42e-10 ***
## earningsTech$`Cost of Attendance` 4.146e-01 2.631e-02 15.761 < 2e-16 ***
## earningsTech$Inst_RangeBelowAvg -3.889e+03 7.296e+02 -5.331 1.14e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8300 on 1359 degrees of freedom
## Multiple R-squared: 0.4175, Adjusted R-squared: 0.4163
## F-statistic: 324.7 on 3 and 1359 DF, p-value: < 2.2e-16
summary(techmodel1)
##
## Call:
## lm(formula = earningsTech$Earnings ~ earningsTech$TotalTech +
## earningsTech$`Cost of Attendance` + earningsTech$Inst_Range +
## earningsTech$TotalTech:earningsTech$Inst_Range)
##
## Residuals:
## Min 1Q Median 3Q Max
## -25309 -4979 -574 4164 48295
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 2.863e+04 1.363e+03
## earningsTech$TotalTech -1.653e+03 2.686e+02
## earningsTech$`Cost of Attendance` 4.137e-01 2.625e-02
## earningsTech$Inst_RangeBelowAvg -4.855e+02 1.467e+03
## earningsTech$TotalTech:earningsTech$Inst_RangeBelowAvg 9.574e+02 3.582e+02
## t value Pr(>|t|)
## (Intercept) 21.009 < 2e-16 ***
## earningsTech$TotalTech -6.153 1e-09 ***
## earningsTech$`Cost of Attendance` 15.762 < 2e-16 ***
## earningsTech$Inst_RangeBelowAvg -0.331 0.74072
## earningsTech$TotalTech:earningsTech$Inst_RangeBelowAvg 2.673 0.00761 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8282 on 1358 degrees of freedom
## Multiple R-squared: 0.4206, Adjusted R-squared: 0.4189
## F-statistic: 246.4 on 4 and 1358 DF, p-value: < 2.2e-16
pairs(earningsTech$Earnings ~ earningsTech$TotalTech + earningsTech$`Cost of Attendance`)
# Partial F-test
anova(techModel, techmodel1)
## Analysis of Variance Table
##
## Model 1: earningsTech$Earnings ~ earningsTech$TotalTech + earningsTech$`Cost of Attendance` +
## earningsTech$Inst_Range
## Model 2: earningsTech$Earnings ~ earningsTech$TotalTech + earningsTech$`Cost of Attendance` +
## earningsTech$Inst_Range + earningsTech$TotalTech:earningsTech$Inst_Range
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1359 9.3626e+10
## 2 1358 9.3136e+10 1 489949973 7.1439 0.007612 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Plotting the Model
plot(techModel)
abline(techModel)
## Warning in abline(techModel): only using the first two of 4 regression
## coefficients
#Fitted Values Plot
plot(techModel$fitted.values, earningsTech$Earnings, main = "Fitted Values vs Earnings")
abline(lm(earningsTech$Earnings ~techModel$fitted.values))
residualPlots(techModel)
## Test stat Pr(>|Test stat|)
## earningsTech$TotalTech -0.4436 0.6574
## earningsTech$`Cost of Attendance` 1.2005 0.2302
## Tukey test 2.0663 0.0388 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Variance Inflation Factors
vif(techModel)
## earningsTech$TotalTech earningsTech$`Cost of Attendance`
## 1.004803 2.260711
## earningsTech$Inst_Range
## 2.256711
#Added Value plots
avPlots(techModel)
The partial F-test shows the interaction terms does not have any significance, leading us to move forward with the reduced model. From this model, universities charging below-average tuition rates also have students who earn lower salaries compared to their peers at high-tuition universities.
However, our analysis does not support the idea that the tech graduates in the U.S. are more highly paid than their peers.
Based on the Spearman correlation test , the p-value is < 2.2e-16, which is less than the significance level alpha = 0.05. We can conclude that ADM_RATE and OPENADMP are significantly correlated with a correlation coefficient of 0.86 and p-value of < 2.2e-16.
Based on Pearson’s correlation test, the p-value is < 2.2e-16, which is less than the significance level alpha = 0.05. We can conclude that TUITIONFEE_IN and ADM_RATE are significantly correlated with a correlation coefficient of 0.54 and p-value of < 2.2e-16.
Tuition and Graduation Rate: We did find a correlation between these two variables. Universities that charge more for tuition also tend to have better graduation rates, which means more students are finishing their studies. It’s possible that these institutions have more resources to support student success.
Region and Earnings: We did not find a strong relationship between where a student graduates and their future salaries. This debunks any misconception students might have that going specifically to the East or West Coasts for college would in itself lead to higher earnings.
Earnings and Graduation Rate: We did find a relationship between universities’ graduation rates and how much their students go on to earn. It stands to reason that if more students complete their studies, they are earning more money.
STEM and Earnings: We did not find a strong correlation between earning a STEM degree and making higher earnings than students who graduated with different degrees. However, we did find a relationship between tuition rates and earnings. Universities that charge more money for attendance also tend to have graduates who earn higher salaries.
Admission Rate and Open Admission Policy: We found a correlation between these variables. Universities with Open Admission Policies do in fact enroll more students. This leads us to believe that more exclusive universities would have smaller overall student bodies.
Admission Rate and Tuition: We did find a correlation between the exclusivity of a university and its tuition rate. The more selective a university is, the higher its tuition rate tends to be.