Research Question: What is the relationship between cost of attendance, average faculty salary, and admission rate?
I will use College Scorecard data from the U.S. Department of Education (https://collegescorecard.ed.gov/data/) for the 2017-18 academic year.
The target variable is cost of attendance. Predictor variables are average faculty salary and admission rate.
To analyze the three numeric variables, we can create a scatterplot and run a multiple linear regression model.
Cost of attendance has a weak positive relationship with average faculty salary and a weak negative relationship with admission rate.
library(tidyverse)  # provides %>% and ggplot2

# Cost of attendance vs. average faculty salary, colored by admission rate
collegesc %>%
  ggplot(aes(x = avgfacsal, y = costt4_a, color = adm_rate)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm") +  # line of best fit
  scale_x_continuous(labels = scales::dollar) +
  scale_y_continuous(labels = scales::dollar) +
  labs(
    title = "What is the relationship between cost of attendance, average\nfaculty salary, and admission rate?",
    caption = "Source: US Department of Education",
    x = "Average Faculty Salary",
    y = "Cost of Attendance"
  )
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 3692 rows containing non-finite values (stat_smooth).
## Warning: Removed 3692 rows containing missing values (geom_point).
library(plotly)

# Interactive scatterplot: cost vs. admission rate, colored by faculty salary
plot_ly(data = collegesc, x = ~adm_rate, y = ~costt4_a,
        color = ~avgfacsal,
        alpha = 0.6,
        text = ~instnm) %>%
  layout(title = "Cost of Attendance, Average Faculty Salary, and
Admission Rate: What is the Relationship?",
         xaxis = list(title = "Admission Rate"),
         yaxis = list(title = "Cost of Attendance"),
         annotations =
           list(x = 1, y = -0.1, text = "Source: US Dept. of Education",
                showarrow = FALSE, xref = 'paper', yref = 'paper',
                xanchor = 'right', yanchor = 'auto', xshift = 0, yshift = 0,
                font = list(size = 11))) %>%
  colorbar(title = "Average Faculty Salary")  # label the color scale
## No trace type specified:
## Based on info supplied, a 'scatter' trace seems appropriate.
## Read more about this trace type -> https://plotly.com/r/reference/#scatter
## No scatter mode specifed:
## Setting the mode to markers
## Read more about this attribute -> https://plotly.com/r/reference/#scatter-mode
## Warning: Ignoring 5279 observations
There seems to be a positive relationship between cost of attendance and average faculty salary; the role of admission rate is harder to judge by eye. Of the 7,112 rows of data, the first plot (average faculty salary on the x-axis) drops 3,692 rows with missing values and the second plot drops 5,279 observations. The line of best fit on the first plot lets us check the slope and confirm the positive relationship between average faculty salary and cost of attendance.
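The row counts dropped by each plot come from missing values in the plotted columns. A minimal sketch of how that missingness could be tallied is shown here, using a small made-up data frame named `toy` as a stand-in for `collegesc`; the values are invented for illustration and are not taken from the College Scorecard file.

```r
# Toy stand-in for collegesc; all values are made up for illustration only
toy <- data.frame(
  costt4_a  = c(25000, NA, 41000, 18000, NA),
  avgfacsal = c(7000, 6500, NA, 5200, 8100),
  adm_rate  = c(0.65, NA, 0.30, NA, 0.85)
)

# Number of missing values in each variable
na_per_column <- colSums(is.na(toy))

# Number of rows with all three variables present;
# lm() keeps only these rows under its default NA handling
n_complete <- sum(complete.cases(toy))

print(na_per_column)
print(n_complete)
```

Running the same two calls on the three columns of `collegesc` would show how many rows survive into the regression.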
options(scipen=4)
csmlr <- lm(costt4_a ~ avgfacsal + adm_rate,
data = collegesc)
summary(csmlr)
##
## Call:
## lm(formula = costt4_a ~ avgfacsal + adm_rate, data = collegesc)
##
## Residuals:
## Min 1Q Median 3Q Max
## -38524 -12005 30 11631 36019
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 28517.0595 1809.2729 15.762 < 2e-16 ***
## avgfacsal 2.0701 0.1373 15.080 < 2e-16 ***
## adm_rate -13529.9055 1683.3792 -8.037 1.65e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13800 on 1785 degrees of freedom
## (5324 observations deleted due to missingness)
## Multiple R-squared: 0.1913, Adjusted R-squared: 0.1904
## F-statistic: 211.1 on 2 and 1785 DF, p-value: < 2.2e-16
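Using multiple linear regression to test whether cost of attendance is related to average faculty salary and admission rate, we find that both predictors, as well as the intercept, are statistically significant, since each p-value is below 0.05. The intercept ($28,517.06) is the predicted cost of attendance for a school with an average faculty salary of 0 and an admission rate of 0, which is an extrapolation rather than a realistic school. Holding average faculty salary constant, each one-point (0.01) decrease in admission rate is associated with an increase of about $135.30 in predicted cost of attendance; the coefficient of -13,529.91 describes a full one-unit change in admission rate, that is, from 0 to 1. Holding admission rate constant, each additional dollar of average faculty salary is associated with an increase of about $2.07 in predicted cost of attendance. The adjusted R^2 of 0.1904 means the model explains about 19% of the variation in cost of attendance, so the relationship is weak. The model was fit on the 1,788 schools with complete data, after 5,324 observations were deleted due to missingness.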
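To make the coefficient scales concrete, the arithmetic can be worked through directly from the estimates printed by `summary(csmlr)`. The sketch below hard-codes those three estimates; the example school (average faculty salary of 7,000 and admission rate of 0.60) is a hypothetical input chosen only for illustration.

```r
# Estimates copied from the summary(csmlr) output
b_intercept <- 28517.0595
b_avgfacsal <- 2.0701
b_adm_rate  <- -13529.9055

# Change in predicted cost for a 0.01 drop in admission rate,
# holding average faculty salary fixed
delta_adm <- b_adm_rate * (-0.01)

# Change in predicted cost for a 1,000 increase in average faculty salary,
# holding admission rate fixed
delta_sal <- b_avgfacsal * 1000

# Predicted cost for a hypothetical school (inputs are assumptions)
pred <- b_intercept + b_avgfacsal * 7000 + b_adm_rate * 0.60

print(round(c(delta_adm = delta_adm, delta_sal = delta_sal, pred = pred), 2))
```

The admission rate coefficient looks large only because `adm_rate` is measured on a 0 to 1 scale; a one-point change is one hundredth of the coefficient.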
From our analysis we can conclude that cost of attendance is weakly related to both predictors: positively to average faculty salary and negatively to admission rate.
One data limitation is the large number of missing values in each of our variables. Further analysis should account for the missing data and for the weak fit indicated by the R-squared.
It makes sense for cost of attendance to be higher when admission rate is lower (more selective schools) and/or average faculty salary is higher. The analysis confirmed these directions, but the relationships are weak.
Since the variables are not moderately or strongly correlated, it might be more useful to study how other variables in the dataset relate to cost of attendance.
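One way to start screening other variables is a pairwise correlation table that tolerates missing values. The sketch below uses a made-up data frame; `other_var_1` is a hypothetical placeholder for any additional numeric column, not a real College Scorecard field, and all values are invented.

```r
# Toy data; values and the column other_var_1 are hypothetical
toy <- data.frame(
  costt4_a    = c(25000, 31000, 41000, 18000, 52000, 29000),
  avgfacsal   = c(7000, 6500, 9000, 5200, 10100, NA),
  adm_rate    = c(0.65, 0.70, 0.30, 0.90, 0.15, 0.60),
  other_var_1 = c(1200, 3400, 800, NA, 5600, 2100)
)

# Pearson correlations, using every pair of rows where both values are present
cors <- cor(toy, use = "pairwise.complete.obs")

# Rank the candidate predictors by absolute correlation with cost of attendance
cost_cors <- sort(abs(cors["costt4_a", colnames(cors) != "costt4_a"]),
                  decreasing = TRUE)

print(round(cost_cors, 2))
```

Because each pair can use a different subset of rows, these correlations are a screening aid rather than a substitute for refitting the regression.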