Introduction

Research Question: What is the relationship between cost of attendance, average faculty salary, and admission rate?

I will use College Scorecard data from the U.S. Department of Education for the 2017-18 academic year.

Methods

The data come from the U.S. Department of Education's College Scorecard, https://collegescorecard.ed.gov/data/, for the 2017-18 school year.

The target variable is cost of attendance (costt4_a). Predictor variables are average faculty salary (avgfacsal) and admission rate (adm_rate).

To analyze the three numeric variables, we can create scatterplots and fit a multiple linear regression model.
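For reference, a multiple linear regression with these three variables can be written as below. The symbols \beta_0, \beta_1, \beta_2 and \varepsilon_i, and the short variable names, are notation assumed here for illustration; they do not appear in the original analysis.

```latex
\[
\text{cost}_i = \beta_0 + \beta_1\,\text{salary}_i + \beta_2\,\text{admrate}_i + \varepsilon_i
\]
```

Here \beta_1 is the expected change in cost of attendance per one-unit change in average faculty salary with admission rate held fixed, and \beta_2 is the corresponding change per one-unit change in admission rate.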

Results

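Cost of attendance has a positive relationship with average faculty salary and a negative relationship with admission rate. Both relationships are statistically significant, but together the two predictors explain only about 19% of the variation in cost of attendance.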

Exploring the data

library(tidyverse)  # provides %>% and ggplot2

collegesc %>% 
  ggplot(aes(x = avgfacsal, y = costt4_a, color = adm_rate)) + 
  geom_point(alpha = 0.5) + 
  geom_smooth(method = "lm") +                   # line of best fit
  scale_x_continuous(labels = scales::dollar) +  # dollar-formatted axes
  scale_y_continuous(labels = scales::dollar) +
  labs(
    title = "What is the relationship between cost of attendance, average\nfaculty salary, and admission rate?",
    caption = "Source: US Department of Education",
    x = "Average Faculty Salary",
    y = "Cost of Attendance"
  )
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 3692 rows containing non-finite values (stat_smooth).
## Warning: Removed 3692 rows containing missing values (geom_point).

library(plotly)  # provides plot_ly() and layout()

plot_ly(data = collegesc, x = ~adm_rate, y = ~costt4_a,
        color = ~avgfacsal,
        alpha = 0.6,
        text = ~instnm) %>%
  layout(title = "Cost of Attendance, Average Faculty Salary, and 
         Admission Rate: What is the Relationship?",
         xaxis = list(title = "Admission Rate"),
         yaxis = list(title = "Cost of Attendance"),
         coloraxis = list(title="Average Faculty Salary"),
         annotations = 
           list(x = 1, y = -0.1, text = "Source: US Dept. of Education", 
                showarrow = F, xref='paper', yref='paper', 
                xanchor='right', yanchor='auto', xshift=0, yshift=0,
                font=list(size=11)))
## No trace type specified:
##   Based on info supplied, a 'scatter' trace seems appropriate.
##   Read more about this trace type -> https://plotly.com/r/reference/#scatter
## No scatter mode specifed:
##   Setting the mode to markers
##   Read more about this attribute -> https://plotly.com/r/reference/#scatter-mode
## Warning: Ignoring 5279 observations

Cost of attendance appears to rise with average faculty salary, and the line of best fit on the first plot has a positive slope. The direction of the relationship with admission rate is harder to read from the plots, so it is checked with the regression in the next section. The first plot (average faculty salary on the x-axis) drops 3,692 rows with missing values, and the second plot (admission rate on the x-axis) drops 5,279 of the 7,112 rows in the data.
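The row counts reported in the warnings could be checked directly by counting missing values per variable. The following is a minimal sketch, not part of the original analysis: `missing_summary` is a hypothetical helper, and `toy` is a small invented data frame standing in for `collegesc` so that the block runs on its own.

```r
# Toy stand-in for `collegesc`; these values are invented for illustration.
toy <- data.frame(
  costt4_a  = c(21000, NA, 35000, 18000, NA),
  avgfacsal = c(6500, 7200, NA, 5400, 8100),
  adm_rate  = c(0.62, NA, 0.35, 0.80, NA)
)

# Hypothetical helper: count NAs in each variable and the number of rows
# that have all of the listed variables present (the rows a model can use).
missing_summary <- function(df, vars) {
  sub <- df[, vars, drop = FALSE]
  list(
    n_rows         = nrow(sub),
    missing_by_var = colSums(is.na(sub)),
    complete_rows  = sum(complete.cases(sub))
  )
}

res <- missing_summary(toy, c("costt4_a", "avgfacsal", "adm_rate"))
print(res)
```

Calling `missing_summary(collegesc, c("costt4_a", "avgfacsal", "adm_rate"))` on the real data would show which variable accounts for most of the dropped rows.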

Analyzing and interpreting the data

options(scipen=4)
csmlr <- lm(costt4_a ~ avgfacsal + adm_rate, 
            data = collegesc)
summary(csmlr)
## 
## Call:
## lm(formula = costt4_a ~ avgfacsal + adm_rate, data = collegesc)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -38524 -12005     30  11631  36019 
## 
## Coefficients:
##                Estimate  Std. Error t value Pr(>|t|)    
## (Intercept)  28517.0595   1809.2729  15.762  < 2e-16 ***
## avgfacsal        2.0701      0.1373  15.080  < 2e-16 ***
## adm_rate    -13529.9055   1683.3792  -8.037 1.65e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13800 on 1785 degrees of freedom
##   (5324 observations deleted due to missingness)
## Multiple R-squared:  0.1913, Adjusted R-squared:  0.1904 
## F-statistic: 211.1 on 2 and 1785 DF,  p-value: < 2.2e-16

Using multiple linear regression with cost of attendance as the response, both average faculty salary and admission rate are statistically significant predictors, as is the intercept: all three p-values are well below 0.05. The intercept ($28,517.06) is the predicted cost of attendance for a school with an average faculty salary of 0 and an admission rate of 0. No such school exists, so the intercept is not meaningful on its own. Holding average faculty salary constant, each 0.01 (one percentage point) decrease in admission rate is associated with an increase of about $135.30 in predicted cost of attendance; the coefficient of -13,529.91 corresponds to a change in admission rate from 0 to 1. Holding admission rate constant, each $1 increase in average faculty salary is associated with an increase of about $2.07 in predicted cost of attendance. The multiple R^2 of 0.1913 (adjusted R^2 of 0.1904) means the model explains about 19% of the variation in cost of attendance, so the relationship is weak.
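As a check on the scale of the admission-rate coefficient, the fitted model and the effect of a one-percentage-point decrease in admission rate can be worked out from the estimates in the output above. The variable names in the equations are shorthand assumed here, with admission rate measured on a 0 to 1 scale.

```latex
\[
\widehat{\text{cost}} = 28517.06 + 2.0701\,\text{salary} - 13529.91\,\text{admrate}
\]
\[
\Delta\,\widehat{\text{cost}} = -13529.91 \times (-0.01) \approx 135.30
\]
```

So a 0.01 decrease in admission rate corresponds to a predicted increase of roughly $135, not $13,530; the full coefficient applies only to a change of 1.0 in admission rate.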

Discussion

From our analysis we can conclude that cost of attendance is weakly related to the two predictors: it tends to be higher at schools with higher average faculty salaries and lower at schools with higher admission rates.

A data limitation was the missing values for each of our variables: the regression used only 1,788 of the 7,112 rows. Further analysis should consider how the missing values affect the results and should take the low R^2 into account.

It makes sense for cost of attendance to be higher when admission rate is lower (more selective schools) and/or average faculty salary is higher. The analysis confirmed these directions, but the relationship is weak.

Since the predictors are only weakly related to cost of attendance, it might be more useful to study how other variables in the dataset relate to cost of attendance.

References

(https://edx.org)