Introduction

We have data on admission rates, average SAT scores, undergraduate enrollment, and cost of attendance for US colleges. My goal is to examine whether there is a relationship between cost of attendance and average SAT score, and to use admission rate as a third dimension to see how cost of attendance relates to both SAT scores and admission rates.

Methods

The data have been preprocessed, which included standardizing variable names and handling missing (NA) values. The data come from the US Department of Education. My independent variable is the average SAT score (sat_avg) and my dependent variable is the cost of attendance (costt4_a). Admission rate (adm_rate) and undergraduate enrollment (ugds) are included as additional predictors.
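The preprocessing code is not shown in this report. As a minimal sketch, assuming the raw extract uses upper-case column names such as SAT_AVG and COSTT4_A and stores missing values as the string "NULL" (both are assumptions, and the data frame below is made up), standardizing names and recoding missing values in base R could look like this:

```r
# Hypothetical raw extract: upper-case names, missing values stored as "NULL"
raw <- data.frame(
  SAT_AVG  = c("1200", "NULL", "1350"),
  COSTT4_A = c("45000", "30000", "NULL"),
  stringsAsFactors = FALSE
)

# Standardize variable names to lower case (SAT_AVG -> sat_avg)
names(raw) <- tolower(names(raw))

# Recode the "NULL" placeholder to NA, then convert each column to numeric
colle_demo <- as.data.frame(lapply(raw, function(col) {
  col[col == "NULL"] <- NA
  as.numeric(col)
}))

str(colle_demo)
```

Recoding the placeholder before calling as.numeric() avoids the "NAs introduced by coercion" warning and makes the missing values explicit.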

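I used linear regression to estimate the trend in the data: a simple regression of cost of attendance on average SAT score for the trend line in the plot, and a multiple linear regression that adds admission rate and undergraduate enrollment as predictors.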
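For reference, the multiple regression fitted in the Results section corresponds to the following model for college i, where epsilon_i is the error term; the trend line in the plot is the same model with only the intercept and the sat_avg term.

```latex
\text{costt4\_a}_i = \beta_0 + \beta_1\,\text{sat\_avg}_i + \beta_2\,\text{adm\_rate}_i + \beta_3\,\text{ugds}_i + \varepsilon_i
```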

Results

Exploring the data

# Load the packages used below: plotly for the interactive plot,
# dplyr for %>% and filter()
library(plotly)
library(dplyr)

# First, filter out rows with a missing sat_avg or costt4_a
colle_clean <- colle %>% 
  filter(!is.na(sat_avg) & !is.na(costt4_a))

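# Plot cost of attendance against average SAT score using the cleaned data;
# colour encodes admission rate and marker size encodes undergraduate enrollment
p <- plot_ly(data = colle_clean, x = ~sat_avg, y = ~costt4_a, 
             color = ~adm_rate, size = ~ugds,
             type = 'scatter', mode = 'markers',
             marker = list(opacity = 0.7),
             name = "Colleges")

# Fit a simple linear regression of cost on SAT score; colle_clean has no
# missing values in either variable, so fitted() returns one value per row
fit_sat <- lm(costt4_a ~ sat_avg, data = colle_clean)

# Add the fitted regression line (x and y now have the same length);
# opacity is not a valid line attribute in plotly, so it is omitted here
p <- p %>% add_trace(x = ~sat_avg, 
                     y = fitted(fit_sat),
                     type = 'scatter', mode = 'lines',
                     line = list(color = 'orange', width = 3, dash = 'longdash'),
                     name = "Linear Fit",
                     inherit = FALSE)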

# Add the title, axis labels, a source note, and a colour-bar title;
# the marker size encodes undergraduate enrollment, so the title names it
p %>%
  layout(title = "Cost of attendance, SAT scores, admission rate,\nand undergraduate enrollment: What is the relationship?",
         xaxis = list(title = "Average SAT Score"),
         yaxis = list(title = "Cost of Attendance"),
         annotations = 
           list(x = 1, y = -0.1, text = "Source: US Dept. of Education",
                showarrow = FALSE, xref = "paper", yref = "paper",
                xanchor = 'right', yanchor = 'auto', xshift = 0, yshift = 0,
                font = list(size = 11))) %>%
  colorbar(title = "Admission Rate")
## Warning: `line.width` does not currently support multiple values.

Rows with a missing average SAT score or cost of attendance were excluded from the plot and the trend line, so they do not contribute to this part of the analysis.
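It can help to report how many rows the filter removes. The sketch below applies the same filter as the plotting code to a small made-up data frame standing in for colle (colle_toy and its values are hypothetical); with the real data, only the data frame name changes.

```r
library(dplyr)

# Hypothetical stand-in for `colle`; values are illustrative only
colle_toy <- data.frame(
  sat_avg  = c(1100, NA, 1300, 1250, NA),
  costt4_a = c(30000, 25000, NA, 52000, 41000)
)

# Keep rows where both plotting variables are present
colle_toy_clean <- colle_toy %>%
  filter(!is.na(sat_avg) & !is.na(costt4_a))

# Count how many rows were kept and how many were dropped
n_dropped <- nrow(colle_toy) - nrow(colle_toy_clean)
cat("Rows kept:", nrow(colle_toy_clean), "Rows dropped:", n_dropped, "\n")
```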

Analyzing and interpreting the data

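# Multiple linear regression of cost of attendance on SAT score, admission rate,
# and undergraduate enrollment; lm() drops any row with a missing value
mlr_1 <- lm(costt4_a ~ sat_avg + adm_rate + ugds, data = colle)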

summary(mlr_1)
## 
## Call:
## lm(formula = costt4_a ~ sat_avg + adm_rate + ugds, data = colle)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -44193  -7851    647   7693  33457 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -3.647e+04  3.510e+03 -10.391  < 2e-16 ***
## sat_avg      7.212e+01  2.620e+00  27.531  < 2e-16 ***
## adm_rate    -6.989e+03  1.636e+03  -4.273 2.07e-05 ***
## ugds        -8.817e-01  4.048e-02 -21.779  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10790 on 1304 degrees of freedom
##   (5804 observations deleted due to missingness)
## Multiple R-squared:  0.4887, Adjusted R-squared:  0.4875 
## F-statistic: 415.4 on 3 and 1304 DF,  p-value: < 2.2e-16

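The regression shows that average SAT score, admission rate, and undergraduate enrollment (ugds) each have a statistically significant association with cost of attendance (all p < 0.001). Holding the other two predictors fixed, each additional point of average SAT score is associated with about $72 higher cost of attendance. Admission rate and undergraduate enrollment are negatively associated with cost: a one-unit increase in adm_rate is associated with about $6,989 lower cost, and each additional undergraduate with about $0.88 lower cost (roughly $880 per 1,000 students). Together, the three predictors explain about 49% of the variation in cost (adjusted R-squared = 0.4875) across the 1,308 colleges with complete data; 5,804 rows were dropped because of missing values. These are associations, not causal effects.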
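To make the coefficients concrete, the sketch below plugs the rounded estimates from the summary output into the fitted equation for a hypothetical college with an average SAT of 1200, an admission rate of 0.5, and 5,000 undergraduates. Because the printed coefficients are rounded, the result is approximate; with the fitted object, predict(mlr_1, newdata = ...) would give the exact value.

```r
# Rounded coefficient estimates copied from summary(mlr_1)
b <- c(intercept = -3.647e4, sat_avg = 72.12, adm_rate = -6989, ugds = -0.8817)

# Hypothetical college used only for illustration
x <- c(intercept = 1, sat_avg = 1200, adm_rate = 0.5, ugds = 5000)

# Predicted cost = intercept + sum of coefficient * predictor value
predicted_cost <- sum(b * x)
predicted_cost  # 42171

# Holding the other predictors fixed, +100 SAT points shifts the prediction by:
sat_effect_100 <- 100 * b[["sat_avg"]]
sat_effect_100  # 7212
```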

Discussion

This study investigates the relationship between cost of attendance and three factors: average SAT score, admission rate, and undergraduate enrollment (ugds).

The analysis found a statistically significant linear association between cost of attendance and each of these three factors.

Specifically, higher average SAT scores are associated with higher cost of attendance, while lower admission rates and smaller undergraduate enrollments are associated with higher cost of attendance.

One notable pattern in the scatter plot is that, as SAT scores increase, cost of attendance appears to split into two groups that follow different slopes. This pattern warrants further investigation and analysis.
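One way to test whether two apparent groups really have different slopes is to add an interaction between SAT score and a group indicator. The data set as described has no such indicator, so group below is hypothetical, and the data are simulated with slopes of 40 and 90 dollars per SAT point purely to show what the interaction term recovers.

```r
set.seed(1)

# Simulated data: two hypothetical groups with different cost-vs-SAT slopes
n <- 200
sat_avg <- runif(2 * n, 900, 1500)
group <- factor(rep(c("A", "B"), each = n))
slope <- ifelse(group == "A", 40, 90)
costt4_a <- 5000 + slope * sat_avg + rnorm(2 * n, sd = 2000)
sim <- data.frame(sat_avg, group, costt4_a)

# sat_avg is group A's slope; sat_avg:groupB is how much steeper group B's is
fit_int <- lm(costt4_a ~ sat_avg * group, data = sim)
summary(fit_int)$coefficients
```

A significant sat_avg:groupB term would support the two-slope reading; on the real data, the grouping variable would have to be defined first.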

Many records were excluded because of missing values. Future work could impute these values, for example with k-nearest-neighbor (KNN) imputation or regression imputation, to obtain a more complete data set for analysis.
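As a minimal sketch of the regression option, the code below fills missing sat_avg values with predictions from a linear model fitted on the complete rows. Using adm_rate as the only predictor and the toy values are assumptions for illustration. Single regression imputation also ignores prediction uncertainty, so standard errors computed on the imputed data will be too small.

```r
# Toy data: sat_avg is missing for two colleges (values are made up)
toy <- data.frame(
  adm_rate = c(0.20, 0.35, 0.50, 0.65, 0.80, 0.45, 0.70),
  sat_avg  = c(1450, 1350, 1250, 1150, 1050, NA, NA)
)

# Fit the imputation model only on rows where sat_avg is observed
imp_fit <- lm(sat_avg ~ adm_rate, data = toy[!is.na(toy$sat_avg), ])

# Replace the missing sat_avg values with model predictions
is_missing <- is.na(toy$sat_avg)
toy$sat_avg[is_missing] <- predict(imp_fit, newdata = toy[is_missing, ])

toy
```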

References