data(cars)
summary(cars)
## speed dist
## Min. : 4.0 Min. : 2.00
## 1st Qu.:12.0 1st Qu.: 26.00
## Median :15.0 Median : 36.00
## Mean :15.4 Mean : 42.98
## 3rd Qu.:19.0 3rd Qu.: 56.00
## Max. :25.0 Max. :120.00
library(jsonlite)
api_base <- "https://min-api.cryptocompare.com/data/v2/histoday"
f_sym <- "BTC"
t_sym <- "USD"
days <- "100"
final_link <- paste0(api_base, "?fsym=", f_sym, "&tsym=", t_sym, "&limit=", days)
res <- fromJSON(final_link)
btc_data <- res$Data$Data
max_price <- max(btc_data$close)
print(max_price)
## [1] 96945.09
The goal of this project is to evaluate whether higher tuition and student debt are associated with higher post-graduation earnings, and to determine which institutions provide stronger financial return on investment (ROI).
We use the College Scorecard dataset from the U.S. Department of Education. This dataset includes tuition, debt, earnings, and institutional characteristics.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ purrr::flatten() masks jsonlite::flatten()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(corrplot)
## corrplot 0.95 loaded
college <- read_csv("Most-Recent-Cohorts-Institution.csv")
## Rows: 6429 Columns: 3306
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2380): OPEID, OPEID6, INSTNM, CITY, STABBR, ZIP, ACCREDAGENCY, INSTURL,...
## dbl (851): UNITID, SCH_DEG, HCM2, MAIN, NUMBRANCH, PREDDEG, HIGHDEG, CONTRO...
## lgl (75): LOCALE2, UG, UGDS_WHITENH, UGDS_BLACKNH, UGDS_API, UGDS_AIANOLD,...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Description:
We import the raw College Scorecard dataset into R for cleaning and
analysis.
college_clean <- college %>%
  select(
    INSTNM,
    CONTROL,
    TUITIONFEE_IN,
    ADM_RATE,
    C150_4,
    DEBT_MDN,
    MD_EARN_WNE_P10
  )
Description:
We keep only the variables necessary to answer our research
questions.
college_clean <- college_clean %>%
  mutate(across(where(is.character), ~na_if(., "NULL")))
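Description:
The code above recodes the literal string "NULL" as NA in every
character column, so that R treats those entries as missing. No rows
are removed at this step; observations missing tuition, debt, or
earnings are excluded later by the individual plots and models.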
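To make the difference between recoding and removing concrete, here is a minimal sketch on a made-up three-row table. The institution names and values are hypothetical and are not taken from the Scorecard file. It applies the same `na_if()` call as the chunk above, then shows one way rows could be dropped, which the pipeline above does not do.

```r
library(dplyr)

# Toy data (hypothetical values, not from the Scorecard file)
toy <- data.frame(
  INSTNM   = c("A College", "B University", "C Institute"),
  DEBT_MDN = c("12000", "NULL", "9500")
)

# Step 1: turn the literal string "NULL" into a real NA (rows are kept)
toy_na <- toy %>%
  mutate(across(where(is.character), ~ na_if(.x, "NULL")))

# Step 2 (optional, not part of the pipeline above): drop incomplete rows
toy_complete <- toy_na %>%
  filter(!is.na(DEBT_MDN))

nrow(toy_na)        # still 3 rows, one with NA debt
nrow(toy_complete)  # 2 rows remain after filtering
```

Keeping the rows and letting each plot or model drop what it needs preserves more data per analysis, at the cost of different analyses using different subsets.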
college_clean <- college_clean %>%
  mutate(
    TUITIONFEE_IN = as.numeric(TUITIONFEE_IN),
    DEBT_MDN = as.numeric(DEBT_MDN),
    MD_EARN_WNE_P10 = as.numeric(MD_EARN_WNE_P10),
    ADM_RATE = as.numeric(ADM_RATE),
    C150_4 = as.numeric(C150_4)
  )
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `DEBT_MDN = as.numeric(DEBT_MDN)`.
## Caused by warning:
## ! NAs introduced by coercion
Description:
Some columns are stored as characters. We convert them to numeric for
proper calculations and modeling.
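The coercion warning for DEBT_MDN means some entries were text that `as.numeric()` could not parse. A quick way to see which strings are responsible is sketched below in base R. The vector is made up for illustration, and the marker "PrivacySuppressed" is an assumption about what such entries might look like, not something confirmed by the output above.

```r
# Hypothetical raw values; the non-numeric markers are assumptions
raw_debt <- c("12000", "PrivacySuppressed", NA, "9500.5", "n/a")

# Convert, silencing the expected coercion warning
converted <- suppressWarnings(as.numeric(raw_debt))

# Entries that were present as text but failed to parse as numbers
failed <- !is.na(raw_debt) & is.na(converted)

# Tabulate the offending strings to decide how to handle them
table(raw_debt[failed])
```

Running the same check on the real column before converting would show whether the new NAs come from an intentional missing-data marker or from malformed values.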
college_clean$CONTROL <- factor(
  college_clean$CONTROL,
  levels = c(1, 2, 3),
  labels = c("Public", "Private Nonprofit", "Private For-Profit")
)
Description:
We relabel the institution type variable to improve readability.
college_clean <- college_clean %>%
  mutate(
    ROI_ratio = MD_EARN_WNE_P10 / DEBT_MDN
  )
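Description:
We create a return-on-investment (ROI) proxy by dividing 10-year
median earnings (MD_EARN_WNE_P10) by median debt (DEBT_MDN). A higher
ratio means more earnings per dollar of debt; the ratio is NA whenever
either input is missing.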
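As a small worked example of the ratio, the sketch below computes it for four made-up schools. The numbers are hypothetical. The guard against zero debt is an added assumption for illustration; the pipeline above divides directly.

```r
# Hypothetical inputs: one school with zero debt, one with missing earnings
earnings <- c(40000, 52000, 31000, NA)
debt     <- c(10000, 26000,     0, 15000)

# Earnings per dollar of debt; undefined cases become NA instead of Inf
roi <- ifelse(!is.na(debt) & debt > 0, earnings / debt, NA_real_)

roi
```

The first two schools get ratios of 4 and 2, so the school with higher earnings has the lower ratio because its debt is proportionally larger.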
summary(college_clean)
## INSTNM CONTROL TUITIONFEE_IN ADM_RATE
## Length:6429 Public :2056 Min. : 600 Min. :0.000
## Class :character Private Nonprofit :1953 1st Qu.: 5688 1st Qu.:0.604
## Mode :character Private For-Profit:2420 Median :11790 Median :0.779
## Mean :17238 Mean :0.728
## 3rd Qu.:23186 3rd Qu.:0.908
## Max. :69330 Max. :1.000
## NA's :2700 NA's :4483
## C150_4 DEBT_MDN MD_EARN_WNE_P10 ROI_ratio
## Min. :0.000 Min. : 1932 Min. : 8579 Min. : 0.6588
## 1st Qu.:0.372 1st Qu.: 7000 1st Qu.: 31830 1st Qu.: 3.0765
## Median :0.525 Median : 9500 Median : 40568 Median : 3.8182
## Mean :0.520 Mean :11269 Mean : 43508 Mean : 4.3167
## 3rd Qu.:0.671 3rd Qu.:15000 3rd Qu.: 51994 3rd Qu.: 5.1352
## Max. :1.000 Max. :38980 Max. :143372 Max. :20.8075
## NA's :4157 NA's :1146 NA's :1149 NA's :1615
Description:
We examine summary statistics to understand variable distributions and
ranges.
ggplot(college_clean, aes(x = TUITIONFEE_IN)) +
  geom_histogram(bins = 30) +
  labs(title = "Distribution of In-State Tuition",
       x = "Tuition",
       y = "Frequency")
## Warning: Removed 2700 rows containing non-finite outside the scale range
## (`stat_bin()`).
Description:
This histogram shows how tuition is distributed across institutions.
ggplot(college_clean, aes(x = MD_EARN_WNE_P10)) +
  geom_histogram(bins = 30) +
  labs(title = "Distribution of Median Earnings (10 Years)",
       x = "Earnings",
       y = "Frequency")
## Warning: Removed 1149 rows containing non-finite outside the scale range
## (`stat_bin()`).
Description:
This histogram shows how post-graduation earnings vary across
schools.
numeric_vars <- college_clean %>%
select(TUITIONFEE_IN, DEBT_MDN, MD_EARN_WNE_P10, ADM_RATE, C150_4)
cor_matrix <- cor(numeric_vars, use = "complete.obs")
corrplot(cor_matrix, method = "circle")
Description:
We calculate correlations to evaluate relationships between tuition,
debt, earnings, admission rate, and completion rate.
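Because `use = "complete.obs"` keeps only rows with no missing value in any selected column, every correlation in the matrix is computed on the same, smaller set of rows. The sketch below contrasts this with `"pairwise.complete.obs"` on a made-up data frame (the values are hypothetical).

```r
# Toy data with missing values in different rows
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2, 4, 6, 8, 10, NA),
  z = c(6, 5, NA, 3, 2, 1)
)

# Listwise deletion: only rows complete in x, y, and z are used
cor_complete <- cor(toy, use = "complete.obs")

# Pairwise deletion: each pair uses every row where both values exist
cor_pairwise <- cor(toy, use = "pairwise.complete.obs")

# Number of rows that survive listwise deletion
n_complete <- sum(complete.cases(toy))
n_complete
```

Listwise deletion gives a matrix based on one consistent sample, which is easier to compare with a regression fitted on the same variables. Pairwise deletion uses more data per cell, but different cells then describe different subsets.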
ggplot(college_clean, aes(x = TUITIONFEE_IN, y = MD_EARN_WNE_P10)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm") +
  labs(title = "Tuition vs Post-Graduation Earnings",
       x = "In-State Tuition",
       y = "Median Earnings (10 Years)")
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 2957 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 2957 rows containing missing values or values outside the scale range
## (`geom_point()`).
Description:
This scatterplot with regression line shows whether higher tuition is
associated with higher earnings.
ggplot(college_clean, aes(x = DEBT_MDN, y = MD_EARN_WNE_P10)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm") +
  labs(title = "Debt vs Earnings",
       x = "Median Debt",
       y = "Median Earnings (10 Years)")
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 1615 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 1615 rows containing missing values or values outside the scale range
## (`geom_point()`).
Description:
This plot evaluates whether higher student debt is associated with
higher earnings.
ggplot(college_clean, aes(x = CONTROL, y = MD_EARN_WNE_P10)) +
  geom_boxplot() +
  labs(title = "Earnings by Institution Type",
       x = "Institution Type",
       y = "Median Earnings (10 Years)")
## Warning: Removed 1149 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
Description:
This boxplot compares earnings across public, private nonprofit, and
private for-profit institutions.
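The center line of each box is the group median, which can also be computed directly to get exact numbers alongside the plot. A minimal base R sketch on made-up data follows. The earnings values are hypothetical; only the three category labels come from the analysis above.

```r
# Toy data: five hypothetical schools, one with missing earnings
toy <- data.frame(
  CONTROL = factor(
    c("Public", "Public", "Private Nonprofit",
      "Private Nonprofit", "Private For-Profit"),
    levels = c("Public", "Private Nonprofit", "Private For-Profit")
  ),
  MD_EARN_WNE_P10 = c(40000, 50000, 45000, NA, 30000)
)

# Median earnings per institution type, ignoring missing values
med <- tapply(toy$MD_EARN_WNE_P10, toy$CONTROL, median, na.rm = TRUE)

# Number of non-missing observations behind each median
n_obs <- tapply(!is.na(toy$MD_EARN_WNE_P10), toy$CONTROL, sum)

med
n_obs
```

Reporting the count next to each median helps flag groups where the box is drawn from only a few schools.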
ggplot(college_clean, aes(x = ADM_RATE, y = MD_EARN_WNE_P10)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm") +
  labs(title = "Admission Rate vs Earnings",
       x = "Admission Rate",
       y = "Median Earnings (10 Years)")
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 4656 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 4656 rows containing missing values or values outside the scale range
## (`geom_point()`).
Description:
This plot shows whether more selective institutions (lower admission
rates) are associated with higher earnings.
model <- lm(MD_EARN_WNE_P10 ~ TUITIONFEE_IN + DEBT_MDN + ADM_RATE + C150_4 + CONTROL,
            data = college_clean)
summary(model)
##
## Call:
## lm(formula = MD_EARN_WNE_P10 ~ TUITIONFEE_IN + DEBT_MDN + ADM_RATE +
## C150_4 + CONTROL, data = college_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -48214 -5970 -1173 5268 79411
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.657e+04 1.961e+03 18.649 < 2e-16 ***
## TUITIONFEE_IN 4.088e-01 3.189e-02 12.821 < 2e-16 ***
## DEBT_MDN 3.918e-01 8.069e-02 4.855 1.32e-06 ***
## ADM_RATE -8.563e+03 1.540e+03 -5.561 3.16e-08 ***
## C150_4 2.909e+04 2.127e+03 13.674 < 2e-16 ***
## CONTROLPrivate Nonprofit -1.299e+04 9.845e+02 -13.199 < 2e-16 ***
## CONTROLPrivate For-Profit -1.472e+03 1.697e+03 -0.867 0.386
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11810 on 1539 degrees of freedom
## (4883 observations deleted due to missingness)
## Multiple R-squared: 0.4465, Adjusted R-squared: 0.4443
## F-statistic: 206.9 on 6 and 1539 DF, p-value: < 2.2e-16
Description:
We estimate a multiple linear regression model to determine which
variables significantly predict post-graduation earnings.
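To read the coefficient table in practical units, each estimate can be multiplied by a meaningful change in its predictor. The sketch below does this arithmetic with the estimates copied from the `summary(model)` output above. The chosen changes ($1,000 in tuition or debt, 10 percentage points in admission or completion rate) are illustrative assumptions, and the results describe associations with other predictors held fixed, not causal effects.

```r
# Estimates copied from the summary(model) output above
b <- c(TUITIONFEE_IN = 0.4088,
       DEBT_MDN      = 0.3918,
       ADM_RATE      = -8563,
       C150_4        = 29090)

# Illustrative changes in each predictor (assumed for this example)
delta <- c(TUITIONFEE_IN = 1000,   # $1,000 more tuition
           DEBT_MDN      = 1000,   # $1,000 more median debt
           ADM_RATE      = -0.10,  # 10 points more selective
           C150_4        = 0.10)   # 10 points higher completion

# Associated difference in median earnings, in dollars
effect <- b * delta[names(b)]
round(effect, 1)
```

On this scale, a 10-point higher completion rate corresponds to the largest difference in predicted earnings among the four, about $2,909.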
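This project explored the relationship between tuition, student debt,
and post-graduation earnings using the College Scorecard data.
In the regression, in-state tuition, median debt, and completion rate
are positively associated with median earnings, while admission rate
is negatively associated. Private nonprofit institutions show lower
earnings than public ones after adjusting for the other variables, and
the difference for private for-profit institutions is not
statistically significant. The model explains about 45% of the
variance in earnings across the 1,546 institutions with complete data.
These results describe associations, not causal effects.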