Run the following code to load the cars dataset. What is the median of the first column?
data(cars)
median(cars[,1])
## [1] 15
Write R code that answers the following question: What is the maximum daily closing price for BTC in the data?
library(jsonlite)
url <- "https://min-api.cryptocompare.com/data/v2/histoday?fsym=BTC&tsym=USD&limit=100"
btc_json <- fromJSON(url)
btc_data <- btc_json$Data$Data
max(btc_data$close, na.rm = TRUE)
## [1] 96945.09
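To see on which day that maximum occurred, the row with the largest close can be looked up with `which.max()`. This is a minimal sketch on a small made-up data frame: the name `btc_toy` and its values are hypothetical, and it assumes the `time` column holds Unix timestamps in seconds, which should be checked against the actual API response.

```r
# Hypothetical stand-in for btc_data: `time` in Unix seconds, `close` in USD
btc_toy <- data.frame(
  time  = c(1700000000, 1700086400, 1700172800),
  close = c(36000.5, 37250.0, 36500.25)
)

# which.max() returns the index of the first maximum and ignores NA values
top_row <- btc_toy[which.max(btc_toy$close), ]

# Convert the Unix timestamp to a calendar date (UTC)
top_date <- as.Date(as.POSIXct(top_row$time, origin = "1970-01-01", tz = "UTC"))

top_row$close
top_date
```

With the real data, `btc_toy` would be replaced by `btc_data`.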
Project title:
Is College Worth It? Comparing Cost and Early Career Earnings
3–5 research questions:
1. Do colleges with higher tuition usually have higher median earnings after graduation?
2. Do public and private colleges differ in tuition and earnings?
3. Which states have the highest average tuition?
4. Which states have the highest median earnings for graduates?
5. Do schools whose graduates carry more debt also have graduates with higher earnings?
Coding:
library(tidyverse)
## Warning: package 'dplyr' was built under R version 4.4.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.2.0 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ purrr::flatten() masks jsonlite::flatten()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readr)
# Load dataset
college <- read_csv("Most-Recent-Cohorts-Institution.csv")
## Rows: 6322 Columns: 3308
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2368): OPEID, OPEID6, INSTNM, CITY, STABBR, ZIP, ACCREDAGENCY, INSTURL,...
## dbl (851): UNITID, SCH_DEG, HCM2, MAIN, NUMBRANCH, PREDDEG, HIGHDEG, CONTRO...
## lgl (89): LOCALE2, UG, UGDS_WHITENH, UGDS_BLACKNH, UGDS_API, UGDS_AIANOLD,...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(college)
## # A tibble: 6 × 3,308
## UNITID OPEID OPEID6 INSTNM CITY STABBR ZIP ACCREDAGENCY INSTURL NPCURL
## <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 100654 00100200 001002 Alabama… Norm… AL 35762 Southern As… www.aa… www.a…
## 2 100663 00105200 001052 Univers… Birm… AL 3529… Southern As… https:… https…
## 3 100690 02503400 025034 Amridge… Mont… AL 3611… Southern As… https:… https…
## 4 100706 00105500 001055 Univers… Hunt… AL 35899 Southern As… www.ua… uah.c…
## 5 100724 00100500 001005 Alabama… Mont… AL 3610… Southern As… www.al… tcc.r…
## 6 100751 00105100 001051 The Uni… Tusc… AL 3548… Southern As… www.ua… ua.ai…
## # ℹ 3,298 more variables: SCH_DEG <dbl>, HCM2 <dbl>, MAIN <dbl>,
## # NUMBRANCH <dbl>, PREDDEG <dbl>, HIGHDEG <dbl>, CONTROL <dbl>,
## # ST_FIPS <dbl>, REGION <dbl>, LOCALE <dbl>, LOCALE2 <lgl>, LATITUDE <dbl>,
## # LONGITUDE <dbl>, CCBASIC <dbl>, CCUGPROF <dbl>, CCSIZSET <dbl>, HBCU <dbl>,
## # PBI <dbl>, ANNHI <dbl>, TRIBAL <dbl>, AANAPII <dbl>, HSI <dbl>,
## # NANTI <dbl>, MENONLY <dbl>, WOMENONLY <dbl>, RELAFFIL <dbl>,
## # ADM_RATE <dbl>, ADM_RATE_ALL <dbl>, SATVR25 <dbl>, SATVR75 <dbl>, …
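The file has 3,308 columns, but the analysis uses only a handful. One option, sketched here under assumptions, is to read only the needed columns with the `col_select` argument of `read_csv()`. The temporary file and its contents are hypothetical stand-ins so the example runs without the real CSV.

```r
library(readr)

# Write a tiny hypothetical CSV so the example runs without the real file
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "INSTNM,STABBR,TUITIONFEE_IN,EXTRA1,EXTRA2",
  "School A,AL,10000,x,1",
  "School B,NY,25000,y,2"
), tmp)

# col_select keeps only the named columns and skips parsing the rest
small <- read_csv(
  tmp,
  col_select = c(INSTNM, STABBR, TUITIONFEE_IN),
  show_col_types = FALSE
)

names(small)
```

Setting `show_col_types = FALSE` also quiets the column specification message shown above.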
# Select only useful columns
college_clean <- college %>%
  select(
    school = INSTNM,
    state = STABBR,
    tuition = TUITIONFEE_IN,
    cost = COSTT4_A,
    low_income_students = PCTPELL,
    median_debt = GRAD_DEBT_MDN,
    median_earnings = MD_EARN_WNE_P10
  )
# Convert earnings and debt to numeric (dataset stores some as text)
college_clean <- college_clean %>%
  mutate(
    median_earnings = as.numeric(median_earnings),
    median_debt = as.numeric(median_debt)
  )
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `median_debt = as.numeric(median_debt)`.
## Caused by warning:
## ! NAs introduced by coercion
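The warning means that at least one `median_debt` value was text that does not parse as a number, so it became `NA`. A quick way to see which strings were affected is to compare the raw values with the converted ones. This is a sketch on a made-up vector: `raw_debt` and the marker `"PrivacySuppressed"` are assumptions used for illustration, not values confirmed from the file.

```r
# Hypothetical raw values; the "PrivacySuppressed" marker is an assumption
raw_debt <- c("12000", "PrivacySuppressed", "23500", NA, "PrivacySuppressed")

# Convert quietly, then flag entries that were present but failed to parse
parsed <- suppressWarnings(as.numeric(raw_debt))
failed <- !is.na(raw_debt) & is.na(parsed)

# Tabulate the strings that turned into NA
table(raw_debt[failed])
```

If the real column shows a single recurring marker, passing it to the `na` argument of `read_csv()` would avoid the warning at load time.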
# Remove missing values
college_clean <- college_clean %>%
  drop_na(tuition, median_earnings)
# Basic summary statistics
summary(college_clean)
##     school             state              tuition           cost
##  Length:3399        Length:3399        Min.   :    0   Min.   : 4451
##  Class :character   Class :character   1st Qu.: 5550   1st Qu.:16844
##  Mode  :character   Mode  :character   Median :12097   Median :25815
##                                        Mean   :18075   Mean   :32025
##                                        3rd Qu.:25023   3rd Qu.:42555
##                                        Max.   :72097   Max.   :93512
##                                                        NA's   :347
##  low_income_students  median_debt     median_earnings
##  Min.   :0.0000       Min.   : 2819   Min.   : 11998
##  1st Qu.:0.2457       1st Qu.:11719   1st Qu.: 37722
##  Median :0.3389       Median :19976   Median : 44950
##  Mean   :0.3778       Mean   :18507   Mean   : 48203
##  3rd Qu.:0.4696       3rd Qu.:24807   3rd Qu.: 55939
##  Max.   :1.0000       Max.   :43021   Max.   :143372
##  NA's   :279          NA's   :298
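The summary above pools every type of institution, so it cannot answer the research question about public versus private colleges. The column list printed by `head(college)` includes `CONTROL`, which could be kept in the `select()` step and used for grouping. The sketch below uses made-up rows, and the coding 1 = public, 2 = private nonprofit, 3 = private for-profit is an assumption that should be checked against the data dictionary.

```r
library(dplyr)

# Hypothetical rows; the real values would come from `college`
toy <- data.frame(
  control         = c(1, 1, 2, 2, 3),
  tuition         = c(8000, 10000, 30000, 40000, 15000),
  median_earnings = c(42000, 46000, 55000, 65000, 30000)
)

# Label the assumed CONTROL codes, then summarise each sector
sector_summary <- toy %>%
  mutate(sector = factor(
    control,
    levels = c(1, 2, 3),
    labels = c("Public", "Private nonprofit", "Private for-profit")
  )) %>%
  group_by(sector) %>%
  summarise(
    n_schools       = n(),
    median_tuition  = median(tuition, na.rm = TRUE),
    median_earnings = median(median_earnings, na.rm = TRUE)
  )

print(sector_summary)
```

Medians are used here instead of means so that a few very expensive schools do not dominate a sector's figure.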
# Average earnings by state
state_summary <- college_clean %>%
  group_by(state) %>%
  summarise(
    avg_tuition = mean(tuition, na.rm = TRUE),
    avg_earnings = mean(median_earnings, na.rm = TRUE)
  ) %>%
  arrange(desc(avg_earnings))
print(state_summary)
## # A tibble: 58 × 3
## state avg_tuition avg_earnings
## <chr> <dbl> <dbl>
## 1 RI 39308 65514.
## 2 MA 34673. 61678.
## 3 DC 34101. 60394.
## 4 CT 31057. 60388.
## 5 NJ 19837. 56295.
## 6 PA 27033. 55945.
## 7 NH 21665. 54872.
## 8 NY 25186 54869.
## 9 CA 16909. 54469.
## 10 MD 19207. 54074.
## # ℹ 48 more rows
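The table above is sorted by earnings only, while one research question asks which states have the highest average tuition. It also has 58 rows, so `state` covers more than the 50 states (DC appears in the top ten), and some groups may contain only a few schools. A sketch of the tuition ranking with a school count per group, on made-up rows:

```r
library(dplyr)

# Hypothetical rows standing in for college_clean
toy <- data.frame(
  state   = c("RI", "RI", "MA", "CA", "CA", "CA"),
  tuition = c(40000, 38000, 35000, 10000, 20000, 21000)
)

# Rank states by average tuition and keep the number of schools per state
by_tuition <- toy %>%
  group_by(state) %>%
  summarise(
    n_schools   = n(),
    avg_tuition = mean(tuition, na.rm = TRUE)
  ) %>%
  arrange(desc(avg_tuition))

print(by_tuition)
```

Reporting `n_schools` next to each average makes it easier to judge how much weight a state's figure deserves.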
# Relationship between tuition and earnings
ggplot(college_clean, aes(x = tuition, y = median_earnings)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", color = "blue") +
  labs(
    title = "Does Higher Tuition Lead to Higher Earnings?",
    x = "Tuition Cost",
    y = "Median Earnings 10 Years After Enrollment"
  )
## `geom_smooth()` using formula = 'y ~ x'
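The scatter plot shows the direction of the relationship, but a number helps when describing its strength. One way to get it is the correlation coefficient together with the slope of the same linear fit that `geom_smooth(method = "lm")` draws. The data frame below is made up for illustration; with the real data it would be `college_clean`.

```r
# Hypothetical rows standing in for college_clean
toy <- data.frame(
  tuition         = c(5000, 10000, 20000, 30000, 40000),
  median_earnings = c(38000, 41000, 47000, 52000, 60000)
)

# Pearson correlation, using only rows where both values are present
r <- cor(toy$tuition, toy$median_earnings, use = "complete.obs")

# Linear fit: the slope is the change in earnings per extra dollar of tuition
fit <- lm(median_earnings ~ tuition, data = toy)
slope <- coef(fit)[["tuition"]]

r
slope
```

A correlation near 1 or -1 indicates a tight linear relationship and a value near 0 indicates a weak one. Neither number says anything about cause.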
# Debt vs earnings
ggplot(college_clean, aes(x = median_debt, y = median_earnings)) +
  geom_point(alpha = 0.4) +
  labs(
    title = "Student Debt vs Earnings",
    x = "Median Debt",
    y = "Median Earnings"
  )
## Warning: Removed 298 rows containing missing values or values outside the scale range
## (`geom_point()`).
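The 298 removed rows are the schools with a missing `median_debt`. To compare debt and earnings school by school, one option is a debt-to-earnings ratio computed on the rows where debt is known. This is a sketch on made-up rows, and the column name `debt_to_earnings` is introduced here only for illustration.

```r
library(dplyr)

# Hypothetical rows standing in for college_clean
toy <- data.frame(
  school          = c("A", "B", "C", "D"),
  median_debt     = c(10000, NA, 25000, 20000),
  median_earnings = c(40000, 50000, 50000, 80000)
)

# Drop schools with unknown debt, then compute debt relative to earnings
ratios <- toy %>%
  filter(!is.na(median_debt)) %>%
  mutate(debt_to_earnings = median_debt / median_earnings) %>%
  arrange(desc(debt_to_earnings))

print(ratios)
```

Schools at the top of this list leave graduates with the most debt relative to what they earn.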
Conclusion:
The results suggest that schools with higher tuition tend to have graduates with higher median earnings.
However, the relationship is not perfect, and it is an association rather than proof that higher tuition causes higher earnings. Other factors such as school quality, field of study, and location also influence earnings after college. Because the dataset covers only colleges, it also cannot show whether attending college at all raises earnings.