library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.6
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.1 ✔ tibble 3.3.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.2
## ✔ purrr 1.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(openintro)
## Loading required package: airports
## Loading required package: cherryblossom
## Loading required package: usdata
library(ggplot2)
ggplot(loans_full_schema, aes(x = debt_to_income, y = interest_rate)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "loess", se = FALSE, color = "red") +
  labs(title = "Interest rate vs debt-to-income",
       x = "Debt-to-income ratio",
       y = "Interest rate (%)")
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 24 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_point()`).
Interpretation: Most points in the scatter plot fall in a dense vertical
band at low debt-to-income (DTI) values, with no clear upward or downward
trend in interest rates. The 24 rows with missing values are dropped from
the plot. DTI alone therefore does not appear to be a major driver of
loan pricing.
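The visual impression can be checked with a number and a zoomed view. The sketch below is one possible follow-up, not part of the original analysis: it computes a rank correlation on the complete rows and redraws the plot with the x-axis limited to the bulk of the data. The object names (`dti_rate`, `dti_cor`, `dti_cap`) and the 99th-percentile cutoff are illustrative assumptions.

```r
library(dplyr)
library(ggplot2)
library(openintro)

# Keep only rows where both variables are observed
dti_rate <- loans_full_schema %>%
  filter(!is.na(debt_to_income), !is.na(interest_rate))

# Spearman correlation is rank-based, so a few extreme DTI values
# have less influence than they would on a Pearson correlation
dti_cor <- cor(dti_rate$debt_to_income, dti_rate$interest_rate,
               method = "spearman")
dti_cor

# Assumed cutoff: zoom to the lowest 99% of DTI values (arbitrary choice)
dti_cap <- unname(quantile(dti_rate$debt_to_income, 0.99))

ggplot(dti_rate, aes(x = debt_to_income, y = interest_rate)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "loess", se = FALSE, color = "red") +
  # coord_cartesian() zooms without dropping points from the smoother
  coord_cartesian(xlim = c(0, dti_cap)) +
  labs(title = "Interest rate vs debt-to-income (zoomed)",
       x = "Debt-to-income ratio",
       y = "Interest rate (%)")
```

A correlation near zero would support the reading above; a clearly positive value would suggest the compressed x-axis is hiding a trend.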
ggplot(loans_full_schema, aes(x = grade, y = interest_rate)) +
  geom_boxplot(outlier.alpha = 0.2) +
  labs(title = "Interest rate by loan grade",
       x = "Loan grade",
       y = "Interest rate (%)")
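As a numeric companion to the boxplot, the quartiles per grade can be tabulated. This is a minimal sketch rather than part of the original analysis; `rate_by_grade` and its column names are illustrative, and the two checks at the end are just one way to test for monotonic medians and non-overlapping interquartile ranges.

```r
library(dplyr)
library(openintro)

# Quartiles of interest rate within each loan grade
rate_by_grade <- loans_full_schema %>%
  group_by(grade) %>%
  summarise(
    n   = n(),
    q1  = unname(quantile(interest_rate, 0.25, na.rm = TRUE)),
    med = median(interest_rate, na.rm = TRUE),
    q3  = unname(quantile(interest_rate, 0.75, na.rm = TRUE)),
    .groups = "drop"
  ) %>%
  arrange(grade)

rate_by_grade

# TRUE if the median never decreases from one grade to the next
medians_monotonic <- all(diff(rate_by_grade$med) >= 0)
medians_monotonic

# For each adjacent pair, TRUE if the lower grade's upper quartile sits
# below the next grade's lower quartile (the boxes do not overlap)
iqr_separated <- head(rate_by_grade$q3, -1) < tail(rate_by_grade$q1, -1)
iqr_separated
```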
Interpretation: Median interest rates increase
systematically from grade A to grade G, and the distributions overlap
only modestly between adjacent grades.
This indicates a strong monotonic relationship: lower-quality grades are
consistently charged higher interest rates.

### Within the same loan_grade, does annual_income matter for interest_rate?
ggplot(loans_full_schema, aes(x = annual_income, y = interest_rate)) +
  geom_point(alpha = 0.2) +
  scale_x_log10() +
  facet_wrap(~ grade) +
  labs(title = "Interest rate vs. annual income within each grade",
       x = "Annual income (log scale)",
       y = "Interest rate (%)")
## Warning in scale_x_log10(): log-10 transformation introduced infinite values.
Interpretation: Within each grade, interest rates
remain fairly flat across different income levels, with substantial
vertical spread but no clear trend along the income axis.
This suggests that once grade is set, income adds little extra
information for pricing, reinforcing the idea that grades are the
primary mechanism lenders use to encode borrower risk.
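The "income adds little once grade is set" reading can be probed in two ways, sketched below under stated assumptions. First, a within-grade rank correlation between income and interest rate. Second, a comparison of adjusted R² for a linear model with grade alone against one that also includes log income. Rows with zero income are dropped first, since those are most likely what triggered the log-scale warning above. The object names (`loans_pos`, `income_by_grade`, `fit_grade`, `fit_grade_income`, `adj_r2`) are illustrative, and the linear model is a rough check rather than a claim about how lenders actually price loans.

```r
library(dplyr)
library(openintro)

# log10(0) is -Inf, so keep only strictly positive incomes
loans_pos <- loans_full_schema %>%
  filter(annual_income > 0, !is.na(interest_rate))

# Rank correlation between income and interest rate inside each grade
income_by_grade <- loans_pos %>%
  group_by(grade) %>%
  summarise(
    n   = n(),
    rho = cor(annual_income, interest_rate, method = "spearman"),
    .groups = "drop"
  )

income_by_grade

# Does log income explain variance beyond what grade already explains?
fit_grade        <- lm(interest_rate ~ grade, data = loans_pos)
fit_grade_income <- lm(interest_rate ~ grade + log10(annual_income),
                       data = loans_pos)

adj_r2 <- c(
  grade_only        = summary(fit_grade)$adj.r.squared,
  grade_plus_income = summary(fit_grade_income)$adj.r.squared
)
adj_r2
```

Within-grade correlations close to zero, together with nearly identical adjusted R² values, would be consistent with the interpretation above.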