library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.1     ✔ tibble    3.3.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(openintro)
## Loading required package: airports
## Loading required package: cherryblossom
## Loading required package: usdata
library(ggplot2)

### Is interest_rate associated with debt_to_income?

ggplot(loans_full_schema, aes(x = debt_to_income, y = interest_rate)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "loess", se = FALSE, color = "red") +
  labs(title = "Interest rate vs debt-to-income",
       x = "Debt-to-income ratio",
       y = "Interest rate (%)")
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 24 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_point()`).

Interpretation: The scatter plot shows a dense band of points concentrated at low DTI values, with no clear upward or downward trend in interest rate as DTI increases (24 rows with missing values were dropped from the plot). On its own, DTI therefore does not appear to be a major driver of loan pricing.
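
To put a number on this visual impression, one option is a rank correlation, which does not assume a linear relationship. The following is a minimal sketch rather than part of the original analysis; it assumes `loans_full_schema` comes from the openintro package, and `dti_complete` and `dti_cor` are names introduced here for illustration.

```r
library(dplyr)
library(openintro)

# Keep only rows where both variables are observed; the scatter plot
# silently dropped the rows with missing values.
dti_complete <- loans_full_schema |>
  filter(!is.na(debt_to_income), !is.na(interest_rate))

# Spearman rank correlation: less sensitive than Pearson to the long
# right tail of debt_to_income.
dti_cor <- cor(dti_complete$debt_to_income,
               dti_complete$interest_rate,
               method = "spearman")
dti_cor
```

A value close to zero would be consistent with the flat pattern in the plot; a clearly positive or negative value would suggest the overplotting is hiding a trend.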

### If DTI is a weak predictor, does grade explain interest_rate better?

ggplot(loans_full_schema, aes(x = grade, y = interest_rate)) +
  geom_boxplot(outlier.alpha = 0.2) +
  labs(title = "Interest rate by loan grade",
       x = "Loan grade",
       y = "Interest rate (%)")
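
A boxplot like this can be backed by a per-grade summary table, which makes the medians and interquartile ranges explicit. This is a sketch under the assumption that `loans_full_schema` is the openintro data set used above; `grade_summary` and its column names are introduced here for illustration.

```r
library(dplyr)
library(openintro)

# One row per grade: group size, median rate, and the quartiles
# that define the box in the boxplot.
grade_summary <- loans_full_schema |>
  group_by(grade) |>
  summarise(
    n = n(),
    q25 = quantile(interest_rate, 0.25, na.rm = TRUE),
    median_rate = median(interest_rate, na.rm = TRUE),
    q75 = quantile(interest_rate, 0.75, na.rm = TRUE),
    .groups = "drop"
  )
grade_summary
```

Comparing `q75` of one grade with `q25` of the next gives a quick check on how much adjacent grades overlap.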

Interpretation: Median interest rates increase systematically from grade A to grade G, and the distributions overlap only modestly between adjacent grades.
This indicates a strong monotonic relationship: lower-quality grades are consistently charged higher interest rates.

### Within the same grade, does annual_income matter for interest_rate?

ggplot(loans_full_schema, aes(x = annual_income, y = interest_rate)) +
  geom_point(alpha = 0.2) +
  scale_x_log10() +
  facet_wrap(~ grade) +
  labs(title = "Interest rate vs. annual income within each grade",
       x = "Annual income (log scale)",
       y = "Interest rate (%)")
## Warning in scale_x_log10(): log-10 transformation introduced infinite values.

Interpretation: Within each grade, interest rates remain fairly flat across different income levels, with substantial vertical spread but no clear trend along the income axis.
This suggests that once grade is set, income adds little extra information for pricing, reinforcing the idea that grades are the primary mechanism lenders use to encode borrower risk.
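
The within-grade pattern can also be checked numerically. The sketch below computes, for each grade, the correlation between log income and interest rate. It is an illustration rather than part of the original analysis: `income_by_grade` and `cor_income_rate` are names introduced here, and the choice to drop non-positive incomes is an assumption made to avoid the infinite values that `scale_x_log10()` warned about.

```r
library(dplyr)
library(openintro)

income_by_grade <- loans_full_schema |>
  # log10(0) is -Inf, so keep only strictly positive, non-missing incomes.
  filter(!is.na(annual_income), annual_income > 0) |>
  group_by(grade) |>
  summarise(
    n = n(),
    # Pearson correlation on log income mirrors the log-scaled x-axis.
    cor_income_rate = cor(log10(annual_income), interest_rate),
    .groups = "drop"
  )
income_by_grade
```

Correlations near zero in every grade would support the reading that income adds little once grade is known. Grades with few loans will give noisy estimates, so the `n` column is worth reading alongside the correlations.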