How many distinct values are there for the homeownership variable? Which value is the most common one?
Answer: 3 distinct values; Mortgage is the most common.
How many distinct interest rates are there? Which value is the most common one?
Answer: 58 distinct interest rate values; 9.39 is the most common.
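The two counting questions above can be answered with the same few lines. The sketch below is only an illustration: the `loans` data frame is a small made-up stand-in (not the real data), and `summarise_values()` is a hypothetical helper name.

```r
# Hypothetical stand-in for the loans data; the values are made up.
loans <- data.frame(
  homeownership = c("MORTGAGE", "RENT", "MORTGAGE", "OWN", "RENT", "MORTGAGE"),
  interest_rate = c(9.93, 14.07, 9.93, 6.08, 14.07, 9.93)
)

# Hypothetical helper: number of distinct values and the most frequent value.
# table() drops NA by default, so missing values are not counted here.
summarise_values <- function(x) {
  counts <- table(x)
  list(
    n_distinct  = length(counts),
    most_common = names(counts)[which.max(counts)]
  )
}

home_summary <- summarise_values(loans$homeownership)
rate_summary <- summarise_values(loans$interest_rate)
```

Note that `most_common` comes back as a character string even for numeric columns, because it is read from the names of the table.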
Apply the table function to the annual_income variable. Do you think the result is helpful or not?
Answer: Yes, because it shows each value and how often it occurs.
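How readable the table is depends on how many distinct values the variable takes. A rough way to check this is to compare the number of table cells with the number of rows. The income values below are simulated (an assumption for illustration, not the real data), and the choice of 5 bins in `cut()` is arbitrary.

```r
set.seed(1)
# Hypothetical stand-in for annual_income; values are simulated.
annual_income <- round(rlnorm(200, meanlog = 11, sdlog = 0.5))

# One cell per distinct value
income_table <- table(annual_income)
n_cells <- length(income_table)
n_rows  <- length(annual_income)

# Share of table cells that hold exactly one observation
share_single <- mean(income_table == 1)

# Grouping the values into a few intervals gives a much shorter table
income_bins <- table(cut(annual_income, breaks = 5))
```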
Create a histogram of loan_amount. Customize your plot to give a graph that looks most reasonable to you.
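A minimal sketch of one possible customization is shown below. The data are simulated as a stand-in for the real loans data, and the bin width, colors, and labels are assumptions chosen for illustration rather than the settings used for the original plot.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for the loans data; loan amounts are simulated.
loans <- data.frame(loan_amount = round(runif(500, 1000, 40000), -2))

p_loan <- ggplot(loans, aes(x = loan_amount)) +
  # An explicit bin width and a boundary at 0 give bins with round edges
  geom_histogram(binwidth = 2500, boundary = 0,
                 fill = "steelblue", color = "white") +
  labs(x = "Loan amount", y = "Number of loans",
       title = "Distribution of loan amounts")
```

Printing `p_loan` draws the plot.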
Create a histogram of annual_income. What is the issue with your graph?
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Answer: The graph is hard to read. It contains extreme outliers, which stretch the x-axis so that most of the data is squeezed into a few bins on the left.
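One possible way to make the bulk of the data visible is to set the number of bins explicitly and put the x-axis on a log scale. This is only a sketch: the data are simulated with one artificial outlier, and both the bin count and the log scale are assumptions, not part of the exercise.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for the loans data: simulated incomes plus one
# artificial extreme value.
loans <- data.frame(
  annual_income = c(round(rlnorm(300, meanlog = 11, sdlog = 0.6)), 2300000)
)

p_income <- ggplot(loans, aes(x = annual_income)) +
  # Setting bins explicitly avoids the default of 30 bins
  geom_histogram(bins = 40, fill = "steelblue", color = "white") +
  # A log scale compresses the long right tail; incomes of 0 would be
  # dropped with a warning because log10(0) is not finite
  scale_x_log10() +
  labs(x = "Annual income (log scale)", y = "Number of loans")
```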
Create a histogram of variable debt_to_income in loans with the following requirements:
- The plotting range of x is between 0 and 100.
- The binwidth is 2.
- Create a density plot on top of the histogram.
## Warning: Removed 57 rows containing non-finite outside the scale range
## (`stat_bin()`).
## Warning: Removed 57 rows containing non-finite outside the scale range
## (`stat_density()`).
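The three requirements above can be sketched as follows. The data are simulated (with a few values above 100 and one missing value added on purpose), and the colors are arbitrary choices. The histogram is drawn on the density scale with `after_stat(density)` so that the bars and the density curve share the same y-axis.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for the loans data; values are simulated, with two
# values above 100 and one NA added to mimic rows outside the plotting range.
loans <- data.frame(
  debt_to_income = c(rgamma(500, shape = 4, scale = 5), 150, 300, NA)
)

p_dti <- ggplot(loans, aes(x = debt_to_income)) +
  # Histogram on the density scale, bins of width 2 starting at 0
  geom_histogram(aes(y = after_stat(density)), binwidth = 2, boundary = 0,
                 fill = "grey80", color = "white") +
  # Density curve drawn on top of the histogram
  geom_density(color = "firebrick", linewidth = 1) +
  # Restrict x to [0, 100]; rows outside this range are removed with a warning
  xlim(0, 100)
```

Because `xlim()` removes out-of-range rows before the statistics are computed, both layers report the same number of removed rows, which matches the pair of warnings above.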
For loans data, create a scatter plot of interest_rate vs debt_to_income with mapping color to grade.
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_point()`).
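A minimal sketch of such a scatter plot is given below, with interest_rate on the y-axis, debt_to_income on the x-axis, and grade mapped to color. The data are simulated and contain only three grades, so this is an illustration of the mapping and not a reproduction of the original figure; the transparency setting is an assumption.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for the loans data; all values are simulated.
n <- 150
grade <- sample(c("A", "B", "C"), n, replace = TRUE)
loans <- data.frame(
  grade = grade,
  debt_to_income = runif(n, 0, 40),
  # Simulated rates rise with the grade letter, plus a little noise
  interest_rate = c(A = 7, B = 11, C = 15)[grade] + rnorm(n, sd = 0.8)
)

p_scatter <- ggplot(loans,
                    aes(x = debt_to_income, y = interest_rate, color = grade)) +
  # Partial transparency helps when points overlap
  geom_point(alpha = 0.6) +
  labs(x = "Debt-to-income ratio", y = "Interest rate", color = "Grade")
```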
What can you learn from the graph?
Answer:
- Higher debt_to_income ratios are associated with higher interest rates.
- Each loan grade appears to form a distinct cluster in the scatter plot.
- Higher grades such as A and B appear at lower debt_to_income and lower interest_rate. Lower grades (D, E, F, G) have higher interest rates and higher debt_to_income.
- Some outliers show an unusually high or low rate for a given debt_to_income.
Question: Can you explain the distribution of debt_to_income?
Answer: High debt_to_income goes with a higher interest_rate, and low debt_to_income goes with a lower interest_rate.
## Saving 7 x 5 in image