Question #0: Lab exercise:

How many distinct values are there for homeownership variable? Which value is the most common one?

Answer: There are 3 distinct values; Mortgage is the most common.

How many distinct interest rates are there? Which value is the most common one?

Answer: There are 58 distinct interest rate values; 9.39 is the most common.

Apply table function to the annual_income variable. Do you think the result is helpful or not?

Answer: Yes, because it shows each value and how often it occurs.
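The counts asked for above can be obtained in base R roughly as follows. This is only a sketch: `loans_demo` is a small made-up stand-in for the lab's `loans` data, so its numbers will not match the answers above.

```r
# Hypothetical stand-in for the lab's loans data; all values are made up.
loans_demo <- data.frame(
  homeownership = c("MORTGAGE", "RENT", "MORTGAGE", "OWN", "RENT", "MORTGAGE"),
  annual_income = c(52000, 61000, 48500, 75250, 39900, 120000)
)

# Frequency of each category; the length of the table is the number of distinct values.
home_counts <- table(loans_demo$homeownership)
n_distinct_home <- length(home_counts)

# Name of the most frequent category.
most_common_home <- names(which.max(home_counts))

# The same call on a numeric variable: here every income is different,
# so the table has one entry per row.
income_counts <- table(loans_demo$annual_income)

print(home_counts)
print(most_common_home)
```

Swapping `loans_demo$homeownership` for `loans$interest_rate` would give the interest rate counts in the same way.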

Question #0: Lab exercise:

Create a histogram of loan_amount. Customize your plot to give a graph that looks most reasonable to you.
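A minimal sketch of such a histogram with ggplot2 is given below. The data frame `loans_demo` is simulated so the block runs on its own, and the bin width, colors, and labels are illustrative choices rather than the settings used for the original plot.

```r
library(ggplot2)

# Hypothetical stand-in for loans$loan_amount; values are simulated.
set.seed(1)
loans_demo <- data.frame(loan_amount = round(runif(200, 1000, 40000), -2))

# An explicit binwidth replaces the default of 30 bins;
# boundary = 0 makes the bin edges fall on multiples of the binwidth.
p_loan <- ggplot(loans_demo, aes(x = loan_amount)) +
  geom_histogram(binwidth = 2500, boundary = 0,
                 fill = "steelblue", color = "white") +
  labs(title = "Distribution of loan amount",
       x = "Loan amount", y = "Number of loans")

p_loan
```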

Create a histogram of annual_income. What is the issue with your graph?

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Answer: The graph is hard to read. A few extreme outliers stretch the x-axis, so most of the observations are squeezed into the first few bars.
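Two common ways to make a histogram with extreme outliers readable are sketched below: a log scale on x, or zooming in with `coord_cartesian()`. The data are simulated (right-skewed incomes plus one extreme value), and the bin settings and zoom range are assumptions chosen for this example only.

```r
library(ggplot2)

# Simulated right-skewed incomes plus one extreme value; not the lab data.
set.seed(2)
income_demo <- data.frame(
  annual_income = c(rlnorm(300, meanlog = 11, sdlog = 0.5), 2300000)
)

# Option 1: a log10 x-axis spreads out the bulk of the data.
# Note that values of 0 cannot be shown on a log scale and would be dropped.
p_income_log <- ggplot(income_demo, aes(x = annual_income)) +
  geom_histogram(bins = 40, fill = "steelblue", color = "white") +
  scale_x_log10() +
  labs(x = "Annual income (log scale)", y = "Count")

# Option 2: keep the linear axis but zoom in.
# coord_cartesian() only changes the view; no rows are removed before binning.
p_income_zoom <- ggplot(income_demo, aes(x = annual_income)) +
  geom_histogram(binwidth = 10000, fill = "steelblue", color = "white") +
  coord_cartesian(xlim = c(0, 250000)) +
  labs(x = "Annual income", y = "Count")

p_income_log
p_income_zoom
```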

Question #0: Lab exercise:

Create a histogram of variable debt_to_income in loans with the following requirements:

- The plotting range of x is between 0 and 100.

- The binwidth is 2.

- Create a density plot on top of the histogram.

## Warning: Removed 57 rows containing non-finite outside the scale range
## (`stat_bin()`).
## Warning: Removed 57 rows containing non-finite outside the scale range
## (`stat_density()`).
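The three requirements can be sketched as follows. The data are simulated, with one value above 100 and one missing value added on purpose so that `xlim()` drops rows and triggers the same kind of warning as above. To overlay a density curve, the histogram's y-axis has to be on the density scale, which is what `after_stat(density)` does.

```r
library(ggplot2)

# Simulated stand-in for loans$debt_to_income; 150 and NA are added
# so that xlim() has rows to remove.
set.seed(3)
dti_demo <- data.frame(
  debt_to_income = c(rgamma(500, shape = 4, rate = 0.2), 150, NA)
)

p_dti <- ggplot(dti_demo, aes(x = debt_to_income)) +
  # Histogram on the density scale, bins of width 2 starting at 0.
  geom_histogram(aes(y = after_stat(density)), binwidth = 2, boundary = 0,
                 fill = "grey80", color = "white") +
  # Density curve drawn on top of the bars.
  geom_density(color = "firebrick", linewidth = 1) +
  # Restrict x to [0, 100]; rows outside the range or missing are removed.
  xlim(0, 100) +
  labs(x = "Debt-to-income ratio", y = "Density")

p_dti
```

Because `xlim()` removes rows before the statistics are computed, each layer reports the removal separately, which is why the warning appears once for `stat_bin()` and once for `stat_density()`.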

Question #0: Lab exercise:

For loans data, create a scatter plot of interest_rate vs debt_to_income with mapping color to grade.

## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_point()`).
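A sketch of the scatter plot described above, with `grade` mapped to color. The data frame `scatter_demo` is made up so the block runs on its own; in particular, the link between grade and rate in it is an arbitrary construction and says nothing about the real `loans` data.

```r
library(ggplot2)

# Made-up stand-in data; the grade/rate relationship here is arbitrary.
set.seed(4)
grades <- c("A", "B", "C", "D")
scatter_demo <- data.frame(
  grade = sample(grades, 200, replace = TRUE),
  debt_to_income = runif(200, 0, 50)
)
scatter_demo$interest_rate <-
  6 + 4 * (match(scatter_demo$grade, grades) - 1) + rnorm(200, sd = 0.8)

# "interest_rate vs debt_to_income": interest_rate on y, debt_to_income on x.
p_scatter <- ggplot(scatter_demo,
                    aes(x = debt_to_income, y = interest_rate, color = grade)) +
  geom_point(alpha = 0.6) +
  labs(x = "Debt-to-income ratio", y = "Interest rate", color = "Grade")

p_scatter
```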

What can you learn from the graph?

Answer:

- Higher debt_to_income ratios are associated with higher interest rates.

- Each loan grade appears to form a distinct cluster in the scatter plot.

- Higher grades, such as A and B, appear at lower debt_to_income and lower interest_rate. Lower grades, namely D, E, F, and G, have higher interest rates and higher debt_to_income.

- Some outliers show an unusually high or low interest rate for a given debt_to_income.

Question: Can you explain the distribution of debt_to_income?

Answer: Higher debt_to_income tends to go with higher interest_rate, and lower debt_to_income with lower interest_rate.

Question #1: Create a scatter plot of loan_amount vs interest_rate with a color grouping using term variable (please use factor(term) to convert it into a categorical variable). Save your plot to your local folder.

## Saving 7 x 5 in image
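A sketch of this plot and the save step is shown below. The data are simulated, and the file name and the use of `tempdir()` are placeholders; in the lab you would point `ggsave()` at your own folder.

```r
library(ggplot2)

# Simulated stand-in for the loans data; term is numeric (months), as in the question.
set.seed(5)
term_demo <- data.frame(
  loan_amount = round(runif(150, 1000, 40000), -2),
  interest_rate = runif(150, 5, 30),
  term = sample(c(36, 60), 150, replace = TRUE)
)

# factor(term) turns the numeric term into a categorical variable,
# so ggplot2 uses a discrete color for each term instead of a gradient.
p_term <- ggplot(term_demo,
                 aes(x = interest_rate, y = loan_amount, color = factor(term))) +
  geom_point(alpha = 0.6) +
  labs(x = "Interest rate", y = "Loan amount", color = "Term (months)")

# Hypothetical output path; replace with a path in your local folder.
out_file <- file.path(tempdir(), "loan_amount_vs_interest_rate.png")
ggsave(out_file, plot = p_term, width = 7, height = 5)
```

If `width` and `height` are left out, `ggsave()` uses the size of the current graphics device and reports it, which is where a message such as "Saving 7 x 5 in image" comes from.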