Lab Homework #3

Lab Exercise 1:

How many distinct values are there for the homeownership variable? Which value is the most common?
```
sort(table(loans$homeownership))
```
```
## 
##               ANY      OWN     RENT MORTGAGE 
##        0        0     1353     3858     4789
```
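The homeownership factor has five levels, but two of them (an empty level and ANY) have zero counts, so only three distinct values actually occur in the data: OWN, RENT, and MORTGAGE. MORTGAGE is the most common, with 4,789 loans.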
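The table output above includes levels with zero counts, so it can help to separate the levels defined on the factor from the values that actually occur. The following is a minimal sketch on a small made-up factor (the `homeownership` vector here is a hypothetical stand-in, not the real `loans` column):

```r
# Hypothetical stand-in for loans$homeownership: five factor levels,
# two of which ("" and "ANY") never occur in the data.
homeownership <- factor(
  c("MORTGAGE", "RENT", "MORTGAGE", "OWN", "RENT", "MORTGAGE"),
  levels = c("", "ANY", "OWN", "RENT", "MORTGAGE")
)

counts <- table(homeownership)

n_levels    <- nlevels(homeownership)            # levels defined on the factor
n_observed  <- sum(counts > 0)                   # values that actually occur
most_common <- names(counts)[which.max(counts)]  # label with the largest count

print(c(levels = n_levels, observed = n_observed))
print(most_common)
```

Applied to the real data, the same three lines would be run on `loans$homeownership` instead of the toy vector.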

How many distinct interest rates are there? Which value is the most common one?

```
length(unique(loans$interest_rate))
```
```
## [1] 58
```
```
sort(table(loans$interest_rate))
```
```
## 
## 30.94     6 30.75 30.65 29.69 30.17 30.79  22.9 23.87 25.81 22.91 24.84 28.72 
##     1     3     4     5     9     9    11    13    20    26    28    31    31 
## 23.88 26.77 24.85 25.82  26.3 21.85 20.39 19.42 17.47    20 18.45 21.45 18.06 
##    37    38    42    47    53    90    93   114   124   137   146   172   176 
## 14.07  5.31  6.71 10.41 17.09 16.01 19.03 15.04  6.07  7.96 13.58  5.32  7.34 
##   183   188   192   194   195   196   197   199   202   211   225   234   243 
##  9.92 11.98 12.61  7.97  10.9  6.08  9.43 16.02 15.05 10.91  6.72 14.08  7.35 
##   248   255   264   274   275   277   280   284   304   306   312   318   325 
## 12.62 10.42 13.59  9.44 11.99  9.93 
##   333   346   347   367   376   390
```

There are 58 distinct interest rates. The most common is 9.93%, which appears 390 times.

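Apply the table function to the annual_income variable. Do you think the result is helpful or not?

Not really helpful. annual_income is a numeric variable with many different values, so the table is a long list of small counts that is hard to read.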
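For a numeric variable like this, a numeric summary or a table of binned values is usually easier to read than a raw table. Below is a minimal sketch on simulated data. The `annual_income` vector and the band boundaries are assumptions chosen only for illustration, not values from the `loans` data.

```r
# Simulated stand-in for loans$annual_income (assumption: right-skewed incomes)
set.seed(1)
annual_income <- round(rlnorm(1000, meanlog = 11, sdlog = 0.6))

# A raw table would have roughly one row per distinct value
n_distinct <- length(unique(annual_income))
print(n_distinct)

# Numeric summaries condense the same information
print(summary(annual_income))
print(quantile(annual_income, probs = c(0.05, 0.5, 0.95)))

# Binning first makes table() useful again; the breaks are illustrative
income_bands <- cut(
  annual_income,
  breaks = c(0, 50000, 100000, 200000, Inf),
  right = FALSE  # intervals are closed on the left: [0, 50000), ...
)
band_counts <- table(income_bands)
print(band_counts)
```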

Lab Exercise 2:

Create a histogram of loan_amount. Customize your plot to give a graph that looks most reasonable to you.

```
ggplot(data = loans) +
  geom_histogram(mapping = aes(x = loan_amount), binwidth = 2500, boundary = 1000)
```

Create a histogram of annual_income. What is the issue with your graph?
```
ggplot(data = loans) +
  geom_histogram(mapping = aes(x = annual_income))
```
The histogram is squeezed against the left edge because a few very large incomes stretch the x-axis far to the right. Restricting the x-axis to the range 0 to 500,000 gives a more readable plot:
```
ggplot(data = loans) +
  geom_histogram(mapping = aes(x = annual_income)) + 
  xlim(0, 500000)
```
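One caveat with `xlim()` is that it drops observations outside the range before the bins are computed (ggplot2 warns about the removed rows). Two alternatives that keep every observation are `coord_cartesian()`, which only zooms the view, and a log-scaled x-axis. This is a sketch on simulated data. The `toy` data frame and the bin settings are assumptions for illustration.

```r
library(ggplot2)

# Simulated stand-in for loans: skewed incomes plus one extreme outlier
set.seed(1)
toy <- data.frame(
  annual_income = c(rlnorm(500, meanlog = 11, sdlog = 0.5), 2e6)
)

# Option 1: zoom in without discarding any rows
p_zoom <- ggplot(data = toy) +
  geom_histogram(mapping = aes(x = annual_income),
                 binwidth = 10000, boundary = 0) +
  coord_cartesian(xlim = c(0, 500000))

# Option 2: keep the full range but compress the right tail
p_log <- ggplot(data = toy) +
  geom_histogram(mapping = aes(x = annual_income), bins = 30) +
  scale_x_log10()

print(p_zoom)
print(p_log)
```

Which option reads best depends on the goal: zooming keeps the original units, while the log scale shows the outliers in the same picture.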

Lab Exercise 3:

Create a histogram of variable debt_to_income in loans with the following requirements:

The plotting range of x is between 0 and 100

```
ggplot(loans, aes(x = debt_to_income)) +
  geom_histogram(aes(y = after_stat(density)),
                 boundary = 0, colour = "black", fill = "white") + xlim(0, 100)
```

The binwidth is 2

```
# binwidth is a fixed setting, not a mapping, so it goes outside aes()
ggplot(loans, aes(x = debt_to_income)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 2,
                 boundary = 0, colour = "black", fill = "white") + xlim(0, 100)
```

Create a density plot on top of the histogram

```
# binwidth is a fixed setting, not a mapping, so it goes outside aes()
ggplot(loans, aes(x = debt_to_income)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 2,
                 boundary = 0, colour = "black", fill = "white") + xlim(0, 100) +
  geom_density(adjust = 30/8, linewidth = 1.2)
```

Question: Can you explain the distribution of debt_to_income?

debt_to_income is roughly bell-shaped around its peak but right-skewed, with the peak toward the low end of the range. This means that most people have a low to manageable debt-to-income ratio, but a small number of people have a very high ratio, which produces the long right tail.
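A quick numeric check for this kind of skew is to compare the mean with the median: a long right tail pulls the mean above the median. The sketch below uses a small made-up vector (hypothetical values, not taken from `loans`) to show the idea.

```r
# Hypothetical debt-to-income ratios: most are low, a few are very high
debt_to_income <- c(5, 8, 10, 12, 15, 18, 20, 25, 40, 90)

dti_mean   <- mean(debt_to_income, na.rm = TRUE)
dti_median <- median(debt_to_income, na.rm = TRUE)

# A mean clearly above the median suggests a right-skewed distribution
print(c(mean = dti_mean, median = dti_median))
right_skewed <- dti_mean > dti_median
print(right_skewed)
```

On the real data the same comparison would use `loans$debt_to_income`. The `na.rm = TRUE` argument is there in case the column contains missing values.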

Lab Exercise 4:

For loans data, create a scatter plot of interest_rate vs debt_to_income with mapping color to grade. What can you learn from the graph?

```
ggplot(data = loans) +
  geom_point(mapping = aes(x = interest_rate, y = debt_to_income, color = grade))
```

From this scatter plot, I can tell that the higher the grade, the lower the interest_rate. Also, most debt_to_income values are below 100, and the extremely high debt_to_income values mostly belong to mid-tier grades with mid-tier interest rates.
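The relationship between grade and interest rate seen in the plot could also be checked numerically by summarizing interest_rate within each grade. This is a minimal sketch on a made-up data frame. The `toy` values are hypothetical, and only the column names follow the `loans` data.

```r
# Hypothetical data with the same column names as loans
toy <- data.frame(
  grade         = c("A", "A", "B", "B", "C", "C"),
  interest_rate = c(6.0, 7.0, 10.0, 11.0, 14.0, 16.0)
)

# Median interest rate within each grade
median_by_grade <- tapply(toy$interest_rate, toy$grade, median)
print(median_by_grade)

# TRUE if the median rate rises from each grade to the next
rates_increase <- all(diff(as.numeric(median_by_grade)) > 0)
print(rates_increase)
```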

Lab Homework #1:

Finish all Lab Exercises
Create a scatter plot of loan_amount vs interest_rate with a color grouping using term variable (please use factor(term) to convert it into a categorical variable). Save your plot to your local folder.

```
ggplot(data = loans) +
  geom_point(mapping = aes(x = loan_amount, y = interest_rate, color = factor(term)))

ggsave("my-plot.pdf")
```
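Called with only a file name, `ggsave()` saves the last plot that was displayed. A slightly more explicit variant stores the plot in a variable and passes it in, together with a size. The sketch below uses a made-up data frame and a temporary folder. The `toy` values, the output path, and the 7 x 5 inch size are assumptions for illustration.

```r
library(ggplot2)

# Hypothetical data with the same column names as loans
toy <- data.frame(
  loan_amount   = c(5000, 10000, 15000, 20000, 25000, 30000),
  interest_rate = c(6.1, 7.3, 9.9, 12.6, 14.1, 17.5),
  term          = c(36, 36, 36, 60, 60, 60)
)

# Store the plot so it can be passed to ggsave() explicitly
p <- ggplot(data = toy) +
  geom_point(mapping = aes(x = loan_amount, y = interest_rate,
                           color = factor(term)))

# Save to a temporary folder; replace with a local path as needed
out_file <- file.path(tempdir(), "my-plot.pdf")
ggsave(out_file, plot = p, width = 7, height = 5, units = "in")
```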

Submit your homework using R Markdown in pdf format.