Lab Exercise 1

Answer the following questions using the unique or table function:

How many distinct values are there for the homeownership variable? Which value is the most common?

unique(loans$homeownership)
## [1] MORTGAGE RENT     OWN     
## Levels:  ANY MORTGAGE OWN RENT

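Answer: 3 distinct values (MORTGAGE, RENT, and OWN); the other factor levels listed do not occur in the data. unique() does not report frequencies, so the most common value has to be read from table(loans$homeownership).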
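
A frequency table answers the "most common" part of this question, and it also shows why the Levels line lists more values than unique() returns. The sketch below uses a small made-up factor, home, as a stand-in for loans$homeownership; its values and counts are illustrative assumptions, not results from the data.

```r
# Hypothetical factor with an unused level, mimicking loans$homeownership
home <- factor(c("MORTGAGE", "RENT", "MORTGAGE", "OWN", "RENT", "MORTGAGE"),
               levels = c("ANY", "MORTGAGE", "OWN", "RENT"))

# table() counts every level, including unused ones (ANY has count 0)
counts <- table(home)

# Sort so the most frequent level comes first
sorted <- sort(counts, decreasing = TRUE)
top_value <- names(sorted)[1]

# Number of levels that actually occur in the data
n_used <- nlevels(droplevels(home))

top_value  # "MORTGAGE"
n_used     # 3
```

Running the same calls on loans$homeownership would give the actual most common value.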

How many distinct interest rates are there? Which value is the most common one?

unique(loans$interest_rate)
##  [1] 14.07 12.61 17.09  6.72 13.59 11.99  6.71 15.04  9.92  9.43 19.03 28.72
## [13] 26.77 15.05  6.08 11.98  7.96  7.34  5.32  6.07 12.62  9.44 20.39  9.93
## [25] 21.45 10.42 18.06 22.91 30.79 17.47  5.31  7.97 14.08 19.42 10.91 16.02
## [37] 13.58 16.01 20.00 21.85 10.90 23.87  7.35 23.88 25.82 10.41 18.45 30.17
## [49] 24.85 25.81 24.84 30.75 29.69 26.30 22.90  6.00 30.65 30.94

Answer: 58 distinct interest rates.

table(loans$interest_rate)
## 
##  5.31  5.32     6  6.07  6.08  6.71  6.72  7.34  7.35  7.96  7.97  9.43  9.44 
##   188   234     3   202   277   192   312   243   325   211   274   280   367 
##  9.92  9.93 10.41 10.42  10.9 10.91 11.98 11.99 12.61 12.62 13.58 13.59 14.07 
##   248   390   194   346   275   306   255   376   264   333   225   347   183 
## 14.08 15.04 15.05 16.01 16.02 17.09 17.47 18.06 18.45 19.03 19.42    20 20.39 
##   318   199   304   196   284   195   124   176   146   197   114   137    93 
## 21.45 21.85  22.9 22.91 23.87 23.88 24.84 24.85 25.81 25.82  26.3 26.77 28.72 
##   172    90    13    28    20    37    31    42    26    47    53    38    31 
## 29.69 30.17 30.65 30.75 30.79 30.94 
##     9     9     5     4    11     1

Answer: 9.93 is the most common at 390 occurrences.
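
Both answers can also be computed rather than read off the printed output, which helps when the table is long. A minimal sketch, assuming a short made-up vector rates in place of loans$interest_rate (the numbers are illustrative only):

```r
# Hypothetical stand-in for loans$interest_rate
rates <- c(9.93, 5.31, 9.93, 6.72, 9.93, 6.72, 14.07)

# Number of distinct values
n_distinct <- length(unique(rates))

# Frequency table, then the label and count of the most frequent value
counts <- table(rates)
most_common <- names(counts)[which.max(counts)]
most_common_n <- max(counts)

n_distinct     # 4
most_common    # "9.93"
most_common_n  # 3
```

Note that which.max returns only the first maximum, so ties would need a separate check.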

Apply the table function to the annual_income variable. Do you think the result is helpful or not?

options(max.print = 20)
table(loans$annual_income)
## 
##    0    1 3000 3120 3300 4000 4800 5000 5208 5235 5500 7200 7500 7800 8000 8500 
##   23    1    2    1    1    1    1    2    1    1    1    1    1    1    1    1 
## 9000 9600 9840 9972 
##    4    2    1    1 
##  [ reached getOption("max.print") -- omitted 1443 entries ]

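Answer: No. annual_income is a continuous numeric variable, so most values appear only once or a handful of times. The table has well over a thousand entries and does not summarize the distribution.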
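
For a continuous variable, a numeric summary or a table of binned values is usually more informative than a raw frequency table. The sketch below assumes a made-up vector income in place of loans$annual_income, and the bracket boundaries and labels are arbitrary choices for illustration.

```r
# Hypothetical stand-in for loans$annual_income
income <- c(0, 3000, 25000, 48000, 52000, 75000, 120000, 450000)

# Five-number summary plus the mean
summary(income)

# Group incomes into brackets, then tabulate the brackets.
# right = FALSE makes each interval closed on the left: [0, 50000), ...
breaks <- c(0, 50000, 100000, Inf)
brackets <- cut(income, breaks = breaks,
                labels = c("under 50k", "50k to 100k", "100k and over"),
                right = FALSE)
bracket_counts <- table(brackets)
bracket_counts
```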

Lab Exercise 2

Create a histogram of loan_amount. Customize your plot to give a graph that looks most reasonable to you.

ggplot(data = loans) +
  geom_histogram(mapping = aes(x = loan_amount), binwidth = 5000)

Answer: I prefer a wider binwidth (5000 here) because it groups more data points into each bar, which smooths out noise and makes the overall shape of the distribution easier to read.

Create a histogram of annual_income. What is the issue with your graph?

ggplot(data = loans) +
  geom_histogram(mapping = aes(x = annual_income))

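Answer: The histogram is extremely positively skewed: a few very large incomes stretch the x-axis, so most of the data is squeezed into a few bars on the left. Restricting the x-axis with xlim makes the bulk of the distribution readable.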

ggplot(data = loans) +
  geom_histogram(mapping = aes(x = annual_income)) + xlim(0, 500000)
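
One caveat: in ggplot2, xlim() removes observations outside the limits before the histogram is computed (with a warning), whereas coord_cartesian(xlim = ...) only zooms the view. Before relying on the truncated plot, it can help to check how many observations are affected. A minimal base R sketch, assuming a made-up vector income in place of loans$annual_income:

```r
# Hypothetical stand-in for loans$annual_income
income <- c(32000, 45000, 58000, 61000, 75000, 90000, 120000, 650000, 2300000)

# Upper plotting limit used in the histogram above
upper <- 500000

# Observations that fall outside the plotting range [0, upper]
n_dropped <- sum(income < 0 | income > upper, na.rm = TRUE)
share_dropped <- n_dropped / length(income)

n_dropped      # 2
share_dropped  # share of observations outside the range
```

If the share is small, truncating the axis loses little; if it is large, a log-scaled axis may be a better choice.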

Create a histogram of variable debt_to_income in loans with the following requirements:

  1. The plotting range of x is between 0 and 100
  2. The binwidth is 2
  3. Create a density plot on top of the histogram

ggplot(data = loans, aes(x = debt_to_income)) +
  geom_histogram(mapping = aes(y = after_stat(density)),
                 binwidth = 2, colour = "black", fill = "white") +
  xlim(0, 100) +
  geom_density(linewidth = 1.2)

Lab Exercise 3

For the loans data, create a scatter plot of interest_rate vs debt_to_income with color mapped to grade. What can you learn from the graph?

ggplot(data = loans) +
  geom_point(mapping = aes(x = interest_rate, y = debt_to_income,
                           color = grade))

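Answer: This figure suggests that borrowers with a better loan grade receive more favourable (lower) interest rates. The other axis shows the debt-to-income ratio rather than income itself, so conclusions about income cannot be read directly from this plot.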

Lab Homework #1

  1. Finish all Lab Exercises. Done.

  2. Create a scatter plot of loan_amount vs interest_rate with a color grouping using the term variable (please use factor(term) to convert it into a categorical variable). Save your plot to your local folder.

ggplot(data = loans) +
  geom_point(mapping = aes(x = loan_amount, y = interest_rate,
                           color = factor(term)))
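
The task also asks for the plot to be saved. One way to do this is ggplot2's ggsave(), sketched below under some assumptions: toy_loans is a tiny made-up data frame standing in for loans, and the file name, location, and size are placeholders to adapt.

```r
library(ggplot2)

# Hypothetical stand-in for the loans data frame
toy_loans <- data.frame(
  loan_amount   = c(5000, 12000, 20000, 35000),
  interest_rate = c(6.72, 9.93, 14.07, 19.03),
  term          = c(36, 36, 60, 60)
)

# Same plot as above, stored in an object so it can be saved
p <- ggplot(data = toy_loans) +
  geom_point(mapping = aes(x = loan_amount, y = interest_rate,
                           color = factor(term)))

# Assumed file name and location; replace tempdir() with your own folder
out_file <- file.path(tempdir(), "loan_amount_vs_interest_rate.png")

# Width and height are in inches by default; the extension picks the format
ggsave(filename = out_file, plot = p, width = 6, height = 4)
```

Passing plot = p explicitly avoids depending on whichever plot was displayed last.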

  3. Submit your homework using R Markdown in pdf format.

Lab Homework #2

Please see another PDF document.