table(loans$homeownership)
##
##               ANY MORTGAGE      OWN     RENT
##        0        0     4789     1353     3858
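Three values actually occur in the data: MORTGAGE, OWN and RENT. The factor also has two levels with zero counts, an empty string and ANY. MORTGAGE is the most common value.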
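The most common value and the levels that actually occur can also be read off programmatically instead of by eye. The following is a minimal sketch on a made-up factor that stands in for loans$homeownership; the toy values are an assumption, only the level names are taken from the output above.

```r
# Toy factor standing in for loans$homeownership (made-up values).
# The levels "" and "ANY" are declared but never observed.
homeownership <- factor(
  c("MORTGAGE", "RENT", "MORTGAGE", "OWN", "RENT", "MORTGAGE"),
  levels = c("", "ANY", "MORTGAGE", "OWN", "RENT")
)

counts <- table(homeownership)

# Levels that actually occur in the data
observed <- names(counts)[counts > 0]

# Most frequent level
most_common <- names(which.max(counts))

# droplevels() removes unused levels before tabulating
table(droplevels(homeownership))
```

Applied to the real data, the same calls would take loans$homeownership in place of the toy factor.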
sort(table(loans$interest_rate))
##
## 30.94 6 30.75 30.65 29.69 30.17 30.79 22.9 23.87 25.81 22.91 24.84 28.72
## 1 3 4 5 9 9 11 13 20 26 28 31 31
## 23.88 26.77 24.85 25.82 26.3 21.85 20.39 19.42 17.47 20 18.45 21.45 18.06
## 37 38 42 47 53 90 93 114 124 137 146 172 176
## 14.07 5.31 6.71 10.41 17.09 16.01 19.03 15.04 6.07 7.96 13.58 5.32 7.34
## 183 188 192 194 195 196 197 199 202 211 225 234 243
## 9.92 11.98 12.61 7.97 10.9 6.08 9.43 16.02 15.05 10.91 6.72 14.08 7.35
## 248 255 264 274 275 277 280 284 304 306 312 318 325
## 12.62 10.42 13.59 9.44 11.99 9.93
## 333 346 347 367 376 390
There are 58 distinct interest rates. The most common one is 9.93, which appears 390 times.
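Counting the distinct values by hand from a long sorted table is error-prone. A minimal sketch of how the two numbers could be computed directly, using a made-up vector in place of loans$interest_rate:

```r
# Toy vector standing in for loans$interest_rate (made-up values)
interest_rate <- c(9.93, 9.93, 11.99, 9.93, 11.99, 6.08, 9.44)

# Number of distinct values
n_distinct <- length(unique(interest_rate))

# Sort the frequency table so the most common value comes first
freq <- sort(table(interest_rate), decreasing = TRUE)

# names() of a table are character strings, so convert back to numeric
top_rate <- as.numeric(names(freq)[1])
top_count <- freq[[1]]
```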
table(table(loans$annual_income))
##
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
## 1111 102 38 29 17 12 9 5 7 2 7 2 1 5 2 1
## 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
## 3 5 1 2 4 5 4 1 4 3 1 5 2 4 2 1
## 34 35 36 37 39 40 41 43 44 45 46 47 48 50 53 60
## 2 1 2 4 1 1 2 1 1 2 6 1 2 1 1 2
## 62 66 69 70 73 75 76 77 80 89 90 92 96 99 110 116
## 1 2 2 1 1 1 2 1 1 1 1 1 1 1 1 1
## 121 124 141 168 178 182 204 221 236 247 248 260 273 314 350 383
## 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1
This result is not very informative. annual_income is a continuous variable, so most of its distinct values (1111 of them) occur only once, and a frequency table does not summarize it well.
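For a continuous variable, a numeric summary or a table of binned values is usually easier to read than raw frequencies. The sketch below uses a made-up vector in place of loans$annual_income, and the bin boundaries are arbitrary choices for illustration.

```r
# Toy vector standing in for loans$annual_income (made-up values)
annual_income <- c(32000, 45500, 50000, 50000, 61250, 78000, 120000, 950000)

# Minimum, quartiles, median, mean and maximum
summary(annual_income)

# Bin incomes into ranges (right-closed intervals), then tabulate the bins
income_bins <- cut(
  annual_income,
  breaks = c(0, 50000, 100000, Inf),
  labels = c("<=50k", "50k-100k", ">100k")
)
bin_counts <- table(income_bins)
bin_counts
```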
ggplot(loans) +
geom_histogram(aes(x = loan_amount), binwidth = 2500, boundary = 0)
ggplot(loans) +
geom_histogram(aes(x = annual_income))
A few extreme incomes stretch the x-axis, so almost all of the observations are squeezed into a few bars on the left.
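Two common ways to make such a histogram readable are a log-scaled x-axis and zooming in with coord_cartesian(). This is a sketch on made-up data, not part of the original answer; the bin settings and the zoom range are assumptions chosen for the toy values.

```r
library(ggplot2)

# Toy data standing in for loans$annual_income (made-up values,
# including one extreme income)
toy <- data.frame(
  annual_income = c(28000, 35000, 42000, 50000, 50000, 64000,
                    75000, 90000, 130000, 2300000)
)

# Option 1: log10 x-axis, so each step to the right is a tenfold increase
p_log <- ggplot(toy) +
  geom_histogram(aes(x = annual_income), bins = 10) +
  scale_x_log10()

# Option 2: zoom in on the bulk of the data; coord_cartesian() only
# changes the visible range and keeps every row in the binning
p_zoom <- ggplot(toy) +
  geom_histogram(aes(x = annual_income), binwidth = 10000, boundary = 0) +
  coord_cartesian(xlim = c(0, 200000))
```

Note that an income of 0 becomes -Inf on a log scale and is dropped with a warning, so the zoomed version is the safer choice if zeros are present.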
Create a histogram of variable debt_to_income in loans with the following requirements:
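ggplot(loans, aes(x = debt_to_income)) +
  # boundary = 0 aligns bins to [0, 2), [2, 4), ... so that xlim() does not drop the first bar
  geom_histogram(aes(y = after_stat(density)), binwidth = 2, boundary = 0) +
  geom_density(color = "red") +
  xlim(0, 100)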
The distribution is right-skewed and looks roughly like a gamma distribution.
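One rough way to check the gamma impression is to estimate the shape and rate by the method of moments and overlay the fitted density. The sketch below runs on simulated data, not on loans; the simulation parameters are arbitrary, and with the real column mean() and var() would need na.rm = TRUE if it contains missing values.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for loans$debt_to_income (not the real data)
toy <- data.frame(debt_to_income = rgamma(1000, shape = 3, rate = 0.15))

# Method of moments: mean = shape / rate, variance = shape / rate^2
m <- mean(toy$debt_to_income)
v <- var(toy$debt_to_income)
shape_hat <- m^2 / v
rate_hat <- m / v

# Histogram and kernel density (red) with the fitted gamma density (blue)
p_fit <- ggplot(toy, aes(x = debt_to_income)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 2, boundary = 0) +
  geom_density(color = "red") +
  stat_function(fun = dgamma,
                args = list(shape = shape_hat, rate = rate_hat),
                color = "blue")
```

If the blue curve tracks the red one closely, the gamma description is reasonable; a visual match is only an informal check, not a formal goodness-of-fit test.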
For the loans data, create a scatter plot of interest_rate vs debt_to_income with color mapped to grade. What can you learn from the graph?
ggplot(loans) +
geom_point(aes(x = interest_rate, y = debt_to_income, color = grade))
The interest rate rises as the loan grade gets worse, moving from A toward G.
ggplot(loans) +
geom_point(aes(x = loan_amount, y = interest_rate, color = factor(term)))
ggsave("test_plot.pdf")