Answer the following questions using the unique() or table() function:
unique(loans$homeownership)
## [1] MORTGAGE RENT OWN
## Levels: ANY MORTGAGE OWN RENT
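Answer: 3 (MORTGAGE, RENT, and OWN). The factor also declares a fourth level, ANY, but it does not occur in the data.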
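A minimal sketch of counting categories without reading them off the printout. It uses a small hypothetical factor in place of loans$homeownership and shows why the number of values that actually occur (unique()) can differ from the number of declared levels.

```r
# Hypothetical stand-in for loans$homeownership, with an unused level "ANY"
home <- factor(c("MORTGAGE", "RENT", "OWN", "RENT", "MORTGAGE"),
               levels = c("ANY", "MORTGAGE", "OWN", "RENT"))

length(unique(home))        # 3: categories that actually occur
nlevels(home)               # 4: declared levels, including the unused "ANY"
nlevels(droplevels(home))   # 3: levels left after dropping unused ones
```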
unique(loans$interest_rate)
## [1] 14.07 12.61 17.09 6.72 13.59 11.99 6.71 15.04 9.92 9.43 19.03 28.72
## [13] 26.77 15.05 6.08 11.98 7.96 7.34 5.32 6.07 12.62 9.44 20.39 9.93
## [25] 21.45 10.42 18.06 22.91 30.79 17.47 5.31 7.97 14.08 19.42 10.91 16.02
## [37] 13.58 16.01 20.00 21.85 10.90 23.87 7.35 23.88 25.82 10.41 18.45 30.17
## [49] 24.85 25.81 24.84 30.75 29.69 26.30 22.90 6.00 30.65 30.94
Answer: There are 58 distinct interest rates.
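Counting the printed values by hand is error-prone. A minimal sketch on a hypothetical vector of rates: wrapping unique() in length() returns the count directly.

```r
# Hypothetical sample of interest rates (not the real loans data)
rates <- c(14.07, 12.61, 14.07, 6.72, 12.61, 9.93)

length(unique(rates))  # 4 distinct rates
```

On the real data, length(unique(loans$interest_rate)) should reproduce the count above. Note that unique() compares numbers exactly, so rates that differ only in a trailing decimal count as distinct.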
table(loans$interest_rate)
##
## 5.31 5.32 6 6.07 6.08 6.71 6.72 7.34 7.35 7.96 7.97 9.43 9.44
## 188 234 3 202 277 192 312 243 325 211 274 280 367
## 9.92 9.93 10.41 10.42 10.9 10.91 11.98 11.99 12.61 12.62 13.58 13.59 14.07
## 248 390 194 346 275 306 255 376 264 333 225 347 183
## 14.08 15.04 15.05 16.01 16.02 17.09 17.47 18.06 18.45 19.03 19.42 20 20.39
## 318 199 304 196 284 195 124 176 146 197 114 137 93
## 21.45 21.85 22.9 22.91 23.87 23.88 24.84 24.85 25.81 25.82 26.3 26.77 28.72
## 172 90 13 28 20 37 31 42 26 47 53 38 31
## 29.69 30.17 30.65 30.75 30.79 30.94
## 9 9 5 4 11 1
Answer: 9.93 is the most common interest rate, appearing 390 times.
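Scanning a wide table for the largest count works here but does not scale. A minimal sketch on hypothetical rates: which.max() on a table() result returns the most frequent value, and sort() ranks the rest. If two values tie, which.max() returns the first one.

```r
# Hypothetical interest rates (not the real loans data)
rates <- c(9.93, 11.99, 9.93, 9.44, 11.99, 9.93)
counts <- table(rates)

names(which.max(counts))                  # "9.93": most frequent value
max(counts)                               # 3: its frequency
head(sort(counts, decreasing = TRUE), 2)  # top two values with their counts
```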
options(max.print = 20)
table(loans$annual_income)
##
## 0 1 3000 3120 3300 4000 4800 5000 5208 5235 5500 7200 7500 7800 8000 8500
## 23 1 2 1 1 1 1 2 1 1 1 1 1 1 1 1
## 9000 9600 9840 9972
## 4 2 1 1
## [ reached getOption("max.print") -- omitted 1443 entries ]
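Answer: No. annual_income is a continuous numeric variable with about 1,463 distinct values (20 printed plus 1,443 omitted), and most of the printed values appear only once, so a raw frequency table is not informative.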
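One way to make table() useful for a continuous variable is to bin it first with cut(). A minimal sketch, assuming hypothetical incomes and income bands chosen only for illustration; cut() uses right-closed intervals by default, so 100000 falls in the "50-100k" band.

```r
# Hypothetical incomes (not the real loans data)
income <- c(0, 3000, 45000, 52000, 61000, 75000, 120000, 250000)

# Illustrative bands: (-Inf, 25k], (25k, 50k], (50k, 100k], (100k, Inf)
bands <- cut(income,
             breaks = c(-Inf, 25000, 50000, 100000, Inf),
             labels = c("<=25k", "25-50k", "50-100k", ">100k"))

table(bands)  # counts per band: 2, 1, 3, 2
```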
library(ggplot2)

ggplot(data = loans) +
  geom_histogram(mapping = aes(x = loan_amount), binwidth = 5000)
Answer: I prefer a wider binwidth because it groups more data points into each bin, which smooths out noise and gives a more coherent picture of the distribution's shape.
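To see what a wider binwidth does to the counts, here is a minimal base-R sketch on hypothetical loan amounts. The helper bin_counts() is hypothetical, and its bins start at 0, which is not exactly how ggplot2 positions its own bins, but the effect of doubling the width is the same: fewer bins, each holding more observations.

```r
# Hypothetical loan amounts (not the real loans data)
amounts <- c(1000, 2500, 4000, 7000, 8000, 12000, 15000, 21000, 24000, 35000)

# Hypothetical helper: count observations in equal-width bins starting at 0
bin_counts <- function(x, width) {
  breaks <- seq(0, ceiling(max(x) / width) * width, by = width)
  table(cut(x, breaks = breaks, include.lowest = TRUE))
}

bin_counts(amounts, 5000)   # 7 bins, two of them empty
bin_counts(amounts, 10000)  # 4 bins with counts 5, 2, 2, 1
```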
ggplot(data = loans) +
  geom_histogram(mapping = aes(x = annual_income))
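Answer: The histogram is extremely positively (right) skewed: a small number of very high incomes stretch the x-axis. It makes more sense to restrict the x-axis range with xlim(), keeping in mind that xlim() drops observations outside the range (ggplot2 warns about the removed rows).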
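The skew can also be checked numerically: in right-skewed data the long upper tail pulls the mean above the median. A minimal sketch on hypothetical incomes, which also computes the share of rows that an upper limit of 500000 (as in xlim(0, 500000)) would drop.

```r
# Hypothetical incomes with one extreme value (not the real loans data)
income <- c(25000, 38000, 42000, 55000, 61000, 70000, 95000, 2500000)

mean(income)              # 360750: pulled up by the extreme value
median(income)            # 58000
mean(income > 500000)     # 0.125: share of rows above the 500000 limit
```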
ggplot(data = loans) +
  geom_histogram(mapping = aes(x = annual_income)) +
  xlim(0, 500000)
ggplot(data = loans, aes(x = debt_to_income)) +
  geom_histogram(mapping = aes(y = after_stat(density)),
                 binwidth = 2, colour = "black", fill = "white") +
  xlim(0, 100) +
  geom_density(linewidth = 1.2)
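The aes(y = after_stat(density)) mapping rescales bar heights so that the bar areas sum to 1, which puts the histogram on the same scale as the geom_density() curve. A minimal base-R sketch of that rescaling, density = count / (n * binwidth), on hypothetical values, checked against base R's hist(); ggplot2's own bin placement may differ slightly.

```r
# Hypothetical debt-to-income values (not the real loans data)
dti <- c(3, 5, 7, 8, 12, 14, 15, 19, 22, 35)
binwidth <- 5
breaks <- seq(0, 40, by = binwidth)

# Count per bin, then rescale: density = count / (n * binwidth)
counts <- table(cut(dti, breaks = breaks, include.lowest = TRUE))
dens <- as.vector(counts) / (length(dti) * binwidth)

sum(dens * binwidth)  # total bar area, equal to 1 up to rounding
# Same densities that hist() computes for these breaks
all.equal(dens, hist(dti, breaks = breaks, plot = FALSE)$density)
```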
ggplot(data = loans) +
  geom_point(mapping = aes(x = interest_rate, y = debt_to_income,
                           color = grade))
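Answer: This figure shows that borrowers with a better loan credit grade receive more favourable (lower) interest rates: the colors for each grade separate along the interest-rate axis. The plot shows debt-to-income ratio rather than income, so it does not by itself support a conclusion about income.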
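A numeric summary can back up what the colors suggest. A minimal sketch on hypothetical rows mimicking the grade and interest_rate columns: tapply() computes the median rate within each grade.

```r
# Hypothetical rows mimicking loans$grade and loans$interest_rate
toy <- data.frame(
  grade = c("A", "A", "B", "B", "C", "D"),
  interest_rate = c(6.08, 7.34, 10.41, 11.98, 15.05, 20.39)
)

# Median interest rate within each grade (groups sorted A, B, C, D)
med_by_grade <- tapply(toy$interest_rate, toy$grade, median)
med_by_grade
```

On the real data, the same call with loans$interest_rate and loans$grade should give one median per grade.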
Finish all Lab Exercises. Done.
Create a scatter plot of loan_amount vs. interest_rate, with points colored by the term variable (use factor(term) to convert it into a categorical variable). Save your plot to your local folder.
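p <- ggplot(data = loans) +
  geom_point(mapping = aes(x = loan_amount, y = interest_rate,
                           color = factor(term)))
p
# Save to the working directory (the file name is illustrative)
ggsave("loan_amount_vs_interest_rate.png", plot = p, width = 7, height = 5)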
Please see the separate PDF document.
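The factor(term) conversion matters because term is stored as numbers. A minimal sketch on hypothetical terms: a numeric column would be mapped to a continuous color gradient, while factor() turns it into two discrete groups, each getting its own color.

```r
# Hypothetical loan terms in months
term <- c(36, 60, 36, 36, 60)

is.numeric(term)       # TRUE: ggplot2 would map this to a continuous gradient
levels(factor(term))   # "36" "60": two discrete groups, one color each
```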