library(ggplot2)
library(openintro)
## Loading required package: airports
## Loading required package: cherryblossom
## Loading required package: usdata
data(loans_full_schema)

ggplot(loans_full_schema) + 
  geom_point(mapping = aes(x = loan_amount, y = interest_rate, color = factor(term)))

ggsave("my-plot.pdf")
## Saving 7 x 5 in image

How many distinct values are there for the homeownership variable? unique(loans_full_schema$homeownership) gives 4.
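A minimal sketch of how that count can be checked. For a factor, unique() lists the values that actually occur, while nlevels() also counts declared levels with zero observations, so the two numbers can differ. The helper name n_distinct_values and the toy factor are illustrative assumptions, not part of the lab.

```r
# Hypothetical helper: number of distinct values that actually occur,
# ignoring missing values.
n_distinct_values <- function(x) {
  length(unique(x[!is.na(x)]))
}

# Toy factor standing in for a categorical column (made-up values);
# "ANY" is declared as a level but never observed.
toy_home <- factor(
  c("MORTGAGE", "RENT", "OWN", "MORTGAGE", NA),
  levels = c("ANY", "MORTGAGE", "OWN", "RENT")
)

n_distinct_values(toy_home)  # counts observed values only
nlevels(toy_home)            # counts declared levels, observed or not

# With the lab data (assumes openintro is installed and loaded):
# n_distinct_values(loans_full_schema$homeownership)
```

Comparing the two numbers on the real column shows whether any level is declared but unused.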

Which value is the most common one? MORTGAGE

How many distinct interest rates are there? 58

Which value is the most common one? 9.93
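One way to find the most common value is to sort a frequency table, sketched below. The helper name most_common and the toy vector are assumptions for illustration; only 9.93 is taken from the answer above, the other rates are made up.

```r
# Hypothetical helper: the n most frequent values of a vector, with counts.
most_common <- function(x, n = 1) {
  counts <- sort(table(x), decreasing = TRUE)
  head(counts, n)
}

# Toy interest rates (made-up values) standing in for the real column.
toy_rates <- c(9.93, 5.31, 9.93, 12.62, 5.31, 9.93)

top <- most_common(toy_rates)
names(top)       # the most frequent value, as a character label
as.integer(top)  # how many times it occurs

# With the lab data (assumes openintro is installed and loaded):
# most_common(loans_full_schema$interest_rate)
# length(unique(loans_full_schema$interest_rate))
```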

Apply the table function to the annual_income variable. Do you think the result is helpful or not? Not helpful, because annual_income is a continuous numeric variable with many unique values, so the table has too many cells to read at a glance.
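If a frequency table of a continuous variable is still wanted, one option is to bin the values with cut() first. The sketch below uses made-up incomes and arbitrary break points chosen only for illustration.

```r
# Toy incomes standing in for a continuous column (values are made up).
toy_income <- c(18000, 42000, 55000, 61000, 75000, 98000, 120000, 310000)

# Raw table: one cell per distinct value, so every count here is 1.
raw_counts <- table(toy_income)

# Binned table: cut() maps each value to an interval before counting.
# include.lowest = TRUE keeps a value equal to the first break (0) in a bin.
breaks <- c(0, 50000, 100000, 150000, Inf)
binned_counts <- table(
  cut(toy_income, breaks = breaks, include.lowest = TRUE, dig.lab = 6)
)
binned_counts

# With the lab data (assumes openintro is installed and loaded):
# table(cut(loans_full_schema$annual_income, breaks = breaks,
#           include.lowest = TRUE, dig.lab = 6))
```

A histogram does the same binning graphically, which is why it is usually the more readable summary.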

ggplot(loans_full_schema, aes(x = loan_amount)) +
  geom_histogram(binwidth = 2000, fill = "steelblue", color = "black")

ggplot(loans_full_schema, aes(x = annual_income)) +
  geom_histogram(
    bins = 30,
    fill = "orange",
    color = "black"
  ) 

Most observations are clustered on the left, with a long tail of high incomes to the right.

ggplot(loans_full_schema, aes(x = debt_to_income)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 2, 
                 boundary = 0, colour = "black", fill = "white") +
  xlim(0, 100) +
  geom_density(colour = "red", linewidth = 1)
## Warning: Removed 57 rows containing non-finite outside the scale range
## (`stat_bin()`).
## Warning: Removed 57 rows containing non-finite outside the scale range
## (`stat_density()`).

The distribution is right-skewed. Most borrowers have relatively low to moderate debt-to-income ratios, while a smaller number of borrowers have very high ratios, creating a long right tail.
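A rough numeric check for right skew is to compare the mean with the median. This is a heuristic rather than a formal test. The sketch below uses made-up ratios; the variable names are illustrative.

```r
# Toy ratios standing in for debt_to_income (values are made up).
toy_dti <- c(5, 8, 10, 12, 14, 15, 17, 20, 24, 90)

mean_dti   <- mean(toy_dti, na.rm = TRUE)
median_dti <- median(toy_dti, na.rm = TRUE)

# A mean noticeably above the median is consistent with a long right tail,
# because a few large values pull the mean upward but not the median.
mean_dti > median_dti

# Upper quantiles show how far the tail stretches.
quantile(toy_dti, probs = c(0.5, 0.9, 0.99), na.rm = TRUE)

# With the lab data (assumes openintro is installed and loaded):
# mean(loans_full_schema$debt_to_income, na.rm = TRUE)
# median(loans_full_schema$debt_to_income, na.rm = TRUE)
```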

ggplot(data = loans_full_schema) + 
  geom_point(mapping = aes(x = interest_rate, y = debt_to_income, color = grade))
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_point()`).

Grade explains variation in interest rate beyond what debt-to-income ratio alone accounts for.
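To back up a reading like this with numbers, one could summarise interest rate within each grade and compare that with the overall correlation between the two numeric variables. The sketch below uses a made-up data frame with the same column names as the lab data; its values are not taken from loans_full_schema.

```r
# Toy data standing in for loans_full_schema (values are made up).
toy_loans <- data.frame(
  grade          = c("A", "A", "B", "B", "C", "C"),
  interest_rate  = c(6.0, 7.0, 10.0, 11.0, 14.0, 15.0),
  debt_to_income = c(10, 30, 12, 28, 11, 29)
)

# Median interest rate within each grade.
rate_by_grade <- tapply(toy_loans$interest_rate, toy_loans$grade, median)
rate_by_grade

# Correlation between the two numeric variables, ignoring grade;
# rows with missing values are dropped.
cor(toy_loans$interest_rate, toy_loans$debt_to_income,
    use = "complete.obs")

# With the lab data (assumes openintro is installed and loaded):
# tapply(loans_full_schema$interest_rate, loans_full_schema$grade, median)
```

If the medians differ clearly across grades while the correlation is weak, that pattern supports the statement above.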