library(ggplot2)
library(openintro)
## Loading required package: airports
## Loading required package: cherryblossom
## Loading required package: usdata
data(loans_full_schema)
ggplot(loans_full_schema) +
  geom_point(mapping = aes(x = loan_amount, y = interest_rate, color = factor(term)))
ggsave("my-plot.pdf")
## Saving 7 x 5 in image
How many distinct values are there for the homeownership variable? unique(loans_full_schema$homeownership) returns 4 distinct values.
Which value is the most common one? MORTGAGE.
How many distinct interest rates are there? 58.
Which interest rate is the most common one? 9.93.
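The counting questions above can be answered with base R alone. The following is a minimal sketch on a hypothetical toy vector `x` (not the real data); with the actual dataset one would presumably pass a column such as `loans_full_schema$homeownership` or `loans_full_schema$interest_rate` instead.

```r
# Hypothetical toy vector standing in for a column of the real data.
x <- c("RENT", "MORTGAGE", "OWN", "MORTGAGE", "RENT", "MORTGAGE")

# Number of distinct values that actually occur in x.
n_distinct <- length(unique(x))

# Frequency table, sorted from most to least common.
counts <- sort(table(x), decreasing = TRUE)

# Name of the most common value (the first entry after sorting).
most_common <- names(counts)[1]

print(n_distinct)
print(most_common)
```

Note that for a factor column, `table()` also lists levels with zero observations, whereas `unique()` only returns values that occur, so the two counts can differ.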
Apply the table function to the annual_income variable. Do you think the result is helpful or not? Not helpful, because annual_income is a continuous numeric variable with many unique values, so the resulting table is long and hard to read.
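One common way to make a frequency table useful for a continuous variable is to bin it first with `cut()`. The sketch below uses a hypothetical `income` vector, and the break points and labels are illustrative assumptions rather than values taken from the data.

```r
# Hypothetical incomes standing in for loans_full_schema$annual_income.
income <- c(18000, 42000, 55000, 61000, 75000, 98000, 120000, 250000)

# Illustrative break points and labels (assumptions, not from the source).
breaks <- c(0, 50000, 100000, Inf)
labels <- c("under 50k", "50k to under 100k", "100k or more")

# cut() assigns each income to an interval; right = FALSE gives [a, b) bins.
income_band <- cut(income, breaks = breaks, labels = labels, right = FALSE)

# The table now has one row per band instead of one per distinct income.
band_counts <- table(income_band)
print(band_counts)
```

A histogram does the same binning graphically, which is why it is the usual next step for variables like this.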
ggplot(loans_full_schema, aes(x = loan_amount)) +
  geom_histogram(binwidth = 2000, fill = "steelblue", color = "black")
ggplot(loans_full_schema, aes(x = annual_income)) +
  geom_histogram(
    bins = 30,
    fill = "orange",
    color = "black"
  )
Most observations are clustered on the left, so the distribution of annual_income is right-skewed.
ggplot(loans_full_schema, aes(x = debt_to_income)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 2,
                 boundary = 0, colour = "black", fill = "white") +
  xlim(0, 100) +
  geom_density(colour = "red", linewidth = 1)
## Warning: Removed 57 rows containing non-finite outside the scale range
## (`stat_bin()`).
## Warning: Removed 57 rows containing non-finite outside the scale range
## (`stat_density()`).
The distribution is right-skewed. Most borrowers have relatively low to moderate debt-to-income ratios, while a smaller number of borrowers have very high ratios, creating a long right tail.
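A quick numeric check for right skew is to compare the mean with the median: a long right tail pulls the mean above the median. This is a minimal sketch on a hypothetical vector `dti`; with the real data one would presumably use `loans_full_schema$debt_to_income`, where `na.rm = TRUE` matters if the column contains missing values.

```r
# Hypothetical right-skewed values standing in for debt_to_income.
dti <- c(5, 8, 10, 12, 14, 15, 18, 20, 25, 90)

# na.rm = TRUE drops missing values before summarising.
dti_mean <- mean(dti, na.rm = TRUE)
dti_median <- median(dti, na.rm = TRUE)

# In a right-skewed sample the mean typically exceeds the median.
print(c(mean = dti_mean, median = dti_median))
```

This is a rule of thumb rather than a formal test, so it is best read alongside the histogram.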
ggplot(data = loans_full_schema) +
  geom_point(mapping = aes(x = interest_rate, y = debt_to_income, color = grade))
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_point()`).
Grade has an additional effect on interest rate beyond debt-to-income
ratio alone.
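One way to back up this reading of the plot is to summarise interest rate by grade. The sketch below uses a small hypothetical data frame `loans_toy` with made-up values; on the real data the same call would presumably take `loans_full_schema$interest_rate` and `loans_full_schema$grade`.

```r
# Hypothetical toy data: made-up values, not from loans_full_schema.
loans_toy <- data.frame(
  grade          = c("A", "A", "B", "B", "C", "C"),
  interest_rate  = c(6.5, 7.0, 10.0, 11.0, 14.0, 15.0),
  debt_to_income = c(10, 30, 12, 28, 11, 31)
)

# Mean interest rate within each grade.
rate_by_grade <- tapply(loans_toy$interest_rate, loans_toy$grade, mean)
print(rate_by_grade)
```

In this toy example the mean rate rises from grade A to C even though each grade spans a similar range of debt-to-income values, which is the pattern the scatter plot is being read for.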