Lab 1: Answer the following questions by using the unique or
table function
1. How many distinct values are there for the homeownership variable?
Which value is the most common one?
Answer: 3. Mortgage is the most common one.
2. How many distinct interest rates are there? Which value is the
most common one?
Answer: 58. 9.93 is the most common one.
3. Apply table function to the annual_income variable. Do you
think the result is helpful or not?
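Answer: Not really. annual_income takes a large number of distinct
values, so the table is very long and most counts are small.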
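The three questions above can be sketched on a small made-up example. The vectors below are hypothetical stand-ins, not values from the loans data, and the income bins passed to cut() are an arbitrary choice for illustration.

```r
# Hypothetical toy data; NOT the real loans values.
homeownership <- c("MORTGAGE", "RENT", "MORTGAGE", "OWN", "RENT", "MORTGAGE")
annual_income <- c(41250, 58900, 60000, 72310, 95000, 103475)

# unique() returns each distinct value once; length() counts them.
n_distinct_values <- length(unique(homeownership))

# table() counts occurrences; which.max() picks the largest count.
counts <- table(homeownership)
most_common <- names(which.max(counts))

# For a near-continuous variable almost every value appears once,
# so table() gives one cell per observation here.
n_income_cells <- length(table(annual_income))

# Grouping the values into bins with cut() first (assumed breaks)
# gives a table that is easier to read.
income_bins <- table(cut(annual_income, breaks = c(0, 50000, 100000, 150000)))

print(n_distinct_values)
print(most_common)
print(income_bins)
```

On the real data, the same calls would be applied to loans$homeownership, loans$interest_rate and loans$annual_income.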
Lab 2: Question on Data
1. Why is the lowest interest rate around 5%, with no lower
interest rates? Can you explain?
Answer: The Federal Reserve generally keeps the
interest rate higher than the bank savings rate, so that lenders can earn
interest and people are discouraged from taking loans too readily.
2. Why are there some peak interest rates around 7%, 10%, 14%? Can
you explain?
Answer: According to the data, the histogram spikes
at around 7%, 10%, and 14%. This may be because higher
interest rates increase the return on savings. They
also make borrowing more expensive. Higher interest rates
help to slow down price rises (inflation).
Lab 3: Lab Exercise
1. Create a histogram of loan_amount. Customize your plot to give
a graph that looks most reasonable to you.
ggplot(data = loans) +
geom_histogram(mapping = aes(x = loan_amount), binwidth = 5000, boundary = 0) +
xlim(0, 40000)
2. Create a histogram of annual_income. What is the issue with
your graph?
ggplot(data = loans) +
geom_histogram(mapping = aes(x = annual_income), binwidth = 5000, boundary = 1000) +
xlim(0, 250000)
Answer: Overall, the distribution is right-skewed,
yet the graph has several spikes along the way, which implies that some
annual income amounts are more common than others.
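One way to check whether the spikes come from a few frequently reported amounts is to count the most common values directly. The sketch below uses a made-up income vector, not the loans data, and the "multiple of 5000" check is only one assumed explanation for such spikes.

```r
# Hypothetical incomes; NOT taken from the loans data.
income <- c(50000, 52300, 60000, 60000, 61750,
            75000, 75000, 75000, 98400, 120000)

# Count each distinct amount and list the most frequent ones first.
counts <- sort(table(income), decreasing = TRUE)
top_value <- names(counts)[1]
top_count <- counts[[1]]

# Share of amounts that are exact multiples of 5000 (assumed notion
# of a "round" income for this illustration).
round_share <- mean(income %% 5000 == 0)

print(head(counts, 3))
print(round_share)
```

If a large share of the real annual_income values sat on such round amounts, that would be consistent with the spikes seen in the histogram.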
Lab 4: Lab Exercise
Create a histogram of the variable debt_to_income in loans with the
following requirements:
1. The plotting range of x is between 0 and 100
2. The binwidth is 2
3. Create a density plot on top of the histogram
ggplot(loans, aes(x = debt_to_income)) +
geom_histogram(aes(y = after_stat(density)), binwidth = 2, boundary = 5, colour = "black", fill = "white") +
xlim(0, 100) +
geom_density(linewidth = 1.5)
Question: Can you explain the distribution of
debt_to_income?
Answer: The distribution is right-skewed. Generally, people
consider their financial state "healthy" when the DTI (debt-to-income
ratio) is less than 36%. According to the plot, the majority of the density
is under 40%, which means the data broadly aligns with that
expectation. Although there is some density on the right side (over 40%), it
is a minority.
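The claims in this answer can be checked numerically rather than read off the plot. A minimal sketch, assuming DTI is stored in percent; the dti vector is hypothetical and not the real debt_to_income column.

```r
# Hypothetical DTI values in percent; NOT the real loans data.
dti <- c(5, 12, 18, 22, 25, 31, 35, 38, 47, 90, NA)

# mean() of a logical vector gives the proportion of TRUE values;
# na.rm = TRUE drops missing DTI entries first.
share_under_36 <- mean(dti < 36, na.rm = TRUE)
share_under_40 <- mean(dti < 40, na.rm = TRUE)

# A mean above the median is a common sign of right skew.
dti_mean <- mean(dti, na.rm = TRUE)
dti_median <- median(dti, na.rm = TRUE)

print(c(share_under_36, share_under_40))
print(c(mean = dti_mean, median = dti_median))
```

Replacing dti with loans$debt_to_income would give the corresponding shares for the actual data.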
Lab 5: Lab Exercise
For the loans data, create a scatter plot of interest_rate vs
debt_to_income, mapping color to grade. What can you learn from the
graph?
ggplot(data = loans)+
geom_point(mapping = aes(x = interest_rate, y = debt_to_income, color = grade))
Answer: The data shows that most DTI values are
under 50%, regardless of grade or interest rate.
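To back up the reading of the scatter plot, the share of DTI values under 50% can be computed separately for each grade. This is a sketch on a made-up data frame named toy (hypothetical, not the loans data); the column names mirror those used above.

```r
# Hypothetical data; NOT the real loans values.
toy <- data.frame(
  grade = c("A", "A", "B", "B", "C", "C"),
  debt_to_income = c(10, 20, 15, 55, 30, 45)
)

# Proportion of DTI values under 50 within each grade:
# tapply() splits the logical vector by grade and averages each group.
share_under_50 <- tapply(toy$debt_to_income < 50, toy$grade, mean)

print(share_under_50)
```

If the shares are similarly high across all grades on the real data, that supports the statement that DTI stays mostly under 50% regardless of grade.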
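# Assumption: loans_full_schema is the dataset shipped with the openintro package.
library(openintro)
library(tidyverse)
loans <- select(loans_full_schema, loan_amount, interest_rate, term, grade, state, annual_income, homeownership, debt_to_income)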
ggplot(data = loans) +
geom_point(mapping = aes(x = loan_amount, y = interest_rate, color = term))