Question 0: Finish all Lab Exercises


Lab 1: Answer the following questions by using the unique or table function

1. How many distinct values are there for the homeownership variable? Which value is the most common one?
Answer: 3. Mortgage is the most common one.

2. How many distinct interest rates are there? Which value is the most common one?
Answer: 58. 9.93 is the most common one.

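3. Apply the table function to the annual_income variable. Do you think the result is helpful or not?
Answer: Not really. annual_income is numeric with many distinct values, so the table is long and most of its counts are small.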
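
A minimal sketch of how the Lab 1 answers could be obtained with base R. The homeownership and annual_income vectors below are small made-up samples for illustration, not the actual loans data; with the real data one would presumably pass loans$homeownership, loans$interest_rate, and loans$annual_income instead. The bin edges passed to cut are also an arbitrary choice.

```r
# Toy stand-ins for the loans columns (hypothetical values, not the real data)
homeownership <- c("MORTGAGE", "RENT", "MORTGAGE", "OWN", "RENT", "MORTGAGE")
annual_income <- c(40000, 52000, 61500, 75000, 98000, 120000)

# Number of distinct values
n_distinct_home <- length(unique(homeownership))

# Most common value: the name attached to the largest count in the table
home_counts <- table(homeownership)
most_common_home <- names(home_counts)[which.max(home_counts)]

# table() on a continuous variable gives roughly one count per value,
# so binning with cut() first produces a more readable summary
income_bins <- cut(annual_income,
                   breaks = c(0, 50000, 100000, Inf),
                   right = FALSE)
income_counts <- table(income_bins)

print(n_distinct_home)
print(most_common_home)
print(income_counts)
```

Binning before tabulating is only one option; summary(annual_income) or a histogram conveys similar information.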


Lab 2: Question on Data

1. Why is the lowest interest rate around 5%, with no lower rates? Can you explain?
Answer: The Federal Reserve generally keeps the interest rate higher than the bank savings rate, so that they can earn interest and discourage people from taking loans, etc.

2. Why are there some peak interest rates around 7%, 10%, 14%? Can you explain?
Answer: According to the data, the histogram spikes at around 7%, 10%, and 14%. This may be because higher interest rates increase the return on savings while also making borrowing more expensive. Higher interest rates also help to slow down price rises (inflation).


Lab 3: Lab Exercise

1. Create a histogram of loan_amount. Customize your plot to give a graph that looks most reasonable to you.

ggplot(data = loans) +
  geom_histogram(mapping = aes(x = loan_amount), binwidth = 5000, boundary = 0) +
  xlim(0, 40000)

2. Create a histogram of annual_income. What is the issue with your graph?

ggplot(data = loans) +
  geom_histogram(mapping = aes(x = annual_income), binwidth = 5000, boundary = 1000) +
  xlim(0, 250000)

Answer: Overall, the distribution is right-skewed, yet the graph has several spikes along the way, which implies that some annual income amounts are more common than others.
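
A rough sketch of how the skew and the spikes could be checked numerically, using base R only. The annual_income vector is a made-up sample, not the real loans data, and upper_limit simply mirrors the xlim value used in the plot. Note that xlim in ggplot2 drops observations outside the limits (with a warning), so counting them shows how much of the data the histogram leaves out.

```r
# Hypothetical incomes (not the real loans data), with one extreme value
annual_income <- c(30000, 45000, 50000, 60000, 60000, 75000, 90000, 400000)

# In a right-skewed variable the mean is pulled above the median
income_mean <- mean(annual_income)
income_median <- median(annual_income)

# Number of observations beyond the upper plotting limit
upper_limit <- 250000
n_outside <- sum(annual_income > upper_limit)

# Values that occur more than once are candidates for the spikes
income_counts <- table(annual_income)
repeated_values <- as.numeric(names(income_counts)[income_counts > 1])

print(c(mean = income_mean, median = income_median))
print(n_outside)
print(repeated_values)
```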


Lab 4: Lab Exercise

Create a histogram of variable debt_to_income in loans with the following requirements:
1. The plotting range of x is between 0 and 100
2. The binwidth is 2
3. Create a density plot on top of the histogram

ggplot(loans, aes(x = debt_to_income)) + 
   geom_histogram(aes(y = after_stat(density)), binwidth = 2, boundary = 5, colour = "black", fill = "white") +
   xlim(0, 100) +
   geom_density(linewidth = 1.5)

Question: Can you explain the distribution of debt_to_income?
Answer: The distribution is right-skewed. Generally, people consider their financial state 'healthy' when the DTI (debt-to-income ratio) is less than 36%. According to the plot, the majority of the density is under 40%, which means the data basically aligns with expectations. Although there is some density on the right side (over 40%), it is a minority.
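
The statement that most of the density lies under 40% can be checked directly by computing the share of observations below a threshold. This is a minimal sketch on made-up values, not the real loans data; the 36% and 40% thresholds come from the answer above, and na.rm = TRUE is included in case the column contains missing values.

```r
# Hypothetical debt-to-income ratios in percent (not the real loans data)
debt_to_income <- c(5, 12, 18, 22, 25, 31, 35, 38, 47, 62, NA)

# Share of non-missing observations below each threshold:
# the mean of a logical vector is the proportion of TRUE values
share_under_36 <- mean(debt_to_income < 36, na.rm = TRUE)
share_under_40 <- mean(debt_to_income < 40, na.rm = TRUE)

print(share_under_36)
print(share_under_40)
```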


Lab 5: Lab Exercise

For loans data, create a scatter plot of interest_rate vs debt_to_income with mapping color to grade. What can you learn from the graph?

ggplot(data = loans) +
   geom_point(mapping = aes(x = interest_rate, y = debt_to_income, color = grade))

Answer: The data shows that most DTI values are under 50%, regardless of grade or interest rate.
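
One way to back up this reading of the scatter plot is to compute the share of loans with DTI under 50% within each grade. The sketch below uses made-up grade and debt_to_income vectors, not the real loans data; with the real data one would presumably use loans$grade and loans$debt_to_income.

```r
# Hypothetical grades and debt-to-income ratios (not the real loans data)
grade <- c("A", "A", "B", "B", "C", "C", "C", "D")
debt_to_income <- c(10, 20, 15, 55, 30, 45, 70, 25)

# Share of loans with DTI under 50% within each grade
share_under_50 <- tapply(debt_to_income < 50, grade, mean)

print(share_under_50)
```

If the shares are similar across grades, that supports the claim that the pattern does not depend on grade.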


Question 1: Create a scatter plot of loan_amount vs interest_rate with a color grouping using term variable (please use factor(term) to convert it into a categorical variable). Save your plot to your local folder.


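library(dplyr)
library(ggplot2)

loans <- select(loans_full_schema, loan_amount, interest_rate, term, grade, state, annual_income, homeownership, debt_to_income)

ggplot(data = loans) +
  geom_point(mapping = aes(x = loan_amount, y = interest_rate, color = factor(term)))

ggsave("loan_amount_vs_interest_rate.png")  # saves the last plot; the file name is only an example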
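
The question asks for factor(term) because a numeric column is treated as continuous, which ggplot2 maps to a color gradient rather than to distinct group colors. A small base R sketch of the conversion, using made-up term values rather than the real loans data:

```r
# Hypothetical loan terms in months (not the real loans data)
term <- c(36, 60, 36, 36, 60)

# The raw column is numeric, so a color scale would treat it as continuous
term_is_numeric <- is.numeric(term)

# factor() turns it into a categorical variable with one level per distinct value
term_factor <- factor(term)
term_levels <- levels(term_factor)

print(term_is_numeric)
print(term_levels)
```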


Question 2: Submit your homework using R Markdown in pdf format.

Yes, sir.