1 RQ 1 Does the default rate depend on the purpose of the loan?

1.1 SQ 1.1 Give the variables represented in each role in Figure 1

Default rate as a function of the loan’s purpose

  • Explanatory: Purpose

  • Response: Defaulted

1.2 SQ 1.2 Fill in the blanks and circle the correct options in parentheses to complete the text summary

Default rate differs somewhat by loan purpose. The highest default rate was for debt consolidation loans: 16.764% of them defaulted. The lowest default rate was for major purchase loans: 88.787% of them were repaid in full. Perhaps surprisingly, the default rate for credit card loans was 8.5 percentage points (higher) than the default rate for educational loans. In other words, educational loans defaulted at a rate 8.5% (lower) than credit card loans.
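The summary above uses both "percentage points" and "percent", which measure different things: the first is an absolute gap between two rates, the second is a gap relative to one of them. The sketch below illustrates the arithmetic with two made-up default rates; `rate_a` and `rate_b` are hypothetical values and are not computed from the loans data.

```r
# Hypothetical default rates for two loan purposes (illustrative values only,
# not computed from the loans data).
rate_a <- 0.20  # 20% of loans with purpose A defaulted
rate_b <- 0.15  # 15% of loans with purpose B defaulted

# Absolute difference, expressed in percentage points.
diff_pp <- (rate_a - rate_b) * 100

# Relative difference: how much lower rate_b is, as a percent of rate_a.
diff_rel <- (rate_a - rate_b) / rate_a * 100

cat(sprintf("Absolute difference: %.1f percentage points\n", diff_pp))
cat(sprintf("Relative difference: %.1f%% lower\n", diff_rel))
```

With these made-up rates the same gap is 5 percentage points in absolute terms but 25% in relative terms, so the two figures are generally not interchangeable.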

2 RQ 2 Does the debt-to-income ratio seem to influence the interest rate borrowers receive?

2.1 SQ 2.1 For this research question, what are

  1. The variables?

The variables are the debt-to-income ratio (dti) and the interest rate (int_rate).

  2. Variable types?

Both the debt-to-income ratio (dti) and the interest rate (int_rate) are quantitative (numeric) variables.

  3. Variable roles?

The interest rate (int_rate) is the response (dependent) variable. The debt-to-income ratio (dti) is the explanatory (independent) variable.

  4. Appropriate type of chart?

Because both variables are numeric, the appropriate chart is a scatter plot, with the interest rate on the y-axis and the debt-to-income ratio on the x-axis.

  5. Appropriate summary statistic(s)?

The correlation coefficient and the covariance are useful summary statistics for quantifying the strength of the linear relationship between the interest rate and the debt-to-income ratio.
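To make the link between the two statistics concrete, here is a minimal base R sketch. The vectors reuse the names dti and int_rate for readability, but the values are invented toy data, not the loans data.

```r
# Toy data (hypothetical values, not from the loans data).
dti      <- c(5.2, 12.1, 18.4, 9.7, 22.3, 15.0)
int_rate <- c(0.09, 0.11, 0.13, 0.12, 0.14, 0.10)

# Covariance depends on the units of both variables.
cov_xy <- cov(dti, int_rate)

# Correlation rescales the covariance by both standard deviations,
# so it is unit-free and always lies between -1 and 1.
cor_xy     <- cor(dti, int_rate)
cor_manual <- cov_xy / (sd(dti) * sd(int_rate))

print(c(covariance = cov_xy, correlation = cor_xy, manual = cor_manual))
```

Because the covariance changes when either variable is rescaled (for example, interest rates in percent rather than proportions), the correlation is usually the easier of the two to interpret.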

2.2 SQ 2.2 Chart

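# Packages used by the code in this report.
library(dplyr)
library(ggplot2)
library(ggthemes)
library(kableExtra)

# Scatter plot of interest rate against debt-to-income ratio, colored by default status.
loans |>
    mutate(defaulted = factor(defaulted)) |>
    ggplot(aes(x = dti, y = int_rate, color = defaulted)) +
    geom_point(alpha = 0.6) +
    ggthemes::scale_color_tableau() +
    ggthemes::theme_clean() +
    labs(title = "Relationship Between Interest Rates and Debt to Income",
         subtitle = "Does the debt-to-income ratio seem to influence the interest rate borrowers \nreceive?",
         x = "Debt to Income Ratio",
         y = "Interest Rates")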

Relationship Between Interest Rates and Debt to Income

2.3 SQ 2.3 Text summary: give a qualitative summary plus one summary statistic

Qualitative summary: There is at most a weak relationship between the interest rate and the debt-to-income ratio. The scatter plot is approximately flat, meaning that as the debt-to-income ratio goes up, the interest rate on a loan changes very little.

Quantitative summary: The correlation coefficient between the interest rate and the debt-to-income ratio is 0.22, which indicates a weak positive linear association.

4 RQ 4 Can the borrower’s annual income be used to predict a loan’s monthly instalment amount?

4.1 SQ 4.1 For this research question, what are

  1. The variables?

The variables are annual income (annual_inc) and monthly instalment amount (installment).

  2. Variable types?

Annual income is a quantitative variable.

Loan instalment is also a quantitative variable.

  3. Variable roles?

In this context, the monthly instalment amount is the response (dependent) variable. Annual income is the explanatory (independent) variable.

  4. Appropriate type of chart?

Because both variables are quantitative, a scatter plot is the most appropriate chart for visualising the data.

loans |>
    ggplot(aes(x = annual_inc, y = installment)) + 
    geom_point() + 
    geom_smooth(method = "lm", se = FALSE) + 
    theme_bw() + 
    labs(title = "Annual Income and Loan Instalments", 
         x = "Annual Income, USD",
         y = "Monthly Instalment, USD") + 
    annotate(geom="text", x=1000000, y=750, label="y = 48.2-0.0505x",
              color="blue") 

Annual Income and Loan Instalments

loans |>
    ggplot(aes(x = annual_inc, y = installment)) + 
    geom_point() + 
    geom_smooth(method = "lm", se = FALSE) + 
    theme_bw() + 
    labs(title = "Annual Income and Loan Instalments (Log Scale)",
         x = "Annual Income, USD (Log Scale)",
         y = "Monthly Instalment, USD") +
    annotate(geom="text", x=1000000, y=750, label="y = 48.2-0.0505x",
              color="blue") + 
    scale_x_log10()

Annual Income and Loan Instalments (Log Scale)

  5. Appropriate summary statistic(s)?

Both variables are quantitative, so I use the correlation coefficient. The correlation between annual income and monthly instalments is 0.313.

In addition, I run a regression analysis (Table 1), fitting one model with raw annual income and one with the natural logarithm of annual income as the predictor.

raw_model <- lm(installment ~ annual_inc, data = loans)

log_model <- lm(installment ~ log(annual_inc), data = loans)

# Side-by-side regression table for the raw and log models.
stargazer::stargazer(raw_model, log_model,
                     title = "Regression Analysis", type = "latex",
                     column.labels = c("Raw Model", "Log Model"),
                     header = FALSE)

To answer this research question, you could use either annual income or the natural log of annual income as the predictor. Assume you would like to use a linear model.

4.2 SQ 4.2 Which variable would make a better predictor and why?

Circle the best answer a. or b. and explain in the box below.

  a. Annual income.

  b. Natural log of annual income.

The model with the natural logarithm of annual income as the predictor has better predictive power. Judging by the adjusted R-squared, the logarithm of annual income explains 20.1% of the variability in monthly instalments, compared with 9.8% for the model using raw annual income.
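As a cross-check on the raw-income figure: in a regression with a single predictor, R-squared equals the squared correlation between predictor and response, and in a large sample the adjusted R-squared is nearly the same. The sketch below demonstrates the identity on simulated data (the variables `x` and `y` are generated purely for illustration) and then applies the arithmetic to the correlation of 0.313 reported earlier.

```r
# Simulated data (hypothetical; generated only to illustrate the identity).
set.seed(1)
x <- rnorm(200, mean = 50, sd = 10)
y <- 3 + 0.5 * x + rnorm(200, sd = 12)

fit <- lm(y ~ x)

r         <- cor(x, y)
r_squared <- summary(fit)$r.squared

# For one predictor, R-squared is the squared correlation.
print(c(r = r, r_squared_from_cor = r^2, r_squared_from_lm = r_squared))

# The same arithmetic applied to the correlation reported in the text.
reported_r2 <- round(0.313^2, 3)
print(reported_r2)
```

Squaring 0.313 gives roughly 0.098, which agrees with the 9.8% reported for the raw-income model.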

5 RQ 5 Does the number of times the borrower has been 30+ days late provide an indication of whether they will default?

5.1 SQ 5.1 For this research question, what are

  1. The variables?

The variables are:

  • Defaulted: whether the borrower defaulted.

  • delinq_2yrs: the number of times the borrower had been 30+ days past due on a payment in the past 2 years.

  2. Variable types?

  • Defaulted is a categorical variable with 2 levels, 0 and 1.

  • delinq_2yrs is a quantitative, discrete variable: a count of the times the borrower had been 30+ days past due on a payment in the past 2 years.

  3. Variable roles?

  • Defaulted is the response (dependent) variable.

  • delinq_2yrs is the explanatory (independent) variable.

  4. Appropriate type of chart?

Here, we are dealing with one quantitative and one categorical variable. We can use a box plot to compare the distribution of delinquency counts for defaulters and non-defaulters.

  5. Appropriate summary statistic(s)?

In this case, I would compute the mean and median number of times that defaulters and non-defaulters were 30+ days past due on a payment in the past 2 years.

5.2 SQ 5.2 Chart

See Figure 6 below.

loans |>
    mutate(defaulted = factor(defaulted, 
                              labels = c("No", "Yes"))) |>
    ggplot(mapping = aes(x = defaulted, 
                         y = delinq_2yrs,
                         fill = defaulted)) + 
    labs(x = "Defaulted?", 
         y = "Counts of Delinquency in Last 2 Years",
         title = "Defaults vs Number of Past Delinquencies",
         subtitle = "Non-defaulters average slightly fewer delinquencies, although there are outliers") +
    geom_boxplot(show.legend = FALSE) +
    # log1p keeps counts of zero on the axis; log10(0) is -Inf and would be dropped.
    scale_y_continuous(trans = "log1p") +
    scale_fill_tableau()

Defaults vs Number of Past Delinquencies
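Counts of zero need care on a logarithmic axis, and most borrowers here have no delinquencies. The sketch below uses a made-up vector `delinq` (not the loans data) to show how log10 and log1p treat zeros. ggplot2 drops non-finite values with a warning, so a log10 axis cannot display the zero counts.

```r
# Hypothetical delinquency counts; most borrowers have none.
delinq <- c(0, 0, 0, 1, 0, 2, 0, 5)

# Zeros become -Inf under log10, so they cannot be placed on the axis.
print(log10(delinq))

# log1p(x) is log(1 + x): zeros stay at 0 and larger counts are compressed.
print(log1p(delinq))

# Share of observations that a log10 axis would be unable to show.
share_dropped <- mean(is.infinite(log10(delinq)))
print(share_dropped)
```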

5.3 SQ 5.3 Table

loans |>
    mutate(defaulted = factor(defaulted, 
                              labels = c("Non-defaulters", "Defaulters"))) |>
    group_by(defaulted) |>
    summarise(Mean_times = mean(delinq_2yrs),
              Median_times = median(delinq_2yrs)) |>
    kbl(booktabs = TRUE, caption = "Number of Loan Delinquencies, Defaulters vs Non-defaulters") |>
    kable_classic(full_width = FALSE, 
                  latex_options = 'hold_position') |>
    footnote(general = "Non-defaulters have a marginally lower average number of delinquencies")

Number of Loan Delinquencies, Defaulters vs Non-defaulters

defaulted        Mean_times   Median_times
Non-defaulters   0.162        0
Defaulters       0.175        0

Note: Non-defaulters have a marginally lower average number of delinquencies

5.4 SQ 5.4 Text Summary

The difference in delinquency history between defaulters and non-defaulters is marginal. The median number of delinquencies in the past 2 years is zero in both groups. Non-defaulters do show a marginally lower mean: 0.162 delinquencies, compared with 0.175 for defaulters.
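Because both medians are zero and the means are close, it can also help to look at the relationship from the other direction: the default rate at each delinquency count. This is a minimal base R sketch under stated assumptions. The data frame `toy` is invented; its column names mirror the loans data, but its values do not come from it.

```r
# Toy data (hypothetical): 1 = defaulted, 0 = repaid.
toy <- data.frame(
  delinq_2yrs = c(0, 0, 0, 0, 1, 1, 1, 2, 2, 0),
  defaulted   = c(0, 0, 1, 0, 0, 1, 0, 1, 0, 0)
)

# The mean of a 0/1 indicator within each group is that group's default rate.
default_rate <- tapply(toy$defaulted, toy$delinq_2yrs, mean)
print(round(default_rate, 3))
```

If the default rate rose steadily with the delinquency count in the real data, that would suggest the count carries some predictive information even though the group means are close.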

6 RQ 6 Does the purpose of a loan seem to affect the interest rate?

6.1 SQ 6.1 For this research question, what are:

  1. The variables?

The variables are:

  • Purpose (purpose).

  • Interest rate (int_rate).

  2. Variable types?

  • Purpose (purpose) is a categorical variable with 7 levels or categories.

  • Interest rate (int_rate) is a quantitative, continuous variable.

  3. Variable roles?

  • Interest rate is the response (dependent) variable.

  • Purpose is the explanatory (independent) variable.

  4. Appropriate type of chart?

Again, given that we have one categorical and one continuous variable, a box plot is well suited for visualization.

  5. Appropriate summary statistic(s)?

In this case, it would be appropriate to compute the mean and median interest rate for each loan purpose.

6.2 SQ 6.2 Table

loans |>
    mutate(purpose = factor(purpose)) |>
    group_by(purpose) |>
    summarise(mean_interest_rates = mean(int_rate),
              median_interest_rates = median(int_rate)) |>
    arrange(desc(mean_interest_rates)) |>
    kbl(booktabs = TRUE, 
        caption = "Interest Rates by Loan Purpose") |>
    kable_classic(full_width = FALSE, 
                  latex_options = 'hold_position') |>
    footnote(general = "Small business loans have the highest interest rates. Major purchase loans have the lowest")

Interest Rates by Loan Purpose

purpose              mean_interest_rates   median_interest_rates
small_business       0.138                 0.138
debt_consolidation   0.127                 0.128
educational          0.120                 0.122
credit_card          0.120                 0.119
home_improvement     0.118                 0.118
all_other            0.117                 0.118
major_purchase       0.114                 0.116

Note: Small business loans have the highest interest rates. Major purchase loans have the lowest

6.3 SQ 6.3 Text summary

The purpose of a loan has a notable relationship with its interest rate. Small business loans carry the highest rates (mean 13.8%). Major purchase loans carry the lowest (mean 11.4%), presumably because these purchases can serve as collateral.

7 RQ 7 Is this data set consistent with the conventional wisdom regarding interest rates and default risk?

The conventional wisdom is that lenders charge higher interest rates for higher-risk loans, i.e. those that are more likely to default.

“Figure 3: A chart visualizing the relationship between interest rate and default.”

7.1 SQ 7.1 Based on Figure 3, is this data set consistent with the conventional wisdom?

Why or why not? Answer as a text summary, both qualitative and quantitative, and include metadata

Qualitative: The figure is in line with conventional wisdom. There is a clear relationship between the interest rate charged on a loan and the rate of default. Specifically, loans with higher interest rates exhibit a higher rate of default.

Quantitative: In the figure, loans with an interest rate between 15.7% and 21.6% have a default rate of 26.9%. In contrast, loans with an interest rate between 6% and 8.6% have a default rate of 4.6%.
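A comparison like this can be computed by grouping loans into interest-rate bins and taking the default rate within each bin. The sketch below shows one way to do that in base R. The data frame `toy` and the bin edges in `breaks` are hypothetical, and the bins actually used in Figure 3 may differ.

```r
# Toy data (hypothetical values, not from the loans data).
toy <- data.frame(
  int_rate  = c(0.07, 0.08, 0.10, 0.11, 0.13, 0.14, 0.17, 0.19),
  defaulted = c(0,    0,    0,    1,    0,    1,    1,    1)
)

# Illustrative bin edges; the bins used in Figure 3 may differ.
breaks <- c(0.06, 0.09, 0.12, 0.15, 0.22)
toy$rate_bin <- cut(toy$int_rate, breaks = breaks, include.lowest = TRUE)

# Default rate (share of 1s) within each interest-rate bin.
default_by_bin <- tapply(toy$defaulted, toy$rate_bin, mean)
print(default_by_bin)
```

A default rate that rises from the lowest bin to the highest is the pattern that conventional wisdom predicts.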