ECON 465 – Stage 3 Final Report: Credit Risk and Loan Approval

Authors

Ece Kurtoğlu and Halil Rıfat Başbuğ

Economic Question

Do lenders rely primarily on creditworthiness signals (CIBIL score, income) or does a broader applicant profile — assets, education, employment type — independently predict loan approval? And what does this reveal about how credit risk is assessed in practice?

Access to credit is a fundamental mechanism in modern economies: it enables households to invest in housing, education, and consumption smoothing, and it allows firms to finance productive activity. The key question is not merely whether we can predict loan approval, but what information lenders actually use. If a simple credit score and income dominate, lending follows a narrow risk-based model. If broader profile variables add independent predictive power, lenders incorporate a richer assessment of borrower characteristics. Understanding this distinction matters for credit market efficiency and financial inclusion policy: applicants with limited credit histories may face systematic barriers regardless of their actual assets or earning potential.


Data

Source and Variables

The dataset is the Loan Approval Prediction dataset, obtained from Kaggle. It contains information about loan applicants including income, loan amount, loan term, credit score, number of dependents, employment status, education level, and asset values.

  • Source: Kaggle — Loan Approval Prediction dataset (loan_approval_dataset.csv)
  • Target variable: loan_status — binary: Approved / Rejected
  • Predictors: annual income, loan amount, loan term, CIBIL score, residential assets, commercial assets, luxury assets, bank assets, number of dependents, education level, employment type

Import and Cleaning

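library(tidyverse)   # read_csv(), dplyr, tidyr, ggplot2, stringr
library(tidymodels)  # rsample, parsnip, yardstick, tune, broom
library(janitor)     # clean_names()

loan_raw <- read_csv("loan_approval_dataset.csv")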
Rows: 4269 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (3): education, self_employed, loan_status
dbl (10): loan_id, no_of_dependents, income_annum, loan_amount, loan_term, c...

loan_clean <- loan_raw |>
  clean_names() |>
  drop_na() |>
  mutate(
    loan_status   = str_trim(loan_status),
    loan_approved = if_else(loan_status == "Approved", "approved", "rejected"),
    loan_approved = factor(loan_approved, levels = c("approved", "rejected"))
  )

glimpse(loan_clean)
Rows: 4,269
Columns: 14
$ loan_id                  <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14…
$ no_of_dependents         <dbl> 2, 0, 3, 3, 5, 0, 5, 2, 0, 5, 4, 2, 3, 2, 1, …
$ education                <chr> "Graduate", "Not Graduate", "Graduate", "Grad…
$ self_employed            <chr> "No", "Yes", "No", "No", "Yes", "Yes", "No", …
$ income_annum             <dbl> 9600000, 4100000, 9100000, 8200000, 9800000, …
$ loan_amount              <dbl> 29900000, 12200000, 29700000, 30700000, 24200…
$ loan_term                <dbl> 12, 8, 20, 8, 20, 10, 4, 20, 20, 10, 2, 18, 1…
$ cibil_score              <dbl> 778, 417, 506, 467, 382, 319, 678, 382, 782, …
$ residential_assets_value <dbl> 2400000, 2700000, 7100000, 18200000, 12400000…
$ commercial_assets_value  <dbl> 17600000, 2200000, 4500000, 3300000, 8200000,…
$ luxury_assets_value      <dbl> 22700000, 8800000, 33300000, 23300000, 294000…
$ bank_asset_value         <dbl> 8000000, 3300000, 12800000, 7900000, 5000000,…
$ loan_status              <chr> "Approved", "Rejected", "Rejected", "Rejected…
$ loan_approved            <fct> approved, rejected, rejected, rejected, rejec…
nrow(loan_clean)
[1] 4269

Variable names were standardised with clean_names(). Whitespace was removed from loan_status with str_trim() before creating the binary factor loan_approved. drop_na() removes rows with missing values; the row count is 4,269 both before and after cleaning, so no rows were dropped. The final dataset is tidy: each row is one loan application and each column is one variable.
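A minimal sketch of why the trimming step matters, using a made-up vector of status strings (the values are illustrative, not rows from the dataset). A label with stray whitespace fails the equality test and would silently be coded as rejected.

```r
# Hypothetical raw labels; the stray spaces are for illustration only.
raw_status <- c(" Approved", "Rejected ", "Approved")

# Without trimming, " Approved" does not equal "Approved".
untrimmed <- ifelse(raw_status == "Approved", "approved", "rejected")

# trimws() is the base-R counterpart of stringr::str_trim().
trimmed <- ifelse(trimws(raw_status) == "Approved", "approved", "rejected")

# Same level order as loan_approved in the report.
status_factor <- factor(trimmed, levels = c("approved", "rejected"))

table(untrimmed)
table(status_factor)
```

Comparing the two tables shows one approved application being miscoded when the whitespace is left in place.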


Probability Analysis

Summary Statistics

loan_clean |>
  mutate(approved_num = as.numeric(loan_approved == "approved")) |>
  summarise(
    N             = n(),
    Approved      = sum(approved_num),
    Rejected      = n() - sum(approved_num),
    Approval_Rate = round(mean(approved_num), 4),
    Mean_Income   = round(mean(income_annum), 0),
    Mean_CIBIL    = round(mean(cibil_score), 2),
    Mean_Loan     = round(mean(loan_amount), 0)
  )
# A tibble: 1 × 7
      N Approved Rejected Approval_Rate Mean_Income Mean_CIBIL Mean_Loan
  <int>    <dbl>    <dbl>         <dbl>       <dbl>      <dbl>     <dbl>
1  4269     2656     1613         0.622     5059124       600.  15133450

Class Distribution

loan_clean |>
  count(loan_approved) |>
  mutate(pct = n / sum(n)) |>
  ggplot(aes(x = loan_approved, y = pct, fill = loan_approved)) +
  geom_col(show.legend = FALSE, width = 0.5) +
  geom_text(aes(label = scales::percent(pct, accuracy = 0.1)),
            vjust = -0.5, size = 4.5) +
  scale_y_continuous(labels = scales::percent, limits = c(0, 0.75)) +
  scale_fill_manual(values = c("approved" = "steelblue", "rejected" = "tomato")) +
  labs(
    title = "Distribution of Loan Approval Status",
    x     = "Loan Status",
    y     = "Share of Applications"
  ) +
  theme_minimal()

The bar chart shows the proportion of approved versus rejected applications. The loan_approved variable follows a Bernoulli distribution with probability parameter equal to the approval rate (0.622). The classes are moderately imbalanced, roughly 62% approved to 38% rejected, so a classifier that always predicts the majority class would reach about 62.2% accuracy. This is the baseline that any useful model must clearly beat.
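The majority-class baseline implied by the class counts can be computed directly. The sketch below uses only the counts printed in the summary table; the variable names are illustrative.

```r
# Counts taken from the summary table above.
n_approved <- 2656
n_rejected <- 1613
n_total    <- n_approved + n_rejected

# Accuracy of a rule that always predicts the more frequent class.
baseline_accuracy <- max(n_approved, n_rejected) / n_total

round(baseline_accuracy, 4)
```

Any model accuracy reported later is best read against this figure, not against 50%.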

CIBIL Score Distribution by Approval Status

loan_clean |>
  ggplot(aes(x = cibil_score, fill = loan_approved)) +
  geom_histogram(bins = 30, alpha = 0.75, position = "identity") +
  scale_fill_manual(values = c("approved" = "steelblue", "rejected" = "tomato")) +
  labs(
    title = "CIBIL Score Distribution by Loan Approval Status",
    x     = "CIBIL Score",
    y     = "Count",
    fill  = "Status"
  ) +
  theme_minimal()

The CIBIL score histogram reveals near-complete separation between approved and rejected applicants: rejected applications cluster at low CIBIL scores while approved applications cluster at high scores. This is the single strongest visual signal in the dataset and directly motivates including CIBIL score in both models.

Continuous Predictors by Approval Status

loan_clean |>
  select(loan_approved, income_annum, loan_amount, loan_term,
         residential_assets_value, commercial_assets_value,
         luxury_assets_value, bank_asset_value) |>
  pivot_longer(-loan_approved, names_to = "variable", values_to = "value") |>
  ggplot(aes(x = loan_approved, y = value, fill = loan_approved)) +
  geom_boxplot(show.legend = FALSE) +
  scale_fill_manual(values = c("approved" = "steelblue", "rejected" = "tomato")) +
  facet_wrap(~variable, scales = "free_y") +
  labs(
    title = "Continuous Predictors by Loan Approval Status",
    x     = "Loan Status",
    y     = "Value"
  ) +
  theme_minimal()

Approved applicants have higher median income and higher asset values across all categories. The differences are visible but with more overlap than the CIBIL score, suggesting these variables carry additional predictive signal — but secondary to credit score.

Categorical Predictors — Approval Rate by Group

loan_clean |>
  select(loan_approved, education, self_employed) |>
  pivot_longer(-loan_approved, names_to = "variable", values_to = "category") |>
  ggplot(aes(x = category, fill = loan_approved)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  scale_fill_manual(values = c("approved" = "steelblue", "rejected" = "tomato")) +
  facet_wrap(~variable, scales = "free_x") +
  labs(
    title = "Loan Approval Rate by Categorical Predictors",
    x     = NULL,
    y     = "Share of Applications",
    fill  = "Status"
  ) +
  theme_minimal()

Education level and employment type show small differences in approval rates between groups, suggesting these variables may add limited independent predictive power once financial indicators are controlled for.


Modeling

Data Splitting

set.seed(465)

loan_split <- initial_split(loan_clean, prop = 0.80, strata = loan_approved)
loan_train <- training(loan_split)
loan_test  <- testing(loan_split)

cat("Training set:", nrow(loan_train), "observations\n")
Training set: 3414 observations
cat("Test set:    ", nrow(loan_test),  "observations\n")
Test set:     855 observations
cat("\nClass balance — training set:\n")

Class balance — training set:
print(table(loan_train$loan_approved))

approved rejected 
    2124     1290 
cat("\nClass balance — test set:\n")

Class balance — test set:
print(table(loan_test$loan_approved))

approved rejected 
     532      323 

The dataset was split 80/20. strata = loan_approved ensures the proportion of approved and rejected applications is preserved in both sets. set.seed(465) makes the split reproducible.
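As a quick check that stratification preserved the class proportions, the shares can be recomputed from the two printed tables. This sketch copies the counts by hand and does not call rsample.

```r
# Class counts copied from the printed tables above.
train_counts <- c(approved = 2124, rejected = 1290)
test_counts  <- c(approved = 532,  rejected = 323)

train_share <- train_counts / sum(train_counts)
test_share  <- test_counts / sum(test_counts)

# One row per set, one column per class.
round(rbind(train = train_share, test = test_share), 4)
```

The approved share is close to 0.622 in both sets, matching the approval rate in the full data.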

Model Specification

log_spec <- logistic_reg() |>
  set_engine("glm") |>
  set_mode("classification")

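Logistic regression is appropriate because the target variable is binary. It models the log-odds of the outcome as a linear function of the predictors and produces interpretable coefficients in the form of log-odds (and odds ratios after exponentiation). With the glm engine the modelled outcome is the second factor level, so given levels = c("approved", "rejected") the coefficients reported below describe the log-odds of rejection, not of approval.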
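Which outcome a logistic regression models depends on the factor level order, and this decides how every coefficient sign is read. The sketch below uses simulated data (not the loan dataset) and base-R glm() to show that, with levels c("approved", "rejected"), the fitted coefficients describe the log-odds of "rejected". The variable names and the simulated relationship are assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed for the simulated example

# Simulated data: higher scores make approval more likely.
score    <- runif(500, min = 300, max = 900)
approved <- rbinom(500, size = 1, prob = plogis((score - 600) / 50))

# Same level order as loan_approved in the report.
status <- factor(ifelse(approved == 1, "approved", "rejected"),
                 levels = c("approved", "rejected"))

# glm() treats the first level as "failure" and models the second ("rejected").
fit_rejected <- glm(status ~ score, family = binomial)

# Reversing the level order makes "approved" the modelled outcome.
status_rev   <- relevel(status, ref = "rejected")
fit_approved <- glm(status_rev ~ score, family = binomial)

coef(fit_rejected)[["score"]]  # negative: higher score, lower odds of rejection
coef(fit_approved)[["score"]]  # same magnitude, opposite sign
```

Reordering the levels leaves the fit unchanged and only flips the sign of each coefficient, so either ordering is valid as long as the interpretation names the right outcome.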

Model 1: Core Financial Indicators

loan_model_1 <- log_spec |>
  fit(
    loan_approved ~ income_annum + loan_amount + loan_term + cibil_score,
    data = loan_train
  )

tidy(loan_model_1)
# A tibble: 5 × 5
  term             estimate    std.error statistic   p.value
  <chr>               <dbl>        <dbl>     <dbl>     <dbl>
1 (Intercept)  11.1         0.456            24.2  1.53e-129
2 income_annum  0.000000438 0.0000000640      6.84 7.79e- 12
3 loan_amount  -0.000000139 0.0000000197     -7.04 1.86e- 12
4 loan_term     0.148       0.0125           11.8  3.93e- 32
5 cibil_score  -0.0242      0.000905        -26.7  3.08e-157

Rationale: This baseline model uses only the core financial indicators traditionally applied in credit scoring. cibil_score is the most direct measure of creditworthiness. income_annum captures repayment capacity. loan_amount captures lender exposure. loan_term reflects repayment horizon and uncertainty. This model tests whether standard financial metrics are sufficient for predicting approval decisions.

Model 2: Full Applicant Profile

loan_model_2 <- log_spec |>
  fit(
    loan_approved ~ no_of_dependents + education + self_employed +
      income_annum + loan_amount + loan_term + cibil_score +
      residential_assets_value + commercial_assets_value +
      luxury_assets_value + bank_asset_value,
    data = loan_train
  )

tidy(loan_model_2)
# A tibble: 12 × 5
   term                     estimate    std.error statistic   p.value
   <chr>                       <dbl>        <dbl>     <dbl>     <dbl>
 1 (Intercept)               1.11e+1 0.479          23.1    2.88e-118
 2 no_of_dependents          1.00e-2 0.0389          0.258  7.97e-  1
 3 educationNot Graduate     5.68e-2 0.131           0.433  6.65e-  1
 4 self_employedYes         -7.37e-2 0.131          -0.564  5.73e-  1
 5 income_annum              6.26e-7 0.000000101     6.19   6.16e- 10
 6 loan_amount              -1.40e-7 0.0000000198   -7.09   1.37e- 12
 7 loan_term                 1.51e-1 0.0127         11.9    1.35e- 32
 8 cibil_score              -2.43e-2 0.000912      -26.7    1.46e-156
 9 residential_assets_value -1.08e-9 0.0000000132   -0.0815 9.35e-  1
10 commercial_assets_value  -4.10e-9 0.0000000189   -0.216  8.29e-  1
11 luxury_assets_value      -3.57e-8 0.0000000193   -1.85   6.43e-  2
12 bank_asset_value         -6.97e-8 0.0000000371   -1.88   6.04e-  2

Rationale: This extended model adds asset holdings, employment type, education, and number of dependents. Asset variables capture collateral — the lender can claim these in default, reducing credit risk. self_employed captures income uncertainty. education proxies for long-term income stability. no_of_dependents reflects financial obligations that reduce disposable income. This model tests whether a richer applicant profile improves predictions beyond core financial metrics.


Results

Predictions and Evaluation Metrics

loan_pred_1 <- predict(loan_model_1, loan_test) |> bind_cols(loan_test)

loan_acc_1  <- loan_pred_1 |> accuracy(truth  = loan_approved, estimate = .pred_class)
loan_prec_1 <- loan_pred_1 |> precision(truth = loan_approved, estimate = .pred_class)
loan_rec_1  <- loan_pred_1 |> recall(truth    = loan_approved, estimate = .pred_class)
loan_pred_2 <- predict(loan_model_2, loan_test) |> bind_cols(loan_test)

loan_acc_2  <- loan_pred_2 |> accuracy(truth  = loan_approved, estimate = .pred_class)
loan_prec_2 <- loan_pred_2 |> precision(truth = loan_approved, estimate = .pred_class)
loan_rec_2  <- loan_pred_2 |> recall(truth    = loan_approved, estimate = .pred_class)

Model Comparison Table

bind_rows(
  bind_rows(loan_acc_1, loan_prec_1, loan_rec_1) |>
    mutate(Model = "Model 1: Core Financial (4 vars)"),
  bind_rows(loan_acc_2, loan_prec_2, loan_rec_2) |>
    mutate(Model = "Model 2: Full Profile (11 vars)")
) |>
  select(Model, Metric = .metric, Estimate = .estimate) |>
  mutate(Estimate = round(Estimate, 4))
# A tibble: 6 × 3
  Model                            Metric    Estimate
  <chr>                            <chr>        <dbl>
1 Model 1: Core Financial (4 vars) accuracy     0.926
2 Model 1: Core Financial (4 vars) precision    0.954
3 Model 1: Core Financial (4 vars) recall       0.927
4 Model 2: Full Profile (11 vars)  accuracy     0.924
5 Model 2: Full Profile (11 vars)  precision    0.953
6 Model 2: Full Profile (11 vars)  recall       0.923

Accuracy measures the overall share of correct predictions. Precision measures how many applications predicted as approved were actually approved — low precision means approving risky borrowers, which is costly for lenders. Recall measures how many genuinely approved applications were correctly identified — low recall means incorrectly rejecting creditworthy applicants, reducing credit market efficiency.
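The three definitions can be written out from a confusion matrix. The counts below are hypothetical and chosen only to illustrate the formulas; they are not the report's test-set results. "approved" is treated as the positive class, which matches yardstick's default of using the first factor level as the event.

```r
# Hypothetical confusion-matrix counts, for illustration only.
tp <- 90  # predicted approved, actually approved
fp <- 5   # predicted approved, actually rejected
fn <- 10  # predicted rejected, actually approved
tn <- 45  # predicted rejected, actually rejected

accuracy_val  <- (tp + tn) / (tp + fp + fn + tn)  # share of all predictions that are correct
precision_val <- tp / (tp + fp)                   # of predicted approvals, share truly approved
recall_val    <- tp / (tp + fn)                   # of true approvals, share predicted approved

round(c(accuracy = accuracy_val, precision = precision_val, recall = recall_val), 3)
```

In this toy example precision exceeds recall because false approvals (5) are rarer than missed approvals (10).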

Model Selection: Why Model 1?

Model 1 is selected as the final model. On the test set it is marginally ahead of Model 2 on all three metrics (accuracy 0.926 vs 0.924, precision 0.954 vs 0.953, recall 0.927 vs 0.923) despite using only 4 predictors instead of 11.

The logic is direct: if the 7 additional variables in Model 2 (the four asset values, employment type, education, and number of dependents) carried substantial independent predictive information about loan approval, Model 2 should outperform Model 1 on the held-out test set. It does not. The gaps are only a few thousandths, which on 855 test observations corresponds to a handful of applications, so the two models are best described as performing equally well. The coefficient table points the same way: none of the seven added terms is significant at the 5% level, with luxury_assets_value and bank_asset_value marginal at p ≈ 0.06. The added variables therefore contribute little beyond what CIBIL score, income, loan amount, and loan term already capture, while increasing model complexity.

This choice is also supported by the principle of parsimony (Occam’s Razor): when two models achieve similar predictive performance, the simpler model is preferred because it is easier to interpret, less prone to overfitting, and requires fewer data inputs. Model 1 satisfies all three conditions — it matches Model 2 on every evaluation metric while using less than half the number of predictors.

Model 1 is also more interpretable and more practical: a lender using four variables can explain every decision directly in terms of creditworthiness, repayment capacity, loan size, and loan term — which matters for regulatory compliance and for communicating decisions to applicants.
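To give the accuracy gap between the two models a sense of scale, the sketch below compares it with an approximate binomial standard error for a single accuracy estimate. It uses the rounded test-set values from the comparison table and ignores that both models are scored on the same observations, so it is a rough yardstick and not a formal test.

```r
n_test <- 855    # test-set size reported above
acc_m1 <- 0.926  # Model 1 test accuracy (rounded, from the table)
acc_m2 <- 0.924  # Model 2 test accuracy (rounded, from the table)

# Approximate binomial standard error of one accuracy estimate.
se_acc <- sqrt(acc_m1 * (1 - acc_m1) / n_test)

gap          <- acc_m1 - acc_m2
gap_in_cases <- gap * n_test  # roughly how many test applications the gap represents

round(c(se = se_acc, gap = gap, gap_in_cases = gap_in_cases), 4)
```

The gap is well under one standard error and amounts to about two applications, which supports choosing between the models on parsimony and not on test-set performance.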

Cross-Validation

set.seed(465)

loan_folds <- vfold_cv(loan_train, v = 5, strata = loan_approved)

loan_cv <- fit_resamples(
  log_spec,
  loan_approved ~ income_annum + loan_amount + loan_term + cibil_score,
  resamples = loan_folds,
  metrics   = metric_set(accuracy, precision, recall)
)

loan_cv_results <- collect_metrics(loan_cv)
loan_cv_results
# A tibble: 3 × 6
  .metric   .estimator  mean     n std_err .config        
  <chr>     <chr>      <dbl> <int>   <dbl> <chr>          
1 accuracy  binary     0.914     5 0.00542 pre0_mod0_post0
2 precision binary     0.927     5 0.00448 pre0_mod0_post0
3 recall    binary     0.935     5 0.00610 pre0_mod0_post0
cv_acc  <- loan_cv_results |> filter(.metric == "accuracy")  |> pull(mean)
cv_prec <- loan_cv_results |> filter(.metric == "precision") |> pull(mean)
cv_rec  <- loan_cv_results |> filter(.metric == "recall")    |> pull(mean)

test_acc  <- loan_acc_1  |> pull(.estimate)
test_prec <- loan_prec_1 |> pull(.estimate)
test_rec  <- loan_rec_1  |> pull(.estimate)

tibble(
  Metric     = c("Accuracy", "Precision", "Recall"),
  `CV Mean`  = round(c(cv_acc, cv_prec, cv_rec), 4),
  `Test Set` = round(c(test_acc, test_prec, test_rec), 4),
  Difference = round(c(test_acc - cv_acc, test_prec - cv_prec, test_rec - cv_rec), 4)
)
# A tibble: 3 × 4
  Metric    `CV Mean` `Test Set` Difference
  <chr>         <dbl>      <dbl>      <dbl>
1 Accuracy      0.914      0.926     0.0127
2 Precision     0.927      0.954     0.0268
3 Recall        0.935      0.927    -0.0083

The 5-fold cross-validation means are close to the test set results on all three metrics: the largest gap is 0.027 (precision), and the gaps for accuracy and recall are about 0.01. Test accuracy and precision are slightly higher than the cross-validated means, so there is no sign that Model 1 overfits the training data. The small standard errors across folds (0.004 to 0.006) indicate that performance does not depend on any particular subset of the training data.
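One way to judge whether the test-set values are in line with the cross-validation results is to express each gap in units of the fold-to-fold standard deviation. This sketch assumes that std_err in the collect_metrics() output is the fold standard deviation divided by the square root of the number of folds, and it uses the rounded values printed above.

```r
# Values copied from the cross-validation and comparison tables above.
cv_mean    <- c(accuracy = 0.914,   precision = 0.927,   recall = 0.935)
cv_std_err <- c(accuracy = 0.00542, precision = 0.00448, recall = 0.00610)
test_value <- c(accuracy = 0.926,   precision = 0.954,   recall = 0.927)
n_folds    <- 5

# Assumption: std_err = sd across folds / sqrt(n_folds).
fold_sd <- cv_std_err * sqrt(n_folds)

# Test-minus-CV gap expressed in fold standard deviations.
gap_in_sd <- (test_value - cv_mean) / fold_sd

round(rbind(fold_sd, gap_in_sd), 3)
```

Under this assumption, accuracy and recall fall within about one fold standard deviation of the cross-validated mean, while precision on the test set is somewhat higher than a typical fold.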


Economic Interpretation

Answer to the Economic Question

The model results confirm that loan approval can be predicted with high accuracy using only four core financial indicators: CIBIL score, annual income, loan amount, and loan term. Adding a richer applicant profile — assets, employment type, education, number of dependents — does not improve predictive performance. This tells us that lenders in this dataset operate a narrow, credit-score-dominated model rather than a holistic applicant assessment.

Coefficient Interpretation

loan_coefs <- tidy(loan_model_1) |>
  filter(term != "(Intercept)") |>
  mutate(
    odds_ratio = round(exp(estimate), 4),
    estimate   = round(estimate, 4),
    std.error  = round(std.error, 4),
    statistic  = round(statistic, 3),
    stars = case_when(
      p.value < 0.001 ~ "***",
      p.value < 0.01  ~ "**",
      p.value < 0.05  ~ "*",
      p.value < 0.1   ~ ".",
      TRUE            ~ ""
    ),
    p.value = round(p.value, 4)
  ) |>
  select(term, estimate, std.error, statistic, p.value, stars, odds_ratio) |>
  arrange(p.value)

loan_coefs
# A tibble: 4 × 7
  term         estimate std.error statistic p.value stars odds_ratio
  <chr>           <dbl>     <dbl>     <dbl>   <dbl> <chr>      <dbl>
1 income_annum   0         0           6.84       0 ***        1    
2 loan_amount    0         0          -7.04       0 ***        1    
3 loan_term      0.148     0.0125     11.8        0 ***        1.16 
4 cibil_score   -0.0242    0.0009    -26.7        0 ***        0.976
cat("\nSignificance codes:  *** p<0.001   ** p<0.01   * p<0.05   . p<0.1\n")

Significance codes:  *** p<0.001   ** p<0.01   * p<0.05   . p<0.1

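In logistic regression, each coefficient is the change in the log-odds of the modelled outcome for a one-unit increase in that predictor, holding all other variables constant. Exponentiating gives the odds ratio, the multiplicative change in those odds. One detail determines how the signs are read: loan_approved has levels c("approved", "rejected"), and the glm engine treats the first level as the reference and models the probability of the second. The coefficients and odds ratios in the table therefore describe the odds of rejection; the corresponding odds ratio for approval is the reciprocal.

cibil_score (***): The coefficient is -0.0242, giving an odds ratio of 0.9761. Each additional CIBIL point multiplies the odds of rejection by 0.9761, or equivalently multiplies the odds of approval by 1/0.9761 ≈ 1.0245, holding all other variables constant. Over a 100-point difference this compounds to an approval odds ratio of exp(2.42) ≈ 11.2. This is the dominant predictor in the model: its test statistic (z = -26.7) is by far the largest in absolute value, and the near-complete separation visible in the histogram is now confirmed statistically. Creditworthiness, as summarised by past repayment behaviour, is the primary gateway to loan approval in this dataset.

income_annum (***): The table shows a coefficient of 0 and an odds ratio of 1 only because the estimate is rounded to four decimals; the unrounded coefficient from tidy(loan_model_1) is 4.38e-7 per unit of annual income. Per additional 1,000,000 of income this is an odds ratio of exp(0.438) ≈ 1.55 for rejection (about 0.65 for approval), holding loan amount, loan term, and CIBIL score fixed. The sign is the opposite of what a simple repayment-capacity story predicts. It is a conditional effect that compares applicants requesting the same loan amount, and it may partly reflect the relationship between income and requested loan size rather than a penalty on income itself.

loan_amount (***): The displayed 0 is again a rounding artefact; the unrounded coefficient is -1.39e-7 per unit of loan amount. Per additional 1,000,000 of loan amount this is an odds ratio of exp(-0.139) ≈ 0.87 for rejection (about 1.15 for approval). Holding income fixed, larger requested loans are associated with lower rejection odds in this sample. This runs against the lender-exposure argument and should be read as a conditional association, not as evidence that larger loans carry less credit risk.

loan_term (***): The coefficient is 0.1481, giving an odds ratio of 1.1596. Each additional unit of loan term multiplies the odds of rejection by about 1.16 (the odds of approval by about 0.86). This is consistent with the repayment-horizon argument: a longer term increases uncertainty about the borrower’s future financial position, and approval becomes less likely.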
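Because the coefficient table rounds the income and loan-amount estimates to 0, it helps to recompute the odds ratios from the unrounded estimates and on a more readable scale. The sketch below copies the estimates from the earlier tidy(loan_model_1) output. Reporting the two monetary variables per 1,000,000 units is an assumption of this sketch, not something the report's code does. The odds ratios are for rejection, since glm models the second factor level, and the reciprocal gives the approval odds ratio.

```r
# Unrounded estimates copied from tidy(loan_model_1) earlier in the report.
estimates <- c(income_annum = 0.000000438,
               loan_amount  = -0.000000139,
               loan_term    = 0.148,
               cibil_score  = -0.0242)

# Illustrative rescaling: monetary variables per 1,000,000 units, others per 1 unit.
unit_size <- c(income_annum = 1e6, loan_amount = 1e6, loan_term = 1, cibil_score = 1)

log_odds_per_unit <- estimates * unit_size
or_rejected <- exp(log_odds_per_unit)   # outcome modelled by glm: "rejected"
or_approved <- exp(-log_odds_per_unit)  # reciprocal, for "approved"

round(cbind(or_rejected, or_approved), 4)
```

An alternative with the same effect would be to divide income_annum and loan_amount by 1e6 before fitting, so that the coefficient table itself is readable without rescaling afterwards.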

Policy Implications

If CIBIL score is the dominant predictor of approval, then applicants with limited credit histories face systematic barriers to credit access regardless of their actual assets or income. Policymakers could consider:

  1. Alternative credit scoring incorporating utility payments, rental history, or mobile money transactions as proxies for creditworthiness for those without formal credit records.
  2. Credit-building programs providing small, guaranteed loans to help applicants establish CIBIL histories before they need larger facilities.
  3. Collateral-based lending frameworks that give more weight to asset holdings for applicants who have assets but not credit histories.

Limitations and Reproducibility

Limitations

Limitation 1 — External validity: The dataset is from Kaggle and represents one lender’s decisions in one market context. The dominance of CIBIL score may not generalise to lenders with different risk appetites, regulatory environments, or customer bases. Any policy recommendation should be validated with data from the target market.

Limitation 2 — Observational data and omitted variable bias: The models identify correlations, not causal relationships. The CIBIL score itself aggregates many aspects of credit history, so its coefficient captures a composite effect rather than a single mechanism. Unobserved variables — loan purpose, applicant-lender relationship, regional economic conditions — may confound the estimates. Causal claims would require additional identification strategies.

Reproducibility

All random processes use set.seed(465), producing identical results on every render. All file paths are relative — loan_approval_dataset.csv must be in the same directory as the .qmd file. The analysis uses standard CRAN packages (tidyverse, tidymodels, janitor, skimr). The document renders with quarto::quarto_render("ECON465_Stage3_FinalReport.qmd") or the RStudio Render button.


AI Use Log

Interaction 1 — Stage 2: Cross-Validation Comparison Table

Prompt given: “I am working on a tidymodels project in R. I ran 5-fold cross-validation using fit_resamples() and collected metrics with collect_metrics(). The output has columns .metric and mean. I also have test set metrics in a tibble with columns .metric and .estimate. I want a single side-by-side table showing, for each metric, the CV mean and the test set value in the same row. How do I do this?”

How the output was used: The filter() and pull() approach suggested by the AI was adopted, as it was cleaner than pivot_wider(). The code was adapted for accuracy, precision, and recall with actual object names from the workflow.

Verification: Output was checked by manually inspecting individual metric values before writing interpretations.

Interaction 2 — Stage 3: Economic Interpretation Framing

Prompt given: “In my logistic regression model predicting loan approval, Model 1 with only 4 financial variables performs as well as Model 2 with 11 variables. How should I interpret this economically — what does it mean that adding assets and employment type did not improve predictions?”

How the output was used: The framing around a “narrow, credit-score-dominated model” and the financial inclusion implication was incorporated into the economic interpretation section. Policy implications were developed independently beyond what the AI suggested.

Verification: The interpretation was cross-checked against the coefficient estimates to confirm consistency with the actual results. The AI generated no statistical output — only conceptual framing.


Final Reflections

Suggested Improvement

The most valuable improvement would be to test tree-based ensemble methods, such as a random forest or gradient boosting classifier, alongside logistic regression. The current analysis assumes that each predictor has a linear, additive effect on the log-odds. A random forest would capture nonlinear effects and interactions: for example, a high CIBIL score may have a larger effect on approval when combined with high income than when income is low, and this interaction is not captured by the additive specification used here. If the random forest achieves materially higher accuracy, this would suggest the approval decision is more complex than the additive model implies. If performance is similar, it would strengthen the conclusion that four financial indicators are genuinely sufficient.
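The interaction idea can be illustrated without a random forest by adding an explicit interaction term to a logistic regression. The sketch below uses simulated, standardised stand-ins for credit score and income with an assumed positive interaction; none of it comes from the loan dataset, and it swaps in a glm with an interaction term in place of a tree-based method.

```r
set.seed(465)  # seed reused from the report; the data below are simulated

n      <- 2000
credit <- rnorm(n)  # standardised stand-in for a credit score
income <- rnorm(n)  # standardised stand-in for income

# Assumed data-generating process with a positive credit-by-income interaction.
approved <- rbinom(n, size = 1,
                   prob = plogis(0.5 * credit + 0.5 * income + 1.0 * credit * income))

fit_additive    <- glm(approved ~ credit + income, family = binomial)
fit_interaction <- glm(approved ~ credit * income, family = binomial)

# Lower AIC indicates a better trade-off between fit and complexity.
AIC(fit_additive, fit_interaction)
coef(fit_interaction)[["credit:income"]]
```

When an interaction of this kind is present, the additive model fits noticeably worse. Running the same comparison on the real training data would be a low-cost first check before moving to an ensemble method.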

New Economic Question

“What is the causal effect of a one-unit increase in CIBIL score on the probability of loan approval, and does this effect differ across income groups?”

This moves beyond prediction to causal identification. Answering it would require a regression discontinuity design — exploiting discrete CIBIL score thresholds that lenders apply — or a quasi-experimental approach using changes in credit reporting regulations as natural experiments. If the effect is larger for low-income applicants, credit-building programs would provide the greatest marginal benefit to those most at risk of financial exclusion.


Conclusion

This project applied the complete data science pipeline to a real-world credit market dataset. The key findings are:

  1. Loan approval can be predicted with high accuracy using only four core financial indicators — CIBIL score, annual income, loan amount, and loan term.
  2. Adding a broader applicant profile (assets, employment type, education, number of dependents) does not improve predictive performance on the held-out test set.
  3. Lenders in this dataset operate a narrow, creditworthiness-centred model in which past repayment behaviour (CIBIL score) is the primary screening criterion.
  4. The model is stable across five-fold cross-validation, confirming these results are not specific to the training sample.

These findings have direct implications for financial inclusion policy. If credit access is primarily gated by credit score, applicants who lack credit histories face structural barriers regardless of their actual economic circumstances. Addressing this requires either changing the information lenders receive (alternative scoring) or changing the institutional framework within which lending decisions are made.