Credit Risk Analysis

Predictive Modelling with Logistic Regression on the German Credit Dataset

Author

Gousia Ain

Published

March 1, 2026

1 Motivation:

Banks face a fundamental challenge: how to maximize loan profits while minimizing default risk. Every loan approved to a “bad” customer results in losses, while every loan denied to a “good” customer means lost revenue. The two errors do not cost the same, and this asymmetry is what makes credit risk modelling both analytically challenging and genuinely consequential. It is not simply a classification problem; it is a question of how financial opportunity is allocated, and who bears the cost when the decision is wrong. A predictive model can therefore help reduce credit risk and improve decision-making. What motivates me about this problem is that behind every data point is a real person. Building accurate and fair models means fewer defaults for lenders, but also fewer people incorrectly denied credit they deserve. That combination of rigorous analysis and real-world impact is what drew me to this project.

2 Data provenance:

The German Credit dataset was originally compiled by Prof. Hans Hofmann of the University of Hamburg and donated to the UCI Machine Learning Repository in 1994. It is publicly available under a Creative Commons Attribution 4.0 (CC BY 4.0) licence, permitting free use and adaptation with appropriate credit.

An important note on cost asymmetry: the dataset’s original documentation specifies a cost matrix in which misclassifying a bad customer as good carries a penalty five times greater than the reverse error. This asymmetry underpins the business case for prioritising specificity (here, the share of bad customers correctly flagged) alongside overall accuracy in model evaluation.
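
To make the 5:1 cost matrix concrete, the sketch below works out the probability cut-off at which rejecting an applicant becomes cheaper than approving. It is a minimal illustration, not part of the original analysis: the object names (`cost`, `expected_cost`, `threshold`) are my own, and it assumes the model outputs the probability that an applicant is bad.

```r
# Cost matrix from the dataset documentation (rows = actual, columns = predicted).
# Approving a bad customer costs 5; rejecting a good customer costs 1.
cost <- matrix(c(0, 1,
                 5, 0),
               nrow = 2, byrow = TRUE,
               dimnames = list(actual = c("good", "bad"),
                               predicted = c("good", "bad")))

# Expected cost of each decision for an applicant with P(bad) = p
expected_cost <- function(p, cost) {
  c(approve = p * cost["bad", "good"],        # a loss occurs only if the applicant is bad
    reject  = (1 - p) * cost["good", "bad"])  # a loss occurs only if the applicant is good
}

# Rejecting is cheaper once p exceeds cost(reject good) / (cost(reject good) + cost(approve bad))
threshold <- cost["good", "bad"] / (cost["good", "bad"] + cost["bad", "good"])
threshold

expected_cost(0.10, cost)  # below the cut-off: approving is cheaper
expected_cost(0.30, cost)  # above the cut-off: rejecting is cheaper
```

Under these costs the break-even point is 1/6, well below the conventional 0.5 cut-off, which is why a cost-aware lender would flag far more applicants than a default classifier does.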

Hofmann, H. (1994). Statlog (German Credit Data) [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5NC77

3 Data description:

The dataset contains 1,000 loan applications described by 20 predictor variables — a mix of categorical and integer features — and one binary target variable classifying each applicant as a good or bad credit risk. No missing values are present in the data.

# Load required libraries
library(tidyverse)
library(knitr)

# Load the data
german_data <- read.table("german.data", header = FALSE)

# Display basic info
cat("Dataset Shape:", dim(german_data)[1], "rows,", dim(german_data)[2], "columns\n")

Dataset Shape: 1000 rows, 21 columns

write.csv(german_data, "german_credit.csv", row.names = FALSE)

# Assign meaningful column names
colnames(german_data) <- c(
  "checking_status", "duration", "credit_history", "purpose",
  "credit_amount", "savings", "employment", "installment_rate",
  "personal_status_sex", "other_debtors", "residence_since",
  "property", "age", "other_installment_plans", "housing",
  "num_credits", "job", "num_dependents", "telephone",
  "foreign_worker", "class"
)

# Display variable names and their meanings
variable_descriptions <- data.frame(
  Variable = colnames(german_data),
  Description = c(
    "Status of existing checking account",
    "Loan duration in months",
    "Credit history",
    "Purpose of loan",
    "Credit amount in Deutsche Marks",
    "Savings account/bonds",
    "Employment duration",
    "Installment rate (% of disposable income)",
    "Personal status & sex",
    "Other debtors/guarantors",
    "Residence since (years)",
    "Property",
    "Age in years",
    "Other installment plans",
    "Housing situation",
    "Number of existing credits",
    "Job type",
    "Number of dependents",
    "Telephone ownership",
    "Foreign worker status",
    "Credit class (good/bad)"
  )
)

kable(variable_descriptions, caption = "Variable Descriptions")

Variable Descriptions
Variable	Description
checking_status	Status of existing checking account
duration	Loan duration in months
credit_history	Credit history
purpose	Purpose of loan
credit_amount	Credit amount in Deutsche Marks
savings	Savings account/bonds
employment	Employment duration
installment_rate	Installment rate (% of disposable income)
personal_status_sex	Personal status & sex
other_debtors	Other debtors/guarantors
residence_since	Residence since (years)
property	Property
age	Age in years
other_installment_plans	Other installment plans
housing	Housing situation
num_credits	Number of existing credits
job	Job type
num_dependents	Number of dependents
telephone	Telephone ownership
foreign_worker	Foreign worker status
class	Credit class (good/bad)

4 Data cleaning:

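Before modelling, the default column names (V1 to V21) were replaced with meaningful names, and all categorical variables, which are stored as codes such as A11 or A14, were converted to factors. Making the factors explicit fixes each variable's reference level (the first code in sorted order), which is what every dummy-variable coefficient is later compared against. The target is stored as the integers 1 and 2 and has to be recoded, because glm() with a binomial family does not accept a numeric response outside the range 0 to 1.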

# Convert categorical variables to factors
categorical_vars <- c(
  "checking_status","credit_history","purpose","savings","employment",
  "personal_status_sex","other_debtors","property",
  "other_installment_plans","housing","job",
  "telephone","foreign_worker"
)

german_data[categorical_vars] <- lapply(german_data[categorical_vars], factor)

# Recode target variable
german_data$class <- factor(german_data$class, 
                           levels = c(1, 2),
                           labels = c("good", "bad"))
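
To illustrate what the factor conversion does downstream, here is a small sketch of the dummy coding R builds from a factor. The toy data frame and its values are made up for illustration; only the A11/A12/A14 codes come from the dataset.

```r
# Toy data using three of the checking_status codes (rows are illustrative only)
toy <- data.frame(
  checking_status = c("A11", "A12", "A14", "A11", "A14"),
  stringsAsFactors = FALSE
)
toy$checking_status <- factor(toy$checking_status)

# The first level in sorted order (A11) becomes the reference category
levels(toy$checking_status)

# The design matrix has an intercept plus one dummy per non-reference level
design <- model.matrix(~ checking_status, data = toy)
colnames(design)

# Changing the reference level changes what each coefficient is compared against
toy$checking_status <- relevel(toy$checking_status, ref = "A14")
design_relevel <- model.matrix(~ checking_status, data = toy)
colnames(design_relevel)
```

Every coefficient named like `checking_statusA14` is therefore a contrast against the omitted reference level, which matters when the odds ratios are interpreted later.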

5 EDA: Exploratory Data Analysis

5.1 Distribution of credit classes

# Class distribution
class_dist <- table(german_data$class)
class_prop <- prop.table(class_dist) * 100

# Plot
ggplot(german_data, aes(x = class, fill = class)) +
  geom_bar() +
  geom_text(stat = "count", aes(label = after_stat(count)), vjust = -0.5) +
  labs(title = "Distribution of Credit Classes",
       subtitle = paste0("Good: ", round(class_prop[1], 1), 
                        "% | Bad: ", round(class_prop[2], 1), "%"),
       x = "Credit Class", y = "Count") +
  theme_minimal() +
  scale_fill_manual(values = c("good" = "steelblue", "bad" = "coral"))

The barplot shows class imbalance, with 70% of observations classified as good credit and 30% as bad credit. This uneven distribution may cause the model to become biased toward the majority class and may affect our modeling strategy. Therefore, it is important to consider using techniques such as resampling to ensure the model can effectively learn from both classes.
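
As one example of the resampling idea, the sketch below downsamples the majority class in base R. It is a simplified illustration under my own assumptions: `class_toy` is a stand-in vector with the same 700/300 split as the data, the seed is arbitrary, and in practice resampling should be applied to training folds only.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Stand-in for german_data$class with the same 700/300 split
class_toy <- factor(rep(c("good", "bad"), times = c(700, 300)),
                    levels = c("good", "bad"))

# Keep every minority-class row and an equally sized random sample of the majority
n_min <- min(table(class_toy))
keep_idx <- unlist(lapply(split(seq_along(class_toy), class_toy),
                          function(idx) sample(idx, n_min)))

balanced <- class_toy[keep_idx]
table(balanced)
```

Downsampling discards 400 good customers here, so it trades information for balance; upsampling or class weights are alternatives with the opposite trade-off.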

5.2 Summary statistics by class:

library(tidyverse)
library(knitr)
library(kableExtra)

# Calculate summary statistics
summary_stats <- german_data %>%
  group_by(class) %>%
  summarise(
    avg_duration = mean(duration),
    sd_duration = sd(duration),
    avg_credit_amount = mean(credit_amount),
    sd_credit_amount = sd(credit_amount),
    avg_age = mean(age),
    sd_age = sd(age),
    avg_installment_rate = mean(installment_rate),
    sd_installment_rate = sd(installment_rate)
  )
# Create formatted summary table with explicit dplyr::select
summary_table <- summary_stats %>% 
  mutate(
    `Loan Duration (months)` = sprintf("%.1f ± %.1f", avg_duration, sd_duration),
    `Credit Amount (DM)` = sprintf("%.0f ± %.0f", avg_credit_amount, sd_credit_amount),
    `Age (years)` = sprintf("%.1f ± %.1f", avg_age, sd_age),
    `Installment Rate (%)` = sprintf("%.2f ± %.2f", avg_installment_rate, sd_installment_rate)
  ) %>%
  dplyr::select(Class = class, `Loan Duration (months)`, `Credit Amount (DM)`,
                `Age (years)`, `Installment Rate (%)`)

# Display with kableExtra
summary_table %>% 
  kable(
    caption = "**Table: Descriptive Statistics by Credit Class (Mean ± SD)**",
    align = c("l", "c", "c", "c", "c"),
    linesep = ""
  ) %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed", "responsive"),
    full_width = FALSE,
    position = "center",
    font_size = 13
  ) %>%
  column_spec(1, bold = TRUE, background = "skyblue") %>% 
  column_spec(2:5, width = "4cm") %>% 
  add_header_above(c(" " = 1, "Summary Statistics (Mean ± Standard Deviation)" = 4))

**Table: Descriptive Statistics by Credit Class (Mean ± SD)**
	Summary Statistics (Mean ± Standard Deviation)
Class	Loan Duration (months)	Credit Amount (DM)	Age (years)	Installment Rate (%)
good	19.2 ± 11.1	2985 ± 2401	36.2 ± 11.4	2.92 ± 1.13
bad	24.9 ± 13.3	3938 ± 3536	34.0 ± 11.2	3.10 ± 1.09

Loan Duration: Customers with bad credit take loans for about 24.9 months on average, compared to 19.2 months for good credit customers. This is a repayment period roughly 5.7 months longer, which suggests longer loan terms may be associated with higher default risk.

Credit Amount: The average borrowed amount for bad credit customers is 3938 DM, while for good credit customers it is 2985 DM. This means risky customers borrow approximately 950 DM more on average, indicating that larger loan sizes may increase the probability of default. The higher standard deviation (3536 vs 2401) also shows more variability among risky borrowers.

Age: Good credit customers are slightly older on average (36.2 years) compared to bad credit customers (34.0 years). The difference is only about 2 years, and since the standard deviations are similar, age may not be an influencing factor.

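Installment Rate: The mean installment rate is 3.10 for bad credit and 2.92 for good credit. The variable is recorded on a 1 to 4 scale rather than as a literal percentage, so these are averages of category codes, and the difference is very small. Given the similar spread in both groups, this variable appears to have limited discriminating power on its own.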

Overall Numerical Insight:

The most noticeable differences appear in loan duration and credit amount, where bad credit customers borrow more and repay over longer periods. Age and installment rate show only small mean differences, suggesting weaker influence on credit classification.
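
To put the four comparisons on a common scale, the sketch below converts the reported means and standard deviations into standardised mean differences (Cohen's d). This is an illustrative add-on rather than part of the original analysis; the inputs are the rounded values from the table, so the results are approximate.

```r
# Means and SDs copied from the summary table; group sizes are 700 good and 300 bad
stats <- data.frame(
  variable  = c("duration", "credit_amount", "age", "installment_rate"),
  mean_good = c(19.2, 2985, 36.2, 2.92),
  sd_good   = c(11.1, 2401, 11.4, 1.13),
  mean_bad  = c(24.9, 3938, 34.0, 3.10),
  sd_bad    = c(13.3, 3536, 11.2, 1.09)
)
n_good <- 700
n_bad  <- 300

# Pooled standard deviation, then the standardised difference (bad minus good)
pooled_sd <- sqrt(((n_good - 1) * stats$sd_good^2 + (n_bad - 1) * stats$sd_bad^2) /
                    (n_good + n_bad - 2))
stats$cohens_d <- (stats$mean_bad - stats$mean_good) / pooled_sd

# Largest absolute differences first
stats[order(-abs(stats$cohens_d)), c("variable", "cohens_d")]
```

Duration and credit amount come out with the largest standardised gaps, while age and installment rate are both small, which matches the ranking described above.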

#| message: false
#| warning: false
#| results: hide

# Load libraries in correct order (MASS last)
library(tidyverse)
library(MASS)

# Create long format
numeric_vars_long <- german_data %>%
  dplyr::select(duration, credit_amount, age, installment_rate, class) %>%
  tidyr::pivot_longer(
    cols = c(duration, credit_amount, age, installment_rate),
    names_to = "variable",
    values_to = "value"
  )

# Plot
ggplot(numeric_vars_long, aes(x = class, y = value, fill = class)) +
  geom_boxplot(alpha = 0.7) +
  facet_wrap(~ variable, scales = "free", ncol = 2) +
  labs(title = "Distribution of Numerical Variables by Credit Class",
       x = "Credit Class", y = "Value") +
  scale_fill_manual(values = c("good" = "steelblue", "bad" = "coral"))

Loan Size: It can be observed that customers classified as bad credit generally take higher loan amounts. The variability is also greater in this group, suggesting that larger borrowed amounts may be associated with increased repayment risk.

Repayment Period: The bad credit group tends to have longer repayment durations compared to the good credit group. This indicates that longer loan periods might contribute to a higher probability of default.

Installment Proportion: Both credit groups show very similar distributions for this variable. Therefore, it does not seem to play a major role in distinguishing between good and bad credit customers in this dataset.

5.3 Checking account status vs Credit risk

# Checking status vs credit risk
ggplot(german_data, aes(x = checking_status, fill = class)) +
  geom_bar(position = "fill") +
  labs(
       subtitle = "A11: <0 DM | A12: 0-200 DM | A13: >200 DM | A14: No account",
       x = "Checking Account Status", y = "Proportion") +
 
  scale_fill_manual(values = c("good" = "steelblue", "bad" = "coral")) +
  coord_flip()

Customers with a negative balance (<0 DM) show the highest proportion of bad credit, indicating a strong association between poor account status and higher default risk. As account balance improves (especially >200 DM), the proportion of good credit increases markedly, suggesting checking account status is a strong predictor of credit risk.

5.4 Credit history vs Credit risk

# Credit history analysis
ggplot(german_data, aes(x = credit_history, fill = class)) +
  geom_bar(position = "fill") +
  labs(title = "Credit History vs Credit Risk",
       x = "Credit History Category", y = "Proportion") +
 
  scale_fill_manual(values = c("good" = "steelblue", "bad" = "coral"))

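Borrowers with a critical credit history (A34) have the lowest default rate, while those with no credits taken or all credits paid back duly (A30) and those with all credits at this bank paid back duly (A31) show the highest proportion of bad credit outcomes. Delayed payments are coded A33, not A31, so this ordering is counterintuitive and worth keeping in mind when reading the model coefficients.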

5.5 Loan purpose vs Credit risk

# Purpose of loan
ggplot(german_data, aes(x = purpose, fill = class)) +
  geom_bar(position = "fill") +
  labs(
       x = "Purpose", y = "Proportion") +

  scale_fill_manual(values = c("good" = "steelblue", "bad" = "coral")) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Used car loans (A41) and retraining (A48) have the lowest default rates, while education (A46) and repairs (A45) show some of the highest proportions of bad credit outcomes, suggesting loan purpose is a meaningful signal of repayment risk.

6 Logistic Regression:

Logistic regression was chosen as it is well-suited for binary classification problems, interpretable, and widely used in credit risk modelling where understanding the direction and magnitude of each predictor’s effect is important.

Model assumption: Logistic regression assumes a linear relationship between predictors and the log-odds of the outcome. Diagnostic plots are examined below to verify this assumption holds.

6.1 Full model:

# Full model with all predictors
model_full <- glm(
  class ~ .,
  data = german_data,
  family = binomial
)

# Diagnostic plots - Full Model
par(mfrow = c(2,2))
plot(model_full, 
     main = "Full Model Diagnostics")

par(mfrow = c(1, 1))

Regression diagnostics were examined for the full model.

- The residuals vs fitted and scale-location plots display the characteristic butterfly pattern expected in logistic regression due to the binary outcome structure; this is not a violation of model assumptions.
- The Q-Q plot shows reasonable conformity along the diagonal with minor deviation in the upper tail, suggesting a small number of poorly fitted observations.
- The residuals vs leverage plot identifies observations 204, 736, and 819 as having relatively high leverage, warranting further investigation as potential influential points.

Overall, no assumption violations were detected that would fundamentally undermine the model’s validity.

6.2 Stepwise model:

# Stepwise model selection
library(MASS)

model_aic <- stepAIC(
  model_full,
  direction = "backward",
  trace = FALSE
)



# Diagnostic plots - Stepwise Model  
par(mfrow = c(2,2))
plot(model_aic,
     main = "AIC Model Diagnostics")

# Reset plot layout
par(mfrow = c(1,1))

Diagnostic plots for the AIC stepwise model closely mirror those of the full model, confirming that variable reduction did not introduce new assumption violations. The residuals vs fitted and scale-location plots display the expected logistic regression structure in both cases. Notably, the leverage plot reveals that observation 204 remains influential across both models — suggesting it represents a genuinely unusual case in the data — while observation 736, flagged in the full model, no longer appears after stepwise selection, indicating its leverage was tied to a removed predictor. Observation 158 emerges as a new leverage point in the AIC model and warrants further inspection alongside observation 204.

6.3 Compare models:

# Compare models
model_comparison <- data.frame(
  Model = c("Full Model", "Stepwise Model"),
  AIC = c(AIC(model_full), AIC(model_aic)),
  # length(coef()) counts estimated coefficients (intercept and dummies), not variables
  Coefficients = c(length(coef(model_full)), length(coef(model_aic)))
)

kable(model_comparison, caption = "Model Comparison")

Model Comparison
Model	AIC	Coefficients
Full Model	993.8178	49
Stepwise Model	982.4980	36

The stepwise AIC model removes six predictors (employment, residence_since, property, num_credits, job, and num_dependents), which together account for the 13 dropped coefficients and did not contribute meaningfully to model fit. The result is a leaner model with lower AIC (982.5 vs 993.8) and only a small increase in residual deviance (910.5 vs 895.8), indicating that the removed variables added complexity without materially improving fit. The remaining coefficients are largely consistent in direction and magnitude across both models, suggesting the stepwise selection was stable and the full model was not substantially distorted by the presence of irrelevant predictors. All further interpretation is based on the AIC-selected model, chosen for its parsimony while maintaining goodness of fit, reducing overfitting, and enhancing generalisability to unseen data.
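
The relationship between the reported deviances and AIC values can be checked by hand. The sketch below uses only the rounded figures quoted above; the likelihood-ratio test at the end is my own addition to the comparison, and the variable names are illustrative.

```r
# Figures reported above: residual deviance and number of estimated coefficients
dev_full <- 895.8; k_full <- 49
dev_step <- 910.5; k_step <- 36

# For a binary-response GLM, AIC = residual deviance + 2 * (number of coefficients)
aic_full <- dev_full + 2 * k_full
aic_step <- dev_step + 2 * k_step
c(full = aic_full, stepwise = aic_step)

# Likelihood-ratio test of the 13 dropped coefficients
lr_stat <- dev_step - dev_full
lr_df   <- k_full - k_step
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
c(statistic = lr_stat, df = lr_df, p = p_value)
```

The deviance rises by 14.7 while 13 coefficients are removed, and the test gives no evidence at the 5% level that the dropped terms improve fit, which is in line with the lower AIC of the reduced model.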

7 Model result and interpretation:

7.1 AIC-stepwise model coefficient plot

# Extract coefficients from AIC model
coef_df <- as.data.frame(summary(model_aic)$coefficients)
coef_df$variable <- rownames(coef_df)
coef_df <- coef_df[coef_df$variable != "(Intercept)", ]
coef_df <- coef_df[order(coef_df$Estimate), ]

# Keep only significant predictors (p < 0.05)
sig_coef <- coef_df[coef_df$`Pr(>|z|)` < 0.05, ]

# Plot coefficients with confidence intervals
ggplot(sig_coef,
       aes(x = reorder(variable, Estimate), y = Estimate)) +
  geom_point(size = 3, color = "steelblue") +
  geom_errorbar(
    aes(ymin = Estimate - 1.96 * `Std. Error`,
        ymax = Estimate + 1.96 * `Std. Error`),
    width = 0.2, color = "gray50"
  ) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  coord_flip() +
  labs(
    title = "Significant Predictors of Credit Risk",
    subtitle = "Positive values increase the log-odds of 'bad' credit",
    x = "Predictor Variables",
    y = "Coefficient Estimate (Log-odds)"
  )

The coefficient plot should be read with the outcome coding in mind: class has the levels good and bad, and glm() models the probability of the second level, so positive coefficients raise the log-odds of bad credit. Higher installment rates, longer loan durations, and larger credit amounts all have positive coefficients, meaning they are associated with higher default risk, which matches the EDA. The largest effects are negative and therefore protective relative to each variable's reference level: no checking account (A14, vs. a balance below 0 DM), used car purchase (A41, vs. new car), critical credit history (A34, vs. A30), not being a foreign worker (A202), and the savings levels A64 and A65 (vs. savings below 100 DM) all lower the odds of bad credit.
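
Because the sign convention is easy to get wrong, here is a small self-contained check of which outcome glm() models when the response is a factor. The toy data are made up so that the share of bad outcomes clearly rises with x; only the level order c("good", "bad") mirrors the report.

```r
# Toy data: the share of "bad" outcomes rises with x (2, 4, 6 and 8 out of 10)
toy <- data.frame(
  x = rep(1:4, each = 10),
  class = factor(
    unlist(lapply(c(2, 4, 6, 8),
                  function(b) rep(c("bad", "good"), times = c(b, 10 - b)))),
    levels = c("good", "bad")  # same level order as in the report
  )
)

fit <- glm(class ~ x, data = toy, family = binomial)

# A positive slope means the modelled event is the second level, "bad"
slope <- coef(fit)[["x"]]
slope

# Fitted probabilities are P(bad): low at x = 1, high at x = 4
p_bad <- predict(fit, newdata = data.frame(x = c(1, 4)), type = "response")
p_bad
```

With a factor response, R treats the first level as failure and all others as success, so the fitted values and coefficients here describe the probability of bad credit.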

7.2 Odds-Ratio plot

# Convert to odds ratios
sig_coef$OR <- exp(sig_coef$Estimate)
sig_coef$CI_lower <- exp(sig_coef$Estimate - 1.96 * sig_coef$`Std. Error`)
sig_coef$CI_upper <- exp(sig_coef$Estimate + 1.96 * sig_coef$`Std. Error`)

# Plot odds ratios
ggplot(sig_coef,
       aes(x = reorder(variable, OR), y = OR)) +
  geom_point(size = 3, color = "steelblue") +
  geom_errorbar(
    aes(ymin = CI_lower, ymax = CI_upper),
    width = 0.2, color = "gray50"
  ) +
  geom_hline(yintercept = 1, linetype = "dashed", color = "red") +
  coord_flip() +
  labs(
    title = "Odds Ratios: How Each Factor Affects Credit Risk",
    subtitle = "Values >1 increase the odds of bad credit | Values <1 decrease them",
    x = "Predictor Variables",
    y = "Odds Ratio (with 95% CI)"
  )

Only installment_rate shows a clear risk-increasing effect (OR ≈ 1.4 per step on its 1 to 4 scale), while duration and credit_amount sit just above 1.0 because their odds ratios are expressed per month and per Deutsche Mark. The strongest and most precisely estimated protective effects are checking_statusA14 (OR ≈ 0.18), purposeA41 (OR ≈ 0.19), and credit_historyA34 (OR ≈ 0.24), all with narrow confidence intervals. The remaining predictors cluster between 0.3 and 0.5, representing a moderately protective group, though foreign_workerA202 and savingsA64 carry wide intervals and should be interpreted cautiously.
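
Odds ratios are often misread as changes in probability, so the sketch below translates the approximate values quoted above into percentage changes in odds and into probabilities. The 30% baseline is purely illustrative (it is the overall share of bad customers, not the risk in any reference group), and the object names are my own.

```r
# Approximate odds ratios quoted in the text
odds_ratio <- c(checking_statusA14 = 0.18, purposeA41 = 0.19,
                credit_historyA34 = 0.24, installment_rate = 1.39)

# Percentage change in the odds of bad credit (vs reference level, or per unit)
pct_change <- (odds_ratio - 1) * 100
round(pct_change)

# An odds ratio acts on odds, not probabilities: effect on an illustrative 30% baseline
baseline_p <- 0.30
new_odds   <- baseline_p / (1 - baseline_p) * odds_ratio
new_p      <- new_odds / (1 + new_odds)
round(new_p, 3)
```

An odds ratio of 0.18 therefore means 82% lower odds, not an 82 percentage point drop in default probability; the implied probability depends on the baseline it is applied to.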

7.3 CV model

library(caret)

set.seed(123)
ctrl <- trainControl(
  method = "cv",
  number = 5,
  classProbs = TRUE,
  summaryFunction = twoClassSummary
)

cv_model <- train(
  class ~ .,
  data = german_data,
  method = "glm",
  family = binomial,
  trControl = ctrl,
  metric = "ROC"
)

print(cv_model)

Generalized Linear Model 

1000 samples
  20 predictor
   2 classes: 'good', 'bad' 

No pre-processing
Resampling: Cross-Validated (5 fold) 
Summary of sample sizes: 800, 800, 800, 800, 800 
Resampling results:

  ROC        Sens       Spec     
  0.7774762  0.8714286  0.4766667

Five-fold cross-validation of the full 20-predictor model yielded an AUC of 0.778, suggesting the model generalises reasonably well to unseen data. However, at the default 0.5 cut-off the results reveal a notable imbalance: sensitivity of 87% indicates the model is effective at identifying creditworthy customers, but specificity of only 48% means just over half of actual defaulters are misclassified as good.

This asymmetry stems from the class imbalance in the dataset (70% good, 30% bad) and has significant business implications — missed defaults represent direct financial losses. Future work should explore resampling techniques such as SMOTE or cost-sensitive learning to improve the model’s ability to detect bad credit customers without sacrificing too much sensitivity.

7.4 ROC curve comparison

library(pROC)

# Predictions
prob_aic <- predict(model_aic, german_data, type = "response")
prob_full <- predict(model_full, german_data, type = "response")

# ROC objects
roc_aic <- roc(german_data$class, prob_aic, 
               levels = c("good", "bad"), direction = "<")
roc_full <- roc(german_data$class, prob_full, 
                levels = c("good", "bad"), direction = "<")

# Plot both ROC curves
plot(roc_full, col = "gray", lwd = 2, 
     main = "ROC Curve Comparison")
plot(roc_aic, col = "steelblue", lwd = 2, add = TRUE)

# Add AUC values
legend("bottomright",
       legend = c(paste("Full Model AUC:", round(auc(roc_full), 3)),
                  paste("AIC Model AUC:", round(auc(roc_aic), 3))),
       col = c("gray", "steelblue"),
       lwd = 2)

abline(a = 0, b = 1, lty = 2, col = "red")

The ROC curve comparison reveals that the AIC-selected stepwise model (AUC = 0.828) performs nearly identically to the full model (AUC = 0.834) despite using considerably fewer predictors — a difference of just 0.006 AUC that is negligible in practice. Visually, the two curves are almost indistinguishable. This supports selecting the stepwise model on grounds of parsimony and interpretability, both critical considerations in regulated credit risk environments. Notably, both in-sample AUCs exceed the cross-validated estimate of 0.778, indicating some degree of overfitting — the CV result should be treated as the more realistic performance benchmark for deployment.
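
For readers less familiar with AUC, it can be read as the probability that a randomly chosen bad customer receives a higher predicted risk than a randomly chosen good one. The sketch below computes it by brute force on made-up scores; the values are illustrative and unrelated to the fitted models.

```r
# Toy predicted P(bad) for four good and three bad customers (made-up values)
score_good <- c(0.10, 0.20, 0.30, 0.40)
score_bad  <- c(0.35, 0.60, 0.80)

# Compare every bad customer with every good customer; ties count as half
wins <- outer(score_bad, score_good, ">")
ties <- outer(score_bad, score_good, "==")
auc_manual <- mean(wins + 0.5 * ties)
auc_manual
```

Here 11 of the 12 bad-good pairs are ranked correctly. On this reading, a gap of 0.006 between the two models amounts to fewer than one additional correctly ordered pair per hundred.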

7.5 Confusion matrix

# Find optimal threshold
optimal_coords <- coords(roc_aic, "best", 
                         best.method = "closest.topleft",
                         ret = c("threshold", "specificity", "sensitivity"))

# Predict using optimal threshold
predictions <- ifelse(prob_aic > optimal_coords$threshold, "bad", "good")
predictions <- factor(predictions, levels = c("good", "bad"))

# Confusion matrix
conf_matrix <- confusionMatrix(predictions, german_data$class)



# Extract key metrics (unname() prevents stray row names in the printed table)
metrics <- data.frame(
  Metric = c("Accuracy", "Sensitivity (Good)", "Specificity (Bad)", 
             "Precision", "F1 Score", "AUC"),
  Value = unname(c(conf_matrix$overall["Accuracy"],
                   conf_matrix$byClass["Sensitivity"],
                   conf_matrix$byClass["Specificity"],
                   conf_matrix$byClass["Precision"],
                   conf_matrix$byClass["F1"],
                   as.numeric(auc(roc_aic))))
)

kable(metrics, caption = "Model Performance Metrics", digits = 3)

Model Performance Metrics
Metric	Value
Accuracy	0.753
Sensitivity (Good)	0.754
Specificity (Bad)	0.750
Precision	0.876
F1 Score	0.810
AUC	0.828

Using the optimal classification threshold identified from the ROC curve, the model achieves 75.3% accuracy, significantly above the no-information rate of 70% (p < 0.001). The model shows balanced performance, with sensitivity and specificity both near 75%. Compared with the cross-validated results at the default 0.5 cut-off (sensitivity 87%, specificity 48%), the new threshold trades sensitivity for specificity; the two sets of figures are not directly comparable, because these are in-sample and the cross-validated ones are out-of-fold. Of the 300 actual bad credit customers, 225 are correctly identified, while 75 are misclassified as good, which are the highest-cost errors from a business perspective. The positive predictive value of 87.6% indicates that loan approvals from this model are highly reliable, though the negative predictive value of 56.7% suggests flagged rejections should be reviewed manually rather than automatically declined.
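
The reported rates can be reproduced from a single 2 x 2 table. The counts below are not printed in the report; they are the ones implied by the stated sensitivity, specificity, and class sizes (700 good, 300 bad), so treat them as a reconstruction. The cost line applies the 5:1 cost matrix from the dataset documentation.

```r
# Counts implied by the reported rates; "good" is the positive class
cm <- matrix(c(528,  75,
               172, 225),
             nrow = 2, byrow = TRUE,
             dimnames = list(predicted = c("good", "bad"),
                             actual    = c("good", "bad")))

accuracy    <- sum(diag(cm)) / sum(cm)
sensitivity <- cm["good", "good"] / sum(cm[, "good"])  # good customers approved
specificity <- cm["bad", "bad"]   / sum(cm[, "bad"])   # bad customers flagged
ppv         <- cm["good", "good"] / sum(cm["good", ])  # approvals that are truly good
npv         <- cm["bad", "bad"]   / sum(cm["bad", ])   # rejections that are truly bad

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, ppv = ppv, npv = npv), 3)

# Total cost: 5 per bad customer approved, 1 per good customer rejected
total_cost <- 5 * cm["good", "bad"] + 1 * cm["bad", "good"]
total_cost
```

For comparison, approving everyone would cost 5 x 300 = 1500 and rejecting everyone would cost 700, so even this imperfect classifier reduces the total cost under the stated matrix.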

8 Business insights:

# Odds ratios are for the odds of "bad" credit, so OR < 1 is protective
business_insights <- data.frame(
  Factor = c(
    "No checking account (A14)",
    "Used car purchase (A41)",
    "Critical credit history (A34)",
    "Unknown / no savings account (A65)",
    "Savings of 1000 DM or more (A64)",
    "No other installment plans (A143)",
    "Not a foreign worker (A202)",
    "Higher installment_rate (1 to 4 scale)",
    "Borderline model predictions"
  ),
  
  Model_Evidence = c(
    "Strongest predictor: OR ≈ 0.18 vs balance < 0 DM (A11), p < 0.001",
    "Second strongest: OR ≈ 0.19 vs new car (A40), p < 0.001",
    "OR ≈ 0.24 vs A30, p = 0.001; reliable estimate",
    "OR ≈ 0.39 vs savings < 100 DM (A61), p < 0.001",
    "OR ≈ 0.27 vs savings < 100 DM (A61), p = 0.01; wide CI, treat cautiously",
    "OR ≈ 0.52 vs plans at a bank (A141), p = 0.007",
    "OR ≈ 0.25 vs foreign worker (A201), p = 0.03; very wide CI",
    "OR ≈ 1.39, p < 0.001; only clearly risk-increasing factor",
    "Model NPV = 56.7%; 'bad' predictions correct just over half the time"
  ),
  
  Impact = c(
    "Strongly protective",
    "Strongly protective",
    "Strongly protective",
    "Protective",
    "Protective, uncertain",
    "Moderately protective",
    "Protective, flag for fairness review",
    "Higher risk",
    "Model Uncertainty"
  ),
  
  Business_Action = c(
    "Do not penalise the absence of a checking account; concentrate scrutiny on applicants with a negative balance (A11), the reference group with the highest default odds",
    "No extra down payment is indicated by the model; used-car loans default less often than new-car loans in this sample",
    "Do not auto-reject on this code; the effect is counterintuitive, so review such applications manually",
    "Do not treat missing savings information as a red flag on its own; applicants with savings below 100 DM (A61) carry higher default odds",
    "Estimate is unstable; request savings documentation rather than relying on the coefficient",
    "Applicants with other installment plans at a bank (A141) carry higher default odds; ask for details of those obligations",
    "Demographic factor with a very wide CI; review or exclude it to comply with fair lending regulations",
    "Treat a higher installment_rate value as a risk signal and check affordability before approval",
    "Do not auto-reject borderline cases; refer to a human underwriter, since automated rejection risks wrongly denying credit to 4 in 10 flagged customers"
  )
)

kable(business_insights, 
      caption = "Business Recommendations Derived from Model Evidence",
      col.names = c("Factor", "Model Evidence", "Impact", "Recommended Action"))

Business Recommendations Derived from Model Evidence
Factor	Model Evidence	Impact	Recommended Action
No checking account (A14)	Strongest predictor: OR ≈ 0.18 vs balance < 0 DM (A11), p < 0.001	Strongly protective	Do not penalise the absence of a checking account; concentrate scrutiny on applicants with a negative balance (A11), the reference group with the highest default odds
Used car purchase (A41)	Second strongest: OR ≈ 0.19 vs new car (A40), p < 0.001	Strongly protective	No extra down payment is indicated by the model; used-car loans default less often than new-car loans in this sample
Critical credit history (A34)	OR ≈ 0.24 vs A30, p = 0.001; reliable estimate	Strongly protective	Do not auto-reject on this code; the effect is counterintuitive, so review such applications manually
Unknown / no savings account (A65)	OR ≈ 0.39 vs savings < 100 DM (A61), p < 0.001	Protective	Do not treat missing savings information as a red flag on its own; applicants with savings below 100 DM (A61) carry higher default odds
Savings of 1000 DM or more (A64)	OR ≈ 0.27 vs savings < 100 DM (A61), p = 0.01; wide CI, treat cautiously	Protective, uncertain	Estimate is unstable; request savings documentation rather than relying on the coefficient
No other installment plans (A143)	OR ≈ 0.52 vs plans at a bank (A141), p = 0.007	Moderately protective	Applicants with other installment plans at a bank (A141) carry higher default odds; ask for details of those obligations
Not a foreign worker (A202)	OR ≈ 0.25 vs foreign worker (A201), p = 0.03; very wide CI	Protective, flag for fairness review	Demographic factor with a very wide CI; review or exclude it to comply with fair lending regulations
Higher installment_rate (1 to 4 scale)	OR ≈ 1.39, p < 0.001; only clearly risk-increasing factor	Higher risk	Treat a higher installment_rate value as a risk signal and check affordability before approval
Borderline model predictions	Model NPV = 56.7%; ‘bad’ predictions correct just over half the time	Model Uncertainty	Do not auto-reject borderline cases; refer to a human underwriter, since automated rejection risks wrongly denying credit to 4 in 10 flagged customers

9 Limitations:

1. No holdout test set

The confusion matrix and ROC curve were evaluated on the same data the model was trained on, which means performance is likely slightly optimistic. The 5-fold cross-validated AUC of 0.778 is the more honest estimate of how the model would perform on new loan applications. In future work, I would set aside a dedicated test set before any model training to get a fully independent evaluation.
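
A stratified hold-out split of the kind described could look like the sketch below. It is a base-R illustration on a stand-in data frame (`toy`, with the same 700/300 class mix); the 80/20 ratio and the seed are my own assumptions, not choices made in the analysis.

```r
set.seed(2026)  # arbitrary seed for reproducibility

# Stand-in for german_data: only the class column matters for the split
toy <- data.frame(
  id    = 1:1000,
  class = factor(rep(c("good", "bad"), times = c(700, 300)),
                 levels = c("good", "bad"))
)

# Sample 80% of the rows within each class so both sets keep the 70/30 mix
train_idx <- unlist(lapply(split(toy$id, toy$class),
                           function(ids) sample(ids, size = floor(0.8 * length(ids)))))

train <- toy[train_idx, ]
test  <- toy[-train_idx, ]

table(train$class)
table(test$class)
```

Model selection, threshold tuning, and any resampling would then be done on `train` only, with `test` touched once at the end for the final performance figures.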

2. Class imbalance

The dataset contains 700 good and 300 bad credit customers. This imbalance pushes the model toward predicting “good” by default, and it contributes to the low negative predictive value of 56.7%. Techniques like SMOTE or cost-sensitive learning could help the model better identify defaulters.

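3. Counterintuitive effects in categorical variables

Duration and credit amount have positive coefficients, which is consistent with the EDA because the model predicts the probability of bad credit. Several categorical effects are harder to explain: a critical credit history (A34) and having no checking account (A14) are both associated with lower odds of default than their reference levels. These patterns may reflect how applicants were selected into the sample, or correlation with other predictors, rather than a genuine protective effect, and they should not be read causally.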

4. Dataset age and context

The German Credit dataset originates from the 1970s and reflects a specific historical and economic context that may not generalise to modern lending. Spending patterns, financial products, and borrower demographics have changed substantially since then.

5. Ethical concern with demographic variables

Foreign worker status was statistically significant in the model, but using demographic characteristics in credit decisions raises serious fairness and legal concerns under modern equal credit opportunity regulations. In a real deployment, this variable would need to be reviewed carefully or removed entirely.

10 Conclusion:

This analysis set out to identify the key drivers of credit default risk using the German Credit dataset. The stepwise logistic regression model emerged as the preferred approach because its in-sample AUC is almost identical to that of the full model (0.828 vs 0.834) while it uses fewer predictors, has a lower AIC, and is easier to interpret.

The strongest and most actionable finding concerns checking account status: applicants with a negative balance (A11, the reference level) have far higher odds of default than those with no checking account at all (A14, OR ≈ 0.18). Higher installment rates, longer durations, and larger loan amounts also push risk upward, in line with the EDA.

That said, the model has real limitations. The cross-validated AUC of 0.778 is the honest performance estimate, and the negative predictive value of 56.7% means the model should never be used to automatically reject applicants. It is a decision-support tool, not a decision-maker.

If I were to extend this work, I would implement a formal train/test split, explore SMOTE to address class imbalance, and test a gradient boosting model to capture non-linear interactions the logistic regression cannot. The business recommendations derived here provide a foundation — but responsible deployment would require further validation on more recent, representative data.

11 References:

Hofmann, H. (1994). Statlog (German Credit Data). UCI Machine Learning Repository. https://doi.org/10.24432/C5NC77

James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An Introduction to Statistical Learning with Applications in R (2nd ed.). Springer.

R Core Team. (2024). R: A Language and Environment for Statistical Computing (Version 4.4.2). R Foundation for Statistical Computing.

Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer.

Kuhn, M. (2008). Building Predictive Models in R Using the caret Package. Journal of Statistical Software, 28(5). https://doi.org/10.18637/jss.v028.i05

Robin, X., et al. (2011). pROC: An open-source package for R and S+. BMC Bioinformatics, 12, 77.

Venables, W. N., & Ripley, B. D. (2002). Modern Applied Statistics with S (4th ed.). Springer.