Create Loan Dataset

# Create Loan Dataset

loan_data <- data.frame(
  
  Income = c(
    2500, 3200, 4000, 5200, 6100,
    7200, 8500, 9200, 10000, 11500,
    12500, 13500, 14500, 15500, 17000
  ),
  
  CreditScore = c(
    580, 600, 620, 640, 660,
    680, 700, 720, 740, 760,
    780, 790, 800, 820, 850
  ),
  
  YearsEmployed = c(
    1, 2, 2, 3, 4,
    5, 5, 6, 7, 8,
    9, 10, 10, 11, 12
  ),
  
  ExistingDebt = c(
    500, 700, 900, 1000, 1200,
    1500, 1700, 1800, 2000, 2200,
    2500, 2600, 2800, 3000, 3200
  ),
  
  LoanAmount = c(
    3000, 5000, 7000, 9000, 11000,
    13000, 15000, 17000, 19000, 21000,
    23000, 25000, 27000, 29000, 32000
  )
)

# Display dataset
head(loan_data)
##   Income CreditScore YearsEmployed ExistingDebt LoanAmount
## 1   2500         580             1          500       3000
## 2   3200         600             2          700       5000
## 3   4000         620             2          900       7000
## 4   5200         640             3         1000       9000
## 5   6100         660             4         1200      11000
## 6   7200         680             5         1500      13000

Structure and Summary Statistics

str(loan_data)
## 'data.frame':    15 obs. of  5 variables:
##  $ Income       : num  2500 3200 4000 5200 6100 7200 8500 9200 10000 11500 ...
##  $ CreditScore  : num  580 600 620 640 660 680 700 720 740 760 ...
##  $ YearsEmployed: num  1 2 2 3 4 5 5 6 7 8 ...
##  $ ExistingDebt : num  500 700 900 1000 1200 1500 1700 1800 2000 2200 ...
##  $ LoanAmount   : num  3000 5000 7000 9000 11000 13000 15000 17000 19000 21000 ...
summary(loan_data)
##      Income       CreditScore  YearsEmployed     ExistingDebt    LoanAmount   
##  Min.   : 2500   Min.   :580   Min.   : 1.000   Min.   : 500   Min.   : 3000  
##  1st Qu.: 5650   1st Qu.:650   1st Qu.: 3.500   1st Qu.:1100   1st Qu.:10000  
##  Median : 9200   Median :720   Median : 6.000   Median :1800   Median :17000  
##  Mean   : 9360   Mean   :716   Mean   : 6.333   Mean   :1840   Mean   :17067  
##  3rd Qu.:13000   3rd Qu.:785   3rd Qu.: 9.500   3rd Qu.:2550   3rd Qu.:24000  
##  Max.   :17000   Max.   :850   Max.   :12.000   Max.   :3200   Max.   :32000

Fit Regression Model and View Results

# Fit regression model

loan_model <- lm(
  LoanAmount ~ Income + CreditScore + YearsEmployed + ExistingDebt,
  data = loan_data
)

# View results
summary(loan_model)
## 
## Call:
## lm(formula = LoanAmount ~ Income + CreditScore + YearsEmployed + 
##     ExistingDebt, data = loan_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -452.64 -215.38   62.38  189.80  326.39 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   -1.682e+04  7.558e+03  -2.226  0.05022 . 
## Income         1.285e+00  3.674e-01   3.499  0.00574 **
## CreditScore    2.854e+01  1.388e+01   2.057  0.06676 . 
## YearsEmployed  3.405e+01  2.776e+02   0.123  0.90481   
## ExistingDebt   6.551e-01  1.958e+00   0.335  0.74484   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 307.2 on 10 degrees of freedom
## Multiple R-squared:  0.9992, Adjusted R-squared:  0.9988 
## F-statistic:  3041 on 4 and 10 DF,  p-value: 2.244e-15

Interpret Each Variable

Income

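For every 1-unit increase in income, the predicted loan amount increases by approximately 1.29 units (estimate 1.285, p = 0.006), holding the other variables constant. Income is the only predictor that is statistically significant at the 5% level. Higher-income customers qualify for larger loans.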

Credit Score

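A 1-point increase in credit score is associated with an increase in loan amount of approximately 28.5 units (estimate 28.54), holding the other variables constant. The effect is only marginally significant (p = 0.067).

Customers with better credit history tend to receive larger loans.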

Years Employed

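Each additional year of employment is associated with an increase in loan amount of about 34 units (estimate 34.05), holding the other variables constant. The standard error (277.6) is far larger than the estimate, so the effect is not statistically significant (p = 0.905).

The model gives no evidence that employment length affects loan amount once income, credit score, and debt are accounted for.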

Existing Debt

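Existing debt has a small positive coefficient (estimate 0.655), not a negative one, and it is not statistically significant (p = 0.745).

The model therefore does not show that customers with higher debt qualify for smaller loans. In this dataset, existing debt rises together with income and loan amount.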
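
One way to check which coefficients are distinguishable from zero is to look at their confidence intervals. The following is a minimal sketch using base R's `confint()`; the names `ci`, `excludes_zero`, and `coef_report` are illustrative and not part of the original analysis.

```r
# Rebuild the data and model from the sections above so this chunk runs on its own.
loan_data <- data.frame(
  Income        = c(2500, 3200, 4000, 5200, 6100, 7200, 8500, 9200,
                    10000, 11500, 12500, 13500, 14500, 15500, 17000),
  CreditScore   = c(580, 600, 620, 640, 660, 680, 700, 720,
                    740, 760, 780, 790, 800, 820, 850),
  YearsEmployed = c(1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9, 10, 10, 11, 12),
  ExistingDebt  = c(500, 700, 900, 1000, 1200, 1500, 1700, 1800,
                    2000, 2200, 2500, 2600, 2800, 3000, 3200),
  LoanAmount    = c(3000, 5000, 7000, 9000, 11000, 13000, 15000, 17000,
                    19000, 21000, 23000, 25000, 27000, 29000, 32000)
)
loan_model <- lm(
  LoanAmount ~ Income + CreditScore + YearsEmployed + ExistingDebt,
  data = loan_data
)

# 95% confidence interval for each coefficient (one row per term)
ci <- confint(loan_model, level = 0.95)

# A coefficient is distinguishable from zero at the 5% level
# only if its interval lies entirely on one side of zero
excludes_zero <- ci[, 1] > 0 | ci[, 2] < 0

# Illustrative report combining estimates and intervals
coef_report <- data.frame(
  Estimate     = coef(loan_model),
  Lower        = ci[, 1],
  Upper        = ci[, 2],
  ExcludesZero = excludes_zero
)
print(coef_report)
```

An interval that spans zero means the sign of that coefficient is not determined by the data, so it should not be interpreted as a positive or negative effect.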

Check Model Accuracy

# Model summary
summary(loan_model)$r.squared
## [1] 0.9991784

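About 99.9% of the variation in Loan Amount is explained by the predictors. The model fits the data very well, although with only 15 observations and predictors that all rise together, a high R-squared is expected and does not by itself validate the individual coefficients.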
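
R-squared is computed on the same data used to fit the model, so it can overstate accuracy. As a rough complement, the sketch below computes a leave-one-out cross-validated RMSE using the standard identity for linear models (each leave-one-out residual equals the ordinary residual divided by one minus its leverage). The names `loo_res`, `in_sample_rmse`, and `loo_rmse` are illustrative.

```r
# Rebuild the data and model from the sections above so this chunk runs on its own.
loan_data <- data.frame(
  Income        = c(2500, 3200, 4000, 5200, 6100, 7200, 8500, 9200,
                    10000, 11500, 12500, 13500, 14500, 15500, 17000),
  CreditScore   = c(580, 600, 620, 640, 660, 680, 700, 720,
                    740, 760, 780, 790, 800, 820, 850),
  YearsEmployed = c(1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9, 10, 10, 11, 12),
  ExistingDebt  = c(500, 700, 900, 1000, 1200, 1500, 1700, 1800,
                    2000, 2200, 2500, 2600, 2800, 3000, 3200),
  LoanAmount    = c(3000, 5000, 7000, 9000, 11000, 13000, 15000, 17000,
                    19000, 21000, 23000, 25000, 27000, 29000, 32000)
)
loan_model <- lm(
  LoanAmount ~ Income + CreditScore + YearsEmployed + ExistingDebt,
  data = loan_data
)

res <- residuals(loan_model)   # ordinary (in-sample) residuals
h   <- hatvalues(loan_model)   # leverage of each observation

# Leave-one-out residuals without refitting the model 15 times
loo_res <- res / (1 - h)

in_sample_rmse <- sqrt(mean(res^2))
loo_rmse       <- sqrt(mean(loo_res^2))

cat("In-sample RMSE:    ", in_sample_rmse, "\n")
cat("Leave-one-out RMSE:", loo_rmse, "\n")
cat("Adjusted R-squared:", summary(loan_model)$adj.r.squared, "\n")
```

A leave-one-out RMSE much larger than the in-sample RMSE would suggest the model is fitting noise in this small sample.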

Diagnostic Plots

# Diagnostic plots

par(mfrow = c(2,2))
plot(loan_model)

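These plots help check:

- Linearity (Residuals vs Fitted)

- Normality of residuals (Normal Q-Q)

- Constant variance (Scale-Location)

- Outliers and influential points (Residuals vs Leverage)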
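
The plots can be complemented with numeric checks. The sketch below runs a Shapiro-Wilk test on the residuals and computes Cook's distance for each observation. The `4 / n` cutoff for flagging influential points is a common rule of thumb assumed here for illustration, not a threshold taken from the original analysis.

```r
# Rebuild the data and model from the sections above so this chunk runs on its own.
loan_data <- data.frame(
  Income        = c(2500, 3200, 4000, 5200, 6100, 7200, 8500, 9200,
                    10000, 11500, 12500, 13500, 14500, 15500, 17000),
  CreditScore   = c(580, 600, 620, 640, 660, 680, 700, 720,
                    740, 760, 780, 790, 800, 820, 850),
  YearsEmployed = c(1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9, 10, 10, 11, 12),
  ExistingDebt  = c(500, 700, 900, 1000, 1200, 1500, 1700, 1800,
                    2000, 2200, 2500, 2600, 2800, 3000, 3200),
  LoanAmount    = c(3000, 5000, 7000, 9000, 11000, 13000, 15000, 17000,
                    19000, 21000, 23000, 25000, 27000, 29000, 32000)
)
loan_model <- lm(
  LoanAmount ~ Income + CreditScore + YearsEmployed + ExistingDebt,
  data = loan_data
)

# Normality of residuals: a small p-value suggests departure from normality
sw <- shapiro.test(residuals(loan_model))
print(sw)

# Influence of each observation on the fitted coefficients
cd <- cooks.distance(loan_model)

# Rule-of-thumb cutoff (an assumption for illustration): 4 / n
cutoff <- 4 / nrow(loan_data)
influential <- which(cd > cutoff)
print(influential)
```

With only 15 observations, both the test and the plots have limited power, so they are best read together.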

Correlation Matrix

# Correlation matrix

cor(loan_data)
##                  Income CreditScore YearsEmployed ExistingDebt LoanAmount
## Income        1.0000000   0.9969160     0.9962972    0.9985626  0.9993296
## CreditScore   0.9969160   1.0000000     0.9948066    0.9972470  0.9980156
## YearsEmployed 0.9962972   0.9948066     1.0000000    0.9957446  0.9962036
## ExistingDebt  0.9985626   0.9972470     0.9957446    1.0000000  0.9985536
## LoanAmount    0.9993296   0.9980156     0.9962036    0.9985536  1.0000000
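
Every pairwise correlation above exceeds 0.99, which indicates severe multicollinearity among the predictors. One way to quantify this is the variance inflation factor (VIF). The sketch below computes it in base R by regressing each predictor on the others; the name `vif` is an illustrative variable, not a call to any package function.

```r
# Rebuild the data from the section above so this chunk runs on its own.
loan_data <- data.frame(
  Income        = c(2500, 3200, 4000, 5200, 6100, 7200, 8500, 9200,
                    10000, 11500, 12500, 13500, 14500, 15500, 17000),
  CreditScore   = c(580, 600, 620, 640, 660, 680, 700, 720,
                    740, 760, 780, 790, 800, 820, 850),
  YearsEmployed = c(1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9, 10, 10, 11, 12),
  ExistingDebt  = c(500, 700, 900, 1000, 1200, 1500, 1700, 1800,
                    2000, 2200, 2500, 2600, 2800, 3000, 3200),
  LoanAmount    = c(3000, 5000, 7000, 9000, 11000, 13000, 15000, 17000,
                    19000, 21000, 23000, 25000, 27000, 29000, 32000)
)

predictors <- c("Income", "CreditScore", "YearsEmployed", "ExistingDebt")

# VIF for predictor p = 1 / (1 - R^2), where R^2 comes from
# regressing p on all the other predictors
vif <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  aux <- lm(reformulate(others, response = p), data = loan_data)
  1 / (1 - summary(aux)$r.squared)
})

print(round(vif, 1))
```

Values above roughly 10 are commonly treated as a sign of problematic collinearity. Large VIFs inflate the standard errors of the coefficients, which is consistent with most predictors in the regression output being non-significant despite the very high R-squared.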

Scatterplot Matrix

pairs(loan_data)

Predict Loan Amount

# Predict for new customer

new_customer <- data.frame(
  Income = 9000,
  CreditScore = 730,
  YearsEmployed = 6,
  ExistingDebt = 1500
)

predict(loan_model, new_customer)
##        1 
## 16769.42
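
A point prediction alone does not show how uncertain it is. As a minimal sketch, `predict()` can also return a 95% prediction interval for the same hypothetical customer through its `interval` argument; the name `pred` is illustrative.

```r
# Rebuild the data and model from the sections above so this chunk runs on its own.
loan_data <- data.frame(
  Income        = c(2500, 3200, 4000, 5200, 6100, 7200, 8500, 9200,
                    10000, 11500, 12500, 13500, 14500, 15500, 17000),
  CreditScore   = c(580, 600, 620, 640, 660, 680, 700, 720,
                    740, 760, 780, 790, 800, 820, 850),
  YearsEmployed = c(1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9, 10, 10, 11, 12),
  ExistingDebt  = c(500, 700, 900, 1000, 1200, 1500, 1700, 1800,
                    2000, 2200, 2500, 2600, 2800, 3000, 3200),
  LoanAmount    = c(3000, 5000, 7000, 9000, 11000, 13000, 15000, 17000,
                    19000, 21000, 23000, 25000, 27000, 29000, 32000)
)
loan_model <- lm(
  LoanAmount ~ Income + CreditScore + YearsEmployed + ExistingDebt,
  data = loan_data
)

new_customer <- data.frame(
  Income = 9000,
  CreditScore = 730,
  YearsEmployed = 6,
  ExistingDebt = 1500
)

# Returns a matrix with columns fit (point prediction), lwr and upr (interval bounds)
pred <- predict(
  loan_model,
  newdata = new_customer,
  interval = "prediction",
  level = 0.95
)
print(pred)
```

The `lwr` and `upr` columns bound the range in which a single new customer's loan amount is expected to fall, assuming the model's assumptions hold.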

Conclusion

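The multiple linear regression model estimated positive coefficients for all four predictors, including Existing Debt, but only Income was statistically significant at the 5% level, with Credit Score marginally significant (p = 0.067). Years Employed and Existing Debt showed no significant effect. The model explained about 99.9% of the variability in Loan Amount. Because the predictors are almost perfectly correlated with one another (all pairwise correlations exceed 0.99), the individual coefficients are unstable, and the fit is better read as evidence that the predictors jointly track loan amount than as a measure of each variable's separate effect on loan eligibility.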