Visualizing the German Credit Dataset

## 'data.frame':    1000 obs. of  21 variables:
##  $ status_chkaccnt : chr  "A11" "A12" "A14" "A11" ...
##  $ duration        : int  6 48 12 42 24 36 24 36 12 30 ...
##  $ credit_hist     : chr  "A34" "A32" "A34" "A32" ...
##  $ purpose         : chr  "A43" "A43" "A46" "A42" ...
##  $ credit_amt      : int  1169 5951 2096 7882 4870 9055 2835 6948 3059 5234 ...
##  $ savings_accnt   : chr  "A65" "A61" "A61" "A61" ...
##  $ present_emp     : chr  "A75" "A73" "A74" "A74" ...
##  $ installment_rate: int  4 2 2 2 3 2 3 2 2 4 ...
##  $ status_sex      : chr  "A93" "A92" "A93" "A93" ...
##  $ other_debtors   : chr  "A101" "A101" "A101" "A103" ...
##  $ present_resid   : int  4 2 3 4 4 4 4 2 4 2 ...
##  $ property        : chr  "A121" "A121" "A121" "A122" ...
##  $ age             : int  67 22 49 45 53 35 53 35 61 28 ...
##  $ other_install   : chr  "A143" "A143" "A143" "A143" ...
##  $ housing         : chr  "A152" "A152" "A152" "A153" ...
##  $ n_credits       : int  2 1 1 1 2 1 1 1 1 2 ...
##  $ job             : chr  "A173" "A173" "A172" "A173" ...
##  $ n_people        : int  1 1 2 2 2 2 1 1 1 1 ...
##  $ telephone       : chr  "A192" "A191" "A191" "A191" ...
##  $ foreign         : chr  "A201" "A201" "A201" "A201" ...
##  $ risk_status     : int  1 2 1 1 2 1 1 1 1 2 ...

Dataset Description

The dataset was provided by Dr. Hans Hofmann. It contains 1000 instances, each describing an individual who has been classified as a good or bad credit risk.

Number of attributes: 20 (7 numerical, 13 categorical). The variable risk_status (v21) is the risk label, where 1 means good and 2 means bad.
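
Because the label is stored as an integer, it can help to recode it into a named factor before plotting. The snippet below is a minimal sketch that assumes the coding used in the UCI documentation for this dataset (1 = good, 2 = bad); the toy vector stands in for the `risk_status` column and the name `risk_label` is only illustrative.

```r
# Toy vector standing in for the risk_status column
# (these are the first five values shown in the str() output).
risk_status <- c(1L, 2L, 1L, 1L, 2L)

# Assumed coding, taken from the UCI documentation: 1 = good, 2 = bad.
risk_label <- factor(risk_status, levels = c(1, 2), labels = c("good", "bad"))

# Count the records in each class.
print(table(risk_label))
```

With a named factor, plot legends and axis labels show "good" and "bad" instead of the raw codes.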

In total the dataset has 21 columns: the 20 attributes listed below, followed by the risk label.

  • Status of existing checking account.
  • Duration in months
  • Credit history
  • Purpose
  • Credit amount
  • Savings account/bonds
  • Present employment since
  • Installment rate in percentage of disposable income
  • Personal status and sex
  • Other debtors / guarantors
  • Present residence since
  • Property
  • Age in years
  • Other installment plans
  • Housing
  • Number of existing credits at this bank
  • Job
  • Number of people being liable to provide maintenance for
  • Telephone
  • Foreign worker
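
The attributes above map one-to-one, in order, onto the column names in the `str()` output at the top of this document. As a rough sketch, a header-less, whitespace-separated copy of the data could be read in as follows. The helper name `read_german_credit` and the file name `german.data` are assumptions made for illustration, not something this report specifies.

```r
# Column names as they appear in the str() output, in the same order
# as the attribute list, with the risk label last.
credit_cols <- c(
  "status_chkaccnt", "duration", "credit_hist", "purpose", "credit_amt",
  "savings_accnt", "present_emp", "installment_rate", "status_sex",
  "other_debtors", "present_resid", "property", "age", "other_install",
  "housing", "n_credits", "job", "n_people", "telephone", "foreign",
  "risk_status"
)

# Hypothetical helper: reads a whitespace-separated file without a header row
# and attaches the column names above. Categorical codes stay as character.
read_german_credit <- function(path) {
  read.table(path, header = FALSE, sep = "", col.names = credit_cols,
             stringsAsFactors = FALSE)
}

# Example usage (the file path is an assumption):
# credit <- read_german_credit("german.data")
# str(credit)
```

Keeping the categorical codes such as "A11" as character matches the `chr` columns in the `str()` output; they can be converted to factors later if a plot or model needs it.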

Insights from Exploratory Data Analysis

The plots suggest the following:

  • The median age is higher for good records than for bad records, which could indicate that younger people tend to be worse credit risks.
  • Risky credit records have a higher median duration than safe records.
  • The credit amount tends to be higher for bad records.
  • The difference in duration between bad and good records is also pronounced.
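
Comparisons like these can be reproduced with per-group medians and a boxplot. The sketch below uses only the first ten rows of the numeric columns, copied from the `str()` output, so it illustrates the mechanics rather than confirming the findings; ten rows are far too few for that. The object names `credit_head` and `group_medians` are illustrative, the 1 = good, 2 = bad coding is assumed from the UCI documentation, and the plot is only drawn if ggplot2 is installed.

```r
# First ten rows of selected columns, copied from the str() output.
credit_head <- data.frame(
  duration    = c(6, 48, 12, 42, 24, 36, 24, 36, 12, 30),
  credit_amt  = c(1169, 5951, 2096, 7882, 4870, 9055, 2835, 6948, 3059, 5234),
  age         = c(67, 22, 49, 45, 53, 35, 53, 35, 61, 28),
  risk_status = c(1, 2, 1, 1, 2, 1, 1, 1, 1, 2)
)

# Assumed coding from the UCI documentation: 1 = good, 2 = bad.
credit_head$risk <- factor(credit_head$risk_status,
                           levels = c(1, 2), labels = c("good", "bad"))

# Median of each numeric variable within each risk group.
group_medians <- aggregate(cbind(age, duration, credit_amt) ~ risk,
                           data = credit_head, FUN = median)
print(group_medians)

# Boxplot of age by risk group, drawn only when ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(credit_head, ggplot2::aes(x = risk, y = age)) +
    ggplot2::geom_boxplot() +
    ggplot2::labs(x = "Risk status", y = "Age in years")
  print(p)
}
```

Swapping `age` for `duration` or `credit_amt` in the `aes()` call gives the corresponding plots for the other two variables; on the full data, `credit_head` would be replaced by the complete data frame.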