Title: “German credit”

Abstract:

The objective of this project is to classify people, described by a set of attributes, as good or bad credit risks, and to build a model that predicts the credit risk associated with a customer from his/her profile attributes.

The German Credit Data contains 21 variables for 1000 loan applicants: 20 profile attributes plus the classification of each applicant as a Good or a Bad credit risk.

When a bank receives a loan application, it has to decide, based on the applicant’s profile, whether to go ahead with the loan approval or not. Two types of risk are associated with the bank’s decision:

If the applicant is a good credit risk, i.e. is likely to repay the loan, then not approving the loan to the person results in a loss of business to the bank.

If the applicant is a bad credit risk, i.e. is not likely to repay the loan, then approving the loan to the person results in a financial loss to the bank.

Objective of Analysis:

Minimization of risk and maximization of profit on behalf of the bank.

To minimize loss from the bank’s perspective, the bank needs a decision rule regarding whom to approve for a loan and whom to reject. Loan managers consider an applicant’s demographic and socio-economic profile before taking a decision on his/her loan application. A predictive model developed on the German Credit Data is expected to give a bank manager guidance on whether to approve a loan to a prospective applicant based on his/her profile.

data importing:

read.csv imports the data. dim is then used to check whether the imported data is correct (number of observations and number of columns).

german_credit<-read.csv("F://desktop 18th jan//DATA ANALYTICS//1.R-prog//projects//00000 project3//german_credit.csv")


dim(german_credit)
## [1] 1000   21
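The shape check can also be made automatic, so that a script stops when an import goes wrong. A minimal sketch, assuming a small made-up CSV file written to a temporary location (toy_credit, csv_path and the three rows are hypothetical, not the real german_credit.csv):

```r
# Write a tiny hypothetical CSV file so the example runs without the real data
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("Creditability,Account.Balance,Credit.Amount",
    "1,1,1049",
    "1,1,2799",
    "0,2,841"),
  csv_path
)

toy_credit <- read.csv(csv_path)

# Compare the imported shape with the shape we expect (rows, columns)
expected_shape <- c(3L, 3L)
actual_shape <- dim(toy_credit)
stopifnot(identical(actual_shape, expected_shape))
print(actual_shape)
```

For the real file the expected shape would be c(1000L, 21L), as shown by dim above.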

names prints the names of all the variables in the imported dataset.

names(german_credit)
##  [1] "Creditability"                    
##  [2] "Account.Balance"                  
##  [3] "Duration.of.Credit..month."       
##  [4] "Payment.Status.of.Previous.Credit"
##  [5] "Purpose"                          
##  [6] "Credit.Amount"                    
##  [7] "Value.Savings.Stocks"             
##  [8] "Length.of.current.employment"     
##  [9] "Instalment.per.cent"              
## [10] "Sex...Marital.Status"             
## [11] "Guarantors"                       
## [12] "Duration.in.Current.address"      
## [13] "Most.valuable.available.asset"    
## [14] "Age..years."                      
## [15] "Concurrent.Credits"               
## [16] "Type.of.apartment"                
## [17] "No.of.Credits.at.this.Bank"       
## [18] "Occupation"                       
## [19] "No.of.dependents"                 
## [20] "Telephone"                        
## [21] "Foreign.Worker"

head prints the first 6 observations.

head(german_credit)
##   Creditability Account.Balance Duration.of.Credit..month.
## 1             1               1                         18
## 2             1               1                          9
## 3             1               2                         12
## 4             1               1                         12
## 5             1               1                         12
## 6             1               1                         10
##   Payment.Status.of.Previous.Credit Purpose Credit.Amount
## 1                                 4       2          1049
## 2                                 4       0          2799
## 3                                 2       9           841
## 4                                 4       0          2122
## 5                                 4       0          2171
## 6                                 4       0          2241
##   Value.Savings.Stocks Length.of.current.employment Instalment.per.cent
## 1                    1                            2                   4
## 2                    1                            3                   2
## 3                    2                            4                   2
## 4                    1                            3                   3
## 5                    1                            3                   4
## 6                    1                            2                   1
##   Sex...Marital.Status Guarantors Duration.in.Current.address
## 1                    2          1                           4
## 2                    3          1                           2
## 3                    2          1                           4
## 4                    3          1                           2
## 5                    3          1                           4
## 6                    3          1                           3
##   Most.valuable.available.asset Age..years. Concurrent.Credits
## 1                             2          21                  3
## 2                             1          36                  3
## 3                             1          23                  3
## 4                             1          39                  3
## 5                             2          38                  1
## 6                             1          48                  3
##   Type.of.apartment No.of.Credits.at.this.Bank Occupation No.of.dependents
## 1                 1                          1          3                1
## 2                 1                          2          3                2
## 3                 1                          1          2                1
## 4                 1                          2          2                2
## 5                 2                          2          2                1
## 6                 1                          2          2                2
##   Telephone Foreign.Worker
## 1         1              1
## 2         1              1
## 3         1              1
## 4         1              2
## 5         1              2
## 6         1              2

tail prints the last 6 observations.

tail(german_credit)
##      Creditability Account.Balance Duration.of.Credit..month.
## 995              0               1                         12
## 996              0               1                         24
## 997              0               1                         24
## 998              0               4                         21
## 999              0               2                         12
## 1000             0               1                         30
##      Payment.Status.of.Previous.Credit Purpose Credit.Amount
## 995                                  0       3          6199
## 996                                  2       3          1987
## 997                                  2       0          2303
## 998                                  4       0         12680
## 999                                  2       3          6468
## 1000                                 2       2          6350
##      Value.Savings.Stocks Length.of.current.employment Instalment.per.cent
## 995                     1                            3                   4
## 996                     1                            3                   2
## 997                     1                            5                   4
## 998                     5                            5                   4
## 999                     5                            1                   2
## 1000                    5                            5                   4
##      Sex...Marital.Status Guarantors Duration.in.Current.address
## 995                     3          1                           2
## 996                     3          1                           4
## 997                     3          2                           1
## 998                     3          1                           4
## 999                     3          1                           1
## 1000                    3          1                           4
##      Most.valuable.available.asset Age..years. Concurrent.Credits
## 995                              2          28                  3
## 996                              1          21                  3
## 997                              1          45                  3
## 998                              4          30                  3
## 999                              4          52                  3
## 1000                             2          31                  3
##      Type.of.apartment No.of.Credits.at.this.Bank Occupation
## 995                  1                          2          3
## 996                  1                          1          2
## 997                  2                          1          3
## 998                  3                          1          4
## 999                  2                          1          4
## 1000                 2                          1          3
##      No.of.dependents Telephone Foreign.Worker
## 995                 1         2              1
## 996                 2         1              1
## 997                 1         1              1
## 998                 1         2              1
## 999                 1         2              1
## 1000                1         1              1

str shows the structure of the data: the type and the first few values of each variable.

str(german_credit)
## 'data.frame':    1000 obs. of  21 variables:
##  $ Creditability                    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Account.Balance                  : int  1 1 2 1 1 1 1 1 4 2 ...
##  $ Duration.of.Credit..month.       : int  18 9 12 12 12 10 8 6 18 24 ...
##  $ Payment.Status.of.Previous.Credit: int  4 4 2 4 4 4 4 4 4 2 ...
##  $ Purpose                          : int  2 0 9 0 0 0 0 0 3 3 ...
##  $ Credit.Amount                    : int  1049 2799 841 2122 2171 2241 3398 1361 1098 3758 ...
##  $ Value.Savings.Stocks             : int  1 1 2 1 1 1 1 1 1 3 ...
##  $ Length.of.current.employment     : int  2 3 4 3 3 2 4 2 1 1 ...
##  $ Instalment.per.cent              : int  4 2 2 3 4 1 1 2 4 1 ...
##  $ Sex...Marital.Status             : int  2 3 2 3 3 3 3 3 2 2 ...
##  $ Guarantors                       : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Duration.in.Current.address      : int  4 2 4 2 4 3 4 4 4 4 ...
##  $ Most.valuable.available.asset    : int  2 1 1 1 2 1 1 1 3 4 ...
##  $ Age..years.                      : int  21 36 23 39 38 48 39 40 65 23 ...
##  $ Concurrent.Credits               : int  3 3 3 3 1 3 3 3 3 3 ...
##  $ Type.of.apartment                : int  1 1 1 1 2 1 2 2 2 1 ...
##  $ No.of.Credits.at.this.Bank       : int  1 2 1 2 2 2 2 1 2 1 ...
##  $ Occupation                       : int  3 3 2 2 2 2 2 2 1 1 ...
##  $ No.of.dependents                 : int  1 2 1 2 1 2 1 2 1 1 ...
##  $ Telephone                        : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Foreign.Worker                   : int  1 1 1 2 2 2 2 2 1 1 ...

View opens the dataset in a spreadsheet-style viewer, which is used to check whether the dataset was imported correctly.

View(german_credit)

summary gives the minimum, quartiles, mean and maximum of each variable in the dataset.

summary(german_credit)
##  Creditability Account.Balance Duration.of.Credit..month.
##  Min.   :0.0   Min.   :1.000   Min.   : 4.0              
##  1st Qu.:0.0   1st Qu.:1.000   1st Qu.:12.0              
##  Median :1.0   Median :2.000   Median :18.0              
##  Mean   :0.7   Mean   :2.577   Mean   :20.9              
##  3rd Qu.:1.0   3rd Qu.:4.000   3rd Qu.:24.0              
##  Max.   :1.0   Max.   :4.000   Max.   :72.0              
##  Payment.Status.of.Previous.Credit    Purpose       Credit.Amount  
##  Min.   :0.000                     Min.   : 0.000   Min.   :  250  
##  1st Qu.:2.000                     1st Qu.: 1.000   1st Qu.: 1366  
##  Median :2.000                     Median : 2.000   Median : 2320  
##  Mean   :2.545                     Mean   : 2.828   Mean   : 3271  
##  3rd Qu.:4.000                     3rd Qu.: 3.000   3rd Qu.: 3972  
##  Max.   :4.000                     Max.   :10.000   Max.   :18424  
##  Value.Savings.Stocks Length.of.current.employment Instalment.per.cent
##  Min.   :1.000        Min.   :1.000                Min.   :1.000      
##  1st Qu.:1.000        1st Qu.:3.000                1st Qu.:2.000      
##  Median :1.000        Median :3.000                Median :3.000      
##  Mean   :2.105        Mean   :3.384                Mean   :2.973      
##  3rd Qu.:3.000        3rd Qu.:5.000                3rd Qu.:4.000      
##  Max.   :5.000        Max.   :5.000                Max.   :4.000      
##  Sex...Marital.Status   Guarantors    Duration.in.Current.address
##  Min.   :1.000        Min.   :1.000   Min.   :1.000              
##  1st Qu.:2.000        1st Qu.:1.000   1st Qu.:2.000              
##  Median :3.000        Median :1.000   Median :3.000              
##  Mean   :2.682        Mean   :1.145   Mean   :2.845              
##  3rd Qu.:3.000        3rd Qu.:1.000   3rd Qu.:4.000              
##  Max.   :4.000        Max.   :3.000   Max.   :4.000              
##  Most.valuable.available.asset  Age..years.    Concurrent.Credits
##  Min.   :1.000                 Min.   :19.00   Min.   :1.000     
##  1st Qu.:1.000                 1st Qu.:27.00   1st Qu.:3.000     
##  Median :2.000                 Median :33.00   Median :3.000     
##  Mean   :2.358                 Mean   :35.54   Mean   :2.675     
##  3rd Qu.:3.000                 3rd Qu.:42.00   3rd Qu.:3.000     
##  Max.   :4.000                 Max.   :75.00   Max.   :3.000     
##  Type.of.apartment No.of.Credits.at.this.Bank   Occupation   
##  Min.   :1.000     Min.   :1.000              Min.   :1.000  
##  1st Qu.:2.000     1st Qu.:1.000              1st Qu.:3.000  
##  Median :2.000     Median :1.000              Median :3.000  
##  Mean   :1.928     Mean   :1.407              Mean   :2.904  
##  3rd Qu.:2.000     3rd Qu.:2.000              3rd Qu.:3.000  
##  Max.   :3.000     Max.   :4.000              Max.   :4.000  
##  No.of.dependents   Telephone     Foreign.Worker 
##  Min.   :1.000    Min.   :1.000   Min.   :1.000  
##  1st Qu.:1.000    1st Qu.:1.000   1st Qu.:1.000  
##  Median :1.000    Median :1.000   Median :1.000  
##  Mean   :1.155    Mean   :1.404   Mean   :1.037  
##  3rd Qu.:1.000    3rd Qu.:2.000   3rd Qu.:1.000  
##  Max.   :2.000    Max.   :2.000   Max.   :2.000
x<-sum(is.na(german_credit))
x
## [1] 0
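sum(is.na(german_credit)) counts the missing values in the whole dataset, and 0 means there are none. If the total were not 0, a count per column would show where the gaps are. A minimal sketch, assuming a made-up data frame with two missing values (toy_df is hypothetical, not the German credit data):

```r
# Hypothetical toy data frame with two missing values
toy_df <- data.frame(
  Creditability = c(1, 0, NA, 1),
  Credit.Amount = c(1049, NA, 841, 2122)
)

# Total number of missing values, as computed above for german_credit
total_na <- sum(is.na(toy_df))

# Number of missing values in each column
na_per_column <- colSums(is.na(toy_df))

print(total_na)
print(na_per_column)
```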

data information

Variable1 = “Creditability”

It is binary data, which can take only two possible values, coded numerically as 0 and 1.

0: No (bad credit risk), 1: Yes (good credit risk)

Binary data is a type of categorical data, which more generally represents experiments with a fixed number of possible outcomes.

This is the target/response variable.

variable2=“Account.Balance”

It contains qualitative data. There are four categories.

1 : … < 0 DM

2 : 0 <= … < 200 DM

3 : … >= 200 DM / salary assignments for at least 1 year

4 : no checking account

DM: Deutsche Mark, the basic unit of money in Germany before the euro.

Account.Balance contains qualitative data, so measures of central tendency and dispersion do not make sense for it. A frequency table, the mode and a bar plot are used instead. The mode is the most frequent category of Account.Balance.

frequency table of Account.Balance

tab<-table(german_credit$Account.Balance)
tab
## 
##   1   2   3   4 
## 274 269  63 394
names(tab)
## [1] "1" "2" "3" "4"
x<-sum(is.na(german_credit$Account.Balance))
x
## [1] 0

1 stands for a balance below 0 DM, 2 stands for a balance from 0 to below 200 DM, 3 stands for a balance of 200 DM or more (or salary assignments for at least 1 year), 4 stands for no checking account.
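The codes can be turned into readable labels with factor, so that tables and plots show the category text instead of numbers. A minimal sketch, assuming a vector rebuilt from the frequency table above rather than the real column (account_codes, account_labels and account_factor are illustrative names):

```r
# Rebuild a vector with the same category counts as the frequency table above
account_codes <- rep(1:4, times = c(274, 269, 63, 394))

# Labels follow the category definitions of Account.Balance
account_labels <- c("< 0 DM", "0 <= ... < 200 DM", ">= 200 DM", "no checking account")

# Attach the labels to the codes
account_factor <- factor(account_codes, levels = 1:4, labels = account_labels)

label_counts <- table(account_factor)
print(label_counts)
```

With the real data the same call would use german_credit$Account.Balance in place of account_codes.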

mode of Account.Balance. It gives the category with the highest frequency.

temp <- table(as.vector(german_credit$Account.Balance))
names(temp)[temp == max(temp)]
## [1] "4"

The mode of Account.Balance is 4 (no checking account).
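Since the same two lines are repeated for every categorical variable, they could be wrapped in a small helper. A minimal sketch (stat_mode is a made-up name, and account_codes is rebuilt from the frequency table above rather than read from the data):

```r
# Return the most frequent value(s) of a vector as character strings;
# if several values tie for the highest count, all of them are returned
stat_mode <- function(x) {
  counts <- table(x)
  names(counts)[counts == max(counts)]
}

# Vector with the same category counts as Account.Balance
account_codes <- rep(1:4, times = c(274, 269, 63, 394))

account_mode <- stat_mode(account_codes)
print(account_mode)
```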

ggplot of Account.Balance

library("ggplot2")
## Warning: package 'ggplot2' was built under R version 3.3.2
# Bar chart of the number of customers in each category (codes treated as categories, not numbers)
ggplot(german_credit, aes(x = factor(Account.Balance))) + geom_bar(fill = "purple") +
  labs(title = "Account.Balance", x = "category", y = "customers")

output description:

1: 274 people have a balance below 0 DM.

2: 269 people have a balance from 0 to below 200 DM.

3: 63 people have a balance of 200 DM or more.

4: 394 people have no checking account.

variable3=“Duration.of.Credit..month.”

It is numerical data: the duration of the credit in months.

boxplot of Duration.of.Credit..month.

quantile(german_credit$Duration.of.Credit..month.)
##   0%  25%  50%  75% 100% 
##    4   12   18   24   72
quantile(german_credit$Duration.of.Credit..month.,c(0.75,0.80,0.90,1))
##  75%  80%  90% 100% 
##   24   30   36   72
boxplot(german_credit$Duration.of.Credit..month.)

output description

In this boxplot the minimum is 4 months, the maximum is 72 and the median is 18. The first quartile is 12 and the third quartile is 24.

histogram of Duration.of.Credit..month.

hist(german_credit$Duration.of.Credit..month.)

correlation between Duration.of.Credit..month. and response

library("ltm")
## Warning: package 'ltm' was built under R version 3.3.2
## Loading required package: MASS
## Loading required package: msm
## Warning: package 'msm' was built under R version 3.3.2
## Loading required package: polycor
## Warning: package 'polycor' was built under R version 3.3.2
biserial.cor(german_credit$Duration.of.Credit..month.,german_credit$Creditability)
## [1] 0.2148192
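The sign of a point-biserial correlation depends on which group of the binary variable is taken as the reference. In ltm this is set by the level argument of biserial.cor, which defaults to the first level (here 0). A minimal sketch of the standard formula in base R, assuming made-up toy data (duration, good and point_biserial are hypothetical names). The sketch uses the population standard deviation, so it matches the Pearson correlation exactly; the value from ltm may differ slightly in such details:

```r
# Hypothetical toy data: loan duration in months and a 0/1 outcome (1 = good)
duration <- c(6, 12, 12, 18, 24, 36, 48, 60)
good     <- c(1, 1, 1, 1, 0, 1, 0, 0)

# Point-biserial correlation for a chosen reference group:
# (mean of x in the reference group - mean of x in the other group)
#   * sqrt(p * (1 - p)) / sd of x, where p is the share of the reference group
point_biserial <- function(x, y, reference) {
  in_ref <- y == reference
  p <- mean(in_ref)
  pop_sd <- sqrt(mean((x - mean(x))^2))  # population standard deviation
  (mean(x[in_ref]) - mean(x[!in_ref])) * sqrt(p * (1 - p)) / pop_sd
}

r_ref_bad  <- point_biserial(duration, good, reference = 0)
r_ref_good <- point_biserial(duration, good, reference = 1)
r_pearson  <- cor(duration, good)

# Same size, opposite sign: only the reference group changes
print(c(r_ref_bad, r_ref_good, r_pearson))
```

With reference group 0 the value is positive when the bad-risk group has the longer durations, which is the opposite sign of cor(duration, good).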

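biserial.cor returns 0.21. By default biserial.cor takes the first level of Creditability (0) as its reference group, so the positive sign means that customers with Creditability 0 have longer credit durations on average; longer durations therefore go with lower creditability. The association is weak.

t-test

t.test with a single sample tests whether the mean duration differs from 0; the useful part of the output is the 95% confidence interval for the mean (20.15 to 21.65 months).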

t.test(german_credit$Duration.of.Credit..month.)
## 
##  One Sample t-test
## 
## data:  german_credit$Duration.of.Credit..month.
## t = 54.816, df = 999, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  20.15469 21.65131
## sample estimates:
## mean of x 
##    20.903
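The confidence interval in this output can be reproduced by hand from the mean, the t value and the degrees of freedom. A worked sketch using only the numbers printed above (the variable names are illustrative):

```r
# Values reported by t.test above
mean_duration <- 20.903
t_statistic   <- 54.816
df            <- 999

# For a test against 0, t = mean / standard error, so:
std_error <- mean_duration / t_statistic

# 95% interval: mean +/- t quantile * standard error
margin <- qt(0.975, df) * std_error
ci <- c(mean_duration - margin, mean_duration + margin)
print(round(ci, 2))
```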

variable4=“Payment.Status.of.Previous.Credit”

It is categorical data. It contains 5 categories.

0: no credits taken

1: all credits at this bank paid back duly

2: existing credits paid back duly till now

3: delay in paying off in the past

4: critical account

Payment.Status.of.Previous.Credit contains qualitative data, so measures of central tendency and dispersion do not make sense for it. A frequency table, the mode and a bar plot are used instead. The mode is the most frequent category of Payment.Status.of.Previous.Credit.

frequency table of Payment.Status.of.Previous.Credit

tab<-table(german_credit$Payment.Status.of.Previous.Credit)
tab
## 
##   0   1   2   3   4 
##  40  49 530  88 293
names(tab)
## [1] "0" "1" "2" "3" "4"

0: no credits taken

1: all credits at this bank paid back duly

2: existing credits paid back duly till now

3: delay in paying off in the past

4: critical account

mode of Payment.Status.of.Previous.Credit. It gives the category with the highest frequency.

temp <- table(as.vector(german_credit$Payment.Status.of.Previous.Credit))
names(temp)[temp == max(temp)]
## [1] "2"

The mode of Payment.Status.of.Previous.Credit is 2 (existing credits paid back duly till now).

ggplot of Payment.Status.of.Previous.Credit

library("ggplot2")
# Bar chart of the number of customers in each category (codes treated as categories, not numbers)
ggplot(german_credit, aes(x = factor(Payment.Status.of.Previous.Credit))) + geom_bar(fill = "purple") +
  labs(title = "Payment.Status.of.Previous.Credit", x = "category", y = "customers")

output description:

0: 40 customers have taken no credits or have paid back all credits duly.

1: 49 customers have paid back all credits at this bank duly.

2: 530 customers have paid back their existing credits duly till now.

3: 88 customers had delays in paying off in the past.

4: 293 customers have a critical account or other credits existing (not at this bank).

correlation between Payment.Status.of.Previous.Credit and creditability

library("ltm")
 biserial.cor(german_credit$Payment.Status.of.Previous.Credit,german_credit$Creditability)
## [1] -0.2286703
library(vcd)
## Warning: package 'vcd' was built under R version 3.3.2
## Loading required package: grid
contin_table<-table(german_credit$Payment.Status.of.Previous.Credit,german_credit$Creditability)
contin_table
##    
##       0   1
##   0  25  15
##   1  28  21
##   2 169 361
##   3  28  60
##   4  50 243
assocstats(contin_table)
##                     X^2 df   P(> X^2)
## Likelihood Ratio 60.467  4 2.3139e-12
## Pearson          61.691  4 1.2792e-12
## 
## Phi-Coefficient   : NA 
## Contingency Coeff.: 0.241 
## Cramer's V        : 0.248
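The association statistics above can be reproduced in base R from the Pearson chi-squared statistic. A minimal sketch, assuming the counts are typed in from the contingency table above (payment_table and the other names are illustrative):

```r
# Counts copied from the contingency table above; matrix() fills column by column
payment_table <- matrix(
  c(25, 28, 169, 28, 50,     # Creditability = 0
    15, 21, 361, 60, 243),   # Creditability = 1
  ncol = 2,
  dimnames = list(Payment.Status = as.character(0:4),
                  Creditability = c("0", "1"))
)

# Pearson chi-squared statistic
chi_sq <- unname(chisq.test(payment_table, correct = FALSE)$statistic)

n <- sum(payment_table)            # number of observations
k <- min(dim(payment_table)) - 1   # smaller table dimension minus 1

# Cramer's V and the contingency coefficient
cramers_v <- sqrt(chi_sq / (n * k))
contingency_coeff <- sqrt(chi_sq / (chi_sq + n))

print(round(c(chi_sq, cramers_v, contingency_coeff), 3))
```

The Phi-Coefficient is NA in the assocstats output because it is only reported for 2 x 2 tables.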

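biserial.cor returns -0.23. By default biserial.cor takes the first level of Creditability (0) as its reference group, so the negative sign means that customers with Creditability 0 have lower payment-status codes on average. The contingency table agrees: 38% of customers in category 0 have Creditability 1, against 83% in category 4. Because the variable is categorical, Cramer's V (0.248) and the chi-squared test (p = 1.3e-12) are the more suitable measures, and they show a significant association with creditability.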

library("gmodels")
## Warning: package 'gmodels' was built under R version 3.3.2
CrossTable(german_credit$Creditability, german_credit$Payment.Status.of.Previous.Credit, digits=1,prop.r=F, prop.t=F, prop.chisq=F, chisq=T)
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |           N / Col Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  1000 
## 
##  
##                             | german_credit$Payment.Status.of.Previous.Credit 
## german_credit$Creditability |         0 |         1 |         2 |         3 |         4 | Row Total | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|-----------|
##                           0 |        25 |        28 |       169 |        28 |        50 |       300 | 
##                             |       0.6 |       0.6 |       0.3 |       0.3 |       0.2 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|-----------|
##                           1 |        15 |        21 |       361 |        60 |       243 |       700 | 
##                             |       0.4 |       0.4 |       0.7 |       0.7 |       0.8 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|-----------|
##                Column Total |        40 |        49 |       530 |        88 |       293 |      1000 | 
##                             |       0.0 |       0.0 |       0.5 |       0.1 |       0.3 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|-----------|
## 
##  
## Statistics for All Table Factors
## 
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 =  61.6914     d.f. =  4     p =  1.279187e-12 
## 
## 
## 
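CrossTable prints the column proportions with one decimal only. The share of good credit risks in each category can be computed with more precision using prop.table. A minimal sketch, assuming the counts are typed in from the table above (payment_table and good_share are illustrative names):

```r
# Counts copied from the cross table above: rows are payment status 0 to 4
payment_table <- matrix(
  c(25, 28, 169, 28, 50,     # Creditability = 0
    15, 21, 361, 60, 243),   # Creditability = 1
  ncol = 2,
  dimnames = list(Payment.Status = as.character(0:4),
                  Creditability = c("0", "1"))
)

# Proportions within each row (each payment status category)
row_shares <- prop.table(payment_table, margin = 1)

# Share of customers with Creditability = 1 in each category
good_share <- row_shares[, "1"]
print(round(good_share, 3))
```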

variable5=“Purpose”

Purpose is qualitative data. It has 11 category codes, of which code 7 does not occur in the data.

0 : car (new)

1 : car (used)

2 : furniture/equipment

3 : radio/television

4 : domestic appliances

5 : repairs

6 : education

7 : (vacation - does not exist?)

8 : retraining

9 : business

10 : others

Purpose contains qualitative data, so measures of central tendency and dispersion do not make sense for it. A frequency table, the mode and a bar plot are used instead. The mode is the most frequent category of Purpose.

frequency table of purpose

tab<-table(german_credit$Purpose)
tab
## 
##   0   1   2   3   4   5   6   8   9  10 
## 234 103 181 280  12  22  50   9  97  12
names(tab)
##  [1] "0"  "1"  "2"  "3"  "4"  "5"  "6"  "8"  "9"  "10"

0 : car (new)

1 : car (used)

2 : furniture/equipment

3 : radio/television

4 : domestic appliances

5 : repairs

6 : education

7 : (vacation - does not exist?)

8 : retraining

9 : business

10 : others

mode of Purpose. It gives the category with the highest frequency.

temp <- table(as.vector(german_credit$Purpose))
names(temp)[temp == max(temp)]
## [1] "3"

The mode of Purpose is 3 (radio/television).

ggplot of purpose

library("ggplot2")
# Bar chart of the number of customers in each category (codes treated as categories, not numbers)
ggplot(german_credit, aes(x = factor(Purpose))) + geom_bar(fill = "purple") +
  labs(title = "Purpose", x = "category", y = "customers")

correlation between purpose and creditability

library("ltm")
 biserial.cor(german_credit$Purpose,german_credit$Creditability)
## [1] 0.01796988
library(vcd)
contin_table<-table(german_credit$Purpose,german_credit$Creditability)
contin_table
##     
##        0   1
##   0   89 145
##   1   17  86
##   2   58 123
##   3   62 218
##   4    4   8
##   5    8  14
##   6   22  28
##   8    1   8
##   9   34  63
##   10   5   7
assocstats(contin_table)
##                     X^2 df   P(> X^2)
## Likelihood Ratio 34.510  9 7.2688e-05
## Pearson          33.356  9 1.1575e-04
## 
## Phi-Coefficient   : NA 
## Contingency Coeff.: 0.18 
## Cramer's V        : 0.183

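biserial.cor returns 0.018, but Purpose is a nominal variable whose codes have no order, so this number has no useful interpretation. Cramer's V (0.183) and the chi-squared test (p = 1.2e-04) indicate a weak but statistically significant association between Purpose and creditability.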

crosstable of purpose and creditability

library("gmodels")
CrossTable(german_credit$Creditability, german_credit$Purpose, digits=1,prop.r=F, prop.t=F, prop.chisq=F, chisq=T)
## Warning in chisq.test(t, correct = FALSE, ...): Chi-squared approximation
## may be incorrect
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |           N / Col Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  1000 
## 
##  
##                             | german_credit$Purpose 
## german_credit$Creditability |         0 |         1 |         2 |         3 |         4 |         5 |         6 |         8 |         9 |        10 | Row Total | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
##                           0 |        89 |        17 |        58 |        62 |         4 |         8 |        22 |         1 |        34 |         5 |       300 | 
##                             |       0.4 |       0.2 |       0.3 |       0.2 |       0.3 |       0.4 |       0.4 |       0.1 |       0.4 |       0.4 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
##                           1 |       145 |        86 |       123 |       218 |         8 |        14 |        28 |         8 |        63 |         7 |       700 | 
##                             |       0.6 |       0.8 |       0.7 |       0.8 |       0.7 |       0.6 |       0.6 |       0.9 |       0.6 |       0.6 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
##                Column Total |       234 |       103 |       181 |       280 |        12 |        22 |        50 |         9 |        97 |        12 |      1000 | 
##                             |       0.2 |       0.1 |       0.2 |       0.3 |       0.0 |       0.0 |       0.0 |       0.0 |       0.1 |       0.0 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
## 
##  
## Statistics for All Table Factors
## 
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 =  33.35645     d.f. =  9     p =  0.0001157491 
## 
## 
## 
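The warning "Chi-squared approximation may be incorrect" appears when some expected cell counts are small (a common rule of thumb is below 5). A minimal sketch that checks the expected counts and, as one possible alternative, computes a Monte Carlo p-value with the simulate.p.value option of chisq.test. The counts are typed in from the table above; purpose_table, small_cells and the seed are illustrative choices:

```r
# Counts copied from the cross table above; matrix() fills column by column
purpose_table <- matrix(
  c(89, 17, 58, 62, 4, 8, 22, 1, 34, 5,        # Creditability = 0
    145, 86, 123, 218, 8, 14, 28, 8, 63, 7),   # Creditability = 1
  ncol = 2,
  dimnames = list(Purpose = c("0", "1", "2", "3", "4", "5", "6", "8", "9", "10"),
                  Creditability = c("0", "1"))
)

# Ordinary chi-squared test; the warning is suppressed because we inspect its cause below
test_result <- suppressWarnings(chisq.test(purpose_table, correct = FALSE))

# Expected counts under independence, and how many of them are below 5
expected <- test_result$expected
small_cells <- sum(expected < 5)
print(round(expected, 1))
print(small_cells)

# Monte Carlo p-value, which does not rely on the chi-squared approximation
set.seed(1)
sim_result <- chisq.test(purpose_table, simulate.p.value = TRUE, B = 2000)
print(sim_result$p.value)
```

Here the small expected counts come from the rare categories 4, 8 and 10 in the Creditability = 0 column.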

variable6=“Credit.Amount”

It is numerical data. summary gives the minimum, the quartiles, the mean and the maximum of Credit.Amount.

summary(german_credit$Credit.Amount)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     250    1366    2320    3271    3972   18420
hist(german_credit$Credit.Amount)

boxplot of Credit.Amount

quantile(german_credit$Credit.Amount)
##       0%      25%      50%      75%     100% 
##   250.00  1365.50  2319.50  3972.25 18424.00
quantile(german_credit$Credit.Amount,c(0.75,0.80,0.90,1))
##      75%      80%      90%     100% 
##  3972.25  4720.00  7179.40 18424.00
boxplot(german_credit$Credit.Amount)

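output description

The median credit amount is 2319.5 and the middle half of the loans lies between 1365.5 and 3972.25. The maximum is 18424; summary above prints it as 18420 because it rounds to four significant digits. Note that outliers are discussed later.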
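By default boxplot draws points beyond 1.5 times the interquartile range from the box as outliers. The limits can be worked out from the quartiles printed above. A minimal sketch using those printed values (boxplot itself uses hinges, which can differ slightly from quantile; the variable names are illustrative):

```r
# Quartiles of Credit.Amount as printed by quantile above
q1 <- 1365.50
q3 <- 3972.25

# Interquartile range and the 1.5 * IQR fences
iqr <- q3 - q1
upper_fence <- q3 + 1.5 * iqr
lower_fence <- q1 - 1.5 * iqr

print(c(lower_fence, upper_fence))
```

The lower fence is negative, so no small loan is flagged. The 90% quantile (7179.40) is below the upper fence, so fewer than 10% of the loans lie above it.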

histogram of Credit.Amount

hist(german_credit$Credit.Amount)

correlation between Credit.Amount and response

library("ltm")
biserial.cor(german_credit$Credit.Amount,german_credit$Creditability)
## [1] 0.1546628

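biserial.cor returns 0.15. By default biserial.cor takes the first level of Creditability (0) as its reference group, so the positive sign means that customers with Creditability 0 have larger credit amounts on average; larger credit amounts therefore go with lower creditability. The association is weak.

t-test

t.test with a single sample tests whether the mean Credit.Amount differs from 0; the useful part of the output is the 95% confidence interval for the mean (3096 to 3446).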

t.test(german_credit$Credit.Amount)
## 
##  One Sample t-test
## 
## data:  german_credit$Credit.Amount
## t = 36.647, df = 999, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  3096.083 3446.413
## sample estimates:
## mean of x 
##  3271.248

variable7=“Value.Savings.Stocks”

Average balance in savings account

Average balance in savings account is qualitative data. It contains 5 categories.

1 : < 100 DM

2 : 100<= … < 500 DM

3 : 500<= … < 1000 DM

4 : >= 1000 DM

5 : unknown/ no savings account

DM: Deutsche Mark, the basic unit of money in Germany before the euro.

Average balance in savings account contains qualitative data, so measures of central tendency and dispersion do not make sense for it. A frequency table, the mode and a bar plot are used instead. The mode is the most frequent category of Average balance in savings account.

frequency table of Average balance in savings account

tab<-table(german_credit$Value.Savings.Stocks)
tab
## 
##   1   2   3   4   5 
## 603 103  63  48 183
names(tab)
## [1] "1" "2" "3" "4" "5"

1 : < 100 DM

2 : 100<= … < 500 DM

3 : 500<= … < 1000 DM

4 : >= 1000 DM

5 : unknown/ no savings account

mode of Value.Savings.Stocks. It gives the category with the highest frequency.

temp <- table(as.vector(german_credit$Value.Savings.Stocks))
names(temp)[temp == max(temp)]
## [1] "1"

mode of Value.Savings.Stocks is 1.

ggplot of Value.Savings.Stocks

library("ggplot2")
# Bar chart of the number of customers in each category (codes treated as categories, not numbers)
ggplot(german_credit, aes(x = factor(Value.Savings.Stocks))) + geom_bar(fill = "purple") +
  labs(title = "Value.Savings.Stocks", x = "category", y = "customers")

output description:

1: 603 people have a savings balance below 100 DM.

2: 103 people have a savings balance from 100 to below 500 DM.

3: 63 people have a savings balance from 500 to below 1000 DM.

4: 48 people have a savings balance of 1000 DM or more.

5: 183 people have an unknown balance or no savings account.

correlation between Value.Savings.Stocks and creditability

library("ltm")
 biserial.cor(german_credit$Value.Savings.Stocks,german_credit$Creditability)
## [1] -0.1788532
library(vcd)
contin_table<-table(german_credit$Value.Savings.Stocks,german_credit$Creditability)
contin_table
##    
##       0   1
##   1 217 386
##   2  34  69
##   3  11  52
##   4   6  42
##   5  32 151
assocstats(contin_table)
##                     X^2 df   P(> X^2)
## Likelihood Ratio 38.975  4 7.0491e-08
## Pearson          36.099  4 2.7612e-07
## 
## Phi-Coefficient   : NA 
## Contingency Coeff.: 0.187 
## Cramer's V        : 0.19

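biserial.cor returns -0.18. By default biserial.cor takes the first level of Creditability (0) as its reference group, so the negative sign means that customers with Creditability 0 have lower savings codes on average. The contingency table agrees: 64% of customers in category 1 have Creditability 1, against 87.5% in category 4. Because the codes are categories (5 means unknown/no savings account), Cramer's V (0.19) and the chi-squared test (p = 2.8e-07) are the more suitable measures, and they show a significant association with creditability.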

crosstable of Value.Savings.Stocks and creditability

library("gmodels")
CrossTable(german_credit$Creditability, german_credit$Value.Savings.Stocks, digits=1, prop.r=F, prop.t=F, prop.chisq=F, chisq=T)
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |           N / Col Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  1000 
## 
##  
##                             | german_credit$Value.Savings.Stocks 
## german_credit$Creditability |         1 |         2 |         3 |         4 |         5 | Row Total | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|-----------|
##                           0 |       217 |        34 |        11 |         6 |        32 |       300 | 
##                             |       0.4 |       0.3 |       0.2 |       0.1 |       0.2 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|-----------|
##                           1 |       386 |        69 |        52 |        42 |       151 |       700 | 
##                             |       0.6 |       0.7 |       0.8 |       0.9 |       0.8 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|-----------|
##                Column Total |       603 |       103 |        63 |        48 |       183 |      1000 | 
##                             |       0.6 |       0.1 |       0.1 |       0.0 |       0.2 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|-----------|
## 
##  
## Statistics for All Table Factors
## 
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 =  36.09893     d.f. =  4     p =  2.761214e-07 
## 
## 
## 

variable8 = “Length.of.current.employment”

It is qualitative data with 5 categories.

1 : unemployed

2: < 1 year

3 : 1 <= … < 4 years

4 : 4 <=… < 7 years

5 : >= 7 years

Length.of.current.employment contains qualitative data, so measures of central tendency and dispersion are not meaningful. A frequency table, the mode and a bar plot are used instead. The mode is the most frequent category of Length.of.current.employment.

frequency table of Length.of.current.employment

tab<-table(german_credit$Length.of.current.employment)
tab
## 
##   1   2   3   4   5 
##  62 172 339 174 253
names(tab)
## [1] "1" "2" "3" "4" "5"

1 : unemployed

2: < 1 year

3 : 1 <= … < 4 years

4 : 4 <=… < 7 years

5 : >= 7 years

mode of Length.of.current.employment. It is the most frequent category.

temp <- table(as.vector(german_credit$Length.of.current.employment))
names(temp)[temp == max(temp)]
## [1] "3"

The mode of Length.of.current.employment is 3 (1 <= … < 4 years, 339 applicants).
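
The same mode calculation is repeated for every categorical variable, so it can be wrapped in a small function. The sketch below rebuilds the variable from the frequency table above; `category_mode` is a hypothetical helper name, not part of the original analysis.

```r
# Rebuild the variable from the frequency table above.
employment <- rep(1:5, times = c(62, 172, 339, 174, 253))

# Hypothetical helper: returns every category that attains the top frequency.
category_mode <- function(x) {
  freq <- table(x)
  names(freq)[freq == max(freq)]
}

category_mode(employment)                  # most frequent category
round(prop.table(table(employment)), 3)    # share of applicants per category
```

Returning all categories that reach the maximum count keeps ties visible instead of silently picking one.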

ggplot of Length.of.current.employment

library("ggplot2", lib.loc="~/R/win-library/3.3")
qplot(data<-german_credit$Length.of.current.employment,main="Length.of.current.employment", ylab="employees", colour= I("purple"),size=I(5))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
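
The `stat_bin()` message appears because qplot() treats the numeric codes as a continuous variable and draws a histogram. For a categorical variable a bar chart with one bar per category is usually easier to read. A minimal sketch in base R, using the counts from the frequency table above; the category labels and the object names are illustrative.

```r
# Counts copied from the frequency table above, labelled by category.
counts <- c("unemployed" = 62, "< 1 year" = 172, "1 to < 4 years" = 339,
            "4 to < 7 years" = 174, ">= 7 years" = 253)

# One bar per category; barplot() returns the bar midpoints.
mids <- barplot(counts,
                main = "Length.of.current.employment",
                ylab = "customers",
                col = "purple", las = 2)
```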

correlation between Length.of.current.employment and creditability

library("ltm", lib.loc="~/R/win-library/3.3")
 biserial.cor(german_credit$Length.of.current.employment,german_credit$Creditability)
## [1] -0.115944
library(vcd)
contin_table<-table(german_credit$Length.of.current.employment,german_credit$Creditability)
contin_table
##    
##       0   1
##   1  23  39
##   2  70 102
##   3 104 235
##   4  39 135
##   5  64 189
assocstats(contin_table)
##                     X^2 df  P(> X^2)
## Likelihood Ratio 18.164  4 0.0011464
## Pearson          18.368  4 0.0010455
## 
## Phi-Coefficient   : NA 
## Contingency Coeff.: 0.134 
## Cramer's V        : 0.136

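The coefficient is -0.12. Its sign is relative to Creditability = 0, the first level, which biserial.cor() uses as the reference group by default. Applicants with Creditability = 0 therefore have shorter employment on average, and longer employment goes with good credit. The association is weak (Cramer's V = 0.136) but significant (chi-squared p = 0.001).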
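
Cramer's V and the contingency coefficient printed by assocstats() above can be recomputed from the Pearson chi-squared statistic. A minimal sketch in base R, with the counts copied from the contingency table above; the object names are illustrative.

```r
# Rows: Length.of.current.employment 1..5, columns: Creditability 0/1.
tab <- matrix(c(23, 70, 104, 39, 64,
                39, 102, 235, 135, 189),
              nrow = 5,
              dimnames = list(employment = 1:5, credit = 0:1))

test <- chisq.test(tab)          # Pearson chi-squared test of independence
chi2 <- unname(test$statistic)
n    <- sum(tab)

# Cramer's V: chi-squared scaled by sample size and the smaller table dimension.
cramers_v <- sqrt(chi2 / (n * (min(dim(tab)) - 1)))
# Contingency coefficient.
cont_coef <- sqrt(chi2 / (chi2 + n))

round(c(chi2 = chi2, p = test$p.value, V = cramers_v, C = cont_coef), 4)
```

Unlike the correlation on the numeric codes, both measures ignore the order of the categories and are always non-negative.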

crosstable of Length.of.current.employment and creditability

library("gmodels", lib.loc="~/R/win-library/3.3")
CrossTable(german_credit$Creditability, german_credit$Length.of.current.employment, digits=1, prop.r=F, prop.t=F, prop.chisq=F, chisq=T)
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |           N / Col Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  1000 
## 
##  
##                             | german_credit$Length.of.current.employment 
## german_credit$Creditability |         1 |         2 |         3 |         4 |         5 | Row Total | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|-----------|
##                           0 |        23 |        70 |       104 |        39 |        64 |       300 | 
##                             |       0.4 |       0.4 |       0.3 |       0.2 |       0.3 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|-----------|
##                           1 |        39 |       102 |       235 |       135 |       189 |       700 | 
##                             |       0.6 |       0.6 |       0.7 |       0.8 |       0.7 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|-----------|
##                Column Total |        62 |       172 |       339 |       174 |       253 |      1000 | 
##                             |       0.1 |       0.2 |       0.3 |       0.2 |       0.3 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|-----------|
## 
##  
## Statistics for All Table Factors
## 
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 =  18.36827     d.f. =  4     p =  0.001045452 
## 
## 
## 

variable9 =“Instalment.per.cent”

Instalment rate as a percentage of disposable income. It is coded as qualitative data with 4 categories.

frequency table of Instalment.per.cent

tab<-table(german_credit$Instalment.per.cent)
tab
## 
##   1   2   3   4 
## 136 231 157 476
names(tab)
## [1] "1" "2" "3" "4"

mode of Instalment.per.cent. It is the most frequent category.

temp <- table(as.vector(german_credit$Instalment.per.cent))
names(temp)[temp == max(temp)]
## [1] "4"

The mode of Instalment.per.cent is 4 (476 applicants).

ggplot of Instalment.per.cent

library("ggplot2", lib.loc="~/R/win-library/3.3")
qplot(data<-german_credit$Instalment.per.cent,main="Instalment.per.cent", ylab="customers", colour= I("purple"),size=I(5))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

correlation between Instalment.per.cent and creditability

library("ltm", lib.loc="~/R/win-library/3.3")
 biserial.cor(german_credit$Instalment.per.cent,german_credit$Creditability)
## [1] 0.07236773
library(vcd)
contin_table<-table(german_credit$Instalment.per.cent,german_credit$Creditability)
contin_table
##    
##       0   1
##   1  34 102
##   2  62 169
##   3  45 112
##   4 159 317
assocstats(contin_table)
##                     X^2 df P(> X^2)
## Likelihood Ratio 5.5065  3  0.13825
## Pearson          5.4768  3  0.14003
## 
## Phi-Coefficient   : NA 
## Contingency Coeff.: 0.074 
## Cramer's V        : 0.074

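The coefficient is 0.072. Its sign is relative to Creditability = 0, the first level, which biserial.cor() uses as the reference group by default, so higher Instalment.per.cent categories go slightly more often with bad credit. The association is very weak (Cramer's V = 0.074) and not significant at the 5% level (chi-squared p = 0.14).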
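
The direction of the association is easier to read from the share of bad credit within each category than from the coefficient. A minimal sketch in base R, with the counts copied from the contingency table above; the object names are illustrative.

```r
# Rows: Instalment.per.cent 1..4, columns: Creditability 0/1.
tab <- matrix(c(34, 62, 45, 159,
                102, 169, 112, 317),
              nrow = 4,
              dimnames = list(instalment = 1:4, credit = c("0", "1")))

# Row proportions: within each category, the share of
# applicants with Creditability 0 and 1.
shares <- prop.table(tab, margin = 1)
round(shares, 3)

# Share of bad credit (Creditability = 0) by category.
bad_share <- shares[, "0"]
```

The share of bad credit rises from category 1 to category 4, but only by a few percentage points per step, which fits the small coefficient and the non-significant test.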

crosstable of Instalment.per.cent and creditability

library("gmodels", lib.loc="~/R/win-library/3.3")
CrossTable(german_credit$Creditability, german_credit$Instalment.per.cent, digits=1, prop.r=F, prop.t=F, prop.chisq=F, chisq=T)
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |           N / Col Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  1000 
## 
##  
##                             | german_credit$Instalment.per.cent 
## german_credit$Creditability |         1 |         2 |         3 |         4 | Row Total | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|
##                           0 |        34 |        62 |        45 |       159 |       300 | 
##                             |       0.2 |       0.3 |       0.3 |       0.3 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|
##                           1 |       102 |       169 |       112 |       317 |       700 | 
##                             |       0.8 |       0.7 |       0.7 |       0.7 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|
##                Column Total |       136 |       231 |       157 |       476 |      1000 | 
##                             |       0.1 |       0.2 |       0.2 |       0.5 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|
## 
##  
## Statistics for All Table Factors
## 
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 =  5.476792     d.f. =  3     p =  0.1400333 
## 
## 
## 

variable10 = “Sex...Marital.Status”

Variable 10 is personal status and sex. It is qualitative data with 4 categories.

1 : male : divorced/separated

2 : female : divorced/separated/married

3 : male : single

4 : male : married/widowed

frequency table of Sex...Marital.Status

tab<-table(german_credit$Sex...Marital.Status)
tab
## 
##   1   2   3   4 
##  50 310 548  92
names(tab)
## [1] "1" "2" "3" "4"

1 : male : divorced/separated

2 : female : divorced/separated/married

3 : male : single

4 : male : married/widowed

mode of Sex...Marital.Status. It is the most frequent category.

temp <- table(as.vector(german_credit$Sex...Marital.Status))
names(temp)[temp == max(temp)]
## [1] "3"

The mode of Sex...Marital.Status is 3 (male : single, 548 applicants).

ggplot of Sex...Marital.Status

library("ggplot2", lib.loc="~/R/win-library/3.3")
qplot(data<-german_credit$Sex...Marital.Status,main="Sex...Marital.Status", ylab="customers", colour= I("purple"),size=I(5))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

output description:

1: 50 men are divorced/separated.

2: 310 women are divorced/separated/married.

3: 548 men are single.

4: 92 men are married/widowed.

correlation between Sex...Marital.Status and creditability

library("ltm", lib.loc="~/R/win-library/3.3")
 biserial.cor(german_credit$Sex...Marital.Status,german_credit$Creditability)
## [1] -0.0881402
library(vcd)
contin_table<-table(german_credit$Sex...Marital.Status,german_credit$Creditability)
contin_table
##    
##       0   1
##   1  20  30
##   2 109 201
##   3 146 402
##   4  25  67
assocstats(contin_table)
##                     X^2 df P(> X^2)
## Likelihood Ratio 9.4414  3 0.023963
## Pearson          9.6052  3 0.022238
## 
## Phi-Coefficient   : NA 
## Contingency Coeff.: 0.098 
## Cramer's V        : 0.098

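The coefficient is -0.088, but Sex...Marital.Status is a nominal variable, so the sign and size of a correlation computed on its numeric codes depend on an arbitrary coding. The chi-squared test is the more meaningful result here: the association is significant at the 5% level (p = 0.022) but very weak (Cramer's V = 0.098).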
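
For a nominal variable the numeric codes are arbitrary labels, which can be checked by recoding the categories and recomputing both statistics. The sketch below rebuilds the data from the contingency table above and swaps codes 1 and 3 as an illustrative recoding. It uses cor() with Creditability coded 0/1, so its sign is the opposite of the biserial.cor() value above.

```r
# Rows: Sex...Marital.Status 1..4, columns: Creditability 0/1.
counts <- matrix(c(20, 109, 146, 25,
                   30, 201, 402, 67),
                 nrow = 4)
status <- rep(rep(1:4, times = 2), times = as.vector(counts))
credit <- rep(rep(0:1, each = 4), times = as.vector(counts))

# Same categories, different (equally arbitrary) codes: swap 1 and 3.
recoded <- c(3, 2, 1, 4)[status]

r_original <- cor(status, credit)
r_recoded  <- cor(recoded, credit)

# The chi-squared statistic ignores the codes and stays the same.
chi_original <- unname(chisq.test(table(status, credit))$statistic)
chi_recoded  <- unname(chisq.test(table(recoded, credit))$statistic)

round(c(r_original = r_original, r_recoded = r_recoded,
        chi_original = chi_original, chi_recoded = chi_recoded), 4)
```

The correlation changes sign under the recoding while the chi-squared statistic is unchanged, which is why the chi-squared test and Cramer's V are the safer summary for this variable.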

crosstable of Sex...Marital.Status and creditability

library("gmodels", lib.loc="~/R/win-library/3.3")
CrossTable(german_credit$Creditability, german_credit$Sex...Marital.Status, digits=1, prop.r=F, prop.t=F, prop.chisq=F, chisq=T)
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |           N / Col Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  1000 
## 
##  
##                             | german_credit$Sex...Marital.Status 
## german_credit$Creditability |         1 |         2 |         3 |         4 | Row Total | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|
##                           0 |        20 |       109 |       146 |        25 |       300 | 
##                             |       0.4 |       0.4 |       0.3 |       0.3 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|
##                           1 |        30 |       201 |       402 |        67 |       700 | 
##                             |       0.6 |       0.6 |       0.7 |       0.7 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|
##                Column Total |        50 |       310 |       548 |        92 |      1000 | 
##                             |       0.0 |       0.3 |       0.5 |       0.1 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|
## 
##  
## Statistics for All Table Factors
## 
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 =  9.605214     d.f. =  3     p =  0.02223801 
## 
## 
## 

variable11 = “Guarantors”

It is qualitative data with 3 categories.

1 : none

2 : co-applicant

3 : guarantor

frequency table of Guarantors

tab<-table(german_credit$Guarantors)
tab
## 
##   1   2   3 
## 907  41  52
names(tab)
## [1] "1" "2" "3"

1 stands for none, 2 stands for co-applicant, 3 stands for guarantor.

mode of Guarantors. It is the most frequent category.

temp <- table(as.vector(german_credit$Guarantors))
names(temp)[temp == max(temp)]
## [1] "1"

The mode of Guarantors is 1 (none, 907 applicants).

ggplot of Guarantors

library("ggplot2", lib.loc="~/R/win-library/3.3")
qplot(data<-german_credit$Guarantors,main="Guarantors", ylab="customers", colour= I("purple"),size=I(5))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

output description:

1: 907 customers have no guarantor.

2: 41 customers have a co-applicant.

3: 52 customers have a guarantor.

correlation between Guarantors and creditability

library("ltm", lib.loc="~/R/win-library/3.3")
 biserial.cor(german_credit$Guarantors,german_credit$Creditability)
## [1] -0.0251242
library(vcd)
contin_table<-table(german_credit$Guarantors,german_credit$Creditability)
contin_table
##    
##       0   1
##   1 272 635
##   2  18  23
##   3  10  42
assocstats(contin_table)
##                     X^2 df P(> X^2)
## Likelihood Ratio 6.6501  2 0.035971
## Pearson          6.6454  2 0.036056
## 
## Phi-Coefficient   : NA 
## Contingency Coeff.: 0.081 
## Cramer's V        : 0.082

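The coefficient is -0.025, which is close to zero. Guarantors is a nominal variable, so the chi-squared test is more informative: the association is significant at the 5% level (p = 0.036) but very weak (Cramer's V = 0.082).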

crosstable of Guarantors and creditability

library("gmodels", lib.loc="~/R/win-library/3.3")
CrossTable(german_credit$Creditability, german_credit$Guarantors, digits=1, prop.r=F, prop.t=F, prop.chisq=F, chisq=T)
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |           N / Col Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  1000 
## 
##  
##                             | german_credit$Guarantors 
## german_credit$Creditability |         1 |         2 |         3 | Row Total | 
## ----------------------------|-----------|-----------|-----------|-----------|
##                           0 |       272 |        18 |        10 |       300 | 
##                             |       0.3 |       0.4 |       0.2 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|
##                           1 |       635 |        23 |        42 |       700 | 
##                             |       0.7 |       0.6 |       0.8 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|
##                Column Total |       907 |        41 |        52 |      1000 | 
##                             |       0.9 |       0.0 |       0.1 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|
## 
##  
## Statistics for All Table Factors
## 
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 =  6.645367     d.f. =  2     p =  0.03605595 
## 
## 
## 

variable12 = “Duration.in.Current.address”

It is categorical data with 4 categories.

1: <= 1 year

2: 1 < … <= 2 years

3: 2 < … <= 3 years

4: > 4 years

frequency table of Duration.in.Current.address

tab<-table(german_credit$Duration.in.Current.address)
tab
## 
##   1   2   3   4 
## 130 308 149 413
names(tab)
## [1] "1" "2" "3" "4"

1: <= 1 year

2: 1 < … <= 2 years

3: 2 < … <= 3 years

4: > 4 years

mode of Duration.in.Current.address. It is the most frequent category.

temp <- table(as.vector(german_credit$Duration.in.Current.address))
names(temp)[temp == max(temp)]
## [1] "4"

mode of Duration.in.Current.address is 4.

ggplot of Duration.in.Current.address

library("ggplot2", lib.loc="~/R/win-library/3.3")
qplot(data<-german_credit$Duration.in.Current.address,main="Duration.in.Current.address", ylab="customers", colour= I("purple"),size=I(5))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

correlation between Duration.in.Current.address and creditability

library("ltm", lib.loc="~/R/win-library/3.3")
 biserial.cor(german_credit$Duration.in.Current.address,german_credit$Creditability)
## [1] 0.002965675
library(vcd)
contin_table<-table(german_credit$Duration.in.Current.address,german_credit$Creditability)
contin_table
##    
##       0   1
##   1  36  94
##   2  97 211
##   3  43 106
##   4 124 289
assocstats(contin_table)
##                      X^2 df P(> X^2)
## Likelihood Ratio 0.75207  3  0.86089
## Pearson          0.74930  3  0.86155
## 
## Phi-Coefficient   : NA 
## Contingency Coeff.: 0.027 
## Cramer's V        : 0.027

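The coefficient is 0.003, which is essentially zero. The chi-squared test agrees (p = 0.86, Cramer's V = 0.027): there is no evidence that Duration.in.Current.address and creditability are associated.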

crosstable of Duration.in.Current.address and creditability

library("gmodels", lib.loc="~/R/win-library/3.3")
CrossTable(german_credit$Creditability, german_credit$Duration.in.Current.address, digits=1, prop.r=F, prop.t=F, prop.chisq=F, chisq=T)
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |           N / Col Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  1000 
## 
##  
##                             | german_credit$Duration.in.Current.address 
## german_credit$Creditability |         1 |         2 |         3 |         4 | Row Total | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|
##                           0 |        36 |        97 |        43 |       124 |       300 | 
##                             |       0.3 |       0.3 |       0.3 |       0.3 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|
##                           1 |        94 |       211 |       106 |       289 |       700 | 
##                             |       0.7 |       0.7 |       0.7 |       0.7 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|
##                Column Total |       130 |       308 |       149 |       413 |      1000 | 
##                             |       0.1 |       0.3 |       0.1 |       0.4 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|
## 
##  
## Statistics for All Table Factors
## 
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 =  0.7492964     d.f. =  3     p =  0.8615521 
## 
## 
## 

variable13=“Most.valuable.available.asset”

It is qualitative data with 4 categories.

1 : real estate

2 : if not A121 : building society savings agreement/life insurance

3 : if not A121/A122 : car or other, not in variable 7

4 : unknown / no property

frequency table of Most.valuable.available.asset

tab<-table(german_credit$Most.valuable.available.asset)
tab
## 
##   1   2   3   4 
## 282 232 332 154
names(tab)
## [1] "1" "2" "3" "4"

1 : real estate

2 : if not A121 : building society savings agreement/life insurance

3 : if not A121/A122 : car or other, not in variable 7

4 : unknown / no property

mode of Most.valuable.available.asset. It is the most frequent category.

temp <- table(as.vector(german_credit$Most.valuable.available.asset))
names(temp)[temp == max(temp)]
## [1] "3"

mode of Most.valuable.available.asset is 3.

ggplot of Most.valuable.available.asset

library("ggplot2", lib.loc="~/R/win-library/3.3")
qplot(data<-german_credit$Most.valuable.available.asset,main="Most.valuable.available.asset", ylab="customers", colour= I("purple"),size=I(5))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

correlation between Most.valuable.available.asset and creditability

library("ltm", lib.loc="~/R/win-library/3.3")
 biserial.cor(german_credit$Most.valuable.available.asset,german_credit$Creditability)
## [1] 0.1425406
library(vcd)
contin_table<-table(german_credit$Most.valuable.available.asset,german_credit$Creditability)
contin_table
##    
##       0   1
##   1  60 222
##   2  71 161
##   3 102 230
##   4  67  87
assocstats(contin_table)
##                     X^2 df   P(> X^2)
## Likelihood Ratio 23.546  3 3.1063e-05
## Pearson          23.720  3 2.8584e-05
## 
## Phi-Coefficient   : NA 
## Contingency Coeff.: 0.152 
## Cramer's V        : 0.154

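The coefficient is 0.14. Its sign is relative to Creditability = 0, the first level, which biserial.cor() uses as the reference group by default, so higher Most.valuable.available.asset codes go with bad credit: the share of bad credit is 60/282 = 21% for real estate (1) and 67/154 = 44% for unknown / no property (4). The association is weak (Cramer's V = 0.154) but significant (chi-squared p = 2.9e-05).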
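
If the four asset categories are read as ordered (an assumption, since the coding is only roughly ordinal), a chi-squared test for trend in proportions is a natural companion to the overall test. A minimal sketch with prop.test() and prop.trend.test() from base R, using the counts from the contingency table above and scores 1 to 4.

```r
good  <- c(222, 161, 230, 87)    # Creditability = 1 per category 1..4
total <- c(282, 232, 332, 154)   # applicants per category

# Overall test of equal proportions across the four categories (3 df).
overall <- prop.test(good, total)
# Chi-squared test for a linear trend across the scores 1..4 (1 df).
trend <- prop.trend.test(good, total, score = 1:4)

c(overall_chi2 = unname(overall$statistic),
  trend_chi2   = unname(trend$statistic),
  trend_p      = trend$p.value)
```

The overall statistic matches the Pearson chi-squared value reported above; the trend statistic shows how much of it is explained by a steady fall in the share of good credit from category 1 to category 4.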

crosstable of Most.valuable.available.asset and creditability

library("gmodels", lib.loc="~/R/win-library/3.3")
CrossTable(german_credit$Creditability, german_credit$Most.valuable.available.asset, digits=1, prop.r=F, prop.t=F, prop.chisq=F, chisq=T)
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |           N / Col Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  1000 
## 
##  
##                             | german_credit$Most.valuable.available.asset 
## german_credit$Creditability |         1 |         2 |         3 |         4 | Row Total | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|
##                           0 |        60 |        71 |       102 |        67 |       300 | 
##                             |       0.2 |       0.3 |       0.3 |       0.4 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|
##                           1 |       222 |       161 |       230 |        87 |       700 | 
##                             |       0.8 |       0.7 |       0.7 |       0.6 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|
##                Column Total |       282 |       232 |       332 |       154 |      1000 | 
##                             |       0.3 |       0.2 |       0.3 |       0.2 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|
## 
##  
## Statistics for All Table Factors
## 
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 =  23.71955     d.f. =  3     p =  2.858442e-05 
## 
## 
## 

variable14 = “Age..years.”

It is numerical data.

head(german_credit$Age..years.)
## [1] 21 36 23 39 38 48

Univariate Analysis of Age..years.

Central tendencies of Age..years.

mean of Age..years.

mean(german_credit$Age..years.)
## [1] 35.542

median of Age..years.

median(german_credit$Age..years.)
## [1] 33

Dispersion of Age..years.

Variance of Age..years.

var(german_credit$Age..years.)
## [1] 128.8831

Standard deviation of Age..years.

sd(german_credit$Age..years.)
## [1] 11.35267

summary gives the minimum, first quartile, median, mean, third quartile and maximum of Age..years.

summary(german_credit$Age..years.)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   19.00   27.00   33.00   35.54   42.00   75.00

boxplot of Age..years.

quantile(german_credit$Age..years.)
##   0%  25%  50%  75% 100% 
##   19   27   33   42   75
quantile(german_credit$Age..years.,c(0.75,0.80,0.90,1))
##  75%  80%  90% 100% 
##   42   44   52   75
boxplot(german_credit$Age..years.)

output description

In this boxplot the minimum is 19, the maximum is 75 and the median is 33. The first quartile is 27 and the third quartile is 42. Note that outliers are discussed later.
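
By default boxplot() draws the whiskers to the most extreme observations within 1.5 times the interquartile range of the box and plots anything beyond as individual points. The limits can be worked out from the quartiles above. This is a sketch under one assumption: boxplot() uses hinges, which may differ slightly from the quantile() values used here.

```r
q1 <- 27                          # first quartile of Age..years.
q3 <- 42                          # third quartile of Age..years.
iqr <- q3 - q1                    # interquartile range

lower_fence <- q1 - 1.5 * iqr     # points below this are drawn as outliers
upper_fence <- q3 + 1.5 * iqr     # points above this are drawn as outliers

c(iqr = iqr, lower_fence = lower_fence, upper_fence = upper_fence)
```

The maximum age of 75 lies above the upper limit and the minimum of 19 lies above the lower limit, so any outliers in the plot are on the upper side only.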

histogram of Age..years.

hist(german_credit$Age..years.)

correlation between Age..years. and Creditability

library("ltm", lib.loc="~/R/win-library/3.3")
biserial.cor(german_credit$Age..years.,german_credit$Creditability)
## [1] -0.0912263

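The coefficient is -0.091. Its sign is relative to Creditability = 0, the first level, which biserial.cor() uses as the reference group by default, so applicants with bad credit are slightly younger on average. The association is weak.

t-test

This is a one-sample t-test of Age..years. against a mean of 0. The test result itself is of little interest; the useful part of the output is the 95% confidence interval for the mean age.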

t.test(german_credit$Age..years.)
## 
##  One Sample t-test
## 
## data:  german_credit$Age..years.
## t = 99.002, df = 999, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  34.83751 36.24649
## sample estimates:
## mean of x 
##    35.542
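
The t statistic and the confidence interval above can be reproduced from the mean, standard deviation and sample size reported earlier. A minimal sketch in base R; the object names are illustrative.

```r
n <- 1000        # number of applicants
m <- 35.542      # mean of Age..years.
s <- 11.35267    # standard deviation of Age..years.

se     <- s / sqrt(n)                  # standard error of the mean
t_stat <- (m - 0) / se                 # test statistic against a mean of 0
t_crit <- qt(0.975, df = n - 1)        # two-sided 95% critical value
ci     <- m + c(-1, 1) * t_crit * se   # 95% confidence interval

round(c(t = t_stat, lower = ci[1], upper = ci[2]), 3)
```

The interval is narrow because the standard error shrinks with the square root of the sample size; it describes the mean age, not the spread of individual ages.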

variable15=“Concurrent.Credits”

It is qualitative data with 3 categories.

1 : bank

2 : stores

3 : none

frequency table of Concurrent.Credits

tab<-table(german_credit$Concurrent.Credits)
tab
## 
##   1   2   3 
## 139  47 814
names(tab)
## [1] "1" "2" "3"

1 : bank

2 : stores

3 : none

mode of Concurrent.Credits. It is the most frequent category.

temp <- table(as.vector(german_credit$Concurrent.Credits))
names(temp)[temp == max(temp)]
## [1] "3"

mode of Concurrent.Credits is 3.

ggplot of Concurrent.Credits

library("ggplot2", lib.loc="~/R/win-library/3.3")
qplot(data<-german_credit$Concurrent.Credits,main="Concurrent.Credits", ylab="customers", colour= I("purple"),size=I(5))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

output description:

1: 139 customers have concurrent credits with a bank.

2: 47 customers have concurrent credits with stores.

3: 814 customers have no concurrent credits.

correlation between Concurrent.Credits and creditability

library("ltm", lib.loc="~/R/win-library/3.3")
 biserial.cor(german_credit$Concurrent.Credits,german_credit$Creditability)
## [1] -0.1097892
library(vcd)
contin_table<-table(german_credit$Concurrent.Credits,german_credit$Creditability)
contin_table
##    
##       0   1
##   1  57  82
##   2  19  28
##   3 224 590
assocstats(contin_table)
##                     X^2 df  P(> X^2)
## Likelihood Ratio 12.303  2 0.0021298
## Pearson          12.839  2 0.0016293
## 
## Phi-Coefficient   : NA 
## Contingency Coeff.: 0.113 
## Cramer's V        : 0.113

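The coefficient is -0.11. Its sign is relative to Creditability = 0, the first level, which biserial.cor() uses as the reference group by default, so applicants with bad credit more often fall in the lower codes (bank, stores), and having no concurrent credits (3) goes with good credit. The association is weak (Cramer's V = 0.113) but significant (chi-squared p = 0.0016).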

crosstable of Concurrent.Credits and creditability

library("gmodels", lib.loc="~/R/win-library/3.3")
CrossTable(german_credit$Creditability, german_credit$Concurrent.Credits, digits=1, prop.r=F, prop.t=F, prop.chisq=F, chisq=T)
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |           N / Col Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  1000 
## 
##  
##                             | german_credit$Concurrent.Credits 
## german_credit$Creditability |         1 |         2 |         3 | Row Total | 
## ----------------------------|-----------|-----------|-----------|-----------|
##                           0 |        57 |        19 |       224 |       300 | 
##                             |       0.4 |       0.4 |       0.3 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|
##                           1 |        82 |        28 |       590 |       700 | 
##                             |       0.6 |       0.6 |       0.7 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|
##                Column Total |       139 |        47 |       814 |      1000 | 
##                             |       0.1 |       0.0 |       0.8 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|
## 
##  
## Statistics for All Table Factors
## 
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 =  12.83919     d.f. =  2     p =  0.001629318 
## 
## 
## 

variable16 = “Type.of.apartment”

It is qualitative data with 3 categories.

1 : rent

2 : own

3 : for free

frequency table of Type.of.apartment

tab<-table(german_credit$Type.of.apartment)
tab
## 
##   1   2   3 
## 179 714 107
names(tab)
## [1] "1" "2" "3"

1 : rent

2 : own

3 : for free

mode of Type.of.apartment. It is the most frequent category.

temp <- table(as.vector(german_credit$Type.of.apartment))
names(temp)[temp == max(temp)]
## [1] "2"

The mode of Type.of.apartment is 2 (own, 714 applicants).

ggplot of Type.of.apartment

library("ggplot2", lib.loc="~/R/win-library/3.3")
qplot(data<-german_credit$Type.of.apartment,main="Type.of.apartment", ylab="customers", colour= I("purple"),size=I(5))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

output description:

1: 179 customers live in rented housing.

2: 714 customers live in their own housing.

3: 107 customers live in rent-free housing.

correlation between Type.of.apartment and creditability

library("ltm", lib.loc="~/R/win-library/3.3")
 biserial.cor(german_credit$Type.of.apartment,german_credit$Creditability)
## [1] -0.01810985
library(vcd)
contin_table<-table(german_credit$Type.of.apartment,german_credit$Creditability)
contin_table
##    
##       0   1
##   1  70 109
##   2 186 528
##   3  44  63
assocstats(contin_table)
##                     X^2 df   P(> X^2)
## Likelihood Ratio 18.129  2 1.1573e-04
## Pearson          18.674  2 8.8103e-05
## 
## Phi-Coefficient   : NA 
## Contingency Coeff.: 0.135 
## Cramer's V        : 0.137

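The coefficient is -0.018, which is close to zero, yet the chi-squared test is significant (p = 8.8e-05, Cramer's V = 0.137). Type.of.apartment is nominal and the pattern does not follow the order of the codes: owners (2) have a lower share of bad credit (186/714 = 26%) than renters (70/179 = 39%) or customers living rent-free (44/107 = 41%), which a linear correlation cannot capture.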
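
To see which categories drive the chi-squared result, the standardized residuals of the test can be inspected alongside the row shares. A minimal sketch in base R, with the counts copied from the contingency table above; the row labels follow the category list and the object names are illustrative.

```r
# Rows: Type.of.apartment, columns: Creditability 0/1.
tab <- matrix(c(70, 186, 44,
                109, 528, 63),
              nrow = 3,
              dimnames = list(apartment = c("rent", "own", "free"),
                              credit = c("0", "1")))

test <- chisq.test(tab)

round(prop.table(tab, margin = 1), 3)   # share of bad/good credit per category
round(test$stdres, 2)                   # standardized residuals per cell
```

Cells with a standardized residual well beyond about 2 in absolute value contribute most to the association; here owners have more good-credit cases than independence would predict, and the other two groups have fewer.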

crosstable of Type.of.apartment and creditability

library("gmodels", lib.loc="~/R/win-library/3.3")
CrossTable(german_credit$Creditability, german_credit$Type.of.apartment, digits=1, prop.r=F, prop.t=F, prop.chisq=F, chisq=T)
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |           N / Col Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  1000 
## 
##  
##                             | german_credit$Type.of.apartment 
## german_credit$Creditability |         1 |         2 |         3 | Row Total | 
## ----------------------------|-----------|-----------|-----------|-----------|
##                           0 |        70 |       186 |        44 |       300 | 
##                             |       0.4 |       0.3 |       0.4 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|
##                           1 |       109 |       528 |        63 |       700 | 
##                             |       0.6 |       0.7 |       0.6 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|
##                Column Total |       179 |       714 |       107 |      1000 | 
##                             |       0.2 |       0.7 |       0.1 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|
## 
##  
## Statistics for All Table Factors
## 
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 =  18.67401     d.f. =  2     p =  8.810311e-05 
## 
## 
## 

variable17= “No.of.Credits.at.this.Bank”

It is treated as qualitative data with 4 categories.

frequency table of No.of.Credits.at.this.Bank

tab<-table(german_credit$No.of.Credits.at.this.Bank)
tab
## 
##   1   2   3   4 
## 633 333  28   6
names(tab)
## [1] "1" "2" "3" "4"

mode of No.of.Credits.at.this.Bank. It is the most frequent category.

temp <- table(as.vector(german_credit$No.of.Credits.at.this.Bank))
names(temp)[temp == max(temp)]
## [1] "1"

mode of No.of.Credits.at.this.Bank is 1.

ggplot of No.of.Credits.at.this.Bank

library("ggplot2", lib.loc="~/R/win-library/3.3")
qplot(data<-german_credit$No.of.Credits.at.this.Bank,main="No.of.Credits.at.this.Bank", ylab="customers", colour= I("purple"),size=I(5))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

correlation between No.of.Credits.at.this.Bank and creditability

library("ltm", lib.loc="~/R/win-library/3.3")
 biserial.cor(german_credit$No.of.Credits.at.this.Bank,german_credit$Creditability)
## [1] -0.04570962
library(vcd)
contin_table<-table(german_credit$No.of.Credits.at.this.Bank,german_credit$Creditability)
contin_table
##    
##       0   1
##   1 200 433
##   2  92 241
##   3   6  22
##   4   2   4
assocstats(contin_table)
##                     X^2 df P(> X^2)
## Likelihood Ratio 2.7425  3  0.43304
## Pearson          2.6712  3  0.44514
## 
## Phi-Coefficient   : NA 
## Contingency Coeff.: 0.052 
## Cramer's V        : 0.052

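The coefficient is -0.046, which is close to zero. The chi-squared test is not significant (p = 0.45, Cramer's V = 0.052), so there is no evidence that No.of.Credits.at.this.Bank and creditability are associated.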

crosstable of No.of.Credits.at.this.Bank and creditability

library("gmodels", lib.loc="~/R/win-library/3.3")
CrossTable(german_credit$Creditability, german_credit$No.of.Credits.at.this.Bank, digits=1, prop.r=F, prop.t=F, prop.chisq=F, chisq=T)
## Warning in chisq.test(t, correct = FALSE, ...): Chi-squared approximation
## may be incorrect
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |           N / Col Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  1000 
## 
##  
##                             | german_credit$No.of.Credits.at.this.Bank 
## german_credit$Creditability |         1 |         2 |         3 |         4 | Row Total | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|
##                           0 |       200 |        92 |         6 |         2 |       300 | 
##                             |       0.3 |       0.3 |       0.2 |       0.3 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|
##                           1 |       433 |       241 |        22 |         4 |       700 | 
##                             |       0.7 |       0.7 |       0.8 |       0.7 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|
##                Column Total |       633 |       333 |        28 |         6 |      1000 | 
##                             |       0.6 |       0.3 |       0.0 |       0.0 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|
## 
##  
## Statistics for All Table Factors
## 
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 =  2.671198     d.f. =  3     p =  0.4451441 
## 
## 
## 
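
The warning "Chi-squared approximation may be incorrect" is raised when some expected cell counts are small. A minimal sketch of how one might inspect the expected counts and fall back on a simulated p-value, assuming the table printed above; `tab` is rebuilt by hand and the number of replicates `B = 5000` is an arbitrary choice.

```r
# Contingency table copied from the output above (rows: credits, columns: creditability)
tab <- matrix(c(200, 92, 6, 2, 433, 241, 22, 4), nrow = 4,
              dimnames = list(credits = 1:4, creditability = c(0, 1)))

# Expected counts under independence; values below 5 trigger the warning
res <- suppressWarnings(chisq.test(tab, correct = FALSE))
round(res$expected, 1)
sum(res$expected < 5)

# Monte Carlo p-value, which does not rely on the large-sample approximation
set.seed(1)
sim <- chisq.test(tab, simulate.p.value = TRUE, B = 5000)
sim$p.value
```

Merging the sparse categories 3 and 4 before testing would be another option.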

variable18 =“Occupation”

It is qualitative data. It contains 4 categories.

1 : unemployed/ unskilled - non-resident

2 : unskilled - resident

3 : skilled employee / official

4 : management/ self-employed/highly qualified employee/ officer

frequency table of Occupation

tab<-table(german_credit$Occupation)
tab
## 
##   1   2   3   4 
##  22 200 630 148
names(tab)
## [1] "1" "2" "3" "4"

1 : unemployed/ unskilled - non-resident

2 : unskilled - resident

3 : skilled employee / official

4 : management/ self-employed/highly qualified employee/ officer

mode of Occupation. It gives the most frequent value.

temp <- table(as.vector(german_credit$Occupation))
names(temp)[temp == max(temp)]
## [1] "3"

mode of Occupation is 3.

ggplot of Occupation

library(ggplot2)
# Pass the column directly; the original `data<-...` also created a stray variable named `data`
qplot(german_credit$Occupation, main="Occupation", ylab="customers", colour=I("purple"), size=I(5))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

output description:

1 : 22 customers are unemployed/ unskilled - non-resident

2 : 200 customers are unskilled - resident

3 : 630 customers are skilled employee / official

4 : 148 customers are management/ self-employed/highly qualified employee/ officer
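
Attaching the category names to the codes lets tables and plots show labels instead of 1 to 4. A minimal sketch, assuming the counts above; `occupation` is a stand-in vector rebuilt from the frequency table, and the shortened label strings are chosen here for illustration.

```r
# Stand-in data rebuilt from the frequency table (assumption: not the original CSV)
occupation <- rep(1:4, times = c(22, 200, 630, 148))

# factor() maps each numeric code to a readable label
occupation_f <- factor(
  occupation,
  levels = 1:4,
  labels = c("unemployed/unskilled non-resident",
             "unskilled resident",
             "skilled employee/official",
             "management/self-employed/highly qualified")
)
table(occupation_f)
```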

correlation between Occupation and creditability

library(ltm)
# biserial.cor() defaults to level = 1, so the sign refers to the first level of Creditability (0)
biserial.cor(german_credit$Occupation, german_credit$Creditability)
## [1] 0.03271863
library(vcd)
contin_table<-table(german_credit$Occupation,german_credit$Creditability)
contin_table
##    
##       0   1
##   1   7  15
##   2  56 144
##   3 186 444
##   4  51  97
assocstats(contin_table)
##                     X^2 df P(> X^2)
## Likelihood Ratio 1.8540  3  0.60326
## Pearson          1.8852  3  0.59658
## 
## Phi-Coefficient   : NA 
## Contingency Coeff.: 0.043 
## Cramer's V        : 0.043

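correlation is 0.033, which is very weak. biserial.cor takes the first level of Creditability (0) as reference by default, so the positive sign means that customers with bad credit have a slightly higher Occupation code on average. The chi-square test (p = 0.597) shows no significant association between Occupation and creditability.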
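
Cramer's V reported by `assocstats` can be reproduced from the Pearson chi-square statistic as V = sqrt(X^2 / (n * (min(rows, columns) - 1))). A minimal sketch in base R, assuming the contingency table printed above; `tab`, `chi2`, `k` and `v` are names introduced for illustration.

```r
# Contingency table copied from the output above (rows: occupation, columns: creditability)
tab <- matrix(c(7, 56, 186, 51, 15, 144, 444, 97), nrow = 4)

# Pearson chi-square statistic without continuity correction
chi2 <- unname(suppressWarnings(chisq.test(tab, correct = FALSE))$statistic)

n <- sum(tab)                          # total number of customers
k <- min(nrow(tab), ncol(tab)) - 1     # smaller table dimension minus one
v <- sqrt(chi2 / (n * k))
round(v, 3)
```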

crosstable of Occupation and creditability

library(gmodels)
CrossTable(german_credit$Creditability, german_credit$Occupation, digits=1, prop.r=FALSE, prop.t=FALSE, prop.chisq=FALSE, chisq=TRUE)
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |           N / Col Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  1000 
## 
##  
##                             | german_credit$Occupation 
## german_credit$Creditability |         1 |         2 |         3 |         4 | Row Total | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|
##                           0 |         7 |        56 |       186 |        51 |       300 | 
##                             |       0.3 |       0.3 |       0.3 |       0.3 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|
##                           1 |        15 |       144 |       444 |        97 |       700 | 
##                             |       0.7 |       0.7 |       0.7 |       0.7 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|
##                Column Total |        22 |       200 |       630 |       148 |      1000 | 
##                             |       0.0 |       0.2 |       0.6 |       0.1 |           | 
## ----------------------------|-----------|-----------|-----------|-----------|-----------|
## 
##  
## Statistics for All Table Factors
## 
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 =  1.885156     d.f. =  3     p =  0.5965816 
## 
## 
## 

variable19 =“No.of.dependents”

It is qualitative data. It contains 2 categories.

frequency table of No.of.dependents

tab<-table(german_credit$No.of.dependents)
tab
## 
##   1   2 
## 845 155
names(tab)
## [1] "1" "2"

mode of No.of.dependents. It gives the most frequent value.

temp <- table(as.vector(german_credit$No.of.dependents))
names(temp)[temp == max(temp)]
## [1] "1"

mode of No.of.dependents is 1.

ggplot of No.of.dependents

library(ggplot2)
# Pass the column directly; the original `data<-...` also created a stray variable named `data`
qplot(german_credit$No.of.dependents, main="No.of.dependents", ylab="customers", colour=I("purple"), size=I(5))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

correlation between No.of.dependents and creditability

library(ltm)
# biserial.cor() defaults to level = 1, so the sign refers to the first level of Creditability (0)
biserial.cor(german_credit$No.of.dependents, german_credit$Creditability)
## [1] -0.003013345
library(vcd)
contin_table<-table(german_credit$No.of.dependents,german_credit$Creditability)
contin_table
##    
##       0   1
##   1 254 591
##   2  46 109
assocstats(contin_table)
##                        X^2 df P(> X^2)
## Likelihood Ratio 0.0091047  1  0.92398
## Pearson          0.0090893  1  0.92405
## 
## Phi-Coefficient   : 0.003 
## Contingency Coeff.: 0.003 
## Cramer's V        : 0.003

correlation is -0.003, which is practically zero. The chi-square test (p = 0.924) shows no association between No.of.dependents and creditability.

crosstable of No.of.dependents and creditability

library(gmodels)
CrossTable(german_credit$Creditability, german_credit$No.of.dependents, digits=1, prop.r=FALSE, prop.t=FALSE, prop.chisq=FALSE, chisq=TRUE)
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |           N / Col Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  1000 
## 
##  
##                             | german_credit$No.of.dependents 
## german_credit$Creditability |         1 |         2 | Row Total | 
## ----------------------------|-----------|-----------|-----------|
##                           0 |       254 |        46 |       300 | 
##                             |       0.3 |       0.3 |           | 
## ----------------------------|-----------|-----------|-----------|
##                           1 |       591 |       109 |       700 | 
##                             |       0.7 |       0.7 |           | 
## ----------------------------|-----------|-----------|-----------|
##                Column Total |       845 |       155 |      1000 | 
##                             |       0.8 |       0.2 |           | 
## ----------------------------|-----------|-----------|-----------|
## 
##  
## Statistics for All Table Factors
## 
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 =  0.009089339     d.f. =  1     p =  0.9240463 
## 
## Pearson's Chi-squared test with Yates' continuity correction 
## ------------------------------------------------------------
## Chi^2 =  0     d.f. =  1     p =  1 
## 
## 
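
The Yates-corrected statistic is exactly 0 here because every observed count differs from its expected count by exactly 0.5, which the correction subtracts. A minimal sketch that reproduces this by hand, assuming the table printed above; `tab`, `expected`, `abs_dev` and `yates` are names introduced for illustration.

```r
# Contingency table copied from the output above (rows: dependents, columns: creditability)
tab <- matrix(c(254, 46, 591, 109), nrow = 2,
              dimnames = list(dependents = 1:2, creditability = c(0, 1)))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
abs_dev <- abs(tab - expected)   # 0.5 in every cell for this table

# Yates' correction subtracts min(0.5, |O - E|) from each absolute deviation
yates <- sum((abs_dev - pmin(0.5, abs_dev))^2 / expected)
yates

# Cross-check against the built-in test
chisq.test(tab, correct = TRUE)$statistic
```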

variable20 =“Telephone”

It is qualitative data. It contains 2 categories.

1 : none

2 : yes, registered under the customer's name

frequency table of Telephone

tab<-table(german_credit$Telephone)
tab
## 
##   1   2 
## 596 404
names(tab)
## [1] "1" "2"

1 stands for none, 2 stands for yes, registered under the customer's name.

mode of Telephone. It gives the most frequent value.

temp <- table(as.vector(german_credit$Telephone))
names(temp)[temp == max(temp)]
## [1] "1"

mode of Telephone is 1.

ggplot of Telephone

library(ggplot2)
# Pass the column directly; the original `data<-...` also created a stray variable named `data`
qplot(german_credit$Telephone, main="Telephone", ylab="customers", colour=I("purple"), size=I(5))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

correlation between Telephone and creditability

library(ltm)
# biserial.cor() defaults to level = 1, so the sign refers to the first level of Creditability (0)
biserial.cor(german_credit$Telephone, german_credit$Creditability)
## [1] -0.03644795
library(vcd)
contin_table<-table(german_credit$Telephone,german_credit$Creditability)
contin_table
##    
##       0   1
##   1 187 409
##   2 113 291
assocstats(contin_table)
##                     X^2 df P(> X^2)
## Likelihood Ratio 1.3359  1  0.24776
## Pearson          1.3298  1  0.24884
## 
## Phi-Coefficient   : 0.036 
## Contingency Coeff.: 0.036 
## Cramer's V        : 0.036

correlation is -0.036, which is very weak. The chi-square test (p = 0.249) shows no significant association between Telephone and creditability.

crosstable of Telephone and creditability

library(gmodels)
CrossTable(german_credit$Creditability, german_credit$Telephone, digits=1, prop.r=FALSE, prop.t=FALSE, prop.chisq=FALSE, chisq=TRUE)
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |           N / Col Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  1000 
## 
##  
##                             | german_credit$Telephone 
## german_credit$Creditability |         1 |         2 | Row Total | 
## ----------------------------|-----------|-----------|-----------|
##                           0 |       187 |       113 |       300 | 
##                             |       0.3 |       0.3 |           | 
## ----------------------------|-----------|-----------|-----------|
##                           1 |       409 |       291 |       700 | 
##                             |       0.7 |       0.7 |           | 
## ----------------------------|-----------|-----------|-----------|
##                Column Total |       596 |       404 |      1000 | 
##                             |       0.6 |       0.4 |           | 
## ----------------------------|-----------|-----------|-----------|
## 
##  
## Statistics for All Table Factors
## 
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 =  1.329783     d.f. =  1     p =  0.2488438 
## 
## Pearson's Chi-squared test with Yates' continuity correction 
## ------------------------------------------------------------
## Chi^2 =  1.172559     d.f. =  1     p =  0.2788762 
## 
## 
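
For a 2 x 2 table, the chi-square test with Yates' correction is equivalent to a two-sample test of proportions. A minimal sketch with base R `prop.test`, assuming the counts printed above; the vector names `good` and `total` and the labels `none` and `registered` are introduced for illustration.

```r
# Customers with good credit and group sizes, copied from the crosstable above
good  <- c(none = 409, registered = 291)
total <- c(none = 596, registered = 404)

# Share of good credit in each Telephone group
round(good / total, 3)

# Two-sample test of equal proportions with continuity correction
res <- prop.test(good, total, correct = TRUE)
res$statistic
res$p.value
```

The statistic and p-value should match the Yates-corrected lines of the CrossTable output.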

variable21= “Foreign.Worker”

It is qualitative data. It contains two categories.

1 : yes

2 : no

frequency table of Foreign.Worker

tab<-table(german_credit$Foreign.Worker)
tab
## 
##   1   2 
## 963  37
names(tab)
## [1] "1" "2"

1 stands for yes, 2 stands for no.

mode of Foreign.Worker. It gives the most frequent value.

temp <- table(as.vector(german_credit$Foreign.Worker))
names(temp)[temp == max(temp)]
## [1] "1"

mode of Foreign.Worker is 1.

ggplot of Foreign.Worker

library(ggplot2)
# Pass the column directly; the original `data<-...` also created a stray variable named `data`
qplot(german_credit$Foreign.Worker, main="Foreign.Worker", ylab="customers", colour=I("purple"), size=I(5))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

correlation between Foreign.Worker and creditability

library(ltm)
# The result is stored in `a` and not printed; biserial.cor() defaults to level = 1 (Creditability == 0)
a <- biserial.cor(german_credit$Foreign.Worker, german_credit$Creditability)

library(vcd)
contin_table<-table(german_credit$Foreign.Worker,german_credit$Creditability)
contin_table
##    
##       0   1
##   1 296 667
##   2   4  33
assocstats(contin_table)
##                     X^2 df  P(> X^2)
## Likelihood Ratio 8.0724  1 0.0044945
## Pearson          6.7370  1 0.0094431
## 
## Phi-Coefficient   : 0.082 
## Contingency Coeff.: 0.082 
## Cramer's V        : 0.082

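correlation is about -0.08, which is weak. biserial.cor takes the first level of Creditability (0) as reference by default, so the negative sign means that customers with bad credit are more often in category 1 of Foreign.Worker. The chi-square test (p = 0.009) shows a significant association at the 5% level, although only 37 customers fall in category 2 and Cramer's V is only 0.082.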
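
An odds ratio gives the size of this association in a more direct form. A minimal sketch with a Wald 95% confidence interval, assuming the contingency table printed above; `tab`, `odds_good`, `or`, `se_log_or` and `ci` are names introduced for illustration, and the Wald interval is only an approximation with a cell count as small as 4.

```r
# Contingency table copied from the output above (rows: Foreign.Worker, columns: creditability)
tab <- matrix(c(296, 4, 667, 33), nrow = 2,
              dimnames = list(foreign_worker = 1:2, creditability = c(0, 1)))

# Odds of good credit within each Foreign.Worker category
odds_good <- tab[, "1"] / tab[, "0"]

# Odds ratio of category 2 relative to category 1
or <- unname(odds_good["2"] / odds_good["1"])

# Wald interval on the log scale: SE = sqrt(sum of reciprocal cell counts)
se_log_or <- sqrt(sum(1 / tab))
ci <- exp(log(or) + c(-1, 1) * qnorm(0.975) * se_log_or)
round(c(odds_ratio = or, lower = ci[1], upper = ci[2]), 2)
```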

crosstable of Foreign.Worker and creditability

library(gmodels)
CrossTable(german_credit$Creditability, german_credit$Foreign.Worker, digits=1, prop.r=FALSE, prop.t=FALSE, prop.chisq=FALSE, chisq=TRUE)
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |           N / Col Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  1000 
## 
##  
##                             | german_credit$Foreign.Worker 
## german_credit$Creditability |         1 |         2 | Row Total | 
## ----------------------------|-----------|-----------|-----------|
##                           0 |       296 |         4 |       300 | 
##                             |       0.3 |       0.1 |           | 
## ----------------------------|-----------|-----------|-----------|
##                           1 |       667 |        33 |       700 | 
##                             |       0.7 |       0.9 |           | 
## ----------------------------|-----------|-----------|-----------|
##                Column Total |       963 |        37 |      1000 | 
##                             |       1.0 |       0.0 |           | 
## ----------------------------|-----------|-----------|-----------|
## 
##  
## Statistics for All Table Factors
## 
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 =  6.737044     d.f. =  1     p =  0.009443096 
## 
## Pearson's Chi-squared test with Yates' continuity correction 
## ------------------------------------------------------------
## Chi^2 =  5.821576     d.f. =  1     p =  0.01583075 
## 
##