Company Bankruptcy Prediction

Christopher Nindyo

2022-09-01

knitr::include_graphics('bank-bankrupt-creative-concept-banking-house-building-fenced-warning-line-signal-tape-inscription-caution-board-as-46776262.jpg')

Objective

The data were collected from the Taiwan Economic Journal for the years 1999 to 2009, and company bankruptcy was defined according to the business regulations of the Taiwan Stock Exchange. Using these data, we will predict whether or not a company goes bankrupt.
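Because the target is binary (bankrupt or not), logistic regression models the log-odds of bankruptcy as a linear function of the predictors and maps them to a probability with the logistic function. The sketch below illustrates the idea on synthetic data only. The predictor names debt_ratio and roa, the coefficients used to simulate the labels, and the 0.5 cut-off are assumptions for illustration, not the author's model.

```r
# Toy illustration on synthetic data (NOT the Taiwan dataset)
set.seed(42)
n <- 200
toy <- data.frame(debt_ratio = runif(n), roa = runif(n))

# Assumed data-generating rule: higher debt and lower ROA raise the odds of bankruptcy
log_odds <- -2 + 4 * toy$debt_ratio - 3 * toy$roa
toy$bankrupt <- rbinom(n, size = 1, prob = plogis(log_odds))

# Logistic regression: log(p / (1 - p)) = b0 + b1 * debt_ratio + b2 * roa
model <- glm(bankrupt ~ debt_ratio + roa, data = toy, family = binomial)

# type = "response" returns probabilities; an assumed 0.5 threshold turns them into classes
prob <- predict(model, type = "response")
pred_class <- ifelse(prob > 0.5, 1, 0)
table(predicted = pred_class, actual = toy$bankrupt)
```

With a heavily imbalanced target, the 0.5 threshold is only a starting point and is often tuned.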

Libraries

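library(dplyr)     # data manipulation
library(tidyverse) # collection of data-science packages (also attaches dplyr)
library(gmodels)   # CrossTable() for contingency tables
library(gtools)    # helper functions such as inv.logit()
library(class)     # knn() for k-nearest neighbours
library(caret)     # confusionMatrix() and other modelling utilities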

Logistic Regression

Data Import

I use the “Company Bankruptcy Prediction” dataset from Kaggle. You can also download it from here.

company <- read.csv('data.csv')  # Kaggle file saved as data.csv in the working directory
glimpse(company)                 # column names, types and first values
#> Rows: 6,819
#> Columns: 96
#> $ Bankrupt.                                               <int> 1, 1, 1, 1, 1,…
#> $ ROA.C..before.interest.and.depreciation.before.interest <dbl> 0.3705943, 0.4…
#> $ ROA.A..before.interest.and...after.tax                  <dbl> 0.4243894, 0.5…
#> $ ROA.B..before.interest.and.depreciation.after.tax       <dbl> 0.4057498, 0.5…
#> $ Operating.Gross.Margin                                  <dbl> 0.6014572, 0.6…
#> $ Realized.Sales.Gross.Margin                             <dbl> 0.6014572, 0.6…
#> $ Operating.Profit.Rate                                   <dbl> 0.9989692, 0.9…
#> $ Pre.tax.net.Interest.Rate                               <dbl> 0.7968871, 0.7…
#> $ After.tax.net.Interest.Rate                             <dbl> 0.8088094, 0.8…
#> $ Non.industry.income.and.expenditure.revenue             <dbl> 0.3026464, 0.3…
#> $ Continuous.interest.rate..after.tax.                    <dbl> 0.7809849, 0.7…
#> $ Operating.Expense.Rate                                  <dbl> 0.0001256969, …
#> $ Research.and.development.expense.rate                   <dbl> 0, 0, 25500000…
#> $ Cash.flow.rate                                          <dbl> 0.4581431, 0.4…
#> $ Interest.bearing.debt.interest.rate                     <dbl> 0.0007250725, …
#> $ Tax.rate..A.                                            <dbl> 0.000000000, 0…
#> $ Net.Value.Per.Share..B.                                 <dbl> 0.1479499, 0.1…
#> $ Net.Value.Per.Share..A.                                 <dbl> 0.1479499, 0.1…
#> $ Net.Value.Per.Share..C.                                 <dbl> 0.1479499, 0.1…
#> $ Persistent.EPS.in.the.Last.Four.Seasons                 <dbl> 0.1691406, 0.2…
#> $ Cash.Flow.Per.Share                                     <dbl> 0.3116644, 0.3…
#> $ Revenue.Per.Share..Yuan...                              <dbl> 0.017559780, 0…
#> $ Operating.Profit.Per.Share..Yuan...                     <dbl> 0.09592053, 0.…
#> $ Per.Share.Net.profit.before.tax..Yuan...                <dbl> 0.1387362, 0.1…
#> $ Realized.Sales.Gross.Profit.Growth.Rate                 <dbl> 0.02210228, 0.…
#> $ Operating.Profit.Growth.Rate                            <dbl> 0.8481950, 0.8…
#> $ After.tax.Net.Profit.Growth.Rate                        <dbl> 0.6889795, 0.6…
#> $ Regular.Net.Profit.Growth.Rate                          <dbl> 0.6889795, 0.6…
#> $ Continuous.Net.Profit.Growth.Rate                       <dbl> 0.2175354, 0.2…
#> $ Total.Asset.Growth.Rate                                 <dbl> 4980000000, 61…
#> $ Net.Value.Growth.Rate                                   <dbl> 0.0003269773, …
#> $ Total.Asset.Return.Growth.Rate.Ratio                    <dbl> 0.2631000, 0.2…
#> $ Cash.Reinvestment..                                     <dbl> 0.3637253, 0.3…
#> $ Current.Ratio                                           <dbl> 0.002258963, 0…
#> $ Quick.Ratio                                             <dbl> 0.0012077551, …
#> $ Interest.Expense.Ratio                                  <dbl> 0.6299513, 0.6…
#> $ Total.debt.Total.net.worth                              <dbl> 0.021265924, 0…
#> $ Debt.ratio..                                            <dbl> 0.20757626, 0.…
#> $ Net.worth.Assets                                        <dbl> 0.7924237, 0.8…
#> $ Long.term.fund.suitability.ratio..A.                    <dbl> 0.005024455, 0…
#> $ Borrowing.dependency                                    <dbl> 0.3902844, 0.3…
#> $ Contingent.liabilities.Net.worth                        <dbl> 0.006478502, 0…
#> $ Operating.profit.Paid.in.capital                        <dbl> 0.09588483, 0.…
#> $ Net.profit.before.tax.Paid.in.capital                   <dbl> 0.1377573, 0.1…
#> $ Inventory.and.accounts.receivable.Net.value             <dbl> 0.3980357, 0.3…
#> $ Total.Asset.Turnover                                    <dbl> 0.08695652, 0.…
#> $ Accounts.Receivable.Turnover                            <dbl> 0.0018138841, …
#> $ Average.Collection.Days                                 <dbl> 0.003487364, 0…
#> $ Inventory.Turnover.Rate..times.                         <dbl> 0.0001820926, …
#> $ Fixed.Assets.Turnover.Frequency                         <dbl> 0.0001165007, …
#> $ Net.Worth.Turnover.Rate..times.                         <dbl> 0.03290323, 0.…
#> $ Revenue.per.person                                      <dbl> 0.034164182, 0…
#> $ Operating.profit.per.person                             <dbl> 0.3929129, 0.3…
#> $ Allocation.rate.per.person                              <dbl> 0.037135302, 0…
#> $ Working.Capital.to.Total.Assets                         <dbl> 0.6727753, 0.7…
#> $ Quick.Assets.Total.Assets                               <dbl> 0.16667296, 0.…
#> $ Current.Assets.Total.Assets                             <dbl> 0.1906430, 0.1…
#> $ Cash.Total.Assets                                       <dbl> 0.0040944060, …
#> $ Quick.Assets.Current.Liability                          <dbl> 0.001996771, 0…
#> $ Cash.Current.Liability                                  <dbl> 0.0001473360, …
#> $ Current.Liability.to.Assets                             <dbl> 0.14730845, 0.…
#> $ Operating.Funds.to.Liability                            <dbl> 0.3340152, 0.3…
#> $ Inventory.Working.Capital                               <dbl> 0.2769202, 0.2…
#> $ Inventory.Current.Liability                             <dbl> 0.001035990, 0…
#> $ Current.Liabilities.Liability                           <dbl> 0.6762692, 0.3…
#> $ Working.Capital.Equity                                  <dbl> 0.7212746, 0.7…
#> $ Current.Liabilities.Equity                              <dbl> 0.3390770, 0.3…
#> $ Long.term.Liability.to.Current.Assets                   <dbl> 0.025592368, 0…
#> $ Retained.Earnings.to.Total.Assets                       <dbl> 0.9032248, 0.9…
#> $ Total.income.Total.expense                              <dbl> 0.002021613, 0…
#> $ Total.expense.Assets                                    <dbl> 0.064855708, 0…
#> $ Current.Asset.Turnover.Rate                             <dbl> 701000000.0000…
#> $ Quick.Asset.Turnover.Rate                               <dbl> 6550000000.000…
#> $ Working.capitcal.Turnover.Rate                          <dbl> 0.5938305, 0.5…
#> $ Cash.Turnover.Rate                                      <dbl> 458000000.0000…
#> $ Cash.Flow.to.Sales                                      <dbl> 0.6715677, 0.6…
#> $ Fixed.Assets.to.Assets                                  <dbl> 0.4242058, 0.4…
#> $ Current.Liability.to.Liability                          <dbl> 0.6762692, 0.3…
#> $ Current.Liability.to.Equity                             <dbl> 0.3390770, 0.3…
#> $ Equity.to.Long.term.Liability                           <dbl> 0.1265495, 0.1…
#> $ Cash.Flow.to.Total.Assets                               <dbl> 0.6375554, 0.6…
#> $ Cash.Flow.to.Liability                                  <dbl> 0.4586091, 0.4…
#> $ CFO.to.Assets                                           <dbl> 0.5203819, 0.5…
#> $ Cash.Flow.to.Equity                                     <dbl> 0.3129049, 0.3…
#> $ Current.Liability.to.Current.Assets                     <dbl> 0.11825048, 0.…
#> $ Liability.Assets.Flag                                   <int> 0, 0, 0, 0, 0,…
#> $ Net.Income.to.Total.Assets                              <dbl> 0.7168453, 0.7…
#> $ Total.assets.to.GNP.price                               <dbl> 0.0092194400, …
#> $ No.credit.Interval                                      <dbl> 0.6228790, 0.6…
#> $ Gross.Profit.to.Sales                                   <dbl> 0.6014533, 0.6…
#> $ Net.Income.to.Stockholder.s.Equity                      <dbl> 0.8278902, 0.8…
#> $ Liability.to.Equity                                     <dbl> 0.2902019, 0.2…
#> $ Degree.of.Financial.Leverage..DFL.                      <dbl> 0.02660063, 0.…
#> $ Interest.Coverage.Ratio..Interest.expense.to.EBIT.      <dbl> 0.5640501, 0.5…
#> $ Net.Income.Flag                                         <int> 1, 1, 1, 1, 1,…
#> $ Equity.to.Liability                                     <dbl> 0.01646874, 0.…
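The column names in the glimpse() output differ from the attribute names listed in the dataset description (for example Bankrupt? becomes Bankrupt.). This is because read.csv() defaults to check.names = TRUE, which passes the header through make.names() and replaces every character that is not valid in an R name with a dot. The sketch below shows this on a few attribute names as they appear in the attribute list, plus a tiny inline CSV (an assumption standing in for data.csv) read with and without check.names.

```r
# make.names() is what read.csv(check.names = TRUE) applies to the header
original <- c("Bankrupt?", "Debt ratio %", "Tax rate (A)", "Cash Reinvestment %")
clean <- make.names(original)
# clean is c("Bankrupt.", "Debt.ratio..", "Tax.rate..A.", "Cash.Reinvestment..")

# Hypothetical two-column CSV used in place of data.csv
csv_text <- "Bankrupt?,Debt ratio %\n1,0.2075763\n0,0.1711763"
names(read.csv(text = csv_text))                       # mangled names
names(read.csv(text = csv_text, check.names = FALSE))  # original names kept
```

Keeping the original names with check.names = FALSE means they must be wrapped in backticks in R code, so the dotted names are usually the more convenient choice.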

About the data attributes (Y = output feature, X = input features):
Y - Bankrupt?: Class label
X1 - ROA(C) before interest and depreciation before interest: Return On Total Assets(C)
X2 - ROA(A) before interest and % after tax: Return On Total Assets(A)
X3 - ROA(B) before interest and depreciation after tax: Return On Total Assets(B)
X4 - Operating Gross Margin: Gross Profit/Net Sales
X5 - Realized Sales Gross Margin: Realized Gross Profit/Net Sales
X6 - Operating Profit Rate: Operating Income/Net Sales
X7 - Pre-tax net Interest Rate: Pre-Tax Income/Net Sales
X8 - After-tax net Interest Rate: Net Income/Net Sales
X9 - Non-industry income and expenditure/revenue: Net Non-operating Income Ratio
X10 - Continuous interest rate (after tax): Net Income-Exclude Disposal Gain or Loss/Net Sales
X11 - Operating Expense Rate: Operating Expenses/Net Sales
X12 - Research and development expense rate: (Research and Development Expenses)/Net Sales
X13 - Cash flow rate: Cash Flow from Operating/Current Liabilities
X14 - Interest-bearing debt interest rate: Interest-bearing Debt/Equity
X15 - Tax rate (A): Effective Tax Rate
X16 - Net Value Per Share (B): Book Value Per Share(B)
X17 - Net Value Per Share (A): Book Value Per Share(A)
X18 - Net Value Per Share (C): Book Value Per Share(C)
X19 - Persistent EPS in the Last Four Seasons: EPS-Net Income
X20 - Cash Flow Per Share
X21 - Revenue Per Share (Yuan ¥): Sales Per Share
X22 - Operating Profit Per Share (Yuan ¥): Operating Income Per Share
X23 - Per Share Net profit before tax (Yuan ¥): Pretax Income Per Share
X24 - Realized Sales Gross Profit Growth Rate
X25 - Operating Profit Growth Rate: Operating Income Growth
X26 - After-tax Net Profit Growth Rate: Net Income Growth
X27 - Regular Net Profit Growth Rate: Continuing Operating Income after Tax Growth
X28 - Continuous Net Profit Growth Rate: Net Income-Excluding Disposal Gain or Loss Growth
X29 - Total Asset Growth Rate: Total Asset Growth
X30 - Net Value Growth Rate: Total Equity Growth
X31 - Total Asset Return Growth Rate Ratio: Return on Total Asset Growth
X32 - Cash Reinvestment %: Cash Reinvestment Ratio
X33 - Current Ratio
X34 - Quick Ratio: Acid Test
X35 - Interest Expense Ratio: Interest Expenses/Total Revenue
X36 - Total debt/Total net worth: Total Liability/Equity Ratio
X37 - Debt ratio %: Liability/Total Assets
X38 - Net worth/Assets: Equity/Total Assets
X39 - Long-term fund suitability ratio (A): (Long-term Liability+Equity)/Fixed Assets
X40 - Borrowing dependency: Cost of Interest-bearing Debt
X41 - Contingent liabilities/Net worth: Contingent Liability/Equity
X42 - Operating profit/Paid-in capital: Operating Income/Capital
X43 - Net profit before tax/Paid-in capital: Pretax Income/Capital
X44 - Inventory and accounts receivable/Net value: (Inventory+Accounts Receivables)/Equity
X45 - Total Asset Turnover
X46 - Accounts Receivable Turnover
X47 - Average Collection Days: Days Receivable Outstanding
X48 - Inventory Turnover Rate (times)
X49 - Fixed Assets Turnover Frequency
X50 - Net Worth Turnover Rate (times): Equity Turnover
X51 - Revenue per person: Sales Per Employee
X52 - Operating profit per person: Operating Income Per Employee
X53 - Allocation rate per person: Fixed Assets Per Employee
X54 - Working Capital to Total Assets
X55 - Quick Assets/Total Assets
X56 - Current Assets/Total Assets
X57 - Cash/Total Assets
X58 - Quick Assets/Current Liability
X59 - Cash/Current Liability
X60 - Current Liability to Assets
X61 - Operating Funds to Liability
X62 - Inventory/Working Capital
X63 - Inventory/Current Liability
X64 - Current Liabilities/Liability
X65 - Working Capital/Equity
X66 - Current Liabilities/Equity
X67 - Long-term Liability to Current Assets
X68 - Retained Earnings to Total Assets
X69 - Total income/Total expense
X70 - Total expense/Assets
X71 - Current Asset Turnover Rate: Current Assets to Sales
X72 - Quick Asset Turnover Rate: Quick Assets to Sales
X73 - Working capitcal Turnover Rate: Working Capital to Sales
X74 - Cash Turnover Rate: Cash to Sales
X75 - Cash Flow to Sales
X76 - Fixed Assets to Assets
X77 - Current Liability to Liability
X78 - Current Liability to Equity
X79 - Equity to Long-term Liability
X80 - Cash Flow to Total Assets
X81 - Cash Flow to Liability
X82 - CFO to Assets
X83 - Cash Flow to Equity
X84 - Current Liability to Current Assets
X85 - Liability-Assets Flag: 1 if Total Liability exceeds Total Assets, 0 otherwise
X86 - Net Income to Total Assets
X87 - Total assets to GNP price
X88 - No-credit Interval
X89 - Gross Profit to Sales
X90 - Net Income to Stockholder’s Equity
X91 - Liability to Equity
X92 - Degree of Financial Leverage (DFL)
X93 - Interest Coverage Ratio (Interest expense to EBIT)
X94 - Net Income Flag: 1 if Net Income is Negative for the last two years, 0 otherwise
X95 - Equity to Liability
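Two of the inputs, X85 and X94, are 0/1 flags derived from raw financial figures rather than continuous ratios. A minimal sketch of how such flags could be computed is shown below; the liability, asset and net-income figures are hypothetical, and X94 is read here as "negative in both of the last two years", which is an interpretation of the description above.

```r
# Hypothetical raw figures for three companies (not from the dataset)
total_liability <- c(120, 80, 300)
total_assets    <- c(100, 150, 250)
net_income_y1   <- c(-5, 10, -2)   # most recent year
net_income_y2   <- c(-3, -1, -4)   # year before

# X85 Liability-Assets Flag: 1 if total liability exceeds total assets, else 0
liability_assets_flag <- as.integer(total_liability > total_assets)

# X94 Net Income Flag: 1 if net income is negative in both of the last two years, else 0
net_income_flag <- as.integer(net_income_y1 < 0 & net_income_y2 < 0)

liability_assets_flag  # 1 0 1
net_income_flag        # 1 0 1
```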

First, check the data with head():

head(company)
#>   Bankrupt. ROA.C..before.interest.and.depreciation.before.interest
#> 1         1                                               0.3705943
#> 2         1                                               0.4642909
#> 3         1                                               0.4260713
#> 4         1                                               0.3998440
#> 5         1                                               0.4650222
#> 6         1                                               0.3886803
#>   ROA.A..before.interest.and...after.tax
#> 1                              0.4243894
#> 2                              0.5382141
#> 3                              0.4990188
#> 4                              0.4512647
#> 5                              0.5384322
#> 6                              0.4151766
#>   ROA.B..before.interest.and.depreciation.after.tax Operating.Gross.Margin
#> 1                                         0.4057498              0.6014572
#> 2                                         0.5167300              0.6102351
#> 3                                         0.4722951              0.6014500
#> 4                                         0.4577333              0.5835411
#> 5                                         0.5222978              0.5987835
#> 6                                         0.4191338              0.5901714
#>   Realized.Sales.Gross.Margin Operating.Profit.Rate Pre.tax.net.Interest.Rate
#> 1                   0.6014572             0.9989692                 0.7968871
#> 2                   0.6102351             0.9989460                 0.7973802
#> 3                   0.6013635             0.9988574                 0.7964034
#> 4                   0.5835411             0.9986997                 0.7969670
#> 5                   0.5987835             0.9989731                 0.7973661
#> 6                   0.5902507             0.9987581                 0.7969032
#>   After.tax.net.Interest.Rate Non.industry.income.and.expenditure.revenue
#> 1                   0.8088094                                   0.3026464
#> 2                   0.8093007                                   0.3035564
#> 3                   0.8083875                                   0.3020352
#> 4                   0.8089656                                   0.3033495
#> 5                   0.8093037                                   0.3034750
#> 6                   0.8087706                                   0.3031158
#>   Continuous.interest.rate..after.tax. Operating.Expense.Rate
#> 1                            0.7809849           0.0001256969
#> 2                            0.7815060           0.0002897851
#> 3                            0.7802839           0.0002361297
#> 4                            0.7812410           0.0001078888
#> 5                            0.7815500  7890000000.0000000000
#> 6                            0.7810691           0.0001571500
#>   Research.and.development.expense.rate Cash.flow.rate
#> 1                                     0      0.4581431
#> 2                                     0      0.4618673
#> 3                              25500000      0.4585206
#> 4                                     0      0.4657054
#> 5                                     0      0.4627463
#> 6                                     0      0.4658615
#>   Interest.bearing.debt.interest.rate Tax.rate..A. Net.Value.Per.Share..B.
#> 1                        0.0007250725            0               0.1479499
#> 2                        0.0006470647            0               0.1822511
#> 3                        0.0007900790            0               0.1779107
#> 4                        0.0004490449            0               0.1541865
#> 5                        0.0006860686            0               0.1675024
#> 6                        0.0007160716            0               0.1555771
#>   Net.Value.Per.Share..A. Net.Value.Per.Share..C.
#> 1               0.1479499               0.1479499
#> 2               0.1822511               0.1822511
#> 3               0.1779107               0.1937129
#> 4               0.1541865               0.1541865
#> 5               0.1675024               0.1675024
#> 6               0.1555771               0.1555771
#>   Persistent.EPS.in.the.Last.Four.Seasons Cash.Flow.Per.Share
#> 1                               0.1691406           0.3116644
#> 2                               0.2089439           0.3181368
#> 3                               0.1805805           0.3071019
#> 4                               0.1937222           0.3216736
#> 5                               0.2125366           0.3191625
#> 6                               0.1744351           0.3253873
#>   Revenue.Per.Share..Yuan... Operating.Profit.Per.Share..Yuan...
#> 1                0.017559780                          0.09592053
#> 2                0.021144335                          0.09372201
#> 3                0.005944008                          0.09233776
#> 4                0.014368468                          0.07776240
#> 5                0.029689792                          0.09689765
#> 6                0.018104270                          0.07808810
#>   Per.Share.Net.profit.before.tax..Yuan...
#> 1                                0.1387362
#> 2                                0.1699179
#> 3                                0.1428033
#> 4                                0.1486028
#> 5                                0.1684115
#> 6                                0.1388115
#>   Realized.Sales.Gross.Profit.Growth.Rate Operating.Profit.Growth.Rate
#> 1                              0.02210228                    0.8481950
#> 2                              0.02208017                    0.8480879
#> 3                              0.02276010                    0.8480940
#> 4                              0.02204607                    0.8480055
#> 5                              0.02209591                    0.8482582
#> 6                              0.02156494                    0.8479828
#>   After.tax.Net.Profit.Growth.Rate Regular.Net.Profit.Growth.Rate
#> 1                        0.6889795                      0.6889795
#> 2                        0.6896929                      0.6897017
#> 3                        0.6894627                      0.6894697
#> 4                        0.6891095                      0.6891095
#> 5                        0.6896969                      0.6896969
#> 6                        0.6891051                      0.6891775
#>   Continuous.Net.Profit.Growth.Rate Total.Asset.Growth.Rate
#> 1                         0.2175354              4980000000
#> 2                         0.2176196              6110000000
#> 3                         0.2176013              7280000000
#> 4                         0.2175682              4880000000
#> 5                         0.2176256              5510000000
#> 6                         0.2175664               608000000
#>   Net.Value.Growth.Rate Total.Asset.Return.Growth.Rate.Ratio
#> 1          0.0003269773                            0.2631000
#> 2          0.0004430401                            0.2645158
#> 3          0.0003964253                            0.2641840
#> 4          0.0003824259                            0.2633712
#> 5          0.0004389476                            0.2652182
#> 6          0.0003517819                            0.2632500
#>   Cash.Reinvestment.. Current.Ratio  Quick.Ratio Interest.Expense.Ratio
#> 1           0.3637253   0.002258963 0.0012077551              0.6299513
#> 2           0.3767091   0.006016206 0.0040393668              0.6351725
#> 3           0.3689132   0.011542554 0.0053475602              0.6296314
#> 4           0.3840766   0.004194059 0.0028964911              0.6302284
#> 5           0.3796897   0.006022446 0.0037274466              0.6360550
#> 6           0.3880258   0.002740085 0.0008546614              0.6301838
#>   Total.debt.Total.net.worth Debt.ratio.. Net.worth.Assets
#> 1                0.021265924    0.2075763        0.7924237
#> 2                0.012502394    0.1711763        0.8288237
#> 3                0.021247686    0.2075158        0.7924842
#> 4                0.009572402    0.1514648        0.8485352
#> 5                0.005149600    0.1065091        0.8934909
#> 6                0.014213152    0.1804275        0.8195725
#>   Long.term.fund.suitability.ratio..A. Borrowing.dependency
#> 1                          0.005024455            0.3902844
#> 2                          0.005058882            0.3767600
#> 3                          0.005099899            0.3790929
#> 4                          0.005046924            0.3797427
#> 5                          0.005303319            0.3750254
#> 6                          0.004913193            0.3814482
#>   Contingent.liabilities.Net.worth Operating.profit.Paid.in.capital
#> 1                      0.006478502                       0.09588483
#> 2                      0.005835039                       0.09374338
#> 3                      0.006561982                       0.09231847
#> 4                      0.005365848                       0.07772729
#> 5                      0.006623525                       0.09692706
#> 6                      0.005749123                       0.07810185
#>   Net.profit.before.tax.Paid.in.capital
#> 1                             0.1377573
#> 2                             0.1689616
#> 3                             0.1480356
#> 4                             0.1475605
#> 5                             0.1674610
#> 6                             0.1378252
#>   Inventory.and.accounts.receivable.Net.value Total.Asset.Turnover
#> 1                                   0.3980357           0.08695652
#> 2                                   0.3977249           0.06446777
#> 3                                   0.4065805           0.01499250
#> 4                                   0.3979245           0.08995502
#> 5                                   0.4000788           0.17541229
#> 6                                   0.4004191           0.09595202
#>   Accounts.Receivable.Turnover Average.Collection.Days
#> 1                  0.001813884             0.003487364
#> 2                  0.001286356             0.004916808
#> 3                  0.001495338             0.004226849
#> 4                  0.001966056             0.003214967
#> 5                  0.001448673             0.004366891
#> 6                  0.001527802             0.004137189
#>   Inventory.Turnover.Rate..times. Fixed.Assets.Turnover.Frequency
#> 1                    0.0001820926                    0.0001165007
#> 2           9360000000.0000000000            719000000.0000000000
#> 3             65000000.0000000000           2650000000.0000000000
#> 4           7130000000.0000000000           9150000000.0000000000
#> 5                    0.0001633674                    0.0002935211
#> 6            650000000.0000000000           9300000000.0000000000
#>   Net.Worth.Turnover.Rate..times. Revenue.per.person
#> 1                      0.03290323        0.034164182
#> 2                      0.02548387        0.006888651
#> 3                      0.01338710        0.028996960
#> 4                      0.02806452        0.015463478
#> 5                      0.04016129        0.058111423
#> 6                      0.02967742        0.021300471
#>   Operating.profit.per.person Allocation.rate.per.person
#> 1                   0.3929129                 0.03713530
#> 2                   0.3915900                 0.01233497
#> 3                   0.3819678                 0.14101631
#> 4                   0.3784966                 0.02131999
#> 5                   0.3943715                 0.02398821
#> 6                   0.3775695                 0.03282877
#>   Working.Capital.to.Total.Assets Quick.Assets.Total.Assets
#> 1                       0.6727753                0.16667296
#> 2                       0.7511109                0.12723600
#> 3                       0.8295019                0.34020088
#> 4                       0.7257542                0.16157453
#> 5                       0.7518225                0.26032988
#> 6                       0.6867286                0.08026371
#>   Current.Assets.Total.Assets Cash.Total.Assets Quick.Assets.Current.Liability
#> 1                   0.1906430      0.0040944060                    0.001996771
#> 2                   0.1824191      0.0149477270                    0.004136030
#> 3                   0.6028057      0.0009909445                    0.006302481
#> 4                   0.2258149      0.0188506248                    0.002961238
#> 5                   0.3583802      0.0141609738                    0.004274771
#> 6                   0.2145360      0.0026452256                    0.000988425
#>   Cash.Current.Liability Current.Liability.to.Assets
#> 1           0.0001473360                  0.14730845
#> 2           0.0013839101                  0.05696283
#> 3  5340000000.0000000000                  0.09816206
#> 4           0.0010106464                  0.09871463
#> 5           0.0006804636                  0.11019485
#> 6           0.0001008563                  0.13900211
#>   Operating.Funds.to.Liability Inventory.Working.Capital
#> 1                    0.3340152                 0.2769202
#> 2                    0.3411060                 0.2896416
#> 3                    0.3367315                 0.2774555
#> 4                    0.3487164                 0.2765803
#> 5                    0.3446388                 0.2879127
#> 6                    0.3505631                 0.2766785
#>   Inventory.Current.Liability Current.Liabilities.Liability
#> 1                 0.001035990                     0.6762692
#> 2                 0.005209682                     0.3085886
#> 3                 0.013878786                     0.4460275
#> 4                 0.003540148                     0.6158484
#> 5                 0.004868570                     0.9750066
#> 6                 0.004879131                     0.7333519
#>   Working.Capital.Equity Current.Liabilities.Equity
#> 1              0.7212746                  0.3390770
#> 2              0.7319753                  0.3297401
#> 3              0.7427286                  0.3347769
#> 4              0.7298249                  0.3315090
#> 5              0.7319996                  0.3307263
#> 6              0.7252016                  0.3355344
#>   Long.term.Liability.to.Current.Assets Retained.Earnings.to.Total.Assets
#> 1                           0.025592368                         0.9032248
#> 2                           0.023946819                         0.9310652
#> 3                           0.003715116                         0.9099034
#> 4                           0.022165200                         0.9069022
#> 5                           0.000000000                         0.9138502
#> 6                           0.003772505                         0.9030413
#>   Total.income.Total.expense Total.expense.Assets Current.Asset.Turnover.Rate
#> 1                0.002021613           0.06485571        701000000.0000000000
#> 2                0.002225608           0.02551586                0.0001065198
#> 3                0.002060071           0.02138743                0.0017910937
#> 4                0.001831359           0.02416107       8140000000.0000000000
#> 5                0.002223930           0.02638525       6680000000.0000000000
#> 6                0.001865609           0.04009362       8010000000.0000000000
#>   Quick.Asset.Turnover.Rate Working.capitcal.Turnover.Rate Cash.Turnover.Rate
#> 1      6550000000.000000000                      0.5938305          458000000
#> 2      7700000000.000000000                      0.5939155         2490000000
#> 3               0.001022676                      0.5945019          761000000
#> 4      6050000000.000000000                      0.5938888         2030000000
#> 5      5050000000.000000000                      0.5939153          824000000
#> 6      2810000000.000000000                      0.5938458          295000000
#>   Cash.Flow.to.Sales Fixed.Assets.to.Assets Current.Liability.to.Liability
#> 1          0.6715677              0.4242058                      0.6762692
#> 2          0.6715699              0.4688281                      0.3085886
#> 3          0.6715713              0.2761792                      0.4460275
#> 4          0.6715192              0.5591440                      0.6158484
#> 5          0.6715631              0.3095549                      0.9750066
#> 6          0.6715676              0.6031935                      0.7333519
#>   Current.Liability.to.Equity Equity.to.Long.term.Liability
#> 1                   0.3390770                     0.1265495
#> 2                   0.3297401                     0.1209161
#> 3                   0.3347769                     0.1179223
#> 4                   0.3315090                     0.1207605
#> 5                   0.3307263                     0.1109332
#> 6                   0.3355344                     0.1129172
#>   Cash.Flow.to.Total.Assets Cash.Flow.to.Liability CFO.to.Assets
#> 1                 0.6375554              0.4586091     0.5203819
#> 2                 0.6411000              0.4590011     0.5671013
#> 3                 0.6427646              0.4592540     0.5384905
#> 4                 0.5790393              0.4485179     0.6041051
#> 5                 0.6223741              0.4544109     0.5784689
#> 6                 0.6374698              0.4584993     0.6221901
#>   Cash.Flow.to.Equity Current.Liability.to.Current.Assets Liability.Assets.Flag
#> 1           0.3129049                          0.11825048                     0
#> 2           0.3141631                          0.04777528                     0
#> 3           0.3145154                          0.02534649                     0
#> 4           0.3023823                          0.06724962                     0
#> 5           0.3115672                          0.04772537                     0
#> 6           0.3132685                          0.09952193                     0
#>   Net.Income.to.Total.Assets Total.assets.to.GNP.price No.credit.Interval
#> 1                  0.7168453               0.009219440          0.6228790
#> 2                  0.7952971               0.008323302          0.6236517
#> 3                  0.7746697               0.040002853          0.6238410
#> 4                  0.7395545               0.003252475          0.6229287
#> 5                  0.7950159               0.003877563          0.6235207
#> 6                  0.7104205               0.005277875          0.6226046
#>   Gross.Profit.to.Sales Net.Income.to.Stockholder.s.Equity Liability.to.Equity
#> 1             0.6014533                          0.8278902           0.2902019
#> 2             0.6102365                          0.8399693           0.2838460
#> 3             0.6014493                          0.8367743           0.2901885
#> 4             0.5835376                          0.8346971           0.2817212
#> 5             0.5987815                          0.8399727           0.2785138
#> 6             0.5901723                          0.8299390           0.2850871
#>   Degree.of.Financial.Leverage..DFL.
#> 1                         0.02660063
#> 2                         0.26457682
#> 3                         0.02655472
#> 4                         0.02669663
#> 5                         0.02475185
#> 6                         0.02667537
#>   Interest.Coverage.Ratio..Interest.expense.to.EBIT. Net.Income.Flag
#> 1                                          0.5640501               1
#> 2                                          0.5701749               1
#> 3                                          0.5637061               1
#> 4                                          0.5646634               1
#> 5                                          0.5756166               1
#> 6                                          0.5645383               1
#>   Equity.to.Liability
#> 1          0.01646874
#> 2          0.02079431
#> 3          0.01647411
#> 4          0.02398233
#> 5          0.03549020
#> 6          0.01953448

Data Cleansing

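The data needs some cleaning before modelling. The target Bankrupt. is stored as an integer (0/1) instead of a factor, and we only need a subset of the 96 columns. We use dplyr to keep the target plus ten financial ratios and to convert the target into a labelled factor. Then we check the result again with glimpse().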

company_clean <- company %>% 
  # keep the target and ten selected financial ratios
  dplyr::select(Bankrupt., Operating.Gross.Margin, Operating.Profit.Rate,
                After.tax.net.Interest.Rate, Operating.Expense.Rate,
                Cash.flow.rate, Gross.Profit.to.Sales,
                Continuous.Net.Profit.Growth.Rate, Net.worth.Assets,
                Cash.Turnover.Rate, Current.Liability.to.Current.Assets) %>%
  # integer columns (here only Bankrupt.) become factors
  mutate_if(is.integer, as.factor) %>% 
  # give the target readable labels: 0 = Not Bankrupt, 1 = Bankrupt
  mutate(
    Bankrupt. = factor(Bankrupt., levels = c(0, 1),
                       labels = c("Not Bankrupt", "Bankrupt")))

glimpse(company_clean)
#> Rows: 6,819
#> Columns: 11
#> $ Bankrupt.                           <fct> Bankrupt, Bankrupt, Bankrupt, Bank…
#> $ Operating.Gross.Margin              <dbl> 0.6014572, 0.6102351, 0.6014500, 0…
#> $ Operating.Profit.Rate               <dbl> 0.9989692, 0.9989460, 0.9988574, 0…
#> $ After.tax.net.Interest.Rate         <dbl> 0.8088094, 0.8093007, 0.8083875, 0…
#> $ Operating.Expense.Rate              <dbl> 0.0001256969, 0.0002897851, 0.0002…
#> $ Cash.flow.rate                      <dbl> 0.4581431, 0.4618673, 0.4585206, 0…
#> $ Gross.Profit.to.Sales               <dbl> 0.6014533, 0.6102365, 0.6014493, 0…
#> $ Continuous.Net.Profit.Growth.Rate   <dbl> 0.2175354, 0.2176196, 0.2176013, 0…
#> $ Net.worth.Assets                    <dbl> 0.7924237, 0.8288237, 0.7924842, 0…
#> $ Cash.Turnover.Rate                  <dbl> 458000000.0000000000, 2490000000.0…
#> $ Current.Liability.to.Current.Assets <dbl> 0.11825048, 0.04777528, 0.02534649…

Check for missing values

company_clean %>% is.na() %>% colSums()
#>                           Bankrupt.              Operating.Gross.Margin 
#>                                   0                                   0 
#>               Operating.Profit.Rate         After.tax.net.Interest.Rate 
#>                                   0                                   0 
#>              Operating.Expense.Rate                      Cash.flow.rate 
#>                                   0                                   0 
#>               Gross.Profit.to.Sales   Continuous.Net.Profit.Growth.Rate 
#>                                   0                                   0 
#>                    Net.worth.Assets                  Cash.Turnover.Rate 
#>                                   0                                   0 
#> Current.Liability.to.Current.Assets 
#>                                   0

The dataset has no missing values. We can continue.

Pre-Processing Data

Check the class proportions of the target variable. The target (dependent) variable is Bankrupt., with the classes Not Bankrupt and Bankrupt.

company_clean$Bankrupt. %>% table() %>% prop.table
#> .
#> Not Bankrupt     Bankrupt 
#>    0.9677372    0.0322628

The classes are heavily imbalanced: only about 3.2% of the companies went bankrupt. We handle this later by upsampling the training data.
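Why the imbalance matters: a model that always predicts the majority class already looks very accurate. A minimal sketch, assuming only the class counts implied by the proportions above (220 of 6,819 companies bankrupt); the vectors y and pred are illustrative and not part of the original analysis:

```r
# Class counts implied by the proportions above: 220 of 6,819 companies bankrupt
n_total    <- 6819
n_bankrupt <- 220
y <- factor(c(rep("Not Bankrupt", n_total - n_bankrupt), rep("Bankrupt", n_bankrupt)),
            levels = c("Not Bankrupt", "Bankrupt"))

# Illustrative "model" that always predicts the majority class
pred <- factor(rep("Not Bankrupt", n_total), levels = levels(y))

accuracy    <- mean(pred == y)
sensitivity <- sum(pred == "Bankrupt" & y == "Bankrupt") / sum(y == "Bankrupt")
round(accuracy, 4)  # 0.9677
sensitivity         # 0
```

Accuracy is about 97% while not a single bankruptcy is caught, which is why accuracy alone is a poor yardstick for this problem.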

Cross Validation

We split the dataset into a training set (80% of the rows) and a test set (20%). The training set is used to fit the model; the test set is used to check how well the model works on unseen data.

# Split the dataset into training (80%) and test (20%) data
set.seed(70)  # fix the seed before sampling so the split is reproducible

index <- sample(nrow(company_clean),
                nrow(company_clean)*0.8)
company_train <- company_clean[index,]
company_test <- company_clean[-index,]
# check the class proportions in the training data
company_train$Bankrupt. %>% table() %>% prop.table
#> .
#> Not Bankrupt     Bankrupt 
#>   0.96736939   0.03263061
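The split above is purely random, so the class proportions in the training and test parts only match approximately. A stratified split draws 80% within each class separately; rsample::initial_split(strata = ...) does this, and the base-R sketch below shows the idea. It assumes a target with the same class counts as this dataset (6,599 / 220); y and train_idx are hypothetical names:

```r
# Hypothetical target with this dataset's class counts (6,599 not bankrupt, 220 bankrupt)
y <- factor(c(rep("Not Bankrupt", 6599), rep("Bankrupt", 220)),
            levels = c("Not Bankrupt", "Bankrupt"))

set.seed(70)
# Draw 80% of the row indices separately within each class
train_idx <- unlist(lapply(split(seq_along(y), y), function(rows) {
  rows[sample.int(length(rows), floor(0.8 * length(rows)))]
}))

prop.table(table(y[train_idx]))   # training part
prop.table(table(y[-train_idx]))  # test part
```

Both parts keep roughly 3.2% bankrupt companies, which matters when the minority class is this small.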

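Because the training data is imbalanced, we upsample it: rows of the minority class (Bankrupt) are resampled with replacement until both classes have the same count. Only the training set is upsampled; the test set keeps the original class distribution.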

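RNGkind(sample.kind = "Rounding")  # reproduce the sampling behaviour of R < 3.6.0
set.seed(70)
library(rsample)  # used later for the k-NN train/test split

# caret::upSample() resamples the minority class with replacement
company_train_up <- upSample(
  x = company_train %>% dplyr::select(-Bankrupt.), 
  y = company_train$Bankrupt.,  
  yname = "Bankrupt."
  ) 
# check the class proportions in the upsampled training data
company_train_up$Bankrupt. %>% table() %>% prop.table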
#> .
#> Not Bankrupt     Bankrupt 
#>          0.5          0.5
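Conceptually, upsampling keeps every original row and adds randomly drawn copies of minority-class rows until the classes are the same size. The base-R sketch below illustrates that idea on a hypothetical ten-row data frame; upsample_base() is an illustrative helper, not caret's implementation, so its row order and sampling details may differ from upSample():

```r
# Hypothetical toy training set: 8 "Not Bankrupt" rows and 2 "Bankrupt" rows
toy <- data.frame(
  ratio     = c(0.61, 0.60, 0.62, 0.59, 0.63, 0.60, 0.61, 0.62, 0.40, 0.42),
  Bankrupt. = factor(c(rep("Not Bankrupt", 8), rep("Bankrupt", 2)),
                     levels = c("Not Bankrupt", "Bankrupt"))
)

# Illustrative helper: keep all rows, then add copies of each smaller class
# (drawn with replacement) until every class matches the largest one
upsample_base <- function(df, target) {
  counts <- table(df[[target]])
  n_max  <- max(counts)
  idx <- unlist(lapply(names(counts), function(cls) {
    rows  <- which(df[[target]] == cls)
    extra <- rows[sample.int(length(rows), n_max - length(rows), replace = TRUE)]
    c(rows, extra)
  }))
  df[idx, , drop = FALSE]
}

set.seed(70)
toy_up <- upsample_base(toy, "Bankrupt.")
table(toy_up$Bankrupt.)  # 8 and 8
```

Because the extra rows are exact copies, upsampling adds no new information; it only reweights the minority class during fitting.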

Modeling

We fit a logistic regression model with glm(), using family = 'binomial'. The target (dependent) variable is Bankrupt., and all remaining columns are used as predictors.
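With family = 'binomial', glm() uses the logit link by default. Because Bankrupt. is a factor, glm() models the probability of its second level, Bankrupt, given the predictors $x_1, \dots, x_k$:

$$
\log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k,
\qquad
p = \Pr(\text{Bankrupt} \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}}
$$

Each coefficient $\beta_j$ is therefore a change in log-odds per one-unit increase in $x_j$.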

model_lr <- glm(
  formula = Bankrupt.~ .,
  family = 'binomial',
  data = company_train_up
)

summary(model_lr)
#> 
#> Call:
#> glm(formula = Bankrupt. ~ ., family = "binomial", data = company_train_up)
#> 
#> Deviance Residuals: 
#>    Min      1Q  Median      3Q     Max  
#>  -8.49   -8.49    0.00    0.00    8.49  
#> 
#> Coefficients:
#>                                                           Estimate
#> (Intercept)                               6094775238684640.0000000
#> Operating.Gross.Margin              -133109908820438450186.0000000
#> Operating.Profit.Rate                    25955025910767376.0000000
#> After.tax.net.Interest.Rate              -3085474774989755.0000000
#> Operating.Expense.Rate                              -29196.2179969
#> Cash.flow.rate                          -17191486051624098.0000000
#> Gross.Profit.to.Sales                133104409970840371200.0000000
#> Continuous.Net.Profit.Growth.Rate       -25412735752718940.0000000
#> Net.worth.Assets                        -13145407137642790.0000000
#> Cash.Turnover.Rate                                  -39723.7341970
#> Current.Liability.to.Current.Assets      -4288783996462404.0000000
#>                                                         Std. Error     z value
#> (Intercept)                                       71635059.5184516    85080899
#> Operating.Gross.Margin                        314203971127.2032471  -423641714
#> Operating.Profit.Rate                            111882058.2161452   231985596
#> After.tax.net.Interest.Rate                      102880465.4576698   -29990871
#> Operating.Expense.Rate                                   0.0002063  -141522648
#> Cash.flow.rate                                    47563076.3376580  -361446050
#> Gross.Profit.to.Sales                         314202809182.2778931   423625780
#> Continuous.Net.Profit.Growth.Rate                 78027681.7113657  -325688720
#> Net.worth.Assets                                  12163946.2788169 -1080686057
#> Cash.Turnover.Rate                                       0.0002436  -163092051
#> Current.Liability.to.Current.Assets               17631621.5918203  -243243877
#>                                                Pr(>|z|)    
#> (Intercept)                         <0.0000000000000002 ***
#> Operating.Gross.Margin              <0.0000000000000002 ***
#> Operating.Profit.Rate               <0.0000000000000002 ***
#> After.tax.net.Interest.Rate         <0.0000000000000002 ***
#> Operating.Expense.Rate              <0.0000000000000002 ***
#> Cash.flow.rate                      <0.0000000000000002 ***
#> Gross.Profit.to.Sales               <0.0000000000000002 ***
#> Continuous.Net.Profit.Growth.Rate   <0.0000000000000002 ***
#> Net.worth.Assets                    <0.0000000000000002 ***
#> Cash.Turnover.Rate                  <0.0000000000000002 ***
#> Current.Liability.to.Current.Assets <0.0000000000000002 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance:  14631  on 10553  degrees of freedom
#> Residual deviance: 272995  on 10543  degrees of freedom
#> AIC: 273017
#> 
#> Number of Fisher Scoring iterations: 25

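Model Fitting

The full model did not fit properly. Most coefficients are of the order of 10^15 to 10^20, the residual deviance (272995) is far larger than the null deviance (14631), and the fit stopped after 25 Fisher scoring iterations, which is glm()'s default maximum. Operating.Gross.Margin and Gross.Profit.to.Sales receive almost equal estimates of opposite sign (about -1.331e20 and +1.331e20), which suggests the two columns are nearly collinear; their values in the glimpse() output are almost identical. Backward stepwise selection with stepAIC() from MASS is used to reduce the model: it repeatedly drops the predictor whose removal lowers the AIC the most.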
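The effect of near-duplicate predictors can be reproduced on synthetic data. In this sketch (all data simulated; x1 and x2 are hypothetical stand-ins for two almost identical ratios), the outcome depends only on x1, yet adding a near copy x2 inflates the standard error of x1 by orders of magnitude:

```r
set.seed(1)
n  <- 500
x1 <- runif(n)
x2 <- x1 + rnorm(n, sd = 1e-4)          # nearly an exact copy of x1
y  <- rbinom(n, 1, plogis(-1 + 2 * x1))  # outcome depends on x1 only

fit_single <- glm(y ~ x1, family = binomial)
fit_both   <- glm(y ~ x1 + x2, family = binomial)

cor(x1, x2)  # essentially 1
se_single <- summary(fit_single)$coefficients["x1", "Std. Error"]
se_both   <- summary(fit_both)$coefficients["x1", "Std. Error"]
se_both / se_single  # inflation factor for the standard error of x1
coef(fit_both)       # x1 and x2 typically get large estimates of opposite sign
```

On the real data, the analogous check would be cor() between Operating.Gross.Margin and Gross.Profit.to.Sales in company_train_up.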

library(MASS)  # note: MASS masks dplyr::select(), so call dplyr::select() explicitly from here on
model2 <- stepAIC(model_lr, direction = "backward", trace = FALSE)

summary(model2)
#> 
#> Call:
#> glm(formula = Bankrupt. ~ Operating.Gross.Margin + Operating.Expense.Rate + 
#>     Cash.flow.rate + Continuous.Net.Profit.Growth.Rate + Net.worth.Assets + 
#>     Cash.Turnover.Rate + Current.Liability.to.Current.Assets, 
#>     family = "binomial", data = company_train_up)
#> 
#> Deviance Residuals: 
#>     Min       1Q   Median       3Q      Max  
#> -7.0975  -0.6171   0.0019   0.6579   2.2223  
#> 
#> Coefficients:
#>                                                Estimate          Std. Error
#> (Intercept)                          78.467921268228167   3.183195861246445
#> Operating.Gross.Margin              -34.387925481701700   2.497008570076800
#> Operating.Expense.Rate               -0.000000000100512   0.000000000008488
#> Cash.flow.rate                      -47.768048370968096   2.695796389837017
#> Continuous.Net.Profit.Growth.Rate   -65.783236945054966  11.085822098942645
#> Net.worth.Assets                    -25.256591810520330   0.689606865994374
#> Cash.Turnover.Rate                   -0.000000000144141   0.000000000010371
#> Current.Liability.to.Current.Assets  16.635234166348820   1.311227432149053
#>                                     z value             Pr(>|z|)    
#> (Intercept)                          24.651 < 0.0000000000000002 ***
#> Operating.Gross.Margin              -13.772 < 0.0000000000000002 ***
#> Operating.Expense.Rate              -11.841 < 0.0000000000000002 ***
#> Cash.flow.rate                      -17.719 < 0.0000000000000002 ***
#> Continuous.Net.Profit.Growth.Rate    -5.934        0.00000000296 ***
#> Net.worth.Assets                    -36.625 < 0.0000000000000002 ***
#> Cash.Turnover.Rate                  -13.899 < 0.0000000000000002 ***
#> Current.Liability.to.Current.Assets  12.687 < 0.0000000000000002 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 14631.0  on 10553  degrees of freedom
#> Residual deviance:  8890.8  on 10546  degrees of freedom
#> AIC: 8906.8
#> 
#> Number of Fisher Scoring iterations: 6
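stepAIC() compares models by AIC. For a logistic regression on a 0/1 outcome the saturated log-likelihood is 0, so the AIC is the residual deviance $D$ plus twice the number of estimated coefficients $k$ (intercept included). Both summaries are consistent with this:

$$
\mathrm{AIC} = -2\log L + 2k = D + 2k
$$

$$
\begin{aligned}
\text{model\_lr } (k = 11): &\quad 272995 + 2 \times 11 = 273017 \\
\text{model2 } (k = 8): &\quad 8890.8 + 2 \times 8 = 8906.8
\end{aligned}
$$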

Prediction

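Using model2 from the stepwise selection, we predict the probability of bankruptcy for each company in the test set. type = "response" returns probabilities rather than log-odds.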

company_test$prob_bankrupt<-predict(model2, type = "response", newdata = company_test)

Plot the distribution of the predicted probabilities

ggplot(company_test, aes(x=prob_bankrupt)) +
  geom_density(lwd=0.5) +
  labs(title = "Distribution of Probability Prediction Data") +
  theme_minimal()

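# Classify with a 0.5 threshold; fixing the levels keeps them in the same order as Bankrupt.
company_test$pred_bankrupt <- factor(ifelse(company_test$prob_bankrupt > 0.5, "Bankrupt", "Not Bankrupt"),
                                     levels = levels(company_test$Bankrupt.))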
company_test[1:10, c("pred_bankrupt", "Bankrupt.")]
#>    pred_bankrupt    Bankrupt.
#> 9   Not Bankrupt Not Bankrupt
#> 19  Not Bankrupt Not Bankrupt
#> 23  Not Bankrupt Not Bankrupt
#> 29  Not Bankrupt Not Bankrupt
#> 30      Bankrupt     Bankrupt
#> 34  Not Bankrupt Not Bankrupt
#> 55      Bankrupt     Bankrupt
#> 57      Bankrupt     Bankrupt
#> 58      Bankrupt     Bankrupt
#> 71  Not Bankrupt Not Bankrupt
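predict(type = "response") applies the inverse logit, plogis(), to the linear predictor, and the 0.5 threshold then turns probabilities into classes. A small sketch on simulated data (toy and fit are hypothetical) confirms that both routes agree. It also shows why the factor levels are set explicitly: without levels =, factor() sorts the labels alphabetically (Bankrupt before Not Bankrupt), which no longer matches the level order of the reference factor.

```r
set.seed(1)
toy <- data.frame(x = rnorm(200))
toy$y <- factor(rbinom(200, 1, plogis(0.5 + 1.5 * toy$x)),
                levels = c(0, 1), labels = c("Not Bankrupt", "Bankrupt"))
fit <- glm(y ~ x, family = binomial, data = toy)

# type = "response" is the inverse logit of the linear predictor
p_response <- predict(fit, newdata = toy, type = "response")
p_manual   <- plogis(predict(fit, newdata = toy, type = "link"))
all.equal(p_response, p_manual)  # TRUE

# 0.5 threshold, with levels in the same order as the reference factor
pred <- factor(ifelse(p_response > 0.5, "Bankrupt", "Not Bankrupt"),
               levels = levels(toy$y))
table(predicted = pred, actual = toy$y)
```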

Model Evaluation

Model Evaluation with Confusion Matrix

library(caret)
CM <- confusionMatrix(company_test$pred_bankrupt, company_test$Bankrupt., positive = "Bankrupt")
CM
#> Confusion Matrix and Statistics
#> 
#>               Reference
#> Prediction     Not Bankrupt Bankrupt
#>   Not Bankrupt         1064        5
#>   Bankrupt              258       37
#>                                              
#>                Accuracy : 0.8072             
#>                  95% CI : (0.7852, 0.8278)   
#>     No Information Rate : 0.9692             
#>     P-Value [Acc > NIR] : 1                  
#>                                              
#>                   Kappa : 0.1751             
#>                                              
#>  Mcnemar's Test P-Value : <0.0000000000000002
#>                                              
#>             Sensitivity : 0.88095            
#>             Specificity : 0.80484            
#>          Pos Pred Value : 0.12542            
#>          Neg Pred Value : 0.99532            
#>              Prevalence : 0.03079            
#>          Detection Rate : 0.02713            
#>    Detection Prevalence : 0.21628            
#>       Balanced Accuracy : 0.84290            
#>                                              
#>        'Positive' Class : Bankrupt           
#> 
  • Sensitivity: of all actual positives (bankrupt companies), the proportion the model predicts as positive, TP / (TP + FN).
  • Specificity: of all actual negatives, the proportion the model predicts as negative, TN / (TN + FP).
  • Pos Pred Value: of all predicted positives, the proportion that are actually positive, TP / (TP + FP).
  • Neg Pred Value: of all predicted negatives, the proportion that are actually negative, TN / (TN + FN).
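These definitions can be checked directly against the confusion matrix above. The sketch below copies its four counts by hand and recomputes the statistics caret reports:

```r
# Counts copied from the confusion matrix above (positive class = "Bankrupt")
TP <- 37    # predicted Bankrupt,     actually Bankrupt
FN <- 5     # predicted Not Bankrupt, actually Bankrupt
FP <- 258   # predicted Bankrupt,     actually Not Bankrupt
TN <- 1064  # predicted Not Bankrupt, actually Not Bankrupt

sensitivity <- TP / (TP + FN)   # share of real bankruptcies that are caught
specificity <- TN / (TN + FP)
ppv         <- TP / (TP + FP)   # precision of a "Bankrupt" prediction
npv         <- TN / (TN + FN)
balanced    <- (sensitivity + specificity) / 2

round(c(sensitivity = sensitivity, specificity = specificity,
        ppv = ppv, npv = npv, balanced_accuracy = balanced), 5)
```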

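For this case, we want sensitivity to be as high as possible, because missing a company that actually goes bankrupt (a false negative) is the error we care about most.

Sensitivity is 0.88095: the model catches 37 of the 42 bankrupt companies in the test set. The price is a low positive predictive value (0.12542): 258 healthy companies are flagged as bankrupt.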

Model Interpretation

# Inverse logit of each coefficient (a probability, not an odds ratio; the odds ratio is exp(coefficient))
inv.logit(model2$coefficients) %>% 
  data.frame() 
#>                                                                         .
#> (Intercept)                         1.00000000000000000000000000000000000
#> Operating.Gross.Margin              0.00000000000000116282328356635084455
#> Operating.Expense.Rate              0.49999999997487204472790267573145684
#> Cash.flow.rate                      0.00000000000000000000179721560640081
#> Continuous.Net.Profit.Growth.Rate   0.00000000000000000000000000002695896
#> Net.worth.Assets                    0.00000000001074487939130762646344069
#> Cash.Turnover.Rate                  0.49999999996396476964477528781571891
#> Current.Liability.to.Current.Assets 0.99999994037758399567650258177309297

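How to interpret this:

inv.logit() converts each coefficient into a probability, so these numbers are not odds ratios. A value of 0.5 simply means the coefficient is (almost) 0. For Operating.Expense.Rate and Cash.Turnover.Rate the estimates are about -1.0e-10 and -1.4e-10, so a one-unit increase leaves the odds of bankruptcy practically unchanged; it does not raise the bankruptcy probability by 50%. These columns are on a very large scale (Cash.Turnover.Rate, for example, takes values in the billions), so only very large changes in them matter.

The odds ratio is exp(coefficient): the factor by which the odds of bankruptcy are multiplied when a predictor increases by one unit, all else being equal. Current.Liability.to.Current.Assets has a positive coefficient (16.64), so an increase of 0.1 multiplies the odds of bankruptcy by about exp(1.664) ≈ 5.3. The remaining predictors have negative coefficients, so higher values (for example of Net.worth.Assets or Cash.flow.rate) lower the odds of bankruptcy.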
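Odds ratios come from exp() rather than inv.logit(). The sketch below hard-codes the estimates printed in summary(model2) above, so it runs without the data. With the fitted model at hand, exp(coef(model2)) would give the same per-unit values, and exp(confint.default(model2)) would give Wald confidence intervals for them.

```r
# Estimates copied from summary(model2) above (intercept omitted)
beta <- c(
  Operating.Gross.Margin              = -34.387925481701700,
  Operating.Expense.Rate              = -0.000000000100512,
  Cash.flow.rate                      = -47.768048370968096,
  Continuous.Net.Profit.Growth.Rate   = -65.783236945054966,
  Net.worth.Assets                    = -25.256591810520330,
  Cash.Turnover.Rate                  = -0.000000000144141,
  Current.Liability.to.Current.Assets =  16.635234166348820
)

# Odds ratio for a one-unit increase, and for a 0.1 increase, in each predictor
odds_ratio_per_unit <- exp(beta)
odds_ratio_per_0.1  <- exp(0.1 * beta)

signif(data.frame(per_unit = odds_ratio_per_unit, per_0.1 = odds_ratio_per_0.1), 3)
```

Values below 1 lower the odds of bankruptcy and values above 1 raise them. For ratios that live between 0 and 1, the per-0.1 column is usually easier to read than the per-unit one.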

K-Nearest Neighbour

Cross Validation

set.seed(50)

index <- rsample::initial_split(data=company_train_up,  
                       prop = 0.8, 
                       strata = Bankrupt.)
com_train <- training(index)
com_test <- testing(index)

Check the proportion

prop.table(table(com_train$Bankrupt.))
#> 
#> Not Bankrupt     Bankrupt 
#>          0.5          0.5
prop.table(table(com_test$Bankrupt.))
#> 
#> Not Bankrupt     Bankrupt 
#>          0.5          0.5

Prepare train data and test data

# Train Data predictor
train_x <- com_train %>% select_if(is.numeric) 

# Train Data target
train_y <- com_train %>% dplyr::select(Bankrupt.) 

# Test Data predictor
test_x <- com_test %>% select_if(is.numeric)

# Test Data target
test_y <-  com_test %>% dplyr::select(Bankrupt.)

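Scale the training and test predictors. k-NN is distance-based, so columns on a large scale (such as Cash.Turnover.Rate) would otherwise dominate the distance. The test set is scaled with the training set's means and standard deviations, so no information from the test set leaks into preprocessing.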

train_x <- scale(train_x)

test_x <- scale(test_x, 
                center=attr(train_x, "scaled:center"),
                scale=attr(train_x, "scaled:scale")) 
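A small illustration of scaling a test set with the training statistics, on hypothetical 2-column matrices (toy_train and toy_test are illustrative names). The first test row lies exactly at the training column means, so it maps to 0 in both columns:

```r
# Hypothetical 2-column training and test matrices
toy_train <- matrix(c(1, 2, 3, 4,  10, 20, 30, 40), ncol = 2)
toy_test  <- matrix(c(2.5, 5,  25, 50), ncol = 2)

toy_train_s <- scale(toy_train)
# Reuse the training means and standard deviations for the test rows
toy_test_s  <- scale(toy_test,
                     center = attr(toy_train_s, "scaled:center"),
                     scale  = attr(toy_train_s, "scaled:scale"))

colMeans(toy_train_s)  # 0 0
toy_test_s             # first row is 0 0: it sits at the training means
```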

Choose k as the square root of the number of training observations, a common rule of thumb:

sqrt(nrow(train_x))
#> [1] 91.88036

We round down to k = 91. An odd k also prevents tied votes between the two classes.
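The rule of thumb can be wrapped in a tiny helper that also forces k to be odd. choose_k() is a hypothetical name, and 8442 is the training-set size implied by sqrt(nrow(train_x)) = 91.88 above:

```r
# Hypothetical helper: square-root rule of thumb, stepped down to an odd number
choose_k <- function(n) {
  k <- floor(sqrt(n))
  if (k %% 2 == 0) k <- k - 1  # an odd k avoids tied votes with two classes
  k
}

choose_k(8442)  # 91
choose_k(100)   # 9 (sqrt gives 10, which is even)
```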

library(class)

com_pred <- knn(train = train_x, 
    test = test_x, 
    cl = train_y$Bankrupt., 
    k=91) 
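Under the hood, knn() finds the k training rows closest to each test row (Euclidean distance) and takes a majority vote of their classes. The sketch below reimplements that vote in base R on simulated, already-scaled 2-D data and compares it with class::knn(). knn_manual() and the toy data are illustrative, and knn_manual() ignores the tie-handling details of class::knn().

```r
library(class)

# Simulated, already-scaled 2-D data: 10 rows around (-1, -1), 10 around (1, 1)
set.seed(50)
toy_train <- rbind(matrix(rnorm(20, mean = -1, sd = 0.3), ncol = 2),
                   matrix(rnorm(20, mean =  1, sd = 0.3), ncol = 2))
toy_cl    <- factor(rep(c("Not Bankrupt", "Bankrupt"), each = 10),
                    levels = c("Not Bankrupt", "Bankrupt"))
toy_test  <- rbind(c(-0.9, -1.1), c(1.2, 0.8))

# Illustrative k-NN: Euclidean distance to every training row, then a majority vote
knn_manual <- function(train, test, cl, k) {
  apply(test, 1, function(row) {
    d     <- sqrt(colSums((t(train) - row)^2))
    votes <- table(cl[order(d)[seq_len(k)]])
    names(votes)[which.max(votes)]
  })
}

manual_pred <- knn_manual(toy_train, toy_test, toy_cl, k = 5)
pkg_pred    <- knn(train = toy_train, test = toy_test, cl = toy_cl, k = 5)
manual_pred
pkg_pred
```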

Model Evaluation

confusionMatrix(data=com_pred, 
                reference=test_y$Bankrupt., 
                positive="Bankrupt")
#> Confusion Matrix and Statistics
#> 
#>               Reference
#> Prediction     Not Bankrupt Bankrupt
#>   Not Bankrupt          842      113
#>   Bankrupt              214      943
#>                                                
#>                Accuracy : 0.8452               
#>                  95% CI : (0.829, 0.8603)      
#>     No Information Rate : 0.5                  
#>     P-Value [Acc > NIR] : < 0.00000000000000022
#>                                                
#>                   Kappa : 0.6903               
#>                                                
#>  Mcnemar's Test P-Value : 0.00000003202        
#>                                                
#>             Sensitivity : 0.8930               
#>             Specificity : 0.7973               
#>          Pos Pred Value : 0.8150               
#>          Neg Pred Value : 0.8817               
#>              Prevalence : 0.5000               
#>          Detection Rate : 0.4465               
#>    Detection Prevalence : 0.5478               
#>       Balanced Accuracy : 0.8452               
#>                                                
#>        'Positive' Class : Bankrupt             
#> 

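With a sensitivity of 0.8930, k-NN scores slightly higher than logistic regression (0.88095). The comparison is not like-for-like, however: the k-NN model was evaluated on a split of the upsampled training data, which is balanced 50/50 and can contain copies of the same bankrupt company in both parts, whereas logistic regression was evaluated on the original test set with about 3% bankrupt companies.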

Conclusion

We tackled this problem with Logistic Regression and K-Nearest Neighbour. Because we focus on companies that go bankrupt, we use sensitivity as the evaluation metric: we want as few false negatives (companies predicted as not bankrupt that actually go bankrupt) as possible. In my opinion, we can use the K-Nearest Neighbour model.