knitr::include_graphics('bank-bankrupt-creative-concept-banking-house-building-fenced-warning-line-signal-tape-inscription-caution-board-as-46776262.jpg')
Objective
The data were collected from the Taiwan Economic Journal for the years 1999 to 2009. Company bankruptcy was defined based on the business regulations of the Taiwan Stock Exchange. Using these data, we are going to predict whether a company will go bankrupt or not.
Libraries
library(dplyr)
library(tidyverse)
library(gmodels)
library(gtools)
library(class)
library(caret)
Logistic Regression
Data Import
I use the “Company Bankruptcy Prediction” dataset from Kaggle.
company <- read.csv('data.csv')
glimpse(company)
#> Rows: 6,819
#> Columns: 96
#> $ Bankrupt. <int> 1, 1, 1, 1, 1,…
#> $ ROA.C..before.interest.and.depreciation.before.interest <dbl> 0.3705943, 0.4…
#> $ ROA.A..before.interest.and...after.tax <dbl> 0.4243894, 0.5…
#> $ ROA.B..before.interest.and.depreciation.after.tax <dbl> 0.4057498, 0.5…
#> $ Operating.Gross.Margin <dbl> 0.6014572, 0.6…
#> $ Realized.Sales.Gross.Margin <dbl> 0.6014572, 0.6…
#> $ Operating.Profit.Rate <dbl> 0.9989692, 0.9…
#> $ Pre.tax.net.Interest.Rate <dbl> 0.7968871, 0.7…
#> $ After.tax.net.Interest.Rate <dbl> 0.8088094, 0.8…
#> $ Non.industry.income.and.expenditure.revenue <dbl> 0.3026464, 0.3…
#> $ Continuous.interest.rate..after.tax. <dbl> 0.7809849, 0.7…
#> $ Operating.Expense.Rate <dbl> 0.0001256969, …
#> $ Research.and.development.expense.rate <dbl> 0, 0, 25500000…
#> $ Cash.flow.rate <dbl> 0.4581431, 0.4…
#> $ Interest.bearing.debt.interest.rate <dbl> 0.0007250725, …
#> $ Tax.rate..A. <dbl> 0.000000000, 0…
#> $ Net.Value.Per.Share..B. <dbl> 0.1479499, 0.1…
#> $ Net.Value.Per.Share..A. <dbl> 0.1479499, 0.1…
#> $ Net.Value.Per.Share..C. <dbl> 0.1479499, 0.1…
#> $ Persistent.EPS.in.the.Last.Four.Seasons <dbl> 0.1691406, 0.2…
#> $ Cash.Flow.Per.Share <dbl> 0.3116644, 0.3…
#> $ Revenue.Per.Share..Yuan... <dbl> 0.017559780, 0…
#> $ Operating.Profit.Per.Share..Yuan... <dbl> 0.09592053, 0.…
#> $ Per.Share.Net.profit.before.tax..Yuan... <dbl> 0.1387362, 0.1…
#> $ Realized.Sales.Gross.Profit.Growth.Rate <dbl> 0.02210228, 0.…
#> $ Operating.Profit.Growth.Rate <dbl> 0.8481950, 0.8…
#> $ After.tax.Net.Profit.Growth.Rate <dbl> 0.6889795, 0.6…
#> $ Regular.Net.Profit.Growth.Rate <dbl> 0.6889795, 0.6…
#> $ Continuous.Net.Profit.Growth.Rate <dbl> 0.2175354, 0.2…
#> $ Total.Asset.Growth.Rate <dbl> 4980000000, 61…
#> $ Net.Value.Growth.Rate <dbl> 0.0003269773, …
#> $ Total.Asset.Return.Growth.Rate.Ratio <dbl> 0.2631000, 0.2…
#> $ Cash.Reinvestment.. <dbl> 0.3637253, 0.3…
#> $ Current.Ratio <dbl> 0.002258963, 0…
#> $ Quick.Ratio <dbl> 0.0012077551, …
#> $ Interest.Expense.Ratio <dbl> 0.6299513, 0.6…
#> $ Total.debt.Total.net.worth <dbl> 0.021265924, 0…
#> $ Debt.ratio.. <dbl> 0.20757626, 0.…
#> $ Net.worth.Assets <dbl> 0.7924237, 0.8…
#> $ Long.term.fund.suitability.ratio..A. <dbl> 0.005024455, 0…
#> $ Borrowing.dependency <dbl> 0.3902844, 0.3…
#> $ Contingent.liabilities.Net.worth <dbl> 0.006478502, 0…
#> $ Operating.profit.Paid.in.capital <dbl> 0.09588483, 0.…
#> $ Net.profit.before.tax.Paid.in.capital <dbl> 0.1377573, 0.1…
#> $ Inventory.and.accounts.receivable.Net.value <dbl> 0.3980357, 0.3…
#> $ Total.Asset.Turnover <dbl> 0.08695652, 0.…
#> $ Accounts.Receivable.Turnover <dbl> 0.0018138841, …
#> $ Average.Collection.Days <dbl> 0.003487364, 0…
#> $ Inventory.Turnover.Rate..times. <dbl> 0.0001820926, …
#> $ Fixed.Assets.Turnover.Frequency <dbl> 0.0001165007, …
#> $ Net.Worth.Turnover.Rate..times. <dbl> 0.03290323, 0.…
#> $ Revenue.per.person <dbl> 0.034164182, 0…
#> $ Operating.profit.per.person <dbl> 0.3929129, 0.3…
#> $ Allocation.rate.per.person <dbl> 0.037135302, 0…
#> $ Working.Capital.to.Total.Assets <dbl> 0.6727753, 0.7…
#> $ Quick.Assets.Total.Assets <dbl> 0.16667296, 0.…
#> $ Current.Assets.Total.Assets <dbl> 0.1906430, 0.1…
#> $ Cash.Total.Assets <dbl> 0.0040944060, …
#> $ Quick.Assets.Current.Liability <dbl> 0.001996771, 0…
#> $ Cash.Current.Liability <dbl> 0.0001473360, …
#> $ Current.Liability.to.Assets <dbl> 0.14730845, 0.…
#> $ Operating.Funds.to.Liability <dbl> 0.3340152, 0.3…
#> $ Inventory.Working.Capital <dbl> 0.2769202, 0.2…
#> $ Inventory.Current.Liability <dbl> 0.001035990, 0…
#> $ Current.Liabilities.Liability <dbl> 0.6762692, 0.3…
#> $ Working.Capital.Equity <dbl> 0.7212746, 0.7…
#> $ Current.Liabilities.Equity <dbl> 0.3390770, 0.3…
#> $ Long.term.Liability.to.Current.Assets <dbl> 0.025592368, 0…
#> $ Retained.Earnings.to.Total.Assets <dbl> 0.9032248, 0.9…
#> $ Total.income.Total.expense <dbl> 0.002021613, 0…
#> $ Total.expense.Assets <dbl> 0.064855708, 0…
#> $ Current.Asset.Turnover.Rate <dbl> 701000000.0000…
#> $ Quick.Asset.Turnover.Rate <dbl> 6550000000.000…
#> $ Working.capitcal.Turnover.Rate <dbl> 0.5938305, 0.5…
#> $ Cash.Turnover.Rate <dbl> 458000000.0000…
#> $ Cash.Flow.to.Sales <dbl> 0.6715677, 0.6…
#> $ Fixed.Assets.to.Assets <dbl> 0.4242058, 0.4…
#> $ Current.Liability.to.Liability <dbl> 0.6762692, 0.3…
#> $ Current.Liability.to.Equity <dbl> 0.3390770, 0.3…
#> $ Equity.to.Long.term.Liability <dbl> 0.1265495, 0.1…
#> $ Cash.Flow.to.Total.Assets <dbl> 0.6375554, 0.6…
#> $ Cash.Flow.to.Liability <dbl> 0.4586091, 0.4…
#> $ CFO.to.Assets <dbl> 0.5203819, 0.5…
#> $ Cash.Flow.to.Equity <dbl> 0.3129049, 0.3…
#> $ Current.Liability.to.Current.Assets <dbl> 0.11825048, 0.…
#> $ Liability.Assets.Flag <int> 0, 0, 0, 0, 0,…
#> $ Net.Income.to.Total.Assets <dbl> 0.7168453, 0.7…
#> $ Total.assets.to.GNP.price <dbl> 0.0092194400, …
#> $ No.credit.Interval <dbl> 0.6228790, 0.6…
#> $ Gross.Profit.to.Sales <dbl> 0.6014533, 0.6…
#> $ Net.Income.to.Stockholder.s.Equity <dbl> 0.8278902, 0.8…
#> $ Liability.to.Equity <dbl> 0.2902019, 0.2…
#> $ Degree.of.Financial.Leverage..DFL. <dbl> 0.02660063, 0.…
#> $ Interest.Coverage.Ratio..Interest.expense.to.EBIT. <dbl> 0.5640501, 0.5…
#> $ Net.Income.Flag <int> 1, 1, 1, 1, 1,…
#> $ Equity.to.Liability <dbl> 0.01646874, 0.…
About the data
Attributes (Y = output feature, X = input features):
Y - Bankrupt?: Class label
X1 - ROA(C) before interest and depreciation before interest: Return On Total Assets(C)
X2 - ROA(A) before interest and % after tax: Return On Total Assets(A)
X3 - ROA(B) before interest and depreciation after tax: Return On Total Assets(B)
X4 - Operating Gross Margin: Gross Profit/Net Sales
X5 - Realized Sales Gross Margin: Realized Gross Profit/Net Sales
X6 - Operating Profit Rate: Operating Income/Net Sales
X7 - Pre-tax net Interest Rate: Pre-Tax Income/Net Sales
X8 - After-tax net Interest Rate: Net Income/Net Sales
X9 - Non-industry income and expenditure/revenue: Net Non-operating Income Ratio
X10 - Continuous interest rate (after tax): Net Income-Exclude Disposal Gain or Loss/Net Sales
X11 - Operating Expense Rate: Operating Expenses/Net Sales
X12 - Research and development expense rate: (Research and Development Expenses)/Net Sales
X13 - Cash flow rate: Cash Flow from Operating/Current Liabilities
X14 - Interest-bearing debt interest rate: Interest-bearing Debt/Equity
X15 - Tax rate (A): Effective Tax Rate
X16 - Net Value Per Share (B): Book Value Per Share(B)
X17 - Net Value Per Share (A): Book Value Per Share(A)
X18 - Net Value Per Share (C): Book Value Per Share(C)
X19 - Persistent EPS in the Last Four Seasons: EPS-Net Income
X20 - Cash Flow Per Share
X21 - Revenue Per Share (Yuan ¥): Sales Per Share
X22 - Operating Profit Per Share (Yuan ¥): Operating Income Per Share
X23 - Per Share Net profit before tax (Yuan ¥): Pretax Income Per Share
X24 - Realized Sales Gross Profit Growth Rate
X25 - Operating Profit Growth Rate: Operating Income Growth
X26 - After-tax Net Profit Growth Rate: Net Income Growth
X27 - Regular Net Profit Growth Rate: Continuing Operating Income after Tax Growth
X28 - Continuous Net Profit Growth Rate: Net Income-Excluding Disposal Gain or Loss Growth
X29 - Total Asset Growth Rate: Total Asset Growth
X30 - Net Value Growth Rate: Total Equity Growth
X31 - Total Asset Return Growth Rate Ratio: Return on Total Asset Growth
X32 - Cash Reinvestment %: Cash Reinvestment Ratio
X33 - Current Ratio
X34 - Quick Ratio: Acid Test
X35 - Interest Expense Ratio: Interest Expenses/Total Revenue
X36 - Total debt/Total net worth: Total Liability/Equity Ratio
X37 - Debt ratio %: Liability/Total Assets
X38 - Net worth/Assets: Equity/Total Assets
X39 - Long-term fund suitability ratio (A): (Long-term Liability+Equity)/Fixed Assets
X40 - Borrowing dependency: Cost of Interest-bearing Debt
X41 - Contingent liabilities/Net worth: Contingent Liability/Equity
X42 - Operating profit/Paid-in capital: Operating Income/Capital
X43 - Net profit before tax/Paid-in capital: Pretax Income/Capital
X44 - Inventory and accounts receivable/Net value: (Inventory+Accounts Receivables)/Equity
X45 - Total Asset Turnover
X46 - Accounts Receivable Turnover
X47 - Average Collection Days: Days Receivable Outstanding
X48 - Inventory Turnover Rate (times)
X49 - Fixed Assets Turnover Frequency
X50 - Net Worth Turnover Rate (times): Equity Turnover
X51 - Revenue per person: Sales Per Employee
X52 - Operating profit per person: Operation Income Per Employee
X53 - Allocation rate per person: Fixed Assets Per Employee
X54 - Working Capital to Total Assets
X55 - Quick Assets/Total Assets
X56 - Current Assets/Total Assets
X57 - Cash/Total Assets
X58 - Quick Assets/Current Liability
X59 - Cash/Current Liability
X60 - Current Liability to Assets
X61 - Operating Funds to Liability
X62 - Inventory/Working Capital
X63 - Inventory/Current Liability
X64 - Current Liabilities/Liability
X65 - Working Capital/Equity
X66 - Current Liabilities/Equity
X67 - Long-term Liability to Current Assets
X68 - Retained Earnings to Total Assets
X69 - Total income/Total expense
X70 - Total expense/Assets
X71 - Current Asset Turnover Rate: Current Assets to Sales
X72 - Quick Asset Turnover Rate: Quick Assets to Sales
X73 - Working capitcal Turnover Rate: Working Capital to Sales
X74 - Cash Turnover Rate: Cash to Sales
X75 - Cash Flow to Sales
X76 - Fixed Assets to Assets
X77 - Current Liability to Liability
X78 - Current Liability to Equity
X79 - Equity to Long-term Liability
X80 - Cash Flow to Total Assets
X81 - Cash Flow to Liability
X82 - CFO to Assets
X83 - Cash Flow to Equity
X84 - Current Liability to Current Assets
X85 - Liability-Assets Flag: 1 if Total Liability exceeds Total Assets, 0 otherwise
X86 - Net Income to Total Assets
X87 - Total assets to GNP price
X88 - No-credit Interval
X89 - Gross Profit to Sales
X90 - Net Income to Stockholder’s Equity
X91 - Liability to Equity
X92 - Degree of Financial Leverage (DFL)
X93 - Interest Coverage Ratio (Interest expense to EBIT)
X94 - Net Income Flag: 1 if Net Income is Negative for the last two years, 0 otherwise
X95 - Equity to Liability
Check the data first with head()
head(company)
#> Bankrupt. ROA.C..before.interest.and.depreciation.before.interest
#> 1 1 0.3705943
#> 2 1 0.4642909
#> 3 1 0.4260713
#> 4 1 0.3998440
#> 5 1 0.4650222
#> 6 1 0.3886803
#> ROA.A..before.interest.and...after.tax
#> 1 0.4243894
#> 2 0.5382141
#> 3 0.4990188
#> 4 0.4512647
#> 5 0.5384322
#> 6 0.4151766
#> ROA.B..before.interest.and.depreciation.after.tax Operating.Gross.Margin
#> 1 0.4057498 0.6014572
#> 2 0.5167300 0.6102351
#> 3 0.4722951 0.6014500
#> 4 0.4577333 0.5835411
#> 5 0.5222978 0.5987835
#> 6 0.4191338 0.5901714
#> Realized.Sales.Gross.Margin Operating.Profit.Rate Pre.tax.net.Interest.Rate
#> 1 0.6014572 0.9989692 0.7968871
#> 2 0.6102351 0.9989460 0.7973802
#> 3 0.6013635 0.9988574 0.7964034
#> 4 0.5835411 0.9986997 0.7969670
#> 5 0.5987835 0.9989731 0.7973661
#> 6 0.5902507 0.9987581 0.7969032
#> After.tax.net.Interest.Rate Non.industry.income.and.expenditure.revenue
#> 1 0.8088094 0.3026464
#> 2 0.8093007 0.3035564
#> 3 0.8083875 0.3020352
#> 4 0.8089656 0.3033495
#> 5 0.8093037 0.3034750
#> 6 0.8087706 0.3031158
#> Continuous.interest.rate..after.tax. Operating.Expense.Rate
#> 1 0.7809849 0.0001256969
#> 2 0.7815060 0.0002897851
#> 3 0.7802839 0.0002361297
#> 4 0.7812410 0.0001078888
#> 5 0.7815500 7890000000.0000000000
#> 6 0.7810691 0.0001571500
#> Research.and.development.expense.rate Cash.flow.rate
#> 1 0 0.4581431
#> 2 0 0.4618673
#> 3 25500000 0.4585206
#> 4 0 0.4657054
#> 5 0 0.4627463
#> 6 0 0.4658615
#> Interest.bearing.debt.interest.rate Tax.rate..A. Net.Value.Per.Share..B.
#> 1 0.0007250725 0 0.1479499
#> 2 0.0006470647 0 0.1822511
#> 3 0.0007900790 0 0.1779107
#> 4 0.0004490449 0 0.1541865
#> 5 0.0006860686 0 0.1675024
#> 6 0.0007160716 0 0.1555771
#> Net.Value.Per.Share..A. Net.Value.Per.Share..C.
#> 1 0.1479499 0.1479499
#> 2 0.1822511 0.1822511
#> 3 0.1779107 0.1937129
#> 4 0.1541865 0.1541865
#> 5 0.1675024 0.1675024
#> 6 0.1555771 0.1555771
#> Persistent.EPS.in.the.Last.Four.Seasons Cash.Flow.Per.Share
#> 1 0.1691406 0.3116644
#> 2 0.2089439 0.3181368
#> 3 0.1805805 0.3071019
#> 4 0.1937222 0.3216736
#> 5 0.2125366 0.3191625
#> 6 0.1744351 0.3253873
#> Revenue.Per.Share..Yuan... Operating.Profit.Per.Share..Yuan...
#> 1 0.017559780 0.09592053
#> 2 0.021144335 0.09372201
#> 3 0.005944008 0.09233776
#> 4 0.014368468 0.07776240
#> 5 0.029689792 0.09689765
#> 6 0.018104270 0.07808810
#> Per.Share.Net.profit.before.tax..Yuan...
#> 1 0.1387362
#> 2 0.1699179
#> 3 0.1428033
#> 4 0.1486028
#> 5 0.1684115
#> 6 0.1388115
#> Realized.Sales.Gross.Profit.Growth.Rate Operating.Profit.Growth.Rate
#> 1 0.02210228 0.8481950
#> 2 0.02208017 0.8480879
#> 3 0.02276010 0.8480940
#> 4 0.02204607 0.8480055
#> 5 0.02209591 0.8482582
#> 6 0.02156494 0.8479828
#> After.tax.Net.Profit.Growth.Rate Regular.Net.Profit.Growth.Rate
#> 1 0.6889795 0.6889795
#> 2 0.6896929 0.6897017
#> 3 0.6894627 0.6894697
#> 4 0.6891095 0.6891095
#> 5 0.6896969 0.6896969
#> 6 0.6891051 0.6891775
#> Continuous.Net.Profit.Growth.Rate Total.Asset.Growth.Rate
#> 1 0.2175354 4980000000
#> 2 0.2176196 6110000000
#> 3 0.2176013 7280000000
#> 4 0.2175682 4880000000
#> 5 0.2176256 5510000000
#> 6 0.2175664 608000000
#> Net.Value.Growth.Rate Total.Asset.Return.Growth.Rate.Ratio
#> 1 0.0003269773 0.2631000
#> 2 0.0004430401 0.2645158
#> 3 0.0003964253 0.2641840
#> 4 0.0003824259 0.2633712
#> 5 0.0004389476 0.2652182
#> 6 0.0003517819 0.2632500
#> Cash.Reinvestment.. Current.Ratio Quick.Ratio Interest.Expense.Ratio
#> 1 0.3637253 0.002258963 0.0012077551 0.6299513
#> 2 0.3767091 0.006016206 0.0040393668 0.6351725
#> 3 0.3689132 0.011542554 0.0053475602 0.6296314
#> 4 0.3840766 0.004194059 0.0028964911 0.6302284
#> 5 0.3796897 0.006022446 0.0037274466 0.6360550
#> 6 0.3880258 0.002740085 0.0008546614 0.6301838
#> Total.debt.Total.net.worth Debt.ratio.. Net.worth.Assets
#> 1 0.021265924 0.2075763 0.7924237
#> 2 0.012502394 0.1711763 0.8288237
#> 3 0.021247686 0.2075158 0.7924842
#> 4 0.009572402 0.1514648 0.8485352
#> 5 0.005149600 0.1065091 0.8934909
#> 6 0.014213152 0.1804275 0.8195725
#> Long.term.fund.suitability.ratio..A. Borrowing.dependency
#> 1 0.005024455 0.3902844
#> 2 0.005058882 0.3767600
#> 3 0.005099899 0.3790929
#> 4 0.005046924 0.3797427
#> 5 0.005303319 0.3750254
#> 6 0.004913193 0.3814482
#> Contingent.liabilities.Net.worth Operating.profit.Paid.in.capital
#> 1 0.006478502 0.09588483
#> 2 0.005835039 0.09374338
#> 3 0.006561982 0.09231847
#> 4 0.005365848 0.07772729
#> 5 0.006623525 0.09692706
#> 6 0.005749123 0.07810185
#> Net.profit.before.tax.Paid.in.capital
#> 1 0.1377573
#> 2 0.1689616
#> 3 0.1480356
#> 4 0.1475605
#> 5 0.1674610
#> 6 0.1378252
#> Inventory.and.accounts.receivable.Net.value Total.Asset.Turnover
#> 1 0.3980357 0.08695652
#> 2 0.3977249 0.06446777
#> 3 0.4065805 0.01499250
#> 4 0.3979245 0.08995502
#> 5 0.4000788 0.17541229
#> 6 0.4004191 0.09595202
#> Accounts.Receivable.Turnover Average.Collection.Days
#> 1 0.001813884 0.003487364
#> 2 0.001286356 0.004916808
#> 3 0.001495338 0.004226849
#> 4 0.001966056 0.003214967
#> 5 0.001448673 0.004366891
#> 6 0.001527802 0.004137189
#> Inventory.Turnover.Rate..times. Fixed.Assets.Turnover.Frequency
#> 1 0.0001820926 0.0001165007
#> 2 9360000000.0000000000 719000000.0000000000
#> 3 65000000.0000000000 2650000000.0000000000
#> 4 7130000000.0000000000 9150000000.0000000000
#> 5 0.0001633674 0.0002935211
#> 6 650000000.0000000000 9300000000.0000000000
#> Net.Worth.Turnover.Rate..times. Revenue.per.person
#> 1 0.03290323 0.034164182
#> 2 0.02548387 0.006888651
#> 3 0.01338710 0.028996960
#> 4 0.02806452 0.015463478
#> 5 0.04016129 0.058111423
#> 6 0.02967742 0.021300471
#> Operating.profit.per.person Allocation.rate.per.person
#> 1 0.3929129 0.03713530
#> 2 0.3915900 0.01233497
#> 3 0.3819678 0.14101631
#> 4 0.3784966 0.02131999
#> 5 0.3943715 0.02398821
#> 6 0.3775695 0.03282877
#> Working.Capital.to.Total.Assets Quick.Assets.Total.Assets
#> 1 0.6727753 0.16667296
#> 2 0.7511109 0.12723600
#> 3 0.8295019 0.34020088
#> 4 0.7257542 0.16157453
#> 5 0.7518225 0.26032988
#> 6 0.6867286 0.08026371
#> Current.Assets.Total.Assets Cash.Total.Assets Quick.Assets.Current.Liability
#> 1 0.1906430 0.0040944060 0.001996771
#> 2 0.1824191 0.0149477270 0.004136030
#> 3 0.6028057 0.0009909445 0.006302481
#> 4 0.2258149 0.0188506248 0.002961238
#> 5 0.3583802 0.0141609738 0.004274771
#> 6 0.2145360 0.0026452256 0.000988425
#> Cash.Current.Liability Current.Liability.to.Assets
#> 1 0.0001473360 0.14730845
#> 2 0.0013839101 0.05696283
#> 3 5340000000.0000000000 0.09816206
#> 4 0.0010106464 0.09871463
#> 5 0.0006804636 0.11019485
#> 6 0.0001008563 0.13900211
#> Operating.Funds.to.Liability Inventory.Working.Capital
#> 1 0.3340152 0.2769202
#> 2 0.3411060 0.2896416
#> 3 0.3367315 0.2774555
#> 4 0.3487164 0.2765803
#> 5 0.3446388 0.2879127
#> 6 0.3505631 0.2766785
#> Inventory.Current.Liability Current.Liabilities.Liability
#> 1 0.001035990 0.6762692
#> 2 0.005209682 0.3085886
#> 3 0.013878786 0.4460275
#> 4 0.003540148 0.6158484
#> 5 0.004868570 0.9750066
#> 6 0.004879131 0.7333519
#> Working.Capital.Equity Current.Liabilities.Equity
#> 1 0.7212746 0.3390770
#> 2 0.7319753 0.3297401
#> 3 0.7427286 0.3347769
#> 4 0.7298249 0.3315090
#> 5 0.7319996 0.3307263
#> 6 0.7252016 0.3355344
#> Long.term.Liability.to.Current.Assets Retained.Earnings.to.Total.Assets
#> 1 0.025592368 0.9032248
#> 2 0.023946819 0.9310652
#> 3 0.003715116 0.9099034
#> 4 0.022165200 0.9069022
#> 5 0.000000000 0.9138502
#> 6 0.003772505 0.9030413
#> Total.income.Total.expense Total.expense.Assets Current.Asset.Turnover.Rate
#> 1 0.002021613 0.06485571 701000000.0000000000
#> 2 0.002225608 0.02551586 0.0001065198
#> 3 0.002060071 0.02138743 0.0017910937
#> 4 0.001831359 0.02416107 8140000000.0000000000
#> 5 0.002223930 0.02638525 6680000000.0000000000
#> 6 0.001865609 0.04009362 8010000000.0000000000
#> Quick.Asset.Turnover.Rate Working.capitcal.Turnover.Rate Cash.Turnover.Rate
#> 1 6550000000.000000000 0.5938305 458000000
#> 2 7700000000.000000000 0.5939155 2490000000
#> 3 0.001022676 0.5945019 761000000
#> 4 6050000000.000000000 0.5938888 2030000000
#> 5 5050000000.000000000 0.5939153 824000000
#> 6 2810000000.000000000 0.5938458 295000000
#> Cash.Flow.to.Sales Fixed.Assets.to.Assets Current.Liability.to.Liability
#> 1 0.6715677 0.4242058 0.6762692
#> 2 0.6715699 0.4688281 0.3085886
#> 3 0.6715713 0.2761792 0.4460275
#> 4 0.6715192 0.5591440 0.6158484
#> 5 0.6715631 0.3095549 0.9750066
#> 6 0.6715676 0.6031935 0.7333519
#> Current.Liability.to.Equity Equity.to.Long.term.Liability
#> 1 0.3390770 0.1265495
#> 2 0.3297401 0.1209161
#> 3 0.3347769 0.1179223
#> 4 0.3315090 0.1207605
#> 5 0.3307263 0.1109332
#> 6 0.3355344 0.1129172
#> Cash.Flow.to.Total.Assets Cash.Flow.to.Liability CFO.to.Assets
#> 1 0.6375554 0.4586091 0.5203819
#> 2 0.6411000 0.4590011 0.5671013
#> 3 0.6427646 0.4592540 0.5384905
#> 4 0.5790393 0.4485179 0.6041051
#> 5 0.6223741 0.4544109 0.5784689
#> 6 0.6374698 0.4584993 0.6221901
#> Cash.Flow.to.Equity Current.Liability.to.Current.Assets Liability.Assets.Flag
#> 1 0.3129049 0.11825048 0
#> 2 0.3141631 0.04777528 0
#> 3 0.3145154 0.02534649 0
#> 4 0.3023823 0.06724962 0
#> 5 0.3115672 0.04772537 0
#> 6 0.3132685 0.09952193 0
#> Net.Income.to.Total.Assets Total.assets.to.GNP.price No.credit.Interval
#> 1 0.7168453 0.009219440 0.6228790
#> 2 0.7952971 0.008323302 0.6236517
#> 3 0.7746697 0.040002853 0.6238410
#> 4 0.7395545 0.003252475 0.6229287
#> 5 0.7950159 0.003877563 0.6235207
#> 6 0.7104205 0.005277875 0.6226046
#> Gross.Profit.to.Sales Net.Income.to.Stockholder.s.Equity Liability.to.Equity
#> 1 0.6014533 0.8278902 0.2902019
#> 2 0.6102365 0.8399693 0.2838460
#> 3 0.6014493 0.8367743 0.2901885
#> 4 0.5835376 0.8346971 0.2817212
#> 5 0.5987815 0.8399727 0.2785138
#> 6 0.5901723 0.8299390 0.2850871
#> Degree.of.Financial.Leverage..DFL.
#> 1 0.02660063
#> 2 0.26457682
#> 3 0.02655472
#> 4 0.02669663
#> 5 0.02475185
#> 6 0.02667537
#> Interest.Coverage.Ratio..Interest.expense.to.EBIT. Net.Income.Flag
#> 1 0.5640501 1
#> 2 0.5701749 1
#> 3 0.5637061 1
#> 4 0.5646634 1
#> 5 0.5756166 1
#> 6 0.5645383 1
#> Equity.to.Liability
#> 1 0.01646874
#> 2 0.02079431
#> 3 0.01647411
#> 4 0.02398233
#> 5 0.03549020
#> 6 0.01953448
Data Cleansing
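As we can see, the data are not clean yet: some data types are not right
(for example, the target Bankrupt. is stored as an integer). We are going to
clean them a bit with dplyr, keeping only a subset of the columns, then check
again with glimpse().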
company_clean <- company %>%
dplyr::select(Bankrupt., Operating.Gross.Margin , Operating.Profit.Rate,
After.tax.net.Interest.Rate, Operating.Expense.Rate,
Cash.flow.rate, Gross.Profit.to.Sales,
Continuous.Net.Profit.Growth.Rate ,Net.worth.Assets,
Cash.Turnover.Rate, Current.Liability.to.Current.Assets) %>%
mutate_if(is.integer, as.factor) %>%
mutate(
Bankrupt. = factor(Bankrupt. , levels = c(0, 1),
labels = c("Not Bankrupt",
"Bankrupt")))
glimpse(company_clean)
#> Rows: 6,819
#> Columns: 11
#> $ Bankrupt. <fct> Bankrupt, Bankrupt, Bankrupt, Bank…
#> $ Operating.Gross.Margin <dbl> 0.6014572, 0.6102351, 0.6014500, 0…
#> $ Operating.Profit.Rate <dbl> 0.9989692, 0.9989460, 0.9988574, 0…
#> $ After.tax.net.Interest.Rate <dbl> 0.8088094, 0.8093007, 0.8083875, 0…
#> $ Operating.Expense.Rate <dbl> 0.0001256969, 0.0002897851, 0.0002…
#> $ Cash.flow.rate <dbl> 0.4581431, 0.4618673, 0.4585206, 0…
#> $ Gross.Profit.to.Sales <dbl> 0.6014533, 0.6102365, 0.6014493, 0…
#> $ Continuous.Net.Profit.Growth.Rate <dbl> 0.2175354, 0.2176196, 0.2176013, 0…
#> $ Net.worth.Assets <dbl> 0.7924237, 0.8288237, 0.7924842, 0…
#> $ Cash.Turnover.Rate <dbl> 458000000.0000000000, 2490000000.0…
#> $ Current.Liability.to.Current.Assets <dbl> 0.11825048, 0.04777528, 0.02534649…
Check for missing values
company_clean %>% is.na() %>% colSums()
#> Bankrupt. Operating.Gross.Margin
#> 0 0
#> Operating.Profit.Rate After.tax.net.Interest.Rate
#> 0 0
#> Operating.Expense.Rate Cash.flow.rate
#> 0 0
#> Gross.Profit.to.Sales Continuous.Net.Profit.Growth.Rate
#> 0 0
#> Net.worth.Assets Cash.Turnover.Rate
#> 0 0
#> Current.Liability.to.Current.Assets
#> 0
The dataset has no missing values. We can continue.
Pre-Processing Data
Check the class proportions of the target variable. The target
(dependent) variable is the Bankrupt. column.
company_clean$Bankrupt. %>% table() %>% prop.table
#> .
#> Not Bankrupt Bankrupt
#> 0.9677372 0.0322628
It’s imbalanced. We are going to handle it later.
Cross Validation
We split the dataset into a train dataset and a test dataset. The train dataset is used to train the model. The test dataset is used to validate the model and see how well it works on unseen data.
# Split the dataset to train data and test data
index <- sample(nrow(company_clean),
nrow(company_clean)*0.8)
company_train <- company_clean[index,]
company_test <- company_clean[-index,]
# check proportion table for data train
company_train$Bankrupt. %>% table() %>% prop.table
#> .
#> Not Bankrupt Bankrupt
#> 0.96736939 0.03263061
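The split above calls sample() before any seed is set, so the rows that land in each set (and the proportions printed above) can change from run to run. A minimal sketch of a reproducible 80/20 split follows. It uses a small made-up data frame: `toy` and its columns are hypothetical and only stand in for company_clean.

```r
# Hypothetical toy data standing in for company_clean
toy <- data.frame(
  id = 1:100,
  Bankrupt. = factor(rep(c("Not Bankrupt", "Bankrupt"), times = c(95, 5)),
                     levels = c("Not Bankrupt", "Bankrupt"))
)

# Setting the seed BEFORE sampling makes the split repeatable
set.seed(70)
index <- sample(nrow(toy), size = floor(nrow(toy) * 0.8))

toy_train <- toy[index, ]
toy_test  <- toy[-index, ]

# Every row ends up in exactly one of the two sets
nrow(toy_train)                                # 80
nrow(toy_test)                                 # 20
length(intersect(toy_train$id, toy_test$id))   # 0
```

Running the block twice gives the same `index`, which is what makes later results comparable between runs.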
Our data is imbalanced, so we upsample the train data.
RNGkind(sample.kind = "Rounding")
set.seed(70)
library(rsample)
company_train_up <- upSample(
x = company_train %>% dplyr::select(-Bankrupt.),
y= company_train$Bankrupt.,
yname = "Bankrupt."
)
# check proportion table for data train
company_train_up$Bankrupt. %>% table() %>% prop.table
#> .
#> Not Bankrupt Bankrupt
#> 0.5 0.5
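To make clear what the upsampling step does, here is a minimal base R sketch of the same idea: rows of the minority class are drawn again with replacement until both classes have the same size. It is an illustration on made-up data (`toy` is hypothetical), not a description of how caret implements upSample().

```r
# Hypothetical imbalanced toy data: 95 "Not Bankrupt" rows, 5 "Bankrupt" rows
toy <- data.frame(
  x = seq_len(100),
  Bankrupt. = factor(rep(c("Not Bankrupt", "Bankrupt"), times = c(95, 5)),
                     levels = c("Not Bankrupt", "Bankrupt"))
)

set.seed(70)
majority_n <- max(table(toy$Bankrupt.))

# For each class, resample its row indices (with replacement) up to majority_n;
# a class that already has majority_n rows is kept as it is
rows_by_class <- split(seq_len(nrow(toy)), toy$Bankrupt.)
up_rows <- unlist(lapply(rows_by_class, function(rows) {
  if (length(rows) < majority_n) {
    rows[sample.int(length(rows), majority_n, replace = TRUE)]
  } else {
    rows
  }
}))
toy_up <- toy[up_rows, ]

# Both classes now have the same number of rows
table(toy_up$Bankrupt.)
```

Because the minority rows are duplicated, upsampling should only be applied to the train data, as is done above.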
Modeling
We are going to build the model with Logistic Regression, using
glm(). The target (dependent) variable is the
Bankrupt. column.
model_lr <- glm(
formula = Bankrupt.~ .,
family = 'binomial',
data = company_train_up
)
summary(model_lr)
#>
#> Call:
#> glm(formula = Bankrupt. ~ ., family = "binomial", data = company_train_up)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -8.49 -8.49 0.00 0.00 8.49
#>
#> Coefficients:
#> Estimate
#> (Intercept) 6094775238684640.0000000
#> Operating.Gross.Margin -133109908820438450186.0000000
#> Operating.Profit.Rate 25955025910767376.0000000
#> After.tax.net.Interest.Rate -3085474774989755.0000000
#> Operating.Expense.Rate -29196.2179969
#> Cash.flow.rate -17191486051624098.0000000
#> Gross.Profit.to.Sales 133104409970840371200.0000000
#> Continuous.Net.Profit.Growth.Rate -25412735752718940.0000000
#> Net.worth.Assets -13145407137642790.0000000
#> Cash.Turnover.Rate -39723.7341970
#> Current.Liability.to.Current.Assets -4288783996462404.0000000
#> Std. Error z value
#> (Intercept) 71635059.5184516 85080899
#> Operating.Gross.Margin 314203971127.2032471 -423641714
#> Operating.Profit.Rate 111882058.2161452 231985596
#> After.tax.net.Interest.Rate 102880465.4576698 -29990871
#> Operating.Expense.Rate 0.0002063 -141522648
#> Cash.flow.rate 47563076.3376580 -361446050
#> Gross.Profit.to.Sales 314202809182.2778931 423625780
#> Continuous.Net.Profit.Growth.Rate 78027681.7113657 -325688720
#> Net.worth.Assets 12163946.2788169 -1080686057
#> Cash.Turnover.Rate 0.0002436 -163092051
#> Current.Liability.to.Current.Assets 17631621.5918203 -243243877
#> Pr(>|z|)
#> (Intercept) <0.0000000000000002 ***
#> Operating.Gross.Margin <0.0000000000000002 ***
#> Operating.Profit.Rate <0.0000000000000002 ***
#> After.tax.net.Interest.Rate <0.0000000000000002 ***
#> Operating.Expense.Rate <0.0000000000000002 ***
#> Cash.flow.rate <0.0000000000000002 ***
#> Gross.Profit.to.Sales <0.0000000000000002 ***
#> Continuous.Net.Profit.Growth.Rate <0.0000000000000002 ***
#> Net.worth.Assets <0.0000000000000002 ***
#> Cash.Turnover.Rate <0.0000000000000002 ***
#> Current.Liability.to.Current.Assets <0.0000000000000002 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 14631 on 10553 degrees of freedom
#> Residual deviance: 272995 on 10543 degrees of freedom
#> AIC: 273017
#>
#> Number of Fisher Scoring iterations: 25
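The summary above shows 25 Fisher scoring iterations, which is the default iteration limit of glm(), together with a residual deviance far larger than the null deviance and coefficients of enormous magnitude. These are usually signs that the fit did not converge, so it is worth checking before interpreting the model. A minimal sketch of such a check on simulated data (the data and the names `x`, `y` and `fit` are made up for illustration):

```r
# Simulated, well-behaved data (hypothetical, not the bankruptcy dataset)
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, size = 1, prob = plogis(0.5 * x))

fit <- glm(y ~ x, family = binomial)

# Did the fitting algorithm converge?
fit$converged
# How many iterations were used, compared with the default limit?
fit$iter
glm.control()$maxit
# A converged model with an intercept should not fit worse than the null model
fit$deviance <= fit$null.deviance
```

The same checks can be applied to model_lr. One possible cause here, stated as an assumption, is that Operating.Gross.Margin and Gross.Profit.to.Sales carry almost identical values (compare the glimpse() output), which would make their coefficients unstable.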
Model Fitting
library(MASS)
model2 <- stepAIC(model_lr, direction = "backward", trace = FALSE)
summary(model2)
#>
#> Call:
#> glm(formula = Bankrupt. ~ Operating.Gross.Margin + Operating.Expense.Rate +
#> Cash.flow.rate + Continuous.Net.Profit.Growth.Rate + Net.worth.Assets +
#> Cash.Turnover.Rate + Current.Liability.to.Current.Assets,
#> family = "binomial", data = company_train_up)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -7.0975 -0.6171 0.0019 0.6579 2.2223
#>
#> Coefficients:
#> Estimate Std. Error
#> (Intercept) 78.467921268228167 3.183195861246445
#> Operating.Gross.Margin -34.387925481701700 2.497008570076800
#> Operating.Expense.Rate -0.000000000100512 0.000000000008488
#> Cash.flow.rate -47.768048370968096 2.695796389837017
#> Continuous.Net.Profit.Growth.Rate -65.783236945054966 11.085822098942645
#> Net.worth.Assets -25.256591810520330 0.689606865994374
#> Cash.Turnover.Rate -0.000000000144141 0.000000000010371
#> Current.Liability.to.Current.Assets 16.635234166348820 1.311227432149053
#> z value Pr(>|z|)
#> (Intercept) 24.651 < 0.0000000000000002 ***
#> Operating.Gross.Margin -13.772 < 0.0000000000000002 ***
#> Operating.Expense.Rate -11.841 < 0.0000000000000002 ***
#> Cash.flow.rate -17.719 < 0.0000000000000002 ***
#> Continuous.Net.Profit.Growth.Rate -5.934 0.00000000296 ***
#> Net.worth.Assets -36.625 < 0.0000000000000002 ***
#> Cash.Turnover.Rate -13.899 < 0.0000000000000002 ***
#> Current.Liability.to.Current.Assets 12.687 < 0.0000000000000002 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 14631.0 on 10553 degrees of freedom
#> Residual deviance: 8890.8 on 10546 degrees of freedom
#> AIC: 8906.8
#>
#> Number of Fisher Scoring iterations: 6
Prediction
With model2 from the backward stepwise selection, we are going to predict on
the test data.
company_test$prob_bankrupt <- predict(model2, type = "response", newdata = company_test)
Plot the prediction
ggplot(company_test, aes(x=prob_bankrupt)) +
geom_density(lwd=0.5) +
labs(title = "Distribution of Probability Prediction Data") +
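theme_minimal()
# Classify with a 0.5 cutoff; set the levels explicitly so they match Bankrupt.
company_test$pred_bankrupt <- factor(ifelse(company_test$prob_bankrupt > 0.5, "Bankrupt","Not Bankrupt"),
  levels = c("Not Bankrupt", "Bankrupt"))
company_test[1:10, c("pred_bankrupt", "Bankrupt.")]
#> pred_bankrupt Bankrupt.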
#> 9 Not Bankrupt Not Bankrupt
#> 19 Not Bankrupt Not Bankrupt
#> 23 Not Bankrupt Not Bankrupt
#> 29 Not Bankrupt Not Bankrupt
#> 30 Bankrupt Bankrupt
#> 34 Not Bankrupt Not Bankrupt
#> 55 Bankrupt Bankrupt
#> 57 Bankrupt Bankrupt
#> 58 Bankrupt Bankrupt
#> 71 Not Bankrupt Not Bankrupt
Model Evaluation
Model Evaluation with Confusion Matrix
library(caret)
CM <- confusionMatrix(company_test$pred_bankrupt, company_test$Bankrupt., positive = "Bankrupt")
CM
#> Confusion Matrix and Statistics
#>
#> Reference
#> Prediction Not Bankrupt Bankrupt
#> Not Bankrupt 1064 5
#> Bankrupt 258 37
#>
#> Accuracy : 0.8072
#> 95% CI : (0.7852, 0.8278)
#> No Information Rate : 0.9692
#> P-Value [Acc > NIR] : 1
#>
#> Kappa : 0.1751
#>
#> Mcnemar's Test P-Value : <0.0000000000000002
#>
#> Sensitivity : 0.88095
#> Specificity : 0.80484
#> Pos Pred Value : 0.12542
#> Neg Pred Value : 0.99532
#> Prevalence : 0.03079
#> Detection Rate : 0.02713
#> Detection Prevalence : 0.21628
#> Balanced Accuracy : 0.84290
#>
#> 'Positive' Class : Bankrupt
#>
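- Sensitivity : of all the actual positives (bankrupt companies), how many were predicted as positive
- Specificity : of all the actual negatives (not bankrupt companies), how many were predicted as negative
- Pos Pred Value : of all the positive predictions, how many are actually positive
- Neg Pred Value : of all the negative predictions, how many are actually negative
For this case, we want sensitivity to be as high as possible, because we are most concerned with catching the companies that actually go bankrupt.
Sensitivity : 0.88095, which means the model finds 37 of the 42 bankrupt companies in the test data. This is already a good result for our metric, although the Pos Pred Value of 0.12542 shows that most companies flagged as bankrupt are not actually bankrupt.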
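The four metrics can be recomputed by hand from the counts in the confusion matrix above, which makes the definitions concrete. A small sketch (the variable names are only illustrative):

```r
# Counts taken from the confusion matrix above ("Bankrupt" is the positive class)
tp <- 37    # predicted Bankrupt,     actually Bankrupt
fn <- 5     # predicted Not Bankrupt, actually Bankrupt
fp <- 258   # predicted Bankrupt,     actually Not Bankrupt
tn <- 1064  # predicted Not Bankrupt, actually Not Bankrupt

sensitivity    <- tp / (tp + fn)  # share of actual positives that were found
specificity    <- tn / (tn + fp)  # share of actual negatives that were found
pos_pred_value <- tp / (tp + fp)  # share of positive predictions that are right
neg_pred_value <- tn / (tn + fn)  # share of negative predictions that are right

round(c(sensitivity, specificity, pos_pred_value, neg_pred_value), 5)
#> [1] 0.88095 0.80484 0.12542 0.99532
```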
Model Interpretation
# Inverse logit of all coefficients (odds ratios would be exp(coefficient))
inv.logit(model2$coefficients) %>%
data.frame()
#> .
#> (Intercept) 1.00000000000000000000000000000000000
#> Operating.Gross.Margin 0.00000000000000116282328356635084455
#> Operating.Expense.Rate 0.49999999997487204472790267573145684
#> Cash.flow.rate 0.00000000000000000000179721560640081
#> Continuous.Net.Profit.Growth.Rate 0.00000000000000000000000000002695896
#> Net.worth.Assets 0.00000000001074487939130762646344069
#> Cash.Turnover.Rate 0.49999999996396476964477528781571891
#> Current.Liability.to.Current.Assets 0.99999994037758399567650258177309297
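We can interpret:
The inverse logit of the coefficients for Operating.Expense.Rate and Cash.Turnover.Rate is almost exactly 0.5, which means the coefficients themselves are almost zero (and slightly negative, see summary(model2)). A one unit increase in either variable therefore barely changes the odds of bankruptcy, with everything else being equal; it does not mean that the probability rises by 50%. Among the other predictors, only Current.Liability.to.Current.Assets has a positive coefficient, so it is the only one whose increase raises the odds of bankruptcy.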
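To see how the printed values relate to the model, it helps to put three scales side by side: the coefficient itself (log-odds), its odds ratio exp(coefficient), and its inverse logit. The sketch below uses two coefficients copied from the summary(model2) output and base R's plogis(), which computes the same inverse logit for these values as inv.logit(). The vector name `coefs` is only illustrative.

```r
# Two coefficients copied from the summary(model2) output above
coefs <- c(
  Operating.Expense.Rate              = -0.000000000100512,
  Current.Liability.to.Current.Assets = 16.635234166348820
)

data.frame(
  log_odds   = coefs,
  odds_ratio = exp(coefs),    # multiplicative change in the odds per one unit increase
  inv_logit  = plogis(coefs)  # 1 / (1 + exp(-coefficient))
)
```

A coefficient near zero gives an odds ratio near 1 and an inverse logit near 0.5, so a value of 0.5 in the table signals almost no change in the odds per unit rather than a 50% increase.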
K-Nearest Neighbour
Cross Validation
set.seed(50)
index <- rsample::initial_split(data=company_train_up,
prop = 0.8,
strata = Bankrupt.)
com_train <- training(index)
com_test <- testing(index)
Check the proportion
prop.table(table(com_train$Bankrupt.))
#>
#> Not Bankrupt Bankrupt
#> 0.5 0.5
prop.table(table(com_test$Bankrupt.))
#>
#> Not Bankrupt Bankrupt
#> 0.5 0.5
Prepare train data and test data
# Train Data predictor
train_x <- com_train %>% select_if(is.numeric)
# Train Data target
train_y <- com_train %>% dplyr::select(Bankrupt.)
# Test Data predictor
test_x <- com_test %>% select_if(is.numeric)
# Test Data target
test_y <- com_test %>% dplyr::select(Bankrupt.)
Scaling the train data and test data
train_x <- scale(train_x)
test_x <- scale(test_x,
center=attr(train_x, "scaled:center"),
scale=attr(train_x, "scaled:scale"))
Choose the k value from the square root of the number of observations
sqrt(nrow(train_x))
#> [1] 91.88036
We use k = 91
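The rule of thumb used here can be written as a small helper. This is only a sketch: `choose_k` is a hypothetical function, and stepping down to an odd number is an extra convention (it avoids tied votes between two classes) that is not needed here because 91 is already odd. The value 8442 is the number of train rows implied by the printed square root, which is an inference and not stated in the text.

```r
# Hypothetical helper: k = floor(sqrt(n)), lowered by one if that value is even
choose_k <- function(n_train) {
  k <- floor(sqrt(n_train))
  if (k %% 2 == 0) k <- k - 1
  max(k, 1)
}

choose_k(8442)  # sqrt(8442) is about 91.88, so k = 91
choose_k(100)   # sqrt(100) = 10 is even, so k = 9
```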
library(class)
com_pred <- knn(train = train_x,
test = test_x,
cl = train_y$Bankrupt.,
k=91)
Model Evaluation
confusionMatrix(data=com_pred,
reference=test_y$Bankrupt.,
positive="Bankrupt")
#> Confusion Matrix and Statistics
#>
#> Reference
#> Prediction Not Bankrupt Bankrupt
#> Not Bankrupt 842 113
#> Bankrupt 214 943
#>
#> Accuracy : 0.8452
#> 95% CI : (0.829, 0.8603)
#> No Information Rate : 0.5
#> P-Value [Acc > NIR] : < 0.00000000000000022
#>
#> Kappa : 0.6903
#>
#> Mcnemar's Test P-Value : 0.00000003202
#>
#> Sensitivity : 0.8930
#> Specificity : 0.7973
#> Pos Pred Value : 0.8150
#> Neg Pred Value : 0.8817
#> Prevalence : 0.5000
#> Detection Rate : 0.4465
#> Detection Prevalence : 0.5478
#> Balanced Accuracy : 0.8452
#>
#> 'Positive' Class : Bankrupt
#>
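With Sensitivity : 0.8930, KNN has a slightly better value than Logistic Regression (0.88095). Note that the two models were evaluated on different test data: Logistic Regression on the original, imbalanced test data and KNN on a split of the upsampled train data, so the comparison is only indicative.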
Conclusion
We have tried this problem with Logistic Regression and K-Nearest Neighbour. Because we want to focus on the bankrupt companies, we chose sensitivity as our evaluation metric: we want the number of False Negatives (cases where we predict not bankrupt, but the company actually goes bankrupt) to be as low as possible. In my opinion, we can use the K-Nearest Neighbour model.