Introduction

Clustering banks based on bankruptcy risk is a crucial analytical approach for understanding and managing financial stability within the banking sector. By grouping banks with similar risk profiles, stakeholders can better assess potential vulnerabilities, allocate resources efficiently, and implement targeted strategies. This method provides a structured way to address the complexities of financial risk and improve overall oversight and intervention efforts.

Motivation

  • Risk Assessment and Management:
    • Identifies banks with similar bankruptcy risks for targeted risk management.
    • Facilitates the development of specific strategies for managing financial distress.
  • Resource Allocation:
    • Enables efficient allocation of monitoring and support resources to high-risk banks.
    • Prioritizes intervention efforts based on cluster risk levels.
  • Regulatory Oversight:
    • Helps regulators design tailored policies and regulations for different risk clusters.
    • Supports focused oversight and preventive measures.
  • Predictive Analysis:
    • Enhances predictive models by analyzing groups of banks with comparable financial profiles.
    • Improves the forecasting of potential financial crises and systemic risks.
  • Strategic Planning:
    • Assists banks in understanding their position relative to peers within the same risk cluster.
    • Aids in formulating strategies for risk reduction and financial stability.
  • Investment Decisions:
    • Guides investors in evaluating the risk profiles of banks and making informed investment choices.
    • Highlights potential investment risks and opportunities based on cluster analysis.
  • Crisis Management:
    • Improves crisis preparedness by identifying high-risk clusters requiring immediate attention.
    • Supports the development of tailored contingency plans for different risk levels.
  • Performance Benchmarking:
    • Facilitates comparison and benchmarking of banks within similar risk clusters.
    • Helps assess performance metrics and identify best practices among peers.
  • Data-Driven Insights:
    • Provides a structured approach to analyzing bankruptcy risks, leading to more informed decision-making.
    • Enhances understanding of systemic vulnerabilities and strengths.
  • Enhanced Communication:
    • Clarifies risk discussions between banks, regulators, and stakeholders through a structured framework.
    • Improves transparency and rationale behind regulatory decisions and interventions.

Objective

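Our objective is to cluster the observations in the dataset so that entities with similar financial profiles, and hence similar bankruptcy risk, fall into the same group.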

Methodology

Data collection

The data were obtained from Kaggle (dataset name: Company Bankruptcy Prediction).

Variable Details

Y - Bankrupt?: Class label

X1 - ROA(C) before interest and depreciation before interest: Return On Total Assets(C)

X2 - ROA(A) before interest and % after tax: Return On Total Assets(A)

X3 - ROA(B) before interest and depreciation after tax: Return On Total Assets(B)

X4 - Operating Gross Margin: Gross Profit/Net Sales

X5 - Realized Sales Gross Margin: Realized Gross Profit/Net Sales

X6 - Operating Profit Rate: Operating Income/Net Sales

X7 - Pre-tax net Interest Rate: Pre-Tax Income/Net Sales

X8 - After-tax net Interest Rate: Net Income/Net Sales

X9 - Non-industry income and expenditure/revenue: Net Non-operating Income Ratio

X10 - Continuous interest rate (after tax): Net Income-Exclude Disposal Gain or Loss/Net Sales

X11 - Operating Expense Rate: Operating Expenses/Net Sales

X12 - Research and development expense rate: (Research and Development Expenses)/Net Sales

X13 - Cash flow rate: Cash Flow from Operating/Current Liabilities

X14 - Interest-bearing debt interest rate: Interest-bearing Debt/Equity

X15 - Tax rate (A): Effective Tax Rate

X16 - Net Value Per Share (B): Book Value Per Share(B)

X17 - Net Value Per Share (A): Book Value Per Share(A)

X18 - Net Value Per Share (C): Book Value Per Share(C)

X19 - Persistent EPS in the Last Four Seasons: EPS-Net Income

X20 - Cash Flow Per Share

X21 - Revenue Per Share (Yuan ¥): Sales Per Share

X22 - Operating Profit Per Share (Yuan ¥): Operating Income Per Share

X23 - Per Share Net profit before tax (Yuan ¥): Pretax Income Per Share

X24 - Realized Sales Gross Profit Growth Rate

X25 - Operating Profit Growth Rate: Operating Income Growth

X26 - After-tax Net Profit Growth Rate: Net Income Growth

X27 - Regular Net Profit Growth Rate: Continuing Operating Income after Tax Growth

X28 - Continuous Net Profit Growth Rate: Net Income-Excluding Disposal Gain or Loss Growth

X29 - Total Asset Growth Rate: Total Asset Growth

X30 - Net Value Growth Rate: Total Equity Growth

X31 - Total Asset Return Growth Rate Ratio: Return on Total Asset Growth

X32 - Cash Reinvestment %: Cash Reinvestment Ratio

X33 - Current Ratio

X34 - Quick Ratio: Acid Test

X35 - Interest Expense Ratio: Interest Expenses/Total Revenue

X36 - Total debt/Total net worth: Total Liability/Equity Ratio

X37 - Debt ratio %: Liability/Total Assets

X38 - Net worth/Assets: Equity/Total Assets

X39 - Long-term fund suitability ratio (A): (Long-term Liability+Equity)/Fixed Assets

X40 - Borrowing dependency: Cost of Interest-bearing Debt

X41 - Contingent liabilities/Net worth: Contingent Liability/Equity

X42 - Operating profit/Paid-in capital: Operating Income/Capital

X43 - Net profit before tax/Paid-in capital: Pretax Income/Capital

X44 - Inventory and accounts receivable/Net value: (Inventory+Accounts Receivables)/Equity

X45 - Total Asset Turnover

X46 - Accounts Receivable Turnover

X47 - Average Collection Days: Days Receivable Outstanding

X48 - Inventory Turnover Rate (times)

X49 - Fixed Assets Turnover Frequency

X50 - Net Worth Turnover Rate (times): Equity Turnover

X51 - Revenue per person: Sales Per Employee

X52 - Operating profit per person: Operation Income Per Employee

X53 - Allocation rate per person: Fixed Assets Per Employee

X54 - Working Capital to Total Assets

X55 - Quick Assets/Total Assets

X56 - Current Assets/Total Assets

X57 - Cash/Total Assets

X58 - Quick Assets/Current Liability

X59 - Cash/Current Liability

X60 - Current Liability to Assets

X61 - Operating Funds to Liability

X62 - Inventory/Working Capital

X63 - Inventory/Current Liability

X64 - Current Liabilities/Liability

X65 - Working Capital/Equity

X66 - Current Liabilities/Equity

X67 - Long-term Liability to Current Assets

X68 - Retained Earnings to Total Assets

X69 - Total income/Total expense

X70 - Total expense/Assets

X71 - Current Asset Turnover Rate: Current Assets to Sales

X72 - Quick Asset Turnover Rate: Quick Assets to Sales

X73 - Working capitcal Turnover Rate: Working Capital to Sales

X74 - Cash Turnover Rate: Cash to Sales

X75 - Cash Flow to Sales

X76 - Fixed Assets to Assets

X77 - Current Liability to Liability

X78 - Current Liability to Equity

X79 - Equity to Long-term Liability

X80 - Cash Flow to Total Assets

X81 - Cash Flow to Liability

X82 - CFO to Assets

X83 - Cash Flow to Equity

X84 - Current Liability to Current Assets

X85 - Liability-Assets Flag: 1 if Total Liability exceeds Total Assets, 0 otherwise

X86 - Net Income to Total Assets

X87 - Total assets to GNP price

X88 - No-credit Interval

X89 - Gross Profit to Sales

X90 - Net Income to Stockholder’s Equity

X91 - Liability to Equity

X92 - Degree of Financial Leverage (DFL)

X93 - Interest Coverage Ratio (Interest expense to EBIT)

X94 - Net Income Flag: 1 if Net Income is Negative for the last two years, 0 otherwise

X95 - Equity to Liability

Data preparation

Getting the required packages:

pacman::p_load(MASS,clValid,cluster,dbscan,ggplot2)

Importing and cleaning the dataset:

mydata=read.csv("C:\\Users\\zeeda\\Downloads\\archive (3)\\data.csv")
  • First, check whether the data are balanced, i.e. whether the numbers of bankrupt and non-bankrupt banks are roughly equal.
table(mydata$Bankrupt.)

   0    1 
6599  220 

The data are clearly highly imbalanced: there are 6,599 observations for non-bankrupt banks and only 220 for bankrupt ones. We therefore use undersampling, reducing the size of the majority (non-bankrupt) class by drawing a random sample of 201 of its rows.

no_bankrupt=subset(mydata,mydata$Bankrupt.==0)
mydata_index=sample(1:nrow(no_bankrupt),201,F)
no_bankrupt_data=no_bankrupt[mydata_index,]
bankrupt_data=subset(mydata,mydata$Bankrupt.==1)
mydata=rbind(no_bankrupt_data,bankrupt_data)
table(mydata$Bankrupt.)

  0   1 
201 220 
  • The two classes are now approximately balanced (201 non-bankrupt, 220 bankrupt).
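
The undersampling step relies on a random draw, so the selected rows change from run to run unless a seed is fixed. A minimal sketch of a reproducible variant on simulated data follows; the `toy` data frame, the seed value, and the choice to match the minority count exactly are assumptions for illustration and are not part of the original analysis.

```r
# Simulated stand-in for the bankruptcy data: 1000 rows, 5% bankrupt.
set.seed(1)                                   # arbitrary seed, for reproducibility only
toy <- data.frame(
  Bankrupt. = rep(c(0, 1), times = c(950, 50)),
  x1        = rnorm(1000)
)

minority <- toy[toy$Bankrupt. == 1, ]
majority <- toy[toy$Bankrupt. == 0, ]

# Draw as many majority rows as there are minority rows, without replacement.
keep     <- sample(nrow(majority), size = nrow(minority), replace = FALSE)
balanced <- rbind(majority[keep, ], minority)

table(balanced$Bankrupt.)
```

Fixing the seed before `sample()` means the same rows are kept on every run, which makes downstream summaries and cluster assignments repeatable.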
Summary of the data
summary(mydata)
   Bankrupt.      ROA.C..before.interest.and.depreciation.before.interest
 Min.   :0.0000   Min.   :0.02428                                        
 1st Qu.:0.0000   1st Qu.:0.43022                                        
 Median :1.0000   Median :0.46921                                        
 Mean   :0.5226   Mean   :0.45933                                        
 3rd Qu.:1.0000   3rd Qu.:0.50261                                        
 Max.   :1.0000   Max.   :0.69385                                        
 ROA.A..before.interest.and...after.tax
 Min.   :0.0000                        
 1st Qu.:0.4702                        
 Median :0.5258                        
 Mean   :0.5052                        
 3rd Qu.:0.5612                        
 Max.   :0.7439                        
 ROA.B..before.interest.and.depreciation.after.tax Operating.Gross.Margin
 Min.   :0.03351                                   Min.   :0.5329        
 1st Qu.:0.47272                                   1st Qu.:0.5966        
 Median :0.51823                                   Median :0.6016        
 Mean   :0.50449                                   Mean   :0.6028        
 3rd Qu.:0.55249                                   3rd Qu.:0.6091        
 Max.   :0.72440                                   Max.   :0.6652        
 Realized.Sales.Gross.Margin Operating.Profit.Rate Pre.tax.net.Interest.Rate
 Min.   :0.5329              Min.   :0.9862        Min.   :0.7572           
 1st Qu.:0.5966              1st Qu.:0.9988        1st Qu.:0.7971           
 Median :0.6016              Median :0.9990        Median :0.7973           
 Mean   :0.6029              Mean   :0.9988        Mean   :0.7969           
 3rd Qu.:0.6092              3rd Qu.:0.9990        3rd Qu.:0.7975           
 Max.   :0.6652              Max.   :0.9995        Max.   :0.7983           
 After.tax.net.Interest.Rate Non.industry.income.and.expenditure.revenue
 Min.   :0.7616              Min.   :0.2351                             
 1st Qu.:0.8090              1st Qu.:0.3032                             
 Median :0.8093              Median :0.3035                             
 Mean   :0.8088              Mean   :0.3030                             
 3rd Qu.:0.8094              3rd Qu.:0.3035                             
 Max.   :0.8101              Max.   :0.3054                             
 Continuous.interest.rate..after.tax. Operating.Expense.Rate
 Min.   :0.7427                       Min.   :0.000e+00     
 1st Qu.:0.7812                       1st Qu.:0.000e+00     
 Median :0.7815                       Median :0.000e+00     
 Mean   :0.7811                       Mean   :1.997e+09     
 3rd Qu.:0.7816                       3rd Qu.:4.090e+09     
 Max.   :0.7821                       Max.   :9.890e+09     
 Research.and.development.expense.rate Cash.flow.rate  
 Min.   :0.000e+00                     Min.   :0.3438  
 1st Qu.:0.000e+00                     1st Qu.:0.4596  
 Median :2.040e+08                     Median :0.4622  
 Mean   :1.766e+09                     Mean   :0.4641  
 3rd Qu.:3.190e+09                     3rd Qu.:0.4660  
 Max.   :9.920e+09                     Max.   :0.7433  
 Interest.bearing.debt.interest.rate  Tax.rate..A.    Net.Value.Per.Share..B.
 Min.   :        0                   Min.   :0.0000   Min.   :0.06966        
 1st Qu.:        0                   1st Qu.:0.0000   1st Qu.:0.15528        
 Median :        0                   Median :0.0000   Median :0.17361        
 Mean   :  8078385                   Mean   :0.0698   Mean   :0.17514        
 3rd Qu.:        0                   3rd Qu.:0.1195   3rd Qu.:0.18857        
 Max.   :790000000                   Max.   :0.9755   Max.   :0.30024        
 Net.Value.Per.Share..A. Net.Value.Per.Share..C.
 Min.   :0.06966         Min.   :0.06966        
 1st Qu.:0.15528         1st Qu.:0.15528        
 Median :0.17361         Median :0.17361        
 Mean   :0.17511         Mean   :0.17519        
 3rd Qu.:0.18857         3rd Qu.:0.18963        
 Max.   :0.30024         Max.   :0.30024        
 Persistent.EPS.in.the.Last.Four.Seasons Cash.Flow.Per.Share
 Min.   :0.0000                          Min.   :0.2085     
 1st Qu.:0.1924                          1st Qu.:0.3144     
 Median :0.2097                          Median :0.3192     
 Mean   :0.2074                          Mean   :0.3191     
 3rd Qu.:0.2244                          3rd Qu.:0.3243     
 Max.   :0.3505                          Max.   :0.4208     
 Revenue.Per.Share..Yuan... Operating.Profit.Per.Share..Yuan...
 Min.   :0.000e+00          Min.   :0.00000                    
 1st Qu.:0.000e+00          1st Qu.:0.08696                    
 Median :0.000e+00          Median :0.09576                    
 Mean   :7.173e+06          Mean   :0.09709                    
 3rd Qu.:0.000e+00          3rd Qu.:0.10553                    
 Max.   :3.020e+09          Max.   :0.25446                    
 Per.Share.Net.profit.before.tax..Yuan...
 Min.   :0.0000                          
 1st Qu.:0.1512                          
 Median :0.1672                          
 Mean   :0.1651                          
 3rd Qu.:0.1795                          
 Max.   :0.3233                          
 Realized.Sales.Gross.Profit.Growth.Rate Operating.Profit.Growth.Rate
 Min.   :0.01885                         Min.   :0.7364              
 1st Qu.:0.02203                         1st Qu.:0.8479              
 Median :0.02208                         Median :0.8480              
 Mean   :0.02253                         Mean   :0.8475              
 3rd Qu.:0.02213                         3rd Qu.:0.8481              
 Max.   :0.08345                         Max.   :0.8525              
 After.tax.Net.Profit.Growth.Rate Regular.Net.Profit.Growth.Rate
 Min.   :0.1807                   Min.   :0.1807                
 1st Qu.:0.6888                   1st Qu.:0.6889                
 Median :0.6893                   Median :0.6893                
 Mean   :0.6868                   Mean   :0.6868                
 3rd Qu.:0.6895                   3rd Qu.:0.6895                
 Max.   :0.7831                   Max.   :0.7831                
 Continuous.Net.Profit.Growth.Rate Total.Asset.Growth.Rate
 Min.   :0.1617                    Min.   :0.000e+00      
 1st Qu.:0.2175                    1st Qu.:4.330e+09      
 Median :0.2176                    Median :5.890e+09      
 Mean   :0.2173                    Mean   :5.206e+09      
 3rd Qu.:0.2176                    3rd Qu.:6.900e+09      
 Max.   :0.2195                    Max.   :9.980e+09      
 Net.Value.Growth.Rate Total.Asset.Return.Growth.Rate.Ratio Cash.Reinvestment..
 Min.   :0.000e+00     Min.   :0.2516                       Min.   :0.02583    
 1st Qu.:0.000e+00     1st Qu.:0.2634                       1st Qu.:0.37004    
 Median :0.000e+00     Median :0.2639                       Median :0.37807    
 Mean   :2.216e+07     Mean   :0.2637                       Mean   :0.37647    
 3rd Qu.:0.000e+00     3rd Qu.:0.2642                       3rd Qu.:0.38359    
 Max.   :9.330e+09     Max.   :0.2721                       Max.   :1.00000    
 Current.Ratio        Quick.Ratio        Interest.Expense.Ratio
 Min.   :0.0003551   Min.   :0.000e+00   Min.   :0.5251        
 1st Qu.:0.0050964   1st Qu.:0.000e+00   1st Qu.:0.6301        
 Median :0.0079251   Median :0.000e+00   Median :0.6306        
 Mean   :0.0119850   Mean   :2.192e+07   Mean   :0.6309        
 3rd Qu.:0.0121160   3rd Qu.:0.000e+00   3rd Qu.:0.6310        
 Max.   :0.7126299   Max.   :9.230e+09   Max.   :0.8122        
 Total.debt.Total.net.worth  Debt.ratio..    Net.worth.Assets
 Min.   :0.000e+00          Min.   :0.0000   Min.   :0.4746  
 1st Qu.:0.000e+00          1st Qu.:0.1005   1st Qu.:0.8073  
 Median :0.000e+00          Median :0.1548   Median :0.8452  
 Mean   :8.242e+06          Mean   :0.1500   Mean   :0.8500  
 3rd Qu.:0.000e+00          3rd Qu.:0.1927   3rd Qu.:0.8995  
 Max.   :3.470e+09          Max.   :0.5254   Max.   :1.0000  
 Long.term.fund.suitability.ratio..A. Borrowing.dependency
 Min.   :0.004129                     Min.   :0.0000      
 1st Qu.:0.005080                     1st Qu.:0.3716      
 Median :0.005440                     Median :0.3772      
 Mean   :0.009931                     Mean   :0.3825      
 3rd Qu.:0.006222                     3rd Qu.:0.3839      
 Max.   :0.923930                     Max.   :1.0000      
 Contingent.liabilities.Net.worth Operating.profit.Paid.in.capital
 Min.   :0.000000                 Min.   :0.00000                 
 1st Qu.:0.005366                 1st Qu.:0.08703                 
 Median :0.005366                 Median :0.09571                 
 Mean   :0.008349                 Mean   :0.09714                 
 3rd Qu.:0.006038                 3rd Qu.:0.10531                 
 Max.   :1.000000                 Max.   :0.25447                 
 Net.profit.before.tax.Paid.in.capital
 Min.   :0.0000                       
 1st Qu.:0.1508                       
 Median :0.1662                       
 Mean   :0.1642                       
 3rd Qu.:0.1782                       
 Max.   :0.2998                       
 Inventory.and.accounts.receivable.Net.value Total.Asset.Turnover
 Min.   :0.0000                              Min.   :0.00000     
 1st Qu.:0.3979                              1st Qu.:0.06297     
 Median :0.4015                              Median :0.09745     
 Mean   :0.4053                              Mean   :0.12394     
 3rd Qu.:0.4086                              3rd Qu.:0.15592     
 Max.   :0.7074                              Max.   :0.66117     
 Accounts.Receivable.Turnover Average.Collection.Days
 Min.   :0.000e+00            Min.   :        0      
 1st Qu.:0.000e+00            1st Qu.:        0      
 Median :0.000e+00            Median :        0      
 Mean   :2.898e+06            Mean   :   325416      
 3rd Qu.:0.000e+00            3rd Qu.:        0      
 Max.   :1.220e+09            Max.   :137000000      
 Inventory.Turnover.Rate..times. Fixed.Assets.Turnover.Frequency
 Min.   :0.000e+00               Min.   :0.000e+00              
 1st Qu.:0.000e+00               1st Qu.:0.000e+00              
 Median :0.000e+00               Median :0.000e+00              
 Mean   :1.987e+09               Mean   :1.564e+09              
 3rd Qu.:3.900e+09               3rd Qu.:9.530e+08              
 Max.   :9.990e+09               Max.   :9.990e+09              
 Net.Worth.Turnover.Rate..times. Revenue.per.person 
 Min.   :0.008871                Min.   :0.000e+00  
 1st Qu.:0.021452                1st Qu.:0.000e+00  
 Median :0.029839                Median :0.000e+00  
 Mean   :0.040590                Mean   :1.675e+07  
 3rd Qu.:0.044355                3rd Qu.:0.000e+00  
 Max.   :0.396129                Max.   :7.050e+09  
 Operating.profit.per.person Allocation.rate.per.person
 Min.   :0.3023              Min.   :0.000e+00         
 1st Qu.:0.3849              1st Qu.:0.000e+00         
 Median :0.3923              Median :0.000e+00         
 Mean   :0.3905              Mean   :2.793e+07         
 3rd Qu.:0.3978              3rd Qu.:0.000e+00         
 Max.   :0.5165              Max.   :8.280e+09         
 Working.Capital.to.Total.Assets Quick.Assets.Total.Assets
 Min.   :0.4942                  Min.   :0.01743          
 1st Qu.:0.7337                  1st Qu.:0.17470          
 Median :0.7828                  Median :0.33588          
 Mean   :0.7830                  Mean   :0.35182          
 3rd Qu.:0.8312                  3rd Qu.:0.49470          
 Max.   :0.9623                  Max.   :0.88270          
 Current.Assets.Total.Assets Cash.Total.Assets   Quick.Assets.Current.Liability
 Min.   :0.02083             Min.   :0.0001842   Min.   :0.0001435             
 1st Qu.:0.31889             1st Qu.:0.0174118   1st Qu.:0.0027748             
 Median :0.48552             Median :0.0443369   Median :0.0052089             
 Mean   :0.49951             Mean   :0.0864997   Mean   :0.0083411             
 3rd Qu.:0.67877             3rd Qu.:0.1040967   3rd Qu.:0.0092043             
 Max.   :0.99545             Max.   :0.6582866   Max.   :0.3251893             
 Cash.Current.Liability Current.Liability.to.Assets
 Min.   :0.000e+00      Min.   :0.001481           
 1st Qu.:0.000e+00      1st Qu.:0.068199           
 Median :0.000e+00      Median :0.111588           
 Mean   :1.442e+08      Mean   :0.118628           
 3rd Qu.:0.000e+00      3rd Qu.:0.160469           
 Max.   :9.010e+09      Max.   :0.343143           
 Operating.Funds.to.Liability Inventory.Working.Capital
 Min.   :0.02627              Min.   :0.0000           
 1st Qu.:0.33714              1st Qu.:0.2769           
 Median :0.34278              Median :0.2771           
 Mean   :0.34602              Mean   :0.2766           
 3rd Qu.:0.35059              3rd Qu.:0.2775           
 Max.   :0.70850              Max.   :0.3066           
 Inventory.Current.Liability Current.Liabilities.Liability
 Min.   :0.000e+00           Min.   :0.04958              
 1st Qu.:0.000e+00           1st Qu.:0.64141              
 Median :0.000e+00           Median :0.80095              
 Mean   :4.253e+07           Mean   :0.75683              
 3rd Qu.:0.000e+00           3rd Qu.:0.91956              
 Max.   :8.790e+09           Max.   :1.00000              
 Working.Capital.Equity Current.Liabilities.Equity
 Min.   :0.0000         Min.   :0.0000            
 1st Qu.:0.7308         1st Qu.:0.3289            
 Median :0.7347         Median :0.3324            
 Mean   :0.7310         Mean   :0.3372            
 3rd Qu.:0.7382         3rd Qu.:0.3372            
 Max.   :0.8252         Max.   :1.0000            
 Long.term.Liability.to.Current.Assets Retained.Earnings.to.Total.Assets
 Min.   :0.000e+00                     Min.   :0.7292                   
 1st Qu.:0.000e+00                     1st Qu.:0.9091                   
 Median :0.000e+00                     Median :0.9291                   
 Mean   :4.301e+07                     Mean   :0.9197                   
 3rd Qu.:0.000e+00                     3rd Qu.:0.9385                   
 Max.   :7.000e+09                     Max.   :0.9755                   
 Total.income.Total.expense Total.expense.Assets Current.Asset.Turnover.Rate
 Min.   :0.0009712          Min.   :0.003324     Min.   :0.000e+00          
 1st Qu.:0.0020192          1st Qu.:0.016257     1st Qu.:0.000e+00          
 Median :0.0021953          Median :0.026571     Median :0.000e+00          
 Mean   :0.0022202          Mean   :0.039829     Mean   :1.229e+09          
 3rd Qu.:0.0023434          3rd Qu.:0.044457     3rd Qu.:0.000e+00          
 Max.   :0.0039097          Max.   :0.463483     Max.   :9.880e+09          
 Quick.Asset.Turnover.Rate Working.capitcal.Turnover.Rate Cash.Turnover.Rate 
 Min.   :0.000e+00         Min.   :0.5729                 Min.   :0.000e+00  
 1st Qu.:0.000e+00         1st Qu.:0.5939                 1st Qu.:0.000e+00  
 Median :0.000e+00         Median :0.5939                 Median :1.160e+09  
 Mean   :2.379e+09         Mean   :0.5941                 Mean   :2.305e+09  
 3rd Qu.:5.520e+09         3rd Qu.:0.5940                 3rd Qu.:3.660e+09  
 Max.   :9.980e+09         Max.   :0.6742                 Max.   :9.940e+09  
 Cash.Flow.to.Sales Fixed.Assets.to.Assets Current.Liability.to.Liability
 Min.   :0.6706     Min.   :0.000e+00      Min.   :0.04958               
 1st Qu.:0.6716     1st Qu.:0.000e+00      1st Qu.:0.64141               
 Median :0.6716     Median :0.000e+00      Median :0.80095               
 Mean   :0.6716     Mean   :1.976e+07      Mean   :0.75683               
 3rd Qu.:0.6716     3rd Qu.:0.000e+00      3rd Qu.:0.91956               
 Max.   :0.6908     Max.   :8.320e+09      Max.   :1.00000               
 Current.Liability.to.Equity Equity.to.Long.term.Liability
 Min.   :0.0000              Min.   :0.0000               
 1st Qu.:0.3289              1st Qu.:0.1109               
 Median :0.3324              Median :0.1142               
 Mean   :0.3372              Mean   :0.1228               
 3rd Qu.:0.3372              3rd Qu.:0.1208               
 Max.   :1.0000              Max.   :0.9221               
 Cash.Flow.to.Total.Assets Cash.Flow.to.Liability CFO.to.Assets   
 Min.   :0.1677            Min.   :0.07397        Min.   :0.2270  
 1st Qu.:0.6285            1st Qu.:0.45693        1st Qu.:0.5432  
 Median :0.6410            Median :0.45897        Median :0.5747  
 Mean   :0.6379            Mean   :0.45776        Mean   :0.5730  
 3rd Qu.:0.6511            3rd Qu.:0.46062        3rd Qu.:0.6041  
 Max.   :0.8637            Max.   :0.81312        Max.   :0.9156  
 Cash.Flow.to.Equity Current.Liability.to.Current.Assets Liability.Assets.Flag
 Min.   :0.0000      Min.   :0.0001224                   Min.   :0.00000      
 1st Qu.:0.3114      1st Qu.:0.0241626                   1st Qu.:0.00000      
 Median :0.3142      Median :0.0366234                   Median :0.00000      
 Mean   :0.3130      Mean   :0.0459983                   Mean   :0.01425      
 3rd Qu.:0.3163      3rd Qu.:0.0559658                   3rd Qu.:0.00000      
 Max.   :0.5692      Max.   :0.4606748                   Max.   :1.00000      
 Net.Income.to.Total.Assets Total.assets.to.GNP.price No.credit.Interval
 Min.   :0.4118             Min.   :0.000e+00         Min.   :0.5283    
 1st Qu.:0.7536             1st Qu.:0.000e+00         1st Qu.:0.6232    
 Median :0.7907             Median :0.000e+00         Median :0.6237    
 Mean   :0.7714             Mean   :4.755e+07         Mean   :0.6235    
 3rd Qu.:0.8097             3rd Qu.:0.000e+00         3rd Qu.:0.6241    
 Max.   :0.8955             Max.   :9.170e+09         Max.   :0.6797    
 Gross.Profit.to.Sales Net.Income.to.Stockholder.s.Equity Liability.to.Equity
 Min.   :0.5329        Min.   :0.0000                     Min.   :0.0000     
 1st Qu.:0.5966        1st Qu.:0.8356                     1st Qu.:0.2781     
 Median :0.6016        Median :0.8398                     Median :0.2817     
 Mean   :0.6028        Mean   :0.8330                     Mean   :0.2870     
 3rd Qu.:0.6091        3rd Qu.:0.8413                     3rd Qu.:0.2867     
 Max.   :0.6651        Max.   :1.0000                     Max.   :1.0000     
 Degree.of.Financial.Leverage..DFL.
 Min.   :0.02076                   
 1st Qu.:0.02663                   
 Median :0.02678                   
 Mean   :0.02797                   
 3rd Qu.:0.02684                   
 Max.   :0.26458                   
 Interest.Coverage.Ratio..Interest.expense.to.EBIT. Net.Income.Flag
 Min.   :0.4493                                     Min.   :1      
 1st Qu.:0.5644                                     1st Qu.:1      
 Median :0.5652                                     Median :1      
 Mean   :0.5652                                     Mean   :1      
 3rd Qu.:0.5655                                     3rd Qu.:1      
 Max.   :0.7370                                     Max.   :1      
 Equity.to.Liability
 Min.   :0.003946   
 1st Qu.:0.018048   
 Median :0.023393   
 Mean   :0.036272   
 3rd Qu.:0.037762   
 Max.   :1.000000   
dim(mydata)
[1] 421  96
  • With 95 predictor variables, the dimensionality is high, so we use a derived-input technique (e.g. Principal Component Analysis).

Data Reduction

Principal Component Analysis

data=mydata[,-1]
pca_model=prcomp(data,center = TRUE)
Summary of the PCA model
summary(pca_model)
Importance of components:
                             PC1       PC2       PC3       PC4       PC5
Standard deviation     3.894e+09 3.266e+09 3.027e+09 2.904e+09 2.848e+09
Proportion of Variance 2.128e-01 1.497e-01 1.286e-01 1.184e-01 1.139e-01
Cumulative Proportion  2.128e-01 3.626e-01 4.912e-01 6.096e-01 7.235e-01
                             PC6       PC7       PC8       PC9      PC10
Standard deviation     2.562e+09 2.335e+09 2.262e+09 8.921e+08 6.049e+08
Proportion of Variance 9.210e-02 7.654e-02 7.179e-02 1.117e-02 5.140e-03
Cumulative Proportion  8.156e-01 8.921e-01 9.639e-01 9.751e-01 9.802e-01
                            PC11      PC12      PC13      PC14      PC15
Standard deviation     5.169e+08 4.904e+08 4.681e+08 4.504e+08 4.465e+08
Proportion of Variance 3.750e-03 3.380e-03 3.080e-03 2.850e-03 2.800e-03
Cumulative Proportion  9.840e-01 9.873e-01 9.904e-01 9.933e-01 9.961e-01
                            PC16      PC17      PC18      PC19      PC20   PC21
Standard deviation     3.475e+08 3.251e+08 1.673e+08 1.467e+08 6.088e+07 0.3201
Proportion of Variance 1.700e-03 1.480e-03 3.900e-04 3.000e-04 5.000e-05 0.0000
Cumulative Proportion  9.978e-01 9.992e-01 9.997e-01 1.000e+00 1.000e+00 1.0000
                         PC22   PC23   PC24   PC25  PC26    PC27    PC28
Standard deviation     0.2077 0.1742 0.1356 0.1218 0.116 0.09867 0.08805
Proportion of Variance 0.0000 0.0000 0.0000 0.0000 0.000 0.00000 0.00000
Cumulative Proportion  1.0000 1.0000 1.0000 1.0000 1.000 1.00000 1.00000
                          PC29    PC30    PC31    PC32    PC33    PC34    PC35
Standard deviation     0.07034 0.06137 0.05922 0.04598 0.04146 0.04027 0.03632
Proportion of Variance 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
Cumulative Proportion  1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
                          PC36    PC37    PC38   PC39    PC40    PC41    PC42
Standard deviation     0.03256 0.02981 0.02834 0.0251 0.02276 0.01898 0.01756
Proportion of Variance 0.00000 0.00000 0.00000 0.0000 0.00000 0.00000 0.00000
Cumulative Proportion  1.00000 1.00000 1.00000 1.0000 1.00000 1.00000 1.00000
                          PC43    PC44    PC45    PC46    PC47    PC48   PC49
Standard deviation     0.01712 0.01545 0.01459 0.01377 0.01293 0.01266 0.0119
Proportion of Variance 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.0000
Cumulative Proportion  1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.0000
                          PC50   PC51    PC52    PC53    PC54     PC55     PC56
Standard deviation     0.01117 0.0109 0.01032 0.01006 0.00938 0.009023 0.007936
Proportion of Variance 0.00000 0.0000 0.00000 0.00000 0.00000 0.000000 0.000000
Cumulative Proportion  1.00000 1.0000 1.00000 1.00000 1.00000 1.000000 1.000000
                           PC57     PC58   PC59     PC60     PC61     PC62
Standard deviation     0.007726 0.006835 0.0064 0.005821 0.005314 0.004968
Proportion of Variance 0.000000 0.000000 0.0000 0.000000 0.000000 0.000000
Cumulative Proportion  1.000000 1.000000 1.0000 1.000000 1.000000 1.000000
                           PC63    PC64     PC65     PC66     PC67     PC68
Standard deviation     0.004642 0.00429 0.003913 0.003407 0.002945 0.002702
Proportion of Variance 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000
Cumulative Proportion  1.000000 1.00000 1.000000 1.000000 1.000000 1.000000
                           PC69     PC70     PC71     PC72     PC73     PC74
Standard deviation     0.002498 0.002372 0.001883 0.001725 0.001373 0.001053
Proportion of Variance 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Cumulative Proportion  1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
                            PC75      PC76      PC77      PC78      PC79
Standard deviation     0.0009467 0.0008456 0.0007278 0.0006526 0.0005193
Proportion of Variance 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
Cumulative Proportion  1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
                            PC80      PC81      PC82      PC83      PC84
Standard deviation     0.0004672 0.0004013 0.0003665 0.0002562 0.0001398
Proportion of Variance 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
Cumulative Proportion  1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
                            PC85     PC86      PC87      PC88      PC89
Standard deviation     9.142e-05 7.56e-05 3.445e-05 2.573e-05 1.331e-06
Proportion of Variance 0.000e+00 0.00e+00 0.000e+00 0.000e+00 0.000e+00
Cumulative Proportion  1.000e+00 1.00e+00 1.000e+00 1.000e+00 1.000e+00
                            PC90      PC91      PC92      PC93      PC94
Standard deviation     2.734e-07 2.734e-07 2.734e-07 2.734e-07 2.734e-07
Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
Cumulative Proportion  1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
                            PC95
Standard deviation     2.156e-07
Proportion of Variance 0.000e+00
Cumulative Proportion  1.000e+00

We now draw the scree plot and the cumulative variance plot.

par(mfrow=c(1,2))
plot(pca_model,type="lines",main="Scree plot",col="blue")
variance_plot=cumsum((pca_model$sdev^2)/sum(pca_model$sdev^2))
plot(variance_plot,type="l",main="Cumulative variance",xlab="No. of principal components",ylab="Cumulative proportion of variance",col="blue")
abline(h=0.9,col="red")

  • The first 7 principal components capture 90 percent of the variance in the data.

We now reduce the dimension of the data by projecting it onto these 7 principal components.

rotation_matrix=pca_model$rotation
reduced_rotation_matrix=rotation_matrix[,1:7]
final_data=as.matrix(data)%*%reduced_rotation_matrix
dim(final_data)
[1] 421   7
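
The choice of 7 components was read off the plot. As a minimal sketch, the same cut-off can be found programmatically from the cumulative variance. The example uses the built-in USArrests data as a stand-in for the bank data, and the names pca_demo, n_keep, scores and manual are illustrative only. It also shows that the stored scores pca_demo$x equal the projection of the centred and scaled data, which is the assumption to check before multiplying raw data by the rotation matrix.

```r
# Sketch: choose the number of components that reaches 90 percent of variance.
# USArrests is a stand-in data set; all object names here are illustrative.
pca_demo <- prcomp(USArrests, center = TRUE, scale. = TRUE)

var_share <- pca_demo$sdev^2 / sum(pca_demo$sdev^2)  # proportion of variance per PC
cum_share <- cumsum(var_share)                       # cumulative proportion
n_keep <- which(cum_share >= 0.9)[1]                 # first count reaching the threshold

# Scores stored by prcomp (already centred and scaled)
scores <- pca_demo$x[, 1:n_keep, drop = FALSE]

# The same scores by explicit projection of the standardised data
manual <- scale(USArrests) %*% pca_demo$rotation[, 1:n_keep, drop = FALSE]
```

If the PCA was fitted with centring or scaling, projecting the uncentred data gives scores shifted by a constant vector. Distances between observations are unchanged by such a shift, so distance-based clustering is affected only if the columns were also rescaled.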

Now we will apply different Clustering Techniques

Clustering

K-means

  • We use K-means to obtain clusterings with different numbers of clusters.
  • Compute the Dunn index for each model and plot it.
  • Choose the model with the largest Dunn index and take its number of clusters.
set.seed(12)
dun_index=c()
for(i in 2:10){
  obj=kmeans(x=final_data,i)
  dun_index[i]=dunn(dist(final_data),as.vector(obj$cluster))
}
plot(dun_index,typ="b",xlab="No of Clusters",ylab="Dunn Index",col="darkviolet") 
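
For reference, the Dunn index is the smallest distance between points in different clusters divided by the largest distance between points in the same cluster, so larger values mean compact, well-separated clusters. The following is a minimal base-R sketch of that definition on simulated data. dunn_sketch is a hypothetical helper written for illustration; it is not the dunn() function used above and may differ from it in edge cases such as singleton clusters.

```r
# Sketch of the Dunn index: min between-cluster distance / max within-cluster diameter.
# dunn_sketch is a hypothetical helper, not the package function used in the report.
dunn_sketch <- function(d, labels) {
  d <- as.matrix(d)
  same <- outer(labels, labels, "==")   # TRUE where two points share a cluster
  off_diag <- row(d) != col(d)          # exclude each point's distance to itself
  min_between <- min(d[!same])
  max_within <- max(d[same & off_diag])
  min_between / max_within
}

# Two well-separated simulated groups of 20 points each
set.seed(1)
toy <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2))

# Dunn index for k = 2,...,5; naming the vector avoids an empty first position
dunn_by_k <- sapply(2:5, function(k) {
  dunn_sketch(dist(toy), kmeans(toy, centers = k, nstart = 10)$cluster)
})
names(dunn_by_k) <- 2:5
best_k <- as.integer(names(which.max(dunn_by_k)))
```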

  • The Dunn Index is max when the cluster number is
which.max(dun_index)
[1] 2
  • Now we check how well the two clusters agree with the two classes in Bankrupt.
#--optimal clustering
set.seed(12)
obj=kmeans(final_data,2)
cluster_vector=obj$cluster
#--class labels
class_vector=mydata$Bankrupt.
#--Accuracy
True_Positive=0
False_Negetive=0
True_Negetive=0
False_Positive=0
for(i in 1:nrow(final_data)){
  if(class_vector[i]==0){
    if(cluster_vector[i]==1)
      True_Positive=True_Positive+1
    else
      False_Negetive=False_Negetive+1
    }
  else{
    if(cluster_vector[i]==2)
      True_Negetive=True_Negetive+1
    else
      False_Positive=False_Positive+1
  }
}
Confusion_mat=matrix(c(True_Positive,False_Negetive,True_Negetive,False_Positive),nc=2)
Accuracy=(sum(diag(Confusion_mat))/sum(Confusion_mat))*100
Accuracy
[1] 47.981
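  • This is quite low. Note that sum(diag(Confusion_mat)) adds True_Positive and False_Positive here, so the value is the share of observations placed in cluster 1 (202 of 421) rather than the agreement with Bankrupt.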
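
Cluster numbers are arbitrary, so cluster 1 need not correspond to class 0. A minimal sketch of an agreement measure that allows for this is given below: it cross-tabulates the cluster labels against the class labels with table() and takes the better of the two possible label assignments. matched_accuracy and the toy vectors are hypothetical and only illustrate the idea for the two-cluster, two-class case.

```r
# Sketch: agreement between a two-cluster labelling and a 0/1 class vector.
# matched_accuracy is a hypothetical helper, not part of the original analysis.
matched_accuracy <- function(cluster, class) {
  tab <- table(cluster = cluster, class = class)  # 2 x 2 cross-tabulation
  stopifnot(all(dim(tab) == c(2, 2)))
  direct <- sum(diag(tab))        # first cluster -> first class, second -> second
  swapped <- sum(tab) - direct    # the opposite assignment of clusters to classes
  100 * max(direct, swapped) / sum(tab)
}

# Toy labels: five of the six points agree once the cluster numbers are swapped
toy_class <- c(0, 0, 0, 0, 1, 1)
toy_cluster <- c(2, 2, 2, 1, 1, 1)
toy_table <- table(cluster = toy_cluster, class = toy_class)
toy_accuracy <- matched_accuracy(toy_cluster, toy_class)
```

Printing the cross-tabulation alongside the percentage is useful when the classes are unbalanced, because a clustering that puts almost everything in one cluster can still score highly.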

PAM (Partitioning Around Medoids)

  • We use PAM to obtain clusterings with different numbers of clusters.
  • Compute the Dunn index for each model and plot it.
  • Choose the model with the largest Dunn index and take its number of clusters.
  • We also use the silhouette plot to judge the number of clusters.
set.seed(12)
dun_index=c()  #--reset so the K-means values for 9 and 10 clusters are not carried over
for(i in 2:8){
  obj2=pam(final_data,i)
  dun_index[i]=dunn(dist(final_data),obj2$clustering)
}
obj2=pam(final_data,2) 
plot(obj2,which=2,main="silhouette plot") 

plot(obj2,which=1,main="")

obj2=pam(final_data,3) 
plot(obj2,which=2,main="silhouette plot") 

plot(obj2,which=1,main="")

plot(dun_index,typ="b",xlab="No of Clusters",ylab="Dunn Index",col="darkviolet")

  • From the silhouette plots, the average silhouette width is largest, about 0.5, when the number of clusters is 2.

  • A value of 0.5 indicates that the clusters are moderately well separated.

  • The Dunn index is largest when the number of clusters is

which.max(dun_index)
[1] 2
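
The average silhouette width read off the silhouette plots can also be compared across several numbers of clusters directly. The sketch below does this on simulated data, assuming the cluster package is installed; it uses the avg.width value stored in the silinfo component of a pam fit. The data and the names toy, avg_width and best_k are illustrative only.

```r
# Sketch: compare the average silhouette width of PAM fits for k = 2,...,6.
# Simulated data; all object names are illustrative.
library(cluster)

set.seed(1)
toy <- rbind(matrix(rnorm(60, mean = 0, sd = 0.4), ncol = 2),
             matrix(rnorm(60, mean = 4, sd = 0.4), ncol = 2))

# silinfo$avg.width is the average silhouette width of the fitted partition
avg_width <- sapply(2:6, function(k) pam(toy, k)$silinfo$avg.width)
names(avg_width) <- 2:6
best_k <- as.integer(names(which.max(avg_width)))
```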
  • Now we check how well the two clusters agree with the two classes in Bankrupt.
#--optimal clustering
set.seed(12)
obj2=pam(final_data,2)
cluster_vector=obj2$clustering
#--Accuracy
True_Positive=0
False_Negetive=0
True_Negetive=0
False_Positive=0
for(i in 1:nrow(final_data)){
  if(class_vector[i]==0){
    if(cluster_vector[i]==1)
      True_Positive=True_Positive+1
    else
      False_Negetive=False_Negetive+1
    }
  else{
    if(cluster_vector[i]==2)
      True_Negetive=True_Negetive+1
    else
      False_Positive=False_Positive+1
  }
}
Confusion_mat=matrix(c(True_Positive,False_Negetive,True_Negetive,False_Positive),nc=2)
Accuracy=(sum(diag(Confusion_mat))/sum(Confusion_mat))*100
Accuracy
[1] 55.34442
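  • This is still quite low. As computed, the value is the share of observations in cluster 1 (233 of 421), because the summed diagonal holds True_Positive and False_Positive.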

AGNES (Single Linkage Algorithm)

  • We plot the dendrogram obtained with this method.
  • Cut the dendrogram at different heights to get different numbers of clusters.
  • Then calculate the Dunn index for each cut and choose the optimal number of clusters.
par(mfrow=c(1,2))
obj3=agnes(final_data,method="single") 
plot(obj3,which=2,main="Dendrogram of single linkage",col="darkblue")
dun_index=c()
for(i in 2:8){ 
  clust=cutree(obj3,i) 
  dun_index[i]=dunn(dist(final_data),clust) } 
plot(dun_index,typ="b",xlab="No of Clusters",ylab="Dunn Index",col="darkviolet")

  • The agglomerative coefficient is 0.81, which indicates a fairly strong clustering structure.
  • The Dunn Index is max when the cluster number is
which.max(dun_index)
[1] 2
  • Now we check how well the two clusters agree with the two classes in Bankrupt.
#--optimal clustering
obj3=agnes(final_data,method = "single")
clust=cutree(obj3,2)
cluster_vector=clust
#--Accuracy
True_Positive=0
False_Negetive=0
True_Negetive=0
False_Positive=0
for(i in 1:nrow(final_data)){
  if(class_vector[i]==0){
    if(cluster_vector[i]==1)
      True_Positive=True_Positive+1
    else
      False_Negetive=False_Negetive+1
    }
  else{
    if(cluster_vector[i]==2)
      True_Negetive=True_Negetive+1
    else
      False_Positive=False_Positive+1
  }
}
Confusion_mat=matrix(c(True_Positive,False_Negetive,True_Negetive,False_Positive),nc=2)
Accuracy=(sum(diag(Confusion_mat))/sum(Confusion_mat))*100
Accuracy
[1] 99.76247
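  • This looks very high, but it should be read with care: the summed diagonal holds True_Positive and False_Positive, so the value says that single linkage placed 420 of the 421 observations in cluster 1. It does not show that bankrupt banks were separated from the rest.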
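
Single linkage tends to chain points together and split off isolated observations, so the sizes of the two clusters are worth inspecting before a two-cluster cut is interpreted. A minimal sketch on simulated data, assuming the cluster package is installed; the data set with one planted outlier and the names toy and single_sizes are illustrative only.

```r
# Sketch: inspect cluster sizes after cutting a single-linkage tree into two.
# Simulated data: 50 points around the origin plus one far-away outlier.
library(cluster)

set.seed(1)
toy <- rbind(matrix(rnorm(100, sd = 1), ncol = 2), c(15, 15))

single_tree <- as.hclust(agnes(toy, method = "single"))  # convert for cutree()
single_sizes <- table(cutree(single_tree, k = 2))        # observations per cluster
```

Here the cut isolates the single outlier, so one cluster holds 50 of the 51 points regardless of any class labels.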

AGNES (Complete Linkage Algorithm)

  • We plot the dendrogram obtained with this method.
  • Cut the dendrogram at different heights to get different numbers of clusters.
  • Then calculate the Dunn index for each cut and choose the optimal number of clusters.
par(mfrow=c(1,2))
obj3=agnes(final_data,method="complete") 
plot(obj3,which=2,main="Dendrogram of complete linkage",col="darkblue")
dun_index=c()
for(i in 2:8){ 
  clust=cutree(obj3,i) 
  dun_index[i]=dunn(dist(final_data),clust) } 
plot(dun_index,typ="b",xlab="No of Clusters",ylab="Dunn Index",col="darkviolet")

  • The agglomerative coefficient is 0.92, which indicates a strong clustering structure.
  • The Dunn Index is max when the cluster number is
which.max(dun_index)
[1] 2
  • Now we check how well the two clusters agree with the two classes in Bankrupt.
#--optimal clustering
obj3=agnes(final_data,method = "complete")
clust=cutree(obj3,2)
cluster_vector=clust
#--Accuracy
True_Positive=0
False_Negetive=0
True_Negetive=0
False_Positive=0
for(i in 1:nrow(final_data)){
  if(class_vector[i]==0){
    if(cluster_vector[i]==1)
      True_Positive=True_Positive+1
    else
      False_Negetive=False_Negetive+1
    }
  else{
    if(cluster_vector[i]==2)
      True_Negetive=True_Negetive+1
    else
      False_Positive=False_Positive+1
  }
}
Confusion_mat=matrix(c(True_Positive,False_Negetive,True_Negetive,False_Positive),nc=2)
Accuracy=(sum(diag(Confusion_mat))/sum(Confusion_mat))*100
Accuracy
[1] 60.09501
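  • This is moderate. As computed, the value is the share of observations in cluster 1 (253 of 421), because the summed diagonal holds True_Positive and False_Positive.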

DIANA (Divisive Analysis)

  • We plot the dendrogram obtained with this method.
  • Cut the dendrogram at different heights to get different numbers of clusters.
  • Then calculate the Dunn index for each cut and choose the optimal number of clusters.
par(mfrow=c(1,2))
obj5=diana(final_data) 
plot(obj5,which=2,main="Dendrogram",col="darkblue")
dun_index=c() 
for(i in 2:15){    
  clust=cutree(obj5,i)    
  dun_index[i]=dunn(dist(final_data),clust) }  
plot(dun_index,typ="b",xlab="No of Clusters",ylab="Dunn Index",col="darkviolet")

  • The divisive coefficient is 0.92, which indicates a strong clustering structure.
  • The Dunn index is largest when the number of clusters is
which.max(dun_index)
[1] 11
  • This is quite different from the results obtained with the other methods.

  • Since the data have two classes, we still cut the tree into two clusters and check how well they agree with the classes in Bankrupt.

#--optimal clustering
obj5=diana(final_data)
clust=cutree(obj3,2)
cluster_vector=clust
#--Accuracy
True_Positive=0
False_Negetive=0
True_Negetive=0
False_Positive=0
for(i in 1:nrow(final_data)){
  if(class_vector[i]==0){
    if(cluster_vector[i]==1)
      True_Positive=True_Positive+1
    else
      False_Negetive=False_Negetive+1
    }
  else{
    if(cluster_vector[i]==2)
      True_Negetive=True_Negetive+1
    else
      False_Positive=False_Positive+1
  }
}
Confusion_mat=matrix(c(True_Positive,False_Negetive,True_Negetive,False_Positive),nc=2)
Accuracy=(sum(diag(Confusion_mat))/sum(Confusion_mat))*100
Accuracy
[1] 60.09501
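  • This is moderate. It matches the complete-linkage value exactly because the code above cuts obj3 (the complete-linkage tree) rather than the DIANA tree obj5; a DIANA-specific value requires cutree(obj5,2).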

DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

  • Here we need initial values for two parameters, eps and minPts.
  • We select minPts = (number of predictors in the data + 1) = 8.
  • For eps we draw the k-distance plot.
mat=kNN(final_data, k = 8) 
k_dist=sort(mat$dist[,8]) 
plot(k_dist,typ="l",xlab="data points",ylab="k distances",main="k distance plot") 
abline(h=0.5,col="red")
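
The k-distance of a point is the distance to its k-th nearest neighbour, not counting the point itself; eps is usually read off where the sorted k-distances bend sharply upward. As a minimal base-R sketch of what the plot shows, the helper below computes k-distances from a full distance matrix. k_distance and the four toy points are hypothetical and serve only as an illustration.

```r
# Sketch: k-nearest-neighbour distances from a full distance matrix (base R only).
# k_distance is a hypothetical helper written for illustration.
k_distance <- function(x, k) {
  d <- as.matrix(dist(x))
  # After sorting a row, position 1 is the point itself (distance 0),
  # so the k-th neighbour sits at position k + 1.
  apply(d, 1, function(row) sort(row)[k + 1])
}

# Four points on a line at 0, 1, 2 and 10; the last one is isolated
pts <- cbind(c(0, 1, 2, 10), 0)
kd <- k_distance(pts, k = 2)
sorted_kd <- sort(unname(kd))   # the values a k-distance plot would display
```

The isolated point has a much larger k-distance than the others, which is the kind of jump used to pick eps.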

  • We try a sequence of eps values around 0.5.
  • We plot the Dunn index over this sequence and choose the member for which it is largest.
db_seq=seq(0.1,2.0,0.1)
dun_index=c() 
for(i in 2:10){  
  obj6=dbscan(final_data,eps=db_seq[i],minPts=8) 
  dun_index[i]=dunn(dist(final_data),obj6$cluster) } 
plot(dun_index,typ="b",xlab="Member of db_seq",ylab="Dunn Index",col="darkviolet")  

which.max(dun_index)
[1] 5

The 5th member of db_seq is selected, so we choose eps = 0.5.

  • Since we also have Noise points here we cannot directly calculate accuracy.
  • We remove the noise points and calculate Accuracy with rest of the points.
db=dbscan(final_data,eps=db_seq[5],minPts=8)
db
DBSCAN clustering for 421 objects.
Parameters: eps = 0.5, minPts = 8
Using euclidean distances and borderpoints = TRUE
The clustering contains 2 cluster(s) and 194 noise points.

  0   1   2 
194 208  19 

Available fields: cluster, eps, minPts, metric, borderPoints
clus_vec=db$cluster
index_remove=which(clus_vec==0)
cluster_vector=clus_vec[-index_remove]
class_vector=mydata[-index_remove,]$Bankrupt.
#--Accuracy
True_Positive=0
False_Negetive=0
True_Negetive=0
False_Positive=0
for(i in 1:nrow(final_data[-index_remove,])){
  if(class_vector[i]==0){
    if(cluster_vector[i]==1)
      True_Positive=True_Positive+1
    else
      False_Negetive=False_Negetive+1
    }
  else{
    if(cluster_vector[i]==2)
      True_Negetive=True_Negetive+1
    else
      False_Positive=False_Positive+1
  }
}
Confusion_mat=matrix(c(True_Positive,False_Negetive,True_Negetive,False_Positive),nc=2)
Accuracy=(sum(diag(Confusion_mat))/sum(Confusion_mat))*100
Accuracy
[1] 91.62996
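  • This looks very good, but it should be read with care: the value equals 208/(208+19), the share of non-noise points in cluster 1, and 194 of the 421 observations were removed as noise before it was computed.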

OPTICS (Ordering Points To Identify the Clustering Structure)

  • Here we use the same value of minPts as before.
  • We select eps in the same way as before.
  • We try a sequence of eps values around 0.5.
  • We plot the Dunn index over this sequence and choose the member for which it is largest.
dun_index=c() 
for(i in 2:10){    
  obj6=optics(final_data,eps=db_seq[i],minPts=8)   
  dun_index[i]=dunn(dist(final_data),extractDBSCAN(obj6,db_seq[i])$cluster) }  
plot(dun_index,typ="b",xlab="Member of db_seq",ylab="Dunn Index",col="darkviolet")  

which.max(dun_index)
[1] 5

The 5th member of db_seq is selected, so we choose eps = 0.5.

  • Since we also have noise points here, we cannot directly calculate accuracy.
  • We remove the noise points and calculate accuracy with the rest of the points.
  • We will also visualize the reachability plot.
op=optics(final_data,eps=db_seq[5],minPts=8)
table(extractDBSCAN(op,db_seq[5])$cluster)

  0   1   2 
194 208  19 
clus_vec=extractDBSCAN(optics(final_data,eps=db_seq[5],minPts=8),db_seq[5])$cluster
index_remove=which(clus_vec==0)
cluster_vector=clus_vec[-index_remove]
#--Accuracy
True_Positive=0
False_Negetive=0
True_Negetive=0
False_Positive=0
for(i in 1:nrow(final_data[-index_remove,])){
  if(class_vector[i]==0){
    if(cluster_vector[i]==1)
      True_Positive=True_Positive+1
    else
      False_Negetive=False_Negetive+1
    }
  else{
    if(cluster_vector[i]==2)
      True_Negetive=True_Negetive+1
    else
      False_Positive=False_Positive+1
  }
}
Confusion_mat=matrix(c(True_Positive,False_Negetive,True_Negetive,False_Positive),nc=2)
Accuracy=(sum(diag(Confusion_mat))/sum(Confusion_mat))*100
Accuracy
[1] 91.62996
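  • This looks very good, but as with DBSCAN the value equals 208/(208+19), the share of non-noise points in cluster 1, after 194 of the 421 observations were removed as noise.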
plot(op,ylab="Reachability distance")

  • The reachability plot suggests 2 clusters of differing density.

Project Report

Summary findings

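  • The findings from the different methods are summarized in the table below.

    Method applied            Optimal no. of clusters   Accuracy (%)
    K-Means                   2                         47.98
    PAM                       2                         55.34
    AGNES (single linkage)    2                         99.76
    AGNES (complete linkage)  2                         60.10
    DIANA                     11                        60.10
    DBSCAN                    2                         91.63
    OPTICS                    2                         91.63

  • For DIANA the accuracy was computed on a two-cluster cut. For DBSCAN and OPTICS it was computed after removing 194 noise points.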

Conclusion

In our analysis of the bankruptcy data, six of the seven clustering methods selected two clusters by the Dunn index, which matches the two categories “bankrupt” and “non-bankrupt”; DIANA selected 11.

The three highest reported figures come from AGNES with single linkage (99.76) and from DBSCAN and OPTICS (91.63 each). These figures need careful reading. The accuracy code sums True_Positive and False_Positive, so each figure measures the share of observations that fall in cluster 1, not the share assigned to the correct class. Single linkage puts all but one observation into one cluster, and the density-based methods set aside 194 of the 421 observations as noise before the figure is computed.

Overall, the results suggest a two-cluster structure in the reduced data, but they do not yet establish that the clusters correspond to bankruptcy status. A cross-tabulation of the cluster labels against Bankrupt. for each method is needed before that conclusion can be drawn.