Week3Bridge

Regression of Polish Companies’ Financials to Bankruptcy After 1 Year

Question: Can we find measures that will help us build a model to show a company’s likelihood to go bankrupt within one year based on financial ratios from a 5 year period?

library(farff)
library(ggplot2)
PolishCompanies<-readARFF("C:/Users/dawig/Documents/DataPolishCompanies/5year.arff")

## Parse with reader=readr : C:/Users/dawig/Documents/DataPolishCompanies/5year.arff

## Loading required package: readr

## header: 0.010000; preproc: 0.030000; data: 0.130000; postproc: 0.000000; total: 0.170000

PolishCompanies$class<-as.numeric(PolishCompanies$class)

This data set contains some questionable data, which is not scaled in a known way. The average working capital for a company in the set is 10817, which may be in millions of zlotys, but is definitely not in single dollars or single zlotys. To ascertain whether the data is internally consistent, a secondary data frame is created with financial data recomposed based on the ratios. A ratio based on the recomposed numbers is then compared to another ratio within the data set to see that they are similar. The vast majority of the companies did not go bankrupt as seen below.

    The mean, minimum and maximum was taken for the difference between both methods for calculating total assets/ total liabilities. The results are:
mean =-0.0031149
min = -20.1849315
max = 1.7538945
    The mean, minimum and maximum were also taken for the difference between both methods for calculating sales /total assets.
mean = 0.2390619
min = -2.3603715
max = 12.2390366
    It appears that the ratios match reasonably close enough to achieve meaningful results. head(RecomposedFinancials,4) str(RecomposedFinancials)

This data set has a major deficiency in answering the question at hand. Ratios related to debt are unavailable. Nevertheless, EBITDA as a function of total assets is available to test. This gives a sense of how much cash a company can generate in the current period compared to its size.

RecomposedFinancials[[9]]<-(RecomposedFinancials[[3]]/RecomposedFinancials[[4]])
PolishCompanies[[65]] <- factor(PolishCompanies[[65]])
glmResults<-glm(PolishCompanies[[65]] ~ PolishCompanies[[48]]+PolishCompanies[[19]]+RecomposedFinancials[[1]], family="binomial")

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

There is a warning here that some fitted probabilities are 0 or 1. This would be a difficulty in using this model to predict bankruptcy. This probably occurs because the probability of bankruptcy is quite low and very hard to predict. While our model found statistically significant factors, it did not account for a lot of the variability in bankruptcy likelihood.

summary(glmResults)

## 
## Call:
## glm(formula = PolishCompanies[[65]] ~ PolishCompanies[[48]] + 
##     PolishCompanies[[19]] + RecomposedFinancials[[1]], family = "binomial")
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.9758  -0.3903  -0.3816  -0.3640   3.9044  
## 
## Coefficients:
##                             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)               -2.518e+00  5.679e-02 -44.344  < 2e-16 ***
## PolishCompanies[[48]]     -3.319e-01  8.487e-02  -3.911 9.19e-05 ***
## PolishCompanies[[19]]     -7.123e-03  5.257e-03  -1.355  0.17545    
## RecomposedFinancials[[1]] -2.121e-06  6.849e-07  -3.097  0.00195 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2967.9  on 5905  degrees of freedom
## Residual deviance: 2927.4  on 5902  degrees of freedom
##   (4 observations deleted due to missingness)
## AIC: 2935.4
## 
## Number of Fisher Scoring iterations: 7

with(glmResults, null.deviance - deviance)

## [1] 40.467

The final model is completed below, dropping the gross profit/sales, which was not statistically significant. The resulting model shows that: EBITDA (profit on operating activities - depreciation) / total assets, and total assets as a measure by itsself were both statistically significant predictors of a Polish company’s likelihood during the period studied. No segmentation of data to test the model was performed.

glmResults<-glm(PolishCompanies[[65]] ~ PolishCompanies[[48]] + RecomposedFinancials[[1]], family="binomial")

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

summary(glmResults)

## 
## Call:
## glm(formula = PolishCompanies[[65]] ~ PolishCompanies[[48]] + 
##     RecomposedFinancials[[1]], family = "binomial")
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.9660  -0.3907  -0.3820  -0.3644   3.9110  
## 
## Coefficients:
##                             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)               -2.516e+00  5.675e-02 -44.343  < 2e-16 ***
## PolishCompanies[[48]]     -3.331e-01  8.490e-02  -3.924 8.72e-05 ***
## RecomposedFinancials[[1]] -2.133e-06  6.864e-07  -3.108  0.00188 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2967.9  on 5905  degrees of freedom
## Residual deviance: 2928.8  on 5903  degrees of freedom
##   (4 observations deleted due to missingness)
## AIC: 2934.8
## 
## Number of Fisher Scoring iterations: 7

Unbankrupted<-subset(RecomposedFinancials,PolishCompanies[[65]] == "0")
str (Unbankrupted)

## 'data.frame':    5500 obs. of  9 variables:
##  $ Total_Assets        : num  1.34e+06 1.01e+04 5.78e+04 1.04e+04 1.16e+01 ...
##  $ Total_Liabilities   : num  7.43e+05 4.87e+03 1.28e+04 9.19e+03 6.36 ...
##  $ Working_Capital     : num  15182 2341.8 33401 2789.6 1.24 ...
##  $ Book_Value_of_Equity: num  4.29e+05 5.18e+03 3.92e+04 1.17e+03 5.19 ...
##  $ AssetLiabilityCheck : num  1.13e-05 -5.53e-05 -9.61e-05 -4.28e-06 2.75e-05 ...
##  $ Net_Profit          : num  1.18e+05 -6.23e+01 7.53e+03 -9.32e+02 5.57e-01 ...
##  $ Sales               : num  1.90e+06 1.28e+04 6.53e+04 1.32e+04 1.75e+01 ...
##  $ SalesAssetsCheck    : num  3.29e-01 -9.33e-05 -1.17e-02 4.84e-05 -3.24e-05 ...
##  $ EBITDA              : num  0.0354 0.4521 0.8526 2.3828 0.2394 ...

Bankrupted<-subset(RecomposedFinancials,PolishCompanies[[65]] == "1")
str (Bankrupted)

## 'data.frame':    410 obs. of  9 variables:
##  $ Total_Assets        : num  4757 1447 1435 5364 11209 ...
##  $ Total_Liabilities   : num  4856 1634 1079 6316 8216 ...
##  $ Working_Capital     : num  624 475 227 1438 296 ...
##  $ Book_Value_of_Equity: num  -98.8 -187.7 356.3 -952.4 2992.1 ...
##  $ AssetLiabilityCheck : num  1.38e-05 2.27e-05 2.87e-05 -1.31e-05 -6.29e-06 ...
##  $ Net_Profit          : num  383.5 -193 55.1 -284.6 -2786 ...
##  $ Sales               : num  11192 1305 1704 8713 11433 ...
##  $ SalesAssetsCheck    : num  5.92e-05 -3.37e-06 -4.64e-05 3.80e-05 -1.48e-05 ...
##  $ EBITDA              : num  -6.3179 -2.5308 0.6376 -1.5098 0.0989 ...

qplot(RecomposedFinancials[[1]],RecomposedFinancials[[9]]) + geom_point(aes(x=RecomposedFinancials[[1]], y=RecomposedFinancials[[9]], colour=factor(PolishCompanies[[65]]))) + xlim(0,100000) +ylim(0,10) + labs(x="Total Assets", y = "EBITDA", color="Bankruptcy Next Year?")

## Warning: Removed 1093 rows containing missing values (geom_point).

## Warning: Removed 1093 rows containing missing values (geom_point).

The graph for the two factors is separated into 2 graphs to visualize it with less crouded points.

ggplot(RecomposedFinancials, aes(x = PolishCompanies[[65]], y = RecomposedFinancials$EBITDA)) + geom_boxplot() + labs(x = "Bankruptcy?", y="EBITDA")

## Warning: Removed 19 rows containing non-finite values (stat_boxplot).

The boxplot of EBITDA for bankrupt and solvent companies shows a possibility for future fruitful research. Most of the distribution of bankrupt companies is closer together than solvent companies, but there are noticable outliers. Perhaps one could find two separate predictors for bankruptcy.

qplot(Unbankrupted[[1]],Unbankrupted[[9]]) + geom_point(aes(x=Unbankrupted[[1]], y=Unbankrupted[[9]], color="red")) + xlim(0,100000) +ylim(0,10) + labs(x = "Total Assets", y="EBITDA")

## Warning: Removed 960 rows containing missing values (geom_point).

## Warning: Removed 960 rows containing missing values (geom_point).

qplot(Bankrupted[[1]],Bankrupted[[9]]) + geom_point(aes(x=Bankrupted[[1]], y=Bankrupted[[9]])) + geom_point(colour = "#00BFC4") + xlim(0,100000) +ylim(0,10) + labs(x = "Total Assets", y="EBITDA")

## Warning: Removed 133 rows containing missing values (geom_point).

## Warning: Removed 133 rows containing missing values (geom_point).

## Warning: Removed 133 rows containing missing values (geom_point).

Week3Bridge

Dan Wigodsky

January 18, 2018

Regression of Polish Companies’ Financials to Bankruptcy After 1 Year

Question: Can we find measures that will help us build a model to show a company’s likelihood to go bankrupt within one year based on financial ratios from a 5 year period?

The graph for the two factors is separated into 2 graphs to visualize it with less crouded points.

The boxplot of EBITDA for bankrupt and solvent companies shows a possibility for future fruitful research. Most of the distribution of bankrupt companies is closer together than solvent companies, but there are noticable outliers. Perhaps one could find two separate predictors for bankruptcy.