Question: Can we find measures that will help us build a model to show a company’s likelihood to go bankrupt within one year based on financial ratios from a 5 year period?
library(farff)
library(ggplot2)
PolishCompanies<-readARFF("C:/Users/dawig/Documents/DataPolishCompanies/5year.arff")
## Parse with reader=readr : C:/Users/dawig/Documents/DataPolishCompanies/5year.arff
## Loading required package: readr
## header: 0.010000; preproc: 0.030000; data: 0.130000; postproc: 0.000000; total: 0.170000
PolishCompanies$class<-as.numeric(PolishCompanies$class)
    This data set contains some questionable data, which is not scaled in a known way. The average working capital for a company in the set is 10817, which may be in millions of zlotys, but is definitely not in single dollars or single zlotys. To ascertain whether the data is internally consistent, a secondary data frame is created with financial data recomposed based on the ratios. A ratio based on the recomposed numbers is then compared to another ratio within the data set to see that they are similar. The vast majority of the companies did not go bankrupt as seen below.

    The mean, minimum and maximum was taken for the difference between both methods for calculating total assets/ total liabilities. The results are:
mean =-0.0031149
min = -20.1849315
max = 1.7538945
    The mean, minimum and maximum were also taken for the difference between both methods for calculating sales /total assets.
mean = 0.2390619
min = -2.3603715
max = 12.2390366
    It appears that the ratios match reasonably close enough to achieve meaningful results. head(RecomposedFinancials,4) str(RecomposedFinancials)
    There is a warning here that some fitted probabilities are 0 or 1. This would be a difficulty in using this model to predict bankruptcy. This probably occurs because the probability of bankruptcy is quite low and very hard to predict. While our model found statistically significant factors, it did not account for a lot of the variability in bankruptcy likelihood.
summary(glmResults)
##
## Call:
## glm(formula = PolishCompanies[[65]] ~ PolishCompanies[[48]] +
## PolishCompanies[[19]] + RecomposedFinancials[[1]], family = "binomial")
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.9758 -0.3903 -0.3816 -0.3640 3.9044
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.518e+00 5.679e-02 -44.344 < 2e-16 ***
## PolishCompanies[[48]] -3.319e-01 8.487e-02 -3.911 9.19e-05 ***
## PolishCompanies[[19]] -7.123e-03 5.257e-03 -1.355 0.17545
## RecomposedFinancials[[1]] -2.121e-06 6.849e-07 -3.097 0.00195 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2967.9 on 5905 degrees of freedom
## Residual deviance: 2927.4 on 5902 degrees of freedom
## (4 observations deleted due to missingness)
## AIC: 2935.4
##
## Number of Fisher Scoring iterations: 7
with(glmResults, null.deviance - deviance)
## [1] 40.467
    The final model is completed below, dropping the gross profit/sales, which was not statistically significant. The resulting model shows that: EBITDA (profit on operating activities - depreciation) / total assets, and total assets as a measure by itsself were both statistically significant predictors of a Polish company’s likelihood during the period studied. No segmentation of data to test the model was performed.
glmResults<-glm(PolishCompanies[[65]] ~ PolishCompanies[[48]] + RecomposedFinancials[[1]], family="binomial")
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(glmResults)
##
## Call:
## glm(formula = PolishCompanies[[65]] ~ PolishCompanies[[48]] +
## RecomposedFinancials[[1]], family = "binomial")
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.9660 -0.3907 -0.3820 -0.3644 3.9110
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.516e+00 5.675e-02 -44.343 < 2e-16 ***
## PolishCompanies[[48]] -3.331e-01 8.490e-02 -3.924 8.72e-05 ***
## RecomposedFinancials[[1]] -2.133e-06 6.864e-07 -3.108 0.00188 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2967.9 on 5905 degrees of freedom
## Residual deviance: 2928.8 on 5903 degrees of freedom
## (4 observations deleted due to missingness)
## AIC: 2934.8
##
## Number of Fisher Scoring iterations: 7
Unbankrupted<-subset(RecomposedFinancials,PolishCompanies[[65]] == "0")
str (Unbankrupted)
## 'data.frame': 5500 obs. of 9 variables:
## $ Total_Assets : num 1.34e+06 1.01e+04 5.78e+04 1.04e+04 1.16e+01 ...
## $ Total_Liabilities : num 7.43e+05 4.87e+03 1.28e+04 9.19e+03 6.36 ...
## $ Working_Capital : num 15182 2341.8 33401 2789.6 1.24 ...
## $ Book_Value_of_Equity: num 4.29e+05 5.18e+03 3.92e+04 1.17e+03 5.19 ...
## $ AssetLiabilityCheck : num 1.13e-05 -5.53e-05 -9.61e-05 -4.28e-06 2.75e-05 ...
## $ Net_Profit : num 1.18e+05 -6.23e+01 7.53e+03 -9.32e+02 5.57e-01 ...
## $ Sales : num 1.90e+06 1.28e+04 6.53e+04 1.32e+04 1.75e+01 ...
## $ SalesAssetsCheck : num 3.29e-01 -9.33e-05 -1.17e-02 4.84e-05 -3.24e-05 ...
## $ EBITDA : num 0.0354 0.4521 0.8526 2.3828 0.2394 ...
Bankrupted<-subset(RecomposedFinancials,PolishCompanies[[65]] == "1")
str (Bankrupted)
## 'data.frame': 410 obs. of 9 variables:
## $ Total_Assets : num 4757 1447 1435 5364 11209 ...
## $ Total_Liabilities : num 4856 1634 1079 6316 8216 ...
## $ Working_Capital : num 624 475 227 1438 296 ...
## $ Book_Value_of_Equity: num -98.8 -187.7 356.3 -952.4 2992.1 ...
## $ AssetLiabilityCheck : num 1.38e-05 2.27e-05 2.87e-05 -1.31e-05 -6.29e-06 ...
## $ Net_Profit : num 383.5 -193 55.1 -284.6 -2786 ...
## $ Sales : num 11192 1305 1704 8713 11433 ...
## $ SalesAssetsCheck : num 5.92e-05 -3.37e-06 -4.64e-05 3.80e-05 -1.48e-05 ...
## $ EBITDA : num -6.3179 -2.5308 0.6376 -1.5098 0.0989 ...
qplot(RecomposedFinancials[[1]],RecomposedFinancials[[9]]) + geom_point(aes(x=RecomposedFinancials[[1]], y=RecomposedFinancials[[9]], colour=factor(PolishCompanies[[65]]))) + xlim(0,100000) +ylim(0,10) + labs(x="Total Assets", y = "EBITDA", color="Bankruptcy Next Year?")
## Warning: Removed 1093 rows containing missing values (geom_point).
## Warning: Removed 1093 rows containing missing values (geom_point).

    The graph for the two factors is separated into 2 graphs to visualize it with less crouded points.
ggplot(RecomposedFinancials, aes(x = PolishCompanies[[65]], y = RecomposedFinancials$EBITDA)) + geom_boxplot() + labs(x = "Bankruptcy?", y="EBITDA")
## Warning: Removed 19 rows containing non-finite values (stat_boxplot).

    The boxplot of EBITDA for bankrupt and solvent companies shows a possibility for future fruitful research. Most of the distribution of bankrupt companies is closer together than solvent companies, but there are noticable outliers. Perhaps one could find two separate predictors for bankruptcy.
qplot(Unbankrupted[[1]],Unbankrupted[[9]]) + geom_point(aes(x=Unbankrupted[[1]], y=Unbankrupted[[9]], color="red")) + xlim(0,100000) +ylim(0,10) + labs(x = "Total Assets", y="EBITDA")
## Warning: Removed 960 rows containing missing values (geom_point).
## Warning: Removed 960 rows containing missing values (geom_point).

qplot(Bankrupted[[1]],Bankrupted[[9]]) + geom_point(aes(x=Bankrupted[[1]], y=Bankrupted[[9]])) + geom_point(colour = "#00BFC4") + xlim(0,100000) +ylim(0,10) + labs(x = "Total Assets", y="EBITDA")
## Warning: Removed 133 rows containing missing values (geom_point).
## Warning: Removed 133 rows containing missing values (geom_point).
## Warning: Removed 133 rows containing missing values (geom_point).
