The data come from the annual financial statements for 2015-2019 in the CSMAR database. Indicators were selected from four aspects: asset structure, profitability, cash flow, and operating capability. The model used is the M-score model with 9 indexes, including: Days' Sales in Receivables Index (DSRI); Gross Margin Index (GMI); Asset Quality Index (AQI); Sales Growth Index (SGI); Depreciation Index (DEPI); Selling, General, and Administrative Expenses Index (SGAI); Leverage Index (LVGI); and Total Accruals to Total Assets (TATA). The univariate analysis and histograms of each index follow.
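As a rough illustration of the univariate step, the sketch below computes basic statistics and draws one histogram per index. The data frame `toy` and its values are hypothetical stand-ins, not CSMAR data, and only three of the indexes are included to keep the example short.

```r
# Hypothetical stand-in for the real dataset (values are simulated, not CSMAR data)
set.seed(1)
toy <- data.frame(
  DSRI = rnorm(50, mean = 1, sd = 0.2),
  GMI  = rnorm(50, mean = 1, sd = 0.1),
  TATA = rnorm(50, mean = 0, sd = 0.05)
)

# Univariate statistics: one column per index, one row per statistic
stats <- sapply(toy, function(x) {
  c(mean = mean(x), sd = sd(x), min = min(x), max = max(x))
})
print(round(stats, 3))

# One histogram per index, drawn side by side
op <- par(mfrow = c(1, ncol(toy)))
for (name in names(toy)) {
  hist(toy[[name]], main = name, xlab = name)
}
par(op)  # restore the previous plotting layout
```

With the real data, the same loop could be run over the index columns of `data` instead of `toy`.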
The missing-data analysis below shows that the dataset contains no missing values.
library(VIM)  # provides aggr() for missing-value aggregation plots
aggr(data[, 2:10], prop = FALSE, numbers = TRUE)
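The plot can be cross-checked numerically with base R. The sketch below is a minimal example on a hypothetical data frame `toy` that deliberately contains one missing value, so the counts are visible; on the dataset described here every count would be zero.

```r
# Hypothetical data with one deliberately missing value
toy <- data.frame(
  DSRI = c(1.02, 0.97, NA, 1.10),
  GMI  = c(1.00, 1.05, 0.98, 1.01)
)

# Number of missing values in each column
missing_per_column <- colSums(is.na(toy))
print(missing_per_column)

# Number of rows with no missing value in any column
complete_rows <- sum(complete.cases(toy))
print(complete_rows)
```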
The outlier analysis below shows that every index contains outliers; these outliers are removed before the model is fitted.
library(caret)  # provides featurePlot()
featurePlot(x = data[, 2:9], y = data[, 11], plot = "box")
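The text does not say which rule is used to delete outliers. One common choice, assumed here purely for illustration, is the 1.5 × IQR convention used for box-plot whiskers. The helper `is_outlier` and the data frame `toy` below are hypothetical names and values.

```r
# Hypothetical data: the last DSRI value is an obvious outlier
toy <- data.frame(
  DSRI = c(0.9, 1.0, 1.1, 1.0, 0.95, 5.0),
  GMI  = c(1.0, 1.02, 0.98, 1.01, 0.99, 1.0)
)

# Flag values beyond k * IQR from the quartiles (assumed rule, k = 1.5)
is_outlier <- function(x, k = 1.5) {
  q <- unname(quantile(x, c(0.25, 0.75), na.rm = TRUE))
  spread <- q[2] - q[1]
  x < q[1] - k * spread | x > q[2] + k * spread
}

# Logical matrix: one column per index, TRUE where the value is an outlier
flags <- sapply(toy, is_outlier)

# Keep only the rows in which no index is flagged
cleaned <- toy[rowSums(flags) == 0, ]
print(cleaned)
```

Dropping whole rows discards an observation as soon as any single index is extreme; winsorizing each index separately is an alternative that keeps the sample size unchanged.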