Tired of remembering too many different Packages?
One of the biggest challenges beginners in Data Science face is deciding which algorithms to learn and focus on. In R, the problem is compounded because the same functionality can often be achieved through several different libraries. That variety is great, but it is also frustrating, since each package was designed independently and has its own syntax, inputs and outputs. This can be too much for a beginner.
Here is a tip to handle everything, from exploring data to running complex machine learning algorithms to tuning their hyperparameters, under a single roof.
All this has been made possible by the years of effort that have gone into caret (Classification And REgression Training), which is possibly the biggest project in R. This package alone is all you need to know to solve almost any supervised machine learning problem. Not only does caret let you run a plethora of ML methods through one interface, it also provides tools for auxiliary techniques such as:
• Data preparation (imputation, centering/scaling data, removing correlated predictors, reducing skewness)
• Data splitting
• Variable selection
• Model evaluation
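To make the "single roof" idea concrete, here is a minimal sketch of caret's common interface. It uses R's built-in iris data rather than the loan data, and it assumes the rpart package is installed alongside caret. The model choices and the 5-fold setup are only for illustration.

```r
library(caret)

set.seed(123)  # resampling is random; fix the seed for repeatable folds

# 5-fold cross-validation, shared by both models.
ctrl <- trainControl(method = "cv", number = 5)

# A classification tree (fitted through the rpart package).
fit_tree <- train(Species ~ ., data = iris, method = "rpart", trControl = ctrl)

# k-nearest neighbours, with centering and scaling done inside train().
fit_knn <- train(Species ~ ., data = iris, method = "knn",
                 trControl = ctrl, preProcess = c("center", "scale"))

# Both fits are objects of class "train" and are used the same way.
head(predict(fit_tree, newdata = iris))
head(predict(fit_knn,  newdata = iris))
```

Only the `method` argument changes between the two calls; the data, the resampling scheme and the prediction step stay the same.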
In this problem statement, we have to predict the loan status of an individual based on his/her profile. We’ll get started by loading the caret library and the Loan Default dataset, which is available in my working directory.
# Loading the library (run install.packages("caret") once beforehand if it is not installed).
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
# Setting up the working Directory and Loading the Loan Default dataset.
setwd("D:/Great Learning/Finance and Risk Analytics")
dataset <- read.csv("raw-data.csv")
Once we have the data available in R Environment, we perform a few exploratory checks to understand the Structure of the data and ensure that the data loaded is correct.
### Performing basic Exploratory Analysis
# Checking the class of the data.
class(dataset)
## [1] "data.frame"
# Checking the dimension of data.
dim(dataset)
## [1] 3541 53
# Reading top 5 Rows.
head(dataset, n=5)
## Num Networth.Next.Year Total.assets Net.worth Total.income
## 1 1 8890.6 17512.3 7093.2 24965.2
## 2 2 394.3 941.0 351.5 1527.4
## 3 3 92.2 232.8 100.6 477.3
## 4 4 2.7 2.7 2.7 NA
## 5 5 109.0 478.5 107.6 1580.5
## Change.in.stock Total.expenses Profit.after.tax PBDITA PBT
## 1 235.8 23657.8 1543.2 2860.2 2417.2
## 2 42.7 1454.9 115.2 283.0 188.4
## 3 -5.2 478.7 -6.6 5.8 -6.6
## 4 NA NA NA NA NA
## 5 -17.0 1558.0 5.5 31.0 6.3
## Cash.profit PBDITA.as...of.total.income PBT.as...of.total.income
## 1 1872.8 11.46 9.68
## 2 158.6 18.53 12.33
## 3 0.3 1.22 -1.38
## 4 NA 0.00 0.00
## 5 11.9 1.96 0.40
## PAT.as...of.total.income Cash.profit.as...of.total.income
## 1 6.18 7.50
## 2 7.54 10.38
## 3 -1.38 0.06
## 4 0.00 0.00
## 5 0.35 0.75
## PAT.as...of.net.worth Sales Income.from.financial.services
## 1 23.78 24458.0 158.0
## 2 38.08 1504.3 4.0
## 3 -6.35 475.6 1.5
## 4 0.00 NA NA
## 5 5.25 1575.1 3.9
## Other.income Total.capital Reserves.and.funds
## 1 297.2 423.8 6822.8
## 2 15.9 115.5 257.8
## 3 0.2 81.4 19.2
## 4 NA 0.5 2.2
## 5 0.9 6.2 161.8
## Deposits..accepted.by.commercial.banks. Borrowings
## 1 NA 14.9
## 2 NA 272.5
## 3 NA 35.4
## 4 NA NA
## 5 NA 193.1
## Current.liabilities...provisions Deferred.tax.liability
## 1 9965.9 284.9
## 2 210.0 85.2
## 3 96.8 NA
## 4 NA NA
## 5 112.8 4.6
## Shareholders.funds Cumulative.retained.profits Capital.employed TOL.TNW
## 1 7093.2 6263.3 7108.1 1.33
## 2 351.5 247.4 624.0 1.23
## 3 100.6 32.4 136.0 1.44
## 4 2.7 2.2 2.7 0.00
## 5 107.6 82.7 300.7 2.83
## Total.term.liabilities...tangible.net.worth
## 1 0.00
## 2 0.34
## 3 0.29
## 4 0.00
## 5 1.59
## Contingent.liabilities...Net.worth.... Contingent.liabilities
## 1 14.80 1049.7
## 2 19.23 67.6
## 3 45.83 46.1
## 4 0.00 NA
## 5 34.94 37.6
## Net.fixed.assets Investments Current.assets Net.working.capital
## 1 1900.2 1069.6 13277.5 3588.5
## 2 286.4 2.2 563.9 203.5
## 3 38.7 4.3 167.5 59.6
## 4 2.5 NA 0.2 0.2
## 5 94.8 7.4 349.7 215.8
## Quick.ratio..times. Current.ratio..times. Debt.to.equity.ratio..times.
## 1 1.18 1.37 0.00
## 2 0.95 1.56 0.78
## 3 1.11 1.55 0.35
## 4 NA NA 0.00
## 5 1.41 2.54 1.79
## Cash.to.current.liabilities..times.
## 1 0.43
## 2 0.06
## 3 0.21
## 4 NA
## 5 0.00
## Cash.to.average.cost.of.sales.per.day Creditors.turnover
## 1 68.21 3.62
## 2 5.96 9.80
## 3 17.07 5.28
## 4 NA 0.00
## 5 0.00 13.00
## Debtors.turnover Finished.goods.turnover WIP.turnover
## 1 3.85 200.55 21.78
## 2 5.70 14.21 7.49
## 3 5.07 9.24 0.23
## 4 0.00 NA NA
## 5 9.46 12.68 7.90
## Raw.material.turnover Shares.outstanding Equity.face.value EPS
## 1 7.71 42381675 10 35.52
## 2 11.46 11550000 10 9.97
## 3 NA 8149090 10 -0.50
## 4 0.00 52404 10 0.00
## 5 17.03 619635 10 7.91
## Adjusted.EPS Total.liabilities PE.on.BSE Default
## 1 7.10 17512.3 27.31 0
## 2 9.97 941.0 8.17 0
## 3 -0.50 232.8 -5.76 0
## 4 0.00 2.7 NA 0
## 5 7.91 478.5 NA 0
# Reading bottom 5 Rows.
tail(dataset, n=5)
## Num Networth.Next.Year Total.assets Net.worth Total.income
## 3537 3541 226.4 450.5 172.3 565.0
## 3538 3542 89.4 97.6 82.0 75.8
## 3539 3543 246.2 902.9 209.1 1005.1
## 3540 3544 146.9 177.0 137.2 371.0
## 3541 3545 -0.2 0.6 0.3 NA
## Change.in.stock Total.expenses Profit.after.tax PBDITA PBT
## 3537 30.5 581.1 14.4 76.7 41.1
## 3538 -4.0 66.5 5.3 11.1 6.2
## 3539 5.6 966.5 44.2 120.3 70.0
## 3540 3.9 348.9 26.0 50.5 40.8
## 3541 NA 17.4 -17.4 -17.4 -17.4
## Cash.profit PBDITA.as...of.total.income PBT.as...of.total.income
## 3537 48.4 13.58 7.27
## 3538 9.2 14.64 8.18
## 3539 62.6 11.97 6.96
## 3540 33.6 13.61 11.00
## 3541 -17.4 NA NA
## PAT.as...of.total.income Cash.profit.as...of.total.income
## 3537 2.55 8.57
## 3538 6.99 12.14
## 3539 4.40 6.23
## 3540 7.01 9.06
## 3541 NA NA
## PAT.as...of.net.worth Sales Income.from.financial.services
## 3537 8.71 564.5 0.5
## 3538 6.68 73.9 1.7
## 3539 22.77 995.9 2.6
## 3540 20.30 365.8 3.3
## 3541 -193.33 NA NA
## Other.income Total.capital Reserves.and.funds
## 3537 NA 89.0 85.5
## 3538 NA 38.6 48.4
## 3539 0.3 30.0 179.1
## 3540 1.6 50.9 86.3
## 3541 NA 28.3 -28.0
## Deposits..accepted.by.commercial.banks. Borrowings
## 3537 NA 190.2
## 3538 NA 3.0
## 3539 NA 305.0
## 3540 NA 1.3
## 3541 NA NA
## Current.liabilities...provisions Deferred.tax.liability
## 3537 42.5 36.8
## 3538 7.6 NA
## 3539 363.4 25.4
## 3540 21.1 17.4
## 3541 0.3 NA
## Shareholders.funds Cumulative.retained.profits Capital.employed
## 3537 172.3 76.8 362.5
## 3538 87.0 36.6 90.0
## 3539 209.1 179.1 514.1
## 3540 137.2 77.1 138.5
## 3541 0.3 -28.0 0.3
## TOL.TNW Total.term.liabilities...tangible.net.worth
## 3537 1.30 0.72
## 3538 0.12 0.02
## 3539 2.45 0.68
## 3540 0.10 0.01
## 3541 1.00 0.00
## Contingent.liabilities...Net.worth.... Contingent.liabilities
## 3537 0.00 NA
## 3538 5.12 4.2
## 3539 93.45 195.4
## 3540 6.20 8.5
## 3541 0.00 NA
## Net.fixed.assets Investments Current.assets Net.working.capital
## 3537 227.0 NA 187.0 78.3
## 3538 21.9 6.8 55.8 47.2
## 3539 217.7 17.5 477.5 -49.5
## 3540 73.5 NA 80.8 59.7
## 3541 NA NA 0.6 0.3
## Quick.ratio..times. Current.ratio..times.
## 3537 0.41 1.71
## 3538 4.58 6.49
## 3539 0.59 0.91
## 3540 2.83 3.83
## 3541 2.00 2.00
## Debt.to.equity.ratio..times. Cash.to.current.liabilities..times.
## 3537 1.10 0.07
## 3538 0.10 3.88
## 3539 1.46 0.05
## 3540 0.01 1.35
## 3541 0.00 2.00
## Cash.to.average.cost.of.sales.per.day Creditors.turnover
## 3537 5.67 15.65
## 3538 177.71 10.07
## 3539 11.05 3.96
## 3540 29.93 25.00
## 3541 2190.00 0.00
## Debtors.turnover Finished.goods.turnover WIP.turnover
## 3537 20.64 8.66 5.14
## 3538 14.21 5.13 4.17
## 3539 3.76 33.03 11.68
## 3540 13.75 49.00 47.03
## 3541 0.00 NA NA
## Raw.material.turnover Shares.outstanding Equity.face.value EPS
## 3537 19.47 14904213 10 0.97
## 3538 4.83 3362800 10 1.61
## 3539 4.63 3000000 10 13.10
## 3540 17.42 4422346 10 6.06
## 3541 0.00 5220000 10 -0.02
## Adjusted.EPS Total.liabilities PE.on.BSE Default
## 3537 0.97 450.5 NA 0
## 3538 1.61 97.6 2.49 0
## 3539 13.10 902.9 12.62 0
## 3540 6.06 177.0 4.07 0
## 3541 -0.02 0.6 NA 1
# Understanding the Structure of the data loaded.
str(dataset)
## 'data.frame': 3541 obs. of 53 variables:
## $ Num : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Networth.Next.Year : num 8890.6 394.3 92.2 2.7 109 ...
## $ Total.assets : num 17512.3 941 232.8 2.7 478.5 ...
## $ Net.worth : num 7093.2 351.5 100.6 2.7 107.6 ...
## $ Total.income : num 24965 1527 477 NA 1580 ...
## $ Change.in.stock : num 235.8 42.7 -5.2 NA -17 ...
## $ Total.expenses : num 23658 1455 479 NA 1558 ...
## $ Profit.after.tax : num 1543.2 115.2 -6.6 NA 5.5 ...
## $ PBDITA : num 2860.2 283 5.8 NA 31 ...
## $ PBT : num 2417.2 188.4 -6.6 NA 6.3 ...
## $ Cash.profit : num 1872.8 158.6 0.3 NA 11.9 ...
## $ PBDITA.as...of.total.income : num 11.46 18.53 1.22 0 1.96 ...
## $ PBT.as...of.total.income : num 9.68 12.33 -1.38 0 0.4 ...
## $ PAT.as...of.total.income : num 6.18 7.54 -1.38 0 0.35 2.81 0 0.72 8.29 -2.88 ...
## $ Cash.profit.as...of.total.income : num 7.5 10.38 0.06 0 0.75 ...
## $ PAT.as...of.net.worth : num 23.78 38.08 -6.35 0 5.25 ...
## $ Sales : num 24458 1504 476 NA 1575 ...
## $ Income.from.financial.services : num 158 4 1.5 NA 3.9 6.4 NA NA 7.3 NA ...
## $ Other.income : num 297.2 15.9 0.2 NA 0.9 ...
## $ Total.capital : num 423.8 115.5 81.4 0.5 6.2 ...
## $ Reserves.and.funds : num 6822.8 257.8 19.2 2.2 161.8 ...
## $ Deposits..accepted.by.commercial.banks. : logi NA NA NA NA NA NA ...
## $ Borrowings : num 14.9 272.5 35.4 NA 193.1 ...
## $ Current.liabilities...provisions : num 9965.9 210 96.8 NA 112.8 ...
## $ Deferred.tax.liability : num 284.9 85.2 NA NA 4.6 ...
## $ Shareholders.funds : num 7093.2 351.5 100.6 2.7 107.6 ...
## $ Cumulative.retained.profits : num 6263.3 247.4 32.4 2.2 82.7 ...
## $ Capital.employed : num 7108.1 624 136 2.7 300.7 ...
## $ TOL.TNW : num 1.33 1.23 1.44 0 2.83 1.8 0.03 5.17 1.05 3.25 ...
## $ Total.term.liabilities...tangible.net.worth: num 0 0.34 0.29 0 1.59 0.37 0.03 0.94 0.3 0.54 ...
## $ Contingent.liabilities...Net.worth.... : num 14.8 19.2 45.8 0 34.9 ...
## $ Contingent.liabilities : num 1049.7 67.6 46.1 NA 37.6 ...
## $ Net.fixed.assets : num 1900.2 286.4 38.7 2.5 94.8 ...
## $ Investments : num 1069.6 2.2 4.3 NA 7.4 ...
## $ Current.assets : num 13277.5 563.9 167.5 0.2 349.7 ...
## $ Net.working.capital : num 3588.5 203.5 59.6 0.2 215.8 ...
## $ Quick.ratio..times. : num 1.18 0.95 1.11 NA 1.41 0.48 NA 0.54 0.59 0.39 ...
## $ Current.ratio..times. : num 1.37 1.56 1.55 NA 2.54 1.27 NA 1.15 1.58 0.5 ...
## $ Debt.to.equity.ratio..times. : num 0 0.78 0.35 0 1.79 1.09 0.32 2.31 0.94 3.13 ...
## $ Cash.to.current.liabilities..times. : num 0.43 0.06 0.21 NA 0 0.11 NA 0.04 0.19 0 ...
## $ Cash.to.average.cost.of.sales.per.day : num 68.21 5.96 17.07 NA 0 ...
## $ Creditors.turnover : num 3.62 9.8 5.28 0 13 ...
## $ Debtors.turnover : num 3.85 5.7 5.07 0 9.46 ...
## $ Finished.goods.turnover : num 200.55 14.21 9.24 NA 12.68 ...
## $ WIP.turnover : num 21.78 7.49 0.23 NA 7.9 ...
## $ Raw.material.turnover : num 7.71 11.46 NA 0 17.03 ...
## $ Shares.outstanding : num 42381675 11550000 8149090 52404 619635 ...
## $ Equity.face.value : num 10 10 10 10 10 10 10 NA 10 10 ...
## $ EPS : num 35.52 9.97 -0.5 0 7.91 ...
## $ Adjusted.EPS : num 7.1 9.97 -0.5 0 7.91 ...
## $ Total.liabilities : num 17512.3 941 232.8 2.7 478.5 ...
## $ PE.on.BSE : num 27.31 8.17 -5.76 NA NA ...
## $ Default : int 0 0 0 0 0 0 0 0 0 1 ...
# Understanding the Summary of the data loaded.
summary(dataset)
## Num Networth.Next.Year Total.assets Net.worth
## Min. : 1 Min. :-74265.6 Min. : 0.1 Min. : 0.0
## 1st Qu.: 886 1st Qu.: 31.7 1st Qu.: 91.3 1st Qu.: 31.3
## Median :1773 Median : 116.3 Median : 309.7 Median : 102.3
## Mean :1772 Mean : 1616.3 Mean : 3443.4 Mean : 1295.9
## 3rd Qu.:2658 3rd Qu.: 456.1 3rd Qu.: 1098.7 3rd Qu.: 377.3
## Max. :3545 Max. :805773.4 Max. :1176509.2 Max. :613151.6
##
## Total.income Change.in.stock Total.expenses
## Min. : 0.0 Min. :-3029.40 Min. : -0.1
## 1st Qu.: 106.5 1st Qu.: -1.80 1st Qu.: 95.8
## Median : 444.9 Median : 1.60 Median : 407.7
## Mean : 4582.8 Mean : 41.49 Mean : 4262.9
## 3rd Qu.: 1440.9 3rd Qu.: 18.05 3rd Qu.: 1359.8
## Max. :2442828.2 Max. :14185.50 Max. :2366035.3
## NA's :198 NA's :458 NA's :139
## Profit.after.tax PBDITA PBT
## Min. : -3908.30 Min. : -440.7 Min. : -3894.80
## 1st Qu.: 0.50 1st Qu.: 6.9 1st Qu.: 0.70
## Median : 8.80 Median : 35.4 Median : 12.40
## Mean : 277.36 Mean : 578.1 Mean : 383.81
## 3rd Qu.: 52.27 3rd Qu.: 150.2 3rd Qu.: 71.97
## Max. :119439.10 Max. :208576.5 Max. :145292.60
## NA's :131 NA's :131 NA's :131
## Cash.profit PBDITA.as...of.total.income PBT.as...of.total.income
## Min. : -2245.70 Min. :-6400.000 Min. :-21340.00
## 1st Qu.: 2.90 1st Qu.: 5.000 1st Qu.: 0.55
## Median : 18.85 Median : 9.660 Median : 3.31
## Mean : 392.07 Mean : 4.571 Mean : -17.28
## 3rd Qu.: 93.20 3rd Qu.: 16.390 3rd Qu.: 8.80
## Max. :176911.80 Max. : 100.000 Max. : 100.00
## NA's :131 NA's :68 NA's :68
## PAT.as...of.total.income Cash.profit.as...of.total.income
## Min. :-21340.00 Min. :-15020.000
## 1st Qu.: 0.35 1st Qu.: 2.020
## Median : 2.34 Median : 5.640
## Mean : -19.20 Mean : -8.229
## 3rd Qu.: 6.34 3rd Qu.: 10.700
## Max. : 150.00 Max. : 100.000
## NA's :68 NA's :68
## PAT.as...of.net.worth Sales Income.from.financial.services
## Min. :-748.72 Min. : 0.1 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 112.7 1st Qu.: 0.40
## Median : 7.92 Median : 453.1 Median : 1.80
## Mean : 10.27 Mean : 4549.5 Mean : 80.84
## 3rd Qu.: 20.19 3rd Qu.: 1433.5 3rd Qu.: 9.68
## Max. :2466.67 Max. :2384984.4 Max. :51938.20
## NA's :259 NA's :935
## Other.income Total.capital Reserves.and.funds
## Min. : 0.00 Min. : 0.1 Min. : -6525.9
## 1st Qu.: 0.40 1st Qu.: 13.1 1st Qu.: 5.0
## Median : 1.40 Median : 42.1 Median : 54.8
## Mean : 41.36 Mean : 216.6 Mean : 1163.8
## 3rd Qu.: 5.97 3rd Qu.: 100.3 3rd Qu.: 277.3
## Max. :42856.70 Max. :78273.2 Max. :625137.8
## NA's :1295 NA's :4 NA's :85
## Deposits..accepted.by.commercial.banks. Borrowings
## Mode:logical Min. : 0.10
## NA's:3541 1st Qu.: 23.95
## Median : 99.20
## Mean : 1122.28
## 3rd Qu.: 352.60
## Max. :278257.30
## NA's :366
## Current.liabilities...provisions Deferred.tax.liability
## Min. : 0.1 Min. : 0.1
## 1st Qu.: 17.8 1st Qu.: 3.2
## Median : 69.4 Median : 13.4
## Mean : 940.6 Mean : 227.2
## 3rd Qu.: 261.7 3rd Qu.: 50.0
## Max. :352240.3 Max. :72796.6
## NA's :96 NA's :1140
## Shareholders.funds Cumulative.retained.profits Capital.employed
## Min. : 0.0 Min. : -6534.3 Min. : 0.0
## 1st Qu.: 32.0 1st Qu.: 1.1 1st Qu.: 60.8
## Median : 105.6 Median : 37.1 Median : 214.7
## Mean : 1322.1 Mean : 890.5 Mean : 2328.3
## 3rd Qu.: 393.2 3rd Qu.: 202.3 3rd Qu.: 767.3
## Max. :613151.6 Max. :390133.8 Max. :891408.9
## NA's :38
## TOL.TNW Total.term.liabilities...tangible.net.worth
## Min. :-350.480 Min. :-325.600
## 1st Qu.: 0.600 1st Qu.: 0.050
## Median : 1.430 Median : 0.340
## Mean : 3.994 Mean : 1.844
## 3rd Qu.: 2.830 3rd Qu.: 1.000
## Max. : 473.000 Max. : 456.000
##
## Contingent.liabilities...Net.worth.... Contingent.liabilities
## Min. : 0.00 Min. : 0.1
## 1st Qu.: 0.00 1st Qu.: 6.3
## Median : 5.33 Median : 38.0
## Mean : 53.94 Mean : 932.9
## 3rd Qu.: 30.76 3rd Qu.: 192.7
## Max. :14704.27 Max. :559506.8
## NA's :1188
## Net.fixed.assets Investments Current.assets
## Min. : 0.0 Min. : 0.00 Min. : 0.1
## 1st Qu.: 26.0 1st Qu.: 1.00 1st Qu.: 36.2
## Median : 93.5 Median : 8.35 Median : 145.1
## Mean : 1189.7 Mean : 694.73 Mean : 1293.4
## 3rd Qu.: 344.9 3rd Qu.: 64.30 3rd Qu.: 502.2
## Max. :636604.6 Max. :199978.60 Max. :354815.2
## NA's :118 NA's :1435 NA's :66
## Net.working.capital Quick.ratio..times. Current.ratio..times.
## Min. :-63839.0 Min. : 0.000 Min. : 0.00
## 1st Qu.: -1.1 1st Qu.: 0.410 1st Qu.: 0.93
## Median : 16.2 Median : 0.670 Median : 1.23
## Mean : 138.6 Mean : 1.401 Mean : 2.13
## 3rd Qu.: 84.2 3rd Qu.: 1.030 3rd Qu.: 1.71
## Max. : 85782.8 Max. :341.000 Max. :505.00
## NA's :32 NA's :93 NA's :93
## Debt.to.equity.ratio..times. Cash.to.current.liabilities..times.
## Min. : 0.00 Min. : 0.0000
## 1st Qu.: 0.22 1st Qu.: 0.0200
## Median : 0.79 Median : 0.0700
## Mean : 2.78 Mean : 0.4904
## 3rd Qu.: 1.75 3rd Qu.: 0.1900
## Max. :456.00 Max. :165.0000
## NA's :93
## Cash.to.average.cost.of.sales.per.day Creditors.turnover
## Min. : 0.00 Min. : 0.000
## 1st Qu.: 2.79 1st Qu.: 3.700
## Median : 8.03 Median : 6.095
## Mean : 158.44 Mean : 15.446
## 3rd Qu.: 21.79 3rd Qu.: 11.490
## Max. :128040.76 Max. :2401.000
## NA's :85 NA's :333
## Debtors.turnover Finished.goods.turnover WIP.turnover
## Min. : 0.00 Min. : -0.09 Min. : -0.18
## 1st Qu.: 3.76 1st Qu.: 8.20 1st Qu.: 5.10
## Median : 6.32 Median : 17.27 Median : 9.76
## Mean : 17.04 Mean : 87.08 Mean : 27.93
## 3rd Qu.: 11.68 3rd Qu.: 40.35 3rd Qu.: 20.24
## Max. :3135.20 Max. :17947.60 Max. :5651.40
## NA's :328 NA's :740 NA's :640
## Raw.material.turnover Shares.outstanding Equity.face.value
## Min. : -2.00 Min. :-2.147e+09 Min. :-999999
## 1st Qu.: 2.99 1st Qu.: 1.316e+06 1st Qu.: 10
## Median : 6.40 Median : 4.672e+06 Median : 10
## Mean : 19.09 Mean : 2.207e+07 Mean : -1334
## 3rd Qu.: 11.85 3rd Qu.: 1.065e+07 3rd Qu.: 10
## Max. :21092.00 Max. : 4.130e+09 Max. : 100000
## NA's :361 NA's :692 NA's :692
## EPS Adjusted.EPS Total.liabilities
## Min. :-843181.8 Min. :-843181.8 Min. : 0.1
## 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.: 91.3
## Median : 1.4 Median : 1.2 Median : 309.7
## Mean : -220.3 Mean : -221.5 Mean : 3443.4
## 3rd Qu.: 9.6 3rd Qu.: 7.5 3rd Qu.: 1098.7
## Max. : 34522.5 Max. : 34522.5 Max. :1176509.2
##
## PE.on.BSE Default
## Min. :-1116.64 Min. :0.00000
## 1st Qu.: 3.27 1st Qu.:0.00000
## Median : 9.10 Median :0.00000
## Mean : 63.91 Mean :0.06608
## 3rd Qu.: 17.79 3rd Qu.:0.00000
## Max. :51002.74 Max. :1.00000
## NA's :2194
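The summary reports a mean of 0.066 for Default, meaning roughly 6.6% of the records are defaults. The sketch below shows one way to inspect that class balance directly and to turn a 0/1 outcome into a factor, which caret needs in order to treat the problem as classification rather than regression. The vector `default_flag` is made up and merely stands in for `dataset$Default`; the labels "No" and "Yes" are likewise an assumption.

```r
# Made-up outcome vector standing in for dataset$Default (0 = no default, 1 = default).
default_flag <- c(0, 0, 0, 1, 0, 0, 0, 0, 1, 0)

# Absolute and relative class frequencies.
class_counts <- table(default_flag)
class_share  <- prop.table(class_counts)
print(class_counts)
print(round(class_share, 2))

# caret treats a numeric outcome as regression, so a classification target
# is usually converted to a factor with readable level names.
default_factor <- factor(default_flag, levels = c(0, 1), labels = c("No", "Yes"))
levels(default_factor)
```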
We need to pre-process our data before we can use it for modeling. This involves the following steps:
• Missing Value Treatment
• Outlier Treatment
• Multicollinearity check
Let’s check if there are any missing values present in data.
# Checking for missing values available in data.
colSums(is.na(dataset))
## Num
## 0
## Networth.Next.Year
## 0
## Total.assets
## 0
## Net.worth
## 0
## Total.income
## 198
## Change.in.stock
## 458
## Total.expenses
## 139
## Profit.after.tax
## 131
## PBDITA
## 131
## PBT
## 131
## Cash.profit
## 131
## PBDITA.as...of.total.income
## 68
## PBT.as...of.total.income
## 68
## PAT.as...of.total.income
## 68
## Cash.profit.as...of.total.income
## 68
## PAT.as...of.net.worth
## 0
## Sales
## 259
## Income.from.financial.services
## 935
## Other.income
## 1295
## Total.capital
## 4
## Reserves.and.funds
## 85
## Deposits..accepted.by.commercial.banks.
## 3541
## Borrowings
## 366
## Current.liabilities...provisions
## 96
## Deferred.tax.liability
## 1140
## Shareholders.funds
## 0
## Cumulative.retained.profits
## 38
## Capital.employed
## 0
## TOL.TNW
## 0
## Total.term.liabilities...tangible.net.worth
## 0
## Contingent.liabilities...Net.worth....
## 0
## Contingent.liabilities
## 1188
## Net.fixed.assets
## 118
## Investments
## 1435
## Current.assets
## 66
## Net.working.capital
## 32
## Quick.ratio..times.
## 93
## Current.ratio..times.
## 93
## Debt.to.equity.ratio..times.
## 0
## Cash.to.current.liabilities..times.
## 93
## Cash.to.average.cost.of.sales.per.day
## 85
## Creditors.turnover
## 333
## Debtors.turnover
## 328
## Finished.goods.turnover
## 740
## WIP.turnover
## 640
## Raw.material.turnover
## 361
## Shares.outstanding
## 692
## Equity.face.value
## 692
## EPS
## 0
## Adjusted.EPS
## 0
## Total.liabilities
## 0
## PE.on.BSE
## 2194
## Default
## 0
We observe that several variables have missing values in more than 25% of the total records (that is, more than 885 of the 3541 rows). Imputing such variables can end up creating artificial data, which lowers accuracy in data modelling. Hence we’ll be eliminating those variables where more than 25% of the data is missing.
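The 25% rule can also be applied by computing the share of missing values per column instead of counting column positions by hand. The following is a minimal sketch on a made-up data frame; the name `toy` and its columns are hypothetical and only illustrate the idea.

```r
# Toy data frame standing in for the loan dataset (values are made up).
toy <- data.frame(
  id     = 1:8,
  income = c(10, NA, 30, 40, 50, 60, 70, 80),  # 1 of 8 missing (12.5%)
  other  = c(NA, NA, NA, 4, 5, 6, 7, 8),       # 3 of 8 missing (37.5%)
  empty  = rep(NA_real_, 8)                    # all missing (100%)
)

# Share of missing values per column.
na_share <- colMeans(is.na(toy))

# Names of the columns whose missing share exceeds the 25% cutoff.
threshold <- 0.25
drop_cols <- names(na_share)[na_share > threshold]
print(drop_cols)

# Keep only the columns at or below the cutoff.
toy_reduced <- toy[, !(names(toy) %in% drop_cols), drop = FALSE]
```

Selecting columns by name this way keeps working if columns are added or reordered, which hard-coded indices do not.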
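# Eliminating the row identifier Num (column 1) and variables with more than 25% missing values.
# Note: Other.income (column 19, 1295 NAs) also exceeds 25% but is not in this list, so it is kept and imputed below.
data <- dataset[,-c(1,22,25,18,32,34,52)]
# Imputing missing values for the remaining variables, excluding the target Default (column 46 of data).
# knnImpute also centers and scales the predictors it is applied to.
imputed <- preProcess(data[,-46],method = "knnImpute",k = 5)
imputed_val <- predict(imputed,data)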
# Checking for missing values on the Output data.
anyNA(imputed_val)
## [1] FALSE
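To see what this imputation step returns, here is a small sketch on a made-up data frame. The column names and values are invented, and `k = 3` is chosen only because the toy data is tiny. It also assumes the RANN package is installed, which caret appears to rely on for the nearest-neighbour search.

```r
library(caret)

# Made-up numeric data with a few missing values.
toy <- data.frame(
  a = c(1, 2, NA, 4, 5, 6, 7, 8),
  b = c(10, NA, 30, 40, 50, 60, 70, 80),
  c = c(5, 3, 6, 2, 8, 7, 1, 4)
)

# Learn the imputation (and the centering/scaling it implies) from the data.
pp <- preProcess(toy, method = "knnImpute", k = 3)

# Apply it: missing cells are filled from the k nearest complete rows.
toy_imputed <- predict(pp, toy)

anyNA(toy_imputed)                 # checks that no NA is left
round(colMeans(toy_imputed), 2)    # values are on a centered/scaled scale
```

Note that the result is on a standardized scale rather than in the original units, because `knnImpute` centers and scales the columns before searching for neighbours.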