Caret: A Complete Solution for Machine Learning in R

Tired of remembering too many different packages?

One of the biggest challenges beginners in data science face is deciding which algorithms to learn and focus on. In R, the problem is compounded because the same functionality can usually be achieved through several different packages. That choice is great, but it is also frustrating: each package was designed independently and has its own syntax, inputs and outputs. This can be too much for a beginner.

Here is a tip for handling everything, from exploring data to fitting complex machine learning algorithms and tuning their hyperparameters, under a single roof.

All this has been made possible by the years of effort that have gone into caret (Classification And REgression Training), possibly one of the biggest projects in R. This package alone covers most of what you need to know to solve almost any supervised machine learning problem. Not only does caret let you run a plethora of ML methods through one consistent interface, it also provides tools for auxiliary tasks such as:

• Data preparation (imputation, centering/scaling data, removing correlated predictors, reducing skewness)

• Data splitting

• Variable selection

• Model evaluation
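
To make the list above concrete, here is a minimal sketch that touches each of these areas in a few lines. It uses R's built-in iris data rather than the loan data from this guide, and the choice of k-nearest neighbours, the 80/20 split, the 0.9 correlation cutoff and 5-fold cross-validation are all illustrative assumptions, not recommendations.

```r
library(caret)

set.seed(42)  # make the random split and the resampling reproducible

# Data splitting: stratified 80/20 split on the outcome
idx       <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train_set <- iris[idx, ]
test_set  <- iris[-idx, ]

# Data preparation: learn centering/scaling on the training rows only,
# then apply the same transformation to both sets
pp      <- preProcess(train_set[, 1:4], method = c("center", "scale"))
train_x <- predict(pp, train_set[, 1:4])
test_x  <- predict(pp, test_set[, 1:4])

# Data preparation: column indices that findCorrelation() suggests removing
# because their pairwise correlation exceeds 0.9 (only inspected here)
high_cor <- findCorrelation(cor(train_x), cutoff = 0.9)

# Model training: k-nearest neighbours tuned with 5-fold cross-validation
fit <- train(x = train_x, y = train_set$Species,
             method = "knn",
             trControl = trainControl(method = "cv", number = 5))

# Model evaluation: confusion matrix on the held-out rows
pred <- predict(fit, newdata = test_x)
cm   <- confusionMatrix(pred, test_set$Species)
print(cm$overall["Accuracy"])
```

The same train() call accepts many other algorithms through its method argument, which is what lets one syntax stand in for many packages.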

Here is an end-to-end guide that showcases the power of a package that has it all.

In this problem statement, we have to predict the loan status of a borrower (the Default column) based on the borrower's profile. We’ll get started by loading the caret library and the loan default dataset, which is stored in my working directory.

# Loading the library (install it once beforehand with install.packages("caret")).
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
# Setting up the working Directory and Loading the Loan Default dataset.

setwd("D:/Great Learning/Finance and Risk Analytics")

dataset <- read.csv("raw-data.csv")

Once the data is available in the R environment, we perform a few exploratory checks to understand its structure and to confirm that it loaded correctly.

### Performing basic Exploratory Analysis

# Checking the class of the data. 
class(dataset)
## [1] "data.frame"
# Checking the dimension of data.
dim(dataset)
## [1] 3541   53
# Reading top 5 Rows.
head(dataset, n=5)
##   Num Networth.Next.Year Total.assets Net.worth Total.income
## 1   1             8890.6      17512.3    7093.2      24965.2
## 2   2              394.3        941.0     351.5       1527.4
## 3   3               92.2        232.8     100.6        477.3
## 4   4                2.7          2.7       2.7           NA
## 5   5              109.0        478.5     107.6       1580.5
##   Change.in.stock Total.expenses Profit.after.tax PBDITA    PBT
## 1           235.8        23657.8           1543.2 2860.2 2417.2
## 2            42.7         1454.9            115.2  283.0  188.4
## 3            -5.2          478.7             -6.6    5.8   -6.6
## 4              NA             NA               NA     NA     NA
## 5           -17.0         1558.0              5.5   31.0    6.3
##   Cash.profit PBDITA.as...of.total.income PBT.as...of.total.income
## 1      1872.8                       11.46                     9.68
## 2       158.6                       18.53                    12.33
## 3         0.3                        1.22                    -1.38
## 4          NA                        0.00                     0.00
## 5        11.9                        1.96                     0.40
##   PAT.as...of.total.income Cash.profit.as...of.total.income
## 1                     6.18                             7.50
## 2                     7.54                            10.38
## 3                    -1.38                             0.06
## 4                     0.00                             0.00
## 5                     0.35                             0.75
##   PAT.as...of.net.worth   Sales Income.from.financial.services
## 1                 23.78 24458.0                          158.0
## 2                 38.08  1504.3                            4.0
## 3                 -6.35   475.6                            1.5
## 4                  0.00      NA                             NA
## 5                  5.25  1575.1                            3.9
##   Other.income Total.capital Reserves.and.funds
## 1        297.2         423.8             6822.8
## 2         15.9         115.5              257.8
## 3          0.2          81.4               19.2
## 4           NA           0.5                2.2
## 5          0.9           6.2              161.8
##   Deposits..accepted.by.commercial.banks. Borrowings
## 1                                      NA       14.9
## 2                                      NA      272.5
## 3                                      NA       35.4
## 4                                      NA         NA
## 5                                      NA      193.1
##   Current.liabilities...provisions Deferred.tax.liability
## 1                           9965.9                  284.9
## 2                            210.0                   85.2
## 3                             96.8                     NA
## 4                               NA                     NA
## 5                            112.8                    4.6
##   Shareholders.funds Cumulative.retained.profits Capital.employed TOL.TNW
## 1             7093.2                      6263.3           7108.1    1.33
## 2              351.5                       247.4            624.0    1.23
## 3              100.6                        32.4            136.0    1.44
## 4                2.7                         2.2              2.7    0.00
## 5              107.6                        82.7            300.7    2.83
##   Total.term.liabilities...tangible.net.worth
## 1                                        0.00
## 2                                        0.34
## 3                                        0.29
## 4                                        0.00
## 5                                        1.59
##   Contingent.liabilities...Net.worth.... Contingent.liabilities
## 1                                  14.80                 1049.7
## 2                                  19.23                   67.6
## 3                                  45.83                   46.1
## 4                                   0.00                     NA
## 5                                  34.94                   37.6
##   Net.fixed.assets Investments Current.assets Net.working.capital
## 1           1900.2      1069.6        13277.5              3588.5
## 2            286.4         2.2          563.9               203.5
## 3             38.7         4.3          167.5                59.6
## 4              2.5          NA            0.2                 0.2
## 5             94.8         7.4          349.7               215.8
##   Quick.ratio..times. Current.ratio..times. Debt.to.equity.ratio..times.
## 1                1.18                  1.37                         0.00
## 2                0.95                  1.56                         0.78
## 3                1.11                  1.55                         0.35
## 4                  NA                    NA                         0.00
## 5                1.41                  2.54                         1.79
##   Cash.to.current.liabilities..times.
## 1                                0.43
## 2                                0.06
## 3                                0.21
## 4                                  NA
## 5                                0.00
##   Cash.to.average.cost.of.sales.per.day Creditors.turnover
## 1                                 68.21               3.62
## 2                                  5.96               9.80
## 3                                 17.07               5.28
## 4                                    NA               0.00
## 5                                  0.00              13.00
##   Debtors.turnover Finished.goods.turnover WIP.turnover
## 1             3.85                  200.55        21.78
## 2             5.70                   14.21         7.49
## 3             5.07                    9.24         0.23
## 4             0.00                      NA           NA
## 5             9.46                   12.68         7.90
##   Raw.material.turnover Shares.outstanding Equity.face.value   EPS
## 1                  7.71           42381675                10 35.52
## 2                 11.46           11550000                10  9.97
## 3                    NA            8149090                10 -0.50
## 4                  0.00              52404                10  0.00
## 5                 17.03             619635                10  7.91
##   Adjusted.EPS Total.liabilities PE.on.BSE Default
## 1         7.10           17512.3     27.31       0
## 2         9.97             941.0      8.17       0
## 3        -0.50             232.8     -5.76       0
## 4         0.00               2.7        NA       0
## 5         7.91             478.5        NA       0
# Reading bottom 5 Rows.
tail(dataset, n=5)
##       Num Networth.Next.Year Total.assets Net.worth Total.income
## 3537 3541              226.4        450.5     172.3        565.0
## 3538 3542               89.4         97.6      82.0         75.8
## 3539 3543              246.2        902.9     209.1       1005.1
## 3540 3544              146.9        177.0     137.2        371.0
## 3541 3545               -0.2          0.6       0.3           NA
##      Change.in.stock Total.expenses Profit.after.tax PBDITA   PBT
## 3537            30.5          581.1             14.4   76.7  41.1
## 3538            -4.0           66.5              5.3   11.1   6.2
## 3539             5.6          966.5             44.2  120.3  70.0
## 3540             3.9          348.9             26.0   50.5  40.8
## 3541              NA           17.4            -17.4  -17.4 -17.4
##      Cash.profit PBDITA.as...of.total.income PBT.as...of.total.income
## 3537        48.4                       13.58                     7.27
## 3538         9.2                       14.64                     8.18
## 3539        62.6                       11.97                     6.96
## 3540        33.6                       13.61                    11.00
## 3541       -17.4                          NA                       NA
##      PAT.as...of.total.income Cash.profit.as...of.total.income
## 3537                     2.55                             8.57
## 3538                     6.99                            12.14
## 3539                     4.40                             6.23
## 3540                     7.01                             9.06
## 3541                       NA                               NA
##      PAT.as...of.net.worth Sales Income.from.financial.services
## 3537                  8.71 564.5                            0.5
## 3538                  6.68  73.9                            1.7
## 3539                 22.77 995.9                            2.6
## 3540                 20.30 365.8                            3.3
## 3541               -193.33    NA                             NA
##      Other.income Total.capital Reserves.and.funds
## 3537           NA          89.0               85.5
## 3538           NA          38.6               48.4
## 3539          0.3          30.0              179.1
## 3540          1.6          50.9               86.3
## 3541           NA          28.3              -28.0
##      Deposits..accepted.by.commercial.banks. Borrowings
## 3537                                      NA      190.2
## 3538                                      NA        3.0
## 3539                                      NA      305.0
## 3540                                      NA        1.3
## 3541                                      NA         NA
##      Current.liabilities...provisions Deferred.tax.liability
## 3537                             42.5                   36.8
## 3538                              7.6                     NA
## 3539                            363.4                   25.4
## 3540                             21.1                   17.4
## 3541                              0.3                     NA
##      Shareholders.funds Cumulative.retained.profits Capital.employed
## 3537              172.3                        76.8            362.5
## 3538               87.0                        36.6             90.0
## 3539              209.1                       179.1            514.1
## 3540              137.2                        77.1            138.5
## 3541                0.3                       -28.0              0.3
##      TOL.TNW Total.term.liabilities...tangible.net.worth
## 3537    1.30                                        0.72
## 3538    0.12                                        0.02
## 3539    2.45                                        0.68
## 3540    0.10                                        0.01
## 3541    1.00                                        0.00
##      Contingent.liabilities...Net.worth.... Contingent.liabilities
## 3537                                   0.00                     NA
## 3538                                   5.12                    4.2
## 3539                                  93.45                  195.4
## 3540                                   6.20                    8.5
## 3541                                   0.00                     NA
##      Net.fixed.assets Investments Current.assets Net.working.capital
## 3537            227.0          NA          187.0                78.3
## 3538             21.9         6.8           55.8                47.2
## 3539            217.7        17.5          477.5               -49.5
## 3540             73.5          NA           80.8                59.7
## 3541               NA          NA            0.6                 0.3
##      Quick.ratio..times. Current.ratio..times.
## 3537                0.41                  1.71
## 3538                4.58                  6.49
## 3539                0.59                  0.91
## 3540                2.83                  3.83
## 3541                2.00                  2.00
##      Debt.to.equity.ratio..times. Cash.to.current.liabilities..times.
## 3537                         1.10                                0.07
## 3538                         0.10                                3.88
## 3539                         1.46                                0.05
## 3540                         0.01                                1.35
## 3541                         0.00                                2.00
##      Cash.to.average.cost.of.sales.per.day Creditors.turnover
## 3537                                  5.67              15.65
## 3538                                177.71              10.07
## 3539                                 11.05               3.96
## 3540                                 29.93              25.00
## 3541                               2190.00               0.00
##      Debtors.turnover Finished.goods.turnover WIP.turnover
## 3537            20.64                    8.66         5.14
## 3538            14.21                    5.13         4.17
## 3539             3.76                   33.03        11.68
## 3540            13.75                   49.00        47.03
## 3541             0.00                      NA           NA
##      Raw.material.turnover Shares.outstanding Equity.face.value   EPS
## 3537                 19.47           14904213                10  0.97
## 3538                  4.83            3362800                10  1.61
## 3539                  4.63            3000000                10 13.10
## 3540                 17.42            4422346                10  6.06
## 3541                  0.00            5220000                10 -0.02
##      Adjusted.EPS Total.liabilities PE.on.BSE Default
## 3537         0.97             450.5        NA       0
## 3538         1.61              97.6      2.49       0
## 3539        13.10             902.9     12.62       0
## 3540         6.06             177.0      4.07       0
## 3541        -0.02               0.6        NA       1
# Understanding the Structure of the data loaded. 
str(dataset)
## 'data.frame':    3541 obs. of  53 variables:
##  $ Num                                        : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Networth.Next.Year                         : num  8890.6 394.3 92.2 2.7 109 ...
##  $ Total.assets                               : num  17512.3 941 232.8 2.7 478.5 ...
##  $ Net.worth                                  : num  7093.2 351.5 100.6 2.7 107.6 ...
##  $ Total.income                               : num  24965 1527 477 NA 1580 ...
##  $ Change.in.stock                            : num  235.8 42.7 -5.2 NA -17 ...
##  $ Total.expenses                             : num  23658 1455 479 NA 1558 ...
##  $ Profit.after.tax                           : num  1543.2 115.2 -6.6 NA 5.5 ...
##  $ PBDITA                                     : num  2860.2 283 5.8 NA 31 ...
##  $ PBT                                        : num  2417.2 188.4 -6.6 NA 6.3 ...
##  $ Cash.profit                                : num  1872.8 158.6 0.3 NA 11.9 ...
##  $ PBDITA.as...of.total.income                : num  11.46 18.53 1.22 0 1.96 ...
##  $ PBT.as...of.total.income                   : num  9.68 12.33 -1.38 0 0.4 ...
##  $ PAT.as...of.total.income                   : num  6.18 7.54 -1.38 0 0.35 2.81 0 0.72 8.29 -2.88 ...
##  $ Cash.profit.as...of.total.income           : num  7.5 10.38 0.06 0 0.75 ...
##  $ PAT.as...of.net.worth                      : num  23.78 38.08 -6.35 0 5.25 ...
##  $ Sales                                      : num  24458 1504 476 NA 1575 ...
##  $ Income.from.financial.services             : num  158 4 1.5 NA 3.9 6.4 NA NA 7.3 NA ...
##  $ Other.income                               : num  297.2 15.9 0.2 NA 0.9 ...
##  $ Total.capital                              : num  423.8 115.5 81.4 0.5 6.2 ...
##  $ Reserves.and.funds                         : num  6822.8 257.8 19.2 2.2 161.8 ...
##  $ Deposits..accepted.by.commercial.banks.    : logi  NA NA NA NA NA NA ...
##  $ Borrowings                                 : num  14.9 272.5 35.4 NA 193.1 ...
##  $ Current.liabilities...provisions           : num  9965.9 210 96.8 NA 112.8 ...
##  $ Deferred.tax.liability                     : num  284.9 85.2 NA NA 4.6 ...
##  $ Shareholders.funds                         : num  7093.2 351.5 100.6 2.7 107.6 ...
##  $ Cumulative.retained.profits                : num  6263.3 247.4 32.4 2.2 82.7 ...
##  $ Capital.employed                           : num  7108.1 624 136 2.7 300.7 ...
##  $ TOL.TNW                                    : num  1.33 1.23 1.44 0 2.83 1.8 0.03 5.17 1.05 3.25 ...
##  $ Total.term.liabilities...tangible.net.worth: num  0 0.34 0.29 0 1.59 0.37 0.03 0.94 0.3 0.54 ...
##  $ Contingent.liabilities...Net.worth....     : num  14.8 19.2 45.8 0 34.9 ...
##  $ Contingent.liabilities                     : num  1049.7 67.6 46.1 NA 37.6 ...
##  $ Net.fixed.assets                           : num  1900.2 286.4 38.7 2.5 94.8 ...
##  $ Investments                                : num  1069.6 2.2 4.3 NA 7.4 ...
##  $ Current.assets                             : num  13277.5 563.9 167.5 0.2 349.7 ...
##  $ Net.working.capital                        : num  3588.5 203.5 59.6 0.2 215.8 ...
##  $ Quick.ratio..times.                        : num  1.18 0.95 1.11 NA 1.41 0.48 NA 0.54 0.59 0.39 ...
##  $ Current.ratio..times.                      : num  1.37 1.56 1.55 NA 2.54 1.27 NA 1.15 1.58 0.5 ...
##  $ Debt.to.equity.ratio..times.               : num  0 0.78 0.35 0 1.79 1.09 0.32 2.31 0.94 3.13 ...
##  $ Cash.to.current.liabilities..times.        : num  0.43 0.06 0.21 NA 0 0.11 NA 0.04 0.19 0 ...
##  $ Cash.to.average.cost.of.sales.per.day      : num  68.21 5.96 17.07 NA 0 ...
##  $ Creditors.turnover                         : num  3.62 9.8 5.28 0 13 ...
##  $ Debtors.turnover                           : num  3.85 5.7 5.07 0 9.46 ...
##  $ Finished.goods.turnover                    : num  200.55 14.21 9.24 NA 12.68 ...
##  $ WIP.turnover                               : num  21.78 7.49 0.23 NA 7.9 ...
##  $ Raw.material.turnover                      : num  7.71 11.46 NA 0 17.03 ...
##  $ Shares.outstanding                         : num  42381675 11550000 8149090 52404 619635 ...
##  $ Equity.face.value                          : num  10 10 10 10 10 10 10 NA 10 10 ...
##  $ EPS                                        : num  35.52 9.97 -0.5 0 7.91 ...
##  $ Adjusted.EPS                               : num  7.1 9.97 -0.5 0 7.91 ...
##  $ Total.liabilities                          : num  17512.3 941 232.8 2.7 478.5 ...
##  $ PE.on.BSE                                  : num  27.31 8.17 -5.76 NA NA ...
##  $ Default                                    : int  0 0 0 0 0 0 0 0 0 1 ...
# Understanding the Summary of the data loaded.
summary(dataset)
##       Num       Networth.Next.Year  Total.assets         Net.worth       
##  Min.   :   1   Min.   :-74265.6   Min.   :      0.1   Min.   :     0.0  
##  1st Qu.: 886   1st Qu.:    31.7   1st Qu.:     91.3   1st Qu.:    31.3  
##  Median :1773   Median :   116.3   Median :    309.7   Median :   102.3  
##  Mean   :1772   Mean   :  1616.3   Mean   :   3443.4   Mean   :  1295.9  
##  3rd Qu.:2658   3rd Qu.:   456.1   3rd Qu.:   1098.7   3rd Qu.:   377.3  
##  Max.   :3545   Max.   :805773.4   Max.   :1176509.2   Max.   :613151.6  
##                                                                          
##   Total.income       Change.in.stock    Total.expenses     
##  Min.   :      0.0   Min.   :-3029.40   Min.   :     -0.1  
##  1st Qu.:    106.5   1st Qu.:   -1.80   1st Qu.:     95.8  
##  Median :    444.9   Median :    1.60   Median :    407.7  
##  Mean   :   4582.8   Mean   :   41.49   Mean   :   4262.9  
##  3rd Qu.:   1440.9   3rd Qu.:   18.05   3rd Qu.:   1359.8  
##  Max.   :2442828.2   Max.   :14185.50   Max.   :2366035.3  
##  NA's   :198         NA's   :458        NA's   :139        
##  Profit.after.tax        PBDITA              PBT           
##  Min.   : -3908.30   Min.   :  -440.7   Min.   : -3894.80  
##  1st Qu.:     0.50   1st Qu.:     6.9   1st Qu.:     0.70  
##  Median :     8.80   Median :    35.4   Median :    12.40  
##  Mean   :   277.36   Mean   :   578.1   Mean   :   383.81  
##  3rd Qu.:    52.27   3rd Qu.:   150.2   3rd Qu.:    71.97  
##  Max.   :119439.10   Max.   :208576.5   Max.   :145292.60  
##  NA's   :131         NA's   :131        NA's   :131        
##   Cash.profit        PBDITA.as...of.total.income PBT.as...of.total.income
##  Min.   : -2245.70   Min.   :-6400.000           Min.   :-21340.00       
##  1st Qu.:     2.90   1st Qu.:    5.000           1st Qu.:     0.55       
##  Median :    18.85   Median :    9.660           Median :     3.31       
##  Mean   :   392.07   Mean   :    4.571           Mean   :   -17.28       
##  3rd Qu.:    93.20   3rd Qu.:   16.390           3rd Qu.:     8.80       
##  Max.   :176911.80   Max.   :  100.000           Max.   :   100.00       
##  NA's   :131         NA's   :68                  NA's   :68              
##  PAT.as...of.total.income Cash.profit.as...of.total.income
##  Min.   :-21340.00        Min.   :-15020.000              
##  1st Qu.:     0.35        1st Qu.:     2.020              
##  Median :     2.34        Median :     5.640              
##  Mean   :   -19.20        Mean   :    -8.229              
##  3rd Qu.:     6.34        3rd Qu.:    10.700              
##  Max.   :   150.00        Max.   :   100.000              
##  NA's   :68               NA's   :68                      
##  PAT.as...of.net.worth     Sales           Income.from.financial.services
##  Min.   :-748.72       Min.   :      0.1   Min.   :    0.00              
##  1st Qu.:   0.00       1st Qu.:    112.7   1st Qu.:    0.40              
##  Median :   7.92       Median :    453.1   Median :    1.80              
##  Mean   :  10.27       Mean   :   4549.5   Mean   :   80.84              
##  3rd Qu.:  20.19       3rd Qu.:   1433.5   3rd Qu.:    9.68              
##  Max.   :2466.67       Max.   :2384984.4   Max.   :51938.20              
##                        NA's   :259         NA's   :935                   
##   Other.income      Total.capital     Reserves.and.funds
##  Min.   :    0.00   Min.   :    0.1   Min.   : -6525.9  
##  1st Qu.:    0.40   1st Qu.:   13.1   1st Qu.:     5.0  
##  Median :    1.40   Median :   42.1   Median :    54.8  
##  Mean   :   41.36   Mean   :  216.6   Mean   :  1163.8  
##  3rd Qu.:    5.97   3rd Qu.:  100.3   3rd Qu.:   277.3  
##  Max.   :42856.70   Max.   :78273.2   Max.   :625137.8  
##  NA's   :1295       NA's   :4         NA's   :85        
##  Deposits..accepted.by.commercial.banks.   Borrowings       
##  Mode:logical                            Min.   :     0.10  
##  NA's:3541                               1st Qu.:    23.95  
##                                          Median :    99.20  
##                                          Mean   :  1122.28  
##                                          3rd Qu.:   352.60  
##                                          Max.   :278257.30  
##                                          NA's   :366        
##  Current.liabilities...provisions Deferred.tax.liability
##  Min.   :     0.1                 Min.   :    0.1       
##  1st Qu.:    17.8                 1st Qu.:    3.2       
##  Median :    69.4                 Median :   13.4       
##  Mean   :   940.6                 Mean   :  227.2       
##  3rd Qu.:   261.7                 3rd Qu.:   50.0       
##  Max.   :352240.3                 Max.   :72796.6       
##  NA's   :96                       NA's   :1140          
##  Shareholders.funds Cumulative.retained.profits Capital.employed  
##  Min.   :     0.0   Min.   : -6534.3            Min.   :     0.0  
##  1st Qu.:    32.0   1st Qu.:     1.1            1st Qu.:    60.8  
##  Median :   105.6   Median :    37.1            Median :   214.7  
##  Mean   :  1322.1   Mean   :   890.5            Mean   :  2328.3  
##  3rd Qu.:   393.2   3rd Qu.:   202.3            3rd Qu.:   767.3  
##  Max.   :613151.6   Max.   :390133.8            Max.   :891408.9  
##                     NA's   :38                                    
##     TOL.TNW         Total.term.liabilities...tangible.net.worth
##  Min.   :-350.480   Min.   :-325.600                           
##  1st Qu.:   0.600   1st Qu.:   0.050                           
##  Median :   1.430   Median :   0.340                           
##  Mean   :   3.994   Mean   :   1.844                           
##  3rd Qu.:   2.830   3rd Qu.:   1.000                           
##  Max.   : 473.000   Max.   : 456.000                           
##                                                                
##  Contingent.liabilities...Net.worth.... Contingent.liabilities
##  Min.   :    0.00                       Min.   :     0.1      
##  1st Qu.:    0.00                       1st Qu.:     6.3      
##  Median :    5.33                       Median :    38.0      
##  Mean   :   53.94                       Mean   :   932.9      
##  3rd Qu.:   30.76                       3rd Qu.:   192.7      
##  Max.   :14704.27                       Max.   :559506.8      
##                                         NA's   :1188          
##  Net.fixed.assets    Investments        Current.assets    
##  Min.   :     0.0   Min.   :     0.00   Min.   :     0.1  
##  1st Qu.:    26.0   1st Qu.:     1.00   1st Qu.:    36.2  
##  Median :    93.5   Median :     8.35   Median :   145.1  
##  Mean   :  1189.7   Mean   :   694.73   Mean   :  1293.4  
##  3rd Qu.:   344.9   3rd Qu.:    64.30   3rd Qu.:   502.2  
##  Max.   :636604.6   Max.   :199978.60   Max.   :354815.2  
##  NA's   :118        NA's   :1435        NA's   :66        
##  Net.working.capital Quick.ratio..times. Current.ratio..times.
##  Min.   :-63839.0    Min.   :  0.000     Min.   :  0.00       
##  1st Qu.:    -1.1    1st Qu.:  0.410     1st Qu.:  0.93       
##  Median :    16.2    Median :  0.670     Median :  1.23       
##  Mean   :   138.6    Mean   :  1.401     Mean   :  2.13       
##  3rd Qu.:    84.2    3rd Qu.:  1.030     3rd Qu.:  1.71       
##  Max.   : 85782.8    Max.   :341.000     Max.   :505.00       
##  NA's   :32          NA's   :93          NA's   :93           
##  Debt.to.equity.ratio..times. Cash.to.current.liabilities..times.
##  Min.   :  0.00               Min.   :  0.0000                   
##  1st Qu.:  0.22               1st Qu.:  0.0200                   
##  Median :  0.79               Median :  0.0700                   
##  Mean   :  2.78               Mean   :  0.4904                   
##  3rd Qu.:  1.75               3rd Qu.:  0.1900                   
##  Max.   :456.00               Max.   :165.0000                   
##                               NA's   :93                         
##  Cash.to.average.cost.of.sales.per.day Creditors.turnover
##  Min.   :     0.00                     Min.   :   0.000  
##  1st Qu.:     2.79                     1st Qu.:   3.700  
##  Median :     8.03                     Median :   6.095  
##  Mean   :   158.44                     Mean   :  15.446  
##  3rd Qu.:    21.79                     3rd Qu.:  11.490  
##  Max.   :128040.76                     Max.   :2401.000  
##  NA's   :85                            NA's   :333       
##  Debtors.turnover  Finished.goods.turnover  WIP.turnover    
##  Min.   :   0.00   Min.   :   -0.09        Min.   :  -0.18  
##  1st Qu.:   3.76   1st Qu.:    8.20        1st Qu.:   5.10  
##  Median :   6.32   Median :   17.27        Median :   9.76  
##  Mean   :  17.04   Mean   :   87.08        Mean   :  27.93  
##  3rd Qu.:  11.68   3rd Qu.:   40.35        3rd Qu.:  20.24  
##  Max.   :3135.20   Max.   :17947.60        Max.   :5651.40  
##  NA's   :328       NA's   :740             NA's   :640      
##  Raw.material.turnover Shares.outstanding   Equity.face.value
##  Min.   :   -2.00      Min.   :-2.147e+09   Min.   :-999999  
##  1st Qu.:    2.99      1st Qu.: 1.316e+06   1st Qu.:     10  
##  Median :    6.40      Median : 4.672e+06   Median :     10  
##  Mean   :   19.09      Mean   : 2.207e+07   Mean   :  -1334  
##  3rd Qu.:   11.85      3rd Qu.: 1.065e+07   3rd Qu.:     10  
##  Max.   :21092.00      Max.   : 4.130e+09   Max.   : 100000  
##  NA's   :361           NA's   :692          NA's   :692      
##       EPS             Adjusted.EPS       Total.liabilities  
##  Min.   :-843181.8   Min.   :-843181.8   Min.   :      0.1  
##  1st Qu.:      0.0   1st Qu.:      0.0   1st Qu.:     91.3  
##  Median :      1.4   Median :      1.2   Median :    309.7  
##  Mean   :   -220.3   Mean   :   -221.5   Mean   :   3443.4  
##  3rd Qu.:      9.6   3rd Qu.:      7.5   3rd Qu.:   1098.7  
##  Max.   :  34522.5   Max.   :  34522.5   Max.   :1176509.2  
##                                                             
##    PE.on.BSE           Default       
##  Min.   :-1116.64   Min.   :0.00000  
##  1st Qu.:    3.27   1st Qu.:0.00000  
##  Median :    9.10   Median :0.00000  
##  Mean   :   63.91   Mean   :0.06608  
##  3rd Qu.:   17.79   3rd Qu.:0.00000  
##  Max.   :51002.74   Max.   :1.00000  
##  NA's   :2194
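
The summary above is long, and the columns that need attention are easy to miss (for example, Deposits..accepted.by.commercial.banks. shows 3541 NA's out of 3541 rows). A more compact view can be sketched as below. The toy data frame and its column names are hypothetical stand-ins so that the snippet runs on its own; on the real data you would pass dataset instead.

```r
# Toy stand-in for the loan data (hypothetical columns and values)
toy <- data.frame(
  income   = c(120.5, NA, 87.0, 45.2, NA),
  deposits = c(NA, NA, NA, NA, NA),   # entirely missing, like the deposits column above
  default  = c(0, 0, 1, 0, 1)
)

# Share of missing values per column, largest first
na_share <- sort(colMeans(is.na(toy)), decreasing = TRUE)
print(round(na_share, 2))

# Columns with no observed values at all carry no information for a model
empty_cols <- names(na_share)[na_share == 1]
toy_clean  <- toy[, !(names(toy) %in% empty_cols), drop = FALSE]
str(toy_clean)
```

Sorting by the share of missing values puts the most problematic columns at the top, which is easier to scan than one long block of output per column.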

We need to pre-process our data before we can use it for modeling. Pre-processing involves the following steps:

• Missing Value Treatment

• Outlier Treatment

• Multicollinearity Check
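
As a rough illustration of what these three steps can look like in code, here is a small sketch on toy data. The data frame, the median imputation, the 5th/95th percentile capping and the 0.9 correlation cutoff are all assumptions chosen for the example; they are one possible treatment, not necessarily the one applied to the loan data.

```r
library(caret)

# Toy data with hypothetical columns: x2 is nearly a copy of x1,
# x1 and x2 each contain one extreme value, and x3 has missing entries
toy <- data.frame(
  x1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100),
  x2 = c(1.1, 2.0, 3.2, 3.9, 5.1, 6.0, 7.2, 7.9, 9.1, 99),
  x3 = c(5, NA, 2, 8, 1, NA, 7, 3, 9, 4)
)

# 1. Missing value treatment: replace NAs with the column median
imputer <- preProcess(toy, method = "medianImpute")
toy_imp <- predict(imputer, toy)

# 2. Outlier treatment: cap each column at its 5th and 95th percentiles
cap <- function(v, lower = 0.05, upper = 0.95) {
  q <- quantile(v, probs = c(lower, upper), na.rm = TRUE)
  pmin(pmax(v, q[[1]]), q[[2]])
}
toy_cap <- as.data.frame(lapply(toy_imp, cap))

# 3. Multicollinearity check: columns that findCorrelation() suggests dropping
#    because their pairwise correlation exceeds 0.9
drop_idx <- findCorrelation(cor(toy_cap), cutoff = 0.9)
print(names(toy_cap)[drop_idx])
```

Capping keeps every row while limiting the influence of extreme values; dropping rows or transforming the variable are alternatives with different trade-offs.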

Let’s check whether there are any missing values in the data.

# Counting the missing values in each column.

colSums(is.na(dataset))
##                                         Num 
##                                           0 
##                          Networth.Next.Year 
##                                           0 
##                                Total.assets 
##                                           0 
##                                   Net.worth 
##                                           0 
##                                Total.income 
##                                         198 
##                             Change.in.stock 
##                                         458 
##                              Total.expenses 
##                                         139 
##                            Profit.after.tax 
##                                         131 
##                                      PBDITA 
##                                         131 
##                                         PBT 
##                                         131 
##                                 Cash.profit 
##                                         131 
##                 PBDITA.as...of.total.income 
##                                          68 
##                    PBT.as...of.total.income 
##                                          68 
##                    PAT.as...of.total.income 
##                                          68 
##            Cash.profit.as...of.total.income 
##                                          68 
##                       PAT.as...of.net.worth 
##                                           0 
##                                       Sales 
##                                         259 
##              Income.from.financial.services 
##                                         935 
##                                Other.income 
##                                        1295 
##                               Total.capital 
##                                           4 
##                          Reserves.and.funds 
##                                          85 
##     Deposits..accepted.by.commercial.banks. 
##                                        3541 
##                                  Borrowings 
##                                         366 
##            Current.liabilities...provisions 
##                                          96 
##                      Deferred.tax.liability 
##                                        1140 
##                          Shareholders.funds 
##                                           0 
##                 Cumulative.retained.profits 
##                                          38 
##                            Capital.employed 
##                                           0 
##                                     TOL.TNW 
##                                           0 
## Total.term.liabilities...tangible.net.worth 
##                                           0 
##      Contingent.liabilities...Net.worth.... 
##                                           0 
##                      Contingent.liabilities 
##                                        1188 
##                            Net.fixed.assets 
##                                         118 
##                                 Investments 
##                                        1435 
##                              Current.assets 
##                                          66 
##                         Net.working.capital 
##                                          32 
##                         Quick.ratio..times. 
##                                          93 
##                       Current.ratio..times. 
##                                          93 
##                Debt.to.equity.ratio..times. 
##                                           0 
##         Cash.to.current.liabilities..times. 
##                                          93 
##       Cash.to.average.cost.of.sales.per.day 
##                                          85 
##                          Creditors.turnover 
##                                         333 
##                            Debtors.turnover 
##                                         328 
##                     Finished.goods.turnover 
##                                         740 
##                                WIP.turnover 
##                                         640 
##                       Raw.material.turnover 
##                                         361 
##                          Shares.outstanding 
##                                         692 
##                           Equity.face.value 
##                                         692 
##                                         EPS 
##                                           0 
##                                Adjusted.EPS 
##                                           0 
##                           Total.liabilities 
##                                           0 
##                                   PE.on.BSE 
##                                        2194 
##                                     Default 
##                                           0

We observe that several variables have missing values in more than 25% of the total records. Imputing such variables can end up creating artificial data and lowering the accuracy of the model. Hence we’ll eliminate the variables where more than 25% of the data is missing.
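
Hard-coded column positions are easy to get wrong, so the 25% rule can also be applied by computing the share of missing values per column and dropping columns by name. The following is a minimal sketch in base R on a made-up data frame; `toy`, its column names, and its values are illustrative assumptions, not part of the Loan Default dataset.

```r
# Toy stand-in for `dataset`; column names and values are hypothetical.
toy <- data.frame(
  id       = 1:8,
  sales    = c(10, NA, 12, 14, 9, 11, 13, 15),
  deposits = c(NA, NA, NA, NA, NA, NA, 1, NA),
  other    = c(1, NA, NA, 4, NA, 6, 7, 8),
  default  = c(0, 0, 1, 0, 0, 0, 1, 0)
)

# Fraction of missing values in each column (mean of a logical = proportion TRUE).
na_share <- colMeans(is.na(toy))

# Names of the columns that exceed the 25% threshold.
drop_cols <- names(na_share)[na_share > 0.25]

# Keep every column that is not on the drop list.
toy_kept <- toy[, !(names(toy) %in% drop_cols), drop = FALSE]

print(round(na_share, 3))
print(drop_cols)
print(names(toy_kept))
```

Selecting by name keeps the filter in step with the threshold if the data or the cut-off changes, whereas a list of positions has to be maintained by hand.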

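# Eliminating variables with more than 25% missing values.
# Column 1 (Num) has no missing values but is dropped here as well.
# Note: Other.income (column 19, 1295 NAs) is not in this list, so it is kept and imputed below.
data <- dataset[, -c(1, 22, 25, 18, 32, 34, 52)]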

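# Imputing missing values for the remaining variables.
# Column 46 of `data` is the target (Default), so it is left out when fitting the imputer.
# Note: "knnImpute" also centers and scales the predictors.
imputed <- preProcess(data[, -46], method = "knnImpute", k = 5)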

imputed_val <- predict(imputed,data)

# Checking for missing values in the output data.
anyNA(imputed_val)
## [1] FALSE
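
One detail worth keeping in mind is that `preProcess()` with `method = "knnImpute"` centers and scales the predictors as part of the imputation, so the imputed columns come back standardized and not in their original units. The sketch below illustrates this on a small simulated data frame and shows one way to map the values back using the `mean` and `std` components stored in the fitted object. The data frame `toy` and the helper names are assumptions made for illustration, and depending on the caret version the RANN package may also need to be installed.

```r
library(caret)

set.seed(42)

# Simulated predictors on very different scales (hypothetical data).
toy <- data.frame(
  a = rnorm(30, mean = 100, sd = 15),
  b = rnorm(30, mean = 5,   sd = 2),
  c = rnorm(30, mean = -3,  sd = 1)
)
toy$a[c(3, 17)] <- NA
toy$b[8] <- NA

# Fit the k-nearest-neighbour imputer and apply it to the same data.
pp      <- preProcess(toy, method = "knnImpute", k = 5)
toy_imp <- predict(pp, toy)

# No missing values remain, and the columns are now standardized.
print(anyNA(toy_imp))
print(round(colMeans(toy_imp), 2))

# Undo the centering and scaling: x * sd + mean, column by column.
cols     <- names(toy_imp)
toy_back <- as.data.frame(Map(
  function(x, s, m) x * s + m,
  toy_imp, pp$std[cols], pp$mean[cols]
))

# Column `c` had no missing values, so it should match the original.
print(all.equal(toy_back$c, toy$c))
```

Tree-based models are largely insensitive to this rescaling, but it matters whenever the imputed values are reported or compared in their original units.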