Deepak Krishnan
[READING AND PREPARING DATA]
[1] 8995 17
'data.frame': 8995 obs. of 17 variables:
$ CandidateRef : int 2110407 2112635 2112838 2115021 2115125 2117167 2119124 2127572 2138169 2143362 ...
$ DOJExtended : Factor w/ 2 levels "No","Yes": 2 1 1 1 2 2 2 2 1 1 ...
$ DurationToAcceptOffer : int 14 18 3 26 1 17 37 16 1 6 ...
$ NoticePeriod : int 30 30 45 30 120 30 30 0 30 30 ...
$ OfferedBand : Factor w/ 4 levels "E0","E1","E2",..: 3 3 3 3 3 2 3 2 2 2 ...
$ PercentHikeExpectedInCTC: num -20.8 50 42.8 42.8 42.6 ...
$ PercentHikeOfferedInCTC : num 13.2 320 42.8 42.8 42.6 ...
$ PercentDifferenceCTC : num 42.9 180 0 0 0 ...
$ JoiningBonus : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
$ CandidateRelocateActual : Factor w/ 2 levels "No","Yes": 1 1 1 1 2 1 1 1 1 1 ...
$ Gender : Factor w/ 2 levels "Female","Male": 1 2 2 2 2 2 2 1 1 2 ...
$ CandidateSource : Factor w/ 3 levels "Agency","Direct",..: 1 3 1 3 3 3 3 2 3 3 ...
$ RexInYrs : int 7 8 4 4 6 2 7 8 3 3 ...
$ LOB : Factor w/ 9 levels "AXON","BFSI",..: 5 8 8 8 8 8 8 7 2 3 ...
$ Location : Factor w/ 11 levels "Ahmedabad","Bangalore",..: 9 3 9 9 9 9 9 9 5 3 ...
$ Age : int 34 34 27 34 34 34 32 34 26 34 ...
$ Status : Factor w/ 2 levels "Joined","Not Joined": 1 1 1 1 1 1 1 1 1 1 ...
Descriptive Statistics
vars n mean sd median
CandidateRef 1 8995 2843647.38 486344.77 2807482
DOJExtended* 2 8995 1.47 0.50 1
DurationToAcceptOffer 3 8995 21.43 25.81 10
NoticePeriod 4 8995 39.29 22.22 30
OfferedBand* 5 8995 2.39 0.63 2
PercentHikeExpectedInCTC 6 8995 43.86 29.79 40
PercentHikeOfferedInCTC 7 8995 40.66 36.06 36
PercentDifferenceCTC 8 8995 -1.57 19.61 0
JoiningBonus* 9 8995 1.05 0.21 1
CandidateRelocateActual* 10 8995 1.14 0.35 1
Gender* 11 8995 1.83 0.38 2
CandidateSource* 12 8995 1.89 0.67 2
RexInYrs 13 8995 4.24 2.55 4
LOB* 14 8995 5.18 2.38 5
Location* 15 8995 4.94 3.00 3
Age 16 8995 29.91 4.10 29
Status* 17 8995 1.19 0.39 1
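The asterisk after a variable name in the table above marks a factor that was summarised through its integer level codes. This matches the convention of describe() in the psych package, which is an assumption here because the call itself is not shown. A minimal base R sketch, using the Joined / Not Joined counts reported in the contingency tables of this report, illustrates how the Status* mean of 1.19 arises:

```r
# Rebuild the Status factor from the reported counts
# (7313 Joined, 1682 Not Joined); row order does not affect the mean.
status <- factor(rep(c("Joined", "Not Joined"), times = c(7313, 1682)))

# Factors are stored as integer codes: 1 = "Joined", 2 = "Not Joined"
statusCodes <- as.integer(status)

# Mean of the codes equals 1 + proportion of "Not Joined"
meanCode <- mean(statusCodes)
round(meanCode, 2)

# Share of "Not Joined" (in percent) recovered from the mean code
round(100 * (meanCode - 1), 1)
```

Read this way, the means of the starred rows are proportions in disguise for two-level factors, and have no direct interpretation for factors with more than two unordered levels such as LOB or Location.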
[ONE-WAY, TWO-WAY AND THREE-WAY CONTINGENCY TABLES]
Status
Joined Not Joined
81.3 18.7
Status
DOJExtended Joined Not Joined
No 81.08 18.92
Yes 81.55 18.45
Sum 162.63 37.37
Status
NoticePeriod Joined Not Joined Sum
0 726 51 777
30 4393 765 5158
45 397 129 526
60 1285 470 1755
75 75 35 110
90 415 212 627
120 22 20 42
Sum 7313 1682 8995
Status
NoticePeriod Joined Not Joined
0 93.44 6.56
30 85.17 14.83
45 75.48 24.52
60 73.22 26.78
75 68.18 31.82
90 66.19 33.81
120 52.38 47.62
Sum 514.06 185.94
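The percentage tables in this section are row percentages: each row sums to 100. The "Sum" row therefore adds percentages that belong to different groups and is not itself a percentage. A minimal sketch, assuming the tables were produced with table(), prop.table() and addmargins() (the original calls are not shown), rebuilds the NoticePeriod table from its counts and adds the margin over columns instead:

```r
# Counts copied from the NoticePeriod-by-Status table above
counts <- matrix(
  c(726, 51, 4393, 765, 397, 129, 1285, 470, 75, 35, 415, 212, 22, 20),
  ncol = 2, byrow = TRUE,
  dimnames = list(NoticePeriod = c("0", "30", "45", "60", "75", "90", "120"),
                  Status = c("Joined", "Not Joined"))
)
noticeTable <- as.table(counts)

# Row percentages: within each notice period, share of Joined / Not Joined
rowPct <- round(100 * prop.table(noticeTable, margin = 1), 2)
rowPct

# margin = 2 appends a "Sum" column (about 100 per row, up to rounding)
# rather than a "Sum" row that mixes percentages from different rows
addmargins(rowPct, margin = 2)
```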
Status
JoiningBonus Joined Not Joined
No 81.34 18.66
Yes 80.58 19.42
Sum 161.92 38.08
Status
Gender Joined Not Joined
Female 82.40 17.60
Male 81.07 18.93
Sum 163.47 36.53
Status
CandidateSource Joined Not Joined
Agency 75.82 24.18
Direct 82.00 18.00
Employee Referral 88.00 12.00
Sum 245.82 54.18
Status
OfferedBand Joined Not Joined
E0 76.30 23.70
E1 81.30 18.70
E2 80.97 19.03
E3 85.15 14.85
Sum 323.72 76.28
Status
LOB Joined Not Joined
AXON 77.46 22.54
BFSI 75.86 24.14
CSMP 81.52 18.48
EAS 73.41 26.59
ERS 78.11 21.89
ETS 83.07 16.93
Healthcare 82.26 17.74
INFRA 87.79 12.21
MMS 100.00 0.00
Sum 739.48 160.52
Status
Location Joined Not Joined Sum
Ahmedabad 5 1 6
Bangalore 1742 488 2230
Chennai 2486 664 3150
Cochin 7 1 8
Gurgaon 118 28 146
Hyderabad 266 75 341
Kolkata 100 29 129
Mumbai 176 21 197
Noida 2362 365 2727
Others 13 0 13
Pune 38 10 48
Sum 7313 1682 8995
Status
Location Joined Not Joined
Ahmedabad 83.33 16.67
Bangalore 78.12 21.88
Chennai 78.92 21.08
Cochin 87.50 12.50
Gurgaon 80.82 19.18
Hyderabad 78.01 21.99
Kolkata 77.52 22.48
Mumbai 89.34 10.66
Noida 86.62 13.38
Others 100.00 0.00
Pune 79.17 20.83
Sum 919.35 180.65
[SUMMARY TABLES]
Status AverageAgeofCandidates
1 Joined 30.00
2 Not Joined 29.52
Status AverageAgeofCandidates AverageNoticePeriod
1 Joined 30.00 37.24
2 Not Joined 29.52 48.19
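As a consistency check, the group means in the table above can be combined into the overall means reported under Descriptive Statistics. The sketch below uses only numbers printed in this report; the variable names are illustrative and not taken from the original script.

```r
# Group sizes from the Status margins (7313 Joined, 1682 Not Joined)
groupSizes <- c(Joined = 7313, `Not Joined` = 1682)

# Group means copied from the summary table above
meanAge <- c(Joined = 30.00, `Not Joined` = 29.52)
meanNotice <- c(Joined = 37.24, `Not Joined` = 48.19)

# Size-weighted averages of the group means recover the overall means
overallAge <- weighted.mean(meanAge, w = groupSizes)
overallNotice <- weighted.mean(meanNotice, w = groupSizes)
round(c(Age = overallAge, NoticePeriod = overallNotice), 2)
```

Both values agree with the overall means of Age (29.91) and NoticePeriod (39.29) in the descriptive statistics, up to the rounding of the group means.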
Status Gender AverageAgeofCandidates
1 Joined Female 29.09
2 Not Joined Female 28.02
3 Joined Male 30.20
4 Not Joined Male 29.81
Status Gender AverageAgeofCandidates YearsOfExperience
1 Joined Female 29.09 3.45
2 Not Joined Female 28.02 3.61
3 Joined Male 30.20 4.35
4 Not Joined Male 29.81 4.60
Status Gender AverageAgeofCandidates YearsOfExperience DurationToAcceptOffer
1 Joined Female 29.09 3.45 19.43
2 Not Joined Female 28.02 3.61 24.10
3 Joined Male 30.20 4.35 20.88
4 Not Joined Male 29.81 4.60 25.12
Status Gender AverageAgeofCandidates NoticePeriod
1 Joined Female 29.09 35.48
2 Not Joined Female 28.02 46.65
3 Joined Male 30.20 37.62
4 Not Joined Male 29.81 48.49
[PIE CHART, BARPLOT]
library(caTools)
# use set.seed to use the same random number sequence
set.seed(123)
# creating 75% data for training
default.df <- hr_data
split <- sample.split(default.df$Status, SplitRatio = 0.75)
trainData <- subset(default.df, split == TRUE)
# dimensions of training data
dim(trainData)
[1] 6747 17
# creating 25% data for testing
testData <- subset(default.df, split == FALSE)
# dimensions of testing data
dim(testData)
[1] 2248 17
# fit logistic classifier 1
logitClassifier1 <- glm(Status ~ CandidateRef + DOJExtended + DurationToAcceptOffer + NoticePeriod + OfferedBand + PercentHikeExpectedInCTC + PercentHikeOfferedInCTC + PercentDifferenceCTC + JoiningBonus + CandidateRelocateActual + Gender + CandidateSource + RexInYrs + LOB + Location + Age,
data = trainData,
family = binomial())
# summary of the classifier 1
summary(logitClassifier1)
Call:
glm(formula = Status ~ CandidateRef + DOJExtended + DurationToAcceptOffer +
NoticePeriod + OfferedBand + PercentHikeExpectedInCTC + PercentHikeOfferedInCTC +
PercentDifferenceCTC + JoiningBonus + CandidateRelocateActual +
Gender + CandidateSource + RexInYrs + LOB + Location + Age,
family = binomial(), data = trainData)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.50288 -0.69671 -0.51264 -0.00012 2.76304
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.550e+01 2.536e+03 -0.006 0.995125
CandidateRef 1.530e-07 8.198e-08 1.867 0.061937 .
DOJExtendedYes -1.843e-01 7.447e-02 -2.474 0.013349 *
DurationToAcceptOffer -4.905e-04 1.370e-03 -0.358 0.720333
NoticePeriod 2.194e-02 1.632e-03 13.443 < 2e-16 ***
OfferedBandE1 -1.401e+00 2.245e-01 -6.240 4.38e-10 ***
OfferedBandE2 -1.319e+00 2.448e-01 -5.390 7.05e-08 ***
OfferedBandE3 -1.718e+00 3.208e-01 -5.357 8.45e-08 ***
PercentHikeExpectedInCTC 4.027e-03 4.291e-03 0.938 0.348012
PercentHikeOfferedInCTC -5.880e-03 4.378e-03 -1.343 0.179244
PercentDifferenceCTC 4.082e-03 5.846e-03 0.698 0.485006
JoiningBonusYes 1.997e-01 1.722e-01 1.160 0.246159
CandidateRelocateActualYes -1.731e+01 1.990e+02 -0.087 0.930701
GenderMale 2.005e-01 9.098e-02 2.204 0.027519 *
CandidateSourceDirect -3.033e-01 7.804e-02 -3.886 0.000102 ***
CandidateSourceEmployee Referral -7.207e-01 1.139e-01 -6.329 2.47e-10 ***
RexInYrs 5.114e-02 2.341e-02 2.185 0.028884 *
LOBBFSI -5.010e-01 1.657e-01 -3.023 0.002503 **
LOBCSMP -3.888e-01 1.895e-01 -2.052 0.040168 *
LOBEAS 2.146e-01 2.075e-01 1.035 0.300820
LOBERS -2.959e-01 1.571e-01 -1.884 0.059604 .
LOBETS -6.107e-01 1.862e-01 -3.280 0.001039 **
LOBHealthcare -5.626e-01 3.150e-01 -1.786 0.074082 .
LOBINFRA -7.909e-01 1.688e-01 -4.685 2.80e-06 ***
LOBMMS -1.773e+01 2.099e+03 -0.008 0.993261
LocationBangalore 1.584e+01 2.536e+03 0.006 0.995016
LocationChennai 1.600e+01 2.536e+03 0.006 0.994966
LocationCochin -7.192e-01 3.676e+03 0.000 0.999844
LocationGurgaon 1.605e+01 2.536e+03 0.006 0.994950
LocationHyderabad 1.566e+01 2.536e+03 0.006 0.995072
LocationKolkata 1.589e+01 2.536e+03 0.006 0.995002
LocationMumbai 1.573e+01 2.536e+03 0.006 0.995052
LocationNoida 1.564e+01 2.536e+03 0.006 0.995081
LocationOthers -5.605e-01 3.320e+03 0.000 0.999865
LocationPune 1.621e+01 2.536e+03 0.006 0.994900
Age -3.783e-02 1.149e-02 -3.293 0.000990 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 6502.9 on 6746 degrees of freedom
Residual deviance: 5599.4 on 6711 degrees of freedom
AIC: 5671.4
Number of Fisher Scoring iterations: 17
# fit logistic classifier 1a with a reduced set of predictors
logitClassifier1a <- glm(Status ~ DOJExtended + NoticePeriod + OfferedBand + Gender + CandidateSource + RexInYrs + LOB + Age,
data = trainData,
family = binomial())
# summary of the classifier 1a
summary(logitClassifier1a)
Call:
glm(formula = Status ~ DOJExtended + NoticePeriod + OfferedBand +
Gender + CandidateSource + RexInYrs + LOB + Age, family = binomial(),
data = trainData)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.3919 -0.6751 -0.5370 -0.3653 2.7069
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.34836 0.36445 0.956 0.33915
DOJExtendedYes -0.21356 0.06849 -3.118 0.00182 **
NoticePeriod 0.02094 0.00147 14.250 < 2e-16 ***
OfferedBandE1 -1.22692 0.21459 -5.717 1.08e-08 ***
OfferedBandE2 -1.16601 0.23457 -4.971 6.66e-07 ***
OfferedBandE3 -1.64026 0.30794 -5.327 1.00e-07 ***
GenderMale 0.11714 0.08896 1.317 0.18793
CandidateSourceDirect -0.29907 0.07505 -3.985 6.76e-05 ***
CandidateSourceEmployee Referral -0.76459 0.11104 -6.886 5.74e-12 ***
RexInYrs 0.07184 0.02206 3.256 0.00113 **
LOBBFSI -0.07745 0.14657 -0.528 0.59721
LOBCSMP -0.20099 0.17830 -1.127 0.25963
LOBEAS 0.36480 0.18849 1.935 0.05294 .
LOBERS -0.08401 0.13947 -0.602 0.54694
LOBETS -0.41182 0.17377 -2.370 0.01779 *
LOBHealthcare -0.18137 0.30411 -0.596 0.55092
LOBINFRA -0.60664 0.15362 -3.949 7.85e-05 ***
LOBMMS -12.33515 172.93170 -0.071 0.94314
Age -0.04397 0.01052 -4.181 2.90e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 6502.9 on 6746 degrees of freedom
Residual deviance: 6061.9 on 6728 degrees of freedom
AIC: 6099.9
Number of Fisher Scoring iterations: 12
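The coefficients above are on the log-odds scale. Because Status is a factor with levels "Joined" and "Not Joined", glm() with a binomial family models the probability of the second level, so a positive coefficient raises the odds of not joining. A minimal sketch, with a few estimates copied by hand from the summary above (the vector name and its element names are illustrative), converts them to odds ratios:

```r
# Selected estimates copied from the classifier 1a summary (log-odds scale)
coefs <- c(NoticePeriod = 0.02094,
           DOJExtendedYes = -0.21356,
           CandidateSourceEmployeeReferral = -0.76459,
           Age = -0.04397)

# Odds ratios for a one-unit change in each predictor
oddsRatios <- exp(coefs)
round(oddsRatios, 3)

# Odds ratio for a notice period that is 30 days longer
oddsRatio30Days <- exp(30 * coefs[["NoticePeriod"]])
round(oddsRatio30Days, 2)
```

With the fitted object available, exp(coef(logitClassifier1a)) would give the same quantities for all terms. The very large standard error for LOBMMS is consistent with the contingency table showing that every MMS candidate joined, so that estimate should not be interpreted as an odds ratio.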
# prediction using classifier 1a (testData[-1] drops the CandidateRef column)
predProbClass1 <- predict(logitClassifier1a, type = 'response', newdata = testData[-1])
yPred1 <- ifelse(predProbClass1 > 0.5, "Not Joined", "Joined")
table(yPred1)
yPred1
Joined Not Joined
2221 27
# confusion matrix using classifier 1a
confMatrix1 <- table(yActual = testData[, 17], yPred1)
confMatrix1
yPred1
yActual Joined Not Joined
Joined 1814 14
Not Joined 407 13
# confusion matrix using classifier 1a (MLmetrics)
library(MLmetrics)
ConfusionMatrix(y_pred = yPred1, y_true = testData[, 17])
y_pred
y_true Joined Not Joined
Joined 1814 14
Not Joined 407 13
# accuracy using classifier 1a
library(MLmetrics)
Accuracy(y_pred = yPred1, y_true = testData$Status)
[1] 0.8127224
# sensitivity using classifier 1a
library(MLmetrics)
Sensitivity(y_true = testData$Status, y_pred = yPred1, positive = "Joined")
[1] 0.9923414
# specificity using classifier 1a
library(MLmetrics)
Specificity(y_true = testData$Status, y_pred = yPred1, positive = "Joined")
[1] 0.03095238
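The three metrics above follow directly from the confusion matrix. The sketch below recomputes them in base R from the printed counts, with "Joined" as the positive class as in the MLmetrics calls, and adds a simple baseline for comparison (the baseline is an addition for illustration, not part of the original analysis):

```r
# Confusion matrix copied from the output above (rows = actual, columns = predicted)
confMatrix <- matrix(c(1814, 14, 407, 13), nrow = 2, byrow = TRUE,
                     dimnames = list(yActual = c("Joined", "Not Joined"),
                                     yPred = c("Joined", "Not Joined")))

# Cells with "Joined" treated as the positive class
TP <- confMatrix["Joined", "Joined"]          # joiners predicted to join
FN <- confMatrix["Joined", "Not Joined"]      # joiners predicted not to join
FP <- confMatrix["Not Joined", "Joined"]      # non-joiners predicted to join
TN <- confMatrix["Not Joined", "Not Joined"]  # non-joiners predicted not to join

accuracy <- (TP + TN) / sum(confMatrix)
sensitivity <- TP / (TP + FN)
specificity <- TN / (TN + FP)

# Baseline: accuracy of always predicting "Joined" on the same test set
baselineAccuracy <- (TP + FN) / sum(confMatrix)

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, baseline = baselineAccuracy), 4)
```

At the 0.5 cut-off only 13 of the 420 candidates who did not join are identified, and the accuracy is slightly below that of always predicting "Joined", which suggests a lower cut-off may be worth examining.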
library(ROCR)
#Every classifier evaluation using ROCR starts with creating a prediction object. This function is used to transform the input data into a standardized format.
PredictObject1 <- prediction(predProbClass1, testData$Status)
# All kinds of predictor evaluations are performed using the performance function
PerformObject1 <- performance(PredictObject1, "tpr","fpr")
# Plot the ROC Curve for Joining Status
plot(PerformObject1, main = "ROC Curve for Joining Status", col = "black", lwd = 2)
abline(a = 0,b = 1, lwd = 2, lty = 3, col = "black")
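The ROC curve is often summarised by the area under it (AUC). The sketch below is a self-contained, rank-based (Mann-Whitney) version in base R; the function name and the toy inputs are illustrative and not part of the original analysis.

```r
# Rank-based AUC: the probability that a randomly chosen positive case
# receives a higher score than a randomly chosen negative case
aucFromScores <- function(scores, isPositive) {
  r <- rank(scores)            # average ranks, so ties count as one half
  nPos <- sum(isPositive)
  nNeg <- sum(!isPositive)
  (sum(r[isPositive]) - nPos * (nPos + 1) / 2) / (nPos * nNeg)
}

# Toy scores and labels, for illustration only (not taken from hr_data)
toyScores <- c(0.10, 0.40, 0.35, 0.80)
toyPositive <- c(FALSE, FALSE, TRUE, TRUE)
toyAuc <- aucFromScores(toyScores, toyPositive)
toyAuc

# With the objects defined above, the analogous calls would be:
# aucFromScores(predProbClass1, testData$Status == "Not Joined")
# performance(PredictObject1, "auc")@y.values[[1]]   # ROCR equivalent
```

An AUC of 0.5 corresponds to the dotted diagonal drawn by abline(), and values closer to 1 indicate better separation of "Not Joined" from "Joined" candidates.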