HR Analytics EDA
- LOADING DATA INTO R ENVIRONMENT
- DISCRETE DATA DISTRIBUTION
- Percentage of the candidates (Joined / Not joined)
- Bar Chart for % of the Candidates Who Joined / Did Not Join the Company
- Percentage of the candidates Joined / Did Not Join the Company, Split by DOJ extended
- Bar Chart for % of Candidates Who Did Not Join, Split by DOJ Extended
- Percentage of Candidates Who (Joined / Did Not Join), split by Notice Period
- Bar Chart for % of Candidates, Who Did Not Join, Split by Notice Period
- Percentage of the Candidates Who (Joined / Did Not Join), split by Joining Bonus
- Bar Chart for % of Candidates, Who Did Not Join, Split by Joining Bonus
- Percentage of the Candidates Who (Joined / Did Not Join), Split by Gender
- Bar Chart for % of the Candidates Who Did Not join, Split by Gender
- Percentage of the Candidates Who (Joined / Did Not Join), Split by Candidate Source
- Bar Chart for % of Candidates Who Did Not Join the Company, Split by Candidate Source
- Percentage of the Candidates Who (Joined / Did Not Join), Split by Offered Band
- Bar Chart for % Candidates Who Did Not Join, Split by Offered Band
- Percentage of the Candidates Who (Joined / Did Not join), Split by Line of Business (LOB)
- Bar Chart for % of the Candidates Who Did Not join, Split by Line of Business (LOB)
- CONTINUOUS DATA DISTRIBUTION
- Average Age of the Candidates (Joined/Did Not join)
- Mean Plot for the Age, Split by Status
- Boxplot of Age, Split by Status (Joined / Did Not Join)
- Average Notice Period of the Candidates (Joined/Did Not join)
- Mean Plot for Notice Period, Split by Status (Joined / Did Not join)
- Boxplot for Notice Period of the Candidates, Split by Status (Joined / Did Not join)
- Average (Relevant Years of Experience) of the candidates (Joined / Not joined)
- Mean Plot for the above
- Boxplot of Relevant Years of Experience of the candidates (Joined / Not joined)
- Average of DurationToAcceptOffer (Number of days taken by the candidate to accept the offer) of candidates (Joined / Not joined)
- Mean plot for the above
- Average (Age, Relevant Years of Experience and Number of days taken by the candidate to accept the offer) of candidates (Joined / Not joined) by Gender (Male / Female)
- CORRELATION
- SCATTER PLOTS
- Scatter Plot of Experience and Duration to Accept Offer by Status (Joined / Not Joined)
- Scatter Plot of Experience and Notice Period by Status (Joined / Not Joined)
- Scatter Plot of Experience and Percent Hike (CTC) Expected by Candidate by Status (Joined / Not Joined)
- Scatter Plot of Experience and Percent Hike (CTC) Offered to Candidate by Status (Joined / Not Joined)
A ggplot version is available at http://rpubs.com/pgp34301/hr_analytics_eda
LOADING DATA INTO R ENVIRONMENT
Loading the Data
Column names of the dataframe
## [1] "DOJExtended" "DurationToAcceptOffer"
## [3] "NoticePeriod" "OfferedBand"
## [5] "PercentHikeExpectedInCTC" "PercentHikeOfferedInCTC"
## [7] "PercentDifferenceCTC" "JoiningBonus"
## [9] "CandidateRelocateActual" "Gender"
## [11] "CandidateSource" "RexInYrs"
## [13] "LOB" "Location"
## [15] "Age" "Status"
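The loading code itself is not shown in this output. A minimal sketch of how it might look, assuming the data lives in a CSV file: the temporary file and its three rows below are hypothetical stand-ins that only keep the example self-contained. `stringsAsFactors = TRUE` is used because the later `plot(..., col = Status)` and `levels(Status)` calls only work when `Status` is a factor.

```r
# Toy stand-in for the real file: the path and rows are hypothetical.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "DOJExtended,NoticePeriod,Gender,Age,Status",
  "Yes,30,Female,34,Joined",
  "No,90,Male,29,NotJoined",
  "No,0,Male,41,Joined"
), csv_path)

# stringsAsFactors = TRUE turns text columns such as Status into factors.
HR.df <- read.csv(csv_path, stringsAsFactors = TRUE)

names(HR.df)   # column names of the data frame
str(HR.df)     # quick check of the column types
```

The later chunks refer to columns by bare name (`Status`, `NoticePeriod`, ...), which suggests `attach(HR.df)` was called after loading; that is an inference from the code, not something the output states.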
DISCRETE DATA DISTRIBUTION
Percentage of the candidates (Joined / Not joined)
## Status
## Joined NotJoined
## 81.3 18.7
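The call that builds `tab1` is not shown. One way it could be computed, sketched on a made-up `Status` vector rather than the real data:

```r
# Toy Status vector (made-up values, not the real data).
Status <- factor(c("Joined", "Joined", "Joined", "NotJoined"))

# Counts -> proportions -> percentages, rounded to one decimal place.
tab1 <- round(prop.table(table(Status)) * 100, 1)
tab1
```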
Bar Chart for % of the Candidates Who Joined / Did Not Join the Company
bp <- barplot(
tab1,
main = "% of the Candidates who joined/ did not join the company",
xlab = "Status",
ylab = "Percentage(%)",
col = c('lightblue', 'red'),
legend = rownames(tab1),
beside = TRUE,
ylim = c(0, 90))
text(bp, 0, round(tab1, 1), cex = 1, pos = 3)
Percentage of the candidates Joined / Did Not Join the Company, Split by DOJ extended
## Status
## DOJExtended Joined NotJoined
## No 81.08 18.92
## Yes 81.55 18.45
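The split tables in this section are row percentages: each row sums to 100, so the NotJoined column can be compared across groups of different sizes. A small sketch of that calculation on made-up vectors (the values are illustrative only):

```r
# Toy vectors (made-up values, not the real data).
DOJExtended <- factor(c("No", "No", "No", "No", "Yes", "Yes"))
Status      <- factor(c("Joined", "Joined", "Joined", "NotJoined",
                        "Joined", "NotJoined"))

counts <- table(DOJExtended, Status)

# margin = 1 divides each cell by its row total, so every row sums to 100.
row_pct <- round(prop.table(counts, 1) * 100, 2)
row_pct
rowSums(row_pct)
```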
Bar Chart for % of Candidates Who Did Not Join, Split by DOJ Extended
tab2 <- round(prop.table(table(DOJExtended, Status), 1)*100, 2)[,2]
bp <- barplot(
tab2,
main = "% of the Candidates, who did not join, split by DOJ Extended",
xlab = "DOJ Extended",
ylab = "Percentage(%)",
col = 'lightblue',
legend = rownames(tab2),
beside = TRUE,
ylim = c(0, 20))
text(bp, 0, round(tab2, 1), cex = 1, pos = 3)
Percentage of Candidates Who (Joined / Did Not Join), split by Notice Period
## Status
## NoticePeriod Joined NotJoined
## 0 93.44 6.56
## 30 85.17 14.83
## 45 75.48 24.52
## 60 73.22 26.78
## 75 68.18 31.82
## 90 66.19 33.81
## 120 52.38 47.62
Bar Chart for % of Candidates, Who Did Not Join, Split by Notice Period
tab3 <- round(prop.table(table(NoticePeriod, Status), 1)*100, 2)[,2]
bp <- barplot(
tab3,
main = "% of the Candidates, who did not join, split by Notice Period",
xlab = "NoticePeriod",
ylab = "Percentage(%)",
col = 'lightblue',
legend = rownames(tab3),
beside = TRUE)
text(bp, 0, round(tab3, 1), cex = 1, pos = 3)
Percentage of the Candidates Who (Joined / Did Not Join), split by Joining Bonus
## Status
## JoiningBonus Joined NotJoined
## No 81.34 18.66
## Yes 80.58 19.42
Bar Chart for % of Candidates, Who Did Not Join, Split by Joining Bonus
tab4 <- round(prop.table(table(JoiningBonus, Status), 1)*100, 2)[,2]
bp <- barplot(
tab4,
main = "% of the Candidates, who did not join, split by Joining Bonus",
xlab = "Joining Bonus",
ylab = "Percentage(%)",
col = 'lightblue',
legend = rownames(tab4),
beside = TRUE)
text(bp, 0, round(tab4, 1), cex = 1, pos = 3)
Percentage of the Candidates Who (Joined / Did Not Join), Split by Gender
## Status
## Gender Joined NotJoined
## Female 82.40 17.60
## Male 81.07 18.93
Bar Chart for % of the Candidates Who Did Not join, Split by Gender
tab5 <- round(prop.table(table(Gender, Status), 1)*100, 2)[,2]
bp <- barplot(
tab5,
main = "% of the Candidates, who did not join, split by Gender",
xlab = "Gender",
ylab = "Percentage(%)",
col = 'lightblue',
legend = rownames(tab5),
beside = TRUE)
text(bp, 0, round(tab5, 1), cex = 1, pos = 3)
Percentage of the Candidates Who (Joined / Did Not Join), Split by Candidate Source
## Status
## CandidateSource Joined NotJoined
## Agency 75.82 24.18
## Direct 82.00 18.00
## Employee Referral 88.00 12.00
Bar Chart for % of Candidates Who Did Not Join the Company, Split by Candidate Source
tab6 <- round(prop.table(table(CandidateSource, Status), 1)*100, 2)[,2]
bp <- barplot(
tab6,
main = "% of the Candidates, who did not join, split by Candidate Source",
xlab = "Candidate Source",
ylab = "Percentage(%)",
col = 'lightblue',
legend = rownames(tab6),
beside = TRUE)
text(bp, 0, round(tab6, 1), cex = 1, pos = 3)
Percentage of the Candidates Who (Joined / Did Not Join), Split by Offered Band
## Status
## OfferedBand Joined NotJoined
## E0 76.30 23.70
## E1 81.30 18.70
## E2 80.97 19.03
## E3 85.15 14.85
Bar Chart for % Candidates Who Did Not Join, Split by Offered Band
tab7 <- round(prop.table(table(OfferedBand, Status), 1)*100, 2)[,2]
bp <- barplot(
tab7,
main = "% of the Candidates, who did not join, split by Offered Band",
xlab = "Offered Band",
ylab = "Percentage(%)",
col = 'lightblue',
legend = rownames(tab7),
beside = TRUE)
text(bp, 0, round(tab7, 1), cex = 1, pos = 3)
Percentage of the Candidates Who (Joined / Did Not join), Split by Line of Business (LOB)
## Status
## LOB Joined NotJoined
## AXON 77.46 22.54
## BFSI 75.86 24.14
## CSMP 81.52 18.48
## EAS 73.41 26.59
## ERS 78.11 21.89
## ETS 83.07 16.93
## Healthcare 82.26 17.74
## INFRA 87.79 12.21
## MMS 100.00 0.00
Bar Chart for % of the Candidates Who Did Not join, Split by Line of Business (LOB)
tab8 <- round(prop.table(table(LOB, Status), 1)*100, 2)[,2]
bp <- barplot(
tab8,
main = "% of the Candidates, who did not join, split by Line of Business",
xlab = "LOB",
ylab = "Percentage(%)",
col = 'lightblue',
legend = rownames(tab8),
beside = TRUE)
text(bp, 0, round(tab8, 1), cex = 1, pos = 3)
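The split charts above repeat the same three steps: tabulate, take the NotJoined column of the row percentages, and draw a labelled bar chart. A possible way to fold them into one function is sketched below; `plot_not_joined_pct` is a hypothetical helper name, and the toy vectors are made up to exercise it.

```r
# Hypothetical helper: share of "NotJoined" within each level of `group`,
# drawn as a labelled bar chart. Returns the percentages invisibly.
plot_not_joined_pct <- function(group, status, label) {
  pct <- round(prop.table(table(group, status), 1) * 100, 2)[, "NotJoined"]
  bp <- barplot(
    pct,
    main = paste("% of the Candidates, who did not join, split by", label),
    xlab = label,
    ylab = "Percentage(%)",
    col = "lightblue",
    ylim = c(0, max(pct) * 1.2 + 1))   # headroom for the value labels
  text(bp, 0, round(pct, 1), cex = 1, pos = 3)
  invisible(pct)
}

# Toy data (made up) just to exercise the helper.
gender <- factor(c("Female", "Female", "Male", "Male", "Male", "Male"))
status <- factor(c("Joined", "NotJoined", "Joined", "Joined",
                   "Joined", "NotJoined"))
pct <- plot_not_joined_pct(gender, status, "Gender")
pct
```

Selecting the column by name (`"NotJoined"`) instead of by position keeps the result correct even if the factor levels are ordered differently.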
CONTINUOUS DATA DISTRIBUTION
Average Age of the Candidates (Joined/Did Not join)
library(data.table)
dt <- data.table(HR.df)
dt[, .(AverageAgeofCandidates = round(mean(Age), 2)),
by=list(Status)]
Mean Plot for the Age, Split by Status
library(gplots)
## Warning: package 'gplots' was built under R version 3.6.1
plotmeans(Age ~ Status, data = HR.df, barcol = 'blue', mean.labels = TRUE, col = 'red', n.label = FALSE, digits = 2)
Boxplot of Age, Split by Status (Joined / Did Not Join)
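The boxplot call is not shown in this output. A base R sketch of what it might look like, using a small made-up data frame in place of `HR.df`:

```r
# Toy data frame (made-up values) standing in for HR.df.
toy <- data.frame(
  Age    = c(25, 29, 34, 41, 27, 31),
  Status = factor(c("Joined", "Joined", "Joined", "Joined",
                    "NotJoined", "NotJoined")))

bx <- boxplot(Age ~ Status, data = toy,
              main = "Boxplot of Age, split by Status",
              xlab = "Status", ylab = "Age (in Years)",
              col = "lightblue")

bx$stats[3, ]   # group medians, in the order of levels(toy$Status)
```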
Average Notice Period of the Candidates (Joined/Did Not join)
Mean Plot for Notice Period, Split by Status (Joined / Did Not join)
plotmeans(
NoticePeriod ~ Status,
data = HR.df,
mean.labels = TRUE,
col = 'red',
barcol = 'blue',
digits = 2,
n.label = FALSE
)
## Warning in arrows(x, li, x, pmax(y - gap, li), col = barcol, lwd = lwd, :
## zero-length arrow is of indeterminate angle and so skipped
## Warning in arrows(x, ui, x, pmin(y + gap, ui), col = barcol, lwd = lwd, :
## zero-length arrow is of indeterminate angle and so skipped
Boxplot for Notice Period of the Candidates, Split by Status (Joined / Did Not join)
Average (Relevant Years of Experience) of the candidates (Joined / Not joined)
Mean Plot for the above
Boxplot of Relevant Years of Experience of the candidates (Joined / Not joined)
Average of DurationToAcceptOffer (Number of days taken by the candidate to accept the offer) of candidates (Joined / Not joined)
Mean plot for the above
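The averages for Notice Period, Relevant Years of Experience, and DurationToAcceptOffer named in the headings above are not shown in this output. A base R sketch that computes all three by Status in one call, assuming the same column names and using a made-up data frame in place of `HR.df`:

```r
# Toy data frame (made-up values) standing in for HR.df.
toy <- data.frame(
  Status = factor(c("Joined", "Joined", "Joined", "Joined",
                    "NotJoined", "NotJoined")),
  NoticePeriod          = c(0, 30, 30, 60, 90, 120),
  RexInYrs              = c(2, 4, 6, 8, 3, 5),
  DurationToAcceptOffer = c(5, 10, 15, 30, 20, 40))

# One row per Status level, one column of means per variable.
avg <- aggregate(
  cbind(NoticePeriod, RexInYrs, DurationToAcceptOffer) ~ Status,
  data = toy, FUN = mean)
avg
```

The same result could be obtained with the `data.table` syntax used elsewhere in this document; `aggregate()` is chosen here only to avoid an extra package.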
Average (Age, Relevant Years of Experience and Number of days taken by the candidate to accept the offer) of candidates (Joined / Not joined) by Gender (Male / Female)
dt[, list(AverageAgeofCandidates = round(mean(Age), 2),
          YearsOfExperience = round(mean(RexInYrs), 2),
          DurationtoAcceptOffer = round(mean(DurationToAcceptOffer), 2),
          NoticePeriod = round(mean(NoticePeriod), 2)),
   by = list(Status, Gender)][order(Status, Gender)]
CORRELATION
Correlation Matrix for all the Continuous Variables
## DurationToAcceptOffer NoticePeriod
## DurationToAcceptOffer 1.00 0.36
## NoticePeriod 0.36 1.00
## PercentHikeOfferedInCTC 0.01 -0.01
## PercentDifferenceCTC -0.01 -0.02
## RexInYrs 0.11 0.18
## Age 0.02 0.00
## PercentHikeOfferedInCTC PercentDifferenceCTC
## DurationToAcceptOffer 0.01 -0.01
## NoticePeriod -0.01 -0.02
## PercentHikeOfferedInCTC 1.00 0.60
## PercentDifferenceCTC 0.60 1.00
## RexInYrs -0.11 0.08
## Age -0.08 0.04
## RexInYrs Age
## DurationToAcceptOffer 0.11 0.02
## NoticePeriod 0.18 0.00
## PercentHikeOfferedInCTC -0.11 -0.08
## PercentDifferenceCTC 0.08 0.04
## RexInYrs 1.00 0.57
## Age 0.57 1.00
##
## n= 8995
##
##
## P
## DurationToAcceptOffer NoticePeriod
## DurationToAcceptOffer 0.0000
## NoticePeriod 0.0000
## PercentHikeOfferedInCTC 0.4883 0.2019
## PercentDifferenceCTC 0.3730 0.1531
## RexInYrs 0.0000 0.0000
## Age 0.0562 0.6376
## PercentHikeOfferedInCTC PercentDifferenceCTC
## DurationToAcceptOffer 0.4883 0.3730
## NoticePeriod 0.2019 0.1531
## PercentHikeOfferedInCTC 0.0000
## PercentDifferenceCTC 0.0000
## RexInYrs 0.0000 0.0000
## Age 0.0000 0.0003
## RexInYrs Age
## DurationToAcceptOffer 0.0000 0.0562
## NoticePeriod 0.0000 0.6376
## PercentHikeOfferedInCTC 0.0000 0.0000
## PercentDifferenceCTC 0.0000 0.0003
## RexInYrs 0.0000
## Age 0.0000
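In the matrix above, the strongest pairs are PercentHikeOfferedInCTC with PercentDifferenceCTC (0.60) and RexInYrs with Age (0.57), followed by DurationToAcceptOffer with NoticePeriod (0.36). With n = 8995, even a coefficient as small as 0.04 has a p-value of 0.0003, so statistical significance here says little about the strength of a relationship. The call that produced the output is not shown; its layout (r, n, P) resembles what `Hmisc::rcorr()` prints, but that is an assumption. The sketch below uses only base R on synthetic data to show the two ingredients, a correlation matrix and a p-value for one pair.

```r
set.seed(1)

# Synthetic data (not the real HR data): y depends on x, z is pure noise.
n <- 200
x <- rnorm(n)
toy <- data.frame(x = x, y = 0.6 * x + rnorm(n), z = rnorm(n))

# Pearson correlation matrix, rounded as in the output above.
r <- round(cor(toy), 2)
r

# p-value for one pair; cor.test() tests H0: the true correlation is 0.
p_xy <- cor.test(toy$x, toy$y)$p.value
p_xy
```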
Plotting Correlation Matrix
## Warning: package 'PerformanceAnalytics' was built under R version 3.6.1
## Warning: package 'xts' was built under R version 3.6.1
SCATTER PLOTS
Scatter Plot of Experience and Duration to Accept Offer by Status (Joined / Not Joined)
plot(RexInYrs, DurationToAcceptOffer, xlab = "Relevant Experience (in Years)", ylab = "Duration to Accept Offer (in Days)", main = "Scatterplot of Experience and Duration to Accept Offer",col=Status)
legend("topright",
levels(Status),
col = c('black','red'),
pch = 1)
Scatter Plot of Experience and Notice Period by Status (Joined / Not Joined)
plot(RexInYrs,NoticePeriod, xlab = "Relevant Experience (in Years)", ylab = "Notice Period (in Days)", main = "Scatterplot of Experience and Notice Period",col=Status)
legend("topright",
levels(Status),
col = c('black','red'),
pch = 1)
Scatter Plot of Experience and Percent Hike (CTC) Expected by Candidate by Status (Joined / Not Joined)
plot(RexInYrs,PercentHikeExpectedInCTC, xlab = "Relevant Experience (in Years)", ylab = "Percent Hike (CTC) Expected by Candidate", main = "Scatterplot of Experience and Percent Hike (CTC) Expected by Candidate",col=Status)
legend("topright",
levels(Status),
col = c('black','red'),
pch = 1)
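The scatter plots above rely on `col = Status` mapping the factor codes 1 and 2 to the first two palette colours, which the legend then repeats by hand as `c('black','red')`. A sketch that makes this mapping explicit is given below; `scatter_by_status` is a hypothetical helper name and the toy vectors are made up.

```r
# Hypothetical helper: scatter plot coloured by a two-level status variable.
# Returns the per-point colours invisibly so the mapping can be inspected.
scatter_by_status <- function(x, y, status, xlab, ylab, main) {
  status <- factor(status)            # also accepts a character vector
  stopifnot(nlevels(status) <= 2)     # only two colours are defined
  cols <- c("black", "red")
  point_cols <- cols[as.integer(status)]
  plot(x, y, xlab = xlab, ylab = ylab, main = main, col = point_cols)
  legend("topright", levels(status),
         col = cols[seq_len(nlevels(status))], pch = 1)
  invisible(point_cols)
}

# Toy data (made up) just to exercise the helper.
point_cols <- scatter_by_status(
  x = c(1, 3, 5, 7),
  y = c(10, 20, 15, 30),
  status = c("Joined", "NotJoined", "Joined", "NotJoined"),
  xlab = "Relevant Experience (in Years)",
  ylab = "Duration to Accept Offer (in Days)",
  main = "Scatterplot of Experience and Duration to Accept Offer")
point_cols
```

Because the colours are derived from the factor levels inside the function, the points and the legend cannot drift apart, and the plot still works when `Status` was read in as a character column.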