Overview

Machine learning for practical prediction and analysis

Machine learning isn't just for serving up ads on the internet or predicting your viewing preferences on Netflix.

It can also be a powerful tool for common business tasks. In this example we will see how it can be applied to rank job applicants.

Demonstration

We created some data for new hire candidates that shows their answers to job application questions along with how they were rated by an interviewer. These employees were hired for a 60-day probationary period. Those who were made offers after the probationary period are labeled "Good Hires", while those who were terminated are labeled "Bad Hires".

First we will try descriptive analytics, and then we will use predictive models to demonstrate how they can find answers hidden in the data.

Predictors Used

priorwelding : Prior experience (Yes/No)
shiftpref : Shift preference (First/Last)
multijobs : More than one job in the prior 3 years (Yes/No)
techschool : Tech school degree (Yes/No)
contract : Prior contract work (Yes/No)
licensed : Licensed in another state (Yes/No)
intrating : Interviewer rating (numeric)

Descriptive Results - Overview

The data shows an even balance of good and bad hires (50% each).

Descriptive Results - Predictor Detail

Using descriptive analytics to look at the first three predictors, we see no clear signal. The "Good Hire" rate is the same regardless of the answer to each individual question.
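A per-predictor breakdown of this kind can be sketched in a few lines. The toy rows, the `good_hire` label column, and the rule that generates the labels are all hypothetical and are not the demonstration data. They are chosen only to show how a predictor can look uninformative on its own while still carrying signal in combination with another.

```python
from collections import defaultdict

def good_hire_rate_by_answer(rows, predictor, label="good_hire"):
    """Return {answer: share of good hires} for a single predictor."""
    counts = defaultdict(lambda: [0, 0])  # answer -> [good hires, total]
    for row in rows:
        tally = counts[row[predictor]]
        tally[0] += row[label]
        tally[1] += 1
    return {answer: good / total for answer, (good, total) in counts.items()}

# Hypothetical toy data (an assumption for illustration): a hire is "good"
# only when exactly one of the two answers is "Yes".
rows = [
    {"priorwelding": pw, "techschool": ts, "good_hire": int(pw != ts)}
    for pw in ("Yes", "No")
    for ts in ("Yes", "No")
    for _ in range(25)
]

rates_welding = good_hire_rate_by_answer(rows, "priorwelding")
rates_school = good_hire_rate_by_answer(rows, "techschool")
```

In this toy data both breakdowns come out flat, even though the two answers together determine the label completely.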

Descriptive Results - Predictor Detail

Using descriptive analytics to look at the second three predictors, the results are again identical regardless of the answer to each individual question.

Descriptive Results - Predictor Detail

Because the interviewer rating is numeric, we will look at it by simple quartiles. Here there is finally some differentiation, but not enough to reliably predict employee success.
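Quartile binning of a numeric rating might look like the sketch below. The ratings are made-up values and the helper name `quartile_label` is hypothetical. The cut points come from `statistics.quantiles` in the Python standard library, which is one of several reasonable ways to define quartiles.

```python
import statistics

def quartile_label(value, cut_points):
    """Map a rating to Q1..Q4 given the three quartile cut points."""
    for i, cut in enumerate(cut_points, start=1):
        if value <= cut:
            return f"Q{i}"
    return "Q4"

# Hypothetical interviewer ratings, for illustration only.
ratings = [52, 61, 47, 75, 88, 69, 93, 58, 80, 66, 71, 84]

# n=4 returns the three cut points that separate the four quartiles.
cut_points = statistics.quantiles(ratings, n=4)
labels = [quartile_label(r, cut_points) for r in ratings]
```

The "Good Hire" rate can then be compared across the four labels in the same way as for the Yes/No predictors.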

Predictor Importance

In the final use of descriptive analytics, we've plotted the individual answers, color-coded by whether the "Good Hire" rate for that answer is above or below the overall 50% rate. Only the interviewer score deviates by more than 2% from the 50% "Good Hire" average, but the Q4 scores show a contradictory relationship.

Next Steps

If we only had these descriptive analytics at our disposal, we might stop here and decide, based on looking at the predictors individually, that there was no predictive value in the hiring data.

What about checking whether there is any signal in combinations of answers rather than just individual answers? This would be difficult with descriptive methods alone, as there are 6400 possible combinations of answers; that would be a lot of charts.

Fortunately, we have machine learning methods at our disposal.
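To get a feel for how quickly the combinations grow, the six two-level predictors can be enumerated directly. The deck does not say how the 6400 figure was derived. One reading that matches it, used here purely as an assumption, is that the interviewer rating takes 100 distinct values.

```python
from itertools import product

# The six two-level predictors from the "Predictors Used" slide.
levels = {
    "priorwelding": ("Yes", "No"),
    "shiftpref": ("First", "Last"),
    "multijobs": ("Yes", "No"),
    "techschool": ("Yes", "No"),
    "contract": ("Yes", "No"),
    "licensed": ("Yes", "No"),
}

categorical_combos = list(product(*levels.values()))
n_categorical = len(categorical_combos)  # 2 ** 6

# ASSUMPTION: 100 distinct interviewer rating values (not stated in the deck).
n_rating_values = 100
n_total = n_categorical * n_rating_values
```

Under that assumption the categorical answers alone give 64 combinations, and multiplying by the rating values reaches the quoted total.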

Training and Testing a Model

First, we use random sampling to split our data evenly into training and test data sets. The training data is used to build a model, which is then asked to predict the test data. If we get good results on the test data, we have evidence that the model is viable for predicting future applicants.
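A random, even split can be sketched with the standard library alone. The function name, the fixed seed, and the list of applicant ids are hypothetical, and "evenly" is read here as a 50/50 split.

```python
import random

def split_evenly(rows, seed=42):
    """Shuffle a copy of the rows and cut it in half: (train, test)."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]

# Hypothetical applicant ids standing in for full applicant records.
applicants = list(range(1000))
train, test = split_evenly(applicants)
```

Keeping the test half completely out of model building is what makes its results a fair preview of performance on future applicants.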

Model Results

After building a basic machine learning model (Random Forest), we can find the hidden signal in the combination of applicant answers. The model picked up a pattern that allows it to predict success with nearly 90% accuracy.
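The sketch below shows the general shape of fitting a Random Forest and scoring it on held-out data. It assumes scikit-learn, which the deck does not name, and it uses synthetic applicants generated by a made-up rule, so it does not reproduce the demonstration data or its accuracy figure.

```python
import random

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = random.Random(0)

def make_applicant():
    """Build one synthetic applicant: (features, label). Hypothetical rule."""
    answers = [rng.randint(0, 1) for _ in range(6)]  # six two-level answers as 0/1
    rating = rng.randint(1, 100)                     # assumed rating scale
    # ASSUMPTION: a "good hire" has exactly one of answers[0] and answers[3]
    # set, so neither answer is informative when viewed on its own.
    label = int(answers[0] != answers[3])
    return answers + [rating], label

data = [make_applicant() for _ in range(2000)]
X = [features for features, _ in data]
y = [label for _, label in data]

# The rows are already in random order, so the first half trains, the second tests.
X_train, y_train = X[:1000], y[:1000]
X_test, y_test = X[1000:], y[1000:]

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
```

Because each tree splits on several predictors in sequence, the forest can pick up a pattern that only appears when answers are considered together.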

Model Results

The score from the model would be used to rank the applicants. This chart shows how the results relate to the model score.
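Ranking by score is a simple sort once each applicant has one. The applicant ids and scores below are invented for illustration. With a scikit-learn classifier, a score of this kind could be the predicted probability of the "Good Hire" class from `predict_proba`, though the deck does not say how its score was produced.

```python
def rank_by_score(applicants):
    """Return applicants ordered from highest to lowest model score."""
    return sorted(applicants, key=lambda a: a["score"], reverse=True)

# Hypothetical applicants and scores, for illustration only.
applicants = [
    {"id": "A", "score": 0.35},
    {"id": "B", "score": 0.91},
    {"id": "C", "score": 0.62},
    {"id": "D", "score": 0.08},
]

ranked = rank_by_score(applicants)
ranked_ids = [a["id"] for a in ranked]
```

A recruiter could then work down the list from the top, or apply a score cutoff chosen from the test-set results.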

Conclusions

Machine learning can unlock hidden signals in your data that human eyes and intuition can't always find.

Use it to leverage the full power of your organization's data to solve business problems.