1 Introduction

Congratulations! You have completed the people analytics course. As a final step, every student must complete the following case as a capstone project. This article explains the task and the rubrics of the people analytics capstone project.

2 Dataset

There are two datasets: a train dataset and a test dataset.

The train dataset is used to train and evaluate the model, while the test dataset is used for the final evaluation. The final evaluation requires you to submit your predictions on the test dataset to app.R in order to obtain the final model evaluation (more details are provided below). The data scheme is illustrated as follows:

2.1 HR Analytics: Job Change of Data Scientists

A company active in Big Data and Data Science wants to hire data scientists from among the people who successfully pass courses conducted by the company. Many people sign up for its training. The company wants to know which of these candidates really want to work for the company after training and which are looking for new employment, because this helps reduce cost and time and improves the quality of training, course planning, and the categorization of candidates. Information on demographics, education, and experience is available from the candidates' signup and enrollment [1].

This dataset is also designed to help HR research understand the factors that lead a person to leave their current job. Using model(s) built on the current credentials, demographics, and experience data, you will predict the probability that a candidate will look for a new job or will work for the company, and interpret the factors that affect the employee's decision.

We provide the train dataset as follows:

The observation data consists of the following variables:

  • enrollee_id: Unique ID for candidate
  • city_development_index: Development index of the city (scaled)
  • gender: Gender of candidate
  • relevent_experience: Relevant experience of candidate
  • education_level: Education level of candidate
  • major_discipline: Education major discipline of candidate
  • experience: Candidate total experience in years
  • last_new_job: Difference in years between previous job and current job
  • training_hours: Training hours completed
  • target: 0 – Not looking for job change, 1 – Looking for a job change

3 Rubrics

The total for this people analytics capstone is 20 points. You earn full points when you meet the criteria below:

3.1 Data Wrangling

(2 Points) Demonstrated how to properly do data preparation.

  • Do you need to remove certain columns after joining the data frames?

3.2 Exploratory Data Analysis

(2 Points) Explored the proportion of the target variable.

  • What is the target variable?
  • Is there any class imbalance in the target variable?
  • What should you do if there is a class imbalance?
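The questions above can be explored with a minimal sketch in base R. The `train` data frame here is a toy stand-in (75 zeros and 25 ones), not the capstone data, and random downsampling of the majority class is only one possible response to imbalance, shown as an assumption rather than the required approach.

```r
# Toy stand-in for the train data; only the `target` column matters here.
train <- data.frame(target = c(rep(0, 75), rep(1, 25)))

# Counts and proportions of each class in the target variable
class_counts <- table(train$target)
class_props  <- prop.table(class_counts)
print(class_counts)
print(round(class_props, 2))

# One simple remedy (an assumption, not the prescribed method):
# randomly downsample every class to the size of the smallest class.
set.seed(100)
minority_n <- min(class_counts)
balanced <- do.call(rbind, lapply(split(train, train$target), function(d) {
  d[sample(nrow(d), minority_n), , drop = FALSE]
}))
print(table(balanced$target))
```

Downsampling discards rows from the majority class, so it is worth comparing against alternatives such as upsampling or adjusting the classification threshold before settling on one.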

3.3 Model Fitting and Evaluation

(2 Points) Demonstrated how to prepare cross-validation data for this case.

  • What proportion do you use for the training-testing split?
  • Do you need to use stratified random sampling during the cross-validation?
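As an illustration of stratified random sampling, the sketch below splits a simulated data frame 80/20 while sampling within each class, so both parts keep the same class ratio. The data, the 0.8 proportion, and the names `dat`, `data_train`, and `data_valid` are assumptions made for the example; only base R is used.

```r
set.seed(123)

# Simulated stand-in data: 150 rows of class 0 and 50 rows of class 1
n <- 200
dat <- data.frame(
  enrollee_id    = seq_len(n),
  training_hours = sample(1:300, n, replace = TRUE),
  target         = rep(c(0, 1), times = c(150, 50))
)

train_prop <- 0.8  # assumed split proportion

# Sample row indices separately within each class (stratification)
idx_by_class <- split(seq_len(nrow(dat)), dat$target)
train_idx <- unlist(lapply(idx_by_class, function(idx) {
  sample(idx, size = round(train_prop * length(idx)))
}), use.names = FALSE)

data_train <- dat[train_idx, ]
data_valid <- dat[-train_idx, ]

# Both splits should show the same class proportions
print(prop.table(table(data_train$target)))
print(prop.table(table(data_valid$target)))
```

A plain random split would only preserve the class ratio on average, which matters more when the minority class is small.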

(2 Points) Demonstrated how to properly do model fitting and evaluation.

  • What model do you use?
  • How do you set the model parameter?
  • Are you more concerned with recall than accuracy for this case? Why?

(2 Points) Demonstrated how to properly do model selection by comparing models or making adjustments to a single model.

  • Which model is better?
  • Can you adjust the classification threshold to get better model performance?
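To show what adjusting the classification threshold means in practice, here is a small sketch that fits a logistic regression on simulated data and compares recall and precision at several cutoffs. The data, the predictor `x`, and the thresholds are all assumptions chosen for illustration; the capstone does not prescribe this model.

```r
set.seed(42)

# Simulated data: the probability of target = 1 rises with x
n <- 500
x <- rnorm(n)
dat <- data.frame(x = x, target = rbinom(n, 1, plogis(-1 + 1.5 * x)))

# Logistic regression returns a probability for each row
fit  <- glm(target ~ x, data = dat, family = binomial)
prob <- predict(fit, newdata = dat, type = "response")

# Recall: share of actual positives that are predicted positive
recall_at <- function(threshold) {
  pred <- as.integer(prob > threshold)
  sum(pred == 1 & dat$target == 1) / sum(dat$target == 1)
}

# Precision: share of predicted positives that are actually positive
precision_at <- function(threshold) {
  pred <- as.integer(prob > threshold)
  sum(pred == 1 & dat$target == 1) / sum(pred == 1)
}

thresholds <- c(0.3, 0.4, 0.5)
comparison <- data.frame(
  threshold = thresholds,
  recall    = sapply(thresholds, recall_at),
  precision = sapply(thresholds, precision_at)
)
print(comparison)
```

Lowering the threshold can never reduce recall, because every row predicted positive at a higher cutoff is still predicted positive at a lower one; the usual cost is lower precision.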

3.4 Prediction Performance

Put your model RDS file in the given template dashboard and see how well your model performs.

  • (2 Points) Reached Accuracy > 70% in the test dataset.
  • (2 Points) Reached Sensitivity > 55% in the test dataset.
  • (2 Points) Reached Specificity > 75% in the test dataset.
  • (2 Points) Reached Precision > 40% in the test dataset.
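The four metrics above can be checked locally before using the dashboard. The sketch below saves a toy model to an .RDS file, reloads it, and computes the metrics by hand from the confusion counts. It assumes that class 1 ("Looking for a job change") is the positive class and uses a 0.5 cutoff; how app.R computes its numbers is not shown in this article, so treat this as an approximation. The data, the file path, and the helper `classification_metrics()` are hypothetical.

```r
set.seed(7)

# Simulated stand-in data, not the capstone data
n <- 300
toy <- data.frame(training_hours = runif(n, 1, 300))
toy$target <- rbinom(n, 1, plogis(-2 + 0.01 * toy$training_hours))

model <- glm(target ~ training_hours, data = toy, family = binomial)

# Save the model as an .RDS file and read it back (temporary path)
model_path <- file.path(tempdir(), "model.RDS")
saveRDS(model, model_path)
model_loaded <- readRDS(model_path)

# Class predictions from the reloaded model at an assumed 0.5 cutoff
prob <- predict(model_loaded, newdata = toy, type = "response")
pred <- as.integer(prob > 0.5)

# Hypothetical helper: metrics with class 1 treated as positive
classification_metrics <- function(actual, predicted) {
  tp <- sum(predicted == 1 & actual == 1)
  tn <- sum(predicted == 0 & actual == 0)
  fp <- sum(predicted == 1 & actual == 0)
  fn <- sum(predicted == 0 & actual == 1)
  c(
    accuracy    = (tp + tn) / (tp + tn + fp + fn),
    sensitivity = tp / (tp + fn),  # also called recall
    specificity = tn / (tn + fp),
    precision   = tp / (tp + fp)
  )
}

print(round(classification_metrics(toy$target, pred), 3))
```

Sensitivity and precision are both driven by the positive class, so a model can pass the accuracy target while failing these two when the classes are imbalanced.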

3.5 Interpretation

(2 Points) Wrote the conclusion of the capstone project.

  • Is your goal achieved?
  • Can the problem be solved by machine learning?
  • What model did you use and how did it perform?
  • What is the potential business implementation of your capstone project?

4 Submission

After finishing data preprocessing, modeling, and model evaluation, the next steps are:

  • Save your model to an .RDS file.
  • Check the final model evaluation in app.R.
  • Create a report in an .Rmd file that explains your project (it should contain the narrative and the details for each rubric).
  • Knit the report to HTML.
  • Attach the RDS model and your HTML file to the classroom.

Reference