Selection Project Part 3

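# Packages: dplyr (%>%, mutate, select, filter, arrange), knitr (kable),
# kableExtra (kable_styling), equatiomatic (extract_eq)
library(dplyr)
library(knitr)
library(kableExtra)
library(equatiomatic)
# Get the dataframe from Part 2 to get raw data
regdf <- read.csv("Validity Project.csv", fileEncoding = 'UTF-8-BOM')
# Get the simulated "new applicants" scores on the predictor variables
scoresdf <- read.csv("Tanner_Final_Data_Master_2022.csv", fileEncoding = 'UTF-8-BOM')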

I am tasked with recommending 10 applicants for an open position: general operations manager.

I chose three KSAs from O*NET to assess future applicants and predict how well they would perform in the general operations manager position: critical thinking, social perceptiveness, and coordination. To measure these KSAs, I selected the California Critical Thinking Skills Test (CCTST), the Chapin Social Insight Test (CSIT), and a structured interview, respectively. These assessments had moderate to high validity and acceptable reliability estimates.

After conducting a concurrent criterion-related validity study, I found that the CCTST and the interview were significantly correlated with current employees' performance, while the CSIT was not. Using the same data, I ran a linear regression with the two significant predictors and obtained the following regression equation.

# Create a regression equation to predict Criterion variable
model <- lm(Criterion ~ CCTST + Interview1, regdf)
# Show equation with LaTeX
extract_eq(model,wrap=TRUE,use_coefs=TRUE)

\[ \begin{aligned} \operatorname{\widehat{Criterion}} &= -4.03 + 0.07(\operatorname{CCTST}) + 0.72(\operatorname{Interview1}) \end{aligned} \]

I also computed the minimum score an applicant needs on each predictor for their predicted performance to be at least the organization's mean performance (3.81). The cutoffs for the two predictors were:

CCTST: 76

Interview: 3.94
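The report does not state exactly how these cutoffs were computed. One common approach, shown here only as an assumption, is to set the regression equation equal to the target performance (3.81) and solve for one predictor while holding the other at a chosen value, marked with a star (for example, that predictor's mean). Here b_0, b_1 and b_2 denote the intercept and the CCTST and Interview1 coefficients:

```latex
\[
\begin{aligned}
\operatorname{CCTST}_{\text{cut}} &= \frac{3.81 - b_0 - b_2\,\operatorname{Interview1}^{*}}{b_1} \\
\operatorname{Interview1}_{\text{cut}} &= \frac{3.81 - b_0 - b_1\,\operatorname{CCTST}^{*}}{b_2}
\end{aligned}
\]
```

Full-precision coefficients from `coef(model)` are the safer choice here, because the two-decimal values in the displayed equation can shift a solved cutoff by several points.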

Plugging each applicant's CCTST and interview scores into the regression equation from Part 2 gives that applicant's predicted performance score (Criterion).

I applied the equation to every applicant and saved the result in a new variable, “Predicted”, which is the applicant's predicted work performance based on their CCTST and interview scores.

scoresdf <- scoresdf %>% 
  # mutate() is vectorized, so every row is computed at once (no rowwise() needed)
  mutate(Predicted =
           # Intercept (constant) from the regression
           coef(model)[["(Intercept)"]] +
           # Multiply the CCTST coefficient by each CCTST score
           coef(model)[["CCTST"]] * CCTST +
           # Multiply the Interview1 coefficient by each Interview1 score
           coef(model)[["Interview1"]] * Interview1)
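As a cross-check, `predict()` from base R's stats package applies a fitted `lm` model to new data and should give the same numbers as the hand-built formula. The sketch below uses small hypothetical toy data (not the real validity or applicant scores) and base R only:

```r
# Hypothetical toy data standing in for the Part 2 validity data (not the real scores)
toy <- data.frame(
  CCTST      = c(70, 75, 80, 85, 90, 78, 82, 88),
  Interview1 = c(3, 4, 4, 5, 5, 3, 4, 5),
  Criterion  = c(3.1, 3.6, 3.9, 4.6, 4.9, 3.4, 4.0, 4.8)
)
fit <- lm(Criterion ~ CCTST + Interview1, data = toy)

# Hypothetical new applicants
applicants <- data.frame(Applicant  = 1:3,
                         CCTST      = c(76, 80, 87),
                         Interview1 = c(6, 5, 5))

# Manual version: intercept + b1 * CCTST + b2 * Interview1 (vectorized)
b <- coef(fit)
manual <- b[["(Intercept)"]] +
  b[["CCTST"]] * applicants$CCTST +
  b[["Interview1"]] * applicants$Interview1

# predict() applies the same fitted equation to the new rows
via_predict <- unname(predict(fit, newdata = applicants))

all.equal(manual, via_predict)  # TRUE
```

Using `predict()` avoids indexing coefficients by position, which silently breaks if the formula's term order changes.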

I then sorted the applicants by predicted score from highest to lowest and kept only the top 10.

# Make an object called 'Regression' from the full data frame
Regression <- scoresdf %>% 
  # Sort the data in descending order of the 'Predicted' variable
  arrange(desc(Predicted)) %>% 
  # Report the top 10
  head(n = 10)
# Put the Regression strategy results in a pretty table
Regression %>% 
  kable(digits = 3) %>% 
  kableExtra::kable_styling(full_width = FALSE,
                            bootstrap_options = "condensed")

Applicant	CCTST	Interview1	Predicted
75	87	5	5.319
96	76	6	5.318
77	83	5	5.056
7	81	5	4.924
86	81	5	4.924
1	80	5	4.858
61	80	5	4.858
70	80	5	4.858
99	80	5	4.858
31	79	5	4.792

The table above lists the 10 applicants with the highest predicted work performance. Using the regression strategy, I would recommend these 10 applicants.
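The displayed regression equation rounds coefficients to two decimals, while the Predicted column uses the full-precision coefficients stored in `model`. Pairs of applicants in the regression table make the actual slopes visible, writing \(\widehat{Y}_{k}\) for applicant k's predicted score (table values are themselves rounded to three decimals, so these are approximate):

```latex
\[
\begin{aligned}
b_1 &\approx \widehat{Y}_{7} - \widehat{Y}_{1} = 4.924 - 4.858 = 0.066
  && \text{(CCTST 81 vs. 80, both Interview1 = 5)} \\
b_2 &\approx (\widehat{Y}_{96} - \widehat{Y}_{1}) + 4b_1 = 0.460 + 0.264 = 0.724
  && \text{(CCTST 76 vs. 80, Interview1 6 vs. 5)}
\end{aligned}
\]
```

The CCTST slope is therefore about 0.066, which the displayed equation rounds to 0.07. Plugging applicant 75's scores into the rounded equation gives -4.03 + 0.07(87) + 0.72(5) = 5.66 rather than the table's 5.319, so hand calculations should use `coef(model)`.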

Cutoff Strategy

# Define the cutoff for each predictor
CCTST.cutoff <- 76
Interview.cutoff <- 3.94
# Filter all the applicants who have passed the cutoffs
Cutoff <- scoresdf %>% 
  # Select the order of variables you would like to show
  select(Applicant, Predicted, CCTST, Interview1) %>% 
  # Keep only the applicants who exceed both cutoffs (strict >)
  filter(CCTST > CCTST.cutoff, Interview1 > Interview.cutoff) %>% 
  # Sort the variable 'Predicted' in descending order
  arrange(desc(Predicted)) %>% 
  # Report the top 10
  head(n = 10)
# Display the applicants who passed the cutoffs
Cutoff %>% 
  kable(digits = 3) %>% 
  kableExtra::kable_styling(full_width = FALSE,
                            bootstrap_options = "condensed")

Applicant	Predicted	CCTST	Interview1
75	5.319	87	5
77	5.056	83	5
7	4.924	81	5
86	4.924	81	5
1	4.858	80	5
61	4.858	80	5
70	4.858	80	5
99	4.858	80	5
31	4.792	79	5
62	4.792	79	5
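Applicant 96 (CCTST = 76, Interview1 = 6) ranks second under the regression strategy but is absent from the cutoff table. Because the filter uses a strict `>`, a CCTST score exactly equal to the cutoff of 76 does not pass. This base-R sketch, using three applicants copied from the tables in this report, shows how the result would change if the cutoff were treated as inclusive (`>=`); which comparison is appropriate depends on whether 76 is meant to be the lowest passing score.

```r
# Three rows copied from this report's tables
applicants <- data.frame(
  Applicant  = c(75, 96, 62),
  CCTST      = c(87, 76, 79),
  Interview1 = c(5, 6, 5)
)
CCTST.cutoff <- 76
Interview.cutoff <- 3.94

# Strict comparison, as in the filter() calls in this report
strict <- subset(applicants, CCTST > CCTST.cutoff & Interview1 > Interview.cutoff)
# Inclusive comparison: a score equal to the cutoff counts as passing
inclusive <- subset(applicants, CCTST >= CCTST.cutoff & Interview1 >= Interview.cutoff)

strict$Applicant     # 75 62
inclusive$Applicant  # 75 96 62
```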

Hurdle Strategy

Hurdle 1

# Look at the original data frame
scoresdf %>% 
  # Select the order of variables you would like to show
  select(Applicant, Predicted, CCTST, Interview1) %>% 
  # Keep only the applicants who passed the CCTST hurdle
  filter(CCTST > CCTST.cutoff) %>% 
  # Put the applicants' data in a pretty table
  kable(digits = 3) %>% 
  kableExtra::kable_styling(full_width = FALSE,
                            bootstrap_options = "condensed")

Applicant	Predicted	CCTST	Interview1
1	4.858	80	5
5	4.002	78	4
6	3.608	83	3
7	4.924	81	5
11	4.134	80	4
14	4.529	86	4
18	4.002	78	4
26	4.134	80	4
31	4.792	79	5
32	4.398	84	4
35	3.344	79	3
39	3.410	80	3
40	4.134	80	4
42	4.002	78	4
44	4.002	78	4
57	3.344	79	3
58	4.002	78	4
60	3.542	82	3
61	4.858	80	5
62	4.792	79	5
64	4.134	80	4
69	3.344	79	3
70	4.858	80	5
71	4.660	77	5
74	3.936	77	4
75	5.319	87	5
77	5.056	83	5
79	4.792	79	5
85	4.068	79	4
86	4.924	81	5
89	4.002	78	4
99	4.858	80	5

Hurdle 2

# Same as before, but also apply the second (interview) hurdle
scoresdf %>% 
  select(Applicant, Predicted, CCTST, Interview1) %>% 
  # Apply both hurdles (same filter as the cutoff strategy)
  filter(CCTST > CCTST.cutoff, Interview1 > Interview.cutoff) %>% 
  kable(digits = 3) %>% 
  kableExtra::kable_styling(full_width = FALSE,
                            bootstrap_options = "condensed")

Applicant	Predicted	CCTST	Interview1
1	4.858	80	5
5	4.002	78	4
7	4.924	81	5
11	4.134	80	4
14	4.529	86	4
18	4.002	78	4
26	4.134	80	4
31	4.792	79	5
32	4.398	84	4
40	4.134	80	4
42	4.002	78	4
44	4.002	78	4
58	4.002	78	4
61	4.858	80	5
62	4.792	79	5
64	4.134	80	4
70	4.858	80	5
71	4.660	77	5
74	3.936	77	4
75	5.319	87	5
77	5.056	83	5
79	4.792	79	5
85	4.068	79	4
86	4.924	81	5
89	4.002	78	4
99	4.858	80	5
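In a multiple-hurdle design, the hurdles are typically administered one after another, so only applicants who pass the first assessment go on to the next. With the same cutoffs, the final pool matches the multiple-cutoff pool, which is why the Hurdle 2 list contains the same applicants who pass the cutoff filter. A base-R sketch using five applicants copied from this report's tables:

```r
# Rows copied from this report's tables (Applicant, CCTST, Interview1)
applicants <- data.frame(
  Applicant  = c(1, 6, 35, 74, 96),
  CCTST      = c(80, 83, 79, 77, 76),
  Interview1 = c(5, 3, 3, 4, 6)
)
CCTST.cutoff <- 76
Interview.cutoff <- 3.94

# Multiple hurdle: apply each hurdle in order, keeping only those who pass
hurdle1 <- applicants[applicants$CCTST > CCTST.cutoff, ]
hurdle2 <- hurdle1[hurdle1$Interview1 > Interview.cutoff, ]

# Multiple cutoff: apply both conditions at once
cutoff <- applicants[applicants$CCTST > CCTST.cutoff &
                       applicants$Interview1 > Interview.cutoff, ]

nrow(hurdle1)                                   # 4 remain after hurdle 1
hurdle2$Applicant                               # 1 74
identical(hurdle2$Applicant, cutoff$Applicant)  # TRUE
```

The practical difference between the two strategies is cost and order of testing, not which applicants survive.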

Because more than 10 applicants remain after both hurdles, I reported the 10 with the highest predicted criterion scores.

# Rank the applicants who passed both hurdles and keep the top 10
Hurdle <- scoresdf %>% 
  select(Applicant, Predicted, CCTST, Interview1) %>% 
  filter(CCTST > CCTST.cutoff, Interview1 > Interview.cutoff) %>% 
  arrange(desc(Predicted)) %>% 
  head(n = 10)
Hurdle %>% 
  kable(digits = 3) %>% 
  kableExtra::kable_styling(full_width = FALSE)

Applicant	Predicted	CCTST	Interview1
75	5.319	87	5
77	5.056	83	5
7	4.924	81	5
86	4.924	81	5
1	4.858	80	5
61	4.858	80	5
70	4.858	80	5
99	4.858	80	5
31	4.792	79	5
62	4.792	79	5
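Applicants 31, 62 and 79 all have a predicted score of 4.792 in the Hurdle 2 table. `head(n = 10)` keeps 31 and 62 but drops 79 only because of row order, since `arrange()` leaves ties in their original order. This base-R sketch, using the top rows copied from the Hurdle 2 table, shows one way to surface ties at the 10th place (dplyr's `slice_max(Predicted, n = 10, with_ties = TRUE)` is an alternative):

```r
# Top rows of the Hurdle 2 table, copied from this report (Applicant, Predicted)
passed <- data.frame(
  Applicant = c(1, 7, 31, 61, 62, 70, 71, 75, 77, 79, 86, 99, 14),
  Predicted = c(4.858, 4.924, 4.792, 4.858, 4.792, 4.858, 4.660,
                5.319, 5.056, 4.792, 4.924, 4.858, 4.529)
)

# Sort by predicted score; order() leaves tied rows in their original order
ranked <- passed[order(-passed$Predicted), ]

# head(): exactly 10 rows, ties at the boundary broken by row order
top10 <- head(ranked, 10)

# Keep everyone tied with the 10th-ranked score
tenth_score <- ranked$Predicted[10]
top_with_ties <- ranked[ranked$Predicted >= tenth_score, ]

top10$Applicant                                     # 75 77 7 86 1 61 70 99 31 62
nrow(top_with_ties)                                 # 11
setdiff(top_with_ties$Applicant, top10$Applicant)   # 79
```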

Compare strategies

Regression	Cutoff	Hurdle
75	75	75
96	77	77
77	7	7
7	86	86
86	1	1
1	61	61
61	70	70
70	99	99
99	31	31
31	62	62
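The comparison can also be checked with base R set operations on the three top-10 lists copied from this section:

```r
# Top-10 lists copied from this report's comparison of strategies
regression <- c(75, 96, 77, 7, 86, 1, 61, 70, 99, 31)
cutoff     <- c(75, 77, 7, 86, 1, 61, 70, 99, 31, 62)
hurdle     <- c(75, 77, 7, 86, 1, 61, 70, 99, 31, 62)

setdiff(regression, cutoff)            # 96: recommended only by regression
setdiff(cutoff, regression)            # 62: recommended only by cutoff/hurdle
identical(cutoff, hurdle)              # TRUE
length(intersect(regression, cutoff))  # 9 applicants in common
```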

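There is only one difference between the regression strategy and the cutoff/hurdle strategies: applicant 96 is recommended under the regression strategy, while applicant 62 takes that spot under the cutoff and hurdle strategies. Applicant 96's high interview score (6) offsets a CCTST score of 76, which does not exceed the CCTST cutoff of 76. The regression strategy is compensatory: it ranks applicants only on their combined predicted score and does not screen out applicants who fall short on an individual predictor.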

I am choosing to hire the following applicants via the cutoff strategy:

75, 77, 7, 86, 1, 61, 70, 99, 31, 62

I chose this method because it is effectively a mixed approach: after applying the two cutoffs, I used the regression equation to rank the remaining applicants and return only the top 10. The cutoff strategy only allows applicants to continue if they meet the minimum requirement on every predictor.

Project Reflection

This project was a great way to see how regression and cutoff scores can be applied to real-world selection decisions. It was also a great way to use R to produce a deliverable that explains my process and, ultimately, my conclusion.


Tanner Levenhagen

4/18/2022
