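# Load the packages used throughout this report
library(dplyr)        # %>%, mutate(), filter(), arrange(), select()
library(knitr)        # kable()
library(equatiomatic) # extract_eq()
# Get the dataframe from Part 2 to get raw data
regdf <- read.csv("Validity Project.csv", fileEncoding = "UTF-8-BOM")
# Get the simulated "new applicants" scores on the predictor variables
scoresdf <- read.csv("Tanner_Final_Data_Master_2022.csv", fileEncoding = "UTF-8-BOM")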

I am tasked with recommending 10 applicants for an open position: general operations manager.

I chose three KSAs from O*NET on which to test future applicants, in order to predict how well they would perform in the general operations manager position: critical thinking, social perceptiveness, and coordination. To measure them, I selected the California Critical Thinking Skills Test (CCTST), the Chapin Social Insight Test (CSIT), and a structured interview, respectively. These assessments had moderate to high validity and acceptable reliability estimates.

After conducting a concurrent criterion validity study, I found that the CCTST and the interview were significantly correlated with current employees’ performance, while the CSIT was not. Using the same data, I ran a linear regression on the two significant predictors and obtained the following regression equation.

# Create a regression equation to predict Criterion variable
model <- lm(Criterion ~ CCTST + Interview1, regdf)
# Show equation with LaTeX
extract_eq(model,wrap=TRUE,use_coefs=TRUE)

\[ \begin{aligned} \operatorname{\widehat{Criterion}} &= -4.03 + 0.07(\operatorname{CCTST}) + 0.72(\operatorname{Interview1}) \end{aligned} \]

I also computed the minimum score an applicant needs on the CCTST and on the interview to be predicted to perform at least at the organization's mean performance level (3.81). These were the cutoffs for the two predictors:

CCTST: 76

Interview: 3.94

Plugging each applicant's test scores into the regression equation from Part 2 gives that applicant's predicted performance score (Criterion).
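As a worked example, the sketch below plugs applicant 75's scores (CCTST = 87, Interview1 = 5, taken from the results tables in this report) into the equation using the coefficients exactly as printed. The result, about 5.66, differs from the 5.319 reported for this applicant in the tables. My assumption is that this is because the printed coefficients are rounded to two decimals while the model object stores the unrounded values. The variable names here are illustrative only.

```r
# Coefficients as printed in the regression equation (rounded to two decimals)
b_intercept <- -4.03
b_cctst     <- 0.07
b_interview <- 0.72

# Applicant 75's scores, copied from the results tables
cctst_score     <- 87
interview_score <- 5

# Hand calculation with the rounded coefficients
predicted_rounded <- b_intercept + b_cctst * cctst_score + b_interview * interview_score
print(round(predicted_rounded, 2))  # 5.66
```

For reporting, it is therefore safer to compute predictions from the fitted model object than from the printed equation.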

I apply the equation to every applicant and store the result in a new variable, “Predicted”, which is the applicant's predicted work performance based on their CCTST and interview scores.

scoresdf <- scoresdf %>% 
  # Create a new variable called 'Predicted'
  # (mutate() is vectorized, so going row by row is not needed)
  mutate(Predicted=
           # Pull the constant (intercept) from the regression
           coef(model)[["(Intercept)"]]+
           # Multiply the CCTST coefficient by the CCTST score
           (CCTST*coef(model)[["CCTST"]])+
           # Multiply the Interview1 coefficient by the Interview1 score
           (Interview1*coef(model)[["Interview1"]]))
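An alternative to writing out the equation term by term is `predict()`, which applies a fitted model to new data. The sketch below is not the report's data: `toy_train` and `toy_new` are made-up stand-ins for `regdf` and `scoresdf`, used only to show that the manual calculation and `predict()` agree.

```r
# Made-up stand-in for the validation sample (hypothetical values)
toy_train <- data.frame(
  CCTST      = c(70, 75, 80, 85, 90, 78, 82, 88),
  Interview1 = c(3, 4, 3, 5, 4, 5, 2, 4),
  Criterion  = c(3.1, 3.9, 3.6, 4.9, 4.6, 4.4, 3.2, 4.5)
)
# Made-up stand-in for the new applicants (hypothetical values)
toy_new <- data.frame(CCTST = c(76, 87), Interview1 = c(6, 5))

# Fit the same model form used in the report
toy_model <- lm(Criterion ~ CCTST + Interview1, data = toy_train)

# Manual calculation: intercept + slope * score for each predictor
b <- coef(toy_model)
manual <- b[["(Intercept)"]] +
  b[["CCTST"]] * toy_new$CCTST +
  b[["Interview1"]] * toy_new$Interview1

# predict() applies the same equation to every row of newdata
toy_new$Predicted <- unname(predict(toy_model, newdata = toy_new))

# The two approaches should agree up to floating-point error
print(all.equal(manual, toy_new$Predicted))
```

Using `predict()` avoids indexing coefficients by position, so the calculation keeps working if predictors are added or reordered.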

I then sort by the predicted scores from highest to lowest and only show the top 10.

# Make an object called 'Regression' from the full data frame
Regression <- scoresdf %>% 
  # Sort the data in descending order of the 'Predicted' variable
  arrange(desc(Predicted)) %>% 
  # Report the top 10
  head(n=10)
# Put the Regression strategy results in a pretty table
Regression %>% 
  kable(digits = 3) %>% 
  kableExtra::kable_styling(full_width=F,
                            bootstrap_options = "condensed")
Applicant CCTST Interview1 Predicted
75 87 5 5.319
96 76 6 5.318
77 83 5 5.056
7 81 5 4.924
86 81 5 4.924
1 80 5 4.858
61 80 5 4.858
70 80 5 4.858
99 80 5 4.858
31 79 5 4.792

The applicants in the table above have the highest predicted work performance, so under the regression method these are the 10 applicants I would recommend hiring.
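One thing worth checking in any top-10 step is whether there are ties at the boundary, because `head(n=10)` simply keeps the first 10 rows after sorting. In this report's tables, applicants 31, 62, and 79 all have a CCTST score of 79, an interview score of 5, and a predicted score of 4.792. The sketch below uses a few rows copied from those tables to flag everyone tied with the last admitted score; the object names are illustrative.

```r
# Rows around the top-10 boundary, copied from the report's tables
boundary_rows <- data.frame(
  Applicant = c(99, 31, 62, 79, 71),
  Predicted = c(4.858, 4.792, 4.792, 4.792, 4.660)
)

# Predicted score of the last applicant admitted to the top 10 (applicant 31)
last_admitted_score <- 4.792

# Everyone who shares that score is tied at the boundary
tied <- boundary_rows$Applicant[boundary_rows$Predicted == last_admitted_score]
print(tied)  # 31 62 79
```

When a tie like this occurs, the choice among the tied applicants comes down to row order unless a tie-breaking rule is defined.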

Cutoff Strategy

# Define the cutoff for the predictors
CCTST.cutoff <- 76
Interview.cutoff <- 3.94
# Keep the applicants who pass both cutoffs
Cutoff <- scoresdf %>% 
  # Select the order of variables you would like to show
  select(Applicant,Predicted,CCTST,Interview1) %>% 
  # Filter out the applicants that do not meet all the cutoffs
  # (strict >: a score exactly equal to a cutoff does not pass)
  filter(CCTST>CCTST.cutoff,Interview1>Interview.cutoff) %>% 
  # Sort the variable 'Predicted' in descending order
  arrange(desc(Predicted)) %>% 
  # Report the top 10
  head(n=10)
# Display the applicants who passed the cutoffs
Cutoff %>% 
  kable(digits = 3) %>% 
  kableExtra::kable_styling(full_width = F,
                            bootstrap_options = "condensed")
Applicant Predicted CCTST Interview1
75 5.319 87 5
77 5.056 83 5
7 4.924 81 5
86 4.924 81 5
1 4.858 80 5
61 4.858 80 5
70 4.858 80 5
99 4.858 80 5
31 4.792 79 5
62 4.792 79 5
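The filter above uses a strict `>` comparison, so an applicant who scores exactly at a cutoff is screened out. The sketch below shows the effect using three applicants whose scores are copied from this report's tables; it is written in base R so that it runs on its own. Whether the cutoff of 76 is meant to be inclusive is not stated in the report, so the inclusive version is shown only for comparison.

```r
# Scores copied from the report's tables for three applicants
applicants <- data.frame(
  Applicant  = c(75, 96, 6),
  CCTST      = c(87, 76, 83),
  Interview1 = c(5, 6, 3)
)
CCTST.cutoff     <- 76
Interview.cutoff <- 3.94

# Strict comparison, as used in the report: a score equal to the cutoff fails
strict <- subset(applicants,
                 CCTST > CCTST.cutoff & Interview1 > Interview.cutoff)

# Inclusive comparison (assumption): a score equal to the cutoff passes
inclusive <- subset(applicants,
                    CCTST >= CCTST.cutoff & Interview1 >= Interview.cutoff)

print(strict$Applicant)     # 75
print(inclusive$Applicant)  # 75 96
```

Applicant 96 is the case to watch: a CCTST score of exactly 76 fails the strict filter despite the highest interview score in the tables.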

Hurdle Strategy

Hurdle 1

# Look at the original data frame
scoresdf %>% 
  # Select the order of variables you would like to show
  select(Applicant,Predicted,CCTST,Interview1) %>% 
  # Keep only the applicants who passed the CCTST hurdle
  filter(CCTST>CCTST.cutoff) %>% 
  # Put the applicants' data in a pretty table
  kable(digits = 3) %>% 
  kableExtra::kable_styling(full_width = F,
                            bootstrap_options = "condensed")
Applicant Predicted CCTST Interview1
1 4.858 80 5
5 4.002 78 4
6 3.608 83 3
7 4.924 81 5
11 4.134 80 4
14 4.529 86 4
18 4.002 78 4
26 4.134 80 4
31 4.792 79 5
32 4.398 84 4
35 3.344 79 3
39 3.410 80 3
40 4.134 80 4
42 4.002 78 4
44 4.002 78 4
57 3.344 79 3
58 4.002 78 4
60 3.542 82 3
61 4.858 80 5
62 4.792 79 5
64 4.134 80 4
69 3.344 79 3
70 4.858 80 5
71 4.660 77 5
74 3.936 77 4
75 5.319 87 5
77 5.056 83 5
79 4.792 79 5
85 4.068 79 4
86 4.924 81 5
89 4.002 78 4
99 4.858 80 5

Hurdle 2

# Same thing as before, but also apply the second hurdle
scoresdf %>% 
  select(Applicant, Predicted,CCTST,Interview1) %>% 
  # This is where you apply the second hurdle (same as the cutoff strategy)
  filter(CCTST>CCTST.cutoff,Interview1>Interview.cutoff) %>% 
  kable(digits = 3) %>% 
  kableExtra::kable_styling(full_width = F,
                            bootstrap_options = "condensed")
Applicant Predicted CCTST Interview1
1 4.858 80 5
5 4.002 78 4
7 4.924 81 5
11 4.134 80 4
14 4.529 86 4
18 4.002 78 4
26 4.134 80 4
31 4.792 79 5
32 4.398 84 4
40 4.134 80 4
42 4.002 78 4
44 4.002 78 4
58 4.002 78 4
61 4.858 80 5
62 4.792 79 5
64 4.134 80 4
70 4.858 80 5
71 4.660 77 5
74 3.936 77 4
75 5.319 87 5
77 5.056 83 5
79 4.792 79 5
85 4.068 79 4
86 4.924 81 5
89 4.002 78 4
99 4.858 80 5

Because more than 10 applicants remain, I report the 10 remaining applicants with the highest predicted criterion scores.

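# Make an object called 'Hurdle' with the applicants who cleared both hurdles
Hurdle <- scoresdf %>% 
  select(Applicant, Predicted,CCTST,Interview1) %>% 
  # Apply both hurdles
  filter(CCTST>CCTST.cutoff,Interview1>Interview.cutoff) %>% 
  # Sort the variable 'Predicted' in descending order
  arrange(desc(Predicted)) %>% 
  # Report the top 10
  head(n=10)
# Display the top 10 applicants who cleared both hurdles
Hurdle %>% 
  kable(digits = 3) %>% 
  kableExtra::kable_styling(full_width = F)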
Applicant Predicted CCTST Interview1
75 5.319 87 5
77 5.056 83 5
7 4.924 81 5
86 4.924 81 5
1 4.858 80 5
61 4.858 80 5
70 4.858 80 5
99 4.858 80 5
31 4.792 79 5
62 4.792 79 5
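The hurdle strategy can also be viewed as a funnel in which each stage only sees the survivors of the previous one. In the tables above, 32 applicants remain after hurdle 1 and 26 after hurdle 2. The sketch below reproduces that logic in base R on six applicants whose scores are copied from this report's tables; the object names are illustrative.

```r
# A handful of applicants with scores copied from the report's tables
pool <- data.frame(
  Applicant  = c(1, 5, 6, 35, 75, 96),
  CCTST      = c(80, 78, 83, 79, 87, 76),
  Interview1 = c(5, 4, 3, 3, 5, 6)
)

# Hurdle 1: only applicants above the CCTST cutoff move on
after_hurdle1 <- pool[pool$CCTST > 76, ]

# Hurdle 2: applied only to the survivors of hurdle 1
after_hurdle2 <- after_hurdle1[after_hurdle1$Interview1 > 3.94, ]

# Number of applicants remaining at each stage
funnel <- c(pool    = nrow(pool),
            hurdle1 = nrow(after_hurdle1),
            hurdle2 = nrow(after_hurdle2))
print(funnel)
print(after_hurdle2$Applicant)  # 1 5 75
```

With the same thresholds, the survivors match what a single two-condition cutoff filter would return; the practical difference is that a later assessment only needs to be given to those who pass the earlier one.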

Compare strategies

Regression  Cutoff  Hurdle
75          75      75
96          77      77
77          7       7
7           86      86
86          1       1
1           61      61
61          70      70
70          99      99
99          31      31
31          62      62


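There is only one difference between the regression strategy and the cutoff/hurdle strategies: applicant 96 appears in the regression list, while applicant 62 takes that place in the cutoff and hurdle lists. Applicant 96 scored exactly 76 on the CCTST, which does not pass the strict `CCTST>CCTST.cutoff` filter, but a high interview score (6) still produced a predicted criterion score of 5.318. The regression strategy is compensatory: it ranks applicants on the result of the regression equation alone, so a high score on one predictor can offset a low score on another. The cutoff and hurdle strategies instead require every predictor to clear its own threshold.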
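The comparison can be checked directly with set operations. This sketch uses the three lists of applicant IDs from the tables in this report; the vector names are illustrative.

```r
# Applicant IDs copied from the three top-10 tables in this report
regression_ids <- c(75, 96, 77, 7, 86, 1, 61, 70, 99, 31)
cutoff_ids     <- c(75, 77, 7, 86, 1, 61, 70, 99, 31, 62)
hurdle_ids     <- c(75, 77, 7, 86, 1, 61, 70, 99, 31, 62)

# Applicants recommended by regression but not by the cutoff strategy
only_regression <- setdiff(regression_ids, cutoff_ids)

# Applicants recommended by the cutoff strategy but not by regression
only_cutoff <- setdiff(cutoff_ids, regression_ids)

# Do the cutoff and hurdle strategies select the same set of applicants?
same_cutoff_hurdle <- setequal(cutoff_ids, hurdle_ids)

print(only_regression)     # 96
print(only_cutoff)         # 62
print(same_cutoff_hurdle)  # TRUE
```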

I am choosing to hire the following applicants via the cutoff strategy:

75, 77, 7, 86, 1, 61, 70, 99, 31, 62

I chose this method because it is actually a mixed-method approach: after applying the two cutoffs, I still had to rank the remaining applicants by their regression-predicted scores to return only the top 10. The cutoff strategy lets applicants continue only if they meet the minimum requirement on every predictor.

Project Reflection

This project was a great way to see how regression and cutoff strategies apply to real-world situations. It was also a great way to practice using R to produce a deliverable that explains my process and, ultimately, my conclusion.