class: center, middle, inverse, title-slide .title[ # Intervenable Factors for Employee Turnover ] .subtitle[ ## With a focus on Big 5 personality traits: logistic regression and analysis ] .author[ ### Alice X. ] .author[ ### Josh G. ] .author[ ### Josh Z. ] .date[ ### 2024-05-09 ] --- ## Table of Contents .left[ <li> Research Question </li> <br> <li> Data Preprocessing </li> <br> <li> Exploratory Analysis </li> <br> <li> Logistic Regression Models </li> <br> <li> Goodness-of-Fit </li> <br> <li> Odds Ratio </li> <br> <li> Results </li> <br> <li> Conclusion </li> ] --- ## Employee Turnover - Defined as the total number of employees that leave a company over a certain time period - Can cost the company time, resources, and productivity - Can be influenced by a number of factors - wages - benefit packages - days of time off - promotions - communication - etc. - The ability to predict a company's turnover rate supports better business decisions and helps avoid unneeded costs --- ## Research Question Primary question: which combination of intervenable factors (gender, age, wage type, industry, profession, way of travel, source of hire, management, and Big 5 personality traits) is associated with employee turnover among employees in Russia? Secondary question: do the five personality variables from the Big 5 personality test have an association with employee turnover, and if so, which ones? 
--- ## Data Preprocessing - 1130 observations - 16 variables - stag - standardized measure of an employee's stay at the company - age - event - 1 if employee left, 0 if they stayed - gender - 'm' and 'f' - industry - profession - traffic - how the employee came to the company - coach - head_gender - greywage - whether the employee's wages are taxed (white) or their true wages are unreported (grey) - way - transportation to work - extraversion - independ - Agreeableness - self control - Conscientiousness - anxiety - Neuroticism - novator - Openness --- class: top, center # Exploratory Analysis <img src="w14-EmployeeTurnoverFinalProject_files/figure-html/interactive-scatterplot-matrix-1.png" width="100%" /> --- # Exploratory Analysis <img src="w14-EmployeeTurnoverFinalProject_files/figure-html/interactive-scatterplot-matrix2-1.png" width="100%" /> --- class: top, center # Exploratory Analysis <img src="w14-EmployeeTurnoverFinalProject_files/figure-html/EDA-1.png" width="100%" /> --- class: top, center # Exploratory Analysis <img src="w14-EmployeeTurnoverFinalProject_files/figure-html/unnamed-chunk-3-1.png" width="100%" /> --- class: top, center # Exploratory Analysis <img src="w14-EmployeeTurnoverFinalProject_files/figure-html/unnamed-chunk-4-1.png" width="100%" /> --- ## Principal Component Analysis The Big 5 traits are derived from factor analyses that have repeatedly confirmed their independence and stability across different samples and methods. Further aggregation or reduction of these dimensions might obscure the specific psychological processes and characteristics they represent. The original development of the Big 5 model involved extensive factor analysis procedures that identified these five factors as providing an optimal balance of comprehensiveness and parsimony in describing personality structure. Further factor analysis or PCA might lead to models that either overfit the data or fail to offer additional practical or theoretical advantages. 
Therefore, given the robust empirical support for the Big 5 structure of personality and the concerns about the potential drawbacks of further aggregation through PCA or factor analysis, <b> we did not perform PCA or Factor Analysis in our analysis </b> <br> <br> <br> <br> (Boyle et al., 1995) --- ## Linearity of log odds A key assumption for logistic regression is the linearity of log odds against the predictors. A test was run to check for linearity across the predictors in the dataset. <div style="border: 1px solid #ddd; padding: 0px; overflow-y: scroll; height:400px; overflow-x: scroll; width:100%; "><table class="table table-striped" style="width: auto !important; "> <thead> <tr> <th style="text-align:left;position: sticky; top:0; background-color: #FFFFFF;"> </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> Estimate </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> Std. Error </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> z value </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> Pr(>|z|) </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;"> 1.5057914 </td> <td style="text-align:right;"> 1.0352682 </td> <td style="text-align:right;"> 1.4544940 </td> <td style="text-align:right;"> 0.1458094 </td> </tr> <tr> <td style="text-align:left;"> stag </td> <td style="text-align:right;"> -0.0042445 </td> <td style="text-align:right;"> 0.0019406 </td> <td style="text-align:right;"> -2.1871803 </td> <td style="text-align:right;"> 0.0287294 </td> </tr> <tr> <td style="text-align:left;"> gender </td> <td style="text-align:right;"> -0.2039147 </td> <td style="text-align:right;"> 0.1767032 </td> <td style="text-align:right;"> -1.1539953 </td> <td style="text-align:right;"> 0.2485021 </td> </tr> <tr> <td style="text-align:left;"> age </td> <td 
style="text-align:right;"> -0.0282269 </td> <td style="text-align:right;"> 0.0099477 </td> <td style="text-align:right;"> -2.8375363 </td> <td style="text-align:right;"> 0.0045463 </td> </tr> <tr> <td style="text-align:left;"> industry2 </td> <td style="text-align:right;"> -0.4041901 </td> <td style="text-align:right;"> 0.2403596 </td> <td style="text-align:right;"> -1.6816058 </td> <td style="text-align:right;"> 0.0926453 </td> </tr> <tr> <td style="text-align:left;"> industry3 </td> <td style="text-align:right;"> -0.6949881 </td> <td style="text-align:right;"> 0.2234896 </td> <td style="text-align:right;"> -3.1097112 </td> <td style="text-align:right;"> 0.0018727 </td> </tr> <tr> <td style="text-align:left;"> industry4 </td> <td style="text-align:right;"> -0.8979739 </td> <td style="text-align:right;"> 0.2508461 </td> <td style="text-align:right;"> -3.5797807 </td> <td style="text-align:right;"> 0.0003439 </td> </tr> <tr> <td style="text-align:left;"> industry5 </td> <td style="text-align:right;"> -0.3196459 </td> <td style="text-align:right;"> 0.2337964 </td> <td style="text-align:right;"> -1.3671978 </td> <td style="text-align:right;"> 0.1715633 </td> </tr> <tr> <td style="text-align:left;"> profession1 </td> <td style="text-align:right;"> 0.7108187 </td> <td style="text-align:right;"> 0.5165538 </td> <td style="text-align:right;"> 1.3760786 </td> <td style="text-align:right;"> 0.1687973 </td> </tr> <tr> <td style="text-align:left;"> profession2 </td> <td style="text-align:right;"> 0.3096102 </td> <td style="text-align:right;"> 0.3846883 </td> <td style="text-align:right;"> 0.8048339 </td> <td style="text-align:right;"> 0.4209155 </td> </tr> <tr> <td style="text-align:left;"> profession3 </td> <td style="text-align:right;"> -0.2577007 </td> <td style="text-align:right;"> 0.3555193 </td> <td style="text-align:right;"> -0.7248570 </td> <td style="text-align:right;"> 0.4685397 </td> </tr> <tr> <td style="text-align:left;"> profession4 </td> <td 
style="text-align:right;"> -0.0674923 </td> <td style="text-align:right;"> 0.4085622 </td> <td style="text-align:right;"> -0.1651947 </td> <td style="text-align:right;"> 0.8687908 </td> </tr> <tr> <td style="text-align:left;"> profession5 </td> <td style="text-align:right;"> 1.1637120 </td> <td style="text-align:right;"> 0.5489946 </td> <td style="text-align:right;"> 2.1197148 </td> <td style="text-align:right;"> 0.0340301 </td> </tr> <tr> <td style="text-align:left;"> traffic </td> <td style="text-align:right;"> -0.1440712 </td> <td style="text-align:right;"> 0.1278180 </td> <td style="text-align:right;"> -1.1271589 </td> <td style="text-align:right;"> 0.2596753 </td> </tr> <tr> <td style="text-align:left;"> coach </td> <td style="text-align:right;"> -0.1044114 </td> <td style="text-align:right;"> 0.0741699 </td> <td style="text-align:right;"> -1.4077325 </td> <td style="text-align:right;"> 0.1592103 </td> </tr> <tr> <td style="text-align:left;"> head_gender </td> <td style="text-align:right;"> 0.2432737 </td> <td style="text-align:right;"> 0.1366734 </td> <td style="text-align:right;"> 1.7799641 </td> <td style="text-align:right;"> 0.0750818 </td> </tr> <tr> <td style="text-align:left;"> greywage </td> <td style="text-align:right;"> 0.1864606 </td> <td style="text-align:right;"> 0.2014140 </td> <td style="text-align:right;"> 0.9257577 </td> <td style="text-align:right;"> 0.3545719 </td> </tr> <tr> <td style="text-align:left;"> way1 </td> <td style="text-align:right;"> 0.7040452 </td> <td style="text-align:right;"> 0.2225129 </td> <td style="text-align:right;"> 3.1640639 </td> <td style="text-align:right;"> 0.0015558 </td> </tr> <tr> <td style="text-align:left;"> way2 </td> <td style="text-align:right;"> 0.7544207 </td> <td style="text-align:right;"> 0.2401262 </td> <td style="text-align:right;"> 3.1417671 </td> <td style="text-align:right;"> 0.0016793 </td> </tr> <tr> <td style="text-align:left;"> Extraversion </td> <td style="text-align:right;"> -0.0312104 </td> 
<td style="text-align:right;"> 0.0483825 </td> <td style="text-align:right;"> -0.6450773 </td> <td style="text-align:right;"> 0.5188771 </td> </tr> <tr> <td style="text-align:left;"> Agreeableness </td> <td style="text-align:right;"> 0.0371932 </td> <td style="text-align:right;"> 0.0490584 </td> <td style="text-align:right;"> 0.7581421 </td> <td style="text-align:right;"> 0.4483659 </td> </tr> <tr> <td style="text-align:left;"> Conscientiousness </td> <td style="text-align:right;"> -0.0325445 </td> <td style="text-align:right;"> 0.0482307 </td> <td style="text-align:right;"> -0.6747674 </td> <td style="text-align:right;"> 0.4998236 </td> </tr> <tr> <td style="text-align:left;"> Neuroticism </td> <td style="text-align:right;"> -0.0584022 </td> <td style="text-align:right;"> 0.0493167 </td> <td style="text-align:right;"> -1.1842277 </td> <td style="text-align:right;"> 0.2363229 </td> </tr> <tr> <td style="text-align:left;"> Openness </td> <td style="text-align:right;"> 0.0051041 </td> <td style="text-align:right;"> 0.0406373 </td> <td style="text-align:right;"> 0.1256013 </td> <td style="text-align:right;"> 0.9000476 </td> </tr> </tbody> </table></div> Some variables have significant p-values, suggesting possible violations of linearity. --- ## Plots The three variables with significant p-values were plotted against the log odds. <img src="w14-EmployeeTurnoverFinalProject_files/figure-html/unnamed-chunk-6-1.png" width="100%" /> The correlations are weak, but there are no striking violations of the linearity assumption and no obvious curvature, so no transformation was applied. 
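
One informal way to examine linearity of the log odds for a continuous predictor is to bin the predictor and compute the empirical log odds in each bin. The sketch below is written in Python rather than the R used for the analysis; the helper `empirical_logits` and the toy data are illustrative assumptions, not the code or data behind the plot above.

```python
import math

def empirical_logits(x, y, n_bins=4):
    """Return (mean of x, empirical log odds of y) for equal-count bins of x."""
    pairs = sorted(zip(x, y))          # order observations by the predictor
    size = len(pairs) // n_bins
    result = []
    for b in range(n_bins):
        start = b * size
        # the last bin takes any remainder
        end = len(pairs) if b == n_bins - 1 else start + size
        chunk = pairs[start:end]
        n = len(chunk)
        events = sum(yi for _, yi in chunk)
        # adding 0.5 to both counts keeps the logit finite for all-0 or all-1 bins
        logit = math.log((events + 0.5) / (n - events + 0.5))
        mean_x = sum(xi for xi, _ in chunk) / n
        result.append((mean_x, logit))
    return result

# Toy data (made up for illustration): 40 ages, event rate falling with age
ages = list(range(20, 60))
events_per_bin = [7, 6, 4, 3]          # events in each block of 10 ages
left = [1 if i % 10 < events_per_bin[i // 10] else 0 for i in range(40)]

bins = empirical_logits(ages, left)
for mean_age, logit in bins:
    print(f"mean age {mean_age:.1f}: log odds {logit:+.3f}")
```

If the log odds change by roughly equal steps from bin to bin, a linear term is plausible; pronounced curvature would suggest trying a transformation.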
--- ## Logistic Regression - response variable: event - `stepAIC()` function of the MASS package used for model selection - backward stepwise selection done from the full model using AIC values to choose predictors - 2 models created: one for demographic predictors and one for Big 5 personality traits --- ## Logistic Regression: Demographic Predictors - 10 total predictors for full model - stag - gender - age - industry - profession - traffic - coach - head_gender - greywage - way --- ## Summary Inferential Statistics for Full Model <div style="border: 1px solid #ddd; padding: 0px; overflow-y: scroll; margin-bottom: 20px;overflow-y: scroll; height:400px; overflow-x: scroll; width:100%; "><table class="table table-striped" style="width: auto !important; "> <caption>Summary of Logistic Regression Model</caption> <thead> <tr> <th style="text-align:left;position: sticky; top:0; background-color: #FFFFFF;"> </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> Estimate </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> Std. 
Error </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> z value </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> Pr(>|z|) </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;"> 1.5057914 </td> <td style="text-align:right;"> 1.0352682 </td> <td style="text-align:right;"> 1.4544940 </td> <td style="text-align:right;"> 0.1458094 </td> </tr> <tr> <td style="text-align:left;"> stag </td> <td style="text-align:right;"> -0.0042445 </td> <td style="text-align:right;"> 0.0019406 </td> <td style="text-align:right;"> -2.1871803 </td> <td style="text-align:right;"> 0.0287294 </td> </tr> <tr> <td style="text-align:left;"> gender </td> <td style="text-align:right;"> -0.2039147 </td> <td style="text-align:right;"> 0.1767032 </td> <td style="text-align:right;"> -1.1539953 </td> <td style="text-align:right;"> 0.2485021 </td> </tr> <tr> <td style="text-align:left;"> age </td> <td style="text-align:right;"> -0.0282269 </td> <td style="text-align:right;"> 0.0099477 </td> <td style="text-align:right;"> -2.8375363 </td> <td style="text-align:right;"> 0.0045463 </td> </tr> <tr> <td style="text-align:left;"> industry2 </td> <td style="text-align:right;"> -0.4041901 </td> <td style="text-align:right;"> 0.2403596 </td> <td style="text-align:right;"> -1.6816058 </td> <td style="text-align:right;"> 0.0926453 </td> </tr> <tr> <td style="text-align:left;"> industry3 </td> <td style="text-align:right;"> -0.6949881 </td> <td style="text-align:right;"> 0.2234896 </td> <td style="text-align:right;"> -3.1097112 </td> <td style="text-align:right;"> 0.0018727 </td> </tr> <tr> <td style="text-align:left;"> industry4 </td> <td style="text-align:right;"> -0.8979739 </td> <td style="text-align:right;"> 0.2508461 </td> <td style="text-align:right;"> -3.5797807 </td> <td style="text-align:right;"> 0.0003439 </td> </tr> <tr> <td style="text-align:left;"> industry5 </td> <td 
style="text-align:right;"> -0.3196459 </td> <td style="text-align:right;"> 0.2337964 </td> <td style="text-align:right;"> -1.3671978 </td> <td style="text-align:right;"> 0.1715633 </td> </tr> <tr> <td style="text-align:left;"> profession1 </td> <td style="text-align:right;"> 0.7108187 </td> <td style="text-align:right;"> 0.5165538 </td> <td style="text-align:right;"> 1.3760786 </td> <td style="text-align:right;"> 0.1687973 </td> </tr> <tr> <td style="text-align:left;"> profession2 </td> <td style="text-align:right;"> 0.3096102 </td> <td style="text-align:right;"> 0.3846883 </td> <td style="text-align:right;"> 0.8048339 </td> <td style="text-align:right;"> 0.4209155 </td> </tr> <tr> <td style="text-align:left;"> profession3 </td> <td style="text-align:right;"> -0.2577007 </td> <td style="text-align:right;"> 0.3555193 </td> <td style="text-align:right;"> -0.7248570 </td> <td style="text-align:right;"> 0.4685397 </td> </tr> <tr> <td style="text-align:left;"> profession4 </td> <td style="text-align:right;"> -0.0674923 </td> <td style="text-align:right;"> 0.4085622 </td> <td style="text-align:right;"> -0.1651947 </td> <td style="text-align:right;"> 0.8687908 </td> </tr> <tr> <td style="text-align:left;"> profession5 </td> <td style="text-align:right;"> 1.1637120 </td> <td style="text-align:right;"> 0.5489946 </td> <td style="text-align:right;"> 2.1197148 </td> <td style="text-align:right;"> 0.0340301 </td> </tr> <tr> <td style="text-align:left;"> traffic </td> <td style="text-align:right;"> -0.1440712 </td> <td style="text-align:right;"> 0.1278180 </td> <td style="text-align:right;"> -1.1271589 </td> <td style="text-align:right;"> 0.2596753 </td> </tr> <tr> <td style="text-align:left;"> coach </td> <td style="text-align:right;"> -0.1044114 </td> <td style="text-align:right;"> 0.0741699 </td> <td style="text-align:right;"> -1.4077325 </td> <td style="text-align:right;"> 0.1592103 </td> </tr> <tr> <td style="text-align:left;"> head_gender </td> <td 
style="text-align:right;"> 0.2432737 </td> <td style="text-align:right;"> 0.1366734 </td> <td style="text-align:right;"> 1.7799641 </td> <td style="text-align:right;"> 0.0750818 </td> </tr> <tr> <td style="text-align:left;"> greywage </td> <td style="text-align:right;"> 0.1864606 </td> <td style="text-align:right;"> 0.2014140 </td> <td style="text-align:right;"> 0.9257577 </td> <td style="text-align:right;"> 0.3545719 </td> </tr> <tr> <td style="text-align:left;"> way1 </td> <td style="text-align:right;"> 0.7040452 </td> <td style="text-align:right;"> 0.2225129 </td> <td style="text-align:right;"> 3.1640639 </td> <td style="text-align:right;"> 0.0015558 </td> </tr> <tr> <td style="text-align:left;"> way2 </td> <td style="text-align:right;"> 0.7544207 </td> <td style="text-align:right;"> 0.2401262 </td> <td style="text-align:right;"> 3.1417671 </td> <td style="text-align:right;"> 0.0016793 </td> </tr> <tr> <td style="text-align:left;"> Extraversion </td> <td style="text-align:right;"> -0.0312104 </td> <td style="text-align:right;"> 0.0483825 </td> <td style="text-align:right;"> -0.6450773 </td> <td style="text-align:right;"> 0.5188771 </td> </tr> <tr> <td style="text-align:left;"> Agreeableness </td> <td style="text-align:right;"> 0.0371932 </td> <td style="text-align:right;"> 0.0490584 </td> <td style="text-align:right;"> 0.7581421 </td> <td style="text-align:right;"> 0.4483659 </td> </tr> <tr> <td style="text-align:left;"> Conscientiousness </td> <td style="text-align:right;"> -0.0325445 </td> <td style="text-align:right;"> 0.0482307 </td> <td style="text-align:right;"> -0.6747674 </td> <td style="text-align:right;"> 0.4998236 </td> </tr> <tr> <td style="text-align:left;"> Neuroticism </td> <td style="text-align:right;"> -0.0584022 </td> <td style="text-align:right;"> 0.0493167 </td> <td style="text-align:right;"> -1.1842277 </td> <td style="text-align:right;"> 0.2363229 </td> </tr> <tr> <td style="text-align:left;"> Openness </td> <td style="text-align:right;"> 
0.0051041 </td> <td style="text-align:right;"> 0.0406373 </td> <td style="text-align:right;"> 0.1256013 </td> <td style="text-align:right;"> 0.9000476 </td> </tr> </tbody> </table></div> --- ## Stepwise AIC Algorithm for Stepwise Model - A stepwise function that chooses predictors based on AIC values was used to create the stepwise model. - Backward stepwise selection beginning from the full model. - Removed traffic, greywage, Extraversion, Conscientiousness, Neuroticism, and Openness (the terms that no longer appear in the stepwise model summary). --- ## Summary Inferential Statistics for Stepwise Model <div style="border: 1px solid #ddd; padding: 0px; overflow-y: scroll; margin-bottom: 20px;overflow-y: scroll; height:400px; overflow-x: scroll; width:100%; "><table class="table table-striped" style="width: auto !important; "> <caption>Summary of Logistic Regression Model</caption> <thead> <tr> <th style="text-align:left;position: sticky; top:0; background-color: #FFFFFF;"> </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> Estimate </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> Std. 
Error </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> z value </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> Pr(>|z|) </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;"> 0.5554540 </td> <td style="text-align:right;"> 0.5536105 </td> <td style="text-align:right;"> 1.0033299 </td> <td style="text-align:right;"> 0.3157017 </td> </tr> <tr> <td style="text-align:left;"> stag </td> <td style="text-align:right;"> -0.0046201 </td> <td style="text-align:right;"> 0.0019085 </td> <td style="text-align:right;"> -2.4208135 </td> <td style="text-align:right;"> 0.0154858 </td> </tr> <tr> <td style="text-align:left;"> gender </td> <td style="text-align:right;"> -0.2470710 </td> <td style="text-align:right;"> 0.1684168 </td> <td style="text-align:right;"> -1.4670207 </td> <td style="text-align:right;"> 0.1423704 </td> </tr> <tr> <td style="text-align:left;"> age </td> <td style="text-align:right;"> -0.0271824 </td> <td style="text-align:right;"> 0.0097113 </td> <td style="text-align:right;"> -2.7990512 </td> <td style="text-align:right;"> 0.0051253 </td> </tr> <tr> <td style="text-align:left;"> industry2 </td> <td style="text-align:right;"> -0.4085437 </td> <td style="text-align:right;"> 0.2386349 </td> <td style="text-align:right;"> -1.7120033 </td> <td style="text-align:right;"> 0.0868961 </td> </tr> <tr> <td style="text-align:left;"> industry3 </td> <td style="text-align:right;"> -0.6995080 </td> <td style="text-align:right;"> 0.2219994 </td> <td style="text-align:right;"> -3.1509460 </td> <td style="text-align:right;"> 0.0016274 </td> </tr> <tr> <td style="text-align:left;"> industry4 </td> <td style="text-align:right;"> -0.9087529 </td> <td style="text-align:right;"> 0.2496793 </td> <td style="text-align:right;"> -3.6396800 </td> <td style="text-align:right;"> 0.0002730 </td> </tr> <tr> <td style="text-align:left;"> industry5 </td> <td 
style="text-align:right;"> -0.3333751 </td> <td style="text-align:right;"> 0.2315388 </td> <td style="text-align:right;"> -1.4398237 </td> <td style="text-align:right;"> 0.1499173 </td> </tr> <tr> <td style="text-align:left;"> profession1 </td> <td style="text-align:right;"> 0.7499231 </td> <td style="text-align:right;"> 0.5144513 </td> <td style="text-align:right;"> 1.4577144 </td> <td style="text-align:right;"> 0.1449193 </td> </tr> <tr> <td style="text-align:left;"> profession2 </td> <td style="text-align:right;"> 0.3177433 </td> <td style="text-align:right;"> 0.3833655 </td> <td style="text-align:right;"> 0.8288262 </td> <td style="text-align:right;"> 0.4072028 </td> </tr> <tr> <td style="text-align:left;"> profession3 </td> <td style="text-align:right;"> -0.2501724 </td> <td style="text-align:right;"> 0.3548537 </td> <td style="text-align:right;"> -0.7050014 </td> <td style="text-align:right;"> 0.4808094 </td> </tr> <tr> <td style="text-align:left;"> profession4 </td> <td style="text-align:right;"> -0.0693953 </td> <td style="text-align:right;"> 0.4072894 </td> <td style="text-align:right;"> -0.1703834 </td> <td style="text-align:right;"> 0.8647087 </td> </tr> <tr> <td style="text-align:left;"> profession5 </td> <td style="text-align:right;"> 1.2009499 </td> <td style="text-align:right;"> 0.5483599 </td> <td style="text-align:right;"> 2.1900761 </td> <td style="text-align:right;"> 0.0285187 </td> </tr> <tr> <td style="text-align:left;"> coach </td> <td style="text-align:right;"> -0.1046803 </td> <td style="text-align:right;"> 0.0738042 </td> <td style="text-align:right;"> -1.4183502 </td> <td style="text-align:right;"> 0.1560886 </td> </tr> <tr> <td style="text-align:left;"> head_gender </td> <td style="text-align:right;"> 0.2547743 </td> <td style="text-align:right;"> 0.1355873 </td> <td style="text-align:right;"> 1.8790425 </td> <td style="text-align:right;"> 0.0602387 </td> </tr> <tr> <td style="text-align:left;"> way1 </td> <td style="text-align:right;"> 
0.7480121 </td> <td style="text-align:right;"> 0.2204404 </td> <td style="text-align:right;"> 3.3932615 </td> <td style="text-align:right;"> 0.0006907 </td> </tr> <tr> <td style="text-align:left;"> way2 </td> <td style="text-align:right;"> 0.7876752 </td> <td style="text-align:right;"> 0.2383831 </td> <td style="text-align:right;"> 3.3042416 </td> <td style="text-align:right;"> 0.0009523 </td> </tr> <tr> <td style="text-align:left;"> Agreeableness </td> <td style="text-align:right;"> 0.0722110 </td> <td style="text-align:right;"> 0.0371625 </td> <td style="text-align:right;"> 1.9431157 </td> <td style="text-align:right;"> 0.0520022 </td> </tr> </tbody> </table></div> --- ## Goodness of Fit for Full and Reduced Models Goodness of fit was compared through null deviance, residual deviance, and AIC - Null deviance: measure of how well the response can be predicted with just the intercept - Residual deviance: measure of how well the response can be predicted by a model with p predictors - AIC (Akaike information criterion): a single-number score for comparing models relative to one another, here the residual deviance plus twice the number of estimated coefficients; lower is better <br> <table> <caption>Comparison of global goodness-of-fit statistics</caption> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> Residual deviance </th> <th style="text-align:right;"> Null deviance </th> <th style="text-align:right;"> AIC </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> full.model </td> <td style="text-align:right;"> 1484.675 </td> <td style="text-align:right;"> 1564.977 </td> <td style="text-align:right;"> 1532.675 </td> </tr> <tr> <td style="text-align:left;"> final.model </td> <td style="text-align:right;"> 1488.218 </td> <td style="text-align:right;"> 1564.977 </td> <td style="text-align:right;"> 1524.218 </td> </tr> </tbody> </table> <br> The stepwise model was chosen as the final model because of its smaller AIC (1524.218 vs. 1532.675) --- ## Summary statistics of stepwise model and Odds Ratios <div 
style="border: 1px solid #ddd; padding: 0px; overflow-y: scroll; margin-bottom: 20px;overflow-y: scroll; height:400px; overflow-x: scroll; width:100%; "><table class="table table-striped" style="width: auto !important; "> <caption>Summary Stats with Odds Ratios</caption> <thead> <tr> <th style="text-align:left;position: sticky; top:0; background-color: #FFFFFF;"> </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> Estimate </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> Std. Error </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> z value </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> Pr(>|z|) </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> odds.ratio </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;"> 0.5554540 </td> <td style="text-align:right;"> 0.5536105 </td> <td style="text-align:right;"> 1.0033299 </td> <td style="text-align:right;"> 0.3157017 </td> <td style="text-align:right;"> 1.7427319 </td> </tr> <tr> <td style="text-align:left;"> stag </td> <td style="text-align:right;"> -0.0046201 </td> <td style="text-align:right;"> 0.0019085 </td> <td style="text-align:right;"> -2.4208135 </td> <td style="text-align:right;"> 0.0154858 </td> <td style="text-align:right;"> 0.9953905 </td> </tr> <tr> <td style="text-align:left;"> gender </td> <td style="text-align:right;"> -0.2470710 </td> <td style="text-align:right;"> 0.1684168 </td> <td style="text-align:right;"> -1.4670207 </td> <td style="text-align:right;"> 0.1423704 </td> <td style="text-align:right;"> 0.7810853 </td> </tr> <tr> <td style="text-align:left;"> age </td> <td style="text-align:right;"> -0.0271824 </td> <td style="text-align:right;"> 0.0097113 </td> <td style="text-align:right;"> -2.7990512 </td> <td style="text-align:right;"> 0.0051253 </td> <td 
style="text-align:right;"> 0.9731837 </td> </tr> <tr> <td style="text-align:left;"> industry2 </td> <td style="text-align:right;"> -0.4085437 </td> <td style="text-align:right;"> 0.2386349 </td> <td style="text-align:right;"> -1.7120033 </td> <td style="text-align:right;"> 0.0868961 </td> <td style="text-align:right;"> 0.6646174 </td> </tr> <tr> <td style="text-align:left;"> industry3 </td> <td style="text-align:right;"> -0.6995080 </td> <td style="text-align:right;"> 0.2219994 </td> <td style="text-align:right;"> -3.1509460 </td> <td style="text-align:right;"> 0.0016274 </td> <td style="text-align:right;"> 0.4968297 </td> </tr> <tr> <td style="text-align:left;"> industry4 </td> <td style="text-align:right;"> -0.9087529 </td> <td style="text-align:right;"> 0.2496793 </td> <td style="text-align:right;"> -3.6396800 </td> <td style="text-align:right;"> 0.0002730 </td> <td style="text-align:right;"> 0.4030265 </td> </tr> <tr> <td style="text-align:left;"> industry5 </td> <td style="text-align:right;"> -0.3333751 </td> <td style="text-align:right;"> 0.2315388 </td> <td style="text-align:right;"> -1.4398237 </td> <td style="text-align:right;"> 0.1499173 </td> <td style="text-align:right;"> 0.7165014 </td> </tr> <tr> <td style="text-align:left;"> profession1 </td> <td style="text-align:right;"> 0.7499231 </td> <td style="text-align:right;"> 0.5144513 </td> <td style="text-align:right;"> 1.4577144 </td> <td style="text-align:right;"> 0.1449193 </td> <td style="text-align:right;"> 2.1168372 </td> </tr> <tr> <td style="text-align:left;"> profession2 </td> <td style="text-align:right;"> 0.3177433 </td> <td style="text-align:right;"> 0.3833655 </td> <td style="text-align:right;"> 0.8288262 </td> <td style="text-align:right;"> 0.4072028 </td> <td style="text-align:right;"> 1.3740235 </td> </tr> <tr> <td style="text-align:left;"> profession3 </td> <td style="text-align:right;"> -0.2501724 </td> <td style="text-align:right;"> 0.3548537 </td> <td style="text-align:right;"> 
-0.7050014 </td> <td style="text-align:right;"> 0.4808094 </td> <td style="text-align:right;"> 0.7786665 </td> </tr> <tr> <td style="text-align:left;"> profession4 </td> <td style="text-align:right;"> -0.0693953 </td> <td style="text-align:right;"> 0.4072894 </td> <td style="text-align:right;"> -0.1703834 </td> <td style="text-align:right;"> 0.8647087 </td> <td style="text-align:right;"> 0.9329578 </td> </tr> <tr> <td style="text-align:left;"> profession5 </td> <td style="text-align:right;"> 1.2009499 </td> <td style="text-align:right;"> 0.5483599 </td> <td style="text-align:right;"> 2.1900761 </td> <td style="text-align:right;"> 0.0285187 </td> <td style="text-align:right;"> 3.3232722 </td> </tr> <tr> <td style="text-align:left;"> coach </td> <td style="text-align:right;"> -0.1046803 </td> <td style="text-align:right;"> 0.0738042 </td> <td style="text-align:right;"> -1.4183502 </td> <td style="text-align:right;"> 0.1560886 </td> <td style="text-align:right;"> 0.9006124 </td> </tr> <tr> <td style="text-align:left;"> head_gender </td> <td style="text-align:right;"> 0.2547743 </td> <td style="text-align:right;"> 0.1355873 </td> <td style="text-align:right;"> 1.8790425 </td> <td style="text-align:right;"> 0.0602387 </td> <td style="text-align:right;"> 1.2901704 </td> </tr> <tr> <td style="text-align:left;"> way1 </td> <td style="text-align:right;"> 0.7480121 </td> <td style="text-align:right;"> 0.2204404 </td> <td style="text-align:right;"> 3.3932615 </td> <td style="text-align:right;"> 0.0006907 </td> <td style="text-align:right;"> 2.1127958 </td> </tr> <tr> <td style="text-align:left;"> way2 </td> <td style="text-align:right;"> 0.7876752 </td> <td style="text-align:right;"> 0.2383831 </td> <td style="text-align:right;"> 3.3042416 </td> <td style="text-align:right;"> 0.0009523 </td> <td style="text-align:right;"> 2.1982799 </td> </tr> <tr> <td style="text-align:left;"> Agreeableness </td> <td style="text-align:right;"> 0.0722110 </td> <td style="text-align:right;"> 
0.0371625 </td> <td style="text-align:right;"> 1.9431157 </td> <td style="text-align:right;"> 0.0520022 </td> <td style="text-align:right;"> 1.0748822 </td> </tr> </tbody> </table></div> --- ## Model Diagnostics for Full Model - No major influential points or outliers were found when examining standardized residuals and Cook's distance - Multicollinearity checked through VIF - values all below 2 - Linearity of log odds shows similar results as before, with only a slight deviation for age --- ## Logistic Regression: Big 5 Predictors The Big 5 Personality Test is based on the theory that human personality can be defined by independent, separately measurable traits, of which the Big 5 are the most important. - 5 total predictors for full model - Extraversion - Agreeableness - Conscientiousness - Neuroticism - Openness --- ## Summary of Inferential Statistics for Full Big5 Model <table> <caption>Summary of inferential statistics of the full model</caption> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> Estimate </th> <th style="text-align:right;"> Std. 
Error </th> <th style="text-align:right;"> z value </th> <th style="text-align:right;"> Pr(>|z|) </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;"> 0.9579344 </td> <td style="text-align:right;"> 0.8490951 </td> <td style="text-align:right;"> 1.1281827 </td> <td style="text-align:right;"> 0.2592428 </td> </tr> <tr> <td style="text-align:left;"> Extraversion </td> <td style="text-align:right;"> -0.0264367 </td> <td style="text-align:right;"> 0.0462196 </td> <td style="text-align:right;"> -0.5719801 </td> <td style="text-align:right;"> 0.5673355 </td> </tr> <tr> <td style="text-align:left;"> Agreeableness </td> <td style="text-align:right;"> 0.0087620 </td> <td style="text-align:right;"> 0.0468548 </td> <td style="text-align:right;"> 0.1870030 </td> <td style="text-align:right;"> 0.8516583 </td> </tr> <tr> <td style="text-align:left;"> Conscientiousness </td> <td style="text-align:right;"> -0.0625667 </td> <td style="text-align:right;"> 0.0463007 </td> <td style="text-align:right;"> -1.3513122 </td> <td style="text-align:right;"> 0.1765954 </td> </tr> <tr> <td style="text-align:left;"> Neuroticism </td> <td style="text-align:right;"> -0.0812600 </td> <td style="text-align:right;"> 0.0457465 </td> <td style="text-align:right;"> -1.7763121 </td> <td style="text-align:right;"> 0.0756815 </td> </tr> <tr> <td style="text-align:left;"> Openness </td> <td style="text-align:right;"> -0.0041428 </td> <td style="text-align:right;"> 0.0391695 </td> <td style="text-align:right;"> -0.1057661 </td> <td style="text-align:right;"> 0.9157679 </td> </tr> </tbody> </table>

---

## Stepwise AIC Algorithm for Stepwise Big5 Model

- The same stepwise function, which selects predictors based on AIC values, was used to create the stepwise model
- Backward stepwise selection began from the full Big5 model
- The procedure removed Extraversion, Agreeableness, and Openness
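
The AIC criterion that drives the stepwise selection can be illustrated with a minimal sketch. It assumes the usual relation for an ungrouped binary logistic regression, AIC = deviance + 2k, where k counts the estimated coefficients including the intercept, and it uses the deviance values reported in the goodness-of-fit comparison for the two Big5 models. The function and variable names are hypothetical and are not the code used for the analysis.

```python
# Illustrative only: recompute AIC from the reported deviances of the
# two Big5 models, assuming AIC = deviance + 2 * k for a binary
# (ungrouped) logistic regression. All names here are hypothetical.

def aic_from_deviance(deviance: float, n_coefficients: int) -> float:
    """Return AIC given the model deviance and the number of
    estimated coefficients (intercept included)."""
    return deviance + 2 * n_coefficients

# (deviance, number of coefficients) taken from the slides:
# full model = intercept + 5 traits, stepwise model = intercept + 2 traits.
models = {
    "full.model.big5": (1557.267, 6),
    "stepwise.model.big5": (1557.937, 3),
}

aic = {name: aic_from_deviance(dev, k) for name, (dev, k) in models.items()}

# Stepwise selection prefers the candidate with the lowest AIC.
best = min(aic, key=aic.get)

for name, value in aic.items():
    print(f"{name}: AIC = {value:.3f}")
print("lowest AIC:", best)
```

With these inputs the sketch reproduces the AIC column of the comparison table, and the stepwise model has the lower value.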
---

## Summary of Inferential Statistics for the Stepwise Big5 Model

<table> <caption>Summary of inferential statistics of the final model</caption> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> Estimate </th> <th style="text-align:right;"> Std. Error </th> <th style="text-align:right;"> z value </th> <th style="text-align:right;"> Pr(>|z|) </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;"> 0.7484104 </td> <td style="text-align:right;"> 0.2831586 </td> <td style="text-align:right;"> 2.643078 </td> <td style="text-align:right;"> 0.0082156 </td> </tr> <tr> <td style="text-align:left;"> Conscientiousness </td> <td style="text-align:right;"> -0.0481598 </td> <td style="text-align:right;"> 0.0303877 </td> <td style="text-align:right;"> -1.584847 </td> <td style="text-align:right;"> 0.1130012 </td> </tr> <tr> <td style="text-align:left;"> Neuroticism </td> <td style="text-align:right;"> -0.0804365 </td> <td style="text-align:right;"> 0.0352813 </td> <td style="text-align:right;"> -2.279864 </td> <td style="text-align:right;"> 0.0226158 </td> </tr> </tbody> </table>

---

## Goodness of Fit for the Two Big5 Models

Goodness of fit was compared through null deviance, residual deviance, and AIC

- Null deviance: measure of how well the response can be predicted with just the intercept
- Residual deviance: measure of how well the response can be predicted by a model with p predictors
- AIC (Akaike information criterion): a single-number score that rates a model relative to other models, where lower is better

<br>

<table> <caption>Comparison of global goodness-of-fit statistics</caption> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> Residual deviance </th> <th style="text-align:right;"> Null deviance </th> <th style="text-align:right;"> AIC </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> full.model.big5 </td> <td
style="text-align:right;"> 1557.267 </td> <td style="text-align:right;"> 1564.977 </td> <td style="text-align:right;"> 1569.267 </td> </tr> <tr> <td style="text-align:left;"> stepwise.model.big5 </td> <td style="text-align:right;"> 1557.937 </td> <td style="text-align:right;"> 1564.977 </td> <td style="text-align:right;"> 1563.937 </td> </tr> </tbody> </table>

<br>

The stepwise model was chosen as the final model due to its smaller p-values and AIC value.

---

## Summary Statistics and Odds Ratios for the Full Big5 Model

<table> <caption>Summary Stats with Odds Ratios</caption> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> Estimate </th> <th style="text-align:right;"> Std. Error </th> <th style="text-align:right;"> z value </th> <th style="text-align:right;"> Pr(>|z|) </th> <th style="text-align:right;"> odds.ratio </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;"> 0.9579344 </td> <td style="text-align:right;"> 0.8490951 </td> <td style="text-align:right;"> 1.1281827 </td> <td style="text-align:right;"> 0.2592428 </td> <td style="text-align:right;"> 2.6063073 </td> </tr> <tr> <td style="text-align:left;"> Extraversion </td> <td style="text-align:right;"> -0.0264367 </td> <td style="text-align:right;"> 0.0462196 </td> <td style="text-align:right;"> -0.5719801 </td> <td style="text-align:right;"> 0.5673355 </td> <td style="text-align:right;"> 0.9739097 </td> </tr> <tr> <td style="text-align:left;"> Agreeableness </td> <td style="text-align:right;"> 0.0087620 </td> <td style="text-align:right;"> 0.0468548 </td> <td style="text-align:right;"> 0.1870030 </td> <td style="text-align:right;"> 0.8516583 </td> <td style="text-align:right;"> 1.0088005 </td> </tr> <tr> <td style="text-align:left;"> Conscientiousness </td> <td style="text-align:right;"> -0.0625667 </td> <td style="text-align:right;"> 0.0463007 </td> <td style="text-align:right;"> -1.3513122 </td> <td style="text-align:right;"> 0.1765954
</td> <td style="text-align:right;"> 0.9393504 </td> </tr> <tr> <td style="text-align:left;"> Neuroticism </td> <td style="text-align:right;"> -0.0812600 </td> <td style="text-align:right;"> 0.0457465 </td> <td style="text-align:right;"> -1.7763121 </td> <td style="text-align:right;"> 0.0756815 </td> <td style="text-align:right;"> 0.9219539 </td> </tr> <tr> <td style="text-align:left;"> Openness </td> <td style="text-align:right;"> -0.0041428 </td> <td style="text-align:right;"> 0.0391695 </td> <td style="text-align:right;"> -0.1057661 </td> <td style="text-align:right;"> 0.9157679 </td> <td style="text-align:right;"> 0.9958658 </td> </tr> </tbody> </table>

---

## Model Diagnostics for Big5

- No major influential points or outliers were found when examining standardized residuals and Cook's distance
- Multicollinearity was checked through VIF
  - All values were below 2
- Linearity of the log odds shows no significant deviations from the assumption

---

## Conclusion

While these models are an improvement over using only the intercept, the improvement is not enough to be meaningfully accurate. When tested with a bootstrapped 5-fold CV, both the full model and the personality trait model returned accuracies of roughly 54%.

---

## Shortcomings

This survey had a number of shortcomings that hindered our ability to properly analyze the data. These include:

- Sampling from a Russian population rather than an American one
- Unknown survey method
- Big five traits are ultimately poor predictors of retention
- Missing factors commonly associated with retention, such as wages or PTO

---

## Recommendation

Further data collection is recommended. A follow-up survey based on an American population and using verified survey methods is required before any reasonable action based on this data is taken.

Also, conceptually, the idea of using personality traits as a predictor of employee turnover should be scrutinized before dedicating any research to this topic.
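
As a rough illustration of how little the personality traits add over the intercept-only model, the deviances reported for the stepwise Big5 model can be compared with a likelihood-ratio test. This test is not part of the original analysis; the sketch below is a hedged illustration that relies on the closed form of the chi-square upper-tail probability for 2 degrees of freedom, exp(-x/2). The function and constant names are hypothetical.

```python
import math

# Illustrative likelihood-ratio check (not part of the original analysis).
# Deviances are the values reported for stepwise.model.big5 in the slides.
NULL_DEVIANCE = 1564.977      # intercept-only model
MODEL_DEVIANCE = 1557.937     # intercept + Conscientiousness + Neuroticism
EXTRA_PARAMETERS = 2          # predictors added beyond the intercept

def chi2_sf_2df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 2 degrees
    of freedom; valid only for df = 2, where it equals exp(-x / 2)."""
    return math.exp(-x / 2.0)

# The closed form used here only holds for exactly 2 added parameters.
assert EXTRA_PARAMETERS == 2

lr_statistic = NULL_DEVIANCE - MODEL_DEVIANCE
p_value = chi2_sf_2df(lr_statistic)

print(f"LR statistic = {lr_statistic:.3f} on {EXTRA_PARAMETERS} df")
print(f"p-value      = {p_value:.4f}")
```

Under these assumptions the drop in deviance is about 7.04, a small gain relative to a null deviance of roughly 1565, which is in line with the modest cross-validated accuracy noted in the conclusion.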
---

## References

<div style="padding-left: 64px; text-indent: -60px;">Holiday, M. (2021, January 13). What is Employee Turnover & Why It Matters for Your Business. Oracle Netsuite. https://www.netsuite.com/portal/resource/articles/human-resources/employee-turnover.shtml#:~:text=Employee%20turnover%20reference%20to%20the,%E2%80%94that%20is%2C%20involuntary%20turnover</div>
<br>
<div style="padding-left: 64px; text-indent: -60px;">Indeed Editorial Team. (2023, February 3). Turnover vs. Attrition: Definitions, Differences and Tips. Indeed.com. https://www.indeed.com/career-advice/career-development/turnover-vs-attrition</div>
<br>
<div style="padding-left: 64px; text-indent: -60px;">UCI-Machine Learning Repository. (n.d.). Turnover data set [Data set]. https://www.aihr.com/wp-content/uploads/2019/10/turnover-data-set.csv</div>
<br>
<div style="padding-left: 64px; text-indent: -60px;">Boyle, G.J., Stankov, L., Cattell, R.B. (1995). Measurement and Statistical Models in the Study of Personality and Intelligence. In: Saklofske, D.H., Zeidner, M. (eds) International Handbook of Personality and Intelligence. Perspectives on Individual Differences. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-5571-8_20 </div>
<br>
<div style="padding-left: 64px; text-indent: -60px;">White, I. R., & Thompson, S. G. (2005). Adjusting for partially missing baseline measurements in randomized trials. Statistics in Medicine, 24(7), 993–1007. https://doi.org/10.1002/sim.1981 </div>

---

## Questions or Comments?