What drives PAP scores
Intro
We consider total_score_pap, i.e., the quality of team output as assessed by an expert team. The predictor variables are derived from text and from facial expressions. We conduct the analysis at the team level, aggregating follower behaviour into one mean value per team.
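The team-level aggregation can be sketched in R as follows. This is a minimal illustration only: the long-format layout, the column names team_id, role and WC, the number of followers per team and all values are assumptions, since the report does not show the raw data.

```r
# Hypothetical long-format data: one row per person, simulated values only.
set.seed(1)
raw <- data.frame(
  team_id = rep(1:3, each = 4),
  role    = rep(c("L", "F", "F", "F"), times = 3),  # assumed: 1 leader, 3 followers
  WC      = round(runif(12, 100, 500))              # stand-in for a word count
)

# Followers: average all followers within a team into one value per team.
followers <- aggregate(WC ~ team_id, data = raw[raw$role == "F", ], FUN = mean)
names(followers)[2] <- "WC_F"

# Leaders: already one row per team, so only the column is renamed.
leaders <- raw[raw$role == "L", c("team_id", "WC")]
names(leaders)[2] <- "WC_L"

# One row per team, with leader and mean-follower columns side by side.
team <- merge(leaders, followers, by = "team_id")
print(team)
```

The result has one row per team, which matches the unit of analysis used in the rest of the report.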
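With 118 candidate predictors in addition to the outcome y, we have far too many variables to enter into a single model. Names ending in _F refer to followers and names ending in _L to leaders: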
[1] "y" "WC_F" "WC_L" "Sixltr_F"
[5] "Sixltr_L" "Funksjon_F" "Funksjon_L" "Pronomen_F"
[9] "Pronomen_L" "Ppron_F" "Ppron_L" "Jeg_F"
[13] "Jeg_L" "Vi_F" "Vi_L" "Du_F"
[17] "Du_L" "Hanhun_F" "Hanhun_L" "De_F"
[21] "De_L" "Upron_F" "Upron_L" "Artikkel_F"
[25] "Artikkel_L" "Verb_F" "Verb_L" "Hverb_F"
[29] "Hverb_L" "Fortid_F" "Fortid_L" "Naatid_F"
[33] "Naatid_L" "Fremtid_F" "Fremtid_L" "Adverb_F"
[37] "Adverb_L" "Prep_F" "Prep_L" "Konj_F"
[41] "Konj_L" "Nekting_F" "Nekting_L" "Kvantitet_F"
[45] "Kvantitet_L" "Tallord_F" "Tallord_L" "Banne_F"
[49] "Sosial_F" "Sosial_L" "Posemo_F" "Posemo_L"
[53] "Negemo_F" "Negemo_L" "Angst_F" "Angst_L"
[57] "Sinne_F" "Sinne_L" "Trist_F" "Trist_L"
[61] "KognitivPros_F" "KognitivPros_L" "Innsikt_F" "Innsikt_L"
[65] "Kausalt_F" "Kausalt_L" "Diskrepans_F" "Diskrepans_L"
[69] "Tentativ_F" "Tentativ_L" "Sikker_F" "Sikker_L"
[73] "Inhibisjon_F" "Inhibisjon_L" "Inklusjon_F" "Inklusjon_L"
[77] "Eksklusjon_F" "Eksklusjon_L" "Persepsjon_F" "Persepsjon_L"
[81] "Se_F" "Se_L" "Hoere_F" "Hoere_L"
[85] "Foele_F" "Foele_L" "Biologisk_F" "Biologisk_L"
[89] "Relativ_F" "Relativ_L" "Bevegelse_F" "Bevegelse_L"
[93] "Rom_F" "Rom_L" "Tid_F" "Tid_L"
[97] "Prestasjon_F" "Prestasjon_L" "Samtykke_F" "Samtykke_L"
[101] "Stotring_F" "Stotring_L" "Fyllord_F" "Fyllord_L"
[105] "Subst_F" "Subst_L" "Personal_F" "Personal_L"
[109] "CDI_F" "CDI_L" "female_leader" "hap_F"
[113] "hap_L" "ang_F" "ang_L" "sad_F"
[117] "sad_L" "Ref_F" "Ref_L"
Correlations with the PAP outcome are modest
Variable importance calculated by machine learning
Five machine learning methods were used:
| Method |
|---|
| lm |
| glmnet |
| knn |
| pls |
| gbm |
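The report does not state how the importance scores from the five methods were combined. One common approach, shown here purely as an assumption, is to rescale each method's scores to a common 0 to 100 range and average them per variable. To keep the sketch runnable in base R, two simple stand-in measures (absolute t values from lm and absolute correlations) replace the five methods listed above, and the data are simulated.

```r
# Simulated data; variable names x1..x5 are placeholders.
set.seed(7)
n <- 119
X <- as.data.frame(matrix(rnorm(n * 5), n, 5))
names(X) <- paste0("x", 1:5)
y <- 0.4 * X$x1 - 0.3 * X$x2 + rnorm(n)

# Two stand-in importance measures (NOT the five methods from the report).
imp_t   <- abs(coef(summary(lm(y ~ ., data = X)))[-1, "t value"])
imp_cor <- abs(sapply(X, cor, y = y))

# Rescale each measure to 0-100 so that no single method dominates the mean.
rescale <- function(v) 100 * (v - min(v)) / (max(v) - min(v))
imp <- cbind(lm_t = rescale(imp_t), abs_cor = rescale(imp_cor))

# Combined importance: mean across methods, most important variable first.
combined <- sort(rowMeans(imp), decreasing = TRUE)
print(round(combined, 1))
```

Averaging rescaled scores treats every method as equally trustworthy; a rank-based combination would be a reasonable alternative when the methods' scores are distributed very differently.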
Leaders
The combined variable importance from the five methods:
Followers
Running linear regressions with most important variables
Only Leader Predictors
We first fit a model with the ten most important leader predictors, then a reduced model with three predictors, and finally a model with only two predictors.
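The stepwise reduction described above can be sketched with nested lm fits compared by anova. The data here are simulated and the names x1 to x10 are placeholders. The sample size of 119 is inferred from the residual degrees of freedom reported further down, and standardising all variables is an assumption suggested by the 0.00 intercepts in the tables.

```r
# Simulated, standardised data (placeholders for the leader predictors).
set.seed(42)
n <- 119
X <- as.data.frame(scale(matrix(rnorm(n * 10), n, 10)))
names(X) <- paste0("x", 1:10)
y <- as.numeric(scale(0.3 * X$x1 - 0.3 * X$x2 + rnorm(n)))
d <- data.frame(y = y, X)

# Three nested models: ten, three and two predictors.
m10 <- lm(y ~ ., data = d)
m3  <- lm(y ~ x1 + x2 + x3, data = d)
m2  <- lm(y ~ x1 + x2, data = d)

# Coefficient table of the smallest model, rounded as in the report.
print(round(coef(summary(m2)), 2))

# F tests for each reduction; a large p value means the dropped
# predictors did not contribute significantly to the fit.
print(anova(m10, m3, m2))
```

With 119 observations the three fits have 108, 115 and 116 residual degrees of freedom, the same values that appear in the report's comparison table.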
Model with the ten most important leader predictors:
| Term | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 0.00 | 0.09 | 0.00 | 1.00 |
| Samtykke_L | -0.18 | 0.11 | -1.69 | 0.09 |
| hap_L | 0.21 | 0.09 | 2.29 | 0.02 |
| Personal_L | 0.13 | 0.10 | 1.28 | 0.20 |
| Eksklusjon_L | 0.25 | 0.12 | 2.04 | 0.04 |
| Hverb_L | 0.11 | 0.11 | 1.04 | 0.30 |
| Nekting_L | -0.15 | 0.11 | -1.40 | 0.16 |
| KognitivPros_L | 0.02 | 0.11 | 0.17 | 0.86 |
| Upron_L | -0.17 | 0.12 | -1.41 | 0.16 |
| Funksjon_L | 0.02 | 0.14 | 0.14 | 0.89 |
| Jeg_L | -0.06 | 0.09 | -0.69 | 0.49 |
Reduced model with three leader predictors:
| Term | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 0.00 | 0.09 | 0.00 | 1.00 |
| Samtykke_L | -0.24 | 0.09 | -2.65 | 0.01 |
| hap_L | 0.21 | 0.09 | 2.45 | 0.02 |
| Eksklusjon_L | 0.13 | 0.09 | 1.41 | 0.16 |
Final leader model with two predictors:
| Term | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 0.00 | 0.09 | 0.00 | 1.00 |
| Samtykke_L | -0.27 | 0.09 | -3.11 | 0.00 |
| hap_L | 0.22 | 0.09 | 2.54 | 0.01 |
ANOVA comparison of the three nested leader models (ten, three and two predictors):
| Res.Df | RSS | Df | Sum of Sq | F | Pr(>F) |
|---|---|---|---|---|---|
| 108 | 96.11795 | NA | NA | NA | NA |
| 115 | 102.73744 | -7 | -6.619487 | 1.062541 | 0.3926491 |
| 116 | 104.52198 | -1 | -1.784547 | 2.005152 | 0.1596432 |
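Neither reduction significantly worsens the fit (p = 0.39 when going from ten to three predictors, p = 0.16 when going from three to two), so two leader variables, Samtykke_L and hap_L, are retained as predictors in the lm model.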
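The F statistics in the Res.Df/RSS table can be recomputed by hand from its first two columns. The sketch below assumes the table comes from R's anova for several lm fits, which scales every row by the residual mean square of the largest model; the variable names are illustrative.

```r
# Values copied from the Res.Df and RSS columns of the comparison table.
res_df <- c(108, 115, 116)
rss    <- c(96.11795, 102.73744, 104.52198)

# Residual mean square of the largest model (ten predictors).
scale_ms <- rss[1] / res_df[1]

# Number of parameters dropped at each step and the resulting rise in RSS.
df_drop <- diff(res_df)   # 7 and 1
ss_add  <- diff(rss)

# F statistic and upper-tail p value for each reduction.
f_stat  <- (ss_add / df_drop) / scale_ms
p_value <- pf(f_stat, df_drop, res_df[1], lower.tail = FALSE)
print(round(cbind(df_drop, f_stat, p_value), 4))
```

Up to rounding of the RSS values, this should reproduce the F and Pr(>F) columns of the table.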
Only Follower Predictors
We retain the nine most important predictors.
| Term | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 0.00 | 0.08 | 0.00 | 1.00 |
| Konj_F | 0.13 | 0.09 | 1.57 | 0.12 |
| Verb_F | -0.34 | 0.11 | -3.21 | 0.00 |
| Sinne_F | -0.21 | 0.08 | -2.61 | 0.01 |
| Subst_F | 0.30 | 0.09 | 3.54 | 0.00 |
| Biologisk_F | -0.19 | 0.09 | -2.22 | 0.03 |
| Jeg_F | -0.17 | 0.09 | -1.92 | 0.06 |
| Kausalt_F | 0.16 | 0.08 | 1.93 | 0.06 |
| Sixltr_F | -0.32 | 0.10 | -3.33 | 0.00 |
| Funksjon_F | 0.17 | 0.10 | 1.68 | 0.10 |
Final linear model
The adjusted R² of the leader-only model is
[1] 0.09894841
and the adjusted R² of the follower-only model is
[1] 0.2881977
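The leader-only value can be cross-checked against the residual sum of squares reported for the two-predictor leader model (RSS 104.52198 on 116 residual degrees of freedom). The sketch relies on two assumptions: that n = 119 teams, which follows from 116 residual degrees of freedom plus three estimated coefficients, and that y is standardised to unit variance, which the 0.00 intercepts suggest.

```r
# Two-predictor leader model: values from the Res.Df/RSS comparison table.
rss_leader <- 104.52198
res_df     <- 116
n          <- res_df + 3   # two slopes plus the intercept

# Assumption: y is standardised, so its total sum of squares equals n - 1.
tss <- n - 1

# Adjusted R^2 = 1 - (RSS / residual df) / (TSS / (n - 1)).
adj_r2 <- 1 - (rss_leader / res_df) / (tss / (n - 1))
print(adj_r2)   # about 0.0989
```

The agreement with the reported value supports both assumptions, although the report itself does not state the sample size or the scaling explicitly.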
Combining both sets of predictors yields the final model:
| Term | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 0.00 | 0.07 | 0.00 | 1.00 |
| Konj_F | 0.13 | 0.08 | 1.52 | 0.13 |
| Verb_F | -0.31 | 0.10 | -3.06 | 0.00 |
| Sinne_F | -0.23 | 0.08 | -2.91 | 0.00 |
| Subst_F | 0.24 | 0.08 | 2.91 | 0.00 |
| Biologisk_F | -0.18 | 0.08 | -2.23 | 0.03 |
| Jeg_F | -0.15 | 0.09 | -1.78 | 0.08 |
| Kausalt_F | 0.14 | 0.08 | 1.71 | 0.09 |
| Sixltr_F | -0.28 | 0.09 | -3.09 | 0.00 |
| Funksjon_F | 0.18 | 0.10 | 1.77 | 0.08 |
| Samtykke_L | -0.21 | 0.08 | -2.71 | 0.01 |
| hap_L | 0.16 | 0.08 | 2.11 | 0.04 |
The adjusted R² of the final model is
[1] 0.3420889