What drives PAP scores

Author

N Foldnes

Intro

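We model total_score_pap (y in the variable list below), i.e., the quality of the team's output as assessed by a team of experts. The predictors are textual (word-category measures such as WC, Pronomen and Posemo) and facial (such as hap, ang and sad). Variables with the suffix _L describe the leader and those with the suffix _F describe the followers. The analysis is conducted at the team level: follower behaviour is aggregated into one mean value per team.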
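The team-level aggregation can be sketched as follows. The sketch assumes long-format records with one row per person and variable. The record layout, the example values and the function name `team_level` are hypothetical and do not describe the study's actual data files. Leader values pass through unchanged. Follower values are averaged into a single team mean.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical long-format records: (team_id, role, variable, value),
# where role "L" marks the leader and "F" a follower. Made-up values.
records = [
    ("t1", "L", "hap", 0.40),
    ("t1", "F", "hap", 0.20),
    ("t1", "F", "hap", 0.30),
    ("t2", "L", "hap", 0.10),
    ("t2", "F", "hap", 0.50),
]

def team_level(records):
    """Return {team_id: {"hap_L": ..., "hap_F": ...}} with followers averaged."""
    buckets = defaultdict(list)
    for team, role, var, value in records:
        # Column names follow the report's convention: variable + _L or _F.
        buckets[(team, f"{var}_{role}")].append(value)
    teams = defaultdict(dict)
    for (team, column), values in buckets.items():
        # A team has one leader, so the leader "mean" is just that value;
        # for followers this collapses all members into one team mean.
        teams[team][column] = fmean(values)
    return dict(teams)

table = team_level(records)
print(table["t1"])  # {'hap_L': 0.4, 'hap_F': 0.25}
```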

We have far too many variables (the outcome y plus 118 candidate predictors):

  [1] "y"              "WC_F"           "WC_L"           "Sixltr_F"      
  [5] "Sixltr_L"       "Funksjon_F"     "Funksjon_L"     "Pronomen_F"    
  [9] "Pronomen_L"     "Ppron_F"        "Ppron_L"        "Jeg_F"         
 [13] "Jeg_L"          "Vi_F"           "Vi_L"           "Du_F"          
 [17] "Du_L"           "Hanhun_F"       "Hanhun_L"       "De_F"          
 [21] "De_L"           "Upron_F"        "Upron_L"        "Artikkel_F"    
 [25] "Artikkel_L"     "Verb_F"         "Verb_L"         "Hverb_F"       
 [29] "Hverb_L"        "Fortid_F"       "Fortid_L"       "Naatid_F"      
 [33] "Naatid_L"       "Fremtid_F"      "Fremtid_L"      "Adverb_F"      
 [37] "Adverb_L"       "Prep_F"         "Prep_L"         "Konj_F"        
 [41] "Konj_L"         "Nekting_F"      "Nekting_L"      "Kvantitet_F"   
 [45] "Kvantitet_L"    "Tallord_F"      "Tallord_L"      "Banne_F"       
 [49] "Sosial_F"       "Sosial_L"       "Posemo_F"       "Posemo_L"      
 [53] "Negemo_F"       "Negemo_L"       "Angst_F"        "Angst_L"       
 [57] "Sinne_F"        "Sinne_L"        "Trist_F"        "Trist_L"       
 [61] "KognitivPros_F" "KognitivPros_L" "Innsikt_F"      "Innsikt_L"     
 [65] "Kausalt_F"      "Kausalt_L"      "Diskrepans_F"   "Diskrepans_L"  
 [69] "Tentativ_F"     "Tentativ_L"     "Sikker_F"       "Sikker_L"      
 [73] "Inhibisjon_F"   "Inhibisjon_L"   "Inklusjon_F"    "Inklusjon_L"   
 [77] "Eksklusjon_F"   "Eksklusjon_L"   "Persepsjon_F"   "Persepsjon_L"  
 [81] "Se_F"           "Se_L"           "Hoere_F"        "Hoere_L"       
 [85] "Foele_F"        "Foele_L"        "Biologisk_F"    "Biologisk_L"   
 [89] "Relativ_F"      "Relativ_L"      "Bevegelse_F"    "Bevegelse_L"   
 [93] "Rom_F"          "Rom_L"          "Tid_F"          "Tid_L"         
 [97] "Prestasjon_F"   "Prestasjon_L"   "Samtykke_F"     "Samtykke_L"    
[101] "Stotring_F"     "Stotring_L"     "Fyllord_F"      "Fyllord_L"     
[105] "Subst_F"        "Subst_L"        "Personal_F"     "Personal_L"    
[109] "CDI_F"          "CDI_L"          "female_leader"  "hap_F"         
[113] "hap_L"          "ang_F"          "ang_L"          "sad_F"         
[117] "sad_L"          "Ref_F"          "Ref_L"         

Correlations with the PAP outcome are modest
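Per-predictor correlations with the outcome can be computed and ranked by absolute size, as in the minimal sketch below. The function names `pearson` and `rank_by_correlation` are hypothetical. The toy numbers are made up and are not the study's data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def rank_by_correlation(data, outcome="y"):
    """Return (name, r) pairs sorted by |r|, largest first."""
    y = data[outcome]
    pairs = [(name, pearson(col, y)) for name, col in data.items() if name != outcome]
    return sorted(pairs, key=lambda p: abs(p[1]), reverse=True)

# Made-up team-level data, one value per team.
data = {
    "y":      [1.0, 2.0, 3.0, 4.0, 5.0],
    "hap_L":  [2.0, 4.0, 6.0, 8.0, 10.0],   # perfectly aligned with y
    "Verb_F": [5.0, 4.0, 3.0, 2.0, 1.0],    # perfectly reversed
    "WC_F":   [1.0, 3.0, 2.0, 5.0, 4.0],    # partly aligned
}
for name, r in rank_by_correlation(data):
    print(f"{name:8s} {r:+.2f}")
```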

Variable importance calculated by machine learning

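Five machine learning methods were used to estimate variable importance: linear regression (lm), penalized regression (glmnet), k-nearest neighbours (knn), partial least squares (pls) and gradient boosting machines (gbm).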
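The report does not state how the five importance profiles were combined. One common convention, assumed here, is a two-step procedure:

- Min-max rescale each method's raw importances to the range 0 to 100.
- Average the rescaled scores across methods and keep the k highest-ranked variables.

The regressions below use the ten leader and nine follower predictors selected this way. The function names and raw scores in this sketch are hypothetical.

```python
def rescale(scores):
    """Min-max rescale one method's raw importances to the 0-100 range."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # guard against all-equal scores
    return {name: 100.0 * (s - lo) / span for name, s in scores.items()}

def combined_importance(per_method):
    """Average rescaled importances over methods; returns {variable: score}.

    Assumes every method scores the same set of variables.
    """
    scaled = [rescale(scores) for scores in per_method.values()]
    names = scaled[0].keys()
    return {name: sum(s[name] for s in scaled) / len(scaled) for name in names}

def top_k(combined, k):
    """Names of the k variables with the highest combined importance."""
    return sorted(combined, key=combined.get, reverse=True)[:k]

# Made-up raw importances for three leader variables under the five methods.
per_method = {
    "lm":     {"Samtykke_L": 2.7, "hap_L": 2.5, "Jeg_L": 0.7},
    "glmnet": {"Samtykke_L": 0.30, "hap_L": 0.25, "Jeg_L": 0.0},
    "knn":    {"Samtykke_L": 0.6, "hap_L": 0.8, "Jeg_L": 0.1},
    "pls":    {"Samtykke_L": 5.0, "hap_L": 4.0, "Jeg_L": 1.0},
    "gbm":    {"Samtykke_L": 12.0, "hap_L": 20.0, "Jeg_L": 3.0},
}
combined = combined_importance(per_method)
print(top_k(combined, 2))
```

Rescaling before averaging keeps a method with large raw importance values (here gbm) from dominating the combined ranking.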

Leaders

The combined variable importance from the five methods:

Followers

Linear regressions on the most important variables

Only Leader Predictors

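We first fit a model with the ten most important leader predictors. We then refit with the three that had the smallest p-values (Samtykke_L, hap_L and Eksklusjon_L), and finally with only two (Samtykke_L and hap_L). The three coefficient tables are followed by F-tests comparing the nested models.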

Model 1: ten leader predictors
                 Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)          0.00        0.09     0.00      1.00
Samtykke_L          -0.18        0.11    -1.69      0.09
hap_L                0.21        0.09     2.29      0.02
Personal_L           0.13        0.10     1.28      0.20
Eksklusjon_L         0.25        0.12     2.04      0.04
Hverb_L              0.11        0.11     1.04      0.30
Nekting_L           -0.15        0.11    -1.40      0.16
KognitivPros_L       0.02        0.11     0.17      0.86
Upron_L             -0.17        0.12    -1.41      0.16
Funksjon_L           0.02        0.14     0.14      0.89
Jeg_L               -0.06        0.09    -0.69      0.49

Model 2: three leader predictors
                 Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)          0.00        0.09     0.00      1.00
Samtykke_L          -0.24        0.09    -2.65      0.01
hap_L                0.21        0.09     2.45      0.02
Eksklusjon_L         0.13        0.09     1.41      0.16

Model 3: two leader predictors
                 Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)          0.00        0.09     0.00      1.00
Samtykke_L          -0.27        0.09    -3.11     <0.01
hap_L                0.22        0.09     2.54      0.01

F-tests comparing the nested models (each row against the row above):
            Res.Df        RSS  Df  Sum of Sq         F     Pr(>F)
Model 1        108   96.11795
Model 2        115  102.73744  -7  -6.619487  1.062541  0.3926491
Model 3        116  104.52198  -1  -1.784547  2.005152  0.1596432

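Neither reduction significantly worsens the fit: dropping seven predictors gives p = 0.39, and then dropping Eksklusjon_L gives p = 0.16. Two leader variables, Samtykke_L and hap_L, are therefore retained as predictors in the linear model.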
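The F statistics in the comparison table can be reproduced from the printed residual sums of squares (RSS) and residual degrees of freedom. The sketch follows the convention R's `anova()` uses when comparing a sequence of models: every row uses the residual mean square of the largest model (the ten-predictor one) as its denominator. The helper name `anova_f` is hypothetical.

```python
def anova_f(rss_small, df_small, rss_big, df_big, scale_rss, scale_df):
    """F statistic for the terms dropped when moving from a bigger to a smaller nested model.

    rss_* are residual sums of squares and df_* residual degrees of freedom.
    The denominator is the residual mean square of the largest model in the
    sequence (scale_rss / scale_df).
    """
    extra_ss = rss_small - rss_big   # increase in RSS caused by dropping terms
    extra_df = df_small - df_big     # number of coefficients dropped
    return (extra_ss / extra_df) / (scale_rss / scale_df)

# (residual df, RSS) pairs copied from the comparison table
df1, rss1 = 108, 96.11795    # ten leader predictors
df2, rss2 = 115, 102.73744   # Samtykke_L, hap_L, Eksklusjon_L
df3, rss3 = 116, 104.52198   # Samtykke_L, hap_L

f_10_to_3 = anova_f(rss2, df2, rss1, df1, rss1, df1)
f_3_to_2 = anova_f(rss3, df3, rss2, df2, rss1, df1)
print(f"{f_10_to_3:.4f} {f_3_to_2:.4f}")
```

The two statistics come out at about 1.0625 and 2.005, matching the table up to rounding of the printed RSS values. A p-value additionally needs the upper tail of the F distribution, for example `scipy.stats.f.sf(F, dfn, dfd)` in SciPy with dfn = 7 and dfd = 108 for the first comparison.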

Only Follower Predictors

We retain the nine most important follower predictors:

                 Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)          0.00        0.08     0.00      1.00
Konj_F               0.13        0.09     1.57      0.12
Verb_F              -0.34        0.11    -3.21     <0.01
Sinne_F             -0.21        0.08    -2.61      0.01
Subst_F              0.30        0.09     3.54     <0.01
Biologisk_F         -0.19        0.09    -2.22      0.03
Jeg_F               -0.17        0.09    -1.92      0.06
Kausalt_F            0.16        0.08     1.93      0.06
Sixltr_F            -0.32        0.10    -3.33     <0.01
Funksjon_F           0.17        0.10     1.68      0.10

Final linear model

The adjusted R² of the leader-only model (Samtykke_L and hap_L) is 0.0989. The adjusted R² of the follower-only model (nine predictors) is 0.2882.
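Adjusted R² penalizes R² for the number of predictors p. The formula is 1 - (1 - R²)(n - 1)/(n - p - 1), where R² = 1 - RSS/TSS. The sketch below reproduces the leader-only value from the RSS of the two-predictor model in the comparison table. It relies on two assumptions not stated in the report:

- There are n = 119 teams. This is inferred from the 108 residual degrees of freedom of the ten-predictor leader model: 108 + 10 slopes + 1 intercept.
- The outcome is standardized, so TSS = n - 1. The intercept estimates of 0.00 suggest standardized variables.

The function name `adjusted_r2` is hypothetical.

```python
def adjusted_r2(rss, tss, n, p):
    """Adjusted R^2 of a linear model with p predictors plus an intercept."""
    r2 = 1.0 - rss / tss                               # ordinary R^2
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)    # penalize for p predictors

# Assumptions (see lead-in): n = 119 teams and a standardized outcome,
# so the total sum of squares equals n - 1.
n = 119
tss = n - 1
leader_adj = adjusted_r2(rss=104.52198, tss=tss, n=n, p=2)
print(round(leader_adj, 4))
```

Under the same assumptions the unadjusted R² of the leader-only model is about 0.114.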

Combining both sets of predictors, the nine follower variables and the two leader variables, yields the final model:

                 Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)          0.00        0.07     0.00      1.00
Konj_F               0.13        0.08     1.52      0.13
Verb_F              -0.31        0.10    -3.06     <0.01
Sinne_F             -0.23        0.08    -2.91     <0.01
Subst_F              0.24        0.08     2.91     <0.01
Biologisk_F         -0.18        0.08    -2.23      0.03
Jeg_F               -0.15        0.09    -1.78      0.08
Kausalt_F            0.14        0.08     1.71      0.09
Sixltr_F            -0.28        0.09    -3.09     <0.01
Funksjon_F           0.18        0.10     1.77      0.08
Samtykke_L          -0.21        0.08    -2.71      0.01
hap_L                0.16        0.08     2.11      0.04

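The adjusted R² of the final model is 0.3421. Adding the two leader variables to the follower-only model thus raises the adjusted R² from 0.2882 to 0.3421. Both leader coefficients remain significant at the 5% level (p = 0.01 and p = 0.04).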