What drives PAP scores

Intro

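The outcome is total_score_pap, i.e., the quality of the team's output as assessed by an expert team. The predictor variables are textual and facial measures. The analysis is conducted at the team level: follower behaviour is aggregated into one mean value per team (suffix _F), and leader behaviour is kept as the leader's own value (suffix _L).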
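
The aggregation step can be sketched as follows. This is a minimal illustration in Python, not the code behind this report; the team ids, the record layout, and all numbers are made up, and only the _L/_F naming is taken from the variable list.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical person-level records: (team id, role, feature name, value).
# Team ids and values are invented for illustration.
records = [
    ("t1", "leader",   "WC", 120.0),
    ("t1", "follower", "WC", 80.0),
    ("t1", "follower", "WC", 100.0),
    ("t2", "leader",   "WC", 95.0),
    ("t2", "follower", "WC", 60.0),
    ("t2", "follower", "WC", 70.0),
    ("t2", "follower", "WC", 110.0),
]

def aggregate_to_team_level(rows):
    """Return one row per team: the leader value as <feature>_L and
    the mean of all follower values as <feature>_F."""
    grouped = defaultdict(list)
    for team, role, feature, value in rows:
        suffix = "_L" if role == "leader" else "_F"
        grouped[(team, feature + suffix)].append(value)
    teams = defaultdict(dict)
    for (team, column), values in grouped.items():
        # A single leader value is its own mean; followers are averaged.
        teams[team][column] = mean(values)
    return dict(teams)

team_table = aggregate_to_team_level(records)
```

Each team ends up with exactly one row, regardless of how many followers it has, which is what makes a team-level regression possible.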

The data set has 119 columns, the outcome y plus 118 candidate predictors, which is far too many to enter into a single model:

  [1] "y"              "WC_F"           "WC_L"           "Sixltr_F"      
  [5] "Sixltr_L"       "Funksjon_F"     "Funksjon_L"     "Pronomen_F"    
  [9] "Pronomen_L"     "Ppron_F"        "Ppron_L"        "Jeg_F"         
 [13] "Jeg_L"          "Vi_F"           "Vi_L"           "Du_F"          
 [17] "Du_L"           "Hanhun_F"       "Hanhun_L"       "De_F"          
 [21] "De_L"           "Upron_F"        "Upron_L"        "Artikkel_F"    
 [25] "Artikkel_L"     "Verb_F"         "Verb_L"         "Hverb_F"       
 [29] "Hverb_L"        "Fortid_F"       "Fortid_L"       "Naatid_F"      
 [33] "Naatid_L"       "Fremtid_F"      "Fremtid_L"      "Adverb_F"      
 [37] "Adverb_L"       "Prep_F"         "Prep_L"         "Konj_F"        
 [41] "Konj_L"         "Nekting_F"      "Nekting_L"      "Kvantitet_F"   
 [45] "Kvantitet_L"    "Tallord_F"      "Tallord_L"      "Banne_F"       
 [49] "Sosial_F"       "Sosial_L"       "Posemo_F"       "Posemo_L"      
 [53] "Negemo_F"       "Negemo_L"       "Angst_F"        "Angst_L"       
 [57] "Sinne_F"        "Sinne_L"        "Trist_F"        "Trist_L"       
 [61] "KognitivPros_F" "KognitivPros_L" "Innsikt_F"      "Innsikt_L"     
 [65] "Kausalt_F"      "Kausalt_L"      "Diskrepans_F"   "Diskrepans_L"  
 [69] "Tentativ_F"     "Tentativ_L"     "Sikker_F"       "Sikker_L"      
 [73] "Inhibisjon_F"   "Inhibisjon_L"   "Inklusjon_F"    "Inklusjon_L"   
 [77] "Eksklusjon_F"   "Eksklusjon_L"   "Persepsjon_F"   "Persepsjon_L"  
 [81] "Se_F"           "Se_L"           "Hoere_F"        "Hoere_L"       
 [85] "Foele_F"        "Foele_L"        "Biologisk_F"    "Biologisk_L"   
 [89] "Relativ_F"      "Relativ_L"      "Bevegelse_F"    "Bevegelse_L"   
 [93] "Rom_F"          "Rom_L"          "Tid_F"          "Tid_L"         
 [97] "Prestasjon_F"   "Prestasjon_L"   "Samtykke_F"     "Samtykke_L"    
[101] "Stotring_F"     "Stotring_L"     "Fyllord_F"      "Fyllord_L"     
[105] "Subst_F"        "Subst_L"        "Personal_F"     "Personal_L"    
[109] "CDI_F"          "CDI_L"          "female_leader"  "hap_F"         
[113] "hap_L"          "ang_F"          "ang_L"          "sad_F"         
[117] "sad_L"          "Ref_F"          "Ref_L"

Correlations with the PAP outcome are modest
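
A correlation screen of this kind can be sketched as follows: compute the Pearson correlation of each predictor with the outcome and rank the predictors by absolute value. The sketch is in Python for illustration only; the column names x1 to x3 and all values are made up and do not come from the PAP data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def rank_by_abs_correlation(columns, outcome):
    """Return (name, r) pairs sorted by |r|, strongest first."""
    pairs = [(name, pearson_r(values, outcome)) for name, values in columns.items()]
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)

# Made-up outcome and predictors.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
predictors = {
    "x1": [2.0, 4.0, 6.0, 8.0, 10.0],
    "x2": [5.0, 3.0, 4.0, 1.0, 2.0],
    "x3": [1.0, 3.0, 2.0, 3.0, 1.0],
}

ranking = rank_by_abs_correlation(predictors, y)
```

Ranking by absolute value keeps strong negative associations near the top, which matters here because several retained predictors have negative coefficients.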

Variable importance calculated by machine learning

Five machine learning methods were used:

lm
glmnet
knn
pls
gbm
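
The report does not state how the importance scores from the five methods are combined. One plausible scheme, shown here purely as an assumption, is to rescale each method's scores so that its top variable gets 100 and then average across methods. The sketch is in Python for illustration; the variable names a, b, c and all scores are made up.

```python
def scale_to_100(scores):
    """Rescale one method's importance scores so the largest equals 100."""
    top = max(scores.values())
    if top == 0:
        return {name: 0.0 for name in scores}
    return {name: 100.0 * value / top for name, value in scores.items()}

def combine_importance(per_method):
    """Average the rescaled importance of each variable across methods.
    Assumes every method reports a score for every variable."""
    scaled = [scale_to_100(scores) for scores in per_method.values()]
    variables = list(scaled[0].keys())
    return {v: sum(s[v] for s in scaled) / len(scaled) for v in variables}

# Made-up raw importance scores on two different scales.
raw_importance = {
    "lm":  {"a": 2.0,  "b": 1.0,  "c": 0.0},
    "gbm": {"a": 10.0, "b": 40.0, "c": 20.0},
}

combined = combine_importance(raw_importance)
top_variables = sorted(combined, key=combined.get, reverse=True)
```

Rescaling first prevents a method whose raw scores happen to be numerically large from dominating the average.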

Leaders

The combined variable importance from the five methods:

Followers

Running linear regressions with most important variables

Only Leader Predictors

We first fit a model with the ten most important leader predictors, then a reduced model with three predictors, and finally a model with only two predictors.

Model with ten leader predictors:

	Estimate	Std. Error	t value	Pr(>|t|)
(Intercept)	0.00	0.09	0.00	1.00
Samtykke_L	-0.18	0.11	-1.69	0.09
hap_L	0.21	0.09	2.29	0.02
Personal_L	0.13	0.10	1.28	0.20
Eksklusjon_L	0.25	0.12	2.04	0.04
Hverb_L	0.11	0.11	1.04	0.30
Nekting_L	-0.15	0.11	-1.40	0.16
KognitivPros_L	0.02	0.11	0.17	0.86
Upron_L	-0.17	0.12	-1.41	0.16
Funksjon_L	0.02	0.14	0.14	0.89
Jeg_L	-0.06	0.09	-0.69	0.49

Model with three leader predictors:

	Estimate	Std. Error	t value	Pr(>|t|)
(Intercept)	0.00	0.09	0.00	1.00
Samtykke_L	-0.24	0.09	-2.65	0.01
hap_L	0.21	0.09	2.45	0.02
Eksklusjon_L	0.13	0.09	1.41	0.16

Model with two leader predictors:

	Estimate	Std. Error	t value	Pr(>|t|)
(Intercept)	0.00	0.09	0.00	1.00
Samtykke_L	-0.27	0.09	-3.11	0.00
hap_L	0.22	0.09	2.54	0.01

Analysis of variance comparing the three nested models (ten, three, and two predictors):

Res.Df	RSS	Df	Sum of Sq	F	Pr(>F)
108	96.11795	NA	NA	NA	NA
115	102.73744	-7	-6.619487	1.062541	0.3926491
116	104.52198	-1	-1.784547	2.005152	0.1596432
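
The F statistics in this table can be recomputed from the residual degrees of freedom and RSS columns. The values in the table are consistent with dividing each change in RSS per dropped parameter by the residual mean square of the largest (ten-predictor) model. A minimal Python sketch, for illustration only; the function name is hypothetical, and p-values are omitted because they need the F distribution, which is not in the standard library.

```python
# (residual df, RSS) for the three nested models, copied from the table above.
models = [(108, 96.11795), (115, 102.73744), (116, 104.52198)]

def sequential_f(nested_models):
    """F statistic for each step from a larger model to the next smaller one.
    The denominator is the residual mean square of the largest model."""
    largest_df, largest_rss = nested_models[0]
    scale = largest_rss / largest_df
    f_values = []
    for (df_big, rss_big), (df_small, rss_small) in zip(nested_models, nested_models[1:]):
        dropped = df_small - df_big            # number of parameters removed
        rss_increase = rss_small - rss_big     # loss of fit from removing them
        f_values.append((rss_increase / dropped) / scale)
    return f_values

f_values = sequential_f(models)
print([round(f, 3) for f in f_values])
```

Both statistics are small, which is why dropping seven and then one further predictor is not judged to worsen the fit.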

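Neither reduction significantly worsens the fit (p = 0.39 and p = 0.16), so two leader variables, Samtykke_L and hap_L, are retained as predictors in the linear model.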

Only Follower Predictors

We retain the nine most important follower predictors.

	Estimate	Std. Error	t value	Pr(>|t|)
(Intercept)	0.00	0.08	0.00	1.00
Konj_F	0.13	0.09	1.57	0.12
Verb_F	-0.34	0.11	-3.21	0.00
Sinne_F	-0.21	0.08	-2.61	0.01
Subst_F	0.30	0.09	3.54	0.00
Biologisk_F	-0.19	0.09	-2.22	0.03
Jeg_F	-0.17	0.09	-1.92	0.06
Kausalt_F	0.16	0.08	1.93	0.06
Sixltr_F	-0.32	0.10	-3.33	0.00
Funksjon_F	0.17	0.10	1.68	0.10

Final linear model

The adjusted R² of the leader-only model is

[1] 0.09894841

and the adjusted R² of the follower-only model is

[1] 0.2881977
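
The leader-only value can be reproduced from the ANOVA table for the leader models. The sketch below is in Python for illustration and rests on one assumption: that the outcome was standardised to unit variance, which the zero intercepts in all coefficient tables suggest, so that the total sum of squares equals n - 1. The function name is hypothetical.

```python
def adjusted_r2(rss, df_residual, n, tss=None):
    """Adjusted R-squared: 1 - (RSS / residual df) / (TSS / (n - 1)).
    If tss is not given, assume a unit-variance outcome, so TSS = n - 1."""
    if tss is None:
        tss = n - 1  # ASSUMPTION: outcome standardised to variance 1
    return 1.0 - (rss / df_residual) / (tss / (n - 1))

# Two-predictor leader model: 116 residual df, two slopes, one intercept.
n_teams = 116 + 2 + 1
leader_adj_r2 = adjusted_r2(rss=104.52198, df_residual=116, n=n_teams)
print(round(leader_adj_r2, 6))
```

Under that assumption the result agrees with the reported leader-only value to about seven decimal places, which also supports the reading that the models were fitted on 119 teams.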

Combining both sets of predictors yields the final model:

	Estimate	Std. Error	t value	Pr(>|t|)
(Intercept)	0.00	0.07	0.00	1.00
Konj_F	0.13	0.08	1.52	0.13
Verb_F	-0.31	0.10	-3.06	0.00
Sinne_F	-0.23	0.08	-2.91	0.00
Subst_F	0.24	0.08	2.91	0.00
Biologisk_F	-0.18	0.08	-2.23	0.03
Jeg_F	-0.15	0.09	-1.78	0.08
Kausalt_F	0.14	0.08	1.71	0.09
Sixltr_F	-0.28	0.09	-3.09	0.00
Funksjon_F	0.18	0.10	1.77	0.08
Samtykke_L	-0.21	0.08	-2.71	0.01
hap_L	0.16	0.08	2.11	0.04

The adjusted R² of the final model is

[1] 0.3420889