Methods overview

Objective and Dataset Design

Data preparation

To prepare therapist and client utterances for large language model (LLM) analysis, we applied a standardized text-cleaning pipeline to remove extraneous content and normalize input across sessions. Each utterance was stripped of timestamp markers (e.g., [00:15], [1:15:30]) and non-verbal annotations such as [laughs] or (sigh). Contractions were expanded (e.g., “won’t” → “will not”) to increase clarity and compatibility with downstream models. Repeated punctuation (e.g., …, !!, ??) was reduced to single marks, and all irregular spacing was normalized. The resulting cleaned utterances (ClientClean, TherapistClean) preserved the semantic content of the original dialogue while minimizing noise, enabling more reliable summarization and classification.
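A minimal base-R sketch of the cleaning steps described above is shown below. `clean_utterance()` is a hypothetical helper. Its regular expressions and the small contraction dictionary are illustrative assumptions, not the exact rules used in the study. Ambiguous contractions such as "it's" or "I'd" are deliberately left untouched.

```r
# Hypothetical helper: a sketch of the utterance-cleaning pipeline.
# The patterns and the contraction list are illustrative assumptions.
clean_utterance <- function(x) {
  # Normalise curly apostrophes so one set of contraction patterns suffices
  x <- gsub("\u2019", "'", x, fixed = TRUE)
  # Timestamp markers such as [00:15] or [1:15:30]
  x <- gsub("\\[\\d{1,2}(:\\d{2}){1,2}\\]", " ", x, perl = TRUE)
  # Non-verbal annotations such as [laughs] or (sigh)
  x <- gsub("\\[[A-Za-z ]+\\]|\\([A-Za-z ]+\\)", " ", x, perl = TRUE)
  # Contractions, applied in order (specific forms before the generic n't rule)
  contractions <- c(
    "(?i)\\bwon't\\b" = "will not",
    "(?i)\\bcan't\\b" = "cannot",
    "(?i)n't\\b"      = " not",
    "\\bI'm\\b"       = "I am",
    "(?i)'re\\b"      = " are",
    "(?i)'ve\\b"      = " have",
    "(?i)'ll\\b"      = " will"
  )
  for (p in names(contractions)) {
    x <- gsub(p, contractions[[p]], x, perl = TRUE)
  }
  # Repeated punctuation: ellipsis character, "...", "!!", "??" -> single mark
  x <- gsub("\u2026", ".", x, fixed = TRUE)
  x <- gsub("([.!?])\\1+", "\\1", x, perl = TRUE)
  # Collapse irregular spacing and trim the ends
  trimws(gsub("\\s+", " ", x, perl = TRUE))
}

clean_utterance("[00:15] I won't   go!! [laughs] (sigh) I'm fine??")
# "I will not go! I am fine?"
```

Because the contraction patterns are case-insensitive, a sentence-initial "Won't" would become lowercase "will not" in this sketch.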

To identify key psychological shifts and communicative patterns in therapy sessions, we used an LLM (GPT-3.5-turbo) to generate rolling summaries of sequential conversational windows. We piloted 1-, 2-, 3-, and 4-turn windows. Longer windows produced more reliable summaries. The 3-turn window offered the best balance between interpretive richness and moment-to-moment precision, so we used it for all downstream analyses. The dataset was sorted chronologically within participant, and a moving window captured up to three consecutive therapist–client exchanges at each point. The model received a prompt instructing it to “Summarize the key psychological themes or shifts in this interaction. Avoid quoting. Use 1–2 sentences.” For each eligible turn, we saved the raw text window (RawText_3Turn) and the corresponding summary (Summary_3Turn). This structured, mid-level summarization provided a scalable and interpretable representation of the therapeutic process across sessions.
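The windowing step can be sketched in base R as follows. The sketch assumes a data frame with the `ID`, `Turn`, `TherapistClean`, and `ClientClean` columns listed in the summary output below. `build_windows()` and `summary_prompt()` are hypothetical helpers, and the "Therapist:"/"Client:" labels are an assumed formatting choice. The GPT-3.5-turbo call that would fill `Summary_3Turn` is not shown.

```r
# Hypothetical helper: build rolling windows of up to k consecutive
# therapist-client exchanges per participant (k = 3 in the study).
build_windows <- function(df, k = 3) {
  df <- df[order(df$ID, df$Turn), ]            # chronological order within participant
  exchange <- paste0("Therapist: ", df$TherapistClean,
                     "\nClient: ", df$ClientClean)
  df$RawText_3Turn <- vapply(seq_len(nrow(df)), function(i) {
    # rows for the same participant up to and including the current turn
    rows <- which(df$ID == df$ID[i] & seq_len(nrow(df)) <= i)
    paste(exchange[tail(rows, k)], collapse = "\n")
  }, character(1))
  df
}

# Prompt wording taken from the text; the API call itself is omitted.
summary_prompt <- function(window) {
  paste0("Summarize the key psychological themes or shifts in this interaction. ",
         "Avoid quoting. Use 1-2 sentences.\n\n", window)
}
```

Early turns simply receive shorter windows, which matches the "up to three" exchanges described above.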

Scoring transcripts

We used a three-stage Build–Filter–Train approach to classify therapy transcript segments under two distinct coding schemes. In the Build stage, each segment was scored by several methods. Claudia classified seven psychological processes (Motivation, Self, Cognition, Affect, Attention, Overt Behavior, Context) using three methods: instruction-based LLM classification (GPT-3.5-turbo) with both full and short category definitions; embedding-based similarity (text-embedding-ada-002), which compares segments to category definitions in semantic space; and rule-based keyword matching. Natasa classified five social-interpersonal categories (Relational Content Focus, Understanding, Interpersonal Effectiveness, Collaboration, Ineffective) using the same LLM and keyword methods but not embeddings, because her focus was on social and contextual dimensions rather than content. In the Filter stage, the outputs of all methods were combined and run through the Boruta feature selection algorithm separately for each task, retaining only the most reliable predictors. In the Train stage, these confirmed predictors were used to train XGBoost models that produced the final predictions. The framework thus combined the interpretive flexibility of LLMs, the semantic precision of embeddings (Claudia's scheme only), and the transparency of rule-based methods in a single, data-driven classification pipeline.
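The two non-LLM scoring methods can be sketched in a few lines of base R. Both functions are hypothetical illustrations. `cosine_sim()` shows how a segment embedding could be compared with category-definition embeddings. Real text-embedding-ada-002 vectors have 1536 dimensions, so the toy 3-dimensional vectors here are an assumption for readability. `keyword_score()` counts whole-word keyword hits, and its invented keyword list stands in for the study's dictionaries.

```r
# Cosine similarity between a segment embedding and a category-definition embedding
cosine_sim <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# Hypothetical rule: count whole-word keyword hits in a segment.
# Keywords are assumed to be plain words (no regex metacharacters).
keyword_score <- function(text, keywords) {
  text <- tolower(text)
  hits <- vapply(keywords, function(k) {
    m <- gregexpr(paste0("\\b", tolower(k), "\\b"), text, perl = TRUE)
    length(regmatches(text, m)[[1]])
  }, integer(1))
  sum(hits)
}

# Toy "embeddings" for two category definitions and one segment
defs <- rbind(Affect = c(1, 0, 0), Cognition = c(0, 1, 0))
segment <- c(0.9, 0.1, 0)
sims <- apply(defs, 1, cosine_sim, b = segment)
names(which.max(sims))   # nearest category definition in embedding space
keyword_score("We can work on this together, together.", c("together", "we"))
```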

Data screening: excluding rules

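Rule-based keyword matching did not work on the summaries: every summary-based rule score was 0, except for Relational Content Focus. The rules did produce non-zero scores on the raw text. Summary-based rule models were therefore excluded from further analysis.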

summary(QdataSocial)
##        ID            Turn        Therapy friendly    Dimension1       
##  Min.   : 1.0   Min.   : 1.000   Length:707         Length:707        
##  1st Qu.:13.0   1st Qu.: 4.000   Class :character   Class :character  
##  Median :26.0   Median : 7.000   Mode  :character   Mode  :character  
##  Mean   :26.6   Mean   : 7.306                                        
##  3rd Qu.:40.0   3rd Qu.:11.000                                        
##  Max.   :53.0   Max.   :20.000                                        
##                                                                       
##   Dimension2        Dimension 3                           Social dimension
##  Length:707         Length:707         Collaboration              : 93    
##  Class :character   Class :character   Ineffective                :110    
##  Mode  :character   Mode  :character   Interpersonal effectiveness:124    
##                                        None explictely simulated  :256    
##                                        Understanding              :124    
##                                                                           
##                                                                           
##  Presenting problem Intervention Strategy Therapist approach
##  Length:707         Length:707            Length:707        
##  Class :character   Class :character      Class :character  
##  Mode  :character   Mode  :character      Mode  :character  
##                                                             
##                                                             
##                                                             
##                                                             
##  Therapist example language Intended effects      Client         
##  Length:707                 Length:707         Length:707        
##  Class :character           Class :character   Class :character  
##  Mode  :character           Mode  :character   Mode  :character  
##                                                                  
##                                                                  
##                                                                  
##                                                                  
##   Therapist         ClientClean        TherapistClean     Summary_3Turn     
##  Length:707         Length:707         Length:707         Length:707        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  RawText_3Turn      Summary_3Turn_RelationalContentFocus_rule
##  Length:707         Min.   :0.0000                           
##  Class :character   1st Qu.:0.0000                           
##  Mode  :character   Median :0.0000                           
##                     Mean   :0.3508                           
##                     3rd Qu.:0.0000                           
##                     Max.   :4.0000                           
##                                                              
##  Summary_3Turn_Understanding_rule Summary_3Turn_InterpersonalEffectiveness_rule
##  Min.   :0                        Min.   :0                                    
##  1st Qu.:0                        1st Qu.:0                                    
##  Median :0                        Median :0                                    
##  Mean   :0                        Mean   :0                                    
##  3rd Qu.:0                        3rd Qu.:0                                    
##  Max.   :0                        Max.   :0                                    
##                                                                                
##  Summary_3Turn_Collaboration_rule Summary_3Turn_Ineffective_rule
##  Min.   :0                        Min.   :0                     
##  1st Qu.:0                        1st Qu.:0                     
##  Median :0                        Median :0                     
##  Mean   :0                        Mean   :0                     
##  3rd Qu.:0                        3rd Qu.:0                     
##  Max.   :0                        Max.   :0                     
##                                                                 
##  Summary_3Turn_status_rule RawText_3Turn_RelationalContentFocus_rule
##  Length:707                Min.   :0.0000                           
##  Class :character          1st Qu.:0.0000                           
##  Mode  :character          Median :0.0000                           
##                            Mean   :0.4328                           
##                            3rd Qu.:0.0000                           
##                            Max.   :4.0000                           
##                                                                     
##  RawText_3Turn_Understanding_rule RawText_3Turn_InterpersonalEffectiveness_rule
##  Min.   : 0.00                    Min.   :0.0000                               
##  1st Qu.: 0.00                    1st Qu.:0.0000                               
##  Median : 0.00                    Median :0.0000                               
##  Mean   : 1.38                    Mean   :0.1075                               
##  3rd Qu.: 2.00                    3rd Qu.:0.0000                               
##  Max.   :10.00                    Max.   :2.0000                               
##                                                                                
##  RawText_3Turn_Collaboration_rule RawText_3Turn_Ineffective_rule
##  Min.   :0.00000                  Min.   :0.0000                
##  1st Qu.:0.00000                  1st Qu.:0.0000                
##  Median :0.00000                  Median :0.0000                
##  Mean   :0.09052                  Mean   :0.1188                
##  3rd Qu.:0.00000                  3rd Qu.:0.0000                
##  Max.   :6.00000                  Max.   :4.0000                
##                                                                 
##  RawText_3Turn_status_rule Client_RelationalContentFocus_rule
##  Length:707                Min.   : 0.0000                   
##  Class :character          1st Qu.: 0.0000                   
##  Mode  :character          Median : 0.0000                   
##                            Mean   : 0.5021                   
##                            3rd Qu.: 0.0000                   
##                            Max.   :10.0000                   
##                            NA's   :2                         
##  Client_Understanding_rule Client_InterpersonalEffectiveness_rule
##  Min.   : 0.0000           Min.   : 0.0000                       
##  1st Qu.: 0.0000           1st Qu.: 0.0000                       
##  Median : 0.0000           Median : 0.0000                       
##  Mean   : 0.5702           Mean   : 0.4482                       
##  3rd Qu.: 0.0000           3rd Qu.: 0.0000                       
##  Max.   :10.0000           Max.   :10.0000                       
##  NA's   :2                 NA's   :2                             
##  Client_Collaboration_rule Client_Ineffective_rule Client_status_rule
##  Min.   : 0.0000           Min.   : 0.0000         Length:707        
##  1st Qu.: 0.0000           1st Qu.: 0.0000         Class :character  
##  Median : 0.0000           Median : 0.0000         Mode  :character  
##  Mean   : 0.2014           Mean   : 0.7773                           
##  3rd Qu.: 0.0000           3rd Qu.: 0.0000                           
##  Max.   :10.0000           Max.   :10.0000                           
##  NA's   :2                 NA's   :2                                 
##  Therapist_RelationalContentFocus_rule Therapist_Understanding_rule
##  Min.   :0.0000                        Min.   : 0.000              
##  1st Qu.:0.0000                        1st Qu.: 0.000              
##  Median :0.0000                        Median : 0.000              
##  Mean   :0.3914                        Mean   : 1.099              
##  3rd Qu.:0.0000                        3rd Qu.: 2.000              
##  Max.   :6.0000                        Max.   :10.000              
##  NA's   :12                            NA's   :12                  
##  Therapist_InterpersonalEffectiveness_rule Therapist_Collaboration_rule
##  Min.   :0.0000                            Min.   :0.0000              
##  1st Qu.:0.0000                            1st Qu.:0.0000              
##  Median :0.0000                            Median :0.0000              
##  Mean   :0.4777                            Mean   :0.1065              
##  3rd Qu.:0.0000                            3rd Qu.:0.0000              
##  Max.   :6.0000                            Max.   :6.0000              
##  NA's   :12                                NA's   :12                  
##  Therapist_Ineffective_rule Therapist_status_rule
##  Min.   :0.0000             Length:707           
##  1st Qu.:0.0000             Class :character     
##  Median :0.0000             Mode  :character     
##  Mean   :0.3712                                  
##  3rd Qu.:0.0000                                  
##  Max.   :6.0000                                  
##  NA's   :12                                      
##  Summary_3Turn_RelationalContentFocus_full Summary_3Turn_Understanding_full
##  Min.   : 3.000                            Min.   : 6.500                  
##  1st Qu.: 8.500                            1st Qu.: 9.000                  
##  Median : 8.500                            Median : 9.000                  
##  Mean   : 8.544                            Mean   : 8.868                  
##  3rd Qu.: 8.500                            3rd Qu.: 9.000                  
##  Max.   :10.000                            Max.   :10.000                  
##                                                                            
##  Summary_3Turn_Interpersonaleffectiveness_full Summary_3Turn_Collaboration_full
##  Min.   :6.000                                 Min.   :3.000                   
##  1st Qu.:7.500                                 1st Qu.:8.000                   
##  Median :9.000                                 Median :8.000                   
##  Mean   :8.571                                 Mean   :8.041                   
##  3rd Qu.:9.500                                 3rd Qu.:9.000                   
##  Max.   :9.500                                 Max.   :9.500                   
##                                                                                
##  Summary_3Turn_Ineffective_full Summary_3Turn_RelationalContentFocus_short
##  Min.   :0.000                  Min.   : 8.000                            
##  1st Qu.:2.000                  1st Qu.: 8.000                            
##  Median :2.000                  Median : 8.500                            
##  Mean   :1.917                  Mean   : 8.498                            
##  3rd Qu.:2.000                  3rd Qu.: 9.000                            
##  Max.   :9.000                  Max.   :10.000                            
##                                                                           
##  Summary_3Turn_Understanding_short
##  Min.   : 6.500                   
##  1st Qu.: 8.000                   
##  Median : 9.000                   
##  Mean   : 8.579                   
##  3rd Qu.: 9.000                   
##  Max.   :10.000                   
##                                   
##  Summary_3Turn_Interpersonaleffectiveness_short
##  Min.   :6.000                                 
##  1st Qu.:7.000                                 
##  Median :7.000                                 
##  Mean   :7.422                                 
##  3rd Qu.:7.500                                 
##  Max.   :9.500                                 
##                                                
##  Summary_3Turn_Collaboration_short Summary_3Turn_Ineffective_short
##  Min.   :4.000                     Min.   :1.000                  
##  1st Qu.:8.000                     1st Qu.:2.000                  
##  Median :8.000                     Median :2.000                  
##  Mean   :8.056                     Mean   :2.295                  
##  3rd Qu.:8.500                     3rd Qu.:3.000                  
##  Max.   :9.500                     Max.   :5.000                  
##                                                                   
##  RawText_3Turn_RelationalContentFocus_full RawText_3Turn_Understanding_full
##  Min.   : 0.000                            Min.   : 0.000                  
##  1st Qu.: 8.000                            1st Qu.: 8.500                  
##  Median : 8.000                            Median : 9.000                  
##  Mean   : 8.088                            Mean   : 8.544                  
##  3rd Qu.: 9.000                            3rd Qu.: 9.500                  
##  Max.   :10.000                            Max.   :10.000                  
##                                                                            
##  RawText_3Turn_Interpersonaleffectiveness_full RawText_3Turn_Collaboration_full
##  Min.   : 0.000                                Min.   : 0.000                  
##  1st Qu.: 7.500                                1st Qu.: 7.500                  
##  Median : 9.000                                Median : 8.000                  
##  Mean   : 8.176                                Mean   : 7.574                  
##  3rd Qu.: 9.500                                3rd Qu.: 9.000                  
##  Max.   :10.000                                Max.   :10.000                  
##                                                                                
##  RawText_3Turn_Ineffective_full RawText_3Turn_RelationalContentFocus_short
##  Min.   : 0.000                 Min.   : 2.000                            
##  1st Qu.: 0.000                 1st Qu.: 8.000                            
##  Median : 0.000                 Median : 8.000                            
##  Mean   : 1.439                 Mean   : 8.073                            
##  3rd Qu.: 2.000                 3rd Qu.: 8.000                            
##  Max.   :10.000                 Max.   :10.000                            
##                                                                           
##  RawText_3Turn_Understanding_short
##  Min.   : 3.000                   
##  1st Qu.: 8.500                   
##  Median : 9.000                   
##  Mean   : 8.588                   
##  3rd Qu.: 9.000                   
##  Max.   :10.000                   
##                                   
##  RawText_3Turn_Interpersonaleffectiveness_short
##  Min.   : 3.000                                
##  1st Qu.: 7.000                                
##  Median : 8.000                                
##  Mean   : 7.857                                
##  3rd Qu.: 9.000                                
##  Max.   :10.000                                
##                                                
##  RawText_3Turn_Collaboration_short RawText_3Turn_Ineffective_short
##  Min.   : 1.000                    Min.   :0.000                  
##  1st Qu.: 7.000                    1st Qu.:2.000                  
##  Median : 8.000                    Median :2.000                  
##  Mean   : 7.718                    Mean   :2.187                  
##  3rd Qu.: 9.000                    3rd Qu.:2.000                  
##  Max.   :10.000                    Max.   :9.000                  
##                                                                   
##  Client_RelationalContentFocus_full Client_Understanding_full
##  Min.   : 0.000                     Min.   : 0.000           
##  1st Qu.: 8.000                     1st Qu.: 8.000           
##  Median : 8.500                     Median : 9.000           
##  Mean   : 7.961                     Mean   : 8.349           
##  3rd Qu.: 8.500                     3rd Qu.: 9.000           
##  Max.   :10.000                     Max.   :10.000           
##                                                              
##  Client_Interpersonaleffectiveness_full Client_Collaboration_full
##  Min.   : 0.000                         Min.   : 0.000           
##  1st Qu.: 7.000                         1st Qu.: 6.000           
##  Median : 7.500                         Median : 6.000           
##  Mean   : 7.278                         Mean   : 6.098           
##  3rd Qu.: 7.500                         3rd Qu.: 7.500           
##  Max.   :10.000                         Max.   :10.000           
##                                                                  
##  Client_Ineffective_full Client_RelationalContentFocus_short
##  Min.   :0.000           Min.   : 0.000                     
##  1st Qu.:2.000           1st Qu.: 8.000                     
##  Median :2.000           Median : 8.000                     
##  Mean   :3.076           Mean   : 8.061                     
##  3rd Qu.:3.000           3rd Qu.: 8.000                     
##  Max.   :9.500           Max.   :10.000                     
##                                                             
##  Client_Understanding_short Client_Interpersonaleffectiveness_short
##  Min.   : 0.00              Min.   :0.000                          
##  1st Qu.: 7.00              1st Qu.:6.000                          
##  Median : 7.50              Median :7.000                          
##  Mean   : 7.67              Mean   :6.401                          
##  3rd Qu.: 9.00              3rd Qu.:7.000                          
##  Max.   :10.00              Max.   :9.000                          
##                                                                    
##  Client_Collaboration_short Client_Ineffective_short
##  Min.   :0.000              Min.   :0.000           
##  1st Qu.:5.000              1st Qu.:3.000           
##  Median :6.000              Median :3.000           
##  Mean   :5.583              Mean   :3.475           
##  3rd Qu.:6.000              3rd Qu.:3.000           
##  Max.   :9.000              Max.   :9.000           
##                                                     
##  Therapist_RelationalContentFocus_full Therapist_Understanding_full
##  Min.   :0.000                         Min.   : 0.00               
##  1st Qu.:8.500                         1st Qu.: 9.00               
##  Median :8.500                         Median : 9.00               
##  Mean   :8.149                         Mean   : 8.61               
##  3rd Qu.:8.500                         3rd Qu.: 9.00               
##  Max.   :9.000                         Max.   :10.00               
##                                                                    
##  Therapist_Interpersonaleffectiveness_full Therapist_Collaboration_full
##  Min.   :0.000                             Min.   :0.000               
##  1st Qu.:7.500                             1st Qu.:6.500               
##  Median :7.500                             Median :8.000               
##  Mean   :8.081                             Mean   :7.474               
##  3rd Qu.:9.500                             3rd Qu.:9.000               
##  Max.   :9.500                             Max.   :9.500               
##                                                                        
##  Therapist_Ineffective_full Therapist_RelationalContentFocus_short
##  Min.   :0.000              Min.   : 0.000                        
##  1st Qu.:1.000              1st Qu.: 8.000                        
##  Median :2.000              Median : 8.000                        
##  Mean   :1.731              Mean   : 7.946                        
##  3rd Qu.:2.000              3rd Qu.: 8.000                        
##  Max.   :9.000              Max.   :10.000                        
##                                                                   
##  Therapist_Understanding_short Therapist_Interpersonaleffectiveness_short
##  Min.   : 0.000                Min.   :0.000                             
##  1st Qu.: 7.500                1st Qu.:7.000                             
##  Median : 9.000                Median :7.000                             
##  Mean   : 8.219                Mean   :7.178                             
##  3rd Qu.: 9.000                3rd Qu.:7.500                             
##  Max.   :10.000                Max.   :9.500                             
##                                                                          
##  Therapist_Collaboration_short Therapist_Ineffective_short
##  Min.   :0.000                 Min.   :0.000              
##  1st Qu.:6.000                 1st Qu.:2.000              
##  Median :7.000                 Median :2.000              
##  Mean   :7.195                 Mean   :2.214              
##  3rd Qu.:8.500                 3rd Qu.:2.500              
##  Max.   :9.500                 Max.   :9.000              
##                                                           
##  Summary_3Turn_status_full Summary_3Turn_status_short RawText_3Turn_status_full
##  Length:707                Length:707                 Length:707               
##  Class :character          Class :character           Class :character         
##  Mode  :character          Mode  :character           Mode  :character         
##                                                                                
##                                                                                
##                                                                                
##                                                                                
##  RawText_3Turn_status_short Client_status_full Client_status_short
##  Length:707                 Length:707         Length:707         
##  Class :character           Class :character   Class :character   
##  Mode  :character           Mode  :character   Mode  :character   
##                                                                   
##                                                                   
##                                                                   
##                                                                   
##  Therapist_status_full Therapist_status_short    MasterID    
##  Length:707            Length:707             Min.   :  1.0  
##  Class :character      Class :character       1st Qu.:177.5  
##  Mode  :character      Mode  :character       Median :354.0  
##                                               Mean   :354.0  
##                                               3rd Qu.:530.5  
##                                               Max.   :707.0  
##                                                              
##  Dimension1_clean   label_motivation label_cognition    label_self    
##  Length:707         Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  Class :character   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Mode  :character   Median :0.0000   Median :0.0000   Median :0.0000  
##                     Mean   :0.1202   Mean   :0.1584   Mean   :0.1726  
##                     3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.0000  
##                     Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##                                                                       
##   label_affect    label_attention  label_overt_behavior label_context   
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000       Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000       1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000   Median :0.0000       Median :0.0000  
##  Mean   :0.1457   Mean   :0.1443   Mean   :0.1287       Mean   :0.1301  
##  3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.0000       3rd Qu.:0.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000       Max.   :1.0000  
##                                                                         
##      Social_binary SocialDim_Collaboration SocialDim_Ineffective
##  Other      :  0   Min.   :0.0000          Min.   :0.0000       
##  Ineffective:110   1st Qu.:0.0000          1st Qu.:0.0000       
##  NA's       :597   Median :0.0000          Median :0.0000       
##                    Mean   :0.1315          Mean   :0.1556       
##                    3rd Qu.:0.0000          3rd Qu.:0.0000       
##                    Max.   :1.0000          Max.   :1.0000       
##                                                                 
##  SocialDim_Interpersonaleffectiveness SocialDim_Noneexplictelysimulated
##  Min.   :0.0000                       Min.   :0.0000                   
##  1st Qu.:0.0000                       1st Qu.:0.0000                   
##  Median :0.0000                       Median :0.0000                   
##  Mean   :0.1754                       Mean   :0.3621                   
##  3rd Qu.:0.0000                       3rd Qu.:1.0000                   
##  Max.   :1.0000                       Max.   :1.0000                   
##                                                                        
##  SocialDim_Understanding
##  Min.   :0.0000         
##  1st Qu.:0.0000         
##  Median :0.0000         
##  Mean   :0.1754         
##  3rd Qu.:0.0000         
##  Max.   :1.0000         
## 
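The screening decision described above can be checked directly on the data. The sketch below uses a hypothetical helper, `zero_rule_columns()`, that lists numeric `_rule` columns whose non-missing values are all zero. Applied to `QdataSocial`, it should return the four `Summary_3Turn_*_rule` columns (Understanding, InterpersonalEffectiveness, Collaboration, Ineffective) whose minimum and maximum are both 0 in the output above.

```r
# Flag numeric rule-score columns whose non-missing values are all zero.
# The "_rule" suffix follows the column names in QdataSocial; character
# status columns are skipped, and all-NA columns are also flagged.
zero_rule_columns <- function(df) {
  rule_cols <- grep("_rule$", names(df), value = TRUE)
  is_zero <- vapply(df[rule_cols], function(v) {
    is.numeric(v) && all(v[!is.na(v)] == 0)
  }, logical(1))
  names(is_zero)[is_zero]
}
```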

Boruta Explained (FILTER STAGE)

To identify the most reliable predictors for each social-interpersonal coding category, we applied the Boruta feature selection algorithm to five target outcomes:

  • SocialDim_Collaboration
  • SocialDim_Ineffective
  • SocialDim_Interpersonaleffectiveness
  • SocialDim_Noneexplictelysimulated
  • SocialDim_Understanding
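The five targets are one-vs-rest indicators of the `Social dimension` column: their means in the summary output match the level counts, for example 93/707 ≈ 0.1315 for Collaboration. A minimal base-R sketch of how such indicators could be built is shown below. `make_social_targets()` is a hypothetical helper. Stripping non-letters from each level is an assumption that reproduces the listed names, so `None explictely simulated` becomes `SocialDim_Noneexplictelysimulated`.

```r
# Hypothetical reconstruction: one binary indicator per level of the
# "Social dimension" column, named SocialDim_<level with non-letters removed>.
make_social_targets <- function(df, col = "Social dimension") {
  lv <- sort(unique(as.character(df[[col]])))   # sort() drops NA levels
  for (l in lv) {
    df[[paste0("SocialDim_", gsub("[^A-Za-z]", "", l))]] <-
      as.integer(as.character(df[[col]]) == l)
  }
  df
}
```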

For each of these variables, Boruta was run separately, with all available model-generated features (LLM-based scores, rule-based keyword matching, etc.) as candidate predictors. The algorithm compares the importance of each real predictor with that of “shadow” variables, which are randomly shuffled copies of the features. Predictors are retained only if they consistently outperform this noise baseline.

Each feature is classified as:

  • Confirmed — consistently more important than any shadow (strong predictor)
  • Tentative — borderline importance (may be useful)
  • Rejected — weaker than noise (excluded from modeling)

This process ensures that only robust predictors enter the next modeling stage.

What is a shadow feature?

A shadow feature is a shuffled version of a real predictor. It has no actual relationship to the outcome, so it acts as a baseline for what “random importance” looks like. Boruta tests whether each real predictor consistently performs better than the best-performing shadow. This allows it to confidently reject weak or noisy features.
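The shadow-feature logic can be illustrated with a toy base-R sketch. This is not the `Boruta` package. Importance is approximated here by the absolute correlation with the outcome rather than random forest importance. The decision rule is also simplified: it is a binomial test against chance, without Boruta's multiple-comparison correction or its iterative removal of rejected features. `shadow_screen()` is a hypothetical name. In practice, the `Boruta` R package implements the full procedure, for example `Boruta()` followed by `getSelectedAttributes()`.

```r
# Toy illustration of shadow-feature screening (not the Boruta package).
# Importance = |correlation| with the outcome, a stand-in for random forest
# importance; features are assumed to be numeric and non-constant.
shadow_screen <- function(X, y, n_iter = 50, alpha = 0.01,
                          importance = function(x, y) abs(cor(x, y))) {
  hits <- integer(ncol(X))
  for (it in seq_len(n_iter)) {
    shadows <- lapply(X, sample)                     # shuffle each column independently
    real_imp <- vapply(X, importance, numeric(1), y = y)
    shadow_imp <- vapply(shadows, importance, numeric(1), y = y)
    hits <- hits + (real_imp > max(shadow_imp))      # a "hit" beats the best shadow
  }
  # Simplified binomial tests of the hit count against chance (p = 0.5)
  p_high <- pbinom(hits - 1, n_iter, 0.5, lower.tail = FALSE)  # P(H >= hits)
  p_low  <- pbinom(hits, n_iter, 0.5)                          # P(H <= hits)
  decision <- ifelse(p_high < alpha, "Confirmed",
                     ifelse(p_low < alpha, "Rejected", "Tentative"))
  setNames(decision, names(X))
}
```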


Results summary

🔍 By Text Source

  • RawText produced the highest number of confirmed predictors (49), indicating strong signal in raw therapist–client text.
  • Therapist features contributed 14 confirmed predictors.
  • Summary features contributed 12 confirmed predictors.
  • Client features performed weakly, with only 1 confirmed predictor and 71 rejected. This is perhaps unsurprising, because the categories being predicted focus on therapist behavior, so in a way this result serves as a validation check.

🔍 By Modeling Method

  • Full-prompt LLM models had the most confirmed features (n = 40).
  • Short-prompt LLM yielded 23 confirmed.
  • Rule-based methods contributed 13 confirmed but also had the most rejected features (n = 77), suggesting lower signal-to-noise.
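The tallies by text source and by modeling method can be recomputed from Boruta's output once it is in long format. The sketch below assumes a hypothetical data frame with one row per feature and target, with columns `feature` and `decision`. It relies on the feature naming convention visible in `QdataSocial`: the text source comes before the first underscore, and the method comes after the last. `tally_decisions()` is a hypothetical helper.

```r
# Hypothetical tally of Boruta decisions by text source and by method.
tally_decisions <- function(res) {
  source <- sub("_.*", "", res$feature)   # e.g. "RawText", "Summary", "Client", "Therapist"
  method <- sub(".*_", "", res$feature)   # e.g. "full", "short", "rule"
  list(by_source = table(source = source, decision = res$decision),
       by_method = table(method = method, decision = res$decision))
}
```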

🔍 By Target Category

All five target categories produced meaningful results:

  • SocialDim_Ineffective: 15 confirmed predictors
  • SocialDim_Noneexplictelysimulated: 14 confirmed. Segments in this category mostly show effective behavior, because both the generation instructions and the default mode of LLM generation are interpersonally validating and effective.
  • SocialDim_Interpersonaleffectiveness: 11 confirmed
  • SocialDim_Understanding: 10 confirmed
  • SocialDim_Collaboration: 6 confirmed

This suggests that ineffective behaviors and segments with no explicitly simulated dimension may be the easiest to predict, while collaborative behaviors may be harder to isolate.


These patterns highlight the strength of raw textual features and LLM-based classifications (especially full-prompt) in capturing interpersonal and contextual nuances. In contrast, rule-based methods are more likely to introduce noise or ambiguity.

The interactive tables below break down the Boruta results by text source, model type, and coding category.

Key Boruta table (FILTER STAGE)

This table lists every predictor evaluated in the Boruta feature selection, with one row per predictor. The columns identify:

- the text type the feature came from (e.g., raw or summarised transcript);
- the turn window in the dialogue;
- the feature dimension it belongs to;
- the model or method that produced it.

For each target category, the table displays the predictor's Boruta importance score, colour-coded by its selection decision (confirmed, tentative, or rejected). The table is interactive and sortable, so readers can filter or arrange predictors by source, position in the transcript, feature grouping, method of generation, or category-specific importance.

XGBoost Explained (PREDICT STAGE)

XGBoost is a machine learning method that builds a series of decision trees, where each new tree tries to fix the mistakes made by the previous ones (Friedman, 2001; Chen & Guestrin, 2016). Because it adds trees one at a time and learns from errors as it goes, it can model very complex patterns, including situations where predictors interact in non-obvious ways.

XGBoost is designed to be fast, to work well with large numbers of predictors, and to resist overfitting (learning the training data so closely that the model performs poorly on new data). It resists overfitting in three ways:

- limiting tree depth;
- using random subsets of the data and predictors for each tree;
- slowing down learning so that each step makes only a small adjustment.
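The boosting idea can be illustrated with a toy version in pure Python. This is not XGBoost itself. Each tree is a single split (depth 1, rather than the max_depth = 3 used here), and there are no L1 penalties, minimum-loss thresholds, or other production features.

It does follow XGBoost's recipe for logistic loss:

- gradients and hessians of the log loss;
- leaf weights of -G/(H + lambda);
- the split-gain formula;
- row and column subsampling;
- shrinkage by eta.

All function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, g, h, rows, cols, lam):
    """Pick the depth-1 split with the largest XGBoost-style gain.

    Returns (feature, threshold, left_weight, right_weight), or None if no split exists.
    """
    G = sum(g[i] for i in rows)
    H = sum(h[i] for i in rows)
    best, best_gain = None, float("-inf")
    for j in cols:
        for t in sorted({X[i][j] for i in rows}):
            left = [i for i in rows if X[i][j] <= t]
            if len(left) == len(rows):
                continue  # every row on one side: not a real split
            GL = sum(g[i] for i in left)
            HL = sum(h[i] for i in left)
            GR, HR = G - GL, H - HL
            gain = 0.5 * (GL**2 / (HL + lam) + GR**2 / (HR + lam) - G**2 / (H + lam))
            if gain > best_gain:
                best_gain = gain
                best = (j, t, -GL / (HL + lam), -GR / (HR + lam))
    return best


def predict_raw(trees, x):
    """Sum of (already eta-scaled) leaf weights: the model's log-odds."""
    return sum(wl if x[j] <= t else wr for j, t, wl, wr in trees)


def boost(X, y, n_rounds=30, eta=0.3, subsample=0.8, colsample=0.8, lam=1.0, seed=0):
    rng = random.Random(seed)
    n, d = len(y), len(X[0])
    trees = []
    for _ in range(n_rounds):
        p = [sigmoid(predict_raw(trees, x)) for x in X]
        g = [p[i] - y[i] for i in range(n)]        # gradient of log loss
        h = [p[i] * (1 - p[i]) for i in range(n)]  # hessian of log loss
        rows = rng.sample(range(n), max(1, int(subsample * n)))  # row subsampling
        cols = rng.sample(range(d), max(1, int(colsample * d)))  # column subsampling
        stump = fit_stump(X, g, h, rows, cols, lam)
        if stump is None:
            break
        j, t, wl, wr = stump
        # Shrinkage: each tree moves the scores only a fraction eta of the way.
        trees.append((j, t, eta * wl, eta * wr))
    return trees


def predict_proba(trees, x):
    return sigmoid(predict_raw(trees, x))
```

Each round fits a new stump to the current gradients, which is the "fix the previous trees' mistakes" step described above. Because eta is below 1, no single tree can move the predictions far.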

Procedure. For each outcome we wanted to predict (e.g., Motivation), we started with the features Boruta had marked as “confirmed” for that outcome and built a dataset using only those predictors. We trained an XGBoost model with settings designed to balance complexity and generalization: tree depth limited to 3 (max_depth = 3), a modest learning rate of 0.05 (eta = 0.05), sampling 80% of the data for each tree (subsample = 0.8), and sampling 80% of the predictors for each tree (colsample_bytree = 0.8). The model’s objective was binary logistic classification (objective = “binary:logistic”) and performance was tracked using the area under the ROC curve (eval_metric = “auc”).
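The AUC tracked during training has a useful interpretation. It equals the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counting as one half. The small pure-Python sketch below implements that definition. The function name `auc` is hypothetical, and the pairwise loop is written for clarity, not speed.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

An AUC of 0.5 means the scores rank cases no better than chance. An AUC of 1.0 means every positive case outranks every negative case.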

To make sure the model wasn’t just memorizing the training data, we used 5-fold cross-validation. We split the data into five roughly equal parts, trained the model on four parts, and tested it on the part left out. We repeated this process five times so that every part of the data was used once for testing. We then averaged the results from the five test runs to get a more reliable estimate of how the model would perform on new data. Because the model is always evaluated on data it did not see during training, cross-validation guards against an overly optimistic estimate of performance.

We also used early stopping: training ended if performance (AUC) did not improve for 10 consecutive rounds. This meant the model stopped adding trees once they no longer helped, which keeps it from becoming too complex.
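Below is a minimal pure-Python sketch of the fold assignment and the early-stopping rule described above. The function names are hypothetical. One further assumption is not confirmed by the source: as in an `xgb.cv`-style setup, `val_scores` would hold the held-out AUC for each boosting round, averaged over the five folds.

```python
import random


def kfold_indices(n, k=5, seed=0):
    """Shuffle row indices and deal them into k folds; return (train, test) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    splits = []
    for f in range(k):
        train = [i for other, fold in enumerate(folds) if other != f for i in fold]
        splits.append((train, folds[f]))
    return splits


def early_stopping_round(val_scores, patience=10):
    """Index of the best round, scanning until `patience` rounds pass without improvement."""
    best_round, best = 0, float("-inf")
    for r, score in enumerate(val_scores):
        if score > best:
            best_round, best = r, score
        elif r - best_round >= patience:
            break  # no improvement for `patience` rounds: stop adding trees
    return best_round
```

With 707 cases, for example, the five test folds hold 141 or 142 cases each, and every case is tested exactly once.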

Model performance was evaluated using accuracy, a confusion matrix (showing correct and incorrect predictions), the AUC, and a ranked list of the most important predictors based on the model’s internal feature importance scores.

References

Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16), San Francisco, CA, USA. https://doi.org/10.1145/2939672.2939785

Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), 1189–1232. https://doi.org/10.1214/aos/1013203451


Ineffective (PREDICT)

Key question: did the therapist show behavior that was collaborative, understanding, or interpersonally effective? Distinguishing among these positive behaviors may be hard, but distinguishing positive from negative behavior should be easy.


What the model outputs mean

  1. Boruta Feature Importance Plot
    A horizontal bar chart of the Boruta importance scores for the target's predictors. Bars are ordered from most to least important and color-coded by decision (confirmed, tentative, rejected). The chart shows which inputs carried the strongest signal before model training; only the confirmed predictors go on to the XGBoost stage.
  2. Final XGBoost Feature-Importance Table
    A ranked table of the predictors kept in the final model with their importance values (gain).

    What “feature importance (gain)” means: Each time the model splits on a feature, it reduces the training loss a little. Gain is that reduction, summed over every split on the feature across all of the model’s trees. It is then scaled so the values add up to 1 (up to rounding) across features. A larger number means that feature contributed more to the model’s predictions. (Use this as a guide to which features mattered most.)
  3. Confusion Matrix and Metrics
    A table that compares predictions with the true labels, plus the following statistics:
    • Accuracy: Share of all predictions that were correct.
    • 95% CI: Likely range for the true accuracy.
    • No Information Rate (NIR): Accuracy from always guessing the most common class.
    • p[Acc > NIR]: Tests if accuracy beats the NIR (smaller is better).
    • Kappa: Agreement beyond chance (higher is better).
    • McNemar’s p: Checks if false positives vs. false negatives are imbalanced.
    • Sensitivity: True positive rate.
    • Specificity: True negative rate.
    • Positive Predictive Value (Precision): Of predicted positives, how many were truly positive.
    • Negative Predictive Value: Of predicted negatives, how many were truly negative.
    • Prevalence: Share of positives in the data.
    • Detection Rate: Share of the whole dataset correctly identified as positive.
    • Detection Prevalence: Share predicted as positive.
    • Balanced Accuracy: Average of sensitivity and specificity (good for imbalanced classes).
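The statistics listed above can all be computed from the four cells of the 2×2 confusion matrix. The sketch below is a hypothetical pure-Python helper, not caret's code. It uses the same definitions that caret's `confusionMatrix` output appears to follow:

- the class shown as 'Positive' Class (here 0) is treated as positive;
- kappa uses chance agreement from the row and column totals;
- McNemar's test uses the continuity correction;
- p[Acc > NIR] is a one-sided exact binomial test against the no-information rate.

The exact confidence interval for accuracy is omitted.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Confusion-matrix statistics, treating one class as "positive".

    tp: positive cases predicted positive   fn: positive cases predicted negative
    fp: negative cases predicted positive   tn: negative cases predicted negative
    """
    n = tp + fn + fp + tn
    correct = tp + tn
    acc = correct / n
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    prevalence = (tp + fn) / n             # share of reference positives
    detection_prev = (tp + fp) / n         # share predicted positive
    nir = max(prevalence, 1 - prevalence)  # accuracy of always guessing the majority class

    # Cohen's kappa: agreement beyond what the marginal totals imply by chance.
    p_chance = detection_prev * prevalence + (1 - detection_prev) * (1 - prevalence)
    kappa = (acc - p_chance) / (1 - p_chance)

    # One-sided exact binomial test: P(X >= correct) if the true accuracy were the NIR.
    p_acc_gt_nir = sum(
        math.comb(n, i) * nir**i * (1 - nir) ** (n - i) for i in range(correct, n + 1)
    )

    # McNemar's chi-square (1 df, continuity-corrected) on the off-diagonal cells.
    if fn + fp > 0:
        chi2 = (abs(fn - fp) - 1) ** 2 / (fn + fp)
        mcnemar_p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    else:
        mcnemar_p = float("nan")

    return {
        "accuracy": acc,
        "no_information_rate": nir,
        "p_acc_gt_nir": p_acc_gt_nir,
        "kappa": kappa,
        "mcnemar_p": mcnemar_p,
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "prevalence": prevalence,
        "detection_rate": tp / n,
        "detection_prevalence": detection_prev,
        "balanced_accuracy": (sens + spec) / 2,
    }
```

Consider the Ineffective model's cells: 590 and 7 in the first reference column, 33 and 77 in the second. Passing them as `binary_metrics(tp=590, fn=7, fp=33, tn=77)` reproduces that model's printed statistics:

- accuracy ≈ 0.9434
- kappa ≈ 0.7617
- balanced accuracy ≈ 0.8441
- McNemar p ≈ 7.7e-05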

Material for results section

We trained a model to predict when therapist–client interactions would be classified as ineffective, using features that focused specifically on signs of ineffective behavior. The model was tested on 707 examples and achieved a high overall accuracy of 94.3%, meaning it gave the correct answer in about 94 out of every 100 cases. The 95% confidence interval for this accuracy ranged from 92.4% to 95.9%, so we can be quite confident that the true accuracy falls within that range. This is well above the no-information rate of 84.4%, the accuracy obtained by always predicting the most common class (p < .001).

When the actual label was “effective” (coded as 0), the model correctly predicted this 590 times and incorrectly called it “ineffective” only 7 times. This gives a sensitivity of 98.8% (class 0 is the ‘Positive’ class in this output), showing that the model is very good at spotting effective interactions. However, it was less accurate when identifying truly ineffective behavior (coded as 1). It correctly labeled 77 of the 110 ineffective cases and missed 33, giving a specificity of 70.0%. In other words, the model tended to over-predict effectiveness, sometimes failing to recognize when behavior was actually ineffective.

Despite this, the model's predictions were reliable in both directions. When it predicted a case as effective, it was right 94.7% of the time (positive predictive value). When it predicted a case as ineffective, it was right 91.7% of the time (negative predictive value). The balanced accuracy, which averages sensitivity and specificity, was 84.4%. This measure accounts for the fact that most of the data (about 84%) belonged to the “effective” group.

We also ran McNemar’s test, which checks whether the model’s two kinds of error occurred at similar rates. The result (p < .001) showed a statistically significant imbalance. The model missed 33 ineffective cases but wrongly labeled only 7 effective cases as ineffective. This means that while the model performs very well overall, it tends to err on the side of assuming behavior is effective unless it’s clearly not.

Overall, the model did an excellent job of predicting ineffective behavior using only behavior-focused features. However, it was considerably better at recognizing effective interactions (sensitivity 98.8%) than at catching ineffective ones (specificity 70.0%).

##                          Predictor Importance
##                             <char>      <num>
## 1:  RawText_3Turn_Ineffective_full      0.711
## 2: RawText_3Turn_Ineffective_short      0.264
## 3:      Therapist_Ineffective_full      0.016
## 4: Summary_3Turn_Ineffective_short      0.009
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0 590  33
##          1   7  77
##                                           
##                Accuracy : 0.9434          
##                  95% CI : (0.9238, 0.9593)
##     No Information Rate : 0.8444          
##     P-Value [Acc > NIR] : 2.729e-16       
##                                           
##                   Kappa : 0.7617          
##                                           
##  Mcnemar's Test P-Value : 7.723e-05       
##                                           
##             Sensitivity : 0.9883          
##             Specificity : 0.7000          
##          Pos Pred Value : 0.9470          
##          Neg Pred Value : 0.9167          
##              Prevalence : 0.8444          
##          Detection Rate : 0.8345          
##    Detection Prevalence : 0.8812          
##       Balanced Accuracy : 0.8441          
##                                           
##        'Positive' Class : 0               
## 

Effective (PREDICT)

No effectiveness-specific predictor survives the filter.

Effective behavior could in principle be predicted, since Boruta marks many variables as important for it. However, our effectiveness measure was not specific to effective behavior as such. Collaboration, understanding, and other positive behaviors may all have contributed to the effectiveness score.

Collaboration (PREDICT)

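Collaboration may be predictable, but the evidence is weak. Accuracy (88.1%) does not significantly exceed the no-information rate (86.9%; p = 0.17). The model also identifies only 11 of the 93 cases labeled collaborative (class 1; specificity 11.8%), although kappa (0.18) indicates some agreement beyond chance.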

##                           Predictor Importance
##                              <char>      <num>
## 1: RawText_3Turn_Collaboration_rule      0.577
## 2:     Therapist_Collaboration_rule      0.261
## 3: RawText_3Turn_Collaboration_full      0.163
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0 612  82
##          1   2  11
##                                          
##                Accuracy : 0.8812         
##                  95% CI : (0.855, 0.9041)
##     No Information Rate : 0.8685         
##     P-Value [Acc > NIR] : 0.1724         
##                                          
##                   Kappa : 0.1811         
##                                          
##  Mcnemar's Test P-Value : <2e-16         
##                                          
##             Sensitivity : 0.9967         
##             Specificity : 0.1183         
##          Pos Pred Value : 0.8818         
##          Neg Pred Value : 0.8462         
##              Prevalence : 0.8685         
##          Detection Rate : 0.8656         
##    Detection Prevalence : 0.9816         
##       Balanced Accuracy : 0.5575         
##                                          
##        'Positive' Class : 0              
## 

Understanding (PREDICT)

##                            Predictor Importance
##                               <char>      <num>
## 1:  RawText_3Turn_Understanding_rule      0.587
## 2:      Therapist_Understanding_rule      0.159
## 3:  RawText_3Turn_Understanding_full      0.146
## 4: RawText_3Turn_Understanding_short      0.108
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0 583 115
##          1   0   9
##                                          
##                Accuracy : 0.8373         
##                  95% CI : (0.808, 0.8638)
##     No Information Rate : 0.8246         
##     P-Value [Acc > NIR] : 0.2012         
##                                          
##                   Kappa : 0.1143         
##                                          
##  Mcnemar's Test P-Value : <2e-16         
##                                          
##             Sensitivity : 1.00000        
##             Specificity : 0.07258        
##          Pos Pred Value : 0.83524        
##          Neg Pred Value : 1.00000        
##              Prevalence : 0.82461        
##          Detection Rate : 0.82461        
##    Detection Prevalence : 0.98727        
##       Balanced Accuracy : 0.53629        
##                                          
##        'Positive' Class : 0              
## 

Qualitative look at high and low scores