Exploratory Data Analysis

Introduction

This report presents an exploratory analysis of how speech characteristics relate to perceived interaction quality. The experiment consisted of four parts, each a semi-structured dialogue conducted via Zoom, with questions that became progressively more intimate. After the conversation, participants completed the Goodness of Interaction (GOI) Questionnaire, which assesses their experience of the interaction. Our analysis covers individual speech patterns and dyadic synchrony, using metrics such as word count similarity and speaking time ratio within dyads. To allow replication, we arbitrarily divided our 120 dyads into two datasets of 60 dyads each; the analysis reported here uses the first dataset, and we plan to replicate the findings on the second.

Preprocessing

Load Libraries

library(tidyverse)
library(reshape2)
library(ggcorrplot)
library(lmerTest)
library(performance)
library(parameters)
library(corrplot)

Load Data

data <- read.csv("Data_with_Gender.csv")

head(data)

##     iDyad iSubject iPartner   TW TW_partner      ST ST_partner      WPM
## 1 101_102      101      102 2039       1846 846.088    891.503 147.2107
## 2 101_102      102      101 1846       2039 891.503    846.088 123.9325
## 3 103_104      103      104 1713       2551 766.419    956.859 135.2987
## 4 103_104      104      103 2551       1713 956.859    766.419 159.7310
## 5 105_106      105      106 1060       1042 451.392    423.877 140.0591
## 6 105_106      106      105 1042       1060 423.877    451.392 148.3425
##   WPM_partner  GOI Gender
## 1    123.9325 81.8      1
## 2    147.2107 55.8      1
## 3    159.7310 76.5      1
## 4    135.2987 67.1      1
## 5    148.3425 84.5      1
## 6    140.0591 58.2      1

Speech Attributes Predictors

Variables

Outcome:

GOI - Goodness of Interaction questionnaire score for self-assessment of interaction quality.

Predictors:

TW - The total number of words spoken by each participant during the conversation.

ST - The total amount of time, measured in seconds, during which a participant actively spoke, excluding any periods of silence, pauses, or non-verbal communication. These exclusions were determined using Audacity software for precise audio processing and silence removal.

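WPM - Words per minute, calculated as words / (speaking time / 60) within each part of the conversation. The overall WPM values in the data appear to be the mean of the four per-part rates rather than the ratio of the totals: for subject 101, 2039 / (846.088 / 60) ≈ 144.6, whereas the recorded WPM is 147.21, which matches the mean of that subject's four per-part values in the long-format data.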

Partner - The _partner suffix indicates that the variable was measured for the other participant in the dyad.
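As a quick check on the WPM definition, the sketch below recomputes the speech rate for subject 101 from the per-part word counts and speaking times listed in the long-format data later in this report. The helper name wpm is ours and is not part of the original analysis; only base R is used.

```r
# Per-part values for subject 101, copied from the long-format data in this report
words <- c(480, 548, 540, 471)
speaking_time_s <- c(160.228, 235.757, 241.493, 208.610)

# Words per minute: words divided by speaking time expressed in minutes
# (wpm is a hypothetical helper name used only for this illustration)
wpm <- function(words, seconds) words / (seconds / 60)

wpm_per_part <- wpm(words, speaking_time_s)                  # one rate per part
wpm_mean_of_parts <- mean(wpm_per_part)                      # unweighted mean of the four rates
wpm_from_totals <- wpm(sum(words), sum(speaking_time_s))     # rate computed from TW and ST

print(round(wpm_per_part, 4))
print(round(wpm_mean_of_parts, 4))
print(round(wpm_from_totals, 4))
```

The mean of the per-part rates reproduces the WPM recorded for this subject, while the rate computed from the totals is slightly lower; the two need not agree when the rate varies across parts.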

Descriptive Statistics

GOI

GOI_mean <- mean(data$GOI)
GOI_med <- median(data$GOI)
GOI_SD <- sd(data$GOI)
cat(c("Mean:", GOI_mean, "\nMedian:", GOI_med, "\nSD:", GOI_SD))

## Mean: 68.4285449735 
## Median: 72.05 
## SD: 19.2574304302369

ggplot(data, aes(x = GOI)) +
  geom_density(fill = "#FBB4AE") +
  geom_vline(xintercept = GOI_mean, linetype = "dashed", color = "gray40") +
  theme_classic()
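The same three summary statistics are computed for every variable in this section. A small helper, sketched here under the assumption that rounding to two decimals is acceptable for reporting, would avoid repeating the mean/median/sd block each time. The name describe_var is hypothetical, and the example uses only the six GOI scores printed in the head(data) output above.

```r
# Hypothetical helper (not in the original analysis): mean, median and SD in one call
describe_var <- function(x, digits = 2) {
  stats <- c(Mean = mean(x, na.rm = TRUE),
             Median = median(x, na.rm = TRUE),
             SD = sd(x, na.rm = TRUE))
  round(stats, digits)
}

# First six GOI scores shown in the head(data) output, used as a toy input
goi_head <- c(81.8, 55.8, 76.5, 67.1, 84.5, 58.2)
goi_summary <- describe_var(goi_head)
print(goi_summary)
```

Applied to the full data, describe_var(data$GOI) would return the same three quantities that the cat() call prints, as a named numeric vector.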

Total Words

TW_mean <- mean(data$TW)
TW_med <- median(data$TW)
TW_SD <- sd(data$TW)
cat(c("Mean:", TW_mean, "\nMedian:", TW_med, "\nSD:", TW_SD))

## Mean: 1460.30833333333 
## Median: 1299.5 
## SD: 695.25686923301

ggplot(data, aes(x = TW)) +
  geom_density(fill = "#B3CDE3") +
  geom_vline(xintercept = TW_mean, linetype = "dashed", color = "gray40") +
  theme_classic()

Speaking Time

The total amount of time, measured in seconds, during which a participant actively spoke, excluding any periods of silence, pauses, or non-verbal communication.

ST_mean <- mean(data$ST)
ST_med <- median(data$ST)
ST_SD <- sd(data$ST)
cat(c("Mean:", ST_mean, "\nMedian:", ST_med, "\nSD:", ST_SD))

## Mean: 641.629175 
## Median: 574.6865 
## SD: 272.839385756241

ggplot(data, aes(x = ST)) +
  geom_density(fill = "#CCEBC5") +
  geom_vline(xintercept = ST_mean, linetype = "dashed", color = "gray40") +
  theme_classic()

Speech Rate

Words Per Minute

WPM_mean <- mean(data$WPM)
WPM_med <- median(data$WPM)
WPM_SD <- sd(data$WPM)
cat(c("Mean:", WPM_mean, "\nMedian:", WPM_med, "\nSD:", WPM_SD))

## Mean: 135.145140308667 
## Median: 134.5623547 
## SD: 20.2735024769839

ggplot(data, aes(x = WPM, color = factor(Gender))) +
  geom_density(fill = "#DECBE4", alpha = 0.4) +
  geom_vline(xintercept = WPM_mean, linetype = "dashed", color = "gray40") +
  theme_classic()

GOI VS Speech Attributes: Correlation Matrix

# Remove rows with NA values in selected columns
#clean_data <- data |> select(GOI, TW, TW_partner, ST, ST_partner, WPM, WPM_partner) |> na.omit()

# Calculate the correlation matrix and p-values
# (assumed to be Hmisc::rcorr(), whose result has the $r and $P elements used below)
#corr_results <- Hmisc::rcorr(as.matrix(clean_data))

#corr_matrix <- corr_results$r
#p_matrix <- corr_results$P

# Plot the heatmap with correlation coefficients and significance markers
# (brewer.pal() comes from the RColorBrewer package)
#corrplot(corr_matrix, method = "color", col = colorRampPalette(RColorBrewer::brewer.pal(n = 8, name = "RdYlBu"))(200),
#         type = "upper", p.mat = p_matrix, sig.level = c(0.001, 0.01, 0.05), 
#         insig = "label_sig", pch.cex = 1.2, number.cex = 0.9,
#         tl.col = "black", tl.srt = 45, addCoef.col = "black")

Report:

The correlation matrix shows that speaking more, measured by words spoken (TW, r = 0.266, p = 0.003) and by speaking time (ST, r = 0.270, p = 0.003), is positively associated with higher interaction quality evaluations (GOI), as anticipated. Notably, the partner's amount of speech (TW_partner, r = 0.315, p < 0.001; ST_partner, r = 0.347, p < 0.001) correlates even more strongly with GOI: the more one person spoke during the dialogue, the higher their partner rated the quality of the interaction. Speech rate showed weak positive correlations with GOI that were not statistically significant, either for the individual (WPM, r = 0.097, p = 0.289) or for their partner (WPM_partner, r = 0.054, p = 0.561).
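The p-values in this report follow from the correlation coefficient and the sample size. As a sketch, assuming the coefficients are Pearson correlations computed on all 120 participants and tested with the usual two-sided t-test (an assumption, since the code for this step is commented out), the conversion can be written in base R as follows. The helper name r_to_p is ours.

```r
# Two-sided p-value for a Pearson correlation r based on n observations
# (r_to_p is a hypothetical helper name)
r_to_p <- function(r, n) {
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)   # t statistic with n - 2 degrees of freedom
  2 * pt(-abs(t_stat), df = n - 2)
}

n_obs <- 120                      # assumed: one row per participant
p_tw  <- r_to_p(0.266, n_obs)     # TW vs GOI
p_wpm <- r_to_p(0.097, n_obs)     # WPM vs GOI
print(round(c(TW = p_tw, WPM = p_wpm), 3))
```

Because the r values above are rounded to three decimals, recomputed p-values can differ slightly from the reported ones.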

Synchrony

In this section, we explore the role of synchrony in dyadic interactions and its impact on perceived interaction quality (GOI). Synchrony, in this context, refers to the alignment or discrepancy in speaking behaviors between partners within a dyad. To quantify this, we calculated both absolute differences and relative ratios between dyad members across three key speech characteristics: total words spoken (TW), speaking time (ST), and words per minute (WPM).

Initially, we derived the absolute differences between each participant’s speech metrics and those of their partner (TW_diff, ST_diff, WPM_diff). These measures provide a straightforward indicator of speech asymmetry, where higher values represent greater divergence between dyad members.

Next, we computed the relative ratios for each speech characteristic (TW_ratio, ST_ratio, WPM_ratio), offering a perspective on the proportional contribution of each dyad member to the interaction. Ratios closer to 0.5 indicate a balanced interaction, while deviations suggest dominance by one partner.

To further explore these relationships, we applied log transformations to the difference and ratio measures (e.g., TW_diff_log, TW_ratio_log) to address potential skewness in the data. We then examined the influence of these synchrony measures on interaction quality (GOI) using linear mixed-effects models and linear regression.

Similarity

Absolute Distance Between Dyad’s Participants

data_diff <- data |>
  mutate(
  TW_diff = as.numeric(abs(TW - TW_partner)),
  ST_diff = abs(ST - ST_partner),
  WPM_diff = abs(WPM - WPM_partner))

data_diff |>
  select(iSubject, TW_diff, ST_diff, WPM_diff) |>
  head()

##   iSubject TW_diff ST_diff  WPM_diff
## 1      101     193  45.415 23.278206
## 2      102     193  45.415 23.278206
## 3      103     838 190.440 24.432296
## 4      104     838 190.440 24.432296
## 5      105      18  27.515  8.283413
## 6      106      18  27.515  8.283413

Ratio

data_diff_ratio <- data_diff |>
  mutate(
  TW_ratio = TW / (TW + TW_partner),
  ST_ratio = ST / (ST + ST_partner),
  WPM_ratio = WPM / (WPM_partner + WPM),
  TW_ratio_partner = TW_partner / (TW + TW_partner),
  ST_ratio_partner = ST_partner / (ST + ST_partner),
  WPM_ratio_partner = WPM_partner / (WPM_partner + WPM))

               
data_diff_ratio |>
  select(iSubject, TW_ratio, ST_ratio, WPM_ratio) |>
  head()

##   iSubject  TW_ratio  ST_ratio WPM_ratio
## 1      101 0.5248391 0.4869316 0.5429260
## 2      102 0.4751609 0.5130684 0.4570740
## 3      103 0.4017355 0.4447448 0.4585935
## 4      104 0.5982645 0.5552552 0.5414065
## 5      105 0.5042816 0.5157180 0.4856391
## 6      106 0.4957184 0.4842820 0.5143609

Total Words Synchrony

Total Words Similarity

Absolute Distance Between Total Words within Dyad

TW_diff_mean <- mean(data_diff$TW_diff)
TW_diff_med <- median(data_diff$TW_diff)
TW_diff_SD <- sd(data_diff$TW_diff)
cat(c("Mean:", TW_diff_mean, "\nMedian:", TW_diff_med, "\nSD:", TW_diff_SD))

## Mean: 441.55 
## Median: 288 
## SD: 385.356428921232

ggplot(data_diff, aes(x = TW_diff)) +
  geom_density(fill = "#A1D3E8") +
  geom_vline(xintercept = TW_diff_mean, linetype = "dashed", color = "gray40") +
  theme_classic()

Total Words Ratio

TW_ratio: Words(SubjectA) / (Words(SubjectA) + Words(SubjectB))

TW_ratio_SD <- sd(data_diff_ratio$TW_ratio)
cat(c("SD:", TW_ratio_SD))

## SD: 0.0863473224183988

ggplot(data_diff_ratio, aes(x = TW_ratio)) +
  geom_density(fill = "#89CFF0") +
  theme_classic()

Speaking Time Synchrony

Speaking Time Similarity

Absolute Distance Between Speaking Time within Dyad

ST_diff_mean <- mean(data_diff$ST_diff)
ST_diff_med <- median(data_diff$ST_diff)
ST_diff_SD <- sd(data_diff$ST_diff)
cat(c("Mean:", ST_diff_mean, "\nMedian:", ST_diff_med, "\nSD:", ST_diff_SD))

## Mean: 155.996716666667 
## Median: 104.34 
## SD: 139.306805596914

ggplot(data_diff, aes(x = ST_diff)) +
  geom_density(fill = "#BEE3AD") +
  geom_vline(xintercept = ST_diff_mean, linetype = "dashed", color = "gray40") +
  theme_classic()

Speaking Time Ratio

ST_ratio_SD <- sd(data_diff_ratio$ST_ratio)
cat(c("SD:", ST_ratio_SD))

## SD: 0.0740915221037298

ggplot(data_diff_ratio, aes(x = ST_ratio)) +
  geom_density(fill = "#A8D5BA") +
  theme_classic()

Speech Rate Synchrony

Speech Rate Similarity

Absolute Distance Between Speech Rate within Dyad

WPM_diff_mean <- mean(data_diff$WPM_diff)
WPM_diff_med <- median(data_diff$WPM_diff)
WPM_diff_SD <- sd(data_diff$WPM_diff)
cat(c("Mean:", WPM_diff_mean, "\nMedian:", WPM_diff_med, "\nSD:", WPM_diff_SD))

## Mean: 18.5636318826667 
## Median: 16.08311715 
## SD: 13.0876795158243

ggplot(data_diff, aes(x = WPM_diff)) +
  geom_density(fill = "#CBA9DA") +
  geom_vline(xintercept = WPM_diff_mean, linetype = "dashed", color = "gray40") +
  theme_classic()

Speech Rate Ratio

WPM_ratio_SD <- sd(data_diff_ratio$WPM_ratio)
cat(c("SD:", WPM_ratio_SD))

## SD: 0.042719434410943

ggplot(data_diff_ratio, aes(x = WPM_ratio)) +
  geom_density(fill = "#BBA4C3") +
  theme_classic()

# Apply log transformation for diff and ratio
data <- data_diff_ratio |>
  mutate(TW_diff_log = log2(TW_diff),
         TW_ratio_log = log2(TW_ratio),
         TW_ratio_dyad_log = log2(TW_ratio) + log2(TW_ratio_partner),
         ST_diff_log = log2(ST_diff),
         ST_ratio_log = log2(ST_ratio),
         ST_ratio_dyad_log = log2(ST_ratio) + log2(ST_ratio_partner),
         WPM_diff_log = log2(WPM_diff),
         WPM_ratio_log = log2(WPM_ratio),
         WPM_ratio_dyad_log = log2(WPM_ratio) + log2(WPM_ratio_partner))

data |>
  select(iSubject, iDyad, TW, TW_partner, TW_diff, TW_diff_log, TW_ratio, TW_ratio_log, TW_ratio_dyad_log) |>
  head()

##   iSubject   iDyad   TW TW_partner TW_diff TW_diff_log  TW_ratio TW_ratio_log
## 1      101 101_102 2039       1846     193    7.592457 0.5248391   -0.9300528
## 2      102 101_102 1846       2039     193    7.592457 0.4751609   -1.0735120
## 3      103 103_104 1713       2551     838    9.710806 0.4017355   -1.3156823
## 4      104 103_104 2551       1713     838    9.710806 0.5982645   -0.7411445
## 5      105 105_106 1060       1042      18    4.169925 0.5042816   -0.9876984
## 6      106 105_106 1042       1060      18    4.169925 0.4957184   -1.0124074
##   TW_ratio_dyad_log
## 1         -2.003565
## 2         -2.003565
## 3         -2.056827
## 4         -2.056827
## 5         -2.000106
## 6         -2.000106
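The dyad-level log score has a simple closed form. Because the two ratios in a dyad sum to 1, TW_ratio_dyad_log equals log2(p * (1 - p)) with p = TW_ratio. It reaches its maximum of -2 when both partners contribute equally and becomes more negative as the split grows more uneven. A minimal base R sketch using the first dyad from the output above (the function name dyad_log_balance is ours):

```r
# Sum of the two log2 ratios in a dyad, i.e. log2(p * (1 - p))
# (dyad_log_balance is a hypothetical helper name)
dyad_log_balance <- function(x, x_partner) {
  p <- x / (x + x_partner)      # this participant's share of the dyad total
  log2(p) + log2(1 - p)
}

first_dyad <- dyad_log_balance(2039, 1846)   # TW and TW_partner of subject 101
balanced   <- dyad_log_balance(1000, 1000)   # perfectly even split
print(c(first_dyad = first_dyad, balanced = balanced))
```

Note also that log2(TW_diff) is -Inf for any dyad whose members have identical values, so the log-transformed differences need care if such dyads occur.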

GOI VS Speech Synchrony: Correlation Matrix

# Calculate the correlation matrix and p-values
# (assumed to be Hmisc::rcorr(), whose result has the $r and $P elements used below)
#corr_results <- Hmisc::rcorr(as.matrix(data |>
#  select(GOI, TW_diff, TW_ratio, ST_diff, ST_ratio, WPM_diff, WPM_ratio)))

#corr_matrix <- corr_results$r
#p_matrix <- corr_results$P

# Plot the heatmap with correlation coefficients and p-values using a built-in palette
# (brewer.pal() comes from the RColorBrewer package)
#corrplot(corr_matrix, method = "color", col = colorRampPalette(RColorBrewer::brewer.pal(n = 8, name = "RdYlBu"))(200),
#         type = "upper", p.mat = p_matrix, sig.level = c(0.001, 0.01, 0.05), 
#         insig = "label_sig", addCoef.col = "black", number.cex = 0.9,
#         tl.col = "black", tl.srt = 45, pch.cex = 0.7)

Model Selection

In this section, we investigate how speech characteristics contribute to the variability in interaction quality, as measured by GOI scores, across dyads. We begin by constructing baseline models with random intercepts to account for the inherent variation in GOI among different dyads. This approach allows us to first quantify the extent to which differences between dyads influence interaction quality before systematically introducing predictors related to total speech output (TW, ST, WPM) and the differences in speech attributes between dyad members (TW_diff, ST_diff, WPM_diff).

TW Model

model0 <- lmer(GOI ~ (1|iDyad), data = data)
model1 <- lmer(GOI ~ TW + (1|iDyad), data = data)
model2 <- lmer(GOI ~ TW_diff + (1|iDyad), data = data)
model3 <- lmer(GOI ~ TW * TW_diff + (1 | iDyad), data = data)

## Warning: Some predictor variables are on very different scales: consider
## rescaling
## Warning: Some predictor variables are on very different scales: consider
## rescaling
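The rescaling warning arises because the predictors and their product are on very different numeric scales. One common remedy, shown here only as a sketch and not as what the original analysis did, is to z-score the predictors before fitting. The example uses the six rows printed at the top of this report; TW_z and TW_diff_z are hypothetical column names.

```r
# Toy data: the six rows shown in the head(data) output
toy <- data.frame(
  TW = c(2039, 1846, 1713, 2551, 1060, 1042),
  TW_partner = c(1846, 2039, 2551, 1713, 1042, 1060)
)
toy$TW_diff <- abs(toy$TW - toy$TW_partner)

# scale() centers to mean 0 and divides by the SD;
# as.numeric() drops the matrix attributes it returns
toy$TW_z <- as.numeric(scale(toy$TW))
toy$TW_diff_z <- as.numeric(scale(toy$TW_diff))

print(round(toy, 3))
```

The standardized columns could then replace TW and TW_diff in the lmer() formula, for example GOI ~ TW_z * TW_diff_z + (1 | iDyad). Coefficients would then be expressed per standard deviation rather than per word.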

sjPlot::tab_model(model0, model1, model2, model3, show.std = TRUE, show.ci = FALSE)

	GOI			GOI			GOI			GOI
Predictors	Estimates	std. Beta	p	Estimates	std. Beta	p	Estimates	std. Beta	p	Estimates	std. Beta	p	std. p
(Intercept)	68.43	0.00	<0.001	59.97	0.00	<0.001	61.60	0.00	<0.001	46.07	0.13	<0.001	0.234
TW				0.01	0.21	0.029				0.01	0.20	0.014	0.094
TW diff							0.02	0.31	0.003	0.04	0.29	0.001	0.013
TW × TW diff										-0.00	-0.22	0.012	0.012
Random Effects
σ²	224.66			239.03			224.66			230.01
τ₀₀	147.43 _iDyad			111.00 _iDyad			115.51 _iDyad			90.01 _iDyad
ICC	0.40			0.32			0.34			0.28
N	60 _iDyad			60 _iDyad			60 _iDyad			60 _iDyad
Observations	120			120			120			120
Marginal R² / Conditional R²	0.000 / 0.396			0.044 / 0.347			0.094 / 0.402			0.151 / 0.390
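The ICC row in the table can be recovered from the two variance components above it: for a random-intercept model it is the between-dyad variance (τ₀₀) divided by the sum of the between-dyad and residual (σ²) variances. A short check in base R, using the values printed in the table (the function name icc is ours):

```r
# Intraclass correlation for a random-intercept model:
# share of (random-intercept + residual) variance that lies between dyads
# (icc is a hypothetical helper name)
icc <- function(tau00, sigma2) tau00 / (tau00 + sigma2)

icc_null <- icc(tau00 = 147.43, sigma2 = 224.66)  # model0, intercept only
icc_full <- icc(tau00 = 90.01,  sigma2 = 230.01)  # model3, TW * TW_diff
print(round(c(model0 = icc_null, model3 = icc_full), 2))
```

For the intercept-only model the ICC coincides with the conditional R², since there are no fixed-effect predictors.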

ranova(model3)

## ANOVA-like table for random-effects: Single term deletions
## 
## Model:
## GOI ~ TW + TW_diff + (1 | iDyad) + TW:TW_diff
##             npar  logLik    AIC    LRT Df Pr(>Chisq)  
## <none>         6 -530.73 1073.5                       
## (1 | iDyad)    5 -533.12 1076.2 4.7731  1    0.02891 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

TW Report

The mixed-effects analysis shows a significant interaction between total words spoken (TW) and the difference in word count between dyad members (TW_diff) in predicting interaction quality (GOI). The interaction term is negative (std. β = -0.22, p = 0.012; the unstandardized estimate rounds to -0.00), indicating that although TW_diff is positively associated with GOI (Estimate = 0.04, p = 0.001), this association weakens as TW increases. The model has a marginal R² of 0.151 and a conditional R² of 0.390: the fixed effects explain 15.1% of the variance in GOI, and the fixed effects together with the dyad random intercept explain 39.0%, leaving the remainder as residual variance. A likelihood ratio test supported retaining the random intercept for dyads (p = 0.029).
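In a model with a TW × TW_diff interaction, the slope of TW_diff is not a single number but a linear function of TW: slope = b_diff + b_int * TW. The table rounds the interaction estimate to -0.00, so the sketch below uses a placeholder value for b_int purely to illustrate the calculation; it is not the fitted coefficient.

```r
b_diff <- 0.04       # TW_diff estimate as reported (rounded) in the table
b_int  <- -0.00002   # HYPOTHETICAL placeholder; the table shows the fitted value only as -0.00

# Conditional ("simple") slope of TW_diff at a given level of TW
simple_slope <- function(tw, b_diff, b_int) b_diff + b_int * tw

tw_values <- c(500, 1000, 2000, 3000)
slopes <- simple_slope(tw_values, b_diff, b_int)
print(data.frame(TW = tw_values, slope_of_TW_diff = slopes))
```

With the fitted model available, the unrounded coefficients can be read from fixef(model3) and substituted for the two values above.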

TW Interaction Effects Visualization

# Create an interaction plot with one regression line per quartile of a moderator variable
# Arguments: data (data frame with a GOI column), variable_x (predictor name),
#            variable_m (moderator name), x_unit (optional unit for the x-axis label)
create_interaction_plot <- function(data, variable_x, variable_m, x_unit = NULL) {
  # Quartile cut points of the moderator
  quartiles <- quantile(data[[variable_m]], probs = c(0.25, 0.5, 0.75), na.rm = TRUE)
  quartile_breaks <- c(-Inf, unname(quartiles), Inf)
  
  # Assign each row to a quartile group of the moderator
  data <- data |>
    mutate(Quartile_Group = cut(.data[[variable_m]],
                                breaks = quartile_breaks,
                                labels = c("Q1", "Q2", "Q3", "Q4")))

  # Add a unit to the x-axis label only when one is supplied
  x_label <- if (is.null(x_unit)) variable_x else paste0(variable_x, " [", x_unit, "]")

  # Generate the interaction plot (linewidth replaces the deprecated size for lines)
  p <- ggplot(data, aes(x = .data[[variable_x]], y = GOI, color = Quartile_Group)) +
    geom_point(alpha = 0.6, size = 3) +
    geom_smooth(method = "lm", aes(group = Quartile_Group), se = FALSE, linewidth = 1) +
    scale_color_viridis_d(option = "magma") +
    labs(title = paste("Moderating Effect of Grouped", variable_m, "on", variable_x, "and GOI"),
         subtitle = "Regression lines for each level of grouped Quartile",
         x = x_label,
         y = "Interaction Quality (GOI)",
         color = paste("Grouped", variable_m, "Levels")) +
    theme_minimal()

  # Return the plot
  return(p)
}

TW_model <- create_interaction_plot(data, "TW_diff", "TW")

print(TW_model)

## `geom_smooth()` using formula = 'y ~ x'

TW Plot Report

The interaction plot visualizes how total words spoken (TW) moderates the relationship between the within-dyad word count difference (TW_diff) and interaction quality (GOI), with one regression line per quartile of TW. In the highest quartile (Q4) the line slopes downward, suggesting that among the most talkative participants, dyads with smaller differences in spoken words tend to evaluate their interactions more favorably. In the lower quartiles (Q1, Q2, and Q3) the trend is generally positive, though decreasing, with higher TW_diff associated with higher GOI. This pattern is consistent with the negative interaction term in the mixed model (std. β = -0.22, p = 0.012); note that this coefficient describes the model as a whole, not the slope within Q4. Speech balance within dyads therefore appears to matter differently depending on total speech output.
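The slope within each quartile is a separate quantity from the interaction coefficient of the mixed model. To put numbers on the lines in such a plot, one option is to fit a simple regression within each quartile group. The sketch below does this in base R on noise-free toy data constructed so that the true slopes are known; the function name slopes_by_quartile and the toy data are ours, and the grouping uses the observed minimum and maximum as outer breaks instead of -Inf and Inf.

```r
# Slope of y on x within each quartile group of the moderator m
# (slopes_by_quartile is a hypothetical helper name)
slopes_by_quartile <- function(df, x, m, y = "GOI") {
  breaks <- quantile(df[[m]], probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
  grp <- cut(df[[m]], breaks = breaks, labels = c("Q1", "Q2", "Q3", "Q4"),
             include.lowest = TRUE)
  sapply(split(df, grp), function(d) unname(coef(lm(d[[y]] ~ d[[x]]))[2]))
}

# Toy data (not study data): 20 rows per quartile of TW, with a known slope in each
toy <- data.frame(TW = 1:80, TW_diff = rep(1:20, times = 4))
true_slopes <- rep(c(3, 2, 1, -1), each = 20)
toy$GOI <- 50 + true_slopes * toy$TW_diff

quartile_slopes <- slopes_by_quartile(toy, x = "TW_diff", m = "TW")
print(round(quartile_slopes, 3))
```

Applied to the study data, slopes_by_quartile(data, "TW_diff", "TW") would return the four slopes drawn in the plot. These per-group fits ignore the dyad random intercept, so they describe the plot and not the mixed model.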

ST Model

model0 <- lmer(GOI ~ (1|iDyad), data = data)
model1 <- lmer(GOI ~ ST + (1|iDyad), data = data)
model2 <- lmer(GOI ~ ST_diff + (1|iDyad), data = data)
model3 <- lmer(GOI ~ ST * ST_diff + (1 | iDyad), data = data)

## Warning: Some predictor variables are on very different scales: consider
## rescaling
## Warning: Some predictor variables are on very different scales: consider
## rescaling

sjPlot::tab_model(model0, model1, model2, model3, df.method = "satterthwaite", show.std = TRUE, show.ci = FALSE)

## Warning: Some predictor variables are on very different scales: consider
## rescaling
## Warning: Some predictor variables are on very different scales: consider
## rescaling

	GOI			GOI			GOI			GOI
Predictors	Estimates	std. Beta	p	Estimates	std. Beta	p	Estimates	std. Beta	p	Estimates	std. Beta	p	std. p
(Intercept)	68.43	0.00	<0.001	58.95	0.00	<0.001	61.67	0.00	<0.001	49.30	0.07	<0.001	0.538
ST				0.01	0.21	0.031				0.02	0.15	0.062	0.179
ST diff							0.04	0.31	0.003	0.09	0.30	0.010	0.009
ST × ST diff										-0.00	-0.14	0.091	0.091
Random Effects
σ²	224.66			242.13			224.66			224.71
τ₀₀	147.43 _iDyad			107.28 _iDyad			114.57 _iDyad			107.85 _iDyad
ICC	0.40			0.31			0.34			0.32
N	60 _iDyad			60 _iDyad			60 _iDyad			60 _iDyad
Observations	120			120			120			120
Marginal R² / Conditional R²	0.000 / 0.396			0.044 / 0.338			0.097 / 0.402			0.128 / 0.411

ranova(model3)

## ANOVA-like table for random-effects: Single term deletions
## 
## Model:
## GOI ~ ST + ST_diff + (1 | iDyad) + ST:ST_diff
##             npar  logLik    AIC    LRT Df Pr(>Chisq)  
## <none>         6 -528.28 1068.6                       
## (1 | iDyad)    5 -531.46 1072.9 6.3528  1    0.01172 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ST Report

The analysis indicates that, in the interaction model, longer speaking time (ST) has a marginal positive association with GOI (Estimate = 0.02, p = 0.062), while the difference in speaking time between dyad members (ST_diff) has a clearer positive association (Estimate = 0.09, p = 0.010). The ST × ST_diff interaction is not statistically significant (std. β = -0.14, p = 0.091), so there is at most weak evidence of moderation. The marginal R² is 0.128 and the conditional R² is 0.411: the fixed effects account for 12.8% of the variance in GOI, and the fixed effects together with the dyad random intercept account for 41.1%. A likelihood ratio test supported retaining the random intercept for dyads (p = 0.012).

ST Interaction Effect Visualization

ST_model <- create_interaction_plot(data, "ST_diff", "ST")
print(ST_model)

## `geom_smooth()` using formula = 'y ~ x'

ST Plot Report

The interaction plot illustrates how total speaking time (ST) moderates the relationship between the within-dyad speaking time difference (ST_diff) and interaction quality (GOI), with one regression line per quartile of ST. In the highest quartile (Q4) the line is relatively flat, suggesting that larger differences in speaking time are not associated with a change in perceived interaction quality among participants who spoke the longest. In the lower quartiles (Q1, Q2, and Q3) the trend is generally positive: increased ST_diff is associated with higher GOI. This pattern matches the direction of the ST × ST_diff interaction in the mixed model, which was modest and not statistically significant (std. β = -0.14, p = 0.091). The association between speaking time balance and interaction quality may therefore depend on overall speaking time, but the evidence for this is weak.

WPM Model

model0 <- lmer(GOI ~ (1|iDyad), data = data)
model1 <- lmer(GOI ~ WPM + (1|iDyad), data = data)
model2 <- lmer(GOI ~ WPM_diff + (1|iDyad), data = data)
model3 <- lmer(GOI ~ WPM * WPM_diff + (1 | iDyad), data = data)

## Warning: Some predictor variables are on very different scales: consider
## rescaling
## Warning: Some predictor variables are on very different scales: consider
## rescaling

sjPlot::tab_model(model0, model1, model2, model3, df.method = "satterthwaite")

## Warning: Some predictor variables are on very different scales: consider
## rescaling
## Warning: Some predictor variables are on very different scales: consider
## rescaling

	GOI			GOI			GOI			GOI
Predictors	Estimates	CI	p	Estimates	CI	p	Estimates	CI	p	Estimates	CI	p
(Intercept)	68.43	64.27 – 72.59	<0.001	56.96	33.40 – 80.52	<0.001	69.68	62.38 – 76.98	<0.001	42.09	-6.65 – 90.82	0.090
WPM				0.08	-0.09 – 0.26	0.329				0.21	-0.15 – 0.57	0.259
WPM diff							-0.07	-0.39 – 0.25	0.677	0.57	-1.13 – 2.27	0.510
WPM × WPM diff										-0.00	-0.02 – 0.01	0.450
Random Effects
σ²	224.66			225.56			224.66			222.84
τ₀₀	147.43 _iDyad			146.10 _iDyad			151.11 _iDyad			156.41 _iDyad
ICC	0.40			0.39			0.40			0.41
N	60 _iDyad			60 _iDyad			60 _iDyad			60 _iDyad
Observations	120			120			120			120
Marginal R² / Conditional R²	0.000 / 0.396			0.008 / 0.398			0.002 / 0.403			0.018 / 0.423

ranova(model3)

## ANOVA-like table for random-effects: Single term deletions
## 
## Model:
## GOI ~ WPM + WPM_diff + (1 | iDyad) + WPM:WPM_diff
##             npar  logLik    AIC    LRT Df Pr(>Chisq)   
## <none>         6 -523.87 1059.7                        
## (1 | iDyad)    5 -529.21 1068.4 10.689  1   0.001077 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

WPM Report

The analysis finds no significant association between interaction quality (GOI) and either speech rate (WPM) or the difference in speech rate between dyad members (WPM_diff). The coefficients for WPM (Estimate = 0.21, p = 0.259) and WPM_diff (Estimate = 0.57, p = 0.510) are not statistically significant, nor is the interaction term (Estimate = -0.00, p = 0.450). The marginal R² is 0.018 and the conditional R² is 0.423: the fixed effects explain under 2% of the variance in GOI, and nearly all of the explained variance comes from the dyad random intercept, while the larger share of the total variance remains residual. A likelihood ratio test supported retaining the random intercept for dyads (p = 0.001).
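The two R² values reported for each model can be turned into an approximate variance partition, assuming the usual reading in which marginal R² covers the fixed effects alone and conditional R² covers the fixed effects plus the random intercept. The sketch below applies that arithmetic to the values from the three interaction models in this report; the column names are ours.

```r
# Marginal and conditional R2 of the three interaction models (model3 in each section)
r2 <- data.frame(
  model = c("TW", "ST", "WPM"),
  marginal = c(0.151, 0.128, 0.018),
  conditional = c(0.390, 0.411, 0.423)
)

# Approximate shares of GOI variance (hypothetical column names)
r2$fixed_effects  <- r2$marginal                    # explained by the predictors
r2$dyad_intercept <- r2$conditional - r2$marginal   # added by the dyad random intercept
r2$residual       <- 1 - r2$conditional             # left unexplained

print(r2[, c("model", "fixed_effects", "dyad_intercept", "residual")])
```

In all three models the residual share is the largest component, so the dyad intercept accounts for a substantial part of the variance but not the majority.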

WPM_model <- create_interaction_plot(data, "WPM_diff", "WPM")
print(WPM_model)

## `geom_smooth()` using formula = 'y ~ x'

WPM Plot Report

The interaction plot examines how overall speech rate (WPM) moderates the relationship between the within-dyad difference in speech rate (WPM_diff) and interaction quality (GOI). The mixed model gives no strong evidence for a main effect of WPM (Estimate = 0.21, p = 0.259) or of WPM_diff (Estimate = 0.57, p = 0.510), nor for an interaction between the two (Estimate = -0.00, p = 0.450). Variation in speech rate, whether in individual rates or in differences between dyad members, therefore does not appear to be related to interaction quality in this sample.

Discussion

This study explored how various speech characteristics within dyadic interactions—specifically, total words spoken (TW), speaking time (ST), and speech rate (WPM)—influence the perceived quality of the interaction, as measured by the Goodness of Interaction (GOI) questionnaire. The analyses focused on both the individual effects of these speech characteristics and their differences between dyad members (TW_diff, ST_diff, WPM_diff), as well as the potential moderating effects of overall speech output.

Key Findings

Total Words (TW) and Word Count Difference (TW_diff): The analysis revealed that the relationship between word count difference (TW_diff) and interaction quality (GOI) is significantly moderated by the total words spoken by dyad members (TW). The interaction effect is particularly evident in the highest quartile (Q4), where dyads with smaller differences in word counts tend to evaluate their interactions more favorably. This suggests that when participants in a dyad both speak more, a smaller difference in the amount of words spoken is associated with higher interaction quality. Conversely, in the lower quartiles (Q1, Q2, and Q3), the relationship between TW_diff and GOI is generally positive, indicating that in less verbose conversations, a larger difference in word count might be perceived more favorably.

Speaking Time (ST) and Speaking Time Difference (ST_diff): Similar to the findings on total words, the difference in speaking time (ST_diff) between dyad members was found to have a positive relationship with interaction quality (GOI). However, the interaction effect between total speaking time (ST) and ST_diff was not statistically significant, suggesting that while speaking time differences do influence interaction quality, this effect does not vary significantly with the overall amount of time spoken.

Speech Rate (WPM) and Speech Rate Difference (WPM_diff): The analysis of speech rate (WPM) and its difference between dyad members (WPM_diff) found no significant impact on interaction quality (GOI). Both the main effects and the interaction terms were not statistically significant, indicating that variations in speech rate, whether absolute or relative between dyad members, do not appear to influence how participants perceive the quality of their interaction.

Implications

These findings highlight the nuanced role that speech characteristics play in shaping the perceived quality of social interactions. Specifically, the results suggest that in more verbose conversations (higher TW), maintaining a balance in the amount of speech between participants may be key to fostering positive evaluations of the interaction. This may be because balanced contributions are more likely to be perceived as equitable, thereby enhancing the overall interaction quality.

On the other hand, the lack of significant findings related to speech rate (WPM) implies that while the pace of speech is often considered an important aspect of communication, it may not play a critical role in how interaction quality is assessed in this context. This could be due to the fact that speech rate differences are less noticeable or less impactful in the semi-structured dialogue format used in this study.

Limitations and Future Research

One limitation of this study is its reliance on semi-structured dialogues conducted via Zoom, which may not fully capture the dynamics of face-to-face interactions. Additionally, the relatively small sample analyzed here (60 of the 120 dyads, 120 participants) and the exploratory nature of the analysis mean that these findings should be interpreted with caution and verified in larger, more diverse samples, beginning with the held-out second dataset.

Future research could explore these dynamics in different communication contexts, such as in-person interactions or more informal conversational settings, to determine whether these patterns hold across various modes of communication. Moreover, examining other speech characteristics, such as prosody or the use of pauses, could provide a more comprehensive understanding of the factors that contribute to interaction quality.

Conclusion

In summary, this study contributes to our understanding of how speech characteristics influence the perceived quality of social interactions. The findings suggest that while the balance of speech within dyads (in terms of word count and speaking time) is important, other factors like speech rate may play a less significant role. These insights could have implications for improving communication strategies in settings where interaction quality is critical, such as in therapy, education, or team collaboration.

Time Domain

Load the long-format data, which records the speech characteristics within each of the four parts of the experiment.

data_long <- read.csv("Data/Data_Long.csv")

data_long |>
  select(iSubject, Part, Words, SpeakingTime.s, WPM) |>
  head()

##   iSubject Part Words SpeakingTime.s      WPM
## 1      101    1   480        160.228 179.7439
## 2      101    2   548        235.757 139.4656
## 3      101    3   540        241.493 134.1654
## 4      101    4   471        208.610 135.4681
## 5      102    1   360        206.901 104.3978
## 6      102    2   467        243.549 115.0487

Calculate Similarity

data_long <- data_long |>
  mutate(
    Words_diff = as.numeric(abs(Words - Words_other)),
    ST_diff = abs(SpeakingTime.s - SpeakingTime.s_other),
    WPM_diff = abs(WPM - WPM_other))

data_long |>
  select(iSubject, Part, Words_diff, ST_diff, WPM_diff) |>
  head()

##   iSubject Part Words_diff ST_diff WPM_diff
## 1      101    1        120  46.673 75.34611
## 2      101    2         81   7.792 24.41692
## 3      101    3         51  15.265 22.57914
## 4      101    4         43   6.215 15.92894
## 5      102    1        120  46.673 75.34611
## 6      102    2         81   7.792 24.41692

Time Domain Models

# Baseline models: object names now match the outcome each model predicts
model0_words_diff <- lmer(Words_diff ~ (1|Part), data = data_long)
model0_words <- lmer(Words ~ (1|Part), data = data_long)
# Interaction model for GOI
model1 <- lmer(GOI ~ Words * Words_diff + (1|Part) , data = data_long)

## Warning: Some predictor variables are on very different scales: consider
## rescaling

## boundary (singular) fit: see help('isSingular')

## Warning: Some predictor variables are on very different scales: consider
## rescaling

sjPlot::tab_model(model0_words_diff, model0_words, model1, df.method = "satterthwaite", show.std = TRUE)

## Warning: Some predictor variables are on very different scales: consider
## rescaling
## Warning: Some predictor variables are on very different scales: consider
## rescaling

## boundary (singular) fit: see help('isSingular')

	Words_diff					Words					GOI
Predictors	Estimates	std. Beta	CI	standardized CI	p	Estimates	std. Beta	CI	standardized CI	p	Estimates	std. Beta	CI	standardized CI	p	std. p
(Intercept)	119.16	0.00	102.35 – 135.98	-0.15 – 0.15	<0.001	368.29	0.00	327.64 – 408.95	-0.21 – 0.21	<0.001	50.50	0.10	45.57 – 55.42	0.01 – 0.19	<0.001	0.037
Words											0.04	0.21	0.03 – 0.05	0.11 – 0.31	<0.001	<0.001
Words diff											0.10	0.26	0.07 – 0.14	0.15 – 0.36	<0.001	<0.001
Words × Words diff											-0.00	-0.19	-0.00 – -0.00	-0.27 – -0.12	<0.001	<0.001
Random Effects
σ²	13229.28					38125.18					326.05
τ₀₀	0.76 _Part					335.33 _Part					0.00 _Part
ICC	0.00					0.01
N	4 _Part					4 _Part					4 _Part
Observations	476					476					476
Marginal R² / Conditional R²	0.000 / 0.000					0.000 / 0.009					0.124 / NA

# Baseline models: object names now match the outcome each model predicts
model0_ST_diff <- lmer(ST_diff ~ (1|Part), data = data_long)

## boundary (singular) fit: see help('isSingular')

model0_ST <- lmer(SpeakingTime.s ~ (1|Part), data = data_long)
# Interaction model for GOI
model1 <- lmer(GOI ~ SpeakingTime.s * ST_diff + (1|Part) , data = data_long)

## Warning: Some predictor variables are on very different scales: consider
## rescaling

## boundary (singular) fit: see help('isSingular')

## Warning: Some predictor variables are on very different scales: consider
## rescaling

sjPlot::tab_model(model0_ST_diff, model0_ST, model1, df.method = "satterthwaite", show.std = TRUE)

## boundary (singular) fit: see help('isSingular')

## Warning: Some predictor variables are on very different scales: consider
## rescaling
## Warning: Some predictor variables are on very different scales: consider
## rescaling

## boundary (singular) fit: see help('isSingular')

	ST_diff					SpeakingTime.s					GOI
Predictors	Estimates	std. Beta	CI	standardized CI	p	Estimates	std. Beta	CI	standardized CI	p	Estimates	std. Beta	CI	standardized CI	p	std. p
(Intercept)	43.37	-0.00	39.62 – 47.11	-0.09 – 0.09	<0.001	161.76	0.00	144.95 – 178.58	-0.22 – 0.22	<0.001	52.77	0.05	47.53 – 58.01	-0.04 – 0.14	<0.001	0.272
SpeakingTime s											0.07	0.17	0.04 – 0.10	0.07 – 0.26	<0.001	0.001
ST diff											0.23	0.25	0.14 – 0.33	0.15 – 0.35	<0.001	<0.001
SpeakingTime s × ST diff											-0.00	-0.12	-0.00 – -0.00	-0.19 – -0.05	0.001	0.001
Random Effects
σ²	1729.18					5960.46					333.67
τ₀₀	0.00 _Part					61.71 _Part					0.00 _Part
ICC						0.01
N	4 _Part					4 _Part					4 _Part
Observations	476					476					476
Marginal R² / Conditional R²	0.000 / NA					0.000 / 0.010					0.103 / NA