————————————-Import Data————————————-
print(
all_raw_data %>%
count(video_attention_failure_type)
)
## # A tibble: 3 × 2
## video_attention_failure_type n
## <chr> <int>
## 1 no_boundary_press 2
## 2 rapid_successive_press 14
## 3 <NA> 1856
————————————-Clean Data————————————-
segmentation_data <- clean_data %>%
filter(trial_kind == "segmentation_video") %>%
select(
-any_of(c(
"trial_type", "time_elapsed", "PROLIFIC_PID", "trial_index", "trial_kind", "pair_number",
"confidence_rating"))
)
dim(segmentation_data)
## [1] 772 11
———————————Descriptive Data———————————
Average NoB
ggplot(button_count_long,
aes(x = predictability, y = mean_button_count, group = stimulus_name)
) +
geom_line(alpha = 0.4, color = "grey60") +
geom_point(aes(color = predictability), size = 2.5) +
stat_summary(aes(group = 1), fun = mean, geom = "line", linewidth = 1.2, color = "black"
) +
stat_summary(aes(group = 1), fun = mean, geom = "point", size = 3.5, color = "black"
) +
labs(x = NULL, y = "Mean NoB", title = "Mean NoB Across Predictability Conditions"
) +
theme_minimal(base_size = 14) +
theme(legend.position = "none", plot.title = element_text(face = "bold"), plot.subtitle = element_text(color = "grey40")
)

Variance
ggplot(consensus_long,
aes(x = predictability, y = var_boundary_count, group = stimulus_name)
) +
geom_line(alpha = 0.4, color = "grey60") +
geom_point(aes(color = predictability), size = 2.5) +
stat_summary(aes(group = 1), fun = mean, geom = "line", linewidth = 1.2, color = "black"
) +
stat_summary(aes(group = 1), fun = mean, geom = "point", size = 3.5, color = "black"
) +
labs(x = NULL, y = "Variance of NoB", title = "Within-Video Variability Across Predictability Conditions"
) +
theme_minimal(base_size = 14) +
theme(legend.position = "none", plot.title = element_text(face = "bold"), plot.subtitle = element_text(color = "grey40")
)

Paired-Sample T-Test on Variance
- Do participants disagree more about how many boundaries there are in
unpredictable videos compared with predictable videos?
t.test(consensus_wide$Unpredictable, consensus_wide$Predictable, paired = TRUE)
##
## Paired t-test
##
## data: consensus_wide$Unpredictable and consensus_wide$Predictable
## t = -0.85862, df = 29, p-value = 0.3976
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -0.4862357 0.1986927
## sample estimates:
## mean difference
## -0.1437715
- A paired-samples t-test is appropriate here because the raw
participant-level observations are collapsed into one variance estimate
per video-condition, giving 30 paired values (one pair per video). The
test comparing the within-video variance of boundary counts between
predictable and unpredictable versions revealed no significant
difference, t(29) = −0.86, p = .40, 95% CI [−0.49, 0.20]. Thus, even if
unpredictability changes event structure, participants remain similarly
consistent in the number of boundaries they perceive. However, this
analysis concerns agreement in the number of perceived boundaries and
does not address whether predictability influences agreement in the
temporal locations of those boundaries.
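The collapsing step described above (one variance per video-condition, then one row per video) could be sketched as follows. This is a minimal illustration on made-up toy data, not the code that produced `consensus_long` and `consensus_wide` in this report; `toy_data` and `variance_diff` are hypothetical names, and the column names are assumed to match those used in the plots and t-test.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy stand-in for segmentation_data:
# 2 videos x 2 conditions x 3 participants.
toy_data <- tibble(
  run_id         = rep(1:3, times = 4),
  stimulus_name  = rep(c("video_a", "video_b"), each = 6),
  predictability = rep(rep(c("Predictable", "Unpredictable"), each = 3), times = 2),
  boundary_count = c(4, 5, 6,  3, 5, 7,  10, 10, 10,  8, 10, 12)
)

# Long format: one variance estimate per video x condition.
consensus_long <- toy_data %>%
  group_by(stimulus_name, predictability) %>%
  summarise(var_boundary_count = var(boundary_count), .groups = "drop")

# Wide format: one row per video, one column per condition,
# so the two variances of each video can be paired.
consensus_wide <- consensus_long %>%
  pivot_wider(names_from = predictability, values_from = var_boundary_count)

# The paired t-test operates on these per-video differences.
variance_diff <- consensus_wide$Unpredictable - consensus_wide$Predictable
variance_diff
```

Because each video contributes exactly one difference, the test has one fewer degree of freedom than there are videos, which is where df = 29 comes from with 30 videos.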
—————————–Mixed Effect Regression—————————–
MEM for the effect of Predictability on NoB
- After accounting for individual differences in segmentation
tendencies and differences among videos, does predictability affect the
average number of boundaries?
segmentation_data <- segmentation_data %>%
mutate(boundary_count = as.numeric(boundary_count))
MEM_mean_Gaussian <- lmer(boundary_count ~ predictability + (1 | run_id) + (1 | stimulus_name), data = segmentation_data)
summary(MEM_mean_Gaussian)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: boundary_count ~ predictability + (1 | run_id) + (1 | stimulus_name)
## Data: segmentation_data
##
## REML criterion at convergence: 3816.7
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.4476 -0.5625 0.0042 0.5581 4.8587
##
## Random effects:
## Groups Name Variance Std.Dev.
## stimulus_name (Intercept) 3.929 1.982
## run_id (Intercept) 17.349 4.165
## Residual 6.829 2.613
## Number of obs: 772, groups: stimulus_name, 30; run_id, 13
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 10.1588 1.2179 14.5616 8.341 6.34e-07 ***
## predictabilityUnpredictable -0.2880 0.1882 729.0515 -1.530 0.126
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## prdctbltyUn -0.077
anova(MEM_mean_Gaussian)
## Type III Analysis of Variance Table with Satterthwaite's method
## Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
## predictability 15.995 15.995 1 729.05 2.3424 0.1263
- A mixed-effects model predicting boundary count from predictability,
with random intercepts for participant and stimulus, revealed no
significant effect of predictability, F(1, 729.05) = 2.34, p = .126.
Participants marked, on average, 0.29 fewer boundaries in the
unpredictable condition relative to the predictable condition (β =
−0.29, SE = 0.19). Random effects indicated substantial variability
across participants (SD = 4.17) and, to a lesser extent, across stimuli
(SD = 1.98).
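To make the relative size of the random effects concrete, the variance components reported in the summary above can be converted into shares of the total variance. The sketch below simply copies the three variances from the printed output; the object names are illustrative.

```r
# Variance components copied from the lmer summary above.
variance_components <- c(
  run_id        = 17.349,  # between-participant variance
  stimulus_name = 3.929,   # between-video variance
  residual      = 6.829    # trial-level residual variance
)

# Share of the total variance attributable to each source.
variance_share <- variance_components / sum(variance_components)
round(variance_share, 2)

# Square roots recover the standard deviations shown in the summary.
round(sqrt(variance_components), 3)
```

On these numbers, participants account for roughly 62% of the total variance, videos for about 14%, and the residual for about 24%, which is consistent with the description of participant variability as the dominant source.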
MEM with Negative Binomial Distribution
mean(segmentation_data$boundary_count, na.rm = TRUE)
## [1] 10.00518
var(segmentation_data$boundary_count, na.rm = TRUE)
## [1] 26.57325
MEM_mean_NB <- glmer.nb(boundary_count ~ predictability + (1 | run_id) + (1 | stimulus_name), data = segmentation_data)
## Warning in theta.ml(Y, mu, weights = object@resp$weights, limit = limit, :
## iteration limit reached
summary(MEM_mean_NB)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: Negative Binomial(859060.9) ( log )
## Formula: boundary_count ~ predictability + (1 | run_id) + (1 | stimulus_name)
## Data: segmentation_data
##
## AIC BIC logLik -2*log(L) df.resid
## 3683.0 3706.3 -1836.5 3673.0 767
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.2715 -0.4557 -0.0141 0.4544 2.1926
##
## Random effects:
## Groups Name Variance Std.Dev.
## stimulus_name (Intercept) 0.04175 0.2043
## run_id (Intercept) 0.17996 0.4242
## Number of obs: 772, groups: stimulus_name, 30; run_id, 13
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.21501 0.11912 18.596 <2e-16 ***
## predictabilityUnpredictable -0.02924 0.02268 -1.289 0.197
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## prdctbltyUn -0.084
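- The negative binomial model reached the same conclusion as the
Gaussian model: unpredictability was associated with a non-significant
2.9% reduction in boundary counts (z = −1.29, p = .197). Thus, the
absence of a predictability effect did not depend on the Gaussian
assumption. Note, however, that the estimated dispersion parameter is
extremely large (θ ≈ 859,061) and the θ estimation reached its
iteration limit, so this fit is effectively a Poisson model, as its
near-identical log-likelihood and coefficients also show.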
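Because the model uses a log link, the coefficient is on the log scale, and the percentage change comes from exponentiating it. The sketch below reproduces that arithmetic from the printed estimate and standard error; the Wald interval is an approximation added here for illustration and was not part of the original output.

```r
# Estimate and standard error copied from the glmer.nb summary above.
estimate  <- -0.02924
std_error <- 0.02268

# Rate ratio: multiplicative change in expected boundary count
# for Unpredictable relative to Predictable.
rate_ratio     <- exp(estimate)
percent_change <- (rate_ratio - 1) * 100
round(percent_change, 1)

# Approximate 95% Wald interval, computed on the log scale
# and then back-transformed to the rate-ratio scale.
ci_log   <- estimate + c(-1, 1) * qnorm(0.975) * std_error
ci_ratio <- exp(ci_log)
round(ci_ratio, 3)
```

The interval on the rate-ratio scale spans 1, in line with the non-significant z-test.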
MEM with Poisson Distribution
model_pois <- glmer(boundary_count ~ predictability + (1 | run_id) + (1 | stimulus_name), family = poisson,
data = segmentation_data)
summary(model_pois)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: poisson ( log )
## Formula: boundary_count ~ predictability + (1 | run_id) + (1 | stimulus_name)
## Data: segmentation_data
##
## AIC BIC logLik -2*log(L) df.resid
## 3681.0 3699.6 -1836.5 3673.0 768
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.2716 -0.4557 -0.0141 0.4544 2.1926
##
## Random effects:
## Groups Name Variance Std.Dev.
## stimulus_name (Intercept) 0.04175 0.2043
## run_id (Intercept) 0.17996 0.4242
## Number of obs: 772, groups: stimulus_name, 30; run_id, 13
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.21499 0.12458 17.780 <2e-16 ***
## predictabilityUnpredictable -0.02924 0.02274 -1.286 0.198
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## prdctbltyUn -0.090
overdispersion_ratio <-
sum(residuals(model_pois, type = "pearson")^2) / df.residual(model_pois)
overdispersion_ratio
## [1] 0.4980728
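- The overdispersion ratio (sum of squared Pearson residuals divided by
the residual degrees of freedom) is 0.50. After accounting for
participant and stimulus effects, residual variation is therefore only
about half of what a Poisson model would expect: conditional on the
random effects, the counts are underdispersed rather than overdispersed,
even though the raw variance (26.57) far exceeds the raw mean (10.01).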
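The dispersion check can be wrapped in a small helper. The sketch below is an illustration under stated assumptions: `dispersion_ratio()` is a hypothetical helper name, and it is demonstrated on simulated Poisson data with a plain `glm()` fit rather than the `glmer()` model above, so that it runs with base R alone. It relies only on `residuals(..., type = "pearson")` and `df.residual()`, the same two calls used in the report.

```r
# Hypothetical helper: Pearson chi-square divided by residual df.
# Values near 1 are consistent with Poisson variation; values well
# below 1 suggest underdispersion, well above 1 overdispersion.
dispersion_ratio <- function(model) {
  pearson_chisq <- sum(residuals(model, type = "pearson")^2)
  pearson_chisq / df.residual(model)
}

# Simulated data that truly follow a Poisson model (assumption:
# the coefficients are arbitrary illustrative values).
set.seed(1)
n <- 500
x <- rep(c(0, 1), each = n / 2)
y <- rpois(n, lambda = exp(2.2 - 0.03 * x))

toy_fit <- glm(y ~ x, family = poisson)
ratio   <- dispersion_ratio(toy_fit)
ratio
```

For data that really are Poisson, the ratio should land close to 1, which gives a reference point for reading the value of about 0.50 obtained for `model_pois`.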