In these preprocessing steps, we follow the guidelines of Mathot et al. (2023) and Kret & Sjak Shie (2018) as closely as possible. Below is the general outline of the steps, and following that will be all of the code and explanations for each step.
Remove impossible values
Smooth pupil trace
Calculate velocity
Detect blinks
Interpolate
Detect blinks again
Interpolate again
Smooth final trace
Calculate baseline
Perform subtractive baseline correction
0. Load in data
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(tidyr)library(zoo)
Warning: package 'zoo' was built under R version 4.2.3
Attaching package: 'zoo'
The following objects are masked from 'package:base':
as.Date, as.Date.numeric
library(stringr)
Warning: package 'stringr' was built under R version 4.2.3
setwd("D:/Canna_d/EM6_stuff/Feb28_all_pilot_business/scripts_and_data/processed_data")All_ss_ET <-read.csv("MC_ET_preprocessed_step1.csv", header =TRUE)# Below cleans up the data a littleAll_ss_ET <- All_ss_ET %>%arrange(participant,calFrame) %>%mutate(condsFile =str_replace(condsFile,".csv",""),phase =str_replace(phase,"Start",""),trial=as.numeric(trial),location=row_number())# get mean of L and R pupil, as they are correlated# (Jackson & Sirois, 2009)All_ss_ET <- All_ss_ET %>%mutate(mean_pupil =if_else(is.na(left_pupil_measure1) &is.na(right_pupil_measure1), NA_real_, #If both columns are NA, it will just be NAif_else(is.na(left_pupil_measure1), right_pupil_measure1, #If left is na, use right pupilif_else(is.na(right_pupil_measure1), left_pupil_measure1, #If right is na, use left pupil (left_pupil_measure1 + right_pupil_measure1) /2) #If they are both available, get the mean! )))# organize data into problem and baseline phaseproblem_data <- All_ss_ET %>%filter(phase =="problem")baseline_data <- All_ss_ET %>%filter(phase =="baseline")
1. Remove impossible values
First we remove any values that would be physiologically impossible (negative or outside the range of 1.5 and 9 mm. This happens due to errors in the recording of data by the eyetracker.
problem_data <- problem_data %>%# remove values in mean_pupil that are negative# remove values in mean_pupil that are below 1.5 and above 9# Below does both at once (only includes pupil values within our range)# Keep NA for nowfilter((mean_pupil >=1.5& mean_pupil <=9) |is.na(mean_pupil))
2. Smooth trace
We use a Hanning window as that is what Mathot & Vilotijevic (2023) use to lightly smooth the trace prior to blink correction.
Note that Mathot’s code is for EyeLink 1000 (1000 Hz). The default window length for his smoother is 21, which is about 20 ms. Since we are using a 60 Hz Tobii, this imperfectly translates to using a window length of 3 (for us it would be a window length of 49.8 ms, because this function has a minimum window size of 3 points).
Also note that I am just using ggplot because it makes it easier to visualize all of the participants before and after the smoother but blinks might show up as gaps because ggplot just removes values that don’t fit in the scale.
library(Convolutioner)library(ggplot2)# below applies Hann filtersmoothed_trace_data <- problem_data %>%group_by(participant, condsFile, trial) %>%mutate(counter =row_number()) %>%mutate(smoothed_pupil =Hann(mean_pupil, buffer_size =3))## Visualize# Make a frame counter for each trialsmoothed_trace_data <- smoothed_trace_data %>%group_by(participant, condsFile, trial) %>%mutate(counter =row_number()) %>%ungroup()# Filter the data for trial 1Trace_data_trial1 <- smoothed_trace_data %>%filter(trial ==1&!is.na(d_time))# Plot traces for all participants in trial 1ggplot(Trace_data_trial1, aes(x = counter, y = smoothed_pupil, group = participant, color = participant)) +geom_line(size =1) +# Line plot for each participant's tracefacet_wrap(~ participant, scales ="free_y") +# Facet by participant, with independent y-axislabs(title ="Smoothed pupil - Trial 1 (All Participants)",x ="Frame",y ="Pupil (mm)") +theme_minimal() +theme(legend.position ="none")
ggplot(Trace_data_trial1, aes(x = counter, y = mean_pupil, group = participant, color = participant)) +geom_line(size =1) +# Line plot for each participant's tracefacet_wrap(~ participant, scales ="free_y") +# Facet by participant, with independent y-axislabs(title ="Raw pupil - Trial 1 (All Participants)",x ="Frame",y ="Pupil (mm)") +theme_minimal() +theme(legend.position ="none")
2. Calculate velocity
We calculate the velocity by getting the 1st derivative between time points, in the style of Mathot & Vilotijevic (2023).
smoothed_trace_data <- smoothed_trace_data %>%group_by(participant, condsFile, trial) %>%# make first value NA because we can't get a difference with 1 valuemutate(vtrace =c(NA, diff(smoothed_pupil, lag =1)))
3. Detect blinks
For blink correction, I calculate velocity speed outliers by getting the median absolute deviation and multiplying it by a constant. This method was in Preprocessing pupil size data: Guidelines and code by Kret and Sjak Shie (2019). About the constant, they say it “should be chosen empirically by researchers so that they best fit a particular dataset. It is our experience that no”one size fits all” set of rejection criteria exists, due to differing eyetracker sampling rates, precision, noise susceptibility, and pupil detection algorithms.” From playing around with different values, it seems constants between 5 and 10 do a decent job at removing blink artifacts.
# Calculate the threshold using the MAD methodMAD <-median(abs(smoothed_trace_data$vtrace -median(smoothed_trace_data$vtrace, na.rm =TRUE)), na.rm =TRUE)# Below is the constant we multiply by the MADx <-7threshold <-median(smoothed_trace_data$vtrace, na.rm =TRUE) + (x * MAD)# Identify rows where velocity exceeds the thresholdblink_rows <-which(smoothed_trace_data$vtrace >= threshold)# Make table of how many blinks showed up for each participantblink_data <- smoothed_trace_data %>%filter(row_number() %in% blink_rows) %>%select(participant, condsFile, trial)# checkpointblink_corrected_1 <- smoothed_trace_data# Replace ~50ms of rows around the blink rows with NAfor (i in blink_rows) { start_row <-max(1, i -3) end_row <-min(nrow(blink_corrected_1), i +3) blink_corrected_1$smoothed_pupil[start_row:end_row] <-NA}
4. Interpolate
We will use cubic spline interpolation as per Mathot & Vilotijevic (2023), but if the gap is over 250ms, we do not interpolate.
# Spline interpolation of mean_pupil# but DO NOT interpolate over gaps over 15 rowslibrary(zoo)interpolated_pupil <-na.spline(zoo(blink_corrected_1$smoothed_pupil), maxgap =16)blink_corrected_1$interpolated_pupil <- interpolated_pupil## Visualize# Filter the data for trial 1Trace_data_trial1 <- blink_corrected_1 %>%filter(condsFile =="hardFirst"& trial ==1&!is.na(d_time))# Plot traces for all participants in trial 2ggplot(Trace_data_trial1, aes(x = counter, y = smoothed_pupil, group = participant, color = participant)) +geom_line(size =1) +# Line plot for each participant's tracefacet_wrap(~ participant, scales ="free_y") +# Facet by participant, with independent y-axislabs(title ="Blink corrected (1st iteration) pupil - Trial 1 (All Participants)",x ="Frame",y ="Pupil (mm)") +theme_minimal() +theme(legend.position ="none")
Warning: Removed 415 rows containing missing values or values outside the scale range
(`geom_line()`).
# Same but make it the non-interpolated to compareggplot(Trace_data_trial1, aes(x = counter, y = interpolated_pupil, group = participant, color = participant)) +geom_line(size =1) +# Line plot for each participant's tracefacet_wrap(~ participant, scales ="free_y") +# Facet by participant, with independent y-axislabs(title ="INTERPOLATED pupil - Trial 1 (All Participants)",x ="Frame",y ="Pupil (mm)") +theme_minimal() +theme(legend.position ="none")
Don't know how to automatically pick scale for object of type <zoo>. Defaulting
to continuous.
Warning: Removed 364 rows containing missing values or values outside the scale range
(`geom_line()`).
5. Smooth trace again
Smooth trace again before second round of blink correction.
Go through blink correction a second time (recommendation by Kret & Sjak Shie).
# Calculate the threshold using the MAD method# Only consider rows in smoothed_pupil that are not NAMAD <- blink_corrected_1 %>%filter(!is.na(interpolated_pupil)) %>%# Keep only rows where smoothed_pupil is not NApull(vtrace) %>%# Extract vtrace as a numeric vector { median(abs(. -median(., na.rm =TRUE)), na.rm =TRUE) } # Compute MAD# Below is the constant we multiply by the MADx <-7threshold <-median(blink_corrected_1$vtrace, na.rm =TRUE) + (x * MAD)# Identify rows where velocity exceeds the thresholdblink_rows <-which(blink_corrected_1$vtrace >= threshold)# Make table of how many blinks showed up for each participantblink_data <- blink_corrected_1 %>%filter(row_number() %in% blink_rows) %>%select(participant, trial)
Adding missing grouping variables: `condsFile`
# checkpointblink_corrected_2 <- blink_corrected_1# Replace ~50ms of rows around the blink rows with NAfor (i in blink_rows) { start_row <-max(1, i -3) end_row <-min(nrow(blink_corrected_2), i +3) blink_corrected_2$interpolated_pupil[start_row:end_row] <-NA}
7. Interpolate again
Cubic spline, unless over 250 ms.
# Spline interpolation of mean_pupil# but DO NOT interpolate over gaps over 15 rowsinterpolated_pupil <-na.spline(zoo(blink_corrected_2$interpolated_pupil), maxgap =16)blink_corrected_2$interpolated_pupil2 <- interpolated_pupil## Visualize# Filter the data for trial 1Trace_data_trial1 <- blink_corrected_2 %>%filter(condsFile =="hardFirst"& trial ==1&!is.na(d_time))# Plot traces for all participants in trial 2ggplot(Trace_data_trial1, aes(x = counter, y = smoothed_pupil, group = participant, color = participant)) +geom_line(size =1) +# Line plot for each participant's tracefacet_wrap(~ participant, scales ="free_y") +# Facet by participant, with independent y-axislabs(title ="Blink corrected (2nd iteration) pupil - Trial 1 (All Participants)",x ="Frame",y ="Pupil (mm)") +theme_minimal() +theme(legend.position ="none")
Warning: Removed 368 rows containing missing values or values outside the scale range
(`geom_line()`).
# Same but make it the non-interpolated to compareggplot(Trace_data_trial1, aes(x = counter, y = interpolated_pupil2, group = participant, color = participant)) +geom_line(size =1) +# Line plot for each participant's tracefacet_wrap(~ participant, scales ="free_y") +# Facet by participant, with independent y-axislabs(title ="INTERPOLATED pupil - Trial 1 Trip 1 (All Participants)",x ="Frame",y ="Pupil (mm)") +theme_minimal() +theme(legend.position ="none")
Don't know how to automatically pick scale for object of type <zoo>. Defaulting
to continuous.
Warning: Removed 375 rows containing missing values or values outside the scale range
(`geom_line()`).
The baseline will be the median of the first 200 ms of each problem phase. If there is not at least 50 ms worth of good data here, the baseline is set to NA (and this trial will be excluded).
blink_corrected_2 <- blink_corrected_2 %>%group_by(participant, condsFile, trial) %>%mutate(baseline = {# Select the first 12 rows first_12 <- smoothed_pupil[1:12]# Check if there are at least 3 non-NA valuesif (sum(!is.na(first_12)) >=3) {# Calculate the median of the first 7 rowsmedian(first_12, na.rm =TRUE) } else {# Assign NA if the condition is not metNA } }) %>%ungroup()# see how many trials do not have valid baselinebad_baselines <- blink_corrected_2 %>%group_by(participant, condsFile, trial) %>%filter(all(is.na(baseline))) %>%ungroup()# remove the trials with invalid baselineblink_corrected_2 <- blink_corrected_2 %>%group_by(participant, condsFile, trial) %>%filter(!all(is.na(baseline))) %>%ungroup()# Visualize the baselines:hist(blink_corrected_2$baseline)
10. Subtractive baseline correction
We use subtractive correction as it is the most robust form of baseline correction (Mathot et al., 2018).
#Get change-from-baseline through subtractive correctionbaseline_corrected_data <- blink_corrected_2 %>%mutate(change_from_baseline =ifelse(is.na(smoothed_pupil), NA, smoothed_pupil - baseline))## Visualize# Filter the data for trial 1Trace_data_trial1 <- baseline_corrected_data %>%filter(condsFile =="hardFirst"& trial ==1&!is.na(d_time))ggplot(Trace_data_trial1, aes(x = counter, y = smoothed_pupil, group = participant, color = participant)) +geom_line(size =1) +# Line plot for each participant's tracefacet_wrap(~ participant, scales ="free_y") +labs(title ="Smoothed pupil - Trial 1 (All Participants)",x ="Frame",y ="Pupil (mm)") +theme_minimal() +theme(legend.position ="none")
Warning: Removed 55 rows containing missing values or values outside the scale range
(`geom_line()`).
ggplot(Trace_data_trial1, aes(x = counter, y = change_from_baseline, group = participant, color = participant)) +geom_line(size =1) +# Line plot for each participant's tracefacet_wrap(~ participant, scales ="free_y") +labs(title ="Change from baseline - Trial 1 (All Participants)",x ="Frame",y ="Pupil (mm)") +theme_minimal() +theme(legend.position ="none")
Warning: Removed 55 rows containing missing values or values outside the scale range
(`geom_line()`).