Created: 2/17/25
To-do:
- [Assess your results compared to the literature {basically done!}] And write this up into one nice, tight paragraph suitable for pasting directly into your nascent paper! ✅
- Write up the preprocessing steps in as much detail as necessary (see Jamie’s paper for ideas on how it should sound) ✅
- Double-check your code itself for errors. ✅
- Share all the processing steps (smoothing, interpolation, averaging, baseline period, etc.) so we can vet together against “best-practices”. ✅
- Let’s confirm/adjust that what’s on-screen right before and during the mental math conditions is identical, and nominally isoluminant. (we need to do this for the other conditions too!!) (⚠️Need to do)
- Let’s confirm that the heat maps of fixations during the two mental math conditions are grossly similar. Actually, maybe the easiest way to do this is to confirm that a whole-screen AOI accumulates the same amount of gaze time in the two conditions. It makes sense to confirm this for the other conditions as well, for the period where we’d be looking at the pupil. ✅
- Make a baseline-corrected trace and raw trace of everyone, for every trial ✅
1. Write-up for our results
[Assess your results compared to the literature {basically done!}] And write this up into one nice, tight paragraph suitable for pasting directly into your nascent paper!
We are assessing the phasic task-evoked pupillary response (TEPR), which is the fast (onset as quickly as 220 ms) change in pupil size that occurs due to task demands or cognitive load. The TEPR occurs in a window typically between 500 ms and 3000 ms after stimulus onset, which our experimental design is suitable to detect (additionally, the time between subsequent periods where we assess the TEPR is always at least 10 seconds, which is more than enough time for the TEPR to subside between measurements). In previous studies, performing mental arithmetic typically produces a peak pupil dilation ranging from 0.2 mm to 0.5 mm (Marquart & de Winter, 2015; Rozado & Dunser, 2015). In our study, we observed an average dilation of 0.38 mm for difficult addition equations (those requiring carry-over operations) and 0.13 mm for easy addition equations.
Mental calc:
We expected that the pupil diameter during the period after the addition equation appears on the screen (when participants are mentally calculating their answer) would be larger for difficult addition equations than for easier addition equations. To test this, we calculated the mean pupil diameter over the last 4000 ms of the mental calculation period (after the equation appears on the screen and before the answer choices appear, 5000 ms in total). We compared the means for the difficult and easy addition problems with a paired-samples t-test.
A paired-samples t-test comparing the mean TEPR in the mental calculation period between difficult and easy addition equations was significant: t(22) = -3.25, p = 0.004, Cohen’s d = __. The pupil diameter for difficult addition equations was on average 0.27 mm larger than for easy addition equations.
Shopping Game:
We expected that the pupil diameter for 4th-quartile memorizers would be higher than that of 1st-quartile memorizers. To test this, we calculated the mean pupil diameter over the last 4000 ms of the first maintenance-delay period (going from the list to the shelf), averaged over all trials in the maintenance-delay condition for each participant. We compared the means for 4th-quartile and 1st-quartile memorizers with an independent-samples t-test.
[We only have ~10 in each group even when I split into 2 groups, and that isn’t enough power for a t-test once we average the TEPRs within each participant. Below are the results of the TOST, which is non-significant: we can’t rule out a meaningful effect, we just don’t have the power to detect one. Also note that for this I am just using 2 groups, but in the real study, when we have lots of participants, we can compare the 1st and 4th quartiles. The 2-group split did make a meaningful separation (the 1st group is 5 clicks and under, the 2nd is over 5), so it captures all the heavy loaders plus those who try to heavy-load but get some wrong.]
Draft: An independent-samples t-test comparing the mean TEPR in the maintenance-delay period between 4th-quartile and 1st-quartile memorizers in the maintenance-delay condition was significant: t(__) = __, p = __, Cohen’s d = __. The pupil diameter for 4th-quartile memorizers was __ mm larger than for 1st-quartile memorizers.
Code for above results:
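For reference, here’s roughly what that computation looks like (a sketch only — `dat`, `time_ms`, and `pupil_bc` are illustrative names; the real scripts may differ):

```r
library(dplyr)
library(tidyr)

# Mean baseline-corrected pupil per participant and difficulty, over the
# last 4000 ms of the 5000-ms mental calculation period
avg_per_Ss_wide <- dat |>
  filter(time_ms >= 1000, time_ms <= 5000) |>
  group_by(participant, difficulty) |>
  summarise(mean_pupil = mean(pupil_bc, na.rm = TRUE), .groups = "drop") |>
  pivot_wider(names_from = difficulty, values_from = mean_pupil)

# Paired-samples t-test, easy vs. hard
t.test(avg_per_Ss_wide$easy, avg_per_Ss_wide$hard, paired = TRUE)
```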
Paired t-test
data: avg_per_Ss_wide$easy and avg_per_Ss_wide$hard
t = -3.25, df = 22, p-value = 0.003672
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
-0.44151711 -0.09754086
sample estimates:
mean difference
-0.269529
[1] "AVG TEPR for quartile 1"
[1] "AVG TEPR for quartile 2"
[1] "SD TEPR for quartile 1"
[1] "SD TEPR for quartile 2"
TOST results:
t-value lower bound: 0.178 p-value lower bound: 0.430
t-value upper bound: -1.65 p-value upper bound: 0.058
degrees of freedom : 18.7
Equivalence bounds (Cohen's d):
low eqbound: -0.4
high eqbound: 0.4
Equivalence bounds (raw scores):
low eqbound: -0.0556
high eqbound: 0.0556
TOST confidence interval:
lower bound 90% CI: -0.15
upper bound 90% CI: 0.06
NHST confidence interval:
lower bound 95% CI: -0.172
upper bound 95% CI: 0.083
Equivalence Test Result:
The equivalence test was non-significant, t(18.7) = 0.178, p = 0.430, given equivalence bounds of -0.0556 and 0.0556 (on a raw scale) and an alpha of 0.05.
Null Hypothesis Test Result:
The null hypothesis test was non-significant, t(18.7) = -0.737, p = 0.470, given an alpha of 0.05.
NHST: don't reject null significance hypothesis that the effect is equal to 0
TOST: don't reject null equivalence hypothesis
2. Write-up for the pre-processing steps I used
Write up the preprocessing steps in as much detail as necessary (see Jamie’s paper for ideas on how it should sound)
[Updated 2/22/25]
We took reference from Kret and Sjak-Shie (2019) and Mathôt and Vilotijević (2022) when pre-processing our data, as they provide an open-source, best-practices approach that translates to a wider range of data formats than many pre-built pre-processing pipelines [NOTE my opinion is that pupillometryR is a black box for beginners and it’s scarily easy to mess things up without knowing you messed things up. Happy to try again with it though]. First, we averaged the left and right pupil diameters at every time point. To correct for blinks, artifacts, gaps in the data, and nonuniform sampling, we calculated the absolute change between samples divided by the temporal separation of the samples to obtain the normalized dilation speed at each time point. We then used the Median Absolute Deviation (Leys, Ley, Klein, Bernard, & Licata, 2013) in the following equation to get a threshold for detecting blinks and artifacts, where d′ is the dilation speed defined above and n is a constant, which we set to 7 based on blink detection performance prior to any data analysis or inference [NOTE that Kret and Sjak-Shie say this about the constant: “For the best results, the parameters of the filtering approach introduced in this section, such as n in Eq. 3, should be chosen empirically by researchers so that they best fit a particular dataset. It is our experience that no ‘one size fits all’ set of rejection criteria exists, due to differing eyetracker sampling rates, precision, noise susceptibility, and pupil detection algorithms.”]:
\(\text{Threshold} = \text{median}(d') + n \cdot \text{MAD}\)
Time points with dilation speeds above the threshold, as well as ~35 ms surrounding each such time point, were removed. After this blink and artifact correction procedure, we used cubic spline interpolation on gaps no larger than 250 ms. Data were then smoothed with a moving median filter (window width of eleven samples, ~185 ms). Between each of these steps, we visually inspected the pupil traces. Finally, we performed subtractive baseline correction, subtracting the median pupil diameter over the baseline period (the first 115 ms after stimulus onset) from every time point (Mathôt et al., 2018). All further references to pupil diameter refer to these final, baseline-corrected values.
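For concreteness, here is a minimal sketch of that filter in R. I’m assuming a data frame `dat` with columns `pupil_left`, `pupil_right`, and `time_s`; the names are illustrative, not the ones in the actual scripts:

```r
# 1. Average the left and right pupil diameters at every time point
pupil <- rowMeans(cbind(dat$pupil_left, dat$pupil_right), na.rm = TRUE)

# 2. Normalized dilation speed: absolute change between samples divided
#    by their temporal separation
d_prime <- abs(diff(pupil)) / diff(dat$time_s)

# 3. MAD-based threshold, n = 7 (constant = 1 gives the raw MAD rather
#    than R's default normal-consistent scaling)
n <- 7
threshold <- median(d_prime, na.rm = TRUE) +
  n * mad(d_prime, constant = 1, na.rm = TRUE)

# 4. Reject samples whose dilation speed exceeds the threshold (the ~35 ms
#    padding around each rejection is omitted here for brevity); the gaps
#    are later filled by cubic spline interpolation if they are <= 250 ms
bad <- c(FALSE, d_prime > threshold)
pupil[which(bad)] <- NA
```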
3. Check code for errors
Double-check your code itself for errors
Before making this Quarto document, I went through the preprocessing code for the Shopping Game itself, and then for the mental calculation task. The data and scripts presented here are based on the most refined and commented versions so far. No promises that errors don’t exist, but I honestly did not see any that haven’t been taken care of.
4. Share pre-processing steps
Share all the processing steps (smoothing, interpolation, averaging, baseline period, etc.) so we can vet together against “best-practices”.
[Updated 2/22/25]
See the scripts mentioned below in this project folder to look at the process in detail. I especially commented the markdown file MC_ET_preprocessing_step2.Rmd with the aim that it would make sense to an outsider. That markdown file contains all of the preprocessing on the actual pupil data itself. “Step1” was just combining individual participant data files into one data set. Anyways, here is an overview:
- Each participant’s data file is individually loaded in its own R script; you can find those in the folder All_preprocessing\raw_data_and_scripts. Each script extracts the eyetracking data and the experiment data from the hdf5 file and the csv file that PsychoPy spits out when a participant completes the experiment, combines them, and checks the alignment of the frames. Those scripts don’t do any preprocessing of the eyetracking data itself.
- The file MC_ET_preprocessing_step1.R combines each participant’s data into one dataset and saves it as MC_ET_preprocessed_step1.csv. It also contains the old pupil preprocessing script, but we are not using that one; instead we go to MC_ET_preprocessing_step2.Rmd.
- In MC_ET_preprocessing_step2.Rmd, the pupil data is preprocessed. First the mean of the left and right pupil is calculated, then data quality is assessed and visualized. Blink correction using MAD is then performed, and data quality is assessed and visualized a second time. Cubic spline interpolation is applied to gaps no larger than 250 ms, and the data are visualized after interpolation. A running median filter with k = 11 is applied to the pupil data, using the runmed() function from the stats package. After visualizing the effect of the filter, the baseline is calculated as the median of the first 115 ms of the problem phase (the first 115 ms right when the problem appears on the screen) and subtracted from the pupil values (see the sketch after this list). The result is saved as MC_ET_preprocessed_step2.csv. These steps are repeated for the eyetracking data in the maintenance-delay period of the Shopping Game.
- That is the end of preprocessing; the file MC_ET_preprocessed_step2.csv can now be used for plotting.
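For the smoothing and baseline steps specifically, the logic is roughly this (a sketch with illustrative names — the real code lives in MC_ET_preprocessing_step2.Rmd):

```r
library(dplyr)

dat_final <- dat_interp |>
  group_by(participant, trial) |>
  mutate(
    # running median filter, k = 11 samples (~185 ms)
    pupil_smooth = stats::runmed(pupil, k = 11),
    # baseline: median over the first 115 ms of the problem phase
    # (assumes time_ms = 0 at problem onset)
    baseline = median(pupil_smooth[time_ms <= 115], na.rm = TRUE),
    # subtractive baseline correction
    pupil_bc = pupil_smooth - baseline
  ) |>
  ungroup()
```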
5. Confirm isoluminance
Let’s confirm/adjust that what’s on-screen right before and during the mental math conditions is identical, and nominally isoluminant. (we need to do this for the other conditions too!!)
I am thinking here that I could show what is on the screen in each condition, and report what the luminance meter reads for each?
6. Confirm participant gaze w/ heatmaps + AOI comparisons
Let’s confirm that the heat maps of fixations during the two mental math conditions are grossly similar. Actually, maybe the easiest way to do this is to confirm that a whole-screen AOI accumulates the same amount of gaze time in the two conditions. It makes sense to confirm this for the other conditions as well, for the period where we’d be looking at the pupil.
It’s actually easier to look at the heatmap, so we’ll do that first.
Overall gaze positions:
Gaze positions for hard difficulty:
Gaze positions for easy difficulty:
Looking good. For the AOI check, the whole screen is not really set up as an AOI, but we can approximate it by looking at how much solid eyetracking data there is in each difficulty condition. We can compare the NAs, and we can also compare how many x and y gaze positions fall within -1.0 and 1.0 (values outside that range would be off-screen).
First, let’s compare the NA percentages between the easy and hard conditions.
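The tallies below come from something like this (a sketch; `gaze`, `gaze_x`, and `gaze_y` are illustrative names):

```r
library(dplyr)

gaze |>
  filter(difficulty == "hard") |>   # and again with "easy"
  summarise(
    total = n(),
    na_count = sum(is.na(gaze_x) | is.na(gaze_y)),
    na_percentage = 100 * na_count / total
  )
```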
Percentage of NA values in hard condition:
# A tibble: 1 × 3
total na_count na_percentage
<int> <int> <dbl>
1 32688 2039 6.24
Percentage of NA values in easy condition:
# A tibble: 1 × 3
total na_count na_percentage
<int> <int> <dbl>
1 32066 2440 7.61
They have similar percentages of NA values, which is good. Now let’s compare how much of the gaze data falls on-screen versus off-screen, for each difficulty level.
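Same idea for the off-screen tally (a sketch, same illustrative names):

```r
gaze |>
  filter(difficulty == "hard") |>   # and again with "easy"
  summarise(
    total = n(),
    offscreen_count = sum(abs(gaze_x) > 1 | abs(gaze_y) > 1, na.rm = TRUE),
    offscreen_percentage = 100 * offscreen_count / total
  )
```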
Percent of off-screen gaze in hard condition:
# A tibble: 1 × 3
total offscreen_count offscreen_percentage
<int> <int> <dbl>
1 32702 2 0.00612
Percent of off-screen gaze in easy condition:
# A tibble: 1 × 3
total offscreen_count offscreen_percentage
<int> <int> <dbl>
1 32091 4 0.0125
Don’t be too surprised by these low values. When a participant’s gaze is off-screen, the eyetracker is much more likely to lose the pupil, and the above is based on data that has already been artifact- and blink-corrected, so that correction likely removed many of the off-screen looks. That’s fine: it shows us that the plots and analyses were done on gaze that was primarily on-screen.
7. Traces for everyone!
Make a baseline-corrected trace and raw trace of everyone, for every trial
First, let’s look at the raw traces. We will have to go trial by trial. Let’s do easy trials first.
Note that the y-axes will be different! Because pupil diameter ranges from ~2.7 to 5.7 mm across participants, setting one wide y-axis to capture everyone’s trace would make it impossible to see the shape of each trace (for most people it would be super zoomed out!).
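All of the per-trial plots below come from roughly this kind of call (a sketch; the data frame and column names are illustrative):

```r
library(dplyr)
library(ggplot2)

# One panel per participant; scales = "free_y" lets each trace keep its
# own y-range so its shape stays visible
plot_trial <- function(df, trial_num, ycol = "pupil_raw") {
  df |>
    filter(trial == trial_num) |>
    ggplot(aes(x = time_ms, y = .data[[ycol]])) +
    geom_line() +
    facet_wrap(~ participant, scales = "free_y") +
    labs(title = paste("Trial", trial_num),
         x = "Time (ms)", y = "Pupil diameter (mm)")
}
```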
[Figures: raw traces for easy trials 2, 4, 6, 8, and 10, one panel per participant]
Now let’s do hard trials.
[Figures: raw traces for hard trials 1, 3, 5, 7, and 9, one panel per participant]
Great! Now let’s look at the baseline-corrected traces, same way, for each participant, easy first:
[Figures: baseline-corrected traces for easy trials 2, 4, 6, 8, and 10, one panel per participant]
OK. Now for the hard condition:
[Figures: baseline-corrected traces for hard trials 1, 3, 5, 7, and 9, one panel per participant]
Now let’s just look at the averaged trace across participants for easy and hard trials.
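That averaged trace is produced along these lines (a sketch, illustrative names again):

```r
library(dplyr)
library(ggplot2)

avg_trace <- dat |>
  group_by(difficulty, time_ms) |>
  summarise(pupil_bc = mean(pupil_bc, na.rm = TRUE), .groups = "drop")

ggplot(avg_trace, aes(x = time_ms, y = pupil_bc, colour = difficulty)) +
  geom_line() +
  labs(x = "Time since problem onset (ms)",
       y = "Baseline-corrected pupil (mm)")
```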