Processing: from raw data to stimulus*voxel response

The goal of this file is to document the pipeline I have worked through to process raw imaging data from the Neural Fingerprint project (nicknamed BRAINVAR at the scanner). The end goal for this processing is to get the data for a single individual into the form: observed_stimulus*voxel_value (where voxel_value is the % change in BOLD for TR6 in each trial). Our experiment had 12 EPIs, each with 24 trials. Each trial had 10, 1600ms TRs (240 TRs per run) and had the following stimulus format:

Stimuli were 96 DallE-generated faces, places, and objects that vary in 5 binary attributes per domain. All domains shared the attributes of time of day (daytime/evening) and location (urban/rural). Additionally, faces varied in age (older/younger)

Our experiment had the following EPI-run design:

Raw data

We collect two sets of data: 1) the psych data captured from the psych experiment they do during the scan, and 2) the brain data captured by the scanner.

The psych data is saved to the experimental computer in the scanning suite (no matter where they are in the scan)– I typically pull these files off the computer via USB after each scanning session. Additionally, psych data files save to OSF if the experiment completes (timestamped). Psych files are also uploaded to our ResearchDrive.

The scanner data is sent to a radiologist before being sent to us (to verify that there are no major medical concerns, which we would have to report to the participant!). Once it’s sent back to us, I download the brain data from the WISPIC server and upload the full set of files to our ResearchDrive: ttrogers/Datasets/KCLab_Data/Fingerprints_Imaging/data/imaging

Psych data files

Raw psych data is stored in a directory on our ResearchDrive: ttrogers/Datasets/KCLab_Data/Fingerprints_Imaging/data/behavioral/

Fairly little preprocessing needs to be done for this data, mostly getting the stimulus trials in the order they were shown. All of this is handled in “get_stimulus_order.R”. This script is currently set to get stimulus order for one participant, but can easily be adapted to pull stimulus order files for all participants (or script can be used individually within each subject folder to match the brain data directory system).

Note: We will also need to process responses from the n-back number task– and accuracy on the n-back will be important. I will leave that for a masters student to take on this subproject! For now we focus on decoding responses from the primary task.

Brain data files

Raw data is stored in directories of the form:

SUBJECT/

  • dicoms/
  • shim_logs/
  • ge_physio/

We only really care about what’s in dicoms! These contain our anatomical and EPI (fMRI) files. Here they are all zipped up.

For the purposes of this tutorial (and my recommendation for the most up-to-date version of this pipeline), use data from PILOT_3!

When you want to use the data, create a file folder structure where the dicoms folder is contained within a folder called “raw”, which is contained within a subject folder:

SUBJECT/raw/dicoms/

To the subject folder, we will also add our preprocessing script documents & directories (as provided by the Postle lab and modified by me)! We need the following preprocessing docs:

  1. raw/convert_dicom_brainvar.sh *
  2. help_anat_brainvar.sh *
  3. 041_fmri_preproc2_brainvar_brainvar.py **
  4. helper_funcs.py **
  5. raw/summary_brainvar.txt *

We’ll also have a folder called “configs” within SUBJECT/ with:

  1. configs/study.details
  2. configs/sub.anat.preproc *
  3. configs/sub.fmri.glm
  4. configs/sub.fmri.mvpa
  5. configs/sub.frmi.preproc *

Some new scripts I’ve added:

  1. extract_voxel_values.sh **
  2. pull_stim_by_voxel_all_epi.R **
  3. get_stimulus_order.R **

And finally, some scripts from the Postle lab that it doesn’t seem like we need at the moment:

  1. align_session2_sm.sh
  2. glm_all.sh
  3. make_masks_inROI.m
  4. 050_make_regs.py ^
  5. 052_warp_tlrc_masks.py ^

Note: * = We use these mostly out-of-the-box; ** = We use these & were written or significantly edited by me; ^ = These work and I’ve run them…but may not be necessary for pipeline

Preprocessing with AFNI & shell scripts

The general steps for preprocessing brain data with AFNI are:

  1. Adjust existing scripts for specific scan details
  2. Unpack dicom files (using convert_dicoms_brainvar.sh)
  3. Create BRIK and HEAD files (using convert_dicoms_brainvar.sh)
  4. Skull stripping (using help_anat_brainvar.sh)
  5. Align to anatomical data (using help_anat_brainvar.sh)
  6. Detrending (using 041_fmri_preproc2_brainvar.py)
  7. Mean percent normalization (using 041_fmri_preproc2_brainvar.py)
  8. Smoothing (using 041_fmri_preproc2_brainvar.py)

Then deviating from the Postle Lab preprocessing a bit for our specific task:

  1. Grab % BOLD change for each voxel in every 6th TR (using pull_6th_tr_vals.sh)
  2. Grab non-zero voxels from 6th TR data (using extract_voxel_values.sh)

And once we’ve got brain data in order we can link stimuli and voxel values in R!:

  1. Grabbing the stimuli order from scanner behavioral task (using get_stimulus_order.R)
  2. Actually linking & writing out stimuli and voxel values (using pull_stim_by_voxel_all_epi.R)

Step 0: Adjust scripts for scan

Before starting, make some small updates to scripting files (by subject):

  • Update convert_dicom_brainvar.sh, help_anat_brainvar and summary_brainvar.txt with appropriate TR info
  • Make sure EPI/BRAVO volume numbers make sense! They can vary by subject which scan is what number
  • Update subject number in summary_brainvar.txt
  • Adjust study.details and sub.fmri.preproc for scan-specific info
  • Make sure code is specified to use the correct anatomy template in sub.ana.preproc

Relevant details for this study (Neural Fingerprints: BRAINVAR)

  • Number of EPIs: 12
  • Number of TRs per EPI: 240
  • Number of Trials per EPI: 24
  • Number of TRs per trial: 10
  • Voxel size: ~1.68mm * ~1.68mm, 2.4mm (with the 2.4mm being the primary dimension)
  • Number of TRs to skip at the beginning of each TR: 5 TRs (during 8-sec fixation)

Steps 1-2: Dicoms→BRIK/HEAD files with convert_dicom_brainvar.sh

After making sure that the run files specified in convert_dicom_brainvar.sh correspond to the EPI files in raw/dicoms/, navigate to the raw/ folder in Terminal and run the following command:

./convert_dicom_brainvar.sh

This command will untar all dicoms, convert them to BRIK/HEAD files (AFNI-style brain images), and move them into a folder it creates called briks/. Because it’s small, I’ll include the full script for PILOT_3 data here:

#!/bin/bash

# Create a directory to store the converted BRIK/HEAD files
mkdir -p briks

# Navigate to the directory containing the DICOM files
cd dicoms/

# Untar all .tgz files
for file in *.tgz; do tar -xvf $file; done
rm *.tgz

# Define the runs (directories) to process ** THIS IS IMPORTANT TO CHECK! **
runs=(s0004.AX_FSPGR_BRAVO s0008.EPI_2.4mm_MB3_x12 s0009.EPI_2.4mm_MB3_x12 s0010.EPI_2.4mm_MB3_x12 s0011.EPI_2.4mm_MB3_x12 s0012.EPI_2.4mm_MB3_x12 s0013.EPI_2.4mm_MB3_x12 s0014.EPI_2.4mm_MB3_x12 s0015.EPI_2.4mm_MB3_x12 s0016.EPI_2.4mm_MB3_x12 s0017.EPI_2.4mm_MB3_x12 s0018.EPI_2.4mm_MB3_x12 s0019.EPI_2.4mm_MB3_x12)
runs_len=${#runs[@]}

# Loop through each run directory
for (( irun=0; irun<${runs_len}; irun++ ))
do
    # Navigate to the current run directory
    cd ${runs[$irun]}
    
    # Extract the run name (e.g., s0009)
    run_name=$(basename ${runs[$irun]} | cut -d. -f1)
    
    # Convert the DICOM files to BRIK/HEAD using Dimon with the run name as the prefix
    Dimon -infile_pattern '*.dcm' -gert_create_dataset -GERT_Reco2AFNI -gert_to3d_prefix "${run_name}"
    
    # Find the generated BRIK/HEAD files and move them to the briks directory
    for file in ${run_name}*+orig.*; do
        # Move the files to the briks directory
        mv "$file" ../../briks
    done
    
    # Navigate back to the dicoms directory
    cd ..
done

echo "Conversion complete. All BRIK/HEAD files are saved in the 'briks' directory."

OUTPUT: When this completes, you should see a new folder (raw/briks/) that contains all your BRIK/HEAD files for each EPI!

Next, we align this data to a standard anatomical template.


Steps 3-4: Skullstrip and align to anatomical with help_anat_brainvar.sh

Now that we have all of our BRIK/HEAD files, we can use the following to do skull-stripping and alignment to the TT_N27+tlrc brain! In Terminal, navigate back out to the SUBJECT/ folder, where help_anat_brainvar.sh lives, then type the following:

./help_anat_brainvar.sh

This script is also short, so I’ll include the full script here:

cp raw/briks/s0004+orig* .
mv s0004+orig.HEAD anat+orig.HEAD
mv s0004+orig.BRIK anat+orig.BRIK

3dSkullStrip -input anat+orig -prefix anat_ns

@auto_tlrc -base /Users/ivette/afni_build/afni_atlases_dist/TT_N27+tlrc -input anat+orig -init_xform AUTO_CENTER

The TT_N27 brain is included in AFNI downloads. Mine is located at /Users/ivette/afni_build/afni_atlases_dist/TT_N27+tlrc on my MacBook Pro (M1). Make sure to adjust the help_anat_brainvar.sh script to point to the correct file on your machine.

OUTPUT: This script will output two new files that have been skull stripped and aligned to the TT_N27 brain.

Note: This is good place to stop and inspect things visually in AFNI!! Did skull stripping work okay? How did the alignment go? To inspect the different files in AFNI, open a Terminal window, navigate into the SUBJECT/folder, and type:

afni

AFNI should open up and will automatically search the folder for files it can display. Try changing the Overlay to the different anatomicals you’ve generated.


Steps 5-7: Preprocessing with 041_fmri_preproc2_brainvar.py

Next we will do the bulk of preprocessing (detrending, normalization, smoothing) with 041_fmri_preproc2_brainvar.py.

This is a Python script and is rather hefty, so I won’t include the full code here, but I would really, really appreciate someone going through and verifying that everything looks correct. This script came mostly out of the box from the Postle Lab… but I did modify it to be compatible with Python 3.10 (it was for Python 2.7).

We for sure do the following, based on SUBJECT/configs/sub.fmri.preproc (file that 041_fmri_preproc2_brainvar.py uses to set parameters):

## fMRI Preprocessing Parameters
EPI[1].DO_PREPROC = YES
EPI[1].PREPROC_RUNS = 1 2 3 4 5 6 7 8 9 10 11 12

# Motion Correction
EPI[1].DO_MOTION = YES
EPI[1].BASE_RUN = 1
EPI[1].IGNORE_FIRST = 5
EPI[1].BASE_TR = 1

# Fieldmap Correction
EPI[1].DO_FIELDMAP_COR = NO 

# Slicetiming Correction
EPI[1].DO_SLICETIMING = YES

# Skull stripping
EPI[1].DO_SKULLSTRIP = YES

# Detrending
EPI[1].DO_DETREND = YES
EPI[1].DETREND_POLORT = 3

# Spatial Smoothing
EPI[1].DO_SMOOTH = YES
EPI[1].SMOOTH_FWHM = 4

# Mean normalization/conversion to Percent signal change
EPI[1].DO_MEAN_NORM = YES

# Remove intermediary files?
EPI[1].REMOVE_INTS = YES

Note: We don’t currently do field map correction in this protocol but that is something to consider! The scans are available if we want to go that route.

Output files generated by 041_fmri_preproc2_brainvar.py

This script uses several prefixes in filenames to indicate different stages of preprocessing during fMRI data analysis. Here’s a breakdown of what each prefix corresponds to:

  • raw_: Refers to the raw data or files that have not been processed yet.
  • al_: Indicates that the file has been aligned, referring to spatial alignment or registration to a standard anatomical space.
  • ss_: Indicates that the file has undergone skull-stripping, where non-brain tissue (such as the skull) has been removed from the image.
  • dt_: Indicates that the file has been detrended, which means removing low-frequency trends from the data to correct for drift over time.
  • tsh_*: Refers to slice timing correction, a preprocessing step that corrects for differences in the acquisition time of different slices.
  • fm_: Likely refers to files that have undergone field map correction, which compensates for magnetic field inhomogeneities.
  • pcnt_: Indicates percentage normalization, a step that normalizes the data to a percentage change scale!
  • sm_: Indicates that the file has been smoothed, typically using a Gaussian kernel to reduce noise and improve signal-to-noise ratio.

Task-specific preprocessing

Next steps: extracting non-zero voxels from the 6th TR of each trial! These steps are separate from the typical processing done by the Postle lab and are accomplished in two steps using two scripts:

  1. Pulling 6th TR values for each EPI using pull_6th_tr_vals.sh
  2. Getting only the non-zero voxel values from the 6th TR data using extract_voxel_values.sh

Note: I made these scripts, so this is another point where I would really, really appreciate someone else’s eyes to make sure I’m doing the right things!


Step 8: Pulling 6th TR from all EPIs

Below is the code inside of pull_6th_tr_vals.sh. Some things:

  • We are grabbing values from the data formatted “sm_pcnt_dt_ss_FULL##_al+orig”, where ## is the EPI number. This is the smoothed, BOLD percent-change, detrended, skull-stripped, aligned version of the data.

  • Note the array in Step 1 with the indices [10, 20, …, 230]. We are grabbing the 6th TR of a trial that started AFTER the first 5 TRs (because the 8 second fixation at the beginning of each EPI = 5 TRs). This means our starting index is 10 (the 11th TR). Because there are 10 TRs in a trial, we go over 10 more indices (20) to grab the 6th TR in the 2nd trial, and so on!

  • This also saves a Nifti version of the files (Step 2; for my own use), but that’s easy to comment out.

To run this, navigate to the SUBJECT/ folder and run:

./pull_6th_tr_vals.sh

Enjoy the luxurious code in pull_6th_tr_vals.sh:

#!/bin/bash

# Loop through each EPI dataset
for i in {1..9}; do
    # Define the input and output filenames
    input_file="sm_pcnt_dt_ss_FULL0${i}_al+orig" # Replace with the actual prefix of your EPI datasets if different
    voxel_values_file="voxel_values_EPI_${i}"
    
    # Step 1: Extract voxel values for every 10th TR, starting with the 11th TR overall (6th TR within each trial)
    3dTcat -prefix ${voxel_values_file} ${input_file}'[10,20,30,40,50,60,70,80,90,100,110,120,130,140,150,160,170,180,190,200,210,220,230]'
    
    # Step 2: Convert the extracted voxel values to NIFTI format
    3dAFNItoNIFTI -prefix ${voxel_values_file}.nii.gz ${voxel_values_file}+orig
    
    # Print a message indicating progress
    echo "Extracted voxel values for EPI ${i}, saved as ${voxel_values_file}+orig"
done

# Loop through each EPI dataset (for datasets 10-12)
for i in {10..12}; do
    # Define the input and output filenames
    input_file="sm_pcnt_dt_ss_FULL${i}_al+orig" # Replace with the actual prefix of your EPI datasets if different
    voxel_values_file="voxel_values_EPI_${i}"
    
    # Step 1: Extract voxel values for every 10th TR, starting with the 11th TR overall (6th TR within each trial)
    3dTcat -prefix ${voxel_values_file} ${input_file}'[10,20,30,40,50,60,70,80,90,100,110,120,130,140,150,160,170,180,190,200,210,220,230]'
    
    # Step 2: Convert the extracted voxel values to NIFTI format
    3dAFNItoNIFTI -prefix ${voxel_values_file}.nii.gz ${voxel_values_file}+orig
    
    # Print progress
    echo "Extracted voxel values for EPI ${i}, saved as ${voxel_values_file}+orig"
done

## Optional Step 3: Combine voxel value datasets across all EPIs
#combined_file="combined_voxel_values"
#3dTcat -prefix ${combined_file} voxel_values_EPI_1+orig voxel_values_EPI_2+orig \
#       voxel_values_EPI_3+orig voxel_values_EPI_4+orig voxel_values_EPI_5+orig \
#       voxel_values_EPI_6+orig voxel_values_EPI_7+orig voxel_values_EPI_8+orig \
#       voxel_values_EPI_9+orig voxel_values_EPI_10+orig voxel_values_EPI_11+orig \
#       voxel_values_EPI_12+orig

## Print combined file save 
#echo "Combined voxel values data from all EPIs, saved as ${combined_file}+orig"

## Optional Step 4: Convert the combined dataset to NIFTI format
#3dAFNItoNIFTI -prefix ${combined_file}.nii.gz ${combined_file}+orig

# Print a completion message :)
echo "Converted combined dataset to NIFTI format, saved as ${combined_file}.nii.gz"

Why is it in two repetitive chunks? Because it pulls the file name for saving, and there are different numbers of 0s in the single and double digit numbers. Is there cleaner way to keep file name consistency? Maybe! But that’s what I got for now.

Step 9: Extracting non-zero voxel values

Now that we have data from just the 6th TRs of each trial, we want to grab only the active voxels in that time! Luckily there is a function in AFNI for this, 3dmaskdump.

Note: Here, I set the tag -nozero to indicate I want ONLY the non-zero voxels. You can change this to use the wholebrain mask and get alllllllll voxel values if you want, but that’s a lot of data!

To run this, go to the SUBJECT/ folder and run:

./extract_voxel_values.sh

Here’s a look inside extract_voxel_values.sh:

#!/bin/bash

# Loop through EPI numbers 1 to 12
for i in {1..12}; do
    # Construct the EPI filename
    EPI_FILE="voxel_values_EPI_${i}+orig"
    
    # Construct the output filename
    OUTPUT_FILE="coords_and_values_nonzero_epi_${i}.txt"
    
    # Run the 3dmaskdump command
    3dmaskdump -nozero -noijk -xyz -o $OUTPUT_FILE $EPI_FILE
    
    echo "Processed EPI ${i}, output saved to ${OUTPUT_FILE}"
done

echo "All EPIs processed."

Linking stimuli & voxel data

Step 10: Get task stimuli ordering using get_stimulus_order.R

We have all our brain data set up– but what do these values correspond to? What were people even looking at? We’re going to pull the ordering of presented stimuli from the scanner experiment using a simple script in R. Because this data will ultimately be used to do classification, I also include a column for what domain the stimulus is from. The final output file will be used in Step 11.

Note: Make sure to update the experiment file for the correct participant’s data– you don’t want to use someone else’s stimuli ordering!

Just edit get_stimulus_order.R in RStudio and run! Here’s get_stimulus_order.R:

# getting stimulus orderings from PILOT 3 experiment
library(tidyverse)
library(data.table)

d <- read_csv("PILOT_3_scanner_task.csv")

img_trials <- d %>% filter(trial_type == "image-keyboard-response") %>% select(stimulus, sequence) 

img_trials <- img_trials %>%
  mutate(stim_category = case_when(
    grepl("woman|man", stimulus) ~ "face",
    grepl("house|church", stimulus) ~ "place",
    grepl("vehicle|furniture", stimulus) ~ "object",
    TRUE ~ NA_character_  # In case none of the conditions match
  ))

img_trials$sequence <- img_trials$sequence + 1
img_trials$run <- rep(x=c(1:12), each=24)
write_csv(x = img_trials, file = "PILOT_3_stimulus_order.csv")

Step 11: Writing out stimuli*voxels and voxel coordinates using pull_stim_by_voxel_all_epi.R

We have our stimuli ordering (and their domain categories). We have our % BOLD signal change from every nonzero voxel during the 6th of TR of each trial. We shall link them and write out the stim*voxel (and voxel coordinates) matrices!

Note: This writes out two files– one combined file for the stim*voxel_responses (combined_stim_vox_all_epi.csv) and one for the xyz coordinates (combined_xyz_nonzero_all_epi.csv) across all EPIs. tbh, the voxel data is big! You can modify the code below to write out separate files for each EPI if you want.

pull_stim_by_voxel_all_epi.R:

library(tidyverse)

# initialize empty data frames for combined results
combined_xyz <- data.frame()
combined_stim_vox <- data.frame()

# loop through EPI numbers 1 to 12
for (i in 1:12) {
  
  # Construct the filenames
  epi_file <- paste0("coords_and_values_nonzero_epi_", i, ".txt")
  
  # Read in the data from the AFNI 3dmaskdump output
  epi_data <- read.delim(epi_file, sep = " ", header = FALSE)
  
  # Get only the coordinates and combine them into a single data frame
  epi_xyz <- epi_data[, 1:3]
  names(epi_xyz) <- c("x", "y", "z")
  combined_xyz <- bind_rows(combined_xyz, epi_xyz)
  
  # Get only the activation values at each voxel and transpose the data
  epi_t <- epi_data %>% select(-V1, -V2, -V3) %>% t()
  
  # Read in the stimulus order data and filter for the corresponding run
  stim <- read_csv("PILOT_3_stimulus_order.csv") %>% //NOTE TO CHANGE THIS TO CORRECT STIM FILE
    filter(run == i) %>%
    select(stimulus, stim_category)
  
  # Combine stimulus data with voxel data
  stim_by_vox <- cbind(stim, epi_t)
  combined_stim_vox <- bind_rows(combined_stim_vox, stim_by_vox)
  
  # Print progress message
  cat("Processed EPI", i, "\n")
}

# Write out the combined results to CSV files
write_csv(combined_xyz, file = "combined_xyz_nonzero_all_epi.csv")
write_csv(combined_stim_vox, file = "combined_stim_vox_all_epi.csv")

cat("All EPIs processed and combined.\n")

And that’s it! At each step, please review what you got out and whether things look odd or confusing. I would LOVE for someone to recreate what I’ve done, and review after each step to ensure we’re doing the right thing. I’m just a grad student Googling AFNI commands and trying my best. :)

Some quirks that could be fixed…

TO DO: Pull CORTEX-ONLY voxels (can be based on anatomical scan). Should be straightforward in AFNI if the anatomical scan is available!

  • The ./convert_dicom_brainvar.sh script doesn’t auto untarball the briks for some reason??? so you have to manually do that for now (it’s just double clicking)

  • The out-of-the-box protocol and script from the Postle lab (041_fmri_preproc2_brainvar.py) just has ya manually delete the intermediary processing files it generates… maybe we can include a cleanup script!

  • I feel like steps 8 & 9 could be done more efficiently/in one script?