The goal of this file is to document the pipeline I have worked
through to process raw imaging data from the Neural Fingerprint project
(nicknamed BRAINVAR at the scanner). The end goal for this processing is
to get the data for a single individual into the form:
observed_stimulus*voxel_value (where voxel_value is the % change in BOLD
for TR6 in each trial). Our experiment had 12 EPIs, each with 24 trials.
Each trial had 10, 1600ms TRs (240 TRs per run) and had the following
stimulus format:
Stimuli were 96 DallE-generated faces, places, and objects that vary in 5 binary attributes per domain. All domains shared the attributes of time of day (daytime/evening) and location (urban/rural). Additionally, faces varied in age (older/younger)
Our experiment had the following EPI-run design:
We collect two sets of data: 1) the psych data captured from the psych experiment they do during the scan, and 2) the brain data captured by the scanner.
The psych data is saved to the experimental computer in the scanning suite (no matter where they are in the scan)– I typically pull these files off the computer via USB after each scanning session. Additionally, psych data files save to OSF if the experiment completes (timestamped). Psych files are also uploaded to our ResearchDrive.
The scanner data is sent to a radiologist before being sent to us (to verify that there are no major medical concerns, which we would have to report to the participant!). Once it’s sent back to us, I download the brain data from the WISPIC server and upload the full set of files to our ResearchDrive: ttrogers/Datasets/KCLab_Data/Fingerprints_Imaging/data/imaging
Raw psych data is stored in a directory on our ResearchDrive: ttrogers/Datasets/KCLab_Data/Fingerprints_Imaging/data/behavioral/
Fairly little preprocessing needs to be done for this data, mostly getting the stimulus trials in the order they were shown. All of this is handled in “get_stimulus_order.R”. This script is currently set to get stimulus order for one participant, but can easily be adapted to pull stimulus order files for all participants (or script can be used individually within each subject folder to match the brain data directory system).
Note: We will also need to process responses from the n-back number task– and accuracy on the n-back will be important. I will leave that for a masters student to take on this subproject! For now we focus on decoding responses from the primary task.
Raw data is stored in directories of the form:
SUBJECT/
We only really care about what’s in dicoms! These contain our
anatomical and EPI (fMRI) files. Here they are all zipped up.
For the purposes of this tutorial (and my recommendation for the most up-to-date version of this pipeline), use data from PILOT_3!
When you want to use the data, create a file folder structure where the dicoms folder is contained within a folder called “raw”, which is contained within a subject folder:
SUBJECT/raw/dicoms/
To the subject folder, we will also add our preprocessing script documents & directories (as provided by the Postle lab and modified by me)! We need the following preprocessing docs:
We’ll also have a folder called “configs” within SUBJECT/ with:
Some new scripts I’ve added:
And finally, some scripts from the Postle lab that it doesn’t seem like we need at the moment:
Note: * = We use these mostly out-of-the-box; ** = We use these & were written or significantly edited by me; ^ = These work and I’ve run them…but may not be necessary for pipeline
The general steps for preprocessing brain data with AFNI are:
Then deviating from the Postle Lab preprocessing a bit for our specific task:
And once we’ve got brain data in order we can link stimuli and voxel values in R!:
Before starting, make some small updates to scripting files (by subject):
Relevant details for this study (Neural Fingerprints: BRAINVAR)
After making sure that the run files specified in convert_dicom_brainvar.sh correspond to the EPI files in raw/dicoms/, navigate to the raw/ folder in Terminal and run the following command:
./convert_dicom_brainvar.sh
This command will untar all dicoms, convert them to BRIK/HEAD files (AFNI-style brain images), and move them into a folder it creates called briks/. Because it’s small, I’ll include the full script for PILOT_3 data here:
#!/bin/bash
# Create a directory to store the converted BRIK/HEAD files
mkdir -p briks
# Navigate to the directory containing the DICOM files
cd dicoms/
# Untar all .tgz files
for file in *.tgz; do tar -xvf $file; done
rm *.tgz
# Define the runs (directories) to process ** THIS IS IMPORTANT TO CHECK! **
runs=(s0004.AX_FSPGR_BRAVO s0008.EPI_2.4mm_MB3_x12 s0009.EPI_2.4mm_MB3_x12 s0010.EPI_2.4mm_MB3_x12 s0011.EPI_2.4mm_MB3_x12 s0012.EPI_2.4mm_MB3_x12 s0013.EPI_2.4mm_MB3_x12 s0014.EPI_2.4mm_MB3_x12 s0015.EPI_2.4mm_MB3_x12 s0016.EPI_2.4mm_MB3_x12 s0017.EPI_2.4mm_MB3_x12 s0018.EPI_2.4mm_MB3_x12 s0019.EPI_2.4mm_MB3_x12)
runs_len=${#runs[@]}
# Loop through each run directory
for (( irun=0; irun<${runs_len}; irun++ ))
do
# Navigate to the current run directory
cd ${runs[$irun]}
# Extract the run name (e.g., s0009)
run_name=$(basename ${runs[$irun]} | cut -d. -f1)
# Convert the DICOM files to BRIK/HEAD using Dimon with the run name as the prefix
Dimon -infile_pattern '*.dcm' -gert_create_dataset -GERT_Reco2AFNI -gert_to3d_prefix "${run_name}"
# Find the generated BRIK/HEAD files and move them to the briks directory
for file in ${run_name}*+orig.*; do
# Move the files to the briks directory
mv "$file" ../../briks
done
# Navigate back to the dicoms directory
cd ..
done
echo "Conversion complete. All BRIK/HEAD files are saved in the 'briks' directory."
OUTPUT: When this completes, you should see a new folder (raw/briks/) that contains all your BRIK/HEAD files for each EPI!
Next, we align this data to a standard anatomical template.
Now that we have all of our BRIK/HEAD files, we can use the following to do skull-stripping and alignment to the TT_N27+tlrc brain! In Terminal, navigate back out to the SUBJECT/ folder, where help_anat_brainvar.sh lives, then type the following:
./help_anat_brainvar.sh
This script is also short, so I’ll include the full script here:
cp raw/briks/s0004+orig* .
mv s0004+orig.HEAD anat+orig.HEAD
mv s0004+orig.BRIK anat+orig.BRIK
3dSkullStrip -input anat+orig -prefix anat_ns
@auto_tlrc -base /Users/ivette/afni_build/afni_atlases_dist/TT_N27+tlrc -input anat+orig -init_xform AUTO_CENTER
The TT_N27 brain is included in AFNI downloads. Mine is located at /Users/ivette/afni_build/afni_atlases_dist/TT_N27+tlrc on my MacBook Pro (M1). Make sure to adjust the help_anat_brainvar.sh script to point to the correct file on your machine.
OUTPUT: This script will output two new files that have been skull stripped and aligned to the TT_N27 brain.
Note: This is good place to stop and inspect things visually in AFNI!! Did skull stripping work okay? How did the alignment go? To inspect the different files in AFNI, open a Terminal window, navigate into the SUBJECT/folder, and type:
afni
AFNI should open up and will automatically search the folder for files it can display. Try changing the Overlay to the different anatomicals you’ve generated.
Next we will do the bulk of preprocessing (detrending, normalization, smoothing) with 041_fmri_preproc2_brainvar.py.
This is a Python script and is rather hefty, so I won’t include the full code here, but I would really, really appreciate someone going through and verifying that everything looks correct. This script came mostly out of the box from the Postle Lab… but I did modify it to be compatible with Python 3.10 (it was for Python 2.7).
We for sure do the following, based on SUBJECT/configs/sub.fmri.preproc (file that 041_fmri_preproc2_brainvar.py uses to set parameters):
## fMRI Preprocessing Parameters
EPI[1].DO_PREPROC = YES
EPI[1].PREPROC_RUNS = 1 2 3 4 5 6 7 8 9 10 11 12
# Motion Correction
EPI[1].DO_MOTION = YES
EPI[1].BASE_RUN = 1
EPI[1].IGNORE_FIRST = 5
EPI[1].BASE_TR = 1
# Fieldmap Correction
EPI[1].DO_FIELDMAP_COR = NO
# Slicetiming Correction
EPI[1].DO_SLICETIMING = YES
# Skull stripping
EPI[1].DO_SKULLSTRIP = YES
# Detrending
EPI[1].DO_DETREND = YES
EPI[1].DETREND_POLORT = 3
# Spatial Smoothing
EPI[1].DO_SMOOTH = YES
EPI[1].SMOOTH_FWHM = 4
# Mean normalization/conversion to Percent signal change
EPI[1].DO_MEAN_NORM = YES
# Remove intermediary files?
EPI[1].REMOVE_INTS = YES
Note: We don’t currently do field map correction in this protocol but that is something to consider! The scans are available if we want to go that route.
This script uses several prefixes in filenames to indicate different stages of preprocessing during fMRI data analysis. Here’s a breakdown of what each prefix corresponds to:
Next steps: extracting non-zero voxels from the 6th TR of each trial! These steps are separate from the typical processing done by the Postle lab and are accomplished in two steps using two scripts:
Note: I made these scripts, so this is another point where I would really, really appreciate someone else’s eyes to make sure I’m doing the right things!
Below is the code inside of pull_6th_tr_vals.sh. Some things:
We are grabbing values from the data formatted “sm_pcnt_dt_ss_FULL##_al+orig”, where ## is the EPI number. This is the smoothed, BOLD percent-change, detrended, skull-stripped, aligned version of the data.
Note the array in Step 1 with the indices [10, 20, …, 230]. We are grabbing the 6th TR of a trial that started AFTER the first 5 TRs (because the 8 second fixation at the beginning of each EPI = 5 TRs). This means our starting index is 10 (the 11th TR). Because there are 10 TRs in a trial, we go over 10 more indices (20) to grab the 6th TR in the 2nd trial, and so on!
This also saves a Nifti version of the files (Step 2; for my own use), but that’s easy to comment out.
To run this, navigate to the SUBJECT/ folder and run:
./pull_6th_tr_vals.sh
Enjoy the luxurious code in pull_6th_tr_vals.sh:
#!/bin/bash
# Loop through each EPI dataset
for i in {1..9}; do
# Define the input and output filenames
input_file="sm_pcnt_dt_ss_FULL0${i}_al+orig" # Replace with the actual prefix of your EPI datasets if different
voxel_values_file="voxel_values_EPI_${i}"
# Step 1: Extract voxel values for every 10th TR, starting with the 11th TR overall (6th TR within each trial)
3dTcat -prefix ${voxel_values_file} ${input_file}'[10,20,30,40,50,60,70,80,90,100,110,120,130,140,150,160,170,180,190,200,210,220,230]'
# Step 2: Convert the extracted voxel values to NIFTI format
3dAFNItoNIFTI -prefix ${voxel_values_file}.nii.gz ${voxel_values_file}+orig
# Print a message indicating progress
echo "Extracted voxel values for EPI ${i}, saved as ${voxel_values_file}+orig"
done
# Loop through each EPI dataset (for datasets 10-12)
for i in {10..12}; do
# Define the input and output filenames
input_file="sm_pcnt_dt_ss_FULL${i}_al+orig" # Replace with the actual prefix of your EPI datasets if different
voxel_values_file="voxel_values_EPI_${i}"
# Step 1: Extract voxel values for every 10th TR, starting with the 11th TR overall (6th TR within each trial)
3dTcat -prefix ${voxel_values_file} ${input_file}'[10,20,30,40,50,60,70,80,90,100,110,120,130,140,150,160,170,180,190,200,210,220,230]'
# Step 2: Convert the extracted voxel values to NIFTI format
3dAFNItoNIFTI -prefix ${voxel_values_file}.nii.gz ${voxel_values_file}+orig
# Print progress
echo "Extracted voxel values for EPI ${i}, saved as ${voxel_values_file}+orig"
done
## Optional Step 3: Combine voxel value datasets across all EPIs
#combined_file="combined_voxel_values"
#3dTcat -prefix ${combined_file} voxel_values_EPI_1+orig voxel_values_EPI_2+orig \
# voxel_values_EPI_3+orig voxel_values_EPI_4+orig voxel_values_EPI_5+orig \
# voxel_values_EPI_6+orig voxel_values_EPI_7+orig voxel_values_EPI_8+orig \
# voxel_values_EPI_9+orig voxel_values_EPI_10+orig voxel_values_EPI_11+orig \
# voxel_values_EPI_12+orig
## Print combined file save
#echo "Combined voxel values data from all EPIs, saved as ${combined_file}+orig"
## Optional Step 4: Convert the combined dataset to NIFTI format
#3dAFNItoNIFTI -prefix ${combined_file}.nii.gz ${combined_file}+orig
# Print a completion message :)
echo "Converted combined dataset to NIFTI format, saved as ${combined_file}.nii.gz"
Why is it in two repetitive chunks? Because it pulls the file name for saving, and there are different numbers of 0s in the single and double digit numbers. Is there cleaner way to keep file name consistency? Maybe! But that’s what I got for now.
Now that we have data from just the 6th TRs of each trial, we want to grab only the active voxels in that time! Luckily there is a function in AFNI for this, 3dmaskdump.
Note: Here, I set the tag -nozero to indicate I want ONLY the non-zero voxels. You can change this to use the wholebrain mask and get alllllllll voxel values if you want, but that’s a lot of data!
To run this, go to the SUBJECT/ folder and run:
./extract_voxel_values.sh
Here’s a look inside extract_voxel_values.sh:
#!/bin/bash
# Loop through EPI numbers 1 to 12
for i in {1..12}; do
# Construct the EPI filename
EPI_FILE="voxel_values_EPI_${i}+orig"
# Construct the output filename
OUTPUT_FILE="coords_and_values_nonzero_epi_${i}.txt"
# Run the 3dmaskdump command
3dmaskdump -nozero -noijk -xyz -o $OUTPUT_FILE $EPI_FILE
echo "Processed EPI ${i}, output saved to ${OUTPUT_FILE}"
done
echo "All EPIs processed."
We have all our brain data set up– but what do these values correspond to? What were people even looking at? We’re going to pull the ordering of presented stimuli from the scanner experiment using a simple script in R. Because this data will ultimately be used to do classification, I also include a column for what domain the stimulus is from. The final output file will be used in Step 11.
Note: Make sure to update the experiment file for the correct participant’s data– you don’t want to use someone else’s stimuli ordering!
Just edit get_stimulus_order.R in RStudio and run! Here’s get_stimulus_order.R:
# getting stimulus orderings from PILOT 3 experiment
library(tidyverse)
library(data.table)
d <- read_csv("PILOT_3_scanner_task.csv")
img_trials <- d %>% filter(trial_type == "image-keyboard-response") %>% select(stimulus, sequence)
img_trials <- img_trials %>%
mutate(stim_category = case_when(
grepl("woman|man", stimulus) ~ "face",
grepl("house|church", stimulus) ~ "place",
grepl("vehicle|furniture", stimulus) ~ "object",
TRUE ~ NA_character_ # In case none of the conditions match
))
img_trials$sequence <- img_trials$sequence + 1
img_trials$run <- rep(x=c(1:12), each=24)
write_csv(x = img_trials, file = "PILOT_3_stimulus_order.csv")
We have our stimuli ordering (and their domain categories). We have our % BOLD signal change from every nonzero voxel during the 6th of TR of each trial. We shall link them and write out the stim*voxel (and voxel coordinates) matrices!
Note: This writes out two files– one combined file for the stim*voxel_responses (combined_stim_vox_all_epi.csv) and one for the xyz coordinates (combined_xyz_nonzero_all_epi.csv) across all EPIs. tbh, the voxel data is big! You can modify the code below to write out separate files for each EPI if you want.
pull_stim_by_voxel_all_epi.R:
library(tidyverse)
# initialize empty data frames for combined results
combined_xyz <- data.frame()
combined_stim_vox <- data.frame()
# loop through EPI numbers 1 to 12
for (i in 1:12) {
# Construct the filenames
epi_file <- paste0("coords_and_values_nonzero_epi_", i, ".txt")
# Read in the data from the AFNI 3dmaskdump output
epi_data <- read.delim(epi_file, sep = " ", header = FALSE)
# Get only the coordinates and combine them into a single data frame
epi_xyz <- epi_data[, 1:3]
names(epi_xyz) <- c("x", "y", "z")
combined_xyz <- bind_rows(combined_xyz, epi_xyz)
# Get only the activation values at each voxel and transpose the data
epi_t <- epi_data %>% select(-V1, -V2, -V3) %>% t()
# Read in the stimulus order data and filter for the corresponding run
stim <- read_csv("PILOT_3_stimulus_order.csv") %>% //NOTE TO CHANGE THIS TO CORRECT STIM FILE
filter(run == i) %>%
select(stimulus, stim_category)
# Combine stimulus data with voxel data
stim_by_vox <- cbind(stim, epi_t)
combined_stim_vox <- bind_rows(combined_stim_vox, stim_by_vox)
# Print progress message
cat("Processed EPI", i, "\n")
}
# Write out the combined results to CSV files
write_csv(combined_xyz, file = "combined_xyz_nonzero_all_epi.csv")
write_csv(combined_stim_vox, file = "combined_stim_vox_all_epi.csv")
cat("All EPIs processed and combined.\n")
And that’s it! At each step, please review what you got out and whether things look odd or confusing. I would LOVE for someone to recreate what I’ve done, and review after each step to ensure we’re doing the right thing. I’m just a grad student Googling AFNI commands and trying my best. :)
TO DO: Pull CORTEX-ONLY voxels (can be based on anatomical scan). Should be straightforward in AFNI if the anatomical scan is available!
The ./convert_dicom_brainvar.sh script doesn’t auto untarball the briks for some reason??? so you have to manually do that for now (it’s just double clicking)
The out-of-the-box protocol and script from the Postle lab (041_fmri_preproc2_brainvar.py) just has ya manually delete the intermediary processing files it generates… maybe we can include a cleanup script!
I feel like steps 8 & 9 could be done more efficiently/in one script?