date written: 2020-05-19
last run: 2020-05-20
website: https://rpubs.com/navona/SPINS_atlasComparison
Overview. We are seeing poor tract segmentation in the SPINS dataset, using Slicer whitematteranalysis
tractography registered to the ORG atlas. To narrow down what might be going wrong, we compare our present segmentation to that performed by Saba (May 2016), who also used Slicer whitematteranalysis
tractography but created her own custom atlas. Note that this isn’t a true comparison of atlases, as preprocessing and some tractography parameters differ.
#libraries
library(data.table)
library(tidyverse)
library(DT)
library(reshape2)
library(ggplot2)
#bring in my outputs
df_n <- read.csv('../03_analysis/data/df_fiberMeasurements_2020-03-11.csv')
#bring in SPINS demo data
df_demo <- read.csv('./data/df_demo.csv')
#bring in Saba's outputs
files <- list.files(path='./saba_outputs/SPINS', pattern='.csv') #filenames
path <- paste('./saba_outputs/SPINS', files, sep='//') #path
df_s = rbindlist(lapply(path, read.csv, header=F), use.names = F) #bring together
#manipulate my data structure -- keep only minimal requirements
df_n <- subset(df_n, select=c('participant_id', 'hemisphere', 'region', "tensors.FractionalAnisotropy.Mean"))
df_n$site <- substr(df_n$participant_id, 5, 7)
#manipulate Saba's data structure
df_s <- subset(df_s, select=c('V1', 'V2')) #keep vars we want
names(df_s) <- c('participant_id', "tensors.FractionalAnisotropy.Mean") #change column names
df_s$site <- sapply(strsplit(as.character(df_s$participant_id), "_"), function(x) x[3]) #capture site
df_s$hemisphere <- sapply(strsplit(as.character(df_s$participant_id), "_"), function(x) x[12]) #capture hemi
df_s$region <- gsub(".*[_]([^.]+)[.].*", "\\1", df_s$participant_id) #capture region
df_s$participant_id <- paste('sub-', df_s$site, substr(df_s$participant_id, 30, 33), sep='') #make ID bids format
#keep data from tracts of interest (df_n)
df_n <- df_n[df_n$region %in% c('T_AF', 'T_ILF', 'T_IOFF', 'T_UF'), ]
df_n$region <- substr(df_n$region, 3, length(df_n$region))#rename
#keep data from tracts of interest (df_s)
df_s <- df_s[df_s$region %in% c('Arcuate', 'ILF', 'IFOF', 'UF')]
df_s$region <- ifelse(df_s$region == 'Arcuate', 'AF', df_s$region)
df_s$region <- ifelse(df_s$region == 'IFOF', 'IOFF', df_s$region)
#for ease, make a variable combining region and hemisphere
df_n$region_hemi <- paste(df_n$region, df_n$hemisphere, sep=" -- ")
df_s$region_hemi <- paste(df_s$region, df_s$hemisphere, sep=" -- ")
#modify SPINS demo IDs so in bids format
df_demo$record_id <- paste('sub-', substr(df_demo$record_id, 7, 9), substr(df_demo$record_id, 11, 14), sep='')
#merge demo IDs with df_n and df_s
df_n <- merge(df_n, df_demo, by.x='participant_id', by.y='record_id', all.x = T)
df_s <- merge(df_s, df_demo, by.x='participant_id', by.y='record_id', all.x = T)
#in df_s, change values of 0 to NA
df_s[df_s == 0] <- NA
#in df_s, remove row if an NA
df_s <- df_s[complete.cases(df_s), ]
Location. Saba used Slicer to create her own whole brain atlas on 2016-05-02. The atlas can be found in /archive/alumni/saba/projects/new_DTI1/1DTIPrep_only/Tracts/Slicer_Tractography/ForAtlas/wma_output_SlicerTractography/
clustered_atlas/iteration_00002/remove_outliers/atlas.vtp
.
Participants. The atlas was derived from data of n=36 participants from the DWI 1.5T study. All participant data in the atlas came from individuals with schizophrenia. The participant IDs of the n=36 participants are as follows: [S001, S004, S007, S008, S014, S017, S018, S022, S025, S034, S035, S036, S037, S038, S039, S040, S047, S049, S058, S059, S064, S077, S081, S092, S096, S112, S139, S141, S147, S148, S152, S153, S155, S157, S158, S160].
Clustering of TOIs. First, Saba generated a whole-brain atlas containing k=500 fiber bundles (i.e., clusters). From this, she manually identified several key tracts, including the AF, ILF, IOFF, and UF (note: she also segmented the anterior thalamic projects, the cingulum, and the fornix, which are not considered here). The AF contained 9 fiber bundles [308, 319, 323, 369, 425, 4, 5, 6, 7]. The ILF contained 8 fiber bundles [16, 17, 21, 23, 24, 25, 28, 330]. The IFOF contained 6 fiber bundles [294, 311, 316, 320, 322, 336]. The UF contained 4 fiber bundles [10, 11, 12, 8]. These segmentations are visualized below, alongside their combination and the combined atlas:
Location. Saba applied this custom atlas to select SPINS participants on 2016-05-05. I have not located her script with certainty. Her outputs, which are analyzed in what follows, can be found at /archive/alumni/saba/projects/New_6May2016/3T_Output_files/AllOutputs
.
Extracted data. Saba extracted what appears to be FA, MD, and AD values, from the above-named tracts, though these values are not labeled. Because I cannot find her script, I am less confident about MD and RD, but nearly certain that FA is the first value. Thus below, we only compare FA. Saba did not extract information such as number of fibers, or length of fibers.
Participants. The custom atlas was applied to n=36 SPINS participants, all with schizophrenia. The participant IDs are as follows: [sub-CMH0001, sub-CMH0002, sub-CMH0021, sub-CMH0027, sub-CMH0029, sub-CMH0033, sub-CMH0034, sub-CMH0039, sub-CMH0041, sub-CMH0043, sub-CMH0047, sub-CMH0050, sub-CMH0052, sub-CMH0061, sub-CMH0067, sub-MRC0004, sub-MRC0006, sub-MRC0007, sub-MRC0008, sub-MRC0009, sub-MRC0010, sub-MRC0011, sub-MRC0012, sub-MRC0014, sub-MRC0025, sub-MRC0034, sub-MRC0039, sub-MRC0040, sub-MRC0044, sub-ZHH0013, sub-ZHH0014, sub-ZHH0022, sub-ZHH0030, sub-ZHH0032, sub-ZHH0037, sub-ZHH0038]. Thus, the custom atlas was not applied to any participants scanned on a PRISMA scanner (as analysis conducted in mid-2016).
Segmentation success. Across the AF, ILF, IOFF, and UF, Saba saw a high rate of success of segmentation, where ‘success’ means the identification of at least one fiber belonging to each tract (as noted, we do not have information about the number or length of fibers, which are other indices of quality). In the table below, the percent
column shows the percentage of participants with at least one fiber per tract across the n=36 participants analyzed by Saba. The percent comparison
shows the percentage of participants with at least one fiber per tract across the n=433 participants analyzed by Navona (for full analysis, see link). Note that the custom atlas performs significantly better across the AF, ILF, and IOFF, and the atlases perform comparably in the UF.
#make a summary df
tbl_s <- df_s %>%
dplyr::group_by(region_hemi) %>%
dplyr::summarise(participantsWithValues=n(),
meanFA=mean(tensors.FractionalAnisotropy.Mean, na.rm=T),
sdFA=sd(tensors.FractionalAnisotropy.Mean, na.rm=T)) %>%
dplyr::mutate(per=paste0(round(100*participantsWithValues/(length(unique(df_s$participant_id))), 2),'%'))
#round all numeric values
tbl_s <- tbl_s %>% mutate_if(is.numeric, round, digits=3)
#add a column comparing percent segmented by Navona
tbl_s$comparison <- c('61.07%', '39.95%', #AF
'78.12%', '79.64%', #ILF
'0%', '0.25%', #IOFF
'93.89%', '96.95%') #UF
#for consistency, make a pretty, sortable table
DT::datatable(
tbl_s[, c(1, 3:4, 2, 5:6)], #rearrange values
rownames = FALSE,
colnames = c(
'tract',
'FA (mean)',
'FA (sd)',
'participant count',
'percent',
'percent comparison'),
filter = 'top',
options = list(
columnDefs = list(list(className = 'dt-center', targets = 0:5)),
dom = 't'))
Description. Importantly, we also notice that extracted FA values are, in all cases, higher in the custom atlas than the ORG atlas. (However, recall that this isn’t a true comparison of FA values across the atlases, as preprocessing and perhaps some tractography parameters differ.)
#combine df_n and df_s into single df
df <- merge(df_s, df_n[, c('participant_id', 'region_hemi', "tensors.FractionalAnisotropy.Mean")], by=c('participant_id', 'region_hemi'), all.x = T)
#change column names for clarity
names(df)[names(df) == 'tensors.FractionalAnisotropy.Mean.x'] <- 'FA custom atlas'
names(df)[names(df) == 'tensors.FractionalAnisotropy.Mean.y'] <- 'FA ORG atlas'
#melt df for ggplot
df <- melt(df, measure.vars = c('FA custom atlas', 'FA ORG atlas'))
#change NA to 0
#df[is.na(df)] <- 0
#visualize difference
ggplot(df, aes(x=participant_id, y=value)) +
geom_point(aes(colour=variable), size=2.5) +
facet_grid(rows=vars(region_hemi)) +
theme(legend.position = 'top',
legend.title = element_blank(),
axis.text.x= element_text(angle=75, hjust=1),
axis.title = element_blank())