Final Project

Budapest Reference Connectome

Introduction

The dataset that I am working with comes from the Human Connectome Project. I am working with a subset of data that has been collected and catalogued by the PIT Bioinformatics Group in Budapest. They merged the connectomes of 477 brains from the MRI datasets of the Human Connectome Project into a “reference” brain map. The exact data set that I am working with can be found here.

For this analysis, I will determine connectivity differences between the left and right hemispheres. This involves comparing the number of connections within the right hemisphere with the number within the left. Additionally, I will examine whether connections exist between the hemispheres and in which direction they project (e.g., right hemisphere to left). I will also take into account the edge weights of the connections and how many samples contain each connection (not all samples have the same connections). Edge weight is \(n/L\), where \(n\) is the number of tracks between the two regions and \(L\) is the average length of those tracks. For the purpose of this analysis, I assume that this weight indicates the strength of the connection.

Why Should We Care?

There is a theory that right-handed individuals are “left-brain dominant”. This is called brain lateralization. For a long time, the converse was also believed to be true: left-handed individuals are “right-brain dominant”. However, this claim was later debunked. It turns out that most left-handed people, like right-handed people, are “left-brain dominant”. Thus, since the Human Connectome Project is striving to create a model of a generic human brain, this left-brain dominance should appear as greater connectivity in the left hemisphere as compared to the right.

At a macroscopic level, the two hemispheres appear to be identical, but nuanced differences in their neural networks allow for the specialization of functions in each hemisphere. A differential analysis of the right vs. left hemisphere can either debunk or corroborate the “left vs. right brain” theory. The majority of the human population is right-handed, so most human brains would be expected to be “left-brain dominant”. This would further imply more connectivity and stronger activity in the left hemisphere.

What Am I Going To Do?

I will first split the data into connections that project within the left hemisphere and connections that project within the right hemisphere. I will remove any connections that project from one hemisphere to the other. Then, I will compare statistics on the right and left hemispheric connections. This will include summaries of the strength (weights) of the connections as well as the number of connections that exist in each hemisphere.

How Does This Help the Consumer?

The consumers in this scenario are neuroscientists concerned with the anatomy and physiology of the brain. Many upper-level cognitive functions (e.g., language processing and motor control) are laterally distributed. That means that most regions, and the functions associated with those regions, have a mirrored counterpart in the opposite hemisphere with nearly identical functionality. My project searches for the differences between the hemispheres. Thus, if there is a difference in connectivity between the left and right hemispheres, what does that imply about the lateralization of cognitive functions?

Data Preparation

Loading Packages

I will be using two packages: tidyverse and stringr. From tidyverse, I will use dplyr for its transformation functions and ggplot2 for plotting the visuals that accompany my data. I will use stringr to identify nodes of interest and each node's hemisphere membership.

library(tidyverse)
library(stringr)

Data Cleaning

The original dataset included 10 variables:

ID node 1
- The number ID assigned by the original researchers to the origin node
ID node 2
- The number ID assigned by the original researchers to the destination node
Name node 1
- The name of the origin node (usually a brain region or sulcus/gyrus)
Name node 2
- The name of the destination node
Parent ID node 1
- The number ID of the node from which the origin node projected
Parent ID node 2
- The number ID of the node from which the destination node projected
Parent name node 1
- The name of the node from which the origin node projected
Parent name node 2
- The name of the node from which the destination node projected
Edge weight confidence
- This is the number of brains from the sample that had this edge present
Edge weight (med nof)
- This is the average edge weight of all of the brains that included this edge

For the purpose of this analysis, I will only be including three of those variables: name.node1, name.node2, and edge.weight.med.nof. (the trailing period is part of the column name as it appears in R). These are the only variables relevant to comparing the hemispheres.
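The dotted column names come from how `read.csv` sanitizes headers. The sketch below illustrates this on a toy semicolon-delimited string; the header spellings in it are assumptions for illustration and are not copied from the actual file.

```r
# Toy semicolon-delimited text standing in for the real file.
# The header spellings are hypothetical, chosen to resemble the variable list above.
csv_text <- "name node1;name node2;edge weight(med nof)
lh.a;lh.b;3
rh.a;rh.b;5"

# read.csv() applies make.names() to headers by default (check.names = TRUE),
# replacing spaces and parentheses with periods.
toy <- read.csv(text = csv_text, header = TRUE, sep = ";")
print(names(toy))
```

If the real header contains a closing parenthesis, as assumed here, that would explain the trailing period in `edge.weight.med.nof.`.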

connectome <- read.csv("C:/Users/hayak/Desktop/2017_BAN6003_hayakawash_FinalProject/budapest_connectome_3.0.csv", header = TRUE, sep = ";") # read in the semicolon-delimited data file


# Divide connections between right and left hemisphere:
# keep an edge only if both of its nodes belong to the same hemisphere
right <- connectome %>%
  select(name.node1, name.node2, edge.weight.med.nof.) %>%
  filter(str_detect(name.node1, "rh") | str_detect(name.node1, "Right")) %>%
  filter(str_detect(name.node2, "rh") | str_detect(name.node2, "Right"))

left <- connectome %>%
  select(name.node1, name.node2, edge.weight.med.nof.) %>%
  filter(str_detect(name.node1, "lh") | str_detect(name.node1, "Left")) %>%
  filter(str_detect(name.node2, "lh") | str_detect(name.node2, "Left"))

# Interconnections: edges whose origin and destination lie in opposite hemispheres
right_to_left <- connectome %>%
  select(name.node1, name.node2, edge.weight.med.nof.) %>%
  filter(str_detect(name.node1, "rh") | str_detect(name.node1, "Right")) %>%
  filter(str_detect(name.node2, "lh") | str_detect(name.node2, "Left"))

left_to_right <- connectome %>%
  select(name.node1, name.node2, edge.weight.med.nof.) %>%
  filter(str_detect(name.node1, "lh") | str_detect(name.node1, "Left")) %>%
  filter(str_detect(name.node2, "rh") | str_detect(name.node2, "Right"))
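One caveat with substring matching is that "rh" can also occur inside a region name (for example, "entorhinal"), so a left-hemisphere node could be picked up by the right-hemisphere filter. The sketch below compares an unanchored match with an anchored one in base R. The node names are hypothetical, and the anchored pattern assumes the hemisphere label is a prefix of the name, which may not hold for the real dataset.

```r
# Hypothetical node names; the real naming scheme may differ.
nodes <- c("lh.entorhinal_1", "rh.precuneus_2",
           "Left-Hippocampus", "Right-Thalamus-Proper")

# Unanchored match: "rh" also occurs inside "entorhinal"
loose <- grepl("rh", nodes) | grepl("Right", nodes)

# Anchored match: the hemisphere label must start the name (an assumption)
strict <- grepl("^(rh|Right)", nodes)

print(data.frame(nodes, loose, strict))
```

The same anchored pattern could be passed to `str_detect()` in place of the two separate calls, provided the naming assumption is checked against the actual node names first.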

Exploratory Data Analysis

Early Visualization and Summaries

# Refer to the column by name inside aes() rather than with right$...
ggplot(data = right, aes(x = edge.weight.med.nof.)) +
  geom_histogram(binwidth = 1, color = "blue") +
  labs(title = "Histogram of Right Hemisphere Edge Weights", x = "Edge Weight", y = "Frequency")

ggplot(data = left, aes(x = edge.weight.med.nof.)) +
  geom_histogram(binwidth = 1, color = "red") +
  labs(title = "Histogram of Left Hemisphere Edge Weights", x = "Edge Weight", y = "Frequency")

Upon first inspection, the distributions of the connection weights for the two hemispheres appear similar. However, they are not identical. The left hemisphere has slightly more variability after the peak, but both distributions are right-skewed with outliers.

# Summary statistics of right and left hemisphere edge weights
# (columns are referenced by name inside summarise(), not with right$... or left$...)
right_summary <- summarise(right,
                           r_edge_mean = mean(edge.weight.med.nof.),
                           r_edge_sd = sd(edge.weight.med.nof.),
                           min = min(edge.weight.med.nof.),
                           max = max(edge.weight.med.nof.),
                           n = n())

left_summary <- summarise(left,
                          l_edge_mean = mean(edge.weight.med.nof.),
                          l_edge_sd = sd(edge.weight.med.nof.),
                          min = min(edge.weight.med.nof.),
                          max = max(edge.weight.med.nof.),
                          n = n())

print(right_summary)

##   r_edge_mean r_edge_sd min max   n
## 1    4.859705  5.609739   1  64 474

print(left_summary)

##   l_edge_mean l_edge_sd min max   n
## 1    4.749509  5.170577   2  56 509

There are no connections in this dataset that project from one hemisphere to the other, in either direction. Therefore, no analysis can be done on the interconnectivity between the two hemispheres.

# The empty summaries of the connections across hemispheres
summarise(left_to_right)

## data frame with 0 columns and 0 rows

summarise(right_to_left)

## data frame with 0 columns and 0 rows
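As far as I can tell, `summarise()` called without any summary expressions returns a frame with no columns whether or not its input has rows, so a direct row count is a more explicit check that the cross-hemisphere tables are empty. The sketch below shows the idea on toy stand-ins; `toy_empty` and `toy_nonempty` are hypothetical data frames, not the real `left_to_right` or `right_to_left`.

```r
# Toy stand-ins for a filtered edge table (hypothetical data, not the real dataset)
toy_empty <- data.frame(name.node1 = character(0),
                        name.node2 = character(0),
                        edge.weight.med.nof. = numeric(0))

toy_nonempty <- data.frame(name.node1 = "lh.a",
                           name.node2 = "rh.b",
                           edge.weight.med.nof. = 3)

# nrow() reports the number of edges directly
n_empty <- nrow(toy_empty)
n_nonempty <- nrow(toy_nonempty)
print(c(empty = n_empty, nonempty = n_nonempty))
```

Applied to the real tables, `nrow(left_to_right)` and `nrow(right_to_left)` would each need to return 0 for the statement about missing interconnections to hold.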

Hypothesis Testing

I will test whether the average edge weights of the two hemispheres are significantly different by conducting a two-sample t-test (R's t.test does not assume equal variances by default, so this is a Welch t-test).

For this test, \(\mu_{1}\) is the average edge weight for the right hemisphere and \(\mu_{2}\) is the average edge weight for the left hemisphere. Thus, the hypotheses are as follows:

\[H_{0}: \mu_{1}=\mu_{2}\]

\[H_{A}: \mu_{1} \neq \mu_{2}\]

#Two sample hypothesis test
test <- t.test(right$edge.weight.med.nof., left$edge.weight.med.nof., alternative="two.sided", conf.level = 0.95)

test

## 
##  Welch Two Sample t-test
## 
## data:  right$edge.weight.med.nof. and left$edge.weight.med.nof.
## t = 0.31956, df = 958.74, p-value = 0.7494
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.5665335  0.7869251
## sample estimates:
## mean of x mean of y 
##  4.859705  4.749509
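As a cross-check, the Welch statistic, its degrees of freedom, the p-value, and the confidence interval can be recomputed from the summary statistics printed earlier. This is a minimal sketch that uses only base R and the rounded means, standard deviations, and sample sizes from those summaries, so the results should agree with the t.test output only up to rounding.

```r
# Summary statistics copied from the printed right_summary and left_summary
m_r <- 4.859705; s_r <- 5.609739; n_r <- 474   # right hemisphere
m_l <- 4.749509; s_l <- 5.170577; n_l <- 509   # left hemisphere

# Per-group variance of the sample mean
v_r <- s_r^2 / n_r
v_l <- s_l^2 / n_l

# Welch t statistic: difference in means over its standard error
se <- sqrt(v_r + v_l)
t_stat <- (m_r - m_l) / se

# Welch-Satterthwaite approximation to the degrees of freedom
df <- (v_r + v_l)^2 / (v_r^2 / (n_r - 1) + v_l^2 / (n_l - 1))

# Two-sided p-value and 95% confidence interval for the difference in means
p_value <- 2 * pt(-abs(t_stat), df)
ci <- (m_r - m_l) + c(-1, 1) * qt(0.975, df) * se

print(round(c(t = t_stat, df = df, p = p_value, lower = ci[1], upper = ci[2]), 4))
```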

After running the test, I fail to reject the null hypothesis because \(p=0.75\), which is greater than \(\alpha = 0.05\). Thus, there is insufficient evidence to claim that the average edge weights of the two hemispheres are significantly different. In the context of this problem, this means that the data do not show a significant difference between the average strength of connections within the right hemisphere and the average strength of connections within the left hemisphere.