Budapest Reference Connectome

Introduction

The dataset that I am working with comes from the Human Connectome Project. I am working with a subset of data that has been collected and catalogued by the PIT Bioinformatics Group in Budapest. They merged the connectomes of 477 brains from the MRI datasets of the Human Connectome Project into a “reference” brain map. The exact data set that I am working with can be found here.

For this analysis, I will be determining connectivity differences between the left and right hemisphere. This will involve comparing the number of connections within the right hemisphere as compared to the left. Additionally, I will examine the presence of interconnections between hemispheres and which direction they are projecting (e.g., right hemisphere to left). I will also be taking into account the edge weights of the connections and the presence of these connections in samples (not all samples have the same connections). Edge weight is \(n/L\), where \(n\) is the number of tracks between the two regions and \(L\) is the average length of those tracks. For the purpose of this analysis, I will make the assumption that this weight is indicative of the strength of the connection.

Why Should We Care?

There is a theory that right-handed individuals are “left-brain dominant”. This is called brain lateralization. For a long time, the converse was also believed to be true: left-handed individuals are “right-brain dominant”. However, this claim was later debunked. It turns out that most left-handed people, like right-handed people, are “left-brain dominant”. Thus, since the Human Connectome Project is striving to create a model of a generic human brain, this left-brain dominance might be expected to show up as greater connectivity in the left hemisphere as compared to the right.

At a macroscopic level, the two hemispheres appear nearly identical, but nuanced differences in their neural networks allow for the intricate specialization of functions in each hemisphere. A differential analysis of the right vs. left hemisphere can either challenge or corroborate the “left vs. right brain” theory. Because the majority of the human population is right-handed, one would expect most human brains to be “left-brain” dominant, which would further imply more connectivity and stronger activity in the left hemisphere.

What Am I Going To Do?

I will first split the data into connections whose two nodes both lie in the left hemisphere and connections whose two nodes both lie in the right hemisphere, setting aside any connections that run from one hemisphere to the other. Then, I will compare statistics on the right and left hemispheric connections, including summaries of connection strength (edge weights) and the number of connections within each hemisphere.
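The split described above can be sketched in base R on a small toy edge list. The node names are hypothetical, and the rule that names containing “lh”/“Left” are left-hemisphere nodes and names containing “rh”/“Right” are right-hemisphere nodes is an assumption that mirrors the str_detect() filters used in the Data Cleaning code.

```r
# Toy edge list with hypothetical node names (not taken from the real dataset)
edges <- data.frame(
  name.node1 = c("lh.parsopercularis", "rh.parsopercularis", "lh.insula", "rh.superiortemporal"),
  name.node2 = c("lh.superiortemporal", "rh.superiortemporal", "lh.precuneus", "lh.insula"),
  edge.weight.med.nof. = c(3, 5, 2, 4),
  stringsAsFactors = FALSE
)

# Assumed labeling rule: "lh"/"Left" = left, "rh"/"Right" = right, otherwise NA.
# Like the substring filters in the analysis, this would misfire on a name that
# happens to contain "lh" or "rh" for some other reason.
hemisphere <- function(node_name) {
  ifelse(grepl("lh|Left", node_name), "left",
         ifelse(grepl("rh|Right", node_name), "right", NA_character_))
}

h1 <- hemisphere(edges$name.node1)
h2 <- hemisphere(edges$name.node2)

# which() drops NA comparisons, so unlabeled nodes are simply excluded
left_within  <- edges[which(h1 == "left"  & h2 == "left"), ]
right_within <- edges[which(h1 == "right" & h2 == "right"), ]

nrow(left_within)   # 2
nrow(right_within)  # 1
```

The fourth toy edge (right to left) is excluded from both subsets, which is the behavior the analysis relies on when it sets interhemispheric connections aside.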

How Does This Help the Consumer?

The consumers in this scenario would be neuroscientists concerned with the anatomy and physiology of the brain. Many upper-level cognitive functions (e.g., language processing, motor control) are distributed across both hemispheres. That means that most regions, and the functions associated with those regions, have a mirrored counterpart in the opposite hemisphere with nearly identical functionality. My project searches for the differences between the hemispheres: if there is a difference in connectivity between the left and right hemisphere, what does that imply about the lateralization of cognitive functions?

Data Preparation

Loading Packages

I will be using two packages: tidyverse and stringr (stringr is also part of the tidyverse collection). From the tidyverse, I will use dplyr for data-transformation functions and ggplot2 for plotting the visuals that accompany the data. I will use stringr to identify nodes of interest and each node’s membership in a hemisphere.

library(tidyverse)
library(stringr)

Data Cleaning

The original dataset included 10 variables:

  • ID node 1
    • The ID number assigned by the original researchers to the origin node
  • ID node 2
    • The ID number assigned by the original researchers to the destination node
  • Name node 1
    • The name of the origin node (usually a brain region or sulcus/gyrus)
  • Name node 2
    • The name of the destination node
  • Parent ID node 1
    • The ID number of the node from which the origin node projects
  • Parent ID node 2
    • The ID number of the node from which the destination node projects
  • Parent name node 1
    • The name of the node from which the origin node projects
  • Parent name node 2
    • The name of the node from which the destination node projects
  • Edge weight confidence
    • The number of brains in the sample in which this edge is present
  • Edge weight (med nof)
    • The median (“med”) edge weight across all of the brains that include this edge

For the purpose of this analysis, I will only be including three of those variables: name.node1, name.node2, and edge.weight.med.nof. These are the only relevant variables in comparing the hemispheres.

connectome <- read.csv("C:/Users/hayak/Desktop/2017_BAN6003_hayakawash_FinalProject/budapest_connectome_3.0.csv", header = TRUE, sep = ";") # read in the semicolon-separated data file

# Divide connections between right and left hemisphere (both endpoints in the same hemisphere)
right <- connectome %>%
  select(name.node1, name.node2, edge.weight.med.nof.) %>%
  filter(str_detect(name.node1, "rh|Right"), str_detect(name.node2, "rh|Right"))

left <- connectome %>%
  select(name.node1, name.node2, edge.weight.med.nof.) %>%
  filter(str_detect(name.node1, "lh|Left"), str_detect(name.node2, "lh|Left"))

# Interconnections (one endpoint in each hemisphere)
right_to_left <- connectome %>%
  select(name.node1, name.node2, edge.weight.med.nof.) %>%
  filter(str_detect(name.node1, "rh|Right"), str_detect(name.node2, "lh|Left"))

left_to_right <- connectome %>%
  select(name.node1, name.node2, edge.weight.med.nof.) %>%
  filter(str_detect(name.node1, "lh|Left"), str_detect(name.node2, "rh|Right"))

Exploratory Data Analysis

Early Visualization and Summaries

ggplot(data = right, aes(x = edge.weight.med.nof.)) +
  geom_histogram(binwidth = 1, color = "blue") +
  labs(title = "Histogram of Right Hemisphere Edge Weights", x = "Edge Weight", y = "Frequency")

ggplot(data = left, aes(x = edge.weight.med.nof.)) +
  geom_histogram(binwidth = 1, color = "red") +
  labs(title = "Histogram of Left Hemisphere Edge Weights", x = "Edge Weight", y = "Frequency")

Upon first inspection, the distributions of connection weights for the two hemispheres appear similar, though not identical. The left hemisphere has slightly more variability after the peak, but both distributions are right-skewed with outliers.

# Summary statistics of right and left hemisphere edge weights
right_summary <- summarise(right, r_edge_mean = mean(edge.weight.med.nof.), r_edge_sd = sd(edge.weight.med.nof.), min = min(edge.weight.med.nof.), max = max(edge.weight.med.nof.), n = n())

left_summary <- summarise(left, l_edge_mean = mean(edge.weight.med.nof.), l_edge_sd = sd(edge.weight.med.nof.), min = min(edge.weight.med.nof.), max = max(edge.weight.med.nof.), n = n())

print(right_summary)
##   r_edge_mean r_edge_sd min max   n
## 1    4.859705  5.609739   1  64 474
print(left_summary)
##   l_edge_mean l_edge_sd min max   n
## 1    4.749509  5.170577   2  56 509

The filters above return no connections between the hemispheres in either direction (left to right or right to left), so no analysis of interhemispheric connectivity can be done with this dataset.

# Empty summaries of the connections between hemispheres
summarise(left_to_right)
## data frame with 0 columns and 0 rows
summarise(right_to_left)
## data frame with 0 columns and 0 rows
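An alternative to running summarise() on the two directional subsets is to label both endpoints of every edge and cross-tabulate them, so that all four hemisphere pairings are counted at once and an empty pairing shows up as an explicit 0. The sketch below does this on a hypothetical toy edge list with base R’s table(); applying it to connectome would rely on the same name-matching assumption as the filters above.

```r
# Toy edge list with hypothetical node names (not the real data)
node1 <- c("lh.insula", "rh.insula", "lh.precuneus", "rh.cuneus", "lh.lingual")
node2 <- c("lh.precuneus", "lh.insula", "rh.precuneus", "rh.precuneus", "rh.lingual")

# Assumed labeling rule, mirroring the filters above: "lh"/"Left" = left, "rh"/"Right" = right
hemi_of <- function(x) {
  ifelse(grepl("lh|Left", x), "left",
         ifelse(grepl("rh|Right", x), "right", "unknown"))
}

# Fixed factor levels keep every cell in the table, even when its count is 0
h1 <- factor(hemi_of(node1), levels = c("left", "right", "unknown"))
h2 <- factor(hemi_of(node2), levels = c("left", "right", "unknown"))

# Each edge lands in exactly one cell; the left/right off-diagonal cells are between-hemisphere edges
pair_counts <- table(node1_hemisphere = h1, node2_hemisphere = h2)
print(pair_counts)

# Tractography edges carry no direction, so count both node orders together
n_between <- pair_counts["left", "right"] + pair_counts["right", "left"]
n_between  # 3
```

On the real data, a table whose two off-diagonal left/right cells are both 0 would state the “no interhemispheric connections” finding directly.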

Hypothesis Testing

I will test whether the mean edge weights of the two hemispheres differ significantly by conducting a two-sample (Welch) t-test.

For this test, \(\mu_{1}\) is the average edge weight for the right hemisphere and \(\mu_{2}\) is the average edge weight for the left hemisphere. Thus, the hypotheses are as follows: \[H_{0}: \mu_{1}=\mu_{2}\] \[H_{A}: \mu_{1} \neq \mu_{2}\]
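For reference, the Welch two-sample t-test (the default in R’s t.test(), which does not assume equal variances) uses the statistic below, with subscript 1 for the right hemisphere, subscript 2 for the left, and \(\bar{x}\), \(s\), \(n\) denoting the sample mean, standard deviation, and size. Plugging in the values from right_summary and left_summary above:

\[ t = \frac{\bar{x}_{1}-\bar{x}_{2}}{\sqrt{s_{1}^{2}/n_{1}+s_{2}^{2}/n_{2}}} = \frac{4.8597-4.7495}{\sqrt{5.6097^{2}/474+5.1706^{2}/509}} \approx \frac{0.1102}{0.3448} \approx 0.320 \]

\[ \nu \approx \frac{\left(s_{1}^{2}/n_{1}+s_{2}^{2}/n_{2}\right)^{2}}{\dfrac{\left(s_{1}^{2}/n_{1}\right)^{2}}{n_{1}-1}+\dfrac{\left(s_{2}^{2}/n_{2}\right)^{2}}{n_{2}-1}} \approx 958.7 \]

These agree with the t and df values reported in the test output that follows.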

#Two sample hypothesis test
test <- t.test(right$edge.weight.med.nof., left$edge.weight.med.nof., alternative="two.sided", conf.level = 0.95)

test
## 
##  Welch Two Sample t-test
## 
## data:  right$edge.weight.med.nof. and left$edge.weight.med.nof.
## t = 0.31956, df = 958.74, p-value = 0.7494
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.5665335  0.7869251
## sample estimates:
## mean of x mean of y 
##  4.859705  4.749509

Because \(p = 0.75\) is greater than \(\alpha = 0.05\), I fail to reject the null hypothesis; consistent with this, the 95% confidence interval for the difference in means (-0.57 to 0.79) contains 0. Thus, there is insufficient evidence to claim that the mean edge weights of the two hemispheres differ. In the context of this problem, the average strength of connections within the left hemisphere is not significantly different from that within the right hemisphere.
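As a check, the same Welch test can be reproduced from the rounded summary statistics alone. The sketch below uses only base R (pt() and qt()); the helper welch_from_summary() is hypothetical, written for this illustration rather than taken from any package.

```r
# Hypothetical helper: Welch two-sample t-test computed from summary statistics
welch_from_summary <- function(m1, s1, n1, m2, s2, n2, conf = 0.95) {
  v1 <- s1^2 / n1                      # squared standard error of mean 1
  v2 <- s2^2 / n2                      # squared standard error of mean 2
  se <- sqrt(v1 + v2)                  # standard error of the difference
  t_stat <- (m1 - m2) / se
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))  # Welch-Satterthwaite df
  p_value <- 2 * pt(-abs(t_stat), df)  # two-sided p-value
  margin <- qt(1 - (1 - conf) / 2, df) * se
  list(t = t_stat, df = df, p = p_value,
       ci = c(m1 - m2 - margin, m1 - m2 + margin))
}

# Right hemisphere (1) vs. left hemisphere (2), values copied from the summaries above
res <- welch_from_summary(4.859705, 5.609739, 474, 4.749509, 5.170577, 509)
res
```

Because the inputs are the printed (rounded) summaries, the results match the t.test() output to about three or four significant figures rather than exactly.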

Further Analysis

Speech and language ability is what sparked interest in brain lateralization. Two scientists, Pierre Paul Broca and Carl Wernicke, identified two regions that are essential for speech and language function: Broca’s Area and Wernicke’s Area. Damage to these regions significantly impairs a person’s ability to produce words and sentences (Broca’s Area) or to understand language (Wernicke’s Area). These areas gave rise to the idea of brain lateralization because, although both areas have homologs in each hemisphere, it is damage on the left side that typically causes these speech and language deficits. In other words, the language function of these areas is not present in both hemispheres of every brain; in most brains, it resides in the left homolog.

I will investigate the connections of these areas and compare the right and left hemispheres. In this data, Broca’s area corresponds to the pars opercularis and pars triangularis, and Wernicke’s area is approximated by the superior temporal gyrus (Wernicke’s area proper lies in its posterior part). The selections below keep connections whose first node (name.node1) lies in one of these regions.

Location of Broca’s Area and Wernicke’s Area on the left hemisphere.

# Selecting Broca's area connections and dividing between right and left hemisphere
broca_right <- connectome %>%
  select(name.node1, name.node2, edge.weight.med.nof.) %>%
  filter(str_detect(name.node1, "parsopercularis|parstriangularis"), str_detect(name.node1, "rh|Right"))

broca_left <- connectome %>%
  select(name.node1, name.node2, edge.weight.med.nof.) %>%
  filter(str_detect(name.node1, "parsopercularis|parstriangularis"), str_detect(name.node1, "lh|Left"))

# Selecting Wernicke's area connections and dividing between right and left hemisphere
wernicke_right <- connectome %>%
  select(name.node1, name.node2, edge.weight.med.nof.) %>%
  filter(str_detect(name.node1, "superiortemporal"), str_detect(name.node1, "rh|Right"))

wernicke_left <- connectome %>%
  select(name.node1, name.node2, edge.weight.med.nof.) %>%
  filter(str_detect(name.node1, "superiortemporal"), str_detect(name.node1, "lh|Left"))
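The selections above keep a connection only when Broca’s or Wernicke’s area appears in name.node1. Because tractography edges are undirected, a connection could in principle list the region of interest in name.node2 instead. The sketch below, on a hypothetical toy edge list, contrasts the two selections; whether the real file ever lists these regions in name.node2 is not checked here.

```r
# Hypothetical toy edges; Broca's area is name.node1 in row 1 but name.node2 in row 2
edges <- data.frame(
  name.node1 = c("lh.parsopercularis", "lh.insula", "lh.precentral"),
  name.node2 = c("lh.insula", "lh.parstriangularis", "lh.postcentral"),
  edge.weight.med.nof. = c(3, 4, 2),
  stringsAsFactors = FALSE
)

broca_pattern <- "parsopercularis|parstriangularis"

# Selection as in the analysis: the region must appear in name.node1
from_only <- edges[grepl(broca_pattern, edges$name.node1), ]

# Alternative: the region may appear at either endpoint of the undirected edge
either_end <- edges[grepl(broca_pattern, edges$name.node1) |
                    grepl(broca_pattern, edges$name.node2), ]

nrow(from_only)   # 1
nrow(either_end)  # 2
```

With the either-endpoint version, the hemisphere filter would also need to be applied to whichever endpoint matched the region, not always to name.node1.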

First, I will analyze and compare the connections of Broca’s area of the left and right hemispheres.

#Summary statistics of Broca's area of left and right
summarise(broca_right, r_mean_edge = mean(edge.weight.med.nof.), r_sd_edge = sd(edge.weight.med.nof.), min = min(edge.weight.med.nof.), max = max(edge.weight.med.nof.), n = n())
##   r_mean_edge r_sd_edge min max  n
## 1     4.52381   3.10836   2  17 21
summarise(broca_left, l_mean_edge = mean(edge.weight.med.nof.), l_sd_edge = sd(edge.weight.med.nof.), min = min(edge.weight.med.nof.), max = max(edge.weight.med.nof.), n = n())
##   l_mean_edge l_sd_edge min max  n
## 1    3.369565    1.5827   2   7 23
# Histograms of right and left hemisphere edge weights for Broca's area

ggplot(broca_right, aes(x = edge.weight.med.nof.)) +
  geom_histogram(binwidth = .5, color = "blue") +
  labs(title = "Histogram of Edge Weights for Right Hemisphere of Broca's Area", x = "Edge Weights of Connections")

ggplot(broca_left, aes(x = edge.weight.med.nof.)) +
  geom_histogram(binwidth = .5, color = "red") +
  labs(title = "Histogram of Edge Weights for Left Hemisphere of Broca's Area", x = "Edge Weights of Connections")

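The histograms and summary statistics suggest that the right-hemisphere connections of Broca’s area are, on average, stronger (mean = \(4.52\), n = 21) than the left-hemisphere connections (mean = \(3.37\), n = 23), although the right hemisphere is also more variable (sd = 3.11 vs. 1.58) and no formal test was run on these subsets.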
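One way to put the Broca’s area means in context is a t-based 95% confidence interval for each hemisphere, computed from the summary statistics above. This is a sketch that assumes each sample mean is approximately t-distributed, which is questionable for small, right-skewed samples; also note that overlapping intervals do not by themselves rule out a significant difference between the means.

```r
# Summary statistics copied from the Broca's area output above
broca <- data.frame(
  hemisphere = c("right", "left"),
  mean = c(4.52381, 3.369565),
  sd = c(3.10836, 1.5827),
  n = c(21, 23)
)

# t-based 95% confidence interval for each mean: mean +/- t(0.975, n - 1) * sd / sqrt(n)
broca$se <- broca$sd / sqrt(broca$n)
broca$lower <- broca$mean - qt(0.975, broca$n - 1) * broca$se
broca$upper <- broca$mean + qt(0.975, broca$n - 1) * broca$se
broca
```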

Now, I will compare the left and right hemisphere connections for Wernicke’s area.

#Summary statistics of Wernicke's area for left and right 

summarise(wernicke_right, r_mean_edge = mean(edge.weight.med.nof.), r_sd_edge = sd(edge.weight.med.nof.), min = min(edge.weight.med.nof.), max = max(edge.weight.med.nof.), median = median(edge.weight.med.nof.), n = n())
##   r_mean_edge r_sd_edge min max median n
## 1         3.5  2.345208   2   8    2.5 6
summarise(wernicke_left, l_mean_edge = mean(edge.weight.med.nof.), l_sd_edge = sd(edge.weight.med.nof.), min = min(edge.weight.med.nof.), max = max(edge.weight.med.nof.), median = median(edge.weight.med.nof.), n = n())
##   l_mean_edge l_sd_edge min max median n
## 1    5.166667  4.262237   2  12      3 6
#Histograms

ggplot(wernicke_right, aes(x = edge.weight.med.nof.)) +
  geom_histogram(binwidth = .5, color = "blue") +
  labs(title = "Histogram of Edge Weights for Right Hemisphere of Wernicke's Area", x = "Edge Weights of Connections")

ggplot(wernicke_left, aes(x = edge.weight.med.nof.)) +
  geom_histogram(binwidth = .5, color = "red") +
  labs(title = "Histogram of Edge Weights for Left Hemisphere of Wernicke's Area", x = "Edge Weights of Connections")

There are far fewer connections for Wernicke’s Area than for Broca’s Area in this dataset (6 per hemisphere, compared with 21 and 23). The left-hemisphere connections also vary considerably more in strength (sd = 4.26 vs. 2.35), and their mean is higher (5.17 vs. 3.50).
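The difference in spread can be quantified with a variance-ratio (F) test computed from the summary statistics above. This is a sketch: the F test assumes both samples come from normal distributions, which is doubtful for right-skewed edge weights, and the test is known to be sensitive to departures from normality; with only 6 connections per hemisphere, its power is also low.

```r
# Summary statistics copied from the Wernicke's area output above
sd_right <- 2.345208; n_right <- 6
sd_left  <- 4.262237; n_left  <- 6

# Ratio of sample variances (left over right); under normality and equal population
# variances it follows an F distribution with (n_left - 1, n_right - 1) degrees of freedom
f_ratio <- sd_left^2 / sd_right^2

# Two-sided p-value: twice the smaller tail probability (the same rule var.test() uses)
p_two_sided <- 2 * min(pf(f_ratio, n_left - 1, n_right - 1),
                       pf(f_ratio, n_left - 1, n_right - 1, lower.tail = FALSE))
c(F = f_ratio, p = p_two_sided)
```

With the raw vectors in hand, var.test(wernicke_left$edge.weight.med.nof., wernicke_right$edge.weight.med.nof.) would perform the same test directly.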

Summary

Problem Statement: The data collected and catalogued by the Human Connectome Project are likely to have a significant impact on the neurology and neuroscience community at large. The project seeks to map all of the connections of the human brain with the goal of establishing a generic, healthy brain model. I seized the opportunity to dig into this brain data to investigate any unnoticed abnormalities. The most tractable problem I could tackle with this data was comparing the left and right hemispheres, since every observation in the data had a hemispheric label.

Thus, I analyzed Human Connectome Project data to compare the left and right hemispheres, including an overall comparison of the number and strength of connections within each hemisphere. I also looked for connections between the two hemispheres, but the dataset contained none.

Summarizing Implementation: The data were first reduced to the node names and edge weights and then divided into left- and right-hemisphere connections by selecting observations whose node names contained “lh”/“Left” or “rh”/“Right”. The exploratory analysis compared the number and strength of connections in the two hemispheres using histograms and summary statistics, followed by a Welch two-sample t-test on the mean edge weights. A further analysis then investigated two well-documented regions involved in asymmetrical brain lateralization, Broca’s Area and Wernicke’s Area, and compared their hemispheric differences.

Summary/Insights: The overall conclusion of my analysis was slightly anticlimactic and a bit disappointing. Essentially, I found no significant difference in mean edge weight between the left and right hemispheres of a healthy reference brain. Broca’s Area and Wernicke’s Area, two regions associated with asymmetry within the brain, did not show clear hemispheric differences either: their mean edge weights differed descriptively, but the subsets were small (21 and 23 connections for Broca’s Area, 6 per hemisphere for Wernicke’s Area) and were not tested formally.

Limitations: The biggest limitation in analyzing this data is that it only includes connections within the cortex, that is, connections among regions on the surface of the brain. Therefore, the data does not include connections involving structures deeper within the brain. The brain structures beneath the cortex are the command centers of much more complex cognitive and motor activities of the human body. The cortex is mostly known for storing connections based on learning and memory.

Additionally, the purpose of this data is to generate a model of a healthy, generic human brain. Thus, I could not go into the data and compare, say, a left-handed brain versus a right-handed brain. This data was meant to create a universal reference brain mapping of connections. This was probably the strongest contributor to why I could not find any significant differences in hemispheres.