1 Sample Information

Analysis of the results of a label-free mass spectrometry experiment on paraffin-embedded samples, stratified by etiology as follows:

1.1 Sample groups

Sample Etiology
BH-308-P2 Cardioembolic
GOTH-158-P2 Cardioembolic
BH-287-P1 Cardioembolic
BH-278-P3 Cardioembolic
BH-326-P2 Cardioembolic
BH-364-P2 Cardioembolic
NICN-193-P4 Cardioembolic
ATH-018-P1 Cardioembolic
NICN-198-P1 Cardioembolic
BH-159-P1 Cardioembolic
BH-158-P3 Cardioembolic
GOTH-040-P5 Cardioembolic
NICN-053-P6 Cardioembolic
NICN-108-P3 Cardioembolic
NICN-164-P2 Cardioembolic
BH-321-P1 LAA
GOTH-168-P5 LAA
GOTH-172-P5 LAA
ATH-012-P2 LAA
ATH-011-P1 LAA
BH-316-P1 LAA
NICN-213-P1 LAA
BH-323-P2 LAA
NICN-167-P2 LAA
NICN-196-P1 LAA
BH-132-P1 LAA
BH-143-P2 LAA
GOTH-039-P3 LAA
BH-189-P2 LAA
BH-217-P2 LAA
GOTH-038-P3 LAA

2 Protein counts per sample

The graph below shows the number of proteins identified in each sample. Samples CE_13 (NICN-053-P6), CE_14 (NICN-108-P3) and LAA_14 (BH-189-P2) have a much lower protein count, with fewer than 100 proteins each, and CE_15 (NICN-164-P2) has fewer than 200. This discrepancy between samples impairs further analyses, since they depend on the proteins that overlap between the groups.
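A minimal sketch of how the per-sample protein counts and the low-count flags could be derived, assuming a hypothetical intensity table in which a missing value means the protein was not identified in that sample. The table contents, the function names and the cutoff used in the example are illustrative and are not taken from the original analysis.

```python
def protein_counts(intensities):
    """Count proteins with a non-missing intensity in each sample."""
    return {
        sample: sum(1 for value in proteins.values() if value is not None)
        for sample, proteins in intensities.items()
    }


def low_count_samples(counts, cutoff):
    """Return the samples whose protein count is below the cutoff, sorted by name."""
    return sorted(sample for sample, n in counts.items() if n < cutoff)


# Illustrative values only; not measurements from this experiment.
toy = {
    "CE_01": {"P1": 10.2, "P2": 8.1, "P3": 7.7},
    "CE_13": {"P1": 9.5, "P2": None, "P3": None},
    "LAA_01": {"P1": 11.0, "P2": 8.4, "P3": None},
}
counts = protein_counts(toy)
flagged = low_count_samples(counts, cutoff=2)
```

With the real data, the same two steps would be run with a cutoff of 100 or 200 proteins to reproduce the flags described above.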

3 Venn diagrams

The Venn diagram generated using the complete data set shows that 3 proteins overlap between the CE and LAA groups, and this intersection is used for the comparison between groups. In addition, 6 proteins are unique to the LAA samples and 4 proteins are unique to the CE samples.

4 Venn diagrams without low protein count samples

The Venn diagram generated after the low protein count samples were removed shows that 68 proteins overlap between the CE and LAA groups, and this intersection is used for the comparison between groups. In addition, 25 proteins are unique to the LAA samples and 31 proteins are unique to the CE samples.
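The effect of removing low-count samples on the Venn regions can be sketched as follows. This sketch assumes that a protein belongs to a group only when it is detected in every sample of that group; the report does not state the rule that was actually used, and the sample contents and function names here are hypothetical.

```python
def group_proteins(samples):
    """Proteins detected in every sample of a group (an assumption of this sketch)."""
    sets = [set(proteins) for proteins in samples.values()]
    return set.intersection(*sets) if sets else set()


def venn_regions(ce, laa):
    """Split two protein sets into the three regions of a two-set Venn diagram."""
    return {"shared": ce & laa, "ce_only": ce - laa, "laa_only": laa - ce}


# Illustrative detections only; sample -> proteins detected in it.
ce_samples = {
    "CE_01": {"A", "B", "C", "D"},
    "CE_02": {"A", "B", "C", "E"},
    "CE_13": {"A"},  # stands in for a low protein count sample
}
laa_samples = {
    "LAA_01": {"A", "B", "F"},
    "LAA_02": {"A", "B", "F", "G"},
}

with_all = venn_regions(group_proteins(ce_samples), group_proteins(laa_samples))

# Drop the low-count sample and recompute the regions.
ce_filtered = {s: p for s, p in ce_samples.items() if s != "CE_13"}
without_low = venn_regions(group_proteins(ce_filtered), group_proteins(laa_samples))
```

In this toy example a single sparse sample shrinks the shared region from two proteins to one, which is the same kind of effect seen between the two Venn diagrams.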




# Data Processing

5 Principal Component Analysis

After log2 transformation and normalization of the data, I performed an exploratory analysis using principal component analysis (PCA). The results show no clear separation between the groups.
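A sketch of this preprocessing and of the first principal component, using only the Python standard library. It assumes per-sample median centering as the normalization step and power iteration to find the leading component; the report does not say which normalization or PCA implementation was used, and the intensities are illustrative.

```python
import math
import statistics


def log2_median_normalize(matrix):
    """Log2-transform each sample (row) and subtract that sample's median."""
    normalized = []
    for row in matrix:
        logged = [math.log2(value) for value in row]
        median = statistics.median(logged)
        normalized.append([value - median for value in logged])
    return normalized


def first_pc_scores(matrix, iterations=100):
    """Project samples onto the first principal component (power iteration)."""
    n, p = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in matrix]
    # Covariance matrix between proteins (columns).
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]
    vec = [1.0] * p  # starting direction; assumed not orthogonal to the first PC
    for _ in range(iterations):
        vec = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return [sum(r[j] * vec[j] for j in range(p)) for r in centered]


# Illustrative raw intensities: rows are samples, columns are proteins.
raw = [
    [4, 8, 16],
    [8, 8, 32],
    [2, 16, 16],
    [4, 4, 64],
]
normalized = log2_median_normalize(raw)
scores = first_pc_scores(normalized)
```

Plotting the scores colored by etiology is then enough to check visually whether the groups separate.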

6 Pearson correlation

The correlation matrices of the co-abundant proteins were used to build the adjacency matrices required for the network analysis. The correlation analysis was done on the following data sets:

  1. LAA samples with proteins exclusive to the LAA group (50 proteins).
  2. CE samples with proteins exclusive to the CE group (11 proteins).
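A minimal sketch of a pairwise Pearson correlation matrix between proteins, computed across the samples of one group. The abundance values and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_matrix(abundance):
    """Pairwise correlations between proteins (protein -> values across samples)."""
    names = sorted(abundance)
    return {a: {b: pearson(abundance[a], abundance[b]) for b in names} for a in names}


# Illustrative log2 abundances of three proteins across four samples.
toy = {
    "P1": [1.0, 2.0, 3.0, 4.0],
    "P2": [2.0, 4.0, 6.0, 8.0],
    "P3": [4.0, 3.0, 2.0, 1.0],
}
corr = correlation_matrix(toy)
```

Here P1 and P2 rise together and P3 falls as P1 rises, so the matrix contains both a strong positive and a strong negative correlation.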

6.1 LAA Samples

6.2 CE Samples

7 Network Analysis

The network analysis was carried out on the correlation matrices above, keeping only the highly correlated proteins (correlation threshold of +/- 0.80) to capture their abundance profiles, and resulted in the graphs below. The node size is proportional to the degree (how connected the protein is); red edges represent positive correlations and blue edges represent negative correlations. The optimal community structure of each graph was calculated by maximizing the modularity score.
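The thresholding step can be sketched as below: protein pairs whose absolute correlation reaches 0.80 become signed edges, and the degree of each protein is counted from those edges. Treating the threshold as inclusive is an assumption, the correlation values are illustrative, and community detection by modularity is not shown.

```python
def build_network(corr, threshold=0.80):
    """Keep protein pairs with |r| >= threshold as signed, undirected edges."""
    names = sorted(corr)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = corr[a][b]
            if abs(r) >= threshold:
                edges.append((a, b, "positive" if r > 0 else "negative"))
    return edges


def degrees(edges):
    """Number of edges attached to each protein that has at least one edge."""
    degree = {}
    for a, b, _sign in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return degree


# Illustrative symmetric correlation matrix for four proteins.
corr = {
    "P1": {"P1": 1.0, "P2": 0.93, "P3": -0.85, "P4": 0.20},
    "P2": {"P1": 0.93, "P2": 1.0, "P3": 0.10, "P4": -0.30},
    "P3": {"P1": -0.85, "P2": 0.10, "P3": 1.0, "P4": 0.05},
    "P4": {"P1": 0.20, "P2": -0.30, "P3": 0.05, "P4": 1.0},
}
edges = build_network(corr)
degree = degrees(edges)
```

In a plot, the "positive" and "negative" labels would map to red and blue edges, and the degree would set the node size.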

7.1 LAA Samples





7.2 CE Samples





8 Differential Analysis

Here I used a linear model approach to assess differential abundance/expression between the two groups. This analysis resulted in no differentially abundant proteins.
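For a single protein, a two-group linear model can be sketched as an ordinary least squares fit with a group indicator, where the slope is the difference between the group means. This is only an illustration with made-up values: it is not necessarily the model used in this analysis, and it leaves out p-values and multiple-testing correction.

```python
import math


def two_group_linear_model(ce, laa):
    """Fit y = b0 + b1 * group (CE = 0, LAA = 1) by ordinary least squares.

    Returns the group effect b1 (LAA mean minus CE mean) and its t-statistic.
    """
    n_ce, n_laa = len(ce), len(laa)
    mean_ce, mean_laa = sum(ce) / n_ce, sum(laa) / n_laa
    effect = mean_laa - mean_ce
    # Residual sum of squares around the fitted group means.
    rss = sum((v - mean_ce) ** 2 for v in ce) + sum((v - mean_laa) ** 2 for v in laa)
    residual_variance = rss / (n_ce + n_laa - 2)
    standard_error = math.sqrt(residual_variance * (1 / n_ce + 1 / n_laa))
    return effect, effect / standard_error


# Illustrative log2 abundances of one protein in each group.
effect, t_stat = two_group_linear_model(ce=[1.0, 2.0, 3.0], laa=[3.0, 4.0, 5.0])
```

On log2 data the effect is a log2 fold change, and the t-statistic would be compared against a t distribution with n_ce + n_laa - 2 degrees of freedom.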

8.1 Proteins reported in the manuscript as statistically significantly abundant in LAA clots:

8.2 Proteins reported in the manuscript as statistically significantly abundant in CE clots: