1 Introduction

Gene expression profiling is a powerful tool for understanding the molecular basis of diseases, particularly cancer. The dataset analyzed in this report is a CSV file containing expression levels for 54,675 microarray probes (columns, referred to as genes throughout) across 151 samples (rows); the file has 54,677 columns in total because it also stores a sample ID and a class label. These samples represent five different types of breast cancer as well as healthy tissue. The dataset is sourced from the CuMiDa database, a curated microarray repository designed for benchmarking machine learning techniques in cancer research (Feltes et al. 2019).

Understanding the underlying patterns in such high-dimensional datasets poses a significant challenge due to the sheer volume of features. In this dataset, each sample’s expression levels span tens of thousands of genes, most of which may not contribute directly to distinguishing between cancer subtypes. Consequently, raw data analysis can be hindered by redundancy, noise, and computational inefficiency. Dimension reduction and clustering techniques are therefore applied to address these issues.

1.1 Purpose of dimension reduction

Dimension reduction is used to: remove noise and redundancy, improve interpretability and enhance computational efficiency.

In this study, Principal Component Analysis (PCA) and Factor Analysis (FA) are applied to reduce the dataset’s dimensions while preserving the underlying structure of the data.

1.2 Purpose of clustering

Clustering techniques aim to group samples or genes based on their similarity in expression patterns. The primary goals include: identifying molecular subtypes, understanding gene co-expression and providing insights for biomarker discovery.

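The clustering techniques applied here are hierarchical clustering and k-means clustering, together with t-SNE. Strictly speaking, t-SNE is a nonlinear embedding method rather than a clustering algorithm; it is listed here because it is used to look for groupings among the samples.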

The combination of dimension reduction and clustering offers a robust framework to uncover meaningful patterns and relationships in complex gene expression data, helping bridge the gap between raw molecular data and actionable insights in cancer research.

2 Exploring and cleaning the data

I start by loading the dataset and checking its structure and dimensions.

data <- read.csv("Breast_GSE45827.csv", header = TRUE)

str(data[1:5])
## 'data.frame':    151 obs. of  5 variables:
##  $ samples   : int  84 85 87 90 91 92 93 94 99 101 ...
##  $ type      : chr  "basal" "basal" "basal" "basal" ...
##  $ X1007_s_at: num  9.85 9.86 10.1 9.76 9.41 ...
##  $ X1053_at  : num  8.1 8.21 8.94 7.36 7.75 ...
##  $ X117_at   : num  6.42 7.06 5.74 6.48 6.69 ...
dim(data)
## [1]   151 54677

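Key Variables

  • samples: integer sample identifier.

  • type: character class label for each sample (for example "basal").

  • Remaining columns (X1007_s_at, X1053_at, X117_at and so on): numeric expression values, one column per probe.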

Challenges with the dataset:

The dataset has far more variables than observations (54,677 columns, of which 54,675 hold expression values, against 151 samples), making it a typical candidate for dimension reduction techniques such as Principal Component Analysis (PCA) or Factor Analysis (FA). Some genes may be correlated or redundant, necessitating feature selection or clustering to group similar expression patterns. Normalization and scaling might also be required before applying clustering algorithms.

2.1 Removing unnecessary columns

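# Drop the sample ID column (column 1); `type` stays as the first column.
gene_data <- data[, -1]

# Count missing values across the whole table.
sum(is.na(gene_data))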
## [1] 0

There is no missing data so I will proceed with data scaling and feature selection.

2.2 Data scaling and feature selection

# Drop the `type` label column, then standardize each gene (mean 0, sd 1).
gene_data_scaled <- scale(gene_data[,-1])
str(gene_data_scaled[1:5])
##  num [1:5] -0.797 -0.778 -0.384 -0.949 -1.517
# Note: the columns are already standardized here, so these variances are all ~1.
gene_variances <- apply(gene_data_scaled, 2, var, na.rm = TRUE)

top_genes_indices <- order(gene_variances, decreasing = TRUE)[1:100]
top_genes <- gene_data_scaled[, top_genes_indices]
summary(top_genes[, 1:5])
##   X1552584_at       X1556147_at       X1557029_at      X1557677_a_at    
##  Min.   :-1.8826   Min.   :-1.7894   Min.   :-1.7714   Min.   :-1.5344  
##  1st Qu.:-0.7634   1st Qu.:-0.5713   1st Qu.:-0.6160   1st Qu.:-0.4761  
##  Median :-0.1457   Median :-0.1593   Median :-0.1199   Median :-0.1408  
##  Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000  
##  3rd Qu.: 0.5222   3rd Qu.: 0.3997   3rd Qu.: 0.3961   3rd Qu.: 0.3686  
##  Max.   : 3.0696   Max.   : 4.1978   Max.   : 6.2424   Max.   : 9.4677  
##   X1559732_at     
##  Min.   :-1.5836  
##  1st Qu.:-0.5968  
##  Median :-0.1229  
##  Mean   : 0.0000  
##  3rd Qu.: 0.2906  
##  Max.   : 7.2254
str(top_genes)
##  num [1:151, 1:100] 1.816 0.321 -0.567 2.403 0.594 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:100] "X1552584_at" "X1556147_at" "X1557029_at" "X1557677_a_at" ...

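Scaling standardizes gene expression values so that every gene has a mean of 0 and a standard deviation of 1. This step is essential for downstream analyses such as PCA and clustering, and the summary confirms that the column means are 0. Following this, I selected 100 genes by ranking the column variances, on the premise that high-variance genes tend to be more informative for clustering because they exhibit greater expression differences across samples. One caveat applies: the variances here are computed on the already standardized matrix, where every column has variance 1 up to floating-point error, so the ranking does not reflect the genes’ original variability. Ranking on the unscaled values and standardizing afterwards would be needed to select genuinely high-variance genes.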
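The effect of the order of operations can be illustrated on simulated data. The sketch below is not the analysis from this report: the matrix `expr` and its column names are made up for illustration, and it only shows that variances become uninformative after `scale()` while ranking on the unscaled values preserves the differences in spread.

```r
# Toy data standing in for an expression matrix (simulated values).
set.seed(1)
expr <- cbind(
  low  = rnorm(50, mean = 8, sd = 0.1),
  mid  = rnorm(50, mean = 8, sd = 1),
  high = rnorm(50, mean = 8, sd = 3)
)

# After scale(), every column has variance 1 (up to floating-point error),
# so ranking by variance no longer reflects the original spread.
var_scaled <- apply(scale(expr), 2, var)

# Ranking on the unscaled values keeps the information about spread.
var_raw <- apply(expr, 2, var)
top_idx <- order(var_raw, decreasing = TRUE)[1:2]

# Standardize only the selected columns for PCA or clustering.
selected_scaled <- scale(expr[, top_idx])

print(round(var_scaled, 6))
print(colnames(selected_scaled))
```

Applied to this report, the same idea would mean computing the variances on `gene_data[,-1]` before calling `scale()`; doing so would change which 100 genes are selected and therefore all downstream results.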

3 Correlation analysis

In this step, a correlation matrix was computed for the 100 selected genes using Pearson’s correlation coefficient. Since the expression values are continuous and standardized, Pearson’s correlation is a natural choice: it measures linear relationships, which is what PCA and FA operate on. It is sensitive to outliers, however, and the summary above shows long right tails (maxima above 6 standard deviations for some genes), so a rank-based coefficient such as Spearman’s would be a reasonable robustness check. The correlation matrix quantifies the linear relationship between pairs of genes, with values ranging from -1 to 1:

+1: Perfect positive correlation (genes show similar expression patterns).

0: No correlation.

-1: Perfect negative correlation (genes have opposite expression patterns).

The matrix is visualized using a heatmap generated by the corrplot package. Blue shades represent positive correlations, while red shades indicate negative correlations. The diagonal (black line) represents self-correlations, which are always 1.

library(corrplot)
## corrplot 0.95 loaded
correlation_matrix <- cor(top_genes, method = "pearson", use = "pairwise.complete.obs")

corrplot(correlation_matrix, order = "alphabet", tl.cex = 0.4, method = "color")

The correlation heatmap reveals a mix of weak to moderate correlations among the selected genes, suggesting some underlying structure but no clearly defined blocks. This indicates that Principal Component Analysis (PCA) is well-suited for dimension reduction, as it can effectively capture and condense the variance in the dataset. However, the absence of strong correlation clusters suggests that Factor Analysis (FA) may not be optimal, as it relies on well-defined latent factors driving the observed correlations. In the context of clustering, methods such as hierarchical clustering may still identify meaningful gene groupings, but the generally low correlation values imply that the clusters might not be well-defined or strongly separable.
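A visual reading of the heatmap can be backed up with a numerical summary of the off-diagonal correlations. The following is a minimal sketch on simulated data; the matrix `toy` and the cutoffs of 0.3 and 0.7 for "weak", "moderate" and "strong" are assumptions chosen for illustration, not values used in this report.

```r
# Simulated data with one deliberately correlated pair of columns.
set.seed(42)
toy <- matrix(rnorm(200 * 6), ncol = 6)
toy[, 2] <- toy[, 1] + rnorm(200, sd = 0.5)

cm <- cor(toy, method = "pearson")

# Keep each pair once and drop the diagonal of self-correlations.
off_diag <- cm[upper.tri(cm)]
abs_r <- abs(off_diag)

# Share of gene pairs in each (assumed) strength band.
share <- c(
  weak     = mean(abs_r < 0.3),
  moderate = mean(abs_r >= 0.3 & abs_r < 0.7),
  strong   = mean(abs_r >= 0.7)
)

print(summary(abs_r))
print(round(share, 3))
```

Running the same summary on `correlation_matrix` would give the proportion of gene pairs in each band and make the "weak to moderate" description checkable.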

4 Dimension reduction

Following the analysis of the correlation matrix, I wanted to further evaluate the suitability of Factor Analysis (FA) for this dataset. To do so, I conducted the Kaiser-Meyer-Olkin (KMO) test and Bartlett’s test of sphericity, which assess the adequacy of the data for dimension reduction techniques.

4.1 Kaiser-Meyer-Olkin test

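The Kaiser-Meyer-Olkin (KMO) test assesses sampling adequacy for factor analysis by comparing the observed correlations with the partial correlations between variables. Values close to 1 indicate that a large proportion of the variance in the data might be caused by underlying factors.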

library(psych)

KMO(correlation_matrix)
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = correlation_matrix)
## Overall MSA =  0.62
## MSA for each item = 
##   X1552584_at   X1556147_at   X1557029_at X1557677_a_at   X1559732_at 
##          0.59          0.29          0.52          0.46          0.43 
## X1559789_a_at   X1564573_at   X1566879_at   X1569983_at  X201121_s_at 
##          0.34          0.54          0.74          0.48          0.57 
##    X202419_at  X202425_x_at  X202867_s_at    X203006_at    X203761_at 
##          0.46          0.34          0.51          0.46          0.69 
##    X204389_at  X205915_x_at    X206024_at    X206396_at    X210516_at 
##          0.48          0.70          0.66          0.44          0.51 
##  X212901_s_at  X214894_x_at    X214936_at    X215282_at    X216097_at 
##          0.61          0.40          0.63          0.45          0.33 
##  X217272_s_at    X220663_at  X220965_s_at    X222027_at    X222184_at 
##          0.46          0.32          0.71          0.70          0.41 
##    X222371_at    X226406_at    X227623_at    X228041_at    X230228_at 
##          0.69          0.57          0.54          0.55          0.59 
##    X232968_at    X234774_at    X239537_at    X239735_at    X241255_at 
##          0.58          0.82          0.32          0.63          0.43 
##    X241366_at    X243537_at    X243900_at       X117_at   X1552261_at 
##          0.69          0.78          0.52          0.68          0.31 
## X1552264_a_at   X1552266_at   X1552271_at X1552272_a_at   X1552280_at 
##          0.75          0.29          0.77          0.54          0.50 
##   X1552281_at   X1552286_at   X1552302_at X1552303_a_at   X1552306_at 
##          0.84          0.69          0.68          0.70          0.52 
## X1552311_a_at   X1552315_at   X1552318_at X1552323_s_at   X1552347_at 
##          0.76          0.72          0.72          0.32          0.72 
## X1552349_a_at X1552355_s_at X1552364_s_at   X1552365_at X1552377_s_at 
##          0.40          0.82          0.44          0.37          0.46 
##   X1552386_at   X1552388_at   X1552396_at X1552399_a_at   X1552402_at 
##          0.63          0.58          0.64          0.56          0.75 
## X1552412_a_at X1552426_a_at X1552439_s_at X1552445_a_at X1552450_a_at 
##          0.70          0.42          0.28          0.41          0.73 
## X1552453_a_at X1552472_a_at   X1552482_at X1552486_s_at   X1552491_at 
##          0.71          0.63          0.74          0.57          0.69 
## X1552501_a_at X1552516_a_at X1552518_s_at X1552523_a_at   X1552528_at 
##          0.79          0.65          0.65          0.70          0.80 
## X1552532_a_at   X1552535_at X1552538_a_at   X1552555_at X1552563_a_at 
##          0.53          0.80          0.38          0.70          0.58 
##   X1552566_at X1552569_a_at   X1552582_at X1552585_s_at X1552590_a_at 
##          0.59          0.53          0.53          0.75          0.45 
##   X1552592_at   X1552594_at   X1552596_at   X1552612_at   X1552623_at 
##          0.62          0.59          0.65          0.52          0.40

The overall KMO value of 0.62 suggests a mediocre level of factorability, meaning that the dataset is somewhat suitable for factor analysis (FA) but not ideal.

Examining the individual KMO values for each gene, we see a wide range, with some genes showing low adequacy scores (below 0.50). Values closer to 1 indicate strong suitability for FA, while values below 0.50 suggest that some variables do not share sufficient common variance and may not contribute meaningfully to a latent factor model.
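To make the measure less of a black box, the KMO statistic can be reproduced in a few lines of base R. This is a sketch under the usual definition (squared correlations relative to squared correlations plus squared partial correlations); the function name `kmo_sketch` and the simulated one-factor data are hypothetical, and the results reported above come from `psych::KMO`, not from this code.

```r
kmo_sketch <- function(R) {
  R_inv <- solve(R)
  # Partial correlations between each pair, given all other variables.
  Q <- -R_inv / sqrt(outer(diag(R_inv), diag(R_inv)))
  # Only off-diagonal elements enter the statistic.
  diag(R) <- 0
  diag(Q) <- 0
  r2 <- R^2
  q2 <- Q^2
  list(
    overall  = sum(r2) / (sum(r2) + sum(q2)),
    per_item = colSums(r2) / (colSums(r2) + colSums(q2))
  )
}

# Simulated data driven by a single common factor.
set.seed(7)
f <- rnorm(300)
toy <- sapply(1:5, function(i) 0.8 * f + rnorm(300, sd = 0.6))

res <- kmo_sketch(cor(toy))
low_items <- which(res$per_item < 0.5)  # candidates for removal

print(round(res$overall, 2))
print(round(res$per_item, 2))
```

The `per_item` vector corresponds to the "MSA for each item" table; filtering it at 0.50 lists the variables that share too little common variance.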

4.2 Bartlett’s test of sphericity

Bartlett’s test evaluates whether the correlation matrix is significantly different from an identity matrix, which would indicate that variables are sufficiently interrelated to justify dimension reduction techniques such as Principal Component Analysis (PCA) or Factor Analysis (FA).

library(psych)

bartlett_result <- cortest.bartlett(cor(top_genes), n = 151)
print(bartlett_result)
## $chisq
## [1] 10504.89
## 
## $p.value
## [1] 0
## 
## $df
## [1] 4950

The test result shows a chi-square value of 10504.89 with 4950 degrees of freedom, and a p-value of 0 (rounded, meaning it is extremely small). This strongly suggests that the null hypothesis, which assumes that the correlation matrix is an identity matrix (i.e., variables are uncorrelated), can be rejected. In other words, there is sufficient correlation structure in the data, making it appropriate to proceed with PCA or FA.

Despite the mediocre KMO score (0.62), Bartlett’s test confirms that the dataset has a strong enough correlation pattern to benefit from dimension reduction. This supports the application of PCA for variance maximization or FA for latent factor extraction, though careful selection of the number of components or factors is still necessary.
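The numbers reported by Bartlett’s test follow from a short formula: the statistic is -(n - 1 - (2p + 5)/6) · ln|R| with p(p - 1)/2 degrees of freedom, where R is the correlation matrix, n the number of samples and p the number of variables. The sketch below implements this in base R on simulated data; `bartlett_sketch` is a hypothetical helper, not the function used in the report.

```r
bartlett_sketch <- function(R, n) {
  p <- ncol(R)
  # The log-determinant avoids numerical underflow of det(R) for large p.
  log_det <- as.numeric(determinant(R, logarithm = TRUE)$modulus)
  chisq <- -(n - 1 - (2 * p + 5) / 6) * log_det
  df <- p * (p - 1) / 2
  list(
    chisq   = chisq,
    df      = df,
    p.value = pchisq(chisq, df, lower.tail = FALSE),
    # Log of the p-value, useful when the p-value itself prints as 0.
    log.p   = pchisq(chisq, df, lower.tail = FALSE, log.p = TRUE)
  )
}

# Simulated data with four moderately correlated variables.
set.seed(3)
f <- rnorm(150)
toy <- sapply(1:4, function(i) 0.7 * f + rnorm(150, sd = 0.7))

res <- bartlett_sketch(cor(toy), n = nrow(toy))
print(res)
```

For 100 variables the formula gives 100 · 99 / 2 = 4950 degrees of freedom, which matches the output above. The `log.p` element shows how small a p-value is when it is displayed as 0.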

4.3 Determining the number of factors for Factor Analysis

A crucial step in Factor Analysis (FA) is selecting the optimal number of factors to retain. Since there is no single objective criterion for factor retention, several established methods can be used to guide this decision. Each approach provides a different perspective on how many factors should be considered meaningful in explaining the variance in the dataset.

To assess the appropriate number of factors, we can apply multiple techniques, including:

  • Kaiser’s Criterion: Retaining factors with eigenvalues greater than 1, as they explain more variance than a single original variable.

  • Scree Plot (Cattell’s Criterion): Analyzing the elbow point on the scree plot, where the eigenvalues show a sharp decline before leveling off. Factors to the left of this inflection point are considered significant.

  • Proportion of variance explained: Selecting the number of factors that together account for a predefined proportion of the total variance, such as 80% or, as in this analysis, 85%.

  • Half-Variable Rule (Kim & Mueller’s Criterion): Ensuring that the number of factors remains below half of the total observed variables, preventing overfitting.

By comparing the results from these approaches, I will determine an optimal number of factors that best capture the structure of the data while minimizing noise and redundancy.

  • Kaiser’s Criterion (Eigenvalues > 1)
eigenvalues <- eigen(correlation_matrix)$values
kaiser_factors <- sum(eigenvalues > 1)
cat("Number of factors based on Kaiser's criterion:", kaiser_factors, "\n")
## Number of factors based on Kaiser's criterion: 28
  • Scree Plot (Cattell’s Criterion)
scree(top_genes, factors = TRUE, pc = TRUE, main = "Scree Plot of Eigenvalues")

  • Proportion of variance explained

Choosing the number of factors that explains at least 85% of the total variance:

cumulative_variance <- cumsum(eigenvalues) / sum(eigenvalues)
num_factors_variance <- min(which(cumulative_variance >= 0.85))
cat("Number of factors based on 85% variance explained:", num_factors_variance, "\n")
## Number of factors based on 85% variance explained: 40

The criteria give different answers: Kaiser’s criterion suggests 28 factors, whereas the 85% variance threshold requires 40. Both stay below the half-variable bound of 50 for 100 genes. I use the proportion-of-variance criterion here because it directly guarantees that the retained factors account for a stated share of the total variance, at the cost of a less parsimonious model than Kaiser’s criterion would give.
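The retention criteria can be collected in one helper so that they are compared side by side. This is a small sketch with made-up eigenvalues; `retention_summary` and the vector `ev` are hypothetical names, and the scree plot criterion is left out because it relies on visual inspection.

```r
retention_summary <- function(eigenvalues, target = 0.85) {
  cum_var <- cumsum(eigenvalues) / sum(eigenvalues)
  c(
    # Kaiser: eigenvalues above 1.
    kaiser = sum(eigenvalues > 1),
    # Smallest number of factors reaching the target share of variance.
    variance_target = which(cum_var >= target)[1],
    # Half-variable rule: the factor count should stay below this value.
    half_of_variables = length(eigenvalues) / 2
  )
}

# Made-up eigenvalues of an 8-variable correlation matrix (they sum to 8).
ev <- c(3.2, 1.6, 1.1, 0.8, 0.6, 0.4, 0.2, 0.1)
out <- retention_summary(ev)
print(out)
```

In this toy case the variance criterion asks for more factors than the half-variable rule allows, which is the kind of conflict the comparison is meant to surface.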

4.4 Factor Analysis

The loadings in this analysis are extracted with MINRES (minimum residual), the default method of fa() in the psych package. In its first stage, MINRES uses squared multiple correlations (R^2) as initial communality estimates. After the initial factor extraction, it adjusts the loadings iteratively so as to minimize the sum of squared residuals, that is, the off-diagonal differences between the observed and model-implied correlations. To make the loadings easier to interpret, I chose VARIMAX as the rotation method. VARIMAX is an orthogonal rotation that maximizes the variance of the squared loadings within each factor (each column of the loadings matrix), which pushes individual loadings towards either 0 or large absolute values.
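What VARIMAX does to a loadings matrix can be seen on a small example. The sketch below uses `varimax()` from base R’s stats package on a hypothetical two-factor loadings matrix `L`; the numbers are invented for illustration, and the helper `varimax_criterion` computes the quantity the rotation tries to increase, up to a constant factor.

```r
# Hypothetical unrotated loadings: six variables on two factors.
L <- matrix(c(0.70, 0.60, 0.65,  0.50,  0.55,  0.60,
              0.50, 0.45, 0.40, -0.50, -0.55, -0.45), ncol = 2)

# Sum over factors of the variance of the squared loadings.
varimax_criterion <- function(loadings) {
  sum(apply(loadings^2, 2, var))
}

# normalize = FALSE rotates the raw loadings without Kaiser normalization.
rot <- varimax(L, normalize = FALSE)
L_rot <- unclass(rot$loadings)

before <- varimax_criterion(L)
after  <- varimax_criterion(L_rot)

# An orthogonal rotation leaves each variable's communality unchanged.
h2_before <- rowSums(L^2)
h2_after  <- rowSums(L_rot^2)

print(round(L_rot, 2))
print(c(before = before, after = after))
```

After rotation each variable loads mainly on one factor, while the communalities stay the same; this is why rotation changes interpretability but not model fit.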

optimal_factors <- num_factors_variance
fa_result <- fa(top_genes, nfactors = optimal_factors, rotate = "varimax")
print(fa_result)
## Factor Analysis using method =  minres
## Call: fa(r = top_genes, nfactors = optimal_factors, rotate = "varimax")
## Standardized loadings (pattern matrix) based upon correlation matrix
##                 MR1   MR3   MR2  MR15  MR10   MR7  MR14   MR8  MR20  MR33  MR36
## X1552584_at   -0.17  0.80  0.26 -0.09 -0.13  0.01  0.05 -0.08 -0.03  0.05  0.04
## X1556147_at    0.09 -0.04 -0.06  0.02 -0.02  0.07  0.04 -0.02  0.18  0.02  0.00
## X1557029_at    0.16 -0.04  0.09 -0.18 -0.11  0.27  0.01  0.10  0.71  0.15  0.06
## X1557677_a_at  0.09  0.04  0.06 -0.04 -0.02  0.15  0.03  0.01 -0.05  0.03  0.03
## X1559732_at    0.10  0.05 -0.13  0.07 -0.04  0.08  0.01 -0.01  0.07 -0.06  0.01
## X1559789_a_at -0.01  0.04 -0.07 -0.01  0.03  0.14  0.01 -0.04  0.78  0.00  0.09
## X1564573_at    0.29 -0.11  0.10  0.18 -0.02  0.12  0.07  0.04 -0.05 -0.11  0.18
## X1566879_at    0.63 -0.07 -0.23 -0.03  0.06 -0.08  0.12  0.16 -0.01 -0.15  0.09
## X1569983_at    0.27  0.06 -0.10  0.12  0.06  0.01  0.06 -0.05  0.07  0.01 -0.07
## X201121_s_at  -0.09 -0.15  0.21 -0.11  0.09 -0.03 -0.12  0.12  0.02  0.20  0.09
## X202419_at     0.12 -0.07 -0.37  0.23 -0.19  0.29  0.01 -0.16 -0.15 -0.17  0.01
## X202425_x_at  -0.05 -0.10 -0.06 -0.05  0.01  0.05 -0.02 -0.08  0.03  0.01  0.05
## X202867_s_at  -0.02 -0.15  0.19  0.65  0.05  0.10  0.19 -0.02 -0.07 -0.09  0.04
## X203006_at    -0.08 -0.22 -0.12  0.04  0.00  0.05 -0.05  0.00  0.01 -0.02  0.00
## X203761_at    -0.22  0.81 -0.14 -0.26 -0.07 -0.10 -0.03 -0.05  0.00  0.11 -0.02
## X204389_at     0.09  0.00 -0.11  0.09  0.15  0.75  0.00  0.01  0.18 -0.06  0.01
## X205915_x_at   0.61 -0.13 -0.13 -0.12  0.14  0.04  0.03  0.04  0.06 -0.08  0.13
## X206024_at     0.25 -0.14 -0.04  0.25  0.07  0.03 -0.04 -0.07  0.04  0.08 -0.04
## X206396_at    -0.02 -0.04  0.00  0.11  0.66 -0.02  0.20 -0.08 -0.09 -0.11  0.05
## X210516_at    -0.06 -0.15  0.17  0.09 -0.01  0.01 -0.01 -0.08 -0.01  0.06  0.00
## X212901_s_at  -0.09 -0.27  0.33 -0.11 -0.06 -0.14 -0.06  0.11 -0.16  0.11  0.35
## X214894_x_at  -0.32  0.03  0.14 -0.04  0.09 -0.22 -0.11  0.14  0.06 -0.12 -0.06
## X214936_at    -0.12  0.20  0.22 -0.59  0.01  0.00 -0.11 -0.10  0.07  0.11 -0.07
## X215282_at     0.05 -0.17  0.17  0.05 -0.05  0.00  0.17  0.18  0.07 -0.17  0.04
## X216097_at    -0.04 -0.14  0.06  0.02 -0.08  0.05 -0.07  0.09  0.10 -0.02 -0.09
## X217272_s_at   0.14 -0.15 -0.03 -0.01 -0.12  0.77 -0.01 -0.02  0.13  0.04  0.02
## X220663_at     0.12 -0.14 -0.01  0.08  0.02  0.04  0.03 -0.04 -0.01 -0.02  0.04
## X220965_s_at   0.57 -0.01  0.24  0.08 -0.11  0.08  0.11 -0.03 -0.05 -0.04 -0.08
## X222027_at    -0.21 -0.14  0.29 -0.45 -0.26  0.00 -0.02  0.12  0.07  0.20 -0.10
## X222184_at     0.04 -0.13 -0.24 -0.04 -0.01  0.05 -0.05  0.07  0.04  0.06  0.19
## X222371_at    -0.27  0.53 -0.42 -0.28  0.11 -0.16 -0.12 -0.03  0.09  0.06 -0.11
## X226406_at    -0.36 -0.13  0.07 -0.03 -0.49  0.03 -0.03  0.02 -0.19 -0.04  0.17
## X227623_at    -0.04  0.24 -0.31  0.13  0.25 -0.02  0.00 -0.03  0.11 -0.18  0.07
## X228041_at    -0.33  0.04 -0.08  0.43  0.08  0.09  0.02 -0.09  0.02  0.12 -0.08
## X230228_at     0.05  0.05 -0.19  0.51  0.13 -0.06 -0.01 -0.07 -0.01 -0.10 -0.20
## X232968_at    -0.10 -0.02 -0.37  0.12  0.45 -0.03  0.25 -0.11 -0.08 -0.23 -0.14
## X234774_at     0.65 -0.22  0.21 -0.03 -0.14  0.02  0.13  0.06 -0.22 -0.01  0.06
## X239537_at     0.09  0.04 -0.07  0.03  0.05  0.11 -0.06  0.05  0.06 -0.03  0.02
## X239735_at    -0.19  0.24 -0.39 -0.04  0.25 -0.15  0.01 -0.11  0.03 -0.12 -0.13
## X241255_at     0.18 -0.03 -0.11 -0.06 -0.05 -0.02  0.02 -0.02  0.05  0.03  0.03
## X241366_at    -0.27 -0.34  0.59 -0.12  0.05 -0.08 -0.06  0.02  0.16  0.03 -0.06
## X243537_at    -0.22  0.57 -0.36 -0.14  0.11 -0.15 -0.03 -0.11 -0.02 -0.10 -0.05
## X243900_at     0.17 -0.15  0.08  0.18  0.07  0.03  0.84  0.07  0.06  0.07 -0.06
## X117_at       -0.27  0.48  0.00 -0.26 -0.05 -0.06 -0.07 -0.10 -0.10  0.03 -0.06
## X1552261_at    0.10  0.06 -0.02  0.03  0.07  0.09  0.10  0.02 -0.10  0.00  0.03
## X1552264_a_at -0.22 -0.20  0.79 -0.01 -0.13 -0.12 -0.06  0.11 -0.05  0.15  0.12
## X1552266_at   -0.02 -0.02 -0.22  0.03  0.01 -0.01 -0.03  0.07 -0.01 -0.02  0.00
## X1552271_at    0.59 -0.29 -0.01  0.02  0.00  0.03 -0.01  0.01 -0.01 -0.10  0.02
## X1552272_a_at  0.23 -0.10 -0.10  0.03  0.01  0.00  0.04  0.04 -0.03 -0.02  0.02
## X1552280_at    0.03  0.48  0.15 -0.27 -0.03  0.09 -0.01 -0.03  0.03 -0.04  0.04
## X1552281_at    0.82 -0.13 -0.06  0.03  0.05  0.11 -0.03  0.02  0.08  0.08 -0.06
## X1552286_at    0.10 -0.27  0.10  0.00 -0.16  0.00  0.06  0.06 -0.04  0.40  0.21
## X1552302_at   -0.23  0.46 -0.25 -0.04  0.05 -0.18 -0.05 -0.09 -0.02 -0.04 -0.10
## X1552303_a_at -0.22  0.49 -0.09 -0.14  0.13 -0.19 -0.05 -0.02 -0.03 -0.12  0.01
## X1552306_at   -0.06 -0.28  0.34 -0.09 -0.03  0.07  0.00 -0.12  0.02  0.02  0.17
## X1552311_a_at  0.65 -0.12 -0.38 -0.03 -0.05  0.02  0.08 -0.06 -0.04  0.03 -0.04
## X1552315_at   -0.09  0.88  0.01  0.16  0.01  0.03 -0.07 -0.04 -0.04 -0.05  0.04
## X1552318_at   -0.08  0.87 -0.14  0.02  0.03 -0.04 -0.09 -0.06  0.06  0.00  0.00
## X1552323_s_at  0.13  0.05  0.15  0.01  0.01  0.05  0.02 -0.04  0.14  0.05  0.84
## X1552347_at   -0.25  0.17 -0.59 -0.13  0.03  0.05 -0.01 -0.01  0.04  0.02  0.12
## X1552349_a_at  0.11 -0.15  0.04  0.11 -0.11  0.04 -0.02  0.80 -0.04  0.25  0.00
## X1552355_s_at  0.65 -0.21 -0.08  0.16  0.05  0.25  0.03  0.07  0.02 -0.06  0.19
## X1552364_s_at  0.09 -0.11  0.19  0.01  0.07 -0.07  0.08 -0.05 -0.09 -0.09 -0.01
## X1552365_at   -0.07  0.03 -0.03  0.05  0.13 -0.08  0.00 -0.04  0.05 -0.04  0.05
## X1552377_s_at  0.35  0.19  0.00 -0.12 -0.04 -0.07  0.05  0.05  0.01  0.00  0.09
## X1552386_at   -0.23  0.53 -0.07  0.18  0.07 -0.03 -0.10  0.00  0.08 -0.09  0.02
## X1552388_at    0.20 -0.22 -0.11  0.32  0.14 -0.02  0.09 -0.02  0.07 -0.04 -0.08
## X1552396_at    0.27 -0.19 -0.07  0.09  0.42  0.18  0.19 -0.08  0.05  0.03  0.07
## X1552399_a_at -0.33  0.00  0.22  0.06  0.00 -0.28  0.00 -0.04 -0.05 -0.30 -0.05
## X1552402_at    0.57 -0.18 -0.03 -0.02  0.04  0.03 -0.05  0.03 -0.10  0.10  0.10
## X1552412_a_at  0.43  0.00 -0.24  0.11  0.02  0.04  0.12 -0.04 -0.02 -0.10  0.01
## X1552426_a_at -0.09  0.05  0.00 -0.02 -0.04 -0.01 -0.08  0.04  0.01  0.09 -0.03
## X1552439_s_at -0.09 -0.06  0.06 -0.05 -0.02 -0.02 -0.05 -0.02  0.01  0.04  0.05
## X1552445_a_at  0.14 -0.12  0.01 -0.14 -0.03 -0.04  0.08  0.84  0.05 -0.06 -0.02
## X1552450_a_at  0.70 -0.15  0.12  0.06 -0.05  0.07  0.06  0.08  0.01 -0.02  0.01
## X1552453_a_at  0.33 -0.17  0.06  0.09 -0.04  0.07  0.08  0.06  0.14 -0.07  0.04
## X1552472_a_at -0.16 -0.03  0.68 -0.24 -0.16  0.01  0.01  0.00  0.11  0.08  0.05
## X1552482_at   -0.20  0.25 -0.31 -0.16  0.21 -0.16  0.02  0.23  0.18  0.23 -0.23
## X1552486_s_at -0.17  0.05  0.53 -0.02  0.04 -0.04 -0.05 -0.05 -0.11 -0.20  0.22
## X1552491_at    0.54 -0.15  0.01  0.08 -0.17 -0.10  0.02  0.00  0.21 -0.08 -0.15
## X1552501_a_at  0.67  0.12 -0.01  0.05  0.02  0.12  0.11  0.08  0.00  0.14  0.00
## X1552516_a_at -0.32  0.01  0.67 -0.10  0.09 -0.10  0.05  0.01  0.02  0.10 -0.03
## X1552518_s_at -0.19 -0.27  0.28 -0.05 -0.39  0.00  0.06  0.20  0.01  0.07  0.06
## X1552523_a_at  0.52 -0.06 -0.28  0.06 -0.05 -0.03  0.10  0.03  0.25 -0.03  0.05
## X1552528_at    0.77 -0.12 -0.04  0.11 -0.04  0.02 -0.01  0.05 -0.02  0.02 -0.05
## X1552532_a_at -0.20  0.02  0.10 -0.16 -0.08 -0.02 -0.03  0.14  0.08  0.75  0.04
## X1552535_at    0.71 -0.04 -0.15  0.04  0.11  0.02  0.04  0.03  0.12 -0.04  0.04
## X1552538_a_at  0.05 -0.11  0.01  0.03 -0.11 -0.08  0.09  0.08  0.05  0.03 -0.09
## X1552555_at    0.78  0.00  0.01 -0.20  0.10 -0.01  0.07 -0.05 -0.07 -0.02  0.09
## X1552563_a_at -0.24  0.19  0.13 -0.23 -0.14  0.01 -0.10  0.07  0.04  0.12 -0.04
## X1552566_at    0.32 -0.15  0.23  0.04  0.01  0.10  0.04  0.03  0.27  0.12  0.13
## X1552569_a_at  0.34 -0.12  0.06  0.01 -0.13  0.08  0.08 -0.04 -0.07 -0.02  0.08
## X1552582_at    0.18  0.11 -0.10  0.15  0.10  0.00  0.04 -0.10 -0.08 -0.06 -0.09
## X1552585_s_at  0.52 -0.05 -0.13 -0.03  0.05  0.00  0.06 -0.01  0.04 -0.05  0.08
## X1552590_a_at  0.12 -0.19  0.06  0.01  0.40 -0.07 -0.09  0.02  0.06  0.07 -0.05
## X1552592_at    0.52 -0.03 -0.15 -0.07  0.19 -0.11  0.03  0.01  0.03 -0.15  0.07
## X1552594_at    0.28 -0.05 -0.12  0.02  0.16 -0.06  0.77  0.00 -0.04 -0.09  0.09
## X1552596_at    0.65  0.00  0.07  0.15  0.02  0.15  0.29  0.07  0.10 -0.10 -0.07
## X1552612_at   -0.02  0.15  0.81  0.09  0.01  0.01  0.03 -0.06 -0.07 -0.08  0.11
## X1552623_at   -0.01  0.31 -0.08  0.11  0.06 -0.05  0.18 -0.07 -0.11 -0.13  0.01
##                MR12   MR5  MR16   MR9  MR28  MR21   MR4  MR39  MR29  MR24  MR13
## X1552584_at    0.03  0.09 -0.01 -0.19  0.08  0.01 -0.10 -0.02 -0.05 -0.03 -0.02
## X1556147_at    0.04  0.04 -0.05  0.73  0.00 -0.09  0.08  0.11 -0.02 -0.03  0.14
## X1557029_at    0.03 -0.16 -0.06  0.02  0.06  0.08 -0.02 -0.11 -0.03  0.18 -0.07
## X1557677_a_at  0.02  0.01  0.02 -0.06 -0.04  0.69  0.00 -0.04  0.09 -0.02  0.03
## X1559732_at    0.00 -0.09  0.00 -0.02 -0.02  0.02  0.02  0.00 -0.03  0.14 -0.04
## X1559789_a_at -0.05 -0.01  0.03  0.15 -0.05 -0.07  0.04  0.02  0.02 -0.01  0.00
## X1564573_at   -0.07  0.08  0.04  0.12 -0.04 -0.16  0.09  0.03 -0.07 -0.14  0.02
## X1566879_at    0.11 -0.07  0.03  0.04 -0.08  0.19 -0.06  0.08  0.05  0.18 -0.02
## X1569983_at   -0.02  0.03  0.00 -0.10  0.09  0.03  0.06  0.01  0.00  0.15 -0.02
## X201121_s_at  -0.06 -0.03 -0.15  0.10 -0.10  0.00  0.05  0.07  0.00  0.02  0.11
## X202419_at    -0.17 -0.16  0.22  0.02 -0.14 -0.03  0.03  0.00 -0.01  0.07  0.02
## X202425_x_at  -0.03 -0.04  0.00 -0.12  0.00 -0.02  0.04  0.01 -0.05  0.02 -0.88
## X202867_s_at   0.06 -0.05  0.06  0.05  0.00 -0.05 -0.02  0.03  0.03  0.03  0.02
## X203006_at     0.07 -0.03  0.05  0.02 -0.02  0.05  0.02 -0.05  0.04  0.06 -0.06
## X203761_at     0.01  0.03 -0.11 -0.01  0.11 -0.05 -0.02  0.08 -0.06  0.01  0.06
## X204389_at     0.01 -0.10  0.04  0.08  0.00  0.14  0.03 -0.01  0.07  0.08 -0.03
## X205915_x_at   0.02  0.14  0.05  0.20 -0.17  0.22  0.00 -0.03 -0.05  0.08 -0.06
## X206024_at     0.02  0.00 -0.07 -0.10 -0.07  0.14  0.01 -0.01  0.11  0.04 -0.11
## X206396_at     0.01  0.07  0.11  0.02  0.01 -0.01 -0.01  0.14  0.02  0.09 -0.06
## X210516_at     0.02 -0.02  0.02 -0.05  0.03  0.07  0.07  0.05  0.73  0.09  0.05
## X212901_s_at  -0.05 -0.01 -0.03  0.17 -0.02  0.07  0.03  0.06  0.14 -0.01  0.03
## X214894_x_at  -0.08 -0.17  0.55  0.17 -0.13  0.03  0.08  0.05  0.05  0.09 -0.13
## X214936_at     0.04 -0.14 -0.02  0.16  0.13 -0.01  0.08  0.12 -0.10  0.18 -0.12
## X215282_at     0.10 -0.05 -0.13  0.12 -0.21  0.03  0.03 -0.02  0.43 -0.08  0.04
## X216097_at     0.02  0.02  0.03 -0.02 -0.06 -0.06  0.02  0.01  0.02  0.04  0.00
## X217272_s_at   0.01  0.03 -0.04  0.00 -0.10  0.04  0.00  0.09 -0.04  0.04 -0.04
## X220663_at     0.00 -0.09  0.53 -0.07  0.01 -0.02 -0.01  0.00 -0.01 -0.02  0.03
## X220965_s_at  -0.02  0.10 -0.11 -0.04 -0.06 -0.03 -0.17 -0.02  0.16 -0.02 -0.04
## X222027_at    -0.05 -0.05 -0.13 -0.17  0.03  0.21 -0.03  0.02  0.06 -0.09 -0.10
## X222184_at    -0.03  0.02  0.00  0.12 -0.05 -0.03  0.65  0.01  0.18  0.16 -0.04
## X222371_at    -0.05  0.01 -0.17  0.12  0.14 -0.05  0.06  0.19 -0.18  0.00 -0.02
## X226406_at    -0.16 -0.18  0.16  0.04 -0.11  0.02 -0.10  0.03 -0.04 -0.09  0.05
## X227623_at    -0.08 -0.23  0.22  0.12 -0.07 -0.08  0.06 -0.08  0.14  0.34  0.17
## X228041_at    -0.10  0.19 -0.02  0.03  0.30  0.01 -0.05 -0.04  0.25  0.10 -0.08
## X230228_at     0.01 -0.12  0.06  0.03 -0.02  0.04  0.00 -0.04 -0.01  0.13  0.01
## X232968_at    -0.03 -0.07 -0.03  0.11  0.17  0.04 -0.04  0.32 -0.09 -0.04  0.05
## X234774_at     0.03  0.04 -0.03 -0.04  0.03  0.06  0.13 -0.03 -0.01  0.00  0.00
## X239537_at    -0.07 -0.01  0.00 -0.03  0.00 -0.04  0.11 -0.03  0.07  0.65 -0.03
## X239735_at     0.00  0.09 -0.23  0.04  0.15 -0.04  0.00  0.27 -0.09  0.02  0.12
## X241255_at     0.09  0.06 -0.01 -0.10 -0.06 -0.06 -0.02  0.00 -0.03 -0.01  0.08
## X241366_at     0.06  0.08  0.06  0.07 -0.05 -0.10  0.00  0.12  0.19  0.07  0.13
## X243537_at     0.00 -0.06 -0.13  0.09  0.13 -0.07  0.07  0.13 -0.10 -0.08 -0.07
## X243900_at    -0.04  0.10  0.00 -0.06 -0.03  0.02 -0.03 -0.11  0.03 -0.05  0.03
## X117_at        0.00  0.12  0.11 -0.04  0.18 -0.26 -0.02  0.08 -0.02  0.08  0.13
## X1552261_at    0.04 -0.02  0.04  0.02  0.02  0.01  0.01 -0.03  0.03 -0.03  0.02
## X1552264_a_at -0.04  0.05  0.01  0.02 -0.03  0.03  0.03 -0.11  0.02 -0.06 -0.01
## X1552266_at    0.11 -0.24 -0.04 -0.03 -0.08 -0.05  0.50 -0.08 -0.07  0.03  0.03
## X1552271_at    0.47  0.03 -0.01 -0.02  0.01  0.11 -0.12 -0.05 -0.03 -0.10 -0.02
## X1552272_a_at  0.90 -0.01 -0.02  0.04 -0.07  0.00  0.03  0.02  0.04 -0.05  0.05
## X1552280_at   -0.19  0.02  0.01 -0.08  0.01  0.10  0.06 -0.07 -0.08  0.02  0.09
## X1552281_at    0.01  0.00  0.07 -0.12 -0.06  0.02 -0.09 -0.01 -0.03  0.04  0.05
## X1552286_at   -0.01 -0.10 -0.07 -0.15 -0.18  0.15  0.22 -0.18  0.10 -0.01  0.01
## X1552302_at   -0.10 -0.07 -0.04 -0.09  0.59 -0.12 -0.06  0.08  0.02 -0.07 -0.01
## X1552303_a_at -0.10 -0.09 -0.07  0.08  0.61 -0.02 -0.11  0.11 -0.03  0.02  0.01
## X1552306_at   -0.06  0.30 -0.27  0.07  0.01  0.09  0.08  0.09  0.13 -0.15 -0.20
## X1552311_a_at -0.09  0.00 -0.02 -0.03  0.07 -0.07 -0.03 -0.02 -0.08 -0.09  0.09
## X1552315_at   -0.05 -0.06  0.00  0.00  0.01  0.06  0.02  0.01 -0.01  0.08  0.08
## X1552318_at   -0.07 -0.14 -0.04  0.04 -0.02  0.01  0.00  0.02 -0.02  0.07 -0.03
## X1552323_s_at  0.03 -0.01  0.03 -0.02 -0.03  0.02  0.10  0.08 -0.01  0.03 -0.05
## X1552347_at   -0.08  0.06 -0.08  0.11  0.13 -0.08  0.10  0.11 -0.03 -0.02 -0.09
## X1552349_a_at  0.02  0.00  0.09 -0.01 -0.01  0.01  0.00 -0.04  0.00  0.04  0.03
## X1552355_s_at  0.10  0.03 -0.03 -0.12 -0.10  0.02  0.02  0.02 -0.02 -0.13 -0.02
## X1552364_s_at -0.01  0.73 -0.19  0.05 -0.05  0.01 -0.12 -0.01 -0.04 -0.02  0.06
## X1552365_at   -0.04 -0.03 -0.04 -0.01  0.06 -0.08 -0.04  0.13 -0.12  0.09 -0.07
## X1552377_s_at  0.01 -0.08  0.13  0.00  0.00 -0.05 -0.07 -0.06 -0.03 -0.06  0.05
## X1552386_at   -0.08 -0.10 -0.05  0.14 -0.16 -0.04 -0.07  0.13  0.02 -0.25  0.12
## X1552388_at    0.06  0.04  0.21  0.06 -0.10 -0.02 -0.03  0.03  0.01  0.07  0.10
## X1552396_at   -0.03  0.04  0.02 -0.19  0.07  0.16 -0.08  0.00 -0.02 -0.09  0.13
## X1552399_a_at  0.05  0.15  0.14  0.31  0.08  0.02 -0.04  0.06 -0.08 -0.03  0.08
## X1552402_at    0.09  0.16  0.19  0.11 -0.05  0.09  0.10 -0.32  0.04 -0.04 -0.04
## X1552412_a_at  0.01  0.01 -0.19  0.09 -0.03 -0.07  0.10  0.11  0.02 -0.01  0.02
## X1552426_a_at -0.03 -0.01  0.07  0.12  0.02 -0.18  0.02 -0.02  0.07  0.10  0.09
## X1552439_s_at  0.01  0.00 -0.03  0.02 -0.01  0.01  0.04  0.04  0.15  0.05  0.02
## X1552445_a_at  0.03 -0.06 -0.09 -0.01 -0.04  0.01  0.10 -0.07 -0.06  0.02  0.06
## X1552450_a_at -0.07 -0.05 -0.05  0.04 -0.11  0.04 -0.13  0.00  0.01  0.04  0.03
## X1552453_a_at  0.15  0.02 -0.04 -0.06 -0.09  0.00 -0.05  0.01  0.07 -0.04  0.06
## X1552472_a_at  0.02  0.16 -0.05 -0.03  0.05  0.04 -0.04  0.04  0.02 -0.04 -0.11
## X1552482_at   -0.07 -0.08  0.05 -0.04  0.13 -0.05 -0.14  0.25 -0.09  0.01  0.10
## X1552486_s_at -0.09  0.04  0.19 -0.05 -0.04 -0.01 -0.16 -0.22  0.26 -0.11  0.03
## X1552491_at   -0.04  0.03 -0.05 -0.12  0.11  0.25  0.04 -0.07 -0.04  0.01 -0.10
## X1552501_a_at  0.09 -0.11  0.03 -0.02  0.00 -0.05 -0.08 -0.11  0.04  0.11  0.00
## X1552516_a_at -0.17  0.05 -0.02  0.26 -0.03 -0.06 -0.11  0.03  0.03 -0.07 -0.03
## X1552518_s_at  0.20  0.18 -0.05 -0.05  0.07  0.02  0.08 -0.06  0.05 -0.01 -0.04
## X1552523_a_at  0.03  0.06 -0.10  0.01 -0.07  0.00  0.05 -0.01 -0.04  0.30  0.02
## X1552528_at    0.11  0.08  0.04  0.00 -0.07  0.01  0.11  0.00  0.01 -0.15 -0.05
## X1552532_a_at -0.03 -0.05 -0.05  0.04 -0.02  0.04  0.04 -0.02  0.01 -0.04 -0.02
## X1552535_at    0.04 -0.15 -0.02  0.12  0.03  0.00  0.00 -0.03 -0.13  0.17  0.00
## X1552538_a_at -0.02 -0.01 -0.02 -0.09 -0.03  0.03 -0.02 -0.69 -0.05  0.04  0.02
## X1552555_at   -0.01  0.10  0.07  0.08 -0.01 -0.10  0.05  0.00 -0.01  0.00  0.08
## X1552563_a_at -0.10 -0.22  0.11  0.06  0.06  0.08  0.44  0.22 -0.06  0.03 -0.13
## X1552566_at    0.09  0.00  0.10  0.07  0.00 -0.11  0.05 -0.06 -0.12 -0.11 -0.09
## X1552569_a_at  0.04  0.04  0.11 -0.02  0.03  0.09 -0.07 -0.05  0.00  0.03  0.00
## X1552582_at    0.06  0.20 -0.01  0.00 -0.02 -0.11  0.04  0.05  0.01  0.12  0.13
## X1552585_s_at  0.00 -0.03 -0.04 -0.04  0.05 -0.05  0.00  0.01 -0.10  0.13 -0.04
## X1552590_a_at  0.00 -0.01 -0.13 -0.11  0.04  0.51 -0.07  0.04 -0.14 -0.09  0.01
## X1552592_at    0.08  0.12  0.18  0.03 -0.04  0.01  0.08 -0.12  0.00  0.08  0.21
## X1552594_at    0.09 -0.01  0.01  0.11 -0.02 -0.01 -0.04 -0.01 -0.02  0.00  0.00
## X1552596_at    0.06 -0.13 -0.01 -0.04  0.07  0.00  0.02  0.06  0.05  0.02  0.03
## X1552612_at   -0.09  0.04 -0.06 -0.14 -0.02  0.06 -0.11  0.13  0.03 -0.03  0.09
## X1552623_at    0.03  0.05 -0.10 -0.02  0.07  0.00 -0.14 -0.01  0.01 -0.16  0.06
##                MR22  MR17  MR40  MR38  MR11  MR37  MR26   MR6  MR35  MR19  MR25
## X1552584_at    0.13 -0.09  0.04 -0.06  0.06 -0.09  0.00  0.04 -0.07  0.02 -0.07
## X1556147_at   -0.03  0.03  0.09  0.13  0.03 -0.01 -0.07 -0.02 -0.02 -0.03 -0.05
## X1557029_at   -0.06 -0.07  0.04  0.04  0.12 -0.06  0.00  0.07 -0.02  0.19  0.12
## X1557677_a_at -0.07  0.02 -0.05 -0.16  0.03 -0.05 -0.02 -0.02  0.03 -0.07 -0.01
## X1559732_at   -0.04  0.03  0.02  0.01 -0.03  0.06  0.13 -0.06  0.75  0.02  0.01
## X1559789_a_at  0.02 -0.08 -0.02 -0.01 -0.05  0.09  0.06 -0.12  0.09  0.03  0.02
## X1564573_at   -0.01 -0.01  0.64 -0.03  0.01 -0.11  0.01 -0.08  0.01 -0.07  0.01
## X1566879_at    0.09 -0.05  0.00  0.04 -0.06 -0.08 -0.03  0.05 -0.02 -0.13  0.02
## X1569983_at   -0.07  0.01  0.02 -0.01  0.05 -0.14  0.71  0.04  0.19 -0.01 -0.07
## X201121_s_at   0.03  0.02 -0.04  0.06  0.03  0.04 -0.07 -0.13 -0.10 -0.14 -0.07
## X202419_at     0.11 -0.05 -0.09  0.20  0.15  0.07  0.11  0.02  0.03 -0.11  0.26
## X202425_x_at  -0.06 -0.02 -0.04 -0.09 -0.02  0.07  0.01 -0.03  0.04  0.00 -0.03
## X202867_s_at   0.06 -0.03  0.07 -0.04 -0.10 -0.01  0.08  0.02  0.05  0.03  0.06
## X203006_at    -0.75 -0.05 -0.01 -0.05  0.02  0.01  0.04 -0.07  0.04  0.06 -0.06
## X203761_at     0.11 -0.09 -0.05  0.02 -0.05  0.11  0.01 -0.09 -0.14 -0.02 -0.03
## X204389_at    -0.08  0.15  0.04  0.04 -0.03 -0.08 -0.03 -0.05  0.09  0.03  0.00
## X205915_x_at  -0.01  0.08 -0.09 -0.17  0.05 -0.04 -0.10  0.09  0.04 -0.03  0.05
## X206024_at    -0.17  0.06  0.08  0.01  0.03  0.05  0.07 -0.09  0.10  0.03  0.23
## X206396_at     0.07  0.06  0.09 -0.04  0.05  0.16  0.00  0.10 -0.01 -0.09 -0.04
## X210516_at    -0.04  0.04 -0.02  0.08  0.14 -0.11  0.02  0.04 -0.05  0.01  0.04
## X212901_s_at   0.09 -0.10  0.10 -0.12  0.03  0.13  0.08 -0.06 -0.12  0.04  0.06
## X214894_x_at   0.03  0.09  0.15  0.06  0.05  0.08 -0.02 -0.05 -0.04 -0.10  0.03
## X214936_at     0.11 -0.08 -0.10  0.02 -0.11 -0.06 -0.05 -0.15 -0.03 -0.04 -0.11
## X215282_at    -0.05 -0.03 -0.12 -0.03  0.30 -0.07 -0.06 -0.18  0.10  0.09  0.05
## X216097_at    -0.05  0.05 -0.03  0.04  0.05 -0.02 -0.01  0.05  0.02  0.75 -0.04
## X217272_s_at   0.01 -0.02  0.02 -0.05  0.00 -0.03  0.02  0.01  0.01  0.03  0.05
## X220663_at    -0.05  0.01  0.03  0.03 -0.05 -0.05  0.00 -0.04  0.01  0.05 -0.03
## X220965_s_at   0.03  0.22  0.08 -0.04  0.01 -0.04  0.19 -0.03 -0.08 -0.03  0.04
## X222027_at     0.09  0.08 -0.06  0.07  0.09 -0.05  0.11 -0.11 -0.06  0.28 -0.15
## X222184_at    -0.16 -0.01  0.14 -0.02  0.00  0.05  0.05 -0.03  0.04 -0.03 -0.04
## X222371_at     0.04  0.03 -0.12  0.14 -0.05  0.01  0.02  0.01 -0.05 -0.03 -0.10
## X226406_at     0.25  0.02  0.14  0.13  0.10 -0.09  0.04  0.14 -0.12 -0.01  0.10
## X227623_at    -0.04 -0.21  0.05  0.18 -0.17  0.16  0.07 -0.21  0.02 -0.06  0.01
## X228041_at     0.00 -0.05 -0.01  0.20 -0.04 -0.21 -0.16  0.01 -0.09  0.05 -0.06
## X230228_at    -0.13  0.10  0.21  0.03  0.02  0.24  0.17 -0.06  0.07  0.01 -0.12
## X232968_at    -0.21  0.04  0.15  0.13 -0.13 -0.01  0.02 -0.03 -0.04 -0.02  0.10
## X234774_at     0.01 -0.02  0.18 -0.02  0.04 -0.12 -0.09  0.13  0.02  0.06  0.07
## X239537_at    -0.05 -0.03 -0.04  0.10  0.05  0.09  0.10 -0.08  0.14  0.04 -0.01
## X239735_at     0.05 -0.01 -0.02  0.16 -0.13  0.00  0.09  0.00 -0.05 -0.07 -0.04
## X241255_at    -0.03  0.04 -0.05  0.00  0.06 -0.02  0.03  0.00  0.03  0.05  0.04
## X241366_at     0.23  0.01 -0.07 -0.05  0.08 -0.07  0.13 -0.02  0.02 -0.21  0.05
## X243537_at    -0.01  0.11 -0.10 -0.04  0.10  0.00  0.11  0.02  0.03 -0.05  0.00
## X243900_at     0.03  0.05  0.00 -0.06 -0.04  0.00  0.05  0.04  0.05  0.00  0.06
## X117_at        0.05 -0.12  0.03  0.13  0.06  0.11  0.04  0.03 -0.14 -0.15  0.03
## X1552261_at    0.05  0.83  0.03  0.10  0.14  0.03  0.01  0.04  0.03  0.06  0.03
## X1552264_a_at  0.02 -0.10  0.03 -0.02  0.09  0.15 -0.01 -0.06 -0.05  0.10 -0.01
## X1552266_at    0.13  0.07 -0.08  0.08  0.26 -0.14 -0.01 -0.16 -0.04  0.07  0.03
## X1552271_at   -0.08 -0.06  0.06  0.03  0.03  0.01 -0.08  0.08 -0.01 -0.07  0.03
## X1552272_a_at -0.06  0.06 -0.03 -0.04  0.02 -0.04  0.00  0.00  0.01  0.04  0.09
## X1552280_at    0.02  0.15 -0.03 -0.02  0.02 -0.04  0.07  0.07 -0.23  0.03  0.04
## X1552281_at   -0.03 -0.01  0.07  0.06 -0.06  0.13  0.15 -0.07 -0.05 -0.12  0.04
## X1552286_at    0.20  0.04 -0.09  0.04  0.14 -0.11 -0.02  0.11  0.01 -0.02 -0.15
## X1552302_at   -0.01  0.02 -0.07  0.05 -0.01  0.11  0.21  0.10 -0.01 -0.12 -0.09
## X1552303_a_at  0.06  0.05 -0.12  0.00 -0.05  0.11  0.05  0.07 -0.03 -0.09 -0.12
## X1552306_at   -0.05 -0.01  0.04  0.09 -0.10 -0.08 -0.21  0.02  0.04  0.04 -0.27
## X1552311_a_at  0.07 -0.03  0.06 -0.18  0.03 -0.04 -0.04  0.12  0.11  0.01 -0.05
## X1552315_at    0.01  0.10 -0.03  0.04 -0.02  0.01 -0.02  0.09  0.20 -0.01  0.00
## X1552318_at    0.05  0.04 -0.06  0.01 -0.10 -0.04  0.00  0.06  0.16 -0.08 -0.03
## X1552323_s_at -0.01  0.04  0.06 -0.03  0.05  0.05 -0.06  0.01  0.02 -0.10  0.01
## X1552347_at    0.15 -0.07 -0.03  0.07  0.03  0.01  0.03  0.12  0.08  0.08  0.00
## X1552349_a_at  0.01  0.06 -0.01  0.08  0.03 -0.01  0.02 -0.10  0.01  0.04  0.03
## X1552355_s_at  0.07 -0.15 -0.03 -0.13  0.10 -0.07  0.05 -0.05 -0.02  0.02  0.11
## X1552364_s_at  0.04 -0.02  0.05 -0.01  0.00 -0.04  0.03  0.03 -0.12  0.02  0.01
## X1552365_at   -0.01  0.02 -0.07  0.01 -0.03  0.72 -0.10 -0.06  0.06 -0.02  0.07
## X1552377_s_at  0.03  0.15 -0.06  0.05  0.00  0.02  0.06 -0.01  0.04  0.05 -0.02
## X1552386_at    0.00 -0.03  0.10  0.08 -0.12  0.11 -0.11  0.11 -0.12 -0.13 -0.03
## X1552388_at    0.03  0.13  0.48  0.01 -0.12 -0.03  0.03  0.12  0.03  0.00  0.08
## X1552396_at    0.04  0.09 -0.15 -0.02 -0.01  0.02  0.10 -0.03 -0.14 -0.07  0.00
## X1552399_a_at  0.01 -0.03 -0.06 -0.04 -0.03 -0.09 -0.04  0.09 -0.07  0.00  0.06
## X1552402_at   -0.08 -0.01  0.05 -0.11 -0.10 -0.01 -0.06  0.08 -0.05  0.08  0.19
## X1552412_a_at -0.06  0.00  0.10  0.01 -0.11  0.13 -0.07  0.01 -0.05 -0.15  0.10
## X1552426_a_at  0.05  0.11 -0.01  0.76  0.13  0.00 -0.01  0.02  0.02  0.05 -0.06
## X1552439_s_at -0.02  0.13 -0.01  0.11  0.75 -0.02  0.04  0.04 -0.03  0.04 -0.01
## X1552445_a_at  0.00 -0.04  0.03 -0.04 -0.04 -0.04 -0.05  0.04 -0.02  0.07  0.01
## X1552450_a_at  0.11  0.08  0.14  0.04 -0.04  0.06  0.22 -0.16 -0.02 -0.03  0.05
## X1552453_a_at  0.11  0.05  0.04 -0.10 -0.03  0.09 -0.10  0.04  0.01 -0.07  0.67
## X1552472_a_at  0.06  0.01 -0.15  0.12 -0.04 -0.14 -0.19  0.08 -0.06  0.11 -0.10
## X1552482_at   -0.02  0.06 -0.11  0.23  0.06  0.08  0.02 -0.19  0.02 -0.03 -0.05
## X1552486_s_at  0.15 -0.17  0.08  0.04  0.02 -0.02 -0.07  0.01 -0.08 -0.12  0.15
## X1552491_at    0.09  0.06  0.20  0.01 -0.01 -0.07  0.16 -0.01  0.00 -0.03 -0.10
## X1552501_a_at  0.04 -0.02  0.09  0.08  0.04  0.03  0.12  0.04  0.18  0.02  0.16
## X1552516_a_at  0.10  0.05  0.00 -0.02 -0.01  0.00 -0.06 -0.06 -0.03  0.05 -0.06
## X1552518_s_at  0.08 -0.11 -0.13  0.05  0.18 -0.03 -0.10  0.14  0.02  0.02 -0.04
## X1552523_a_at -0.11  0.08 -0.02 -0.03  0.09 -0.11 -0.04 -0.02  0.09  0.04 -0.09
## X1552528_at    0.11  0.05 -0.01 -0.07 -0.10 -0.02 -0.05 -0.08  0.06  0.06  0.01
## X1552532_a_at -0.01 -0.01 -0.07  0.08  0.02 -0.04  0.00 -0.09 -0.08 -0.02 -0.02
## X1552535_at   -0.11  0.04 -0.03  0.07  0.05 -0.07  0.08 -0.06  0.18  0.00 -0.08
## X1552538_a_at -0.05  0.03 -0.02  0.03 -0.04 -0.11  0.00 -0.01  0.00 -0.01 -0.01
## X1552555_at   -0.11  0.06 -0.05 -0.05 -0.11 -0.13 -0.04  0.05 -0.11  0.08  0.16
## X1552563_a_at  0.12 -0.08 -0.03 -0.02 -0.15 -0.06  0.06 -0.05  0.03  0.06 -0.07
## X1552566_at   -0.21  0.15 -0.10 -0.22  0.00 -0.14 -0.18  0.15 -0.03 -0.14  0.11
## X1552569_a_at  0.00  0.06 -0.04 -0.10  0.02 -0.05 -0.09  0.02  0.02  0.09  0.00
## X1552582_at    0.11  0.01  0.08 -0.01 -0.06  0.16  0.10 -0.05  0.07  0.02  0.15
## X1552585_s_at -0.07 -0.12 -0.03  0.01 -0.01  0.03  0.02 -0.06 -0.01  0.01  0.04
## X1552590_a_at  0.07 -0.10 -0.14  0.06 -0.06 -0.07  0.19  0.14 -0.11  0.04  0.10
## X1552592_at   -0.02 -0.04 -0.04  0.00 -0.03  0.15 -0.01  0.01 -0.05  0.18  0.04
## X1552594_at    0.03  0.07  0.08 -0.04 -0.01  0.00 -0.01  0.09 -0.05 -0.09  0.00
## X1552596_at    0.03  0.09  0.08  0.03  0.08  0.04  0.08  0.08  0.13 -0.08 -0.08
## X1552612_at    0.03  0.05  0.06  0.03  0.04 -0.07  0.06  0.05 -0.05  0.04  0.07
## X1552623_at    0.12  0.06 -0.02  0.02  0.05 -0.10  0.04  0.74 -0.10  0.08  0.04
##                MR32  MR34  MR18  MR30  MR31  MR27  MR23   h2    u2  com
## X1552584_at    0.02 -0.02  0.01  0.08  0.05  0.01 -0.12 0.92 0.079  2.0
## X1556147_at   -0.01 -0.05 -0.11  0.01  0.00 -0.02 -0.01 0.70 0.303  1.7
## X1557029_at    0.08 -0.08 -0.02  0.02  0.08  0.02  0.06 0.89 0.115  3.0
## X1557677_a_at -0.01  0.00 -0.05 -0.01  0.03 -0.04  0.02 0.58 0.418  1.5
## X1559732_at    0.02  0.04  0.03  0.04  0.01  0.00  0.00 0.67 0.329  1.4
## X1559789_a_at -0.05  0.03  0.06 -0.02 -0.09  0.02 -0.02 0.74 0.256  1.5
## X1564573_at    0.01  0.07 -0.05 -0.09 -0.02  0.01 -0.04 0.75 0.253  3.1
## X1566879_at    0.09 -0.01  0.09  0.10  0.05  0.02 -0.15 0.74 0.261  3.2
## X1569983_at   -0.01  0.05  0.04  0.06 -0.08  0.01  0.03 0.76 0.240  2.2
## X201121_s_at   0.05 -0.61 -0.09  0.04  0.01 -0.02  0.00 0.69 0.315  3.3
## X202419_at     0.25  0.09  0.07  0.04 -0.02  0.01 -0.04 0.80 0.196 13.5
## X202425_x_at   0.02  0.06 -0.08 -0.05 -0.01  0.02 -0.01 0.86 0.142  1.3
## X202867_s_at   0.08  0.03 -0.11 -0.04 -0.04 -0.02  0.07 0.63 0.369  2.2
## X203006_at     0.01  0.03  0.03 -0.03  0.01  0.03  0.00 0.69 0.312  1.5
## X203761_at    -0.04 -0.03  0.00  0.04  0.04 -0.01 -0.04 0.94 0.062  2.0
## X204389_at    -0.05 -0.03  0.07  0.01 -0.06  0.07  0.06 0.77 0.233  1.9
## X205915_x_at   0.08 -0.01  0.12 -0.04 -0.02  0.15  0.02 0.74 0.259  3.7
## X206024_at     0.26 -0.12  0.07  0.01 -0.02  0.10 -0.08 0.46 0.542 11.6
## X206396_at     0.00 -0.10 -0.05  0.00 -0.07 -0.02  0.05 0.66 0.338  2.2
## X210516_at     0.02 -0.02 -0.05 -0.04 -0.03 -0.05 -0.01 0.70 0.301  1.7
## X212901_s_at   0.25 -0.05  0.23 -0.18  0.05 -0.07  0.13 0.73 0.274 12.1
## X214894_x_at  -0.19  0.07  0.05  0.08  0.08  0.02  0.08 0.78 0.224  5.5
## X214936_at     0.12  0.11 -0.08  0.06 -0.09 -0.04  0.07 0.77 0.228  4.7
## X215282_at     0.06  0.15  0.11  0.17  0.15 -0.01  0.10 0.71 0.294  9.7
## X216097_at     0.02  0.07  0.05  0.03  0.06  0.00  0.00 0.67 0.334  1.4
## X217272_s_at   0.05  0.03 -0.09 -0.05  0.10 -0.06 -0.04 0.73 0.267  1.5
## X220663_at     0.05  0.05 -0.03  0.03  0.04 -0.02 -0.02 0.36 0.638  1.7
## X220965_s_at  -0.05 -0.10 -0.04 -0.15  0.10  0.07 -0.09 0.68 0.316  4.0
## X222027_at    -0.01 -0.07  0.07 -0.13 -0.21 -0.01  0.03 0.84 0.157 10.0
## X222184_at    -0.01  0.01 -0.02 -0.05 -0.05  0.04 -0.10 0.70 0.303  2.7
## X222371_at    -0.16 -0.11  0.05 -0.04 -0.05 -0.01  0.03 0.93 0.074  6.6
## X226406_at     0.12 -0.12  0.04 -0.10  0.08 -0.19  0.09 0.84 0.157  8.1
## X227623_at    -0.01  0.19 -0.04  0.06  0.07  0.10 -0.05 0.83 0.167 15.3
## X228041_at    -0.04  0.09  0.16  0.02 -0.04 -0.01 -0.01 0.73 0.275  8.2
## X230228_at    -0.05  0.27 -0.09 -0.02 -0.03 -0.05 -0.09 0.69 0.308  5.7
## X232968_at     0.02  0.05  0.02 -0.01 -0.05 -0.16  0.03 0.83 0.174  8.4
## X234774_at     0.04  0.00  0.04  0.08  0.10  0.04 -0.31 0.83 0.173  3.6
## X239537_at     0.00 -0.02 -0.01 -0.02  0.03  0.07  0.02 0.56 0.436  1.7
## X239735_at    -0.44  0.12 -0.15 -0.05 -0.08 -0.03  0.04 0.87 0.130  9.3
## X241255_at     0.03  0.04  0.75 -0.01 -0.04  0.00  0.00 0.68 0.319  1.4
## X241366_at     0.08  0.04 -0.01  0.02 -0.05 -0.03  0.08 0.86 0.145  5.0
## X243537_at    -0.16  0.11 -0.12  0.00 -0.08 -0.03  0.02 0.79 0.214  4.7
## X243900_at    -0.01  0.04 -0.02 -0.02  0.03  0.03  0.01 0.88 0.119  1.6
## X117_at       -0.03 -0.05  0.02 -0.14  0.05  0.12 -0.05 0.70 0.302  6.9
## X1552261_at   -0.01  0.00  0.04  0.08  0.04 -0.06  0.01 0.80 0.197  1.3
## X1552264_a_at  0.03 -0.11  0.00  0.02  0.02  0.00 -0.06 0.90 0.099  2.0
## X1552266_at    0.06 -0.02 -0.05 -0.06 -0.02 -0.05  0.13 0.59 0.413  4.7
## X1552271_at   -0.02  0.01  0.07  0.03  0.02  0.03 -0.10 0.77 0.234  3.2
## X1552272_a_at  0.01  0.03  0.09  0.01  0.03  0.00  0.01 0.94 0.064  1.3
## X1552280_at    0.05 -0.17  0.03  0.17 -0.03  0.09  0.29 0.67 0.334  6.0
## X1552281_at   -0.05  0.07 -0.03  0.05  0.08  0.03 -0.03 0.85 0.145  1.6
## X1552286_at    0.17 -0.20  0.08  0.06  0.01 -0.04  0.05 0.71 0.293 11.4
## X1552302_at   -0.07  0.09 -0.13 -0.02  0.01  0.03  0.07 0.93 0.075  4.8
## X1552303_a_at -0.01  0.08 -0.11  0.00  0.06  0.06 -0.05 0.90 0.096  4.1
## X1552306_at    0.14  0.01 -0.07 -0.18  0.08  0.17 -0.04 0.77 0.228 12.8
## X1552311_a_at -0.06  0.13 -0.06  0.01 -0.12  0.01 -0.02 0.74 0.256  2.8
## X1552315_at    0.01  0.08  0.01  0.03 -0.07 -0.02  0.02 0.92 0.083  1.4
## X1552318_at   -0.03  0.06  0.00  0.00 -0.09 -0.04  0.08 0.91 0.092  1.5
## X1552323_s_at  0.01 -0.05  0.01  0.05  0.05  0.05 -0.01 0.84 0.157  1.4
## X1552347_at   -0.12  0.04  0.10  0.05 -0.22  0.03  0.21 0.74 0.255  4.2
## X1552349_a_at -0.10  0.04  0.01  0.04 -0.07  0.00  0.05 0.83 0.168  1.6
## X1552355_s_at  0.04  0.02  0.10  0.08  0.14 -0.05  0.00 0.79 0.214  3.2
## X1552364_s_at -0.01  0.01  0.07 -0.02  0.02 -0.02  0.00 0.72 0.280  1.8
## X1552365_at    0.01 -0.01 -0.01  0.04 -0.03  0.02  0.01 0.65 0.348  1.6
## X1552377_s_at  0.04 -0.07 -0.02  0.57 -0.08  0.11  0.01 0.62 0.384  3.1
## X1552386_at    0.07  0.11 -0.01  0.28  0.03 -0.17  0.05 0.79 0.213  6.5
## X1552388_at   -0.01 -0.02 -0.07  0.10 -0.04 -0.07  0.06 0.63 0.366  5.7
## X1552396_at   -0.03 -0.06 -0.04  0.02 -0.14  0.07  0.00 0.58 0.419  7.7
## X1552399_a_at -0.28 -0.03 -0.24  0.00 -0.20 -0.08 -0.01 0.71 0.288 10.2
## X1552402_at    0.13 -0.13 -0.03 -0.03  0.07 -0.05  0.06 0.73 0.268  4.5
## X1552412_a_at  0.03  0.23 -0.07  0.12  0.05  0.17  0.00 0.55 0.448  6.6
## X1552426_a_at -0.01 -0.04 -0.01  0.03 -0.07  0.00  0.01 0.73 0.270  1.6
## X1552439_s_at  0.02 -0.03  0.06 -0.02  0.01  0.00 -0.01 0.66 0.339  1.4
## X1552445_a_at  0.11 -0.11 -0.03 -0.02  0.03  0.00 -0.05 0.85 0.147  1.5
## X1552450_a_at  0.10  0.06  0.04  0.14  0.04  0.18  0.09 0.79 0.208  2.6
## X1552453_a_at  0.02  0.07  0.05  0.00  0.00  0.03  0.01 0.74 0.257  2.6
## X1552472_a_at  0.16 -0.07 -0.04 -0.18  0.16 -0.02  0.07 0.87 0.128  3.4
## X1552482_at   -0.10  0.09 -0.12 -0.05 -0.06  0.01  0.11 0.79 0.211 16.7
## X1552486_s_at  0.15 -0.02 -0.11  0.03  0.11 -0.04 -0.17 0.81 0.188  6.7
## X1552491_at    0.07 -0.12 -0.08  0.05  0.03  0.03  0.14 0.68 0.317  5.0
## X1552501_a_at -0.05  0.03  0.04 -0.03  0.13  0.07  0.33 0.80 0.196  3.0
## X1552516_a_at -0.11 -0.06 -0.15  0.10 -0.18  0.06  0.07 0.84 0.163  3.2
## X1552518_s_at  0.33 -0.18  0.00  0.01 -0.03 -0.03  0.11 0.75 0.252 10.1
## X1552523_a_at  0.11  0.16  0.01  0.01 -0.16 -0.18  0.03 0.71 0.289  5.3
## X1552528_at   -0.08 -0.04  0.08 -0.04  0.02  0.00  0.03 0.75 0.251  1.6
## X1552532_a_at  0.02 -0.10  0.02 -0.03 -0.02 -0.03 -0.01 0.72 0.276  1.7
## X1552535_at    0.00  0.07  0.09  0.10  0.05 -0.12 -0.02 0.77 0.233  2.3
## X1552538_a_at  0.02  0.04  0.00  0.02  0.02 -0.01  0.01 0.58 0.424  1.4
## X1552555_at    0.12 -0.10  0.04  0.03  0.05  0.09 -0.08 0.86 0.138  1.9
## X1552563_a_at -0.05 -0.10  0.05  0.12 -0.01 -0.02  0.03 0.66 0.340  8.4
## X1552566_at    0.00 -0.03  0.05  0.11  0.09 -0.04 -0.22 0.67 0.333 15.1
## X1552569_a_at  0.02 -0.01 -0.06 -0.04  0.58  0.03  0.00 0.59 0.413  2.7
## X1552582_at   -0.10  0.10 -0.01  0.31  0.14 -0.11 -0.06 0.45 0.549 12.5
## X1552585_s_at  0.02  0.02  0.01  0.06  0.04  0.69  0.02 0.85 0.154  2.4
## X1552590_a_at  0.09 -0.01 -0.03 -0.26  0.16  0.08 -0.12 0.81 0.189  6.4
## X1552592_at    0.00  0.16 -0.01  0.01 -0.06 -0.04  0.03 0.60 0.404  4.4
## X1552594_at    0.01  0.05  0.05  0.07  0.04  0.01 -0.02 0.81 0.195  1.8
## X1552596_at   -0.22 -0.10  0.01 -0.04 -0.11  0.02  0.05 0.76 0.242  3.1
## X1552612_at   -0.19  0.03  0.00  0.01 -0.10 -0.10  0.07 0.87 0.127  1.8
## X1552623_at   -0.01  0.11 -0.01 -0.02  0.02 -0.05  0.00 0.88 0.123  2.4
## 
##                         MR1  MR3  MR2 MR15 MR10  MR7 MR14  MR8 MR20 MR33 MR36
## SS loadings           10.82 6.50 6.00 2.75 2.25 2.05 1.97 1.90 1.89 1.69 1.57
## Proportion Var         0.11 0.07 0.06 0.03 0.02 0.02 0.02 0.02 0.02 0.02 0.02
## Cumulative Var         0.11 0.17 0.23 0.26 0.28 0.30 0.32 0.34 0.36 0.38 0.39
## Proportion Explained   0.14 0.09 0.08 0.04 0.03 0.03 0.03 0.03 0.03 0.02 0.02
## Cumulative Proportion  0.14 0.23 0.31 0.35 0.38 0.41 0.43 0.46 0.48 0.51 0.53
##                       MR12  MR5 MR16  MR9 MR28 MR21  MR4 MR39 MR29 MR24 MR13
## SS loadings           1.51 1.45 1.42 1.40 1.39 1.37 1.36 1.36 1.32 1.30 1.30
## Proportion Var        0.02 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01
## Cumulative Var        0.41 0.42 0.44 0.45 0.47 0.48 0.49 0.51 0.52 0.53 0.55
## Proportion Explained  0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02
## Cumulative Proportion 0.55 0.57 0.58 0.60 0.62 0.64 0.66 0.68 0.69 0.71 0.73
##                       MR22 MR17 MR40 MR38 MR11 MR37 MR26  MR6 MR35 MR19 MR25
## SS loadings           1.30 1.27 1.26 1.25 1.23 1.22 1.21 1.18 1.17 1.16 1.12
## Proportion Var        0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01
## Cumulative Var        0.56 0.57 0.58 0.60 0.61 0.62 0.63 0.64 0.66 0.67 0.68
## Proportion Explained  0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.01
## Cumulative Proportion 0.75 0.76 0.78 0.80 0.81 0.83 0.85 0.86 0.88 0.89 0.91
##                       MR32 MR34 MR18 MR30 MR31 MR27 MR23
## SS loadings           1.12 1.11 1.08 1.02 0.95 0.92 0.74
## Proportion Var        0.01 0.01 0.01 0.01 0.01 0.01 0.01
## Cumulative Var        0.69 0.70 0.71 0.72 0.73 0.74 0.75
## Proportion Explained  0.01 0.01 0.01 0.01 0.01 0.01 0.01
## Cumulative Proportion 0.92 0.94 0.95 0.97 0.98 0.99 1.00
## 
## Mean item complexity =  4.5
## Test of the hypothesis that 40 factors are sufficient.
## 
## df null model =  4950  with the objective function =  90.69 with Chi Square =  10504.89
## df of  the model are 1730  and the objective function was  19.16 
## 
## The root mean square of the residuals (RMSR) is  0.01 
## The df corrected root mean square of the residuals is  0.02 
## 
## The harmonic n.obs is  151 with the empirical chi square  253.3  with prob <  1 
## The total n.obs was  151  with Likelihood Chi Square =  1708.48  with prob <  0.64 
## 
## Tucker Lewis Index of factoring reliability =  1.02
## RMSEA index =  0  and the 90 % confidence intervals are  0 0.017
## BIC =  -6971.41
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    MR1  MR3  MR2 MR15 MR10  MR7
## Correlation of (regression) scores with factors   0.98 0.99 0.98 0.93 0.93 0.93
## Multiple R square of scores with factors          0.96 0.97 0.96 0.87 0.86 0.86
## Minimum correlation of possible factor scores     0.92 0.95 0.91 0.75 0.71 0.72
##                                                   MR14  MR8 MR20 MR33 MR36 MR12
## Correlation of (regression) scores with factors   0.95 0.95 0.94 0.91 0.93 0.98
## Multiple R square of scores with factors          0.91 0.90 0.89 0.83 0.86 0.96
## Minimum correlation of possible factor scores     0.82 0.79 0.78 0.67 0.71 0.91
##                                                    MR5 MR16  MR9 MR28 MR21  MR4
## Correlation of (regression) scores with factors   0.90 0.93 0.91 0.93 0.90 0.88
## Multiple R square of scores with factors          0.81 0.86 0.82 0.86 0.80 0.78
## Minimum correlation of possible factor scores     0.63 0.71 0.65 0.72 0.61 0.56
##                                                   MR39 MR29 MR24 MR13 MR22 MR17
## Correlation of (regression) scores with factors   0.89 0.89 0.89 0.93 0.90 0.91
## Multiple R square of scores with factors          0.79 0.80 0.79 0.86 0.80 0.83
## Minimum correlation of possible factor scores     0.58 0.60 0.58 0.73 0.61 0.66
##                                                   MR40 MR38 MR11 MR37 MR26  MR6
## Correlation of (regression) scores with factors   0.89 0.89 0.89 0.90 0.89 0.92
## Multiple R square of scores with factors          0.79 0.80 0.78 0.80 0.80 0.84
## Minimum correlation of possible factor scores     0.58 0.60 0.57 0.61 0.60 0.68
##                                                   MR35 MR19 MR25 MR32 MR34 MR18
## Correlation of (regression) scores with factors   0.88 0.89 0.88 0.90 0.87 0.87
## Multiple R square of scores with factors          0.78 0.80 0.78 0.82 0.75 0.77
## Minimum correlation of possible factor scores     0.57 0.59 0.56 0.63 0.50 0.53
##                                                   MR30 MR31 MR27 MR23
## Correlation of (regression) scores with factors   0.88 0.89 0.90 0.88
## Multiple R square of scores with factors          0.77 0.79 0.81 0.77
## Minimum correlation of possible factor scores     0.55 0.59 0.63 0.54

The gap between the eigenvalue-based factor selection and the cumulative variance reported by Factor Analysis (FA) is expected and follows from how FA handles variance. Eigenvalue-based selection treats all variance as shared, so every unit of variance is attributed to some component. FA instead separates out each gene's unique variance (the u2 column) and models only the shared part (the communality, h2), which filters out noise but necessarily gives a lower cumulative variance. Factor rotation then redistributes the explained variance among the factors; this changes each factor's share but not the total. This is why the 40 retained factors explain only 75% of the variance rather than the 85% expected from the cumulative variance proportion. Although this is a smaller proportion of variance, it is still acceptable.
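The 75% figure can be checked by hand from the printed output: each factor's proportion of variance is its SS loading divided by the number of variables. The sketch below copies the SS loadings from the table above and assumes that 100 variables entered the analysis, which is the count implied by the Proportion Var row; the object names are illustrative only.

```r
# SS loadings copied from the factor analysis output above,
# in printed order (MR1, MR3, MR2, ..., MR23)
ss_loadings <- c(
  10.82, 6.50, 6.00, 2.75, 2.25, 2.05, 1.97, 1.90, 1.89, 1.69, 1.57,
  1.51, 1.45, 1.42, 1.40, 1.39, 1.37, 1.36, 1.36, 1.32, 1.30, 1.30,
  1.30, 1.27, 1.26, 1.25, 1.23, 1.22, 1.21, 1.18, 1.17, 1.16, 1.12,
  1.12, 1.11, 1.08, 1.02, 0.95, 0.92, 0.74
)

# Assumption: 100 standardized variables, so total variance is 100
n_variables <- 100

# Per-factor and running share of total variance
proportion_var <- ss_loadings / n_variables
cumulative_var <- cumsum(proportion_var)

# Share of total variance reproduced by all 40 factors together
total_explained <- cumulative_var[length(cumulative_var)]
print(round(total_explained, 2))
```

The remaining share is the unique variance that FA deliberately leaves outside the factors, which PCA would instead fold into its components.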

Heatmap of factor loadings

heatmap.2(abs(fa_result$loadings), trace = "none", density.info = "none",
          col = colorRampPalette(c("white", "blue"))(100), main = "Factor Loadings Heatmap")

This heatmap visualizes the factor loadings from Factor Analysis, showing the strength of association between genes (rows) and factors (columns). Because absolute values are plotted, darker blue shades indicate stronger loadings regardless of sign, meaning that those genes contribute substantially to the corresponding factors. Some factors, like MR1 and MR3, have multiple genes with high loadings, suggesting they capture major variance patterns, while others, like MR9 or MR8, have fewer strong associations, indicating more specific effects. The dendrogram on the left groups genes with similar loading profiles, revealing potential co-regulation.
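The strong cells of such a heatmap can also be listed as a table. The sketch below uses a small excerpt of the loadings printed above (three genes, five factors) and an illustrative cutoff of 0.5 on the absolute loading; the cutoff and the object names are assumptions, not part of the original analysis.

```r
# Excerpt of the printed loadings (rows: genes, columns: factors)
loadings_excerpt <- matrix(
  c( 0.05, 0.83,  0.03, 0.10, 0.14,
     0.05, 0.11, -0.01, 0.76, 0.13,
    -0.02, 0.13, -0.01, 0.11, 0.75),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("X1552261_at", "X1552426_a_at", "X1552439_s_at"),
    c("MR22", "MR17", "MR40", "MR38", "MR11")
  )
)

# Illustrative threshold for calling a loading "strong"
cutoff <- 0.5

# Row/column positions of all cells whose absolute loading exceeds the cutoff
hits <- which(abs(loadings_excerpt) > cutoff, arr.ind = TRUE)

strong_loadings <- data.frame(
  gene    = rownames(loadings_excerpt)[hits[, "row"]],
  factor  = colnames(loadings_excerpt)[hits[, "col"]],
  loading = loadings_excerpt[hits],
  row.names = NULL
)
print(strong_loadings)
```

Applied to the full loading matrix, the same pattern would give the gene-to-factor pairs behind the darkest cells.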

4.5 Principal Component Analysis (PCA)

PCA is a dimension reduction technique that summarizes the dataset by identifying the directions (principal components) that capture the maximum variance in the data. Unlike FA, PCA does not assume an underlying latent structure; instead, it prioritizes variance maximization to retain as much information as possible while reducing the number of dimensions.
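To make the variance-maximization idea concrete, the sketch below checks on the built-in iris measurements (used here only as a small stand-in dataset) that the squared standard deviations returned by prcomp() on scaled data are the eigenvalues of the correlation matrix. The object names are illustrative.

```r
# Small stand-in dataset: four numeric measurements from iris
X <- as.matrix(iris[, 1:4])

# PCA on centered and scaled variables
pca_demo <- prcomp(X, center = TRUE, scale. = TRUE)

# Eigendecomposition of the correlation matrix of the same variables
eig <- eigen(cor(X), symmetric = TRUE)

# Component variances from prcomp() should match the eigenvalues
print(all.equal(pca_demo$sdev^2, eig$values))

# Each eigenvalue divided by their sum is that component's share of variance
variance_share <- eig$values / sum(eig$values)
print(round(variance_share, 3))
```

Because the variables are standardized, the eigenvalues sum to the number of variables, which is why each share can be read as a proportion of total variance.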

To evaluate the effectiveness of PCA on this dataset, I will compute the principal components, assess the proportion of variance explained by each component, and visualize the results.

pca <- prcomp(top_genes, center = TRUE, scale. = TRUE)
summary_pca <- summary(pca)
print(summary_pca)
## Importance of components:
##                           PC1     PC2    PC3     PC4     PC5     PC6     PC7
## Standard deviation     3.8136 3.11958 2.2091 2.14389 1.81855 1.71076 1.65446
## Proportion of Variance 0.1454 0.09732 0.0488 0.04596 0.03307 0.02927 0.02737
## Cumulative Proportion  0.1454 0.24275 0.2915 0.33752 0.37059 0.39986 0.42723
##                            PC8     PC9    PC10    PC11    PC12    PC13    PC14
## Standard deviation     1.62575 1.54996 1.45410 1.42356 1.40158 1.35146 1.32695
## Proportion of Variance 0.02643 0.02402 0.02114 0.02027 0.01964 0.01826 0.01761
## Cumulative Proportion  0.45366 0.47768 0.49883 0.51909 0.53874 0.55700 0.57461
##                           PC15   PC16    PC17   PC18    PC19    PC20   PC21
## Standard deviation     1.29029 1.2805 1.25668 1.2124 1.18183 1.14234 1.1315
## Proportion of Variance 0.01665 0.0164 0.01579 0.0147 0.01397 0.01305 0.0128
## Cumulative Proportion  0.59126 0.6077 0.62345 0.6381 0.65211 0.66516 0.6780
##                           PC22    PC23    PC24    PC25    PC26    PC27    PC28
## Standard deviation     1.11529 1.09261 1.07375 1.05164 1.03250 1.02181 1.01157
## Proportion of Variance 0.01244 0.01194 0.01153 0.01106 0.01066 0.01044 0.01023
## Cumulative Proportion  0.69040 0.70234 0.71387 0.72493 0.73559 0.74603 0.75626
##                           PC29    PC30    PC31    PC32    PC33    PC34   PC35
## Standard deviation     0.97544 0.96563 0.94737 0.92265 0.91362 0.90791 0.9000
## Proportion of Variance 0.00951 0.00932 0.00898 0.00851 0.00835 0.00824 0.0081
## Cumulative Proportion  0.76578 0.77510 0.78408 0.79259 0.80094 0.80918 0.8173
##                           PC36    PC37    PC38    PC39    PC40    PC41    PC42
## Standard deviation     0.89315 0.86204 0.84611 0.83490 0.82783 0.82129 0.79683
## Proportion of Variance 0.00798 0.00743 0.00716 0.00697 0.00685 0.00675 0.00635
## Cumulative Proportion  0.82526 0.83269 0.83985 0.84682 0.85367 0.86042 0.86677
##                           PC43    PC44    PC45    PC46   PC47    PC48    PC49
## Standard deviation     0.78185 0.77389 0.76734 0.74777 0.7413 0.70093 0.68479
## Proportion of Variance 0.00611 0.00599 0.00589 0.00559 0.0055 0.00491 0.00469
## Cumulative Proportion  0.87288 0.87887 0.88476 0.89035 0.8958 0.90076 0.90545
##                           PC50    PC51    PC52    PC53    PC54    PC55    PC56
## Standard deviation     0.66786 0.66260 0.64290 0.63831 0.63047 0.62325 0.61537
## Proportion of Variance 0.00446 0.00439 0.00413 0.00407 0.00397 0.00388 0.00379
## Cumulative Proportion  0.90991 0.91430 0.91843 0.92250 0.92648 0.93036 0.93415
##                           PC57    PC58    PC59    PC60    PC61    PC62    PC63
## Standard deviation     0.60421 0.58426 0.57830 0.56917 0.55554 0.54703 0.52700
## Proportion of Variance 0.00365 0.00341 0.00334 0.00324 0.00309 0.00299 0.00278
## Cumulative Proportion  0.93780 0.94121 0.94456 0.94780 0.95088 0.95388 0.95665
##                           PC64   PC65    PC66    PC67    PC68    PC69    PC70
## Standard deviation     0.51677 0.5095 0.49210 0.48516 0.47311 0.46544 0.44166
## Proportion of Variance 0.00267 0.0026 0.00242 0.00235 0.00224 0.00217 0.00195
## Cumulative Proportion  0.95932 0.9619 0.96434 0.96670 0.96893 0.97110 0.97305
##                           PC71    PC72    PC73    PC74    PC75    PC76    PC77
## Standard deviation     0.43383 0.42894 0.41921 0.41383 0.39757 0.37832 0.37570
## Proportion of Variance 0.00188 0.00184 0.00176 0.00171 0.00158 0.00143 0.00141
## Cumulative Proportion  0.97493 0.97677 0.97853 0.98024 0.98182 0.98326 0.98467
##                           PC78    PC79    PC80    PC81    PC82    PC83    PC84
## Standard deviation     0.35349 0.34737 0.33294 0.32986 0.31939 0.31135 0.30299
## Proportion of Variance 0.00125 0.00121 0.00111 0.00109 0.00102 0.00097 0.00092
## Cumulative Proportion  0.98592 0.98712 0.98823 0.98932 0.99034 0.99131 0.99223
##                           PC85    PC86    PC87    PC88    PC89    PC90    PC91
## Standard deviation     0.29256 0.27833 0.26872 0.26597 0.25826 0.25343 0.22121
## Proportion of Variance 0.00086 0.00077 0.00072 0.00071 0.00067 0.00064 0.00049
## Cumulative Proportion  0.99308 0.99386 0.99458 0.99529 0.99595 0.99660 0.99709
##                           PC92    PC93   PC94    PC95    PC96    PC97    PC98
## Standard deviation     0.21644 0.20685 0.2010 0.19797 0.19323 0.18539 0.14658
## Proportion of Variance 0.00047 0.00043 0.0004 0.00039 0.00037 0.00034 0.00021
## Cumulative Proportion  0.99755 0.99798 0.9984 0.99878 0.99915 0.99950 0.99971
##                           PC99   PC100
## Standard deviation     0.12431 0.11635
## Proportion of Variance 0.00015 0.00014
## Cumulative Proportion  0.99986 1.00000

Unlike FA, where 40 factors were needed to reach 75% of the variance, PCA requires only 28 components to explain the same share. However, to retain as much variance as possible while adhering to the Half-Variable Rule, I am selecting 48 components, which account for 90% of the variance.
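The selection logic can be written as a small helper. This is only a sketch: it reads the Half-Variable Rule as "keep at most half as many components as there are original variables", which is an assumption consistent with choosing 48 out of 100, and the function name select_components and its arguments are hypothetical.

```r
# Pick the smallest number of components reaching a target share of variance,
# capped at a fraction of the number of variables (assumed reading of the rule)
select_components <- function(sdev, target = 0.90, max_frac = 0.5) {
  cum_var  <- cumsum(sdev^2) / sum(sdev^2)
  k_target <- which(cum_var >= target)[1]       # first component reaching the target
  k_cap    <- floor(length(sdev) * max_frac)    # upper limit imposed by the cap
  min(k_target, k_cap)
}

# Toy component variances (made up for illustration): 5, 2, 1, 1, 0.5, 0.5
toy_sdev <- sqrt(c(5, 2, 1, 1, 0.5, 0.5))

k_capped   <- select_components(toy_sdev, target = 0.85, max_frac = 0.5)
k_uncapped <- select_components(toy_sdev, target = 0.85, max_frac = 1)
print(c(capped = k_capped, uncapped = k_uncapped))
```

In the toy example the cap binds, so fewer components are kept than the variance target alone would require; in the analysis here the 90% target is reached at 48 components, below the cap of 50.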

Plotting cumulative variance explained:

explained_variance <- pca$sdev^2 / sum(pca$sdev^2)

explained_variance_df <- data.frame(
  Components = 1:length(explained_variance),
  CumulativeVariance = cumsum(explained_variance)
)

ggplot(explained_variance_df, aes(x = Components, y = CumulativeVariance)) +
  geom_line(color = "blue") +
  geom_point(color = "blue") +
  labs(
    title = "Cumulative variance explained",
    x = "Number of components",
    y = "Cumulative variance"
  ) +
  theme_minimal()

The curve follows a typical elbow shape: the first few components contribute most of the total variance, while additional components yield diminishing returns. As more components are included, the cumulative variance continues to increase, but at a slower rate.

Determining number of components explaining 90% variance:

num_components <- which(cumsum(explained_variance) >= 0.90)[1]
cat("Number of components explaining 90% variance:", num_components)
## Number of components explaining 90% variance: 48

Reduced data:

reduced_data <- as.data.frame(pca$x[, 1:num_components])
str(reduced_data)
## 'data.frame':    151 obs. of  48 variables:
##  $ PC1 : num  -2.834 -4.308 0.498 -3.394 -5.013 ...
##  $ PC2 : num  -4.34 -3.18 -6.6 -1.71 -1.96 ...
##  $ PC3 : num  0.0219 -0.6121 -1.2402 1.9773 0.8616 ...
##  $ PC4 : num  -1.1022 1.6411 -0.4932 -1.6938 0.0384 ...
##  $ PC5 : num  2.27 2.46 1.19 3.09 1.58 ...
##  $ PC6 : num  -0.716 -2.659 -1.07 -3.017 -1.283 ...
##  $ PC7 : num  0.722 -1.045 0.688 0.958 -0.334 ...
##  $ PC8 : num  1.0938 4.7261 4.2608 0.6408 0.0775 ...
##  $ PC9 : num  -1.41754 3.85401 -1.58819 -0.00362 -1.3923 ...
##  $ PC10: num  1.308 -3.515 1.075 0.38 0.283 ...
##  $ PC11: num  0.349 -5.239 -0.665 1.751 0.746 ...
##  $ PC12: num  -0.763 0.343 -0.878 -1.39 -1.388 ...
##  $ PC13: num  -0.569 -0.882 0.735 -0.739 -1.84 ...
##  $ PC14: num  0.625 -2.305 -1.523 2.188 -0.552 ...
##  $ PC15: num  2.06 3.364 0.915 0.553 0.439 ...
##  $ PC16: num  -0.278 -1.7 1.142 -0.76 -0.688 ...
##  $ PC17: num  -0.437 1.871 1.514 0.202 0.423 ...
##  $ PC18: num  1.053 1.299 0.08 1.629 0.662 ...
##  $ PC19: num  -0.371 1.312 1.481 1.355 1.3 ...
##  $ PC20: num  -0.6643 2.464 -0.0711 0.382 -0.0816 ...
##  $ PC21: num  -1.706 2.74 0.261 0.829 -1.513 ...
##  $ PC22: num  -1.014 -1.0477 0.0884 0.1146 1.7839 ...
##  $ PC23: num  1.6 -3.97 1.42 1.35 0.29 ...
##  $ PC24: num  -1.347 0.225 0.741 -0.703 0.728 ...
##  $ PC25: num  0.385 1.006 -0.49 -0.789 0.777 ...
##  $ PC26: num  0.549 -2.196 -0.828 -0.174 0.063 ...
##  $ PC27: num  0.516 -0.301 1.5 -0.6 0.727 ...
##  $ PC28: num  0.6206 0.5135 0.4235 -0.0797 0.3555 ...
##  $ PC29: num  -1.6192 -2.3022 -0.0852 0.253 -0.722 ...
##  $ PC30: num  -0.00934 -0.10416 0.48702 0.3083 -0.18809 ...
##  $ PC31: num  1.21 0.683 -1.014 1.127 -0.503 ...
##  $ PC32: num  -0.251 -2.833 -0.864 -0.23 0.694 ...
##  $ PC33: num  1.74 0.757 0.29 -0.195 0.284 ...
##  $ PC34: num  -0.588 -1.519 0.632 0.102 -0.836 ...
##  $ PC35: num  -0.6054 1.5401 -0.1405 0.7388 0.0691 ...
##  $ PC36: num  0.4996 -0.2809 -0.0372 1.3635 -0.1224 ...
##  $ PC37: num  -0.713 -1.264 1.146 -0.166 -1.312 ...
##  $ PC38: num  0.0705 0.0302 0.7203 -0.1795 0.7014 ...
##  $ PC39: num  -0.0431 1.6706 -1.3537 0.6622 -0.672 ...
##  $ PC40: num  0.7599 -1.6477 0.8836 0.0191 -0.3291 ...
##  $ PC41: num  1.4576 -0.047 -1.116 0.8824 -0.0564 ...
##  $ PC42: num  0.6 0.947 0.7 -0.549 0.112 ...
##  $ PC43: num  -1.043 -0.109 -0.254 0.629 0.489 ...
##  $ PC44: num  0.283 -0.326 0.21 -0.301 1.615 ...
##  $ PC45: num  0.857 -0.627 -1.349 -0.479 0.223 ...
##  $ PC46: num  0.7458 0.0301 -0.6469 -1.3376 -0.0438 ...
##  $ PC47: num  -0.88 -1.651 0.852 -0.567 -0.569 ...
##  $ PC48: num  -0.9346 -0.0869 0.5325 0.5566 -0.3164 ...

5 Clustering analysis

5.0.1 Hopkins statistic

tendency <- get_clust_tendency(reduced_data, n = nrow(reduced_data) * 0.1)
cat("Hopkins statistic:", tendency$hopkins_stat, "\n")
## Hopkins statistic: 0.6570545

The Hopkins statistic (0.657) measures the clustering tendency of the dataset, that is, how far the data departs from a spatially random distribution. As computed by get_clust_tendency() in recent versions of factoextra, it ranges from 0 to 1:

  • Close to 1: The data is highly clustered, making it well suited to clustering algorithms such as K-Means.

  • Around 0.5: The data is indistinguishable from a uniformly random distribution, so clustering is unlikely to be meaningful.

  • Close to 0: The data is regularly spaced (more evenly spread than random), again with no cluster structure.

With a Hopkins value of 0.657, only modestly above 0.5, the dataset shows a weak clustering tendency.
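To make the statistic concrete, the following is a minimal from-scratch sketch in base R. It assumes the convention H = sum(u) / (sum(u) + sum(w)), where u are nearest-neighbour distances from artificial uniform points to the data and w are nearest-neighbour distances between sampled real points and the rest of the data, so that values near 1 indicate clustering and values near 0.5 indicate random data. The function name hopkins_sketch and the simulated matrices are illustrative assumptions and are not part of the analysis; the result will not match get_clust_tendency() exactly because the random draws differ.

```r
# Hopkins statistic sketch (assumed convention: near 1 = clustered, ~0.5 = random)
hopkins_sketch <- function(x, m, seed = 1) {
  set.seed(seed)
  x <- as.matrix(x)
  n <- nrow(x)
  stopifnot(m >= 1, m < n)

  # m artificial points drawn uniformly within each column's observed range
  unif <- apply(x, 2, function(col) runif(m, min(col), max(col)))
  unif <- matrix(unif, nrow = m)  # keep matrix shape when m == 1

  # m real points sampled without replacement
  idx <- sample.int(n, m)

  # Euclidean distance from point p to its nearest row in pts
  nearest <- function(p, pts) min(sqrt(colSums((t(pts) - p)^2)))

  # u: artificial point to nearest real point
  u <- vapply(seq_len(m), function(i) nearest(unif[i, ], x), numeric(1))
  # w: sampled real point to nearest other real point
  w <- vapply(idx, function(i) nearest(x[i, ], x[-i, , drop = FALSE]), numeric(1))

  sum(u) / (sum(u) + sum(w))
}

# Toy comparison: two tight blobs versus uniform noise
set.seed(42)
clustered <- rbind(matrix(rnorm(200, mean = 0, sd = 0.3), ncol = 2),
                   matrix(rnorm(200, mean = 5, sd = 0.3), ncol = 2))
uniform <- matrix(runif(400), ncol = 2)

h_clustered <- hopkins_sketch(clustered, m = 20)
h_uniform <- hopkins_sketch(uniform, m = 20)
cat("Clustered toy data:", h_clustered, "\n")
cat("Uniform toy data:  ", h_uniform, "\n")
```

On data like this, the clustered value should sit well above the uniform one, which is the contrast the Hopkins statistic is designed to detect.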

5.1 K-means clustering

5.1.1 Elbow method

library(cluster)

wss <- numeric(9)
sil_scores <- numeric(9)

for (k in 2:10) {
  kmeans_model <- kmeans(reduced_data, centers = k, nstart = 25)
  
  wss[k - 1] <- kmeans_model$tot.withinss
  
  silhouette_vals <- silhouette(kmeans_model$cluster, dist(reduced_data))
  sil_scores[k - 1] <- mean(silhouette_vals[, 3])
}

elbow_df <- data.frame(Clusters = 2:10, WSS = wss)

ggplot(elbow_df, aes(x = Clusters, y = WSS)) +
  geom_line(color = "blue") +
  geom_point(color = "red") +
  labs(title = "Elbow method for optimal clusters",
       x = "Number of clusters",
       y = "Within-cluster sum of squares") +
  theme_minimal()

The optimal number of clusters is typically found at the elbow point, beyond which increasing k yields only diminishing reductions in WSS. In this plot, the elbow appears at around k = 2 or 3, suggesting that two or three clusters capture most of the achievable reduction in WSS. Because reading an elbow is subjective, silhouette scores are used below as a further check on this choice.
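One informal way to make the elbow less subjective is to look at the fraction of WSS removed by each additional cluster. The sketch below does this on simulated data with three groups. The toy matrix and the rel_drop name are illustrative assumptions, not part of the analysis above, and the relative drop is a heuristic rather than a formal criterion.

```r
# Simulated data: three well-separated 2D groups of 50 points each
set.seed(1)
toy <- rbind(matrix(rnorm(100, 0, 0.5), ncol = 2),
             matrix(rnorm(100, 4, 0.5), ncol = 2),
             cbind(rnorm(50, 0, 0.5), rnorm(50, 4, 0.5)))

# Total within-cluster sum of squares for k = 1..8
ks <- 1:8
wss <- vapply(ks, function(k) {
  kmeans(toy, centers = k, nstart = 25)$tot.withinss
}, numeric(1))

# Fraction of the previous WSS removed by moving from k - 1 to k clusters
rel_drop <- -diff(wss) / wss[-length(wss)]

elbow_table <- data.frame(k = ks[-1], wss = wss[-1], rel_drop = round(rel_drop, 3))
print(elbow_table)

# The last k that still gives a large relative drop is a candidate elbow
k_candidate <- ks[-1][which.max(rel_drop)]
cat("Largest relative drop at k =", k_candidate, "\n")
```

A sharp fall in rel_drop after a given k corresponds to the visual elbow; when the drops decline gradually, as they often do for noisy expression data, the elbow is ambiguous and other criteria are needed.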

5.1.2 Silhouette score

silhouette_df <- data.frame(Clusters = 2:10, Silhouette = sil_scores)

ggplot(silhouette_df, aes(x = Clusters, y = Silhouette)) +
  geom_line(color = "green") +
  geom_point(color = "red") +
  labs(title = "Silhouette scores for optimal clusters",
       x = "Number of clusters",
       y = "Mean silhouette score") +
  theme_minimal()

optimal_clusters <- which.max(sil_scores) + 1

kmeans_result <- kmeans(reduced_data, centers = optimal_clusters, nstart = 25)

silhouette_kmeans <- silhouette(kmeans_result$cluster, dist(reduced_data))

plot(silhouette_kmeans, main = "Silhouette plot for K-Means clustering", col = kmeans_result$cluster, border = NA)

This silhouette plot evaluates how well data points are clustered when using k = 2 clusters in K-Means. The silhouette width, shown on the x-axis, measures how similar each point is to its assigned cluster relative to the next closest cluster. A higher silhouette width indicates better clustering quality.

The average silhouette width is 0.11, which is very low and suggests that the clustering structure is weak. As a common rule of thumb, averages at or below 0.25 indicate no substantial structure, values between roughly 0.25 and 0.5 indicate weak structure, and values approaching 1 indicate strong, well-separated clusters.

This suggests that K-Means on the principal components does not produce well-defined clusters for this dataset. The low average silhouette width and the many near-zero values imply substantial overlap between clusters. It is therefore worth trying other approaches, such as hierarchical clustering, or clustering after a non-linear embedding such as t-SNE, which may capture the data structure better.
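The silhouette width used above can be written for each point i as s(i) = (b - a) / max(a, b), where a is the mean distance from i to the other members of its own cluster and b is the smallest mean distance from i to the members of any other cluster. The sketch below computes this by hand on simulated data and cross-checks it against cluster::silhouette(). The function name silhouette_manual and the toy data are illustrative assumptions.

```r
library(cluster)  # used only to cross-check the hand-written version

# Silhouette width per observation from cluster labels and a distance object
silhouette_manual <- function(cl, d) {
  d <- as.matrix(d)
  ids <- sort(unique(cl))
  vapply(seq_along(cl), function(i) {
    own <- cl == cl[i]
    if (sum(own) == 1) return(0)  # singleton clusters get width 0 by convention
    # a: mean distance to own cluster, excluding the point itself (d[i, i] is 0)
    a <- sum(d[i, own]) / (sum(own) - 1)
    # b: smallest mean distance to any other cluster
    b <- min(vapply(setdiff(ids, cl[i]),
                    function(k) mean(d[i, cl == k]), numeric(1)))
    (b - a) / max(a, b)
  }, numeric(1))
}

# Toy data: two overlapping 2D groups
set.seed(7)
toy <- rbind(matrix(rnorm(60, 0, 1), ncol = 2),
             matrix(rnorm(60, 3, 1), ncol = 2))
km <- kmeans(toy, centers = 2, nstart = 25)

s_manual <- silhouette_manual(km$cluster, dist(toy))
s_pkg <- as.numeric(silhouette(km$cluster, dist(toy))[, "sil_width"])

cat("Mean silhouette width (manual):", mean(s_manual), "\n")
cat("Agrees with cluster::silhouette:", isTRUE(all.equal(s_manual, s_pkg)), "\n")
```

Points with widths near zero lie roughly midway between two clusters, which is why a plot dominated by near-zero bars signals overlapping groups.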

5.2 Hierarchical Clustering

distance_matrix <- dist(reduced_data)
hc <- hclust(distance_matrix, method = "ward.D2")

plot(hc, labels = FALSE, main = "Hierarchical clustering dendrogram", xlab = "Samples", ylab = "Height")

cluster_labels <- cutree(hc, k = optimal_clusters)
table(cluster_labels)
## cluster_labels
##  1  2 
## 88 63

This dendrogram represents the hierarchical clustering structure of the dataset, where samples are grouped based on their similarity. The y-axis (Height) represents the distance or dissimilarity between clusters, with larger heights indicating more distinct clusters. The x-axis (Samples) represents the individual observations being clustered.

Using Ward's minimum-variance method (ward.D2), which at each step merges the pair of clusters that gives the smallest increase in within-cluster variance, the dendrogram splits into two main branches, consistent with the cut at k = 2. The table below the plot confirms this, with 88 samples assigned to Cluster 1 and 63 to Cluster 2.

The large vertical jump at the top of the dendrogram suggests that these two clusters are relatively well separated, though this needs to be validated with silhouette scores or gap statistics. The structure indicates that hierarchical clustering picks up some pattern in the data, with potential for refining the clusters further if needed.

sil <- silhouette(cluster_labels, dist(reduced_data))

plot(sil, col = cluster_labels, main = "Silhouette plot for Hierarchical Clustering")

The average silhouette width is 0.09, which is very low, indicating that the clusters are weakly separated and overlap significantly. Cluster 1 (88 samples) has a slightly better silhouette score of 0.12, but still suggests poor cluster cohesion. Cluster 2 (63 samples) has an even lower score of 0.05, meaning that many points may be misclassified or lie near the decision boundary between clusters.

These results suggest that hierarchical clustering with k = 2 is not a good fit for this dataset, as the low silhouette scores indicate that the clustering structure is not well defined. A non-linear embedding such as t-SNE (available in R through the Rtsne package), followed by clustering, may reveal better-separated structure.
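The gap statistic mentioned earlier as a further validation step can be sketched with clusGap() from the cluster package. The example below runs on simulated data, not on the gene expression matrix. The wrapper hc_ward, the toy data, and the choices K.max = 5 and B = 50 are illustrative assumptions.

```r
library(cluster)

# clusGap() expects a function(x, k) returning a list with a `cluster` component
hc_ward <- function(x, k) {
  list(cluster = cutree(hclust(dist(x), method = "ward.D2"), k = k))
}

# Toy data: two well-separated 2D groups of 40 points each
set.seed(11)
toy <- rbind(matrix(rnorm(80, 0, 0.6), ncol = 2),
             matrix(rnorm(80, 4, 0.6), ncol = 2))

# Compare log(W_k) with its expectation under B uniform reference data sets
gap <- clusGap(toy, FUNcluster = hc_ward, K.max = 5, B = 50, verbose = FALSE)
print(round(gap$Tab, 3))

# Smallest k with gap(k) >= gap(k + 1) - SE(k + 1) (Tibshirani et al. rule)
k_hat <- maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"], method = "Tibs2001SEmax")
cat("Suggested number of clusters:", k_hat, "\n")
```

Applied to the principal component scores, the same call would give a second opinion on whether two clusters are supported, at the cost of B extra clustering runs per value of k.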

5.3 t-SNE clustering with k-means

t-Distributed Stochastic Neighbor Embedding (t-SNE) is a non-linear dimension reduction technique that maps high-dimensional data into a low-dimensional space (here 2D) while preserving local neighborhood structure. This makes it well suited to visualizing complex datasets and spotting potential cluster structure. However, t-SNE does not itself perform clustering; it only produces a low-dimensional embedding.

To extract groups, K-Means clustering was applied to the t-SNE-transformed data. K-Means partitions the data into k clusters by minimizing within-cluster variance. Clustering in the 2D embedding is simpler than in the original space, but t-SNE does not preserve global distances or cluster densities, so the resulting groups should be treated as exploratory.

#install.packages("Rtsne")

library(Rtsne)

set.seed(123)
tsne_result <- Rtsne(reduced_data[, -ncol(reduced_data)], dims = 2, perplexity = 30)

tsne_data <- as.data.frame(tsne_result$Y)
colnames(tsne_data) <- c("Dim1", "Dim2")

set.seed(123)
kmeans_tsne <- kmeans(tsne_data, centers = 3, nstart = 25)

tsne_data$Cluster <- as.factor(kmeans_tsne$cluster)

library(ggplot2)
ggplot(tsne_data, aes(x = Dim1, y = Dim2, color = Cluster)) +
  geom_point(size = 3) +
  labs(title = "t-SNE with k-means Clustering") +
  theme_minimal()

The t-SNE plot shows three visually distinct clusters, colored red (Cluster 1), green (Cluster 2), and blue (Cluster 3). Unlike PCA, t-SNE can reveal non-linear relationships, which may explain why the clusters appear more separated than with the previous methods.

tsne_dist <- dist(tsne_data[, c("Dim1", "Dim2")])

silhouette_tsne <- silhouette(as.numeric(tsne_data$Cluster), tsne_dist)

plot(silhouette_tsne, main = "Silhouette Plot for t-SNE + k-means Clustering",
     col = as.numeric(tsne_data$Cluster), border = NA)

avg_silhouette_width <- mean(silhouette_tsne[, 3])
cat("Average silhouette width for t-SNE + k-means:", avg_silhouette_width, "\n")
## Average silhouette width for t-SNE + k-means: 0.4353395

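The silhouette plot measures how well each point fits within its assigned cluster. With an average silhouette width of 0.44, the clustering is considerably stronger than with the previous methods (0.11 for K-Means and 0.09 for hierarchical clustering on the principal components). Note that this silhouette is computed from distances in the 2D t-SNE embedding rather than in the principal component space, so the values are not directly comparable.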

Cluster 1 (27 points) has a mean silhouette score of 0.38, indicating a moderate structure.

Cluster 2 (71 points) has the highest silhouette score of 0.45, meaning it is the most well-defined.

Cluster 3 (53 points) has a similar silhouette score of 0.44, consistent with the three-cluster solution.

Compared with hierarchical clustering and K-Means on the principal components, this approach yields better-defined clusters with higher silhouette scores. The results suggest that the t-SNE embedding separates the samples into three groups that K-Means can then recover.

if (!requireNamespace("clusterSim", quietly = TRUE)) {
    install.packages("clusterSim")
}
library(clusterSim)
## Loading required package: MASS
k_values <- c(2, 3, 4)

ch_indices <- numeric(length(k_values))

for (i in seq_along(k_values)) {
  set.seed(123) 
  kmeans_model <- kmeans(tsne_data[, c("Dim1", "Dim2")], centers = k_values[i], nstart = 25)
  ch_indices[i] <- index.G1(tsne_data[, c("Dim1", "Dim2")], kmeans_model$cluster)
  cat("Calinski-Harabasz Index for k =", k_values[i], ":", ch_indices[i], "\n")
}
## Calinski-Harabasz Index for k = 2 : 171.6043 
## Calinski-Harabasz Index for k = 3 : 150.8291 
## Calinski-Harabasz Index for k = 4 : 150.9773

The Calinski-Harabasz (CH) Index evaluates the quality of clustering by measuring the ratio of between-cluster dispersion to within-cluster dispersion. Higher values indicate better-defined clusters with strong separation.
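For n observations in k clusters, the index is CH = (BSS / (k - 1)) / (WSS / (n - k)), where BSS is the between-cluster and WSS the within-cluster sum of squares. The sketch below computes it by hand on simulated data and checks it against the same ratio built from the sums of squares that kmeans() already returns. The function name ch_index and the toy data are illustrative assumptions.

```r
# Calinski-Harabasz index from a data matrix and cluster labels
ch_index <- function(x, cl) {
  x <- as.matrix(x)
  n <- nrow(x)
  k <- length(unique(cl))
  grand <- colMeans(x)  # overall centroid
  wss <- 0
  bss <- 0
  for (g in unique(cl)) {
    xg <- x[cl == g, , drop = FALSE]
    cen <- colMeans(xg)
    wss <- wss + sum(sweep(xg, 2, cen)^2)         # spread around the cluster centroid
    bss <- bss + nrow(xg) * sum((cen - grand)^2)  # centroid offset, weighted by size
  }
  (bss / (k - 1)) / (wss / (n - k))
}

# Toy data: two 2D groups of 50 points each
set.seed(3)
toy <- rbind(matrix(rnorm(100, 0, 1), ncol = 2),
             matrix(rnorm(100, 5, 1), ncol = 2))
km <- kmeans(toy, centers = 2, nstart = 25)

ch_manual <- ch_index(toy, km$cluster)
ch_kmeans <- (km$betweenss / (2 - 1)) / (km$tot.withinss / (nrow(toy) - 2))

cat("CH index (manual):     ", ch_manual, "\n")
cat("CH index (from kmeans):", ch_kmeans, "\n")
```

Because the denominator is scaled by n - k and the numerator by k - 1, the index penalizes additional clusters that do not add much between-cluster separation.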

From the results:

k = 2: CH Index = 171.60 (highest)

k = 3: CH Index = 150.83

k = 4: CH Index = 150.98

Since the CH index is highest for k = 2, two clusters give the best separation by this criterion. The values for k = 3 and k = 4 are nearly identical (150.83 vs. 150.98), so the index does not discriminate between them. Because the dataset contains six labelled classes (five breast cancer types plus healthy tissue), a solution with more than two clusters may still be biologically more informative, so k = 3 or 4 remains defensible despite the CH index favoring k = 2.

6 Conclusion

For this dataset, characterized by high variance among variables, t-SNE combined with K-Means emerged as the most effective clustering approach among those tested. The clear separation in the t-SNE visualization, along with the highest average silhouette width (0.44), supports a three-cluster structure, although the CH index favored two clusters. This approach is particularly useful for non-linear, high-dimensional data, making it a valuable exploratory tool for gene expression analysis and other complex biological studies.

However, previous research suggests that for typical gene expression datasets, K-Means (D’haeseleer 2005) or hierarchical clustering (Eisen et al. 1998) are often preferred. While t-SNE enhances visual interpretation, its stochastic nature and sensitivity to hyperparameters make it less commonly used for clustering in large-scale genomic studies.

References

D’haeseleer, Patrik. 2005. “How Does Gene Expression Clustering Work?” Nature Biotechnology 23 (12): 1499–1501. https://doi.org/10.1038/nbt1205-1499.
Eisen, Michael B., Paul T. Spellman, Patrick O. Brown, and David Botstein. 1998. “Cluster Analysis and Display of Genome-Wide Expression Patterns.” Proceedings of the National Academy of Sciences 95 (25): 14863–68. https://doi.org/10.1073/pnas.95.25.14863.
Feltes, B. C., E. B. Chandelier, B. I. Grisci, and M. Dorn. 2019. “CuMiDa: An Extensively Curated Microarray Database for Benchmarking and Testing of Machine Learning Approaches in Cancer Research.” Journal of Computational Biology 26 (4): 376–86. https://doi.org/10.1089/cmb.2018.0238.