Gene expression profiling is a powerful tool for understanding the molecular basis of diseases, particularly cancer. The dataset analyzed in this report is a CSV file containing the expression levels of 54,675 gene probes (columns, alongside two identifier columns) across 151 samples (rows). These samples represent five different types of breast cancer as well as healthy tissue. The dataset is sourced from the CuMiDa database, a highly curated microarray repository designed for benchmarking machine learning techniques in cancer research (Feltes et al. 2019).
Understanding the underlying patterns in such high-dimensional datasets poses a significant challenge due to the sheer volume of features. In this dataset, each sample’s expression levels span tens of thousands of genes, most of which may not contribute directly to distinguishing between cancer subtypes. Consequently, raw data analysis can be hindered by redundancy, noise, and computational inefficiency. Dimension reduction and clustering techniques are therefore applied to address these issues.
Dimension reduction is used to remove noise and redundancy, improve interpretability, and enhance computational efficiency.
In this study, Principal Component Analysis (PCA) and Factor Analysis (FA) are applied to reduce the dataset’s dimensions while preserving the underlying structure of the data.
Clustering techniques aim to group samples or genes based on their similarity in expression patterns. The primary goals include: identifying molecular subtypes, understanding gene co-expression and providing insights for biomarker discovery.
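The clustering techniques applied here are hierarchical clustering and k-means clustering, together with t-SNE, which is strictly a nonlinear embedding method and is used here to visualize cluster structure.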
The combination of dimension reduction and clustering offers a robust framework to uncover meaningful patterns and relationships in complex gene expression data, helping bridge the gap between raw molecular data and actionable insights in cancer research.
I start by loading the dataset and checking its structure and dimensions.
data <- read.csv("Breast_GSE45827.csv", header = TRUE)
str(data[1:5])
## 'data.frame': 151 obs. of 5 variables:
## $ samples : int 84 85 87 90 91 92 93 94 99 101 ...
## $ type : chr "basal" "basal" "basal" "basal" ...
## $ X1007_s_at: num 9.85 9.86 10.1 9.76 9.41 ...
## $ X1053_at : num 8.1 8.21 8.94 7.36 7.75 ...
## $ X117_at : num 6.42 7.06 5.74 6.48 6.69 ...
dim(data)
## [1] 151 54677
Key variables:
samples: An integer column representing unique sample identifiers.
type: A categorical (character) column that classifies each sample into a biological condition (e.g., "basal" indicates a specific breast cancer subtype).
Gene expression variables (X1007_s_at, X1053_at, etc.): Numeric columns representing expression levels measured for different genes. The values likely indicate transcript abundance on a logarithmic scale.
Challenges with the dataset:
The dataset has far more variables than observations (54,677 columns, 54,675 of them expression features, against 151 samples), making it a typical candidate for dimension reduction techniques such as Principal Component Analysis (PCA) or Factor Analysis (FA). Some genes may be correlated or redundant, which calls for feature selection or clustering to group similar expression patterns. Normalization and scaling may also be required before applying clustering algorithms.
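The effect of having more variables than observations can be illustrated with a minimal sketch on simulated data. The matrix sizes below are arbitrary assumptions, not the dimensions of the breast cancer dataset. With n samples, the sample correlation matrix of p > n variables can have at most n - 1 non-zero eigenvalues, so most directions in gene space carry no independent information.

```r
# Minimal sketch on simulated data (not the breast cancer dataset).
set.seed(1)
n <- 20   # number of samples (hypothetical)
p <- 50   # number of variables (hypothetical), chosen so that p > n
x <- matrix(rnorm(n * p), nrow = n, ncol = p)

# p x p correlation matrix and its eigenvalues.
r <- cor(x)
ev <- eigen(r, symmetric = TRUE, only.values = TRUE)$values

# Count eigenvalues that are numerically different from zero.
n_nonzero <- sum(ev > 1e-8)
cat("Variables:", p, " Samples:", n, " Non-zero eigenvalues:", n_nonzero, "\n")
```

This is one reason why the analysis works with a reduced set of genes rather than the full expression matrix.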
gene_data <- data[, -1]
sum(is.na(gene_data))
## [1] 0
There are no missing values, so I proceed with scaling and feature selection. The type column is dropped before scaling because scale() requires numeric input.
gene_data_scaled <- scale(gene_data[,-1])
str(gene_data_scaled[1:5])
## num [1:5] -0.797 -0.778 -0.384 -0.949 -1.517
gene_variances <- apply(gene_data_scaled, 2, var, na.rm = TRUE)
top_genes_indices <- order(gene_variances, decreasing = TRUE)[1:100]
top_genes <- gene_data_scaled[, top_genes_indices]
summary(top_genes[, 1:5])
## X1552584_at X1556147_at X1557029_at X1557677_a_at
## Min. :-1.8826 Min. :-1.7894 Min. :-1.7714 Min. :-1.5344
## 1st Qu.:-0.7634 1st Qu.:-0.5713 1st Qu.:-0.6160 1st Qu.:-0.4761
## Median :-0.1457 Median :-0.1593 Median :-0.1199 Median :-0.1408
## Mean : 0.0000 Mean : 0.0000 Mean : 0.0000 Mean : 0.0000
## 3rd Qu.: 0.5222 3rd Qu.: 0.3997 3rd Qu.: 0.3961 3rd Qu.: 0.3686
## Max. : 3.0696 Max. : 4.1978 Max. : 6.2424 Max. : 9.4677
## X1559732_at
## Min. :-1.5836
## 1st Qu.:-0.5968
## Median :-0.1229
## Mean : 0.0000
## 3rd Qu.: 0.2906
## Max. : 7.2254
str(top_genes)
## num [1:151, 1:100] 1.816 0.321 -0.567 2.403 0.594 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:100] "X1552584_at" "X1556147_at" "X1557029_at" "X1557677_a_at" ...
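Scaling standardizes the expression values so that every gene has a mean of 0 and a standard deviation of 1, which prevents genes with large absolute expression from dominating PCA and clustering. The summary confirms that the scaled columns have a mean of 0. I then selected 100 genes by ranking their variances, on the premise that high-variance genes show greater expression differences across samples and are therefore more informative for clustering. One caveat applies: the variances were computed on the already scaled matrix, where every gene has variance 1 by construction, so the ranking reflects floating-point rounding rather than biological variability. Computing the variances on the unscaled values and scaling the selected genes afterwards would give a meaningful high-variance subset.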
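The following sketch, on simulated data with hypothetical gene names and spreads, shows how a variance ranking behaves when the variances are computed after versus before scaling. It is an illustration of the order of operations only and does not reproduce the selection made in this report.

```r
# Minimal sketch on simulated data: rank genes by variance BEFORE scaling.
set.seed(42)
n_samples <- 30
gene_sd <- c(0.1, 0.3, 1, 3, 9, 27)   # hypothetical per-gene standard deviations
expr <- sapply(gene_sd, function(s) rnorm(n_samples, mean = 8, sd = s))
colnames(expr) <- paste0("gene", seq_along(gene_sd))

# After scaling, every column has variance 1, so variance cannot rank genes.
scaled_var <- apply(scale(expr), 2, var)
print(round(scaled_var, 10))

# Variance on the unscaled values does discriminate between genes.
raw_var <- apply(expr, 2, var)
top_idx <- order(raw_var, decreasing = TRUE)[1:3]

# Scale only after the high-variance genes have been selected.
top_scaled <- scale(expr[, top_idx])
print(colnames(top_scaled))
```

Applied to the real data, the same pattern would mean ranking the columns of the unscaled expression matrix and passing only the selected columns to scale().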
In this step, a correlation matrix is computed for the 100 selected genes using Pearson’s correlation coefficient. The expression values are continuous and standardized, and Pearson’s coefficient measures linear association, which is the quantity that PCA and FA operate on, so it is a natural choice here. It does assume roughly symmetric data without extreme outliers; the maxima of 6 to 9 standard deviations in the summary above suggest that a rank-based alternative such as Spearman’s coefficient could serve as a robustness check. The correlation matrix quantifies the linear relationship between pairs of genes, with values ranging from -1 to 1:
+1: Perfect positive correlation (genes show similar expression patterns).
0: No correlation.
-1: Perfect negative correlation (genes have opposite expression patterns).
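These three reference cases can be reproduced with a few toy vectors. The values and variable names below are hypothetical and serve only to show what cor() returns in each situation.

```r
# Toy vectors (hypothetical values) illustrating the three reference cases.
set.seed(7)
gene_a <- c(2.1, 3.4, 5.0, 6.2, 7.9)
gene_b <- 2 * gene_a + 1        # exact increasing linear function of gene_a
gene_c <- -0.5 * gene_a + 10    # exact decreasing linear function of gene_a
gene_d <- rnorm(1000)           # two independent noise vectors for the
gene_e <- rnorm(1000)           # "no correlation" case

r_pos  <- cor(gene_a, gene_b, method = "pearson")  # perfect positive correlation
r_neg  <- cor(gene_a, gene_c, method = "pearson")  # perfect negative correlation
r_zero <- cor(gene_d, gene_e, method = "pearson")  # close to 0, but not exactly 0

print(c(positive = r_pos, negative = r_neg, none = r_zero))
```

In real expression data the coefficients fall between these extremes, and a sample correlation near zero does not rule out a nonlinear relationship.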
The matrix is visualized using a heatmap generated by the corrplot package. Blue shades represent positive correlations, while red shades indicate negative correlations. The diagonal (black line) represents self-correlations, which are always 1.
library(corrplot)
## corrplot 0.95 loaded
correlation_matrix <- cor(top_genes, method = "pearson", use = "pairwise.complete.obs")
corrplot(correlation_matrix, order = "alphabet", tl.cex = 0.4, method = "color")
The correlation heatmap reveals a mix of weak to moderate correlations among the selected genes, suggesting some underlying structure but no clearly defined blocks. This indicates that Principal Component Analysis (PCA) is well-suited for dimension reduction, as it can effectively capture and condense the variance in the dataset. However, the absence of strong correlation clusters suggests that Factor Analysis (FA) may not be optimal, as it relies on well-defined latent factors driving the observed correlations. In the context of clustering, methods such as hierarchical clustering may still identify meaningful gene groupings, but the generally low correlation values imply that the clusters might not be well-defined or strongly separable.
Following the analysis of the correlation matrix, I wanted to further evaluate the suitability of Factor Analysis (FA) for this dataset. To do so, I conducted the Kaiser-Meyer-Olkin (KMO) test and Bartlett’s test of sphericity, which assess the adequacy of the data for dimension reduction techniques.
The Kaiser-Meyer-Olkin (KMO) test assesses the sampling adequacy for factor analysis by measuring the proportion of variance in the data that might be caused by underlying factors.
library(psych)
KMO(correlation_matrix)
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = correlation_matrix)
## Overall MSA = 0.62
## MSA for each item =
## X1552584_at X1556147_at X1557029_at X1557677_a_at X1559732_at
## 0.59 0.29 0.52 0.46 0.43
## X1559789_a_at X1564573_at X1566879_at X1569983_at X201121_s_at
## 0.34 0.54 0.74 0.48 0.57
## X202419_at X202425_x_at X202867_s_at X203006_at X203761_at
## 0.46 0.34 0.51 0.46 0.69
## X204389_at X205915_x_at X206024_at X206396_at X210516_at
## 0.48 0.70 0.66 0.44 0.51
## X212901_s_at X214894_x_at X214936_at X215282_at X216097_at
## 0.61 0.40 0.63 0.45 0.33
## X217272_s_at X220663_at X220965_s_at X222027_at X222184_at
## 0.46 0.32 0.71 0.70 0.41
## X222371_at X226406_at X227623_at X228041_at X230228_at
## 0.69 0.57 0.54 0.55 0.59
## X232968_at X234774_at X239537_at X239735_at X241255_at
## 0.58 0.82 0.32 0.63 0.43
## X241366_at X243537_at X243900_at X117_at X1552261_at
## 0.69 0.78 0.52 0.68 0.31
## X1552264_a_at X1552266_at X1552271_at X1552272_a_at X1552280_at
## 0.75 0.29 0.77 0.54 0.50
## X1552281_at X1552286_at X1552302_at X1552303_a_at X1552306_at
## 0.84 0.69 0.68 0.70 0.52
## X1552311_a_at X1552315_at X1552318_at X1552323_s_at X1552347_at
## 0.76 0.72 0.72 0.32 0.72
## X1552349_a_at X1552355_s_at X1552364_s_at X1552365_at X1552377_s_at
## 0.40 0.82 0.44 0.37 0.46
## X1552386_at X1552388_at X1552396_at X1552399_a_at X1552402_at
## 0.63 0.58 0.64 0.56 0.75
## X1552412_a_at X1552426_a_at X1552439_s_at X1552445_a_at X1552450_a_at
## 0.70 0.42 0.28 0.41 0.73
## X1552453_a_at X1552472_a_at X1552482_at X1552486_s_at X1552491_at
## 0.71 0.63 0.74 0.57 0.69
## X1552501_a_at X1552516_a_at X1552518_s_at X1552523_a_at X1552528_at
## 0.79 0.65 0.65 0.70 0.80
## X1552532_a_at X1552535_at X1552538_a_at X1552555_at X1552563_a_at
## 0.53 0.80 0.38 0.70 0.58
## X1552566_at X1552569_a_at X1552582_at X1552585_s_at X1552590_a_at
## 0.59 0.53 0.53 0.75 0.45
## X1552592_at X1552594_at X1552596_at X1552612_at X1552623_at
## 0.62 0.59 0.65 0.52 0.40
The overall KMO value of 0.62 suggests a mediocre level of factorability, meaning that the dataset is somewhat suitable for factor analysis (FA) but not ideal.
Examining the individual KMO values for each gene, we see a wide range, with some genes showing low adequacy scores (below 0.50). Values closer to 1 indicate strong suitability for FA, while values below 0.50 suggest that some variables do not share sufficient common variance and may not contribute meaningfully to a latent factor model.
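To make the KMO statistic less of a black box, the sketch below recomputes it in base R on simulated data. It compares the squared correlations with the squared partial (anti-image) correlations: when the variables share common factors, the partial correlations are small and the measure approaches 1. The function name kmo_sketch and the simulated one-factor data are assumptions for illustration; this is not the psych implementation itself.

```r
# Base R sketch of the KMO measure of sampling adequacy (illustrative only).
kmo_sketch <- function(r) {
  # Rescaling the inverse correlation matrix gives the partial correlations
  # (up to sign, which is irrelevant once they are squared).
  q <- cov2cor(solve(r))
  diag(q) <- 0
  diag(r) <- 0
  list(
    overall  = sum(r^2) / (sum(r^2) + sum(q^2)),
    per_item = colSums(r^2) / (colSums(r^2) + colSums(q^2))
  )
}

set.seed(123)
n <- 200
common <- rnorm(n)   # hypothetical shared latent factor

# Six variables driven by one common factor, and six unrelated variables.
factor_data <- sapply(1:6, function(i) 0.8 * common + 0.6 * rnorm(n))
noise_data  <- matrix(rnorm(n * 6), nrow = n)

kmo_factor <- kmo_sketch(cor(factor_data))
kmo_noise  <- kmo_sketch(cor(noise_data))

cat("Overall MSA, one-factor data:", round(kmo_factor$overall, 2), "\n")
cat("Overall MSA, pure noise:     ", round(kmo_noise$overall, 2), "\n")
```

The per-item values returned by the sketch correspond to the "MSA for each item" entries in the output above.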
Bartlett’s test evaluates whether the correlation matrix is significantly different from an identity matrix, which would indicate that variables are sufficiently interrelated to justify dimension reduction techniques such as Principal Component Analysis (PCA) or Factor Analysis (FA).
library(psych)
bartlett_result <- cortest.bartlett(cor(top_genes), n = 151)
print(bartlett_result)
## $chisq
## [1] 10504.89
##
## $p.value
## [1] 0
##
## $df
## [1] 4950
The test result shows a chi-square value of 10504.89 with 4950 degrees of freedom, and a p-value of 0 (rounded, meaning it is extremely small). This strongly suggests that the null hypothesis, which assumes that the correlation matrix is an identity matrix (i.e., variables are uncorrelated), can be rejected. In other words, there is sufficient correlation structure in the data, making it appropriate to proceed with PCA or FA.
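The statistic and its degrees of freedom can be checked by hand. For p variables and n observations, Bartlett’s statistic is -(n - 1 - (2p + 5)/6) multiplied by the log-determinant of the correlation matrix, with p(p - 1)/2 degrees of freedom. The sketch below implements this in base R on simulated data; the function name bartlett_sketch and the simulated variables are assumptions for illustration.

```r
# Base R sketch of Bartlett's test of sphericity (illustrative only).
bartlett_sketch <- function(r, n) {
  p <- ncol(r)
  # Log-determinant of the correlation matrix, computed stably.
  log_det <- as.numeric(determinant(r, logarithm = TRUE)$modulus)
  chisq <- -(n - 1 - (2 * p + 5) / 6) * log_det
  df <- p * (p - 1) / 2
  list(chisq = chisq, df = df,
       p.value = pchisq(chisq, df, lower.tail = FALSE))
}

# Degrees of freedom for 100 variables, as in the output above.
df_100 <- 100 * 99 / 2
print(df_100)

# Simulated example: six variables sharing one hypothetical latent factor.
set.seed(123)
n <- 200
common <- rnorm(n)
x <- sapply(1:6, function(i) 0.8 * common + 0.6 * rnorm(n))
res <- bartlett_sketch(cor(x), n = n)
print(res)
```

With many variables and strong correlations the p-value underflows to 0 in double precision, which is why the output above reports exactly 0.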
Despite the mediocre KMO score (0.62), Bartlett’s test confirms that the dataset has a strong enough correlation pattern to benefit from dimension reduction. This supports the application of PCA for variance maximization or FA for latent factor extraction, though careful selection of the number of components or factors is still necessary.
A crucial step in Factor Analysis (FA) is selecting the optimal number of factors to retain. Since there is no single objective criterion for factor retention, several established methods can be used to guide this decision. Each approach provides a different perspective on how many factors should be considered meaningful in explaining the variance in the dataset.
To assess the appropriate number of factors, we can apply multiple techniques, including:
Kaiser’s Criterion: Retaining factors with eigenvalues greater than 1, as they explain more variance than a single original variable.
Scree Plot (Cattell’s Criterion): Analyzing the elbow point on the scree plot, where the eigenvalues show a sharp decline before leveling off. Factors to the left of this inflection point are considered significant.
Proportion of variance explained: Selecting the number of factors that together account for a predefined proportion of the total variance (85% is used below).
Half-Variable Rule (Kim & Mueller’s Criterion): Ensuring that the number of factors remains below half of the total observed variables, preventing overfitting.
By comparing the results from these approaches, I will determine an optimal number of factors that best capture the structure of the data while minimizing noise and redundancy.
eigenvalues <- eigen(correlation_matrix)$values
kaiser_factors <- sum(eigenvalues > 1)
cat("Number of factors based on Kaiser's criterion:", kaiser_factors, "\n")
## Number of factors based on Kaiser's criterion: 28
scree(top_genes, factors = TRUE, pc = TRUE, main = "Scree Plot of Eigenvalues")
Next, I choose the number of factors that explain at least 85% of the total variance.
cumulative_variance <- cumsum(eigenvalues) / sum(eigenvalues)
num_factors_variance <- min(which(cumulative_variance >= 0.85))
cat("Number of factors based on 85% variance explained:", num_factors_variance, "\n")
## Number of factors based on 85% variance explained: 40
The criteria give different answers here: Kaiser’s criterion suggests 28 factors, the 85% variance threshold suggests 40, and both fall below the half-variable bound of 50 for 100 variables. No single criterion is definitive. I use the proportion of variance explained because it guarantees that the retained factors account for a stated share of the total variance, which makes the choice easy to interpret, at the cost of retaining more factors than Kaiser’s criterion would.
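The three numeric criteria can be compared side by side with a short sketch. The data below are simulated from three hypothetical latent factors, so the resulting counts say nothing about the breast cancer dataset; the point is only how each rule is computed from the eigenvalues of a correlation matrix.

```r
# Sketch: compare factor-retention criteria on simulated data (illustrative only).
set.seed(2024)
n <- 150
p <- 20
latent  <- matrix(rnorm(n * 3), nrow = n)          # three hypothetical latent factors
weights <- matrix(runif(3 * p, -1, 1), nrow = 3)   # hypothetical loadings
x <- latent %*% weights + matrix(rnorm(n * p), nrow = n)

ev <- eigen(cor(x), symmetric = TRUE, only.values = TRUE)$values

# Kaiser's criterion: eigenvalues greater than 1.
kaiser_k <- sum(ev > 1)

# Proportion of variance: smallest k whose cumulative share reaches 85%.
cum_var <- cumsum(ev) / sum(ev)
variance_k <- which(cum_var >= 0.85)[1]

# Half-variable rule: upper bound on the number of factors.
half_rule_max <- floor(p / 2)

cat("Kaiser:", kaiser_k,
    " 85% variance:", variance_k,
    " Half-variable bound:", half_rule_max, "\n")
```

Note that the eigenvalues of the full correlation matrix measure total variance in the PCA sense, so the 85% figure is an upper-level guide rather than a statement about common variance in the factor model.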
The loadings are extracted with the minimum residual (MINRES) method. MINRES starts from squared multiple correlations (R^2) as initial communality estimates and then iteratively adjusts the loadings so that the sum of squared off-diagonal residuals between the observed and reproduced correlation matrices is minimized. To make the loadings easier to interpret, I use VARIMAX rotation, an orthogonal rotation that maximizes the variance of the squared loadings within each column of the loadings matrix, pushing each factor toward a few large loadings and many near-zero ones.
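What an orthogonal rotation does, and does not, change can be seen in a small base R sketch. It uses principal-component loadings and stats::varimax() on simulated data rather than MINRES and psych::fa(), so it is an assumption-laden illustration of the rotation step only. The data, the two hypothetical factors, and the helper varimax_criterion are invented for the example.

```r
# Sketch: varimax rotation of a small loadings matrix (illustrative only).
set.seed(99)
n <- 300
f1 <- rnorm(n)   # hypothetical latent factor 1
f2 <- rnorm(n)   # hypothetical latent factor 2

# Three variables driven by each factor.
x <- cbind(sapply(1:3, function(i) 0.8 * f1 + 0.6 * rnorm(n)),
           sapply(1:3, function(i) 0.8 * f2 + 0.6 * rnorm(n)))

# Unrotated principal-component loadings for the first two components.
e <- eigen(cor(x), symmetric = TRUE)
unrotated <- e$vectors[, 1:2] %*% diag(sqrt(e$values[1:2]))

# Orthogonal varimax rotation (raw criterion, no Kaiser normalization).
rot <- varimax(unrotated, normalize = FALSE)
rotated <- unclass(rot$loadings)

# Varimax criterion: variance of the squared loadings, summed over columns.
varimax_criterion <- function(l) sum(apply(l^2, 2, var))
cat("Criterion before:", round(varimax_criterion(unrotated), 3),
    " after:", round(varimax_criterion(rotated), 3), "\n")

# Communalities (row sums of squared loadings) are unchanged by the rotation.
comm_before <- rowSums(unrotated^2)
comm_after  <- rowSums(rotated^2)
print(round(cbind(comm_before, comm_after), 4))
```

Because the rotation is orthogonal, it redistributes variance between factors without changing how much of each variable the factors explain jointly.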
optimal_factors <- num_factors_variance
fa_result <- fa(top_genes, nfactors = optimal_factors, rotate = "varimax")
print(fa_result)
## Factor Analysis using method = minres
## Call: fa(r = top_genes, nfactors = optimal_factors, rotate = "varimax")
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR1 MR3 MR2 MR15 MR10 MR7 MR14 MR8 MR20 MR33 MR36
## X1552584_at -0.17 0.80 0.26 -0.09 -0.13 0.01 0.05 -0.08 -0.03 0.05 0.04
## X1556147_at 0.09 -0.04 -0.06 0.02 -0.02 0.07 0.04 -0.02 0.18 0.02 0.00
## X1557029_at 0.16 -0.04 0.09 -0.18 -0.11 0.27 0.01 0.10 0.71 0.15 0.06
## X1557677_a_at 0.09 0.04 0.06 -0.04 -0.02 0.15 0.03 0.01 -0.05 0.03 0.03
## X1559732_at 0.10 0.05 -0.13 0.07 -0.04 0.08 0.01 -0.01 0.07 -0.06 0.01
## X1559789_a_at -0.01 0.04 -0.07 -0.01 0.03 0.14 0.01 -0.04 0.78 0.00 0.09
## X1564573_at 0.29 -0.11 0.10 0.18 -0.02 0.12 0.07 0.04 -0.05 -0.11 0.18
## X1566879_at 0.63 -0.07 -0.23 -0.03 0.06 -0.08 0.12 0.16 -0.01 -0.15 0.09
## X1569983_at 0.27 0.06 -0.10 0.12 0.06 0.01 0.06 -0.05 0.07 0.01 -0.07
## X201121_s_at -0.09 -0.15 0.21 -0.11 0.09 -0.03 -0.12 0.12 0.02 0.20 0.09
## X202419_at 0.12 -0.07 -0.37 0.23 -0.19 0.29 0.01 -0.16 -0.15 -0.17 0.01
## X202425_x_at -0.05 -0.10 -0.06 -0.05 0.01 0.05 -0.02 -0.08 0.03 0.01 0.05
## X202867_s_at -0.02 -0.15 0.19 0.65 0.05 0.10 0.19 -0.02 -0.07 -0.09 0.04
## X203006_at -0.08 -0.22 -0.12 0.04 0.00 0.05 -0.05 0.00 0.01 -0.02 0.00
## X203761_at -0.22 0.81 -0.14 -0.26 -0.07 -0.10 -0.03 -0.05 0.00 0.11 -0.02
## X204389_at 0.09 0.00 -0.11 0.09 0.15 0.75 0.00 0.01 0.18 -0.06 0.01
## X205915_x_at 0.61 -0.13 -0.13 -0.12 0.14 0.04 0.03 0.04 0.06 -0.08 0.13
## X206024_at 0.25 -0.14 -0.04 0.25 0.07 0.03 -0.04 -0.07 0.04 0.08 -0.04
## X206396_at -0.02 -0.04 0.00 0.11 0.66 -0.02 0.20 -0.08 -0.09 -0.11 0.05
## X210516_at -0.06 -0.15 0.17 0.09 -0.01 0.01 -0.01 -0.08 -0.01 0.06 0.00
## X212901_s_at -0.09 -0.27 0.33 -0.11 -0.06 -0.14 -0.06 0.11 -0.16 0.11 0.35
## X214894_x_at -0.32 0.03 0.14 -0.04 0.09 -0.22 -0.11 0.14 0.06 -0.12 -0.06
## X214936_at -0.12 0.20 0.22 -0.59 0.01 0.00 -0.11 -0.10 0.07 0.11 -0.07
## X215282_at 0.05 -0.17 0.17 0.05 -0.05 0.00 0.17 0.18 0.07 -0.17 0.04
## X216097_at -0.04 -0.14 0.06 0.02 -0.08 0.05 -0.07 0.09 0.10 -0.02 -0.09
## X217272_s_at 0.14 -0.15 -0.03 -0.01 -0.12 0.77 -0.01 -0.02 0.13 0.04 0.02
## X220663_at 0.12 -0.14 -0.01 0.08 0.02 0.04 0.03 -0.04 -0.01 -0.02 0.04
## X220965_s_at 0.57 -0.01 0.24 0.08 -0.11 0.08 0.11 -0.03 -0.05 -0.04 -0.08
## X222027_at -0.21 -0.14 0.29 -0.45 -0.26 0.00 -0.02 0.12 0.07 0.20 -0.10
## X222184_at 0.04 -0.13 -0.24 -0.04 -0.01 0.05 -0.05 0.07 0.04 0.06 0.19
## X222371_at -0.27 0.53 -0.42 -0.28 0.11 -0.16 -0.12 -0.03 0.09 0.06 -0.11
## X226406_at -0.36 -0.13 0.07 -0.03 -0.49 0.03 -0.03 0.02 -0.19 -0.04 0.17
## X227623_at -0.04 0.24 -0.31 0.13 0.25 -0.02 0.00 -0.03 0.11 -0.18 0.07
## X228041_at -0.33 0.04 -0.08 0.43 0.08 0.09 0.02 -0.09 0.02 0.12 -0.08
## X230228_at 0.05 0.05 -0.19 0.51 0.13 -0.06 -0.01 -0.07 -0.01 -0.10 -0.20
## X232968_at -0.10 -0.02 -0.37 0.12 0.45 -0.03 0.25 -0.11 -0.08 -0.23 -0.14
## X234774_at 0.65 -0.22 0.21 -0.03 -0.14 0.02 0.13 0.06 -0.22 -0.01 0.06
## X239537_at 0.09 0.04 -0.07 0.03 0.05 0.11 -0.06 0.05 0.06 -0.03 0.02
## X239735_at -0.19 0.24 -0.39 -0.04 0.25 -0.15 0.01 -0.11 0.03 -0.12 -0.13
## X241255_at 0.18 -0.03 -0.11 -0.06 -0.05 -0.02 0.02 -0.02 0.05 0.03 0.03
## X241366_at -0.27 -0.34 0.59 -0.12 0.05 -0.08 -0.06 0.02 0.16 0.03 -0.06
## X243537_at -0.22 0.57 -0.36 -0.14 0.11 -0.15 -0.03 -0.11 -0.02 -0.10 -0.05
## X243900_at 0.17 -0.15 0.08 0.18 0.07 0.03 0.84 0.07 0.06 0.07 -0.06
## X117_at -0.27 0.48 0.00 -0.26 -0.05 -0.06 -0.07 -0.10 -0.10 0.03 -0.06
## X1552261_at 0.10 0.06 -0.02 0.03 0.07 0.09 0.10 0.02 -0.10 0.00 0.03
## X1552264_a_at -0.22 -0.20 0.79 -0.01 -0.13 -0.12 -0.06 0.11 -0.05 0.15 0.12
## X1552266_at -0.02 -0.02 -0.22 0.03 0.01 -0.01 -0.03 0.07 -0.01 -0.02 0.00
## X1552271_at 0.59 -0.29 -0.01 0.02 0.00 0.03 -0.01 0.01 -0.01 -0.10 0.02
## X1552272_a_at 0.23 -0.10 -0.10 0.03 0.01 0.00 0.04 0.04 -0.03 -0.02 0.02
## X1552280_at 0.03 0.48 0.15 -0.27 -0.03 0.09 -0.01 -0.03 0.03 -0.04 0.04
## X1552281_at 0.82 -0.13 -0.06 0.03 0.05 0.11 -0.03 0.02 0.08 0.08 -0.06
## X1552286_at 0.10 -0.27 0.10 0.00 -0.16 0.00 0.06 0.06 -0.04 0.40 0.21
## X1552302_at -0.23 0.46 -0.25 -0.04 0.05 -0.18 -0.05 -0.09 -0.02 -0.04 -0.10
## X1552303_a_at -0.22 0.49 -0.09 -0.14 0.13 -0.19 -0.05 -0.02 -0.03 -0.12 0.01
## X1552306_at -0.06 -0.28 0.34 -0.09 -0.03 0.07 0.00 -0.12 0.02 0.02 0.17
## X1552311_a_at 0.65 -0.12 -0.38 -0.03 -0.05 0.02 0.08 -0.06 -0.04 0.03 -0.04
## X1552315_at -0.09 0.88 0.01 0.16 0.01 0.03 -0.07 -0.04 -0.04 -0.05 0.04
## X1552318_at -0.08 0.87 -0.14 0.02 0.03 -0.04 -0.09 -0.06 0.06 0.00 0.00
## X1552323_s_at 0.13 0.05 0.15 0.01 0.01 0.05 0.02 -0.04 0.14 0.05 0.84
## X1552347_at -0.25 0.17 -0.59 -0.13 0.03 0.05 -0.01 -0.01 0.04 0.02 0.12
## X1552349_a_at 0.11 -0.15 0.04 0.11 -0.11 0.04 -0.02 0.80 -0.04 0.25 0.00
## X1552355_s_at 0.65 -0.21 -0.08 0.16 0.05 0.25 0.03 0.07 0.02 -0.06 0.19
## X1552364_s_at 0.09 -0.11 0.19 0.01 0.07 -0.07 0.08 -0.05 -0.09 -0.09 -0.01
## X1552365_at -0.07 0.03 -0.03 0.05 0.13 -0.08 0.00 -0.04 0.05 -0.04 0.05
## X1552377_s_at 0.35 0.19 0.00 -0.12 -0.04 -0.07 0.05 0.05 0.01 0.00 0.09
## X1552386_at -0.23 0.53 -0.07 0.18 0.07 -0.03 -0.10 0.00 0.08 -0.09 0.02
## X1552388_at 0.20 -0.22 -0.11 0.32 0.14 -0.02 0.09 -0.02 0.07 -0.04 -0.08
## X1552396_at 0.27 -0.19 -0.07 0.09 0.42 0.18 0.19 -0.08 0.05 0.03 0.07
## X1552399_a_at -0.33 0.00 0.22 0.06 0.00 -0.28 0.00 -0.04 -0.05 -0.30 -0.05
## X1552402_at 0.57 -0.18 -0.03 -0.02 0.04 0.03 -0.05 0.03 -0.10 0.10 0.10
## X1552412_a_at 0.43 0.00 -0.24 0.11 0.02 0.04 0.12 -0.04 -0.02 -0.10 0.01
## X1552426_a_at -0.09 0.05 0.00 -0.02 -0.04 -0.01 -0.08 0.04 0.01 0.09 -0.03
## X1552439_s_at -0.09 -0.06 0.06 -0.05 -0.02 -0.02 -0.05 -0.02 0.01 0.04 0.05
## X1552445_a_at 0.14 -0.12 0.01 -0.14 -0.03 -0.04 0.08 0.84 0.05 -0.06 -0.02
## X1552450_a_at 0.70 -0.15 0.12 0.06 -0.05 0.07 0.06 0.08 0.01 -0.02 0.01
## X1552453_a_at 0.33 -0.17 0.06 0.09 -0.04 0.07 0.08 0.06 0.14 -0.07 0.04
## X1552472_a_at -0.16 -0.03 0.68 -0.24 -0.16 0.01 0.01 0.00 0.11 0.08 0.05
## X1552482_at -0.20 0.25 -0.31 -0.16 0.21 -0.16 0.02 0.23 0.18 0.23 -0.23
## X1552486_s_at -0.17 0.05 0.53 -0.02 0.04 -0.04 -0.05 -0.05 -0.11 -0.20 0.22
## X1552491_at 0.54 -0.15 0.01 0.08 -0.17 -0.10 0.02 0.00 0.21 -0.08 -0.15
## X1552501_a_at 0.67 0.12 -0.01 0.05 0.02 0.12 0.11 0.08 0.00 0.14 0.00
## X1552516_a_at -0.32 0.01 0.67 -0.10 0.09 -0.10 0.05 0.01 0.02 0.10 -0.03
## X1552518_s_at -0.19 -0.27 0.28 -0.05 -0.39 0.00 0.06 0.20 0.01 0.07 0.06
## X1552523_a_at 0.52 -0.06 -0.28 0.06 -0.05 -0.03 0.10 0.03 0.25 -0.03 0.05
## X1552528_at 0.77 -0.12 -0.04 0.11 -0.04 0.02 -0.01 0.05 -0.02 0.02 -0.05
## X1552532_a_at -0.20 0.02 0.10 -0.16 -0.08 -0.02 -0.03 0.14 0.08 0.75 0.04
## X1552535_at 0.71 -0.04 -0.15 0.04 0.11 0.02 0.04 0.03 0.12 -0.04 0.04
## X1552538_a_at 0.05 -0.11 0.01 0.03 -0.11 -0.08 0.09 0.08 0.05 0.03 -0.09
## X1552555_at 0.78 0.00 0.01 -0.20 0.10 -0.01 0.07 -0.05 -0.07 -0.02 0.09
## X1552563_a_at -0.24 0.19 0.13 -0.23 -0.14 0.01 -0.10 0.07 0.04 0.12 -0.04
## X1552566_at 0.32 -0.15 0.23 0.04 0.01 0.10 0.04 0.03 0.27 0.12 0.13
## X1552569_a_at 0.34 -0.12 0.06 0.01 -0.13 0.08 0.08 -0.04 -0.07 -0.02 0.08
## X1552582_at 0.18 0.11 -0.10 0.15 0.10 0.00 0.04 -0.10 -0.08 -0.06 -0.09
## X1552585_s_at 0.52 -0.05 -0.13 -0.03 0.05 0.00 0.06 -0.01 0.04 -0.05 0.08
## X1552590_a_at 0.12 -0.19 0.06 0.01 0.40 -0.07 -0.09 0.02 0.06 0.07 -0.05
## X1552592_at 0.52 -0.03 -0.15 -0.07 0.19 -0.11 0.03 0.01 0.03 -0.15 0.07
## X1552594_at 0.28 -0.05 -0.12 0.02 0.16 -0.06 0.77 0.00 -0.04 -0.09 0.09
## X1552596_at 0.65 0.00 0.07 0.15 0.02 0.15 0.29 0.07 0.10 -0.10 -0.07
## X1552612_at -0.02 0.15 0.81 0.09 0.01 0.01 0.03 -0.06 -0.07 -0.08 0.11
## X1552623_at -0.01 0.31 -0.08 0.11 0.06 -0.05 0.18 -0.07 -0.11 -0.13 0.01
## MR12 MR5 MR16 MR9 MR28 MR21 MR4 MR39 MR29 MR24 MR13
## X1552584_at 0.03 0.09 -0.01 -0.19 0.08 0.01 -0.10 -0.02 -0.05 -0.03 -0.02
## X1556147_at 0.04 0.04 -0.05 0.73 0.00 -0.09 0.08 0.11 -0.02 -0.03 0.14
## X1557029_at 0.03 -0.16 -0.06 0.02 0.06 0.08 -0.02 -0.11 -0.03 0.18 -0.07
## X1557677_a_at 0.02 0.01 0.02 -0.06 -0.04 0.69 0.00 -0.04 0.09 -0.02 0.03
## X1559732_at 0.00 -0.09 0.00 -0.02 -0.02 0.02 0.02 0.00 -0.03 0.14 -0.04
## X1559789_a_at -0.05 -0.01 0.03 0.15 -0.05 -0.07 0.04 0.02 0.02 -0.01 0.00
## X1564573_at -0.07 0.08 0.04 0.12 -0.04 -0.16 0.09 0.03 -0.07 -0.14 0.02
## X1566879_at 0.11 -0.07 0.03 0.04 -0.08 0.19 -0.06 0.08 0.05 0.18 -0.02
## X1569983_at -0.02 0.03 0.00 -0.10 0.09 0.03 0.06 0.01 0.00 0.15 -0.02
## X201121_s_at -0.06 -0.03 -0.15 0.10 -0.10 0.00 0.05 0.07 0.00 0.02 0.11
## X202419_at -0.17 -0.16 0.22 0.02 -0.14 -0.03 0.03 0.00 -0.01 0.07 0.02
## X202425_x_at -0.03 -0.04 0.00 -0.12 0.00 -0.02 0.04 0.01 -0.05 0.02 -0.88
## X202867_s_at 0.06 -0.05 0.06 0.05 0.00 -0.05 -0.02 0.03 0.03 0.03 0.02
## X203006_at 0.07 -0.03 0.05 0.02 -0.02 0.05 0.02 -0.05 0.04 0.06 -0.06
## X203761_at 0.01 0.03 -0.11 -0.01 0.11 -0.05 -0.02 0.08 -0.06 0.01 0.06
## X204389_at 0.01 -0.10 0.04 0.08 0.00 0.14 0.03 -0.01 0.07 0.08 -0.03
## X205915_x_at 0.02 0.14 0.05 0.20 -0.17 0.22 0.00 -0.03 -0.05 0.08 -0.06
## X206024_at 0.02 0.00 -0.07 -0.10 -0.07 0.14 0.01 -0.01 0.11 0.04 -0.11
## X206396_at 0.01 0.07 0.11 0.02 0.01 -0.01 -0.01 0.14 0.02 0.09 -0.06
## X210516_at 0.02 -0.02 0.02 -0.05 0.03 0.07 0.07 0.05 0.73 0.09 0.05
## X212901_s_at -0.05 -0.01 -0.03 0.17 -0.02 0.07 0.03 0.06 0.14 -0.01 0.03
## X214894_x_at -0.08 -0.17 0.55 0.17 -0.13 0.03 0.08 0.05 0.05 0.09 -0.13
## X214936_at 0.04 -0.14 -0.02 0.16 0.13 -0.01 0.08 0.12 -0.10 0.18 -0.12
## X215282_at 0.10 -0.05 -0.13 0.12 -0.21 0.03 0.03 -0.02 0.43 -0.08 0.04
## X216097_at 0.02 0.02 0.03 -0.02 -0.06 -0.06 0.02 0.01 0.02 0.04 0.00
## X217272_s_at 0.01 0.03 -0.04 0.00 -0.10 0.04 0.00 0.09 -0.04 0.04 -0.04
## X220663_at 0.00 -0.09 0.53 -0.07 0.01 -0.02 -0.01 0.00 -0.01 -0.02 0.03
## X220965_s_at -0.02 0.10 -0.11 -0.04 -0.06 -0.03 -0.17 -0.02 0.16 -0.02 -0.04
## X222027_at -0.05 -0.05 -0.13 -0.17 0.03 0.21 -0.03 0.02 0.06 -0.09 -0.10
## X222184_at -0.03 0.02 0.00 0.12 -0.05 -0.03 0.65 0.01 0.18 0.16 -0.04
## X222371_at -0.05 0.01 -0.17 0.12 0.14 -0.05 0.06 0.19 -0.18 0.00 -0.02
## X226406_at -0.16 -0.18 0.16 0.04 -0.11 0.02 -0.10 0.03 -0.04 -0.09 0.05
## X227623_at -0.08 -0.23 0.22 0.12 -0.07 -0.08 0.06 -0.08 0.14 0.34 0.17
## X228041_at -0.10 0.19 -0.02 0.03 0.30 0.01 -0.05 -0.04 0.25 0.10 -0.08
## X230228_at 0.01 -0.12 0.06 0.03 -0.02 0.04 0.00 -0.04 -0.01 0.13 0.01
## X232968_at -0.03 -0.07 -0.03 0.11 0.17 0.04 -0.04 0.32 -0.09 -0.04 0.05
## X234774_at 0.03 0.04 -0.03 -0.04 0.03 0.06 0.13 -0.03 -0.01 0.00 0.00
## X239537_at -0.07 -0.01 0.00 -0.03 0.00 -0.04 0.11 -0.03 0.07 0.65 -0.03
## X239735_at 0.00 0.09 -0.23 0.04 0.15 -0.04 0.00 0.27 -0.09 0.02 0.12
## X241255_at 0.09 0.06 -0.01 -0.10 -0.06 -0.06 -0.02 0.00 -0.03 -0.01 0.08
## X241366_at 0.06 0.08 0.06 0.07 -0.05 -0.10 0.00 0.12 0.19 0.07 0.13
## X243537_at 0.00 -0.06 -0.13 0.09 0.13 -0.07 0.07 0.13 -0.10 -0.08 -0.07
## X243900_at -0.04 0.10 0.00 -0.06 -0.03 0.02 -0.03 -0.11 0.03 -0.05 0.03
## X117_at 0.00 0.12 0.11 -0.04 0.18 -0.26 -0.02 0.08 -0.02 0.08 0.13
## X1552261_at 0.04 -0.02 0.04 0.02 0.02 0.01 0.01 -0.03 0.03 -0.03 0.02
## X1552264_a_at -0.04 0.05 0.01 0.02 -0.03 0.03 0.03 -0.11 0.02 -0.06 -0.01
## X1552266_at 0.11 -0.24 -0.04 -0.03 -0.08 -0.05 0.50 -0.08 -0.07 0.03 0.03
## X1552271_at 0.47 0.03 -0.01 -0.02 0.01 0.11 -0.12 -0.05 -0.03 -0.10 -0.02
## X1552272_a_at 0.90 -0.01 -0.02 0.04 -0.07 0.00 0.03 0.02 0.04 -0.05 0.05
## X1552280_at -0.19 0.02 0.01 -0.08 0.01 0.10 0.06 -0.07 -0.08 0.02 0.09
## X1552281_at 0.01 0.00 0.07 -0.12 -0.06 0.02 -0.09 -0.01 -0.03 0.04 0.05
## X1552286_at -0.01 -0.10 -0.07 -0.15 -0.18 0.15 0.22 -0.18 0.10 -0.01 0.01
## X1552302_at -0.10 -0.07 -0.04 -0.09 0.59 -0.12 -0.06 0.08 0.02 -0.07 -0.01
## X1552303_a_at -0.10 -0.09 -0.07 0.08 0.61 -0.02 -0.11 0.11 -0.03 0.02 0.01
## X1552306_at -0.06 0.30 -0.27 0.07 0.01 0.09 0.08 0.09 0.13 -0.15 -0.20
## X1552311_a_at -0.09 0.00 -0.02 -0.03 0.07 -0.07 -0.03 -0.02 -0.08 -0.09 0.09
## X1552315_at -0.05 -0.06 0.00 0.00 0.01 0.06 0.02 0.01 -0.01 0.08 0.08
## X1552318_at -0.07 -0.14 -0.04 0.04 -0.02 0.01 0.00 0.02 -0.02 0.07 -0.03
## X1552323_s_at 0.03 -0.01 0.03 -0.02 -0.03 0.02 0.10 0.08 -0.01 0.03 -0.05
## X1552347_at -0.08 0.06 -0.08 0.11 0.13 -0.08 0.10 0.11 -0.03 -0.02 -0.09
## X1552349_a_at 0.02 0.00 0.09 -0.01 -0.01 0.01 0.00 -0.04 0.00 0.04 0.03
## X1552355_s_at 0.10 0.03 -0.03 -0.12 -0.10 0.02 0.02 0.02 -0.02 -0.13 -0.02
## X1552364_s_at -0.01 0.73 -0.19 0.05 -0.05 0.01 -0.12 -0.01 -0.04 -0.02 0.06
## X1552365_at -0.04 -0.03 -0.04 -0.01 0.06 -0.08 -0.04 0.13 -0.12 0.09 -0.07
## X1552377_s_at 0.01 -0.08 0.13 0.00 0.00 -0.05 -0.07 -0.06 -0.03 -0.06 0.05
## X1552386_at -0.08 -0.10 -0.05 0.14 -0.16 -0.04 -0.07 0.13 0.02 -0.25 0.12
## X1552388_at 0.06 0.04 0.21 0.06 -0.10 -0.02 -0.03 0.03 0.01 0.07 0.10
## X1552396_at -0.03 0.04 0.02 -0.19 0.07 0.16 -0.08 0.00 -0.02 -0.09 0.13
## X1552399_a_at 0.05 0.15 0.14 0.31 0.08 0.02 -0.04 0.06 -0.08 -0.03 0.08
## X1552402_at 0.09 0.16 0.19 0.11 -0.05 0.09 0.10 -0.32 0.04 -0.04 -0.04
## X1552412_a_at 0.01 0.01 -0.19 0.09 -0.03 -0.07 0.10 0.11 0.02 -0.01 0.02
## X1552426_a_at -0.03 -0.01 0.07 0.12 0.02 -0.18 0.02 -0.02 0.07 0.10 0.09
## X1552439_s_at 0.01 0.00 -0.03 0.02 -0.01 0.01 0.04 0.04 0.15 0.05 0.02
## X1552445_a_at 0.03 -0.06 -0.09 -0.01 -0.04 0.01 0.10 -0.07 -0.06 0.02 0.06
## X1552450_a_at -0.07 -0.05 -0.05 0.04 -0.11 0.04 -0.13 0.00 0.01 0.04 0.03
## X1552453_a_at 0.15 0.02 -0.04 -0.06 -0.09 0.00 -0.05 0.01 0.07 -0.04 0.06
## X1552472_a_at 0.02 0.16 -0.05 -0.03 0.05 0.04 -0.04 0.04 0.02 -0.04 -0.11
## X1552482_at -0.07 -0.08 0.05 -0.04 0.13 -0.05 -0.14 0.25 -0.09 0.01 0.10
## X1552486_s_at -0.09 0.04 0.19 -0.05 -0.04 -0.01 -0.16 -0.22 0.26 -0.11 0.03
## X1552491_at -0.04 0.03 -0.05 -0.12 0.11 0.25 0.04 -0.07 -0.04 0.01 -0.10
## X1552501_a_at 0.09 -0.11 0.03 -0.02 0.00 -0.05 -0.08 -0.11 0.04 0.11 0.00
## X1552516_a_at -0.17 0.05 -0.02 0.26 -0.03 -0.06 -0.11 0.03 0.03 -0.07 -0.03
## X1552518_s_at 0.20 0.18 -0.05 -0.05 0.07 0.02 0.08 -0.06 0.05 -0.01 -0.04
## X1552523_a_at 0.03 0.06 -0.10 0.01 -0.07 0.00 0.05 -0.01 -0.04 0.30 0.02
## X1552528_at 0.11 0.08 0.04 0.00 -0.07 0.01 0.11 0.00 0.01 -0.15 -0.05
## X1552532_a_at -0.03 -0.05 -0.05 0.04 -0.02 0.04 0.04 -0.02 0.01 -0.04 -0.02
## X1552535_at 0.04 -0.15 -0.02 0.12 0.03 0.00 0.00 -0.03 -0.13 0.17 0.00
## X1552538_a_at -0.02 -0.01 -0.02 -0.09 -0.03 0.03 -0.02 -0.69 -0.05 0.04 0.02
## X1552555_at -0.01 0.10 0.07 0.08 -0.01 -0.10 0.05 0.00 -0.01 0.00 0.08
## X1552563_a_at -0.10 -0.22 0.11 0.06 0.06 0.08 0.44 0.22 -0.06 0.03 -0.13
## X1552566_at 0.09 0.00 0.10 0.07 0.00 -0.11 0.05 -0.06 -0.12 -0.11 -0.09
## X1552569_a_at 0.04 0.04 0.11 -0.02 0.03 0.09 -0.07 -0.05 0.00 0.03 0.00
## X1552582_at 0.06 0.20 -0.01 0.00 -0.02 -0.11 0.04 0.05 0.01 0.12 0.13
## X1552585_s_at 0.00 -0.03 -0.04 -0.04 0.05 -0.05 0.00 0.01 -0.10 0.13 -0.04
## X1552590_a_at 0.00 -0.01 -0.13 -0.11 0.04 0.51 -0.07 0.04 -0.14 -0.09 0.01
## X1552592_at 0.08 0.12 0.18 0.03 -0.04 0.01 0.08 -0.12 0.00 0.08 0.21
## X1552594_at 0.09 -0.01 0.01 0.11 -0.02 -0.01 -0.04 -0.01 -0.02 0.00 0.00
## X1552596_at 0.06 -0.13 -0.01 -0.04 0.07 0.00 0.02 0.06 0.05 0.02 0.03
## X1552612_at -0.09 0.04 -0.06 -0.14 -0.02 0.06 -0.11 0.13 0.03 -0.03 0.09
## X1552623_at 0.03 0.05 -0.10 -0.02 0.07 0.00 -0.14 -0.01 0.01 -0.16 0.06
## MR22 MR17 MR40 MR38 MR11 MR37 MR26 MR6 MR35 MR19 MR25
## X1552584_at 0.13 -0.09 0.04 -0.06 0.06 -0.09 0.00 0.04 -0.07 0.02 -0.07
## X1556147_at -0.03 0.03 0.09 0.13 0.03 -0.01 -0.07 -0.02 -0.02 -0.03 -0.05
## X1557029_at -0.06 -0.07 0.04 0.04 0.12 -0.06 0.00 0.07 -0.02 0.19 0.12
## X1557677_a_at -0.07 0.02 -0.05 -0.16 0.03 -0.05 -0.02 -0.02 0.03 -0.07 -0.01
## X1559732_at -0.04 0.03 0.02 0.01 -0.03 0.06 0.13 -0.06 0.75 0.02 0.01
## X1559789_a_at 0.02 -0.08 -0.02 -0.01 -0.05 0.09 0.06 -0.12 0.09 0.03 0.02
## X1564573_at -0.01 -0.01 0.64 -0.03 0.01 -0.11 0.01 -0.08 0.01 -0.07 0.01
## X1566879_at 0.09 -0.05 0.00 0.04 -0.06 -0.08 -0.03 0.05 -0.02 -0.13 0.02
## X1569983_at -0.07 0.01 0.02 -0.01 0.05 -0.14 0.71 0.04 0.19 -0.01 -0.07
## X201121_s_at 0.03 0.02 -0.04 0.06 0.03 0.04 -0.07 -0.13 -0.10 -0.14 -0.07
## X202419_at 0.11 -0.05 -0.09 0.20 0.15 0.07 0.11 0.02 0.03 -0.11 0.26
## X202425_x_at -0.06 -0.02 -0.04 -0.09 -0.02 0.07 0.01 -0.03 0.04 0.00 -0.03
## X202867_s_at 0.06 -0.03 0.07 -0.04 -0.10 -0.01 0.08 0.02 0.05 0.03 0.06
## X203006_at -0.75 -0.05 -0.01 -0.05 0.02 0.01 0.04 -0.07 0.04 0.06 -0.06
## X203761_at 0.11 -0.09 -0.05 0.02 -0.05 0.11 0.01 -0.09 -0.14 -0.02 -0.03
## X204389_at -0.08 0.15 0.04 0.04 -0.03 -0.08 -0.03 -0.05 0.09 0.03 0.00
## X205915_x_at -0.01 0.08 -0.09 -0.17 0.05 -0.04 -0.10 0.09 0.04 -0.03 0.05
## X206024_at -0.17 0.06 0.08 0.01 0.03 0.05 0.07 -0.09 0.10 0.03 0.23
## X206396_at 0.07 0.06 0.09 -0.04 0.05 0.16 0.00 0.10 -0.01 -0.09 -0.04
## X210516_at -0.04 0.04 -0.02 0.08 0.14 -0.11 0.02 0.04 -0.05 0.01 0.04
## X212901_s_at 0.09 -0.10 0.10 -0.12 0.03 0.13 0.08 -0.06 -0.12 0.04 0.06
## X214894_x_at 0.03 0.09 0.15 0.06 0.05 0.08 -0.02 -0.05 -0.04 -0.10 0.03
## X214936_at 0.11 -0.08 -0.10 0.02 -0.11 -0.06 -0.05 -0.15 -0.03 -0.04 -0.11
## X215282_at -0.05 -0.03 -0.12 -0.03 0.30 -0.07 -0.06 -0.18 0.10 0.09 0.05
## X216097_at -0.05 0.05 -0.03 0.04 0.05 -0.02 -0.01 0.05 0.02 0.75 -0.04
## X217272_s_at 0.01 -0.02 0.02 -0.05 0.00 -0.03 0.02 0.01 0.01 0.03 0.05
## X220663_at -0.05 0.01 0.03 0.03 -0.05 -0.05 0.00 -0.04 0.01 0.05 -0.03
## X220965_s_at 0.03 0.22 0.08 -0.04 0.01 -0.04 0.19 -0.03 -0.08 -0.03 0.04
## X222027_at 0.09 0.08 -0.06 0.07 0.09 -0.05 0.11 -0.11 -0.06 0.28 -0.15
## X222184_at -0.16 -0.01 0.14 -0.02 0.00 0.05 0.05 -0.03 0.04 -0.03 -0.04
## X222371_at 0.04 0.03 -0.12 0.14 -0.05 0.01 0.02 0.01 -0.05 -0.03 -0.10
## X226406_at 0.25 0.02 0.14 0.13 0.10 -0.09 0.04 0.14 -0.12 -0.01 0.10
## X227623_at -0.04 -0.21 0.05 0.18 -0.17 0.16 0.07 -0.21 0.02 -0.06 0.01
## X228041_at 0.00 -0.05 -0.01 0.20 -0.04 -0.21 -0.16 0.01 -0.09 0.05 -0.06
## X230228_at -0.13 0.10 0.21 0.03 0.02 0.24 0.17 -0.06 0.07 0.01 -0.12
## X232968_at -0.21 0.04 0.15 0.13 -0.13 -0.01 0.02 -0.03 -0.04 -0.02 0.10
## X234774_at 0.01 -0.02 0.18 -0.02 0.04 -0.12 -0.09 0.13 0.02 0.06 0.07
## X239537_at -0.05 -0.03 -0.04 0.10 0.05 0.09 0.10 -0.08 0.14 0.04 -0.01
## X239735_at 0.05 -0.01 -0.02 0.16 -0.13 0.00 0.09 0.00 -0.05 -0.07 -0.04
## X241255_at -0.03 0.04 -0.05 0.00 0.06 -0.02 0.03 0.00 0.03 0.05 0.04
## X241366_at 0.23 0.01 -0.07 -0.05 0.08 -0.07 0.13 -0.02 0.02 -0.21 0.05
## X243537_at -0.01 0.11 -0.10 -0.04 0.10 0.00 0.11 0.02 0.03 -0.05 0.00
## X243900_at 0.03 0.05 0.00 -0.06 -0.04 0.00 0.05 0.04 0.05 0.00 0.06
## X117_at 0.05 -0.12 0.03 0.13 0.06 0.11 0.04 0.03 -0.14 -0.15 0.03
## X1552261_at 0.05 0.83 0.03 0.10 0.14 0.03 0.01 0.04 0.03 0.06 0.03
## X1552264_a_at 0.02 -0.10 0.03 -0.02 0.09 0.15 -0.01 -0.06 -0.05 0.10 -0.01
## X1552266_at 0.13 0.07 -0.08 0.08 0.26 -0.14 -0.01 -0.16 -0.04 0.07 0.03
## X1552271_at -0.08 -0.06 0.06 0.03 0.03 0.01 -0.08 0.08 -0.01 -0.07 0.03
## X1552272_a_at -0.06 0.06 -0.03 -0.04 0.02 -0.04 0.00 0.00 0.01 0.04 0.09
## X1552280_at 0.02 0.15 -0.03 -0.02 0.02 -0.04 0.07 0.07 -0.23 0.03 0.04
## X1552281_at -0.03 -0.01 0.07 0.06 -0.06 0.13 0.15 -0.07 -0.05 -0.12 0.04
## X1552286_at 0.20 0.04 -0.09 0.04 0.14 -0.11 -0.02 0.11 0.01 -0.02 -0.15
## X1552302_at -0.01 0.02 -0.07 0.05 -0.01 0.11 0.21 0.10 -0.01 -0.12 -0.09
## X1552303_a_at 0.06 0.05 -0.12 0.00 -0.05 0.11 0.05 0.07 -0.03 -0.09 -0.12
## X1552306_at -0.05 -0.01 0.04 0.09 -0.10 -0.08 -0.21 0.02 0.04 0.04 -0.27
## X1552311_a_at 0.07 -0.03 0.06 -0.18 0.03 -0.04 -0.04 0.12 0.11 0.01 -0.05
## X1552315_at 0.01 0.10 -0.03 0.04 -0.02 0.01 -0.02 0.09 0.20 -0.01 0.00
## X1552318_at 0.05 0.04 -0.06 0.01 -0.10 -0.04 0.00 0.06 0.16 -0.08 -0.03
## X1552323_s_at -0.01 0.04 0.06 -0.03 0.05 0.05 -0.06 0.01 0.02 -0.10 0.01
## X1552347_at 0.15 -0.07 -0.03 0.07 0.03 0.01 0.03 0.12 0.08 0.08 0.00
## X1552349_a_at 0.01 0.06 -0.01 0.08 0.03 -0.01 0.02 -0.10 0.01 0.04 0.03
## X1552355_s_at 0.07 -0.15 -0.03 -0.13 0.10 -0.07 0.05 -0.05 -0.02 0.02 0.11
## X1552364_s_at 0.04 -0.02 0.05 -0.01 0.00 -0.04 0.03 0.03 -0.12 0.02 0.01
## X1552365_at -0.01 0.02 -0.07 0.01 -0.03 0.72 -0.10 -0.06 0.06 -0.02 0.07
## X1552377_s_at 0.03 0.15 -0.06 0.05 0.00 0.02 0.06 -0.01 0.04 0.05 -0.02
## X1552386_at 0.00 -0.03 0.10 0.08 -0.12 0.11 -0.11 0.11 -0.12 -0.13 -0.03
## X1552388_at 0.03 0.13 0.48 0.01 -0.12 -0.03 0.03 0.12 0.03 0.00 0.08
## X1552396_at 0.04 0.09 -0.15 -0.02 -0.01 0.02 0.10 -0.03 -0.14 -0.07 0.00
## X1552399_a_at 0.01 -0.03 -0.06 -0.04 -0.03 -0.09 -0.04 0.09 -0.07 0.00 0.06
## X1552402_at -0.08 -0.01 0.05 -0.11 -0.10 -0.01 -0.06 0.08 -0.05 0.08 0.19
## X1552412_a_at -0.06 0.00 0.10 0.01 -0.11 0.13 -0.07 0.01 -0.05 -0.15 0.10
## X1552426_a_at 0.05 0.11 -0.01 0.76 0.13 0.00 -0.01 0.02 0.02 0.05 -0.06
## X1552439_s_at -0.02 0.13 -0.01 0.11 0.75 -0.02 0.04 0.04 -0.03 0.04 -0.01
## X1552445_a_at 0.00 -0.04 0.03 -0.04 -0.04 -0.04 -0.05 0.04 -0.02 0.07 0.01
## X1552450_a_at 0.11 0.08 0.14 0.04 -0.04 0.06 0.22 -0.16 -0.02 -0.03 0.05
## X1552453_a_at 0.11 0.05 0.04 -0.10 -0.03 0.09 -0.10 0.04 0.01 -0.07 0.67
## X1552472_a_at 0.06 0.01 -0.15 0.12 -0.04 -0.14 -0.19 0.08 -0.06 0.11 -0.10
## X1552482_at -0.02 0.06 -0.11 0.23 0.06 0.08 0.02 -0.19 0.02 -0.03 -0.05
## X1552486_s_at 0.15 -0.17 0.08 0.04 0.02 -0.02 -0.07 0.01 -0.08 -0.12 0.15
## X1552491_at 0.09 0.06 0.20 0.01 -0.01 -0.07 0.16 -0.01 0.00 -0.03 -0.10
## X1552501_a_at 0.04 -0.02 0.09 0.08 0.04 0.03 0.12 0.04 0.18 0.02 0.16
## X1552516_a_at 0.10 0.05 0.00 -0.02 -0.01 0.00 -0.06 -0.06 -0.03 0.05 -0.06
## X1552518_s_at 0.08 -0.11 -0.13 0.05 0.18 -0.03 -0.10 0.14 0.02 0.02 -0.04
## X1552523_a_at -0.11 0.08 -0.02 -0.03 0.09 -0.11 -0.04 -0.02 0.09 0.04 -0.09
## X1552528_at 0.11 0.05 -0.01 -0.07 -0.10 -0.02 -0.05 -0.08 0.06 0.06 0.01
## X1552532_a_at -0.01 -0.01 -0.07 0.08 0.02 -0.04 0.00 -0.09 -0.08 -0.02 -0.02
## X1552535_at -0.11 0.04 -0.03 0.07 0.05 -0.07 0.08 -0.06 0.18 0.00 -0.08
## X1552538_a_at -0.05 0.03 -0.02 0.03 -0.04 -0.11 0.00 -0.01 0.00 -0.01 -0.01
## X1552555_at -0.11 0.06 -0.05 -0.05 -0.11 -0.13 -0.04 0.05 -0.11 0.08 0.16
## X1552563_a_at 0.12 -0.08 -0.03 -0.02 -0.15 -0.06 0.06 -0.05 0.03 0.06 -0.07
## X1552566_at -0.21 0.15 -0.10 -0.22 0.00 -0.14 -0.18 0.15 -0.03 -0.14 0.11
## X1552569_a_at 0.00 0.06 -0.04 -0.10 0.02 -0.05 -0.09 0.02 0.02 0.09 0.00
## X1552582_at 0.11 0.01 0.08 -0.01 -0.06 0.16 0.10 -0.05 0.07 0.02 0.15
## X1552585_s_at -0.07 -0.12 -0.03 0.01 -0.01 0.03 0.02 -0.06 -0.01 0.01 0.04
## X1552590_a_at 0.07 -0.10 -0.14 0.06 -0.06 -0.07 0.19 0.14 -0.11 0.04 0.10
## X1552592_at -0.02 -0.04 -0.04 0.00 -0.03 0.15 -0.01 0.01 -0.05 0.18 0.04
## X1552594_at 0.03 0.07 0.08 -0.04 -0.01 0.00 -0.01 0.09 -0.05 -0.09 0.00
## X1552596_at 0.03 0.09 0.08 0.03 0.08 0.04 0.08 0.08 0.13 -0.08 -0.08
## X1552612_at 0.03 0.05 0.06 0.03 0.04 -0.07 0.06 0.05 -0.05 0.04 0.07
## X1552623_at 0.12 0.06 -0.02 0.02 0.05 -0.10 0.04 0.74 -0.10 0.08 0.04
## MR32 MR34 MR18 MR30 MR31 MR27 MR23 h2 u2 com
## X1552584_at 0.02 -0.02 0.01 0.08 0.05 0.01 -0.12 0.92 0.079 2.0
## X1556147_at -0.01 -0.05 -0.11 0.01 0.00 -0.02 -0.01 0.70 0.303 1.7
## X1557029_at 0.08 -0.08 -0.02 0.02 0.08 0.02 0.06 0.89 0.115 3.0
## X1557677_a_at -0.01 0.00 -0.05 -0.01 0.03 -0.04 0.02 0.58 0.418 1.5
## X1559732_at 0.02 0.04 0.03 0.04 0.01 0.00 0.00 0.67 0.329 1.4
## X1559789_a_at -0.05 0.03 0.06 -0.02 -0.09 0.02 -0.02 0.74 0.256 1.5
## X1564573_at 0.01 0.07 -0.05 -0.09 -0.02 0.01 -0.04 0.75 0.253 3.1
## X1566879_at 0.09 -0.01 0.09 0.10 0.05 0.02 -0.15 0.74 0.261 3.2
## X1569983_at -0.01 0.05 0.04 0.06 -0.08 0.01 0.03 0.76 0.240 2.2
## X201121_s_at 0.05 -0.61 -0.09 0.04 0.01 -0.02 0.00 0.69 0.315 3.3
## X202419_at 0.25 0.09 0.07 0.04 -0.02 0.01 -0.04 0.80 0.196 13.5
## X202425_x_at 0.02 0.06 -0.08 -0.05 -0.01 0.02 -0.01 0.86 0.142 1.3
## X202867_s_at 0.08 0.03 -0.11 -0.04 -0.04 -0.02 0.07 0.63 0.369 2.2
## X203006_at 0.01 0.03 0.03 -0.03 0.01 0.03 0.00 0.69 0.312 1.5
## X203761_at -0.04 -0.03 0.00 0.04 0.04 -0.01 -0.04 0.94 0.062 2.0
## X204389_at -0.05 -0.03 0.07 0.01 -0.06 0.07 0.06 0.77 0.233 1.9
## X205915_x_at 0.08 -0.01 0.12 -0.04 -0.02 0.15 0.02 0.74 0.259 3.7
## X206024_at 0.26 -0.12 0.07 0.01 -0.02 0.10 -0.08 0.46 0.542 11.6
## X206396_at 0.00 -0.10 -0.05 0.00 -0.07 -0.02 0.05 0.66 0.338 2.2
## X210516_at 0.02 -0.02 -0.05 -0.04 -0.03 -0.05 -0.01 0.70 0.301 1.7
## X212901_s_at 0.25 -0.05 0.23 -0.18 0.05 -0.07 0.13 0.73 0.274 12.1
## X214894_x_at -0.19 0.07 0.05 0.08 0.08 0.02 0.08 0.78 0.224 5.5
## X214936_at 0.12 0.11 -0.08 0.06 -0.09 -0.04 0.07 0.77 0.228 4.7
## X215282_at 0.06 0.15 0.11 0.17 0.15 -0.01 0.10 0.71 0.294 9.7
## X216097_at 0.02 0.07 0.05 0.03 0.06 0.00 0.00 0.67 0.334 1.4
## X217272_s_at 0.05 0.03 -0.09 -0.05 0.10 -0.06 -0.04 0.73 0.267 1.5
## X220663_at 0.05 0.05 -0.03 0.03 0.04 -0.02 -0.02 0.36 0.638 1.7
## X220965_s_at -0.05 -0.10 -0.04 -0.15 0.10 0.07 -0.09 0.68 0.316 4.0
## X222027_at -0.01 -0.07 0.07 -0.13 -0.21 -0.01 0.03 0.84 0.157 10.0
## X222184_at -0.01 0.01 -0.02 -0.05 -0.05 0.04 -0.10 0.70 0.303 2.7
## X222371_at -0.16 -0.11 0.05 -0.04 -0.05 -0.01 0.03 0.93 0.074 6.6
## X226406_at 0.12 -0.12 0.04 -0.10 0.08 -0.19 0.09 0.84 0.157 8.1
## X227623_at -0.01 0.19 -0.04 0.06 0.07 0.10 -0.05 0.83 0.167 15.3
## X228041_at -0.04 0.09 0.16 0.02 -0.04 -0.01 -0.01 0.73 0.275 8.2
## X230228_at -0.05 0.27 -0.09 -0.02 -0.03 -0.05 -0.09 0.69 0.308 5.7
## X232968_at 0.02 0.05 0.02 -0.01 -0.05 -0.16 0.03 0.83 0.174 8.4
## X234774_at 0.04 0.00 0.04 0.08 0.10 0.04 -0.31 0.83 0.173 3.6
## X239537_at 0.00 -0.02 -0.01 -0.02 0.03 0.07 0.02 0.56 0.436 1.7
## X239735_at -0.44 0.12 -0.15 -0.05 -0.08 -0.03 0.04 0.87 0.130 9.3
## X241255_at 0.03 0.04 0.75 -0.01 -0.04 0.00 0.00 0.68 0.319 1.4
## X241366_at 0.08 0.04 -0.01 0.02 -0.05 -0.03 0.08 0.86 0.145 5.0
## X243537_at -0.16 0.11 -0.12 0.00 -0.08 -0.03 0.02 0.79 0.214 4.7
## X243900_at -0.01 0.04 -0.02 -0.02 0.03 0.03 0.01 0.88 0.119 1.6
## X117_at -0.03 -0.05 0.02 -0.14 0.05 0.12 -0.05 0.70 0.302 6.9
## X1552261_at -0.01 0.00 0.04 0.08 0.04 -0.06 0.01 0.80 0.197 1.3
## X1552264_a_at 0.03 -0.11 0.00 0.02 0.02 0.00 -0.06 0.90 0.099 2.0
## X1552266_at 0.06 -0.02 -0.05 -0.06 -0.02 -0.05 0.13 0.59 0.413 4.7
## X1552271_at -0.02 0.01 0.07 0.03 0.02 0.03 -0.10 0.77 0.234 3.2
## X1552272_a_at 0.01 0.03 0.09 0.01 0.03 0.00 0.01 0.94 0.064 1.3
## X1552280_at 0.05 -0.17 0.03 0.17 -0.03 0.09 0.29 0.67 0.334 6.0
## X1552281_at -0.05 0.07 -0.03 0.05 0.08 0.03 -0.03 0.85 0.145 1.6
## X1552286_at 0.17 -0.20 0.08 0.06 0.01 -0.04 0.05 0.71 0.293 11.4
## X1552302_at -0.07 0.09 -0.13 -0.02 0.01 0.03 0.07 0.93 0.075 4.8
## X1552303_a_at -0.01 0.08 -0.11 0.00 0.06 0.06 -0.05 0.90 0.096 4.1
## X1552306_at 0.14 0.01 -0.07 -0.18 0.08 0.17 -0.04 0.77 0.228 12.8
## X1552311_a_at -0.06 0.13 -0.06 0.01 -0.12 0.01 -0.02 0.74 0.256 2.8
## X1552315_at 0.01 0.08 0.01 0.03 -0.07 -0.02 0.02 0.92 0.083 1.4
## X1552318_at -0.03 0.06 0.00 0.00 -0.09 -0.04 0.08 0.91 0.092 1.5
## X1552323_s_at 0.01 -0.05 0.01 0.05 0.05 0.05 -0.01 0.84 0.157 1.4
## X1552347_at -0.12 0.04 0.10 0.05 -0.22 0.03 0.21 0.74 0.255 4.2
## X1552349_a_at -0.10 0.04 0.01 0.04 -0.07 0.00 0.05 0.83 0.168 1.6
## X1552355_s_at 0.04 0.02 0.10 0.08 0.14 -0.05 0.00 0.79 0.214 3.2
## X1552364_s_at -0.01 0.01 0.07 -0.02 0.02 -0.02 0.00 0.72 0.280 1.8
## X1552365_at 0.01 -0.01 -0.01 0.04 -0.03 0.02 0.01 0.65 0.348 1.6
## X1552377_s_at 0.04 -0.07 -0.02 0.57 -0.08 0.11 0.01 0.62 0.384 3.1
## X1552386_at 0.07 0.11 -0.01 0.28 0.03 -0.17 0.05 0.79 0.213 6.5
## X1552388_at -0.01 -0.02 -0.07 0.10 -0.04 -0.07 0.06 0.63 0.366 5.7
## X1552396_at -0.03 -0.06 -0.04 0.02 -0.14 0.07 0.00 0.58 0.419 7.7
## X1552399_a_at -0.28 -0.03 -0.24 0.00 -0.20 -0.08 -0.01 0.71 0.288 10.2
## X1552402_at 0.13 -0.13 -0.03 -0.03 0.07 -0.05 0.06 0.73 0.268 4.5
## X1552412_a_at 0.03 0.23 -0.07 0.12 0.05 0.17 0.00 0.55 0.448 6.6
## X1552426_a_at -0.01 -0.04 -0.01 0.03 -0.07 0.00 0.01 0.73 0.270 1.6
## X1552439_s_at 0.02 -0.03 0.06 -0.02 0.01 0.00 -0.01 0.66 0.339 1.4
## X1552445_a_at 0.11 -0.11 -0.03 -0.02 0.03 0.00 -0.05 0.85 0.147 1.5
## X1552450_a_at 0.10 0.06 0.04 0.14 0.04 0.18 0.09 0.79 0.208 2.6
## X1552453_a_at 0.02 0.07 0.05 0.00 0.00 0.03 0.01 0.74 0.257 2.6
## X1552472_a_at 0.16 -0.07 -0.04 -0.18 0.16 -0.02 0.07 0.87 0.128 3.4
## X1552482_at -0.10 0.09 -0.12 -0.05 -0.06 0.01 0.11 0.79 0.211 16.7
## X1552486_s_at 0.15 -0.02 -0.11 0.03 0.11 -0.04 -0.17 0.81 0.188 6.7
## X1552491_at 0.07 -0.12 -0.08 0.05 0.03 0.03 0.14 0.68 0.317 5.0
## X1552501_a_at -0.05 0.03 0.04 -0.03 0.13 0.07 0.33 0.80 0.196 3.0
## X1552516_a_at -0.11 -0.06 -0.15 0.10 -0.18 0.06 0.07 0.84 0.163 3.2
## X1552518_s_at 0.33 -0.18 0.00 0.01 -0.03 -0.03 0.11 0.75 0.252 10.1
## X1552523_a_at 0.11 0.16 0.01 0.01 -0.16 -0.18 0.03 0.71 0.289 5.3
## X1552528_at -0.08 -0.04 0.08 -0.04 0.02 0.00 0.03 0.75 0.251 1.6
## X1552532_a_at 0.02 -0.10 0.02 -0.03 -0.02 -0.03 -0.01 0.72 0.276 1.7
## X1552535_at 0.00 0.07 0.09 0.10 0.05 -0.12 -0.02 0.77 0.233 2.3
## X1552538_a_at 0.02 0.04 0.00 0.02 0.02 -0.01 0.01 0.58 0.424 1.4
## X1552555_at 0.12 -0.10 0.04 0.03 0.05 0.09 -0.08 0.86 0.138 1.9
## X1552563_a_at -0.05 -0.10 0.05 0.12 -0.01 -0.02 0.03 0.66 0.340 8.4
## X1552566_at 0.00 -0.03 0.05 0.11 0.09 -0.04 -0.22 0.67 0.333 15.1
## X1552569_a_at 0.02 -0.01 -0.06 -0.04 0.58 0.03 0.00 0.59 0.413 2.7
## X1552582_at -0.10 0.10 -0.01 0.31 0.14 -0.11 -0.06 0.45 0.549 12.5
## X1552585_s_at 0.02 0.02 0.01 0.06 0.04 0.69 0.02 0.85 0.154 2.4
## X1552590_a_at 0.09 -0.01 -0.03 -0.26 0.16 0.08 -0.12 0.81 0.189 6.4
## X1552592_at 0.00 0.16 -0.01 0.01 -0.06 -0.04 0.03 0.60 0.404 4.4
## X1552594_at 0.01 0.05 0.05 0.07 0.04 0.01 -0.02 0.81 0.195 1.8
## X1552596_at -0.22 -0.10 0.01 -0.04 -0.11 0.02 0.05 0.76 0.242 3.1
## X1552612_at -0.19 0.03 0.00 0.01 -0.10 -0.10 0.07 0.87 0.127 1.8
## X1552623_at -0.01 0.11 -0.01 -0.02 0.02 -0.05 0.00 0.88 0.123 2.4
##
## MR1 MR3 MR2 MR15 MR10 MR7 MR14 MR8 MR20 MR33 MR36
## SS loadings 10.82 6.50 6.00 2.75 2.25 2.05 1.97 1.90 1.89 1.69 1.57
## Proportion Var 0.11 0.07 0.06 0.03 0.02 0.02 0.02 0.02 0.02 0.02 0.02
## Cumulative Var 0.11 0.17 0.23 0.26 0.28 0.30 0.32 0.34 0.36 0.38 0.39
## Proportion Explained 0.14 0.09 0.08 0.04 0.03 0.03 0.03 0.03 0.03 0.02 0.02
## Cumulative Proportion 0.14 0.23 0.31 0.35 0.38 0.41 0.43 0.46 0.48 0.51 0.53
## MR12 MR5 MR16 MR9 MR28 MR21 MR4 MR39 MR29 MR24 MR13
## SS loadings 1.51 1.45 1.42 1.40 1.39 1.37 1.36 1.36 1.32 1.30 1.30
## Proportion Var 0.02 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01
## Cumulative Var 0.41 0.42 0.44 0.45 0.47 0.48 0.49 0.51 0.52 0.53 0.55
## Proportion Explained 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02
## Cumulative Proportion 0.55 0.57 0.58 0.60 0.62 0.64 0.66 0.68 0.69 0.71 0.73
## MR22 MR17 MR40 MR38 MR11 MR37 MR26 MR6 MR35 MR19 MR25
## SS loadings 1.30 1.27 1.26 1.25 1.23 1.22 1.21 1.18 1.17 1.16 1.12
## Proportion Var 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01
## Cumulative Var 0.56 0.57 0.58 0.60 0.61 0.62 0.63 0.64 0.66 0.67 0.68
## Proportion Explained 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.01
## Cumulative Proportion 0.75 0.76 0.78 0.80 0.81 0.83 0.85 0.86 0.88 0.89 0.91
## MR32 MR34 MR18 MR30 MR31 MR27 MR23
## SS loadings 1.12 1.11 1.08 1.02 0.95 0.92 0.74
## Proportion Var 0.01 0.01 0.01 0.01 0.01 0.01 0.01
## Cumulative Var 0.69 0.70 0.71 0.72 0.73 0.74 0.75
## Proportion Explained 0.01 0.01 0.01 0.01 0.01 0.01 0.01
## Cumulative Proportion 0.92 0.94 0.95 0.97 0.98 0.99 1.00
##
## Mean item complexity = 4.5
## Test of the hypothesis that 40 factors are sufficient.
##
## df null model = 4950 with the objective function = 90.69 with Chi Square = 10504.89
## df of the model are 1730 and the objective function was 19.16
##
## The root mean square of the residuals (RMSR) is 0.01
## The df corrected root mean square of the residuals is 0.02
##
## The harmonic n.obs is 151 with the empirical chi square 253.3 with prob < 1
## The total n.obs was 151 with Likelihood Chi Square = 1708.48 with prob < 0.64
##
## Tucker Lewis Index of factoring reliability = 1.02
## RMSEA index = 0 and the 90 % confidence intervals are 0 0.017
## BIC = -6971.41
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy
## MR1 MR3 MR2 MR15 MR10 MR7
## Correlation of (regression) scores with factors 0.98 0.99 0.98 0.93 0.93 0.93
## Multiple R square of scores with factors 0.96 0.97 0.96 0.87 0.86 0.86
## Minimum correlation of possible factor scores 0.92 0.95 0.91 0.75 0.71 0.72
## MR14 MR8 MR20 MR33 MR36 MR12
## Correlation of (regression) scores with factors 0.95 0.95 0.94 0.91 0.93 0.98
## Multiple R square of scores with factors 0.91 0.90 0.89 0.83 0.86 0.96
## Minimum correlation of possible factor scores 0.82 0.79 0.78 0.67 0.71 0.91
## MR5 MR16 MR9 MR28 MR21 MR4
## Correlation of (regression) scores with factors 0.90 0.93 0.91 0.93 0.90 0.88
## Multiple R square of scores with factors 0.81 0.86 0.82 0.86 0.80 0.78
## Minimum correlation of possible factor scores 0.63 0.71 0.65 0.72 0.61 0.56
## MR39 MR29 MR24 MR13 MR22 MR17
## Correlation of (regression) scores with factors 0.89 0.89 0.89 0.93 0.90 0.91
## Multiple R square of scores with factors 0.79 0.80 0.79 0.86 0.80 0.83
## Minimum correlation of possible factor scores 0.58 0.60 0.58 0.73 0.61 0.66
## MR40 MR38 MR11 MR37 MR26 MR6
## Correlation of (regression) scores with factors 0.89 0.89 0.89 0.90 0.89 0.92
## Multiple R square of scores with factors 0.79 0.80 0.78 0.80 0.80 0.84
## Minimum correlation of possible factor scores 0.58 0.60 0.57 0.61 0.60 0.68
## MR35 MR19 MR25 MR32 MR34 MR18
## Correlation of (regression) scores with factors 0.88 0.89 0.88 0.90 0.87 0.87
## Multiple R square of scores with factors 0.78 0.80 0.78 0.82 0.75 0.77
## Minimum correlation of possible factor scores 0.57 0.59 0.56 0.63 0.50 0.53
## MR30 MR31 MR27 MR23
## Correlation of (regression) scores with factors 0.88 0.89 0.90 0.88
## Multiple R square of scores with factors 0.77 0.79 0.81 0.77
## Minimum correlation of possible factor scores 0.55 0.59 0.63 0.54
The gap between the eigenvalue-based factor selection and the cumulative variance reported by Factor Analysis (FA) is expected and follows from how FA handles variance. Eigenvalue-based selection treats all variance as shared and attributes it directly to the factors, whereas FA separates out each variable's unique variance (the u2 column) and models only the common variance (the h2 column), which naturally yields a lower cumulative figure. Factor rotation only redistributes this common variance among the factors (an orthogonal rotation leaves the total unchanged). That is why the 40 factors explain only 75% of the variance rather than the 85% expected from the cumulative variance proportion. Although this is a smaller share of the variance, it is still acceptable.
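The difference between the two variance figures can be reproduced on a small scale. The sketch below is an illustration under stated assumptions, not the analysis run in this report: the data are simulated (six variables driven by two latent factors), and base R's `factanal()` (maximum likelihood) stands in for the minimum-residual fit shown above. All object names are hypothetical.

```r
# Simulated data (assumption for illustration): six variables, two latent factors
set.seed(42)
n <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)
X <- cbind(f1 + rnorm(n), f1 + rnorm(n), f1 + rnorm(n),
           f2 + rnorm(n), f2 + rnorm(n), f2 + rnorm(n))
colnames(X) <- paste0("v", 1:6)

# Eigenvalue view: share of total variance carried by the first two eigenvalues
eig <- eigen(cor(X))$values
eigen_share <- sum(eig[1:2]) / length(eig)

# Factor-analysis view: only common variance counts, i.e. the sum of squared
# loadings (equivalently, the sum of communalities) over the number of variables
fa_fit <- factanal(X, factors = 2)
loadings_mat <- unclass(fa_fit$loadings)
fa_share <- sum(loadings_mat^2) / ncol(X)

cat("Share from first two eigenvalues:", round(eigen_share, 2), "\n")
cat("Share from two common factors:   ", round(fa_share, 2), "\n")
```

In this simulated setting the factor-analysis share comes out lower because each variable's unique variance is excluded, which mirrors the drop from the eigenvalue-based expectation to the Cumulative Var row in the output above.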
Heatmap of factor loadings
heatmap.2(abs(fa_result$loadings), trace = "none", density.info = "none",
col = colorRampPalette(c("white", "blue"))(100), main = "Factor Loadings Heatmap")
This heatmap visualizes factor loadings from Factor Analysis, showing the strength of association between genes (rows) and factors (columns). Darker blue shades indicate stronger loadings, meaning certain genes contribute significantly to specific factors. Some factors, like MR1 and MR3, have multiple genes with high loadings, suggesting they capture major variance patterns, while others, like MR9 or MR8, have fewer strong associations, indicating more specific effects. The dendrogram on the left highlights gene clustering, revealing potential co-regulation.
PCA is a dimension reduction technique that summarizes the dataset by identifying the directions (principal components) that capture the maximum variance in the data. Unlike FA, PCA does not assume an underlying latent structure; instead, it prioritizes variance maximization to retain as much information as possible while reducing the number of dimensions.
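To make the idea of variance-maximizing directions concrete, the following sketch checks on simulated data (an assumption for illustration, not the gene expression matrix) that the component variances returned by `prcomp()` on scaled data equal the eigenvalues of the correlation matrix. Object names are hypothetical.

```r
# Simulated data (assumption for illustration): 200 observations, 5 variables
set.seed(1)
X <- matrix(rnorm(200 * 5), ncol = 5)
X[, 2] <- X[, 1] + 0.5 * X[, 2]   # make two variables correlated

# PCA on centered and scaled data
pca_fit <- prcomp(X, center = TRUE, scale. = TRUE)
var_prcomp <- pca_fit$sdev^2

# Eigen decomposition of the correlation matrix
var_eigen <- eigen(cor(X))$values

# Proportion of variance explained by each component
prop_var <- var_eigen / sum(var_eigen)

print(round(cbind(prcomp = var_prcomp, eigen = var_eigen, proportion = prop_var), 4))
```

Because the data are scaled, the eigenvalues sum to the number of variables, so each proportion is simply the eigenvalue divided by that count.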
To evaluate the effectiveness of PCA on this dataset, I will compute the principal components, assess the proportion of variance explained by each component, and visualize the results.
pca <- prcomp(top_genes, center = TRUE, scale. = TRUE)
summary_pca <- summary(pca)
print(summary_pca)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 3.8136 3.11958 2.2091 2.14389 1.81855 1.71076 1.65446
## Proportion of Variance 0.1454 0.09732 0.0488 0.04596 0.03307 0.02927 0.02737
## Cumulative Proportion 0.1454 0.24275 0.2915 0.33752 0.37059 0.39986 0.42723
## PC8 PC9 PC10 PC11 PC12 PC13 PC14
## Standard deviation 1.62575 1.54996 1.45410 1.42356 1.40158 1.35146 1.32695
## Proportion of Variance 0.02643 0.02402 0.02114 0.02027 0.01964 0.01826 0.01761
## Cumulative Proportion 0.45366 0.47768 0.49883 0.51909 0.53874 0.55700 0.57461
## PC15 PC16 PC17 PC18 PC19 PC20 PC21
## Standard deviation 1.29029 1.2805 1.25668 1.2124 1.18183 1.14234 1.1315
## Proportion of Variance 0.01665 0.0164 0.01579 0.0147 0.01397 0.01305 0.0128
## Cumulative Proportion 0.59126 0.6077 0.62345 0.6381 0.65211 0.66516 0.6780
## PC22 PC23 PC24 PC25 PC26 PC27 PC28
## Standard deviation 1.11529 1.09261 1.07375 1.05164 1.03250 1.02181 1.01157
## Proportion of Variance 0.01244 0.01194 0.01153 0.01106 0.01066 0.01044 0.01023
## Cumulative Proportion 0.69040 0.70234 0.71387 0.72493 0.73559 0.74603 0.75626
## PC29 PC30 PC31 PC32 PC33 PC34 PC35
## Standard deviation 0.97544 0.96563 0.94737 0.92265 0.91362 0.90791 0.9000
## Proportion of Variance 0.00951 0.00932 0.00898 0.00851 0.00835 0.00824 0.0081
## Cumulative Proportion 0.76578 0.77510 0.78408 0.79259 0.80094 0.80918 0.8173
## PC36 PC37 PC38 PC39 PC40 PC41 PC42
## Standard deviation 0.89315 0.86204 0.84611 0.83490 0.82783 0.82129 0.79683
## Proportion of Variance 0.00798 0.00743 0.00716 0.00697 0.00685 0.00675 0.00635
## Cumulative Proportion 0.82526 0.83269 0.83985 0.84682 0.85367 0.86042 0.86677
## PC43 PC44 PC45 PC46 PC47 PC48 PC49
## Standard deviation 0.78185 0.77389 0.76734 0.74777 0.7413 0.70093 0.68479
## Proportion of Variance 0.00611 0.00599 0.00589 0.00559 0.0055 0.00491 0.00469
## Cumulative Proportion 0.87288 0.87887 0.88476 0.89035 0.8958 0.90076 0.90545
## PC50 PC51 PC52 PC53 PC54 PC55 PC56
## Standard deviation 0.66786 0.66260 0.64290 0.63831 0.63047 0.62325 0.61537
## Proportion of Variance 0.00446 0.00439 0.00413 0.00407 0.00397 0.00388 0.00379
## Cumulative Proportion 0.90991 0.91430 0.91843 0.92250 0.92648 0.93036 0.93415
## PC57 PC58 PC59 PC60 PC61 PC62 PC63
## Standard deviation 0.60421 0.58426 0.57830 0.56917 0.55554 0.54703 0.52700
## Proportion of Variance 0.00365 0.00341 0.00334 0.00324 0.00309 0.00299 0.00278
## Cumulative Proportion 0.93780 0.94121 0.94456 0.94780 0.95088 0.95388 0.95665
## PC64 PC65 PC66 PC67 PC68 PC69 PC70
## Standard deviation 0.51677 0.5095 0.49210 0.48516 0.47311 0.46544 0.44166
## Proportion of Variance 0.00267 0.0026 0.00242 0.00235 0.00224 0.00217 0.00195
## Cumulative Proportion 0.95932 0.9619 0.96434 0.96670 0.96893 0.97110 0.97305
## PC71 PC72 PC73 PC74 PC75 PC76 PC77
## Standard deviation 0.43383 0.42894 0.41921 0.41383 0.39757 0.37832 0.37570
## Proportion of Variance 0.00188 0.00184 0.00176 0.00171 0.00158 0.00143 0.00141
## Cumulative Proportion 0.97493 0.97677 0.97853 0.98024 0.98182 0.98326 0.98467
## PC78 PC79 PC80 PC81 PC82 PC83 PC84
## Standard deviation 0.35349 0.34737 0.33294 0.32986 0.31939 0.31135 0.30299
## Proportion of Variance 0.00125 0.00121 0.00111 0.00109 0.00102 0.00097 0.00092
## Cumulative Proportion 0.98592 0.98712 0.98823 0.98932 0.99034 0.99131 0.99223
## PC85 PC86 PC87 PC88 PC89 PC90 PC91
## Standard deviation 0.29256 0.27833 0.26872 0.26597 0.25826 0.25343 0.22121
## Proportion of Variance 0.00086 0.00077 0.00072 0.00071 0.00067 0.00064 0.00049
## Cumulative Proportion 0.99308 0.99386 0.99458 0.99529 0.99595 0.99660 0.99709
## PC92 PC93 PC94 PC95 PC96 PC97 PC98
## Standard deviation 0.21644 0.20685 0.2010 0.19797 0.19323 0.18539 0.14658
## Proportion of Variance 0.00047 0.00043 0.0004 0.00039 0.00037 0.00034 0.00021
## Cumulative Proportion 0.99755 0.99798 0.9984 0.99878 0.99915 0.99950 0.99971
## PC99 PC100
## Standard deviation 0.12431 0.11635
## Proportion of Variance 0.00015 0.00014
## Cumulative Proportion 0.99986 1.00000
Unlike FA, PCA requires only 28 components to explain 75% of the variance. However, to maximize variance while adhering to the Half-Variable Rule, I am selecting 48 components, which account for 90% of the variance.
Plotting cumulative variance explained:
explained_variance <- pca$sdev^2 / sum(pca$sdev^2)
explained_variance_df <- data.frame(
Components = 1:length(explained_variance),
CumulativeVariance = cumsum(explained_variance)
)
ggplot(explained_variance_df, aes(x = Components, y = CumulativeVariance)) +
geom_line(color = "blue") +
geom_point(color = "blue") +
labs(
title = "Cumulative variance explained",
x = "Number of components",
y = "Cumulative variance"
) +
theme_minimal()
The curve follows a typical elbow shape: the first few components account for a large share of the total variance, while each additional component yields diminishing returns. As more components are included, the cumulative variance continues to increase, but at a slower rate.
Determining number of components explaining 90% variance:
num_components <- which(cumsum(explained_variance) >= 0.90)[1]
cat("Number of components explaining 90% variance:", num_components)
## Number of components explaining 90% variance: 48
Reduced data:
reduced_data <- as.data.frame(pca$x[, 1:num_components])
str(reduced_data)
## 'data.frame': 151 obs. of 48 variables:
## $ PC1 : num -2.834 -4.308 0.498 -3.394 -5.013 ...
## $ PC2 : num -4.34 -3.18 -6.6 -1.71 -1.96 ...
## $ PC3 : num 0.0219 -0.6121 -1.2402 1.9773 0.8616 ...
## $ PC4 : num -1.1022 1.6411 -0.4932 -1.6938 0.0384 ...
## $ PC5 : num 2.27 2.46 1.19 3.09 1.58 ...
## $ PC6 : num -0.716 -2.659 -1.07 -3.017 -1.283 ...
## $ PC7 : num 0.722 -1.045 0.688 0.958 -0.334 ...
## $ PC8 : num 1.0938 4.7261 4.2608 0.6408 0.0775 ...
## $ PC9 : num -1.41754 3.85401 -1.58819 -0.00362 -1.3923 ...
## $ PC10: num 1.308 -3.515 1.075 0.38 0.283 ...
## $ PC11: num 0.349 -5.239 -0.665 1.751 0.746 ...
## $ PC12: num -0.763 0.343 -0.878 -1.39 -1.388 ...
## $ PC13: num -0.569 -0.882 0.735 -0.739 -1.84 ...
## $ PC14: num 0.625 -2.305 -1.523 2.188 -0.552 ...
## $ PC15: num 2.06 3.364 0.915 0.553 0.439 ...
## $ PC16: num -0.278 -1.7 1.142 -0.76 -0.688 ...
## $ PC17: num -0.437 1.871 1.514 0.202 0.423 ...
## $ PC18: num 1.053 1.299 0.08 1.629 0.662 ...
## $ PC19: num -0.371 1.312 1.481 1.355 1.3 ...
## $ PC20: num -0.6643 2.464 -0.0711 0.382 -0.0816 ...
## $ PC21: num -1.706 2.74 0.261 0.829 -1.513 ...
## $ PC22: num -1.014 -1.0477 0.0884 0.1146 1.7839 ...
## $ PC23: num 1.6 -3.97 1.42 1.35 0.29 ...
## $ PC24: num -1.347 0.225 0.741 -0.703 0.728 ...
## $ PC25: num 0.385 1.006 -0.49 -0.789 0.777 ...
## $ PC26: num 0.549 -2.196 -0.828 -0.174 0.063 ...
## $ PC27: num 0.516 -0.301 1.5 -0.6 0.727 ...
## $ PC28: num 0.6206 0.5135 0.4235 -0.0797 0.3555 ...
## $ PC29: num -1.6192 -2.3022 -0.0852 0.253 -0.722 ...
## $ PC30: num -0.00934 -0.10416 0.48702 0.3083 -0.18809 ...
## $ PC31: num 1.21 0.683 -1.014 1.127 -0.503 ...
## $ PC32: num -0.251 -2.833 -0.864 -0.23 0.694 ...
## $ PC33: num 1.74 0.757 0.29 -0.195 0.284 ...
## $ PC34: num -0.588 -1.519 0.632 0.102 -0.836 ...
## $ PC35: num -0.6054 1.5401 -0.1405 0.7388 0.0691 ...
## $ PC36: num 0.4996 -0.2809 -0.0372 1.3635 -0.1224 ...
## $ PC37: num -0.713 -1.264 1.146 -0.166 -1.312 ...
## $ PC38: num 0.0705 0.0302 0.7203 -0.1795 0.7014 ...
## $ PC39: num -0.0431 1.6706 -1.3537 0.6622 -0.672 ...
## $ PC40: num 0.7599 -1.6477 0.8836 0.0191 -0.3291 ...
## $ PC41: num 1.4576 -0.047 -1.116 0.8824 -0.0564 ...
## $ PC42: num 0.6 0.947 0.7 -0.549 0.112 ...
## $ PC43: num -1.043 -0.109 -0.254 0.629 0.489 ...
## $ PC44: num 0.283 -0.326 0.21 -0.301 1.615 ...
## $ PC45: num 0.857 -0.627 -1.349 -0.479 0.223 ...
## $ PC46: num 0.7458 0.0301 -0.6469 -1.3376 -0.0438 ...
## $ PC47: num -0.88 -1.651 0.852 -0.567 -0.569 ...
## $ PC48: num -0.9346 -0.0869 0.5325 0.5566 -0.3164 ...
# Sample about 10% of the rows; floor() keeps the sample size an integer
tendency <- get_clust_tendency(reduced_data, n = floor(nrow(reduced_data) * 0.1))
cat("Hopkins statistic:", tendency$hopkins_stat, "\n")
## Hopkins statistic: 0.6570545
The Hopkins statistic (0.657) measures the clustering tendency of the dataset, that is, how strongly the data departs from a spatially random distribution. Under the convention used here, where higher values mean stronger clustering, it ranges from 0 to 1:
Close to 0: The data is regularly (evenly) spaced, so clustering is not meaningful.
Around 0.5: The data is indistinguishable from a uniform random distribution and has no clear cluster structure.
Close to 1: The data is highly clustered, making it well suited to clustering algorithms like K-Means.
With a Hopkins value of 0.657, the dataset shows only a modest clustering tendency.
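The logic behind the statistic can be sketched in a few lines of base R. This is a minimal illustration of the textbook definition (nearest-neighbour distances of uniform artificial points versus sampled real points), and it is not necessarily how `get_clust_tendency()` is implemented; the function names and the simulated data are assumptions.

```r
# Distance from point p to its nearest row in the matrix pts
nearest_dist <- function(p, pts) {
  sqrt(min(rowSums(sweep(pts, 2, p)^2)))
}

# Hopkins statistic sketch: values near 1 suggest clustering, near 0.5 randomness
hopkins_sketch <- function(data_mat, m = ceiling(0.1 * nrow(data_mat))) {
  data_mat <- as.matrix(data_mat)
  d <- ncol(data_mat)
  lo <- apply(data_mat, 2, min)
  hi <- apply(data_mat, 2, max)
  # m artificial points drawn uniformly inside the bounding box of the data
  U <- matrix(runif(m * d), nrow = m)
  U <- sweep(sweep(U, 2, hi - lo, "*"), 2, lo, "+")
  u <- apply(U, 1, function(p) nearest_dist(p, data_mat))
  # m real points, each compared with its nearest *other* real point
  idx <- sample(nrow(data_mat), m)
  w <- vapply(idx, function(i) {
    nearest_dist(data_mat[i, ], data_mat[-i, , drop = FALSE])
  }, numeric(1))
  sum(u) / (sum(u) + sum(w))
}

# Simulated examples (assumption): two tight blobs versus uniform noise
set.seed(123)
clustered <- rbind(matrix(rnorm(200, mean = 0, sd = 0.3), ncol = 2),
                   matrix(rnorm(200, mean = 5, sd = 0.3), ncol = 2))
uniform <- matrix(runif(400), ncol = 2)

h_clustered <- hopkins_sketch(clustered)
h_uniform <- hopkins_sketch(uniform)
cat("Clustered data:", round(h_clustered, 2), "\n")
cat("Uniform data:  ", round(h_uniform, 2), "\n")
```

The statistic depends on which points are sampled, so fixing the seed before calling it makes a reported value reproducible.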
library(cluster)
wss <- numeric(9)
sil_scores <- numeric(9)
for (k in 2:10) {
kmeans_model <- kmeans(reduced_data, centers = k, nstart = 25)
wss[k - 1] <- kmeans_model$tot.withinss
silhouette_vals <- silhouette(kmeans_model$cluster, dist(reduced_data))
sil_scores[k - 1] <- mean(silhouette_vals[, 3])
}
elbow_df <- data.frame(Clusters = 2:10, WSS = wss)
ggplot(elbow_df, aes(x = Clusters, y = WSS)) +
geom_line(color = "blue") +
geom_point(color = "red") +
labs(title = "Elbow method for optimal clusters",
x = "Number of clusters",
y = "Within-cluster sum of squares") +
theme_minimal()
The optimal number of clusters is typically found at the elbow point, where increasing k further provides diminishing improvements in WSS. In this plot, the elbow appears to be around k = 2 or 3, suggesting that this is the most efficient number of clusters for this dataset. Further validation using silhouette scores can confirm this choice.
silhouette_df <- data.frame(Clusters = 2:10, Silhouette = sil_scores)
ggplot(silhouette_df, aes(x = Clusters, y = Silhouette)) +
geom_line(color = "green") +
geom_point(color = "red") +
labs(title = "Silhouette scores for optimal clusters",
x = "Number of clusters",
y = "Mean silhouette score") +
theme_minimal()
optimal_clusters <- which.max(sil_scores) + 1
kmeans_result <- kmeans(reduced_data, centers = optimal_clusters, nstart = 25)
silhouette_kmeans <- silhouette(kmeans_result$cluster, dist(reduced_data))
plot(silhouette_kmeans, main = "Silhouette plot for K-Means clustering", col = kmeans_result$cluster, border = NA)
This silhouette plot evaluates how well data points are clustered when using k = 2 clusters in K-Means. The silhouette width, shown on the x-axis, measures how similar each point is to its assigned cluster relative to the next closest cluster. A higher silhouette width indicates better clustering quality.
The average silhouette score is 0.11, which is very low, suggesting that the clustering structure is weak. Ideally, silhouette values should be above 0.25 for moderate clustering quality and closer to 1 for strong, well-separated clusters.
This suggests that k-means clustering is not a good choice for this dataset, as the clusters are not well-defined. The low silhouette scores and many near-zero values imply significant overlap or poor separation. It would be beneficial to try a different approach, such as hierarchical clustering or clustering on a t-SNE embedding, which may capture the data structure better.
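For reference, the silhouette width of a point is (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. The sketch below computes it by hand in base R on simulated data; the function name and data are assumptions, and `cluster::silhouette()` remains the function used in this report.

```r
# Hand-rolled silhouette widths from a distance object and cluster labels
silhouette_sketch <- function(dist_obj, clusters) {
  D <- as.matrix(dist_obj)
  clusters <- as.integer(as.factor(clusters))
  n <- nrow(D)
  widths <- numeric(n)
  for (i in seq_len(n)) {
    own <- clusters == clusters[i]
    own[i] <- FALSE
    if (!any(own)) next                 # singleton cluster: width stays 0
    other <- clusters != clusters[i]
    a <- mean(D[i, own])                # cohesion: mean distance within own cluster
    b <- min(tapply(D[i, other], clusters[other], mean))  # nearest other cluster
    widths[i] <- (b - a) / max(a, b)
  }
  widths
}

# Simulated example (assumption): two well-separated blobs of 30 points each
set.seed(7)
pts <- rbind(matrix(rnorm(60, mean = 0), ncol = 2),
             matrix(rnorm(60, mean = 6), ncol = 2))
labels <- rep(1:2, each = 30)

sil_sep <- silhouette_sketch(dist(pts), labels)           # correct labels
sil_rand <- silhouette_sketch(dist(pts), sample(labels))  # shuffled labels

cat("Mean width, true labels:    ", round(mean(sil_sep), 2), "\n")
cat("Mean width, shuffled labels:", round(mean(sil_rand), 2), "\n")
```

Shuffling the labels pushes the mean width toward zero, which is the pattern a low average such as 0.11 points to: the assigned groups are only slightly tighter than an arbitrary split.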
distance_matrix <- dist(reduced_data)
hc <- hclust(distance_matrix, method = "ward.D2")
plot(hc, labels = FALSE, main = "Hierarchical clustering dendrogram", xlab = "Samples", ylab = "Height")
cluster_labels <- cutree(hc, k = optimal_clusters)
table(cluster_labels)
## cluster_labels
## 1 2
## 88 63
This dendrogram represents the hierarchical clustering structure of the dataset, where samples are grouped based on their similarity. The y-axis (Height) represents the distance or dissimilarity between clusters, with larger heights indicating more distinct clusters. The x-axis (Samples) represents the individual observations being clustered.
Using Ward's method (ward.D2), which merges clusters so as to minimize the increase in within-cluster variance, the dendrogram shows a clear bifurcation into two main clusters, aligning with the cut at k = 2. The table below the plot confirms this, with 88 samples assigned to Cluster 1 and 63 to Cluster 2.
The large vertical jumps in the dendrogram indicate that these two clusters are relatively well-separated, though further validation using silhouette scores or gap statistics can confirm their quality. The structure suggests that hierarchical clustering successfully identifies patterns in the data, with potential for refining clusters further if needed.
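The vertical jumps can also be read numerically from the merge heights stored in the `hclust` object. The sketch below uses simulated data and a simple largest-gap heuristic; both are assumptions for illustration and not the procedure used in this report, where k was taken from the silhouette scores.

```r
# Simulated example (assumption): two well-separated blobs of 30 points each
set.seed(11)
pts <- rbind(matrix(rnorm(60, mean = 0), ncol = 2),
             matrix(rnorm(60, mean = 10), ncol = 2))
hc_demo <- hclust(dist(pts), method = "ward.D2")

# Merge heights are stored in merge order; after merge j, n - j clusters remain
heights <- hc_demo$height
n <- nrow(pts)

# The largest jump between consecutive merge heights marks a natural cut:
# cutting just before the merge that causes it leaves n - j clusters
gaps <- diff(heights)
k_suggested <- n - which.max(gaps)

cat("Last five merge heights:", round(tail(heights, 5), 2), "\n")
cat("Suggested number of clusters:", k_suggested, "\n")
print(table(cutree(hc_demo, k = k_suggested)))
```

A large final jump only shows that the last merge is costly relative to earlier ones; as the silhouette analysis illustrates, it does not by itself guarantee well-separated clusters.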
sil <- silhouette(cluster_labels, dist(reduced_data))
plot(sil, col = cluster_labels, main = "Silhouette plot for Hierarchical Clustering")
The average silhouette width is 0.09, which is very low, indicating that the clusters are weakly separated and overlap significantly. Cluster 1 (88 samples) has a slightly better silhouette score of 0.12, but still suggests poor cluster cohesion. Cluster 2 (63 samples) has an even lower score of 0.05, meaning that many points may be misclassified or lie near the decision boundary between clusters.
These results suggest that hierarchical clustering with k = 2 is not a good fit for this dataset, as the low silhouette scores indicate that the clustering structure is not well-defined. It would be beneficial to consider an alternative approach, such as clustering on a t-SNE embedding (computed with the Rtsne package), to improve separation and structure.
t-Distributed Stochastic Neighbor Embedding (t-SNE) is a non-linear dimension reduction technique that maps high-dimensional data into a 2D space while preserving local structures and patterns. This makes it well suited to visualizing complex datasets and identifying potential cluster structures. However, t-SNE does not itself perform clustering; it only reduces dimensions.
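To extract meaningful groups, K-Means clustering was applied to the t-SNE-transformed data. K-Means partitions the dataset into k clusters, minimizing intra-cluster variance. With t-SNE as a preprocessing step, K-Means operates in a two-dimensional space that emphasizes local neighborhoods. Because t-SNE does not preserve global distances, cluster sizes and between-cluster distances in the embedding should be interpreted with caution.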
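The phrase "minimizing intra-cluster variance" refers to the total within-cluster sum of squares, which `kmeans()` reports as `tot.withinss`. As a minimal sketch on synthetic data (the object names `toy`, `km` and `wss_manual` are hypothetical), the objective can be recomputed from the fitted centers:

```r
# Toy data (assumption): three 2D blobs standing in for a 2D embedding
set.seed(123)
toy <- rbind(
  matrix(rnorm(30 * 2, mean = 0), ncol = 2),
  matrix(rnorm(30 * 2, mean = 4), ncol = 2),
  matrix(rnorm(30 * 2, mean = 8), ncol = 2)
)

# Several random starts reduce the risk of ending in a poor local optimum
km <- kmeans(toy, centers = 3, nstart = 25)

# Recompute the objective: squared Euclidean distance of each point
# to the center of the cluster it was assigned to, summed over all points
assigned_centers <- km$centers[km$cluster, , drop = FALSE]
wss_manual <- sum((toy - assigned_centers)^2)

print(all.equal(wss_manual, km$tot.withinss))
print(table(km$cluster))
```

Because the objective is based on Euclidean distances, K-Means tends to favor compact, roughly spherical groups, which is one reason it pairs naturally with a low-dimensional embedding.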
# install.packages("Rtsne")  # run once if the package is not installed
library(Rtsne)
# Embed the samples in 2D; the last column of reduced_data is dropped before embedding
set.seed(123)
tsne_result <- Rtsne(reduced_data[, -ncol(reduced_data)], dims = 2, perplexity = 30)
tsne_data <- as.data.frame(tsne_result$Y)
colnames(tsne_data) <- c("Dim1", "Dim2")
# k-means with three centers on the two t-SNE coordinates
set.seed(123)
kmeans_tsne <- kmeans(tsne_data, centers = 3, nstart = 25)
tsne_data$Cluster <- as.factor(kmeans_tsne$cluster)
# Scatter plot of the embedding, colored by k-means cluster
library(ggplot2)
ggplot(tsne_data, aes(x = Dim1, y = Dim2, color = Cluster)) +
  geom_point(size = 3) +
  labs(title = "t-SNE with k-means Clustering") +
  theme_minimal()
The t-SNE plot shows three visibly separated clusters, colored red (Cluster 1), green (Cluster 2), and blue (Cluster 3). Unlike PCA, which is a linear projection, t-SNE can capture non-linear relationships, which may explain why the clusters appear more separated here than with the previous methods.
# Silhouette widths based on Euclidean distances in the 2D t-SNE embedding
tsne_dist <- dist(tsne_data[, c("Dim1", "Dim2")])
silhouette_tsne <- silhouette(as.numeric(tsne_data$Cluster), tsne_dist)
plot(silhouette_tsne, main = "Silhouette Plot for t-SNE + k-means Clustering",
     col = as.numeric(tsne_data$Cluster), border = NA)
# Column 3 of the silhouette object holds the per-sample silhouette width
avg_silhouette_width <- mean(silhouette_tsne[, 3])
cat("Average silhouette width for t-SNE + k-means:", avg_silhouette_width, "\n")
## Average silhouette width for t-SNE + k-means: 0.4353395
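The silhouette plot measures how well each point fits within its assigned cluster. The average silhouette width of 0.44 is considerably higher than for the previous methods, such as PCA-based or hierarchical clustering (0.09 for the latter). Note, however, that this value is computed from distances in the two-dimensional t-SNE embedding, whereas the hierarchical result was computed on reduced_data, so the scores are not measured in the same space and the comparison should be read with caution.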
Cluster 1 (27 points) has a mean silhouette width of 0.38, indicating a moderate structure.
Cluster 2 (71 points) has the highest mean silhouette width, 0.45, making it the best defined of the three.
Cluster 3 (53 points) has a similar mean silhouette width of 0.44, which is consistent with the three-cluster solution.
Compared to hierarchical clustering and K-Means on PCA, this approach provides better-defined clusters with higher silhouette scores. The results suggest that the t-SNE embedding separates the samples into three groups that K-Means can recover reliably.
if (!requireNamespace("clusterSim", quietly = TRUE)) {
  install.packages("clusterSim")
}
library(clusterSim)
## Loading required package: MASS
k_values <- c(2, 3, 4)
ch_indices <- numeric(length(k_values))
for (i in seq_along(k_values)) {
  set.seed(123)
  kmeans_model <- kmeans(tsne_data[, c("Dim1", "Dim2")], centers = k_values[i], nstart = 25)
  # index.G1() from clusterSim computes the Calinski-Harabasz pseudo F-statistic
  ch_indices[i] <- index.G1(tsne_data[, c("Dim1", "Dim2")], kmeans_model$cluster)
  cat("Calinski-Harabasz Index for k =", k_values[i], ":", ch_indices[i], "\n")
}
## Calinski-Harabasz Index for k = 2 : 171.6043
## Calinski-Harabasz Index for k = 3 : 150.8291
## Calinski-Harabasz Index for k = 4 : 150.9773
The Calinski-Harabasz (CH) Index evaluates the quality of clustering by measuring the ratio of between-cluster dispersion to within-cluster dispersion. Higher values indicate better-defined clusters with strong separation.
From the results:
k = 2: CH Index = 171.60 (highest)
k = 3: CH Index = 150.83
k = 4: CH Index = 150.98
Since the CH index is highest for k = 2, this criterion favors a two-cluster solution on the t-SNE embedding. The values for k = 3 and k = 4 are almost identical (150.83 vs. 150.98), so the index does not discriminate between them. Because the dataset contains six known classes (five breast cancer types plus healthy tissue), a biological interpretation supports more than two clusters, so k = 3 or k = 4 remains a defensible choice even though the CH index favors k = 2.
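The Calinski-Harabasz index described above can also be computed directly from a `kmeans()` fit, which makes the ratio explicit: CH = (B / (k - 1)) / (W / (n - k)), where B is the between-cluster sum of squares, W the total within-cluster sum of squares, n the number of samples and k the number of clusters. The sketch below uses synthetic data and a hypothetical helper `ch_index()`. It follows the standard definition and is not checked here against `index.G1()`.

```r
# Calinski-Harabasz index from a kmeans fit (hypothetical helper)
ch_index <- function(km, n) {
  k <- nrow(km$centers)
  (km$betweenss / (k - 1)) / (km$tot.withinss / (n - k))
}

# Toy data (assumption): three well-separated 2D blobs
set.seed(123)
toy <- rbind(
  matrix(rnorm(30 * 2, mean = 0), ncol = 2),
  matrix(rnorm(30 * 2, mean = 4), ncol = 2),
  matrix(rnorm(30 * 2, mean = 8), ncol = 2)
)

# Evaluate the index over a range of candidate k values
k_grid <- 2:5
ch_toy <- sapply(k_grid, function(k) {
  km <- kmeans(toy, centers = k, nstart = 25)
  ch_index(km, nrow(toy))
})
names(ch_toy) <- paste0("k=", k_grid)
print(round(ch_toy, 1))

# Candidate k with the largest index
print(k_grid[which.max(ch_toy)])
```

On data with three clearly separated groups the index is expected to peak at k = 3. On real data the peak can be flatter or shifted, as in the results above, so the index is best read alongside the silhouette plot and domain knowledge.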
For this dataset, characterized by high variance among variables, t-SNE combined with K-Means gave the best-separated clusters of the methods tried. The clear separation in the t-SNE visualization, along with the higher silhouette scores (average 0.44), supports a three-cluster structure, although the CH index favors k = 2. This approach is particularly useful for non-linear, high-dimensional data, making it a valuable tool for exploring gene expression data and other complex biological datasets.
However, previous research suggests that for typical gene expression datasets, K-Means (D’haeseleer 2005) or hierarchical clustering (Eisen et al. 1998) is often preferred. While t-SNE enhances visual interpretation, its stochastic nature and sensitivity to hyperparameters such as perplexity make it less commonly used for clustering in large-scale genomic studies.