This page is designed for a workshop on the R package rrda (Ridge Redundancy Analysis for High-Dimensional Omics Data), available on CRAN.
The rrda package is designed for analyzing two matrices, exploring how one may influence or explain the other.
While conventional methods of RDA (or RRDA) are restricted to relatively small matrices, rrda extends these computations to high-dimensional data—including datasets with more than 100,000 features.
This capability makes it especially valuable in modern omics sciences and other fields where analyzing extremely large datasets is essential.
The Ridge Redundancy Analysis model is represented as:
Let \(Y\) be an \(n \times q\) matrix of response variables, and let \(X\) be an \(n \times p\) matrix of predictor variables. We assume that the columns of matrices \(Y\) and \(X\) are centered to have means of 0.
The relation between \(X\) and \(Y\) is given by
\[ Y = X B + E \]
where \(B\) is the matrix of regression coefficients, and \(E\) is an error matrix.
The ridge RDA aims to estimate \(B\) under both rank and ridge restrictions. The corresponding optimization problem is
\[ \hat{B}(\lambda, r) = \arg\min_B \left\{ \| Y - X B \|^2 + \lambda \| B \|^2 \right\} \quad \text{subject to} \quad \text{rank}(B) \leq r \]
where \(\| A \|^2 = \text{tr}(A'A)\) is the Frobenius norm and \(\lambda\) is the ridge regularization parameter.
For Ridge Redundancy Analysis, the ridge estimator \(\hat{B}(\lambda)\) can be decomposed as follows:
\[ U_{\hat{B}(\lambda)} = V (D^{2} + \lambda I)^{-1/2}\hat{U}_{\lambda}, \quad D_{\hat{B}(\lambda)} = \hat{D}_{\lambda}, \quad V_{\hat{B}(\lambda)} = \hat{V}_{\lambda} \]
where the Singular Value Decomposition (SVD) of \(X\) is \[ X = U D V^{\prime} \] and \(\hat{U}_{\lambda}, \hat{D}_{\lambda}, \hat{V}_{\lambda}\) are the SVD components of
\[ M_\lambda = (D^{2} + \lambda I)^{-1/2} D U^{\prime} Y. \]
Since \(M_\lambda\) depends on \(\lambda\), its SVD must be computed for
each value of \(\lambda\).
However, efficiency can be gained because \(M_\lambda\) is the product of:
The rank-restricted ridge estimator is:
\[ \hat{B}(\lambda,r) = V(D^{2} + \lambda I)^{-1/2} \hat{U}_{\lambda}^{[r]} \hat{D}_{\lambda}^{[r]} \hat{V}_{\lambda}^{[r]\prime}. \]
Important Note 1: Ridge Redundancy Analysis (RRDA)
has two parameters:
- Ridge penalty \(\lambda\)
- Rank \(r\)
Important Note 2: RRDA is efficiently rewritten with SVD of X
Required packages
pkgs <- c("rrda", "stringr")
# Install missing packages
to_install <- pkgs[!(pkgs %in% installed.packages()[, "Package"])]
if (length(to_install)) {
install.packages(to_install, repos = "https://cloud.r-project.org")
}
# rrda is updated if the version is old
required_version <- "0.2.3"
if (!requireNamespace("rrda", quietly = TRUE) ||
packageVersion("rrda") < required_version) {
message("rrda will be updated")
install.packages("rrda", repos = "https://cloud.r-project.org", type = "source")
}
# load packages
library(rrda)
library(stringr)
Here we generate two matrices, X and Y, with a chosen rank constraint k.
set.seed(10)
simdata<-rdasim1(n = 50,p = 100,q = 100,k = 5)
X <- simdata$X
Y <- simdata$Y
cv_result<- rrda.cv(Y = Y, X = X,nfold = 5)
## Call:
## rrda.cv(Y = Y, X = X, lambda = 4.801 - 48010, maxrank = 15, nfold = 5)
rrda.summary(cv_result = cv_result)
## === opt_min ===
## MSE:
## [1] 1.218769
## rank:
## [1] 5
## lambda:
## [1] 66.71
##
## === opt_lambda.1se ===
## MSE:
## [1] 1.244403
## rank:
## [1] 5
## lambda:
## [1] 248.7
##
## === opt_rank.1se ===
## MSE:
## [1] 1.218769
## rank:
## [1] 5
## lambda:
## [1] 66.71
Here we illustrate the parameter tuning process (regularization path), which helps identify the optimal parameter for maximizing prediction accuracy from X to Y.
p <- rrda.plot(cv_result) # cv result plot
print(p)
h <- rrda.heatmap(cv_result) # cv result heatmap
print(h)
best_lambda<-cv_result$opt_min$lambda # selected parameter
best_rank<-cv_result$opt_min$rank # selected parameter
Bhat <- rrda.fit(Y = Y, X = X, nrank = best_rank,lambda = best_lambda) # fitting
Bhat_mat<-rrda.coef(Bhat)
## rank:
## [1] 5
## lambda:
## [1] 66.71
Yhat_mat <- rrda.predict(Bhat = Bhat, X = X) # prediction
Yhat<-Yhat_mat[[1]][[1]][[1]] # predicted values
cor_Y_Yhat<-diag(cor(Y,Yhat)) # correlation
summary(cor_Y_Yhat)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.4813 0.8754 0.9131 0.8964 0.9371 0.9813
plot(Yhat, Y)
abline(0, 1, col = "red")
Traditionally, (Ridge) RDA is visualized in two
dimensions.
However, in omics studies, this approach can sometimes overload
the plot with information, making interpretation
challenging.
# You want to specify one lambda in `rrda.fit` to visualize
ud<-Bhat$Bhat_comp[[1]][[1]] # SVD component of B (UD)
v <-Bhat$Bhat_comp[[1]][[2]] # SVD component of B (V)
ud12 <- ud[, 1:2]
v12 <- v[, 1:2]
# Base plot: ud (e.g., site scores)
plot(v12,
xlab = "RRDA1", ylab = "RRDA2",
xlim = range(c(ud12[,1], v12[,1])) * 1.1,
ylim = range(c(ud12[,2], v12[,2])) * 1.1,
pch = 19, col = "darkgreen",
main = "RRDA")
# Add v (e.g., species scores) as arrows from origin
arrows(0, 0, ud12[,1], ud12[,2], col = "blue3", length = 0.1)
# Optionally add text labels
text(ud12, labels = paste0("X", 1:nrow(ud12)), pos = 3, col = "blue3", cex = 0.6)
text(v12, labels = paste0("Y", 1:nrow(v12)), pos = 3, col = "darkgreen", cex = 0.6)
For better interpretability, we visualize the feature–feature matrix using a selected dimensionality, highlighting the most informative features.
best_lambda<-cv_result$opt_min$lambda
best_rank<-cv_result$opt_min$rank
rrda.top(Y=Y,X=X,nrank=best_rank,lambda=best_lambda,mx=20,my=20)
As an application, we analyze microbiome and
metabolome data from soybean grown under drought
conditions.
The dataset comes from a 2019 field experiment at Tottori University
(Yoshioka et al. 2025).
Data is available on GitHub
(https://github.com/Yoska393/rrda/blob/main/RDAdata/SoyData.RDS).
#download from github
if(!dir.exists("RDAdata")) dir.create("RDAdata")
url <- "https://raw.githubusercontent.com/Yoska393/rrda/main/RDAdata/SoyData.RDS"
destfile <- "RDAdata/SoyData.RDS"
if (!file.exists(destfile)) {
download.file(url, destfile, mode = "wb")
message("✅ File downloaded successfully.")
}
SoyData <- readRDS("RDAdata/Soydata.RDS")
met <- SoyData$metabolome
micro <- SoyData$microbiome
metname<-substr(SoyData$metname, 1, 30)
microname<-str_extract(SoyData$microname, "(?<=o__)[^;]+(?=; f__)")
Y <- met
X <- micro
colnames(Y)<-metname
colnames(X)<-microname
dim(Y)
## [1] 179 253
dim(X)
## [1] 179 4771
cv_result<- rrda.cv(Y = Y, X = X, nfold = 5)
## Call:
## rrda.cv(Y = Y, X = X, lambda = 0.336 - 3360, maxrank = 15, nfold = 5)
rrda.summary(cv_result = cv_result)
## === opt_min ===
## MSE:
## [1] 0.9410295
## rank:
## [1] 11
## lambda:
## [1] 241.8
##
## === opt_lambda.1se ===
## MSE:
## [1] 0.9561614
## rank:
## [1] 11
## lambda:
## [1] 3360
##
## === opt_rank.1se ===
## MSE:
## [1] 0.9798277
## rank:
## [1] 1
## lambda:
## [1] 241.8
p <- rrda.plot(cv_result) # cv result plot
print(p)
h <- rrda.heatmap(cv_result) # cv result heatmap
print(h)
best_lambda<-cv_result$opt_min$lambda # selected parameter
best_rank<-cv_result$opt_min$rank # selected parameter
Bhat <- rrda.fit(Y = Y, X = X, nrank = best_rank,lambda = best_lambda)
# fitting
Bhat_mat<-rrda.coef(Bhat)
## rank:
## [1] 11
## lambda:
## [1] 241.8
Yhat_mat <- rrda.predict(Bhat = Bhat, X = X) # prediction
Yhat<-Yhat_mat[[1]][[1]][[1]] # predicted values
cor_Y_Yhat<-diag(cor(Y,Yhat)) # correlation
summary(cor_Y_Yhat)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.09661 0.50947 0.70204 0.66532 0.82641 0.97748
plot(Yhat, Y)
abline(0, 1, col = "red")
In the lower-dimensional space, we observe several interesting interactions among the top features.
best_lambda<-cv_result$opt_min$lambda # selected parameter
best_rank<-cv_result$opt_min$rank # selected parameter
rrda.top(Y=Y,X=X,nrank=best_rank,lambda=best_lambda,mx=20,my=20)
Yoshioka H, Aubert J, and Mary-Huard T (2025). rrda: Ridge Redundancy Analysis for High-Dimensional Omics Data. https://CRAN.R-project.org/package=rrda (CRAN R Package)
Yoshioka, H., Aubert, J., Iwata, H., and Mary-Huard, T., 2025. Ridge Redundancy Analysis for High-Dimensional Omics Data. bioRxiv, https://doi.org/10.1101/2025.04.16.649138