1 Summary of R package rrda Workshop

This page is designed for a workshop on the R package rrda (Ridge Redundancy Analysis for High-Dimensional Omics Data), available on CRAN.

The rrda package is designed for analyzing two matrices, exploring how one may influence or explain the other.

While conventional methods of RDA (or RRDA) are restricted to relatively small matrices, rrda extends these computations to high-dimensional data—including datasets with more than 100,000 features.

This capability makes it especially valuable in modern omics sciences and other fields where analyzing extremely large datasets is essential.

2 Algebra

2.1 Classic formula

The Ridge Redundancy Analysis model is represented as:

Let \(Y\) be an \(n \times q\) matrix of response variables, and let \(X\) be an \(n \times p\) matrix of predictor variables. We assume that the columns of matrices \(Y\) and \(X\) are centered to have means of 0.

The relation between \(X\) and \(Y\) is given by

\[ Y = X B + E \]

where \(B\) is the matrix of regression coefficients, and \(E\) is an error matrix.

The ridge RDA aims to estimate \(B\) under both rank and ridge restrictions. The corresponding optimization problem is

\[ \hat{B}(\lambda, r) = \arg\min_B \left\{ \| Y - X B \|^2 + \lambda \| B \|^2 \right\} \quad \text{subject to} \quad \text{rank}(B) \leq r \]

where \(\| A \|^2 = \text{tr}(A'A)\) is the Frobenius norm and \(\lambda\) is the ridge regularization parameter.

2.2 For High-Dimensional Data

For Ridge Redundancy Analysis, the ridge estimator \(\hat{B}(\lambda)\) can be decomposed as follows:

\[ U_{\hat{B}(\lambda)} = V (D^{2} + \lambda I)^{-1/2}\hat{U}_{\lambda}, \quad D_{\hat{B}(\lambda)} = \hat{D}_{\lambda}, \quad V_{\hat{B}(\lambda)} = \hat{V}_{\lambda} \]

where the Singular Value Decomposition (SVD) of \(X\) is \[ X = U D V^{\prime} \] and \(\hat{U}_{\lambda}, \hat{D}_{\lambda}, \hat{V}_{\lambda}\) are the SVD components of

\[ M_\lambda = (D^{2} + \lambda I)^{-1/2} D U^{\prime} Y. \]

Since \(M_\lambda\) depends on \(\lambda\), its SVD must be computed for each value of \(\lambda\).
However, efficiency can be gained because \(M_\lambda\) is the product of:

a diagonal matrix \((D^{2} + \lambda I)^{-1/2}\), and
a constant matrix \(U'Y\) (independent of \(\lambda\)).

2.3 Final Expression

The rank-restricted ridge estimator is:

\[ \hat{B}(\lambda,r) = V(D^{2} + \lambda I)^{-1/2} \hat{U}_{\lambda}^{[r]} \hat{D}_{\lambda}^{[r]} \hat{V}_{\lambda}^{[r]\prime}. \]

Important Note 1: Ridge Redundancy Analysis (RRDA) has two parameters:
- Ridge penalty \(\lambda\) - Rank \(r\)

Important Note 2: RRDA is efficiently rewritten with SVD of X

3 Setup

Required packages

pkgs <- c("rrda", "stringr")

# Install missing packages
to_install <- pkgs[!(pkgs %in% installed.packages()[, "Package"])]
if (length(to_install)) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}

# rrda is updated if the version is old
required_version <- "0.2.3"  

if (!requireNamespace("rrda", quietly = TRUE) ||
    packageVersion("rrda") < required_version) {
  message("rrda will be updated")
  install.packages("rrda", repos = "https://cloud.r-project.org", type = "source")
}


# load packages
library(rrda)
library(stringr)

4 Simulation

4.1 Data generation

Here we generate two matrices, X and Y, with a chosen rank constraint k.

set.seed(10)
simdata<-rdasim1(n = 50,p = 100,q = 100,k = 5)
X <- simdata$X
Y <- simdata$Y

4.2 Parameter tuning

cv_result<- rrda.cv(Y = Y, X = X,nfold = 5)

## Call:
## rrda.cv(Y = Y, X = X, lambda = 4.801 - 48010, maxrank = 15, nfold = 5)

rrda.summary(cv_result = cv_result)

## === opt_min ===
## MSE: 
## [1] 1.218769
## rank: 
## [1] 5
## lambda: 
## [1] 66.71
## 
## === opt_lambda.1se ===
## MSE: 
## [1] 1.244403
## rank: 
## [1] 5
## lambda: 
## [1] 248.7
## 
## === opt_rank.1se ===
## MSE: 
## [1] 1.218769
## rank: 
## [1] 5
## lambda: 
## [1] 66.71

4.3 Visualize Results

Here we illustrate the parameter tuning process (regularization path), which helps identify the optimal parameter for maximizing prediction accuracy from X to Y.

p <- rrda.plot(cv_result) # cv result plot
print(p)

h <- rrda.heatmap(cv_result) # cv result heatmap
print(h)

best_lambda<-cv_result$opt_min$lambda  # selected parameter
best_rank<-cv_result$opt_min$rank # selected parameter

Bhat <- rrda.fit(Y = Y, X = X, nrank = best_rank,lambda = best_lambda) # fitting
Bhat_mat<-rrda.coef(Bhat)

## rank:
## [1] 5
## lambda:
## [1] 66.71

Yhat_mat <- rrda.predict(Bhat = Bhat, X = X) # prediction
Yhat<-Yhat_mat[[1]][[1]][[1]] # predicted values

cor_Y_Yhat<-diag(cor(Y,Yhat)) # correlation
summary(cor_Y_Yhat)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4813  0.8754  0.9131  0.8964  0.9371  0.9813

plot(Yhat, Y)
abline(0, 1, col = "red")

4.4 Two-Dimensional Plot

Traditionally, (Ridge) RDA is visualized in two dimensions.
However, in omics studies, this approach can sometimes overload the plot with information, making interpretation challenging.

# You want to specify one lambda in `rrda.fit` to visualize

ud<-Bhat$Bhat_comp[[1]][[1]] # SVD component of B (UD)
v <-Bhat$Bhat_comp[[1]][[2]] # SVD component of B (V)

ud12 <- ud[, 1:2]
v12 <- v[, 1:2]

# Base plot: ud (e.g., site scores)
plot(v12, 
     xlab = "RRDA1", ylab = "RRDA2", 
     xlim = range(c(ud12[,1], v12[,1])) * 1.1, 
     ylim = range(c(ud12[,2], v12[,2])) * 1.1, 
     pch = 19, col = "darkgreen", 
     main = "RRDA")

# Add v (e.g., species scores) as arrows from origin
arrows(0, 0, ud12[,1], ud12[,2], col = "blue3", length = 0.1)

# Optionally add text labels
text(ud12, labels = paste0("X", 1:nrow(ud12)), pos = 3, col = "blue3", cex = 0.6)
text(v12, labels = paste0("Y", 1:nrow(v12)), pos = 3, col = "darkgreen", cex = 0.6)

4.5 Top Feature Heatmap

For better interpretability, we visualize the feature–feature matrix using a selected dimensionality, highlighting the most informative features.

best_lambda<-cv_result$opt_min$lambda  
best_rank<-cv_result$opt_min$rank
rrda.top(Y=Y,X=X,nrank=best_rank,lambda=best_lambda,mx=20,my=20)

5 Application

As an application, we analyze microbiome and metabolome data from soybean grown under drought conditions.
The dataset comes from a 2019 field experiment at Tottori University (Yoshioka et al. 2025).

5.1 Read data

Data is available on GitHub (https://github.com/Yoska393/rrda/blob/main/RDAdata/SoyData.RDS).

#download from github
if(!dir.exists("RDAdata")) dir.create("RDAdata")
url <- "https://raw.githubusercontent.com/Yoska393/rrda/main/RDAdata/SoyData.RDS"
destfile <- "RDAdata/SoyData.RDS"

if (!file.exists(destfile)) {
  download.file(url, destfile, mode = "wb")
  message("✅ File downloaded successfully.")
} 

SoyData <- readRDS("RDAdata/Soydata.RDS")

met <- SoyData$metabolome
micro <- SoyData$microbiome

metname<-substr(SoyData$metname, 1, 30)
microname<-str_extract(SoyData$microname, "(?<=o__)[^;]+(?=; f__)")

Y <- met
X <- micro

colnames(Y)<-metname
colnames(X)<-microname

dim(Y)

## [1] 179 253

dim(X)

## [1]  179 4771

5.2 Parameter tuning

cv_result<- rrda.cv(Y = Y, X = X, nfold = 5)

## Call:
## rrda.cv(Y = Y, X = X, lambda = 0.336 - 3360, maxrank = 15, nfold = 5)

rrda.summary(cv_result = cv_result)

## === opt_min ===
## MSE: 
## [1] 0.9410295
## rank: 
## [1] 11
## lambda: 
## [1] 241.8
## 
## === opt_lambda.1se ===
## MSE: 
## [1] 0.9561614
## rank: 
## [1] 11
## lambda: 
## [1] 3360
## 
## === opt_rank.1se ===
## MSE: 
## [1] 0.9798277
## rank: 
## [1] 1
## lambda: 
## [1] 241.8

5.3 Visualize Results

p <- rrda.plot(cv_result) # cv result plot
print(p)

h <- rrda.heatmap(cv_result) # cv result heatmap
print(h)

best_lambda<-cv_result$opt_min$lambda  # selected parameter
best_rank<-cv_result$opt_min$rank # selected parameter

Bhat <- rrda.fit(Y = Y, X = X, nrank = best_rank,lambda = best_lambda)

5.4 Fitting

# fitting
Bhat_mat<-rrda.coef(Bhat)

## rank:
## [1] 11
## lambda:
## [1] 241.8

Yhat_mat <- rrda.predict(Bhat = Bhat, X = X) # prediction
Yhat<-Yhat_mat[[1]][[1]][[1]] # predicted values

cor_Y_Yhat<-diag(cor(Y,Yhat)) # correlation
summary(cor_Y_Yhat)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.09661 0.50947 0.70204 0.66532 0.82641 0.97748

plot(Yhat, Y)
abline(0, 1, col = "red")

5.5 Top Feature Heatmap

In the lower-dimensional space, we observe several interesting interactions among the top features.

best_lambda<-cv_result$opt_min$lambda  # selected parameter
best_rank<-cv_result$opt_min$rank # selected parameter
rrda.top(Y=Y,X=X,nrank=best_rank,lambda=best_lambda,mx=20,my=20)

References

Yoshioka H, Aubert J, and Mary-Huard T (2025). rrda: Ridge Redundancy Analysis for High-Dimensional Omics Data. https://CRAN.R-project.org/package=rrda (CRAN R Package)

Yoshioka, H., Aubert, J., Iwata, H., and Mary-Huard, T., 2025. Ridge Redundancy Analysis for High-Dimensional Omics Data. bioRxiv, https://doi.org/10.1101/2025.04.16.649138

rrda: Ridge Redundancy Analysis for High-Dimensional Omics Data

Hayato Yoshioka (University of Tokyo)

2025-10-02