> sessionInfo()
R version 4.5.1 (2025-06-13)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
> library(scRNAseq)
> sce <- fetchDataset("zilionis-lung-2019", "2023-12-20", path = "human")
> logcounts(sce) <- log(assay(sce) + 1)
# Select the 1000 genes with the highest mean logcounts
> rs = rowSums(assay(sce, "logcounts"))
> sce2 <- sce[rank(-rs) <= 1000, ]
> dim(sce)
[1] 41861 173954
# No loadings ("left" singular vectors) or scores ("right" singular vectors) calculated
> system.time(svd0 <- svd(assay(sce2, "logcounts"), nu=0, nv=0))
user system elapsed
188.761 4.722 193.365
# No loadings, 30 scores calculated
> system.time(svd0030 <- svd(assay(sce2, "logcounts"), nu=0, nv=30))
user system elapsed
459.284 5.800 464.814
# All loadings and scores calculated
> system.time(fullsvd <- svd(assay(sce2, "logcounts")))
user system elapsed
460.722 6.440 466.884
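The timings above use the reference BLAS and LAPACK libraries. The next run uses the openblas-pthread library, which parallelizes computations across threads; there is also a separate build for serial computations. The steps to install and select openblas-pthread on supermicro: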
sudo apt install libopenblas0-pthread
# openblas:
ls -l /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r*.so
sudo update-alternatives --install /usr/lib/x86_64-linux-gnu/libblas.so.3 libblas.so.3-x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so 110
sudo update-alternatives --config libblas.so.3-x86_64-linux-gnu # choose pthread
# lapack:
sudo update-alternatives --install /usr/lib/x86_64-linux-gnu/liblapack.so.3 liblapack.so.3-x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so 110
sudo update-alternatives --config liblapack.so.3-x86_64-linux-gnu
Verify the switch in R with sessionInfo(); the BLAS/LAPACK line should now point to the openblas-pthread library:
> sessionInfo()
R version 4.5.1 (2025-06-13)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.2 LTS
Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
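The same check can be made from the shell without starting R. The sketch below assumes the Debian/Ubuntu layout used in the commands above, where `libblas.so.3` and `liblapack.so.3` are symlinks managed by update-alternatives. `resolve_lib` is a hypothetical helper name, and `readlink -f` is the GNU coreutils option that follows a symlink chain to its final target.

```shell
# Print the file that a (possibly chained) symlink ultimately resolves to.
resolve_lib() {
    readlink -f "$1"
}

# Paths taken from the update-alternatives commands above.
for lib in /usr/lib/x86_64-linux-gnu/libblas.so.3 \
           /usr/lib/x86_64-linux-gnu/liblapack.so.3; do
    if [ -e "$lib" ]; then
        printf '%s -> %s\n' "$lib" "$(resolve_lib "$lib")"
    else
        printf '%s not found\n' "$lib"
    fi
done
```

If both lines end in the same `libopenblasp-r*.so` file, the pthread build should be the one R loads.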
Note that while svd() is running, I now see the process using about 64 threads (6410% CPU in top):
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2042136 levi 20 0 14.8g 6.0g 101376 R 6410 0.6 49:47.73 rsession
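To try thread counts between 1 and 64 without switching alternatives, one option is the `OPENBLAS_NUM_THREADS` environment variable, which OpenBLAS reads when the library is loaded. A minimal sketch follows. `with_blas_threads` is a hypothetical wrapper name, and the `Rscript` call in the comment only illustrates usage; it was not benchmarked in these notes.

```shell
# Run a command with the OpenBLAS thread pool capped at N threads.
# Usage: with_blas_threads N command [args...]
with_blas_threads() {
    n="$1"
    shift
    env OPENBLAS_NUM_THREADS="$n" "$@"
}

# Illustration only (not timed in these notes):
#   with_blas_threads 8 Rscript -e 'x <- matrix(rnorm(1e6), 1000); system.time(svd(x))'

# Quick check that the variable reaches the child process.
with_blas_threads 8 sh -c 'echo "OPENBLAS_NUM_THREADS=$OPENBLAS_NUM_THREADS"'
```

The variable has to be set before R starts, because the thread pool is sized when the library loads.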
> suppressPackageStartupMessages(library(scRNAseq))
> sce <- fetchDataset("zilionis-lung-2019", "2023-12-20", path = "human")
> logcounts(sce) <- log(assay(sce) + 1)
> # Select the 1000 genes with the highest mean logcounts
> rs = rowSums(assay(sce, "logcounts"))
> sce2 <- sce[rank(-rs) <= 1000, ]
> dim(sce)
[1] 41861 173954
> system.time(svd0 <- svd(assay(sce2, "logcounts"), nu=0, nv=0))
user system elapsed
327.514 697.689 38.197
> system.time(svd0030 <- svd(assay(sce2, "logcounts"), nu=0, nv=30))
user system elapsed
615.962 1288.188 47.986
> system.time(fullsvd <- svd(assay(sce2, "logcounts")))
user system elapsed
623.346 1267.853 47.430
Installation steps for the serial OpenBLAS build (libopenblas0-serial):
sudo apt install libopenblas0-serial
# BLAS
sudo update-alternatives --install /usr/lib/x86_64-linux-gnu/libblas.so.3 libblas.so.3-x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/openblas-serial/libopenblas-r0.3.26.so 90
sudo update-alternatives --config libblas.so.3-x86_64-linux-gnu
# LAPACK
sudo update-alternatives --install /usr/lib/x86_64-linux-gnu/liblapack.so.3 liblapack.so.3-x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/openblas-serial/libopenblas-r0.3.26.so 90
sudo update-alternatives --config liblapack.so.3-x86_64-linux-gnu
Confirmed that R now uses only a single CPU during SVD:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2042847 levi 20 0 6824492 5.8g 101376 R 107.3 0.6 3:56.38 rsession
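`%CPU` in top measures CPU use rather than thread count. On Linux the number of threads in a process can be read directly from `/proc/<pid>/status`. The sketch below wraps that in a hypothetical helper, `thread_count`. The count includes every thread in the process, not only BLAS workers, so it is a rough check.

```shell
# Print the number of threads of a process, read from the Linux /proc filesystem.
thread_count() {
    awk '/^Threads:/ { print $2 }' "/proc/$1/status"
}

# Example: thread count of the current shell.
thread_count "$$"

# For the R session shown in the top output above, one would run:
#   thread_count 2042847
```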
Results with the serial OpenBLAS build:
> sessionInfo()
R version 4.5.1 (2025-06-13)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.2 LTS
Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-serial/libopenblas-r0.3.26.so; LAPACK version 3.12.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: America/New_York
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.5.1 tools_4.5.1 rstudioapi_0.17.1
> suppressPackageStartupMessages(library(scRNAseq))
> sce <- fetchDataset("zilionis-lung-2019", "2023-12-20", path = "human")
> logcounts(sce) <- log(assay(sce) + 1)
> # Select the 1000 genes with the highest mean logcounts
> rs = rowSums(assay(sce, "logcounts"))
> sce2 <- sce[rank(-rs) <= 1000, ]
> dim(sce)
[1] 41861 173954
> system.time(svd0 <- svd(assay(sce2, "logcounts"), nu=0, nv=0))
user system elapsed
48.530 6.057 52.103
> system.time(svd0030 <- svd(assay(sce2, "logcounts"), nu=0, nv=30))
user system elapsed
74.601 8.004 78.390
> system.time(fullsvd <- svd(assay(sce2, "logcounts")))
user system elapsed
76.180 9.386 80.571
```
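The elapsed times from the three runs can be compared directly. The sketch below hard-codes the elapsed values reported above and prints the speedup of each OpenBLAS build relative to the reference BLAS/LAPACK. The row labels and the `speedups` function name are my own shorthand.

```shell
# Elapsed seconds copied from the system.time() output above.
# Columns: label, reference BLAS, openblas-pthread, openblas-serial
speedups() {
    awk '{ printf "%-12s pthread %.2fx  serial %.2fx\n", $1, $2 / $3, $2 / $4 }' <<'EOF'
nu=0,nv=0 193.365 38.197 52.103
nu=0,nv=30 464.814 47.986 78.390
full 466.884 47.430 80.571
EOF
}

speedups
```

On these numbers the serial OpenBLAS build is already several times faster than the reference libraries. The 64-thread build improves elapsed time by less than a further factor of two, while using far more total CPU time (user plus system).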