Unoptimized

> sessionInfo()
R version 4.5.1 (2025-06-13)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0

> library(scRNAseq)
> sce <- fetchDataset("zilionis-lung-2019", "2023-12-20", path = "human")
> logcounts(sce) <- log(assay(sce) + 1)

# Select the top 1000 highest mean genes
> rs = rowSums(assay(sce, "logcounts"))
> sce2 <- sce[rank(-rs) <= 1000, ]
> dim(sce)
[1]  41861 173954

# No loadings ("left" singular vectors) or scores ("right" singular vectors) calculated
> system.time(svd0 <- svd(assay(sce2, "logcounts"), nu=0, nv=0))
   user  system elapsed 
188.761   4.722 193.365
# No loadings, 30 scores calculated
> system.time(svd0030 <- svd(assay(sce2, "logcounts"), nu=0, nv=30))
   user  system elapsed 
459.284   5.800 464.814 
# All loadings and scores calculated
> system.time(fullsvd <- svd(assay(sce2, "logcounts")))
   user  system elapsed 
460.722   6.440 466.884 

Optimized with parallelization

These timings use the openblas-pthread library, which runs BLAS/LAPACK computations in parallel. A serial build is also available (see below). Installation steps on supermicro:

sudo apt install libopenblas0-pthread
# openblas:
ls -l /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r*.so
sudo update-alternatives --install /usr/lib/x86_64-linux-gnu/libblas.so.3 libblas.so.3-x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so 110
sudo update-alternatives --config libblas.so.3-x86_64-linux-gnu # choose pthread
# lapack:
sudo update-alternatives --install /usr/lib/x86_64-linux-gnu/liblapack.so.3 liblapack.so.3-x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so 110
sudo update-alternatives --config liblapack.so.3-x86_64-linux-gnu

Verify in R with sessionInfo(); the BLAS/LAPACK line should now point to the openblas-pthread library.
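
The selection can also be checked from the shell before starting R. This is a minimal sketch, assuming the standard Debian/Ubuntu layout in which libblas.so.3 and liblapack.so.3 are symlinks managed by update-alternatives; resolve_lib is a helper name made up for this example.

```shell
# Follow a (possibly chained) symlink to the file it finally points to.
# resolve_lib is a hypothetical helper, not part of any package.
resolve_lib() {
    readlink -f "$1"
}

# Paths are the ones used in the install steps above; adjust if yours differ.
for lib in /usr/lib/x86_64-linux-gnu/libblas.so.3 /usr/lib/x86_64-linux-gnu/liblapack.so.3; do
    if [ -e "$lib" ]; then
        printf '%s -> %s\n' "$lib" "$(resolve_lib "$lib")"
    else
        printf '%s not found\n' "$lib"
    fi
done
```

After the steps above, both lines should end in the openblas-pthread path. Running update-alternatives --display libblas.so.3-x86_64-linux-gnu lists all registered candidates and their priorities.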

> sessionInfo()
R version 4.5.1 (2025-06-13)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.2 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0

While svd() runs, top now shows the R process using about 64 cores (%CPU near 6400):

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                       
2042136 levi      20   0   14.8g   6.0g 101376 R  6410   0.6  49:47.73 rsession                                                      
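
If 64 threads turns out to be more than the job needs, OpenBLAS can be capped with the OPENBLAS_NUM_THREADS environment variable, set before R starts. A minimal sketch; the value 16 is an arbitrary illustration, not a tuned setting, and none of the timings in these notes were run with it.

```shell
# Cap the OpenBLAS thread pool. OpenBLAS reads this variable when the library loads,
# so it must be set before R is started.
# Assumption: 16 is only an example value.
export OPENBLAS_NUM_THREADS=16

# If R is installed, report which BLAS library this session would use.
if command -v Rscript >/dev/null 2>&1; then
    Rscript -e 'print(sessionInfo()$BLAS)'
fi
```
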
> suppressPackageStartupMessages(library(scRNAseq))
> sce <- fetchDataset("zilionis-lung-2019", "2023-12-20", path = "human")

> logcounts(sce) <- log(assay(sce) + 1)

> # Select the top 1000 highest mean genes
> rs = rowSums(assay(sce, "logcounts"))

> sce2 <- sce[rank(-rs) <= 1000, ]

> dim(sce)
[1]  41861 173954

> system.time(svd0 <- svd(assay(sce2, "logcounts"), nu=0, nv=0))
   user  system elapsed 
327.514 697.689  38.197 

> system.time(svd0030 <- svd(assay(sce2, "logcounts"), nu=0, nv=30))
    user   system  elapsed 
 615.962 1288.188   47.986 

> system.time(fullsvd <- svd(assay(sce2, "logcounts")))
    user   system  elapsed 
 623.346 1267.853   47.430 
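
As a rough reading of these numbers, dividing CPU time (user + system) by elapsed time gives the average number of cores kept busy. A quick awk sketch using the three timings above:

```shell
# user, system, and elapsed seconds from the openblas-pthread runs above
pthread_times='nu=0,nv=0 327.514 697.689 38.197
nu=0,nv=30 615.962 1288.188 47.986
full 623.346 1267.853 47.430'

# Average busy cores = (user + system) / elapsed
busy=$(printf '%s\n' "$pthread_times" | awk '{ printf "%-12s %.1f cores busy on average\n", $1, ($2 + $3) / $4 }')
printf '%s\n' "$busy"
```

System time exceeds user time in all three runs, which may point to thread coordination overhead rather than useful arithmetic.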

Optimized without parallelization

Installation steps:

sudo apt install libopenblas0-serial
# BLAS
sudo update-alternatives --install /usr/lib/x86_64-linux-gnu/libblas.so.3 libblas.so.3-x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/openblas-serial/libopenblas-r0.3.26.so 90
sudo update-alternatives --config libblas.so.3-x86_64-linux-gnu
# LAPACK
sudo update-alternatives --install /usr/lib/x86_64-linux-gnu/liblapack.so.3 liblapack.so.3-x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/openblas-serial/libopenblas-r0.3.26.so 90
sudo update-alternatives --config liblapack.so.3-x86_64-linux-gnu

top confirms that R now uses only a single CPU during the SVD:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                       
2042847 levi      20   0 6824492   5.8g 101376 R 107.3   0.6   3:56.38 rsession      

Results:

> sessionInfo()
R version 4.5.1 (2025-06-13)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.2 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-serial/libopenblas-r0.3.26.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.5.1    tools_4.5.1       rstudioapi_0.17.1

> suppressPackageStartupMessages(library(scRNAseq))

> sce <- fetchDataset("zilionis-lung-2019", "2023-12-20", path = "human")

> logcounts(sce) <- log(assay(sce) + 1)

> # Select the top 1000 highest mean genes
> rs = rowSums(assay(sce, "logcounts"))

> sce2 <- sce[rank(-rs) <= 1000, ]

> dim(sce)
[1]  41861 173954

> system.time(svd0 <- svd(assay(sce2, "logcounts"), nu=0, nv=0))
   user  system elapsed 
 48.530   6.057  52.103 

> system.time(svd0030 <- svd(assay(sce2, "logcounts"), nu=0, nv=30))
   user  system elapsed 
 74.601   8.004  78.390 

> system.time(fullsvd <- svd(assay(sce2, "logcounts")))
   user  system elapsed 
 76.180   9.386  80.571 
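
To compare the three configurations, speedup can be computed as reference elapsed time divided by optimized elapsed time. A quick awk sketch, with the elapsed values copied from the system.time() output in each section:

```shell
# Elapsed seconds per svd() call.
# Columns: call, reference BLAS, openblas-pthread, openblas-serial
timings='nu=0,nv=0 193.365 38.197 52.103
nu=0,nv=30 464.814 47.986 78.390
full 466.884 47.430 80.571'

# Speedup = reference elapsed / optimized elapsed
speedups=$(printf '%s\n' "$timings" | awk '{ printf "%-12s pthread %.1fx  serial %.1fx\n", $1, $2 / $3, $2 / $4 }')
printf '%s\n' "$speedups"
```

On these numbers the pthread build is roughly 5x to 10x faster than the reference BLAS, and the serial build roughly 4x to 6x.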

```