1. Load libraries


# Core single-cell
suppressPackageStartupMessages({
library(Seurat)
library(SingleR)
library(SummarizedExperiment)
library(Matrix)
library(S4Vectors)
library(dplyr)
library(ggplot2)
})


# Optional helpers (used only if present)
suppressWarnings({
if (!requireNamespace("BiocParallel", quietly = TRUE)) {
message("BiocParallel not found; proceeding in serial mode.")
}
})
options(stringsAsFactors = FALSE)
set.seed(1337)

2. Load Melange


# Load Melange_M0 Seurat object
load("/run/user/1000/gvfs/smb-share:server=10.144.142.131,share=commun/Ludivine/année_recherche_Ludivine/JUIN_OCTOBRE/manip_3/Melange_M0.Robj")  
Melange_M0  # Check object
An object of class Seurat 
59684 features across 58773 samples within 3 assays 
Active assay: RNA (36601 features, 0 variable features)
 2 layers present: data, counts
 2 other assays present: ADT, SCT
 2 dimensional reductions calculated: pca, umap
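
A side note on `load()`: it restores objects under the names they were saved with, so the object name (`Melange_M0` here) comes from the file, not from the script. The sketch below uses a toy stand-in object and a temporary file (both hypothetical, not the real data) to show how the restored names can be captured and inspected without overwriting anything in the workspace.

```r
# Save a toy object to a temporary file, then load it into a fresh
# environment so nothing in the workspace is overwritten.
toy_object <- list(note = "stand-in for a Seurat object")
path <- tempfile(fileext = ".Robj")
save(toy_object, file = path)

env <- new.env()
loaded_names <- load(path, envir = env)  # character vector of restored names
obj <- get(loaded_names[1], envir = env)

print(loaded_names)
```

The same pattern applies to any `.Robj` or `.RData` file whose contents are not known in advance.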

3. Load the Coulton reference atlas

atlas
An object of class Seurat 
111707 features across 363315 samples within 3 assays 
Active assay: RNA (64082 features, 0 variable features)
 2 layers present: counts, data
 2 other assays present: SCT, integrated
 1 dimensional reduction calculated: umap

Expression Data Quality in Query and Atlas


# For melange
Melange_M0
str(Melange_M0)
head(colnames(Melange_M0))
head(rownames(Melange_M0))

# For atlas
atlas
str(atlas)
head(colnames(atlas))
head(rownames(atlas))
head(atlas@meta.data)

# All atlas cells have zero nCount_RNA and nFeature_RNA.
# This means the counts layer of the RNA assay sums to zero for every cell: no gene expression data is present for any cell.


cat("**Note:** the reference does not have gene expression information.\n\n",
    "**Important:** Most annotation tools (ProjecTILs, SingleR, Seurat label transfer) require expression data in the atlas to compare new cells for label assignment.\n\n",
    "**Reminder:** The presence of cell IDs and annotation alone is not enough for label transfer.\n\n",
    "**Requirement:** You need at least a non-zero counts matrix in the RNA assay of the atlas.\n")

Assessment of Reference and Query

# Sum counts per cell, using the 'counts' layer (Seurat v5 uses 'layer'; 'slot' is deprecated)
sum_zero_melange <- sum(Matrix::colSums(GetAssayData(Melange_M0, assay = "RNA", layer = "counts")) == 0)

cat("Zero-count cells in Melange_M0 RNA assay:", sum_zero_melange, "\n")
Zero-count cells in Melange_M0 RNA assay: 0 
sum_zero_atlas <- sum(Matrix::colSums(GetAssayData(atlas, assay = "RNA", layer = "counts")) == 0)

cat("Zero-count cells in atlas RNA assay:", sum_zero_atlas, "\n")
Zero-count cells in atlas RNA assay: 363315 

Step 1 – Assessment of Reference and Query:

The query object is good, but the atlas is effectively unusable as a reference for expression-based label transfer because its RNA assay has no gene expression data for any cell.

Results:

Query (Melange_M0) RNA assay: 0 cells have zero total counts. All cells in the query object therefore have valid expression data, which is good for analysis and label transfer.

Atlas RNA assay: All 363,315 cells show zero counts. This confirms the earlier observation that the atlas lacks any usable expression data in the RNA assay.

Note: the reference does not have gene expression information.

Important: Most annotation tools (ProjecTILs, SingleR, Seurat label transfer) require expression data in the atlas to compare new cells for label assignment.

Reminder: The presence of cell IDs and annotation alone is not enough for label transfer.

Requirement: You need at least a non-zero counts matrix in the RNA assay of the atlas.

Alternative Assays in Atlas for Usable Data


# Using GetAssayData with 'layer' argument in Seurat v5
sum_zero_SCT <- sum(Matrix::colSums(GetAssayData(atlas, assay = "SCT", layer = "counts")) == 0)
cat("Zero-count cells in atlas SCT assay:", sum_zero_SCT, "\n")
Zero-count cells in atlas SCT assay: 363315 
sum_zero_integrated <- sum(Matrix::colSums(GetAssayData(atlas, assay = "integrated", layer = "counts")) == 0)
cat("Zero-count cells in atlas integrated assay:", sum_zero_integrated, "\n")
Zero-count cells in atlas integrated assay: 0 
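
One caveat when reading these tallies: a result of 0 is ambiguous. If a layer is empty (zero rows and zero columns), `colSums()` returns a zero-length vector and the tally is also 0, even though no cell carries any data. The sketch below illustrates this with toy matrices and a hypothetical helper, `count_zero_cells()` (not a Seurat function), that reports the layer dimensions next to the tally. Whether the atlas integrated assay actually has an empty counts layer cannot be read from the output above; this only shows why the dimensions are worth checking.

```r
library(Matrix)

# Hypothetical helper (not part of Seurat): report the dimensions of a
# matrix together with the number of all-zero columns (cells).
count_zero_cells <- function(mat) {
  list(
    n_features = nrow(mat),
    n_cells    = ncol(mat),
    n_zero     = sum(Matrix::colSums(mat) == 0)
  )
}

# Toy layer: 3 genes x 4 cells, filled column by column; cell 2 has no counts.
toy <- matrix(c(1, 0, 2,
                0, 0, 0,
                0, 3, 0,
                5, 0, 1), nrow = 3)

# Empty layer: 0 genes x 0 cells.
empty <- matrix(numeric(0), nrow = 0, ncol = 0)

res_toy   <- count_zero_cells(toy)    # one zero-sum cell out of four
res_empty <- count_zero_cells(empty)  # tally is 0, but so is the cell count

str(res_toy)
str(res_empty)
```

On the real object, the same idea would amount to calling `dim()` on the matrix returned by `GetAssayData()` before interpreting the tally.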

Step 2 – Data Quality Check:

Before performing label transfer, we confirmed that the query object (Melange_M0) contains valid expression data for all cells.

The reference atlas, however, lacks gene expression data in its RNA and SCT assays (all 363,315 cells sum to zero in both), making these assays unsuitable for expression-based label transfer. The integrated assay is the only one for which the check reported no zero-sum cells.

Implication:
Expression-based annotation tools (e.g., SingleR, Seurat label transfer, ProjecTILs) require non-zero counts in the reference atlas to assign labels meaningfully. Since the atlas RNA and SCT assays contain only zeros, they cannot serve as a proper reference.

DimPlot Visualization

library(Seurat)

# For Melange_M0 object
DimPlot(Melange_M0, reduction = "umap", label = TRUE, group.by = "Clusters", repel = TRUE) + ggtitle("UMAP of Melange_M0")


DefaultAssay(atlas) <- "integrated"



DimPlot(atlas, reduction = "umap", group.by = "short.label", label = FALSE, cells = which(!is.na(atlas$short.label)))

Prepare the UMAP coordinates

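# Since the atlas has no expression data, we overlay the query on the atlas UMAP using the existing coordinates, without requiring RNA counts.
# Caveat: the overlay is only meaningful if both embeddings share a coordinate system; UMAPs computed independently are not directly comparable.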
# Atlas UMAP coordinates
atlas_umap <- Embeddings(atlas, "umap")
head(atlas_umap)
        UMAP_1    UMAP_2
c99   1.722056  1.956212
c177 -3.854096 -4.504594
c215  3.202773  1.778220
c461  1.706688  1.998811
c538 -3.784627 -4.484729
c576  3.187453  1.769555
# Melange_M0 UMAP coordinates
melange_umap <- Embeddings(Melange_M0, "umap")
head(melange_umap)
                                  umap_1     umap_2
ARSI-M0-T24_AAACCTGAGAGGTTGC-1  6.909624 -0.7008778
ARSI-M0-T24_AAACCTGAGAGTACCG-1 -3.761140 -8.0694714
ARSI-M0-T24_AAACCTGAGCACCGCT-1  4.789340  2.2480630
ARSI-M0-T24_AAACCTGAGCCGATTT-1 -4.228836 -8.1086551
ARSI-M0-T24_AAACCTGAGCGAGAAA-1  9.942248  0.4261332
ARSI-M0-T24_AAACCTGCACGTAAGG-1 -4.625619 -6.1553312

Create combined data frame for plotting

# Atlas
df_atlas <- data.frame(
  UMAP_1 = atlas_umap[, 1],
  UMAP_2 = atlas_umap[, 2],
  dataset = "Atlas",
  cluster = atlas$short.label   # or seurat_clusters if you prefer
)

# Melange_M0 (columns renamed from umap_1/umap_2 so they match the atlas)
df_melange <- data.frame(
  UMAP_1 = melange_umap[, 1],
  UMAP_2 = melange_umap[, 2],
  dataset = "Melange_M0",
  cluster = NA   # NA keeps the column structure identical to the atlas
)


# Combine
df_combined <- rbind(df_atlas, df_melange)
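
The reason both data frames use `UMAP_1`/`UMAP_2` is that `rbind()` on data frames matches columns by name, while `Embeddings()` returned `umap_1`/`umap_2` for the query. A minimal sketch with toy values (all hypothetical) shows the failure and the fix:

```r
# Toy frames standing in for the atlas and query coordinates.
a <- data.frame(UMAP_1 = c(1, 2), UMAP_2 = c(3, 4), dataset = "Atlas")
b_bad <- data.frame(umap_1 = 5, umap_2 = 6, dataset = "Query")

# Mismatched column names: rbind() stops with an error, captured here.
res_bad <- tryCatch(rbind(a, b_bad), error = function(e) "error")

# Renaming the columns (by position) makes the frames compatible.
b_ok <- setNames(b_bad, names(a))
res_ok <- rbind(a, b_ok)

print(res_bad)
print(res_ok)
```

A quick `table(df_combined$dataset)` after combining is a cheap way to confirm that both datasets made it into the plotting frame.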

Plot

ggplot(df_combined, aes(x=UMAP_1, y=UMAP_2, color=dataset)) +
  geom_point(alpha=0.5, size=0.5) +
  theme_minimal() +
  ggtitle("Melange_M0 projected onto Atlas UMAP")

Overlay query on atlas UMAP

# Since your atlas has no expression data, you cannot do label transfer yet, but you can still inspect how the query cells align in UMAP space.

library(ggplot2)

ggplot(df_combined, aes(x=UMAP_1, y=UMAP_2)) +
  geom_point(data = subset(df_combined, dataset == "Atlas"),
             color = "lightgray", alpha = 0.3, size = 0.5) +
  geom_point(data = subset(df_combined, dataset == "Melange_M0"),
             color = "red", alpha = 0.5, size = 0.7) +
  theme_minimal() +
  ggtitle("Projection of Melange_M0 onto Atlas UMAP")

Highlight atlas clusters

ggplot(df_combined, aes(x=UMAP_1, y=UMAP_2)) +
  geom_point(data = subset(df_combined, dataset == "Atlas"),
             aes(color = cluster), alpha = 0.3, size = 0.5) +
  geom_point(data = subset(df_combined, dataset == "Melange_M0"),
             color = "red", alpha = 0.5, size = 0.7) +
  theme_minimal() +
  ggtitle("Projection of Melange_M0 onto Atlas UMAP with atlas clusters")

Prepare your query and reference

# Make sure your atlas is normalized and has a UMAP
DefaultAssay(atlas) <- "integrated"

Approximate annotation using UMAP coordinates
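
The code that produced `predicted_cluster` is not shown in this notebook. As a rough illustration only, the sketch below assigns each query point the label of its nearest reference point in a 2-D embedding. The helper `nearest_label()` and the toy coordinates are hypothetical, and the approach assumes that both embeddings live in the same coordinate system; it is not necessarily the method used to obtain the table below.

```r
# Hypothetical helper: for each query point, return the label of the nearest
# reference point (Euclidean distance in a 2-D embedding).
nearest_label <- function(ref_xy, ref_labels, query_xy) {
  stopifnot(nrow(ref_xy) == length(ref_labels),
            ncol(ref_xy) == 2, ncol(query_xy) == 2)
  idx <- vapply(seq_len(nrow(query_xy)), function(i) {
    # Squared distance from query point i to every reference point.
    d2 <- (ref_xy[, 1] - query_xy[i, 1])^2 + (ref_xy[, 2] - query_xy[i, 2])^2
    which.min(d2)
  }, integer(1))
  as.character(ref_labels[idx])
}

# Toy data: three labelled reference points and two query points.
ref_xy     <- rbind(c(0, 0), c(5, 5), c(-4, 3))
ref_labels <- c("A", "B", "C")
query_xy   <- rbind(c(0.5, -0.2), c(4.1, 4.8))

pred <- nearest_label(ref_xy, ref_labels, query_xy)
print(pred)
```

With the real objects, the inputs would be the two `Embeddings()` matrices and `atlas$short.label`, after dropping atlas cells whose label is `NA`. This brute-force loop scales with the product of the two cell counts, so for tens of thousands of query cells against hundreds of thousands of atlas cells a k-d tree search (for example from the FNN or RANN packages) would likely be preferable.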

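# Visualize the predicted clusters (predicted_cluster must already be stored in the Melange_M0 metadata; the code that computes it is not shown here)
DimPlot(Melange_M0, group.by = "predicted_cluster", label = FALSE, repel = TRUE)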

table(Melange_M0$Clusters, Melange_M0$predicted_cluster)
            
             0_AlvMac 1_MetM2Mac 2_C3Mac 3_ICIMac1 4_ICIMac2 5_StressMac 6_SPP1AREGMac
  ARSI_J1        1636        173     686        40      2880         132            58
  ARSI_J4        1447        243     600        29      2846         150            57
  ARSI_M0_J1       14       4461       8         0         3           0             8
  ARSI_M0_J4     7301          8      11        17        56          24          1409
  M0                2        820     529        33        81         667           359
  TAM_J1            0        140    1409      4676        20        5653            42
  TAM_J4            0          0       0        34         6           0             0
            
             7_IFNMac 8_IFNGMac 9_AngioMac 10_InflamMac 11_MetalloMac 12_MBMMac
  ARSI_J1          21      1015        204           75           793      1076
  ARSI_J4          31       948        252           94           859      1068
  ARSI_M0_J1        5         5         14            2             9         4
  ARSI_M0_J4       18        21         24           13            32        17
  M0              763        32        225          316            29        34
  TAM_J1          203         0          0           11            43         4
  TAM_J4            0         3          0            0             0         0
            
             13_CalciumMac 14_ProliMac 15_LYZMac 16_ECMHomeoMac 17_IFNMac3 18_ECMMac
  ARSI_J1              919          44       742             41         34       132
  ARSI_J4             1103          47       814             62         46       104
  ARSI_M0_J1             1           1         3              1          0         1
  ARSI_M0_J4            14           4        19             19          1      2733
  M0                    20           2         5            108         61        14
  TAM_J1                 2           2         0            104         31         5
  TAM_J4                 0         233         0           1587          0         0
            
             19_ClassMono 20_TDoub 21_HemeMac 22_IFNMac4 23_NA
  ARSI_J1             378      750         24         84     0
  ARSI_J4             489      741         17        102     0
  ARSI_M0_J1            4        1         11          1     0
  ARSI_M0_J4           14       14          0          4     0
  M0                   11       16         11          1     0
  TAM_J1                0        0         10          0     0
  TAM_J4                0        0          0          0     0
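
Raw counts in a contingency table like this are hard to compare across groups of different sizes. A minimal sketch, using made-up counts rather than the values above, converts each row to fractions and extracts the dominant predicted label per group:

```r
# Hypothetical counts: rows are query groups, columns are predicted labels.
tab <- as.table(rbind(
  groupA = c(labelX = 80, labelY = 15, labelZ = 5),
  groupB = c(labelX = 10, labelY = 10, labelZ = 180)
))

# Fraction of each query group assigned to each predicted label (rows sum to 1).
row_frac <- prop.table(tab, margin = 1)

# Dominant predicted label per query group.
top_label <- colnames(tab)[apply(tab, 1, which.max)]
names(top_label) <- rownames(tab)

print(round(row_frac, 2))
print(top_label)
```

Applied to this notebook, `tab` would be `table(Melange_M0$Clusters, Melange_M0$predicted_cluster)`.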

Plot predicted clusters on the query UMAP

DimPlot(Melange_M0, group.by = "predicted_cluster", label = FALSE) +
  ggtitle("Melange_M0 projected onto atlas clusters (nearest neighbor)")

Overlay predicted query clusters on the atlas UMAP

library(ggplot2)

DimPlot(atlas, group.by = "short.label") +
  geom_point(
    data = as.data.frame(Embeddings(Melange_M0, "umap")) %>%
             mutate(predicted_cluster = Melange_M0$predicted_cluster),
    aes(x = umap_1, y = umap_2, color = predicted_cluster),
    alpha = 0.5, size = 0.5
  ) +
  ggtitle("Melange_M0 cells projected onto Atlas UMAP")

Summary of Steps and Observations

Load Query and Atlas

  • Query: Melange_M0 Seurat object with valid RNA counts.
  • Atlas: Coulton macrophage atlas (atlas) with UMAP embeddings and cluster labels, but no gene expression data (RNA assay counts are all zero).

Assess Data Quality

  • Query has usable expression for all cells → suitable for expression-based annotation.
  • Atlas lacks gene expression → cannot perform expression-based label transfer (tools like SingleR, Seurat label transfer, or ProjecTILs rely on comparing expression profiles).

Visualization

  • Both atlas and query UMAPs can be visualized.
  • Query cells can be projected onto atlas UMAP using nearest-neighbor approaches in coordinate space, but cluster assignment will be approximate.

Attempting Annotation Without Expression

  • Since the atlas has no expression data, tools like SingleR or standard ProjecTILs cannot assign labels.
  • Alternative: coordinate-based nearest-neighbor projection on UMAP or PCA embeddings, which approximates the closest atlas cluster for each query cell.

Why Expression Data is Needed

  • Expression-based annotation tools compare the gene expression profile of query cells to the reference.
  • Without expression:
    • Cannot calculate similarity.
    • Cannot identify marker genes.
    • Cannot confidently assign labels.
  • UMAP or PCA embeddings alone allow visual projection, but not true transcriptomic label transfer.

Methods That Can Work Without Expression

  • UMAP/PCA nearest-neighbor projection:
    Assign each query cell to the nearest atlas cluster using embeddings. Provides approximate annotation, not transcriptomic confirmation.

Key Takeaway

For true expression-based annotation, the reference atlas must have non-zero counts.
Without it, you can only perform embedding-based nearest-neighbor mapping, useful for visualization and rough cluster assignment, but not for transcriptomic validation.

