2. Load Melange
# Load Melange_M0 Seurat object
load("/run/user/1000/gvfs/smb-share:server=10.144.142.131,share=commun/Ludivine/année_recherche_Ludivine/JUIN_OCTOBRE/manip_3/Melange_M0.Robj")
Melange_M0 # Check object
An object of class Seurat
59684 features across 58773 samples within 3 assays
Active assay: RNA (36601 features, 0 variable features)
2 layers present: data, counts
2 other assays present: ADT, SCT
2 dimensional reductions calculated: pca, umap
3. Load the Coulton reference atlas
atlas
An object of class Seurat
111707 features across 363315 samples within 3 assays
Active assay: RNA (64082 features, 0 variable features)
2 layers present: counts, data
2 other assays present: SCT, integrated
1 dimensional reduction calculated: umap
Expression Data Quality in Query and Atlas
# For melange
Melange_M0
str(Melange_M0)
head(colnames(Melange_M0))
head(rownames(Melange_M0))
# For atlas
atlas
str(atlas)
head(colnames(atlas))
head(rownames(atlas))
head(atlas@meta.data)
# In the atlas metadata, nCount_RNA and nFeature_RNA are zero for every cell.
# The counts layer of the RNA assay therefore sums to zero per cell: no gene expression data is present.
cat("**Note:** the reference does not have gene expression information.\n\n",
"**Important:** Most annotation tools (ProjecTILs, SingleR, Seurat label transfer) require expression data in the atlas to compare new cells for label assignment.\n\n",
"**Reminder:** The presence of cell IDs and annotation alone is not enough for label transfer.\n\n",
"**Requirement:** You need at least a non-zero counts matrix in the RNA assay of the atlas.\n")
Assessment of Reference and Query
# Sum counts per cell, using layer 'counts' (Seurat v5 uses 'layer' instead of the deprecated 'slot')
sum_zero_melange <- sum(Matrix::colSums(GetAssayData(Melange_M0, assay = "RNA", layer = "counts")) == 0)
cat("Zero-count cells in Melange_M0 RNA assay:", sum_zero_melange, "\n")
Zero-count cells in Melange_M0 RNA assay: 0
sum_zero_atlas <- sum(Matrix::colSums(GetAssayData(atlas, assay = "RNA", layer = "counts")) == 0)
cat("Zero-count cells in atlas RNA assay:", sum_zero_atlas, "\n")
Zero-count cells in atlas RNA assay: 363315
Step 1 – Assessment of Reference and Query:
The query object is usable, but the atlas cannot serve as a reference
for expression-based label transfer because its RNA assay contains no
per-cell expression data.
Results:
Query (Melange_M0) RNA assay: 0 cells have zero
total counts, so every query cell has expression data available for
analysis and label transfer.
Atlas RNA assay: All 363,315 cells have zero total counts.
This confirms the earlier observation from the metadata (nCount_RNA and
nFeature_RNA are zero for every cell).
Note: the reference does not have gene expression
information.
Important: Most annotation tools (ProjecTILs,
SingleR, Seurat label transfer) require expression data in the atlas to
compare new cells for label assignment.
Reminder: The presence of cell IDs and annotation
alone is not enough for label transfer.
Requirement: You need at least a non-zero counts
matrix in the RNA assay of the atlas.
Alternative Assays in Atlas for Usable Data
# Using GetAssayData with 'layer' argument in Seurat v5
sum_zero_SCT <- sum(Matrix::colSums(GetAssayData(atlas, assay = "SCT", layer = "counts")) == 0)
cat("Zero-count cells in atlas SCT assay:", sum_zero_SCT, "\n")
Zero-count cells in atlas SCT assay: 363315
sum_zero_integrated <- sum(Matrix::colSums(GetAssayData(atlas, assay = "integrated", layer = "counts")) == 0)
cat("Zero-count cells in atlas integrated assay:", sum_zero_integrated, "\n")
Zero-count cells in atlas integrated assay: 0
Step 2 – Data Quality Check:
Before performing label transfer, we confirmed that the query object
(Melange_M0) contains valid expression data for all cells.
The reference atlas, however, has no counts in its RNA or SCT assays
(all 363,315 cells sum to zero in both). Only the integrated assay
returns non-zero column sums for every cell; integrated assays typically
hold batch-corrected values rather than raw counts.
Implication:
Expression-based annotation tools (e.g., SingleR, Seurat label transfer,
ProjecTILs) require non-zero counts in the reference atlas to assign
labels meaningfully. Since the atlas RNA and SCT counts are all zero, it
cannot serve as a proper reference.
Dimplot Visualization
library(Seurat)
# For Melange_M0 object
DimPlot(Melange_M0, reduction = "umap", label = TRUE, group.by = "Clusters", repel = TRUE) + ggtitle("UMAP of Melange_M0")

DefaultAssay(atlas) <- "integrated"
# Restrict the atlas plot to cells that carry a short.label annotation
labelled_cells <- colnames(atlas)[!is.na(atlas$short.label)]
DimPlot(atlas, reduction = "umap", group.by = "short.label", label = FALSE, cells = labelled_cells)

Prepare the UMAP coordinates
# Since the atlas has no expression data, the comparison below relies on the stored UMAP coordinates of both objects and does not require RNA counts.
# Atlas UMAP coordinates
atlas_umap <- Embeddings(atlas, "umap")
head(atlas_umap)
UMAP_1 UMAP_2
c99 1.722056 1.956212
c177 -3.854096 -4.504594
c215 3.202773 1.778220
c461 1.706688 1.998811
c538 -3.784627 -4.484729
c576 3.187453 1.769555
# Melange_M0 UMAP coordinates
melange_umap <- Embeddings(Melange_M0, "umap")
head(melange_umap)
umap_1 umap_2
ARSI-M0-T24_AAACCTGAGAGGTTGC-1 6.909624 -0.7008778
ARSI-M0-T24_AAACCTGAGAGTACCG-1 -3.761140 -8.0694714
ARSI-M0-T24_AAACCTGAGCACCGCT-1 4.789340 2.2480630
ARSI-M0-T24_AAACCTGAGCCGATTT-1 -4.228836 -8.1086551
ARSI-M0-T24_AAACCTGAGCGAGAAA-1 9.942248 0.4261332
ARSI-M0-T24_AAACCTGCACGTAAGG-1 -4.625619 -6.1553312
Create combined data frame for plotting
library(ggplot2)
# Atlas: UMAP coordinates plus cluster label
df_atlas <- data.frame(
UMAP_1 = atlas_umap[, 1],
UMAP_2 = atlas_umap[, 2],
dataset = "Atlas",
cluster = atlas$short.label
)
# Melange_M0: rename umap_1/umap_2 to UMAP_1/UMAP_2 so the columns match the atlas
df_melange <- data.frame(
UMAP_1 = melange_umap[, 1],
UMAP_2 = melange_umap[, 2],
dataset = "Melange_M0",
cluster = NA # no atlas label for query cells; keeps the column structure identical
)
# Combine
df_combined <- rbind(df_atlas, df_melange)
Overlay query on atlas UMAP
# Since the atlas has no expression data, label transfer is not possible yet, but the position of the query cells relative to the atlas in UMAP space can still be inspected.
ggplot(df_combined, aes(x=UMAP_1, y=UMAP_2)) +
geom_point(data = subset(df_combined, dataset == "Atlas"),
color = "lightgray", alpha = 0.3, size = 0.5) +
geom_point(data = subset(df_combined, dataset == "Melange_M0"),
color = "red", alpha = 0.5, size = 0.7) +
theme_minimal() +
ggtitle("Projection of Melange_M0 onto Atlas UMAP")

Highlight atlas clusters
ggplot(df_combined, aes(x=UMAP_1, y=UMAP_2)) +
geom_point(data = subset(df_combined, dataset == "Atlas"),
aes(color = cluster), alpha = 0.3, size = 0.5) +
geom_point(data = subset(df_combined, dataset == "Melange_M0"),
color = "red", alpha = 0.5, size = 0.7) +
theme_minimal() +
ggtitle("Projection of Melange_M0 onto Atlas UMAP with atlas clusters")

Prepare your query and reference
# Make sure your atlas is normalized and has a UMAP
DefaultAssay(atlas) <- "integrated"
Approximate annotation using UMAP coordinates

# 6. Visualize
DimPlot(Melange_M0, group.by = "predicted_cluster", label = FALSE)
table(Melange_M0$Clusters, Melange_M0$predicted_cluster)
0_AlvMac 1_MetM2Mac 2_C3Mac 3_ICIMac1 4_ICIMac2 5_StressMac 6_SPP1AREGMac
ARSI_J1 1636 173 686 40 2880 132 58
ARSI_J4 1447 243 600 29 2846 150 57
ARSI_M0_J1 14 4461 8 0 3 0 8
ARSI_M0_J4 7301 8 11 17 56 24 1409
M0 2 820 529 33 81 667 359
TAM_J1 0 140 1409 4676 20 5653 42
TAM_J4 0 0 0 34 6 0 0
7_IFNMac 8_IFNGMac 9_AngioMac 10_InflamMac 11_MetalloMac 12_MBMMac
ARSI_J1 21 1015 204 75 793 1076
ARSI_J4 31 948 252 94 859 1068
ARSI_M0_J1 5 5 14 2 9 4
ARSI_M0_J4 18 21 24 13 32 17
M0 763 32 225 316 29 34
TAM_J1 203 0 0 11 43 4
TAM_J4 0 3 0 0 0 0
13_CalciumMac 14_ProliMac 15_LYZMac 16_ECMHomeoMac 17_IFNMac3 18_ECMMac
ARSI_J1 919 44 742 41 34 132
ARSI_J4 1103 47 814 62 46 104
ARSI_M0_J1 1 1 3 1 0 1
ARSI_M0_J4 14 4 19 19 1 2733
M0 20 2 5 108 61 14
TAM_J1 2 2 0 104 31 5
TAM_J4 0 233 0 1587 0 0
19_ClassMono 20_TDoub 21_HemeMac 22_IFNMac4 23_NA
ARSI_J1 378 750 24 84 0
ARSI_J4 489 741 17 102 0
ARSI_M0_J1 4 1 11 1 0
ARSI_M0_J4 14 14 0 4 0
M0 11 16 11 1 0
TAM_J1 0 0 10 0 0
TAM_J4 0 0 0 0 0
Predicted atlas labels on the query UMAP
DimPlot(Melange_M0, group.by = "predicted_cluster", label = FALSE) +
ggtitle("Melange_M0 projected onto atlas clusters (nearest neighbor)")

Query cells overlaid on the atlas UMAP
library(ggplot2)
DimPlot(atlas, group.by = "short.label") +
geom_point(
data = as.data.frame(Embeddings(Melange_M0, "umap")) %>%
mutate(predicted_cluster = Melange_M0$predicted_cluster),
aes(x = umap_1, y = umap_2, color = predicted_cluster),
alpha = 0.5, size = 0.5
) +
ggtitle("Melange_M0 cells projected onto Atlas UMAP")

---
title: "Atlas Mapping of Melange_M0 to Coulton Macrophage Atlas"
author: "Nasir Mahmood Abbasi"
date: "`r Sys.Date()`"
output:
  html_notebook:
    toc: yes
    toc_float: yes
    toc_collapsed: yes
  word_document:
    toc: yes
  html_document:
    toc: yes
    df_print: paged
  pdf_document:
    toc: yes
---

# 1. Load libraries
```{r}

# Core single-cell
suppressPackageStartupMessages({
library(Seurat)
library(SingleR)
library(SummarizedExperiment)
library(Matrix)
library(S4Vectors)
library(dplyr)
library(ggplot2)
})


# Optional helpers (used only if present)
suppressWarnings({
if (!requireNamespace("BiocParallel", quietly = TRUE)) {
message("BiocParallel not found; proceeding in serial mode.")
}
})
options(stringsAsFactors = FALSE)
set.seed(1337)

```


# 2. Load Melange
```{r}

# # Load Melange_M0 Seurat object
# load("/run/user/1000/gvfs/smb-share:server=10.144.142.131,share=commun/Ludivine/année_recherche_Ludivine/JUIN_OCTOBRE/manip_3/Melange_M0.Robj")  
# Melange_M0  # Check object

```

# 3. Load the Coulton reference atlas
```{r}

# atlas <- readRDS("/run/user/1000/gvfs/smb-share:server=10.144.142.131,share=commun/Ludivine/alexcoulton-macrophage-atlas-5449048/Coulton.rds")
# 
# atlas
# 
# # Layers are listed for the RNA assay, but the checks below show that its counts are all zero.

```

##  Expression Data Quality in Query and Atlas
```{r}

# For melange
Melange_M0
str(Melange_M0)
head(colnames(Melange_M0))
head(rownames(Melange_M0))

# For atlas
atlas
str(atlas)
head(colnames(atlas))
head(rownames(atlas))
head(atlas@meta.data)

# In the atlas metadata, nCount_RNA and nFeature_RNA are zero for every cell.
# The counts layer of the RNA assay therefore sums to zero per cell: no gene expression data is present.


cat("**Note:** the reference does not have gene expression information.\n\n",
    "**Important:** Most annotation tools (ProjecTILs, SingleR, Seurat label transfer) require expression data in the atlas to compare new cells for label assignment.\n\n",
    "**Reminder:** The presence of cell IDs and annotation alone is not enough for label transfer.\n\n",
    "**Requirement:** You need at least a non-zero counts matrix in the RNA assay of the atlas.\n")



```

##  Assessment of Reference and Query
```{r}
# Sum counts per cell, using layer 'counts' (Seurat v5 uses 'layer' instead of the deprecated 'slot')
sum_zero_melange <- sum(Matrix::colSums(GetAssayData(Melange_M0, assay = "RNA", layer = "counts")) == 0)

cat("Zero-count cells in Melange_M0 RNA assay:", sum_zero_melange, "\n")


sum_zero_atlas <- sum(Matrix::colSums(GetAssayData(atlas, assay = "RNA", layer = "counts")) == 0)

cat("Zero-count cells in atlas RNA assay:", sum_zero_atlas, "\n")

```

**Step 1 – Assessment of Reference and Query:**  

The query object is usable, but the atlas cannot serve as a reference for expression-based label transfer because its RNA assay contains no per-cell expression data.

**Results:**  

**Query (Melange_M0) RNA assay:** 0 cells have zero total counts, so every query cell has expression data available for analysis and label transfer.  

**Atlas RNA assay:** All 363,315 cells have zero total counts. This confirms the earlier observation from the metadata (`nCount_RNA` and `nFeature_RNA` are zero for every cell).


**Note:** The reference does not contain gene expression information.  

**Important:** Most annotation tools (ProjecTILs, SingleR, Seurat label transfer) require expression data in the atlas to compare new cells for label assignment.  

**Reminder:** The presence of cell IDs and annotation alone is not enough for label transfer.  

**Requirement:** You need at least a non-zero counts matrix in the RNA assay of the atlas.
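
As a minimal sketch of what this check measures, the toy example below builds a small sparse genes-by-cells matrix and counts the cells whose column sum is zero. The matrix `toy_counts` and the helper `count_empty_cells()` are illustrative names and made-up data, not part of Seurat or of the objects used in this notebook.

```r
library(Matrix)

# Toy genes x cells count matrix (hypothetical data): cell "c3" has no counts
toy_counts <- Matrix(
  c(3, 1, 0,
    0, 2, 0,
    1, 5, 0),
  nrow = 3, byrow = TRUE, sparse = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("c", 1:3))
)

# Hypothetical helper: number of cells (columns) whose total count is zero
count_empty_cells <- function(mat) {
  sum(Matrix::colSums(mat) == 0)
}

n_empty <- count_empty_cells(toy_counts)
frac_empty <- n_empty / ncol(toy_counts)
cat("Empty cells:", n_empty, "of", ncol(toy_counts), "\n")
```

Reporting the fraction alongside the absolute number makes it easier to tell a few failed cells from an assay that is empty throughout, as is the case for the atlas RNA assay here.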


## Alternative Assays in Atlas for Usable Data
```{r}

# Using GetAssayData with 'layer' argument in Seurat v5
sum_zero_SCT <- sum(Matrix::colSums(GetAssayData(atlas, assay = "SCT", layer = "counts")) == 0)
cat("Zero-count cells in atlas SCT assay:", sum_zero_SCT, "\n")

sum_zero_integrated <- sum(Matrix::colSums(GetAssayData(atlas, assay = "integrated", layer = "counts")) == 0)
cat("Zero-count cells in atlas integrated assay:", sum_zero_integrated, "\n")



```

**Step 2 – Data Quality Check:**  

Before performing label transfer, we confirmed that the query object (Melange_M0) contains valid expression data for all cells.  

The reference atlas, however, has no counts in its RNA or SCT assays (all 363,315 cells sum to zero in both). Only the integrated assay returns non-zero column sums for every cell; integrated assays typically hold batch-corrected values rather than raw counts.  

**Implication:**  
Expression-based annotation tools (e.g., SingleR, Seurat label transfer, ProjecTILs) require non-zero counts in the reference atlas to assign labels meaningfully. Since the atlas RNA and SCT counts are all zero, it cannot serve as a proper reference.
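
A non-zero column sum does not by itself show that an assay holds raw counts, because corrected or scaled values also sum to something other than zero. The sketch below, on toy matrices, tests whether a matrix looks like raw counts (non-negative whole numbers with at least one non-zero entry). `looks_like_counts()` is a hypothetical helper written for this illustration, not a Seurat function, and it converts to a dense matrix, so it is only meant for small inputs or subsets.

```r
# Hypothetical helper: TRUE if a matrix contains only non-negative whole
# numbers and at least one non-zero value (dense conversion: small inputs only)
looks_like_counts <- function(mat) {
  vals <- as.numeric(as.matrix(mat))
  all(vals >= 0) && all(vals == round(vals)) && any(vals > 0)
}

# Toy matrices (made-up values)
raw_like  <- matrix(c(0, 3, 1, 0, 7, 2), nrow = 2)               # raw counts
corrected <- matrix(c(-0.4, 1.2, 0.0, 2.7, -1.1, 0.3), nrow = 2) # corrected values
all_zero  <- matrix(0, nrow = 2, ncol = 3)                       # empty assay

checks <- c(
  raw_like  = looks_like_counts(raw_like),
  corrected = looks_like_counts(corrected),
  all_zero  = looks_like_counts(all_zero)
)
print(checks)
```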



## DimPlot Visualization
```{r}
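# UMAP of the query, colored by the existing "Clusters" annotation
DimPlot(Melange_M0, reduction = "umap", label = TRUE, group.by = "Clusters", repel = TRUE) + ggtitle("UMAP of Melange_M0")

DefaultAssay(atlas) <- "integrated"

# UMAP of the atlas, restricted to cells that carry a short.label annotation
labelled_cells <- colnames(atlas)[!is.na(atlas$short.label)]
DimPlot(atlas, reduction = "umap", group.by = "short.label", label = FALSE, cells = labelled_cells)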


```


##  Prepare the UMAP coordinates
```{r}
# Since the atlas has no expression data, the comparison below relies on the stored UMAP coordinates of both objects and does not require RNA counts.
# Atlas UMAP coordinates
atlas_umap <- Embeddings(atlas, "umap")
head(atlas_umap)

# Melange_M0 UMAP coordinates
melange_umap <- Embeddings(Melange_M0, "umap")
head(melange_umap)

```

## Create combined data frame for plotting
```{r}
# Atlas: UMAP coordinates plus cluster label
df_atlas <- data.frame(
  UMAP_1 = atlas_umap[, 1],
  UMAP_2 = atlas_umap[, 2],
  dataset = "Atlas",
  cluster = atlas$short.label
)

# Melange_M0: rename umap_1/umap_2 to UMAP_1/UMAP_2 so the columns match the atlas
df_melange <- data.frame(
  UMAP_1 = melange_umap[, 1],
  UMAP_2 = melange_umap[, 2],
  dataset = "Melange_M0",
  cluster = NA   # no atlas label for query cells; keeps the column structure identical
)


# Combine
df_combined <- rbind(df_atlas, df_melange)


```

## Plot
```{r}
ggplot(df_combined, aes(x=UMAP_1, y=UMAP_2, color=dataset)) +
  geom_point(alpha=0.5, size=0.5) +
  theme_minimal() +
  ggtitle("Melange_M0 projected onto Atlas UMAP")

```

## Overlay query on atlas UMAP
```{r}
# Since the atlas has no expression data, label transfer is not possible yet, but the position of the query cells relative to the atlas in UMAP space can still be inspected.

ggplot(df_combined, aes(x=UMAP_1, y=UMAP_2)) +
  geom_point(data = subset(df_combined, dataset == "Atlas"),
             color = "lightgray", alpha = 0.3, size = 0.5) +
  geom_point(data = subset(df_combined, dataset == "Melange_M0"),
             color = "red", alpha = 0.5, size = 0.7) +
  theme_minimal() +
  ggtitle("Projection of Melange_M0 onto Atlas UMAP")


```

## Highlight atlas clusters
```{r}
ggplot(df_combined, aes(x=UMAP_1, y=UMAP_2)) +
  geom_point(data = subset(df_combined, dataset == "Atlas"),
             aes(color = cluster), alpha = 0.3, size = 0.5) +
  geom_point(data = subset(df_combined, dataset == "Melange_M0"),
             color = "red", alpha = 0.5, size = 0.7) +
  theme_minimal() +
  ggtitle("Projection of Melange_M0 onto Atlas UMAP with atlas clusters")


```


## Prepare your query and reference
```{r}
# Make sure your atlas is normalized and has a UMAP
DefaultAssay(atlas) <- "integrated"

```

## Approximate annotation using UMAP coordinates
```{r}
library(FNN)

# 1. Extract UMAP embeddings for atlas and query
atlas_umap <- Embeddings(atlas, "umap")
query_umap <- Embeddings(Melange_M0, "umap")

# 2. Get the cluster labels from atlas (choose correct column)
atlas_clusters <- atlas$short.label   

# 3. Run nearest-neighbor search (for each query cell, find closest atlas cell in UMAP space)
nn <- get.knnx(data = atlas_umap, query = query_umap, k = 1)

# 4. Assign query cells the cluster label of their nearest atlas neighbor
predicted_cluster <- atlas_clusters[nn$nn.index[,1]]
names(predicted_cluster) <- rownames(query_umap)

# 5. Add to query Seurat object
Melange_M0 <- AddMetaData(Melange_M0, metadata = predicted_cluster, col.name = "predicted_cluster")

# 6. Visualize
DimPlot(Melange_M0, group.by = "predicted_cluster", label = FALSE)

table(Melange_M0$Clusters, Melange_M0$predicted_cluster)

```
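
To make the nearest-neighbor step concrete, here is a small base-R sketch on toy 2-D coordinates. It assigns each query point the label of its closest reference point and also keeps the distance, so that query points far from every reference point can be flagged instead of labeled. All data, the helper `nearest_label()`, and the cutoff `max_dist` are assumptions made for illustration; the notebook itself relies on `FNN::get.knnx()`.

```r
# Toy reference embedding with labels (hypothetical data)
ref_xy <- rbind(c(0, 0), c(0, 1), c(5, 5), c(5, 6))
ref_labels <- c("A", "A", "B", "B")

# Toy query embedding; the last point lies far from every reference point
query_xy <- rbind(c(0.2, 0.1), c(4.8, 5.4), c(20, 20))

# Hypothetical helper: 1-nearest-neighbor label with a distance cutoff
nearest_label <- function(ref, labels, query, max_dist = Inf) {
  out <- character(nrow(query))
  dist_out <- numeric(nrow(query))
  for (i in seq_len(nrow(query))) {
    # Euclidean distance from query point i to every reference point
    d <- sqrt(rowSums(sweep(ref, 2, query[i, ])^2))
    j <- which.min(d)
    dist_out[i] <- d[j]
    # Leave the label missing when even the closest reference point is too far
    out[i] <- if (d[j] <= max_dist) labels[j] else NA_character_
  }
  data.frame(label = out, distance = dist_out, stringsAsFactors = FALSE)
}

pred <- nearest_label(ref_xy, ref_labels, query_xy, max_dist = 2)
print(pred)
```

With `FNN::get.knnx()` the corresponding distances are returned in `nn$nn.dist`, so a similar cutoff could be applied to the result of the chunk above; a suitable threshold would have to be chosen by inspecting the distance distribution.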

## Predicted atlas labels on the query UMAP
```{r}
DimPlot(Melange_M0, group.by = "predicted_cluster", label = FALSE) +
  ggtitle("Melange_M0 projected onto atlas clusters (nearest neighbor)")

```
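
A contingency table of sample groups against predicted labels is often easier to read as row-wise proportions. The sketch below uses a small made-up example; the group names and labels are illustrative and are not the notebook's results.

```r
# Toy sample groups and predicted labels (hypothetical, not from Melange_M0)
sample_group <- c("S1", "S1", "S1", "S1", "S2", "S2", "S2", "S2")
pred_label   <- c("A",  "A",  "A",  "B",  "B",  "B",  "A",  "B")

tab <- table(sample_group, pred_label)

# Fraction of each sample group assigned to each label (each row sums to 1)
row_prop <- prop.table(tab, margin = 1)
print(round(row_prop, 2))

# Most frequent predicted label per sample group
top_label <- colnames(tab)[apply(tab, 1, which.max)]
names(top_label) <- rownames(tab)
print(top_label)
```

The same two calls, `prop.table(..., margin = 1)` and `apply(..., 1, which.max)`, could be applied to `table(Melange_M0$Clusters, Melange_M0$predicted_cluster)`.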

## Query cells overlaid on the atlas UMAP
```{r}
library(ggplot2)

DimPlot(atlas, group.by = "short.label") +
  geom_point(
    data = as.data.frame(Embeddings(Melange_M0, "umap")) %>%
             mutate(predicted_cluster = Melange_M0$predicted_cluster),
    aes(x = umap_1, y = umap_2, color = predicted_cluster),
    alpha = 0.5, size = 0.5
  ) +
  ggtitle("Melange_M0 cells projected onto Atlas UMAP")


```
# **Summary of Steps and Observations**

## **Load Query and Atlas**
- **Query:** `Melange_M0` Seurat object with valid RNA counts.
- **Atlas:** Coulton macrophage atlas (`atlas`) with UMAP embeddings and cluster labels, but **no gene expression data** (RNA assay counts are all zero).

## **Assess Data Quality**
- Query has usable expression for all cells → suitable for **expression-based annotation**.  
- Atlas lacks gene expression → cannot perform **expression-based label transfer** (tools like SingleR, Seurat label transfer, or ProjecTILs rely on comparing expression profiles).

## **Visualization**
- Both atlas and query **UMAPs can be visualized**.  
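- Query cells can be projected onto atlas UMAP using **nearest-neighbor approaches in coordinate space**, but **cluster assignment will be approximate**: the query and atlas UMAPs were computed separately, so their coordinates are not guaranteed to share the same axes.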

## **Attempting Annotation Without Expression**
- Since the atlas has no expression data, tools like **SingleR** or standard **ProjecTILs** cannot assign labels.  
- **Alternative:** coordinate-based nearest-neighbor projection on **UMAP or PCA embeddings**, which approximates the closest atlas cluster for each query cell. This atlas object stores only a UMAP reduction, so UMAP coordinates were used here.

## **Why Expression Data is Needed**
- Expression-based annotation tools compare the **gene expression profile** of query cells to the reference.  
- Without expression:
  - Cannot calculate similarity.
  - Cannot identify marker genes.
  - Cannot confidently assign labels.  
- UMAP or PCA embeddings alone allow visual projection, **but not true transcriptomic label transfer**.
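
The comparison that expression-based tools rely on can be illustrated with a deliberately simplified sketch: each query cell is correlated against per-label reference profiles and receives the best-matching label. This shows the general idea only. It is not the algorithm of SingleR, Seurat, or ProjecTILs, and all values and names (`ref_profiles`, `query_expr`, `assign_by_correlation()`) are made up for illustration.

```r
# Toy reference: mean expression per label (genes x labels), hypothetical values
ref_profiles <- cbind(
  MacA = c(g1 = 9, g2 = 7, g3 = 1, g4 = 0),
  MacB = c(g1 = 0, g2 = 2, g3 = 8, g4 = 9)
)

# Toy query: expression per cell (genes x cells)
query_expr <- cbind(
  cell1 = c(g1 = 8, g2 = 5, g3 = 2, g4 = 1),
  cell2 = c(g1 = 1, g2 = 0, g3 = 6, g4 = 7)
)

# Hypothetical helper: label each query cell by its highest Spearman correlation
assign_by_correlation <- function(query, ref) {
  cors <- cor(query, ref, method = "spearman")  # cells x labels
  colnames(ref)[apply(cors, 1, which.max)]
}

labels <- assign_by_correlation(query_expr, ref_profiles)
names(labels) <- colnames(query_expr)
print(labels)

# With an all-zero reference every profile is constant, so the correlation is
# undefined (NA) and no label can be chosen
zero_ref <- ref_profiles * 0
zero_cors <- suppressWarnings(cor(query_expr, zero_ref, method = "spearman"))
print(zero_cors)
```

The second part mirrors the situation in this notebook: when the reference contains only zeros, there is no similarity to compute.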

## **Methods That Can Work Without Expression**
- **UMAP/PCA nearest-neighbor projection:**  
  Assign each query cell to the nearest atlas cluster using embeddings. Provides approximate annotation, not transcriptomic confirmation.  


## **Key Takeaway**
For **true expression-based annotation**, the reference atlas **must have non-zero counts**.  
Without it, you can only perform **embedding-based nearest-neighbor mapping**, useful for visualization and rough cluster assignment, **but not for transcriptomic validation**.







