Under construction.
Dataset: GSE218316, a dataset of pancreatic islet cells.
It provides three preprocessed 10x files and one file that records the meta.data:
GSE218316_filtered_counts_barcodes.tsv.gz
GSE218316_filtered_counts_data.mtx.gz
GSE218316_filtered_counts_gene.tsv.gz
GSE218316_metadata_percell.csv.gz
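The first three files were exported from the authors' filtered SeuratObject. The last file provides the metadata, which lets us add annotations when we create our own SeuratObject.
Decompress the first three files and rename them to barcodes.tsv, matrix.mtx, and genes.tsv. I placed these three files in the Data folder under the working directory (Data/10x/ in the code below).
Decompress GSE218316_metadata_percell.csv.gz to obtain GSE218316_metadata_percell.csv.
The barcodes and matrix files have the same format as the ones we read earlier; only genes.tsv differs from the usual features.tsv. Let's look at the difference: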
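If you would rather do the decompression from R than by hand, the sketch below shows one way with base R only. The helper name `gunzip_to()` and the temporary demo file are my own assumptions for illustration; they are not part of the dataset or of any package.

```r
# Decompress a .gz file to a plain file using only base R connections.
# gunzip_to() is a hypothetical helper written for this example.
gunzip_to <- function(gz_path, out_path) {
  con <- gzfile(gz_path, open = "rb")   # reading through gzfile() decompresses
  on.exit(close(con), add = TRUE)
  out <- file(out_path, open = "wb")
  on.exit(close(out), add = TRUE)
  repeat {
    chunk <- readBin(con, what = "raw", n = 1e6)
    if (length(chunk) == 0) break       # end of the compressed stream
    writeBin(chunk, out)
  }
  invisible(out_path)
}

# Toy demonstration: a temporary .gz file stands in for a GEO download
tmp_dir <- tempfile("gse_demo_")
dir.create(tmp_dir)
gz_file <- file.path(tmp_dir, "demo_barcodes.tsv.gz")
con <- gzfile(gz_file, open = "wt")
writeLines(c("B1_AAACCTGAGACTAAGT", "B1_AAACCTGAGCAGGCTA"), con)
close(con)

# Decompress and rename in one step
gunzip_to(gz_file, file.path(tmp_dir, "barcodes.tsv"))
demo_barcodes <- readLines(file.path(tmp_dir, "barcodes.tsv"))
print(demo_barcodes)
```

For the real files you would call the helper once per download, with the GEO file name as input and barcodes.tsv, matrix.mtx, or genes.tsv as output.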
# Read the gene file of GSE218316
gene <- read.table("D:/Book/T2DM/Data/10x/genes.tsv")$V1
head(gene)
[1] "REG3A" "CXCL10" "CTRB2" "PRSS2" "REG1A" "SPINK1"
length(gene)
[1] 2000
# Pick any original, regular gene file and read it
gene <- read.table("D:/Book/T2DM/Data/10x/feature.tsv")$V2
head(gene)
[1] "MIR1302-2HG" "FAM138A" "OR4F5" "AL627309.1" "AL627309.3"
[6] "AL627309.2"
length(gene)
[1] 33540
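It is easy to see that the processed gene file contains only 2000 gene symbols. Because its format is different, we cannot read it with the Read10X() function. And because its gene count is not the usual one (discussed below), it cannot be matched to the rows of the processed expression matrix. Here we can substitute the dataset's original gene file instead (the samples were all run on the same platform, so any one of their gene files will do as input).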
# In the regular case, read the second column to get the gene symbols
dir.10x <- "D:/Book/T2DM/Data/10x/"
genes <- read.table(paste0(dir.10x, "feature.tsv"), stringsAsFactors = FALSE, header = FALSE)$V2
head(genes)
[1] "MIR1302-2HG" "FAM138A" "OR4F5" "AL627309.1" "AL627309.3"
[6] "AL627309.2"
# For the processed file, only the first column needs to be read
genes <- read.table(paste0(dir.10x, "genes.tsv"), stringsAsFactors = FALSE, header = FALSE)$V1
head(genes)
[1] "REG3A" "CXCL10" "CTRB2" "PRSS2" "REG1A" "SPINK1"
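The two reads above differ only in which column holds the gene symbol. As a minimal sketch, the choice can be wrapped in a small function that looks at the number of columns. `read_gene_symbols()` is a hypothetical helper, and the toy files and placeholder IDs below are invented for the demonstration.

```r
# Return gene symbols from a 10x-style gene file.
# Assumption: files with two or more columns keep the symbol in column 2
# (ID, symbol, optional feature type); one-column files hold symbols only.
read_gene_symbols <- function(path) {
  tab <- read.table(path, sep = "\t", header = FALSE,
                    stringsAsFactors = FALSE, quote = "", comment.char = "")
  if (ncol(tab) >= 2) tab[[2]] else tab[[1]]
}

# Toy files that mimic the two layouts (the IDs are placeholders)
one_col <- tempfile(fileext = ".tsv")
writeLines(c("REG3A", "CXCL10", "CTRB2"), one_col)

three_col <- tempfile(fileext = ".tsv")
writeLines(c("ID_toy_1\tMIR1302-2HG\tGene Expression",
             "ID_toy_2\tFAM138A\tGene Expression"), three_col)

symbols_one <- read_gene_symbols(one_col)
symbols_three <- read_gene_symbols(three_col)
print(symbols_one)
print(symbols_three)
```

Setting `sep = "\t"` matters here because the feature-type column can contain spaces.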
Now we need to handle the metadata file.
# Import the metadata
meta <- read.csv("D:/Book/T2DM/Data/GSE218316_metadata_percell.csv", header = TRUE)
head(meta)
                    X         Condition Donor Timepoint seurat_clusters
1 B1_AAACCTGAGACTAAGT Glucose+Palmitate    D1       24h              23
2 B1_AAACCTGAGCAGGCTA Glucose+Palmitate    D1       24h              16
3 B1_AAACGGGGTTTCGCTC Glucose+Palmitate    D1       24h               7
4 B1_AAAGATGCACATAACC Glucose+Palmitate    D1       24h              28
5 B1_AAAGATGCACCTTGTC Glucose+Palmitate    D1       24h              20
6 B1_AAAGATGTCGTCTGCT Glucose+Palmitate    D1       24h              23
    CTclass Celltype
1  Exocrine     Duct
2  Exocrine     Duct
3 Endocrine     Beta
4  Exocrine     Duct
5 Endocrine    Alpha
6  Exocrine     Duct
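We simply pass the meta variable to the meta.data argument of CreateSeuratObject(). Below I read the files step by step; once the original gene file is in place, calling Read10X() directly would also work.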
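As I understand the CreateSeuratObject() documentation, the row names of meta.data are expected to match the column names of the count matrix. The meta table read above carries the barcodes in column X and has plain integer row names, so it can be worth aligning it explicitly before use. The sketch below uses a hypothetical helper, `align_metadata()`, and toy data; it is one possible safeguard, not a required step of the workflow.

```r
# Reorder a metadata table so its rows follow the barcode order of the matrix,
# and use the barcodes as row names. align_metadata() is a hypothetical helper.
align_metadata <- function(meta, barcodes, id_col = "X") {
  stopifnot(id_col %in% colnames(meta))
  ids <- meta[[id_col]]
  if (anyDuplicated(ids) > 0) {
    stop("duplicated barcodes in metadata column '", id_col, "'")
  }
  missing <- setdiff(barcodes, ids)
  if (length(missing) > 0) {
    stop(length(missing), " barcode(s) have no metadata row")
  }
  rownames(meta) <- ids
  meta[barcodes, , drop = FALSE]   # index by name to match the matrix order
}

# Toy data: metadata rows are deliberately out of order
meta_demo <- data.frame(
  X = c("c2", "c1", "c3"),
  Celltype = c("Beta", "Duct", "Alpha"),
  stringsAsFactors = FALSE
)
barcodes_demo <- c("c1", "c2", "c3")

aligned <- align_metadata(meta_demo, barcodes_demo)
print(aligned)
```

With the real data, the call would take meta and the barcodes vector, and the result would be passed to meta.data.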
library(Seurat)
Loading required package: SeuratObject
Loading required package: sp
The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
which was just loaded, will retire in October 2023.
Please refer to R-spatial evolution reports for details, especially
https://r-spatial.org/r/2023/05/15/evolution4.html.
It may be desirable to make the sf package available;
package maintainers should consider adding sf to Suggests:.
The sp package is now running under evolution status 2
(status 2 uses the sf package in place of rgdal)
Attaching package: 'SeuratObject'
The following objects are masked from 'package:base':
intersect, saveRDS
Loading Seurat v5 beta version
To maintain compatibility with previous workflows, new Seurat objects will use the previous object structure by default
To use new Seurat v5 assays: Please run: options(Seurat.object.assay.version = 'v5')
library(SeuratObject)
dir.10x <- "D:/Book/T2DM/Data/10x/"
# features.tsv.gz is replaced here by the original gene file, which I decompressed to feature.tsv
## Step-by-step equivalent of Read10X()
genes <- read.table(paste0(dir.10x, "feature.tsv"), stringsAsFactors = FALSE, header = FALSE)$V2
# Gene symbols can repeat; make them unique so they can serve as row names
genes <- make.unique(genes, sep = ".")
barcodes <- readLines(paste0(dir.10x, "barcodes.tsv"))
mtx <- Matrix::readMM(paste0(dir.10x, "matrix.mtx"))
# as(<dgTMatrix>, "dgCMatrix") is deprecated in recent Matrix versions;
# the deprecation message recommends as(., "CsparseMatrix") instead
mtx <- as(mtx, "CsparseMatrix")
colnames(mtx) <- barcodes
rownames(mtx) <- genes
obj <- CreateSeuratObject(
  counts = mtx,
  # pass in the meta.data
  meta.data = meta,
  project = "Pancreatic islet cells"
)
Inspect the newly created SeuratObject:
obj
An object of class Seurat
33540 features across 23845 samples within 1 assay
Active assay: RNA (33540 features, 0 variable features)
2 layers present: counts, data
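The feature names reported above went through make.unique() before being used as row names, since gene symbols in a feature file can repeat. A tiny illustration of what that call does, using made-up symbols:

```r
# Toy gene symbols with a repeated entry (the names are placeholders)
symbols <- c("GENE_A", "GENE_B", "GENE_A", "GENE_A")

# make.unique() keeps the first occurrence and numbers the later ones
unique_symbols <- make.unique(symbols, sep = ".")
print(unique_symbols)
# [1] "GENE_A"   "GENE_B"   "GENE_A.1" "GENE_A.2"
```

The length of the vector is unchanged, so the result still lines up with the rows of the matrix.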
head(obj@meta.data)
                    orig.ident                   X         Condition Donor
B1_AAACCTGAGACTAAGT         B1 B1_AAACCTGAGACTAAGT Glucose+Palmitate    D1
B1_AAACCTGAGCAGGCTA         B1 B1_AAACCTGAGCAGGCTA Glucose+Palmitate    D1
B1_AAACGGGGTTTCGCTC         B1 B1_AAACGGGGTTTCGCTC Glucose+Palmitate    D1
B1_AAAGATGCACATAACC         B1 B1_AAAGATGCACATAACC Glucose+Palmitate    D1
B1_AAAGATGCACCTTGTC         B1 B1_AAAGATGCACCTTGTC Glucose+Palmitate    D1
B1_AAAGATGTCGTCTGCT         B1 B1_AAAGATGTCGTCTGCT Glucose+Palmitate    D1
                    Timepoint seurat_clusters   CTclass Celltype nCount_RNA
B1_AAACCTGAGACTAAGT       24h              23  Exocrine     Duct       6161
B1_AAACCTGAGCAGGCTA       24h              16  Exocrine     Duct      11882
B1_AAACGGGGTTTCGCTC       24h               7 Endocrine     Beta      23081
B1_AAAGATGCACATAACC       24h              28  Exocrine     Duct       4864
B1_AAAGATGCACCTTGTC       24h              20 Endocrine    Alpha      10304
B1_AAAGATGTCGTCTGCT       24h              23  Exocrine     Duct       9934
                    nFeature_RNA
B1_AAACCTGAGACTAAGT         2103
B1_AAACCTGAGCAGGCTA         2760
B1_AAACGGGGTTTCGCTC         3569
B1_AAAGATGCACATAACC         1544
B1_AAAGATGCACCTTGTC         1630
B1_AAAGATGTCGTCTGCT         2399
Note that the lengths of the different variables have to line up.
mtx is a 33540 x 23845 matrix: 33540 genes and 23845 cells.
head(mtx)
6 x 23845 sparse Matrix of class "dgCMatrix"
[[ suppressing 34 column names 'B1_AAACCTGAGACTAAGT', 'B1_AAACCTGAGCAGGCTA', 'B1_AAACGGGGTTTCGCTC' ... ]]
MIR1302-2HG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
FAM138A     . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
OR4F5       . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
AL627309.1  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
AL627309.3  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
AL627309.2  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
 .....suppressing 23811 columns in show(); maybe adjust 'options(max.print= *, width = *)'
 ..............................
dim(mtx)
[1] 33540 23845
When we pass in the genes and the metadata, we need to make sure that their sizes agree with mtx.
length(genes)
[1] 33540
dim(meta)
[1] 23845 7
dim(mtx)
[1] 33540 23845
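The three size comparisons above can be turned into a guard that fails early when something is out of step. The sketch below is one way to write it; `check_inputs()` is a hypothetical helper and the toy matrix only stands in for the real data. It assumes the Matrix package is installed, as it is in the session shown in this post.

```r
library(Matrix)

# Stop with an error unless genes, barcodes, and metadata match the matrix shape.
# check_inputs() is a hypothetical helper written for this example.
check_inputs <- function(mtx, genes, barcodes, meta) {
  stopifnot(
    length(genes) == nrow(mtx),      # one gene name per row
    length(barcodes) == ncol(mtx),   # one barcode per column
    nrow(meta) == ncol(mtx)          # one metadata row per cell
  )
  invisible(TRUE)
}

# Toy inputs: 3 genes x 2 cells
toy_mtx <- Matrix::sparseMatrix(i = c(1, 3), j = c(1, 2), x = c(5, 2), dims = c(3, 2))
toy_genes <- c("G1", "G2", "G3")
toy_barcodes <- c("c1", "c2")
toy_meta <- data.frame(X = toy_barcodes, Celltype = c("Duct", "Beta"))

ok <- check_inputs(toy_mtx, toy_genes, toy_barcodes, toy_meta)

# Dropping one gene name reproduces the kind of mismatch a 2000-gene file would cause
bad <- tryCatch(
  check_inputs(toy_mtx, toy_genes[-1], toy_barcodes, toy_meta),
  error = function(e) "mismatch"
)
print(ok)
print(bad)
```

Run on the real objects, the call would take mtx, genes, barcodes, and meta right before CreateSeuratObject().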
devtools::session_info()
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.3.1 (2023-06-16 ucrt)
os Windows 11 x64 (build 22621)
system x86_64, mingw32
ui RTerm
language (EN)
collate Chinese (Simplified)_China.utf8
ctype Chinese (Simplified)_China.utf8
tz Asia/Hong_Kong
date 2023-11-05
pandoc 3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
abind 1.4-5 2016-07-21 [1] CRAN (R 4.3.0)
cachem 1.0.8 2023-05-01 [1] CRAN (R 4.3.1)
callr 3.7.3 2022-11-02 [1] CRAN (R 4.3.1)
cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.1)
cluster 2.1.4 2022-08-22 [2] CRAN (R 4.3.1)
codetools 0.2-19 2023-02-01 [2] CRAN (R 4.3.1)
colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.1)
cowplot 1.1.1 2020-12-30 [1] CRAN (R 4.3.1)
crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.1)
data.table 1.14.8 2023-02-17 [1] CRAN (R 4.3.1)
deldir 1.0-9 2023-05-17 [1] CRAN (R 4.3.0)
devtools 2.4.5 2022-10-11 [1] CRAN (R 4.3.1)
digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.1)
dotCall64 1.0-2 2022-10-03 [1] CRAN (R 4.3.1)
dplyr 1.1.2 2023-04-20 [1] CRAN (R 4.3.1)
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.3.1)
evaluate 0.21 2023-05-05 [1] CRAN (R 4.3.1)
fansi 1.0.4 2023-01-22 [1] CRAN (R 4.3.1)
fastDummies 1.7.3 2023-07-06 [1] CRAN (R 4.3.1)
fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.1)
fitdistrplus 1.1-11 2023-04-25 [1] CRAN (R 4.3.1)
fs 1.6.3 2023-07-20 [1] CRAN (R 4.3.1)
future 1.33.0 2023-07-01 [1] CRAN (R 4.3.1)
future.apply 1.11.0 2023-05-21 [1] CRAN (R 4.3.1)
generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.1)
ggplot2 3.4.2 2023-04-03 [1] CRAN (R 4.3.1)
ggrepel 0.9.3 2023-02-03 [1] CRAN (R 4.3.1)
ggridges 0.5.4 2022-09-26 [1] CRAN (R 4.3.1)
globals 0.16.2 2022-11-21 [1] CRAN (R 4.3.0)
glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.1)
goftest 1.2-3 2021-10-07 [1] CRAN (R 4.3.0)
gridExtra 2.3 2017-09-09 [1] CRAN (R 4.3.1)
gtable 0.3.3 2023-03-21 [1] CRAN (R 4.3.1)
htmltools 0.5.5 2023-03-23 [1] CRAN (R 4.3.1)
htmlwidgets 1.6.2 2023-03-17 [1] CRAN (R 4.3.1)
httpuv 1.6.11 2023-05-11 [1] CRAN (R 4.3.1)
httr 1.4.6 2023-05-08 [1] CRAN (R 4.3.1)
ica 1.0-3 2022-07-08 [1] CRAN (R 4.3.0)
igraph 1.5.0 2023-06-16 [1] CRAN (R 4.3.1)
irlba 2.3.5.1 2022-10-03 [1] CRAN (R 4.3.1)
jsonlite 1.8.7 2023-06-29 [1] CRAN (R 4.3.1)
KernSmooth 2.23-21 2023-05-03 [2] CRAN (R 4.3.1)
knitr 1.43 2023-05-25 [1] CRAN (R 4.3.1)
later 1.3.1 2023-05-02 [1] CRAN (R 4.3.1)
lattice 0.21-8 2023-04-05 [2] CRAN (R 4.3.1)
lazyeval 0.2.2 2019-03-15 [1] CRAN (R 4.3.1)
leiden 0.4.3 2022-09-10 [1] CRAN (R 4.3.1)
lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.3.1)
listenv 0.9.0 2022-12-16 [1] CRAN (R 4.3.1)
lmtest 0.9-40 2022-03-21 [1] CRAN (R 4.3.1)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.1)
MASS 7.3-60 2023-05-04 [2] CRAN (R 4.3.1)
Matrix 1.5-4.1 2023-05-18 [2] CRAN (R 4.3.1)
matrixStats 1.0.0 2023-06-02 [1] CRAN (R 4.3.1)
memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.1)
mime 0.12 2021-09-28 [1] CRAN (R 4.3.0)
miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.3.1)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.3.1)
nlme 3.1-162 2023-01-31 [2] CRAN (R 4.3.1)
parallelly 1.36.0 2023-05-26 [1] CRAN (R 4.3.0)
patchwork 1.1.2 2022-08-19 [1] CRAN (R 4.3.1)
pbapply 1.7-2 2023-06-27 [1] CRAN (R 4.3.1)
pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.1)
pkgbuild 1.4.2 2023-06-26 [1] CRAN (R 4.3.1)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.1)
pkgload 1.3.2.1 2023-07-08 [1] CRAN (R 4.3.1)
plotly 4.10.2 2023-06-03 [1] CRAN (R 4.3.1)
plyr 1.8.8 2022-11-11 [1] CRAN (R 4.3.1)
png 0.1-8 2022-11-29 [1] CRAN (R 4.3.0)
polyclip 1.10-4 2022-10-20 [1] CRAN (R 4.3.0)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.3.1)
processx 3.8.2 2023-06-30 [1] CRAN (R 4.3.1)
profvis 0.3.8 2023-05-02 [1] CRAN (R 4.3.1)
progressr 0.13.0 2023-01-10 [1] CRAN (R 4.3.1)
promises 1.2.0.1 2021-02-11 [1] CRAN (R 4.3.1)
ps 1.7.5 2023-04-18 [1] CRAN (R 4.3.1)
purrr 1.0.1 2023-01-10 [1] CRAN (R 4.3.1)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.1)
RANN 2.6.1 2019-01-08 [1] CRAN (R 4.3.1)
RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.3.0)
Rcpp 1.0.11 2023-07-06 [1] CRAN (R 4.3.1)
RcppAnnoy 0.0.21 2023-07-02 [1] CRAN (R 4.3.1)
RcppHNSW 0.4.1 2022-07-18 [1] CRAN (R 4.3.1)
remotes 2.4.2.1 2023-07-18 [1] CRAN (R 4.3.1)
reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.3.1)
reticulate 1.30 2023-06-09 [1] CRAN (R 4.3.1)
rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.1)
rmarkdown 2.23 2023-07-01 [1] CRAN (R 4.3.1)
ROCR 1.0-11 2020-05-02 [1] CRAN (R 4.3.1)
RSpectra 0.16-1 2022-04-24 [1] CRAN (R 4.3.1)
rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.1)
Rtsne 0.16 2022-04-17 [1] CRAN (R 4.3.1)
scales 1.2.1 2022-08-20 [1] CRAN (R 4.3.1)
scattermore 1.2 2023-06-12 [1] CRAN (R 4.3.1)
sctransform 0.3.5 2022-09-21 [1] CRAN (R 4.3.1)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.1)
Seurat * 4.9.9.9058 2023-07-23 [1] local
SeuratObject * 4.9.9.9091 2023-07-23 [1] Github (mojaveazure/seurat-object@c51dd86)
shiny 1.7.4.1 2023-07-06 [1] CRAN (R 4.3.1)
sp * 2.0-0 2023-06-22 [1] CRAN (R 4.3.1)
spam 2.9-1 2022-08-07 [1] CRAN (R 4.3.1)
spatstat.data 3.0-1 2023-03-12 [1] CRAN (R 4.3.1)
spatstat.explore 3.2-1 2023-05-13 [1] CRAN (R 4.3.1)
spatstat.geom 3.2-4 2023-07-20 [1] CRAN (R 4.3.1)
spatstat.random 3.1-5 2023-05-11 [1] CRAN (R 4.3.1)
spatstat.sparse 3.0-2 2023-06-25 [1] CRAN (R 4.3.1)
spatstat.utils 3.0-3 2023-05-09 [1] CRAN (R 4.3.1)
stringi 1.7.12 2023-01-11 [1] CRAN (R 4.3.0)
stringr 1.5.0 2022-12-02 [1] CRAN (R 4.3.1)
survival 3.5-5 2023-03-12 [2] CRAN (R 4.3.1)
tensor 1.5 2012-05-05 [1] CRAN (R 4.3.0)
tibble 3.2.1 2023-03-20 [1] CRAN (R 4.3.1)
tidyr 1.3.0 2023-01-24 [1] CRAN (R 4.3.1)
tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.3.1)
urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.3.1)
usethis 2.2.2 2023-07-06 [1] CRAN (R 4.3.1)
utf8 1.2.3 2023-01-31 [1] CRAN (R 4.3.1)
uwot 0.1.16 2023-06-29 [1] CRAN (R 4.3.1)
vctrs 0.6.3 2023-06-14 [1] CRAN (R 4.3.1)
viridisLite 0.4.2 2023-05-02 [1] CRAN (R 4.3.1)
xfun 0.39 2023-04-20 [1] CRAN (R 4.3.1)
xtable 1.8-4 2019-04-21 [1] CRAN (R 4.3.1)
yaml 2.3.7 2023-01-23 [1] CRAN (R 4.3.0)
zoo 1.8-12 2023-04-13 [1] CRAN (R 4.3.1)
[1] C:/Users/Han Wang/AppData/Local/R/win-library/4.3
[2] C:/Program Files/R/R-4.3.1/library
──────────────────────────────────────────────────────────────────────────────