Input data format

SPARKLE supports multiple input formats. Users can directly input single-cell Seurat objects, Seurat metadata, or data frames with precomputed cell proportions.

Input format 1: Seurat objects

library(SPARKLE)

seurat_object <- MNP.seurat_object
class(seurat_object)

## [1] "Seurat"
## attr(,"package")
## [1] "SeuratObject"

sparkle.data <- cwas_build_model_data(
  inputdata = seurat_object,
  Sample = "Patient.No",
  Phenotype = "Status",
  Celltype = "Clusters",
  Group = "Tissue",
  Subgroup = "Study.No",
  Control_label = "Healthy",
  Disease_label = "Cancer"
)

## [1] "Warning: No Covariate1 infomation"
## [1] "Warning: No Covariate2 infomation"
## [1] "No gene infomation added"
## [1] "No geneset score infomation added"

Input format 2: Seurat metadata

metadata <- MNP.metadata
knitr::kable(head(metadata), caption = "Seurat metadata")

Seurat metadata
	orig.ident	nCount_RNA	nFeature_RNA	Initial.No	Study.No	Ascites.1_Blood.2_Breast.3_Colon.4_Stomach.5_Kidney.6_Liver.7_Lung.8_Pancreas.9_Skin.10_Spleen.11_Tonsil.12	Patient.No	MirgDC.1_DC1.2_DC2.3_MacroMono.4	Study	Tissue	Status	Clusters.NO	Clusters	nCount_integrated	nFeature_integrated
ACTTGTTCACCGCTAG.1_1	SeuratProject	1628.662	719	1	1	1	1	3	Tang-Huau	Ascites	Healthy	6	T cell doublets	337.7403	1387
ATTGGACCATGTCCTC.1_1	SeuratProject	1028.799	342	2	1	1	1	3	Tang-Huau	Ascites	Healthy	6	T cell doublets	262.4473	1390
CCTTACGTCCCTGACT.1_1	SeuratProject	1779.581	959	3	1	1	1	3	Tang-Huau	Ascites	Healthy	6	T cell doublets	332.5596	1415
TGTTCCGCAGAAGCAC.1_1	SeuratProject	1493.055	578	4	1	1	1	1	Tang-Huau	Ascites	Healthy	6	T cell doublets	350.6807	1364
TTGACTTGTACTCAAC.1_1	SeuratProject	1555.572	806	5	1	1	1	1	Tang-Huau	Ascites	Healthy	6	T cell doublets	320.1611	1435
AACTCAGGTAAGTGGC.2_1	SeuratProject	1460.183	767	6	1	1	1	3	Tang-Huau	Ascites	Healthy	6	T cell doublets	334.7708	1478

sparkle.data <- cwas_build_model_data(
  inputdata = metadata,
  Sample = "Patient.No",
  Phenotype = "Status",
  Celltype = "Clusters",
  Group = "Tissue",
  Subgroup = "Study.No",
  Control_label = "Healthy",
  Disease_label = "Cancer"
)

## [1] "Warning: No Covariate1 infomation"
## [1] "Warning: No Covariate2 infomation"
## [1] "No gene infomation added"
## [1] "No geneset score infomation added"
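Before building model data from per-cell metadata, it can help to confirm which labels the phenotype column actually contains and how many samples carry each one. The sketch below is not part of SPARKLE: it uses only base R on a small made-up data frame (toy_metadata, a hypothetical name) that borrows the column names from the example above.

```r
# Toy per-cell metadata; column names follow the example above, values are made up.
toy_metadata <- data.frame(
  Patient.No = c("P1", "P1", "P1", "P2", "P2", "P3", "P3", "P3"),
  Status     = c("Healthy", "Healthy", "Healthy", "Cancer", "Cancer",
                 "Cancer", "Cancer", "Cancer"),
  Tissue     = c("Lung", "Lung", "Lung", "Lung", "Lung",
                 "Colon", "Colon", "Colon"),
  Clusters   = c("DC1", "DC2", "DC2", "DC1", "MacroMono",
                 "DC2", "DC2", "MacroMono"),
  stringsAsFactors = FALSE
)

# Collapse per-cell rows to one row per sample/phenotype/tissue combination.
sample_table <- unique(toy_metadata[, c("Patient.No", "Status", "Tissue")])

# Samples per phenotype label; these labels are the candidate values
# for Control_label and Disease_label.
samples_per_status <- table(sample_table$Status)
print(samples_per_status)

# Cells per sample and cell type, the raw counts behind any cell proportion.
cells_per_sample <- table(toy_metadata$Patient.No, toy_metadata$Clusters)
print(cells_per_sample)
```

If sample_table has more rows than there are distinct samples, some sample is annotated with more than one phenotype or tissue, which is worth resolving before modeling.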

Input format 3: data frame with cell rate information

covid.data.ratio <- SPARKLE::covid.data

knitr::kable(head(covid.data.ratio), caption = "Cell rate information table")

Cell rate information table
sampleID	celltype	cellratio	dataset	tissue	severity
S-HC001	NK	0.0050188	Batch08	CD3+ T cell and CD19+ B cell sorted from fresh PBMC (FACS)	control
S-HC001	CD8	0.4382343	Batch08	CD3+ T cell and CD19+ B cell sorted from fresh PBMC (FACS)	control
S-HC001	B	0.0929622	Batch08	CD3+ T cell and CD19+ B cell sorted from fresh PBMC (FACS)	control
S-HC001	Mega	0.0018250	Batch08	CD3+ T cell and CD19+ B cell sorted from fresh PBMC (FACS)	control
S-HC001	Plasma	0.0007984	Batch08	CD3+ T cell and CD19+ B cell sorted from fresh PBMC (FACS)	control
S-HC001	CD4	0.4458766	Batch08	CD3+ T cell and CD19+ B cell sorted from fresh PBMC (FACS)	control

sparkle.data <- cwas_build_model_data(
  inputdata = covid.data.ratio,
  Sample = "sampleID",
  Phenotype = "severity",
  Celltype = "celltype",
  Group = "tissue",
  Subgroup = "dataset",
  Control_label = "control",
  Disease_label = "severe/critical",
  Cellrate = "cellratio"
)

## [1] "Warning: No Covariate1 infomation"
## [1] "Warning: No Covariate2 infomation"
## [1] "No gene infomation added"
## [1] "No geneset score infomation added"
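A cell rate table like the one above can be derived from per-cell annotations when only those are available. The following is a minimal base R sketch, not SPARKLE's internal computation: toy_cells and toy_ratio are hypothetical names, the data are made up, and the output columns simply mirror sampleID, celltype and cellratio from the table above. Whether SPARKLE expects rows for cell types that are absent from a sample is an assumption left open here; this sketch keeps them with a rate of 0.

```r
# Toy per-cell annotations (made up): one row per cell.
toy_cells <- data.frame(
  sampleID = c("S1", "S1", "S1", "S1", "S2", "S2", "S2", "S2", "S2"),
  celltype = c("CD4", "CD4", "CD8", "NK", "CD4", "CD8", "CD8", "CD8", "B"),
  stringsAsFactors = FALSE
)

# Count cells per sample (rows) and cell type (columns).
counts <- table(toy_cells$sampleID, toy_cells$celltype)

# Divide each row by its total so that every sample's rates sum to 1.
ratios <- prop.table(counts, margin = 1)

# Reshape to long format: one row per sample/cell type pair.
# Cell types absent from a sample appear with cellratio 0.
toy_ratio <- as.data.frame(ratios, stringsAsFactors = FALSE)
names(toy_ratio) <- c("sampleID", "celltype", "cellratio")

# Sanity check: rates within each sample should sum to 1.
ratio_sums <- tapply(toy_ratio$cellratio, toy_ratio$sampleID, sum)
stopifnot(all(abs(ratio_sums - 1) < 1e-8))

print(toy_ratio)
```

The same per-sample sum check can be applied to an existing cell rate table to spot samples whose rates were computed over a different set of cell types.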

Function Parameters

cwas_build_model_data() converts the input data into the model data object (sparkle.data in the examples above). Its parameters are:

inputdata: Accepts a Seurat object, a Seurat metadata data frame, or a data frame of precomputed cell rates. The input should contain information about the samples, phenotypes, and cell types, plus any additional variables relevant to the analysis.
Sample: Specifies the column name in the input data that identifies different samples. In the example, “Patient.No” is used to indicate patient identifiers.
Phenotype: Specifies the column name in the input data that contains the phenotype information (e.g., healthy vs. diseased). In the example, “Status” is used to indicate whether the sample is from a healthy or diseased individual.
Celltype: Specifies the column name in the input data that contains information about cell types. In the Seurat object and metadata examples, “Clusters” is used to indicate different cell clusters; in the cell rate example, the column is “celltype”.
Group (Optional): Specifies the column name for group information in the input data. This can be used to indicate broader categories or groups the samples belong to, such as different tissues. In the example, “Tissue” is provided.
Subgroup (Optional): Specifies the column name for subgroup information in the input data. This might be used for more granular categorizations within groups. In the example, “Study.No” is provided.
Covariate1 (Optional): Specifies the column name for the first covariate that might affect the analysis. This is not used in the provided example.
Covariate2 (Optional): Specifies the column name for the second covariate. This is also not used in the provided example.
selected_celltype (Optional): A vector of specific cell types to include in the analysis. This allows for focusing on particular cell types if needed. This is not specified in the example.
Control_label: The label used to identify control samples. In the example, “Healthy” is used to denote control samples.
Disease_label: The label used to identify disease samples. In the example, “Cancer” is used to denote disease samples.
Cellrate (Optional): Specifies the column name for precomputed cell rates if such data is available. It is omitted in the Seurat object and metadata examples and set to “cellratio” in the cell rate data frame example.
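Because most of these parameters are column names or labels passed as strings, a misspelled name is an easy mistake to make. The sketch below shows one way to check a data frame against the intended arguments before calling cwas_build_model_data(). The helper check_model_inputs() is hypothetical and not part of SPARKLE; it only mirrors the parameter names listed above and uses base R.

```r
# Hypothetical helper (not part of SPARKLE): returns a character vector of
# problems found, or an empty vector if the columns and labels look usable.
check_model_inputs <- function(df, Sample, Phenotype, Celltype,
                               Control_label, Disease_label,
                               Group = NULL, Subgroup = NULL, Cellrate = NULL) {
  problems <- character(0)

  # Optional arguments left as NULL are dropped by c().
  wanted <- c(Sample, Phenotype, Celltype, Group, Subgroup, Cellrate)
  missing_cols <- setdiff(wanted, colnames(df))
  if (length(missing_cols) > 0) {
    problems <- c(problems, paste("missing column:", missing_cols))
  }

  # Both labels must occur in the phenotype column.
  if (Phenotype %in% colnames(df)) {
    present <- unique(as.character(df[[Phenotype]]))
    missing_labels <- setdiff(c(Control_label, Disease_label), present)
    if (length(missing_labels) > 0) {
      problems <- c(problems,
                    paste("label not found in", Phenotype, ":", missing_labels))
    }
  }
  problems
}

# Toy cell rate table (made up) with no "tissue" column and no severe cases.
toy <- data.frame(
  sampleID  = c("S1", "S2"),
  severity  = c("control", "mild"),
  celltype  = c("NK", "NK"),
  cellratio = c(0.1, 0.2),
  stringsAsFactors = FALSE
)

issues <- check_model_inputs(
  toy,
  Sample = "sampleID", Phenotype = "severity", Celltype = "celltype",
  Control_label = "control", Disease_label = "severe/critical",
  Group = "tissue", Cellrate = "cellratio"
)
print(issues)
```

An empty result means every named column exists and both labels occur in the phenotype column; it says nothing about whether the data are otherwise suitable for modeling.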