Input data format

SPARKLE supports three input formats: a single-cell Seurat object, the metadata of a Seurat object, or a data frame with precomputed cell proportions.

Input format 1: Seurat object

library(SPARKLE)

seurat_object <- MNP.seurat_object
class(seurat_object)
## [1] "Seurat"
## attr(,"package")
## [1] "SeuratObject"
sparkle.data <- cwas_build_model_data(
  inputdata = seurat_object, Sample = "Patient.No", Phenotype = "Status",
  Celltype = "Clusters", Group = "Tissue", Subgroup = "Study.No",
  Control_label = "Healthy", Disease_label = "Cancer"
)
## [1] "Warning: No Covariate1 infomation"
## [1] "Warning: No Covariate2 infomation"
## [1] "No gene infomation added"
## [1] "No geneset score infomation added"

Input format 2: Seurat metadata

metadata <- MNP.metadata
knitr::kable(head(metadata), caption = "Seurat metadata")
Seurat metadata

| | orig.ident | nCount_RNA | nFeature_RNA | Initial.No | Study.No | Ascites.1_Blood.2_Breast.3_Colon.4_Stomach.5_Kidney.6_Liver.7_Lung.8_Pancreas.9_Skin.10_Spleen.11_Tonsil.12 | Patient.No | MirgDC.1_DC1.2_DC2.3_MacroMono.4 | Study | Tissue | Status | Clusters.NO | Clusters | nCount_integrated | nFeature_integrated |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ACTTGTTCACCGCTAG.1_1 | SeuratProject | 1628.662 | 719 | 1 | 1 | 1 | 1 | 3 | Tang-Huau | Ascites | Healthy | 6 | T cell doublets | 337.7403 | 1387 |
| ATTGGACCATGTCCTC.1_1 | SeuratProject | 1028.799 | 342 | 2 | 1 | 1 | 1 | 3 | Tang-Huau | Ascites | Healthy | 6 | T cell doublets | 262.4473 | 1390 |
| CCTTACGTCCCTGACT.1_1 | SeuratProject | 1779.581 | 959 | 3 | 1 | 1 | 1 | 3 | Tang-Huau | Ascites | Healthy | 6 | T cell doublets | 332.5596 | 1415 |
| TGTTCCGCAGAAGCAC.1_1 | SeuratProject | 1493.055 | 578 | 4 | 1 | 1 | 1 | 1 | Tang-Huau | Ascites | Healthy | 6 | T cell doublets | 350.6807 | 1364 |
| TTGACTTGTACTCAAC.1_1 | SeuratProject | 1555.572 | 806 | 5 | 1 | 1 | 1 | 1 | Tang-Huau | Ascites | Healthy | 6 | T cell doublets | 320.1611 | 1435 |
| AACTCAGGTAAGTGGC.2_1 | SeuratProject | 1460.183 | 767 | 6 | 1 | 1 | 1 | 3 | Tang-Huau | Ascites | Healthy | 6 | T cell doublets | 334.7708 | 1478 |

sparkle.data <- cwas_build_model_data(
  inputdata = metadata, Sample = "Patient.No", Phenotype = "Status",
  Celltype = "Clusters", Group = "Tissue", Subgroup = "Study.No",
  Control_label = "Healthy", Disease_label = "Cancer"
)
## [1] "Warning: No Covariate1 infomation"
## [1] "Warning: No Covariate2 infomation"
## [1] "No gene infomation added"
## [1] "No geneset score infomation added"

Input format 3: Data frame with cell rate information

covid.data.ratio <- SPARKLE::covid.data

knitr::kable(head(covid.data.ratio), caption = "Cell rate information table")
Cell rate information table

| sampleID | celltype | cellratio | dataset | tissue | severity |
|---|---|---|---|---|---|
| S-HC001 | NK | 0.0050188 | Batch08 | CD3+ T cell and CD19+ B cell sorted from fresh PBMC (FACS) | control |
| S-HC001 | CD8 | 0.4382343 | Batch08 | CD3+ T cell and CD19+ B cell sorted from fresh PBMC (FACS) | control |
| S-HC001 | B | 0.0929622 | Batch08 | CD3+ T cell and CD19+ B cell sorted from fresh PBMC (FACS) | control |
| S-HC001 | Mega | 0.0018250 | Batch08 | CD3+ T cell and CD19+ B cell sorted from fresh PBMC (FACS) | control |
| S-HC001 | Plasma | 0.0007984 | Batch08 | CD3+ T cell and CD19+ B cell sorted from fresh PBMC (FACS) | control |
| S-HC001 | CD4 | 0.4458766 | Batch08 | CD3+ T cell and CD19+ B cell sorted from fresh PBMC (FACS) | control |

sparkle.data <- cwas_build_model_data(
  inputdata = covid.data.ratio, Sample = "sampleID", Phenotype = "severity",
  Celltype = "celltype", Group = "tissue", Subgroup = "dataset",
  Control_label = "control", Disease_label = "severe/critical",
  Cellrate = "cellratio"
)
## [1] "Warning: No Covariate1 infomation"
## [1] "Warning: No Covariate2 infomation"
## [1] "No gene infomation added"
## [1] "No geneset score infomation added"
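
If only per-cell annotations are available, a table in the shape of input format 3 can be derived with base R. The following is a minimal sketch, not part of SPARKLE: the toy data frame `cells` and its values are made up for illustration, and only the column names mirror `covid.data`. Whether SPARKLE expects rows for cell types with zero cells in a sample is an assumption that should be checked against the package.

```r
# Toy per-cell annotations (hypothetical values; one row per cell).
cells <- data.frame(
  sampleID = c("S1", "S1", "S1", "S1", "S2", "S2", "S2", "S2", "S2"),
  celltype = c("NK", "CD8", "CD8", "B", "NK", "NK", "CD4", "CD4", "B"),
  severity = c(rep("control", 4), rep("severe/critical", 5)),
  stringsAsFactors = FALSE
)

# Count cells per sample and cell type. table() also returns the
# combinations with zero cells, which are kept here as proportion 0.
counts <- as.data.frame(
  table(sampleID = cells$sampleID, celltype = cells$celltype),
  responseName = "n", stringsAsFactors = FALSE
)

# Divide each count by the total number of cells in its sample.
counts$cellratio <- counts$n / ave(counts$n, counts$sampleID, FUN = sum)

# Attach the sample-level phenotype so each row carries its label.
sample_info <- unique(cells[, c("sampleID", "severity")])
ratio_table <- merge(
  counts[, c("sampleID", "celltype", "cellratio")],
  sample_info, by = "sampleID"
)

print(ratio_table)
```

A table built this way has the sampleID, celltype, cellratio, and severity columns that the call above maps to Sample, Celltype, Cellrate, and Phenotype.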

Function Parameters

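cwas_build_model_data() converts the input data into the model data used in the rest of the analysis (sparkle.data in the examples above). It takes the following parameters; unless noted otherwise, “the example” refers to the Seurat calls under input formats 1 and 2.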

  • inputdata: Accepts a Seurat object, a Seurat metadata data frame, or a data frame with precomputed cell proportions. The input should contain the sample, phenotype, and cell type information, plus any additional variables relevant to the analysis.

  • Sample: Specifies the column name in the input data that identifies different samples. In the example, “Patient.No” is used to indicate patient identifiers.

  • Phenotype: Specifies the column name in the input data that contains the phenotype information (e.g., healthy vs. diseased). In the example, “Status” is used to indicate whether the sample is from a healthy or diseased individual.

  • Celltype: Specifies the column name in the input data that contains information about cell types. In the example, “Clusters” is used to indicate different cell clusters.

  • Group (Optional): Specifies the column name for group information in the input data. This can be used to indicate broader categories or groups the samples belong to, such as different tissues. In the example, “Tissue” is provided.

  • Subgroup (Optional): Specifies the column name for subgroup information in the input data. This might be used for more granular categorizations within groups. In the example, “Study.No” is provided.

  • Covariate1 (Optional): Specifies the column name for the first covariate that might affect the analysis. This is not used in the provided example.

  • Covariate2 (Optional): Specifies the column name for the second covariate. This is also not used in the provided example.

  • selected_celltype (Optional): A vector of specific cell types to include in the analysis. This allows for focusing on particular cell types if needed. This is not specified in the example.

  • Control_label: The label used to identify control samples. In the example, “Healthy” is used to denote control samples.

  • Disease_label: The label used to identify disease samples. In the example, “Cancer” is used to denote disease samples.

  • Cellrate (Optional): Specifies the column name for precomputed cell rates if such data is available. It is not used in the Seurat examples; in the input format 3 example, “cellratio” is provided.
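
Because most of these parameters are column names or labels that must match the input data exactly, it can help to check them before building the model data. The sketch below is a hypothetical helper, `check_cwas_inputs()`, written for this illustration; it is not part of SPARKLE, and the toy data frame only borrows column names from the Seurat metadata example.

```r
# Hypothetical helper (not a SPARKLE function): returns a character vector
# describing mismatches between the arguments and the data frame.
check_cwas_inputs <- function(inputdata, Sample, Phenotype, Celltype,
                              Control_label, Disease_label,
                              Group = NULL, Subgroup = NULL,
                              Covariate1 = NULL, Covariate2 = NULL,
                              Cellrate = NULL) {
  problems <- character(0)

  # Every column name that was supplied must exist in the data frame.
  # c() drops the NULL (unused optional) entries.
  cols <- c(Sample = Sample, Phenotype = Phenotype, Celltype = Celltype,
            Group = Group, Subgroup = Subgroup, Covariate1 = Covariate1,
            Covariate2 = Covariate2, Cellrate = Cellrate)
  missing_cols <- cols[!cols %in% colnames(inputdata)]
  if (length(missing_cols) > 0) {
    problems <- c(problems, sprintf("%s column '%s' not found",
                                    names(missing_cols), missing_cols))
  }

  # Both labels must occur in the phenotype column.
  if (Phenotype %in% colnames(inputdata)) {
    labels <- c(Control_label = Control_label, Disease_label = Disease_label)
    present <- unique(as.character(inputdata[[Phenotype]]))
    absent <- labels[!labels %in% present]
    if (length(absent) > 0) {
      problems <- c(problems, sprintf("%s '%s' not present in column '%s'",
                                      names(absent), absent, Phenotype))
    }
  }
  problems
}

# Toy metadata with made-up values.
toy <- data.frame(
  Patient.No = c(1, 1, 2, 2),
  Status = c("Healthy", "Healthy", "Cancer", "Cancer"),
  Clusters = c("DC1", "DC2", "DC1", "DC2"),
  Tissue = c("Blood", "Blood", "Lung", "Lung"),
  stringsAsFactors = FALSE
)

# Matching arguments: no problems reported.
ok <- check_cwas_inputs(toy, Sample = "Patient.No", Phenotype = "Status",
                        Celltype = "Clusters", Control_label = "Healthy",
                        Disease_label = "Cancer", Group = "Tissue")

# Wrong cell type column and wrong control label: two problems reported.
bad <- check_cwas_inputs(toy, Sample = "Patient.No", Phenotype = "Status",
                         Celltype = "celltype", Control_label = "control",
                         Disease_label = "Cancer")
print(ok)
print(bad)
```

The check is case-sensitive on purpose: the examples above use “Healthy” for the Seurat data but “control” for the COVID data, so a label copied from one call to the other would not match.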