Input data format
SPARKLE accepts three input formats: a single-cell Seurat object, the cell-level metadata of a Seurat object, or a data frame of precomputed cell-type proportions per sample. All three are passed to cwas_build_model_data().
Input format 1: Seurat object
```r
library(SPARKLE)

seurat_object <- MNP.seurat_object
class(seurat_object)
## [1] "Seurat"
## attr(,"package")
## [1] "SeuratObject"

sparkle.data <- cwas_build_model_data(
  inputdata     = seurat_object,
  Sample        = "Patient.No",
  Phenotype     = "Status",
  Celltype      = "Clusters",
  Group         = "Tissue",
  Subgroup      = "Study.No",
  Control_label = "Healthy",
  Disease_label = "Cancer"
)
## [1] "Warning: No Covariate1 infomation"
## [1] "Warning: No Covariate2 infomation"
## [1] "No gene infomation added"
## [1] "No geneset score infomation added"
```
Input format 2: Seurat metadata
```r
metadata <- MNP.metadata
knitr::kable(head(metadata), caption = "Seurat metadata")
```

|  | orig.ident | nCount_RNA | nFeature_RNA | Initial.No | Study.No | Ascites.1_Blood.2_Breast.3_Colon.4_Stomach.5_Kidney.6_Liver.7_Lung.8_Pancreas.9_Skin.10_Spleen.11_Tonsil.12 | Patient.No | MirgDC.1_DC1.2_DC2.3_MacroMono.4 | Study | Tissue | Status | Clusters.NO | Clusters | nCount_integrated | nFeature_integrated |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ACTTGTTCACCGCTAG.1_1 | SeuratProject | 1628.662 | 719 | 1 | 1 | 1 | 1 | 3 | Tang-Huau | Ascites | Healthy | 6 | T cell doublets | 337.7403 | 1387 |
| ATTGGACCATGTCCTC.1_1 | SeuratProject | 1028.799 | 342 | 2 | 1 | 1 | 1 | 3 | Tang-Huau | Ascites | Healthy | 6 | T cell doublets | 262.4473 | 1390 |
| CCTTACGTCCCTGACT.1_1 | SeuratProject | 1779.581 | 959 | 3 | 1 | 1 | 1 | 3 | Tang-Huau | Ascites | Healthy | 6 | T cell doublets | 332.5596 | 1415 |
| TGTTCCGCAGAAGCAC.1_1 | SeuratProject | 1493.055 | 578 | 4 | 1 | 1 | 1 | 1 | Tang-Huau | Ascites | Healthy | 6 | T cell doublets | 350.6807 | 1364 |
| TTGACTTGTACTCAAC.1_1 | SeuratProject | 1555.572 | 806 | 5 | 1 | 1 | 1 | 1 | Tang-Huau | Ascites | Healthy | 6 | T cell doublets | 320.1611 | 1435 |
| AACTCAGGTAAGTGGC.2_1 | SeuratProject | 1460.183 | 767 | 6 | 1 | 1 | 1 | 3 | Tang-Huau | Ascites | Healthy | 6 | T cell doublets | 334.7708 | 1478 |
```r
sparkle.data <- cwas_build_model_data(
  inputdata     = metadata,
  Sample        = "Patient.No",
  Phenotype     = "Status",
  Celltype      = "Clusters",
  Group         = "Tissue",
  Subgroup      = "Study.No",
  Control_label = "Healthy",
  Disease_label = "Cancer"
)
## [1] "Warning: No Covariate1 infomation"
## [1] "Warning: No Covariate2 infomation"
## [1] "No gene infomation added"
## [1] "No geneset score infomation added"
```
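Because Seurat metadata has one row per cell, a quick cross-tabulation shows how many cells and how many distinct samples fall under each phenotype before the data are handed to cwas_build_model_data(). The sketch below is a base-R inspection aid, not part of SPARKLE; it uses a small hypothetical data frame, metadata_toy, with invented values that only borrow the column names Patient.No, Status and Clusters from the example.

```r
# Hypothetical per-cell metadata: one row per cell. The values are invented;
# only the column names follow the MNP example above.
metadata_toy <- data.frame(
  Patient.No = c("P1", "P1", "P1", "P2", "P2", "P3", "P3", "P3"),
  Status     = c("Healthy", "Healthy", "Healthy", "Cancer", "Cancer",
                 "Cancer", "Cancer", "Cancer"),
  Clusters   = c("DC1", "DC2", "DC2", "DC1", "MacroMono",
                 "DC2", "MacroMono", "MacroMono")
)

# Number of cells per sample and phenotype
cells_by_sample <- xtabs(~ Patient.No + Status, data = metadata_toy)
print(cells_by_sample)

# Number of distinct samples per phenotype
samples_per_status <- tapply(metadata_toy$Patient.No, metadata_toy$Status,
                             function(ids) length(unique(ids)))
print(samples_per_status)
```

In this toy table, one healthy sample (P1) and two cancer samples (P2, P3) are present. The same two calls work on MNP.metadata with the column names used in the example.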
Input format 3: data frame with cell rate information
```r
covid.data.ratio <- SPARKLE::covid.data
knitr::kable(head(covid.data.ratio), caption = "Cell rate information table")
```

| sampleID | celltype | cellratio | dataset | tissue | severity |
|---|---|---|---|---|---|
| S-HC001 | NK | 0.0050188 | Batch08 | CD3+ T cell and CD19+ B cell sorted from fresh PBMC (FACS) | control |
| S-HC001 | CD8 | 0.4382343 | Batch08 | CD3+ T cell and CD19+ B cell sorted from fresh PBMC (FACS) | control |
| S-HC001 | B | 0.0929622 | Batch08 | CD3+ T cell and CD19+ B cell sorted from fresh PBMC (FACS) | control |
| S-HC001 | Mega | 0.0018250 | Batch08 | CD3+ T cell and CD19+ B cell sorted from fresh PBMC (FACS) | control |
| S-HC001 | Plasma | 0.0007984 | Batch08 | CD3+ T cell and CD19+ B cell sorted from fresh PBMC (FACS) | control |
| S-HC001 | CD4 | 0.4458766 | Batch08 | CD3+ T cell and CD19+ B cell sorted from fresh PBMC (FACS) | control |
```r
sparkle.data <- cwas_build_model_data(
  inputdata     = covid.data.ratio,
  Sample        = "sampleID",
  Phenotype     = "severity",
  Celltype      = "celltype",
  Group         = "tissue",
  Subgroup      = "dataset",
  Control_label = "control",
  Disease_label = "severe/critical",
  Cellrate      = "cellratio"
)
## [1] "Warning: No Covariate1 infomation"
## [1] "Warning: No Covariate2 infomation"
## [1] "No gene infomation added"
## [1] "No geneset score infomation added"
```
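When only per-cell annotations are available but the third format is wanted (for example, to share proportions without cell-level data), the proportions can be computed beforehand. The sketch below defines a hypothetical base-R helper, cell_ratio_table(), which is not a SPARKLE function. It assumes each cell carries a sample ID and a cell type, and that any sample-level columns passed in keep_cols (here the phenotype) are constant within a sample. Whether SPARKLE computes proportions the same way internally for formats 1 and 2 is an assumption this vignette does not state.

```r
# Hypothetical helper (not part of SPARKLE): turn per-cell metadata into one
# row per sample x cell type holding the fraction of that sample's cells.
cell_ratio_table <- function(meta, sample_col, celltype_col, keep_cols = character()) {
  # Cross-tabulate cells; cell types absent from a sample get a count of 0
  counts <- as.data.frame(table(meta[[sample_col]], meta[[celltype_col]]),
                          stringsAsFactors = FALSE)
  names(counts) <- c(sample_col, celltype_col, "n_cells")

  # Divide each count by the total number of cells in the same sample
  sample_totals <- ave(counts$n_cells, counts[[sample_col]], FUN = sum)
  counts$cellratio <- counts$n_cells / sample_totals

  # Attach sample-level annotations, which must take one value per sample
  if (length(keep_cols) > 0) {
    sample_info <- unique(meta[, c(sample_col, keep_cols), drop = FALSE])
    if (anyDuplicated(sample_info[[sample_col]])) {
      stop("keep_cols must have a single value per sample")
    }
    counts <- merge(counts, sample_info, by = sample_col)
  }

  out <- counts[order(counts[[sample_col]], counts[[celltype_col]]), ]
  rownames(out) <- NULL
  out
}

# Toy per-cell metadata (invented values, MNP-style column names)
meta <- data.frame(
  Patient.No = c("P1", "P1", "P1", "P1", "P2", "P2"),
  Status     = c("Healthy", "Healthy", "Healthy", "Healthy", "Cancer", "Cancer"),
  Clusters   = c("DC1", "DC2", "DC2", "DC2", "DC1", "DC1")
)
ratios <- cell_ratio_table(meta, "Patient.No", "Clusters", keep_cols = "Status")
print(ratios)
```

The ratios within each sample sum to 1, and because table() enumerates every sample and cell-type combination, cell types missing from a sample appear explicitly with a ratio of 0 rather than being dropped. The result has the same long layout as covid.data (one row per sample and cell type, plus sample annotations).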
Function Parameters
cwas_build_model_data() converts any of the three input formats above into the model data used by SPARKLE. Its parameters are:
inputdata: A Seurat object, a Seurat metadata data frame (one row per cell), or a data frame of precomputed cell proportions (used together with Cellrate). The input must contain sample, phenotype, and cell-type information, plus any optional grouping or covariate columns relevant to the analysis.
Sample: The column that identifies samples. Formats 1 and 2 use “Patient.No” (patient identifiers); format 3 uses “sampleID”.
Phenotype: The column holding the phenotype, such as healthy versus diseased. Formats 1 and 2 use “Status”; format 3 uses “severity”.
Celltype: The column holding cell-type annotations. Formats 1 and 2 use “Clusters”; format 3 uses “celltype”.
Group (Optional): The column holding broader categories the samples belong to, such as tissue. The examples use “Tissue” (formats 1 and 2) and “tissue” (format 3).
Subgroup (Optional): The column holding finer categories within groups. The examples use “Study.No” (formats 1 and 2) and “dataset” (format 3).
Covariate1 (Optional): The column for a first covariate that may affect the analysis. None of the examples set it, which is why each call prints "Warning: No Covariate1 infomation".
Covariate2 (Optional): The column for a second covariate. It is also unset in the examples, which produces "Warning: No Covariate2 infomation".
selected_celltype (Optional): A vector of cell types to which the analysis is restricted. None of the examples set it.
Control_label: The phenotype value that marks control samples: “Healthy” in formats 1 and 2, “control” in format 3.
Disease_label: The phenotype value that marks disease samples: “Cancer” in formats 1 and 2, “severe/critical” in format 3.
Cellrate (Optional): The column holding precomputed cell proportions. It is set only in the format 3 example (“cellratio”); the Seurat object and metadata examples omit it.
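With this many column-name and label arguments, a typo in one of them is easy to make. The sketch below is a hypothetical pre-flight check, check_sparkle_args(), written in base R and not part of SPARKLE. It mirrors the parameter descriptions above: it confirms that the Sample, Phenotype and Celltype columns exist, that both Control_label and Disease_label occur in the phenotype column, and, when selected_celltype is given, that every requested cell type is present. How cwas_build_model_data() itself validates its input is not described here, so this is only an assumption-based convenience.

```r
# Hypothetical pre-flight check (not part of SPARKLE). Argument names mirror
# the cwas_build_model_data() parameters described above.
check_sparkle_args <- function(data, Sample, Phenotype, Celltype,
                               Control_label, Disease_label,
                               selected_celltype = NULL) {
  problems <- character()

  missing_cols <- setdiff(c(Sample, Phenotype, Celltype), names(data))
  if (length(missing_cols) > 0) {
    problems <- c(problems, paste("missing column(s):",
                                  paste(missing_cols, collapse = ", ")))
  } else {
    # Both labels must be values that actually occur in the phenotype column
    observed <- unique(as.character(data[[Phenotype]]))
    for (label in c(Control_label, Disease_label)) {
      if (!(label %in% observed)) {
        problems <- c(problems, paste0("label '", label, "' not found in ", Phenotype))
      }
    }
    # Every requested cell type must exist in the cell-type column
    if (!is.null(selected_celltype)) {
      absent <- setdiff(selected_celltype, as.character(data[[Celltype]]))
      if (length(absent) > 0) {
        problems <- c(problems, paste("cell type(s) not found:",
                                      paste(absent, collapse = ", ")))
      }
    }
  }
  problems  # an empty vector means no problems were detected
}

# Toy data in the layout of input format 3 (invented values)
toy <- data.frame(
  sampleID  = c("S1", "S1", "S2", "S2"),
  celltype  = c("NK", "CD8", "NK", "CD8"),
  cellratio = c(0.2, 0.8, 0.4, 0.6),
  severity  = c("control", "control", "severe/critical", "severe/critical")
)
ok  <- check_sparkle_args(toy, "sampleID", "severity", "celltype",
                          "control", "severe/critical", selected_celltype = "NK")
bad <- check_sparkle_args(toy, "sampleID", "severity", "celltype",
                          "Healthy", "severe/critical", selected_celltype = "B")
print(bad)
```

Returning a character vector of problems, rather than stopping at the first one, reports every mismatch in a single pass.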