In this project we build on our last work, in which we found the top genes for the supervised machine learning algorithm randomForest. We used the top 10 genes by fold change, selected from the complete cases and from the top 2,000 highest-variability genes that the Seurat library identified to run PCA as unsupervised machine learning. To clarify, the second set is labeled PCA, but its genes were not taken from the best principal components (PCs) of that project's PCA. Both sets performed imperfectly on the training set, with the Seurat high-variability set doing worse, but both predicted the class (healthy or pathology) of EBV-associated Natural Killer T-Cell Lymphoma (NKTCL) perfectly.
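As a reminder of what a fold-change ranking does, here is a minimal base-R sketch. It assumes a genes-by-samples matrix, treats "complete cases" as genes with no missing values, and uses the log2 ratio of class means with a pseudocount of 1. The function name and the toy data are hypothetical, and the earlier project may have computed fold change differently.

```r
# Hypothetical sketch: rank genes by absolute log2 fold change (pathology vs healthy).
# expr: numeric matrix with genes in rows and samples in columns (assumed layout)
# group: class label for each sample column, "healthy" or "pathology"
top_fold_change_genes <- function(expr, group, n = 10, pseudocount = 1) {
  # Keep only complete cases: genes with no missing value in any sample
  expr <- expr[complete.cases(expr), , drop = FALSE]
  # Mean expression of each gene within each class
  mean_pathology <- rowMeans(expr[, group == "pathology", drop = FALSE])
  mean_healthy <- rowMeans(expr[, group == "healthy", drop = FALSE])
  # log2 fold change; the pseudocount avoids division by zero
  lfc <- log2((mean_pathology + pseudocount) / (mean_healthy + pseudocount))
  # Largest absolute changes first, keep the top n
  head(lfc[order(abs(lfc), decreasing = TRUE)], n)
}

# Toy data (not from the study): 4 genes, 2 healthy and 2 pathology samples
expr <- rbind(
  G1 = c(1, 1, 15, 15),
  G2 = c(3, 3, 0, 0),
  G3 = c(2, 2, 2, 2),
  G4 = c(NA, 5, 5, 5)
)
group <- c("healthy", "healthy", "pathology", "pathology")
top_fold_change_genes(expr, group, n = 2)
```

On the toy data this returns G1 (log2 fold change 3) and G2 (-2); G4 is dropped because it has a missing value. The selected gene columns would then be the predictors passed to randomForest.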

Now we will add the genes from both sets to our pathology database. Once we have completed the analyses of all the Epstein-Barr virus (EBV) associated pathologies, we will use this database to predict multiple classes (each pathology or healthy) with supervised machine learning. We will compare these with our other pathologies: Lyme disease, which is unrelated to EBV and is a bacterial infection transmitted by a tick vector; fibromyalgia, which may possibly be associated with EBV; and the pathologies that other medical professionals and studies have reported to have a weak to established association with EBV infection at some point in the patient's life before diagnosis of the associated disease. These are Burkitt's lymphoma, Hodgkin's lymphoma, nasopharyngeal carcinoma, mononucleosis, multiple sclerosis, and primary EBV infection.
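One way to hold several gene sets for many classes is a long-format table with one row per gene, gene set, and class, so that multi-class labels can later be drawn from a single factor. This is a minimal sketch; the class labels, column names, helper function, and placeholder gene names (GENE_A and so on) are hypothetical, not genes from either set.

```r
# Hypothetical class labels for the planned multi-class model
pathology_levels <- c("healthy", "NKTCL_EBV", "Lyme_disease", "fibromyalgia",
                      "Burkitt_lymphoma", "Hodgkin_lymphoma",
                      "nasopharyngeal_carcinoma", "mononucleosis",
                      "multiple_sclerosis", "primary_EBV_infection")

# Append one gene set (hypothetical helper) to the long-format database
add_gene_set <- function(db, pathology, gene_set, genes) {
  new_rows <- data.frame(
    pathology = factor(pathology, levels = pathology_levels),
    gene_set = gene_set,
    gene = genes,
    stringsAsFactors = FALSE
  )
  rbind(db, new_rows)  # rbind(NULL, df) simply returns df
}

db <- NULL
# Placeholder gene names, not the genes found in the study
db <- add_gene_set(db, "NKTCL_EBV", "fold_change_top10", c("GENE_A", "GENE_B"))
db <- add_gene_set(db, "NKTCL_EBV", "seurat_variable", "GENE_C")
table(db$pathology, db$gene_set)
```

Because the factor keeps all ten levels, classes with no rows yet still appear as zero counts in `table()`, which makes gaps in the database easy to spot.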

Let's recall the study we are adding to our database, because we need to record its sample type, such as peripheral blood mononuclear cells (PBMCs), along with the background, procedures, and methods of that study.
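The study-level details could be stored as one record per series and joined to the gene table later. The values below are the ones GSE318371 reports in its series matrix (printed in the output further down); the column names, the placeholder genes, and the idea of a separate metadata table are assumptions for illustration.

```r
# Hypothetical study-level metadata record; values are taken from the GSE318371 series matrix
study_metadata <- data.frame(
  series = "GSE318371",
  platform = "GPL34284",
  tissue = "blood",
  cell_type = "PBMCs",
  molecule = "total RNA",
  library_kit = "10x Chromium v3.1",
  processing = "CellRanger 8.0.1, GRCh38 2024-A",
  min_cells_per_gene = 10L,   # genes kept if detected in at least 10 cells
  min_genes_per_cell = 300L,  # cells kept if they have at least 300 genes
  stringsAsFactors = FALSE
)

# Example join to a gene table that carries a series column (placeholder genes)
genes <- data.frame(series = "GSE318371", gene = c("GENE_A", "GENE_B"),
                    stringsAsFactors = FALSE)
merged <- merge(genes, study_metadata, by = "series")
merged[, c("gene", "cell_type", "tissue")]
```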

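# Read the header block of the GEO series matrix (tab-delimited, no column names).
# quote = "" turns off quote handling, so values keep their literal quotation marks.
seriesInfoDesign <- read.delim("GSE318371-GPL34284_series_matrix.txt", header = FALSE, skip = 25, nrows = 50, quote = "", stringsAsFactors = TRUE, strip.white = TRUE, na.strings = " ")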

dim(seriesInfoDesign)
## [1] 42 30
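The hard-coded `skip = 25` and `nrows = 50` depend on the exact layout of this one file. A sketch of a more general alternative reads all lines, keeps only those that begin with `!Sample_`, and parses them as a tab-delimited table. The function name and the mock file are hypothetical; `!Series_` lines and anything after the header are simply skipped.

```r
# Hypothetical helper: keep only the "!Sample_" header lines of a GEO series matrix
read_sample_metadata <- function(path) {
  lines <- readLines(path)
  sample_lines <- lines[startsWith(lines, "!Sample_")]
  # quote = "" keeps the literal quotation marks, as in the output shown here
  read.delim(text = sample_lines, header = FALSE, quote = "",
             stringsAsFactors = FALSE)
}

# Mock file with the same layout (2 samples), written to a temporary path
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "!Series_geo_accession\t\"GSE318371\"",
  "!Sample_characteristics_ch1\t\"tissue: blood\"\t\"tissue: blood\"",
  "!Sample_characteristics_ch1\t\"cell line: PBMCs\"\t\"cell line: PBMCs\"",
  "!Sample_molecule_ch1\t\"total RNA\"\t\"total RNA\"",
  "!series_matrix_table_begin"
), tmp)

meta <- read_sample_metadata(tmp)
unlink(tmp)  # remove the temporary mock file
dim(meta)
```

On the mock file this gives 3 rows and 3 columns: the field name in V1 plus one column per sample.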

I looked at the data in RStudio's data viewer and selected the rows that describe this study, GSE318371, from the Gene Expression Omnibus (GEO) on the National Center for Biotechnology Information (NCBI) site.
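Rather than choosing row numbers by eye in the viewer, the rows could be selected by their GEO field names, and values that are identical across sample columns collapsed to a single entry per row. A minimal sketch on a small mock table; the helper names are hypothetical, and matching by field name returns every row with those names, which can be more rows than the six picked by hand.

```r
# Hypothetical helpers for GEO "!Sample_" metadata stored with field names in column V1
select_fields <- function(meta, fields) {
  meta[meta$V1 %in% fields, , drop = FALSE]
}

# One string per row: the distinct values across all sample columns
distinct_values <- function(meta) {
  vals <- apply(meta[, -1, drop = FALSE], 1,
                function(r) paste(unique(r), collapse = " | "))
  setNames(vals, as.character(meta$V1))
}

fields_of_interest <- c("!Sample_characteristics_ch1", "!Sample_molecule_ch1",
                        "!Sample_extract_protocol_ch1", "!Sample_data_processing")

# Mock table (2 samples) in the same layout as seriesInfoDesign
meta <- data.frame(
  V1 = c("!Sample_title", "!Sample_characteristics_ch1", "!Sample_molecule_ch1"),
  V2 = c("\"sample 1\"", "\"tissue: blood\"", "\"total RNA\""),
  V3 = c("\"sample 2\"", "\"tissue: blood\"", "\"total RNA\""),
  stringsAsFactors = FALSE
)
distinct_values(select_fields(meta, fields_of_interest))
```

Applied to `seriesInfoDesign`, this would print each chosen field once instead of once per sample column.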

seriesInfoDesign[c(10,11,13,14,15,18),]
##                              V1
## 10  !Sample_characteristics_ch1
## 11  !Sample_characteristics_ch1
## 13         !Sample_molecule_ch1
## 14 !Sample_extract_protocol_ch1
## 15 !Sample_extract_protocol_ch1
## 18      !Sample_data_processing
##                                                                                                                                                                                                                                                                                                                                                                                                                                             V2
## 10                                                                                                                                                                                                                                                                                                                                                                                                                             "tissue: blood"
## 11                                                                                                                                                                                                                                                                                                                                                                                                                          "cell line: PBMCs"
## 13                                                                                                                                                                                                                                                                                                                                                                                                                                 "total RNA"
## 14                                                                                                                                                                                                                                                                                                                       "Isolated PBMCs were loaded into a 10× Chromium Chip (v3.1 PN:1000120) and barcoded using a 10x Chromium Controller."
## 15                                                                                                                                                                                                                         "RNA from the barcoded cells was then reverse-transcribed, amplified, and prepared into sequencing libraries with the 10× Library Construction Kit (v3.1 PN:1000190) according to the manufacturer’s instructions."
## 18 "Raw scRNA-seq data were initially pre-processed using CellRanger (version 8.0.1, 10x Genomics) to align reads to the human genome (GRCh38, 2024-A from 10x Genomics) and count the unique molecular identifiers (UMIs) for each gene to generate specific gene cell count tables. For each scRNA-seq sample, the count tables were filtered to retain the genes detected in at least 10 cells and cells with a minimum gene count of 300."
##                                                                                                                                                                                                                                                                                                                                                                                                                                            V23
## 10                                                                                                                                                                                                                                                                                                                                                                                                                             "tissue: blood"
## 11                                                                                                                                                                                                                                                                                                                                                                                                                          "cell line: PBMCs"
## 13                                                                                                                                                                                                                                                                                                                                                                                                                                 "total RNA"
## 14                                                                                                                                                                                                                                                                                                                       "Isolated PBMCs were loaded into a 10× Chromium Chip (v3.1 PN:1000120) and barcoded using a 10x Chromium Controller."
## 15                                                                                                                                                                                                                         "RNA from the barcoded cells was then reverse-transcribed, amplified, and prepared into sequencing libraries with the 10× Library Construction Kit (v3.1 PN:1000190) according to the manufacturer’s instructions."
## 18 "Raw scRNA-seq data were initially pre-processed using CellRanger (version 8.0.1, 10x Genomics) to align reads to the human genome (GRCh38, 2024-A from 10x Genomics) and count the unique molecular identifiers (UMIs) for each gene to generate specific gene cell count tables. For each scRNA-seq sample, the count tables were filtered to retain the genes detected in at least 10 cells and cells with a minimum gene count of 300."
##                                                                                                                                                                                                                                                                                                                                                                                                                                            V24
## 10                                                                                                                                                                                                                                                                                                                                                                                                                             "tissue: blood"
## 11                                                                                                                                                                                                                                                                                                                                                                                                                          "cell line: PBMCs"
## 13                                                                                                                                                                                                                                                                                                                                                                                                                                 "total RNA"
## 14                                                                                                                                                                                                                                                                                                                       "Isolated PBMCs were loaded into a 10× Chromium Chip (v3.1 PN:1000120) and barcoded using a 10x Chromium Controller."
## 15                                                                                                                                                                                                                         "RNA from the barcoded cells was then reverse-transcribed, amplified, and prepared into sequencing libraries with the 10× Library Construction Kit (v3.1 PN:1000190) according to the manufacturer’s instructions."
## 18 "Raw scRNA-seq data were initially pre-processed using CellRanger (version 8.0.1, 10x Genomics) to align reads to the human genome (GRCh38, 2024-A from 10x Genomics) and count the unique molecular identifiers (UMIs) for each gene to generate specific gene cell count tables. For each scRNA-seq sample, the count tables were filtered to retain the genes detected in at least 10 cells and cells with a minimum gene count of 300."
##                                                                                                                                                                                                                                                                                                                                                                                                                                            V25
## 10                                                                                                                                                                                                                                                                                                                                                                                                                             "tissue: blood"
## 11                                                                                                                                                                                                                                                                                                                                                                                                                          "cell line: PBMCs"
## 13                                                                                                                                                                                                                                                                                                                                                                                                                                 "total RNA"
## 14                                                                                                                                                                                                                                                                                                                       "Isolated PBMCs were loaded into a 10× Chromium Chip (v3.1 PN:1000120) and barcoded using a 10x Chromium Controller."
## 15                                                                                                                                                                                                                         "RNA from the barcoded cells was then reverse-transcribed, amplified, and prepared into sequencing libraries with the 10× Library Construction Kit (v3.1 PN:1000190) according to the manufacturer’s instructions."
## 18 "Raw scRNA-seq data were initially pre-processed using CellRanger (version 8.0.1, 10x Genomics) to align reads to the human genome (GRCh38, 2024-A from 10x Genomics) and count the unique molecular identifiers (UMIs) for each gene to generate specific gene cell count tables. For each scRNA-seq sample, the count tables were filtered to retain the genes detected in at least 10 cells and cells with a minimum gene count of 300."
##                                                                                                                                                                                                                                                                                                                                                                                                                                            V26
## 10                                                                                                                                                                                                                                                                                                                                                                                                                             "tissue: blood"
## 11                                                                                                                                                                                                                                                                                                                                                                                                                          "cell line: PBMCs"
## 13                                                                                                                                                                                                                                                                                                                                                                                                                                 "total RNA"
## 14                                                                                                                                                                                                                                                                                                                       "Isolated PBMCs were loaded into a 10× Chromium Chip (v3.1 PN:1000120) and barcoded using a 10x Chromium Controller."
## 15                                                                                                                                                                                                                         "RNA from the barcoded cells was then reverse-transcribed, amplified, and prepared into sequencing libraries with the 10× Library Construction Kit (v3.1 PN:1000190) according to the manufacturer’s instructions."
## 18 "Raw scRNA-seq data were initially pre-processed using CellRanger (version 8.0.1, 10x Genomics) to align reads to the human genome (GRCh38, 2024-A from 10x Genomics) and count the unique molecular identifiers (UMIs) for each gene to generate specific gene cell count tables. For each scRNA-seq sample, the count tables were filtered to retain the genes detected in at least 10 cells and cells with a minimum gene count of 300."
##                                                                                                                                                                                                                                                                                                                                                                                                                                            V27
## 10                                                                                                                                                                                                                                                                                                                                                                                                                             "tissue: blood"
## 11                                                                                                                                                                                                                                                                                                                                                                                                                          "cell line: PBMCs"
## 13                                                                                                                                                                                                                                                                                                                                                                                                                                 "total RNA"
## 14                                                                                                                                                                                                                                                                                                                       "Isolated PBMCs were loaded into a 10× Chromium Chip (v3.1 PN:1000120) and barcoded using a 10x Chromium Controller."
## 15                                                                                                                                                                                                                         "RNA from the barcoded cells was then reverse-transcribed, amplified, and prepared into sequencing libraries with the 10× Library Construction Kit (v3.1 PN:1000190) according to the manufacturer’s instructions."
## 18 "Raw scRNA-seq data were initially pre-processed using CellRanger (version 8.0.1, 10x Genomics) to align reads to the human genome (GRCh38, 2024-A from 10x Genomics) and count the unique molecular identifiers (UMIs) for each gene to generate specific gene cell count tables. For each scRNA-seq sample, the count tables were filtered to retain the genes detected in at least 10 cells and cells with a minimum gene count of 300."
##                                                                                                                                                                                                                                                                                                                                                                                                                                            V28
## 10                                                                                                                                                                                                                                                                                                                                                                                                                             "tissue: blood"
## 11                                                                                                                                                                                                                                                                                                                                                                                                                          "cell line: PBMCs"
## 13                                                                                                                                                                                                                                                                                                                                                                                                                                 "total RNA"
## 14                                                                                                                                                                                                                                                                                                                       "Isolated PBMCs were loaded into a 10× Chromium Chip (v3.1 PN:1000120) and barcoded using a 10x Chromium Controller."
## 15                                                                                                                                                                                                                         "RNA from the barcoded cells was then reverse-transcribed, amplified, and prepared into sequencing libraries with the 10× Library Construction Kit (v3.1 PN:1000190) according to the manufacturer’s instructions."
## 18 "Raw scRNA-seq data were initially pre-processed using CellRanger (version 8.0.1, 10x Genomics) to align reads to the human genome (GRCh38, 2024-A from 10x Genomics) and count the unique molecular identifiers (UMIs) for each gene to generate specific gene cell count tables. For each scRNA-seq sample, the count tables were filtered to retain the genes detected in at least 10 cells and cells with a minimum gene count of 300."
##                                                                                                                                                                                                                                                                                                                                                                                                                                            V29
## 10                                                                                                                                                                                                                                                                                                                                                                                                                             "tissue: blood"
## 11                                                                                                                                                                                                                                                                                                                                                                                                                          "cell line: PBMCs"
## 13                                                                                                                                                                                                                                                                                                                                                                                                                                 "total RNA"
## 14                                                                                                                                                                                                                                                                                                                       "Isolated PBMCs were loaded into a 10× Chromium Chip (v3.1 PN:1000120) and barcoded using a 10x Chromium Controller."
## 15                                                                                                                                                                                                                         "RNA from the barcoded cells was then reverse-transcribed, amplified, and prepared into sequencing libraries with the 10× Library Construction Kit (v3.1 PN:1000190) according to the manufacturer’s instructions."
## 18 "Raw scRNA-seq data were initially pre-processed using CellRanger (version 8.0.1, 10x Genomics) to align reads to the human genome (GRCh38, 2024-A from 10x Genomics) and count the unique molecular identifiers (UMIs) for each gene to generate specific gene cell count tables. For each scRNA-seq sample, the count tables were filtered to retain the genes detected in at least 10 cells and cells with a minimum gene count of 300."
##                                                                                                                                                                                                                                                                                                                                                                                                                                            V30
## 10                                                                                                                                                                                                                                                                                                                                                                                                                             "tissue: blood"
## 11                                                                                                                                                                                                                                                                                                                                                                                                                          "cell line: PBMCs"
## 13                                                                                                                                                                                                                                                                                                                                                                                                                                 "total RNA"
## 14                                                                                                                                                                                                                                                                                                                       "Isolated PBMCs were loaded into a 10× Chromium Chip (v3.1 PN:1000120) and barcoded using a 10x Chromium Controller."
## 15                                                                                                                                                                                                                         "RNA from the barcoded cells was then reverse-transcribed, amplified, and prepared into sequencing libraries with the 10× Library Construction Kit (v3.1 PN:1000190) according to the manufacturer’s instructions."
## 18 "Raw scRNA-seq data were initially pre-processed using CellRanger (version 8.0.1, 10x Genomics) to align reads to the human genome (GRCh38, 2024-A from 10x Genomics) and count the unique molecular identifiers (UMIs) for each gene to generate specific gene cell count tables. For each scRNA-seq sample, the count tables were filtered to retain the genes detected in at least 10 cells and cells with a minimum gene count of 300."

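The samples are PBMCs (peripheral blood mononuclear cells) from human blood. Although the GEO field is labeled "cell line," PBMCs are primary cells isolated from blood rather than an established cell line. They were profiled with single-cell RNA sequencing (scRNA-seq) on a 10x Chromium chip, where every cell receives its own barcode and every captured transcript its own unique molecular identifier (UMI). After CellRanger aligned the reads and counted UMIs, the count tables were filtered in two ways: genes were kept only if they were detected in at least 10 cells, and cells were kept only if they had at least 300 detected genes. The captured RNA was reverse transcribed into complementary DNA (cDNA) before amplification and library construction, so the sequenced molecules are cDNA.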
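The two-way filter described above can be sketched in base R on a toy count matrix. This is an illustration, not the study's code: the helper name `filter_counts` and the toy data are hypothetical, a gene counts as "detected" in a cell wherever its count is above zero, and filtering cells before genes is an assumption, since the GEO text does not state the order. Seurat's `CreateSeuratObject()` exposes comparable thresholds through its `min.cells` and `min.features` arguments.

```r
# Hypothetical helper: base-R sketch of the GSE318371 QC filter.
# Assumptions: "detected" means count > 0; cells are filtered before genes.
filter_counts <- function(counts, min_cells = 10, min_genes = 300) {
  # Number of detected genes in each cell (column)
  genes_per_cell <- colSums(counts > 0)
  # Keep cells with at least `min_genes` detected genes
  counts <- counts[, genes_per_cell >= min_genes, drop = FALSE]
  # Number of remaining cells in which each gene (row) is detected
  cells_per_gene <- rowSums(counts > 0)
  # Keep genes detected in at least `min_cells` cells
  counts[cells_per_gene >= min_cells, , drop = FALSE]
}

# Hypothetical 4-gene x 5-cell matrix (not study data); thresholds shrunk to fit
toy <- matrix(c(1, 0, 3, 0, 2,
                0, 0, 1, 0, 0,
                5, 1, 0, 0, 4,
                2, 2, 2, 0, 1),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("gene", LETTERS[1:4]), paste0("cell", 1:5)))

filtered <- filter_counts(toy, min_cells = 2, min_genes = 2)
dim(filtered)  # 3 genes x 4 cells: cell4 has no detected genes, geneB is in only 1 cell
```

With the study's thresholds the call would be `filter_counts(counts, min_cells = 10, min_genes = 300)`.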

In the previous projects, which used unsupervised learning with the Seurat library, the data were filtered for the best explanatory genes and then normalized: each cell's counts were divided by that cell's total so the values fell between 0 and 1, then multiplied by a scale factor of 10,000 so the values would not be vanishingly close to zero. The data were then normalized again (scaled) so that every gene could be viewed on the same range when plotted, and the 2,000 most variable genes were identified. We selected the top 10 of those genes by fold change, and later extracted the names of the genes that met the study's initial criteria (detected in at least 10 cells, in cells with at least 300 detected genes), totaling 30,960 genes in the single-cell RNA sequencing (scRNA-seq) data to filter.
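The normalization and gene-selection steps described above can be sketched in base R. This is a simplified illustration, not the code from the earlier projects: all helper names are hypothetical, the 10,000 scale factor and log(1 + x) transform follow the defaults of Seurat's `NormalizeData()` with `normalization.method = "LogNormalize"`, plain variance ranking stands in for the `"vst"` method that `FindVariableFeatures()` uses by default, and fold change is assumed to be a ratio of group means with non-finite values dropped as incomplete cases.

```r
# Hypothetical helpers; assumptions: Seurat-like LogNormalize defaults,
# variance ranking in place of "vst", fold change = ratio of group means.
log_normalize <- function(counts, scale_factor = 1e4) {
  # Divide each cell (column) by its total count, so values lie between 0 and 1
  props <- sweep(counts, 2, colSums(counts), "/")
  # Multiply by the scale factor, then take log(1 + x)
  log1p(props * scale_factor)
}

scale_genes <- function(norm) {
  # Center and scale each gene (row) to mean 0 and standard deviation 1
  t(scale(t(norm)))
}

top_variable <- function(norm, n = 2000) {
  # Rank genes by the variance of their normalized values across cells
  v <- apply(norm, 1, var)
  names(sort(v, decreasing = TRUE))[seq_len(min(n, length(v)))]
}

top_fold_change <- function(norm, is_case, candidates, n = 10) {
  # Mean expression of each candidate gene in case cells vs control cells
  case_mean <- rowMeans(norm[candidates, is_case, drop = FALSE])
  ctrl_mean <- rowMeans(norm[candidates, !is_case, drop = FALSE])
  fc <- case_mean / ctrl_mean
  # Keep complete cases only: drop NaN (0/0) and Inf (control mean of 0)
  fc <- fc[is.finite(fc)]
  head(sort(fc, decreasing = TRUE), n)
}

# Hypothetical 3-gene x 2-cell counts, filled column by column:
# c1 = (1, 3, 0) is the "case" cell, c2 = (2, 1, 1) the "control" cell
counts <- matrix(c(1, 3, 0, 2, 1, 1), nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("c1", "c2")))
norm <- log_normalize(counts)
hvg <- top_variable(norm, n = 2000)          # fewer than 2,000 genes, so all are kept
z <- scale_genes(norm[hvg, , drop = FALSE])  # per-gene z-scores for plotting
top_genes <- top_fold_change(norm, is_case = c(TRUE, FALSE), candidates = hvg, n = 10)
```

On real data, `n = 2000` and `n = 10` mirror the top 2,000 variable genes and the top 10 fold-change genes described in the text.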

Let's read in the pathology database and recall what we included in it.

# `path` must point to the folder that holds the pathology database CSV
setwd(path)

pathologyDB <- read.csv("pathologyDB_5pathologies_2-8-2026.csv", sep = ',', header = TRUE)
head(pathologyDB)
##        Ensembl_ID Genecards_ID FC_pathology_control   topGenePathology
## 1 ENSG00000211899         IGHM          18550.40000 Epstein Barr Virus
## 2 ENSG00000164458         TBXT           1051.20000 Epstein Barr Virus
## 3 ENSG00000211644     IGLV1-51            179.38824 Epstein Barr Virus
## 4 ENSG00000125869        LAMP5            140.00000 Epstein Barr Virus
## 5 ENSG00000163600         ICOS            105.00000 Epstein Barr Virus
## 6 ENSG00000124507      PACSIN1             75.04348 Epstein Barr Virus
##                      mediaType
## 1 LCLs of PBMCs RNA-Seq format
## 2 LCLs of PBMCs RNA-Seq format
## 3 LCLs of PBMCs RNA-Seq format
## 4 LCLs of PBMCs RNA-Seq format
## 5 LCLs of PBMCs RNA-Seq format
## 6 LCLs of PBMCs RNA-Seq format
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      studySummarized
## 1 The EBV or Epstein-Barr Viral infected samples were obtained from lymphoblastic cells in peripheral blood mononuclear cells. Not the same as B cells in multiple sclerosis for tissue type. The lymphoblastic B cells are cancerous uncontrolled growth B cells, where the regular B cells are the normal circulating white blood cells active in healthy immunity. There are 2 samples of control1, 2 samples of control2, 2 samples from patient 1, and 2 samples from patient 2. This is a total of 8 samples where each control and sample have a basal or non-stimulated baseline and 4 samples stimulated with IL27. They stimulated the samples with IL27 and found the samples with an allele lacking IL27RA had a slower healing time if any to EBV infection. These 8 samples used the raw counts of genes. So, we have 8 EBV samples as lymphoblastic B-cells of PBMCs stimulated with IL27, 86 Lyme disease samples as RBCs of PBMCs in various states of infection with antibiotics used, 12 Fibromyalgia samples as skeletal muscle of possibly rats and unknown if before or after being stimulated with MEF2 known to enhance pain causing inflammation, or before or after that stimulation as well as stimulation with DEX known to alleviate pain caused by inflammation, and 18 Multiple Sclerosis samples  as healthy B-cells of PBMCs without any stimulation but using 5 MS samples of commercial B-cells. 
## 2 The EBV or Epstein-Barr Viral infected samples were obtained from lymphoblastic cells in peripheral blood mononuclear cells. Not the same as B cells in multiple sclerosis for tissue type. The lymphoblastic B cells are cancerous uncontrolled growth B cells, where the regular B cells are the normal circulating white blood cells active in healthy immunity. There are 2 samples of control1, 2 samples of control2, 2 samples from patient 1, and 2 samples from patient 2. This is a total of 8 samples where each control and sample have a basal or non-stimulated baseline and 4 samples stimulated with IL27. They stimulated the samples with IL27 and found the samples with an allele lacking IL27RA had a slower healing time if any to EBV infection. These 8 samples used the raw counts of genes. So, we have 8 EBV samples as lymphoblastic B-cells of PBMCs stimulated with IL27, 86 Lyme disease samples as RBCs of PBMCs in various states of infection with antibiotics used, 12 Fibromyalgia samples as skeletal muscle of possibly rats and unknown if before or after being stimulated with MEF2 known to enhance pain causing inflammation, or before or after that stimulation as well as stimulation with DEX known to alleviate pain caused by inflammation, and 18 Multiple Sclerosis samples  as healthy B-cells of PBMCs without any stimulation but using 5 MS samples of commercial B-cells. 
## 3 The EBV or Epstein-Barr Viral infected samples were obtained from lymphoblastic cells in peripheral blood mononuclear cells. Not the same as B cells in multiple sclerosis for tissue type. The lymphoblastic B cells are cancerous uncontrolled growth B cells, where the regular B cells are the normal circulating white blood cells active in healthy immunity. There are 2 samples of control1, 2 samples of control2, 2 samples from patient 1, and 2 samples from patient 2. This is a total of 8 samples where each control and sample have a basal or non-stimulated baseline and 4 samples stimulated with IL27. They stimulated the samples with IL27 and found the samples with an allele lacking IL27RA had a slower healing time if any to EBV infection. These 8 samples used the raw counts of genes. So, we have 8 EBV samples as lymphoblastic B-cells of PBMCs stimulated with IL27, 86 Lyme disease samples as RBCs of PBMCs in various states of infection with antibiotics used, 12 Fibromyalgia samples as skeletal muscle of possibly rats and unknown if before or after being stimulated with MEF2 known to enhance pain causing inflammation, or before or after that stimulation as well as stimulation with DEX known to alleviate pain caused by inflammation, and 18 Multiple Sclerosis samples  as healthy B-cells of PBMCs without any stimulation but using 5 MS samples of commercial B-cells. 
## 4 The EBV or Epstein-Barr Viral infected samples were obtained from lymphoblastic cells in peripheral blood mononuclear cells. Not the same as B cells in multiple sclerosis for tissue type. The lymphoblastic B cells are cancerous uncontrolled growth B cells, where the regular B cells are the normal circulating white blood cells active in healthy immunity. There are 2 samples of control1, 2 samples of control2, 2 samples from patient 1, and 2 samples from patient 2. This is a total of 8 samples where each control and sample have a basal or non-stimulated baseline and 4 samples stimulated with IL27. They stimulated the samples with IL27 and found the samples with an allele lacking IL27RA had a slower healing time if any to EBV infection. These 8 samples used the raw counts of genes. So, we have 8 EBV samples as lymphoblastic B-cells of PBMCs stimulated with IL27, 86 Lyme disease samples as RBCs of PBMCs in various states of infection with antibiotics used, 12 Fibromyalgia samples as skeletal muscle of possibly rats and unknown if before or after being stimulated with MEF2 known to enhance pain causing inflammation, or before or after that stimulation as well as stimulation with DEX known to alleviate pain caused by inflammation, and 18 Multiple Sclerosis samples  as healthy B-cells of PBMCs without any stimulation but using 5 MS samples of commercial B-cells. 
## 5 The EBV or Epstein-Barr Viral infected samples were obtained from lymphoblastic cells in peripheral blood mononuclear cells. Not the same as B cells in multiple sclerosis for tissue type. The lymphoblastic B cells are cancerous uncontrolled growth B cells, where the regular B cells are the normal circulating white blood cells active in healthy immunity. There are 2 samples of control1, 2 samples of control2, 2 samples from patient 1, and 2 samples from patient 2. This is a total of 8 samples where each control and sample have a basal or non-stimulated baseline and 4 samples stimulated with IL27. They stimulated the samples with IL27 and found the samples with an allele lacking IL27RA had a slower healing time if any to EBV infection. These 8 samples used the raw counts of genes. So, we have 8 EBV samples as lymphoblastic B-cells of PBMCs stimulated with IL27, 86 Lyme disease samples as RBCs of PBMCs in various states of infection with antibiotics used, 12 Fibromyalgia samples as skeletal muscle of possibly rats and unknown if before or after being stimulated with MEF2 known to enhance pain causing inflammation, or before or after that stimulation as well as stimulation with DEX known to alleviate pain caused by inflammation, and 18 Multiple Sclerosis samples  as healthy B-cells of PBMCs without any stimulation but using 5 MS samples of commercial B-cells. 
## 6 The EBV or Epstein-Barr Viral infected samples were obtained from lymphoblastic cells in peripheral blood mononuclear cells. Not the same as B cells in multiple sclerosis for tissue type. The lymphoblastic B cells are cancerous uncontrolled growth B cells, where the regular B cells are the normal circulating white blood cells active in healthy immunity. There are 2 samples of control1, 2 samples of control2, 2 samples from patient 1, and 2 samples from patient 2. This is a total of 8 samples where each control and sample have a basal or non-stimulated baseline and 4 samples stimulated with IL27. They stimulated the samples with IL27 and found the samples with an allele lacking IL27RA had a slower healing time if any to EBV infection. These 8 samples used the raw counts of genes. So, we have 8 EBV samples as lymphoblastic B-cells of PBMCs stimulated with IL27, 86 Lyme disease samples as RBCs of PBMCs in various states of infection with antibiotics used, 12 Fibromyalgia samples as skeletal muscle of possibly rats and unknown if before or after being stimulated with MEF2 known to enhance pain causing inflammation, or before or after that stimulation as well as stimulation with DEX known to alleviate pain caused by inflammation, and 18 Multiple Sclerosis samples  as healthy B-cells of PBMCs without any stimulation but using 5 MS samples of commercial B-cells. 
##   GSE_study_ID
## 1    GSE253756
## 2    GSE253756
## 3    GSE253756
## 4    GSE253756
## 5    GSE253756
## 6    GSE253756
colnames(pathologyDB)
## [1] "Ensembl_ID"           "Genecards_ID"         "FC_pathology_control"
## [4] "topGenePathology"     "mediaType"            "studySummarized"     
## [7] "GSE_study_ID"

We need our top genes with their gene IDs and fold change values, plus the media type and cell line, our summary of the study, the GSE ID, and a description of the pathology.

Let's read in our previous top NKTCL genes from both sets of data.

common <- read.csv("commonToSeurat2000AndFCsCompleteCasesNKTCL.csv", header = TRUE, sep = ",")

head(common)
##   NKTCL.GSM9493334_mean NKTCL.GSM9493335_mean NKTCL.GSM9493336_mean
## 1           0.488204306           0.104894314           0.239316239
## 2           0.162849290           0.069289025           0.105943152
## 3           0.198465415           0.025319317           0.026237329
## 4           0.638685295           0.004069176           0.017193401
## 5           0.001946862           0.004747372           0.004472272
## 6           0.003321118           0.018198259           0.015702644
##   NKTCL.GSM9493337_mean NKTCL.GSM9493338_mean NKTCL.GSM9493339_mean
## 1           0.033717995           1.659388646           0.019009801
## 2           0.030472413           1.882431979           0.008448800
## 3           0.012441399           0.360317994           0.032781345
## 4           0.121168410           0.003023178           3.565224738
## 5           0.006491165           0.003471056           0.002281176
## 6           0.016588532           0.009069533           0.002619128
##   NKTCL.GSM9493340_mean NKTCL.GSM9493341_mean NKTCL.GSM9493342_mean
## 1           0.983197774           0.466055999           0.308642857
## 2           0.237906678           0.488273891           0.287000000
## 3           0.698630137           0.090430715           0.077285714
## 4           0.051583904           0.001819009           0.002857143
## 5           0.005458048           0.005911778           0.005142857
## 6           0.006849315           0.021568245           0.016785714
##   NKTCL.GSM9493343_mean NKTCL.GSM9493344_mean NKTCL.GSM9493345_mean
## 1           0.006203559           0.672415196           0.426184065
## 2           0.009066739           0.231595217           0.249722794
## 3           0.005999046           0.315545432           0.062648503
## 4           0.009339423           0.022695843           0.010217013
## 5           0.004362942           0.002359066           0.004435292
## 6           0.077646738           0.016594810           0.017503564
##   NKTCL.GSM9493346_mean NKTCL.GSM9493347_mean NKTCL.GSM9493348_mean
## 1           1.810318044           1.362316660           0.601367817
## 2           0.440111193           1.082437006           0.779419857
## 3           0.351402175           0.552012035           0.793412468
## 4           0.124110866           0.006167732           0.048738307
## 5           0.003188619           0.004362542           0.001414983
## 6           0.002043987           0.020308387           0.007232136
##   NKTCL.GSM9493349_mean NKTCL.GSM9493350_mean NKTCL.GSM9493351_mean
## 1           1.578515816           0.015787860           0.114040156
## 2           2.424455561           0.027590435           0.142351582
## 3           0.188475571           0.194512569           0.155929510
## 4           0.010343401           0.004445126           0.014083490
## 5           0.001692557           0.002759044           0.004838943
## 6           0.008011434           0.024908032           0.070273003
##   NKTCL.GSM9493332_mean NKTCL.GSM9493333_mean healthy.GSM9493320_mean
## 1           0.070467564          0.1994304023             0.014985591
## 2           0.036366058          0.3639017444             0.016253602
## 3           0.228853070          0.1167675329             0.018789625
## 4           0.001465299          0.0074759701             0.007377522
## 5           0.000799254          0.0008543966             0.013717579
## 6           0.008125749          0.0127447490             0.028933718
##   healthy.GSM9493329_mean healthy.GSM9493330_mean healthy.GSM9493331_mean
## 1             0.016264660             0.005078017             0.008245601
## 2             0.027550343             0.027144308             0.023337996
## 3             0.012502766             0.017911550             0.003901936
## 4             0.016485948             0.008863448             0.013178238
## 5             0.006638637             0.007016896             0.010601487
## 6             0.097698606             0.098605854             0.099094456
##   healthy.GSM9493321_mean healthy.GSM9493322_mean healthy.GSM9493323_mean
## 1             0.006818534             0.009493671             0.007760258
## 2             0.011777468             0.004801397             0.008420705
## 3             0.004339067             0.002400698             0.002476678
## 4             0.010460251             0.019860323             0.014447288
## 5             0.011235084             0.038411174             0.003715017
## 6             0.069889974             0.038192929             0.011062495
##   healthy.GSM9493324_mean healthy.GSM9493325_mean healthy.GSM9493326_mean
## 1             0.013263263             0.006746032             0.019963311
## 2             0.003503504             0.005634921             0.018560483
## 3             0.005880881             0.003253968             0.010467249
## 4             0.024274274             0.005158730             0.016186468
## 5             0.007007007             0.010555556             0.006474587
## 6             0.030780781             0.052380952             0.052767886
##   healthy.GSM9493327_mean healthy.GSM9493328_mean var.features
## 1              0.01338826             0.010101010      ANKRD22
## 2              0.02506008             0.018068004       CXCL10
## 3              0.01175764             0.002703087        IFI27
## 4              0.01278750             0.020344288        IL1R2
## 5              0.00592173             0.003983497       NCKAP5
## 6              0.03690354             0.062882345       FCER1A
##   var.features.rank   genes pathologyMean healthyMean
## 1               693 ANKRD22   0.557973754 0.011009017
## 2                30  CXCL10   0.452981671 0.015842734
## 3               775   IFI27   0.224373364 0.008032095
## 4               617   IL1R2   0.233235336 0.014118690
## 5              1013  NCKAP5   0.003549511 0.010439854
## 6                57  FCER1A   0.018804754 0.056599461
##   foldchangePathologyVsHealthy
## 1                   50.6833391
## 2                   28.5923929
## 3                   27.9345988
## 4                   16.5196158
## 5                    0.3399962
## 6                    0.3322426
colnames(common)
##  [1] "NKTCL.GSM9493334_mean"        "NKTCL.GSM9493335_mean"       
##  [3] "NKTCL.GSM9493336_mean"        "NKTCL.GSM9493337_mean"       
##  [5] "NKTCL.GSM9493338_mean"        "NKTCL.GSM9493339_mean"       
##  [7] "NKTCL.GSM9493340_mean"        "NKTCL.GSM9493341_mean"       
##  [9] "NKTCL.GSM9493342_mean"        "NKTCL.GSM9493343_mean"       
## [11] "NKTCL.GSM9493344_mean"        "NKTCL.GSM9493345_mean"       
## [13] "NKTCL.GSM9493346_mean"        "NKTCL.GSM9493347_mean"       
## [15] "NKTCL.GSM9493348_mean"        "NKTCL.GSM9493349_mean"       
## [17] "NKTCL.GSM9493350_mean"        "NKTCL.GSM9493351_mean"       
## [19] "NKTCL.GSM9493332_mean"        "NKTCL.GSM9493333_mean"       
## [21] "healthy.GSM9493320_mean"      "healthy.GSM9493329_mean"     
## [23] "healthy.GSM9493330_mean"      "healthy.GSM9493331_mean"     
## [25] "healthy.GSM9493321_mean"      "healthy.GSM9493322_mean"     
## [27] "healthy.GSM9493323_mean"      "healthy.GSM9493324_mean"     
## [29] "healthy.GSM9493325_mean"      "healthy.GSM9493326_mean"     
## [31] "healthy.GSM9493327_mean"      "healthy.GSM9493328_mean"     
## [33] "var.features"                 "var.features.rank"           
## [35] "genes"                        "pathologyMean"               
## [37] "healthyMean"                  "foldchangePathologyVsHealthy"

Let's keep the genes and foldchangePathologyVsHealthy columns of this dataset of 9 genes, which are common to both the complete-case top fold change genes and the top 2,000 highest-variability genes found with Seurat.

commonNeeded <- common[, c("genes", "foldchangePathologyVsHealthy")]

commonNeeded
##             genes foldchangePathologyVsHealthy
## 1         ANKRD22                   50.6833391
## 2          CXCL10                   28.5923929
## 3           IFI27                   27.9345988
## 4           IL1R2                   16.5196158
## 5          NCKAP5                    0.3399962
## 6          FCER1A                    0.3322426
## 7 ENSG00000286797                    0.3236574
## 8 ENSG00000240086                    0.3104035
## 9           NRCAM                    0.2786855
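As a sanity check, the fold change column can be recomputed from the two group means. This sketch assumes, as the study summary describes, that the fold change is the NKTCL sample mean divided by the healthy sample mean; the numbers are copied from the first four rows of head(common) above. The log2 column is added only for illustration (it puts over- and underexpression on a symmetric scale around zero) and is not part of the original workflow.

```r
# Group means and stored fold changes copied from head(common) above
fcCheck <- data.frame(
  genes         = c("ANKRD22", "CXCL10", "IFI27", "IL1R2"),
  pathologyMean = c(0.557973754, 0.452981671, 0.224373364, 0.233235336),
  healthyMean   = c(0.011009017, 0.015842734, 0.008032095, 0.014118690),
  reportedFC    = c(50.6833391, 28.5923929, 27.9345988, 16.5196158)
)

# Fold change assumed to be the pathology mean over the healthy mean
fcCheck$recomputedFC <- fcCheck$pathologyMean / fcCheck$healthyMean

# log2 fold change: positive = higher in NKTCL, negative = lower in NKTCL
fcCheck$log2FC <- log2(fcCheck$recomputedFC)

# Recomputed values should agree with the stored column up to print rounding
all.equal(fcCheck$recomputedFC, fcCheck$reportedFC, tolerance = 1e-5)
fcCheck
```

Genes below 1 in the stored column (NCKAP5, FCER1A, NRCAM and the two Ensembl IDs) would get negative log2 values, which makes the direction of change easier to read at a glance.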

We need the columns to match those of the pathology database: Ensembl_ID, Genecards_ID, FC_pathology_control, topGenePathology, mediaType, studySummarized, and GSE_study_ID.

Notice that the gene IDs are a mix of Ensembl and GeneCards IDs. Most are GeneCards IDs, so let's rename the genes column to Genecards_ID and the fold change column to FC_pathology_control.

colnames(commonNeeded) <- c("Genecards_ID","FC_pathology_control")

commonNeeded$Ensembl_ID <- NA
commonNeeded$topGenePathology <- "NKTCL Natural Killer T-Cell Lymphoma & EBV"
commonNeeded$mediaType <- "PBMC blood immune cells"
commonNeeded$studySummarized <- "The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are those genes having highest fold change values of under and overexpressed or stimulated or silenced the most using gene names extracted from Seurat and ran separately and also having those complete cases where all samples had a gene sequencing value to use the mean sample of pathology NKTCL over the healthy sample mean value for fold change, and those common to those complete cases with the top 2,000 highest variability genes in Seurat that was the last step to do before ability to run the Seurat PCA algorithm of unsupervised machine learning analysis but not the PCA genes found."
commonNeeded$GSE_study_ID <- "GSE318371"
colnames(commonNeeded)
## [1] "Genecards_ID"         "FC_pathology_control" "Ensembl_ID"          
## [4] "topGenePathology"     "mediaType"            "studySummarized"     
## [7] "GSE_study_ID"
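Because two of the nine IDs (ENSG00000286797 and ENSG00000240086) are Ensembl gene IDs rather than gene symbols, one optional alternative to storing everything under Genecards_ID is to move IDs that look like human Ensembl gene IDs into the Ensembl_ID column. This is a minimal sketch, not the workflow used here; the pattern assumes human Ensembl gene IDs ("ENSG" followed by 11 digits, with no version suffix), and the small data frame `ids` is a hypothetical stand-in for the two ID columns of commonNeeded.

```r
# Hypothetical stand-in for the ID columns of commonNeeded
ids <- data.frame(
  Genecards_ID = c("ANKRD22", "NRCAM", "ENSG00000286797", "ENSG00000240086"),
  Ensembl_ID   = NA_character_,
  stringsAsFactors = FALSE
)

# Assumed pattern for human Ensembl gene IDs: "ENSG" + 11 digits, no ".version"
isEnsembl <- grepl("^ENSG[0-9]{11}$", ids$Genecards_ID)

# Move Ensembl-style IDs into Ensembl_ID and blank them out of Genecards_ID
ids$Ensembl_ID[isEnsembl]   <- ids$Genecards_ID[isEnsembl]
ids$Genecards_ID[isEnsembl] <- NA_character_
ids
```

Keeping the two ID types in separate columns would make later joins against Ensembl- or symbol-keyed annotation tables less error-prone, at the cost of some rows having an empty Genecards_ID.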

Before adding the rows to the database, we need to reorder our columns to match it: Ensembl_ID, Genecards_ID, FC_pathology_control, topGenePathology, mediaType, studySummarized, and GSE_study_ID.

commonNeeded1 <- commonNeeded[, colnames(pathologyDB)]
colnames(commonNeeded1)
## [1] "Ensembl_ID"           "Genecards_ID"         "FC_pathology_control"
## [4] "topGenePathology"     "mediaType"            "studySummarized"     
## [7] "GSE_study_ID"
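Whichever way the columns are selected, it is worth confirming that no database column is missing from the new rows and that the final order matches exactly. The sketch below uses small placeholder data frames (`dbTemplate` and `newRows` are hypothetical stand-ins for pathologyDB and commonNeeded) so that it runs on its own.

```r
# Placeholder template with the pathology database column order
dbTemplate <- data.frame(
  Ensembl_ID = character(), Genecards_ID = character(),
  FC_pathology_control = numeric(), topGenePathology = character(),
  mediaType = character(), studySummarized = character(),
  GSE_study_ID = character()
)

# Placeholder new row built in a different column order, as commonNeeded was
newRows <- data.frame(
  Genecards_ID = "ANKRD22", FC_pathology_control = 50.6833391,
  Ensembl_ID = NA_character_, topGenePathology = "NKTCL",
  mediaType = "PBMC blood immune cells", studySummarized = "summary",
  GSE_study_ID = "GSE318371", stringsAsFactors = FALSE
)

# Fail early if any database column is missing from the new rows
missingCols <- setdiff(colnames(dbTemplate), colnames(newRows))
stopifnot(length(missingCols) == 0)

# Reorder by name so the columns line up exactly with the database
newRowsOrdered <- newRows[, colnames(dbTemplate)]
identical(colnames(newRowsOrdered), colnames(dbTemplate))
```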

Now we can append these rows to the end of the pathology database with rbind().

pathology2 <- rbind(pathologyDB,commonNeeded1)
tail(pathology2)
##     Ensembl_ID    Genecards_ID FC_pathology_control
## 188       <NA>           IL1R2           16.5196158
## 189       <NA>          NCKAP5            0.3399962
## 190       <NA>          FCER1A            0.3322426
## 191       <NA> ENSG00000286797            0.3236574
## 192       <NA> ENSG00000240086            0.3104035
## 193       <NA>           NRCAM            0.2786855
##                               topGenePathology               mediaType
## 188 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 189 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 190 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 191 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 192 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 193 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
##     studySummarized
## 188 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are those genes having highest fold change values of under and overexpressed or stimulated or silenced the most using gene names extracted from Seurat and ran separately and also having those complete cases where all samples had a gene sequencing value to use the mean sample of pathology NKTCL over the healthy sample mean value for fold change, and those common to those complete cases with the top 2,000 highest variability genes in Seurat that was the last step to do before ability to run the Seurat PCA algorithm of unsupervised machine learning analysis but not the PCA genes found.
## 189 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are those genes having highest fold change values of under and overexpressed or stimulated or silenced the most using gene names extracted from Seurat and ran separately and also having those complete cases where all samples had a gene sequencing value to use the mean sample of pathology NKTCL over the healthy sample mean value for fold change, and those common to those complete cases with the top 2,000 highest variability genes in Seurat that was the last step to do before ability to run the Seurat PCA algorithm of unsupervised machine learning analysis but not the PCA genes found.
## 190 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are those genes having highest fold change values of under and overexpressed or stimulated or silenced the most using gene names extracted from Seurat and ran separately and also having those complete cases where all samples had a gene sequencing value to use the mean sample of pathology NKTCL over the healthy sample mean value for fold change, and those common to those complete cases with the top 2,000 highest variability genes in Seurat that was the last step to do before ability to run the Seurat PCA algorithm of unsupervised machine learning analysis but not the PCA genes found.
## 191 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are those genes having highest fold change values of under and overexpressed or stimulated or silenced the most using gene names extracted from Seurat and ran separately and also having those complete cases where all samples had a gene sequencing value to use the mean sample of pathology NKTCL over the healthy sample mean value for fold change, and those common to those complete cases with the top 2,000 highest variability genes in Seurat that was the last step to do before ability to run the Seurat PCA algorithm of unsupervised machine learning analysis but not the PCA genes found.
## 192 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are those genes having highest fold change values of under and overexpressed or stimulated or silenced the most using gene names extracted from Seurat and ran separately and also having those complete cases where all samples had a gene sequencing value to use the mean sample of pathology NKTCL over the healthy sample mean value for fold change, and those common to those complete cases with the top 2,000 highest variability genes in Seurat that was the last step to do before ability to run the Seurat PCA algorithm of unsupervised machine learning analysis but not the PCA genes found.
## 193 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are those genes having highest fold change values of under and overexpressed or stimulated or silenced the most using gene names extracted from Seurat and ran separately and also having those complete cases where all samples had a gene sequencing value to use the mean sample of pathology NKTCL over the healthy sample mean value for fold change, and those common to those complete cases with the top 2,000 highest variability genes in Seurat that was the last step to do before ability to run the Seurat PCA algorithm of unsupervised machine learning analysis but not the PCA genes found.
##     GSE_study_ID
## 188    GSE318371
## 189    GSE318371
## 190    GSE318371
## 191    GSE318371
## 192    GSE318371
## 193    GSE318371
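For data frames, rbind() matches columns by name rather than by position, so both tables must share the same set of column names (it stops with an error if they differ). A quick way to confirm the append worked is to check that the row count grew by exactly the number of new genes. This sketch uses small placeholder data frames; the gene names GENE_A and GENE_B and the study ID GSE000000 are hypothetical.

```r
# Placeholder "database" and new rows with the same columns in different orders
dbToy  <- data.frame(Genecards_ID = c("GENE_A", "GENE_B"),
                     FC_pathology_control = c(2.0, 0.5),
                     GSE_study_ID = "GSE000000",
                     stringsAsFactors = FALSE)
newToy <- data.frame(GSE_study_ID = "GSE318371",
                     FC_pathology_control = c(50.68, 0.28),
                     Genecards_ID = c("ANKRD22", "NRCAM"),
                     stringsAsFactors = FALSE)

# rbind() on data frames lines columns up by name, not position
combined <- rbind(dbToy, newToy)

# The combined table should have exactly nrow(dbToy) + nrow(newToy) rows
stopifnot(nrow(combined) == nrow(dbToy) + nrow(newToy))
combined[combined$GSE_study_ID == "GSE318371", ]
```

Applied to this section, the analogous check would be that nrow(pathology2) equals nrow(pathologyDB) plus the 9 new genes.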

Now let's do the same for the top 10 most variable genes identified by Seurat.

seurat10 <- read.csv("top10SeuratPCA_NKTCL_genes.csv", sep = ",", header = TRUE)

head(seurat10)
##   NKTCL.GSM9493334_mean NKTCL.GSM9493335_mean NKTCL.GSM9493336_mean
## 1           56.14074668             9.1726009            14.0559531
## 2           28.45338983             8.8639087             9.6306897
## 3            0.27828676             0.6656494             0.5280262
## 4            0.10364178             0.2857466             0.4873783
## 5            0.09883188             0.4481745             0.8714967
## 6            1.58863949             5.0399005             5.4777380
##   NKTCL.GSM9493337_mean NKTCL.GSM9493338_mean NKTCL.GSM9493339_mean
## 1            12.7582041           22.19706640           16.51334910
## 2            10.3959611           12.17064159            6.66821561
## 3             0.6413631            1.05105811            0.26039202
## 4             0.3067075            0.09965289            0.04123015
## 5             1.3196899            0.23278468            0.26486989
## 6             3.5041471            4.29336021            1.45555931
##   NKTCL.GSM9493340_mean NKTCL.GSM9493341_mean NKTCL.GSM9493342_mean
## 1            29.9675728            5.95517443            10.8132857
## 2            12.7071918           10.93055285            12.4585000
## 3             0.1405180            0.36938868             0.4012143
## 4             0.0854024            0.15792893             0.1267857
## 5             1.8139983            0.02864939             0.4072143
## 6             1.7065497            2.09627753             3.8170000
##   NKTCL.GSM9493343_mean NKTCL.GSM9493344_mean NKTCL.GSM9493345_mean
## 1            14.2298043            28.7486374            16.6500079
## 2            12.0203149            22.5039453            13.3806431
## 3             0.3568069             0.2224030             0.3739110
## 4             0.1337514             0.1227528             0.2149533
## 5             0.7770809             2.3528838             0.4168383
## 6             2.1034154             2.7587245             3.0952796
##   NKTCL.GSM9493346_mean NKTCL.GSM9493347_mean NKTCL.GSM9493348_mean
## 1           75.24241681           19.10109064           25.19990567
## 2           24.06557109           20.11214742            9.47779263
## 3            0.04962799            0.55690109            0.43573618
## 4            0.17676396            0.06130124            0.04213505
## 5            0.79298504            0.32094772            4.96093074
## 6            1.74392936            2.80699511            1.64916280
##   NKTCL.GSM9493349_mean NKTCL.GSM9493350_mean NKTCL.GSM9493351_mean
## 1           19.16635950            12.8743869           18.06203958
## 2           16.34768872             5.2355150           10.01278348
## 3            0.61263023             0.9675046            0.46959411
## 4            0.02903675             0.2963673            0.01856132
## 5            1.36555459             0.6836297            0.82103134
## 6            1.73246323             6.7227928            0.92568251
##   NKTCL.GSM9493332_mean NKTCL.GSM9493333_mean healthy.GSM9493320_mean
## 1           13.23404822            9.13221787               8.1473199
## 2            5.19808179           10.06422214               5.8773487
## 3            0.06673771            0.02470630               0.3576945
## 4            0.02917277            0.06429334               0.2695101
## 5            0.33422139            0.21352795               0.7465130
## 6            0.09750899            0.07831969               1.8303170
##   healthy.GSM9493329_mean healthy.GSM9493330_mean healthy.GSM9493331_mean
## 1              29.2248285              11.3305327               9.6991828
## 2              18.1343218               6.2842766               8.5866156
## 3               0.3899093               0.7881082               0.5373629
## 4               0.1203806               0.1153171               0.1123463
## 5               0.7561407               1.1591727               1.1468748
## 6               2.0202478               4.4677315               5.8579842
##   healthy.GSM9493321_mean healthy.GSM9493322_mean healthy.GSM9493323_mean
## 1               6.2743685               8.9673723              5.89515397
## 2               7.0196808              11.9966172              4.16816643
## 3               0.4270107               0.1407682              0.29167011
## 4               0.1930885               0.1736141              0.09378354
## 5               0.4886874               0.8538848              0.37183192
## 6               1.5470324               2.3434090              2.02253777
##   healthy.GSM9493324_mean healthy.GSM9493325_mean healthy.GSM9493326_mean
## 1              11.6448949              13.1591270             23.20427323
## 2              13.4121622               9.4002381             13.01553901
## 3               0.3395896               0.1820635              0.36840401
## 4               0.1525275               0.1492063              0.03507068
## 5               0.4872372               1.5381746              0.67454408
## 6               1.4079079               1.9276190              2.61066149
##   healthy.GSM9493327_mean healthy.GSM9493328_mean var.features
## 1              17.6065911              21.2710201       S100A9
## 2               4.5891692              15.8304168          LYZ
## 3               0.1588568               0.5342154        IGLC3
## 4               0.0592173               0.1337317         SOX5
## 5               1.6737899               0.8955755         PPBP
## 6               2.6473567               3.3883910         IGKC
##   var.features.rank  genes pathologyMean healthyMean
## 1                 4 S100A9    21.4607434  13.8687221
## 2                 9    LYZ    13.0348878   9.8595460
## 3                 6  IGLC3     0.4236228   0.3763044
## 4                 7   SOX5     0.1441782   0.1339828
## 5                 1   PPBP     0.9262670   0.8993689
## 6                 3   IGKC     2.6346723   2.6725997
##   foldchangePathologyVsHealthy
## 1                    1.5474204
## 2                    1.3220576
## 3                    1.1257449
## 4                    1.0760944
## 5                    1.0299078
## 6                    0.9858088
colnames(seurat10)
##  [1] "NKTCL.GSM9493334_mean"        "NKTCL.GSM9493335_mean"       
##  [3] "NKTCL.GSM9493336_mean"        "NKTCL.GSM9493337_mean"       
##  [5] "NKTCL.GSM9493338_mean"        "NKTCL.GSM9493339_mean"       
##  [7] "NKTCL.GSM9493340_mean"        "NKTCL.GSM9493341_mean"       
##  [9] "NKTCL.GSM9493342_mean"        "NKTCL.GSM9493343_mean"       
## [11] "NKTCL.GSM9493344_mean"        "NKTCL.GSM9493345_mean"       
## [13] "NKTCL.GSM9493346_mean"        "NKTCL.GSM9493347_mean"       
## [15] "NKTCL.GSM9493348_mean"        "NKTCL.GSM9493349_mean"       
## [17] "NKTCL.GSM9493350_mean"        "NKTCL.GSM9493351_mean"       
## [19] "NKTCL.GSM9493332_mean"        "NKTCL.GSM9493333_mean"       
## [21] "healthy.GSM9493320_mean"      "healthy.GSM9493329_mean"     
## [23] "healthy.GSM9493330_mean"      "healthy.GSM9493331_mean"     
## [25] "healthy.GSM9493321_mean"      "healthy.GSM9493322_mean"     
## [27] "healthy.GSM9493323_mean"      "healthy.GSM9493324_mean"     
## [29] "healthy.GSM9493325_mean"      "healthy.GSM9493326_mean"     
## [31] "healthy.GSM9493327_mean"      "healthy.GSM9493328_mean"     
## [33] "var.features"                 "var.features.rank"           
## [35] "genes"                        "pathologyMean"               
## [37] "healthyMean"                  "foldchangePathologyVsHealthy"

Let's keep the genes and foldchangePathologyVsHealthy columns.

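# Keep only the gene symbol and fold-change columns, selected by name so the
# code still works if the column positions in seurat10 change
Seurat10 <- seurat10[, c("genes", "foldchangePathologyVsHealthy")]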
colnames(Seurat10)
## [1] "genes"                        "foldchangePathologyVsHealthy"
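Here is a quick check on what the foldchangePathologyVsHealthy column holds. The printed values above look consistent with the NKTCL group mean divided by the healthy group mean (pathologyMean / healthyMean). The sketch below recomputes that ratio from the six printed rows. The log2 step at the end is an optional extra, not part of the original pipeline. It puts down-regulated genes (fold change below 1) on the same scale as up-regulated ones.

```r
# Group means copied from the printed seurat10 output above
genes          <- c("S100A9", "LYZ", "IGLC3", "SOX5", "PPBP", "IGKC")
pathology_mean <- c(21.4607434, 13.0348878, 0.4236228, 0.1441782, 0.9262670, 2.6346723)
healthy_mean   <- c(13.8687221, 9.8595460, 0.3763044, 0.1339828, 0.8993689, 2.6725997)
printed_fc     <- c(1.5474204, 1.3220576, 1.1257449, 1.0760944, 1.0299078, 0.9858088)

# Fold change as the NKTCL mean over the healthy mean
fold_change <- setNames(pathology_mean / healthy_mean, genes)

# Agrees with the printed column up to rounding of the displayed means
max(abs(fold_change - printed_fc)) < 1e-5
## [1] TRUE

# Optional: log2 fold change is symmetric around 0 (FC = 2 -> 1, FC = 0.5 -> -1),
# so under- and over-expressed genes can be ranked on one scale
log2_fc <- log2(fold_change)
round(log2_fc, 3)
```

With log2 values, sorting by absolute value ranks the strongest changes in either direction together. A raw ratio of 0.34 and one of 2.9 are about equally large changes.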

Now let's create the same columns that our pathology database uses, so that these genes can be combined with it.

# Rename the two kept columns to the database's column names
colnames(Seurat10) <- c("Genecards_ID","FC_pathology_control")

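# Ensembl IDs are not available for these genes yet, so leave them missing
Seurat10$Ensembl_ID <- NA
# Study-level metadata, repeated on every row
Seurat10$topGenePathology <- "NKTCL Natural Killer T-Cell Lymphoma & EBV"
Seurat10$mediaType <- "PBMC blood immune cells"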
Seurat10$studySummarized <- "The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA."
Seurat10$GSE_study_ID <- "GSE318371"
colnames(Seurat10)
## [1] "Genecards_ID"         "FC_pathology_control" "Ensembl_ID"          
## [4] "topGenePathology"     "mediaType"            "studySummarized"     
## [7] "GSE_study_ID"
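# Reorder the columns by name to match the order used in the pathology database
Seurat10b <- Seurat10[, c("Ensembl_ID", "Genecards_ID", "FC_pathology_control",
                          "topGenePathology", "mediaType", "studySummarized",
                          "GSE_study_ID")]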
colnames(Seurat10b)
## [1] "Ensembl_ID"           "Genecards_ID"         "FC_pathology_control"
## [4] "topGenePathology"     "mediaType"            "studySummarized"     
## [7] "GSE_study_ID"
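Before combining, it can help to confirm that the new rows carry exactly the database's columns. The sketch below is a minimal version of that check. check_schema() is a hypothetical helper, not part of the original workflow. toy_db and toy_new are small stand-ins for pathology2 and Seurat10, built from the column names shown above with placeholder values.

```r
# Hypothetical helper: compare the columns of a batch of new rows
# against the database table before binding them
check_schema <- function(new_rows, db) {
  list(
    missing_in_new = setdiff(names(db), names(new_rows)),  # database columns the new rows lack
    extra_in_new   = setdiff(names(new_rows), names(db)),  # new columns the database lacks
    same_order     = identical(names(new_rows), names(db))
  )
}

# Toy stand-in for the database, using the column names shown above;
# the row values are placeholders
toy_db <- data.frame(Ensembl_ID = NA_character_, Genecards_ID = "S100A9",
                     FC_pathology_control = 1.5474204,
                     topGenePathology = "placeholder", mediaType = "placeholder",
                     studySummarized = "placeholder", GSE_study_ID = "GSE318371",
                     stringsAsFactors = FALSE)

# Same columns as toy_db, but in the order Seurat10 had before reordering
toy_new <- toy_db[, c(2, 3, 1, 4:7)]

res <- check_schema(toy_new, toy_db)
res$same_order
## [1] FALSE

# Reorder by the database's own names, then re-check
toy_new <- toy_new[, names(toy_db)]
check_schema(toy_new, toy_db)$same_order
## [1] TRUE
```

Suppose missing_in_new or extra_in_new is not empty. Then rbind() would fail on the real tables anyway, so the check mainly reports which columns are at fault before that happens.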

Now let's add these genes to the database.

# Append the new rows below the existing database rows
pathology3 <- rbind(pathology2, Seurat10b)

tail(pathology3)
##     Ensembl_ID Genecards_ID FC_pathology_control
## 198       <NA>         PPBP            1.0299078
## 199       <NA>         IGKC            0.9858088
## 200       <NA>        IGLC2            0.9634843
## 201       <NA>          PF4            0.7387458
## 202       <NA>       LINGO2            0.6979391
## 203       <NA>         EREG            0.6300822
##                               topGenePathology               mediaType
## 198 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 199 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 200 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 201 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 202 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 203 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           studySummarized
## 198 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA.
## 199 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA.
## 200 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA.
## 201 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA.
## 202 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA.
## 203 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA.
##     GSE_study_ID
## 198    GSE318371
## 199    GSE318371
## 200    GSE318371
## 201    GSE318371
## 202    GSE318371
## 203    GSE318371
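After appending, a few sanity checks can confirm that the combined table is what we expect. The sketch below uses a handful of rows copied from the output above, split into a toy "database" and a toy batch of new rows. It is an illustration, not the real pathology2 and Seurat10b objects. It relies on documented base R behaviour: rbind() on data frames matches columns by name, and the result keeps the column order of the first argument.

```r
# A few rows copied from the output above; the new batch deliberately lists
# its columns in a different order
db <- data.frame(Genecards_ID = c("ANKRD22", "CXCL10"),
                 FC_pathology_control = c(50.6833391, 28.5923929),
                 GSE_study_ID = "GSE318371", stringsAsFactors = FALSE)
new_rows <- data.frame(GSE_study_ID = "GSE318371",
                       Genecards_ID = c("S100A9", "LYZ"),
                       FC_pathology_control = c(1.5474204, 1.3220576),
                       stringsAsFactors = FALSE)

# Columns are matched by name, so the different order in new_rows is handled
combined <- rbind(db, new_rows)

# Every new row was added
nrow(combined) == nrow(db) + nrow(new_rows)
## [1] TRUE

# Is the same gene listed twice for one study?
any(duplicated(combined[, c("Genecards_ID", "GSE_study_ID")]))
## [1] FALSE

# Rows per study
table(combined$GSE_study_ID)
```

On the real table, the same duplicated() check would flag any gene that appears in both the fold-change set and the Seurat variable-feature set of GSE318371.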

Next, we should fill in the Ensembl_ID feature with the matching Ensembl gene IDs, where we have them, based on each row's Genecards_ID gene name.

# Pull out this study's rows to see which genes still need an Ensembl ID
neededEnsembleID <- subset(pathology3, GSE_study_ID == "GSE318371")
neededEnsembleID
##     Ensembl_ID    Genecards_ID FC_pathology_control
## 185       <NA>         ANKRD22           50.6833391
## 186       <NA>          CXCL10           28.5923929
## 187       <NA>           IFI27           27.9345988
## 188       <NA>           IL1R2           16.5196158
## 189       <NA>          NCKAP5            0.3399962
## 190       <NA>          FCER1A            0.3322426
## 191       <NA> ENSG00000286797            0.3236574
## 192       <NA> ENSG00000240086            0.3104035
## 193       <NA>           NRCAM            0.2786855
## 194       <NA>          S100A9            1.5474204
## 195       <NA>             LYZ            1.3220576
## 196       <NA>           IGLC3            1.1257449
## 197       <NA>            SOX5            1.0760944
## 198       <NA>            PPBP            1.0299078
## 199       <NA>            IGKC            0.9858088
## 200       <NA>           IGLC2            0.9634843
## 201       <NA>             PF4            0.7387458
## 202       <NA>          LINGO2            0.6979391
## 203       <NA>            EREG            0.6300822
##                               topGenePathology               mediaType
## 185 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 186 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 187 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 188 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 189 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 190 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 191 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 192 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 193 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 194 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 195 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 196 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 197 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 198 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 199 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 200 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 201 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 202 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 203 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         studySummarized
## 185 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are those genes having highest fold change values of under and overexpressed or stimulated or silenced the most using gene names extracted from Seurat and ran separately and also having those complete cases where all samples had a gene sequencing value to use the mean sample of pathology NKTCL over the healthy sample mean value for fold change, and those common to those complete cases with the top 2,000 highest variability genes in Seurat that was the last step to do before ability to run the Seurat PCA algorithm of unsupervised machine learning analysis but not the PCA genes found.
## 186 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are those genes having highest fold change values of under and overexpressed or stimulated or silenced the most using gene names extracted from Seurat and ran separately and also having those complete cases where all samples had a gene sequencing value to use the mean sample of pathology NKTCL over the healthy sample mean value for fold change, and those common to those complete cases with the top 2,000 highest variability genes in Seurat that was the last step to do before ability to run the Seurat PCA algorithm of unsupervised machine learning analysis but not the PCA genes found.
## 187 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are those genes having highest fold change values of under and overexpressed or stimulated or silenced the most using gene names extracted from Seurat and ran separately and also having those complete cases where all samples had a gene sequencing value to use the mean sample of pathology NKTCL over the healthy sample mean value for fold change, and those common to those complete cases with the top 2,000 highest variability genes in Seurat that was the last step to do before ability to run the Seurat PCA algorithm of unsupervised machine learning analysis but not the PCA genes found.
## 188 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are those genes having highest fold change values of under and overexpressed or stimulated or silenced the most using gene names extracted from Seurat and ran separately and also having those complete cases where all samples had a gene sequencing value to use the mean sample of pathology NKTCL over the healthy sample mean value for fold change, and those common to those complete cases with the top 2,000 highest variability genes in Seurat that was the last step to do before ability to run the Seurat PCA algorithm of unsupervised machine learning analysis but not the PCA genes found.
## 189 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are those genes having highest fold change values of under and overexpressed or stimulated or silenced the most using gene names extracted from Seurat and ran separately and also having those complete cases where all samples had a gene sequencing value to use the mean sample of pathology NKTCL over the healthy sample mean value for fold change, and those common to those complete cases with the top 2,000 highest variability genes in Seurat that was the last step to do before ability to run the Seurat PCA algorithm of unsupervised machine learning analysis but not the PCA genes found.
## 190 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are those genes having highest fold change values of under and overexpressed or stimulated or silenced the most using gene names extracted from Seurat and ran separately and also having those complete cases where all samples had a gene sequencing value to use the mean sample of pathology NKTCL over the healthy sample mean value for fold change, and those common to those complete cases with the top 2,000 highest variability genes in Seurat that was the last step to do before ability to run the Seurat PCA algorithm of unsupervised machine learning analysis but not the PCA genes found.
## 191 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are those genes having highest fold change values of under and overexpressed or stimulated or silenced the most using gene names extracted from Seurat and ran separately and also having those complete cases where all samples had a gene sequencing value to use the mean sample of pathology NKTCL over the healthy sample mean value for fold change, and those common to those complete cases with the top 2,000 highest variability genes in Seurat that was the last step to do before ability to run the Seurat PCA algorithm of unsupervised machine learning analysis but not the PCA genes found.
## 192 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are those genes having highest fold change values of under and overexpressed or stimulated or silenced the most using gene names extracted from Seurat and ran separately and also having those complete cases where all samples had a gene sequencing value to use the mean sample of pathology NKTCL over the healthy sample mean value for fold change, and those common to those complete cases with the top 2,000 highest variability genes in Seurat that was the last step to do before ability to run the Seurat PCA algorithm of unsupervised machine learning analysis but not the PCA genes found.
## 193 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are those genes having highest fold change values of under and overexpressed or stimulated or silenced the most using gene names extracted from Seurat and ran separately and also having those complete cases where all samples had a gene sequencing value to use the mean sample of pathology NKTCL over the healthy sample mean value for fold change, and those common to those complete cases with the top 2,000 highest variability genes in Seurat that was the last step to do before ability to run the Seurat PCA algorithm of unsupervised machine learning analysis but not the PCA genes found.
## 194                                                                                                                                                                                                                                                                                                                                                                                                                                                                               The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA.
## 195                                                                                                                                                                                                                                                                                                                                                                                                                                                                               The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA.
## 196                                                                                                                                                                                                                                                                                                                                                                                                                                                                               The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA.
## 197                                                                                                                                                                                                                                                                                                                                                                                                                                                                               The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA.
## 198                                                                                                                                                                                                                                                                                                                                                                                                                                                                               The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA.
## 199                                                                                                                                                                                                                                                                                                                                                                                                                                                                               The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA.
## 200 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA.
## 201 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA.
## 202 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA.
## 203 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA.
##     GSE_study_ID
## 185    GSE318371
## 186    GSE318371
## 187    GSE318371
## 188    GSE318371
## 189    GSE318371
## 190    GSE318371
## 191    GSE318371
## 192    GSE318371
## 193    GSE318371
## 194    GSE318371
## 195    GSE318371
## 196    GSE318371
## 197    GSE318371
## 198    GSE318371
## 199    GSE318371
## 200    GSE318371
## 201    GSE318371
## 202    GSE318371
## 203    GSE318371
neededEnsembleID$Genecards_ID
##  [1] "ANKRD22"         "CXCL10"          "IFI27"           "IL1R2"          
##  [5] "NCKAP5"          "FCER1A"          "ENSG00000286797" "ENSG00000240086"
##  [9] "NRCAM"           "S100A9"          "LYZ"             "IGLC3"          
## [13] "SOX5"            "PPBP"            "IGKC"            "IGLC2"          
## [17] "PF4"             "LINGO2"          "EREG"

Let's look up the Ensembl IDs for these genes on genecards.org so we can add them to the database.

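GeneCards gives the following Ensembl IDs for the first seven genes: ANKRD22, ENSG00000152766; CXCL10, ENSG00000169245; IFI27, ENSG00000165949; IL1R2, ENSG00000115590; NCKAP5, ENSG00000176771; FCER1A, ENSG00000179639; and ENSG00000286797, which has no separate symbol, so the same ID serves as both its Ensembl and GeneCards ID.

ENSG00000240086 is now listed on GeneCards as LOC102724019.

For the remaining genes (old symbol, Ensembl ID, and newest GeneCards name where it changed): NRCAM, ENSG00000303545; S100A9, ENSG00000237803, now PIRAT1; LYZ, ENSG00000257764; IGLC3, no Ensembl ID (NA), now lnc-RSPH14-1; SOX5, ENSG00000256473; PPBP, ENSG00000287037; IGKC (old symbol), ENSG00000295771; IGLC2, no Ensembl ID (NA), now lnc-RSPH14-1; PF4, no Ensembl ID (NA), now HSALNG0035179; LINGO2 (old symbol), ENSG00000302413; EREG (old symbol), ENSG00000304732, now LOC105377276.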
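The positional vector used below relies on the IDs being typed in exactly the order of `neededEnsembleID$Genecards_ID`; one entry out of place silently lands on the wrong gene. A minimal sketch of an order-independent alternative, assuming the lookups above are kept as a named character vector (the names `ensembl_lookup` and `toy` are hypothetical, and only a few entries are shown), matches each row to its ID by symbol with base R's `match()`:

```r
# Hypothetical lookup table: GeneCards symbol -> Ensembl ID, copied from the
# list above (only a few entries shown); NA_character_ marks "no ID found".
ensembl_lookup <- c(
  ANKRD22 = "ENSG00000152766",
  CXCL10  = "ENSG00000169245",
  IFI27   = "ENSG00000165949",
  IGLC3   = NA_character_,
  PF4     = NA_character_
)

# Toy stand-in for the new rows of pathology3 (not the real data),
# deliberately in a different order from the lookup.
toy <- data.frame(
  Genecards_ID = c("CXCL10", "IGLC3", "ANKRD22", "PF4", "IFI27"),
  stringsAsFactors = FALSE
)

# match() gives, for each row symbol, its position among the lookup names,
# so every row receives the ID stored under its own symbol.
toy$Ensembl_ID <- unname(ensembl_lookup[match(toy$Genecards_ID, names(ensembl_lookup))])

# Symbols absent from the lookup would also come back NA; list them so they
# can be looked up instead of being silently left empty.
missing <- toy$Genecards_ID[!toy$Genecards_ID %in% names(ensembl_lookup)]

toy
missing
```

Because the join is by name, adding or re-sorting rows does not shift which ID a gene receives.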

We can now add these Ensembl IDs to our database, and we also need to update the GeneCards ID for gene #8 (ENSG00000240086).

# Ensembl IDs for rows 185-203, in the same order as neededEnsembleID$Genecards_ID;
# "NA" marks the genes for which no Ensembl ID was found on GeneCards
pathology3$Ensembl_ID[185:203] <- c("ENSG00000152766",
  "ENSG00000169245",
  "ENSG00000165949",
  "ENSG00000115590",
  "ENSG00000176771",
  "ENSG00000179639",
  "ENSG00000286797",
  "ENSG00000240086",
  "ENSG00000303545",
  "ENSG00000237803",
  "ENSG00000257764",
  "NA",
  "ENSG00000256473",
  "ENSG00000287037",
  "ENSG00000295771",
  "NA",
  "NA",
  "ENSG00000302413",
  "ENSG00000304732")
tail(pathology3,20)
##          Ensembl_ID    Genecards_ID FC_pathology_control
## 184 ENSG00000110092           CCND1           11.1412967
## 185 ENSG00000152766         ANKRD22           50.6833391
## 186 ENSG00000169245          CXCL10           28.5923929
## 187 ENSG00000165949           IFI27           27.9345988
## 188 ENSG00000115590           IL1R2           16.5196158
## 189 ENSG00000176771          NCKAP5            0.3399962
## 190 ENSG00000179639          FCER1A            0.3322426
## 191 ENSG00000286797 ENSG00000286797            0.3236574
## 192 ENSG00000240086 ENSG00000240086            0.3104035
## 193 ENSG00000303545           NRCAM            0.2786855
## 194 ENSG00000237803          S100A9            1.5474204
## 195 ENSG00000257764             LYZ            1.3220576
## 196              NA           IGLC3            1.1257449
## 197 ENSG00000256473            SOX5            1.0760944
## 198 ENSG00000287037            PPBP            1.0299078
## 199 ENSG00000295771            IGKC            0.9858088
## 200              NA           IGLC2            0.9634843
## 201              NA             PF4            0.7387458
## 202 ENSG00000302413          LINGO2            0.6979391
## 203 ENSG00000304732            EREG            0.6300822
##                               topGenePathology               mediaType
## 184                              mononucleosis            RBC microRNA
## 185 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 186 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 187 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 188 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 189 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 190 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 191 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 192 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 193 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 194 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 195 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 196 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 197 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 198 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 199 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 200 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 201 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 202 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 203 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
##     studySummarized
## 184 There were initially 5 patients that had blood drawn and the micro RNA was taken at initial diagnosis, then 1 month later, then 2 months later only for 4 of 5 patients, and then 7 months later for remaining 2 patients. These microRNA had different ID_REF names that were searched in miRbase.org and from there leads to Ensemble ID and/or genecards.org ID provided, BLAST was searched from the short ribonucleic strand that miRbase.org provided from the microRNA ID_REF, and top ranked gene selected to compare on the chromosome. These genes were filtered out of top 10 enhanced and bottom 10 silenced genes in analyzing the gene expression data from the GSE109220 study in NCBI databank. The numeric data was already normalized a specific method by the researchers
## 185 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are those genes having highest fold change values of under and overexpressed or stimulated or silenced the most using gene names extracted from Seurat and ran separately and also having those complete cases where all samples had a gene sequencing value to use the mean sample of pathology NKTCL over the healthy sample mean value for fold change, and those common to those complete cases with the top 2,000 highest variability genes in Seurat that was the last step to do before ability to run the Seurat PCA algorithm of unsupervised machine learning analysis but not the PCA genes found.
## 186 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are those genes having highest fold change values of under and overexpressed or stimulated or silenced the most using gene names extracted from Seurat and ran separately and also having those complete cases where all samples had a gene sequencing value to use the mean sample of pathology NKTCL over the healthy sample mean value for fold change, and those common to those complete cases with the top 2,000 highest variability genes in Seurat that was the last step to do before ability to run the Seurat PCA algorithm of unsupervised machine learning analysis but not the PCA genes found.
## 187 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are those genes having highest fold change values of under and overexpressed or stimulated or silenced the most using gene names extracted from Seurat and ran separately and also having those complete cases where all samples had a gene sequencing value to use the mean sample of pathology NKTCL over the healthy sample mean value for fold change, and those common to those complete cases with the top 2,000 highest variability genes in Seurat that was the last step to do before ability to run the Seurat PCA algorithm of unsupervised machine learning analysis but not the PCA genes found.
## 188 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are those genes having highest fold change values of under and overexpressed or stimulated or silenced the most using gene names extracted from Seurat and ran separately and also having those complete cases where all samples had a gene sequencing value to use the mean sample of pathology NKTCL over the healthy sample mean value for fold change, and those common to those complete cases with the top 2,000 highest variability genes in Seurat that was the last step to do before ability to run the Seurat PCA algorithm of unsupervised machine learning analysis but not the PCA genes found.
## 189 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are those genes having highest fold change values of under and overexpressed or stimulated or silenced the most using gene names extracted from Seurat and ran separately and also having those complete cases where all samples had a gene sequencing value to use the mean sample of pathology NKTCL over the healthy sample mean value for fold change, and those common to those complete cases with the top 2,000 highest variability genes in Seurat that was the last step to do before ability to run the Seurat PCA algorithm of unsupervised machine learning analysis but not the PCA genes found.
## 190 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are those genes having highest fold change values of under and overexpressed or stimulated or silenced the most using gene names extracted from Seurat and ran separately and also having those complete cases where all samples had a gene sequencing value to use the mean sample of pathology NKTCL over the healthy sample mean value for fold change, and those common to those complete cases with the top 2,000 highest variability genes in Seurat that was the last step to do before ability to run the Seurat PCA algorithm of unsupervised machine learning analysis but not the PCA genes found.
## 191 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are those genes having highest fold change values of under and overexpressed or stimulated or silenced the most using gene names extracted from Seurat and ran separately and also having those complete cases where all samples had a gene sequencing value to use the mean sample of pathology NKTCL over the healthy sample mean value for fold change, and those common to those complete cases with the top 2,000 highest variability genes in Seurat that was the last step to do before ability to run the Seurat PCA algorithm of unsupervised machine learning analysis but not the PCA genes found.
## 192 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are those genes having highest fold change values of under and overexpressed or stimulated or silenced the most using gene names extracted from Seurat and ran separately and also having those complete cases where all samples had a gene sequencing value to use the mean sample of pathology NKTCL over the healthy sample mean value for fold change, and those common to those complete cases with the top 2,000 highest variability genes in Seurat that was the last step to do before ability to run the Seurat PCA algorithm of unsupervised machine learning analysis but not the PCA genes found.
## 193 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are those genes having highest fold change values of under and overexpressed or stimulated or silenced the most using gene names extracted from Seurat and ran separately and also having those complete cases where all samples had a gene sequencing value to use the mean sample of pathology NKTCL over the healthy sample mean value for fold change, and those common to those complete cases with the top 2,000 highest variability genes in Seurat that was the last step to do before ability to run the Seurat PCA algorithm of unsupervised machine learning analysis but not the PCA genes found.
## 194 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA.
## 195 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA.
## 196 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA.
## 197 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA.
## 198 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA.
## 199 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA.
## 200 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA.
## 201 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA.
## 202 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA.
## 203 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA.
##     GSE_study_ID
## 184    GSE109220
## 185    GSE318371
## 186    GSE318371
## 187    GSE318371
## 188    GSE318371
## 189    GSE318371
## 190    GSE318371
## 191    GSE318371
## 192    GSE318371
## 193    GSE318371
## 194    GSE318371
## 195    GSE318371
## 196    GSE318371
## 197    GSE318371
## 198    GSE318371
## 199    GSE318371
## 200    GSE318371
## 201    GSE318371
## 202    GSE318371
## 203    GSE318371
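The Ensembl assignment above stores the literal character string `"NA"` for the three genes without an Ensembl ID. R does not treat that string as missing, so `is.na()` or `complete.cases()` will not catch those rows (the printout showing `NA` rather than the `<NA>` marker R uses for a true missing character value points the same way). A minimal sketch on a hypothetical toy vector of converting the placeholders into real missing values:

```r
# Toy stand-in for the Ensembl_ID column (not the real pathology3 data):
# missing IDs were typed as the character string "NA".
ids <- c("ENSG00000257764", "NA", "ENSG00000256473", "NA")

n_before <- sum(is.na(ids))   # 0: the string "NA" is an ordinary value

# Turn the placeholder strings into true missing values; which() drops any
# entries that are already NA, so the assignment never gets an NA index.
ids[which(ids == "NA")] <- NA_character_

n_after <- sum(is.na(ids))    # 2: both placeholders are now missing

# The same idea on the real column would be (not run here):
# pathology3$Ensembl_ID[which(pathology3$Ensembl_ID == "NA")] <- NA_character_
```

Writing `NA` or `NA_character_` without quotes in the original `c(...)` call would have the same effect at the point of entry.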

Now update the old GeneCards IDs with the newest names found today.
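The row numbers in the assignments that follow (192, 194, 196, and so on) stay correct only as long as the table is never re-sorted or has rows inserted above row 185. A minimal sketch of an alternative keyed on the old symbol, using a hypothetical named vector `renames` and a toy data frame rather than the real `pathology3`; it also reports symbols that end up labelling more than one row, which happens in this update because IGLC3 and IGLC2 both become lnc-RSPH14-1:

```r
# Hypothetical mapping: old GeneCards symbol -> newest symbol recorded today
renames <- c(
  ENSG00000240086 = "LOC102724019",
  S100A9 = "PIRAT1",
  IGLC3  = "lnc-RSPH14-1",
  IGLC2  = "lnc-RSPH14-1",
  PF4    = "HSALNG0035179",
  EREG   = "LOC105377276"
)

# Toy stand-in for some GSE318371 rows (not the real pathology3 data)
toy <- data.frame(
  Genecards_ID = c("CXCL10", "S100A9", "IGLC3", "LYZ", "IGLC2", "EREG"),
  GSE_study_ID = "GSE318371",
  stringsAsFactors = FALSE
)

# Rename only rows from this study whose symbol appears in the mapping
hit <- toy$GSE_study_ID == "GSE318371" & toy$Genecards_ID %in% names(renames)
toy$Genecards_ID[hit] <- unname(renames[toy$Genecards_ID[hit]])

# Symbols that now label more than one row, worth checking before modelling
dup_symbols <- unique(toy$Genecards_ID[duplicated(toy$Genecards_ID)])

toy
dup_symbols
```

Restricting the rename to `GSE_study_ID == "GSE318371"` keeps rows from other studies that share a symbol untouched.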

pathology3$Genecards_ID[192] <- "LOC102724019"   #replace ENSG00000240086 (gene #8)
pathology3$Genecards_ID[194] <- "PIRAT1"         #replace S100A9
pathology3$Genecards_ID[196] <- "lnc-RSPH14-1"   #replace IGLC3
pathology3$Genecards_ID[200] <- "lnc-RSPH14-1"   #replace IGLC2
pathology3$Genecards_ID[201] <- "HSALNG0035179"  #replace PF4
pathology3$Genecards_ID[203] <- "LOC105377276"   #replace EREG
tail(pathology3,19)
##          Ensembl_ID    Genecards_ID FC_pathology_control
## 185 ENSG00000152766         ANKRD22           50.6833391
## 186 ENSG00000169245          CXCL10           28.5923929
## 187 ENSG00000165949           IFI27           27.9345988
## 188 ENSG00000115590           IL1R2           16.5196158
## 189 ENSG00000176771          NCKAP5            0.3399962
## 190 ENSG00000179639          FCER1A            0.3322426
## 191 ENSG00000286797 ENSG00000286797            0.3236574
## 192 ENSG00000240086    LOC102724019            0.3104035
## 193 ENSG00000303545           NRCAM            0.2786855
## 194 ENSG00000237803          PIRAT1            1.5474204
## 195 ENSG00000257764             LYZ            1.3220576
## 196              NA    lnc-RSPH14-1            1.1257449
## 197 ENSG00000256473            SOX5            1.0760944
## 198 ENSG00000287037            PPBP            1.0299078
## 199 ENSG00000295771            IGKC            0.9858088
## 200              NA    lnc-RSPH14-1            0.9634843
## 201              NA   HSALNG0035179            0.7387458
## 202 ENSG00000302413          LINGO2            0.6979391
## 203 ENSG00000304732    LOC105377276            0.6300822
##                               topGenePathology               mediaType
## 185 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 186 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 187 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 188 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 189 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 190 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 191 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 192 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 193 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 194 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 195 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 196 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 197 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 198 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 199 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 200 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 201 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 202 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
## 203 NKTCL Natural Killer T-Cell Lymphoma & EBV PBMC blood immune cells
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         studySummarized
## 185 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are those genes having highest fold change values of under and overexpressed or stimulated or silenced the most using gene names extracted from Seurat and ran separately and also having those complete cases where all samples had a gene sequencing value to use the mean sample of pathology NKTCL over the healthy sample mean value for fold change, and those common to those complete cases with the top 2,000 highest variability genes in Seurat that was the last step to do before ability to run the Seurat PCA algorithm of unsupervised machine learning analysis but not the PCA genes found.
## 186 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are those genes having highest fold change values of under and overexpressed or stimulated or silenced the most using gene names extracted from Seurat and ran separately and also having those complete cases where all samples had a gene sequencing value to use the mean sample of pathology NKTCL over the healthy sample mean value for fold change, and those common to those complete cases with the top 2,000 highest variability genes in Seurat that was the last step to do before ability to run the Seurat PCA algorithm of unsupervised machine learning analysis but not the PCA genes found.
## 187 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are those genes having highest fold change values of under and overexpressed or stimulated or silenced the most using gene names extracted from Seurat and ran separately and also having those complete cases where all samples had a gene sequencing value to use the mean sample of pathology NKTCL over the healthy sample mean value for fold change, and those common to those complete cases with the top 2,000 highest variability genes in Seurat that was the last step to do before ability to run the Seurat PCA algorithm of unsupervised machine learning analysis but not the PCA genes found.
## 188 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are those genes having highest fold change values of under and overexpressed or stimulated or silenced the most using gene names extracted from Seurat and ran separately and also having those complete cases where all samples had a gene sequencing value to use the mean sample of pathology NKTCL over the healthy sample mean value for fold change, and those common to those complete cases with the top 2,000 highest variability genes in Seurat that was the last step to do before ability to run the Seurat PCA algorithm of unsupervised machine learning analysis but not the PCA genes found.
## 189 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are those genes having highest fold change values of under and overexpressed or stimulated or silenced the most using gene names extracted from Seurat and ran separately and also having those complete cases where all samples had a gene sequencing value to use the mean sample of pathology NKTCL over the healthy sample mean value for fold change, and those common to those complete cases with the top 2,000 highest variability genes in Seurat that was the last step to do before ability to run the Seurat PCA algorithm of unsupervised machine learning analysis but not the PCA genes found.
## 190 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are those genes having highest fold change values of under and overexpressed or stimulated or silenced the most using gene names extracted from Seurat and ran separately and also having those complete cases where all samples had a gene sequencing value to use the mean sample of pathology NKTCL over the healthy sample mean value for fold change, and those common to those complete cases with the top 2,000 highest variability genes in Seurat that was the last step to do before ability to run the Seurat PCA algorithm of unsupervised machine learning analysis but not the PCA genes found.
## 191 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are those genes having highest fold change values of under and overexpressed or stimulated or silenced the most using gene names extracted from Seurat and ran separately and also having those complete cases where all samples had a gene sequencing value to use the mean sample of pathology NKTCL over the healthy sample mean value for fold change, and those common to those complete cases with the top 2,000 highest variability genes in Seurat that was the last step to do before ability to run the Seurat PCA algorithm of unsupervised machine learning analysis but not the PCA genes found.
## 192 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are those genes having highest fold change values of under and overexpressed or stimulated or silenced the most using gene names extracted from Seurat and ran separately and also having those complete cases where all samples had a gene sequencing value to use the mean sample of pathology NKTCL over the healthy sample mean value for fold change, and those common to those complete cases with the top 2,000 highest variability genes in Seurat that was the last step to do before ability to run the Seurat PCA algorithm of unsupervised machine learning analysis but not the PCA genes found.
## 193 The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are those genes having highest fold change values of under and overexpressed or stimulated or silenced the most using gene names extracted from Seurat and ran separately and also having those complete cases where all samples had a gene sequencing value to use the mean sample of pathology NKTCL over the healthy sample mean value for fold change, and those common to those complete cases with the top 2,000 highest variability genes in Seurat that was the last step to do before ability to run the Seurat PCA algorithm of unsupervised machine learning analysis but not the PCA genes found.
## 194                                                                                                                                                                                                                                                                                                                                                                                                                                                                               The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA.
## 195                                                                                                                                                                                                                                                                                                                                                                                                                                                                               The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA.
## 196                                                                                                                                                                                                                                                                                                                                                                                                                                                                               The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA.
## 197                                                                                                                                                                                                                                                                                                                                                                                                                                                                               The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA.
## 198                                                                                                                                                                                                                                                                                                                                                                                                                                                                               The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA.
## 199                                                                                                                                                                                                                                                                                                                                                                                                                                                                               The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA.
## 200                                                                                                                                                                                                                                                                                                                                                                                                                                                                               The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA.
## 201                                                                                                                                                                                                                                                                                                                                                                                                                                                                               The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA.
## 202                                                                                                                                                                                                                                                                                                                                                                                                                                                                               The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA.
## 203                                                                                                                                                                                                                                                                                                                                                                                                                                                                               The peripheral blood mononuclear cells of immune cells were analyzed by chip sequencing using reverse mRNA in cDNA with single cell RNA sequencing, barcoded the fragments and analyzed the most common barcodes that showed up in the sparse matrix in at least 10 cells and having a gene represented with barcodes at least 300 ocurrences, the data was then filtered for best explanatory features of genes, normalized, scaled by 10,000, then normalized and highest variability genes of top 2,000 kept within Seurat library and these genes are the top 10 genes of those genes needed in preprocessing and quality control before using the unsupervised algorithm of PCA.
##     GSE_study_ID
## 185    GSE318371
## 186    GSE318371
## 187    GSE318371
## 188    GSE318371
## 189    GSE318371
## 190    GSE318371
## 191    GSE318371
## 192    GSE318371
## 193    GSE318371
## 194    GSE318371
## 195    GSE318371
## 196    GSE318371
## 197    GSE318371
## 198    GSE318371
## 199    GSE318371
## 200    GSE318371
## 201    GSE318371
## 202    GSE318371
## 203    GSE318371
Our database screenshot tail view
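Because the corrections above are addressed by row number, they would silently land on the wrong genes if `pathology3` were ever re-sorted or new rows were inserted above them. A minimal sketch of a sturdier alternative, assuming the update is keyed on the outdated symbol rather than on the row position, uses a named lookup vector. It also reports duplicated `Genecards_ID` values: rows 196 and 200 now both read `lnc-RSPH14-1`, which could collide if gene IDs are later turned into feature columns for randomForest. The data frame `db` and the vector `newIDs` are hypothetical illustrations built from the symbols and fold changes printed above (row 192's previous symbol is not shown, so it is left out).

```r
# Hypothetical toy copy of a few GSE318371 rows, using the OLD symbols and
# the fold-change values printed in the tail output above.
db <- data.frame(
  Genecards_ID = c("S100A9", "LYZ", "IGLC3", "IGKC", "IGLC2", "PF4", "EREG"),
  FC_pathology_control = c(1.5474204, 1.3220576, 1.1257449, 0.9858088,
                           0.9634843, 0.7387458, 0.6300822),
  GSE_study_ID = "GSE318371",
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: names are outdated symbols, values are updated IDs
newIDs <- c(S100A9 = "PIRAT1",
            IGLC3  = "lnc-RSPH14-1",
            IGLC2  = "lnc-RSPH14-1",
            PF4    = "HSALNG0035179",
            EREG   = "LOC105377276")

# Only touch rows from this study whose current symbol is in the lookup table
hit <- db$GSE_study_ID == "GSE318371" & db$Genecards_ID %in% names(newIDs)
db$Genecards_ID[hit] <- unname(newIDs[db$Genecards_ID[hit]])

# List any IDs that now appear more than once in this table
dupIDs <- unique(db$Genecards_ID[duplicated(db$Genecards_ID)])
print(db)
print(dupIDs)
```

On the real database the same `hit` and assignment lines would run against `pathology3`; where `Ensembl_ID` is present, it would be an even steadier key than the gene symbol.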

Great, now we can write this to CSV and continue analyzing the other EBV-associated pathologies to add to this database.

setwd(path)
write.csv(pathology3, 'pathologyDB_NKTCL_added.csv', row.names=F)
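Before moving on to the next study, it can be worth reading the file straight back to confirm nothing was lost in the write. A minimal sketch, using a small hypothetical data frame `miniDB` and `tempfile()` in place of `path` so it runs anywhere: `write.csv(row.names = FALSE)` followed by `read.csv()` should return the same columns, row count, IDs, and fold changes, including an `NA` Ensembl ID.

```r
# Hypothetical miniature of the pathology database (column names as in pathology3)
miniDB <- data.frame(
  Ensembl_ID = c("ENSG00000152766", "ENSG00000169245", NA),
  Genecards_ID = c("ANKRD22", "CXCL10", "lnc-RSPH14-1"),
  FC_pathology_control = c(50.6833391, 28.5923929, 1.1257449),
  GSE_study_ID = "GSE318371",
  stringsAsFactors = FALSE
)

# Write without row names, the same way pathologyDB_NKTCL_added.csv is written
outFile <- tempfile(fileext = ".csv")
write.csv(miniDB, outFile, row.names = FALSE)

# Read it back and check that structure and values survived the round trip
reloaded <- read.csv(outFile, stringsAsFactors = FALSE)
stopifnot(identical(names(reloaded), names(miniDB)),
          nrow(reloaded) == nrow(miniDB),
          identical(reloaded$Genecards_ID, miniDB$Genecards_ID),
          isTRUE(all.equal(reloaded$FC_pathology_control,
                           miniDB$FC_pathology_control)))
cat("Round trip OK:", nrow(reloaded), "rows\n")
```

With the real database, `file.path(path, 'pathologyDB_NKTCL_added.csv')` could be passed to `write.csv()` and `read.csv()` directly instead of calling `setwd(path)`, which avoids changing the session's working directory for later chunks.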

Thanks so much! Keep checking back for updates; in the meantime, here is our pathology database to keep and explore as you like.

Thanks again.