Using internet search to find a way to get gene symbols to our gene bank accession IDs. So we can add this database of genes to our Tableau collection of EBV associated and non-associated pathologies for comparison.
library(rmarkdown)
Data <- read.csv("autismDB_41kgenes_notSamples_needs_symbolID.csv", header=T, na.strings=c("","NA"," ") )#41472X8
paged_table(Data)
data <- Data[!duplicated(Data$GB_ACC),] #39930X8
#remove that 1 NA as well
data2 <- data[!is.na(data$GB_ACC),] #39929X8
Here are internet instructions:
AI Overview
To make a list of gene synonyms from gene accession IDs in R, use Bioconductor’s organism-specific annotation packages (e.g., org.Hs.eg.db for humans). The mapIds() function allows you to directly query NCBI/RefSeq/Ensembl accessions to return ALIAS (synonym) values.Step-by-Step InstructionsInstall and Load Packages:Ensure you have the core AnnotationDbi and your species-specific package (e.g., org.Hs.eg.db) installed via Bioconductor.Rif
(!requireNamespace(“BiocManager”, quietly = TRUE)) install.packages(“BiocManager”)
#BiocManager::install("org.Hs.eg.db")
library(org.Hs.eg.db)
## Loading required package: AnnotationDbi
## Loading required package: stats4
## Loading required package: BiocGenerics
## Loading required package: generics
##
## Attaching package: 'generics'
## The following objects are masked from 'package:base':
##
## as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
## setequal, union
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## anyDuplicated, aperm, append, as.data.frame, basename, cbind,
## colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
## get, grep, grepl, is.unsorted, lapply, Map, mapply, match, mget,
## order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
## rbind, Reduce, rownames, sapply, saveRDS, table, tapply, unique,
## unsplit, which.max, which.min
## Loading required package: Biobase
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
## Loading required package: IRanges
## Loading required package: S4Vectors
##
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:utils':
##
## findMatches
## The following objects are masked from 'package:base':
##
## expand.grid, I, unname
##
## Attaching package: 'IRanges'
## The following object is masked from 'package:grDevices':
##
## windows
##
library(AnnotationDbi)
Define Accession IDs and Keytype:Because accession numbers (e.g., NM_001369 or NP_001618) often include version numbers, it is best practice to strip them (e.g., using sub(“\..*“,”“, …)), as the database stores base accessions.Map and Retrieve Synonyms:Synonyms are stored under the”ALIAS” keytype. Because one accession ID can have multiple synonyms, use multivals = “list” to return a list of synonyms for each query.R#
accession_ids <- c(“NM_004448”, “NM_005228”) clean_accessions <- sub(“\..*“,”“, accession_ids)
accession_ids <- data2$GB_ACC
clean_accessions <- sub("\\..*", "", accession_ids)
gene_synonyms <- mapIds(org.Hs.eg.db, keys = clean_accessions, column = “ALIAS”, keytype = “REFSEQ”, # or “ACCNUM”, “ENSEMBL”, etc. multivals = “list”)
gene_synonyms <- mapIds(org.Hs.eg.db,
keys = clean_accessions,
column = "ALIAS",
keytype = "ACCNUM", # or "ACCNUM", "ENSEMBL", etc.
multivals = "list")
## 'select()' returned 1:many mapping between keys and columns
print(gene_synonyms)
geneID <- unlist(gene_synonyms) #39929 length of string
str(geneID[1:50])
## Named chr [1:50] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA ...
## - attr(*, "names")= chr [1:50] "M86720" "AA486138" "N51018" "H65481" ...
Tips for Best ResultsCheck Your Keytype: The most common accession types are REFSEQ (for RNA/protein accessions), ACCNUM (GenBank accessions), and ENSEMBL. Use keytypes(org.Hs.eg.db) to view all supported accession types for your organism.
No Matches? If your accessions don’t map, check that your keytype matches the exact format of your IDs, and remove any .1, .2, etc., suffixes at the end of the accession string.For a full list of mapping keys available for your species, view the official Bioconductor org.Hs.eg.db documentation.
The unlist() removes null values, we probably have a null value in the data that the na.strings parameter reading it in didn’t find.
data2$Gene_ID <- geneID
paged_table(data2[1:100,])
data3 <- data2[!is.na(data2$Gene_ID),] #only 538 genes remained
paged_table(data3)
This is our table for Tableau. Lets write it out to csv.
write.csv(data2, 'dataMergedSymbols_NAs.csv', row.names=F)
write.csv(data3,"dataMerged_Symbols_GBacc_538X9.csv", row.names=F)
This documet of the 538 genes is available here.
==============================
5-17-2026 adding these genes to our pathology database and then to Tableau for the non-associated EBV pathologies.
We are using the data3 dataset and need to import the pathologies database.
path <- "C:/...Path to current Pathologies Database/"
setwd(path)
pathologyDB <- read.csv("Pathology_database_ICC_added_5_15_2026.csv",
header=T)
colnames(pathologyDB)
## [1] "Ensembl_ID" "Genecards_ID" "FC_pathology_control"
## [4] "topGenePathology" "mediaType" "studySummarized"
## [7] "GSE_study_ID"
colnames(data3
)
## [1] "healthy_mean" "language_mean" "mild_mean"
## [4] "savant_mean" "FC_language_healthy" "FC_mild_healthy"
## [7] "FC_savant_healthy" "GB_ACC" "Gene_ID"
We need Ensembl IDs and we have it already here.
path2 <- "path to Ensemble IDs merger for Genecards IDs GSE271486 resource/"
setwd(path2)
ensembl <- read.csv("GSE271486_ensembleIDs_NPC_LBMP_study.csv", header=T)
colnames(ensembl)
## [1] "gene_id" "gene_name" "description"
## [4] "locus" "HNE_1_MUT_LMP1_1_count" "HNE_1_MUT_LMP1_2_count"
## [7] "HNE_1_MUT_LMP1_3_count" "HNE_1_WT_LMP1_1_count" "HNE_1_WT_LMP1_2_count"
## [10] "HNE_1_WT_LMP1_3_count" "HNE_1_MUT_LMP1_1_FPKM" "HNE_1_MUT_LMP1_2_FPKM"
## [13] "HNE_1_MUT_LMP1_3_FPKM" "HNE_1_WT_LMP1_1_FPKM" "HNE_1_WT_LMP1_2_FPKM"
## [16] "HNE_1_WT_LMP1_3_FPKM"
ensembl2 <- ensembl[,c(1,2)]
head(ensembl2)
## gene_id gene_name
## 1 ENSG00000000003 TSPAN6
## 2 ENSG00000000005 TNMD
## 3 ENSG00000000419 DPM1
## 4 ENSG00000000457 SCYL3
## 5 ENSG00000000460 C1orf112
## 6 ENSG00000000938 FGR
data4 <- merge(ensembl2, data3, by.x="gene_name", by.y="Gene_ID")
paged_table(data4)
write.csv(data4, "autism_geneIDs538X10.csv", row.names=F)
We want to only add the top genes to the pathology database. We need to get that file from that folder.
path3 <- "C:/Users/jlcor/Desktop/r programs 2025-2026/savant autism GSE15402/"
setwd(path3)
savant20 <- read.csv("savant_foldchange_top20_genebankAccession.csv", header=T, na.strings=c("",NA," ", "NA"))
mild20 <- read.csv("mild_foldchange_top20_genebankAccession.csv", header=T, na.strings=c("",NA," ", "NA"))
language20 <- read.csv("Language_foldchange_top20_genebankAccession.csv", header=T, na.strings=c("",NA," ", "NA"))
colnames(savant20)
## [1] "UID" "R" "C"
## [4] "GSM386518" "GSM386519" "GSM386520"
## [7] "GSM386521" "GSM386522" "GSM386523"
## [10] "GSM386524" "GSM386525" "GSM386526"
## [13] "GSM386527" "GSM386528" "GSM386529"
## [16] "GSM386530" "GSM386531" "GSM386532"
## [19] "GSM386533" "GSM386534" "GSM386535"
## [22] "GSM386536" "GSM386537" "GSM386538"
## [25] "GSM386539" "GSM386540" "GSM386541"
## [28] "GSM386542" "GSM386543" "GSM386544"
## [31] "GSM386545" "GSM386546" "GSM386547"
## [34] "GSM386548" "GSM386549" "GSM386550"
## [37] "GSM386551" "GSM386552" "GSM386553"
## [40] "GSM386554" "GSM386555" "GSM386556"
## [43] "GSM386557" "GSM386558" "GSM386559"
## [46] "GSM386560" "GSM386561" "GSM386562"
## [49] "GSM386563" "GSM386564" "GSM386565"
## [52] "GSM386566" "GSM386567" "GSM386568"
## [55] "GSM386569" "GSM386570" "GSM386571"
## [58] "GSM386572" "GSM386573" "GSM386574"
## [61] "GSM386575" "GSM386576" "GSM386577"
## [64] "GSM386578" "GSM386579" "GSM386580"
## [67] "GSM386581" "GSM386582" "GSM386583"
## [70] "GSM386584" "GSM386585" "GSM386586"
## [73] "GSM386587" "GSM386588" "GSM386589"
## [76] "GSM386590" "GSM386591" "GSM386592"
## [79] "GSM386593" "GSM386594" "GSM386595"
## [82] "GSM386596" "GSM386597" "GSM386598"
## [85] "GSM386599" "GSM386600" "GSM386601"
## [88] "GSM386602" "GSM386603" "GSM386604"
## [91] "GSM386605" "GSM386606" "GSM386607"
## [94] "GSM386608" "GSM386609" "GSM386610"
## [97] "GSM386611" "GSM386612" "GSM386613"
## [100] "GSM386614" "GSM386615" "GSM386616"
## [103] "GSM386617" "GSM386618" "GSM386619"
## [106] "GSM386620" "GSM386621" "GSM386622"
## [109] "GSM386623" "GSM386624" "GSM386625"
## [112] "GSM386626" "GSM386627" "GSM386628"
## [115] "GSM386629" "GSM386630" "GSM386631"
## [118] "GSM386632" "GSM386633" "healthy_mean"
## [121] "language_mean" "mild_mean" "savant_mean"
## [124] "FC_language_healthy" "FC_mild_healthy" "FC_savant_healthy"
## [127] "ID" "GB_ACC" "SPOT_ID"
str(data4$GB_ACC)
## chr [1:140] "AI038092" "AI208428" "N95187" "AA939070" "AA911712" ...
str(savant20$GB_ACC)
## chr [1:20] "N63864" NA "AI222331" NA "T99336" NA "AA486089" NA "AI018277" ...
Lets just keep all the Gene IDs but do a left join in the merge function to keep all Gene IDs but add in the Ensembl IDs if they are available. We have only 140 genes of autism when merging by the ensembl ID so we want to keep as many genes as possible. It looks like when we combine the top genes with ensembl ID by GB_ACC because we didn’t merge to find the gene ID that there may not be any gene symbols for our top genes. So we will just add it to Tableau but not the top genes if there are none with gene symbols.
data5 <- merge(data3, ensembl2, by.x="Gene_ID", by.y="gene_name", all.x=T)
Lets combine top genes together but add in a feature describing where it fits in for top gene by type of autism as mild, language, or savant.
mild20$autismType <- "mild"
language20$autismType <- 'language'
savant20$autismType <- 'savant'
allTopGenes60 <- rbind(mild20, language20, savant20)
paged_table(allTopGenes60)
TopGeneList <- allTopGenes60$GB_ACC
TopGeneList2 <- TopGeneList[!is.na(TopGeneList)]
TopGeneList2
## [1] "AI346406" "R53966" "R64686" "AA778346" "R40855" "W95682"
## [7] "AI351587" "AA448277" "N56875" "AA158162" "AA001718" "N73877"
## [13] "AI348022" "T86999" "T65948" "H27752" "N62522" "R93875"
## [19] "AI016962" "W03687" "R64686" "AI351317" "R38172" "AA456715"
## [25] "N51529" "AA400272" "R76772" "AA158162" "H63994" "AI198536"
## [31] "AA908954" "AI348022" "T86983" "N63864" "AI222331" "T99336"
## [37] "AA486089" "AI018277" "AI016962" "AA427367" "W03687" "N35115"
## [43] "AA778346" "N62729" "AI348022" "AA927340" "AI074469"
data6 <- data5[which(data5$GB_ACC %in% TopGeneList2),]
paged_table(data6)
None of the Autism top genes are in the database of 540 genes, so we cannot add it to our pathology database, but we can still make the tableau dashboard and compare top gene expression for EBV related pathologies to the Autism data of provided gene symbols.
write.csv(data5, 'autism540X10_ensemble_geneSymbolsAdded.csv', row.names=F)
We will go to the Tableau dashboard for this.
=======================
We do have the top genes from months ago with manual added gene summaries. We copied and pasted it into this folder and opened the pathology database file. We are using pathologyDB and the topGenes with summaries we are going to read in.
topGenes44 <- read.csv("Top44_chromosomeLocations_geneIDs_gbACC_geneSummariesAdded.csv", header=T)
colnames(topGenes44)
## [1] "Gene_ID" "GB_ACC" "ASD_group" "gene_id"
## [5] "description" "Gene_Summaries" "locus" "GSM386518"
## [9] "GSM386519" "GSM386520" "GSM386521" "GSM386522"
## [13] "GSM386523" "GSM386524" "GSM386525" "GSM386526"
## [17] "GSM386527" "GSM386528" "GSM386529" "GSM386530"
## [21] "GSM386531" "GSM386532" "GSM386533" "GSM386534"
## [25] "GSM386535" "GSM386536" "GSM386537" "GSM386538"
## [29] "GSM386539" "GSM386540" "GSM386541" "GSM386542"
## [33] "GSM386543" "GSM386544" "GSM386545" "GSM386546"
## [37] "GSM386547" "GSM386548" "GSM386549" "GSM386550"
## [41] "GSM386551" "GSM386552" "GSM386553" "GSM386554"
## [45] "GSM386555" "GSM386556" "GSM386557" "GSM386558"
## [49] "GSM386559" "GSM386560" "GSM386561" "GSM386562"
## [53] "GSM386563" "GSM386564" "GSM386565" "GSM386566"
## [57] "GSM386567" "GSM386568" "GSM386569" "GSM386570"
## [61] "GSM386571" "GSM386572" "GSM386573" "GSM386574"
## [65] "GSM386575" "GSM386576" "GSM386577" "GSM386578"
## [69] "GSM386579" "GSM386580" "GSM386581" "GSM386582"
## [73] "GSM386583" "GSM386584" "GSM386585" "GSM386586"
## [77] "GSM386587" "GSM386588" "GSM386589" "GSM386590"
## [81] "GSM386591" "GSM386592" "GSM386593" "GSM386594"
## [85] "GSM386595" "GSM386596" "GSM386597" "GSM386598"
## [89] "GSM386599" "GSM386600" "GSM386601" "GSM386602"
## [93] "GSM386603" "GSM386604" "GSM386605" "GSM386606"
## [97] "GSM386607" "GSM386608" "GSM386609" "GSM386610"
## [101] "GSM386611" "GSM386612" "GSM386613" "GSM386614"
## [105] "GSM386615" "GSM386616" "GSM386617" "GSM386618"
## [109] "GSM386619" "GSM386620" "GSM386621" "GSM386622"
## [113] "GSM386623" "GSM386624" "GSM386625" "GSM386626"
## [117] "GSM386627" "GSM386628" "GSM386629" "GSM386630"
## [121] "GSM386631" "GSM386632" "GSM386633"
We went ahead and ran all the script to get the fold change values from last dataset with fold change values allTopGenes60.
data60_44 <- merge(allTopGenes60[,c(120:128,130)], topGenes44[,c(1:4,6)], by.x="GB_ACC",by.y="GB_ACC", all.x=T)
summary(data60_44)
## Warning in grep("^[ \t\r\n]*$", object, perl = TRUE): input string 9 is invalid
## UTF-8
## GB_ACC healthy_mean language_mean mild_mean
## Length :70 Min. :0.002817 Min. :0.003074 Min. :0.00000
## N.unique :40 1st Qu.:0.004910 1st Qu.:0.019652 1st Qu.:0.01871
## N.blank : 0 Median :0.007419 Median :0.029503 Median :0.02641
## Min.nchar: 6 Mean :0.024873 Mean :0.029890 Mean :0.02604
## Max.nchar: 8 3rd Qu.:0.046016 3rd Qu.:0.041119 3rd Qu.:0.03667
## NAs :13 Max. :0.087555 Max. :0.064139 Max. :0.05772
## savant_mean FC_language_healthy FC_mild_healthy FC_savant_healthy
## Min. :0.00449 Min. : 0.05518 Min. : 0.0000 Min. : 0.1452
## 1st Qu.:0.02123 1st Qu.: 0.42589 1st Qu.: 0.5315 1st Qu.: 0.6031
## Median :0.03734 Median : 4.29281 Median : 2.6876 Median : 5.8483
## Mean :0.03530 Mean : 4.46047 Mean : 3.7646 Mean : 5.1896
## 3rd Qu.:0.04645 3rd Qu.: 8.46997 3rd Qu.: 6.7303 3rd Qu.: 8.7775
## Max. :0.07048 Max. :12.98571 Max. :11.7146 Max. :14.7887
## ID autismType Gene_ID ASD_group gene_id
## Length :70 Length :70 Length :70 Length :70 Length :70
## N.unique :50 N.unique : 3 N.unique :38 N.unique : 3 N.unique :38
## N.blank : 0 N.blank : 0 N.blank : 0 N.blank : 0 N.blank : 0
## Min.nchar: 3 Min.nchar: 4 Min.nchar: 4 Min.nchar: 4 Min.nchar:15
## Max.nchar: 7 Max.nchar: 8 Max.nchar:10 Max.nchar: 8 Max.nchar:15
## NAs :20 NAs :20 NAs :20
## Gene_Summaries
## Length :70
## N.unique :37
## N.blank : 0
## Min.nchar:NA
## Max.nchar:NA
## NAs :20
data60_44_noNAs <- data60_44[!is.na(data60_44$Gene_ID),]
summary(data60_44_noNAs)
## Warning in grep("^[ \t\r\n]*$", object, perl = TRUE): input string 9 is invalid
## UTF-8
## GB_ACC healthy_mean language_mean mild_mean
## Length :50 Min. :0.003441 Min. :0.003074 Min. :0.00000
## N.unique :36 1st Qu.:0.005007 1st Qu.:0.018465 1st Qu.:0.02045
## N.blank : 0 Median :0.021384 Median :0.029503 Median :0.02629
## Min.nchar: 6 Mean :0.028328 Mean :0.030798 Mean :0.02567
## Max.nchar: 8 3rd Qu.:0.046904 3rd Qu.:0.045449 3rd Qu.:0.03400
## Max. :0.087555 Max. :0.064139 Max. :0.05772
## savant_mean FC_language_healthy FC_mild_healthy FC_savant_healthy
## Min. :0.00643 Min. : 0.05518 Min. :0.0000 Min. : 0.1474
## 1st Qu.:0.02123 1st Qu.: 0.36208 1st Qu.:0.5159 1st Qu.: 0.5941
## Median :0.03447 Median : 2.60424 Median :1.7398 Median : 2.4552
## Mean :0.03483 Mean : 3.90610 Mean :3.0186 Mean : 4.1897
## 3rd Qu.:0.04645 3rd Qu.: 6.54350 3rd Qu.:5.1475 3rd Qu.: 7.6451
## Max. :0.07048 Max. :10.69432 Max. :9.8809 Max. :13.4551
## ID autismType Gene_ID ASD_group gene_id
## Length :50 Length :50 Length :50 Length :50 Length :50
## N.unique :36 N.unique : 3 N.unique :38 N.unique : 3 N.unique :38
## N.blank : 0 N.blank : 0 N.blank : 0 N.blank : 0 N.blank : 0
## Min.nchar: 3 Min.nchar: 4 Min.nchar: 4 Min.nchar: 4 Min.nchar:15
## Max.nchar: 6 Max.nchar: 8 Max.nchar:10 Max.nchar: 8 Max.nchar:15
##
## Gene_Summaries
## Length :50
## N.unique :37
## N.blank : 0
## Min.nchar:NA
## Max.nchar:NA
##
paged_table(data60_44_noNAs)
dataFiltered60_44 <- data60_44_noNAs[data60_44_noNAs$ASD_group == data60_44_noNAs$autismType,]
summary(dataFiltered60_44)
## Warning in grep("^[ \t\r\n]*$", object, perl = TRUE): input string 7 is invalid
## UTF-8
## GB_ACC healthy_mean language_mean mild_mean
## Length :42 Min. :0.003441 Min. :0.003074 Min. :0.00000
## N.unique :36 1st Qu.:0.005724 1st Qu.:0.015901 1st Qu.:0.01458
## N.blank : 0 Median :0.042595 Median :0.026142 Median :0.02629
## Min.nchar: 6 Mean :0.032746 Mean :0.028569 Mean :0.02517
## Max.nchar: 8 3rd Qu.:0.048933 3rd Qu.:0.037851 3rd Qu.:0.03511
## Max. :0.087555 Max. :0.064139 Max. :0.05772
## savant_mean FC_language_healthy FC_mild_healthy FC_savant_healthy
## Min. :0.00643 Min. : 0.05518 Min. :0.0000 Min. : 0.1474
## 1st Qu.:0.02123 1st Qu.: 0.31747 1st Qu.:0.4833 1st Qu.: 0.4278
## Median :0.03347 Median : 0.79052 Median :0.9131 Median : 0.7998
## Mean :0.03313 Mean : 3.05465 Mean :2.4213 Mean : 3.4079
## 3rd Qu.:0.04254 3rd Qu.: 5.96785 3rd Qu.:4.2691 3rd Qu.: 6.2473
## Max. :0.07048 Max. :10.69432 Max. :9.8809 Max. :13.4551
## ID autismType Gene_ID ASD_group gene_id
## Length :42 Length :42 Length :42 Length :42 Length :42
## N.unique :36 N.unique : 3 N.unique :38 N.unique : 3 N.unique :38
## N.blank : 0 N.blank : 0 N.blank : 0 N.blank : 0 N.blank : 0
## Min.nchar: 3 Min.nchar: 4 Min.nchar: 4 Min.nchar: 4 Min.nchar:15
## Max.nchar: 6 Max.nchar: 8 Max.nchar:10 Max.nchar: 8 Max.nchar:15
##
## Gene_Summaries
## Length :42
## N.unique :37
## N.blank : 0
## Min.nchar:NA
## Max.nchar:NA
##
paged_table(dataFiltered60_44)
dataKeptPathology <- dataFiltered60_44[,c(2:8,11:14)]
paged_table(dataKeptPathology)
mild1 <- dataKeptPathology[dataKeptPathology$ASD_group == "mild",]
savant1 <- dataKeptPathology[dataKeptPathology$ASD_group == "savant",]
language1 <- dataKeptPathology[dataKeptPathology$ASD_group == "language",]
mild2 <- mild1[,c(6,8:11)]
savant2 <- savant1[,c(7:11)]
language2 <- language1[,c(5,8:11)]
str(mild2)
## 'data.frame': 12 obs. of 5 variables:
## $ FC_mild_healthy: num 0.0672 9.8809 0.0756 6.7417 0.0836 ...
## $ Gene_ID : chr "BMPER" "TUT1" "FOXO1" "PRMT2" ...
## $ ASD_group : chr "mild" "mild" "mild" "mild" ...
## $ gene_id : chr "ENSG00000164619" "ENSG00000149016" "ENSG00000150907" "ENSG00000160310" ...
## $ Gene_Summaries : chr "NCBI Gene Summary for BMPER Gene \nThis gene encodes a secreted protein that interacts with, and inhibits bone "| __truncated__ "GeneCards Summary for ENSG00000308510 Gene\nENSG00000308510 (Novel Transcript, Antisense To MTA2 And TUT1) is a"| __truncated__ "Aliases for LINC00598 Gene\nGeneCards Symbol: LINC00598 2 \nLong Intergenic Non-Protein Coding RNA 598 2 3 5\nL"| __truncated__ "GeneCards Summary for MCM3AP-AS1 Gene\nMCM3AP-AS1 (MCM3AP Antisense RNA 1) is an RNA Gene, and is affiliated wi"| __truncated__ ...
colnames(mild2) <- c("FC_pathology_control","Genecards_ID","topGenePathology","Ensembl_ID","Gene_Summaries")
mild2$topGenePathology <- "mild Autism"
paged_table(mild2)
colnames(savant2) <- c("FC_pathology_control","Genecards_ID","topGenePathology","Ensembl_ID","Gene_Summaries")
savant2$topGenePathology <- "savant Autism"
paged_table(savant2)
colnames(language2) <- c("FC_pathology_control","Genecards_ID","topGenePathology","Ensembl_ID","Gene_Summaries")
language2$topGenePathology <- "language Autism"
paged_table(language2)
mild3 <- mild2[,c(4,2,1,3)]
savant3 <- savant2[,c(4,2,1,3)]
language3 <- language2[,c(4,2,1,3)]
mild3$mediaType <- "total RNA lymphoblastoid cell lines or LCLs in PBMCs"
mild3$studySummarized <- paste("Autism was studied and compared with healthy patients and 3 types of autism of mild, language, and savant. This group is mild Autism for fold change of the mean of mild over the mean of healthy.", mild2$Gene_Summaries, sep = "--- gene summaries added manually --- ")
mild3$GSE_study_ID <- "GSE15402"
savant3$mediaType <- "total RNA lymphoblastoid cell lines or LCLs in PBMCs"
savant3$studySummarized <- paste("Autism was studied and compared with healthy patients and 3 types of autism of mild, language, and savant. This group is savant Autism for fold change of the mean of savant over the mean of healthy.",savant2$Gene_Summaries, sep = "--- gene summaries added manually --- ")
savant3$GSE_study_ID <- "GSE15402"
language3$mediaType <- "total RNA lymphoblastoid cell lines or LCLs in PBMCs"
language3$studySummarized <- paste("Autism was studied and compared with healthy patients and 3 types of autism of mild, language, and savant. This group is language Autism for fold change of the mean of language over the mean of healthy.", language2$Gene_Summaries, sep = "--- gene summaries added manually --- ")
language3$GSE_study_ID <- "GSE15402"
combinedGenesTop <- rbind(mild3,language3,savant3)
paged_table(combinedGenesTop)
We can add these now to the pathology database.
pathologyDB_added <- rbind(pathologyDB, combinedGenesTop)
paged_table(pathologyDB_added)
There are now 521 genes in the Autism database.
Lets write this file out to the all pathologies data folder.
setwd(path)
write.csv(pathologyDB_added, "pathology_DB_autismAdded_5-18-2026.csv", row.names=F)