This is a quick addition: Part 2 is attached to the end of Part 1, separated by a row of equal signs and 3 asterisk marks, and you can search for “Part 2” to jump to that section. In Part 2 we look at genes that are duplicated in at least one set, keep those for our Pathologies database of genes related to uterine fibroids, add them to the database, upload it, and share it.
This is meant to be a quick study to see what we find. I was wondering whether there is a connection between fibroid tumor growth and the genes from the studies I have been analyzing over the last few months, which relate to Epstein-Barr virus (EBV) infection, latent EBV infection, and associated pathologies. The study I pulled up compares gene expression profiles of uterine fibroid and myometrial tissue in Black and White females. A research article with further details is available through the PMID link on the NCBI site. The study ID is GSE244187.
library(rmarkdown)
Let’s look at what we have.
UL <- read.table("GSE244187_AlHendy_BulkTissue_Mar2021.featureCounts-genes.xls.gz", header=T)
str(UL)
## 'data.frame': 58735 obs. of 21 variables:
## $ GeneID : chr "ENSG00000223972" "ENSG00000227232" "ENSG00000278267" "ENSG00000243485" ...
## $ GeneSymbol : chr "DDX11L1" "WASH7P" "MIR6859-1" "MIR1302-2HG" ...
## $ GeneBiotype : chr "transcribed_unprocessed_pseudogene" "unprocessed_pseudogene" "miRNA" "lincRNA" ...
## $ rnamap.trim.MyoF.348_S12.geneAbundanceHisat2 : int 0 3 0 0 0 0 0 0 0 0 ...
## $ rnamap.trim.MyoF.428_S11.geneAbundanceHisat2 : int 0 3 0 0 0 0 0 0 0 0 ...
## $ rnamap.trim.MyoF.483_S8.geneAbundanceHisat2 : int 3 10 2 1 0 6 0 9 4 23 ...
## $ rnamap.trim.MyoF.526_S10.geneAbundanceHisat2 : int 0 1 3 0 0 0 0 0 0 0 ...
## $ rnamap.trim.MyoF.UI.10_S7.geneAbundanceHisat2: int 0 4 0 0 0 0 0 0 0 0 ...
## $ rnamap.trim.MyoF.UI.13_S9.geneAbundanceHisat2: int 0 2 0 0 0 0 0 0 0 0 ...
## $ rnamap.trim.MyoN.432_S4.geneAbundanceHisat2 : int 0 1 0 0 0 0 0 0 0 0 ...
## $ rnamap.trim.MyoN.514_S2.geneAbundanceHisat2 : int 0 1 0 0 0 0 0 0 0 0 ...
## $ rnamap.trim.MyoN.549_S5.geneAbundanceHisat2 : int 0 1 0 1 0 0 0 0 0 0 ...
## $ rnamap.trim.MyoN.UI.20_S1.geneAbundanceHisat2: int 1 5 0 0 0 0 0 0 0 0 ...
## $ rnamap.trim.MyoN.UI.43_S3.geneAbundanceHisat2: int 0 4 1 0 0 0 0 0 0 0 ...
## $ rnamap.trim.MyoN.UI.8_S6.geneAbundanceHisat2 : int 0 1 0 0 0 0 0 0 0 0 ...
## $ rnamap.trim.UF.372_S18.geneAbundanceHisat2 : int 0 2 2 0 0 0 0 0 0 0 ...
## $ rnamap.trim.UF.428_S17.geneAbundanceHisat2 : int 0 1 0 0 0 0 0 0 0 0 ...
## $ rnamap.trim.UF.483_S14.geneAbundanceHisat2 : int 0 2 0 0 0 0 0 0 0 1 ...
## $ rnamap.trim.UF.526_S16.geneAbundanceHisat2 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ rnamap.trim.UF.UI.13_S15.geneAbundanceHisat2 : int 0 1 0 0 0 0 0 0 0 0 ...
## $ rnamap.trim.UF.UI.23_S13.geneAbundanceHisat2 : int 0 1 0 0 0 1 0 2 0 2 ...
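The sample columns above hold raw integer counts, so samples sequenced more deeply will tend to have larger values for every gene. The analysis in this post works with the raw counts as they are. As a hedged aside, one common adjustment is counts per million (CPM), which divides each sample by its total count. The sketch below uses a made-up toy matrix, not the GSE244187 data, and the object names are only for illustration.

```r
# Toy raw-count matrix: 3 genes x 2 samples (made-up numbers, not from GSE244187)
counts <- matrix(c(10, 30, 60,
                   100, 300, 600),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("sample1", "sample2")))

# Library size = total counts in each sample (column sums)
lib_size <- colSums(counts)

# Counts per million: divide each column by its library size, then scale by 1e6
cpm <- sweep(counts, 2, lib_size, "/") * 1e6

cpm
```

In the toy matrix the second sample has ten times the depth of the first, and after scaling both samples show the same relative expression for each gene.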
This study uses three tissue types, from Black and White women only: uterine fibroid tissue (UF), normal myometrial tissue from a uterus without fibroids (not the same uterus as the UF samples), and myometrial tissue adjacent to a fibroid, which is treated here as at risk for fibroids.
I tried to separate the family.soft file into samples with name and ID to describe the samples, but the column names already provide that information.
colnames(UL)
## [1] "GeneID"
## [2] "GeneSymbol"
## [3] "GeneBiotype"
## [4] "rnamap.trim.MyoF.348_S12.geneAbundanceHisat2"
## [5] "rnamap.trim.MyoF.428_S11.geneAbundanceHisat2"
## [6] "rnamap.trim.MyoF.483_S8.geneAbundanceHisat2"
## [7] "rnamap.trim.MyoF.526_S10.geneAbundanceHisat2"
## [8] "rnamap.trim.MyoF.UI.10_S7.geneAbundanceHisat2"
## [9] "rnamap.trim.MyoF.UI.13_S9.geneAbundanceHisat2"
## [10] "rnamap.trim.MyoN.432_S4.geneAbundanceHisat2"
## [11] "rnamap.trim.MyoN.514_S2.geneAbundanceHisat2"
## [12] "rnamap.trim.MyoN.549_S5.geneAbundanceHisat2"
## [13] "rnamap.trim.MyoN.UI.20_S1.geneAbundanceHisat2"
## [14] "rnamap.trim.MyoN.UI.43_S3.geneAbundanceHisat2"
## [15] "rnamap.trim.MyoN.UI.8_S6.geneAbundanceHisat2"
## [16] "rnamap.trim.UF.372_S18.geneAbundanceHisat2"
## [17] "rnamap.trim.UF.428_S17.geneAbundanceHisat2"
## [18] "rnamap.trim.UF.483_S14.geneAbundanceHisat2"
## [19] "rnamap.trim.UF.526_S16.geneAbundanceHisat2"
## [20] "rnamap.trim.UF.UI.13_S15.geneAbundanceHisat2"
## [21] "rnamap.trim.UF.UI.23_S13.geneAbundanceHisat2"
The rnamap.trim prefix suggests these are trimmed, mapped RNA reads. My first guess is that UF is uterine fibroid, MyoN is normal myometrium, and MyoF is myometrial tissue next to a fibroid, and the GEO site agrees with these abbreviations. Because the study examines differences by race, each sample’s page states the race when clicked. By the last three digits of the sample ID, the races are: 414 white, 415 white, 416 black, 417 white, 418 black, 419 black, 420 white, 421 black, 422 white, 423 black, 424 black, 425 white, 426 white, 427 white, 428 black, 429 white, 430 black, and 431 black.
Let’s add the race to each column name and trim the other identifiers from the names.
# fixed=TRUE matches the dots literally instead of as regex wildcards
colnames(UL) <- gsub('rnamap.trim.', '', colnames(UL), fixed=TRUE)
colnames(UL) <- gsub('.geneAbundanceHisat2', '', colnames(UL), fixed=TRUE)
colnames(UL)
## [1] "GeneID" "GeneSymbol" "GeneBiotype" "MyoF.348_S12"
## [5] "MyoF.428_S11" "MyoF.483_S8" "MyoF.526_S10" "MyoF.UI.10_S7"
## [9] "MyoF.UI.13_S9" "MyoN.432_S4" "MyoN.514_S2" "MyoN.549_S5"
## [13] "MyoN.UI.20_S1" "MyoN.UI.43_S3" "MyoN.UI.8_S6" "UF.372_S18"
## [17] "UF.428_S17" "UF.483_S14" "UF.526_S16" "UF.UI.13_S15"
## [21] "UF.UI.23_S13"
The index vectors below give the position of each White and Black sample among the 18 sample columns, following the race order listed above (sample IDs ending in 414 through 431). This assumes the columns are in the same order as the sample pages.
white <- c(1,2,4,7,9,12,13,14,16)
black <- c(3,5,6,8,10,11,15,17,18)
names <- colnames(UL)[4:21]
names[white] <- paste(names[white],"white",sep='_')
names[black] <- paste(names[black],"black",sep='_')
colnames(UL)[4:21] <- names
colnames(UL)
## [1] "GeneID" "GeneSymbol" "GeneBiotype"
## [4] "MyoF.348_S12_white" "MyoF.428_S11_white" "MyoF.483_S8_black"
## [7] "MyoF.526_S10_white" "MyoF.UI.10_S7_black" "MyoF.UI.13_S9_black"
## [10] "MyoN.432_S4_white" "MyoN.514_S2_black" "MyoN.549_S5_white"
## [13] "MyoN.UI.20_S1_black" "MyoN.UI.43_S3_black" "MyoN.UI.8_S6_white"
## [16] "UF.372_S18_white" "UF.428_S17_white" "UF.483_S14_black"
## [19] "UF.526_S16_white" "UF.UI.13_S15_black" "UF.UI.23_S13_black"
I am using the next data frames only as placeholders to show which samples belong to each set. They won’t be used for anything else.
normal <- UL[,c(10:15)]
UF <- UL[,c(16:21)]
UF_risk <- UL[,c(4:9)]
normal_white <- UL[,c(10,12,15)]
UF_white <- UL[,c(16,17,19)]
UF_risk_white <- UL[,c(4,5,7)]
normal_black <- UL[,c(11,13,14)]
UF_black <- UL[,c(18,20,21)]
UF_risk_black <- UL[,c(6,8,9)]
The samples are balanced: half are White and half are Black overall, and each tissue type (normal myometrium, uterine fibroid, and at-risk myometrium adjacent to a fibroid) has three White and three Black samples.
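The balance described above can be checked from the renamed column names alone. This is a minimal sketch: it copies the 18 sample names from the output above, takes the text before the first dot as the tissue and the text after the last underscore as the race, and cross-tabulates them. The names `sample_cols`, `sample_info`, and `balance` are made up for this illustration.

```r
# The 18 renamed sample columns, copied from the colnames(UL) output above
sample_cols <- c(
  "MyoF.348_S12_white", "MyoF.428_S11_white",  "MyoF.483_S8_black",
  "MyoF.526_S10_white", "MyoF.UI.10_S7_black", "MyoF.UI.13_S9_black",
  "MyoN.432_S4_white",  "MyoN.514_S2_black",   "MyoN.549_S5_white",
  "MyoN.UI.20_S1_black", "MyoN.UI.43_S3_black", "MyoN.UI.8_S6_white",
  "UF.372_S18_white",   "UF.428_S17_white",    "UF.483_S14_black",
  "UF.526_S16_white",   "UF.UI.13_S15_black",  "UF.UI.23_S13_black"
)

sample_info <- data.frame(
  sample = sample_cols,
  tissue = sub("\\..*$", "", sample_cols),  # text before the first dot
  race   = sub("^.*_", "", sample_cols),    # text after the last underscore
  stringsAsFactors = FALSE
)

# Cross-tabulate tissue type against race
balance <- table(sample_info$tissue, sample_info$race)
balance
```

Every cell of the table should be 3 if the design is balanced as described.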
We can split the gene expression values evenly by race, as the study did, and by tissue type. This is high-throughput RNA sequencing of uterine fibroid tissue, the myometrial tissue adjacent to the fibroid, and normal myometrial tissue. The normal tissue is not from the same uterus as the fibroid; it comes from an otherwise healthy uterus that was removed for personal or other health reasons, such as pelvic organ prolapse.
Let’s see what we find by the mean value of the samples for each tissue type, and then for each tissue type within each race.
UL$normal_all_mean <- rowMeans(UL[,c(10:15)])
UL$UF_all_mean <- rowMeans(UL[,c(16:21)])
UL$UF_all_risk_mean <- rowMeans(UL[,c(4:9)])
UL$normal_white_mean <- rowMeans(UL[,c(10,12,15)])
UL$UF_white_mean <- rowMeans(UL[,c(16,17,19)])
UL$UF_risk_white_mean <- rowMeans(UL[,c(4,5,7)])
UL$normal_black_mean <- rowMeans(UL[,c(11,13,14)])
UL$UF_black_mean <- rowMeans(UL[,c(18,20,21)])
UL$UF_risk_black_mean <- rowMeans(UL[,c(6,8,9)])
colnames(UL)
## [1] "GeneID" "GeneSymbol" "GeneBiotype"
## [4] "MyoF.348_S12_white" "MyoF.428_S11_white" "MyoF.483_S8_black"
## [7] "MyoF.526_S10_white" "MyoF.UI.10_S7_black" "MyoF.UI.13_S9_black"
## [10] "MyoN.432_S4_white" "MyoN.514_S2_black" "MyoN.549_S5_white"
## [13] "MyoN.UI.20_S1_black" "MyoN.UI.43_S3_black" "MyoN.UI.8_S6_white"
## [16] "UF.372_S18_white" "UF.428_S17_white" "UF.483_S14_black"
## [19] "UF.526_S16_white" "UF.UI.13_S15_black" "UF.UI.23_S13_black"
## [22] "normal_all_mean" "UF_all_mean" "UF_all_risk_mean"
## [25] "normal_white_mean" "UF_white_mean" "UF_risk_white_mean"
## [28] "normal_black_mean" "UF_black_mean" "UF_risk_black_mean"
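The group means above select columns by position, which works as long as the column order never changes. As an alternative sketch, the same means can be selected by name pattern, using the tissue prefix and race suffix that the column names already carry. The helper `group_mean` and the toy data frame are hypothetical and the counts are made up.

```r
# Toy data frame with the same naming pattern as UL (made-up counts)
toy <- data.frame(
  GeneSymbol      = c("geneA", "geneB"),
  MyoN.1_S1_white = c(2, 0),
  MyoN.2_S2_black = c(4, 0),
  UF.1_S3_white   = c(10, 1),
  UF.2_S4_black   = c(20, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: row means over the columns of one tissue,
# optionally restricted to one race
group_mean <- function(df, tissue, race = NULL) {
  pattern <- paste0("^", tissue, "\\.")       # e.g. "^UF\\." needs a literal dot
  if (!is.null(race)) {
    pattern <- paste0(pattern, ".*_", race, "$")
  }
  cols <- grep(pattern, colnames(df), value = TRUE)
  rowMeans(df[, cols, drop = FALSE])
}

toy$normal_all_mean <- group_mean(toy, "MyoN")
toy$UF_all_mean     <- group_mean(toy, "UF")
toy$UF_white_mean   <- group_mean(toy, "UF", "white")
toy
```

Because the pattern requires a dot right after the tissue prefix, a derived column such as `UF_all_mean` is not picked up by a later call for "UF".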
Now we can add the fold change values of UF and UF_risk relative to normal, first for all samples and then within each race against that race’s normal samples, as the study did, to see any differences.
UL$UF_normal_all_FC <- UL$UF_all_mean/UL$normal_all_mean
UL$UF_risk_normal_all_FC <- UL$UF_all_risk_mean/UL$normal_all_mean
UL$UF_normal_white_FC <- UL$UF_white_mean/UL$normal_white_mean
UL$UF_risk_white_FC <- UL$UF_risk_white_mean/UL$normal_white_mean
UL$UF_normal_black_FC <- UL$UF_black_mean/UL$normal_black_mean
UL$UF_risk_black_FC <- UL$UF_risk_black_mean/UL$normal_black_mean
colnames(UL)
## [1] "GeneID" "GeneSymbol" "GeneBiotype"
## [4] "MyoF.348_S12_white" "MyoF.428_S11_white" "MyoF.483_S8_black"
## [7] "MyoF.526_S10_white" "MyoF.UI.10_S7_black" "MyoF.UI.13_S9_black"
## [10] "MyoN.432_S4_white" "MyoN.514_S2_black" "MyoN.549_S5_white"
## [13] "MyoN.UI.20_S1_black" "MyoN.UI.43_S3_black" "MyoN.UI.8_S6_white"
## [16] "UF.372_S18_white" "UF.428_S17_white" "UF.483_S14_black"
## [19] "UF.526_S16_white" "UF.UI.13_S15_black" "UF.UI.23_S13_black"
## [22] "normal_all_mean" "UF_all_mean" "UF_all_risk_mean"
## [25] "normal_white_mean" "UF_white_mean" "UF_risk_white_mean"
## [28] "normal_black_mean" "UF_black_mean" "UF_risk_black_mean"
## [31] "UF_normal_all_FC" "UF_risk_normal_all_FC" "UF_normal_white_FC"
## [34] "UF_risk_white_FC" "UF_normal_black_FC" "UF_risk_black_FC"
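A plain ratio of means gives NaN when both means are 0 and Inf when only the normal mean is 0, which is why those rows are filtered out later in this post. A hedged alternative, not what this analysis uses, is a log2 fold change with a small pseudocount added to both means so that every gene gets a finite value. The pseudocount of 1 below is an assumption chosen for illustration, and the means are made up.

```r
# Made-up group means for four genes
normal_mean  <- c(geneA = 0, geneB = 0, geneC = 8, geneD = 4)
fibroid_mean <- c(geneA = 0, geneB = 5, geneC = 2, geneD = 4)

# Plain ratio, as in the fold change columns above: 0/0 is NaN, 5/0 is Inf
ratio <- fibroid_mean / normal_mean

# Log2 fold change with a pseudocount (the value 1 is an assumption)
pseudo <- 1
log2fc <- log2((fibroid_mean + pseudo) / (normal_mean + pseudo))

ratio
log2fc
```

On the log2 scale, positive values mean higher expression in the fibroid, negative values mean lower, and 0 means no change, so up and down regulation are symmetric around 0.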
We have 6 different fold change values. We will order by UF compared to normal for all samples, then for White, and then for Black separately, to get the top 10 up-regulated and top 10 down-regulated genes in uterine fibroid compared to normal tissue. Genes whose fold change is NaN (0/0), infinite (a normal mean of 0), or 0 (a fibroid mean of 0) are filtered out before ordering. Let’s start with all samples and write the results out to csv.
UL_all_ordered <- UL[order(UL$UF_normal_all_FC, decreasing=T),]
UL_all_filtered <- UL[!(is.na(UL$UF_normal_all_FC)),]
UL_all_filtered1 <- UL_all_filtered[!(is.infinite(UL_all_filtered$UF_normal_all_FC)),]
UL_all_nozeros <- UL_all_filtered1[UL_all_filtered1$UF_normal_all_FC>0,]
UL_all_nozeros1 <- UL_all_nozeros[order(UL_all_nozeros$UF_normal_all_FC, decreasing=T),]
UL_all_top20 <- UL_all_nozeros1[c(1:10,38196:38205),]
paged_table(UL_all_top20)
Let’s write out the files we have so far.
write.csv(UL, 'UL_all_FCs_58735_notFiltered_hasNaNs_hasINf.csv',row.names=F)
write.csv(UL_all_nozeros1,'UL_normal_allRaces_Foldchanges_GSE244187_filtered.csv',row.names=F)
write.csv(UL_all_top20, 'UL_normal_allRaces_Top20.csv',row.names=F)
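The filter, sort, and slice steps above are repeated for each fold change column, with hard-coded row positions such as 38196:38205 for the bottom 10. A small helper could do the same steps for any column and find the last rows with `tail()` instead. This is only a sketch: `top_bottom` is a hypothetical name and the toy data are made up.

```r
# Hypothetical helper: drop NA/NaN/Inf and non-positive fold changes,
# sort descending, and return the n largest and n smallest rows
top_bottom <- function(df, fc_col, n = 10) {
  fc   <- df[[fc_col]]
  keep <- is.finite(fc) & fc > 0   # is.finite() is FALSE for NA, NaN, Inf
  kept <- df[keep, , drop = FALSE]
  kept <- kept[order(kept[[fc_col]], decreasing = TRUE), , drop = FALSE]
  rbind(head(kept, n), tail(kept, n))
}

# Toy data with made-up fold changes
toy <- data.frame(
  GeneSymbol = paste0("gene", 1:8),
  FC = c(4, NaN, Inf, 0.5, 0, 2, 0.25, 8),
  stringsAsFactors = FALSE
)

result <- top_bottom(toy, "FC", n = 2)
result
```

If fewer than 2n rows survive the filter, the top and bottom slices overlap and some rows appear twice, so the helper assumes a table much longer than 2n, as is the case here.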
Now let’s do the same for the White UL compared to normal fold change values and the Black UL compared to normal fold change values.
UL_normal_white_FC <- UL[!(is.na(UL$UF_normal_white_FC)),]
UL_normal_white_FC1 <- UL_normal_white_FC[!(is.infinite(UL_normal_white_FC$UF_normal_white_FC)),]
UL_normal_white_FC2 <- UL_normal_white_FC1[UL_normal_white_FC1$UF_normal_white_FC >0,]
UL_normal_white_FC3 <- UL_normal_white_FC2[order(UL_normal_white_FC2$UF_normal_white_FC, decreasing=T),]
UL_normal_white_top20 <- UL_normal_white_FC3[c(1:10,27051:27060),]
paged_table(UL_normal_white_top20)
write.csv(UL_normal_white_top20, 'UL_normal_white_Top20.csv',row.names=F)
Now let’s get the Black fold change values and the top 20 genes for UL compared to normal tissue.
UL_normal_black_FC <- UL[!(is.na(UL$UF_normal_black_FC)),]
UL_normal_black_FC1 <- UL_normal_black_FC[!(is.infinite(UL_normal_black_FC$UF_normal_black_FC)),]
UL_normal_black_FC2 <- UL_normal_black_FC1[UL_normal_black_FC1$UF_normal_black_FC > 0,]
UL_normal_black_FC3 <- UL_normal_black_FC2[order(UL_normal_black_FC2$UF_normal_black_FC, decreasing=T),]
UL_normal_black_top20 <- UL_normal_black_FC3[c(1:10,29867:29876),]
paged_table(UL_normal_black_top20)
Let’s write out this file of the top 20 genes for Black women’s uterine fibroids.
write.csv(UL_normal_black_top20,'UL_normal_black_Top20.csv', row.names=F)
Let’s combine the top 20 for all, White, and Black. We left the UL risk genes alone, but we could look at those after we combine these genes.
For each subtable, keep only the respective mean and fold change values, and add a column that identifies which sample set the fold change came from: all, White, or Black.
table_all <- UL_all_top20[,c(1:3,22,23,31)]
table_all$foldchange_source <- "fibroid and normal all samples"
colnames(table_all)[4:6] <- c('normal_mean','fibroid_mean','foldchange_fibroid_vs_normal')
paged_table(table_all)
table_white <- UL_normal_white_top20[,c(1:3,25,26,33)]
table_white$foldchange_source <- "fibroid and normal White samples"
colnames(table_white)[4:6] <- c('normal_mean','fibroid_mean','foldchange_fibroid_vs_normal')
paged_table(table_white)
table_black <- UL_normal_black_top20[,c(1:3,28,29,35)]
table_black$foldchange_source <- "fibroid and normal Black samples"
colnames(table_black)[4:6] <- c('normal_mean','fibroid_mean','foldchange_fibroid_vs_normal')
paged_table(table_black)
Combine these genes together.
all60 <- rbind(table_all, table_white, table_black)
paged_table(all60)
write.csv(all60, "all60_topGenes_all_White_Black.csv", row.names=F)
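Since the three top-20 tables are stacked into one, a gene that ranks in more than one sample set will appear more than once. A quick way to see which genes are shared is to count the distinct sources per gene symbol. The sketch below uses a made-up toy table with the same two columns as all60. The names `toy60`, `n_sources`, and `shared` are hypothetical.

```r
# Toy stand-in for all60 (made-up genes, shortened source labels)
toy60 <- data.frame(
  GeneSymbol        = c("geneA", "geneB", "geneA", "geneC", "geneA", "geneB"),
  foldchange_source = c("all",   "all",   "White", "White", "Black", "Black"),
  stringsAsFactors = FALSE
)

# Number of distinct sources each gene appears in
n_sources <- tapply(toy60$foldchange_source, toy60$GeneSymbol,
                    function(s) length(unique(s)))

# Genes present in more than one source
shared <- names(n_sources)[n_sources > 1]
shared
```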
Let me share these tables with you before proceeding to a comparison with the genes in our studies that have been associated with the pathologies we have looked at so far.
The large 58k-gene table of fold change and mean values, including all NaN and infinite values, is here.
The large gene table with the NaN and infinite values removed, for the fibroid vs normal fold change only, is here.
The top 60 genes for all races, White, and Black are combined in the data table here.
=============================================
Now let’s compare the genes in the unfiltered fold change data with the genes in our pathologies database and see if there are any findings of importance.
You can get access to our pathologies database in its current state; the last data we added was the Hodgkin’s data with EBV and HIV.
Let’s read in our pathologies database.
path <- "C:...current Pathologies Database/" #copy path to the downloadable pathologies file
setwd(path)
pathologies <- read.csv("pathologies_Hodgkin_added_cHL_EBV_HIV_3-27-2026.csv", header=T, sep=',')
str(pathologies)
## 'data.frame': 289 obs. of 7 variables:
## $ Ensembl_ID : chr "ENSG00000211899" "ENSG00000164458" "ENSG00000211644" "ENSG00000125869" ...
## $ Genecards_ID : chr "IGHM" "TBXT" "IGLV1-51" "LAMP5" ...
## $ FC_pathology_control: num 18550 1051 179 140 105 ...
## $ topGenePathology : chr "Epstein Barr Virus" "Epstein Barr Virus" "Epstein Barr Virus" "Epstein Barr Virus" ...
## $ mediaType : chr "LCLs of PBMCs RNA-Seq format" "LCLs of PBMCs RNA-Seq format" "LCLs of PBMCs RNA-Seq format" "LCLs of PBMCs RNA-Seq format" ...
## $ studySummarized : chr "The EBV or Epstein-Barr Viral infected samples were obtained from lymphoblastic cells in peripheral blood monon"| __truncated__ "The EBV or Epstein-Barr Viral infected samples were obtained from lymphoblastic cells in peripheral blood monon"| __truncated__ "The EBV or Epstein-Barr Viral infected samples were obtained from lymphoblastic cells in peripheral blood monon"| __truncated__ "The EBV or Epstein-Barr Viral infected samples were obtained from lymphoblastic cells in peripheral blood monon"| __truncated__ ...
## $ GSE_study_ID : chr "GSE253756" "GSE253756" "GSE253756" "GSE253756" ...
genesPathology <- pathologies$Genecards_ID
genesPathology
## [1] "IGHM" "TBXT" "IGLV1-51" "LAMP5"
## [5] "ICOS" "PACSIN1" "EFR3B" "SIRPB1"
## [9] "ISX" "MIR4537" "LINC00540" "DTHD1"
## [13] "SCT" "DPYSL4" "NBPF3" "THBS1"
## [17] "SLC16A14" "GIMAP7" "SERPINB2" "LINC00327"
## [21] "GNG12" "HMGA2" "ENSG00000255026" "CPA4"
## [25] "IGF1" "AKAP6" "RGPD2" "EFEMP1"
## [29] "ADGRE4P" "TMEM132B" "LINC02898" "TMPRSS3"
## [33] "CCL20" "CD93" "CACNA1B" "MUC20P1"
## [37] "LOC102724560" "TIGIT" "KLHL13" "ENSG00000261471"
## [41] "CHL1-AS2" "GOLT1A" "GPR171" "DZIP1"
## [45] "DTNA" "TINAG" "CLU" "MIR503HG"
## [49] "RBPMS" "PHLDB2" "GPRIN3" "TBX15"
## [53] "LINC01515" "PALLD" "COL27A1" "CDCP1"
## [57] "USP6" "SLC4A4" "SORBS2" "MDGA1"
## [61] "HOGA1" "ADGRA3" "BCO1" "FKBP10"
## [65] "TRIM71" "TEX15" "RHOU" "IGHG3"
## [69] "COL4A1" "PXDN" "ME3" "PTPRG-AS1"
## [73] "FREM1" "DDO" "PRKY" "IGHV3-30"
## [77] "IGHG2" "IGHGP" "IGHG1" "IGHG4"
## [81] "UPK3BL1" "ENSG00000268292" "ENSG00000268292" "LOC101928819"
## [85] "LOC101928819" "TVP23A" "TVP23A" "GNAS-AS1"
## [89] "GNAS-AS1" "HOXC10" "HOXC10" "HOXC10"
## [93] "CXCL2" "CSF3" "CH25H" "ISG20"
## [97] "CLEC2L" "PSMF1" "RNF168" "PEX26"
## [101] "F2" "KCNJ16" "MAP2K7" "ESYT1"
## [105] "GATC" "ENO1" "CYP7B1" "IGFALS"
## [109] "OR52A4" "INAFM1" "DLG3" "TMEM194A"
## [113] "RGPD3" "HPGD" "SLC1A1" "NUDT18"
## [117] "LOC400657" "OTOS" "HECW1" "POU4F2"
## [121] "FRS3" "PDZRN3" "KHDRBS3" "CENPF"
## [125] "FAM162A" "CABP1" "POU3F2" "CTXN3"
## [129] "CLINT1" "IGFBP7" "FAM20C" "SMARCA2"
## [133] "HSD3B1" "ST3GAL3" "TSHZ2" "NLRP3"
## [137] "POM121L15P" "CASC8" "ESRP1" "RPL7P61"
## [141] "PLSCR5" "CPA6" "ERBB4" "CACNA1E"
## [145] "TMEM200A" "NDUFB5P1" "STX12" "CRLF3P3"
## [149] "CDH8" " PIK3C2A" "KCNA6-AS1" "SLIRPP1"
## [153] "PDE4DIPP5" "ST8SIA4" "HDAC4" "SCHLAP1"
## [157] "ATP5MC3" "CAMKMT" "AQP12B" "LZIC"
## [161] "WDR35" "TFPI" "ACTN4" "TSNARE1"
## [165] "MIR4432HG" "RPL31P30" "RNU6-280P" "KLHL29"
## [169] "DDAH1" "MIR382" "MIR584" "MIR1973"
## [173] "MIR382" "MIR432" "MIR432" "CCND1"
## [177] "CCND1" "MIR382" "MIR409" "MIR664B"
## [181] "MIR382" "MIR432" "MIR489" "CCND1"
## [185] "ANKRD22" "CXCL10" "IFI27" "IL1R2"
## [189] "NCKAP5" "FCER1A" "ENSG00000286797" "LOC102724019"
## [193] "NRCAM" "PIRAT1" "LYZ" "lnc-RSPH14-1"
## [197] "SOX5" "PPBP" "IGKC" "lnc-RSPH14-1"
## [201] "HSALNG0035179" "LINGO2" "LOC105377276" "ENSG00000167522"
## [205] "ENSG00000170458" "ENSG00000174059" "ENSG00000086548" "ENSG00000182578"
## [209] "ENSG00000049768" "ENSG00000231389" "ENSG00000104432" "ENSG00000213626"
## [213] "ENSG00000226979" "ENSG00000116132" "ENSG00000123892" "ENSG00000154764"
## [217] "AC139530.2" "AC011511.4" "AC092143.1" "AL645465.1"
## [221] "AL353997.3" "AC022384.1" "MPO" "TRIM6-TRIM34"
## [225] "FUT4" "ITGA4" "RPL23AP34" "AC027088.3"
## [229] "MYOM3" "CALCR" "MOXD1" "SMAD5-AS1"
## [233] "HSPE1P7" "AC026464.2" "RASA4B" "AC026954.2"
## [237] "AL353997.3" "AICDA" "ARID1B" "CARD11"
## [241] "CD4" "COL16A1" "CPLX2" "CREBBP"
## [245] "EP300" "EZH2" "FAM72A" "KDM5B"
## [249] "KMT2D" "LERFS" "MUC16" "NCOR2"
## [253] "NR4A2" "POU2F3" "RPS4Y1" "SLC18A2"
## [257] "TIGIT" "TP53" "TTN" "VAMP5"
## [261] "XBP1" "Z82243.1" "ZFHX3" "ARID1B"
## [265] "CARD11" "CCDC8" "CD4" "CREBBP"
## [269] "CTSLP3" "EP300" "EZH2" "FAM72A"
## [273] "HNRNPA1P70" "IGKV1D-39" "IGKV3-20" "KDM5B"
## [277] "KMT2D" "LTF" "MMP8" "MPO"
## [281] "MUC16" "NCOR2" "PES1P2" "RPS4Y1"
## [285] "TIGIT" "TP53" "TTN" "XBP1"
## [289] "ZFHX3"
I noticed an error in the dataset: for rows 204 to 216, the Ensembl IDs are in the Genecards_ID column and the gene symbols are in the Ensembl_ID column. We need to switch these. We could go back and fix that file, but we will do it here and upload the corrected file under a separate link, since the previously downloaded file won’t show the fix.
genecards <- pathologies$Ensembl_ID[204:216]
ensembls <- pathologies$Genecards_ID[204:216]
pathologies$Genecards_ID[204:216] <- genecards
pathologies$Ensembl_ID[204:216] <- ensembls
pathologies$Genecards_ID
## [1] "IGHM" "TBXT" "IGLV1-51" "LAMP5"
## [5] "ICOS" "PACSIN1" "EFR3B" "SIRPB1"
## [9] "ISX" "MIR4537" "LINC00540" "DTHD1"
## [13] "SCT" "DPYSL4" "NBPF3" "THBS1"
## [17] "SLC16A14" "GIMAP7" "SERPINB2" "LINC00327"
## [21] "GNG12" "HMGA2" "ENSG00000255026" "CPA4"
## [25] "IGF1" "AKAP6" "RGPD2" "EFEMP1"
## [29] "ADGRE4P" "TMEM132B" "LINC02898" "TMPRSS3"
## [33] "CCL20" "CD93" "CACNA1B" "MUC20P1"
## [37] "LOC102724560" "TIGIT" "KLHL13" "ENSG00000261471"
## [41] "CHL1-AS2" "GOLT1A" "GPR171" "DZIP1"
## [45] "DTNA" "TINAG" "CLU" "MIR503HG"
## [49] "RBPMS" "PHLDB2" "GPRIN3" "TBX15"
## [53] "LINC01515" "PALLD" "COL27A1" "CDCP1"
## [57] "USP6" "SLC4A4" "SORBS2" "MDGA1"
## [61] "HOGA1" "ADGRA3" "BCO1" "FKBP10"
## [65] "TRIM71" "TEX15" "RHOU" "IGHG3"
## [69] "COL4A1" "PXDN" "ME3" "PTPRG-AS1"
## [73] "FREM1" "DDO" "PRKY" "IGHV3-30"
## [77] "IGHG2" "IGHGP" "IGHG1" "IGHG4"
## [81] "UPK3BL1" "ENSG00000268292" "ENSG00000268292" "LOC101928819"
## [85] "LOC101928819" "TVP23A" "TVP23A" "GNAS-AS1"
## [89] "GNAS-AS1" "HOXC10" "HOXC10" "HOXC10"
## [93] "CXCL2" "CSF3" "CH25H" "ISG20"
## [97] "CLEC2L" "PSMF1" "RNF168" "PEX26"
## [101] "F2" "KCNJ16" "MAP2K7" "ESYT1"
## [105] "GATC" "ENO1" "CYP7B1" "IGFALS"
## [109] "OR52A4" "INAFM1" "DLG3" "TMEM194A"
## [113] "RGPD3" "HPGD" "SLC1A1" "NUDT18"
## [117] "LOC400657" "OTOS" "HECW1" "POU4F2"
## [121] "FRS3" "PDZRN3" "KHDRBS3" "CENPF"
## [125] "FAM162A" "CABP1" "POU3F2" "CTXN3"
## [129] "CLINT1" "IGFBP7" "FAM20C" "SMARCA2"
## [133] "HSD3B1" "ST3GAL3" "TSHZ2" "NLRP3"
## [137] "POM121L15P" "CASC8" "ESRP1" "RPL7P61"
## [141] "PLSCR5" "CPA6" "ERBB4" "CACNA1E"
## [145] "TMEM200A" "NDUFB5P1" "STX12" "CRLF3P3"
## [149] "CDH8" " PIK3C2A" "KCNA6-AS1" "SLIRPP1"
## [153] "PDE4DIPP5" "ST8SIA4" "HDAC4" "SCHLAP1"
## [157] "ATP5MC3" "CAMKMT" "AQP12B" "LZIC"
## [161] "WDR35" "TFPI" "ACTN4" "TSNARE1"
## [165] "MIR4432HG" "RPL31P30" "RNU6-280P" "KLHL29"
## [169] "DDAH1" "MIR382" "MIR584" "MIR1973"
## [173] "MIR382" "MIR432" "MIR432" "CCND1"
## [177] "CCND1" "MIR382" "MIR409" "MIR664B"
## [181] "MIR382" "MIR432" "MIR489" "CCND1"
## [185] "ANKRD22" "CXCL10" "IFI27" "IL1R2"
## [189] "NCKAP5" "FCER1A" "ENSG00000286797" "LOC102724019"
## [193] "NRCAM" "PIRAT1" "LYZ" "lnc-RSPH14-1"
## [197] "SOX5" "PPBP" "IGKC" "lnc-RSPH14-1"
## [201] "HSALNG0035179" "LINGO2" "LOC105377276" "ANKRD11"
## [205] "CD14" "CD34" "CEACAM6" "CSF1R"
## [209] "FOXP3" "HLA-DPA1" "IL7" "LBH"
## [213] "LTA" "PRRX1" "RAB38" "WNT7A"
## [217] "AC139530.2" "AC011511.4" "AC092143.1" "AL645465.1"
## [221] "AL353997.3" "AC022384.1" "MPO" "TRIM6-TRIM34"
## [225] "FUT4" "ITGA4" "RPL23AP34" "AC027088.3"
## [229] "MYOM3" "CALCR" "MOXD1" "SMAD5-AS1"
## [233] "HSPE1P7" "AC026464.2" "RASA4B" "AC026954.2"
## [237] "AL353997.3" "AICDA" "ARID1B" "CARD11"
## [241] "CD4" "COL16A1" "CPLX2" "CREBBP"
## [245] "EP300" "EZH2" "FAM72A" "KDM5B"
## [249] "KMT2D" "LERFS" "MUC16" "NCOR2"
## [253] "NR4A2" "POU2F3" "RPS4Y1" "SLC18A2"
## [257] "TIGIT" "TP53" "TTN" "VAMP5"
## [261] "XBP1" "Z82243.1" "ZFHX3" "ARID1B"
## [265] "CARD11" "CCDC8" "CD4" "CREBBP"
## [269] "CTSLP3" "EP300" "EZH2" "FAM72A"
## [273] "HNRNPA1P70" "IGKV1D-39" "IGKV3-20" "KDM5B"
## [277] "KMT2D" "LTF" "MMP8" "MPO"
## [281] "MUC16" "NCOR2" "PES1P2" "RPS4Y1"
## [285] "TIGIT" "TP53" "TTN" "XBP1"
## [289] "ZFHX3"
pathologies$Ensembl_ID
## [1] "ENSG00000211899" "ENSG00000164458" "ENSG00000211644" "ENSG00000125869"
## [5] "ENSG00000163600" "ENSG00000124507" "ENSG00000084710" "ENSG00000101307"
## [9] "ENSG00000175329" "ENSG00000264781" "ENSG00000276476" "ENSG00000197057"
## [13] "ENSG00000070031" "ENSG00000151640" "ENSG00000142794" "ENSG00000137801"
## [17] "ENSG00000163053" "ENSG00000179144" "ENSG00000197632" "ENSG00000232977"
## [21] "ENSG00000172380" "ENSG00000149948" "ENSG00000255026" "ENSG00000128510"
## [25] "ENSG00000017427" "ENSG00000151320" "ENSG00000185304" "ENSG00000115380"
## [29] "ENSG00000268758" "ENSG00000139364" "ENSG00000205086" "ENSG00000160183"
## [33] "ENSG00000115009" "ENSG00000125810" "ENSG00000148408" "ENSG00000224769"
## [37] "ENSG00000274276" "ENSG00000181847" "ENSG00000003096" "ENSG00000261471"
## [41] "ENSG00000224318" "ENSG00000174567" "ENSG00000174946" "ENSG00000134874"
## [45] "ENSG00000134769" "ENSG00000137251" "ENSG00000120885" "ENSG00000223749"
## [49] "ENSG00000157110" "ENSG00000144824" "ENSG00000185477" "ENSG00000092607"
## [53] "ENSG00000228065" "ENSG00000129116" "ENSG00000196739" "ENSG00000163814"
## [57] "ENSG00000129204" "ENSG00000080493" "ENSG00000154556" "ENSG00000112139"
## [61] "ENSG00000241935" "ENSG00000152990" "ENSG00000135697" "ENSG00000141756"
## [65] "ENSG00000206557" "ENSG00000133863" "ENSG00000116574" "ENSG00000211897"
## [69] "ENSG00000187498" "ENSG00000130508" "ENSG00000151376" "ENSG00000241472"
## [73] "ENSG00000164946" "ENSG00000203797" "ENSG00000099725" "ENSG00000270550"
## [77] "ENSG00000211893" "ENSG00000253755" "ENSG00000211896" "ENSG00000211892"
## [81] "ENSG00000267368" "ENSG00000268292" "ENSG00000268292" "ENSG00000250978"
## [85] "ENSG00000250978" "ENSG00000166676" "ENSG00000166676" "ENSG00000235590"
## [89] "ENSG00000235590" "ENSG00000180818" "ENSG00000180818" "ENSG00000180818"
## [93] "ENSG00000081041" "ENSG00000108342" "ENSG00000138135" "ENSG00000172183"
## [97] "ENSG00000236279" "ENSG00000125818" "ENSG00000163961" "ENSG00000215193"
## [101] "ENSG00000180210" "ENSG00000153822" "ENSG00000076984" "ENSG00000139641"
## [105] "ENSG00000257218" "ENSG00000074800" "ENSG00000172817" "ENSG00000099769"
## [109] "ENSG00000205494" "ENSG00000257704" "ENSG00000082458" "ENSG00000304975"
## [113] "ENSG00000153165" "ENSG00000164120" "ENSG00000106688" "ENSG00000275074"
## [117] "lysate" "ENSG00000178602" "ENSG00000002746" "ENSG00000151615"
## [121] "ENSG00000137218" "ENSG00000121440" "ENSG00000131773" "ENSG00000117724"
## [125] "ENSG00000114023" "ENSG00000157782" "ENSG00000184486" "ENSG00000205279"
## [129] "ENSG00000113282" "ENSG00000163453" "ENSG00000177706" "ENSG00000080503"
## [133] "ENSG00000203857" "ENSG00000126091" "ENSG00000182463" "ENSG00000162711"
## [137] "ENSG00000161103" "ENSG00000246228" "ENSG00000104413" "ENSG00000230282"
## [141] "ENSG00000231213" "ENSG00000165078" "ENSG00000178568" "ENSG00000198216"
## [145] "ENSG00000164484" "ENSG00000251025" "ENSG00000117758" "ENSG00000228225"
## [149] "ENSG00000150394" "ENSG00000011405" "ENSG00000256988" "ENSG00000227505"
## [153] "ENSG00000275064" "ENSG00000113532" "ENSG00000068024" "ENSG00000281131"
## [157] "ENSG00000154518" "ENSG00000143919" "ENSG00000185176" "ENSG00000162441"
## [161] "ENSG00000118965" "ENSG00000003436" "ENSG00000130402" "ENSG00000171045"
## [165] "ENSG00000228590" "ENSG00000230702" "ENSG00000201015" "ENSG00000119771"
## [169] "ENSG00000153904" "ENSG00000283170" "ENSG00000207714" "ENSG00000284253"
## [173] "ENSG00000283170" "ENSG00000272458" "ENSG00000272458" "ENSG00000110092"
## [177] "ENSG00000110092" "ENSG00000283170" "ENSG00000199107" "ENSG00000284450"
## [181] "ENSG00000283170" "ENSG00000272458" "ENSG00000207656" "ENSG00000110092"
## [185] "ENSG00000152766" "ENSG00000169245" "ENSG00000165949" "ENSG00000115590"
## [189] "ENSG00000176771" "ENSG00000179639" "ENSG00000286797" "ENSG00000240086"
## [193] "ENSG00000303545" "ENSG00000237803" "ENSG00000257764" NA
## [197] "ENSG00000256473" "ENSG00000287037" "ENSG00000295771" NA
## [201] NA "ENSG00000302413" "ENSG00000304732" "ENSG00000167522"
## [205] "ENSG00000170458" "ENSG00000174059" "ENSG00000086548" "ENSG00000182578"
## [209] "ENSG00000049768" "ENSG00000231389" "ENSG00000104432" "ENSG00000213626"
## [213] "ENSG00000226979" "ENSG00000116132" "ENSG00000123892" "ENSG00000154764"
## [217] "ENSG00000262660" "ENSG00000267303" "ENSG00000198211" "ENSG00000240963"
## [221] "ENSG00000267441" "ENSG00000272410" "ENSG00000005381" "ENSG00000258588"
## [225] "ENSG00000196371" "ENSG00000115232" "ENSG00000225991" "ENSG00000259265"
## [229] "ENSG00000142661" "ENSG00000004948" "ENSG00000079931" "ENSG00000164621"
## [233] "ENSG00000270945" "ENSG00000260108" "ENSG00000170667" "ENSG00000261915"
## [237] "ENSG00000279999" "ENSG00000111732" "ENSG00000049618" "ENSG00000198286"
## [241] "ENSG00000010610" "ENSG00000084636" "ENSG00000145920" "ENSG00000005339"
## [245] "ENSG00000100393" "ENSG00000106462" "ENSG00000196550" "ENSG00000117139"
## [249] "ENSG00000167548" "ENSG00000234665" "ENSG00000181143" "ENSG00000196498"
## [253] "ENSG00000153234" "ENSG00000137709" "ENSG00000129824" "ENSG00000165646"
## [257] "ENSG00000181847" "ENSG00000141510" "ENSG00000155657" "ENSG00000168899"
## [261] "ENSG00000100219" "ENSG00000273243" "ENSG00000140836" "ENSG00000049618"
## [265] "ENSG00000198286" "ENSG00000169515" "ENSG00000010610" "ENSG00000005339"
## [269] "ENSG00000280913" "ENSG00000100393" "ENSG00000106462" "ENSG00000196550"
## [273] "ENSG00000236946" "ENSG00000251546" "ENSG00000239951" "ENSG00000117139"
## [277] "ENSG00000167548" "ENSG00000012223" "ENSG00000118113" "ENSG00000005381"
## [281] "ENSG00000181143" "ENSG00000196498" "ENSG00000229268" "ENSG00000129824"
## [285] "ENSG00000181847" "ENSG00000141510" "ENSG00000155657" "ENSG00000100219"
## [289] "ENSG00000140836"
This looks OK: the Ensembl IDs that remain in the Genecards_ID column belong to genes that have no name other than their Ensembl ID.
write.csv(pathologies, 'pathologies_edited_3-31-2026.csv', row.names=F)
You can get that file here.
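The swap above relies on knowing that rows 204 to 216 were the affected ones. As a hedged sketch, misplaced IDs could also be found by pattern, since human Ensembl gene IDs are "ENSG" followed by 11 digits. The toy rows below reuse a few values from the printed output (including the "lysate" entry at position 117 and an NA), but the data frame itself and the names `looks_ensembl`, `swapped`, and `needs_review` are made up for illustration.

```r
# Toy rows modeled on values seen in the printed output above
toy <- data.frame(
  Ensembl_ID   = c("ENSG00000211899", "ANKRD11", "lysate", NA),
  Genecards_ID = c("IGHM", "ENSG00000167522", "LOC400657", "lnc-RSPH14-1"),
  stringsAsFactors = FALSE
)

# TRUE where a value has the form of a human Ensembl gene ID
looks_ensembl <- function(x) !is.na(x) & grepl("^ENSG[0-9]{11}$", x)

# Rows where the Ensembl ID sits in the Genecards_ID column instead
swapped <- !looks_ensembl(toy$Ensembl_ID) & looks_ensembl(toy$Genecards_ID)

# Swap the two columns for those rows only
tmp <- toy$Ensembl_ID[swapped]
toy$Ensembl_ID[swapped]   <- toy$Genecards_ID[swapped]
toy$Genecards_ID[swapped] <- tmp

# Rows whose Ensembl_ID is still missing or malformed after the swap
needs_review <- which(!looks_ensembl(toy$Ensembl_ID))
toy
needs_review
```

Rows where both columns hold an Ensembl ID are left alone, which matches the genes that have no other name. Entries such as "lysate" or NA are only flagged for review, not fixed.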
The LMP1 gene isn’t in the pathologies database, but the genes that its study found relevant are. Those are the rows we just switched, because their Ensembl IDs and Genecards IDs were in the wrong columns.
Let’s see if KDM5B, LMP1, and the other genes from the last studies are in this study. We will add LMP1 to the list of Genecards IDs and select those genes from the fibroid fold change data comparing uterine fibroid to normal tissue in both White and Black females.
theList <- c(pathologies$Genecards_ID,"LMP1")
theList
## [1] "IGHM" "TBXT" "IGLV1-51" "LAMP5"
## [5] "ICOS" "PACSIN1" "EFR3B" "SIRPB1"
## [9] "ISX" "MIR4537" "LINC00540" "DTHD1"
## [13] "SCT" "DPYSL4" "NBPF3" "THBS1"
## [17] "SLC16A14" "GIMAP7" "SERPINB2" "LINC00327"
## [21] "GNG12" "HMGA2" "ENSG00000255026" "CPA4"
## [25] "IGF1" "AKAP6" "RGPD2" "EFEMP1"
## [29] "ADGRE4P" "TMEM132B" "LINC02898" "TMPRSS3"
## [33] "CCL20" "CD93" "CACNA1B" "MUC20P1"
## [37] "LOC102724560" "TIGIT" "KLHL13" "ENSG00000261471"
## [41] "CHL1-AS2" "GOLT1A" "GPR171" "DZIP1"
## [45] "DTNA" "TINAG" "CLU" "MIR503HG"
## [49] "RBPMS" "PHLDB2" "GPRIN3" "TBX15"
## [53] "LINC01515" "PALLD" "COL27A1" "CDCP1"
## [57] "USP6" "SLC4A4" "SORBS2" "MDGA1"
## [61] "HOGA1" "ADGRA3" "BCO1" "FKBP10"
## [65] "TRIM71" "TEX15" "RHOU" "IGHG3"
## [69] "COL4A1" "PXDN" "ME3" "PTPRG-AS1"
## [73] "FREM1" "DDO" "PRKY" "IGHV3-30"
## [77] "IGHG2" "IGHGP" "IGHG1" "IGHG4"
## [81] "UPK3BL1" "ENSG00000268292" "ENSG00000268292" "LOC101928819"
## [85] "LOC101928819" "TVP23A" "TVP23A" "GNAS-AS1"
## [89] "GNAS-AS1" "HOXC10" "HOXC10" "HOXC10"
## [93] "CXCL2" "CSF3" "CH25H" "ISG20"
## [97] "CLEC2L" "PSMF1" "RNF168" "PEX26"
## [101] "F2" "KCNJ16" "MAP2K7" "ESYT1"
## [105] "GATC" "ENO1" "CYP7B1" "IGFALS"
## [109] "OR52A4" "INAFM1" "DLG3" "TMEM194A"
## [113] "RGPD3" "HPGD" "SLC1A1" "NUDT18"
## [117] "LOC400657" "OTOS" "HECW1" "POU4F2"
## [121] "FRS3" "PDZRN3" "KHDRBS3" "CENPF"
## [125] "FAM162A" "CABP1" "POU3F2" "CTXN3"
## [129] "CLINT1" "IGFBP7" "FAM20C" "SMARCA2"
## [133] "HSD3B1" "ST3GAL3" "TSHZ2" "NLRP3"
## [137] "POM121L15P" "CASC8" "ESRP1" "RPL7P61"
## [141] "PLSCR5" "CPA6" "ERBB4" "CACNA1E"
## [145] "TMEM200A" "NDUFB5P1" "STX12" "CRLF3P3"
## [149] "CDH8" " PIK3C2A" "KCNA6-AS1" "SLIRPP1"
## [153] "PDE4DIPP5" "ST8SIA4" "HDAC4" "SCHLAP1"
## [157] "ATP5MC3" "CAMKMT" "AQP12B" "LZIC"
## [161] "WDR35" "TFPI" "ACTN4" "TSNARE1"
## [165] "MIR4432HG" "RPL31P30" "RNU6-280P" "KLHL29"
## [169] "DDAH1" "MIR382" "MIR584" "MIR1973"
## [173] "MIR382" "MIR432" "MIR432" "CCND1"
## [177] "CCND1" "MIR382" "MIR409" "MIR664B"
## [181] "MIR382" "MIR432" "MIR489" "CCND1"
## [185] "ANKRD22" "CXCL10" "IFI27" "IL1R2"
## [189] "NCKAP5" "FCER1A" "ENSG00000286797" "LOC102724019"
## [193] "NRCAM" "PIRAT1" "LYZ" "lnc-RSPH14-1"
## [197] "SOX5" "PPBP" "IGKC" "lnc-RSPH14-1"
## [201] "HSALNG0035179" "LINGO2" "LOC105377276" "ANKRD11"
## [205] "CD14" "CD34" "CEACAM6" "CSF1R"
## [209] "FOXP3" "HLA-DPA1" "IL7" "LBH"
## [213] "LTA" "PRRX1" "RAB38" "WNT7A"
## [217] "AC139530.2" "AC011511.4" "AC092143.1" "AL645465.1"
## [221] "AL353997.3" "AC022384.1" "MPO" "TRIM6-TRIM34"
## [225] "FUT4" "ITGA4" "RPL23AP34" "AC027088.3"
## [229] "MYOM3" "CALCR" "MOXD1" "SMAD5-AS1"
## [233] "HSPE1P7" "AC026464.2" "RASA4B" "AC026954.2"
## [237] "AL353997.3" "AICDA" "ARID1B" "CARD11"
## [241] "CD4" "COL16A1" "CPLX2" "CREBBP"
## [245] "EP300" "EZH2" "FAM72A" "KDM5B"
## [249] "KMT2D" "LERFS" "MUC16" "NCOR2"
## [253] "NR4A2" "POU2F3" "RPS4Y1" "SLC18A2"
## [257] "TIGIT" "TP53" "TTN" "VAMP5"
## [261] "XBP1" "Z82243.1" "ZFHX3" "ARID1B"
## [265] "CARD11" "CCDC8" "CD4" "CREBBP"
## [269] "CTSLP3" "EP300" "EZH2" "FAM72A"
## [273] "HNRNPA1P70" "IGKV1D-39" "IGKV3-20" "KDM5B"
## [277] "KMT2D" "LTF" "MMP8" "MPO"
## [281] "MUC16" "NCOR2" "PES1P2" "RPS4Y1"
## [285] "TIGIT" "TP53" "TTN" "XBP1"
## [289] "ZFHX3" "LMP1"
That is the full combined list. LMP1 is an EBV gene, not a human gene, which is good to know. I only now remember that LMP1 wasn’t in our list, even though we worked with a study based entirely on it, because LMP1 is the viral gene. Let’s just use all the genes in our pathologies data that come from the EBV associated pathologies.
EBVa <- pathologies$Genecards_ID[grep("EBV",pathologies$topGenePathology)]
EBVa
## [1] "ANKRD22" "CXCL10" "IFI27" "IL1R2"
## [5] "NCKAP5" "FCER1A" "ENSG00000286797" "LOC102724019"
## [9] "NRCAM" "PIRAT1" "LYZ" "lnc-RSPH14-1"
## [13] "SOX5" "PPBP" "IGKC" "lnc-RSPH14-1"
## [17] "HSALNG0035179" "LINGO2" "LOC105377276" "ANKRD11"
## [21] "CD14" "CD34" "CEACAM6" "CSF1R"
## [25] "FOXP3" "HLA-DPA1" "IL7" "LBH"
## [29] "LTA" "PRRX1" "RAB38" "WNT7A"
## [33] "AC139530.2" "AC011511.4" "AC092143.1" "AL645465.1"
## [37] "AL353997.3" "AC022384.1" "MPO" "TRIM6-TRIM34"
## [41] "FUT4" "ITGA4" "RPL23AP34" "AC027088.3"
## [45] "MYOM3" "CALCR" "MOXD1" "SMAD5-AS1"
## [49] "HSPE1P7" "AC026464.2" "RASA4B" "AC026954.2"
## [53] "AL353997.3" "AICDA" "ARID1B" "CARD11"
## [57] "CD4" "COL16A1" "CPLX2" "CREBBP"
## [61] "EP300" "EZH2" "FAM72A" "KDM5B"
## [65] "KMT2D" "LERFS" "MUC16" "NCOR2"
## [69] "NR4A2" "POU2F3" "RPS4Y1" "SLC18A2"
## [73] "TIGIT" "TP53" "TTN" "VAMP5"
## [77] "XBP1" "Z82243.1" "ZFHX3" "ARID1B"
## [81] "CARD11" "CCDC8" "CD4" "CREBBP"
## [85] "CTSLP3" "EP300" "EZH2" "FAM72A"
## [89] "HNRNPA1P70" "IGKV1D-39" "IGKV3-20" "KDM5B"
## [93] "KMT2D" "LTF" "MMP8" "MPO"
## [97] "MUC16" "NCOR2" "PES1P2" "RPS4Y1"
## [101] "TIGIT" "TP53" "TTN" "XBP1"
## [105] "ZFHX3"
OK, great! We will work with this. There are 105 entries for EBV associated genes, and some symbols are listed more than once. Now let’s grab all of these genes from our unfiltered and filtered fibroid data of genes with fold change values.
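Because the printout repeats some symbols (for example lnc-RSPH14-1, TIGIT, and TP53), the 105 entries are not 105 distinct genes. Here is a minimal sketch of how the distinct count could be checked. The vector `ebv_example` is a hypothetical stand-in holding a few symbols copied from the printout, not the real `EBVa`.

```r
# Hypothetical stand-in for EBVa: a few symbols from the printout above,
# with repeats, so the example runs without the pathologies table.
ebv_example <- c("ANKRD22", "TIGIT", "TP53", "lnc-RSPH14-1",
                 "lnc-RSPH14-1", "TIGIT", "TP53", "TIGIT")

n_entries  <- length(ebv_example)          # every entry, repeats included
n_distinct <- length(unique(ebv_example))  # distinct gene symbols only

repeats <- table(ebv_example)
repeats <- repeats[repeats > 1]            # symbols listed more than once

n_entries
n_distinct
names(repeats)
```

Running the same three lines on `EBVa` itself would show how many of the 105 entries are unique symbols.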
UL_set1 <- UL[UL$GeneSymbol %in% EBVa,]
UL_set2 <- UL_all_filtered1[UL_all_filtered1$GeneSymbol %in% EBVa,]
paged_table(UL_set1)
There are 78 genes in the unfiltered fibroid set that match the EBV associated genes in the pathologies data.
Let’s look at a narrower data frame of these 78 genes in the unfiltered fibroid data.
UL_set1b <- UL_set1[,c(1:3,22,23,31)]
paged_table(UL_set1b)
The KDM5B gene is 8% up regulated over all samples. Let’s see how it looks in the White females and the Black females.
The White females:
UL_set1c <- UL_set1[,c(1:3,25,26,33)]
paged_table(UL_set1c)
The KDM5B gene is 273% up regulated or enhanced in the White females. This gene has been associated with EBV infection and with turning gene transcription and signaling on and off in nasopharyngeal cancer and in gastric carcinoma caused by EBV.
Let’s see how KDM5B looks in the Black females.
UL_set1d <- UL_set1[,c(1:3,28,29,35)]
paged_table(UL_set1d)
That is interesting, because in the Black female samples KDM5B is actually inhibited or down regulated by about 54%, operating at only 45.7% of its normal level of gene expression. In the study that looked at KDM5B, this was the region of chromosome 1 where EBV liked to attach and start interfering with normal cell division. That study also looked at a couple of metastasis genes, VEGFA and VCAM1. VEGFA is a tumor angiogenesis promoter and T cell suppressor seen in metastasis, while VCAM1 encodes vascular cell adhesion molecule 1, seen in metastasis with tumor cell invasion and the cellular immune response. Let’s see how these 2 genes look in the larger data.
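The percentages quoted here come from fold change ratios. Here is a small sketch of the conversion, assuming the fold change is a plain ratio of pathology over normal and not a log2 value. The helper `fc_to_percent` is hypothetical and is not part of the original analysis. The ratio 3.73 is inferred from the 273% figure above, not read from the table.

```r
# Convert a fold change (pathology / normal ratio) into a percent change.
# Assumes a linear-scale ratio, not log2.
fc_to_percent <- function(fc) {
  round((fc - 1) * 100, 1)
}

fc_to_percent(0.457)  # KDM5B in Black females: 45.7% of normal
fc_to_percent(3.73)   # ratio inferred from the 273% figure for White females
fc_to_percent(1)      # no change
```

A ratio below 1 gives a negative percent (down regulated) and a ratio above 1 gives a positive percent (up regulated).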
vegfa <- c("VCAM1","VEGFA", "KDM5B","CD4")
VEGFA_VCAM1 <- UL[UL$GeneSymbol %in% vegfa,]
paged_table(VEGFA_VCAM1)
The fold change in both races combined for uterine fibroid compared to normal is enhanced 82.7% for VCAM1 and 8.5% for KDM5B, but silenced 48% for VEGFA and 9% for CD4. The fold change values for VCAM1, KDM5B, VEGFA, and CD4 are all up regulated or enhanced in White females, but all are down regulated or silenced in Black females.
It looks like there are some differences by race for the KDM5B gene in uterine fibroids. I only analyzed the samples in this study and quickly looked over the conclusion of the published article. The conclusion suggests that Black females produce more fibrinogen and extra cellular matrix genes than White females, and that because of these extra cellular matrix characteristics, present before fibroids develop, Black females develop uterine fibroids in a larger proportion than White females.
Other genes inhibited in Black females are ones we saw in our last study on an African population with Hodgkin’s, some with EBV and some with EBV and HIV. The genes CD4, TIGIT, and NCOR2 are all inhibited. So is XBP1, a gene seen in HIV and EBV, and EP300, another cHL or Hodgkin’s gene. We can go back up to the UL_set1 data and see the summary statistics for these features. But we will need to transpose it into a matrix first, so the genes become columns, to get summary stats per gene.
black <- grep('black', colnames(UL_set1))
ebv_black <- UL_set1[,c(1:3, black)]
paged_table(ebv_black)
normal_black <- ebv_black[,c(2,7:9)]
uf_black <- ebv_black[,c(2,10:12)]
normal_b_t <- t(normal_black[,2:4])
uf_b_t <- t(uf_black[,2:4])
colnames(normal_b_t) <- normal_black$GeneSymbol
colnames(uf_b_t) <- uf_black$GeneSymbol
summary(normal_b_t[,c(5,51,76,57,60,21,40)])
## KDM5B CD4 XBP1 NCOR2 CREBBP
## Min. :289.0 Min. :41.00 Min. :188.0 Min. :1005 Min. :1056
## 1st Qu.:357.5 1st Qu.:49.50 1st Qu.:195.5 1st Qu.:1222 1st Qu.:1266
## Median :426.0 Median :58.00 Median :203.0 Median :1439 Median :1477
## Mean :423.7 Mean :52.67 Mean :257.3 Mean :1592 Mean :1497
## 3rd Qu.:491.0 3rd Qu.:58.50 3rd Qu.:292.0 3rd Qu.:1885 3rd Qu.:1718
## Max. :556.0 Max. :59.00 Max. :381.0 Max. :2331 Max. :1958
## WNT7A IL7
## Min. :0.0000 Min. :1
## 1st Qu.:0.0000 1st Qu.:2
## Median :0.0000 Median :3
## Mean :0.6667 Mean :3
## 3rd Qu.:1.0000 3rd Qu.:4
## Max. :2.0000 Max. :5
The summary stats of the normal tissue for the selected genes over these 3 Black samples show median values close to the mean values.
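The column positions passed to summary() (5, 51, 76, and so on) depend on the row order of UL_set1. A sketch of selecting the same columns by gene symbol follows. It uses a small stand-in matrix `toy_t` built from three of the columns summarized above. With only 3 samples, the Min, Median, and Max in each summary are the three counts themselves.

```r
# Stand-in for normal_b_t: 3 samples (rows) by gene (columns).
# Counts are read off the summary above; sample order is unknown, so each
# column is simply sorted low to high.
toy_t <- cbind(KDM5B = c(289, 426, 556),
               CD4   = c(41, 58, 59),
               IL7   = c(1, 3, 5))

genes_of_interest <- c("KDM5B", "CD4", "IL7", "NOT_A_GENE")

# Keep only the symbols that are really present, so a typo cannot raise
# a "subscript out of bounds" error.
present <- intersect(genes_of_interest, colnames(toy_t))
summary(toy_t[, present])
```

One caveat: when a symbol appears twice in the column names, as lnc-RSPH14-1 does in the EBV list, selecting by name returns only the first matching column.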
summary(uf_b_t[,c(5,51,76,57,60,21,40)])
## KDM5B CD4 XBP1 NCOR2 CREBBP
## Min. :104 Min. :11.00 Min. : 37.0 Min. : 91.0 Min. :145.0
## 1st Qu.:141 1st Qu.:11.50 1st Qu.: 52.5 1st Qu.:195.0 1st Qu.:300.0
## Median :178 Median :12.00 Median : 68.0 Median :299.0 Median :455.0
## Mean :194 Mean :19.33 Mean : 79.0 Mean :293.3 Mean :390.3
## 3rd Qu.:239 3rd Qu.:23.50 3rd Qu.:100.0 3rd Qu.:394.5 3rd Qu.:513.0
## Max. :300 Max. :35.00 Max. :132.0 Max. :490.0 Max. :571.0
## WNT7A IL7
## Min. : 0.000 Min. :1.0
## 1st Qu.: 0.000 1st Qu.:2.5
## Median : 0.000 Median :4.0
## Mean : 3.333 Mean :3.0
## 3rd Qu.: 5.000 3rd Qu.:4.0
## Max. :10.000 Max. :4.0
It seems that most of these selected genes, seen in the study on EBV in cHL, EBV, and HIV along with some other EBV genes, were higher in the normal tissue and then dropped dramatically in the fibroid tissue. The WNT7A gene was mentioned, but it is associated with many tissue cancers, like breast cancer. IL7 is an immune gene related to interleukins and the inflammation response. WNT7A has a higher mean in the fibroid tissue than in normal tissue (3.33 vs 0.67), while IL7 has the same mean (3) in both but a higher median in the fibroid tissue (4 vs 3). That was comparing Black females with uterine fibroids for genes in an EBV associated pathology.
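The drop can be put into numbers from the two summaries. Since each group has 3 samples, the Min, Median, and Max printed above are the three counts, so the group means can be rebuilt. This is a sketch that assumes fold change means the ratio of the fibroid mean to the normal mean. The matrices `normal_counts` and `fibroid_counts` are stand-ins typed in from the printed summaries, not the original objects.

```r
# Counts read from the summary() output above (Min, Median, Max per gene).
normal_counts <- cbind(KDM5B = c(289, 426, 556), CD4 = c(41, 58, 59),
                       XBP1 = c(188, 203, 381), NCOR2 = c(1005, 1439, 2331),
                       CREBBP = c(1056, 1477, 1958), WNT7A = c(0, 0, 2),
                       IL7 = c(1, 3, 5))
fibroid_counts <- cbind(KDM5B = c(104, 178, 300), CD4 = c(11, 12, 35),
                        XBP1 = c(37, 68, 132), NCOR2 = c(91, 299, 490),
                        CREBBP = c(145, 455, 571), WNT7A = c(0, 0, 10),
                        IL7 = c(1, 4, 4))

# Ratio of group means, fibroid over normal (assumed fold change definition).
fc_black <- colMeans(fibroid_counts) / colMeans(normal_counts)
round(fc_black, 3)
```

Under that assumption the KDM5B ratio comes out near 0.458, in line with the 45.7% quoted earlier for the Black samples. The IL7 ratio is exactly 1 because both group means are 3.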
The two races show different gene expression responses in their uterine fibroid tissue, with the Black females differing from the White females.
That was interesting, but there could be some hidden information that connects more to the EBV genes in the at risk groups. Let’s pull the fold change values for the at risk of fibroid samples compared to normal and briefly compare just these genes. Let’s make a list of the ones we want to look at.
few <- c("KDM5B", "IL7", "WNT7A", "CD4", "XBP1", "NCOR2", "CREBBP")
risk <- UL[,c(2,32,34,36)]
riskFC <- risk[risk$GeneSymbol %in% few,]
paged_table(riskFC)
The at risk tissue is myometrial tissue biopsied adjacent to, or as close as possible to, a fibroid. For this tissue, every gene is enhanced or up regulated in every subset of race, except for WNT7A in the White females only. So all of our selected EBV associated genes from nasopharyngeal carcinoma, gastric carcinoma, and Hodgkin’s lymphoma are up regulated in tissue adjacent to a tumor in the myometrium of the uterus. The endometrium lies inside the myometrium and the perimetrium lies outside it, but neither was sampled. The study says the tissue is myometrial tissue adjacent to the fibroid and not other tissue of the uterus.
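Reading direction off a table by eye is easy to get wrong, so the up or down call could be made explicit. The sketch below uses invented fold change values and hypothetical column names (`fc_all`, `fc_white`, `fc_black`). None of these numbers come from GSE244187. It assumes fold change is a ratio, where values above 1 mean up regulated.

```r
# Hypothetical fold changes (at risk / normal); values and column names are
# invented for illustration only.
risk_example <- data.frame(
  GeneSymbol = c("KDM5B", "WNT7A", "CD4"),
  fc_all     = c(1.4, 1.2, 1.1),
  fc_white   = c(1.6, 0.8, 1.3),
  fc_black   = c(1.2, 1.5, 1.05)
)

fc_cols <- c("fc_all", "fc_white", "fc_black")

# Replace each ratio with a label: above 1 is "up", otherwise "down".
direction <- risk_example
direction[fc_cols] <- lapply(risk_example[fc_cols],
                             function(fc) ifelse(fc > 1, "up", "down"))
direction
```

The same lapply() call could be pointed at the fold change columns of riskFC once their names are known.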
That was interesting, because there might be a connection between EBV and the epithelial tissue reaching the myometrial tissue in White and Black females. In the fibroid tissue itself the races diverge. The Black females’ uterine fibroid tissue takes a different turn and down regulates these genes, while the White females’ uterine fibroid tissue stays up regulated.
We know from the previous studies we researched that EBV lies dormant, and that its viral gene LMP1 can alter gene expression at the chromatin level, and so can KDM5B. Studies show that the epithelial lining of the gastrointestinal tract and of the nasal and pharynx passageways can develop carcinoma when EBV becomes active after its harmless latent state in the nucleus of the cells of any host ever infected. We also know that CD4 T cell anergy can occur, making the immune response lazy or ineffective, and that lymph nodes can become enlarged due to blockages of clotted B cells from EBV. There is also a connection to demyelination in multiple sclerosis that needs further study to see how EBV is involved. One study suggested it is due to the KDM5B interacting region on human host chromatin, which disrupts normal cell signaling and the transcription and translation of proteins at the cellular level, and that this interrupts nerve myelination when the latent EBV becomes activated by stress or other environmental factors.
The gastrointestinal tract and the genitourinary tract are stratified epithelial tissue. There is a mnemonic for these tissues and for conditions like skin cancer, gastric reflux, or viral infections that change stratified epithelial cells into another type, such as squamous or cuboid, or otherwise away from their natural cell shape: ‘if its satisfied, its stratisfied.’ No current studies popped up immediately on any connection between EBV and uterine fibroids, and how uterine fibroids form is still unknown.
Many studies suggest that race plays a role in how fibroids develop, as Black females have a higher reported incidence of uterine fibroids than other races. Still, around 80% of all females have them, and many do not know it because they don’t get annual, regular, or any checkups with their OB/GYN, even when symptomatic, and because people who don’t visit the doctor for pain are mostly assumed to be asymptomatic.
Thanks for joining this research into finding an EBV associated pathology in the female genitourinary tract, specifically the uterus.
=======================================================================
*** Part 2
After reading over the study, I realized I had originally misinterpreted the series information. I took it to mean that the normal myometrium tissue came from the same uterus as the at risk myometrial tissue and the uterine fibroid. That is incorrect. I reviewed the published article after the analysis and saw that the normal tissue was the myometrium of a White or Black female who was having a hysterectomy, not because of a fibroid or other pathology, but for personal reasons such as a pelvic organ prolapse. The uterine fibroid and the at risk myometrial tissue are from the same uterus, with the at risk tissue taken at least 2 cm from the nearest uterine fibroid, as these are biopsies taken during or after a hysterectomy. The goal of the study was to look at the extra cellular matrix (ECM) and its pathways in Black females compared to White females, to see whether this plays a significant part in why Black females tend to have more fibroids, more pain with fibroids, larger fibroids, and more cases of uterine fibroids. They found significant genetic differences between the races in genes related to the pathways of cartilage, fibronectin, and the ECM, both in normal myometrial tissue of Black compared to White females and in normal compared to at risk tissue. I won’t be selecting the target genes of this particular study, as they had more to do with racial disparities or separation than with pathology compared to normal. They would just throw off the machine learning model we are building to compare pathologies and to predict an EBV associated pathology versus a non EBV associated pathology.
Let’s look at all 60 genes again and see which are duplicated.
duplicated <- all60[duplicated(all60$GeneSymbol),]
duplicated <- duplicated[order(duplicated$GeneSymbol,decreasing=T),]
paged_table(duplicated)
There are 14 genes duplicated.
dups <- duplicated$GeneSymbol
allDups <- all60[all60$GeneSymbol %in% dups,]
allDups <- allDups[order(allDups$GeneSymbol, decreasing=T),]
paged_table(allDups)
Let’s keep these as our genes for uterine fibroid, since each is seen as a top gene in more than one dataset (White, Black, or both races).
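The two steps above use duplicated() and then %in% for a reason that is easy to miss. duplicated() flags only the second and later copies of a value, so the first copy has to be recovered separately. Here is a minimal sketch on a toy table. `toy60` and its values are illustrative and are not the real `all60` rows.

```r
# Toy table; symbols and sources are illustrative only.
toy60 <- data.frame(
  GeneSymbol = c("HOXC10", "CCND1", "HOXC10", "TP53", "HOXC10", "CCND1"),
  source     = c("both", "both", "White", "White", "Black", "Black")
)

# duplicated() is TRUE only for the 2nd and later occurrences of a symbol.
later_copies <- toy60[duplicated(toy60$GeneSymbol), ]

# Distinct symbols that occur more than once.
dup_symbols <- unique(later_copies$GeneSymbol)

# %in% brings back every occurrence, including the first one.
all_copies <- toy60[toy60$GeneSymbol %in% dup_symbols, ]

nrow(later_copies)
length(dup_symbols)
nrow(all_copies)
```

On the real data this means the 14 rows returned by duplicated() can cover fewer than 14 distinct symbols when a gene appears three times, and allDups will have more rows than `duplicated`.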
colnames(pathologies)
## [1] "Ensembl_ID" "Genecards_ID" "FC_pathology_control"
## [4] "topGenePathology" "mediaType" "studySummarized"
## [7] "GSE_study_ID"
colnames(allDups)
## [1] "GeneID" "GeneSymbol"
## [3] "GeneBiotype" "normal_mean"
## [5] "fibroid_mean" "foldchange_fibroid_vs_normal"
## [7] "foldchange_source"
dupsKept <- allDups[,c(1,2,6,7)]
colnames(dupsKept) <- c("Ensembl_ID","Genecards_ID", "FC_pathology_control","studySummarized")
paged_table(dupsKept)
write.csv(dupsKept,'common27_UF_genes.csv',row.names = F)
dupsKept$topGenePathology <- "uterine fibroid myometrial tissue"
dupsKept$mediaType <- 'RNA of uterine fibroid biopsy tumor, normal, adjacent to tumor tissue'
dupsKept$studySummarized <- paste(dupsKept$studySummarized,"This study used RNA of uterine tissue from the hysterectomies of Black and White females to compare the uterine fibroid gene expression data. The uterine fibroid and the tissue adjacent to the uterine fibroid, the 'at risk of fibroid' tissue, were from the same uterus, but the normal tissue was biopsied from a separate, normal and healthy uterus without any other pathology, removed in a hysterectomy for personal reasons or because of other health issues like pelvic organ prolapse. The study found that the Black females had more ECM and fibrinogen and fibronectin than White females and were more susceptible to developing fibroids before developing one. The samples were evenly split: 3 White normal, 3 White at risk, 3 White uterine fibroid, 3 Black normal, 3 Black at risk, and 3 Black uterine fibroid. There were a total of 18 samples", sep='...')
dupsKept$GSE_study_ID <- "GSE244187"
dupsKept1 <- dupsKept[,c(1:3,5,6,4,7)]
paged_table(dupsKept1)
Combine the two tables and save as csv.
Pathologies <- rbind(pathologies,dupsKept1)
paged_table(Pathologies[c(1:5,312:316),]) #look at first and last few observations
write.csv(Pathologies,'Pathologies_UF_added_4-2-2026.csv', row.names=F)
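rbind() on data frames lines columns up by name, so it can help to confirm that the names agree before stacking. A minimal sketch follows, with two toy frames standing in for `pathologies` and `dupsKept1`. The identifiers `ENSG_A`, `GENE_A`, and so on are hypothetical.

```r
# Toy frames standing in for pathologies and dupsKept1 (hypothetical values).
old_rows <- data.frame(Ensembl_ID = "ENSG_A", Genecards_ID = "GENE_A",
                       FC_pathology_control = 1.5)
new_rows <- data.frame(Ensembl_ID = "ENSG_B", Genecards_ID = "GENE_B",
                       FC_pathology_control = 0.5)

# Halt with an error if the column names or their order differ.
stopifnot(identical(colnames(old_rows), colnames(new_rows)))
combined <- rbind(old_rows, new_rows)

# head() and tail() show the first and last rows without hard-coding
# row numbers such as 312:316.
rbind(head(combined, 1), tail(combined, 1))
```

The identical() check is stricter than rbind() needs, since it also requires the same column order. It was chosen here so the combined table keeps the exact layout of the existing database.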
You can get the new pathologies database here.
Thanks so much. Now we can move on to other EBV associated pathologies and continue building our database. The goal is a predictive model that can distinguish certain pathologies, or show strong similarities between them, based on fold change values of normalized gene expression from various media types, taken from current research studies on the selected pathologies.