## [1] "Assumed this study would be challenging due to the overwhelming amout of sugar in the current US diet. The challenge was to try to get data here or abroad. The studies outside the states are a lot more difficult to procure. The studies in the states for sugar are limited to specific organ targets or animal, non-human models in most cases; or even a drastically reduced sample size."
##
## The downloaded binary packages are in
## /var/folders/rg/x_7b05fn3sj3v_jq8q367xzm0000gn/T//RtmpAzHlVD/downloaded_packages
##
## Control Healthy.Obese Steatosis NASH
## 14 27 14 18
## [1] TRUE
## [1] 33298 21
## ID
## 1 7896736
## 2 7896738
## 3 7896740
## 4 7896742
## 5 7896744
## 6 7896746
## Gene title
## 1
## 2
## 3 olfactory receptor family 4 subfamily F member 17///olfactory receptor family 4 subfamily F member 5///olfactory receptor family 4 subfamily F member 4
## 4 uncharacterized LOC100134822///long intergenic non-protein coding RNA 266-1
## 5 olfactory receptor family 4 subfamily F member 29///olfactory receptor family 4 subfamily F member 21///olfactory receptor family 4 subfamily F member 16///olfactory receptor family 4 subfamily F member 3
## 6
## Gene symbol Gene ID
## 1
## 2
## 3 OR4F17///OR4F5///OR4F4 81099///79501///26682
## 4 LOC100134822///LINC00266-1 100134822///140849
## 5 OR4F29///OR4F21///OR4F16///OR4F3 729759///441308///81399///26683
## 6
## UniGene title UniGene symbol
## 1
## 2
## 3
## 4
## 5
## 6
## ProbeID Symbol EntrezID
## 3 7896740 OR4F17///OR4F5///OR4F4 <NA>
## 4 7896742 LOC100134822///LINC00266-1 <NA>
## 5 7896744 OR4F29///OR4F21///OR4F16///OR4F3 <NA>
## 10 7896754 LOC100287934///LOC100287497 <NA>
## 11 7896756 FAM87A 157693
## 12 7896759 LINC01128 643837
## GeneTitle
## 3 olfactory receptor family 4 subfamily F member 17///olfactory receptor family 4 subfamily F member 5///olfactory receptor family 4 subfamily F member 4
## 4 uncharacterized LOC100134822///long intergenic non-protein coding RNA 266-1
## 5 olfactory receptor family 4 subfamily F member 29///olfactory receptor family 4 subfamily F member 21///olfactory receptor family 4 subfamily F member 16///olfactory receptor family 4 subfamily F member 3
## 10 uncharacterized LOC100287934///septin 7 pseudogene
## 11 family with sequence similarity 87 member A
## 12 long intergenic non-protein coding RNA 1128
## Control Healthy.Obese Steatosis NASH
## 1 1 0 0 0
## 2 1 0 0 0
## 3 1 0 0 0
## 4 1 0 0 0
## 5 1 0 0 0
## 6 0 0 0 1
## 7 0 1 0 0
## 8 1 0 0 0
## 9 1 0 0 0
## 10 1 0 0 0
## 11 0 0 0 1
## 12 0 1 0 0
## 13 0 1 0 0
## 14 0 1 0 0
## 15 0 1 0 0
## 16 0 1 0 0
## 17 0 0 1 0
## 18 0 0 0 1
## 19 0 0 1 0
## 20 0 0 1 0
## 21 0 1 0 0
## 22 0 1 0 0
## 23 0 1 0 0
## 24 0 0 1 0
## 25 0 1 0 0
## 26 0 0 0 1
## 27 0 0 0 1
## 28 0 1 0 0
## 29 1 0 0 0
## 30 0 0 1 0
## 31 0 1 0 0
## 32 0 0 0 1
## 33 0 0 0 1
## 34 0 0 0 1
## 35 0 0 0 1
## 36 0 1 0 0
## 37 0 0 0 1
## 38 0 1 0 0
## 39 0 0 0 1
## 40 1 0 0 0
## 41 1 0 0 0
## 42 0 0 1 0
## 43 0 1 0 0
## 44 0 1 0 0
## 45 0 1 0 0
## 46 0 0 0 1
## 47 0 1 0 0
## 48 0 0 0 1
## 49 1 0 0 0
## 50 0 1 0 0
## 51 0 1 0 0
## 52 0 0 1 0
## 53 0 1 0 0
## 54 0 0 1 0
## 55 1 0 0 0
## 56 0 0 1 0
## 57 0 0 0 1
## 58 0 0 1 0
## 59 0 1 0 0
## 60 0 0 1 0
## 61 0 1 0 0
## 62 1 0 0 0
## 63 0 1 0 0
## 64 0 0 0 1
## 65 0 0 1 0
## 66 0 0 0 1
## 67 0 0 0 1
## 68 0 0 1 0
## 69 0 1 0 0
## 70 0 0 0 1
## 71 0 0 1 0
## 72 0 1 0 0
## 73 0 1 0 0
## attr(,"assign")
## [1] 1 1 1 1
## attr(,"contrasts")
## attr(,"contrasts")$`metadata$Group`
## [1] "contr.treatment"
## [1] "EDIT MEE.............It's important to note however that this risk can be mitigated/amplified by other factors. Thus, later we will review some system or pathway analysis."
## disease_group
## Control Disease
## 14 59
## logFC AveExpr t P.Value adj.P.Val B
## 8067985 1.6581257 6.775381 5.702269 2.242235e-07 0.007465971 6.272205
## 7949995 -0.4404043 7.309512 -5.487978 5.348513e-07 0.008904472 5.537474
## 7946641 0.4541986 5.527712 5.352440 9.210477e-07 0.010222709 5.077742
## 7957221 1.0836217 4.805792 5.200089 1.686022e-06 0.012352448 4.566093
## 7926223 0.4601744 6.958234 5.175874 1.854889e-06 0.012352448 4.485300
## 8020827 1.2492322 6.003273 5.071523 2.792822e-06 0.014886257 4.138885
## Symbol logFC t P.Value adj.P.Val
## 21264 NCAM2 1.6581257 5.702269 2.242235e-07 0.007465971
## 9575 MRPL21 -0.4404043 -5.487978 5.348513e-07 0.008904472
## 9213 GALNT18 0.4541986 5.352440 9.210477e-07 0.010222709
## 7128 CAMK1D 0.4601744 5.175874 1.854889e-06 0.012352448
## 10318 TRHDE-AS1///TRHDE 1.0836217 5.200089 1.686022e-06 0.012352448
## 7319 H2AFY2 0.5971328 5.042337 3.129525e-06 0.014886257
###Need addititional explanative HERE###
This is transcriptomics being converted to systems biology. The way to measure this goes beyond basic DE analysis, although DE analysis can be foundation to downstream analysis .
Let’s first do some additional analysis and visualization. We will first filter for low occurrence to improve signal/noise, thereby improving analysis and statistical significance. This improves confidence, reduces false-discovery, increases confidence, and concentrations active transcripts.
###Histogram
library(pheatmap)
# ---- 1) Make sure disease_group exists + is ordered ----
disease_group <- factor(disease_group, levels = c("Control", "Disease"))
# Important: name it so it can align to sample IDs
names(disease_group) <- rownames(metadata)
# ---- 2) If you already created expr_by_gene earlier, skip this block ----
# (expr_by_gene = gene symbols x samples)
if (!exists("expr_by_gene")) {
# Build probe -> symbol map from anno
probe2symbol <- anno[, c("ProbeID", "Symbol")]
probe2symbol <- probe2symbol[!is.na(probe2symbol$Symbol) & probe2symbol$Symbol != "", ]
# Expression matrix (probe x sample)
expr <- exprs(gse)
# Keep only probes with symbols
common_probes <- intersect(rownames(expr), probe2symbol$ProbeID)
expr <- expr[common_probes, , drop = FALSE]
# Merge expression with symbols
expr_df <- as.data.frame(expr)
expr_df$ProbeID <- rownames(expr_df)
expr_df <- merge(expr_df, probe2symbol, by = "ProbeID")
# Collapse probes -> gene symbol (mean expression)
library(dplyr)
expr_by_gene <- expr_df %>%
dplyr::select(-ProbeID) %>%
dplyr::group_by(Symbol) %>%
dplyr::summarise(across(where(is.numeric), mean, na.rm = TRUE)) %>%
as.data.frame()
rownames(expr_by_gene) <- expr_by_gene$Symbol
expr_by_gene$Symbol <- NULL
}
# ---- 3) Select top genes from res_all_sym ----
top_n <- 50
genes <- head(res_all_sym$Symbol, top_n)
genes <- genes[genes %in% rownames(expr_by_gene)]
# ---- 4) Build heatmap matrix and Z-score rows ----
mat <- expr_by_gene[genes, , drop = FALSE]
mat_z <- t(scale(t(mat)))
mat_z <- mat_z[complete.cases(mat_z), , drop = FALSE]
# ---- 5) Align and order samples as Control -> Disease ----
grp <- disease_group[colnames(mat_z)] # align to columns
col_order <- order(grp) # Control first, then Disease
mat_z <- mat_z[, col_order, drop = FALSE]
grp <- grp[col_order]
# ---- 6) Annotation + gap between groups ----
anno_col <- data.frame(Group = grp)
rownames(anno_col) <- colnames(mat_z)
gap_after <- sum(grp == "Control")
# ---- 7) Plot ----
pheatmap(
mat_z,
annotation_col = anno_col,
show_colnames = FALSE,
fontsize_row = 7,
cluster_cols = FALSE, # key: keep grouped order
cluster_rows = TRUE,
gaps_col = gap_after, # key: visual separator
clustering_distance_rows = "correlation",
main = paste0("Top ", top_n, " Genes (Symbol): Disease vs Control (Grouped)")
)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -5.48798 -0.86757 -0.04836 -0.05441 0.76012 5.70227
## [1] 17416
## ID
## R-HSA-8868773 R-HSA-8868773
## R-HSA-72312 R-HSA-72312
## R-HSA-72613 R-HSA-72613
## R-HSA-72737 R-HSA-72737
## R-HSA-6791226 R-HSA-6791226
## R-HSA-72706 R-HSA-72706
## Description
## R-HSA-8868773 rRNA processing in the nucleus and cytosol
## R-HSA-72312 rRNA processing
## R-HSA-72613 Eukaryotic Translation Initiation
## R-HSA-72737 Cap-dependent Translation Initiation
## R-HSA-6791226 Major pathway of rRNA processing in the nucleolus and cytosol
## R-HSA-72706 GTP hydrolysis and joining of the 60S ribosomal subunit
## NES pvalue p.adjust qvalue
## R-HSA-8868773 -3.265960 1e-10 4.55e-09 3.236842e-09
## R-HSA-72312 -3.246689 1e-10 4.55e-09 3.236842e-09
## R-HSA-72613 -3.237650 1e-10 4.55e-09 3.236842e-09
## R-HSA-72737 -3.237650 1e-10 4.55e-09 3.236842e-09
## R-HSA-6791226 -3.229929 1e-10 4.55e-09 3.236842e-09
## R-HSA-72706 -3.214599 1e-10 4.55e-09 3.236842e-09
## ID Description NES pvalue
## GO:0042254 GO:0042254 ribosome biogenesis -3.073861 1e-10
## GO:0006364 GO:0006364 rRNA processing -2.958005 1e-10
## GO:0002181 GO:0002181 cytoplasmic translation -2.905639 1e-10
## GO:0042274 GO:0042274 ribosomal small subunit biogenesis -2.905614 1e-10
## GO:0016072 GO:0016072 rRNA metabolic process -2.850143 1e-10
## GO:0022613 GO:0022613 ribonucleoprotein complex biogenesis -2.838387 1e-10
## p.adjust qvalue
## GO:0042254 6.807778e-08 6.072515e-08
## GO:0006364 6.807778e-08 6.072515e-08
## GO:0002181 6.807778e-08 6.072515e-08
## GO:0042274 6.807778e-08 6.072515e-08
## GO:0016072 6.807778e-08 6.072515e-08
## GO:0022613 6.807778e-08 6.072515e-08
## ID Description NES pvalue
## hsa03010 hsa03010 Ribosome -3.164054 1.000000e-10
## hsa05171 hsa05171 Coronavirus disease - COVID-19 -2.241228 1.000000e-10
## hsa05014 hsa05014 Amyotrophic lateral sclerosis -1.973762 1.667661e-09
## hsa03040 hsa03040 Spliceosome -2.196415 5.370261e-09
## hsa03008 hsa03008 Ribosome biogenesis in eukaryotes -2.390366 2.638503e-08
## hsa05012 hsa05012 Parkinson disease -1.938957 9.049412e-08
## p.adjust qvalue
## hsa03010 1.750000e-08 1.305263e-08
## hsa05171 1.750000e-08 1.305263e-08
## hsa05014 1.945605e-07 1.451158e-07
## hsa03040 4.698978e-07 3.504802e-07
## hsa03008 1.846952e-06 1.377576e-06
## hsa05012 5.278824e-06 3.937288e-06
Pathway Analysis Section: GO, Kegg, & Reactome Analysis Type,Function,Best For: GO (BP),enrichGO,“Broad biological mechanisms (e.g.,”“Cell Proliferation”“)” KEGG,enrichKEGG,“Well-defined metabolic/signaling maps (e.g.,”“Glycolysis”“)” Reactome,enrichPathway,Detailed molecular reactions and hierarchies
GO DETAIL: Category,Question it Answers,Level of Detail BP (Biological Process),What is the overall goal?,System-wide / Cellular program MF (Molecular Function),What is the chemical task?,Molecular / Biochemical CC (Cellular Component),Where is this happening?,Structural / Spatial
Results Explained: ####### References ################
## [1] "1.Ahrens M, Ammerpohl O, von Schönfels W, Kolarova J et al. DNA methylation analysis in nonalcoholic fatty liver disease suggests distinct disease-specific and remodeling signatures after bariatric surgery. Cell Metab 2013 Aug 6;18(2):296-302. PMID: 23931760"
## [1] "2. Changolkar LN, Singh G, Cui K, Berletch JB, Zhao K, Disteche CM, Pehrson JR. Genome-wide distribution of macroH2A1 histone variants in mouse liver chromatin. Mol Cell Biol. 2010 Dec;30(23):5473-83. doi: 10.1128/MCB.00518-10. Epub 2010 Oct 11. PMID: 20937776; PMCID: PMC2976432."
## [1] "3. Changolkar LN, Costanzi C, Leu NA, Chen D, McLaughlin KJ, Pehrson JR (2007) Developmental changes in histone macroH2A1‐mediated gene regulation. Mol Cell Biol 27: 2758–2764"
## [1] "Hjelkrem M, Stauch C, Shaw J, Harrison SA. Validation of the non-alcoholic fatty liver disease activity score. Aliment Pharmacol Ther. 2011 Jul;34(2):214-8. doi: 10.1111/j.1365-2036.2011.04695.x. Epub 2011 May 18. PMID: 21585409."
## [1] "6. Software:"
## Please cite the following if utilizing the GEOquery software:
##
## Davis S, Meltzer P (2007). "GEOquery: a bridge between the Gene
## Expression Omnibus (GEO) and BioConductor." _Bioinformatics_, *14*,
## 1846-1847. doi:10.1093/bioinformatics/btm254
## <https://doi.org/10.1093/bioinformatics/btm254>.
##
## A BibTeX entry for LaTeX users is
##
## @Article{,
## author = {Sean Davis and Paul Meltzer},
## title = {GEOquery: a bridge between the Gene Expression Omnibus (GEO) and BioConductor},
## journal = {Bioinformatics},
## year = {2007},
## volume = {14},
## pages = {1846--1847},
## doi = {10.1093/bioinformatics/btm254},
## }
## To cite package 'DESeq2' in publications use:
##
## Love, M.I., Huber, W., Anders, S. Moderated estimation of fold change
## and dispersion for RNA-seq data with DESeq2 Genome Biology 15(12):550
## (2014)
##
## A BibTeX entry for LaTeX users is
##
## @Article{,
## title = {Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2},
## author = {Michael I. Love and Wolfgang Huber and Simon Anders},
## year = {2014},
## journal = {Genome Biology},
## doi = {10.1186/s13059-014-0550-8},
## volume = {15},
## issue = {12},
## pages = {550},
## }
## To cite package 'pheatmap' in publications use:
##
## Kolde R (2025). _pheatmap: Pretty Heatmaps_.
## doi:10.32614/CRAN.package.pheatmap
## <https://doi.org/10.32614/CRAN.package.pheatmap>, R package version
## 1.0.13, <https://CRAN.R-project.org/package=pheatmap>.
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {pheatmap: Pretty Heatmaps},
## author = {Raivo Kolde},
## year = {2025},
## note = {R package version 1.0.13},
## url = {https://CRAN.R-project.org/package=pheatmap},
## doi = {10.32614/CRAN.package.pheatmap},
## }
## Please cite G. Yu (2015) for using ReactomePA. In addition, please cite
## G. Yu (2012) when using compareCluster in clusterProfiler package, G.
## Yu (2015) when applying enrichment analysis to NGS data by using
## ChIPseeker
##
## Guangchuang Yu, Qing-Yu He. ReactomePA: an R/Bioconductor package for
## reactome pathway analysis and visualization. Molecular BioSystems
## 2016, 12(2):477-479
##
## A BibTeX entry for LaTeX users is
##
## @Article{,
## title = {ReactomePA: an R/Bioconductor package for reactome pathway analysis and visualization},
## author = {Guangchuang Yu and Qing-Yu He},
## journal = {Molecular BioSystems},
## year = {2016},
## volume = {12},
## number = {12},
## pages = {477-479},
## pmid = {26661513},
## url = {http://pubs.rsc.org/en/Content/ArticleLanding/2015/MB/C5MB00663E},
## doi = {10.1039/C5MB00663E},
## }
## Please cite S. Xu (2024) for using clusterProfiler. In addition, please
## cite G. Yu (2010) when using GOSemSim, G. Yu (2015) when using DOSE and
## G. Yu (2015) when using ChIPseeker.
##
## G Yu. Thirteen years of clusterProfiler. The Innovation. 2024,
## 5(6):100722
##
## S Xu, E Hu, Y Cai, Z Xie, X Luo, L Zhan, W Tang, Q Wang, B Liu, R
## Wang, W Xie, T Wu, L Xie, G Yu. Using clusterProfiler to characterize
## multiomics data. Nature Protocols. 2024, 19(11):3292-3320
##
## T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L
## Zhan, X Fu, S Liu, X Bo, and G Yu. clusterProfiler 4.0: A universal
## enrichment tool for interpreting omics data. The Innovation. 2021,
## 2(3):100141
##
## Guangchuang Yu, Li-Gen Wang, Yanyan Han and Qing-Yu He.
## clusterProfiler: an R package for comparing biological themes among
## gene clusters. OMICS: A Journal of Integrative Biology 2012,
## 16(5):284-287
##
## To see these entries in BibTeX format, use 'print(<citation>,
## bibtex=TRUE)', 'toBibtex(.)', or set
## 'options(citation.bibtex.max=999)'.
## To cite package 'limma' in publications use:
##
## Ritchie, M.E., Phipson, B., Wu, D., Hu, Y., Law, C.W., Shi, W., and
## Smyth, G.K. (2015). limma powers differential expression analyses for
## RNA-sequencing and microarray studies. Nucleic Acids Research 43(7),
## e47.
##
## A BibTeX entry for LaTeX users is
##
## @Article{,
## author = {Matthew E Ritchie and Belinda Phipson and Di Wu and Yifang Hu and Charity W Law and Wei Shi and Gordon K Smyth},
## title = {{limma} powers differential expression analyses for {RNA}-sequencing and microarray studies},
## journal = {Nucleic Acids Research},
## year = {2015},
## volume = {43},
## number = {7},
## pages = {e47},
## doi = {10.1093/nar/gkv007},
## }
## To cite package 'enrichplot' in publications use:
##
## Yu G (2025). _enrichplot: Visualization of Functional Enrichment
## Result_. doi:10.18129/B9.bioc.enrichplot
## <https://doi.org/10.18129/B9.bioc.enrichplot>, R package version
## 1.30.4, <https://bioconductor.org/packages/enrichplot>.
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {enrichplot: Visualization of Functional Enrichment Result},
## author = {Guangchuang Yu},
## year = {2025},
## note = {R package version 1.30.4},
## url = {https://bioconductor.org/packages/enrichplot},
## doi = {10.18129/B9.bioc.enrichplot},
## }
## To cite package 'org.Hs.eg.db' in publications use:
##
## Carlson M (2025). _org.Hs.eg.db: Genome wide annotation for Human_. R
## package version 3.22.0.
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {org.Hs.eg.db: Genome wide annotation for Human},
## author = {Marc Carlson},
## year = {2025},
## note = {R package version 3.22.0},
## }
##
## ATTENTION: This citation information has been auto-generated from the
## package DESCRIPTION file and may need manual editing, see
## 'help("citation")'.
## To cite ggplot2 in publications, please use
##
## H. Wickham. ggplot2: Elegant Graphics for Data Analysis.
## Springer-Verlag New York, 2016.
##
## A BibTeX entry for LaTeX users is
##
## @Book{,
## author = {Hadley Wickham},
## title = {ggplot2: Elegant Graphics for Data Analysis},
## publisher = {Springer-Verlag New York},
## year = {2016},
## isbn = {978-3-319-24277-4},
## url = {https://ggplot2.tidyverse.org},
## }
## To cite package 'stringr' in publications use:
##
## Wickham H (2025). _stringr: Simple, Consistent Wrappers for Common
## String Operations_. doi:10.32614/CRAN.package.stringr
## <https://doi.org/10.32614/CRAN.package.stringr>, R package version
## 1.6.0, <https://CRAN.R-project.org/package=stringr>.
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {stringr: Simple, Consistent Wrappers for Common String Operations},
## author = {Hadley Wickham},
## year = {2025},
## note = {R package version 1.6.0},
## url = {https://CRAN.R-project.org/package=stringr},
## doi = {10.32614/CRAN.package.stringr},
## }
## To cite package 'tidyverse' in publications use:
##
## Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R,
## Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller
## E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V,
## Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). "Welcome to
## the tidyverse." _Journal of Open Source Software_, *4*(43), 1686.
## doi:10.21105/joss.01686 <https://doi.org/10.21105/joss.01686>.
##
## A BibTeX entry for LaTeX users is
##
## @Article{,
## title = {Welcome to the {tidyverse}},
## author = {Hadley Wickham and Mara Averick and Jennifer Bryan and Winston Chang and Lucy D'Agostino McGowan and Romain François and Garrett Grolemund and Alex Hayes and Lionel Henry and Jim Hester and Max Kuhn and Thomas Lin Pedersen and Evan Miller and Stephan Milton Bache and Kirill Müller and Jeroen Ooms and David Robinson and Dana Paige Seidel and Vitalie Spinu and Kohske Takahashi and Davis Vaughan and Claus Wilke and Kara Woo and Hiroaki Yutani},
## year = {2019},
## journal = {Journal of Open Source Software},
## volume = {4},
## number = {43},
## pages = {1686},
## doi = {10.21105/joss.01686},
## }