See Olney et al. 2024 for sample details.
Raw Fastq files are available on SRA Short Read Archive (SRA) BioProject
PRJNA1023207
GitHub contains a reproducible snakemake pipeline for taking in the raw
fastq files and generating the STAR-aligned counts data: https://github.com/fryerlab/LBD_CWOW\
Interactive link to query genes: https://fryerlab.shinyapps.io/LBD_CWOW/
Control samples have a Braak NFT stage of less than III and Thal
amyloid phase of less than 2.
Pathological amyloid (PA) is defined as Braak NFT stage of less than III
and Thal amyloid phase of greater than or equal to 2.
Alzheimer’s disease (AD) is defined as Thal amyloid phase of equal or
greater than 3 and a Braak NFT stage of V and greater. Lewy body disease
(LBD) is defined if LBs were present.
Control, Pathological Amyloid (PA), Alzheimer’s disease (AD), Lewy Body Disease (LBD)
| Neuropathology | count |
|---|---|
| CONTROL | 81 |
| PA | 39 |
| AD | 53 |
| LBD | 436 |
Visualize the distribution of data.
Visualize gene expression of sex chromosomes linked genes when
aligned to the default reference GRCh38.
Samples are realigned to the sex chromosome complement (Olney et al. 2020) https://doi.org/10.1186/s13293-020-00312-9
Visualize the relationships between covariates.
Counts were filtered to remove mitochondrial (MT) genes and keep only expressed protein-coding genes. A gene is expressed if at least 70% of the smallest group size has a minimum CPM count of 1. Before filtering, there were 60,649 genes, and post-filtering retained 15,208 expressed protein-coding genes.
| gene_biotype | fraction |
|---|---|
| protein_coding | 1 |
15,208 genes are considered expressed and will be kept for downstream analysis.
Principal Components analysis
To determine a parsimonious model for differential expression, we applied a Bayesian Information Criterion (BIC)-based approach using a forward stepwise assessment of all variables. The BIC approach applies a penalty term for each added parameter in the model to avoid overfitting. The model with the lowest BIC score included batch, sex, RIN, ENO2 expression, and the percentage of reads aligning to coding, intronic, and intergenic regions. We further explored potential sources of variation, including APOE ε4+ status, brain weight, and age of death. Nonetheless, these factors demonstrated minimal explanatory power concerning the variability within the expression data. Subsequently, our final BIC model was formulated, encompassing ~ 0 + disease type + ENO2 expression + batch + sex + RIN + % coding + % intronic + % intergenic.
Percentage of variance explained by each variable.
A linear model was fitted to identify differentially expressed genes
(DEGs). The modeling was conducted by utilizing the limma lmfit
function, which fits an individual model to the expression values of
each gene. Differential expression analyses were performed on each sex,
using only female (XX) or male (XY) samples, to identify sex-specific
gene dysregulation. We identified significant expression differences
using a Benjamini-Hochberg adjusted p-value threshold below 0.05 (5%),
referred to as q-value, and we required an absolute log2 fold change
greater than 0.25 to ensure the practical relevance of identified
differential expression patterns.
Interactive link to query genes: https://fryerlab.shinyapps.io/LBD_CWOW/
To identify co-expression gene modules associated with each disease type, we implemented Weighted Gene Correlation Network Analysis (WGCNA).WGCNA reads stabilized counts as input; we, therefore, input the voom transformed counts described in Filtering and normalization.Topological overlap matrices were constructed to find modular structures within the co-expression network using a signed network. Hierarchical clustering was executed to identify interconnected gene clusters representing co-expression modules, resulting in a gene cluster dendrogram by the Dynamic Hybrid tree cut function. Gene modules were then trimmed of genes whose correlation with module eigengene (ME) was less than 0.25. An ME denotes the first principal components of each module. These MEs serve as expression representatives for all genes within a given module. A Pearson’s correlation coefficient was then determined for the association between gene modules and clinical variables such as pathology scores, disease type, sex, brain weight, APOE ε4 positive count, and age at death, followed by conservative Benjamini & Hochberg multiple test correction.