See Olney et al. 2024 for sample details.
Raw Fastq files are available on SRA Short Read Archive (SRA) BioProject PRJNA1023207
GitHub contains a reproducible snakemake pipeline for taking in the raw fastq files and generating the STAR-aligned counts data: https://github.com/fryerlab/LBD_CWOW\
Interactive link to query genes: https://fryerlab.shinyapps.io/LBD_CWOW/

Data

Explore Metadata

Control samples have a Braak NFT stage of less than III and Thal amyloid phase of less than 2.
Pathological amyloid (PA) is defined as Braak NFT stage of less than III and Thal amyloid phase of greater than or equal to 2.
Alzheimer’s disease (AD) is defined as Thal amyloid phase of equal or greater than 3 and a Braak NFT stage of V and greater. Lewy body disease (LBD) is defined if LBs were present.

Control, Pathological Amyloid (PA), Alzheimer’s disease (AD), Lewy Body Disease (LBD)

Neuropathology count
CONTROL 81
PA 39
AD 53
LBD 436

Visualize the distribution of data.

Visualize gene expression of sex chromosomes linked genes when aligned to the default reference GRCh38.

Samples are realigned to the sex chromosome complement (Olney et al. 2020) https://doi.org/10.1186/s13293-020-00312-9

Visualize the relationships between covariates.

Filter Genes

Counts were filtered to remove mitochondrial (MT) genes and keep only expressed protein-coding genes. A gene is expressed if at least 70% of the smallest group size has a minimum CPM count of 1. Before filtering, there were 60,649 genes, and post-filtering retained 15,208 expressed protein-coding genes.

gene_biotype fraction
protein_coding 1

15,208 genes are considered expressed and will be kept for downstream analysis.

PCA

Principal Components analysis

Covariates

Model Identification

To determine a parsimonious model for differential expression, we applied a Bayesian Information Criterion (BIC)-based approach using a forward stepwise assessment of all variables. The BIC approach applies a penalty term for each added parameter in the model to avoid overfitting. The model with the lowest BIC score included batch, sex, RIN, ENO2 expression, and the percentage of reads aligning to coding, intronic, and intergenic regions. We further explored potential sources of variation, including APOE ε4+ status, brain weight, and age of death. Nonetheless, these factors demonstrated minimal explanatory power concerning the variability within the expression data. Subsequently, our final BIC model was formulated, encompassing ~ 0 + disease type + ENO2 expression + batch + sex + RIN + % coding + % intronic + % intergenic.

Percentage of variance explained by each variable.

Differential Expression

A linear model was fitted to identify differentially expressed genes (DEGs). The modeling was conducted by utilizing the limma lmfit function, which fits an individual model to the expression values of each gene. Differential expression analyses were performed on each sex, using only female (XX) or male (XY) samples, to identify sex-specific gene dysregulation. We identified significant expression differences using a Benjamini-Hochberg adjusted p-value threshold below 0.05 (5%), referred to as q-value, and we required an absolute log2 fold change greater than 0.25 to ensure the practical relevance of identified differential expression patterns.
Interactive link to query genes: https://fryerlab.shinyapps.io/LBD_CWOW/

WGNCA

To identify co-expression gene modules associated with each disease type, we implemented Weighted Gene Correlation Network Analysis (WGCNA).WGCNA reads stabilized counts as input; we, therefore, input the voom transformed counts described in Filtering and normalization.Topological overlap matrices were constructed to find modular structures within the co-expression network using a signed network. Hierarchical clustering was executed to identify interconnected gene clusters representing co-expression modules, resulting in a gene cluster dendrogram by the Dynamic Hybrid tree cut function. Gene modules were then trimmed of genes whose correlation with module eigengene (ME) was less than 0.25. An ME denotes the first principal components of each module. These MEs serve as expression representatives for all genes within a given module. A Pearson’s correlation coefficient was then determined for the association between gene modules and clinical variables such as pathology scores, disease type, sex, brain weight, APOE ε4 positive count, and age at death, followed by conservative Benjamini & Hochberg multiple test correction.