Introduction

Type 1 diabetes (T1D) is an autoimmune disease characterized by the targeted destruction of insulin-producing pancreatic β-cells. Previous genetic studies have identified TYK2 as a T1D candidate gene. TYK2 encodes a Janus kinase involved in type I interferon (IFN-I) signaling, which mediates antigen presentation and thus enhances immune recognition. Interestingly, loss-of-function mutations in TYK2 have been linked to increased T1D risk in some populations, whereas a partial loss-of-function variant appears protective against autoimmune disease in others. The goal of this study was to elucidate TYK2’s roles in beta-cell development and IFN-alpha-mediated immune response.

The authors employed an array of bioinformatics techniques, predominantly for transcriptomic analysis. They first used ultra-deep bulk RNA-seq to profile global gene expression across stages of stem cell-derived islet (SC-islet) differentiation and to compare wild-type and TYK2 knockout (KO) lines. Differential expression and reactome pathway enrichment analyses identified transcriptomic differences underlying the KO cells’ halted differentiation. Single-cell RNA-seq and gene-set enrichment further clarified the mechanism of beta cell depletion in developing KO SC-islets, linking it to KRAS upregulation and consequent premature cell cycle progression. Additional transcriptomic profiling of wild-type and KO SC-islets after IFN-alpha treatment elucidated the impact of TYK2 on IFN-I signaling. Together, these bioinformatic techniques revealed a dual role of TYK2 in promoting beta-cell differentiation and MHC-I-mediated immune recognition, supporting its potential as a therapeutic target for T1D [1].

Methods

We were provided with paired-end RNA sequencing reads from a study conducted by Chandra et al [1]. Specifically, we received 6 RNAseq samples from endocrine precursor cells, 3 WT and 3 TYK2 KO. We were also provided with the GRCh38 genome assembly and gencode.v45 annotation. Quality control of the reads was performed using FastQC v0.12.1 in default mode [2]. We indexed the genome using STAR v2.7.11b in genomeGenerate mode with the provided GTF annotation. Reads were aligned to the reference in a splice-aware fashion using STAR v2.7.11b with outputs generated in BAM format [3]. We then used MultiQC v1.25 in default mode to aggregate all FastQC output zips and STAR alignment logs into a QC report [4]. Gene counts for each sample were quantified using VERSE v0.1.5 run in default mode given the gtf annotation and STAR alignments [5]. Gene counts tables from separate samples were concatenated into a total counts matrix.

We used DEseq2 v1.44.0 to run a basic differential expression analysis on the gene counts [6]. This entailed creating a ‘coldata’ dataframe mapping samples to their corresponding conditions - “E” for experimental or “C” for control. The gene counts matrix was filtered using a method described below. We then created our DESeq2 object, specifying ‘condition’ as the design parameter. DESeq2 was run with the contrast argument set to c(“condition”,“E”,“C”). We parsed the gtf annotation file to create a mapping of gene ids to human genome symbols, and we added these symbols to the table of significant genes outputted by DESeq2. We filtered significant genes by an adjusted p-value threshold of 0.01, and then performed enrichment analysis on this gene set via ENRICHR [7]. Finally, we performed an additional gene set enrichment analysis using GSEA v1.32.4 in default mode against the C2 canonical pathways MSIGDB dataset [9].

Evaluation of QC metrics for the full data

RNAseq was performed at a very high depth, with read counts ranging from 84-119 million per sample. Each sample shows over 95% alignment to the reference genome, with over 91% of reads in each sample uniquely aligned to one location in the genome. Furthermore, all samples have sufficiently high quality scores throughout the length of reads, and the average GC content displays a roughly normal distribution centered at 50%. Additionally, all 12 samples had less than 1% of reads made up of overrepresented sequences. The only flagged metrics are Per Base Sequence Content and Sequence Duplication Levels. All samples exhibit unequal base proportions at the start of reads, likely reflecting adapter or random hexamer priming bias, which is pretty standard in RNA-seq experiments and difficult to avoid. Furthermore, approximately 75% of reads appear duplicated, which could stem from the ‘Ultra-deep’ nature of their RNAseq methodology and the fact that only the transcriptome was sequenced. Since other metrics such as GC content, sequence quality, and alignment rates are all promising, it is likely fine to move forward with the data as is.

DESeq2 Setup

library(DESeq2)

## Loading required package: S4Vectors

## Loading required package: stats4

## Loading required package: BiocGenerics

## 
## Attaching package: 'BiocGenerics'

## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs

## The following objects are masked from 'package:base':
## 
##     anyDuplicated, aperm, append, as.data.frame, basename, cbind,
##     colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
##     get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
##     match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
##     Position, rank, rbind, Reduce, rownames, sapply, saveRDS, setdiff,
##     table, tapply, union, unique, unsplit, which.max, which.min

## 
## Attaching package: 'S4Vectors'

## The following object is masked from 'package:utils':
## 
##     findMatches

## The following objects are masked from 'package:base':
## 
##     expand.grid, I, unname

## Loading required package: IRanges

## Loading required package: GenomicRanges

## Loading required package: GenomeInfoDb

## Loading required package: SummarizedExperiment

## Loading required package: MatrixGenerics

## Loading required package: matrixStats

## 
## Attaching package: 'MatrixGenerics'

## The following objects are masked from 'package:matrixStats':
## 
##     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
##     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
##     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
##     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
##     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
##     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
##     colWeightedMeans, colWeightedMedians, colWeightedSds,
##     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
##     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
##     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
##     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
##     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
##     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
##     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
##     rowWeightedSds, rowWeightedVars

## Loading required package: Biobase

## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.

## 
## Attaching package: 'Biobase'

## The following object is masked from 'package:MatrixGenerics':
## 
##     rowMedians

## The following objects are masked from 'package:matrixStats':
## 
##     anyMissing, rowMedians

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ lubridate::%within%() masks IRanges::%within%()
## ✖ dplyr::collapse()     masks IRanges::collapse()
## ✖ dplyr::combine()      masks Biobase::combine(), BiocGenerics::combine()
## ✖ dplyr::count()        masks matrixStats::count()
## ✖ dplyr::desc()         masks IRanges::desc()
## ✖ tidyr::expand()       masks S4Vectors::expand()
## ✖ dplyr::filter()       masks stats::filter()
## ✖ dplyr::first()        masks S4Vectors::first()
## ✖ dplyr::lag()          masks stats::lag()
## ✖ ggplot2::Position()   masks BiocGenerics::Position(), base::Position()
## ✖ purrr::reduce()       masks GenomicRanges::reduce(), IRanges::reduce()
## ✖ dplyr::rename()       masks S4Vectors::rename()
## ✖ lubridate::second()   masks S4Vectors::second()
## ✖ lubridate::second<-() masks S4Vectors::second<-()
## ✖ dplyr::slice()        masks IRanges::slice()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Read in the counts matrix

cts <- as.matrix(read.csv("/projectnb/bf528/students/hbeakley/project-2-hrbeakley/results/counts_matrix.tsv", sep = "\t", row.names='gene'))

colData

coldata <- read.csv('/projectnb/bf528/students/hbeakley/project-2-hrbeakley/coldata.csv')

Filtering the counts matrix

I pre-filtered the counts matrix following the DESeq2 vignette recommendation, keeping only genes that have at least 10 read counts in a minimum number of samples. I set the minimal number of samples to the smallest experimental group size (3 in this case), as per the vignette’s suggestion [6]. This cut down the number of genes in the counts matrix from 63,241 to 20,579.

smallestGroupSize <- 3
#filter counts matrix to only include rows that have counts >= 10 in at least 3 samples
keep <- cts[rowSums(cts >= 10) >= smallestGroupSize,]

#display the number of genes present before and after filtering
before <- nrow(cts)
after <- nrow(keep) 

cat("Genes before filtering:", before, "\n")

## Genes before filtering: 63241

cat("Genes after filtering:", after, "\n")

## Genes after filtering: 20579

cat("Removed", before - after, "genes (",
    round((before - after) / before * 100, 1), "% )\n")

## Removed 42662 genes ( 67.5 % )

Here are histograms showing the distribution of total read counts per gene before and after filtering.

par(mfrow = c(1, 2))
hist(log10(rowSums(cts)+1), breaks=100, main = "Before Filtering", xlab = "Read Counts Per Gene", ylab = "Frequency", col = "lightblue")
hist(log10(rowSums(keep)+1), breaks=100, main = "After Filtering", xlab = "Read Counts Per Gene", ylab = "Frequency", col = "lightgreen")

par(mfrow = c(1, 1))

Differential Expression Analysis

# Constructing the DESeq object
dds <- DESeqDataSetFromMatrix(countData = keep,
                              colData = coldata,
                              design = ~ condition)

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

# Running DEseq
dds <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

res <- results(dds, contrast=c("condition","E","C"))

# DESeq2 results for the top ten significant genes ranked by padj
resOrdered <- res[order(res$padj),]
deseq2_res <- as_tibble(resOrdered, rownames='gene')
ids <- read_tsv("/projectnb/bf528/students/hbeakley/project-2-hrbeakley/results/gene_id_mappings.tsv", col_names = c('gene', 'symbol'))

## Rows: 63241 Columns: 2

## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (2): gene, symbol
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

deseq2_res <- deseq2_res %>% left_join(ids) %>% relocate(symbol)

## Joining with `by = join_by(gene)`

head(deseq2_res, n = 10)

## # A tibble: 10 × 8
##    symbol         gene  baseMean log2FoldChange  lfcSE  stat    pvalue      padj
##    <chr>          <chr>    <dbl>          <dbl>  <dbl> <dbl>     <dbl>     <dbl>
##  1 RPS4Y1         ENSG…   10677.          -8.77 0.142  -61.7 0         0        
##  2 ENSG000002895… ENSG…     732.           3.70 0.125   29.7 1.66e-193 1.58e-189
##  3 PNPO           ENSG…     605.          -4.16 0.141  -29.5 2.13e-191 1.35e-187
##  4 PCDHGA10       ENSG…     497.           4.48 0.200   22.4 6.36e-111 3.02e-107
##  5 YPEL3-DT       ENSG…     525.          -2.42 0.115  -21.0 1.03e- 97 3.92e- 94
##  6 LINC02506      ENSG…    1301.          -1.71 0.0986 -17.4 1.17e- 67 3.70e- 64
##  7 SLC2A14        ENSG…     634.           4.69 0.279   16.8 2.51e- 63 6.81e- 60
##  8 ENSG000002863… ENSG…     907.          -1.71 0.115  -14.9 2.41e- 50 5.71e- 47
##  9 ENSG000002894… ENSG…     384.          -2.13 0.145  -14.7 3.28e- 49 6.93e- 46
## 10 ENSG000002829… ENSG…     121.           3.76 0.264   14.2 6.86e- 46 1.30e- 42

Filtering significant genes by padj

I chose 0.01 as a padj threshold, leaving me with 735 significant genes (332 upregulated and 403 downregulated).

sig <- deseq2_res %>% filter(padj < 0.01)
head(sig)

## # A tibble: 6 × 8
##   symbol          gene  baseMean log2FoldChange  lfcSE  stat    pvalue      padj
##   <chr>           <chr>    <dbl>          <dbl>  <dbl> <dbl>     <dbl>     <dbl>
## 1 RPS4Y1          ENSG…   10677.          -8.77 0.142  -61.7 0         0        
## 2 ENSG00000289575 ENSG…     732.           3.70 0.125   29.7 1.66e-193 1.58e-189
## 3 PNPO            ENSG…     605.          -4.16 0.141  -29.5 2.13e-191 1.35e-187
## 4 PCDHGA10        ENSG…     497.           4.48 0.200   22.4 6.36e-111 3.02e-107
## 5 YPEL3-DT        ENSG…     525.          -2.42 0.115  -21.0 1.03e- 97 3.92e- 94
## 6 LINC02506       ENSG…    1301.          -1.71 0.0986 -17.4 1.17e- 67 3.70e- 64

pos <- sig %>% filter(log2FoldChange > 0)
sig_genes <- sig %>% select(symbol)
cat("Significant genes (padj < 0.01):", nrow(sig_genes), "\n")

## Significant genes (padj < 0.01): 735

cat("# Upregulated:", nrow(pos), "\n")

## # Upregulated: 332

cat("# Downregulated:", nrow(sig_genes) - nrow(pos), "\n")

## # Downregulated: 403

write.table(sig_genes, file = "sig_genes.txt", col.names = F, row.names = F, quote= F)

ENRICHR Analysis

#enrichment from Reactome Pathways 2024
reactome <- read.delim("/projectnb/bf528/students/hbeakley/project-2-hrbeakley/Reactome_Pathways_2024_table.txt", header = TRUE, sep = "\t")
head(reactome[, c("Term", "Adjusted.P.value", "Overlap", "Combined.Score")], 7)

##                                                                               Term
## 1                                              Regulation of Beta-Cell Development
## 2                                 TP53 Regulates Transcription of Cell Death Genes
## 3                      TP53 Regulates Transcription of Death Receptors and Ligands
## 4 Regulation of Gene Expression in Endocrine-Committed (NEUROG3+) Progenitor Cells
## 5                                                                  Neuronal System
## 6                                                Extracellular Matrix Organization
## 7                                      Regulation of Gene Expression in Beta Cells
##   Adjusted.P.value Overlap Combined.Score
## 1     2.019213e-05   12/42   1.875117e+02
## 2     2.019213e-05   12/44   1.700068e+02
## 3     2.019213e-05    7/12   6.163288e+02
## 4     2.019213e-05     5/5   1.592382e+06
## 5     4.720253e-05  38/411   4.268770e+01
## 6     4.549061e-04  29/300   3.745881e+01
## 7     1.136761e-03    7/21   1.578832e+02

#enrichment from KEGG 2021 Human
kegg <- read.delim("/projectnb/bf528/students/hbeakley/project-2-hrbeakley/KEGG_2021_Human_table.txt", header = TRUE, sep = "\t")
head(kegg[, c("Term", "Adjusted.P.value", "Overlap", "Combined.Score")], 7)

##                                   Term Adjusted.P.value Overlap Combined.Score
## 1 Maturity onset diabetes of the young     5.806366e-05    9/26      215.99625
## 2                   Morphine addiction     1.493751e-04   15/91       72.33060
## 3             ECM-receptor interaction     3.012224e-04   14/88       62.91286
## 4                    GABAergic synapse     3.012224e-04   14/89       61.38891
## 5               Dilated cardiomyopathy     5.833508e-04   14/96       52.01448
## 6                p53 signaling pathway     5.833508e-04   12/73       58.77369
## 7                  Cholinergic synapse     5.833508e-04  15/113       44.82922

GSEA Analysis

Installing fgsea

BiocManager::install("fgsea")

## 'getOption("repos")' replaces Bioconductor standard repositories, see
## 'help("repositories", package = "BiocManager")' for details.
## Replacement repositories:
##     CRAN: https://cloud.r-project.org

## Bioconductor version 3.20 (BiocManager 1.30.26), R 4.4.3 (2025-02-28)

## Warning: package(s) not installed when version(s) same as or greater than current; use
##   `force = TRUE` to re-install: 'fgsea'

## Installation paths not writeable, unable to update packages
##   path: /share/pkg.8/r/4.4.3/install/lib64/R/library
##   packages:
##     acepack, actuar, ade4TkGUI, adegraphics, adehabitatLT, adephylo, admisc,
##     agridat, AICcmodavg, akima, AlgDesign, alphashape3d, animation, anndata,
##     anytime, apcluster, aplot, argparse, arrow, arulesCBA, arulesViz,
##     AsioHeaders, attachment, basefun, batchtools, bayesplot, bayestestR,
##     BDgraph, betareg, bigassertr, bigD, bigsnpr, bigstatsr, bigutilsr,
##     BiocManager, bipartite, bit, blockmodeling, bnlearn, bookdown, boot,
##     BradleyTerry2, brglm, brms, broom, broom.helpers, C50, Cairo, cards, CDM,
##     CFtime, checkmate, chromote, CircStats, cli, clock, cluster, ClusterR, clv,
##     collapse, collections, cols4all, commonmark, CompQuadForm, coneproj,
##     copula, CovTools, cowplot, cpp11, credentials, crosstalk, crul, cubature,
##     Cubist, curl, data.table, data.tree, datawizard, dbplyr, dbscan, dcurver,
##     dendextend, DEoptimR, Deriv, DescTools, devEMF, devtools, dials, diffobj,
##     diptest, DirichletReg, distributions3, DistributionUtils, distro, doBy,
##     dockerfiler, docopt, DoE.base, doFuture, doRNG, DoseFinding, dotwhisker,
##     downloader, DPQ, dreamerr, DT, dtplyr, duckdb, eaf, effects, elliptic,
##     emdbook, emmeans, entropy, Epi, ergm, ergm.count, ergm.ego, ergm.multi,
##     etm, evaluate, extrafont, extrafontdb, fastcluster, fda, fields,
##     fitdistrplus, fixest, flextable, float, FME, foghorn, forcats, forecast,
##     forecTheta, foreign, fresh, fs, future, future.apply, future.batchtools,
##     gam, gamlss, gamlss.data, gamm4, gapminder, gargle, gclus, gdtools,
##     geepack, GeneralizedHyperbolic, generics, gert, GGally, ggdist, ggdmc,
##     ggeffects, ggforce, ggfortify, ggfun, ggimage, ggmcmc, ggnewscale, ggplot2,
##     ggplotify, ggpp, ggraph, ggridges, ggsci, ggstats, gh, gld, GLMMadaptive,
##     glmnet, globals, gmm, googledrive, googlePolylines, googlesheets4,
##     GPArotation, graphon, graphsim, gsDesign, gss, gt, gtsummary, hardhat,
##     harmony, haven, here, Hmisc, hms, httptest2, httpuv, httr2, hunspell,
##     huxtable, hypergeo, ICS, igraph, iNEXT, insight, invgamma, irace, ISOcodes,
##     ISwR, jpeg, jsonlite, keras, kinship2, kknn, knitr, ks, labdsv, labelled,
##     lamW, latentnet, later, lattice, latticeExtra, lavaan, leafem, leaflet,
##     leidenbase, lgr, libgeos, libstable4u, limSolve, literanger, lme4, LMest,
##     Lmoments, locfit, logger, logistf, longmemo, lpSolveAPI, magick, magrittr,
##     mapproj, maps, marginaleffects, markdown, matchingR, MatchIt, mathjaxr,
##     MCMCprecision, mcmcse, memisc, merDeriv, meta, mets, mgcv, mgcViz, mice,
##     miceadds, mime, miniUI, mirt, mixtools, mlr, mmrm, mockery, mockr,
##     modeldata, modeltools, MPV, msigdbr, multcomp, MultiLCIRT, multilevel,
##     multinet, MuMIn, nanoarrow, nanonext, ncdf4, ndjson, NetworkDistance, nlme,
##     nloptr, odbc, officer, openair, openssl, optimx, osmdata, packrat, pak,
##     parallelDist, parallelly, parameters, paran, parsnip, partitions, partykit,
##     patchwork, paws.common, paws.storage, pbapply, pbdZMQ, pbkrtest, pcadapt,
##     penalized, peperr, performance, permute, pheatmap, pillar, pixmap,
##     pkgbuild, pkgdown, pkgload, PKI, plm, plot3D, plotly, poppr, pracma, pROC,
##     prodlim, profmem, progressr, PROJ, promises, proxyC, ps, psych,
##     psychotools, psychotree, psychTools, Publish, purrr, qgam, qpdf, qs,
##     quanteda, quantmod, quarto, questionr, QuickJSR, R.cache, R.oo, r2rtf,
##     R2WinBUGS, ragg, raster, ratelimitr, Rcpp, RcppArmadillo, RcppParallel,
##     RcppTOML, RcppXPtrUtils, RcppZiggurat, Rcsdp, RCurl, Rdpack, readstata13,
##     readxl, recipes, REddyProc, reformulas, rentrez, renv, reshape, reticulate,
##     Rfast, Rfast2, rgl, rhub, RInside, rio, RItools, RJSONIO, rlang, rle,
##     rmarkdown, Rmpfr, RMySQL, RNetCDF, rnndescent, robustbase, roptim,
##     ROptSpace, roxygen2, rpart.plot, RPEGLMEN, rpf, RPostgreSQL, rprojroot,
##     RPushbullet, rrcov, rredlist, rsample, rsconnect, RSQLite, rstan, rstanarm,
##     rstantools, rstpm2, rsvg, Rttf2pt1, RUnit, runjags, runonce, rversions,
##     rvest, rvg, s2, sampling, sass, scales, scatterpie, sctransform,
##     sensitivity, seriation, SeuratObject, sf, sfsmisc, shadowtext, shapes,
##     shiny, shinydashboard, shinydashboardPlus, SimComp, SimDesign, sirt, skimr,
##     softImpute, SomaDataIO, spacesXYZ, sparsesvd, sparsevctrs, spatstat,
##     spatstat.data, spatstat.explore, spatstat.geom, spatstat.linnet,
##     spatstat.model, spatstat.random, spatstat.univar, spatstat.utils, spelling,
##     spls, ssgraph, statmod, statnet.common, storr, stringfish, stringi,
##     stringmagic, stringr, styler, SuppDists, survey, svglite, svUnit,
##     systemfonts, table1, TAM, tcltk2, tclust, tensor, tensorflow, tergm, terra,
##     textmineR, textshaping, tfruns, TH.data, threejs, tibble, tidytext,
##     timeDate, timereg, tinyplot, tinytex, tkrplot, tmap, tmaptools, TMB,
##     tmvtnorm, tree, tripack, tsna, TSP, tufte, tzdb, umx, units, unmarked,
##     urltools, usethis, utf8, V8, variables, vcr, vegan, VGAMextra, visNetwork,
##     vroom, waiter, waldo, webfakes, webmockr, webshot2, websocket, weights,
##     workflows, writexl, WriteXLS, x13binary, xfun, xgboost, xlsxjars, XML,
##     xml2, yulab.utils, zeallot, zip, zoo

library(fgsea)

Generate a ranked list of our genes by Log2FoldChange

deseq2_res_ordered <- deseq2_res %>% 
  drop_na() %>% 
  distinct(symbol, .keep_all = TRUE) %>%
  arrange(desc(log2FoldChange))

rnk_list <- setNames(deseq2_res_ordered$log2FoldChange, deseq2_res_ordered$symbol)

Read in the GMT file

pathways <- fgsea::gmtPathways("/projectnb/bf528/students/hbeakley/project-2-hrbeakley/c2.cp.v2025.1.Hs.symbols.gmt")

Running FGSEA

fgseaRes <- fgsea(pathways, rnk_list, minSize=15, maxSize=500)

GSEA Plot

library(dplyr)
library(ggplot2)
library(forcats)

top_positive_nes <- fgseaRes %>%
  filter(padj < .1 & NES > 0) %>%
  slice_max(NES, n=10)

top_negative_nes <- fgseaRes %>%
  filter(padj < .1 & NES < 0) %>%
  slice_min(NES, n=10)

bind_rows(top_positive_nes, top_negative_nes) %>%
  mutate(pathway = forcats::fct_reorder(pathway, NES)) %>%
  ggplot(aes(pathway, NES, fill = NES > 0)) +
  geom_col() + coord_flip() + theme_minimal()

Brief ENRICHR Analysis

Most notably, several enriched Reactome 2024 pathways were related to the regulation of beta-cell development and gene expression.This is consistent with the paper’s findings, since our RNA-seq data came from WT and TYK2 KO S5 EP cell lines, and TYK2 loss was found to compromise EP formation. The second and third most significantly enriched pathways involved TP53 and its regulation of cell death genes. This finding is intriguing because the paper attributes the developmental disturbances in KO cell lines to increased KRAS, which causes cells to prematurely progress through the cell cycle. TP53 upregulation in KO cells may be a fail-safe mechanism to combat these proliferative signals [10]. Finally, the most significantly enriched KEGG 2021 Human pathway, “Maturity-Onset Diabetes of the Young,” serves as a sanity check, reinforcing TYK2’s relevance as a candidate gene for type 1 diabetes.

Brief fgsea analysis

This plot depicts the top significantly enriched pathways from the GSEA analysis using the C2 canonical pathways dataset. It displays the normalized enrichment scores of the ten most upregulated and ten most downregulated pathways. Several pathways upregulated in TYK2 KO cells involve p53 signaling, extracellular matrix organization, and adhesion pathways, consistent with a stress response phenotype. In contrast, multiple downregulated pathways correspond to beta cell development and differentiation. Ultimately, these results support the hypothesis that TYK2 KO S5 cells exhibit compromised differentiation due to accelerated proliferation, which in turn causes a compensatory stress response.

ENRICHR + GSEA analysis comparison

Both the ENRICHR and GSEA analyses highlight beta-cell differentiation and p53 signaling as major pathways disrupted by TYK2 loss. Both techniques also identify enrichment of neuroendocrine programs. In particular, the GSEA results show downregulation of neuronal differentiation and signaling pathways, consistent with compromised beta cell maturation and neuroendocrine functionality in the TYK2 KO S5 cells.

Methodologically, GSEA offers directionality by computing normalized enrichment scores, and it directly derives enrichment from ranked differential expression results, which reduces the subjectivity of setting a significance threshold for defining the gene set. The ENRICHR analysis, by comparison, requires this subjective determination of a significant gene set and then reports undirected enrichment magnitudes. However, directional insights could be obtained by performing separate ENRICHR runs for genes with positive and negative fold changes. Despite these technical differences, the core biological processes detected by both methods are highly consistent, which helps validate the robustness of the results.

RNAseq QC

#create normalized counts matrix using rlog or vst
vsd <- vst(dds, blind=FALSE)
pcaData <- plotPCA(vsd, intgroup = "condition", returnData = TRUE)

## using ntop=500 top features by variance

percentVar <- round(100 * attr(pcaData, "percentVar"))
ggplot(pcaData, aes(PC1, PC2, color = condition)) +
  geom_point(size = 3) +
  xlab(paste0("PC1: ", percentVar[1], "% variance")) +
  ylab(paste0("PC2: ", percentVar[2], "% variance")) +
  coord_fixed()

library("RColorBrewer")
library("pheatmap")

sampleDists <- dist(t(assay(vsd)))
sampleDistMatrix <- as.matrix(sampleDists)
rownames(sampleDistMatrix) <- paste(vsd$condition, vsd$type, sep="-")
colnames(sampleDistMatrix) <- NULL
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
pheatmap(sampleDistMatrix,
         clustering_distance_rows=sampleDists,
         clustering_distance_cols=sampleDists,
         col=colors)

Principle component analysis (PCA) is commonly used as diagnostic in RNA-seq experiments to assess whether the condition of interest explains the majority of variation among samples. In our PCA, the samples generally cluster by condition, indicating that TYK2 KO is largely responsible for the differential expression observed. PCA plots are also useful for identifying outliers, and we have one experimental replicate that somewhat strays from its cluster. However, given the small number of replicates, this deviation is not particularly concerning.

The sample-to-sample distance heatmap represents the pairwise distances between samples, illustrating how similar or different they are from each other. Ideally, control samples should be more similar to each other, and experimental groups should be more similar to each other, with clear separation between groups. In our results, the control replicates show strong mutual similarity and are consistently different from all experimental replicates. Among the experimental group, two replicates are highly similar to each other, while one is more distant and slightly closer to the controls. Although this one deviant replicate is not ideal, the small number of replicates makes it difficult to determine outliers. Overall, the PCA and distance plots display generally robust clustering by condition, providing strong support that the differential expression results are truly due to TYK2 loss.

Figure 3C and 3F Replication

library(ggrepel)
sig_y <- 4      #significance cutoff on y-axis

volc <- as.data.frame(deseq2_res) %>%     #start with DESeq2 results as a data.frame
  dplyr::mutate(
    gene = if ("symbol" %in% names(.)) symbol else rownames(.),   #pick gene symbol column if present, else rownames
    p    = dplyr::if_else(is.na(pvalue) | pvalue == 0, .Machine$double.xmin, pvalue),
    y    = pmin(-log10(p), 16),   # transform to -log10(p), then cap at 15 to match paper
    dir  = dplyr::case_when(      ## direction group for coloring
      y >= sig_y & log2FoldChange > 0 ~ "Up",   # significant and positive LFC → Up
      y >= sig_y & log2FoldChange < 0 ~ "Down",   ## significant and negative LFC → Down
      TRUE ~ "Not significant"   # everything else → Not Significant
    )
  )
# list of genes to annotate (matching 3c from paper)
genes_to_label <- c("KRAS","LAMB2","SPP1","DUSP6","SPRY2","PLAT","AKT3","SLC2A2","BAX",
                    "COL2A1","APOE","LAMA4","KDR","PGF","PTPRU","COL9A2","NTRK2","PAK3",
                    "ELMO","INSM1","NEUROG3","GAB2","NOS3","PAX6","NEUROD1","THBS1",
                    "NKX2-2","GABRB3","ONECUT1")
lab <- dplyr::filter(volc, gene %in% genes_to_label, is.finite(log2FoldChange), is.finite(y))

#plot ranges + padding
xr   <- range(volc$log2FoldChange, na.rm = TRUE)    # x-range of all points (min/max LFC)
xpad <- diff(xr) * 0.12       # small horizontal padding (12% of span) for fan-out space
xlim <- xr  

yr   <- range(volc$y, na.rm = TRUE)
ypad <- diff(yr) * 0.25    # 25% extra room on top for labels
ylim <- c(0, max(yr) + ypad)

#plot
ggplot(volc, aes(log2FoldChange, y, color = dir)) + # base plot: x = LFC, y = -log10(p), color by direction
  geom_point(alpha = .6, size = 1.2) +
  #label left and right sides separately so labels can fan out
  geom_text_repel(data = subset(lab, log2FoldChange > 0), aes(label = gene),  # labels for right-hand points (LFC > 0)
                  nudge_x =  xpad*.7, nudge_y = .35,    # slight diagonal fan: push right + a little up
                  box.padding = .4, point.padding = .25,    # spacing around label box & between label and point
                  segment.color = "grey", min.segment.length = 0,
                  max.overlaps = Inf) +
  geom_text_repel(data = subset(lab, log2FoldChange < 0), aes(label = gene),
                  nudge_x = -xpad*.7, nudge_y = .35,
                  box.padding = .4, point.padding = .25,
                  segment.color = "grey", min.segment.length = 0,
                  max.overlaps = Inf) +
  scale_color_manual(values = c(Up="red", Down="blue", NS="grey")) +    # map groups to colors (Up=red, Down=blue)
  scale_x_continuous(limits = xlim, expand = expansion(mult = c(.02,.02))) +
  scale_y_continuous(expand = expansion(mult = c(0, .1))) +
  coord_cartesian(ylim = c(0, max(volc$y, na.rm = TRUE)))  +
  labs(x="Expression difference (log2 fold change)",
       y="Significance (-log10 p-value)", color=NULL,
       title="Volcano plot") +
  theme_minimal() +
  theme(panel.grid.minor = element_blank(),
        panel.grid.major.x = element_blank(),
        plot.margin = margin(20,40,20,40)) +
  coord_fixed(1)

## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.

library(dplyr)
library(ggplot2)
library(forcats)
library(purrr)

pathways_of_interest <- c(
  "REACTOME_EXTRACELLULAR_MATRIX_ORGANIZATION",
  "REACTOME_INTEGRIN_CELL_SURFACE_INTERACTIONS",
  "REACTOME_ECM_PROTEOGLYCANS",
  "REACTOME_REGULATION_OF_BETA_CELL_DEVELOPMENT",
  "REACTOME_NEURONAL_SYSTEM",
  "REACTOME_REGULATION_OF_GENE_EXPRESSION_IN_BETA_CELLS"
)

alpha <- 0.01

#Identify DE genes
de_genes <- deseq2_res %>%
  filter(!is.na(padj), padj < alpha) %>%
  transmute(symbol = if ("symbol" %in% names(.)) symbol else rownames(.)) %>%
  distinct() %>% pull(symbol)

#Keep only selected pathways
sel_paths <- pathways[names(pathways) %in% pathways_of_interest]

#Compute % of DE genes in each pathway
pct_df <- tibble(
  pathway  = names(sel_paths),
  set_size = map_int(sel_paths, length),
  overlap  = map_int(sel_paths, ~ sum(.x %in% de_genes))
) %>%
  mutate(pct = 100 * overlap / set_size)

#Join FGSEA results
plot_df <- pct_df %>%
  left_join(fgseaRes %>% select(pathway, NES, padj), by = "pathway") %>%
  filter(!is.na(NES), !is.na(padj)) %>%
  mutate(
    direction   = if_else(NES > 0, "Up", "Down"),
    pathway_lab = gsub("_", " ", sub("^REACTOME_", "", pathway)),
    pathway_lab = fct_reorder(pathway_lab, pct)
  )

#Plot horizontal bars colored by padj
ggplot(plot_df, aes(x = pathway_lab, y = pct, fill = padj)) +
  geom_col(width = 0.6) +
  coord_flip() +
  facet_wrap(~ direction, ncol = 1, scales = "free_y") +
  scale_fill_gradient(
    low = "mediumvioletred", high = "red",
    trans = "reverse",
    name = "Adjusted p value"
  ) +
  labs(
    title = "Reactome enrichment",
    x = NULL,
    y = "Percentage of DE genes in category / all genes in category (%)"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    strip.text = element_text(face = "bold"),
    panel.grid.major.y = element_blank(),
    plot.title = element_text(face = "bold", hjust = 0.5)
  )

Discussion

They reported 319 upregulated and 412 downregulated genes using a padj threshold of 0.01, whereas I found 332 upregulated and 401 downregulated genes at the same threshold. Given all of the variables at play, these numbers are quite similar and indicate that my pipeline captured comparable transcriptional trends.

I replicated Figures 3C and 3F as closely as possible, using the same genes and pathways highlighted in their figures. As a result, my volcano plot pretty closely mirrors theirs, and the directions of enrichment for nearly all labeled genes are consistent. Specific genes such as KRAS and LAMB2 appear in nearly identical positions on both plots, suggesting that my differential expression results align well with those of Chandra et al.

The enrichment results were also highly similar. Both analyses identified many of the same functional programs as significantly altered, and I was able to extract six of the seven pathways displayed in their Figure 3F from my GSEA results. When plotted as a horizontal bar graph, the relative bar lengths (representing percent of all genes in that category that are significantly differentially expressed) are highly comparable, as were the padj values. Overall, these parallels indicate that both analyses identified the same biological processes as up or down regulated and at very similar magnitudes. The only notable difference is the absence of the “Signaling by Receptor Tyrosine Kinases” pathway in my GSEA results.

The small discrepencies between our results may have arisen due to a plethora of technical differences. First, the authors likely filtered reads and genes differently than I did. Second, they used slightly different tools, such as Rsubread v2.4.3 for gene counting and ClusterProfile v3.18.1 for enrichment analysis. We also used different versions of some of the same tools, (i.e. STAR aligner v2.5.4b versus v2.7.11b and DESeq2 v1.30.1 versus v1.44.0). Lastly, and perhaps most significantly, they did not specify which version of the reference genome or annotation they used for their analysis. Nevertheless, given the highly similar counts of up and downregulated genes, the closely aligned volcano plots, and the nearly matching enrichment profiles, my results capture the same key underlying biological signals as Chandra et al. for this portion of the RNA-seq experiment.

Sources:

Chandra, V., Ibrahim, H., Halliez, C. et al. The type 1 diabetes gene TYK2 regulates β-cell development and its responses to interferon-α. Nat Commun 13, 6363 (2022). https://doi.org/10.1038/s41467-022-34069-z
Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online]. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
Dobin, A. et al. (2013). STAR: Ultrafast universal RNA-seq aligner. Bioinformatics, 29(1), 15–21. https://doi.org/10.1093/bioinformatics/bts635
Ewels, P., Magnusson, M., Lundin, S., & Käller, M. (2016). MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics, 32(19), 3047–3048. https://doi.org/10.1093/bioinformatics/btw354
Zhu, Y. et al. (2016). VERSE: a versatile and efficient RNA-Seq read counting tool. Bioinformatics, 32(20), 3216–3222. https://doi.org/10.1093/bioinformatics/btw343
Love, M.I., Huber, W., & Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15, 550. https://doi.org/10.1186/s13059-014-0550-8
Chen, E.Y. et al. (2013). Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinformatics, 14, 128. https://maayanlab.cloud/Enrichr/
Subramanian, A. et al. (2005). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. PNAS, 102(43), 15545–15550. https://doi.org/10.1073/pnas.0506580102
Liberzon, A. et al. (2011). Molecular signatures database (MSigDB) 3.0. Bioinformatics, 27(12), 1739–1740. https://www.gsea-msigdb.org/gsea/msigdb/human/collections.jsp#C2
Lowe SW. Activation of p53 by oncogenes. Endocr Relat Cancer. 1999 Mar;6(1):45-8. doi: 10.1677/erc.0.0060045. PMID: 10732786.

SessionInfo

The sessioninfo function is one way we can attempt to be transparent about our environment management in R.

This will print out a summary of the current packages installed in the environment.

sessioninfo::session_info()

## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.4.3 (2025-02-28)
##  os       AlmaLinux 8.10 (Cerulean Leopard)
##  system   x86_64, linux-gnu
##  ui       X11
##  language (EN)
##  collate  en_US.UTF-8
##  ctype    en_US.UTF-8
##  tz       America/New_York
##  date     2025-10-24
##  pandoc   3.2 @ /usr/local/ood/rstudio-server-2024.12.0+467/bin/quarto/bin/tools/x86_64/ (via rmarkdown)
##  quarto   1.5.57 @ /usr/local/ood/rstudio-server-2024.12.0+467/bin/quarto/bin/quarto
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package              * version  date (UTC) lib source
##  abind                  1.4-8    2024-09-12 [1] CRAN (R 4.4.0)
##  Biobase              * 2.66.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.0)
##  BiocGenerics         * 0.52.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.0)
##  BiocManager            1.30.26  2025-06-05 [1] CRAN (R 4.4.0)
##  BiocParallel           1.40.2   2025-10-06 [1] Bioconductor
##  bit                    4.6.0    2025-03-06 [1] CRAN (R 4.4.0)
##  bit64                  4.6.0-1  2025-01-16 [1] CRAN (R 4.4.0)
##  bslib                  0.9.0    2025-01-30 [2] CRAN (R 4.4.3)
##  cachem                 1.1.0    2024-05-16 [1] CRAN (R 4.4.0)
##  cli                    3.6.5    2025-04-23 [1] CRAN (R 4.4.0)
##  codetools              0.2-20   2024-03-31 [2] CRAN (R 4.4.3)
##  colorspace             2.1-2    2025-09-22 [1] CRAN (R 4.4.0)
##  cowplot                1.1.3    2024-01-22 [2] CRAN (R 4.4.3)
##  crayon                 1.5.3    2024-06-20 [1] CRAN (R 4.4.0)
##  data.table             1.17.8   2025-07-10 [1] CRAN (R 4.4.0)
##  DelayedArray           0.32.0   2024-10-29 [2] Bioconductor 3.20 (R 4.4.3)
##  DESeq2               * 1.46.0   2024-10-29 [2] Bioconductor 3.20 (R 4.4.3)
##  digest                 0.6.37   2024-08-19 [1] CRAN (R 4.4.0)
##  dplyr                * 1.1.4    2023-11-17 [2] CRAN (R 4.4.3)
##  evaluate               1.0.5    2025-08-27 [1] CRAN (R 4.4.0)
##  farver                 2.1.2    2024-05-13 [1] CRAN (R 4.4.0)
##  fastmap                1.2.0    2024-05-15 [1] CRAN (R 4.4.0)
##  fastmatch              1.1-6    2024-12-23 [1] CRAN (R 4.4.0)
##  fgsea                * 1.32.4   2025-03-20 [1] Bioconductor 3.20 (R 4.4.0)
##  forcats              * 1.0.0    2023-01-29 [2] CRAN (R 4.4.3)
##  generics               0.1.4    2025-05-09 [1] CRAN (R 4.4.0)
##  GenomeInfoDb         * 1.42.3   2025-01-27 [2] Bioconductor 3.20 (R 4.4.3)
##  GenomeInfoDbData       1.2.13   2025-10-06 [1] Bioconductor
##  GenomicRanges        * 1.58.0   2024-10-29 [2] Bioconductor 3.20 (R 4.4.3)
##  ggplot2              * 3.5.1    2024-04-23 [2] CRAN (R 4.4.3)
##  ggrepel              * 0.9.6    2024-09-07 [2] CRAN (R 4.4.3)
##  glue                   1.8.0    2024-09-30 [1] CRAN (R 4.4.0)
##  gtable                 0.3.6    2024-10-25 [2] CRAN (R 4.4.3)
##  hms                    1.1.3    2023-03-21 [2] CRAN (R 4.4.3)
##  htmltools              0.5.8.1  2024-04-04 [2] CRAN (R 4.4.3)
##  httr                   1.4.7    2023-08-15 [2] CRAN (R 4.4.3)
##  IRanges              * 2.40.1   2024-12-05 [2] Bioconductor 3.20 (R 4.4.3)
##  jquerylib              0.1.4    2021-04-26 [2] CRAN (R 4.4.3)
##  jsonlite               2.0.0    2025-03-27 [1] CRAN (R 4.4.0)
##  knitr                  1.49     2024-11-08 [2] CRAN (R 4.4.3)
##  labeling               0.4.3    2023-08-29 [2] CRAN (R 4.4.3)
##  lattice                0.22-7   2025-04-02 [1] CRAN (R 4.4.0)
##  lifecycle              1.0.4    2023-11-07 [2] CRAN (R 4.4.3)
##  locfit                 1.5-9.11 2025-02-03 [2] CRAN (R 4.4.3)
##  lubridate            * 1.9.4    2024-12-08 [2] CRAN (R 4.4.3)
##  magrittr               2.0.4    2025-09-12 [1] CRAN (R 4.4.0)
##  Matrix                 1.7-4    2025-08-28 [2] CRAN (R 4.4.3)
##  MatrixGenerics       * 1.18.1   2025-01-09 [2] Bioconductor 3.20 (R 4.4.3)
##  matrixStats          * 1.5.0    2025-01-07 [1] CRAN (R 4.4.0)
##  munsell                0.5.1    2024-04-01 [2] CRAN (R 4.4.3)
##  pheatmap             * 1.0.12   2019-01-04 [2] CRAN (R 4.4.3)
##  pillar                 1.10.1   2025-01-07 [2] CRAN (R 4.4.3)
##  pkgconfig              2.0.3    2019-09-22 [2] CRAN (R 4.4.3)
##  purrr                * 1.0.4    2025-02-05 [2] CRAN (R 4.4.3)
##  R6                     2.6.1    2025-02-15 [1] CRAN (R 4.4.0)
##  RColorBrewer         * 1.1-3    2022-04-03 [2] CRAN (R 4.4.3)
##  Rcpp                   1.1.0    2025-07-02 [1] CRAN (R 4.4.0)
##  readr                * 2.1.5    2024-01-10 [2] CRAN (R 4.4.3)
##  rlang                  1.1.6    2025-04-11 [1] CRAN (R 4.4.0)
##  rmarkdown              2.29     2024-11-04 [2] CRAN (R 4.4.3)
##  rstudioapi             0.17.1   2024-10-22 [1] CRAN (R 4.4.0)
##  S4Arrays               1.6.0    2024-10-29 [2] Bioconductor 3.20 (R 4.4.3)
##  S4Vectors            * 0.44.0   2024-10-29 [2] Bioconductor 3.20 (R 4.4.3)
##  sass                   0.4.9    2024-03-15 [2] CRAN (R 4.4.3)
##  scales                 1.3.0    2023-11-28 [2] CRAN (R 4.4.3)
##  sessioninfo            1.2.3    2025-02-05 [2] CRAN (R 4.4.3)
##  SparseArray            1.6.2    2025-02-20 [2] Bioconductor 3.20 (R 4.4.3)
##  stringi                1.8.7    2025-03-27 [1] CRAN (R 4.4.0)
##  stringr              * 1.5.1    2023-11-14 [2] CRAN (R 4.4.3)
##  SummarizedExperiment * 1.36.0   2024-10-29 [2] Bioconductor 3.20 (R 4.4.3)
##  tibble               * 3.2.1    2023-03-20 [2] CRAN (R 4.4.3)
##  tidyr                * 1.3.1    2024-01-24 [2] CRAN (R 4.4.3)
##  tidyselect             1.2.1    2024-03-11 [2] CRAN (R 4.4.3)
##  tidyverse            * 2.0.0    2023-02-22 [2] CRAN (R 4.4.3)
##  timechange             0.3.0    2024-01-18 [2] CRAN (R 4.4.3)
##  tzdb                   0.4.0    2023-05-12 [2] CRAN (R 4.4.3)
##  UCSC.utils             1.2.0    2024-10-29 [2] Bioconductor 3.20 (R 4.4.3)
##  utf8                   1.2.6    2025-06-08 [1] CRAN (R 4.4.0)
##  vctrs                  0.6.5    2023-12-01 [2] CRAN (R 4.4.3)
##  vroom                  1.6.5    2023-12-05 [2] CRAN (R 4.4.3)
##  withr                  3.0.2    2024-10-28 [1] CRAN (R 4.4.0)
##  xfun                   0.53     2025-08-19 [1] CRAN (R 4.4.0)
##  XVector                0.46.0   2024-10-29 [2] Bioconductor 3.20 (R 4.4.3)
##  yaml                   2.3.10   2024-07-26 [1] CRAN (R 4.4.0)
##  zlibbioc               1.52.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.0)
## 
##  [1] /usr4/bf527/hbeakley/R/x86_64-pc-linux-gnu-library/4.4
##  [2] /share/pkg.8/r/4.4.3/install/lib64/R/library
##  * ── Packages attached to the search path.
## 
## ──────────────────────────────────────────────────────────────────────────────

Project 2 Report

Hannah Beakley

2025-10-24