Using factR
1 Introduction
High throughput RNA sequencing experiments have allowed the discovery of many new RNA species. Current bioinformatic pipelines typically involve aligning these sequencing reads to the genome and assembling contiguous reads into transcript architectures. This will output full descriptions of transcript and exon coordinates in Gene Transfer Format (GTF). Further analyses including differential transcript expression can be performed with these custom-assembled transcript information. Despite having rich details on transcript architectures, these custom-assembled transcripts lack functional information including coding potential.
To this end, we developed a program with tools to Functionally Annotate Custom-assembled Transcripts in R (factR). At its core, factR will construct CDS features for assembled transcriptomes using a reference annotation as guide. With this newly-built coding information, we may then query transcripts for protein domain features, for nonsense-mediated decay inducing features and for upstream open-reading frames. factR also contain tools to annotate alternative segments between transcripts, to auto-fill gene metadata and other tools to visualize particular transcript families from a GTF file.
2 Getting started
2.1 How to install and load factR and other dependencies
To install factR, open R and type:
The above installation will also install packages needed to run factR. In addition, you would need to install the following genome sequence for this workflow:
Next, we need to load the following packages:
2.2 What inputs you need
2.2.1 Query transcript assembly
Transcript assembly programs typically outputs a GTF file containing transcript and exon coordinates, as well as attributes for each entry. It is mandatory that your assembly contain gene_id and transcript_id attributes.
Query GTF file have to be imported into R and parsed into a GenomicRanges object. To do so, run:
2.2.2 Reference annotation
Reference transcript annotation serves as a guide in the construction of query CDS. We strongly recommend using transcript annotation in GTF/GFF3 format for the workflow. Reference GTF annotation can be acquired from the Annotation Hub database in R or downloaded locally from data server website via FTP. For the latter, the GTF file have to be imported into R and parsed into a GenomicRanges object.
Here, we will describe three ways to obtain mouse annotation:
- vM6 Gencode basic annotation from Annotation Hub database
# query database for mouse gencode basic annotation
ah <- AnnotationHub()
query(ah, c('Mus musculus', 'gencode', 'gff', 'basic'))
#> AnnotationHub with 2 records
#> # snapshotDate(): 2019-05-02
#> # $dataprovider: Gencode
#> # $species: Mus musculus
#> # $rdataclass: GRanges
#> # additional mcols(): taxonomyid, genome, description, coordinate_1_based, maintainer,
#> # rdatadateadded, preparerclass, tags, rdatapath, sourceurl, sourcetype
#> # retrieve records with, e.g., 'object[["AH49547"]]'
#>
#> title
#> AH49547 | gencode.vM6.basic.annotation.gff3.gz
#> AH49549 | gencode.vM6.chr_patch_hapl_scaff.basic.annotation.gff3.gz
#> retrieve id 'AH49547'
ref_gtf <- ah[['AH49547']]- GRCm38 Ensembl annotation from Annotation Hub database
# Load annotation hub and query database for ensembl annotation as in method 1 above.
# retrieve latest GRCm38 annotation from ensembl
ref_gtf <- ah[['AH60127']]- Import locally downloaded GTF annotation into R
2.2.3 Genomic sequence
Genomic DNA sequence is required for determining open-reading frames and for in silico translation of query transcripts. Here, we describe three ways to obtain genomic sequence for mouse GRCm38 assembly:
- BSgenome sequence object from UCSC The package BSgenome.Mmusculus.UCSC.mm10 contains genome sequence for mouse mm10.
- TwoBit fasta sequence (Ensembl) from Annotation Hub
# query database for mouse gencode basic annotation
ah <- AnnotationHub()
query(ah, c("Mus musculus", "release-91", 'fasta', 'GRCm38'))
#> AnnotationHub with 5 records
#> # snapshotDate(): 2019-05-02
#> # $dataprovider: Ensembl
#> # $species: Mus musculus
#> # $rdataclass: TwoBitFile
#> # additional mcols(): taxonomyid, genome, description, coordinate_1_based, maintainer,
#> # rdatadateadded, preparerclass, tags, rdatapath, sourceurl, sourcetype
#> # retrieve records with, e.g., 'object[["AH60491"]]'
#>
#> title
#> AH60491 | Mus_musculus.GRCm38.cdna.all.2bit
#> AH60492 | Mus_musculus.GRCm38.dna.primary_assembly.2bit
#> AH60493 | Mus_musculus.GRCm38.dna_rm.primary_assembly.2bit
#> AH60494 | Mus_musculus.GRCm38.dna_sm.primary_assembly.2bit
#> AH60495 | Mus_musculus.GRCm38.ncrna.2bit
#> retrieve id 'AH60492'
Mmusculus <- ah[['AH60492']]- Import local FASTA genome sequence file
# load package and import fasta file as a DNAStringSet
Mmusculus <- readDNAStringSet('path/to/fasta')Note: when using DNA FASTA from Ensembl, do check header lines for entry names. If long names are given (eg. 1 dna:chromosome chromosome:GRCm38:1:1:195471971:1 REF), you would need to convert them to short names in order to have consistent sequence levels. This can be done by running the following:
3 Sample datasets
3.1 Query
factR package contains a sample assembled GTF imported as a GRanges object. The assembly was constructed from a mouse dataset and contain transcript and exon descriptions of 4 transcript isoforms from the same parent gene. Transcripts in query_gtf have been assigned generic gene_ids and transcript_ids. This is to recapitulate the nomenclature of a typical custom-assembled GTF.
# inspect query object
head(query_gtf)
#> GRanges object with 6 ranges and 3 metadata columns:
#> seqnames ranges strand | type transcript_id gene_id
#> <Rle> <IRanges> <Rle> | <factor> <character> <character>
#> [1] 10 79854427-79864432 + | transcript transcript1 GeneA
#> [2] 10 79854427-79854721 + | exon transcript1 GeneA
#> [3] 10 79856504-79856534 + | exon transcript1 GeneA
#> [4] 10 79858752-79858824 + | exon transcript1 GeneA
#> [5] 10 79858952-79859271 + | exon transcript1 GeneA
#> [6] 10 79859352-79859522 + | exon transcript1 GeneA
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
# list out transcript_id and gene_id names
unique(query_gtf$transcript_id)
#> [1] "transcript1" "transcript2" "transcript3" "transcript4"
unique(query_gtf$gene_id)
#> [1] "GeneA"Users can quickly visualize the exon structure of these query transcripts using the viewTranscripts tool, which is essentially a convenient wrapper to plot transcripts using the wiggleplotr package:
viewTranscripts(query_gtf)
#> Warning: `group_by_()` is deprecated as of dplyr 0.7.0.
#> Please use `group_by()` instead.
#> See vignette('programming') for more help
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_warnings()` to see where this warning was generated.
#> Warning: `filter_()` is deprecated as of dplyr 0.7.0.
#> Please use `filter()` instead.
#> See vignette('programming') for more help
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_warnings()` to see where this warning was generated.
#> Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
#> Please use `arrange()` instead.
#> See vignette('programming') for more help
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_warnings()` to see where this warning was generated.3.2 Reference
For our reference annotation, we will be using a subset of vM6 gencode reference annotation which contain basic transcripts from the same gene family as our query gene.
# inspect reference object
head(ref_gtf)
#> GRanges object with 6 ranges and 5 metadata columns:
#> seqnames ranges strand | type phase
#> <Rle> <IRanges> <Rle> | <factor> <integer>
#> [1] chr10 79854427-79864432 + | transcript <NA>
#> [2] chr10 79854427-79854721 + | exon <NA>
#> [3] chr10 79854714-79854721 + | CDS 0
#> [4] chr10 79854714-79854716 + | start_codon 0
#> [5] chr10 79856504-79856534 + | exon <NA>
#> [6] chr10 79856504-79856534 + | CDS 1
#> gene_id gene_name transcript_id
#> <character> <character> <character>
#> [1] ENSMUSG00000006498.14 Ptbp1 ENSMUST00000172282.5
#> [2] ENSMUSG00000006498.14 Ptbp1 ENSMUST00000172282.5
#> [3] ENSMUSG00000006498.14 Ptbp1 ENSMUST00000172282.5
#> [4] ENSMUSG00000006498.14 Ptbp1 ENSMUST00000172282.5
#> [5] ENSMUSG00000006498.14 Ptbp1 ENSMUST00000172282.5
#> [6] ENSMUSG00000006498.14 Ptbp1 ENSMUST00000172282.5
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
# list out transcript_id and gene_id names
unique(ref_gtf$transcript_id)
#> [1] "ENSMUST00000172282.5" "ENSMUST00000165704.5"
unique(ref_gtf$gene_id)
#> [1] "ENSMUSG00000006498.14"4 Matching chromosome IDs
Different genomic objects may have different chromosome IDs (aka seqlevels), depending on its source. Users may have a custom GTF with “Ensembl”-style seqlevels but wish to use a gencode reference annotation which contain “UCSC”-style seqlevels. The sample query_gtf and ref_gtf objects presents this exact predicament:
seqlevelsStyle(query_gtf)
#> [1] "NCBI" "Ensembl" "MSU6" "JGI2.F" "AGPvF"
seqlevelsStyle(ref_gtf)
#> [1] "UCSC"factR contain a tool to effectively match the seqlevels of a query GRanges object to any reference object with seqlevel information:
# matching seqlevels
matched_query_gtf <- matchChromosomes(query_gtf, Mmusculus)
head(matched_query_gtf)
#> GRanges object with 6 ranges and 3 metadata columns:
#> seqnames ranges strand | type transcript_id gene_id
#> <Rle> <IRanges> <Rle> | <factor> <character> <character>
#> [1] chr10 79854427-79864432 + | transcript transcript1 GeneA
#> [2] chr10 79854427-79854721 + | exon transcript1 GeneA
#> [3] chr10 79856504-79856534 + | exon transcript1 GeneA
#> [4] chr10 79858752-79858824 + | exon transcript1 GeneA
#> [5] chr10 79858952-79859271 + | exon transcript1 GeneA
#> [6] chr10 79859352-79859522 + | exon transcript1 GeneA
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengthsNote: matchChromosomes will drop any unmatched seqlevels from query
factR also comes with the function has_consistentSeqlevels to test if two or more objects have consistent seqlevels. This function returns a logical value of FALSE if input objects have inconsistent seqlevels and recommends a matchChromosomes function to run.
5 Matching gene metadata
Custom-built GTF annotation may contain generic gene_id and gene_name metadata (e.g. ‘MSTRG1’), depending on how it was constructed. Users who wish to fill these metadata using a reference annotation may run the matchGeneInfo function:
# matching gene metadata
matched_query_gtf <- matchGeneInfo(matched_query_gtf, ref_gtf)
#> Number of mismatched gene_ids found: 1
#> ---> Attempting to match gene_ids by finding overlapping coordinates...
#> ---> 1 gene_id matched
#> Total gene_ids corrected: 1
#> Remaining number of mismatched gene_ids: 0
head(matched_query_gtf)
#> GRanges object with 6 ranges and 6 metadata columns:
#> seqnames ranges strand | type transcript_id
#> <Rle> <IRanges> <Rle> | <factor> <character>
#> [1] chr10 79854427-79864432 + | transcript transcript1
#> [2] chr10 79854427-79854721 + | exon transcript1
#> [3] chr10 79856504-79856534 + | exon transcript1
#> [4] chr10 79858752-79858824 + | exon transcript1
#> [5] chr10 79858952-79859271 + | exon transcript1
#> [6] chr10 79859352-79859522 + | exon transcript1
#> gene_id old_gene_id match_level gene_name
#> <character> <character> <numeric> <character>
#> [1] ENSMUSG00000006498.14 GeneA 4 Ptbp1
#> [2] ENSMUSG00000006498.14 GeneA 4 Ptbp1
#> [3] ENSMUSG00000006498.14 GeneA 4 Ptbp1
#> [4] ENSMUSG00000006498.14 GeneA 4 Ptbp1
#> [5] ENSMUSG00000006498.14 GeneA 4 Ptbp1
#> [6] ENSMUSG00000006498.14 GeneA 4 Ptbp1
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengthsBy default, this function will match query transcripts to reference by overlapping coordinates. Users may improve the accuracy of this matching function by providing inputs to primary_gene_id and secondary_gene_id arguments. See ?matchGeneMeta for detailed description.
6 Extract newly-discovered transcripts from query transcriptome
Very often, custom-assembled transcriptomes contain de novo isoforms. It is useful to subset the dataset for these newly-discovered transcripts to streamline downstream steps. To perform this, run:
7 Construct query CDS information
To build CDS information on our query data, users can simply run:
new_query_gtf <- buildCDS(matched_query_gtf, ref_gtf, Mmusculus)
#> Out of 4 transcripts in `matched_query_gtf`, 4 transcript CDSs were built
#> Of which, 2 are newly-discovered CDSsThe above function will return a new GRanges object containing exon entries from query_gtf with appended cds entries. User can inspect the contents of the object further.
# check lengths of input and output GRanges
length(matched_query_gtf)
#> [1] 56
length(new_query_gtf)
#> [1] 105
# inspect CDS entries
head(new_query_gtf[new_query_gtf$type == "CDS"])
#> GRanges object with 6 ranges and 7 metadata columns:
#> seqnames ranges strand | type transcript_id
#> <Rle> <IRanges> <Rle> | <factor> <character>
#> [1] chr10 79854714-79854721 + | CDS transcript1
#> [2] chr10 79856504-79856534 + | CDS transcript1
#> [3] chr10 79858752-79858824 + | CDS transcript1
#> [4] chr10 79858952-79859271 + | CDS transcript1
#> [5] chr10 79859352-79859522 + | CDS transcript1
#> [6] chr10 79859602-79859712 + | CDS transcript1
#> gene_id old_gene_id match_level gene_name phase
#> <character> <character> <numeric> <character> <numeric>
#> [1] ENSMUSG00000006498.14 <NA> NA Ptbp1 0
#> [2] ENSMUSG00000006498.14 <NA> NA Ptbp1 1
#> [3] ENSMUSG00000006498.14 <NA> NA Ptbp1 0
#> [4] ENSMUSG00000006498.14 <NA> NA Ptbp1 2
#> [5] ENSMUSG00000006498.14 <NA> NA Ptbp1 0
#> [6] ENSMUSG00000006498.14 <NA> NA Ptbp1 0
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengthsUser can visualize the coding structure of each transcripts using the viewTranscripts tool:
The new query GTF Granges object can be exported as a .gtf file using rtracklayer’s export funcition:
8 Predicting protein domain structure
Upon construction of coding segments on transcripts, users may be interested in determining the functional domains which is encoded within the coding sequence. To this end, users can utilize the predictDomains function. This tool accepts a GTF GRanges object containing CDS information as input, together with a genome sequence object. After checking the translated protein sequence, predictDomains will query the SuperFamily database for domain features and plot its domain architecture:
In order to extract the domain features, predictDomains will attempt to submit the protein coding sequence to the SuperFamily database and return a summary of the domain structure. As this process is done over HTTP, the speed of this function is greatly dependent on the server and on the number of CDS in your list. This process can be sped up by subsetting the list of transcripts for analysis by providing logical conditions to the … argument. This argument accepts unquoted metadata column names from the GTF as variables and quoted values to filter the GTF by. For example:
pd <- predictDomains(new_query_gtf, Mmusculus, transcript_id %in% c('transcript1', 'transcript3'))
pd
#> transcript description eval begin end
#> 1 transcript1 Canonical RBD 4.96e-06 46 143
#> 2 transcript1 Canonical RBD 1.39e-05 357 446
#> 3 transcript1 Canonical RBD 1.28e-06 177 281
#> 4 transcript1 Canonical RBD 9.32e-06 469 553
#> 5 transcript3 Canonical RBD 4.96e-06 46 143
#> 6 transcript3 Canonical RBD 1.28e-06 177 281Alternatively, users may reduce the number of transcripts for the query by subsetting the input GTF for de novo transcripts.
subsetted_new_query_gtf <- subsetNewTranscripts(new_query_gtf, ref_gtf)
pd_subset <- predictDomains(subsetted_new_query_gtf, Mmusculus)
pd_subset
#> transcript description eval begin end
#> 1 transcript3 Canonical RBD 4.96e-06 46 143
#> 2 transcript3 Canonical RBD 1.28e-06 177 281
#> 3 transcript4 Canonical RBD 1.39e-05 291 380
#> 4 transcript4 Canonical RBD 1.28e-06 137 241
#> 5 transcript4 Canonical RBD 8.52e-05 39 95
#> 6 transcript4 Canonical RBD 8.32e-06 403 4879 Predicting transcript sensitivity to nonsense-mediated decay
Coding RNAs harbouring a premature stop codon are potentially eliminated by a process called Nonsense-mediated decay (NMD). Two common features of NMD targets include the upstream position of its stop_codon relative to the last exon-exon junction and the extensive length of its 3’-untranslated region (3’UTR). To report these NMD features and predict if any of the transcripts in the custom annotation are NMD-sensitive, users can run the predictNMD tool.
predictNMD can accept a GTF GRanges object as input and will only analyze coding transcripts for NMD-features.
predictNMD will output a tibble dataframe with report on NMD-inducing feature for each coding transcript. The column stop_to_lastEJ describes the relative distance of the stop_codon to the last exon-exon junction (lastEJ) of each transcript. Thus, a positive integer means that the lastEJ is downstream of the stop_codon. predictNMD also returns the number of exon-junctions downstream of the stop_codon (num_of_downEJs). 3'UTR_length describes the length of the 3’-untranslated region for each coding transcript. The column containing logical values for is_NMD will return TRUE if the stop_to_lastEJ value exceed the NMD triggering threshold which is typically 50 base-pair. Users may set a custom value by overriding the default NMD_threshold argument.
Users who wish to analyze certain genes or transcripts in their GTF GRanges for NMD features may provide logical conditions to the ... argument to subset their GTF input. This argument accepts unquoted metadata column names from the GTF as variables and quoted values to filter the GTF by. For example:
pn <- predictNMD(new_query_gtf, transcript_id == 'transcript3')
pn
#> # A tibble: 1 x 5
#> transcript stop_to_lastEJ num_of_downEJs `3'UTR_length` is_NMD
#> <chr> <dbl> <int> <dbl> <lgl>
#> 1 transcript 364 3 644 TRUEThis feature provides user the flexibility to analyze desired genes/transcripts from a large collection of GTF GRanges. Input GTF can be filtered with multiple conditions as long as variable names are found in the input GRanges object. Here are some examples on how this feature can be used on an Ensembl GTF annotation:
# retrieve latest GRCm38 annotation from ensembl
ah <- AnnotationHub()
mm10_gtf <- ah[['AH60127']]
# predict transcripts from Ptbp1 gene for NMD features
pn_Ptbp1 <- predictNMD(mm10_gtf, gene_name == 'Ptbp1')
pn_Ptbp1
#> # A tibble: 9 x 6
#> transcript stop_to_lastEJ num_of_downEJs stop_to_downEJs threeUTRlength is_NMD
#> <chr> <dbl> <int> <chr> <dbl> <lgl>
#> 1 ENSMUST00000057343 364 3 "69,286,364" 644 TRUE
#> 2 ENSMUST00000095457 -130 0 "" 1085 FALSE
#> 3 ENSMUST00000165704 -130 0 "" 1497 FALSE
#> 4 ENSMUST00000165724 286 2 "69,286" 362 TRUE
#> 5 ENSMUST00000168683 -145 0 "" 0 FALSE
#> 6 ENSMUST00000169091 -36 0 "" 0 FALSE
#> 7 ENSMUST00000169483 139 1 "139" 152 TRUE
#> 8 ENSMUST00000171599 -63 0 "" 0 FALSE
#> 9 ENSMUST00000172282 -130 0 "" 1158 FALSE
#> Warning message:
#> 7 transcript(s) have missing cds info and was not analyzed
# predict several transcripts from Ptbp1 gene for NMD features
pn_Ptbp1_transcripts <- predictNMD(mm10_gtf, gene_name == 'Ptbp1', transcript_id == c('ENSMUST00000057343', 'ENSMUST00000095457'))
pn_Ptbp1_transcripts
#> # A tibble: 2 x 6
#> transcript stop_to_lastEJ num_of_downEJs stop_to_downEJs threeUTRlength is_NMD
#> <chr> <dbl> <int> <chr> <dbl> <lgl>
#> 1 ENSMUST00000057343 364 3 "69,286,364" 644 TRUE
#> 2 ENSMUST00000095457 -130 0 "" 1085 FALSE 10 Session Information
All of the output in this vignette was produced under the following conditions:
sessionInfo()
#> R version 4.0.3 (2020-10-10)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Mojave 10.14.6
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats4 parallel stats graphics grDevices utils datasets
#> [8] methods base
#>
#> other attached packages:
#> [1] GenomicFeatures_1.42.1 AnnotationDbi_1.52.0
#> [3] Biobase_2.50.0 BSgenome.Mmusculus.UCSC.mm10_1.4.0
#> [5] BSgenome_1.58.0 rtracklayer_1.50.0
#> [7] GenomicRanges_1.42.0 GenomeInfoDb_1.26.0
#> [9] Biostrings_2.58.0 XVector_0.30.0
#> [11] IRanges_2.24.0 S4Vectors_0.28.0
#> [13] AnnotationHub_2.22.0 BiocFileCache_1.14.0
#> [15] dbplyr_2.0.0 BiocGenerics_0.36.0
#> [17] factR_0.0.0.9000
#>
#> loaded via a namespace (and not attached):
#> [1] MatrixGenerics_1.2.0 httr_1.4.2
#> [3] bit64_4.0.5 shiny_1.5.0
#> [5] assertthat_0.2.1 askpass_1.1
#> [7] interactiveDisplayBase_1.28.0 BiocManager_1.30.10
#> [9] blob_1.2.1 GenomeInfoDbData_1.2.4
#> [11] Rsamtools_2.6.0 progress_1.2.2
#> [13] yaml_2.2.1 BiocVersion_3.12.0
#> [15] pillar_1.4.6 RSQLite_2.2.1
#> [17] lattice_0.20-41 glue_1.4.2
#> [19] digest_0.6.27 promises_1.1.1
#> [21] wiggleplotr_1.14.0 colorspace_2.0-0
#> [23] htmltools_0.5.0 httpuv_1.5.4
#> [25] Matrix_1.2-18 XML_3.99-0.5
#> [27] pkgconfig_2.0.3 biomaRt_2.46.0
#> [29] bookdown_0.21 zlibbioc_1.36.0
#> [31] purrr_0.3.4 xtable_1.8-4
#> [33] scales_1.1.1 later_1.1.0.1
#> [35] BiocParallel_1.24.1 openssl_1.4.3
#> [37] tibble_3.0.4 farver_2.0.3
#> [39] ggplot2_3.3.2 generics_0.1.0
#> [41] ellipsis_0.3.1 SummarizedExperiment_1.20.0
#> [43] cli_2.1.0 magrittr_1.5
#> [45] crayon_1.3.4 mime_0.9
#> [47] memoise_1.1.0 evaluate_0.14
#> [49] fansi_0.4.1 xml2_1.3.2
#> [51] prettyunits_1.1.1 tools_4.0.3
#> [53] data.table_1.13.2 hms_0.5.3
#> [55] lifecycle_0.2.0 matrixStats_0.57.0
#> [57] stringr_1.4.0 munsell_0.5.0
#> [59] DelayedArray_0.16.0 compiler_4.0.3
#> [61] rlang_0.4.8 grid_4.0.3
#> [63] RCurl_1.98-1.2 drawProteins_1.10.0
#> [65] rappdirs_0.3.1 labeling_0.4.2
#> [67] bitops_1.0-6 rmarkdown_2.5
#> [69] gtable_0.3.0 DBI_1.1.0
#> [71] curl_4.3 R6_2.5.0
#> [73] GenomicAlignments_1.26.0 knitr_1.30
#> [75] dplyr_1.0.2
#> [ reached getOption("max.print") -- omitted 9 entries ]