IgIDivA [Immunoglobulin Intraclonal Diversification Analysis] is a purpose-built tool for the analysis of the intraclonal diversification process using high-throughput sequencing data.
It is written as a shiny application, so every step of the analysis can be performed interactively without any programming skills.
It takes as input the output files “clonotypes_computation” and “grouped_alignment_nt” from the tripr package.
Functions for command-line use in R are also available.
The IgIDivA scripts can be freely downloaded here. IgIDivA requires R [version 4.1], which can be installed from CRAN on any operating system [e.g., Linux, Windows, macOS]. Installation with Docker will be available in the future.
The following packages need to be installed in the R session:
install.packages("shiny")
install.packages("shinyFiles")
install.packages("fs")
install.packages("pdftools")
install.packages("purrr")
install.packages("DT")
install.packages("bslib")
install.packages("shinyhelper")
install.packages("data.table")
install.packages("stringr")
install.packages("RGenetics")
install.packages("dplyr")
install.packages("ggsci")
install.packages("tidygraph")
install.packages("ggraph")
install.packages("igraph")
install.packages("ggplot2")
install.packages("ggpubr")
install.packages("rstatix")
install.packages("shinyvalidate")
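Running the twenty commands above one by one reinstalls packages that are already present. As a convenience, the sketch below (not part of IgIDivA; the names `igidiva_pkgs`, `missing_packages()` and `install_missing()` are hypothetical) installs only the packages that are missing, using base R's `installed.packages()` and `install.packages()`:

```r
# Packages listed in the IgIDivA installation instructions
igidiva_pkgs <- c(
  "shiny", "shinyFiles", "fs", "pdftools", "purrr", "DT", "bslib",
  "shinyhelper", "data.table", "stringr", "RGenetics", "dplyr", "ggsci",
  "tidygraph", "ggraph", "igraph", "ggplot2", "ggpubr", "rstatix",
  "shinyvalidate"
)

# Hypothetical helper: return the packages in `pkgs` that are not installed
missing_packages <- function(pkgs) {
  pkgs[!pkgs %in% rownames(utils::installed.packages())]
}

# Hypothetical helper: install only the missing packages.
# `installer` defaults to utils::install.packages; it is a parameter so the
# behaviour can be checked without contacting CRAN.
install_missing <- function(pkgs, installer = utils::install.packages) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) installer(to_install)
  invisible(to_install)
}
```

Running `install_missing(igidiva_pkgs)` in the R session then installs only what is not yet available.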
All IgIDivA scripts need to be downloaded into the same folder. The input files should all be stored together in a separate folder.
Alternatively, IgIDivA can be installed using a conda environment. We recommend using Miniconda to install all the dependencies, which are listed in .yml format in the IgIDivA GitHub repository. The .yml file and the IgIDivA scripts need to be stored in the working folder. After downloading all the files, open a terminal and run the following commands:
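# In a terminal, create and activate the conda environment
conda env create -f IgIDivA.yml
conda activate IgIDivA
# Start R inside the environment and install the remaining packages from CRAN
R
install.packages(c("shinyvalidate", "RGenetics", "rstatix"))
q()
# Back in the terminal, launch the app from the working folder containing app.R
Rscript app.R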
This prints a URL; copying it into a web browser opens the IgIDivA app.
IgIDivA
An example dataset to be used as input for IgIDivA can be found here. The dataset comprises the tripr output files [“highly_sim_all_clonotypes” and “Grouped Alignment_nt”] of 26 chronic lymphocytic leukemia (CLL) samples [19 CLL subset #2 samples and 7 CLL subset #169 samples]. The data were retrieved from ENA under the accession number PRJEB36589 and subsequently processed with IMGT/HighV-QUEST and tripr.
Each sample’s data can be downloaded by pressing the Download button.
Alternatively, to download all the data at once, the following commands can be used in the R session:
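install.packages("zen4R")
library(zen4R)
# Folder where the example input files will be stored
path <- file.path(getwd(), "Input")
if (!dir.exists(path)) dir.create(path)
# Download all files of the Zenodo record into that folder
download_zenodo("10.5281/zenodo.6616046", path = path)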
[The variable “path” can be changed to any location where the user wants to store the input files.]
Note: RStudio may show warnings that the downloaded length of some files != reported length. This means that those files were only partially downloaded [probably because of a slow Internet connection]. One solution is to increase R's download timeout by running this command before the download:
options(timeout = max(600, getOption("timeout")))
IgIDivA as a shiny application
To start the shiny app, open the script app.R in RStudio and press the Run App button.
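Outside RStudio, the app can also be started from an interactive R session with shiny's `runApp()`. The sketch below assumes, hypothetically, that the scripts were downloaded into a folder called `~/IgIDivA`; adjust the path to the actual download location.

```r
# Hypothetical location of the downloaded IgIDivA scripts [including app.R];
# replace it with the folder the scripts were actually downloaded into.
igidiva_dir <- file.path("~", "IgIDivA")

# shiny::runApp() serves the app found in `igidiva_dir` and, with
# launch.browser = TRUE, opens it in the default web browser. The guard only
# starts the (blocking) app in an interactive session where app.R is present.
if (interactive() && file.exists(file.path(igidiva_dir, "app.R"))) {
  shiny::runApp(igidiva_dir, launch.browser = TRUE)
}
```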
In this tab users can create the folders where the results will be stored and import their data.
First, the user should specify the Results folder. To do so, the user can navigate to the folder on their computer where the output should be stored and select Copy address as text:
Then, the copied path should be pasted into the Enter desired path here field, followed by a “/” and the name of the Results folder that the user wants to create [e.g. “Results”]. If a folder with this name does not exist, it will be created:
Then, press the Create Results Path button.
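The behaviour described above [append the folder name to the copied path and create the folder if it does not exist] can also be reproduced in R, for example to prepare the folder beforehand. This is a sketch, not IgIDivA's own code; `create_results_path()` is a hypothetical helper, and `file.path()` takes care of inserting the “/” separator.

```r
# Hypothetical helper mirroring the "Enter desired path here" step:
# join the copied folder path and the Results folder name, and create the
# folder if it does not exist yet.
create_results_path <- function(base_dir, folder_name = "Results") {
  results_dir <- file.path(base_dir, folder_name)
  if (!dir.exists(results_dir)) {
    dir.create(results_dir, recursive = TRUE)
  }
  normalizePath(results_dir)
}
```

Calling it twice with the same arguments is harmless: the second call finds the existing folder and simply returns its path.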
The next step is to select the Input folder. Following the same approach as for the Results folder, the user enters the path where the input files are stored. For each sample, the tool takes as input the tripr output files “highly similar clonotype computation” and “grouped alignment nt” in text format (.txt).
The input folder is selected:
Once the Input folder address has been entered, users should confirm it by pressing the Upload button. Users can then choose which samples from the Input folder to include in the analysis.
Users should then confirm the selection by pressing the Verify button. Please follow the steps in this order: if the Results folder is changed, the Verify button must be pressed again for the selected samples.
To compare groups of samples, the user needs to create a tab-delimited file with two columns.
The first column should be named “sample_id” and should contain the names of the samples.
The second column should contain the name of the group each sample belongs to. By default, this column is named “group_name”, but the name can be changed in the Enter the name chosen for the second column field. The file would look like this:
An example file can be found here as “SampleGroups.txt”; its samples correspond to the example dataset described above.
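The two-column file can be written with any spreadsheet or text editor, but it can also be produced from R. The sketch below uses `utils::write.table()` with tab separators; `write_sample_groups()` is a hypothetical helper, and the sample IDs and group names are placeholders rather than the samples of the example dataset. The `group_col` argument corresponds to a custom name typed in the Enter the name chosen for the second column field.

```r
# Hypothetical helper: write a tab-delimited sample/group file for IgIDivA
write_sample_groups <- function(sample_ids, groups, file, group_col = "group_name") {
  # One group per sample, and each sample listed only once
  stopifnot(length(sample_ids) == length(groups), !anyDuplicated(sample_ids))
  tab <- data.frame(sample_id = sample_ids, group = groups, stringsAsFactors = FALSE)
  names(tab)[2] <- group_col  # default "group_name", can be customised
  utils::write.table(tab, file, sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(file)
}

# Placeholder sample IDs and groups, written to a temporary file
groups_file <- tempfile(fileext = ".txt")
write_sample_groups(
  sample_ids = c("sample1", "sample2", "sample3"),
  groups = c("groupA", "groupA", "groupB"),
  file = groups_file
)
readLines(groups_file)
```

The first line of the written file is the header `sample_id<TAB>group_name`, followed by one tab-separated line per sample.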
Once created, the file can be uploaded with the Browse button. When the upload finishes, the message “Upload completed” appears. Then, open the “Set Parameters” tab.
Several parameters can be set:
Clonotypes to be taken into account for the analysis:
This option lets the user choose which clonotypes to include in the analysis. For example, only the first [most frequent] clonotype can be included. The default is 1.
Several analysis options can be selected:
Several metrics [or related calculations] can be computed to describe and determine the level of intraclonal diversification:
Then, the user can choose which of the graph metrics to use for comparisons between groups of samples.
Once all the parameters have been selected, press the Start button. A progress bar shows how much of the analysis has been completed.
The Reset button can be used to start a new analysis by resetting the parameters [the output results are reset when the Start button is pressed].
When the analysis is finished, a notification with the message ‘File conversion in progress…’ appears. This conversion makes the visualizations viewable in the Visualize Results tab. Once it is complete, the user is automatically redirected to the Visualize Results tab.
This tab shows all the output results and lets the user select which result and which sample to visualize. All the output results are saved locally in the Results folder selected earlier.
For each sample, it shows the number of related clonotypes [clonotypes with the same IGV gene and very similar CDR3] considered for the analysis, the number of nucleotide variants [nt vars] included, the total number of sequences, the number of singletons [nucleotide variants constituted by only one sequence], the number of expanded nucleotide variants [nucleotide variants constituted by more than one sequence], the number of sequences belonging to expanded nucleotide variants, and the number of reads of the main nucleotide variant. [Example shown: sample H33].
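Most of these sample-level counts follow directly from the number of sequences in each nucleotide variant. A minimal sketch, assuming a hypothetical input vector `seq_counts` with one entry per nucleotide variant, and assuming that the main nucleotide variant is the most abundant one; `summarise_nt_vars()` is a hypothetical helper, not an IgIDivA function.

```r
# `seq_counts`: hypothetical input, number of sequences in each nucleotide variant
summarise_nt_vars <- function(seq_counts) {
  stopifnot(is.numeric(seq_counts), length(seq_counts) > 0, all(seq_counts >= 1))
  expanded <- seq_counts > 1  # expanded = more than one sequence
  c(
    nt_vars               = length(seq_counts),
    total_sequences       = sum(seq_counts),
    singletons            = sum(seq_counts == 1),
    expanded_nt_vars      = sum(expanded),
    sequences_in_expanded = sum(seq_counts[expanded]),
    # Assumption: the main nucleotide variant is the most abundant one
    main_nt_var_reads     = max(seq_counts)
  )
}

summarise_nt_vars(c(120, 4, 1, 1, 2))
```

For the made-up counts in the example, this gives 5 nucleotide variants, 128 sequences, 2 singletons, 3 expanded variants holding 126 sequences, and 120 reads for the main variant.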
For each sample, it shows the number of nt vars with additional somatic hypermutations [SHMs] for each number of SHMs, together with the corresponding number of sequences. The total numbers of nt vars and sequences are also included. [Example shown: sample H33].
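This table is essentially a tabulation of nt vars and sequences by number of SHMs. The sketch below reproduces that kind of summary with base R's `tapply()`; `tabulate_additional_shms()` and its inputs `n_additional` [number of additional SHMs per nt var] and `seq_counts` [sequences per nt var] are hypothetical, not IgIDivA's internal data structures.

```r
# Hypothetical inputs, one element per nt var with additional SHMs:
#   n_additional: number of additional SHMs; seq_counts: number of sequences
tabulate_additional_shms <- function(n_additional, seq_counts) {
  stopifnot(length(n_additional) == length(seq_counts))
  # tapply() groups by n_additional; numeric groups come out in increasing order
  nt_vars   <- tapply(seq_counts, n_additional, length)
  sequences <- tapply(seq_counts, n_additional, sum)
  per_level <- data.frame(
    additional_shms = names(nt_vars),
    nt_vars = as.integer(nt_vars),
    sequences = as.numeric(sequences),
    stringsAsFactors = FALSE
  )
  # Final row with the totals over all numbers of SHMs
  total <- data.frame(
    additional_shms = "Total",
    nt_vars = sum(per_level$nt_vars),
    sequences = sum(per_level$sequences),
    stringsAsFactors = FALSE
  )
  rbind(per_level, total)
}

tabulate_additional_shms(n_additional = c(1, 1, 2, 3), seq_counts = c(2, 1, 1, 5))
```

With these made-up inputs, the rows are 1 SHM: 2 nt vars and 3 sequences; 2 SHMs: 1 and 1; 3 SHMs: 1 and 5; Total: 4 and 9.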
For each sample, it shows the number of sequences lacking SHMs of the main nt var, for each different number of SHMs. [Example shown: sample H33].
For each sample, it provides information for all unique SHMs or combinations of SHMs of all the nt vars that are part of the connected graph network. It also shows the number of SHMs in comparison to the germline, the number of sequences with those SHMs and the mutational level to which they belong. The mutational level is “less” if they have fewer SHMs than the main nt var, “main” for the SHMs of the main nt var, and “additional” for the cases with more SHMs than the main nt var. [Example shown: sample AMRMES].
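The three mutational levels come from comparing the number of SHMs of each combination with that of the main nt var. A minimal sketch of that rule; `mutational_level()` is a hypothetical helper, and because the description does not say how a combination with the same number of SHMs as the main nt var but different SHMs is labelled, treating it as “main” is an assumption.

```r
# Hypothetical helper: label each SHM combination by comparing its number of
# SHMs [relative to the germline] with that of the main nt var.
# Assumption: a combination with the same number of SHMs as the main nt var is
# treated as the main nt var itself.
mutational_level <- function(n_shms, main_n_shms) {
  ifelse(n_shms < main_n_shms, "less",
         ifelse(n_shms > main_n_shms, "additional", "main"))
}

mutational_level(c(4, 6, 7, 9), main_n_shms = 6)
# "less" "main" "additional" "additional"
```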
It provides information on the replacement SHMs in the main nt var of each sample, together with the number of sequences carrying each mutation. [Example shown: sample H33].
It contains all identified replacement SHMs in the main nt var of all the samples. It can be used to identify mutational patterns among samples. [Example shown: all samples from example dataset].
It contains all identified replacement SHMs in the nt vars with additional SHMs [excluding the ones of the main nt var]. [Example shown: sample H33].
It contains all identified replacement SHMs in the nt vars with additional SHMs [excluding the ones of the main nt var] for all the samples. It can be used to identify mutational patterns among samples. [Example shown: all samples from example dataset].
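The all-sample tables of replacement SHMs are meant for spotting mutational patterns across samples. One simple way to do this, shown here as a sketch rather than IgIDivA's method, is to count in how many samples each replacement SHM occurs. The input format [a named list with one character vector of SHM labels per sample], the helper `shared_shms()` and the SHM labels in the example are hypothetical.

```r
# `shm_by_sample`: hypothetical named list, one character vector of
# replacement SHM labels per sample
shared_shms <- function(shm_by_sample) {
  long <- data.frame(
    sample = rep(names(shm_by_sample), lengths(shm_by_sample)),
    shm = unlist(shm_by_sample, use.names = FALSE),
    stringsAsFactors = FALSE
  )
  long <- unique(long)  # count each SHM at most once per sample
  counts <- table(long$shm)
  out <- data.frame(
    shm = names(counts),
    n_samples = as.integer(counts),
    stringsAsFactors = FALSE
  )
  # Most widely shared SHMs first, ties broken alphabetically
  out <- out[order(-out$n_samples, out$shm), , drop = FALSE]
  rownames(out) <- NULL
  out
}

shared_shms(list(
  s1 = c("S31N", "T57A"),
  s2 = "S31N",
  s3 = c("S31N", "G65D")
))
```

In this made-up example, S31N is reported as present in all three samples, while G65D and T57A each occur in one.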
For each sample, it contains the germline identity %, the values of the graph metrics as well as information related to those metrics. [Example shown: sample H33].
It shows the graph metrics values for all the samples. If a sample has been discarded, the cause is provided. [Example shown: all samples from example dataset].
For each sample, it shows the graph network. [Example shown: sample H33].
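The text does not say how nt vars are connected in the graph network. Purely as an illustration, and as an assumption that may not match IgIDivA, the sketch below links two nt vars when their sets of SHMs differ by exactly one SHM and returns the result as an edge list. `build_shm_edges()` and its input format [a named list with one character vector of SHM labels per nt var] are hypothetical.

```r
# Hypothetical input: named list, one character vector of SHM labels per nt var.
# Assumed rule: two nt vars are connected if their SHM sets differ by one SHM.
build_shm_edges <- function(shm_sets) {
  ids <- names(shm_sets)
  n <- length(shm_sets)
  from <- character(0)
  to <- character(0)
  if (n >= 2) {
    for (i in seq_len(n - 1)) {
      for (j in seq(i + 1, n)) {
        a <- shm_sets[[i]]
        b <- shm_sets[[j]]
        # Size of the symmetric difference = number of SHMs not shared
        n_diff <- length(setdiff(a, b)) + length(setdiff(b, a))
        if (n_diff == 1) {
          from <- c(from, ids[i])
          to <- c(to, ids[j])
        }
      }
    }
  }
  data.frame(from = from, to = to, stringsAsFactors = FALSE)
}

build_shm_edges(list(v1 = "a", v2 = c("a", "b"), v3 = c("a", "b", "c"), v4 = c("x", "y")))
```

Here v1-v2 and v2-v3 are connected and v4 stays isolated. Under this assumption, an empty edge list would mean the nt vars have no connections at all. The edge list could be passed to igraph's `graph_from_data_frame()` [with its `vertices` argument, so that isolated nt vars are kept] for plotting or computing graph metrics.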
If the parameter “Separate graphs” is selected, the graph network is split into two parts [nt vars with fewer SHMs than the main nt var on the left and nt vars with additional SHMs on the right]. For example [sample H33]:
If samples are classified into groups, the tool performs pairwise comparisons between all groups, independently for each graph metric. [Example shown: all samples from example dataset].
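The statistical test used for these pairwise comparisons is not specified here. As an illustration only, the sketch below runs a separate pairwise Wilcoxon rank-sum test for each metric with base R's `pairwise.wilcox.test()`, adjusting p-values with the Benjamini-Hochberg method. The choice of test, the adjustment method, the helper `compare_metrics()` and the metric column names are all assumptions, not IgIDivA's documented behaviour.

```r
# `metrics`: one row per sample with a "sample_id" column, a group column and
# one numeric column per graph metric (metric column names are hypothetical)
compare_metrics <- function(metrics, group_col = "group_name", p_adjust = "BH") {
  metric_cols <- setdiff(names(metrics), c("sample_id", group_col))
  # Run the pairwise group comparisons independently for each metric
  lapply(stats::setNames(metric_cols, metric_cols), function(m) {
    stats::pairwise.wilcox.test(
      metrics[[m]], metrics[[group_col]],
      p.adjust.method = p_adjust
    )
  })
}

# Made-up values for two hypothetical metrics in three groups
example <- data.frame(
  sample_id = paste0("sample", 1:9),
  group_name = rep(c("groupA", "groupB", "groupC"), each = 3),
  metric_x = c(1.1, 1.3, 1.2, 2.5, 2.7, 2.6, 3.9, 3.8, 4.1),
  metric_y = c(0.50, 0.70, 0.60, 0.55, 0.65, 0.75, 0.90, 0.95, 0.85)
)
results <- compare_metrics(example)
results$metric_x$p.value  # adjusted p-values, one cell per pair of groups
```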
It provides the names of samples that have been discarded from the analysis [e.g. samples with no connections among nt vars].