This is an R Markdown Notebook with exercises written for the workshop Validation – Experimental Design and Analysis Using STR-validator held at the 29th Congress of the International Society for Forensic Genetics (ISFG) in Washington, DC, Tuesday August 30th 2022 at 14:00 - 18:00.
These exercises were written using STR-validator version 2.4 running in 64-bit R version 4.2.1 on a Windows 10 operating system. Screenshots may vary for other systems, and the procedure may differ for other versions of STR-validator. The exercises have an easy difficulty level with no quality issues or errors. Each exercise can be performed using the STR-validator graphical user interface or, in the section that follows it, directly in R from the command line using the strvalidator package. To aid the learning process, identical names are used for data sets in STR-validator and the command line exercises.
The following software is required
The statistical software R: https://cloud.r-project.org/
The integrated development environment RStudio Desktop: https://www.rstudio.com/products/rstudio/download/#download
The R package strvalidator version 2.4.0:
install.packages("strvalidator", dependencies = TRUE)
install.packages("devtools", dependencies = TRUE)
library(devtools)
install_github("OskarHansson/strvalidator")
library(strvalidator)
## STR-validator 2.4.0 loaded!
strvalidator()
Installation instructions can also be found at the STR-validator webpage.
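Because the procedure may differ between versions, it can help to confirm which version is installed before starting. The lines below are a minimal sketch using base R only; the variable names has_pkg and installed are illustrative, and 2.4.0 is simply the version stated above.

```r
# Check whether strvalidator is installed without loading it.
has_pkg <- nzchar(system.file(package = "strvalidator"))

if (has_pkg) {
  # Report the installed version and compare it with the workshop version.
  installed <- utils::packageVersion("strvalidator")
  message("strvalidator ", as.character(installed), " is installed.")
  if (installed < "2.4.0") {
    message("This is older than the version used for the exercises (2.4.0).")
  }
} else {
  message("strvalidator is not installed; see the install commands above.")
}
```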
Using the strvalidator package or its graphical user interface STR-validator:
There are multiple ways of setting up and running STR-validator. Either of the alternatives below can be used. The benefit of alternative a. is that you can also perform other tasks in R. RStudio is recommended for new users; however, plots and viewed tables will show up within the RStudio Plots and Viewer tabs, respectively. In alternative b., plots and tables will always show up in separate windows.
Through RStudio, RGui, or any other graphical user interface to R.
Open the R software of your choice by double-clicking its program icon.
Load the strvalidator R package by typing the following code in the console window. Press [Enter] to execute the command.
library(strvalidator)
strvalidator()
The STR-validator graphical user interface can be set up to launch from an icon, as a stand-alone software. The instructions below are for RGui. The procedure in RStudio is slightly different. However, it is always the RGui icon that is modified in the end.
Open up a new session of RGui.
Clear the workspace using the menu Misc/Remove all objects, or the R command:
rm(list=ls(all=TRUE))
Open a new script window using the menu File/New script.
In the new script window type or copy paste this code:
.First <- function(){
library(strvalidator)
strvalidator()
}
Load the function using the menu Edit/Run all, or [Ctrl]+[A] to mark the code and [Ctrl]+[R] to run it.
Save the workspace as .RData (i.e. just a “file extension”) in your Documents folder using the menu File/Save workspace….
Copy the RGui shortcut/icon to the desktop.
Right click the RGui shortcut located on the desktop and select Properties.
Select the Shortcut tab.
Completely clear the text field Start in:.
Locate the Target field containing the full path to the RGui executable file. It looks similar to this:
"C:\Program Files\R\R-4.2.1\bin\x64\Rgui.exe" --cd-to-userdocs
Change the Rgui.exe part of the path to Rterm.exe.
Click [OK] to exit and save.
Name the shortcut STR-validator.
To open STR-validator all you have to do is double click the new STR-validator icon. NB! Closing the R terminal window will also close STR-validator. Minimize it and use it for progress information and diagnostics. STR-validator will print useful messages to the terminal window.
Example Data
The authentic direct amplification data of FTA punches used for the exercises was kindly provided by Siv Gilfillan, Forensic Department, Oslo University Hospital. The DNA profiles have been anonymized by scrambling the alleles.
The input files can be downloaded from https://sites.google.com/site/forensicapps/strvalidator/2022-isfg.
Conventions
[Bold] indicates buttons in the STR-validator graphical user interface.
Italic indicates text or selectable options in the STR-validator graphical user interface.
Italic_and_bold indicates data sets within the STR-validator graphical user interface, the R global environment, or file or folder names within the operating system.
“Quotation marks” indicate text written in option fields or text that should be typed into option fields.
STR-validator refers to the graphical user interface of the strvalidator R package. STR-validator is opened by the command strvalidator().
The analytical threshold should be based on signal-to-noise analyses of internally derived empirical data. An analytical threshold defines the minimum height requirement at and above which detected peaks can be reliably distinguished from background noise. Because the analytical threshold is based upon a distribution of noise values, it is expected that occasional, non-reproducible noise peaks may be detected above the analytical threshold. Usage of an exceedingly high analytical threshold increases the risk of allelic data loss. SWGDAM Interpretation Guidelines for Autosomal STR Typing by Forensic DNA Testing Laboratories (APPROVED 01/12/2017).
To estimate analytical thresholds, non-template PCR controls, positive PCR controls, or samples with known profiles can be used. A higher noise level is expected with DNA-containing samples. It is important to mask the input data from noise directly related to the ILS (in negative controls) or to known alleles (in positive controls and samples). This is done by excluding signals close to the ILS or in the stutter range from the calculation. Likewise, it is important to verify the assumption of normally distributed data under the method of choice. When the assumptions are met, a risk level can be assigned to different ATs so that an informed decision can be made about the AT to implement. To cover all capillaries of the instrument, a full plate can be run. In our experience, the level of noise is relatively stable across time.
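To make the link between a noise distribution and a risk level concrete, the following sketch computes a threshold as the mean noise plus k standard deviations and the share of normally distributed noise expected above it. The noise values and k = 3 are made up for illustration; this is a simplified stand-in, not the calculateAT implementation, and it only holds if the noise is approximately normal.

```r
# Hypothetical masked noise peak heights in RFU (not workshop data).
noise <- c(4, 6, 5, 7, 9, 5, 6, 8, 4, 6)

# Number of standard deviations above the mean (assumed value).
k <- 3

# Threshold = mean noise + k * standard deviation of the noise.
at_k <- mean(noise) + k * sd(noise)

# Expected share of normally distributed noise above the threshold.
risk <- pnorm(k, lower.tail = FALSE)

print(round(at_k, 1))
print(signif(risk, 2))
```

Raising k lowers the risk of calling noise as a peak but, as the SWGDAM text above points out, increases the risk of losing allelic data.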
The data set in this exercise is from 8 negative (non-template) PCR controls amplified using the SureID27 kit according to recommendations.
The results from the capillary electrophoresis were analysed in GeneMapper using a peak amplitude threshold (PAT) of 1 RFU. No stutter filter or global cut-off was applied.
During manual inspection of the EPGs, it was noted that weak pull-up or bleed-through from the internal lane standard (ILS) was common in the yellow and purple dye channels for negative controls (<10 data points from the corresponding ILS peak).
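The idea of masking signals close to the ILS can be sketched in a few lines of base R. The peak table, the ILS positions, and the variable names below are hypothetical; only the range of 10 data points comes from the observation above. This illustrates the principle and is not how strvalidator performs the masking internally.

```r
# Hypothetical noise peaks: data point (scan number) and height in RFU.
peaks <- data.frame(
  Data.Point = c(1500, 2003, 2600, 3198, 3900),
  Height     = c(5, 48, 7, 35, 6)
)

# Hypothetical data points of ILS peaks in the same sample.
ils_points <- c(2000, 3200)

# Mask range in data points on each side of an ILS peak.
range_ils <- 10

# Flag a peak as masked if it lies within the range of any ILS peak.
peaks$Masked <- vapply(
  peaks$Data.Point,
  function(dp) any(abs(dp - ils_points) <= range_ils),
  logical(1)
)

# Keep only unmasked peaks for the noise calculation.
noise <- peaks$Height[!peaks$Masked]
print(peaks)
print(noise)
```

In this made-up example the two comparatively high peaks sit within a few data points of an ILS peak and are excluded, which keeps pull-up from inflating the noise estimate.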
This exercise is estimated to take approximately 10 minutes and will teach:
Single file import with automatic trimming of ladders and conversion to the STR-validator format.
Saving the STR-validator project.
Estimation of analytical thresholds from negative controls, excluding evaluation of assumptions for the distribution of noise (Mönich et al. 2015).
Plotting of examples and saving images.
Additional Learning Resources
STR-validator exercise: For a more comprehensive exercise that additionally teaches data manipulation tools of STR-validator, analysis of noise from positive controls, and verification of assumptions for the distribution of noise, see exercise analytical_thresholds in ghep-isfg2018_exercises_easy.zip available at the STR-validator GHEP-ISFG 2018 webpage.
STR-validator video tutorial: Estimation of analytical thresholds (2017).
Publications: Mönich et al. (2015), Bregu et al. (2013), and Rakay, Bregu, and Grgicak (2012)
The data is available as an exported GeneMapper SamplePlotSizingTable (refer to the guideline Estimate Analytical Thresholds Using STR-validator for instructions) named sureid_neg_SamplePlotSizingTable.txt
Open STR-validator graphical user interface.
Import the data set into STR-validator.
Select the Workspace tab and click the [Import] button.
Use the [Select file] button to locate the file in the input_files folder. Select the file sureid_neg_SamplePlotSizingTable.txt and click [OK].
Check the Auto trim samples option to remove the allelic ladders. Change the search string to “neg” and uncheck the Invert (remove matching samples) option.
The Auto slim repeated columns option can be unchecked.
The filename “sureid_neg” is suggested as Name in the Save options group.
Click the [Import] button.
The import from files dialogue.
Save the project.
Select the Workspace tab and click the [Save As] button.
Browse to the desired folder where you want to save your project and click the [Select folder] button.
The Save as dialog opens. Give the project a suitable name and click [ok]. A confirmation dialog is shown to indicate successful saving of the project (the save process can take some time). Click [OK] to dismiss.
It is advised to regularly click the [Save] button. The save path is remembered within each project. If the project is moved to another location, use the [Save As] button to update the path.
Estimate analytical thresholds.
Select the AT tab and click the [Calculate] button, to Calculate analytical threshold (AT1, AT2, AT4, AT7).
Select the sureid_neg data set.
A reference data set is not needed for negative control samples.
The kit should be automatically detected as SureID27.
The Ignore case option can be left checked, and the Add word boundaries option can be left unchecked. These are name matching options for the reference data set.
Check the option to Mask high peaks and set the corresponding value Mask all peaks above (RFU) to “200” (a value well above any noise signal).
Uncheck the option to Mask sample alleles. This is used when a reference data set is provided.
Check the Mask ILS peaks option and set the corresponding value Range around known peak to “10” (the range should come from the manual inspection in GeneMapper to include pull-up peaks from the internal lane standard).
Leave the threshold values at their defaults. The analysis can be repeated using different values.
Manually inspect the masking by first clicking the [Prepare and mask] button. Then select a sample to plot in the drop-down list. In RStudio the plot appears in the Plots tab in the lower right corner. If Rgui is used the plot appears in its own window. Pay attention to masked areas and excluded peaks. Select a few other random samples to inspect. You should be confident that the mask settings exclude any pull-up and sporadic high peaks from the analysis.
Select a representative plot and click the [Save plot] button.
Accept the default names for the result files and click the [Calculate] button.
The calculate analytical threshold dialogue.
Find the estimated ATs.
Note: Evaluation of the assumptions for the distribution of noise should be performed before selecting a method to estimate AT. In this example, AT7 is the preferred method since the transformed data was approximately normally distributed.
Click the [View] button.
Select the sureid_neg_at data set.
Maximize the window and scroll to the right. The most interesting columns are the Global.AT7 which is the AT7 method estimate using the noise signals from all samples, and the estimates per dye Global.B.AT7, Global.G.AT7, Global.Y.AT7, and Global.R.AT7.
The result can be copied or exported to a text file or spreadsheet software.
The estimated analytical thresholds.
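The note above recommends evaluating the distributional assumptions before choosing a method. One possible check, sketched here with base R, is a Shapiro-Wilk test and a normal Q-Q plot on transformed noise heights. The data are simulated, and the log transformation is an assumption made for illustration; the exercise text only states that the transformed data was approximately normal.

```r
# Simulated noise heights for illustration only (log-normal by construction).
set.seed(1)
noise <- rlnorm(200, meanlog = 1.8, sdlog = 0.4)

# Shapiro-Wilk test on the raw and on the log-transformed heights.
p_raw <- shapiro.test(noise)$p.value
p_log <- shapiro.test(log(noise))$p.value
print(c(raw = p_raw, log = p_log))

# Visual check: points close to the line suggest approximate normality.
qqnorm(log(noise), main = "Normal Q-Q plot of log(noise)")
qqline(log(noise))
```

A small p-value argues against normality on that scale. With real data, the noise heights would be taken from the masked data returned by the analysis rather than simulated.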
Open RStudio and a new R Script document.
Load the strvalidator package:
library(strvalidator)
Import the data set to a variable using the import function. The import.file parameter should be modified to point to the file to be imported.
sureid_neg <- import(import.file = params$path.to.sample.plot.sizing.table,
file.name = TRUE, time.stamp = TRUE,
auto.trim = TRUE, trim.samples = "neg", trim.invert = FALSE,
auto.slim = FALSE)
Estimate analytical thresholds using the calculateAT function:
sureid_neg_res <- calculateAT(data = sureid_neg, ref = NULL, mask.height = TRUE, height = 200,
mask.sample = FALSE, per.dye = TRUE, range.sample = 20,
mask.ils = TRUE, range.ils = 10, k = 3, rank.t = 0.99, alpha = 0.01,
ignore.case = TRUE, word = FALSE)
The result is a list of three data.frames, namely: 1) the estimated analytical thresholds, 2) the ranked list of noise signals, and 3) the input data with masking information. To print the result we need to get the first data.frame from the result list. Then extract estimates for the method of choice, in this case AT7, and print the result.
# Extract the first data.frame from the list of results.
sureid_neg_at <- sureid_neg_res[[1]]
# Extract the first row in the data.frame.
sureid_neg_at <- sureid_neg_at[1,]
# Extract all columns containing "AT7".
sureid_neg_at <- sureid_neg_at[,grepl(names(sureid_neg_at), pattern = "AT7")]
# Extract all columns containing "Global".
sureid_neg_at <- sureid_neg_at[,grepl(names(sureid_neg_at),pattern = "Global")]
# Show values rounded to one decimal.
print(round(sureid_neg_at,1))
## Global.AT7 Global.B.AT7 Global.G.AT7 Global.Y.AT7 Global.R.AT7
## 1 39 25.4 29.3 34.7 53.1
For verification of electrophoretic equipment it is recommended that “the precision of the instrument should be such that all measured alleles fall within a +/- 0.5 bp window around the measured size for the corresponding allele in the allelic ladder”. ENFSI Recommended Minimum Criteria for the Validation of Various Aspects of the DNA Profiling Process (ISSUE DATE: November 2010).
Allele Sizing Precision can be estimated by running allelic ladders as samples. For example, in order to cover all capillaries of the instrument, run at least one full injection of allelic ladders. Repeat the experiment at a different time. Analyse all ladders together to estimate the precision of the instrument.
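The +/- 0.5 bp criterion can be illustrated on a toy table of ladder sizes. In the sketch below the marker name, alleles, and sizes are invented, and the mean size per allele is used as a stand-in for the reference ladder size, which is a simplifying assumption.

```r
# Hypothetical ladder sizes (bp) for two alleles of a made-up marker.
ladder <- data.frame(
  Marker = "M1",
  Allele = rep(c("10", "11"), each = 4),
  Size   = c(120.10, 120.15, 120.05, 120.12, 124.20, 124.95, 124.18, 124.22)
)

# Mean size per marker and allele, repeated for each observation.
ladder$Mean <- ave(ladder$Size, ladder$Marker, ladder$Allele)

# Deviation of each observation from its allele mean.
ladder$Deviation <- ladder$Size - ladder$Mean

# Flag observations outside a +/- 0.5 bp window.
ladder$Outside <- abs(ladder$Deviation) > 0.5
print(ladder[ladder$Outside, ])
```

Any flagged row points to an allele and injection worth inspecting in the electropherogram.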
The data set in this exercise is from 4 allelic ladders from the SureID27 kit.
The results from the capillary electrophoresis were analysed in GeneMapper using the previously determined analytical threshold (AT).
This exercise is estimated to take approximately 15 minutes and will teach:
Other Learning Resources
STR-validator video tutorial: Estimation of allele sizing precision (2017).
Publications: Ensenberger et al. (2016)
The data is available as an exported GeneMapper Genotypes Table named sureid_ladder.txt
Open STR-validator graphical user interface.
Import the data set into STR-validator.
Select the Workspace tab and click the [Import] button.
Use the [Select file] button to locate the file in the input_files folder. Select the file sureid_ladder.txt and click [OK].
Check the Auto trim samples option to extract the allelic ladders. Change the search string to “ladder”. Uncheck the Invert option to extract sample names containing the string “ladder”.
Check the Auto slim repeated columns option.
The filename “sureid_ladder” is suggested as Name in the Save options group.
Click the [Import] button. Tip: the R terminal window shows the path to the imported file, which is especially useful when importing multiple files.
The Import from file dialogue.
Filter the data
Select the Precision tab and click the [Filter] button.
Select the sample data set sureid_ladder. The kit SureID27 should be automatically detected.
Select the filter option Filter by kit bins (allelic ladder) to remove any peak other than what is defined in the kit bins. The Reference sample name matching option Ignore case can be left checked, while Exact matching and Add word boundaries can be unchecked.
The option to Exclude virtual bins should be checked to retain peaks that correspond to physical fragments present in the ladder.
Uncheck any pre-processing and post-processing options.
Accept the default name for the result and click the [Filter] button.
The Filter profile dialogue.
Calculate summary statistics
In the Precision tab, click the [Statistics] button to Calculate summary statistics for Size.
The sureid_ladder_filter (i.e. most recent) data set should be automatically selected.
The target column should be set to Size (repeat the process if you want to calculate for Height or Data.Point; there are also dedicated buttons that open Calculate summary statistics with pre-loaded options).
The Group by column(s) should be set to “Marker,Allele”. Note: if entered manually the column names must be separated by a comma, without spaces.
Leave the Count unique values in column drop down menu unselected. The Calculate quantile and Round to decimals can be left with the default values.
Accept the default name for the result and click the [Calculate] button.
The Calculate summary statistics dialogue.
Calculate and add the absolute difference
Select the Tools tab and click the [Columns] button.
Select the sureid_ladder_filter_stats data set.
Select Size.Max as column 1 and Size.Min as column 2.
Type “Size.Diff” as Column for new values.
Select the Action “-” (subtract).
Remove the appended “_new” in the Name for result to overwrite the data set.
Click the [Execute] button. Click [Yes] on the question to overwrite.
The Column actions dialogue.
Locate the worst sizing precision
Select the Workspace tab.
Select the sureid_ladder_filter_stats data set.
Click the [View] button.
In RStudio the data set is shown in the Viewer tab in the lower right pane. Click the button Show in new window (an arrow on a window icon) for a larger table (alternatively, use the Zoom icon).
Click the column header Size.Diff to sort in ascending order. Click again to sort in descending order. The minimum and maximum absolute difference respectively, can be read at the first row, and the corresponding marker in the Marker column. The sorted tables can be exported to different formats.
The interactive table viewer.
Plot sizing precision
Select the Precision tab and click the [Plot] button.
Select the sureid_ladder_filter data set. Tip: if you forgot to filter the data set, and there are “OL” alleles in the data set, a warning will be shown.
Check the Plot by marker option.
The X-axis should be Allele.
Change the Plot theme to theme_bw(). Further customization can be done by expanding the Data points, Axes, and Override default x/y/facet labels option groups.
Click the [Size] button in the Plot precision data as dotplot group. If we have more data per allele, we could use a boxplot instead. In RStudio the plot is shown in the Plots tab located in the lower right pane. Click the Zoom button to view the plot in a larger window. It is possible to export the plot from RStudio. Alternatively, the plot can be exported from STR-validator as described below.
In the Plot precision window, click the [Save as image] button.
Set the file extension to png image.
Uncheck the Overwrite existing file and Load size from plot device options.
Manually set the Width to “40” and Height to “20” cm. Leave the default “300” for Resolution and “1” for Scaling factor.
Click [file] to locate a folder to save the image (e.g. output_files) and click [Select Folder]. Click the [Save] button to save the image.
The settings for the Save as image dialogue will be remembered.
(Optional) In the Plot precision window, click the [Save as object] button to save the plot object in the workspace. The object can be viewed or manipulated at a later time.
Close the Plot precision window.
The Plot precision dialogue.
Example of a precision plot.
Open RStudio and a new R Script document.
Load the strvalidator package:
library(strvalidator)
Import the data set to a variable using the import function. The import.file parameter should be modified to point to the file to be imported.
sureid_ladder <- import(import.file = params$path.to.precision.table,
file.name = TRUE, time.stamp = TRUE,
auto.trim = TRUE, trim.samples = "ladder", trim.invert = FALSE,
auto.slim = TRUE)
Filter any additional peaks using the filterProfile function:
# Get markers, bins and flag for virtual bins.
ref <- getKit(kit = "SureID27", what = "VIRTUAL")
# Extract physical bins.
ref <- ref[ref$Virtual == 0, ]
# Filter using the bins as known profile.
sureid_ladder_filter <- filterProfile(data = sureid_ladder, ref = ref)
Calculate the allele sizing precision using the calculateStatistics function:
sureid_ladder_filter_stats <- calculateStatistics(data = sureid_ladder_filter, target = "Size", quant = 0.5,
group = c("Marker","Allele"), count = NULL, decimals = 4)
Calculate and add the absolute difference:
# Calculate and add the absolute difference to the result.
sureid_ladder_filter_stats$Size.Diff <- sureid_ladder_filter_stats$Size.Max - sureid_ladder_filter_stats$Size.Min
View the result as interactive table (possible to sort and export) in RStudio Viewer tab:
# Load DT package (https://rstudio.github.io/DT/).
library(DT)
# Convert to a DT object.
DT.pr.stats <- datatable(sureid_ladder_filter_stats,
rownames = FALSE, filter = "top", extensions = "Buttons",
options = list(dom = "Bfrtip", buttons = c("copy", "csv", "excel", "pdf", "print"))) %>% formatRound(9, 2) # Round column 9 to 2 decimals.
# Show interactive table in RStudio Viewer tab.
DT.pr.stats
Alternatively, print manually sorted result in the RStudio Console tab:
# Load data.table package (https://rdatatable.gitlab.io/data.table/).
library(data.table)
# Convert to a data.table object.
DT.pr.stats <- data.table(sureid_ladder_filter_stats)
# Markers/alleles with the highest precision.
DT.pr.stats[order(Size.Diff, Marker, Allele)]
# Markers/alleles with the lowest precision.
DT.pr.stats[order(-Size.Diff, Marker, Allele)]
View the result as interactive plot in RStudio Viewer tab:
# Load the plotting package (https://plotly.com/r/getting-started/).
library(plotly)
# Convert to data.table
DT.pr <- data.table(sureid_ladder_filter)
# Calculate mean and deviation from mean, by marker and allele, and add the result to the data set.
DT.pr[, c("Mean", "Size.n") := list(mean(Size), .N), by = .(Marker, Allele)]
DT.pr[, c("Deviation") := list(Mean - Size), by = .(Marker, Allele)]
# Create and show interactive plot.
fig.pr <- plot_ly(data = DT.pr, x = ~Allele, y = ~Deviation, color = ~Marker, type = "scatter")
fig.pr
Lower template DNA may cause extreme heterozygote imbalance; as such, empirical heterozygote peak-height ratio data could be used to formulate mixture interpretation guidelines and determine the appropriate ratio by which two peaks are determined to be heterozygotes. SWGDAM Validation Guidelines for DNA Analysis Methods (Approved 12/05/2016).
The peak balance ratios of heterozygote alleles within a locus and of alleles between all loci should be >60% for good quality samples. ENFSI Recommended Minimum Criteria for the Validation of Various Aspects of the DNA Profiling Process (ISSUE DATE: November 2010).
Peak height imbalances may be seen in the typing results from, for example, a primer binding site variant that results in attenuated amplification of one allele of a heterozygous pair. Likewise, degraded, inhibited, and/or low level single-source DNA samples may exhibit poor peak height balance with heterozygous alleles. SWGDAM Interpretation Guidelines for Autosomal STR Typing by Forensic DNA Testing Laboratories (APPROVED 01/12/2017).
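As a small worked example of a heterozygote balance calculation, the sketch below defines Hb as the smaller peak divided by the larger peak and flags markers below the 60% guideline quoted above. The marker names and peak heights are invented for illustration.

```r
# Hypothetical heterozygous peak heights (RFU) for three made-up markers.
peaks <- data.frame(
  Marker   = c("M1", "M2", "M3"),
  Height.1 = c(1200, 800, 450),
  Height.2 = c(1000, 400, 500)
)

# Hb defined as smaller peak / larger peak, so 0 < Hb <= 1.
peaks$Hb <- pmin(peaks$Height.1, peaks$Height.2) /
  pmax(peaks$Height.1, peaks$Height.2)

# Flag markers below the 60% guideline.
peaks$Below.60 <- peaks$Hb < 0.6
print(peaks)
```

Other definitions of Hb exist, for example based on molecular weight order, so the definition should always be reported together with the results.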
The data set in this exercise comes from authentic FTA reference samples. 85 samples were amplified using the SureID 27 kit. A reference data set with known profiles is available.
The results from the capillary electrophoresis were analysed in GeneMapper using the previously determined analytical threshold (AT), without applied stutter filter and with no global cut-off.
This exercise is estimated to take approximately 15 minutes and will teach:
Additional Learning Resources
STR-validator video tutorial: Analysis of balance (2017).
Publications: Bright, Turkington, and Buckleton (2010), Bright et al. (2014), Hansson, Egeland, and Gill (2017)
The data is available as an exported GeneMapper Genotypes Table file named sureid_data.txt
The reference data set is in a file named sureid_ref.txt
Open STR-validator graphical user interface.
Import samples and references into STR-validator.
Select the Workspace tab and click the [Import] button.
Use the [Select file] button to locate the file in the input_files folder. Select the file sureid_data.txt and click [Open].
The exercise text file sureid_data.txt contains only the samples and is already in the slim STR-validator format, so the Auto trim samples and Auto slim options can be unchecked. Normally, these options are checked to remove control samples and ladders and to convert the file from the semi-wide GeneMapper Table format to the slim STR-validator format.
The filename “sureid_data” is suggested as Name for dataset in the Save as group.
Click the [Import] button. Tip: the R terminal window shows the path to the imported file, which is especially useful when importing multiple files.
Use the [Select file] button to locate the file in the input_files folder. Select the file sureid_ref.txt and click [Open].
The filename “sureid_ref” is suggested as Name for dataset in the Save as group.
Use the same settings as previously and click the [Import] button.
The Import from files dialogue.
Calculate heterozygote peak balance (Hb).
Select the Balance tab. In the Heterozygote balance (intra-locus) function group, click the [Calculate] button.
Select the sureid_data data set.
Select the sureid_ref reference data set.
(Optional) Click the [Check subsetting] button to confirm that samples are matched with the correct reference. Tip: This is especially useful when one reference should match multiple samples, using sample name matching options.
Uncheck the pre-processing options Remove sex markers and Remove quality sensors. The SureID 27 kit does not contain Y-markers or quality sensors.
Select Smaller peak / larger peak in the drop-down menu Define Hb as.
Leave the Sample name matching options unchanged (the settings do not matter in this example).
(Optional) To speed up the analysis, the Post-processing option to Calculate average peak height can be unchecked if not needed. Tip: The calculated Proportion can be a useful quality control that all profiles are complete.
Accept the default name for the result and click the [Calculate] button.
The Calculate heterozygote balance dialogue.
Plot the results.
Click the [Plot] button located in the Heterozygote balance (intra-locus) function group.
Select the sureid_data_hb data set.
Leave the Exclude sex markers option unchecked to keep Amelogenin, and the Log(balance) option unchecked to plot normal ratios.
Select the Do not facet or wrap option.
Click the [Hb vs. Marker] button. In RStudio the plot is shown in the Plots tab located in the lower right pane. Click the Zoom button to view the plot in a larger window. It is possible to export the plot from RStudio. Alternatively, the plot can be exported from STR-validator as described below. Tip: Inspect the plot to check that the result is reasonable and if there are unexpected outliers indicating errors or quality issues.
Click the [Save as image] button.
Set the file extension to png image.
Uncheck the Overwrite existing file and Load size from plot device options.
Manually set the Width to “40” and Height to “20” cm. Leave the default “300” for Resolution and “1” for Scaling factor.
Click [file] to locate a folder to save the image (e.g. output_files) and click [Select Folder]. Click the [Save] button to save the image.
The settings for the Save as image dialogue will be remembered.
(Optional) In the Plot balance window, click the [Save as object] button to save the plot object in the workspace. The object can be viewed or manipulated at a later time.
Close the Plot balance window.
The Plot balance dialogue.
Example of a heterozygous balance plot.
Calculate summary statistics.
Click the [Statistics] Calculate summary statistics by marker button located in the Heterozygote balance (intra-locus) function group.
The options, including the sureid_data_hb data set (i.e. the most recent data set), should be pre-filled according to the screenshot.
Accept the default name for the result and click the [Calculate] button.
(Optional) Click the [View] button to view, copy, or export the result.
(Optional) Click the [Statistics] Calculate global summary statistics button.
The sureid_data_hb data set must be manually selected (since it is no longer the most recent data set).
Add “_global” to the suggested Name for result and click the [Calculate] button.
Click the [View] button to view the minimum Hb for the data set and the overall percentile.
The Calculate summary statistics dialogue.
Calculate the inter-locus (profile) balance.
Click the [Calculate] Calculate profile balance button located in the Profile balance (inter-locus) function group.
Select the sureid_data sample data set.
Select the sureid_ref reference data set. Tip: a reference data set is not needed, in which case the sum of peak heights, including artefacts, in each marker will be used to calculate the locus balance. This will speed up the analysis.
Check the pre-processing option Remove off-ladder alleles, uncheck the options Remove sex markers to keep Amelogenin, and uncheck Remove quality sensors.
Select to Calculate locus balance as Normalised and check the option to Calculate Lb by dye channel.
Leave the options for Reference sample name matching unchanged.
Check the post-processing option to Calculate average peak height (this option requires a reference data set).
Add “_norm_dye” to the suggested Name for result. Tip: A descriptive name is useful in case you want to calculate with multiple options.
Click the [Calculate] button.
The Calculate locus balance dialogue.
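One way to read the Normalised and by dye channel options is sketched below on a toy profile. The definition used here, each marker's total peak height divided by the largest marker total within the same dye channel, is an assumption for illustration and should be checked against the calculateLb documentation. The marker names, dyes, and heights are invented.

```r
# Hypothetical total peak height (RFU) per marker for one profile.
profile <- data.frame(
  Marker = c("M1", "M2", "M3", "M4"),
  Dye    = c("B", "B", "G", "G"),
  TPH    = c(2000, 1500, 1800, 900)
)

# Assumed definition: marker total divided by the largest marker total
# within the same dye channel.
profile$Lb <- profile$TPH / ave(profile$TPH, profile$Dye, FUN = max)
print(profile)
```

Under this reading the strongest marker in each dye channel gets the value 1, and weaker markers are expressed as a fraction of it.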
Plot the results.
Click the [Plot] button located in the Profile balance (inter-locus) function group.
Select the sureid_data_lb data set.
Leave the Exclude sex markers option unchecked to keep Amelogenin, and the Log(balance) option unchecked to plot normal ratios.
Select the Do not facet or wrap option.
Click the [Lb vs. Marker] button. In RStudio the plot is shown in the Plots tab located in the lower right pane. Click the Zoom button to view the plot in a larger window. It is possible to export the plot from RStudio. Alternatively, the plot can be exported from STR-validator as described below. Tip: Inspect the plot to check that the result is reasonable and if there are unexpected outliers indicating errors or quality issues.
Click the [Save as image] button.
Set the file extension to png image.
Uncheck the Overwrite existing file and Load size from plot device options.
Manually set the Width to “40” and Height to “20” cm. Leave the default “300” for Resolution and “1” for Scaling factor.
Click [file] to locate a folder to save the image (e.g. output_files) and click [Select Folder]. Click the [Save] button to save the image.
The settings for the Save as image dialogue will be remembered.
(Optional) In the Plot balance window, click the [Save as object] button to save the plot object in the workspace. The object can be viewed or manipulated at a later time.
Close the Plot balance window.
The Plot balance dialogue.
Example of a locus balance plot.
Open RStudio and a new R Script document.
Load the strvalidator package:
library(strvalidator)
Import the data set and reference profiles to variables using the import function. The import.file parameter points to the file to be imported.
sureid_data <- import(import.file = params$path.to.data,
file.name = TRUE, time.stamp = TRUE,
auto.trim = FALSE, trim.samples = NULL, trim.invert = FALSE,
auto.slim = FALSE)
sureid_ref <- import(import.file = params$path.to.ref,
file.name = TRUE, time.stamp = TRUE,
auto.trim = FALSE, trim.samples = NULL, trim.invert = FALSE,
auto.slim = FALSE)
Calculate the heterozygote balance using the calculateHb function:
sureid_data_hb <- calculateHb(data = sureid_data, ref = sureid_ref, hb = 3,
kit = "SureID27", sex.rm = FALSE, qs.rm = FALSE,
ignore.case = TRUE, word = FALSE, exact = FALSE)
View the result as interactive plot in RStudio Viewer tab:
# Load the plotting package (https://plotly.com/r/getting-started/).
library(plotly)
# Sort by marker order defined in kit. Add factors according to kit information.
sureid_data_hb <- sortMarker(data = sureid_data_hb, kit = "SureID27", add.missing.levels = TRUE)
# Convert to data.table
DT.hb <- data.table::data.table(sureid_data_hb)
# Create interactive plot.
sureid_data_hb_plotly <- plotly::plot_ly(data = DT.hb, x = ~Marker, y = ~Hb,
color = ~Dye, type = "box",
colors = c("blue", "lightgreen", "goldenrod", "red"),
text = ~paste("Sample: ", Sample.Name))
# Show interactive plot.
sureid_data_hb_plotly
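The Hb values plotted above are ratios of the two allele peak heights at a heterozygous marker. The following is a minimal sketch of the arithmetic only, not the calculateHb implementation. It assumes that hb = 3 means the smaller peak divided by the larger peak (see ?calculateHb to confirm); the peak heights and column names are made up.

```r
# Hypothetical peak heights (RFU) for the two alleles of heterozygous markers.
toy <- data.frame(
  Sample.Name = c("S1", "S1", "S2"),
  Marker = c("D3S1358", "TH01", "D3S1358"),
  Height.1 = c(1200, 800, 950),
  Height.2 = c(1000, 1000, 950)
)

# Assumed definition for hb = 3: smaller peak divided by larger peak.
toy$Hb <- pmin(toy$Height.1, toy$Height.2) / pmax(toy$Height.1, toy$Height.2)

print(toy)
```

With this definition Hb is always between 0 and 1, and a value of 1 means perfectly balanced peaks.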
Calculate summary statistics using the calculateStatistics function:
sureid_data_hb_stats <- calculateStatistics(data = sureid_data_hb, target = "Hb", quant = 0.05,
group = c("Marker"), count = NULL, decimals = 4)
# Load DT package (https://rstudio.github.io/DT/).
library(DT)
# Convert to a datatable object (a new name avoids overwriting the data.table DT.hb).
dt.hb.stats <- DT::datatable(sureid_data_hb_stats)
# Show interactive table.
dt.hb.stats
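To illustrate what a per-marker summary with a lower quantile looks like, here is a minimal base R sketch. It is not the calculateStatistics implementation, and the set of statistics, the column names, and the toy values are assumptions; only the grouping by marker, quant = 0.05, and rounding to 4 decimals mirror the call above.

```r
# Hypothetical heterozygote balance values for two markers.
toy_hb <- data.frame(
  Marker = c("D3S1358", "D3S1358", "D3S1358", "TH01", "TH01", "TH01"),
  Hb = c(0.80, 0.90, 1.00, 0.70, 0.85, 0.95)
)

# Summary statistics for one group of values.
summarise_hb <- function(x, quant = 0.05) {
  c(n = length(x), min = min(x), mean = mean(x), sd = sd(x),
    quantile = unname(quantile(x, probs = quant)))
}

# Apply the summary per marker and bind the results into one matrix.
toy_stats <- do.call(rbind, lapply(split(toy_hb$Hb, toy_hb$Marker), summarise_hb))
toy_stats <- round(toy_stats, 4)
print(toy_stats)
```

A low quantile such as 0.05 is a natural choice for Hb, since it is the poorly balanced (low) values that are of interest.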
Calculate the profile balance using the calculateLb function:
sureid_data_lb <- calculateLb(data = sureid_data, ref = sureid_ref,
option = "norm", by.dye = TRUE,
ol.rm = TRUE, sex.rm = FALSE, qs.rm = FALSE, na = NULL,
kit = "SureID27",
ignore.case = TRUE, word = FALSE, exact = FALSE)
View the result as an interactive plot in the RStudio Viewer tab:
# Sort by marker order defined in kit. Add factors according to kit information.
sureid_data_lb <- sortMarker(data = sureid_data_lb, kit = "SureID27", add.missing.levels = TRUE)
# Convert to data.table
dt.lb <- data.table::data.table(sureid_data_lb)
# Create interactive plot.
sureid_data_lb_plotly <- plotly::plot_ly(data = dt.lb, x = ~Marker, y = ~Lb,
color = ~Dye, type = "box",
colors = c("blue", "lightgreen", "goldenrod", "red"),
text = ~paste("Sample: ", Sample.Name))
# Show interactive plot.
sureid_data_lb_plotly
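The following toy example sketches one way to read the Lb values plotted above. It assumes that option = "norm" together with by.dye = TRUE divides each marker's summed peak height by the largest marker sum within the same sample and dye; this interpretation should be checked against ?calculateLb. The data and column names are made up, and this is not the calculateLb implementation.

```r
# Hypothetical summed peak heights (RFU) per marker for one sample.
toy <- data.frame(
  Sample.Name = "S1",
  Marker = c("D3S1358", "TH01", "D21S11", "D18S51"),
  Dye = c("B", "B", "G", "G"),
  Height.Sum = c(2000, 1500, 1800, 900)
)

# Assumed normalisation: divide each marker sum by the largest
# marker sum in the same sample and dye channel.
group_max <- ave(toy$Height.Sum, toy$Sample.Name, toy$Dye, FUN = max)
toy$Lb <- toy$Height.Sum / group_max

print(toy)
```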
The stutter ratio is the ratio of the stutter peak height to the corresponding allele peak height. In general, stutter peaks have to be lower than the % of the allele peak height indicated by the manufacturer of the kit to be ignored as a biological artefact of the sample. Source: ENFSI Recommended Minimum Criteria for the Validation of Various Aspects of the DNA Profiling Process (issue date: November 2010). Based on internal verification of the kit, it may be necessary to adjust the stutter ratios provided by the manufacturer.
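As a minimal sketch of this definition, the ratio can be computed directly from peak heights. The values and column names below are made up for illustration and do not come from the exercise data.

```r
# Hypothetical peak heights (RFU): one parent allele and its stutter peaks.
toy <- data.frame(
  Marker = "D3S1358",
  Allele = 16,
  HeightA = 2000,
  Stutter = c(15, 17, 14),
  HeightS = c(160, 30, 10)
)

# Stutter ratio = stutter peak height / parent allele peak height.
toy$Ratio <- toy$HeightS / toy$HeightA

# Stutter type as the difference in repeat units (-1 = one repeat shorter).
toy$Type <- toy$Stutter - toy$Allele

print(toy)
```

Here the one-repeat backward stutter has a ratio of 0.08, that is, 8% of the parent allele peak height.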
The data set in this exercise comes from 85 authentic FTA reference samples amplified using the SureID 27 kit. A reference data set with known profiles is available.
The results from the capillary electrophoresis were analysed in GeneMapper using the previously determined analytical threshold, without applied stutter filter, and with no global cut-off.
This exercise is estimated to take approximately 10 minutes and will teach:
Additional Learning Resources
STR-validator video tutorial: Analysis of stutter ratios (2017).
Publications: Brookes et al. (2012), Gibb et al. (2009), Klintschar and Wiegand (2003)
The result is available as an exported GeneMapper Genotypes Table named sureid_data.txt
The reference data set is in a file named sureid_ref.txt
Open STR-validator graphical user interface.
Import samples and references into STR-validator.
Select the Workspace tab and click the [Import] button.
Use the [Select file] button to locate the file in the input_files folder. Select the file sureid_data.txt and click [Open].
The exercise text file sureid_data.txt contains only the samples and is already in the slim STR-validator format, so the Auto trim samples and Auto slim options can be unchecked. Normally, these options are checked to remove control samples and ladders and to convert the file from the semi-wide GeneMapper Table format to the slim STR-validator format.
The filename “sureid_data” is suggested as Name for dataset in the Save as group.
Click the [Import] button. Tip: The R terminal window shows the path to the imported file, which is especially useful when importing multiple files.
Use the [Select file] button to locate the file in the input_files folder. Select the file sureid_ref.txt and click [Open].
The filename “sureid_ref” is suggested as Name for dataset in the Save as group.
Use the same settings as previously and click the [Import] button.
The Import from files dialogue.
Calculate stutter ratios.
Select the Stutter tab and click the [Calculate] button.
Select the sureid_data data set.
Select the sureid_ref reference data set.
(Optional) Click the [Check subsetting] button to confirm that samples are matched with the correct reference. Tip: This is especially useful when one reference should match multiple samples, using sample name matching options.
Set the option to calculate stutter ratios within the range “2” backward stutters to “1” forward stutter.
Select the radio button No overlap between stutter and alleles.
(Optional) The Replace false stutters table can be customized to fix artefacts from the numeric allele minus stutter calculation, which does not take into account the number of base pairs in a repeat unit.
Accept the default name for the result and click the [Calculate] button.
The Calculate stutter ratio dialogue.
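The need for the replacement table can be shown with a small example. For a tetranucleotide marker, allele 9.3 is one base pair shorter than allele 10, but the numeric difference 9.3 - 10 is -0.7. The sketch below applies the replacement values from the command-line version of this exercise. The helper fix_type is hypothetical and is not part of strvalidator.

```r
# Replacement table: numeric differences and the values that replace them.
val_replace <- c(-1.9, -1.8, -1.7, -0.9, -0.8, -0.7, 0.9, 0.8, 0.7)
val_by <- c(-1.3, -1.2, -1.1, -0.3, -0.2, -0.1, 0.3, 0.2, 0.1)

# Hypothetical helper: stutter type with 'false' differences replaced.
fix_type <- function(allele, stutter) {
  raw <- stutter - allele
  # Compare in tenths to avoid floating-point mismatches.
  idx <- match(round(raw * 10), round(val_replace * 10))
  ifelse(is.na(idx), round(raw, 1), val_by[idx])
}

# Numeric difference -0.7, replaced because the peak is 1 bp shorter.
print(fix_type(allele = 10, stutter = 9.3))
# Full repeat difference, left unchanged.
print(fix_type(allele = 10, stutter = 9))
```

The replacement values assume four base pairs per repeat unit, so markers with other repeat lengths would need a different table.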
Plot the results.
Click the [Plot] button.
Select the sureid_data_stutter data set.
Leave the Exclude sex markers option checked to exclude Amelogenin, and all other options unchecked.
Click the [Ratio vs. Allele] button. In RStudio the plot is shown in the Plots tab located in the lower right pane. Click the Zoom button to view the plot in a larger window. It is possible to export the plot from RStudio. Alternatively, the plot can be exported from STR-validator as described below. Tip: Inspect the plot to check that the result is reasonable and if there are unexpected outliers indicating errors or quality issues.
Click the [Save as image] button (the [Save as object] button is disabled due to a limitation in the plot function).
Set the file extension to png image.
Uncheck the Overwrite existing file and Load size from plot device options.
Manually set the Width to “40” and Height to “20” cm. Leave the default “300” for Resolution and “1” for Scaling factor.
Click [Open] to locate a folder to save the image (e.g. output_files) and click [Select Folder]. Click the [Save] button to save the image.
Close the Plot stutter ratios window.
The Plot stutter ratios dialogue.
Example of a stutter plot.
Calculate summary statistics.
Click the [Statistics] button (Calculate summary statistics by marker and stutter type).
The options, including the sureid_data_stutter data set (i.e. the most recent data set), should be pre-filled according to the screenshot.
Click the [Calculate] button.
(Optional) Click the [View] button to view, copy, or export the result.
The Calculate summary statistics dialogue.
Open RStudio and a new R Script document.
Load the strvalidator package:
library(strvalidator)
Calculate stutter ratios using the calculateStutter function:
# Replace 'false' stutters.
val_replace <- c(-1.9, -1.8, -1.7, -0.9, -0.8, -0.7, 0.9, 0.8, 0.7)
val_by <- c(-1.3, -1.2, -1.1, -0.3, -0.2, -0.1, 0.3, 0.2, 0.1)
# Calculate stutters.
sureid_data_stutter <- calculateStutter(data = sureid_data, ref = sureid_ref,
back = 2, forward = 1, interference = 0,
replace.val = val_replace, by.val = val_by)
Calculate summary statistics using the calculateStatistics function:
# Calculate summary statistics.
sureid_data_stutter_stat <- calculateStatistics(data = sureid_data_stutter, target = c("Ratio"),
group = c("Marker", "Type"), count = c("Allele"),
quant = 0.95, decimals = 4)
View the result as an interactive table (possible to sort and export) in the RStudio Viewer tab:
# Sort by marker order defined in kit. Add factors according to kit information.
sureid_data_stutter_stat <- sortMarker(data = sureid_data_stutter_stat, kit = "SureID27", add.missing.levels = TRUE)
# Load the DT package.
library(DT)
# Convert to DT for interactive tables.
DT::datatable(sureid_data_stutter_stat, rownames = FALSE, filter = 'top', extensions = 'Buttons',
options = list(dom = 'Bfrtip', buttons = c('copy', 'csv', 'excel', 'pdf', 'print')),
caption = 'Table 1: This is a simple caption for the table.') %>%
formatRound(5:9, 3) # Round columns 5-9 to 3 decimals.
The kit configuration with marker ranges is an important feature of a kit. It is easy to create illustrative figures including one or multiple kits.
To enable plotting of kit configuration in STR-validator, the kit of interest must be included in the kit configuration file. The file is installed with the strvalidator package, usually at C:\..\Dokument\R\win-library\4.1\strvalidator\extdata\kit.txt. It is possible to edit the file manually using a text editor or spreadsheet software. However, the easiest way of adding new kits is to use the Manage kits function, found under the DryLab tab by clicking the [Kits] button. The GeneMapper Bins and Panels file provided by the manufacturer of the kit is required. NB! If you have customized the kit configuration file, keep a backup, since the file is overwritten whenever a new version of strvalidator is installed.
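A backup of the kit configuration file can be made from R before updating the package. The following is a minimal sketch. The helper backup_file and the backup folder are hypothetical, and the file location is assumed to follow the extdata path given above.

```r
# Copy a file to a backup folder with the current date prepended to its name.
backup_file <- function(path, backup_dir) {
  if (!nzchar(path) || !file.exists(path)) {
    return(NA_character_)
  }
  dir.create(backup_dir, showWarnings = FALSE, recursive = TRUE)
  target <- file.path(backup_dir,
                      paste0(format(Sys.Date(), "%Y%m%d"), "_", basename(path)))
  ok <- file.copy(from = path, to = target, overwrite = TRUE)
  if (ok) target else NA_character_
}

# Locate kit.txt in the installed package ("" if strvalidator is not installed).
kit_path <- system.file("extdata", "kit.txt", package = "strvalidator")

# Hypothetical backup location; choose a permanent folder outside the R library.
kit_backup <- backup_file(kit_path, file.path(tempdir(), "strvalidator_backup"))
print(kit_backup)
```

The function returns the path to the backup copy, or NA if the source file was not found.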
This exercise is estimated to take approximately 5 minutes and will teach:
Currently there is no strvalidator function to create kit configuration plots using the command line.
Open STR-validator graphical user interface.
Create kit configuration plot.
Select the DryLab tab and click the [Kits] button.
In the Select kits option group, check the box for “SureID27”.
Change the Kit name size and Marker name size to “6”, the Marker height to “0.4”, and the Marker transparency to “0.5”. Type “sureid_ggplot” in the Name for result field.
Click the [Plot] button. In RStudio the plot is shown in the Plots tab located in the lower right pane. Click the Zoom button to view the plot in a larger window. It is possible to export the plot from RStudio. Alternatively, the plot can be exported from STR-validator as described below.
Click the [Save as image] button.
Set the file extension to png image.
Uncheck the Overwrite existing file and Load size from plot device options.
Manually set the Width to “40” and Height to “20” cm. Leave the default “300” for Resolution and “1” for Scaling factor.
Click [Open] to locate a folder to save the image (e.g. output_files) and click [Select Folder]. Click the [Save] button to save the image.
Close the Plot kit window.
The Plot kit dialogue.
Example of a kit plot.
Virtual bins correspond to allele positions not physically present in the ladder. These have been verified by the manufacturer, or added by the laboratory.