worksheet_lec4

Quality Control

Illumina

We begin our examination of Illumina BeadArray data with quality control of bead-level data. One initial approach is to create a graphical representation of the BeadArrays. For demonstration purposes, we use data from the R package beadarrayExampleData (Dunning, 2011).

First, we install the required packages. Additionally, we install the RColorBrewer package (Neuwirth, 2011), which contains different color palettes.

## Installation of the packages
BiocManager::install("BiocInstaller")
install.packages("RColorBrewer")
BiocManager::install("illuminaHumanv2.db")
## Load the packages
library(beadarray)
Loading required package: BiocGenerics
Loading required package: generics

Attaching package: 'generics'
The following objects are masked from 'package:base':

    as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
    setequal, union

Attaching package: 'BiocGenerics'
The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs
The following objects are masked from 'package:base':

    anyDuplicated, aperm, append, as.data.frame, basename, cbind,
    colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
    get, grep, grepl, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, saveRDS, table, tapply, unique,
    unsplit, which.max, which.min
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: hexbin
Welcome to beadarray version 2.58.0
library(illuminaHumanv3.db)
Loading required package: AnnotationDbi
Loading required package: stats4
Loading required package: IRanges
Loading required package: S4Vectors

Attaching package: 'S4Vectors'
The following object is masked from 'package:utils':

    findMatches
The following objects are masked from 'package:base':

    expand.grid, I, unname
Loading required package: org.Hs.eg.db
library(beadarrayExampleData)
library(RColorBrewer)
## Load the data
data(exampleBLData)
## Create color palette
mycol <- brewer.pal(3, "Set1")

We then generate an image of the second BeadArray using the imageplot() function from the beadarray package (Dunning et al., 2007).

## Using the package name and :: ensures that the correct function is called.  
## There is a potential naming conflict with a function of the same name from the package "marray".  

beadarray::imageplot(exampleBLData, array = 2, low = mycol[2], high = mycol[1])

Warning

The more packages you use in an R session, the greater the risk that objects (functions, datasets, etc.) will have identical names. All R packages must have what is called a namespace. By specifying the package name and using the operator ::, you can ensure that the correct function from the intended package is called. You can use find() or getAnywhere() function to locate all functions with the same name across different packages.

An artifact appears on the left side of the selected BeadArray. Specifically, the BASH algorithm by Cairns et al. (2008) can also be used to identify spatial artifacts. The beads identified in this way are referred to as masked and can be graphically displayed using the showArrayMask function.

Graphical representation of the positions of the masked beads of an Illumina BeadArray.

## Create a plot of the masked bead types
showArrayMask(exampleBLData, array = 2)

Another option is to use the function outlierplot(), which allows you to display so-called outliers graphically. According to Illumina’s definition, outliers are beads whose signal intensities deviate by more than three times the MAD (Median Absolute Deviation) from the median signal intensity of their respective bead type. In the resulting plot: Blue dots represent beads with signal intensities lower than expected (downward outliers). Red dots represent beads with intensities higher than expected (upward outliers).

Graphical representation of the positions of the outlier beads of an Illumina BeadArray.

## Create outlier plot
outlierplot(exampleBLData, 
            array = 2,
            lowOutlierCol = mycol[2],
            highOutlierCol = mycol[1])
65570  outliers found on the section

In the two preceding images, the artifact identified in the first image is clearly visible again, this time in the form of masked beads or outlier beads.

As with most array technologies, a BeadArray also contains various control probes; besides the actual probes for genes, there are built-in technical control probes to help assess array quality see Illumina Inc. (2010). We can identify the corresponding bead types using the function makeControlProfile(). We can use this function to automatically list and categorize all control beads on the array

Code for identifying controls:

# load the packages 
library(illuminaHumanv2.db)
library(illuminaHumanv3.db)
library(beadarray)
## Controls for BeadArrays Version 2
tab2 <- beadarray::makeControlProfile("Humanv2")  
Annotating control probes using package illuminaHumanv2.db Version:1.26.0
table(tab2$Tag)

             biotin             cy3_hyb high_stringency_hyb        housekeeping 
                  2                   8                   1                  14 
           labeling  low_stringency_hyb            negative 
                  8                   9                 758 
## Controls for BeadArrays Version 3
tab3 <- beadarray::makeControlProfile("Humanv3")
Annotating control probes using package illuminaHumanv3.db Version:1.26.0
table(tab3$Tag)

            biotin            cy3_hyb       housekeeping           labeling 
                 2                  6                  7                  2 
low_stringency_hyb           negative 
                 8                759 

Labeling controls: only high if the activation oligos were added; otherwise ≈ background (near green lines). Low-stringency PM vs MM2: PM > MM2. Cy3 hybridization controls: above background ⇒ good hybridization. Negative controls: define background/noise (green lines); should be low/tight. Positive controls (biotin, housekeepers): clearly above background; large spread or low levels may flag issues.

You can get a quick overview of the expression levels of all control probes using the function makeQCTable. By default, this function calculates the mean (Mean) and the standard deviation (Sd) for each control type.

makeQCTable(exampleBLData)
Annotating control probes using package illuminaHumanv3.db Version:1.26.0
             Mean:biotin Mean:cy3_hyb Mean:housekeeping Mean:labeling
4613710017_B    12.57368     11.49332          13.03082      5.127434
4616494005_A    11.80727     10.71637          12.37393      5.603400
             Mean:low_stringency_hyb Mean:negative Sd:biotin Sd:cy3_hyb
4613710017_B                8.859853      5.312527 0.5521571   2.461711
4616494005_A                6.990128      5.618631 0.6034373   2.580935
             Sd:housekeeping Sd:labeling Sd:low_stringency_hyb Sd:negative
4613710017_B        1.838744   0.5666269              2.100780   0.5985871
4616494005_A        1.372190   1.0209530              1.369125   0.9208625

The function makeQCTable() summarizes Illumina BeadArray control probe intensities for each array. It calculates the mean and standard deviation for control types such as biotin, Cy3 hybridization, housekeeping, labeling, and negative probes. Positive controls (e.g., biotin, Cy3_hyb) should show high signal intensities, whereas negative controls should remain low. Comparing these means across arrays allows for quick assessment of labeling quality, hybridization success, and overall array consistency.

A slightly clearer overview can be obtained through a graphical representation in the form of boxplots, which is generated by the function combinedControlPlot().

Boxplots of the signal intensities of control bead types.

## Create a boxplot overview of all control types
combinedControlPlot(exampleBLData)
Annotating control probes using package illuminaHumanv3.db Version:1.26.0

By adding special oligonucleotides to the processed sample, the labeling controls can be activated. If this is not done—as apparently in the present case—these control bead types show expression at background level, whose boundary region is indicated by three green lines. The low-stringency bead types occur in pairs: “perfect match” (PM) and “mismatch” (MM2). Here, the expression of the PM bead types should be higher than that of the MM2 bead types.

There are also Cy3 hybridization controls that indicate whether the hybridization succeeded. Accordingly, the expression of these bead types should lie above background intensity. Furthermore, one distinguishes between positive and negative controls. In the figure, the negative controls appear as green lines, representing background intensity and background noise. The positive controls should have signal intensities clearly above background. Illumina distinguishes, among others, between housekeepers and biotin bead types. The expression of these special control bead types can also be visualized with the function poscontPlot.

Graphical representation of the signal intensities of the positive controls of an Illumina BeadArray.

## Plot of the positive controls
poscontPlot(exampleBLData, array = 2, colList = mycol[2:3],
            main = "Positive Controls")
Annotating control probes using package illuminaHumanv3.db Version:1.26.0
## Add legend to the plot
legend("topright", legend = c("Housekeeper", "Biotin"), fill = mycol[2:3])

The possibilities for quality control of bead-level data offered by the beadarray package are very diverse and can be controlled and further refined via corresponding parameters of the presented functions.

In many cases, however, one does not have the bead-level data, but only has access to Bead-Summary data. We will describe the quality control of this type of data in more detail below. We will once again use example data from the ‘beadarrayExampleData’ package (Dunning (2011)) and thereby demonstrate the capabilities of the ‘beadarray’ package.

## Load the data
data(exampleSummaryData)
exampleSummaryData
ExpressionSetIllumina (storageMode: list)
assayData: 49576 features, 12 samples 
  element names: exprs, se.exprs, nObservations 
protocolData: none
phenoData
  rowNames: 4613710017_B 4613710052_B ... 4616494005_A (12 total)
  varLabels: sampleID SampleFac
  varMetadata: labelDescription
featureData
  featureNames: ILMN_1802380 ILMN_1893287 ... ILMN_1846115 (49576
    total)
  fvarLabels: ArrayAddressID IlluminaID Status
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation: Humanv3 
QC Information
 Available Slots:  
  QC Items: Date, Matrix, ..., SampleGroup, numBeads
  sampleNames: 4613710017_B, 4613710052_B, ..., 4616443136_A, 4616494005_A

Information about the bead types used on the BeadArrays can be accessed using the function fData.

## Show the first rows (head) of the slot "featureData" (fData)
head(fData(exampleSummaryData))
             ArrayAddressID   IlluminaID  Status
ILMN_1802380          10008 ILMN_1802380 regular
ILMN_1893287          10010 ILMN_1893287 regular
ILMN_1736104          10017 ILMN_1736104 regular
ILMN_1792389          10019 ILMN_1792389 regular
ILMN_1854015          10020 ILMN_1854015 regular
ILMN_1904757          10021 ILMN_1904757 regular
## Absolute frequencies (table) of the bead type status
table(fData(exampleSummaryData)[, "Status"])

                    biotin                    cy3_hyb 
                         2                          2 
cy3_hyb,low_stringency_hyb               housekeeping 
                         4                          7 
                  labeling         low_stringency_hyb 
                         2                          4 
                  negative                    regular 
                       759                      48796 

In the present case, we see that there are a total of 780 bead types that function as controls, and 48,797 bead types that are classified as regular. A first possibility for the quality control of Bead-Summary data is to display the number of beads per bead type in the form of box plots.

## Extract the Green Channel Signal Intensities
exSumData <- channel(exampleSummaryData, "G")
## 2. Boxplot of the Number of Beads per Bead Type
boxplot(exSumData, what = "nObservations")

We see a certain variability in the values, which is mainly caused by the removal of outliers when summarizing the bead-level data to the bead-summary data. Thus, this plot also indirectly indicates the number of outliers in the bead values, and BeadArrays with lower boxes point to a poorer data quality. A further possibility for quality control is to look at the beads according to their status. In the following figures, we will use the log2-transformed Bead-Summary data, since the transition to logarithmic expression values improves the clarity of the displays.

## Box plot of the signal intensities separated/divided by BeadArray and Status
boxplot(exSumData, probeFactor = "Status")

With this figure, we can display the strength of the expression signals (regular) in comparison to the background (negative). If the difference is small, this can also point to quality problems. Finally, we can, of course, also display the expression signals themselves and compare the BeadArrays with one another in this regard.

## Boxplot of the signal intensities divided by BeadArray
boxplot(exSumData)
Warning: Removed 1 row containing non-finite outside the scale range
(`stat_boxplot()`).