29 September 2017 | Why R? Conference

Working with image data in R

Package       imager                                   magick                   EBImage
Repository    CRAN                                     CRAN                     Bioconductor
Released      2015                                     2016                     2006
Maintainer    Simon Barthelme                          Jeroen Ooms              Andrzej Oleś
System deps   cairo, fftw3, libjpeg, libpng, libtiff   ImageMagick++, libcurl   fftw3, libjpeg, libpng, libtiff

What is Bioconductor?

World’s largest Bioinformatics project (est. 2001)

  • Analysis and comprehension of high-throughput genomic data
  • Open source, open development
  • > 20,000 papers in PubMed Central

Primarily a software repository

  • 1383 R packages

Additionally

  • Data and repository
  • Publisher of supplementary materials
  • Bioinformatics support forum
  • Tutorials and instructional documentation

Motivating principles

Provide a compelling user experience

  • Package documentation (vignettes)
  • Workflows

Turn users into developers

  • Training on software development & programming paradigms
  • Distributed development by domain experts

Scientific software vs. scientific publications

  • Reproducible
  • Open to peer-review
  • Easy to access by other researchers & society
  • Builds on the work of others

EBImage

Image processing and analysis toolbox for R

  • reading and writing of image files
  • image manipulation, transformation and filtering
  • object detection and feature extraction
  • interactive image viewer


Since Bioconductor 1.8 (2006)

Original developers:

Oleg Sklyar
Wolfgang Huber
Mike Smith
Gregoire Pau

Contributors:

Joseph Barry
Bernd Fischer
Ilia Kats
Philip A. Marais

Let's get started!

library("EBImage")

img <- readImage(system.file("images", "sample.png", package="EBImage"))
display(img)

Reading and displaying images

Reading images

  • local files or URLs
  • supported file formats: JPEG, PNG and TIFF
For proprietary microscopy image data and metadata, use aoles/RBioFormats (148 formats)

Displaying images

  • interactive JavaScript viewer
  • R's built-in plotting device

Adding text labels and saving images

display(img, method = "raster")
text(x = 20, y = 20, label = "Parrots", adj = c(0,1), col = "orange", cex = 3)

# save the annotated plot from the graphics device as PNG, and the image itself as JPEG
dev.print(png, filename = "image.png", width = dim(img)[1], height = dim(img)[2])
writeImage(img, "image.jpeg", quality = 85)
files <- list.files(pattern = "image\\.")
data.frame(row.names = files, size = file.size(files))
##              size
## image.jpeg  49688
## image.png  361131

Image data representation

img
## Image 
##   colorMode    : Grayscale 
##   storage.mode : double 
##   dim          : 768 512 
##   frames.total : 1 
##   frames.render: 1 
## 
## imageData(object)[1:5,1:6]
##           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]
## [1,] 0.4470588 0.4627451 0.4784314 0.4980392 0.5137255 0.5294118
## [2,] 0.4509804 0.4627451 0.4784314 0.4823529 0.5058824 0.5215686
## [3,] 0.4627451 0.4666667 0.4823529 0.4980392 0.5137255 0.5137255
## [4,] 0.4549020 0.4666667 0.4862745 0.4980392 0.5176471 0.5411765
## [5,] 0.4627451 0.4627451 0.4823529 0.4980392 0.5137255 0.5411765
str(img)
## Formal class 'Image' [package "EBImage"] with 2 slots
##   ..@ .Data    : num [1:768, 1:512] 0.447 0.451 0.463 0.455 0.463 ...
##   ..@ colormode: int 0

Image data representation

Multi-dimensional pixel intensity arrays 

  • (x, y)
  • (x, y, z) z-stack
  • (x, y, t) time-lapse
  • (x, y, c) channels
  • (x, y, c, z, t, …)
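As a rough illustration of these layouts, plain R arrays can stand in for image objects. The sketch below uses only base R; the variable names are made up for the example, and the first index is assumed to be x and the second y.

```r
# Base R sketch: pixel intensities as multi-dimensional arrays.

# (x, y): a 4 x 3 grayscale image
gray <- matrix(seq(0, 1, length.out = 12), nrow = 4, ncol = 3)

# (x, y, c): the same size with 3 colour channels (red, green, blue)
color <- array(0, dim = c(4, 3, 3))
color[, , 1] <- gray          # fill the red channel only

# (x, y, t): a time-lapse of 5 identical frames
movie <- array(rep(gray, times = 5), dim = c(4, 3, 5))

dim(color)
frame2 <- movie[, , 2]        # a single frame is again an (x, y) matrix
```

Indexing the last dimension picks out one channel or frame, which is why the same subsetting syntax works for z-stacks, time-lapses and colour images.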

Manipulating images

Algebraic operations: \(\alpha + \beta x^{\gamma}\)

[Figure panels: x; x + 0.3; 2 * x; x^2; 1 - x]
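The five panels are all instances of the formula above. A minimal base R sketch, assuming intensities in [0, 1]; adjust() and clip01() are hypothetical helper names, not EBImage functions:

```r
# alpha + beta * x^gamma applied element-wise to a toy intensity matrix
adjust <- function(x, alpha = 0, beta = 1, gamma = 1) {
  alpha + beta * x^gamma
}

# results can leave [0, 1]; clip them before display
clip01 <- function(x) pmin(pmax(x, 0), 1)

x <- matrix(c(0, 0.25, 0.5, 1), nrow = 2)

brighter <- adjust(x, alpha = 0.3)            # x + 0.3
contrast <- adjust(x, beta = 2)               # 2 * x
gamma2   <- adjust(x, gamma = 2)              # x^2
inverted <- adjust(x, alpha = 1, beta = -1)   # 1 - x

range(clip01(brighter))
```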

Spatial transformations

[Figure panels: x[99:199,60:159]; rotate(x, 30); resize(x, 192); flop(x)]
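Cropping and mirroring reduce to array indexing. The base R sketch below works on a toy matrix and assumes that flop() mirrors left-right, which for an (x, y) array means reversing the first index. Rotation and resizing need interpolation and are not sketched here.

```r
# toy image: 4 pixels along x (first index), 3 pixels along y (second index)
x <- matrix(1:12, nrow = 4, ncol = 3)

cropped    <- x[2:3, 1:2]        # sub-image, as in x[99:199, 60:159]
flopped    <- x[nrow(x):1, ]     # mirror left-right (reverse x)
flipped    <- x[, ncol(x):1]     # mirror top-bottom (reverse y)
transposed <- t(x)               # swap x and y

dim(cropped)
dim(transposed)
```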

Image filtering

2D linear convolution (using FFT)
[Figure panels: original; low-pass; high-pass]
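To make "convolution using FFT" concrete, here is a small base R sketch of circular 2D convolution through the convolution theorem. conv2_fft() is a hypothetical helper written for this example, not the package's implementation; the box and Laplacian kernels are illustrative choices for a low-pass and a high-pass filter.

```r
# Circular 2D convolution via the convolution theorem (base R only).
conv2_fft <- function(x, kernel) {
  stopifnot(nrow(kernel) <= nrow(x), ncol(kernel) <= ncol(x))
  # zero-pad the kernel to the image size
  pad <- matrix(0, nrow(x), ncol(x))
  pad[seq_len(nrow(kernel)), seq_len(ncol(kernel))] <- kernel
  # circularly shift so that the kernel centre sits at index [1, 1]
  cr <- (nrow(kernel) - 1) %/% 2
  cc <- (ncol(kernel) - 1) %/% 2
  ri <- (seq_len(nrow(x)) - 1 + cr) %% nrow(x) + 1
  ci <- (seq_len(ncol(x)) - 1 + cc) %% ncol(x) + 1
  pad <- pad[ri, ci, drop = FALSE]
  # multiply the spectra and transform back; R's inverse fft is unnormalised
  Re(fft(fft(x) * fft(pad), inverse = TRUE)) / length(x)
}

box <- matrix(1 / 9, 3, 3)                                  # low-pass (mean)
lap <- matrix(c(0, -1, 0, -1, 4, -1, 0, -1, 0), 3, 3)       # high-pass

impulse <- matrix(0, 8, 8)
impulse[4, 4] <- 1

low  <- conv2_fft(impulse, box)   # spreads the impulse over 3 x 3 pixels
high <- conv2_fft(impulse, lap)   # responds to the abrupt intensity change
```

Because the convolution is circular, pixels near one border are mixed with pixels from the opposite border; real images usually call for padding or another boundary treatment.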

Median filter [1]
[Figure panels: x; medianFilter(x)]

[1] S. Perreault and P. Hebert (2007) Median Filtering in Constant Time, IEEE Trans Image Process 16 (9)
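What a median filter computes can be shown with a naive base R loop. This is not the constant-time algorithm of [1]; median_filter3() is a hypothetical helper that uses a fixed 3 x 3 window and leaves border pixels unchanged.

```r
# Naive 3 x 3 median filter: replace each interior pixel by the median
# of its neighbourhood (border pixels are copied as they are).
median_filter3 <- function(x) {
  out <- x
  for (i in 2:(nrow(x) - 1)) {
    for (j in 2:(ncol(x) - 1)) {
      out[i, j] <- median(x[(i - 1):(i + 1), (j - 1):(j + 1)])
    }
  }
  out
}

x <- matrix(0.5, 5, 5)
x[3, 3] <- 1                      # a single "salt" noise pixel
filtered <- median_filter3(x)
filtered[3, 3]
```

The outlier disappears because it never reaches the middle of the sorted neighbourhood, which is why median filtering suits salt-and-pepper noise.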

Morphological operations

Non-linear filtering of (binary) images

  • erosion/dilation: for every pixel, put the mask around it, and set it to the min/max value covered by the mask
  • opening: erosion followed by dilation
  • closing: dilation followed by erosion
x <- readImage("images/leaf.png")
k <- makeBrush(size = 3)
[Figure panels: x; erode(x, k); dilate(x, k); opening(x, k); closing(x, k)]

[1] E. R. Urbach and M.H.F. Wilkinson (2008) Efficient 2-D grayscale morphological transformations with arbitrary flat structuring elements, IEEE Trans Image Process 17 (1), 1-8
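The min/max definition above translates almost directly into base R. The sketch below uses a fixed 3 x 3 square mask; the helper names and the border handling (padding with 1 for erosion and 0 for dilation) are assumptions made for this example and need not match the package.

```r
# Apply `fun` (min or max) over every 3 x 3 neighbourhood of x.
apply_mask3 <- function(x, fun, pad) {
  p <- matrix(pad, nrow(x) + 2, ncol(x) + 2)
  p[2:(nrow(x) + 1), 2:(ncol(x) + 1)] <- x
  out <- x
  for (i in seq_len(nrow(x))) {
    for (j in seq_len(ncol(x))) {
      out[i, j] <- fun(p[i:(i + 2), j:(j + 2)])
    }
  }
  out
}

erode3   <- function(x) apply_mask3(x, min, pad = 1)
dilate3  <- function(x) apply_mask3(x, max, pad = 0)
opening3 <- function(x) dilate3(erode3(x))   # removes small specks
closing3 <- function(x) erode3(dilate3(x))   # fills small holes

speck <- matrix(0, 7, 7)
speck[3:5, 3:5] <- 1      # a 3 x 3 object ...
speck[7, 1] <- 1          # ... plus an isolated pixel

hole <- matrix(0, 7, 7)
hole[3:5, 3:5] <- 1
hole[4, 4] <- 0           # a one-pixel hole inside the object

opened <- opening3(speck)
closed <- closing3(hole)
```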

Image segmentation

Thresholding
[Figure panels: x; x > otsu(x); thresh(x)]
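Global thresholding with Otsu's method picks the cut-off that maximises the between-class variance of the intensity histogram. A base R sketch, assuming intensities in [0, 1] and 256 histogram levels; otsu_threshold() is a hypothetical helper, not the package function:

```r
otsu_threshold <- function(x, levels = 256) {
  # histogram of intensities over `levels` equal-width bins
  bins   <- pmin(floor(x * levels) + 1, levels)
  p      <- tabulate(bins, nbins = levels) / length(x)
  mids   <- (seq_len(levels) - 0.5) / levels
  w0     <- cumsum(p)             # weight of the darker class
  mu     <- cumsum(p * mids)      # cumulative (unnormalised) mean
  mu_all <- mu[levels]
  # between-class variance for every candidate threshold
  between <- (mu_all * w0 - mu)^2 / (w0 * (1 - w0))
  between[!is.finite(between)] <- 0
  mids[which.max(between)]
}

# toy image with a dark and a bright population
x <- matrix(c(rep(0.2, 50), rep(0.8, 50)), nrow = 10)
t <- otsu_threshold(x)
mask <- x > t                     # binary foreground mask
```

An adaptive threshold such as thresh(x) instead compares each pixel with its local neighbourhood, which helps when the background is uneven.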

Object identification
[Figure panels: bwlabel; watershed; propagate [1]]

[1] T. Jones, A. Carpenter and P. Golland (2005) Voronoi-Based Segmentation of Cells on Image Manifolds, CVBIA05
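The simplest of the three, connected-component labelling, can be sketched in base R with a flood fill. label_components() is a hypothetical helper; it assumes 4-connectivity, which is a choice made for this example and may differ from what bwlabel uses.

```r
# Assign a distinct integer to every connected set of foreground pixels.
label_components <- function(mask) {
  n <- nrow(mask)
  labels <- matrix(0L, n, ncol(mask))
  current <- 0L
  steps <- list(c(-1, 0), c(1, 0), c(0, -1), c(0, 1))   # 4-connectivity
  for (start in which(mask > 0)) {
    if (labels[start] != 0L) next
    current <- current + 1L
    stack <- start
    while (length(stack) > 0) {
      idx <- stack[length(stack)]
      stack <- stack[-length(stack)]
      if (labels[idx] != 0L) next
      labels[idx] <- current
      i <- (idx - 1) %% n + 1
      j <- (idx - 1) %/% n + 1
      for (d in steps) {
        ni <- i + d[1]
        nj <- j + d[2]
        if (ni >= 1 && ni <= n && nj >= 1 && nj <= ncol(mask) &&
            mask[ni, nj] > 0 && labels[ni, nj] == 0L) {
          stack <- c(stack, (nj - 1) * n + ni)
        }
      }
    }
  }
  labels
}

mask <- matrix(0, 5, 5)
mask[1:2, 1:2] <- 1
mask[4:5, 4:5] <- 1
labels <- label_components(mask)
max(labels)        # number of objects found
```

Touching objects end up with a single label under this scheme, which is the case that watershed and propagate are meant to handle.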

Automated cellular phenotyping

Using multivariate analysis methods we can:

  • detect cell subpopulations (clustering)
  • classify cells into pre-defined cell types or phenotypes (classification)
  • compare different biological conditions based on the frequencies of the subpopulations
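As a toy illustration of the clustering step, the base R sketch below runs k-means on simulated per-cell features. The data are synthetic, and the feature names and the choice of k-means are assumptions for this example only.

```r
set.seed(1)

# simulated features for 40 cells from two made-up subpopulations
features <- rbind(
  cbind(area = rnorm(20, 2000, 50), eccentricity = rnorm(20, 0.3, 0.02)),
  cbind(area = rnorm(20, 4000, 50), eccentricity = rnorm(20, 0.8, 0.02))
)

# put features on a common scale, then cluster
scaled <- scale(features)
cl <- kmeans(scaled, centers = 2, nstart = 10)

# relative frequency of each subpopulation
freq <- table(cl$cluster) / nrow(features)
```

Frequency vectors like freq, computed per condition, are the kind of summary that can then be compared between biological conditions.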


Input images

Fluorescence microscopy images of perturbed HeLa cells [1]

dna <- readImage("nuclei.tif")
tub <- readImage("cells.tif")
rgb <- rgbImage(green=1.5*tub, blue=dna)
[Figure panels: dna; tub; rgb]

[1] Fuchs, F. et al. Clustering phenotype populations by genome-wide RNAi and multiparametric imaging. Molecular Systems Biology 6 (2010).

Nuclei segmentation

  • Adaptive thresholding of the DNA channel
  • Morphological opening and filling of holes
  • Distance map computation and watershed transformation
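# adaptive (local) thresholding of the DNA channel
nmaskt <- thresh(dna, w=15, h=15, offset=0.05)
# morphological opening, then filling of holes inside the nuclei
nmaskf <- fillHull( opening(nmaskt, makeBrush(5, shape='disc')) )
# distance map, then watershed to split touching nuclei
nmask  <- watershed( distmap(nmaskf) )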
[Figure panels: nmaskt; nmaskf; colorLabels(nmask)]
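The distance map step assigns to every foreground pixel its distance to the nearest background pixel, so the centres of nuclei become peaks that the watershed can separate. A brute-force base R sketch, fine for toy sizes only; distance_map() is a hypothetical helper:

```r
# Euclidean distance from each foreground pixel to the closest background pixel.
distance_map <- function(mask) {
  fg <- which(mask > 0, arr.ind = TRUE)
  bg <- which(mask == 0, arr.ind = TRUE)
  stopifnot(nrow(bg) > 0)
  out <- matrix(0, nrow(mask), ncol(mask))
  for (k in seq_len(nrow(fg))) {
    d2 <- (bg[, 1] - fg[k, 1])^2 + (bg[, 2] - fg[k, 2])^2
    out[fg[k, 1], fg[k, 2]] <- sqrt(min(d2))
  }
  out
}

mask <- matrix(0, 5, 5)
mask[2:4, 2:4] <- 1
dm <- distance_map(mask)
dm[3, 3]           # the object centre is farthest from the background
```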

Cytoplasm segmentation

Identification of cell bodies by Voronoi partitioning using the segmented nuclei as seeds

cmaskt <- closing( gblur(tub, 1) > 0.105, makeBrush(5, shape='disc') )
cmask  <- propagate(tub, seeds=nmask, mask=cmaskt, lambda=0.001)
segmented <- paintObjects(cmask, rgb, col="magenta")
segmented <- paintObjects(nmask, segmented, col="yellow")
[Figure panels: cmaskt; colorLabels(cmask); segmented]
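The idea of partitioning around seeds can be sketched in base R as a plain Euclidean Voronoi assignment: every pixel of the mask receives the label of the closest seed pixel. This ignores image intensities entirely, so it is a simplification of the approach in [1]; voronoi_labels() is a hypothetical helper.

```r
# Give every mask pixel the label of its nearest seed (Euclidean distance).
voronoi_labels <- function(seeds, mask) {
  seed_pos <- which(seeds > 0, arr.ind = TRUE)
  seed_lab <- seeds[seeds > 0]
  out <- matrix(0, nrow(mask), ncol(mask))
  px <- which(mask > 0, arr.ind = TRUE)
  for (k in seq_len(nrow(px))) {
    d2 <- (seed_pos[, 1] - px[k, 1])^2 + (seed_pos[, 2] - px[k, 2])^2
    out[px[k, 1], px[k, 2]] <- seed_lab[which.min(d2)]
  }
  out
}

seeds <- matrix(0, 3, 9)
seeds[2, 2] <- 1                  # "nucleus" 1
seeds[2, 8] <- 2                  # "nucleus" 2
mask <- matrix(1, 3, 9)           # region to be divided between the seeds

cells <- voronoi_labels(seeds, mask)
```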

Individual cells stacked

display(stackObjects(cmask, rgb), all = TRUE, nx = 11)

Feature extraction

Quantitative cellular descriptors

  • shape characteristics (area, perimeter, radius)
  • moments (center of mass, eccentricity, …)
  • basic statistics on pixel intensities
  • Haralick[1] textural features
head( computeFeatures.shape(cmask, tub) )
##   s.area s.perimeter s.radius.mean s.radius.sd s.radius.min s.radius.max
## 1   4216         408      43.52530   14.254634     16.13319     73.48543
## 2   3079         263      34.60167    9.061043     19.14618     53.49633
## 3   3088         211      30.99363    4.630083     16.81697     38.91545
## 4   2377         209      29.37053    8.030861     16.27813     44.85961
## 5   2643         277      34.75310   13.817002     15.03601     62.17141
## 6   2095         181      26.82256    6.948619     16.26868     40.91585

[1] R. M. Haralick, K. Shanmugam and Its'Hak Dinstein (1973) Textural Features for Image Classification, IEEE Transactions on Systems, Man, and Cybernetics.
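Two of the simplest descriptors, area and centre of mass, can be computed from a label matrix with base R alone. The sketch below is only meant to show what such per-object features are; the column names are made up and do not follow the package's naming.

```r
# toy label matrix with two objects
labels <- matrix(0L, 6, 6)
labels[1:2, 1:3] <- 1L
labels[4:6, 4:6] <- 2L

pos <- which(labels > 0, arr.ind = TRUE)   # pixel coordinates (x, y)
id  <- labels[labels > 0]                  # object label of each pixel

features <- data.frame(
  label = sort(unique(id)),
  area  = as.vector(table(id)),                   # number of pixels
  cx    = as.vector(tapply(pos[, 1], id, mean)),  # centre of mass, x
  cy    = as.vector(tapply(pos[, 2], id, mean))   # centre of mass, y
)
features
```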

Multivariate phenotypic landscape


Each of the 1820 nodes represents a perturbation and is characterized by a phenoprint derived from phenotypic profiles, comprising relative proportions of each cell class and population summaries of cellular descriptors. Groups of nodes with small distances form clusters and are colored according to their most prominent cell subpopulations.

Automated high-content screening analysis

Boutros, Bras, Huber, Genome Biol. 2006
Fuchs, Pau et al., Mol. Sys. Biol. 2010
Neumann et al., Nature 2010
Kuttenkeuler et al., J. Innate Imm. 2010
Axelsson et al., BMC Bioinf. 2011
Horn et al., Nature Methods 2011

Khmelinskii et al., Nature Biotech. 2012
Laufer, Fischer et al., Nature Methods 2013
Donà et al., Nature 2013
Pau et al., BMC Bioinformatics 2013
Laufer et al., Nature Protocols 2014
Fischer, Horn, Billmann et al. eLife 2015

Acknowledgments