Description

These are examples of collecting data from a set of images and putting them into an array. In this case the images are in a folder called ‘train’ and consist of pairs: each ultrasound picture has one associated mask picture. There are two methods for handling this: (A) building the array record by record (via the abind package) or (B) preparing an empty array and replacing the empty records with data. As we will discover, the second method is more efficient.

Step A1: Create a function that takes a pair of images and puts them into an array.

I used the EBImage package to read the tiff images.

library(EBImage)

# Function to read pairs of tiff files, extract pixel matrices and create an array.
# E.g. an_array[,,"mask", "1_3"] contains the pixel matrix of the 1_3_mask.tif file.
# Uses the 'EBImage' package.
extractPix <- function(folder, image_name, extension) { 

  ## Concatenate filepath from input variables and read images.  
  filepath <- paste0(folder, "/", image_name, ".", extension)
  filepath.m <- paste0(folder, "/", image_name, "_mask.", extension)
  Image <- readImage(filepath)
  Mask <- readImage(filepath.m)
  
  ## Extract pixel matrices from both images and combine them in one array.
  ## The hard-coded dimensions assume that every image is 580 x 420 pixels.
  image_array <- array(
    c(as.vector(Image@.Data), as.vector(Mask@.Data)),
    dim = c(580, 420, 2, 1),
    dimnames = list(
      NULL,
      NULL,
      c("image", "mask"),
      image_name
    )
  )

  return(image_array)
}
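
To make the array layout concrete without needing any image files, here is a minimal sketch that builds the same four-dimensional structure from two small matrices. The function `make_pair_array` is a hypothetical stand-in for extractPix, and the 4 x 3 toy matrices replace the 580 x 420 pixel data; it also takes the dimensions from the input instead of hard-coding them.

```r
# Hypothetical stand-in for extractPix(): builds the same 4-d layout
# from two plain matrices instead of two tiff files.
make_pair_array <- function(image, mask, image_name) {
  stopifnot(identical(dim(image), dim(mask)))
  array(
    c(as.vector(image), as.vector(mask)),
    dim = c(dim(image), 2, 1),
    dimnames = list(NULL, NULL, c("image", "mask"), image_name)
  )
}

set.seed(1)
img <- matrix(runif(12), nrow = 4, ncol = 3)        # toy "ultrasound" pixels
msk <- matrix(rep(c(0, 1), 6), nrow = 4, ncol = 3)  # toy binary mask

pair <- make_pair_array(img, msk, "1_3")
dim(pair)                 # rows, columns, image/mask, number of pairs
pair[, , "mask", "1_3"]   # the mask matrix, selected by name
```

Selecting by the names "image"/"mask" and by the file name is what makes the combined array convenient to query later on.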

Step A2: Loop through the files in the folder and add pixel data pairwise to the array.

I used the abind package to bind new arrays to the existing one.

source("functions.R")
library(abind)

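## Create vector with names of image files without the extension.
## The pattern only matches names ending in digits (e.g. "1_1.tif"),
## so the "_mask.tif" files are skipped.
scans <- list.files(path = "train", pattern = "[0-9]+\\.tif$")
scans <- gsub("\\.tif$", "", scans)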

## Use scans vector to loop through pairs of image and mask files
## to extract pixel data and combine them in one array.
ptm <- proc.time()
arr <- extractPix("train", scans[1], "tif")

## Bind each remaining pair onto the array along its fourth dimension.
for (i in scans[-1]) {
  new_arr <- extractPix("train", i, "tif")
  arr <- abind(arr, new_arr, along = 4)
}
proc.time() - ptm
##    user  system elapsed 
##   0.584   0.057   0.651

Finished! The pixel data of all the image files is now stored in one array. If, for instance, we want to see the matrix belonging to the 1_4 image file (first 6 rows and 20 columns), we can call:

head(arr[, 1:20, "image", "1_4"])
##           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]
## [1,] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## [2,] 0.6549020 0.9372549 0.9254902 0.5764706 0.5490196 0.4980392 0.4705882
## [3,] 0.5529412 0.7960784 0.7921569 0.5333333 0.4784314 0.4901961 0.4352941
## [4,] 0.5372549 0.7764706 0.7725490 0.5058824 0.4431373 0.4784314 0.4274510
## [5,] 0.4941176 0.7450980 0.7607843 0.4784314 0.4392157 0.4666667 0.4039216
## [6,] 0.4823529 0.7450980 0.7490196 0.4588235 0.4431373 0.4627451 0.3843137
##           [,8]      [,9]     [,10]     [,11]     [,12]     [,13]     [,14]
## [1,] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## [2,] 0.5098039 0.5450980 0.6078431 0.6392157 0.5960784 0.5960784 0.6196078
## [3,] 0.4627451 0.5764706 0.6313725 0.5568627 0.5411765 0.5333333 0.5529412
## [4,] 0.4627451 0.5764706 0.6313725 0.5568627 0.5411765 0.5333333 0.5529412
## [5,] 0.4392157 0.5607843 0.6313725 0.5647059 0.5254902 0.5411765 0.5686275
## [6,] 0.4156863 0.5490196 0.6352941 0.5764706 0.5176471 0.5490196 0.5803922
##          [,15]     [,16]     [,17]     [,18]     [,19]     [,20]
## [1,] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## [2,] 0.6352941 0.6470588 0.6549020 0.6627451 0.6509804 0.6156863
## [3,] 0.5764706 0.5490196 0.6156863 0.6470588 0.6117647 0.5529412
## [4,] 0.5764706 0.5490196 0.6156863 0.6470588 0.6117647 0.5647059
## [5,] 0.5725490 0.5607843 0.6000000 0.6509804 0.6313725 0.6000000
## [6,] 0.5686275 0.5725490 0.5921569 0.6470588 0.6470588 0.6352941
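
Both methods rely on a file-name filter that keeps the scans and skips their masks. The sketch below checks such a filter on a hand-written vector; the file names are made up for illustration, and the pattern shown is anchored with `$` so that it only matches names ending in digits followed by ".tif".

```r
# Hypothetical folder listing: two scans, their masks, and an unrelated file.
files <- c("1_1.tif", "1_1_mask.tif", "10_12.tif", "10_12_mask.tif", "notes.txt")

# Keep names that end in digits followed by ".tif"; mask files end in "k.tif".
is_scan <- grepl("[0-9]+\\.tif$", files)
scan_names <- sub("\\.tif$", "", files[is_scan])
scan_names

# Each scan name maps to exactly one mask file name.
mask_files <- paste0(scan_names, "_mask.tif")
all(mask_files %in% files)
```

Checking that every expected mask file exists before reading anything avoids a failure halfway through a long loop.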

Step B1: Prepare an empty array.

First we generate a vector with all file names. This vector also supplies the names for the fourth dimension of the array.

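## Create vector with names of image files without the extension.
## The pattern only matches names ending in digits (e.g. "1_1.tif"),
## so the "_mask.tif" files are skipped.
scans <- list.files(path = "train", pattern = "[0-9]+\\.tif$")
scans <- gsub("\\.tif$", "", scans)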

## Prepare an empty array
ptm <- proc.time()
image_array <- array(
  0,  # a single zero is recycled to fill every cell
  dim = c(580, 420, 2, length(scans)),
  dimnames = list(
    NULL,
    NULL,
    c("image", "mask"),
    scans
  )
)
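
One possible variation, not part of the original comparison, is to fill the empty array with NA instead of zeros, so that a pair that was never loaded can be told apart from a genuinely black image. The sketch below assumes three made-up scan names and 4 x 3 toy dimensions, and fills only two of the three slots with random numbers in place of real pixel data.

```r
scans <- c("1_1", "1_2", "1_3")   # hypothetical scan names

# Preallocate with NA so that untouched slots stay recognisable.
image_array <- array(
  NA_real_,
  dim = c(4, 3, 2, length(scans)),
  dimnames = list(NULL, NULL, c("image", "mask"), scans)
)

# Fill only the first two pairs, as if reading the third one had been skipped.
set.seed(42)
for (i in scans[1:2]) {
  image_array[, , "image", i] <- runif(12)
  image_array[, , "mask", i] <- 0
}

# Any slot along the fourth dimension that still contains NA was not filled.
unfilled <- scans[apply(image_array, 4, anyNA)]
unfilled   # the only pair not loaded: 1_3
```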

Step B2: Fill the empty array with data.

We loop through the vector with image names and fill the array with data from the pixel matrices of both the image and the mask.

for (i in scans) {
  filepath <- paste0("train/", i, ".tif")
  filepath.m <- paste0("train/", i, "_mask.tif")
  Image <- readImage(filepath)
  Mask <- readImage(filepath.m)

  ## Overwrite the zero-filled slices with the pixel data of this pair.
  image_array[, , "image", i] <- as.vector(Image@.Data)
  image_array[, , "mask", i] <- as.vector(Mask@.Data)
}
proc.time() - ptm
##    user  system elapsed 
##   0.280   0.035   0.317

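The timings show that method B is roughly twice as fast as method A here (0.317 versus 0.651 seconds elapsed). Method A creates a new, larger array every time a pair is bound on, which means copying everything collected so far, whereas method B writes each pair into memory that has already been allocated. This is especially relevant when working with large image libraries.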
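
The difference between growing and preallocating can be reproduced without any image files. The following is a rough sketch under stated assumptions: random 58 x 42 matrices stand in for the pixel data, the helper names `grow` and `prealloc` are made up for illustration, and base R's `array(c(...))` is used as a stand-in for abind, since both build a new array from the old one on every iteration.

```r
n <- 100           # number of synthetic image/mask pairs (assumption)
w <- 58; h <- 42   # toy dimensions, a tenth of 580 x 420

# Method A analogue: grow the array one pair at a time.
grow <- function(n) {
  arr <- array(runif(w * h * 2), dim = c(w, h, 2, 1))
  for (k in seq_len(n - 1)) {
    new_arr <- array(runif(w * h * 2), dim = c(w, h, 2, 1))
    # Re-creates the whole array: everything collected so far is copied.
    arr <- array(c(arr, new_arr), dim = c(w, h, 2, k + 1))
  }
  arr
}

# Method B analogue: allocate once, then overwrite slot by slot.
prealloc <- function(n) {
  arr <- array(0, dim = c(w, h, 2, n))
  for (k in seq_len(n)) {
    arr[, , , k] <- runif(w * h * 2)
  }
  arr
}

t_grow <- system.time(a <- grow(n))
t_pre <- system.time(b <- prealloc(n))

identical(dim(a), dim(b))   # both approaches yield the same shape
c(grow = t_grow[["elapsed"]], prealloc = t_pre[["elapsed"]])
```

Exact numbers vary by machine, but increasing `n` should widen the gap, because the amount copied by the growing version increases with every added pair.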