These are examples of collecting data from a set of images and putting them into an array. In this case the images are in a folder called ‘train’ and consist of pairs: each ultrasound picture has one associated mask picture. There are two methods for handling this: (A) building the array record by record (via the abind package) or (B) preparing an empty array and replacing the empty records with data. As we will discover, the second method is more efficient.
I used the EBImage package to read the tiff images.
library(EBImage)
# Function to read pairs of tiff files, extract pixel matrices and create an array.
# E.g. an_array[,,"mask", "1_3"] contains the pixel matrix of the 1_3_mask.tif file.
# Uses the 'EBImage' package.
extractPix <- function(folder, image_name, extension) {
  ## Concatenate filepath from input variables and read images.
  filepath <- paste0(folder, "/", image_name, ".", extension)
  filepath.m <- paste0(folder, "/", image_name, "_mask.", extension)
  Image <- readImage(filepath)
  Mask <- readImage(filepath.m)
  ## Extract pixel matrices from both images and combine them in one array.
  image_array <- array(
    c(as.vector(Image@.Data), as.vector(Mask@.Data)),
    dim = c(580, 420, 2, 1),
    dimnames = list(
      NULL,
      NULL,
      c("image", "mask"),
      image_name
    )
  )
  return(image_array)
}
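The function above hard-codes `dim = c(580, 420, 2, 1)`, so it only works for images of exactly that size. As a minimal sketch of one possible alternative, the dimensions could be taken from the pixel matrices themselves. The helper `pairToArray` and the small synthetic matrices below are hypothetical and not part of the original code; they stand in for `Image@.Data` and `Mask@.Data` so the example runs without any image files.

```r
# Hypothetical helper (not from the original post): combine two equally sized
# pixel matrices into a 4-D array, taking the dimensions from the input.
pairToArray <- function(image_mat, mask_mat, image_name) {
  # Both matrices must have the same shape, otherwise the layers would not align.
  stopifnot(identical(dim(image_mat), dim(mask_mat)))
  array(
    c(as.vector(image_mat), as.vector(mask_mat)),
    dim = c(dim(image_mat), 2, 1),
    dimnames = list(NULL, NULL, c("image", "mask"), image_name)
  )
}

# Synthetic stand-ins for the pixel matrices of one image/mask pair.
img  <- matrix(seq(0, 1, length.out = 12), nrow = 4, ncol = 3)
mask <- matrix(0, nrow = 4, ncol = 3)

pair <- pairToArray(img, mask, "1_1")
dim(pair)
```

The trade-off is that the array size is then determined by the first image that is read, so a check like the `stopifnot()` call becomes useful when images of different sizes might be mixed in one folder.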
For method A, I used the abind package to bind each new array to the existing one.
source("functions.R")
library(abind)
## Create vector with names of image files without the extension.
scans <- list.files(path = "train", pattern = ".[0-9]+\\.tif")
scans <- gsub("\\.tif", "", scans)
## Use scans vector to loop through pairs of image and mask files
## to extract pixel data and combine them in one array.
ptm <- proc.time()
## Start with the first pair, then bind the remaining pairs along the 4th dimension.
arr <- extractPix("train", scans[1], "tif")
for (i in scans[-1]) {
  new_arr <- extractPix("train", i, "tif")
  arr <- abind(arr, new_arr, along = 4)
}
proc.time() - ptm
## user system elapsed
## 0.584 0.057 0.651
Finished! Now the pixel data of all the image files is stored in one array. If, for instance, we want to see the matrix belonging to the 1_4 image file (first 6 rows and 20 columns), we can call:
head(arr[, 1:20, "image", "1_4"])
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## [2,] 0.6549020 0.9372549 0.9254902 0.5764706 0.5490196 0.4980392 0.4705882
## [3,] 0.5529412 0.7960784 0.7921569 0.5333333 0.4784314 0.4901961 0.4352941
## [4,] 0.5372549 0.7764706 0.7725490 0.5058824 0.4431373 0.4784314 0.4274510
## [5,] 0.4941176 0.7450980 0.7607843 0.4784314 0.4392157 0.4666667 0.4039216
## [6,] 0.4823529 0.7450980 0.7490196 0.4588235 0.4431373 0.4627451 0.3843137
## [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## [2,] 0.5098039 0.5450980 0.6078431 0.6392157 0.5960784 0.5960784 0.6196078
## [3,] 0.4627451 0.5764706 0.6313725 0.5568627 0.5411765 0.5333333 0.5529412
## [4,] 0.4627451 0.5764706 0.6313725 0.5568627 0.5411765 0.5333333 0.5529412
## [5,] 0.4392157 0.5607843 0.6313725 0.5647059 0.5254902 0.5411765 0.5686275
## [6,] 0.4156863 0.5490196 0.6352941 0.5764706 0.5176471 0.5490196 0.5803922
## [,15] [,16] [,17] [,18] [,19] [,20]
## [1,] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## [2,] 0.6352941 0.6470588 0.6549020 0.6627451 0.6509804 0.6156863
## [3,] 0.5764706 0.5490196 0.6156863 0.6470588 0.6117647 0.5529412
## [4,] 0.5764706 0.5490196 0.6156863 0.6470588 0.6117647 0.5647059
## [5,] 0.5725490 0.5607843 0.6000000 0.6509804 0.6313725 0.6000000
## [6,] 0.5686275 0.5725490 0.5921569 0.6470588 0.6470588 0.6352941
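The same name-based indexing works for any layer or file in the array. The following is a small sketch on a toy array with the same four-dimensional layout (rows, columns, layer, file); the sizes, values and file names are made up for illustration and are not the ultrasound data.

```r
# Toy array with the same layout as in the text: rows x columns x layer x file.
# The sizes and file names here are invented for illustration only.
toy <- array(
  seq_len(3 * 2 * 2 * 2),
  dim = c(3, 2, 2, 2),
  dimnames = list(NULL, NULL, c("image", "mask"), c("1_1", "1_4"))
)

mask_1_4 <- toy[, , "mask", "1_4"]   # matrix holding one mask
both_1_4 <- toy[, , , "1_4"]         # 3-D array holding image and mask of one file

dim(mask_1_4)
dim(both_1_4)
dimnames(toy)[[4]]                   # names of all files stored in the array
```

Leaving an index position empty selects everything along that dimension, which is why `toy[, , , "1_4"]` returns both layers for a single file.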
For method B, we first generate a vector with all file names. This vector is used as one of the name vectors (dimnames) of the array.
## Create vector with names of image files without the extension.
scans <- list.files(path = "train", pattern = ".[0-9]+\\.tif")
scans <- gsub("\\.tif", "", scans)
## Prepare an empty array
ptm <- proc.time()
image_array <- array(
  c(numeric(580*420), numeric(580*420)),
  dim = c(580, 420, 2, length(scans)),
  dimnames = list(
    NULL,
    NULL,
    c("image", "mask"),
    scans
  )
)
We loop through the vector with image names and fill the array with data from the pixel matrices of both the image and the mask.
for (i in scans) {
  filepath <- paste0("train/", i, ".tif")
  filepath.m <- paste0("train/", i, "_mask.tif")
  Image <- readImage(filepath)
  Mask <- readImage(filepath.m)
  image_array[,,"image", i] <- as.vector(Image@.Data)
  image_array[,,"mask", i] <- as.vector(Mask@.Data)
}
proc.time() - ptm
## user system elapsed
## 0.280 0.035 0.317
The timings show that method B is more efficient than method A: 0.317 s versus 0.651 s elapsed, roughly twice as fast in this run. This is especially relevant when working with large image libraries.
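To try the comparison without the image files, the two strategies can be sketched on synthetic data. This is a rough illustration under stated assumptions, not a reproduction of the measurements above: it uses base R only, so the growing step is done with `array(c(...))` rather than `abind()`, and `n_files`, `n_row` and `n_col` are made-up sizes. Actual timings will vary by machine and data size.

```r
# Synthetic comparison (base R only, no image files needed).
# n_files, n_row and n_col are invented sizes, not those of the ultrasound data.
n_files <- 50
n_row <- 58
n_col <- 42
set.seed(1)
pairs <- lapply(seq_len(n_files), function(k) {
  array(runif(n_row * n_col * 2), dim = c(n_row, n_col, 2))
})

# Method A style: grow the array one record at a time.
# Each step copies everything collected so far into a new, larger array.
time_grow <- system.time({
  grown <- array(pairs[[1]], dim = c(n_row, n_col, 2, 1))
  for (k in seq_len(n_files)[-1]) {
    grown <- array(c(grown, pairs[[k]]), dim = c(n_row, n_col, 2, k))
  }
})

# Method B style: allocate the full array once, then fill each slot.
time_fill <- system.time({
  filled <- array(0, dim = c(n_row, n_col, 2, n_files))
  for (k in seq_len(n_files)) {
    filled[, , , k] <- pairs[[k]]
  }
})

identical(grown, filled)   # both strategies should produce the same array
c(grow = time_grow[["elapsed"]], fill = time_fill[["elapsed"]])
```

The growing loop has to copy all previously stored records on every iteration, so its cost increases with the number of files, while the preallocated version writes each record once.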