Automating medians from intensity plots

A brief methodology

Author

Richard Timmerman

Abstract

The purpose of this brief document is to explain the batch extraction of median values (\(\mu\)) from intensity histograms associated with photographic images featuring rasterisation properties.

1 Image data in R

As the sophistication of digitised photography becomes increasingly sophisticated, so does their analysis and subsequent manipulation (see McNamara, 2005). Increasingly quantitative analyses of raw image data are featuring more prominently in the Geographic Sciences (see Xu et al, 2022; Valente et al, 2023); a mutual space where these components meet is in the R programming language.

R is capable of performing sophisticated/cutting-edge statistical analysis on variety of data classifications including rasters converted from conventional images formats such as JPG/JPEG, PNG, GEOTIFF, and BMP.

A powerful package that introduces image data into an R environment is EBImage (Pau et al, 2010) that is available through the Bioconductor repository.

if(!require(BiocManager)) install.packages("Biomanager")
if(!require(EBImage)) BiocManager::install("EBImage")

With the EBImage package installed, an image can be easily loaded using the readImage(...) function. However, the batch process is worth going through.

1.1 Batch image upload

In this example, only two images will be loaded from a pre-set working directory. Each image is a JPG so the pattern follows the extension .jpg.

# Image uploads (batch)
sapply(X = list.files(pattern = "vlc", recursive = TRUE), length) |>
  sum() -> file_num

for(i in 1:file_num) {
  readImage(files = list.files(pattern = "vlc", recursive = TRUE)[i]) -> img
  channel(img, mode = "gray") -> img
}

The metadate for each image is stored in an Image Formal class and is now read for analysis.

2 Intensity distribution analysis

When it comes to grayscale images, possessing monochromatic properties, intensity is interpreted in coefficient terms with values ranging between 0 and 1.

  • 0 represents pure black
  • 1 represents pure white

An intensity distribution is best represented by a Tukey Histogram, easily implemented in R using the hist(...) function. It shows the frequency of shades of gray in image(s) relative to their pixel intensity. This gives an analyst an idea of some of the photographic properties such as contrast, lights and darks; a photographer usually manipulates an intensity histograms to achieve equalisation.

In the case of the data that was loaded in Section 1.1, we can readily plot histograms using the Image class object called img:

hist(img)

Figure 1: Running the hist function on the img object returns the final image’s frequency distribution

To extract individual histograms for each image, we alter our ealier code by introducing the hist(...) function within a for-loop that breaks at the maximum number of images. To return single-channel intensity frequencies, the image is converted to grayscale using the channel(...) function:

par(mfrow = c(2,2)) # Set parameters to accommodate plots

for(i in 1:file_num) {
  readImage(files = list.files(pattern = "vlc", recursive = TRUE)[i]) -> img 
  hist(
    channel(
      img, mode = "gray"
      ), xlim = c(0,1)
  )
}

Figure 2: Histograms for each of the images stored in the img object

Figure 3: The respective images in sequence. The subject matter are three stills of a video of a maturing butternut squash captured using the VideoLAN Clime snapshot feature. Source: the author

Naturally, if the task is analyse hundreds of images, histograms will quickly fall out of favour. Therefore, it is efficient to examine differences by scrutinising shifts in each histogram’s (pixel-intensity) median.

2.1 Extracting medians from intensity plots

The process for extracting the median from an object is as denoted by the following equations:

\[ \mu = X \left [ \frac{n+1}{2} \right ] \tag{1}\]

\[ \mu' = \frac{X \left [ \frac{n}{2} \right ] + \left[ \frac{n}{2} + 1 \right ] }{2} \tag{2}\]

Equation 1 calculates the \(\mu\) when a sample is even; Equation 2 is used when a sample is uneven/odd. To distinguish the latter process, the notation \(\mu'\) is used. R’s implementation flexibly adapts according to protocols established by Becker et al (1988). This is evoked using the median(...) function on the img data created in Section 1.1.

As a for-loop is introduced, both formulaic processes can augmented to produce the following loop:

\[ \bigcup_{i=1}^{\mu \cdot n}x_i = \mu(1...n) \]

The \(\bigcup\) represents the unified object created by the readImage(...) function that contains the metadata from \(x_i\) number of images. The process in R (with printed result) is as follows:

for(i in 1:file_num) {
  readImage(files = list.files(pattern = "vlc",
                               recursive = TRUE)[i]) -> img 
  print(median(img))
}
[1] 0.1843137
[1] 0.1686275
[1] 0.1882353
Note

Logic dictates that an increased median is relative to an increase in lighter gradient tones in an image.

To save the outputs, it is recommended that a data frame be created before hand and then to use the rbind function to store the outputs. For example

new_data <- data.frame()

for(i in 1:file_num) {
  readImage(files = list.files(pattern = "vlc",
                               recursive = TRUE)[i]) -> img 
  print(median(img)) -> results
  new_data <- rbind(new_data, results)
}
[1] 0.1843137
[1] 0.1686275
[1] 0.1882353
colnames(new_data) <- "Medians"

# Previewing the analytical result:
print(new_data$Medians)
[1] 0.1843137 0.1686275 0.1882353

2.2 Recommended post-analysis

As we are dealing with medians, it is inferred that the data follows a non-parametric distribution, where it is advisable to scruntise the data in its rank-transformed state. Specifically:

  • When dealing with unpaired data, use a Mann-Whitney \(U\) test.
  • When dealing with paired data, use a Wilcoxon signed rank test.
  • When dealing with multiple ordinal series, use the Kruskal-Wallis \(H\) test.

References

Becker, R.A., Chambers, J.M. and Wilks, A.R. (1988) The new s, Language.
McNamara, G. (2005) Color balancing histology images for presentations and publication, Journal of Histotechnology, 28(2), 81–88.
Pau, G., Fuchs, F., Sklyar, O., et al (2010) EBImage—an r package for image processing with applications to cellular phenotypes, Bioinformatics, 26(7), 979–981, doi: 10.1093/bioinformatics/btq046.
Valente, T., Ventura, D., Matiddi, M., et al (2023) Image processing tools in the study of environmental contamination by microplastics: Reliability and perspectives, Environmental Science and Pollution Research, 30(1), 298–309.
Xu, L., Ma, H. and Wang, Z. (2022) Soil salinity and soil water content estimation using digital images in coastal field: A case study in yancheng city of jiangsu province, china, Chinese Geographical Science, 32(4), 676–685.