if(!require(BiocManager)) install.packages("Biomanager")
if(!require(EBImage)) BiocManager::install("EBImage")
Automating medians from intensity plots
A brief methodology
The purpose of this brief document is to explain the batch extraction of median values (\(\mu\)) from intensity histograms associated with photographic images featuring rasterisation properties.
1 Image data in R
As the sophistication of digitised photography becomes increasingly sophisticated, so does their analysis and subsequent manipulation (see McNamara, 2005). Increasingly quantitative analyses of raw image data are featuring more prominently in the Geographic Sciences (see Xu et al, 2022; Valente et al, 2023); a mutual space where these components meet is in the R programming language.
R is capable of performing sophisticated/cutting-edge statistical analysis on variety of data classifications including rasters converted from conventional images formats such as JPG/JPEG, PNG, GEOTIFF, and BMP.
A powerful package that introduces image data into an R environment is EBImage
(Pau et al, 2010) that is available through the Bioconductor repository.
With the EBImage
package installed, an image can be easily loaded using the readImage(...)
function. However, the batch process is worth going through.
1.1 Batch image upload
In this example, only two images will be loaded from a pre-set working directory. Each image is a JPG so the pattern follows the extension .jpg
.
# Image uploads (batch)
sapply(X = list.files(pattern = "vlc", recursive = TRUE), length) |>
sum() -> file_num
for(i in 1:file_num) {
readImage(files = list.files(pattern = "vlc", recursive = TRUE)[i]) -> img
channel(img, mode = "gray") -> img
}
The metadate for each image is stored in an Image Formal class and is now read for analysis.
2 Intensity distribution analysis
When it comes to grayscale images, possessing monochromatic properties, intensity is interpreted in coefficient terms with values ranging between 0 and 1.
- 0 represents pure black
- 1 represents pure white
An intensity distribution is best represented by a Tukey Histogram, easily implemented in R using the hist(...)
function. It shows the frequency of shades of gray in image(s) relative to their pixel intensity. This gives an analyst an idea of some of the photographic properties such as contrast, lights and darks; a photographer usually manipulates an intensity histograms to achieve equalisation.
In the case of the data that was loaded in Section 1.1, we can readily plot histograms using the Image class object called img
:
hist(img)
img
object returns the final image’s frequency distributionTo extract individual histograms for each image, we alter our ealier code by introducing the hist(...)
function within a for-loop that breaks at the maximum number of images. To return single-channel intensity frequencies, the image is converted to grayscale using the channel(...)
function:
par(mfrow = c(2,2)) # Set parameters to accommodate plots
for(i in 1:file_num) {
readImage(files = list.files(pattern = "vlc", recursive = TRUE)[i]) -> img
hist(
channel(
mode = "gray"
img, xlim = c(0,1)
),
) }
img
objectNaturally, if the task is analyse hundreds of images, histograms will quickly fall out of favour. Therefore, it is efficient to examine differences by scrutinising shifts in each histogram’s (pixel-intensity) median.
2.1 Extracting medians from intensity plots
The process for extracting the median from an object is as denoted by the following equations:
\[ \mu = X \left [ \frac{n+1}{2} \right ] \tag{1}\]
\[ \mu' = \frac{X \left [ \frac{n}{2} \right ] + \left[ \frac{n}{2} + 1 \right ] }{2} \tag{2}\]
Equation 1 calculates the \(\mu\) when a sample is even; Equation 2 is used when a sample is uneven/odd. To distinguish the latter process, the notation \(\mu'\) is used. R’s implementation flexibly adapts according to protocols established by Becker et al (1988). This is evoked using the median(...)
function on the img
data created in Section 1.1.
As a for-loop is introduced, both formulaic processes can augmented to produce the following loop:
\[ \bigcup_{i=1}^{\mu \cdot n}x_i = \mu(1...n) \]
The \(\bigcup\) represents the unified object created by the readImage(...)
function that contains the metadata from \(x_i\) number of images. The process in R (with printed result) is as follows:
for(i in 1:file_num) {
readImage(files = list.files(pattern = "vlc",
recursive = TRUE)[i]) -> img
print(median(img))
}
[1] 0.1843137
[1] 0.1686275
[1] 0.1882353
Logic dictates that an increased median is relative to an increase in lighter gradient tones in an image.
To save the outputs, it is recommended that a data frame be created before hand and then to use the rbind
function to store the outputs. For example
<- data.frame()
new_data
for(i in 1:file_num) {
readImage(files = list.files(pattern = "vlc",
recursive = TRUE)[i]) -> img
print(median(img)) -> results
<- rbind(new_data, results)
new_data }
[1] 0.1843137
[1] 0.1686275
[1] 0.1882353
colnames(new_data) <- "Medians"
# Previewing the analytical result:
print(new_data$Medians)
[1] 0.1843137 0.1686275 0.1882353
2.2 Recommended post-analysis
As we are dealing with medians, it is inferred that the data follows a non-parametric distribution, where it is advisable to scruntise the data in its rank-transformed state. Specifically:
- When dealing with unpaired data, use a Mann-Whitney \(U\) test.
- When dealing with paired data, use a Wilcoxon signed rank test.
- When dealing with multiple ordinal series, use the Kruskal-Wallis \(H\) test.