Computing relative viral levels from COVID-19 test kits using R
Author
John Anderson
Background
A while ago, I saw a (formerly known as) Twitter post where someone had computed COVID levels from rapid-test results. Unfortunately, I can’t find the post, but I recall the person had to convert images of the test results to grayscale, and then compute averages for how dark the band for the control and test areas were.
A few weeks ago, I got COVID, and this seemed like as good a project to try out as any. At first, I tried using imageJ to compute the average values for regions of interest, but this is tedious, and likely could be automated.
Here’s an approach in R that automatically reads in cropped test results from png files in a directory and produces a ratio of the Test (minus background) to Control (minus background) results.
In plain English, higher values of this ratio indicate relatively darker lines for the Test (COVID positive) region than the Control region.
What you need!
A few images of COVID tests - preferably, some will be positive to varying degrees. Here’s an example of a raw image:
Raw photo of COVID test
… and here’s how you’ll need to reformat that image for this script to work.
reformatted photo (C is the right hand bar in ALL cases). The image should be grayscale and cropped tight around the test “well”
If you have at least one of these in a directory called “COVID_PHOTOS”, you can run the script to get values.
Overview of the steps:
The script will attempt to resize all the photos to have the same width & then allow for differences in intial cropping by saving the dimensions of the new image.
The “process image” function then reads in each file, converts it to grayscale, and applies three masks, one for the Test region, one for the Control region, and one for the Background (BG).
Mean values are computed within the masks (images are scaled between 0 and 1) and backgrounds are subtracted from the T and the C values. What this yields, are values for how much darker the test and control regions are than the background regions. In the figure below, the green region corresponds to the mask for control, the yellow for the background, and the pink to the test.
Masked regions for image intensity calculations
Finally, a ratio of T/C is computed, and higher values indicate how much darker (relatively) T is than C.
resize_images_magick <-function(image_paths, width =486) { resized_paths <-character(length(image_paths)) dimensions <-matrix(0, nrow =length(image_paths), ncol =2)for (i inseq_along(image_paths)) { image <-image_read(image_paths[i]) resized_image <-image_scale(image, paste0(width, "x")) resized_path <-paste0("resized_", basename(image_paths[i]))image_write(resized_image, resized_path) resized_paths[i] <- resized_path dimensions[i,] <-c(width, image_info(resized_image)$height) # Get the new height }return(list(paths = resized_paths, dimensions = dimensions))}process_image <-function(image_path, mask_C, mask_T, mask_BG) {# Load the image image <-image_read(image_path)# Convert to grayscale image_gray <-image_convert(image, colorspace ='Gray')# Get the pixel values pixels <-as.numeric(image_data(image_gray)) height <-nrow(mask_C) width <-ncol(mask_C) pixels <-matrix(pixels, nrow = height, ncol = width) # Reshape pixels according to the dimensions# Apply the masks to extract the regions of interest roi_C <- pixels * mask_C roi_T <- pixels * mask_T roi_BG <- pixels * mask_BG# Calculate mean intensity mean_T <-mean(roi_T[roi_T >0]) mean_C <-mean(roi_C[roi_C >0]) mean_BG <-mean(roi_BG[roi_BG >0])# Subtract background corrected_T <- mean_T - mean_BG corrected_C <- mean_C - mean_BG# Calculate ratio ratio <- corrected_T / corrected_Creturn(ratio)}# Example usage for multiple imagesoriginal_image_paths <-list.files("/Users/johnanderson/Downloads/COVID_PHOTOS/", pattern ="\\.(jpg|jpeg|png)$", full.names =TRUE) # Adjust path and extensions as neededresized_images <-resize_images_magick(original_image_paths)resized_image_paths <- resized_images$pathsdimensions <- resized_images$dimensionsratios <-numeric(length(resized_image_paths))# Initialize a data frame to hold the resultsresults <-data.frame(filename =character(), ratio =numeric())# Iterate through the images and apply masks based on their individual dimensionsfor (i inseq_along(resized_image_paths)) { width <- dimensions[i, 1] height <- dimensions[i, 2] mask_C <-matrix(0, nrow = height, ncol = width) mask_T <-matrix(0, nrow = height, ncol = width) mask_BG <-matrix(0, nrow = height, ncol = width)# Define the specific regions for each mask here (update these as needed) mask_C[80:120, 310:350] <-1 mask_T[80:120, 155:195] <-1 mask_BG[80:120, 195:310] <-1# Call the display_masks function to visualize the masks ratio <-process_image(resized_image_paths[i], mask_C, mask_T, mask_BG)# Extract the filename without the extension filename <- tools::file_path_sans_ext(basename(resized_image_paths[i]))# Add the filename and ratio to the results data frame results <-rbind(results, data.frame(filename = filename, ratio = ratio))}# You can now use the results data frame for further analysisprint(results)
If you want to plot these values, you might also want to include days from symptom onset. Here’s an example image for three people. Note that there was a slight increase in viral load around day 10 (this was also when I really started getting interested in the results and testing much more often).
library(readr)library(ggplot2)library(ggprism)COVID <-read_csv("~/Downloads/COVID.csv")COVID <- COVID[complete.cases(COVID$Person), ]ggplot(COVID, aes(x =`Days from symptom onset`, y = New_T_C, colour = Person, shape = Person))+geom_point(size =4)+theme_prism()+scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9"))+geom_smooth(se=FALSE)+geom_hline(yintercept =0, linetype=2, size =1)+ylab("T/C \n How much darker Test is than Control")+facet_wrap(~Person)