This project explores the detection of *Plasmodium falciparum* infections using two diagnostic methods: microscopy and PCR. The adoption of molecular techniques such as PCR has revealed many low-density but transmissible infections that microscopy often misses. These are known as submicroscopic infections. The analysis compares detection rates between the two methods, computes the prevalence ratio (microscopy prevalence divided by PCR prevalence) for each survey, and visualizes how the share of submicroscopic infections varies across global malaria regions.
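The prevalence ratio for a single survey can be written in terms of the dataset's count columns. Here $x$ is the number of positives and $n$ the number tested by each method, corresponding to `Microscopy_N_Positive`, `Microscopy_N_Tested`, `PCR_N_Positive` and `PCR_N_Tested` after renaming. This is a sketch of the usual definition. When both methods are applied to the same samples, the two denominators are equal and the ratio reduces to microscopy positives over PCR positives.

$$
\mathrm{PR} = \frac{x_{\mathrm{mic}} / n_{\mathrm{mic}}}{x_{\mathrm{PCR}} / n_{\mathrm{PCR}}}
$$

A value close to 1 means microscopy finds nearly every infection that PCR detects. Values well below 1 mean that many infections are submicroscopic.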
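First, make sure the tidyverse package is installed. Loading it attaches readr, a more modern, faster, and more consistent way to read tabular data than base R. It also attaches ggplot2, which is used for the plots below.
# install.packages("tidyverse")  # run once if the package is not installed yet
library(tidyverse)  # attaches readr and ggplot2, among others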
# Read the tab-separated dataset directly from GitHub with readr
malaria_data <- read_tsv("https://raw.githubusercontent.com/HackBio-Internship/public_datasets/main/R/lancet_malaria.txt")
head(malaria_data)
# Rename the 23 columns (in file order) to short, syntactic names
colnames(malaria_data) <- c("Review_Found", "Author", "Title", "Year", "Region", "Country", "Location",
  "PCR_N_Tested", "PCR_N_Positive", "PCR_Percent", "Microscopy_N_Tested", "Microscopy_N_Positive",
  "Microscopy_Percent", "Historical_Transmission", "Current_Transmission", "Setting_20", "Setting_15",
  "Setting_10", "Setting_5", "PCR_Method", "Microscopy_Fields", "Sampling_Season", "Notes")
head(malaria_data)
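# One point per survey; points above the red dashed line (y = x) are surveys
# where PCR detected a higher prevalence than microscopy
plot(malaria_data$Microscopy_Percent, malaria_data$PCR_Percent,
     xlab = "Microscopy %", ylab = "PCR %",
     main = "PCR vs Microscopy Prevalence",
     col = "blue", pch = 19)
abline(0, 1, lty = 2, col = "red")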
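# Prevalence ratio = microscopy prevalence / PCR prevalence (NA when PCR found no positives)
malaria_data$Prevalence_Ratio <- with(malaria_data, ifelse(PCR_N_Positive > 0,
  (Microscopy_N_Positive / Microscopy_N_Tested) / (PCR_N_Positive / PCR_N_Tested), NA))
head(malaria_data)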
ggplot(malaria_data, aes(x = Microscopy_Percent, y = PCR_Percent, color = Region)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 1, linetype = "dotted") +  # line of equal detection
  facet_wrap(~Region) +
  labs(title = "PCR% vs Microscopy% by Region",
       x = "Microscopy %", y = "PCR %")
boxplot(Prevalence_Ratio ~ Region, data = malaria_data,
main = "Prevalence Ratio by Region",
xlab = "Global Region", ylab = "Prevalence Ratio",
col = c("lightblue","lightgreen","lightpink","lightyellow"),
las = 2, notch = TRUE)
abline(h = 1, col = "red", lty = 2)
According to the boxplot above, West Africa has the highest median prevalence ratio. This suggests that microscopy detects a larger share of the infections found by PCR in West Africa than in the other regions.
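Reading medians off a plot can be imprecise, and notched boxplots drawn from only a few surveys per region are hard to interpret. A minimal base-R sketch that tabulates the number of usable surveys and the median prevalence ratio per region could serve as a numeric check. `summarise_ratio_by_region()` is a hypothetical helper name, and the toy values below are made up purely to show the output shape. In the notebook you would call `summarise_ratio_by_region(malaria_data)` instead.

```r
# Hypothetical helper: per-region survey count and median Prevalence_Ratio,
# sorted from highest to lowest median. Missing ratios are dropped first.
summarise_ratio_by_region <- function(df) {
  ok <- !is.na(df$Prevalence_Ratio)
  n <- tapply(df$Prevalence_Ratio[ok], df$Region[ok], length)
  med <- tapply(df$Prevalence_Ratio[ok], df$Region[ok], median)
  out <- data.frame(
    Region = names(med),
    n = as.vector(n),
    median_ratio = as.vector(med),
    stringsAsFactors = FALSE
  )
  out <- out[order(out$median_ratio, decreasing = TRUE), ]
  rownames(out) <- NULL
  out
}

# Toy rows with made-up values (not results from the dataset)
toy <- data.frame(
  Region = c("A", "A", "B", "B", "B"),
  Prevalence_Ratio = c(0.2, 0.3, 0.6, 0.8, NA)
)
summarise_ratio_by_region(toy)
```

Reporting `n` alongside the median makes it easy to see when a region's ranking rests on very few surveys.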
ggplot(malaria_data, aes(x = Region, y = Prevalence_Ratio, fill = Region)) +
  geom_boxplot(alpha = 0.7) +
  labs(title = "Prevalence Ratio by Region",
       x = "Region", y = "Prevalence Ratio")
The boxplot of prevalence ratios across global regions highlights notable differences in the burden of submicroscopic *Plasmodium falciparum* infection. South America has the lowest median prevalence ratio. Microscopy detects relatively few of the infections that PCR finds there, which suggests that a large share of infections in the region are submicroscopic.
In contrast, Asia & Oceania and East Africa show intermediate prevalence ratios, consistent with a moderate share of submicroscopic infections. West Africa has the highest median prevalence ratio. This implies that microscopy performs comparatively well there, and that submicroscopic infections make up a smaller share of all infections than in the other regions.
These findings underscore the critical role of molecular diagnostics such as PCR in estimating the true malaria burden. This matters most in regions like South America, where a large share of infections are submicroscopic.
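The "share of submicroscopic infections" can be made explicit. This is a minimal sketch under two assumptions: microscopy and PCR were run on the same samples, and every microscopy-positive sample is also PCR-positive. Under those assumptions, the fraction of PCR-detected infections that microscopy missed is simply one minus the prevalence ratio. `submicroscopic_share()` is a hypothetical helper, and clamping ratios above 1 to a share of 0 is a design choice of this sketch, not part of the original analysis.

```r
# Hypothetical helper: fraction of PCR-detected infections missed by microscopy.
# ASSUMES same samples for both methods and microscopy positives being a subset
# of PCR positives, so that share = 1 - prevalence ratio.
submicroscopic_share <- function(prevalence_ratio) {
  share <- 1 - prevalence_ratio
  # A ratio above 1 (microscopy found more than PCR) breaks the assumption;
  # clamp the share to 0 in that case. NA ratios stay NA.
  pmax(share, 0)
}

# Example values chosen only to illustrate the behaviour
submicroscopic_share(c(0.25, 0.7, 1.1, NA))

# In the notebook, per survey:
# malaria_data$Submicroscopic_Share <- submicroscopic_share(malaria_data$Prevalence_Ratio)
```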