This project explores the detection of Plasmodium falciparum infections using two diagnostic methods: microscopy and PCR. Adoption of molecular techniques (PCR) has revealed many low-density, transmissible infections that are often missed by microscopy (submicroscopic infections). The analysis aims to compare detection rates, compute prevalence ratios, and visualize how submicroscopic infections vary across global malaria regions.
First, make sure the tidyverse package is installed. It includes readr, which offers a more modern, faster, and more consistent way to read tabular data than base R.
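Before loading the libraries, it can help to check whether the package is already installed. The snippet below is a minimal sketch: `missing_packages` is a hypothetical helper name, not part of the original notebook, and it only reports what is missing rather than installing anything.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("tidyverse"))
if (length(to_install) > 0) {
  # Installation is left to the user, e.g. install.packages("tidyverse")
  message("Not installed yet: ", paste(to_install, collapse = ", "))
}
```

Reporting instead of installing keeps the notebook from changing your library without you noticing.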
library(tidyverse)
library(readr)
#Read the dataset
dataset <- read.table(file = "https://raw.githubusercontent.com/HackBio-Internship/public_datasets/main/R/lancet_malaria.txt", header = TRUE, sep = "\t")
malaria_data <- dataset
head(malaria_data)
colnames(malaria_data) <- c("Review Found", "Author", "Title", "Year", "Region","Country","Location", "PCR_N_Tested", "PCR_N_Positive", "PCR_Percent","Microscopy_N_Tested", "Microscopy_N_Positive", "Microscopy_Percent", "Historical_Transmission", "Current_Transmission", "Setting_20", "Setting_15", "Setting_10", "Setting_5", "PCR_Method", "Microscopy_Fields", "Sampling_Season", "Notes")
head(malaria_data)
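Renaming columns by position, as done above, fails or mislabels data if the file does not have exactly the expected number of columns. A small guard can catch this early. This is only a sketch on a made-up toy table; `rename_checked` is a hypothetical helper, not part of the original analysis.

```r
# Hypothetical helper: rename columns only if the counts match.
rename_checked <- function(df, new_names) {
  if (ncol(df) != length(new_names)) {
    stop("Expected ", length(new_names), " columns but found ", ncol(df))
  }
  colnames(df) <- new_names
  df
}

# Toy stand-in for the real table (values are made up for illustration).
toy <- data.frame(V1 = c("Asia", "Africa"), V2 = c(12.5, 30.0))
toy <- rename_checked(toy, c("Region", "PCR_Percent"))
colnames(toy)
```

The same call could wrap the `colnames(malaria_data) <- c(...)` step, assuming the vector of new names is stored in a variable first.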
# Microscopy prevalence on the x-axis, PCR prevalence on the y-axis
plot(malaria_data$Microscopy_Percent, malaria_data$PCR_Percent,
     xlab = "Microscopy %", ylab = "PCR %",
     main = "PCR vs Microscopy Prevalence",
     col = "blue", pch = 19)
abline(0, 1, lty = 2, col = "red")
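# Prevalence ratio: microscopy-positive count divided by PCR-positive count
# (values below 1 mean microscopy found fewer infections than PCR)
malaria_data$Prevalence_Ratio <- malaria_data$Microscopy_N_Positive / malaria_data$PCR_N_Positive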
head(malaria_data)
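The ratio above is built from positive counts, which matches a ratio of prevalences only when both methods tested the same number of samples. If that does not hold for some studies, one alternative is to divide the percentage columns instead and to mark rows with zero PCR prevalence as missing. The sketch below uses invented toy numbers, and the column name `Prevalence_Ratio_Pct` is hypothetical.

```r
# Toy data; the numbers are invented purely for illustration.
toy <- data.frame(
  Microscopy_Percent = c(10, 0, 5),
  PCR_Percent        = c(20, 0, 5)
)

# Ratio of microscopy prevalence to PCR prevalence.
# Rows with zero PCR prevalence get NA instead of NaN or Inf.
toy$Prevalence_Ratio_Pct <- ifelse(
  toy$PCR_Percent > 0,
  toy$Microscopy_Percent / toy$PCR_Percent,
  NA_real_
)
toy
```

Keeping such rows as `NA` stops them from distorting the boxplots, since R's plotting functions drop missing values.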
ggplot(malaria_data, aes(x = Microscopy_Percent, y = PCR_Percent, color = Region)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 1, linetype = "dotted") +
  facet_wrap(~Region) +
  labs(title = "PCR% vs Microscopy% by Region",
       x = "Microscopy %", y = "PCR %")
boxplot(Prevalence_Ratio ~ Region, data = malaria_data,
main = "Prevalence Ratio by Region",
xlab = "Global Region", ylab = "Prevalence Ratio",
col = c("lightblue","lightgreen","lightpink","lightyellow"),
las = 2, notch = TRUE)
abline(h = 1, col = "red", lty = 2)
According to the boxplot above, West Africa has the highest median prevalence ratio. This suggests that microscopy detects a larger share of PCR-positive infections there than in other regions, meaning a smaller proportion of infections is submicroscopic.
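A reading taken from a boxplot can be checked numerically by computing the median ratio per region. The sketch below shows the idea on a made-up toy table with two placeholder regions; the same `aggregate()` call should work on `malaria_data`, assuming `Prevalence_Ratio` and `Region` exist as created above.

```r
# Toy data; regions and ratios are invented for illustration only.
toy <- data.frame(
  Region = c("A", "A", "A", "B", "B", "B"),
  Prevalence_Ratio = c(0.2, 0.4, 0.6, 0.5, 0.7, 0.9)
)

# Median prevalence ratio per region (rows with missing ratios are dropped).
medians <- aggregate(Prevalence_Ratio ~ Region, data = toy, FUN = median)

# Sort so the region with the highest median comes first.
medians[order(-medians$Prevalence_Ratio), ]
```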
ggplot(malaria_data, aes(x = Region, y = Prevalence_Ratio, fill = Region)) +
  geom_boxplot(alpha = 0.7) +
  labs(title = "Prevalence Ratio by Region",
       x = "Region", y = "Prevalence Ratio")