Malaria Detector Project

This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.

Project Overview

This project explores the detection of Plasmodium falciparum infections using two diagnostic methods: microscopy and PCR. Adoption of molecular techniques (PCR) has revealed many low-density, transmissible infections that are often missed by microscopy (submicroscopic infections). The analysis aims to compare detection rates, compute prevalence ratios, and visualize how submicroscopic infections vary across global malaria regions.

Tasks

  • Visualize PCR % vs. Microscopy %
  • Add a 1:1 reference line to compare both techniques
  • Compute the Prevalence Ratio (Microscopy Positives/PCR Positives)
  • Generate boxplots of prevalence ratios across global regions
  • Interpret results to determine which region has the largest share of submicroscopic infections

First, install the tidyverse package if it is not already available. It includes readr, which offers a faster and more consistent way to read tabular data than base R.
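A quick way to check whether the package is present before loading it can be sketched in base R. The `required` vector is an illustrative name, and the snippet only reports what is missing rather than installing anything.

```r
# Packages this notebook relies on (illustrative list)
required <- c("tidyverse")

# requireNamespace() returns FALSE, without raising an error, when a package is absent
is_installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Install with: install.packages(c(",
          paste0('"', missing_pkgs, '"', collapse = ", "), "))")
} else {
  message("All required packages are installed.")
}
```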

Loading libraries

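library(tidyverse)  # attaches ggplot2, dplyr, readr and the other core packages
library(readr)      # already attached by tidyverse; loaded explicitly for clarity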

Loading Dataset

# Read the tab-separated dataset directly from GitHub
dataset <- read.table(
  file = "https://raw.githubusercontent.com/HackBio-Internship/public_datasets/main/R/lancet_malaria.txt",
  header = TRUE, sep = "\t"
)
malaria_data <- dataset  # work on a copy so the raw import stays untouched
head(malaria_data)

Renaming of the column names

# Replace the imported headers with shorter, consistent names (positional, left to right)
colnames(malaria_data) <- c(
  "Review Found", "Author", "Title", "Year", "Region", "Country", "Location",
  "PCR_N_Tested", "PCR_N_Positive", "PCR_Percent",
  "Microscopy_N_Tested", "Microscopy_N_Positive", "Microscopy_Percent",
  "Historical_Transmission", "Current_Transmission",
  "Setting_20", "Setting_15", "Setting_10", "Setting_5",
  "PCR_Method", "Microscopy_Fields", "Sampling_Season", "Notes"
)
head(malaria_data)
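A positional rename like this depends on the file having exactly as many columns as names supplied, in the same order. A minimal sketch of a guarded rename follows, using a small made-up data frame; `toy` and `new_names` are illustrative and not part of the dataset.

```r
# Toy data frame standing in for the imported table (made-up values)
toy <- data.frame(V1 = c("A", "B"), V2 = c(10, 20), V3 = c(4, 5))

new_names <- c("Region", "PCR_N_Tested", "PCR_N_Positive")

# Stop early if the number of names does not match the number of columns
stopifnot(length(new_names) == ncol(toy))

colnames(toy) <- new_names
print(colnames(toy))
```

The same check could be placed before the `colnames(malaria_data) <- ...` assignment so that a change in the source file is caught immediately.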

Visualization of PCR % against microscopy %

# Microscopy % goes on the x-axis and PCR % on the y-axis, matching the axis labels
plot(malaria_data$Microscopy_Percent, malaria_data$PCR_Percent,
     xlab = "Microscopy %", ylab = "PCR %",
     main = "PCR vs Microscopy Prevalence",
     col = "blue", pch = 19)
abline(0, 1, lty = 2, col = "red")
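When PCR % is on the y-axis and microscopy % on the x-axis, a point above the 1:1 line is a survey where PCR found a higher prevalence than microscopy. The share of such surveys can be counted directly; the sketch below uses made-up values in a data frame named `toy`, which is an assumption for illustration only.

```r
# Made-up prevalence values for five surveys (percent positive)
toy <- data.frame(
  Microscopy_Percent = c(10, 25, 40, 5, 60),
  PCR_Percent        = c(18, 25, 55, 12, 58)
)

# TRUE where PCR found a higher prevalence than microscopy (point above the line)
above_line <- toy$PCR_Percent > toy$Microscopy_Percent

n_above <- sum(above_line)       # number of surveys above the 1:1 line
share_above <- mean(above_line)  # proportion of surveys above the 1:1 line
print(n_above)
print(share_above)
```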

Prevalence Ratio

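# Microscopy positives divided by PCR positives; values below 1 mean microscopy found fewer infections than PCR
malaria_data$Prevalence_Ratio <- malaria_data$Microscopy_N_Positive / malaria_data$PCR_N_Positive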
head(malaria_data)
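Two caveats are worth keeping in mind. A ratio of positive counts equals a ratio of prevalences only when both methods tested the same number of samples, and a survey with zero PCR positives produces `NaN` or `Inf`. The sketch below shows one possible guard on made-up counts; the `toy` data frame is illustrative and the choice to return `NA` is an assumption, not something the dataset prescribes.

```r
# Made-up counts for four surveys
toy <- data.frame(
  Microscopy_N_Positive = c(30, 12, 0, 5),
  PCR_N_Positive        = c(60, 12, 0, 20)
)

# Return NA instead of NaN/Inf when no PCR positives were found
toy$Prevalence_Ratio <- ifelse(
  toy$PCR_N_Positive > 0,
  toy$Microscopy_N_Positive / toy$PCR_N_Positive,
  NA_real_
)
print(toy$Prevalence_Ratio)
```

Rows with `NA` are dropped by `boxplot()` and `geom_boxplot()`, so they do not distort the regional comparison.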

PCR% vs Microscopy% by Region

ggplot(malaria_data, aes(x = Microscopy_Percent, y = PCR_Percent, color = Region)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 1, linetype = "dotted") +
  facet_wrap(~Region) +
  labs(title = "PCR% vs Microscopy% by Region",
       x = "Microscopy %", y = "PCR %")

Prevalence Ratio by Region

boxplot(Prevalence_Ratio ~ Region, data = malaria_data,
        main = "Prevalence Ratio by Region",
        xlab = "Global Region", ylab = "Prevalence Ratio",
        col = c("lightblue","lightgreen","lightpink","lightyellow"),
        las = 2, notch = TRUE)
abline(h = 1, col = "red", lty = 2) 

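According to the boxplot above, West Africa has the highest median prevalence ratio. This suggests that microscopy detects a larger share of PCR-positive infections there than in the other regions, so West Africa has the smallest proportion of submicroscopic infections. Conversely, the region with the lowest median ratio has the largest proportion of infections missed by microscopy.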
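A visual reading of the boxplot can be backed up by computing the median ratio per region. The following is a minimal sketch on made-up values; the region names "Region A" and "Region B" and the `toy` data frame are hypothetical and do not come from the dataset.

```r
# Made-up prevalence ratios for two hypothetical regions
toy <- data.frame(
  Region = c("Region A", "Region A", "Region A",
             "Region B", "Region B", "Region B"),
  Prevalence_Ratio = c(0.8, 0.6, 0.7, 0.3, NA, 0.5)
)

# Median ratio per region, ignoring missing values
medians <- tapply(toy$Prevalence_Ratio, toy$Region, median, na.rm = TRUE)
print(medians)

# Lowest median: largest share of infections missed by microscopy
lowest <- names(which.min(medians))
# Highest median: microscopy agrees most closely with PCR
highest <- names(which.max(medians))
print(lowest)
print(highest)
```

Applied to `malaria_data`, the same `tapply()` call would give the numbers behind each box in the plot.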

Prevalence Ratio by Region Using ggplot

ggplot(malaria_data, aes(x = Region, y = Prevalence_Ratio, fill = Region)) +
  geom_boxplot(alpha = 0.7) +
  labs(title = "Prevalence Ratio by Region",
       x = "Region", y = "Prevalence Ratio")