This plot illustrates the number of human genomes sequenced in major genomics projects since 2003. It was part of my Data Science in Context presentation.
Data were obtained from https://www.yourgenome.org/theme/timeline-history-of-genomics/ and allofus.nih.gov/about/program-overview/what-makes-all-us-different.
Since there were only a few data points, I created a dataframe manually.
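library(ggplot2)

# n_genomes is stored as log10(number of genomes), matching the y-axis label:
# 0 = 1 genome, 3 = 1,000, 5 = 100,000, 6 = 1,000,000
projects <- data.frame(
  year = c(2003, 2015, 2018, 2028),
  n_genomes = c(0, 3, 5, 6)
)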
ggplot(projects, aes(x = year, y = n_genomes)) +
  geom_point(color = "darkblue", size = 3) +
  annotate("text", x = 2008, y = 0, label = "Human Genome Project") +
  annotate("text", x = 2020, y = 3, label = "1000 Genomes Project") +
  annotate("text", x = 2012, y = 5, label = "100K Genomes Project (UK)") +
  annotate("text", x = 2028, y = 6.5, label = "All of Us", fontface = 2) +
  geom_smooth(method = 'lm', se = FALSE, color = "lightgray", linetype = "dashed") +
  xlim(2000, 2030) +
  scale_y_continuous(breaks = seq(0, 6)) +
  xlab("Completion date (year)") +
  ylab(bquote(bold(log[10](genomes)))) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.text = element_text(size = 12),
    axis.title = element_text(face = "bold"),
    axis.line = element_line(color = "black"),
    panel.background = element_blank()
  )
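
The dashed trend line in the plot is an ordinary least-squares fit of log10(genomes) on year. As a rough sketch of what that line implies, the same model can be fitted directly with `lm()` and its slope converted into a growth rate. The names `fit`, `slope`, `fold_per_year`, and `doubling_time` are illustrative and not part of the original script, and with only four data points the numbers should be read as a loose summary rather than a reliable estimate.

```r
# Same values as the plotted data frame; n_genomes is log10(number of genomes).
projects <- data.frame(
  year = c(2003, 2015, 2018, 2028),
  n_genomes = c(0, 3, 5, 6)
)

# Ordinary least-squares fit, the same model that geom_smooth(method = "lm") draws.
fit <- lm(n_genomes ~ year, data = projects)

slope <- unname(coef(fit)["year"])  # change in log10(genomes) per year
fold_per_year <- 10^slope           # multiplicative growth in genomes per year
doubling_time <- log10(2) / slope   # years for the genome count to double

print(c(slope = slope, fold_per_year = fold_per_year, doubling_time = doubling_time))
```

Because the y-axis is logarithmic, a straight line corresponds to exponential growth, which is why the slope translates into a constant fold change per year.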