How test() and check() showed my bad code

Dr Paul Brennan

2 November 2017

R for Biochemists

drawProteins

My aim was to create a package that visualises proteins using data from the UniProt website.

Workflow:

  • Get data using UniProt API
  • Turn data from JSON into dataframe
  • Visualise using ggplot2

drawProteins - demo - step 1

library(magrittr)
library(drawProteins)
library(httr)
library(ggplot2)

# accession numbers of hair keratin
"Q14533 P19012 P02538" %>%
  drawProteins::get_features() ->
  protein_json
[1] "Download has worked"

drawProteins - demo - step 2

# turn JSON object into a dataframe
protein_json %>%
  drawProteins::feature_to_dataframe() ->
  prot_data

drawProteins - demo - step 3

# series of functions to visualise
prot_data %>%
  geom_chains() %>%
  geom_domains() %>%
  geom_region() %>%
  geom_motif() %>%
  geom_phospho(size = 8) -> p

p <- p + theme_bw(base_size = 20) +  # white background and change text size
  theme(panel.grid.minor=element_blank(),
        panel.grid.major=element_blank()) +
  theme(axis.ticks = element_blank(),
        axis.text.y = element_blank()) +
  theme(panel.border = element_blank())

drawProteins - output

[Plot: output of the demo code above]

drawProteins - works - let's make a package

New concepts

First Build - unsuccessful (of course)

Finally, with a bit of effort, I managed to get devtools::check() and Travis CI giving the same results

No ERRORS but WARNINGS and NOTES

  • missing documentation entries … WARNING (see the ‘Writing R Extensions’ manual)
  • checking installed package size … NOTE

Learning curve

How to be a good documenter

Becoming a better tester

drawProteins - Bioconductor

Bioconductor has more checks than CRAN

  • I want drawProteins to work with other Bioconductor packages
  • Style guide for Vignette (add bioc_required: true to .travis.yml)
  • BiocCheck package: run BiocCheck()
  • ERROR: At least 80% of man pages documenting exported objects must have runnable examples.
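A minimal sketch of what a documented, exported function with a runnable example can look like when roxygen2 comments are used. The function count_features() is a hypothetical helper made up for illustration, not part of drawProteins, and the column name type is borrowed from the demo data.

```r
#' Count features of one type (hypothetical helper, for illustration only)
#'
#' @param data A data frame with a `type` column.
#' @param type Feature type to count, e.g. "DOMAIN".
#' @return The number of rows of `data` whose type matches.
#' @export
#' @examples
#' df <- data.frame(type = c("CHAIN", "DOMAIN", "DOMAIN"))
#' count_features(df, "DOMAIN")
count_features <- function(data, type = "DOMAIN") {
  # everything the function needs arrives through its arguments
  sum(data$type == type)
}

# The same call as in the @examples section
df <- data.frame(type = c("CHAIN", "DOMAIN", "DOMAIN"))
print(count_features(df, "DOMAIN"))
```

The @param and @return tags address the missing documentation entries, and the @examples section is the kind of runnable example the BiocCheck message asks for.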

Notes

devtools::check() shows: 0 errors | 0 warnings | 1 note

  • I can't ignore this any more
  • This is really what showed my bad code
  • geom_domains: no visible binding for global variable ‘prot_data’
  • I didn't really understand this NOTE…
  • I'd ignored it… just a NOTE….
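One rough way to see what this NOTE is about, without building the package, is to ask the codetools package which names a function uses but does not define itself. As far as I know this is the same kind of analysis that sits behind the check. The function draw_domains_like() is a hypothetical stand-in, not the real geom_domains().

```r
# Hypothetical stand-in for geom_domains(): prot_data is used inside the
# function but is neither an argument nor created locally.
draw_domains_like <- function(p, label_size = 4) {
  domains <- prot_data[prot_data[["type"]] == "DOMAIN", ]
  list(plot = p, n_domains = nrow(domains), label_size = label_size)
}

# findGlobals() lists the names the function has to look up outside its
# own body; merge = FALSE separates functions from variables.
globals <- codetools::findGlobals(draw_domains_like, merge = FALSE)
print(globals$variables)
```

The variables reported include prot_data, while the arguments p and label_size and the local object domains are not listed.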

Global variables... geom_chains...

# show the function
geom_chains <- function(prot_data = prot_data,
                        outline = "black",
                        fill = "grey",
                        label_chains = TRUE,
                        labels = prot_data[prot_data$type == "CHAIN",]$entryName,
                        size = 0.5,
                        label_size = 4){

    p <-ggplot2::ggplot() +
    ggplot2::ylim(0.5, max(prot_data$order)+0.5) +

Some variables are passed in as arguments, e.g. prot_data, BUT not all of them…
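A side observation on the signature above, shown as a small sketch with a hypothetical cut-down function: a default of the form prot_data = prot_data does not pick up a global object called prot_data. It only works when the argument is supplied, as it is in the piped demo.

```r
# Hypothetical cut-down version of the signature shown above.
chains_like <- function(prot_data = prot_data) {
  nrow(prot_data)
}

df <- data.frame(type = c("CHAIN", "DOMAIN"))

# Supplying the argument works, as in the piped demo.
n_rows <- chains_like(df)

# Relying on the default fails: the default refers to the argument
# itself, so R reports a recursive default argument reference.
msg <- tryCatch(chains_like(), error = function(e) conditionMessage(e))

print(n_rows)
print(msg)
```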

Global variables... geom_domains

geom_domains <- function(p,
                         label_domains = TRUE,
                         label_size = 4){
  p <- p +
    ggplot2::geom_rect(data= prot_data[prot_data$type == "DOMAIN",],
            mapping=ggplot2::aes(xmin=begin,

Clearer here: prot_data and begin are NOT passed in to the function, so R has to go and find them somewhere else…
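Why this matters can be sketched in base R with two hypothetical helper functions: one reaches out to whatever prot_data happens to exist in the global environment, the other only uses the data it is given.

```r
# A data frame that happens to be called prot_data in the global environment
prot_data <- data.frame(type = c("CHAIN", "DOMAIN"), begin = c(1, 10))

# Uses a variable it was never given: R searches outwards and finds the global
count_domains_global <- function() {
  sum(prot_data$type == "DOMAIN")
}

# Uses only what is passed in
count_domains_arg <- function(data) {
  sum(data$type == "DOMAIN")
}

other <- data.frame(type = c("DOMAIN", "DOMAIN", "CHAIN"), begin = c(5, 20, 1))

print(count_domains_global())   # counts domains in the global prot_data: 1
print(count_domains_arg(other)) # counts domains in the data passed in: 2
```

The first function silently depends on the name of an object in the user's workspace, so it breaks or gives the wrong answer as soon as the data is called anything else.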

This is bad code....ARGHH!!!

At the CaRdiff User Group

  • we discussed variable scope
  • but until I really addressed this NOTE
  • I didn't really understand it!!!

I think I'd better think it out again!!!

Options

  • Pass in each of the variables…
  • Learn how to allow some sort of inheriting of variables
  • Make a proper ggplot2 extension
  • Go to the pub and forget about coding completely!!!
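A minimal sketch of the first option, assuming ggplot2 is installed. The function draw_domains_sketch() is hypothetical and is not the package's eventual API. The columns end and order are assumptions about the data frame (begin, type and order appear in the code above, end does not). The .data pronoun inside aes() is one way to tell ggplot2, and the check, that begin is a column rather than a global variable.

```r
library(ggplot2)

# Hypothetical rewrite: the data is passed in explicitly as an argument
draw_domains_sketch <- function(p, data) {
  domains <- data[data$type == "DOMAIN", ]
  p + ggplot2::geom_rect(
    data = domains,
    mapping = ggplot2::aes(xmin = .data$begin,
                           xmax = .data$end,      # assumed column
                           ymin = .data$order - 0.25,
                           ymax = .data$order + 0.25),
    inherit.aes = FALSE
  )
}

# Toy data with the assumed columns
toy <- data.frame(type  = c("CHAIN", "DOMAIN"),
                  begin = c(1, 20),
                  end   = c(100, 60),
                  order = c(1, 1))

p <- draw_domains_sketch(ggplot2::ggplot(), toy)
```

Inside a package, the .data pronoun would also need to be imported from rlang for the check to be fully satisfied.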

Acknowledgements

Contacts