17 Oct 2017

Becoming a better data analyst

  1. Crisis in reproducibility
  2. An opportunity and inspiration
  3. Willingness to learn and share
  4. Support of an open source community

Crisis and opportunity

An opportunity and inspiration

  • Proteomics data
  • Submitted for publication
  • Analysis not robust enough
  • Set out to learn more…
  • Data viz I couldn't do in Excel

Inspiration

Opportunity

My first cluster dendrogram

# make the cluster dendrogram object using the hclust() and dist() functions
# (P1 ... P12 are assumed to be numeric vectors, one per patient sample)
hc <- hclust(dist(rbind(P1, P2, P3, P4, P5, P6, P7, P8, P9, P10, P11, P12)))

# plot the cluster dendrogram 
plot(hc, xlab = "Patient Samples")
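
The snippet above relies on P1 to P12 already existing in the session. As a minimal, self-contained sketch of the same idea, the version below uses simulated values in place of the proteomics data. Everything about the data here is an assumption for illustration: the number of proteins, the two shifted groups and the `sample_matrix` name are not from the original analysis.

```r
# Simulated stand-in for the proteomics data (made-up values):
# 12 patient samples, each a numeric vector of 50 protein intensities.
set.seed(42)
n_proteins <- 50
samples <- lapply(1:12, function(i) {
  # shift samples 7-12 upwards so that two groups can emerge
  rnorm(n_proteins, mean = if (i <= 6) 10 else 12, sd = 1)
})
names(samples) <- paste0("P", 1:12)

# one row per sample, equivalent to rbind(P1, P2, ..., P12)
sample_matrix <- do.call(rbind, samples)

# dist() defaults to Euclidean distance between rows;
# hclust() defaults to complete linkage
hc <- hclust(dist(sample_matrix))

# plot the cluster dendrogram
plot(hc, xlab = "Patient Samples")

# cut the tree into two groups and look at the membership
groups <- cutree(hc, k = 2)
print(groups)
```

Because `dist()` works on rows, each sample has to be a row of the matrix. That is why the original call binds the sample vectors together with `rbind()` rather than `cbind()`.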

Willingness to learn and share

  • My professional society - the Biochemical Society
  • Suggested a training day in R
  • Then, I was organising it and delivering it
  • Created a blog: R for Biochemists
  • Delivered a training day - June 2015
  • Further training in Cardiff, Germany and Namibia

Open Source Learning & Teaching

  • R User Group
  • Making packages - talk on 2nd of November
  • Twitter: @brennanpcardiff #rstats

Package development

  • A package is a collection of R functions
  • Created and shared with the community
  • Packages are being developed regularly
  • Hosted on GitHub, CRAN, Bioconductor

drawProteins Package

  • Bringing together web scraping and data visualisation
  • Our bodies are made up of cells
  • These cells depend on molecules called proteins
  • Built a package to generate visualisations of these proteins

drawProteins Package - learning continues

  • Testing functions
  • Package coverage
  • Documentation
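
To make these three items concrete, here is a small sketch of a documented and tested function. The function `count_residues()` is a hypothetical helper invented for this illustration and is not part of drawProteins. The comment block uses roxygen2 syntax, and the test uses testthat if it is installed.

```r
#' Count residues in a protein sequence
#'
#' Hypothetical helper, for illustration only.
#'
#' @param sequence A single character string of one-letter amino acid codes.
#' @return An integer: the number of residues in the sequence.
#' @examples
#' count_residues("MSRQ")
count_residues <- function(sequence) {
  stopifnot(is.character(sequence), length(sequence) == 1)
  nchar(sequence)
}

# a quick check with base R only
stopifnot(count_residues("MSRQ") == 4L)

# the same checks written as a testthat unit test, if testthat is available
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("count_residues counts one-letter codes", {
    testthat::expect_equal(count_residues("MSRQ"), 4L)
    testthat::expect_error(count_residues(c("MS", "RQ")))
  })
}
```

Inside a package, a test like this would normally live under `tests/testthat/`, the roxygen comments would be turned into help pages with `roxygen2::roxygenise()`, and `covr::package_coverage()` would report how much of the code the tests exercise.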

drawProteins Package - demo

Keratin - an important skin and hair protein

## [1] "Download has worked"
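
A rough sketch of what a demo like this might look like is given below. It rests on several assumptions. The function names are those of the drawProteins release on Bioconductor as I understand it, and the version shown in October 2017 may have used different names. The accession P04264 (a human keratin) is my choice, since the slides do not say which keratin entry was used. The wrapper `draw_protein_schematic()` is hypothetical. The download needs an internet connection, so the final call is guarded.

```r
# Assumption: UniProt accession for a human keratin; the slides do not say
# which keratin entry was used in the demo.
keratin_accession <- "P04264"

# Hypothetical wrapper: download the features for one accession from UniProt
# and draw the chain and domains as a ggplot2 object.
draw_protein_schematic <- function(accession) {
  features <- drawProteins::get_features(accession)
  feature_df <- drawProteins::feature_to_dataframe(features)
  p <- drawProteins::draw_canvas(feature_df)
  p <- drawProteins::draw_chains(p, feature_df)
  p <- drawProteins::draw_domains(p, feature_df)
  p
}

# Only attempt the download in an interactive session with the package installed
if (interactive() && requireNamespace("drawProteins", quietly = TRUE)) {
  print(draw_protein_schematic(keratin_accession))
}
```

The "Download has worked" line in the output above is the kind of status message printed when the request to UniProt succeeds.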

Summary - becoming a better data analyst

  • TIME and PATIENCE
  • Inspiration
  • Reasons to stay committed
  • Open Source Learning & Teaching

Acknowledgements