Identifying Tumor Growth Drivers in Biomedical Text

Nathan Byers
October 26, 2017

Motivation

Problem

  • Precision medicine uses genetic testing to tailor cancer treatment
  • A sequenced tumor could have thousands of mutations
  • Which mutations drive tumor growth and which don't?

Figure: driver (image source: http://www.pancreaticcancer.net.au/research-genomics/)

Problem

  • How is this currently handled?
  • A clinical pathologist manually reviews and classifies each mutation by searching the biomedical literature
  • This is a bottleneck
  • Can it be automated?

My Project

  • Identify diseases (DNorm) and mutations (tmVar) in the Kaggle training text using the National Center for Biotechnology Information (NCBI) API
  • Create features
    • Frequency of mutation
    • Distance from mutation to disease name
    • Distance from other mutations
    • Sentiment of text
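
A minimal sketch of how the first three features might be computed once mentions have been tagged. It assumes each mention is available as a character offset, its text, and a type label; the `Mention` tuple, the `mutation_features` function, and the example offsets are hypothetical illustrations, not output from DNorm or tmVar. Distance is measured in characters between mention start offsets, which is only one possible definition. Sentiment is left out because it would need a separate model.

```python
from typing import NamedTuple, Optional


class Mention(NamedTuple):
    """Hypothetical container for one tagged mention in a document."""
    start: int  # character offset where the mention begins
    text: str   # surface form, e.g. "V600E"
    kind: str   # "Mutation" or "Disease" (assumed labels)


def nearest(sources: list, targets: list) -> Optional[int]:
    """Smallest character distance between any source and any target mention."""
    dists = [abs(s.start - t.start) for s in sources for t in targets]
    return min(dists) if dists else None


def mutation_features(mentions: list, target: str) -> dict:
    """Frequency and proximity features for one mutation in one document."""
    mutations = [m for m in mentions if m.kind == "Mutation"]
    diseases = [m for m in mentions if m.kind == "Disease"]
    hits = [m for m in mutations if m.text == target]
    others = [m for m in mutations if m.text != target]
    return {
        "frequency": len(hits),
        "dist_to_disease": nearest(hits, diseases),
        "dist_to_other_mutation": nearest(hits, others),
    }


# Made-up mentions for illustration only.
example = [
    Mention(10, "V600E", "Mutation"),
    Mention(40, "melanoma", "Disease"),
    Mention(100, "G12D", "Mutation"),
    Mention(120, "V600E", "Mutation"),
]
features = mutation_features(example, "V600E")
print(features)
```

Returning `None` when no disease or other mutation is present keeps "no neighbor" distinct from "distance zero"; a downstream model would need that value imputed or encoded.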

My Project

  • Apply several machine learning algorithms
    • Decision Tree
    • Neural Network
    • Support Vector Machine
  • Score each model on the test text and see which is most accurate
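
The comparison could be sketched with scikit-learn as below. This is an assumption about tooling, not the project's actual code: the data is synthetic (generated by `make_classification` as a stand-in for the real feature table), the labels are a binary placeholder for driver vs. non-driver, and all hyperparameters are library defaults apart from the random seeds and iteration cap.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the feature table: 4 numeric features per mutation,
# binary label (placeholder for driver vs. non-driver).
X, y = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The three model families named on the slide, with default settings.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Neural Network": MLPClassifier(max_iter=1000, random_state=0),
    "Support Vector Machine": SVC(),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

Accuracy on a single held-out split is the simplest comparison; cross-validation would give a steadier ranking when the training set is small.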

References

  • Doughty E, Kertesz-Farkas A, Bodenreider O, et al. “Toward an automatic method for extracting cancer- and other disease-related point mutations from the biomedical literature.” Bioinformatics 2011 27(3):408–415

  • Singhal A, Simmons M, Lu Z. “Text mining for precision medicine: automating disease-mutation relationship extraction from biomedical literature.” Journal of the American Medical Informatics Association 2016 23:766–772

  • Leaman R, Doğan RI, Lu Z. “DNorm: disease name normalization with pairwise learning to rank.” Bioinformatics 2013 29(22):2909–2917

  • Wei C-H, Harris BR, Kao H-Y, Lu Z. “tmVar: a text mining approach for extracting sequence variants in biomedical literature.” Bioinformatics 2013 29(11):1433–1439