Data Science Capstone Final Project

Ola Lie
May 10th, 2016

  • a 5 slide deck in this file (R Presentation)
  • a 5 slide deck on GitHub (Slidify)

EXPLORE

The content of the provided texts is explored in this Milestone Report.

Trigram

ALGORITHM

  1. Create corpus and clean data with tm
  2. Create bi-, tri- and tetragrams with RWeka
  3. In server.R (shiny)
    • Strip user input to the last three (or two, or one) words
    • Search first three words of tetragrams
    • If no matches, search first two words of trigrams
    • If no matches, search first word of bigrams
    • Calculate percentages for matches
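
The lookup in step 3 can be illustrated with a minimal sketch. It is not the app's actual server.R code: it uses base R instead of tm and RWeka so that it runs without extra packages, and the toy corpus and the function names ngram_counts and predict_next are hypothetical, chosen only for this example.

```r
# Hypothetical toy corpus; the real app is built from the provided texts.
corpus <- c("thanks for the follow", "thanks for the help",
            "thanks for the follow", "for the record", "the end")

# Count all n-grams of order n (base R stand-in for the RWeka tokenizer).
ngram_counts <- function(lines, n) {
  grams <- unlist(lapply(strsplit(tolower(lines), "\\s+"), function(w) {
    if (length(w) < n) return(character(0))
    sapply(seq_len(length(w) - n + 1),
           function(i) paste(w[i:(i + n - 1)], collapse = " "))
  }))
  table(grams)
}

# tables[[1]] = bigrams, tables[[2]] = trigrams, tables[[3]] = tetragrams
tables <- lapply(2:4, function(n) ngram_counts(corpus, n))

# Back off from tetragrams to trigrams to bigrams until something matches.
predict_next <- function(input, tables) {
  words <- strsplit(tolower(trimws(input)), "\\s+")[[1]]
  for (k in 3:1) {                       # k = number of context words
    if (length(words) < k) next
    # Last k words plus a trailing space, so only whole words match
    prefix <- paste(c(tail(words, k), ""), collapse = " ")
    tab <- tables[[k]]                   # holds the (k+1)-grams
    hits <- tab[startsWith(names(tab), prefix)]
    if (length(hits) > 0) {
      hits <- sort(hits, decreasing = TRUE)
      return(data.frame(
        word = sub(".* ", "", names(hits)),           # last word of each match
        percent = round(100 * as.numeric(hits) / sum(hits), 1)
      ))
    }
  }
  data.frame(word = character(0), percent = numeric(0))  # no match at any level
}

print(predict_next("Thanks for the", tables))
```

With this toy corpus, "Thanks for the" is matched at the tetragram level, while an input such as "what about the" falls through to the bigram level because no longer context matches. The percentages are each candidate's share of the matches at the level where the search stopped.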

Word Cloud

PERFORMANCE

Less than
five seconds
response time

The first search
might take a bit longer
while the app wakes up

Hourglass

TRY IT OUT