May 19, 2017

Introduction

How to use the App

  • Type your phrase under the field "Type your sentence below:"
  • Wait a few seconds and the predicted word will appear under the field "Predicted word:"

How does the App work in the background?

  • The app loads five n-gram data sets derived from the cleaned sample data (0.1% of the corpora).
  • The N-gram representation of a text lists all N-tuples of words that appear. The simplest case is the unigram (1 word), followed by bigram (2 words), trigram (3 words), fourgram (4 words), and fivegram (5 words).
  • Each n-gram data set is converted into a frequency table that counts how often each phrase occurs.
  • The algorithm first cleans the typed phrase, then looks it up in the fivegram frequency table and predicts the word from the matching entry with the highest frequency. If no fivegram matches the typed phrase, it falls back to the fourgram table, then the trigram table, then the bigram table, and finally the unigram table.
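
The lookup described above can be sketched in a few lines. This is a minimal illustration written in Python, not the app's actual code: the function names (`clean`, `build_tables`, `predict`), the cleaning rule (lowercase, letters and apostrophes only), and the toy sentences are all assumptions made for the example.

```python
import re
from collections import Counter, defaultdict

MAX_N = 5  # unigrams up to fivegrams, as in the app


def clean(text):
    """Assumed cleaning rule: lowercase, keep runs of letters/apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def build_tables(sentences, max_n=MAX_N):
    """Build one frequency table per n.

    tables[n] maps the first n-1 words of each n-gram (the context)
    to a Counter of the words that followed that context.
    """
    tables = {n: defaultdict(Counter) for n in range(1, max_n + 1)}
    for sentence in sentences:
        words = clean(sentence)
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                *context, next_word = words[i:i + n]
                tables[n][tuple(context)][next_word] += 1
    return tables


def predict(tables, phrase, max_n=MAX_N):
    """Try the fivegram table first, then back off to shorter n-grams."""
    words = clean(phrase)
    for n in range(max_n, 0, -1):
        k = n - 1  # number of context words an n-gram lookup needs
        if len(words) < k:
            continue  # typed phrase is too short for this table
        context = tuple(words[len(words) - k:])
        candidates = tables[n].get(context)
        if candidates:
            # Highest-frequency continuation for the matched context
            return candidates.most_common(1)[0][0]
    return None  # only reached when the tables are empty


# Toy corpus, purely illustrative
corpus = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "the cat sat on the mat again",
    "a dog sat quietly",
]
tables = build_tables(corpus)
print(predict(tables, "The cat sat on the"))  # fivegram match: mat
print(predict(tables, "zebra sat"))           # backs off to the bigram table: on
```

When no context matches at all, the unigram table has an empty context, so the sketch returns the most frequent word in the sample.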

Consideration

  • The prediction app is built on only a 0.1% sample of the corpora (Twitter, News and Blog).
  • We need to balance Shiny Server performance against prediction accuracy. A larger sample gives better accuracy at the expense of longer processing time.
  • As data scientists, we always have to work within the available resources when analysing data.
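
The sample size is the knob behind this trade-off. As a rough sketch of how a 0.1% sample might be drawn, the snippet below keeps each line of a corpus with a fixed probability. The function name `sample_lines`, the seed value, and the stand-in corpus are assumptions for illustration; the app's actual sampling procedure is not described here.

```python
import random


def sample_lines(lines, fraction=0.001, seed=2017):
    """Keep each line with probability `fraction` (0.001 = 0.1%).

    A fixed seed (hypothetical value) makes the sample reproducible.
    """
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < fraction]


# Stand-in corpus of 100,000 lines; a real run would read the corpus files
lines = [f"line {i}" for i in range(100_000)]
small = sample_lines(lines, fraction=0.001)
large = sample_lines(lines, fraction=0.01)
print(len(small), len(large))
```

Raising `fraction` gives the n-gram tables more phrases to match against, but the tables grow with it, which is where the extra loading and lookup time comes from.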