Capstone: WordMinder

Rony M.C.
12/30/17

Introduction

  • State-of-the-art NLP word prediction system

  • High-performance prediction in the blink of an eye

  • High-accuracy dictionary built from thousands of blog posts, tweets, and news articles

Design - ngram frequency dictionary

  • The project comprised multiple stages, from understanding the problem and exploring the data to building a Shiny web app with a predictive text mining algorithm
  • Samples were combined from English news, tweets, and blogs
  • These were preprocessed to remove profanity, punctuation, etc., and then served as the corpus for the application
  • The corpus was tokenized to construct an ngram frequency dictionary
  • This phase was repeated several times until a high-accuracy yet compact ngram dictionary was obtained
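
The corpus-to-dictionary steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the app's actual code: the function names, the placeholder profanity list, and the `min_count` pruning threshold are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical placeholder list; the real profanity list is not given in the slides.
PROFANITIES = {"darn", "heck"}

def preprocess(text):
    """Lowercase, replace punctuation and digits with spaces, drop listed profanities."""
    text = re.sub(r"[^a-z'\s]", " ", text.lower())
    return [w for w in text.split() if w not in PROFANITIES]

def build_ngram_counts(lines, max_n=4):
    """Return {n: Counter mapping n-word tuples to frequencies} for n = 1..max_n."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for line in lines:
        tokens = preprocess(line)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[n][tuple(tokens[i:i + n])] += 1
    return counts

def prune(counts, min_count=2):
    """Drop rare ngrams to keep the dictionary small (threshold is an assumption)."""
    return {
        n: Counter({gram: c for gram, c in ctr.items() if c >= min_count})
        for n, ctr in counts.items()
    }

sample = ["The cat sat on the mat.", "The cat sat on the sofa!"]
counts = build_ngram_counts(sample)
pruned = prune(counts)
```

Raising `min_count` trades coverage for a smaller dictionary, which is one plausible way to tune the balance between accuracy and size over repeated passes.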

Design - Shiny App

  • The Shiny app is a simple web page with one input textbox

  • Uncluttered design returns a predicted word as soon as the user types

  • Responsive Markov model based on the Stupid Backoff algorithm:

    (i) The frequency dictionary consists of ngrams of up to 4 words
    (ii) Match the last 3 words typed against the first 3 words of the 4-grams and predict the 4th word if matched. If no match,
    (iii) Match the last 2 words typed against the 3-grams and predict the 3rd word if matched. If no match, and so on until,
    (iv) Predict the most common words from the 1-gram data
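
The backoff lookup in steps (i) to (iv) can be sketched as below. This is a simplified Python illustration under stated assumptions, not the app's implementation: `predict_next`, its parameters, and the toy counts are hypothetical, and the dictionary is assumed to map each n to a table of word tuples and frequencies. It returns candidates from the highest order that matches, as the steps describe; the published Stupid Backoff scheme additionally scores lower-order matches with a fixed discount factor, which is omitted here.

```python
def predict_next(words, ngram_counts, max_n=4, top_k=3):
    """Back off from the longest usable context down to unigrams."""
    tokens = [w.lower() for w in words]
    # Try 4-grams first (3 words of context), then 3-grams, then 2-grams.
    for n in range(min(max_n, len(tokens) + 1), 1, -1):
        context = tuple(tokens[-(n - 1):])
        # Linear scan for clarity; a real app would index ngrams by context.
        candidates = {
            gram[-1]: count
            for gram, count in ngram_counts.get(n, {}).items()
            if gram[:-1] == context
        }
        if candidates:
            return sorted(candidates, key=lambda w: (-candidates[w], w))[:top_k]
    # No context matched: fall back to the most frequent single words.
    unigrams = ngram_counts.get(1, {})
    ranked = sorted(unigrams, key=lambda g: (-unigrams[g], g))
    return [gram[0] for gram in ranked[:top_k]]

# Toy frequency dictionary, invented purely for illustration.
toy_counts = {
    4: {("thanks", "for", "the", "follow"): 3, ("thanks", "for", "the", "help"): 1},
    3: {("for", "the", "win"): 2},
    2: {("the", "end"): 5},
    1: {("the",): 10, ("to",): 7, ("and",): 6},
}

print(predict_next(["thanks", "for", "the"], toy_counts))
print(predict_next(["waiting", "for", "the"], toy_counts))
```

In the second call no 4-gram starts with "waiting for the", so the lookup backs off to the 3-gram context "for the".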

Instructions: Try WordMinder now!
