Next Word

How to predict the next word with an N-gram model

Vojko
Data Science Specialization Participant

Introduction: "Next Word" Web App

  • The app is trained on MODERN data sets (Twitter, blog posts and news)
  • "Next Word" is based on an N-gram model
  • "Next Word" is easily translatable to other languages
  • "Next Word" provides multiple predictions ordered by probability
  • "Next Word" provides a FREE, simple and efficient UI for predicting the next word as you type
  • It is available 24/7 at the NextWordCapstone URL

User Interface

[Figure: UI]

Algorithm description

  • Preprocessing using library(tm):
    • removing non-textual characters (regex)
    • stripping punctuation, numbers and extra whitespace
    • conversion to lower case
  • Tokenization and DocumentTermMatrix (DTM) creation
    • using library(RWeka)
    • 3-grams selected as optimal
    • sparse terms removed for size/speed optimization
  • DTM exported as an R data frame
    • non-normalized probability calculated from the DTM
    • the last 0 to 2 words of the input text are searched for (regex)
    • suggestions ordered by probability of occurrence
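
The steps above can be sketched in a few lines of base R. This is only an illustration, not the app's actual code: plain regex and table() stand in for tm and RWeka, raw 3-gram counts serve as the non-normalized probability, and a simple min_count cut-off stands in for sparse-term removal (tm's removeSparseTerms works on document sparsity instead). The function names clean_text, trigrams, build_model and predict_next, the parameters min_count and k, and the way shorter contexts are matched (last 2 words first, then 1, then 0) are all assumptions made for this example.

```r
# Hypothetical base-R sketch of the pipeline; no packages required.

# 1. Preprocessing: lower case, keep letters only, collapse whitespace.
clean_text <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z ]+", " ", x)  # drops punctuation, numbers, other symbols
  x <- gsub(" +", " ", x)        # collapses runs of spaces
  trimws(x)
}

# 2. Tokenization: all 3-grams of one cleaned document.
trigrams <- function(doc) {
  words <- strsplit(doc, " ", fixed = TRUE)[[1]]
  words <- words[nzchar(words)]
  n <- length(words)
  if (n < 3) return(character(0))
  paste(words[1:(n - 2)], words[2:(n - 1)], words[3:n])
}

# 3. Model: data frame of 3-grams with raw counts (non-normalized scores).
#    min_count is an assumed stand-in for removing sparse terms.
build_model <- function(corpus, min_count = 1) {
  grams <- unlist(lapply(clean_text(corpus), trigrams))
  counts <- table(grams)
  counts <- counts[counts >= min_count]
  df <- data.frame(ngram = names(counts), count = as.integer(counts),
                   stringsAsFactors = FALSE)
  df[order(-df$count, df$ngram), ]
}

# 4. Prediction: match the last 2, then 1, then 0 input words at the start
#    of each 3-gram (regex) and rank the word that follows by summed count.
predict_next <- function(model, input, k = 3) {
  words <- strsplit(clean_text(input), " ", fixed = TRUE)[[1]]
  words <- words[nzchar(words)]
  parts <- strsplit(model$ngram, " ", fixed = TRUE)
  for (m in seq(min(2, length(words)), 0)) {
    context <- tail(words, m)
    pattern <- if (m > 0) paste0("^", paste(context, collapse = " "), " ") else "^"
    hit <- grepl(pattern, model$ngram)
    if (!any(hit)) next
    candidate <- vapply(parts[hit], function(p) p[m + 1], character(1))
    score <- vapply(split(model$count[hit], candidate), sum, numeric(1))
    ranked <- names(score)[order(-score, names(score))]
    return(head(ranked, k))
  }
  character(0)
}

# Toy corpus, invented for this example.
corpus <- c("I love data science!",
            "I love data, I love R.",
            "We love data science 2 much")
model <- build_model(corpus)
print(predict_next(model, "I love"))  # "data" "r"
```

Because the scores are never normalized, they are only meaningful for ranking candidates that share the same context, which is all the suggestion list needs.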

Thank You!

  • Contact author for "NextWordCapstone Enterprise Edition" ;)