Predict Next Word - Capstone Project

Omer Shechter

25 January 2019

Project Summary

  • This project is about understanding and building a predictive text model
  • The goal is to use text datasets from blogs, news, and Twitter to create an NLP model
  • The model takes a sentence or string as input and predicts the next word
  • The final step is to build a data product, a Shiny app that demonstrates the prediction


The Algorithm

  • The algorithm used is Stupid Backoff
  • All of the data is used, except for the Twitter data, of which 85% is used
  • The data was cleaned (e.g., symbols, numbers, and punctuation were removed)
  • N-gram tables were built for n = 1 to 5
  • Infrequent n-grams were trimmed to save memory
  • The following R packages were used: quanteda, data.table, qdap
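  • Stupid Backoff in brief: a candidate next word is scored by its relative frequency after the longest matching n-gram context; if that n-gram was never seen, the model backs off to a shorter context and multiplies the score by a fixed factor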
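For reference, this is the Stupid Backoff score as defined by Brants et al. (2007). Here C is an n-gram count, k is the order of the n-gram being tried (up to 5 in this project), N is the total number of training tokens, and α is the back-off factor. The paper uses α = 0.4; the value used in this project is not stated, so treat it as an assumption. The scores are not normalized probabilities, which is what keeps the method cheap.

```latex
S\left(w_i \mid w_{i-k+1}^{i-1}\right) =
\begin{cases}
\dfrac{C\left(w_{i-k+1}^{i}\right)}{C\left(w_{i-k+1}^{i-1}\right)} & \text{if } C\left(w_{i-k+1}^{i}\right) > 0,\\[1.5ex]
\alpha \, S\left(w_i \mid w_{i-k+2}^{i-1}\right) & \text{otherwise,}
\end{cases}
\qquad
S(w_i) = \frac{C(w_i)}{N}
```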

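The cleaning, n-gram, and trimming steps listed above can be sketched in Python as follows. This is an illustration only: the project itself used R (quanteda, data.table, qdap), and the cleaning regex, the function names `clean` and `build_ngrams`, and the `min_count` threshold are all hypothetical choices rather than the author's code.

```python
import re
from collections import Counter


def clean(text: str) -> list[str]:
    """Hypothetical cleaning: lowercase, drop digits, keep only ASCII letters and apostrophes."""
    text = text.lower()
    text = re.sub(r"[0-9]+", " ", text)        # remove numbers
    text = re.sub(r"[^a-z' ]+", " ", text)     # remove symbols and punctuation
    # Strip stray quote marks but keep contractions such as "don't".
    return [tok.strip("'") for tok in text.split() if tok.strip("'")]


def build_ngrams(docs, max_n=5, min_count=2):
    """Count n-grams for n = 1..max_n, then keep only those seen at least min_count times."""
    tables = {n: Counter() for n in range(1, max_n + 1)}
    for doc in docs:
        tokens = clean(doc)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                tables[n][tuple(tokens[i:i + n])] += 1
    # Frequency-based trimming; min_count=2 is an illustrative threshold, not the project's.
    return {
        n: Counter({gram: c for gram, c in table.items() if c >= min_count})
        for n, table in tables.items()
    }


if __name__ == "__main__":
    docs = ["I want to go home.", "I want to go out!", "I want 2 eat"]
    tables = build_ngrams(docs, max_n=3)
    print(tables[3])
```

Trimming by a minimum count is one simple way to save memory; on a real corpus most higher-order n-grams occur only once, so even a threshold of 2 removes a large share of them.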

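A minimal Python sketch of Stupid Backoff prediction, assuming whitespace-tokenized, already-cleaned sentences. It tries the longest context first and multiplies the score by α for every back-off step, returning the top k candidates. The names `build_index` and `predict`, the value α = 0.4, and the alphabetical tie-breaking are assumptions for illustration, not the project's R implementation.

```python
from collections import Counter, defaultdict

# Back-off penalty; 0.4 is the value suggested by Brants et al. (2007).
# The factor used in this project is not stated, so treat it as an assumption.
ALPHA = 0.4


def build_index(sentences, max_n=5):
    """Return (counts, followers): n-gram frequencies and, for each context, its next-word counts."""
    counts = Counter()                 # n-gram tuple -> frequency, n = 1..max_n
    followers = defaultdict(Counter)   # context tuple -> Counter of following words
    for sentence in sentences:
        tokens = sentence.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tuple(tokens[i:i + n])
                counts[gram] += 1
                followers[gram[:-1]][gram[-1]] += 1  # unigrams go under the empty context ()
    return counts, followers


def predict(counts, followers, text, max_n=5, k=5):
    """Rank candidate next words with Stupid Backoff and return the top k."""
    tokens = text.lower().split()
    history = tuple(tokens[max(0, len(tokens) - (max_n - 1)):])  # last max_n - 1 words
    total = sum(followers.get((), Counter()).values())  # number of training tokens
    scores = {}
    weight = 1.0
    # Start with the longest context and drop its first word at each back-off step.
    for start in range(len(history) + 1):
        context = history[start:]
        denom = counts[context] if context else total
        for word, c in followers.get(context, {}).items():
            # setdefault keeps the score from the highest order at which the word was seen.
            scores.setdefault(word, weight * c / denom)
        weight *= ALPHA
    # Highest score first; ties broken alphabetically for a deterministic order.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


if __name__ == "__main__":
    corpus = ["i want to go home", "i want to go out", "you want to eat", "we go home now"]
    counts, followers = build_index(corpus)
    print(predict(counts, followers, "I want to go"))  # ['home', 'out', 'go', 'to', 'want']
```

With k = 5, the first element plays the role of the top prediction and the remaining four are lower-ranked alternatives, similar to how the application presents its results; the app's actual implementation may differ.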
Next Word Predict - The Application

  • The application is built with Shiny
  • Type a sentence in the sidebar (on the left)
  • Click the Predict button
  • The predicted next word is shown in the blue window
  • Four lower-ranked alternatives are shown in the green window
