SWIFTKEY wordpredict

generalinsight
January 01, 2018

NLP Next Word Prediction Model

Building a SwiftKey-like next-word prediction model using text data aggregated from various web sources.

The model is hosted in a Shiny app.

Below are some highlights:

  • Loaded large text datasets from blogs, news and Twitter
  • Cleaned and normalized the datasets using various NLP techniques
  • Extracted N-grams of various lengths to serve as the prediction model
  • Built an algorithm that takes a word or phrase as input and predicts the next word or words
  • Tested and fine-tuned for accuracy and speed

wordpredict Algorithm

Data gathered from the various web sources was cleaned as follows:

  • lowercase conversion
  • removing numbers
  • removing common English stopwords
  • removing punctuation
  • eliminating extra white space
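The cleaning steps above can be sketched in Python, applied in the listed order. This is a minimal illustration, not the project's actual code (the original was built in R for Shiny). The `STOPWORDS` set is a short hypothetical stand-in for a full English stopword list, and `clean_text` is an illustrative name.

```python
import re

# Hypothetical, abbreviated stopword list; a real pipeline would use a full English list
STOPWORDS = {"a", "an", "and", "the", "of", "to", "is", "in", "it", "on"}


def clean_text(text, stopwords=STOPWORDS):
    """Apply the cleaning steps in order: lowercase, numbers, stopwords, punctuation, whitespace."""
    text = text.lower()                                   # lowercase conversion
    text = re.sub(r"\d+", "", text)                       # remove numbers
    # Remove stopwords as whole words only (\b avoids stripping "a" out of "cat")
    pattern = r"\b(?:" + "|".join(map(re.escape, sorted(stopwords))) + r")\b"
    text = re.sub(pattern, "", text)
    text = re.sub(r"[^\w\s]", "", text)                   # remove punctuation
    text = re.sub(r"\s+", " ", text).strip()              # collapse extra whitespace
    return text


print(clean_text("The 3 cats sat on the mat, and purred!"))  # cats sat mat purred
```

The same function should be applied to the user's input at prediction time so that it matches the vocabulary the N-grams were built from.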

The algorithm was built using:

  • N-gram model with Stupid Backoff
  • N-grams of every order from 6-grams down to unigrams
  • Model size reduced through tuning, including dropping the least frequent N-grams
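A minimal Python sketch of an N-gram model with Stupid Backoff (Brants et al., 2007), with orders up to 6 and frequency-based pruning. Several pieces are assumptions, not details from this project: the back-off factor `alpha = 0.4` (the value suggested in that paper), the pruning threshold `min_count = 2`, and the helper names `build_model` and `predict`. Input is assumed to be already cleaned and tokenized.

```python
from collections import Counter

ALPHA = 0.4  # back-off penalty; 0.4 is the value suggested by Brants et al. (2007)


def build_model(sentences, max_n=6, min_count=2):
    """Count 1- to max_n-grams over tokenized sentences, then prune rare ones.

    min_count is a hypothetical pruning threshold applied to orders >= 2;
    all unigrams are kept so there is always a fallback.
    """
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for tokens in sentences:
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[n][tuple(tokens[i:i + n])] += 1
    for n in range(2, max_n + 1):
        counts[n] = Counter({g: c for g, c in counts[n].items() if c >= min_count})
    return counts


def predict(counts, words, k=3, alpha=ALPHA):
    """Return the top-k (word, score) pairs for the word following `words`."""
    max_n = max(counts)
    # Only the last max_n - 1 words can condition the next word
    context = tuple(words[-(max_n - 1):]) if max_n > 1 else ()
    scores = {}
    penalty = 1.0
    # Walk from the longest available context down to the unigram level
    for start in range(len(context) + 1):
        ctx = context[start:]
        n = len(ctx) + 1
        if n == 1:
            total = sum(counts[1].values())
            for (w,), c in counts[1].items():
                scores.setdefault(w, penalty * c / total)
        else:
            prefix_count = counts[n - 1].get(ctx, 0)
            if prefix_count:
                # Linear scan for clarity; a real app would index n-grams by prefix
                for gram, c in counts[n].items():
                    if gram[:-1] == ctx:
                        # setdefault keeps the score from the highest order where the word was seen
                        scores.setdefault(gram[-1], penalty * c / prefix_count)
        penalty *= alpha
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


corpus = [
    ["i", "love", "new", "york"],
    ["i", "love", "new", "york"],
    ["i", "love", "pizza"],
    ["we", "love", "new", "ideas"],
]
model = build_model(corpus, max_n=6, min_count=2)
print(predict(model, ["i", "love"], k=3))  # "new" ranks first
```

Stupid Backoff scores are not normalized probabilities. They only rank candidates, which is all next-word prediction needs and is why the method is cheap. The `k` parameter corresponds to a user-chosen number of predictions. Pruning the same threshold across orders keeps every surviving N-gram's prefix in the model, because a prefix is always at least as frequent as any of its extensions.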

App Functionality

The app is built and hosted with Shiny. A few notable highlights:

  • A text box where users enter input text
  • The Shiny app runs the model in the background
  • Predicts the most probable next word as output
  • Users can choose how many predictions to display
  • Response time substantially improved, staying under 3 seconds
  • Application memory usage kept under 200 MB

THANK YOU!