Word Predictor

By Chemba Ranganathan

A simple Shiny application that predicts the next word based on the text entered. The aim is to provide a simple interface for smartphones and other handheld devices, with results similar to SwiftKey's smart keyboard.

The application can be loaded from

https://chemba.shinyapps.io/WordPredictior/

Methodology

  • Data Set

The data set contained text from blogs, Twitter and news, provided by Coursera. A five percent sample of this data was used to build the dictionary so that it meets the small memory footprint required for handheld devices.
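The sampling step can be sketched as follows. This is only an illustration in Python, not the code used for the application: the function name `sample_lines`, the fixed seed and the synthetic `corpus` are assumptions made for the example, and the source does not say how the five percent was drawn.

```python
import random

def sample_lines(lines, fraction=0.05, seed=42):
    """Keep each line independently with probability `fraction` (assumed scheme)."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return [line for line in lines if rng.random() < fraction]

# Synthetic stand-in for the blog, Twitter and news text.
corpus = [f"line {i}" for i in range(10_000)]
sample = sample_lines(corpus)
print(len(sample), "of", len(corpus), "lines kept")
```

Because each line is kept independently, the sample size is only approximately five percent of the input.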

  • Algorithm

An n-gram model was chosen for creating the dictionary since it is one of the most commonly used approaches in natural language processing. Kneser-Ney smoothing was applied to the one- to five-gram models since it is one of the most accurate smoothing procedures. The next word was predicted based on the continuation probability.
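To make the idea of a continuation probability concrete, here is a minimal Python sketch of interpolated Kneser-Ney smoothing restricted to bigrams. It is not the application's implementation, which used models up to five-grams; the function name `train_bigram_kn`, the discount of 0.75 and the toy sentences are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict

def train_bigram_kn(sentences, discount=0.75):
    """Return (prob, p_cont) for an interpolated Kneser-Ney bigram model."""
    bigrams = Counter()
    for s in sentences:
        toks = s.split()
        bigrams.update(zip(toks, toks[1:]))

    context_total = Counter()      # how often v is followed by any word
    followers = defaultdict(set)   # distinct words seen after v
    preceders = defaultdict(set)   # distinct words seen before w
    for (v, w), c in bigrams.items():
        context_total[v] += c
        followers[v].add(w)
        preceders[w].add(v)
    n_types = len(bigrams)         # number of distinct bigram types

    def p_cont(w):
        # Share of bigram types that end in w: how readily w continues new contexts.
        return len(preceders.get(w, ())) / n_types

    def prob(w, v):
        if context_total[v] == 0:  # unseen context: fall back to continuation probability
            return p_cont(w)
        discounted = max(bigrams[(v, w)] - discount, 0) / context_total[v]
        backoff_weight = discount * len(followers[v]) / context_total[v]
        return discounted + backoff_weight * p_cont(w)

    return prob, p_cont

toy = ["happy new year", "happy new year to you", "a new day"]
prob, p_cont = train_bigram_kn(toy)
vocab = sorted({w for s in toy for w in s.split()})
for w in vocab:
    print(w, round(prob(w, "new"), 3))
```

The discount taken from each observed bigram is redistributed according to `p_cont`, so the probabilities of all words following a seen context still sum to one.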

Process

  • Cleaning Data

Twitter hashtags, HTML tags, punctuation and profanity were removed, and letter case was normalized to allow easy processing. Stemming was not performed since it eliminated the most commonly used words.
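The cleaning steps might look roughly like the Python sketch below. It is an illustration under assumptions rather than the original code: the `PROFANITY` set is a hypothetical placeholder (the source does not name the word list used), normalizing case by lower-casing is assumed, and keeping apostrophes and digits is a choice made for this example.

```python
import re

# Hypothetical placeholder; the actual profanity list is not given in the source.
PROFANITY = {"darn", "heck"}

def clean_text(text, banned=PROFANITY):
    text = re.sub(r"<[^>]+>", " ", text)        # drop HTML tags
    text = re.sub(r"#\w+", " ", text)           # drop Twitter hashtags
    text = text.lower()                         # normalize case (assumed: lower case)
    text = re.sub(r"[^a-z0-9'\s]", " ", text)   # drop punctuation, keep apostrophes
    tokens = [t for t in text.split() if t not in banned]
    return " ".join(tokens)

print(clean_text("<b>Happy</b> New Year!!! #2015 what the heck"))
```

Tags and hashtags are removed before punctuation so that the `<`, `>` and `#` markers are still present when their patterns run.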

  • Creating NGrams and Predictions

The quanteda package was used to create the n-gram models. It seemed to handle larger data sets better than other packages such as “tm”. The processed n-gram dictionaries were manipulated using the ‘data.table’ package.
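As a language-neutral illustration of what the n-gram dictionaries hold, the Python sketch below builds lookup tables from context to next-word counts and predicts by backing off to shorter contexts. It does not use quanteda or data.table, and it ranks by raw counts instead of Kneser-Ney probabilities; the names `build_tables` and `predict`, the limit of three-grams and the toy sentences are assumptions for the example.

```python
from collections import Counter, defaultdict

def build_tables(sentences, max_n=3):
    """Map each context tuple (0 to max_n - 1 words) to a Counter of next words."""
    tables = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                *context, word = tokens[i:i + n]
                tables[tuple(context)][word] += 1
    return tables

def predict(tables, text, k=5, max_n=3):
    """Return up to k candidates, trying the longest matching context first."""
    tokens = text.lower().split()
    for size in range(min(max_n - 1, len(tokens)), -1, -1):
        context = tuple(tokens[len(tokens) - size:])
        if context in tables:
            return [w for w, _ in tables[context].most_common(k)]
    return []

toy = ["happy new year", "happy new year to you", "a new day"]
tables = build_tables(toy)
print(predict(tables, "Happy new"))
print(predict(tables, "brand new"))
```

In the second call the two-word context is unseen, so the lookup falls back to the single word "new".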

  • Accuracy and Speed

The model is relatively accurate considering that only a very small sample was used for the analysis. Predictions are not instantaneous, but they do not take long.

Shiny Application

  • Layout

    The application consists of two panels:

    • An input panel with a text field and a 'Predict Word' button
    • A tabbed panel with three tabs ('About', 'Usage' and 'Predictions')
  • Usage

    Type one or more words in the text field of the input panel and press the 'Enter' key or the 'Predict Word' button. The output is displayed in the 'Predictions' tab.

  • Additional Details

    The 'About' tab gives a detailed summary of the application while the 'Usage' tab gives more explanation about the output displayed in the 'Predictions' tab.

Example

  • Input Panel

Happy new                    Predict Word
  • Predictions tab

    Word1 Word2  Word3     Word4 Word5
 favorite  year report beginning years
