Predict Next Word

Domingos Santos
08/18/2015

Shiny application to predict the next word in a sentence

Introduction

Application

  • The application has a navigation bar with two panels: Predict App (main panel), and Documentation.
  • This image shows the navigation bar and the main panel.

[Image: navigation bar and main panel]

  • The user enters a short piece of text in the Text input box and clicks the Predict next word button.
  • Four predicted words are displayed in the text output box.
  • The words are listed from most to least relevant.

Algorithm

The Stupid Backoff algorithm was implemented as follows:

  • The text input is cleaned and transformed for analysis.
  • The last 3 words of the text input are searched for in the 4-gram file.
  • If they are found, the final words of the most frequent matching 4-grams become the predicted words.
  • If fewer than four predicted words are found, the app searches for the last 2 words of the text input in the 3-gram file.
  • If there are still fewer than four predicted words, the app searches for the last word of the text input in the 2-gram file.
  • If there are still fewer than four predicted words, the app searches for the last word of the text input among the most frequent 2-grams from the Corpus of Contemporary American English (COCA).
  • The R code of the algorithm is in the “Documentation” tab of the app.
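
The lookup cascade described above can be sketched in a few lines. This is a minimal Python illustration, not the app's R code: the cleaning rule, the function names (clean, build_ngrams, predict), the toy corpus, and the small fallback table standing in for the COCA 2-grams are all assumptions made for the example.

```python
import re
from collections import Counter

def clean(text):
    """Lowercase the text and keep runs of letters/apostrophes (assumed cleaning rule)."""
    return re.findall(r"[a-z']+", text.lower())

def build_ngrams(tokens, n):
    """Count every n-gram (as a tuple of words) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def _extend(predictions, table, context, k):
    """Append final words of n-grams matching `context`, most frequent first."""
    matches = [(gram[-1], count) for gram, count in table.items()
               if gram[:-1] == context]
    # Sort by descending count, then alphabetically so ties are deterministic.
    for word, _ in sorted(matches, key=lambda m: (-m[1], m[0])):
        if word not in predictions and len(predictions) < k:
            predictions.append(word)

def predict(text, tables, fallback, k=4):
    """tables maps n (4, 3, 2) to n-gram counts; fallback is a 2-gram Counter."""
    words = clean(text)
    predictions = []
    for n in (4, 3, 2):                      # longest context first
        if len(predictions) == k:
            break
        if len(words) >= n - 1:              # enough words to form the context
            _extend(predictions, tables[n], tuple(words[-(n - 1):]), k)
    if words and len(predictions) < k:       # last resort: external 2-gram list
        _extend(predictions, fallback, (words[-1],), k)
    return predictions

# Toy data (hypothetical); the real app reads its n-grams from files.
corpus = clean("the cat sat on the mat the cat sat on the sofa "
               "the cat ran to the door")
tables = {n: build_ngrams(corpus, n) for n in (2, 3, 4)}
fallback = Counter({("to", "be"): 10, ("to", "the"): 8, ("to", "go"): 3})

print(predict("I saw the cat sat on the", tables, fallback))
# ['mat', 'sofa', 'cat', 'door']
```

Here the 4-gram table supplies "mat" and "sofa", the 3-gram table adds nothing new, and the 2-gram table fills the remaining two slots. Scanning each table linearly keeps the sketch short; an index keyed by context would be the natural choice for larger n-gram files.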

Necessary improvements

The application needs the following enhancements:

  • Use n-gram probabilities to improve accuracy.
  • Apply a smoothing method so that unseen n-grams do not receive zero probability.
  • Use a higher percentage of the text for the training set.
  • Improve the data cleaning and transformation steps, and the layout of the panels.
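
One possible direction for the first two enhancements is the relative-frequency score used by Stupid Backoff: a word seen after the full context is scored by its count ratio, and each step back to a shorter context multiplies the score by a fixed factor (0.4 in the original Stupid Backoff paper). The Python sketch below is an illustration under those assumptions, not the app's code; ngram_counts and score are hypothetical names, and the resulting scores are not normalized probabilities.

```python
from collections import Counter

ALPHA = 0.4  # backoff factor suggested in the original Stupid Backoff paper

def ngram_counts(tokens, max_n):
    """Count all n-grams of order 1..max_n in one Counter keyed by word tuples."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def score(word, context, counts, total):
    """Score `word` after `context` (a tuple of preceding words).

    `total` is the number of tokens in the training text, used for the
    unigram relative frequency at the end of the backoff chain.
    """
    context = tuple(context)
    factor = 1.0
    while context:
        gram = context + (word,)
        if counts[gram] > 0:
            return factor * counts[gram] / counts[context]
        context = context[1:]   # drop the oldest word and back off
        factor *= ALPHA
    return factor * counts[(word,)] / total

tokens = "the cat sat on the mat the cat ran".split()
counts = ngram_counts(tokens, 3)
print(score("sat", ("the", "cat"), counts, len(tokens)))  # seen trigram
print(score("ran", ("on", "cat"), counts, len(tokens)))   # backs off once
```

A continuation that never follows the full context still gets a non-zero score as long as the word itself occurs in the training text, which addresses part of the zero-probability problem without a separate smoothing pass. Words absent from the training text still score 0.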