Predict Word Application

Rinnette Ramdhanie
9 May 2020

This app makes it easier to type text by automatically predicting the next word based on what was already typed.

Data Science Capstone presentation
Johns Hopkins University

Data

Source: The data for this project was obtained from Twitter, blogs and news websites. A sample was taken from a corpus of over 4.2 million lines of text containing more than 105 million words.

Cleaning: This included removal of profanity, punctuation and extra spaces.
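The cleaning step could look roughly like the sketch below. It is only an illustration: the function name `clean_line`, the tiny `PROFANITY` placeholder set and the choice to lowercase the text are assumptions, not details taken from the project.

```python
import re

# Hypothetical placeholder set; the actual profanity list used is not stated.
PROFANITY = {"darn", "heck"}

def clean_line(line, banned=PROFANITY):
    """Lowercase a line, strip punctuation, drop banned words, collapse spaces."""
    text = line.lower()  # lowercasing is an assumption
    # Keep letters, digits, apostrophes and whitespace; replace the rest with a space.
    text = re.sub(r"[^a-z0-9'\s]", " ", text)
    # split() with no argument also collapses runs of whitespace.
    words = [w for w in text.split() if w not in banned]
    return " ".join(words)

cleaned = clean_line("Well,  darn -- that was   GREAT!")
print(cleaned)
```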

Processing: The data was tokenized into unigrams, bigrams, trigrams and quadgrams, each with its associated frequency. Low-frequency phrases were excluded from the model to reduce the size of the files.
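As a rough illustration of this processing step, the sketch below counts n-grams and drops the rare ones. The helper names `ngram_counts` and `prune`, the toy input lines and the `min_count` threshold are all assumptions; the cutoff actually used in the project is not stated.

```python
from collections import Counter

def ngram_counts(lines, n):
    """Count n-grams (as word tuples) across already-cleaned lines."""
    counts = Counter()
    for line in lines:
        words = line.split()
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts

def prune(counts, min_count):
    """Keep only n-grams seen at least min_count times (threshold is assumed)."""
    return Counter({gram: c for gram, c in counts.items() if c >= min_count})

# Toy data for illustration only.
lines = ["thanks for the follow", "thanks for the help", "for the record"]
bigrams = ngram_counts(lines, 2)
kept = prune(bigrams, min_count=2)
print(kept)
```

Pruning trades a little coverage for much smaller lookup tables, which matters when the tables have to be loaded by a hosted app.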

Laplace smoothing was used to calculate the probability of each n-gram.
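The exact formula used is not given here; the sketch below assumes the standard add-one form of Laplace smoothing, in which the probability of the last word given the preceding words is (n-gram count + 1) / (context count + vocabulary size). The function name and the example numbers are hypothetical.

```python
def laplace_probability(ngram_count, context_count, vocab_size):
    """Add-one estimate of P(last word | preceding words).

    ngram_count:   how often the full n-gram was seen
    context_count: how often its first n-1 words were seen
    vocab_size:    number of distinct words in the vocabulary
    """
    return (ngram_count + 1) / (context_count + vocab_size)

# Illustrative numbers only.
p_seen = laplace_probability(ngram_count=3, context_count=10, vocab_size=90)
p_unseen = laplace_probability(ngram_count=0, context_count=10, vocab_size=90)
print(p_seen, p_unseen)
```

With add-one smoothing, an n-gram that was never observed still receives a small non-zero probability instead of zero.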

How the algorithm works

The algorithm uses a simple backoff method.

  1. The last 3 words entered are looked up in the quadgram table to find phrases that start with these words. The 3 most popular phrases, i.e., those with the highest probabilities of occurring, are selected, and the last word of each phrase is displayed as a possible option.
  2. If no phrases are found, the last 2 words entered are searched in the trigram table in the same way, and then the last word in the bigram table. If none of these searches yields a candidate, the most popular unigrams are displayed.
  3. When only 1 or 2 words have been entered, the search starts in the bigram or trigram table, respectively.
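The three steps above can be sketched as follows. This is a minimal illustration, not the app's actual code: the function name `predict_next`, the layout of `tables` (context length mapped to a dictionary of contexts and their candidate words) and the toy probabilities are all assumptions.

```python
def predict_next(text, tables, top_unigrams, k=3):
    """Return up to k candidate next words using simple backoff.

    tables maps a context length (3, 2 or 1) to
    {context tuple: {next word: probability}}; this layout is assumed.
    """
    words = text.lower().split()
    # Start with the longest context available (at most 3 words), then back off.
    for size in range(min(3, len(words)), 0, -1):
        context = tuple(words[-size:])
        candidates = tables.get(size, {}).get(context)
        if candidates:
            ranked = sorted(candidates, key=candidates.get, reverse=True)
            return ranked[:k]
    # No match at any level (or nothing typed): use the most popular unigrams.
    return list(top_unigrams[:k])

# Toy tables for illustration only.
tables = {
    3: {("thanks", "for", "the"): {"follow": 0.5, "help": 0.3,
                                   "support": 0.1, "love": 0.05}},
    2: {("for", "the"): {"first": 0.4, "record": 0.2}},
    1: {("the",): {"same": 0.3}},
}
top_unigrams = ["the", "to", "and"]

print(predict_next("Thanks for the", tables, top_unigrams))
print(predict_next("waiting for the", tables, top_unigrams))
print(predict_next("xyz", tables, top_unigrams))
```

Here the first call finds a match in the quadgram-level table, the second backs off to the trigram-level table, and the third falls through to the unigram list.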

Using the Predict Word App

Instructions for use:

  • Enter a word or phrase in the text box.


Instructions Continued

  • Three options for the next word will be automatically shown. Select one of the options and it will be added to what has already been entered.
  • The options shown will continue to update automatically as words are entered or changed in the text box. Continue to select options or type words as necessary.

The application is available at the following link: https://niala.shinyapps.io/predictWordApp/