2024-08-11

Predict Word Shiny App

Description: This is a Shiny app that predicts the next word of a user-supplied text using up to the three preceding words. Predictions are derived from n-gram frequencies computed on a corpus of news, blog and Twitter texts.

How it functions: The app consists of a text field for user input, a “Predict word” button that triggers the prediction, and an output field where the single predicted word appears.

Instructions:

  1. Load the app: https://meri-galindo.shinyapps.io/PredictWord/
  2. Enter the text in the input box.
  3. Hit ENTER or click on the “Predict word” button.
  4. The predicted word appears on the right panel, within the box provided.

Data preparation

  • A 10% subset of the training data was analyzed to generate 4-Grams, 3-Grams, 2-Grams and the most frequent single words. The quanteda package was used for efficiency.
  • Each n-gram was split into 2 columns: the first column holds the search key (the leading n-1 words, or n-1_Gram) and the second column holds the last word (the word to predict).
  • The size of the resulting data frames was reduced by dropping unused columns and duplicate n-1_Grams (for each n-1_Gram, only the last word with the highest frequency is kept). This cut memory use to 40% of the original.
  • The data frames were ordered and saved for later use.
  • The Shiny app loads the saved data frames instead of recomputing them on every run, which reduces the memory and time needed to run the app (only 11 MB of data to load).
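
The preparation steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the app's own R/quanteda code: the function name build_table, the lowercasing, the whitespace tokenization and the tiny sample corpus are all assumptions made for the example.

```python
from collections import Counter

def build_table(sentences, n):
    """Return a sorted list of (prefix, last_word) pairs for n-grams.

    Hypothetical helper: for every (n-1)-word prefix, only the most
    frequent following word is kept, mirroring the reduction step above.
    """
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()  # assumed tokenization
        for i in range(len(words) - n + 1):
            prefix = " ".join(words[i:i + n - 1])  # the n-1_Gram
            last = words[i + n - 1]                # the word to predict
            counts[(prefix, last)] += 1

    # Keep only the highest-frequency last word per prefix
    # (on a tie, the pair seen first is kept).
    best = {}
    for (prefix, last), freq in counts.items():
        if prefix not in best or freq > best[prefix][1]:
            best[prefix] = (last, freq)

    # Sort by prefix so the table can later be binary-searched.
    return sorted((prefix, last) for prefix, (last, _) in best.items())

# Made-up sample corpus, for illustration only.
corpus = [
    "thanks for the follow",
    "thanks for the follow",
    "thanks for the support",
]
four_grams = build_table(corpus, 4)
three_grams = build_table(corpus, 3)
```

Dropping the frequency column after the reduction is what keeps the saved tables small: each prefix appears once, paired with a single predicted word.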

Algorithm description

  • The prediction algorithm first searches the 4-Grams, then the 3-Grams, then the 2-Grams and finally the single most frequent word to predict the next word.

  • To speed up the search, the ordered data frames are searched with a binary search, which reduced the search time dramatically, from seconds or minutes to milliseconds.
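
The back-off order and the binary search described above could look something like the sketch below. It is written in Python for illustration and is not the app's actual R code; the names lookup and predict, the sample tables and the fallback word "the" are hypothetical. Each table is assumed to be a list of (prefix, last_word) pairs sorted by prefix, with one entry per prefix.

```python
from bisect import bisect_left

def lookup(table, prefix):
    """Binary-search a prefix-sorted table; return its word or None."""
    # The 1-tuple (prefix,) sorts just before any (prefix, word) pair,
    # so bisect_left lands on the matching entry if one exists.
    i = bisect_left(table, (prefix,))
    if i < len(table) and table[i][0] == prefix:
        return table[i][1]
    return None

def predict(text, tables, default_word):
    """Try the 4-gram table, then 3-grams, then 2-grams, then a default."""
    words = text.lower().split()  # assumed tokenization
    for n in (3, 2, 1):           # prefix lengths for 4-, 3- and 2-grams
        if len(words) >= n:
            hit = lookup(tables[n], " ".join(words[-n:]))
            if hit is not None:
                return hit
    return default_word           # most frequent single word

# Made-up tables keyed by prefix length, for illustration only.
tables = {
    3: [("thanks for the", "follow")],
    2: [("for the", "first"), ("of the", "year")],
    1: [("a", "lot"), ("the", "first")],
}

print(predict("Thanks for the", tables, "the"))  # found in the 4-gram table
print(predict("one of the", tables, "the"))      # backs off to the 3-gram table
print(predict("hello", tables, "the"))           # falls through to the default
```

A binary search needs only about log2(N) comparisons per table, which is why sorting the tables once during preparation pays off at prediction time compared with scanning every row.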

References