2023-12-24

Shiny App to Predict the Next Word in a Phrase

I have created an app that predicts the next word of an input phrase using natural language processing (NLP). It is hosted at https://timjames1999.shinyapps.io/WordPredict/

This app is a Minimum Viable Product (MVP); further investment is needed to refine it into a polished tool. This pitch covers how it was built, its current status, and the proposed next steps.

App Development

  1. Three large text files scraped from the internet were provided for training the language model:

    • Twitter, Blogs, and News
  2. These files were:

    • Pre-processed by converting all text to lower case and removing numbers, punctuation, profanity, and extra whitespace
    • Sampled at 5% each, with the three samples combined into a single sample file
  3. Next, I created n-grams of length 3 (trigrams) from the sample file. N-grams of other lengths were also created but not used because of memory limitations.

  4. Finally, I created an app that accepts input of any length and predicts the next word based on the last two words entered. The app was published to shinyapps.io for public use.
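The cleaning, sampling, and trigram steps above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the code behind the app (which runs on Shiny); the profanity list, the function names `preprocess`, `trigrams`, and `build_trigram_counts`, and the fixed random seed are all hypothetical. Each line of each source is kept with probability 0.05, cleaned, and its trigrams are counted. Sampling before cleaning selects the same lines as cleaning first, and it avoids cleaning text that is then thrown away.

```python
import random
import re
from collections import Counter

# Hypothetical placeholder; a real filter would load a full profanity word list.
PROFANITY = {"darn", "heck"}

def preprocess(line):
    """Lower-case a line, strip numbers and punctuation, and drop profanity.

    Splitting on whitespace at the end also removes extra whitespace.
    """
    text = line.lower()
    text = re.sub(r"\d+", " ", text)        # remove numbers
    text = re.sub(r"[^\w\s]|_", " ", text)  # remove punctuation
    return [w for w in text.split() if w not in PROFANITY]

def trigrams(tokens):
    """Yield consecutive (w1, w2, w3) tuples from a token list."""
    return zip(tokens, tokens[1:], tokens[2:])

def build_trigram_counts(sources, rate=0.05, seed=42):
    """Keep about `rate` of the lines from each source and count their trigrams."""
    rng = random.Random(seed)  # hypothetical fixed seed for reproducible samples
    counts = Counter()
    for lines in sources:            # e.g. Twitter, blogs, news
        for line in lines:
            if rng.random() < rate:  # keep roughly 5% of lines by default
                counts.update(trigrams(preprocess(line)))
    return counts

# Toy usage with rate=1.0 so every line is kept.
sources = [["The cat sat on the mat!"], ["The cat sat down, twice."]]
counts = build_trigram_counts(sources, rate=1.0)
print(counts[("the", "cat", "sat")])  # 2
```

With real data, each source could be an open file handle (for example `open("twitter.txt", encoding="utf-8")`, a hypothetical file name); iterating over it line by line avoids loading a whole file into memory, which matters on constrained hardware.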

App Demonstration

Here are examples of the app working with 2, 3, and 4 words entered.
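The lookup behind these examples can be illustrated with a minimal Python sketch (not the app's actual Shiny code). Whatever the length of the input, only the last two words form the key into the trigram table, so 2, 3, or 4 entered words that end in the same pair give the same prediction. The toy counts, the names `index_trigrams` and `predict_next`, and returning the top `k=3` candidates are hypothetical choices for illustration.

```python
from collections import Counter, defaultdict

def index_trigrams(counts):
    """Group trigram counts by their two-word prefix for fast lookup."""
    index = defaultdict(Counter)
    for (w1, w2, w3), n in counts.items():
        index[(w1, w2)][w3] += n
    return index

def predict_next(index, phrase, k=3):
    """Return up to k likely next words, using only the last two words of phrase."""
    words = phrase.lower().split()
    if len(words) < 2:
        return []
    key = (words[-2], words[-1])  # earlier words are ignored
    # .get() does not create an entry in a defaultdict, so misses stay cheap.
    return [w for w, _ in index.get(key, Counter()).most_common(k)]

# Hypothetical toy trigram counts.
counts = Counter({
    ("of", "the", "year"): 5,
    ("of", "the", "day"): 3,
    ("of", "the", "world"): 1,
    ("one", "of", "the"): 4,
})
index = index_trigrams(counts)
for phrase in ["of the", "one of the", "is one of the"]:
    print(phrase, "->", predict_next(index, phrase))  # each -> ['year', 'day', 'world']
```

Because the earlier words never reach the lookup, longer inputs add no context in this design; that limitation is what the back-off model proposed under Next Steps is meant to address.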

Next Steps

  • App development was constrained mainly by computer hardware (I was working on an old laptop from my child)
    • This severely limited my ability to build a final product: I repeatedly ran into memory allocation errors, and some scripts took hours to run
  • Proposed next steps if the project is adopted and resourced to continue:
    • N-grams of multiple lengths to support a back-off model (more accurate predictions from longer context)
    • Larger sample sizes
    • Automatic and continuous web scraping to keep up with current language trends
    • A living frequency model that runs in the background and pushes only the most relevant n-grams to the app
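The back-off step proposed above could take many forms; one simple scheme is "Stupid Backoff", sketched here in Python as an assumption rather than a committed design. The pitch does not name a specific back-off method, so the choice of scheme, the 0.4 penalty, the maximum order of 4, and the names `count_ngrams` and `stupid_backoff` are all illustrative. The model scores candidates using the longest context it has seen, and each time it falls back to a shorter context it multiplies scores by a fixed penalty.

```python
from collections import Counter

ALPHA = 0.4  # assumed fixed back-off penalty (a commonly used value)

def count_ngrams(token_lists, max_n=4):
    """Count every n-gram of length 1..max_n, keyed by tuple."""
    counts = Counter()
    for tokens in token_lists:
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts

def stupid_backoff(counts, phrase, max_n=4, k=3):
    """Rank next-word candidates, backing off from long to short contexts."""
    words = phrase.lower().split()
    total = sum(c for gram, c in counts.items() if len(gram) == 1)
    scores = {}
    penalty = 1.0
    for n in range(max_n, 0, -1):
        if len(words) < n - 1:
            continue  # phrase too short for this order: start lower, no penalty
        context = tuple(words[len(words) - (n - 1):])  # last n-1 words
        denom = counts[context] if context else total
        if denom:
            # Linear scan for brevity; a real app would index n-grams by context.
            for gram, c in counts.items():
                if len(gram) == n and gram[:-1] == context:
                    # Keep the score from the longest context that saw this word.
                    scores.setdefault(gram[-1], penalty * c / denom)
        penalty *= ALPHA
    return sorted(scores, key=scores.get, reverse=True)[:k]

corpus = ["the cat sat on the mat", "the cat ate the fish", "a dog sat on the rug"]
counts = count_ngrams([s.split() for s in corpus])
print(stupid_backoff(counts, "sat on the"))  # 4-gram context was seen
print(stupid_backoff(counts, "the dog"))     # "the dog" unseen; backs off to "dog"
```

The resulting scores are not normalized probabilities, which keeps the method cheap to compute; pruning rarely seen n-grams from `counts` would be one way to keep the table small enough to push to the app.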