10/28/2021
The Word Predictor App uses natural language processing to predict the next word in a sentence.
To Use the App
- Type a word or phrase into the text box
- The app uses up to the last 4 words of the phrase to predict the next word
- The most likely next word is then displayed in the main panel
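As a rough illustration of the input step, the sketch below pulls the last few words out of a typed phrase. It is written in Python purely for illustration (the app itself is a Shiny app), and the function name `last_words` and its cleaning rule (lowercase, letters and apostrophes only) are assumptions, not the app's actual code.

```python
import re

def last_words(phrase, n=4):
    """Return up to the last n words of phrase, lowercased.

    Hypothetical helper: keeps only letters and apostrophes, so numbers
    and punctuation in the typed input are dropped.
    """
    words = re.findall(r"[a-z']+", phrase.lower())
    return words[-n:]

context = last_words("I would like to go to the")
print(context)  # ['to', 'go', 'to', 'the']
```

A phrase shorter than 4 words simply returns all of its words, which is why a prediction model built on several N-gram sizes can still respond to short input.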
Model Building Steps
- Raw data is collected from blogs, news articles, and Twitter for model training
- The data set is cleaned by removing unwanted words and characters, including profanity, non-English words, numbers, and punctuation
- Words are grouped into 2-, 3-, 4-, and 5-word phrases called N-grams and saved as tibbles
- Tibbles are sorted by frequency and saved in repos
- A “Back-Off” prediction model is used to predict the next word: it looks for a match in the longest N-gram first and backs off to shorter N-grams until a match is found
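The steps above can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the app's own code: the app stores its N-grams as R tibbles, while this sketch uses plain counters. The function names, the toy corpus, and the simple "most frequent match wins" rule are all hypothetical, and the profanity and non-English filters are left out.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and keep letters and apostrophes only (drops numbers, punctuation)."""
    return re.findall(r"[a-z']+", text.lower())

def build_ngram_tables(lines, sizes=(2, 3, 4, 5)):
    """Count every N-gram of each size; returns {n: Counter of word tuples}."""
    tables = {n: Counter() for n in sizes}
    for line in lines:
        words = tokenize(line)
        for n in sizes:
            for i in range(len(words) - n + 1):
                tables[n][tuple(words[i:i + n])] += 1
    return tables

def predict_next(phrase, tables):
    """Back off from the longest N-gram to the shortest until a match is found."""
    words = tokenize(phrase)
    for n in sorted(tables, reverse=True):
        context = tuple(words[-(n - 1):])
        if len(context) < n - 1:
            continue  # not enough input words for this N-gram size
        # Keep the N-grams whose first n-1 words equal the input context
        matches = {g: c for g, c in tables[n].items() if g[:-1] == context}
        if matches:
            best = max(matches, key=matches.get)  # most frequent match
            return best[-1]
    return None  # no N-gram of any size matched

# Toy corpus, invented for this example
corpus = [
    "thanks for the follow",
    "thanks for the update",
    "thanks for the follow",
    "looking forward to the weekend",
]
tables = build_ngram_tables(corpus)
print(predict_next("many thanks for the", tables))  # follow (matched at the 4-gram level)
print(predict_next("see you at the", tables))       # follow (backed off to the 2-gram level)
```

The first call finds no 5-gram match and succeeds at the 4-gram level. The second finds nothing until it backs off to 2-grams, where "the follow" is the most frequent pair. Scanning every N-gram for each lookup keeps the sketch short; tables pre-sorted by frequency, as described above, would let a real lookup stop at the first match.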
Benefits of the Chosen Model
- Code is easy to read
- Training data is processed quickly
- A large portion of the corpus can be sampled for more accurate results
- Requires only a small amount of storage for the saved N-gram tables
Resources and Links
Tidy Data
Text Mining with R
Shiny App