Capstone Presentation

David Stanley
16NOV2020

Text Predict is a Shiny app that predicts the next word or words from the words or phrases the user has entered
Using an algorithm based on n-grams, the application suggests the next word in the sentence
An n-gram is a sequence of n consecutive words taken from a text; this app uses sequences of 2, 3 and 4 words
The predictive model is based on a large collection of English-language blog, news and Twitter text. n-grams were extracted from a sample of this data set and used to build the prediction model
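
To make the n-gram idea concrete, here is a minimal base R sketch that slides a window of n words across a sentence. The function name `make_ngrams` and the example sentence are hypothetical and chosen only for illustration; the app itself may build its n-grams differently.

```r
# Split a sentence into lowercase words, then slide a window of n words over it.
# make_ngrams is a hypothetical helper, not a function from the app.
make_ngrams <- function(text, n) {
  words <- strsplit(tolower(text), "\\s+")[[1]]
  words <- words[nzchar(words)]            # drop empty tokens from leading spaces
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

sentence <- "the quick brown fox jumps"
make_ngrams(sentence, 2)  # 2-grams (bigrams)
make_ngrams(sentence, 3)  # 3-grams (trigrams)
make_ngrams(sentence, 4)  # 4-grams
```

A five-word sentence yields four 2-grams, three 3-grams and two 4-grams, so longer n-grams are always rarer in the same sample.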

The prediction model was built from a sample of the large dataset of blog, news and Twitter text
Using the tm package in R, the sample data was cleaned and then tokenized. Items such as email addresses, URLs and hashtags were removed, and all words were converted to lowercase
In the tokenization process, the data was split into n-grams of 2, 3 and 4 words
When the user inputs text into the app, the program starts from the longest n-gram (4) and then works down to the shortest n-gram (2) to match the user input to the sample dataset
The suggested word is taken from the longest matching n-gram and, among matches of that length, the most frequent one
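
The cleaning, tokenization and longest-match-first lookup described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the app's code: the app uses the tm package for cleaning, whereas this sketch substitutes plain `gsub` calls, and the names `clean_text`, `ngram_table`, `predict_next` and the four-line toy corpus are all hypothetical.

```r
# Assumed stand-in for the tm cleaning step: lowercase, then strip
# email addresses, URLs, hashtags and any remaining non-letter characters.
clean_text <- function(x) {
  x <- tolower(x)
  x <- gsub("\\S+@\\S+", " ", x)
  x <- gsub("(https?://|www\\.)\\S+", " ", x)
  x <- gsub("#\\S+", " ", x)
  x <- gsub("[^a-z' ]", " ", x)
  trimws(gsub("\\s+", " ", x))
}

# Count every n-gram in the corpus and split it into prefix + final word.
ngram_table <- function(lines, n) {
  grams <- unlist(lapply(lines, function(line) {
    w <- strsplit(clean_text(line), " ", fixed = TRUE)[[1]]
    if (length(w) < n) return(character(0))
    vapply(seq_len(length(w) - n + 1),
           function(i) paste(w[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
  tab <- as.data.frame(table(gram = grams), stringsAsFactors = FALSE)
  tab$prefix <- sub(" [^ ]+$", "", tab$gram)  # everything before the last word
  tab$word   <- sub("^.* ", "", tab$gram)     # the last word
  tab
}

# Try the 4-gram table first, then 3-grams, then 2-grams; return the
# k most frequent continuations from the first table that has a match.
predict_next <- function(input, tables, k = 1) {
  w <- strsplit(clean_text(input), " ", fixed = TRUE)[[1]]
  for (n in c(4, 3, 2)) {
    if (length(w) < n - 1) next
    prefix <- paste(tail(w, n - 1), collapse = " ")
    tab <- tables[[as.character(n)]]
    hits <- tab[tab$prefix == prefix, ]
    if (nrow(hits) > 0) {
      hits <- hits[order(-hits$Freq), ]
      return(head(hits$word, k))
    }
  }
  character(0)  # no n-gram matched the input
}

# Toy corpus invented for illustration only.
corpus <- c("we love to read books", "they love to read books",
            "you love to read news", "time to write")
tables <- setNames(lapply(2:4, function(n) ngram_table(corpus, n)), 2:4)

predict_next("I love to read", tables)  # matched in the 4-gram table
predict_next("plan to", tables)         # falls back to the 2-gram table
```

In this toy corpus the first query matches the 4-gram prefix "love to read", while the second has no 3-gram match and falls back to the 2-gram prefix "to". A real model would also need a default answer for input that matches nothing.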

The suggested next word is shown after the app detects that the user is done typing their input
When entering text, allow a sufficient amount of time for the output to appear
The slider tool provides an option to select 1 to 3 suggestions for the next word
The top prediction is shown first, followed by the second and third most likely words
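
One way this behaviour could be wired up in Shiny is sketched below. It is an assumption, not the app's actual source: it uses Shiny's `debounce()` to wait until typing pauses, a `sliderInput` for the 1 to 3 suggestions, and a hypothetical stub `predict_next` in place of the real n-gram model. The input IDs and the 1000 ms delay are also made up for illustration.

```r
library(shiny)

# Hypothetical stub for the real model: returns up to k candidate words,
# most likely first.
predict_next <- function(text, k) {
  head(c("the", "to", "and"), k)
}

ui <- fluidPage(
  textInput("phrase", "Enter a word or phrase"),
  sliderInput("k", "Number of suggestions", min = 1, max = 3, value = 1, step = 1),
  verbatimTextOutput("suggestions")
)

server <- function(input, output, session) {
  # Only react once the text has been unchanged for 1000 ms (assumed delay),
  # which approximates "the user is done typing".
  phrase <- debounce(reactive(input$phrase), 1000)

  output$suggestions <- renderText({
    req(nzchar(phrase()))  # show nothing until some text has been entered
    paste(predict_next(phrase(), input$k), collapse = "\n")
  })
}

# Uncomment to launch the sketch locally:
# shinyApp(ui, server)
```

Debouncing trades a short delay for fewer model lookups, which matches the advice to allow time for the output to appear.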