Capstone Presentation
David Stanley
16NOV2020
- Coursera Data Science Specialization
- Final Project
autosize: true
The Shiny Application
- Text Predict is a Shiny app that uses algorithms to predict the next word or words based on the input words or phrases
- Using an algorithm based on n-grams, the application suggests the next word in the sentence
- An n-gram is a sequence of n consecutive words (2, 3, 4, or more) taken from a sentence of text
- The predictive model is based on a large collection of English-language blog, news, and Twitter data. n-grams were extracted from a sample of this data set and used to build the prediction model
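
To make the idea of an n-gram concrete, here is a minimal sketch of how a sentence breaks into 2-grams and 3-grams. It is written in Python purely for illustration (the app itself is built in R), and the function name `ngrams` and the sample sentence are hypothetical.

```python
def ngrams(text, n):
    """Return every run of n consecutive words in text, as tuples."""
    words = text.lower().split()
    # A text with fewer than n words yields an empty list.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


sentence = "the quick brown fox jumps"
bigrams = ngrams(sentence, 2)    # pairs of adjacent words
trigrams = ngrams(sentence, 3)   # triples of adjacent words
```

A five-word sentence produces four 2-grams and three 3-grams, because each n-gram starts one word later than the previous one.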
The Prediction Model
- The prediction model was built from a sample of the large data set of blog, news, and Twitter text
- Using the tm package in R, the sample data was cleaned and then tokenized. Email addresses, URLs, hashtags, and similar items were removed, and all words were converted to lowercase
- In the tokenization process, the data was split into n-grams of 2, 3, and 4 words
- When the user inputs text into the app, the program starts from the longest n-gram (4) and then works down to the shortest n-gram (2) to match the user input to the sample dataset
- The suggested word comes from the longest matching n-gram; when several n-grams of that length match, the most frequent one is used
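
The cleaning and longest-match-first lookup described above can be sketched as follows. This is an illustrative Python approximation, not the app's R code: the regular expressions only stand in for the tm cleaning steps, and the names `clean`, `build_tables`, `predict`, and the sample lines are all assumptions.

```python
import re
from collections import Counter, defaultdict


def clean(text):
    """Lowercase text and strip URLs, email addresses, hashtags, punctuation."""
    text = text.lower()
    text = re.sub(r"(https?://|www\.)\S+", " ", text)  # URLs
    text = re.sub(r"\S+@\S+", " ", text)               # email addresses
    text = re.sub(r"#\w+", " ", text)                  # hashtags
    text = re.sub(r"[^a-z' ]+", " ", text)             # digits, punctuation, etc.
    return text.split()


def build_tables(lines, orders=(2, 3, 4)):
    """For each n, map an (n-1)-word prefix to counts of the word that follows."""
    tables = {n: defaultdict(Counter) for n in orders}
    for line in lines:
        words = clean(line)
        for n in orders:
            for i in range(len(words) - n + 1):
                prefix = tuple(words[i:i + n - 1])
                tables[n][prefix][words[i + n - 1]] += 1
    return tables


def predict(tables, text):
    """Try the longest n-gram first, then fall back to shorter ones."""
    words = clean(text)
    for n in sorted(tables, reverse=True):
        prefix = tuple(words[-(n - 1):])
        # Skip this order if the input is too short to fill the prefix.
        if len(prefix) == n - 1 and prefix in tables[n]:
            return tables[n][prefix].most_common(1)[0][0]
    return None  # no n-gram of any length matched


# Hypothetical stand-in for the sampled blog, news, and Twitter text.
sample = [
    "Thanks for the follow! Check http://example.com #blessed",
    "thanks for the help today",
    "Thanks for the follow, everyone",
    "I am waiting for the bus",
]
tables = build_tables(sample)
```

With this toy sample, `predict(tables, "Many thanks for the")` matches at the 4-gram level, while `predict(tables, "out of the")` finds no 4-gram or 3-gram match and falls back to the 2-gram table.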
The Application
- The suggested next word is shown after the app detects that the user is done typing their input
- When entering text, allow a sufficient amount of time for the output to appear
- The slider tool provides an option to select 1 to 3 suggestions for the next word
- The top prediction is shown first, followed by the second and third most likely results
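
One way the ranked suggestions behind the slider could be produced is sketched below, again in Python for illustration only. The follower counts in `TABLES` are made up, the function name `suggest` is hypothetical, and filling any shortfall from shorter n-grams is an assumption rather than a documented behavior of the app.

```python
from collections import Counter

# Hypothetical follower counts keyed by prefix; a real table would be
# built from the sampled corpus.
TABLES = {
    3: {("for", "the"): Counter({"follow": 2, "help": 1})},
    2: {("the",): Counter({"follow": 3, "bus": 2, "help": 1})},
}


def suggest(tables, words, k=1):
    """Return up to k distinct next-word suggestions, longest match first."""
    k = max(1, min(k, 3))  # mirror the slider's range of 1 to 3
    suggestions = []
    for n in sorted(tables, reverse=True):
        prefix = tuple(words[-(n - 1):])
        if len(prefix) != n - 1:
            continue  # input too short for this n-gram order
        for word, _count in tables[n].get(prefix, Counter()).most_common():
            if word not in suggestions:
                suggestions.append(word)
            if len(suggestions) == k:
                return suggestions
    return suggestions


top_three = suggest(TABLES, ["thanks", "for", "the"], k=3)
top_one = suggest(TABLES, ["thanks", "for", "the"], k=1)
```

Here the first two suggestions come from the 3-gram table in order of frequency, and the third is taken from the 2-gram table because the longer match offered only two candidates.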