The goal of this project is to predict the most likely next words for a given text input. For this purpose, a Shiny app was created in which a user can enter a sentence and, by clicking a button, have the app predict the three most likely next words.
6 August 2019
After reading the US data for blogs, news and tweets, each source was sampled and the samples were combined into one dataset consisting of three subsets of the original datasets. The dataset was then split into an 80% training set, a 10% validation set and a 10% test set. To clean the data, the following methods were applied:
After that, the n-grams (from 1-grams up to 4-grams) were created.
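As a rough illustration of what creating n-grams means, the sketch below counts every run of n consecutive words in a tiny example corpus. It is written in Python for brevity and is not the code behind the app; the names `tokenize` and `ngram_counts` and the two example sentences are assumptions made for this example, and the lowercase-and-split tokenizer only stands in for the actual cleaning steps.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for the real cleaning)."""
    return text.lower().split()


def ngram_counts(sentences, n):
    """Count every run of n consecutive tokens within each sentence."""
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        # A sentence with fewer than n tokens contributes no n-grams.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


# Hypothetical two-sentence corpus, for illustration only.
corpus = ["I like green tea", "I like black coffee"]
unigrams = ngram_counts(corpus, 1)
bigrams = ngram_counts(corpus, 2)
fourgrams = ngram_counts(corpus, 4)
```

Counting within each sentence keeps n-grams from spanning sentence boundaries, which is one common design choice; whether the app does the same is not stated here.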
The algorithm to predict the next words uses a simple backoff starting with the 4-gram data. If the input consists of at least two words, the algorithm backs off to the 3-gram data, then to the 2-gram data and finally to the 1-gram data.
Once the algorithm has completed, the three words most likely to follow the input are returned.
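The backoff idea can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the app's implementation: the helper names `build_tables` and `predict_next`, the toy corpus, and the choice to fill any remaining slots from lower-order n-grams until three words are collected are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def build_tables(sentences, max_n=4):
    """Map n -> {context of n-1 words -> Counter of the words that follow}."""
    tables = {n: defaultdict(Counter) for n in range(1, max_n + 1)}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])
                tables[n][context][tokens[i + n - 1]] += 1
    return tables


def predict_next(text, tables, k=3, max_n=4):
    """Return up to k candidate next words, trying the longest context first."""
    tokens = text.lower().split()
    predictions = []
    for n in range(max_n, 0, -1):
        if len(tokens) < n - 1:
            continue  # input too short to form an (n-1)-word context
        # Last n-1 words of the input; the empty tuple for 1-grams.
        context = tuple(tokens[len(tokens) - (n - 1):])
        candidates = tables[n].get(context, Counter())
        for word, _ in candidates.most_common():
            if word not in predictions:
                predictions.append(word)
            if len(predictions) == k:
                return predictions
    return predictions


# Hypothetical toy corpus, for illustration only.
corpus = [
    "i like green tea",
    "i like green apples",
    "i like green tea",
    "you like black coffee",
]
tables = build_tables(corpus)
print(predict_next("I like green", tables))
```

In this sketch the 1-gram table acts as a final fallback, so an input with no matching context still yields the most frequent words in the corpus.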
Visit the Next Word Prediction App
Type your sentence in the textbox and press the button "Predict Next Word" to predict the three most likely words to follow your input.