Vincent Amedekah
2016-09-21
The goal of this project is to create an application that uses a prediction algorithm to suggest the next word after a user enters some text. The resulting Shiny application takes a phrase as input and returns the predicted next word.
The data source for the application is the SwiftKey data set, which contains text from blogs, Twitter and news. The data is read in as text and a corpus is created. The corpus is cleaned to remove profanity, stop words, numbers, etc. The corpus is then tokenized into 2-, 3-, 4- and 5-grams, which are used for prediction.
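The cleaning and tokenizing steps can be sketched as follows. This is a minimal illustration in Python, not the code behind the application (which is written in R). The word lists and the function names clean, ngrams and build_tables are hypothetical, and the real stop-word and profanity lists would be much longer.

```python
import re
from collections import Counter

# Hypothetical stand-ins for the real stop-word and profanity lists.
STOP_WORDS = {"the", "a", "to", "of", "and"}
PROFANITY = {"darn"}

def clean(text):
    """Lowercase the text, strip numbers and punctuation, drop unwanted words."""
    text = text.lower()
    # Keep only ASCII letters and whitespace (removes digits, punctuation,
    # and non-ASCII characters).
    text = re.sub(r"[^a-z\s]", " ", text)
    return [w for w in text.split()
            if len(w) > 1 and w not in STOP_WORDS and w not in PROFANITY]

def ngrams(words, n):
    """Return every run of n consecutive words as a tuple."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def build_tables(lines, sizes=(2, 3, 4, 5)):
    """Count the n-grams of each requested size over all cleaned lines."""
    tables = {n: Counter() for n in sizes}
    for line in lines:
        words = clean(line)
        for n in sizes:
            tables[n].update(ngrams(words, n))
    return tables

tables = build_tables(["I am looking forward to seeing you 2 times!"])
```

Because the stop word "to" is removed during cleaning, the phrase "looking forward to seeing you" is stored as the 4-gram "looking forward seeing you".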
The first task is to filter the user input using the same text cleaning process applied to the SwiftKey data. This includes removing numbers, punctuation, foreign characters, profanity, single-letter words, contractions, etc. Next we search for matches based on the user input. For example, for the input 'looking forward seeing', a match is defined as 'looking forward seeing you'. The input is shortened to its last 3 words, and if a match is found, the algorithm returns the fourth word of the matching stored N-gram as the prediction.
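The matching step can be sketched as below, again in Python rather than the application's own R code. The toy counts in TABLES and the function name predict_next are hypothetical. Falling back to shorter N-grams when the longest context has no match, and choosing the most frequent continuation, are assumptions made for this illustration and are not stated in the description above.

```python
from collections import Counter

# Toy n-gram counts keyed by n-gram size (hypothetical data).
TABLES = {
    4: Counter({("looking", "forward", "seeing", "you"): 3,
                ("looking", "forward", "seeing", "everyone"): 1}),
    3: Counter({("forward", "seeing", "you"): 4}),
    2: Counter({("seeing", "you"): 5, ("thank", "you"): 2}),
}

def predict_next(words, tables):
    """Return the most frequent word that follows the end of `words`, or None."""
    # Try the largest n-grams first (assumed fallback order).
    for n in sorted(tables, reverse=True):
        context = tuple(words[-(n - 1):])
        if len(context) < n - 1:
            continue  # input too short for this n-gram size
        # Collect the final word of every stored n-gram that starts with the context.
        candidates = {gram[-1]: count
                      for gram, count in tables[n].items()
                      if gram[:-1] == context}
        if candidates:
            return max(candidates, key=candidates.get)
    return None

prediction = predict_next(["looking", "forward", "seeing"], TABLES)
```

With the toy counts above, the last three words 'looking forward seeing' match two stored 4-grams, and the more frequent one supplies the prediction.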
A Shiny application is created with an input box where the user can enter a phrase. Clicking the button below the input box displays the predicted word in the predicted text section. The application can be accessed at https://mccosby2020.shinyapps.io/nextwordprediction/