Irni Jasmina Ibrahim
May 27, 2017
The aim of this project is to create a ShinyApp that takes an input, either a single phrase or multiple words, and predicts the most probable next words based on that input.
The project uses the SwiftKey dataset of English text collected from Twitter, news articles and blogs.
Several steps were carried out in this project, including cleaning the data and building the prediction model. For example, special characters and bad words were removed from the data.
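A minimal Python sketch of this cleaning step follows. The project's own cleaning code (written in R, since the app is built with Shiny) and its bad-word list are not shown, so `BAD_WORDS` below is a hypothetical placeholder, and treating everything except letters and apostrophes as a "special character" is an assumption.

```python
import re

# Hypothetical bad-word list; the list used in the project is not given.
BAD_WORDS = {"badword", "anotherbadword"}


def clean_text(text, bad_words=BAD_WORDS):
    """Lowercase the text, strip special characters and drop bad words."""
    text = text.lower()
    # Assumption: anything other than a-z, apostrophes and whitespace is "special".
    text = re.sub(r"[^a-z'\s]", " ", text)
    # Splitting on whitespace also collapses the runs of spaces left behind.
    words = [w for w in text.split() if w not in bad_words]
    return " ".join(words)


print(clean_text("Hello, World!! This is a badword #test :)"))
# hello world this is a test
```

Note that this pattern also removes accented and non-Latin letters, which is acceptable for an English-only corpus but would need a wider character class otherwise.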
The cleaned data was then tokenized into N-grams.
An n-gram is a contiguous sequence of n items from a given sequence of text or speech. The items can be phonemes, syllables, letters, words or base pairs according to the application. (Source: Wikipedia)
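For word-level N-grams, this definition amounts to sliding a window of n tokens across the text. A short Python sketch (the helper name `ngrams` is illustrative, not taken from the project):

```python
def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    # A sequence of length L has L - n + 1 windows of size n (none if L < n).
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "the cat sat on the mat".split()
print(ngrams(tokens, 2))
# [('the', 'cat'), ('cat', 'sat'), ('sat', 'on'), ('on', 'the'), ('the', 'mat')]
```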
The N-gram algorithm processes a sample of the corpus into bigram, trigram and quadgram tables, each recording how often every N-gram occurs.
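The frequency tables can be sketched in Python with `collections.Counter`, standing in for the data frames the project used. The three-sentence corpus and the helper names `ngrams` and `build_tables` are hypothetical; the real tables were built from a sample of the SwiftKey data.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_tables(sentences, orders=(2, 3, 4)):
    """Count bigrams, trigrams and quadgrams across all sentences."""
    tables = {n: Counter() for n in orders}
    for sentence in sentences:
        tokens = sentence.split()
        for n in orders:
            # Counting per sentence keeps any N-gram from spanning two sentences.
            tables[n].update(ngrams(tokens, n))
    return tables


corpus = ["i want to go home", "i want to eat", "i want to go out"]
tables = build_tables(corpus)
print(tables[3].most_common(2))
# [(('i', 'want', 'to'), 3), (('want', 'to', 'go'), 2)]
```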
To make a prediction, the input words are looked up in the N-gram data frame, and the words that follow them are returned as the predicted next words, ranked by their frequency in the N-gram table.
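One way to implement this lookup is sketched below in Python: take the last n-1 input words, find the N-grams in the order-n table that start with them, and rank the final words by frequency. Trying the quadgram table first and falling back to trigrams and then bigrams when nothing matches is a simple back-off strategy; the source does not say whether the app does this, so treat it as an assumption. The helpers and the toy corpus are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_tables(sentences, orders=(2, 3, 4)):
    """Count N-grams of each order across all sentences."""
    tables = {n: Counter() for n in orders}
    for sentence in sentences:
        for n in orders:
            tables[n].update(ngrams(sentence.split(), n))
    return tables


def predict_next(phrase, tables, k=3):
    """Return up to k likely next words, trying the longest N-grams first."""
    words = phrase.lower().split()
    for n in sorted(tables, reverse=True):
        context = tuple(words[-(n - 1):])
        if len(context) < n - 1:
            continue  # input is too short for this table
        candidates = Counter()
        for gram, count in tables[n].items():
            if gram[:-1] == context:
                candidates[gram[-1]] += count
        if candidates:
            return [word for word, _ in candidates.most_common(k)]
    return []  # no table matched the input


tables = build_tables(["i want to go home", "i want to eat", "i want to go out"])
print(predict_next("I want to", tables))   # ['go', 'eat']
print(predict_next("We want to", tables))  # backs off to trigrams: ['go', 'eat']
```

Scanning a whole table on every query is fine for a sketch; for a full corpus, indexing each table by its first n-1 words would make lookups much faster.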
You can access the ShinyApp by clicking here.