Omid Jazi
May 2, 2021
In this project, we create a data product that highlights the prediction algorithm we have built and provides an interface that others can access. Accordingly, we build a Shiny application that takes a phrase and predicts the next word.
The data was cleaned and processed with the tm and stringr packages, which provide built-in functions for removing common punctuation, stopwords, numbers, Twitter handles, etc. The cleaned data was then combined for further analysis.
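The cleaning step can be illustrated with a minimal sketch. It is written in Python rather than R, so it is not the tm/stringr code used in the project; the function name `clean_text`, the regular expressions, and the tiny stopword list are all assumptions made for illustration.

```python
import re

# Small illustrative stopword list (an assumption; tm ships a much longer one).
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is"}

def clean_text(text, remove_stopwords=True):
    """Lowercase text and strip URLs, handles, numbers, and punctuation."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # URLs
    text = re.sub(r"@\w+", " ", text)           # Twitter handles
    text = re.sub(r"\d+", " ", text)            # numbers
    text = re.sub(r"[^a-z'\s]", " ", text)      # punctuation and other symbols
    words = text.split()
    if remove_stopwords:
        words = [w for w in words if w not in STOPWORDS]
    return " ".join(words)

print(clean_text("@bob I saw 3 cats in the park!!! http://t.co/x"))
```

Each substitution replaces the unwanted pattern with a space, and the final `split`/`join` collapses the leftover whitespace.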
A one percent sample of the data was used for the project. The n-grams were created by tokenization. The model uses the Stupid Backoff strategy for word prediction.
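For reference, Stupid Backoff as commonly stated (Brants et al., 2007) scores a candidate word \(w_i\) given its preceding context by relative frequency, backing off to a shorter context with a fixed penalty \(\alpha\) when the longer n-gram was never observed. Here \(c(\cdot)\) denotes an n-gram count and \(w_{a}^{b}\) the word sequence from position \(a\) to \(b\); the value \(\alpha = 0.4\) is the one suggested in that paper, and whether this project uses the same value is not stated.

\[
S(w_i \mid w_{i-k+1}^{i-1}) =
\begin{cases}
\dfrac{c(w_{i-k+1}^{i})}{c(w_{i-k+1}^{i-1})} & \text{if } c(w_{i-k+1}^{i}) > 0, \\[1ex]
\alpha \, S(w_i \mid w_{i-k+2}^{i-1}) & \text{otherwise.}
\end{cases}
\]

The scores are not normalized probabilities, which is what keeps the method cheap to compute.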
The model was trained on a one percent sample of the SwiftKey data from blogs, news, and Twitter.
The model uses a set of n-grams, each a contiguous sequence of \(n\) items from a given sample of text or speech, to predict the next word.
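As a rough illustration of what an n-gram is, the following Python sketch extracts and counts n-grams from whitespace-tokenized text. The helper names `ngrams` and `count_ngrams` are hypothetical, and the project itself builds its n-grams in R.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def count_ngrams(sentences, n):
    """Count n-grams over an iterable of already cleaned sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(ngrams(sentence.split(), n))
    return counts

bigrams = count_ngrams(["to be or not to be"], 2)
print(bigrams[("to", "be")])
```

A sentence shorter than \(n\) words simply contributes no n-grams of that order.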
The prediction algorithm looks for a match in the order quadgram, trigram, bigram. If no result is found, it returns the word “the” as the predicted word.
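The lookup order described above can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the project's R code: `build_tables` and `predict_next` are hypothetical names, and the sketch returns the most frequent follower at the highest matching order rather than computing weighted backoff scores.

```python
from collections import Counter, defaultdict

def build_tables(sentences, max_n=4):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    tables = {n: defaultdict(Counter) for n in range(2, max_n + 1)}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for n in range(2, max_n + 1):
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])
                tables[n][context][tokens[i + n - 1]] += 1
    return tables

def predict_next(phrase, tables, max_n=4, default="the"):
    """Try the quadgram, then trigram, then bigram table; else return default."""
    tokens = phrase.lower().split()
    for n in range(max_n, 1, -1):
        if len(tokens) < n - 1:
            continue  # phrase too short to form a context of this order
        context = tuple(tokens[-(n - 1):])
        followers = tables[n].get(context)
        if followers:
            return followers.most_common(1)[0][0]
    return default

tables = build_tables([
    "thank you very much",
    "thank you very kindly",
    "thank you very much",
])
print(predict_next("thank you very", tables))
print(predict_next("something unseen", tables))
```

Only the last three words of the phrase matter for a quadgram lookup, so longer inputs are effectively truncated to their tail.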
The link for the Shiny Application is Here
On the left side, there is a text box for entering a phrase. On the right side is the output, which shows a “NULL” value until a phrase is entered.
Enter a phrase, and the algorithm will predict the next word. If no result is found, it returns the word “the” as the predicted word.