Manoj Kundrapakam
10/20/2021
The data used in this prediction app consists of Twitter and news text provided by Coursera.
The goal of this exercise is to create a product that showcases the prediction algorithm I have built and to provide an interface that others can access.
The product is a Shiny app that takes a phrase (one or more words) typed into a text box and outputs a prediction of the next word. In this app the user can enter one, two, or three words to get the next word.
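The lookup behind such a prediction can be sketched as below. This is only an illustration under stated assumptions, not necessarily how the app works: it assumes a simple back-off strategy (try the longest context first, then shorter ones), and the names `ngram_table` and `predict_next`, along with every count in the toy table, are hypothetical.

```r
# Toy n-gram table: "context" is the preceding words, "next_word" the
# continuation, and "freq" an illustrative count (all values are made up).
ngram_table <- data.frame(
  context   = c("thanks for the", "for the", "for the", "the", "the"),
  next_word = c("follow", "first", "follow", "same", "first"),
  freq      = c(5, 9, 7, 20, 15),
  stringsAsFactors = FALSE
)

# Predict the next word by trying the longest context first (up to three
# words) and backing off to shorter contexts when nothing matches.
predict_next <- function(phrase, tbl = ngram_table) {
  # Lower-case the input and drop everything except letters and spaces
  cleaned <- tolower(gsub("[^[:alpha:][:space:]]", "", phrase))
  words <- strsplit(trimws(cleaned), "\\s+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) == 0) return(NA_character_)

  for (n in seq(min(3, length(words)), 1)) {
    context <- paste(tail(words, n), collapse = " ")
    hits <- tbl[tbl$context == context, ]
    if (nrow(hits) > 0) {
      # Return the most frequent continuation for this context
      return(hits$next_word[which.max(hits$freq)])
    }
  }
  NA_character_  # no context matched at any length
}

predict_next("Thanks for the")   # matches the three-word context
predict_next("waiting for the")  # backs off to the two-word context "for the"
```

Returning `NA` when nothing matches keeps the sketch honest about phrases the table has never seen.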
Below is sample code for transforming the text and creating the n-gram model:
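library(tm)     # corpus handling and text transformations
library(RWeka)  # NGramTokenizer and Weka_control

# df holds the sampled text
corp <- VCorpus(VectorSource(df))
# replace /, @ and | with spaces before punctuation is stripped,
# otherwise removePunctuation deletes them first and this step does nothing
changetospace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
corp <- tm_map(corp, changetospace, "/|@|\\|")
# wrap tolower in content_transformer() so each document stays a valid
# text document; no PlainTextDocument conversion is needed afterwards
corp <- tm_map(corp, content_transformer(tolower))
corp <- tm_map(corp, removePunctuation)
corp <- tm_map(corp, removeNumbers)
# collapse the extra spaces left behind by the steps above
corp <- tm_map(corp, stripWhitespace)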
# use a tokenizer to break the text into components (n-grams) that can be read by a machine
uniGramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 1, max = 1))
biGramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 2, max = 2))
triGramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 3, max = 3))
quadGramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 4, max = 4))
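To make the tokenizers' output concrete, here is a rough base-R sketch of what a fixed-size n-gram tokenizer produces and how the results might be counted. It is an approximation for illustration, not RWeka's implementation: it splits on whitespace only, and `make_ngrams`, `count_ngrams` and `sample_lines` are hypothetical names that do not appear in the app.

```r
# Build every n-gram of a fixed size from one string (base R only).
make_ngrams <- function(text, n) {
  words <- strsplit(trimws(text), "\\s+")[[1]]
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Count n-grams across a character vector, most frequent first.
count_ngrams <- function(lines, n) {
  grams <- unlist(lapply(lines, make_ngrams, n = n))
  counts <- sort(table(grams), decreasing = TRUE)
  data.frame(ngram = as.character(names(counts)),
             freq  = as.integer(counts),
             stringsAsFactors = FALSE)
}

sample_lines <- c("thanks for the follow", "thanks for the help")
count_ngrams(sample_lines, 2)  # bigram frequencies for the two sample lines
```

A frequency table of this shape, one per n-gram size, is the kind of input a next-word lookup can work from.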
The data used here is a sample of the given source, because my machine cannot process the full data set, so not every word can be predicted. I am working on improving the model. Thanks for your patience.