T. Rydzewski
20/04/2020
To see the application in action, please visit https://tommr.shinyapps.io/FinalProject/.
The goal of the project was to create an application that predicts the next word as the user types.
We used the blogs, news and Twitter data to build N-grams. My application uses 2-grams and 3-grams, meaning that it uses the last 1 or 2 words to make a prediction.
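To make the N-gram idea concrete, here is a minimal base-R sketch of how 2-grams and 3-grams can be extracted from a sequence of words and counted. The helper `make_ngrams()` and the sample sentence are hypothetical illustrations; the source does not show which tokenizer the application actually uses.

```r
# Hypothetical helper: return every run of n consecutive words as one string.
make_ngrams <- function(words, n) {
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts, function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Illustrative input only, not taken from the project data
words <- strsplit("i like green tea and i like black tea", " ")[[1]]

bigrams  <- make_ngrams(words, 2)  # 2-grams: one word of context
trigrams <- make_ngrams(words, 3)  # 3-grams: two words of context

# Frequency tables, most common N-grams first
bigram_freq  <- sort(table(bigrams), decreasing = TRUE)
trigram_freq <- sort(table(trigrams), decreasing = TRUE)
print(head(bigram_freq, 3))
```

Here "i like" appears twice, so it tops the 2-gram table. Counting frequencies is what later lets the most likely continuation be chosen.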
Here is a summary of the operations we performed on the data before converting it to N-grams:
```r
# Load the text-mining packages (stemDocument() relies on SnowballC)
library(tm)
library(SnowballC)

# `corpus` was built earlier from the blogs, news and Twitter data (step not shown)
# Get rid of the partial data; we don't need it anymore
rm(blogs_data, news_data, twitter_data)
# Convert to lower case
corpus <- tm_map(corpus, content_transformer(tolower))
# Remove numbers
corpus <- tm_map(corpus, removeNumbers)
# Remove common English stopwords
corpus <- tm_map(corpus, removeWords, stopwords('english'))
# Remove punctuation
corpus <- tm_map(corpus, removePunctuation)
# Collapse repeated white space
corpus <- tm_map(corpus, stripWhitespace)
# Convert to plain text documents
corpus <- tm_map(corpus, PlainTextDocument)
# Strip each word back to its root (stemming)
corpus <- tm_map(corpus, stemDocument)
```
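Text typed into the application has to be cleaned in a compatible way, otherwise it will not match the N-gram keys built from the corpus. The sketch below reproduces only the lower-casing, number removal, punctuation removal and white-space steps with base-R regular expressions, as a stand-in for the `tm` transformations above. The function name `clean_text()` is hypothetical, and stopword removal and stemming are deliberately left out to keep it dependency-free.

```r
# Hypothetical helper: approximate the tm cleaning steps for a single string.
# Omits stopword removal and stemming (tm::removeWords / SnowballC::wordStem).
clean_text <- function(x) {
  x <- tolower(x)                    # like content_transformer(tolower)
  x <- gsub("[0-9]+", "", x)         # like removeNumbers
  x <- gsub("[[:punct:]]+", "", x)   # like removePunctuation
  x <- gsub("\\s+", " ", x)          # like stripWhitespace
  trimws(x)                          # also trim leading/trailing spaces
}

clean_text("Hello, World! 123  Tea.")  # returns "hello world tea"
```

Unlike `stripWhitespace`, this version also trims the ends of the string, which is convenient when the last words of the input are used for lookup.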
To use the application:
1. Choose whether to use 2-grams or 3-grams with the slider.
2. Enter some text.
3. The application predicts the next word based on the text you entered. Note that for 3-grams you need to enter at least 2 words.
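The prediction step can be sketched as a lookup: take the last n-1 words of the input, find the n-grams that start with them, and return the final word of the most frequent match. This also shows why 3-grams need at least 2 input words. The function `predict_next()` and the small frequency vector are hypothetical; the app's real lookup code and data are not shown in the source.

```r
# Hypothetical helper: predict the next word from an n-gram frequency vector.
# `freq` is a named integer vector: names are n-grams, values are counts.
predict_next <- function(text, freq, n) {
  words <- strsplit(trimws(tolower(text)), "\\s+")[[1]]
  if (length(words) < n - 1) {
    stop(sprintf("Enter at least %d word(s) for %d-gram prediction.", n - 1, n))
  }
  # The last n-1 words form the prefix to look up
  prefix <- paste(tail(words, n - 1), collapse = " ")
  hits <- freq[startsWith(names(freq), paste0(prefix, " "))]
  if (length(hits) == 0) return(NA_character_)  # no matching n-gram
  best <- names(hits)[which.max(hits)]
  # The prediction is the last word of the most frequent matching n-gram
  tail(strsplit(best, " ")[[1]], 1)
}

# Illustrative counts only
trigram_freq <- c("i like green" = 1L, "i like black" = 3L, "like black tea" = 3L)
predict_next("Do you know I like", trigram_freq, n = 3)  # returns "black"
```

Returning `NA` when nothing matches leaves room for a fallback, such as retrying with 2-grams, although the source only describes choosing the N-gram size with the slider.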
Have fun! T