Coursera Final "Capstone" Project

T. Rydzewski
20/04/2020

Application

To see the application in action please visit https://tommr.shinyapps.io/FinalProject/.

Algorithm

The goal of the project was to create an application that predicts the next word the user will type.

We used blog, news, and Twitter data to build N-grams. My application uses 2-grams and 3-grams, meaning that it uses the previous 1 or 2 words to make a prediction.
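As a rough illustration of the idea, the sketch below builds 2-gram and 3-gram counts from a few sentences and picks the most frequent continuation. It is a minimal base R sketch, not the code behind the application: the function names (`tokenize`, `build_ngrams`, `predict_next`) and the sample sentences are assumptions made up for this example.

```r
# Split a sentence into lowercase word tokens (illustrative tokenizer).
tokenize <- function(text) {
  words <- strsplit(tolower(text), "[^a-z']+")[[1]]
  words[nzchar(words)]
}

# Count every n-gram of order n as (context = first n-1 words, nxt = last word).
build_ngrams <- function(sentences, n) {
  pairs <- lapply(sentences, function(s) {
    w <- tokenize(s)
    if (length(w) < n) return(NULL)
    starts <- seq_len(length(w) - n + 1)
    data.frame(
      context = vapply(starts,
                       function(i) paste(w[i:(i + n - 2)], collapse = " "),
                       character(1)),
      nxt = w[starts + n - 1],
      stringsAsFactors = FALSE
    )
  })
  pairs <- do.call(rbind, pairs)
  counts <- aggregate(count ~ context + nxt,
                      data = cbind(pairs, count = 1L), FUN = sum)
  counts[order(-counts$count), ]
}

# Predict the next word from the last n-1 words of the input.
# Returns NA when the input is too short or the context was never seen.
predict_next <- function(ngrams, text, n) {
  w <- tokenize(text)
  if (length(w) < n - 1) return(NA_character_)
  context <- paste(tail(w, n - 1), collapse = " ")
  hits <- ngrams[ngrams$context == context, ]
  if (nrow(hits) == 0) return(NA_character_)
  as.character(hits$nxt[which.max(hits$count)])
}

# Hypothetical toy corpus, for illustration only.
sentences <- c("i love new york", "i love new york city",
               "i love new shoes", "new york is big")
bigrams  <- build_ngrams(sentences, 2)
trigrams <- build_ngrams(sentences, 3)

predict_next(bigrams, "I live in New", 2)   # uses the last word: "new"
predict_next(trigrams, "They love new", 3)  # uses the last two words: "love new"
```

With 2-grams only the last word is used as context, while 3-grams use the last two words, which is why the 3-gram mode needs a longer input.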

[Figure: N-grams]

Data cleaning

Here is a summary of the operations we performed on the data before converting it to N-grams:

# Load the text-mining package that provides tm_map() and the transformations below
library(tm)
# Get rid of the partial data - we don't need it anymore
rm(blogs_data, news_data, twitter_data)
# Convert to lower case
corpus <- tm_map(corpus, content_transformer(tolower))
# Remove numbers
corpus <- tm_map(corpus, removeNumbers)
# Remove common English stopwords
corpus <- tm_map(corpus, removeWords, stopwords('english'))
# Remove punctuation
corpus <- tm_map(corpus, removePunctuation)
# Eliminate extra white space
corpus <- tm_map(corpus, stripWhitespace)
# Create plain text
corpus <- tm_map(corpus, PlainTextDocument)
# Strip words back to their root (stemming; relies on the SnowballC package)
corpus <- tm_map(corpus, stemDocument)
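To make the effect of these steps more concrete, here is a small base R sketch that applies similar transformations to a single string. It does not use tm: the function name `clean_text` and the tiny stopword list are assumptions for illustration (the real `stopwords('english')` list is much longer), and stemming is left out.

```r
# Apply the cleaning steps, in the same order, to one character string.
clean_text <- function(text,
                       stop_words = c("the", "a", "is", "of", "and", "to")) {
  text <- tolower(text)                                   # lower case
  text <- gsub("[0-9]+", "", text, perl = TRUE)           # remove numbers
  pattern <- paste0("\\b(", paste(stop_words, collapse = "|"), ")\\b")
  text <- gsub(pattern, "", text, perl = TRUE)            # remove stopwords
  text <- gsub("[[:punct:]]+", "", text, perl = TRUE)     # remove punctuation
  text <- gsub("\\s+", " ", text, perl = TRUE)            # collapse white space
  trimws(text)                                            # extra step: trim the ends
}

clean_text("The 3 Dogs ran to the park, and the cat is sleeping!")
# "dogs ran park cat sleeping"
```

Note that the stopwords are removed before the N-grams are built, so words such as "the" or "to" can never appear as a context or as a prediction.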

Using the Application

  1. Choose whether you want to use 2-grams or 3-grams using the slider.

  2. Enter some text.

  3. The application will make a prediction based on the text you entered. It is worth noting that for 3-grams you need to enter at least 2 words.
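The steps above imply a small input check: the slider value n decides how many trailing words are needed as context (n - 1). A minimal base R sketch of that check follows. The function name `get_context` and the message text are hypothetical, and the actual application may handle short input differently.

```r
# Return the last n-1 words of the input, or a message if the input is too short.
get_context <- function(text, n) {
  stopifnot(n %in% c(2, 3))                 # slider offers 2-grams or 3-grams
  words <- strsplit(trimws(tolower(text)), "[[:space:]]+")[[1]]
  words <- words[nzchar(words)]
  needed <- n - 1                           # context words required
  if (length(words) < needed) {
    return(list(
      ok = FALSE,
      message = sprintf("Please enter at least %d word(s) for %d-grams.",
                        needed, n)
    ))
  }
  list(ok = TRUE, context = paste(tail(words, needed), collapse = " "))
}

get_context("Happy New Year", 2)  # context is the last word
get_context("Happy New", 3)       # context is the last two words
get_context("Happy", 3)           # too short for 3-grams
```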

[Figure: N-grams]

Have Fun! T