Word Predictor App

Sarathy Jay
May 15 2016

About the App

This is a smart text prediction app that learned word combinations from a large set of Twitter, public blog and news data.

Key Features

Performs data cleaning to remove special characters and profanity
The datasets are loaded into R data frames for faster prediction
Text mining and NLP techniques are used to create n-grams (sequences of 1, 2, 3 and 4 words)
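To illustrate the n-gram step, the Python sketch below counts every 1- to 4-word sequence in a tokenized text. It is not the app's R code: Python `Counter` objects stand in for the R data frames, and the function name `ngram_counts` is hypothetical.

```python
from collections import Counter

def ngram_counts(tokens, max_n=4):
    """Count all n-grams of length 1..max_n in a token list.

    Returns a dict mapping n to a Counter keyed by n-gram tuples.
    (Hypothetical helper; the app itself stores n-grams in R data frames.)
    """
    tables = {}
    for n in range(1, max_n + 1):
        # Zip the list against shifted copies of itself to get contiguous runs of n words.
        grams = zip(*(tokens[i:] for i in range(n)))
        tables[n] = Counter(grams)
    return tables

tokens = "the cat sat on the mat the cat ran".split()
tables = ngram_counts(tokens)
print(tables[2].most_common(1))  # [(('the', 'cat'), 2)]
```

Keeping one frequency table per n makes it cheap to look up a context of any length at prediction time.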

How the app works

User Interface

A text box to capture user input; the user can type one or a few words
A button to perform the word prediction action

Background Process

The app takes the user's input and cleans it (removes punctuation, special characters, extra white space and profanity)
Loads the pre-built n-grams into memory
Runs the built-in n-gram prediction algorithm on the cleaned input
Outputs all possible next-word predictions in a drop-down box
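The cleaning step above can be sketched in Python as follows. This is an assumption-laden illustration rather than the app's R code: `clean_input` is a hypothetical name, and `PROFANITY` is a placeholder list, since the slides do not say which profanity list the app uses.

```python
import re

# Hypothetical placeholder list; the app's actual profanity list is not described.
PROFANITY = {"darn", "heck"}

def clean_input(text: str) -> list:
    """Return cleaned, lowercase tokens from raw user input."""
    text = text.lower()
    # Replace punctuation, digits and special characters with spaces
    # (letters, apostrophes and whitespace are kept).
    text = re.sub(r"[^a-z'\s]", " ", text)
    # split() also collapses runs of extra white space; strip stray quote marks.
    tokens = [t.strip("'") for t in text.split()]
    # Drop empty leftovers and profane words.
    return [t for t in tokens if t and t not in PROFANITY]

print(clean_input("  Thanks for the  HELP!!! #heck  "))  # ['thanks', 'for', 'the', 'help']
```

Applying the same cleaning to the corpus and to the user's input keeps the lookup keys consistent.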

Algorithm

The app’s algorithm is based on n-grams. An n-gram is a contiguous sequence of n items from a given sequence of text or speech. Depending on the application, the items can be phonemes, syllables, letters, words or base pairs. N-grams are typically collected from a text or speech corpus.
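The definition above works for any kind of item. A minimal Python sketch (the function name `ngrams` is hypothetical) extracts contiguous n-grams from either a list of words or a string of letters:

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def ngrams(items: Sequence[T], n: int) -> List[Tuple[T, ...]]:
    """Return every contiguous run of n items, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence of length L has L - n + 1 n-grams (none if L < n).
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]

# Word-level trigrams
print(ngrams("to be or not to be".split(), 3))
# [('to', 'be', 'or'), ('be', 'or', 'not'), ('or', 'not', 'to'), ('not', 'to', 'be')]

# Letter-level bigrams
print(ngrams("text", 2))
# [('t', 'e'), ('e', 'x'), ('x', 't')]
```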

Corpus

The “corpus” consisted of blogs, news articles and tweets in English. From these, we built a set of n-grams (unigrams, bigrams, trigrams and 4-grams) that help predict the most likely next word in a sentence, based on how frequently each word followed the same preceding words in the corpus we analyzed.
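The slides do not say exactly how the four n-gram tables are combined, so the Python sketch below assumes a simple back-off: look up the last three words in the 4-gram counts, and if that context was never seen, fall back to the last two words, then one, then plain word frequency. The names `build_model` and `predict`, and the toy corpus, are hypothetical; Python dicts of `Counter`s stand in for the app's R data frames.

```python
from collections import Counter, defaultdict

def build_model(tokens, max_n=4):
    """Map each context (tuple of up to max_n - 1 words) to a Counter of next words."""
    model = defaultdict(Counter)
    for n in range(2, max_n + 1):
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            model[context][tokens[i + n - 1]] += 1
    # Unigram fallback: the empty context counts every word in the corpus.
    model[()] = Counter(tokens)
    return model

def predict(model, words, max_n=4, k=3):
    """Return up to k next-word candidates, trying the longest known context first."""
    for size in range(min(len(words), max_n - 1), -1, -1):
        # size == 0 yields the empty context, i.e. the unigram table.
        context = tuple(words[len(words) - size:])
        candidates = model.get(context)  # .get() avoids adding keys to the defaultdict
        if candidates:
            return [w for w, _ in candidates.most_common(k)]
    return []

tokens = "i want to eat pizza i want to go home i want a nap".split()
model = build_model(tokens)
print(predict(model, ["i", "want"]))    # ['to', 'a']
print(predict(model, ["you", "want"]))  # backs off to ('want',): ['to', 'a']
```

Returning the top k candidates rather than a single word matches the app's drop-down list of possible next words.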

Conclusion

The app is available as a Shiny application for exploration.

Link to Shiny app: Word Predictor

References

Wikipedia
http://blog.algorithmia.com/