Word Predictor

investh
21 Apr 2015

Background

This application is a product of the Johns Hopkins Data Science Capstone Project, run in partnership with SwiftKey.

The objective is to build an application that predicts the next word using natural language processing (NLP). The data set comes from a corpus called HC Corpora and consists of Twitter, blog and news text files.

To build the application, the data set is first explored and cleaned. The clean data is then used to build the prediction algorithm that powers the app. Further optimization is then required to ensure accurate predictions and fast load times.

The prediction algorithm

  • The prediction algorithm uses n-gram tokenization with the RWeka package to build a corpus of 1-, 2- and 3-word terms, each paired with its relevant prediction (the word that follows it). It also uses the frequency of each term to score the results, so that more frequent terms appear higher in the results, in order of relevance.
  • The algorithm starts with the four-gram terms and, if it doesn't find a match, falls back to the trigram and bigram terms to find the most relevant prediction word.
  • Initial tests of the algorithm showed ~19% accuracy.
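
The fallback lookup described in the bullets above can be sketched roughly as follows. This is only an illustration in Python, not the author's R/RWeka code: the function names build_tables and predict, the whitespace tokenization, and the toy sentences are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def build_tables(sentences, max_context=3):
    """Count which words follow each 1..max_context-word term (hypothetical helper)."""
    tables = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()  # naive tokenization, an assumption
        for i in range(1, len(tokens)):
            for n in range(1, max_context + 1):
                if i - n < 0:
                    break
                term = tuple(tokens[i - n:i])
                tables[term][tokens[i]] += 1
    return tables


def predict(tables, phrase, max_context=3, top_k=3):
    """Try the longest term first, then fall back to shorter ones."""
    tokens = phrase.lower().split()
    for n in range(min(max_context, len(tokens)), 0, -1):
        term = tuple(tokens[-n:])
        if term in tables:
            # Candidates are ranked by how often they followed the term.
            return [word for word, _ in tables[term].most_common(top_k)]
    return []  # no term of any length matched


# Toy corpus, made up for illustration only.
toy_sentences = [
    "thanks for the follow",
    "thanks for the help",
    "thanks for the follow",
    "look at the sky",
]
toy_tables = build_tables(toy_sentences)

print(predict(toy_tables, "thanks for the"))  # 3-word term matches directly
print(predict(toy_tables, "stare at the"))    # falls back to the 2-word term "at the"
```

Ranking by raw frequency mirrors the scoring idea in the first bullet. A real implementation would likely also need the cleaning steps mentioned earlier and a more compact storage format for fast loading.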

How the app works

Working with the application is very simple. Just type a phrase of 1 to 3 words and click the submit button to get your next-word prediction in the panel on the right.

The application scores the potential next words and suggests them in order. Similar to how a person using a search engine gets results in order of relevance, the application suggests multiple words so that each user can decide which word is most suitable.


Try the application >> https://investh.shinyapps.io/word-predictor/

Future work

This is only the first step towards building a sustainable word prediction application. I will continue to explore ways to improve the algorithm and make the interface more user-friendly, for example by displaying the predicted word as the phrase is typed. So stay tuned.

Click the link below or type the address into a browser to try the Word Predictor:

https://investh.shinyapps.io/word-predictor/

I would love your feedback.