NLP - Predicting the Next Word

Alejandro Montoya
November 28, 2017

About

Around the world, people are spending an increasing amount of time on their mobile devices for email, social networking, banking and a whole range of other activities. But typing on mobile devices can be a serious pain.

SwiftKey, Coursera's corporate partner for the Data Science Specialization Capstone Project, builds a smart keyboard that makes it easier for people to type on their mobile devices. One cornerstone of their smart keyboard is predictive text models. When someone types, for example, "I went to the", the keyboard presents three options for what the next word might be. For example, the three words might be "gym", "store" and "restaurant".

This product aims to simulate that same keyboard: the user enters some text, and the application suggests a possible next word based on the text that was just entered.

Solution Presentation

I built a Shiny application that gives the user access to a predictive model based on a corpus created from text extracts of news, blogs and tweets.

This application is accessible at https://alemontoya.shinyapps.io/natural_language_processing/

Using the App

To use the app, the user simply has to go to the section called “Next Word Prediction” and start typing in the text box on the left side. Once the user has written at least 2 words, the suggested words will start to appear on the right side. Just keep writing, and the application will keep suggesting words.

Predictive Model Explanation

I built the predictive model behind this application with the help of n-grams. I created a probability decision matrix based on the frequency of appearance of each n-gram, and gave the model a notion of context by calculating this matrix for bigrams, trigrams, four-grams, five-grams and six-grams. As a result, the application always tries to use the last 6 words of the sentence being written by the user to find the next word. It iterates over the matrices until it finds a word to suggest.
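The lookup described above can be sketched roughly as follows. This is a minimal illustration in Python, not the application's actual code: the function names `build_tables` and `predict_next`, the toy corpus, and the choice to try the longest context first and fall back to shorter ones are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def build_tables(tokens, max_n=6):
    """For each n in 2..max_n, map a context of n-1 words to a Counter of next words."""
    tables = {n: defaultdict(Counter) for n in range(2, max_n + 1)}
    for n in range(2, max_n + 1):
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            tables[n][context][tokens[i + n - 1]] += 1
    return tables

def predict_next(tables, text, k=3):
    """Try the longest context first, then shorter ones (an assumed order)."""
    words = text.lower().split()
    for n in sorted(tables, reverse=True):
        if len(words) < n - 1:
            continue  # not enough typed words for this n-gram size
        context = tuple(words[-(n - 1):])
        candidates = tables[n].get(context)
        if candidates:
            # Most frequent continuations first
            return [word for word, _ in candidates.most_common(k)]
    return []  # no table knows this context

# Toy corpus, purely illustrative
corpus = "i went to the gym . i went to the store . i went to the gym again".split()
tables = build_tables(corpus)
suggestions = predict_next(tables, "I went to the")
print(suggestions)
```

Ranking by raw counts within a single context gives the same order as ranking by relative frequency, so the sketch skips the normalization step.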

To make the application respond faster while keeping a decent degree of accuracy, I used only 15% of the data to train the model.
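One simple way to draw such a subset is to sample lines at random. The sketch below is only an assumption about how this could be done; the helper name `sample_lines`, the fixed seed and the line-level sampling are hypothetical and not taken from the application.

```python
import random

def sample_lines(lines, fraction=0.15, seed=42):
    """Return a reproducible random subset containing `fraction` of the lines."""
    rng = random.Random(seed)  # fixed seed so the same subset is drawn each run
    size = round(len(lines) * fraction)
    return rng.sample(lines, size)

# Illustrative input: 10,000 placeholder lines
lines = [f"line {i}" for i in range(10_000)]
training_lines = sample_lines(lines)
print(len(training_lines))
```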

Bonus Content

When accessing the application, the user will also notice another section called “Text Generator”. In this section, the user can generate a random text based on 2 initial words, a total number of words, and a flag that controls whether the predicted word is randomly selected from the list of possible candidates or is always the word with the highest probability.

This is just a fun toy that can generate texts that make some sense (as well as others that make absolutely none) by recursively using the predictive model.
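A generator of this kind can be sketched in a few lines. The example below is a simplified illustration, not the application's code: it uses only a trigram table instead of the full set of n-gram matrices, and the names `build_trigrams` and `generate_text`, the uniform random choice among candidates, and the toy corpus are all assumptions.

```python
import random
from collections import Counter, defaultdict

def build_trigrams(tokens):
    """Map each pair of consecutive words to a Counter of the words that follow."""
    table = defaultdict(Counter)
    for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
        table[(a, b)][c] += 1
    return table

def generate_text(table, first, second, total_words, randomize=False, seed=None):
    """Extend two starting words by repeatedly predicting the next word."""
    rng = random.Random(seed)
    words = [first, second]
    while len(words) < total_words:
        candidates = table.get((words[-2], words[-1]))
        if not candidates:
            break  # no known continuation for the last two words
        if randomize:
            # Assumed behaviour: pick uniformly among the candidate words
            words.append(rng.choice(list(candidates)))
        else:
            # Always take the most frequent continuation
            words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

# Toy corpus, purely illustrative
corpus = "the cat sat on the mat and the cat sat on the sofa".split()
table = build_trigrams(corpus)
greedy = generate_text(table, "the", "cat", 5)
shuffled = generate_text(table, "the", "cat", 8, randomize=True, seed=1)
print(greedy)
print(shuffled)
```

With the flag off, the same starting words always produce the same text; with it on, each run can wander down a different path, which is where the nonsense tends to come from.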

Try it and have fun!!!!