Predicting Next Word

Marisa Souza
03/17/2018

The Problem and the Solution

  • The Problem - Around the world, people are spending an increasing amount of time on their mobile devices for email, social networking, banking and a whole range of other activities. But typing on mobile devices can be a serious pain.

  • The Solution - My project builds an app with a smart keyboard that makes it easier for people to type on their mobile devices. One cornerstone of the smart keyboard is its predictive text model. When someone types:

I am going to the

the keyboard presents three options for what the next word might be. For example, the three words might be gym, movies, doctor.

About the Algorithm

  • I took a 2% sample of the data from 3 files: en_US.twitter.txt, en_US.news.txt and en_US.blogs.
  • I created n-gram models (n = 1, 2, 3 and 4) to predict the next word based on the previous words
  • I used the quanteda package to generate the models quickly
  • I stored each n-gram in the models together with its frequency
  • I saved the 4 models (unigram, bigram, trigram and fourgram) in separate files
  • My algorithm calculates the probability of occurrence of words according to a Markov chain model
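
The steps above can be sketched in a few lines. This is only an illustration in plain Python, not the quanteda code used for the app: the helper names (`tokenize`, `ngram_counts`, `conditional_probability`) and the toy corpus are assumptions made for the example, and the probability shown is the plain relative frequency, before any smoothing.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase and split into word tokens (a deliberately simple, hypothetical tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def ngram_counts(sentences, n):
    """Count every run of n consecutive tokens within each sentence."""
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

# Toy corpus standing in for the sampled Twitter, news and blog text.
corpus = [
    "I am going to the gym",
    "I am going to the movies",
    "She is going to the gym",
]

# One frequency table per order: unigram, bigram, trigram, fourgram.
models = {n: ngram_counts(corpus, n) for n in range(1, 5)}

def conditional_probability(models, context, word):
    """Relative frequency: count(context + word) / count(context)."""
    context = tuple(context)
    n = len(context) + 1
    joint = models[n][context + (word,)]
    # With an empty context, normalise by the total number of tokens.
    prior = models[n - 1][context] if context else sum(models[1].values())
    return joint / prior if prior else 0.0

print(conditional_probability(models, ["to", "the"], "gym"))
```

Under the Markov assumption, only the last few words of the sentence are used as context, which is why a table of short n-grams is enough to score a candidate next word.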

About the Algorithm

  • A Markov chain is “a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event.”
  • For each sentence typed, I look up the frequency of the last 4 words in the 4-grams:
    • if the sentence is not found in the 4-grams, I look up the frequency of the last 3 words in the 3-grams
    • if the sentence is not found in the 3-grams, I look up the frequency of the last 2 words in the 2-grams
    • if the sentence is not found in the 2-grams, I look up the frequency of the last word in the 1-grams
    • if the sentence is found in any of the n-grams above, I calculate the probability of occurrence and “smooth” the probabilities
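
The back-off search described above can be sketched as follows. This is a minimal Python illustration, not the app's actual code: `build_tables` and `predict_next` are hypothetical names, the corpus is a toy one, and I assume each n-gram table is queried with the last n-1 typed words as context. The slides do not say which smoothing method is used, so the sketch returns plain relative frequencies and leaves smoothing out.

```python
from collections import Counter, defaultdict

def build_tables(sentences, max_n=4):
    """For each order n, map a context (tuple of n-1 words) to a Counter of next words."""
    tables = {n: defaultdict(Counter) for n in range(1, max_n + 1)}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tuple(tokens[i:i + n])
                tables[n][gram[:-1]][gram[-1]] += 1
    return tables

def predict_next(tables, text, k=3, max_n=4):
    """Return up to k (word, relative frequency) pairs, backing off to shorter contexts."""
    tokens = text.lower().split()
    for n in range(max_n, 0, -1):
        if len(tokens) < n - 1:
            continue  # not enough words typed for this order
        # Last n-1 words; for n == 1 this is the empty context.
        context = tuple(tokens[len(tokens) - (n - 1):])
        followers = tables[n].get(context)
        if followers:
            total = sum(followers.values())
            return [(word, count / total) for word, count in followers.most_common(k)]
    return []

# Toy corpus (assumption) standing in for the saved n-gram files.
corpus = [
    "i am going to the gym",
    "i am going to the movies",
    "i am going to the gym",
    "she went to the doctor",
]
tables = build_tables(corpus)

print(predict_next(tables, "I am going to the"))  # answered by the 4-gram table
print(predict_next(tables, "He walked to the"))   # backs off to the 3-gram table
```

Searching from the longest context down means the most specific evidence wins when it exists, while the unigram table guarantees that some suggestion is always available.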

The App

The app is available at the following link: https://ssmaryisa.shinyapps.io/predictnextword/

The Guide

  • While you type, the app predicts the next word
  • The most probable words are displayed on the red buttons
  • You can click a button to select a suggested word, or continue typing
