Next word predictor

Risto Hinno

Overview

A simple app made as part of the Data Science Capstone.

  • Enter one or more words in the text area (example: “my name”)
  • Push the “Predict next word” button
  • See the three suggested next words (example: “is”, “and”, “on”)

About

Idea: use existing data (texts) to build a model that predicts the next word in a text.

Model uses:

  • 3 482 415 three-grams (example: “my name is”)

  • 1 492 091 two-grams (example: “I am”)

  • 3 one-grams
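The counts above come from tallying word sequences in a text collection. A minimal sketch of that tallying, assuming a plain lowercase, whitespace-based tokenizer (the app's actual preprocessing is not described here, and `tokenize`, `count_ngrams`, and the toy `corpus` are hypothetical names used only for illustration):

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace; the real app may clean text differently.
    return text.lower().split()


def count_ngrams(texts, n):
    """Count every run of n consecutive words across all texts."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


# Toy corpus, made up for this example.
corpus = ["my name is Ann", "my name is Bob", "I am here"]
trigrams = count_ngrams(corpus, 3)
bigrams = count_ngrams(corpus, 2)
unigrams = count_ngrams(corpus, 1)

print(trigrams[("my", "name", "is")])
print(bigrams[("i", "am")])
```

Each key is a tuple of words, so the same function serves for one-grams, two-grams, and three-grams.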

How it works

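The app uses a simple backoff model over n-grams: it first looks for a match among the three-grams, then falls back to two-grams, and finally to one-grams.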
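A backoff lookup can be sketched roughly as follows. This is an illustration under stated assumptions, not the app's code: `build_model`, `predict_next`, and the toy corpus are hypothetical, tokenization is plain whitespace splitting, and candidates are ranked by raw counts with no smoothing or discounting.

```python
from collections import Counter, defaultdict


def build_model(texts, max_n=3):
    # model[n][context] is a Counter of words seen after `context`,
    # where `context` is a tuple of the n-1 preceding words.
    model = {n: defaultdict(Counter) for n in range(1, max_n + 1)}
    for text in texts:
        tokens = text.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])
                model[n][context][tokens[i + n - 1]] += 1
    return model


def predict_next(model, text, k=3, max_n=3):
    """Return up to k guesses, trying the longest context first."""
    tokens = text.lower().split()
    guesses = []
    for n in range(max_n, 0, -1):
        if len(tokens) < n - 1:
            continue  # not enough input words for this n-gram order
        context = tuple(tokens[len(tokens) - (n - 1):])
        candidates = model[n].get(context, Counter())
        for word, _count in candidates.most_common():
            if word not in guesses:
                guesses.append(word)
            if len(guesses) == k:
                return guesses
    return guesses


# Toy corpus, made up for this example.
corpus = ["my name is ann", "my name is bob", "my name was tom", "i am here"]
model = build_model(corpus)
print(predict_next(model, "my name"))
```

When the longer context yields fewer than three candidates, the remaining slots are filled from the shorter contexts, so the function still returns three guesses for input it has never seen.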

Accuracy

A plot tracks the overall accuracy and the accuracy of each guess.

This feature can be turned off from the input panel.

Accuracy

Expected overall accuracy is 25%:

  • 1st guess 15%

  • 2nd guess 5%

  • 3rd guess 5%

In practice, accuracy may be lower if the input text differs from the data used to build the model.
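Because at most one of the three distinct guesses can match the actual next word, the overall figure is the sum of the per-guess rates (15% + 5% + 5% = 25%). A minimal sketch of how such rates could be measured on held-out text, using a hypothetical `guess_accuracy` helper and made-up evaluation data:

```python
def guess_accuracy(predictions, actual_words, k=3):
    """Return (per-guess hit rates, overall hit rate) for ranked guesses."""
    hits = [0] * k
    for guesses, actual in zip(predictions, actual_words):
        for rank, guess in enumerate(guesses[:k]):
            if guess == actual:
                hits[rank] += 1
                break  # only the first matching guess counts
    total = len(actual_words)
    per_guess = [h / total for h in hits] if total else [0.0] * k
    return per_guess, sum(per_guess)


# Made-up evaluation set: 20 cases, 3 hits on the 1st guess,
# 1 on the 2nd, 1 on the 3rd, and 15 misses.
predictions = [["is", "and", "on"]] * 20
actual_words = ["is"] * 3 + ["and"] + ["on"] + ["other"] * 15

per_guess, overall = guess_accuracy(predictions, actual_words)
print(per_guess, overall)
```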
