Text Prediction: What's the Likelihood?

michael.coursera@eipsoftware.com
28 December 2017

Predictive Model

The application uses the Katz back-off model.

The model looks up a predetermined list of word phrases called n-grams. Each n-gram consists of 2, 3, or 4 words. For example, “I love you” is an n-gram of size 3.
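Here is a minimal sketch of how such n-gram tables could be built. The Python below counts every run of 2, 3, or 4 consecutive words in a piece of text. The function name `extract_ngrams`, the lower-casing, and the toy sentence are illustrative assumptions, not the application's own code.

```python
from collections import Counter


def extract_ngrams(text: str, n: int) -> Counter:
    """Count every run of n consecutive words in `text` (hypothetical helper)."""
    words = text.lower().split()
    # zipping n staggered copies of the word list yields each n-word window
    windows = zip(*(words[i:] for i in range(n)))
    return Counter(windows)


corpus = "I love you and I love cars"
tables = {n: extract_ngrams(corpus, n) for n in (2, 3, 4)}
print(tables[2][("i", "love")])         # 2
print(tables[3][("i", "love", "you")])  # 1
```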

If a user enters “I love”, the model predicts the next word from how frequently each phrase starting with “I love” appeared in the source material. It may suggest “you”, “cars”, or “food”, ranks the suggestions by frequency, and presents the top suggestions to the user. If no stored phrase starts with “I love”, the model backs off to a shorter context (here, just “love”) and tries again.
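The back-off idea can be sketched as follows. This is a simplified, hypothetical Python version, not the application's code. It replaces Katz's Good-Turing discounting with a fixed discount of 0.5 (the assumed constant `DISCOUNT`), which is a common simplification. Each continuation that was seen gets a discounted relative frequency. The probability mass freed by discounting is then shared among unseen words according to the next-shorter context, all the way down to single-word frequencies. The names `count_ngrams`, `katz_prob`, and `predict` are illustrative.

```python
from collections import Counter

DISCOUNT = 0.5  # assumed fixed discount; Katz's original method uses Good-Turing estimates


def count_ngrams(words, max_n=4):
    """Count every 1- to max_n-word sequence in the token list."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts


def katz_prob(word, context, counts, vocab, total):
    """Back-off estimate of P(word | context) using a fixed discount."""
    if not context:
        return counts[(word,)] / total  # unigram maximum-likelihood estimate
    seen = [w for w in vocab if counts[context + (w,)] > 0]
    context_total = sum(counts[context + (w,)] for w in seen)
    shorter = context[1:]  # drop the oldest word of the context
    if context_total == 0:
        # the context was never followed by a word: back off completely
        return katz_prob(word, shorter, counts, vocab, total)
    if counts[context + (word,)] > 0:
        # seen continuation: discounted relative frequency
        return (counts[context + (word,)] - DISCOUNT) / context_total
    # probability mass freed by discounting the seen continuations...
    left_over = DISCOUNT * len(seen) / context_total
    # ...shared among unseen words in proportion to the shorter-context estimate
    unseen_mass = 1 - sum(katz_prob(w, shorter, counts, vocab, total) for w in seen)
    return left_over * katz_prob(word, shorter, counts, vocab, total) / unseen_mass


def predict(phrase, counts, vocab, total, max_n=4):
    """Rank every vocabulary word as the next word after `phrase`."""
    context = tuple(phrase.lower().split())[-(max_n - 1):]
    scores = {w: katz_prob(w, context, counts, vocab, total) for w in vocab}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


words = "i love you i love cars i love food i love you".split()
counts = count_ngrams(words)
vocab = sorted(set(words))
ranked = predict("I love", counts, vocab, len(words))
print(ranked[0])  # ('you', 0.375)
```

On this toy corpus, “you” ranks first after “I love” because it followed that phrase twice, while “cars” and “food” followed it once each. The recursion recomputes lower-order estimates for every word. That is acceptable for a demonstration, but a realistic vocabulary would need caching.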

Demonstration

The application is made up of three sections.

[Figure: screenshot of the word guesser application]

Application Sections

Section 1: User Input

The user enters a phrase, with words separated by spaces.
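Below is a minimal sketch of turning the typed phrase into lookup keys. It assumes that the n-gram tables were built from lower-case words with punctuation removed, which is not stated in the original. The longest n-gram has 4 words, so only the last three typed words are needed as context. `parse_phrase` and its `max_context` parameter are hypothetical.

```python
import re


def parse_phrase(raw: str, max_context: int = 3) -> tuple:
    """Turn raw user input into a lower-case word tuple for n-gram lookup."""
    # keep letters and apostrophes; everything else acts as a separator
    words = re.findall(r"[a-z']+", raw.lower())
    # a 4-gram model only needs the last three words as context
    return tuple(words[max(0, len(words) - max_context):])


print(parse_phrase("Well, I really LOVE"))  # ('i', 'really', 'love')
```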

Section 2: Number of Results

The user chooses how many results to return, from 1 to 10. If fewer results are available than requested, all available results are returned.
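This hypothetical helper sketches the option. It rejects requests outside the range 1 to 10 and otherwise returns up to that many predictions from a list that is already ranked. Python slicing returns the whole list when fewer results exist. The name `top_results` and the (word, probability) pair format are assumptions.

```python
from typing import List, Tuple


def top_results(ranked: List[Tuple[str, float]], requested: int) -> List[Tuple[str, float]]:
    """Return at most `requested` predictions (1-10), or fewer if fewer exist."""
    if not 1 <= requested <= 10:
        raise ValueError("number of results must be between 1 and 10")
    return ranked[:requested]  # slicing past the end simply returns everything


ranked = [("you", 0.5), ("cars", 0.3), ("food", 0.2)]
print(top_results(ranked, 10))  # all three predictions
```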

Section 3: Results

Predicted words are displayed in the right column, ranked from most to least likely, each with its probability.
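Here is a minimal sketch of producing the ranked display. It assumes the model returns a mapping from each candidate word to its probability. The name `format_results` and the output format are illustrative.

```python
def format_results(scores):
    """Sort word -> probability pairs from most to least likely and format them."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [f"{rank}. {word} ({prob:.1%})" for rank, (word, prob) in enumerate(ranked, start=1)]


for line in format_results({"cars": 0.25, "you": 0.5, "food": 0.25}):
    print(line)
# 1. you (50.0%)
# 2. cars (25.0%)
# 3. food (25.0%)
```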

Application Performance


- Startup: ~2 seconds
- Query runtime: < 0.1 ms best case, < 0.5 ms worst case
- Time to return and display results: < 0.8 ms on average
- Prediction accuracy: ~20%

Accuracy is low because of the limited size of the n-gram database and the limitations of the Katz back-off model.

The performance figures above assume the user has a high-speed (>20 Mb/s) database connection.
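The ~20% accuracy figure does not say how it was measured. One common metric is top-1 accuracy, used here as an assumption. It is the share of held-out phrases for which the model's first suggestion matches the word that actually came next. `top1_accuracy` and the toy lookup table standing in for the model are hypothetical.

```python
def top1_accuracy(test_pairs, predict_next):
    """Share of (phrase, actual next word) pairs whose top prediction is correct."""
    if not test_pairs:
        raise ValueError("need at least one test pair")
    hits = sum(1 for phrase, actual in test_pairs if predict_next(phrase) == actual)
    return hits / len(test_pairs)


# toy stand-in for the real model (hypothetical): a fixed phrase -> word table
toy_model = {"i love": "you", "thank": "you", "see": "them"}
held_out = [("i love", "cars"), ("thank", "you"), ("see", "you")]
accuracy = top1_accuracy(held_out, lambda phrase: toy_model.get(phrase))
print(f"{accuracy:.0%}")  # 33%
```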