App for next word prediction

Falko Schoenteich
19.04.2015

Introduction

This presentation introduces an app that predicts the next word of a user's input. The top requirements were:

  • intuitive and minimalistic design
  • high efficiency / short processing delays
  • possible use on mobile devices

Online demo at https://fschoen.shinyapps.io/NextWordPrediction/

  • Instructions
    • The user has to enter text in the input box
    • The prediction for the next word automatically updates (no button-clicking necessary)
    • User may switch languages

Algorithm

  • App uses maximum likelihood estimation
    • Prediction based on frequencies of previous observations
  • Basis for the model in the prototype: corpus of Twitter posts
  • App selects the most likely next word according to the most common bi- and trigrams in the corpus
    • Bi- and trigrams = phrases with two and three words
  • Trigram matches are preferred over bigram matches
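
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the app's actual code: the function names `build_model` and `predict_next` and the toy corpus are hypothetical, and tokenisation is reduced to lowercasing and splitting on whitespace.

```python
from collections import Counter, defaultdict


def build_model(tokens):
    """Count which word follows each one-word and two-word context."""
    bigrams = defaultdict(Counter)   # (w1,) -> counts of the following word
    trigrams = defaultdict(Counter)  # (w1, w2) -> counts of the following word
    for i in range(len(tokens) - 1):
        bigrams[(tokens[i],)][tokens[i + 1]] += 1
    for i in range(len(tokens) - 2):
        trigrams[(tokens[i], tokens[i + 1])][tokens[i + 2]] += 1
    return bigrams, trigrams


def predict_next(text, bigrams, trigrams):
    """Return the most frequent continuation, preferring trigram matches."""
    words = text.lower().split()
    if len(words) >= 2:
        context = tuple(words[-2:])
        if context in trigrams:
            return trigrams[context].most_common(1)[0][0]
    if words:
        context = (words[-1],)
        if context in bigrams:
            return bigrams[context].most_common(1)[0][0]
    return None  # no matching bigram or trigram in the corpus


# Toy corpus (hypothetical, for illustration only)
corpus = (
    "good morning all . good morning . good morning . "
    "have a good day . a good day"
).split()
bigrams, trigrams = build_model(corpus)

print(predict_next("good", bigrams, trigrams))    # morning
print(predict_next("a good", bigrams, trigrams))  # day
```

If the last two words of the input do not occur as the start of any trigram, the sketch falls back to the bigram table, which mirrors the stated preference for trigram matches.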

Further information on data preprocessing: https://rpubs.com/FSCHOEN/MilestoneReportCapstoneDataScience

Demonstration

  • Example
    • User enters "good"
    • Most common bigram starting with "good" is "good morning"
    • Prediction for next word is "morning"
  • BUT
    • User enters "a good"
    • Most common trigram starting with "a good" is "a good day"
    • Prediction for next word is "day", as trigram matches are preferred
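
The example can also be read as a maximum likelihood estimate: the probability of a next word is its count after the given context divided by the total count of that context. The short Python sketch below shows the arithmetic. All counts are hypothetical and were chosen only to mirror the example; they are not taken from the Twitter corpus, and the helper names `mle` and `best` are illustrative.

```python
# Hypothetical counts, chosen only to mirror the example above.
bigram_counts = {"good": {"morning": 50, "day": 30, "luck": 20}}
trigram_counts = {("a", "good"): {"day": 12, "morning": 3}}


def mle(counts):
    """Turn raw follow-up counts into relative frequencies (the MLE)."""
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def best(counts):
    """Return the (word, probability) pair with the highest estimate."""
    probs = mle(counts)
    word = max(probs, key=probs.get)
    return word, probs[word]


print(best(bigram_counts["good"]))          # ('morning', 0.5)
print(best(trigram_counts[("a", "good")]))  # ('day', 0.8)
```

With these assumed counts, "morning" is the best continuation of "good", while the longer context "a good" points to "day".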

Outlook

  • Current status
    • well-functioning prototype for English, German and Finnish
    • decent accuracy (although the model was created with only a relatively small text corpus on a home PC)
  • Future optimisation
    • performance enhancement by hashing
    • increasing accuracy by using a larger text corpus
  • Additional possible features
    • model improvement by learning during use (esp. new words)
    • support for more languages
    • adding case sensitivity
    • grammar detection
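
One possible reading of "performance enhancement by hashing" is to precompute the best next word for every context once and store it in a hash table, so that each prediction becomes a constant-time lookup instead of a search through the n-gram tables. The Python sketch below illustrates that idea only; it is an assumption about how the optimisation might look, not the planned implementation, and the counts and names (`build_lookup`, `predict_next`) are hypothetical.

```python
def build_lookup(ngram_counts):
    """Collapse {context: {next_word: count}} into {context: best_next_word}."""
    return {
        context: max(followers, key=followers.get)
        for context, followers in ngram_counts.items()
    }


# Hypothetical counts for illustration only
counts = {
    ("good",): {"morning": 50, "day": 30},
    ("a", "good"): {"day": 12, "morning": 3},
}
lookup = build_lookup(counts)  # a dict, i.e. a hash table keyed by context


def predict_next(text, lookup):
    """Look up the longest known context: two words first, then one word."""
    words = tuple(text.lower().split())
    for n in (2, 1):
        if len(words) >= n and words[-n:] in lookup:
            return lookup[words[-n:]]
    return None


print(predict_next("good", lookup))    # morning
print(predict_next("a good", lookup))  # day
```

The trade-off in this sketch is memory for speed: only the single best continuation per context is kept, so the table stays small but cannot offer alternative suggestions.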