2017/6/18

Introduction

Around the world, people are spending an increasing amount of time on their mobile devices for email, social networking, banking and a whole range of other activities. But typing on mobile devices can be a serious pain.

SwiftKey, our corporate partner in this capstone, builds a smart keyboard that makes it easier for people to type on their mobile devices. One cornerstone of their smart keyboard is its predictive text model.

In this capstone I work on understanding and building predictive text models, and on building a Shiny app like those used by SwiftKey.

Shiny App

Try it now:
https://gitycc.shinyapps.io/shiny_app_predict_next_word/

Step 1: Enter your text in the dialog box (e.g., "New York ").
Step 2: The app predicts the next word (e.g., "city").

[Example 1] input: "apple ", predict: "pie"
[Example 2] input: "right ", predict: "now"
[Example 3] input: "English ", predict: "language"
[Example 4] input: "English language ", predict: "learners"
[Example 5] input: "go back ", predict: "sleep"
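
To show how the pieces fit together, here is a minimal sketch of a Shiny app with this interface. The `predict_next_word()` stub is a hypothetical placeholder for the back-off model described in the next section, not the app's actual code.

```r
library(shiny)

# Hypothetical stand-in for the back-off model described below
predict_next_word <- function(text) {
  # A real implementation would query the n-gram tables
  "city"
}

ui <- fluidPage(
  titlePanel("Predict Next Word"),
  textInput("user_text", "Enter your text:", placeholder = "New York "),
  textOutput("prediction")
)

server <- function(input, output) {
  output$prediction <- renderText({
    req(input$user_text)
    paste("Predicted next word:", predict_next_word(input$user_text))
  })
}

shinyApp(ui = ui, server = server)
```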

Algorithm: Katz's Back-off Model

I use a simple Katz back-off model to construct the prediction model.

Katz back-off is a generative n-gram language model that estimates the conditional probability of a word given its history in the n-gram. It accomplishes this estimation by "backing off" to models with smaller histories under certain conditions. By doing so, the model with the most reliable information about a given history is used to provide the best results.
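
For reference, the standard textbook form of the Katz back-off estimate is shown below; the app uses a simplified variant of this idea, and the notation here is the conventional one, not taken from the app's code:

$$
P_{\text{bo}}(w_i \mid w_{i-n+1}^{\,i-1}) =
\begin{cases}
d_{w_{i-n+1}^{\,i}} \, \dfrac{C(w_{i-n+1}^{\,i})}{C(w_{i-n+1}^{\,i-1})} & \text{if } C(w_{i-n+1}^{\,i}) > k \\[1.5ex]
\alpha_{w_{i-n+1}^{\,i-1}} \, P_{\text{bo}}(w_i \mid w_{i-n+2}^{\,i-1}) & \text{otherwise}
\end{cases}
$$

where $w_{i-n+1}^{i-1}$ is the word history, $C(\cdot)$ a corpus count, $d$ a discount factor, $\alpha$ the back-off weight that keeps the distribution normalized, and $k$ a count threshold (often 0).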

The prediction procedure works as follows; a code sketch of these steps appears after the list.

Step 1: Split the input text into an array of words.

Step 2: Based on the length of the word array, choose which n-gram table to search (e.g., if the length is 2, search the 3-gram table).

Step 3: Look up a matching prediction in that n-gram table and return it. If no match is found, back off and search the (n-1)-gram table.

Step 4: Repeat Step 3 until a prediction is found or no lower-order table remains, then show the prediction.
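
Below is a minimal sketch of these steps in R. It assumes the n-gram tables live in a list `ngrams`, where `ngrams[[n]]` is a data frame with a `prefix` column (the preceding n-1 words) and a `word` column, sorted by descending frequency; these names are illustrative, not the app's actual code.

```r
predict_next_word <- function(text, ngrams, max_n = 4) {
  # Step 1: split the input text into a word array
  words <- unlist(strsplit(tolower(trimws(text)), "\\s+"))

  # Step 2: choose the starting n-gram order from the input length
  n <- min(length(words) + 1, max_n)

  # Steps 3 and 4: look up the n-gram table; on a miss, back off
  while (n >= 2) {
    prefix  <- paste(tail(words, n - 1), collapse = " ")
    matches <- ngrams[[n]][ngrams[[n]]$prefix == prefix, ]
    if (nrow(matches) > 0) {
      return(matches$word[1])  # most frequent continuation wins
    }
    n <- n - 1                 # back off to the (n-1)-gram table
  }
  NA_character_                # no prediction at any order
}
```

With the tables loaded, `predict_next_word("New York", ngrams)` would search the 3-gram table for the prefix "new york" and return its most frequent continuation (e.g., "city").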