Data Science Capstone Project: Next Best Word

Vaz, Thiago
2018, March 14th

Overview

Two trends are changing the way we communicate: (i) easy access to information and (ii) new channels that spread this information in ways we have never seen before.

To facilitate this process, intelligent systems are being developed to relieve us of unnecessary work or to speed up common activities, mostly based on a database of previous knowledge.

  • The goal of this app is to support the writing process by predicting the next best word the user is about to type.
  • The end goal is to provide a mechanism so that users do not need to type every word on their keyboard, saving most of the time spent on it.

Predictive Model: How does it work?

To achieve this goal, the app is based on a model built with the following steps:

  • Data was acquired from (a sample of) Twitter, newspapers and blogs.

  • Data was pre-processed/cleaned to enable better analysis and avoid bad behaviors (such as suggesting profanity).

  • We calculated the n-grams: contiguous sequences of n items from the given sample of text.

  • We created a Katz back-off model, which estimates the conditional probability of a word given its history in the n-gram, “backing off” to shorter histories when the longer one was not observed.
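
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the app's actual code: the function names (`tokenize`, `build_ngram_counts`, `predict_next`), the toy corpus and the naive whitespace tokenizer are all assumptions made for the example. It also uses a simplified back-off that just falls back to the longest history seen in the data; a full Katz back-off model additionally discounts the observed counts and redistributes the freed probability mass to the shorter histories through back-off weights.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_ngram_counts(sentences, max_n=3):
    """Count n-grams of order 1..max_n as counts[n][history][next_word]."""
    counts = {n: defaultdict(Counter) for n in range(1, max_n + 1)}
    for sentence in sentences:
        tokens = tokenize(sentence)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                history = tuple(tokens[i:i + n - 1])  # first n-1 words
                word = tokens[i + n - 1]              # word that follows
                counts[n][history][word] += 1
    return counts


def predict_next(counts, text, max_n=3, top_k=3):
    """Return up to top_k (word, relative frequency) pairs.

    Simplified back-off: try the longest history first and fall back to
    shorter ones. No discounting is applied, so this is NOT full Katz.
    """
    tokens = tokenize(text)
    for n in range(max_n, 0, -1):
        if n - 1 > len(tokens):
            continue  # not enough typed words for this order
        history = tuple(tokens[len(tokens) - (n - 1):]) if n > 1 else ()
        followers = counts[n].get(history)
        if followers:
            total = sum(followers.values())
            return [(w, c / total) for w, c in followers.most_common(top_k)]
    return []


# Toy corpus, invented for the example.
corpus = ["i love data science", "i love data analysis", "i love coffee"]
model = build_ngram_counts(corpus)
print(predict_next(model, "i love"))   # trigram history ("i", "love") is known
print(predict_next(model, "we love"))  # backs off to the bigram history ("love",)
```

The first element of the returned list plays the role of the next best word, and the remaining elements correspond to lower-probability alternatives.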

Predictive Model: Performance

Considering the experience the user will have with the app, I made some choices (such as reducing the size of the n-grams) to get better performance and avoid lag between one word and the next.

For this model in particular, performance is tied to the size of the n-grams.

However, the main use case (predicting through a typing process) requires almost real-time responses.

To address this issue and balance accuracy against response time, future improvements would include testing new models/frameworks (Markov models and RNNs) deployed on a scalable infrastructure.
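
As an illustration of the size/speed trade-off described above, one common way to shrink an n-gram table is to drop entries that were seen only a few times. The slides do not say how the n-grams were reduced, so this is an assumed technique; `prune_ngrams` and `min_count` are hypothetical names.

```python
from collections import Counter


def prune_ngrams(ngram_counts, min_count=2):
    """Keep only n-grams observed at least min_count times.

    A smaller table is faster to load and search, at the cost of
    losing rare (but possibly correct) continuations.
    """
    return Counter({gram: c for gram, c in ngram_counts.items() if c >= min_count})


# Toy counts, invented for the example.
bigrams = Counter({("i", "love"): 3, ("love", "data"): 2, ("love", "coffee"): 1})
pruned = prune_ngrams(bigrams, min_count=2)
print(len(bigrams), len(pruned))
```

Raising `min_count` makes the table smaller and lookups cheaper, but the model backs off to shorter histories more often.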

Product details

  • The app is published at https://thiagogarciavaz.shinyapps.io/next-best-word-app/.
  • On the main screen (below), the left side has a “how-to” guide (A), where the user learns how the app works.
  • In the center there is a text box (B) where the user enters a sentence.
  • When the user presses the button, the app predicts the next best word and presents it below the text area (C).
  • As the user extends the sentence with new words, the algorithm predicts new options again, based on the back-off model structure.
  • Along with the next best word, we also show other candidate words with lower probability (D).
  • There is also another tab called “About”, where the user finds additional information about the project.

  • Product screenshot: [Figure: app screenshot]