26/06/2019

Overview

This is the final project of the Capstone course in the Data Science Specialization, offered by JHU in partnership with the company SwiftKey. The project takes the form of a Shiny app (link to app). The app takes some text as input and outputs a prediction of the next word, using Natural Language Processing.

The project consists of:

  • Source code for the model, which takes text as input and outputs the next word
  • Shiny app, which provides the UI for the user
  • This 5-slide deck
  • Milestone project (completed during development)

The algorithm

The algorithm under the hood of the Shiny app consists of several steps:

  1. Take the data, clean it, and split it into individual words, which are then combined into n-grams
  2. Decide, depending on the number of words in the input, which n to use for the n-gram
  3. Sort the n-grams by frequency
  4. Use a language model based on counting words in the corpora to estimate the probabilities of possible next words (more on this topic)
  5. Provide the output to the user by choosing the candidate with the largest probability
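
The steps above can be sketched in a few lines. The following is a minimal Python illustration, not the app's actual source (the app itself is built with Shiny). The toy corpus, the function names `tokenize`, `build_ngram_counts`, and `predict_next`, the maximum order of 3, and the simple fallback to shorter n-grams when a context was never seen are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Step 1: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_ngram_counts(sentences, max_n=3):
    """Steps 1 and 3: for every context of 0..max_n-1 words,
    count how often each word follows that context."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = tokenize(sentence)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])
                counts[context][tokens[i + n - 1]] += 1
    return counts


def predict_next(text, counts, max_n=3):
    """Steps 2, 4 and 5: pick n from the input length, estimate
    probabilities from counts, and return the most probable word."""
    tokens = tokenize(text)
    # Step 2: a short input cannot fill a long context, so cap n.
    start_n = min(max_n, len(tokens) + 1)
    for n in range(start_n, 0, -1):
        # Last n-1 words of the input (empty tuple when n == 1).
        context = tuple(tokens[len(tokens) - (n - 1):])
        followers = counts.get(context)
        if followers:
            # Step 4: relative frequency as the probability estimate.
            word, freq = followers.most_common(1)[0]
            return word, freq / sum(followers.values())
        # Assumed fallback: unseen context, so try a shorter n-gram.
    return None, 0.0


# Hypothetical toy corpus, for illustration only.
corpus = [
    "I like green tea",
    "I like green apples",
    "I like green tea with honey",
    "you like black coffee",
]
counts = build_ngram_counts(corpus)

print(predict_next("I like green", counts))
print(predict_next("she drinks black", counts))
```

In this toy corpus "like green" is followed by "tea" twice and "apples" once, so the first call picks "tea". The second call has never seen "drinks black", so it falls back to the single-word context "black". A real model would build its counts from the full corpora rather than four sentences.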

The App overview and instructions

The Shiny app itself consists of a title (and subtitle), a left-hand panel with instructions on how to use it, and a right-hand input/output section where the user and the app interact.

The app responds automatically (there is no need to press a button or hit Enter to get an output), and the only element the user interacts with is the text box. The predicted word appears beneath it.

On the plus side, the app is fast and scales well as the input gets longer.
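
One plausible reason long inputs stay cheap is that an n-gram model only ever looks at the last n-1 words of the input, so the table lookup itself does not grow with the input length. A small Python sketch of that idea follows. The helper name `lookup_context` and the maximum order of 4 are assumptions for illustration, not details taken from the app's source.

```python
def lookup_context(text, max_n=4):
    """Return the trailing words that an n-gram model of order
    max_n would use as context (at most max_n - 1 words)."""
    words = text.lower().split()
    keep = max_n - 1
    return words[-keep:] if keep > 0 else []


short_input = "see you"
long_input = "word " * 10_000 + "see you next"

# Both inputs reduce to a context of at most three words.
print(lookup_context(short_input))
print(lookup_context(long_input))
```

Splitting the text is still linear in its length, but that step is cheap compared with searching the n-gram tables, which here always receives the same small context.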

Documentation and Summary

Related Shiny App: https://lukabrdar.shinyapps.io/Word-predictor/

JHU Data Science Specialization Capstone: https://www.coursera.org/learn/data-science-project

Theoretical background for the algorithm: http://www.modsimworld.org/papers/2015/Natural_Language_Processing.pdf

More on SwiftKey: https://www.microsoft.com/en-us/swiftkey

All in all, the app tries to predict the next word the user wants to write, which could help the user write faster. The approach is based on n-grams, and the resulting product is quick and handles long inputs without problems.

Thank you!