Data Science Capstone Project: Predicting Next Word

Azam Yahya
20-04-2016

Overview

This app is an R Shiny application that explores text prediction, built in partial fulfillment of the requirements of the Coursera JHU Data Science Specialization Capstone Project.

The following objectives were considered and achieved in the design of this application:

  • Next-word prediction given an input n-gram (the mandatory requirement)
  • Ease of use for general users
  • Fast and lightweight

Naive Corpus n-Gram Algorithm

The following steps were performed in building the application:

  • The Coursera dataset was cleaned: whitespace and non-alphabetic characters were removed, and all text was converted to lower case.
  • {2,3,4}-grams were extracted from the cleaned text for use in the prediction algorithm.
  • The best match (ordered by n-gram length, then by prevalence) is used to predict the word that follows the input n-gram.
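
The steps above can be illustrated with a minimal sketch. It is written in Python purely for illustration; the app itself is an R Shiny application and its actual code is not shown here. The function names (`clean`, `build_tables`, `predict`) and the toy corpus are hypothetical, and the cleaning step assumes that "removing whitespace" means collapsing runs of whitespace between words.

```python
import re
from collections import Counter, defaultdict


def clean(text):
    """Lower-case, drop non-alphabetic characters, and split on whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", "", text)
    return text.split()


def build_tables(lines, orders=(2, 3, 4)):
    """For each n, map every (n-1)-word prefix to a Counter of next words."""
    tables = {n: defaultdict(Counter) for n in orders}
    for line in lines:
        tokens = clean(line)
        for n in orders:
            for i in range(len(tokens) - n + 1):
                prefix = tuple(tokens[i:i + n - 1])
                tables[n][prefix][tokens[i + n - 1]] += 1
    return tables


def predict(tables, phrase):
    """Try the longest n-gram first, then fall back to shorter ones."""
    tokens = clean(phrase)
    for n in sorted(tables, reverse=True):
        prefix = tuple(tokens[-(n - 1):])
        if len(prefix) < n - 1:
            continue  # input is too short for this n-gram order
        followers = tables[n].get(prefix)
        if followers:
            # Most prevalent continuation of the matched prefix
            return followers.most_common(1)[0][0]
    return None  # no n-gram of any order matched


# Toy corpus (hypothetical), standing in for the Coursera dataset
corpus = [
    "I love data science.",
    "I love data science and statistics!",
    "We love data analysis.",
]
tables = build_tables(corpus)

print(predict(tables, "I love data"))     # 4-gram match: science
print(predict(tables, "They love data"))  # falls back to the 3-gram: science
print(predict(tables, "hello"))           # nothing matches: None
```

Ordering by n-gram length first means a longer matching context always wins over a shorter one, and prevalence only breaks ties among candidates of the same length.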

User Interface (UI) (1/2)

The main layout elements are a sidebar panel with prediction algorithm controls and a main content panel with tabs for:

'Prediction' (with input and output elements): after a user enters a phrase in the “Text Input” box, the predicted word is shown almost instantaneously.


User Interface (UI) (2/2)

'Instructions': documents the UI and the options for the algorithm.
