Word Prediction Shiny Application

Minh Tu Pham
June 13, 2017

Objective

This application was built for the Data Science capstone project offered on Coursera.

The project applies data science to natural language processing.

The objective is to build a Shiny application that predicts the next word once one or more words have been typed.

The data is from a corpus called HC Corpora provided by SwiftKey.

The application is available at: https://minhtupham.shinyapps.io/WordPrediction/

Tasks

To build the Shiny application, several tasks need to be covered:

  • Understanding the problem
  • Data acquisition and cleaning
  • Exploratory analysis
  • Statistical modeling
  • Predictive modeling
  • Creative exploration
  • Creating a data product
  • Creating a short slide deck pitching the product

Methods

Steps to build the application:

  • Getting the data from the HC Corpora corpus
  • Cleaning the data, e.g. converting to lowercase and removing punctuation, links, extra white space, numbers, and all kinds of special characters
  • Tokenizing the cleaned data into n-grams; only bigrams, trigrams, and quadgrams are used
  • Using the n-gram data to predict the next word
  • Building the application with Shiny and hosting it on shinyapps.io
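
The cleaning and tokenizing steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the code behind the app (which is built with R and Shiny); the function names `clean_text`, `ngrams`, and `count_ngrams` are hypothetical, and the exact cleaning rules are assumptions based on the list above.

```python
import re
from collections import Counter


def clean_text(text):
    """Lowercase the text and strip links, numbers, punctuation, and extra white space."""
    text = text.lower()
    # Remove links first, so their letters do not survive as stray words.
    text = re.sub(r"(https?://|www\.)\S+", " ", text)
    # Replace everything that is not a letter or white space
    # (digits, punctuation, special characters) with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Collapse runs of white space into single spaces.
    return re.sub(r"\s+", " ", text).strip()


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(lines, sizes=(2, 3, 4)):
    """Count bigrams, trigrams, and quadgrams over an iterable of raw text lines."""
    counts = {n: Counter() for n in sizes}
    for line in lines:
        tokens = clean_text(line).split()
        for n in sizes:
            counts[n].update(ngrams(tokens, n))
    return counts
```

Note that this simplified cleaning splits contractions ("it's" becomes "it s"); a real pipeline would likely treat apostrophes more carefully.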

How to use the app

  • The application is ready when 'NA' is shown for the predicted words.
  • Enter the input text. Only English is supported.
  • The result shows the two most frequently used words that follow the input text.
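
One way to produce the two suggestions is to look up the last words of the input in the n-gram counts, trying the longest context first and falling back to shorter ones. The sketch below is written in Python for illustration only and is not the app's R code; the back-off strategy and the names `build_model` and `predict_next` are assumptions, since the slides do not say how the n-grams are combined.

```python
from collections import Counter, defaultdict


def build_model(sentences, max_n=4):
    """Map each context (1 to max_n - 1 preceding words) to a Counter of next words."""
    model = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for n in range(2, max_n + 1):
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])
                model[context][tokens[i + n - 1]] += 1
    return model


def predict_next(model, text, k=2, max_n=4):
    """Return up to k most frequent next words, backing off to shorter contexts."""
    tokens = text.lower().split()
    # Try the last three words (quadgram), then two (trigram), then one (bigram).
    for size in range(min(max_n - 1, len(tokens)), 0, -1):
        context = tuple(tokens[-size:])
        if context in model:
            return [word for word, _ in model[context].most_common(k)]
    return []  # nothing matched; the app shows 'NA' before any input is entered


# Toy corpus, made up for this example.
model = build_model([
    "i like green tea",
    "i like green tea",
    "i like green apples",
    "we like red apples",
])
print(predict_next(model, "I like green"))  # ['tea', 'apples']
print(predict_next(model, "they like"))     # ['green', 'red'] (bigram fallback)
```

Trying the longest context first favors the most specific evidence, while the fallback keeps the app responsive to phrases it has not seen in full.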