Word Prediction Shiny Application

Minh Tu Pham
June 13 2017

This application was built for the Data Science capstone project offered through Coursera.

The project applies data science to natural language processing.

The objective is to build a Shiny application that predicts the next word from the words the user has typed.

The data is from a corpus called HC Corpora provided by SwiftKey.

Building the Shiny application involves the following steps:

Getting the data from the HC Corpora corpus
Cleaning the data, e.g. converting to lowercase and removing punctuation, links, extra white space, numbers and all kinds of special characters
Tokenizing the cleaned data into n-grams; only bigrams, trigrams and quadgrams are used
Using the n-gram data to predict the next word
Building the Shiny application around the model
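
The cleaning, tokenizing and prediction steps above can be sketched roughly as follows. This is a minimal illustration, not the application's actual code: the application itself is a Shiny app, while the sketch uses Python, and the function names (clean, build_tables, predict), the tiny sample corpus, and the rule of picking the most frequent follower with a simple back-off from quadgrams to trigrams to bigrams are all assumptions made for the example.

```python
import re
from collections import Counter, defaultdict


def clean(text):
    """Lowercase the text and strip links, numbers, punctuation and extra white space."""
    text = text.lower()
    text = re.sub(r"(https?://|www\.)\S+", " ", text)  # links
    text = re.sub(r"[^a-z\s]", " ", text)  # numbers, punctuation, special characters
    return text.split()  # split() also collapses repeated white space


def build_tables(lines, orders=(2, 3, 4)):
    """For each n, map every (n-1)-word context to a Counter of the words that follow it."""
    tables = {n: defaultdict(Counter) for n in orders}
    for line in lines:
        tokens = clean(line)
        for n in orders:
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])
                tables[n][context][tokens[i + n - 1]] += 1
    return tables


def predict(tables, typed):
    """Try the longest context first, then back off to shorter ones (assumed rule)."""
    tokens = clean(typed)
    for n in sorted(tables, reverse=True):
        if len(tokens) < n - 1:
            continue  # not enough typed words for this n-gram order
        context = tuple(tokens[-(n - 1):])
        followers = tables[n].get(context)
        if followers:
            return followers.most_common(1)[0][0]
    return None  # no known context matches


# Hypothetical three-line corpus standing in for the HC Corpora data.
corpus = [
    "Thanks for the follow!",
    "Thanks for the follow, see you soon.",
    "Thanks for the help.",
]
tables = build_tables(corpus)
print(predict(tables, "thanks for the"))  # matched by the quadgram table
print(predict(tables, "Well, see you"))   # backs off to the trigram table
```

Checking the longest context first means a quadgram match, when one exists, takes priority over the more generic bigram suggestions. Note that this simple cleaning rule also splits contractions such as "don't" into two tokens; a real pipeline may want to treat apostrophes differently.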
