Final Project for DATA SCIENCE CAPSTONE, Coursera Specialization

Pooya F
February 2018

  • Natural language processing techniques are used to predict the next word.
  • The Shiny application can be viewed at the link: Word Prediction application
  • The Capstone Project is a collaboration between Coursera and SwiftKey.

OVERVIEW

  • The goal of this capstone project is to develop a Shiny app in R that predicts the next word from the previous ones, similar to the prediction used in the mobile keyboard applications made by SwiftKey.

  • The earlier tasks leading to this point were: understanding the problem, getting and cleaning the data, exploratory data analysis, modeling, building the prediction model, and bringing all the data and results together in one Shiny app that meets the project objective.

METHODS

  • After the data were loaded, a sample was created, cleaned and prepared for use as a text corpus. It was converted to lower case, and punctuation, links, extra whitespace, numbers and profanity were removed.

  • The sample text was “tokenized” into so-called n-grams to construct the predictive models. (Tokenization is the process of breaking a stream of text up into words or phrases. An n-gram is a contiguous sequence of n items from a given sequence of text.)

  • The n-gram files or data.frames (unigram, bigram, trigram and quadgram) are frequency tables of word sequences, used by the algorithm to predict the next word based on the text entered by the user.
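
The cleaning and n-gram counting steps above can be sketched as follows. This is a minimal illustration in Python, not the project's R code: the function names `clean` and `ngram_counts`, the regular expressions, and the tiny `PROFANITY` set are all assumptions made for the example, since the source does not give the actual profanity list or cleaning rules.

```python
import re
from collections import Counter

# Hypothetical stand-in for the profanity list (the real list is not given in the source).
PROFANITY = {"darn", "heck"}


def clean(text):
    """Return a list of cleaned tokens from raw text."""
    text = text.lower()
    # Remove links first, while their punctuation is still intact.
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    # Replace everything that is not a letter or whitespace (numbers, punctuation).
    text = re.sub(r"[^a-z\s]", " ", text)
    # split() with no arguments also collapses runs of whitespace.
    return [word for word in text.split() if word not in PROFANITY]


def ngram_counts(tokens, n):
    """Count every contiguous sequence of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


sample = "Visit http://example.com NOW! The cat sat. The cat ran 2 times, darn."
tokens = clean(sample)
unigrams = ngram_counts(tokens, 1)
bigrams = ngram_counts(tokens, 2)
trigrams = ngram_counts(tokens, 3)
quadgrams = ngram_counts(tokens, 4)
```

Here `bigrams[("the", "cat")]` is 2, because "the cat" occurs twice in the cleaned sample. Each `Counter` plays the role of one n-gram frequency table.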

THE SHINY APPLICATION

  • The Shiny application predicts the next likely word in a sentence.

  • The user enters text in one input box, and in the other box the application returns the most probable next word.

  • The predicted word is obtained from the n-gram tables by comparing the entered text with the tokenized frequencies of 2-, 3- and 4-gram sequences.

  • While the text is being entered, the field with the predicted next word refreshes instantaneously, and the predicted word is offered to the user as a choice.
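
One way to look up a prediction in 2-, 3- and 4-gram frequency tables is sketched below in Python. The source does not state how the three tables are combined, so the order used here (try the 4-gram table first, then fall back to shorter contexts, and finally to the most frequent single word) is an assumption, as are the names `build_tables` and `predict_next`. It is an illustration, not the app's R implementation.

```python
from collections import Counter


def build_tables(tokens, max_n=4):
    """Build one frequency table per n-gram order, keyed by n."""
    return {
        n: Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
        for n in range(1, max_n + 1)
    }


def predict_next(text, tables, max_n=4):
    """Return the most frequent continuation of the last words in text."""
    words = text.lower().split()
    for n in range(max_n, 1, -1):  # assumed order: 4-grams, then 3-grams, then 2-grams
        context = tuple(words[-(n - 1):])
        if len(context) < n - 1:
            continue  # not enough typed words for this n-gram order
        candidates = {
            gram[-1]: count
            for gram, count in tables[n].items()
            if gram[:-1] == context
        }
        if candidates:
            return max(candidates, key=candidates.get)
    # Assumed fallback: the most frequent single word, if any.
    if tables[1]:
        return tables[1].most_common(1)[0][0][0]
    return None


corpus = "the cat sat on the mat the cat sat on the sofa the cat ran".split()
tables = build_tables(corpus)

from_trigram = predict_next("the cat", tables)   # "sat"
from_bigram = predict_next("dog sat", tables)    # "on"
from_unigram = predict_next("zebra", tables)     # "the"
```

In the app, a lookup of this kind would run each time the input text changes, which is what makes the prediction field refresh as the user types.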

THE APP USER INTERFACE

  • Screenshot of the user interface, with directions to enter a sentence or a word and get a prediction of the next likely word.

Application Screenshot

ADDITIONAL COMMENTS AND LINKS

  • Accuracy could be improved by increasing the sample size.

  • The next word prediction app is hosted on shinyapps.io: Shiny app