Capstone Project - Next Word Prediction

ljaraque@yahoo.com
2015.04.26

Introduction

The purpose of the project was to build a web app with Shiny (R) that takes text typed by a user and predicts the next word.
Main features of the developed App:

  • Updates reactively as the user types.
  • Shows the most probable next word.
  • Displays a didactic bar plot of the top candidate words with the probability of each one in the group.
  • Based on N-gram analysis and the probabilities of the N-grams.
  • Main packages used: tm and RWeka.

Web App Description

  • The Web App is developed in the Shiny framework.
  • A side panel on the left is where the user enters the text.
  • The main panel shows three main results:
    1. A reproduction of the text currently entered by the user,
    2. The most probable next word predicted,
    3. A plot ranking the candidate next words by probability.

A Brief View of the App

The following is a screenshot of the App described on the previous page. Try it yourself at the link below:

Screenshot
link to the Web App
(Allow a few seconds for setup while the App loads.)

Logic behind the App

  • The data was cleaned and subset to a reasonable size, considering the trade-off between processing time and quality of results.
  • Bi-gram, tri-gram and 4-gram term-document matrices (TDMs) were constructed with tm and RWeka. All analysis is done in lowercase, with punctuation and extra spaces removed.
  • Many failure points are handled, such as unrecognized words that yield no prediction, input with excess spaces, and mixed upper/lower case.
  • The algorithm detects the number of words in the input and adapts the query accordingly, then searches for the maximum-probability match recursively in the 4-gram -> 3-gram -> 2-gram databases. The most probable word is suggested to the user and the top-ranked candidates are shown.
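
The steps above can be sketched in a few lines. This is only an illustration under stated assumptions, not the App's actual code: it is written in plain Python rather than R with tm and RWeka, the function names (normalize, build_tables, predict) and the toy corpus are hypothetical, and probabilities are assumed to be simple relative frequencies of each next word given its context.

```python
import re
from collections import Counter, defaultdict


def normalize(text):
    """Lowercase, replace non-letters with spaces, and split on whitespace."""
    # Assumption: only the letters a-z are kept; the App's exact cleaning rules may differ.
    return re.sub(r"[^a-z\s]", " ", text.lower()).split()


def build_tables(sentences, max_n=4):
    """Build, for n = 2..max_n, a map: context tuple -> Counter of next words."""
    tables = {n: defaultdict(Counter) for n in range(2, max_n + 1)}
    for sentence in sentences:
        tokens = normalize(sentence)
        for n in range(2, max_n + 1):
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])
                tables[n][context][tokens[i + n - 1]] += 1
    return tables


def predict(tables, text, top_k=5, max_n=4):
    """Try the 4-gram table first, then fall back to 3-grams and 2-grams."""
    tokens = normalize(text)
    for n in range(max_n, 1, -1):
        if len(tokens) < n - 1:
            continue  # input too short for this n-gram order
        counts = tables[n].get(tuple(tokens[-(n - 1):]))
        if counts:
            total = sum(counts.values())
            # Relative frequency of each candidate within this context.
            return [(word, count / total) for word, count in counts.most_common(top_k)]
    return []  # unrecognized input: no prediction


# Hypothetical toy corpus, for illustration only.
toy_corpus = [
    "I want to go home",
    "I want to go out",
    "I want to go home now",
    "You want to sleep",
]
tables = build_tables(toy_corpus)
print(predict(tables, "  I WANT to   go "))  # matched in the 4-gram table
print(predict(tables, "they want to"))       # falls back to the 3-gram table
print(predict(tables, "xyzzy"))              # no match at any order
```

The first element of the returned list plays the role of the suggested word, and the whole list is what a ranking bar plot would display. Returning an empty list for unknown input is one simple way to handle the "no prediction" case mentioned above.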

References and Final Words

“This is the end of a journey for all of us, with long nights working on assessments and quizzes. For me this has been a real approach to Data Science, beyond the construction of Machine Learning algorithms in which I was involved before. I learned how to use and interpret many tools and I am equipped with them now. Thanks to all the Instructors and Peer Reviewers for their efforts and for dedicating time to check my uploads! Greetings from Chile and see you around!”