Capstone Project - Text Predictor

Kevin Pérez
16/07/2016

About the motivation

This project was created as the final assignment of the Data Science Specialization, which is offered through Coursera in conjunction with SwiftKey. It seeks to exploit the vast amounts of data generated on social networks to build predictive models that more effectively predict the next word typed on the keyboard of any device.

About the project

This work represents the final stage of the Data Science Specialization: creating a model that predicts the next word while a sentence is being written. Some additional tasks:

  • Develop a presentation in R and publish it on RPubs.
  • Develop a Milestone Report for the text analysis in R.
  • Develop a Shiny app and publish it, showing the characteristics of the predictive model.

About the model

  • The first task was organizing and tokenizing the text.
  • The second task was building a statistical language model based on n-gram theory and Maximum Likelihood Estimation (MLE).
  • The model was then smoothed via TF-IDF weighting.
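
The tokenization and MLE steps listed above can be sketched roughly as follows. This is a minimal illustration in Python, not the project's code (the project was built in R). The function names `tokenize` and `train_ngram_mle`, the regular expression, and the three-sentence corpus are all assumptions made for the example, and the TF-IDF weighting step is not shown.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed rule)."""
    return re.findall(r"[a-z']+", text.lower())


def train_ngram_mle(sentences, n=2):
    """Estimate P(word | context) = count(context, word) / count(context).

    Returns a dict mapping each context tuple of n-1 words to a dict
    of {next word: maximum likelihood probability}.
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            counts[context][tokens[i + n - 1]] += 1

    model = {}
    for context, followers in counts.items():
        total = sum(followers.values())
        model[context] = {w: c / total for w, c in followers.items()}
    return model


# Hypothetical toy corpus, for illustration only.
corpus = ["I love data science", "I love text mining", "I like data"]
bigram_model = train_ngram_mle(corpus, n=2)
print(bigram_model[("i",)])
```

In this toy corpus "i" is followed by "love" twice and by "like" once, so the estimated probabilities are 2/3 and 1/3. Passing `n=3` or `n=4` would give the longer contexts needed to predict from two or three preceding words.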

App features

  • The app predicts the next word given one, two, or three words; entering a fourth word produces a warning message.
  • Words that are not in any document produce an error message.
  • The three most probable next words for the given input are displayed in a table.
  • To use it, type the desired word or words in the text field and click the button to submit.
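
The behaviour described in this list can be sketched as a single lookup function. This is a hedged Python illustration, not the Shiny app's actual logic: `TOY_MODEL`, `VOCAB`, and `predict_next` are hypothetical names, the probabilities are made up, and two details are assumptions the source does not state: that over-long input is cut to its last three words after the warning, and that the error is raised as an exception.

```python
import warnings

# Hypothetical lookup table: context tuple -> {next word: probability}.
TOY_MODEL = {
    ("i",): {"love": 0.5, "like": 0.3, "am": 0.15, "was": 0.05},
    ("i", "love"): {"data": 0.6, "text": 0.4},
}
# Words seen in at least one context (stands in for the corpus vocabulary).
VOCAB = {word for context in TOY_MODEL for word in context}


def predict_next(text, model=TOY_MODEL, vocab=VOCAB, max_words=3, top_k=3):
    """Return up to top_k (word, probability) pairs for the typed text."""
    words = text.lower().split()
    if not words:
        raise ValueError("Please enter at least one word.")
    if len(words) > max_words:
        # Assumption: warn, then keep only the last max_words words.
        warnings.warn(f"Only the last {max_words} words are used.")
        words = words[-max_words:]
    unknown = [w for w in words if w not in vocab]
    if unknown:
        # Assumption: unknown words are reported by raising an error.
        raise ValueError(f"Words not found in any document: {unknown}")
    candidates = model.get(tuple(words), {})
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


print(predict_next("I"))
print(predict_next("I love"))
```

The first call returns the three highest-probability followers of "i" out of the four in the table; the second returns both followers of "i love", since only two exist.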