Text Prediction

Emma Sun
Feb 22nd, 2017

Overview

The goal of this exercise is to create a product to highlight the prediction algorithm that you have built and to provide an interface that can be accessed by others. For this project you must submit:

  • A Shiny app that takes a phrase (multiple words) in a text box as input and outputs a prediction of the next word.
  • A slide deck of no more than 5 slides, created with RStudio Presenter, pitching your algorithm and app as if you were presenting to your boss or an investor.

Instructions: How to use the app

  • The app predicts the next word from the text you enter.
  • The interface is simple:
    • In the input box, enter a single word or a few words: a phrase or part of a sentence.
    • The output appears immediately, as the app predicts the next word using an n-gram model.
    • To help you picture the whole sentence, the app shows exactly what you entered followed by the predicted word.

To experience the app, click here: https://emmacourserahwork.shinyapps.io/textprediction/

The algorithm behind the app

The process is as follows:

  • First, we clean the text you enter: we remove punctuation, convert all capital letters to lowercase, strip extra whitespace, and remove any symbols or characters the model cannot read. As a result, the app currently supports English only.
  • Second, we look up possible next words in the bigram, trigram and quadgram matrices we built.
  • Last, we follow the matching n-grams and output the candidate with the highest frequency.
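The three steps above can be sketched in a few lines. This is a minimal Python illustration, not the app's actual code (the app itself is an R Shiny app). It assumes count tables stored as dictionaries rather than matrices, a regex-based cleaner that keeps only English letters and apostrophes, and a simple backoff order (quadgram, then trigram, then bigram) that the steps above do not spell out. The function names `clean_text`, `build_ngram_tables` and `predict_next` are hypothetical.

```python
import re
from collections import Counter, defaultdict


def clean_text(text: str) -> list[str]:
    """Lowercase, replace anything that is not an English letter, apostrophe
    or whitespace with a space, then split into word tokens."""
    text = text.lower()
    text = re.sub(r"[^a-z'\s]", " ", text)  # drop punctuation, digits, non-English symbols
    return text.split()  # split() also collapses extra whitespace


def build_ngram_tables(corpus: list[str], max_n: int = 4) -> dict:
    """Return {n: {context tuple of n-1 words: Counter of following words}}
    for n = 2 (bigram) up to max_n (4 = quadgram)."""
    tables = {n: defaultdict(Counter) for n in range(2, max_n + 1)}
    for line in corpus:
        tokens = clean_text(line)
        for n in range(2, max_n + 1):
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])
                tables[n][context][tokens[i + n - 1]] += 1
    return tables


def predict_next(phrase: str, tables: dict, max_n: int = 4):
    """Assumed backoff: try the longest context first (quadgram), fall back
    to trigram and bigram; return the most frequent follower, or None."""
    tokens = clean_text(phrase)
    for n in range(max_n, 1, -1):
        if len(tokens) < n - 1:
            continue  # not enough words for this n-gram order
        context = tuple(tokens[-(n - 1):])
        followers = tables[n].get(context)  # .get avoids inserting empty entries
        if followers:
            return followers.most_common(1)[0][0]
    return None


# Hypothetical toy corpus for illustration only
corpus = ["I want to go home", "I want to eat", "I want to go out", "we want to go home"]
tables = build_ngram_tables(corpus)
print(predict_next("I want to", tables))   # go   (quadgram match)
print(predict_next("Let's go", tables))    # home (backs off to the bigram "go")
print(predict_next("hello", tables))       # None (no match at any order)
```

Trying the longest context first uses as much of the user's phrase as the data supports; backing off keeps the model able to answer when a long phrase was never seen in the training text.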

More info on the model

N-gram models are widely used in statistical natural language processing. Below are some papers and lectures you can check out if you are interested in this topic.
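For reference, the standard maximum-likelihood estimate behind an n-gram model, written here for the trigram case, divides the count of a word sequence by the count of its context. Because the denominator is the same for every candidate after a given context, choosing the most frequent follower (the last step of the algorithm) is the same as choosing the most probable next word under this estimate:

```latex
P(w_i \mid w_{i-2}, w_{i-1}) \approx \frac{C(w_{i-2}\, w_{i-1}\, w_i)}{C(w_{i-2}\, w_{i-1})},
\qquad
\hat{w}_i = \arg\max_{w} \, C(w_{i-2}\, w_{i-1}\, w)
```

Here $C(\cdot)$ is the number of times a word sequence occurs in the training text; the bigram and quadgram cases follow the same pattern with one fewer or one more context word.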