Overview

  • The goal of this exercise is to create a product to highlight the prediction algorithm that you have built and to provide an interface that can be accessed by others. For this project you must submit:
  • A Shiny app that takes as input a phrase (multiple words) in a text box input and outputs a prediction of the next word.
  • A slide deck consisting of no more than 5 slides created with R Studio Presenter pitching your algorithm and app as if you were presenting to your boss or an investor.
  • Data Science Specialization by John Hopkins University on Coursera: https://www.coursera.org/specializations/jhu-data-science

Instructions to Use Shiny App

  • The idea of this project is to make predictions upon the text user inputs.
  • Enter a single word or sentence into the textbox, there will be an output immediately, as we predict the next word using a N-gram model.
  • To better understand what the whole sentence would be, you can see what you entered exactly, and see what follows.
  • For more information, click here: https://github.com/sdsng/C10W7-Capstone/tree/master

The Algorithm Behind the App

The process is as follows:

  • First, we cleaned the data that you inputted by removing punctuations, changing all capitals to lowercases, removing all spaces, and clean all symbols or characters we can't read. Do note that this app currently only supports English.
  • Second, we try to find the possible answers based on the quadgram, trigram, and bigram matrixs we built.
  • Last, we follow the grams to check which sets of words appear with the highest frequency, and suggest that as an output.

More Info on N-grams