Assignment: Final Project Submission

Hasmirah Hassan

3rd June 2016

Background of the Final Project

The objective of this final project is to create an application that demonstrates a next-word prediction algorithm and provides an interface that others can use.

This project consists of:

  1. A Shiny app that takes a phrase as input in a text box and outputs a prediction of the next word.
  2. A slide deck of no more than 5 slides, created with RStudio Presenter.

The application is accessible via this link: THE APPLICATION

The Applied Methods & Models

  1. The data is from a corpus called HC Corpora.
  2. A sample of about 10% of the data is used.
  3. The data are cleaned by removing profanity, punctuation, contractions, numbers, foreign characters, common words (stop words), and extra white space.
  4. The prediction algorithm uses an N-gram model, which predicts the next word from the preceding word(s) in the sequence.
  5. The N-grams are searched for matches to the user input. For example, for the input ‘I am looking for’, a matching N-gram is ‘looking forward’.
  6. A probability model converts the number of matches into a score for each predicted word, and the words are sorted from most likely to least likely.
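The steps above can be sketched in Python as a minimal illustration (the app itself is written in R, and this is not the project's code). Assumptions: `clean`, `build_ngrams` and `predict` are hypothetical helpers, the stop-word and profanity lists are tiny stand-ins, the model backs off from trigram to bigram contexts, and the score is a plain relative frequency (a candidate's count divided by all continuations of the matched context), which may differ from the project's actual probability model. The toy corpus also shows one plausible reading of the example in step 5: once ‘I’, ‘am’ and ‘for’ are removed as common words, only ‘looking’ remains, and ‘looking forward’ is the best bigram match.

```python
import re
from collections import Counter, defaultdict

# Hypothetical word lists; the project's actual profanity and stop-word lists are not given.
PROFANITY = {"damn"}
STOP_WORDS = {"i", "am", "are", "we", "you", "she", "her", "is", "it",
              "for", "the", "a", "an", "to", "and", "of"}


def clean(text):
    """Step 3: lower-case the text and drop contractions, numbers, punctuation,
    non-ASCII characters, stop words, profanity and extra white space."""
    text = text.lower()
    text = re.sub(r"\b\w+'\w+\b", " ", text)   # remove contractions such as "don't"
    text = re.sub(r"[^a-z\s]", " ", text)      # remove digits, punctuation, foreign characters
    # split() with no argument also collapses runs of white space
    return [w for w in text.split() if w not in STOP_WORDS and w not in PROFANITY]


def build_ngrams(sentences, max_n=3):
    """Step 4: map each context of 1..max_n-1 words to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for sentence in sentences:
        words = clean(sentence)
        for n in range(2, max_n + 1):
            for i in range(len(words) - n + 1):
                context = tuple(words[i:i + n - 1])
                table[context][words[i + n - 1]] += 1
    return table


def predict(table, phrase, max_n=3, k=5):
    """Steps 5-6: match the longest available context, score each candidate
    by relative frequency and return the top k, most likely first."""
    words = clean(phrase)
    for n in range(max_n - 1, 0, -1):           # back off: two-word context, then one-word
        context = tuple(words[-n:])
        if len(context) == n and context in table:
            counts = table[context]
            total = sum(counts.values())
            ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
            return [(word, count / total) for word, count in ranked[:k]]
    return []                                   # no match at any order


# Hypothetical three-sentence corpus standing in for the HC Corpora sample
corpus = [
    "I am looking forward to the weekend",
    "We are looking forward to meeting you",
    "She is looking for her keys",
]
table = build_ngrams(corpus)
print(predict(table, "I am looking for"))
# → [('forward', 0.6666666666666666), ('keys', 0.3333333333333333)]
```

Backing off from the longest matching context is one common design choice: longer contexts give more specific predictions when they have been seen, while shorter ones still return something for unseen phrases.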

Application: Input

Input Panel

  1. Type the phrase to be analyzed in the text box.
  2. Choose the number of predicted words to display, from 1 to 5.
  3. Click the GO button to run the prediction.

Sample of Input

[Figure: screenshot of a sample input]

Application: Output

Output Panel

The application produces three outputs:

  1. The phrase as entered.
  2. The filtered text provided to the algorithm.
  3. A table of the predicted words and their log probabilities, sorted from the most likely word in the first row to the least likely in the last row.
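The output table can be sketched as follows, again in Python and under assumptions: `output_table` and `render` are hypothetical helpers, the candidate counts are toy data, and the natural logarithm is assumed because the slides do not state the base. Since every log probability is at most 0, sorting in descending order puts the most likely word in the first row.

```python
import math


def output_table(counts, n_predictions):
    """Turn candidate counts into up to n_predictions (word, log probability)
    rows, most likely word first."""
    if not 1 <= n_predictions <= 5:            # the input panel allows 1 to 5 words
        raise ValueError("number of predicted words must be between 1 and 5")
    total = sum(counts.values())
    rows = [(word, math.log(count / total)) for word, count in counts.items()]
    rows.sort(key=lambda row: (-row[1], row[0]))   # highest log probability first
    return rows[:n_predictions]


def render(phrase, filtered_words, rows):
    """Format the three outputs: the phrase entered, the filtered text, the table."""
    lines = [f"Phrase entered: {phrase}",
             f"Filtered text:  {' '.join(filtered_words)}",
             f"{'word':<10} log_prob"]
    lines += [f"{word:<10} {log_p:.3f}" for word, log_p in rows]
    return "\n".join(lines)


# Hypothetical counts for the words that follow "looking"
rows = output_table({"forward": 2, "keys": 1}, n_predictions=2)
print(render("I am looking for", ["looking"], rows))
```

Reporting log probabilities rather than raw probabilities keeps very small values readable, and the ordering is the same because the logarithm is increasing.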

Sample of Output

[Figure: screenshot of a sample output]