Next Word Predictor: Capstone Project

Dipesh Shrestha
05/31/2016

Description

Using n-grams generated from the provided training data set, the app takes a word, phrase, or sentence as input and suggests a list of possible next words. The Shiny application is available here. More about running the app and the algorithm follows.

Objectives

  • Analysis of the provided training data set
  • Building a prediction algorithm
  • Creating a Shiny app that takes a phrase as input and produces predicted words
  • Finally, pitching the app

Instructions

    1. Wait for the shinyapps.io page to load completely.
    2. Enter your desired word, phrase, or sentence in the side panel.
    3. Either press “Submit Sentence” or hit the Enter key to see the result.
    4. Wait a few seconds for the result to display.
    5. View the top fifty words that might follow your sentence.
    6. Click on the ‘Next predicted word’ tab to see the list of the top 7 suggested words.

Application Snapshot

Prediction Algorithm: N-gram model

    1. The algorithm applied is the Katz back-off model.
    2. First, the previously generated data sets of tokenized n-grams are loaded for prediction.
    3. The algorithm takes the last 4 words (or fewer) of the user input as the reference for the n-gram frequency tables.
    4. The user input is preprocessed: bad words are removed, and the text is tokenized and lowercased.
    5. Result generation: the most frequently occurring next word is returned as the prediction.
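
The input handling in steps 3 and 4 can be sketched roughly as follows. This is a minimal Python illustration, not the app's actual code: the function name `clean_input`, the placeholder `BAD_WORDS` set, and the regular-expression tokenizer are all assumptions made for the example. The `max_context` default mirrors the four-word limit stated in step 3.

```python
import re

# Hypothetical placeholder list; the app's real bad-word list is not shown.
BAD_WORDS = {"darn", "heck"}

def clean_input(text, max_context=4):
    """Lowercase, tokenize, drop bad words, and keep the trailing words."""
    # Lowercase, then keep runs of letters (apostrophes allowed inside words).
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    # Remove any token that appears in the bad-word list.
    tokens = [t for t in tokens if t not in BAD_WORDS]
    # Only the last few words are used as context for the n-gram lookup.
    return tokens[-max_context:]

print(clean_input("Well, DARN it, I don't know what to"))
```

Filtering bad words before truncating means a removed word does not use up one of the context slots; whether the app does it in this order is not stated in the source.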

Back-Off: Prediction Algorithm Continued

The algorithm works as follows:

  • Based on the input phrase, it first searches the quadgrams for a match.
  • If no match is found, it backs off to the trigrams.
  • If there is still no match, it backs off to the bigrams.
  • Finally, if nothing matches, it displays the most frequent word from the unigrams.

Link to my Shiny app: Next Word Suggestor
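
The back-off search described above can be illustrated with a small Python sketch. It is a simplified stand-in under stated assumptions, not the app's implementation: the names `build_tables` and `predict` and the toy corpus are hypothetical, and candidates are ranked by raw frequency only. A full Katz back-off model also discounts the observed counts and redistributes the freed probability mass to lower-order n-grams, which this sketch omits. A quadgram lookup uses the three preceding words as its key, and `top_k=7` follows the seven suggestions mentioned in the instructions.

```python
from collections import Counter, defaultdict

def build_tables(tokens, max_order=4):
    """Map each order n to {context tuple -> Counter of next words}."""
    tables = {n: defaultdict(Counter) for n in range(1, max_order + 1)}
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # The first n-1 words are the context; the last is the next word.
            tables[n][tuple(gram[:-1])][gram[-1]] += 1
    return tables

def predict(tables, context, top_k=7, max_order=4):
    """Return up to top_k next words, backing off from the highest order."""
    for n in range(max_order, 0, -1):
        # An order-n lookup needs the last n-1 words; the unigram key is ().
        key = tuple(context[-(n - 1):]) if n > 1 else ()
        if len(key) < n - 1:
            continue  # not enough context for this order, try a lower one
        counts = tables[n].get(key)
        if counts:
            return [word for word, _ in counts.most_common(top_k)]
    return []

corpus = "i want to eat i want to sleep i want to eat now".split()
tables = build_tables(corpus)
print(predict(tables, ["i", "want", "to"]))     # matched in the quadgrams
print(predict(tables, ["they", "like", "to"]))  # backs off to the bigrams
print(predict(tables, ["zebra"]))               # falls through to the unigrams
```

Keying each table by its context tuple keeps every lookup to a single dictionary access, so the cost of a prediction does not grow with the number of stored n-grams.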