24/12/2017

Introduction

Approach

  • Raw data was cleaned by removing non-alphabetic characters (numbers, punctuation, etc.) and converting to lower case; a sample was then drawn from it.
  • N-grams (1- to 4-grams) were built from the sample data.
  • N-gram rankings and metrics were used to find the most suitable words for prediction.
  • The prediction model is based on the Stupid Backoff algorithm (a simpler alternative to Katz backoff).
  • Only the last 3 words of the input are considered for prediction.
  • The model first looks for a match in the quadgrams; if none is found it checks the trigrams, and so on down to the lower-order n-grams.
  • The 5 best-matching words are listed.
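
The steps above can be sketched roughly as follows. This is a minimal Python illustration, not the application's actual code (the app itself is a Shiny app). The function names `clean`, `build_ngrams` and `predict` are hypothetical, and the discount `ALPHA = 0.4` is an assumed value, since the slides do not state one. The sketch stops backing off as soon as 5 candidates have been collected, to mirror the "check quadgrams first" behaviour described above.

```python
import re
from collections import Counter, defaultdict

ALPHA = 0.4  # assumed back-off discount; the slides do not give a value


def clean(text):
    """Lower-case the text, drop non-alphabetic characters, split into words."""
    return re.sub(r"[^a-z\s]", "", text.lower()).split()


def build_ngrams(sentences, max_n=4):
    """Map each context (0 to max_n - 1 words) to a Counter of following words."""
    table = defaultdict(Counter)
    for sentence in sentences:
        tokens = clean(sentence)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                *context, word = tokens[i:i + n]
                table[tuple(context)][word] += 1
    return table


def predict(table, text, k=5, max_context=3):
    """Return up to k candidate next words, trying the longest context first."""
    context = tuple(clean(text)[-max_context:])  # only the last 3 words
    scores = {}
    discount = 1.0
    while True:
        followers = table.get(context)
        if followers:
            total = sum(followers.values())
            for word, count in followers.items():
                # Keep the score from the longest context that saw the word.
                scores.setdefault(word, discount * count / total)
        if not context or len(scores) >= k:
            break
        context = context[1:]  # drop the oldest word and back off one level
        discount *= ALPHA
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


if __name__ == "__main__":
    sample = ["I like green tea", "I like green apples", "You like green tea"]
    print(predict(build_ngrams(sample), "Do you like green"))
```

Storing followers per context keeps each back-off step a single dictionary lookup; ties in score are broken alphabetically only so that the output is deterministic.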

Application

The Shiny application can be accessed at https://ramch.shinyapps.io/PredictWords/

  • Enter a word or sentence in the input box.
  • The application shows the 5 most likely next words on the right tab.
  • A short usage note can be found on the left tab, below the text box.
  • For performance reasons, only the last 3 words of the input sentence are considered when predicting the next word.
  • Pick one of the predicted words, or any other word, and add it to the input text; a new set of predictions then appears on the right.

Application UI