August 14, 2017

Introduction

The Word Prediction Application is a Shiny app built on natural language processing (NLP) and a prediction algorithm.

Source of data: blogs, news, and Twitter text, available at the following link:

https://d396qusza40orc.cloudfront.net/dsscapstone/dataset/Coursera-SwiftKey.zip

Statistical summary of the data set:

       File      Size (MB)   No_Entries   Total_Chars   Max_Chars
    1  Blogs     248.5       899288       206824505     40833
    2  News      19.2        77259        15639408      5760
    3  Twitter   301.4       2360148      162096031     140
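
As an illustration of how the entry and character columns in a summary like this can be computed, here is a minimal Python sketch. It is not the code used to produce the table above; the function name `summarize_lines` and the sample strings are made up for the example, and each entry is assumed to be one line of text.

```python
def summarize_lines(lines):
    """Return (number of entries, total characters, longest entry length)."""
    lengths = [len(line) for line in lines]
    # default=0 keeps max() from failing on an empty input
    return len(lengths), sum(lengths), max(lengths, default=0)


# Hypothetical sample standing in for the lines of one source file
sample = ["hello world", "a short tweet", "ok"]
entries, total_chars, max_chars = summarize_lines(sample)
```

For a real file, the same function could be applied to the list of lines read from disk, with the file size taken from the file system separately.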

Prediction Algorithm

  • The algorithm starts by loading the blogs, news, and Twitter data, as in the milestone report.
  • Unnecessary files are removed and a sample is drawn from the data.
  • A corpus and matrix are built, the text is tokenized, and the most frequent words are determined.
  • Unigrams, bigrams, trigrams, and tetragrams are created from the results of the steps above.
  • The unigram, bigram, trigram, and tetragram tables are merged into a single n-gram table used by the prediction algorithm.
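
The n-gram steps above can be sketched roughly as follows. This is an illustrative Python example, not the app's own code: the tokenizer (lowercasing and keeping only letters and apostrophes), the helper names `tokenize` and `ngram_counts`, and the two sample sentences are all assumptions made for the sketch.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep only runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(sentences, n):
    """Count every sequence of n consecutive words across the sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


# Hypothetical sample standing in for the sampled blogs, news, and Twitter text
sample = ["I love New York", "I love new ideas"]

# One frequency table per order: 1 = unigram ... 4 = tetragram
tables = {n: ngram_counts(sample, n) for n in range(1, 5)}
```

Keeping the four tables in one dictionary keyed by n-gram order is one simple way to "merge" them so that a prediction step can look up any order from a single object.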

How does App Work?

  • The prediction algorithm is based on Markov chain and backoff models.

  • Entering a word or sentence in the text box loads the data built from the corpus and matrix; the prediction algorithm then uses the n-grams to find the most probable word to complete the input.

  • If the application cannot find a suitable word, it shows NA.
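
To make the backoff idea concrete, here is a minimal Python sketch. It assumes a plain backoff without discounting or smoothing: the longest available context (up to three preceding words) is tried first, and the lookup falls back to shorter contexts when nothing matches. The app's actual model may differ; the names `build_model` and `predict_next` and the sample sentences are hypothetical, and `None` stands in for the NA shown by the app.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep only runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def build_model(sentences, max_n=4):
    """Map each context of 1 to max_n - 1 words to a Counter of following words."""
    model = defaultdict(Counter)
    for sentence in sentences:
        tokens = tokenize(sentence)
        for n in range(2, max_n + 1):
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])
                model[context][tokens[i + n - 1]] += 1
    return model


def predict_next(model, text, max_n=4):
    """Return the most frequent next word, backing off to shorter contexts."""
    tokens = tokenize(text)
    for k in range(min(max_n - 1, len(tokens)), 0, -1):
        context = tuple(tokens[-k:])
        if context in model:
            return model[context].most_common(1)[0][0]
    return None  # no context matched; corresponds to NA in the app


# Hypothetical training sample
sample = [
    "thanks for the follow",
    "thanks for the help",
    "thanks for the follow back",
    "see you at the game",
]
model = build_model(sample)
```

With this sample, `predict_next(model, "Thanks for the")` matches the full three-word context, while `predict_next(model, "waiting at the")` finds no three-word match and backs off to the two-word context "at the". The Markov assumption appears in the fact that only the last few words of the input influence the prediction.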

Word Prediction App