Sakib Shahriar
8th April 2019
Link to the App
First, the text corpus was cleaned by removing things like punctuation and numbers. The cleaned text was then tokenized, and the tokens were used for n-gram modelling.
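As a rough illustration of the cleaning and tokenization step, here is a minimal Python sketch. It is not the app's actual code: the function name clean_and_tokenize is hypothetical, and lowercasing the text and keeping apostrophes are assumptions that the description above does not state.

```python
import re

def clean_and_tokenize(text):
    """Return a list of word tokens with punctuation and numbers removed."""
    # Assumption: lowercase everything so "The" and "the" count as one word.
    text = text.lower()
    # Replace every character that is not a letter, whitespace, or apostrophe
    # with a space. This drops digits and punctuation in one pass.
    # Assumption: apostrophes are kept so contractions like "it's" survive.
    text = re.sub(r"[^a-z\s']", " ", text)
    # Split on runs of whitespace to get the tokens.
    return text.split()

print(clean_and_tokenize("Hello, World! 123 it's 4 me."))
```

Splitting on whitespace is the simplest possible tokenizer; a real corpus would likely need extra handling for things such as URLs or non-English characters.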
An n-gram is a contiguous sequence of n items from a given sequence of text. Given a sentence, s, we can construct a list of n-grams from s by finding pairs of words that occur next to each other. For example, given the sentence “I am Sam” you can construct bigrams (n-grams of length 2) by finding consecutive pairs of words. (Kevin Sookocheff)
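The "I am Sam" example can be sketched in a few lines of Python. The helper name ngrams is hypothetical and only serves to illustrate the definition; it slides a window of length n over the token list.

```python
def ngrams(tokens, n):
    """Return all contiguous n-token sequences from tokens, as tuples."""
    # A window of length n fits at len(tokens) - n + 1 starting positions;
    # if n is larger than the token list, the range is empty.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = ["I", "am", "Sam"]
print(ngrams(tokens, 2))  # bigrams: ("I", "am") and ("am", "Sam")
print(ngrams(tokens, 3))  # the single trigram ("I", "am", "Sam")
```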
The next word is predicted using the n-gram table.
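One way such a lookup could work is sketched below in Python. This is an illustration under assumptions, not the app's implementation: the names build_table and predict_next are hypothetical, the maximum n-gram order of 3 is arbitrary, and falling back to a shorter context when the longer one is unseen is a simple backoff scheme chosen here for the example.

```python
from collections import Counter, defaultdict

def build_table(tokens, max_n=3):
    """Map each context (a tuple of 1 to max_n - 1 words) to next-word counts."""
    table = defaultdict(Counter)
    for n in range(2, max_n + 1):
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])   # first n - 1 words
            next_word = tokens[i + n - 1]          # last word of the n-gram
            table[context][next_word] += 1
    return table

def predict_next(table, words, max_n=3):
    """Return the most frequent next word, or None if no context matches."""
    # Assumption: try the longest available context first, then back off.
    for k in range(min(max_n - 1, len(words)), 0, -1):
        context = tuple(words[-k:])
        if context in table:
            return table[context].most_common(1)[0][0]
    return None

tokens = "i am sam sam i am i am happy".split()
table = build_table(tokens)
print(predict_next(table, ["i"]))           # "am" follows "i" three times
print(predict_next(table, ["sam", "sam"]))  # "i" is the only continuation seen
```

Returning None for an unseen context keeps the sketch short; a deployed predictor would more likely fall back to the most frequent words overall.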
Follow the image for instructions
The app can be found live here: https://sakibshahriar95.shinyapps.io/cdsc/
Thank You and Congratulations!!!