Johns Hopkins University Data Science Capstone Project

Sakib Shahriar
8th April 2019

Algorithm and Modelling

Link to the App

First, the text corpus was cleaned by removing punctuation and numbers. The cleaned text was then tokenized, and n-gram models were built from the tokens.
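The cleaning and tokenization step might look roughly like the sketch below. It is not the project's own code: the function name `clean_and_tokenize` is hypothetical, and lowercasing and keeping only ASCII letters are assumptions added for illustration.

```python
import re

def clean_and_tokenize(text):
    """Return a list of word tokens with punctuation and numbers removed."""
    text = text.lower()                    # assumption: case is ignored
    text = re.sub(r"[^a-z\s]", " ", text)  # replace anything that is not a letter or whitespace
    return text.split()                    # split on runs of whitespace

tokens = clean_and_tokenize("Hello, World! It is 2019.")
print(tokens)  # ['hello', 'world', 'it', 'is']
```

Note that this simple rule also splits contractions ("don't" becomes "don" and "t"), so a real pipeline may want to treat apostrophes separately.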

An n-gram is a contiguous sequence of n items from a given sequence of text. Given a sentence, s, we can construct a list of n-grams from s by finding pairs of words that occur next to each other. For example, given the sentence “I am Sam” you can construct bigrams (n-grams of length 2) by finding consecutive pairs of words. (Kevin Sookocheff)
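As a small illustration of the definition, the sketch below builds n-grams by sliding a window of length n over a token list. The helper name `ngrams` is hypothetical and chosen only for this example.

```python
def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

bigrams = ngrams(["I", "am", "Sam"], 2)
print(bigrams)  # [('I', 'am'), ('am', 'Sam')]
```

If the sentence has fewer than n words, the function returns an empty list.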

The next word is predicted using the n-gram table.
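One plausible way to predict from an n-gram table is to count which word follows each context and return the most frequent one. The sketch below assumes that approach; the slides do not state the app's exact lookup rule, and the names `build_table` and `predict_next` are hypothetical.

```python
from collections import Counter, defaultdict

def build_table(tokens, n):
    """Map each (n-1)-word context to a Counter of the words that follow it (n >= 2)."""
    table = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        table[context][tokens[i + n - 1]] += 1
    return table

def predict_next(table, words, n):
    """Return the most frequent follower of the last n-1 words, or None if unseen."""
    context = tuple(words[-(n - 1):])
    counts = table.get(context)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

corpus = "i am sam sam i am i am happy".split()
table = build_table(corpus, 2)
print(predict_next(table, ["sam", "i"], 2))  # am
```

In this toy corpus "i" is followed by "am" three times, so "am" is the prediction. A context that never appeared in the corpus returns `None`; handling that case (for example, by falling back to a shorter context) is left out of this sketch.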

App Instructions

Follow the image for instructions

App Link

The app can be found live here: https://sakibshahriar95.shinyapps.io/cdsc/

Conclusion

Thank You and Congratulations!!!