Capstone Presentation

DrAmericasBoo'sPath
September 22, 2021

For this project we first had to create a milestone R Markdown presentation.
After that we had to feed the cleaned data into a predictive text algorithm and connect that algorithm to a Shiny app (ui.R and Server.R).
The data was provided by SwiftKey, a predictive text keyboard for your Android or iPhone.

The data sample was created from the HC Corpora data. As part of cleaning/pre-processing, the text was converted to all-lowercase, and all non-text characters such as punctuation marks, extra whitespace, numbers, and URLs were removed.
This cleaned data sample was then tokenized into n-grams, where an n-gram is a contiguous sequence of n items from a sequence of text or speech. The n-grams of interest are the bi-, tri-, and quad-grams.
A model is built from the n-grams: a Simple Good-Turing (SGT) probability model is computed from the n-gram frequencies.
The prediction is reasonable, but may not be the best. Natural language processing is a big problem in computing, and in an individual project like this you can only do so much.
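
The cleaning, tokenizing, and counting steps above can be sketched roughly as follows. This is a Python illustration, not the project's R code: the function names, the regular expressions, and the example sentence are all assumptions made for the sketch. The last function applies the basic Good-Turing count adjustment, which is simpler than the Simple Good-Turing model named above (SGT additionally smooths the frequency-of-frequency values).

```python
import re
from collections import Counter

# Assumed patterns for this sketch; the project's exact cleaning rules are not shown.
URL_RE = re.compile(r"(https?://|www\.)\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]+")

def clean_text(text):
    """Lowercase, drop URLs, then drop anything that is not a letter or whitespace."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    return " ".join(text.split())  # collapse runs of whitespace

def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def count_ngrams(text, n):
    """Clean the text, split it on spaces, and count its n-grams."""
    return Counter(ngrams(clean_text(text).split(), n))

def good_turing_counts(counts):
    """Basic Good-Turing adjusted counts: c* = (c + 1) * N_{c+1} / N_c.

    N_c is the number of distinct n-grams seen exactly c times. When no
    n-gram was seen c + 1 times, the raw count is kept unchanged.
    """
    freq_of_freq = Counter(counts.values())
    adjusted = {}
    for gram, c in counts.items():
        n_c = freq_of_freq[c]
        n_c1 = freq_of_freq.get(c + 1, 0)
        adjusted[gram] = (c + 1) * n_c1 / n_c if n_c1 else float(c)
    return adjusted

bigram_counts = count_ngrams("The cat sat. The cat ran.", 2)
adjusted_counts = good_turing_counts(bigram_counts)
```

Because the sketch removes punctuation before tokenizing, n-grams can span sentence boundaries; splitting into sentences first would avoid that.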

An n-gram model with a back-off strategy was used for the natural language processing.
The data was then tokenized three times using RWeka, once each for 1-grams, 2-grams, and 3-grams.
The algorithm predicts the next word based on the last words entered.
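
A back-off lookup of this kind might look like the sketch below. It is written in Python for illustration only, and the names `build_tables` and `predict_next` are hypothetical. It also ranks candidates by raw counts rather than by smoothed probabilities, so it shows the back-off idea only, not the app's actual scoring.

```python
from collections import Counter, defaultdict

def build_tables(tokens, max_n=3):
    """Map each context (tuple of preceding words) to a Counter of next words.

    The empty context () holds plain unigram counts.
    """
    tables = defaultdict(Counter)
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            *context, word = tokens[i:i + n]
            tables[tuple(context)][word] += 1
    return tables

def predict_next(tables, typed_words, max_n=3):
    """Try the longest context first, then back off to shorter ones."""
    for k in range(max_n - 1, -1, -1):
        context = tuple(typed_words[-k:]) if k else ()
        # Skip contexts longer than the input, and contexts never seen in training.
        if len(context) == k and context in tables:
            return tables[context].most_common(1)[0][0]
    return None  # nothing to suggest (empty training data)

# Toy training text, invented for this example.
tokens = "i like green eggs and i like green ham and i like red apples".split()
tables = build_tables(tokens)

seen_context = predict_next(tables, ["i", "like"])          # uses the 3-gram table
backed_off = predict_next(tables, ["really", "like"])       # backs off to 2-grams
unknown_word = predict_next(tables, ["purple"])             # backs off to 1-grams
```

With this toy text, the first two calls both suggest "green": the first finds the context "i like" directly, while the second has never seen "really like" and falls back to what follows "like" alone.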

Here is a link to the Shiny app: https://drsudhirpathak.shinyapps.io/datasciencecapstone/
The app will predict your next word after you click “Analyze Text”.