Capstone Presentation

DrAmericasBoo'sPath
September 22, 2021

Goal of the Project

  • For this project, we first had to create a milestone presentation in R Markdown.
  • After that, we had to feed the cleaned data into a predictive text algorithm and connect that algorithm to a Shiny app (ui.R and server.R).
  • The data was provided by SwiftKey, the maker of a predictive text keyboard for Android and iPhone.

Data Processing and Modeling

  • A data sample was created from the HC Corpora data. As part of cleaning and pre-processing, the text was converted to lowercase, and non-text content such as punctuation marks, numbers, and URLs was removed, along with extra whitespace.

  • This cleaned sample was then tokenized into n-grams: contiguous sequences of n items from a sample of text or speech. The n-grams of interest are bigrams, trigrams, and quadgrams.

  • A model is built from the n-grams, using a Simple Good-Turing (SGT) model to estimate probabilities from the n-gram frequencies.

  • The predictions are reasonable but not necessarily the best possible. Natural language processing is a hard problem in computing, and an individual project like this one can only go so far.
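The cleaning and tokenization steps described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the project's R code: the regular expressions, the choice to keep apostrophes (so "it's" stays one token), and the helper names `clean`, `ngrams` and `ngram_counts` are all hypothetical.

```python
import re
from collections import Counter

# Assumed URL pattern: "http(s)://..." or "www...." up to the next whitespace.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Anything that is not a lowercase letter, apostrophe or whitespace (digits, punctuation, symbols).
NON_TEXT_RE = re.compile(r"[^a-z'\s]")


def clean(text):
    """Lowercase the text, drop URLs, digits and punctuation, and collapse whitespace."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = NON_TEXT_RE.sub(" ", text)
    return " ".join(text.split())


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, sizes=(2, 3, 4)):
    """Count bigrams, trigrams and quadgrams (by default) in the cleaned text."""
    tokens = clean(text).split()
    return {n: Counter(ngrams(tokens, n)) for n in sizes}


print(clean("Visit https://example.com NOW!!! It's 100% great."))  # visit now it's great
counts = ngram_counts("The cat, the cat sat.")
print(counts[2].most_common(1))  # [(('the', 'cat'), 2)]
```

Counting each n-gram size separately keeps the tables small enough to look up by the last few words typed.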
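The Simple Good-Turing step can likewise be sketched in Python. It follows Gale and Sampson's published procedure: average the frequency-of-frequency counts N_r into Z_r, fit log Z_r = a + b log r by least squares, and switch from the raw Turing estimate to the smoothed one once the two are no longer significantly different. The function name `simple_good_turing` and its dict-of-counts input are assumptions; this is a sketch, not the project's own code.

```python
import math
from collections import Counter


def simple_good_turing(counts):
    """Return (p0, probs): the unseen mass and an SGT probability for each observed item.

    counts maps each item (e.g. an n-gram tuple) to its raw frequency r >= 1.
    """
    n_r = Counter(counts.values())          # N_r: how many items occur exactly r times
    rs = sorted(n_r)
    if len(rs) < 2:
        raise ValueError("need at least two distinct frequencies to fit the log-log line")
    total = sum(r * nr for r, nr in n_r.items())
    p0 = n_r.get(1, 0) / total              # mass reserved for unseen items: N_1 / N

    # Z_r = 2 N_r / (t - q), where q and t are the neighbouring observed frequencies.
    log_r, log_z = [], []
    for i, r in enumerate(rs):
        q = rs[i - 1] if i > 0 else 0
        t = rs[i + 1] if i < len(rs) - 1 else 2 * r - q
        log_r.append(math.log(r))
        log_z.append(math.log(2 * n_r[r] / (t - q)))

    # Ordinary least squares fit of log Z_r = a + b * log r (only the slope b is needed).
    mx = sum(log_r) / len(log_r)
    my = sum(log_z) / len(log_z)
    b = (sum((x - mx) * (y - my) for x, y in zip(log_r, log_z))
         / sum((x - mx) ** 2 for x in log_r))

    r_star = {}
    use_smoothed = False
    for r in rs:
        # Smoothed estimate: (r + 1) * S(r + 1) / S(r) with S(r) = exp(a + b log r).
        y = (r + 1) * ((r + 1) / r) ** b
        nr, nr1 = n_r[r], n_r.get(r + 1, 0)
        if not use_smoothed and nr1 > 0:
            x = (r + 1) * nr1 / nr          # raw Turing estimate
            threshold = 1.96 * math.sqrt((r + 1) ** 2 * (nr1 / nr ** 2) * (1 + nr1 / nr))
            if abs(x - y) > threshold:
                r_star[r] = x
                continue
        use_smoothed = True                 # from here on, always use the smoothed estimate
        r_star[r] = y

    # Renormalise so the observed items share exactly 1 - p0 of the probability mass.
    n_prime = sum(n_r[r] * r_star[r] for r in rs)
    probs = {item: (1 - p0) * r_star[c] / n_prime for item, c in counts.items()}
    return p0, probs


p0, probs = simple_good_turing({"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 3, "g": 5})
print(p0)  # 0.2
```

The returned p0 is the probability mass held back for n-grams never seen in the sample, which is what lets the model cope with unseen word combinations.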

Algorithm and Model Building

  • An n-gram model with a back-off strategy was used for the natural language processing.
  • The data was then tokenized three times, into 1-grams through 3-grams, using RWeka.
  • The algorithm predicts the next word based on the last words entered.
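The back-off idea above can be sketched in Python. This is a simplified, unweighted back-off (use the longest context that has been seen, otherwise drop the oldest word), not Katz back-off with discounted probabilities; the regex tokenizer stands in for RWeka, and the names `tokenize`, `build_model` and `predict_next` are hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Assumed tokenizer: lowercase words made of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def build_model(corpus, max_n=3):
    """Map each context (the previous 0..max_n-1 words) to a Counter of next words."""
    words = tokenize(corpus)
    model = defaultdict(Counter)
    for n in range(1, max_n + 1):                 # 1-grams through max_n-grams
        for i in range(len(words) - n + 1):
            gram = tuple(words[i:i + n])
            model[gram[:-1]][gram[-1]] += 1       # context -> next-word count
    return model


def predict_next(model, text, max_n=3, k=3):
    """Return up to k next-word candidates, backing off from the longest context."""
    words = tokenize(text)
    for n in range(max_n - 1, -1, -1):            # context lengths 2, 1, 0 when max_n = 3
        if n > len(words):
            continue
        context = tuple(words[len(words) - n:])
        candidates = model.get(context)           # .get avoids creating empty entries
        if candidates:
            return [w for w, _ in candidates.most_common(k)]
    return []


model = build_model("I love data science. I love R. You love data.")
print(predict_next(model, "you love"))        # ['data']  (trigram context found)
print(predict_next(model, "we love", k=2))    # ['data', 'r']  (backs off to ("love",))
```

For brevity this sketch lets n-grams run across sentence boundaries (e.g. "science i"); a fuller version would split on sentences before counting.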

Shiny App