2023-04-23

Presentation

This is an R Markdown presentation on predicting the next word of a phrase from a database of words. The web application can be found at:

https://johnkn31.shinyapps.io/DataScienceCaptsone/

When the user opens the app, the instructions appear in the left side panel. The main panel is where the user types a word or phrase.

Algorithm

The web app is built on the fundamentals of N-grams and Markov chains. For this web app, I decided to use bigrams, trigrams, and quadgrams to predict the next word. In natural language processing, a Markov chain captures the idea that one can predict the next word given information about the previous word or words.
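To make the idea concrete, here is a minimal sketch of next-word prediction with bigram, trigram, and quadgram tables. It is written in Python purely for illustration and is not the code behind the app. The function names `build_ngram_tables` and `predict_next`, the toy corpus, and the rule of trying the longest context first and falling back to shorter ones are all assumptions made for this example.

```python
from collections import Counter, defaultdict

def build_ngram_tables(tokens, max_n=4):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    tables = {n: defaultdict(Counter) for n in range(2, max_n + 1)}
    for n in range(2, max_n + 1):
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            tables[n][context][tokens[i + n - 1]] += 1
    return tables

def predict_next(phrase, tables, max_n=4):
    """Try the longest available context first, then back off to shorter ones."""
    words = phrase.lower().split()
    for n in range(max_n, 1, -1):
        if len(words) < n - 1:
            continue  # phrase is too short for this n-gram order
        context = tuple(words[-(n - 1):])
        followers = tables[n].get(context)
        if followers:
            # Return the word seen most often after this context.
            return followers.most_common(1)[0][0]
    return None  # no table contains a matching context

# Toy corpus (hypothetical); a real app would use a much larger one.
corpus = "i love data science and i love data analysis and i love data science".split()
tables = build_ngram_tables(corpus)
print(predict_next("I love data", tables))
```

In this sketch a phrase whose last three words are unknown can still get a prediction, because the lookup falls back from the quadgram table to the trigram table and then to the bigram table.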

Learn more at: https://www.youtube.com/watch?v=CXpZnZM63Gg&list=PL8FFE3F391203C98C&index=1

Important Findings

Trade-off: A large database containing many words makes the search for the next word much slower than a small database does. However, a smaller database may not contain the next word or phrase the user needs, while a larger database gives a higher likelihood of finding it. This is important when building the data for deployment on R Shiny Server.
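One common way to act on this trade-off is to drop rarely seen n-grams before deployment. The sketch below is in Python for illustration only and is not necessarily what the app does. The `prune` helper, the `min_count` threshold, and the toy sentence are hypothetical.

```python
from collections import Counter

def prune(counts, min_count):
    """Keep only n-grams seen at least min_count times."""
    return {gram: c for gram, c in counts.items() if c >= min_count}

# Toy data (hypothetical): count every pair of adjacent words.
tokens = "the cat sat on the mat and the cat ran".split()
bigrams = Counter(zip(tokens, tokens[1:]))

small = prune(bigrams, min_count=2)
print(len(bigrams), len(small))  # table size before and after pruning
```

The pruned table is smaller and faster to search, but any phrase whose n-grams were removed can no longer be predicted from it.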

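One does not need to know every word in order to cover most of the corpus. By this I mean that when distinct words are examined against how much of the text they account for, the curve resembles a logarithmic graph: a relatively small number of words covers most of the text.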
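The coverage idea can be checked with a short calculation: sort the distinct words by frequency and count how many are needed to reach a given share of all word occurrences. This is a Python sketch for illustration. The function name `words_needed_for_coverage` and the toy sentence are hypothetical, and no figures from the actual corpus are implied.

```python
from collections import Counter

def words_needed_for_coverage(tokens, target):
    """Number of most frequent distinct words needed to cover `target`
    (a fraction between 0 and 1) of all word occurrences."""
    counts = Counter(tokens)
    total = sum(counts.values())
    covered = 0
    for rank, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered / total >= target:
            return rank
    return len(counts)

# Toy data (hypothetical): 10 word occurrences, 7 distinct words.
tokens = "the cat sat on the mat and the cat ran".split()
for target in (0.5, 0.9, 1.0):
    print(target, words_needed_for_coverage(tokens, target))
```

Running the same function over a real corpus for a range of targets gives the points of the coverage curve described above.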

Picture of important finding

Helpful Links