Darius Kharazi
07/31/18
Natural Language Processing (NLP) is a field of computer science concerned with the interaction between computers and human languages.
One of the oldest NLP problems related to word prediction is Claude Shannon's problem of assigning a probability to a word. Shannon used n-grams, defined as contiguous sequences of n items from a given sequence of text or speech, to compute probabilities of English sentences.
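To make the n-gram idea concrete, here is a minimal sketch in Python (the repository itself is built on R and Shiny, so this is an illustration and not the project's code). The function names `ngrams` and `next_word_probability` and the toy sentence are assumptions made for this example. The probability is a plain relative-frequency estimate: the count of the context followed by the word, divided by the count of the context followed by any word.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def next_word_probability(tokens, context, word):
    """Estimate P(word | context) by relative frequency of n-grams."""
    n = len(context) + 1
    counts = Counter(ngrams(tokens, n))
    context = tuple(context)
    # Total number of n-grams that start with the given context.
    total = sum(c for gram, c in counts.items() if gram[:-1] == context)
    if total == 0:
        return 0.0  # context never seen in the text
    return counts[context + (word,)] / total


# Hypothetical toy text, already lowercased and split into words.
tokens = "the cat sat on the mat and the cat slept".split()
bigrams = ngrams(tokens, 2)
p = next_word_probability(tokens, ["the"], "cat")
```

In this toy text, "the" is followed by "cat" twice and by "mat" once, so the estimate for "cat" after "the" is two out of three.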
Essentially, we are only analyzing basic, lowercase words from the given tweets, blog posts, and news articles.
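The exact cleaning rules are not spelled out here, so the following Python sketch is only one plausible reading of "basic, lowercase words": lowercase the text, then keep runs of ASCII letters (allowing internal apostrophes) and drop digits, punctuation, and symbols. The function name `tokenize` and the sample tweet are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and keep only alphabetic words.

    Assumption: a "basic" word is a run of ASCII letters, optionally
    joined by apostrophes (e.g. "can't"). Digits, punctuation, and
    symbols such as '#' are dropped.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


# Hypothetical sample input resembling a tweet.
words = tokenize("Can't WAIT for the 2018 game!!! #hyped")
```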
Each n-gram table includes phrases and words pulled from the data preprocessing step.
The repository and Shiny app include 2-gram, 3-gram, 4-gram, and 5-gram tables, which are used to predict the next word.
Each n-gram table includes columns with the input phrase, predicted output word, and the frequency, or probability, of the predicted word in the n-gram.
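As a rough illustration of how such tables can drive prediction, the Python sketch below builds one lookup table per n-gram order (input phrase, predicted word, frequency) and picks the most frequent word for the longest matching phrase. The back-off from longer to shorter phrases is an assumption made for this example and is not confirmed as the app's actual strategy; `build_table`, `predict_next`, and the toy text are likewise hypothetical names and data.

```python
from collections import Counter, defaultdict


def build_table(tokens, n):
    """Map each (n-1)-word input phrase to counts of the word that followed it."""
    table = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        phrase = " ".join(tokens[i:i + n - 1])
        table[phrase][tokens[i + n - 1]] += 1
    return table


def predict_next(tables, phrase):
    """Return the most frequent next word, trying the longest context first."""
    words = phrase.lower().split()
    for n in sorted(tables, reverse=True):
        if len(words) < n - 1:
            continue  # not enough input words for this n-gram order
        key = " ".join(words[-(n - 1):])
        candidates = tables[n].get(key)
        if candidates:
            return candidates.most_common(1)[0][0]
    return None  # no table contains a matching phrase


# Hypothetical toy text; one table each for 2-, 3-, 4-, and 5-grams.
tokens = "i want to go home and i want to go out and i want to sleep".split()
tables = {n: build_table(tokens, n) for n in range(2, 6)}
```

For example, `predict_next(tables, "I want to")` matches the 4-gram table directly, while `predict_next(tables, "we want to")` finds no 4-gram match and falls back to the 3-gram phrase "want to".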
Simply run the Shiny app server from the Shiny directory or URL.
In order to run the Shiny app locally, you will need to build your own prediction model.
If you would like a step-by-step walkthrough of preprocessing the data, building the prediction model, and using it on an example phrase, go to the R Markdown file and configure the “data” directories, as noted.
Implement more accurate prediction algorithms, such as linear regression and logistic regression, that predict the probability of observing the predicted word.
Test the code with languages other than English.