CAPSTONE PROJECT
ADHAM ALEID
Introduction
- The Natural Language Processing (NLP) is an active area of research in Data Science
- Purpose: The goal of this project was to build a predictive model for text.
- R programming language was used to develop this prediction model, and shiny app was used for front-end interface.
FRAMEWORK
- A large English datasets from blogs, news and twitter was downloaded. This datasets were used to build next-word predictive model.
- To understand variation in the frequencies of words and word pairs.
- n-gram approach, which is contiguous sequence of n words in text, was used in this model.
FRAMEWORK
- Three order of n-gram was used in this model
- Unigram: next word is predicted based on the last word
- Bigram: next word is predicted based on the last 2 word
- Trigram: next word is predicted based on the last 3 word
PREDICTION APP

- Left, the user enters the text in the textbox, and choose the n-gram model. Right, predicted words appear in the table.
CONCLUSION
- This prediction model exhibits substantial accuracy, especially for words used commonly.
- The app is easy to used, and results returned usually in less than 1 second, which is good for newly developed model.
- This model has a large potential for improvement in both accuracy and speed.