This project demonstrates how Natural Language Processing (NLP) techniques can be used to predict the next word in a sentence using real-world text data from blogs, news articles, and Twitter posts.
The objective of this project is to build a lightweight and responsive predictive text application capable of suggesting the next likely word based on user input.
Predictive text systems are widely used in:
The goal was to create a practical prototype using R and Shiny.
The project used the HC Corpora English datasets containing text from:
| Dataset | Approximate Lines | Approximate Words |
|---|---|---|
| Blogs | 899K | 37 Million |
| News | 1 Million | 34 Million |
| Twitter | 2.3 Million | 30 Million |
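Line and word counts like those in the table can be derived with a few lines of base R. The sketch below is illustrative only: `corpus_summary` is a hypothetical helper, not the project's code, and it runs on a tiny in-memory sample. For the real corpus, each file would first be loaded with `readLines()`.

```r
# Hypothetical helper: count lines and whitespace-separated words
# in a character vector (one element per line of text).
corpus_summary <- function(lines) {
  # Split every line on runs of whitespace and drop empty tokens.
  tokens <- unlist(strsplit(trimws(lines), "\\s+"))
  tokens <- tokens[nzchar(tokens)]
  c(lines = length(lines), words = length(tokens))
}

# Toy stand-in for one corpus file (assumed data, not from the dataset).
toy_blog <- c("How are you today", "Looking forward to the weekend", "")
toy_summary <- corpus_summary(toy_blog)
print(toy_summary)
```

Splitting on whitespace is a rough approximation of a word count, which is consistent with the table reporting approximate figures.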
The prediction engine uses an N-gram language modeling approach.
The algorithm counts the word sequences observed in the training text and, given the user's most recent words, predicts the word that most often followed them.
If no match exists for the longer phrase, the model backs off to shorter word sequences so that a prediction is always returned.
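The back-off idea can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the function names (`tokenize`, `count_followers`, `build_model`, `predict_next`), the toy training sentences, and the maximum context of two words are all hypothetical, and the sketch picks the most frequent follower by raw count, without smoothing or discounting.

```r
# Lowercase the text, replace anything other than letters, apostrophes,
# and spaces with a space, then split on whitespace.
tokenize <- function(text) {
  cleaned <- gsub("[^a-z' ]", " ", tolower(text))
  tokens <- unlist(strsplit(trimws(cleaned), "\\s+"))
  tokens[nzchar(tokens)]
}

# For every context of k words (k >= 1), tabulate the words that followed it.
# Returns a named list: context string -> follower counts, most frequent first.
count_followers <- function(token_lists, k) {
  contexts <- character(0)
  followers <- character(0)
  for (tok in token_lists) {
    if (length(tok) <= k) next
    for (i in seq_len(length(tok) - k)) {
      contexts <- c(contexts, paste(tok[i:(i + k - 1)], collapse = " "))
      followers <- c(followers, tok[i + k])
    }
  }
  lapply(split(followers, contexts),
         function(w) sort(table(w), decreasing = TRUE))
}

# Build follower tables for contexts of 1..max_context words, plus the
# single most frequent word as a last-resort fallback.
build_model <- function(sentences, max_context = 2) {
  token_lists <- lapply(sentences, tokenize)
  list(
    by_context = lapply(seq_len(max_context),
                        function(k) count_followers(token_lists, k)),
    top_word = names(sort(table(unlist(token_lists)), decreasing = TRUE))[1]
  )
}

# Try the longest context first; back off to shorter ones, then to top_word.
predict_next <- function(model, input) {
  tok <- tokenize(input)
  for (k in rev(seq_along(model$by_context))) {
    if (length(tok) < k) next
    ctx <- paste(utils::tail(tok, k), collapse = " ")
    counts <- model$by_context[[k]][[ctx]]
    if (!is.null(counts)) return(names(counts)[1])
  }
  model$top_word
}

# Toy training data (assumed for illustration only).
sentences <- c(
  "How are you doing today",
  "How are you feeling",
  "We are looking forward to the trip",
  "I am looking forward to it",
  "According to the report the results are good"
)
model <- build_model(sentences)
print(predict_next(model, "how are"))   # matched on the two-word context
print(predict_next(model, "they are"))  # backs off to the one-word context "are"
print(predict_next(model, "zzz qqq"))   # nothing matches: most frequent word
```

A production model trained on the full corpus would normally precompute and prune these tables rather than build them in loops, and would often apply a smoothing scheme. Those choices are outside what this sketch shows.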
| User Input | Predicted Word |
|---|---|
| how are | you |
| looking forward | to |
| machine learning | is |
| according to | the |
| artificial intelligence | is |
https://04yn7w-jatin-bhardwaj.shinyapps.io/firrs/
This project demonstrates a working predictive text application built with Natural Language Processing techniques and Shiny.
The final solution combines:
Future versions may include: