Basic Statistics of the Data Files
| File | Size (MB) | Lines | Words |
|---|---|---|---|
| Blogs | 200 | 899288 | 37334131 |
| News | 196 | 1010242 | 34372530 |
| Twitter | 159 | 2360148 | 30373583 |
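
For readers who want to reproduce figures like these, the following is a minimal sketch in Python (the report's own analysis may have been done differently). It assumes the three numbers per file are the size in megabytes, the number of lines, and the number of whitespace-separated words. The function name `file_stats` and the counting rules are illustrative assumptions, not the method used in this report.

```python
import os


def file_stats(path):
    """Return size in MB, line count, and word count for one text file.

    Assumption: a "word" is any run of non-whitespace characters.
    """
    size_mb = os.path.getsize(path) / (1024 * 1024)
    lines = 0
    words = 0
    # errors="ignore" skips bytes that are not valid UTF-8 instead of failing
    with open(path, encoding="utf-8", errors="ignore") as handle:
        for line in handle:
            lines += 1
            words += len(line.split())
    return {"size_mb": round(size_mb), "lines": lines, "words": words}
```

Reading line by line keeps memory use low even for files of a few hundred megabytes. A different tokenizer (for example, one that splits on punctuation) would give different word counts.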

3. Future Goals for Prediction Algorithm and Shiny App
The final goal of this project is to build a predictive text
application. Based on this exploratory analysis, my plan is as
follows:
- Prediction Model: I will develop an N-gram model
(using 2-word, 3-word, and 4-word sequences) to predict the next word
based on user input.
- Handling Unseen Phrases: I will implement a
“back-off” strategy. If a longer phrase isn’t found (for example, a 3-word
phrase), the algorithm will fall back to the next shorter one (a 2-word
pair) to provide the best possible guess.
- App Design: The Shiny app will feature a simple
text interface. As the user types, the top 3 most likely next words will
be displayed instantly.
- Optimization: To ensure the app is fast and
memory-efficient for mobile users, I will prune the dictionary to remove
very rare words and phrases.
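
The plan above can be sketched in a few lines. The following is a minimal illustration in Python, not the final Shiny implementation: it builds 2-, 3-, and 4-word count tables, prunes continuations seen fewer than `min_count` times, and predicts by trying the longest context first and backing off to shorter ones. The function names, the whitespace tokenizer, and the `min_count` parameter are all assumptions made for this example. The back-off here is the simplest possible form (use the longest context that has any match); it does not apply discounting or scoring as Katz back-off would.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def build_ngram_tables(sentences, max_n=4, min_count=1):
    """For each order n (2..max_n), map a context of n-1 words to next-word counts."""
    tables = {n: defaultdict(Counter) for n in range(2, max_n + 1)}
    for sentence in sentences:
        words = tokenize(sentence)
        for n in range(2, max_n + 1):
            for i in range(len(words) - n + 1):
                context = tuple(words[i:i + n - 1])
                tables[n][context][words[i + n - 1]] += 1

    # Pruning: drop continuations seen fewer than min_count times.
    pruned = {}
    for n, table in tables.items():
        pruned[n] = {}
        for context, counts in table.items():
            kept = Counter({w: c for w, c in counts.items() if c >= min_count})
            if kept:
                pruned[n][context] = kept
    return pruned


def predict_next(tables, text, k=3):
    """Return up to k candidate next words, backing off from long to short contexts."""
    words = tokenize(text)
    for n in range(max(tables), 1, -1):
        if len(words) < n - 1:
            continue  # not enough input words for this order
        context = tuple(words[-(n - 1):])
        counts = tables[n].get(context)
        if counts:
            return [word for word, _ in counts.most_common(k)]
    return []  # no context matched at any order


corpus = [
    "i love new york",
    "i love new shoes",
    "i love new york city",
    "we love old york",
]
tables = build_ngram_tables(corpus)
# The 3-word context is unseen, so the lookup backs off to ("love", "new").
print(predict_next(tables, "they really love new"))  # ['york', 'shoes']
```

Raising `min_count` shrinks the tables at the cost of losing rare predictions, which is the trade-off the optimization step is meant to tune. One limitation of this sketch is that it returns fewer than `k` words when the matched context has few continuations; a fuller version could top up the list from shorter contexts.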