Top 15 Most Frequent Words (Unigrams)

# INSERT YOUR CODE HERE to generate a bar plot of top 15 words

Top 15 Most Frequent Word Pairs (Bigrams)

# INSERT YOUR CODE HERE to generate a bar plot of top 15 word pairs

Key Findings

Sparsity: A small percentage of unique words account for a large portion of the total word occurrences.
Context Matters: Twitter data contains shorter sentences and more informal language than News and Blogs.
Stop Words: Common words like “the”, “and”, and “to” dominate the frequencies, but are vital for structural prediction.
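The counts behind findings like these could be produced along the following lines. This is a minimal base R sketch, not the report's actual pipeline: the three-line `corpus` is a toy stand-in for the real Twitter, News, and Blogs samples, and `tokenize`, `count_ngrams`, `words_for_coverage`, and `plot_top_terms` are hypothetical helper names.

```r
# Toy corpus standing in for the real text samples (assumption).
corpus <- c("the cat sat on the mat",
            "the dog sat on the log",
            "a cat and a dog")

# Lowercase a line, then split on anything that is not a letter or apostrophe.
tokenize <- function(line) {
  tokens <- strsplit(tolower(line), "[^a-z']+")[[1]]
  tokens[nzchar(tokens)]
}

# Count all n-grams of size n; n-grams never span two lines.
count_ngrams <- function(lines, n) {
  grams <- unlist(lapply(lines, function(line) {
    tokens <- tokenize(line)
    if (length(tokens) < n) return(character(0))
    starts <- seq_len(length(tokens) - n + 1)
    vapply(starts,
           function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
  sort(table(grams), decreasing = TRUE)
}

unigrams <- count_ngrams(corpus, 1)
bigrams  <- count_ngrams(corpus, 2)

# Smallest number of distinct terms needed to cover a given share of
# all occurrences; counts must already be sorted in decreasing order.
words_for_coverage <- function(counts, share) {
  cumulative <- cumsum(as.numeric(counts)) / sum(counts)
  which(cumulative >= share)[1]
}

# Horizontal bar plot of the k most frequent terms, most frequent on top.
plot_top_terms <- function(counts, k = 15, title = "") {
  top <- head(counts, k)
  barplot(rev(as.numeric(top)), names.arg = rev(names(top)),
          horiz = TRUE, las = 1, main = title, xlab = "Frequency")
}
```

Calling `plot_top_terms(unigrams, 15, "Top 15 words")` would draw the kind of bar plot the sections above call for, and `words_for_coverage(unigrams, 0.5)` gives one way to quantify the sparsity finding.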
Prediction Model: I will implement a “Katz Back-off” algorithm. If a three-word sequence (trigram) is not found, the model will back off to a two-word sequence (bigram), and finally to a single word (unigram).
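The back-off lookup order can be sketched as follows. This is a simplified illustration that ranks candidates by raw count at the first level where the context is found; it leaves out the discounting and back-off weights that a full Katz model uses. The `ngrams` table and the `predict_next` function are hypothetical, and the counts are made up for the example.

```r
# Toy n-gram table (assumption): each row maps a context to a next word
# and its count. An empty context holds the unigram fallback.
ngrams <- data.frame(
  context = c("i love", "i love", "love", "love", "", "", ""),
  word    = c("you", "it", "you", "the", "the", "to", "and"),
  count   = c(5, 3, 6, 2, 50, 40, 30),
  stringsAsFactors = FALSE
)

# Return up to k next-word candidates, trying the longest context first
# and dropping the oldest word each time no match is found.
predict_next <- function(text, ngram_table, k = 3, max_context = 2) {
  tokens <- strsplit(tolower(trimws(text)), "\\s+")[[1]]
  tokens <- tokens[nzchar(tokens)]
  for (n in seq(min(max_context, length(tokens)), 0)) {
    context <- paste(tail(tokens, n), collapse = " ")
    hits <- ngram_table[ngram_table$context == context, ]
    if (nrow(hits) > 0) {
      hits <- hits[order(-hits$count), ]
      return(head(hits$word, k))
    }
  }
  character(0)
}
```

With this table, `predict_next("I love", ngrams)` matches the two-word context directly, `predict_next("they love", ngrams)` backs off to the one-word context `"love"`, and an unseen phrase falls through to the most frequent single words.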
Optimization: To ensure the app runs quickly, I will prune the N-gram tables to remove very rare word combinations that do not significantly improve accuracy.
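One simple way to prune is a minimum-count threshold, sketched below. The table layout, the `prune_ngrams` name, and the cutoff of 2 are assumptions for illustration; a suitable threshold would need to be chosen by checking accuracy and table size on held-out data.

```r
# Toy bigram table (assumption): context word, next word, and count.
bigram_table <- data.frame(
  context = c("thank", "thank", "thank", "new", "new"),
  word    = c("you", "god", "goodness", "york", "yorker"),
  count   = c(120, 8, 1, 45, 1),
  stringsAsFactors = FALSE
)

# Keep only rows seen at least min_count times.
prune_ngrams <- function(ngram_table, min_count = 2) {
  ngram_table[ngram_table$count >= min_count, , drop = FALSE]
}

pruned <- prune_ngrams(bigram_table, min_count = 2)

# Share of rows retained after pruning.
retained <- nrow(pruned) / nrow(bigram_table)
```

Here the two combinations seen only once are dropped, so the table shrinks from five rows to three while the frequent continuations are kept.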
Shiny App UI: The app will feature a simple text input box. As the user types, the top 3 most likely next words will be displayed instantly.
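A minimal sketch of such an interface is shown below, assuming the shiny package is installed. The input and output IDs (`phrase`, `suggestions`), the title, and the stub `predict_next` are hypothetical placeholders; the stub returns fixed words and would be replaced by the real prediction function.

```r
library(shiny)

# Hypothetical stand-in for the real predictor: always returns fixed guesses.
predict_next <- function(text, k = 3) {
  head(c("the", "to", "and"), k)
}

ui <- fluidPage(
  titlePanel("Next Word Predictor"),
  textInput("phrase", "Type a phrase:", value = ""),
  textOutput("suggestions")
)

server <- function(input, output, session) {
  # renderText re-runs whenever input$phrase changes,
  # so the suggestions update as the user types.
  output$suggestions <- renderText({
    paste(predict_next(input$phrase), collapse = " | ")
  })
}

# Build the app object; runApp(app) would launch it in an interactive session.
app <- shinyApp(ui = ui, server = server)
```

Because the output depends reactively on the text input, no submit button is needed for the suggestions to refresh.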