Peter Geers
May 2017
The app is a predictive text model that suggests which words are likely to follow the text entered by the user. The dataset used to train the application contains text from Twitter, news and blogs, provided by SwiftKey. After data cleaning, sampling and subsetting, all data is gathered in a data frame. Text mining (TM) and NLP techniques are then applied to build a set of word combinations (N-grams), and a Katz backoff algorithm uses these to predict the next word.
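To make the backoff idea concrete, here is a minimal sketch in Python. It is not the app's code (the app itself is built with Shiny), and it simplifies the method: it uses a fixed absolute discount of 0.5 instead of the Good-Turing discounts of the original Katz formulation. The function names `ngram_counts`, `backoff_probs` and `predict_next` are hypothetical, as is the discount value.

```python
from collections import Counter

def ngram_counts(tokens, max_n):
    """Count every n-gram of order 1..max_n, keyed by a tuple of words."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def backoff_probs(context, counts, vocab, discount=0.5):
    """Return {word: P(word | context)} using a simplified backoff scheme.

    Assumption: a fixed absolute discount stands in for Good-Turing discounting.
    """
    if not context:
        # Base case: plain unigram relative frequencies.
        total = sum(counts[(w,)] for w in vocab)
        return {w: counts[(w,)] / total for w in vocab}

    # Distribution for the shorter context (first word dropped).
    lower = backoff_probs(context[1:], counts, vocab, discount)
    seen = {w: counts[context + (w,)] for w in vocab if counts[context + (w,)] > 0}
    if not seen:
        return lower  # context never observed: back off completely

    ctx_total = sum(seen.values())
    unseen_mass = sum(p for w, p in lower.items() if w not in seen)
    d = discount if unseen_mass > 0 else 0.0  # no discount if nothing is unseen

    # Seen continuations keep their discounted relative frequency.
    probs = {w: (c - d) / ctx_total for w, c in seen.items()}

    # The mass freed by discounting is shared among unseen words,
    # in proportion to their lower-order probabilities.
    leftover = d * len(seen) / ctx_total
    if unseen_mass > 0:
        for w, p in lower.items():
            if w not in seen:
                probs[w] = leftover * p / unseen_mass
    return probs

def predict_next(text, counts, vocab, max_n, top=3):
    """Return the `top` most probable next words for already cleaned text."""
    words = tuple(text.split())
    context = words[-(max_n - 1):] if max_n > 1 else ()
    probs = backoff_probs(context, counts, vocab)
    return sorted(probs, key=lambda w: (-probs[w], w))[:top]
```

With counts built from a few sentences, `predict_next("i cant wait", counts, vocab, max_n=3)` looks up the trigram context `cant wait` first and falls back to shorter contexts only for continuations it has not seen.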
Just type one or more words. The app shows both the text the user entered and a cleaned version of it. As the main result, it displays the top N-gram predictions based on the text entered. The user can review and change the input, and the app will update its suggestions accordingly. Another tab offers more documentation.
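The cleaning step could look roughly like the Python sketch below. The exact rules the app applies are not documented here, so the ones shown (lowercasing, dropping apostrophes, replacing everything except letters with spaces) are assumptions, chosen because they would turn "can't" into "cant" as seen in the N-gram tables. The function name `clean_text` is hypothetical.

```python
import re

def clean_text(raw: str) -> str:
    """Normalize user input before N-gram lookup (assumed cleaning rules)."""
    text = raw.lower()
    text = re.sub(r"['’]", "", text)       # drop apostrophes: "can't" -> "cant"
    text = re.sub(r"[^a-z\s]", " ", text)  # digits and punctuation become spaces
    return " ".join(text.split())          # collapse repeated whitespace
```

Showing the cleaned text next to the raw input lets the user see exactly which words the prediction is based on.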
The top 5 entries of some of the N-gram tables in the data frame loaded by the Shiny app:
| word | freq |
|---|---|
| right now | 423 |
| cant wait | 391 |
| last night | 305 |
| feel like | 243 |
| dont know | 237 |

| word | freq |
|---|---|
| thanks for the follow | 141 |
| the end of the | 102 |
| at the end of | 87 |
| the rest of the | 79 |
| cant wait to see | 77 |
Word clouds built from the dataset give an impression of its content. Here is a Twitter example.