14 April 2019

Function

The app suggests the next word given any existing words, phrases or sentences (no length limit) entered by the user.

Below is a simple example of how it works: given the input 'love that', all the possible guesses for the next word are listed in the right column, ordered from most to least probable.

library(dplyr)
# n3: trigram table (t1_2 = first two words, t3 = next word, p3 = its probability)
dem <- n3 %>% filter(t1_2 == 'love that') %>% arrange(desc(p3))
head(dem %>% select(t1_2, t3))
## # A tibble: 6 x 2
##   t1_2      t3   
##   <chr>     <chr>
## 1 love that place
## 2 love that the  
## 3 love that i    
## 4 love that movie
## 5 love that you  
## 6 love that he
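
The table n3 is not defined on this slide. As a rough sketch only, a table of the same shape could be built from raw text as shown below. The toy corpus, the name n3_toy, and the reading of p3 as the conditional probability of t3 given t1_2 are assumptions made for illustration, not the app's actual preprocessing.

```r
library(dplyr)

# Toy corpus (made up); the real app uses sampled news, blog and tweet text.
corpus <- c("i love that place", "i love that movie", "we love that place")

# Split each line into words and collect every run of three consecutive words.
trigrams <- do.call(rbind, lapply(strsplit(corpus, " "), function(w) {
  if (length(w) < 3) return(NULL)
  i <- seq_len(length(w) - 2)
  data.frame(t1_2 = paste(w[i], w[i + 1]), t3 = w[i + 2],
             stringsAsFactors = FALSE)
}))

# Count each trigram, then divide by the total count of its two-word prefix
# to estimate p3 = P(t3 | t1_2).
n3_toy <- trigrams %>%
  count(t1_2, t3) %>%
  group_by(t1_2) %>%
  mutate(p3 = n / sum(n)) %>%
  ungroup()

# Same query as on the slide, run against the toy table.
n3_toy %>% filter(t1_2 == "love that") %>% arrange(desc(p3))
```

Grouping by the prefix before normalising is what turns raw counts into per-prefix probabilities, so the values of p3 for any one prefix add up to 1.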

Model

  • The app is built on the logic of a Markov chain model: it suggests the next word according to its conditional probability given the preceding words.

  • The app also adopts Katz's back-off model: each observed probability is discounted (which frees some probability mass for unobserved word sequences), and a penalty is imposed on every step back to a shorter context.

  • Only 2% of the text data from each source (news, blogs and tweets) was sampled, so that the app is light enough to run on Shiny.
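
The discounting and back-off idea in the bullets above can be sketched for the simplest case, backing off from bigrams to unigrams. This is a minimal illustration under stated assumptions: the counts are made up, the function name katz_bigram is hypothetical, and a fixed absolute discount d = 0.5 stands in for the Good-Turing discounts used in Katz's original formulation. The app's actual discount and penalty values are not given here.

```r
# Toy bigram counts c(w1, w2); all numbers are invented for illustration.
bigrams <- data.frame(
  w1 = c("love", "love", "love", "that"),
  w2 = c("that", "you", "it", "place"),
  n  = c(6, 3, 2, 4),
  stringsAsFactors = FALSE
)
# Toy unigram counts over a small vocabulary.
unigrams <- c(love = 12, that = 10, you = 5, it = 4, place = 8)

# Hypothetical helper: distribution over the next word given one previous word.
katz_bigram <- function(prev, d = 0.5) {
  p_uni <- unigrams / sum(unigrams)
  seen  <- bigrams[bigrams$w1 == prev, ]
  # Unknown context: back off fully to the unigram distribution.
  if (nrow(seen) == 0) return(sort(p_uni, decreasing = TRUE))

  total <- sum(seen$n)
  # Discounted probabilities for words observed after `prev`.
  p_seen <- setNames((seen$n - d) / total, seen$w2)
  # Probability mass freed by discounting the observed bigrams.
  leftover <- d * nrow(seen) / total
  # Share that mass among unseen words in proportion to their unigram probability.
  unseen   <- setdiff(names(unigrams), seen$w2)
  p_unseen <- leftover * p_uni[unseen] / sum(p_uni[unseen])

  sort(c(p_seen, p_unseen), decreasing = TRUE)
}

katz_bigram("love")
```

Because the mass removed from observed bigrams is exactly what gets redistributed to unseen words, the returned probabilities still sum to 1. The same step can be repeated from trigrams to bigrams for longer contexts.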

Illustration

The app compiles all the possible guesses into a word cloud, as illustrated below (input: 'her').

Conclusions

To conclude, the app is built on a simple model, which makes it fairly generalizable and easy to use.

Hope you enjoy playing around with it :)