Shiny App Word Predictor

Chris Cox

Background

The goal is to create a predictive model that takes in user text and predicts the next word, with the aim of making the user's typing more efficient.

This app was built to let the user explore the predictive model and consider ways to improve its accuracy. The app:

  • suggests 3 next words for the entered phrase, using the prototype predictive model and a sentiment adjustment
  • shows the sentiment of the phrase, both as a numeric score and as a label (positive, neutral or negative)
  • displays a circular barplot of the top 20 words and their probabilities
  • displays a word cloud of up to 100 words
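
As a rough illustration of the sentiment label, the sketch below maps a numeric score to positive, neutral or negative. The helper name `label_sentiment` is hypothetical, and the ±0.25 cut-offs are an assumption borrowed from the suggestion code later in this deck; the app itself may label scores differently.

```r
# Hypothetical helper: map a numeric sentiment score to a label.
# The cut-off of 0.25 is an assumption, not a confirmed app setting.
label_sentiment <- function(score, cutoff = 0.25) {
  ifelse(score >= cutoff, "positive",
         ifelse(score <= -cutoff, "negative", "neutral"))
}

# Three made-up scores: clearly negative, near zero, clearly positive.
label_sentiment(c(-0.6, 0.1, 0.4))
```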

Prediction Model

The following steps were taken to build the prediction model using the Stupid Back-Off method:

  1. Load and clean the data from 3 datasets (blog, news, twitter)
  2. Build a training and test set by subsetting the provided datasets
  3. Use sbo::kgram_freqs_fast to build the prediction model
  4. With a 10,000-word dictionary, accuracy was 30% on the test dataset, run time was ~2 sec, and file size was 382 MB
  5. 75% of the correct predictions came from 708 words. With a 708-word dictionary, accuracy is 27%, run time is 0.7 sec, and file size is 106 MB
  6. Using the 1,688 words that account for 85% of the correct predictions improved accuracy by only 1%
  7. Final decision: use 708 words, which is a good balance of accuracy, size, and run time
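
The dictionary-size reasoning in steps 5 and 6 can be sketched in base R as follows. This is only an illustration on made-up data: `correct_words` and `words_needed` are hypothetical names, and the real analysis was run on the model's correct predictions for the test set.

```r
# Toy vector of words the model predicted correctly (made-up data).
correct_words <- c("the", "the", "the", "the", "the", "to", "to", "to",
                   "and", "and", "of", "time", "day")

# Count how often each word was a correct prediction, most frequent first.
counts <- sort(table(correct_words), decreasing = TRUE)

# Cumulative share of correct predictions covered by the top-k words.
coverage <- cumsum(as.vector(counts)) / sum(counts)

# Smallest number of words whose predictions reach the target share.
words_needed <- function(coverage, target) which(coverage >= target)[1]

words_needed(coverage, 0.75)
words_needed(coverage, 0.85)
```

In this toy data, raising the target from 75% to 85% increases the dictionary from 3 to 5 words, which mirrors the trade-off above: a much larger dictionary for a small gain in coverage.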

# Train using the sbo::kgram_freqs_fast function.
library(sbo)
small_train_freq <- kgram_freqs_fast(train, N = 4, dict = max_size ~ 708, EOS = ".?!:;")
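
To make the Stupid Back-Off idea concrete, here is a minimal base R sketch on a made-up corpus. It is not the sbo package's implementation: the function names (`ngram_counts`, `get_count`, `sbo_score`) are hypothetical, the model only goes up to trigrams, and the back-off penalty `lambda = 0.4` is an assumed, commonly used value.

```r
# Toy corpus (made-up); each element is one tokenised sentence.
corpus <- list(
  c("i", "had", "the", "best", "time"),
  c("we", "had", "the", "best", "day"),
  c("she", "had", "the", "best", "time"),
  c("the", "time", "is", "now")
)

# Count all n-grams of order n, keyed by space-joined tokens.
ngram_counts <- function(sentences, n) {
  grams <- unlist(lapply(sentences, function(s) {
    if (length(s) < n) return(character(0))
    vapply(seq_len(length(s) - n + 1),
           function(i) paste(s[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
  tab <- table(grams)
  setNames(as.numeric(tab), names(tab))
}

# counts[[1]] holds unigrams, counts[[2]] bigrams, counts[[3]] trigrams.
counts <- lapply(1:3, function(n) ngram_counts(corpus, n))

# Look up a count, returning 0 for unseen n-grams.
get_count <- function(tokens) {
  tab <- counts[[length(tokens)]]
  key <- paste(tokens, collapse = " ")
  if (key %in% names(tab)) tab[[key]] else 0
}

# Score `word` after `context`: use the relative frequency at the highest
# order that has been seen; otherwise drop the first context word and
# multiply the lower-order score by lambda.
sbo_score <- function(word, context, lambda = 0.4) {
  context <- tail(context, length(counts) - 1)  # keep at most 2 context words
  if (length(context) == 0) {
    return(get_count(word) / sum(counts[[1]]))
  }
  num <- get_count(c(context, word))
  if (num > 0) return(num / get_count(context))
  lambda * sbo_score(word, context[-1], lambda)
}

# Scores for three candidate words after "the best".
sapply(c("time", "day", "now"), sbo_score, context = c("the", "best"))
```

Here "time" and "day" are scored directly from trigram counts, while "now" never follows "the best" or "best" in the toy corpus, so its score backs off twice to the unigram frequency and is penalised by `lambda` each time.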

3 Word Suggestion Algorithm

Phrase sentiment is used to try to improve accuracy: when candidate words have equal probabilities, sentiment determines their order.

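library(dplyr)       # anti_join, left_join, filter, arrange, %>%
library(sentimentr)  # assumed source of sentiment()

s <- sentiment(phrase)                   # phrase sentiment score
pt <- predict(small_train_freq, phrase)  # candidate words with probabilities
colnames(pt) <- c("word", "probability")
# drop bad words and the end-of-sentence token, attach word sentiment values
pt <- anti_join(pt, badwords, by = "word") %>%
  left_join(sentim, by = "word") %>%
  filter(word != "<EOS>")
if (s$sentiment > -0.25 && s$sentiment < 0.25) {
  # neutral phrase: unscored words get 1; ties go to higher sentiment values
  pt[is.na(pt)] <- 1
  pt <- arrange(pt, desc(probability), desc(value))
} else {
  pt[is.na(pt)] <- 0  # unscored words count as neutral
  if (s$sentiment >= 0.25) {
    pt <- arrange(pt, desc(probability), desc(value))  # positive phrase
  } else {
    pt <- arrange(pt, desc(probability), value)        # negative phrase
  }
}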
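
The effect of the tie-breaking rule is easier to see on a small table. The sketch below uses base R and made-up words, probabilities and sentiment values; `order_suggestions` is a hypothetical helper that mirrors the ordering logic, under the assumption that a score of exactly ±0.25 counts as non-neutral.

```r
# Toy prediction table (made-up words, probabilities and sentiment values).
pt <- data.frame(
  word        = c("time", "day", "worst", "dream"),
  probability = c(0.30, 0.20, 0.20, 0.20),
  value       = c(NA, 1, -2, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sort by probability first, and use the word's
# sentiment value only to break ties between equal probabilities.
order_suggestions <- function(pt, phrase_sentiment, cutoff = 0.25) {
  neutral <- abs(phrase_sentiment) < cutoff
  # Words missing from the sentiment lexicon get 1 (neutral phrase) or 0.
  pt$value[is.na(pt$value)] <- if (neutral) 1 else 0
  # For a negative phrase, prefer lower sentiment values among ties.
  direction <- if (!neutral && phrase_sentiment < 0) 1 else -1
  pt[order(-pt$probability, direction * pt$value), ]
}

# Top 3 suggestions for a positive and for a negative phrase.
head(order_suggestions(pt, 0.5)$word, 3)
head(order_suggestions(pt, -0.5)$word, 3)
```

In both cases "time" comes first because it has the highest probability. Among the three tied words, the positive phrase ranks "dream" ahead of "day", while the negative phrase ranks "worst" ahead of "day".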

The App

Type a phrase and press Enter to see the top 3 suggested words. For example, entering the phrase “i've had the best” gives the suggested words “time, day, dream”.

The app also shows 3 plots: the phrase sentiment, a circular barplot of the top 20 words, and a word cloud of up to 100 words.