Word Prediction for SwiftKey

Kier O'Neil
2017-09-02

How does the predictive model work?

The Word Genie uses a large volume of US news, blogs, and tweets to predict the next word.

All text preparation was done using the tidytext package.
The user enters 1 to 4 words and presses the [Submit] button.
The submitted text is cleaned and compared to an ordered dataset.
The top results are listed in order of probability.
The app uses the “Stupid Backoff” method for word prediction.
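
The Stupid Backoff step can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the app's actual code: the `counts` table is toy data, the function name `stupid_backoff` is hypothetical, and the backoff factor of 0.4 is the commonly used default (the value used by the Word Genie is not stated). Note that Stupid Backoff yields relative scores rather than normalized probabilities.

```r
# Toy n-gram counts (hypothetical data, for illustration only).
counts <- c(
  "the" = 5, "cat" = 3, "dog" = 2, "sat" = 2,
  "the cat" = 3, "the dog" = 2, "cat sat" = 2,
  "the cat sat" = 2
)
# Total number of unigram tokens: entries whose name contains no space.
total_unigrams <- sum(counts[!grepl(" ", names(counts))])

# Score `word` given `context` (a character vector of preceding words).
# If the full n-gram was seen, use its relative frequency; otherwise
# drop the oldest context word and discount the result by `alpha`.
stupid_backoff <- function(word, context, alpha = 0.4) {
  if (length(context) == 0) {
    n <- counts[word]
    return(if (is.na(n)) 0 else unname(n) / total_unigrams)
  }
  ctx  <- paste(context, collapse = " ")
  full <- paste(ctx, word)
  if (!is.na(counts[full]) && !is.na(counts[ctx])) {
    return(unname(counts[full] / counts[ctx]))
  }
  alpha * stupid_backoff(word, context[-1], alpha)
}

# Rank a few candidate next words for the input "the cat".
candidates <- c("sat", "dog", "cat")
scores <- sapply(candidates, stupid_backoff, context = c("the", "cat"))
ranked <- sort(scores, decreasing = TRUE)
print(ranked)
```

A seen n-gram always outranks a candidate that is only reachable by backing off, because each backoff step multiplies the score by `alpha`.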

How does the prediction algorithm perform quantitatively?

As a performance test, I ran 1,000 iterations, each with a new sample from each of the datasets.

The mean and the standard deviation below are in seconds.

performance_output <- readRDS("performance_output.RDS")
print(performance_output)

  word_num       mean          sd
1      two 0.02257417 0.010906659
2    three 0.03093342 0.011440817
3     four 0.02700870 0.009176452
4     five 0.02323461 0.009941166
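
A timing test of this kind could be set up roughly as follows. This is a sketch only: `predict_next` is a hypothetical stand-in for the app's prediction function, and `time_predictions` and its arguments are illustrative names rather than the code that produced the table above.

```r
# Hypothetical stand-in for the app's prediction function.
predict_next <- function(text) {
  words <- strsplit(tolower(text), "\\s+")[[1]]
  rev(words)[1]
}

# Time `n_iter` predictions, drawing a fresh input each iteration,
# and summarize the elapsed wall-clock time in seconds.
time_predictions <- function(inputs, n_iter = 1000) {
  elapsed <- vapply(seq_len(n_iter), function(i) {
    input <- sample(inputs, 1)
    unname(system.time(predict_next(input))["elapsed"])
  }, numeric(1))
  data.frame(mean = mean(elapsed), sd = sd(elapsed))
}

set.seed(42)
result <- time_predictions(c("thanks for the", "i would like to"), n_iter = 100)
print(result)
```

Running the function once per input length would give one row per `word_num`, matching the layout of the summary above.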

How can you show the user how the product works?

Access the Word Genie here

The user will enter one to four words and press the submit button.

The model cleans and standardizes the entered text and then compares it to the appropriate dataset based on the number of words entered.
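
The cleaning and dataset-selection step might look like the sketch below. The function name `clean_input`, the specific cleaning rules, and the mapping from word count to n-gram order are all assumptions made for illustration; the app's actual rules are not described here.

```r
# Standardize user input and keep at most the last `max_words` words.
clean_input <- function(text, max_words = 4) {
  text  <- tolower(text)
  text  <- gsub("[^a-z' ]", " ", text)   # keep letters, apostrophes, spaces
  words <- strsplit(trimws(text), "\\s+")[[1]]
  words <- words[nzchar(words)]          # drop any empty tokens
  tail(words, max_words)
}

words <- clean_input("  Thanks, for THE  ")
print(words)

# Assumption: an input of k words is matched against the (k + 1)-gram
# dataset, so that the final word of each n-gram is the prediction.
ngram_order <- length(words) + 1
print(ngram_order)
```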

Word Genie User Interface

References

Julia Silge and David Robinson (2017-05-07). Text Mining with R: A Tidy Approach. http://tidytextmining.com/index.html