TEXT PREDICTOR

Hai Bacti
November 21, 2020

Applying text mining packages to mimic the SwiftKey application on a web platform.

Available at https://bacti.shinyapps.io/text-predictor/

Goals Of The Application

💯 Memory. Because the shinyapps.io Free Plan limits memory to 1 GB, we load only 10% of the datasets to build the n-gram corpora.

💯 Performance. The app should load quickly and respond instantly with predictions as the user types.

💯 Efficiency. The application should provide relevant predictions, ranked by a reliable metric that is computed while the n-grams are processed.

💯 User Experience. The app should favor pure usability over the art of visual composition.
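
As an illustration of the memory goal, keeping roughly 10% of the input lines could look like the minimal sketch below. The slides do not say how the 10% is selected, so the per-line random draw, the function name `sample_lines`, and the seed value are all assumptions made for this example.

```python
import random


def sample_lines(lines, fraction=0.10, seed=2020):
    """Keep each line with probability `fraction` (hypothetical helper).

    A fixed seed makes the sample reproducible between runs.
    """
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < fraction]


if __name__ == "__main__":
    corpus = [f"line {i}" for i in range(1000)]
    subset = sample_lines(corpus)
    # The subset size varies around 10% of the input.
    print(len(subset), "of", len(corpus), "lines kept")
```

Drawing per line keeps memory use flat, since the full dataset never has to be held in memory when `lines` is a file iterator.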

How Does It Work?

👉 We build sets of \( N \)-grams from the corpora, then split each one into its leading \( (N-1) \)-gram and its final word, which serve as the predictor and the outcome word respectively.

👉 For performance, we consider only bigrams and trigrams.

👉 We then estimate the probability of each outcome word by maximum likelihood, as a relative frequency under a Markov assumption.
\( Pr(w_n|w_{n-N+1}^{n-1}) = {count(w_{n-N+1}^{n-1}w_n) \over count(w_{n-N+1}^{n-1})} \)

👉 For each predictor, the top predictions are the outcome words with the highest frequency.
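
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the app's actual code: the whitespace tokenizer, the function names `build_table` and `predict`, and the toy sentences are all hypothetical, and the denominator is taken as the number of times the predictor is followed by any word.

```python
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase and split on whitespace; real cleaning would do more.
    return text.lower().split()


def build_table(sentences, n):
    """Map each (n-1)-gram predictor to a Counter of its outcome words."""
    table = defaultdict(Counter)
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i in range(len(tokens) - n + 1):
            predictor = tuple(tokens[i:i + n - 1])
            outcome = tokens[i + n - 1]
            table[predictor][outcome] += 1
    return table


def predict(table, predictor, k=3):
    """Return the top-k (word, probability) pairs for a predictor."""
    counts = table.get(tuple(predictor))
    if not counts:
        return []
    # Relative frequency: count(predictor + word) / count(predictor).
    total = sum(counts.values())
    return [(word, c / total) for word, c in counts.most_common(k)]


if __name__ == "__main__":
    sentences = ["i love you", "i love cats", "i love you so much", "you love me"]
    bigrams = build_table(sentences, 2)
    trigrams = build_table(sentences, 3)
    print(predict(bigrams, ["love"]))        # "you" ranks first
    print(predict(trigrams, ["i", "love"]))  # "you" ranks first
```

Storing the outcome counts per predictor means a prediction is a single dictionary lookup followed by a small sort, which suits the instant-response goal.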

The UX Design

👍 Predicted words appear instantly as the user types.
👍 Recommendation strength decreases from left to right.

The UX Design (cont.)

👍 Complete the sentence by tapping a suggested word.
👍 Or by tapping the same word in the word cloud.

THANK YOU!!