by Herchu
Sleek design. Shows or hides profanity.
Works in all modern (JavaScript-enabled) browsers. Responsive layout: tested on an iPad and an iPhone.
4-gram Model with Linear Interpolation Smoothing
\[ \begin{aligned} P(w_n|w_{n-1},\dotsc,w_{n-3}) ={} & \lambda_0 P(w_n) + \lambda_1 P(w_n|w_{n-1}) + \dotsb + {} \\ & \lambda_3 P(w_n|w_{n-1},\dotsc,w_{n-3}) \end{aligned} \]
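As a toy illustration of the formula (all numbers below are invented, not the app's actual estimates), the interpolated probability is just a weighted sum of the four maximum-likelihood estimates:

```r
lambda <- c(0.1, 0.2, 0.3, 0.4)       # (lambda_0, ..., lambda_3), summing to 1
p_mle  <- c(0.001, 0.05, 0.20, 0.35)  # P(w), P(w|w-1), P(w|w-1,w-2), P(w|w-1,...,w-3)
sum(lambda * p_mle)                   # interpolated probability: 0.2101
```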
30,000-word dictionary; unigram to tetragram tables. The 2-, 3-, and 4-gram tables have 1 million entries each (MLE; frequencies ≥ 3).
n-gram tuples include begin-of-sentence tokens.
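One way this padding can be pictured (a minimal sketch; `bos_pad` and the `<s>` token are illustrative names, not necessarily the app's own code):

```r
# Pad each sentence so its first words still form complete 4-grams.
bos_pad <- function(tokens, n = 4) c(rep("<s>", n - 1), tokens)

tokens <- bos_pad(c("the", "cat", "sat"))
embed(tokens, 4)[, 4:1]  # every 4-gram tuple, including the <s>-padded ones
#      [,1]  [,2]  [,3]  [,4]
# [1,] "<s>" "<s>" "<s>" "the"
# [2,] "<s>" "<s>" "the" "cat"
# [3,] "<s>" "the" "cat" "sat"
```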
Achieves 16% accuracy for the first predicted word and 26% within the best three, as scored independently by benchmark.R.
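A generic sketch of how such first-word and top-3 scores are computed (this is not benchmark.R itself; the predictions and truths below are made up):

```r
truth <- c("cat", "dog", "sat")
top3  <- list(c("cat", "the", "a"),   # hit at rank 1
              c("a", "dog", "the"),   # hit at rank 2
              c("a", "the", "on"))    # miss
mean(mapply(function(p, t) p[1] == t, top3, truth))  # first-word accuracy: ~0.33
mean(mapply(function(p, t) t %in% p, top3, truth))   # top-3 accuracy:      ~0.67
```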
Web app hosted on shinyapps.io (developed in R).
Words in the n-gram tables are integer-coded; storing integers instead of character strings results in a 50 MB total memory footprint.
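The coding step might look like this (a sketch; the tiny dictionary here stands in for the app's 30,000-word one):

```r
dict  <- c("the", "cat", "sat")              # in the app: ~30,000 words
words <- c("the", "cat", "sat", "the")
codes <- match(words, dict)                  # 1 2 3 1 -- one integer per word
as.numeric(object.size(codes)) < as.numeric(object.size(words))  # TRUE: integers are smaller
```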
One line in R (fast!) gets the most probable next word:
# weight each probability column by its lambda, sum per row, take the top-ranked row
head(order(rowSums(sweep(ngrams, 2, weights, `*`)), decreasing = TRUE), n = 1)
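A self-contained toy run of that one-liner (probabilities invented): `ngrams` holds one row per candidate next word and one column per model order, and `weights` holds the lambdas.

```r
ngrams  <- rbind(cat = c(0.002, 0.04, 0.15, 0.30),
                 dog = c(0.003, 0.02, 0.05, 0.10))
weights <- c(0.1, 0.2, 0.3, 0.4)
best <- head(order(rowSums(sweep(ngrams, 2, weights, `*`)), decreasing = TRUE), n = 1)
rownames(ngrams)[best]                       # "cat" (score 0.1732 vs 0.0593)
```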
Weights \( (\lambda_0,\lambda_1,\lambda_2,\lambda_3) \) were eyeballed; optimizing them with COBYLA did not yield better results.
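For reference, a COBYLA attempt along those lines could be set up with the nloptr package; everything below (the held-out probabilities, the likelihood objective) is an assumed stand-in, not the project's actual tuning code.

```r
library(nloptr)
# One row per held-out test case; columns are the 1- to 4-gram
# probabilities of the true next word (values invented).
probs <- rbind(c(0.001, 0.03, 0.10, 0.25),
               c(0.002, 0.01, 0.08, 0.00),
               c(0.001, 0.05, 0.00, 0.00))
neg_loglik <- function(l) {
  l <- l / sum(l)                 # renormalize so the lambdas sum to 1
  -sum(log(probs %*% l))          # minimize negative log-likelihood
}
fit <- nloptr(x0 = rep(0.25, 4), eval_f = neg_loglik,
              lb = rep(1e-6, 4), ub = rep(1, 4),
              opts = list(algorithm = "NLOPT_LN_COBYLA",
                          xtol_rel = 1e-6, maxeval = 1000))
round(fit$solution / sum(fit$solution), 3)   # tuned (lambda_0, ..., lambda_3)
```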
Includes two optional extra features: