SwiftKey NLP: Next Word Predictor

Data Science Student | Johns Hopkins Capstone
June 2026

Slide 1 — The App

https://akshaisuresh.shinyapps.io/Capstone_Project/

What it does: Predicts your next word as you type, just like the autocomplete bar on a smartphone keyboard, using an n-gram language model with backoff.

How to use it:

Works on desktop and mobile browsers. No login required.

Given input “I want to ___”, the model:

Step	Action	Score
1	Look up quadgrams starting with “I want to”	freq(quad)/freq(tri)
2	No match → try trigrams starting with “want to”	× 0.4
3	No match → try bigrams starting with “to”	× 0.4²
4	No match → top unigrams	× 0.4³
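The backoff steps in the table can be sketched in a few lines. The deployed app is written in R, so this Python version is only an illustration under stated assumptions: the function names `build_counts` and `predict` are hypothetical, tokenisation is a plain whitespace split, and the lookup scans all counts rather than using an index. The 0.4 multiplier per level matches the discount commonly used in the "Stupid Backoff" scheme, which is probably what the table describes.

```python
from collections import Counter

ALPHA = 0.4  # per-level backoff multiplier, taken from the table above


def build_counts(sentences, max_order=4):
    """Count every n-gram of order 1..max_order (whitespace tokens, lowercased)."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for order in range(1, max_order + 1):
            for i in range(len(tokens) - order + 1):
                counts[tuple(tokens[i:i + order])] += 1
    return counts


def predict(counts, text, k=3, max_order=4):
    """Return up to k (word, score, order) tuples from the highest order that matches."""
    tokens = text.lower().split()
    total_unigrams = sum(c for g, c in counts.items() if len(g) == 1)
    for order in range(max_order, 0, -1):
        # The context is the last (order - 1) words of the input.
        context = tuple(tokens[len(tokens) - (order - 1):]) if order > 1 else ()
        if len(context) != order - 1:
            continue  # input is too short for this order
        # Penalty is measured from max_order, as in the table: 1, 0.4, 0.4^2, 0.4^3.
        penalty = ALPHA ** (max_order - order)
        denom = counts[context] if context else total_unigrams
        candidates = [
            (gram[-1], penalty * c / denom, order)
            for gram, c in counts.items()
            if len(gram) == order and gram[:-1] == context
        ]
        if candidates:
            # Highest score first; ties broken alphabetically for determinism.
            candidates.sort(key=lambda t: (-t[1], t[0]))
            return candidates[:k]
    return []


# Toy corpus, invented purely for illustration.
toy = ["i want to go home", "i want to go out", "i want to eat", "we need to sleep"]
toy_counts = build_counts(toy)
print(predict(toy_counts, "I want to"))     # quadgram match, order 4
print(predict(toy_counts, "they need to"))  # backs off to trigrams, order 3
```

Returning the order alongside each score is one way to show which n-gram level produced a prediction. A production version would index n-grams by context instead of scanning every count on each call.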

Source	Lines	Words	Style
Blogs	899K	37M	Long-form, personal
News	1.01M	34M	Formal, structured
Twitter	2.36M	30M	Short, conversational

Training used a 10% random sample (seed = 42) for speed and memory.
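A seeded 10% sample can be drawn as in the sketch below. It is illustrative only: the app's own sampling was done in R, the helper name `sample_lines` is hypothetical, and a seed of 42 in Python will not select the same lines as a seed of 42 in R.

```python
import random


def sample_lines(lines, fraction=0.10, seed=42):
    """Return a reproducible random subset of lines, keeping original order."""
    rng = random.Random(seed)              # local generator, so global state is untouched
    k = round(len(lines) * fraction)       # exact sample size
    keep = sorted(rng.sample(range(len(lines)), k))
    return [lines[i] for i in keep]


corpus = [f"line {i}" for i in range(1000)]  # stand-in for the real corpus files
subset = sample_lines(corpus)
print(len(subset))
```

Fixing the seed means the same subset is drawn on every run, which keeps the downstream n-gram tables reproducible.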

  127 words  →  50% of all text covered
6,694 words  →  90% of all text covered

This means a vocabulary of roughly 10,000 words covers more than 90% of the word occurrences in the sample. Rarer words are pruned without meaningfully hurting accuracy.
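Coverage figures like these come from sorting words by frequency and accumulating counts until the target share of all word occurrences is reached. A minimal sketch, assuming a pre-tokenised word list and a hypothetical helper name `words_needed`:

```python
from collections import Counter


def words_needed(tokens, target):
    """Smallest number of most-frequent words covering `target` share of all tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    covered = 0
    for rank, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered / total >= target:
            return rank
    return len(counts)  # reached only for an empty input or a target above 1


# Toy token list, invented for illustration.
toy_tokens = ["the"] * 5 + ["a"] * 3 + ["cat", "dog"]
print(words_needed(toy_tokens, 0.5))
print(words_needed(toy_tokens, 0.8))
```

Running the same function over the sampled corpus with targets 0.5 and 0.9 would give the kind of figures quoted above.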

“Feels like SwiftKey, but in a browser. Start typing any news headline or tweet; by the third word, predictions are already on target.”

Test phrases (try these):

Benchmarked on 5,000 held-out Twitter sentences.
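The slide does not say which metric the benchmark reports. One common choice for next-word predictors is top-k accuracy: for every position in a held-out sentence, check whether the true next word is among the model's k suggestions. The sketch below assumes that metric; `top_k_accuracy` and the `predict_fn(context_tokens, k)` interface are hypothetical names, not the app's API.

```python
def top_k_accuracy(predict_fn, sentences, k=3):
    """Share of next-word positions where the true word is in the top-k guesses."""
    hits = 0
    trials = 0
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i in range(1, len(tokens)):
            guesses = predict_fn(tokens[:i], k)  # predict word i from words 0..i-1
            hits += tokens[i] in guesses
            trials += 1
    return hits / trials if trials else 0.0


def dummy_predictor(context_tokens, k):
    """Stand-in model for illustration: always suggests the same words."""
    return ["to", "the"][:k]


held_out = ["i want to go", "to the moon"]  # invented examples
print(top_k_accuracy(dummy_predictor, held_out))
```

Any predictor that maps a token list to a ranked list of words can be passed in, so the same harness would work for a backoff model or a replacement.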

✅ Fully deployed: live URL, no setup needed

✅ Robust: always returns a prediction, falling back to the top unigrams when no longer n-gram matches

✅ Transparent: shows which n-gram order fired

✅ Extensible: swap in Kneser-Ney or a neural LM with zero UI changes

✅ Open-source R: reproducible, documented, ready to scale

Built with R · data.table · Shiny · tidytext
Data: HC Corpora (SwiftKey / Johns Hopkins)