Trigram Next-Word Prediction
========================================================
author: Alejandra
date: March 27, 2025

Data Science Capstone Project
Introduction
========================================================
- **Project**: Trigram Next-Word Prediction App
- **Goal**: Suggest the next word a user is likely to type
- **Motivation**: Improve typing efficiency and simulate predictive keyboards
- **Dataset**: Blogs, news, and Twitter from the SwiftKey corpus
- **Built with**: R, quanteda, Shiny, data.table
Data Processing
========================================================
- Sampled 10,000 lines from each text source (blogs, news, Twitter)
- Removed punctuation, numbers, symbols, and URLs
- Converted text to lowercase
- Tokenized into **trigrams** (three-word phrases)
- Counted frequency of trigrams → exported to CSV (`trigram_freq.csv`)
```{r, echo=FALSE}
library(quanteda)

# Clean and tokenize a sample sentence, lowercase it, and form trigrams
text <- "This is a sample sentence for tokenization."
toks <- tokens(text, remove_punct = TRUE, remove_numbers = TRUE,
               remove_symbols = TRUE, remove_url = TRUE)
toks <- tokens_tolower(toks)
toks_trigrams <- tokens_ngrams(toks, n = 3)

# Document-feature matrix of trigram counts
dfm_trigrams <- dfm(toks_trigrams)
```
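
Sketched below (not the exact capstone script) is how the trigram counts from the document-feature matrix above could be split into a two-word `phrase` plus a `next_word` column and written to `trigram_freq.csv`; the column names match those used on the prediction slide.

```{r, eval=FALSE}
library(data.table)

# Count each trigram, then split "w1_w2_w3" into a key phrase and its next word
freq  <- colSums(dfm_trigrams)
parts <- strsplit(names(freq), "_", fixed = TRUE)

trigram_freq <- data.table(
  phrase    = vapply(parts, function(w) paste(w[1], w[2]), character(1)),
  next_word = vapply(parts, function(w) w[3], character(1)),
  count     = as.integer(freq)
)

# Most frequent trigrams first, then export for the Shiny app
setorder(trigram_freq, -count)
fwrite(trigram_freq, "trigram_freq.csv")
```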
Prediction Algorithm
========================================================
- **Input**: User types any phrase (2+ words)
- **Algorithm**:
  1. Extract the last two words of the input
  2. Match them against the `phrase` column of the trigram frequency table
  3. Return the most frequent `next_word`
```{r, echo=FALSE}
predict_next_word <- function(input_phrase, trigram_table) {
  # Lowercase the input and keep only its last two words
  last_two <- tail(strsplit(tolower(input_phrase), " ")[[1]], 2)
  input_key <- paste(last_two, collapse = " ")
  # Filter the data.table on its `phrase` column and return the top next_word
  matches <- trigram_table[phrase == input_key]
  matches$next_word[1]
}
```
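
For example, with the frequency table loaded as a `data.table` called `trigram_freq`, the call `predict_next_word("I want to", trigram_freq)` keys on `"want to"` and returns the most frequent completion recorded for it.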
Shiny App Demo
========================================================
🌐 Deployed App:
[https://duarfel.shinyapps.io/app_ale/](https://duarfel.shinyapps.io/app_ale/)
### How it Works:
- Enter a phrase like:
`I want to`
`Let me know`
`Can you please`
- Click **Predict**
- View predicted next word below
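A minimal Shiny skeleton along these lines could drive the interface; this is an illustrative sketch rather than the deployed app's actual source, and it assumes the `trigram_freq.csv` file and `predict_next_word()` function from the previous slides.

```{r, eval=FALSE}
library(shiny)
library(data.table)

trigram_freq <- fread("trigram_freq.csv")

ui <- fluidPage(
  textInput("phrase", "Enter a phrase:", value = "I want to"),
  actionButton("go", "Predict"),
  textOutput("prediction")
)

server <- function(input, output) {
  output$prediction <- renderText({
    input$go                                   # re-run when Predict is clicked
    isolate(predict_next_word(input$phrase, trigram_freq))
  })
}

# shinyApp(ui, server)
```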
Conclusion & Future Work
========================================================
✅ Completed:
- Cleaned and sampled data from multiple corpora
- Built trigram model
- Deployed live app using Shiny
🚀 Next Steps:
- Back-off to bigrams/unigrams when no trigram matches (see the sketch below)
- Add multiple next-word suggestions
- Experiment with neural net language models
- Mobile optimization
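
One possible shape for the planned back-off, using a hypothetical `bigram_table` with the same `phrase`/`next_word` layout (not part of the current app):

```{r, eval=FALSE}
predict_with_backoff <- function(input_phrase, trigram_table, bigram_table) {
  # Try the trigram model first
  word <- predict_next_word(input_phrase, trigram_table)
  if (!is.na(word)) return(word)
  # Back off to bigrams: key on the single last word
  last_word <- tail(strsplit(tolower(input_phrase), " ")[[1]], 1)
  word <- bigram_table[phrase == last_word]$next_word[1]
  if (!is.na(word)) return(word)
  # Final fallback: most frequent next word across all trigrams
  names(sort(table(trigram_table$next_word), decreasing = TRUE))[1]
}
```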
Thank you! 🙌
**Contact Alejandra**
[duarfel.shinyapps.io/app_ale](https://duarfel.shinyapps.io/app_ale)