Title Slide

========================================================

author: Alejandra

date: March 27, 2025

output: ioslides_presentation

Trigram Next-Word Prediction

Data Science Capstone Project


Slide 1

========================================================

## Introduction

- **Project**: Trigram Next-Word Prediction App

- **Goal**: Suggest the next word a user is likely to type

- **Motivation**: Improve typing efficiency and simulate predictive keyboards

- **Dataset**: Blog, news, and Twitter text from the SwiftKey corpus

- **Built with**: R, quanteda, Shiny, data.table


Slide 2

========================================================

## Data Processing

- Sampled 10,000 lines from each text source (blogs, news, Twitter)

- Removed punctuation, numbers, symbols, and URLs

- Converted text to lowercase

- Tokenized into **trigrams** (three-word phrases)

- Counted trigram frequencies and exported them to CSV (`trigram_freq.csv`)

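```{r, echo=FALSE}
library(quanteda)

# Toy input standing in for the sampled corpus lines
text <- c("This is a sample sentence for tokenization.")

# Tokenize and drop punctuation, numbers, symbols, and URLs
toks <- tokens(text, remove_punct = TRUE, remove_numbers = TRUE,
               remove_symbols = TRUE, remove_url = TRUE)
toks <- tokens_tolower(toks)

# Build trigrams (words joined with "_" by default)
toks_trigrams <- tokens_ngrams(toks, n = 3)

# Document-feature matrix of trigram counts; named so it does not mask dfm()
trigram_dfm <- dfm(toks_trigrams)
```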
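
The counting and export step in the list above could look roughly like the sketch below. It is an illustration, not the project's actual script: the toy sentences, the `freq` column name, and the space concatenator are assumptions. Only the `phrase` and `next_word` column names and the `trigram_freq.csv` file name come from the slides.

```r
library(quanteda)
library(data.table)

# Toy corpus standing in for the sampled lines (made-up text)
text <- c("thanks for the follow",
          "thanks for the help",
          "thanks for the follow back")

toks <- tokens(text, remove_punct = TRUE, remove_numbers = TRUE,
               remove_symbols = TRUE, remove_url = TRUE)
toks <- tokens_tolower(toks)

# Join trigram words with a space so the last word is easy to split off
trigrams <- tokens_ngrams(toks, n = 3, concatenator = " ")

# Total count of each trigram across all documents
counts <- colSums(dfm(trigrams))

trigram_freq <- data.table(
  phrase    = sub(" [^ ]+$", "", names(counts)),  # first two words
  next_word = sub("^.* ", "", names(counts)),     # third word
  freq      = as.integer(counts)                  # assumed column name
)

# Most frequent continuation first within each phrase
setorder(trigram_freq, phrase, -freq)

fwrite(trigram_freq, "trigram_freq.csv")
```

Sorting by descending frequency within each `phrase` means a lookup can simply take the first matching row.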


Slide 3

========================================================

## Prediction Algorithm

- **Input**: User types any phrase (2+ words)

- **Algorithm**:

  1. Extract last two words

  2. Match against `phrase` column in trigram frequency table

  3. Return most frequent `next_word`

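```{r, echo=FALSE}
library(data.table)

# trigram_table: data.table with columns `phrase` (first two words) and
# `next_word`, assumed sorted by descending frequency within each phrase
predict_next_word <- function(input_phrase, trigram_table) {
  # Step 1: keep the last two words of the lowercased input
  words <- strsplit(tolower(trimws(input_phrase)), "\\s+")[[1]]
  last_two <- paste(tail(words, 2), collapse = " ")

  # Step 2: match against the `phrase` column; the local variable must not
  # share the column's name, or the filter compares the column with itself
  matches <- trigram_table[phrase == last_two]

  # Step 3: the first row is the most frequent continuation (NA if no match)
  matches$next_word[1]
}
```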
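
To make the three steps concrete, here is a small walk-through on a toy table. The rows and counts are made up for illustration, and the `freq` column name is an assumption.

```r
library(data.table)

# Toy frequency table (made-up counts), sorted by descending freq per phrase
toy_table <- data.table(
  phrase    = c("for the", "for the", "thanks for"),
  next_word = c("follow", "help", "the"),
  freq      = c(2L, 1L, 3L)
)

input_phrase <- "Many thanks for the"

# 1. Extract the last two words
words <- strsplit(tolower(input_phrase), "\\s+")[[1]]
key <- paste(tail(words, 2), collapse = " ")

# 2. Match against the phrase column
hits <- toy_table[phrase == key]

# 3. Take the top row, which is the most frequent continuation
prediction <- hits$next_word[1]
print(prediction)  # "follow"
```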


Slide 4

========================================================

## Shiny App Demo

🌐 Deployed App:

[https://duarfel.shinyapps.io/app_ale/](https://duarfel.shinyapps.io/app_ale/)

### How it Works:

- Enter a phrase of two or more words

- Click **Predict**

- View predicted next word below
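
A minimal Shiny sketch of this enter, click, view flow is shown below. It is not the deployed app's source: the input IDs, the `lookup_next_word` helper, and the toy table are hypothetical and only illustrate how the pieces could be wired together.

```r
library(shiny)
library(data.table)

# Toy table standing in for trigram_freq.csv (made-up rows)
trigram_table <- data.table(
  phrase    = c("for the", "for the", "thanks for"),
  next_word = c("follow", "help", "the")
)

# Hypothetical helper: look up the last two words of the input
lookup_next_word <- function(input_phrase, tbl) {
  words <- strsplit(tolower(trimws(input_phrase)), "\\s+")[[1]]
  key <- paste(tail(words, 2), collapse = " ")
  tbl[phrase == key]$next_word[1]  # NA when nothing matches
}

ui <- fluidPage(
  textInput("phrase", "Enter a phrase (two or more words):"),
  actionButton("predict", "Predict"),
  textOutput("prediction")
)

server <- function(input, output, session) {
  # Recompute only when the Predict button is clicked
  result <- eventReactive(input$predict, {
    lookup_next_word(input$phrase, trigram_table)
  })

  output$prediction <- renderText({
    word <- result()
    if (is.na(word)) "No match found" else word
  })
}

app <- shinyApp(ui, server)
# In an interactive session, launch with runApp(app)
```

Using `eventReactive` ties the lookup to the button, so the prediction does not change while the user is still typing.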



Slide 5

========================================================

## Conclusion & Future Work

✅ Completed:

- Cleaned and sampled data from multiple corpora

- Built trigram model

- Deployed live app using Shiny

🚀 Next Steps:

- Back-off to bigrams/unigrams if no trigram match

- Add multiple next-word suggestions

- Experiment with neural net language models

- Mobile optimization
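
As a rough idea of how the first two next steps might fit together, the sketch below tries the trigram table first, backs off to bigrams and then unigrams, and returns up to `n` suggestions. It is a plain back-off with no discounting or scoring, and every table, count, and function name in it is hypothetical.

```r
library(data.table)

# Toy tables with made-up counts, each sorted by descending freq per phrase
trigrams <- data.table(phrase    = c("for the", "for the"),
                       next_word = c("follow", "help"),
                       freq      = c(2L, 1L))
bigrams  <- data.table(phrase    = c("the", "the", "for"),
                       next_word = c("follow", "help", "the"),
                       freq      = c(3L, 1L, 3L))
unigrams <- data.table(next_word = c("the", "for"),
                       freq      = c(10L, 5L))

# Hypothetical helper: try the longest context first, then back off
suggest_next_words <- function(input_phrase, n = 3) {
  words <- strsplit(tolower(trimws(input_phrase)), "\\s+")[[1]]
  key2 <- paste(tail(words, 2), collapse = " ")  # last two words
  key1 <- paste(tail(words, 1), collapse = " ")  # last word only

  hits <- trigrams[phrase == key2]
  if (nrow(hits) == 0) hits <- bigrams[phrase == key1]
  if (nrow(hits) == 0) hits <- unigrams          # most common words overall
  head(hits$next_word, n)
}

suggest_next_words("thanks for the")  # trigram match
suggest_next_words("over the")        # backs off to bigrams
suggest_next_words("hello world")     # backs off to unigrams
```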

Thank you! 🙌

**Contact Alejandra**

[duarfel.shinyapps.io/app_ale](https://duarfel.shinyapps.io/app_ale)