2024-12

About the app

The Text prediction app is a lightweight tool that improves text input efficiency by anticipating the next word. When a prediction is correct, users can select the word instead of typing it.

How does it work?

Predictions are based on word sequences found in nearly half a million lines of sample text. The samples consist of English-language blogs, news stories and Twitter/X posts. Here’s what the app does:

  1. isolates the last two words the user types
  2. finds the same combination of words in the sample text
  3. lists words that most often follow that combination

To give users choice, the app lists the top seven words by frequency.

Note: if there are fewer than seven matches for the last two words, the app completes the list with words that most often follow the last word alone.
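
The steps above, including the fallback to the last word, can be sketched in a few lines. This is an illustrative Python sketch, not the app's own code (the app is built with Shiny); the function names build_model and predict and the tiny sample corpus are assumptions made for the example.

```python
from collections import Counter, defaultdict


def build_model(lines):
    """Count which words follow each two-word and each one-word context."""
    after_pair = defaultdict(Counter)
    after_word = defaultdict(Counter)
    for line in lines:
        words = line.split()
        for i in range(1, len(words)):
            after_word[words[i - 1]][words[i]] += 1
            if i >= 2:
                after_pair[(words[i - 2], words[i - 1])][words[i]] += 1
    return after_pair, after_word


def predict(text, after_pair, after_word, n=7):
    """Return up to n candidate next words, most frequent first."""
    words = text.lower().split()
    candidates = []
    if len(words) >= 2:
        # Steps 1-3: look up the last two words and rank what follows them.
        followers = after_pair.get(tuple(words[-2:]), Counter())
        candidates = [w for w, _ in followers.most_common(n)]
    if len(candidates) < n and words:
        # Fallback: top up the list using the last word only.
        followers = after_word.get(words[-1], Counter())
        for w, _ in followers.most_common():
            if w not in candidates:
                candidates.append(w)
            if len(candidates) == n:
                break
    return candidates


# Hypothetical sample text standing in for the real corpus.
sample = ["i am happy", "i am happy", "i am here", "you am not"]
after_pair, after_word = build_model(sample)
print(predict("I am", after_pair, after_word))
```

In this toy corpus only two words follow "i am", so the list is topped up from words that follow "am" on its own. With a corpus of real size the two-word lookup would usually fill all seven slots by itself.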

Preparing the data

Social media posts can be full of spelling mistakes and profanity. To keep predictions clean and the app running smoothly, I prepared the sample data before use:

  • removed spelling mistakes
  • removed profanity
  • split hyphenated words
  • converted everything to lowercase
  • removed most punctuation and all numerals
  • replaced common contractions with expanded versions
    (I'm becomes I am)

Note: The rationale and code for data preparation can be found at https://sagarana.github.io/data_science_capstone/capstone_data_preparation.html
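
For a rough idea of what such a cleaning pass looks like, here is a minimal Python sketch. It is not the code from the linked document: the function name clean_line and the two small lookup tables are hypothetical, and it leaves out the spelling step, which would need a dictionary of known words.

```python
import re

# Hypothetical, tiny lookup tables; real lists would be much longer.
CONTRACTIONS = {"i'm": "i am", "don't": "do not", "it's": "it is"}
BLOCKED_WORDS = {"darn"}


def clean_line(line):
    """Normalise one line of sample text into lowercase words."""
    text = line.lower().replace("\u2019", "'")  # lowercase, unify apostrophes
    text = text.replace("-", " ")               # split hyphenated words
    text = re.sub(r"[^a-z' ]", " ", text)       # drop numerals and most punctuation
    words = [w.strip("'") for w in text.split()]
    words = [w for w in words if w and w not in BLOCKED_WORDS]
    # Expand contractions last, once stray punctuation is gone.
    return " ".join(CONTRACTIONS.get(w, w) for w in words)


print(clean_line("I'm well-known, since 2019!"))
```

Apostrophes are kept until the last step so that contractions can still be recognised after the rest of the punctuation has been removed.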

What next?

The Shiny app is intentionally simple to make it easier to incorporate into other interfaces. It showcases two ways to engage with the predictions (list and word cloud), though only one is likely to be used in any given context. Code for the Shiny app is available on GitHub.

The app is a start, but potential improvements could make it even more useful:

  • ability to select a word using the Tab and Enter keys
  • predictions based on type of word (e.g. noun, pronoun, verb)
  • predictions refined based on first letter typed
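
The last idea, refining predictions by the first letter typed, could be as simple as filtering a ranked list of candidates. The Python sketch below assumes the candidates are already ordered by frequency; the function name refine_by_prefix and the example words are hypothetical.

```python
def refine_by_prefix(candidates, typed):
    """Keep candidates that start with the letters typed so far, in rank order."""
    prefix = typed.lower()
    return [word for word in candidates if word.startswith(prefix)]


# Hypothetical ranked predictions, most frequent first.
ranked = ["happy", "here", "not", "home"]
print(refine_by_prefix(ranked, "h"))
```

Because the filter preserves order, the most frequent matching word stays at the top of the list as the user keeps typing.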

Thanks for your engagement!