2022-10-09
Introduction
- Mobile phones are increasingly used to communicate through emails, text messages, and/or social media
- To make typing on mobile phones easier, smart keyboards that use models to predict the next word have been developed
- Creating such an app is the Capstone project of Coursera’s Data Science Specialization
- This project required researching NLP (Natural Language Processing) techniques for processing text
- The project deliverable is a prediction model that uses the SwiftKey data files to predict a user’s next word
- The SwiftKey data used can be found here.
- For information on the raw data and text processing methods I used, see the Milestone Report
Markov Chain Models with Back-Off
- Markov chain models use n-grams (sequences of n consecutive words) to predict the next word
- A typical algorithm first looks for the most probable next word in the largest n-gram model that the entered text can supply context for
- If no prediction is found, the algorithm falls back to successively smaller n-gram models until a word is found
- This is known as the Back-Off method
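A minimal sketch of the back-off idea described above, in Python rather than the project's own code. It assumes whitespace tokenization, raw frequency counts, and a maximum n of 3; the function names `build_ngram_models` and `predict_next` are hypothetical, and no smoothing or discounting (as in Katz back-off) is applied.

```python
from collections import Counter, defaultdict


def build_ngram_models(sentences, max_n=3):
    """Map each n (1..max_n) to {context tuple: Counter of next words}."""
    models = {n: defaultdict(Counter) for n in range(1, max_n + 1)}
    for sentence in sentences:
        words = sentence.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                context = tuple(words[i:i + n - 1])  # first n-1 words
                next_word = words[i + n - 1]         # word to predict
                models[n][context][next_word] += 1
    return models


def predict_next(models, text):
    """Return (word, n) from the largest n-gram model that has a match."""
    words = text.lower().split()
    largest = min(max(models), len(words) + 1)
    for n in range(largest, 0, -1):
        # An n-gram model needs the last n-1 words as context.
        context = tuple(words[len(words) - (n - 1):]) if n > 1 else ()
        candidates = models[n].get(context)
        if candidates:
            return candidates.most_common(1)[0][0], n
    return None, 0  # nothing found, even in the unigram model


# Toy corpus, made up for illustration only.
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "a dog sat on the rug",
]
models = build_ngram_models(corpus)
print(predict_next(models, "sat on"))     # matched by the 3-gram model
print(predict_next(models, "purple on"))  # backs off to the 2-gram model
print(predict_next(models, "zebra"))      # backs off to the 1-gram model
```

The second return value records which model produced the prediction, which makes the back-off step visible.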
Testing
- Test data was processed using the same methods as training data
- 20 tests were run, each randomly selecting 50 lines from the test data, for a total of 1000 individual tests
- Each input was processed by every n-gram model of size equal to or smaller than the length of the entered text
- Results were compared to the actual next word in the test data
- Sample results for one 50-line test are shown below
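The sampling procedure described in this section could be sketched roughly as follows. This is an illustration, not the code used for the reported tests. It assumes the last word of each sampled line is the "actual next word" and the preceding words are the input, which the slides do not state. `split_line`, `run_tests`, and the `predict` callback are hypothetical names.

```python
import random


def split_line(line):
    """Assumption: input is all but the last word; target is the last word."""
    words = line.lower().split()
    if len(words) < 2:
        return None  # too short to form an input/target pair
    return " ".join(words[:-1]), words[-1]


def run_tests(test_lines, predict, n_runs=20, lines_per_run=50, seed=0):
    """Run n_runs tests, each on a random sample of lines_per_run lines."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    results = []
    for _ in range(n_runs):
        sample = rng.sample(test_lines, min(lines_per_run, len(test_lines)))
        predicted = 0
        correct = 0
        for line in sample:
            pair = split_line(line)
            if pair is None:
                continue
            text, actual = pair
            guess = predict(text)  # predict returns a word or None
            if guess is not None:
                predicted += 1
                if guess == actual:
                    correct += 1
        results.append({"predicted": predicted, "correct": correct})
    return results
```

With the default arguments and at least 50 usable lines, this yields 20 result rows covering 1000 sampled lines in total.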

Testing (Cont’d)
- The first table shows the final results of five 50-line tests
- PercNgCorrect = NoNgCorrect/NoNgPredicted
- The second table shows the overall results for all 20 test runs
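The PercNgCorrect ratio defined above can be computed per test and pooled across tests as in this small sketch. The function names and the counts in the example are hypothetical and are not results from the tests reported here. The ratio is returned as a fraction; multiply by 100 for a percentage.

```python
def perc_ng_correct(no_ng_correct, no_ng_predicted):
    """PercNgCorrect = NoNgCorrect / NoNgPredicted, or None if nothing was predicted."""
    if no_ng_predicted == 0:
        return None  # avoid dividing by zero when a model made no predictions
    return no_ng_correct / no_ng_predicted


def pooled_perc_ng_correct(runs):
    """Pool (correct, predicted) counts over several test runs for one model."""
    total_correct = sum(correct for correct, _ in runs)
    total_predicted = sum(predicted for _, predicted in runs)
    return perc_ng_correct(total_correct, total_predicted)


# Made-up counts for two runs of a single n-gram model: (correct, predicted).
example_runs = [(5, 20), (10, 40)]
print(perc_ng_correct(5, 20))                # 0.25
print(pooled_perc_ng_correct(example_runs))  # 0.25
```

Pooling the counts weights each run by how many predictions it made, which can differ from averaging the per-run ratios when the runs predict different numbers of words.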

Shiny App
- Seeing the different words predicted by each model during testing was interesting to me
- Thinking others may find it interesting as well, I decided to return the same information with my app
- My Shiny App will show you what each n-gram model predicts based on the text you submit
- Give it a try and see which n-gram model looks the most accurate to you
- My Shiny App