The Next Word Is!
Data Science Capstone Project
Avinayan
Predicting the Next Word
- The aim of this project is to predict the next word of a sentence or phrase.
- Pitch:
- In today's world of smartphones and other small form-factor devices, keypads keep shrinking.
- This makes typing harder.
- Companies like SwiftKey address this by predicting what the user is likely to type next, which improves the typing experience.
- In this project, we will use data science to predict the user's next word with reasonable accuracy.
Model Strategy and Algorithm
- To begin with, the data provided had to be prepared for the algorithm.
- This involved using the tm package on the sample data.
- The preprocessing steps were: converting to lower case; removing numbers, punctuation, stopwords and profanity; and reducing words to their stems.
- The corpus was then converted into n-gram tokens. (n = 1, 2, 3, 4)
- The probability of occurrence of each n-gram was computed, and the n-grams were sorted by that probability.
- This process was repeated and refined to get a good n-gram model.
- The n-grams, ranked by their probability scores, form the basis for predicting the next word.
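The steps above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the author's R/tm code. The names `preprocess`, `build_model` and `predict_next`, the tiny `STOPWORDS` set, and the rule of falling back to shorter contexts when a longer one is unseen are all assumptions made for illustration. Stemming and profanity filtering are omitted.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"the", "a", "an", "of", "to", "is"}


def preprocess(text, stopwords=STOPWORDS):
    """Lowercase, replace digits and punctuation with spaces, drop stopwords."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [w for w in text.split() if w not in stopwords]


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def build_model(sentences, max_n=4):
    """Map each context (the first n-1 words of an n-gram) to a Counter
    of the words that followed it, for n = 1..max_n."""
    model = defaultdict(Counter)
    for sentence in sentences:
        tokens = preprocess(sentence)
        for n in range(1, max_n + 1):
            for gram, count in ngram_counts(tokens, n).items():
                model[gram[:-1]][gram[-1]] += count
    return model


def predict_next(model, phrase, max_n=4):
    """Return (word, relative frequency) for the most frequent follower of
    the longest context seen in training; shorter contexts are tried next
    (assumed fallback rule). Returns (None, 0.0) if the model is empty."""
    tokens = preprocess(phrase)
    for k in range(max_n - 1, -1, -1):  # longest context first
        if k > len(tokens):
            continue
        context = tuple(tokens[len(tokens) - k:]) if k else ()
        followers = model.get(context)
        if followers:
            word, count = followers.most_common(1)[0]
            return word, count / sum(followers.values())
    return None, 0.0


corpus = [
    "I love data science",
    "I love data analysis",
    "I love data science projects",
]
model = build_model(corpus)
print(predict_next(model, "I love data"))
```

In this toy corpus the context "i love data" is followed by "science" twice and "analysis" once, so the sketch returns "science" with a relative frequency of 2/3. A phrase such as "big data" has no matching trigram or bigram context, so it falls back to the single-word context "data".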
Shiny App
- The final predictive model was deployed on the Shiny Server.
- The Link is provided here: The Next Word Is!
- How to Use this App:
- Enter your phrase or sentence in the input box and click Submit.
- The most likely next word for the phrase or sentence is displayed in the right-hand panel.
- Note: Please wait for a few extra seconds during the first attempt so that the model can load.
Conclusion and Acknowledgements
- This model provides a reasonable level of accuracy and is good at predicting commonly used words.
- Some future enhancements:
- Use cloud computing infrastructure for additional RAM capacity (which was limited on my laptop).
- Use ensemble methods to improve model accuracy.
- Use external data and data specific to the user (reinforcement learning) to improve prediction accuracy.
- Acknowledgements:
- Professors at JHU for the excellent content in this specialization and for providing the opportunity to apply in practice what was learnt.
- SwiftKey for the data, support and consulting throughout the Capstone project.