Mehmet İLİK
2022-09-05
Welcome to the text prediction app! This is the capstone project of the Johns Hopkins University Data Science Specialization on Coursera. The app predicts the next word based on the words the user has typed. When you run the app and type at least one word into the box, predictions are shown below the box. The app predicts the next word from the previous word or words you have typed, using n-gram models. The n-gram models were built from the SwiftKey dataset, which contains text in four languages. We used the English set and combined its three sources (Blogs, News, and Twitter). Models up to sixgrams are used to predict the next word, so once you have typed more than five words, only the last five are considered. There are up to three predictions, each from a different n-gram model, but sometimes a model does not produce a prediction. The first prediction can take some time, but subsequent predictions are much faster.
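The model-building step described above (one table per n-gram order, built from the combined Blogs, News, and Twitter text) can be sketched in Python as follows. This is a minimal illustration, not the app's actual code, which is a Shiny application. It assumes the text has already been cleaned and split into lines; the function name `build_ngram_tables` and the sample `blogs`, `news`, and `twitter` lines are hypothetical.

```python
from collections import Counter, defaultdict

MAX_N = 6  # bigram up to sixgram, matching the model orders described above

def build_ngram_tables(lines, max_n=MAX_N):
    """Count n-grams of order 2..max_n.

    Each table maps a context (the first n-1 words) to a Counter of the
    words that followed it, so "filtering" a model by its context and
    taking the most frequent next word is a single lookup.
    """
    tables = {n: defaultdict(Counter) for n in range(2, max_n + 1)}
    for line in lines:
        words = line.lower().split()  # assumption: text is already cleaned
        for n in range(2, max_n + 1):
            for i in range(len(words) - n + 1):
                context = tuple(words[i:i + n - 1])
                tables[n][context][words[i + n - 1]] += 1
    return tables

# Hypothetical sample lines standing in for the three English sources.
blogs = ["i love to eat pizza"]
news = ["we love to eat pasta"]
twitter = ["i love to eat pizza tonight"]

# The three sources are combined into one corpus before counting.
tables = build_ngram_tables(blogs + news + twitter)
print(tables[2][("love",)].most_common(1))              # [('to', 3)]
print(tables[4][("love", "to", "eat")].most_common(1))  # [('pizza', 2)]
```

Keeping a separate table per order means each model can be queried on its own, which is what the prediction step needs when it consults several models for the same input.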
When the user types one word, the app takes it as input, cleans it (removing emojis, numbers, and symbols, and converting letters to lowercase), filters the bigram model by that word, and suggests the word that most often follows it. When the user types a second word, the app filters the trigram model by both words and suggests the word that most often follows them. At the same time, it filters the bigram model by the second word alone and suggests another word. So with two words typed, the app offers two suggestions, one from the trigram model and one from the bigram model. With three words, the app filters the fourgram model by all three words, the trigram model by the last two, and the bigram model by the last one, giving three predictions from three different n-gram models. Some models may still return no suggestion, because they contain no matching data. With four words, the app filters the fivegram, fourgram, and trigram models, and with five words, the sixgram, fivegram, and fourgram models.
There is one exception at five words. If the fourgram model cannot suggest a word for the first prediction, the trigram and bigram models are used instead. Likewise, if the fivegram model cannot suggest a word for the second prediction, the fourgram and trigram models are used. Once the user has typed more than five words, the app considers only the last five, because the largest model is a sixgram.
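The prediction logic above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the app's actual implementation: the regex in the hypothetical `clean` function only approximates the app's removal of emojis, numbers, and symbols; the toy `corpus` is invented; and the `FALLBACK` table is our reading of the five-word exception (the fourgram prediction falls back to the trigram, then bigram model; the fivegram prediction to the fourgram, then trigram model). The helper names `build_tables`, `top_word`, and `predict` are also hypothetical.

```python
import re
from collections import Counter, defaultdict

def clean(text):
    """Lower-case and keep only letters, apostrophes and spaces.

    Assumption: approximates the app's cleaning; its exact rules may differ.
    """
    return re.sub(r"[^a-z' ]+", " ", text.lower()).split()

def build_tables(lines, max_n=6):
    """One Counter of next words per (n-1)-word context, for n = 2..max_n."""
    tables = {n: defaultdict(Counter) for n in range(2, max_n + 1)}
    for line in lines:
        words = clean(line)
        for n in range(2, max_n + 1):
            for i in range(len(words) - n + 1):
                tables[n][tuple(words[i:i + n - 1])][words[i + n - 1]] += 1
    return tables

def top_word(tables, n, words):
    """Most frequent word after the last n-1 words in the n-gram model, or None."""
    if n < 2 or len(words) < n - 1:
        return None
    counts = tables[n].get(tuple(words[len(words) - (n - 1):]))
    return counts.most_common(1)[0][0] if counts else None

# Assumed fallback chains, applied only when five words are in context.
FALLBACK = {4: (3, 2), 5: (4, 3)}

def predict(tables, text):
    words = clean(text)[-5:]  # more than five words: keep only the last five
    k = len(words)
    predictions = []
    # k words -> models of order k+1, k and k-1, never below bigram.
    for n in (k + 1, k, k - 1):
        if n < 2:
            continue
        word = top_word(tables, n, words)
        if word is None and k == 5:
            for lower in FALLBACK.get(n, ()):
                word = top_word(tables, lower, words)
                if word is not None:
                    break
        if word is not None:  # a model with no matching data is skipped
            predictions.append(word)
    return predictions

corpus = ["I love to eat pizza tonight!", "We love to eat pasta",
          "I love to eat pizza 24/7 :)"]
tables = build_tables(corpus)
print(predict(tables, "Love"))                   # ['to']
print(predict(tables, "oh yes we love to eat"))  # ['pasta', 'pizza']
print(predict(tables, "x y z w eat"))            # ['pizza'], via fallback
```

In the last call none of the sixgram, fivegram, or fourgram contexts exist in the toy corpus, so without the fallback the input would get no prediction at all. Different models can also suggest the same word; this sketch keeps duplicates, and the app may handle them differently.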
Usage of the app
Source code of the app: https://github.com/milikest/Coursera-Johns-Hopkins-University-Data-Science-Capstone-Project
Shiny app link:
https://c94mnx-mehmet-0l0k.shinyapps.io/Text_Prediction/
Thank you so much for accompanying me on this data science journey. I hope our paths cross again. Who knows, maybe even at the same start-up!