Deepti Singh Chauhan
2020-08-31
This project was completed as part of the Data Science Specialization on Coursera. The goal is to create a Shiny app that uses a predictive algorithm to recommend the words most likely to follow a text phrase typed by the user, based on the previous one, two, or three words. The app is available at https://deeptichauhan.shinyapps.io/PredictiveTextApp/
The input data consists of three files containing text from different web sources (blogs, news, and Twitter). The content is similar across sources, but the texts (especially Twitter messages, which are often typed on smartphones) are characterized by slang, emoticons, special characters, and so on.
The model is based on the so-called “Stupid Backoff” model, which predicts the next word by looking up the most recent words typed (the history) in the training data and selecting the word that most often followed that history. If the full history was never seen, the model backs off to a shorter one. For example, given the phrase “thanks for the”, the model first looks for words that followed all three words in the training data; if it finds none, it backs off to “for the”, and then to “the”.
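The back-off lookup described above can be sketched in a few lines of Python. This is a minimal illustration rather than the app's actual code: the function names `train` and `predict`, the whitespace tokenizer, the toy corpus, and the back-off penalty of 0.4 (the value usually quoted for Stupid Backoff) are all assumptions made for the example.

```python
from collections import Counter

ALPHA = 0.4  # back-off penalty; assumed here, not taken from the app


def train(sentences, max_order=4):
    """Count every n-gram of length 1..max_order (naive whitespace tokens)."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for n in range(1, max_order + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def predict(counts, phrase, k=3, max_history=3):
    """Return up to k candidate next words, ranked by Stupid Backoff score."""
    history = tuple(phrase.lower().split())[-max_history:]
    total_unigrams = sum(c for gram, c in counts.items() if len(gram) == 1)
    scores = {}
    penalty = 1.0
    # Try the longest history first, then drop its first word and retry.
    while True:
        denom = counts.get(history, 0) if history else total_unigrams
        if denom:
            # Linear scan for simplicity; a real model would index by history.
            for gram, c in counts.items():
                if len(gram) == len(history) + 1 and gram[:-1] == history:
                    # Keep only the score from the longest matching history.
                    scores.setdefault(gram[-1], penalty * c / denom)
        if not history:
            break
        history = history[1:]
        penalty *= ALPHA
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


# Hypothetical toy corpus, for illustration only.
corpus = [
    "thanks for the help",
    "thanks for the help",
    "thanks for the info",
    "see you at the party",
]
counts = train(corpus)
print(predict(counts, "thanks for the"))  # ['help', 'info', 'party']
```

Here "help" and "info" are scored from the full three-word history, while "party" only appears after backing off to the single word "the" and is penalized by 0.4 for each back-off step.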
The model was tested on 1,000 test cases and achieved a top-3 accuracy of 20%: a prediction was counted as correct if the true next word was among the model's three highest-ranked suggestions.
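A top-3 accuracy of this kind can be computed as in the sketch below. The helper `top_k_accuracy`, the stand-in predictor `toy_predict`, and the four test cases are hypothetical and only show the calculation; they are not the project's evaluation code or data. On 1,000 test cases, a 20% rate corresponds to 200 cases where the true word appeared in the top three.

```python
def top_k_accuracy(predict_fn, cases, k=3):
    """Fraction of (phrase, next_word) cases whose true next word
    is among the first k predictions returned by predict_fn."""
    if not cases:
        return 0.0
    hits = sum(1 for phrase, target in cases if target in predict_fn(phrase)[:k])
    return hits / len(cases)


def toy_predict(phrase):
    """Stand-in predictor backed by a fixed lookup table (illustrative only)."""
    table = {
        "thanks for the": ["help", "info", "support"],
        "see you": ["soon", "later", "there"],
    }
    return table.get(phrase, [])


# Hypothetical test cases: (typed phrase, word that actually followed).
cases = [
    ("thanks for the", "info"),  # hit: "info" is in the top 3
    ("see you", "tomorrow"),     # miss: not among the suggestions
    ("how are", "you"),          # miss: predictor returns nothing
    ("see you", "soon"),         # hit
]
accuracy = top_k_accuracy(toy_predict, cases)
print(accuracy)  # 0.5
```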
The model can be used at https://deeptichauhan.shinyapps.io/PredictiveTextApp/
Additional Notes