Coursera Data Science Specialization Capstone Project
khsarma
21-Oct-2018
This is a natural language processing (NLP) project. It analyses the SwiftKey data set, which consists of text feeds from Twitter, news sites and blogs. The task is to take the input data, build a corpus, generate N-grams and output next-word predictions.
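As a rough illustration of the N-gram step, the sketch below tokenizes a few lines of text and counts their N-grams. It is written in Python for brevity and is not the project's own code. The function names (`tokenize`, `ngrams`, `ngram_counts`) and the sample sentences are hypothetical, and the tokenization rule (lowercase, letters and apostrophes only) is an assumption.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) found in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_counts(lines, n):
    """Count n-grams line by line, so that no n-gram spans two lines."""
    counts = Counter()
    for line in lines:
        counts.update(ngrams(tokenize(line), n))
    return counts

# Hypothetical sample lines standing in for the Twitter, news and blog feeds.
sample = ["Hello, how are you?", "How are you doing today?"]
bigram_counts = ngram_counts(sample, 2)
trigram_counts = ngram_counts(sample, 3)
```

Counting per line keeps unrelated sentences from producing spurious N-grams at their boundaries. A real corpus would likely need further cleaning (numbers, URLs, profanity) before this step.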
Input:
Output:
Katz's back-off model (reference: Wikipedia) with Good-Turing discounting is used for prediction. The model estimates the conditional probability of a word given the words that precede it, backing off to shorter N-grams when a longer one has not been observed in the corpus.
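The back-off idea can be sketched for the simplest case, bigrams backing off to unigrams. This is a minimal Python illustration, not the project's implementation. To stay short it uses a fixed absolute discount (the hypothetical `discount=0.5` parameter) in place of Good-Turing discounting, and all function names and the toy corpus are assumptions.

```python
from collections import Counter

def train(tokens):
    """Count unigrams and bigrams in a token list."""
    return Counter(tokens), Counter(zip(tokens, tokens[1:]))

def backoff_prob(word, prev, unigrams, bigrams, discount=0.5):
    """Estimate P(word | prev), backing off from bigrams to unigrams.

    NOTE: a fixed absolute discount stands in for Good-Turing here.
    """
    total = sum(unigrams.values())
    # Words observed after `prev`, with their bigram counts.
    followers = {w2: c for (w1, w2), c in bigrams.items() if w1 == prev}
    context_count = sum(followers.values())
    if context_count == 0:
        # `prev` never starts a bigram: use the plain unigram estimate.
        return unigrams[word] / total
    if word in followers:
        # Seen bigram: discounted relative frequency.
        return (followers[word] - discount) / context_count
    # Unseen bigram: the probability mass freed by discounting (beta) is
    # shared among unseen words in proportion to their unigram probability.
    beta = discount * len(followers) / context_count
    unseen_mass = sum(c for w, c in unigrams.items() if w not in followers) / total
    if unseen_mass == 0:
        return 0.0
    return beta * (unigrams[word] / total) / unseen_mass

def predict_next(prev, unigrams, bigrams, top_n=3):
    """Return the top_n candidate next words with their probabilities."""
    scores = {w: backoff_prob(w, prev, unigrams, bigrams) for w in unigrams}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

# Toy corpus (hypothetical); the real model is trained on the SwiftKey data.
tokens = "hello how are you hello how are they hello how are you".split()
unigrams, bigrams = train(tokens)
top = predict_next("are", unigrams, bigrams)  # "you" ranks first here
```

In this sketch the probabilities over the vocabulary sum to one for any context that has been seen. A full Katz model applies the same recursion across higher-order N-grams and derives the discount from Good-Turing adjusted counts rather than a constant.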
Shiny App contains:
Example:
Step-1: The user enters a set of words, “hello how are”, and clicks the Submit button.
Step-2: The predicted words appear under the “Prediction” tab.
In this example, the word “you” has a higher probability of appearing next than the other candidates.
Step-3: The user can also view a plot of the predicted words under the “Plot” tab.