Data Science Capstone Final Project Submission

Parichay
1st August 2020

Word prediction using Katz backoff algorithm

Introduction

This project explores next-word prediction using natural language processing, with the aim of choosing the most suitable algorithm for the task.

Link to the app for the prediction model: https://parichayk.shinyapps.io/predict_word/.

Getting and cleaning data

  • The data used in the model is the data provided by Johns Hopkins University. Since the dataset was too large and took a long time to process, we subset it to 10% using the rbinom function.

  • The data has been cleaned and tokenized using tm_map from the tm package, and profane words have been removed to improve the output.
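
The two steps above can be sketched roughly as follows. This is an illustrative Python sketch, not the project's code (which relies on R's rbinom and tm_map). The function names, the 10% default rate, the seed, and the regular expression are all assumptions made for the example. Sampling draws one Bernoulli trial per line, which is what rbinom with size 1 would do.

```python
import random
import re


def sample_lines(lines, rate=0.10, seed=42):
    """Keep each line independently with probability `rate`.

    One Bernoulli draw per line; `rate` and `seed` are illustrative defaults.
    """
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < rate]


def clean_and_tokenize(line, banned):
    """Lowercase a line, strip non-letters, and drop banned (profane) words.

    `banned` is a hypothetical set of lowercase words supplied by the caller.
    """
    line = line.lower()
    # Replace anything that is not a letter, apostrophe, or space with a space.
    line = re.sub(r"[^a-z' ]+", " ", line)
    # Trim stray apostrophes left at token edges and discard empty tokens.
    tokens = (tok.strip("'") for tok in line.split())
    return [tok for tok in tokens if tok and tok not in banned]
```

Because each line is kept independently, the sample is only approximately 10% of the input. Fixing the seed makes the subset reproducible.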

Prediction model

The data was first split into n-grams. The bigrams and trigrams were then processed and smoothed for use in the predictive algorithm.
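
As a minimal illustration of the n-gram step, the Python sketch below counts bigrams and trigrams from a token list. The function name ngram_counts and the toy sentence are assumptions for the example. The project's own processing and smoothing are not shown here.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens, keyed by a tuple of words."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Toy input, purely for illustration.
tokens = "the cat sat on the cat".split()
bigrams = ngram_counts(tokens, 2)
trigrams = ngram_counts(tokens, 3)
```

A sequence of m tokens yields m - n + 1 n-grams, so sequences shorter than n contribute nothing.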

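The predictive algorithm used in the model is the Katz backoff model: when the longest available n-gram context has not been seen with a candidate word, it backs off to a shorter context and uses probability mass reserved by discounting the observed counts.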
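
The backoff idea can be sketched as below. This is a simplified Python illustration and not the app's implementation. It assumes a fixed absolute discount of 0.5 in place of the Good-Turing discounts used in the original Katz formulation. The class name KatzBackoff, its methods, and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict


class KatzBackoff:
    """Trigram -> bigram -> unigram backoff with a fixed discount (assumption)."""

    def __init__(self, tokens, discount=0.5):
        self.d = discount  # must lie strictly between 0 and 1
        self.uni = Counter(tokens)
        self.total = sum(self.uni.values())
        # followers[context][word] = count, for contexts of one or two words
        self.followers = defaultdict(Counter)
        for n in (2, 3):
            for i in range(len(tokens) - n + 1):
                *ctx, word = tokens[i:i + n]
                self.followers[tuple(ctx)][word] += 1

    def prob(self, word, context=()):
        context = tuple(context)[-2:]  # use at most the last two words
        if not context:
            return self.uni[word] / self.total  # maximum-likelihood unigram
        seen = self.followers.get(context)
        if not seen:
            # Context never observed: rely entirely on the shorter context.
            return self.prob(word, context[1:])
        ctx_total = sum(seen.values())
        if word in seen:
            return (seen[word] - self.d) / ctx_total  # discounted estimate
        # Probability mass freed by discounting the observed followers.
        alpha = self.d * len(seen) / ctx_total
        # Lower-order mass that belongs to words not seen after this context.
        unseen_mass = 1.0 - sum(self.prob(w, context[1:]) for w in seen)
        if unseen_mass <= 0:
            return 0.0
        return alpha * self.prob(word, context[1:]) / unseen_mass

    def predict(self, context, k=3):
        """Return the k most probable next words (ties broken alphabetically)."""
        scores = {w: self.prob(w, context) for w in self.uni}
        return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Toy corpus, purely for illustration.
corpus = "the cat sat on the mat the cat ate the fish".split()
model = KatzBackoff(corpus)
suggestions = model.predict(["on", "the"], k=3)
```

Scoring every vocabulary word on each call keeps the sketch short but is slow on a large vocabulary. A practical version would precompute the backoff weights and only rank words seen after the context or one of its shorter suffixes.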

Shiny application

[Image: screenshot of the Shiny application]

Thank You