25 February 2018

Overview

This project is part of the Coursera JHU Data Science Specialization Capstone Project. The full capstone project consists of:

  • Exploratory analysis of the different corpora
  • Building n-gram models from the different corpora
  • Predicting the next word using the model
  • Building an interactive Shiny app to use the model

My Model Description

  • Data cleansing (convert to lower case, remove digits and non-word tokens)
  • Build n-gram models from the different corpora
  • The NGram Tokenizer was used to break the text into words
  • Built 1-, 2-, 3- and 4-gram datasets sorted by frequency
  • Naive prediction: the most frequent matching words are predicted
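
The steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the project's actual code: the function names (`clean`, `build_ngrams`, `predict`), the regular expression used for cleansing, and the choice to fall back from longer to shorter n-grams when no match is found are all assumptions made for the example.

```python
import re
from collections import Counter


def clean(text):
    """Lower-case the text and keep only alphabetic words (hypothetical cleansing rule)."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # drop digits and non-word characters
    return text.split()


def build_ngrams(tokens, max_n=4):
    """Return {n: Counter of n-gram tuples} for n = 1..max_n."""
    return {
        n: Counter(zip(*(tokens[i:] for i in range(n))))
        for n in range(1, max_n + 1)
    }


def predict(models, phrase, top=3):
    """Return up to `top` most frequent next words for the longest matching context.

    Assumption: try the 4-gram table first, then 3-grams, then 2-grams,
    and finally fall back to the most frequent single words.
    """
    words = clean(phrase)
    for n in range(max(models), 1, -1):
        context = tuple(words[-(n - 1):])
        if len(context) < n - 1:
            continue  # not enough typed words for this n-gram order
        candidates = Counter({
            gram[-1]: count
            for gram, count in models[n].items()
            if gram[:-1] == context
        })
        if candidates:
            return [word for word, _ in candidates.most_common(top)]
    return [gram[0] for gram, _ in models[1].most_common(top)]


corpus = "The cat sat on the mat. The cat sat on the sofa. The cat ran 2 times!"
models = build_ngrams(clean(corpus))
print(predict(models, "the cat"))
```

In this toy corpus "the cat" is followed by "sat" twice and "ran" once, so "sat" is ranked first. A real corpus would be far larger, and the frequency tables would normally be precomputed and stored sorted rather than scanned on every prediction.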

User Interface