Sagar Pathak
May 14, 2016
As the name suggests, this project predicts the next word from the words entered by the user. It uses an algorithm that looks up data tables of 4-grams together with their frequencies of occurrence. The HC Corpora dataset consists of text crawled from news sites, blogs and Twitter. It contains three files (news, blogs and Twitter) for each of four languages (Russian, Finnish, German and English). This project uses the English-language files.
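The 4-gram lookup described above can be sketched as follows. This is a minimal illustration in Python, not the author's R implementation (which used stylo and data.table). It makes three assumptions:

- The text has already been cleaned and split into lowercase tokens.
- Each 4-gram is keyed by its first three words.
- Prediction returns the most frequent fourth words seen after the last three words typed.

It does no backoff to shorter n-grams, so an unseen three-word context simply yields no prediction. The names build_fourgram_table and predict_next are hypothetical.

```python
from collections import Counter, defaultdict


def build_fourgram_table(tokens):
    """Map each 3-word prefix to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    # Slide a window of four tokens over the text; the first three form the key.
    for i in range(len(tokens) - 3):
        prefix = tuple(tokens[i:i + 3])
        table[prefix][tokens[i + 3]] += 1
    return table


def predict_next(table, words, k=1):
    """Return up to k most frequent next words for the last three input words."""
    prefix = tuple(words[-3:])
    # Fewer than three words, or an unseen context: no 4-gram to look up.
    if len(prefix) < 3 or prefix not in table:
        return []
    return [word for word, _ in table[prefix].most_common(k)]


if __name__ == "__main__":
    tokens = "thanks for the follow thanks for the follow thanks for the help".split()
    table = build_fourgram_table(tokens)
    print(predict_next(table, ["thanks", "for", "the"], k=2))
```

Here predict_next(table, ["thanks", "for", "the"], k=2) returns ["follow", "help"]. In the toy text, "follow" follows that prefix twice and "help" follows it once.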
Features of the prediction algorithm
The main goal of this capstone project is to build a Shiny application that predicts the next word. The work was divided into seven subtasks, including data cleansing, exploratory analysis and the creation of a predictive model.
All text data used to build the frequency dictionary, and thus to predict the next word, comes from the HC Corpora corpus.
All text mining and natural language processing was done with well-known R packages such as stylo and data.table.
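The data cleansing and text mining steps are not described in detail here. The sketch below shows one possible approach, assuming English-only input. A Python helper (hypothetical name clean_line) lowercases each raw line, drops URLs, replaces everything except letters and apostrophes with spaces, and splits the result into tokens ready for n-gram counting. The actual project did this step in R, and its exact cleaning rules may differ.

```python
import re

# Assumed cleaning rules for English text (not taken from the original project).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_WORD_RE = re.compile(r"[^a-z' ]+")


def clean_line(line):
    """Lowercase a raw line, drop URLs and non-letter characters, and tokenize it."""
    text = line.lower()
    # Remove web addresses before stripping punctuation, so they leave no fragments.
    text = URL_RE.sub(" ", text)
    # Keep only lowercase letters, apostrophes and spaces; digits, punctuation,
    # tabs and emoticons all become spaces.
    text = NON_WORD_RE.sub(" ", text)
    # Trim apostrophes used as quote marks, then drop tokens that become empty.
    tokens = [tok.strip("'") for tok in text.split()]
    return [tok for tok in tokens if tok]


if __name__ == "__main__":
    print(clean_line("I can't wait!! See http://example.com NOW :)"))
```

With these rules, clean_line("I can't wait!! See http://example.com NOW :)") returns ["i", "can't", "wait", "see", "now"]. Keeping in-word apostrophes preserves contractions such as "can't" as single tokens, which matters when they later appear inside 4-grams.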
Application Link: https://sagar1992.shinyapps.io/word-predict-project
The user types a phrase into the input box, and the result is displayed on the fly in the box on the right (User Interface II).
After the user enters a phrase, the predicted word is displayed as a tile, as follows.
The word prediction application was successfully built using R packages such as stylo and data.table, and is hosted at
https://sagar1992.shinyapps.io/word-predict-project
This project gave me experience with advanced use of the R programming language and of RStudio features such as R Presentations (RPres), shinyapps.io and RPubs. These will definitely be helpful for future research and presentations.
Thanks, Sagar