Introduction
A Shiny app that takes a multi-word phrase from a text box and outputs a prediction of the next word.
Data cleaning, sampling and analysis
- Initial analysis showed that the full data set is too big to process
- Take a medium-sized sample from each of the three input files and mix them into a final sample that represents the overall picture
- Clean the data before moving to the next step, i.e. remove punctuation, control characters, digits and non-ASCII characters, and convert all text to lower case
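
The sampling and cleaning steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the app's actual code: the function names (`sample_lines`, `clean_text`, `build_sample`), the 10% sampling fraction and the fixed seed are all assumptions made for the example.

```python
import random
import re
import string


def sample_lines(lines, fraction, rng):
    """Keep each line independently with probability `fraction`."""
    return [line for line in lines if rng.random() < fraction]


def clean_text(text):
    """Apply the cleaning steps listed above to one line of text."""
    # Drop non-ASCII characters.
    text = text.encode("ascii", errors="ignore").decode("ascii")
    # Replace control characters with a space so words stay separated.
    text = re.sub(r"[\x00-\x1f\x7f]", " ", text)
    # Remove digits, then punctuation.
    text = re.sub(r"[0-9]", "", text)
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Lower-case and collapse repeated whitespace.
    return " ".join(text.lower().split())


def build_sample(sources, fraction=0.1, seed=42):
    """Sample every source, mix the results, and clean each line.

    `sources` is a list of line lists, one per input file.
    The fraction and seed are hypothetical defaults.
    """
    rng = random.Random(seed)
    mixed = []
    for lines in sources:
        mixed.extend(sample_lines(lines, fraction, rng))
    rng.shuffle(mixed)  # mix lines from the different sources
    cleaned = (clean_text(line) for line in mixed)
    return [line for line in cleaned if line]  # drop lines left empty
```

Sampling each file with the same fraction keeps the sources in roughly their original proportions, which is one way to make the mixed sample reflect the overall picture.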