Course Data Science Capstone

Word Prediction App Presentation

The application:

Prediction App screen

First, a data sample was created from the imported data available to the project;
Then this sample was cleaned by converting it to lowercase and removing punctuation, links, extra whitespace, profanity, numbers, and all kinds of special characters;
The data sample was also tokenized into bi-, tri-, and quadgrams;
The n-gram term frequency matrices were converted into frequency dictionaries;
The resulting data frames are used to predict the next word for the input text, based on the frequencies in the underlying n-gram dictionaries.
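
The steps above can be illustrated with a minimal Python sketch. It is not the app's actual code: the function names (`clean`, `build_ngram_tables`, `predict_next`), the placeholder `PROFANITY` list, and the sample sentences are all hypothetical. The sketch also assumes a simple back-off rule (try the quadgram table first, then the trigram, then the bigram), which the summary above does not spell out.

```python
import re
from collections import Counter, defaultdict

# Hypothetical placeholder; the profanity list used by the app is not given.
PROFANITY = {"darn"}

def clean(text):
    """Lowercase the text and drop links, numbers, punctuation, and profanity."""
    text = text.lower()
    text = re.sub(r"(https?://|www\.)\S+", " ", text)  # links
    text = re.sub(r"[^a-z\s]", " ", text)              # numbers, punctuation, special characters
    # split() also collapses any repeated whitespace
    return [w for w in text.split() if w not in PROFANITY]

def build_ngram_tables(lines, orders=(2, 3, 4)):
    """For each n, map every (n-1)-word prefix to a Counter of next-word frequencies."""
    tables = {n: defaultdict(Counter) for n in orders}
    for line in lines:
        words = clean(line)
        for n in orders:
            for i in range(len(words) - n + 1):
                prefix = tuple(words[i:i + n - 1])
                tables[n][prefix][words[i + n - 1]] += 1
    return tables

def predict_next(tables, text):
    """Assumed back-off: use the longest prefix that has been seen, else a shorter one."""
    words = clean(text)
    for n in sorted(tables, reverse=True):
        if len(words) < n - 1:
            continue
        candidates = tables[n].get(tuple(words[-(n - 1):]))
        if candidates:
            return candidates.most_common(1)[0][0]
    return None  # no matching n-gram at any order

# Tiny made-up sample standing in for the project data
sample = [
    "I want to go home",
    "I want to go out",
    "I want to go home now!",
    "Visit http://example.com to go home 123",
]
tables = build_ngram_tables(sample)
print(predict_next(tables, "I want to go"))  # prints "home"
```

Storing each table as prefix-to-counter keeps a lookup to a single dictionary access per n-gram order, which is one plausible reason to convert frequency matrices into dictionaries before prediction.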