This presentation reports the Exploratory Data Analysis (EDA) conducted for building an “N-gram based Next Word Prediction” model, the final project in the R for Data Science 10 Course Specialisation.
The report covers the following phases:
- Basic Characteristics
- Data Pre-Processing
- Word and N-gram Distributions
- Number of unique words needed to cover 60% of all word instances
- Next steps