Coursera Data Science - Capstone Project

Gustavo Seifer
08.August.2021

Capstone Project Details

Background and rationale

Around the world, people are spending an increasing amount of time on their mobile devices for email, social networking, banking and a whole range of other activities.
The main objective of this project was to develop a text prediction model using Natural Language Processing (NLP).

APP

Through a simple user interface the App predicts the next word.

Main tasks

Understanding the problem
Data acquisition and cleaning
Exploratory analysis
Statistical modeling & Predictive modeling
Creating a data product (Shiny App)
Creating a short slide deck pitching the product

Brief Description of the Methodology for developing the App

The text from different sources (news, blogs, Twitter) was analyzed, cleaned and transformed into a tidy format through tokenization (one word per row, each word being a token).
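
The tokenization step can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the project's own code; the function names, the regular expression and the sample sentences are assumptions.

```python
import re

def tokenize(text):
    """Lowercase the text and return its word tokens in order."""
    # Assumption: a token is a run of letters or apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def to_tidy_rows(documents):
    """Turn (source, text) pairs into one (source, token) row per word."""
    rows = []
    for source, text in documents:
        for token in tokenize(text):
            rows.append((source, token))
    return rows

# Hypothetical sample input, not taken from the project data.
docs = [
    ("news", "The market rose today."),
    ("twitter", "Can't wait for the weekend!"),
]
tidy = to_tidy_rows(docs)
```

Each element of `tidy` is one row of the tidy table: the source followed by a single token.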

The words were randomly sampled to reduce computation time.
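
A reproducible random sample could look like the sketch below (Python, for illustration only). The sampling fraction and the seed are assumptions; the source does not state which values were used.

```python
import random

def sample_items(items, fraction, seed=42):
    """Return a reproducible random subset holding about `fraction` of the items."""
    if not items:
        return []
    # A dedicated generator keeps the sample stable across runs.
    rng = random.Random(seed)
    k = min(len(items), max(1, int(len(items) * fraction)))
    return rng.sample(items, k)
```

Fixing the seed means the same subset is drawn every time, so later counts and n-grams stay comparable between runs.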

The words were filtered to remove stop words and words without meaning.
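
The filtering step might be sketched as follows (Python, for illustration only). The stop word list is a tiny hypothetical one, and treating "words without meaning" as tokens containing non-letter characters is an assumption, since the source does not define the term.

```python
# Hypothetical, deliberately short stop word list.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for"}

def keep_token(token):
    """Keep purely alphabetic tokens that are not stop words."""
    return token.isalpha() and token not in STOP_WORDS

def filter_tokens(tokens):
    """Return the tokens that pass the filter, in their original order."""
    return [t for t in tokens if keep_token(t)]
```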

The words were counted globally and by source.
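
Counting globally and by source can be done in a single pass over the tidy rows. The sketch below is in Python for illustration only and assumes rows are (source, token) pairs; that row layout is an assumption, not something the source specifies.

```python
from collections import Counter, defaultdict

def count_words(rows):
    """Return (global counts, per-source counts) for (source, token) rows."""
    total = Counter()
    by_source = defaultdict(Counter)
    for source, token in rows:
        total[token] += 1
        by_source[source][token] += 1
    return total, by_source
```

`total.most_common(10)` then lists the ten most frequent words overall, and `by_source["news"].most_common(10)` does the same for a single source.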

n-grams (bi-grams and tri-grams) were generated. Based on the n-grams, a predictive model was developed.
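
The source does not say how the model uses the n-grams. One common approach, assumed here, is a count-based lookup that tries the tri-gram context first and backs off to the bi-gram context. The sketch is in Python for illustration only, and all names are hypothetical.

```python
from collections import Counter, defaultdict

def ngrams(tokens, n):
    """Return all consecutive n-token tuples from the token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_model(tokens):
    """Map each 1- or 2-word context to counts of the word that follows it."""
    model = defaultdict(Counter)
    for n in (2, 3):
        for gram in ngrams(tokens, n):
            model[gram[:-1]][gram[-1]] += 1
    return model

def predict_next(model, words):
    """Predict the next word: try the last two words, then the last one."""
    for size in (2, 1):
        context = tuple(words[-size:])
        if len(context) == size and context in model:
            return model[context].most_common(1)[0][0]
    return None  # no matching context in the model

# Hypothetical training text, not taken from the project data.
corpus = "i love data science and i love data analysis and i love coffee".split()
model = build_model(corpus)
```

With this toy corpus, `predict_next(model, ["i", "love"])` returns `"data"`, because "data" follows "i love" more often than "coffee" does.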

The next word is predicted quickly after the user enters text.