COURSERA - JHU SwiftKey Capstone Project

Gito

2024-11-25

Executive Summary

This project is part of the Johns Hopkins University Data Science Specialization on Coursera, specifically the SwiftKey Capstone Project.

The project predicts (suggests) the next word based on the text entered so far, similar to the popular SwiftKey smart keyboard for phones.

The n-gram results were summarized in the earlier Task 02 project from three training datasets, en_US.blogs.txt, en_US.twitter.txt and en_US.news.txt, which were pre-processed into n-gram tables.

Algorithm Steps:

  1. Data collection from news, blogs and Twitter, followed by data cleaning

  2. NLP analysis to produce n-grams; this project uses mono-, bi- and tri-grams, so up to 2 words serve as predictors

  3. Generate the 5 highest-frequency n-grams

  4. Filter on up to 2 input words to predict the 2nd/3rd word from the 5 most likely candidates

  5. System View
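Steps 2 and 3 can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the project's own code; the tokenization rule, the function names `tokenize` and `ngram_counts`, and the four-line toy corpus are all assumptions standing in for the real cleaning steps and the blogs, news and Twitter data.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters/apostrophes.

    This cleaning rule is an assumption made for the sketch.
    """
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(lines, n):
    """Count n-grams of size n, never crossing a line boundary."""
    counts = Counter()
    for line in lines:
        tokens = tokenize(line)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


# Hypothetical toy corpus standing in for the three training files.
corpus = [
    "I love data science",
    "I love data analysis",
    "I love coffee",
    "We love data science",
]

unigrams = ngram_counts(corpus, 1)  # mono-gram table
bigrams = ngram_counts(corpus, 2)   # bi-gram table
trigrams = ngram_counts(corpus, 3)  # tri-gram table

# Step 3: keep the 5 highest-frequency entries of a table.
top5_bigrams = bigrams.most_common(5)
```

Each table maps a tuple of words to its frequency, so `bigrams[("love", "data")]` gives the number of times "love data" occurs in the toy corpus.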

Data Collection and Data Cleaning

Below are the steps of data collection and data cleaning.

Filtering, Predictions Result and Limitations

Predictions are made by looking up the given input words (the predictors) in the mono-, bi- and tri-gram tables.

Below are the project's limitations, which could be addressed in further enhancements as required.
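The lookup can be sketched as follows, again in Python for illustration only. The fallback order (tri-gram table first, then bi-gram, then mono-gram) is an assumption about how the three tables are combined, and the function name `predict_next` and the toy counts are hypothetical.

```python
from collections import Counter

# Hypothetical toy tables; the counts are illustrative, not project results.
unigrams = Counter({("the",): 10, ("data",): 6, ("science",): 4})
bigrams = Counter({
    ("love", "data"): 3,
    ("love", "coffee"): 1,
    ("data", "science"): 2,
    ("data", "analysis"): 1,
})
trigrams = Counter({
    ("love", "data", "science"): 2,
    ("love", "data", "analysis"): 1,
})


def predict_next(text, k=5):
    """Return up to k candidate next words, most frequent first."""
    words = text.lower().split()[-2:]  # use at most the last 2 input words
    candidates = Counter()

    # Two predictors available: match them against the tri-gram table.
    if len(words) == 2:
        for gram, count in trigrams.items():
            if gram[:2] == tuple(words):
                candidates[gram[2]] = count

    # No tri-gram match (or one predictor): use the last word and bi-grams.
    if not candidates and words:
        for gram, count in bigrams.items():
            if gram[0] == words[-1]:
                candidates[gram[1]] = count

    # Still nothing: fall back to the most frequent single words.
    if not candidates:
        for gram, count in unigrams.items():
            candidates[gram[0]] = count

    return [word for word, _ in candidates.most_common(k)]
```

With these toy tables, `predict_next("I love data")` uses the tri-gram table, `predict_next("love")` uses the bi-gram table, and an unseen word such as `predict_next("zzz")` falls back to the mono-gram table.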

System View

Below is the graphical interface of the Shiny app.