Capstone Presentation

15/11/2021

Overview

This presentation is for the Johns Hopkins University Data Science Specialization Capstone on Coursera.
The dataset for this project is provided by SwiftKey.
The project goal is to create an application that predicts the next word in a phrase or sentence.

Data used

The corpus, provided by SwiftKey, is publicly available and was collected by a web crawler. Four datasets were available; our application uses only the English one. The data was taken from random news articles, blog posts, and Twitter feeds. For use in this project, the data had to be cleaned by removing extraneous punctuation, excessive whitespace, profanity, and other non-text elements. A portion of the cleaned data was then tokenized into n-gram tables.
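The cleaning and tokenizing steps described above can be sketched roughly as follows. This is a minimal illustration in Python, not the code used in the project: the function names `clean_text` and `ngram_table`, the tiny `PROFANITY` placeholder set, and the sample sentences are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical placeholder list; the presentation does not say which
# profanity list was actually used.
PROFANITY = {"darn", "heck"}


def clean_text(line):
    """Lowercase a line, strip non-text characters, and drop listed words."""
    line = line.lower()
    # Replace anything that is not a letter, apostrophe, or space with a space.
    line = re.sub(r"[^a-z' ]+", " ", line)
    # split() with no arguments also collapses runs of whitespace.
    tokens = [t.strip("'") for t in line.split()]
    return [t for t in tokens if t and t not in PROFANITY]


def ngram_table(lines, n):
    """Count every run of n consecutive tokens across the cleaned lines."""
    counts = Counter()
    for line in lines:
        tokens = clean_text(line)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


# Made-up sample input, standing in for a portion of the corpus.
sample = ["The quick   brown fox!!", "The quick red fox, darn it."]
bigrams = ngram_table(sample, 2)
trigrams = ngram_table(sample, 3)
```

Keeping the tables as plain count dictionaries keyed by word tuples means larger samples of the corpus only add or increment entries, without changing the structure.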

App features

Side panel with user instructions
Text box for user input
Predicted next word displayed dynamically below the user input
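One way the predicted word could be looked up as the user types is sketched below. The presentation does not state which prediction method the app uses, so the simple back-off strategy here (try the longest matching context first, then shorter ones) is an assumption, as are the names `build_model` and `predict_next` and the toy training text.

```python
from collections import Counter, defaultdict


def build_model(tokens, max_n=3):
    """Map each context (0 to max_n - 1 preceding words) to follower counts."""
    model = defaultdict(Counter)
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            *context, word = tokens[i:i + n]
            model[tuple(context)][word] += 1
    return model


def predict_next(model, phrase, max_n=3):
    """Return the most frequent follower of the longest known context."""
    words = phrase.lower().split()
    # Assumed back-off: start with the longest context, then shorten it.
    for k in range(max_n - 1, -1, -1):
        if k > len(words):
            continue
        context = tuple(words[len(words) - k:])
        followers = model.get(context)
        if followers:
            return followers.most_common(1)[0][0]
    return None


# Toy training text, made up for this example.
tokens = "i want to go home i want to eat i want a nap".split()
model = build_model(tokens)

print(predict_next(model, "I want"))     # context ("i", "want") is known
print(predict_next(model, "they want"))  # backs off to context ("want",)
```

Because each prediction is a handful of dictionary lookups, a scheme like this can respond quickly enough to update the output on every keystroke.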

Benefits of using the app

Lightweight
Fast response
The method scales to large training sets, which leads to better next-word predictions