JHU Data Science Specialization Capstone: Project

Vishal Ambavade
2020-07-27

Smart Word Predictor Using the SwiftKey Dataset

The following slides walk you through the capstone
project completed for the JHU Data Science Specialization.

Project Objective

The aim of this project is to develop a Shiny app that predicts the next word in a sentence based on the user's input.
The phases of the project are as follows:

  • Getting and cleaning the data
  • Exploratory Data Analysis
  • Modeling
  • Prediction Model building
  • Creative exploration
  • Shiny App
  • Slide deck

Methodology

The Data

  • The corpora, provided by SwiftKey, were collected from publicly available sources by a web crawler.
  • I used only the English data for this project.

Method

  • The data was downloaded and cleaned
  • EDA was performed on the data
  • The data was tokenized into n-grams
  • A model was built from the n-gram tokens
  • The algorithm was improved by tweaking its parameters
  • A Shiny app was built that predicts the next word based on the user's input
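
The tokenization and prediction steps above can be sketched as follows. This is only an illustration, not the app's actual code: the slides do not state the n-gram order or how unseen contexts are handled, so the sketch assumes n-grams up to trigrams and a simple back-off from longer to shorter contexts. It is written in Python rather than the R used by Shiny, and the function names (tokenize, build_ngram_tables, predict_next) and the toy corpus are hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep only runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def build_ngram_tables(sentences, max_n=3):
    """For each n, map every (n-1)-word context to a Counter of next words."""
    tables = {n: defaultdict(Counter) for n in range(2, max_n + 1)}
    unigrams = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        unigrams.update(tokens)
        for n in range(2, max_n + 1):
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])
                tables[n][context][tokens[i + n - 1]] += 1
    return unigrams, tables


def predict_next(text, unigrams, tables):
    """Try the longest context first, then back off to shorter ones."""
    tokens = tokenize(text)
    for n in sorted(tables, reverse=True):
        context = tuple(tokens[-(n - 1):])
        # Skip this order if the input is too short to fill the context.
        if len(context) == n - 1 and context in tables[n]:
            return tables[n][context].most_common(1)[0][0]
    # No context matched: fall back to the most frequent single word.
    return unigrams.most_common(1)[0][0] if unigrams else None


# Toy corpus standing in for the cleaned SwiftKey text (assumption).
corpus = [
    "thank you for the help",
    "thank you for the follow",
    "thank you for the help today",
    "see you at the park",
]
unigrams, tables = build_ngram_tables(corpus)
```

With this toy corpus, predict_next("thank you for the", unigrams, tables) returns "help", because the trigram context "for the" is followed by "help" twice and "follow" once. A shorter input such as "you" has no two-word context, so the sketch backs off to the bigram table.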

The Shiny App

  • The Shiny app can be found here: https://vishal-ambavade.shinyapps.io/Smart_Word_Predictor/
  • The app consists of a single text input field where the user enters text; the predicted word is shown in blue
  • Graphs show the top n-grams from the model
  • Below is a snapshot of the app

References