SWOORDS: Next Word? No Problem!

Hazim Hanif
12 June 2017

A Data Science Specialization Capstone Project using Shiny Apps

Introduction

  • Swoords is a text prediction app that predicts the next word of a phrase entered by the user, using n-gram models and a backoff algorithm.
  • This project was built for the Data Science Specialization Capstone.
  • The source data comes from HC Corpora: https://d396qusza40orc.cloudfront.net/dsscapstone/dataset/Coursera-SwiftKey.zip
  • The source data consists of multilingual datasets from blogs, Twitter and news.
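
To make the n-gram idea concrete, here is a minimal sketch of how term-frequency tables for bigrams through 5-grams might be built from raw text. It is written in Python for illustration only (the app itself runs in R), and the tokenizer rule and the names `tokenize`, `ngram_counts` and `build_tables` are hypothetical, not taken from the Swoords source.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes (an assumed rule)."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(lines, n):
    """Count every n-gram, stored as a space-joined string, across all lines."""
    counts = Counter()
    for line in lines:
        tokens = tokenize(line)
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def build_tables(lines, max_n=5):
    """Return {n: Counter} term-frequency tables for bigrams up to max_n-grams."""
    return {n: ngram_counts(lines, n) for n in range(2, max_n + 1)}


# Hypothetical three-line sample, not taken from HC Corpora.
sample = ["Thanks for the follow!", "thanks for the help", "Thanks for coming"]
tables = build_tables(sample)
print(tables[3]["thanks for the"])  # 2
```

Each table maps an n-gram string to how often it occurs; a table like this is the kind of term-frequency data that can be saved once and reused for lookups.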

Project Information

  • This app uses n-gram models combined with the Katz backoff algorithm to predict the next word of a given phrase.
  • The models were trained on a random 1% sample of the Twitter, blogs and news datasets.
  • N-grams of every order from bigrams up to 5-grams (pentagrams) were combined to produce the prediction model.
  • A term-frequency document is stored locally, which speeds up prediction.
  • At prediction time, R simply performs a lookup on the term-frequency document to find matching words.
  • Supports English only.
  • The app is deployed using Shiny.
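
The lookup-and-backoff step can be sketched roughly as follows: take the last four words of the input, look for 5-grams that start with them, and if none exist, drop the earliest word and retry with 4-grams, down to bigrams. This is a simplified backoff in the spirit of "stupid backoff", not full Katz backoff, which also discounts observed counts (for example with Good-Turing estimates) and weights lower-order estimates by a back-off factor α. The Python code, the sample corpus and the names `build_index` and `predict` are hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes (an assumed rule)."""
    return re.findall(r"[a-z']+", text.lower())


def build_index(lines, max_n=5):
    """index[n][prefix] -> Counter of words that follow the (n-1)-word prefix."""
    index = {n: defaultdict(Counter) for n in range(2, max_n + 1)}
    for line in lines:
        tokens = tokenize(line)
        for n in range(2, max_n + 1):
            for i in range(len(tokens) - n + 1):
                prefix = " ".join(tokens[i:i + n - 1])
                index[n][prefix][tokens[i + n - 1]] += 1
    return index


def predict(phrase, index, max_n=5):
    """Back off from the longest usable context to one word; None if nothing matches."""
    tokens = tokenize(phrase)
    for n in range(min(max_n, len(tokens) + 1), 1, -1):
        prefix = " ".join(tokens[-(n - 1):])
        followers = index[n].get(prefix)  # .get avoids inserting empty entries
        if followers:
            return followers.most_common(1)[0][0]
    return None


corpus = ["we want to go home", "they want to go home", "want to go out", "time to eat"]
index = build_index(corpus)
print(predict("you want to go", index))  # home
print(predict("nobody to", index))       # go
```

Here `"you want to go"` has no 5-gram match, so the lookup backs off to the 4-gram prefix `want to go` and returns `home` (seen twice, versus once for `out`); `"nobody to"` backs off all the way to the bigram prefix `to`.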

Features

Some features of Swoords:

  • Reactive input and output display.
  • Intuitive UI using custom themes.
  • Multiple word suggestions.
  • Word cloud of predicted words.
  • Performance optimization through locally stored term-frequency documents.
  • Uses models of up to 5-grams, so up to four preceding words are used as context for the prediction.
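
Multiple suggestions can plausibly come from the same backoff walk: collect candidates from the longest matching context first, then fill the remaining slots from shorter contexts, skipping words already suggested. The sketch below assumes a hypothetical precomputed index (`INDEX`) shaped as `index[n][prefix] -> Counter` of next words; `suggest` and the counts are illustrative, not Swoords data.

```python
from collections import Counter

# Hypothetical precomputed tables: INDEX[n][prefix] -> Counter of next words.
INDEX = {
    3: {"thanks for": Counter({"the": 5, "following": 2})},
    2: {"for": Counter({"the": 9, "a": 4, "you": 3})},
}


def suggest(phrase, index, k=3, max_n=5):
    """Return up to k (word, count) pairs, longest matching context first."""
    words = phrase.lower().split()
    seen, out = set(), []
    for n in range(min(max_n, len(words) + 1), 1, -1):
        prefix = " ".join(words[-(n - 1):])
        for word, count in index.get(n, {}).get(prefix, Counter()).most_common():
            if word not in seen:  # keep the higher-order suggestion only
                seen.add(word)
                out.append((word, count))
                if len(out) == k:
                    return out
    return out


print(suggest("Thanks for", INDEX))  # [('the', 5), ('following', 2), ('a', 4)]
```

The (word, count) pairs could also serve as weights for a word cloud, although counts taken from different n-gram orders are not directly comparable; a fuller model would normalize or weight them.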

How It Works

  1. Visit the app at: https://hazimhanif.shinyapps.io/Swoords/
  2. Follow the simple instructions provided.
  3. Enjoy!