Data Science Specialization Capstone Project

Dongjun_Cho

8/17/2020

Introduction

This project is the final step of the Capstone Project for the Coursera Data Science Specialization, offered in collaboration with SwiftKey. The goal is to build an application that predicts the next word from the user's previous input. The project applies Natural Language Processing (NLP) and text mining.

About Dataset

This project uses the SwiftKey dataset, which contains English text collected from blogs, news sites, and Twitter, from this site.

en_US.blogs
en_US.news
en_US.twitter

Predictive Model

The predictive model is based on an n-gram language model. The n-gram is the simplest kind of model that assigns probabilities to sentences and sequences of words (description from this site).
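To make the idea of "assigning probabilities to sequences of words" concrete, here is a minimal sketch of a bigram estimate, P(word | previous word) = count(previous word, word) / count(previous word). It is written in Python purely for illustration and is not the project's own code; the toy corpus and the function names `ngrams` and `bigram_probability` are assumptions made for this example.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-token tuples from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Toy corpus (made up for illustration; not taken from the SwiftKey data).
corpus = "i love data science and i love text mining".split()

unigram_counts = Counter(ngrams(corpus, 1))
bigram_counts = Counter(ngrams(corpus, 2))

def bigram_probability(prev_word, word):
    """Maximum-likelihood estimate of P(word | prev_word)."""
    prev_count = unigram_counts[(prev_word,)]
    if prev_count == 0:
        return 0.0  # unseen context: no estimate available
    return bigram_counts[(prev_word, word)] / prev_count

print(bigram_probability("i", "love"))     # "i" is always followed by "love" here
print(bigram_probability("love", "data"))  # "love" is followed by "data" half the time
```

Trigram and quadgram estimates follow the same pattern, with a two-word or three-word context in place of the single previous word.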

The model uses bigrams, trigrams, and quadgrams (4-grams), searching from the longest context to the shortest. The application first looks for a match in the quadgram table; if none is found, it falls back to the trigram table, and then to the bigram table.

bigram.RData
trigram.RData
quadgram.RData
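The quadgram-to-bigram fallback described above can be sketched as follows. This is a hedged illustration in Python, not the application's actual code: the internal structure of the `.RData` files is not described here, so the in-memory `tables` dictionary, the toy token list, and the helper names `build_table` and `predict_next` are all assumptions.

```python
from collections import Counter, defaultdict

def build_table(tokens, n):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        table[context][tokens[i + n - 1]] += 1
    return table

def predict_next(text, tables):
    """Try the longest context first (quadgram), then fall back to shorter ones."""
    words = text.lower().split()
    for n in sorted(tables, reverse=True):  # orders 4, 3, 2
        context = tuple(words[-(n - 1):])
        if len(context) < n - 1:
            continue  # not enough input words for this n-gram order
        followers = tables[n].get(context)
        if followers:
            return followers.most_common(1)[0][0]
    return None  # no match at any order

# Toy data (hypothetical; stands in for the bigram, trigram, and quadgram tables).
tokens = ("thanks for the follow and thanks for the help "
          "and thanks for the follow").split()
tables = {n: build_table(tokens, n) for n in (2, 3, 4)}

print(predict_next("thanks for the", tables))  # matched in the quadgram table
print(predict_next("cheers for the", tables))  # no quadgram match, falls back to trigram
```

Returning `None` when no table matches is a choice made for this sketch; an application could instead fall back to a list of the most frequent words.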

Shiny Application

Shiny App