Welcome to the Capstone Next Word Prediction Project

Aysegul Sonmez
03 May, 2018

                          Data Science Capstone
                       by Johns Hopkins University

Introduction

Jeff Leek, PhD, Associate Professor, Biostatistics, Bloomberg School of Public Health

Roger D. Peng, PhD, Associate Professor, Biostatistics, Bloomberg School of Public Health

Brian Caffo, PhD, Professor, Biostatistics, Bloomberg School of Public Health

Project Goal

The goal is to create a usable natural language processing application. The objective of the project is to build a functioning predictive text model. The data comes from a corpus called HC Corpora, and for this application only the English datasets were used.

The task set by [Johns Hopkins University](https://www.jhsph.edu/) (JHU) is to create a usable natural language processing application. This capstone project is offered in collaboration with SwiftKey.

For this project, the text mining packages tm and text2vec were used, along with the data manipulation package dplyr and the parallel processing package doParallel. The app was created using the shiny package.

Predictive Model / Algorithm



To build the predictive model, 2,000,000 lines were sampled from the combined Twitter, blogs, and news datasets. The sample was cleaned by removing all non-ASCII characters, converting the text to lowercase, and then removing contractions, punctuation, numbers, profanities, leftover stray letters, and extra whitespace.
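The cleaning steps above can be sketched as a single function. This is a minimal illustration in Python, not the project's own R code (which used tm and text2vec). The profanity list is a hypothetical placeholder, "removing contractions" is read here as dropping any word that contains an apostrophe, and keeping the one-letter words "a" and "i" is also an assumption.

```python
import re

# Hypothetical placeholder; the source does not say which profanity list was used.
PROFANITY = {"darn", "heck"}

def clean_line(text, profanity=PROFANITY):
    """Apply the cleaning steps in the order the text lists them."""
    # Drop every non-ASCII character.
    text = text.encode("ascii", "ignore").decode("ascii")
    text = text.lower()
    # Assumption: a contraction is any word with an internal apostrophe; drop it whole.
    text = re.sub(r"\b\w+'\w+\b", " ", text)
    # Replace punctuation and digits with spaces.
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = [t for t in text.split() if t not in profanity]
    # Drop stray single letters, keeping the real one-letter words "a" and "i".
    tokens = [t for t in tokens if len(t) > 1 or t in {"a", "i"}]
    # Joining on single spaces also removes extra whitespace.
    return " ".join(tokens)

print(clean_line("Hello, World! It's 2018 :)"))
```

Because non-ASCII characters are removed first, a contraction written with a curly apostrophe would be collapsed into one word rather than dropped; a real pipeline might normalize apostrophes before this step.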

Building the model was very processor-intensive. Prediction quality is sensitive to how much text goes into the n-gram dictionaries, so the amount of data processed has to be chosen carefully. For a given input, n (the n-gram order) is set to the number of words entered plus one. The model then searches for matching n-grams and orders the results from common (high frequency, or high probability) to rare.
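The lookup described above, with n equal to the number of typed words plus one and results ordered by frequency, might look like the following. It is a small Python sketch on a toy sentence, not the project's R implementation; the function names `ngram_counts` and `candidates` are made up for illustration.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def candidates(counts, context):
    """Return (next_word, frequency) pairs for n-grams that begin with
    the context, most frequent first (ties broken alphabetically)."""
    context = tuple(context)
    hits = [(gram[-1], c) for gram, c in counts.items() if gram[:-1] == context]
    return sorted(hits, key=lambda wc: (-wc[1], wc[0]))

# Toy corpus standing in for the cleaned sample.
tokens = "i love new york and i love new music and i love new york".split()
context = ["love", "new"]

# n = number of words typed + 1, so two typed words use trigrams.
counts = ngram_counts(tokens, len(context) + 1)
print(candidates(counts, context))  # [('york', 2), ('music', 1)]
```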

Frequencies for all n-grams up to 6-grams were computed from the sampled data. The top ten predictions are displayed using a back-off model, according to the user input.
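One way to picture a back-off over 1-grams to 6-grams is shown below. The source does not say which back-off variant was used, so this Python sketch assumes the simplest form: try the longest available context first, then drop the leftmost word and try again, collecting words until ten are found. No discounting or score weighting is applied, and `build_tables` and `predict` are hypothetical names.

```python
from collections import Counter, defaultdict

MAX_N = 6  # highest n-gram order mentioned in the text

def build_tables(tokens, max_n=MAX_N):
    """Map each context (a tuple of 0 to max_n - 1 words) to a Counter of next words."""
    tables = defaultdict(Counter)
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            tables[gram[:-1]][gram[-1]] += 1
    return tables

def predict(tables, text, k=10, max_n=MAX_N):
    """Return up to k next-word guesses, longest matching context first."""
    words = text.lower().split()
    # Keep at most max_n - 1 trailing words as the starting context.
    context = words[max(len(words) - (max_n - 1), 0):]
    results = []
    # Back off by dropping the leftmost context word each round;
    # the last round uses the empty context (plain word frequencies).
    for cut in range(len(context) + 1):
        ctx = tuple(context[cut:])
        for word, _ in tables.get(ctx, Counter()).most_common():
            if word not in results:
                results.append(word)
            if len(results) == k:
                return results
    return results

tokens = "i love new york and i love new music and i love new york".split()
tables = build_tables(tokens)
print(predict(tables, "I love new"))
```

Words found with a longer context always rank above words found only after backing off, which is a design choice of this sketch rather than something the source states.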

My exploratory analysis is described in an RPubs document (http://rpubs.com/).

Click here to access the Milestone Project.

The Shiny Application

Word Prediction Application User Manual

Word Prediction Application

1. The app uses tab panels for “Next Word Prediction”, “Algorithm”, and “About”.

2. In the Next Word Prediction tab, type your text to see the top ten predictions along with the best prediction.

3. A word cloud plot is drawn from your predictions.

4. Wait about 5 seconds for the dictionary to load.

5. Enter some text and explore; results appear within about one second.

6. To access the next word prediction application, please click here.
