Data Science Presentation

Alessandro Galletto
27th January, 2020

Introduction

This is the final project for the Data Science Course by Johns Hopkins University. It consists of developing a Shiny app that predicts the next word of a sentence.

The app carries out these functions:

  • An entry field for a sentence
  • A prediction algorithm for the next word, based on a corpus of sentences from news, Twitter, and blogs
  • Display of the suggested next word

Prediction algorithm

The algorithm is based on a text corpus that is tokenized into bigrams, trigrams, and quadgrams. The resulting frequency matrices are stored in RDS files, and the Shiny app reads these files to find and show the most frequently used next word.
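
As an illustration of the tokenization step, here is a minimal sketch of how n-gram frequency tables can be built. It is written in Python rather than R and is not the project's actual code: the whitespace tokenizer, the function names `tokenize` and `ngram_counts`, and the three-sentence corpus are all assumptions made for the example, and the tables are kept in memory instead of being saved to RDS files.

```python
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

def ngram_counts(sentences, n):
    """Count every run of n consecutive tokens across all sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        # Slide a window of width n over the token list.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

# Hypothetical toy corpus, standing in for the news, Twitter, and blog text.
corpus = [
    "thanks for the follow",
    "thanks for the update",
    "thanks for the follow back",
]

bigrams = ngram_counts(corpus, 2)
trigrams = ngram_counts(corpus, 3)
quadgrams = ngram_counts(corpus, 4)
```

Each table maps a tuple of words to the number of times that sequence occurs, which is the information a frequency-based predictor needs to look up.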

Shiny app server algorithm

  1. Read input string
  2. Tokenize the input
  3. Predict using quadgrams (if no match is found, fall back to trigrams, then bigrams)
  4. Show the three best choices for the next word.
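
The server steps above can be sketched as a simple backoff lookup. This is a hedged illustration in Python, not the app's R code: the `TABLES` structure, the `predict` function, and all counts are hypothetical. The sketch assumes each table maps a context (the last three, two, or one words of the input) to the frequencies of the words that follow it, and that the suggestions come from the first table with a match, so fewer than three words may be returned.

```python
from collections import Counter

# Hypothetical toy tables: context length -> {context tuple -> next-word counts}.
# A context of 3 words corresponds to a quadgram, 2 to a trigram, 1 to a bigram.
TABLES = {
    3: {("thanks", "for", "the"): Counter({"follow": 2, "update": 1})},
    2: {("for", "the"): Counter({"first": 5, "follow": 2, "update": 1})},
    1: {("the",): Counter({"same": 9, "first": 5, "best": 4})},
}

def predict(text, tables=TABLES, k=3):
    """Return up to k next-word suggestions, backing off from long to short contexts."""
    tokens = text.lower().split()
    for size in (3, 2, 1):
        if len(tokens) < size:
            continue  # Not enough input words for this context length.
        context = tuple(tokens[-size:])
        candidates = tables.get(size, {}).get(context)
        if candidates:
            # Most frequent continuations first.
            return [word for word, _ in candidates.most_common(k)]
    return []  # No table contains a matching context.
```

For example, `predict("waiting for the")` finds no quadgram match for the last three words, so it falls back to the trigram table and returns the continuations of "for the".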

Shiny app ui

  1. Simply type the text into the input box
  2. The three predictions are shown

Word predictor