18/6/2020

Data Science Capstone Final Project

Introduction

As described in the course instructions, this project consists of:

  1. A Shiny app that takes as input a phrase (multiple words) in a text box input and outputs a prediction of the next word.

  2. A slide deck consisting of no more than 5 slides created with RStudio.

The application was developed using the data set provided by the company SwiftKey.

These data were cleaned and processed. Because of their size, only a 5% sample was taken from each data set. Bigrams (3,264,676) and trigrams (3,104,218) were built from the sample, and their frequencies were computed to identify the most common ones. The predictions are generated from these frequencies.
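The sampling and counting steps described above can be sketched roughly as follows. This is only an illustration in Python, not the R code behind the app; the helper names (`sample_lines`, `tokenize`, `count_ngrams`), the tokenization rule, and the tiny example corpus are all assumptions made for the example.

```python
import random
import re
from collections import Counter


def sample_lines(lines, fraction=0.05, seed=42):
    """Keep each line with probability `fraction` (hypothetical helper)."""
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < fraction]


def tokenize(line):
    """Lowercase a line and keep runs of letters and apostrophes (assumed cleaning rule)."""
    return re.findall(r"[a-z']+", line.lower())


def count_ngrams(lines, n):
    """Count every run of n consecutive words across all lines."""
    counts = Counter()
    for line in lines:
        tokens = tokenize(line)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


# Toy corpus standing in for a sampled data set (made up for this example).
corpus = ["I love data science", "I love data", "data science is fun"]
bigrams = count_ngrams(corpus, 2)
trigrams = count_ngrams(corpus, 3)
```

With real data, `sample_lines` would be applied to each source file before counting, and the resulting counters would play the role of the frequency tables mentioned above.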

The app consists of three main modules:

  • Predict Text: In this part, users enter text and the app generates the possible next words, using bigrams and trigrams. The 10 most frequent are shown.

  • All Words: Here users can browse the full list of possible combinations, for both bigrams and trigrams, that match the text they have entered.

  • More: Information about the application.
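The lookup behind the Predict Text and All Words modules might look something like the sketch below. It is written in Python for illustration and is not the app's R code. The function name `predict_next`, the example counts, and in particular the choice to try trigrams first and fall back to bigrams are assumptions; the slides only state that both bigrams and trigrams are used.

```python
from collections import Counter


def predict_next(text, bigrams, trigrams, top_n=10):
    """Return candidate next words, most frequent first.

    Assumed strategy: match the last two words against trigrams,
    and fall back to the last word against bigrams if nothing matches.
    Pass top_n=None to list every candidate (as in an "All Words" view).
    """
    tokens = text.lower().split()
    candidates = Counter()
    if len(tokens) >= 2:
        w1, w2 = tokens[-2], tokens[-1]
        for (a, b, c), count in trigrams.items():
            if (a, b) == (w1, w2):
                candidates[c] += count
    if not candidates and tokens:
        last = tokens[-1]
        for (a, b), count in bigrams.items():
            if a == last:
                candidates[b] += count
    return [word for word, _ in candidates.most_common(top_n)]


# Made-up frequency tables for the example.
example_bigrams = Counter({("i", "love"): 3, ("love", "data"): 2, ("love", "r"): 1})
example_trigrams = Counter({("i", "love", "data"): 2, ("i", "love", "r"): 1})

top_words = predict_next("I love", example_bigrams, example_trigrams)
all_words = predict_next("love", example_bigrams, example_trigrams, top_n=None)
```

Scanning every n-gram on each call is fine for a toy table; with millions of n-grams, an index keyed by the leading words would be a more practical design.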
