2023-11-10

Data Science Capstone Project

This R Markdown presentation pitches an application that predicts the next word of a phrase. It is the capstone project for the Data Science Specialization on Coursera.

Objective

The goal of this exercise is to build a product that showcases the predictions of the algorithm developed in this project and provides an interface that others can access.

A Shiny app has been created that takes a phrase (multiple words) from a text box as input and outputs a prediction of the next word.

How it works

After a data sample was drawn from the corpora, it was cleaned by converting it to lowercase and removing punctuation, links, extra white space, numbers and other special characters. The cleaned sample was then tokenized into n-grams, and the aggregated bigram, trigram and quadgram term frequency matrices were converted into frequency dictionaries. The application uses the resulting data.frames to predict the next word from the text entered by the user and the frequencies stored in the underlying n-gram tables.
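The pipeline described above can be illustrated with a minimal Python sketch. It is not the app's actual code: the original is built in R with Shiny and data.frames, whereas here `collections.Counter` objects stand in for the frequency dictionaries, the sampling step is omitted, and the function names `clean_text`, `build_ngram_tables` and `predict_next` are hypothetical. The text does not say how the bigram, trigram and quadgram tables are combined, so the simple backoff used here (try the quadgram table first, then trigrams, then bigrams) is an assumption.

```python
import re
from collections import Counter


def clean_text(text: str) -> str:
    """Lowercase, then strip links, numbers, punctuation, special characters and extra white space."""
    text = text.lower()
    text = re.sub(r"(https?://|www\.)\S+", " ", text)  # remove links
    text = re.sub(r"[^a-z\s]", " ", text)              # remove numbers, punctuation, special characters
    return " ".join(text.split())                       # collapse white space


def build_ngram_tables(texts, orders=(2, 3, 4)):
    """Build one frequency dictionary (Counter of word tuples) per n-gram order."""
    tables = {n: Counter() for n in orders}
    for text in texts:
        tokens = clean_text(text).split()
        for n in orders:
            for i in range(len(tokens) - n + 1):
                tables[n][tuple(tokens[i:i + n])] += 1
    return tables


def predict_next(phrase, tables, k=3):
    """Return up to k likely next words.

    ASSUMPTION: a simple backoff; match the last n-1 input words against the
    highest-order table first and fall back to lower orders when nothing matches.
    """
    tokens = clean_text(phrase).split()
    for n in sorted(tables, reverse=True):
        prefix = tuple(tokens[-(n - 1):])
        if len(prefix) < n - 1:
            continue  # input too short for this n-gram order
        candidates = Counter(
            {ngram[-1]: count for ngram, count in tables[n].items() if ngram[:-1] == prefix}
        )
        if candidates:
            return [word for word, _ in candidates.most_common(k)]
    return []


# Usage on a tiny made-up corpus (not the capstone corpora).
corpus = [
    "I want to go to the beach.",
    "I want to go home now!",
    "Do you want to go to the movies? https://example.com",
]
tables = build_ngram_tables(corpus)
print(predict_next("I want to go", tables))  # → ['to', 'home']
```

Ranking candidates by raw n-gram counts keeps the lookup cheap, which suits an interactive app; a real implementation would likely also prune rare n-grams to keep the tables small enough to load quickly.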

User info

The user enters the desired words in the box “Enter your text here”, and the app outputs the predicted next words.

Only English input is accepted!