Capstone Final Project

November 28, 2025

Text Prediction

The goal of this exercise is to create a product that highlights the prediction algorithm I have built and to provide an interface that others can access. The Shiny app takes a phrase (multiple words) typed into a text box as input and outputs a prediction of the next word. This slide deck consists of no more than 5 slides, created with RStudio Presenter, pitching the algorithm and the app.

Links to the application and code for this project are given below:

- Shiny App: https://bcaslan.shinyapps.io/Data-Science-Capstone-Final-Project-main/

- Repo: https://github.com/11beyza/CapstoneFinalProject

Prediction Algorithm

The data were downloaded from Coursera-SwiftKey.zip. The blog, news, and Twitter datasets from the English-language files were read and combined into a text corpus (a collection of written texts) using VCorpus. The corpus was then processed with tm_map to convert the text to lower case, remove punctuation, numbers, stopwords, and extra whitespace, and stem words with stemDocument.
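The cleaning steps above can be sketched with the tm package as follows. This is a minimal illustration, not the project's actual script: the two sample sentences are made up, and the order of the transformations is an assumption (stopwords are removed before punctuation so that contractions such as "don't" still match the stopword list).

```r
# Sketch only: `sample_text` is invented data, not the SwiftKey corpus.
library(tm)         # VCorpus, tm_map, and the transformations used below
library(SnowballC)  # stemming backend used by stemDocument

sample_text <- c("The 2 dogs were running, quickly!",
                 "She is reading 10 books about R.")

# Build a volatile corpus from a character vector.
corpus <- VCorpus(VectorSource(sample_text))

# Apply each cleaning step in turn (the ordering is an assumption).
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, stemDocument)
corpus <- tm_map(corpus, stripWhitespace)

# Pull the cleaned text back out as a plain character vector.
cleaned <- vapply(corpus, as.character, character(1))
```

Wrapping tolower in content_transformer keeps each document a tm text document instead of turning it into a bare character vector.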

Next, tokenization is applied: the splitting of a phrase, sentence, paragraph, or entire text document into smaller units, such as individual words or terms. The processed corpus was tokenized into an n-gram frequency database of 2-grams (biwords), 3-grams (triwords), and 4-grams (quadwords), each stored with its frequency of occurrence.
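To make the idea of an n-gram frequency table concrete, here is a small base R sketch. The helper name ngram_freq and the three sample lines are hypothetical; the project itself may well use a dedicated tokenizer package, so treat this only as an illustration of the counting step.

```r
# Hypothetical helper: count the n-grams in a character vector of cleaned text.
ngram_freq <- function(lines, n) {
  grams <- unlist(lapply(lines, function(line) {
    # Split on whitespace and drop empty tokens.
    words <- strsplit(trimws(line), "\\s+")[[1]]
    words <- words[nzchar(words)]
    if (length(words) < n) return(character(0))
    # Slide a window of n words across the line.
    starts <- seq_len(length(words) - n + 1)
    vapply(starts,
           function(i) paste(words[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
  # Tabulate and sort so the most frequent n-grams come first.
  counts <- sort(table(grams), decreasing = TRUE)
  data.frame(ngram = names(counts), freq = as.integer(counts),
             stringsAsFactors = FALSE)
}

# Invented sample lines, standing in for the processed corpus.
lines <- c("thank you so much", "thank you for coming", "see you so soon")
bigrams  <- ngram_freq(lines, 2)
trigrams <- ngram_freq(lines, 3)
```

In this toy input, "thank you" and "you so" each occur twice, so they sort to the top of the bigram table.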

Shiny Application

The Shiny app predicts the next word after text entered by a user. The app first loads the n-gram frequency database from the GitHub repository; this database is used to perform the word prediction. The user enters text in an input box, and the predicted word is shown.
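One plausible way to turn the frequency database into a prediction is a simple back-off lookup: try the longest available context first, then fall back to shorter ones. The source does not state which lookup strategy the app uses, so the function predict_next, the back-off rule, and the toy table below are all assumptions made for illustration.

```r
# Hypothetical toy table; the real app loads much larger 2-, 3-, and 4-gram tables.
ngrams <- data.frame(
  ngram = c("thank you", "you so", "so much", "you so much",
            "thank you so", "thank you for", "thank you so much"),
  freq  = c(5, 3, 4, 2, 2, 1, 2),
  stringsAsFactors = FALSE
)

# Assumed back-off rule: use the last (max_n - 1) words as context,
# and shorten the context by one word whenever nothing matches.
predict_next <- function(phrase, ngrams, max_n = 4) {
  words <- strsplit(tolower(trimws(phrase)), "\\s+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) == 0) return(NA_character_)

  n_words <- lengths(strsplit(ngrams$ngram, " "))
  for (k in seq(min(length(words), max_n - 1), 1, by = -1)) {
    prefix <- paste0(paste(tail(words, k), collapse = " "), " ")
    # Keep only (k + 1)-grams that begin with the k-word context.
    hits <- ngrams[startsWith(ngrams$ngram, prefix) & n_words == k + 1, ]
    if (nrow(hits) > 0) {
      best <- hits$ngram[which.max(hits$freq)]
      # The prediction is whatever follows the context.
      return(substring(best, nchar(prefix) + 1))
    }
  }
  NA_character_  # no n-gram matched any context
}

full_match <- predict_next("thank you so", ngrams)
backed_off <- predict_next("I love you", ngrams)
no_match   <- predict_next("hello", ngrams)
```

With this toy table, the first call matches a 4-gram directly, the second backs off to the bigram context "you", and the third finds nothing and returns NA. In a real setting the input phrase would also need the same cleaning as the corpus (lower-casing, stopword removal, stemming) before lookup.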

Screenshot of the Application

Please see the Shiny app linked above for the application.