Coursera Data Science Capstone Project Presentation

A. Zuoza
16 August 2015

Project summary

The goal of the project was to create an algorithm and an application that predict the next word the user wants to type.

This is a brief presentation of the algorithm and the application.

All this work is a part of the Coursera Data Science specialization, offered by Johns Hopkins University.

All calculations, the analysis and the application were done with R and RStudio.

How prediction works 1

A frequency dictionary was built from the provided data.

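The dictionary consists of 88,158 trigrams, each of which occurs at least 5 times in the provided data. A small piece of the dictionary is shown below: X1 and X2 are the first two words of a trigram, Y is the third word, and Frequency is the number of times the trigram occurs.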

           X1       X2       Y Frequency
1    concerns    about       a         5
89   brothers      and       a         5
405        of becoming       a        61
3162      and      the ability       127
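A minimal Python sketch of how a dictionary like this could be built. The original work was done in R and its text-cleaning steps are not described here, so the lowercase/whitespace tokenization and the function name `build_trigram_dictionary` are assumptions; only the minimum count of 5 comes from the slide.

```python
from collections import Counter

def build_trigram_dictionary(texts, min_count=5):
    """Count word trigrams and keep those seen at least min_count times.

    Returns a list of (x1, x2, y, frequency) rows, mirroring the
    X1 / X2 / Y / Frequency columns of the table above.
    """
    counts = Counter()
    for text in texts:
        # Assumption: lowercase + whitespace tokenization (hypothetical cleaning).
        words = text.lower().split()
        for i in range(len(words) - 2):
            counts[(words[i], words[i + 1], words[i + 2])] += 1
    rows = [(x1, x2, y, n) for (x1, x2, y), n in counts.items() if n >= min_count]
    # Most frequent first; ties broken alphabetically for a stable order.
    rows.sort(key=lambda r: (-r[3], r[0], r[1], r[2]))
    return rows

corpus = ["and the ability to learn"] * 5 + ["of becoming a star"] * 2
for row in build_trigram_dictionary(corpus):
    print(row)
```

Here "of becoming a" occurs only twice, so it is dropped by the threshold, just as rare trigrams were left out of the 88,158-entry dictionary.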

Prediction uses a simple search and works for the third word only: if the user enters two words, the algorithm tries to predict the third. If the user enters more than two words, the prediction is based on the last two words.
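The lookup step can be sketched in Python as a plain linear search over the dictionary rows, using only the last two words of the input. The function name `predict_third_word` and the lowercase/whitespace splitting are illustrative assumptions; the sample rows are the four rows from the table above.

```python
def predict_third_word(rows, user_input):
    """Simple search: return the third words whose (X1, X2) match the
    last two words of the input, most frequent first.

    rows: list of (x1, x2, y, frequency) tuples, like the dictionary table.
    """
    words = user_input.lower().split()  # assumption: lowercase, split on whitespace
    if len(words) < 2:
        return []
    x1, x2 = words[-2], words[-1]  # only the last two words are used
    matches = [(y, n) for a, b, y, n in rows if a == x1 and b == x2]
    matches.sort(key=lambda m: -m[1])  # order by frequency, highest first
    return [y for y, _ in matches]

sample = [
    ("concerns", "about", "a", 5),
    ("brothers", "and", "a", 5),
    ("of", "becoming", "a", 61),
    ("and", "the", "ability", 127),
]
print(predict_third_word(sample, "I admire his dedication and the"))  # ['ability']
```

A linear scan is the simplest reading of "simple search"; with tens of thousands of rows, indexing the rows by (X1, X2) in a hash map would make each lookup constant-time.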

How prediction works 2 - results

There are three possible scenarios:
1. The user input is found in the dictionary. The algorithm then returns the possible third words, ordered by frequency. For example:
   - user input
   - output
2. Only the last word is found in the dictionary. The algorithm then returns the possible third words, ordered by frequency. For example:
   - user input
   - output
3. The user input is not found in the dictionary. The algorithm then returns an NA value.
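The three scenarios can be sketched as a fallback chain in Python, with `None` standing in for R's NA. The slides do not say which column the last word is matched against in the second scenario; this sketch assumes X2, and it sums frequencies when the same third word appears under several matching rows. Both choices, and the name `predict_with_fallback`, are assumptions.

```python
from collections import defaultdict

def predict_with_fallback(rows, user_input):
    """Return candidate third words ordered by frequency, or None (NA).

    rows: list of (x1, x2, y, frequency) tuples.
    """
    words = user_input.lower().split()  # assumption: lowercase, split on whitespace
    if not words:
        return None
    last = words[-1]
    prev = words[-2] if len(words) >= 2 else None

    def ranked(matching_rows):
        # Assumption: sum frequencies per candidate word, then sort by total.
        totals = defaultdict(int)
        for _, _, y, n in matching_rows:
            totals[y] += n
        return sorted(totals, key=lambda y: (-totals[y], y))

    # Scenario 1: both words match (X1, X2).
    full = [r for r in rows if r[0] == prev and r[1] == last]
    if full:
        return ranked(full)
    # Scenario 2: only the last word matches (assumed to be column X2).
    partial = [r for r in rows if r[1] == last]
    if partial:
        return ranked(partial)
    # Scenario 3: nothing found, the equivalent of NA.
    return None

sample = [
    ("concerns", "about", "a", 5),
    ("brothers", "and", "a", 5),
    ("of", "becoming", "a", 61),
    ("and", "the", "ability", 127),
]
print(predict_with_fallback(sample, "he is becoming"))  # ['a'] via scenario 2
print(predict_with_fallback(sample, "hello world"))     # None
```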

How the app works

Shiny app:

- The user enters her/his text.
- The entered text is displayed back.
- The prediction is shown.

Future development steps

- Expand the dictionary with less frequent trigrams (those seen fewer than 5 times).
- Expand the dictionary to the 4 most used words.

Additional links

My “next 6 words” prediction app is hosted on shinyapps.io: https://azuoza.shinyapps.io/Capstone_project
The code of the application, the reports and the scripts can be found on GitHub: https://github.com/azuoza/
More about the Data Science Specialization can be found on Coursera: https://www.coursera.org/