25/05/2020

Introduction

Using a dataset of English text gathered online, we build a model that predicts and generates text based on the text provided.

Getting and processing the data

The dataset is obtained from the Capstone course. We draw smaller samples of the data and build data frames that contain the frequency of each n-gram.
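As an illustration only, counting n-gram frequencies can be sketched as follows. This is a minimal Python sketch, not the project's own code: the function name `ngram_counts`, the whitespace tokenization, and the toy sample are all assumptions made for the example.

```python
from collections import Counter

def ngram_counts(lines, n):
    """Count n-grams (tuples of lowercase tokens) across lines of text.

    Hypothetical helper: tokenization is a plain whitespace split.
    """
    counts = Counter()
    for line in lines:
        tokens = line.lower().split()
        # Slide a window of width n over the tokens of each line.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

# Toy sample standing in for a small sample of the real data.
sample = ["the cat sat on the mat", "the cat ate"]
unigrams = ngram_counts(sample, 1)
bigrams = ngram_counts(sample, 2)
print(bigrams.most_common(1))  # [(('the', 'cat'), 2)]
```

Each resulting counter maps an n-gram to its frequency, which is the same information a frequency data frame would hold.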

Word frequency

Words frequency

Model

We build an n-gram model that predicts the next word from the previous n-1 words. If no suitable prediction is found, we back off to the (n-1)-gram model.
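The back-off idea can be sketched as below. This is a simplified Python illustration under stated assumptions, not the app's actual implementation: `build_model` and `predict` are hypothetical names, the maximum order of 3 is arbitrary, and probabilities are plain relative frequencies within the matched context, with no smoothing or discounting.

```python
from collections import Counter, defaultdict

def build_model(lines, max_n=3):
    """Map each context (tuple of 0..max_n-1 words) to a Counter of next words."""
    model = defaultdict(Counter)
    for line in lines:
        tokens = line.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                *context, word = tokens[i:i + n]
                model[tuple(context)][word] += 1
    return model

def predict(model, text, max_n=3, k=3):
    """Return up to k (word, probability) pairs for the next word.

    Tries the longest context first and backs off to shorter ones;
    the empty context corresponds to plain unigram frequencies.
    """
    tokens = text.lower().split()
    for n in range(max_n, 0, -1):
        if len(tokens) < n - 1:
            continue  # not enough input words for this order
        context = tuple(tokens[len(tokens) - (n - 1):]) if n > 1 else ()
        followers = model.get(context)
        if followers:
            total = sum(followers.values())
            return [(w, c / total) for w, c in followers.most_common(k)]
    return []

# Toy corpus, assumed for illustration only.
corpus = ["the cat sat on the mat", "the cat ate the fish"]
model = build_model(corpus)
print(predict(model, "the cat"))     # trigram context found
print(predict(model, "purple cat"))  # backs off to the bigram context ("cat",)
```

Generating N words amounts to calling `predict` repeatedly and appending the top suggestion to the input each time.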

Shiny App

We present the model as a Shiny app with four main features:
- Text Generator:
generates N words based on the text you provide.
- Auto-Complete:
returns the 3 most likely next words for your input (with their corresponding probabilities).
- Word ranks:
finds the word at a given rank, or the rank of a given word.
- Word and n-gram frequencies:
displays bar plots of the frequencies of the top 20 n-grams and words.
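
The word-rank lookup can be illustrated with a short Python sketch. The helper names (`rank_table`, `word_at_rank`, `rank_of_word`) and the toy text are assumptions for this example and do not come from the app itself; rank 1 is taken to mean the most frequent word.

```python
from collections import Counter

def rank_table(counts):
    """Return words ordered by descending frequency (rank 1 = most frequent)."""
    return [word for word, _ in counts.most_common()]

def word_at_rank(ranked, rank):
    """Return the word at a 1-based rank, or None if the rank is out of range."""
    return ranked[rank - 1] if 1 <= rank <= len(ranked) else None

def rank_of_word(ranked, word):
    """Return the 1-based rank of a word, or None if the word is unknown."""
    return ranked.index(word) + 1 if word in ranked else None

# Toy word counts, assumed for illustration only.
counts = Counter("the cat sat on the mat the end".split())
ranked = rank_table(counts)
print(word_at_rank(ranked, 1))      # the
print(rank_of_word(ranked, "the"))  # 1
print(rank_of_word(ranked, "dog"))  # None
```

The first 20 entries of `counts.most_common(20)` are the kind of data a top-20 frequency bar plot would display.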