Using a dataset that consists of English text gathered online, we try to build a model that can predict and generate text based provided text.
Getting and processing the data
the dataset is obtained from the Capstone course, we create smaller samples of the data and create data frames that contain the frequency of each n-gram