This project can be viewed as an application of Text Mining and Natural Language Processing, fields that process documents or other collections of text to meet a study objective.
More precisely, the main purpose of the project is to exploit the probability distribution of text drawn from a collection of documents (a corpus). That input is used to estimate the parameters of a language model, which in turn supports an algorithm that, applied to a dataset, calculates the most likely words at the tail of a partially written sentence.
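To make the idea concrete, here is a minimal sketch of how such parameters might be estimated and used. It assumes a bigram model with maximum-likelihood estimates, which is only one possible choice and not necessarily the model used in this project. The function names `train_bigram_model` and `predict_next`, the whitespace tokenization, and the toy corpus are all hypothetical and serve only as illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus):
    """Count, for each word, how often every other word follows it.

    `corpus` is an iterable of strings (one per document). Tokenization
    is a plain lowercase whitespace split, an assumption made for brevity.
    """
    counts = defaultdict(Counter)
    for document in corpus:
        tokens = document.lower().split()
        for previous_word, next_word in zip(tokens, tokens[1:]):
            counts[previous_word][next_word] += 1
    return counts


def predict_next(counts, text, k=3):
    """Return up to k (word, probability) pairs for the word after `text`.

    Probabilities are maximum-likelihood estimates:
    P(next | last) = count(last, next) / count(last, *).
    An empty list is returned when the last word was never seen.
    """
    tokens = text.lower().split()
    if not tokens or tokens[-1] not in counts:
        return []
    followers = counts[tokens[-1]]
    total = sum(followers.values())
    return [(word, count / total) for word, count in followers.most_common(k)]


# Hypothetical toy corpus, used only to exercise the functions.
toy_corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]
model = train_bigram_model(toy_corpus)
predictions = predict_next(model, "I saw the", k=1)
print(predictions)
```

A real system would likely use longer n-grams together with smoothing or backoff, so that word sequences absent from the corpus still receive a nonzero probability; the sketch omits both to keep the estimation step visible.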