Abdallah Abdelsameia
June 28th, 2020
This pitch presentation introduces the text prediction final project for the Data Science Specialization.
The aim of the project is to:
The n-gram model was built using a bigram and a trigram.
The bigram was conditioned on the previous word:
$$P(w_n \mid w_{n-1}) = \frac{C(w_{n-1} w_n)}{C(w_{n-1})}$$
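The count ratio above can be sketched in a few lines of Python. This is only an illustration on a toy sentence, not the project's actual code; the function names `train_counts` and `bigram_prob` and the whitespace tokenization are assumptions made for the example.

```python
from collections import Counter

def train_counts(tokens):
    """Count single words and adjacent word pairs in a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams):
    """P(word | prev) = C(prev word) / C(prev); 0.0 if prev was never seen."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

# Toy corpus (hypothetical), split on whitespace
tokens = "the cat sat on the mat the cat ran".split()
unigrams, bigrams = train_counts(tokens)
p = bigram_prob("the", "cat", unigrams, bigrams)
```

Here "the" occurs three times and is followed by "cat" twice, so the estimate is 2/3.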
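The trigram was conditioned on the previous two words and interpolated with the lower-order estimates:
$$\hat{P}(w_n \mid w_{n-2} w_{n-1}) = \lambda_1 P(w_n) + \lambda_2 P(w_n \mid w_{n-1}) + \lambda_3 P(w_n \mid w_{n-2} w_{n-1})$$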
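A minimal sketch of this interpolation follows, assuming each component is a plain count ratio. The weights `(0.1, 0.3, 0.6)` are placeholders chosen for illustration, since the slides do not give the values used; in linear interpolation the weights are normally chosen to sum to 1 so the result remains a probability. The function name `interpolated_prob` is likewise hypothetical.

```python
from collections import Counter

def interpolated_prob(w2, w1, w, uni, bi, tri, lambdas=(0.1, 0.3, 0.6)):
    """Mix unigram, bigram, and trigram estimates for w after (w2, w1).

    lambdas are illustrative placeholder weights, not the project's values.
    """
    l1, l2, l3 = lambdas
    total = sum(uni.values())
    p_uni = uni[w] / total if total else 0.0
    p_bi = bi[(w1, w)] / uni[w1] if uni[w1] else 0.0
    p_tri = tri[(w2, w1, w)] / bi[(w2, w1)] if bi[(w2, w1)] else 0.0
    return l1 * p_uni + l2 * p_bi + l3 * p_tri

# Toy corpus (hypothetical), split on whitespace
tokens = "the cat sat on the mat the cat ran".split()
uni = Counter(tokens)
bi = Counter(zip(tokens, tokens[1:]))
tri = Counter(zip(tokens, tokens[1:], tokens[2:]))

p = interpolated_prob("the", "cat", "sat", uni, bi, tri)
```

Because the unigram term is non-zero for every word in the vocabulary, a word can still receive some probability even when its bigram or trigram was never seen with the given context.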
Unobserved n-grams were assigned the frequency of the least frequently observed n-gram.
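One way to read this rule is as a floor on the counts: an unseen n-gram borrows the smallest count found in the table. The sketch below follows that reading. The helper `count_with_floor` is hypothetical, and the slides do not say whether the resulting values were renormalized, so the output is best treated as a score, not a proper probability.

```python
from collections import Counter

def count_with_floor(ngram, counts):
    """Return the observed count, or the smallest observed count if unseen."""
    if not counts:
        return 0
    if ngram in counts:
        return counts[ngram]
    # Unseen n-gram: borrow the count of the least observed n-gram
    return min(counts.values())

# Toy corpus (hypothetical), split on whitespace
tokens = "the cat sat on the mat the cat ran".split()
bigram_counts = Counter(zip(tokens, tokens[1:]))

seen = count_with_floor(("the", "cat"), bigram_counts)
unseen = count_with_floor(("cat", "mat"), bigram_counts)
```

In practice the minimum would be computed once and cached, not recomputed on every lookup.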
The data was a 10% sample of the Twitter, Blogs, and News data.
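The slides do not describe how the 10% was drawn. As one plausible approach, a fixed-seed random sample of lines from each source could look like the sketch below; the function `sample_lines`, the seed, and the in-memory stand-in corpus are all assumptions made for illustration.

```python
import random

def sample_lines(lines, fraction=0.10, seed=2020):
    """Draw a reproducible random sample of the given fraction of lines."""
    rng = random.Random(seed)
    k = int(len(lines) * fraction)
    return rng.sample(lines, k)

# Stand-in data (hypothetical); the real sources would be read from files
corpus = {
    "twitter": [f"tweet {i}" for i in range(100)],
    "blogs": [f"blog post {i}" for i in range(100)],
    "news": [f"news item {i}" for i in range(100)],
}

sampled = {name: sample_lines(lines) for name, lines in corpus.items()}
```

Sampling each source separately keeps all three represented in the same proportion as in the full data.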
There are a couple of things that might enhance the working of this algorithm