Ghida Ibrahim
This is the presentation of the capstone project completed as part of the Coursera Data Science Specialization, in partnership with Johns Hopkins University and SwiftKey.
The goal of this capstone project is to develop a text prediction app that predicts the next word from the words the user has already written. The main steps were:
A sample was drawn from the HC Corpora data and cleaned by converting it to lowercase and removing punctuation, links, extra white space, numbers and other special characters. The cleaned sample was then tokenized into n-grams (sequences of n consecutive words).
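The cleaning and tokenization step can be illustrated with a minimal Python sketch. This is not the author's code: the regular expressions, the restriction to ASCII letters, and the helper names `clean_text` and `ngrams` are assumptions chosen for illustration.

```python
import re

def clean_text(text):
    """Lowercase text and strip links, digits, punctuation and special characters (assumed rules)."""
    text = text.lower()
    text = re.sub(r"(?:https?://|www\.)\S+", " ", text)  # drop links
    text = re.sub(r"[^a-z\s]", " ", text)                # drop digits, punctuation, special characters
    text = re.sub(r"\s+", " ", text).strip()             # collapse extra white space
    return text

def ngrams(tokens, n):
    """Return all n-grams (as space-joined strings) from a list of tokens."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sample = "Check https://example.com NOW!!! I want 2 go to the store."
tokens = clean_text(sample).split()
print(tokens)             # ['check', 'now', 'i', 'want', 'go', 'to', 'the', 'store']
print(ngrams(tokens, 2))  # ['check now', 'now i', 'i want', 'want go', 'go to', 'to the', 'the store']
```

Removing digits outright, as done here, can create spurious n-grams such as "want go"; that trade-off follows from the cleaning rules described above.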
The bigram, trigram and quadgram term frequency matrices were aggregated and converted into frequency dictionaries.
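As a rough illustration of the aggregation step, the sketch below counts bigrams, trigrams and quadgrams across all sentences with Python's `collections.Counter`, which plays the role of a frequency dictionary here. The function name `build_freq_dicts` and the assumption that the input is already cleaned, space-separated text are hypothetical; the original project used its own term frequency matrices.

```python
from collections import Counter

def build_freq_dicts(sentences, orders=(2, 3, 4)):
    """Aggregate n-gram counts over all sentences: {n: Counter(n-gram -> count)}."""
    freq = {n: Counter() for n in orders}
    for sentence in sentences:
        tokens = sentence.split()  # assumes text was already cleaned
        for n in orders:
            freq[n].update(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return freq

freq = build_freq_dicts(["i want to go", "i want to eat", "you want to go"])
print(freq[2].most_common(1))  # [('want to', 3)]
print(freq[3]["want to go"])   # 2
```

Counting across the whole corpus at once, rather than per document, mirrors summing the rows of a term frequency matrix.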
The resulting data frames are then used to predict the next word: the text entered by the user is matched against the n-gram tables, and the candidates are ranked by the frequencies of the matching n-grams.
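The source does not spell out the exact lookup rule, so the following is only a sketch assuming a simple backoff: match the last three words against the quadgrams, fall back to the last two words against the trigrams, then to the last word against the bigrams, and return the most frequent completion. The function name `predict_next` and the toy counts are hypothetical, and the input is assumed to be cleaned like the corpus (here it is only lowercased and split).

```python
from collections import Counter

def predict_next(text, freq, max_n=4):
    """Predict the next word, backing off from the longest n-gram context to shorter ones."""
    words = text.lower().split()
    for n in range(max_n, 1, -1):
        if len(words) < n - 1:
            continue  # not enough input words for this n-gram order
        context = " ".join(words[-(n - 1):])
        candidates = Counter()
        for ngram, count in freq.get(n, {}).items():
            head, _, last = ngram.rpartition(" ")
            if head == context:
                candidates[last] += count
        if candidates:
            return candidates.most_common(1)[0][0]
    return None  # no n-gram matched the input

# Toy frequency dictionaries (hypothetical counts)
freq = {
    2: Counter({"want to": 3, "to go": 2, "to eat": 5}),
    3: Counter({"want to go": 2, "want to eat": 1}),
    4: Counter({"i want to go": 1}),
}
print(predict_next("You want to", freq))   # go  (trigram match)
print(predict_next("They like to", freq))  # eat (backs off to bigrams)
```

Scanning every n-gram on each call keeps the sketch short; a responsive app would instead index each table by its context words.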
The app can be found here: https://ghida.shinyapps.io/Shiny_App_Capstone_Project/