Final Presentation

NHartley

2024-05-14

Product Description

The goal of this project is to build a predictive text model that predicts the next word following a sequence of two to four words, based on a corpus of text drawn from public social media and news sources.
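To make the prediction step concrete, the sketch below shows one simple way such a lookup can work: count how often each word follows each two-word prefix, then return the most frequent continuation. This is a minimal illustration only; the project's actual smoothing, backoff, and data structures are not described here, and the toy corpus and function names are invented for the example.

    from collections import Counter, defaultdict

    def build_ngram_table(tokens, n):
        """Count how often each word follows each (n-1)-word prefix."""
        table = defaultdict(Counter)
        for i in range(len(tokens) - n + 1):
            prefix = tuple(tokens[i:i + n - 1])
            nxt = tokens[i + n - 1]
            table[prefix][nxt] += 1
        return table

    def predict_next(table, prefix):
        """Return the most frequent word observed after the given prefix."""
        counts = table.get(tuple(prefix))
        if not counts:
            return None
        return counts.most_common(1)[0][0]

    # Illustrative usage with a toy corpus
    tokens = "the cat sat on the mat and the cat slept".split()
    trigrams = build_ngram_table(tokens, 3)
    print(predict_next(trigrams, ["the", "cat"]))  # e.g. 'sat'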

Source N-gram Analysis

Most bodies of natural-language text generally conform to Zipf's law, an empirical rule stating that the frequency of any word in a dataset is inversely proportional to its rank in the frequency table.
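A quick way to see whether a corpus behaves this way is to tabulate rank and frequency: under Zipf's law, rank times frequency should stay roughly constant. The sketch below is an illustrative check only, assuming plain whitespace tokenization; the toy text stands in for the project corpus.

    from collections import Counter

    def zipf_check(tokens, top=10):
        """Print rank, frequency, and rank*frequency for the most common words.

        Under Zipf's law, frequency is roughly proportional to 1/rank,
        so rank * frequency should stay roughly constant."""
        counts = Counter(tokens).most_common(top)
        for rank, (word, freq) in enumerate(counts, start=1):
            print(f"{rank:>4}  {word:<15} {freq:>8} {rank * freq:>10}")

    # Illustrative usage with a toy token list
    text = "the quick brown fox jumps over the lazy dog the fox"
    zipf_check(text.split())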

Source Stop Word Analysis

Retaining common stop words, which are often removed during preprocessing, made the training set larger and the model more robust to a wider range of potential text inputs, making it more generalizable.
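The sketch below contrasts the two preprocessing choices, showing why retention matters for n-gram prediction: with stop words kept, phrases like "going to the" survive intact as lookup prefixes. The stop-word list and tokenizer here are illustrative assumptions, not the ones used in the project.

    # Illustrative stop-word handling: the project retains stop words,
    # so sequences like "going to the" remain available for n-gram counting.
    STOP_WORDS = {"the", "to", "a", "of", "and"}  # illustrative subset only

    def tokenize(text, remove_stop_words=False):
        tokens = text.lower().split()
        if remove_stop_words:
            tokens = [t for t in tokens if t not in STOP_WORDS]
        return tokens

    sentence = "I am going to the store"
    print(tokenize(sentence))                          # stop words kept
    print(tokenize(sentence, remove_stop_words=True))  # stop words dropped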

Model Effectiveness

The model returns clear and coherent responses, although its accuracy shows heightened sensitivity to modest differences in spelling and to hidden, non-printing characters in the input text.
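One way to reduce that sensitivity is to normalize input before lookup. The sketch below is an assumed cleaning step, not part of the described model: it lowercases the text, drops non-printing characters, and collapses whitespace so small formatting differences map to the same prefix.

    import re
    import unicodedata

    def normalize(text):
        """Lowercase, strip non-printing characters, and collapse whitespace
        so minor spelling/format differences map to the same lookup key."""
        text = unicodedata.normalize("NFKC", text)
        # drop zero-width and other non-printing characters
        text = "".join(ch for ch in text if ch.isprintable())
        return re.sub(r"\s+", " ", text).strip().lower()

    print(normalize("Going\u200b To   The"))  # -> "going to the"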