Introduction

This document presents an exploratory data analysis (EDA) for a text prediction task. The goal is to identify basic patterns in text data, such as how often each word occurs, that can support next-word prediction.

Data Overview

The analysis uses a toy corpus of five short English phrases, stored in a character vector, of the kind commonly used to illustrate natural language processing tasks.

text_data <- c(
  "i am happy",
  "you are learning",
  "we are students",
  "hello world",
  "good morning"
)

text_data
## [1] "i am happy"       "you are learning" "we are students"  "hello world"     
## [5] "good morning"

Word Frequency Analysis

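# Lowercase the phrases, split them on spaces, and flatten the result into one vector of words
words <- unlist(strsplit(tolower(text_data), " "))
# Count how often each distinct word occurs
table(words)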
## words
##       am      are     good    happy    hello        i learning  morning 
##        1        2        1        1        1        1        1        1 
## students       we    world      you 
##        1        1        1        1
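
The word counts above treat each word in isolation. Since the stated goal is next-word prediction, one natural next step is to count which word follows which. The sketch below is one possible way to do this with bigram (word pair) counts in base R; it is not part of the original analysis. The names `make_bigrams`, `bigram_counts`, and `predict_next` are hypothetical helpers introduced only for illustration, and the corpus is copied from the Data Overview section so that the block runs on its own.

```r
# Toy corpus copied from the Data Overview section.
text_data <- c(
  "i am happy",
  "you are learning",
  "we are students",
  "hello world",
  "good morning"
)

# Split each phrase separately so that word pairs never span two phrases.
tokens <- strsplit(tolower(text_data), " ")

# Hypothetical helper: build (previous word, next word) pairs within one phrase.
make_bigrams <- function(w) {
  if (length(w) < 2) {
    return(NULL)
  }
  data.frame(prev = head(w, -1), nxt = tail(w, -1), stringsAsFactors = FALSE)
}
bigrams <- do.call(rbind, lapply(tokens, make_bigrams))

# Count how often each word follows each preceding word.
bigram_counts <- table(bigrams$prev, bigrams$nxt)

# Hypothetical helper: return the most frequent follower(s) of a word.
# Ties are all returned; unseen words yield an empty character vector.
predict_next <- function(word) {
  if (!word %in% rownames(bigram_counts)) {
    return(character(0))
  }
  followers <- bigram_counts[word, ]
  names(followers)[followers == max(followers)]
}

predict_next("are")
```

With a corpus this small, most words have a single observed follower, and "are" is followed once each by "learning" and "students", so the helper returns both as a tie. A larger corpus would be needed before such counts say anything useful about likely next words.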