This document presents an exploratory data analysis (EDA) for a text prediction task. The goal is to identify basic patterns in text data, starting with word frequencies, that can support next-word prediction.
The analysis uses five short English sentences as a toy sample.
text_data <- c(
"i am happy",
"you are learning",
"we are students",
"hello world",
"good morning"
)
text_data
## [1] "i am happy"       "you are learning" "we are students"  "hello world"
## [5] "good morning"
## Word Frequency Analysis
words <- unlist(strsplit(tolower(text_data), " "))
table(words)
## words
##       am      are     good    happy    hello        i learning  morning
##        1        2        1        1        1        1        1        1
## students       we    world      you
##        1        1        1        1
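Since the stated goal is next-word prediction, a natural follow-up is to count adjacent word pairs (bigrams) as well as single words. The block below is a minimal sketch under that assumption, not part of the original analysis. The names `tokens`, `word_freq`, `make_bigrams`, `bigrams`, `bigram_freq`, and `next_words` are hypothetical and introduced here for illustration only. It reuses the same five sentences and base R functions.

```r
# Same toy sample as above
text_data <- c(
  "i am happy",
  "you are learning",
  "we are students",
  "hello world",
  "good morning"
)

# Tokenize each sentence separately so that bigrams never span two sentences
tokens <- strsplit(tolower(text_data), " ", fixed = TRUE)

# Word frequencies, most frequent first
word_freq <- sort(table(unlist(tokens)), decreasing = TRUE)

# Hypothetical helper: pair each word with the word that follows it
make_bigrams <- function(w) {
  if (length(w) < 2) return(character(0))
  paste(w[-length(w)], w[-1])
}

bigrams <- unlist(lapply(tokens, make_bigrams))
bigram_freq <- table(bigrams)

# Hypothetical helper: words observed directly after `word`, with counts
next_words <- function(word) {
  hits <- bigrams[startsWith(bigrams, paste0(word, " "))]
  table(sub("^[^ ]+ ", "", hits))
}

next_words("are")
```

In this sample, "are" is followed once by "learning" and once by "students", so the counts alone cannot rank the two candidates. A larger corpus would be needed before such counts become useful for prediction.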