This milestone report summarizes progress to date on the text prediction project. It shows how the dataset is loaded and explored, presents a basic exploratory analysis, and outlines the plan for building a predictive model and a Shiny app.
The dataset consists of three text files: blogs, news, and Twitter.
# File names of the three corpus files (placeholders; adjust to the local paths)
blogs <- "en_US.blogs.txt"
news <- "en_US.news.txt"
twitter <- "en_US.twitter.txt"
# Display file sizes (as an example)
file_info <- data.frame(
  File = c("Blogs", "News", "Twitter"),
  Size_MB = c(200, 150, 160)  # Placeholder values
)
file_info
##      File Size_MB
## 1   Blogs     200
## 2    News     150
## 3 Twitter     160
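The sizes above are hard-coded placeholders. As a minimal sketch of how they could be measured instead, the snippet below uses base R's `file.size()`. The helper name `file_size_mb` and the assumption that the three files sit in the working directory are illustrative only and not part of the original report.

```r
# Hypothetical helper: on-disk size in megabytes for each path.
# file.size() returns bytes, or NA for a path that does not exist.
file_size_mb <- function(paths) {
  round(file.size(paths) / 1024^2, 1)
}

# Assumed locations: the three files in the current working directory.
paths <- c(
  Blogs   = "en_US.blogs.txt",
  News    = "en_US.news.txt",
  Twitter = "en_US.twitter.txt"
)

file_info <- data.frame(
  File = names(paths),
  Size_MB = file_size_mb(paths),
  row.names = NULL
)
file_info
```

If a file is missing, its `Size_MB` entry is `NA` rather than an error, which makes a wrong path easy to spot in the table.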
# Placeholder example statistics
data_summary <- data.frame(
  File = c("Blogs", "News", "Twitter"),
  Word_Count = c(1000000, 900000, 1200000),
  Line_Count = c(80000, 75000, 100000)
)
data_summary
##      File Word_Count Line_Count
## 1   Blogs    1000000      80000
## 2    News     900000      75000
## 3 Twitter    1200000     100000
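The counts above are placeholders as well. One possible way to compute them with base R is sketched below. The helper `count_lines_words` is hypothetical, it treats a "word" as any run of non-whitespace characters (an assumption, since the report does not define the term), and it reads each file fully into memory, which may be slow for files of this size.

```r
# Hypothetical helper: count lines and whitespace-delimited words in one file.
count_lines_words <- function(path, label = basename(path)) {
  # skipNul drops embedded NUL bytes; warn = FALSE silences the
  # "incomplete final line" warning
  lines <- readLines(path, encoding = "UTF-8", skipNul = TRUE, warn = FALSE)
  # Split each line on runs of whitespace; empty tokens are not counted
  tokens <- strsplit(lines, "[[:space:]]+")
  data.frame(
    File = label,
    Word_Count = sum(vapply(tokens, function(x) sum(nzchar(x)), integer(1))),
    Line_Count = length(lines)
  )
}

# Assumed locations: the three files in the current working directory.
paths <- c(
  Blogs   = "en_US.blogs.txt",
  News    = "en_US.news.txt",
  Twitter = "en_US.twitter.txt"
)

# Only process files that are actually present; the result is NULL if none are.
found <- paths[file.exists(paths)]
data_summary <- do.call(rbind, unname(Map(count_lines_words, found, names(found))))
data_summary
```

A dedicated tokenizer would give somewhat different word counts, because punctuation and numbers are handled differently; the whitespace rule here is only a rough first pass.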
This report covers the initial steps and serves as a foundation for further development of the predictive model and Shiny app. The file sizes and counts shown above are placeholder values, not measurements.