Introduction

This report presents an initial exploration of text data that will be used to build a text prediction application.


Data Description

The data comes from three sources:
- Blogs
- News
- Twitter


Simple Plot

library(ggplot2)

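# Words per line: split each line on single spaces and count the pieces.
# blogs, news and twitter are character vectors with one element per line (see Simple Data Summary).
words <- c(lengths(strsplit(blogs, " ")), lengths(strsplit(news, " ")), lengths(strsplit(twitter, " ")))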

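# One source label per line; rep() keeps the labels aligned with words for any number of lines
source <- rep(c("Blogs", "News", "Twitter"), times = c(length(blogs), length(news), length(twitter)))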

df <- data.frame(words, source)

ggplot(df, aes(words)) + geom_histogram(binwidth = 1, fill = "steelblue") + facet_wrap(~source) + labs(title = "Word Count per Line", x = "Words per line", y = "Lines")

Plan for Prediction Algorithm

The final application will predict the next word based on previously typed words.

Plan for Shiny App

The Shiny app will allow users to enter text and receive a predicted next word.
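Behind the text box, the app needs a function that maps the entered text to a predicted next word. The sketch below assumes a simple bigram approach: it counts which word most often follows each word in the training lines. The report does not commit to this method. `tokenize()`, `build_bigrams()` and `predict_next()` are hypothetical names. The training lines are the sample vectors from Simple Data Summary, copied inline so the block runs on its own.

```r
# Hypothetical bigram predictor: an assumption for illustration, not the
# report's final model. Base R only.

# Lower-case a line, replace anything other than letters, apostrophes and
# spaces with a space, then split into words and drop empty strings
tokenize <- function(text) {
  text <- gsub("[^a-z' ]", " ", tolower(text))
  words <- strsplit(text, " +")[[1]]
  words[words != ""]
}

# Count how often each word (nxt) follows each previous word (prev),
# sorted by count (descending) and then alphabetically to break ties
build_bigrams <- function(lines) {
  pairs <- do.call(rbind, lapply(lines, function(line) {
    w <- tokenize(line)
    if (length(w) < 2) return(NULL)
    data.frame(prev = head(w, -1), nxt = tail(w, -1), n = 1,
               stringsAsFactors = FALSE)
  }))
  counts <- aggregate(n ~ prev + nxt, data = pairs, FUN = sum)
  counts[order(-counts$n, as.character(counts$nxt)), ]
}

# Return the most frequent follower of the last typed word, or NA if unseen
predict_next <- function(text, bigrams) {
  w <- tokenize(text)
  if (length(w) == 0) return(NA_character_)
  hits <- bigrams[bigrams$prev == w[length(w)], ]
  if (nrow(hits) == 0) NA_character_ else as.character(hits$nxt[1])
}

# Training lines: the sample vectors from Simple Data Summary, combined
training <- c("I love data science", "Text prediction is interesting",
              "The economy is growing", "Markets closed higher today",
              "I love coding", "Learning R is fun")
bigrams <- build_bigrams(training)

predict_next("I", bigrams)      # "love" (it follows "i" twice)
predict_next("hello", bigrams)  # NA (word never seen in training)
```

In a Shiny server function, `predict_next()` would be called on the contents of the text input each time it changes. Breaking ties alphabetically and returning NA for unseen words are choices made for this sketch only.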

Conclusion

This exploration confirms that the data is ready for building the prediction model and the Shiny application.

Simple Data Summary

# Small sample vectors standing in for the blogs, news and Twitter data (one element per line)
blogs <- c("I love data science", "Text prediction is interesting")
news <- c("The economy is growing", "Markets closed higher today")
twitter <- c("I love coding", "Learning R is fun")

data.frame(
  Source = c("Blogs", "News", "Twitter"),
  Lines = c(length(blogs), length(news), length(twitter))
)
##    Source Lines
## 1   Blogs     2
## 2    News     2
## 3 Twitter     2
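The line counts above can be extended with word statistics for each source. The block below is a minimal sketch that reuses the same sample vectors, repeated so it runs on its own, and the same single-space split as the histogram code. The `word_summary` name and its column names are illustrative choices.

```r
# Sample vectors from Simple Data Summary, repeated so this block runs alone
blogs <- c("I love data science", "Text prediction is interesting")
news <- c("The economy is growing", "Markets closed higher today")
twitter <- c("I love coding", "Learning R is fun")
sources <- list(Blogs = blogs, News = news, Twitter = twitter)

# Words per line in each source, using the same single-space split as the plot
words_per_line <- lapply(sources, function(x) lengths(strsplit(x, " ")))

# One row per source: line count, total words and mean words per line
word_summary <- data.frame(
  Source = names(sources),
  Lines = sapply(sources, length),
  Words = sapply(words_per_line, sum),
  MeanWordsPerLine = sapply(words_per_line, mean),
  row.names = NULL,
  stringsAsFactors = FALSE
)
word_summary
```

For the sample vectors, this gives 8, 8 and 7 words for Blogs, News and Twitter. That works out to 4, 4 and 3.5 words per line.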