This report presents an initial exploration of text data that will be used to build a text prediction application.
The data comes from three sources:

- Blogs
- News
- Twitter
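library(ggplot2)

# blogs, news, and twitter are the character vectors defined under "Simple Data Summary".
# Count the words on each line of each source.
words <- c(
  lengths(strsplit(blogs, " ")),
  lengths(strsplit(news, " ")),
  lengths(strsplit(twitter, " "))
)
# Label each count with its source, sized from the data rather than hard-coded.
source <- rep(c("Blogs", "News", "Twitter"),
              times = c(length(blogs), length(news), length(twitter)))
df <- data.frame(words, source)

ggplot(df, aes(words)) +
  geom_histogram(fill = "steelblue", bins = 30) +
  facet_wrap(~source) +
  labs(title = "Word Count per Line")

## Plan for Prediction Algorithm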
The final application will predict the next word based on previously typed words.

## Plan for Shiny App
The Shiny app will allow users to enter text and receive a predicted next word.
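
As a rough illustration of how the prediction step and the app could fit together, the sketch below builds bigram counts from a toy corpus and wraps a lookup in a minimal Shiny interface. The report does not specify the model, so the bigram approach is an assumption, and the names `corpus`, `build_bigrams()`, `predict_next()`, and `make_app()` are hypothetical helpers introduced only for this example.

```r
# Hypothetical toy corpus; a real app would use the blogs, news, and twitter text.
corpus <- c("I love data science", "I love coding", "Learning R is fun")

# Count how often each word is followed by each other word (bigram counts).
build_bigrams <- function(lines) {
  tokens <- strsplit(tolower(trimws(lines)), "\\s+")
  tokens <- tokens[lengths(tokens) >= 2]  # lines with one word have no bigram
  first  <- unlist(lapply(tokens, function(w) head(w, -1)))
  second <- unlist(lapply(tokens, function(w) tail(w, -1)))
  counts <- as.data.frame(table(first = first, second = second),
                          stringsAsFactors = FALSE)
  counts[counts$Freq > 0, ]
}

# Return the most frequent follower of the last typed word, or NA if unseen.
predict_next <- function(text, bigrams) {
  words <- strsplit(tolower(trimws(text)), "\\s+")[[1]]
  if (length(words) == 0) return(NA_character_)
  hits <- bigrams[bigrams$first == tail(words, 1), ]
  if (nrow(hits) == 0) return(NA_character_)
  hits$second[which.max(hits$Freq)]
}

# Build the Shiny app object; shiny is only needed when this function is called.
make_app <- function(bigrams) {
  ui <- shiny::fluidPage(
    shiny::textInput("text", "Enter text:"),
    shiny::textOutput("prediction")
  )
  server <- function(input, output) {
    output$prediction <- shiny::renderText({
      nxt <- predict_next(input$text, bigrams)
      if (is.na(nxt)) "No prediction available" else paste("Predicted next word:", nxt)
    })
  }
  shiny::shinyApp(ui, server)
}

bigrams <- build_bigrams(corpus)
predict_next("I", bigrams)
```

In an interactive session with the shiny package installed, `shiny::runApp(make_app(bigrams))` would launch the app. A bigram lookup only considers the last typed word; using longer n-grams with a fallback to shorter ones is a common refinement, but whether the final application does so is not stated here.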
This report confirms readiness to build the prediction model and Shiny application.

## Simple Data Summary
blogs <- c("I love data science", "Text prediction is interesting")
news <- c("The economy is growing", "Markets closed higher today")
twitter <- c("I love coding", "Learning R is fun")
data.frame(
  Source = c("Blogs", "News", "Twitter"),
  Lines = c(length(blogs), length(news), length(twitter))
)
##    Source Lines
## 1   Blogs     2
## 2    News     2
## 3 Twitter     2