## Introduction

The goal of this report is to demonstrate familiarity with the text data and to show progress toward building a predictive text model and a Shiny web application. It presents an exploratory analysis of the training data and outlines plans for the final application.


## Data Loading

if (!require(stringi)) install.packages("stringi")
if (!require(ggplot2)) install.packages("ggplot2")

library(stringi)
library(ggplot2)
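
The later sections use three character vectors named `blogs`, `news` and `twitter`, but the step that reads them is not shown. A minimal sketch of one way to load them follows; the helper `read_corpus` and the file paths are hypothetical placeholders, not taken from the original report, and the actual file names and locations may differ.

```r
# Hypothetical helper (an assumption, not part of the original report):
# read a text file into a character vector with one element per line.
read_corpus <- function(path) {
  # Binary mode so stray control characters do not cut the read short.
  con <- file(path, open = "rb")
  on.exit(close(con))
  readLines(con, encoding = "UTF-8", skipNul = TRUE, warn = FALSE)
}

# Placeholder paths (assumptions): replace with the real file locations.
paths <- c(
  blogs   = "data/blogs.txt",
  news    = "data/news.txt",
  twitter = "data/twitter.txt"
)

# Only load when the files are actually present.
if (all(file.exists(paths))) {
  blogs   <- read_corpus(paths[["blogs"]])
  news    <- read_corpus(paths[["news"]])
  twitter <- read_corpus(paths[["twitter"]])
}
```

Reading through a connection opened in binary mode is a common precaution for large scraped text files, since embedded NUL or end-of-file characters can otherwise truncate the input.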

## Data Description

The dataset consists of text collected from three sources:

- Blogs
- News
- Twitter

These sources represent different writing styles and text lengths.
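
One way to make the difference in text lengths concrete is to summarise words per line for each source. The sketch below uses base R only and made-up sample lines; `count_words` and `summarise_source` are hypothetical helpers, and splitting on whitespace is only an approximation of what `stri_count_words` does.

```r
# Hypothetical helper: approximate word count per line by splitting on whitespace.
count_words <- function(x) lengths(strsplit(trimws(x), "\\s+"))

# Hypothetical helper: line count plus mean and maximum words per line.
summarise_source <- function(lines) {
  words <- count_words(lines)
  c(lines = length(lines), mean_words = mean(words), max_words = max(words))
}

# Made-up sample lines standing in for the real corpora (illustration only).
samples <- list(
  Blogs   = c("a long reflective post about a weekend trip", "another entry"),
  News    = c("council approves new budget for the coming year"),
  Twitter = c("so tired", "great game tonight")
)

# One row per source, one column per statistic.
summary_table <- t(sapply(samples, summarise_source))
print(summary_table)
```

With the real data, the same helper could be applied to `list(Blogs = blogs, News = news, Twitter = twitter)`, and `stri_count_words` could replace `count_words` for more careful word boundary handling.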


## Simple Data Summary

data.frame(
  Source = c("Blogs", "News", "Twitter"),
  Lines  = c(length(blogs), length(news), length(twitter))
)

# Count the words on every line of each source
words <- c(
  stri_count_words(blogs),
  stri_count_words(news),
  stri_count_words(twitter)
)

# Label each word count with the source it came from
source <- c(
  rep("Blogs", length(blogs)),
  rep("News", length(news)),
  rep("Twitter", length(twitter))
)

df <- data.frame(words = words, source = source)

# Histogram of words per line, one panel per source
ggplot(df, aes(words)) +
  geom_histogram(fill = "steelblue", bins = 50) +
  facet_wrap(~source, scales = "free_y") +
  labs(title = "Word Count per Line", x = "Words per Line", y = "Frequency")

## Plan for Prediction Algorithm

The final application will predict the next word based on previously typed words using statistical language models. The model will learn common word patterns from the text data.
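
The report does not say which statistical language model will be used. As an illustration only, the sketch below shows one simple option: a bigram model that suggests the word most often seen after the last typed word, falling back to the most frequent word overall when the last word is unknown. The toy corpus and the helpers `tokenize`, `build_bigrams` and `predict_next` are hypothetical, and the dense count table is suitable only for small examples.

```r
# Toy corpus, made up for illustration; the real model would use the training text.
corpus <- c("i love new york", "i love new music", "new york is big")

# Lowercase each line and split it into word tokens.
tokenize <- function(x) strsplit(tolower(trimws(x)), "\\s+")

# Count every adjacent word pair (bigram) across all lines.
build_bigrams <- function(lines) {
  tokens <- tokenize(lines)
  first  <- unlist(lapply(tokens, function(w) head(w, -1)))
  second <- unlist(lapply(tokens, function(w) tail(w, -1)))
  counts <- as.data.frame(table(first = first, second = second),
                          stringsAsFactors = FALSE)
  counts[counts$Freq > 0, ]
}

# Suggest the most frequent follower of the last typed word,
# or the fallback word when that word was never seen.
predict_next <- function(phrase, bigrams, fallback) {
  words <- tokenize(phrase)[[1]]
  last_word <- if (length(words) > 0) words[length(words)] else ""
  candidates <- bigrams[bigrams$first == last_word, ]
  if (nrow(candidates) == 0) return(fallback)
  candidates$second[which.max(candidates$Freq)]
}

bigrams  <- build_bigrams(corpus)
# Fallback: the single most frequent word in the corpus.
fallback <- names(which.max(table(unlist(tokenize(corpus)))))

predict_next("I love", bigrams, fallback)       # "new"
predict_next("visit new", bigrams, fallback)    # "york"
predict_next("hello zebra", bigrams, fallback)  # "new" (fallback)
```

Longer contexts (trigrams and beyond) and a smoothing or backoff scheme would follow the same pattern: count the word sequences, then look up the most likely continuation of what has been typed so far.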



## Conclusion

This exploratory analysis confirms readiness to build the prediction model and the Shiny application.