Project-SwiftKey Milestone Report

library(ggplot2)   # ggplot(), geom_histogram(), labs()
library(dplyr)     # %>%, count(), top_n()
library(tibble)    # tibble()
library(tidytext)  # unnest_tokens()

# Read the first 10,000 lines of each English corpus file.
blogs <- readLines("en_US.blogs.txt", n = 10000)
## Warning in readLines("en_US.blogs.txt", n = 10000): incomplete final line found
## on ‘en_US.blogs.txt’
news <- readLines("en_US.news.txt", n = 10000)
## Warning in readLines("en_US.news.txt", n = 10000): incomplete final line found
## on ‘en_US.news.txt’
twitter <- readLines("en_US.twitter.txt", n = 10000)
## Warning in readLines("en_US.twitter.txt", n = 10000): incomplete final line
## found on ‘en_US.twitter.txt’

# Histogram of the number of characters per line in the blogs sample.
blogs_char <- nchar(blogs)
ggplot(data.frame(length = blogs_char), aes(x = length)) +
  geom_histogram(binwidth = 50, fill = "blue", color = "black") +
  labs(title = "Distribution of line lengths in Blogs",
       x = "Number of characters", y = "Frequency")

# Split the blog lines into one word per row.
blogs_words <- tibble(text = blogs) %>% unnest_tokens(word, text)

# Most frequent words in the blogs sample.
blogs_words %>% count(word, sort = TRUE) %>% top_n(10)
## Selecting by n
## # A tibble: 1 × 2
##   word      n
##   <chr> <int>
## 1 test1     1

# sep = "\n" prints each step on its own line.
cat("Next steps:",
    "1. Build n-gram models (unigram, bigram, trigram) from tokenized words.",
    "2. Handle unseen n-grams using smoothing and backoff techniques.",
    "3. Evaluate model performance with test phrases.",
    "4. Create a Shiny app that predicts the next word for a given input phrase.",
    sep = "\n")
## Next steps:
## 1. Build n-gram models (unigram, bigram, trigram) from tokenized words.
## 2. Handle unseen n-grams using smoothing and backoff techniques.
## 3. Evaluate model performance with test phrases.
## 4. Create a Shiny app that predicts the next word for a given input phrase.
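
The first two next steps could look roughly like the sketch below. It is only an illustration in base R under stated assumptions: the three sentences in `toy_lines` are made-up sample data rather than the SwiftKey corpus, the names `tokenize` and `predict_next` are hypothetical, and the fallback shown is a very simple back-off to the most frequent unigram, not a proper smoothing method.

```r
# Minimal sketch: bigram next-word prediction with a unigram fallback.
# toy_lines, tokenize and predict_next are hypothetical names; the data is
# made up for illustration and is NOT the SwiftKey corpus.

toy_lines <- c(
  "the cat sat on the mat",
  "the cat ate the fish",
  "a dog sat on the rug"
)

# Lowercase, drop everything except letters and spaces, split on spaces.
tokenize <- function(line) {
  cleaned <- gsub("[^a-z ]", "", tolower(line))
  words <- strsplit(cleaned, " +")[[1]]
  words[nzchar(words)]
}

tokens_by_line <- lapply(toy_lines, tokenize)

# Unigram counts over all tokens, most frequent first.
unigram_counts <- sort(table(unlist(tokens_by_line)), decreasing = TRUE)

# Bigram pairs, built per line so that no pair spans a line boundary.
bigrams <- do.call(rbind, lapply(tokens_by_line, function(w) {
  if (length(w) < 2) return(NULL)
  data.frame(first = head(w, -1), second = tail(w, -1),
             stringsAsFactors = FALSE)
}))

# Count how often each (first, second) pair occurs.
bigram_counts <- aggregate(n ~ first + second,
                           data = cbind(bigrams, n = 1), FUN = sum)

# Predict the word after the last word of `phrase`. If that word was never
# seen as the first half of a bigram, back off to the top unigram.
predict_next <- function(phrase) {
  words <- tokenize(phrase)
  if (length(words) > 0) {
    last_word <- words[length(words)]
    candidates <- bigram_counts[bigram_counts$first == last_word, ]
    if (nrow(candidates) > 0) {
      return(as.character(candidates$second[which.max(candidates$n)]))
    }
  }
  names(unigram_counts)[1]
}

predict_next("the dog sat")     # "on"  (seen bigram: "sat on")
predict_next("an unseen word")  # "the" (unigram fallback)
```

A trigram level would follow the same pattern: look up the last two words first, then fall back to the bigram table, then to the unigram counts.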

Deepak Varshney

2026-02-16
