SwiftKey Capstone: Exploratory Analysis Report

1. Introduction

This report presents an initial exploratory analysis of the SwiftKey dataset provided for the Coursera Data Science Capstone.
The goals are to:

Confirm that the dataset has been downloaded and loaded successfully
Understand its basic structure
Explore text characteristics
Display summary statistics and simple visualizations
Outline plans for building the final prediction model and Shiny app

This document is written in a simple, business-friendly style so that non-technical stakeholders can follow the project's progress.

2. Dataset Overview

The dataset contains text data from:

Blogs
News
Twitter

We use only the English-language files for model training.
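Before loading anything, it can help to confirm that the three English-language files are actually on disk. The following is a minimal sketch using base R only; the helper name `check_corpus_files` is hypothetical, and the directory layout is assumed to match the paths used in this report.

```r
# Hypothetical helper: report which expected corpus files are present.
# Assumes the en_US.<source>.txt naming used elsewhere in this report.
check_corpus_files <- function(dir = "final/en_US",
                               sources = c("blogs", "news", "twitter")) {
  paths <- file.path(dir, paste0("en_US.", sources, ".txt"))
  data.frame(
    source  = sources,
    path    = paths,
    exists  = file.exists(paths),
    # file.size() returns NA for files that do not exist
    size_mb = round(file.size(paths) / 1024^2, 1),
    stringsAsFactors = FALSE
  )
}

print(check_corpus_files())
```

A row with `exists` equal to `FALSE` suggests the download or unzip step should be repeated before continuing.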

# Load packages (dplyr's startup messages are suppressed to keep the report readable)
library(stringi)
library(ggplot2)
suppressPackageStartupMessages(library(dplyr))

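# Read each English-language file into a character vector, one element per line
blogs   <- readLines("final/en_US/en_US.blogs.txt", encoding = "UTF-8", warn = FALSE, skipNul = TRUE)
news    <- readLines("final/en_US/en_US.news.txt", encoding = "UTF-8", warn = FALSE, skipNul = TRUE)
twitter <- readLines("final/en_US/en_US.twitter.txt", encoding = "UTF-8", warn = FALSE, skipNul = TRUE)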
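Once the files are loaded, basic size statistics can be computed per source. The sketch below is one possible approach rather than the report's final method; the helper name `summarise_corpus` and the two sample lines are made up for illustration, and `stri_count_words` and `stri_length` come from the stringi package loaded above.

```r
library(stringi)

# Hypothetical helper: basic size statistics for one character vector of lines.
summarise_corpus <- function(lines, name) {
  words_per_line <- stri_count_words(lines)
  data.frame(
    source = name,
    lines  = length(lines),
    words  = sum(words_per_line),
    chars  = sum(stri_length(lines)),
    mean_words_per_line = round(mean(words_per_line), 1),
    stringsAsFactors = FALSE
  )
}

# Made-up sample so the sketch runs without the corpus files.
sample_lines <- c("The quick brown fox", "jumps over the lazy dog")
print(summarise_corpus(sample_lines, "sample"))
```

With the real data, the three sources could be combined into one table with `rbind(summarise_corpus(blogs, "blogs"), summarise_corpus(news, "news"), summarise_corpus(twitter, "twitter"))`.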

Spurthi
