================================================

Title: “Reproducible Research Peer Assignment 1”

Date: “December 16, 2015”

================================================

Clear the Workspace

rm(list = ls())

Load the raw data from CSV file

activity <- read.csv("activity.csv", 
                     colClasses = c("numeric", "character", "numeric"))

Structure of the loaded data file

str(activity)
## 'data.frame':    17568 obs. of  3 variables:
##  $ steps   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ date    : chr  "2012-10-01" "2012-10-01" "2012-10-01" "2012-10-01" ...
##  $ interval: num  0 5 10 15 20 25 30 35 40 45 ...

Load lattice (used for the panel plot in Question 4) and convert the date attribute to the Date class

library(lattice)
## Warning: package 'lattice' was built under R version 3.2.3
activity$date <- as.Date(activity$date, format = "%Y-%m-%d")

Question 1

What is mean total number of steps taken per day?

(1) Compute the total number of steps each day (missing values are removed, so a day with no recorded steps gets a total of 0)

TotalNoofSteps <- aggregate(activity$steps, by=list(activity$date), 
                            FUN=sum, na.rm=TRUE)

Rename the attribute

names(TotalNoofSteps) <- c("date","total")
head(TotalNoofSteps)
##         date total
## 1 2012-10-01     0
## 2 2012-10-02   126
## 3 2012-10-03 11352
## 4 2012-10-04 12116
## 5 2012-10-05 13294
## 6 2012-10-06 15420

(2) Histogram of the total number of steps taken each day

hist(TotalNoofSteps$total, 
     main = "Histogram: Total number of steps taken each day", 
     xlab = "Total steps per day", col = "blue")

(3) Mean and Median of total number of steps taken per day

mean(TotalNoofSteps$total)
## [1] 9354.23
median(TotalNoofSteps$total)
## [1] 10395
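
Because na.rm = TRUE turns a day with only missing values into a total of 0, such days pull the mean and median down. As a possible alternative (not the approach used in this report), the formula interface of aggregate() drops rows containing NA before summing, so those days are left out entirely. The toy data frame below is made up for illustration:

```r
# Toy data (hypothetical): day 1 is entirely missing, days 2 and 3 are complete
toy <- data.frame(
  steps = c(NA, NA, 10, 20, 5, 15),
  date  = as.Date(c("2012-10-01", "2012-10-01",
                    "2012-10-02", "2012-10-02",
                    "2012-10-03", "2012-10-03"))
)

# sum(..., na.rm = TRUE) keeps the all-NA day with a total of 0
with_zero <- aggregate(toy$steps, by = list(toy$date), FUN = sum, na.rm = TRUE)

# The formula method uses na.action = na.omit by default,
# so the all-NA day does not appear in the result
without_na <- aggregate(steps ~ date, data = toy, FUN = sum)

mean(with_zero$x)       # averages over 3 days, one of them 0
mean(without_na$steps)  # averages over the 2 observed days
```

Which variant is appropriate depends on whether a day without data should count as a day with zero steps.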

Clear the TotalNoofSteps

rm(TotalNoofSteps)

Question 2

What is the average daily activity pattern?

Mean of steps across all days for each interval

MeanofSteps <- aggregate(activity$steps, by=list(activity$interval), 
                         FUN=mean, na.rm=TRUE)

Rename the attributes

names(MeanofSteps) <- c("Interval","Mean")
head(MeanofSteps)
##   Interval      Mean
## 1        0 1.7169811
## 2        5 0.3396226
## 3       10 0.1320755
## 4       15 0.1509434
## 5       20 0.0754717
## 6       25 2.0943396

(1) Time series plot of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all days (y-axis)

plot(MeanofSteps$Interval, MeanofSteps$Mean, type = "l", 
     col = "blue", xlab = "5-minute interval", 
     ylab = "Average number of steps", 
     main = "Time series plot of the average number of steps per interval")

(2) Which 5-minute interval, on average across all the days in the dataset, contains the maximum number of steps?

MaxNoSteps <- which(MeanofSteps$Mean == max(MeanofSteps$Mean))
Interval <- MeanofSteps[MaxNoSteps, 1]
Interval
## [1] 835
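
The same lookup can be written more compactly with which.max(), which returns the index of the first maximum. This is a minimal sketch on made-up values, not the data from this report:

```r
# Toy interval means (hypothetical values)
means <- data.frame(Interval = c(0, 5, 10, 15),
                    Mean     = c(1.5, 0.3, 7.2, 2.1))

# which.max() gives the row index of the largest mean;
# indexing Interval with it returns the matching interval
peak <- means$Interval[which.max(means$Mean)]
peak
```

Unlike which(x == max(x)), which.max() returns a single index even when the maximum is tied.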

Clear workspace

rm(Interval, MaxNoSteps)

Question 3

Imputing missing values

(1) Total number of missing Values in the dataset

NACount <- sum(is.na(activity$steps))  
NACount
## [1] 2304

Clear workspace

rm(NACount)

(2) Filling in all of the missing values in the dataset

Strategy Used: Find the missing position and replace missing values by the mean of the steps attribute

NAPosition <- which(is.na(activity$steps))

NewMeanVec <- rep(mean(activity$steps, na.rm = TRUE),times = length(NAPosition))

(3) Create a new dataset that is equal to the original dataset but with the missing values filled in; now replace the NAs with NewMeanVec

activity[NAPosition, "steps"] <- NewMeanVec  
head(activity)
##     steps       date interval
## 1 37.3826 2012-10-01        0
## 2 37.3826 2012-10-01        5
## 3 37.3826 2012-10-01       10
## 4 37.3826 2012-10-01       15
## 5 37.3826 2012-10-01       20
## 6 37.3826 2012-10-01       25

Clear workspace

rm(NAPosition, NewMeanVec)
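
The overall mean is the simplest fill-in value, but it ignores the time of day. A commonly used alternative, shown here only as a sketch and not as the strategy of this report, replaces each NA with the mean of its own 5-minute interval. The toy data frame and the names interval_mean, filled and missing are made up for illustration:

```r
# Toy data (hypothetical): two intervals observed over three days
toy <- data.frame(
  steps    = c(NA, 4, 2, 8, 6, NA),
  interval = c(0, 5, 0, 5, 0, 5)
)

# Mean of each interval over the observed values, repeated for every row
interval_mean <- ave(toy$steps, toy$interval,
                     FUN = function(x) mean(x, na.rm = TRUE))

# Copy the data and overwrite only the missing positions
filled <- toy
missing <- is.na(filled$steps)
filled$steps[missing] <- interval_mean[missing]
filled
```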

(4) Compute total number of steps taken each day and plot the histogram

Calculate the mean and median of the total number of steps taken each day

TotalNoofSteps <- aggregate(activity$steps, by=list(activity$date), FUN=sum)

Rename the attributes

names(TotalNoofSteps) <- c("date","total")
head(TotalNoofSteps)
##         date    total
## 1 2012-10-01 10766.19
## 2 2012-10-02   126.00
## 3 2012-10-03 11352.00
## 4 2012-10-04 12116.00
## 5 2012-10-05 13294.00
## 6 2012-10-06 15420.00

Histogram

hist(TotalNoofSteps$total, main = "Histogram: Total number of steps taken per day", 
     xlab = "Total steps per day", col = "blue")

Mean and Median of total number of steps taken per day

mean(TotalNoofSteps$total)  
## [1] 10766.19
median(TotalNoofSteps$total)
## [1] 10766.19

Impact after filling missing values (NA by means):

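Both estimates are higher than in Question 1: the mean rises from 9354.23 to 10766.19 and the median from 10395 to 10766.19. In Question 1, a day with only missing values was counted as 0 steps, which pulled both estimates down; after imputation such a day totals 10766.19 (see 2012-10-01 above), which is also where the mean and the median now land.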
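
The size of the shift can be checked with a little arithmetic. The sketch below assumes that the missing values fall on whole days, which the NA count and the first day's totals suggest but which this report does not verify directly:

```r
intervals_per_day <- 24 * 60 / 5   # 288 five-minute intervals in a day
overall_mean      <- 37.3826       # imputed value shown in head(activity)

# 2304 is the NA count from step (1); under the whole-day assumption
# this is the number of days that were entirely missing
missing_days <- 2304 / intervals_per_day

# Total for a day in which every interval holds the imputed mean
imputed_total <- intervals_per_day * overall_mean

missing_days
round(imputed_total, 2)
```

The rounded total matches the 10766.19 reported for 2012-10-01 after imputation.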

Question 4

Are there differences in activity patterns between weekdays and weekends?

(1) Computing the weekdays from the date attribute, using weekdays()

activity$date <- as.Date(activity$date, format = "%Y-%m-%d")

activity <- data.frame(date = activity$date,
                       weekday = tolower(weekdays(activity$date, abbreviate = FALSE)),
                       steps = activity$steps, interval = activity$interval)

activity <- cbind(activity,
                  daytype = ifelse(activity$weekday == "saturday" |
                                   activity$weekday == "sunday","weekend",
                                   "weekday"))

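weekdays() returns day names in the language of the current locale, so comparing against "saturday" and "sunday" only works in an English locale. A locale-independent sketch, offered as an alternative rather than as the method used above, relies on the numeric weekday from as.POSIXlt(); the dates are a small made-up sample:

```r
# Friday to Monday (2012-10-01 was a Monday)
dates <- as.Date(c("2012-10-05", "2012-10-06", "2012-10-07", "2012-10-08"))

# as.POSIXlt()$wday is 0 for Sunday through 6 for Saturday
wday <- as.POSIXlt(dates)$wday

# Sunday (0) and Saturday (6) are the weekend
daytype <- factor(ifelse(wday %in% c(0, 6), "weekend", "weekday"))
data.frame(date = dates, daytype = daytype)
```
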
(2) Compute the average number of steps taken, averaged across all weekdays or weekends

AverageData <- aggregate(activity$steps, 
                         by=list(activity$daytype, activity$interval), mean)

Rename the attributes

names(AverageData) <- c("daytype", "interval", "mean")

Panel plot containing a time series plot

xyplot(mean ~ interval | daytype, AverageData, type="l",  
       xlab="Interval", ylab="Number of steps", layout=c(1,2))