Reproducible Research: Peer Assignments

Loading and preprocessing the data

unzip(zipfile="activity.zip")
data <- read.csv("activity.csv")

What is mean total number of steps taken per day?

library(ggplot2)
# Sum the steps for each day; with na.rm = TRUE, a day with no recorded values totals 0
total.steps <- tapply(data$steps, data$date, FUN = sum, na.rm = TRUE)
qplot(total.steps, binwidth = 1000, xlab = "total number of steps taken each day")

mean(total.steps, na.rm = TRUE)
## [1] 9354
median(total.steps, na.rm = TRUE)
## [1] 10395
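
One detail worth checking in the totals above: because `sum` is called with `na.rm = TRUE`, a day whose values are all missing gets a total of 0 rather than `NA`, and any such zeros then enter the mean and median. The sketch below illustrates the effect on a small made-up data frame; `toy`, `with.zero`, and `without.zero` are hypothetical names and the numbers are not taken from activity.csv.

```r
# Hypothetical toy data: day "b" contains only missing values
toy <- data.frame(
    date  = c("a", "a", "b", "b"),
    steps = c(10, 20, NA, NA)
)

# Same call pattern as above: the all-missing day "b" becomes 0
with.zero <- tapply(toy$steps, toy$date, FUN = sum, na.rm = TRUE)

# Without na.rm in sum(), the all-missing day stays NA
without.zero <- tapply(toy$steps, toy$date, FUN = sum)

mean(with.zero)                  # the zero for day "b" pulls the mean down
mean(without.zero, na.rm = TRUE) # day "b" is excluded instead
```

Which treatment is appropriate depends on whether a day with no measurements should count as a day with zero steps.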

What is the average daily activity pattern?

library(ggplot2)
# Average the steps in each 5-minute interval across all days, ignoring missing values
averages <- aggregate(x=list(steps=data$steps), by=list(interval=data$interval),
                      FUN=mean, na.rm=TRUE)
ggplot(data=averages, aes(x=interval, y=steps)) +
    geom_line() +
    xlab("5-minute interval") +
    ylab("average number of steps taken")
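
To make the aggregation step concrete, here is a minimal sketch on made-up data showing what `aggregate` returns with this call pattern and how the interval with the highest average could be looked up with `which.max`. The names `toy`, `toy.averages`, and `peak` are hypothetical, and the values are not from activity.csv.

```r
# Hypothetical toy data: two "days" of three intervals each, one missing value
toy <- data.frame(
    steps    = c(0, 4, NA, 10, 2, 6),
    interval = c(0, 5, 10, 0, 5, 10)
)

# One row per interval, holding the mean of the non-missing step counts
toy.averages <- aggregate(x = list(steps = toy$steps),
                          by = list(interval = toy$interval),
                          FUN = mean, na.rm = TRUE)

# Row of the interval with the largest average
peak <- toy.averages[which.max(toy.averages$steps), ]
print(toy.averages)
print(peak)
```

The same `which.max` lookup applied to `averages` would identify the most active 5-minute interval in the real data.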

Imputing missing values

missing <- is.na(data$steps)
# Count the missing (TRUE) and observed (FALSE) step values
table(missing)
## missing
## FALSE  TRUE 
## 15264  2304

Each missing value is filled in with the mean number of steps for its 5-minute interval.

# Replace each missing value with the mean value of its 5-minute interval
fill.value <- function(steps, interval) {
    # Keep observed values unchanged
    if (!is.na(steps))
        return(steps)
    # Otherwise look up the mean for this interval
    averages[averages$interval == interval, "steps"]
}
filled.data <- data
filled.data$steps <- mapply(fill.value, filled.data$steps, filled.data$interval)
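
As an alternative sketch (not the method used above), the same fill can be written without a row-by-row function call by using `match` to look up each row's interval mean and `ifelse` to choose between the observed and the looked-up value. The example runs on made-up data; `toy`, `toy.averages`, `idx`, and `filled` are hypothetical names.

```r
# Hypothetical toy data: two intervals, each with one missing value
toy <- data.frame(
    steps    = c(NA, 4, 10, NA),
    interval = c(0, 5, 0, 5)
)

# Mean per interval; the formula interface drops rows with missing steps by default
toy.averages <- aggregate(steps ~ interval, data = toy, FUN = mean)

# Position of each row's interval in the averages table
idx <- match(toy$interval, toy.averages$interval)

# Use the interval mean where steps is missing, otherwise the observed value
filled <- ifelse(is.na(toy$steps), toy.averages$steps[idx], toy$steps)

# Sanity check: no missing values should remain
sum(is.na(filled))
```

A check such as `sum(is.na(filled.data$steps))` is a quick way to confirm that the fill covered every missing value.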

Are there differences in activity patterns between weekdays and weekends?

First, let’s label each measurement as falling on a weekday or a weekend, based on its date. In this part, we use the dataset with the filled-in values.

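# Classify a date as "weekday" or "weekend".
# weekdays() returns names in the current locale, so this assumes English day names.
weekday.or.weekend <- function(date) {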
    day <- weekdays(date)
    if (day %in% c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday"))
        return("weekday")
    else if (day %in% c("Saturday", "Sunday"))
        return("weekend")
    else
        stop("invalid date")
}
filled.data$date <- as.Date(filled.data$date)
filled.data$day <- sapply(filled.data$date, FUN=weekday.or.weekend)
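
Because `weekdays` returns day names in the session's locale, the function above would reach `stop("invalid date")` in a non-English locale. A locale-independent sketch, offered as an alternative rather than the method used here, relies on the numeric day of the week from `as.POSIXlt` (0 is Sunday, 6 is Saturday). The names `dates`, `wday`, and `day.type` are hypothetical.

```r
# A few example dates: 2012-10-05 is a Friday, 2012-10-08 a Monday
dates <- as.Date(c("2012-10-05", "2012-10-06", "2012-10-07", "2012-10-08"))

# Numeric day of the week: 0 = Sunday, ..., 6 = Saturday
wday <- as.POSIXlt(dates)$wday

# Vectorized classification that does not depend on day names
day.type <- factor(ifelse(wday %in% c(0, 6), "weekend", "weekday"),
                   levels = c("weekday", "weekend"))
print(data.frame(dates, day.type))
```

This version is also vectorized, so it can be applied to a whole date column without `sapply`.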

Now, let’s make a panel plot of the average number of steps taken in each 5-minute interval, with one panel for weekdays and one for weekends.

# Average steps per interval and day type (this overwrites the earlier 'averages')
averages <- aggregate(steps ~ interval + day, data=filled.data, mean)
ggplot(averages, aes(interval, steps)) + geom_line() + facet_grid(day ~ .) +
    xlab("5-minute interval") + ylab("Number of steps")
