Activity Monitoring Data Analysis

Executive Summary

Mean and median daily step counts are lower when the 2,304 missing values are imputed as zeroes than when they are excluded.

                                 Number of Observations   Mean Daily Steps   Median Daily Steps
Non-missing                      15,264 (53 days)         10,766             10,765
Missing                           2,304 (8 days)
Total (missing imputed as zero)  17,568 (61 days)          9,354             10,400
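Because each of the eight missing days contributes zero steps after imputation, the total number of steps is unchanged while the number of days grows from 53 to 61, so the imputed mean is the observed mean rescaled by 53/61. A minimal R sketch of that arithmetic; the helper name `mean_after_zero_imputation` is hypothetical, and 10,766 is the rounded mean from the table, so the result agrees with 9,354 only after rounding.

```r
# Hypothetical helper (not part of the original analysis):
# mean of daily totals after n_missing whole days are imputed as zero steps.
mean_after_zero_imputation <- function(mean_observed, n_observed, n_missing) {
  # Zero-step days add nothing to the total; they only enlarge the day count.
  mean_observed * n_observed / (n_observed + n_missing)
}

imputed_mean <- mean_after_zero_imputation(10766, 53, 8)
round(imputed_mean)  # 9354
```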

A. Loading Data

Source Dataset: activity.csv (2/11/2014 10:08AM)

(https://d396qusa40orc.cloudfront.net/repdata%2Factivity.zip)

Three variables are included in the activity monitoring dataset:
1. steps: number of steps taken in a 5-minute interval (missing values are coded as NA),
2. date: the date on which the measurement was taken, in YYYY-MM-DD format, and
3. interval: identifier of the 5-minute interval in which the measurement was taken.
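Declaring the column classes when reading the file makes the date column a true Date and keeps NA step counts as missing integers. A minimal sketch, assuming the same three-column layout; the three rows below are hypothetical and are read from an inline string rather than from activity.csv.

```r
# Hypothetical three-row excerpt with the same columns as activity.csv.
csv_text <- 'steps,date,interval
NA,2012-10-01,0
0,2012-10-02,5
12,2012-10-02,10'

# colClasses parses date as Date; "NA" is read as a missing integer.
activity <- read.csv(text = csv_text,
                     colClasses = c("integer", "Date", "integer"))
str(activity)
```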


B. Total Daily Step Counts (excluding missing values)

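setwd("C:/Users/d2i2k/RepData_PeerAssessment1")    # author's working directory; adjust locally
ActivityData <- read.csv("activity.csv", header = TRUE)
# Total steps per day; with na.rm = TRUE, a day whose steps are all NA sums to 0
x <- tapply(ActivityData$steps, INDEX = ActivityData$date, FUN = sum, na.rm = TRUE)
# Drop the zero totals (the eight fully missing days), leaving the 53 observed days
y <- subset(x, x > 0)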

Rplot1. Histogram of Daily Step Counts (excluding missing values)
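The code above relies on all-NA days summing to 0 and then filters zero totals, which would also remove a genuinely observed day with zero steps; the histogram call behind Rplot1 is not shown. A minimal sketch of an alternative on hypothetical toy data: aggregate() with a formula applies na.omit by default, so fully missing days never appear, and hist() then draws the daily totals. The toy values and the 2,500-step bin width are assumptions for illustration.

```r
# Hypothetical toy data: 2 intervals per day over 4 days; day 1 is entirely missing.
toy <- data.frame(
  steps = c(NA, NA, 0, 40, 0, 0, 3000, 7000),
  date  = rep(c("2012-10-01", "2012-10-02", "2012-10-03", "2012-10-04"), each = 2)
)

# Approach used above: all-NA days sum to 0 and are filtered out,
# which also removes 2012-10-03, an observed day with zero steps.
x <- tapply(toy$steps, INDEX = toy$date, FUN = sum, na.rm = TRUE)
y <- subset(x, x > 0)

# Alternative: the formula method of aggregate() drops NA rows first,
# so only days with recorded steps remain, including zero-step days.
daily <- aggregate(steps ~ date, data = toy, FUN = sum)

# Histogram of daily totals; the bin width of 2,500 steps is an assumption.
h <- hist(daily$steps, breaks = seq(0, 12500, by = 2500),
          main = "Histogram of Daily Step Counts", xlab = "Total steps per day")
```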

C. Strategy for Imputation of Missing Values

Because all 2,304 missing values fall on eight days, and 8 × 288 = 2,304, each of those days is missing all 288 five-minute intervals; the missing data are therefore imputed as zeroes. Altogether 13,318 of the 17,568 step counts (76%) are zeroes: 11,014 observed zero step counts and 2,304 imputed zeroes.
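The whole-day pattern can be checked directly: if every day is either fully observed or fully missing, each day's NA count is either 0 or the number of intervals recorded for that day (288 on the real data). A minimal sketch on a hypothetical toy data frame with three intervals per day; the names `toy`, `na_per_day` and `intervals_per_day` are illustrative only.

```r
# Hypothetical toy data: 3 intervals per day instead of 288.
toy <- data.frame(
  steps = c(NA, NA, NA, 5, 0, 12, NA, NA, NA),
  date  = rep(c("d1", "d2", "d3"), each = 3)
)

# Number of missing step values, and number of recorded intervals, per day.
na_per_day <- tapply(is.na(toy$steps), toy$date, sum)
intervals_per_day <- tapply(toy$steps, toy$date, length)

# TRUE when every day is either completely missing or completely observed.
all(na_per_day == 0 | na_per_day == intervals_per_day)
```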

setwd("C:/Users/d2i2k/RepData_PeerAssessment1")
ActivityData <- read.csv("activity.csv", header=TRUE) 
x <- is.na(ActivityData$steps)                                  # TRUE where a step count is missing
sum(x)                                                          # number of missing values
## [1] 2304
y <- ifelse(is.na(ActivityData$steps), 0, ActivityData$steps)   # impute missing values as zero
w <- tapply(y, INDEX = ActivityData$date, FUN = sum)            # total steps per day, 61 days

Rplot2. Histogram of Daily Step Counts (including imputed missing values)

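Summary of Imputed Five-Minute Step Counts and Average Daily Activity Pattern

The summary below describes y, the 17,568 five-minute step counts with missing values imputed as zeroes, rather than daily totals; its mean of 32.48 steps per interval corresponds to 32.48 × 288 ≈ 9,354 steps per day.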

summary(y)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    0.00    0.00   32.48    0.00  806.00
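# 288 interval identifiers (the mean of each identifier is the identifier itself)
x <- tapply(ActivityData$interval, INDEX = ActivityData$interval, FUN = mean, na.rm = TRUE)
# Mean steps per interval across days, with missing values excluded
y <- tapply(ActivityData$steps, INDEX = ActivityData$interval, FUN = mean, na.rm = TRUE)
xy <- cbind(x, y)    # 288 x 2 matrix used for the time-series plot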

Rplot3. Time series of mean steps taken per five-minute interval averaged over days

Peak activity: the maximum mean number of steps per five-minute interval is 206, in the 104th five-minute interval of the day (interval identifier 835, i.e. 8:35 a.m., 515 minutes after midnight).

which.max(y)
## 835 
## 104
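The label printed by which.max() is an interval identifier, and in this dataset identifiers encode the start time of each five-minute interval as hours and minutes (HHMM), running from 0 to 2355. A small sketch converting identifiers to minutes after midnight and to clock time; the helper names `interval_to_minutes` and `interval_to_clock` are hypothetical.

```r
# Hypothetical helpers: an identifier such as 835 means 08:35 (HHMM).
interval_to_minutes <- function(interval) {
  (interval %/% 100) * 60 + interval %% 100
}
interval_to_clock <- function(interval) {
  sprintf("%02d:%02d", interval %/% 100, interval %% 100)
}

peak <- 835
interval_to_minutes(peak)          # 515 minutes after midnight
interval_to_minutes(peak) / 5 + 1  # 104, its position among the 288 intervals
interval_to_clock(peak)            # "08:35"
```

Plotting against minutes after midnight instead of the raw identifier also avoids the artificial jumps in the x-axis between identifiers such as 55 and 100.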

D. Mean and Median Daily Step Counts (including imputed missing values)

summary(w)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    6778   10400    9354   12810   21190
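summary() appears to print its values rounded to four significant digits here (for example, a median of 10400), so mean() and median() are a more direct way to obtain unrounded figures. A minimal sketch with hypothetical daily totals, contrasting the two strategies from the executive summary: excluding missing days versus imputing them as zero.

```r
# Hypothetical daily totals; NA marks a day with no recorded steps.
totals <- c(9000, NA, 10500, 12000, NA, 11000)

observed <- totals[!is.na(totals)]            # missing days excluded
imputed  <- ifelse(is.na(totals), 0, totals)  # missing days imputed as zero

c(mean = mean(observed), median = median(observed))
c(mean = mean(imputed),  median = median(imputed))
```

With the zero-filled days included, both statistics drop, as in the executive summary table.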

Repeat the imputation of missing values as zeroes:

x <- ifelse(is.na(ActivityData$steps), 0, ActivityData$steps)  # 17,568 row vector
y <- ActivityData$interval                                     # 17,568 row vector

E. Factor Variable for Weekends (Sat-Sun) versus Weekdays (Mon-Fri)

library(chron)
w <- is.weekend(ActivityData$date)                 # 17,568-element logical vector: TRUE on Saturdays and Sundays
xyw <- data.frame(x, y, w)                         # 17,568 rows x 3 columns: steps, interval, weekend flag

# Weekends: mean steps per five-minute interval
xyw1 <- subset(xyw, w)                             #  4,608 rows x 3 columns (16 weekend days x 288 intervals)
x <- tapply(xyw1$x, INDEX = xyw1$y, FUN = mean)    #    288 mean step counts, one per interval
y <- tapply(xyw1$y, INDEX = xyw1$y, FUN = mean)    #    288 interval identifiers
xy1 <- data.frame(x, y, z = "Weekend")             #    288 rows x 3 columns for weekends

# Weekdays: mean steps per five-minute interval
xyw2 <- subset(xyw, !w)                            # 12,960 rows x 3 columns (45 weekdays x 288 intervals)
x <- tapply(xyw2$x, INDEX = xyw2$y, FUN = mean)    #    288 mean step counts, one per interval
y <- tapply(xyw2$y, INDEX = xyw2$y, FUN = mean)    #    288 interval identifiers
xy2 <- data.frame(x, y, z = "Weekday")             #    288 rows x 3 columns for weekdays

xy <- rbind(xy1, xy2)                              #    576 rows x 3 columns
xy$z <- factor(xy$z)                               # factor with levels "Weekday" and "Weekend"

Rplot4. Multiple time series of mean steps taken per five-minute interval averaged over weekends or weekdays
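The plotting code for Rplot4 is not shown. One way to draw such a two-panel figure, assuming the lattice package, is sketched below with hypothetical interval means in a data frame shaped like xy. The sketch also shows a base-R way to build the weekday/weekend factor without chron, using the `wday` field of `POSIXlt` (0 = Sunday, 6 = Saturday), which does not depend on locale-specific day names.

```r
library(lattice)

# Weekend flag without chron: POSIXlt wday runs from 0 (Sunday) to 6 (Saturday).
dates <- as.Date(c("2012-10-05", "2012-10-06", "2012-10-07", "2012-10-08"))
wday <- as.POSIXlt(dates)$wday
day_type <- factor(ifelse(wday %in% c(0, 6), "Weekend", "Weekday"),
                   levels = c("Weekday", "Weekend"))

# Hypothetical interval means laid out like xy:
# x = mean steps, y = interval identifier, z = day type.
toy_xy <- data.frame(
  x = c(10, 50, 20, 30, 80, 40),
  y = c(0, 5, 10, 0, 5, 10),
  z = factor(rep(c("Weekend", "Weekday"), each = 3),
             levels = c("Weekday", "Weekend"))
)

# One line panel per day type, stacked vertically.
p <- xyplot(x ~ y | z, data = toy_xy, type = "l", layout = c(1, 2),
            xlab = "Interval", ylab = "Mean number of steps")
print(p)
```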