The mean and median of daily step counts are both lower when the 2,304 missing values are imputed as zeroes than when they are excluded.
| Observations | Count (days) | Mean Daily Steps | Median Daily Steps |
|---|---|---|---|
| Non-missing | 15,264 (53 days) | 10,766 | 10,765 |
| Missing | 2,304 (8 days) | | |
| Total | 17,568 (61 days) | 9,354 | 10,400 |
Source Dataset: activity.csv (2/11/2014 10:08AM)
(https://d396qusa40orc.cloudfront.net/repdata%2Factivity.zip)
Three variables are included in the activity monitoring dataset:
1. steps taken in a 5-minute interval (missing values are coded as NA),
2. date on which a measurement was taken in YYYY-MM-DD format, and
3. interval in which a measurement was taken.
setwd("C:/Users/d2i2k/RepData_PeerAssessment1")
ActivityData <- read.csv("activity.csv", header=TRUE)
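x <- tapply(ActivityData$steps,INDEX=ActivityData$date,FUN=sum,na.rm=TRUE) # total steps per day; a day with only NA values sums to 0
y <- subset(x, x>0) # keep days with a positive total, which drops the all-missing days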
Rplot1. Histogram of Daily Step Counts (excluding missing values)
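The filter `x>0` removes the all-missing days only indirectly, because `sum(..., na.rm=TRUE)` returns 0 for a day with no readings. A minimal sketch on invented toy data (the data frame `toy` and the names `kept_text` and `kept_alt` are hypothetical) shows how that filter would also drop an observed day whose true total is zero, and one possible alternative that keeps it. Whether such a day exists in activity.csv is not stated here, so this is an illustration rather than a correction.

```r
# Toy data (invented): 3 days x 4 intervals.
# Day 2 is entirely missing; day 3 is observed but has no steps at all.
toy <- data.frame(
  steps = c(10, 0, 5, 20,  NA, NA, NA, NA,  0, 0, 0, 0),
  date  = rep(c("2000-01-01", "2000-01-02", "2000-01-03"), each = 4)
)

# Approach used in the text: sum with na.rm = TRUE, then drop zero totals.
totals    <- tapply(toy$steps, toy$date, sum, na.rm = TRUE)
kept_text <- totals[as.vector(totals > 0)]

# Alternative sketch: keep every day with at least one non-missing reading.
has_data <- tapply(!is.na(toy$steps), toy$date, any)
kept_alt <- totals[as.vector(has_data)]

print(kept_text)  # only the first day survives
print(kept_alt)   # the first and third days survive
```

If no observed day has a zero total, both approaches return the same set of days.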
Because the 2,304 missing values fall on eight complete days (8 × 288 five-minute intervals), missing data are imputed as zeroes. Altogether 13,318 of 17,568 step counts (76%) are zeroes: 11,014 observed zero step counts and 2,304 imputed zeroes.
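The counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the totals reported in the text and table. The variable names are illustrative, and 288 readings per day follows from 24 hours of five-minute intervals.

```r
# Figures quoted in the text and table.
n_total    <- 17568   # all five-minute readings
n_missing  <- 2304    # readings coded NA
n_zero_all <- 13318   # zero step counts after imputation

# Eight whole missing days imply 288 readings per day (24 * 60 / 5).
per_day <- n_missing / 8
stopifnot(per_day == 24 * 60 / 5)
stopifnot(n_total / per_day == 61)      # 61 days in total

# Zeroes that were actually observed, and the overall share of zeroes.
n_zero_observed <- n_zero_all - n_missing
share_zero      <- n_zero_all / n_total

print(n_zero_observed)              # 11014
print(round(100 * share_zero, 1))   # 75.8
```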
setwd("C:/Users/d2i2k/RepData_PeerAssessment1")
ActivityData <- read.csv("activity.csv", header=TRUE)
x <- is.na(ActivityData$steps) # TRUE where the step count is missing
sum(x) # number of missing values
## [1] 2304
y <- ifelse(is.na(ActivityData$steps),0,ActivityData$steps)
z <- data.frame(y,ActivityData$date)
w <- tapply(z$y,INDEX=z$ActivityData.date,FUN=sum,na.rm=TRUE)
Rplot2. Histogram of Daily Step Counts (including imputed missing values)
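The effect of zero imputation on the daily summary statistics can be illustrated on a small invented example. The sketch below assumes a toy data frame with four hypothetical days (`d1` to `d4`), one of them entirely missing. It compares daily totals with the missing day excluded against totals with its values imputed as zero.

```r
# Toy data (invented): 4 days x 2 intervals; day d2 is entirely missing.
toy <- data.frame(
  steps = c(100, 200,  NA, NA,  300, 500,  50, 150),
  date  = rep(c("d1", "d2", "d3", "d4"), each = 2)
)

# Excluding missing values: keep only days with at least one reading.
observed    <- tapply(!is.na(toy$steps), toy$date, any)
totals_excl <- tapply(toy$steps, toy$date, sum, na.rm = TRUE)[as.vector(observed)]

# Imputing missing values as zero: every day contributes a total.
imputed    <- ifelse(is.na(toy$steps), 0, toy$steps)
totals_imp <- tapply(imputed, toy$date, sum)

print(c(mean = mean(totals_excl), median = median(totals_excl)))
print(c(mean = mean(totals_imp),  median = median(totals_imp)))
```

Each imputed day adds a zero total to the distribution, so the mean is pulled down and the median can only move down or stay the same. This matches the direction of the difference reported in the table.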
summary(y)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 0.00 0.00 32.48 0.00 806.00
x <- tapply(ActivityData$interval,INDEX=ActivityData$interval,FUN=mean,na.rm=TRUE)
y <- tapply(ActivityData$steps,INDEX=ActivityData$interval,FUN=mean,na.rm=TRUE)
xy <- cbind(x,y)
Rplot3. Time series of mean steps taken per five-minute interval averaged over days
Peak activity: the mean number of steps per five-minute interval is highest, about 206 steps, during the 104th interval of the day (interval identifier 835, i.e. 08:35, which is 515 minutes after midnight)
which.max(y)
## 835
## 104
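`which.max` on a named vector returns the position of the maximum, labelled with the element's name, which is why the output shows both 835 and 104. The sketch below uses invented interval means (only the value 206 echoes the text) and assumes that interval identifiers encode clock time as hour × 100 + minute. The helper functions `id_to_minutes` and `id_to_clock` are hypothetical.

```r
# Toy interval means keyed by interval identifier (values invented).
interval_means <- c("825" = 155, "830" = 177, "835" = 206, "840" = 196)

# Named integer: the name is the identifier, the value is the position.
peak    <- which.max(interval_means)
peak_id <- as.integer(names(peak))

# Assumption: identifiers encode clock time as hour * 100 + minute.
id_to_minutes <- function(id) (id %/% 100) * 60 + id %% 100
id_to_clock   <- function(id) sprintf("%02d:%02d", id %/% 100, id %% 100)

# Position within a full day of 288 intervals that starts at identifier 0.
position_in_day <- id_to_minutes(peak_id) / 5 + 1

print(id_to_clock(peak_id))    # "08:35"
print(id_to_minutes(peak_id))  # 515
print(position_in_day)         # 104
```

Under that assumption, identifier 835 is the 104th interval of the day, which agrees with the position reported by `which.max` on the full data.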
summary(w)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 6778 10400 9354 12810 21190
x <- ifelse(is.na(ActivityData$steps), 0, ActivityData$steps) # 17,568 row vector
y <- ActivityData$interval # 17,568 row vector
library(chron)
w <- is.weekend(ActivityData$date) # 17,568 row vector
xyw <- data.frame(x,y,w) # 17,568 row by 3 column array
xyw1 <- subset(xyw,w=="TRUE") # 4,608 row by 3 column array for weekends
x <- tapply(xyw1$x,INDEX=xyw1$y,FUN=mean,na.rm=TRUE) # 288 row vector of steps
y <- tapply(xyw1$y,INDEX=xyw1$y,FUN=mean,na.rm=TRUE) # 288 row vector of intervals
z <- rep("Weekend",288) # 288 row vector of weekend labels
xy1 <- cbind(as.data.frame(x),as.data.frame(y),as.data.frame(z)) # 288 row by 3 column array for weekends
xyw2 <- subset(xyw,w=="FALSE") # 12,960 row by 3 column array for weekdays
x <- tapply(xyw2$x,INDEX=xyw2$y,FUN=mean,na.rm=TRUE) # 288 row vector of steps
y <- tapply(xyw2$y,INDEX=xyw2$y,FUN=mean,na.rm=TRUE) # 288 row vector of intervals
z <- rep("Weekday",288) # 288 row vector of weekday labels
xy2 <- cbind(as.data.frame(x),as.data.frame(y),as.data.frame(z)) # 288 row by 3 column array for weekdays
xy <- rbind(xy1,xy2) # 576 row by 3 column array
Rplot4. Multiple time series of mean steps taken per five-minute interval averaged over weekends or weekdays
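The weekend and weekday split can also be sketched with base R alone, as an alternative to `chron::is.weekend`. The example below rests on two labelled assumptions: that the 61 days run consecutively from 2012-10-01 (the text does not state the date range), and a small invented data frame `toy`. It checks the row counts per day type and then averages steps per interval within each day type using `aggregate`.

```r
# Assumption: 61 consecutive days beginning 2012-10-01 (not stated in the text).
days <- seq(as.Date("2012-10-01"), by = "day", length.out = 61)

# POSIXlt$wday is 0 for Sunday and 6 for Saturday, independent of locale.
is_weekend     <- as.POSIXlt(days)$wday %in% c(0, 6)
n_weekend_rows <- sum(is_weekend) * 288    # 288 intervals per day
n_weekday_rows <- sum(!is_weekend) * 288

# Toy readings (invented): two intervals on each of four days (Fri to Mon).
toy <- data.frame(
  steps    = c(10, 30, 20, 40, 0, 8, 4, 12),
  interval = rep(c(0, 5), times = 4),
  date     = rep(as.Date(c("2012-10-05", "2012-10-06",
                           "2012-10-07", "2012-10-08")), each = 2)
)
toy$daytype <- ifelse(as.POSIXlt(toy$date)$wday %in% c(0, 6),
                      "Weekend", "Weekday")

# Mean steps per interval within each day type (long format, one row per pair).
profile <- aggregate(steps ~ interval + daytype, data = toy, FUN = mean)

print(c(weekend = n_weekend_rows, weekday = n_weekday_rows))
print(profile)
```

The long format returned by `aggregate` has one row per interval and day type, which is the same shape as the 576-row `xy` array built above.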