This report analyzes the number of steps taken by people wearing activity-monitoring devices.
The data is loaded from a CSV file into a data.table.
library(data.table)
library(lattice)
options(scipen=999)
data <- read.csv("./activity.csv",stringsAsFactors=FALSE)
dt <- data.table(data)
totalStepsPerDay <- dt[,list(total_steps=sum(steps)),by=date]
histogram(~total_steps, totalStepsPerDay)
meanSteps <- round(mean(totalStepsPerDay$total_steps, na.rm=TRUE), digits=2)
medianSteps <- round(median(totalStepsPerDay$total_steps, na.rm=TRUE), digits=2)
The mean number of steps per day is 10766.19 and the median is 10765.
meanStepsPerInterval <- dt[,list(mean_steps=mean(steps, na.rm=TRUE)),by=interval]
xyplot(mean_steps~interval, meanStepsPerInterval, type='l')
maxInterval <- meanStepsPerInterval[which.max(mean_steps), interval]
The interval with the highest mean number of steps is 835.
incompleteCaseCount <- sum(!complete.cases(dt))
There are 2304 rows with missing values (NAs).
Each missing value is replaced with the mean number of steps for its interval.
dt <- merge(dt, meanStepsPerInterval, by="interval")
dt$steps <- ifelse(is.na(dt$steps), dt$mean_steps, dt$steps)
totalStepsPerDay <- dt[,list(total_steps=sum(steps)),by=date]
histogram(~total_steps, totalStepsPerDay)
meanSteps <- round(mean(totalStepsPerDay$total_steps, na.rm=TRUE), digits=2)
medianSteps <- round(median(totalStepsPerDay$total_steps, na.rm=TRUE), digits=2)
After imputation, the mean number of steps per day is 10766.19 and the median is also 10766.19. Because the imputed values are themselves interval means, the mean number of steps per day is unchanged, while the median increases slightly, from 10765 to 10766.19.
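The effect of this imputation on the mean and median can be illustrated on a small made-up data set. The sketch below is only an illustration: the `toy` table, its day labels, and its step counts are hypothetical and are not taken from activity.csv. It assumes that the missing values cover one whole day, in which case that day's imputed total equals the mean daily total.

```r
library(data.table)

# Hypothetical toy data: 4 days x 2 intervals, with day "d4" entirely missing
toy <- data.table(
  date     = rep(c("d1", "d2", "d3", "d4"), each = 2),
  interval = rep(c(0, 5), times = 4),
  steps    = c(10, 20, 15, 25, 50, 60, NA, NA)
)

# Daily totals before imputation; the missing day yields NA and is ignored
totalsBefore <- toy[, list(total_steps = sum(steps)), by = date]
meanBefore   <- mean(totalsBefore$total_steps, na.rm = TRUE)
medianBefore <- median(totalsBefore$total_steps, na.rm = TRUE)

# Replace each NA with the mean of its interval, as in the report
intervalMeans <- toy[, list(mean_steps = mean(steps, na.rm = TRUE)), by = interval]
filled <- merge(toy, intervalMeans, by = "interval")
filled$steps <- ifelse(is.na(filled$steps), filled$mean_steps, filled$steps)

# Daily totals after imputation
totalsAfter <- filled[, list(total_steps = sum(steps)), by = date]
meanAfter   <- mean(totalsAfter$total_steps)
medianAfter <- median(totalsAfter$total_steps)

print(c(meanBefore = meanBefore, meanAfter = meanAfter))
print(c(medianBefore = medianBefore, medianAfter = medianAfter))
```

In this toy case the mean stays at 60 while the median moves from 40 to 50, toward the mean. If missing values were scattered across otherwise complete days, the mean of the daily totals would not necessarily stay the same.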
dt$dayType <- factor(ifelse(weekdays(as.Date(dt$date)) %in% c("Saturday","Sunday"), "Weekend", "Weekday"))
meanStepsPerIntervalByDayType <- dt[,list(mean_steps=mean(steps, na.rm=TRUE)),by=list(interval, dayType)]
xyplot(mean_steps~interval | dayType, meanStepsPerIntervalByDayType, type='l',layout=c(1,2))
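One caveat about the weekday classification above: `weekdays()` returns day names in the current locale, so comparing against "Saturday" and "Sunday" assumes an English locale. A possible locale-independent variant, sketched here on a few hypothetical sample dates rather than on the report's data, uses the numeric day of the week from `as.POSIXlt()`.

```r
# Hypothetical sample dates: a Friday, Saturday, Sunday, and Monday
dates <- as.Date(c("2012-10-05", "2012-10-06", "2012-10-07", "2012-10-08"))

# $wday numbers the days 0-6 starting on Sunday, regardless of locale
wday <- as.POSIXlt(dates)$wday

# 0 (Sunday) and 6 (Saturday) are weekend days
dayType <- factor(ifelse(wday %in% c(0, 6), "Weekend", "Weekday"))

print(data.frame(date = dates, dayType = dayType))
```

The same expression could be applied to `as.Date(dt$date)` in place of the `weekdays()` call if the report needs to run under a non-English locale.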