Data analysis of personal movement

Synopsis

This project uses data from a personal activity monitoring device that records data at 5-minute intervals throughout the day. The dataset covers two months (October and November 2012) for one anonymous individual and contains the number of steps taken in each 5-minute interval of each day.

This report answers three questions:
1. What is the distribution of daily total steps across the two months?
2. What is the distribution of daily total steps across the two months after all NA values are replaced?
3. How do activity patterns differ between weekdays and weekends?

Loading and preprocessing the data

setwd("~/Desktop/Coursera/Reproducible Research/week 2/Project/RepData_PeerAssessment1/")
activity <- read.csv("./activity.csv")

What is mean total number of steps taken per day?

First, we compute the total number of steps taken each day. The aggregate function returns a new data frame that contains each date and its corresponding total steps.

aggsum <- aggregate(steps~date, activity, sum)
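One detail worth keeping in mind: the formula interface of aggregate omits rows with NA by default (na.action = na.omit), so a day whose steps are all missing is left out of the result instead of appearing as zero. A minimal sketch with a made-up three-day data frame (toy and toy_sum are hypothetical names, not part of the activity data):

```r
# Toy data (hypothetical): the second day has only missing values
toy <- data.frame(
  date  = rep(c("2012-10-01", "2012-10-02", "2012-10-03"), each = 2),
  steps = c(10, 20, NA, NA, 5, 15)
)

# Formula interface: rows with NA are omitted before summing,
# so the all-NA day does not appear in the result
toy_sum <- aggregate(steps ~ date, toy, sum)
toy_sum
```

Only the first and third days appear in toy_sum, which is why the histogram can be based on fewer days than the dataset spans.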

A histogram then shows the distribution of total steps per day. The magenta and blue vertical lines mark the mean and the median, respectively.

hist(aggsum$steps,main = "Hist of total steps of each day",xlab = "Total steps")
abline(v = mean(aggsum$steps),col = "magenta",lwd = 2)
abline(v = median(aggsum$steps), col = "blue",lwd = 2)

Mean and median of the total number of steps taken per day

mean(aggsum$steps)

## [1] 10766.19

median(aggsum$steps)

## [1] 10765

What is the average daily activity pattern?

aggintervalsteps <- aggregate(steps~interval,activity,mean)
with(aggintervalsteps, plot(interval,steps,type = "l",xlab = "Interval", ylab = "Average steps"))

The 5-minute interval that contains the maximum number of steps on average:

maxinter <- aggintervalsteps[aggintervalsteps$steps == max(aggintervalsteps$steps),"interval"]
maxinter

## [1] 835
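The same lookup can also be written with which.max, which returns the position of the first maximum and so avoids comparing floating-point values for equality. A small sketch on made-up interval averages (toy_avg and peak are hypothetical names):

```r
# Toy interval averages (hypothetical values, not the real data)
toy_avg <- data.frame(
  interval = c(0, 5, 10, 15),
  steps    = c(1.5, 7.2, 3.0, 0.4)
)

# which.max gives the row index of the largest average;
# use it to pick the matching interval label
peak <- toy_avg$interval[which.max(toy_avg$steps)]
peak
```

Unlike the equality comparison, which.max always returns a single interval, even if two intervals tie for the maximum.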

Imputing missing values

Calculate and report the total number of missing values in the dataset.

sum(!complete.cases(activity))

## [1] 2304
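Before choosing an imputation strategy it can help to see where the missing values sit. The sketch below, on a made-up data frame with the same three columns, counts NA values per column and per day (toy, na_per_column and na_per_day are hypothetical names):

```r
# Toy data (hypothetical) with the same column names as activity
toy <- data.frame(
  steps    = c(NA, NA, 4, 0),
  date     = rep(c("2012-10-01", "2012-10-02"), each = 2),
  interval = rep(c(0, 5), times = 2)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))

# Number of missing step counts on each day
na_per_day <- tapply(is.na(toy$steps), toy$date, sum)

na_per_column
na_per_day
```

Run against the real data, the per-day counts would show whether missing values are scattered or concentrated in whole days.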

Fill in each NA value with the mean number of steps for the same 5-minute interval across all days, and create a new dataset.

library(plyr)
impute.mean <- function(x) replace(x,is.na(x),mean(x,na.rm = TRUE))
activityNew <- ddply(activity,~interval,transform,steps = impute.mean(steps))
activityNew <- activityNew[order(activityNew$date),]
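For readers who prefer to avoid the plyr dependency, the same per-interval mean imputation can be sketched in base R with ave, which applies a function within groups and keeps the original row order. This is an alternative under that assumption, not the method used in this report; toy and impute_mean are hypothetical names:

```r
# Toy data (hypothetical): two intervals observed over three days
toy <- data.frame(
  date     = rep(c("2012-10-01", "2012-10-02", "2012-10-03"), each = 2),
  interval = rep(c(0, 5), times = 3),
  steps    = c(NA, 10, 2, 20, 4, NA)
)

# Replace NA with the mean of the non-missing values in the same group
impute_mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

# ave applies impute_mean within each interval and preserves row order,
# so no re-sorting by date is needed afterwards
toy$steps <- ave(toy$steps, toy$interval, FUN = impute_mean)
toy
```

Here the missing value for interval 0 becomes 3 (the mean of 2 and 4) and the missing value for interval 5 becomes 15 (the mean of 10 and 20).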

Histogram of the total number of steps taken each day.

aggsumnew <- aggregate(steps~date,activityNew,sum)
hist(aggsumnew$steps,main = "Hist of total steps of each day(with no NA)",xlab = "Total steps")
abline(v = mean(aggsumnew$steps),col = "magenta",lwd = 2)
abline(v = median(aggsumnew$steps), col = "blue",lwd = 2)

Mean and median of the total number of steps taken per day

mean(aggsumnew$steps)

## [1] 10766.19

median(aggsumnew$steps)

## [1] 10766.19

Are there differences in activity patterns between weekdays and weekends?

activityNew$date <- as.Date(activityNew$date)
activityNew$week <- weekdays(activityNew$date)
# weekdays() returns day names in the current locale; this assumes English names
activityNew$week <- ifelse(activityNew$week %in% c('Saturday', 'Sunday'), 'weekend', 'weekday')
activityNew$week <- factor(activityNew$week)
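The labelling above compares against English day names, which weekdays() only produces in an English locale. A locale-independent sketch, assuming it is acceptable to use the numeric day of the week from as.POSIXlt (0 = Sunday, 6 = Saturday); dates, wday and week are hypothetical names:

```r
# Four consecutive dates from the study period: Friday to Monday
dates <- as.Date(c("2012-10-05", "2012-10-06", "2012-10-07", "2012-10-08"))

# wday is 0 for Sunday through 6 for Saturday, regardless of locale
wday <- as.POSIXlt(dates)$wday

# Label Saturday and Sunday as weekend, everything else as weekday
week <- factor(ifelse(wday %in% c(0, 6), "weekend", "weekday"))
week
```

The resulting factor has the same two levels as the report's week column, so it could feed the same aggregate and xyplot calls.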

aggintervalsteps1 <- aggregate(steps~interval+week,activityNew,mean)
library(lattice)
xyplot(steps~interval|week,data = aggintervalsteps1,type = 'l',layout = c(1,2),xlab = "Interval",ylab = "Number of steps")