Introduction

This report analyzes data from a personal activity monitor worn by an anonymous individual. The data were collected during October and November 2012 and record the number of steps taken in each 5-minute interval throughout the day. Variables in the data set include steps (the number of steps in the interval), date (the day of the measurement), and interval (the identifier of the 5-minute interval).

Loading and processing the data

# Download and unpack the data only if it is not already present
if (!file.exists("activity.csv")) {
  download.file(url = "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2Factivity.zip", destfile = "activity.zip")
  unzip("activity.zip")
}

# Loading Libraries
library(dplyr)
library(knitr)

# Load data
act <- read.csv('activity.csv')
act <- data.frame('steps'=as.integer(act$steps),  
                  'date'=as.Date(act$date),  
                  'interval'=as.integer(act$interval))

Initial Analysis: Mean, Median, Maximum

Mean and Median for the total steps taken per day:

Note that the mean is lower than the median. Because the daily sums are computed with na.rm = TRUE, any day whose measurements are all missing gets a total of 0, and these zero totals pull the mean down.

act_steps <- tapply(act$steps, act$date, sum, na.rm = TRUE)
mean(act_steps)
## [1] 9354.23
median(act_steps)
## [1] 10395
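
The effect of na.rm = TRUE on days without any recorded data can be seen in a minimal sketch on made-up values. The `toy` data frame and the names `with_zero` and `without_na` are hypothetical and not part of the activity data:

```r
# Hypothetical toy data: day "b" has no recorded measurements at all
toy <- data.frame(
  day   = c("a", "a", "b", "b", "c", "c"),
  steps = c(10L, 20L, NA, NA, 30L, 50L)
)

# With na.rm = TRUE, the all-NA day sums to 0 and is kept as a real total
with_zero <- tapply(toy$steps, toy$day, sum, na.rm = TRUE)

# The formula interface of aggregate() drops NA rows, so day "b" disappears
without_na <- aggregate(steps ~ day, data = toy, FUN = sum)$steps

mean(with_zero)   # averages over three days, one of which is 0
mean(without_na)  # averages over the two days that have data
```

Which of the two treatments is appropriate depends on whether a day without data should count as a day without steps; the sketch only shows that the choice changes the mean.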

What is the interval with the maximum number of steps?

The single highest measurement is 806 steps, recorded in interval 615 on 2012-11-27.

maximum <- which(act$steps == max(act$steps, na.rm = TRUE))
act[maximum,]
##       steps       date interval
## 16492   806 2012-11-27      615

This value is not an isolated outlier: the table below lists every measurement in the top 0.05% of steps per interval, and several of them come close to it.

# The 99.95th percentile of steps per interval is the cutoff for the top 0.05%
top_0005 <- quantile(act$steps, 0.9995, na.rm = TRUE)
kable(act[which(act$steps > top_0005),], caption='Top 0.05% max steps per interval', align = "c")
Top 0.05% max steps per interval

        steps     date     interval
3277     802   2012-10-12     900
4136     786   2012-10-15     835
10194    785   2012-11-05     925
14024    785   2012-11-18    1635
14201    789   2012-11-19     720
15745    785   2012-11-24    1600
16487    794   2012-11-27     550
16492    806   2012-11-27     615

Visualizing the data:

Frequency of steps taken by measurement:

The individual measurements are heavily skewed toward 0, as the histogram of steps per 5-minute measurement shows:

hist(act$steps, main="Frequency of Steps Taken", xlab="steps", ylab="frequency")

Frequency of total steps taken by day

The histogram of total steps per day shows that, although the vast majority of individual measurements are 0, the daily totals are roughly Gaussian:

hist(aggregate(steps ~ date, act, sum)$steps, main ="Sum of Steps per Day", xlab = "Steps per Day")

Daily Activity pattern

ptrn <- tapply(act$steps, act$interval, mean, na.rm = T)
plot(ptrn, type="l", main = "Fig 3: Daily Activity Pattern", ylab="steps", xlab = "interval")
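
Because ptrn is plotted against its position (1, 2, 3, ...) rather than against the interval identifiers, it can help to translate identifiers into clock times when reading the pattern. The sketch below assumes that an identifier encodes the time of day as HHMM, so that 615 means 06:15; the helper `interval_to_time` is hypothetical and not part of the original analysis:

```r
# Assumption: identifiers encode clock time as HHMM, so 615 means 06:15
interval_to_time <- function(interval) {
  # Left-pad with zeros to four digits, e.g. 615 -> "0615"
  padded <- sprintf("%04d", as.integer(interval))
  # Split into hours and minutes and join with a colon
  paste0(substr(padded, 1, 2), ":", substr(padded, 3, 4))
}

interval_to_time(c(0, 55, 615, 1635))
```

Under the same assumption, something like interval_to_time(names(ptrn)[which.max(ptrn)]) would give the time of day with the highest average number of steps.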

Imputing missing values

Missing values are imputed with the mean number of steps for the corresponding 5-minute interval:

# Mean steps for each 5-minute interval (the formula interface drops NA rows)
ags <- aggregate(steps ~ interval, data = act, FUN = mean)

# Replace each missing value with the mean of its interval
act_new <- act
missing <- is.na(act_new$steps)
act_new$steps[missing] <- ags$steps[match(act_new$interval[missing], ags$interval)]
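
The same per-interval fill can be illustrated on a small made-up data set. This is only a sketch of the idea using ave(), not the code used for the report; the `toy` data frame and the column `steps_filled` are hypothetical:

```r
# Hypothetical toy data: two intervals observed on three days, with gaps
toy <- data.frame(
  interval = c(0L, 5L, 0L, 5L, 0L, 5L),
  steps    = c(2, NA, 4, 10, NA, 20)
)

# ave() returns, for every row, the mean of that row's interval group
interval_mean <- ave(toy$steps, toy$interval,
                     FUN = function(x) mean(x, na.rm = TRUE))

# Keep observed values; use the interval mean only where steps is missing
toy$steps_filled <- ifelse(is.na(toy$steps), interval_mean, toy$steps)
toy$steps_filled
```

Observed values are left untouched, so only the rows that were NA change.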

Mean, Median of new dataset:

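The new mean and median are larger than in the original data set because the missing values, which previously contributed nothing to the daily totals, are now replaced by the mean of each interval. The median now equals the mean, which is what one would expect if the days that were entirely missing each receive the same imputed total.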

act_new_steps <- tapply(act_new$steps, act_new$date, FUN = sum)
mean(act_new_steps)
## [1] 10766.19
median(act_new_steps)
## [1] 10766.19


This is also visible in the histogram of total steps per day: compared with the earlier histogram, the only change is in the central bucket, 10,000-15,000 steps, because NA values were imputed with interval means.

hist(act_new_steps, main = "New total steps per day", xlab="steps per day")



Are there differences in activity patterns between weekdays and weekends?

The daily pattern for weekends is similar to that for weekdays, but noisier. In both patterns, there is a large jump in steps taken around the 105th interval, followed by a drop for the rest of the day. On weekends, the intervals after the 105th contain much more noise. There are also generally more steps taken on weekends.
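
One way such a comparison could be set up is sketched below on made-up values. The `toy` data frame and the names `day_number`, `day_type`, and `pattern` are hypothetical; this is an illustration under those assumptions, not the code behind the observations above:

```r
# Hypothetical toy data: one Saturday and one Monday, two intervals each
toy <- data.frame(
  date     = as.Date(c("2012-10-06", "2012-10-06", "2012-10-08", "2012-10-08")),
  interval = c(0L, 5L, 0L, 5L),
  steps    = c(4, 8, 1, 3)
)

# "%u" gives the weekday as a number from 1 (Monday) to 7 (Sunday),
# which avoids depending on locale-specific day names
day_number <- as.integer(format(toy$date, "%u"))
toy$day_type <- factor(ifelse(day_number >= 6, "weekend", "weekday"),
                       levels = c("weekday", "weekend"))

# Mean steps per interval, computed separately for each day type
pattern <- aggregate(steps ~ interval + day_type, data = toy, FUN = mean)
pattern
```

Applied to the imputed data, the same steps would yield one average pattern per day type, and the two could then be drawn as separate line plots for comparison.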