INTRODUCTION:

It is now possible to collect a large amount of data about personal movement using activity monitoring devices such as a Fitbit, Nike Fuelband, or Jawbone Up. These types of devices are part of the “quantified self” movement - a group of enthusiasts who take measurements about themselves regularly to improve their health, to find patterns in their behavior, or because they are tech geeks.

This assignment makes use of data from a personal activity monitoring device. This device collects data at 5-minute intervals throughout the day. The data consist of two months of measurements from an anonymous individual, collected during October and November 2012, and include the number of steps taken in 5-minute intervals each day.

Dataset: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2Factivity.zip

The variables included in this dataset are:

steps: Number of steps taken in a 5-minute interval (missing values are coded as NA)

date: The date on which the measurement was taken, in YYYY-MM-DD format

interval: Identifier for the 5-minute interval in which the measurement was taken

The dataset is stored in a comma-separated-value (CSV) file, and there are a total of 17,568 observations in this dataset.

Necessary packages to be installed

library(lubridate)

## Warning: package 'lubridate' was built under R version 3.4.2

## 
## Attaching package: 'lubridate'

## The following object is masked from 'package:base':
## 
##     date

library(dplyr)

## Warning: package 'dplyr' was built under R version 3.4.2

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:lubridate':
## 
##     intersect, setdiff, union

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(reshape2)

## Warning: package 'reshape2' was built under R version 3.4.2

library(ggplot2)

## Warning: package 'ggplot2' was built under R version 3.4.2

Loading and preprocessing the data
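The later code assumes a data frame named activitydata whose date column is of class Date, but the loading step itself is not shown. A minimal sketch of what it might look like, assuming the downloaded archive has been unzipped to a file named activity.csv in the working directory (the helper name loadActivity and the file path are assumptions, not part of the original report):

```r
# Hypothetical helper: read the activity CSV and convert the date column.
loadActivity <- function(path = "activity.csv") {
  dat <- read.csv(path, stringsAsFactors = FALSE)
  # Dates are stored as YYYY-MM-DD text; convert them to Date objects.
  dat$date <- as.Date(dat$date, format = "%Y-%m-%d")
  dat
}

# Usage, assuming activity.csv has been extracted from the zip:
# activitydata <- loadActivity("activity.csv")
# str(activitydata)
```

Converting the date once at load time means later steps, such as weekdays(), can work on the column directly.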

Mean total number of steps taken per day (using the reshape2 package)

1. Calculate the total number of steps taken per day
2. Histogram of the total number of steps taken each day
3. Mean and median of the total number of steps taken per day

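# Reshape to long form, dropping missing step counts, then sum the steps for each date.
actMeltDat <- melt(activitydata, id = "date", measure.vars = "steps", na.rm = TRUE)
actCastDat <- dcast(actMeltDat, date ~ variable, sum)
# type = "h" draws one vertical line per date, with height equal to that day's total.
plot(actCastDat$date, actCastDat$steps, type = "h", main = "Histogram of Daily Steps", xlab = "Date", 
     ylab = "Steps per Day", col = "blue", lwd = 8)
# The red horizontal line marks the mean of the daily totals.
abline(h = mean(actCastDat$steps, na.rm = TRUE), col = "red", lwd = 2)

# Mean and median of the total number of steps per day.
Mean <- mean(actCastDat$steps, na.rm = TRUE)
Median <- median(actCastDat$steps, na.rm = TRUE)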
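As a point of comparison, the same daily totals can be computed with base R alone. This is a minimal sketch on a small made-up data frame (toyDaily and its values are illustrative, not taken from the dataset). The formula interface of aggregate drops rows with missing steps by default, so a day with no recorded values gets no total at all.

```r
# Illustrative data: three days, the second with only missing values.
toyDaily <- data.frame(
  date = as.Date(rep(c("2012-10-01", "2012-10-02", "2012-10-03"), each = 2)),
  steps = c(10, 20, NA, NA, 5, 15)
)

# Sum steps per date; rows with NA steps are omitted before summing.
dailyTotals <- aggregate(steps ~ date, data = toyDaily, FUN = sum)

# Summary statistics of the daily totals.
dailyMean <- mean(dailyTotals$steps)
dailyMedian <- median(dailyTotals$steps)

# A frequency histogram of the totals (how many days fall in each range):
# hist(dailyTotals$steps, xlab = "Total steps per day", ylab = "Number of days")
```

Unlike a plot of totals against date, hist() counts how many days fall into each range of totals.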

What is the average daily activity pattern? (using package reshape2)

Average daily pattern by interval

Time series plot of the average number of steps taken (average frequency of steps by interval)

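# Reshape to long form, dropping missing step counts.
actMeltInt <- melt(activitydata, id = "interval", measure.vars = "steps", na.rm = TRUE)
# Note: sum gives the total steps per interval across all days; mean would give the per-day average.
actCastInt <- dcast(actMeltInt, interval ~ variable, sum)
plot(actCastInt$interval, actCastInt$steps, type = "l", main = "Frequency of steps per interval",
     xlab = "Interval", ylab = "Steps", col = "purple", lwd = 3)
# The blue horizontal line marks the mean of the per-interval values.
abline(h = mean(actCastInt$steps, na.rm = TRUE), col = "blue", lwd = 2)

# Identifier of the interval with the largest step count.
maxInterval <- actCastInt[which.max(actCastInt$steps), 1]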
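The section heading refers to the average number of steps per interval. A minimal sketch of computing a per-interval mean with base R's aggregate, on a small made-up data frame (toyInt and its values are illustrative, not taken from the dataset):

```r
# Illustrative data: two intervals observed on three days, with some gaps.
toyInt <- data.frame(
  interval = rep(c(0, 5), times = 3),
  steps = c(2, 10, NA, 20, 4, NA)
)

# Mean steps per interval; rows with NA steps are omitted before averaging.
intervalMean <- aggregate(steps ~ interval, data = toyInt, FUN = mean)

# Interval identifier with the highest average step count.
peakInterval <- intervalMean$interval[which.max(intervalMean$steps)]
```

The interval with the maximum is the same whether totals or means are used, provided every interval has the same number of non-missing days. The scale of the y-axis differs.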

Strategy for imputing missing values

1. Calculate the total number of missing values.
2. To fill a missing value, we replace it with the value for the same 5-minute interval in actCastInt (built above with sum, so this is the interval total across days, not the interval mean).
3. We create the function naFill(activitydata, pervalue), in which activitydata is the activity data frame and pervalue is the actCastInt data frame.
4. Histogram of the total number of steps taken each day.
5. Number of rows with NA values.

sum(is.na(activitydata$steps))

## [1] 2304

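# Replace each missing step count with the value for the same interval in pervalue.
naFill <- function(activitydata, pervalue) {
  # Row positions of the missing step counts.
  naIndex <- which(is.na(activitydata$steps))
  # Look up the replacement for each missing row by its interval identifier.
  naReplace <- unlist(lapply(naIndex, FUN = function(idx) {
    interval <- activitydata[idx, ]$interval
    pervalue[pervalue$interval == interval, ]$steps
  }))
  # Copy the original steps and overwrite only the missing positions.
  fillSteps <- activitydata$steps
  fillSteps[naIndex] <- naReplace
  fillSteps
}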

activitydataFill <- data.frame(  
  steps = naFill(activitydata, actCastInt),  
  date = activitydata$date,  
  interval = activitydata$interval)
str(activitydataFill)

## 'data.frame':    17568 obs. of  3 variables:
##  $ steps   : int  91 18 7 8 4 111 28 46 0 78 ...
##  $ date    : Date, format: "2012-10-01" "2012-10-01" ...
##  $ interval: int  0 5 10 15 20 25 30 35 40 45 ...

sum(is.na(activitydataFill$steps))

## [1] 0
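The row-by-row lookup in naFill can also be written as a single vectorised step. The following is a minimal sketch of that idea using match(). The function name fillWithLookup and the toy data are hypothetical and not part of the original analysis.

```r
# Hypothetical vectorised variant: fill missing steps from a per-interval lookup table.
fillWithLookup <- function(dat, lookup) {
  filled <- dat$steps
  missing <- is.na(filled)
  # match() returns, for each missing row, the lookup row with the same interval.
  filled[missing] <- lookup$steps[match(dat$interval[missing], lookup$interval)]
  filled
}

# Illustrative data: two intervals, one missing value in each.
toyDat <- data.frame(steps = c(NA, 3, 7, NA), interval = c(0, 5, 0, 5))
toyLookup <- data.frame(interval = c(0, 5), steps = c(7, 3))
toyFilled <- fillWithLookup(toyDat, toyLookup)
```

On 17,568 rows either approach is fast enough. The vectorised form mainly avoids subsetting the data frame once per missing row.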

# Total steps per day on the filled data.
totalSteps <- aggregate(steps ~ date, activitydataFill, sum)
colnames(totalSteps) <- c("date", "steps")
# Histogram of the daily totals (hist() bins the values in its first argument).
hist(totalSteps$steps, main = "Histogram of total steps taken per day", 
     xlab = "Total steps per day", ylab = "Number of days", 
     breaks = 10, col = "steelblue")

abline(v = mean(totalSteps$steps), lty = 1, lwd = 2, col = "red")
abline(v = median(totalSteps$steps), lty = 2, lwd = 2, col = "black")
legend(x = "topright", c("Mean", "Median"), col = c("red", "black"), 
       lty = c(1, 2), lwd = c(2, 2))

sum(is.na(activitydata$steps))

## [1] 2304

sum(is.na(activitydata$steps))*100/nrow(activitydata)

## [1] 13.11475
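The count and percentage of missing values can also be obtained from the logical vector directly, since the mean of a logical vector is the proportion of TRUE values. A minimal sketch on a made-up vector (toySteps is illustrative, not taken from the dataset):

```r
# Illustrative step counts with two missing values out of eight.
toySteps <- c(NA, 4, 0, NA, 12, 7, 3, 9)

# Number of missing values.
missingCount <- sum(is.na(toySteps))

# Percentage of missing values: mean() of a logical vector is the share of TRUE.
missingPercent <- 100 * mean(is.na(toySteps))
```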

Panel plot comparing the average number of steps taken per 5-minute interval across weekdays and weekends

Compare the activity on weekdays and weekends

For this I have used the dataset in which the missing data is filled in.

Use dplyr's mutate to create the new column dayIndicator

Calculate the average number of steps in each 5-minute interval

Use ggplot2 to plot the time series of 5-minute intervals for weekdays and weekends and compare the average steps

activitydataFill <- mutate(activitydataFill, dayIndicator = 
                             ifelse(weekdays(activitydataFill$date) == "Saturday" | 
                                      weekdays(activitydataFill$date) == "Sunday", "weekend", "weekday"))

## Warning: package 'bindrcpp' was built under R version 3.4.2

activitydataFill$dayIndicator <- as.factor(activitydataFill$dayIndicator)
head(activitydataFill)

##   steps       date interval dayIndicator
## 1    91 2012-10-01        0      weekday
## 2    18 2012-10-01        5      weekday
## 3     7 2012-10-01       10      weekday
## 4     8 2012-10-01       15      weekday
## 5     4 2012-10-01       20      weekday
## 6   111 2012-10-01       25      weekday
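The comparison against "Saturday" and "Sunday" depends on weekdays() returning English day names, which varies with the session locale. A minimal sketch of a locale-independent alternative using the numeric weekday from as.POSIXlt (the helper name dayType is hypothetical and not part of the original analysis):

```r
# Hypothetical helper: label dates as weekday or weekend without using day names.
dayType <- function(dates) {
  # as.POSIXlt()$wday is 0 for Sunday through 6 for Saturday.
  wday <- as.POSIXlt(dates)$wday
  factor(ifelse(wday %in% c(0, 6), "weekend", "weekday"),
         levels = c("weekday", "weekend"))
}

# 2012-10-01 is a Monday; 2012-10-06 and 2012-10-07 are a Saturday and a Sunday.
toyDays <- dayType(as.Date(c("2012-10-01", "2012-10-06", "2012-10-07")))
```

The result is already a factor, so the separate as.factor() step would not be needed with this variant.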

averageInterval <- activitydataFill %>%
  group_by(interval, dayIndicator) %>%
  summarise(steps = mean(steps))
g <- ggplot(averageInterval, aes(x=interval, y=steps, color = dayIndicator)) +
  geom_line() +
  facet_wrap(~dayIndicator, ncol = 1, nrow=2)
print(g)

Reproducible Research - Project1

Lopamudra Satpathy

November 17, 2017