It is now possible to collect a large amount of data about personal movement using activity monitoring devices such as a Fitbit, Nike Fuelband, or Jawbone Up. These types of devices are part of the “quantified self” movement, a group of enthusiasts who take measurements about themselves regularly to improve their health, to find patterns in their behavior, or because they are tech geeks.

This assignment makes use of data from a personal activity monitoring device. The device collects data at 5-minute intervals throughout the day. The data consist of two months of observations from an anonymous individual, collected during October and November 2012, and include the number of steps taken in each 5-minute interval of each day.

Dataset: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2Factivity.zip
The variables included in this dataset are:
- steps: Number of steps taken in a 5-minute interval (missing values are coded as NA)
- date: The date on which the measurement was taken, in YYYY-MM-DD format
- interval: Identifier for the 5-minute interval in which the measurement was taken

The dataset is stored in a comma-separated-value (CSV) file and contains a total of 17,568 observations.

#Necessary packages to be installed
library(lubridate)
## Warning: package 'lubridate' was built under R version 3.4.2
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.4.2
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:lubridate':
##
## intersect, setdiff, union
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(reshape2)
## Warning: package 'reshape2' was built under R version 3.4.2
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.4.2
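The chunks that follow refer to a data frame named activitydata, but the step that creates it is not shown. A minimal sketch of that step, assuming the archive has been downloaded and unzipped to a file called activity.csv in the working directory (the file location and the helper name load_activity are assumptions, not part of the original analysis):

```r
# Hypothetical helper: read the activity CSV and parse the date column.
load_activity <- function(path) {
  dat <- read.csv(path, stringsAsFactors = FALSE)
  # Dates are stored as YYYY-MM-DD strings; convert them to Date objects.
  dat$date <- as.Date(dat$date, format = "%Y-%m-%d")
  dat
}

# Assumed file location; adjust the path to wherever the CSV was unzipped.
if (file.exists("activity.csv")) {
  activitydata <- load_activity("activity.csv")
  str(activitydata)
}
```

Converting date to the Date class up front is what lets the later plotting and weekdays() calls treat it as a calendar date rather than text.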
1. Calculate the total number of steps taken per day
2. Histogram of the total number of steps taken each day
3. Mean and median of the total number of steps taken per day
actMeltDat <- melt(activitydata, id = "date", measure.vars = "steps", na.rm = TRUE)
actCastDat <- dcast(actMeltDat, date ~ variable, sum)
plot(actCastDat$date, actCastDat$steps, type="h", main="Histogram of Daily Steps", xlab="Date",
ylab="Steps per Day", col="blue", lwd= 8)
abline(h=mean(actCastDat$steps, na.rm=TRUE), col="red", lwd=2)
Mean <- mean(actCastDat$steps, na.rm = TRUE)
Median <- median(actCastDat$steps, na.rm = TRUE)
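The plot above draws one vertical bar per date (type = "h"), so it shows daily totals over time rather than how those totals are distributed. As a hedged sketch of the distribution view, the example below sums steps per day with aggregate() and bins the totals with hist(). The toy data frame and its values are made up for illustration and only mimic the columns of activitydata.

```r
# Toy data standing in for activitydata (values are invented).
toy <- data.frame(
  steps    = c(10, NA, 30, 5, 15, 40),
  date     = as.Date(c("2012-10-01", "2012-10-01", "2012-10-01",
                       "2012-10-02", "2012-10-02", "2012-10-02")),
  interval = c(0, 5, 10, 0, 5, 10)
)

# Sum steps per day; the formula interface drops rows with NA steps by default.
dailyTotals <- aggregate(steps ~ date, data = toy, FUN = sum)

# Bin the daily totals. plot = FALSE only computes the bins;
# call plot(h) or drop plot = FALSE to draw the histogram.
h <- hist(dailyTotals$steps, plot = FALSE)

dailyMean   <- mean(dailyTotals$steps)
dailyMedian <- median(dailyTotals$steps)
```

With the real data, dailyTotals plays the same role as actCastDat, and h$counts holds the number of days falling in each bin.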
actMeltInt <- melt(activitydata, id="interval", measure.vars = "steps", na.rm = TRUE)
actCastInt <- dcast(actMeltInt, interval ~ variable, sum)
plot(actCastInt$interval, actCastInt$steps, type="l",main = "Frequency of steps per interval",
xlab = "Interval", ylab = "Steps", col= "purple", lwd = 3 )
abline(h = mean(actCastInt$steps, na.rm = TRUE),col = "blue", lwd = 2)
maxInterval <- actCastInt[which.max(actCastInt$steps),1]
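The chunk above aggregates each interval with sum, which gives the total steps recorded in that interval across all days. If the average daily pattern is wanted instead, the same aggregation can be done with mean. A minimal sketch on invented toy data (the data frame and the names intervalMean and peakInterval are illustrative assumptions):

```r
# Toy data: two days, three intervals, one missing value (values are invented).
toy <- data.frame(
  steps    = c(0, 10, NA, 4, 30, 2),
  interval = c(0, 5, 10, 0, 5, 10)
)

# Average steps per interval across days; rows with NA steps are dropped.
intervalMean <- aggregate(steps ~ interval, data = toy, FUN = mean)

# Interval with the highest average number of steps.
peakInterval <- intervalMean$interval[which.max(intervalMean$steps)]
```

Because the sum and the mean over complete rows differ only by the number of days observed per interval, both versions identify the same peak interval when every interval has the same number of observed days.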
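1. Calculate the total number of missing values.
2. To fill each missing value, we replace it with the value for the same 5-minute interval taken from actCastInt. Note that actCastInt was built above with sum, so the replacement is the interval total across all days; pass mean to dcast instead to impute with the interval average.
3. We create the function naFill(activitydata, pervalue), in which activitydata is the activity data frame and pervalue is the actCastInt data frame.
4. Histogram of the total number of steps taken each day.
5. Number of rows with NA values.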
sum(is.na(activitydata$steps))
## [1] 2304
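# Replace each missing steps value with the value for the same interval in pervalue
naFill <- function(activitydata, pervalue) {
  naIndex <- which(is.na(activitydata$steps))
  # For every missing row, look up its interval in the per-interval table
  naReplace <- unlist(lapply(naIndex, FUN = function(idx) {
    interval <- activitydata[idx, ]$interval
    pervalue[pervalue$interval == interval, ]$steps
  }))
  fillSteps <- activitydata$steps
  fillSteps[naIndex] <- naReplace
  fillSteps
}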
activitydataFill <- data.frame(
steps = naFill(activitydata, actCastInt),
date = activitydata$date,
interval = activitydata$interval)
str(activitydataFill)
## 'data.frame': 17568 obs. of 3 variables:
## $ steps : int 91 18 7 8 4 111 28 46 0 78 ...
## $ date : Date, format: "2012-10-01" "2012-10-01" ...
## $ interval: int 0 5 10 15 20 25 30 35 40 45 ...
sum(is.na(activitydataFill$steps))
## [1] 0
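The row-by-row lookup in naFill() can also be written as a single vectorised lookup with match(). The sketch below is an alternative under the same assumptions as naFill() (a steps and an interval column in the first argument, an interval and a steps column in the second); the name naFillVec and the toy data are hypothetical.

```r
# Hypothetical vectorised variant of naFill(); takes the same two arguments.
naFillVec <- function(activitydata, pervalue) {
  filled    <- activitydata$steps
  isMissing <- is.na(filled)
  # match() returns, for each missing row, the position of its interval
  # in the per-interval table, so the replacement values line up by interval.
  filled[isMissing] <- pervalue$steps[match(activitydata$interval[isMissing],
                                            pervalue$interval)]
  filled
}

# Toy input: intervals 0 and 5 each have one observed and one missing value.
toy <- data.frame(steps = c(NA, 3, 7, NA), interval = c(0, 5, 0, 5))
perInterval <- data.frame(interval = c(0, 5), steps = c(7, 3))

naFillVec(toy, perInterval)
```

Whether the replacement is an interval total or an interval mean depends only on how the second argument was aggregated, not on the fill function itself.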
totalSteps <- aggregate(steps ~ date, activitydataFill,sum)
colnames(totalSteps) <- c("date", "steps")
hist(totalSteps$steps, main = "Histogram of total steps taken per day",
xlab = "Total steps per day", ylab = "Number of days",
breaks = 10, col = "steelblue")
abline(v = mean(totalSteps$steps), lty = 1, lwd = 2, col = "red")
abline(v = median(totalSteps$steps), lty = 2, lwd = 2, col = "black")
legend(x = "topright", c("Mean", "Median"), col = c("red", "black"),
lty = c(1, 2), lwd = c(2, 2))
sum(is.na(activitydata$steps))
## [1] 2304
sum(is.na(activitydata$steps))*100/nrow(activitydata)
## [1] 13.11475
activitydataFill <- mutate(activitydataFill, dayIndicator =
ifelse(weekdays(date) %in% c("Saturday", "Sunday"), "weekend", "weekday"))
## Warning: package 'bindrcpp' was built under R version 3.4.2
activitydataFill$dayIndicator <- as.factor(activitydataFill$dayIndicator)
head(activitydataFill)
## steps date interval dayIndicator
## 1 91 2012-10-01 0 weekday
## 2 18 2012-10-01 5 weekday
## 3 7 2012-10-01 10 weekday
## 4 8 2012-10-01 15 weekday
## 5 4 2012-10-01 20 weekday
## 6 111 2012-10-01 25 weekday
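The classification above compares the output of weekdays() with English day names, which depends on the session locale. As a hedged alternative, the sketch below uses the numeric weekday from as.POSIXlt(), which does not depend on locale. The helper name dayType is an assumption introduced for illustration.

```r
# Hypothetical helper: label dates as "weekday" or "weekend" without
# relying on localised day names.
dayType <- function(dates) {
  wday <- as.POSIXlt(dates)$wday   # 0 = Sunday, 1 = Monday, ..., 6 = Saturday
  factor(ifelse(wday %in% c(0, 6), "weekend", "weekday"),
         levels = c("weekday", "weekend"))
}

# 2012-10-05 is a Friday, so the next two dates fall on the weekend.
dates <- as.Date(c("2012-10-05", "2012-10-06", "2012-10-07", "2012-10-08"))
dayType(dates)
```

The result is already a factor, so no separate as.factor() step is needed when it is used as a grouping or faceting variable.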
averageInterval <- activitydataFill %>%
  group_by(interval, dayIndicator) %>%
  summarise(steps = mean(steps))
g <- ggplot(averageInterval, aes(x = interval, y = steps, color = dayIndicator)) +
  geom_line() +
  facet_wrap(~dayIndicator, ncol = 1, nrow = 2)
print(g)