================================================
Title: “Reproducible Research Peer Assignment 1”
Date: “December 16, 2015”
================================================
Clear the Workspace
rm(list = ls())
Load the raw data from CSV file
activity <- read.csv("activity.csv",
colClasses = c("numeric", "character", "numeric"))
Structure of the loaded data file
str(activity)
## 'data.frame': 17568 obs. of 3 variables:
## $ steps : num NA NA NA NA NA NA NA NA NA NA ...
## $ date : chr "2012-10-01" "2012-10-01" "2012-10-01" "2012-10-01" ...
## $ interval: num 0 5 10 15 20 25 30 35 40 45 ...
Question 1
What is the mean total number of steps taken per day?
(1) Compute the total number of steps each day (missing values are removed)
TotalNoofSteps <- aggregate(activity$steps, by=list(activity$date),
FUN=sum, na.rm=TRUE)
Rename the attributes
names(TotalNoofSteps) <- c("date","total")
head(TotalNoofSteps)
## date total
## 1 2012-10-01 0
## 2 2012-10-02 126
## 3 2012-10-03 11352
## 4 2012-10-04 12116
## 5 2012-10-05 13294
## 6 2012-10-06 15420
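The question asks for the mean of the daily totals, which can be read off the aggregated data frame. The sketch below uses a small made-up data frame (`toy` and its values are hypothetical, not taken from activity.csv). It also illustrates a caveat of `sum(..., na.rm = TRUE)`: a day with no recorded values gets a total of 0, which pulls the mean and median down.

```r
# Hypothetical toy data: three days, two intervals each; the first day is all NA.
toy <- data.frame(
  steps = c(NA, NA, 10, 20, 30, 50),
  date  = rep(c("2012-10-01", "2012-10-02", "2012-10-03"), each = 2)
)

# Same aggregation as in the report: the all-NA day sums to 0.
daily <- aggregate(toy$steps, by = list(toy$date), FUN = sum, na.rm = TRUE)
names(daily) <- c("date", "total")
mean_with_zero   <- mean(daily$total)
median_with_zero <- median(daily$total)

# Alternative: keep only days that have at least one observed value.
observed  <- tapply(!is.na(toy$steps), toy$date, any)
daily_obs <- daily[observed[as.character(daily$date)], ]
mean_observed   <- mean(daily_obs$total)
median_observed <- median(daily_obs$total)
```

Which variant is appropriate depends on whether an unrecorded day should count as a day with zero steps.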
(2) Histogram of the total number of steps taken each day
hist(TotalNoofSteps$total,
main = "Histogram: Total number of steps taken each day",
xlab = "Total steps per day", col = "blue")

Remove TotalNoofSteps from the workspace
rm(TotalNoofSteps)
Question 2
What is the average daily activity pattern?
Mean of steps across all days for each interval
MeanofSteps <- aggregate(activity$steps, by=list(activity$interval),
FUN=mean, na.rm=TRUE)
Rename the attributes
names(MeanofSteps) <- c("Interval","Mean")
head(MeanofSteps)
## Interval Mean
## 1 0 1.7169811
## 2 5 0.3396226
## 3 10 0.1320755
## 4 15 0.1509434
## 5 20 0.0754717
## 6 25 2.0943396
(1) Time series plot of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all days (y-axis)
plot(MeanofSteps$Interval, MeanofSteps$Mean, type = "l",
col = "blue", xlab = "5-minute interval",
ylab = "Average number of steps",
main = "Time series plot of the average number of steps per interval")

(2) Which 5-minute interval, on average across all the days in the dataset, contains the maximum number of steps?
MaxNoSteps <- which(MeanofSteps$Mean == max(MeanofSteps$Mean))
Interval <- MeanofSteps[MaxNoSteps, 1]
Interval
## [1] 835
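The same lookup can be written with `which.max()`, and the interval identifier can be turned into a clock time. This is a minimal sketch on made-up means (the `means` values are hypothetical). It assumes the identifiers are HHMM-coded, so that 835 stands for 08:35; that is an assumption about the dataset, not something the output above states.

```r
# Hypothetical per-interval means (illustrative values only).
means <- data.frame(Interval = c(0, 5, 830, 835, 840),
                    Mean     = c(1.7, 0.3, 150, 200, 180))

# which.max() returns the index of the first maximum.
peak <- as.integer(means$Interval[which.max(means$Mean)])

# Assumption: identifiers are HHMM-coded, so split into hours and minutes.
clock <- sprintf("%02d:%02d", peak %/% 100L, peak %% 100L)
```

Unlike `which(x == max(x))`, `which.max()` always returns a single index, which matters only if several intervals tie for the maximum.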
Clear workspace
rm(Interval, MaxNoSteps)
Question 3
Imputing missing values
(1) Total number of missing Values in the dataset
NACount <- sum(is.na(activity$steps))
NACount
## [1] 2304
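The count above covers the steps column only. To confirm that no other column has missing values, the check can be run per column; the sketch below uses a hypothetical `toy` data frame rather than the real file.

```r
# Hypothetical toy data with missing values only in the steps column.
toy <- data.frame(
  steps    = c(NA, 3, NA, 7),
  date     = c("2012-10-01", "2012-10-01", "2012-10-02", "2012-10-02"),
  interval = c(0, 5, 0, 5)
)

# Missing values per column, and number of rows with at least one NA.
na_per_column <- colSums(is.na(toy))
rows_with_na  <- sum(!complete.cases(toy))
```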
Clear workspace
rm(NACount)
(2) Filling in all of the missing values in the dataset
Strategy used: locate the missing entries and replace each one with the overall mean of the steps attribute (computed with missing values removed)
NAPosition <- which(is.na(activity$steps))
NewMeanVec <- rep(mean(activity$steps, na.rm = TRUE),times = length(NAPosition))
(3) Create a new dataset that is equal to the original dataset but with the missing values filled in: replace the NAs with NewMeanVec (the activity data frame is updated in place)
activity[NAPosition, "steps"] <- NewMeanVec
head(activity)
## steps date interval
## 1 37.3826 2012-10-01 0
## 2 37.3826 2012-10-01 5
## 3 37.3826 2012-10-01 10
## 4 37.3826 2012-10-01 15
## 5 37.3826 2012-10-01 20
## 6 37.3826 2012-10-01 25
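The fill step can also be done on a copy, so the original data stay available for comparison. The sketch below uses a hypothetical `toy` data frame. It also shows, as an alternative this report does not use, filling with the mean of the same interval via `ave()`.

```r
# Hypothetical toy data: two intervals observed on two days.
toy <- data.frame(steps = c(NA, 2, 4, NA), interval = c(0, 5, 0, 5))
missing <- is.na(toy$steps)

# Strategy used in the report: overall mean, written into a copy.
filled <- toy
filled$steps[missing] <- mean(toy$steps, na.rm = TRUE)

# Alternative (not the report's method): mean of the same interval.
interval_mean <- ave(toy$steps, toy$interval,
                     FUN = function(x) mean(x, na.rm = TRUE))
filled_by_interval <- toy
filled_by_interval$steps[missing] <- interval_mean[missing]
```

The per-interval variant preserves the daily activity pattern, while the overall mean flattens it for the filled entries.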
Clear workspace
rm(NAPosition, NewMeanVec)
(4) Compute the total number of steps taken each day and plot the histogram
Daily totals, from which the mean and median of the total number of steps taken each day are calculated
TotalNoofSteps <- aggregate(activity$steps, by=list(activity$date), FUN=sum)
Rename the attributes
names(TotalNoofSteps) <- c("date","total")
head(TotalNoofSteps)
## date total
## 1 2012-10-01 10766.19
## 2 2012-10-02 126.00
## 3 2012-10-03 11352.00
## 4 2012-10-04 12116.00
## 5 2012-10-05 13294.00
## 6 2012-10-06 15420.00
Histogram
hist(TotalNoofSteps$total, main = "Histogram: Total number of steps taken per day",
xlab = "Total steps per day", col = "blue")

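Impact of filling the missing values (NA replaced by the mean):
The mean and median of the daily totals differ from those based on the Question 1 totals: a day with only missing values, such as 2012-10-01, contributed a total of 0 there and contributes 10766.19 after imputation.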
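The size of the shift can be checked by computing both summaries before and after the fill. This is a minimal sketch on a hypothetical `toy` data frame; `daily_totals` is a helper introduced here for illustration, and the numbers are not those of the real dataset.

```r
# Hypothetical toy data: three days, two intervals each; the first day is all NA.
toy <- data.frame(
  steps = c(NA, NA, 10, 20, 30, 50),
  date  = rep(c("2012-10-01", "2012-10-02", "2012-10-03"), each = 2)
)

# Illustrative helper: total steps per day, ignoring NAs.
daily_totals <- function(df) tapply(df$steps, df$date, sum, na.rm = TRUE)

before <- daily_totals(toy)

# Fill NAs with the overall mean, as in the report.
imputed <- toy
imputed$steps[is.na(imputed$steps)] <- mean(toy$steps, na.rm = TRUE)
after <- daily_totals(imputed)

comparison <- data.frame(
  stage  = c("before", "after"),
  mean   = c(mean(before), mean(after)),
  median = c(median(before), median(after))
)
```

In this toy case the all-NA day moves from a total of 0 to the average daily total, so both the mean and the median rise.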
Question 4
Are there differences in activity patterns between weekdays and weekends?
(1) Computing the weekdays from the date attribute, using weekdays()
activity$date <- as.Date(activity$date, format = "%Y-%m-%d")
activity <- data.frame(date = activity$date,
weekday = tolower(weekdays(activity$date, abbreviate = FALSE)),
steps = activity$steps, interval = activity$interval)
activity <- cbind(activity,
daytype = ifelse(activity$weekday %in% c("saturday", "sunday"),
"weekend", "weekday"))
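Note that `weekdays()` returns day names in the current locale, so the comparison with "saturday" and "sunday" only works under an English locale. A locale-independent sketch, using the numeric day of the week from `as.POSIXlt()` (0 is Sunday, 6 is Saturday), could look like this; the `dates` vector is made up for illustration.

```r
# Illustrative dates: Friday, Saturday, Sunday, Monday.
dates <- as.Date(c("2012-10-05", "2012-10-06", "2012-10-07", "2012-10-08"))

# $wday is numeric (0 = Sunday, ..., 6 = Saturday) and does not depend on locale.
wday <- as.POSIXlt(dates)$wday
daytype <- factor(ifelse(wday %in% c(0, 6), "weekend", "weekday"),
                  levels = c("weekday", "weekend"))
```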
(2) Compute the average number of steps taken in each interval, averaged across all weekday days or weekend days
AverageData <- aggregate(activity$steps,
by = list(activity$daytype, activity$interval), FUN = mean)
Rename the attributes
names(AverageData) <- c("daytype", "interval", "mean")
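The aggregation can also be written with the formula interface of `aggregate()`, which names the result columns after the variables and makes the rename step unnecessary. This is a minimal sketch on a hypothetical `toy` data frame, grouping by day type and interval only.

```r
# Hypothetical toy data: two intervals, two observations per day type.
toy <- data.frame(
  daytype  = rep(c("weekday", "weekend"), each = 4),
  interval = rep(c(0, 5), times = 4),
  steps    = c(10, 20, 30, 40, 50, 60, 70, 80)
)

# One mean per (daytype, interval) pair; columns: daytype, interval, steps.
avg <- aggregate(steps ~ daytype + interval, data = toy, FUN = mean)
weekend_5 <- avg$steps[avg$daytype == "weekend" & avg$interval == 5]
```

The formula method drops rows with missing values by default (`na.action = na.omit`), which differs from the `by =` form when the data still contain NAs.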
Panel plot containing a time series plot (xyplot() comes from the lattice package)
library(lattice)
xyplot(mean ~ interval | daytype, AverageData, type="l",
xlab="Interval", ylab="Average number of steps", layout=c(1,2))
