The activity data is loaded first. Running str(activitydat) directly after read.csv shows that the date variable is not stored as a Date: it is a factor in R versions before 4.0.0 and a character vector from 4.0.0 onward. One preprocessing step is therefore to convert the date variable to the Date data type.
activitydat <- read.csv("activity.csv")
activitydat$date <- as.Date(activitydat$date)
str(activitydat)
## 'data.frame': 17568 obs. of 3 variables:
## $ steps : int NA NA NA NA NA NA NA NA NA NA ...
## $ date : Date, format: "2012-10-01" "2012-10-01" ...
## $ interval: int 0 5 10 15 20 25 30 35 40 45 ...
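As an alternative sketch, the conversion could also happen at read time by passing colClasses to read.csv. The inline CSV text below is made-up sample data standing in for activity.csv, and `csv_text` and `demo` are hypothetical names used only for illustration.

```r
# Made-up rows that mimic the layout of activity.csv (an assumption for illustration)
csv_text <- "steps,date,interval
NA,2012-10-01,0
NA,2012-10-01,5
0,2012-10-02,0
117,2012-10-02,5"

# colClasses converts each column to the requested type while reading,
# so no separate as.Date() step is needed afterwards
demo <- read.csv(text = csv_text,
                 colClasses = c("integer", "Date", "integer"))

str(demo)
```

Declaring the column types up front also makes the script behave the same way regardless of the stringsAsFactors default of the R version in use.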
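The total number of steps for each day is calculated with the aggregate function, and a sample of the result is shown below. Because the formula interface of aggregate drops rows with missing values by default, days with no recorded steps (such as 2012-10-01) do not appear in the result. A histogram of the total number of steps taken each day is then plotted. Finally, the mean and median of the daily totals are calculated.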
steps_day <- aggregate(steps ~ date, sum, data = activitydat)
head(steps_day)
## date steps
## 1 2012-10-02 126
## 2 2012-10-03 11352
## 3 2012-10-04 12116
## 4 2012-10-05 13294
## 5 2012-10-06 15420
## 6 2012-10-07 11015
hist(steps_day$steps,
xlab = "Steps taken Each day",
ylab = "Number of Days",
main = "Histogram of Steps taken Each Day")
mean(steps_day$steps)
## [1] 10766.19
median(steps_day$steps)
## [1] 10765
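How missing values are handled during aggregation affects the mean and median, so it may help to see the two common behaviours side by side. The sketch below uses a made-up data frame `toy` (not the activity data): the formula interface of aggregate drops rows with NA before summing, while tapply with na.rm = TRUE keeps every day and reports 0 for a day that has no observations.

```r
# Made-up data: the first day has no recorded steps at all
toy <- data.frame(
  steps = c(NA, NA, 10, 20, 5, NA),
  date  = as.Date(c("2012-10-01", "2012-10-01",
                    "2012-10-02", "2012-10-02",
                    "2012-10-03", "2012-10-03"))
)

# Formula interface: rows with NA are removed first (na.action = na.omit),
# so the all-NA day disappears from the result
by_formula <- aggregate(steps ~ date, data = toy, FUN = sum)

# tapply with na.rm = TRUE: every day is kept, and the all-NA day sums to 0
by_tapply <- tapply(toy$steps, toy$date, sum, na.rm = TRUE)

mean(by_formula$steps)  # averages over 2 days
mean(by_tapply)         # averages over 3 days, one of which is 0
```

With the second approach the zero-total days pull the mean and median down, which is one reason to prefer dropping such days or imputing them.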
The daily activity pattern is studied with a time series plot of the number of steps in each 5-minute interval, averaged across all days. The interval with the highest average number of steps is also found.
pattern <- aggregate(steps ~ interval, mean, data = activitydat)
library(ggplot2)
# qplot() is deprecated as of ggplot2 3.4.0; ggplot() draws the same line plot
ggplot(pattern, aes(x = interval, y = steps)) +
  geom_line() +
  labs(x = "5 Minute Interval",
       y = "Average Number of steps",
       title = "Daily Activity Pattern")
pattern[(pattern$steps == max(pattern$steps)),]
## interval steps
## 104 835 206.1698
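The peak row can also be located with which.max, and the interval identifier can be turned into a clock time. The sketch below assumes that the identifier encodes hours and minutes (so 835 would mean 08:35), which is consistent with row 104 being the 104th five-minute slot of the day. The data frame `toy_pattern` holds made-up averages and is only a stand-in for `pattern`.

```r
# Made-up interval averages (an assumption); the real values live in `pattern`
toy_pattern <- data.frame(
  interval = c(830L, 835L, 840L),
  steps    = c(10, 30, 20)
)

# which.max returns the index of the first maximum
peak <- toy_pattern[which.max(toy_pattern$steps), ]

# Split the identifier into hours (hundreds) and minutes (remainder)
peak_time <- sprintf("%02d:%02d",
                     peak$interval %/% 100L,
                     peak$interval %% 100L)
peak_time  # "08:35"
```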
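# Count the missing values; is.na() marks each NA cell and sum() counts them
NumMissing <- sum(is.na(activitydat))
The number of missing values in the dataset is 2304. Each missing value is then replaced with the mean number of steps for its 5-minute interval.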
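To see where the missing values sit, the count can be broken down by column. This is a minimal sketch on a made-up data frame `toy`; the variable names `na_per_column`, `total_na` and `share_na` are hypothetical.

```r
# Made-up data with missing values in one column only
toy <- data.frame(
  steps    = c(NA, 3, NA, 7),
  interval = c(0, 5, 10, 15)
)

# is.na() on a data frame returns a logical matrix of the same shape
na_per_column <- colSums(is.na(toy))    # missing values per column
total_na      <- sum(is.na(toy))        # missing values in the whole data frame
share_na      <- mean(is.na(toy$steps)) # fraction of missing step counts
```

Applied to the activity data, the same calls would show whether all missing values belong to the steps column.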
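# Impute each missing step count with the mean for its 5-minute interval
imputedat <- activitydat
missing <- is.na(imputedat$steps)
# match() finds the row of `pattern` that holds each missing row's interval
ind <- match(imputedat$interval[missing], pattern$interval)
imputedat$steps[missing] <- pattern$steps[ind]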
head(imputedat)
## steps date interval
## 1 1.7169811 2012-10-01 0
## 2 0.3396226 2012-10-01 5
## 3 0.1320755 2012-10-01 10
## 4 0.1509434 2012-10-01 15
## 5 0.0754717 2012-10-01 20
## 6 2.0943396 2012-10-01 25
imp_steps_day <- aggregate(steps ~ date, sum, data = imputedat)
hist(imp_steps_day$steps,
xlab = "Steps taken Each day",
ylab = "Number of Days",
main = "Histogram of Steps taken Each Day - After Imputing")
mean(imp_steps_day$steps)
## [1] 10766.19
median(imp_steps_day$steps)
## [1] 10766.19
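The unchanged mean, and a median that now equals the mean, are what one would expect if the missing values cover whole days. The 2304 missing values are consistent with 8 full days of 288 intervals each. A fully missing day that is imputed with interval means receives a daily total equal to the overall mean daily total. The sketch below reproduces this effect on made-up numbers; `toy`, `filled`, `before` and `after` are hypothetical names.

```r
# Made-up data: 3 days with 3 intervals each, the first day entirely missing
toy <- data.frame(
  steps    = c(NA, NA, NA, 1, 2, 3, 3, 4, 5),
  date     = rep(as.Date("2012-10-01") + 0:2, each = 3),
  interval = rep(c(0, 5, 10), times = 3)
)

# Daily totals before imputation (the all-NA day is dropped)
before <- aggregate(steps ~ date, data = toy, FUN = sum)

# Mean of each interval across days, aligned with the rows of `toy`
interval_mean <- ave(toy$steps, toy$interval,
                     FUN = function(x) mean(x, na.rm = TRUE))

# Fill only the missing entries
filled <- toy
filled$steps[is.na(toy$steps)] <- interval_mean[is.na(toy$steps)]

# Daily totals after imputation (all three days are present)
after <- aggregate(steps ~ date, data = filled, FUN = sum)

mean(before$steps)   # 9
mean(after$steps)    # 9, unchanged
median(after$steps)  # 9, the imputed day's total
```

The imputed day adds one more value exactly at the mean, so the mean does not move and the median is drawn toward it.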
library(timeDate)
# Label each date as "Weekend" or "Weekday"; isWeekend() works on the whole
# date vector at once, so no row-by-row loop is needed
imputedat$day <- factor(ifelse(unname(isWeekend(imputedat$date)),
                               "Weekend", "Weekday"))
patternwk <- aggregate(steps ~ interval+day, mean, data = imputedat)
# qplot() is deprecated as of ggplot2 3.4.0; ggplot() draws the same faceted line plot
ggplot(patternwk, aes(x = interval, y = steps)) +
  geom_line() +
  facet_grid(day ~ .) +
  labs(x = "5 Minute Interval",
       y = "Average Number of steps",
       title = "Daily Activity Pattern")
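The weekday and weekend labels can also be derived without the timeDate package. The following is a minimal sketch in base R, not the method used above: it relies on the wday field of POSIXlt, which numbers days from 0 (Sunday) to 6 (Saturday) independently of the locale. The vector `dates` is made-up sample input.

```r
# Sample dates: a Friday, a Saturday, a Sunday and a Monday
dates <- as.Date(c("2012-10-05", "2012-10-06", "2012-10-07", "2012-10-08"))

# wday is 0 for Sunday and 6 for Saturday
wday <- as.POSIXlt(dates)$wday

day_type <- factor(ifelse(wday %in% c(0, 6), "Weekend", "Weekday"),
                   levels = c("Weekday", "Weekend"))
day_type
```

Unlike weekdays(), which returns day names in the current locale, the numeric wday field gives the same result on any system.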