Reproducible Research: Peer Assessment 1
Author: Alexander Barrantes Herrera
Date: March 14, 2019
Loading and preprocessing the data
1. Load the data (i.e. read.csv())
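library(ggplot2)  # provides qplot() and ggplot(), used for the plots below
# Unzip the archive only if the CSV has not been extracted yet
if (!file.exists('activity.csv')) {
  unzip('activity.zip')
}
activityData <- read.csv('activity.csv')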
What is mean total number of steps taken per day?
stepsByDay <- tapply(activityData$steps, activityData$date, sum, na.rm=TRUE)
1. Make a histogram of the total number of steps taken each day
qplot(stepsByDay, xlab='Total steps per day', ylab='Frequency using binwidth 500', binwidth=500)
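One caveat with `sum(..., na.rm=TRUE)`: a day whose readings are all missing gets a total of 0 rather than NA, which adds to the bar at zero in the histogram and pulls the mean down. A minimal sketch on made-up data (the `toy` data frame and its values are illustrative, not taken from `activity.csv`) shows the effect and how the mean and median of the daily totals could be computed:

```r
# Toy data (hypothetical values): the second day has no recorded steps at all
toy <- data.frame(
  steps = c(10, 20, NA, NA, 5, 15),
  date  = rep(c("2012-10-01", "2012-10-02", "2012-10-03"), each = 2)
)

# Same call shape as stepsByDay: the all-NA day sums to 0, not NA
toyTotals <- tapply(toy$steps, toy$date, sum, na.rm = TRUE)

toyMean   <- mean(toyTotals)    # average of the three daily totals
toyMedian <- median(toyTotals)  # middle daily total
print(toyTotals)
```

Applied to the real data, the same two calls would be `mean(stepsByDay)` and `median(stepsByDay)`.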

What is the average daily activity pattern?
averageStepsPerTimeBlock <- aggregate(x=list(meanSteps=activityData$steps), by=list(interval=activityData$interval), FUN=mean, na.rm=TRUE)
1. Make a time series plot
ggplot(data=averageStepsPerTimeBlock, aes(x=interval, y=meanSteps)) +
geom_line() +
xlab("5-minute interval") +
ylab("average number of steps taken")

2. Which 5-minute interval, on average across all the days in the dataset, contains the maximum number of steps?
mostSteps <- which.max(averageStepsPerTimeBlock$meanSteps)
timeMostSteps <- gsub("([0-9]{1,2})([0-9]{2})", "\\1:\\2", averageStepsPerTimeBlock[mostSteps,'interval'])
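The `gsub()` pattern needs at least three digits to match, so it turns 835 into "8:35" but leaves intervals below 100 (such as 5 or 55) unchanged. If those need formatting too, one possible alternative is integer arithmetic with `sprintf()`. `formatInterval` is a hypothetical helper, not part of the original analysis:

```r
# Hypothetical helper: split an interval label such as 835 into hours and minutes
formatInterval <- function(interval) {
  interval <- as.integer(interval)
  # Hours are the leading digits, minutes the last two (zero-padded)
  sprintf("%d:%02d", interval %/% 100L, interval %% 100L)
}

labels <- formatInterval(c(5, 55, 835, 2355))
print(labels)
```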
Imputing missing values
1. Calculate and report the total number of missing values in the dataset
numMissingValues <- sum(is.na(activityData$steps))
- Number of missing values: 2304
2. Devise a strategy for filling in all of the missing values in the dataset.
3. Create a new dataset that is equal to the original dataset but with the missing data filled in.
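library(Hmisc)  # impute() below is assumed to be Hmisc::impute
activityDataImputed <- activityData
# Fill every missing step count with the mean of the observed values
activityDataImputed$steps <- impute(activityData$steps, fun=mean)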
4. Make a histogram of the total number of steps taken each day
stepsByDayImputed <- tapply(activityDataImputed$steps, activityDataImputed$date, sum)
qplot(stepsByDayImputed, xlab='Total steps per day (Imputed)', ylab='Frequency using binwidth 500', binwidth=500)
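The strategy used here fills every missing value with the overall mean of the observed step counts. A base-R sketch of the same idea on made-up data (`toy` and its values are hypothetical) shows how this changes the daily totals: a day with no readings moves from 0 to the number of intervals times the mean.

```r
# Toy data (hypothetical values): the second day has no recorded steps at all
toy <- data.frame(
  steps = c(10, 20, NA, NA, 5, 15),
  date  = rep(c("2012-10-01", "2012-10-02", "2012-10-03"), each = 2)
)

# Replace each NA with the mean of the observed values
toyImputed <- toy
fillValue <- mean(toy$steps, na.rm = TRUE)
toyImputed$steps[is.na(toyImputed$steps)] <- fillValue

# Compare daily totals before and after imputation
totalsBefore <- tapply(toy$steps, toy$date, sum, na.rm = TRUE)
totalsAfter  <- tapply(toyImputed$steps, toyImputed$date, sum)
print(rbind(before = totalsBefore, after = totalsAfter))
```

Because every imputed day receives the same total, mean imputation tends to pile those days into a single histogram bin near the centre of the distribution.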

Are there differences in activity patterns between weekdays and weekends?
1. Create a new factor variable in the dataset with two levels, "weekday" and "weekend", indicating whether a given date is a weekday or weekend day.
# wday is 0 for Sunday and 6 for Saturday; factor() makes dateType a true factor
activityDataImputed$dateType <- factor(ifelse(as.POSIXlt(activityDataImputed$date)$wday %in% c(0,6), 'weekend', 'weekday'))
2. Make a panel plot containing a time series plot
averagedActivityDataImputed <- aggregate(steps ~ interval + dateType, data=activityDataImputed, mean)
ggplot(averagedActivityDataImputed, aes(interval, steps)) +
geom_line() +
facet_grid(dateType ~ .) +
xlab("5-minute interval") +
ylab("average number of steps")
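The weekday/weekend split relies on the `wday` field of `POSIXlt`, which counts days from 0 (Sunday) to 6 (Saturday). A short check on a few known dates, using a hypothetical `classifyDay` helper that is not part of the original analysis, illustrates the mapping:

```r
# Hypothetical helper: label each date as "weekday" or "weekend"
classifyDay <- function(dates) {
  wday <- as.POSIXlt(dates)$wday  # 0 = Sunday, ..., 6 = Saturday
  factor(ifelse(wday %in% c(0, 6), "weekend", "weekday"),
         levels = c("weekday", "weekend"))
}

# 2012-10-05 was a Friday, so the next two dates fall on the weekend
dayTypes <- classifyDay(c("2012-10-05", "2012-10-06", "2012-10-07", "2012-10-08"))
print(dayTypes)
```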
