Reproducible Research: Peer Assessment 1

By Jawad Rashid

Loading and preprocessing the data

1. Loading the data.

data <- read.csv("activity.csv", stringsAsFactors = FALSE)

2. Converting the date to a valid R date object.

data$date <- as.Date(data$date)

What is the mean total number of steps taken per day?

1. Histogram of the total number of steps taken each day

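# Sum the 5-minute counts within each date first; days with missing counts stay NA and are skipped by hist()
hist(tapply(data$steps, data$date, sum), xlab = "Total number of steps", main = "Total number of steps taken each day")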

2. Mean and median total number of steps taken per day

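# Note: these are the mean and median of the individual 5-minute counts;
# per-day figures require summing the steps by date first
mean(data$steps, na.rm = TRUE)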
## [1] 37.38
median(data$steps, na.rm = TRUE)
## [1] 0
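
The heading asks about steps per day, which means summing the 5-minute counts within each date before summarising. A minimal sketch of that aggregation on a made-up three-day data frame (the `toy` values and the names `stepsPerDay`, `meanPerDay` and `medianPerDay` are illustrative assumptions, not taken from activity.csv):

```r
# Hypothetical toy data: three days, three intervals each; day 2 is entirely missing
toy <- data.frame(
    steps = c(0, 10, 20, NA, NA, NA, 5, 5, 50),
    date = as.Date(rep(c("2012-10-01", "2012-10-02", "2012-10-03"), each = 3)),
    interval = rep(c(0, 5, 10), times = 3)
)

# Sum within each date; without na.rm, a day with missing counts stays NA
# instead of being reported as a misleading total of 0
stepsPerDay <- tapply(toy$steps, toy$date, sum)

# Summarise the daily totals, skipping the days that have no total
meanPerDay <- mean(stepsPerDay, na.rm = TRUE)
medianPerDay <- median(stepsPerDay, na.rm = TRUE)
```

Applied to the real data, the same `tapply` call on `data$steps` and `data$date` would give the daily totals to pass to `hist`, `mean` and `median`.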

What is the average daily activity pattern?

1. Time series plot (i.e. type = “l”) of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all days (y-axis)

averageStepsByInterval <- tapply(data$steps, data$interval, mean, na.rm = TRUE)
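# The names are the interval labels stored as strings, so convert them to integers for the x-axis
plot(as.integer(names(averageStepsByInterval)), averageStepsByInterval, type = "l", xlab = "5-minute Interval", 
    ylab = "Average number of steps taken", main = "Average number of steps per 5-minute interval")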

2. Which 5-minute interval, on average across all the days in the dataset, contains the maximum number of steps?

# Find the position of the maximum in the vector of interval averages
# (this is an index, not a step count; the maximum average itself is max(averageStepsByInterval))
maxIndex <- which.max(averageStepsByInterval)[[1]]
# The vector's names are the interval labels; convert the matching one to an integer
maxInterval <- as.integer(names(averageStepsByInterval)[maxIndex])

maxInterval
## [1] 835
maxIndex
## [1] 104
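
`which.max` returns where the maximum sits, while `max` returns the maximum itself, and the two are easy to mix up. A small sketch on a made-up named vector (`avgSteps` and the `peak*` names are hypothetical stand-ins for the interval averages above):

```r
# Hypothetical named vector standing in for the per-interval averages
avgSteps <- c("0" = 1.5, "5" = 0.2, "10" = 7.0, "15" = 3.1)

peakIndex <- which.max(avgSteps)              # position of the maximum (a named integer)
peakInterval <- as.integer(names(peakIndex))  # interval label found at that position
peakValue <- max(avgSteps)                    # the maximum average itself
```

Here the third element is the largest, so the position is 3, the interval label is 10 and the value is 7.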

Imputing missing values

1. Total number of missing values in the dataset (i.e. the total number of rows with NAs)

# Count the rows whose step count is missing
missingValues <- is.na(data$steps)
sum(missingValues)
## [1] 2304

2. The strategy for filling in all of the missing values in the dataset is to use the mean for the corresponding 5-minute interval.

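meanStepsByInterval <- tapply(data$steps, data$interval, mean, na.rm = TRUE)
# Select the rows with a missing step count (test the steps column, not the whole data frame)
naRows <- data[is.na(data$steps), ]
# Look up each row's interval mean by name; this replaces the row-by-row loop
naRows$steps <- as.numeric(meanStepsByInterval[as.character(naRows$interval)])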

3. Creating a new dataset that is equal to the original dataset but with the missing data filled in.

filledInData <- data
# Index by the steps column explicitly so only missing step counts are overwritten
filledInData[is.na(data$steps), "steps"] <- naRows$steps
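
The fill-in strategy can also be written as one self-contained, vectorised step. This is only a sketch on made-up data (the `toy` frame and the names `intervalMeans`, `filled` and `gaps` are assumptions for illustration):

```r
# Hypothetical toy data: two days, three intervals each, with two gaps
toy <- data.frame(
    steps = c(NA, 4, 6, 2, NA, 10),
    interval = rep(c(0, 5, 10), times = 2)
)

# Mean per interval over the observed values only
intervalMeans <- tapply(toy$steps, toy$interval, mean, na.rm = TRUE)

# Replace each gap with the mean of its own interval, matched by interval name
filled <- toy
gaps <- is.na(filled$steps)
filled$steps[gaps] <- as.numeric(intervalMeans[as.character(filled$interval[gaps])])
```

Interval 0 has one observed value (2) and interval 5 has one (4), so those are the values that fill the two gaps.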

4. Creating a histogram with the missing values filled in and recomputing the mean and median.

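# Sum the 5-minute counts within each date so the histogram shows daily totals
hist(tapply(filledInData$steps, filledInData$date, sum), xlab = "Total number of steps", main = "Total number of steps taken each day")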

# Mean and median of the individual 5-minute counts after filling (not of daily totals)
mean(filledInData$steps, na.rm = TRUE)
## [1] 37.38
median(filledInData$steps, na.rm = TRUE)
## [1] 0

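Impact of replacing missing values.

Each missing value was replaced by the mean of its own interval, so the per-interval averages are unchanged; the first ten values before and after filling are identical.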

averageStepsByIntervalForFilledData <- tapply(filledInData$steps, filledInData$interval, 
    mean, na.rm = TRUE)
averageStepsByInterval[1:10]
##       0       5      10      15      20      25      30      35      40 
## 1.71698 0.33962 0.13208 0.15094 0.07547 2.09434 0.52830 0.86792 0.00000 
##      45 
## 1.47170
averageStepsByIntervalForFilledData[1:10]
##       0       5      10      15      20      25      30      35      40 
## 1.71698 0.33962 0.13208 0.15094 0.07547 2.09434 0.52830 0.86792 0.00000 
##      45 
## 1.47170
par(mfrow = c(1, 2))
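# Compare daily totals (steps summed within each date) before and after filling
hist(tapply(data$steps, data$date, sum), xlab = "Total number of steps", main = "With Missing Values")
hist(tapply(filledInData$steps, filledInData$date, sum), xlab = "Total number of steps", main = "Filled in Missing Values")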
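
Why the interval averages do not move can be checked on a single made-up interval: filling gaps with the mean of the observed values leaves the mean where it was, although the sum grows. A minimal sketch (the vector `x` and the other names are hypothetical):

```r
# Hypothetical observed values for one interval across five days, with two gaps
x <- c(3, NA, 9, NA, 6)

meanBefore <- mean(x, na.rm = TRUE)           # mean of the observed values
xFilled <- ifelse(is.na(x), meanBefore, x)    # fill the gaps with that mean
meanAfter <- mean(xFilled)                    # mean after filling

sumBefore <- sum(x, na.rm = TRUE)             # total over observed values only
sumAfter <- sum(xFilled)                      # total once the gaps carry the mean
```

The mean is the same before and after, while the sum increases by the mean once per filled gap, which is why totals can shift even though averages do not.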

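The individual 5-minute counts are dominated by zeros, so the difference is easier to see after a log10 transform of the counts (adding 1 so that zero counts remain defined):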

par(mfrow = c(1, 2))
hist(log10(data$steps + 1), xlab = "log10(steps + 1) per interval", main = "Log of Missing Values")
hist(log10(filledInData$steps + 1), xlab = "log10(steps + 1) per interval", main = "Log of Filled in Values")

Are there differences in activity patterns between weekdays and weekends?

library(lattice)

# Classify each date by weekday number (0 = Sunday, 6 = Saturday) so the
# result does not depend on the locale's day names
weekend <- as.POSIXlt(filledInData$date)$wday %in% c(0, 6)
filledInData$daytype <- factor(weekend, levels = c(FALSE, TRUE), labels = c("weekday", "weekend"))

# Average the steps for each interval within each day type
groupedData <- aggregate(steps ~ interval + daytype, data = filledInData, FUN = mean)
xyplot(steps ~ interval | daytype, data = groupedData, layout = c(1, 2), xlab = "Interval", 
    ylab = "Number of Steps", type = "l")
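
Comparing `weekdays()` output against English day names only works in an English locale. One possible locale-independent alternative, sketched on a made-up week of dates (the names `dates`, `isWeekend` and `dayType` are illustrative assumptions):

```r
# Hypothetical dates covering one full week (2012-10-01 was a Monday)
dates <- as.Date("2012-10-01") + 0:6

# POSIXlt stores the weekday as a number: 0 = Sunday, ..., 6 = Saturday
isWeekend <- as.POSIXlt(dates)$wday %in% c(0, 6)

# Two-level factor with the levels in a fixed order
dayType <- factor(ifelse(isWeekend, "weekend", "weekday"),
    levels = c("weekday", "weekend"))
```

The first five dates come out as weekdays and the last two as weekend days, whatever language the session uses for day names.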
