Reproducible Research: Peer Assessment 1

Loading and preprocessing the data

data <- read.csv("C:\\Users\\lenovo\\Desktop\\activity.csv")

What is mean total number of steps taken per day?

For this part of the assignment, missing values are ignored.

1. Make a histogram of the total number of steps taken each day

steps <- tapply(data$steps, data$date, sum)
hist(steps, breaks = 10, col = "blue", main = "Histogram of Steps Taken Each Day", 
    xlab = "Steps Taken Each Day")


2. Calculate and report the mean and median total number of steps taken per day

stepsmean <- mean(steps, na.rm = T)
stepsmedian <- median(steps, na.rm = T)

The mean total number of steps taken per day is 1.0766 × 10⁴, and the median is 10765.
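
As a small illustration of why na.rm = T matters in the calculation above, the sketch below uses a made-up three-day data frame (toy and its values are hypothetical, not the activity data). A day whose intervals are all missing gets an NA total from tapply, and mean returns NA unless such days are dropped.

```r
# Toy data (hypothetical): day "b" has only missing step counts
toy <- data.frame(
    steps = c(10, 20, NA, NA, 5, 15),
    date  = c("a", "a", "b", "b", "c", "c")
)

# Daily totals: sum() returns NA for any day that contains an NA
toySteps <- tapply(toy$steps, toy$date, sum)

# Without na.rm the NA day propagates; with na.rm it is dropped
toyMeanRaw <- mean(toySteps)
toyMean    <- mean(toySteps, na.rm = TRUE)
toyMedian  <- median(toySteps, na.rm = TRUE)
```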

What is the average daily activity pattern?

1. Make a time series plot of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all days (y-axis)

stepsAve <- tapply(data$steps, data$interval, mean, na.rm = T)
stepsAve <- as.numeric(stepsAve)
interval <- as.numeric(levels(factor(data$interval)))
plot(interval, stepsAve, type = "l", col = "red", xlab = "5-minute interval", 
    ylab = "Steps Averaged", main = "Steps Averaged Across All Days", frame = F)


2. Which 5-minute interval, on average across all the days in the dataset, contains the maximum number of steps?

data1 <- data.frame(interval, stepsAve)
maxinterval <- data1[data1[, 2] == max(stepsAve), ][1]

The 5-minute interval 835 contains the maximum average number of steps.
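
The same lookup can also be written with which.max, which returns the position of the first maximum. A minimal sketch on made-up values (toyInterval and toyAve are hypothetical stand-ins for interval and stepsAve):

```r
# Hypothetical interval labels and their average step counts
toyInterval <- c(0, 5, 10, 15)
toyAve      <- c(1.5, 7.2, 3.1, 0.4)

# Position of the largest average, mapped back to its interval label
toyMax <- toyInterval[which.max(toyAve)]
```

This returns a plain number rather than a one-column data frame, which can be more convenient when the value is reported inline.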

Imputing missing values

1. Calculate and report the total number of missing values in the dataset

missingNum <- sum(is.na(data))

There are 2304 missing values in the dataset.

2. Fill in all the missing values in the dataset using the mean for the corresponding 5-minute interval

dataFilled <- data
for (i in seq_len(nrow(dataFilled))) {
    # Replace a missing step count with the mean for that row's interval
    if (is.na(dataFilled$steps[i])) 
        dataFilled$steps[i] <- data1$stepsAve[data1$interval == dataFilled$interval[i]]
}
missingnum <- sum(is.na(dataFilled))

A new dataset, dataFilled, has now been created; it is identical to the original but contains 0 missing values.
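
The row-by-row loop can also be expressed without a loop. The sketch below is one possible vectorised variant using ave, shown on a made-up data frame (toy and toyFilled are hypothetical names, not part of the analysis above).

```r
# Toy data (hypothetical): three days, two intervals, two missing values
toy <- data.frame(
    steps    = c(2, NA, 4, 6, NA, 8),
    date     = rep(c("d1", "d2", "d3"), each = 2),
    interval = rep(c(0, 5), times = 3)
)

# Mean of each interval across days, repeated on every row of that interval
intervalMean <- ave(toy$steps, toy$interval,
    FUN = function(x) mean(x, na.rm = TRUE))

# Copy the data and overwrite only the missing entries
toyFilled <- toy
isMissing <- is.na(toyFilled$steps)
toyFilled$steps[isMissing] <- intervalMean[isMissing]
```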

3. Make a histogram of the total number of steps taken each day, and calculate the mean and median total number of steps taken per day

stepsFilled <- tapply(dataFilled$steps, dataFilled$date, sum)
hist(stepsFilled, breaks = 10, col = "green", main = "Histogram of Steps Taken Each Day (Filled)", 
    xlab = "Steps Taken Each Day")


stepsFilledmean <- mean(stepsFilled, na.rm = T)
stepsFilledmedian <- median(stepsFilled, na.rm = T)

After imputing missing values, the mean total number of steps taken per day is 1.0766 × 10⁴, and the median is 1.0766 × 10⁴. The mean is unchanged from the estimate above, and the median moves only slightly, from 10765 to 1.0766 × 10⁴. Using the mean for each 5-minute interval to fill in the missing values therefore has almost no impact on these estimates of the total daily number of steps.
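
One way to see why the mean can stay the same under this kind of imputation: if every missing value belongs to a day with no recorded steps at all, each such day is filled with the full set of interval means, so its total equals the mean of the observed daily totals. Whether the activity data has this structure is an assumption here (2304 missing values would correspond to 8 days of 288 intervals, but the report does not check this). A sketch on made-up data (toy is hypothetical):

```r
# Toy data (hypothetical): day "d3" is entirely missing
toy <- data.frame(
    steps    = c(2, 6, 4, 8, NA, NA),
    date     = rep(c("d1", "d2", "d3"), each = 2),
    interval = rep(c(0, 5), times = 3)
)

# Mean daily total before imputation (the NA day is dropped)
meanBefore <- mean(tapply(toy$steps, toy$date, sum), na.rm = TRUE)

# Fill missing values with the mean of the matching interval
intervalMean <- ave(toy$steps, toy$interval,
    FUN = function(x) mean(x, na.rm = TRUE))
isMissing <- is.na(toy$steps)
toy$steps[isMissing] <- intervalMean[isMissing]

# Mean daily total after imputation
totalsAfter <- tapply(toy$steps, toy$date, sum)
meanAfter   <- mean(totalsAfter)
```

In this toy case the filled day's total is exactly the earlier mean, so adding it leaves the mean where it was.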

Are there differences in activity patterns between weekdays and weekends?

1. Create a new factor variable in the dataset with two levels, “weekday” and “weekend”, indicating whether a given date is a weekday or weekend day.

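# Label every row "weekday", then relabel Saturdays and Sundays as "weekend".
# Note: weekdays() returns locale-dependent names; English names are assumed here.
dataFilled$weekday <- "weekday"
isWeekend <- weekdays(as.Date(dataFilled$date)) %in% c("Saturday", "Sunday")
dataFilled$weekday[isWeekend] <- "weekend"
dataFilled$weekday <- factor(dataFilled$weekday)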
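
Because weekdays() returns day names in the current locale, matching against "Saturday" and "Sunday" only works in an English locale. A possible locale-independent variant, sketched on three made-up dates (toyDates and toyDay are hypothetical names), uses the numeric wday field of POSIXlt instead:

```r
# Hypothetical sample dates: a Monday, a Saturday and a Sunday
toyDates <- as.Date(c("2012-10-01", "2012-10-06", "2012-10-07"))

# wday is 0 for Sunday through 6 for Saturday, regardless of locale
wday <- as.POSIXlt(toyDates)$wday

# Two-level factor: weekend for Saturday/Sunday, weekday otherwise
toyDay <- factor(ifelse(wday %in% c(0, 6), "weekend", "weekday"),
    levels = c("weekday", "weekend"))
```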

2. Make a panel plot containing a time series plot of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all weekday days or weekend days (y-axis).

g <- split(dataFilled, dataFilled$weekday)
result <- lapply(g, function(df) tapply(df$steps, df$interval, mean))
datafinal <- data.frame(Interval = rep(interval, 2), stepsFilledAve = as.vector(unlist(result)), 
    DAY = factor(rep(c("weekday", "weekend"), each = 288)))
library(lattice)
xyplot(stepsFilledAve ~ Interval | DAY, data = datafinal, layout = c(1, 2), 
    type = "l", ylab = "Number of steps")
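
The split/lapply/tapply steps above build a long table with one row per interval and day type. As an alternative sketch, aggregate with a formula can produce a table of that shape in one call, without hard-coding the number of intervals. The example uses a made-up data frame (toy and toyAve are hypothetical names):

```r
# Toy data (hypothetical): two intervals, two observations per day type
toy <- data.frame(
    steps    = c(1, 3, 5, 7, 2, 4, 6, 8),
    interval = rep(c(0, 5), times = 4),
    weekday  = factor(rep(c("weekday", "weekend"), each = 4))
)

# Average steps for every interval / day-type combination
toyAve <- aggregate(steps ~ interval + weekday, data = toy, FUN = mean)
```

A table like toyAve could then be passed to xyplot with a formula such as steps ~ interval | weekday.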
