data <- read.csv("C:\\Users\\lenovo\\Desktop\\activity.csv")
For this part of the assignment, missing values are ignored.
1. Make a histogram of the total number of steps taken each day
steps <- tapply(data$steps, data$date, sum)
hist(steps, breaks = 10, col = "blue", main = "Histogram of Steps Taken Each Day",
     xlab = "Steps Taken Each Day")
2. Calculate and report the mean and median total number of steps taken per day
stepsmean <- mean(steps, na.rm = TRUE)
stepsmedian <- median(steps, na.rm = TRUE)
The mean total number of steps taken per day is 1.0766 × 10^4, and the median total number of steps taken per day is 10765.
1. Make a time series plot of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all days (y-axis)
stepsAve <- tapply(data$steps, data$interval, mean, na.rm = TRUE)
stepsAve <- as.numeric(stepsAve)
interval <- as.numeric(levels(factor(data$interval)))
plot(interval, stepsAve, type = "l", col = "red", xlab = "5-minute interval",
     ylab = "Steps Averaged", main = "Steps Averaged Across All Days", frame = FALSE)
2. Which 5-minute interval, on average across all the days in the dataset, contains the maximum number of steps?
data1 <- data.frame(interval, stepsAve)
maxinterval <- data1[data1[, 2] == max(stepsAve), ][1]
The 5-minute interval 835 contains the maximum average number of steps.
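The same lookup can be written more directly with `which.max()`, which returns the index of the first maximum. The following is a minimal sketch on made-up numbers; `interval_demo` and `stepsAve_demo` are hypothetical stand-ins for `interval` and `stepsAve`, not values from the dataset.

```r
# Hypothetical toy vectors standing in for `interval` and `stepsAve`
interval_demo <- c(0, 5, 10, 15)
stepsAve_demo <- c(1.5, 7.2, 3.1, 0.4)

# which.max() gives the position of the largest average;
# indexing the interval vector with it yields the interval label
maxinterval_demo <- interval_demo[which.max(stepsAve_demo)]
print(maxinterval_demo)
```

Unlike the data frame subset above, this returns a plain number rather than a one-column data frame, which may be easier to reuse in inline text.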
1. Calculate and report the total number of missing values in the dataset
missingNum <- sum(is.na(data))
There are 2304 missing values in the dataset.
2. Fill in all the missing values in the dataset, using the mean for the corresponding 5-minute interval
dataFilled <- data
for (i in 1:nrow(dataFilled)) {
    if (is.na(dataFilled[i, 1]))
        dataFilled[i, 1] <- data1[data1[, 1] == dataFilled[i, 3], ][, 2]
}
missingnum <- sum(is.na(dataFilled))
A new dataset has now been created that is equal to the original one but contains no missing values.
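The row-by-row loop above can also be expressed without an explicit loop. The sketch below is one possible vectorized variant using `ave()`, shown on a small made-up data frame; `demo`, `intervalMean`, and `demoFilled` are hypothetical names, and the toy values are not taken from the activity data.

```r
# Hypothetical toy data with the same column names as the activity data
demo <- data.frame(
  steps    = c(NA, 4, 2, 6, NA, 8),
  date     = rep(c("2012-10-01", "2012-10-02", "2012-10-03"), each = 2),
  interval = rep(c(0, 5), times = 3)
)

# ave() returns, for every row, the mean of that row's interval group
intervalMean <- ave(demo$steps, demo$interval,
                    FUN = function(x) mean(x, na.rm = TRUE))

# Replace only the missing entries with their interval mean
demoFilled <- demo
missing <- is.na(demoFilled$steps)
demoFilled$steps[missing] <- intervalMean[missing]
print(demoFilled)
```

This fills the same values as the loop would on the same input, while avoiding one data frame lookup per row.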
3. Make a histogram of the total number of steps taken each day and calculate the mean and median total number of steps taken per day
stepsFilled <- tapply(dataFilled$steps, dataFilled$date, sum)
hist(stepsFilled, breaks = 10, col = "green", main = "Histogram of Steps Taken Each Day (Filled)",
     xlab = "Steps Taken Each Day")
stepsFilledmean <- mean(stepsFilled, na.rm = TRUE)
stepsFilledmedian <- median(stepsFilled, na.rm = TRUE)
After imputing missing values, the mean total number of steps taken per day is 1.0766 × 10^4, and the median total number of steps taken per day is 1.0766 × 10^4. The mean is unchanged from the estimate above, and the median shifts only slightly (it was 10765 before imputation). Using the mean for each 5-minute interval to fill in the missing values therefore has essentially no impact on the estimates of the total daily number of steps.
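One way to see why the mean barely moves is to assume that missing values occur as whole days. The count of 2304 missing values equals 8 × 288 intervals, which is consistent with that assumption, but the text does not confirm it. Under that assumption every filled day receives a total equal to the sum of the interval means, which is the mean of the observed daily totals. The sketch below illustrates this on made-up data; `toy` and the other names are hypothetical.

```r
# Hypothetical toy data: day "d1" is entirely missing
toy <- data.frame(
  steps    = c(NA, NA, 4, 6, 2, 10),
  date     = rep(c("d1", "d2", "d3"), each = 2),
  interval = rep(c(0, 5), times = 3)
)

# Mean of daily totals, ignoring the day whose total is NA
meanBefore <- mean(tapply(toy$steps, toy$date, sum), na.rm = TRUE)

# Fill missing entries with the mean of their interval
intervalMean <- ave(toy$steps, toy$interval,
                    FUN = function(x) mean(x, na.rm = TRUE))
toyFilled <- toy
toyFilled$steps[is.na(toy$steps)] <- intervalMean[is.na(toy$steps)]

# Mean of daily totals after imputation
meanAfter <- mean(tapply(toyFilled$steps, toyFilled$date, sum))
print(c(before = meanBefore, after = meanAfter))
```

The median, by contrast, can shift, because the filled days all take the same total and that value may land in the middle of the sorted daily totals.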
1. Create a new factor variable in the dataset with two levels, “weekday” and “weekend”, indicating whether a given date is a weekday or weekend day.
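dataFilled$weekday <- "weekday"
# weekdays() returns day names in the current locale; this assumes English names
isWeekend <- weekdays(as.Date(dataFilled$date)) %in% c("Sunday", "Saturday")
dataFilled$weekday[isWeekend] <- "weekend"
dataFilled$weekday <- factor(dataFilled$weekday)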
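Because `weekdays()` returns day names in the session's locale, the comparison with "Sunday" and "Saturday" only works under an English locale. A possible locale-independent variant, sketched here on a few hypothetical dates, uses the numeric day of the week stored in `POSIXlt` objects; `dates`, `wday`, and `dayType` are illustrative names.

```r
# Hypothetical dates: a Friday, Saturday, Sunday and Monday
dates <- as.Date(c("2012-10-05", "2012-10-06", "2012-10-07", "2012-10-08"))

# POSIXlt stores the day of the week as wday: 0 = Sunday, ..., 6 = Saturday
wday <- as.POSIXlt(dates)$wday

# Label days 0 and 6 as weekend, everything else as weekday
dayType <- factor(ifelse(wday %in% c(0, 6), "weekend", "weekday"),
                  levels = c("weekday", "weekend"))
print(dayType)
```

Fixing the factor levels explicitly keeps the level order stable even if one of the two categories happens to be absent.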
2. Make a panel plot containing a time series plot of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all weekday days or weekend days (y-axis).
g <- split(dataFilled, dataFilled$weekday)
result <- lapply(g, function(df) tapply(df$steps, df$interval, mean))
# split() orders the groups by factor level, so "weekday" comes before "weekend"
datafinal <- data.frame(Interval = rep(interval, 2), stepsFilledAve = as.vector(unlist(result)),
                        DAY = factor(rep(c("weekday", "weekend"), each = length(interval))))
library(lattice)
xyplot(stepsFilledAve ~ Interval | DAY, data = datafinal, layout = c(1, 2),
       type = "l", ylab = "Number of steps")