Reproducible research_Peer-graded Assignment: Course Project 1
Code for reading in the dataset and processing the data
- Read the csv data into R using read.csv.
- convert the “date” column as date class.
activity<-read.csv("activity.csv")
activity$date <- as.Date(activity$date, "%Y-%m-%d")
Histogram of the total number of steps taken each day
- Use functions the dplyr package.
- Group the data frame using “date” and calculate the sum of “steps” for each date, with NA value removed.
- Plot the total number of steps taken each day using histogram.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
total_step<-activity %>%
group_by(date) %>%
summarize(total=sum(steps, na.rm = TRUE))
hist(total_step$total, xlab="Total steps per day", main = "Total number of steps taken each day")

Time series plot of the average number of steps taken
- Group the data frame using “interval” and calculate the sum of “steps” for each interval, with NA value removed.
- Plot time series plot of the average number of steps taken.
Daily_average_step<-activity %>%
group_by(interval) %>%
summarize(average=mean(steps,na.rm=TRUE))
with(Daily_average_step, plot(interval, average, type="l", ylab="Average steps"))

The 5-minute interval that, on average, contains the maximum number of steps
Daily_average_step$interval[which(Daily_average_step$average==max(Daily_average_step$average))]
## [1] 835
Code to describe and show a strategy for imputing missing data
- Describe the total number of missing values in the data frame.
- Set the NA value to the mean for that 5-minute interval using a “for” loop.
# The total number of missing values in the dataset
sum(is.na(activity$steps))
## [1] 2304
# Set the NA value to the mean for that 5-minute interval.
activity_2<-activity
for (i in 1:nrow(activity_2)){
if(is.na(activity_2$steps[i])){
activity_2$steps[i] <- Daily_average_step$average[which(Daily_average_step$interval==activity_2$interval[i])]
}
}
Histogram of the total number of steps taken each day after missing values are imputed
- Group the data frame using “date” and calculate the sum of “steps” for each date, using the dataset after imputing the NA values.
- Plot the total number of steps taken each day using histogram.
total_step_NARemoved <- activity_2 %>%
group_by(date) %>%
summarize(total=sum(steps))
hist(total_step_NARemoved$total, xlab="Total steps per day", main = "Total number of steps taken each day (NA imputated)")

Panel plot comparing the average number of steps taken per 5-minute interval across weekdays and weekends
- Using a “for” loop to create a new factor variable called “Week” in the dataset with two levels (“weekday” and “weekend”) to indicate whether a given date is a weekday or weekend day.
- Using ggplot2 package to plot the average number of steps taken per 5-minute interval across weekdays and weekends.
#Create a new factor variable in the dataset with two levels – “weekday” and “weekend” indicating whether a given date is a weekday or weekend day.
for (i in 1:nrow(activity_2)){
if(weekdays(activity_2$date[i]) %in% c("Saturday", "Sunday")) {
activity_2$Week[i] <- "weekend"
} else{
activity_2$Week[i] <- "weekday"
}
}
activity_2$Week <- factor(activity_2$Week)
#Panel plot comparing the average number of steps taken per 5-minute interval across weekdays and weekends
library(ggplot2)
step_week<-activity_2 %>%
group_by(interval,Week) %>%
summarize(average=mean(steps))
## `summarise()` has grouped output by 'interval'. You can override using the
## `.groups` argument.
g <- ggplot(step_week, aes(interval, average))
g+geom_line()+facet_grid(Week~.)+labs(y="Average steps", x="Time interval")
