Loading and preprocessing the data
1.Load the data
library(plyr)     # provides ddply() and count()
library(ggplot2)  # provides ggplot()
activity <- read.csv("C:/Users/kwak/Desktop/activity.csv")
head(activity)
  steps       date interval
1    NA 2012-10-01        0
2    NA 2012-10-01        5
3    NA 2012-10-01       10
4    NA 2012-10-01       15
5    NA 2012-10-01       20
6    NA 2012-10-01       25
What is the average daily activity pattern?
1.Make a time series plot (i.e. type = “l”) of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all days (y-axis)
activity_avg <- ddply(activity, .(interval), summarize, avg_steps = mean(steps, na.rm = TRUE))
ggplot(activity_avg, aes(x = interval, y = avg_steps)) + geom_line() + xlab("interval") + ylab("Average Steps")

2.Which 5-minute interval, on average across all the days in the dataset, contains the maximum number of steps?
activity_avg[which.max(activity_avg$avg_steps),]
    interval avg_steps
104      835  206.1698
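The maximum sits in row 104 of the interval averages. Counting 5-minute steps from 0, row 104 starts 515 minutes after midnight, i.e. at 8:35, which suggests that the interval labels encode the clock time as HHMM rather than counting minutes. Under that assumption, a label can be turned into a readable time with a small helper. The function name interval_to_clock is made up for this sketch and is not part of the original analysis.

```r
# Hypothetical helper: format an HHMM-style interval label such as 835
# as a clock time string such as "08:35".
interval_to_clock <- function(interval) {
  # Left-pad with zeros to four digits, e.g. 5 -> "0005", 835 -> "0835"
  padded <- sprintf("%04d", as.integer(interval))
  # First two digits are the hour, last two are the minute
  paste0(substr(padded, 1, 2), ":", substr(padded, 3, 4))
}

clock_times <- interval_to_clock(c(0, 835, 2355))
print(clock_times)
```

Applied to the result above, interval_to_clock(835) gives "08:35", so the peak in average activity falls in the morning.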
Imputing missing values
1.Calculate and report the total number of missing values in the dataset (i.e. the total number of rows with NAs)
count(is.na(activity$steps))
      x  freq
1 FALSE 15264
2  TRUE  2304
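The same count can also be obtained in base R without plyr. The sketch below uses a small made-up data frame (toy_na) rather than the real file. On the real data, sum(is.na(activity$steps)) would return the 2304 shown in the TRUE row above.

```r
# Toy data, made up for illustration only
toy_na <- data.frame(
  steps    = c(NA, 3, NA, 7),
  interval = c(0, 5, 10, 15)
)

# Number of missing values in the steps column
n_missing_steps <- sum(is.na(toy_na$steps))

# Number of rows that have an NA in any column
n_incomplete_rows <- sum(!complete.cases(toy_na))

print(c(n_missing_steps, n_incomplete_rows))
```

The second form matches the wording of the task ("rows with NAs") more closely, since it also catches missing values in columns other than steps.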
3.Create a new dataset that is equal to the original dataset but with the missing data filled in.
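The code that builds activity_upd is not shown in this report. The non-integer total of 10766.19 for 2012-10-01, a day whose first rows are all NA in the original data, suggests that missing values were filled with averages, possibly the mean of each 5-minute interval. The following is only a sketch of that strategy under this assumption, not the author's actual code. The helper name impute_interval_mean and the toy data are made up for illustration.

```r
# Hypothetical helper: replace each missing step count with the mean of the
# non-missing step counts recorded for the same 5-minute interval.
impute_interval_mean <- function(df) {
  # Per-row mean of steps within the row's interval, ignoring NAs
  interval_mean <- ave(df$steps, df$interval,
                       FUN = function(x) mean(x, na.rm = TRUE))
  missing <- is.na(df$steps)
  df$steps[missing] <- interval_mean[missing]
  df
}

# Toy data, made up for illustration: day 1 is entirely missing
toy <- data.frame(
  steps    = c(NA, NA, 10, 20),
  date     = c("2012-10-01", "2012-10-01", "2012-10-02", "2012-10-02"),
  interval = c(0, 5, 0, 5)
)
toy_filled <- impute_interval_mean(toy)
print(toy_filled)

# On the real data this would read:
# activity_upd <- impute_interval_mean(activity)
```

Filling with interval means keeps the average daily profile unchanged, at the cost of giving every fully missing day the same total.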
activity_upd_sum <- ddply(activity_upd, .(date), summarize, total_steps = sum(steps, na.rm = TRUE))
head(activity_upd_sum)
        date total_steps
1 2012-10-01    10766.19
2 2012-10-02      126.00
3 2012-10-03    11352.00
4 2012-10-04    12116.00
5 2012-10-05    13294.00
6 2012-10-06    15420.00
Are there differences in activity patterns between weekdays and weekends?
1.Create a new factor variable in the dataset with two levels – “weekday” and “weekend” indicating whether a given date is a weekday or weekend day.
activity_week<-activity_upd
activity_week$date<-as.Date(activity_week$date)
activity_week$week<-weekdays(activity_week$date)
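# as.POSIXlt()$wday is 0 for Sunday and 6 for Saturday, so this test does not depend on locale-specific day names such as "토요일" (Saturday) or "일요일" (Sunday)
activity_week$weekend <- factor(ifelse(as.POSIXlt(activity_week$date)$wday %in% c(0, 6), "weekend", "weekday"))
# total_steps holds the mean steps per interval, not a total
activity_week_avg <- ddply(activity_week, .(interval, weekend), summarize, total_steps = mean(steps, na.rm = TRUE))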
head(activity_week_avg)
  interval weekend total_steps
1        0 weekday  2.25115304
2        0 weekend  0.21462264
3        5 weekday  0.44528302
4        5 weekend  0.04245283
5       10 weekday  0.17316562
6       10 weekend  0.01650943
2.Make a panel plot containing a time series plot (i.e. type = “l”) of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all weekday days or weekend days (y-axis). The plot should look something like the following, which was created using simulated data:
ggplot(activity_week_avg, aes(color = weekend, x = interval, y = total_steps)) + geom_line() + xlab("interval") + ylab("Average Steps Taken") + facet_grid(. ~ weekend)
