Global Options

Global options are set to show all R code.

library(knitr)
opts_chunk$set(echo=TRUE)

Loading and preprocessing the data

  1. The data are loaded with the read.csv() function and assigned to the activity_data variable.
setwd("/Dropbox/Course Materials/Reproducible Research/Assignments/RepData_PeerAssessment1")
activity_data <- read.csv("./data/activity.csv", colClasses=c("numeric", "Date", "numeric"))
  1. steps and interval are converted to the R numeric type and date is converted to the R Date type. Summary of the data:
summary(activity_data)
##      steps            date               interval   
##  Min.   :  0.0   Min.   :2012-10-01   Min.   :   0  
##  1st Qu.:  0.0   1st Qu.:2012-10-16   1st Qu.: 589  
##  Median :  0.0   Median :2012-10-31   Median :1178  
##  Mean   : 37.4   Mean   :2012-10-31   Mean   :1178  
##  3rd Qu.: 12.0   3rd Qu.:2012-11-15   3rd Qu.:1766  
##  Max.   :806.0   Max.   :2012-11-30   Max.   :2355  
##  NA's   :2304
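
As a minimal sketch of what colClasses does, the snippet below parses a few made-up rows from an inline string instead of the real file. The sample values and the names csv_text and sample_data are illustrative assumptions, not part of the original data.

```r
# Hypothetical three-row sample in the same column layout as activity.csv
csv_text <- "steps,date,interval
NA,2012-10-01,0
5,2012-10-02,5
12,2012-10-02,10"

# colClasses fixes the column types up front instead of letting read.csv guess
sample_data <- read.csv(text = csv_text,
                        colClasses = c("numeric", "Date", "numeric"))

str(sample_data)              # steps: num, date: Date, interval: num
colSums(is.na(sample_data))   # number of missing values per column
```

Checking colSums(is.na(...)) right after loading is a quick way to see which columns carry the missing values reported by summary().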

What is mean total number of steps taken per day?

  1. Histogram of the total number of steps taken each day:
total_steps <- aggregate(steps ~ date, activity_data, sum)
hist(total_steps$steps, main="Total number of steps taken per day", xlab = "Total steps", ylab = "Count", col = "Red", breaks=8)

plot of chunk histogram
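
One detail worth keeping in mind: the formula interface of aggregate() drops rows containing NA by default (na.action = na.omit), so days whose steps are all missing are left out of total_steps rather than counted as zero. A small sketch with made-up values (the data frame toy is hypothetical) shows the difference:

```r
toy <- data.frame(
  steps = c(10, 30, NA, NA, 20, 50),
  date  = as.Date(c("2012-10-01", "2012-10-01",
                    "2012-10-02", "2012-10-02",
                    "2012-10-03", "2012-10-03"))
)

# Formula interface: rows with NA are removed first, so 2012-10-02 disappears
by_formula <- aggregate(steps ~ date, toy, sum)

# tapply with na.rm = TRUE keeps the all-NA day and reports its total as 0
by_tapply <- tapply(toy$steps, toy$date, sum, na.rm = TRUE)

nrow(by_formula)    # 2 days
length(by_tapply)   # 3 days
```

Dropping the all-NA days is usually the better choice for a histogram of daily totals, since a total of 0 would wrongly suggest a day with no activity.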

  1. Mean and median total number of steps taken per day:
mean_steps <- mean(total_steps$steps)
mean_steps
## [1] 10766
median_steps <- median(total_steps$steps)
median_steps
## [1] 10765
  1. The mean total number of steps taken per day is 1.0766 × 10^4 (10766) steps.
  2. The median total number of steps taken per day is 1.0765 × 10^4 (10765) steps.

What is the average daily activity pattern?

  1. Time series plot (type = “l”) of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all days (y-axis):
avg_dailysteps <- aggregate(steps ~ interval, activity_data, mean)
plot(avg_dailysteps, type="l", xlab="Time Intervals (5-minute)", ylab="Average number of steps taken (all Days)", main = "Average steps taken across all days", col="red")

plot of chunk timeseriesplot

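  1. The 5-minute interval that, on average across all the days in the dataset, contains the maximum number of steps:
max_interval <- avg_dailysteps$interval[which.max(avg_dailysteps$steps)]
max_interval
## [1] 835

The maximum occurs in interval 835. Since interval identifiers run up to 2355, they appear to encode the time of day as HHMM, so this is the interval starting at 08:35 rather than the 835th minute.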
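
Assuming the interval identifiers encode the time of day as HHMM (which is consistent with the maximum of 2355 in the summary above), a small helper can turn an identifier into a clock time. The function name interval_to_time is a hypothetical one introduced for this sketch.

```r
interval_to_time <- function(interval) {
  # Left-pad to four digits, e.g. 835 becomes "0835"
  padded <- sprintf("%04d", as.integer(interval))
  # Split into hours and minutes
  paste0(substr(padded, 1, 2), ":", substr(padded, 3, 4))
}

interval_to_time(c(0, 835, 2355))
## [1] "00:00" "08:35" "23:55"
```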

Imputing missing values

  1. Total number of missing values in the dataset (the total number of rows with NAs):
total_NA <- sum(!complete.cases(activity_data))
total_NA
## [1] 2304

There are 2304 rows with missing values, all of them in the steps column.

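  1. Missing values in the dataset are filled with the mean for the corresponding 5-minute interval, averaged across all days.
mod_activity_data <- activity_data
# Look up each missing row's interval mean explicitly instead of relying on vector recycling
na_rows <- is.na(mod_activity_data$steps)
mod_activity_data$steps[na_rows] <- avg_dailysteps$steps[match(mod_activity_data$interval[na_rows], avg_dailysteps$interval)]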
  1. The new dataset mod_activity_data is identical to the original dataset activity_data except that the missing steps values are replaced by the corresponding interval means. Summaries of both datasets:
summary(activity_data)
##      steps            date               interval   
##  Min.   :  0.0   Min.   :2012-10-01   Min.   :   0  
##  1st Qu.:  0.0   1st Qu.:2012-10-16   1st Qu.: 589  
##  Median :  0.0   Median :2012-10-31   Median :1178  
##  Mean   : 37.4   Mean   :2012-10-31   Mean   :1178  
##  3rd Qu.: 12.0   3rd Qu.:2012-11-15   3rd Qu.:1766  
##  Max.   :806.0   Max.   :2012-11-30   Max.   :2355  
##  NA's   :2304
summary(mod_activity_data)
##      steps            date               interval   
##  Min.   :  0.0   Min.   :2012-10-01   Min.   :   0  
##  1st Qu.:  0.0   1st Qu.:2012-10-16   1st Qu.: 589  
##  Median :  0.0   Median :2012-10-31   Median :1178  
##  Mean   : 37.4   Mean   :2012-10-31   Mean   :1178  
##  3rd Qu.: 27.0   3rd Qu.:2012-11-15   3rd Qu.:1766  
##  Max.   :806.0   Max.   :2012-11-30   Max.   :2355
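
The imputation step can be sketched on a small made-up data frame. This version looks up the interval mean for each missing row with match(), so it does not depend on the missing rows appearing in any particular order. The names toy, interval_means and filled are hypothetical.

```r
toy <- data.frame(
  steps    = c(10, 30, NA, 50, 20, NA),
  interval = c(0, 5, 0, 5, 0, 5)
)

# Mean steps per interval; rows with NA are dropped by the formula interface
interval_means <- aggregate(steps ~ interval, toy, mean)

filled  <- toy
na_rows <- is.na(filled$steps)

# For every missing row, find the row of interval_means with the same interval
filled$steps[na_rows] <- interval_means$steps[
  match(filled$interval[na_rows], interval_means$interval)
]

filled$steps
## [1] 10 30 15 50 20 40
```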
  1. Histogram of the total number of steps taken each day:
mod_total_steps <- aggregate(steps ~ date, mod_activity_data, sum)
hist(mod_total_steps$steps, main="Total number of steps taken per day", xlab = "Total steps", ylab = "Count", col = "Red", breaks=8)

plot of chunk modhistogram

  1. Mean and median total number of steps taken per day:

mod_mean_steps <- mean(mod_total_steps$steps)
mod_mean_steps
## [1] 10766
mod_median_steps <- median(mod_total_steps$steps)
mod_median_steps
## [1] 10766
  1. The mean total number of steps taken per day is 1.0766 × 10^4 (10766) steps.
  2. The median total number of steps taken per day is 1.0766 × 10^4 (10766) steps.
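
That the mean does not change is what one would expect if the missing values occur only as whole missing days. This is an assumption here, although 2304 is exactly 8 × 288 five-minute intervals. Each filled-in day then totals the sum of the interval means, which equals the mean of the observed daily totals, and the median is pulled toward that value. A sketch on made-up data, with hypothetical names throughout:

```r
toy <- data.frame(
  steps    = c(10, 30, 20, 50, NA, NA),
  date     = as.Date(rep(c("2012-10-01", "2012-10-02", "2012-10-03"), each = 2)),
  interval = rep(c(0, 5), times = 3)
)

observed_totals <- aggregate(steps ~ date, toy, sum)       # 40 and 70
interval_means  <- aggregate(steps ~ interval, toy, mean)  # 15 and 40

# Fill the fully missing day with the interval means
filled  <- toy
na_rows <- is.na(filled$steps)
filled$steps[na_rows] <- interval_means$steps[
  match(filled$interval[na_rows], interval_means$interval)
]
filled_totals <- aggregate(steps ~ date, filled, sum)      # 40, 70 and 55

mean(observed_totals$steps)   # 55
mean(filled_totals$steps)     # 55, unchanged by the imputation
```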

Are there differences in activity patterns between weekdays and weekends?

  1. A new factor variable is created with two levels, “weekday” and “weekend”, indicating whether a given date is a weekday or a weekend day.
week_activity_data <- activity_data
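# format(..., "%A") returns day names in the session locale; the names below are Turkish
# (Monday to Friday for weekday, Saturday and Sunday for weekend)
week_activity_data$weekdays <- factor(format(week_activity_data$date, "%A"))
levels(week_activity_data$weekdays) <- list(weekday=c("Pazartesi", "Salı", "Çarşamba", "Perşembe", "Cuma"), weekend = c("Cumartesi", "Pazar"))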
  1. Time series plot (type = “l”) of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all weekday days or weekend days (y-axis):
avg_weeksteps <- aggregate(steps ~ interval + weekdays, week_activity_data, "mean")

library(lattice)
xyplot(steps ~ interval | weekdays, avg_weeksteps, layout=c(1, 2), type="l", xlab="Interval (5-minute)", ylab="Average number of steps across all days")

plot of chunk weekplot
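
Day names returned by format(date, "%A") depend on the session locale, so a classification based on them only works where those names match. A locale-independent sketch uses the numeric day of the week from as.POSIXlt() instead. The helper name day_type is hypothetical.

```r
day_type <- function(dates) {
  # wday is 0 for Sunday through 6 for Saturday, regardless of locale
  wday <- as.POSIXlt(dates)$wday
  factor(ifelse(wday %in% c(0, 6), "weekend", "weekday"),
         levels = c("weekday", "weekend"))
}

# Monday, Friday, Saturday, Sunday
dates <- as.Date(c("2012-10-01", "2012-10-05", "2012-10-06", "2012-10-07"))
day_type(dates)
## [1] weekday weekday weekend weekend
## Levels: weekday weekend
```

The resulting factor could be assigned to week_activity_data$weekdays in place of the name-based levels, leaving the aggregate() and xyplot() calls unchanged.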