Introduction

Devices that measure our physical activity have become much more common in recent years. This document uses the R language to explore a set of data collected by one such activity-monitoring device.

The URL to the data

The variables included in this dataset are:

* steps: Number of steps taken in a 5-minute interval (missing values are coded as NA)
* date: The date on which the measurement was taken, in YYYY-MM-DD format
* interval: Identifier for the 5-minute interval in which the measurement was taken

The dataset is stored in a comma-separated-value (CSV) file and there are a total of 17,568 observations in this dataset.
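As a quick sanity check on that row count, here is a small arithmetic sketch. It assumes every day is recorded in full, which the text above does not state; the variable names are illustrative only.

```r
# Assumption: each day contributes one row per 5-minute interval
intervals_per_day <- 24 * 60 / 5          # 288 intervals in a day
n_obs <- 17568                            # row count quoted above
n_days <- n_obs / intervals_per_day       # 61 full days under this assumption
n_days
```

Under that assumption the 17,568 observations correspond to 61 complete days.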

Reading data file

I’ve used dplyr and lattice to group, summarise and plot the data.

invisible(library(dplyr))
## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(lattice)
Sys.setlocale("LC_TIME", "C")
## [1] "C"
data <- read.csv("activity.csv", sep = ",", 
                  col.names = c("steps", "date", "interval"),
                  colClasses = c("integer", "Date", "integer"))

Filter out all rows with NA values in the steps variable and store the result in noNAS.

noNAS <- filter(data, !is.na(steps))

1.1. Make a histogram of the total number of steps taken each day.

  noNAS %>%
  group_by(date) %>%
  summarise(totalSteps = sum(steps)) %>% 
  with(histogram(totalSteps , breaks = 14, layout = c(1, 1), 
                 xlab ="Total Steps per day", ylab = "Percent of total"))

1.2. Calculate and report the mean and median total number of steps taken per day.

  noNAS %>%
  group_by(date) %>%
  summarise(totalSteps = sum(steps))%>% 
  ungroup() %>%
  summarise(meanSteps = mean(totalSteps),
            medianSteps = median(totalSteps))
## Source: local data frame [1 x 2]
## 
##   meanSteps medianSteps
## 1  10766.19       10765

2.1. Time series plot (type = "l") of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all days (y-axis).

  noNAS %>%
  group_by(interval) %>%
  summarise(meanSteps = mean(steps)) %>% 
  with(xyplot(meanSteps ~ interval, type = "l",
              main = "Average Number of Steps Taken per 5-minute Interval", 
              xlab = "5-minute interval", 
              ylab = "Average Steps"))

2.2. Which 5-minute interval, on average across all the days in the dataset, contains the maximum number of steps?

  noNAS %>%
  group_by(interval) %>%
  mutate(meanSteps = mean(steps)) %>%
  ungroup() %>%
  top_n(meanSteps, n = 1) 
## Source: local data frame [53 x 4]
## 
##    steps       date interval meanSteps
## 1      0 2012-10-02      835  206.1698
## 2     19 2012-10-03      835  206.1698
## 3    423 2012-10-04      835  206.1698
## 4    470 2012-10-05      835  206.1698
## 5    225 2012-10-06      835  206.1698
## 6      0 2012-10-07      835  206.1698
## 7    635 2012-10-09      835  206.1698
## 8      0 2012-10-10      835  206.1698
## 9    747 2012-10-11      835  206.1698
## 10   742 2012-10-12      835  206.1698
## ..   ...        ...      ...       ...
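Because mutate() keeps every row, the result above repeats interval 835 once for each of the 53 days. A minimal sketch of a one-row alternative follows: summarise first, then keep the interval with the largest mean. The data frame toy and its values are made up for illustration and are not taken from activity.csv.

```r
library(dplyr)

# Hypothetical toy data: 2 days x 3 intervals (values are made up)
toy <- data.frame(
  date     = rep(as.Date(c("2012-10-02", "2012-10-03")), each = 3),
  interval = rep(c(830L, 835L, 840L), times = 2),
  steps    = c(10L, 400L, 30L, 20L, 500L, 10L)
)

# Collapse to one row per interval, then keep the row with the highest mean
busiest <- toy %>%
  group_by(interval) %>%
  summarise(meanSteps = mean(steps)) %>%
  filter(meanSteps == max(meanSteps))

busiest
```

Applied to noNAS instead of toy, the same pipeline should return a single row for the busiest interval rather than one row per day.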

3.1. Count the missing values in each variable. Only the steps variable has any (2304 rows).

  nrow(filter(data, is.na(steps)))
## [1] 2304
  nrow(filter(data, is.na(date)))
## [1] 0
  nrow(filter(data, is.na(interval)))
## [1] 0

3.2. Create a data frame df_media_interval with the mean number of steps for each interval.

  noNAS %>%
  group_by(interval) %>%
  summarise(meanSteps = mean(steps)) -> df_media_interval

3.3. Replace each missing value with the mean number of steps for its interval.

  # Replace each NA in steps with the mean for that row's 5-minute interval
  for (i in seq_len(nrow(data))) {
    if (is.na(data$steps[i])) {
      interval <- data$interval[i]
      data$steps[i] <- df_media_interval$meanSteps[df_media_interval$interval == interval]
    }
  }
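The loop above can also be written without explicit iteration. The following is a sketch only, run on a small made-up data frame (toy is hypothetical): it groups by interval and fills each NA with the mean of that group.

```r
library(dplyr)

# Hypothetical toy data: 2 days x 2 intervals, with one NA per interval
toy <- data.frame(
  date     = rep(as.Date(c("2012-10-01", "2012-10-02")), each = 2),
  interval = rep(c(0L, 5L), times = 2),
  steps    = c(NA, 4, 2, NA)
)

# Within each interval, replace NA with the mean of the observed values
filled <- toy %>%
  group_by(interval) %>%
  mutate(steps = ifelse(is.na(steps), mean(steps, na.rm = TRUE), steps)) %>%
  ungroup()

filled
```

This variant does not need a separate lookup table of interval means, since the group mean is computed inside mutate().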

3.4. Check that no NA values remain.

nrow(filter(data, is.na(steps)))
## [1] 0

3.5. Histogram of the total number of steps taken each day

  data %>%
  group_by(date) %>%
  summarise(totalSteps = sum(steps)) %>% 
  with(histogram(totalSteps , breaks = 14, layout = c(1, 1), 
                 xlab ="Total Steps per day", ylab = "Percent of total"))

3.6. Calculate and report the mean and median total number of steps taken per day

  data %>%
  group_by(date) %>%
  summarise(totalSteps = sum(steps))%>% 
  ungroup() %>%
  summarise(meanSteps = mean(totalSteps),
            medianSteps = median(totalSteps))
## Source: local data frame [1 x 2]
## 
##   meanSteps medianSteps
## 1  10766.19    10766.19

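Imputing the missing values leaves the mean unchanged at 10766.19 and shifts the median only slightly, from 10765 to 10766.19, so the conclusions are essentially the same for both datasets.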
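One possible reason the mean stays at 10766.19 after imputation is offered here as an assumption, not something the output above confirms. If the NAs cover whole days (2304 missing rows = 8 × 288 intervals is consistent with that), every imputed day receives the same total: the sum of the interval means, which equals the mean of the observed daily totals. The sketch below illustrates this on made-up data; toy and daily_totals are hypothetical names.

```r
library(dplyr)

# Hypothetical toy data: 3 days x 4 intervals, with the second day entirely missing
toy <- data.frame(
  date     = rep(as.Date("2012-10-01") + 0:2, each = 4),
  interval = rep(c(0L, 5L, 10L, 15L), times = 3),
  steps    = c(0, 10, 20, 30,  NA, NA, NA, NA,  4, 6, 0, 70)
)

# Helper: total steps per day
daily_totals <- function(df) {
  df %>% group_by(date) %>% summarise(totalSteps = sum(steps))
}

# Daily totals using only the observed rows
before <- daily_totals(filter(toy, !is.na(steps)))

# Daily totals after filling each NA with its interval mean
after <- toy %>%
  group_by(interval) %>%
  mutate(steps = ifelse(is.na(steps), mean(steps, na.rm = TRUE), steps)) %>%
  ungroup() %>%
  daily_totals()

c(mean_before  = mean(before$totalSteps),
  mean_after   = mean(after$totalSteps),
  median_after = median(after$totalSteps))
```

In this toy case the imputed day's total equals the mean of the observed daily totals, so the mean does not move and the median lands on that same value.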

4.1. Create a new factor variable in the dataset with two levels, “weekday” and “weekend”, indicating whether a given date is a weekday or a weekend day.

  data %>%
  mutate(day_type = as.factor(ifelse(weekdays(as.Date(date)) %in% c("Saturday", "Sunday"), "weekend", "weekday"))) -> data
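The weekdays() comparison relies on English day names, which is presumably why LC_TIME was set to "C" when the data was read. A hedged alternative that does not depend on the locale uses the numeric weekday from as.POSIXlt(), where wday is 0 for Sunday and 6 for Saturday. The helper name day_type_of is hypothetical.

```r
# Classify dates as "weekday" or "weekend" without relying on day names
day_type_of <- function(dates) {
  wday <- as.POSIXlt(dates)$wday   # 0 = Sunday, 1 = Monday, ..., 6 = Saturday
  factor(ifelse(wday %in% c(0, 6), "weekend", "weekday"),
         levels = c("weekday", "weekend"))
}

# Friday, Saturday, Sunday, Monday
example_dates <- as.Date(c("2012-10-05", "2012-10-06", "2012-10-07", "2012-10-08"))
day_type_of(example_dates)
```

It could be used inside the pipeline as mutate(day_type = day_type_of(date)).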

4.2. Time series plot of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all weekday days or weekend days (y-axis).

  data %>%
  group_by(interval, day_type) %>%
  summarise(averaged_steps = mean(steps)) %>%
  xyplot(averaged_steps ~ interval | day_type, data = ., type = "l", layout = c(1, 2))