1. Loading the data

data <- read.csv("activity.csv", colClasses = c("integer", "Date", "factor"))
sinNA <- na.omit(data)

2. What is mean total number of steps taken per day?

For this part of the assignment, you can ignore the missing values in the dataset.

Make a histogram of the total number of steps taken each day

Calculate and report the mean and median total number of steps taken per day

attach(sinNA)
totalSteps <- aggregate(steps, list(Date = date), FUN = "sum")
mean(totalSteps$x)

## [1] 10766.19

median(totalSteps$x)

## [1] 10765

hist(totalSteps$x, col = "lightblue", xlab = "Number of Steps Taken Each Day",
     ylab = "Frequency", main = "Histogram of Total Number of Steps Taken Each Day")
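As a minimal sketch of the same daily aggregation that does not rely on `attach()`, the formula interface of `aggregate()` names the data frame explicitly. The data frame `toy` and the name `toyTotals` are made up for illustration and are not part of the assignment data.

```r
# Hypothetical toy data: two days with three intervals each (not the real dataset)
toy <- data.frame(
    steps    = c(10, 20, 30, 5, 15, 25),
    date     = as.Date(rep(c("2012-10-02", "2012-10-03"), each = 3)),
    interval = rep(c(0, 5, 10), times = 2)
)

# Sum the steps within each date; the result column keeps the name "steps"
toyTotals <- aggregate(steps ~ date, data = toy, FUN = sum)

mean(toyTotals$steps)    # mean of the daily totals
median(toyTotals$steps)  # median of the daily totals
```

With the formula interface the summed column is called `steps` rather than `x`, which can make later references easier to read.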

3. What is the average daily activity pattern?

Make a time series plot (i.e. type = "l") of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all days (y-axis).

Which 5-minute interval, on average across all the days in the dataset, contains the maximum number of steps?

library(ggplot2)
timeseries <- aggregate(sinNA$steps, list(interval = as.numeric(as.character(sinNA$interval))), FUN = "mean")
ggplot(timeseries, aes(interval, y = x)) + geom_line() +
    labs(title = "Time Series Plot of the 5-minute Interval", x = "5-minute intervals", y = "Average Number of Steps Taken")

maximo <- which.max(timeseries$x)
maximo

## [1] 104

timeseries[maximo, ]

##     interval        x
## 104      835 206.1698
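The lookup of the busiest interval can be sketched on a small example. The data frame `toy` and the name `toyPattern` below are hypothetical and only illustrate the technique.

```r
# Hypothetical toy data: three days with three intervals each (not the real dataset)
toy <- data.frame(
    steps    = c(0, 40, 10, 2, 60, 20, 4, 50, 30),
    interval = rep(c(0, 5, 10), times = 3)
)

# Average the steps of each interval across all days
toyPattern <- aggregate(steps ~ interval, data = toy, FUN = mean)

# which.max() gives the row position of the largest average;
# indexing with it returns both the interval identifier and its mean
toyPattern[which.max(toyPattern$steps), ]
```

Note that `which.max()` returns a row position (104 in the output above), so the interval identifier itself (835) has to be read from the `interval` column of that row.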

4. Imputing missing values

Devise a strategy for filling in all of the missing values in the dataset. The strategy does not need to be sophisticated. For example, you could use the mean/median for that day, or the mean for that 5-minute interval, etc.

My strategy is to fill each NA value in the steps column with the mean for that 5-minute interval. I then create a new dataset that is equal to the original dataset but with the missing data filled in.

newdata <- data
for (i in seq_len(nrow(newdata))) {
    if (is.na(newdata$steps[i])) {
        newdata$steps[i] <- timeseries[which(newdata$interval[i] == timeseries$interval), ]$x
    }
}

head(newdata)

##       steps       date interval
## 1 1.7169811 2012-10-01        0
## 2 0.3396226 2012-10-01        5
## 3 0.1320755 2012-10-01       10
## 4 0.1509434 2012-10-01       15
## 5 0.0754717 2012-10-01       20
## 6 2.0943396 2012-10-01       25

sum(is.na(newdata))

## [1] 0
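The same fill-in strategy can also be written without an explicit loop. The following is a minimal sketch on made-up data, using `match()` to look up the interval mean for every missing row at once; `toy`, `toyMeans`, `filled`, and `isMissing` are hypothetical names.

```r
# Hypothetical toy data: day 1 is entirely missing, days 2 and 3 are complete
toy <- data.frame(
    steps    = c(NA, NA, NA, 10, 20, 30, 20, 40, 60),
    date     = as.Date(rep(c("2012-10-01", "2012-10-02", "2012-10-03"), each = 3)),
    interval = rep(c(0, 5, 10), times = 3)
)

# Mean per interval; the formula method of aggregate() drops rows with NA by default
toyMeans <- aggregate(steps ~ interval, data = toy, FUN = mean)

# For each missing row, find the matching interval in toyMeans and copy its mean
filled <- toy
isMissing <- is.na(filled$steps)
filled$steps[isMissing] <- toyMeans$steps[match(filled$interval[isMissing], toyMeans$interval)]

sum(is.na(filled$steps))  # number of NA values left after imputation
```

On a dataset of this size the loop and the vectorized form should give the same result; the vectorized form is mainly shorter and avoids indexing row by row.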

5. Imputing missing values. Second Part.

Make a histogram of the total number of steps taken each day, and calculate and report the mean and median total number of steps taken per day. Do these values differ from the estimates from the first part of the assignment? What is the impact of imputing missing data on the estimates of the total daily number of steps?

I expect the mean not to change, because each NA value in the steps column is filled with the mean for that 5-minute interval; this can also be shown mathematically. The results confirm it: the mean is 10766.19 both before and after imputation, while the median moves from 10765 to 10766.19.

totalSteps2 <- aggregate(newdata$steps, list(Date = newdata$date), FUN = "sum")
mean(totalSteps2$x)

## [1] 10766.19

median(totalSteps2$x)

## [1] 10766.19

hist(totalSteps2$x, col = "blue", xlab = "Number of Steps Taken Each Day",
     ylab = "Frequency", main = "Histogram of Total Number of Steps Taken Each Day")
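One way to see why the mean can stay the same: if the missing values come as whole days and the remaining days are complete, every imputed day receives the same total, namely the sum of the interval means, and that sum equals the mean of the complete days' totals. Adding observations equal to the mean leaves the mean unchanged, although the median may move toward it. The sketch below checks this on made-up data; the whole-day pattern of missing values is an assumption of the sketch (it would be consistent with the median above equalling the mean), and all names are hypothetical.

```r
# Hypothetical toy data: day 1 is entirely missing, days 2 to 4 are complete
toy <- data.frame(
    steps    = c(NA, NA, NA, 10, 20, 30, 10, 20, 40, 40, 50, 50),
    date     = as.Date(rep(c("2012-10-01", "2012-10-02",
                             "2012-10-03", "2012-10-04"), each = 3)),
    interval = rep(c(0, 5, 10), times = 4)
)

# Daily totals over complete days only (the formula method drops NA rows)
before <- aggregate(steps ~ date, data = toy, FUN = sum)$steps

# Replace each NA by the mean of its interval, computed without the NA values
intervalMean <- ave(toy$steps, toy$interval,
                    FUN = function(s) mean(s, na.rm = TRUE))
filled <- toy
filled$steps[is.na(toy$steps)] <- intervalMean[is.na(toy$steps)]

# Daily totals after imputation, now including day 1
after <- aggregate(steps ~ date, data = filled, FUN = sum)$steps

c(mean(before), mean(after))      # the two means agree
c(median(before), median(after))  # the median shifts
```

If missing values were scattered within otherwise observed days, this argument would no longer apply directly and the mean could change.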

6. Are there differences in activity patterns between weekdays and weekends?

Creating a new factor variable in the dataset with two levels – “weekday” and “weekend” indicating whether a given date is a weekday or weekend day.

newdata$date <- as.Date(newdata$date, "%Y-%m-%d")
newdata$day <- weekdays(newdata$date)
# ISO weekday numbers for Saturday and Sunday; unlike day names, these do not depend on the locale
happydays <- c("6", "7")

newdata$tipodia <- as.factor(ifelse(format(newdata$date, "%u") %in% happydays, "weekend", "weekday"))
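Because `weekdays()` returns day names in the session locale (Portuguese in the output of this report), a classification keyed on ISO weekday numbers is a locale-independent option. A minimal sketch that checks it on a few dates whose weekday is known; the vector `dates` and the names `isoDay` and `dayType` are made up for illustration.

```r
# 2012-10-01 is a Monday, so 2012-10-06 and 2012-10-07 fall on a weekend
dates <- as.Date(c("2012-10-01", "2012-10-05", "2012-10-06", "2012-10-07"))

# "%u" formats the ISO weekday number as text: "1" (Monday) to "7" (Sunday)
isoDay <- format(dates, "%u")

# Saturday ("6") and Sunday ("7") are weekend days, everything else is a weekday
dayType <- factor(ifelse(isoDay %in% c("6", "7"), "weekend", "weekday"),
                  levels = c("weekday", "weekend"))

data.frame(dates, isoDay, dayType)
```

The same check run with day names would give different labels depending on the machine's language settings, which matters for a report meant to be reproducible.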

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

by_tipodia <- group_by(newdata, tipodia)
a <- summarize(by_tipodia, mean(steps))
a

## Source: local data frame [2 x 2]
## 
##   tipodia mean(steps)
##    (fctr)       (dbl)
## 1 weekday    35.61058
## 2 weekend    42.36640

by_tipodia

## Source: local data frame [17,568 x 5]
## Groups: tipodia [2]
## 
##        steps       date interval           day tipodia
##        (dbl)     (date)   (fctr)         (chr)  (fctr)
## 1  1.7169811 2012-10-01        0 segunda-feira weekday
## 2  0.3396226 2012-10-01        5 segunda-feira weekday
## 3  0.1320755 2012-10-01       10 segunda-feira weekday
## 4  0.1509434 2012-10-01       15 segunda-feira weekday
## 5  0.0754717 2012-10-01       20 segunda-feira weekday
## 6  2.0943396 2012-10-01       25 segunda-feira weekday
## 7  0.5283019 2012-10-01       30 segunda-feira weekday
## 8  0.8679245 2012-10-01       35 segunda-feira weekday
## 9  0.0000000 2012-10-01       40 segunda-feira weekday
## 10 1.4716981 2012-10-01       45 segunda-feira weekday
## ..       ...        ...      ...           ...     ...

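# Average the steps per interval within each day type, then plot one panel per type
patterns <- summarize(group_by(newdata, tipodia, interval = as.numeric(as.character(interval))), steps = mean(steps))
ggplot(patterns, aes(interval, steps)) + geom_line() + facet_wrap(~ tipodia, ncol = 1) +
    labs(x = "5-min intervals", y = "steps mean")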

Reproducible Research

Assignment 1

Delermando Branquinho Filho

June 11, 2016