The data are downloaded from the URL given.
After reading the CSV file from the downloaded zip file, the class of each column was inspected. The only column that had to be adjusted was the date column.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(lubridate)
URL_file <- c("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2Factivity.zip")
download.file(URL_file, "~/R/Course 5/Factivity.zip")
downloaddatum <- Sys.Date()
Factivity <- read.csv((unz("~/R/Course 5/Factivity.zip", "activity.csv")))
Factivity$date <- as.Date(as.character(Factivity$date))
print(downloaddatum)
## [1] "2017-09-07"
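The class inspection mentioned above is not shown in the code. A minimal sketch of how it could be done follows, using a small made-up data frame (`toy`, with hypothetical values) in place of the real file. In R versions before 4.0, `read.csv` reads text columns as factors by default, which is why the original code wraps the date in `as.character()` before converting it.

```r
# Toy stand-in for the activity data: hypothetical values, same column layout
toy <- data.frame(
  steps    = c(NA, 0L, 12L),
  date     = c("2012-10-01", "2012-10-01", "2012-10-02"),
  interval = c(0L, 5L, 0L),
  stringsAsFactors = FALSE
)

# Class of every column before the adjustment
classes_before <- sapply(toy, class)

# Same conversion as in the report; as.character() also covers a factor column
toy$date <- as.Date(as.character(toy$date))

# Class of every column after the adjustment
classes_after <- sapply(toy, class)

print(classes_before)
print(classes_after)
```

On the real data, `str(Factivity)` would give the same information together with a preview of the values.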
This step consists of two parts:
- Summarizing the steps per day
- Plotting the daily totals in a histogram
# Second question: histogram of the total number of steps per day
# Days with missing values get a total of NA, and hist() drops them
Sum_per_day <- Factivity %>% group_by(date) %>% summarize(total_steps=sum(steps))
hist(Sum_per_day$total_steps, breaks = 20, xlab = "Total steps per day", ylab = "Number of days", main = "Number of steps per day")
This part answers bullets 3 to 5.
The first question concerns the mean and the median of the total number of steps per day. The NAs are excluded here.
# Third question: mean and median steps per day
Mean_per_day <- mean( Sum_per_day$total_steps, na.rm = TRUE)
Median_per_day <- median(Sum_per_day$total_steps, na.rm = TRUE)
cat("Mean :", Mean_per_day)
## Mean : 10766.19
cat(" Median :", Median_per_day)
## Median : 10765
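The next question asks for a time series plot of the average number of steps in each 5-minute interval of the day. The plot shows that the peak lies between intervals 800 and 900, that is, between 8:00 and 9:00 in the morning, because the interval label encodes the time of day as hours and minutes.
# Fourth question: time series plot of average number of steps taken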
Mean_per_interval <- filter(Factivity, !is.na(Factivity$steps))
Mean_per_interval <- Mean_per_interval %>% group_by(interval) %>% summarize(total_steps=mean(steps))
plot(Mean_per_interval$interval, Mean_per_interval$total_steps, type = 'l', xlab = "Interval", ylab = "Average number of steps", main = "Average number of steps per interval")
The next question asks for the interval with the highest average number of steps. This is interval 835, with an average of about 206 steps.
# Fifth question: interval with the maximum average number of steps
max_steps <- filter(Mean_per_interval, Mean_per_interval$total_steps == max(Mean_per_interval$total_steps))
print(max_steps)
## Source: local data frame [1 x 2]
##
## interval total_steps
## (int) (dbl)
## 1 835 206.1698
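As an alternative to the `filter()` call, the row with the maximum can be picked with base R's `which.max()`. The sketch below is only an illustration: `toy_means` is a made-up three-row table with hypothetical values, not the real `Mean_per_interval`. It also shows how an interval label can be turned into a clock time, since the label holds the hours and minutes.

```r
# Hypothetical interval means (illustrative values only)
toy_means <- data.frame(
  interval    = c(830L, 835L, 840L),
  total_steps = c(177.3, 206.2, 195.9)
)

# which.max() returns the position of the first maximum
peak <- toy_means[which.max(toy_means$total_steps), ]

# Split the label into hours and minutes, e.g. 835 becomes "08:35"
peak_time <- sprintf("%02d:%02d", peak$interval %/% 100L, peak$interval %% 100L)

print(peak)
print(peak_time)
```

Unlike the `filter()` version, `which.max()` returns a single row even when several intervals share the maximum.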
This question covers two bullets:
- The imputation strategy
- Plotting the results
Within a day, the mean number of steps varies a lot from one interval to the next. This means that the daily average is not a good imputation strategy. The other available variable is the interval, so each missing value was replaced with the mean of its interval over the available observations. Every interval has at least one observed value, so every missing value can be imputed this way.
# Impute each missing steps value with the mean of its 5-minute interval
Number_of_NA <- sum(is.na(Factivity$steps))
aant_int <- nrow(Factivity)
for (i in 1:aant_int) {
  if (is.na(Factivity$steps[[i]])) {
    Interval <- as.integer(Factivity$interval[[i]])
    Mean_step <- filter(Mean_per_interval, Mean_per_interval$interval == Interval)
    # as.integer() truncates the interval mean to a whole number of steps
    Factivity$steps[[i]] <- as.integer(Mean_step$total_steps)
  }
}
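The row-by-row loop can also be written without a loop. The following is a sketch of the same strategy using base R's `ave()`, which computes a group statistic and returns it aligned with the original rows. It runs on a made-up data frame (`toy`, hypothetical values) and is not the code used for the report. The `as.integer()` truncation is kept to mirror the loop.

```r
# Toy data: two intervals, one missing value in each (hypothetical values)
toy <- data.frame(
  steps    = c(NA, 10, 20, 4, NA, 6),
  interval = c(0, 0, 0, 5, 5, 5)
)

# Mean of the observed steps per interval, repeated for every row of that interval
interval_mean <- ave(toy$steps, toy$interval,
                     FUN = function(x) mean(x, na.rm = TRUE))

# Fill only the missing positions, truncating to whole steps as in the loop
missing <- is.na(toy$steps)
toy$steps[missing] <- as.integer(interval_mean[missing])

print(toy$steps)
```

Because all rows are handled in one vectorized assignment, this form avoids one `filter()` call per missing value.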
The plotting code is identical to the code for Question 2.
# Seventh question: Histogram of the total number of steps taken each day
# after missing values are imputed
Sum_per_day <- Factivity %>% group_by(date) %>% summarize(total_steps=sum(steps))
hist(Sum_per_day$total_steps, breaks = 20, xlab = "Total steps per day", ylab = "Number of days", main = "Number of steps per day")
To answer the last question, three steps were coded:
1. Splitting the data into weekdays (day 2 to 6) and weekends (day 1 and 7)
2. Calculating the mean per interval (weekend and weekdays)
3. Printing the two plots (one for the weekend and one for weekdays) that show the average number of steps per interval.
# Eighth question: Panel plot comparing the average number of steps taken
# per 5-minute interval across weekdays and weekends
# wday() numbers the days 1 (Sunday) to 7 (Saturday) by default, so 2 to 6 are Monday to Friday
Factivity$weekday1 <- wday(Factivity$date)
Factivity$weekday2 <- ifelse(Factivity$weekday1 %in% 2:6, "weekday", "weekend")
Fact_weekday_interval <- Factivity %>% group_by(weekday2, interval) %>% summarize(total_steps=mean(steps))
library(lattice)  # provides xyplot()
xyplot(total_steps ~ interval | weekday2,
       data = Fact_weekday_interval,
       layout = c(1, 2),
       type = 'l',
       main = "Mean number of steps per interval, weekend versus weekday",
       xlab = "Interval",
       ylab = "Mean steps")