First, we locate the directory where the project has been downloaded or cloned. This requires the package "here" (install.packages("here")):
library(here)
oldWD <- getwd()
# Call here::here() explicitly: lubridate also exports a deprecated here(),
# which masks the one from the "here" package and makes setwd() fail.
setwd(here::here())
Loading and preprocessing the data belongs to the munging step of the ProjectTemplate structure; see also the READ.md in the root of the project directory. The data is loaded with read.csv() into a data frame, df_steps. The code is saved in the /munge/01-A.R script, which runs every time we call load.project():
library(ProjectTemplate)
load.project()
## Project name: ReproducibleResearch_CourseProject1
## Loading project configuration
## Autoloading packages
## Loading package: reshape
## Loading package: plyr
## Loading package: dplyr
## Loading package: ggplot2
## Loading package: stringr
## Loading package: lubridate
## Loading package: Hmisc
## Autoloading helper functions
## Running helper script: globals.R
## Running helper script: helpers.R
## Autoloading data
## Munging data
## Running preprocessing script: 01-A.R
dim(df_steps)
## [1] 17568 3
head(df_steps)
## steps date interval
## 1 NA 2012-10-01 0
## 2 NA 2012-10-01 5
## 3 NA 2012-10-01 10
## 4 NA 2012-10-01 15
## 5 NA 2012-10-01 20
## 6 NA 2012-10-01 25
summary(df_steps)
## steps date interval
## Min. : 0.00 2012-10-01: 288 Min. : 0.0
## 1st Qu.: 0.00 2012-10-02: 288 1st Qu.: 588.8
## Median : 0.00 2012-10-03: 288 Median :1177.5
## Mean : 37.38 2012-10-04: 288 Mean :1177.5
## 3rd Qu.: 12.00 2012-10-05: 288 3rd Qu.:1766.2
## Max. :806.00 2012-10-06: 288 Max. :2355.0
## NA's :2304 (Other) :15840
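The munge script itself is not reproduced in this report. As a rough sketch of what such a loading step could look like, the example below reads a few CSV rows from an in-memory string instead of a file. The name df_example and the four sample rows are made up for illustration; stringsAsFactors = TRUE is an assumption chosen because date appears as a factor in the summary above.

```r
# Hypothetical stand-in for the CSV file; a real munge script would read from disk.
csv_text <- "steps,date,interval
NA,2012-10-01,0
NA,2012-10-01,5
0,2012-10-02,0
117,2012-10-02,5"

# read.csv() parses the literal string "NA" as a missing value by default.
# stringsAsFactors = TRUE turns the date column into a factor.
df_example <- read.csv(text = csv_text, stringsAsFactors = TRUE)

str(df_example)
```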
NOTE: The path in setwd() must be changed to the directory where you have checked out the project on your computer.
All of the necessary commands for this part are saved in the cp1_basic_processing.R script in the src folder. For this part, missing values in the dataset are ignored.
Total number of steps taken per day:
df_steps %>% group_by(date) %>% summarise(sum=sum(steps))
## # A tibble: 61 x 2
## date sum
## <fct> <int>
## 1 2012-10-01 NA
## 2 2012-10-02 126
## 3 2012-10-03 11352
## 4 2012-10-04 12116
## 5 2012-10-05 13294
## 6 2012-10-06 15420
## 7 2012-10-07 11015
## 8 2012-10-08 NA
## 9 2012-10-09 12811
## 10 2012-10-10 9900
## # ... with 51 more rows
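The NA totals in the table come from sum() propagating missing values: a day with at least one NA gets an NA total. The toy example below (made-up data, base R only) contrasts this with na.rm = TRUE, which would instead give a total of 0 for a day with no observations at all. Since hist() drops non-finite values, keeping the NA totals excludes such days from the histogram rather than counting them as zero-step days.

```r
# Toy data: d1 is entirely missing, d2 is complete, d3 is partly missing.
toy <- data.frame(
  date  = c("d1", "d1", "d2", "d2", "d3", "d3"),
  steps = c(NA, NA, 10, 20, 5, NA)
)

# Default behaviour: any NA within a day makes that day's total NA.
strict <- tapply(toy$steps, toy$date, sum)

# With na.rm = TRUE the NAs are dropped, so an all-NA day sums to 0.
lenient <- tapply(toy$steps, toy$date, sum, na.rm = TRUE)

print(strict)
print(lenient)
```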
Histogram of the total number of steps taken each day:
dt_steps_per_day <- df_steps %>% group_by(date) %>%
  summarise(sum=sum(steps))
hist(dt_steps_per_day$sum,
     xlab="Total number of steps per day",
     main = "Histogram of total number of steps per day")
dev.print(png,
          'figure/histogram_sum_steps_per_day.png',
          width=640,
          height=800)
## png
## 2
Mean and median of the total number of steps taken per day:
mean(dt_steps_per_day$sum,na.rm=TRUE)
## [1] 10766.19
median(dt_steps_per_day$sum,na.rm=TRUE)
## [1] 10765
Time series plot (type = “l”) of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all days (y-axis):
dt_average_steps_per_interval <- df_steps %>%
  group_by(interval) %>%
  summarise(mean=mean(steps, na.rm=TRUE))
plot(dt_average_steps_per_interval$interval,
     dt_average_steps_per_interval$mean,
     type="l",
     xlab="Interval",
     ylab="Average number of steps",
     main="Average Number of Steps Per Interval")
dev.print(png,
          'figure/lineplot_average_steps_per_interval.png',
          width=640,
          height=800)
## png
## 2
Row position of the 5-minute interval that, on average across all the days in the dataset, contains the maximum number of steps (which.max() returns the position in the table, not the interval label):
which.max(dt_average_steps_per_interval$mean)
## [1] 104
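The value 104 is a row position. To translate it into an interval label, the sketch below rebuilds the 288 labels under two assumptions: that the labels encode clock time as HHMM (0, 5, ..., 55, 100, ..., 2355, consistent with the summary above), and that the averaged table is sorted by interval. The variable names are illustrative only.

```r
# Minutes since midnight for each 5-minute slot of a day.
minutes <- seq(0, 24 * 60 - 5, by = 5)

# Assumed label scheme: hours * 100 + minutes, e.g. 515 minutes -> 835.
intervals <- (minutes %/% 60) * 100 + minutes %% 60

idx   <- 104                 # row position reported by which.max()
label <- intervals[idx]      # interval label at that position

# Format the label as a clock time.
clock <- sprintf("%02d:%02d", label %/% 100, label %% 100)

print(label)
print(clock)
```

With the real table, the same label can be read directly by indexing the interval column with the result of which.max().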
The total number of missing values in the dataset (i.e. the total number of rows with NAs):
sapply(df_steps, function(x) sum(is.na(x)))
## steps date interval
## 2304 0 0
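The per-column counts equal the number of incomplete rows here only because a single column contains NAs. As a small sketch on made-up data, complete.cases() counts the rows directly and stays correct when several columns have missing values:

```r
# Toy data with NAs in two columns; row 3 is missing in both.
toy <- data.frame(
  steps    = c(NA, 3, NA, 7),
  interval = c(0, 5, NA, 15)
)

# NAs counted per column.
per_column <- colSums(is.na(toy))

# Rows containing at least one NA.
rows_with_na <- sum(!complete.cases(toy))

print(per_column)
print(rows_with_na)
```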
The chosen strategy for filling in the missing values in the dataset is to use the mean of the corresponding 5-minute interval. A new dataset that is equal to the original dataset, but with the missing data filled in:
y <- which(is.na(df_steps$steps)==TRUE)
df_steps_imputed <- merge(df_steps,dt_average_steps_per_interval,by="interval")
df_steps_imputed$steps <- with(df_steps_imputed,impute(steps,mean[y]))
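One point to check with this approach: merge() sorts its result by the key column by default, so row positions computed on the original data frame (such as y) need not refer to the same rows afterwards. The sketch below is an alternative, not the method used above. It works on made-up data and avoids positional indexing by computing the interval mean per row with ave() and substituting it only where steps is NA.

```r
# Toy data: three days, two intervals each, two missing values.
toy <- data.frame(
  steps    = c(NA, 4, 2, NA, 6, 8),
  date     = rep(c("d1", "d2", "d3"), each = 2),
  interval = rep(c(0, 5), times = 3)
)

# Mean of the observed steps for each row's interval, aligned row by row.
# Interval 0: mean(2, 6) = 4; interval 5: mean(4, 8) = 6.
interval_mean <- ave(toy$steps, toy$interval,
                     FUN = function(x) mean(x, na.rm = TRUE))

# Replace only the missing entries; observed values stay untouched.
toy_imputed <- toy
toy_imputed$steps <- ifelse(is.na(toy$steps), interval_mean, toy$steps)

print(toy_imputed)
```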
Histogram of the total number of steps taken each day, after imputation:
dt_steps_per_day_imputed <- df_steps_imputed %>% group_by(date) %>%
  summarise(sum=sum(steps))
hist(dt_steps_per_day_imputed$sum,
     xlab="Total number of steps per day",
     main = "Histogram of total number of steps per day")
dev.print(png,
          'figure/histogram_sum_steps_per_day_imputed.png',
          width=640,
          height=800)
## png
## 2
The mean and median of the total number of steps taken per day, after imputation:
mean(dt_steps_per_day_imputed$sum)
## [1] 10889.8
median(dt_steps_per_day_imputed$sum)
## [1] 11458
Both values are higher than the estimates from the first part of the assignment (mean 10766.19, median 10765). With this imputation, the estimates of the total daily number of steps therefore increase.
New factor variable in the dataset with two levels – “weekday” and “weekend” indicating whether a given date is a weekday or weekend day:
source("src/cp1_function_daytype.R")
df_steps_imputed$daytype <- apply(df_steps_imputed,1,
                                  function(x) daytype(x[3]))
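The daytype() function lives in src/cp1_function_daytype.R and is not shown in this report. As an illustration only, a classifier of this kind could look like the sketch below. The name daytype_sketch is hypothetical, and the use of as.POSIXlt()$wday is an assumption chosen because it does not depend on the locale, unlike weekdays().

```r
# Hypothetical classifier; not the function from src/cp1_function_daytype.R.
daytype_sketch <- function(date) {
  # wday is 0 for Sunday through 6 for Saturday, independent of locale.
  wday <- as.POSIXlt(as.Date(date))$wday
  factor(ifelse(wday %in% c(0, 6), "weekend", "weekday"),
         levels = c("weekday", "weekend"))
}

# 2012-10-01 was a Monday, 2012-10-06 a Saturday, 2012-10-07 a Sunday.
example <- daytype_sketch(c("2012-10-01", "2012-10-06", "2012-10-07"))
print(example)
```

Because the sketch is vectorised, it could be applied to a whole date column at once without apply().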
Panel plot containing a time series plot (type = “l”) of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all weekday days or weekend days (y-axis).
df_weekday <- subset(df_steps_imputed,daytype=="weekday")
df_weekend <- subset(df_steps_imputed,daytype=="weekend")
dt_steps_per_day_weekday <- df_weekday %>% group_by(interval) %>%
  summarise(mean=mean(steps))
dt_steps_per_day_weekend <- df_weekend %>% group_by(interval) %>%
  summarise(mean=mean(steps))
par(mfrow=c(2,1))
plot(dt_steps_per_day_weekday$interval,
     dt_steps_per_day_weekday$mean,
     type="l",
     xlab="Interval",
     ylab="Average number of steps (weekdays)",
     main="Average Number of Steps Per Interval On Weekdays")
plot(dt_steps_per_day_weekend$interval,
     dt_steps_per_day_weekend$mean,
     type="l",
     xlab="Interval",
     ylab="Average number of steps (weekends)",
     main="Average Number of Steps Per Interval On Weekends")
dev.print(png,
          'figure/multipanelplot_steps_weekdays_weekends.png',
          width=640,
          height=800)
## png
## 2
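The two subsets and two summaries can also be computed in a single step. The sketch below uses base R's aggregate() on made-up data to average steps per combination of day type and interval; each day type can then be filtered out of the result and plotted in its own panel.

```r
# Toy data: two intervals, two observations per day type and interval.
toy <- data.frame(
  interval = rep(c(0, 5), times = 4),
  daytype  = rep(c("weekday", "weekend"), each = 4),
  steps    = c(1, 3, 3, 5, 10, 20, 30, 40)
)

# One row per (daytype, interval) combination with the mean number of steps.
avg <- aggregate(steps ~ daytype + interval, data = toy, FUN = mean)

print(avg)
```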
Cleanup: Reset to old working directory:
setwd(oldWD)
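If the script stops with an error before this line is reached, the working directory stays changed. A common way to guard against that is on.exit(), sketched below with a hypothetical helper, run_in_dir(), that is not part of this project. It relies on setwd() invisibly returning the previous directory.

```r
# Hypothetical helper: evaluate `code` inside `dir`, then always switch back.
run_in_dir <- function(dir, code) {
  old <- setwd(dir)                  # setwd() returns the previous directory
  on.exit(setwd(old), add = TRUE)    # restored even if `code` fails
  force(code)                        # lazy argument is evaluated here
}

before <- getwd()
inside <- run_in_dir(tempdir(), getwd())
after  <- getwd()

print(identical(before, after))
```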