1. Download the pedometer2.csv file into your working directory.

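  2. Read pedometer2.csv into a data frame.

# stringsAsFactors = TRUE keeps Day a factor on R >= 4.0, matching the str() output shown here
pedo2 <- read.csv("pedometer2.csv", header = TRUE, stringsAsFactors = TRUE)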
str(pedo2)
## 'data.frame':    88 obs. of  4 variables:
##  $ Observation: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Day        : Factor w/ 7 levels "Fri","Mon","Sat",..: 5 1 3 2 6 7 5 1 4 2 ...
##  $ Steps      : int  9178 694 12503 7802 7913 6135 7022 NA 11751 6861 ...
##  $ Hours      : int  96 120 144 24 48 72 96 120 168 24 ...
  3. Show the average hourly pattern of steps (i.e., how step counts change over hours) in a time-series plot.
pedo2AvgSteps <- tapply(pedo2$Steps, pedo2$Hours, mean, na.rm = TRUE)
pedo2AvgSteps
##        24        48        72        96       120       144       168 
## 10210.000  8687.273  8547.909  9476.364  8532.444  9697.333 10236.857
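The table above gives the averages but does not yet draw the time-series plot the question asks for. A minimal sketch of one way to plot it follows; `plot_avg_steps` is a hypothetical helper name, and it assumes a data frame with numeric `Steps` and `Hours` columns such as `pedo2`.

```r
# Hypothetical helper: plot the mean of Steps for each value of Hours as a line chart.
# Assumes df has numeric columns named Steps and Hours (as pedo2 does).
plot_avg_steps <- function(df) {
  avg <- tapply(df$Steps, df$Hours, mean, na.rm = TRUE)  # mean steps per hour value
  hours <- as.numeric(names(avg))                        # group labels back to numbers
  plot(hours, as.vector(avg), type = "b", xaxt = "n",
       xlab = "Hours", ylab = "Average steps",
       main = "Average steps by hour")
  axis(1, at = hours)                                    # tick only at observed hours
  invisible(avg)                                         # return the averages silently
}
```

Calling `plot_avg_steps(pedo2)` should draw one point per hourly segment, connected in order.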
  4. Which hourly segment (from 24 hours to 168 hours) has the maximum number of steps?
testmax <- tapply(pedo2$Steps, pedo2$Hours, max, na.rm = TRUE)
max(testmax)
## [1] 16051
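`max(testmax)` returns the largest step count, not the segment it belongs to. A short sketch of how the segment itself could be looked up with `which.max`; the helper name `segment_with_max` is an assumption made for illustration.

```r
# Hypothetical helper: return the Hours value whose maximum step count is largest.
# Assumes df has numeric columns named Steps and Hours (as pedo2 does).
segment_with_max <- function(df) {
  seg_max <- tapply(df$Steps, df$Hours, max, na.rm = TRUE)  # per-segment maximum
  names(seg_max)[which.max(seg_max)]                        # label of the largest one
}
```

`segment_with_max(pedo2)` would then return the hour label as a character string.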
  5. Count how many NA values there are in pedometer2.csv.
sum(is.na(pedo2))
## [1] 20
  6. Suggest a reasonable strategy for dealing with the NA values, and implement it.

Replace each NA value with 0.

pedo2[is.na(pedo2)] <- 0
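Filling with 0 treats every missing reading as a day with no steps, which pulls the averages down. As an alternative, not the approach used in this write-up, a missing value could be filled with the mean of its own `Hours` group. The sketch below assumes the NA values sit in the `Steps` column; `impute_steps_by_hour` is a hypothetical name.

```r
# Hypothetical alternative: fill missing Steps with the mean of the same Hours group.
# Assumes df has columns Steps and Hours, and that the NA values are in Steps.
impute_steps_by_hour <- function(df) {
  # Per-row group mean, computed from the non-missing values in each group
  group_mean <- ave(df$Steps, df$Hours,
                    FUN = function(x) mean(x, na.rm = TRUE))
  missing <- is.na(df$Steps)
  df$Steps[missing] <- group_mean[missing]
  df
}
```

This would have to be applied to the data frame before its NA values are overwritten, for example `pedo2_imputed <- impute_steps_by_hour(pedo2)` right after reading the file.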
  7. Plot the new dataset.
library(ggplot2)

# Scatter plot of steps against hours, one colour per day of the week
plot(pedo2$Hours, pedo2$Steps, col = as.factor(pedo2$Day), pch = 16, cex = 0.5,
     xlab = "Hours", ylab = "Steps", main = "New Dataset")

  8. Modify the completed dataset to add a new factor variable indicating whether each day is a weekend or a weekday.
 # Derive the weekend/weekday label from Day and store it as a factor
 pedo2ver2 <- pedo2
 pedo2ver2$Week <- factor(ifelse(pedo2ver2$Day %in% c("Sat", "Sun"),
                                 "weekend", "weekday"))
  9. Create a panel plot of the average steps by weekend or weekday. Is there a difference?
pedo2ver2Avg <- tapply(pedo2ver2$Steps, pedo2ver2$Week, mean, na.rm = TRUE)

barplot(pedo2ver2Avg, main = "Average steps of weekend and weekdays", col = c("darkred","darkblue"))
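The bar chart compares the two overall averages in a single frame. If side-by-side panels are wanted, one possible sketch with base graphics follows. `plot_week_panels` is a hypothetical helper, and it assumes a data frame with `Steps`, `Hours`, and a `Week` column holding "weekday" and "weekend", with at least one row of each.

```r
# Hypothetical helper: one panel per Week group, showing average Steps by Hours.
# Assumes df has columns Steps, Hours and Week ("weekday"/"weekend").
plot_week_panels <- function(df) {
  groups <- c("weekday", "weekend")
  # Average steps per hour value, computed separately for each group
  avgs <- lapply(groups, function(g) {
    sub <- df[df$Week == g, ]
    tapply(sub$Steps, sub$Hours, mean, na.rm = TRUE)
  })
  names(avgs) <- groups
  ylim <- range(unlist(avgs), na.rm = TRUE)  # shared y-axis so panels are comparable
  old_par <- par(mfrow = c(1, 2))            # two panels side by side
  on.exit(par(old_par))                      # restore the layout afterwards
  for (g in groups) {
    a <- avgs[[g]]
    plot(as.numeric(names(a)), as.vector(a), type = "b", ylim = ylim,
         xlab = "Hours", ylab = "Average steps", main = g)
  }
  invisible(avgs)
}
```

With the data frame built above, `plot_week_panels(pedo2ver2)` would draw the weekday and weekend averages on a common scale, which makes any difference between them easier to judge by eye.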