The working directory fo the code is the directory where the data files “activity.zip resides”. First data is read directly from the zip file into a data frame.
activitydata <- read.csv(unz("activity.zip", "activity.csv"),
colClasses=c("numeric","Date","numeric"),
header=TRUE, sep=",", stringsAsFactors=FALSE)
Apply the ‘sum’ function to the steps column, aggregating rows based on unique date values.
totalstepsday<-aggregate(steps~date, data=activitydata,sum)
The histogram plot of the total number of steps each day is shown below :
barplot(totalstepsday$steps, names.arg=totalstepsday$date,ylab="Total Steps", xlab="Dates")
meansteps <- mean(totalstepsday$steps)
mediansteps <- median(totalstepsday$steps)
The mean total number of steps taken each day is 1.076618910^{4}.
The median total number of steps taken each day is 1.076510^{4}.
First find the mean number of steps in each time interval calculated across all days :
meanstepsalldays<-aggregate(steps~interval, activitydata, mean)
highesactivityinterval <- meanstepsalldays[which.max(meanstepsalldays$steps),]$interval
The mean number of steps for each time interval calculated for all days is shown in the plot below :
plot(meanstepsalldays$interval,
meanstepsalldays$steps,
type='l',
xlab="5-minute Time Interval",
ylab="Mean No. of Steps",
main="Mean Number of Steps Across All days for Each Time Interval")
The interval with the highest activity is 835.
First find the number of rows with NA :
nrowsNA <- nrow(activitydata) - sum(complete.cases(activitydata))
The number of rows with NA in steps are 2304.
Replace all days with NA in number of steps with the mean number of step across all days (for the specific time interval). Create a new data table called filledactivitydata to contain the filled data.
datesNA<-activitydata[is.na(activitydata$steps),]$date
dateswithNAdata<-unique(datesNA) ## determine the unique dates with NA in 'steps'
filledactivitydata <- activitydata ## Make a copy of the original data
for(i in 1:length(dateswithNAdata))
{
filledactivitydata[filledactivitydata$date ==
dateswithNAdata[i],]$steps<-meanstepsalldays$steps
}
filledtotalstepsday<-aggregate(steps~date, data=filledactivitydata,sum)
The histogram plot of the filled total number of steps each day is shown below :
barplot(filledtotalstepsday$steps, names.arg=filledtotalstepsday$date,ylab="Total Steps", xlab="Dates")
filledmeansteps <- mean(filledtotalstepsday$steps)
filledmediansteps <- median(filledtotalstepsday$steps)
The mean total number of steps taken each day from the filled data is 1.076618910^{4}.
The median total number of steps taken each day from the filled data is 1.076618910^{4}.
Although the mean value has remained the same, the median value now differs slightly from that calculated in the original dataset without the estimates added in.
Next, a new column “daytype” is created which contains a factor of either one of two levels – “Weekday” or “Weekend”.
dayofweek<-weekdays(filledactivitydata$date)
dayofweek[grepl("Sunday",dayofweek)] <- "Weekend"
dayofweek[grepl("Saturday",dayofweek)] <- "Weekend"
dayofweek[!grepl("Weekend",dayofweek)] <- "Weekday"
dayofweek <- as.factor(dayofweek)
filledactivitydata$daytype<-dayofweek
The mean number of steps in each time interval is then computed for the weekdays and separately for the weekends.
meanstepsperinterval<-aggregate(steps~interval+daytype, data=filledactivitydata,mean)
Plotting the data for both weekdays and weekends :
library(lattice)
xyplot(steps~interval|daytype,data=meanstepsperinterval,layout(1,2),type='l')
Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.