Data<-read.csv("activity.csv")
DataClean<-na.omit(Data)
library(plyr)
summary(DataClean)
##      steps              date          interval
##  Min.   :  0.00   2012-10-02:  288   Min.   :   0.0
##  1st Qu.:  0.00   2012-10-03:  288   1st Qu.: 588.8
##  Median :  0.00   2012-10-04:  288   Median :1177.5
##  Mean   : 37.38   2012-10-05:  288   Mean   :1177.5
##  3rd Qu.: 12.00   2012-10-06:  288   3rd Qu.:1766.2
##  Max.   :806.00   2012-10-07:  288   Max.   :2355.0
##                   (Other)   :13536
Note that all rows containing NA were removed. The minimum of steps is still 0 because an interval with no recorded steps is a valid observation, not a missing value.
Histogram of the total number of steps taken each day
SumData<-aggregate(DataClean["steps"],list(date = DataClean$date),sum)
hist(SumData$steps, breaks=50, main="Total Steps per Day")
summary(SumData)
##          date        steps
##  2012-10-02: 1   Min.   :   41
##  2012-10-03: 1   1st Qu.: 8841
##  2012-10-04: 1   Median :10765
##  2012-10-05: 1   Mean   :10766
##  2012-10-06: 1   3rd Qu.:13294
##  2012-10-07: 1   Max.   :21194
##  (Other)   :47
Note the mean (10766) and median (10765) of the total steps per day in the summary above.
Plot of the mean number of steps taken in each 5-minute interval, averaged across all days
MeanData<-aggregate(DataClean["steps"],list(Interval = DataClean$interval),mean)
plot(MeanData$Interval, MeanData$steps, type ="l", main= "Mean Steps by Interval Period")
summary(MeanData)
##     Interval          steps
##  Min.   :   0.0   Min.   :  0.000
##  1st Qu.: 588.8   1st Qu.:  2.486
##  Median :1177.5   Median : 34.113
##  Mean   :1177.5   Mean   : 37.383
##  3rd Qu.:1766.2   3rd Qu.: 52.835
##  Max.   :2355.0   Max.   :206.170
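The summary above shows that the highest mean for a single interval is 206.17 steps. Note that 2355 is only the largest interval identifier, not the interval in which that peak occurs.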
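To identify which interval holds the highest mean step count, which.max() can be applied to the aggregated means. The sketch below runs on a small made-up data frame (toy, toy_mean and their values are assumptions for illustration, not the real activity.csv); the same indexing should carry over to MeanData.

```r
# Toy stand-in for DataClean (hypothetical values, not the real data)
toy <- data.frame(
  steps    = c(0, 10, 50, 2, 30, 70),
  interval = c(0, 5, 10, 0, 5, 10)
)

# Mean steps per interval, using the same aggregate() pattern as the report
toy_mean <- aggregate(toy["steps"], list(Interval = toy$interval), mean)

# which.max() returns the row index of the largest mean;
# indexing with it keeps both the interval identifier and its mean
peak <- toy_mean[which.max(toy_mean$steps), ]
print(peak)
```

On the real data the equivalent call would be MeanData[which.max(MeanData$steps), ].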
How many NA values does the original data set contain?
StepsNA<-sum(is.na(Data$steps))
DateNA<-sum(is.na(Data$date))
IntervalNA<-sum(is.na(Data$interval))
There are 2304 rows containing an NA value, all of which are in the steps column
## First append the interval means from MeanData to Data based on matching interval
Data$match<-MeanData$steps[match(Data$interval,MeanData$Interval)]
## Then build a logical index of the NA rows and replace each NA in the steps column (column 1) with the value in the match column (column 4)
Index<-is.na(Data$steps)
Data$steps[Index]<-Data$match[Index]
## Verify the NAs have been replaced by running sum(is.na(Data$steps)), which should return 0
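The imputation and its check can be tried end to end on a small example. This is a minimal sketch on made-up values (toy, toy_clean and toy_mean are hypothetical names, not objects from the report), following the same match() and logical-index steps used above.

```r
# Toy data with two missing step counts (hypothetical values)
toy <- data.frame(
  steps    = c(NA, 4, 6, 8, NA, 10),
  interval = c(0, 5, 0, 5, 5, 0)
)

# Interval means computed from the complete rows only
toy_clean <- na.omit(toy)
toy_mean  <- aggregate(toy_clean["steps"], list(Interval = toy_clean$interval), mean)

# Look up the mean for each row's interval, then fill only the NA rows
toy$match <- toy_mean$steps[match(toy$interval, toy_mean$Interval)]
idx <- is.na(toy$steps)
toy$steps[idx] <- toy$match[idx]

# Check: no NA should remain in the steps column
remaining <- sum(is.na(toy$steps))
print(remaining)
print(toy$steps)
```

Note that column names in R are case-sensitive: toy$Steps would be NULL, and sum(is.na(NULL)) is 0 regardless of the data, so the check has to use the exact column name.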
SumDataNEW<-aggregate(Data["steps"],list(date = Data$date),sum)
hist(SumDataNEW$steps, breaks=50, main="Total Steps per Day (NAs imputed)")
summary(SumDataNEW)
##          date        steps
##  2012-10-01: 1   Min.   :   41
##  2012-10-02: 1   1st Qu.: 9819
##  2012-10-03: 1   Median :10766
##  2012-10-04: 1   Mean   :10766
##  2012-10-05: 1   3rd Qu.:12811
##  2012-10-06: 1   Max.   :21194
##  (Other)   :55
summary(SumData)
##          date        steps
##  2012-10-02: 1   Min.   :   41
##  2012-10-03: 1   1st Qu.: 8841
##  2012-10-04: 1   Median :10765
##  2012-10-05: 1   Mean   :10766
##  2012-10-06: 1   3rd Qu.:13294
##  2012-10-07: 1   Max.   :21194
##  (Other)   :47
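When comparing the old and new data sets through their summaries, the mean is unchanged (10766) and the median moves only from 10765 to 10766. This is expected: the 2304 NA rows correspond to 8 complete days of 288 intervals each, so every imputed day receives the same total, the sum of the interval means (37.383 x 288, about 10766). The impact shows up in the quartiles (1st Qu. rises from 8841 to 9819, 3rd Qu. falls from 13294 to 12811) and in the number of days (61 instead of 53).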
## Make sure the date column is converted from 'factor' class to 'Date' class
DataClean$date<-as.Date(DataClean$date)
## Add a new column holding the day name, using the weekdays() function
DataClean$Weekdata<-weekdays(DataClean$date)
## Add another column stating whether the day is a weekend or a weekday
DataClean$Day_or_End<- ifelse(DataClean$Weekdata == "Saturday" | DataClean$Weekdata == "Sunday", "Weekend", "Weekday")
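One caveat with the comparison above is that weekdays() returns day names in the current locale, so matching on "Saturday" and "Sunday" only works in an English locale. A possible locale-independent variant, shown here as a sketch on a few hand-picked dates (the vector names are assumptions for illustration), uses the numeric weekday from as.POSIXlt(), where 0 is Sunday and 6 is Saturday.

```r
# Four consecutive dates: Friday through Monday
dates <- as.Date(c("2012-10-05", "2012-10-06", "2012-10-07", "2012-10-08"))

# Numeric day of week: 0 = Sunday, 1 = Monday, ..., 6 = Saturday
wday <- as.POSIXlt(dates)$wday

# Classify without relying on locale-specific day names
day_type <- ifelse(wday %in% c(0, 6), "Weekend", "Weekday")
print(data.frame(dates, day_type))
```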
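## Final aggregate data to plot: mean steps per interval, so that weekdays and weekends stay comparable despite covering different numbers of days
FinalAGG<-aggregate(steps ~ interval + Day_or_End, DataClean, mean)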
## Stack the two panels vertically
par(mfrow=c(2,1))
plot(subset(FinalAGG, Day_or_End == "Weekend", select=c("interval","steps")), type="l", main="Weekend steps by interval")
plot(subset(FinalAGG, Day_or_End == "Weekday", select=c("interval","steps")), type="l", main="Weekday steps by interval")
knit2html()