- First, we load and preprocess the data. My activity.zip file sits in the working directory, alongside PA1_template.Rmd and the other project files
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.1.3
act <- read.csv(unzip("repdata-data-activity.zip"))
## Warning in unzip("repdata-data-activity.zip"): error 1 in extracting from
## zip file
## Error in file(file, "rt"): invalid 'description' argument
- Format the dates as the appropriate type
act$date <- as.Date(act$date , format = "%Y-%m-%d")
## Error in as.Date(act$date, format = "%Y-%m-%d"): object 'act' not found
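The load step only works if the archive is actually present under the name the code expects. A minimal sketch of a more defensive load follows. The helper name `load_activity` is hypothetical, and the assumption that the archive contains a file called `activity.csv` is taken from the file name used later in this deck, not confirmed by the archive itself.

```r
# Hypothetical helper: load the activity data from a zip archive,
# failing early with a readable message if the archive is missing.
load_activity <- function(zip_path = "repdata-data-activity.zip",
                          csv_name = "activity.csv") {  # csv_name is an assumption
  if (!file.exists(zip_path)) {
    stop("Archive not found: ", normalizePath(zip_path, mustWork = FALSE))
  }
  # unz() opens a single file inside the archive without extracting it to disk
  act <- read.csv(unz(zip_path, csv_name), stringsAsFactors = FALSE)
  act$date <- as.Date(act$date, format = "%Y-%m-%d")
  act
}

# Only attempt the load when the archive is really there
if (file.exists("repdata-data-activity.zip")) {
  act <- load_activity()
}
```

Checking `file.exists()` first turns the chain of "object not found" errors into a single message that names the missing file.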
--- .class #3
Slide 3
- From the original data, aggregate the total steps per day and per interval, and name the resulting columns (steps, date and interval)
act.day <- aggregate(act$steps, by=list(act$date), sum)
## Error in aggregate(act$steps, by = list(act$date), sum): object 'act' not found
act.interval <- aggregate(act$steps, by=list(act$interval), sum)
## Error in aggregate(act$steps, by = list(act$interval), sum): object 'act' not found
names(act.day)[2] <- "steps"
## Error in names(act.day)[2] <- "steps": object 'act.day' not found
names(act.day)[1] <- "date"
## Error in names(act.day)[1] <- "date": object 'act.day' not found
names(act.interval)[2] <- "steps"
## Error in names(act.interval)[2] <- "steps": object 'act.interval' not found
names(act.interval)[1] <- "interval"
## Error in names(act.interval)[1] <- "interval": object 'act.interval' not found
- Now, from the original data, we'll compute the mean number of steps per interval and name the resulting columns
act.m.interval <- aggregate(act$steps, by=list(act$interval), mean, na.rm=TRUE, na.action=NULL)
## Error in aggregate(act$steps, by = list(act$interval), mean, na.rm = TRUE, : object 'act' not found
names(act.m.interval)[1] <- "interval"
## Error in names(act.m.interval)[1] <- "interval": object 'act.m.interval' not found
names(act.m.interval)[2] <- "mean.steps"
## Error in names(act.m.interval)[2] <- "mean.steps": object 'act.m.interval' not found
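To make the aggregation steps concrete, here is a small sketch on made-up data. The `toy` data frame and its values are invented for illustration and only mimic the three columns of the real set (steps, date, interval). Naming the elements of the `by` list sets the grouping column name directly, so only the value column needs renaming.

```r
# Made-up data: three days, two 5-minute intervals each; day 1 is entirely NA
toy <- data.frame(
  steps    = c(NA, NA, 10, 20, 30, 40),
  date     = as.Date(rep(c("2012-10-01", "2012-10-02", "2012-10-03"), each = 2)),
  interval = rep(c(0, 5), times = 3)
)

# Total steps per day: sum() without na.rm gives NA for a day with missing values
by_day <- aggregate(toy$steps, by = list(date = toy$date), FUN = sum)
names(by_day)[2] <- "steps"

# Mean steps per interval: na.rm = TRUE is passed on to mean()
mean_by_interval <- aggregate(toy$steps, by = list(interval = toy$interval),
                              FUN = mean, na.rm = TRUE)
names(mean_by_interval)[2] <- "mean.steps"

print(by_day)
print(mean_by_interval)
```

Note the difference between the two calls: the daily totals keep NA for days with missing data, while the interval means skip the missing values.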
--- .class #4
Slide 4
First Question: What is the mean total number of steps taken per day?
We'll calculate both MEAN and MEDIAN:
mean(act.day$steps, na.rm = TRUE)
## Error in mean(act.day$steps, na.rm = TRUE): object 'act.day' not found
median(act.day$steps, na.rm = TRUE )
## Error in median(act.day$steps, na.rm = TRUE): object 'act.day' not found
Note that the summary command also shows the number of NA values in the set
summary(act.day$steps)
## Error in summary(act.day$steps): object 'act.day' not found
- And the requested histogram:
hist(act.day$steps, col = "lavender", main = "Histogram of Total Number of Steps per Day",
xlab = "Total Number of Steps per Day")
## Error in hist(act.day$steps, col = "lavender", main = "Histogram of Total Number of Steps per Day", : object 'act.day' not found
Second Question: What is the average daily activity pattern? Specifically:
- Make a time series plot (i.e. type = "l") of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all days (y-axis)
- Which 5-minute interval, on average across all the days in the dataset, contains the maximum number of steps?
Something slightly different: I prefer circles drawn around the data points to plain lines, so the plot uses type = "o"
data <- read.csv("activity.csv")
## Warning in file(file, "rt"): cannot open file 'activity.csv': No such file
## or directory
## Error in file(file, "rt"): cannot open the connection
stepsInInterval<-aggregate(steps~interval, data, mean)
## Error in terms.formula(formula, data = data): 'data' argument is of the wrong type
plot(stepsInInterval$interval, stepsInInterval$steps, type='o', col='blue',main="Average of steps per day", xlab="Interval", ylab="Average of Steps in the Interval")
## Error in plot(stepsInInterval$interval, stepsInInterval$steps, type = "o", : object 'stepsInInterval' not found
Now we want to find which 5-minute interval in the dataset contains, on average, the maximum number of steps.
(Note that the answer points exactly to the sudden peak in the previous plot: the 5-minute interval number 835.)
act.m.interval[which.max(act.m.interval$mean.steps), 1]
## Error in eval(expr, envir, enclos): object 'act.m.interval' not found
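The lookup above relies on `which.max()`, which returns the row position of the largest value, and that position is then used to pull out the matching interval label. A minimal sketch with invented numbers (the `toy_means` values are made up and are not results from the real data):

```r
# Made-up per-interval means, standing in for act.m.interval
toy_means <- data.frame(
  interval   = c(0, 5, 10),
  mean.steps = c(2, 9, 4)
)

# which.max() gives the row index of the largest mean, not the interval itself
peak_row <- which.max(toy_means$mean.steps)

# Use that index to read off the interval label
peak_interval <- toy_means$interval[peak_row]
print(peak_interval)
```

If several intervals tie for the maximum, `which.max()` returns only the first one.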
Now: "The presence of missing days may introduce bias into some calculations or summaries of the data"
Third Question: Imputing missing values
How many NA values are in the set?
table(is.na(data$steps))
## Error in data$steps: object of type 'closure' is not subsettable
To correct this, let's merge in the per-interval means held in the act.m.interval
data frame, replace each missing (NA) value with the MEAN value for its interval,
and then create a 'new' set with NO NA values
act.lost <- merge(act, act.m.interval, by = "interval", sort= FALSE)
## Error in merge(act, act.m.interval, by = "interval", sort = FALSE): object 'act' not found
act.lost$steps[is.na(act.lost$steps)] <- act.lost$mean.steps[is.na(act.lost$steps)]
## Error in eval(expr, envir, enclos): object 'act.lost' not found
act.nona <- act.lost[, c(2,3,1)]
## Error in eval(expr, envir, enclos): object 'act.lost' not found
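The merge-and-replace idea can be sketched on a small made-up data set. The `toy` values are invented for illustration, and selecting columns by name rather than by position is my own variation, chosen so the result does not depend on the column order that `merge()` produces.

```r
# Made-up data: day 1 is entirely missing
toy <- data.frame(
  steps    = c(NA, NA, 10, 20, 30, 40),
  date     = as.Date(rep(c("2012-10-01", "2012-10-02", "2012-10-03"), each = 2)),
  interval = rep(c(0, 5), times = 3)
)

# Per-interval means; the formula interface drops NA rows by default
interval_means <- aggregate(steps ~ interval, data = toy, FUN = mean)
names(interval_means)[2] <- "mean.steps"

# Attach the interval mean to every row, then fill only the missing steps
filled <- merge(toy, interval_means, by = "interval", sort = FALSE)
missing <- is.na(filled$steps)
filled$steps[missing] <- filled$mean.steps[missing]

# Restore a predictable row order and keep the original three columns
filled <- filled[order(filled$date, filled$interval), c("steps", "date", "interval")]
print(filled)
```

The explicit `order()` matters because `merge()` makes no promise about row order when `sort = FALSE`.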
Before going any further, let's compare the new and old data sets
Create a new dataset with the total steps per day
act.day.new <- aggregate(act.nona$steps, by=list(act.nona$date), sum)
## Error in aggregate(act.nona$steps, by = list(act.nona$date), sum): object 'act.nona' not found
names(act.day.new)[1] <-"day"
## Error in names(act.day.new)[1] <- "day": object 'act.day.new' not found
names(act.day.new)[2] <-"steps"
## Error in names(act.day.new)[2] <- "steps": object 'act.day.new' not found
And now plot the new 'corrected' histogram
hist(act.day.new$steps, col = "blue", main = "Total Number of Steps per Day (*without* NA values)", xlab = "Total Steps")
## Error in hist(act.day.new$steps, col = "blue", main = "Total Number of Steps per Day (*without* NA values)", : object 'act.day.new' not found
Looking at the histograms, it is hard to tell the difference; let's compare using the MEAN and MEDIAN:
mean(act.day.new$steps)
## Error in mean(act.day.new$steps): object 'act.day.new' not found
median(act.day.new$steps)
## Error in median(act.day.new$steps): object 'act.day.new' not found
The MEAN values with AND without NA data are the same, but the original MEDIAN was slightly smaller than the 'corrected' value
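One way to see why the mean can stay the same: if the missing values come as whole days (an assumption on my part, not something checked here), each filled day gets a total equal to the sum of the interval means, which is exactly the mean daily total of the observed days. A small sketch on made-up data:

```r
# Made-up data: day 1 is entirely missing, days 2 and 3 are complete
toy <- data.frame(
  steps    = c(NA, NA, 10, 20, 30, 40),
  date     = as.Date(rep(c("2012-10-01", "2012-10-02", "2012-10-03"), each = 2)),
  interval = rep(c(0, 5), times = 3)
)

# Daily totals before imputation (NA for the missing day)
daily_before <- tapply(toy$steps, toy$date, sum)

# Fill each NA with the mean of its interval, looked up by interval label
interval_means <- tapply(toy$steps, toy$interval, mean, na.rm = TRUE)
filled <- toy
missing <- is.na(filled$steps)
filled$steps[missing] <- interval_means[as.character(filled$interval[missing])]

# Daily totals after imputation
daily_after <- tapply(filled$steps, filled$date, sum)

mean_before <- mean(daily_before, na.rm = TRUE)
mean_after  <- mean(daily_after)
print(c(before = mean_before, after = mean_after))
```

The filled days all sit exactly at the mean, which is also why the median can drift toward the mean after imputation.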
Fourth Question: Are there differences in activity patterns between weekdays and weekends?
First we need to split our set into 'weekday' and 'weekend' days.
Then we add a new column holding this label: wDay (weekday or weekend day).
act.nona$wDay <- ifelse(as.POSIXlt(act.nona$date)$wday %in% c(0,6), 'weekend', 'weekday')
## Error in as.POSIXlt(act.nona$date): object 'act.nona' not found
adi <- aggregate(steps ~ interval + wDay, data=act.nona, mean)
## Error in eval(expr, envir, enclos): object 'act.nona' not found
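The weekday/weekend split relies on the `wday` field of `as.POSIXlt()`, which numbers the days from 0 (Sunday) to 6 (Saturday). A minimal sketch on four hand-picked dates (a Friday through a Monday in October 2012), chosen only for illustration:

```r
# Friday, Saturday, Sunday, Monday
dates <- as.Date(c("2012-10-05", "2012-10-06", "2012-10-07", "2012-10-08"))

# wday is 0 for Sunday and 6 for Saturday
day_type <- ifelse(as.POSIXlt(dates)$wday %in% c(0, 6), "weekend", "weekday")
print(data.frame(date = dates, wDay = day_type))
```

Using the numeric `wday` instead of `weekdays()` keeps the result independent of the session locale, since `weekdays()` returns day names in the local language.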
Now it is possible to use, again, a time series plot with 'interval' on the X-axis and
the average number of steps per interval (averaged across days) on the Y-axis, and compare
the activity of weekdays versus weekend days.
ggplot(adi, aes(interval, steps)) +
geom_line() +
facet_grid(wDay ~ .) +
xlab("5-minute Interval") +
ylab("Average Number of Steps")
## Error in ggplot(adi, aes(interval, steps)): object 'adi' not found