4/12/2020

Introduction

For this project we’ll reuse some of the plotting from the first assignment of the Reproducible Research course (which you can see here: https://github.com/alroru95/RepData_PeerAssessment1). The plot shown here compares activity patterns (average number of steps per 5-minute interval) between weekdays and weekends, after imputing NA values. The steps to process the data are shown in this presentation.

Data load

URLzip <- 
  "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2Factivity.zip"
download.file(URLzip, "./RepData.zip", method = "curl")
unzip("./RepData.zip", exdir = "./RepData")
steps <- read.csv("./RepData/activity.csv", header = TRUE, sep = ",")
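
Before imputing, it can help to check how many values are missing in each column. The following is a minimal sketch on a made-up data frame (`toy`, with illustrative values, not the real `activity.csv` contents) that has the same three columns:

```r
# Toy data frame with made-up values; it only mimics the columns of activity.csv
toy <- data.frame(
  steps    = c(NA, 3, NA, 12),
  date     = c("2012-10-01", "2012-10-01", "2012-10-02", "2012-10-02"),
  interval = c(0, 5, 0, 5)
)

# Number of NA values in each column
na_per_column <- colSums(is.na(toy))

# Share of rows with a missing step count
na_share <- mean(is.na(toy$steps))

print(na_per_column)
print(na_share)
```

The same two calls could be run on `steps` after loading the real data.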

Imputing NAs and aggregating per day type

library(imputeMissings)
imputed_steps <- impute(steps, object = NULL, method = "median/mode" , 
                        flag = FALSE)
imputed_steps_day <- aggregate(steps ~ date, data = imputed_steps, 
                               FUN = sum, na.rm = TRUE)
imputed_steps$date <- as.Date(imputed_steps$date, "%Y-%m-%d")
weekday <- weekdays(imputed_steps$date)
weekday_steps <- cbind(imputed_steps, weekday)
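## weekdays() returns locale-dependent names (e.g. "sábado" and "domingo" in a
## Spanish locale), so classify by ISO weekday number instead:
## "6" = Saturday, "7" = Sunday.
weekday_steps$DayType <-
  ifelse(format(weekday_steps$date, "%u") %in% c("6", "7"),
         "weekend", "weekday")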
weekday_steps_interval <- aggregate(steps ~ interval + DayType, 
                                    data = weekday_steps, FUN = mean)
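
To make the processing easier to follow, here is a minimal base-R sketch of the same idea on a tiny made-up data set. It is an illustration under assumptions, not the code used for the real plot: the values in `toy` are invented, and the median replacement is written by hand rather than through `imputeMissings`.

```r
# Toy data with made-up values; 2012-10-05 is a Friday, 2012-10-08 a Monday
toy <- data.frame(
  steps    = c(NA, 10, 0, NA, 4, 20),
  date     = as.Date(c("2012-10-05", "2012-10-05", "2012-10-06",
                       "2012-10-06", "2012-10-07", "2012-10-08")),
  interval = c(0, 5, 0, 5, 0, 0)
)

# Replace missing step counts with the median of the observed values
toy$steps[is.na(toy$steps)] <- median(toy$steps, na.rm = TRUE)

# "%u" is the ISO weekday number as text ("1" = Monday ... "7" = Sunday),
# so the result does not depend on the session locale
toy$DayType <- ifelse(format(toy$date, "%u") %in% c("6", "7"),
                      "weekend", "weekday")

# Average steps per interval and day type
toy_interval <- aggregate(steps ~ interval + DayType, data = toy, FUN = mean)
print(toy_interval)
```

Classifying by weekday number rather than by name means the sketch should give the same result whatever language the R session uses for day names.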

Plotting

library(plotly)
## Set the trace type explicitly so plotly does not have to guess it.
plot_ly(weekday_steps_interval, x = ~interval, y = ~steps, 
        color = ~DayType, colors = "Set1", 
        type = "scatter", mode = "markers")