April 9, 2017

Background

This R Markdown presentation features plots created with Plotly.

The dataset, activity.csv, comes from the course website.

This is the same dataset provided for the Week 2 assignment of Reproducible Research in the Data Science course I took. The data come from a personal activity monitoring device that records data at 5-minute intervals throughout the day. The dataset covers two months of data from an anonymous individual, collected during October and November 2012, and includes the number of steps taken in each 5-minute interval of each day.

Load and process the data

activity <- read.csv("activity.csv",na.strings = "NA")
library(plotly)
head(activity)
  steps       date interval
1    NA 2012-10-01        0
2    NA 2012-10-01        5
3    NA 2012-10-01       10
4    NA 2012-10-01       15
5    NA 2012-10-01       20
6    NA 2012-10-01       25

The first rows of the dataset contain NAs, so let us count the total number of missing values (that is, the number of rows with NA) and their share of the total observations.

     steps             date               interval     
 Min.   :  0.00   Min.   :2012-10-01   Min.   :   0.0  
 1st Qu.:  0.00   1st Qu.:2012-10-16   1st Qu.: 588.8  
 Median :  0.00   Median :2012-10-31   Median :1177.5  
 Mean   : 37.38   Mean   :2012-10-31   Mean   :1177.5  
 3rd Qu.: 12.00   3rd Qu.:2012-11-15   3rd Qu.:1766.2  
 Max.   :806.00   Max.   :2012-11-30   Max.   :2355.0  
 NA's   :2304                                          
CountNA_steps
[1] 2304
RatioNA_steps
[1] 0.1311475
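
The code that produced `CountNA_steps` and `RatioNA_steps` is not shown on the slide. A minimal sketch of one way to compute them follows, using a small made-up data frame `toy` (hypothetical values, same column names as activity.csv) so that it runs on its own:

```r
# Toy stand-in for activity.csv: hypothetical values, same column names
toy <- data.frame(
  steps    = c(NA, 4, NA, 10, 0, NA, 6, 8),
  date     = as.Date(rep(c("2012-10-01", "2012-10-02"), each = 4)),
  interval = rep(c(0, 5, 10, 15), times = 2)
)

# Number of rows whose step count is missing
CountNA_steps <- sum(is.na(toy$steps))

# Share of missing rows: the mean of a logical vector is the proportion of TRUEs
RatioNA_steps <- mean(is.na(toy$steps))

CountNA_steps
RatioNA_steps
```

With the real data, replacing `toy` with `activity` should give the figures printed above, assuming the counts were computed on the "steps" column alone.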

There are 2304 missing values in the "steps" column, a proportion of 0.1311475 (about 13.1%) of the total observations.

That share is substantial, so let us fill in each missing value with the mean number of steps for the corresponding 5-minute interval.

Result from the new data frame:

head(activity)
      steps       date interval
1 1.7169811 2012-10-01        0
2 0.3396226 2012-10-01        5
3 0.1320755 2012-10-01       10
4 0.1509434 2012-10-01       15
5 0.0754717 2012-10-01       20
6 2.0943396 2012-10-01       25
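
The imputation code itself is hidden on the slide. The sketch below shows one possible approach in base R, replacing each NA with the mean of its 5-minute interval; the data frame `toy` and its values are made up for illustration, and the original may well have used a different method:

```r
# Toy stand-in for activity.csv: hypothetical values, same column names
toy <- data.frame(
  steps    = c(NA, 4, 2, 10, 0, NA, 6, 8),
  date     = as.Date(rep(c("2012-10-01", "2012-10-02"), each = 4)),
  interval = rep(c(0, 5, 10, 15), times = 2)
)

# Mean steps per interval, ignoring NAs, repeated so it lines up with every row
interval_mean <- ave(toy$steps, toy$interval,
                     FUN = function(x) mean(x, na.rm = TRUE))

# Overwrite only the missing entries with their interval mean
missing <- is.na(toy$steps)
toy$steps[missing] <- interval_mean[missing]

toy
```

`ave()` returns a vector as long as its input, which avoids a separate merge step. An interval with no observed values at all would still yield NaN, so it is worth checking `anyNA()` afterwards.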

Create plots with Plotly

1. Histogram of the total number of steps taken each day

head(TotalSteps)
        date TotalSteps
1 2012-10-01   10766.19
2 2012-10-02     126.00
3 2012-10-03   11352.00
4 2012-10-04   12116.00
5 2012-10-05   13294.00
6 2012-10-06   15420.00
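
How `TotalSteps` was built is not shown. A minimal sketch, assuming a plain sum of steps per date with base R's `aggregate()` and using a made-up data frame `toy` in place of the imputed `activity`:

```r
# Toy stand-in for the imputed data: hypothetical values, same column names
toy <- data.frame(
  steps    = c(1, 4, 2, 10, 0, 4, 6, 8),
  date     = as.Date(rep(c("2012-10-01", "2012-10-02"), each = 4)),
  interval = rep(c(0, 5, 10, 15), times = 2)
)

# Sum the steps within each date; the result has one row per day
TotalSteps <- aggregate(steps ~ date, data = toy, FUN = sum)

# Rename the summed column to match the output shown on the slide
names(TotalSteps)[2] <- "TotalSteps"

TotalSteps
```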

plot_ly(TotalSteps, x = ~TotalSteps, type = "histogram")

2. Time series plot of the average number of steps taken each day

head(MeanPerDay)
        date MeanSteps
1 2012-10-01  37.38260
2 2012-10-02   0.43750
3 2012-10-03  39.41667
4 2012-10-04  42.06944
5 2012-10-05  46.15972
6 2012-10-06  53.54167
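
The daily means can be derived the same way as the daily totals, swapping the sum for a mean. This is a sketch under that assumption, again on a made-up data frame `toy` rather than the real `activity` data:

```r
# Toy stand-in for the imputed data: hypothetical values, same column names
toy <- data.frame(
  steps    = c(1, 4, 2, 10, 0, 4, 6, 8),
  date     = as.Date(rep(c("2012-10-01", "2012-10-02"), each = 4)),
  interval = rep(c(0, 5, 10, 15), times = 2)
)

# Average the steps within each date; the result has one row per day
MeanPerDay <- aggregate(steps ~ date, data = toy, FUN = mean)

# Rename the averaged column to match the output shown on the slide
names(MeanPerDay)[2] <- "MeanSteps"

MeanPerDay
```

Because every day in the real data has the same number of intervals, each daily mean is simply that day's total divided by the interval count, so the time series has the same shape as the daily totals.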

plot_ly(MeanPerDay, x = ~date, y = ~MeanSteps, type = "scatter", mode = "lines")

3. Plot comparing the average number of steps taken per 5-minute interval across weekdays and weekends

head(activitySummary)
  interval   MeanSteps DayType
1        0 0.214622642 weekend
2        5 0.042452830 weekend
3       10 0.016509434 weekend
4       15 0.018867925 weekend
5       20 0.009433962 weekend
6       25 3.511792453 weekend
tail(activitySummary)
    interval MeanSteps DayType
571     2330 3.0360587 weekday
572     2335 2.2486373 weekday
573     2340 2.2402516 weekday
574     2345 0.2633124 weekday
575     2350 0.2968553 weekday
576     2355 1.4100629 weekday
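
The slide does not show how `DayType` and `activitySummary` were derived. One possible sketch labels each date as weekday or weekend and then averages steps per interval within each label. The data frame `toy` is made up, and using `as.POSIXlt()$wday` instead of `weekdays()` is a choice made here to avoid locale-dependent day names:

```r
# Toy stand-in for the imputed data: a Friday, a Saturday and a Sunday
toy <- data.frame(
  steps    = c(2, 4, 6, 8, 10, 12),
  date     = as.Date(rep(c("2012-10-05", "2012-10-06", "2012-10-07"), each = 2)),
  interval = rep(c(0, 5), times = 3)
)

# $wday counts days from Sunday = 0 to Saturday = 6
wday <- as.POSIXlt(toy$date)$wday
toy$DayType <- ifelse(wday %in% c(0, 6), "weekend", "weekday")

# Mean steps for every combination of interval and day type
activitySummary <- aggregate(steps ~ interval + DayType, data = toy, FUN = mean)
names(activitySummary)[3] <- "MeanSteps"

activitySummary
```

The result has one row per interval and day type, which is the long layout that the `color = ~DayType` argument of `plot_ly()` expects.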

plot_ly(activitySummary, x = ~interval, y = ~MeanSteps, color = ~DayType, type = "scatter", mode = "lines", colors = c("blue","green"))

— END —