Introduction

Ask

Prepare

Process

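# Install the packages (only needed once per machine).
# tidyverse already includes ggplot2, dplyr and tidyr; they are listed separately for clarity.
install.packages("tidyverse")
install.packages("ggplot2")
install.packages("dplyr")
install.packages("tidyr")
install.packages("lubridate")
install.packages("janitor")
# Load the packages for this session.
library(tidyverse)
library(ggplot2)
library(dplyr)
library(tidyr)
library(lubridate)
library(janitor)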

Import dataset

dailyActivity_merged<-read_csv("dailyActivity_merged.csv")
sleepDay_merged<-read_csv("sleepDay_merged.csv")
dailySteps_merged<-read_csv("dailySteps_merged.csv")
hourlySteps_merged<-read_csv("hourlySteps_merged.csv")

Process data for dailyActivity_merged

Check the structure and preview the dataframe

head(dailyActivity_merged)
str(dailyActivity_merged)
## spec_tbl_df [940 × 15] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Id                      : num [1:940] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityDate            : chr [1:940] "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
##  $ TotalSteps              : num [1:940] 13162 10735 10460 9762 12669 ...
##  $ TotalDistance           : num [1:940] 8.5 6.97 6.74 6.28 8.16 ...
##  $ TrackerDistance         : num [1:940] 8.5 6.97 6.74 6.28 8.16 ...
##  $ LoggedActivitiesDistance: num [1:940] 0 0 0 0 0 0 0 0 0 0 ...
##  $ VeryActiveDistance      : num [1:940] 1.88 1.57 2.44 2.14 2.71 ...
##  $ ModeratelyActiveDistance: num [1:940] 0.55 0.69 0.4 1.26 0.41 ...
##  $ LightActiveDistance     : num [1:940] 6.06 4.71 3.91 2.83 5.04 ...
##  $ SedentaryActiveDistance : num [1:940] 0 0 0 0 0 0 0 0 0 0 ...
##  $ VeryActiveMinutes       : num [1:940] 25 21 30 29 36 38 42 50 28 19 ...
##  $ FairlyActiveMinutes     : num [1:940] 13 19 11 34 10 20 16 31 12 8 ...
##  $ LightlyActiveMinutes    : num [1:940] 328 217 181 209 221 164 233 264 205 211 ...
##  $ SedentaryMinutes        : num [1:940] 728 776 1218 726 773 ...
##  $ Calories                : num [1:940] 1985 1797 1776 1745 1863 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Id = col_double(),
##   ..   ActivityDate = col_character(),
##   ..   TotalSteps = col_double(),
##   ..   TotalDistance = col_double(),
##   ..   TrackerDistance = col_double(),
##   ..   LoggedActivitiesDistance = col_double(),
##   ..   VeryActiveDistance = col_double(),
##   ..   ModeratelyActiveDistance = col_double(),
##   ..   LightActiveDistance = col_double(),
##   ..   SedentaryActiveDistance = col_double(),
##   ..   VeryActiveMinutes = col_double(),
##   ..   FairlyActiveMinutes = col_double(),
##   ..   LightlyActiveMinutes = col_double(),
##   ..   SedentaryMinutes = col_double(),
##   ..   Calories = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

Check for number of users and null values

  • Check for number of users
n_distinct(dailyActivity_merged$Id)
## [1] 33
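  • Check for null values. is.null() tests whether the data frame object itself is NULL, so FALSE here only confirms that the data loaded; missing values (NA) inside columns need a separate check such as anyNA().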
is.null(dailyActivity_merged)
## [1] FALSE
head(dailyActivity_merged$ActivityDate)
## [1] "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" "4/16/2016" "4/17/2016"
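
Because is.null() returns FALSE for any data frame, with or without gaps, a missing-value check needs is.na() or anyNA(). The following is a minimal sketch in base R; the toy data frame activity_demo and its values are invented for illustration and are not taken from the dataset.

```r
# Toy stand-in for dailyActivity_merged; the values are invented.
activity_demo <- data.frame(
  Id         = c(1, 1, 2),
  TotalSteps = c(13162, NA, 10460)
)

# is.null() tests the object itself, not its contents.
object_is_null <- is.null(activity_demo)

# anyNA() reports whether at least one cell is missing.
has_missing <- anyNA(activity_demo)

# colSums(is.na()) counts the missing cells in each column.
missing_per_column <- colSums(is.na(activity_demo))

print(object_is_null)
print(has_missing)
print(missing_per_column)
```

The same three calls can be applied to each imported table in turn; a per-column count of zero everywhere is what "no missing values" would look like.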

Change of date column format

  • The date column is stored as text in M/D/YYYY form, which is hard to read and sort. For easier reference, it is converted to a Date, which prints as YYYY-MM-DD
dailyActivity_merged$ActivityDate <- as.Date(dailyActivity_merged$ActivityDate, format = "%m/%d/%Y")
head(dailyActivity_merged$ActivityDate)
## [1] "2016-04-12" "2016-04-13" "2016-04-14" "2016-04-15" "2016-04-16"
## [6] "2016-04-17"
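
as.Date() returns NA for any string that does not match the given format, so it is worth counting NAs after a conversion. A small sketch, assuming a hypothetical vector raw_dates whose last entry is deliberately invalid:

```r
# Hypothetical input; the last value is deliberately not a valid date.
raw_dates <- c("4/12/2016", "4/13/2016", "13/40/2016")

# Strings that do not match the format become NA instead of raising an error.
parsed <- as.Date(raw_dates, format = "%m/%d/%Y")

# Count the failures and inspect the range of the dates that did parse.
n_failed   <- sum(is.na(parsed))
date_range <- range(parsed, na.rm = TRUE)

print(n_failed)
print(date_range)
```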

Process data for sleepday

Check the structure and preview the dataframe

head(sleepDay_merged)
str(sleepDay_merged)
## spec_tbl_df [413 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Id                : num [1:413] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ SleepDay          : chr [1:413] "4/12/2016 12:00:00 AM" "4/13/2016 12:00:00 AM" "4/15/2016 12:00:00 AM" "4/16/2016 12:00:00 AM" ...
##  $ TotalSleepRecords : num [1:413] 1 2 1 2 1 1 1 1 1 1 ...
##  $ TotalMinutesAsleep: num [1:413] 327 384 412 340 700 304 360 325 361 430 ...
##  $ TotalTimeInBed    : num [1:413] 346 407 442 367 712 320 377 364 384 449 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Id = col_double(),
##   ..   SleepDay = col_character(),
##   ..   TotalSleepRecords = col_double(),
##   ..   TotalMinutesAsleep = col_double(),
##   ..   TotalTimeInBed = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

Check for number of users and null values

  • Check for number of users
n_distinct(sleepDay_merged$Id)
## [1] 24
  • Check that the data frame is not NULL. FALSE = the object exists; is.null() does not detect missing values (NA) inside columns
is.null(sleepDay_merged)
## [1] FALSE

Convert sleepday datetime to date

# Keep only the date part of SleepDay; the time component is dropped
sleepDay_merged$SleepDay <- as.Date(sleepDay_merged$SleepDay, format = "%m/%d/%Y")
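
This conversion works on strings such as "4/12/2016 12:00:00 AM" because as.Date() parses only as much of each string as the format describes and ignores the rest. Dropping the time is harmless only if every record sits at midnight, which the sketch below checks first. The vector sleep_raw is a made-up sample in the same layout as the SleepDay column.

```r
# Made-up sample in the same layout as the SleepDay column.
sleep_raw <- c("4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM")

# Confirm that every timestamp is midnight before discarding the time part.
all_midnight <- all(grepl("12:00:00 AM$", sleep_raw))

# The format covers only the date; the trailing time text is ignored.
sleep_date <- as.Date(sleep_raw, format = "%m/%d/%Y")

print(all_midnight)
print(sleep_date)
```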

Process data for dailysteps

Check the structure and preview the dataframe

head(dailySteps_merged)
str(dailySteps_merged)
## spec_tbl_df [940 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Id         : num [1:940] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityDay: chr [1:940] "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
##  $ StepTotal  : num [1:940] 13162 10735 10460 9762 12669 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Id = col_double(),
##   ..   ActivityDay = col_character(),
##   ..   StepTotal = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

Check for number of users and null values

  • Check for number of users
n_distinct(dailySteps_merged$Id)
## [1] 33
  • Check that the data frame is not NULL. FALSE = the object exists; is.null() does not detect missing values (NA) inside columns
is.null(dailySteps_merged)
## [1] FALSE

Convert dailysteps date text to date

dailySteps_merged$ActivityDay <- as.Date(dailySteps_merged$ActivityDay, format = "%m/%d/%Y")

Process data for hourlysteps

Check the structure and preview the dataframe

head(hourlySteps_merged)
str(hourlySteps_merged)
## spec_tbl_df [22,099 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Id          : num [1:22099] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityHour: chr [1:22099] "4/12/2016 12:00:00 AM" "4/12/2016 1:00:00 AM" "4/12/2016 2:00:00 AM" "4/12/2016 3:00:00 AM" ...
##  $ StepTotal   : num [1:22099] 373 160 151 0 0 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Id = col_double(),
##   ..   ActivityHour = col_character(),
##   ..   StepTotal = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

Check for number of users and null values

  • Check for number of users
n_distinct(hourlySteps_merged$Id)
## [1] 33
  • Check that the data frame is not NULL. FALSE = the object exists; is.null() does not detect missing values (NA) inside columns
is.null(hourlySteps_merged)
## [1] FALSE

Rename column names and format date/time

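# Rename the timestamp column and parse the 12-hour clock strings as date-times
Hourly_steps <- hourlySteps_merged %>%
  rename(date_time = ActivityHour) %>%
  mutate(date_time = as.POSIXct(date_time, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone()))

# Split the date-time into separate date and time columns
Hourly_steps <- Hourly_steps %>%
  separate(date_time, into = c("date", "time"), sep = " ") %>%
  mutate(date = ymd(date))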

head(Hourly_steps)
str(Hourly_steps)
## tibble [22,099 × 4] (S3: tbl_df/tbl/data.frame)
##  $ Id       : num [1:22099] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ date     : Date[1:22099], format: "2016-04-12" "2016-04-12" ...
##  $ time     : chr [1:22099] "00:00:00" "01:00:00" "02:00:00" "03:00:00" ...
##  $ StepTotal: num [1:22099] 373 160 151 0 0 ...
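
The split above depends on the date-time column being converted to text before it is separated, and the way midnight values are printed in that conversion may differ between R versions. A minimal alternative sketch, using base R only, derives the date and the time directly from the parsed date-time. The sample strings and the fixed "UTC" time zone are assumptions chosen for illustration, not the method used in this analysis.

```r
# Made-up sample in the same layout as the ActivityHour column.
hourly_raw <- c("4/12/2016 12:00:00 AM", "4/12/2016 1:00:00 AM", "4/12/2016 1:00:00 PM")

# Parse the 12-hour clock strings; %p (AM/PM) assumes an English-style locale.
# A fixed time zone is an assumption that keeps the result independent of the machine.
date_time <- as.POSIXct(hourly_raw, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Derive the calendar date and the 24-hour time without a text round trip.
hourly_date <- as.Date(date_time, tz = "UTC")
hourly_time <- format(date_time, "%H:%M:%S")

print(hourly_date)
print(hourly_time)
```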

Rename sleep day date column

sleepDay_merged <- rename(sleepDay_merged, ActivityDate = SleepDay)

Merge dailyactivity with sleepday

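  • Merging the two tables on Id and ActivityDate allows activity and sleep to be visualized together. merge() performs an inner join by default, so only user-days present in both tables are kept
data_merged <- merge(dailyActivity_merged, sleepDay_merged, by = c("Id", "ActivityDate"))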
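
To illustrate what the default join keeps and drops, here is a small sketch with two invented tables. The names activity_demo and sleep_demo and all values are hypothetical; all.x = TRUE is the base R option that keeps every activity row even when no sleep record exists.

```r
# Invented activity table: three user-days.
activity_demo <- data.frame(
  Id           = c(1, 1, 2),
  ActivityDate = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
  TotalSteps   = c(13162, 10735, 9000)
)

# Invented sleep table: only two of those user-days have a sleep record.
sleep_demo <- data.frame(
  Id                 = c(1, 2),
  ActivityDate       = as.Date(c("2016-04-12", "2016-04-12")),
  TotalMinutesAsleep = c(327, 400)
)

# Default merge: inner join, unmatched activity rows are dropped.
inner <- merge(activity_demo, sleep_demo, by = c("Id", "ActivityDate"))

# all.x = TRUE: left join, unmatched activity rows are kept with NA sleep values.
left <- merge(activity_demo, sleep_demo, by = c("Id", "ActivityDate"), all.x = TRUE)

print(nrow(inner))
print(nrow(left))
print(sum(is.na(left$TotalMinutesAsleep)))
```

Comparing nrow() before and after the merge is a quick way to see how many activity days had no matching sleep record.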

Analyze

Correlations

Correlation between steps and sedentary minutes

From the graph above, it can be seen that:

  • Total steps generally fall between 0 and 15,000.

  • Steps and sedentary minutes are inversely correlated: as steps increase, sedentary minutes decrease.

  • Above 15,000 total steps, sedentary minutes increase slightly. However, there are too few measurements at 15,000 steps and above, so more data is required to confirm this observation.

Correlation between steps and calories

From the graph above, it can be seen that:

  • Steps are directly correlated with calories: as total steps increase, calories increase.

Correlation between steps and sleep minutes

From the graph above, it can be seen that there is no apparent correlation between steps and sleep minutes.
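
The visual impressions in this section can be checked numerically with Pearson's correlation coefficient, which cor() computes by default. The sketch below uses an invented data frame cor_demo to show the sign convention only; its numbers are not results from the Fitbit data.

```r
# Invented values chosen so that the direction of each relationship is obvious.
cor_demo <- data.frame(
  TotalSteps       = c(2000, 5000, 8000, 11000, 14000),
  SedentaryMinutes = c(1200, 1100, 950, 800, 700),
  Calories         = c(1600, 1900, 2200, 2500, 2900)
)

# A negative coefficient indicates an inverse relationship.
cor_steps_sedentary <- cor(cor_demo$TotalSteps, cor_demo$SedentaryMinutes)

# A positive coefficient indicates a direct relationship.
cor_steps_calories <- cor(cor_demo$TotalSteps, cor_demo$Calories)

print(cor_steps_sedentary)
print(cor_steps_calories)
```

On the real data the same call could be applied to columns of data_merged, for example TotalSteps against TotalMinutesAsleep; a coefficient close to zero would be consistent with the absence of a linear relationship.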

Determine which days of the week are fairly active

According to Healthline, the minimum total steps for a fairly active day is 7,500.

From the bar chart above, the days that are fairly active are Monday, Tuesday and Saturday.
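
A comparison of this kind can be sketched by averaging steps per weekday and testing each average against the 7,500-step threshold. The data frame steps_demo and its values are invented; format(date, "%u") is used here because it returns the weekday as a number (1 = Monday) regardless of locale.

```r
# Invented daily totals covering two Mondays, two Tuesdays and two Wednesdays.
steps_demo <- data.frame(
  ActivityDate = as.Date(c("2016-04-11", "2016-04-12", "2016-04-13",
                           "2016-04-18", "2016-04-19", "2016-04-20")),
  TotalSteps   = c(8000, 9000, 6000, 7600, 8200, 7000)
)

# Weekday as a number: "1" = Monday ... "7" = Sunday.
steps_demo$weekday <- format(steps_demo$ActivityDate, "%u")

# Average steps per weekday.
mean_steps <- tapply(steps_demo$TotalSteps, steps_demo$weekday, mean)

# TRUE where the weekday average reaches the 7,500-step threshold.
fairly_active <- mean_steps >= 7500

print(mean_steps)
print(fairly_active)
```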

Determine which day of the week has enough sleep

According to the National Sleep Foundation, healthy adults need between 7 and 9 hours of sleep per night. Taking the midpoint of that range, 8 hours is used as the target.

From the bar chart above, it can be observed that average sleep does not reach the advised 8 hours on any day of the week.
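
Since the recommendation is a range rather than a single value, the choice of threshold affects the conclusion. The sketch below, on an invented vector minutes_asleep, compares the 8-hour midpoint (480 minutes) with the full 7 to 9 hour range (420 to 540 minutes). It is an illustration of the comparison, not a result from the dataset.

```r
# Invented nightly sleep durations in minutes.
minutes_asleep <- c(327, 384, 412, 340, 700, 430, 480)

# Share of nights reaching the 8-hour midpoint used as the target.
share_at_least_8h <- mean(minutes_asleep >= 480)

# Share of nights inside the full recommended range of 7 to 9 hours.
share_in_range <- mean(minutes_asleep >= 420 & minutes_asleep <= 540)

print(share_at_least_8h)
print(share_in_range)
```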

Determine the activity of total steps throughout the day

It can be observed that users are generally most active during the early afternoon (1200-1300) and the early evening (1700-1800).
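
The busiest hours can also be read off numerically by averaging steps per hour of day. This sketch assumes a data frame with the same time and StepTotal columns as Hourly_steps; the toy table hourly_demo and its values are invented.

```r
# Invented hourly records with the same column names as Hourly_steps.
hourly_demo <- data.frame(
  time      = c("12:00:00", "12:00:00", "18:00:00", "18:00:00", "03:00:00", "03:00:00"),
  StepTotal = c(500, 700, 600, 400, 0, 20)
)

# Average steps for each hour of the day.
mean_by_hour <- tapply(hourly_demo$StepTotal, hourly_demo$time, mean)

# Hour with the highest average step count.
busiest_hour <- names(mean_by_hour)[which.max(mean_by_hour)]

print(mean_by_hour)
print(busiest_hour)
```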

Visualize the activity level of all users through active minutes

Activity level can be measured through heart rate or total steps. According to the Centers for Disease Control and Prevention, activity intensity depends on age and on the percentage of maximum heart rate. In this data set, the minutes at each activity level are based on heart rate and were recorded by Fitbit devices.

From the pie chart, it can be observed that:

  • About 81% of tracked minutes are sedentary. This could mean that most Fitbit users are students or office workers with desk jobs.

  • 16% of tracked minutes are lightly active. This might be due to commuting to work or school.

  • 1% of tracked minutes are fairly active and 2% are very active. Marketing could aim to increase the share of these two groups.
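
Shares like these come from summing each minutes column over all records and dividing by the grand total, so they describe how tracked time is distributed rather than how many users fall in each group. A minimal sketch, using a two-row toy table minutes_demo with the dataset's column names:

```r
# Two illustrative rows with the same column names as dailyActivity_merged.
minutes_demo <- data.frame(
  VeryActiveMinutes    = c(25, 21),
  FairlyActiveMinutes  = c(13, 19),
  LightlyActiveMinutes = c(328, 217),
  SedentaryMinutes     = c(728, 776)
)

# Total minutes spent at each activity level across all rows.
totals <- colSums(minutes_demo)

# Each level's share of all tracked minutes, in percent.
share_pct <- round(100 * totals / sum(totals), 1)

# Activity level that accounts for the largest share.
largest_level <- names(which.max(share_pct))

print(share_pct)
print(largest_level)
```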

Share

Note: In a real-world task, a PowerPoint presentation would be prepared instead of the summary below.

After the visualizations, some observations on user trends are as follows:

  1. Generally, Fitbit users are not very active. This is supported by:
  • Only 3 of the 7 days of the week (Monday, Tuesday and Saturday) reach the minimum of 7,500 steps
  • About 81% of tracked minutes are sedentary
  • Bellabeat’s marketing team can work on this area to lower sedentary minutes and increase weekly steps, encouraging a healthier lifestyle

  2. Users usually do not get enough sleep throughout the week; Sunday has the longest sleep, at around 7.5 hours.

  3. With reference to the heatmap of total steps against time and weekday:

  • Users have a lower total step count on Sunday
  • Users have a higher total step count on Tuesday and Wednesday
  • Users are generally more active around lunch time (1200-1300) and dinner time (1700-1800)
  • Users are much less active after 2000 hrs and before 1100 hrs

Act (Recommendations)

Below are the future actions recommended to Bellabeat:

There could be many reasons for a user to have a low step count and an inactive lifestyle, such as not carrying the tracker, finding the tracker unfashionable, or short battery life. Hence, the company is strongly advised to: