Introduction

Bellabeat

Bellabeat, founded by Urška Sršen and Sando Mur, is a high-tech company that manufactures health-focused smart products. Sršen used her background as an artist to develop beautifully designed technology that informs and inspires women around the world. Collecting data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with knowledge about their own health and habits. Since it was founded in 2013, Bellabeat has grown rapidly and quickly positioned itself as a tech-driven wellness company for women.

By 2016, Bellabeat had opened offices around the world and launched multiple products. Bellabeat products became available through a growing number of online retailers in addition to the company's own e-commerce channel on its website. The company has invested in traditional advertising media, such as radio, out-of-home billboards, print, and television, but focuses extensively on digital marketing. Bellabeat invests year-round in Google Search, maintains active Facebook and Instagram pages, and consistently engages consumers on Twitter. Additionally, Bellabeat runs video ads on YouTube and display ads on the Google Display Network to support campaigns around key marketing dates.

Products:
  • Bellabeat app: The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. This data can help users better understand their current habits and make healthy decisions. The Bellabeat app connects to their line of smart wellness products.

  • Leaf: Bellabeat’s classic wellness tracker can be worn as a bracelet, necklace, or clip. The Leaf tracker connects to the Bellabeat app to track activity, sleep, and stress.

  • Time: This wellness watch combines the timeless look of a classic timepiece with smart technology to track user activity, sleep, and stress. The Time watch connects to the Bellabeat app to provide you with insights into your daily wellness.

  • Spring: This is a water bottle that tracks daily water intake using smart technology to ensure that you are appropriately hydrated throughout the day. The Spring bottle connects to the Bellabeat app to track your hydration levels.

  • Bellabeat membership: Bellabeat also offers a subscription-based membership program for users. Membership gives users 24/7 access to fully personalized guidance on nutrition, activity, sleep, health and beauty, and mindfulness based on their lifestyle and goals.

Phase 1: ASK

Business Task

Analyze smart device usage data to gain insight into how consumers use non-Bellabeat smart devices.

Questions for the Analysis

  • What are some trends in smart device usage?
  • How could these trends apply to Bellabeat customers?
  • How could these trends help influence Bellabeat marketing strategy?

Phase 2: PREPARE

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.7     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
dailyActivity <- read_csv("dailyActivity_merged.csv")
## Rows: 940 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): ActivityDate
## dbl (14): Id, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDi...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
sleep_data <- read_csv("sleepDay_merged.csv")
## Rows: 413 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): SleepDay
## dbl (4): Id, TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
weight_info <- read_csv("weightLogInfo_merged.csv")
## Rows: 67 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Date
## dbl (6): Id, WeightKg, WeightPounds, Fat, BMI, LogId
## lgl (1): IsManualReport
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(dailyActivity)
## # A tibble: 6 × 15
##        Id ActivityDate TotalSteps TotalDistance TrackerDistance LoggedActivitie…
##     <dbl> <chr>             <dbl>         <dbl>           <dbl>            <dbl>
## 1  1.50e9 4/12/2016         13162          8.5             8.5                 0
## 2  1.50e9 4/13/2016         10735          6.97            6.97                0
## 3  1.50e9 4/14/2016         10460          6.74            6.74                0
## 4  1.50e9 4/15/2016          9762          6.28            6.28                0
## 5  1.50e9 4/16/2016         12669          8.16            8.16                0
## 6  1.50e9 4/17/2016          9705          6.48            6.48                0
## # … with 9 more variables: VeryActiveDistance <dbl>,
## #   ModeratelyActiveDistance <dbl>, LightActiveDistance <dbl>,
## #   SedentaryActiveDistance <dbl>, VeryActiveMinutes <dbl>,
## #   FairlyActiveMinutes <dbl>, LightlyActiveMinutes <dbl>,
## #   SedentaryMinutes <dbl>, Calories <dbl>
head(sleep_data)
## # A tibble: 6 × 5
##           Id SleepDay           TotalSleepRecor… TotalMinutesAsl… TotalTimeInBed
##        <dbl> <chr>                         <dbl>            <dbl>          <dbl>
## 1 1503960366 4/12/2016 12:00:0…                1              327            346
## 2 1503960366 4/13/2016 12:00:0…                2              384            407
## 3 1503960366 4/15/2016 12:00:0…                1              412            442
## 4 1503960366 4/16/2016 12:00:0…                2              340            367
## 5 1503960366 4/17/2016 12:00:0…                1              700            712
## 6 1503960366 4/19/2016 12:00:0…                1              304            320
head(weight_info)
## # A tibble: 6 × 8
##           Id Date       WeightKg WeightPounds   Fat   BMI IsManualReport   LogId
##        <dbl> <chr>         <dbl>        <dbl> <dbl> <dbl> <lgl>            <dbl>
## 1 1503960366 5/2/2016 …     52.6         116.    22  22.6 TRUE           1.46e12
## 2 1503960366 5/3/2016 …     52.6         116.    NA  22.6 TRUE           1.46e12
## 3 1927972279 4/13/2016…    134.          294.    NA  47.5 FALSE          1.46e12
## 4 2873212765 4/21/2016…     56.7         125.    NA  21.5 TRUE           1.46e12
## 5 2873212765 5/12/2016…     57.3         126.    NA  21.7 TRUE           1.46e12
## 6 4319703577 4/17/2016…     72.4         160.    25  27.5 TRUE           1.46e12

Phase 3: Process

n_distinct(dailyActivity$Id)
## [1] 33
n_distinct(sleep_data$Id)
## [1] 24
n_distinct(weight_info$Id)
## [1] 8
dailyActivity <- dailyActivity %>%
  rename(date = ActivityDate)
sleep_data <- sleep_data %>%
  rename(Date = SleepDay)
str(dailyActivity)
## spec_tbl_df [940 × 15] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Id                      : num [1:940] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ date                    : chr [1:940] "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
##  $ TotalSteps              : num [1:940] 13162 10735 10460 9762 12669 ...
##  $ TotalDistance           : num [1:940] 8.5 6.97 6.74 6.28 8.16 ...
##  $ TrackerDistance         : num [1:940] 8.5 6.97 6.74 6.28 8.16 ...
##  $ LoggedActivitiesDistance: num [1:940] 0 0 0 0 0 0 0 0 0 0 ...
##  $ VeryActiveDistance      : num [1:940] 1.88 1.57 2.44 2.14 2.71 ...
##  $ ModeratelyActiveDistance: num [1:940] 0.55 0.69 0.4 1.26 0.41 ...
##  $ LightActiveDistance     : num [1:940] 6.06 4.71 3.91 2.83 5.04 ...
##  $ SedentaryActiveDistance : num [1:940] 0 0 0 0 0 0 0 0 0 0 ...
##  $ VeryActiveMinutes       : num [1:940] 25 21 30 29 36 38 42 50 28 19 ...
##  $ FairlyActiveMinutes     : num [1:940] 13 19 11 34 10 20 16 31 12 8 ...
##  $ LightlyActiveMinutes    : num [1:940] 328 217 181 209 221 164 233 264 205 211 ...
##  $ SedentaryMinutes        : num [1:940] 728 776 1218 726 773 ...
##  $ Calories                : num [1:940] 1985 1797 1776 1745 1863 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Id = col_double(),
##   ..   ActivityDate = col_character(),
##   ..   TotalSteps = col_double(),
##   ..   TotalDistance = col_double(),
##   ..   TrackerDistance = col_double(),
##   ..   LoggedActivitiesDistance = col_double(),
##   ..   VeryActiveDistance = col_double(),
##   ..   ModeratelyActiveDistance = col_double(),
##   ..   LightActiveDistance = col_double(),
##   ..   SedentaryActiveDistance = col_double(),
##   ..   VeryActiveMinutes = col_double(),
##   ..   FairlyActiveMinutes = col_double(),
##   ..   LightlyActiveMinutes = col_double(),
##   ..   SedentaryMinutes = col_double(),
##   ..   Calories = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
str(sleep_data)
## spec_tbl_df [413 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Id                : num [1:413] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ Date              : chr [1:413] "4/12/2016 12:00:00 AM" "4/13/2016 12:00:00 AM" "4/15/2016 12:00:00 AM" "4/16/2016 12:00:00 AM" ...
##  $ TotalSleepRecords : num [1:413] 1 2 1 2 1 1 1 1 1 1 ...
##  $ TotalMinutesAsleep: num [1:413] 327 384 412 340 700 304 360 325 361 430 ...
##  $ TotalTimeInBed    : num [1:413] 346 407 442 367 712 320 377 364 384 449 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Id = col_double(),
##   ..   SleepDay = col_character(),
##   ..   TotalSleepRecords = col_double(),
##   ..   TotalMinutesAsleep = col_double(),
##   ..   TotalTimeInBed = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
str(weight_info)
## spec_tbl_df [67 × 8] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Id            : num [1:67] 1.50e+09 1.50e+09 1.93e+09 2.87e+09 2.87e+09 ...
##  $ Date          : chr [1:67] "5/2/2016 11:59:59 PM" "5/3/2016 11:59:59 PM" "4/13/2016 1:08:52 AM" "4/21/2016 11:59:59 PM" ...
##  $ WeightKg      : num [1:67] 52.6 52.6 133.5 56.7 57.3 ...
##  $ WeightPounds  : num [1:67] 116 116 294 125 126 ...
##  $ Fat           : num [1:67] 22 NA NA NA NA 25 NA NA NA NA ...
##  $ BMI           : num [1:67] 22.6 22.6 47.5 21.5 21.7 ...
##  $ IsManualReport: logi [1:67] TRUE TRUE FALSE TRUE TRUE TRUE ...
##  $ LogId         : num [1:67] 1.46e+12 1.46e+12 1.46e+12 1.46e+12 1.46e+12 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Id = col_double(),
##   ..   Date = col_character(),
##   ..   WeightKg = col_double(),
##   ..   WeightPounds = col_double(),
##   ..   Fat = col_double(),
##   ..   BMI = col_double(),
##   ..   IsManualReport = col_logical(),
##   ..   LogId = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
# Parse the full timestamps first so the time of day is not lost when Date is overwritten
sleep_datetime <- mdy_hms(sleep_data$Date)
weight_datetime <- mdy_hms(weight_info$Date)
# Convert the date columns from character to Date
dailyActivity$date <- as.Date(dailyActivity$date, format = "%m/%d/%Y")
sleep_data$Date <- as.Date(sleep_datetime)
weight_info$Date <- as.Date(weight_datetime)
# Add a lowercase date column (join key) and a time-of-day column
sleep_data$date <- sleep_data$Date
sleep_data$time <- format(sleep_datetime, format = "%H:%M:%S")
weight_info$date <- weight_info$Date
weight_info$time <- format(weight_datetime, format = "%H:%M:%S")
dailyActivity$weekday <- wday(dailyActivity$date, label = TRUE)
sleep_data$weekday <- wday(sleep_data$date, label = TRUE)
weight_info$weekday <- wday(weight_info$date, label = TRUE)
head(dailyActivity)
## # A tibble: 6 × 16
##          Id date       TotalSteps TotalDistance TrackerDistance LoggedActivitie…
##       <dbl> <date>          <dbl>         <dbl>           <dbl>            <dbl>
## 1    1.50e9 2016-04-12      13162          8.5             8.5                 0
## 2    1.50e9 2016-04-13      10735          6.97            6.97                0
## 3    1.50e9 2016-04-14      10460          6.74            6.74                0
## 4    1.50e9 2016-04-15       9762          6.28            6.28                0
## 5    1.50e9 2016-04-16      12669          8.16            8.16                0
## 6    1.50e9 2016-04-17       9705          6.48            6.48                0
## # … with 10 more variables: VeryActiveDistance <dbl>,
## #   ModeratelyActiveDistance <dbl>, LightActiveDistance <dbl>,
## #   SedentaryActiveDistance <dbl>, VeryActiveMinutes <dbl>,
## #   FairlyActiveMinutes <dbl>, LightlyActiveMinutes <dbl>,
## #   SedentaryMinutes <dbl>, Calories <dbl>, weekday <ord>
head(sleep_data)
## # A tibble: 6 × 8
##        Id Date       TotalSleepRecor… TotalMinutesAsl… TotalTimeInBed date      
##     <dbl> <date>                <dbl>            <dbl>          <dbl> <date>    
## 1  1.50e9 2016-04-12                1              327            346 2016-04-12
## 2  1.50e9 2016-04-13                2              384            407 2016-04-13
## 3  1.50e9 2016-04-15                1              412            442 2016-04-15
## 4  1.50e9 2016-04-16                2              340            367 2016-04-16
## 5  1.50e9 2016-04-17                1              700            712 2016-04-17
## 6  1.50e9 2016-04-19                1              304            320 2016-04-19
## # … with 2 more variables: time <chr>, weekday <ord>
head(weight_info)
## # A tibble: 6 × 11
##           Id Date       WeightKg WeightPounds   Fat   BMI IsManualReport   LogId
##        <dbl> <date>        <dbl>        <dbl> <dbl> <dbl> <lgl>            <dbl>
## 1 1503960366 2016-05-02     52.6         116.    22  22.6 TRUE           1.46e12
## 2 1503960366 2016-05-03     52.6         116.    NA  22.6 TRUE           1.46e12
## 3 1927972279 2016-04-13    134.          294.    NA  47.5 FALSE          1.46e12
## 4 2873212765 2016-04-21     56.7         125.    NA  21.5 TRUE           1.46e12
## 5 2873212765 2016-05-12     57.3         126.    NA  21.7 TRUE           1.46e12
## 6 4319703577 2016-04-17     72.4         160.    25  27.5 TRUE           1.46e12
## # … with 3 more variables: date <date>, time <chr>, weekday <ord>
dailyActivity %>%  
  select(TotalSteps,
         TotalDistance,
         SedentaryMinutes) %>%
  summary()
##    TotalSteps    TotalDistance    SedentaryMinutes
##  Min.   :    0   Min.   : 0.000   Min.   :   0.0  
##  1st Qu.: 3790   1st Qu.: 2.620   1st Qu.: 729.8  
##  Median : 7406   Median : 5.245   Median :1057.5  
##  Mean   : 7638   Mean   : 5.490   Mean   : 991.2  
##  3rd Qu.:10727   3rd Qu.: 7.713   3rd Qu.:1229.5  
##  Max.   :36019   Max.   :28.030   Max.   :1440.0
sleep_data %>% 
  select(TotalMinutesAsleep,
         TotalTimeInBed) %>% 
  summary()
##  TotalMinutesAsleep TotalTimeInBed 
##  Min.   : 58.0      Min.   : 61.0  
##  1st Qu.:361.0      1st Qu.:403.0  
##  Median :433.0      Median :463.0  
##  Mean   :419.5      Mean   :458.6  
##  3rd Qu.:490.0      3rd Qu.:526.0  
##  Max.   :796.0      Max.   :961.0
weight_info %>% 
  select(BMI,
         WeightKg) %>% 
  summary()
##       BMI           WeightKg     
##  Min.   :21.45   Min.   : 52.60  
##  1st Qu.:23.96   1st Qu.: 61.40  
##  Median :24.39   Median : 62.50  
##  Mean   :25.19   Mean   : 72.04  
##  3rd Qu.:25.56   3rd Qu.: 85.05  
##  Max.   :47.54   Max.   :133.50
dailyActivity_Sleep_Merged <- inner_join(dailyActivity, sleep_data, by = c("Id", "date"))
dailyActivity_Sleep_Merged %>%  
  select(TotalSteps,
         TotalDistance,
         SedentaryMinutes,
         TotalMinutesAsleep,
         TotalTimeInBed) %>%
  summary()
##    TotalSteps    TotalDistance    SedentaryMinutes TotalMinutesAsleep
##  Min.   :   17   Min.   : 0.010   Min.   :   0.0   Min.   : 58.0     
##  1st Qu.: 5206   1st Qu.: 3.600   1st Qu.: 631.0   1st Qu.:361.0     
##  Median : 8925   Median : 6.290   Median : 717.0   Median :433.0     
##  Mean   : 8541   Mean   : 6.039   Mean   : 712.2   Mean   :419.5     
##  3rd Qu.:11393   3rd Qu.: 8.030   3rd Qu.: 783.0   3rd Qu.:490.0     
##  Max.   :22770   Max.   :17.540   Max.   :1265.0   Max.   :796.0     
##  TotalTimeInBed 
##  Min.   : 61.0  
##  1st Qu.:403.0  
##  Median :463.0  
##  Mean   :458.6  
##  3rd Qu.:526.0  
##  Max.   :961.0
dailyActivity_Weight_Merged <- inner_join(dailyActivity, weight_info, by = c("Id", "date"))
dailyActivity_Sleep_Weight_Merged <- inner_join(dailyActivity_Sleep_Merged, weight_info, by = c("Id", "date"))
dailyActivity_Sleep_Weight_Merged %>%  
  select(TotalSteps,
         TotalDistance,
         SedentaryMinutes,
         TotalMinutesAsleep,
         TotalTimeInBed,
         BMI,
         WeightKg) %>%
  summary()
##    TotalSteps    TotalDistance    SedentaryMinutes TotalMinutesAsleep
##  Min.   :  356   Min.   : 0.250   Min.   : 127.0   Min.   :115.0     
##  1st Qu.: 5780   1st Qu.: 3.825   1st Qu.: 635.5   1st Qu.:399.0     
##  Median :10524   Median : 6.960   Median : 689.0   Median :442.0     
##  Mean   : 9687   Mean   : 6.523   Mean   : 688.5   Mean   :430.3     
##  3rd Qu.:12484   3rd Qu.: 8.730   3rd Qu.: 736.0   3rd Qu.:472.5     
##  Max.   :20031   Max.   :13.240   Max.   :1121.0   Max.   :630.0     
##  TotalTimeInBed       BMI           WeightKg     
##  Min.   :129.0   Min.   :22.65   Min.   : 52.60  
##  1st Qu.:420.0   1st Qu.:23.89   1st Qu.: 61.20  
##  Median :455.0   Median :24.00   Median : 61.50  
##  Mean   :449.8   Mean   :24.83   Mean   : 64.17  
##  3rd Qu.:494.0   3rd Qu.:24.17   3rd Qu.: 61.90  
##  Max.   :679.0   Max.   :47.54   Max.   :133.50
head(dailyActivity_Sleep_Merged)
## # A tibble: 6 × 22
##          Id date       TotalSteps TotalDistance TrackerDistance LoggedActivitie…
##       <dbl> <date>          <dbl>         <dbl>           <dbl>            <dbl>
## 1    1.50e9 2016-04-12      13162          8.5             8.5                 0
## 2    1.50e9 2016-04-13      10735          6.97            6.97                0
## 3    1.50e9 2016-04-15       9762          6.28            6.28                0
## 4    1.50e9 2016-04-16      12669          8.16            8.16                0
## 5    1.50e9 2016-04-17       9705          6.48            6.48                0
## 6    1.50e9 2016-04-19      15506          9.88            9.88                0
## # … with 16 more variables: VeryActiveDistance <dbl>,
## #   ModeratelyActiveDistance <dbl>, LightActiveDistance <dbl>,
## #   SedentaryActiveDistance <dbl>, VeryActiveMinutes <dbl>,
## #   FairlyActiveMinutes <dbl>, LightlyActiveMinutes <dbl>,
## #   SedentaryMinutes <dbl>, Calories <dbl>, weekday.x <ord>, Date <date>,
## #   TotalSleepRecords <dbl>, TotalMinutesAsleep <dbl>, TotalTimeInBed <dbl>,
## #   time <chr>, weekday.y <ord>
head(dailyActivity_Weight_Merged)
## # A tibble: 6 × 25
##          Id date       TotalSteps TotalDistance TrackerDistance LoggedActivitie…
##       <dbl> <date>          <dbl>         <dbl>           <dbl>            <dbl>
## 1    1.50e9 2016-05-02      14727        9.71            9.71                  0
## 2    1.50e9 2016-05-03      15103        9.66            9.66                  0
## 3    1.93e9 2016-04-13        356        0.25            0.25                  0
## 4    2.87e9 2016-04-21       8859        5.98            5.98                  0
## 5    2.87e9 2016-05-12       7566        5.11            5.11                  0
## 6    4.32e9 2016-04-17         29        0.0200          0.0200                0
## # … with 19 more variables: VeryActiveDistance <dbl>,
## #   ModeratelyActiveDistance <dbl>, LightActiveDistance <dbl>,
## #   SedentaryActiveDistance <dbl>, VeryActiveMinutes <dbl>,
## #   FairlyActiveMinutes <dbl>, LightlyActiveMinutes <dbl>,
## #   SedentaryMinutes <dbl>, Calories <dbl>, weekday.x <ord>, Date <date>,
## #   WeightKg <dbl>, WeightPounds <dbl>, Fat <dbl>, BMI <dbl>,
## #   IsManualReport <lgl>, LogId <dbl>, time <chr>, weekday.y <ord>
head(dailyActivity_Sleep_Weight_Merged)
## # A tibble: 6 × 31
##          Id date       TotalSteps TotalDistance TrackerDistance LoggedActivitie…
##       <dbl> <date>          <dbl>         <dbl>           <dbl>            <dbl>
## 1    1.50e9 2016-05-02      14727          9.71            9.71                0
## 2    1.50e9 2016-05-03      15103          9.66            9.66                0
## 3    1.93e9 2016-04-13        356          0.25            0.25                0
## 4    4.56e9 2016-05-01       3428          2.27            2.27                0
## 5    5.58e9 2016-04-17      12231          9.14            9.14                0
## 6    6.96e9 2016-04-12      10199          6.74            6.74                0
## # … with 25 more variables: VeryActiveDistance <dbl>,
## #   ModeratelyActiveDistance <dbl>, LightActiveDistance <dbl>,
## #   SedentaryActiveDistance <dbl>, VeryActiveMinutes <dbl>,
## #   FairlyActiveMinutes <dbl>, LightlyActiveMinutes <dbl>,
## #   SedentaryMinutes <dbl>, Calories <dbl>, weekday.x <ord>, Date.x <date>,
## #   TotalSleepRecords <dbl>, TotalMinutesAsleep <dbl>, TotalTimeInBed <dbl>,
## #   time.x <chr>, weekday.y <ord>, Date.y <date>, WeightKg <dbl>, …
uniqueId_dailyActivity <- filter(dailyActivity, Id == 1503960366)
uniqueId_dailyActivity_Sleep_Merged <- filter(dailyActivity_Sleep_Merged, Id == 1503960366)
n_distinct(dailyActivity$Id)
## [1] 33
n_distinct(sleep_data$Id)
## [1] 24
n_distinct(weight_info$Id)
## [1] 8
n_distinct(dailyActivity_Sleep_Merged$Id)
## [1] 24
n_distinct(dailyActivity_Weight_Merged$Id)
## [1] 8
n_distinct(dailyActivity_Sleep_Weight_Merged$Id)
## [1] 5
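
The merged tables above carry weekday.x and weekday.y columns because both inputs had a weekday column that was not part of the join key. A minimal sketch of one way to avoid the suffixes and to see which activity rows have no sleep record is shown below. The activity and sleep data frames here are tiny made-up stand-ins (hypothetical Ids and values, not the Fitbit data), so only the pattern carries over.

```r
library(dplyr)

# Hypothetical stand-ins for dailyActivity and sleep_data (values are made up)
activity <- data.frame(
  Id         = c(1, 1, 2),
  date       = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
  TotalSteps = c(100, 200, 300),
  weekday    = c("Tue", "Wed", "Tue")
)
sleep <- data.frame(
  Id                 = c(1, 2),
  date               = as.Date(c("2016-04-12", "2016-04-12")),
  TotalMinutesAsleep = c(400, 420),
  weekday            = c("Tue", "Tue")
)

# Drop the duplicated non-key column before joining so no .x/.y suffixes appear
merged <- inner_join(activity, select(sleep, -weekday), by = c("Id", "date"))

# Activity rows with no matching sleep record (these are dropped by inner_join)
unmatched <- anti_join(activity, sleep, by = c("Id", "date"))

print(names(merged))
print(nrow(unmatched))
```

Checking the unmatched rows makes it easier to judge how much data the inner join discards before drawing conclusions from the merged table.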

Phase 4 and 5: Analyze and Share

ggplot(data = dailyActivity, mapping = aes(x=TotalSteps, y=SedentaryMinutes)) + 
  geom_point(aes(color = weekday)) +
  geom_smooth(method = lm) +
  labs(title = "Total Steps Vs. Sedentary Minutes")
## `geom_smooth()` using formula 'y ~ x'

Looking at this chart, it seems that the fewer steps a user takes, the higher the sedentary minutes.
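
To put a number on a relationship like this, one option is the Pearson correlation together with the slope of the fitted line. The sketch below uses a small made-up data frame (toy_activity is hypothetical, not the Fitbit data), so it only illustrates the method and says nothing about the real values.

```r
# Hypothetical values standing in for dailyActivity; not taken from the dataset
toy_activity <- data.frame(
  TotalSteps       = c(2000, 5000, 8000, 11000, 14000),
  SedentaryMinutes = c(1200, 1000, 900, 750, 700)
)

# Pearson correlation: values near -1 mean a strong negative linear relationship
r <- cor(toy_activity$TotalSteps, toy_activity$SedentaryMinutes,
         method = "pearson")

# Same linear model that geom_smooth(method = lm) draws
fit <- lm(SedentaryMinutes ~ TotalSteps, data = toy_activity)
slope <- coef(fit)[["TotalSteps"]]

print(r)
print(slope)
```

On the real data, the same two calls could be run on dailyActivity directly.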

ggplot(data = dailyActivity)+ 
  geom_col(mapping = aes(x=weekday, y=TotalSteps, fill = weekday))+
  labs(title = "Total Steps Per Day")

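In this graph, Tuesday has the highest total steps. Note that geom_col() stacks every record for a weekday, so each bar is a sum across all users and dates, and a weekday with more records can show a higher total.

Comparing total steps per weekday with sedentary minutes per weekday, both are highest on Tuesday.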

ggplot(data = dailyActivity) + 
  geom_col(mapping = aes(x=weekday, y=Calories, fill = weekday))+
  labs(title = "Total Calories Per Day")

In this graph, we can see that the highest total calories burned also fall on Tuesday.
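
Because geom_col() stacks every row for a weekday, the bars above are sums, and a weekday with more records can look higher even if the typical day is not. A minimal sketch of comparing per-day averages alongside the totals is shown below. The toy_activity data frame is made up for illustration (it is not the Fitbit data) and is built so that the total and the mean disagree.

```r
library(dplyr)

# Hypothetical stand-in for dailyActivity; values are made up for illustration
toy_activity <- data.frame(
  weekday    = c("Tue", "Tue", "Tue", "Wed", "Wed"),
  TotalSteps = c(8000, 9000, 7000, 12000, 11000),
  Calories   = c(2000, 2100, 1900, 2500, 2400)
)

weekday_summary <- toy_activity %>%
  group_by(weekday) %>%
  summarise(
    n_records     = n(),              # how many rows feed each bar
    total_steps   = sum(TotalSteps),  # what geom_col() displays
    mean_steps    = mean(TotalSteps), # typical day for that weekday
    mean_calories = mean(Calories),
    .groups = "drop"
  )

print(weekday_summary)
```

In this toy example Tuesday has the larger total only because it has more records, while Wednesday has the higher mean. Running the same summary on dailyActivity would show whether the Tuesday peak holds on a per-day basis.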

ggplot(data = uniqueId_dailyActivity, aes(x=date)) +
  geom_line(mapping = aes(y=VeryActiveMinutes, color = "Very Active Minutes")) +
  geom_point(mapping = aes(y=VeryActiveMinutes, color = "Very Active Minutes")) +
  geom_line(mapping = aes(y=FairlyActiveMinutes, color = "Fairly Active Minutes")) +
  geom_point(mapping = aes(y=FairlyActiveMinutes, color = "Fairly Active Minutes")) +
  labs(title = "Active Minutes of a user", y="", color = "Legend") +
  scale_color_manual(values = c("Very Active Minutes" = "red",
                                "Fairly Active Minutes" = "orange"))

ggplot(data = uniqueId_dailyActivity, aes(x=date)) +
  geom_line(mapping = aes(y=VeryActiveMinutes, color = "Very Active Minutes")) +
  geom_point(mapping = aes(y=VeryActiveMinutes, color = "Very Active Minutes")) +
  geom_line(mapping = aes(y=FairlyActiveMinutes, color = "Fairly Active Minutes")) +
  geom_point(mapping = aes(y=FairlyActiveMinutes, color = "Fairly Active Minutes")) +
  geom_line(mapping = aes(y=LightlyActiveMinutes, color = "Lightly Active Minutes")) +
  geom_point(mapping = aes(y=LightlyActiveMinutes, color = "Lightly Active Minutes")) +
  geom_line(mapping = aes(y=SedentaryMinutes, color = "Sedentary Minutes")) +
  geom_point(mapping = aes(y=SedentaryMinutes, color = "Sedentary Minutes")) +
  labs(title = "Active Minutes of a user", y="", color = "Legend") +
  scale_color_manual(values = c("Very Active Minutes" = "red",
                                "Fairly Active Minutes" = "orange",
                                "Lightly Active Minutes" = "yellow",
                                "Sedentary Minutes" = "green"))

In this chart, we can see that this user spends far more minutes per day sedentary than active.

ggplot(data = uniqueId_dailyActivity_Sleep_Merged, aes(x = date)) +
  geom_line(mapping = aes(y=VeryActiveMinutes, color = "Very Active Minutes")) +
  geom_point(mapping = aes(y=VeryActiveMinutes, color = "Very Active Minutes")) +
  geom_line(mapping = aes(y=TotalMinutesAsleep, color = "Total Minutes Asleep")) +
  geom_point(mapping = aes(y=TotalMinutesAsleep, color = "Total Minutes Asleep")) + 
  geom_line(mapping = aes(y=SedentaryMinutes, color = "Sedentary Minutes")) +
  geom_point(mapping = aes(y=SedentaryMinutes, color = "Sedentary Minutes")) +
  labs(title = "Active Minutes and Minutes asleep of a user", y="", color = "Legend") +
  scale_color_manual(values = c("Very Active Minutes" = "red",
                                "Total Minutes Asleep" = "blue",
                                "Sedentary Minutes" = "green"))

Looking at this chart, we can see that when the user's total minutes asleep in a day are high, the sedentary minutes for that day decrease substantially.

ggplot(data = sleep_data, mapping = aes(x=TotalTimeInBed, y=TotalMinutesAsleep))+
  geom_point(aes(color = weekday))+
  geom_smooth(method = lm)+
  labs(title = "Correlation Between Minutes Asleep and Time In Bed")
## `geom_smooth()` using formula 'y ~ x'

In this chart, we can see that the higher the total time in bed, the higher the total time asleep.

ggplot(data = sleep_data)+ 
  geom_col(mapping = aes(x=weekday, y=TotalMinutesAsleep, fill = weekday))+
  labs(title = "Day of Most Sleep")

Looking at this bar graph, we can see that after the highest calorie burn on Tuesday, users tend to sleep longer the following day.

ggplot(data = dailyActivity_Sleep_Merged, mapping = aes(x=SedentaryMinutes, y=TotalMinutesAsleep))+
  geom_point(aes(color = weekday.x))+
  geom_smooth(method = lm)+
  labs(title = "Correlation Between Sedentary Time and Time Asleep")
## `geom_smooth()` using formula 'y ~ x'

In this graph, we can see that the higher the time asleep, the lower the sedentary minutes, which supports the hypothesis we drew from the earlier graph of a single user Id.

ggplot(data = dailyActivity_Weight_Merged, mapping = aes(x=BMI, y=TotalSteps))+
  geom_point(aes(color = weekday.x))+
  geom_smooth(method = lm)+
  labs(title = "Correlation Between BMI and TotalSteps")
## `geom_smooth()` using formula 'y ~ x'

However, because so few unique Ids recorded their weight, we cannot draw a hypothesis from this graph. We need more data to assess this correlation.
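
One way to show how thin the weight data is would be to count how many logs each user contributed. The sketch below does this on a made-up weight log (toy_weight and its Ids are hypothetical, not rows from weight_info), so only the approach carries over.

```r
library(dplyr)

# Hypothetical weight log; Ids and values are made up for illustration
toy_weight <- data.frame(
  Id       = c(101, 101, 101, 102, 103),
  WeightKg = c(60.0, 60.2, 59.8, 85.0, 72.4)
)

# Number of weight logs per user, most active loggers first
logs_per_user <- toy_weight %>%
  count(Id, name = "n_logs") %>%
  arrange(desc(n_logs))

# Number of distinct users who logged weight at all
n_users <- n_distinct(toy_weight$Id)

print(logs_per_user)
print(n_users)
```

If a few users account for most of the logs, a scatter plot like the one above mostly reflects those users rather than a general trend.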

Phase 6: Act

Conclusion and Recommendations

Conclusion

  • Users average about 7,638 steps per day, and the highest recorded daily total is 36,019 steps
  • Days with fewer steps tend to have more sedentary minutes
  • Tuesday has the highest total steps of the week, and also the highest total calories burned and the highest total sedentary minutes
  • Users tend to sleep longer the day after the day with the highest steps and calories burned
  • The more minutes a user sleeps, the fewer sedentary minutes that user has in a day
  • Not all users record their time asleep (24 of 33), and only a handful record their weight info (8 of 33)

Recommendation

  • The company should award an achievement for recording weight and time asleep
  • Users should set a daily step goal, and a notification should be sent whenever the goal is not met
  • Award an achievement when the goal has been reached every day of the week, to reward the user
  • Develop a feature that encourages users to spend more time active than sedentary
  • Send a notification when the sedentary time for the day is too high
  • Design the device to be more comfortable to wear while sleeping