Introduction

I will be looking at data from other wellness trackers to determine how they are used so that Bellabeat can make informed decisions about its own products. My assumption is that people most commonly use wellness trackers to keep track of how far they have walked, in distance or steps.

For this project, I will follow the six steps of the data analysis process: Ask, Prepare, Process, Analyze, Share, and Act.

Ask

About the Company

Bellabeat is a company that specializes in fitness products for women. Some of the main products are the app, wellness tracker, wellness watch, and water bottle that tracks water intake.

Stakeholders

  • Urška Sršen - cofounder and Chief Creative Officer
  • Sando Mur - cofounder and mathematician
  • Bellabeat marketing analytics team - responsible for guiding marketing strategy

Key Questions

  1. What are some trends in smart device usage?
  2. How could these trends apply to Bellabeat customers?
  3. How could these trends help influence Bellabeat marketing strategy?

Business Task

Analyze usage data from other smart devices, specifically fitness devices, to find trends that Bellabeat can apply to its marketing strategy and use to better meet customer needs.

Prepare

Data

I will be using a public dataset that includes data from 30 Fitbit users. I will be focusing on the daily values, the sleep log, and the weight log.

Package Installation

I need to install several packages to ensure that I can analyze the data.

install.packages("tidyverse")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
install.packages("ggplot2")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
install.packages("here")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
install.packages("janitor")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
install.packages("skimr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
install.packages("plyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
install.packages("dplyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
install.packages("readr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
install.packages("purrr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
install.packages("tidyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
install.packages("tibble")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.1     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(here)
## here() starts at /cloud/project
library(janitor)
## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(skimr)
library(plyr)
## ------------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## ------------------------------------------------------------------------------
## 
## Attaching package: 'plyr'
## 
## The following object is masked from 'package:here':
## 
##     here
## 
## The following objects are masked from 'package:dplyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
## 
## The following object is masked from 'package:purrr':
## 
##     compact
library(dplyr)
library(readr)
library(purrr)
library(tidyr)
library(tibble)
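
The tidyverse startup message above shows that ggplot2, dplyr, readr, purrr, tidyr, and tibble are attached by library(tidyverse), so they do not need separate installs. A minimal sketch of a leaner setup follows. The helper name missing_packages is hypothetical, and plyr is left out on the assumption that none of the code shown here calls it.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# tidyverse already bundles ggplot2, dplyr, readr, purrr, tidyr, and tibble.
needed <- c("tidyverse", "here", "janitor", "skimr")
to_install <- missing_packages(needed)

# Install only what is missing (uncomment to run; needs a CRAN mirror).
# if (length(to_install) > 0) install.packages(to_install)
```

If plyr is needed after all, the startup message above advises loading it before dplyr so that it does not mask dplyr functions such as mutate and summarise.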

Adding Fitbit Data

FitBit Fitness Tracker Data

library(readr)
activity <- read_csv("Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")
## Rows: 940 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): ActivityDate
## dbl (14): Id, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDi...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(readr)
calories <- read_csv("Fitabase Data 4.12.16-5.12.16/dailyCalories_merged.csv")
## Rows: 940 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDay
## dbl (2): Id, Calories
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(readr)
intensities <- read_csv("Fitabase Data 4.12.16-5.12.16/dailyIntensities_merged.csv")
## Rows: 940 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDay
## dbl (9): Id, SedentaryMinutes, LightlyActiveMinutes, FairlyActiveMinutes, Ve...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(readr)
steps <- read_csv("Fitabase Data 4.12.16-5.12.16/dailySteps_merged.csv")
## Rows: 940 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDay
## dbl (2): Id, StepTotal
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(readr)
sleep_sep <- read_csv("sleepDay_merged.csv")
## Rows: 413 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): ActivityDay
## dbl  (3): Id, TotalMinutesAsleep, TotalTimeInBed
## time (1): Time
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(readr)
weight_sep <- read_csv("weightLogInfo_merged.csv")
## Rows: 67 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): ActivityDay
## dbl  (5): Id, WeightPounds, Fat, BMI, LogId
## lgl  (1): IsManualReport
## time (1): Time
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Preview the Data

head(activity)
## # A tibble: 6 × 15
##       Id Activ…¹ Total…² Total…³ Track…⁴ Logge…⁵ VeryA…⁶ Moder…⁷ Light…⁸ Seden…⁹
##    <dbl> <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 1.50e9 4/12/2…   13162    8.5     8.5        0    1.88   0.550    6.06       0
## 2 1.50e9 4/13/2…   10735    6.97    6.97       0    1.57   0.690    4.71       0
## 3 1.50e9 4/14/2…   10460    6.74    6.74       0    2.44   0.400    3.91       0
## 4 1.50e9 4/15/2…    9762    6.28    6.28       0    2.14   1.26     2.83       0
## 5 1.50e9 4/16/2…   12669    8.16    8.16       0    2.71   0.410    5.04       0
## 6 1.50e9 4/17/2…    9705    6.48    6.48       0    3.19   0.780    2.51       0
## # … with 5 more variables: VeryActiveMinutes <dbl>, FairlyActiveMinutes <dbl>,
## #   LightlyActiveMinutes <dbl>, SedentaryMinutes <dbl>, Calories <dbl>, and
## #   abbreviated variable names ¹​ActivityDate, ²​TotalSteps, ³​TotalDistance,
## #   ⁴​TrackerDistance, ⁵​LoggedActivitiesDistance, ⁶​VeryActiveDistance,
## #   ⁷​ModeratelyActiveDistance, ⁸​LightActiveDistance, ⁹​SedentaryActiveDistance
head(calories)
## # A tibble: 6 × 3
##           Id ActivityDay Calories
##        <dbl> <chr>          <dbl>
## 1 1503960366 4/12/2016       1985
## 2 1503960366 4/13/2016       1797
## 3 1503960366 4/14/2016       1776
## 4 1503960366 4/15/2016       1745
## 5 1503960366 4/16/2016       1863
## 6 1503960366 4/17/2016       1728
head(intensities)
## # A tibble: 6 × 10
##       Id Activ…¹ Seden…² Light…³ Fairl…⁴ VeryA…⁵ Seden…⁶ Light…⁷ Moder…⁸ VeryA…⁹
##    <dbl> <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 1.50e9 4/12/2…     728     328      13      25       0    6.06   0.550    1.88
## 2 1.50e9 4/13/2…     776     217      19      21       0    4.71   0.690    1.57
## 3 1.50e9 4/14/2…    1218     181      11      30       0    3.91   0.400    2.44
## 4 1.50e9 4/15/2…     726     209      34      29       0    2.83   1.26     2.14
## 5 1.50e9 4/16/2…     773     221      10      36       0    5.04   0.410    2.71
## 6 1.50e9 4/17/2…     539     164      20      38       0    2.51   0.780    3.19
## # … with abbreviated variable names ¹​ActivityDay, ²​SedentaryMinutes,
## #   ³​LightlyActiveMinutes, ⁴​FairlyActiveMinutes, ⁵​VeryActiveMinutes,
## #   ⁶​SedentaryActiveDistance, ⁷​LightActiveDistance, ⁸​ModeratelyActiveDistance,
## #   ⁹​VeryActiveDistance
head(steps)
## # A tibble: 6 × 3
##           Id ActivityDay StepTotal
##        <dbl> <chr>           <dbl>
## 1 1503960366 4/12/2016       13162
## 2 1503960366 4/13/2016       10735
## 3 1503960366 4/14/2016       10460
## 4 1503960366 4/15/2016        9762
## 5 1503960366 4/16/2016       12669
## 6 1503960366 4/17/2016        9705
head(weight_sep)
## # A tibble: 6 × 8
##           Id ActivityDay Time     WeightPounds   Fat   BMI IsManualRep…¹   LogId
##        <dbl> <chr>       <time>          <dbl> <dbl> <dbl> <lgl>           <dbl>
## 1 1503960366 5/2/2016    23:59:59         116.    22  22.6 TRUE          1.46e12
## 2 1503960366 5/3/2016    23:59:59         116.    NA  22.6 TRUE          1.46e12
## 3 1927972279 4/13/2016   01:08:52         294.    NA  47.5 FALSE         1.46e12
## 4 2873212765 4/21/2016   23:59:59         125.    NA  21.5 TRUE          1.46e12
## 5 2873212765 5/12/2016   23:59:59         126.    NA  21.7 TRUE          1.46e12
## 6 4319703577 4/17/2016   23:59:59         160.    25  27.5 TRUE          1.46e12
## # … with abbreviated variable name ¹​IsManualReport
head(sleep_sep)
## # A tibble: 6 × 5
##           Id ActivityDay Time   TotalMinutesAsleep TotalTimeInBed
##        <dbl> <chr>       <time>              <dbl>          <dbl>
## 1 1503960366 4/12/2016   00'00"                327            346
## 2 1503960366 4/13/2016   00'00"                384            407
## 3 1503960366 4/15/2016   00'00"                412            442
## 4 1503960366 4/16/2016   00'00"                340            367
## 5 1503960366 4/17/2016   00'00"                700            712
## 6 1503960366 4/19/2016   00'00"                304            320

Process

Cleaning the Data

  • Used Excel to separate the date and time in the sleep and weight files into two columns, for consistency with the other files.

  • Decided to exclude the dataset “steps” because the steps were included in the activity dataset.

  • Merged data into one dataset for Fitbit so I could see everything together.

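# by = 1:2 joins on the first two columns of each table (Id and the date).
m1 <- merge(activity, calories, by = 1:2)
m2 <- merge(intensities, m1, by = 1:2)
# all = TRUE keeps days that have a sleep record, a weight record, or both.
merged_other <- merge(sleep_sep, weight_sep, by = 1:2, all = TRUE)
# The default inner join keeps only days that also appear in merged_other.
merged_fb <- merge(m2, merged_other, by = 1:2)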

view(merged_fb)
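
The effect of the all argument can be seen on a small example. This is a sketch on made-up toy tables (daily and sleep are hypothetical names), not the Fitbit data. It shows why the merged dataset has fewer rows than the 940-row daily files: the default merge keeps only rows that match in both tables.

```r
# Toy tables; column names mirror the Fitbit files but the values are made up.
daily <- data.frame(
  Id = c(1, 1, 2),
  ActivityDate = c("4/12/2016", "4/13/2016", "4/12/2016"),
  TotalSteps = c(100, 200, 300)
)
sleep <- data.frame(
  Id = c(1, 2),
  ActivityDay = c("4/12/2016", "4/14/2016"),
  TotalMinutesAsleep = c(400, 450)
)

# Default merge: only the Id/date pairs found in both tables survive.
inner <- merge(daily, sleep, by = 1:2)

# all = TRUE: every Id/date pair from either table is kept, with NA for gaps.
outer <- merge(daily, sleep, by = 1:2, all = TRUE)

nrow(inner)
nrow(outer)
```

Because by = 1:2 matches columns by position, the differently named date columns (ActivityDate and ActivityDay) still line up, and the result takes its key column names from the first table.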

Remove unnecessary/redundant columns.

all_fb_data <- subset(merged_fb, select = -c(Time.y, WeightPounds, Fat, BMI, IsManualReport, LogId, Time.x, Calories.x, TrackerDistance))

head(all_fb_data)
##           Id ActivityDay SedentaryMinutes.x LightlyActiveMinutes.x
## 1 1503960366   4/12/2016                728                    328
## 2 1503960366   4/13/2016                776                    217
## 3 1503960366   4/15/2016                726                    209
## 4 1503960366   4/16/2016                773                    221
## 5 1503960366   4/17/2016                539                    164
## 6 1503960366   4/19/2016                775                    264
##   FairlyActiveMinutes.x VeryActiveMinutes.x SedentaryActiveDistance.x
## 1                    13                  25                         0
## 2                    19                  21                         0
## 3                    34                  29                         0
## 4                    10                  36                         0
## 5                    20                  38                         0
## 6                    31                  50                         0
##   LightActiveDistance.x ModeratelyActiveDistance.x VeryActiveDistance.x
## 1                  6.06                       0.55                 1.88
## 2                  4.71                       0.69                 1.57
## 3                  2.83                       1.26                 2.14
## 4                  5.04                       0.41                 2.71
## 5                  2.51                       0.78                 3.19
## 6                  5.03                       1.32                 3.53
##   TotalSteps TotalDistance LoggedActivitiesDistance VeryActiveDistance.y
## 1      13162          8.50                        0                 1.88
## 2      10735          6.97                        0                 1.57
## 3       9762          6.28                        0                 2.14
## 4      12669          8.16                        0                 2.71
## 5       9705          6.48                        0                 3.19
## 6      15506          9.88                        0                 3.53
##   ModeratelyActiveDistance.y LightActiveDistance.y SedentaryActiveDistance.y
## 1                       0.55                  6.06                         0
## 2                       0.69                  4.71                         0
## 3                       1.26                  2.83                         0
## 4                       0.41                  5.04                         0
## 5                       0.78                  2.51                         0
## 6                       1.32                  5.03                         0
##   VeryActiveMinutes.y FairlyActiveMinutes.y LightlyActiveMinutes.y
## 1                  25                    13                    328
## 2                  21                    19                    217
## 3                  29                    34                    209
## 4                  36                    10                    221
## 5                  38                    20                    164
## 6                  50                    31                    264
##   SedentaryMinutes.y Calories.y TotalMinutesAsleep TotalTimeInBed
## 1                728       1985                327            346
## 2                776       1797                384            407
## 3                726       1745                412            442
## 4                773       1863                340            367
## 5                539       1728                700            712
## 6                775       2035                304            320
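
The preview above still shows pairs of .x and .y columns with identical values, because the intensity columns exist in both the intensities and activity files. A minimal sketch of one way to drop the repeated copies follows. The function name drop_duplicate_suffix_cols and the toy data are hypothetical, and this was not part of the original cleaning.

```r
# Drop each ".y" column that is identical to its ".x" twin, then strip the
# suffixes where the shortened name is unambiguous.
drop_duplicate_suffix_cols <- function(df) {
  y_cols <- grep("\\.y$", names(df), value = TRUE)
  redundant <- Filter(function(y) {
    x <- sub("\\.y$", ".x", y)
    x %in% names(df) && isTRUE(all.equal(df[[y]], df[[x]]))
  }, y_cols)
  df <- df[, setdiff(names(df), redundant), drop = FALSE]

  stripped <- sub("\\.(x|y)$", "", names(df))
  # Only rename columns whose stripped name does not collide with another.
  unique_ok <- !(stripped %in% stripped[duplicated(stripped)])
  names(df)[unique_ok] <- stripped[unique_ok]
  df
}

# Toy data standing in for merged_fb; values are made up.
toy <- data.frame(
  Id = 1:3,
  Steps.x = c(1, 2, 3),
  Steps.y = c(1, 2, 3),
  Calories.y = c(10, 20, 30)
)
cleaned <- drop_duplicate_suffix_cols(toy)
names(cleaned)
```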

Used Excel to do the following.

  • Remove duplicate rows.
  • Find the average daily steps of each user.
  • Find the days of the week for each date given.
  • Determine the average steps each day of the week.
  • Determine the average hours asleep each day of the week.
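
The Excel steps listed above could also be done in R. The following is a minimal base R sketch on made-up toy data, not the workflow actually used for this analysis. Base R is used because the session above loads plyr after dplyr, which masks dplyr's summarise and mutate. The data frame toy and its values are hypothetical.

```r
# Toy stand-in for all_fb_data; the values are made up for illustration.
toy <- data.frame(
  Id = c(1, 1, 1, 2, 2),
  ActivityDay = c("4/12/2016", "4/12/2016", "4/13/2016", "4/12/2016", "4/17/2016"),
  TotalSteps = c(10000, 10000, 8000, 4000, 6000),
  TotalMinutesAsleep = c(420, 420, 480, NA, 540)
)

# 1. Remove duplicate rows.
toy <- unique(toy)

# 2. Average daily steps of each user.
steps_by_user <- aggregate(TotalSteps ~ Id, data = toy, FUN = mean)

# 3. Day of the week for each date (1 = Monday ... 7 = Sunday).
toy$Date <- as.Date(toy$ActivityDay, format = "%m/%d/%Y")
toy$Weekday <- as.integer(format(toy$Date, "%u"))

# 4. Average steps for each day of the week.
steps_by_weekday <- aggregate(TotalSteps ~ Weekday, data = toy, FUN = mean)

# 5. Average hours asleep for each day of the week.
#    The formula interface drops rows where HoursAsleep is NA.
toy$HoursAsleep <- toy$TotalMinutesAsleep / 60
sleep_by_weekday <- aggregate(HoursAsleep ~ Weekday, data = toy, FUN = mean)
```

Numeric weekdays are used instead of weekdays() so that the result does not depend on the system locale.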

Analyze

summary(all_fb_data)
##        Id            ActivityDay        SedentaryMinutes.x
##  Min.   :1.504e+09   Length:445         Min.   :   0.0    
##  1st Qu.:4.020e+09   Class :character   1st Qu.: 644.0    
##  Median :4.703e+09   Mode  :character   Median : 727.0    
##  Mean   :5.193e+09                      Mean   : 739.5    
##  3rd Qu.:6.962e+09                      3rd Qu.: 816.0    
##  Max.   :8.878e+09                      Max.   :1363.0    
##                                                           
##  LightlyActiveMinutes.x FairlyActiveMinutes.x VeryActiveMinutes.x
##  Min.   :  2.0          Min.   :  0.00        Min.   :  0.00     
##  1st Qu.:161.0          1st Qu.:  0.00        1st Qu.:  0.00     
##  Median :214.0          Median : 11.00        Median : 11.00     
##  Mean   :219.3          Mean   : 17.45        Mean   : 27.16     
##  3rd Qu.:265.0          3rd Qu.: 25.00        3rd Qu.: 43.00     
##  Max.   :518.0          Max.   :143.00        Max.   :210.00     
##                                                                  
##  SedentaryActiveDistance.x LightActiveDistance.x ModeratelyActiveDistance.x
##  Min.   :0.000000          Min.   : 0.010        Min.   :0.0000            
##  1st Qu.:0.000000          1st Qu.: 2.630        1st Qu.:0.0000            
##  Median :0.000000          Median : 3.870        Median :0.4000            
##  Mean   :0.001101          Mean   : 3.948        Mean   :0.7229            
##  3rd Qu.:0.000000          3rd Qu.: 5.220        3rd Qu.:0.9700            
##  Max.   :0.110000          Max.   :10.710        Max.   :6.4800            
##                                                                            
##  VeryActiveDistance.x   TotalSteps    TotalDistance    LoggedActivitiesDistance
##  Min.   : 0.000       Min.   :   17   Min.   : 0.010   Min.   :0.000           
##  1st Qu.: 0.000       1st Qu.: 5454   1st Qu.: 3.730   1st Qu.:0.000           
##  Median : 0.650       Median : 9148   Median : 6.470   Median :0.000           
##  Mean   : 1.776       Mean   : 8987   Mean   : 6.478   Mean   :0.105           
##  3rd Qu.: 2.560       3rd Qu.:11611   3rd Qu.: 8.250   3rd Qu.:0.000           
##  Max.   :21.660       Max.   :29326   Max.   :26.720   Max.   :4.082           
##                                                                                
##  VeryActiveDistance.y ModeratelyActiveDistance.y LightActiveDistance.y
##  Min.   : 0.000       Min.   :0.0000             Min.   : 0.010       
##  1st Qu.: 0.000       1st Qu.:0.0000             1st Qu.: 2.630       
##  Median : 0.650       Median :0.4000             Median : 3.870       
##  Mean   : 1.776       Mean   :0.7229             Mean   : 3.948       
##  3rd Qu.: 2.560       3rd Qu.:0.9700             3rd Qu.: 5.220       
##  Max.   :21.660       Max.   :6.4800             Max.   :10.710       
##                                                                       
##  SedentaryActiveDistance.y VeryActiveMinutes.y FairlyActiveMinutes.y
##  Min.   :0.000000          Min.   :  0.00      Min.   :  0.00       
##  1st Qu.:0.000000          1st Qu.:  0.00      1st Qu.:  0.00       
##  Median :0.000000          Median : 11.00      Median : 11.00       
##  Mean   :0.001101          Mean   : 27.16      Mean   : 17.45       
##  3rd Qu.:0.000000          3rd Qu.: 43.00      3rd Qu.: 25.00       
##  Max.   :0.110000          Max.   :210.00      Max.   :143.00       
##                                                                     
##  LightlyActiveMinutes.y SedentaryMinutes.y   Calories.y   TotalMinutesAsleep
##  Min.   :  2.0          Min.   :   0.0     Min.   : 257   Min.   : 58.0     
##  1st Qu.:161.0          1st Qu.: 644.0     1st Qu.:1863   1st Qu.:361.0     
##  Median :214.0          Median : 727.0     Median :2236   Median :433.0     
##  Mean   :219.3          Mean   : 739.5     Mean   :2447   Mean   :419.5     
##  3rd Qu.:265.0          3rd Qu.: 816.0     3rd Qu.:2984   3rd Qu.:490.0     
##  Max.   :518.0          Max.   :1363.0     Max.   :4900   Max.   :796.0     
##                                                           NA's   :32        
##  TotalTimeInBed 
##  Min.   : 61.0  
##  1st Qu.:403.0  
##  Median :463.0  
##  Mean   :458.6  
##  3rd Qu.:526.0  
##  Max.   :961.0  
##  NA's   :32

Observations

The summary table contains a lot of information. I especially appreciate being able to see the mean, minimum, and maximum for each column, which lets me scan values such as the average total steps (8,987).

With the use of the summary table, I added the averages together. This shows that the average time a user wears their Fitbit on any given day is about 23.5 hours.
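
The text does not say which averages were added together. As a rough check, the sketch below assumes one reading: the four activity-minute means plus the mean minutes asleep, copied from the summary table. This is an assumption, not necessarily the calculation behind the 23.5-hour figure, and means computed after removing duplicates in Excel may differ slightly.

```r
# Means copied from the summary(all_fb_data) output, in minutes.
# Which columns to include is an assumption made for this sketch.
mean_minutes <- c(
  sedentary = 739.5,
  lightly_active = 219.3,
  fairly_active = 17.45,
  very_active = 27.16,
  asleep = 419.5
)

# Total tracked time per day, converted from minutes to hours.
total_hours <- sum(mean_minutes) / 60
round(total_hours, 1)
```

Under this reading the total comes to about 23.7 hours, close to the 23.5 hours reported.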

Share

Data Visualizations

I used Excel to look at certain relationships and values in the data. I looked at the average daily steps of each user. I used the information found on the 10,000 Steps Project to determine activity levels based on step totals. By using this, I can predict the activity levels of those most likely to purchase a fitness tracker like the Fitbit.
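
Grouping users into activity levels by average daily steps can be sketched in R as follows. The cut-offs here are an assumption: they are commonly cited step-count bands, and the exact thresholds used in the Excel analysis are not stated. The function name classify_activity and the sample values are hypothetical.

```r
# Assumed bands (steps per day): <5000 Sedentary, 5000-7499 Low Active,
# 7500-9999 Somewhat Active, 10000-12499 Active, >=12500 Highly Active.
classify_activity <- function(avg_steps) {
  cut(
    avg_steps,
    breaks = c(-Inf, 5000, 7500, 10000, 12500, Inf),
    labels = c("Sedentary", "Low Active", "Somewhat Active",
               "Active", "Highly Active"),
    right = FALSE  # intervals are closed on the left, e.g. [5000, 7500)
  )
}

# Made-up per-user averages for illustration.
avg_steps <- c(3200, 6100, 8987, 10000, 14000)
levels_found <- classify_activity(avg_steps)

# Share of users averaging at least 10,000 steps per day.
share_meeting_goal <- mean(avg_steps >= 10000)
```

Calling table(levels_found) then gives the number of users in each band, which is the kind of count a chart of activity levels would be built from.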

Fig. A

The data shown in Fig. A is surprising but helpful. Users who meet the recommended daily activity make up only 23% of Fitbit users; the other 77% do not meet this goal. With this in mind, I recommend that Bellabeat target users of all activity levels. To do this, Bellabeat could make its advertising campaigns inclusive by showing users of all shapes and sizes at varying activity levels (e.g., walking in the park, running a marathon). In addition, marketing the fitness trackers as a way to improve a person’s fitness is a better angle, so that all users are motivated to purchase the tracker.

  • I determined the average amount of sleep, in hours, that users in the Fitbit data get each day of the week. This lets me see whether the day of the week affects a user’s amount of sleep.

Fig. B

The bar graph in Fig. B shows a marked increase in the amount of sleep users get on Sunday. Knowing that the day of the week affects users’ average sleep, Bellabeat could encourage the use of its fitness tracker to work on sleep goals, including consistency.

  • I determined the average number of steps for each day of the week. This lets me see whether the day of the week affects a user’s activity level.

Fig. C

The data in Fig. C suggests that users average a similar number of steps on each day of the week, with the highest average on Saturday and the lowest on Sunday. Understanding this, Bellabeat could encourage users to purchase its fitness trackers by helping them increase their daily steps and improve consistency.

Act

  1. What are some trends in smart device usage?
  2. How could these trends apply to Bellabeat customers?
  3. How could these trends help influence Bellabeat marketing strategy?

Based on my analysis, I believe Bellabeat would benefit most from the following key takeaways:

1. Marketing should be geared towards all activity levels with an emphasis on those who are less active.

When looking at the data, it is evident that users of fitness trackers, like the Fitbit, are not highly active. Furthermore, 28% of users are classified as “Sedentary” or “Low Active”. Therefore, marketing should focus on users who want to improve their fitness with a focus on trying to increase their activity, no matter their beginning fitness level. Based on the data, Bellabeat will reach the users most likely to purchase one of its fitness trackers by targeting the less active users.

2. Marketing should focus on improving sleep quality and consistency through the use of the tracker.

The data shows that the total amount of sleep Fitbit users get varies by the day of the week. It also shows that Fitbit users are not getting the recommended amount of sleep. The CDC recommends a minimum of 7 hours of sleep each night for adults, so this assumes that all of the users in the study were adults. Fig. B shows that the users in the dataset were below the recommended minimum on more than half the days of the week. Bellabeat could do this in a couple of ways:

3. Ensure the fitness tracker has a long battery life.

The data shows that the average Fitbit user wears his or her fitness tracker for 23.5 hours each day. If the average user is wearing the fitness tracker for 98% of the day, it should have a long battery life. Increasing the battery life of a smart device will make it more marketable to consumers who will wear the device the majority of the time.