Introduction

Bellabeat is a high-tech manufacturer of health-focused products for women. Bellabeat is a successful small company and believes that analyzing smart device fitness data could help unlock new growth opportunities for the company. You have been asked to focus on one of Bellabeat’s products and analyze smart device data to gain insight into how consumers are using their smart devices in order to guide new marketing strategies for the company.

Bellabeat Products
  • Bellabeat app: The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. This data can help users better understand their current habits and make healthy decisions. The Bellabeat app connects to their line of smart wellness products.
  • Leaf: Bellabeat’s classic wellness tracker can be worn as a bracelet, necklace, or clip. The Leaf tracker connects to the Bellabeat app to track activity, sleep, and stress.
  • Time: This wellness watch combines the timeless look of a classic timepiece with smart technology to track user activity, sleep, and stress. The Time watch connects to the Bellabeat app to provide you with insights into your daily wellness.
  • Spring: This is a water bottle that tracks daily water intake using smart technology to ensure that you are appropriately hydrated throughout the day. The Spring bottle connects to the Bellabeat app to track your hydration levels.

Bellabeat also offers a subscription-based membership program for users. Membership gives users 24/7 access to fully personalized guidance on nutrition, activity, sleep, health and beauty, and mindfulness based on their lifestyle and goals.

Step 1- Ask

In this step of the analysis process, we identify the business task and desired outcome.

Questions
  1. What are some trends in smart device usage?
  2. How could these trends apply to Bellabeat customers?
  3. How could these trends help influence Bellabeat marketing strategy?

Business Task

Bellabeat wants an analysis of its available consumer data in the hope that it will reveal more opportunities for growth. The company has asked the marketing analytics team to focus on one Bellabeat product and analyze smart device usage data in order to gain insight into how people are already using their smart devices. Using this information, it would like high-level recommendations for how these trends can inform Bellabeat marketing strategy.

Step 2- Prepare

Dataset

The dataset used is the FitBit Fitness Tracker Data dataset from Kaggle.

We will assess the dataset's credibility with the ROCCC method.
  • Reliability: the data comes from 30 eligible FitBit users of unknown demographics and backgrounds who consented to share it, which is too small a sample to represent the overall FitBit user base.
  • Original: the data was collected through a survey distributed via Amazon Mechanical Turk, so it is third-party data rather than data gathered by Bellabeat.
  • Comprehensive: the dataset includes records of weight, steps taken, distance traveled, calories burned, and intensity of daily activity. However, it is missing information that could be helpful during analysis, such as age, gender, and location.
  • Current: this data was collected from March 2016 to May 2016 and has not been updated since, so it is no longer current.
  • Cited: the data is cited, but since this dataset is from an outside survey, it may be unreliable.

Step 3- Process

First I downloaded the dataset from Kaggle and cleaned it in Excel before beginning analysis in RStudio. In Excel I removed duplicate rows, created new columns to convert minutes to hours, and applied consistent number and date formatting.
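The same cleaning steps could also be done in R, which keeps them repeatable. The following is a minimal sketch rather than the workflow used here: `raw_sleep` is a small illustrative table built from the first sleep rows (with one row duplicated on purpose), and `sleep_cleaned` is a hypothetical name. It also converts the date text to a `Date`, which the Excel-cleaned files in this report do not do.

```r
library(dplyr)
library(lubridate)

# Illustrative input: a few sleep rows, with the first row duplicated on purpose.
raw_sleep <- tibble(
  Id = c(1503960366, 1503960366, 1503960366, 1503960366),
  SleepDay = c("04/12/16", "04/12/16", "04/13/16", "04/15/16"),
  TotalMinutesAsleep = c(327, 327, 384, 412),
  TotalTimeInBed = c(346, 346, 407, 442)
)

sleep_cleaned <- raw_sleep %>%
  distinct() %>%                      # remove exact duplicate rows
  mutate(
    SleepDay = mdy(SleepDay),         # parse "mm/dd/yy" text into a Date
    TotalHoursAsleep = round(TotalMinutesAsleep / 60, 2),
    TotalHoursInBed = round(TotalTimeInBed / 60, 2)
  )

sleep_cleaned
```

Doing the conversion in code makes it easy to rerun the cleaning if the raw files change.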

installing and loading packages
install.packages("tidyverse")
install.packages("lubridate")
install.packages("dplyr")
install.packages("ggplot2")
install.packages("tidyr")
install.packages("here")
install.packages("skimr")
install.packages("janitor")

library(tidyverse)
library(lubridate)
library(dplyr)
library(ggplot2)
library(tidyr)
library(here)
library(skimr)
library(janitor)
importing dataframes
activity <- read_csv("/cloud/lib/dailyActivity_cleaned.csv")
sleep <- read_csv("/cloud/lib/sleep_day_cleaned.csv")
weight <- read_csv("/cloud/lib/weightLogInfo_cleaned.csv")
viewing dataframes with the ‘head()’ function
head(activity)
## # A tibble: 6 × 17
##           Id ActivityDate TotalSteps TotalDistance TrackerDistance
##        <dbl> <chr>             <dbl>         <dbl>           <dbl>
## 1 1503960366 04/12/16          13162          8.5             8.5 
## 2 1503960366 04/13/16          10735          6.97            6.97
## 3 1503960366 04/14/16          10460          6.74            6.74
## 4 1503960366 04/15/16           9762          6.28            6.28
## 5 1503960366 04/16/16          12669          8.16            8.16
## 6 1503960366 04/17/16           9705          6.48            6.48
## # ℹ 12 more variables: LoggedActivitiesDistance <dbl>,
## #   VeryActiveDistance <dbl>, ModeratelyActiveDistance <dbl>,
## #   LightActiveDistance <dbl>, SedentaryActiveDistance <dbl>,
## #   VeryActiveMinutes <dbl>, FairlyActiveMinutes <dbl>,
## #   LightlyActiveMinutes <dbl>, SedentaryMinutes <dbl>, Calories <dbl>,
## #   LightHours <dbl>, SedHours <dbl>
head(sleep)
## # A tibble: 6 × 7
##           Id SleepDay TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
##        <dbl> <chr>                <dbl>              <dbl>          <dbl>
## 1 1503960366 04/12/16                 1                327            346
## 2 1503960366 04/13/16                 2                384            407
## 3 1503960366 04/15/16                 1                412            442
## 4 1503960366 04/16/16                 2                340            367
## 5 1503960366 04/17/16                 1                700            712
## 6 1503960366 04/19/16                 1                304            320
## # ℹ 2 more variables: TotalHoursAsleep <dbl>, TotalHoursInBed <dbl>
head(weight)
## # A tibble: 6 × 8
##           Id Date     WeightKg WeightPounds   Fat   BMI IsManualReport     LogId
##        <dbl> <chr>       <dbl>        <dbl> <dbl> <dbl> <lgl>              <dbl>
## 1 1503960366 05/02/16     52.6         116.    22  22.6 TRUE             1.46e12
## 2 1503960366 05/03/16     52.6         116.    NA  22.6 TRUE             1.46e12
## 3 1927972279 04/13/16    134.          294.    NA  47.5 FALSE            1.46e12
## 4 2873212765 04/21/16     56.7         125     NA  21.4 TRUE             1.46e12
## 5 2873212765 05/12/16     57.3         126.    NA  21.7 TRUE             1.46e12
## 6 4319703577 04/17/16     72.4         160.    25  27.4 TRUE             1.46e12
determining the number of unique user IDs to know the number of participants
n_distinct(activity$Id)
## [1] 33
n_distinct(sleep$Id)
## [1] 25
n_distinct(weight$Id)
## [1] 8

Step 4- Analyze

gathering summaries of the dataframes
activity %>%
  select(TotalSteps,
         TotalDistance,
         SedentaryMinutes,
         VeryActiveMinutes,
         LightlyActiveMinutes,
         Calories) %>%
  summary()
##    TotalSteps    TotalDistance    SedentaryMinutes VeryActiveMinutes
##  Min.   :    0   Min.   : 0.000   Min.   :   0.0   Min.   :  0.00   
##  1st Qu.: 3790   1st Qu.: 2.620   1st Qu.: 729.8   1st Qu.:  0.00   
##  Median : 7406   Median : 5.245   Median :1057.5   Median :  4.00   
##  Mean   : 7638   Mean   : 5.490   Mean   : 991.2   Mean   : 21.16   
##  3rd Qu.:10727   3rd Qu.: 7.713   3rd Qu.:1229.5   3rd Qu.: 32.00   
##  Max.   :36019   Max.   :28.030   Max.   :1440.0   Max.   :210.00   
##  LightlyActiveMinutes    Calories   
##  Min.   :  0.0        Min.   :   0  
##  1st Qu.:127.0        1st Qu.:1828  
##  Median :199.0        Median :2134  
##  Mean   :192.8        Mean   :2304  
##  3rd Qu.:264.0        3rd Qu.:2793  
##  Max.   :518.0        Max.   :4900
sleep %>%
  select(TotalHoursAsleep,
         TotalHoursInBed,
         TotalSleepRecords) %>%
  summary()
##  TotalHoursAsleep TotalHoursInBed  TotalSleepRecords
##  Min.   : 0.970   Min.   : 1.020   Min.   :1.00     
##  1st Qu.: 6.020   1st Qu.: 6.732   1st Qu.:1.00     
##  Median : 7.210   Median : 7.720   Median :1.00     
##  Mean   : 6.987   Mean   : 7.641   Mean   :1.12     
##  3rd Qu.: 8.170   3rd Qu.: 8.770   3rd Qu.:1.00     
##  Max.   :13.270   Max.   :16.020   Max.   :3.00     
##  NA's   :3        NA's   :3        NA's   :3
weight %>%  
  select(WeightPounds,
         BMI) %>%
  summary()
##   WeightPounds        BMI       
##  Min.   :116.0   Min.   :21.45  
##  1st Qu.:135.4   1st Qu.:23.96  
##  Median :137.8   Median :24.39  
##  Mean   :158.8   Mean   :25.19  
##  3rd Qu.:187.5   3rd Qu.:25.56  
##  Max.   :294.3   Max.   :47.54
Findings
  • Users take only about 7,600 steps a day on average, which is lower than the widely cited target of 10,000 steps a day.
  • The average BMI across the weight logs of the 8 participants who provided weight data is 25.19, with the highest recorded being 47.54. According to the CDC, a healthy BMI is 18.5 to 24.9, which puts the average just inside the overweight range and the 47.54 reading in the obese range. The median BMI of 24.39 is within the healthy range, so the mean is pulled up by the high outlier.
  • Participants who contributed weight information had not lost any weight in the recorded period.
  • The average person is getting just below the CDC-recommended 7-8 hours of sleep per night for adults. The average amount of sleep participants got was 6.99 hours.
  • It is worth noting that there are no time stamps for the sleep data and some users logged multiple entries per day, so some of the higher sleep times could be users logging their naps along with their nighttime sleep.
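The weight-loss finding can be checked by comparing each user's first and last logged weight. The sketch below is only an illustration of that comparison: `weight_sample` holds four rows copied from the `head(weight)` output above (with weights as rounded in that printout), and the names `weight_sample` and `weight_change` are hypothetical.

```r
library(dplyr)
library(lubridate)

# Rows copied from the head(weight) output: two users with two logs each.
weight_sample <- tibble(
  Id = c(1503960366, 1503960366, 2873212765, 2873212765),
  Date = c("05/02/16", "05/03/16", "04/21/16", "05/12/16"),
  WeightKg = c(52.6, 52.6, 56.7, 57.3)
)

weight_change <- weight_sample %>%
  mutate(Date = mdy(Date)) %>%        # parse dates so sorting is chronological
  arrange(Id, Date) %>%
  group_by(Id) %>%
  summarise(
    n_logs = n(),
    first_kg = first(WeightKg),
    last_kg = last(WeightKg),
    change_kg = last_kg - first_kg,   # negative values would mean weight loss
    .groups = "drop"
  )

weight_change
```

Including `n_logs` matters here: a user with only one or two entries gives very little evidence about weight change either way.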

Step 5- Share

Creating visualizations to communicate findings

These two graphs show the positive correlation between distance traveled and calories burned, and between very active minutes and calories burned.

This graph shows the positive correlation between hours asleep and hours spent in bed. It also shows that about 70% of the recorded nights had less than 8 hours of sleep.
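For reference, a plot and summary figures of this kind could be produced roughly as follows. This is a hedged sketch, not the code behind the original graphs: `sleep_sample` contains only the six rows shown in the `head(sleep)` output, so its numbers will differ from those for the full dataset, and the object names are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Rows copied from the head(sleep) output; minutes converted to hours.
sleep_sample <- tibble(
  TotalMinutesAsleep = c(327, 384, 412, 340, 700, 304),
  TotalTimeInBed = c(346, 407, 442, 367, 712, 320)
) %>%
  mutate(
    TotalHoursAsleep = TotalMinutesAsleep / 60,
    TotalHoursInBed = TotalTimeInBed / 60
  )

# Share of sleep records under 8 hours (records, not participants).
share_under_8 <- mean(sleep_sample$TotalHoursAsleep < 8, na.rm = TRUE)

# Pearson correlation between time asleep and time in bed.
sleep_cor <- cor(sleep_sample$TotalHoursAsleep, sleep_sample$TotalHoursInBed,
                 use = "complete.obs")

# Scatter plot with a fitted line and a dashed reference line at 8 hours asleep.
sleep_plot <- ggplot(sleep_sample,
                     aes(x = TotalHoursAsleep, y = TotalHoursInBed)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  geom_vline(xintercept = 8, linetype = "dashed") +
  labs(x = "Hours asleep", y = "Hours in bed")
```

Printing `sleep_plot` draws the chart. Swapping in `TotalDistance` or `VeryActiveMinutes` against `Calories` from the activity data would give the calorie plots in the same way.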

Findings
  • The number of sedentary minutes is much higher than the number of active minutes.
  • People burn more calories the more they move, which tells us that Bellabeat should be encouraging users to move more.
  • Participants who provided weight data did not lose any weight during the data collection period despite using fitness products.

Step 6- Act

Recommendations
  • Use of reminders: Bellabeat can send push notifications to users' phones and Bellabeat devices with reminders to stand and move if they have been stationary for too long. Users spend an average of 991 minutes, or about 16.5 hours, sedentary each day, so a reminder to get up, move, and stretch can be very beneficial.
  • Weekly challenges: Introducing weekly personal challenges in the Bellabeat app can help motivate users to be more active and reach their health goals. Bellabeat could also introduce weekly or monthly competitions between friends or users from the same geographical location for digital trophies or other small rewards.
  • Collect more user data: Collecting more specific user information such as height, age, and other body measurements can help create a more personalized health plan for users that can’t be achieved with just BMI and weight.
  • Differentiate between naps and sleep: Showing the total hours slept in a day can be misleading if naps are included in the sleep data. Bellabeat should introduce an option to differentiate naps from nighttime sleeping to help users track their sleep more accurately.
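To gauge how often the nap issue could arise in the existing data, days with more than one sleep record can be flagged. The sketch below is illustrative only: `sleep_sample` holds the six rows from the `head(sleep)` output, `multiple_records` is a hypothetical column name, and a second record is only a possible nap, since the data has no time stamps to confirm it.

```r
library(dplyr)

# Rows copied from the head(sleep) output above.
sleep_sample <- tibble(
  SleepDay = c("04/12/16", "04/13/16", "04/15/16",
               "04/16/16", "04/17/16", "04/19/16"),
  TotalSleepRecords = c(1, 2, 1, 2, 1, 1),
  TotalMinutesAsleep = c(327, 384, 412, 340, 700, 304)
)

# Flag days whose total may combine a nap with nighttime sleep.
sleep_flagged <- sleep_sample %>%
  mutate(multiple_records = TotalSleepRecords > 1)

# Count and share of flagged days.
sleep_flagged %>%
  summarise(
    days = n(),
    days_with_multiple_records = sum(multiple_records),
    share = mean(multiple_records)
  )
```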