Bellabeat Case Study Analysis Using R

By Christy

The purpose of this case study analysis is to provide marketing strategy solutions to Bellabeat, a high-tech manufacturer of health-focused products for women.

About the company

Urška Sršen and Sando Mur founded Bellabeat, a high-tech company that manufactures health-focused smart products. Sršen used her background as an artist to develop beautifully designed technology that informs and inspires women around the world. Collecting data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with knowledge about their own health and habits. Since it was founded in 2013, Bellabeat has grown rapidly and quickly positioned itself as a tech-driven wellness company for women.

This document is divided into sections that reflect the data analysis process, which includes the following six phases: ask, prepare, process, analyze, share, and act.

Ask

By analysing public data on smart device users’ daily habits, I developed high-level recommendations for how usage trends can inform Bellabeat’s marketing strategy. The analysis was guided by three questions:

  1. What are some trends in smart device usage?

  2. How could these trends apply to Bellabeat customers?

  3. How could these trends help influence Bellabeat marketing strategy?

Prepare

The data came from the FitBit Fitness Tracker Data, a CC0 public domain dataset made available on Kaggle. Thirty eligible Fitbit users consented to share their personal tracker data.

Although the data is original, it is not fully reliable because some data entries are incomplete. Furthermore, the sample size is too small to support unbiased conclusions.

I used R throughout all phases of the data analysis process because it made it easier to organise large datasets and to create visualisations with greater flexibility.

Downloading and storing the data

library(tidyverse)
library(here)
library(tinytex)
library(skimr)
library(dplyr)
library(janitor)
library(lubridate)
daily_activity<-read_csv("C:/Users/Christy Un/OneDrive/Documents/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")
daily_steps<-read_csv("C:/Users/Christy Un/OneDrive/Documents/Fitabase Data 4.12.16-5.12.16/dailySteps_merged.csv")
daily_sleep<-read_csv("C:/Users/Christy Un/OneDrive/Documents/Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")
hourly_steps<-read_csv("C:/Users/Christy Un/OneDrive/Documents/Fitabase Data 4.12.16-5.12.16/hourlySteps_merged.csv")
hourly_calories<-read_csv("C:/Users/Christy Un/OneDrive/Documents/Fitabase Data 4.12.16-5.12.16/hourlyCalories_merged.csv")
hourly_intensities<-read_csv("C:/Users/Christy Un/OneDrive/Documents/Fitabase Data 4.12.16-5.12.16/hourlyIntensities_merged.csv")
weight<-read_csv("C:/Users/Christy Un/OneDrive/Documents/Fitabase Data 4.12.16-5.12.16/weightLogInfo_merged.csv")
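
The calls above repeat the same folder path for every file. As a minimal sketch (not the code used for this analysis), the folder could be stored once and joined to each file name with file.path(). The helper load_table() and the temporary demo folder are hypothetical stand-ins for the real Fitabase directory, and base R's read.csv() is used only to keep the sketch free of package dependencies; read_csv() could be swapped in.

```r
# Hypothetical stand-in for the Fitabase folder: a temporary directory
# holding one tiny CSV, so that the sketch runs on any machine.
data_dir <- file.path(tempdir(), "fitabase_demo")
dir.create(data_dir, showWarnings = FALSE)
write.csv(
  data.frame(Id = c(1, 2), TotalSteps = c(13162, 10735)),
  file.path(data_dir, "dailyActivity_merged.csv"),
  row.names = FALSE
)

# Hypothetical helper: build the full path from the folder and a file name.
load_table <- function(file_name, dir = data_dir) {
  read.csv(file.path(dir, file_name))
}

daily_activity_demo <- load_table("dailyActivity_merged.csv")
str(daily_activity_demo)
```

With this layout, moving the data to another folder means changing only data_dir.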

Process

Then, I cleaned the data by removing duplicated entries and rows with missing values, and converted the variable names to lowercase so that they were consistent and easy to reference.

daily_activity <- daily_activity %>% distinct() %>% drop_na()
daily_sleep <- daily_sleep %>% distinct() %>% drop_na()
daily_steps <- daily_steps %>% distinct() %>% drop_na()
hourly_calories <- hourly_calories %>% distinct() %>% drop_na()
hourly_intensities <- hourly_intensities %>% distinct() %>% drop_na()
hourly_steps <- hourly_steps %>% distinct() %>% drop_na()
daily_activity <- rename_with(daily_activity, tolower)
daily_sleep <- rename_with(daily_sleep, tolower)
daily_steps <- rename_with(daily_steps, tolower)
hourly_calories <- rename_with(hourly_calories, tolower)
hourly_intensities <- rename_with(hourly_intensities, tolower)
hourly_steps <- rename_with(hourly_steps, tolower)
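
The same three cleaning steps are applied to every table above. One possible way to avoid the repetition, sketched here on made-up toy data, is to wrap the steps in a function and apply it to a list of tables. The names clean_table(), toy_activity and rows_removed are hypothetical and do not appear in the original analysis.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper that bundles the three cleaning steps.
clean_table <- function(df) {
  df %>%
    distinct() %>%          # drop exact duplicate rows
    drop_na() %>%           # drop rows with any missing value
    rename_with(tolower)    # lowercase all column names
}

# Toy data (made up): one duplicated row and one row with a missing value.
toy_activity <- tibble(
  Id = c(1, 1, 2, 3),
  TotalSteps = c(100, 100, NA, 300)
)

tables <- list(activity = toy_activity)
cleaned <- lapply(tables, clean_table)

# Rows removed per table, useful for judging how much data was discarded.
rows_removed <- sapply(tables, nrow) - sapply(cleaned, nrow)
rows_removed
```

Reporting the number of removed rows makes it easier to see how much drop_na() shrinks an already small sample.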

The date entries were then converted to usable formats for analysis at a later stage.

daily_activity<-daily_activity%>%rename(date=activitydate)%>%mutate(date = as_date(date, format = "%m/%d/%Y"))

daily_sleep<-daily_sleep%>%rename(date=sleepday)%>%mutate(date = as_date(date, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone()))

hourly_calories<-hourly_calories%>%rename(date_time=activityhour)%>%mutate(date_time = as.POSIXct(date_time, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone()))

hourly_intensities<-hourly_intensities%>%rename(date_time=activityhour)%>%mutate(date_time = as.POSIXct(date_time, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone()))

hourly_steps<-hourly_steps%>%rename(date_time=activityhour)%>%mutate(date_time = as.POSIXct(date_time, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone()))
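
To make the format strings above easier to read, here is a small base R sketch that parses one string in each layout. The two sample strings are illustrative values written to match the format strings, and the fixed "UTC" time zone is an assumption made for the sketch so that the result does not depend on the machine's Sys.timezone().

```r
# Illustrative strings in the two layouts that the format strings expect.
day_string  <- "4/12/2016"
hour_string <- "4/12/2016 1:00:00 PM"

# Date only: month/day/four-digit year.
parsed_day <- as.Date(day_string, format = "%m/%d/%Y")

# Date-time on a 12-hour clock: %I is the hour (01-12) and %p is AM/PM.
# Parsing %p assumes an English (or C) locale.
parsed_hour <- as.POSIXct(hour_string,
                          format = "%m/%d/%Y %I:%M:%S %p",
                          tz = "UTC")

format(parsed_day, "%Y-%m-%d")   # "2016-04-12"
format(parsed_hour, "%H:%M")     # "13:00"
```

If a string does not match the format, both functions return NA rather than raising an error, so it is worth checking for NA values after conversion.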

To find the relationship between sleeping patterns and daily activities of users, I then merged the two datasets together.

daily_activity_sleep <- merge(daily_activity, daily_sleep, by = c("id", "date"))

Analyze and Share

For analysis, the relevant data visualisation packages were loaded.

library(ggplot2)
library(ggthemes)

Daily activity

Users spent most of their time in sedentary activities (an average of 991.2 minutes per day), followed by lightly active exercise (an average of 192.8 minutes). The World Health Organisation notes that adults aged 18 to 64 should do at least 150 to 300 minutes of moderate-intensity aerobic physical activity per week for good health, which is equivalent to about 21.4 to 42.9 minutes per day. Yet, treating Fitbit's "fairly active" minutes as the closest equivalent, the average duration in the sample data was only 13.56 minutes per day.

In terms of calories expended and steps taken, Fitbit users on average met the minimum benchmarks for relatively good health. An average of 2,304 calories was expended per day, which was above the reference figures of 2,200 calories for men and 1,600 calories for women cited by Healthline. Meanwhile, although users took an average of 7,638 steps per day, exceeding the 5,000-step benchmark for a sedentary lifestyle, general medical advice recommends that most adults take 10,000 steps per day for good health.

daily_activity%>%select(veryactiveminutes,fairlyactiveminutes,lightlyactiveminutes,sedentaryminutes,calories,totalsteps)%>%summary()
##  veryactiveminutes fairlyactiveminutes lightlyactiveminutes sedentaryminutes
##  Min.   :  0.00    Min.   :  0.00      Min.   :  0.0        Min.   :   0.0  
##  1st Qu.:  0.00    1st Qu.:  0.00      1st Qu.:127.0        1st Qu.: 729.8  
##  Median :  4.00    Median :  6.00      Median :199.0        Median :1057.5  
##  Mean   : 21.16    Mean   : 13.56      Mean   :192.8        Mean   : 991.2  
##  3rd Qu.: 32.00    3rd Qu.: 19.00      3rd Qu.:264.0        3rd Qu.:1229.5  
##  Max.   :210.00    Max.   :143.00      Max.   :518.0        Max.   :1440.0  
##     calories      totalsteps   
##  Min.   :   0   Min.   :    0  
##  1st Qu.:1828   1st Qu.: 3790  
##  Median :2134   Median : 7406  
##  Mean   :2304   Mean   : 7638  
##  3rd Qu.:2793   3rd Qu.:10727  
##  Max.   :4900   Max.   :36019
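
The benchmark comparisons in this section rest on simple arithmetic, which can be checked with a short sketch. All figures are the ones quoted in the text and in the summary output; the variable names are hypothetical.

```r
# Figures quoted in the text and in the summary output.
who_weekly_minutes <- c(lower = 150, upper = 300)  # WHO weekly guidance
mean_fairly_active <- 13.56                         # sample mean, minutes per day
mean_steps  <- 7638                                 # sample mean, steps per day
step_target <- 10000                                # commonly cited daily target

# Weekly guidance expressed per day.
who_daily_minutes <- round(who_weekly_minutes / 7, 1)
who_daily_minutes   # lower 21.4, upper 42.9

# Gap between the sample mean and the lower daily bound, in minutes.
daily_gap <- who_daily_minutes[["lower"]] - mean_fairly_active
daily_gap           # about 7.84

# Share of the step target reached on an average day, in percent.
step_share <- round(100 * mean_steps / step_target, 1)
step_share          # 76.4
```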

Hourly activity

The analysis below showed that calorie expenditure peaked between 17:00 and 19:00, after typical working hours. This might be the time when users had more free time to exercise. On the other hand, the fewest calories were expended around midnight and in the early morning.

hourly_calories<-hourly_calories%>%mutate(time=format(as.POSIXct(date_time),format="%H:%M"))

hourly_calories%>%ggplot(aes(x=time,y=calories))+geom_bar(stat="identity")+labs(title="Calories Burned by Time",x="Time (hour)",y="Calories")+theme(plot.title = element_text(hjust = 0.5))

Further analysis examined user activity intensities by hour, which showed a pattern similar to that of the chart above (calories burned by hour). The period between 17:00 and 19:00 had the highest total intensity (the sum of the minute-level intensity values within each hour), which suggests that Fitbit users were most active during these hours.

hourly_intensities<-hourly_intensities%>%mutate(time=format(as.POSIXct(date_time),format="%H:%M"))
hourly_intensities%>%ggplot(aes(x=time,y=totalintensity))+geom_bar(stat="identity")+labs(title="Activity Intensities by Hour",x="Time (hour)",y="Intensities")+theme(plot.title = element_text(hjust = 0.5))
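
Note that geom_bar(stat="identity") stacks every row that shares the same time label, so the bar heights above are totals across all users and days. If an average per hour is preferred, the data can be aggregated before plotting. The following is a minimal sketch on made-up toy values; toy_hourly, hourly_mean and peak_hour are hypothetical names.

```r
# Toy hourly records (made-up values) for two users at two times of day.
toy_hourly <- data.frame(
  id       = c(1, 2, 1, 2),
  time     = c("03:00", "03:00", "17:00", "17:00"),
  calories = c(60, 70, 150, 170)
)

# Average calories for each time label, one row per hour.
hourly_mean <- aggregate(calories ~ time, data = toy_hourly, FUN = mean)
hourly_mean

# Hour with the highest average calorie expenditure.
peak_hour <- as.character(hourly_mean$time[which.max(hourly_mean$calories)])
peak_hour   # "17:00"
```

The aggregated table has one row per hour and could be passed to ggplot() in the same way as the original data.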

Relationship between steps taken and calories

I found a positive correlation between the total number of steps recorded by the Fitbit device and the total estimated energy expenditure in kilocalories. In other words, users who moved more tended to burn more calories, which is consistent with the view that more exercise supports good health.

Relationship_TrackerDistance_Calories<-daily_activity%>%ggplot(aes(totalsteps,calories))+geom_point()+geom_smooth(method="loess")+ggtitle("Relationship Between Total Steps and Calories")+xlab("Total Number of Steps")+ylab("Calories (kilocalories)")

Relationship_TrackerDistance_Calories
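
The chart shows the direction of the relationship, and a correlation coefficient can put a number on its strength. As a minimal sketch, the values below are the first six rows of the merged table printed by head() later in this document. They belong to a single user, so the output illustrates the calculation only and is not a result for the whole sample.

```r
# First six rows printed by head() in this document (one user only).
totalsteps <- c(13162, 10735, 9762, 12669, 9705, 15506)
calories   <- c(1985, 1797, 1745, 1863, 1728, 2035)

# Pearson correlation measures linear association.
pearson_r <- cor(totalsteps, calories)

# Spearman correlation measures monotonic association through ranks.
spearman_rho <- cor(totalsteps, calories, method = "spearman")

c(pearson = pearson_r, spearman = spearman_rho)
```

A high coefficient describes association only; it does not show that taking more steps causes the higher calorie figure.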

Relationship between weight, BMI and calories

According to the weight classification based on BMI range, there are seven categories of weight, as shown below.

Figure: BMI range and classification

Looking at the summary statistics of the users’ BMI and weight, the mean BMI was 25.19, which falls just within the overweight category (25-29.9). Meanwhile, the mean weight was 72.04 kg.

weight%>%select(BMI,WeightKg)%>%summary()
##       BMI           WeightKg     
##  Min.   :21.45   Min.   : 52.60  
##  1st Qu.:23.96   1st Qu.: 61.40  
##  Median :24.39   Median : 62.50  
##  Mean   :25.19   Mean   : 72.04  
##  3rd Qu.:25.56   3rd Qu.: 85.05  
##  Max.   :47.54   Max.   :133.50

I created another variable called “weight_categories” for further analysis.

# case_when() checks the conditions in order, so each upper bound is implied by the next line;
# this leaves no gap for values such as 24.95, which would otherwise be left as NA
weightfinal <- weight %>% mutate(weight_categories = case_when(
  BMI < 16.5 ~ "Severely Underweight",
  BMI < 18.5 ~ "Underweight",
  BMI < 25 ~ "Normal Weight",
  BMI < 30 ~ "Overweight",
  BMI < 35 ~ "Obese Class I",
  BMI < 40 ~ "Obese Class II",
  BMI >= 40 ~ "Obese Class III"))
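
BMI bands can also be assigned with base R's cut(), which treats the bands as half-open intervals so that every value falls in exactly one category. This is an alternative sketch, not the method used in this analysis; classify_bmi(), bmi_breaks and bmi_labels are hypothetical names, and 24.95 is a made-up test value.

```r
# Lower bound of each BMI band, taken from the ranges used in this section.
bmi_breaks <- c(-Inf, 16.5, 18.5, 25, 30, 35, 40, Inf)
bmi_labels <- c("Severely Underweight", "Underweight", "Normal Weight",
                "Overweight", "Obese Class I", "Obese Class II",
                "Obese Class III")

# right = FALSE makes every interval closed on the left and open on the
# right, e.g. [18.5, 25), so no value can fall between two bands.
classify_bmi <- function(bmi) {
  as.character(cut(bmi, breaks = bmi_breaks, labels = bmi_labels,
                   right = FALSE))
}

# Minimum, mean and maximum BMI from the summary above, plus 24.95.
classify_bmi(c(21.45, 25.19, 47.54, 24.95))
# "Normal Weight" "Overweight" "Obese Class III" "Normal Weight"
```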

Then, I created a boxplot to examine the relationship between weight and BMI classification. For users in the overweight category, the mean weight was over 85 kg. There was an outlier case in obese class III.

weightfinal%>%ggplot(aes(x=reorder(weight_categories,BMI),y=WeightKg))+geom_boxplot(outlier.color="red")+labs(title="Average Weight (kg) by BMI",x="BMI Classification",y="Weight (kg)")+scale_y_continuous(breaks=seq(60,135,15))+theme(plot.title = element_text(hjust = 0.5))

Nonetheless, the reliability of these conclusions was undermined by the small sample size and the lack of demographic information. For example, BMI classifications depend on age and sex for individuals aged between 2 and 20 years old.

I also found that there were almost as many overweight users as users with normal weight, as displayed below.

weight_classification<-weightfinal%>%group_by(weight_categories)%>%summarise(number_of_people=n())
weight_classification%>%ggplot(aes(x="", y=number_of_people, fill=weight_categories)) +geom_bar(stat="identity", width=1,color="white")+geom_text(aes(label=number_of_people),position=position_stack(vjust=0.5)) +coord_polar(theta="y", start=0)+theme_economist()+theme_void()+labs(fill="Weight Categories")+scale_fill_discrete(breaks=c("Normal Weight","Overweight","Obese Class III"))+ggtitle("Weight Categories of Fitbit Users")+theme(plot.title = element_text(hjust = 0.5))
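
One caveat when reading the counts: n() counts rows, so if a user logged their weight more than once, each entry is counted separately. The sketch below, on made-up toy data, shows how n_distinct() could be used to count unique users alongside log entries. The table toy_weight and the column number_of_entries are hypothetical.

```r
library(dplyr)

# Toy weight log (made-up values): user 1 logged three times, user 2 once.
toy_weight <- tibble(
  Id = c(1, 1, 1, 2),
  weight_categories = c("Normal Weight", "Normal Weight",
                        "Normal Weight", "Overweight")
)

toy_counts <- toy_weight %>%
  group_by(weight_categories) %>%
  summarise(
    number_of_entries = n(),            # rows, i.e. log entries
    number_of_people  = n_distinct(Id)  # unique users
  )
toy_counts
```

In the toy data, the "Normal Weight" group has three entries but only one person, which shows how the two counts can diverge.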

User sleep and weight

Studies have shown that poor sleep or stress is linked to higher calorie consumption and weight gain; relevant information can be found on Harvard Health Publishing and Healthline. To explore this, I merged the datasets related to weight and sleep.

weightfinal$Date<-mdy_hms(weightfinal$Date)

# Keep only the calendar date so that the merge key has the same Date class as daily_activity_sleep$date
weightrevised <- weightfinal %>% mutate(Date = as_date(Date)) %>% rename(date = Date, id = Id)

daily_sleep_weight <- merge(daily_activity_sleep, weightrevised, by=c ("id", "date"))

Then, I created an extra variable named “time_not_sleeping”, the difference between total time in bed and total minutes asleep. This covers the time when Fitbit users were restless or awake, which can indicate poor sleep quality.

daily_sleep_weight<-daily_sleep_weight%>%mutate(time_not_sleeping=totaltimeinbed-totalminutesasleep)

head(daily_activity_sleep)
##           id       date totalsteps totaldistance trackerdistance
## 1 1503960366 2016-04-12      13162          8.50            8.50
## 2 1503960366 2016-04-13      10735          6.97            6.97
## 3 1503960366 2016-04-15       9762          6.28            6.28
## 4 1503960366 2016-04-16      12669          8.16            8.16
## 5 1503960366 2016-04-17       9705          6.48            6.48
## 6 1503960366 2016-04-19      15506          9.88            9.88
##   loggedactivitiesdistance veryactivedistance moderatelyactivedistance
## 1                        0               1.88                     0.55
## 2                        0               1.57                     0.69
## 3                        0               2.14                     1.26
## 4                        0               2.71                     0.41
## 5                        0               3.19                     0.78
## 6                        0               3.53                     1.32
##   lightactivedistance sedentaryactivedistance veryactiveminutes
## 1                6.06                       0                25
## 2                4.71                       0                21
## 3                2.83                       0                29
## 4                5.04                       0                36
## 5                2.51                       0                38
## 6                5.03                       0                50
##   fairlyactiveminutes lightlyactiveminutes sedentaryminutes calories
## 1                  13                  328              728     1985
## 2                  19                  217              776     1797
## 3                  34                  209              726     1745
## 4                  10                  221              773     1863
## 5                  20                  164              539     1728
## 6                  31                  264              775     2035
##   totalsleeprecords totalminutesasleep totaltimeinbed
## 1                 1                327            346
## 2                 2                384            407
## 3                 1                412            442
## 4                 2                340            367
## 5                 1                700            712
## 6                 1                304            320
daily_activity_sleep<-daily_activity_sleep%>%mutate(time_not_sleeping=totaltimeinbed-totalminutesasleep)

head(daily_activity_sleep)
##           id       date totalsteps totaldistance trackerdistance
## 1 1503960366 2016-04-12      13162          8.50            8.50
## 2 1503960366 2016-04-13      10735          6.97            6.97
## 3 1503960366 2016-04-15       9762          6.28            6.28
## 4 1503960366 2016-04-16      12669          8.16            8.16
## 5 1503960366 2016-04-17       9705          6.48            6.48
## 6 1503960366 2016-04-19      15506          9.88            9.88
##   loggedactivitiesdistance veryactivedistance moderatelyactivedistance
## 1                        0               1.88                     0.55
## 2                        0               1.57                     0.69
## 3                        0               2.14                     1.26
## 4                        0               2.71                     0.41
## 5                        0               3.19                     0.78
## 6                        0               3.53                     1.32
##   lightactivedistance sedentaryactivedistance veryactiveminutes
## 1                6.06                       0                25
## 2                4.71                       0                21
## 3                2.83                       0                29
## 4                5.04                       0                36
## 5                2.51                       0                38
## 6                5.03                       0                50
##   fairlyactiveminutes lightlyactiveminutes sedentaryminutes calories
## 1                  13                  328              728     1985
## 2                  19                  217              776     1797
## 3                  34                  209              726     1745
## 4                  10                  221              773     1863
## 5                  20                  164              539     1728
## 6                  31                  264              775     2035
##   totalsleeprecords totalminutesasleep totaltimeinbed time_not_sleeping
## 1                 1                327            346                19
## 2                 2                384            407                23
## 3                 1                412            442                30
## 4                 2                340            367                27
## 5                 1                700            712                12
## 6                 1                304            320                16

The results showed that users slept 419.2 minutes on average (just under 7 hours), which was below the recommended eight hours of sleep. Meanwhile, users spent over half an hour on average (39.31 minutes) awake or restless in bed.

daily_activity_sleep%>%select(time_not_sleeping,totalminutesasleep)%>%summary()
##  time_not_sleeping totalminutesasleep
##  Min.   :  0.00    Min.   : 58.0     
##  1st Qu.: 17.00    1st Qu.:361.0     
##  Median : 25.50    Median :432.5     
##  Mean   : 39.31    Mean   :419.2     
##  3rd Qu.: 40.00    3rd Qu.:490.0     
##  Max.   :371.00    Max.   :796.0
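
The derived column and the unit conversion can be checked by hand. This short sketch recomputes time_not_sleeping for the six rows printed by head() above and converts the sample mean of minutes asleep into hours. The names mean_hours_asleep and sleep_shortfall are hypothetical.

```r
# First six rows of the head() output above.
totaltimeinbed     <- c(346, 407, 442, 367, 712, 320)
totalminutesasleep <- c(327, 384, 412, 340, 700, 304)

# Minutes in bed but not asleep, row by row.
time_not_sleeping <- totaltimeinbed - totalminutesasleep
time_not_sleeping   # 19 23 30 27 12 16, matching the printed column

# Sample mean of minutes asleep from the summary above, in hours.
mean_hours_asleep <- round(419.2 / 60, 2)
mean_hours_asleep   # 6.99

# Shortfall against the eight-hour guideline cited in the text, in minutes.
sleep_shortfall <- 8 * 60 - 419.2
sleep_shortfall     # about 60.8
```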

However, the correlations between weight and both sleep time and restless time were weak. This finding was not conclusive because of the small sample size.

daily_sleep_weight%>%summarize(weight_nosleep=cor(time_not_sleeping, WeightKg))
##   weight_nosleep
## 1      0.1023052
daily_sleep_weight%>%summarize(weight_sleep=cor(totalminutesasleep, WeightKg))
##   weight_sleep
## 1   0.03348514
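
With a small sample, a correlation coefficient carries considerable uncertainty. One way to make this visible, sketched here under stated assumptions, is base R's cor.test(), which reports a confidence interval and a p-value alongside the estimate. The weight values below are made up for illustration and are not taken from the dataset.

```r
# Six illustrative records: the restless minutes are the first six values
# printed earlier in this document, the weights are made-up values.
time_not_sleeping <- c(19, 23, 30, 27, 12, 16)
weight_kg <- c(52.6, 61.4, 62.5, 85.0, 72.0, 133.5)

# Pearson correlation test (the default method of cor.test).
result <- cor.test(time_not_sleeping, weight_kg)

result$estimate   # sample correlation coefficient
result$conf.int   # 95% confidence interval for the correlation
result$p.value    # p-value for the null hypothesis of zero correlation
```

A wide interval that includes zero would support the caution expressed above about drawing conclusions from so few records.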

I then looked into users’ total amount of sleep by weight category. Users with normal weight slept the longest on average, while overweight users slept less than those in obese class III.

daily_sleep_weight%>%select(weight_categories,totalminutesasleep)%>%group_by(weight_categories)%>%summarise(mean(totalminutesasleep))
## # A tibble: 3 × 2
##   weight_categories `mean(totalminutesasleep)`
##   <chr>                                  <dbl>
## 1 Normal Weight                           437.
## 2 Obese Class III                         398 
## 3 Overweight                              332
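
Group means are easier to judge when the number of records behind each mean is shown next to it. The following sketch, on made-up toy records, adds such a count to the grouped summary. The table toy_sleep_weight and the columns mean_minutes_asleep and n_records are hypothetical.

```r
library(dplyr)

# Made-up toy records; the real table in this analysis is daily_sleep_weight.
toy_sleep_weight <- tibble(
  weight_categories = c("Normal Weight", "Normal Weight", "Normal Weight",
                        "Overweight", "Obese Class III"),
  totalminutesasleep = c(420, 450, 441, 332, 398)
)

toy_summary <- toy_sleep_weight %>%
  group_by(weight_categories) %>%
  summarise(
    mean_minutes_asleep = mean(totalminutesasleep),
    n_records = n()    # number of nights behind each mean
  )
toy_summary
```

A mean based on a single night says little about a whole weight category, so the count helps the reader weigh each figure.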

User sleep and sedentary activities

Finally, there was a negative correlation between total minutes asleep and sedentary minutes. Although there were not enough data points for users with lower sedentary minutes, the chart below suggested that the more sedentary an individual was, the less they slept.

Relationship_Sedentary_Sleep<-daily_activity_sleep%>%ggplot(aes(sedentaryminutes,totalminutesasleep))+geom_point()+geom_smooth(method="loess")+ggtitle("Relationship Between Sleep and Sedentary Activities")+xlab("Sedentary Minutes")+ylab("Total Minutes Asleep")

Relationship_Sedentary_Sleep

Act

The target audience of Bellabeat’s marketing strategy is working women in office settings.

After analysing sample data from Fitbit users, I propose the following recommendations to improve Bellabeat’s marketing strategy.

Main findings and recommendations

  1. Based on the calories burned over the hours of a day, most calories are expended between late afternoon and early evening. This may reflect circadian rhythms, a natural daily pattern in how humans burn calories. This matters for dietary control, as eating during low calorie-burning hours might result in extra stored calories. The Bellabeat membership program could use this insight to tailor personal guidance on nutrition around circadian rhythms.

  2. As sample Fitbit users were the most active between 17:00 and 19:00 in terms of calorie expenditure and activity intensity, notifications encouraging activity could be pushed out during this time. Recommendations could include fun ways to exercise, chosen according to the amount of calories to be expended or the exercise planned for the day.

  3. Given that the sample Fitbit users did not have a healthy active lifestyle (they did not get enough fairly active exercise or take enough steps per day), the Bellabeat app could be designed to send daily reminders showing how much more exercise users should aim for.

  4. As there were almost as many overweight Fitbit users as those with normal weight, the Bellabeat membership program could customize monthly and weekly goals for users, especially overweight users, including recommendations covering diet, sleep patterns, and activities.

  5. As sample Fitbit users on average slept less than the recommended eight hours and had relatively poor sleep quality (nearly 40 minutes awake or restless in bed), Bellabeat products such as the wellness watch could remind users of their bedtime and provide health tips and mindfulness quotes to help them fall asleep more easily. Furthermore, as less sleep was associated with more sedentary time, notifications encouraging exercise could be pushed out when users are experiencing a lack of sleep or poor sleep.

  6. When examining the total amounts of sleep based on weight categories, sample Fitbit users with normal weight had a longer duration of average sleep. In the future, more data should be collected to analyze the relationship between sleep, weight, and BMI to provide further recommendations. Factors such as demographic information (e.g. age and gender) are also important considerations for data collection.