By Christy
The purpose of this case study analysis is to provide marketing
strategy solutions to Bellabeat, a high-tech manufacturer of
health-focused products for women.
## About the company
Urška Sršen and Sandro Mur founded Bellabeat, a high-tech company that
manufactures health-focused smart products. Sršen used her background as
an artist to develop beautifully designed technology that informs and
inspires women around the world. Collecting data on activity, sleep,
stress, and reproductive health has allowed Bellabeat to empower women
with knowledge about their own health and habits. Since it was founded
in 2013, Bellabeat has grown rapidly and quickly positioned itself as a
tech-driven wellness company for women.
This document is divided into sections that reflect the six phases of
the data analysis process: ask, prepare, process, analyze, share, and
act.
Process
Then, I cleaned the data by removing duplicate entries and rows with
missing values, and converted the variable names to a consistent
lowercase format.
daily_activity <- daily_activity %>% distinct() %>% drop_na()
daily_sleep <- daily_sleep %>% distinct() %>% drop_na()
daily_steps <- daily_steps %>% distinct() %>% drop_na()
hourly_calories <- hourly_calories %>% distinct() %>% drop_na()
hourly_intensities <- hourly_intensities %>% distinct() %>% drop_na()
hourly_steps <- hourly_steps %>% distinct() %>% drop_na()
daily_activity <- rename_with(daily_activity, tolower)
daily_sleep <- rename_with(daily_sleep, tolower)
daily_steps <- rename_with(daily_steps, tolower)
hourly_calories <- rename_with(hourly_calories, tolower)
hourly_intensities <- rename_with(hourly_intensities, tolower)
hourly_steps <- rename_with(hourly_steps, tolower)
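Since the same cleaning steps are applied to every table, they could also be wrapped in one function. The following is only a minimal sketch on made-up data: the helper name `clean_table` and the toy values are hypothetical and not part of the original analysis.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for one of the Fitbit tables (values are made up).
toy <- data.frame(
  Id = c(1, 1, 2, 3),
  TotalSteps = c(100, 100, NA, 250)
)

# Hypothetical helper bundling the three cleaning steps:
# drop duplicate rows, drop rows with missing values, lowercase the names.
clean_table <- function(df) {
  df %>%
    distinct() %>%
    drop_na() %>%
    rename_with(tolower)
}

cleaned <- clean_table(toy)
print(cleaned)
```

The duplicated row for user 1 and the row with a missing step count are both removed, and the column names become `id` and `totalsteps`.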
The date entries were then converted to usable formats for analysis
at a later stage.
daily_activity<-daily_activity%>%rename(date=activitydate)%>%mutate(date = as_date(date, format = "%m/%d/%Y"))
daily_sleep<-daily_sleep%>%rename(date=sleepday)%>%mutate(date = as_date(date, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone()))
hourly_calories<-hourly_calories%>%rename(date_time=activityhour)%>%mutate(date_time = as.POSIXct(date_time, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone()))
hourly_intensities<-hourly_intensities%>%rename(date_time=activityhour)%>%mutate(date_time = as.POSIXct(date_time, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone()))
hourly_steps<-hourly_steps%>%rename(date_time=activityhour)%>%mutate(date_time = as.POSIXct(date_time, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone()))
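To illustrate what the format string does, here is a small base R sketch on two made-up timestamps written in the same 12-hour style. It assumes an English-style locale so that `%p` recognises AM/PM, and it fixes the time zone to UTC (an assumption for reproducibility, whereas the code above uses `Sys.timezone()`).

```r
# Two example timestamps in the 12-hour format used by the hourly tables.
raw <- c("4/12/2016 12:00:00 AM", "4/12/2016 5:00:00 PM")

# %I is the 12-hour clock hour and %p the AM/PM marker.
parsed <- as.POSIXct(raw, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# 24-hour representation, as used later for the hourly charts.
hours_24 <- format(parsed, "%H:%M")
print(hours_24)

# Calendar date only, as used for the daily tables.
days <- as.Date(parsed, tz = "UTC")
print(days)
```

A fixed time zone keeps the parsed values identical on every machine, which matters if the analysis is rerun elsewhere.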
To find the relationship between sleeping patterns and daily
activities of users, I then merged the two datasets together.
daily_activity_sleep <- merge(daily_activity, daily_sleep, by=c ("id", "date"))
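By default `merge()` performs an inner join, so any day without a record in both tables is dropped. The sketch below shows the effect on made-up rows; the toy values are assumptions for illustration only.

```r
# Toy activity and sleep tables sharing the keys id and date.
activity <- data.frame(
  id = c(1, 1, 2),
  date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
  totalsteps = c(13162, 10735, 9000)
)
sleep <- data.frame(
  id = c(1, 2),
  date = as.Date(c("2016-04-12", "2016-04-14")),
  totalminutesasleep = c(327, 400)
)

# Inner join (the default): only id/date pairs present in both tables.
inner <- merge(activity, sleep, by = c("id", "date"))

# Left join: keep every activity row, with NA where no sleep record exists.
left <- merge(activity, sleep, by = c("id", "date"), all.x = TRUE)

print(nrow(inner))
print(nrow(left))
```

Comparing the row counts before and after the merge is a quick way to see how many activity days have no matching sleep record.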
Analyze and Share
For analysis, the relevant data visualisation packages were
loaded.
library(ggplot2)
library(ggthemes)
Daily activity
Most users spent the bulk of their day sedentary (an average of 991.2
minutes), followed by lightly active minutes (an average of 192.8
minutes). The World Health Organisation notes that adults aged 18 to 64
years old should do at least 150 to 300 minutes of moderate-intensity
(comparable to Fitbit's "fairly active") aerobic physical activity per
week for good health, which is equivalent to roughly 21.4 to 42.9
minutes per day. Yet the average fairly active duration in the sample
data was only 13.56 minutes per day.
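The per-day figures come from dividing the weekly guideline by seven. A short sketch of that arithmetic, using only the numbers quoted in this section:

```r
# Weekly guideline range in minutes, as quoted above.
who_weekly <- c(lower = 150, upper = 300)

# Equivalent minutes per day.
who_daily <- who_weekly / 7
print(round(who_daily, 1))

# Mean fairly active minutes per day in the sample (from the summary output).
observed_daily <- 13.56

# Minutes per day by which the sample mean falls short of the lower bound.
shortfall <- who_daily[["lower"]] - observed_daily
print(round(shortfall, 2))
```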
In terms of calories expended and steps taken, Fitbit users on average
met the baseline for relatively good health. An average of 2,304
calories was expended per day, slightly above the 2,200 calories for men
and 1,600 calories for women cited by Healthline. Meanwhile, users took
an average of 7,638 steps per day, which exceeds the 5,000-step
benchmark for a sedentary lifestyle but falls short of the 10,000 steps
per day that general medical advice recommends for most adults.
daily_activity%>%select(veryactiveminutes,fairlyactiveminutes,lightlyactiveminutes,sedentaryminutes,calories,totalsteps)%>%summary()
## veryactiveminutes fairlyactiveminutes lightlyactiveminutes sedentaryminutes
## Min. : 0.00 Min. : 0.00 Min. : 0.0 Min. : 0.0
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.:127.0 1st Qu.: 729.8
## Median : 4.00 Median : 6.00 Median :199.0 Median :1057.5
## Mean : 21.16 Mean : 13.56 Mean :192.8 Mean : 991.2
## 3rd Qu.: 32.00 3rd Qu.: 19.00 3rd Qu.:264.0 3rd Qu.:1229.5
## Max. :210.00 Max. :143.00 Max. :518.0 Max. :1440.0
## calories totalsteps
## Min. : 0 Min. : 0
## 1st Qu.:1828 1st Qu.: 3790
## Median :2134 Median : 7406
## Mean :2304 Mean : 7638
## 3rd Qu.:2793 3rd Qu.:10727
## Max. :4900 Max. :36019
Hourly activity
The analysis below showed that calorie expenditure peaked between 17:00
and 19:00, after typical working hours. This might be the time when
users had more free time to exercise. On the other hand, the fewest
calories were expended around midnight and in the early morning.
hourly_calories<-hourly_calories%>%mutate(time=format(as.POSIXct(date_time),format="%H:%M"))
hourly_calories%>%ggplot(aes(x=time,y=calories))+geom_bar(stat="identity")+labs(title="Calories Burned by Time",x="Time (hour)",y="Calories")+theme(plot.title = element_text(hjust = 0.5))
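With `geom_bar(stat="identity")`, every row for a given hour is stacked, so each bar shows the total over all users and days. If the mean per hour is of more interest, the data can be aggregated first. The following is a sketch on made-up values, not the real hourly table:

```r
# Toy hourly records (calorie values are made up for illustration).
hourly <- data.frame(
  date_time = as.POSIXct(
    c("2016-04-12 17:00:00", "2016-04-13 17:00:00",
      "2016-04-12 03:00:00", "2016-04-13 03:00:00"),
    tz = "UTC"
  ),
  calories = c(120, 140, 60, 64)
)

# Hour-of-day label, as in the chart above.
hourly$time <- format(hourly$date_time, "%H:%M")

# Mean calories for each hour of the day.
mean_by_hour <- aggregate(calories ~ time, data = hourly, FUN = mean)
print(mean_by_hour)

# Hour with the highest mean expenditure.
peak <- mean_by_hour$time[which.max(mean_by_hour$calories)]
print(peak)
```

The aggregated table can be passed to the same `ggplot()` call to chart means instead of totals.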

Further analysis of user activity intensity by hour showed a pattern
similar to the chart above (calories burned by hour). The period between
17:00 and 19:00 had the highest totals of minute-level intensity values
recorded within the hour, which indicates that Fitbit users were most
active during these hours.
hourly_intensities<-hourly_intensities%>%mutate(time=format(as.POSIXct(date_time),format="%H:%M"))
hourly_intensities%>%ggplot(aes(x=time,y=totalintensity))+geom_bar(stat="identity")+labs(title="Activity Intensities by Hour",x="Time (hour)",y="Intensities")+theme(plot.title = element_text(hjust = 0.5))

Relationship between distance tracked and calories
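After analysis, I found a positive relationship between the total number
of steps recorded by the Fitbit device (used here as a proxy for
distance tracked) and the total estimated energy expenditure in
kilocalories: the more users moved, the more calories they burned. This
suggests that encouraging more movement is a direct way to raise energy
expenditure.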
Relationship_TrackerDistance_Calories<-daily_activity%>%ggplot(aes(totalsteps,calories))+geom_point()+geom_smooth(method="loess")+ggtitle("Relationship Between Total Steps and Calories")+xlab("Total Number of Steps")+ylab("Calories (kilocalories)")
Relationship_TrackerDistance_Calories
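The strength of this relationship can also be summarised as a correlation coefficient. As a rough illustration, the sketch below uses only the six rows printed by `head(daily_activity_sleep)` later in this report. They all belong to a single user, so the result is not representative of the full dataset.

```r
# Six daily records taken from the head() output shown later in this report.
steps <- c(13162, 10735, 9762, 12669, 9705, 15506)
calories <- c(1985, 1797, 1745, 1863, 1728, 2035)

# Pearson correlation measures linear association.
pearson <- cor(steps, calories)

# Spearman correlation measures monotonic (rank-based) association.
spearman <- cor(steps, calories, method = "spearman")

print(pearson)
print(spearman)
```

On the full data, the same `cor()` call can be run on `daily_activity$totalsteps` and `daily_activity$calories`.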

Relationship between weight, BMI and calories
According to the weight
classification based on BMI range, there are seven categories of
weight, as shown below.
BMI range and classification
Looking at the summary statistics of the users’ BMI and weight, the
mean BMI was 25.19, which fell just inside the overweight category
(25-29.9). Meanwhile, the mean weight was 72.04 kg.
weight%>%select(BMI,WeightKg)%>%summary()
## BMI WeightKg
## Min. :21.45 Min. : 52.60
## 1st Qu.:23.96 1st Qu.: 61.40
## Median :24.39 Median : 62.50
## Mean :25.19 Mean : 72.04
## 3rd Qu.:25.56 3rd Qu.: 85.05
## Max. :47.54 Max. :133.50
I created another variable called “weight_categories” for further
analysis.
# case_when() uses the first condition that matches, so cascading upper bounds leave no gaps between bands (e.g. a BMI of 24.95)
weightfinal<-weight%>%mutate(weight_categories=case_when(BMI<16.5 ~ "Severely Underweight",BMI<18.5 ~ "Underweight",BMI<25 ~ "Normal Weight",BMI<30 ~ "Overweight",BMI<35 ~ "Obese Class I",BMI<40 ~ "Obese Class II",BMI>=40 ~ "Obese Class III"))
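An alternative way to bin BMI values, so that no value falling between two bands is left unclassified, is base R's `cut()` with half-open intervals. This is a sketch on made-up BMI values; the break points follow the seven categories described above.

```r
# Example BMI values, including ones that sit close to a band boundary.
bmi <- c(16.4, 18.45, 24.95, 25, 29.95, 47.54)

# Band boundaries; -Inf and Inf make the outer bands open-ended.
breaks <- c(-Inf, 16.5, 18.5, 25, 30, 35, 40, Inf)
labels <- c("Severely Underweight", "Underweight", "Normal Weight",
            "Overweight", "Obese Class I", "Obese Class II",
            "Obese Class III")

# right = FALSE makes each interval [lower, upper), so 25 is "Overweight".
categories <- cut(bmi, breaks = breaks, labels = labels, right = FALSE)
print(data.frame(bmi, categories))
```

`cut()` returns an ordered set of factor levels, which also keeps the categories in a sensible order on chart axes.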
Then, I created a boxplot to show how weight was distributed within each
BMI category. For users in the overweight category, the median weight
(the centre line of the box) was over 85 kg. There was an outlier case
in obese class III.
weightfinal%>%ggplot(aes(x=reorder(weight_categories,BMI),y=WeightKg))+geom_boxplot(outlier.color="red")+labs(title="Average Weight (kg) by BMI",x="BMI Classification",y="Weight (kg)")+scale_y_continuous(breaks=seq(60,135,15))+theme(plot.title = element_text(hjust = 0.5))

Nonetheless, the accuracy of these conclusions was undermined by the
small sample size and also the lack of demographic information. For
example, BMI classifications depend on age and sex for individuals aged
between 2 and 20 years old (see link).
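I also found that there were almost as many overweight records as
normal-weight records, as displayed below. Note that n() counts weight
log entries, so a user who logged their weight on several days is
counted more than once.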
weight_classification<-weightfinal%>%group_by(weight_categories)%>%summarise(number_of_people=n())
weight_classification%>%ggplot(aes(x="", y=number_of_people, fill=weight_categories)) +geom_bar(stat="identity", width=1,color="white")+geom_text(aes(label=number_of_people),position=position_stack(vjust=0.5)) +coord_polar(theta="y", start=0)+theme_void()+labs(fill="Weight Categories")+scale_fill_discrete(breaks=c("Normal Weight","Overweight","Obese Class III"))+ggtitle("Weight Categories of Fitbit Users")+theme(plot.title = element_text(hjust = 0.5))
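Because `n()` counts rows, the totals above can differ from the number of individual users when someone logs their weight more than once. The sketch below contrasts the two counts on made-up data (the `Id` values and categories are hypothetical):

```r
library(dplyr)

# Toy weight log: user 1 logged three times, users 2 and 3 once each.
toy_weight <- data.frame(
  Id = c(1, 1, 1, 2, 3),
  weight_categories = c("Normal Weight", "Normal Weight", "Normal Weight",
                        "Overweight", "Overweight")
)

by_category <- toy_weight %>%
  group_by(weight_categories) %>%
  summarise(
    records = n(),            # number of log entries
    users = n_distinct(Id)    # number of different people
  )

print(by_category)
```

Here the normal-weight group has more records but fewer users than the overweight group. A user whose BMI crosses a boundary over time would still be counted in each category they appear in.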

User sleep and weight
Studies have shown that poor sleep or stress is linked to higher calorie
consumption and weight gain. Relevant information can be found on Harvard
Health Publishing and Healthline. To explore this, I merged the two
datasets related to weight and sleep.
weightfinal$Date<-mdy_hms(weightfinal$Date)
weightrevised<-weightfinal%>%mutate(YMDDate=format(as.POSIXct(Date),format="%Y-%m-%d"))
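# Keep only the calendar date so the join key has the same class (Date) as daily_activity_sleep$date
weightrevised<-weightrevised%>%mutate(date=as_date(Date))%>%select(-Date)%>%rename(id=Id)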
daily_sleep_weight <- merge(daily_activity_sleep, weightrevised, by=c ("id", "date"))
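`merge()` matches rows on the values of the key columns, so it helps to confirm that both date columns have the same class before joining. The following base R sketch uses made-up rows, with a timestamp column hypothetically named `stamp`:

```r
# Sleep table keyed by calendar date (Date class).
sleep <- data.frame(
  id = 1,
  date = as.Date("2016-05-02"),
  totalminutesasleep = 400
)

# Weight log keyed by a full timestamp (POSIXct class).
weight <- data.frame(
  id = 1,
  stamp = as.POSIXct("2016-05-02 23:59:59", tz = "UTC"),
  WeightKg = 52.6
)

# Reduce the timestamp to a calendar date in the same time zone.
weight$date <- as.Date(weight$stamp, tz = "UTC")

# Fail early if the two key columns still differ in class.
stopifnot(identical(class(sleep$date), class(weight$date)))

joined <- merge(sleep, weight, by = c("id", "date"))
print(joined)
```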
Then, I created an extra variable named “time_not_sleeping” to find
the difference between total time in bed and total minutes asleep. This
would include times when Fitbit users were restless and awake,
indicating poor sleep quality.
daily_sleep_weight<-daily_sleep_weight%>%mutate(time_not_sleeping=totaltimeinbed-totalminutesasleep)
head(daily_activity_sleep)
## id date totalsteps totaldistance trackerdistance
## 1 1503960366 2016-04-12 13162 8.50 8.50
## 2 1503960366 2016-04-13 10735 6.97 6.97
## 3 1503960366 2016-04-15 9762 6.28 6.28
## 4 1503960366 2016-04-16 12669 8.16 8.16
## 5 1503960366 2016-04-17 9705 6.48 6.48
## 6 1503960366 2016-04-19 15506 9.88 9.88
## loggedactivitiesdistance veryactivedistance moderatelyactivedistance
## 1 0 1.88 0.55
## 2 0 1.57 0.69
## 3 0 2.14 1.26
## 4 0 2.71 0.41
## 5 0 3.19 0.78
## 6 0 3.53 1.32
## lightactivedistance sedentaryactivedistance veryactiveminutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 2.83 0 29
## 4 5.04 0 36
## 5 2.51 0 38
## 6 5.03 0 50
## fairlyactiveminutes lightlyactiveminutes sedentaryminutes calories
## 1 13 328 728 1985
## 2 19 217 776 1797
## 3 34 209 726 1745
## 4 10 221 773 1863
## 5 20 164 539 1728
## 6 31 264 775 2035
## totalsleeprecords totalminutesasleep totaltimeinbed
## 1 1 327 346
## 2 2 384 407
## 3 1 412 442
## 4 2 340 367
## 5 1 700 712
## 6 1 304 320
daily_activity_sleep<-daily_activity_sleep%>%mutate(time_not_sleeping=totaltimeinbed-totalminutesasleep)
head(daily_activity_sleep)
## id date totalsteps totaldistance trackerdistance
## 1 1503960366 2016-04-12 13162 8.50 8.50
## 2 1503960366 2016-04-13 10735 6.97 6.97
## 3 1503960366 2016-04-15 9762 6.28 6.28
## 4 1503960366 2016-04-16 12669 8.16 8.16
## 5 1503960366 2016-04-17 9705 6.48 6.48
## 6 1503960366 2016-04-19 15506 9.88 9.88
## loggedactivitiesdistance veryactivedistance moderatelyactivedistance
## 1 0 1.88 0.55
## 2 0 1.57 0.69
## 3 0 2.14 1.26
## 4 0 2.71 0.41
## 5 0 3.19 0.78
## 6 0 3.53 1.32
## lightactivedistance sedentaryactivedistance veryactiveminutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 2.83 0 29
## 4 5.04 0 36
## 5 2.51 0 38
## 6 5.03 0 50
## fairlyactiveminutes lightlyactiveminutes sedentaryminutes calories
## 1 13 328 728 1985
## 2 19 217 776 1797
## 3 34 209 726 1745
## 4 10 221 773 1863
## 5 20 164 539 1728
## 6 31 264 775 2035
## totalsleeprecords totalminutesasleep totaltimeinbed time_not_sleeping
## 1 1 327 346 19
## 2 2 384 407 23
## 3 1 412 442 30
## 4 2 340 367 27
## 5 1 700 712 12
## 6 1 304 320 16
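The new column can be checked by hand against the six rows printed above. The sketch below also derives the share of time in bed spent asleep, an additional ratio that is not part of the original analysis and is included only as an illustration.

```r
# Values copied from the head() output above.
totalminutesasleep <- c(327, 384, 412, 340, 700, 304)
totaltimeinbed <- c(346, 407, 442, 367, 712, 320)

# Minutes in bed but not asleep, as in time_not_sleeping.
time_not_sleeping <- totaltimeinbed - totalminutesasleep
print(time_not_sleeping)

# Illustrative extra metric: fraction of time in bed spent asleep.
asleep_share <- totalminutesasleep / totaltimeinbed
print(round(asleep_share, 3))
```

A ratio is easier to compare across nights of different lengths than a raw difference in minutes.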
The result showed that users slept 419.2 minutes on average (just under
7 hours), which was below the recommended 8 hours of sleep. Meanwhile,
users spent over half an hour on average (39.31 minutes) awake or
restless in bed.
daily_activity_sleep%>%select(time_not_sleeping,totalminutesasleep)%>%summary()
## time_not_sleeping totalminutesasleep
## Min. : 0.00 Min. : 58.0
## 1st Qu.: 17.00 1st Qu.:361.0
## Median : 25.50 Median :432.5
## Mean : 39.31 Mean :419.2
## 3rd Qu.: 40.00 3rd Qu.:490.0
## Max. :371.00 Max. :796.0
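For readability, the minute values in the summary can be converted to hours and minutes. The sketch below uses a hypothetical helper, `to_hours_minutes`, and the 8-hour target mentioned in this section:

```r
# Hypothetical helper: format a duration in minutes as "H h MM min".
to_hours_minutes <- function(minutes) {
  total <- as.integer(round(minutes))
  sprintf("%d h %02d min", total %/% 60L, total %% 60L)
}

mean_asleep <- 419.2        # mean totalminutesasleep from the summary
mean_not_sleeping <- 39.31  # mean time_not_sleeping from the summary

print(to_hours_minutes(mean_asleep))
print(to_hours_minutes(mean_not_sleeping))

# Minutes short of an 8-hour (480-minute) night.
shortfall <- 8 * 60 - mean_asleep
print(shortfall)
```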
However, when examining the relationship between weight and restless
time or sleep time, the correlations were weak (about 0.10 and 0.03,
respectively). Given the small sample size, this finding is not
conclusive.
daily_sleep_weight%>%summarize(weight_nosleep=cor(time_not_sleeping, WeightKg))
## weight_nosleep
## 1 0.1023052
daily_sleep_weight%>%summarize(weight_sleep=cor(totalminutesasleep, WeightKg))
## weight_sleep
## 1 0.03348514
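With a small sample, a correlation estimate is easier to interpret alongside a confidence interval and p-value, which `cor.test()` from base R's stats package provides. The values below are made up for illustration and are not taken from the dataset.

```r
# Toy data (made-up values, for illustration only).
toy <- data.frame(
  time_not_sleeping = c(19, 23, 30, 27, 12, 16, 40, 35),
  WeightKg = c(52.6, 61.4, 62.5, 85.1, 72.0, 61.0, 90.7, 57.3)
)

# Pearson correlation with a test of the null hypothesis of no correlation.
result <- cor.test(toy$time_not_sleeping, toy$WeightKg)

print(result$estimate)  # the correlation coefficient
print(result$conf.int)  # 95% confidence interval
print(result$p.value)   # p-value of the test
```

A wide interval that spans zero would signal that the data cannot distinguish a weak relationship from no relationship at all.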
Next, I looked into the users’ average amount of sleep by weight
category. Users with normal weight slept the longest (about 437
minutes), while overweight users slept the least (332 minutes), less
even than those in obese class III (398 minutes).
daily_sleep_weight%>%select(weight_categories,totalminutesasleep)%>%group_by(weight_categories)%>%summarise(mean(totalminutesasleep))
## # A tibble: 3 × 2
## weight_categories `mean(totalminutesasleep)`
## <chr> <dbl>
## 1 Normal Weight 437.
## 2 Obese Class III 398
## 3 Overweight 332
User sleep and sedentary activities
Finally, there was a negative correlation between the total minutes
asleep and minutes of sedentary activities. Although there were not
enough data points for users with lower sedentary minutes, the chart
below suggested that the more sedentary an individual was, the less
they slept.
Relationship_Sedentary_Sleep<-daily_activity_sleep%>%ggplot(aes(sedentaryminutes,totalminutesasleep))+geom_point()+geom_smooth(method="loess")+ggtitle("Relationship Between Sleep and Sedentary Activities")+xlab("Sedentary Minutes")+ylab("Total Minutes Asleep")
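A straight-line fit gives a single slope to complement the flexible loess curve. As a rough sketch, the code below fits `lm()` to the six rows printed by `head(daily_activity_sleep)` earlier. They come from one user and include one influential night (539 sedentary minutes, 700 minutes asleep), so the result is illustrative only.

```r
# Six daily records copied from the head() output earlier in this report.
sedentary <- c(728, 776, 726, 773, 539, 775)
asleep <- c(327, 384, 412, 340, 700, 304)

# Simple linear regression of minutes asleep on sedentary minutes.
fit <- lm(asleep ~ sedentary)

# Estimated change in minutes asleep per extra sedentary minute.
slope <- coef(fit)[["sedentary"]]
print(slope)
```

A negative slope matches the direction of the relationship described above, although a correlation of this kind does not show that sedentary time causes shorter sleep.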