“Leaf Urban”, one of Bellabeat’s stylish products
The Google Data Analytics Certificate offered on Coursera.org provides an 8-course curriculum which teaches entry-level data analysts a variety of data-related skills and techniques. Through videos led by Google employee instructors, quizzes on course content, and hands-on practice demonstrations with tools such as SQL, Excel, Tableau, and R, this curriculum teaches how to navigate data in all steps of its journey. In preparation for real entry-level jobs, the courses offer real-life data set examples and tangible ways to practice data skills in a methodical framework, called the “Six Steps of Data Analysis”: Ask, Prepare, Process, Analyze, Share, and Act. This case study will use the same framework, excluding the “Act” phase, to investigate the smart device company, Bellabeat.
The background information, scenario, and requirements for this company were provided by the Google Data Analytics Certificate course, coming specifically from the eighth and final module, “Data Analytics Capstone Project: Complete a Case Study”. The goal of this case study is to demonstrate the knowledge, process, problem-solving, and skill sets learned from the previous modules of the curriculum and to apply them to a real-world use case.
Bellabeat, founded by Urška Sršen, is a high-tech company that manufactures health-focused smart products. Sršen used her background as an artist to develop beautifully designed technology that informs and inspires women around the world. Collecting data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with knowledge about their own health and habits.
By 2016, Bellabeat had opened offices around the world and launched multiple products. Bellabeat products became available through a growing number of online retailers in addition to their own e-commerce channel on their website. The company has invested in traditional advertising media, such as radio, out-of-home billboards, print, and television, but focuses on digital marketing extensively.
Sršen knows that an analysis of Bellabeat’s available consumer data would reveal more opportunities for growth. She has asked the marketing analytics team to focus on a Bellabeat product and analyze smart device usage data in order to gain insight into how people are already using their smart devices. Then, using this information, she would like high-level recommendations for how these trends can inform Bellabeat marketing strategy.
This case study will focus on one of Bellabeat’s products and analyze smart device data to gain insight into how consumers are using their smart devices. The discovered insights will then help guide marketing strategy for the company. The analysis and high-level recommendations for Bellabeat’s marketing strategy will be presented to the Bellabeat executive team.
Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer
Sando Mur: Mathematician and Bellabeat’s cofounder; key member of the Bellabeat executive team
Bellabeat marketing analytics team: A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Bellabeat’s marketing strategy. You joined this team six months ago and have been busy learning about Bellabeat’s mission and business goals, as well as how you, as a junior data analyst, can help Bellabeat achieve them.
Bellabeat app: The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. This data can help users better understand their current habits and make healthy decisions. The Bellabeat app connects to their line of smart wellness products.
Leaf: Bellabeat’s classic wellness tracker can be worn as a bracelet, necklace, or clip. The Leaf tracker connects to the Bellabeat app to track activity, sleep, and stress.
Time: This wellness watch combines the timeless look of a classic timepiece with smart technology to track user activity, sleep, and stress. The Time watch connects to the Bellabeat app to provide you with insights into your daily wellness.
Spring: This is a water bottle that tracks daily water intake using smart technology to ensure that you are appropriately hydrated throughout the day. The Spring bottle connects to the Bellabeat app to track your hydration levels.
Bellabeat membership: Bellabeat also offers a subscription-based membership program for users. Membership gives users 24/7 access to fully personalized guidance on nutrition, activity, sleep, health and beauty, and mindfulness based on their lifestyle and goals.
The data used for this case study is Fitbit tracker data sourced from Kaggle.com (CC0: Public Domain, dataset made available through Mobius).
The information, formatting, and guidelines for this case study come from Coursera’s Google Data Analytics Capstone module (the 8th and final module): Case Study 2: How Can a Wellness Technology Company Play it Smart?.
Additional guiding questions:
We want to find any relevant trends in the data source and make recommendations to key stakeholders about Bellabeat’s marketing strategy.
Packages used in this case study:
library(tidyverse) #collection of R packages which we will be using often
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.6 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.4 ✓ stringr 1.4.0
## ✓ readr 2.1.1 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(dplyr) #for data manipulation
library(ggplot2) #data visualization package
library(ggpubr) #extensive visualizations with ggplot2
library(sqldf) #for running SQL commands within R
## Loading required package: gsubfn
## Loading required package: proto
## Warning in fun(libname, pkgname): couldn't connect to display ":0"
## Loading required package: RSQLite
library(lubridate) #for working with dates in R
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(janitor) #for data examination and cleaning
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(skimr) #for summary statistics in R
library(tidyr) #for organizing tabular data
library(RColorBrewer) #for color palettes
## Reads in dataset containing daily activities.
activity <- read.csv("dailyActivity_merged.csv")
## Reads in dataset containing daily calorie expenditures.
calories <- read.csv("dailyCalories_merged.csv")
## Reads in dataset containing daily intensities.
intensities <- read.csv("dailyIntensities_merged.csv")
## Reads in dataset containing daily steps.
steps <- read.csv("dailySteps_merged.csv")
## Reads in dataset for sleep data.
sleep <- read.csv("sleepDay_merged.csv")
## Reads in data set for logged weight.
weight <- read.csv("weightLogInfo_merged.csv")
## Reads in data set for hourly steps.
hourly_steps <- read.csv("hourlySteps_merged.csv")
Examining the first few rows of every data set:
head(activity)
## Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366 4/12/2016 13162 8.50 8.50
## 2 1503960366 4/13/2016 10735 6.97 6.97
## 3 1503960366 4/14/2016 10460 6.74 6.74
## 4 1503960366 4/15/2016 9762 6.28 6.28
## 5 1503960366 4/16/2016 12669 8.16 8.16
## 6 1503960366 4/17/2016 9705 6.48 6.48
## LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1 0 1.88 0.55
## 2 0 1.57 0.69
## 3 0 2.44 0.40
## 4 0 2.14 1.26
## 5 0 2.71 0.41
## 6 0 3.19 0.78
## LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 3.91 0 30
## 4 2.83 0 29
## 5 5.04 0 36
## 6 2.51 0 38
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1 13 328 728 1985
## 2 19 217 776 1797
## 3 11 181 1218 1776
## 4 34 209 726 1745
## 5 10 221 773 1863
## 6 20 164 539 1728
head(calories)
## Id ActivityDay Calories
## 1 1503960366 4/12/2016 1985
## 2 1503960366 4/13/2016 1797
## 3 1503960366 4/14/2016 1776
## 4 1503960366 4/15/2016 1745
## 5 1503960366 4/16/2016 1863
## 6 1503960366 4/17/2016 1728
head(intensities)
## Id ActivityDay SedentaryMinutes LightlyActiveMinutes
## 1 1503960366 4/12/2016 728 328
## 2 1503960366 4/13/2016 776 217
## 3 1503960366 4/14/2016 1218 181
## 4 1503960366 4/15/2016 726 209
## 5 1503960366 4/16/2016 773 221
## 6 1503960366 4/17/2016 539 164
## FairlyActiveMinutes VeryActiveMinutes SedentaryActiveDistance
## 1 13 25 0
## 2 19 21 0
## 3 11 30 0
## 4 34 29 0
## 5 10 36 0
## 6 20 38 0
## LightActiveDistance ModeratelyActiveDistance VeryActiveDistance
## 1 6.06 0.55 1.88
## 2 4.71 0.69 1.57
## 3 3.91 0.40 2.44
## 4 2.83 1.26 2.14
## 5 5.04 0.41 2.71
## 6 2.51 0.78 3.19
head(sleep)
## Id SleepDay TotalSleepRecords TotalMinutesAsleep
## 1 1503960366 4/12/2016 12:00:00 AM 1 327
## 2 1503960366 4/13/2016 12:00:00 AM 2 384
## 3 1503960366 4/15/2016 12:00:00 AM 1 412
## 4 1503960366 4/16/2016 12:00:00 AM 2 340
## 5 1503960366 4/17/2016 12:00:00 AM 1 700
## 6 1503960366 4/19/2016 12:00:00 AM 1 304
## TotalTimeInBed
## 1 346
## 2 407
## 3 442
## 4 367
## 5 712
## 6 320
head(weight)
## Id Date WeightKg WeightPounds Fat BMI
## 1 1503960366 5/2/2016 11:59:59 PM 52.6 115.9631 22 22.65
## 2 1503960366 5/3/2016 11:59:59 PM 52.6 115.9631 NA 22.65
## 3 1927972279 4/13/2016 1:08:52 AM 133.5 294.3171 NA 47.54
## 4 2873212765 4/21/2016 11:59:59 PM 56.7 125.0021 NA 21.45
## 5 2873212765 5/12/2016 11:59:59 PM 57.3 126.3249 NA 21.69
## 6 4319703577 4/17/2016 11:59:59 PM 72.4 159.6147 25 27.45
## IsManualReport LogId
## 1 True 1.462234e+12
## 2 True 1.462320e+12
## 3 False 1.460510e+12
## 4 True 1.461283e+12
## 5 True 1.463098e+12
## 6 True 1.460938e+12
head(steps)
## Id ActivityDay StepTotal
## 1 1503960366 4/12/2016 13162
## 2 1503960366 4/13/2016 10735
## 3 1503960366 4/14/2016 10460
## 4 1503960366 4/15/2016 9762
## 5 1503960366 4/16/2016 12669
## 6 1503960366 4/17/2016 9705
head(hourly_steps)
## Id ActivityHour StepTotal
## 1 1503960366 4/12/2016 12:00:00 AM 373
## 2 1503960366 4/12/2016 1:00:00 AM 160
## 3 1503960366 4/12/2016 2:00:00 AM 151
## 4 1503960366 4/12/2016 3:00:00 AM 0
## 5 1503960366 4/12/2016 4:00:00 AM 0
## 6 1503960366 4/12/2016 5:00:00 AM 0
It seems as though the columns from calories, intensities, and steps are subsets of activity. Using a trick I found via this Kaggle capstone project, we will be using SQL queries in R to check for subsets via sqldf():
sqldf("SELECT COUNT()
FROM activity
LEFT JOIN calories ON
activity.Id = calories.Id AND
activity.ActivityDate = calories.ActivityDay AND
activity.Calories = calories.Calories")
## COUNT()
## 1 940
sqldf("SELECT COUNT()
FROM activity
LEFT JOIN steps ON
activity.Id = steps.Id AND
activity.ActivityDate = steps.ActivityDay AND
activity.TotalSteps = steps.StepTotal")
## COUNT()
## 1 940
sqldf("SELECT COUNT()
FROM activity
LEFT JOIN intensities ON
activity.Id = intensities.Id AND
activity.ActivityDate = intensities.ActivityDay AND
activity.SedentaryMinutes = intensities.SedentaryMinutes AND
activity.LightlyActiveMinutes = intensities.LightlyActiveMinutes AND
activity.FairlyActiveMinutes = intensities.FairlyActiveMinutes AND
activity.VeryActiveMinutes = intensities.VeryActiveMinutes AND
activity.SedentaryActiveDistance = intensities.SedentaryActiveDistance AND
activity.LightActiveDistance = intensities.LightActiveDistance AND
activity.ModeratelyActiveDistance = intensities.ModeratelyActiveDistance AND
activity.VeryActiveDistance = intensities.VeryActiveDistance")
## COUNT()
## 1 940
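Each query returned 940 rows, which is consistent with the columns from calories, steps, and intensities already being included in the data set activity. Note that a LEFT JOIN keeps every row of activity whether or not a match is found, so this count on its own mainly shows that no activity row matched more than once; repeating the queries with an INNER JOIN would make the check conclusive. We proceed on that basis and do not include the calories, steps, or intensities data sets in our analysis.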
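One way to test the subset relation directly is an anti-join, which returns the rows of one table that have no exact match in the other. The sketch below is not the method used in this notebook; it uses small toy data frames (`activity_toy`, `calories_toy`, hypothetical names) filled with values from the head() output above, and dplyr's `anti_join()`.

```r
library(dplyr)

# Toy stand-ins for `activity` and `calories`; values copied from the head() output
activity_toy <- data.frame(
  Id = c(1503960366, 1503960366, 1503960366),
  ActivityDate = c("4/12/2016", "4/13/2016", "4/14/2016"),
  Calories = c(1985, 1797, 1776)
)
calories_toy <- data.frame(
  Id = c(1503960366, 1503960366, 1503960366),
  ActivityDay = c("4/12/2016", "4/13/2016", "4/14/2016"),
  Calories = c(1985, 1797, 1776)
)

# Column pairs to compare: names are columns of calories_toy, values are columns of activity_toy
keys <- c("Id" = "Id", "ActivityDay" = "ActivityDate", "Calories" = "Calories")

# Rows of calories_toy with no exact match in activity_toy; zero rows means "subset"
unmatched <- anti_join(calories_toy, activity_toy, by = keys)
is_subset <- nrow(unmatched) == 0

# Counter-example: change one value and the anti-join reports the mismatching row
calories_changed <- calories_toy
calories_changed$Calories[3] <- 9999
unmatched_after_change <- anti_join(calories_changed, activity_toy, by = keys)
```

On the real data, the same call with `calories` and `activity` would return zero rows if every calories record is present in activity.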
n_distinct(activity$Id)
## [1] 33
n_distinct(sleep$Id)
## [1] 24
n_distinct(weight$Id)
## [1] 8
Only 8 of the 33 users logged weight data, so the weight data set is too small to support conclusions and will not be included.
# list rows of data that have missing values
activity[!complete.cases(activity),]
## [1] Id ActivityDate TotalSteps
## [4] TotalDistance TrackerDistance LoggedActivitiesDistance
## [7] VeryActiveDistance ModeratelyActiveDistance LightActiveDistance
## [10] SedentaryActiveDistance VeryActiveMinutes FairlyActiveMinutes
## [13] LightlyActiveMinutes SedentaryMinutes Calories
## <0 rows> (or 0-length row.names)
sleep[!complete.cases(sleep),]
## [1] Id SleepDay TotalSleepRecords TotalMinutesAsleep
## [5] TotalTimeInBed
## <0 rows> (or 0-length row.names)
Checking for and removing duplicates:
sum(duplicated(activity))
## [1] 0
sum(duplicated(sleep))
## [1] 3
Our sleep data table has 3 duplicate rows, so let’s remove those:
sleep <- sleep[!duplicated(sleep), ]
sum(duplicated(sleep))
## [1] 0
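As a complement to the checks above, missing values can also be counted per column, which shows where the gaps are rather than only which rows have them. This is a minimal base R sketch on a made-up table (`sleep_toy` is hypothetical and its values are for illustration only).

```r
# Hypothetical toy table: the last row repeats the first, and one value is missing
sleep_toy <- data.frame(
  Id = c(1, 1, 2, 1),
  SleepDay = c("4/12/2016", "4/13/2016", "4/12/2016", "4/12/2016"),
  TotalMinutesAsleep = c(327, NA, 412, 327)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(sleep_toy))

# Number of fully duplicated rows, then a copy with those rows dropped
n_duplicates <- sum(duplicated(sleep_toy))
sleep_clean <- unique(sleep_toy)
```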
# activity
activity$ActivityDate <- as.POSIXct(activity$ActivityDate,
                                    format='%m/%d/%Y',
                                    tz=Sys.timezone())
activity$Date <- format(activity$ActivityDate,
                        format='%m/%d/%Y')
head(activity)
## Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366 2016-04-12 13162 8.50 8.50
## 2 1503960366 2016-04-13 10735 6.97 6.97
## 3 1503960366 2016-04-14 10460 6.74 6.74
## 4 1503960366 2016-04-15 9762 6.28 6.28
## 5 1503960366 2016-04-16 12669 8.16 8.16
## 6 1503960366 2016-04-17 9705 6.48 6.48
## LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1 0 1.88 0.55
## 2 0 1.57 0.69
## 3 0 2.44 0.40
## 4 0 2.14 1.26
## 5 0 2.71 0.41
## 6 0 3.19 0.78
## LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 3.91 0 30
## 4 2.83 0 29
## 5 5.04 0 36
## 6 2.51 0 38
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories Date
## 1 13 328 728 1985 04/12/2016
## 2 19 217 776 1797 04/13/2016
## 3 11 181 1218 1776 04/14/2016
## 4 34 209 726 1745 04/15/2016
## 5 10 221 773 1863 04/16/2016
## 6 20 164 539 1728 04/17/2016
# sleep
sleep$SleepDay <- as.POSIXct(sleep$SleepDay,
                             format='%m/%d/%Y %I:%M:%S %p',
                             tz=Sys.timezone())
sleep$Date <- format(sleep$SleepDay,
                     format='%m/%d/%Y')
head(sleep)
## Id SleepDay TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
## 1 1503960366 2016-04-12 1 327 346
## 2 1503960366 2016-04-13 2 384 407
## 3 1503960366 2016-04-15 1 412 442
## 4 1503960366 2016-04-16 2 340 367
## 5 1503960366 2016-04-17 1 700 712
## 6 1503960366 2016-04-19 1 304 320
## Date
## 1 04/12/2016
## 2 04/13/2016
## 3 04/15/2016
## 4 04/16/2016
## 5 04/17/2016
## 6 04/19/2016
# hourly steps
hourly_steps$ActivityHour <- as.POSIXct(hourly_steps$ActivityHour,
                                        format='%m/%d/%Y %I:%M:%S %p',
                                        tz=Sys.timezone())
hourly_steps$Date <- format(hourly_steps$ActivityHour,
                            format='%m/%d/%Y')
# use the 24-hour clock (%H) so that AM and PM hours stay distinct
hourly_steps$Hour <- format(hourly_steps$ActivityHour,
                            format='%H:%M:%S')
head(hourly_steps)
## Id ActivityHour StepTotal Date Hour
## 1 1503960366 2016-04-12 00:00:00 373 04/12/2016 00:00:00
## 2 1503960366 2016-04-12 01:00:00 160 04/12/2016 01:00:00
## 3 1503960366 2016-04-12 02:00:00 151 04/12/2016 02:00:00
## 4 1503960366 2016-04-12 03:00:00 0 04/12/2016 03:00:00
## 5 1503960366 2016-04-12 04:00:00 0 04/12/2016 04:00:00
## 6 1503960366 2016-04-12 05:00:00 0 04/12/2016 05:00:00
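Since lubridate is already loaded, the same timestamps could also be parsed with `mdy_hms()`, and the hour of day extracted as a number with `hour()`. A 12-hour format such as `%I` without `%p` cannot distinguish 1 AM from 1 PM, so a 24-hour value is safer when grouping by time of day. The sketch below is an alternative under stated assumptions, not the code used in this notebook: the two sample strings follow the format of hourlySteps_merged.csv, and the UTC time zone is an assumption.

```r
library(lubridate)

# Two timestamps in the format used by the hourly steps file
raw <- c("4/12/2016 12:00:00 AM", "4/12/2016 1:00:00 PM")

# Parse month/day/year with a 12-hour clock and AM/PM marker (time zone is an assumption)
parsed <- mdy_hms(raw, tz = "UTC")

# Hour of day on a 24-hour scale, as a number and as a label
hours_24 <- hour(parsed)
hour_labels <- format(parsed, "%H:%M:%S")
```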
intersect(as.character(sleep$Date), as.character(activity$Date))
## [1] "04/12/2016" "04/13/2016" "04/15/2016" "04/16/2016" "04/17/2016"
## [6] "04/19/2016" "04/20/2016" "04/21/2016" "04/23/2016" "04/24/2016"
## [11] "04/25/2016" "04/26/2016" "04/28/2016" "04/29/2016" "04/30/2016"
## [16] "05/01/2016" "05/02/2016" "05/03/2016" "05/05/2016" "05/06/2016"
## [21] "05/07/2016" "05/08/2016" "05/09/2016" "05/10/2016" "05/11/2016"
## [26] "04/14/2016" "04/22/2016" "04/27/2016" "05/04/2016" "05/12/2016"
## [31] "04/18/2016"
# left join: keep every activity row and attach sleep data where it exists (NA otherwise)
activity_sleep<-merge(activity, sleep,
by=c("Id", "Date"), all.x = TRUE)
head(activity_sleep)
## Id Date ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366 04/12/2016 2016-04-12 13162 8.50 8.50
## 2 1503960366 04/13/2016 2016-04-13 10735 6.97 6.97
## 3 1503960366 04/14/2016 2016-04-14 10460 6.74 6.74
## 4 1503960366 04/15/2016 2016-04-15 9762 6.28 6.28
## 5 1503960366 04/16/2016 2016-04-16 12669 8.16 8.16
## 6 1503960366 04/17/2016 2016-04-17 9705 6.48 6.48
## LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1 0 1.88 0.55
## 2 0 1.57 0.69
## 3 0 2.44 0.40
## 4 0 2.14 1.26
## 5 0 2.71 0.41
## 6 0 3.19 0.78
## LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 3.91 0 30
## 4 2.83 0 29
## 5 5.04 0 36
## 6 2.51 0 38
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories SleepDay
## 1 13 328 728 1985 2016-04-12
## 2 19 217 776 1797 2016-04-13
## 3 11 181 1218 1776 <NA>
## 4 34 209 726 1745 2016-04-15
## 5 10 221 773 1863 2016-04-16
## 6 20 164 539 1728 2016-04-17
## TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
## 1 1 327 346
## 2 2 384 407
## 3 NA NA NA
## 4 1 412 442
## 5 2 340 367
## 6 1 700 712
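The NA values in row 3 come from `all.x = TRUE`, which keeps activity rows that have no sleep record. The sketch below contrasts this with the default behaviour of `merge()`, which keeps matching rows only. The toy tables (`activity_toy`, `sleep_toy`) are hypothetical and use a shortened Id with values taken from the output above.

```r
# Toy tables: three activity days, but sleep was logged on only two of them
activity_toy <- data.frame(
  Id = c(1, 1, 1),
  Date = c("04/12/2016", "04/13/2016", "04/14/2016"),
  TotalSteps = c(13162, 10735, 10460)
)
sleep_toy <- data.frame(
  Id = c(1, 1),
  Date = c("04/12/2016", "04/13/2016"),
  TotalMinutesAsleep = c(327, 384)
)

# all.x = TRUE keeps every activity row (left join); the default keeps matches only (inner join)
left_joined <- merge(activity_toy, sleep_toy, by = c("Id", "Date"), all.x = TRUE)
inner_joined <- merge(activity_toy, sleep_toy, by = c("Id", "Date"))

# Activity days without a sleep record show up as NA in the left join
n_without_sleep <- sum(is.na(left_joined$TotalMinutesAsleep))
```

With the left join, any statistic on the sleep columns has to handle these NA values, for example with `na.rm = TRUE`.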
activity_sleep %>%
select(TotalSteps, TotalDistance, Calories, VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes, SedentaryMinutes, TotalMinutesAsleep, TotalTimeInBed, TotalSleepRecords) %>%
summary()
## TotalSteps TotalDistance Calories VeryActiveMinutes
## Min. : 0 Min. : 0.000 Min. : 0 Min. : 0.00
## 1st Qu.: 3790 1st Qu.: 2.620 1st Qu.:1828 1st Qu.: 0.00
## Median : 7406 Median : 5.245 Median :2134 Median : 4.00
## Mean : 7638 Mean : 5.490 Mean :2304 Mean : 21.16
## 3rd Qu.:10727 3rd Qu.: 7.713 3rd Qu.:2793 3rd Qu.: 32.00
## Max. :36019 Max. :28.030 Max. :4900 Max. :210.00
##
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes TotalMinutesAsleep
## Min. : 0.00 Min. : 0.0 Min. : 0.0 Min. : 58.0
## 1st Qu.: 0.00 1st Qu.:127.0 1st Qu.: 729.8 1st Qu.:361.0
## Median : 6.00 Median :199.0 Median :1057.5 Median :432.5
## Mean : 13.56 Mean :192.8 Mean : 991.2 Mean :419.2
## 3rd Qu.: 19.00 3rd Qu.:264.0 3rd Qu.:1229.5 3rd Qu.:490.0
## Max. :143.00 Max. :518.0 Max. :1440.0 Max. :796.0
## NA's :530
## TotalTimeInBed TotalSleepRecords
## Min. : 61.0 Min. :1.000
## 1st Qu.:403.8 1st Qu.:1.000
## Median :463.0 Median :1.000
## Mean :458.5 Mean :1.119
## 3rd Qu.:526.0 3rd Qu.:1.000
## Max. :961.0 Max. :3.000
## NA's :530 NA's :530
#data frame for highlighting outliers in total steps
highlight_df <- activity_sleep %>%
filter(TotalSteps>25000)
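We will separate daily observations into four fitness groups based on walking lifestyle: “Sedentary”, “Slightly Active”, “Fairly Active”, and “Very Active”. The cut points come from the summary of TotalSteps above: the first quartile (3,790 steps), the mean (7,638 steps), and the third quartile (10,727 steps).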
# cut points: first quartile (3790), mean, and third quartile (10727) of TotalSteps
mean_steps <- mean(activity_sleep$TotalSteps)
activity_sleep$walking_lifestyle <- case_when(
  is.na(activity_sleep$TotalSteps) ~ NA_character_,
  activity_sleep$TotalSteps <= 3790 ~ "Sedentary",
  activity_sleep$TotalSteps <= mean_steps ~ "Slightly Active",
  activity_sleep$TotalSteps <= 10727 ~ "Fairly Active",
  TRUE ~ "Very Active"
)
head(activity_sleep)
## Id Date ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366 04/12/2016 2016-04-12 13162 8.50 8.50
## 2 1503960366 04/13/2016 2016-04-13 10735 6.97 6.97
## 3 1503960366 04/14/2016 2016-04-14 10460 6.74 6.74
## 4 1503960366 04/15/2016 2016-04-15 9762 6.28 6.28
## 5 1503960366 04/16/2016 2016-04-16 12669 8.16 8.16
## 6 1503960366 04/17/2016 2016-04-17 9705 6.48 6.48
## LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1 0 1.88 0.55
## 2 0 1.57 0.69
## 3 0 2.44 0.40
## 4 0 2.14 1.26
## 5 0 2.71 0.41
## 6 0 3.19 0.78
## LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 3.91 0 30
## 4 2.83 0 29
## 5 5.04 0 36
## 6 2.51 0 38
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories SleepDay
## 1 13 328 728 1985 2016-04-12
## 2 19 217 776 1797 2016-04-13
## 3 11 181 1218 1776 <NA>
## 4 34 209 726 1745 2016-04-15
## 5 10 221 773 1863 2016-04-16
## 6 20 164 539 1728 2016-04-17
## TotalSleepRecords TotalMinutesAsleep TotalTimeInBed walking_lifestyle
## 1 1 327 346 Very Active
## 2 2 384 407 Very Active
## 3 NA NA NA Fairly Active
## 4 1 412 442 Fairly Active
## 5 2 340 367 Very Active
## 6 1 700 712 Fairly Active
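Once each day carries a walking_lifestyle label, the groups can be compared with a grouped summary. The sketch below is only an illustration of the idea: it uses the six rows printed above (`sample_days` is a hypothetical name), so its numbers describe those six days and not the full data set.

```r
library(dplyr)

# Six rows copied from head(activity_sleep) above
sample_days <- data.frame(
  TotalSteps = c(13162, 10735, 10460, 9762, 12669, 9705),
  Calories = c(1985, 1797, 1776, 1745, 1863, 1728),
  walking_lifestyle = c("Very Active", "Very Active", "Fairly Active",
                        "Fairly Active", "Very Active", "Fairly Active")
)

# Number of days and average steps and calories per lifestyle group
lifestyle_summary <- sample_days %>%
  group_by(walking_lifestyle) %>%
  summarise(
    days = n(),
    mean_steps = mean(TotalSteps),
    mean_calories = mean(Calories),
    .groups = "drop"
  )
```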
The average number of daily steps by users was 7,638 steps from 4/12/2016 - 5/12/2016.
Users averaged around 7 hours of daily sleep from 4/12/2016 - 5/12/2016.
There was a positive relationship between the number of daily steps and the number of calories burned by a user, based on the available tracker data.
The distribution of daily steps was right-skewed: most days fell near or below the mean, while a long tail of days with unusually high step counts (up to 36,019) pulled the mean (7,638) above the median (7,406).
A user’s walking lifestyle is linked to calories burnt, distance walked, number of daily active minutes, and number of daily sedentary minutes. However, a user’s walking lifestyle is not linked to sleep.
Tracker usage decreased over time and dropped sharply by the end of the study, from an average daily use of 24 hours to 8 hours.
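Two of these findings can be spot-checked in a few lines. The sketch below uses the six rows from head(activity) shown earlier, so it illustrates the calculations rather than reproducing the full-data results. It rests on one assumption that the text above does not confirm: that daily tracker usage is the sum of the four activity-minute columns.

```r
# Six rows copied from head(activity) above
days <- data.frame(
  TotalSteps = c(13162, 10735, 10460, 9762, 12669, 9705),
  Calories = c(1985, 1797, 1776, 1745, 1863, 1728),
  VeryActiveMinutes = c(25, 21, 30, 29, 36, 38),
  FairlyActiveMinutes = c(13, 19, 11, 34, 10, 20),
  LightlyActiveMinutes = c(328, 217, 181, 209, 221, 164),
  SedentaryMinutes = c(728, 776, 1218, 726, 773, 539)
)

# Pearson correlation between daily steps and calories (positive means they rise together)
steps_calories_cor <- cor(days$TotalSteps, days$Calories)

# ASSUMPTION: hours of tracker use per day = all recorded minutes divided by 60
days$hours_worn <- (days$VeryActiveMinutes + days$FairlyActiveMinutes +
                      days$LightlyActiveMinutes + days$SedentaryMinutes) / 60
```

Averaging `hours_worn` by date on the full data would give a daily usage curve under that assumption.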
From these trends, we can make several recommendations to Urška Sršen and Bellabeat’s marketing strategy team. These key findings indicate a strategy pertaining to the Bellabeat app.
Based on the findings, retention appears to be a big issue: average daily use of the Fitbit trackers fell by 16 hours from the beginning to the end of the study. To elicit excitement and staying power, it is important to engage users through development of the app. The app has a low 2.6/5 stars on the Google Play Store, against 4.4/5 stars on Apple’s App Store, so our recommendations focus on bolstering the app’s success with Bellabeat’s users.
User retention rate is defined as the number of days that a user continues to use a certain product after its purchase or acquisition [1]. A few of the many tactics used for building a user-engaging app strategy are: tracking data as early as possible [2], personalizing user experiences [3], offering two-way communication between the brand and customer [4], adjusting push notifications so that users are more receptive to them [5], reducing the cognitive load, or unnecessary noise, within the app [6], mapping the user interface according to “thumb zones” [7], optimizing the app’s onboarding process [8], and gamifying the app via reward systems and serotonin responses [9].
Bellabeat wants to gain market share by outcompeting rivals such as Fitbit, so its users’ relationship with their Bellabeat products, app included, has to improve, starting with the current 2.6 Google Play Store rating. The trends indicate that user retention lasts up to 30 days and then drops substantially, which raises the question: how can Bellabeat retain its customers better than Fitbit does?
We recommend that Bellabeat invest in software developers and engineers who can create or update a fitness app that matches the aesthetic beauty of the company’s fitness devices. With a focus on customer service, user interface, and tracker reliability, the app can boost Bellabeat’s reputation in the fitness tracker market while keeping users engaged for longer periods.
There were several limiting factors to the data:
The gender of each user was not specified in the data. Bellabeat is a company focused on fitness devices catered to women, so it would have been helpful to see how different genders used their fitness trackers.
The data is limited to a one-month range from 4/12/2016-5/12/2016. First, we don’t know how the knowledge of this range impacted user retention of the tracker - there is a possibility that users stopped using the tracker since they knew the study would end soon. Second, a month’s worth of data does not give us other important trends to consider like seasonality.
The data is taken from a small sample: 33 distinct user Ids for activity and 24 for sleep. With so few participants, outliers carry disproportionate weight and margins of error are wide.
The data was limited to fitness data taken from the tracker and did not consider data from a possible app that is connected to the tracker.
In addition to not knowing the user gender data, the user demographic data was limited. We were missing features such as age, and weight and BMI were logged by only 8 users; other health data like nutrition was also absent.
There is a limit in skill too. Since this is my first R notebook, I am certainly missing many pathways and ideas for visualizing, predicting, and analyzing the source data.
This was my very first R Workbook. Thank you for reading through until the end. Here is a picture of a kitten as a reward.