Bellabeat Case Study

HOW CAN A WELLNESS TECHNOLOGY COMPANY PLAY IT SMART?

Google Data Analytics Professional Certificate Capstone Project

This case study is the final project in the Google Data Analytics Professional Certificate program. During the study, I will be assuming the role of junior data analyst for Bellabeat, a company that produces high-tech wellness products for women.

Scenario

Bellabeat believes that analysis of fitness technology data will help them gain insight into the wellness technology market. I was asked to analyze smart device data to better understand how consumers use their smart devices.

I was provided a small dataset of fitness tracker information for analysis. I was also asked to look for additional datasets that may help support Bellabeat’s consumer analysis.

I have been asked to document my process and provide five deliverables. Toward this goal, I will follow the steps outlined for the project. I will also include code chunks that document my analysis.

STEP 1: ASK

What is the business problem you are trying to solve?

The BellaBeat team is looking for information about how people use their smart devices. They want to understand how they can analyze their in-house data to better understand their own customers. Additionally, they are looking for high-level recommendations about how trends in smart device usage can direct their marketing strategy.

BellaBeat has provided us with Fitbit data as a starting point toward their goal; they understand that this dataset has limitations. They would like us to find other applicable datasets, if possible, and analyze them.

Identifying the business task:

Identifying the audience:

BellaBeat team:
- Urška Sršen: cofounder and chief creative officer
- Sando Mur: cofounder, executive team member, mathematician
- BellaBeat marketing analytics team: collect, analyze, and report data to guide the BellaBeat marketing strategy.

Identifying the business goal:

Analyze how customers are using fitness technologies to assess:

trends in smart device usage
how trends might apply to BellaBeat customers
how trends might help influence BellaBeat marketing strategy
how trends in smart device usage and technology can be applied to BellaBeat products

Deliverable 1: a clear statement of the business task

Analyze the Fitbit dataset (Mobius, 2021) for data trends among users.
Identify datasets containing information about smart device usage.

Analyze for smart device usage trends

Identify how results of analyses can be applied toward understanding BellaBeat customers’ data

Make recommendations for analyses of BellaBeat customer data

Based on analyses, make high-level recommendations about how trends in smart device usage can direct BellaBeat’s marketing strategy.
Present findings to BellaBeat team

STEP 2: PREPARE

Identify and prepare datasets.

Data used for analysis should be reliable, original, comprehensive, current, and cited, or ROCCC.

Three datasets were identified for analyses. They will be evaluated against ROCCC standards prior to analysis.

Dataset 1: Fitbit Fitness Tracker Data

The dataset, downloaded from https://www.kaggle.com/datasets/arashnic/fitbit, contains voluntary submissions from Fitbit users. Data categories in the dataset include activity, calories, heart rate, intensities, MET, sleep, and steps. The files are not representative samples of Fitbit users and may be biased. Descriptions of the categorical data are not available, and in some circumstances it is unclear what is being measured.

ROCCC: Analysis of Fitbit Fitness Tracker Data

Measure	Description
Reliable	• Dataset is limited to voluntary contributions by Fitbit users and is not complete
	• No way to check if data is unbiased
	• Dataset not vetted
	• Data not proven fit for use
Original	• Data not validated with the original source
Comprehensive	• Metadata not available
	• Descriptions of columns not available for any datasets; in some instances, it is unclear what the data measures
	• Large dataset will allow us to evaluate how different users are using Fitbits and for how long
Current	• Dataset was collected in 2016 and is not current
Cited	• Dataset has not been cited

Although the Fitbit Fitness Tracker Data does not meet ROCCC standards, Bellabeat requested analysis of this data. Analysis will be restricted to data categories that are unambiguous and easily identified.

Reference for Fitbit Fitness Tracker Data

Mobius. “FitBit Fitness Tracker Data.” Kaggle. 2021. 14 02 2023. https://www.kaggle.com/datasets/arashnic/fitbit.

Dataset 2: Worldwide Survey of Fitness Trends Data

The dataset, downloaded from https://links.lww.com/FIT/A133, includes data for the top 20 fitness trends between 2007 and 2020 from surveys performed by ACSM’s Health & Fitness Journal®. The 2020 survey was sent to 56,746 academic and health and wellness professionals, and included 3,067 survey responses (6%) (Thompson, 2019).

The data is available as a Word document table; it was copied to Excel and saved as a csv file.

ROCCC: Analysis of Worldwide Fitness Trends dataset

Measure	Description
Reliable	• Dataset contains a compilation of rankings of fitness trends, 2007-2020, and is complete
	• Dataset is voluntary responses to a survey and may not be unbiased (Mobius, 2021)
	• Dataset is vetted; survey respondents include identified members of the health fitness industry and academic professionals, and the survey was also made available to online respondents (Thompson, 2019)
	• Data is fit for use
Original	• Data is original
Comprehensive	• Metadata available
	• Limited dataset
Current	• Dataset was collected between 2007 and 2020 and is current
Cited	• Dataset has been cited 260 times

The Worldwide Fitness Trends dataset meets ROCCC standards, so it will be used in analyses.

Reference for Worldwide Survey of Fitness Trends Data

Thompson, W. R. (2019). Worldwide survey of fitness trends for 2020. ACSM’s Health and Fitness Journal 26(3), 10-18. Data downloaded 2023-02-05 from https://journals.lww.com/acsm-healthfitness/Fulltext/2019/11000/WORLDWIDE_SURVEY_OF_FITNESS_TRENDS_FOR_2020.6.aspx

Dataset 3: Dataset of Fitness Trackers and Smartwatches to Measuring Physical Activity in Research

This dataset contains information about 423 wearable technologies, including fitness trackers and watches, extracted from six databases, and the technologies installed in them. Fitness tracker devices in the dataset were released between 2011 and 2017. The authors verified extracted data against information available for wearable technologies on company websites.

Data was downloaded as a csv file from https://doi.org/10.18710/6ZWC9Z.

Some notes on wearable devices:

Wearable devices may contain one or more of the following technologies:

accelerometer: records speed and acceleration (Science for Sport, n.d.)
barometer: calculates altitude; helps determine vertical location (Terra API, 2022)
gps: tracks location; enables mapping technologies (Science for Sport, n.d.)
gyroscope: measures three-dimensional rotational movement (Science for Sport, n.d.)
magnetometer: measures true north; in conjunction with GPS, capable of determining position and direction of movement (Science for Sport, n.d.)
ppg: an optical sensor used in wearable technologies; can be used to track pulse oximetry and heart rate (Castaneda, Esparza, Ghamari, Soltanpur, & Nazeran, 2018)

ROCCC: Analysis of Dataset of Fitness Trackers and Smartwatches to Measuring Physical Activity in Research Data

Measure	Description
Reliable	• Information was checked against online sources and is reliable
	• Dataset not vetted
	• Data fit for use
Original	• Dataset is original
Comprehensive	• Metadata is available
	• Descriptions of columns are available
	• Dataset is limited to an online search of six databases and websites; it does not contain data for all fitness trackers and watches available between 2011 and 2017, but is comprehensive
Current	• Dataset was reported in 2020, but contains information about fitness trackers and watches on the market between 2011 and 2017
Cited	• Dataset has not been cited

The dataset was checked against information available for products in the dataset; many of the products are no longer available on the market.

It is important for BellaBeat to understand technologies installed in wearable fitness trackers and watches. Although the Dataset of Fitness Trackers and Smartwatches to Measuring Physical Activity in Research data is not current, it is representative of technologies available in fitness trackers and watches and comparable to BellaBeat product lines, so data will be used in analyses.

References cited:

Castaneda, D., et al. “A review on wearable photoplethysmography sensors and their potential future applications in health care.” Int J Biosens Bioelectron 4(4) (2018): 195-202. https://pubmed.ncbi.nlm.nih.gov/30906922/.

Henriksen, A. W. (2022). Dataset of fitness trackers and smartwatches to measuring physical activity in research. BMC Research Notes 15.1, 1-3. https://pubmed.ncbi.nlm.nih.gov/35842728/

Science for Sport. (n.d.). Science for Sport. Retrieved from GPS (Wearables): Part 1 – Technology, validity, and reliability: https://www.scienceforsport.com/gps-wearables-part-1-technology-validity-and-reliability3/

Terra API. (2022, 04 22). Terra API. Retrieved from Barometer: list of wearables that contain barometers: https://blog.tryterra.co/barometer-list-of-wearables-that-contain-barometers-76a02563906c

Deliverable 2: a description of all data sources

Dataset 1: Fitbit Fitness Tracker Data

Description of dataset: “This dataset generated by respondents to a distributed survey via Amazon Mechanical Turk between 03.12.2016-05.12.2016. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. Individual reports can be parsed by export session ID (column A) or timestamp (column B). Variation between output represents use of different types of Fitbit trackers and individual tracking behaviors / preferences.” (Mobius, 2021).

Data downloaded at: https://www.kaggle.com/datasets/arashnic/fitbit

Descriptions of the datasets:

File name	Description
dailyActivity_merged.csv	941 rows, 15 columns: Id, ActivityDate (date), TotalSteps (number), TotalDistance (number), TrackerDistance (number), LoggedActivitiesDistance (number), VeryActiveDistance (number), ModeratelyActiveDistance (number), LightActiveDistance (number), SedentaryActiveDistance (number), VeryActiveMinutes (number), FairlyActiveMinutes (number), LightlyActiveMinutes (number), SedentaryMinutes (number), Calories (number)
minuteCaloriesNarrow_merged.csv	1,048,576 rows, 3 columns: ID, ActivityMinute (datetime), Calories (number)
minuteCaloriesWide_merged.csv	21,646 rows, 62 columns: ID, ActivityHour (datetime), Calories00: Calories59 (number)
hourlyCalories_merged.csv	22,100 rows, 3 columns: ID, ActivityHour (datetime), Calories (number)
dailyCalories_merged.csv	941 rows, 3 columns: ID, ActivityDay (date), calories (number)
heartrate_seconds_merged.csv	1,048,576 rows, 3 columns: ID, time (datetime), Value (number)
minuteIntensitiesWide_merged.csv	21,646 rows, 62 columns: ID, ActivityHour (datetime), Intensity00: Intensity59 (number)
minuteIntensitiesNarrow_merged.csv	1,048,576 rows, 3 columns: ID, ActivityMinute (datetime), Intensity (number)
hourlyIntensities_merged.csv	22,100 rows, 4 columns: ID, ActivityHour (datetime), TotalIntensity (number), AverageIntensity (number)
dailyIntensities_merged.csv	941 rows, 10 columns: ID, ActivityDay (date), SedentaryMinutes (number), LightlyActiveMinutes (number), FairlyActiveMinutes (number), VeryActiveMinutes (number), SedentaryActiveDistance (number), LightlyActive distance (number), ModeratelyActive distance (number), VeryActive distance (number)
minuteMETsNarrow_merged.csv	1,048,576 rows, 3 columns: ID, ActivityMinutes (datetime), METS (number)
minuteSleep_merged.csv	188,522 rows, 4 columns: ID, Date(datetime), Value (number), logid (number)
sleepDay_merged.csv	414 rows, 5 columns: Id, SleepDay (date), TotalSleepRecords (number), TotalMinutesAsleep (number), TotalTimeInBed (number)
dailySteps_merged.csv	941 rows, 3 columns: ID, ActivityDay (date), StepTotal (number)
minuteStepsNarrow_merged.csv	1,048,576 rows, 3 columns: ID, ActivityMinute (datetime), Steps (number)
minuteStepsWide_merged.csv	21,646 rows, 62 columns: ID, ActivityHour (datetime), Steps00: Steps59 (number)
hourlySteps_merged.csv	22,100 rows, 3 columns: ID, ActivityHour (datetime), StepTotal (number)
weightLogInfo_merged.csv	67 rows, 8 columns: ID (number), Date (character), WeightKg (number), WeightPounds (number), Fat(number), BMI(number), IsManualReport (logical),LogId (number)

Dataset 2: Worldwide Survey of Fitness Trends Data

Description of data: “For the last 14 years, the editors of ACSM’s Health & Fitness Journal® (FIT) have circulated an electronic survey to thousands of professionals around the world to determine health and fitness trends for the following year….The first survey (1), conducted in 2006 (for predictions in 2007), introduced a systematic way to forecast health and fitness trends, and these surveys have been conducted annually since that time (2–13) using the same methodology. As this is a survey of trends, respondents were asked to first make the very important distinction between a ‘fad’ and a ‘trend.’” (Thompson, 2019).

Data downloaded at: https://links.lww.com/FIT/A133

Description of the dataset:

File name	Description
worldwide_fitness_data_trends.csv	13 rows (2 blank rows), 14 columns: 2007 (character), 2008 (character), 2009 (character), 2010 (character), 2011 (character), 2012 (character), 2013 (character), 2014 (character), 2015 (character), 2016 (character), 2017 (character), 2018 (character), 2019 (character), 2020 (character)

Dataset 3: Dataset of Fitness Trackers and Smartwatches to Measuring Physical Activity in Research

Description of dataset: Wearables information was extracted from online and offline databases and websites. Twelve attributes were collected, including wearable name, company/brand name, release year, country of origin, whether the wearable was crowd funded, form factor (fitness tracker or smartwatch), and type of sensor. Sensor technology supported in fitness tracking devices included accelerometer, magnetometer, gyroscope, altimeter or barometer, global positioning system (GPS), and optical pulse sensor (i.e., photoplethysmograph) (Henriksen, Woldaregay, Muzny, Hopstock, & Grimsgaard, 2022).

Data downloaded from https://doi.org/10.18710/6ZWC9Z

Description of the dataset:

File name	Description
fitness_trackers.csv	424 rows, 12 columns: Company name, Device name, Crowd funded (logical), Country of origin, release year (year), Form factor, Accelerometer (logical), Gyroscope (logical), Magnetometer (logical), Barometer (logical), GPS (logical), PPG (logical)

STEP 3: PROCESS

Prepare data for analyses.

Datasets will be cleaned and processed in R. This will ensure data integrity is not compromised.

To ensure data integrity:

Data in R was verified against csv’s of datasets in Excel; counts of rows and columns were verified.
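
The same row and column check can also be done without opening Excel, which matters for the long files because an Excel worksheet holds at most 1,048,576 rows. The following is a minimal sketch in base R, not the check that was actually run: the helper name check_dimensions is hypothetical, the toy file stands in for a real Fitbit csv, and it assumes the file has a header row and no quoted commas or embedded line breaks.

```r
# Hypothetical helper: compare what R loaded with the raw text file.
# Assumes a header row, no quoted commas, and no embedded line breaks.
check_dimensions <- function(path) {
  raw_lines <- readLines(path)
  df <- read.csv(path, stringsAsFactors = FALSE)
  list(
    rows_in_file = length(raw_lines) - 1,  # minus the header line
    rows_in_r    = nrow(df),
    cols_in_file = length(strsplit(raw_lines[1], ",", fixed = TRUE)[[1]]),
    cols_in_r    = ncol(df)
  )
}

# Toy stand-in for a Fitbit csv (values are made up for illustration).
tmp <- tempfile(fileext = ".csv")
writeLines(c("Id,ActivityMinute,Calories",
             "1,4/12/2016 12:00:00 AM,0.79",
             "1,4/12/2016 12:01:00 AM,0.79"), tmp)

dims <- check_dimensions(tmp)
print(dims)
```

If rows_in_file and rows_in_r disagree, the file was probably not read completely or contains fields that break the simple line count.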

Deliverable 3: document the cleaning and data manipulation processes

Clean data steps:

Data was checked for duplicates; if applicable, duplicates were deleted
Data was checked to ensure no row labels were present
Data types were validated
Datasets were checked for incomplete/missing data rows; empty rows were removed, when applicable
Column headers were identified and standardized to lowercase; any spaces in names were removed
Date formats were standardized
Data types were verified and transformed, as needed, prior to analyses
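
As an illustration only, the generic steps above could be collected into one small helper. This is a sketch in base R under stated assumptions: the function name clean_basic and the toy data are hypothetical, and spaces in headers are replaced with underscores to match the column names used later in this document.

```r
# Hypothetical helper that applies the generic cleaning steps in order.
clean_basic <- function(df) {
  # Standardize headers: lowercase, spaces replaced with underscores.
  names(df) <- gsub(" +", "_", tolower(trimws(names(df))))
  # Drop rows in which every field is NA or an empty string.
  is_empty <- apply(df, 1, function(r) all(is.na(r) | trimws(r) == ""))
  df <- df[!is_empty, , drop = FALSE]
  # Drop exact duplicate rows.
  df <- df[!duplicated(df), , drop = FALSE]
  rownames(df) <- NULL
  df
}

# Toy data: one duplicate row and one completely empty row.
toy <- data.frame(
  "Id" = c(1, 1, NA, 2),
  "Activity Date" = c("4/12/2016", "4/12/2016", NA, "4/13/2016"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

cleaned <- clean_basic(toy)
print(cleaned)
```

Date and type conversions are left out of the helper because they differ from file to file.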

Verify data is ready to use:

To ensure data integrity, data was loaded into R and verified against csv’s of datasets opened in Excel; counts of rows and columns were verified.

Cleaning process:

Dataset 1: Fitbit Fitness Tracker Data

The Fitbit data was downloaded as a series of csv files and imported into R for cleaning and initial analysis.

The following datasets were used: minuteCaloriesNarrow_merged.csv, heartrate_seconds_merged.csv, minuteIntensitiesNarrow_merged.csv, minuteMETsNarrow_merged.csv, minuteSleep_merged.csv, minuteStepsNarrow_merged.csv, dailyActivity_merged.csv, and sleepDay_merged.csv.

Prepare the environment

Note: add packages necessary for cleaning and processing data

  library(dplyr)
  library(ggplot2)
  library(janitor)
  library(lubridate)
  library(png)
  library(readr)
  library(readxl)
  library(skimr)
  library(stringr)
  library(tidyr)
  library(tidyverse)
  library(writexl)

Analysis 1: Fitbit minutes datasets

These datasets contain information about Fitbit users by the minute (by the second for heart rate); they are long data.

minuteCaloriesNarrow_merged.csv
heartrate_seconds_merged.csv
minuteIntensitiesNarrow_merged.csv
minuteMETsNarrow_merged.csv
minuteSleep_merged.csv
minuteStepsNarrow_merged.csv

These datasets will be merged and analyzed over time to look for trends in when Fitbit users are using their devices.

Load the data:

Note: use readr()

  minuteCaloriesNarrow_merged <-  read_csv("minuteCaloriesNarrow_merged.csv")
  heartrate_seconds_merged <- read_csv("heartrate_seconds_merged.csv")
  minuteIntensitiesNarrow_merged <- read_csv("minuteIntensitiesNarrow_merged.csv")
  minuteMETsNarrow_merged <- read_csv("minuteMETsNarrow_merged.csv")
  minuteSleep_merged <- read_csv("minuteSleep_merged.csv")
  minuteStepsNarrow_merged <- read_csv("minuteStepsNarrow_merged.csv")

Review the dataset:

Notes: use tidyverse()

Column names:

  colnames(minuteCaloriesNarrow_merged)
  colnames(heartrate_seconds_merged)
  colnames(minuteIntensitiesNarrow_merged)
  colnames(minuteMETsNarrow_merged)
  colnames(minuteSleep_merged)
  colnames(minuteStepsNarrow_merged)

Head/Tail:

  head(minuteCaloriesNarrow_merged)
  tail(minuteCaloriesNarrow_merged)
  head(heartrate_seconds_merged)
  tail(heartrate_seconds_merged)
  head(minuteIntensitiesNarrow_merged)
  tail(minuteIntensitiesNarrow_merged)
  head(minuteMETsNarrow_merged)
  tail(minuteMETsNarrow_merged)
  head(minuteSleep_merged)
  tail(minuteSleep_merged)
  head(minuteStepsNarrow_merged)
  tail(minuteStepsNarrow_merged)

Summary: Fitbit minutes datasets:

#calories
summary(minuteCaloriesNarrow_merged)
ncol(minuteCaloriesNarrow_merged)
nrow(minuteCaloriesNarrow_merged)
#heartrate
summary(heartrate_seconds_merged)
ncol(heartrate_seconds_merged)
nrow(heartrate_seconds_merged)
#intensities
summary(minuteIntensitiesNarrow_merged)
ncol(minuteIntensitiesNarrow_merged)
nrow(minuteIntensitiesNarrow_merged)
#mets
summary(minuteMETsNarrow_merged)
ncol(minuteMETsNarrow_merged)
nrow(minuteMETsNarrow_merged)
#sleep
summary(minuteSleep_merged)
ncol(minuteSleep_merged)
nrow(minuteSleep_merged)
#steps
summary(minuteStepsNarrow_merged)
ncol(minuteStepsNarrow_merged)
nrow(minuteStepsNarrow_merged)

minuteCaloriesNarrow_merged.csv: contains 1325580 rows and 3 columns of data; Id: double; ActivityMinute: character; Calories: double.
heartrate_seconds_merged.csv: contains 2483658 rows and 3 columns of data; Id: double; Time: character; Value: double.
minuteIntensitiesNarrow_merged.csv: contains 1325580 rows and 3 columns of data; Id: double; ActivityMinute: character; Intensity: double.
minuteMETsNarrow_merged.csv: contains 1325580 rows and 3 columns of data; Id: double; ActivityMinute: character; METs: double.
minuteSleep_merged.csv: contains 188521 rows and 4 columns of data; Id: double; Date: character; Value: double; LogId: double.
minuteStepsNarrow_merged.csv: contains 1325580 rows and 3 columns of data; Id: double; ActivityMinute: character; Steps: double.

Clean the data:

Standardize column names for all Fitbit minutes datasets:

  colnames(minuteCaloriesNarrow_merged) <- c("id","activity_minute","calories")
  colnames(heartrate_seconds_merged) <- c("id","activity_minute","heart_rate")
  colnames(minuteIntensitiesNarrow_merged) <- c("id","activity_minute","intensity")
  colnames(minuteMETsNarrow_merged) <- c("id","activity_minute","mets")
  colnames(minuteSleep_merged) <- c("id","activity_minute","sleep_value", 
                                    "sleep_log_id")
  colnames(minuteStepsNarrow_merged) <- c("id","activity_minute","steps")

Merge Fitbit minutes datasets:

Note: keep all rows from both datasets (all = TRUE); no.dups = TRUE keeps column names unique in the merged result (it does not remove duplicate rows). The merges merge_7, merge_8, merge_9, merge_10, and fitbit_minutes were time-consuming, so they were run once. The merged data will be written to “fitbit”, exported as a csv, and then “fitbit.csv” will be reloaded so the merges do not have to be repeated each time the RMD is knit.

  # calories + heart rate
  merge_7 <- merge(minuteCaloriesNarrow_merged, heartrate_seconds_merged, 
                   by = c("id","activity_minute"), all = TRUE, no.dups = TRUE)
  
  # intensities + METs
  merge_8 <- merge(minuteIntensitiesNarrow_merged, 
                   minuteMETsNarrow_merged, by = c("id","activity_minute"), 
                   all = TRUE, no.dups = TRUE)
  
  # sleep + steps
  merge_9 <- merge(minuteSleep_merged, 
                   minuteStepsNarrow_merged, by = c("id","activity_minute"),
                   all = TRUE, no.dups = TRUE)
  
  # (calories + heart rate) + (intensities + METs)
  merge_10 <- merge(merge_7, merge_8, by = c("id","activity_minute"),
                    all = TRUE, no.dups = TRUE)
  
  # all six minutes datasets
  fitbit_minutes <- merge(merge_9, merge_10, by = c("id","activity_minute"),
                          all = TRUE, no.dups = TRUE)

Write to csv:

fitbit_minutes will be written to a dataframe, exported as fitbit.csv

  fitbit <- data.frame(fitbit_minutes)
  write.csv(fitbit, file = "fitbit.csv")

Read fitbit.csv:

  fitbit <-  read_csv("fitbit.csv")

Check for NAs:

sum(is.na(fitbit))

There are 17410848 NA values in the fitbit dataset. As the dataset was merged from multiple long datasets, this is expected. NAs will be filtered when necessary.
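
To see why a merge with all = TRUE produces so many NA values, consider a toy example in base R. The two small data frames below are made up for illustration; only the merge() call mirrors the approach used above.

```r
# Toy inputs: calories recorded for two minutes, sleep for only one of them.
calories <- data.frame(id = c(1, 1),
                       activity_minute = c("t1", "t2"),
                       calories = c(0.8, 0.9))
sleep <- data.frame(id = 1,
                    activity_minute = "t2",
                    sleep_value = 1)

# Full outer join: every id/minute pair from either input is kept.
merged <- merge(calories, sleep,
                by = c("id", "activity_minute"), all = TRUE)
print(merged)

# Minute "t1" has no sleep record, so sleep_value is NA in that row.
na_per_column <- colSums(is.na(merged))
print(na_per_column)
```

Counting NA values per column, as in the last step, can show which measures contribute most of the missing values.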

Change activity_minute from character to datetime:

Note: use lubridate()

  fitbit$activity_minute <- mdy_hms(fitbit$activity_minute)
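
A quick way to confirm that the conversion worked is to count values that failed to parse. The sketch below uses lubridate on made-up timestamps; it assumes the raw values are month/day/year with a 12-hour clock and an AM/PM marker, which is not stated in this document.

```r
library(lubridate)

# Made-up timestamps in the assumed format, plus one malformed value.
stamps <- c("4/12/2016 12:00:00 AM", "4/12/2016 1:05:00 PM", "not a date")

# Values that cannot be parsed become NA (with a warning, silenced here).
parsed <- suppressWarnings(mdy_hms(stamps))

# Count entries that were present in the raw data but failed to parse.
n_failed <- sum(is.na(parsed) & !is.na(stamps))
print(parsed)
print(n_failed)
```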

Review data:

  fitbit %>%
    summarize(distinct_id = n_distinct(id))

There are 33 distinct ids in the Fitbit dataset.

Note: the Fitbit data author indicated that there were 30 unique users in the original Fitbit dataset.
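
One way to look into the difference between the 33 ids found and the 30 users reported is to count how many days of records each id has. The sketch below uses base R on made-up ids and dates; the column names follow the standardized names in this document, and nothing here is a result from the real data.

```r
# Made-up records: three ids with different numbers of recorded days.
toy <- data.frame(
  id = c(101, 101, 102, 103, 103, 103),
  activity_date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12",
                            "2016-04-12", "2016-04-13", "2016-04-14"))
)

# Number of distinct users.
n_users <- length(unique(toy$id))

# Number of distinct days with records for each user.
days_per_user <- tapply(toy$activity_date, toy$id,
                        function(d) length(unique(d)))
print(n_users)
print(days_per_user)
```

Ids with very few recorded days would be candidates for a closer look, although a low count alone does not explain where the extra ids come from.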

DAILY ACTIVITY/ SLEEP MERGE

It would be helpful to understand the relationship between Fitbit users’ daily activity and their sleep. Toward this goal, the datasets dailyActivity_merged and sleepDay_merged will be cleaned, merged, and compared.

Load the data:

Load the Fitbit dailyActivity_merged.csv dataset

  daily_activity <-  read_csv("dailyActivity_merged.csv")

Head/Tail:

  head(daily_activity)
  tail(daily_activity)

Review the dataset:

Column names:

  colnames(daily_activity)

Summary:

  summary(daily_activity)
  ncol(daily_activity)
  nrow(daily_activity)

daily_activity contains 940 rows and 15 columns.

Column types:

Id: dbl
ActivityDate: char
TotalSteps: dbl
TotalDistance: dbl
TrackerDistance: dbl
LoggedActivitiesDistance: dbl
VeryActiveDistance: dbl
ModeratelyActiveDistance: dbl
LightActiveDistance: dbl
SedentaryActiveDistance: dbl
VeryActiveMinutes: dbl
FairlyActiveMinutes: dbl
LightlyActiveMinutes: dbl
SedentaryMinutes: dbl
Calories: dbl

Remove duplicates:

Note: use base R duplicated()

  daily_activity <- daily_activity[!duplicated(daily_activity), ]

Standardize column names:

  colnames(daily_activity) <- c("id", "activity_date", "total_steps", "total_distance", "tracker_distance", "logged_activities_distance", "very_active_distance", "moderately_active_distance", "light_active_distance", "sedentary_active_distance", "very_active_minutes", "fairly_active_minutes",
"lightly_active_minutes", "sedentary_minutes", "calories")

Change activity_date from character to date:

Note: use lubridate()

  daily_activity$activity_date <- mdy(daily_activity$activity_date)
  str(daily_activity)

Export data

  write.csv(daily_activity, file = "daily_activity.csv")

SLEEP DAY

Load the data:

Load the Fitbit sleepDay_merged.csv dataset

  sleep_day <-  read_csv("sleepDay_merged.csv")

Review the dataset:

Column names:

  colnames(sleep_day)

Summary:

  summary(sleep_day)
  ncol(sleep_day)
  nrow(sleep_day)

Sleep day contains 413 rows and 5 columns.

Columns:

Id: double
SleepDay: character
TotalSleepRecords: double
TotalMinutesAsleep: double
TotalTimeInBed: double

Check unique user ids:

Note: column names have not been standardized at this point, so the id column is referenced as Id

  print(sleep_day)
  length(unique(sleep_day$Id))

There are 3 unique user ids in the sleep day dataset.

1503960366: 25 records
2026352035: 5 records
2347167796: 1 record

This data will not provide meaningful analysis when combined with the daily_activity dataset. No further action will be taken with this dataset. The daily_activity dataset will be evaluated on its own.

Dataset 2: Worldwide Survey of Fitness Trends Data

“Worldwide Survey of Fitness Trends for 2020” (Thompson, 2019) contains rankings of fitness trends, 2007-2020.

This dataset was downloaded as a table in a Word document, copied to a csv file, and imported to R for cleaning.

Prepare the environment:

Note: packages necessary for cleaning and processing data previously loaded: dplyr, ggplot2, janitor, lubridate, png, readr, readxl, skimr, stringr, tidyr, tidyverse, writexl

Load the data:
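
The load call is not shown in this document. A sketch of what it might look like follows, using base R read.csv and a temporary toy file in place of worldwide_fitness_data_trends.csv (the file name listed in Deliverable 2). The trend names are placeholders, and check.names = FALSE is an assumption made here so that the year headers are kept as written.

```r
# Toy stand-in for worldwide_fitness_data_trends.csv:
# one column per year, one "rank. trend" entry per cell, one blank row.
tmp <- tempfile(fileext = ".csv")
writeLines(c("2019,2020",
             "1. Trend A,1. Trend B",
             "2. Trend C,2. Trend D",
             ","), tmp)

# check.names = FALSE keeps "2019" from being rewritten as "X2019".
fitness_trends <- read.csv(tmp, check.names = FALSE,
                           stringsAsFactors = FALSE)
print(fitness_trends)
```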

Review the data:

Notes: use tidyverse()

Column names:

  colnames(fitness_trends)

Head/Tail:

  head(fitness_trends)
  tail(fitness_trends)

Summary:

  summary(fitness_trends)
  nrow(fitness_trends)
  ncol(fitness_trends)

The dataset contains 12 rows and 14 columns of data; column names, years in the original dataset, are written as character/ number combinations.

Clean the data:

Rename columns:

Note: use base R colnames()

colnames(fitness_trends) <- c('2007', '2008', '2009', '2010','2011', '2012', '2013', '2014','2015', '2016', '2017', '2018','2019', '2020')

Unpivot data:

Notes: use tidyr()

fitness_trends_unpivoted <- fitness_trends %>%
  pivot_longer(cols = c('2007', '2008', '2009', '2010',
                        '2011', '2012', '2013', '2014',
                        '2015', '2016', '2017', '2018',
                        '2019', '2020'),
               names_to = "year",
               values_to = "category"
  )

Notes: fitness_trends_unpivoted contains 2 columns of data: one named ‘year’; the second named ‘category’, containing the names of fitness trends and their ranks in one column.

Split ‘category’ column into 2 columns:

Note: use stringr()

fitness_trends_unpivoted[c('rank', 'category')] <- str_split_fixed(fitness_trends_unpivoted$category, fixed(". "), 2)
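
In stringr, a plain pattern string is treated as a regular expression, in which a period matches any character; wrapping the pattern in fixed() makes the period literal. The sketch below shows the split on made-up entries in the assumed "rank. trend name" format.

```r
library(stringr)

# Made-up entries in the assumed "rank. trend name" format.
ranked <- c("1. Trend A", "10. Trend B", "3. Trend C. Revised")

# Split on the first literal ". " only; n = 2 keeps the rest intact.
parts <- str_split_fixed(ranked, fixed(". "), 2)
ranks <- as.integer(parts[, 1])
trend_names <- parts[, 2]

print(ranks)
print(trend_names)
```

Converting the rank to an integer, as in the sketch, also makes it sort numerically rather than alphabetically.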

Check for NA data:

Notes: the fitness_trends_unpivoted dataset contains blank rows; check to see if they are NA or blank

sum(is.na(fitness_trends_unpivoted))

There are 28 NAs in fitness_trends_unpivoted dataset.
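
Because is.na() does not count empty strings, it can help to count NA values and blank strings separately. A minimal sketch in base R on made-up data follows; the column names match the unpivoted dataset, and the values are placeholders.

```r
# Made-up long data with one blank and one NA category.
toy <- data.frame(
  year = c("2019", "2019", "2020", "2020"),
  category = c("Trend A", "", NA, "Trend B"),
  stringsAsFactors = FALSE
)

# NA and "" are different things and are counted separately.
n_na    <- sum(is.na(toy$category))
n_blank <- sum(!is.na(toy$category) & trimws(toy$category) == "")

# Keep only rows that have a real category value.
kept <- toy[!is.na(toy$category) & trimws(toy$category) != "", ]
print(c(na = n_na, blank = n_blank, kept = nrow(kept)))
```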

Remove rows with empty data:

Notes: Filter for empty rows; rename as trends_in_fitness

trends_in_fitness <- fitness_trends_unpivoted[!(fitness_trends_unpivoted$category=="" | fitness_trends_unpivoted$rank==""),]

The trends_in_fitness dataset is now 140 rows of data and 3 columns.

Columns:

year
category
rank

Change trends_in_fitness to a dataframe:

trends_in_fitness <- data.frame(trends_in_fitness)

Export trends_in_fitness as Excel file:

Note: use writexl()

Note: Prior to analysis, the following fitness trends categories from the original “Worldwide Survey of Fitness Trends for 2020” data table were combined in Tableau for analysis:

Body weight Training and Body Weight Training
Educated and experienced fitness professionals; Educated, Certified and Experienced Fitness Professionals; and Educated, Certified and Experienced Fitness Professionals
Exercise is Medicine and Exercise is Medicine (EIM)
Functional Fitness and Functional Fitness Training
Group Personal Training and Group personal Training
Personal Training and Personal training
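
The combination was done in Tableau; the same idea can be sketched in R by normalizing case and mapping known variants to a single label. The lookup below covers only some of the pairs listed above, and the choice of which spelling to keep is an assumption made for illustration.

```r
# A few of the category variants listed above.
category <- c("Body weight Training", "Body Weight Training",
              "Exercise is Medicine (EIM)", "Exercise is Medicine",
              "Functional Fitness Training", "Personal training")

# Lookup of variant -> label to keep, keyed on lowercase text.
# Which label to keep is an illustrative choice.
canonical <- c(
  "exercise is medicine (eim)"  = "exercise is medicine",
  "functional fitness training" = "functional fitness"
)

# Lowercasing alone merges variants that differ only in capitalization.
key <- tolower(trimws(category))
combined <- unname(ifelse(key %in% names(canonical), canonical[key], key))
print(combined)
```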

Dataset 3: Dataset of Fitness Trackers and Smartwatches to Measuring Physical Activity in Research

Load the data:

  fitness_trackers <- read.csv('datasets/fitness_trackers.csv')

Review the data:

Note: use tidyverse()

Column names:

  colnames(fitness_trackers)

Head/Tail:

  head(fitness_trackers)
  tail(fitness_trackers)

Summary:

  summary(fitness_trackers)

The dataset contains information for wearable fitness trackers and smart watches produced between 2011 and 2017. There are 423 separate units contained in this dataset.

Clean the data:

Standardize column names:

Notes: use janitor()

  fitness_trackers_cleaned <- clean_names(fitness_trackers)
  colnames(fitness_trackers_cleaned) <- c('company_name', 'device_name', 'crowd_funded', 'country_of_origin',
    'release_year', 'form_factor', 'accelerometer', 'gyroscope',
    'magnetometer', 'barometer', 'gps', 'ppg')

Change release_year from a number to a factor:

  fitness_trackers_cleaned$release_year <- factor(fitness_trackers_cleaned$release_year)

Check for and remove duplicate data:

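Notes: use base R duplicated() and unique(); the de-duplicated result is assigned back so that the duplicates are actually removed

  fitness_trackers_cleaned <- fitness_trackers_cleaned[!duplicated(fitness_trackers_cleaned$device_name), ]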
 length(unique(fitness_trackers_cleaned$device_name))

There are 411 unique devices in the dataset.
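
Removing duplicates by device name alone drops every later row with the same name, even if other columns differ. Before removing them, it may be worth inspecting the repeated names. The sketch below uses base R on made-up devices.

```r
# Made-up devices: "X1" appears twice with different release years.
toy <- data.frame(
  company_name = c("A", "A", "B"),
  device_name  = c("X1", "X1", "Y2"),
  release_year = c(2015, 2016, 2016),
  stringsAsFactors = FALSE
)

# Names that occur more than once, and all rows that carry them.
dup_names <- unique(toy$device_name[duplicated(toy$device_name)])
dup_rows  <- toy[toy$device_name %in% dup_names, ]
print(dup_rows)

# Keeping the first occurrence of each name, as in the step above.
deduped <- toy[!duplicated(toy$device_name), ]
print(nrow(deduped))
```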

Save data to csv:

Note: Use writexl()

Convert fitness_trackers_cleaned from wide to long data

Note: use dplyr(), tidyr()

The fitness tracker dataset is wide; it will be converted to long data for analysis in Tableau.

Unpivot data for ‘accelerometer’, ‘gyroscope’, ‘magnetometer’, ‘barometer’, ‘gps’, ‘ppg’ to analyze for presence of technologies:

  fitness_trackers_unpivoted <- fitness_trackers_cleaned %>%
    pivot_longer(cols = c('accelerometer', 'gyroscope',
                 'magnetometer', 'barometer', 'gps', 'ppg'),
        names_to = "technology_type",
        values_to = "value"
        )
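
Once the data is long, counting devices per technology becomes a simple grouped sum. The sketch below uses tidyr and base R on two made-up devices; it assumes the sensor columns are logical, as described in Deliverable 2.

```r
library(tidyr)

# Two made-up devices with logical sensor flags.
toy_wide <- data.frame(
  device_name   = c("X1", "Y2"),
  accelerometer = c(TRUE, TRUE),
  gps           = c(FALSE, TRUE)
)

# One row per device and technology.
toy_long <- pivot_longer(toy_wide,
                         cols = c("accelerometer", "gps"),
                         names_to = "technology_type",
                         values_to = "value")

# TRUE counts as 1, so the sum is the number of devices with each sensor.
devices_per_technology <- tapply(toy_long$value,
                                 toy_long$technology_type, sum)
print(devices_per_technology)
```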

Export data:

write_xlsx(fitness_trackers_unpivoted, "datasets/fitness_trackers_unpivoted.xlsx")

STEP 4: ANALYZE

Analyze data.

Datasets were analyzed in R and Tableau. Results in R are contained within this document. Links to Tableau analyses are available throughout this document.

Dataset 1: Fitbit Fitness Tracker Data

Analysis 1: Fitbit minutes datasets

Data for fitbit was exported as csv files, analyzed, and visualized in Tableau. Results are available at https://public.tableau.com/views/FitbitActivityMinutes/FitbitActivityMinutes2?:language=en-US&:display_count=n&:origin=viz_share_link and https://public.tableau.com/views/FitbitAppUSE/CALORIESHEARTRATESLEEP?:language=en-US&:display_count=n&:origin=viz_share_link.

Findings include:

Fitbit app data was recorded for 33 study members between April 11 and May 12, 2016.
Use of all apps fell consistently between April 11 and May 12, 2016
- 33 study members used Fitbit apps on April 11th, the first day of the study
- 19 study members used Fitbit apps on May 12th, the last day of the study
There was no discernible difference between weekday and weekend use
- 32 study members used Fitbit apps on Saturdays, Sundays, and Mondays
- 33 study members used Fitbit apps on Tuesdays, Wednesdays, Thursdays, and Fridays
There was no discernible difference in Fitbit usage by hour of day
- 33 study members used Fitbit apps during all hours of the day
Daily usage results for the intensity, METs, and steps apps are identical across categories
- It is likely that the technology or technologies used to relay these results are related
- Highest number of daily intensity, METs, and steps app users: 33
- Lowest number of daily intensity, METs, and steps app users: 19
17 study members used Fitbit sleep apps
- Use of the app was inconsistent over time
- Highest number of daily sleep app users: 17
- Lowest number of daily sleep app users: 8
- Day with lowest average number of sleep app users: Tuesday
- Day with highest number of sleep app users: Friday
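
The weekday counts above were produced in Tableau. An equivalent calculation can be sketched in base R by deriving the weekday from the timestamp and counting distinct ids. The records below are made up; the ISO weekday number (1 = Monday) is used so that the result does not depend on the system language.

```r
# Made-up minute-level records for two users.
toy <- data.frame(
  id = c(1, 2, 1, 1, 2),
  activity_minute = as.POSIXct(c("2016-04-11 08:00:00",
                                 "2016-04-11 09:00:00",
                                 "2016-04-12 08:00:00",
                                 "2016-04-12 08:01:00",
                                 "2016-04-16 10:00:00"), tz = "UTC")
)

# ISO weekday number: 1 = Monday, ..., 7 = Sunday.
toy$iso_weekday <- as.integer(format(toy$activity_minute, "%u"))

# Distinct users with at least one record on each weekday.
users_per_weekday <- tapply(toy$id, toy$iso_weekday,
                            function(x) length(unique(x)))
print(users_per_weekday)
```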

Analysis 2: Daily Activity and Daily Sleep

Daily Activity data was analyzed in R and Tableau.

R Analysis:

Summarized daily_activity dataset:

  summary(daily_activity)
  ncol(daily_activity)
  nrow(daily_activity)
  n_distinct(daily_activity$id)

Median, maximum, and minimum minutes by date and activity type:

  daily_activity %>% group_by(activity_date) %>%
    summarise('median_sedentary_minutes' = median(sedentary_minutes), 
              'median_lightly_active_minutes' = median(lightly_active_minutes),
              'median_fairly_active_minutes' = median(fairly_active_minutes),
              'median_very_active_minutes' = median(very_active_minutes),
              'max_sedentary_minutes' = max(sedentary_minutes), 
              'max_lightly_active_minutes' = max(lightly_active_minutes),
              'max_fairly_active_minutes' = max(fairly_active_minutes),
              'max_very_active_minutes' = max(very_active_minutes),
              'min_sedentary_minutes' = min(sedentary_minutes), 
              'min_lightly_active_minutes' = min(lightly_active_minutes),
              'min_fairly_active_minutes' = min(fairly_active_minutes),
              'min_very_active_minutes' = min(very_active_minutes)
    ) %>% print(n = 31)

Findings include:

sedentary minutes:
- minimum sedentary minutes by day ranged between 0 (last day of study) and 706 (first day of study); a low number of minutes may indicate that the app was not in use.
- maximum sedentary minutes by day was 1440 minutes (24 hours) for 30 study days and 1375 minutes (22.9 hours) for 1 study day; high numbers may indicate an anomaly in the data.
- median sedentary minutes by day ranged between 721 (12 hours; last day of study) and 1113 (18.6 hours; day 21 of study).
lightly active minutes:
- minimum lightly active minutes by day was 0 for 30 days of the study and 51 for 1 study day; the high number of days with a minimum of 0 may indicate an anomaly in the data.
- maximum lightly active minutes by day ranged between 326 (day 27) and 518 (day 5).
- median lightly active minutes by day ranged between 68 (day 31) and 233 (day 19).
fairly active minutes:
- minimum fairly active minutes by day was 0 for 31 days of the study.
- maximum fairly active minutes by day ranged between 16 (day 31) and 143 (day 21).
- median fairly active minutes by day ranged between 0 (days 3 and 31) and 12.5 (day 29).
very active minutes:
- minimum very active minutes by day was 0 for 31 days of the study.
- maximum very active minutes by day ranged between 28 (day 31) and 210 (day 13).
- median very active minutes by day ranged between 0 (days 4, 23, 26, 27, 31) and 18 (day 22).
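
The findings above note that days with 1440 sedentary minutes, or with no recorded minutes at all, may reflect a tracker that was not worn. One possible way to flag such rows is sketched below. It reuses the column names from the summary code, runs on invented toy rows, and the flagging rule itself is an assumption rather than anything defined by Fitbit.

```r
library(dplyr)

# Toy rows using the column names from the summary above; values are invented.
toy_daily <- data.frame(
  id = c(1, 1, 2, 2),
  sedentary_minutes      = c(1440, 730, 0, 1100),
  lightly_active_minutes = c(0, 200, 0, 150),
  fairly_active_minutes  = c(0, 10, 0, 20),
  very_active_minutes    = c(0, 15, 0, 5)
)

flagged <- toy_daily %>%
  mutate(
    total_minutes = sedentary_minutes + lightly_active_minutes +
      fairly_active_minutes + very_active_minutes,
    # A full 24 hours of sedentary time, or no minutes at all, suggests the
    # tracker may not have been worn (an assumed rule, not a Fitbit one).
    possible_non_wear = sedentary_minutes == 1440 | total_minutes == 0
  )

# Number of suspect rows per participant
suspect_summary <- flagged %>%
  group_by(id) %>%
  summarise(suspect_days = sum(possible_non_wear), .groups = "drop")

print(suspect_summary)
```

Rows flagged this way could be reviewed or excluded before medians are computed, depending on how strictly non-wear should be treated.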

The Tableau analysis can be viewed at: https://public.tableau.com/views/FitbitActivityMinutes/FitbitActivityMinutes?:language=en-US&publish=yes&:display_count=n&:origin=viz_share_link.

Findings include:

Total activity minutes recorded for all Fitbit users between April 12 and May 12, 2016 trended downward over time.
- Maximum recorded minutes: 41,427, April 12, 2016
- Minimum recorded minutes: 15,900, May 12, 2016
Daily percentage of sedentary minutes ranged between 79% and 86% of Fitbit minutes recorded.
Daily percentage of lightly active minutes ranged between 13% and 19% of Fitbit minutes recorded.
Daily percentage of moderately active minutes ranged between 0.3% and 1.6% of Fitbit minutes recorded.
- Percent of moderately active minutes increased over time.
Daily percentage of very active minutes ranged between 0.6% and 2.3% of Fitbit minutes recorded.
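
The daily percentages above were calculated in Tableau. A minimal R sketch of the same kind of calculation follows. It assumes the daily_activity column names used earlier in this document and runs on invented toy values, so the printed percentages are illustrative only.

```r
library(dplyr)

# Toy rows with the daily_activity column names; values are invented.
toy_daily <- data.frame(
  activity_date = as.Date(c("2016-04-12", "2016-04-12", "2016-04-13")),
  sedentary_minutes      = c(800, 800, 900),
  lightly_active_minutes = c(150, 200, 80),
  fairly_active_minutes  = c(20, 10, 10),
  very_active_minutes    = c(10, 10, 10)
)

daily_share <- toy_daily %>%
  group_by(activity_date) %>%
  # Total minutes of each activity type across all participants, per day
  summarise(across(ends_with("_minutes"), sum), .groups = "drop") %>%
  mutate(
    total_minutes = sedentary_minutes + lightly_active_minutes +
      fairly_active_minutes + very_active_minutes,
    # Each activity type as a percentage of all minutes recorded that day
    pct_sedentary      = round(100 * sedentary_minutes / total_minutes, 1),
    pct_lightly_active = round(100 * lightly_active_minutes / total_minutes, 1),
    pct_fairly_active  = round(100 * fairly_active_minutes / total_minutes, 1),
    pct_very_active    = round(100 * very_active_minutes / total_minutes, 1)
  )

print(daily_share)
```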

TRENDS IN FITNESS ANALYSIS

The trends_in_fitness dataset was visualized in Tableau.

The Tableau visualization can be viewed at: https://public.tableau.com/views/TopTrendsinFitness2007-2020/Fitnesstrends2007-2020?:language=en-US&:display_count=n&:origin=viz_share_link

Findings include:

“Wearable technologies” was the top-rated fitness trend for four years between 2016 and 2020:
- 2016: ranked #1
- 2017: ranked #1
- 2018: ranked #3
- 2019: ranked #1
- 2020: ranked #1

Other top 5 fitness trends since 2016 include:
- body weight training
- group personal training
- high intensity interval training
- strength training
- free weight training
- fitness training for older adults

Past trends that no longer rank among the top 5:
- employing fitness professionals
- strength training
- exercise and weight loss
- children and obesity

Analysis 3: Dataset of Fitness Trackers and Smartwatches to Measuring Physical Activity in Research

The fitness_trackers_cleaned dataset was analyzed in both R and Tableau.

R Analyses

Graphing technologies in wearables by year and country:

What are the trends in technologies in wearables?

Over time?
Between countries that produce wearables?

The dataset was graphed by the technology types found in wearable watches and trackers, by country and year. Notes: use ggplot2 and tidyr; rows where release_year is NA are excluded.

Set up graph:

  a <- ggplot(data=subset(fitness_trackers_cleaned, !is.na(release_year)), aes(x=release_year))+
    scale_fill_manual(values = c("FALSE"= "grey", "TRUE" = "darkgreen"))

Plot number of watches and number of trackers over time:

Notes: use ggplot2; remove rows where release_year is NA; this shows the number of watches versus the number of fitness trackers on the market between 2011 and 2017.

  ggplot(data = subset(fitness_trackers_cleaned, !is.na(release_year)),
         aes(x = release_year, fill = form_factor)) +
    geom_bar() +
    facet_wrap(~form_factor) +
    labs(title = "Popularity of Fitness Tracker Types", subtitle = "2011-2017",
         x = "year", y = "", fill = "tracker or watch") +
    theme(legend.position = "left") +
    scale_fill_manual(values = c("tracker" = "lightblue", "watch" = "lightgreen"))

  form_factor_data <- fitness_trackers_cleaned %>%
    group_by(form_factor,country_of_origin) %>%
    tally()

Between 2011 and 2017:
- Watches were 56% (236/423) of wearable units produced.
- Trackers were 44% (187/423) of wearables produced.
In 2015, production of wearable technologies was at its highest, with 49 trackers and 72 watches produced.
By 2017, production of wearable products dropped to 13 trackers and 25 watches.
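
The shares and the drop in production quoted above can be checked with a few lines of base R. The counts below are copied from the findings in this section; the variable names are only illustrative.

```r
# Counts of units by form factor, copied from the findings above
form_factor_counts <- c(watch = 236, tracker = 187)

# Share of each form factor among all 423 units, as whole percentages
share_pct <- round(100 * prop.table(form_factor_counts))
print(share_pct)  # watch 56, tracker 44

# Units produced in the peak year (2015) and in 2017: trackers + watches
units_by_year <- c("2015" = 49 + 72, "2017" = 13 + 25)

# Percentage drop from 2015 to 2017
pct_drop <- round(
  100 * (units_by_year[["2015"]] - units_by_year[["2017"]]) / units_by_year[["2015"]],
  1
)
print(pct_drop)  # 68.6
```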

Technology type: accelerometer

Total accelerometer units installed in wearables by year

  a + geom_bar(aes(fill= accelerometer))+
    labs(title = "Use of Accelerometer Technology in Fitness Trackers and Watches",
         subtitle = "2011- 2017", x = "year", y ="", fill= "accelerometer installed")

Summary of accelerometer data

  # Notes: use dplyr()
  accelerometer_data <- fitness_trackers_cleaned %>%
    filter(accelerometer ==TRUE) %>%
    group_by(country_of_origin, form_factor) %>%
    tally()

Between 2011 and 2017, 100% (423/423) of wearable units produced contained accelerometer technology.
USA (88 trackers, 76 watches) and China (15 trackers, 71 watches) produced the most units containing accelerometer technology.
Accelerometer technology was contained in 100% (187/187) of trackers and 100% (236/236) of watches produced.
In 2015, production of units with accelerometer technology was at its highest (121 units).
By 2017, production dropped to 38 units with accelerometer technology.

Technology type: barometer

Total barometer units installed in wearables by year

  a <- ggplot(data=subset(fitness_trackers_cleaned, !is.na(release_year)), aes(x=release_year))+
    scale_fill_manual(values = c("FALSE"= "gray", "TRUE" = "darkblue"))

  a + geom_bar(aes(fill= barometer))+
    labs(title = "Use of Barometer Technology in Fitness Trackers and Watches",
         subtitle = "2011- 2017", x = "year", y ="", fill= "barometer installed")

Summary of barometer data

  # Notes: use dplyr()
  barometer_data <- fitness_trackers_cleaned %>%
    filter(barometer ==TRUE) %>%
    group_by(country_of_origin, form_factor) %>%
    tally()

Between 2011 and 2017, 13% (57/423) of units produced contained barometer technology.
USA (4 trackers, 21 watches: 44% of barometer-equipped units) and China (11 watches: 19% of barometer-equipped units) produced the most units containing barometer technology.
Barometer technology was installed in 3% (6/187) of trackers and 22% (51/236) of watches produced.
In 2016, production of units with barometer technology was at its highest, 20 products.
In 2017, production dropped to 12 products with barometer technology.

Technology type: gps

Total gps units installed in wearables by year

  a <- ggplot(data=subset(fitness_trackers_cleaned, !is.na(release_year)), aes(x=release_year))+
    scale_fill_manual(values = c("FALSE"= "grey", "TRUE" = "lightgreen"))

  a + geom_bar(aes(fill= gps))+
    labs(title = "Use of GPS Technology in Fitness Trackers and Watches",
         subtitle = "2011- 2017", x = "year", y ="", fill= "gps installed")

Summary of gps data

  # Notes: use dplyr()
  gps_data <- fitness_trackers_cleaned %>%
    filter(gps ==TRUE) %>%
    group_by(country_of_origin, form_factor) %>%
    tally()

Between 2011 and 2017, 30% (125/423) of units produced contained GPS technology.
USA (8 trackers, 44 watches: 12% of all units produced) and China (1 tracker, 23 watches: 6% of all units produced) produced the most units containing GPS technology.
GPS technology was installed in 9% (17/187) of trackers and 46% (108/236) of watches produced.
In 2016, production of units with GPS technology was at its highest, 40 products.
In 2017, production dropped to 19 products with GPS technology.

Technology type: gyroscope

Total gyroscope units installed in wearables by year

  a <- ggplot(data=subset(fitness_trackers_cleaned, !is.na(release_year)), aes(x=release_year))+
    scale_fill_manual(values = c("FALSE"= "grey", "TRUE" = "red"))
  a + geom_bar(aes(fill= gyroscope))+
    labs(title = "Use of Gyroscope Technology in Fitness Trackers and Watches",
         subtitle = "2011- 2017", x = "year", y ="", fill= "gyroscope installed")

Summary of gyroscope data

  # Notes: use dplyr()
  gyroscope_data <- fitness_trackers_cleaned %>%
    filter(gyroscope ==TRUE) %>%
    group_by(country_of_origin, form_factor) %>%
    tally()

Between 2011 and 2017, 24% (101/423) of units produced contained gyroscope technology.
USA (12 trackers, 21 watches: 8% of units produced), China (1 tracker, 20 watches: 5% of units produced), and South Korea (0 trackers, 17 watches: 4% of units produced) produced the most units containing gyroscope technology.
Gyroscope technology was installed in 9% (17/187) of trackers and 36% (84/236) of watches produced.
In 2016, production of units with gyroscope technology was at its highest, 33 products.
In 2017, production dropped to 15 products with gyroscope technology.

Technology type: magnetometer

Total magnetometer units installed in wearables by year

Summary of magnetometer data

Between 2011 and 2017, 18% (77/423) of units produced contained magnetometer technology.
USA (4 trackers, 26 watches: 7% of units produced) and China (1 tracker, 12 watches: 3% of units produced) produced the most units containing magnetometer technology.
Magnetometer technology was installed in 4% (8/187) of trackers and 29% (69/236) of watches produced.
In 2016, production of units with magnetometer technology was at its highest, 22 products.
In 2017, production dropped to 13 products with magnetometer technology.
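
The code for the magnetometer plot and summary is not shown in this section. A sketch that follows the same pattern as the accelerometer, barometer, GPS, and gyroscope chunks is given below. It assumes fitness_trackers_cleaned has a logical magnetometer column like the other sensors; the toy data frame, the fill color, and the object names are stand-ins chosen for illustration.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for fitness_trackers_cleaned; the rows are invented and the
# logical `magnetometer` column is an assumption based on the other sensors.
toy_trackers <- data.frame(
  release_year      = c(2015, 2015, 2016, 2016, NA),
  form_factor       = c("watch", "tracker", "watch", "watch", "tracker"),
  country_of_origin = c("USA", "USA", "China", "USA", "China"),
  magnetometer      = c(TRUE, FALSE, TRUE, TRUE, FALSE)
)

# Bar chart of units per year, filled by whether a magnetometer is installed;
# rows with a missing release_year are dropped, as in the earlier plots.
magnetometer_plot <- ggplot(data = subset(toy_trackers, !is.na(release_year)),
                            aes(x = release_year)) +
  geom_bar(aes(fill = magnetometer)) +
  scale_fill_manual(values = c("FALSE" = "grey", "TRUE" = "purple")) +
  labs(title = "Use of Magnetometer Technology in Fitness Trackers and Watches",
       subtitle = "2011- 2017", x = "year", y = "", fill = "magnetometer installed")

# Count magnetometer-equipped units by country and form factor
magnetometer_data <- toy_trackers %>%
  filter(magnetometer == TRUE) %>%
  group_by(country_of_origin, form_factor) %>%
  tally()

print(magnetometer_data)
```

Replacing toy_trackers with fitness_trackers_cleaned would apply the same steps to the real dataset, provided the column exists under that name.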

Technology type: ppg

Total ppg units installed in wearables by year

Summary of ppg data

Between 2011 and 2017, 38% (162/423) of units produced contained PPG technology.
USA (25 trackers, 28 watches: 13% of units produced), China (5 trackers, 34 watches: 9% of units produced), and South Korea (0 trackers, 13 watches: 3% of units produced) produced the most units containing PPG technology.
PPG technology was installed in 32% (59/187) of trackers and 44% (103/236) of watches produced.
In 2016, production of units with PPG technology was at its highest, 66 products.
In 2017, production dropped to 27 products with PPG technology.
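
Each sensor above was summarized with its own chunk. As an alternative, the share of units carrying each sensor could be computed for all sensors in one pass. The sketch below assumes the sensor columns are logical, as the filter calls above imply, and uses an invented toy table with only three of the sensors.

```r
library(dplyr)

# Toy stand-in for fitness_trackers_cleaned with three logical sensor
# columns; the rows are invented for illustration.
toy_trackers <- data.frame(
  form_factor   = c("watch", "watch", "tracker", "tracker"),
  accelerometer = c(TRUE, TRUE, TRUE, TRUE),
  gps           = c(TRUE, FALSE, FALSE, FALSE),
  ppg           = c(TRUE, TRUE, TRUE, FALSE)
)

# For each form factor: number of units, and the percentage of units with
# each sensor (the mean of a logical column is the proportion of TRUE values).
sensor_share <- toy_trackers %>%
  group_by(form_factor) %>%
  summarise(
    units = n(),
    across(c(accelerometer, gps, ppg), ~ round(100 * mean(.x))),
    .groups = "drop"
  )

print(sensor_share)
```

This yields one row per form factor and one percentage column per sensor, which makes figures such as "installed in X% of trackers and Y% of watches" easy to read off side by side.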

Tableau analysis is available at: https://public.tableau.com/views/WearableFitnessTrackers/Technologiesinwearables?:language=en-US&:display_count=n&:origin=viz_share_link

Among wearable technologies produced between 2011 and 2017, more watches were produced than trackers.

The US produced 39% of wearable watches and trackers (164/423); China produced 17% (71/423) of wearable technologies.

100% of wearable technologies produced contain accelerometer technology.

38% of wearable units contain PPG technology.

The number of wearable products on the market dropped drastically between 2015 and 2017.

Deliverable 4: a summary of analysis

Wearable fitness technologies consistently rank as a top fitness trend.

Bellabeat should market toward other top fitness trends including personal training, high intensity interval training (HIIT), and fitness programs for older adults.

Among wearable technologies produced between 2011 and 2017, more watches were produced than trackers.

Bellabeat should focus on developing their watch lines.

The US produced 39% of wearable watches and trackers (164/423).

Bellabeat should focus on the US market.

100% of wearable technologies produced contain accelerometer technology; 38% of wearable units contain PPG technology.

As PPG technology offers advantages in monitoring heart rate and blood oxygen saturation, Bellabeat should consider including PPG sensors in future models.

Fitbit use fell off over time; as the sample size for the study was limited, it is difficult to determine whether this is true for all fitness tracking technologies.

As the Bellabeat fitness watch is stylish and can be worn regularly, Bellabeat should stress this advantage over other fitness trackers on the market.

The number of fitness products on the market decreased 59% between 2016 and 2017.

It may be that use of wearables is decreasing.
Alternatively, the number of wearable products on the market may be shrinking, while sales of more popular devices are increasing.
Further investigation of the dataset for completeness and/or market changes around wearable technologies after 2016 is warranted.

Activity minutes recorded for Fitbit users were predominantly sedentary minutes.

Bellabeat should focus marketing on health advantages of light and moderate exercise for Bellabeat users.

Fitbit app users tracked METs, steps, and intensity data more than other types of data.

Bellabeat should focus their marketing on walking, running, and training activities.

Fitbit users did not regularly use their sleep app.

This may be a market where Bellabeat can gain users.
Bellabeat should market Bellabeat Time for its sleep and meditation apps.

Bellabeat Case Study

Donna Thompson

2023-03-02
