This case study is the final project in the Google Data Analytics Professional Certificate program. During the study, I will be assuming the role of junior data analyst for Bellabeat, a company that produces high-tech wellness products for women.
Bellabeat believes that analysis of fitness technology data will help them gain insight into the wellness technology market. I was asked to analyze smart device data to better understand how consumers use their smart devices.
I was provided a small dataset of fitness tracker information for analysis. I was also asked to look for additional datasets that may help support Bellabeat’s consumer analysis.
I have been asked to document my process and provide five deliverables. Toward this goal, I will follow the steps outlined for the project. I will also include code chunks that document my analysis.
What is the business problem you are trying to solve?
The Bellabeat team is looking for information about how people use their smart devices. They want to understand how they can analyze their in-house data to better understand their own customers. Additionally, they are looking for high-level recommendations about how trends in smart device usage can direct their marketing strategy.
Bellabeat has provided us with Fitbit data as a starting point toward their goal; they understand that this dataset has limitations. They would like us to find other applicable datasets, if possible, and analyze them.
Identifying the business task:
Identifying the audience:
Identifying the business goal:
Analyze how customers are using fitness technologies to assess:
Identify and prepare datasets.
Data used for analysis should be reliable, original, comprehensive, current, and cited, or ROCCC.
Three datasets were identified for analysis. They will be evaluated against ROCCC standards prior to analysis.
Dataset 1: Fitbit Fitness Tracker Data
The dataset, downloaded from https://www.kaggle.com/datasets/arashnic/fitbit, contains voluntary submissions from Fitbit users. Data categories in the dataset include activity, calories, heart rate, intensities, MET, sleep, and steps. The data is not a representative sample of Fitbit users and may be biased. Descriptions for categorical data are not available, and in some circumstances, it is unclear what is being measured.
ROCCC: Analysis of Fitbit Fitness Tracker Data
| Measure | Description |
|---|---|
| Reliable | • Dataset is limited to voluntary contributions by Fitbit users and is not complete |
| | • No way to check if data is unbiased |
| | • Dataset not vetted |
| | • Data not proven fit for use |
| Original | • Data not validated with the original source |
| Comprehensive | • Metadata not available |
| | • Descriptions of columns not available for any datasets; in some instances, it is unclear what the data measures |
| | • Large dataset will allow us to evaluate how different users are using Fitbits and for how long |
| Current | • Dataset was collected in 2016 and is not current |
| Cited | • Dataset has not been cited |
Although the Fitbit Fitness Tracker Data does not meet ROCCC standards, Bellabeat requested analysis of this data. Analysis will be restricted to data categories that are unambiguous and easily identified.
Reference for Fitbit Fitness Tracker Data
Mobius. “FitBit Fitness Tracker Data.” Kaggle. 2021. 14 02 2023. https://www.kaggle.com/datasets/arashnic/fitbit.
Dataset 2: Worldwide Survey of Fitness Trends Data
The dataset, downloaded from https://links.lww.com/FIT/A133, includes data for the top 20 fitness trends between 2007 and 2020 from surveys performed by ACSM’s Health & Fitness Journal®. The 2020 survey was sent to 56,746 academic and health and wellness professionals, and included 3,067 survey responses (6%) (Thompson, 2019).
The data is available as a Word document table; it was copied to Excel and saved as a csv file.
ROCCC: Analysis of Worldwide Fitness Trends dataset
| Measure | Description |
|---|---|
| Reliable | • Dataset contains a compilation of rankings of fitness trends, 2007-2020, and is complete |
| | • Dataset is voluntary responses to a survey and may not be unbiased (Mobius, 2021) |
| | • Dataset is vetted; survey respondents include identified members of the health fitness industry and academic professionals, and the survey was also made available to online respondents (Thompson, 2019) |
| | • Data is fit for use |
| Original | • Data is original |
| Comprehensive | • Metadata available |
| | • Limited dataset |
| Current | • Dataset was collected between 2007 and 2020 and is current |
| Cited | • Dataset has been cited 260 times |
The Worldwide Survey of Fitness Trends dataset meets ROCCC standards, so it will be used in analyses.
Reference for Worldwide Survey of Fitness Trends Data
Thompson, W. R. (2019). Worldwide survey of fitness trends for 2020. ACSM’s Health & Fitness Journal 23(6), 10-18. Data downloaded 2023-02-05 from https://journals.lww.com/acsm-healthfitness/Fulltext/2019/11000/WORLDWIDE_SURVEY_OF_FITNESS_TRENDS_FOR_2020.6.aspx
Dataset 3: Dataset of Fitness Trackers and Smartwatches to Measuring Physical Activity in Research
This dataset contains information about 423 wearable devices, including fitness trackers and smartwatches, identified from six databases, along with the sensor technologies installed in them. Devices in the dataset were released between 2011 and 2017. The authors verified the extracted data against information available for the devices on company websites.
Data was downloaded as a csv file from https://doi.org/10.18710/6ZWC9Z.
Some notes on wearable devices:
Wearable devices may contain one or more of the following technologies:
ROCCC: Analysis of Dataset of Fitness Trackers and Smartwatches to Measuring Physical Activity in Research Data
| Measure | Description |
|---|---|
| Reliable | • Information was checked against online sources and is reliable |
| | • Dataset not vetted |
| | • Data fit for use |
| Original | • Dataset is original |
| Comprehensive | • Metadata is available |
| | • Descriptions of columns are available |
| | • Dataset is limited to an online search of six databases and websites; it does not contain data for all fitness trackers and watches available between 2011 and 2017, but is comprehensive |
| Current | • Dataset was reported in 2020, but contains information about fitness trackers and watches on the market between 2011 and 2017 |
| Cited | • Dataset has not been cited |
The dataset was checked against information available for products in the dataset; many of the products are no longer available on the market.
It is important for Bellabeat to understand the technologies installed in wearable fitness trackers and watches. Although the Dataset of Fitness Trackers and Smartwatches to Measuring Physical Activity in Research is not current, it is representative of technologies available in fitness trackers and watches and comparable to Bellabeat product lines, so the data will be used in analyses.
References cited:
Castaneda, D., et al. “A review on wearable photoplethysmography sensors and their potential future applications in health care.” Int J Biosens Bioelectron 4(4) (2018): 195-202. https://pubmed.ncbi.nlm.nih.gov/30906922/.
Henriksen, A. W. (2022). Dataset of fitness trackers and smartwatches to measuring physical activity in research. BMC Research Notes 15.1, 1-3. https://pubmed.ncbi.nlm.nih.gov/35842728/
Science for Sport. (n.d.). Science for Sport. Retrieved from GPS (Wearables): Part 1 – Technology, validity, and reliability: https://www.scienceforsport.com/gps-wearables-part-1-technology-validity-and-reliability3/
Terra API. (2022, 04 22). Terra API. Retrieved from Barometer: list of wearables that contain barometers: https://blog.tryterra.co/barometer-list-of-wearables-that-contain-barometers-76a02563906c
Dataset 1: Fitbit Fitness Tracker Data
Description of dataset: “This dataset generated by respondents to a distributed survey via Amazon Mechanical Turk between 03.12.2016-05.12.2016. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. Individual reports can be parsed by export session ID (column A) or timestamp (column B). Variation between output represents use of different types of Fitbit trackers and individual tracking behaviors / preferences.” (Mobius, 2021).
Data downloaded at: https://www.kaggle.com/datasets/arashnic/fitbit
Descriptions of the datasets:
| File name | Description |
|---|---|
| dailyActivity_merged.csv | 941 rows, 15 columns: Id, ActivityDate (date), TotalSteps (number), TotalDistance (number), TrackerDistance (number), LoggedActivitiesDistance (number), VeryActiveDistance (number), ModeratelyActiveDistance (number), LightActiveDistance (number), SedentaryActiveDistance (number), VeryActiveMinutes (number), FairlyActiveMinutes (number), LightlyActiveMinutes (number), SedentaryMinutes (number), Calories (number) |
| minuteCaloriesNarrow_merged.csv | 1,048,576 rows, 3 columns: ID, ActivityMinute (datetime), Calories (number) |
| minuteCaloriesWide_merged.csv | 21,646 rows, 62 columns: ID, ActivityHour (datetime), Calories00: Calories59 (number) |
| hourlyCalories_merged.csv | 22,100 rows, 3 columns: ID, ActivityHour (datetime), Calories (number) |
| dailyCalories_merged.csv | 941 rows, 3 columns: ID, ActivityDay (date), calories (number) |
| heartrate_seconds_merged.csv | 1,048,576 rows, 3 columns: ID, time (datetime); Value (number) |
| minuteIntensitiesWide_merged.csv | 21,646 rows, 62 columns: ID, ActivityHour (datetime), Intensity00: Intensity59 (number) |
| minuteIntensitiesNarrow_merged.csv | 1,048,575 rows, 3 columns: ID, ActivityMinute (datetime), Intensity (number) |
| hourlyIntensities_merged.csv | 22,100 rows, 4 columns: ID, ActivityHour (datetime), TotalIntensity (number), AverageIntensity (number) |
| dailyIntensities_merged.csv | 941 rows, 10 columns: ID, ActivityDay (date), SedentaryMinutes (number), LightlyActiveMinutes (number), FairlyActiveMinutes (number), VeryActiveMinutes (number), SedentaryActiveDistance (number), LightlyActive distance (number), ModeratelyActive distance (number), VeryActive distance (number) |
| minuteMETsNarrow_merged.csv | 1,048,576 rows, 3 columns: ID, ActivityMinutes (datetime), METS (number) |
| minuteSleep_merged.csv | 188,522 rows, 4 columns: ID, Date(datetime), Value (number), logid (number) |
| sleepDay_merged.csv | 414 rows, 5 columns: Id, SleepDay (date), TotalSleepRecords (number), TotalMinutesAsleep (number), TotalTimeInBed (number) |
| dailySteps_merged.csv | 941 rows, 3 columns: ID, ActivityDay (date), StepTotal (number) |
| minuteStepsNarrow_merged.csv | 1,048,576 rows, 3 columns: ID, ActivityMinute (datetime), Steps (number) |
| minuteStepsWide_merged.csv | 21,646 rows, 62 columns: ID, ActivityHour (datetime), Steps00: Steps59 (number) |
| hourlySteps_merged.csv | 22,100 rows, 3 columns: ID, ActivityHour (datetime), StepTotal (number) |
| weightLogInfo_merged.csv | 67 rows, 8 columns: ID (number), Date (character), WeightKg (number), WeightPounds (number), Fat(number), BMI(number), IsManualReport (logical),LogId (number) |
Dataset 2: Worldwide Survey of Fitness Trends Data
Description of data: “For the last 14 years, the editors of ACSM’s Health & Fitness Journal® (FIT) have circulated an electronic survey to thousands of professionals around the world to determine health and fitness trends for the following year….The first survey (1), conducted in 2006 (for predictions in 2007), introduced a systematic way to forecast health and fitness trends, and these surveys have been conducted annually since that time (2–13) using the same methodology. As this is a survey of trends, respondents were asked to first make the very important distinction between a ‘fad’ and a ‘trend.’” (Thompson, 2019).
Data downloaded at: https://links.lww.com/FIT/A133
Description of the dataset:
| File name | Description |
|---|---|
| worldwide_fitness_data_trends.csv | 13 rows (2 blank rows), 14 columns: 2007 (character), 2008 (character), 2009 (character), 2010 (character), 2011 (character), 2012 (character), 2013 (character), 2014 (character), 2015 (character), 2016 (character), 2017 (character), 2018 (character), 2019 (character), 2020 (character) |
Dataset 3: Dataset of Fitness Trackers and Smartwatches to Measuring Physical Activity in Research
Description of dataset: Wearables information was extracted from online and offline databases and websites. Twelve attributes were collected: wearable name, company/brand name, release year, country of origin, whether the wearable was crowd funded, form factor (fitness tracker or smartwatch), and six sensor types. The sensor technologies recorded were accelerometer, magnetometer, gyroscope, altimeter or barometer, global positioning system (GPS), and optical pulse sensor (i.e., photoplethysmograph) (Henriksen, Woldaregay, Muzny, Hopstock, & Grimsgaard, 2022).
Data downloaded from https://doi.org/10.18710/6ZWC9Z
A description of the data is below:
| File name | Description |
|---|---|
| fitness_trackers.csv | 424 rows, 12 columns: Company name, Device name, Crowd funded (logical), Country of origin, release year (year), Form factor, Accelerometer (logical), Gyroscope (logical), Magnetometer (logical), Barometer (logical), GPS (logical), PPG (logical) |
Prepare data for analyses.
Datasets will be cleaned and processed in R. This will ensure data integrity is not compromised.
To ensure data integrity:
Clean data steps:
Verify data is ready to use:
To ensure data integrity, data was loaded into R and verified against csv’s of datasets opened in Excel; counts of rows and columns were verified.
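The row and column check described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the helper `check_dims()` and the toy file are hypothetical, and the sketch assumes that a row count read in Excel includes the header row, which would explain why a file listed with 941 rows loads as 940 data rows in R. Excel worksheets are also capped at 1,048,576 rows, so a count at exactly that limit may reflect truncation in Excel rather than the true length of the file.

```r
# Hypothetical helper: compare a loaded data frame against counts seen in Excel.
# `excel_rows` is assumed to include the header row.
check_dims <- function(df, excel_rows, excel_cols) {
  c(rows_match = nrow(df) == excel_rows - 1,
    cols_match = ncol(df) == excel_cols)
}

# Toy CSV standing in for one of the downloaded files (values invented)
toy_path <- tempfile(fileext = ".csv")
writeLines(c("Id,ActivityDay,StepTotal",
             "1,4/12/2016,13162",
             "1,4/13/2016,10735"), toy_path)

toy <- read.csv(toy_path)
dims_ok <- check_dims(toy, excel_rows = 3, excel_cols = 3)
print(dims_ok)
```

Any `FALSE` in the result would flag a file to re-inspect before cleaning.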
Cleaning process:
Dataset 1: Fitbit Fitness Tracker Data
The Fitbit data was downloaded as a series of csv files and imported into R for cleaning and initial analysis.
The following datasets were used: minuteCaloriesNarrow_merged.csv, heartrate_seconds_merged.csv, minuteIntensitiesNarrow_merged.csv, minuteMETsNarrow_merged.csv, minuteSleep_merged.csv, minuteStepsNarrow_merged.csv, dailyActivity_merged.csv, and sleepDay_merged.csv.
Prepare the environment
Note: add packages necessary for cleaning and processing data
library(dplyr)
library(ggplot2)
library(janitor)
library(lubridate)
library(png)
library(readr)
library(readxl)
library(skimr)
library(stringr)
library(tidyr)
library(tidyverse)
library(writexl)
These datasets contain information about Fitbit users by the minute; they are long data.
These datasets will be merged and analyzed over time to look for trends in when Fitbit users are using their devices.
Load the data:
Note: use readr()
minuteCaloriesNarrow_merged <- read_csv("minuteCaloriesNarrow_merged.csv")
heartrate_seconds_merged <- read_csv("heartrate_seconds_merged.csv")
minuteIntensitiesNarrow_merged <- read_csv("minuteIntensitiesNarrow_merged.csv")
minuteMETsNarrow_merged <- read_csv("minuteMETsNarrow_merged.csv")
minuteSleep_merged <- read_csv("minuteSleep_merged.csv")
minuteStepsNarrow_merged <- read_csv("minuteStepsNarrow_merged.csv")
Review the dataset:
Notes: use tidyverse()
Column names:
colnames(minuteCaloriesNarrow_merged)
colnames(heartrate_seconds_merged)
colnames(minuteIntensitiesNarrow_merged)
colnames(minuteMETsNarrow_merged)
colnames(minuteSleep_merged)
colnames(minuteStepsNarrow_merged)
Head/Tail:
head(minuteCaloriesNarrow_merged)
tail(minuteCaloriesNarrow_merged)
head(heartrate_seconds_merged)
tail(heartrate_seconds_merged)
head(minuteIntensitiesNarrow_merged)
tail(minuteIntensitiesNarrow_merged)
head(minuteMETsNarrow_merged)
tail(minuteMETsNarrow_merged)
head(minuteSleep_merged)
tail(minuteSleep_merged)
head(minuteStepsNarrow_merged)
tail(minuteStepsNarrow_merged)
Summary: Fitbit minutes datasets:
#calories
summary(minuteCaloriesNarrow_merged)
ncol(minuteCaloriesNarrow_merged)
nrow(minuteCaloriesNarrow_merged)
#heartrate
summary(heartrate_seconds_merged)
ncol(heartrate_seconds_merged)
nrow(heartrate_seconds_merged)
#intensities
summary(minuteIntensitiesNarrow_merged)
ncol(minuteIntensitiesNarrow_merged)
nrow(minuteIntensitiesNarrow_merged)
#mets
summary(minuteMETsNarrow_merged)
ncol(minuteMETsNarrow_merged)
nrow(minuteMETsNarrow_merged)
#sleep
summary(minuteSleep_merged)
ncol(minuteSleep_merged)
nrow(minuteSleep_merged)
#steps
summary(minuteStepsNarrow_merged)
ncol(minuteStepsNarrow_merged)
nrow(minuteStepsNarrow_merged)
Clean the data:
Standardize column names all Fitbit minutes datasets:
colnames(minuteCaloriesNarrow_merged) <- c("id","activity_minute","calories")
colnames(heartrate_seconds_merged) <- c("id","activity_minute","heart_rate")
colnames(minuteIntensitiesNarrow_merged) <- c("id","activity_minute","intensity")
colnames(minuteMETsNarrow_merged) <- c("id","activity_minute","mets")
colnames(minuteSleep_merged) <- c("id","activity_minute","sleep_value",
"sleep_log_id")
colnames(minuteStepsNarrow_merged) <- c("id","activity_minute","steps")
Merge Fitbit minutes datasets:
Note: keep all rows from both tables (all = TRUE) and avoid duplicated column names in the result (no.dups = TRUE). The merges merge_7, merge_8, merge_9, merge_10, and fitbit_minutes were time-consuming and were run only once. The merged data is written to “fitbit.csv”, which is then reloaded so the merges are not repeated each time the RMD is knit.
merge_7 <- merge(minuteCaloriesNarrow_merged, heartrate_seconds_merged,
by = c("id","activity_minute"), all = TRUE, no.dups = TRUE)
merge_8 <- merge(minuteIntensitiesNarrow_merged,
minuteMETsNarrow_merged, by = c("id","activity_minute"),
all = TRUE, no.dups = TRUE)
merge_9 <- merge(minuteSleep_merged,
minuteStepsNarrow_merged, by = c("id","activity_minute"),
all = TRUE, no.dups = TRUE)
merge_10 <- merge(merge_7, merge_8, by = c("id","activity_minute"),
all = TRUE, no.dups = TRUE)
fitbit_minutes <- merge(merge_9, merge_10, by = c("id","activity_minute"),
all = TRUE, no.dups = TRUE)
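The chained merges above are full outer joins on `id` and `activity_minute`. As a sketch of the same idea on toy data (not the Fitbit files), `merge()` can be folded over a list of tables with `Reduce()`. The three small data frames and their values are invented for illustration.

```r
# Toy long tables sharing the key columns id and activity_minute
calories <- data.frame(id = c(1, 1), activity_minute = c("m1", "m2"),
                       calories = c(0.8, 0.9))
steps    <- data.frame(id = c(1, 2), activity_minute = c("m1", "m1"),
                       steps = c(10, 0))
mets     <- data.frame(id = 1, activity_minute = "m2", mets = 12)

# Full outer join of every table in the list, two at a time
merged <- Reduce(
  function(x, y) merge(x, y, by = c("id", "activity_minute"), all = TRUE),
  list(calories, steps, mets)
)

print(merged)

# Keys missing from one of the inputs are kept and padded with NA
print(colSums(is.na(merged)))
```

Because every key from every input is kept, a measurement that was not recorded for a given user and minute shows up as NA in the merged table.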
Write to csv:
fitbit_minutes will be written to a dataframe, exported as fitbit.csv
fitbit <- data.frame(fitbit_minutes)
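# row.names = FALSE avoids writing an extra index column that would be read back as data
write.csv(fitbit, file = "fitbit.csv", row.names = FALSE)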
Read fitbit.csv:
fitbit <- read_csv("fitbit.csv")
Check for NAs:
sum(is.na(fitbit))
There are 17410848 NA values in the fitbit dataset. As the dataset was merged from multiple long datasets, this is expected. NAs will be filtered when necessary.
Change activity_minute from character to datetime:
Note: use lubridate()
fitbit$activity_minute <- mdy_hms(fitbit$activity_minute)
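As a small illustration of the conversion above, the sketch below parses timestamp strings written in month/day/year order with a 12-hour clock. The sample strings are assumptions about how the timestamps look, not values copied from the dataset.

```r
library(lubridate)

# Assumed timestamp layout: month/day/year hour:minute:second AM/PM
stamps <- c("4/12/2016 12:00:00 AM", "4/12/2016 2:47:30 PM")

parsed <- mdy_hms(stamps)   # returns POSIXct values, in UTC by default
print(parsed)

# Once parsed, components such as the hour can be pulled out for trend analysis
print(hour(parsed))
```

Values that do not match the expected layout are returned as NA with a warning, so a quick `sum(is.na(...))` after parsing is a useful check.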
Review data:
fitbit %>%
  summarize(distinct_id = n_distinct(id))
There are 33 distinct ids in the Fitbit dataset.
Note: the Fitbit data author indicated that there were 30 unique users in the original Fitbit dataset.
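When the number of ids differs from what the data description states, it can help to look at how many records each id contributes. The sketch below does this on a toy table; the ids and values are invented, and the real `fitbit` table would be substituted for `toy_fitbit`.

```r
library(dplyr)

# Toy stand-in for the merged table
toy_fitbit <- data.frame(
  id    = c(101, 101, 102, 103, 103, 103),
  steps = c(0, 12, 5, NA, 7, 9)
)

# Number of distinct users
n_users <- n_distinct(toy_fitbit$id)

# Number of records per user, largest first
records_per_user <- toy_fitbit %>%
  count(id, name = "n_records", sort = TRUE)

print(n_users)
print(records_per_user)
```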
DAILY ACTIVITY/ SLEEP MERGE
It would be helpful to understand the relationship between Fitbit users’ daily activity and their sleep. Toward this goal, the datasets dailyActivity_merged and sleepDay_merged will be cleaned, merged, and compared.
Load the data:
Load the Fitbit dailyActivity_merged.csv dataset
daily_activity <- read_csv("dailyActivity_merged.csv")
Head/Tail:
head(daily_activity)
tail(daily_activity)
Review the dataset:
Column names:
colnames(daily_activity)
Summary:
summary(daily_activity)
ncol(daily_activity)
nrow(daily_activity)
daily_activity contains 940 rows and 15 columns.
Column names:
Remove duplicates:
Note: use dplyr()
daily_activity <- daily_activity[!duplicated(daily_activity), ]
Standardize column names:
colnames(daily_activity) <- c("id", "activity_date", "total_steps", "total_distance", "tracker_distance", "logged_activities_distance", "very_active_distance", "moderately_active_distance", "light_active_distance", "sedentary_active_distance", "very_active_minutes", "fairly_active_minutes",
"lightly_active_minutes", "sedentary_minutes", "calories")
Change activity_date from character to date:
Note: use lubridate()
daily_activity$activity_date <- mdy(daily_activity$activity_date)
str(daily_activity)
write.csv(daily_activity, file = "daily_activity.csv", row.names = FALSE)
SLEEP DAY
Load the data:
Load the sleepDay_merged.csv dataset
sleep_day <- read_csv("sleepDay_merged.csv")
Review the dataset:
Column names:
colnames(sleep_day)
Summary:
summary(sleep_day)
ncol(sleep_day)
nrow(sleep_day)
Sleep day contains 413 rows and 5 columns.
Columns:
Remove duplicates:
Note: use dplyr()
print(sleep_day)
# column names have not been standardized for this dataset, so the id column is still named Id
length(unique(sleep_day$Id))
There are 3 unique user ids in the sleep day dataset.
This data will not provide meaningful analysis when combined with the daily_activity dataset. No further action will be taken with this dataset. The daily_activity dataset will be evaluated on its own.
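One way to judge whether two tables are worth merging is to check how many user ids they share. The sketch below uses invented id vectors; in practice they would be the id columns of the activity and sleep tables.

```r
# Toy id vectors standing in for the activity and sleep id columns
activity_ids <- c(1001, 1002, 1003, 1004, 1005)
sleep_ids    <- c(1002, 1004, 1009)

# Users present in both tables
shared_ids <- intersect(activity_ids, sleep_ids)

# Share of activity users who also have sleep records
coverage <- length(shared_ids) / length(unique(activity_ids))

print(shared_ids)
print(coverage)
```

A low coverage value suggests that a merged table would describe only a small subset of users.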
Dataset 2: Worldwide Survey of Fitness Trends Data
“Worldwide Survey of Fitness Trends for 2020” (Thompson, 2019) contains rankings of fitness trends, 2007-2020.
This dataset was downloaded as a table in a Word document, copied to a csv file, and imported to R for cleaning.
Prepare the environment:
Note: packages necessary for cleaning and processing data previously loaded: dplyr, ggplot2, janitor, lubridate, png, readr, readxl, skimr, stringr, tidyr, tidyverse, writexl
Load the data:
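The load command for this step does not appear in the text. A minimal sketch, assuming the csv file is named worldwide_fitness_data_trends.csv as listed in the dataset description and that it sits in the working directory (the location is an assumption), might look like this:

```r
library(readr)

# File name taken from the dataset description; its location is assumed
trends_path <- "worldwide_fitness_data_trends.csv"

if (file.exists(trends_path)) {
  fitness_trends <- read_csv(trends_path)
} else {
  message("File not found: ", trends_path)
}
```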
Review the data:
Notes: use tidyverse()
Column names:
colnames(fitness_trends)
Head/Tail:
head(fitness_trends)
tail(fitness_trends)
Summary:
summary(fitness_trends)
nrow(fitness_trends)
ncol(fitness_trends)
The dataset contains 12 rows and 14 columns of data; column names, years in the original dataset, are written as character/ number combinations.
Clean the data:
Rename columns:
Note: use base R colnames()
colnames(fitness_trends) <- c('2007', '2008', '2009', '2010','2011', '2012', '2013', '2014','2015', '2016', '2017', '2018','2019', '2020')
Unpivot data:
Notes: use tidyr
fitness_trends_unpivoted <- fitness_trends %>%
pivot_longer(cols = c('2007', '2008', '2009', '2010',
'2011', '2012', '2013', '2014',
'2015', '2016', '2017', '2018',
'2019', '2020'),
names_to = "year",
values_to = "category"
)
Notes: fitness_trends_unpivoted contains 2 columns of data: one named ‘year’; the second named ‘category’, containing names of fitness trends and their ranks in one column.
Split ‘category’ column into 2 columns:
Note: use stringr()
fitness_trends_unpivoted[c('rank', 'category')] <- str_split_fixed(fitness_trends_unpivoted$category, fixed(". "), 2)
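The split above relies on each cell having a "rank. name" layout. In stringr, a plain string pattern is interpreted as a regular expression, in which `.` matches any character, so wrapping the separator in `fixed()` makes the intent explicit. The sketch below shows this on invented cells; the trend names are placeholders, not values from the survey.

```r
library(stringr)

# Example cells in the assumed "rank. name" layout
cells <- c("1. Wearable technology", "10. Yoga", "2. HIIT. Intervals")

# fixed() treats ". " literally; n = 2 splits only at the first separator
parts <- str_split_fixed(cells, fixed(". "), 2)

ranks      <- as.integer(parts[, 1])
categories <- parts[, 2]

print(data.frame(rank = ranks, category = categories))
```

Splitting into at most two pieces keeps any later ". " inside the trend name intact.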
Check for NA data:
Notes: the fitness_trends_unpivoted dataset contains blank rows; check to see if they are NA or blank
sum(is.na(fitness_trends_unpivoted))
There are 28 NAs in fitness_trends_unpivoted dataset.
Remove rows with empty data:
Notes: Filter for empty rows; rename as trends_in_fitness
trends_in_fitness <- fitness_trends_unpivoted[!(fitness_trends_unpivoted$category=="" | fitness_trends_unpivoted$rank==""),]
The trends_in_fitness dataset is now 140 rows of data and 3 columns.
Columns:
Change trends_in_fitness to a dataframe:
trends_in_fitness <- data.frame(trends_in_fitness)
Export trends_in_fitness as Excel file:
Note: use writexl()
Note: Prior to analysis, the following fitness trends categories from the original “Worldwide Survey of Fitness Trends for 2020” data table were combined in Tableau for analysis:
Dataset 3: Dataset of Fitness Trackers and Smartwatches to Measuring Physical Activity in Research
Load the data:
fitness_trackers <- read.csv('datasets/fitness_trackers.csv')
Review the data:
Note: use tidyverse()
Column names:
colnames(fitness_trackers)
Head/Tail:
head(fitness_trackers)
tail(fitness_trackers)
Summary:
summary(fitness_trackers)
The dataset contains information for wearable fitness trackers and smart watches produced between 2011 and 2017. There are 423 separate units contained in this dataset.
Clean the data:
Standardize column names:
Notes: use janitor
fitness_trackers_cleaned <- clean_names(fitness_trackers)
colnames(fitness_trackers_cleaned) <- c('company_name', 'device_name', 'crowd_funded', 'country_of_origin',
'release_year', 'form_factor', 'accelerometer', 'gyroscope',
'magnetometer', 'barometer', 'gps', 'ppg')
Change release_year from a number to a factor:
fitness_trackers_cleaned$release_year <- factor(fitness_trackers_cleaned$release_year)
Check for and remove duplicate data:
Notes: use dplyr()
fitness_trackers_cleaned[!duplicated(fitness_trackers_cleaned$device_name), ]
length(unique(fitness_trackers_cleaned$device_name))
There are 411 unique devices in the dataset.
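With 423 rows and 411 unique device names, some names appear more than once. Before dropping rows, it may be worth checking whether the repeats are true duplicates or different companies using the same name. The sketch below shows one way to do that on a toy table; the company and device names are invented.

```r
# Toy device table
devices <- data.frame(
  company_name = c("A", "A", "B", "C"),
  device_name  = c("Band", "Band 2", "Band", "Watch")
)

# Device names that occur more than once
dup_names <- unique(devices$device_name[duplicated(devices$device_name)])

# Rows sharing a name, to judge whether they are true duplicates
print(devices[devices$device_name %in% dup_names, ])

# Duplicates on the company + device pair are a stricter test
n_true_dups <- sum(duplicated(devices[c("company_name", "device_name")]))
print(n_true_dups)
```

In this toy case the name "Band" repeats, but no company and device pair does, so no row would be removed under the stricter test.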
Save data to csv:
Note: Use writexl()
Convert fitness_trackers_cleaned from wide to long data
Note: use dplyr(), tidyr()
The fitness tracker dataset is wide; it will be converted to long data for analysis in Tableau.
**Unpivot data for ‘accelerometer’, ‘gyroscope’, ‘magnetometer’, ‘barometer’, ‘gps’, ‘ppg’ to analyze for presence of technologies**
fitness_trackers_unpivoted <- fitness_trackers_cleaned %>%
pivot_longer(cols = c('accelerometer', 'gyroscope',
'magnetometer', 'barometer', 'gps', 'ppg'),
names_to = "technology_type",
values_to = "value"
)
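To make the wide-to-long conversion concrete, the sketch below applies `pivot_longer()` to a two-device toy table with two sensor columns. The device names and values are invented; the real table has six sensor columns.

```r
library(tidyr)

# Wide toy table: one row per device, one logical column per technology
wide <- data.frame(
  device_name   = c("Band", "Watch"),
  accelerometer = c(TRUE, TRUE),
  gps           = c(FALSE, TRUE)
)

# Long table: one row per device and technology
long <- pivot_longer(
  wide,
  cols      = c(accelerometer, gps),
  names_to  = "technology_type",
  values_to = "value"
)
print(long)

# Number of devices carrying each technology
print(tapply(long$value, long$technology_type, sum))
```

The long layout puts the technology name in a single column, which is convenient for filtering and coloring by technology in a visualization tool.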
Export data:
write_xlsx(fitness_trackers_unpivoted, file.path("datasets", "fitness_trackers_unpivoted.xlsx"))
Analyze data.
Datasets were analyzed in R and Tableau. Results in R are contained within this document. Links to Tableau analyses are available throughout this document.
Dataset 1: Fitbit Fitness Tracker Data
Data for fitbit was exported as csv files, analyzed, and visualized in Tableau. Results are available at https://public.tableau.com/views/FitbitActivityMinutes/FitbitActivityMinutes2?:language=en-US&:display_count=n&:origin=viz_share_link and https://public.tableau.com/views/FitbitAppUSE/CALORIESHEARTRATESLEEP?:language=en-US&:display_count=n&:origin=viz_share_link.
Findings include:
Daily Activity data was analyzed in R and Tableau.
R Analysis:
Summarized daily_activity dataset:
summary(daily_activity)
ncol(daily_activity)
nrow(daily_activity)
n_distinct(daily_activity$id)
Median minutes by date and activity type:
daily_activity %>% group_by(activity_date) %>%
summarise('median_sedentary_minutes' = median(sedentary_minutes),
'median_lightly_active_minutes' = median(lightly_active_minutes),
'median_fairly_active_minutes' = median(fairly_active_minutes),
'median_very_active_minutes' = median(very_active_minutes),
'max_sedentary_minutes' = max(sedentary_minutes),
'max_lightly_active_minutes' = max(lightly_active_minutes),
'max_fairly_active_minutes' = max(fairly_active_minutes),
'max_very_active_minutes' = max(very_active_minutes),
'min_sedentary_minutes' = min(sedentary_minutes),
'min_lightly_active_minutes' = min(lightly_active_minutes),
'min_fairly_active_minutes' = min(fairly_active_minutes),
'min_very_active_minutes' = min(very_active_minutes)
) %>% print(n = 31)
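The same per-date statistics can be written more compactly with `across()`, which applies a list of functions to every selected column. This is an alternative sketch on invented values, not the code used for the reported results; the output columns are named with the pattern column_function, for example sedentary_minutes_median.

```r
library(dplyr)

# Toy daily table
toy_daily <- data.frame(
  activity_date       = as.Date(c("2016-04-12", "2016-04-12", "2016-04-13")),
  sedentary_minutes   = c(700, 900, 800),
  very_active_minutes = c(20, 40, 10)
)

# Median, max, and min of every *_minutes column, per date
daily_stats <- toy_daily %>%
  group_by(activity_date) %>%
  summarise(across(ends_with("_minutes"),
                   list(median = median, max = max, min = min)))

print(daily_stats)
```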
Findings include:
The Tableau analysis can be viewed at: https://public.tableau.com/views/FitbitActivityMinutes/FitbitActivityMinutes?:language=en-US&publish=yes&:display_count=n&:origin=viz_share_link.
Findings include:
TRENDS IN FITNESS ANALYSIS
The trends_in_fitness dataset was visualized in Tableau.
The Tableau visualization can be viewed at: https://public.tableau.com/views/TopTrendsinFitness2007-2020/Fitnesstrends2007-2020?:language=en-US&:display_count=n&:origin=viz_share_link
Findings include:
Analysis 3: Dataset of Fitness Trackers and Smartwatches to Measuring Physical Activity in Research
The fitness_trackers_cleaned dataset was analyzed in both R and Tableau.
R Analyses
Graphing technologies in wearables by year and country:
What are the trends in technologies in wearables?
The dataset was graphed by technology types found in wearable watches and trackers by country and year. Notes: use ggplot2, tidyr; data where release_year is NA will be excluded
Set up graph:
a <- ggplot(data=subset(fitness_trackers_cleaned, !is.na(release_year)), aes(x=release_year))+
scale_fill_manual(values = c("FALSE"= "grey", "TRUE" = "darkgreen"))
Plot number of watches and number of trackers over time:
Notes: use ggplot2; remove values where year = NA; this will show the number of watches versus the number of fitness trackers on the market between 2011 and 2017.
ggplot(data=subset(fitness_trackers_cleaned, !is.na(release_year)), aes(x=release_year, fill=form_factor))+
geom_bar() +
facet_wrap(~form_factor)+
labs(title = "Popularity of Fitness Tracker Types", subtitle = "2011-2017", x = "year", y ="", fill= "tracker or watch")+
theme(legend.position = "left")+
scale_fill_manual(values = c("tracker"= "lightblue", "watch" = "lightgreen"))
form_factor_data <- fitness_trackers_cleaned %>%
group_by(form_factor,country_of_origin) %>%
tally()
a + geom_bar(aes(fill= accelerometer))+
labs(title = "Use of Accelerometer Technology in Fitness Trackers and Watches",
subtitle = "2011- 2017", x = "year", y ="", fill= "accelerometer installed")
# Notes: use dplyr()
accelerometer_data <- fitness_trackers_cleaned %>%
filter(accelerometer ==TRUE) %>%
group_by(country_of_origin, form_factor) %>%
tally()
a <- ggplot(data=subset(fitness_trackers_cleaned, !is.na(release_year)), aes(x=release_year))+
scale_fill_manual(values = c("FALSE"= "gray", "TRUE" = "darkblue"))
a + geom_bar(aes(fill= barometer))+
labs(title = "Use of Barometer Technology in Fitness Trackers and Watches",
subtitle = "2011- 2017", x = "year", y ="", fill= "barometer installed")
# Notes: use dplyr()
barometer_data <- fitness_trackers_cleaned %>%
filter(barometer ==TRUE) %>%
group_by(country_of_origin, form_factor) %>%
tally()
a <- ggplot(data=subset(fitness_trackers_cleaned, !is.na(release_year)), aes(x=release_year))+
scale_fill_manual(values = c("FALSE"= "grey", "TRUE" = "lightgreen"))
a + geom_bar(aes(fill= gps))+
labs(title = "Use of GPS Technology in Fitness Trackers and Watches",
subtitle = "2011- 2017", x = "year", y ="", fill= "gps installed")
# Notes: use dplyr()
gps_data <- fitness_trackers_cleaned %>%
filter(gps ==TRUE) %>%
group_by(country_of_origin, form_factor) %>%
tally()
a <- ggplot(data=subset(fitness_trackers_cleaned, !is.na(release_year)), aes(x=release_year))+
scale_fill_manual(values = c("FALSE"= "grey", "TRUE" = "red"))
a + geom_bar(aes(fill= gyroscope))+
labs(title = "Use of Gyroscope Technology in Fitness Trackers and Watches",
subtitle = "2011- 2017", x = "year", y ="", fill= "gyroscope installed")
# Notes: use dplyr()
gyroscope_data <- fitness_trackers_cleaned %>%
filter(gyroscope ==TRUE) %>%
group_by(country_of_origin, form_factor) %>%
tally()
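The four technology charts above follow the same pattern and differ only in the column and the fill color. As a sketch, the pattern could be wrapped in a helper function. `plot_technology()` is a hypothetical name, and the toy table below stands in for fitness_trackers_cleaned.

```r
library(ggplot2)

# Toy device table; values are invented
toy_trackers <- data.frame(
  release_year  = factor(c(2015, 2015, 2016, 2016, NA)),
  accelerometer = c(TRUE, FALSE, TRUE, TRUE, TRUE),
  gps           = c(FALSE, FALSE, TRUE, FALSE, TRUE)
)

# Hypothetical helper: one stacked bar chart for a given technology column
plot_technology <- function(data, technology, true_colour) {
  ggplot(subset(data, !is.na(release_year)), aes(x = release_year)) +
    geom_bar(aes(fill = .data[[technology]])) +
    scale_fill_manual(values = c("FALSE" = "grey", "TRUE" = true_colour)) +
    labs(title = paste("Use of", technology, "technology in fitness trackers and watches"),
         x = "year", y = "", fill = paste(technology, "installed"))
}

p_gps <- plot_technology(toy_trackers, "gps", "lightgreen")
print(p_gps)
```

Calling the helper once per column, for example with "accelerometer" or "gps", keeps titles and colors consistent across charts.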
Tableau analysis is available at: https://public.tableau.com/views/WearableFitnessTrackers/Technologiesinwearables?:language=en-US&:display_count=n&:origin=viz_share_link