I will look at data from other wellness trackers to understand how they are used, so that Bellabeat can make informed decisions about its own products. My hypothesis is that wellness trackers are most commonly used to keep track of how far a person has walked, in distance or in steps.
For this project, I will be following the steps of the data analysis process.
Bellabeat is a company that specializes in fitness products for women. Its main products include an app, a wellness tracker, a wellness watch, and a water bottle that tracks water intake.
The business task is to analyze data on other smart devices, specifically fitness trackers, to find trends that Bellabeat can apply to its marketing strategy and use to better meet customer needs.
I will be using a public dataset that contains data from 30 Fitbit users, focusing on the daily activity values as well as the sleep and weight logs.
I need to install several packages to ensure that I can analyze the data.
# tidyverse also installs ggplot2, dplyr, readr, purrr, tidyr, and tibble
install.packages(c("tidyverse", "here", "janitor", "skimr", "plyr"))
# Load plyr before the tidyverse: loading it after dplyr masks dplyr's
# arrange(), mutate(), rename(), summarise(), and other functions
library(plyr)
# Attaches dplyr, readr, purrr, tidyr, tibble, ggplot2, and the other core packages
library(tidyverse)
library(here)
library(janitor)
library(skimr)
FitBit Fitness Tracker Data
activity <- read_csv("Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")
## Rows: 940 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDate
## dbl (14): Id, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDi...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
calories <- read_csv("Fitabase Data 4.12.16-5.12.16/dailyCalories_merged.csv")
## Rows: 940 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDay
## dbl (2): Id, Calories
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
intensities <- read_csv("Fitabase Data 4.12.16-5.12.16/dailyIntensities_merged.csv")
## Rows: 940 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDay
## dbl (9): Id, SedentaryMinutes, LightlyActiveMinutes, FairlyActiveMinutes, Ve...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
steps <- read_csv("Fitabase Data 4.12.16-5.12.16/dailySteps_merged.csv")
## Rows: 940 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDay
## dbl (2): Id, StepTotal
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
sleep_sep <- read_csv("sleepDay_merged.csv")
## Rows: 413 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDay
## dbl (3): Id, TotalMinutesAsleep, TotalTimeInBed
## time (1): Time
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
weight_sep <- read_csv("weightLogInfo_merged.csv")
## Rows: 67 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDay
## dbl (5): Id, WeightPounds, Fat, BMI, LogId
## lgl (1): IsManualReport
## time (1): Time
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(activity)
## # A tibble: 6 × 15
## Id Activ…¹ Total…² Total…³ Track…⁴ Logge…⁵ VeryA…⁶ Moder…⁷ Light…⁸ Seden…⁹
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1.50e9 4/12/2… 13162 8.5 8.5 0 1.88 0.550 6.06 0
## 2 1.50e9 4/13/2… 10735 6.97 6.97 0 1.57 0.690 4.71 0
## 3 1.50e9 4/14/2… 10460 6.74 6.74 0 2.44 0.400 3.91 0
## 4 1.50e9 4/15/2… 9762 6.28 6.28 0 2.14 1.26 2.83 0
## 5 1.50e9 4/16/2… 12669 8.16 8.16 0 2.71 0.410 5.04 0
## 6 1.50e9 4/17/2… 9705 6.48 6.48 0 3.19 0.780 2.51 0
## # … with 5 more variables: VeryActiveMinutes <dbl>, FairlyActiveMinutes <dbl>,
## # LightlyActiveMinutes <dbl>, SedentaryMinutes <dbl>, Calories <dbl>, and
## # abbreviated variable names ¹ActivityDate, ²TotalSteps, ³TotalDistance,
## # ⁴TrackerDistance, ⁵LoggedActivitiesDistance, ⁶VeryActiveDistance,
## # ⁷ModeratelyActiveDistance, ⁸LightActiveDistance, ⁹SedentaryActiveDistance
head(calories)
## # A tibble: 6 × 3
## Id ActivityDay Calories
## <dbl> <chr> <dbl>
## 1 1503960366 4/12/2016 1985
## 2 1503960366 4/13/2016 1797
## 3 1503960366 4/14/2016 1776
## 4 1503960366 4/15/2016 1745
## 5 1503960366 4/16/2016 1863
## 6 1503960366 4/17/2016 1728
head(intensities)
## # A tibble: 6 × 10
## Id Activ…¹ Seden…² Light…³ Fairl…⁴ VeryA…⁵ Seden…⁶ Light…⁷ Moder…⁸ VeryA…⁹
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1.50e9 4/12/2… 728 328 13 25 0 6.06 0.550 1.88
## 2 1.50e9 4/13/2… 776 217 19 21 0 4.71 0.690 1.57
## 3 1.50e9 4/14/2… 1218 181 11 30 0 3.91 0.400 2.44
## 4 1.50e9 4/15/2… 726 209 34 29 0 2.83 1.26 2.14
## 5 1.50e9 4/16/2… 773 221 10 36 0 5.04 0.410 2.71
## 6 1.50e9 4/17/2… 539 164 20 38 0 2.51 0.780 3.19
## # … with abbreviated variable names ¹ActivityDay, ²SedentaryMinutes,
## # ³LightlyActiveMinutes, ⁴FairlyActiveMinutes, ⁵VeryActiveMinutes,
## # ⁶SedentaryActiveDistance, ⁷LightActiveDistance, ⁸ModeratelyActiveDistance,
## # ⁹VeryActiveDistance
head(steps)
## # A tibble: 6 × 3
## Id ActivityDay StepTotal
## <dbl> <chr> <dbl>
## 1 1503960366 4/12/2016 13162
## 2 1503960366 4/13/2016 10735
## 3 1503960366 4/14/2016 10460
## 4 1503960366 4/15/2016 9762
## 5 1503960366 4/16/2016 12669
## 6 1503960366 4/17/2016 9705
head(weight_sep)
## # A tibble: 6 × 8
## Id ActivityDay Time WeightPounds Fat BMI IsManualRep…¹ LogId
## <dbl> <chr> <time> <dbl> <dbl> <dbl> <lgl> <dbl>
## 1 1503960366 5/2/2016 23:59:59 116. 22 22.6 TRUE 1.46e12
## 2 1503960366 5/3/2016 23:59:59 116. NA 22.6 TRUE 1.46e12
## 3 1927972279 4/13/2016 01:08:52 294. NA 47.5 FALSE 1.46e12
## 4 2873212765 4/21/2016 23:59:59 125. NA 21.5 TRUE 1.46e12
## 5 2873212765 5/12/2016 23:59:59 126. NA 21.7 TRUE 1.46e12
## 6 4319703577 4/17/2016 23:59:59 160. 25 27.5 TRUE 1.46e12
## # … with abbreviated variable name ¹IsManualReport
head(sleep_sep)
## # A tibble: 6 × 5
## Id ActivityDay Time TotalMinutesAsleep TotalTimeInBed
## <dbl> <chr> <time> <dbl> <dbl>
## 1 1503960366 4/12/2016 00'00" 327 346
## 2 1503960366 4/13/2016 00'00" 384 407
## 3 1503960366 4/15/2016 00'00" 412 442
## 4 1503960366 4/16/2016 00'00" 340 367
## 5 1503960366 4/17/2016 00'00" 700 712
## 6 1503960366 4/19/2016 00'00" 304 320
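Before loading the sleep and weight files, I used Excel to separate their combined date-and-time values into separate date and time columns, so that their date columns are consistent with the other datasets.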
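The split was done in Excel. For readers who prefer to stay in R, here is a minimal base-R sketch of the same step. It assumes the raw sleep file stores a combined value such as `4/12/2016 12:00:00 AM` in a column named `SleepDay`; that column name, the format, and the sample rows are assumptions for illustration, not values taken from the files above.

```r
# Hypothetical sample rows: the raw column name `SleepDay` and the
# "m/d/yyyy h:mm:ss AM" format are assumptions about the original file
sleep_raw <- data.frame(
  Id = c(1503960366, 1503960366),
  SleepDay = c("4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM"),
  TotalMinutesAsleep = c(327, 384)
)

# Everything before the first space is the date; everything after it is the time
date_part <- sub(" .*$", "", sleep_raw$SleepDay)
time_part <- sub("^[^ ]+ ", "", sleep_raw$SleepDay)

# Store them as separate columns, parsing the date into R's Date class
sleep_raw$ActivityDay <- as.Date(date_part, format = "%m/%d/%Y")
sleep_raw$Time <- time_part
sleep_raw$SleepDay <- NULL
```

Splitting on the first space avoids locale-dependent AM/PM parsing; the time is kept as text here.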
I decided to exclude the “steps” dataset because the same daily step counts are already included in the activity dataset (as `TotalSteps`).
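Before dropping a dataset as redundant, it can help to confirm the overlap. A base-R sketch, using a few hypothetical rows with the column names shown in the `head()` outputs above: join on user and date, then compare `TotalSteps` with `StepTotal`.

```r
# Hypothetical rows shaped like the activity and steps files (values are made up)
activity_demo <- data.frame(
  Id = c(1, 1, 2),
  ActivityDate = c("4/12/2016", "4/13/2016", "4/12/2016"),
  TotalSteps = c(100, 250, 300)
)
steps_demo <- data.frame(
  Id = c(1, 1, 2),
  ActivityDay = c("4/12/2016", "4/13/2016", "4/12/2016"),
  StepTotal = c(100, 250, 300)
)

# Match rows on user and day; the date column has a different name in each file
check <- merge(activity_demo, steps_demo,
               by.x = c("Id", "ActivityDate"), by.y = c("Id", "ActivityDay"))

# TRUE only if every steps row found a match and the step counts agree
steps_redundant <- nrow(check) == nrow(steps_demo) &&
  all(check$TotalSteps == check$StepTotal)
```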
I then merged the data into a single Fitbit dataset so that I could see all of the variables together.
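# Join on the first two columns (Id and date); the date column is named
# ActivityDate in activity but ActivityDay in the other files
m1 <- merge(activity, calories, by = 1:2)
# intensities repeats the intensity columns from activity, so merge() suffixes
# the duplicates with .x (from intensities) and .y (from activity)
m2 <- merge(intensities, m1, by = 1:2)
# Full outer join of sleep and weight, keeping days that have either record
merged_other <- merge(sleep_sep, weight_sep, by = 1:2, all = TRUE)
# Inner join: keeps only days that also have a sleep or weight record
merged_fb <- merge(m2, merged_other, by = 1:2)
# Open the merged data in the data viewer
view(merged_fb)
# Drop the weight and time columns, the activity file's calories, and TrackerDistance
all_fb_data <- subset(merged_fb, select = -c(Time.y, WeightPounds, Fat, BMI, IsManualReport, LogId, Time.x, Calories.x, TrackerDistance))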
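Because `intensities` and `activity` share several columns (for example `SedentaryMinutes`), `merge()` keeps both copies with `.x`/`.y` suffixes, which is why the output below contains identical pairs such as `SedentaryMinutes.x` and `SedentaryMinutes.y`. Also note that `merge()` performs an inner join by default, so `merged_fb` keeps only days that have a sleep or weight record; this is why `summary()` reports 445 rows rather than the 940 in the activity file. A base-R sketch of one way to avoid the duplicate columns, using hypothetical toy tables: name the join keys explicitly and pass only the columns the first table does not already have. With the real files, whose date columns are named differently, you would rename first or use `by.x`/`by.y`.

```r
# Hypothetical toy tables sharing the key columns plus one overlapping measure
intens_demo <- data.frame(Id = c(1, 2), ActivityDay = c("4/12/2016", "4/12/2016"),
                          SedentaryMinutes = c(728, 800))
activ_demo <- data.frame(Id = c(1, 2), ActivityDay = c("4/12/2016", "4/12/2016"),
                         SedentaryMinutes = c(728, 800), TotalSteps = c(100, 200))

keys <- c("Id", "ActivityDay")

# Naive merge: the overlapping column appears twice, as .x and .y
naive <- merge(intens_demo, activ_demo, by = keys)

# Keep only the keys plus the columns that intens_demo does not already have
extra_cols <- setdiff(names(activ_demo), names(intens_demo))
clean <- merge(intens_demo, activ_demo[, c(keys, extra_cols)], by = keys)
```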
head(all_fb_data)
## Id ActivityDay SedentaryMinutes.x LightlyActiveMinutes.x
## 1 1503960366 4/12/2016 728 328
## 2 1503960366 4/13/2016 776 217
## 3 1503960366 4/15/2016 726 209
## 4 1503960366 4/16/2016 773 221
## 5 1503960366 4/17/2016 539 164
## 6 1503960366 4/19/2016 775 264
## FairlyActiveMinutes.x VeryActiveMinutes.x SedentaryActiveDistance.x
## 1 13 25 0
## 2 19 21 0
## 3 34 29 0
## 4 10 36 0
## 5 20 38 0
## 6 31 50 0
## LightActiveDistance.x ModeratelyActiveDistance.x VeryActiveDistance.x
## 1 6.06 0.55 1.88
## 2 4.71 0.69 1.57
## 3 2.83 1.26 2.14
## 4 5.04 0.41 2.71
## 5 2.51 0.78 3.19
## 6 5.03 1.32 3.53
## TotalSteps TotalDistance LoggedActivitiesDistance VeryActiveDistance.y
## 1 13162 8.50 0 1.88
## 2 10735 6.97 0 1.57
## 3 9762 6.28 0 2.14
## 4 12669 8.16 0 2.71
## 5 9705 6.48 0 3.19
## 6 15506 9.88 0 3.53
## ModeratelyActiveDistance.y LightActiveDistance.y SedentaryActiveDistance.y
## 1 0.55 6.06 0
## 2 0.69 4.71 0
## 3 1.26 2.83 0
## 4 0.41 5.04 0
## 5 0.78 2.51 0
## 6 1.32 5.03 0
## VeryActiveMinutes.y FairlyActiveMinutes.y LightlyActiveMinutes.y
## 1 25 13 328
## 2 21 19 217
## 3 29 34 209
## 4 36 10 221
## 5 38 20 164
## 6 50 31 264
## SedentaryMinutes.y Calories.y TotalMinutesAsleep TotalTimeInBed
## 1 728 1985 327 346
## 2 776 1797 384 407
## 3 726 1745 412 442
## 4 773 1863 340 367
## 5 539 1728 700 712
## 6 775 2035 304 320
Used Excel to do the following.
summary(all_fb_data)
## Id ActivityDay SedentaryMinutes.x
## Min. :1.504e+09 Length:445 Min. : 0.0
## 1st Qu.:4.020e+09 Class :character 1st Qu.: 644.0
## Median :4.703e+09 Mode :character Median : 727.0
## Mean :5.193e+09 Mean : 739.5
## 3rd Qu.:6.962e+09 3rd Qu.: 816.0
## Max. :8.878e+09 Max. :1363.0
##
## LightlyActiveMinutes.x FairlyActiveMinutes.x VeryActiveMinutes.x
## Min. : 2.0 Min. : 0.00 Min. : 0.00
## 1st Qu.:161.0 1st Qu.: 0.00 1st Qu.: 0.00
## Median :214.0 Median : 11.00 Median : 11.00
## Mean :219.3 Mean : 17.45 Mean : 27.16
## 3rd Qu.:265.0 3rd Qu.: 25.00 3rd Qu.: 43.00
## Max. :518.0 Max. :143.00 Max. :210.00
##
## SedentaryActiveDistance.x LightActiveDistance.x ModeratelyActiveDistance.x
## Min. :0.000000 Min. : 0.010 Min. :0.0000
## 1st Qu.:0.000000 1st Qu.: 2.630 1st Qu.:0.0000
## Median :0.000000 Median : 3.870 Median :0.4000
## Mean :0.001101 Mean : 3.948 Mean :0.7229
## 3rd Qu.:0.000000 3rd Qu.: 5.220 3rd Qu.:0.9700
## Max. :0.110000 Max. :10.710 Max. :6.4800
##
## VeryActiveDistance.x TotalSteps TotalDistance LoggedActivitiesDistance
## Min. : 0.000 Min. : 17 Min. : 0.010 Min. :0.000
## 1st Qu.: 0.000 1st Qu.: 5454 1st Qu.: 3.730 1st Qu.:0.000
## Median : 0.650 Median : 9148 Median : 6.470 Median :0.000
## Mean : 1.776 Mean : 8987 Mean : 6.478 Mean :0.105
## 3rd Qu.: 2.560 3rd Qu.:11611 3rd Qu.: 8.250 3rd Qu.:0.000
## Max. :21.660 Max. :29326 Max. :26.720 Max. :4.082
##
## VeryActiveDistance.y ModeratelyActiveDistance.y LightActiveDistance.y
## Min. : 0.000 Min. :0.0000 Min. : 0.010
## 1st Qu.: 0.000 1st Qu.:0.0000 1st Qu.: 2.630
## Median : 0.650 Median :0.4000 Median : 3.870
## Mean : 1.776 Mean :0.7229 Mean : 3.948
## 3rd Qu.: 2.560 3rd Qu.:0.9700 3rd Qu.: 5.220
## Max. :21.660 Max. :6.4800 Max. :10.710
##
## SedentaryActiveDistance.y VeryActiveMinutes.y FairlyActiveMinutes.y
## Min. :0.000000 Min. : 0.00 Min. : 0.00
## 1st Qu.:0.000000 1st Qu.: 0.00 1st Qu.: 0.00
## Median :0.000000 Median : 11.00 Median : 11.00
## Mean :0.001101 Mean : 27.16 Mean : 17.45
## 3rd Qu.:0.000000 3rd Qu.: 43.00 3rd Qu.: 25.00
## Max. :0.110000 Max. :210.00 Max. :143.00
##
## LightlyActiveMinutes.y SedentaryMinutes.y Calories.y TotalMinutesAsleep
## Min. : 2.0 Min. : 0.0 Min. : 257 Min. : 58.0
## 1st Qu.:161.0 1st Qu.: 644.0 1st Qu.:1863 1st Qu.:361.0
## Median :214.0 Median : 727.0 Median :2236 Median :433.0
## Mean :219.3 Mean : 739.5 Mean :2447 Mean :419.5
## 3rd Qu.:265.0 3rd Qu.: 816.0 3rd Qu.:2984 3rd Qu.:490.0
## Max. :518.0 Max. :1363.0 Max. :4900 Max. :796.0
## NA's :32
## TotalTimeInBed
## Min. : 61.0
## 1st Qu.:403.0
## Median :463.0
## Mean :458.6
## 3rd Qu.:526.0
## Max. :961.0
## NA's :32
The summary table contains a lot of information. It is especially useful for seeing the minimum, mean, and maximum of each column at a glance, for example the average of 8,987 total steps per day.
Using the summary table, I added together the average minutes per day. This shows that the average user wears their Fitbit for about 23.5 hours on any given day.
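The text does not say exactly which averages were added. Assuming the sum covers the four activity-level minute columns plus `TotalMinutesAsleep`, the arithmetic can be reproduced from the means printed by `summary()`. This gives about 1,423 minutes, or roughly 23.7 hours. Treat it as a rough estimate: the sleep mean is taken over fewer rows (413) than the activity means (445), and if sleep time is also counted in `SedentaryMinutes` the sum double-counts it.

```r
# Column means copied from summary(all_fb_data) above (minutes per day)
mean_minutes <- c(
  sedentary = 739.5,
  lightly_active = 219.3,
  fairly_active = 17.45,
  very_active = 27.16,
  asleep = 419.5  # TotalMinutesAsleep, averaged over the 413 non-NA rows
)

# Assumption: sleep is not already counted inside SedentaryMinutes
total_hours <- sum(mean_minutes) / 60
share_of_day <- total_hours / 24
```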
Based on my analysis, Bellabeat would benefit most from the following key takeaways:
1. Marketing should be geared towards all activity levels with an emphasis on those who are less active.
The data shows that users of fitness trackers such as the Fitbit are generally not highly active: the average daily step count in the merged data is 8,987, and 28% of users are classified as “Sedentary” or “Low Active”. Marketing should therefore focus on people who want to improve their fitness by increasing their activity, whatever their starting fitness level. Based on the data, targeting less active users is likely to help Bellabeat reach the people most likely to buy one of its fitness trackers.
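The text does not state how users were classified, or whether the 28% is per user or per day. One commonly used step-based scheme (the categories popularized by Tudor-Locke and Bassett) uses cut-offs of under 5,000 steps per day for "Sedentary" and 5,000-7,499 for "Low Active". A base-R sketch assuming that scheme, applied to each user's average daily steps with hypothetical toy rows:

```r
# Hypothetical per-day rows; real input would be activity$Id and activity$TotalSteps
daily <- data.frame(
  Id = c(1, 1, 2, 2, 3, 3, 4, 4),
  TotalSteps = c(3000, 4000, 6000, 7000, 9000, 9500, 12000, 14000)
)

# Average daily steps per user
per_user <- aggregate(TotalSteps ~ Id, data = daily, FUN = mean)

# Assumed cut-offs: <5,000; 5,000-7,499; 7,500-9,999; 10,000-12,499; 12,500+
per_user$level <- cut(per_user$TotalSteps,
  breaks = c(-Inf, 5000, 7500, 10000, 12500, Inf),
  labels = c("Sedentary", "Low Active", "Somewhat Active", "Active", "Highly Active"),
  right = FALSE)

# Share of users in the two least active groups
share_low <- mean(per_user$level %in% c("Sedentary", "Low Active"))
```

`right = FALSE` makes each interval include its lower bound, so exactly 5,000 steps counts as "Low Active".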
2. Marketing should focus on improving sleep quality and consistency through the use of the tracker.
The data shows that the amount of sleep Fitbit users get varies by day of the week. It also shows that Fitbit users are not getting the recommended amount of sleep. Assuming that all of the users in the study were adults, the CDC recommends a minimum of 7 hours of sleep each night, and Fig. B shows that the users' average sleep fell below this minimum on more than half the days of the week. Bellabeat could do this in a couple of ways:
3. Ensure the fitness tracker has a long battery life.
The data shows that the average Fitbit user wears their fitness tracker for about 23.5 hours each day, or roughly 98% of the day. A device worn this much needs a long battery life, and increasing battery life would make a smart device more marketable to consumers who wear it most of the time.
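The day-of-week sleep comparison in takeaway 2 can be sketched in base R as follows. It uses hypothetical sleep values with dates in the dataset's m/d/yyyy format and expresses the CDC's 7-hour minimum as a threshold in hours. This is an illustration, not necessarily how Fig. B was produced.

```r
# Hypothetical sleep rows with dates in the dataset's m/d/yyyy format
sleep_demo <- data.frame(
  ActivityDay = c("4/12/2016", "4/13/2016", "4/17/2016", "4/19/2016"),
  TotalMinutesAsleep = c(400, 380, 500, 450)
)

# Parse dates and derive the weekday (names depend on the session's locale)
sleep_demo$date <- as.Date(sleep_demo$ActivityDay, format = "%m/%d/%Y")
sleep_demo$weekday <- weekdays(sleep_demo$date)

# Mean minutes asleep per weekday, converted to hours
by_day <- aggregate(TotalMinutesAsleep ~ weekday, data = sleep_demo, FUN = mean)
by_day$hours <- by_day$TotalMinutesAsleep / 60

# Flag weekdays whose average falls short of the 7-hour minimum
by_day$below_7h <- by_day$hours < 7
```

With the real data, `sleep_sep` (or the sleep columns of `all_fb_data`, after dropping `NA` rows) would take the place of `sleep_demo`.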