Introduction
Bellabeat is a wellness technology company that makes health-focused products for women. Its products track steps, water intake, and calories burned, among other things. For this analysis, I will focus on one product, the Bellabeat app. I will analyze the FitBit Fitness Tracker Data set to identify trends that can inform marketing aimed at increasing sales and membership.
Business task: Identify the data trends in smart device usage to inform and make suggestions for marketing strategies.
Prepare and Process
First, let’s load the packages needed to manipulate, clean, and organize the data sets.
library(readr)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(rmarkdown)
library(tidyr)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ tibble 3.1.6 ✓ stringr 1.4.0
## ✓ purrr 0.3.4 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x lubridate::as.difftime() masks base::as.difftime()
## x lubridate::date() masks base::date()
## x dplyr::filter() masks stats::filter()
## x lubridate::intersect() masks base::intersect()
## x dplyr::lag() masks stats::lag()
## x lubridate::setdiff() masks base::setdiff()
## x lubridate::union() masks base::union()
Now, let’s import the dataset provided.
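The import code is not shown in this report, so the following is only a minimal sketch of what the step could look like. It assumes the raw files carry the `ActivityDate` and `SleepDay` columns that appear in the `head()` output later on, and it uses two inline toy rows instead of the real files. The names `activity_toy` and `sleep_toy` are hypothetical.

```r
# Toy rows in the same layout as the raw columns shown later in the report.
# In a real run, read.csv() would be pointed at the downloaded CSV files instead.
activity_toy <- read.csv(text = "Id,ActivityDate,TotalSteps,Calories
1503960366,4/12/2016,13162,1985
1503960366,4/13/2016,10735,1797", stringsAsFactors = FALSE)

sleep_toy <- read.csv(text = "Id,SleepDay,TotalMinutesAsleep,TotalTimeInBed
1503960366,4/12/2016 12:00:00 AM,327,346
1503960366,4/13/2016 12:00:00 AM,384,407", stringsAsFactors = FALSE)

# Parse both text columns into a shared Date column.
# as.Date() ignores the trailing "12:00:00 AM" once the format has matched.
activity_toy$Date <- as.Date(activity_toy$ActivityDate, format = "%m/%d/%Y")
sleep_toy$Date <- as.Date(sleep_toy$SleepDay, format = "%m/%d/%Y")

str(activity_toy$Date)
str(sleep_toy$Date)
```

Giving both tables a `Date` column of the same type is what allows them to be matched day by day later in the analysis.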
Once the data is imported we can take a look to determine if the data is credible. The ROCCC method will be used to make sure all the bases are covered.
- Reliable - No, the data is not reliable due to its small sample size.
- Original - No, Bellabeat did not collect this data.
- Comprehensive - No, there is no information on the participants’ parameters (weight, age, etc.).
- Current - No, the data is from 2016, making it 6 years old.
- Cited - Amazon Mechanical Turk is cited, but is not a reliable source.
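One way to put a number on the sample-size concern is to count the distinct participants in each table. The sketch below uses made-up `Id` values in two hypothetical data frames, so the counts it prints say nothing about the real data.

```r
# Toy tables: each row is one logged day for one participant (made-up Ids).
activity_toy <- data.frame(Id = c(1, 1, 2, 3, 3, 3))
sleep_toy <- data.frame(Id = c(1, 2, 2))

# Number of distinct participants per table.
participants <- c(
  activity = length(unique(activity_toy$Id)),
  sleep = length(unique(sleep_toy$Id))
)
print(participants)
```

Running the same count on the imported tables would show how many people each conclusion actually rests on.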
Unfortunately, the data is unreliable. After further reading, I will focus on the Daily Activity and Sleep datasets to look for possible trends and relationships, as activity and sleep are typically correlated.
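The cleaning steps are not shown in this report. As an assumption about what such a step might involve, the sketch below checks a hypothetical toy sleep table for exact duplicate rows and incomplete rows, then drops both.

```r
# Toy sleep log with one exact duplicate row and one incomplete row (made-up values).
sleep_toy <- data.frame(
  Id = c(1, 1, 1, 2),
  Date = as.Date(c("2016-04-12", "2016-04-12", "2016-04-13", "2016-04-12")),
  TotalMinutesAsleep = c(327, 327, NA, 412)
)

n_duplicates <- sum(duplicated(sleep_toy))        # rows that repeat an earlier row
n_incomplete <- sum(!complete.cases(sleep_toy))   # rows with at least one NA

# Keep only the first copy of each row, and only rows without missing values.
sleep_clean <- sleep_toy[!duplicated(sleep_toy) & complete.cases(sleep_toy), ]
print(sleep_clean)
```

Whether incomplete rows should be dropped or kept depends on the question being asked, so this is a starting point rather than a fixed rule.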
Analysis
Now that the data is clean, it’s ready for analysis.
Let’s take a look at a summary of the data:
Daily_Activity %>%
  select(Calories, TotalSteps) %>%
  summary()
## Calories TotalSteps
## Min. : 0 Min. : 0
## 1st Qu.:1828 1st Qu.: 3790
## Median :2134 Median : 7406
## Mean :2304 Mean : 7638
## 3rd Qu.:2793 3rd Qu.:10727
## Max. :4900 Max. :36019
Observations:
- The median number of daily steps (7,406) is below the commonly recommended 10,000 steps.
- A typical day saw about 2,100 calories burned (median 2,134; mean 2,304).
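The first observation can be taken a step further by asking what share of logged days reach the 10,000-step mark, and how many days show no steps at all. This is a minimal sketch on made-up values. The vector `daily_steps` and the name `step_goal` are hypothetical and are not the real records.

```r
# Toy daily step counts (made-up values, not the real records).
daily_steps <- c(0, 3790, 7406, 10727, 13162)
step_goal <- 10000

median_steps <- median(daily_steps)
share_meeting_goal <- mean(daily_steps >= step_goal)  # fraction of days at or above the goal
zero_step_days <- sum(daily_steps == 0)               # days on which the tracker logged nothing

cat("Median steps:", median_steps, "\n")
cat("Share of days meeting the goal:", share_meeting_goal, "\n")
cat("Days with zero steps:", zero_step_days, "\n")
```

Days with zero steps are worth a second look, because a minimum of 0 in the summary may mean the device was not worn rather than that nobody moved.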
Sleep %>%
  select(TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed) %>%
  summary()
## TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
## Min. :1.00 Min. : 58.0 Min. : 61.0
## 1st Qu.:1.00 1st Qu.:361.0 1st Qu.:403.8
## Median :1.00 Median :432.5 Median :463.0
## Mean :1.12 Mean :419.2 Mean :458.5
## 3rd Qu.:1.00 3rd Qu.:490.0 3rd Qu.:526.0
## Max. :3.00 Max. :796.0 Max. :961.0
Observations:
- Most participants logged sleep only once a day (the median and third quartile of TotalSleepRecords are both 1), suggesting that few daytime naps were recorded.
- On average, participants slept about 7 hours a night (mean of 419.2 minutes asleep).
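The "about 7 hours" figure follows directly from the means in the summary above. The short calculation below converts those means to hours and also shows the gap between time in bed and time asleep. It works from the two summary means only, so it describes the averages, not any individual night.

```r
# Means copied from the summary() output above.
mean_minutes_asleep <- 419.2
mean_minutes_in_bed <- 458.5

hours_asleep <- mean_minutes_asleep / 60   # about 6.99 hours
hours_in_bed <- mean_minutes_in_bed / 60   # about 7.64 hours

# Average time spent in bed but not asleep.
minutes_awake_in_bed <- mean_minutes_in_bed - mean_minutes_asleep   # 39.3 minutes

cat("Hours asleep:", round(hours_asleep, 2), "\n")
cat("Hours in bed:", round(hours_in_bed, 2), "\n")
cat("Minutes awake in bed:", round(minutes_awake_in_bed, 1), "\n")
```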
Merging the two data sets will make creating graphs easier.
# Full outer join on participant and day; days without a sleep record get NA in the sleep columns
Activity_and_Sleep <- merge(Daily_Activity, Sleep, by = c("Id", "Date"), all = TRUE)
head(Activity_and_Sleep)
## Id Date ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366 2016-04-12 4/12/2016 13162 8.50 8.50
## 2 1503960366 2016-04-13 4/13/2016 10735 6.97 6.97
## 3 1503960366 2016-04-14 4/14/2016 10460 6.74 6.74
## 4 1503960366 2016-04-15 4/15/2016 9762 6.28 6.28
## 5 1503960366 2016-04-16 4/16/2016 12669 8.16 8.16
## 6 1503960366 2016-04-17 4/17/2016 9705 6.48 6.48
## LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1 0 1.88 0.55
## 2 0 1.57 0.69
## 3 0 2.44 0.40
## 4 0 2.14 1.26
## 5 0 2.71 0.41
## 6 0 3.19 0.78
## LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 3.91 0 30
## 4 2.83 0 29
## 5 5.04 0 36
## 6 2.51 0 38
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1 13 328 728 1985
## 2 19 217 776 1797
## 3 11 181 1218 1776
## 4 34 209 726 1745
## 5 10 221 773 1863
## 6 20 164 539 1728
## SleepDay TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
## 1 4/12/2016 12:00:00 AM 1 327 346
## 2 4/13/2016 12:00:00 AM 2 384 407
## 3 <NA> NA NA NA
## 4 4/15/2016 12:00:00 AM 1 412 442
## 5 4/16/2016 12:00:00 AM 2 340 367
## 6 4/17/2016 12:00:00 AM 1 700 712
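Row 3 of the output above has `NA` in every sleep column, which is what `all = TRUE` produces when an activity day has no matching sleep record. The sketch below reproduces that behaviour on three toy activity days and two toy sleep nights, and compares it with the default inner join. The data frame names are hypothetical.

```r
# Three activity days but only two sleep records for the same participant.
activity_toy <- data.frame(
  Id = c(1, 1, 1),
  Date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-14")),
  TotalSteps = c(13162, 10735, 10460)
)
sleep_toy <- data.frame(
  Id = c(1, 1),
  Date = as.Date(c("2016-04-12", "2016-04-13")),
  TotalMinutesAsleep = c(327, 384)
)

# all = TRUE keeps every row from both tables (full outer join).
full_join_toy <- merge(activity_toy, sleep_toy, by = c("Id", "Date"), all = TRUE)

# The default keeps only days present in both tables (inner join).
inner_join_toy <- merge(activity_toy, sleep_toy, by = c("Id", "Date"))

rows_missing_sleep <- sum(is.na(full_join_toy$TotalMinutesAsleep))
print(full_join_toy)
print(inner_join_toy)
```

The full join suits plots that only need activity columns, while any plot that uses a sleep column will silently lose the unmatched days.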
Relationship between calories burned and steps
ggplot(data = Activity_and_Sleep, mapping = aes(x = TotalSteps, y = Calories)) +
  geom_point(color = "sienna1") +
  geom_smooth() +
  labs(title = "Calories burned vs Total Steps", x = "Total Steps", y = "Calories")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

As expected, days with more steps also tend to be days with more calories burned.
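A correlation coefficient and a simple linear fit can put a number on what the plot shows. The sketch below uses only the six rows printed in the `head()` output above, all from one participant, so the resulting values illustrate the method and should not be read as results for the full data set.

```r
# The six rows shown in the head() output above (one participant only).
six_days <- data.frame(
  TotalSteps = c(13162, 10735, 10460, 9762, 12669, 9705),
  Calories = c(1985, 1797, 1776, 1745, 1863, 1728)
)

# Pearson correlation between steps and calories.
steps_calories_cor <- cor(six_days$TotalSteps, six_days$Calories)

# Slope of a straight-line fit: estimated extra calories per extra step.
fit <- lm(Calories ~ TotalSteps, data = six_days)
calories_per_step <- coef(fit)[["TotalSteps"]]

cat("Correlation:", round(steps_calories_cor, 2), "\n")
cat("Calories per additional step:", round(calories_per_step, 3), "\n")
```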
Relationship between high activity and minutes slept
ggplot(data = Activity_and_Sleep, mapping = aes(x = TotalMinutesAsleep, y = VeryActiveMinutes)) +
  geom_point(color = "deeppink") +
  geom_smooth() +
  labs(title = "High Activity Minutes vs Minutes Slept", x = "Total Minutes Asleep", y = "Very Active Minutes")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 530 rows containing non-finite values (stat_smooth).
## Warning: Removed 530 rows containing missing values (geom_point).

We can see that participants with many “high activity” minutes slept about the same amount as those with few, so the plot shows no clear relationship between high activity and minutes slept.
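The warnings above report 530 rows dropped because they have no sleep values. When the same relationship is summarised numerically, those rows need to be handled explicitly. The sketch below shows two equivalent ways to do that, using the six rows from the earlier `head()` output, so the numbers are illustrative only.

```r
# The six rows shown in the head() output; the third day has no sleep record.
six_days <- data.frame(
  VeryActiveMinutes = c(25, 21, 30, 29, 36, 38),
  TotalMinutesAsleep = c(327, 384, NA, 412, 340, 700)
)

# With a missing value present, cor() returns NA by default.
cor_default <- cor(six_days$VeryActiveMinutes, six_days$TotalMinutesAsleep)

# Option 1: let cor() use only the complete pairs.
cor_complete <- cor(six_days$VeryActiveMinutes, six_days$TotalMinutesAsleep,
                    use = "complete.obs")

# Option 2: drop the incomplete rows first, which also avoids plot warnings.
days_with_sleep <- six_days[!is.na(six_days$TotalMinutesAsleep), ]
cor_filtered <- cor(days_with_sleep$VeryActiveMinutes, days_with_sleep$TotalMinutesAsleep)

cat("Rows kept:", nrow(days_with_sleep), "of", nrow(six_days), "\n")
cat("Correlation on complete rows:", round(cor_complete, 2), "\n")
```

Filtering first makes it explicit how many days the comparison is based on, which matters when most merged rows lack sleep data.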
Recommendations
1. Habit reminders in the app could prompt users to walk and move more, encouraging them to get more daily steps in.
2. Weight could not be analyzed because only 8 individuals tracked it. This opens up an opportunity for Bellabeat to explore a new product. Whether participants forgot to log their weight or don’t have a scale, Bellabeat could offer a scale that links directly to the app. That way, each weigh-in would be recorded automatically, making weight easier to track.
3. To improve sleep tracking, Bellabeat could add a reminder option to the app to encourage better tracking habits.