Introduction

As part of the Google Data Analytics Professional Certificate course on Coursera, we are encouraged to complete a capstone project demonstrating the skills we learned. I chose the second case study, "How Can a Wellness Technology Company Play It Smart?", because of my interest in women’s health and fitness. This document showcases my understanding of the six steps of the data analysis process: Ask, Prepare, Process, Analyze, Share, and Act.

The Scenario

Bellabeat is a high-tech manufacturer of health-focused products for women. I will be acting as a junior data analyst on the marketing analytics team, analyzing smart fitness device data to gain insight into how consumers use these devices. Some key Bellabeat products that I will be looking at include:

Bellabeat App: an app that connects to the line of smart wellness products which provides users with health data related to activity, sleep, stress, menstrual cycle, and mindfulness habits.

Leaf: a wellness tracker that can be worn and connects to the Bellabeat app (similar to a FitBit)

Time: a wellness watch that allows users to track activity, sleep, and stress and connects to the Bellabeat app

Spring: a water bottle that tracks daily water intake and connects to the app

Bellabeat Membership: a subscription-based membership program that gives users 24/7 access to personalized guidance on health and wellness habits

ASK

I want to analyze smart device usage data to gain insight into how consumers use non-Bellabeat smart devices. I will then select one of the Bellabeat products to apply these insights. Some questions I will seek to answer are:

  1. What are some trends in smart device usage?

  2. How could these trends apply to Bellabeat customers?

  3. How could these trends help influence Bellabeat marketing strategy?

The business task at hand is to gain insights from non-Bellabeat smart devices in order to formulate a marketing strategy for Bellabeat. I also want to consider the key stakeholders, who include Bellabeat’s co-founders, Urška Sršen and Sando Mur.

PREPARE

For this project I was given a specific dataset of FitBit tracker data. See the data here. The data is stored as .csv files organized in rows and columns, and it is in long format, since each row is one time point for one subject.

Installing Packages:

r = getOption("repos")
r["CRAN"] = "http://cran.us.r-project.org"
options(repos = r)
install.packages('tidyverse')
## 
## The downloaded binary packages are in
##  /var/folders/8p/04rfxqyn331928nqk5bs1fmm0000gn/T//RtmpX4jgBb/downloaded_packages
install.packages('janitor')
## 
## The downloaded binary packages are in
##  /var/folders/8p/04rfxqyn331928nqk5bs1fmm0000gn/T//RtmpX4jgBb/downloaded_packages
install.packages('lubridate')
## 
## The downloaded binary packages are in
##  /var/folders/8p/04rfxqyn331928nqk5bs1fmm0000gn/T//RtmpX4jgBb/downloaded_packages
install.packages('skimr')
## 
## The downloaded binary packages are in
##  /var/folders/8p/04rfxqyn331928nqk5bs1fmm0000gn/T//RtmpX4jgBb/downloaded_packages
# POSIXct is a date-time class built into base R, not a package, so there is nothing to install for it
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(janitor)
## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(lubridate)
library(skimr)
library(ggplot2)
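
Reinstalling every package on each run is not strictly necessary. As a minimal sketch, the helper below (missing_packages is a hypothetical name, not part of any package) lists only the packages that are not yet installed, and the install step is left commented out so nothing is downloaded by accident:

```r
# Packages this analysis relies on
wanted <- c("tidyverse", "janitor", "lubridate", "skimr")

# Hypothetical helper: return the packages that are not installed yet
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(wanted)
print(to_install)

# Uncomment to install only what is missing:
# if (length(to_install) > 0) install.packages(to_install)
```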

Uploading/Storing Data:

daily_activity <- read.csv("Bellabeat Fitbit Data/dailyActivity_merged.csv")
daily_calories <- read.csv("Bellabeat Fitbit Data/dailyCalories_merged.csv")
daily_steps <- read.csv("Bellabeat Fitbit Data/dailySteps_merged.csv")
daily_sleep <- read.csv("Bellabeat Fitbit Data/sleepDay_merged.csv")

To identify how the data is organized I am using the commands colnames() and head() to see what the names of the columns are in each of the data sets, along with a short preview of what the data sets look like.

colnames(daily_sleep)
## [1] "Id"                 "SleepDay"           "TotalSleepRecords" 
## [4] "TotalMinutesAsleep" "TotalTimeInBed"
head(daily_sleep)
##           Id              SleepDay TotalSleepRecords TotalMinutesAsleep
## 1 1503960366 4/12/2016 12:00:00 AM                 1                327
## 2 1503960366 4/13/2016 12:00:00 AM                 2                384
## 3 1503960366 4/15/2016 12:00:00 AM                 1                412
## 4 1503960366 4/16/2016 12:00:00 AM                 2                340
## 5 1503960366 4/17/2016 12:00:00 AM                 1                700
## 6 1503960366 4/19/2016 12:00:00 AM                 1                304
##   TotalTimeInBed
## 1            346
## 2            407
## 3            442
## 4            367
## 5            712
## 6            320
colnames(daily_steps)
## [1] "Id"          "ActivityDay" "StepTotal"
head(daily_steps)
##           Id ActivityDay StepTotal
## 1 1503960366   4/12/2016     13162
## 2 1503960366   4/13/2016     10735
## 3 1503960366   4/14/2016     10460
## 4 1503960366   4/15/2016      9762
## 5 1503960366   4/16/2016     12669
## 6 1503960366   4/17/2016      9705
colnames(daily_calories)
## [1] "Id"          "ActivityDay" "Calories"
head(daily_calories)
##           Id ActivityDay Calories
## 1 1503960366   4/12/2016     1985
## 2 1503960366   4/13/2016     1797
## 3 1503960366   4/14/2016     1776
## 4 1503960366   4/15/2016     1745
## 5 1503960366   4/16/2016     1863
## 6 1503960366   4/17/2016     1728
colnames(daily_activity)
##  [1] "Id"                       "ActivityDate"            
##  [3] "TotalSteps"               "TotalDistance"           
##  [5] "TrackerDistance"          "LoggedActivitiesDistance"
##  [7] "VeryActiveDistance"       "ModeratelyActiveDistance"
##  [9] "LightActiveDistance"      "SedentaryActiveDistance" 
## [11] "VeryActiveMinutes"        "FairlyActiveMinutes"     
## [13] "LightlyActiveMinutes"     "SedentaryMinutes"        
## [15] "Calories"
head(daily_activity)
##           Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366    4/12/2016      13162          8.50            8.50
## 2 1503960366    4/13/2016      10735          6.97            6.97
## 3 1503960366    4/14/2016      10460          6.74            6.74
## 4 1503960366    4/15/2016       9762          6.28            6.28
## 5 1503960366    4/16/2016      12669          8.16            8.16
## 6 1503960366    4/17/2016       9705          6.48            6.48
##   LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1                        0               1.88                     0.55
## 2                        0               1.57                     0.69
## 3                        0               2.44                     0.40
## 4                        0               2.14                     1.26
## 5                        0               2.71                     0.41
## 6                        0               3.19                     0.78
##   LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1                6.06                       0                25
## 2                4.71                       0                21
## 3                3.91                       0                30
## 4                2.83                       0                29
## 5                5.04                       0                36
## 6                2.51                       0                38
##   FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1                  13                  328              728     1985
## 2                  19                  217              776     1797
## 3                  11                  181             1218     1776
## 4                  34                  209              726     1745
## 5                  10                  221              773     1863
## 6                  20                  164              539     1728

PROCESS

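For this project I will be using RStudio along with packages such as ‘tidyverse’ and ‘lubridate’. First, I want to check the data for any problems. I started with is.null(), which returned FALSE for each of the four data frames, confirming that every table loaded. Note that is.null() tests the object as a whole rather than looking for missing values inside it.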

is.null(daily_activity)
## [1] FALSE
is.null(daily_calories)
## [1] FALSE
is.null(daily_steps)
## [1] FALSE
is.null(daily_sleep)
## [1] FALSE
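
A cell-level check for missing values is a natural companion to the object-level check. The following is a minimal sketch on a small made-up data frame (toy_sleep and its values are hypothetical, not the real data), using is.na() and duplicated() from base R:

```r
# Hypothetical toy data standing in for one of the loaded tables
toy_sleep <- data.frame(
  Id = c(1, 1, 2, 2),
  TotalMinutesAsleep = c(327, NA, 412, 412),
  TotalTimeInBed = c(346, 407, 442, 442)
)

# is.null() tests the object, not its cells, so it is FALSE even with an NA present
object_is_null <- is.null(toy_sleep)

# Count missing values per column
na_per_column <- colSums(is.na(toy_sleep))

# Count fully duplicated rows, another common data problem
duplicate_rows <- sum(duplicated(toy_sleep))

print(object_is_null)
print(na_per_column)
print(duplicate_rows)
```

The same two calls could be applied to each of the four data frames in turn.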

Then, I checked for data errors using str().

str(daily_activity)
## 'data.frame':    940 obs. of  15 variables:
##  $ Id                      : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityDate            : chr  "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
##  $ TotalSteps              : int  13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...
##  $ TotalDistance           : num  8.5 6.97 6.74 6.28 8.16 ...
##  $ TrackerDistance         : num  8.5 6.97 6.74 6.28 8.16 ...
##  $ LoggedActivitiesDistance: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ VeryActiveDistance      : num  1.88 1.57 2.44 2.14 2.71 ...
##  $ ModeratelyActiveDistance: num  0.55 0.69 0.4 1.26 0.41 ...
##  $ LightActiveDistance     : num  6.06 4.71 3.91 2.83 5.04 ...
##  $ SedentaryActiveDistance : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ VeryActiveMinutes       : int  25 21 30 29 36 38 42 50 28 19 ...
##  $ FairlyActiveMinutes     : int  13 19 11 34 10 20 16 31 12 8 ...
##  $ LightlyActiveMinutes    : int  328 217 181 209 221 164 233 264 205 211 ...
##  $ SedentaryMinutes        : int  728 776 1218 726 773 539 1149 775 818 838 ...
##  $ Calories                : int  1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...
str(daily_calories)
## 'data.frame':    940 obs. of  3 variables:
##  $ Id         : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityDay: chr  "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
##  $ Calories   : int  1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...
str(daily_steps)
## 'data.frame':    940 obs. of  3 variables:
##  $ Id         : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityDay: chr  "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
##  $ StepTotal  : int  13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...
str(daily_sleep)
## 'data.frame':    413 obs. of  5 variables:
##  $ Id                : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ SleepDay          : chr  "4/12/2016 12:00:00 AM" "4/13/2016 12:00:00 AM" "4/15/2016 12:00:00 AM" "4/16/2016 12:00:00 AM" ...
##  $ TotalSleepRecords : int  1 2 1 2 1 1 1 1 1 1 ...
##  $ TotalMinutesAsleep: int  327 384 412 340 700 304 360 325 361 430 ...
##  $ TotalTimeInBed    : int  346 407 442 367 712 320 377 364 384 449 ...

The biggest thing I see here is that the date columns are stored as character strings rather than dates, so I need to fix this.

# Use %Y for the four-digit year; %y would read only "20" and return dates in 2020
daily_sleep$SleepDay <- as.Date(daily_sleep$SleepDay, format = '%m/%d/%Y')
daily_steps$ActivityDay <- as.Date(daily_steps$ActivityDay, format = '%m/%d/%Y')
daily_calories$ActivityDay <- as.Date(daily_calories$ActivityDay, format = '%m/%d/%Y')
daily_activity$ActivityDate <- as.Date(daily_activity$ActivityDate, format = '%m/%d/%Y')
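
The year code in the format string matters for these dates: %Y matches a four-digit year, while %y reads only two digits. A short sketch on a single date string taken from the data shows the difference (the variable names are illustrative):

```r
raw_date <- "4/12/2016"

# %Y consumes all four year digits
four_digit <- as.Date(raw_date, format = "%m/%d/%Y")

# %y consumes only "20" and interprets it as the year 2020
two_digit <- as.Date(raw_date, format = "%m/%d/%y")

# as.POSIXlt()$wday is locale-independent: 0 = Sunday, 2 = Tuesday
four_digit_wday <- as.POSIXlt(four_digit)$wday
two_digit_wday <- as.POSIXlt(two_digit)$wday

print(four_digit)
print(two_digit)
```

Because the weekday depends on the year, a wrong year silently shifts every day-of-week label derived from the date.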

In the sleep data, I thought it might be interesting to see how long it takes people to fall asleep. The data has no direct measurement of this, so I am using the minutes spent in bed but not asleep as a rough proxy and storing it in a new column.

# Minutes in bed but not asleep, stored as a new column
daily_sleep$time_taken_to_sleep <- daily_sleep$TotalTimeInBed - daily_sleep$TotalMinutesAsleep
# Standalone copy of the column for a quick preview
time_taken_to_sleep <- daily_sleep$time_taken_to_sleep
head(time_taken_to_sleep)
## [1] 19 23 30 27 12 16

Next, I noticed that there appear to be days when people did not wear their Fitbit tracker, resulting in zero values for total distance and total steps. I do not want to include these days in my analysis, so I am going to remove them before producing some graphs.

cleaned_daily_activity <- daily_activity[!(daily_activity$Calories<=0),]
cleaned_daily_activity <- cleaned_daily_activity[!(cleaned_daily_activity$TotalDistance<=0.00),]
head(cleaned_daily_activity)
##           Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366   2016-04-12      13162          8.50            8.50
## 2 1503960366   2016-04-13      10735          6.97            6.97
## 3 1503960366   2016-04-14      10460          6.74            6.74
## 4 1503960366   2016-04-15       9762          6.28            6.28
## 5 1503960366   2016-04-16      12669          8.16            8.16
## 6 1503960366   2016-04-17       9705          6.48            6.48
##   LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1                        0               1.88                     0.55
## 2                        0               1.57                     0.69
## 3                        0               2.44                     0.40
## 4                        0               2.14                     1.26
## 5                        0               2.71                     0.41
## 6                        0               3.19                     0.78
##   LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1                6.06                       0                25
## 2                4.71                       0                21
## 3                3.91                       0                30
## 4                2.83                       0                29
## 5                5.04                       0                36
## 6                2.51                       0                38
##   FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1                  13                  328              728     1985
## 2                  19                  217              776     1797
## 3                  11                  181             1218     1776
## 4                  34                  209              726     1745
## 5                  10                  221              773     1863
## 6                  20                  164              539     1728
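
It can be useful to confirm how many rows such a filter drops. The sketch below applies the same two conditions to a small made-up table (toy_activity and most of its values are hypothetical) and counts the removed rows:

```r
# Hypothetical toy data: two worn days and two days with no recorded movement
toy_activity <- data.frame(
  Id = c(1, 1, 2, 2),
  TotalSteps = c(13162, 0, 9762, 0),
  TotalDistance = c(8.50, 0, 6.28, 0),
  Calories = c(1985, 1347, 1745, 0)
)

# Keep only days with some calories burned and some distance covered
kept <- subset(toy_activity, Calories > 0 & TotalDistance > 0)

# Number of rows dropped by the filter
rows_removed <- nrow(toy_activity) - nrow(kept)

print(kept)
print(rows_removed)
```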

The last change that I want to make is adding a column in the activity, calories, and sleep datasets for day of the week.

Activity:

daily_activity$ActivityDate <- as.Date(daily_activity$ActivityDate, "%Y-%m-%d")
daily_activity$DayOfWeek <- weekdays(daily_activity$ActivityDate)
head(daily_activity)
##           Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366   2016-04-12      13162          8.50            8.50
## 2 1503960366   2016-04-13      10735          6.97            6.97
## 3 1503960366   2016-04-14      10460          6.74            6.74
## 4 1503960366   2016-04-15       9762          6.28            6.28
## 5 1503960366   2016-04-16      12669          8.16            8.16
## 6 1503960366   2016-04-17       9705          6.48            6.48
##   LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1                        0               1.88                     0.55
## 2                        0               1.57                     0.69
## 3                        0               2.44                     0.40
## 4                        0               2.14                     1.26
## 5                        0               2.71                     0.41
## 6                        0               3.19                     0.78
##   LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1                6.06                       0                25
## 2                4.71                       0                21
## 3                3.91                       0                30
## 4                2.83                       0                29
## 5                5.04                       0                36
## 6                2.51                       0                38
##   FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories DayOfWeek
## 1                  13                  328              728     1985   Tuesday
## 2                  19                  217              776     1797 Wednesday
## 3                  11                  181             1218     1776  Thursday
## 4                  34                  209              726     1745    Friday
## 5                  10                  221              773     1863  Saturday
## 6                  20                  164              539     1728    Sunday
cleaned_daily_activity$ActivityDate <- as.Date(cleaned_daily_activity$ActivityDate, "%Y-%m-%d")
cleaned_daily_activity$DayOfWeek <- weekdays(cleaned_daily_activity$ActivityDate)
head(cleaned_daily_activity)
##           Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366   2016-04-12      13162          8.50            8.50
## 2 1503960366   2016-04-13      10735          6.97            6.97
## 3 1503960366   2016-04-14      10460          6.74            6.74
## 4 1503960366   2016-04-15       9762          6.28            6.28
## 5 1503960366   2016-04-16      12669          8.16            8.16
## 6 1503960366   2016-04-17       9705          6.48            6.48
##   LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1                        0               1.88                     0.55
## 2                        0               1.57                     0.69
## 3                        0               2.44                     0.40
## 4                        0               2.14                     1.26
## 5                        0               2.71                     0.41
## 6                        0               3.19                     0.78
##   LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1                6.06                       0                25
## 2                4.71                       0                21
## 3                3.91                       0                30
## 4                2.83                       0                29
## 5                5.04                       0                36
## 6                2.51                       0                38
##   FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories DayOfWeek
## 1                  13                  328              728     1985   Tuesday
## 2                  19                  217              776     1797 Wednesday
## 3                  11                  181             1218     1776  Thursday
## 4                  34                  209              726     1745    Friday
## 5                  10                  221              773     1863  Saturday
## 6                  20                  164              539     1728    Sunday

Calories:

daily_calories$ActivityDay <- as.Date(daily_calories$ActivityDay, "%Y-%m-%d")
daily_calories$DayOfWeek <- weekdays(daily_calories$ActivityDay)
head(daily_calories)
##           Id ActivityDay Calories DayOfWeek
## 1 1503960366  2016-04-12     1985   Tuesday
## 2 1503960366  2016-04-13     1797 Wednesday
## 3 1503960366  2016-04-14     1776  Thursday
## 4 1503960366  2016-04-15     1745    Friday
## 5 1503960366  2016-04-16     1863  Saturday
## 6 1503960366  2016-04-17     1728    Sunday

Sleep:

daily_sleep$SleepDay <- as.Date(daily_sleep$SleepDay, "%Y-%m-%d")
daily_sleep$DayOfWeek <- weekdays(daily_sleep$SleepDay)
head(daily_sleep)
##           Id   SleepDay TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
## 1 1503960366 2016-04-12                 1                327            346
## 2 1503960366 2016-04-13                 2                384            407
## 3 1503960366 2016-04-15                 1                412            442
## 4 1503960366 2016-04-16                 2                340            367
## 5 1503960366 2016-04-17                 1                700            712
## 6 1503960366 2016-04-19                 1                304            320
##   time_taken_to_sleep DayOfWeek
## 1                  19   Tuesday
## 2                  23 Wednesday
## 3                  30    Friday
## 4                  27  Saturday
## 5                  12    Sunday
## 6                  16   Tuesday

Now that my data is cleaned, I’m ready to produce some graphs!

ANALYZE

I’m going to start by looking at the daily activity data. First, I want to start by running some simple summary statistics to give us a better idea of what to expect.

summary(cleaned_daily_activity$TotalSteps)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       8    4927    8054    8329   11096   36019

This shows us that users take an average of 8,329 steps a day, which is less than the commonly advised 10,000 steps a day. I find this interesting because the sample consists of people who wear a fitness tracker, and we would expect them to take more steps than people who don’t.

summary(cleaned_daily_activity$Calories)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      52    1857    2220    2362    2832    4900

This shows us that users burn an average of 2,362 calories a day. That is roughly the amount I would expect for women, although none of the tables loaded here records the users’ gender.

Now, I want to compare how many calories users are burning with how many steps they are taking. I expect to see a positive correlation here.

ggplot(data = cleaned_daily_activity) +
  aes(x= TotalSteps, y = Calories) +
  geom_point(color = 'blue') +
  geom_smooth() +
  labs(x = 'Total Steps', y = 'Calories Burned', title = 'Calories Burned vs Total Steps')
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

As expected, this graph shows a general trend where the more steps a user takes, the more calories they burn. It would be interesting to see more data from users who take more than 20,000 steps a day, to check whether the trend continues or plateaus as the smoothed line suggests. That part of the line rests on only a few outlying points.

Similarly, I want to compare the amount of calories users burn with the total distance they go in a day, and I expect it to look quite similar to the previous graph.

ggplot(data = cleaned_daily_activity) +
  aes(x= TotalDistance, y = Calories) +
  geom_point(color = 'blue') +
  geom_smooth() +
  labs(x = 'Total Distance', y = 'Calories Burned', title = 'Calories Burned vs Total Distance')
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

As expected, the two graphs are nearly identical, showing a general trend in which the farther a user travels in a day, the more calories they burn.
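
The visual trend can also be summarized with a correlation coefficient. As a minimal sketch, the six rows from the head(daily_activity) preview are used here. They all belong to a single user, so the result only illustrates the call and says nothing about the full dataset. The vector names are illustrative.

```r
# Six rows copied from the head(daily_activity) preview (one user only)
steps <- c(13162, 10735, 10460, 9762, 12669, 9705)
calories <- c(1985, 1797, 1776, 1745, 1863, 1728)

# Pearson correlation measures linear association
pearson_r <- cor(steps, calories)

# Spearman correlation measures monotonic (rank-based) association
spearman_rho <- cor(steps, calories, method = "spearman")

print(pearson_r)
print(spearman_rho)
```

On the full data, the same calls would take cleaned_daily_activity$TotalSteps and cleaned_daily_activity$Calories as arguments.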

Next I want to see how many calories users burn in comparison to sedentary minutes. Here, I am expecting to see an inverse relationship to the previous two graphs.

ggplot(data = cleaned_daily_activity) +
  aes(x= SedentaryMinutes, y = Calories) +
  geom_point(color = 'blue') +
  geom_smooth() +
  labs(x = 'Sedentary Minutes', y = 'Calories Burned', title = 'Calories Burned vs Sedentary Minutes')
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

This result was a bit surprising to me, as I expected to see the opposite of my first two charts. Instead, calories burned rise with sedentary minutes until a user reaches about 750 sedentary minutes, after which the curve plateaus and then turns downward.

After seeing all of this, I think it would be interesting to check and see if there are any trends in the amount of steps users are taking per day based on the day of the week.

daily_activity$DayOfWeek <- factor(daily_activity$DayOfWeek, c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday","Friday","Saturday"))

ggplot(data = daily_activity) +
  aes(x = DayOfWeek, y = TotalSteps) +
  geom_col(fill =  'blue') +
  labs(x = 'Day of Week', y = 'Total Steps', title = 'Total steps taken in a week')

This graph shows that step totals are highest on Tuesdays and lowest on Sundays, and that totals appear to taper off between those two days. Because geom_col() stacks every record for a weekday, the bars show totals across all users rather than a typical day.
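
To separate a typical day from the number of records behind each bar, weekday totals can be set beside weekday means. A minimal sketch with made-up numbers (toy_steps is hypothetical) shows how the two can rank weekdays differently:

```r
# Hypothetical toy data: Monday has more records, Tuesday has higher values
toy_steps <- data.frame(
  DayOfWeek = c("Monday", "Monday", "Monday", "Tuesday", "Tuesday"),
  TotalSteps = c(4000, 5000, 6000, 6000, 8000)
)

# Sum per weekday, which is what the bar heights in geom_col() represent
total_by_day <- aggregate(TotalSteps ~ DayOfWeek, data = toy_steps, FUN = sum)

# Mean per weekday, which describes a typical record for that day
mean_by_day <- aggregate(TotalSteps ~ DayOfWeek, data = toy_steps, FUN = mean)

print(total_by_day)
print(mean_by_day)
```

Here Monday has the larger total only because it has three records instead of two, while Tuesday has the higher mean.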

I also want to see if there is a relationship between the amount of calories a user burns based on the day of the week.

options(scipen = 999) #remove scientific notation
cleaned_daily_activity$DayOfWeek <- factor(cleaned_daily_activity$DayOfWeek, c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday","Friday","Saturday"))

ggplot(data = cleaned_daily_activity) +
  aes(x = DayOfWeek, y = Calories) +
  geom_col(fill =  'blue') +
  labs(x = 'Day of week', y = 'Calories Burned', title = 'Calories burned in a week')

As expected, this comparison gives similar results to total steps: calorie totals are highest on Tuesdays and lowest on Sundays.

Now that I have seen some graphs for users’ daily activity I want to look at users’ sleep activity. I want to first run a few simple summary statistics:

summary(daily_sleep$TotalMinutesAsleep)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    58.0   361.0   433.0   419.5   490.0   796.0

This tells us that users sleep an average of 419.5 minutes a night, which is just under 7 hours and about an hour less than the 8 hours often suggested.

summary(daily_sleep$time_taken_to_sleep)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   17.00   25.00   39.17   40.00  371.00

This tells us users take an average of 39 minutes to fall asleep, ranging from 0 minutes to 371 minutes (a little over 6 hours!). The median of 25 minutes is well below the mean, so a few very long values pull the average up.
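
The minute-to-hour conversions behind these statements are simple to check. The sketch below uses the mean and maximum reported by summary() above (the variable names are illustrative):

```r
# Values taken from the summary() output above
mean_asleep_min <- 419.5
max_awake_in_bed_min <- 371

# Mean sleep in hours
mean_asleep_hours <- mean_asleep_min / 60

# Maximum time to fall asleep, split into whole hours and leftover minutes
max_hours <- max_awake_in_bed_min %/% 60
max_minutes <- max_awake_in_bed_min %% 60

print(round(mean_asleep_hours, 2))
print(c(hours = max_hours, minutes = max_minutes))
```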

I think it would be interesting to see if there are any trends in how much sleep users are getting based on the day of the week.

daily_sleep$DayOfWeek <- factor(daily_sleep$DayOfWeek, c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday","Friday","Saturday"))

ggplot(data = daily_sleep) +
  aes(x = DayOfWeek, y = TotalMinutesAsleep) +
  geom_col(fill =  'blue') +
  labs(x = 'Day of week', y = 'Total Minutes Asleep', title = 'Total sleep in a week')

This graph doesn’t show any huge trends, but most notably total sleep is lowest on Mondays and highest on Wednesdays.

I also thought it would be interesting to search for trends regarding how long it takes a user to fall asleep on a particular day of the week.

daily_sleep$DayOfWeek <- factor(daily_sleep$DayOfWeek, c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday","Friday","Saturday"))

ggplot(data = daily_sleep) +
  aes(x = DayOfWeek, y = time_taken_to_sleep) +
  geom_col(fill =  'blue') +
  labs(x = 'Day of week', y = 'Time to fall asleep', title = 'Time taken to fall asleep by days of week')

This graph tells us that the total time taken to fall asleep is highest on Sundays and lowest on Mondays.

One thing that sparked my curiosity was to see if the amount of time someone spends sleeping is in any way correlated to the amount of time they take to fall asleep.

ggplot(data = daily_sleep) +
  aes(x= TotalMinutesAsleep, y = time_taken_to_sleep) +
  geom_point(color = 'blue') +
  geom_smooth() +
  labs(x = 'Time Asleep', y = 'Time to fall asleep', title = 'Time sleeping vs. fall asleep')
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

There does not appear to be a strong relationship here.

Finally, I wanted to compare the time a user spends asleep with the time they are in bed. Here, I expect to see a strong positive correlation.

ggplot(data = daily_sleep) +
  aes(x= TotalMinutesAsleep, y = TotalTimeInBed) +
  geom_point(color = 'blue') +
  geom_smooth() +
  labs(x = 'Time Asleep', y = 'Time in Bed', title = 'Time sleeping vs. time in bed')
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
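
One way to put a number on how closely time asleep tracks time in bed is the share of time in bed that is spent asleep. As a minimal sketch, the six rows from the head(daily_sleep) preview are used here. They belong to a single user, so this illustrates the calculation rather than a result for the whole dataset, and the name efficiency is my own label.

```r
# Six rows copied from the head(daily_sleep) preview (one user only)
asleep <- c(327, 384, 412, 340, 700, 304)
in_bed <- c(346, 407, 442, 367, 712, 320)

# Share of time in bed that was spent asleep
efficiency <- asleep / in_bed

# Minutes in bed but not asleep, matching the time_taken_to_sleep column
awake_in_bed <- in_bed - asleep

print(round(efficiency, 3))
print(awake_in_bed)
```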

SHARE

Based on my analysis, I want to share a summary of some observations I have made.

Daily Activity:

  1. Users burn more calories as they take more steps, indicating a positive correlation between step count and calorie burn.

  2. A similar positive correlation exists between the distance users travel and the calories burned.

  3. Surprisingly, the relationship between sedentary minutes and calories burned is positive until around 750 minutes of sedentary activity, after which it plateaus and then turns negative.

  4. Step and calorie totals are highest on Tuesdays and lowest on Sundays.

  5. Step and calorie totals decline noticeably from Tuesday toward the weekend.

Sleep Activity:

  1. Total sleep varies by the day of the week, with the lowest totals on Mondays and the highest on Wednesdays.

  2. The total time taken to fall asleep is highest on Sundays and lowest on Mondays.

  3. There is no strong relationship between the duration of sleep and the time taken to fall asleep. This suggests that individuals who take longer to fall asleep do not necessarily sleep for shorter durations.

  4. There is a strong positive correlation between the time a user spends asleep and the time they spend in bed, indicating that users tend to sleep for most of the time they are in bed.

ACT

My findings provide valuable insights into user behavior with FitBit trackers. One suggestion for Bellabeat would be to target certain days of the week or activity levels when promoting its wellness products, for example by hosting classes on the days of the week when activity is typically higher. Additionally, the analysis highlights the need to consider individual variations in sleep patterns and activity levels when tailoring marketing efforts. Building on this, it may be wise to create a product that tracks sleep patterns more actively, since some of this information may otherwise rest on users’ best guesses. Finally, Bellabeat could benefit from partnering with other health and wellness brands to promote active living by offering incentives such as deals on products.