As part of the Google Data Analytics Professional Certificate course on Coursera, we are encouraged to complete a capstone project that demonstrates all the skills we learned. I chose the second case study, “How Can a Wellness Technology Company Play It Smart?”, because of my interest in women’s health and fitness. This documentation showcases my understanding of the data analysis process: Ask, Prepare, Process, Analyze, Share, and Act.
Bellabeat is a high-tech manufacturer of health-focused products for women. I will be acting as a junior data analyst on the marketing analytics team, analyzing smart device usage data to gain insight into how consumers use their devices. Some key Bellabeat products that I will be looking at include:
– Bellabeat App: an app that connects to Bellabeat’s line of smart wellness products and provides users with health data related to activity, sleep, stress, menstrual cycle, and mindfulness habits
– Leaf: a wearable wellness tracker that connects to the Bellabeat app (similar to a Fitbit)
– Time: a wellness watch that allows users to track activity, sleep, and stress and connects to the Bellabeat app
– Spring: a water bottle that tracks daily water intake and connects to the app
– Bellabeat Membership: a subscription-based membership program that gives users 24/7 access to personalized guidance on health and wellness habits
I want to analyze smart device usage data to gain insight into how consumers use non-Bellabeat smart devices. I will then select one of the Bellabeat products to apply these insights to. Some questions I will seek to answer are:
What are some trends in smart device usage?
How could these trends apply to Bellabeat customers?
How could these trends help influence Bellabeat marketing strategy?
The business task is to use insights from non-Bellabeat smart devices to shape a marketing strategy for Bellabeat. I also want to keep the key stakeholders in mind, who include Bellabeat’s cofounders, Urška Sršen and Sando Mur.
For this project I was given a specific data set: the FitBit Fitness Tracker Data. See the data here. The data is stored in .csv files organized in rows and columns, and it is in long format because each row is one time point (a day or a sleep record) for one subject.
Installing Packages:
r = getOption("repos")
r["CRAN"] = "http://cran.us.r-project.org"
options(repos = r)
install.packages(c('tidyverse', 'janitor', 'lubridate', 'skimr'))
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(janitor)
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(lubridate)
library(skimr)
library(ggplot2)
Uploading/Storing Data:
daily_activity <- read.csv("Bellabeat Fitbit Data/dailyActivity_merged.csv")
daily_calories <- read.csv("Bellabeat Fitbit Data/dailyCalories_merged.csv")
daily_steps <- read.csv("Bellabeat Fitbit Data/dailySteps_merged.csv")
daily_sleep <- read.csv("Bellabeat Fitbit Data/sleepDay_merged.csv")
To see how the data is organized, I use colnames() to list the column names of each data set and head() to preview its first few rows.
colnames(daily_sleep)
## [1] "Id" "SleepDay" "TotalSleepRecords"
## [4] "TotalMinutesAsleep" "TotalTimeInBed"
head(daily_sleep)
## Id SleepDay TotalSleepRecords TotalMinutesAsleep
## 1 1503960366 4/12/2016 12:00:00 AM 1 327
## 2 1503960366 4/13/2016 12:00:00 AM 2 384
## 3 1503960366 4/15/2016 12:00:00 AM 1 412
## 4 1503960366 4/16/2016 12:00:00 AM 2 340
## 5 1503960366 4/17/2016 12:00:00 AM 1 700
## 6 1503960366 4/19/2016 12:00:00 AM 1 304
## TotalTimeInBed
## 1 346
## 2 407
## 3 442
## 4 367
## 5 712
## 6 320
colnames(daily_steps)
## [1] "Id" "ActivityDay" "StepTotal"
head(daily_steps)
## Id ActivityDay StepTotal
## 1 1503960366 4/12/2016 13162
## 2 1503960366 4/13/2016 10735
## 3 1503960366 4/14/2016 10460
## 4 1503960366 4/15/2016 9762
## 5 1503960366 4/16/2016 12669
## 6 1503960366 4/17/2016 9705
colnames(daily_calories)
## [1] "Id" "ActivityDay" "Calories"
head(daily_calories)
## Id ActivityDay Calories
## 1 1503960366 4/12/2016 1985
## 2 1503960366 4/13/2016 1797
## 3 1503960366 4/14/2016 1776
## 4 1503960366 4/15/2016 1745
## 5 1503960366 4/16/2016 1863
## 6 1503960366 4/17/2016 1728
colnames(daily_activity)
## [1] "Id" "ActivityDate"
## [3] "TotalSteps" "TotalDistance"
## [5] "TrackerDistance" "LoggedActivitiesDistance"
## [7] "VeryActiveDistance" "ModeratelyActiveDistance"
## [9] "LightActiveDistance" "SedentaryActiveDistance"
## [11] "VeryActiveMinutes" "FairlyActiveMinutes"
## [13] "LightlyActiveMinutes" "SedentaryMinutes"
## [15] "Calories"
head(daily_activity)
## Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366 4/12/2016 13162 8.50 8.50
## 2 1503960366 4/13/2016 10735 6.97 6.97
## 3 1503960366 4/14/2016 10460 6.74 6.74
## 4 1503960366 4/15/2016 9762 6.28 6.28
## 5 1503960366 4/16/2016 12669 8.16 8.16
## 6 1503960366 4/17/2016 9705 6.48 6.48
## LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1 0 1.88 0.55
## 2 0 1.57 0.69
## 3 0 2.44 0.40
## 4 0 2.14 1.26
## 5 0 2.71 0.41
## 6 0 3.19 0.78
## LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 3.91 0 30
## 4 2.83 0 29
## 5 5.04 0 36
## 6 2.51 0 38
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1 13 328 728 1985
## 2 19 217 776 1797
## 3 11 181 1218 1776
## 4 34 209 726 1745
## 5 10 221 773 1863
## 6 20 164 539 1728
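For this project I am using RStudio along with packages such as ‘tidyverse’ and ‘lubridate’. Right off the bat, I want to check the data for any problems, so I started by running is.null() on each data set. It returned FALSE every time, but that only confirms each data frame exists: is.null() tests whether the whole object is NULL and does not look for missing (NA) values inside it.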
is.null(daily_activity)
## [1] FALSE
is.null(daily_calories)
## [1] FALSE
is.null(daily_steps)
## [1] FALSE
is.null(daily_sleep)
## [1] FALSE
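To look for missing values inside each data frame, anyNA() and is.na() inspect individual cells. The sketch below runs on a small made-up data frame (`toy_activity` is hypothetical); on the real data you would pass daily_activity, daily_calories, daily_steps, or daily_sleep instead.

```r
# Hypothetical toy data standing in for one of the Fitbit data frames
toy_activity <- data.frame(
  Id = c(1, 1, 2),
  TotalSteps = c(13162, NA, 9762),
  Calories = c(1985, 1797, NA)
)

# anyNA() returns TRUE if at least one cell in the data frame is NA
has_missing <- anyNA(toy_activity)

# colSums(is.na()) counts the missing cells in each column
missing_per_column <- colSums(is.na(toy_activity))

print(has_missing)
print(missing_per_column)
```

Running the same two calls on each real data set shows at a glance whether any column needs attention before the analysis.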
Then, I checked the data type of every column using str() to look for data errors.
str(daily_activity)
## 'data.frame': 940 obs. of 15 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityDate : chr "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
## $ TotalSteps : int 13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...
## $ TotalDistance : num 8.5 6.97 6.74 6.28 8.16 ...
## $ TrackerDistance : num 8.5 6.97 6.74 6.28 8.16 ...
## $ LoggedActivitiesDistance: num 0 0 0 0 0 0 0 0 0 0 ...
## $ VeryActiveDistance : num 1.88 1.57 2.44 2.14 2.71 ...
## $ ModeratelyActiveDistance: num 0.55 0.69 0.4 1.26 0.41 ...
## $ LightActiveDistance : num 6.06 4.71 3.91 2.83 5.04 ...
## $ SedentaryActiveDistance : num 0 0 0 0 0 0 0 0 0 0 ...
## $ VeryActiveMinutes : int 25 21 30 29 36 38 42 50 28 19 ...
## $ FairlyActiveMinutes : int 13 19 11 34 10 20 16 31 12 8 ...
## $ LightlyActiveMinutes : int 328 217 181 209 221 164 233 264 205 211 ...
## $ SedentaryMinutes : int 728 776 1218 726 773 539 1149 775 818 838 ...
## $ Calories : int 1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...
str(daily_calories)
## 'data.frame': 940 obs. of 3 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityDay: chr "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
## $ Calories : int 1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...
str(daily_steps)
## 'data.frame': 940 obs. of 3 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityDay: chr "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
## $ StepTotal : int 13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...
str(daily_sleep)
## 'data.frame': 413 obs. of 5 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ SleepDay : chr "4/12/2016 12:00:00 AM" "4/13/2016 12:00:00 AM" "4/15/2016 12:00:00 AM" "4/16/2016 12:00:00 AM" ...
## $ TotalSleepRecords : int 1 2 1 2 1 1 1 1 1 1 ...
## $ TotalMinutesAsleep: int 327 384 412 340 700 304 360 325 361 430 ...
## $ TotalTimeInBed : int 346 407 442 367 712 320 377 364 384 449 ...
The biggest issue I see here is that several date columns are stored as character strings rather than dates, so I need to convert them. The raw dates use four-digit years (for example, "4/12/2016"), so the format needs %Y; the two-digit %y would read only "20" and produce 2020-04-12, which also shifts every weekday.
# SleepDay also carries a time ("12:00:00 AM"); as.Date() ignores the trailing text
daily_sleep$SleepDay <- as.Date(daily_sleep$SleepDay, '%m/%d/%Y')
daily_steps$ActivityDay <- as.Date(daily_steps$ActivityDay, '%m/%d/%Y')
daily_calories$ActivityDay <- as.Date(daily_calories$ActivityDay, '%m/%d/%Y')
daily_activity$ActivityDate <- as.Date(daily_activity$ActivityDate, '%m/%d/%Y')
In the sleep data, I thought it might be interesting to see how long it takes people to fall asleep. The data does not record this directly, so I use the minutes spent in bed but not asleep (TotalTimeInBed minus TotalMinutesAsleep) as a rough proxy and store it in a new column. This proxy also includes any time spent awake or restless during the night.
daily_sleep$time_taken_to_sleep <- daily_sleep$TotalTimeInBed - daily_sleep$TotalMinutesAsleep
head(daily_sleep$time_taken_to_sleep)
## [1] 19 23 30 27 12 16
Next, I noticed that there appear to be days when people did not wear their Fitbit tracker, resulting in zero values for total distance and total steps. I do not want to include these days in my analyses, so I remove every row with zero calories or zero total distance before producing any graphs.
cleaned_daily_activity <- daily_activity[!(daily_activity$Calories<=0),]
cleaned_daily_activity <- cleaned_daily_activity[!(cleaned_daily_activity$TotalDistance<=0.00),]
head(cleaned_daily_activity)
## Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366 2016-04-12 13162 8.50 8.50
## 2 1503960366 2016-04-13 10735 6.97 6.97
## 3 1503960366 2016-04-14 10460 6.74 6.74
## 4 1503960366 2016-04-15 9762 6.28 6.28
## 5 1503960366 2016-04-16 12669 8.16 8.16
## 6 1503960366 2016-04-17 9705 6.48 6.48
## LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1 0 1.88 0.55
## 2 0 1.57 0.69
## 3 0 2.44 0.40
## 4 0 2.14 1.26
## 5 0 2.71 0.41
## 6 0 3.19 0.78
## LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 3.91 0 30
## 4 2.83 0 29
## 5 5.04 0 36
## 6 2.51 0 38
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1 13 328 728 1985
## 2 19 217 776 1797
## 3 11 181 1218 1776
## 4 34 209 726 1745
## 5 10 221 773 1863
## 6 20 164 539 1728
The last change I want to make is adding a day-of-week column to the activity, calories, and sleep data sets. The date columns are already Date objects at this point, so weekdays() can be applied directly.
Activity:
daily_activity$DayOfWeek <- weekdays(daily_activity$ActivityDate)
head(daily_activity)
## Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366 2016-04-12 13162 8.50 8.50
## 2 1503960366 2016-04-13 10735 6.97 6.97
## 3 1503960366 2016-04-14 10460 6.74 6.74
## 4 1503960366 2016-04-15 9762 6.28 6.28
## 5 1503960366 2016-04-16 12669 8.16 8.16
## 6 1503960366 2016-04-17 9705 6.48 6.48
## LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1 0 1.88 0.55
## 2 0 1.57 0.69
## 3 0 2.44 0.40
## 4 0 2.14 1.26
## 5 0 2.71 0.41
## 6 0 3.19 0.78
## LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 3.91 0 30
## 4 2.83 0 29
## 5 5.04 0 36
## 6 2.51 0 38
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories DayOfWeek
## 1 13 328 728 1985 Tuesday
## 2 19 217 776 1797 Wednesday
## 3 11 181 1218 1776 Thursday
## 4 34 209 726 1745 Friday
## 5 10 221 773 1863 Saturday
## 6 20 164 539 1728 Sunday
cleaned_daily_activity$DayOfWeek <- weekdays(cleaned_daily_activity$ActivityDate)
head(cleaned_daily_activity)
## Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366 2016-04-12 13162 8.50 8.50
## 2 1503960366 2016-04-13 10735 6.97 6.97
## 3 1503960366 2016-04-14 10460 6.74 6.74
## 4 1503960366 2016-04-15 9762 6.28 6.28
## 5 1503960366 2016-04-16 12669 8.16 8.16
## 6 1503960366 2016-04-17 9705 6.48 6.48
## LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1 0 1.88 0.55
## 2 0 1.57 0.69
## 3 0 2.44 0.40
## 4 0 2.14 1.26
## 5 0 2.71 0.41
## 6 0 3.19 0.78
## LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 3.91 0 30
## 4 2.83 0 29
## 5 5.04 0 36
## 6 2.51 0 38
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories DayOfWeek
## 1 13 328 728 1985 Tuesday
## 2 19 217 776 1797 Wednesday
## 3 11 181 1218 1776 Thursday
## 4 34 209 726 1745 Friday
## 5 10 221 773 1863 Saturday
## 6 20 164 539 1728 Sunday
Calories:
daily_calories$DayOfWeek <- weekdays(daily_calories$ActivityDay)
head(daily_calories)
## Id ActivityDay Calories DayOfWeek
## 1 1503960366 2016-04-12 1985 Tuesday
## 2 1503960366 2016-04-13 1797 Wednesday
## 3 1503960366 2016-04-14 1776 Thursday
## 4 1503960366 2016-04-15 1745 Friday
## 5 1503960366 2016-04-16 1863 Saturday
## 6 1503960366 2016-04-17 1728 Sunday
Sleep:
daily_sleep$DayOfWeek <- weekdays(daily_sleep$SleepDay)
head(daily_sleep)
## Id SleepDay TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
## 1 1503960366 2016-04-12 1 327 346
## 2 1503960366 2016-04-13 2 384 407
## 3 1503960366 2016-04-15 1 412 442
## 4 1503960366 2016-04-16 2 340 367
## 5 1503960366 2016-04-17 1 700 712
## 6 1503960366 2016-04-19 1 304 320
## time_taken_to_sleep DayOfWeek
## 1 19 Tuesday
## 2 23 Wednesday
## 3 30 Friday
## 4 27 Saturday
## 5 12 Sunday
## 6 16 Tuesday
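The two-digit versus four-digit year pitfall is easy to check directly, and a few more lines show how often each weekday occurs in the study period, which matters for the day-of-week charts. This sketch assumes the data run from April 12 to May 12, 2016, and it reads weekdays from the `$wday` field of as.POSIXlt() (0 = Sunday) rather than weekdays(), so the result does not depend on the system language. The names `raw_date`, `study_days`, and `weekday_counts` are illustrative.

```r
# The raw files store dates like "4/12/2016" (month/day/four-digit year)
raw_date <- "4/12/2016"

# %y reads only two digits ("20"), so the year becomes 2020 and "16" is ignored
wrong <- as.Date(raw_date, "%m/%d/%y")
# %Y reads all four digits and gives the intended date
right <- as.Date(raw_date, "%m/%d/%Y")

print(wrong)
print(right)

# Count how often each weekday occurs in the (assumed) 31-day study period
study_days <- seq(as.Date("2016-04-12"), as.Date("2016-05-12"), by = "day")
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
# $wday runs from 0 (Sunday) to 6 (Saturday), so add 1 to index day_names
weekday_counts <- table(factor(day_names[as.POSIXlt(study_days)$wday + 1],
                               levels = day_names))
print(weekday_counts)
```

Tuesday, Wednesday, and Thursday each occur five times in this window and every other weekday four times, so any chart that sums values per weekday gives those three days a head start.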
Now that my data is cleaned, I’m ready to produce some graphs!
I’m going to start with the daily activity data, running some simple summary statistics to get a better idea of what to expect.
summary(cleaned_daily_activity$TotalSteps)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8 4927 8054 8329 11096 36019
This shows that users take an average of 8,329 steps a day, which is below the commonly recommended 10,000 steps a day. I find this interesting because everyone in the sample wears a fitness tracker, and we might expect these people to take more steps than people who don’t.
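The average alone does not show how many days actually reach 10,000 steps. The share of days at or above the goal answers that directly. This sketch uses the first six TotalSteps values from the head() output above as a hypothetical sample; on the full data you would use cleaned_daily_activity$TotalSteps.

```r
# First six TotalSteps values from head(daily_activity) (illustrative sample)
steps <- c(13162, 10735, 10460, 9762, 12669, 9705)

goal <- 10000
# Logical vector: TRUE for days that meet or exceed the goal
meets_goal <- steps >= goal

# The mean of a logical vector is the share of TRUE values
share_meeting_goal <- mean(meets_goal)
print(round(share_meeting_goal, 3))
```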
summary(cleaned_daily_activity$Calories)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 52 1857 2220 2362 2832 4900
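This shows that users burn an average of 2,362 calories a day, which is roughly in line with typical daily energy needs for adult women (U.S. dietary guidelines estimate about 1,600 to 2,400 calories a day), although the data set does not record users’ sex.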
Now, I want to compare how many calories users are burning with how many steps they are taking. I expect to see a positive correlation here.
ggplot(data = cleaned_daily_activity) +
aes(x= TotalSteps, y = Calories) +
geom_point(color = 'blue') +
geom_smooth() +
labs(x = 'Total Steps', y = 'Calories Burned', title = 'Calories Burned vs Total Steps')
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
As expected, this graph shows a general trend: the more steps a user takes, the more calories they burn. The smoothed curve flattens above about 20,000 steps, but that part of the curve rests on only a few outlier points, so it would be interesting to see more data from users who take more than 20,000 steps a day to tell whether the trend continues or really plateaus.
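The scatter plot shows the trend visually; cor() puts a single number on its strength. This sketch uses the first six TotalSteps and Calories values from the head() output above, which is far too few points for a real conclusion, so the numbers are only illustrative. On the full data you would pass cleaned_daily_activity$TotalSteps and cleaned_daily_activity$Calories.

```r
# First six rows of TotalSteps and Calories from head(daily_activity);
# six points are only enough to illustrate the calls, not to draw conclusions
steps <- c(13162, 10735, 10460, 9762, 12669, 9705)
calories <- c(1985, 1797, 1776, 1745, 1863, 1728)

# Pearson's r measures the strength of a straight-line relationship
pearson_r <- cor(steps, calories)
# Spearman's rho only assumes a monotonic relationship, so it is less
# affected by the curve flattening at high step counts
spearman_rho <- cor(steps, calories, method = "spearman")

print(round(c(pearson = pearson_r, spearman = spearman_rho), 3))
```

On the full data, cor.test() with the same two vectors also reports a confidence interval and p-value for Pearson's r.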
Similarly, I want to compare the number of calories users burn with the total distance they cover in a day, and I expect it to look quite similar to the previous graph.
ggplot(data = cleaned_daily_activity) +
aes(x= TotalDistance, y = Calories) +
geom_point(color = 'blue') +
geom_smooth() +
labs(x = 'Total Distance', y = 'Calories Burned', title = 'Calories Burned vs Total Distance')
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
As expected, these graphs are nearly identical, showing a general trend where the farther a user goes in a day, the more calories they burn.
Next I want to see how many calories users burn in comparison to sedentary minutes. Here, I am expecting to see an inverse relationship to the previous two graphs.
ggplot(data = cleaned_daily_activity) +
aes(x= SedentaryMinutes, y = Calories) +
geom_point(color = 'blue') +
geom_smooth() +
labs(x = 'Sedentary Minutes', y = 'Calories Burned', title = 'Calories Burned vs Sedentary Minutes')
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
This result surprised me. I expected the opposite of my first two charts, but the curve starts with a positive correlation until users reach about 750 sedentary minutes, then levels off and turns negative.
After seeing all of this, I want to check whether the number of steps users take varies by day of the week.
daily_activity$DayOfWeek <- factor(daily_activity$DayOfWeek, c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday","Friday","Saturday"))
ggplot(data = daily_activity) +
aes(x = DayOfWeek, y = TotalSteps) +
geom_col(fill = 'blue') +
labs(x = 'Day of Week', y = 'Total Steps', title = 'Total steps taken in a week')
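This graph suggests that users take the most steps on Tuesdays and the fewest on Sundays, with totals generally falling from Tuesday through the rest of the week. Keep in mind that geom_col() stacks every row for a given day, so each bar is a total: the 31-day study period (April 12 to May 12, 2016) contains five Tuesdays, Wednesdays, and Thursdays but only four of every other weekday, which inflates those three bars. Comparing the average steps per day would be a fairer test.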
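A day-of-week comparison that is not skewed by how often each weekday occurs averages each day instead of summing it. This base R sketch uses aggregate() on a small hypothetical data frame (`toy_days` is made up); on the real data you would pass daily_activity with its TotalSteps and DayOfWeek columns, and the resulting data frame could be plotted with geom_col() just like the totals.

```r
# Hypothetical toy data: Tuesday appears twice, so its total is inflated
toy_days <- data.frame(
  DayOfWeek = c("Tuesday", "Tuesday", "Wednesday", "Sunday"),
  TotalSteps = c(8000, 9000, 12000, 7000)
)
week_order <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday")
toy_days$DayOfWeek <- factor(toy_days$DayOfWeek, levels = week_order)

# Totals per day (what stacked geom_col() bars add up to)
totals <- aggregate(TotalSteps ~ DayOfWeek, data = toy_days, FUN = sum)
# Averages per day remove the effect of how often each weekday occurs
averages <- aggregate(TotalSteps ~ DayOfWeek, data = toy_days, FUN = mean)

print(totals)
print(averages)
```

In this toy example Tuesday has the largest total only because it appears twice; by average, Wednesday comes out on top.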
I also want to see whether the number of calories users burn varies by day of the week.
options(scipen = 999) # turn off scientific notation
cleaned_daily_activity$DayOfWeek <- factor(cleaned_daily_activity$DayOfWeek, c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
ggplot(data = cleaned_daily_activity) +
aes(x = DayOfWeek, y = Calories) +
geom_col(fill = 'blue') +
labs(x = 'Day of week', y = 'Calories Burned', title = 'Calories burned in a week')
As expected, this comparison gives results similar to total steps: users burn the most calories on Tuesdays and the fewest on Sundays. These bars are also totals, so weekdays that occur five times in the study period get an advantage.
Now that I have seen some graphs for users’ daily activity I want to look at users’ sleep activity. I want to first run a few simple summary statistics:
summary(daily_sleep$TotalMinutesAsleep)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 58.0 361.0 433.0 419.5 490.0 796.0
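This tells us that users sleep an average of 419.5 minutes, just under 7 hours. That is about an hour less than the 8 hours often suggested and right at the 7-hour minimum the CDC recommends for adults, and the first quartile of 361 minutes means about a quarter of recorded nights last roughly 6 hours or less.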
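Converting minutes to hours and counting short nights makes the comparison with a sleep guideline concrete. This sketch uses the first six TotalMinutesAsleep values from head(daily_sleep) as a hypothetical sample and assumes a threshold of 7 hours (420 minutes); on the full data you would use daily_sleep$TotalMinutesAsleep.

```r
# First six TotalMinutesAsleep values from head(daily_sleep) (illustrative sample)
minutes_asleep <- c(327, 384, 412, 340, 700, 304)

hours_asleep <- minutes_asleep / 60
threshold_minutes <- 7 * 60  # assumed threshold: 7 hours

# Share of nights shorter than the threshold
share_short_nights <- mean(minutes_asleep < threshold_minutes)

print(round(hours_asleep, 2))
print(round(share_short_nights, 3))
```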
summary(daily_sleep$time_taken_to_sleep)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 17.00 25.00 39.17 40.00 371.00
This tells us it takes users an average of 39 minutes to fall asleep, ranging from 0 minutes to 371 minutes (a little over 6 hours!). The median is only 25 minutes, so a few very long nights pull the mean up.
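A common way to flag extreme values like the 371-minute maximum is the 1.5 × IQR rule: anything more than 1.5 interquartile ranges above the third quartile counts as a likely outlier. With the quartiles from the summary above (17 and 40), the rule would flag anything over 40 + 1.5 × 23 = 74.5 minutes. The sketch below applies the rule to a hypothetical vector (the first six time_taken_to_sleep values plus the 371-minute maximum); on the full data you would pass daily_sleep$time_taken_to_sleep.

```r
# Hypothetical sample: six values from head(daily_sleep) plus the 371-minute maximum
minutes_to_sleep <- c(19, 23, 30, 27, 12, 16, 371)

q1 <- quantile(minutes_to_sleep, 0.25, names = FALSE)
q3 <- quantile(minutes_to_sleep, 0.75, names = FALSE)
upper_fence <- q3 + 1.5 * (q3 - q1)

# Values above the upper fence are flagged as likely outliers
outliers <- minutes_to_sleep[minutes_to_sleep > upper_fence]

print(c(mean = mean(minutes_to_sleep), median = median(minutes_to_sleep)))
print(upper_fence)
print(outliers)
```

The single extreme value roughly triples the mean while barely moving the median, which is the same pattern the full summary shows.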
I think it would be interesting to see if there are any trends in how much sleep users are getting based on the day of the week.
daily_sleep$DayOfWeek <- factor(daily_sleep$DayOfWeek, c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday","Friday","Saturday"))
ggplot(data = daily_sleep) +
aes(x = DayOfWeek, y = TotalMinutesAsleep) +
geom_col(fill = 'blue') +
labs(x = 'Day of week', y = 'Total Minutes Asleep', title = 'Total sleep in a week')
This graph doesn’t show any large trends, but users tend to sleep the least on Mondays and the most on Wednesdays. Since the bars are totals, Wednesday’s bar may partly reflect that it occurs five times in the study period.
I also thought it would be interesting to search for trends regarding how long it takes a user to fall asleep on a particular day of the week.
daily_sleep$DayOfWeek <- factor(daily_sleep$DayOfWeek, c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday","Friday","Saturday"))
ggplot(data = daily_sleep) +
aes(x = DayOfWeek, y = time_taken_to_sleep) +
geom_col(fill = 'blue') +
labs(x = 'Day of week', y = 'Time to fall asleep', title = 'Time taken to fall asleep by days of week')
This graph suggests that, in total, users take the longest to fall asleep on Sundays and the least time on Mondays.
One thing that sparked my curiosity was whether the amount of time someone spends asleep is related to how long they take to fall asleep.
ggplot(data = daily_sleep) +
aes(x= TotalMinutesAsleep, y = time_taken_to_sleep) +
geom_point(color = 'blue') +
geom_smooth() +
labs(x = 'Time Asleep', y = 'Time to fall asleep', title = 'Time sleeping vs. time to fall asleep')
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
There does not appear to be a strong relationship between the two.
Finally, I wanted to compare the time a user spends asleep with the time they are in bed. Here, I expect to see a strong positive correlation.
ggplot(data = daily_sleep) +
aes(x= TotalMinutesAsleep, y = TotalTimeInBed) +
geom_point(color = 'blue') +
geom_smooth() +
labs(x = 'Time Asleep', y = 'Time in Bed', title = 'Time sleeping vs. time in bed')
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
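One way to summarize this relationship in a single number per night is sleep efficiency, the fraction of time in bed actually spent asleep (TotalMinutesAsleep divided by TotalTimeInBed). This sketch uses the first six rows of head(daily_sleep); on the full data you could add it as a column, for example a hypothetical daily_sleep$sleep_efficiency.

```r
# First six rows of head(daily_sleep) (illustrative sample)
minutes_asleep <- c(327, 384, 412, 340, 700, 304)
time_in_bed <- c(346, 407, 442, 367, 712, 320)

# Sleep efficiency: fraction of time in bed spent asleep (0 to 1)
sleep_efficiency <- minutes_asleep / time_in_bed

print(round(sleep_efficiency, 3))
print(round(mean(sleep_efficiency), 3))
```

Because efficiency is a ratio, it separates how well someone sleeps from how long they stay in bed, which the scatter plot mixes together.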
My findings provide useful insights into how people use Fitbit trackers. One thing I would suggest to Bellabeat is targeting certain days of the week or activity levels when promoting its wellness products, for example by hosting classes on the days when activity is typically higher. The analysis also highlights the need to account for individual variation in sleep patterns and activity levels when tailoring marketing efforts. Building on this, it may be worth creating a product that tracks sleep patterns more actively, since some of the sleep information may rely on users’ own best guesses. Finally, Bellabeat could benefit from partnering with other health and wellness brands to promote active living, offering incentives such as deals on products.