Bellabeat is a cutting-edge technology company founded by Urška Sršen and Sandor Mur. All of its products are designed to improve people's health. Sršen drew on her background as an artist to develop innovative tools that give women access to information and inspiration on a global scale. By tracking movement, rest, emotional state, and reproductive health, Bellabeat has given women the tools they need to take charge of their own health and well-being. Since its founding in 2013, Bellabeat, a tech-driven wellness company for women, has grown rapidly.
Ivy: A one-of-a-kind health and wellness tracker designed by women specifically for women. Ivy is a stylish bracelet that analyzes your physiological data and your physical and mental activity to suggest how to improve your self-care routines and reach peak performance. This product was released only recently, and its data are not included in this analysis.
Bellabeat app: Provides users with health-related information about their activity, sleep, stress, menstrual cycle, and mindfulness practices. This data can help users better understand their existing habits and make healthy choices. The Bellabeat app is compatible with the company's smart wellness product line.
Leaf: The Leaf wellness tracker by Bellabeat can be worn as a bracelet, necklace, or clip. Leaf connects to the Bellabeat app to monitor exercise, sleep, and stress.
Time: This wellness watch blends the timeless design of a classic timepiece with advanced technology to monitor the wearer's activity, sleep, and stress levels. The Time watch connects to the Bellabeat app to deliver daily wellness information.
Spring: This is a smart water bottle that monitors daily water intake to ensure that you remain well hydrated throughout the day. The Spring bottle is integrated with the Bellabeat app to monitor hydration levels.
Bellabeat membership: Bellabeat also offers consumers a subscription-based membership program. Membership gives customers access, 24 hours a day and seven days a week, to fully customized advice on diet, fitness, sleep, health and beauty, and mindfulness, based on their lifestyle and objectives.
Specifically, I'll relate the findings to Bellabeat's Leaf and Time products to learn more about how people use smart wearables.
Sršen has asked for an analysis of smart device usage data to gain insight into how consumers use smart devices that are not Bellabeat products.
How has the use of smart devices evolved recently?
Wearable fitness technology, including devices such as Fitbits and
smartwatches, has established itself as a viable niche in the healthcare
market. Consumers' interest in tracking their own health and vital signs
has led to a tripling in the adoption of wearable devices over the past
four years. Wearables are predicted to remain popular over the next few
years as more people become open to sharing their health data with
healthcare professionals and insurers. Insider Intelligence predicted
in October 2021 that the US smart wearable user market would expand
25.5% YoY in 2023, up from 23.3% YoY growth in 2021. (Source: SmartDevices
Evolution)
Find the key differences between Fitbit users and Bellabeat users and how digital media and other factors could influence them.
Urška Sršen: Bellabeat's cofounder and Chief Creative Officer.
Sandor Mur: Mathematician and Bellabeat cofounder; key member of the Bellabeat executive team.
Bellabeat marketing analytics team: A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Bellabeat's marketing strategy.
The data is located in a Kaggle dataset.
This dataset is made available through Mobius.
This dataset is hosted on Kaggle and was made public by the user
Mobius in an open-source format. The data is therefore public and may
be copied, updated, and distributed without asking the user for
permission. According to reports, the datasets were generated by
respondents to a distributed survey conducted via Amazon Mechanical
Turk between March 12 and May 12, 2016. Thirty eligible Fitbit
users reportedly (see the credibility section immediately below) agreed to
submit personal tracker data, including information about their daily
activity (number of steps walked, calories burned, time awake, heart
rate, and distance traveled). This information was recorded by the
minute, the hour, and the day, and is spread across eighteen CSV
files. After saving all 18 files to my laptop, I decided to use
just 3 of them because they contain the daily activity, sleep, and
weight log information. For security purposes, the remaining files
were deleted. The 3 files used for further analysis
are:
sleepDay_merged.csv
dailyActivity_merged.csv
weightLogInfo_merged.csv
Confirming the ROCCC process:
Reliable: No. The sample is small (33 participants), and a
sample this small increases the likelihood of statistical error.
Original: No. The dataset was generated through a third-party
service, Amazon Mechanical Turk.
Comprehensive: Partly. The information is highly relevant to the
sleep and activity features of the Bellabeat Leaf product but does
not cover any other features.
Current: No. The data is about 7 years old, so it may no longer
be relevant.
Cited: Yes. The source is referenced, and the information was
gathered without revealing any personal details.
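To give the "statistical error" point a rough scale, the sketch below computes the textbook margin of error for a proportion estimated from 33 participants. It assumes a simple random sample, which this dataset does not guarantee, and `margin_of_error` is a hypothetical helper written only for this illustration.

```r
# Rough margin of error for a proportion estimated from n participants,
# assuming a simple random sample (an assumption, not a property of this data).
margin_of_error <- function(n, p = 0.5, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)  # two-sided critical value
  z * sqrt(p * (1 - p) / n)
}

# Worst case (p = 0.5) for the 33 participants in the activity table.
moe_33 <- margin_of_error(33)
round(moe_33, 3)
```

Under these assumptions the margin is roughly 17 percentage points, which is one way to see why conclusions drawn from this sample should be treated as indicative only.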
Aside from the Id and LogId numbers, the data contains no personal information, so there are no privacy concerns to address and the participants remain anonymous. That said, I do not know the age, gender, ethnicity, or status of these participants, so I cannot assess how biased the sample is. Note: Overall, this is not a dataset of sufficient quality to support actual business recommendations.
Clean and format the data to make it more understandable and readable. At this point, the data has been organized by adding columns, extracting relevant information, and removing errors and duplicates. To keep things straightforward, I did all of the work in R. By reading the CSV files into tables and then joining those tables on their common columns, I could handle the full set of files and run the necessary queries.
I uploaded the CSV files to my project from the relevant data sources mentioned above.
install.packages("tidyverse")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
install.packages("lubridate")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
install.packages("janitor")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
install.packages("dplyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
install.packages("ggplot2")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
install.packages("readr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
install.packages("tidyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.1 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(janitor)
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(dplyr)
library(ggplot2)
library(tidyr)
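The attach message above shows that tidyverse already loads dplyr, ggplot2, readr, tidyr, and lubridate, so the setup can be shortened. The following is a minimal sketch of a check that lists only the packages that still need installing; `missing_packages` is a hypothetical helper, not part of any library.

```r
# Packages this analysis needs; the others are attached by tidyverse itself.
needed <- c("tidyverse", "janitor")

# Return the entries of `needed` that are not in `installed`.
# By default, `installed` is the set of packages available in this R library.
missing_packages <- function(needed, installed = rownames(installed.packages())) {
  setdiff(needed, installed)
}

to_install <- missing_packages(needed)
to_install
```

Passing a non-empty `to_install` to `install.packages()` would then install only what is missing instead of reinstalling every package on each run.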
Here I load three CSV files (dailyActivity_merged.csv, sleepDay_merged.csv, and weightLogInfo_merged.csv) instead of all 18. The reason is that dailyActivity_merged.csv already contains many of the fields found in the other tables, e.g. calories, intensity, distance, and steps recorded on a daily basis. To avoid duplicating data, I work with only these 3 datasets.
dailyActivity <- read_csv("/cloud/project/Fitabase Data 4.12.16 - 5.12.16/dailyActivity_merged.csv")
## Rows: 940 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDate
## dbl (14): Id, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDi...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
(dailyActivity)
## # A tibble: 940 × 15
## Id Activity…¹ Total…² Total…³ Track…⁴ Logge…⁵ VeryA…⁶ Moder…⁷ Light…⁸
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1503960366 4/12/2016 13162 8.5 8.5 0 1.88 0.550 6.06
## 2 1503960366 4/13/2016 10735 6.97 6.97 0 1.57 0.690 4.71
## 3 1503960366 4/14/2016 10460 6.74 6.74 0 2.44 0.400 3.91
## 4 1503960366 4/15/2016 9762 6.28 6.28 0 2.14 1.26 2.83
## 5 1503960366 4/16/2016 12669 8.16 8.16 0 2.71 0.410 5.04
## 6 1503960366 4/17/2016 9705 6.48 6.48 0 3.19 0.780 2.51
## 7 1503960366 4/18/2016 13019 8.59 8.59 0 3.25 0.640 4.71
## 8 1503960366 4/19/2016 15506 9.88 9.88 0 3.53 1.32 5.03
## 9 1503960366 4/20/2016 10544 6.68 6.68 0 1.96 0.480 4.24
## 10 1503960366 4/21/2016 9819 6.34 6.34 0 1.34 0.350 4.65
## # … with 930 more rows, 6 more variables: SedentaryActiveDistance <dbl>,
## # VeryActiveMinutes <dbl>, FairlyActiveMinutes <dbl>,
## # LightlyActiveMinutes <dbl>, SedentaryMinutes <dbl>, Calories <dbl>, and
## # abbreviated variable names ¹ActivityDate, ²TotalSteps, ³TotalDistance,
## # ⁴TrackerDistance, ⁵LoggedActivitiesDistance, ⁶VeryActiveDistance,
## # ⁷ModeratelyActiveDistance, ⁸LightActiveDistance
sleepDay <- read_csv("/cloud/project/Fitabase Data 4.12.16 - 5.12.16/sleepDay_merged.csv")
## Rows: 413 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): SleepDay
## dbl (4): Id, TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
(sleepDay)
## # A tibble: 413 × 5
## Id SleepDay TotalSleepRecords TotalMinutesAsleep Total…¹
## <dbl> <chr> <dbl> <dbl> <dbl>
## 1 1503960366 4/12/2016 12:00:00 AM 1 327 346
## 2 1503960366 4/13/2016 12:00:00 AM 2 384 407
## 3 1503960366 4/15/2016 12:00:00 AM 1 412 442
## 4 1503960366 4/16/2016 12:00:00 AM 2 340 367
## 5 1503960366 4/17/2016 12:00:00 AM 1 700 712
## 6 1503960366 4/19/2016 12:00:00 AM 1 304 320
## 7 1503960366 4/20/2016 12:00:00 AM 1 360 377
## 8 1503960366 4/21/2016 12:00:00 AM 1 325 364
## 9 1503960366 4/23/2016 12:00:00 AM 1 361 384
## 10 1503960366 4/24/2016 12:00:00 AM 1 430 449
## # … with 403 more rows, and abbreviated variable name ¹TotalTimeInBed
weightLogInfo <- read_csv("/cloud/project/Fitabase Data 4.12.16 - 5.12.16/weightLogInfo_merged.csv")
## Rows: 67 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Date
## dbl (6): Id, WeightKg, WeightPounds, Fat, BMI, LogId
## lgl (1): IsManualReport
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
(weightLogInfo)
## # A tibble: 67 × 8
## Id Date WeightKg Weigh…¹ Fat BMI IsMan…² LogId
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <lgl> <dbl>
## 1 1503960366 5/2/2016 11:59:59 PM 52.6 116. 22 22.6 TRUE 1.46e12
## 2 1503960366 5/3/2016 11:59:59 PM 52.6 116. NA 22.6 TRUE 1.46e12
## 3 1927972279 4/13/2016 1:08:52 AM 134. 294. NA 47.5 FALSE 1.46e12
## 4 2873212765 4/21/2016 11:59:59 PM 56.7 125. NA 21.5 TRUE 1.46e12
## 5 2873212765 5/12/2016 11:59:59 PM 57.3 126. NA 21.7 TRUE 1.46e12
## 6 4319703577 4/17/2016 11:59:59 PM 72.4 160. 25 27.5 TRUE 1.46e12
## 7 4319703577 5/4/2016 11:59:59 PM 72.3 159. NA 27.4 TRUE 1.46e12
## 8 4558609924 4/18/2016 11:59:59 PM 69.7 154. NA 27.2 TRUE 1.46e12
## 9 4558609924 4/25/2016 11:59:59 PM 70.3 155. NA 27.5 TRUE 1.46e12
## 10 4558609924 5/1/2016 11:59:59 PM 69.9 154. NA 27.3 TRUE 1.46e12
## # … with 57 more rows, and abbreviated variable names ¹WeightPounds,
## # ²IsManualReport
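The import messages above suggest specifying column types, and the previews show Id read as a number and the dates read as text. As a hedged sketch of what that could look like, the example below reads two rows copied from the dailyActivity preview (reduced to three columns so it is self-contained) with explicit `col_types`. Treating Id as character is a design choice assumed here, not something the original analysis does.

```r
library(readr)

# Two rows copied from the dailyActivity preview, cut down to three columns.
# I() tells read_csv() that this is literal data rather than a file path.
sample_csv <- I(paste(
  "Id,ActivityDate,TotalSteps",
  "1503960366,4/12/2016,13162",
  "1503960366,4/13/2016,10735",
  sep = "\n"
))

activity_sample <- read_csv(
  sample_csv,
  col_types = cols(
    Id = col_character(),                          # an identifier, not a quantity
    ActivityDate = col_date(format = "%m/%d/%Y"),  # parse dates at import time
    TotalSteps = col_double()
  )
)

str(activity_sample)
```

Declaring the types up front also silences the column specification message and makes a later date conversion step unnecessary for columns that hold plain dates.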
count(distinct(dailyActivity, Id))
## # A tibble: 1 × 1
## n
## <int>
## 1 33
The daily activity table contains 33 distinct participant Ids.
count(distinct(sleepDay, Id))
## # A tibble: 1 × 1
## n
## <int>
## 1 24
The sleep table contains 24 distinct participant Ids.
count(distinct(weightLogInfo, Id))
## # A tibble: 1 × 1
## n
## <int>
## 1 8
The weight log contains only 8 distinct participant Ids, which is too few to include in the analysis.
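The three participant counts above can also be produced in one pass. The sketch below does this in base R on toy stand-in tables; the Id values are illustrative and are not taken from the dataset.

```r
# Toy stand-ins for the three tables; only the Id column matters here.
# The values are illustrative, not copied from the Fitbit data.
tables <- list(
  dailyActivity = data.frame(Id = c(1, 1, 2, 3)),
  sleepDay      = data.frame(Id = c(1, 2, 2)),
  weightLogInfo = data.frame(Id = c(3))
)

# Number of distinct participant Ids per table, as a named integer vector.
participants <- vapply(tables, function(df) length(unique(df$Id)), integer(1))
participants
```

With the real tables in the list, the same call would report the 33, 24, and 8 participants side by side.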
str(dailyActivity)
## spc_tbl_ [940 × 15] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Id : num [1:940] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityDate : chr [1:940] "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
## $ TotalSteps : num [1:940] 13162 10735 10460 9762 12669 ...
## $ TotalDistance : num [1:940] 8.5 6.97 6.74 6.28 8.16 ...
## $ TrackerDistance : num [1:940] 8.5 6.97 6.74 6.28 8.16 ...
## $ LoggedActivitiesDistance: num [1:940] 0 0 0 0 0 0 0 0 0 0 ...
## $ VeryActiveDistance : num [1:940] 1.88 1.57 2.44 2.14 2.71 ...
## $ ModeratelyActiveDistance: num [1:940] 0.55 0.69 0.4 1.26 0.41 ...
## $ LightActiveDistance : num [1:940] 6.06 4.71 3.91 2.83 5.04 ...
## $ SedentaryActiveDistance : num [1:940] 0 0 0 0 0 0 0 0 0 0 ...
## $ VeryActiveMinutes : num [1:940] 25 21 30 29 36 38 42 50 28 19 ...
## $ FairlyActiveMinutes : num [1:940] 13 19 11 34 10 20 16 31 12 8 ...
## $ LightlyActiveMinutes : num [1:940] 328 217 181 209 221 164 233 264 205 211 ...
## $ SedentaryMinutes : num [1:940] 728 776 1218 726 773 ...
## $ Calories : num [1:940] 1985 1797 1776 1745 1863 ...
## - attr(*, "spec")=
## .. cols(
## .. Id = col_double(),
## .. ActivityDate = col_character(),
## .. TotalSteps = col_double(),
## .. TotalDistance = col_double(),
## .. TrackerDistance = col_double(),
## .. LoggedActivitiesDistance = col_double(),
## .. VeryActiveDistance = col_double(),
## .. ModeratelyActiveDistance = col_double(),
## .. LightActiveDistance = col_double(),
## .. SedentaryActiveDistance = col_double(),
## .. VeryActiveMinutes = col_double(),
## .. FairlyActiveMinutes = col_double(),
## .. LightlyActiveMinutes = col_double(),
## .. SedentaryMinutes = col_double(),
## .. Calories = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
str(sleepDay)
## spc_tbl_ [413 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Id : num [1:413] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ SleepDay : chr [1:413] "4/12/2016 12:00:00 AM" "4/13/2016 12:00:00 AM" "4/15/2016 12:00:00 AM" "4/16/2016 12:00:00 AM" ...
## $ TotalSleepRecords : num [1:413] 1 2 1 2 1 1 1 1 1 1 ...
## $ TotalMinutesAsleep: num [1:413] 327 384 412 340 700 304 360 325 361 430 ...
## $ TotalTimeInBed : num [1:413] 346 407 442 367 712 320 377 364 384 449 ...
## - attr(*, "spec")=
## .. cols(
## .. Id = col_double(),
## .. SleepDay = col_character(),
## .. TotalSleepRecords = col_double(),
## .. TotalMinutesAsleep = col_double(),
## .. TotalTimeInBed = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
The ActivityDate and SleepDay columns hold dates, but both were read in as character strings, so I convert them to the Date type.
# as.Date() dispatches on character input; text after the date (the time part) is ignored
dailyActivity$ActivityDate <- as.Date(dailyActivity$ActivityDate, format = "%m/%d/%Y")
sleepDay$SleepDay <- as.Date(sleepDay$SleepDay, format = "%m/%d/%Y")
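A failed date conversion in R silently produces NA, so it can be worth checking the result. The following is a minimal sketch of a wrapper that stops when a value cannot be parsed; `parse_mdy` is a hypothetical helper name, and the inputs are the two string formats shown in the previews above.

```r
# Parse month/day/year strings and stop early if any value fails to parse.
# `parse_mdy` is a hypothetical helper used only for this sketch.
parse_mdy <- function(x) {
  parsed <- as.Date(x, format = "%m/%d/%Y")
  failed <- is.na(parsed) & !is.na(x)
  if (any(failed)) {
    stop("Could not parse: ", paste(unique(x[failed]), collapse = ", "))
  }
  parsed
}

# The two formats that appear in the activity and sleep tables.
activity_dates <- parse_mdy(c("4/12/2016", "4/13/2016"))
sleep_dates <- parse_mdy("4/12/2016 12:00:00 AM")  # the time part is ignored

activity_dates
sleep_dates
```

Because the sleep timestamps are all at midnight, dropping the time part loses nothing there; the weight log, whose timestamps vary, would need a date-time type if the time of day mattered.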
head(dailyActivity)
## # A tibble: 6 × 15
## Id ActivityD…¹ Total…² Total…³ Track…⁴ Logge…⁵ VeryA…⁶ Moder…⁷ Light…⁸
## <dbl> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1503960366 2016-04-12 13162 8.5 8.5 0 1.88 0.550 6.06
## 2 1503960366 2016-04-13 10735 6.97 6.97 0 1.57 0.690 4.71
## 3 1503960366 2016-04-14 10460 6.74 6.74 0 2.44 0.400 3.91
## 4 1503960366 2016-04-15 9762 6.28 6.28 0 2.14 1.26 2.83
## 5 1503960366 2016-04-16 12669 8.16 8.16 0 2.71 0.410 5.04
## 6 1503960366 2016-04-17 9705 6.48 6.48 0 3.19 0.780 2.51
## # … with 6 more variables: SedentaryActiveDistance <dbl>,
## # VeryActiveMinutes <dbl>, FairlyActiveMinutes <dbl>,
## # LightlyActiveMinutes <dbl>, SedentaryMinutes <dbl>, Calories <dbl>, and
## # abbreviated variable names ¹ActivityDate, ²TotalSteps, ³TotalDistance,
## # ⁴TrackerDistance, ⁵LoggedActivitiesDistance, ⁶VeryActiveDistance,
## # ⁷ModeratelyActiveDistance, ⁸LightActiveDistance
glimpse(dailyActivity)
## Rows: 940
## Columns: 15
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 150396036…
## $ ActivityDate <date> 2016-04-12, 2016-04-13, 2016-04-14, 2016-04-…
## $ TotalSteps <dbl> 13162, 10735, 10460, 9762, 12669, 9705, 13019…
## $ TotalDistance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8…
## $ TrackerDistance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8…
## $ LoggedActivitiesDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveDistance <dbl> 1.88, 1.57, 2.44, 2.14, 2.71, 3.19, 3.25, 3.5…
## $ ModeratelyActiveDistance <dbl> 0.55, 0.69, 0.40, 1.26, 0.41, 0.78, 0.64, 1.3…
## $ LightActiveDistance <dbl> 6.06, 4.71, 3.91, 2.83, 5.04, 2.51, 4.71, 5.0…
## $ SedentaryActiveDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveMinutes <dbl> 25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 66, 4…
## $ FairlyActiveMinutes <dbl> 13, 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, 21…
## $ LightlyActiveMinutes <dbl> 328, 217, 181, 209, 221, 164, 233, 264, 205, …
## $ SedentaryMinutes <dbl> 728, 776, 1218, 726, 773, 539, 1149, 775, 818…
## $ Calories <dbl> 1985, 1797, 1776, 1745, 1863, 1728, 1921, 203…
head(sleepDay)
## # A tibble: 6 × 5
## Id SleepDay TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
## <dbl> <date> <dbl> <dbl> <dbl>
## 1 1503960366 2016-04-12 1 327 346
## 2 1503960366 2016-04-13 2 384 407
## 3 1503960366 2016-04-15 1 412 442
## 4 1503960366 2016-04-16 2 340 367
## 5 1503960366 2016-04-17 1 700 712
## 6 1503960366 2016-04-19 1 304 320
glimpse(sleepDay)
## Rows: 413
## Columns: 5
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 150…
## $ SleepDay <date> 2016-04-12, 2016-04-13, 2016-04-15, 2016-04-16, 20…
## $ TotalSleepRecords <dbl> 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ TotalMinutesAsleep <dbl> 327, 384, 412, 340, 700, 304, 360, 325, 361, 430, 2…
## $ TotalTimeInBed <dbl> 346, 407, 442, 367, 712, 320, 377, 364, 384, 449, 3…
sleepDay <- rename(sleepDay, date = SleepDay)
dailyActivity <- rename(dailyActivity, date = ActivityDate)
merged_daily_activity <- merge(x = dailyActivity, y = sleepDay, by = c("Id", "date"), all.x = TRUE )
Output of the merge
head(merged_daily_activity)
## Id date TotalSteps TotalDistance TrackerDistance
## 1 1503960366 2016-04-12 13162 8.50 8.50
## 2 1503960366 2016-04-13 10735 6.97 6.97
## 3 1503960366 2016-04-14 10460 6.74 6.74
## 4 1503960366 2016-04-15 9762 6.28 6.28
## 5 1503960366 2016-04-16 12669 8.16 8.16
## 6 1503960366 2016-04-17 9705 6.48 6.48
## LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1 0 1.88 0.55
## 2 0 1.57 0.69
## 3 0 2.44 0.40
## 4 0 2.14 1.26
## 5 0 2.71 0.41
## 6 0 3.19 0.78
## LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 3.91 0 30
## 4 2.83 0 29
## 5 5.04 0 36
## 6 2.51 0 38
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1 13 328 728 1985
## 2 19 217 776 1797
## 3 11 181 1218 1776
## 4 34 209 726 1745
## 5 10 221 773 1863
## 6 20 164 539 1728
## TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
## 1 1 327 346
## 2 2 384 407
## 3 NA NA NA
## 4 1 412 442
## 5 2 340 367
## 6 1 700 712
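A left join like the one above keeps every activity row, but repeated keys in the right-hand table add extra rows to the result. The sketch below illustrates this on toy tables with made-up values, using the same `merge()` call shape; checking for repeated Id/date pairs before merging is an addition assumed here, not a step from the original analysis.

```r
# Toy tables with illustrative values; `sleep` repeats one Id/date pair.
activity <- data.frame(
  Id = c(1, 1, 2),
  date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
  TotalSteps = c(13162, 10735, 9762)
)
sleep <- data.frame(
  Id = c(1, 1, 2),
  date = as.Date(c("2016-04-12", "2016-04-12", "2016-04-12")),
  TotalMinutesAsleep = c(327, 327, 412)
)

keys <- c("Id", "date")

# Count rows whose key already appeared earlier in the sleep table.
dup_keys <- sum(duplicated(sleep[keys]))

# Repeated keys on the right-hand side add rows to a left join.
joined_raw <- merge(activity, sleep, by = keys, all.x = TRUE)

# Dropping exact duplicate rows first keeps one row per activity day.
joined_clean <- merge(activity, unique(sleep), by = keys, all.x = TRUE)

c(duplicate_keys = dup_keys, raw = nrow(joined_raw), clean = nrow(joined_clean))
```

Comparing the row count of the joined table with that of the left-hand table is a quick way to notice this kind of inflation.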
merged_daily_activity <- transform(merged_daily_activity, Weekday = weekdays(date))
glimpse(merged_daily_activity)
## Rows: 943
## Columns: 19
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 150396036…
## $ date <date> 2016-04-12, 2016-04-13, 2016-04-14, 2016-04-…
## $ TotalSteps <dbl> 13162, 10735, 10460, 9762, 12669, 9705, 13019…
## $ TotalDistance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8…
## $ TrackerDistance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8…
## $ LoggedActivitiesDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveDistance <dbl> 1.88, 1.57, 2.44, 2.14, 2.71, 3.19, 3.25, 3.5…
## $ ModeratelyActiveDistance <dbl> 0.55, 0.69, 0.40, 1.26, 0.41, 0.78, 0.64, 1.3…
## $ LightActiveDistance <dbl> 6.06, 4.71, 3.91, 2.83, 5.04, 2.51, 4.71, 5.0…
## $ SedentaryActiveDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveMinutes <dbl> 25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 66, 4…
## $ FairlyActiveMinutes <dbl> 13, 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, 21…
## $ LightlyActiveMinutes <dbl> 328, 217, 181, 209, 221, 164, 233, 264, 205, …
## $ SedentaryMinutes <dbl> 728, 776, 1218, 726, 773, 539, 1149, 775, 818…
## $ Calories <dbl> 1985, 1797, 1776, 1745, 1863, 1728, 1921, 203…
## $ TotalSleepRecords <dbl> 1, 2, NA, 1, 2, 1, NA, 1, 1, 1, NA, 1, 1, 1, …
## $ TotalMinutesAsleep <dbl> 327, 384, NA, 412, 340, 700, NA, 304, 360, 32…
## $ TotalTimeInBed <dbl> 346, 407, NA, 442, 367, 712, NA, 320, 377, 36…
## $ Weekday <chr> "Tuesday", "Wednesday", "Thursday", "Friday",…
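The Weekday column above is plain text, so it sorts alphabetically in tables and plots, and `weekdays()` returns names in the session's language. As a hedged alternative, the sketch below builds an ordered factor from the ISO weekday number; `weekday_factor` and the English labels are assumptions made for this illustration.

```r
# Weekday labels in calendar order, Monday first (English labels are assumed).
day_labels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")

# format(x, "%u") gives the ISO weekday number (1 = Monday), which does not
# depend on the session locale the way weekdays() does.
weekday_factor <- function(dates) {
  idx <- as.integer(format(dates, "%u"))
  factor(day_labels[idx], levels = day_labels, ordered = TRUE)
}

# Two dates that appear in the merged table above.
example_days <- weekday_factor(as.Date(c("2016-04-12", "2016-04-17")))
example_days
```

An ordered factor keeps the days in calendar order when the data is grouped or plotted by weekday.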
sum(duplicated(merged_daily_activity))
## [1] 3
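The check finds 3 duplicated rows, which accounts for the merged table having 943 rows rather than the 940 in dailyActivity. Let's drop these rows to avoid duplicated data. Note that the same step also calls drop_na(), which removes every day that has no matching sleep record.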
merged_daily_activity <- merged_daily_activity %>% distinct() %>% drop_na()
Verifying again that no duplicated rows remain:
sum(duplicated(merged_daily_activity))
## [1] 0
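Because `distinct()` and `drop_na()` run in one pipeline above, it is not visible how many rows each step removes. The sketch below separates the two steps on a toy table with illustrative values, so that the effect of each can be reported on its own.

```r
library(dplyr)
library(tidyr)

# Toy merged table with illustrative values: one exact duplicate row and
# one day without a sleep record.
merged <- tibble(
  Id = c(1, 1, 1, 2),
  date = as.Date(c("2016-04-12", "2016-04-12", "2016-04-14", "2016-04-12")),
  TotalSteps = c(13162, 13162, 10460, 9762),
  TotalMinutesAsleep = c(327, 327, NA, 412)
)

deduplicated <- distinct(merged)       # removes exact duplicate rows
complete     <- drop_na(deduplicated)  # removes rows with any missing value

c(
  start = nrow(merged),
  after_distinct = nrow(deduplicated),
  after_drop_na = nrow(complete)
)
```

On the real data, the second number would show how many activity days are lost because they have no sleep record, which matters for any conclusion that uses only the activity columns.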
The goal of this step is to communicate the insights from this analysis clearly to Bellabeat's stakeholders, so that they can guide future data analysis projects, support the marketing strategy, and promote growth. The most important takeaways are: The Fitbit users recorded step data more than twice as often as they recorded sleep data. If this pattern holds, new data sources may give Bellabeat further opportunities to educate its customers about the relative importance of getting enough sleep and staying active.
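The "more than twice" figure can be checked against the row counts that `read_csv()` reported for the two daily tables. This is a small arithmetic sketch; it compares raw row counts only and does not adjust for the different number of participants in each table.

```r
# Row counts reported when the two daily tables were loaded.
activity_rows <- 940
sleep_rows <- 413

# Daily activity records per daily sleep record.
records_ratio <- activity_rows / sleep_rows
round(records_ratio, 2)
```

The ratio comes out at about 2.3 activity records for every sleep record.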
The more active a person is, as measured by total active time, the more calories that person burns. The number of steps taken and the distance traveled are both positively associated with the calories expended. The graph also suggests that people burned more calories when they had enough rest. This supports the theory that being more active not only helps us stay healthy but also benefits our work and personal lives, where we can be more productive and sleep better.
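As a small illustration of the steps and calories relationship, the sketch below computes a Pearson correlation on the six daily records shown in the merged output earlier. Six days from a single participant are far too few to support a conclusion, so this only shows how the check could be run on the full table.

```r
# First six daily records for one participant, copied from the merged output.
first_six <- data.frame(
  TotalSteps = c(13162, 10735, 10460, 9762, 12669, 9705),
  Calories   = c(1985, 1797, 1776, 1745, 1863, 1728)
)

# Pearson correlation between daily steps and daily calories for these rows.
steps_calories_r <- cor(first_six$TotalSteps, first_six$Calories)
steps_calories_r
```

Running the same `cor()` call on the TotalSteps and Calories columns of the full table would give the sample-wide figure, and a correlation on its own does not show that one variable causes the other.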
Bellabeat should advertise the benefits of its products alongside the advantages of walking, running, and other forms of exercise, and should emphasize that Bellabeat products help users monitor and manage a healthy lifestyle by providing the insights and data needed to keep improving and stay active. The app could be kept simple to use and could offer guidance based on the data patterns it records over a one-month period. It is therefore essential to help individuals progress gradually from a sedentary lifestyle to a casually active and then a fairly active one, and to reward consumers for reaching their goals as an incentive to keep going.
This case study gave me a wealth of knowledge and insight into carrying out the analysis process from beginning to end.