Ask

Bellabeat is a small, successful high-tech company founded by Urška Sršen and Sando Mur. It manufactures health-focused smart products designed for women, including the Bellabeat app. An analysis of smart-device usage data can provide health-related insights and show how consumers (women) use non-Bellabeat smart devices. Hence, an analysis of the available consumer data could reveal new growth opportunities for the company.

1. The business project

The task is to identify trends in how people use smart devices and to determine how these insights can benefit Bellabeat's customers.

2. Stakeholders

• Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer

• Sando Mur: Mathematician and Bellabeat’s cofounder; key member of the Bellabeat executive team

• Bellabeat marketing analytics team

Prepare

The dataset for this case study is hosted on Kaggle and was made available through Mobius. The raw dataset can be accessed through this link.

  1. Data Source: The data is a public dataset accessible to everyone. It was generated by thirty eligible Fitbit users who consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring.
  2. Data Organization: The dataset is organized into eighteen (18) CSV files, covering information such as daily activity, weight/body mass, and calories. Each file is keyed by user ID (Id) and stored in long format, with multiple rows per user.
  3. Data Integrity and Limitations: A common rule of thumb is that a sample should contain at least 30 observations to be considered representative of a population. This dataset covers only about thirty users, which is at the very limit of that rule, and some files contain far fewer records (the weight log, for example, has only 67 rows). The sample is therefore small and may be biased. Additionally, the data was collected in 2016 and may be outdated, since people's lifestyles may have changed since then, for example because of the COVID-19 pandemic.

Process

1. Data Cleaning with Spreadsheets

I opened each file of the dataset in Excel and used its functions to remove duplicates and to strip leading, trailing, and repeated spaces from the data. I chose to focus on the core features of the Bellabeat app: users' daily activity, the calories each user burned, and their body mass index (BMI). No duplicates were found in the dataset, as indicated in the image below.

2. Setting up the R environment and summarizing each dataset file

Note: I set up the R environment by loading ‘tidyverse’ and other packages useful for data analysis and visualization.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.5 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.3      ✔ forcats 0.5.2
## Warning: package 'purrr' was built under R version 4.2.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(dplyr)
library(tidyr)
library(here)
## here() starts at C:/Users/MOSES OLUFEMI/Desktop/DOWNLOADED DATA/FitBase_Analysis
library(ggplot2)
library(colorspace)
library(readr)
dailyActivity_merged <-read_csv("dailyActivity_merged.csv")
## Rows: 940 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): ActivityDate
## dbl (14): Id, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDi...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
sleepDay_merged <-read_csv("sleepDay_merged.csv")
## Rows: 413 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): SleepDay
## dbl (4): Id, TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
weightLogInfo_merged <-read_csv("weightLogInfo_merged.csv")
## Rows: 67 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Date
## dbl (6): Id, WeightKg, WeightPounds, Fat, BMI, LogId
## lgl (1): IsManualReport
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
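# ggsave() saves the most recently displayed plot (last_plot()) by default, so each call
# below only saves the intended chart if it is run right after the matching ggplot() call
# in the Share section; alternatively, assign each plot and pass it via the plot argument.
ggsave("dailyActivity_merged2.png")
## Saving 7 x 5 in image
ggsave("Immobility_Pattern.png")
## Saving 7 x 5 in image
ggsave("sleepDay_merged.png")
## Saving 7 x 5 in image
ggsave("DailyActivity.png")
## Saving 7 x 5 in image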

Analyse

Based on an initial look at the data, I assume that most users live a sedentary lifestyle. To confirm this, I first preview the structure of each dataset file and then analyse users' daily activity patterns.

str(dailyActivity_merged)
## spec_tbl_df [940 × 15] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Id                      : num [1:940] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityDate            : chr [1:940] "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
##  $ TotalSteps              : num [1:940] 13162 10735 10460 9762 12669 ...
##  $ TotalDistance           : num [1:940] 8.5 6.97 6.74 6.28 8.16 ...
##  $ TrackerDistance         : num [1:940] 8.5 6.97 6.74 6.28 8.16 ...
##  $ LoggedActivitiesDistance: num [1:940] 0 0 0 0 0 0 0 0 0 0 ...
##  $ VeryActiveDistance      : num [1:940] 1.88 1.57 2.44 2.14 2.71 ...
##  $ ModeratelyActiveDistance: num [1:940] 0.55 0.69 0.4 1.26 0.41 ...
##  $ LightActiveDistance     : num [1:940] 6.06 4.71 3.91 2.83 5.04 ...
##  $ SedentaryActiveDistance : num [1:940] 0 0 0 0 0 0 0 0 0 0 ...
##  $ VeryActiveMinutes       : num [1:940] 25 21 30 29 36 38 42 50 28 19 ...
##  $ FairlyActiveMinutes     : num [1:940] 13 19 11 34 10 20 16 31 12 8 ...
##  $ LightlyActiveMinutes    : num [1:940] 328 217 181 209 221 164 233 264 205 211 ...
##  $ SedentaryMinutes        : num [1:940] 728 776 1218 726 773 ...
##  $ Calories                : num [1:940] 1985 1797 1776 1745 1863 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Id = col_double(),
##   ..   ActivityDate = col_character(),
##   ..   TotalSteps = col_double(),
##   ..   TotalDistance = col_double(),
##   ..   TrackerDistance = col_double(),
##   ..   LoggedActivitiesDistance = col_double(),
##   ..   VeryActiveDistance = col_double(),
##   ..   ModeratelyActiveDistance = col_double(),
##   ..   LightActiveDistance = col_double(),
##   ..   SedentaryActiveDistance = col_double(),
##   ..   VeryActiveMinutes = col_double(),
##   ..   FairlyActiveMinutes = col_double(),
##   ..   LightlyActiveMinutes = col_double(),
##   ..   SedentaryMinutes = col_double(),
##   ..   Calories = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
colnames(dailyActivity_merged)
##  [1] "Id"                       "ActivityDate"            
##  [3] "TotalSteps"               "TotalDistance"           
##  [5] "TrackerDistance"          "LoggedActivitiesDistance"
##  [7] "VeryActiveDistance"       "ModeratelyActiveDistance"
##  [9] "LightActiveDistance"      "SedentaryActiveDistance" 
## [11] "VeryActiveMinutes"        "FairlyActiveMinutes"     
## [13] "LightlyActiveMinutes"     "SedentaryMinutes"        
## [15] "Calories"

The ‘dailyActivity_merged’ file contains 940 rows and 15 columns, with column names Id, ActivityDate, TotalSteps, TotalDistance, etc.

str(sleepDay_merged)
## spec_tbl_df [413 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Id                : num [1:413] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ SleepDay          : chr [1:413] "4/12/2016 12:00:00 AM" "4/13/2016 12:00:00 AM" "4/15/2016 12:00:00 AM" "4/16/2016 12:00:00 AM" ...
##  $ TotalSleepRecords : num [1:413] 1 2 1 2 1 1 1 1 1 1 ...
##  $ TotalMinutesAsleep: num [1:413] 327 384 412 340 700 304 360 325 361 430 ...
##  $ TotalTimeInBed    : num [1:413] 346 407 442 367 712 320 377 364 384 449 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Id = col_double(),
##   ..   SleepDay = col_character(),
##   ..   TotalSleepRecords = col_double(),
##   ..   TotalMinutesAsleep = col_double(),
##   ..   TotalTimeInBed = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
colnames(sleepDay_merged)
## [1] "Id"                 "SleepDay"           "TotalSleepRecords" 
## [4] "TotalMinutesAsleep" "TotalTimeInBed"

The ‘sleepDay_merged’ file contains 413 rows and 5 columns, with column names Id, SleepDay, TotalSleepRecords, etc.

str(weightLogInfo_merged)
## spec_tbl_df [67 × 8] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Id            : num [1:67] 1.50e+09 1.50e+09 1.93e+09 2.87e+09 2.87e+09 ...
##  $ Date          : chr [1:67] "5/2/2016 11:59:59 PM" "5/3/2016 11:59:59 PM" "4/13/2016 1:08:52 AM" "4/21/2016 11:59:59 PM" ...
##  $ WeightKg      : num [1:67] 52.6 52.6 133.5 56.7 57.3 ...
##  $ WeightPounds  : num [1:67] 116 116 294 125 126 ...
##  $ Fat           : num [1:67] 22 NA NA NA NA 25 NA NA NA NA ...
##  $ BMI           : num [1:67] 22.6 22.6 47.5 21.5 21.7 ...
##  $ IsManualReport: logi [1:67] TRUE TRUE FALSE TRUE TRUE TRUE ...
##  $ LogId         : num [1:67] 1.46e+12 1.46e+12 1.46e+12 1.46e+12 1.46e+12 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Id = col_double(),
##   ..   Date = col_character(),
##   ..   WeightKg = col_double(),
##   ..   WeightPounds = col_double(),
##   ..   Fat = col_double(),
##   ..   BMI = col_double(),
##   ..   IsManualReport = col_logical(),
##   ..   LogId = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
colnames(weightLogInfo_merged)
## [1] "Id"             "Date"           "WeightKg"       "WeightPounds"  
## [5] "Fat"            "BMI"            "IsManualReport" "LogId"

The ‘weightLogInfo_merged’ file contains 67 rows and 8 columns, with column names Id, Date, WeightKg, etc.
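To check the sample-size concern raised in the Prepare phase, it can help to count how many distinct users (Ids) each file actually covers, since every user contributes many rows. The sketch below uses a hypothetical helper, `count_users()`, and toy Ids that are not real participants; it then flags any file below the n ≥ 30 rule of thumb.

```r
# Hypothetical helper: number of distinct participants in a data frame with an Id column
count_users <- function(df) length(unique(df$Id))

# Toy stand-ins for the three files (made-up Ids, not real participants)
files <- list(
  daily  = data.frame(Id = c(101, 101, 102, 103, 104)),
  sleep  = data.frame(Id = c(101, 102, 102)),
  weight = data.frame(Id = c(103, 103))
)

# Distinct users per file
users_per_file <- sapply(files, count_users)
print(users_per_file)

# Files whose participant count falls below the n >= 30 rule of thumb
print(names(users_per_file)[users_per_file < 30])
```

On the real data, the same call would be `sapply(list(daily = dailyActivity_merged, sleep = sleepDay_merged, weight = weightLogInfo_merged), count_users)`; dplyr's `n_distinct()` gives the same count inside a pipeline.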

Daily Activity and Distance Pattern

For this part of the analysis, I select only the columns I need from each file I have chosen to work on, then sort and summarize them.

dailyActivity_merged2 <-dailyActivity_merged %>%
  select(TotalDistance, VeryActiveDistance, ModeratelyActiveDistance, LightActiveDistance, Calories)
dailyActivity_merged2 %>%
  group_by(TotalDistance, VeryActiveDistance, ModeratelyActiveDistance, LightActiveDistance, Calories) %>%
  drop_na() %>%
  summarise(TotalDistance, VeryActiveDistance, ModeratelyActiveDistance, LightActiveDistance, Calories)
## `summarise()` has grouped output by 'TotalDistance', 'VeryActiveDistance',
## 'ModeratelyActiveDistance', 'LightActiveDistance', 'Calories'. You can override
## using the `.groups` argument.
## # A tibble: 940 × 5
## # Groups:   TotalDistance, VeryActiveDistance, ModeratelyActiveDistance,
## #   LightActiveDistance, Calories [884]
##    TotalDistance VeryActiveDistance ModeratelyActiveDistance LightActi…¹ Calor…²
##            <dbl>              <dbl>                    <dbl>       <dbl>   <dbl>
##  1             0                  0                        0           0       0
##  2             0                  0                        0           0       0
##  3             0                  0                        0           0       0
##  4             0                  0                        0           0       0
##  5             0                  0                        0           0      57
##  6             0                  0                        0           0     120
##  7             0                  0                        0           0     665
##  8             0                  0                        0           0    1347
##  9             0                  0                        0           0    1347
## 10             0                  0                        0           0    1347
## # … with 930 more rows, and abbreviated variable names ¹​LightActiveDistance,
## #   ²​Calories
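Grouping by every selected column and then calling `summarise()` on those same columns returns the rows themselves (sorted by the grouping columns, as the output above shows) rather than aggregate statistics. A column-wise summary gives one number per measure instead. The sketch below uses only the first five rows as printed by `str()` earlier, an illustrative subset whose displayed values may be rounded.

```r
# First five rows of dailyActivity_merged2, as printed by str() above
# (illustrative subset only; str() may round the displayed values)
daily_sample <- data.frame(
  TotalDistance            = c(8.50, 6.97, 6.74, 6.28, 8.16),
  VeryActiveDistance       = c(1.88, 1.57, 2.44, 2.14, 2.71),
  ModeratelyActiveDistance = c(0.55, 0.69, 0.40, 1.26, 0.41),
  LightActiveDistance      = c(6.06, 4.71, 3.91, 2.83, 5.04),
  Calories                 = c(1985, 1797, 1776, 1745, 1863)
)

# One value per column: the mean, ignoring missing values
column_means <- colMeans(daily_sample, na.rm = TRUE)
print(round(column_means, 2))

# Min, quartiles, median, mean and max for every column
print(summary(daily_sample))
```

In the tidyverse style used elsewhere in this report, `dailyActivity_merged2 %>% summarise(across(everything(), ~ mean(.x, na.rm = TRUE)))` should produce the equivalent one-row summary on the full data.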

The pie chart below, generated in Excel, shows the percentage of their recorded time that Fitbit users spend at each activity level. Based on this summary:

• 81% of users' recorded time is spent in sedentary mode, which amounts to more than 12 hours a day

• 16% of their time is spent being lightly active

• 1% of their time is spent being fairly active

• 2% of their time is spent being very active

• Within their active time, users are mostly lightly active
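The percentages in the pie chart can also be computed in R: sum the minutes at each activity level and divide by the total recorded minutes. The sketch below uses only the first five rows of the minute columns from the `str()` output above, so its shares will not match the chart, which was built from the full dataset.

```r
# Minute columns for the first five rows of dailyActivity_merged (from str() above);
# an illustrative subset, not the full 940 rows behind the Excel pie chart
minutes <- data.frame(
  VeryActiveMinutes    = c(25, 21, 30, 29, 36),
  FairlyActiveMinutes  = c(13, 19, 11, 34, 10),
  LightlyActiveMinutes = c(328, 217, 181, 209, 221),
  SedentaryMinutes     = c(728, 776, 1218, 726, 773)
)

# Total minutes per activity level, then each level's share of all recorded minutes
totals <- colSums(minutes)
share  <- 100 * totals / sum(totals)
print(round(share, 1))

# Base-R pie chart of the same shares
pie(share,
    labels = paste0(names(share), " (", round(share), "%)"),
    main = "Share of recorded minutes by activity level")
```

Running the same two lines (`colSums()` and the division) on `dailyActivity_merged[c("VeryActiveMinutes", "FairlyActiveMinutes", "LightlyActiveMinutes", "SedentaryMinutes")]` would reproduce the chart's percentages from the full data.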

Daily Activity Patterns and Calories

In this part of the analysis phase, I want to find out whether there is a connection between users' daily activity (total distance covered) and the number of calories they burn each day.

DailyActivity_Pattern <-dailyActivity_merged %>%
  select(TotalDistance, Calories)
DailyActivity_Pattern %>%
  group_by(TotalDistance, Calories) %>%
  drop_na() %>%
  summarise(TotalDistance, Calories)
## `summarise()` has grouped output by 'TotalDistance', 'Calories'. You can
## override using the `.groups` argument.
## # A tibble: 940 × 2
## # Groups:   TotalDistance, Calories [884]
##    TotalDistance Calories
##            <dbl>    <dbl>
##  1             0        0
##  2             0        0
##  3             0        0
##  4             0        0
##  5             0       57
##  6             0      120
##  7             0      665
##  8             0     1347
##  9             0     1347
## 10             0     1347
## # … with 930 more rows
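The grouped output above lists distance/calorie pairs but does not measure how strongly they are connected. A correlation coefficient does. The sketch below applies base R's `cor()` to the first five rows from the `str()` output, a subset that is far too small for conclusions and is only meant to show the calls.

```r
# TotalDistance and Calories for the first five rows of dailyActivity_merged (from str() above)
total_distance <- c(8.50, 6.97, 6.74, 6.28, 8.16)
calories       <- c(1985, 1797, 1776, 1745, 1863)

# Pearson measures linear association; Spearman uses ranks and measures monotonic association
pearson  <- cor(total_distance, calories, method = "pearson")
spearman <- cor(total_distance, calories, method = "spearman")
print(round(c(pearson = pearson, spearman = spearman), 3))
```

On the full data, `cor.test(DailyActivity_Pattern$TotalDistance, DailyActivity_Pattern$Calories)` would add a p-value. Keep in mind that each user contributes many days, so the rows are not independent and such a p-value is likely to be optimistic.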

Immobility Pattern

For this part, I want to find out whether there is a correlation between users' sedentary minutes and their corresponding sedentary active distance.

Immobility_Pattern <-dailyActivity_merged %>%
  select(SedentaryMinutes, SedentaryActiveDistance)
Immobility_Pattern %>%
  group_by(SedentaryMinutes, SedentaryActiveDistance) %>%
  drop_na() %>%
  summarise(SedentaryMinutes, SedentaryActiveDistance)
## `summarise()` has grouped output by 'SedentaryMinutes',
## 'SedentaryActiveDistance'. You can override using the `.groups` argument.
## # A tibble: 940 × 2
## # Groups:   SedentaryMinutes, SedentaryActiveDistance [597]
##    SedentaryMinutes SedentaryActiveDistance
##               <dbl>                   <dbl>
##  1                0                       0
##  2                2                       0
##  3               13                       0
##  4               48                       0
##  5              111                       0
##  6              125                       0
##  7              127                       0
##  8              218                       0
##  9              222                       0
## 10              241                       0
## # … with 930 more rows
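Every SedentaryActiveDistance value in the rows printed above is 0. Before reading a correlation or a smoothed trend into this pair, it is worth checking how often the distance is zero and whether it varies at all, because `cor()` returns `NA` (with a warning) when one variable is constant. The sketch below uses only the ten rows shown above.

```r
# The ten rows printed above (SedentaryMinutes sorted ascending)
sed_minutes  <- c(0, 2, 13, 48, 111, 125, 127, 218, 222, 241)
sed_distance <- rep(0, 10)

# Fraction of rows with zero sedentary active distance
zero_share <- mean(sed_distance == 0)
print(zero_share)

# A correlation is undefined when one variable has zero variance
if (var(sed_distance) == 0) {
  message("SedentaryActiveDistance is constant in this subset; correlation is undefined")
} else {
  print(cor(sed_minutes, sed_distance))
}
```

On the full data, `mean(Immobility_Pattern$SedentaryActiveDistance == 0)` gives the share of zero-distance days directly.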

Sleep Analysis

In the sleep analysis, I use three columns from SleepDay_Pattern (TotalSleepRecords, TotalMinutesAsleep, and TotalTimeInBed). I want to use the sleep records to compare users' total time in bed with the total minutes they actually sleep. The time users spend in bed is generally not equal to the time they are asleep, since lying in bed with one's eyes closed does not necessarily mean sleeping. I therefore treat the difference between the two durations (TotalTimeInBed minus TotalMinutesAsleep) as time spent awake in bed, which I loosely refer to as “Insomnia” in this analysis.

SleepDay_Pattern <-sleepDay_merged
print(SleepDay_Pattern)
## # A tibble: 413 × 5
##            Id SleepDay              TotalSleepRecords TotalMinutesAsleep Total…¹
##         <dbl> <chr>                             <dbl>              <dbl>   <dbl>
##  1 1503960366 4/12/2016 12:00:00 AM                 1                327     346
##  2 1503960366 4/13/2016 12:00:00 AM                 2                384     407
##  3 1503960366 4/15/2016 12:00:00 AM                 1                412     442
##  4 1503960366 4/16/2016 12:00:00 AM                 2                340     367
##  5 1503960366 4/17/2016 12:00:00 AM                 1                700     712
##  6 1503960366 4/19/2016 12:00:00 AM                 1                304     320
##  7 1503960366 4/20/2016 12:00:00 AM                 1                360     377
##  8 1503960366 4/21/2016 12:00:00 AM                 1                325     364
##  9 1503960366 4/23/2016 12:00:00 AM                 1                361     384
## 10 1503960366 4/24/2016 12:00:00 AM                 1                430     449
## # … with 403 more rows, and abbreviated variable name ¹​TotalTimeInBed

I separated the date and time in the SleepDay column because I only need the date, and this also makes the date format consistent with the other files.

# SleepDay values look like "4/12/2016 12:00:00 AM", so splitting on ' ' yields three pieces;
# extra = 'drop' discards the trailing "AM" explicitly (only the date is needed) instead of warning
separate(SleepDay_Pattern, SleepDay, into = c('Date', 'Time'), sep = ' ', extra = 'drop')
## # A tibble: 413 × 6
##            Id Date      Time     TotalSleepRecords TotalMinutesAsleep TotalTim…¹
##         <dbl> <chr>     <chr>                <dbl>              <dbl>      <dbl>
##  1 1503960366 4/12/2016 12:00:00                 1                327        346
##  2 1503960366 4/13/2016 12:00:00                 2                384        407
##  3 1503960366 4/15/2016 12:00:00                 1                412        442
##  4 1503960366 4/16/2016 12:00:00                 2                340        367
##  5 1503960366 4/17/2016 12:00:00                 1                700        712
##  6 1503960366 4/19/2016 12:00:00                 1                304        320
##  7 1503960366 4/20/2016 12:00:00                 1                360        377
##  8 1503960366 4/21/2016 12:00:00                 1                325        364
##  9 1503960366 4/23/2016 12:00:00                 1                361        384
## 10 1503960366 4/24/2016 12:00:00                 1                430        449
## # … with 403 more rows, and abbreviated variable name ¹​TotalTimeInBed
SleepDay_Pattern %>%
  group_by(TotalMinutesAsleep, TotalTimeInBed) %>%
  drop_na() %>%
  summarise(TotalMinutesAsleep, TotalTimeInBed)
## `summarise()` has grouped output by 'TotalMinutesAsleep', 'TotalTimeInBed'. You
## can override using the `.groups` argument.
## # A tibble: 413 × 2
## # Groups:   TotalMinutesAsleep, TotalTimeInBed [407]
##    TotalMinutesAsleep TotalTimeInBed
##                 <dbl>          <dbl>
##  1                 58             61
##  2                 59             65
##  3                 61             69
##  4                 62             65
##  5                 74             75
##  6                 74             78
##  7                 77             77
##  8                 79             82
##  9                 82             85
## 10                 98            107
## # … with 403 more rows
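Since only the date is needed, another option is to parse SleepDay directly with base R's `as.Date()`. According to R's `strptime` documentation, input is read only as far as the format requires and trailing characters are ignored, so the time part needs no splitting. The sketch below uses the first three SleepDay values printed above and also shows that ActivityDate parses to the same Date type.

```r
# SleepDay values as printed above for user 1503960366 (first three rows)
sleep_day <- c("4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM", "4/15/2016 12:00:00 AM")

# Only month/day/year is read; the trailing " 12:00:00 AM" is ignored
sleep_date <- as.Date(sleep_day, format = "%m/%d/%Y")
print(sleep_date)

# ActivityDate in dailyActivity_merged ("4/12/2016") parses to the same Date type,
# so the two files can be joined on Id and date
activity_date <- as.Date("4/12/2016", format = "%m/%d/%Y")
print(activity_date == sleep_date[1])
```

On the full data this would be `sleepDay_merged$Date <- as.Date(sleepDay_merged$SleepDay, format = "%m/%d/%Y")`, and likewise for `dailyActivity_merged$ActivityDate`.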

The column chart below, generated in Excel, shows sleep-day durations: the time Fitbit users spend in bed compared with the time they actually spend asleep. Based on this summary:

• Beyond the expected difference between time in bed and time asleep, the chart below underscores that Fitbit users spend more time in bed than asleep. This suggests that users do not fall asleep as soon as they get into bed.
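The gap between time in bed and time asleep can be quantified per sleep day. The sketch below computes the minutes awake in bed and the share of time in bed actually spent asleep (often called sleep efficiency) for the ten rows of user 1503960366 printed earlier. It is a descriptive measure only and cannot diagnose a sleep disorder.

```r
# TotalMinutesAsleep and TotalTimeInBed for the ten rows of user 1503960366 printed above
minutes_asleep <- c(327, 384, 412, 340, 700, 304, 360, 325, 361, 430)
time_in_bed    <- c(346, 407, 442, 367, 712, 320, 377, 364, 384, 449)

# Minutes spent in bed but not asleep on each sleep day
awake_in_bed <- time_in_bed - minutes_asleep
print(awake_in_bed)

# Average awake time, and the share of time in bed actually spent asleep
mean_awake       <- mean(awake_in_bed)
sleep_efficiency <- sum(minutes_asleep) / sum(time_in_bed)
print(mean_awake)
print(round(100 * sleep_efficiency, 1))
```

On the full data, `sleepDay_merged$TotalTimeInBed - sleepDay_merged$TotalMinutesAsleep` gives the same difference for every sleep record.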

Daily Pattern and BMI

For the BMI analysis, I chose three columns (WeightKg, BMI, and IsManualReport) from weightLogInfo_merged and stored them as DailyActivity. IsManualReport indicates whether a weight entry was logged manually; I use it to compare users' weight (kg) with their body mass index (BMI), and to see whether users' daily activity patterns are connected to their body size. Before proceeding, I'd like to briefly distinguish between weight and body mass:

• Body mass is the quantity of matter a body or object contains, measured in kilograms.

• Weight is the force of gravity acting on that mass, i.e., mass multiplied by the acceleration due to gravity, measured in newtons.

Strictly speaking, the WeightKg column, being in kilograms, therefore records body mass, and that is the quantity BMI is calculated from.

DailyActivity <-weightLogInfo_merged %>%
  select(WeightKg, BMI, IsManualReport)
print(DailyActivity)
## # A tibble: 67 × 3
##    WeightKg   BMI IsManualReport
##       <dbl> <dbl> <lgl>         
##  1     52.6  22.6 TRUE          
##  2     52.6  22.6 TRUE          
##  3    134.   47.5 FALSE         
##  4     56.7  21.5 TRUE          
##  5     57.3  21.7 TRUE          
##  6     72.4  27.5 TRUE          
##  7     72.3  27.4 TRUE          
##  8     69.7  27.2 TRUE          
##  9     70.3  27.5 TRUE          
## 10     69.9  27.3 TRUE          
## # … with 57 more rows

Here, I sort the rows by WeightKg, then BMI, then IsManualReport, so that the records run from the lowest to the highest weight.

DailyActivity %>%
  arrange(WeightKg, BMI, IsManualReport)
## # A tibble: 67 × 3
##    WeightKg   BMI IsManualReport
##       <dbl> <dbl> <lgl>         
##  1     52.6  22.6 TRUE          
##  2     52.6  22.6 TRUE          
##  3     56.7  21.5 TRUE          
##  4     57.3  21.7 TRUE          
##  5     61    23.8 TRUE          
##  6     61    23.8 TRUE          
##  7     61.1  23.9 TRUE          
##  8     61.2  23.9 TRUE          
##  9     61.2  23.9 TRUE          
## 10     61.2  23.9 TRUE          
## # … with 57 more rows
DailyActivity %>%  
  group_by(WeightKg, BMI, IsManualReport) %>%
  drop_na() %>%
  summarise(WeightKg, BMI, IsManualReport)
## `summarise()` has grouped output by 'WeightKg', 'BMI', 'IsManualReport'. You
## can override using the `.groups` argument.
## # A tibble: 67 × 3
## # Groups:   WeightKg, BMI, IsManualReport [36]
##    WeightKg   BMI IsManualReport
##       <dbl> <dbl> <lgl>         
##  1     52.6  22.6 TRUE          
##  2     52.6  22.6 TRUE          
##  3     56.7  21.5 TRUE          
##  4     57.3  21.7 TRUE          
##  5     61    23.8 TRUE          
##  6     61    23.8 TRUE          
##  7     61.1  23.9 TRUE          
##  8     61.2  23.9 TRUE          
##  9     61.2  23.9 TRUE          
## 10     61.2  23.9 TRUE          
## # … with 57 more rows
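As with the earlier summaries, grouping by all three columns returns the rows rather than a summary. To compare manually and automatically logged records, one summary row per IsManualReport value is more informative. The sketch below uses the ten rows printed earlier; since `print()` shows 133.5 as 134, the `str()` value 133.5 is used for that row.

```r
# First ten rows of DailyActivity (WeightKg, BMI, IsManualReport) as printed above
weight_sample <- data.frame(
  WeightKg       = c(52.6, 52.6, 133.5, 56.7, 57.3, 72.4, 72.3, 69.7, 70.3, 69.9),
  BMI            = c(22.6, 22.6, 47.5, 21.5, 21.7, 27.5, 27.4, 27.2, 27.5, 27.3),
  IsManualReport = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE)
)

# Number of logs per report type
counts <- table(weight_sample$IsManualReport)
print(counts)

# Mean weight and BMI per report type
means <- aggregate(cbind(WeightKg, BMI) ~ IsManualReport, data = weight_sample, FUN = mean)
print(means)
```

The dplyr equivalent on the full data would be `DailyActivity %>% group_by(IsManualReport) %>% summarise(n = n(), mean_weight = mean(WeightKg), mean_bmi = mean(BMI))`.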

Share (Visualizations)

To share the general trends in the analyzed data, I used smoothed trend curves (ggplot2's geom_smooth()) throughout. The graphs show the trends and relationships between the parameters analyzed so far.

Plots showing the relationship between Calories and VeryActiveDistance, ModeratelyActiveDistance, and LightActiveDistance, by plotting “Calories” against the distance covered at each activity level

ggplot(data= dailyActivity_merged2)+
  geom_smooth(aes(x= VeryActiveDistance, y=Calories, group=1, color='VeryActiveDistance'))+
  geom_smooth(aes(x= ModeratelyActiveDistance, y=Calories, group=2, color='ModeratelyActiveDistance'))+
  geom_smooth(aes(x= LightActiveDistance, y=Calories, group=3, color='LightActiveDistance'))+
  ylab('Calories')+
  xlab('Distance covered at each activity level')+
  labs(title= "BellaBeat: Energy_Unit(Calories) vs Distance Covered", subtitle= "Report on Daily Activeness")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
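The loess curves above describe the shape of each trend but give no single number to compare the three activity levels. One option, sketched here as an illustration rather than as the method used for the chart, is to reshape the data to long format (one row per day and activity level) and fit a simple linear model of Calories on Distance for each level. With only the five `str()` rows used below, the slopes are not meaningful; the point is the workflow.

```r
# First five rows of dailyActivity_merged2 (values as printed by str() above)
d <- data.frame(
  VeryActiveDistance       = c(1.88, 1.57, 2.44, 2.14, 2.71),
  ModeratelyActiveDistance = c(0.55, 0.69, 0.40, 1.26, 0.41),
  LightActiveDistance      = c(6.06, 4.71, 3.91, 2.83, 5.04),
  Calories                 = c(1985, 1797, 1776, 1745, 1863)
)

# Reshape to long format: one row per (day, activity level) pair
dist_cols <- c("VeryActiveDistance", "ModeratelyActiveDistance", "LightActiveDistance")
long <- stack(d[dist_cols])                 # columns: values, ind
names(long) <- c("Distance", "ActivityLevel")
long$Calories <- rep(d$Calories, times = length(dist_cols))

# Fit Calories ~ Distance separately for each activity level and keep the slope
slopes <- sapply(split(long, long$ActivityLevel), function(g) {
  coef(lm(Calories ~ Distance, data = g))[["Distance"]]
})
print(round(slopes, 1))
```

The long format also lets a single `geom_smooth(aes(x = Distance, y = Calories, color = ActivityLevel))` draw all three curves; in the tidyverse, `tidyr::pivot_longer()` performs the same reshape.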

The graph above shows the trend of calories burned against three distance measures: LightActiveDistance, ModeratelyActiveDistance, and VeryActiveDistance. Very active distance is associated with the highest calorie burn, light active distance with less, and moderately active distance with the lowest. Although each user burns calories at a different rate, the graph shows that calories burned per day generally fall as distance decreases. This suggests that Fitbit users could burn more calories by spending less time sitting down or immobile.

ggplot(data= Immobility_Pattern)+
  geom_smooth(mapping=aes(x= SedentaryMinutes, y= SedentaryActiveDistance, color= "Immobility Trend"))+
  labs(title= "BellaBeat: SedentaryActiveDistance vs SedentaryMinutes", subtitle= "Immobility Pattern")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

The plot above shows the trend of Fitbit users' sedentary active distance against their sedentary minutes. It indicates that users cover very little distance in sedentary mode even though they spend a large amount of time in it. This suggests that very few calories are burned during these long immobile periods.

ggplot(data= sleepDay_merged)+
  # each color label names the variable plotted on the x axis of that curve
  geom_smooth(aes(x= TotalMinutesAsleep, y= TotalSleepRecords, group=1, color='TotalMinutesAsleep'))+
  geom_smooth(aes(x= TotalTimeInBed, y= TotalSleepRecords, group=2, color='TotalTimeInBed'))+
  ylab('TotalSleepRecords')+
  xlab('SleepDuration')+
  labs(title= "BellaBeat: TotalSleepRecords vs Sleep Duration", subtitle= "Sleep Pattern")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

The plots above show the relationship between TotalTimeInBed and TotalMinutesAsleep by plotting “TotalSleepRecords” against “SleepDuration”. They illustrate that, for the same number of sleep records, Fitbit users spend more time in bed than asleep (the gap referred to here as “Insomnia”). Both trends also appear to rise with the total sleep time recorded.

ggplot(data= DailyActivity)+
  geom_smooth(mapping=aes(x= WeightKg, y= BMI, linetype= IsManualReport))+
  labs(title= "BellaBeat: WeightKg vs BodyMassIndex(BMI)", subtitle= "Manual Report on BodySize")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

This plot uses IsManualReport to show the relationship between Fitbit users' weight and their body mass index (BMI). In the manually reported (“TRUE”) records, users weighing less than about 73 kg have a BMI below 25. The “FALSE” records do not appear to follow the same pattern, so a higher weight does not always mean a correspondingly higher BMI, because BMI also depends on height.
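The role of height follows from the standard definition BMI = mass (kg) / height (m)². The dataset has no height column, so the sketch below backs out the height implied by a few WeightKg/BMI pairs from the `str()` output (rounded values, so the heights are approximate). It shows why two users of different weight can share a BMI, or vice versa.

```r
# WeightKg and BMI for three rows of weightLogInfo_merged (from str() above; rounded values)
weight_kg <- c(52.6, 133.5, 56.7)
bmi       <- c(22.6, 47.5, 21.5)

# BMI = mass (kg) / height (m)^2, so the implied height is sqrt(mass / BMI)
implied_height_m <- sqrt(weight_kg / bmi)
print(round(implied_height_m, 2))

# Recomputing BMI from mass and implied height recovers the logged values
print(weight_kg / implied_height_m^2)
```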

Act(Recommendation)

Bellabeat aims to help its users (women) manage their health and fitness through its smart devices. We observed that most of the Fitbit users in this dataset live a sedentary lifestyle. Prolonged sedentary behavior is associated with many chronic diseases, whereas regular exercise helps people stay fit. To help its users create better and healthier lifestyles, Bellabeat should encourage them to reduce their sedentary time and increase their activity level. Based on this analysis, reducing sedentary time can help users:

  1. Burn more calories and keep their body mass index (BMI) within a healthy range. We found that users who spend more time on physical activity and less time being sedentary tend to burn more calories, which can help them keep fit.
  2. Stay active and alert in emergency situations, building sharper responses and sensitivity to external stimuli. This benefits users who exercise regularly rather than spending long periods in sedentary mode.
  3. Lower their risk of chronic conditions such as heart disease and obesity.