The BellaBeat Smart Wellness Tracker

Osorio O. Matucurane

Overview

The Bellabeat Wellness Smart Tracker
Business Problem Statement
Ascertaining the Data Quality
Data Exploration with R Programming
Summarizing the Results
Communicate the Findings with GGPLOT2
Final Considerations

BellaBeat - Wellness and Readiness Tracker

Bellabeat is a smart-tech wellness brand dedicated to women’s health
An absolute “health & wellness game changer”
Fashionable health trackers designed and engineered for women
Versatile: it can be worn on the wrist, collar, or neck, or clipped onto clothes

The Smart Tracker Regarded Best in the Market

The Bellabeat fashionable smart jewelry tracker has no display
The tracker is fitted with sensors and syncs with an app

What Bellabeat Really Tracks

The tracker works 24/7 whether you’re sleeping, being active, or meditating.
Tracking and monitoring biometric data (respiratory rate, resting heart rate and VHR) and sleep pattern
Tracking and monitoring lifestyle data such as steps and distance moved

Business Statement

Explore daily usage data from the Bellabeat fitness tracker app to identify trends and patterns, and gather sufficient evidence to inform and empower data-driven decision making.

Bellabeat has no substantial evidence on how customers actually use its products
Bellabeat lacks feedback on which features its customers value most

Research Questions

What are some trends in smart device usage?
How could these trends apply to Bellabeat customers?
How could these trends help influence Bellabeat’s marketing strategy?

Data Analysis Tool

R is the statistical software chosen for this data analysis.
The following packages/libraries will be used.

Code

#|warning: false
#|message: false
#|error: false 

library(readr)
library(tidyr)
library(dplyr)
library(lubridate)
library(hms)
library(forcats)
library(ggplot2)
library(ggthemes)
library(RColorBrewer)
library(viridis)
library(gt)
library(scales)
library(plotly)
library(summarytools)
library(janitor)
library(flextable)
library(knitr)
library(glue)
library(tibble)

Importing and Loading Datasets

Next we import 7 datasets into RStudio and perform some basic preliminary data preparation by chaining the following steps:

Specify variable/column data types
Rename some variables
Create new variables (month and weekday)
Tidy (reshape) the data - keeping each column as a variable and each row as an observation
Group (bin) - transform numerical variables into categories
Each dataset is loaded and assigned to a single R object

Code

#|warning: false
setwd("C:\\Users\\USER\\Documents\\DataAnalytics\\Projects\\FitBitFitness\\Dataset")

# daily activity dataset
active <- read_csv("dailyActivity_merged.csv",
  col_types = cols(
    Id = col_character(),
    ActivityDate = col_date(format = "%m/%d/%Y")
  )
) %>%
  dplyr::rename(
    tracker_id = Id,
    tracker_date = ActivityDate,
    tracker_steps = TotalSteps,
    total_distance = TrackerDistance,
    high_dist = VeryActiveDistance,
    moder_dist = ModeratelyActiveDistance,
    light_dist = LightActiveDistance,
    sedent_dist = SedentaryActiveDistance,
    long_tm = VeryActiveMinutes,
    fair_tm = FairlyActiveMinutes,
    light_tm = LightlyActiveMinutes,
    sedent_tm = SedentaryMinutes,
    logged_kms = LoggedActivitiesDistance,
    calor_burnt = Calories
  ) %>%
  pivot_longer(
    cols = ends_with("dist"),
    names_to = "activ_move",
    values_to = "activ_distance"
  ) %>%
  pivot_longer(
    cols = ends_with("tm"),
    names_to = "activ_duration",
    values_to = "activ_time"
  ) %>%
  mutate(
    active_month = month(tracker_date, label = TRUE),
    active_day = wday(tracker_date, label = TRUE),
    active_wday = as.factor(if_else((active_day == "Sun" | active_day == "Sat"), "weekend", "busday")),
    activ_move = as.factor(activ_move),
    activ_duration = as.factor(activ_duration)
  ) %>%
  mutate(nr_steps = as.factor(case_when(
    tracker_steps < 5000 ~ "sedent",
    tracker_steps < 7500 ~ "active",
    tracker_steps <= 10000 ~ "moder_act",
    tracker_steps <= 12500 ~ "hyper_act",
    TRUE ~ "athlete"
  )), .after = 3)

Code

#|warning: false
setwd("C:\\Users\\USER\\Documents\\DataAnalytics\\Projects\\FitBitFitness\\Dataset")


# 1. sleep dataset
sleep <- read_csv("sleepDay_merged.csv",
  col_types = cols(
    Id = col_character()
  )
) %>%
  mutate(
    sleep_date = as.Date(SleepDay, format = "%m/%d/%Y %H:%M:%S"),
    sleep_time = parse_date_time(SleepDay, "%m/%d/%Y %I:%M:%S %p"),
    sleep_hm = format(as.POSIXct(sleep_time), format = "%H:%M"),
    sleep_hms = as_hms(sleep_time),
    sleep_month = month(sleep_date, label = TRUE),
    sleep_day = wday(sleep_date, label = TRUE)
  ) %>%
  rename(
    tracker_id = Id,
    sleep_duration = TotalMinutesAsleep,
    count_sleep = TotalSleepRecords
  ) %>% select(-(SleepDay))

# 2. heartrate dataset
hrate <- read_csv("heartrate_seconds_merged.csv",
  col_types = cols(
    Id = col_character()
  )
) %>%
  mutate(
    hrate_date = as.Date(Time, format = "%m/%d/%Y %H:%M:%S"),
    hrate_time = parse_date_time(Time, "%m/%d/%Y %I:%M:%S %p"),
    hrate_hm = format(as.POSIXct(hrate_time), format = "%H:%M"),
    hrate_hms = as_hms(hrate_time),
    hrate_month = month(hrate_date, label = TRUE),
    hrate_day = wday(hrate_date, label = TRUE)
  ) %>%
  rename(
    tracker_id = Id
  ) %>%
  select(-Time)

# 3. weight dataset
weight <- read_csv("weightLogInfo_merged.csv",
  col_types = cols(
    Id = col_character()
  )
) %>%
  mutate(
    weight_date = as.Date(Date, format = "%m/%d/%Y %H:%M:%S"),
    weight_dtime = parse_date_time(Date, "%m/%d/%Y %I:%M:%S %p"),
    weight_time = as_hms(weight_dtime),
    weight_month = month(weight_date, label = TRUE),
    weight_day = wday(weight_date, label = TRUE)

  ) %>%
  rename(
    tracker_id = Id
  ) %>%
  select(-c(weight_dtime, Date))

# 4. daily calories burnt dataset
calories <- read_csv("dailyCalories_merged.csv",
   col_types = cols(
     Id = col_character(),
     ActivityDay = col_date(format = "%m/%d/%Y")
   )
 ) %>%
   rename(
     tracker_id = Id,
     calor_date = ActivityDay
   ) %>%
   mutate(
     calor_month = month(calor_date, label = TRUE),
     calor_day = wday(calor_date, label = TRUE))

# 5. daily Intensity dataset

intensities <- read_csv("dailyIntensities_merged.csv",
  col_types = cols(
    Id = col_character(),
    ActivityDay = col_date(format = "%m/%d/%Y")
  )
) %>%
  rename(
    tracker_id = Id,
    intensit_date = ActivityDay
  ) %>%
  mutate(
    intensit_month = month(intensit_date, label = TRUE),
    intensit_day = wday(intensit_date, label = TRUE))

# 6. daily Steps dataset

steps <- read_csv("dailySteps_merged.csv",
  col_types = cols(
    Id = col_character(),
    ActivityDay = col_date(format = "%m/%d/%Y")
  )
) %>%
  rename(
    tracker_id = Id,
    step_date = ActivityDay
  ) %>%
  mutate(
    step_month = month(step_date, label = TRUE),
    step_day = wday(step_date, label = TRUE)
  )

Setting the Theme

We set a common custom theme for all subsequent charts and plots.

Code

#|warning: false
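# theme_set() applies the theme globally and returns the previous theme,
# so my_theme holds the old default (useful for restoring it later)
my_theme <- theme_set(theme_classic() +
  theme(
    plot.subtitle = element_text(
      hjust = 0.5,
      size = 14,
      color = "skyblue",
      face = "bold",
      family = "Times"
    ),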
    plot.caption = element_text(
      hjust = 1,
      size = 12, color = "grey",
      face = "italic"
    ),
    plot.title = element_text(
      hjust = 0.5,
      size = 16,
      color = "skyblue",
      face = "bold",
      family = "Tahoma"
    ),
    plot.tag = element_text(
      size = 14,
      color = "grey",
      face = "bold"
    ),
    axis.title = element_text(
      color = "steelblue",
      face = "bold",
      size = 15
    ),
    axis.line =  element_line(linewidth = 1.5, color = "lightgrey"),
    axis.text = element_text(
      face = "bold",
      color = "#993333",
      size = 14
    ),
    legend.title = element_blank(),
    legend.position = "top"
  ))

Dataset Dimensions (`nrows, ncolumns`)

How many observations and variables are in each dataset?

activities dataset: 15040, 15
intensities dataset: 940, 12
steps tracked dataset: 940, 5
calories burnt dataset: 940, 5
weight dataset: 67, 11
sleep dataset: 413, 10
heart rate dataset: 2483658, 8
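
The 15,040 rows of the activities dataset follow from the two successive `pivot_longer()` calls in the import step: each daily record becomes 4 distance rows, and each of those becomes 4 duration rows, so 940 daily records give 940 x 4 x 4 = 15,040 rows. A minimal sketch on one made-up record (the values and the `toy` objects are illustrative only, not taken from the data):

```r
library(dplyr)
library(tidyr)

# One hypothetical daily record with four distance and four time columns
toy <- tibble(
  tracker_id = "A",
  high_dist = 1.2, moder_dist = 0.5, light_dist = 3.1, sedent_dist = 0,
  long_tm = 20, fair_tm = 15, light_tm = 200, sedent_tm = 900
)

# Same two reshaping steps as in the import code
toy_long <- toy %>%
  pivot_longer(ends_with("dist"), names_to = "activ_move", values_to = "activ_distance") %>%
  pivot_longer(ends_with("tm"), names_to = "activ_duration", values_to = "activ_time")

nrow(toy_long)  # 1 record x 4 distance rows x 4 duration rows
dim(toy_long)
```

Because every daily value is repeated 16 times in this long layout, counts taken on it count rows rather than days or users.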

Relationships Between The Datasets

Do all datasets share common elements (users)?
Which datasets are related in one way or another?

Code

library(dplyr)
# unique vectors of the identifier in the small datasets
list_slp <- sleep %>% select(tracker_id) %>% unique() %>% as.vector()
list_hrt <- hrate %>% select(tracker_id) %>% unique() %>% as.vector()
list_wgt <- weight %>% select(tracker_id) %>% unique() %>% as.vector()
# Are there common identifiers? How many?
active %>% filter(tracker_id %in% list_slp) %>% sum()

[1] 0

Code

active %>% filter(tracker_id %in% list_hrt) %>% sum()

[1] 0

Code

active %>% filter(tracker_id %in% list_wgt) %>% sum()

[1] 0

Code

# Are there common identifiers between the small datasets?
sleep %>% filter(tracker_id %in% list_wgt) %>% sum()

[1] 0

Code

sleep %>% filter(tracker_id %in% list_hrt) %>% sum()

[1] 0

Code

hrate %>% filter(tracker_id %in% list_wgt) %>% sum()

[1] 0

According to these checks, there are no common elements (users) between the 3 datasets sleep, heart rate and weight.
On that basis, there is no meaningful way to merge these three datasets and analyse them as a single dataset.
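
One caveat about the check above: `as.vector()` applied to a one-column tibble does not return a plain character vector, so `%in%` is unlikely to match individual IDs, and `sum()` over an empty filter result returns 0 whatever the real overlap is. The zero counts are therefore worth re-checking. A minimal sketch of an alternative using `pull()` and `intersect()`, on made-up IDs (the `*_toy` tibbles are hypothetical stand-ins for `active`, `sleep` and `weight`):

```r
library(dplyr)

# Hypothetical stand-ins for the real datasets (IDs are made up)
active_toy <- tibble(tracker_id = c("u1", "u1", "u2", "u3"))
sleep_toy  <- tibble(tracker_id = c("u1", "u3", "u3"))
weight_toy <- tibble(tracker_id = c("u3", "u4"))

# pull() extracts the column as a plain character vector
ids_active <- active_toy %>% pull(tracker_id) %>% unique()
ids_sleep  <- sleep_toy %>% pull(tracker_id) %>% unique()
ids_weight <- weight_toy %>% pull(tracker_id) %>% unique()

# Number of users shared between pairs of datasets
n_active_sleep <- length(intersect(ids_active, ids_sleep))
n_sleep_weight <- length(intersect(ids_sleep, ids_weight))

n_active_sleep
n_sleep_weight
```

If such a check finds shared IDs on the real data, the datasets could be combined with a join on `tracker_id` and the relevant date column.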

Ascertaining Data Quality

The data is publicly available on [Kaggle: FitBit Fitness Tracker Data](https://www.kaggle.com/datasets/arashnic/fitbit) and is stored in 18 CSV files.
It contains personal fitness tracker data from Fitbit users who consented to the submission of information about their daily activity, steps, heart rate and sleep monitoring.

Activities Dataset Quality

Code

#|fig-cap: "Activities Tracking Record"
#|fig-subcap:
#|  - "Frequency of Records"
#|  - "Days of Records"
#|layout-ncol: 2
#|column: page

pal <- c(
  "<= 10% (poor)" = "red",
  "<= 25% (moderate)" = "orange", 
  "<= 50% (good)" = "yellow", 
  "<= 75% (great)" = "skyblue",
  "75-100% (superb)" = "forestgreen" 
) 

activities_label <- paste0(rep("fit-", 9), 
        seq(1,33,4))

active %>%
  select(tracker_id, tracker_date, activ_move, nr_steps, activ_duration, active_month) %>%
  count(tracker_id) %>%
  arrange(desc(n)) %>%
  mutate(freq = round(n / (length(unique(active$tracker_date)) * 4 * 4) * 100, 2)) %>%
  ggplot(aes(
    y = reorder(tracker_id, freq),
    x = freq,
    fill = case_when(freq <= 10 ~ "<= 10% (poor)",
                     freq <= 25 ~ "<= 25% (moderate)",
                     freq <= 50 ~ "<= 50% (good)",
                     freq <= 75 ~ "<= 75% (great)",
                       TRUE ~ "75-100% (superb)")
  )) +
  geom_col( alpha = 0.6) +
  scale_fill_manual(
    values = pal,
    limits = names(pal)
  )+
  scale_x_continuous(labels = percent_format(scale = 1))+
  scale_y_discrete(labels = activities_label) +
  theme(axis.text.y = element_blank() ,
        axis.ticks.y = element_blank()) +
  xlab("Rating of Respondents (percentage)") +
  ylab("Participants (Users)") +
  ggtitle("High Ratio of response for Wellness Records")

Code

intensities %>%
  group_by(tracker_id) %>%
  count() %>%
  arrange(desc(n)) %>%
  ggplot(aes(
    y = reorder(tracker_id, n),
    x = n,
    fill = n
  )) +
  geom_col() +
  scale_fill_gradient(low = "yellow", high = "lightgreen", na.value = NA) +
  ggtitle("Frequency of Activities Records in 30 days Period") +
  xlab("Days Tracked") +
  ylab("Individual Tracked Users") +
  theme(axis.text.y = element_blank())

Note: This dataset has a satisfactorily high completion ratio: most respondents have tracked data covering the whole data collection interval.

The Sleep Dataset Quality

Code

#|fig-cap: "Sleep Tracking Record"
#|fig-subcap:
#|  - "Sleep Frequency of Records"
#|  - "Days of Records"
#|layout-ncol: 2
#|column: page

pal <- c(
  "<= 10% (poor)" = "red",
  "<= 25% (moderate)" = "orange", 
  "<= 50% (good)" = "yellow", 
  "<= 75% (great)" = "lightgreen",
  "75-100% (superb)" = "forestgreen" 
) 

plt_sleep <- sleep %>%
  select(tracker_id, sleep_date) %>%
  count(tracker_id) %>%
  mutate(freq = n / length(unique(sleep$sleep_date))) %>%
  ggplot(aes(
    y = reorder(tracker_id, freq),
    x = freq,
    fill = case_when(
      freq <= .10 ~ "<= 10% (poor)",
      freq <= .25 ~ "<= 25% (moderate)",
      freq <= .50 ~ "<= 50% (good)",
      freq <= .75 ~ "<= 75% (great)",
      TRUE ~ "75-100% (superb)"
    )
  )) +
  geom_col(alpha = 0.6) +
  scale_fill_manual(
    values = pal,
    limits = names(pal)
  ) +
  scale_x_continuous(labels = percent) +
  scale_y_discrete(labels = activities_label) +
  theme(
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank()
  ) +
  xlab("Rating of Respondents (percentage)") +
  ylab("Participants (Users)") +
  ggtitle("Ratio  of Respondents for Sleep Records in 30 days")

plt_sleep

Code

sleep %>% 
  group_by(tracker_id) %>% 
  count() %>% 
  arrange(desc(n)) %>% 
  ggplot(aes(y = reorder(tracker_id,n),
             x = n,
             fill = n)) +
  geom_col()+
  scale_fill_gradient(low = "yellow", high = "lightgreen", na.value = NA) +
  ggtitle("Sleep Frequency For Tracked Users in 30 Days Period") +
  xlab("Tracking Days") + ylab("Tracked Users")+
  theme(axis.text.y = element_blank(),
        axis.ticks.y = element_blank())

Note: This dataset is incomplete.

There are 5 respondents (15%) with a low participation ratio, below 10% (fewer than 3 days).

Around 75% of respondents have a complete record over the 30-day period.

One user recorded 32 days - a potential OUTLIER
Only 2 respondents recorded the full 31 days of the selected period
Only 4 respondents recorded 28 days
Only 2 respondents recorded 25 days

Heart Beat Data Quality Issues

Note:

Few users monitored their heart rate during the 30 days.
The dataset is also incomplete, with only about 50% of users having their heart rate tracked across the 30-day period.

Weight Tracker DataSet

This dataset is extremely poor: it has only 8 respondents, 6 of whom tracked their weight for fewer than 10 days.

Data Quality Summary

Good data should be Reliable, Original, Comprehensive, Current, and Cited (ROCCC).
Our data is far from trustworthy, being marred by incomplete observations.
The sample size is small.
The data was collected back in 2016, so it is not current.
The data source remains credible.

Exploring Data

Checking and Matching The Data Types

               Length Class   Mode     
tracker_id     15040  -none-  character
tracker_date   15040  Date    numeric  
tracker_steps  15040  -none-  numeric  
nr_steps       15040  factor  numeric  
TotalDistance  15040  -none-  numeric  
total_distance 15040  -none-  numeric  
logged_kms     15040  -none-  numeric  
calor_burnt    15040  -none-  numeric  
activ_move     15040  factor  numeric  
activ_distance 15040  -none-  numeric  
activ_duration 15040  factor  numeric  
activ_time     15040  -none-  numeric  
active_month   15040  ordered numeric  
active_day     15040  ordered numeric  
active_wday    15040  factor  numeric

Describing Data - Summary Table

A quick, broad overview of our data frame with the skimr package

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
tracker_id	0	1	10	10	0	33	0

Variable type: Date

skim_variable	n_missing	complete_rate	min	max	median	n_unique
tracker_date	0	1	2016-04-12	2016-05-12	2016-04-26	31

Variable type: factor

skim_variable	complete_rate	ordered	n_unique	top_counts
nr_steps	1	FALSE	5	sed: 4848, act: 2736, mod: 2608, hyp: 2544
activ_move	1	FALSE	4	hig: 3760, lig: 3760, mod: 3760, sed: 3760
activ_duration	1	FALSE	4	fai: 3760, lig: 3760, lon: 3760, sed: 3760
active_month	1	TRUE	2	Apr: 9776, May: 5264, Jan: 0, Feb: 0
active_day	1	TRUE	7	Tue: 2432, Wed: 2400, Thu: 2352, Fri: 2016
active_wday	1	FALSE	2	bus: 11120, wee: 3920

Variable type: numeric

skim_variable	complete_rate	mean	sd	p25	p50	p75	p100	hist
tracker_steps	1	7637.91	5084.61	3789.75	7405.50	10727.00	36019.00	▇▇▁▁▁
TotalDistance	1	5.49	3.92	2.62	5.24	7.71	28.03	▇▆▁▁▁
total_distance	1	5.48	3.91	2.62	5.24	7.71	28.03	▇▆▁▁▁
logged_kms	1	0.11	0.62	0.00	0.00	0.00	4.94	▇▁▁▁▁
calor_burnt	1	2303.61	717.81	1828.50	2134.00	2793.25	4900.00	▁▆▇▃▁
activ_distance	1	1.35	2.15	0.00	0.09	2.19	21.92	▇▁▁▁▁
activ_time	1	304.69	433.90	2.00	61.00	417.50	1440.00	▇▁▁▁▁

The summary is split into 4 data type categories
It gives a detailed profile of each column's data quality
No missing values and no duplicated entries are reported

$character - no issue
$Date - no issue
$factor - no issue
$numeric - high dispersion, as shown by the large sd statistics
The histogram sketches have long tails, suggesting some unusual data points

Dealing with “Potential Outliers”:

Hunting for unusual data points, that is, observations lying far from the others

1. Scanning the Number of Steps

Few observations are spotted above 17,500 steps
We set the limit at 17,500 steps
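
The cut-off can be sketched as a simple filter. This is a minimal illustration on made-up step counts; the `steps_toy` tibble is hypothetical, and treating the 17,500 limit as inclusive is an assumption, since the text does not say which side of the boundary is kept:

```r
library(dplyr)

# Hypothetical daily step counts (36019 mirrors the maximum in the summary table)
steps_toy <- tibble(
  tracker_id = c("u1", "u2", "u3", "u4"),
  tracker_steps = c(4200, 9800, 17500, 36019)
)

step_limit <- 17500  # limit stated in the text; inclusive by assumption

# Keep plausible records and set aside the potential outliers for inspection
steps_clean <- steps_toy %>% filter(tracker_steps <= step_limit)
steps_out   <- steps_toy %>% filter(tracker_steps > step_limit)

nrow(steps_clean)
nrow(steps_out)
```

Keeping the excluded rows in a separate object makes it easy to review them before deciding whether they are genuine or tracking errors.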

2. Scanning the Number of Calories Burnt

This pattern of observations for calories burnt seems to be common and plausible

3. Scanning the Distance

4. Scanning the Time Sync with the app

Time appears to be tracked in 30-minute intervals over 24 hours (1440 minutes)
The tracker device is worn 24 hours a day
There are 3 periods of substantial activity
Most records show no activity, followed by 30 minutes and 60 minutes.
The most active period is 6 hours (360 minutes)

6. Scanning Total Sleep Duration

Sleeping duration [3 - 12 hours]

7. Scanning Heart Rate Records

Code

#|label: fig_distance
#|fig-cap: "Heart Rate Tracked"
#|fig-subcap:
#|  - "Heart Rate Recorded"
#|  - "Distribution of Cleaned Data"
#|layout-ncol: 2
#|column: page

hist_hrt1 <- hrate %>% select(Value) %>% 
  
 ggplot(aes(Value) )+
  geom_histogram(col = "tomato", fill = "chartreuse") +
  ggtitle("Distribution heart rate dataset") +
  scale_y_continuous(labels = label_comma())

ggplotly(hist_hrt1)

Code

hrate1 <- hrate %>% filter( Value <= 170)
hist_hrt2 <- hrate1 %>% select(Value) %>% 
  
 ggplot(aes(Value) )+
  geom_histogram(col = "#F4A582", fill = "#FDDBC7") +
  ggtitle("Cleaned Distribution heart rate dataset") +
  scale_y_continuous(labels = label_comma())

ggplotly(hist_hrt2)

Heart rate range: [60, 170]
Values below 60 and above 170 beats are suspicious.
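
The filter above only removes readings over 170. Flagging both ends of the [60, 170] range could look like the sketch below, which uses made-up readings (`hr_toy` is hypothetical) and the same `Value` column name as the heart rate code:

```r
library(dplyr)

# Hypothetical heart-rate readings
hr_toy <- tibble(Value = c(36, 59, 60, 77, 170, 171, 203))

# Mark readings outside the plausible range instead of dropping them
hr_flagged <- hr_toy %>%
  mutate(suspicious = !between(Value, 60, 170))

hr_flagged %>% count(suspicious)
```

Flagging rather than filtering keeps the low readings visible, which matters here because the reported minimum sits below 60.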

Data Summary

Descriptive Summary of Numerical Variables

Table 1. Activity Distance Moved - Average Distance
activ_move	activ_distance_mean	activ_distance_stdv
high_dist	1.190099120	1.896446727
light_dist	3.260660794	1.985207264
moder_dist	0.550627752	0.870193560
sedent_dist	0.001508811	0.007059017

Table 2. Activity Duration - Average Time (hours)
activ_duration	active_hours_mean	active_hours_stdv
fair_tm	0.2180617	0.3338661
light_tm	3.1798458	1.8275716
long_tm	0.3101322	0.5017038
sedent_tm	16.5085903	5.0736653
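
Summaries of this shape can be produced with `group_by()` and `summarise()`. The sketch below uses made-up rows (`dist_toy` is hypothetical) with the column names from the import step. For the duration table, the minutes in `activ_time` would presumably be divided by 60 first, which is an assumption based on the "hours" column names:

```r
library(dplyr)

# Hypothetical long-format rows using the column names from the import step
dist_toy <- tibble(
  activ_move = c("high_dist", "high_dist", "light_dist", "light_dist"),
  activ_distance = c(1, 3, 2, 4)
)

# Mean and standard deviation of distance per movement type
dist_summary <- dist_toy %>%
  group_by(activ_move) %>%
  summarise(
    activ_distance_mean = mean(activ_distance),
    activ_distance_stdv = sd(activ_distance),
    .groups = "drop"
  )

dist_summary
```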

Summary Categorical Variables

Table 3. Total Steps by Category - Summary
Rank	Daily Steps	Steps Category	Total Users	Share
1	5 000	Sedentary	4,848	33.37%
2	10 000	Active	2,736	18.83%
3	7 500	Light Active	2,608	17.95%
4	12 500	Hyper Active	2,544	17.51%
5	12 500+	High Performer	1,792	12.33%
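
The Share column can be reproduced from the counts in the table. Note that the counts sum to 14,528, fewer than the 15,040 rows listed earlier. That would be consistent with the table being computed after the outlier filter, though this is an inference rather than something the source states. A short check in base R:

```r
# Counts per steps category, copied from Table 3
counts <- c(
  Sedentary = 4848, Active = 2736, `Light Active` = 2608,
  `Hyper Active` = 2544, `High Performer` = 1792
)

# Share of each category as a percentage of all counted rows
share <- round(100 * counts / sum(counts), 2)

sum(counts)
share
```

These counts come from the long-format table, so they are rows (each day repeated across the distance and duration types) rather than distinct users.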

DATA VISUALIZATION WITH GGPLOT2

1. The Sample Size

2. Activities Tracking During 30 Days

The number of tracked users has declined sharply over the period

2. Wellness Tracking 1. Moved Distance

The average distance moved is 5.06 km

Bellabeat users are mostly less active (they move less)
On average they cover about 3.5 km daily in light movement, and about 1.0 km and 0.5 km in high and moderate movement respectively.

Tracking Metric 3. Logged Distance

About 97% of tracked users logged a distance (that is, set a target distance in advance)

Wellness Tracking Metric 4. Average Active Time

Daily average active time (hours) = 5.05

About 17 hours a day are spent inactive, in sedentary activities such as reading, watching, eating, and so on.

The Proportion of the Main Activity

Sedentary activity is the most common category among the tracked Bellabeat users (33%), with fewer than 5,000 steps a day.
Occasionally they hit the recommended 10,000 steps (18%).
Less frequently they exceed 12,500 steps (12%).

Distribution for Users Active Time

Average active time is 5.05 hours

The very active and fairly active levels each account for less than one hour

Tracked Metric 5. Daily Average Steps

Average Steps = 7156.05

Tracked users rarely reach the recommended 10,000 daily steps.
Tracked users are apparently more active on weekdays.

Metric 6. Calories Burnt by Tracked Users

Average calories burnt by the tracked users = 2260.96

Average calories burnt are slightly higher on business days, but fall drastically in May

Average Sleep Duration

Average sleep hours are higher on weekdays and close to the recommended 8 hours

Heart Rate

Min Heart Rate = 36
Average Heart Rate = 77.27
Max Heart Rate = 170

---

FINAL CONSIDERATIONS

Bellabeat activity tracking is solid. The analysis of 33 users reveals interesting patterns in how long they have been active, how far they have walked, how many calories they have burned and how many steps they have completed.

1. Activities are mostly tracked around the clock (24 hours a day).

2. Tracked users are mostly sedentary: they spend about 16 hours a day inactive, average about 7,000 steps and burn about 2,300 calories.

3. They get slightly more active on weekdays, from Monday to Friday.

4. It seems that tracked users are not engaged in high-intensity cardio or workouts such as CrossFit training, running or jogging, which typically burn more calories.

5. Heart rate, sleep and weight are tracked less.

6. We flagged unusual observations: low and high heart rates, high step counts and low sleeping hours. These could be related to low tracker precision, suggesting that the tracker needs improvement.

7. A reminder could be included to alert users to get more active when they fall below the recommended activity scores.

8. A wider tracking period of at least 90 days and a larger sample covering many more users would strengthen the analysis.

9. Bellabeat may improve the tracker's utility. Users should have individual targets, such as weight loss or improved sleep, and be tracked against those targets to assess their performance.

10. Last but not least, based on these findings, although inconclusive, Bellabeat may consider designing a fitness plan targeted at improving users' activity scores and getting more out of the smart tracker.
