Case Study on Bellabeat

A Health-Focused High-Tech Manufacturer

Author

By Somnath Das Gupta

1 INTRODUCTION

Hello everyone. I have spent the past few months working through the Google Data Analytics Professional Certificate on Coursera. Along the way I have picked up a lot of interesting, insightful and, most importantly, useful knowledge about the tools the program covers, such as Tableau, R, SQL and spreadsheets.

This curriculum not only exposed me to a range of standardized practices, but also gave me a general framework that I can apply to any project. I also learned the key terminology and processes of data analysis. Completing a case study included in the course let me deepen these skills by applying a variety of tools, methods and strategies. In what follows, I give a brief walkthrough of my thought process and the understanding I developed through this case study.

2 BACKGROUND INFORMATION

Bellabeat is a high-tech company that manufactures health-focused smart products for women. They develop beautifully designed technology that informs and inspires women around the world. Their app and smart devices collect data on activity, sleep, stress, hydration levels, and reproductive health to empower women with an understanding of their own health and habits.

Bellabeat was founded in 2013 by Urška Sršen and Sando Mur. Since then it has grown rapidly and positioned itself as a tech-driven wellness company for women. Within three years it had opened multiple offices around the world and launched various products. It also increased the availability of its products through a growing number of online retailers, in addition to the e-commerce channel on its own website.

Bellabeat offers a range of products that promote wellness and a healthy lifestyle, all of which connect to the Bellabeat app. There are different product lines available, and one of the popular products is the Leaf, a wellness tracker that can be worn as a bracelet, necklace, or clip. It tracks the user’s activity, sleep, and stress levels and syncs with the Bellabeat app. Apart from their products, they also offer a subscription-based membership program that provides personalized guidance on nutrition, activity, sleep, health, beauty, and mindfulness, based on users’ lifestyles and goals, with 24/7 access.

3 DATA LIMITATIONS

We have identified several limitations in the datasets. First and foremost, the data is not comprehensive, as it includes inputs from only 33 unique users. Of these 33 users, only 8 entered weight, 12 entered heart rate, and 24 entered sleep data. Furthermore, some users did not provide information for all variables in the weight dataset, making the data incomplete. Despite these limitations, we will still work with these datasets as they contain important variables.

It’s important to note that the data comes from Fitbit users, which is a secondary source, and therefore may not accurately reflect the behavior and data distribution of Bellabeat users, potentially leading to inaccurate insights. Another limitation is that the data is not current: it was collected between 4/12/2016 and 5/12/2016, about 5 years before this case study. The limited duration of data collection (only 30 days) and the small number of users (33) also affect reliability and may have resulted in a biased dataset. Some users may not have entered information, while others may have turned off their devices or not used them regularly. Additionally, some data, such as weight information, was entered manually, which introduces potential errors.

If this were a real-life project intended to define Bellabeat’s marketing strategy, these limitations would need to be addressed before proceeding with the analysis. However, as this is a case study and we cannot control these limitations, we will still proceed with the analysis. In a real-world scenario, a data analyst would ask several questions before proceeding with data cleaning, such as why some users generated more data rows than others, whether users contributed data voluntarily or were told how often to use the app, and what measures were taken to eliminate sampling bias. It would also be beneficial to obtain newer versions of these datasets, or similar datasets directly from Bellabeat, for comparison.

4 ASK PHASE:

Some questions for the stakeholders, whose answers will significantly guide the direction of the future marketing program:

    • Are there specific product areas or devices that you consider essential to focus on when recommending a marketing strategy?
    • What targets or expectations have you set for this analysis? Could you give a short brief?
    • Beyond a high-level marketing recommendation, are you willing to pivot your business strategy if this analysis reveals insights that could help you expand your customer base and improve the customer experience?
    • Can you identify any products that did not receive the desired response, compared to stakeholders’ expectations? If so, could you briefly describe what you think went wrong?
    • What will be the use case for the high-level marketing strategy? Is it to enhance your current services or products, or perhaps to launch a new product or service offering?

4.1 Key Takeaways

    1. Identify the business task.
    • The main purpose of this analysis is to recommend a high-level marketing strategy and provide insights that give the executive team a clear picture of the current state and help it identify untapped opportunities for growth.
    • The analysis of any one product can lead to minor or significant changes to that product, or to a new product that improves the overall experience of female customers and can increase retention rates.
    2. Consider key stakeholders.
    • Urška Sršen (Bellabeat’s cofounder and Chief Creative Officer).
    • Sando Mur (mathematician and Bellabeat’s cofounder).
    • Marketing analytics team.

4.2 Deliverables

    • Our goal is to identify growth opportunities for specific products or services, and to help the women who use these offerings get the most out of them.
    • The aim is to identify the factors that keep female customers from achieving a balanced lifestyle through the app, and that stand in the way of raising the app’s services to a higher standard.

5 PREPARE PHASE:

In this analysis, I will use the datasets to identify patterns in how female users make use of the products and services. The analysis can also highlight areas where a product or service needs improvement, or where marketing strategies can be tuned to better match the needs and preferences of female users. The data was made available in the public domain by Mobius.

5.1 Key Task

    • Download the datasets from the given online repository and save them in a separate folder as raw data.
    • Determine the file format and confirm that the files are accessible, readable and writable.
    • Load the datasets in a consistent order.
    • Examine the credibility of the data by inspecting each dataset for ambiguous or unwanted rows, then sort accordingly.
    • Check whether the number of distinct Ids is the same in each dataset, to identify discrepancies and inconsistencies.

5.2 Deliverables

    • Documenting the entire procedure step by step involved in this phase.
    • A short brief on each operation performed for clarity and ease of understanding.

5.3 Code Chunk

Let’s load the libraries:

Code
library(tidyverse)
library(ggplot2)
library(janitor)
library(hms)
library(geosphere)
library(spatialrisk)
library(distances)
library(Distance)
library(measurements)
library(plotrix)
library(lubridate)
library(ggalt)
library(hrbrthemes)
library(viridis)
library(ggridges)
library(scales)
library(readxl)
library(writexl)
library(ggiraph)
library(viridisLite)
library(labeling)
library(farver)
library(psych)
library(plotly)
library(GGally)
library(ggiraphExtra)
library(ggcorrplot)

Importing all datasets.

Code
daily_activity <- read_csv("dailyActivity_merged.csv")
daily_calories <- read_csv("dailyCalories_merged.csv")
daily_Intensities <- read_csv("dailyIntensities_merged.csv")
daily_steps <- read_csv("dailySteps_merged.csv")
heartrate_seconds <- read_csv("heartrate_seconds_merged.csv")
hourly_calories <- read_csv("hourlyCalories_merged.csv")
hourly_intensities <- read_csv("hourlyIntensities_merged.csv")
hourly_steps <- read_csv("hourlySteps_merged.csv")
minute_calories_narrow <- read_csv("minuteCaloriesNarrow_merged.csv")
minute_calories_wide <- read_csv("minuteCaloriesWide_merged.csv")
minute_intensities_narrow <- read_csv("minuteIntensitiesNarrow_merged.csv")
minute_intensities_wide <- read_csv("minuteIntensitiesWide_merged.csv")
minute_METs_narrow <- read_csv("minuteMETsNarrow_merged.csv")
minute_sleep <- read_csv("minuteSleep_merged.csv")
minute_steps_narrow <- read_csv("minuteStepsNarrow_merged.csv")
minute_steps_wide <- read_csv("minuteStepsWide_merged.csv")
sleep_day <- read_csv("sleepDay_merged.csv")
weight_log_info <- read_csv("weightLogInfo_merged.csv")
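The eighteen read_csv() calls above could also be written as a loop over the folder. The following is a minimal sketch in base R, assuming the raw files sit in one directory and follow the `<name>_merged.csv` naming pattern. `load_merged_csvs()` is a hypothetical helper, not part of the original analysis, and the demo writes two tiny stand-in files to a temporary folder instead of reading the real data.

```r
# Hypothetical helper: read every "*_merged.csv" file in a folder into a
# named list of data frames (names are the file names minus the suffix).
load_merged_csvs <- function(dir) {
  paths <- list.files(dir, pattern = "_merged\\.csv$", full.names = TRUE)
  tables <- lapply(paths, read.csv, stringsAsFactors = FALSE)
  names(tables) <- sub("_merged\\.csv$", "", basename(paths))
  tables
}

# Demo with two tiny stand-in files (not the real Fitbit data).
demo_dir <- file.path(tempdir(), "fitbit_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(Id = c(1, 1, 2), StepTotal = c(100, 200, 300)),
          file.path(demo_dir, "dailySteps_merged.csv"), row.names = FALSE)
write.csv(data.frame(Id = c(1, 2), Calories = c(1800, 2100)),
          file.path(demo_dir, "dailyCalories_merged.csv"), row.names = FALSE)

raw <- load_merged_csvs(demo_dir)
print(names(raw))
print(sapply(raw, nrow))
```

Keeping the tables in one named list makes it possible to apply the same check or transformation to all of them with lapply() or sapply(), at the cost of writing raw$dailySteps instead of daily_steps.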

Checking the number of distinct user Ids to find the exact number of users in each dataset.

For daily_steps dataset.

Code
n_distinct(daily_steps$Id)
[1] 33

For daily_calories dataset.

Code
n_distinct(daily_calories$Id)
[1] 33

For daily_intensities dataset.

Code
n_distinct(daily_Intensities$Id)
[1] 33

For daily_activity dataset.

Code
n_distinct(daily_activity$Id)
[1] 33

For sleep_day dataset.

Code
n_distinct(sleep_day$Id)
[1] 24
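The per-table checks above can be collected into one pass, which also makes it easy to see which users are missing from a smaller table. This is a minimal sketch in base R on toy data frames; the Id values are made up, and length(unique()) is used as the base R counterpart of n_distinct().

```r
# Toy stand-ins for the real tables; only the Id column matters here.
tables <- list(
  daily_steps = data.frame(Id = c(1, 1, 2, 3)),
  sleep_day   = data.frame(Id = c(1, 2, 2))
)

# Number of distinct users per table.
users_per_table <- sapply(tables, function(df) length(unique(df$Id)))
print(users_per_table)

# Users present in daily_steps but missing from sleep_day.
missing_sleep <- setdiff(unique(tables$daily_steps$Id),
                         unique(tables$sleep_day$Id))
print(missing_sleep)
```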

6 PROCESS PHASE:

Cleaning and processing of the gathered datasets is necessary to determine the quality of associated characteristics and to proceed with further analysis.

6.1 Key Task

    • Examine the datasets for errors or missing values.
    • Remove duplicates and outliers from the datasets, if any.
    • Select the appropriate tool for the required analysis.
    • Store a backup of the original datasets to refer back to in case any essential data is lost during analysis.
    • Transform the existing datasets into a workable format for the desired analysis.

6.2 Deliverables

    • Make the necessary changes to the datasets, such as converting data types and using functions to calculate values.
    • Manipulate the datasets by performing the computations required.
    • State the reason for every change made to the datasets, along with a detailed summary of each dataset.
    • Keep a record of every change, minor or major, made across the datasets.

6.3 Code Chunk

Splitting date and time into two different columns.

First, the character strings are parsed into date-times using mdy_hms().

Code
heartrate_seconds$date1 <- mdy_hms(heartrate_seconds$Time)
hourly_calories$date1 <- mdy_hms(hourly_calories$ActivityHour)
hourly_intensities$date1 <- mdy_hms(hourly_intensities$ActivityHour)
hourly_steps$date1 <- mdy_hms(hourly_steps$ActivityHour)
minute_calories_narrow$date1 <- mdy_hms(minute_calories_narrow$ActivityMinute)
minute_calories_wide$date1 <- mdy_hms(minute_calories_wide$ActivityHour)
minute_intensities_wide$date1 <- mdy_hms(minute_intensities_wide$ActivityHour)
minute_intensities_narrow$date1 <- mdy_hms(minute_intensities_narrow$ActivityMinute)
minute_METs_narrow$date1 <- mdy_hms(minute_METs_narrow$ActivityMinute)
minute_sleep$date1 <- mdy_hms(minute_sleep$date)
minute_steps_narrow$date1 <- mdy_hms(minute_steps_narrow$ActivityMinute)
minute_steps_wide$date1 <- mdy_hms(minute_steps_wide$ActivityHour)
sleep_day$date1 <- mdy_hms(sleep_day$SleepDay)
weight_log_info$date1 <- mdy_hms(weight_log_info$Date)

Converting and extracting the date from the date-time column using as.Date().

Code
daily_activity$ActivityDate <- as.Date(daily_activity$ActivityDate, format = "%m/%d/%Y")
daily_calories$ActivityDay <- as.Date(daily_calories$ActivityDay,format = "%m/%d/%Y")
daily_Intensities$ActivityDay <- as.Date(daily_Intensities$ActivityDay,format = "%m/%d/%Y")
daily_steps$ActivityDay <- as.Date(daily_steps$ActivityDay,format = "%m/%d/%Y")
heartrate_seconds$Date <- as.Date(heartrate_seconds$date1)
hourly_calories$Date <- as.Date(hourly_calories$date1)
hourly_intensities$Date <- as.Date(hourly_intensities$date1)
hourly_steps$Date <- as.Date(hourly_steps$date1)
minute_calories_narrow$Date <- as.Date(minute_calories_narrow$date1)
minute_calories_wide$Date <- as.Date(minute_calories_wide$date1)
minute_intensities_wide$Date <- as.Date(minute_intensities_wide$date1)
minute_intensities_narrow$Date <- as.Date(minute_intensities_narrow$date1)
minute_METs_narrow$Date <- as.Date(minute_METs_narrow$date1)
minute_sleep$Date <- as.Date(minute_sleep$date1)
minute_steps_narrow$Date <- as.Date(minute_steps_narrow$date1)
minute_steps_wide$Date <- as.Date(minute_steps_wide$date1)
sleep_day$Date <- as.Date(sleep_day$date1)
weight_log_info$Date <- as.Date(weight_log_info$date1)

Converting and extracting time from date and time column using as_hms().

Code
heartrate_seconds$Time <- as_hms(heartrate_seconds$date1)
hourly_calories$Time <- as_hms(hourly_calories$date1)
hourly_intensities$Time <- as_hms(hourly_intensities$date1)
hourly_steps$Time <- as_hms(hourly_steps$date1)
minute_calories_narrow$Time <- as_hms(minute_calories_narrow$date1)
minute_calories_wide$Time <- as_hms(minute_calories_wide$date1)
minute_intensities_wide$Time <- as_hms(minute_intensities_wide$date1)
minute_intensities_narrow$Time <- as_hms(minute_intensities_narrow$date1)
minute_METs_narrow$Time <- as_hms(minute_METs_narrow$date1)
minute_sleep$Time <- as_hms(minute_sleep$date1)
minute_steps_narrow$Time <- as_hms(minute_steps_narrow$date1)
minute_steps_wide$Time <- as_hms(minute_steps_wide$date1)
sleep_day$Time <- as_hms(sleep_day$date1)
weight_log_info$Time <- as_hms(weight_log_info$date1)

Removing the helper and original date-time columns that are no longer needed.

Code
heartrate_seconds$date1 <- NULL
hourly_calories$date1 <- NULL
hourly_calories$ActivityHour <- NULL
hourly_intensities$date1 <- NULL
hourly_intensities$ActivityHour <- NULL
hourly_steps$date1 <- NULL
hourly_steps$ActivityHour <- NULL
minute_calories_narrow$date1 <- NULL
minute_calories_narrow$ActivityMinute <- NULL
minute_calories_wide$date1 <- NULL
minute_calories_wide$ActivityHour <- NULL
minute_intensities_wide$date1 <- NULL
minute_intensities_wide$ActivityHour <- NULL
minute_intensities_narrow$date1 <- NULL
minute_intensities_narrow$ActivityMinute <- NULL
minute_METs_narrow$date1 <- NULL
minute_METs_narrow$ActivityMinute <- NULL
minute_sleep$date1 <- NULL
minute_sleep$date <- NULL
minute_steps_narrow$date1 <- NULL
minute_steps_narrow$ActivityMinute <- NULL
minute_steps_wide$date1 <- NULL
minute_steps_wide$ActivityHour <- NULL
sleep_day$date1 <- NULL
sleep_day$SleepDay <- NULL
weight_log_info$date1 <- NULL
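The parse, extract-date, extract-time and drop-column steps above repeat the same four lines for every table. A minimal sketch of a helper that wraps them is shown below. `split_datetime()` is a hypothetical name, and the toy rows assume the text is in month/day/year hour:minute:second AM/PM form; it relies on lubridate and hms, which the analysis already loads.

```r
library(lubridate)  # mdy_hms()
library(hms)        # as_hms()

# Hypothetical helper: parse a date-time text column, add Date and Time
# columns, and drop the original column.
split_datetime <- function(df, col) {
  parsed <- mdy_hms(df[[col]])
  df[[col]] <- NULL          # drop the raw text column first
  df$Date <- as.Date(parsed)
  df$Time <- as_hms(parsed)
  df
}

# Toy rows (made-up values) shaped like an hourly table.
toy <- data.frame(
  Id = c(1, 1),
  ActivityHour = c("4/12/2016 7:00:00 AM", "4/12/2016 1:00:00 PM"),
  Calories = c(81, 120)
)

toy_split <- split_datetime(toy, "ActivityHour")
print(toy_split)
```

A call such as hourly_calories <- split_datetime(hourly_calories, "ActivityHour") would then stand in for the four separate assignments per table. Because the raw column is dropped before Date is added, the helper also works when the source column is itself called Date.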

Extracting months from the dates in the datasets using format() and “%B”, which represents the full month name as text.

Code
daily_activity$Month <- format(daily_activity$ActivityDate, "%B")
daily_calories$Month <- format(daily_calories$ActivityDay, "%B")
daily_Intensities$Month <- format(daily_Intensities$ActivityDay, "%B")
daily_steps$Month <- format(daily_steps$ActivityDay, "%B")
heartrate_seconds$Month <- format(heartrate_seconds$Date,"%B")
hourly_calories$Month  <- format(hourly_calories$Date, "%B")
hourly_intensities$Month  <- format(hourly_intensities$Date,"%B")
hourly_steps$Month  <- format(hourly_steps$Date,"%B")
minute_calories_narrow$Month <- format(minute_calories_narrow$Date,"%B")
minute_calories_wide$Month <- format(minute_calories_wide$Date,"%B")
minute_intensities_wide$Month <- format(minute_intensities_wide$Date,"%B")
minute_intensities_narrow$Month <- format(minute_intensities_narrow$Date,"%B")
minute_METs_narrow$Month <- format(minute_METs_narrow$Date,"%B")
minute_sleep$Month <- format(minute_sleep$Date,"%B")
minute_steps_narrow$Month <- format(minute_steps_narrow$Date,"%B")
minute_steps_wide$Month <- format(minute_steps_wide$Date,"%B")
sleep_day$Month <- format(sleep_day$Date,"%B")
weight_log_info$Month <- format(weight_log_info$Date,"%B") 

Getting weekdays from the date column of every dataset using weekdays().

Code
daily_activity$WeekDay <- weekdays(daily_activity$ActivityDate)
daily_calories$WeekDay <- weekdays(daily_calories$ActivityDay)
daily_Intensities$WeekDay <- weekdays(daily_Intensities$ActivityDay)
daily_steps$WeekDay <- weekdays(daily_steps$ActivityDay)
heartrate_seconds$WeekDay <- weekdays(heartrate_seconds$Date)
hourly_calories$WeekDay   <- weekdays(hourly_calories$Date)
hourly_intensities$WeekDay   <- weekdays(hourly_intensities$Date)
hourly_steps$WeekDay  <- weekdays(hourly_steps$Date)
minute_calories_narrow$WeekDay  <- weekdays(minute_calories_narrow$Date)
minute_calories_wide$WeekDay  <- weekdays(minute_calories_wide$Date)
minute_intensities_wide$WeekDay  <- weekdays(minute_intensities_wide$Date)
minute_intensities_narrow$WeekDay <- weekdays(minute_intensities_narrow$Date)
minute_METs_narrow$WeekDay <- weekdays(minute_METs_narrow$Date)
minute_sleep$WeekDay <- weekdays(minute_sleep$Date)
minute_steps_narrow$WeekDay  <- weekdays(minute_steps_narrow$Date)
minute_steps_wide$WeekDay  <- weekdays(minute_steps_wide$Date)
sleep_day$WeekDay  <- weekdays(sleep_day$Date)
weight_log_info$WeekDay  <- weekdays(weight_log_info$Date) 
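Both format(x, "%B") and weekdays() return names in the language of the current R session, and as plain character columns they sort alphabetically in tables and plots. The sketch below shows one possible way to get weekday labels that are independent of the locale and ordered by calendar position. `to_weekday_factor()` is a hypothetical helper, and the fixed English labels are an assumption about the desired output.

```r
# English labels in calendar order; "%u" gives the ISO weekday number
# (1 = Monday ... 7 = Sunday) regardless of the session locale.
day_labels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")

to_weekday_factor <- function(dates) {
  idx <- as.integer(format(dates, "%u"))
  factor(day_labels[idx], levels = day_labels, ordered = TRUE)
}

dates <- as.Date(c("2016-04-12", "2016-04-16", "2016-04-17"))
wd <- to_weekday_factor(dates)
print(wd)
print(table(wd))
```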

Creating breaks and labels to convert time into time of the day.

Creating breaks.

Code
breaks <- hour(hm("00:00", "6:00", "12:00", "18:00", "23:59"))

Creating labels for breaks.

Code
labels <- c("Night", "Morning", "Afternoon", "Evening")

Converting time into time of the day for various datasets.

Code
heartrate_seconds$Time_of_day <- cut(x=hour(heartrate_seconds$Time),breaks = breaks,labels = labels,include.lowest = TRUE)
hourly_calories$Time_of_day <- cut(x=hour(hourly_calories$Time),breaks = breaks,labels = labels,include.lowest = TRUE)
hourly_intensities$Time_of_day <- cut(x=hour(hourly_intensities$Time),breaks = breaks,labels = labels,include.lowest = TRUE)
hourly_steps$Time_of_day <- cut(x=hour(hourly_steps$Time),breaks = breaks,labels = labels,include.lowest = TRUE)
weight_log_info$Time_of_day <- cut(x=hour(weight_log_info$Time),breaks = breaks,labels = labels,include.lowest = TRUE)
minute_intensities_wide$Time_of_day <- cut(x=hour(minute_intensities_wide$Time),breaks = breaks,labels = labels,include.lowest = TRUE)
minute_METs_narrow$Time_of_day <- cut(x=hour(minute_METs_narrow$Time),breaks = breaks,labels = labels,include.lowest = TRUE)
minute_sleep$Time_of_day <- cut(x=hour(minute_sleep$Time),breaks = breaks,labels = labels,include.lowest = TRUE)
minute_steps_wide$Time_of_day <- cut(x=hour(minute_steps_wide$Time),breaks = breaks,labels = labels,include.lowest = TRUE)
minute_steps_narrow$Time_of_day <- cut(x=hour(minute_steps_narrow$Time),breaks = breaks,labels = labels,include.lowest = TRUE)
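It is worth checking how boundary hours are binned. hour() applied to the five periods yields the breaks 0, 6, 12, 18 and 23, and cut() closes intervals on the right by default, so hour 6 is labelled Night rather than Morning. The sketch below compares that behaviour with a left-closed alternative; the alternative bins are an assumption about what may be intended, not the code used in this analysis.

```r
labels <- c("Night", "Morning", "Afternoon", "Evening")
hours  <- c(0, 6, 7, 12, 18, 23)

# Breaks as produced in the text: 0, 6, 12, 18, 23.
# With right-closed intervals, hour 6 falls in "Night",
# hour 12 in "Morning" and hour 18 in "Afternoon".
closed_right <- cut(hours, breaks = c(0, 6, 12, 18, 23),
                    labels = labels, include.lowest = TRUE)

# Alternative (an assumption, not the author's code):
# left-closed intervals [0,6), [6,12), [12,18), [18,24).
closed_left <- cut(hours, breaks = c(0, 6, 12, 18, 24),
                   labels = labels, right = FALSE)

print(data.frame(hour = hours, closed_right, closed_left))
```

Switching to the left-closed version would move rows recorded at exactly 6, 12 and 18 o'clock into the later bin, so any counts by time of day would need to be recomputed.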

Using the str() function to inspect the structure of a data frame: its dimensions, column names, column types and the first few values of each column.

Code
str(daily_activity)
spc_tbl_ [940 × 17] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ Id                      : num [1:940] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
 $ ActivityDate            : Date[1:940], format: "2016-04-12" "2016-04-13" ...
 $ TotalSteps              : num [1:940] 13162 10735 10460 9762 12669 ...
 $ TotalDistance           : num [1:940] 8.5 6.97 6.74 6.28 8.16 ...
 $ TrackerDistance         : num [1:940] 8.5 6.97 6.74 6.28 8.16 ...
 $ LoggedActivitiesDistance: num [1:940] 0 0 0 0 0 0 0 0 0 0 ...
 $ VeryActiveDistance      : num [1:940] 1.88 1.57 2.44 2.14 2.71 ...
 $ ModeratelyActiveDistance: num [1:940] 0.55 0.69 0.4 1.26 0.41 ...
 $ LightActiveDistance     : num [1:940] 6.06 4.71 3.91 2.83 5.04 ...
 $ SedentaryActiveDistance : num [1:940] 0 0 0 0 0 0 0 0 0 0 ...
 $ VeryActiveMinutes       : num [1:940] 25 21 30 29 36 38 42 50 28 19 ...
 $ FairlyActiveMinutes     : num [1:940] 13 19 11 34 10 20 16 31 12 8 ...
 $ LightlyActiveMinutes    : num [1:940] 328 217 181 209 221 164 233 264 205 211 ...
 $ SedentaryMinutes        : num [1:940] 728 776 1218 726 773 ...
 $ Calories                : num [1:940] 1985 1797 1776 1745 1863 ...
 $ Month                   : chr [1:940] "April" "April" "April" "April" ...
 $ WeekDay                 : chr [1:940] "Tuesday" "Wednesday" "Thursday" "Friday" ...
 - attr(*, "spec")=
  .. cols(
  ..   Id = col_double(),
  ..   ActivityDate = col_character(),
  ..   TotalSteps = col_double(),
  ..   TotalDistance = col_double(),
  ..   TrackerDistance = col_double(),
  ..   LoggedActivitiesDistance = col_double(),
  ..   VeryActiveDistance = col_double(),
  ..   ModeratelyActiveDistance = col_double(),
  ..   LightActiveDistance = col_double(),
  ..   SedentaryActiveDistance = col_double(),
  ..   VeryActiveMinutes = col_double(),
  ..   FairlyActiveMinutes = col_double(),
  ..   LightlyActiveMinutes = col_double(),
  ..   SedentaryMinutes = col_double(),
  ..   Calories = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 
Code
str(sleep_day)
spc_tbl_ [413 × 8] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ Id                : num [1:413] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
 $ TotalSleepRecords : num [1:413] 1 2 1 2 1 1 1 1 1 1 ...
 $ TotalMinutesAsleep: num [1:413] 327 384 412 340 700 304 360 325 361 430 ...
 $ TotalTimeInBed    : num [1:413] 346 407 442 367 712 320 377 364 384 449 ...
 $ Date              : Date[1:413], format: "2016-04-12" "2016-04-13" ...
 $ Time              : 'hms' num [1:413] 00:00:00 00:00:00 00:00:00 00:00:00 ...
  ..- attr(*, "units")= chr "secs"
 $ Month             : chr [1:413] "April" "April" "April" "April" ...
 $ WeekDay           : chr [1:413] "Tuesday" "Wednesday" "Friday" "Saturday" ...
 - attr(*, "spec")=
  .. cols(
  ..   Id = col_double(),
  ..   SleepDay = col_character(),
  ..   TotalSleepRecords = col_double(),
  ..   TotalMinutesAsleep = col_double(),
  ..   TotalTimeInBed = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 
Code
str(daily_Intensities)
spc_tbl_ [940 × 12] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ Id                      : num [1:940] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
 $ ActivityDay             : Date[1:940], format: "2016-04-12" "2016-04-13" ...
 $ SedentaryMinutes        : num [1:940] 728 776 1218 726 773 ...
 $ LightlyActiveMinutes    : num [1:940] 328 217 181 209 221 164 233 264 205 211 ...
 $ FairlyActiveMinutes     : num [1:940] 13 19 11 34 10 20 16 31 12 8 ...
 $ VeryActiveMinutes       : num [1:940] 25 21 30 29 36 38 42 50 28 19 ...
 $ SedentaryActiveDistance : num [1:940] 0 0 0 0 0 0 0 0 0 0 ...
 $ LightActiveDistance     : num [1:940] 6.06 4.71 3.91 2.83 5.04 ...
 $ ModeratelyActiveDistance: num [1:940] 0.55 0.69 0.4 1.26 0.41 ...
 $ VeryActiveDistance      : num [1:940] 1.88 1.57 2.44 2.14 2.71 ...
 $ Month                   : chr [1:940] "April" "April" "April" "April" ...
 $ WeekDay                 : chr [1:940] "Tuesday" "Wednesday" "Thursday" "Friday" ...
 - attr(*, "spec")=
  .. cols(
  ..   Id = col_double(),
  ..   ActivityDay = col_character(),
  ..   SedentaryMinutes = col_double(),
  ..   LightlyActiveMinutes = col_double(),
  ..   FairlyActiveMinutes = col_double(),
  ..   VeryActiveMinutes = col_double(),
  ..   SedentaryActiveDistance = col_double(),
  ..   LightActiveDistance = col_double(),
  ..   ModeratelyActiveDistance = col_double(),
  ..   VeryActiveDistance = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 

Using colnames() to check the column names of the data frames.

Code
colnames(weight_log_info)
 [1] "Id"             "Date"           "WeightKg"       "WeightPounds"  
 [5] "Fat"            "BMI"            "IsManualReport" "LogId"         
 [9] "Time"           "Month"          "WeekDay"        "Time_of_day"   
Code
colnames(daily_calories)
[1] "Id"          "ActivityDay" "Calories"    "Month"       "WeekDay"    
Code
colnames(daily_steps)
[1] "Id"          "ActivityDay" "StepTotal"   "Month"       "WeekDay"    

Removing duplicated rows.

Code
daily_activity <- daily_activity[!duplicated(daily_activity), ]

sleep_day <- sleep_day[!duplicated(sleep_day), ]

daily_Intensities <- daily_Intensities[!duplicated(daily_Intensities), ]

weight_log_info <- weight_log_info[!duplicated(weight_log_info), ]

minute_steps_wide <- minute_steps_wide[!duplicated(minute_steps_wide), ]

minute_intensities_wide <- minute_intensities_wide[!duplicated(minute_intensities_wide), ]

minute_calories_wide <- minute_calories_wide[!duplicated(minute_calories_wide), ]
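Before dropping duplicates it can help to record how many there are, so the change can be documented. A minimal sketch on a toy table with made-up values:

```r
# Toy table with one fully duplicated row.
toy <- data.frame(
  Id = c(1, 1, 1, 2),
  Date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-13", "2016-04-12")),
  TotalMinutesAsleep = c(327, 384, 384, 412)
)

n_dupes <- sum(duplicated(toy))      # rows identical to an earlier row
deduped <- toy[!duplicated(toy), ]   # same idiom as in the text

cat("rows before:", nrow(toy), " duplicates:", n_dupes,
    " rows after:", nrow(deduped), "\n")
```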

Merging data frames by Id to create combined tables that give a wider view of the data.

Code
calorie_steps <- merge(daily_calories,daily_steps, by = "Id",all = TRUE)

weight_sleep <- merge(weight_log_info,sleep_day,by = "Id", all = TRUE)

dailyActivity_sleep <- merge(daily_activity,sleep_day,by = "Id",all = TRUE)

dailyIntensities_weight <- merge(daily_Intensities,weight_log_info, by = "Id",all = TRUE)

Using inner_join() to join the hourly datasets by Id, again to create combined tables that give a wider view.

Code
hourly_calories_intensities <- inner_join(hourly_calories,hourly_intensities, by = "Id",multiple = "all")

hourly_calories_steps <- inner_join(hourly_calories,hourly_steps, by = "Id",multiple = "all")

hourly_intensities_calories <- inner_join(calorie_steps,hourly_intensities, by = "Id",multiple = "all")
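One thing to watch with these joins is that Id alone is not unique per row: each user has one row per day (or per hour), so joining on Id pairs every row of a user in one table with every row of that user in the other. The summary of calorie_steps reports a length of 27800, while a daily table such as daily_activity has 940 rows, which is consistent with that multiplication. The sketch below uses toy data with made-up values to compare a merge on Id with a merge on Id and day; treating Id plus ActivityDay as the intended key is an assumption.

```r
# Toy daily tables: 2 users x 2 days each.
days <- as.Date(c("2016-04-12", "2016-04-13"))
calories <- data.frame(Id = rep(c(1, 2), each = 2),
                       ActivityDay = rep(days, 2),
                       Calories = c(1985, 1797, 2100, 2050))
steps <- data.frame(Id = rep(c(1, 2), each = 2),
                    ActivityDay = rep(days, 2),
                    StepTotal = c(13162, 10735, 8000, 9000))

# Keyed on Id only: every day of a user is paired with every day
# of the same user in the other table.
by_id <- merge(calories, steps, by = "Id", all = TRUE)

# Keyed on Id and day: one row per user per day.
by_id_day <- merge(calories, steps, by = c("Id", "ActivityDay"), all = TRUE)

print(nrow(by_id))
print(nrow(by_id_day))
```

With the two-column key, the merged table keeps one row per user and day, so means and quartiles are not weighted by how many days each user logged.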

Omitting NAs from the datasets, since rows with missing values would otherwise interfere with the visualizations.

Code
dailyActivity_sleep <- na.omit(dailyActivity_sleep)
dailyIntensities_weight <- na.omit(dailyIntensities_weight)
hourly_calories_intensities <- na.omit(hourly_calories_intensities)
daily_activity <- na.omit(daily_activity)
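na.omit() drops a row if any of its columns is missing, so one sparsely filled column can remove most of a table. In the weight_sleep summary, for instance, Fat shows 1355 NAs out of 1406 rows. The sketch below uses toy data with made-up values to contrast na.omit() with filtering only on the columns needed for a given plot; the choice of needed columns is purely illustrative.

```r
# Toy merged table: Fat is rarely recorded, the other columns are complete.
toy <- data.frame(
  Id = c(1, 1, 2, 2),
  WeightKg = c(52.6, 52.6, 61.2, 61.5),
  Fat = c(22, NA, NA, NA),
  SedentaryMinutes = c(728, 776, 1218, 726)
)

all_complete <- na.omit(toy)                  # drops any row with any NA
needed <- c("WeightKg", "SedentaryMinutes")   # columns actually plotted
targeted <- toy[complete.cases(toy[, needed]), ]

print(nrow(all_complete))
print(nrow(targeted))
```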

Getting summary of data using summary() function.

Code
summary(calorie_steps)
       Id            ActivityDay.x           Calories      Month.x         
 Min.   :1.504e+09   Min.   :2016-04-12   Min.   :   0   Length:27800      
 1st Qu.:2.320e+09   1st Qu.:2016-04-19   1st Qu.:1827   Class :character  
 Median :4.445e+09   Median :2016-04-26   Median :2156   Mode  :character  
 Mean   :4.833e+09   Mean   :2016-04-26   Mean   :2313                     
 3rd Qu.:6.962e+09   3rd Qu.:2016-05-04   3rd Qu.:2800                     
 Max.   :8.878e+09   Max.   :2016-05-12   Max.   :4900                     
  WeekDay.x         ActivityDay.y          StepTotal       Month.y         
 Length:27800       Min.   :2016-04-12   Min.   :    0   Length:27800      
 Class :character   1st Qu.:2016-04-19   1st Qu.: 3761   Class :character  
 Mode  :character   Median :2016-04-26   Median : 7443   Mode  :character  
                    Mean   :2016-04-26   Mean   : 7673                     
                    3rd Qu.:2016-05-04   3rd Qu.:10771                     
                    Max.   :2016-05-12   Max.   :36019                     
  WeekDay.y        
 Length:27800      
 Class :character  
 Mode  :character  
                   
                   
                   
Code
summary(weight_sleep)
       Id                Date.x              WeightKg       WeightPounds  
 Min.   :1.504e+09   Min.   :2016-04-12   Min.   : 52.60   Min.   :116.0  
 1st Qu.:5.577e+09   1st Qu.:2016-04-18   1st Qu.: 61.20   1st Qu.:134.9  
 Median :6.962e+09   Median :2016-04-28   Median : 61.50   Median :135.6  
 Mean   :6.235e+09   Mean   :2016-04-26   Mean   : 63.34   Mean   :139.6  
 3rd Qu.:6.962e+09   3rd Qu.:2016-05-04   3rd Qu.: 62.00   3rd Qu.:136.7  
 Max.   :8.878e+09   Max.   :2016-05-12   Max.   :133.50   Max.   :294.3  
                     NA's   :292          NA's   :292      NA's   :292    
      Fat             BMI        IsManualReport      LogId          
 Min.   :22.00   Min.   :21.45   Mode :logical   Min.   :1.460e+12  
 1st Qu.:22.00   1st Qu.:23.89   FALSE:55        1st Qu.:1.461e+12  
 Median :25.00   Median :24.00   TRUE :1059      Median :1.462e+12  
 Mean   :23.53   Mean   :24.42   NA's :292       Mean   :1.462e+12  
 3rd Qu.:25.00   3rd Qu.:24.21                   3rd Qu.:1.462e+12  
 Max.   :25.00   Max.   :47.54                   Max.   :1.463e+12  
 NA's   :1355    NA's   :292                     NA's   :292        
    Time.x           Month.x           WeekDay.x            Time_of_day  
 Length:1406       Length:1406        Length:1406        Night    :  23  
 Class1:hms        Class :character   Class :character   Morning  :  31  
 Class2:difftime   Mode  :character   Mode  :character   Afternoon:   1  
 Mode  :numeric                                          Evening  :1059  
                                                         NA's     : 292  
                                                                         
                                                                         
 TotalSleepRecords TotalMinutesAsleep TotalTimeInBed      Date.y          
 Min.   :1.000     Min.   : 58.0      Min.   : 61.0   Min.   :2016-04-12  
 1st Qu.:1.000     1st Qu.:400.0      1st Qu.:421.8   1st Qu.:2016-04-19  
 Median :1.000     Median :442.0      Median :457.0   Median :2016-04-27  
 Mean   :1.101     Mean   :433.7      Mean   :458.3   Mean   :2016-04-26  
 3rd Qu.:1.000     3rd Qu.:476.2      3rd Qu.:510.0   3rd Qu.:2016-05-04  
 Max.   :3.000     Max.   :796.0      Max.   :961.0   Max.   :2016-05-12  
 NA's   :26        NA's   :26         NA's   :26      NA's   :26          
    Time.y           Month.y           WeekDay.y        
 Length:1406       Length:1406        Length:1406       
 Class1:hms        Class :character   Class :character  
 Class2:difftime   Mode  :character   Mode  :character  
 Mode  :numeric                                         
                                                        
                                                        
                                                        
Code
summary(dailyActivity_sleep)
       Id             ActivityDate          TotalSteps    TotalDistance   
 Min.   :1.504e+09   Min.   :2016-04-12   Min.   :    0   Min.   : 0.000  
 1st Qu.:3.977e+09   1st Qu.:2016-04-19   1st Qu.: 4660   1st Qu.: 3.160  
 Median :4.703e+09   Median :2016-04-27   Median : 8585   Median : 6.120  
 Mean   :5.021e+09   Mean   :2016-04-26   Mean   : 8108   Mean   : 5.722  
 3rd Qu.:6.962e+09   3rd Qu.:2016-05-04   3rd Qu.:11317   3rd Qu.: 7.920  
 Max.   :8.792e+09   Max.   :2016-05-12   Max.   :22988   Max.   :17.950  
 TrackerDistance  LoggedActivitiesDistance VeryActiveDistance
 Min.   : 0.000   Min.   :0.0000           Min.   : 0.000    
 1st Qu.: 3.160   1st Qu.:0.0000           1st Qu.: 0.000    
 Median : 6.120   Median :0.0000           Median : 0.530    
 Mean   : 5.715   Mean   :0.1215           Mean   : 1.397    
 3rd Qu.: 7.880   3rd Qu.:0.0000           3rd Qu.: 2.310    
 Max.   :17.950   Max.   :4.9421           Max.   :13.400    
 ModeratelyActiveDistance LightActiveDistance SedentaryActiveDistance
 Min.   :0.0000           Min.   : 0.000      Min.   :0.0000000      
 1st Qu.:0.0000           1st Qu.: 2.350      1st Qu.:0.0000000      
 Median :0.4000           Median : 3.540      Median :0.0000000      
 Mean   :0.7309           Mean   : 3.532      Mean   :0.0006795      
 3rd Qu.:1.0000           3rd Qu.: 4.830      3rd Qu.:0.0000000      
 Max.   :6.4800           Max.   :10.300      Max.   :0.1100000      
 VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes
 Min.   :  0.00    Min.   :  0.00      Min.   :  0.0        Min.   :   0.0  
 1st Qu.:  0.00    1st Qu.:  0.00      1st Qu.:144.0        1st Qu.: 659.0  
 Median :  8.00    Median : 10.00      Median :200.0        Median : 734.0  
 Mean   : 23.94    Mean   : 17.34      Mean   :199.8        Mean   : 799.4  
 3rd Qu.: 36.00    3rd Qu.: 24.00      3rd Qu.:258.0        3rd Qu.: 853.0  
 Max.   :210.00    Max.   :143.00      Max.   :518.0        Max.   :1440.0  
    Calories      Month.x           WeekDay.x         TotalSleepRecords
 Min.   :   0   Length:12348       Length:12348       Min.   :1.000    
 1st Qu.:1776   Class :character   Class :character   1st Qu.:1.000    
 Median :2158   Mode  :character   Mode  :character   Median :1.000    
 Mean   :2323                                         Mean   :1.122    
 3rd Qu.:2859                                         3rd Qu.:1.000    
 Max.   :4900                                         Max.   :3.000    
 TotalMinutesAsleep TotalTimeInBed       Date                Time         
 Min.   : 58.0      Min.   : 61.0   Min.   :2016-04-12   Length:12348     
 1st Qu.:361.0      1st Qu.:402.0   1st Qu.:2016-04-19   Class1:hms       
 Median :432.0      Median :462.0   Median :2016-04-27   Class2:difftime  
 Mean   :419.1      Mean   :458.2   Mean   :2016-04-26   Mode  :numeric   
 3rd Qu.:492.0      3rd Qu.:526.0   3rd Qu.:2016-05-04                    
 Max.   :796.0      Max.   :961.0   Max.   :2016-05-12                    
   Month.y           WeekDay.y        
 Length:12348       Length:12348      
 Class :character   Class :character  
 Mode  :character   Mode  :character  
                                      
                                      
                                      
Code
summary(dailyIntensities_weight)
       Id             ActivityDay         SedentaryMinutes LightlyActiveMinutes
 Min.   :1.504e+09   Min.   :2016-04-12   Min.   :   0.0   Min.   :  0.0       
 1st Qu.:1.504e+09   1st Qu.:2016-04-19   1st Qu.: 654.0   1st Qu.:191.8       
 Median :2.912e+09   Median :2016-04-27   Median : 739.0   Median :233.0       
 Mean   :2.912e+09   Mean   :2016-04-27   Mean   : 792.0   Mean   :224.4       
 3rd Qu.:4.320e+09   3rd Qu.:2016-05-04   3rd Qu.: 834.5   3rd Qu.:288.8       
 Max.   :4.320e+09   Max.   :2016-05-12   Max.   :1440.0   Max.   :390.0       
 FairlyActiveMinutes VeryActiveMinutes SedentaryActiveDistance
 Min.   : 0.00       Min.   : 0.00     Min.   :0              
 1st Qu.: 8.00       1st Qu.: 1.00     1st Qu.:0              
 Median :13.50       Median :14.50     Median :0              
 Mean   :15.74       Mean   :21.15     Mean   :0              
 3rd Qu.:23.00       3rd Qu.:37.75     3rd Qu.:0              
 Max.   :47.00       Max.   :78.00     Max.   :0              
 LightActiveDistance ModeratelyActiveDistance VeryActiveDistance
 Min.   :0.000       Min.   :0.0000           Min.   :0.0000    
 1st Qu.:2.973       1st Qu.:0.3225           1st Qu.:0.0625    
 Median :4.255       Median :0.5650           Median :1.0350    
 Mean   :3.961       Mean   :0.6482           Mean   :1.5682    
 3rd Qu.:5.397       3rd Qu.:0.9875           3rd Qu.:2.7850    
 Max.   :6.440       Max.   :2.1200           Max.   :6.4000    
   Month.x           WeekDay.x              Date               WeightKg   
 Length:62          Length:62          Min.   :2016-04-17   Min.   :52.6  
 Class :character   Class :character   1st Qu.:2016-04-17   1st Qu.:52.6  
 Mode  :character   Mode  :character   Median :2016-04-24   Median :62.5  
                                       Mean   :2016-04-24   Mean   :62.5  
                                       3rd Qu.:2016-05-02   3rd Qu.:72.4  
                                       Max.   :2016-05-02   Max.   :72.4  
  WeightPounds        Fat            BMI        IsManualReport
 Min.   :116.0   Min.   :22.0   Min.   :22.65   Mode:logical  
 1st Qu.:116.0   1st Qu.:22.0   1st Qu.:22.65   TRUE:62       
 Median :137.8   Median :23.5   Median :25.05                 
 Mean   :137.8   Mean   :23.5   Mean   :25.05                 
 3rd Qu.:159.6   3rd Qu.:25.0   3rd Qu.:27.45                 
 Max.   :159.6   Max.   :25.0   Max.   :27.45                 
     LogId               Time            Month.y           WeekDay.y        
 Min.   :1.461e+12   Length:62         Length:62          Length:62         
 1st Qu.:1.461e+12   Class1:hms        Class :character   Class :character  
 Median :1.462e+12   Class2:difftime   Mode  :character   Mode  :character  
 Mean   :1.462e+12   Mode  :numeric                                         
 3rd Qu.:1.462e+12                                                          
 Max.   :1.462e+12                                                          
    Time_of_day
 Night    : 0  
 Morning  : 0  
 Afternoon: 0  
 Evening  :62  
               
               
Code
summary(hourly_calories_intensities)
       Id               Calories          Date.x              Time.x        
 Min.   :1.504e+09   Min.   : 42.00   Min.   :2016-04-12   Length:15393213  
 1st Qu.:2.320e+09   1st Qu.: 63.00   1st Qu.:2016-04-19   Class1:hms       
 Median :4.445e+09   Median : 83.00   Median :2016-04-26   Class2:difftime  
 Mean   :4.820e+09   Mean   : 97.74   Mean   :2016-04-26   Mode  :numeric   
 3rd Qu.:6.962e+09   3rd Qu.:109.00   3rd Qu.:2016-05-04                    
 Max.   :8.878e+09   Max.   :948.00   Max.   :2016-05-12                    
   Month.x           WeekDay.x           Time_of_day.x     TotalIntensity  
 Length:15393213    Length:15393213    Night    :4545397   Min.   :  0.00  
 Class :character   Class :character   Morning  :3880444   1st Qu.:  0.00  
 Mode  :character   Mode  :character   Afternoon:3814580   Median :  3.00  
                                       Evening  :3152792   Mean   : 12.06  
                                                           3rd Qu.: 16.00  
                                                           Max.   :180.00  
 AverageIntensity     Date.y              Time.y           Month.y         
 Min.   :0.0000   Min.   :2016-04-12   Length:15393213   Length:15393213   
 1st Qu.:0.0000   1st Qu.:2016-04-19   Class1:hms        Class :character  
 Median :0.0500   Median :2016-04-26   Class2:difftime   Mode  :character  
 Mean   :0.2010   Mean   :2016-04-26   Mode  :numeric                      
 3rd Qu.:0.2667   3rd Qu.:2016-05-04                                       
 Max.   :3.0000   Max.   :2016-05-12                                       
  WeekDay.y           Time_of_day.y    
 Length:15393213    Night    :4545397  
 Class :character   Morning  :3880444  
 Mode  :character   Afternoon:3814580  
                    Evening  :3152792  
                                       
                                       
Code
summary(hourly_calories_steps)
       Id               Calories          Date.x              Time.x        
 Min.   :1.504e+09   Min.   : 42.00   Min.   :2016-04-12   Length:15393213  
 1st Qu.:2.320e+09   1st Qu.: 63.00   1st Qu.:2016-04-19   Class1:hms       
 Median :4.445e+09   Median : 83.00   Median :2016-04-26   Class2:difftime  
 Mean   :4.820e+09   Mean   : 97.74   Mean   :2016-04-26   Mode  :numeric   
 3rd Qu.:6.962e+09   3rd Qu.:109.00   3rd Qu.:2016-05-04                    
 Max.   :8.878e+09   Max.   :948.00   Max.   :2016-05-12                    
   Month.x           WeekDay.x           Time_of_day.x       StepTotal      
 Length:15393213    Length:15393213    Night    :4545397   Min.   :    0.0  
 Class :character   Class :character   Morning  :3880444   1st Qu.:    0.0  
 Mode  :character   Mode  :character   Afternoon:3814580   Median :   41.0  
                                       Evening  :3152792   Mean   :  321.2  
                                                           3rd Qu.:  359.0  
                                                           Max.   :10554.0  
     Date.y              Time.y           Month.y           WeekDay.y        
 Min.   :2016-04-12   Length:15393213   Length:15393213    Length:15393213   
 1st Qu.:2016-04-19   Class1:hms        Class :character   Class :character  
 Median :2016-04-26   Class2:difftime   Mode  :character   Mode  :character  
 Mean   :2016-04-26   Mode  :numeric                                         
 3rd Qu.:2016-05-04                                                          
 Max.   :2016-05-12                                                          
   Time_of_day.y    
 Night    :4545397  
 Morning  :3880444  
 Afternoon:3814580  
 Evening  :3152792  
                    
                    
Code
summary(hourly_intensities_calories)
       Id            ActivityDay.x           Calories      Month.x         
 Min.   :1.504e+09   Min.   :2016-04-12   Min.   :   0   Length:19615642   
 1st Qu.:2.320e+09   1st Qu.:2016-04-19   1st Qu.:1821   Class :character  
 Median :4.445e+09   Median :2016-04-26   Median :2162   Mode  :character  
 Mean   :4.802e+09   Mean   :2016-04-26   Mean   :2320                     
 3rd Qu.:6.962e+09   3rd Qu.:2016-05-04   3rd Qu.:2809                     
 Max.   :8.878e+09   Max.   :2016-05-12   Max.   :4900                     
  WeekDay.x         ActivityDay.y          StepTotal       Month.y         
 Length:19615642    Min.   :2016-04-12   Min.   :    0   Length:19615642   
 Class :character   1st Qu.:2016-04-19   1st Qu.: 3761   Class :character  
 Mode  :character   Median :2016-04-26   Median : 7502   Mode  :character  
                    Mean   :2016-04-26   Mean   : 7696                     
                    3rd Qu.:2016-05-04   3rd Qu.:10817                     
                    Max.   :2016-05-12   Max.   :36019                     
  WeekDay.y         TotalIntensity   AverageIntensity      Date           
 Length:19615642    Min.   :  0.00   Min.   :0.0000   Min.   :2016-04-12  
 Class :character   1st Qu.:  0.00   1st Qu.:0.0000   1st Qu.:2016-04-19  
 Mode  :character   Median :  3.00   Median :0.0500   Median :2016-04-26  
                    Mean   : 12.06   Mean   :0.2010   Mean   :2016-04-26  
                    3rd Qu.: 16.00   3rd Qu.:0.2667   3rd Qu.:2016-05-04  
                    Max.   :180.00   Max.   :3.0000   Max.   :2016-05-12  
     Time             Month             WeekDay             Time_of_day     
 Length:19615642   Length:19615642    Length:19615642    Night    :5791418  
 Class1:hms        Class :character   Class :character   Morning  :4945271  
 Class2:difftime   Mode  :character   Mode  :character   Afternoon:4861084  
 Mode  :numeric                                          Evening  :4017869  
                                                                            
                                                                            

7 ANALYZE PHASE

In this phase, I analyze the cleaned and processed datasets to answer the business questions and to surface patterns that have not yet been explored. The results are intended to help Bellabeat’s stakeholders and marketing executives make informed decisions and develop a targeted marketing campaign, with the aim of retaining the existing customer base and improving services.

7.1 Key Task

    • A series of computations was performed to understand how female customers use the products and services.
    • The analysis aimed to build a thorough picture of customer traits and to identify patterns that could help the analytics team determine areas for improvement.
    • Merging multiple datasets makes it possible to explore trends and relationships across them, giving a clearer picture of the user base.
    • Several columns were aggregated to create new attributes, which were then compared to refine the understanding of the day-to-day activity recorded through the Bellabeat app.
    • Various built-in R functions were used to examine these datasets thoroughly and to finalize the profiling for this analysis.
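
As a minimal sketch of the merging step, assuming two made-up tables (`activity` and `sleep`, not the case-study data) that share an `Id` and a `Date` column: joining on both keys keeps one row per user per day, whereas joining on `Id` alone pairs every activity day with every sleep day of the same user.

```r
# Toy data standing in for a daily activity table and a daily sleep table
activity <- data.frame(
  Id = c(1, 1, 2, 2),
  Date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12", "2016-04-13")),
  TotalSteps = c(8000, 9500, 4000, 6000)
)
sleep <- data.frame(
  Id = c(1, 1, 2),
  Date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
  TotalMinutesAsleep = c(420, 390, 450)
)

# Join on both keys: one row per user-day present in both tables
by_id_date <- merge(activity, sleep, by = c("Id", "Date"))

# Join on Id only: each activity row is paired with every sleep row of that user
by_id_only <- merge(activity, sleep, by = "Id")

nrow(by_id_date)  # 3
nrow(by_id_only)  # 6
```

With `merge()`, the `.x` and `.y` suffixes appear whenever a non-key column (here `Date` in the second join) exists in both tables.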

7.2 Deliverables

    • Numerous analyses were performed using functions such as summarise(), distinct(), group_by(), describe(), and table().
    • The computations give a brief account of how female customers use services and products across different categories.
    • In addition, some statistical operations were performed to obtain the distribution of the attributes in each dataset that influence customer behavior.

7.3 Code Chunk

Getting an overview of the maximum and minimum calories burned by each user, using the max() and min() functions respectively.

Code
calorie_steps %>% 
  group_by(Id) %>% 
  summarise(Maximum_Calories = max(Calories), Minimum_Calories = min(Calories)) %>% 
  distinct() %>% 
  ungroup() %>% 
  slice(1:13)
# A tibble: 13 × 3
           Id Maximum_Calories Minimum_Calories
        <dbl>            <dbl>            <dbl>
 1 1503960366             2159                0
 2 1624580081             2690             1002
 3 1644430081             3846             1276
 4 1844505072             2130              665
 5 1927972279             2638             1383
 6 2022484408             3158             1848
 7 2026352035             1926             1141
 8 2320127002             2124             1125
 9 2347167796             2670              403
10 2873212765             2241             1431
11 3372868164             2124             1237
12 3977333714             1760               52
13 4020332650             3879             1120

Comparing each user’s maximum step count with their maximum calories burned, using max().

Code
calorie_steps %>% 
  group_by(Id) %>%
  summarise(Maximum_step = max(StepTotal),Calories = max(Calories)) %>% 
  distinct() %>% 
  ungroup() %>% 
  slice(1:13)
# A tibble: 13 × 3
           Id Maximum_step Calories
        <dbl>        <dbl>    <dbl>
 1 1503960366        18134     2159
 2 1624580081        36019     2690
 3 1644430081        18213     3846
 4 1844505072         8054     2130
 5 1927972279         3790     2638
 6 2022484408        18387     3158
 7 2026352035        12357     1926
 8 2320127002        10725     2124
 9 2347167796        22244     2670
10 2873212765         9685     2241
11 3372868164         9715     2124
12 3977333714        16520     1760
13 4020332650        11728     3879

Comparing the average total steps, total distance, and tracker distance for each weekday.

Code
daily_activity %>%
  group_by(WeekDay) %>%
  select(TotalSteps,TotalDistance,TrackerDistance) %>%
  summarise_all(mean)
# A tibble: 7 × 4
  WeekDay   TotalSteps TotalDistance TrackerDistance
  <chr>          <dbl>         <dbl>           <dbl>
1 Friday         7448.          5.31            5.30
2 Monday         7781.          5.55            5.53
3 Saturday       8153.          5.85            5.85
4 Sunday         6933.          5.03            5.03
5 Thursday       7406.          5.31            5.29
6 Tuesday        8125.          5.83            5.81
7 Wednesday      7559.          5.49            5.47
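
The weekday rows above come out alphabetically because `WeekDay` is a character column. A minimal sketch, using made-up numbers rather than the case-study data, of converting it to a factor with explicit levels so the summary runs from Monday to Sunday; `across()` is used here as the current dplyr replacement for `summarise_all()`, and `toy_activity` and `week_levels` are illustrative names.

```r
library(dplyr)

# Toy data (hypothetical values)
toy_activity <- data.frame(
  WeekDay = c("Friday", "Monday", "Sunday", "Monday", "Friday", "Sunday"),
  TotalSteps = c(7000, 8000, 6000, 9000, 7500, 6500),
  TotalDistance = c(5.0, 5.6, 4.2, 6.4, 5.4, 4.6)
)

week_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday")

weekday_means <- toy_activity %>%
  # Explicit levels make group_by() sort by calendar order, not alphabetically
  mutate(WeekDay = factor(WeekDay, levels = week_levels)) %>%
  group_by(WeekDay) %>%
  summarise(across(c(TotalSteps, TotalDistance), mean), .groups = "drop")

weekday_means
```

The same factor conversion also keeps weekdays in calendar order on plot axes.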

Averaging the distance covered at each activity level, from very active to lightly active, using the summarize_all() function.

Code
dailyActivity_sleep %>% 
  select(VeryActiveDistance,ModeratelyActiveDistance,LightActiveDistance) %>%
  summarize_all(mean)
  VeryActiveDistance ModeratelyActiveDistance LightActiveDistance
1           1.397498                0.7308617            3.532016

Understanding how active time splits between very active, fairly active, and lightly active minutes on each weekday, using summarize_all().

Code
daily_activity %>%
  group_by(WeekDay) %>% 
  select(VeryActiveMinutes,FairlyActiveMinutes,LightlyActiveMinutes) %>% 
  summarize_all(mean)
# A tibble: 7 × 4
  WeekDay   VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes
  <chr>                 <dbl>               <dbl>                <dbl>
1 Friday                 20.1                12.1                 204.
2 Monday                 23.1                14                   192.
3 Saturday               21.9                15.2                 207.
4 Sunday                 20.0                14.5                 174.
5 Thursday               19.4                12.0                 185.
6 Tuesday                23.0                14.3                 197.
7 Wednesday              20.8                13.1                 190.

Previewing the sum of very active, fairly active, and lightly active minutes, grouped by user Id and month.

Code
daily_activity %>%
  group_by(Id,Month) %>%
  summarize_at(vars(VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes), sum) %>% 
  distinct() %>% 
  ungroup() %>% 
  slice(1:15)
# A tibble: 15 × 5
           Id Month VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes
        <dbl> <chr>             <dbl>               <dbl>                <dbl>
 1 1503960366 April               762                 349                 4250
 2 1503960366 May                 438                 245                 2568
 3 1624580081 April                76                 111                 3205
 4 1624580081 May                 193                  69                 1553
 5 1644430081 April               171                 456                 3292
 6 1644430081 May                 116                 185                 2062
 7 1844505072 April                 4                  33                 3111
 8 1844505072 May                   0                   7                  468
 9 1927972279 April                 1                  15                  772
10 1927972279 May                  40                   9                  424
11 2022484408 April               691                 368                 5077
12 2022484408 May                 434                 232                 2904
13 2026352035 April                 3                   8                 4325
14 2026352035 May                   0                   0                 3631
15 2320127002 April                17                  53                 3867
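
Since `summarise()` already returns one row per group, a sketch like the following (toy data, hypothetical values and names) gives the same kind of per-user, per-month totals without the `distinct()` and `ungroup()` steps; `.groups = "drop"` removes the grouping in the same call.

```r
library(dplyr)

# Toy data (hypothetical values)
toy_minutes <- data.frame(
  Id = c(1, 1, 1, 2, 2),
  Month = c("April", "April", "May", "April", "May"),
  VeryActiveMinutes = c(30, 20, 10, 0, 5),
  LightlyActiveMinutes = c(200, 180, 150, 90, 120)
)

monthly_totals <- toy_minutes %>%
  group_by(Id, Month) %>%
  # One output row per Id/Month pair; the result is returned ungrouped
  summarise(across(c(VeryActiveMinutes, LightlyActiveMinutes), sum),
            .groups = "drop")

monthly_totals
```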

A glance at the maximum minutes asleep versus the maximum time in bed for each user and month, using max().

Code
sleep_day %>% 
  group_by(Id,Month) %>% 
  summarize(max(TotalMinutesAsleep),max(TotalTimeInBed)) %>% 
  distinct() %>% 
  ungroup() %>% 
  slice(1:20)
# A tibble: 20 × 4
           Id Month `max(TotalMinutesAsleep)` `max(TotalTimeInBed)`
        <dbl> <chr>                     <dbl>                 <dbl>
 1 1503960366 April                       700                   712
 2 1503960366 May                         594                   611
 3 1644430081 April                       124                   142
 4 1644430081 May                         796                   961
 5 1844505072 April                       722                   961
 6 1844505072 May                         590                   961
 7 1927972279 April                       750                   775
 8 2026352035 April                       573                   607
 9 2026352035 May                         541                   568
10 2320127002 April                        61                    69
11 2347167796 April                       556                   602
12 3977333714 April                       424                   566
13 3977333714 May                         383                   626
14 4020332650 April                       501                   541
15 4020332650 May                         478                   536
16 4319703577 April                       692                   722
17 4319703577 May                         602                   638
18 4388161847 April                       619                   641
19 4388161847 May                         547                   597
20 4445114986 April                       462                   499

A glance at the minimum minutes asleep versus the minimum time in bed for each user, using min().

Code
sleep_day %>% 
  group_by(Id) %>% 
  summarize(min(TotalMinutesAsleep),min(TotalTimeInBed)) %>% 
  distinct() %>% 
  ungroup() %>% 
  slice(1:13)
# A tibble: 13 × 3
           Id `min(TotalMinutesAsleep)` `min(TotalTimeInBed)`
        <dbl>                     <dbl>                 <dbl>
 1 1503960366                       245                   264
 2 1644430081                       119                   127
 3 1844505072                       590                   961
 4 1927972279                       166                   178
 5 2026352035                       357                   380
 6 2320127002                        61                    69
 7 2347167796                       374                   386
 8 3977333714                       152                   305
 9 4020332650                        77                    77
10 4319703577                        59                    65
11 4388161847                        62                    65
12 4445114986                        98                   107
13 4558609924                       103                   121
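
One way to put minutes asleep and time in bed on a common scale is their ratio. The sketch below uses made-up rows, and the `sleep_efficiency` name is an illustrative assumption rather than a column from the case study; it also names the summary columns so the result avoids back-ticked headers such as `min(TotalMinutesAsleep)`.

```r
library(dplyr)

# Toy data (hypothetical values)
toy_sleep <- data.frame(
  Id = c(1, 1, 2, 2),
  TotalMinutesAsleep = c(420, 360, 300, 450),
  TotalTimeInBed = c(480, 400, 400, 500)
)

sleep_summary <- toy_sleep %>%
  # Share of time in bed that was actually spent asleep
  mutate(sleep_efficiency = TotalMinutesAsleep / TotalTimeInBed) %>%
  group_by(Id) %>%
  summarise(
    min_asleep = min(TotalMinutesAsleep),
    min_in_bed = min(TotalTimeInBed),
    mean_efficiency = mean(sleep_efficiency),
    .groups = "drop"
  )

sleep_summary
```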

A comparison of average sedentary, lightly active, and fairly active minutes for each weekday, using summarise_all().

Code
daily_Intensities %>% 
  group_by(WeekDay) %>% 
  select(SedentaryMinutes,LightlyActiveMinutes,FairlyActiveMinutes) %>% 
  summarise_all(mean)
# A tibble: 7 × 4
  WeekDay   SedentaryMinutes LightlyActiveMinutes FairlyActiveMinutes
  <chr>                <dbl>                <dbl>               <dbl>
1 Friday               1000.                 204.                12.1
2 Monday               1028.                 192.                14  
3 Saturday              964.                 207.                15.2
4 Sunday                990.                 174.                14.5
5 Thursday              962.                 185.                12.0
6 Tuesday              1007.                 197.                14.3
7 Wednesday             989.                 190.                13.1

Previewing each user’s total sedentary, lightly active, and fairly active minutes, using the sum() function.

Code
dailyIntensities_weight %>% 
  group_by(Id) %>% 
  summarise(sedentary_Minutes=sum(SedentaryMinutes), LightlyActive_Minutes=sum(LightlyActiveMinutes), 
            Active_Minutes=sum(FairlyActiveMinutes)) %>% 
            distinct() %>% ungroup() %>% slice(1:13)
# A tibble: 2 × 4
          Id sedentary_Minutes LightlyActive_Minutes Active_Minutes
       <dbl>             <dbl>                 <dbl>          <dbl>
1 1503960366             26293                  6818            594
2 4319703577             22810                  7092            382

Here, ungroup() and slice() limit the preview to a set number of rows.

A look at the average minutes spent at each intensity level, grouped by time of day and exact time.

Code
dailyIntensities_weight %>% 
  group_by(Time_of_day,Time) %>% 
  select(SedentaryMinutes,LightlyActiveMinutes,FairlyActiveMinutes,VeryActiveMinutes) %>% 
  summarise_all(mean) %>% 
  ungroup() %>% 
  slice(1:20)
# A tibble: 1 × 6
  Time_of_day Time     SedentaryMinutes LightlyActiveMinutes FairlyAct…¹ VeryA…²
  <fct>       <time>              <dbl>                <dbl>       <dbl>   <dbl>
1 Evening     23:59:59             792.                 224.        15.7    21.1
# … with abbreviated variable names ¹​FairlyActiveMinutes, ²​VeryActiveMinutes

A look at the average lightly active minutes and the average moderately and very active distance, grouped by time of day and exact time.

Code
dailyIntensities_weight %>% 
  group_by(Time_of_day,Time) %>% 
  select(LightlyActiveMinutes,ModeratelyActiveDistance,VeryActiveDistance) %>% 
  summarise_all(mean) %>% 
  ungroup() %>% 
  slice(1:20)
# A tibble: 1 × 5
  Time_of_day Time     LightlyActiveMinutes ModeratelyActiveDistance VeryActiv…¹
  <fct>       <time>                  <dbl>                    <dbl>       <dbl>
1 Evening     23:59:59                 224.                    0.648        1.57
# … with abbreviated variable name ¹​VeryActiveDistance

Previewing the mean calories, step total, and total intensity for each time of day.

Code
hourly_intensities_calories %>% 
  group_by(Time_of_day) %>% 
  select(Calories,StepTotal,TotalIntensity) %>% 
  summarise_all(mean)
# A tibble: 4 × 4
  Time_of_day Calories StepTotal TotalIntensity
  <fct>          <dbl>     <dbl>          <dbl>
1 Night          2320.     7694.           2.71
2 Morning        2321.     7697.          15.5 
3 Afternoon      2319.     7699.          19.3 
4 Evening        2319.     7696.          12.6 
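
For reference, a `Time_of_day` factor like the one used above can be derived from the hour with `cut()`. The boundaries below (night 0 to 5, morning 6 to 11, afternoon 12 to 17, evening 18 to 23) are an assumption for illustration; the cut-offs actually used in the case study are not shown in this section.

```r
# Hypothetical hours of the day (0-23)
hours <- c(0, 5, 6, 11, 12, 17, 18, 23)

# right = FALSE makes each interval closed on the left: [0,6), [6,12), ...
time_of_day <- cut(
  hours,
  breaks = c(0, 6, 12, 18, 24),
  labels = c("Night", "Morning", "Afternoon", "Evening"),
  right = FALSE
)

table(time_of_day)
```

Listing the labels in this order also fixes the factor levels, so grouped summaries come out as Night, Morning, Afternoon, Evening.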

Using the quantile() function to see how the values in the dataset are distributed.

Because quantile() does not accept NA values in its input by default, they are removed first with the na.omit() function.

Code
dailyIntensities_weight <- na.omit(dailyIntensities_weight)

sapply(dailyIntensities_weight[,c("LightlyActiveMinutes","ModeratelyActiveDistance","VeryActiveDistance","WeightKg")]
       , quantile, probs = c(0.25,0.35,0.5,0.75,0.85,1))
     LightlyActiveMinutes ModeratelyActiveDistance VeryActiveDistance WeightKg
25%                191.75                   0.3225             0.0625     52.6
35%                207.05                   0.4100             0.2765     52.6
50%                233.00                   0.5650             1.0350     62.5
75%                288.75                   0.9875             2.7850     72.4
85%                312.70                   1.1540             3.3350     72.4
100%               390.00                   2.1200             6.4000     72.4
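
As an alternative to dropping rows up front, `quantile()` accepts `na.rm = TRUE`, which skips missing values column by column. A minimal sketch with made-up data (the column names mirror the ones above, the values and the `toy_weight` name are hypothetical):

```r
# Toy data with missing values in one column only
toy_weight <- data.frame(
  WeightKg = c(52.6, 60.0, 72.4, 65.0),
  Fat = c(22, NA, 25, NA)
)

# na.omit() drops every row that has an NA in any column
nrow(na.omit(toy_weight))

# na.rm = TRUE ignores NAs within each column, so WeightKg keeps all four values
quantiles <- sapply(toy_weight, quantile, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)
quantiles
```

Which approach fits depends on whether the analysis needs complete rows or only complete columns.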

Taking a look at the distribution of weight, fat, and BMI alongside total time in bed and total minutes asleep, using the quantile() function.

As before, NA values are removed with the na.omit() function because quantile() does not accept them by default.

Code
weight_sleep <- na.omit(weight_sleep)

sapply(weight_sleep[,c("WeightPounds","Fat","BMI","TotalTimeInBed","TotalMinutesAsleep" )], 
       quantile, probs = c(0.25,0.35,0.5,0.75,0.85,1))
     WeightPounds Fat   BMI TotalTimeInBed TotalMinutesAsleep
25%      115.9631  22 22.65          351.5              332.5
35%      115.9631  22 22.65          380.5              360.5
50%      159.6147  25 27.45          449.0              430.0
75%      159.6147  25 27.45          522.0              505.5
85%      159.6147  25 27.45          550.5              526.0
100%     159.6147  25 27.45          722.0              700.0

Summarizing selected columns of the dataset to get an overview of each, with the summary formatted to two significant digits through the digits argument.

Code
hourly_calories_intensities %>% 
  select(Calories,TotalIntensity, AverageIntensity) %>% 
  summary(digits = 2)
    Calories   TotalIntensity AverageIntensity
 Min.   : 42   Min.   :  0    Min.   :0.00    
 1st Qu.: 63   1st Qu.:  0    1st Qu.:0.00    
 Median : 83   Median :  3    Median :0.05    
 Mean   : 98   Mean   : 12    Mean   :0.20    
 3rd Qu.:109   3rd Qu.: 16    3rd Qu.:0.27    
 Max.   :948   Max.   :180    Max.   :3.00    
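
The `digits` argument of `summary()` is applied as significant digits when the table is formatted, which differs from rounding to decimal places. A small sketch using the mean calorie value reported in the full summary of this dataset (97.74):

```r
x <- 97.74

signif(x, 2)           # two significant digits: 98
round(x, 2)            # two decimal places: 97.74
format(x, digits = 2)  # formatted with two significant digits: "98"
```

This is why the mean appears as 98 in the table above rather than 97.74.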

Getting a per-user (Id) overview of mean hourly calories and mean hourly step count.

Code
hourly_calories_steps %>% 
  group_by(Id) %>% 
  summarize(Calories = mean(Calories), Step_Total = mean(StepTotal)) %>% 
  distinct()
# A tibble: 33 × 3
           Id Calories Step_Total
        <dbl>    <dbl>      <dbl>
 1 1503960366     78.5      522. 
 2 1624580081     62.5      242. 
 3 1644430081    119.       308. 
 4 1844505072     66.6      109. 
 5 1927972279     91.5       38.6
 6 2022484408    105.       478. 
 7 2026352035     64.9      234. 
 8 2320127002     72.6      199. 
 9 2347167796     88.7      414. 
10 2873212765     80.2      318. 
# … with 23 more rows

Exploring a range of descriptive statistics for the dataset using the describe() function.

Code
describe(minute_calories_wide)
           vars     n         mean           sd       median      trimmed
Id            1 21645 4.836965e+09 2.424088e+09 4.445115e+09 4.754963e+09
Calories00    2 21645 1.620000e+00 1.400000e+00 1.220000e+00 1.300000e+00
Calories01    3 21645 1.630000e+00 1.400000e+00 1.220000e+00 1.310000e+00
Calories02    4 21645 1.640000e+00 1.410000e+00 1.220000e+00 1.310000e+00
Calories03    5 21645 1.640000e+00 1.420000e+00 1.220000e+00 1.310000e+00
Calories04    6 21645 1.640000e+00 1.430000e+00 1.220000e+00 1.310000e+00
Calories05    7 21645 1.640000e+00 1.440000e+00 1.220000e+00 1.310000e+00
Calories06    8 21645 1.640000e+00 1.440000e+00 1.220000e+00 1.310000e+00
Calories07    9 21645 1.630000e+00 1.420000e+00 1.220000e+00 1.300000e+00
Calories08   10 21645 1.620000e+00 1.410000e+00 1.220000e+00 1.300000e+00
Calories09   11 21645 1.620000e+00 1.400000e+00 1.220000e+00 1.300000e+00
Calories10   12 21645 1.620000e+00 1.410000e+00 1.220000e+00 1.300000e+00
Calories11   13 21645 1.620000e+00 1.410000e+00 1.220000e+00 1.300000e+00
Calories12   14 21645 1.610000e+00 1.400000e+00 1.220000e+00 1.290000e+00
Calories13   15 21645 1.610000e+00 1.410000e+00 1.220000e+00 1.290000e+00
Calories14   16 21645 1.610000e+00 1.400000e+00 1.220000e+00 1.290000e+00
Calories15   17 21645 1.610000e+00 1.400000e+00 1.220000e+00 1.290000e+00
Calories16   18 21645 1.610000e+00 1.400000e+00 1.220000e+00 1.280000e+00
Calories17   19 21645 1.610000e+00 1.410000e+00 1.220000e+00 1.290000e+00
Calories18   20 21645 1.610000e+00 1.400000e+00 1.220000e+00 1.290000e+00
Calories19   21 21645 1.620000e+00 1.410000e+00 1.220000e+00 1.290000e+00
Calories20   22 21645 1.620000e+00 1.410000e+00 1.220000e+00 1.290000e+00
Calories21   23 21645 1.610000e+00 1.410000e+00 1.220000e+00 1.290000e+00
Calories22   24 21645 1.630000e+00 1.420000e+00 1.220000e+00 1.300000e+00
Calories23   25 21645 1.620000e+00 1.410000e+00 1.220000e+00 1.290000e+00
Calories24   26 21645 1.610000e+00 1.420000e+00 1.220000e+00 1.290000e+00
Calories25   27 21645 1.620000e+00 1.420000e+00 1.220000e+00 1.290000e+00
Calories26   28 21645 1.610000e+00 1.410000e+00 1.220000e+00 1.290000e+00
Calories27   29 21645 1.620000e+00 1.420000e+00 1.220000e+00 1.290000e+00
Calories28   30 21645 1.620000e+00 1.420000e+00 1.220000e+00 1.290000e+00
Calories29   31 21645 1.620000e+00 1.430000e+00 1.220000e+00 1.300000e+00
Calories30   32 21645 1.620000e+00 1.420000e+00 1.220000e+00 1.290000e+00
Calories31   33 21645 1.630000e+00 1.410000e+00 1.220000e+00 1.300000e+00
Calories32   34 21645 1.630000e+00 1.430000e+00 1.220000e+00 1.300000e+00
Calories33   35 21645 1.640000e+00 1.440000e+00 1.220000e+00 1.300000e+00
Calories34   36 21645 1.630000e+00 1.430000e+00 1.220000e+00 1.300000e+00
Calories35   37 21645 1.630000e+00 1.420000e+00 1.220000e+00 1.310000e+00
Calories36   38 21645 1.640000e+00 1.460000e+00 1.220000e+00 1.310000e+00
Calories37   39 21645 1.640000e+00 1.450000e+00 1.220000e+00 1.300000e+00
Calories38   40 21645 1.630000e+00 1.450000e+00 1.220000e+00 1.300000e+00
Calories39   41 21645 1.630000e+00 1.430000e+00 1.220000e+00 1.300000e+00
Calories40   42 21645 1.630000e+00 1.420000e+00 1.220000e+00 1.300000e+00
Calories41   43 21645 1.620000e+00 1.410000e+00 1.220000e+00 1.300000e+00
Calories42   44 21645 1.620000e+00 1.410000e+00 1.220000e+00 1.290000e+00
Calories43   45 21645 1.620000e+00 1.410000e+00 1.220000e+00 1.300000e+00
Calories44   46 21645 1.620000e+00 1.410000e+00 1.220000e+00 1.300000e+00
Calories45   47 21645 1.620000e+00 1.400000e+00 1.220000e+00 1.300000e+00
Calories46   48 21645 1.620000e+00 1.410000e+00 1.220000e+00 1.290000e+00
Calories47   49 21645 1.620000e+00 1.410000e+00 1.220000e+00 1.290000e+00
Calories48   50 21645 1.620000e+00 1.400000e+00 1.220000e+00 1.300000e+00
Calories49   51 21645 1.620000e+00 1.400000e+00 1.220000e+00 1.300000e+00
Calories50   52 21645 1.620000e+00 1.410000e+00 1.220000e+00 1.300000e+00
Calories51   53 21645 1.610000e+00 1.400000e+00 1.220000e+00 1.290000e+00
Calories52   54 21645 1.620000e+00 1.410000e+00 1.220000e+00 1.300000e+00
Calories53   55 21645 1.620000e+00 1.400000e+00 1.220000e+00 1.300000e+00
Calories54   56 21645 1.620000e+00 1.410000e+00 1.220000e+00 1.300000e+00
Calories55   57 21645 1.620000e+00 1.390000e+00 1.220000e+00 1.300000e+00
Calories56   58 21645 1.610000e+00 1.380000e+00 1.220000e+00 1.290000e+00
Calories57   59 21645 1.610000e+00 1.370000e+00 1.220000e+00 1.300000e+00
Calories58   60 21645 1.610000e+00 1.370000e+00 1.220000e+00 1.300000e+00
Calories59   61 21645 1.610000e+00 1.370000e+00 1.220000e+00 1.300000e+00
Date         62 21645          NaN           NA           NA          NaN
Time         63 21645          NaN           NA           NA          NaN
Month*       64 21645 1.360000e+00 4.800000e-01 1.000000e+00 1.330000e+00
WeekDay*     65 21645 4.090000e+00 2.030000e+00 4.000000e+00 4.110000e+00
                    mad         min          max        range  skew kurtosis
Id         3.586058e+09 1.50396e+09 8.877689e+09 7.373729e+09  0.19    -1.27
Calories00 4.100000e-01 7.00000e-01 1.973000e+01 1.902000e+01  4.15    23.73
Calories01 4.100000e-01 7.00000e-01 1.973000e+01 1.902000e+01  4.14    23.72
Calories02 4.100000e-01 7.00000e-01 1.973000e+01 1.902000e+01  4.02    22.18
Calories03 4.100000e-01 7.00000e-01 1.973000e+01 1.902000e+01  4.03    21.64
Calories04 4.100000e-01 7.00000e-01 1.973000e+01 1.902000e+01  4.03    21.48
Calories05 4.100000e-01 7.00000e-01 1.973000e+01 1.902000e+01  4.11    22.63
Calories06 4.100000e-01 7.00000e-01 1.973000e+01 1.902000e+01  4.07    22.26
Calories07 4.100000e-01 7.00000e-01 1.973000e+01 1.902000e+01  4.13    22.92
Calories08 4.100000e-01 7.00000e-01 1.973000e+01 1.902000e+01  4.13    22.97
Calories09 4.100000e-01 7.00000e-01 1.676000e+01 1.606000e+01  4.05    21.62
Calories10 4.100000e-01 7.00000e-01 1.744000e+01 1.674000e+01  4.11    22.27
Calories11 4.100000e-01 7.00000e-01 1.676000e+01 1.606000e+01  4.10    22.16
Calories12 4.100000e-01 7.00000e-01 1.744000e+01 1.674000e+01  4.15    22.66
Calories13 4.100000e-01 7.00000e-01 1.668000e+01 1.598000e+01  4.19    22.97
Calories14 4.100000e-01 0.00000e+00 1.693000e+01 1.693000e+01  4.15    22.62
Calories15 4.100000e-01 7.00000e-01 1.719000e+01 1.648000e+01  4.17    22.95
Calories16 4.100000e-01 7.00000e-01 1.719000e+01 1.648000e+01  4.23    23.42
Calories17 4.100000e-01 7.00000e-01 1.744000e+01 1.674000e+01  4.23    23.70
Calories18 4.100000e-01 7.00000e-01 1.693000e+01 1.623000e+01  4.19    23.22
Calories19 4.100000e-01 7.00000e-01 1.668000e+01 1.598000e+01  4.17    23.03
Calories20 4.100000e-01 7.00000e-01 1.630000e+01 1.560000e+01  4.14    22.41
Calories21 4.100000e-01 7.00000e-01 1.683000e+01 1.612000e+01  4.21    23.36
Calories22 4.100000e-01 7.00000e-01 1.778000e+01 1.708000e+01  4.19    23.48
Calories23 4.100000e-01 7.00000e-01 1.778000e+01 1.708000e+01  4.21    23.64
Calories24 4.100000e-01 7.00000e-01 1.735000e+01 1.664000e+01  4.19    23.09
Calories25 4.100000e-01 0.00000e+00 1.709000e+01 1.709000e+01  4.20    23.18
Calories26 4.100000e-01 7.00000e-01 1.699000e+01 1.628000e+01  4.22    23.58
Calories27 4.100000e-01 7.00000e-01 1.723000e+01 1.653000e+01  4.25    23.92
Calories28 4.100000e-01 7.00000e-01 1.683000e+01 1.612000e+01  4.25    23.97
Calories29 4.100000e-01 7.00000e-01 1.735000e+01 1.664000e+01  4.20    23.15
Calories30 4.100000e-01 7.00000e-01 1.735000e+01 1.664000e+01  4.13    22.33
Calories31 4.100000e-01 7.00000e-01 1.761000e+01 1.691000e+01  4.14    22.67
Calories32 4.100000e-01 7.00000e-01 1.761000e+01 1.691000e+01  4.08    21.92
Calories33 4.100000e-01 7.00000e-01 1.761000e+01 1.691000e+01  4.07    21.79
Calories34 4.100000e-01 7.00000e-01 1.787000e+01 1.717000e+01  4.08    22.18
Calories35 4.100000e-01 7.00000e-01 1.787000e+01 1.717000e+01  4.08    22.23
Calories36 4.100000e-01 7.00000e-01 1.975000e+01 1.905000e+01  4.16    23.30
Calories37 4.100000e-01 7.00000e-01 1.975000e+01 1.905000e+01  4.22    24.09
Calories38 4.100000e-01 7.00000e-01 1.975000e+01 1.905000e+01  4.20    23.85
Calories39 4.100000e-01 7.00000e-01 1.975000e+01 1.905000e+01  4.15    23.38
Calories40 4.100000e-01 7.00000e-01 1.975000e+01 1.905000e+01  4.19    23.81
Calories41 4.100000e-01 7.00000e-01 1.975000e+01 1.905000e+01  4.13    23.20
Calories42 4.100000e-01 7.00000e-01 1.975000e+01 1.905000e+01  4.20    23.93
Calories43 4.100000e-01 7.00000e-01 1.975000e+01 1.905000e+01  4.20    24.37
Calories44 4.100000e-01 7.00000e-01 1.975000e+01 1.905000e+01  4.25    24.84
Calories45 4.100000e-01 7.00000e-01 1.975000e+01 1.905000e+01  4.24    24.57
Calories46 4.100000e-01 7.00000e-01 1.975000e+01 1.905000e+01  4.21    24.11
Calories47 4.100000e-01 7.00000e-01 1.975000e+01 1.905000e+01  4.18    23.76
Calories48 4.100000e-01 7.00000e-01 1.975000e+01 1.905000e+01  4.17    23.75
Calories49 4.100000e-01 7.00000e-01 1.975000e+01 1.905000e+01  4.13    23.21
Calories50 4.100000e-01 7.00000e-01 1.975000e+01 1.905000e+01  4.14    23.39
Calories51 4.100000e-01 7.00000e-01 1.975000e+01 1.905000e+01  4.16    23.70
Calories52 4.100000e-01 7.00000e-01 1.975000e+01 1.905000e+01  4.18    23.83
Calories53 4.100000e-01 7.00000e-01 1.975000e+01 1.905000e+01  4.17    23.87
Calories54 4.100000e-01 7.00000e-01 1.975000e+01 1.905000e+01  4.24    24.68
Calories55 4.100000e-01 7.00000e-01 1.975000e+01 1.905000e+01  4.23    24.79
Calories56 4.100000e-01 7.00000e-01 1.973000e+01 1.902000e+01  4.22    24.51
Calories57 4.100000e-01 7.00000e-01 1.973000e+01 1.902000e+01  4.11    23.26
Calories58 4.100000e-01 7.00000e-01 1.973000e+01 1.902000e+01  4.15    23.62
Calories59 4.100000e-01 0.00000e+00 1.973000e+01 1.973000e+01  4.12    23.54
Date                 NA         Inf         -Inf         -Inf    NA       NA
Time                 NA         Inf         -Inf         -Inf    NA       NA
Month*     0.000000e+00 1.00000e+00 2.000000e+00 1.000000e+00  0.58    -1.67
WeekDay*   2.970000e+00 1.00000e+00 7.000000e+00 6.000000e+00 -0.06    -1.26
                    se
Id         16476672.88
Calories00        0.01
Calories01        0.01
Calories02        0.01
Calories03        0.01
Calories04        0.01
Calories05        0.01
Calories06        0.01
Calories07        0.01
Calories08        0.01
Calories09        0.01
Calories10        0.01
Calories11        0.01
Calories12        0.01
Calories13        0.01
Calories14        0.01
Calories15        0.01
Calories16        0.01
Calories17        0.01
Calories18        0.01
Calories19        0.01
Calories20        0.01
Calories21        0.01
Calories22        0.01
Calories23        0.01
Calories24        0.01
Calories25        0.01
Calories26        0.01
Calories27        0.01
Calories28        0.01
Calories29        0.01
Calories30        0.01
Calories31        0.01
Calories32        0.01
Calories33        0.01
Calories34        0.01
Calories35        0.01
Calories36        0.01
Calories37        0.01
Calories38        0.01
Calories39        0.01
Calories40        0.01
Calories41        0.01
Calories42        0.01
Calories43        0.01
Calories44        0.01
Calories45        0.01
Calories46        0.01
Calories47        0.01
Calories48        0.01
Calories49        0.01
Calories50        0.01
Calories51        0.01
Calories52        0.01
Calories53        0.01
Calories54        0.01
Calories55        0.01
Calories56        0.01
Calories57        0.01
Calories58        0.01
Calories59        0.01
Date                NA
Time                NA
Month*            0.00
WeekDay*          0.01
Code
describe(minute_steps_wide)
             vars     n         mean           sd     median      trimmed
Id              1 21645 4.836965e+09 2.424088e+09 4445114986 4.754963e+09
Steps00         2 21645 5.300000e+00 1.778000e+01          0 5.700000e-01
Steps01         3 21645 5.340000e+00 1.768000e+01          0 6.300000e-01
Steps02         4 21645 5.530000e+00 1.808000e+01          0 6.800000e-01
Steps03         5 21645 5.470000e+00 1.811000e+01          0 6.300000e-01
Steps04         6 21645 5.460000e+00 1.829000e+01          0 6.100000e-01
Steps05         7 21645 5.590000e+00 1.857000e+01          0 6.200000e-01
Steps06         8 21645 5.560000e+00 1.848000e+01          0 6.500000e-01
Steps07         9 21645 5.410000e+00 1.834000e+01          0 5.600000e-01
Steps08        10 21645 5.360000e+00 1.821000e+01          0 5.500000e-01
Steps09        11 21645 5.360000e+00 1.819000e+01          0 5.400000e-01
Steps10        12 21645 5.340000e+00 1.834000e+01          0 5.200000e-01
Steps11        13 21645 5.290000e+00 1.818000e+01          0 5.200000e-01
Steps12        14 21645 5.300000e+00 1.830000e+01          0 4.800000e-01
Steps13        15 21645 5.260000e+00 1.835000e+01          0 4.600000e-01
Steps14        16 21645 5.340000e+00 1.840000e+01          0 5.200000e-01
Steps15        17 21645 5.280000e+00 1.829000e+01          0 4.800000e-01
Steps16        18 21645 5.210000e+00 1.815000e+01          0 4.400000e-01
Steps17        19 21645 5.290000e+00 1.822000e+01          0 4.900000e-01
Steps18        20 21645 5.350000e+00 1.830000e+01          0 5.300000e-01
Steps19        21 21645 5.420000e+00 1.849000e+01          0 5.200000e-01
Steps20        22 21645 5.300000e+00 1.844000e+01          0 4.700000e-01
Steps21        23 21645 5.290000e+00 1.837000e+01          0 4.900000e-01
Steps22        24 21645 5.530000e+00 1.871000e+01          0 5.800000e-01
Steps23        25 21645 5.350000e+00 1.839000e+01          0 5.100000e-01
Steps24        26 21645 5.310000e+00 1.827000e+01          0 5.000000e-01
Steps25        27 21645 5.300000e+00 1.830000e+01          0 4.700000e-01
Steps26        28 21645 5.250000e+00 1.816000e+01          0 4.900000e-01
Steps27        29 21645 5.310000e+00 1.822000e+01          0 4.900000e-01
Steps28        30 21645 5.270000e+00 1.802000e+01          0 5.100000e-01
Steps29        31 21645 5.260000e+00 1.802000e+01          0 5.300000e-01
Steps30        32 21645 5.400000e+00 1.832000e+01          0 5.500000e-01
Steps31        33 21645 5.360000e+00 1.812000e+01          0 6.000000e-01
Steps32        34 21645 5.440000e+00 1.820000e+01          0 6.100000e-01
Steps33        35 21645 5.500000e+00 1.840000e+01          0 6.000000e-01
Steps34        36 21645 5.470000e+00 1.832000e+01          0 6.000000e-01
Steps35        37 21645 5.420000e+00 1.819000e+01          0 6.100000e-01
Steps36        38 21645 5.580000e+00 1.870000e+01          0 6.000000e-01
Steps37        39 21645 5.500000e+00 1.850000e+01          0 5.800000e-01
Steps38        40 21645 5.480000e+00 1.850000e+01          0 5.400000e-01
Steps39        41 21645 5.340000e+00 1.806000e+01          0 5.500000e-01
Steps40        42 21645 5.380000e+00 1.803000e+01          0 6.000000e-01
Steps41        43 21645 5.340000e+00 1.806000e+01          0 5.700000e-01
Steps42        44 21645 5.260000e+00 1.802000e+01          0 5.100000e-01
Steps43        45 21645 5.290000e+00 1.784000e+01          0 5.600000e-01
Steps44        46 21645 5.350000e+00 1.799000e+01          0 5.700000e-01
Steps45        47 21645 5.240000e+00 1.786000e+01          0 5.300000e-01
Steps46        48 21645 5.340000e+00 1.809000e+01          0 5.200000e-01
Steps47        49 21645 5.300000e+00 1.794000e+01          0 5.300000e-01
Steps48        50 21645 5.320000e+00 1.780000e+01          0 5.500000e-01
Steps49        51 21645 5.350000e+00 1.795000e+01          0 5.500000e-01
Steps50        52 21645 5.330000e+00 1.787000e+01          0 5.700000e-01
Steps51        53 21645 5.190000e+00 1.760000e+01          0 5.100000e-01
Steps52        54 21645 5.230000e+00 1.762000e+01          0 5.300000e-01
Steps53        55 21645 5.150000e+00 1.757000e+01          0 5.200000e-01
Steps54        56 21645 5.220000e+00 1.768000e+01          0 5.400000e-01
Steps55        57 21645 5.280000e+00 1.783000e+01          0 5.600000e-01
Steps56        58 21645 5.180000e+00 1.757000e+01          0 5.300000e-01
Steps57        59 21645 5.250000e+00 1.769000e+01          0 5.500000e-01
Steps58        60 21645 5.140000e+00 1.743000e+01          0 5.400000e-01
Steps59        61 21645 5.290000e+00 1.772000e+01          0 5.700000e-01
Date           62 21645          NaN           NA         NA          NaN
Time           63 21645          NaN           NA         NA          NaN
Month*         64 21645 1.360000e+00 4.800000e-01          1 1.330000e+00
WeekDay*       65 21645 4.090000e+00 2.030000e+00          4 4.110000e+00
Time_of_day*   66 21645 2.360000e+00 1.110000e+00          2 2.330000e+00
                      mad        min        max      range  skew kurtosis
Id           3.586058e+09 1503960366 8877689391 7373729025  0.19    -1.27
Steps00      0.000000e+00          0        186        186  4.66    25.39
Steps01      0.000000e+00          0        180        180  4.62    24.85
Steps02      0.000000e+00          0        182        182  4.48    23.21
Steps03      0.000000e+00          0        182        182  4.57    24.24
Steps04      0.000000e+00          0        181        181  4.63    24.73
Steps05      0.000000e+00          0        180        180  4.54    23.80
Steps06      0.000000e+00          0        181        181  4.60    24.44
Steps07      0.000000e+00          0        183        183  4.69    25.48
Steps08      0.000000e+00          0        180        180  4.71    25.68
Steps09      0.000000e+00          0        183        183  4.64    24.58
Steps10      0.000000e+00          0        180        180  4.73    25.64
Steps11      0.000000e+00          0        181        181  4.75    25.77
Steps12      0.000000e+00          0        181        181  4.72    25.41
Steps13      0.000000e+00          0        180        180  4.78    26.05
Steps14      0.000000e+00          0        182        182  4.75    25.87
Steps15      0.000000e+00          0        179        179  4.74    25.74
Steps16      0.000000e+00          0        180        180  4.78    26.14
Steps17      0.000000e+00          0        183        183  4.73    25.64
Steps18      0.000000e+00          0        180        180  4.73    25.63
Steps19      0.000000e+00          0        182        182  4.68    25.12
Steps20      0.000000e+00          0        179        179  4.76    25.69
Steps21      0.000000e+00          0        185        185  4.79    26.26
Steps22      0.000000e+00          0        182        182  4.63    24.49
Steps23      0.000000e+00          0        187        187  4.74    25.83
Steps24      0.000000e+00          0        180        180  4.74    25.81
Steps25      0.000000e+00          0        181        181  4.75    25.93
Steps26      0.000000e+00          0        186        186  4.77    26.22
Steps27      0.000000e+00          0        180        180  4.70    25.37
Steps28      0.000000e+00          0        181        181  4.72    25.69
Steps29      0.000000e+00          0        183        183  4.75    26.01
Steps30      0.000000e+00          0        181        181  4.67    24.93
Steps31      0.000000e+00          0        181        181  4.71    25.64
Steps32      0.000000e+00          0        181        181  4.62    24.64
Steps33      0.000000e+00          0        182        182  4.59    24.18
Steps34      0.000000e+00          0        180        180  4.62    24.60
Steps35      0.000000e+00          0        187        187  4.66    25.19
Steps36      0.000000e+00          0        183        183  4.58    24.08
Steps37      0.000000e+00          0        181        181  4.63    24.67
Steps38      0.000000e+00          0        185        185  4.58    23.93
Steps39      0.000000e+00          0        184        184  4.66    25.13
Steps40      0.000000e+00          0        184        184  4.66    25.17
Steps41      0.000000e+00          0        184        184  4.68    25.28
Steps42      0.000000e+00          0        180        180  4.72    25.77
Steps43      0.000000e+00          0        188        188  4.66    25.19
Steps44      0.000000e+00          0        220        220  4.64    25.00
Steps45      0.000000e+00          0        184        184  4.68    25.22
Steps46      0.000000e+00          0        207        207  4.64    24.87
Steps47      0.000000e+00          0        190        190  4.61    24.42
Steps48      0.000000e+00          0        182        182  4.58    24.30
Steps49      0.000000e+00          0        182        182  4.59    24.42
Steps50      0.000000e+00          0        182        182  4.59    24.33
Steps51      0.000000e+00          0        181        181  4.67    25.40
Steps52      0.000000e+00          0        181        181  4.66    25.40
Steps53      0.000000e+00          0        181        181  4.79    26.97
Steps54      0.000000e+00          0        184        184  4.70    25.85
Steps55      0.000000e+00          0        181        181  4.69    25.57
Steps56      0.000000e+00          0        182        182  4.73    26.41
Steps57      0.000000e+00          0        182        182  4.67    25.50
Steps58      0.000000e+00          0        180        180  4.77    26.86
Steps59      0.000000e+00          0        189        189  4.66    25.51
Date                   NA        Inf       -Inf       -Inf    NA       NA
Time                   NA        Inf       -Inf       -Inf    NA       NA
Month*       0.000000e+00          1          2          1  0.58    -1.67
WeekDay*     2.970000e+00          1          7          6 -0.06    -1.26
Time_of_day* 1.480000e+00          1          4          3  0.15    -1.33
                      se
Id           16476672.88
Steps00             0.12
Steps01             0.12
Steps02             0.12
Steps03             0.12
Steps04             0.12
Steps05             0.13
Steps06             0.13
Steps07             0.12
Steps08             0.12
Steps09             0.12
Steps10             0.12
Steps11             0.12
Steps12             0.12
Steps13             0.12
Steps14             0.13
Steps15             0.12
Steps16             0.12
Steps17             0.12
Steps18             0.12
Steps19             0.13
Steps20             0.13
Steps21             0.12
Steps22             0.13
Steps23             0.12
Steps24             0.12
Steps25             0.12
Steps26             0.12
Steps27             0.12
Steps28             0.12
Steps29             0.12
Steps30             0.12
Steps31             0.12
Steps32             0.12
Steps33             0.13
Steps34             0.12
Steps35             0.12
Steps36             0.13
Steps37             0.13
Steps38             0.13
Steps39             0.12
Steps40             0.12
Steps41             0.12
Steps42             0.12
Steps43             0.12
Steps44             0.12
Steps45             0.12
Steps46             0.12
Steps47             0.12
Steps48             0.12
Steps49             0.12
Steps50             0.12
Steps51             0.12
Steps52             0.12
Steps53             0.12
Steps54             0.12
Steps55             0.12
Steps56             0.12
Steps57             0.12
Steps58             0.12
Steps59             0.12
Date                  NA
Time                  NA
Month*              0.00
WeekDay*            0.01
Time_of_day*        0.01
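
As a quick sanity check on the describe() output above, the se column appears to be the standard error of the mean, i.e. sd divided by the square root of n. The sketch below is a minimal base R illustration under that assumption; the toy step values and variable names are made up and are not taken from the dataset.

```r
# Toy per-minute step counts (made-up values, not from the dataset).
steps <- c(0, 0, 0, 12, 0, 45, 0, 0, 110, 3)

n      <- length(steps)
avg    <- mean(steps)
spread <- sd(steps)
se     <- spread / sqrt(n)   # standard error of the mean

summary_row <- data.frame(n = n, mean = avg, sd = spread,
                          median = median(steps), se = se)
print(summary_row)

# Same relationship applied to the Steps00 row printed above
# (sd = 17.78, n = 21645), which reports se = 0.12.
se_steps00 <- 17.78 / sqrt(21645)
print(round(se_steps00, 2))
```

The heavy right skew in the Steps columns (median 0, mean around 5) is why the mean and the median differ so much in the toy vector as well.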

Using table() to count how many records each user Id has in the daily data.

Code
table(daily_activity$Id)

1503960366 1624580081 1644430081 1844505072 1927972279 2022484408 2026352035 
        31         31         30         31         31         31         31 
2320127002 2347167796 2873212765 3372868164 3977333714 4020332650 4057192912 
        31         18         31         20         30         31          4 
4319703577 4388161847 4445114986 4558609924 4702921684 5553957443 5577150313 
        31         31         31         31         31         31         30 
6117666160 6290855005 6775888955 6962181067 7007744171 7086361926 8053475328 
        28         29         26         31         26         31         31 
8253242879 8378563200 8583815059 8792009665 8877689391 
        19         31         31         29         31 

Using table() to count how many records each user Id has in the hourly data.

Code
table(hourly_calories$Id)

1503960366 1624580081 1644430081 1844505072 1927972279 2022484408 2026352035 
       717        736        708        731        736        736        736 
2320127002 2347167796 2873212765 3372868164 3977333714 4020332650 4057192912 
       735        414        736        472        696        732         88 
4319703577 4388161847 4445114986 4558609924 4702921684 5553957443 5577150313 
       724        735        735        736        731        730        708 
6117666160 6290855005 6775888955 6962181067 7007744171 7086361926 8053475328 
       660        665        610        732        601        733        735 
8253242879 8378563200 8583815059 8792009665 8877689391 
       431        735        718        672        735 

Using table() to count how many records each user Id has in the minute-level (wide) data.

Code
table(minute_calories_wide$Id)

1503960366 1624580081 1644430081 1844505072 1927972279 2022484408 2026352035 
       719        729        684        724        729        729        729 
2320127002 2347167796 2873212765 3372868164 3977333714 4020332650 4057192912 
       729        390        725        448        725        726         64 
4319703577 4388161847 4445114986 4558609924 4702921684 5553957443 5577150313 
       700        715        727        717        726        706        684 
6117666160 6290855005 6775888955 6962181067 7007744171 7086361926 8053475328 
       636        641        586        728        577        720        728 
8253242879 8378563200 8583815059 8792009665 8877689391 
       407        727        694        648        728 
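
The counts above show that a few users have far fewer records than the rest. One possible way to flag such users is sketched below in base R. The toy Ids, the 31-day window and the 50% cut-off are assumptions for illustration only, not part of the original analysis.

```r
# Toy Ids standing in for daily_activity$Id (hypothetical users A-D).
ids <- c(rep("A", 31), rep("B", 18), rep("C", 4), rep("D", 31))

records_per_user <- table(ids)

# Assumed tracking window of 31 days and an arbitrary 50% coverage cut-off.
expected_days <- 31
coverage <- as.numeric(records_per_user) / expected_days
names(coverage) <- names(records_per_user)

low_coverage <- names(coverage)[coverage < 0.5]
print(records_per_user)
print(low_coverage)
```

Whether such users should be dropped or kept depends on the question being asked; the sketch only makes the gap visible.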

Checking the average Fairly Active and Very Active minutes using mean().

Code
mean(daily_activity$FairlyActiveMinutes)
[1] 13.56489
Code
mean(daily_activity$VeryActiveMinutes)
[1] 21.16489

Extracting the Ids with days at or above either average (used here as the threshold) and counting those days per user.

Code
activity_users_minutes <- daily_activity %>% 
  filter(FairlyActiveMinutes>=13.56 | VeryActiveMinutes>=21.16) %>% 
  group_by(Id) %>% 
  count(Id)
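
The thresholds in the filter above are the rounded averages typed in by hand. A minimal base R sketch of the same idea, computing the cut-offs from the data instead, is shown below. The toy data frame and the names fairly_cut, very_cut and active_days are hypothetical.

```r
# Toy stand-in for daily_activity (made-up values).
toy <- data.frame(
  Id = c(1, 1, 1, 2, 2, 3),
  FairlyActiveMinutes = c(20, 5, 0, 30, 2, 1),
  VeryActiveMinutes   = c(0, 40, 3, 10, 1, 2)
)

# Cut-offs taken directly from the column means.
fairly_cut <- mean(toy$FairlyActiveMinutes)
very_cut   <- mean(toy$VeryActiveMinutes)

# Keep days at or above either cut-off, then count those days per Id.
keep <- toy$FairlyActiveMinutes >= fairly_cut | toy$VeryActiveMinutes >= very_cut
active_days <- as.data.frame(table(Id = toy$Id[keep]), responseName = "n")
print(active_days)
```

Users with no qualifying day (Id 3 in the toy data) do not appear in the result, which matches how filter() followed by count() behaves.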

Summing all four activity-level columns (very active, fairly active, lightly active and sedentary) to get the total number of minutes tracked across users.

Code
total_minutes <- sum(daily_activity$VeryActiveMinutes,daily_activity$FairlyActiveMinutes,
                     daily_activity$LightlyActiveMinutes,daily_activity$SedentaryMinutes)

Calculating each activity level's share of that total as a percentage: sum() the column, divide by the total, and multiply by 100.

Code
sedentary_percentage <- sum(daily_activity$SedentaryMinutes)/total_minutes*100
lightly_percentage <- sum(daily_activity$LightlyActiveMinutes)/total_minutes*100
fairly_percentage <- sum(daily_activity$FairlyActiveMinutes)/total_minutes*100
active_percentage <- sum(daily_activity$VeryActiveMinutes)/total_minutes*100

Final results are compiled into a data frame for further visualization.

Code
percentage_minutes_compile <- data.frame(
  label=c("Sedentary", "Lightly", "Fairly", "Very Active"),
  minutes=c(sedentary_percentage,lightly_percentage,fairly_percentage,active_percentage)
)
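
The same percentage split can be sketched more compactly with colSums(). This is only an illustration on made-up numbers; the toy data frame and the names totals, shares and shares_df are hypothetical.

```r
# Toy stand-in for the four minute columns of daily_activity (made-up values).
toy <- data.frame(
  VeryActiveMinutes    = c(30, 10),
  FairlyActiveMinutes  = c(20, 0),
  LightlyActiveMinutes = c(200, 150),
  SedentaryMinutes     = c(700, 890)
)

totals <- colSums(toy)                  # minutes per activity level
shares <- totals / sum(totals) * 100    # share of all tracked minutes, in %

shares_df <- data.frame(label = names(shares), minutes = as.numeric(shares))
print(shares_df)

# The four shares should add up to 100.
print(sum(shares))
```

Checking that the shares sum to 100 is a cheap guard against leaving a column out of the total.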

Checking the average Moderately Active and Very Active distance using mean().

Code
mean(daily_activity$ModeratelyActiveDistance)
[1] 0.5675426
Code
mean(daily_activity$VeryActiveDistance)
[1] 1.502681

Extracting the Ids with days at or above either average distance (used here as the threshold) and counting those days per user.

Code
activity_user_distance <- daily_activity %>% 
  filter(ModeratelyActiveDistance>=0.56 | VeryActiveDistance >= 1.50 ) %>% 
  group_by(Id) %>% 
  count(Id)

Summing all four distance columns to get the total distance covered by users.

Code
total_distance <- sum(daily_activity$VeryActiveDistance, 
                      daily_activity$ModeratelyActiveDistance, 
                      daily_activity$LightActiveDistance, 
                      daily_activity$SedentaryActiveDistance)

Calculating each distance category's share of that total as a percentage: sum() the column, divide by the total, and multiply by 100.

Code
veryActive_percentage <- sum(daily_activity$VeryActiveDistance)/total_distance *100
ModerateActive_percentage <- sum(daily_activity$ModeratelyActiveDistance)/total_distance *100
LightActive_percentage <- sum(daily_activity$LightActiveDistance)/total_distance *100
sedentaryDistance_percentage <- sum(daily_activity$SedentaryActiveDistance)/total_distance *100

Final results are compiled into a data frame for further visualization.

Code
percentage_distance_compile <- data.frame(
  label = c("Very Active","Moderate","Lightly Active","Sedentary"),
  distance = c(veryActive_percentage,ModerateActive_percentage,LightActive_percentage,
               sedentaryDistance_percentage)
)

Summing the per-minute calories (minutes with at least 1 calorie) recorded between 04:00 and 21:00, grouped by time, month and weekday.

Code
Calories <- minute_calories_narrow %>% 
  filter(Calories >= 1) %>% 
  filter(hms::as_hms(Time) >= hms::as_hms("04:00:00") & hms::as_hms(Time) <= hms::as_hms("21:00:00")) %>% 
  group_by(Time, Month, WeekDay) %>% 
  summarize(total_calories = sum(Calories))
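
The filter-then-summarize step above can be illustrated on a tiny example. The sketch below uses base R only and assumes Time is stored as zero-padded "HH:MM:SS" text, so that plain string comparison orders times correctly. The toy rows are made up and the grouping is reduced to WeekDay for brevity.

```r
# Toy stand-in for minute_calories_narrow (made-up values).
toy <- data.frame(
  Time     = c("03:59:00", "04:00:00", "12:30:00", "21:00:00", "21:01:00", "12:30:00"),
  WeekDay  = c("Mon", "Mon", "Mon", "Mon", "Mon", "Tue"),
  Calories = c(2, 1.5, 3, 1.2, 4, 0.8),
  stringsAsFactors = FALSE
)

# Keep rows inside the 04:00-21:00 window (inclusive) with Calories >= 1.
in_window <- toy$Time >= "04:00:00" & toy$Time <= "21:00:00"
kept <- toy[in_window & toy$Calories >= 1, ]

# Total calories per weekday for the rows that survived both filters.
totals <- aggregate(Calories ~ WeekDay, data = kept, FUN = sum)
print(totals)
```

Both window boundaries are inclusive here, as in the original filter, so the 04:00:00 and 21:00:00 rows are kept while 03:59:00 and 21:01:00 are dropped.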

8 SHARE PHASE

In this phase, potential insights will be shared through the use of appropriate visualizations created with tools such as R and Tableau. These visualizations will depict actionable steps that stakeholders can initiate to address the relevant concern.

8.1 Key Task

    • Selecting suitable tools, such as R and Tableau, to present the visualizations effectively.
    • Choosing an appropriate graph type for each finding, with legends, labels and headings to improve readability and interpretation.
    • Providing detailed explanations for all aspects of the analysis, including minor details, by making the visualizations interactive.
    • Ensuring the work is easily accessible.

8.2 Deliverables

    • Presentation of findings, illustrated with graphs and accompanied by explanations.
    • A short brief for each visualization included in this phase to aid understanding.
    • Interactive versions of all visualizations to provide a wider outlook.

8.3 Visualization

Comparing total time asleep vs. total time in bed using geom_point(), geom_smooth() and geom_jitter().

Code
ggplotly(ggplot(data = sleep_day) + 
           aes(x = as_hms(TotalTimeInBed), y = as_hms(TotalMinutesAsleep)) + 
           geom_point() + geom_smooth() + geom_jitter() + 
           labs(title = paste0("<b>", "Total Time Asleep Vs Total Time In Bed", "</b>"), 
                x = "Total Time in Bed", y = "Total Minutes Asleep") + 
           theme_minimal())

Comparing the distance covered on each weekday of a month using geom_bar().

Code
ggplotly(ggplot(data = daily_activity) + aes(x = Month, y = TotalDistance, fill = WeekDay) + 
           geom_bar(stat = 'identity', position = 'dodge', width = 1) + 
           scale_fill_manual(values = c("blue","orange","brown","yellow","black","red","darkgoldenrod")) + 
           # labs() needs a lowercase y to set the y-axis title
           labs(title = paste0("<b>", "Comparing Distance for every Weekday in a month", "</b>"), 
                x = "Month", y = "Total Distance", fill = "Weekdays") + 
           theme(axis.text.x = element_text(vjust = 0.5, hjust = 1), 
                 plot.background = element_rect(fill = "lightblue")))

Visualizing the difference in total steps taken on each weekday of a month using geom_col().

Code
ggplotly(ggplot(data = daily_activity) + aes(x = Month, y = TotalSteps, fill = WeekDay) + geom_col(position = 'dodge',width = 1 )
         + scale_fill_manual(values = c("brown","darkgreen","orange","darkgoldenrod","black","blue","darkorchid")) + 
           labs(title = paste0("<b>","Comparing Total Steps for every Weekdays in a month","</b>"), x = "Month", 
                y ="Total Steps", fill = "Weekdays") + 
           theme(axis.title.x = element_text(vjust = 0.5, hjust = 1), plot.background = element_rect(fill = "skyblue")))
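
Since daily_activity holds many rows for each Month and WeekDay combination, it may help to aggregate first so that every bar stands for one explicit total. The base R sketch below shows that step on made-up data. The toy data frame and the name steps_by_day are hypothetical, and the result could then be passed to ggplot() in place of the raw table.

```r
# Toy stand-in for daily_activity (made-up values).
toy <- data.frame(
  Month      = c("April", "April", "April", "May", "May"),
  WeekDay    = c("Mon", "Mon", "Tue", "Mon", "Mon"),
  TotalSteps = c(1000, 2000, 500, 300, 700),
  stringsAsFactors = FALSE
)

# One row per Month/WeekDay pair, holding the summed steps.
steps_by_day <- aggregate(TotalSteps ~ Month + WeekDay, data = toy, FUN = sum)
print(steps_by_day)
```

Swapping FUN = sum for FUN = mean would give the average per weekday instead, which is less sensitive to months with a different number of Mondays, Tuesdays and so on.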

Exploring the correlation between total distance covered and total steps taken on each weekday of a month using geom_line(), geom_point() and facet_wrap().

Code
# Lines and default points are colored via the color aesthetic (fill has no visible effect on them)
ggplotly(ggplot(data = daily_activity) + aes(x = TotalDistance, y = TotalSteps, color = Month) + geom_line(linewidth = 1.5) + 
           geom_point(size = 2) + facet_wrap(~WeekDay) + scale_color_manual(values = c("lightblue","orange")) + 
           theme(panel.grid.major = element_line(color = "gray", linetype = "dotted")) + 
           labs(title = "Total Distance Vs Total Steps Taken") + xlab("Total Distance") + ylab("Total Steps"))

Visualizing daily calories burned on each weekday of a month as a plotly bar chart.

Code
plot_ly(data = daily_calories, x = ~WeekDay, y = ~Calories, type = "bar", color = ~Month, 
        colors = c("black", "darkorchid")) %>% 
  layout(title = "Daily Calories by Weekday and Month", 
         xaxis = list(title = "Weekday"), yaxis = list(title = "Calories"),
         legend = list(title = list(text = "Month")),
         hovermode = "closest") %>% 
  layout(xaxis = list(tickangle = 60, tickfont = list(size = 10)))

Visualizing users' hourly calories burned for each weekday of a month using geom_line().

Create ggplot object.

Code
ggploty_obj <- ggplot(data = hourly_calories, aes(x = as_hms(Time), 
                                                  y = Calories, color = Month)) +
  geom_line(linewidth = 1.5, alpha = 0.8) + facet_wrap(~WeekDay) +
  scale_color_brewer(palette = "Set1") + 
  labs(x = "Time of Day", y = "Calories Burned", 
       title = "Hourly Calories burned each weekday of a month") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Convert ggplot object to plotly object.

Code
plot_obj <- ggplotly(ggploty_obj)

Show plotly object.

Code
plot_obj

Visualizing calories burned per minute for each weekday of a month using geom_line() and facet_wrap().

Create ggplot object.

Code
ggplot_obj <- ggplot(data = Calories, aes(x = as_hms(Time), y = total_calories, color = Month)) +
  geom_line(linewidth = 1.5, alpha = 0.8) + facet_wrap(~WeekDay) +
  scale_color_brewer(palette = "Dark2") + 
  labs(x = "Time of Day", y = "Calories Burned", 
       title = "Each minute of calories burned on a weekday of a month") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Convert ggplot object to plotly object.

Code
plotly_obj <- ggplotly(ggplot_obj)

Display plotly object.

Code
plotly_obj

Examining the spread and correlation of TotalMinutesAsleep, TotalTimeInBed and SedentaryMinutes using a correlogram.

Code
# Custom lower-panel plot: scatter points with a fitted line
change_palette_single <- function(data, mapping, method = "lm", ...) {
  correlogram <- ggplot(data = data, mapping = mapping) +
    geom_point(colour = "darkgreen") +
    geom_smooth(method = method, color = "darkred", ...)
  return(correlogram)
}

Selecting the specific columns.

Code
correlogram <- dailyActivity_sleep %>%
  select(TotalMinutesAsleep, TotalTimeInBed, SedentaryMinutes) %>%
  ggpairs(lower = list(continuous = wrap(change_palette_single, method = "lm",
                                         data = dailyActivity_sleep)),
          diag = list(continuous = wrap("barDiag", colour = "Darkgreen")),
          upper = list(continuous = wrap("cor", size = 4))) +
  theme(panel.grid.major = element_blank()) +
  labs(title = "Correlogram Of Sedentary Minutes vs Sleep")

Convert to interactive plot with ggplotly.

Code
p <- ggplotly(correlogram)

Adjust margin of title.

Code
p <- layout(p, title = list(text = "Correlogram Of Sedentary Minutes vs Sleep",
                            font = list(size = 16),
                            margin = list(b = 40)))

Display interactive plot.

Code
p
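
The numbers shown in the upper panels of the correlogram can be cross-checked with cor(). The sketch below is a minimal base R illustration on made-up values, assuming the default Pearson correlation is what the "cor" panels display. The toy data frame and the name cor_matrix are hypothetical.

```r
# Toy stand-in for the three selected columns (made-up values).
toy <- data.frame(
  TotalMinutesAsleep = c(300, 360, 420, 480, 400),
  TotalTimeInBed     = c(330, 400, 450, 520, 430),
  SedentaryMinutes   = c(900, 800, 750, 650, 760)
)

# Pairwise Pearson correlations, rounded for readability.
cor_matrix <- round(cor(toy, method = "pearson"), 2)
print(cor_matrix)
```

The matrix is symmetric with ones on the diagonal, so only the upper (or lower) triangle carries information, which is why a correlogram can use the other half for scatter plots.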

Examining the spread and correlation of WeightPounds, BMI and SedentaryMinutes using a correlogram.

Code
change_palette_single1 <- function(data, mapping, method = "lm", ...) {
  correlogram <- ggplot(data = data, mapping = mapping) +
    geom_point(colour = "darkblue") +
    geom_smooth(method = method, color = "darkgoldenrod", ...)
  return(correlogram)
}

Selecting the specific columns.

Code
correlogram1 <- dailyIntensities_weight %>%
  select(WeightPounds, BMI, SedentaryMinutes) %>%
  ggpairs(lower = list(continuous = wrap(change_palette_single1, 
                  method = "lm", data = dailyIntensities_weight)),
          diag = list(continuous = wrap("barDiag", colour = "Darkblue")),
          upper = list(continuous = wrap("cor", size = 4))) +
  theme(panel.grid.major = element_blank()) +
  labs(title = "Correlogram Of Weight vs Sedentary Minutes")

Convert to interactive plot with ggplotly.

Code
p1 <- ggplotly(correlogram1)

Adjust margin of title.

Code
p1 <- layout(p1, title = list(text = "Correlogram Of Weight vs Sedentary Minutes",
                            font = list(size = 16),
                            margin = list(b = 40)))

Display interactive plot.

Code
p1

Here, a bubble chart plots total steps against calories, with both bubble colour and bubble size representing sedentary minutes.

Code
# Bubble chart: steps vs. calories; colour and size both map to sedentary minutes.
bubble_plot <- ggplot(data = daily_activity,
                      aes(x = TotalSteps, y = Calories,
                          color = SedentaryMinutes, size = SedentaryMinutes)) +
  geom_point(alpha = 0.6) +
  scale_size(range = c(1.4, 10)) +
  scale_color_viridis(discrete = FALSE) +
  labs(title = "Total Steps Vs Calories", x = "Total Steps",
       y = "Calories", color = "Sedentary Minutes") +
  theme_ipsum() +
  guides(color = guide_colorbar(title = "Sedentary Minutes"))

ggplotly(bubble_plot)

Here, a pie chart shows the percentage split of the minutes users spent at each activity level.

Code
plot_ly(percentage_minutes_compile, labels = ~label, values = ~minutes, 
        type = 'pie',textposition = 'outside',textinfo = 'label+percent') %>%
  layout(title = 'Activity Level Minutes',
         xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
         yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))

Here, a pie chart shows the percentage split of the distance users covered at each activity level.

Code
plot_ly(percentage_distance_compile, labels = ~label, values = ~distance, 
        type = 'pie',textposition = 'outside',textinfo = 'label+percent') %>%
  layout(title = 'Activity Distance',
         xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
         yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))
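
Plotly derives the percentage on each slice from the raw values passed to it. As a cross-check, the same shares can be computed by hand. The sketch below assumes a data frame shaped like percentage_minutes_compile, with one row per activity level; toy_minutes and its values are hypothetical and are not the case-study figures.

```r
# Hypothetical toy values (not the case-study figures), shaped like
# percentage_minutes_compile: one row per activity level.
toy_minutes <- data.frame(
  label   = c("Sedentary", "Lightly Active", "Fairly Active", "Very Active"),
  minutes = c(800, 160, 20, 20)
)

# Share of each level as a percentage of the total minutes.
toy_minutes$percent <- round(100 * toy_minutes$minutes / sum(toy_minutes$minutes), 1)

print(toy_minutes)
```

The percent column should agree with the labels that the pie chart draws for the same input.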

9 TABLEAU DASHBOARD

Code
htmltools::HTML("
<div class='tableauPlaceholder' id='viz1679140790858' style='position: relative'>
  <noscript>
    <a href='#'>
      <img alt='BELLABEAT DATA ANALYSIS DASHBOARD ' src='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Be&#47;BellaBeatCaseStudy_16779607987320&#47;BellaBeatDashBoard&#47;1_rss.png' style='border: none' />
    </a>
  </noscript>
  <object class='tableauViz'  style='display:none;'>
    <param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' />
    <param name='embed_code_version' value='3' />
    <param name='site_root' value='' />
    <param name='name' value='BellaBeatCaseStudy_16779607987320&#47;BellaBeatDashBoard' />
    <param name='tabs' value='no' />
    <param name='toolbar' value='yes' />
    <param name='static_image' value='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Be&#47;BellaBeatCaseStudy_16779607987320&#47;BellaBeatDashBoard&#47;1.png' />
    <param name='animate_transition' value='yes' />
    <param name='display_static_image' value='yes' />
    <param name='display_spinner' value='yes' />
    <param name='display_overlay' value='yes' />
    <param name='display_count' value='yes' />
    <param name='language' value='en-US' />
  </object>
</div>
<script type='text/javascript'>
  var divElement = document.getElementById('viz1679140790858');
  var vizElement = divElement.getElementsByTagName('object')[0];
  if ( divElement.offsetWidth > 800 ) {
    vizElement.style.width='100%';
    vizElement.style.height=(divElement.offsetWidth*0.75)+'px';
  } else if ( divElement.offsetWidth > 500 ) {
    vizElement.style.width='2000px';
    vizElement.style.height='1327px';
  } else {
    vizElement.style.width='100%';
    vizElement.style.height='2427px';
  }
  var scriptElement = document.createElement('script');
  scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';
  vizElement.parentNode.insertBefore(scriptElement, vizElement);
</script>")

10 ACT PHASE:

This crucial phase of planning the new marketing campaign will be carried out by Urška Sršen (Bellabeat’s cofounder and Chief Creative Officer), Sando Mur (mathematician, Bellabeat’s cofounder and a key member of the Bellabeat executive team) and the Bellabeat marketing team, based on the conclusions of the analysis above.

11 CONCLUSION:

    • The relationship between total time asleep and total time in bed appears to be nearly linear. Nevertheless, some users sleep for more than 10 hours and spend over 12 hours in bed, which may point to an unhealthy sleep cycle.
    • Female users cover more distance on weekends, particularly on Saturdays and Sundays, than on weekdays. The distance also varies from month to month. In the two months of data available, Thursday and Tuesday recorded the lowest distance covered by female users in May.
    • The total steps walked on each weekday of a month follow a pattern similar to the distance covered by each user. This suggests that most calories are burned through walking, irrespective of any other exercise.
    • Based on the two months of data, one interesting insight is that women users burned the most calories per day in April and the fewest in May.
    • Calories burned on a minute-by-minute and hourly basis follow an almost linear pattern across the week, with Friday, Saturday, Thursday, and Tuesday being the peak days compared to the rest of the week.
    • The correlogram of total time asleep and total time in bed clearly indicates a positive relationship between them. However, sedentary minutes have a negative relationship with both total time asleep and time in bed. Additionally, the distribution of sedentary minutes is bimodal. These findings suggest that increasing time spent asleep and in bed may be beneficial for overall health, while reducing sedentary behavior could also have a positive impact on day-to-day life.
    • The correlogram of weight in pounds and BMI reveals a clear positive relationship between these variables, while sedentary minutes exhibit a negative relationship with both weight and BMI. These findings suggest that reducing sedentary behavior could help maintain a balanced weight and BMI.
    • The first pie chart clearly shows that the percentage of sedentary minutes is much higher than that of the other segments (very active, fairly active and lightly active minutes), which shows that users were only minimally active.
    • The second pie chart reveals that lightly active distance contributes almost 50% of the total, followed by very active and moderately active distance. However, sedentary distance is almost negligible compared to the other three segments.
    • Among the three categories of distance covered by each female user, the average very active distance is 1.39 km and the average moderately active distance is 0.73 km, while the average lightly active distance is much higher at 3.53 km. These findings suggest that female users engaged in a variety of physical activity levels, with light activity being the most common and accounting for the greatest distance.
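
The first finding above can be checked by flagging records that exceed the quoted thresholds. The following is a minimal sketch, assuming the column names used in the correlogram code (TotalMinutesAsleep and TotalTimeInBed, both in minutes). The data frame toy_sleep_log and its rows are hypothetical and invented for illustration.

```r
# Hypothetical sleep log with invented values; one row per user per night.
toy_sleep_log <- data.frame(
  Id                 = c(1, 1, 2, 3, 3),
  TotalMinutesAsleep = c(420, 630, 380, 700, 455),
  TotalTimeInBed     = c(450, 735, 400, 790, 480)
)

# Thresholds from the first finding: more than 10 hours asleep
# and more than 12 hours in bed, expressed in minutes.
asleep_limit <- 10 * 60
in_bed_limit <- 12 * 60

# Keep only the records that exceed both thresholds.
long_sleep <- subset(toy_sleep_log,
                     TotalMinutesAsleep > asleep_limit &
                       TotalTimeInBed > in_bed_limit)

print(long_sleep)

# Share of records that exceed both thresholds.
print(nrow(long_sleep) / nrow(toy_sleep_log))
```

Counting how many distinct Id values appear in long_sleep would show whether long sleep is spread across users or concentrated in a few of them.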

12 DELIVERABLE:

    • Allocate data engineers to collect a diverse range of health data while maintaining data integrity, and integrate data science into Bellabeat so that users receive personalized push notifications based on their individual health parameters.
    • Integrating data science can give users a comprehensive understanding of their health and wellness, which in turn can guide them towards more informed decisions about their lifestyle choices and improve their overall well-being. This would make the app a more effective tool for achieving health and wellness goals.
    • An effective way to capture weight data is to partner with a weight scale manufacturer to develop a smart digital weight scale that provides precise weight readings. Weight is one of the crucial parameters that will help nutritionists or algorithms provide personalized and accurate guidance. Leveraging this data can also make the overall experience of achieving health and wellness goals more effective.
    • To create a more engaging experience for our users, we could develop a community feature that allows them to connect with friends from their contacts, social media accounts, or directly within the app. By leveraging social connections, we can boost user engagement and increase screen time on the app.
    • The Bellabeat app can become an essential part of our users’ daily routine by integrating additional features that help them plan their day. It could also include an alarm option that reminds them of scheduled tasks throughout the day. Additionally, we could host weekly and monthly challenges that are open to users from any location, and offer rewards such as discount coupons or free product or service subscriptions to those who complete the challenges. These features would not only enhance the user experience, but also encourage users to use the app regularly and make healthier choices as part of their daily lifestyle.
    • To help users improve their sleep cycle, we can add extensions to the app that provide default notifications for getting into bed and waking up on time. This can be achieved by incorporating heart rate sensors that detect the user’s sleep time and provide alerts accordingly. These features will not only promote better sleep habits, but also enhance the user experience by providing personalized notifications based on their individual sleep patterns.
    • To reduce sedentary minutes, add a feature that reminds users to drink water every half hour to stay hydrated. Additionally, if a user has been sedentary for two or three hours, the app can remind them to take a short walk so that they are more active and productive throughout the day. With these features, users can stay motivated to remain active and healthy, leading to a more balanced lifestyle.
    • To reduce total time in bed, Bellabeat can extend the sleep cycle feature with a notification that prompts users to fall asleep if they have been awake in bed for an extended period of time. Additionally, Bellabeat can partner with a software company that produces meditation content and bedtime stories, and integrate that subscription into its annual or monthly packages. This would give users a complete bundle for health and calmness along with a push to fall asleep early, thus reducing total time in bed.
    • Expand the user base by offering referral benefits, such as giving both users a 50% discount on annual subscriptions or a subscription to any product. In addition, Bellabeat could create a secure wallet service in which users earn coins for referrals and are rewarded for every challenge they participate in, and can then spend those coins on future purchases of Bellabeat products or services.
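
The hydration and movement reminders proposed above reduce to a simple rule on elapsed sedentary time. The sketch below is one hypothetical way to express that rule. The function name sedentary_reminder, the 30-minute and 120-minute defaults (taken from the deliverable on sedentary minutes) and the returned labels are all assumptions for illustration, not part of the Bellabeat app.

```r
# Hypothetical rule: decide which reminder, if any, to send for a given
# number of consecutive sedentary minutes.
sedentary_reminder <- function(sedentary_minutes,
                               water_every = 30,
                               walk_after = 120) {
  if (sedentary_minutes >= walk_after) {
    # Long sedentary stretch: suggest a short walk.
    "walk"
  } else if (sedentary_minutes > 0 && sedentary_minutes %% water_every == 0) {
    # Every half hour before that: suggest drinking water.
    "water"
  } else {
    "none"
  }
}

# Example: reminders for a range of sedentary durations (in minutes).
reminders <- sapply(c(10, 30, 60, 90, 120, 150), sedentary_reminder)
print(reminders)
```

Keeping the thresholds as arguments would let them be tuned per user, in line with the personalized notifications suggested in the first deliverable.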

13 RESOURCES: