Case Study - Bellabeat Wellness Technology

Introduction

In recent years, people have come to depend more on smart devices for health and wellness. Smart devices help analyze fitness data and thereby provide insights.

They track health data related to users' activity, sleep, stress, habits, and wellness. Smart wearables were among the first devices to promote self-monitoring and have typically been associated with fitness tracking. These technologies can gather data at all times during the day.

The flexibility of this technology also allows for more positive and accurate results.

About the Company

Bellabeat is a small but successful high-tech company that manufactures health-focused smart products.

Urska Srsen and Sando Mur are Bellabeat's cofounders. Founded in 2013, Bellabeat has grown rapidly and quickly positioned itself as a tech-driven wellness company for women. The company's collection of data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with knowledge about their own health and habits.

By 2016, Bellabeat had opened offices around the world and launched multiple products. Bellabeat products became available through a growing number of online retailers in addition to their own e-commerce channel on their website.

The company has invested in local advertising media, such as radio, out-of-home billboards, print, and television, but focuses extensively on digital marketing. Bellabeat invests year-round in Google Search, maintains active Facebook and Instagram pages, and consistently engages users on Twitter. Additionally, Bellabeat runs video ads on YouTube and display ads on the Google Display Network to support campaigns around key marketing dates.

Products

Bellabeat app: The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. It gives them insights into their current habits and helps them make healthy decisions.
Leaf: Bellabeat's classic wellness tracker can be worn as a bracelet, necklace, or clip. The Leaf tracker connects to the Bellabeat app to track activity, sleep, and stress.
Time: This wellness watch combines the timeless look of a classic timepiece with smart technology to track user activity, sleep, and stress. The Time watch connects to the Bellabeat app to provide you with insights into your daily wellness.
Spring: This is a water bottle that tracks daily water intake using smart technology to ensure that you are appropriately hydrated throughout the day. The Spring bottle connects to the Bellabeat app to track hydration levels.

Business Task - Ask

Analyze smart device usage data, then apply the insights to one of Bellabeat's products and its marketing strategy in order to discover new, effective marketing strategies and growth opportunities.

Stakeholders

Urska Srsen: Bellabeat's cofounder and Chief Creative Officer
Sando Mur: Mathematician and Bellabeat's cofounder; key member of the Bellabeat executive team
Bellabeat marketing analytics team: A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Bellabeat's marketing strategy.

Preparing the data

The "Fitbit Fitness Tracker Data" was downloaded from Kaggle. The dataset was cleaned and processed via RStudio.

The appropriate packages were installed and loaded

library('tidyverse')

## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --

## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.6     v dplyr   1.0.8
## v tidyr   1.2.0     v stringr 1.4.0
## v readr   2.1.2     v forcats 0.5.1

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

# tidyr, dplyr and readr are already attached by tidyverse; loading them again is harmless
library('tidyr')
library('dplyr')
library('readr')

Importing the datasets and assigning each one to a variable

dailyActivity <- read.csv('dailyActivity_merged.csv')
dailyCalories <- read.csv('dailyCalories_merged.csv')
dailyIntensities <- read.csv('dailyIntensities_merged.csv')
heartrate <- read.csv('heartrate_seconds_merged.csv')
dailySteps <- read.csv('dailySteps_merged.csv')
sleepDay <- read.csv('sleepDay_merged.csv')
weight <- read.csv('weightLogInfo_merged.csv')

# Hourly-level data
hourly_calories <- read.csv("hourlyCalories_merged.csv")
hourly_intensities <- read.csv("hourlyIntensities_merged.csv")
hourly_steps <- read.csv("hourlySteps_merged.csv")
# Second copy of the daily sleep file, cleaned separately below
daily_sleep <- read.csv("sleepDay_merged.csv")

Previewing each dataset with the structure function, str()

str(dailyActivity)

## 'data.frame':    940 obs. of  15 variables:
##  $ Id                      : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityDate            : chr  "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
##  $ TotalSteps              : int  13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...
##  $ TotalDistance           : num  8.5 6.97 6.74 6.28 8.16 ...
##  $ TrackerDistance         : num  8.5 6.97 6.74 6.28 8.16 ...
##  $ LoggedActivitiesDistance: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ VeryActiveDistance      : num  1.88 1.57 2.44 2.14 2.71 ...
##  $ ModeratelyActiveDistance: num  0.55 0.69 0.4 1.26 0.41 ...
##  $ LightActiveDistance     : num  6.06 4.71 3.91 2.83 5.04 ...
##  $ SedentaryActiveDistance : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ VeryActiveMinutes       : int  25 21 30 29 36 38 42 50 28 19 ...
##  $ FairlyActiveMinutes     : int  13 19 11 34 10 20 16 31 12 8 ...
##  $ LightlyActiveMinutes    : int  328 217 181 209 221 164 233 264 205 211 ...
##  $ SedentaryMinutes        : int  728 776 1218 726 773 539 1149 775 818 838 ...
##  $ Calories                : int  1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...

str(dailyIntensities)

## 'data.frame':    940 obs. of  10 variables:
##  $ Id                      : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityDay             : chr  "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
##  $ SedentaryMinutes        : int  728 776 1218 726 773 539 1149 775 818 838 ...
##  $ LightlyActiveMinutes    : int  328 217 181 209 221 164 233 264 205 211 ...
##  $ FairlyActiveMinutes     : int  13 19 11 34 10 20 16 31 12 8 ...
##  $ VeryActiveMinutes       : int  25 21 30 29 36 38 42 50 28 19 ...
##  $ SedentaryActiveDistance : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LightActiveDistance     : num  6.06 4.71 3.91 2.83 5.04 ...
##  $ ModeratelyActiveDistance: num  0.55 0.69 0.4 1.26 0.41 ...
##  $ VeryActiveDistance      : num  1.88 1.57 2.44 2.14 2.71 ...

str(dailySteps)

## 'data.frame':    940 obs. of  3 variables:
##  $ Id         : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityDay: chr  "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
##  $ StepTotal  : int  13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...

str(dailyCalories)

## 'data.frame':    940 obs. of  3 variables:
##  $ Id         : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityDay: chr  "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
##  $ Calories   : int  1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...

str(sleepDay)

## 'data.frame':    413 obs. of  5 variables:
##  $ Id                : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ SleepDay          : chr  "4/12/2016 12:00:00 AM" "4/13/2016 12:00:00 AM" "4/15/2016 12:00:00 AM" "4/16/2016 12:00:00 AM" ...
##  $ TotalSleepRecords : int  1 2 1 2 1 1 1 1 1 1 ...
##  $ TotalMinutesAsleep: int  327 384 412 340 700 304 360 325 361 430 ...
##  $ TotalTimeInBed    : int  346 407 442 367 712 320 377 364 384 449 ...

str(hourly_calories)

## 'data.frame':    22099 obs. of  3 variables:
##  $ Id          : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityHour: chr  "4/12/2016 12:00:00 AM" "4/12/2016 1:00:00 AM" "4/12/2016 2:00:00 AM" "4/12/2016 3:00:00 AM" ...
##  $ Calories    : int  81 61 59 47 48 48 48 47 68 141 ...

str(hourly_intensities)

## 'data.frame':    22099 obs. of  4 variables:
##  $ Id              : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityHour    : chr  "4/12/2016 12:00:00 AM" "4/12/2016 1:00:00 AM" "4/12/2016 2:00:00 AM" "4/12/2016 3:00:00 AM" ...
##  $ TotalIntensity  : int  20 8 7 0 0 0 0 0 13 30 ...
##  $ AverageIntensity: num  0.333 0.133 0.117 0 0 ...

str(daily_sleep)

## 'data.frame':    413 obs. of  5 variables:
##  $ Id                : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ SleepDay          : chr  "4/12/2016 12:00:00 AM" "4/13/2016 12:00:00 AM" "4/15/2016 12:00:00 AM" "4/16/2016 12:00:00 AM" ...
##  $ TotalSleepRecords : int  1 2 1 2 1 1 1 1 1 1 ...
##  $ TotalMinutesAsleep: int  327 384 412 340 700 304 360 325 361 430 ...
##  $ TotalTimeInBed    : int  346 407 442 367 712 320 377 364 384 449 ...

str(hourly_steps)

## 'data.frame':    22099 obs. of  3 variables:
##  $ Id          : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityHour: chr  "4/12/2016 12:00:00 AM" "4/12/2016 1:00:00 AM" "4/12/2016 2:00:00 AM" "4/12/2016 3:00:00 AM" ...
##  $ StepTotal   : int  373 160 151 0 0 0 0 0 250 1864 ...

# The variables in dailySteps, dailyIntensities and dailyCalories are also present in dailyActivity
# To avoid duplicate data, we drop these three data frames
rm(dailyIntensities,dailyCalories,dailySteps)
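Before the three daily tables are dropped, it is worth confirming that they really duplicate dailyActivity. The sketch below is a minimal base-R check on two small made-up data frames standing in for dailyActivity and dailySteps (the values are illustrative, not taken from the Fitbit files): it joins them on Id and date and compares the step counts.

```r
# Toy stand-ins for dailyActivity and dailySteps (illustrative values only)
activity_toy <- data.frame(
  Id = c(1, 1, 2),
  ActivityDate = c("4/12/2016", "4/13/2016", "4/12/2016"),
  TotalSteps = c(13162L, 10735L, 5000L)
)
steps_toy <- data.frame(
  Id = c(1, 1, 2),
  ActivityDay = c("4/12/2016", "4/13/2016", "4/12/2016"),
  StepTotal = c(13162L, 10735L, 5000L)
)

# Join on the shared keys; the date column has a different name in each file
joined <- merge(activity_toy, steps_toy,
                by.x = c("Id", "ActivityDate"), by.y = c("Id", "ActivityDay"))

# TRUE when no day was lost in the join and every matched day has identical step counts
steps_match <- nrow(joined) == nrow(activity_toy) &&
  identical(joined$TotalSteps, joined$StepTotal)
print(steps_match)
```

On the real data the same merge would be run on dailyActivity and dailySteps before the rm() call.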

# Cleaning the data
# Count NA values in each column: colSums(is.na(df)) gives one count per column
colSums(is.na(dailyActivity))

##                       Id             ActivityDate               TotalSteps 
##                        0                        0                        0 
##            TotalDistance          TrackerDistance LoggedActivitiesDistance 
##                        0                        0                        0 
##       VeryActiveDistance ModeratelyActiveDistance      LightActiveDistance 
##                        0                        0                        0 
##  SedentaryActiveDistance        VeryActiveMinutes      FairlyActiveMinutes 
##                        0                        0                        0 
##     LightlyActiveMinutes         SedentaryMinutes                 Calories 
##                        0                        0                        0

colSums(is.na(daily_sleep))

##                 Id           SleepDay  TotalSleepRecords TotalMinutesAsleep 
##                  0                  0                  0                  0 
##     TotalTimeInBed 
##                  0

colSums(is.na(hourly_calories))

##           Id ActivityHour     Calories 
##            0            0            0

colSums(is.na(hourly_steps))

##           Id ActivityHour    StepTotal 
##            0            0            0

colSums(is.na(sleepDay))

##                 Id           SleepDay  TotalSleepRecords TotalMinutesAsleep 
##                  0                  0                  0                  0 
##     TotalTimeInBed 
##                  0

colSums(is.na(weight))

##             Id           Date       WeightKg   WeightPounds            Fat 
##              0              0              0              0             65 
##            BMI IsManualReport          LogId 
##              0              0              0

colSums(is.na(heartrate))

##    Id  Time Value 
##     0     0     0

The Fat column in weight contains 65 NA values; only 2 of its 67 values are present.

Because the Fat information is almost entirely missing, it is not useful for the analysis, so the weight table is removed.

rm(weight)  # remove the weight table
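Because the other weight columns have no missing values, an alternative is to drop only the mostly-empty column and keep the rest. A minimal base-R sketch on a made-up toy table; the 50% cut-off is an arbitrary assumption, not a rule from the analysis:

```r
# Toy stand-in for weightLogInfo_merged.csv: Fat is almost entirely missing
weight_toy <- data.frame(
  Id = c(1, 1, 2, 3),
  WeightKg = c(52.6, 60.1, 70.3, 56.7),
  Fat = c(22, NA, NA, NA)
)

# Share of missing values in each column
na_share <- colMeans(is.na(weight_toy))
print(na_share)

# Keep only columns that are less than half missing (0.5 is an arbitrary cut-off)
weight_clean <- weight_toy[, na_share < 0.5, drop = FALSE]
print(names(weight_clean))
```

This keeps Id and WeightKg and drops Fat, instead of discarding the whole table.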

The date columns are stored as character ('chr') rather than as dates. The daily dates are converted with base R's as.Date(); for the hourly date-time columns we install and load the lubridate package, which parses character strings into date and date-time formats.

#install.packages('lubridate')
library('lubridate')

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

# Working on the dates
# %Y matches the four-digit years in the raw data (e.g. "4/12/2016"); %y would read them as 2020
dailyActivity$ActivityDate <- as.Date(dailyActivity$ActivityDate, format = "%m/%d/%Y")
sleepDay$SleepDay <- as.Date(sleepDay$SleepDay, format = "%m/%d/%Y")
daily_sleep$SleepDay <- as.Date(daily_sleep$SleepDay, format = "%m/%d/%Y")
str(daily_sleep)

## 'data.frame':    413 obs. of  5 variables:
##  $ Id                : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ SleepDay          : Date, format: "2016-04-12" "2016-04-13" ...
##  $ TotalSleepRecords : int  1 2 1 2 1 1 1 1 1 1 ...
##  $ TotalMinutesAsleep: int  327 384 412 340 700 304 360 325 361 430 ...
##  $ TotalTimeInBed    : int  346 407 442 367 712 320 377 364 384 449 ...
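The year that appears after conversion depends on the format string passed to as.Date(). A small base-R check on sample strings shaped like the raw files shows the difference between %y (two-digit year) and %Y (four-digit year):

```r
# Raw values in the same m/d/yyyy shape as the Fitbit CSV files
raw_dates <- c("4/12/2016", "4/13/2016")
raw_sleep <- "4/12/2016 12:00:00 AM"

# %y expects a two-digit year: it reads "20" as 2020 and ignores the trailing "16"
wrong <- as.Date(raw_dates, format = "%m/%d/%y")

# %Y reads the full four-digit year
right <- as.Date(raw_dates, format = "%m/%d/%Y")

# Trailing time text is ignored, so the SleepDay strings convert the same way
sleep_day <- as.Date(raw_sleep, format = "%m/%d/%Y")

print(wrong)      # "2020-04-12" "2020-04-13"
print(right)      # "2016-04-12" "2016-04-13"
print(sleep_day)  # "2016-04-12"
```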

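Checking the number of unique user Ids in each dataset: the activity and hourly tables have 33 users each, while the sleep table has only 24.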

unique(dailyActivity$Id)

##  [1] 1503960366 1624580081 1644430081 1844505072 1927972279 2022484408
##  [7] 2026352035 2320127002 2347167796 2873212765 3372868164 3977333714
## [13] 4020332650 4057192912 4319703577 4388161847 4445114986 4558609924
## [19] 4702921684 5553957443 5577150313 6117666160 6290855005 6775888955
## [25] 6962181067 7007744171 7086361926 8053475328 8253242879 8378563200
## [31] 8583815059 8792009665 8877689391

n_distinct(dailyActivity$Id)

## [1] 33

unique(daily_sleep$Id)

##  [1] 1503960366 1644430081 1844505072 1927972279 2026352035 2320127002
##  [7] 2347167796 3977333714 4020332650 4319703577 4388161847 4445114986
## [13] 4558609924 4702921684 5553957443 5577150313 6117666160 6775888955
## [19] 6962181067 7007744171 7086361926 8053475328 8378563200 8792009665

n_distinct(daily_sleep$Id)

## [1] 24

unique(hourly_calories$Id)

##  [1] 1503960366 1624580081 1644430081 1844505072 1927972279 2022484408
##  [7] 2026352035 2320127002 2347167796 2873212765 3372868164 3977333714
## [13] 4020332650 4057192912 4319703577 4388161847 4445114986 4558609924
## [19] 4702921684 5553957443 5577150313 6117666160 6290855005 6775888955
## [25] 6962181067 7007744171 7086361926 8053475328 8253242879 8378563200
## [31] 8583815059 8792009665 8877689391

n_distinct(hourly_calories$Id)

## [1] 33

unique(hourly_intensities$Id)

##  [1] 1503960366 1624580081 1644430081 1844505072 1927972279 2022484408
##  [7] 2026352035 2320127002 2347167796 2873212765 3372868164 3977333714
## [13] 4020332650 4057192912 4319703577 4388161847 4445114986 4558609924
## [19] 4702921684 5553957443 5577150313 6117666160 6290855005 6775888955
## [25] 6962181067 7007744171 7086361926 8053475328 8253242879 8378563200
## [31] 8583815059 8792009665 8877689391

n_distinct(hourly_intensities$Id)

## [1] 33

unique(hourly_steps$Id)

##  [1] 1503960366 1624580081 1644430081 1844505072 1927972279 2022484408
##  [7] 2026352035 2320127002 2347167796 2873212765 3372868164 3977333714
## [13] 4020332650 4057192912 4319703577 4388161847 4445114986 4558609924
## [19] 4702921684 5553957443 5577150313 6117666160 6290855005 6775888955
## [25] 6962181067 7007744171 7086361926 8053475328 8253242879 8378563200
## [31] 8583815059 8792009665 8877689391

n_distinct(hourly_steps$Id)

## [1] 33
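Since the sleep table covers fewer users, it helps to know which ones are missing. A short base-R sketch on toy Id vectors (made-up values) lists the users with no sleep records and the share of users who logged any sleep:

```r
# Toy Id vectors standing in for dailyActivity$Id and daily_sleep$Id
activity_ids <- c(101, 101, 102, 103, 104, 104)
sleep_ids <- c(101, 103, 103)

# Users present in the activity data but with no sleep records
no_sleep <- setdiff(unique(activity_ids), unique(sleep_ids))
print(no_sleep)

# Share of activity users who logged any sleep
share_with_sleep <- mean(unique(activity_ids) %in% sleep_ids)
print(share_with_sleep)
```

With the real data, the inputs would be dailyActivity$Id and daily_sleep$Id.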

Checking for and removing duplicate data.

Removing duplicates is essential: it cleans the datasets and prevents repeated records from being counted twice in the analysis.

# Count duplicate rows in each data frame

sum(duplicated(dailyActivity))

## [1] 0

sum(duplicated(daily_sleep))

## [1] 3

# daily_sleep has 3 duplicated rows

sum(duplicated(hourly_calories))

## [1] 0

sum(duplicated(hourly_intensities))

## [1] 0

sum(duplicated(hourly_steps))

## [1] 0

#Removing the duplicated row in daily_sleep
sleep_daily<- daily_sleep[!duplicated(daily_sleep), ]
sum(duplicated(sleep_daily))

## [1] 0

str(sleep_daily)

## 'data.frame':    410 obs. of  5 variables:
##  $ Id                : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ SleepDay          : Date, format: "2016-04-12" "2016-04-13" ...
##  $ TotalSleepRecords : int  1 2 1 2 1 1 1 1 1 1 ...
##  $ TotalMinutesAsleep: int  327 384 412 340 700 304 360 325 361 430 ...
##  $ TotalTimeInBed    : int  346 407 442 367 712 320 377 364 384 449 ...
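duplicated() on a whole data frame flags only rows that are identical in every column. Because the merge below joins on Id and date, a key-based check also catches a user with two different records for the same day. A minimal base-R sketch on made-up toy data:

```r
# Toy sleep records: row 3 repeats row 2 exactly, row 4 reuses the same Id and day
sleep_toy <- data.frame(
  Id = c(1, 1, 1, 1),
  SleepDay = as.Date(c("2016-04-12", "2016-04-13", "2016-04-13", "2016-04-13")),
  TotalMinutesAsleep = c(327L, 384L, 384L, 390L)
)

# Fully identical rows (what duplicated() on the whole data frame counts)
full_dups <- sum(duplicated(sleep_toy))

# Rows that repeat an (Id, SleepDay) pair, even with different values
key_dups <- sum(duplicated(sleep_toy[, c("Id", "SleepDay")]))

print(c(full = full_dups, by_key = key_dups))
```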

# Convert ActivityHour from character to date-time (POSIXct) with lubridate's mdy_hms()

hourly_calories$ActivityHour <- mdy_hms(hourly_calories$ActivityHour)
hourly_intensities$ActivityHour <-mdy_hms(hourly_intensities$ActivityHour)
hourly_steps$ActivityHour <-mdy_hms(hourly_steps$ActivityHour)
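mdy_hms() turns strings such as "4/12/2016 1:00:00 AM" into date-times. For comparison, a base-R sketch with as.POSIXct() parses sample strings of the same shape and extracts the hour of day; it assumes an English or C locale, since %p matches the locale's AM/PM markers:

```r
# Raw hourly timestamps in the same shape as ActivityHour
raw_hours <- c("4/12/2016 12:00:00 AM", "4/12/2016 1:00:00 AM", "4/12/2016 1:00:00 PM")

# Base-R counterpart of lubridate::mdy_hms(); %I is the 12-hour clock, %p is AM/PM
parsed <- as.POSIXct(raw_hours, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Hour of day (0-23), useful for grouping steps or calories by time of day
hour_of_day <- as.integer(format(parsed, "%H"))
print(hour_of_day)
```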

Analyzing the Dataset.

To start the analysis, the datasets are merged, which keeps the data organized and easy to work with. To merge 'sleep_daily' and 'dailyActivity' on a shared key column, the 'SleepDay' column was renamed to 'ActivityDate' so both data frames use the same name.

# Rename the SleepDay column so both data frames share the ActivityDate key
sleep_daily<- sleep_daily %>%
  rename(ActivityDate = SleepDay)
str(dailyActivity)

## 'data.frame':    940 obs. of  15 variables:
##  $ Id                      : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityDate            : Date, format: "2016-04-12" "2016-04-13" ...
##  $ TotalSteps              : int  13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...
##  $ TotalDistance           : num  8.5 6.97 6.74 6.28 8.16 ...
##  $ TrackerDistance         : num  8.5 6.97 6.74 6.28 8.16 ...
##  $ LoggedActivitiesDistance: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ VeryActiveDistance      : num  1.88 1.57 2.44 2.14 2.71 ...
##  $ ModeratelyActiveDistance: num  0.55 0.69 0.4 1.26 0.41 ...
##  $ LightActiveDistance     : num  6.06 4.71 3.91 2.83 5.04 ...
##  $ SedentaryActiveDistance : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ VeryActiveMinutes       : int  25 21 30 29 36 38 42 50 28 19 ...
##  $ FairlyActiveMinutes     : int  13 19 11 34 10 20 16 31 12 8 ...
##  $ LightlyActiveMinutes    : int  328 217 181 209 221 164 233 264 205 211 ...
##  $ SedentaryMinutes        : int  728 776 1218 726 773 539 1149 775 818 838 ...
##  $ Calories                : int  1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...

str(sleep_daily)

## 'data.frame':    410 obs. of  5 variables:
##  $ Id                : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityDate      : Date, format: "2016-04-12" "2016-04-13" ...
##  $ TotalSleepRecords : int  1 2 1 2 1 1 1 1 1 1 ...
##  $ TotalMinutesAsleep: int  327 384 412 340 700 304 360 325 361 430 ...
##  $ TotalTimeInBed    : int  346 407 442 367 712 320 377 364 384 449 ...

The datasets "dailyActivity" and "sleep_daily" are merged on Id and ActivityDate into a single table named 'sleep_n_daily_activity'. By default, merge() keeps only the days that appear in both datasets.

sleep_n_daily_activity <- merge(dailyActivity, sleep_daily, by=c("Id", "ActivityDate"))
str(sleep_n_daily_activity)

## 'data.frame':    410 obs. of  18 variables:
##  $ Id                      : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityDate            : Date, format: "2016-04-12" "2016-04-13" ...
##  $ TotalSteps              : int  13162 10735 9762 12669 9705 15506 10544 9819 14371 10039 ...
##  $ TotalDistance           : num  8.5 6.97 6.28 8.16 6.48 ...
##  $ TrackerDistance         : num  8.5 6.97 6.28 8.16 6.48 ...
##  $ LoggedActivitiesDistance: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ VeryActiveDistance      : num  1.88 1.57 2.14 2.71 3.19 ...
##  $ ModeratelyActiveDistance: num  0.55 0.69 1.26 0.41 0.78 ...
##  $ LightActiveDistance     : num  6.06 4.71 2.83 5.04 2.51 ...
##  $ SedentaryActiveDistance : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ VeryActiveMinutes       : int  25 21 29 36 38 50 28 19 41 39 ...
##  $ FairlyActiveMinutes     : int  13 19 34 10 20 31 12 8 21 5 ...
##  $ LightlyActiveMinutes    : int  328 217 209 221 164 264 205 211 262 238 ...
##  $ SedentaryMinutes        : int  728 776 726 773 539 775 818 838 732 709 ...
##  $ Calories                : int  1985 1797 1745 1863 1728 2035 1786 1775 1949 1788 ...
##  $ TotalSleepRecords       : int  1 2 1 2 1 1 1 1 1 1 ...
##  $ TotalMinutesAsleep      : int  327 384 412 340 700 304 360 325 361 430 ...
##  $ TotalTimeInBed          : int  346 407 442 367 712 320 377 364 384 449 ...
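merge() with its defaults performs an inner join, which is why the merged table has 410 rows, matching sleep_daily, rather than the 940 rows of dailyActivity. A small toy example (made-up values) shows the difference between the default and a left join with all.x = TRUE:

```r
# Toy activity days: user 1 logged sleep on 4/12 only, user 2 never logged sleep
activity_toy <- data.frame(
  Id = c(1, 1, 2),
  ActivityDate = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
  TotalSteps = c(13162L, 10735L, 4000L)
)
sleep_toy <- data.frame(
  Id = 1,
  ActivityDate = as.Date("2016-04-12"),
  TotalMinutesAsleep = 327L
)

# Default merge() is an inner join: only Id/date pairs present in both survive
inner <- merge(activity_toy, sleep_toy, by = c("Id", "ActivityDate"))

# all.x = TRUE keeps every activity day and fills missing sleep with NA
left <- merge(activity_toy, sleep_toy, by = c("Id", "ActivityDate"), all.x = TRUE)

print(nrow(inner))
print(sum(is.na(left$TotalMinutesAsleep)))
```

The inner join suits questions about sleep and activity together; a left join would be the choice when every activity day must be kept.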

Loading the packages for plotting visuals.

ggplot2 and corrplot are packages for creating visualizations (ggplot2 is already attached by tidyverse). corrplot draws correlation matrices, used here to show the relationship between daily activity levels and calories burned for the users in the dataset.

library("ggplot2")

library("corrplot")

## corrplot 0.92 loaded

Plotting the relationship between Total steps and calories burned.

To plot the relationship between total steps and calories burned, the merged dataset 'sleep_n_daily_activity' (dailyActivity joined with sleep_daily) was used. This analysis helps reveal how the total steps users take relate to the calories they burn.

ggplot(sleep_n_daily_activity) +
  geom_point(mapping = aes(x=TotalSteps, y=Calories, color = ActivityDate)) +
  labs(title = "Relationship between TotalSteps and Calories")

This plot shows the relationship between total steps and calories burned: days with fewer steps tend to have lower calorie burn, and days with more steps tend to have higher calorie burn.
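The visual trend can be backed by a number. A minimal base-R sketch computes the Pearson correlation with cor() and a least-squares slope with lm(); the toy values are made up and perfectly linear so the results are easy to check:

```r
# Toy daily totals (illustrative, not from the Fitbit data)
steps <- c(2000, 4000, 6000, 8000, 10000)
calories <- c(1500, 1700, 1900, 2100, 2300)

# Pearson correlation: +1 is a perfect positive linear relationship
r <- cor(steps, calories)

# Slope of a least-squares line: extra calories per additional step
fit <- lm(calories ~ steps)
slope <- unname(coef(fit)["steps"])

print(r)      # 1
print(slope)  # 0.1
```

On the merged data the same calls would be cor(sleep_n_daily_activity$TotalSteps, sleep_n_daily_activity$Calories) and lm(Calories ~ TotalSteps, data = sleep_n_daily_activity). A correlation describes association only; it does not show that steps cause the extra calorie burn.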

Analyzing the correlation between Calories burned and Daily Activity Level.

This takes the analysis further by checking the relationship between daily activity levels and calories burned for the Fitbit users in the dataset.

The analysis looks for trends in how users' very active, fairly active, lightly active, and sedentary minutes relate to calories burned. For this analysis, I used the original 'dailyActivity' dataset, which keeps all 940 daily records rather than only the 410 days that also have sleep data.

Let's view the dataset:

glimpse(dailyActivity)

## Rows: 940
## Columns: 15
## $ Id                       <dbl> 1503960366, 1503960366, 1503960366, 150396036~
## $ ActivityDate             <date> 2016-04-12, 2016-04-13, 2016-04-14, 2016-04-~
## $ TotalSteps               <int> 13162, 10735, 10460, 9762, 12669, 9705, 13019~
## $ TotalDistance            <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8~
## $ TrackerDistance          <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8~
## $ LoggedActivitiesDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ VeryActiveDistance       <dbl> 1.88, 1.57, 2.44, 2.14, 2.71, 3.19, 3.25, 3.5~
## $ ModeratelyActiveDistance <dbl> 0.55, 0.69, 0.40, 1.26, 0.41, 0.78, 0.64, 1.3~
## $ LightActiveDistance      <dbl> 6.06, 4.71, 3.91, 2.83, 5.04, 2.51, 4.71, 5.0~
## $ SedentaryActiveDistance  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ VeryActiveMinutes        <int> 25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 66, 4~
## $ FairlyActiveMinutes      <int> 13, 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, 21~
## $ LightlyActiveMinutes     <int> 328, 217, 181, 209, 221, 164, 233, 264, 205, ~
## $ SedentaryMinutes         <int> 728, 776, 1218, 726, 773, 539, 1149, 775, 818~
## $ Calories                 <int> 1985, 1797, 1776, 1745, 1863, 1728, 1921, 203~

View(dailyActivity)

Plotting a density graph of calories burned, grouped by very active minutes.

This compares the calorie distributions: days with more very active minutes tend to show higher calories burned, while calories burned stay more stable on days with few very active minutes.

ggplot(data = dailyActivity,
       mapping = aes(x = Calories, fill = cut(VeryActiveMinutes, c(-Inf, 0, 30, Inf)))) +
  geom_density(alpha = 0.4) +  # one curve per band of very active minutes (cut points illustrative)
  labs(fill = "VeryActiveMinutes")

Plotting the correlation graph

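# Columns 11:15 are VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes, SedentaryMinutes and Calories
corrplot(corr = cor(dailyActivity[11:15]), order = "AOE",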
         method = "circle",
         type = "upper", tl.pos = "lt")

corrplot(corr = cor(dailyActivity[11:15]), add= TRUE, order = "AOE",
         method = "pie", diag = FALSE,
         type = "lower", tl.pos = "n", cl.pos = "n")

This plot suggests a positive correlation between calories burned and active minutes, and a negative correlation between sedentary minutes and calories burned.
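The corrplot figure is easiest to check against the numbers behind it. A minimal sketch pulls the correlations with Calories out of the correlation matrix and ranks them by absolute size; the toy data frame is made up, and on the real data the input would be dailyActivity[11:15] instead:

```r
# Toy activity minutes per day (illustrative values only)
toy <- data.frame(
  VeryActiveMinutes = c(0, 10, 20, 40, 60),
  SedentaryMinutes  = c(1200, 1000, 900, 800, 700),
  Calories          = c(1600, 1800, 2000, 2400, 2800)
)

# Column of the correlation matrix for Calories, without its self-correlation
cal_cor <- cor(toy)[, "Calories"]
cal_cor <- cal_cor[names(cal_cor) != "Calories"]

# Strongest relationships first, whatever their sign
ranked <- cal_cor[order(abs(cal_cor), decreasing = TRUE)]
print(round(ranked, 2))
```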

Observations and trends

From the analysis, the results show a positive correlation between calories burned and users' activity. The trends discovered include:

Very active minutes in the dataset have a positive correlation with calories burned; more active users therefore tend to burn more calories.
The more total steps taken, the more calories burned, and the more likely active users are to get a sufficient amount of sleep.
Non-active users cover fewer steps and burn fewer calories, and sedentary minutes correlate negatively with calories burned; these users are likely to get an insufficient amount of sleep.
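The sleep-related observations can be checked on sleep_n_daily_activity by averaging steps and sleep per user and comparing groups. A minimal base-R sketch on a made-up table; the 7,500-step cut-off between "active" and "less active" users is an arbitrary assumption, not a value from the analysis:

```r
# Toy merged records (illustrative): one row per user-day
days <- data.frame(
  Id = c(1, 1, 2, 2, 3, 3),
  TotalSteps = c(12000, 11000, 3000, 2500, 9000, 8000),
  TotalMinutesAsleep = c(430, 410, 360, 340, 420, 400)
)

# Average daily steps and sleep per user
per_user <- aggregate(cbind(TotalSteps, TotalMinutesAsleep) ~ Id, data = days, FUN = mean)

# Hypothetical split: users averaging at least 7,500 steps a day count as "active"
per_user$Group <- ifelse(per_user$TotalSteps >= 7500, "active", "less active")

# Mean sleep per group
sleep_by_group <- aggregate(TotalMinutesAsleep ~ Group, data = per_user, FUN = mean)
print(sleep_by_group)
```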

Recommendations

Based on the analysis, the following recommendations are made:

The Bellabeat app can enhance its sleep tracking and send friendly reminders to users who oversleep about the amount of activity they need.
The Bellabeat app can also send friendly reminders to users who get insufficient sleep but log many very active minutes, encouraging them to reduce workout activity and get more rest and sleep.
High priority should be placed on sending motivation or encouragement alerts to users who want to lose weight and spend much of their time sedentary.
The Bellabeat app can send daily notifications with sleep or activity recommendations, including reminders of when a user should rest; together with more activity in everyday life, this could help users build better sleep habits.
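The rest reminder for users with short sleep and heavy activity can be expressed as a simple rule over the merged daily data. A minimal sketch on made-up user-days; both thresholds (420 minutes of sleep, 60 very active minutes) are hypothetical values chosen for illustration, not figures from the analysis:

```r
# Toy user-days (illustrative values only)
days <- data.frame(
  Id = c(1, 2, 3),
  TotalMinutesAsleep = c(300, 450, 330),
  VeryActiveMinutes = c(70, 80, 5)
)

# Hypothetical thresholds, chosen only for illustration
min_sleep <- 420     # under 7 hours of sleep counts as insufficient
high_activity <- 60  # 60+ very active minutes counts as a heavy workout day

# Flag days with short sleep and heavy activity for a "rest more" reminder
days$rest_reminder <- days$TotalMinutesAsleep < min_sleep &
  days$VeryActiveMinutes >= high_activity
print(days[days$rest_reminder, "Id"])
```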