Introduction

In recent years, people have come to depend more on smart devices for health and wellness. Smart devices help analyze fitness data and turn it into insights.

They track health data related to users' activity, sleep, stress, habits, and wellness. Smart devices were among the first wearable devices that promoted self-monitoring and were typically associated with fitness tracking. These technologies gather data throughout the day.

The flexibility of this technology also allows for more positive and accurate results.

About the Company

Bellabeat is a small, successful high-tech company that manufactures health-focused smart products.

Urska Srsen and Sando Mur are Bellabeat's cofounders. Founded in 2013, Bellabeat has grown rapidly and quickly positioned itself as a tech-driven wellness company for women. The company's collection of data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with knowledge about their own health and habits.

By 2016, Bellabeat had opened offices around the world and launched multiple products. Bellabeat products became available through a growing number of online retailers in addition to their own e-commerce channel on their website.

The company has invested in traditional advertising media, such as radio, out-of-home billboards, print, and television, but focuses extensively on digital marketing. Bellabeat invests year-round in Google Search, maintains active Facebook and Instagram pages, and consistently engages users on Twitter. Additionally, Bellabeat runs video ads on YouTube and display ads on the Google Display Network to support campaigns around key marketing dates.

Products

Business Task - Ask

Stakeholders
  • Urska Srsen: Bellabeat's cofounder and Chief Creative Officer
  • Sando Mur: Mathematician and Bellabeat's cofounder; key member of the Bellabeat executive team
  • Bellabeat marketing analytics team: A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Bellabeat's marketing strategy.

Preparing the data

The "Fitbit Fitness Tracker Data" was downloaded from Kaggle. The dataset was cleaned and processed via RStudio.

The appropriate packages were installed and loaded

library('tidyverse')
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.6     v dplyr   1.0.8
## v tidyr   1.2.0     v stringr 1.4.0
## v readr   2.1.2     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
# tidyverse already attaches tidyr, dplyr, and readr; the explicit calls below are redundant but harmless
library('tidyr')
library('dplyr')
library('readr')

Importing the datasets and assigning each one to a variable:

dailyActivity <- read.csv('dailyActivity_merged.csv')
dailyCalories <- read.csv('dailyCalories_merged.csv')
dailyIntensities <- read.csv('dailyIntensities_merged.csv')
heartrate <- read.csv('heartrate_seconds_merged.csv')
dailySteps <- read.csv('dailySteps_merged.csv')
sleepDay <- read.csv('sleepDay_merged.csv')
weight <- read.csv('weightLogInfo_merged.csv')
# Hourly data
hourly_calories <- read.csv("hourlyCalories_merged.csv")
hourly_intensities <- read.csv("hourlyIntensities_merged.csv")
hourly_steps <- read.csv("hourlySteps_merged.csv")
# daily_sleep is read from the same file as sleepDay above
daily_sleep <- read.csv("sleepDay_merged.csv")

Previewing each dataset with the str() function:

str(dailyActivity)
## 'data.frame':    940 obs. of  15 variables:
##  $ Id                      : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityDate            : chr  "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
##  $ TotalSteps              : int  13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...
##  $ TotalDistance           : num  8.5 6.97 6.74 6.28 8.16 ...
##  $ TrackerDistance         : num  8.5 6.97 6.74 6.28 8.16 ...
##  $ LoggedActivitiesDistance: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ VeryActiveDistance      : num  1.88 1.57 2.44 2.14 2.71 ...
##  $ ModeratelyActiveDistance: num  0.55 0.69 0.4 1.26 0.41 ...
##  $ LightActiveDistance     : num  6.06 4.71 3.91 2.83 5.04 ...
##  $ SedentaryActiveDistance : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ VeryActiveMinutes       : int  25 21 30 29 36 38 42 50 28 19 ...
##  $ FairlyActiveMinutes     : int  13 19 11 34 10 20 16 31 12 8 ...
##  $ LightlyActiveMinutes    : int  328 217 181 209 221 164 233 264 205 211 ...
##  $ SedentaryMinutes        : int  728 776 1218 726 773 539 1149 775 818 838 ...
##  $ Calories                : int  1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...
str(dailyIntensities)
## 'data.frame':    940 obs. of  10 variables:
##  $ Id                      : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityDay             : chr  "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
##  $ SedentaryMinutes        : int  728 776 1218 726 773 539 1149 775 818 838 ...
##  $ LightlyActiveMinutes    : int  328 217 181 209 221 164 233 264 205 211 ...
##  $ FairlyActiveMinutes     : int  13 19 11 34 10 20 16 31 12 8 ...
##  $ VeryActiveMinutes       : int  25 21 30 29 36 38 42 50 28 19 ...
##  $ SedentaryActiveDistance : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LightActiveDistance     : num  6.06 4.71 3.91 2.83 5.04 ...
##  $ ModeratelyActiveDistance: num  0.55 0.69 0.4 1.26 0.41 ...
##  $ VeryActiveDistance      : num  1.88 1.57 2.44 2.14 2.71 ...
str(dailySteps)
## 'data.frame':    940 obs. of  3 variables:
##  $ Id         : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityDay: chr  "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
##  $ StepTotal  : int  13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...
str(dailyCalories)
## 'data.frame':    940 obs. of  3 variables:
##  $ Id         : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityDay: chr  "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
##  $ Calories   : int  1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...
str(sleepDay)
## 'data.frame':    413 obs. of  5 variables:
##  $ Id                : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ SleepDay          : chr  "4/12/2016 12:00:00 AM" "4/13/2016 12:00:00 AM" "4/15/2016 12:00:00 AM" "4/16/2016 12:00:00 AM" ...
##  $ TotalSleepRecords : int  1 2 1 2 1 1 1 1 1 1 ...
##  $ TotalMinutesAsleep: int  327 384 412 340 700 304 360 325 361 430 ...
##  $ TotalTimeInBed    : int  346 407 442 367 712 320 377 364 384 449 ...
str(hourly_calories)
## 'data.frame':    22099 obs. of  3 variables:
##  $ Id          : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityHour: chr  "4/12/2016 12:00:00 AM" "4/12/2016 1:00:00 AM" "4/12/2016 2:00:00 AM" "4/12/2016 3:00:00 AM" ...
##  $ Calories    : int  81 61 59 47 48 48 48 47 68 141 ...
str(hourly_intensities)
## 'data.frame':    22099 obs. of  4 variables:
##  $ Id              : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityHour    : chr  "4/12/2016 12:00:00 AM" "4/12/2016 1:00:00 AM" "4/12/2016 2:00:00 AM" "4/12/2016 3:00:00 AM" ...
##  $ TotalIntensity  : int  20 8 7 0 0 0 0 0 13 30 ...
##  $ AverageIntensity: num  0.333 0.133 0.117 0 0 ...
str(daily_sleep)
## 'data.frame':    413 obs. of  5 variables:
##  $ Id                : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ SleepDay          : chr  "4/12/2016 12:00:00 AM" "4/13/2016 12:00:00 AM" "4/15/2016 12:00:00 AM" "4/16/2016 12:00:00 AM" ...
##  $ TotalSleepRecords : int  1 2 1 2 1 1 1 1 1 1 ...
##  $ TotalMinutesAsleep: int  327 384 412 340 700 304 360 325 361 430 ...
##  $ TotalTimeInBed    : int  346 407 442 367 712 320 377 364 384 449 ...
str(hourly_steps)
## 'data.frame':    22099 obs. of  3 variables:
##  $ Id          : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityHour: chr  "4/12/2016 12:00:00 AM" "4/12/2016 1:00:00 AM" "4/12/2016 2:00:00 AM" "4/12/2016 3:00:00 AM" ...
##  $ StepTotal   : int  373 160 151 0 0 0 0 0 250 1864 ...
# The variables in dailySteps, dailyIntensities, and dailyCalories are already present in dailyActivity.
# To avoid duplicate data, these three data frames are dropped.
rm(dailyIntensities,dailyCalories,dailySteps)
# Cleaning the data
# Checking for missing (NA) values with colSums(is.na())
colSums(is.na(dailyActivity))
##                       Id             ActivityDate               TotalSteps 
##                        0                        0                        0 
##            TotalDistance          TrackerDistance LoggedActivitiesDistance 
##                        0                        0                        0 
##       VeryActiveDistance ModeratelyActiveDistance      LightActiveDistance 
##                        0                        0                        0 
##  SedentaryActiveDistance        VeryActiveMinutes      FairlyActiveMinutes 
##                        0                        0                        0 
##     LightlyActiveMinutes         SedentaryMinutes                 Calories 
##                        0                        0                        0
colSums(is.na(daily_sleep))
##                 Id           SleepDay  TotalSleepRecords TotalMinutesAsleep 
##                  0                  0                  0                  0 
##     TotalTimeInBed 
##                  0
colSums(is.na(hourly_calories))
##           Id ActivityHour     Calories 
##            0            0            0
colSums(is.na(hourly_steps))
##           Id ActivityHour    StepTotal 
##            0            0            0
colSums(is.na(sleepDay))
##                 Id           SleepDay  TotalSleepRecords TotalMinutesAsleep 
##                  0                  0                  0                  0 
##     TotalTimeInBed 
##                  0
colSums(is.na(weight))
##             Id           Date       WeightKg   WeightPounds            Fat 
##              0              0              0              0             65 
##            BMI IsManualReport          LogId 
##              0              0              0
colSums(is.na(heartrate))
##    Id  Time Value 
##     0     0     0
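
When many tables need the same check, the calls can be folded into one pass. The sketch below is only an illustration: it uses two small made-up data frames (`activity_demo` and `sleep_demo` are hypothetical names, not part of this analysis) and counts the missing cells in each.

```r
# Hypothetical stand-ins for the real tables
activity_demo <- data.frame(Id = 1:3, TotalSteps = c(13162, NA, 10460))
sleep_demo <- data.frame(Id = 1:2, TotalMinutesAsleep = c(327, 384))

# Collect the tables in a named list so one function can visit each of them
frames <- list(activity = activity_demo, sleep = sleep_demo)

# Total number of NA cells in each data frame
na_counts <- sapply(frames, function(df) sum(is.na(df)))
print(na_counts)
```

A table with a non-zero count can then be inspected column by column with colSums(is.na()), as done above.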

The Fat column in weight contains 65 NA values.

The information about Fat is of little use, as most of its values are missing: only 2 of the 67 values are present.

The weight table is therefore removed from the analysis.

rm(weight) # Removing the weight table
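
If the weight measurements were needed later, an alternative would be to drop only the sparse column rather than the whole table. The following is a minimal sketch under that assumption; `weight_demo` and the 50% threshold are made up for illustration and are not taken from the analysis.

```r
# Hypothetical miniature of the weight table: Fat is mostly missing
weight_demo <- data.frame(
  Id = c(1, 1, 2, 3),
  WeightKg = c(52.6, 52.6, 133.5, 56.7),
  Fat = c(22, NA, NA, NA)
)

# Share of missing values per column
na_share <- colMeans(is.na(weight_demo))

# Keep only the columns where at most half of the values are missing
weight_trimmed <- weight_demo[, na_share <= 0.5, drop = FALSE]
print(names(weight_trimmed))
```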

The dates are stored as character ('chr') values rather than as dates. To convert them, we load the lubridate package, which helps parse character strings into date and date-time formats.

#install.packages('lubridate')
library('lubridate')
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
# Working on the dates
# "%Y" reads a four-digit year; "%y" would read only the "20" of "2016" and return 2020
dailyActivity$ActivityDate <- as.Date(dailyActivity$ActivityDate,
                                      format = "%m/%d/%Y")
# as.Date() ignores the trailing "12:00:00 AM" in SleepDay
sleepDay$SleepDay <- as.Date(sleepDay$SleepDay,
                             format = "%m/%d/%Y")
daily_sleep$SleepDay <- as.Date(daily_sleep$SleepDay,
                                format = "%m/%d/%Y")
str(daily_sleep)
## 'data.frame':    413 obs. of  5 variables:
##  $ Id                : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ SleepDay          : Date, format: "2016-04-12" "2016-04-13" ...
##  $ TotalSleepRecords : int  1 2 1 2 1 1 1 1 1 1 ...
##  $ TotalMinutesAsleep: int  327 384 412 340 700 304 360 325 361 430 ...
##  $ TotalTimeInBed    : int  346 407 442 367 712 320 377 364 384 449 ...
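
The year in the parsed dates depends on the format string, so it is worth checking on a single value. The sketch below takes one sample string from the ActivityDate column and parses it with both year codes: `%y` consumes only two digits of the year, while `%Y` consumes all four.

```r
# One raw value as it appears in the ActivityDate column
raw_date <- "4/12/2016"

# "%y" expects a two-digit year, so it reads "20" and stops
two_digit <- as.Date(raw_date, format = "%m/%d/%y")

# "%Y" expects a four-digit year and reads "2016"
four_digit <- as.Date(raw_date, format = "%m/%d/%Y")

print(format(two_digit))   # "2020-04-12"
print(format(four_digit))  # "2016-04-12"
```

Since the data was recorded in 2016, `%Y` is the code that matches these files.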

Checking the number of distinct participant Ids in each dataset

unique(dailyActivity$Id)
##  [1] 1503960366 1624580081 1644430081 1844505072 1927972279 2022484408
##  [7] 2026352035 2320127002 2347167796 2873212765 3372868164 3977333714
## [13] 4020332650 4057192912 4319703577 4388161847 4445114986 4558609924
## [19] 4702921684 5553957443 5577150313 6117666160 6290855005 6775888955
## [25] 6962181067 7007744171 7086361926 8053475328 8253242879 8378563200
## [31] 8583815059 8792009665 8877689391
n_distinct(dailyActivity$Id)
## [1] 33
unique(daily_sleep$Id)
##  [1] 1503960366 1644430081 1844505072 1927972279 2026352035 2320127002
##  [7] 2347167796 3977333714 4020332650 4319703577 4388161847 4445114986
## [13] 4558609924 4702921684 5553957443 5577150313 6117666160 6775888955
## [19] 6962181067 7007744171 7086361926 8053475328 8378563200 8792009665
n_distinct(daily_sleep$Id)
## [1] 24
unique(hourly_calories$Id)
##  [1] 1503960366 1624580081 1644430081 1844505072 1927972279 2022484408
##  [7] 2026352035 2320127002 2347167796 2873212765 3372868164 3977333714
## [13] 4020332650 4057192912 4319703577 4388161847 4445114986 4558609924
## [19] 4702921684 5553957443 5577150313 6117666160 6290855005 6775888955
## [25] 6962181067 7007744171 7086361926 8053475328 8253242879 8378563200
## [31] 8583815059 8792009665 8877689391
n_distinct(hourly_calories$Id)
## [1] 33
unique(hourly_intensities$Id)
##  [1] 1503960366 1624580081 1644430081 1844505072 1927972279 2022484408
##  [7] 2026352035 2320127002 2347167796 2873212765 3372868164 3977333714
## [13] 4020332650 4057192912 4319703577 4388161847 4445114986 4558609924
## [19] 4702921684 5553957443 5577150313 6117666160 6290855005 6775888955
## [25] 6962181067 7007744171 7086361926 8053475328 8253242879 8378563200
## [31] 8583815059 8792009665 8877689391
n_distinct(hourly_intensities$Id)
## [1] 33
unique(hourly_steps$Id)
##  [1] 1503960366 1624580081 1644430081 1844505072 1927972279 2022484408
##  [7] 2026352035 2320127002 2347167796 2873212765 3372868164 3977333714
## [13] 4020332650 4057192912 4319703577 4388161847 4445114986 4558609924
## [19] 4702921684 5553957443 5577150313 6117666160 6290855005 6775888955
## [25] 6962181067 7007744171 7086361926 8053475328 8253242879 8378563200
## [31] 8583815059 8792009665 8877689391
n_distinct(hourly_steps$Id)
## [1] 33
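
The counts show that the sleep table covers fewer participants than the activity table. To see which Ids are missing rather than only how many, the two Id vectors can be compared with setdiff(). The sketch below uses three Ids copied from the output above as stand-in vectors; `activity_ids` and `sleep_ids` are illustrative names.

```r
# Stand-in Id vectors (a small subset of the Ids printed above)
activity_ids <- c(1503960366, 1624580081, 1644430081)
sleep_ids <- c(1503960366, 1644430081)

# Ids that appear in the activity data but have no sleep records
missing_sleep <- setdiff(activity_ids, sleep_ids)
print(format(missing_sleep, scientific = FALSE))
```

With the real tables, the same call would be setdiff(unique(dailyActivity$Id), unique(daily_sleep$Id)).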

Checking for and removing duplicate data.

Checking for and removing duplicates is essential, as it cleans the datasets and avoids possible errors.

# Checking for duplicate rows in each data frame

sum(duplicated(dailyActivity))
## [1] 0
sum(duplicated(daily_sleep))
## [1] 3
# 3 duplicated rows
sum(duplicated(hourly_calories))
## [1] 0
sum(duplicated(hourly_intensities))
## [1] 0
sum(duplicated(hourly_steps))
## [1] 0
#Removing the duplicated row in daily_sleep
sleep_daily<- daily_sleep[!duplicated(daily_sleep), ]
sum(duplicated(sleep_daily))
## [1] 0
str(sleep_daily)
## 'data.frame':    410 obs. of  5 variables:
##  $ Id                : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ SleepDay          : Date, format: "2016-04-12" "2016-04-13" ...
##  $ TotalSleepRecords : int  1 2 1 2 1 1 1 1 1 1 ...
##  $ TotalMinutesAsleep: int  327 384 412 340 700 304 360 325 361 430 ...
##  $ TotalTimeInBed    : int  346 407 442 367 712 320 377 364 384 449 ...
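
How duplicated() behaves is easiest to see on a tiny table. The sketch below uses a made-up data frame (`sleep_demo` is hypothetical) in which the third row repeats the second; duplicated() flags only the repeat, so the first occurrence is kept.

```r
# Hypothetical sleep log where row 3 is an exact copy of row 2
sleep_demo <- data.frame(
  Id = c(1, 1, 1, 2),
  SleepDay = c("4/12/2016", "4/13/2016", "4/13/2016", "4/12/2016"),
  TotalMinutesAsleep = c(327, 384, 384, 412)
)

# TRUE only for rows that repeat an earlier row in every column
n_dupes <- sum(duplicated(sleep_demo))

# Keep the first occurrence of each row
sleep_unique <- sleep_demo[!duplicated(sleep_demo), ]

print(n_dupes)
print(nrow(sleep_unique))
```

dplyr's distinct() would return the same rows and fits naturally into a pipeline.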
# Converting the hourly timestamps from character to date-time

hourly_calories$ActivityHour <- mdy_hms(hourly_calories$ActivityHour)
hourly_intensities$ActivityHour <-mdy_hms(hourly_intensities$ActivityHour)
hourly_steps$ActivityHour <-mdy_hms(hourly_steps$ActivityHour)

Analyzing the Dataset.

To start the analysis, the datasets are merged, which keeps the related variables together in one organized table. To merge 'sleep_daily' and 'dailyActivity', both need a date column with the same name to serve as a join key alongside 'Id'; therefore, the 'SleepDay' column was renamed to 'ActivityDate'.

# Renaming the SleepDay column
sleep_daily<- sleep_daily %>%
  rename(ActivityDate = SleepDay)
str(dailyActivity)
## 'data.frame':    940 obs. of  15 variables:
##  $ Id                      : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityDate            : Date, format: "2016-04-12" "2016-04-13" ...
##  $ TotalSteps              : int  13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...
##  $ TotalDistance           : num  8.5 6.97 6.74 6.28 8.16 ...
##  $ TrackerDistance         : num  8.5 6.97 6.74 6.28 8.16 ...
##  $ LoggedActivitiesDistance: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ VeryActiveDistance      : num  1.88 1.57 2.44 2.14 2.71 ...
##  $ ModeratelyActiveDistance: num  0.55 0.69 0.4 1.26 0.41 ...
##  $ LightActiveDistance     : num  6.06 4.71 3.91 2.83 5.04 ...
##  $ SedentaryActiveDistance : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ VeryActiveMinutes       : int  25 21 30 29 36 38 42 50 28 19 ...
##  $ FairlyActiveMinutes     : int  13 19 11 34 10 20 16 31 12 8 ...
##  $ LightlyActiveMinutes    : int  328 217 181 209 221 164 233 264 205 211 ...
##  $ SedentaryMinutes        : int  728 776 1218 726 773 539 1149 775 818 838 ...
##  $ Calories                : int  1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...
str(sleep_daily)
## 'data.frame':    410 obs. of  5 variables:
##  $ Id                : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityDate      : Date, format: "2016-04-12" "2016-04-13" ...
##  $ TotalSleepRecords : int  1 2 1 2 1 1 1 1 1 1 ...
##  $ TotalMinutesAsleep: int  327 384 412 340 700 304 360 325 361 430 ...
##  $ TotalTimeInBed    : int  346 407 442 367 712 320 377 364 384 449 ...

Merging the datasets "dailyActivity" and "sleep_daily" keeps the activity and sleep variables for each user and day in a single table.

The merged data is named 'sleep_n_daily_activity'.
sleep_n_daily_activity <- merge(dailyActivity, sleep_daily, by=c("Id", "ActivityDate"))
str(sleep_n_daily_activity)
## 'data.frame':    410 obs. of  18 variables:
##  $ Id                      : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityDate            : Date, format: "2016-04-12" "2016-04-13" ...
##  $ TotalSteps              : int  13162 10735 9762 12669 9705 15506 10544 9819 14371 10039 ...
##  $ TotalDistance           : num  8.5 6.97 6.28 8.16 6.48 ...
##  $ TrackerDistance         : num  8.5 6.97 6.28 8.16 6.48 ...
##  $ LoggedActivitiesDistance: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ VeryActiveDistance      : num  1.88 1.57 2.14 2.71 3.19 ...
##  $ ModeratelyActiveDistance: num  0.55 0.69 1.26 0.41 0.78 ...
##  $ LightActiveDistance     : num  6.06 4.71 2.83 5.04 2.51 ...
##  $ SedentaryActiveDistance : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ VeryActiveMinutes       : int  25 21 29 36 38 50 28 19 41 39 ...
##  $ FairlyActiveMinutes     : int  13 19 34 10 20 31 12 8 21 5 ...
##  $ LightlyActiveMinutes    : int  328 217 209 221 164 264 205 211 262 238 ...
##  $ SedentaryMinutes        : int  728 776 726 773 539 775 818 838 732 709 ...
##  $ Calories                : int  1985 1797 1745 1863 1728 2035 1786 1775 1949 1788 ...
##  $ TotalSleepRecords       : int  1 2 1 2 1 1 1 1 1 1 ...
##  $ TotalMinutesAsleep      : int  327 384 412 340 700 304 360 325 361 430 ...
##  $ TotalTimeInBed          : int  346 407 442 367 712 320 377 364 384 449 ...
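
The merged table has 410 rows although dailyActivity has 940, because merge() performs an inner join by default and keeps only the Id and date pairs found in both tables. The sketch below shows the difference on made-up data; `activity_demo` and `sleep_demo` are hypothetical, and `all.x = TRUE` is the merge() argument that keeps unmatched rows from the first table.

```r
# Hypothetical activity log: user 2 has no sleep record
activity_demo <- data.frame(
  Id = c(1, 1, 2),
  ActivityDate = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
  TotalSteps = c(13162, 10735, 8000)
)
sleep_demo <- data.frame(
  Id = c(1, 1),
  ActivityDate = as.Date(c("2016-04-12", "2016-04-13")),
  TotalMinutesAsleep = c(327, 384)
)

# Default merge: only rows present in both tables survive
inner <- merge(activity_demo, sleep_demo, by = c("Id", "ActivityDate"))

# Left join: every activity row is kept, sleep columns are NA where missing
left <- merge(activity_demo, sleep_demo, by = c("Id", "ActivityDate"), all.x = TRUE)

print(nrow(inner))  # 2
print(nrow(left))   # 3
```

The inner join suits an analysis that needs both activity and sleep values, at the cost of dropping days without sleep records.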

Loading the packages for plotting visuals.

ggplot2 and corrplot are packages that help create visualizations. corrplot is used here to show the correlation between the daily activity levels and the calories burned by the users in the Fitbit dataset.

library("ggplot2")

library("corrplot")
## corrplot 0.92 loaded

Plotting the relationship between Total steps and calories burned.

To plot the relationship between total steps and calories burned, the merged dataset 'sleep_n_daily_activity' (built from 'dailyActivity' and 'sleep_daily') was used. This analysis helps reveal how the total steps taken by users relate to the calories burned.

ggplot(sleep_n_daily_activity) +
  geom_point(mapping = aes(x=TotalSteps, y=Calories, color = ActivityDate)) +
  labs(title = "Relationship between TotalSteps and Calories")

This plot shows the relationship between total steps and calories burned. It suggests that fewer steps go with fewer calories burned, and more steps go with more calories burned.
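
A fitted line can put a number on this trend. The sketch below is illustrative only: it uses the ten TotalSteps and Calories values shown in the str(dailyActivity) preview earlier (a single participant), and the name `steps_demo` is made up. With the full table, the data argument would be sleep_n_daily_activity.

```r
# First ten days of one participant, copied from the str() preview
steps_demo <- data.frame(
  TotalSteps = c(13162, 10735, 10460, 9762, 12669, 9705, 13019, 15506, 10544, 9819),
  Calories = c(1985, 1797, 1776, 1745, 1863, 1728, 1921, 2035, 1786, 1775)
)

# Simple linear regression of calories on steps
fit <- lm(Calories ~ TotalSteps, data = steps_demo)

# Estimated extra calories per additional step
slope <- coef(fit)[["TotalSteps"]]
print(slope)
```

A positive slope agrees with the upward pattern in the scatter plot, though ten days from one person are far too few to generalize from.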

Analyzing the correlation between Calories burned and Daily Activity Level.

This step digs further into the dataset by checking the relationship between daily activity levels and the calories burned by the users in the Fitbit dataset.

The analysis helps uncover how users' very active, fairly active, lightly active, and sedentary minutes relate to calories. For this analysis, I used the original dataset 'dailyActivity'.

Let's view the dataset:

glimpse(dailyActivity)
## Rows: 940
## Columns: 15
## $ Id                       <dbl> 1503960366, 1503960366, 1503960366, 150396036~
## $ ActivityDate             <date> 2016-04-12, 2016-04-13, 2016-04-14, 2016-04-~
## $ TotalSteps               <int> 13162, 10735, 10460, 9762, 12669, 9705, 13019~
## $ TotalDistance            <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8~
## $ TrackerDistance          <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8~
## $ LoggedActivitiesDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ VeryActiveDistance       <dbl> 1.88, 1.57, 2.44, 2.14, 2.71, 3.19, 3.25, 3.5~
## $ ModeratelyActiveDistance <dbl> 0.55, 0.69, 0.40, 1.26, 0.41, 0.78, 0.64, 1.3~
## $ LightActiveDistance      <dbl> 6.06, 4.71, 3.91, 2.83, 5.04, 2.51, 4.71, 5.0~
## $ SedentaryActiveDistance  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ VeryActiveMinutes        <int> 25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 66, 4~
## $ FairlyActiveMinutes      <int> 13, 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, 21~
## $ LightlyActiveMinutes     <int> 328, 217, 181, 209, 221, 164, 233, 264, 205, ~
## $ SedentaryMinutes         <int> 728, 776, 1218, 726, 773, 539, 1149, 775, 818~
## $ Calories                 <int> 1985, 1797, 1776, 1745, 1863, 1728, 1921, 203~
View(dailyActivity)

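Plotting a density graph of the calories burned by users.

The density curve shows how daily calorie totals are distributed across all user-days. Because VeryActiveMinutes is a continuous variable, mapping it to fill does not split the curve into groups, so this plot on its own says little about how very active minutes relate to calories; the correlation plot further down addresses that relationship.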

ggplot(data = dailyActivity, mapping = aes(x=Calories, fill = VeryActiveMinutes, color = "VeryActiveMinutes" )) +
  geom_density(bw = 2)
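
One way to look at very active minutes and calories together is to split the days into groups and compare calories per group. The sketch below is an assumption-laden illustration: it uses the ten rows from the str(dailyActivity) preview, and the 30-minute cut-off and the name `active_demo` are arbitrary choices, not part of the original analysis.

```r
# First ten days of one participant, copied from the str() preview
active_demo <- data.frame(
  VeryActiveMinutes = c(25, 21, 30, 29, 36, 38, 42, 50, 28, 19),
  Calories = c(1985, 1797, 1776, 1745, 1863, 1728, 1921, 2035, 1786, 1775)
)

# Split the days at an arbitrary threshold of 30 very active minutes
active_demo$ActivityGroup <- cut(
  active_demo$VeryActiveMinutes,
  breaks = c(-Inf, 30, Inf),
  labels = c("30 or fewer", "more than 30")
)

# Median calories burned in each group
medians <- tapply(active_demo$Calories, active_demo$ActivityGroup, median)
print(medians)
```

The resulting factor could also be mapped to fill in geom_density() to draw one curve per group.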

Plotting the correlation graph

corrplot(corr = cor(dailyActivity[11:15]), order = "AOE",
         method = "circle",
         type = "upper", tl.pos = "lt")

corrplot(corr = cor(dailyActivity[11:15]), add= TRUE, order = "AOE",
         method = "pie", diag = FALSE,
         type = "lower", tl.pos = "n", cl.pos = "n")

This plot suggests a positive correlation between calories burned and active minutes, and a strong negative correlation between sedentary minutes and calories burned.
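
The strength of each relationship is easier to judge from the numbers behind the plot. The sketch below prints the correlations with Calories for a made-up subset (`cor_demo` holds the ten rows from the str(dailyActivity) preview, so the values are not representative of the full dataset); with the real table, the input would be dailyActivity[11:15].

```r
# Ten days of one participant, copied from the str() preview
cor_demo <- data.frame(
  VeryActiveMinutes = c(25, 21, 30, 29, 36, 38, 42, 50, 28, 19),
  SedentaryMinutes = c(728, 776, 1218, 726, 773, 539, 1149, 775, 818, 838),
  Calories = c(1985, 1797, 1776, 1745, 1863, 1728, 1921, 2035, 1786, 1775)
)

# Pearson correlation matrix, rounded for readability
cor_matrix <- round(cor(cor_demo), 2)

# Correlation of every variable with Calories
calorie_cor <- cor_matrix[, "Calories"]
print(calorie_cor)
```

Reading the coefficients directly helps confirm whether a correlation that looks strong in the plot is also large in magnitude.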

Recommendations

Based on the analysis, the following recommendations are made: