Bellabeat is a high-tech company that manufactures health-focused smart products, such as watches, a wellness tracker and a smart water bottle, along with a mobile app and access to fully personalized guidance on nutrition, activity, sleep and health. Several of these products connect to the Bellabeat app, which provides up-to-date data on users' health and wellbeing. Urška Sršen, Bellabeat's Chief Creative Officer (CCO), has asked the marketing analytics team to focus on one Bellabeat product and analyze smart device usage data to gain insight into how people already use their smart devices. The hope is that the findings will help the company grow through more effective marketing strategies.
Analyze smart device usage data to learn how consumers use non-Bellabeat smart devices, and make recommendations for improving the company's marketing strategy.
We'll use the FitBit Fitness Tracker Data from Kaggle for our analysis. This data set contains personal fitness tracker data from 30 Fitbit users who consented to the submission of their daily activity, steps, heart rate and sleep monitoring data. It was collected through a survey distributed via Amazon Mechanical Turk between March and May 2016 and is released into the public domain under a Creative Commons (CC0) license, so we are free to use the data without risk of infringing copyright.
For the analysis to run smoothly, we needed the right kind of data. Having identified the data source, we assessed its suitability by asking questions about its origin, reliability, originality and other important factors.
The data is comprehensive in terms of the variety of information it contains, and it comes from a credible third party with a citation to the original source (Furberg et al.). However, its reliability is low: the small sample size and the lack of demographic data expose it to potential sampling bias. Because our product is aimed at a specific demographic (women), any business decision based on this analysis should account for this limitation. Furthermore, the data was collected in 2016, six years before this analysis, and may not adequately reflect how people use their devices today.
To prepare the data set, we downloaded it and saved it locally in a folder called "Fitabase Data new". We started by reviewing the files in Excel and deciding which ones to use for our analysis. The download contains 18 CSV files covering activity, calories, intensities and steps at daily, hourly and minute resolution, with the minute-level data provided in both wide and narrow formats. There are also heartrate_seconds, sleepDay and weightLogInfo files. After some consideration, we decided to focus on the data presented in the narrow format.
This stage involved downloading, storing and importing the data into the RStudio environment.
The setwd() function was used to set the working directory to the folder holding the data on the computer used for the analysis. This was necessary because, without it, knitting the document in RStudio would not complete.
setwd("C:/Users/USER/Documents/DATA ANALYTICS/vIDEOs/R/New R/Fitabase Data new")
The read.csv() function imports each file as a data frame, which then appears in the Environment pane of RStudio.
knitr::opts_chunk$set(echo = TRUE)
dailyActivity_merged<-read.csv("dailyActivity_merged.csv")
heartrate_seconds_merged<-read.csv("heartrate_seconds_merged.csv")
dailySteps_merged<-read.csv("dailySteps_merged.csv")
dailyCalories_merged<-read.csv("dailyCalories_merged.csv")
dailyIntensities_merged<-read.csv("dailyIntensities_merged.csv")
hourlySteps_merged<-read.csv("hourlySteps_merged.csv")
hourlyCalories_merged<-read.csv("hourlyCalories_merged.csv")
hourlyIntensities_merged<-read.csv("hourlyIntensities_merged.csv")
minuteMETsNarrow_merged<-read.csv("minuteMETsNarrow_merged.csv")
minuteSleep_merged<-read.csv("minuteSleep_merged.csv")
minuteStepsNarrow_merged<-read.csv("minuteStepsNarrow_merged.csv")
minuteCaloriesNarrow_merged<-read.csv("minuteCaloriesNarrow_merged.csv")
sleepDay_merged<-read.csv("sleepDay_merged.csv")
weightLogInfo_merged<-read.csv("weightLogInfo_merged.csv")
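The fourteen read.csv() calls above all follow one pattern, so they can also be written as a loop over file names. The sketch below is one way to do that in base R, assuming all the CSV files sit in a single folder. To keep it runnable anywhere, it first writes two tiny hypothetical CSV files with made-up values to a temporary folder; the names `fitabase_demo`, `data_dir` and `fitabase` are illustrative only.

```r
# Hypothetical demo files so the sketch runs anywhere; in the project,
# data_dir would point at the "Fitabase Data new" folder instead
data_dir <- file.path(tempdir(), "fitabase_demo")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(Id = c(1, 1, 2), StepTotal = c(100, 250, 80)),
          file.path(data_dir, "dailySteps_merged.csv"), row.names = FALSE)
write.csv(data.frame(Id = c(1, 2), TotalMinutesAsleep = c(420, 390)),
          file.path(data_dir, "sleepDay_merged.csv"), row.names = FALSE)

# Read every CSV in the folder into a list of data frames
csv_files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
fitabase <- lapply(csv_files, read.csv)

# Name each list element after its file, without the ".csv" extension
names(fitabase) <- sub("\\.csv$", "", basename(csv_files))

str(fitabase, max.level = 1)
```

With the real folder, a table would then be reached as, for example, `fitabase$dailyActivity_merged`. Because list.files() returns full paths here, this approach does not depend on the working directory set by setwd().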
At this stage, we had to examine the data carefully to determine which parts were best suited for our analysis. To do this, we first installed and loaded, as needed, several R packages to keep the analysis running smoothly.
Note: We used the chunk option {r, results=FALSE} to hide some of the output.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.7 ✔ stringr 1.4.0
## ✔ tidyr 1.2.0 ✔ forcats 0.5.1
## ✔ readr 2.1.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(ggplot2)
library(ggpubr)
## Warning: package 'ggpubr' was built under R version 4.2.1
library(data.table)
##
## Attaching package: 'data.table'
## The following object is masked from 'package:purrr':
##
## transpose
## The following objects are masked from 'package:dplyr':
##
## between, first, last
library(tinytex)
## Warning: package 'tinytex' was built under R version 4.2.1
options("max.print"=3000)
We used various functions to get information such as the number of missing values, the number of columns and a summary of each data set.
The sapply() function applies glimpse(), str() and colnames() to each data frame in a list, so we can view the column names and internal structure of all the data sets at once.
Note: The chunk option {r, results = FALSE} was used to hide this output.
sapply(list(dailyActivity_merged, dailyCalories_merged, dailyIntensities_merged, dailySteps_merged, heartrate_seconds_merged,
hourlyCalories_merged, hourlyIntensities_merged, hourlySteps_merged, sleepDay_merged, weightLogInfo_merged), glimpse)
sapply(list(dailyActivity_merged, dailyCalories_merged, dailyIntensities_merged, dailySteps_merged, heartrate_seconds_merged,
hourlyCalories_merged, hourlyIntensities_merged, hourlySteps_merged, sleepDay_merged, weightLogInfo_merged), str)
sapply(list(dailyActivity_merged, dailyCalories_merged, dailyIntensities_merged, dailySteps_merged, heartrate_seconds_merged,
hourlyCalories_merged, hourlyIntensities_merged, hourlySteps_merged, sleepDay_merged, weightLogInfo_merged), colnames)
The sum(is.na()) call counts the missing values in each data set. The results reveal only 65 missing values, all in the weightLogInfo data.
sum(is.na(dailyActivity_merged))
## [1] 0
sum(is.na(dailyIntensities_merged))
## [1] 0
sum(is.na(dailyCalories_merged))
## [1] 0
sum(is.na(dailySteps_merged))
## [1] 0
sum(is.na(heartrate_seconds_merged))
## [1] 0
sum(is.na(hourlyCalories_merged))
## [1] 0
sum(is.na(hourlyIntensities_merged))
## [1] 0
sum(is.na(hourlySteps_merged))
## [1] 0
sum(is.na(weightLogInfo_merged))
## [1] 65
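The repeated sum(is.na()) calls can be collapsed into a single sapply() over a named list, and colSums(is.na()) shows which columns the missing values sit in. A minimal base-R sketch on two small hypothetical data frames (`daily_steps_demo`, `weight_demo` and `tables` are made-up names): applied to the real weightLogInfo_merged, the per-column count should place all 65 missing values in the Fat column, matching the `NA's :65` line in its summary.

```r
# Hypothetical stand-ins for two of the Fitabase data frames
daily_steps_demo <- data.frame(Id = c(1, 1, 2), StepTotal = c(100, 250, 80))
weight_demo <- data.frame(Id = c(1, 2, 2),
                          WeightKg = c(52.6, 61.4, 62.5),
                          Fat = c(22, NA, NA))

tables <- list(dailySteps = daily_steps_demo, weightLogInfo = weight_demo)

# Total missing values per data frame, in one call
na_per_table <- sapply(tables, function(df) sum(is.na(df)))
print(na_per_table)    # dailySteps 0, weightLogInfo 2

# Missing values per column, to locate them
na_per_column <- colSums(is.na(weight_demo))
print(na_per_column)   # Id 0, WeightKg 0, Fat 2
```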
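The sum(is.null()) calls confirm that none of the data frames is itself NULL. Note, however, that is.null() tests whether an object as a whole is NULL and returns a single TRUE or FALSE, so it cannot count individual values; missing cells in a data frame are stored as NA, which the is.na() check above already covers.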
sum(is.null(dailyActivity_merged))
## [1] 0
sum(is.null(dailyIntensities_merged))
## [1] 0
sum(is.null(dailyCalories_merged))
## [1] 0
sum(is.null(dailySteps_merged))
## [1] 0
sum(is.null(heartrate_seconds_merged))
## [1] 0
sum(is.null(hourlyCalories_merged))
## [1] 0
sum(is.null(hourlyIntensities_merged))
## [1] 0
sum(is.null(hourlySteps_merged))
## [1] 0
sum(is.null(sleepDay_merged))
## [1] 0
sum(is.null(weightLogInfo_merged))
## [1] 0
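The difference between is.null() and is.na() is easy to check on a toy data frame (`df` is a hypothetical example): is.null() answers one question about the whole object, whereas is.na() answers it cell by cell.

```r
df <- data.frame(x = c(1, NA, 3))

is.null(df)           # FALSE: the object itself exists
length(is.null(df))   # 1: a single answer for the whole object
sum(is.null(df))      # 0 for any data frame that loaded
sum(is.na(df))        # 1: the missing cell is NA, not NULL
is.null(NULL)         # TRUE: only the NULL object itself passes
```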
The pattern <data frame> %>% distinct(Id) lists the unique participant IDs in each data set, which tells us how many participants contributed to it. The weightLogInfo and heartrate_seconds data contain only 8 and 14 participants, respectively.
dailyActivity_merged %>% distinct(Id)
dailyCalories_merged %>% distinct(Id)
dailyIntensities_merged %>% distinct(Id)
dailySteps_merged %>% distinct(Id)
heartrate_seconds_merged %>% distinct(Id)
hourlyCalories_merged %>% distinct(Id)
hourlyIntensities_merged %>% distinct(Id)
hourlySteps_merged %>% distinct(Id)
sleepDay_merged %>% distinct(Id)
weightLogInfo_merged %>% distinct(Id)
The nrow() function counts the number of rows. The weightLogInfo_merged data has significantly fewer rows (67) than the other data sets.
nrow(dailyActivity_merged)
## [1] 940
nrow(dailyCalories_merged)
## [1] 940
nrow(dailyIntensities_merged)
## [1] 940
nrow(dailySteps_merged)
## [1] 940
nrow(heartrate_seconds_merged)
## [1] 2483658
nrow(hourlyCalories_merged)
## [1] 22099
nrow(hourlyIntensities_merged)
## [1] 22099
nrow(hourlySteps_merged)
## [1] 22099
nrow(sleepDay_merged)
## [1] 413
nrow(weightLogInfo_merged)
## [1] 67
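The separate distinct() and nrow() checks can be gathered into one overview table with a row count and a participant count per data set. A base-R sketch, assuming each data frame has an Id column as the Fitbit files do; the two data frames here are small hypothetical stand-ins. `length(unique(df$Id))` gives the same count as `dplyr::n_distinct(df$Id)`.

```r
# Hypothetical stand-ins; in the project, list the *_merged data frames
tables <- list(
  dailyActivity = data.frame(Id = c(1, 1, 2, 3), TotalSteps = c(5, 6, 7, 8)),
  weightLogInfo = data.frame(Id = c(2, 2), WeightKg = c(60, 61))
)

# One row per data set: number of records and number of distinct participants
overview <- data.frame(
  table = names(tables),
  rows = sapply(tables, nrow),
  participants = sapply(tables, function(df) length(unique(df$Id))),
  row.names = NULL
)
print(overview)
```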
The summary() function gives the key descriptive statistics for each column of a data frame.
summary(dailyActivity_merged)
## Id ActivityDate TotalSteps TotalDistance
## Min. :1.504e+09 Length:940 Min. : 0 Min. : 0.000
## 1st Qu.:2.320e+09 Class :character 1st Qu.: 3790 1st Qu.: 2.620
## Median :4.445e+09 Mode :character Median : 7406 Median : 5.245
## Mean :4.855e+09 Mean : 7638 Mean : 5.490
## 3rd Qu.:6.962e+09 3rd Qu.:10727 3rd Qu.: 7.713
## Max. :8.878e+09 Max. :36019 Max. :28.030
## TrackerDistance LoggedActivitiesDistance VeryActiveDistance
## Min. : 0.000 Min. :0.0000 Min. : 0.000
## 1st Qu.: 2.620 1st Qu.:0.0000 1st Qu.: 0.000
## Median : 5.245 Median :0.0000 Median : 0.210
## Mean : 5.475 Mean :0.1082 Mean : 1.503
## 3rd Qu.: 7.710 3rd Qu.:0.0000 3rd Qu.: 2.053
## Max. :28.030 Max. :4.9421 Max. :21.920
## ModeratelyActiveDistance LightActiveDistance SedentaryActiveDistance
## Min. :0.0000 Min. : 0.000 Min. :0.000000
## 1st Qu.:0.0000 1st Qu.: 1.945 1st Qu.:0.000000
## Median :0.2400 Median : 3.365 Median :0.000000
## Mean :0.5675 Mean : 3.341 Mean :0.001606
## 3rd Qu.:0.8000 3rd Qu.: 4.782 3rd Qu.:0.000000
## Max. :6.4800 Max. :10.710 Max. :0.110000
## VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes
## Min. : 0.00 Min. : 0.00 Min. : 0.0 Min. : 0.0
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.:127.0 1st Qu.: 729.8
## Median : 4.00 Median : 6.00 Median :199.0 Median :1057.5
## Mean : 21.16 Mean : 13.56 Mean :192.8 Mean : 991.2
## 3rd Qu.: 32.00 3rd Qu.: 19.00 3rd Qu.:264.0 3rd Qu.:1229.5
## Max. :210.00 Max. :143.00 Max. :518.0 Max. :1440.0
## Calories
## Min. : 0
## 1st Qu.:1828
## Median :2134
## Mean :2304
## 3rd Qu.:2793
## Max. :4900
summary(dailyCalories_merged)
## Id ActivityDay Calories
## Min. :1.504e+09 Length:940 Min. : 0
## 1st Qu.:2.320e+09 Class :character 1st Qu.:1828
## Median :4.445e+09 Mode :character Median :2134
## Mean :4.855e+09 Mean :2304
## 3rd Qu.:6.962e+09 3rd Qu.:2793
## Max. :8.878e+09 Max. :4900
summary(dailyIntensities_merged)
## Id ActivityDay SedentaryMinutes LightlyActiveMinutes
## Min. :1.504e+09 Length:940 Min. : 0.0 Min. : 0.0
## 1st Qu.:2.320e+09 Class :character 1st Qu.: 729.8 1st Qu.:127.0
## Median :4.445e+09 Mode :character Median :1057.5 Median :199.0
## Mean :4.855e+09 Mean : 991.2 Mean :192.8
## 3rd Qu.:6.962e+09 3rd Qu.:1229.5 3rd Qu.:264.0
## Max. :8.878e+09 Max. :1440.0 Max. :518.0
## FairlyActiveMinutes VeryActiveMinutes SedentaryActiveDistance
## Min. : 0.00 Min. : 0.00 Min. :0.000000
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.:0.000000
## Median : 6.00 Median : 4.00 Median :0.000000
## Mean : 13.56 Mean : 21.16 Mean :0.001606
## 3rd Qu.: 19.00 3rd Qu.: 32.00 3rd Qu.:0.000000
## Max. :143.00 Max. :210.00 Max. :0.110000
## LightActiveDistance ModeratelyActiveDistance VeryActiveDistance
## Min. : 0.000 Min. :0.0000 Min. : 0.000
## 1st Qu.: 1.945 1st Qu.:0.0000 1st Qu.: 0.000
## Median : 3.365 Median :0.2400 Median : 0.210
## Mean : 3.341 Mean :0.5675 Mean : 1.503
## 3rd Qu.: 4.782 3rd Qu.:0.8000 3rd Qu.: 2.053
## Max. :10.710 Max. :6.4800 Max. :21.920
summary(dailySteps_merged)
## Id ActivityDay StepTotal
## Min. :1.504e+09 Length:940 Min. : 0
## 1st Qu.:2.320e+09 Class :character 1st Qu.: 3790
## Median :4.445e+09 Mode :character Median : 7406
## Mean :4.855e+09 Mean : 7638
## 3rd Qu.:6.962e+09 3rd Qu.:10727
## Max. :8.878e+09 Max. :36019
summary(heartrate_seconds_merged)
## Id Time Value
## Min. :2.022e+09 Length:2483658 Min. : 36.00
## 1st Qu.:4.388e+09 Class :character 1st Qu.: 63.00
## Median :5.554e+09 Mode :character Median : 73.00
## Mean :5.514e+09 Mean : 77.33
## 3rd Qu.:6.962e+09 3rd Qu.: 88.00
## Max. :8.878e+09 Max. :203.00
summary(hourlyCalories_merged)
## Id ActivityHour Calories
## Min. :1.504e+09 Length:22099 Min. : 42.00
## 1st Qu.:2.320e+09 Class :character 1st Qu.: 63.00
## Median :4.445e+09 Mode :character Median : 83.00
## Mean :4.848e+09 Mean : 97.39
## 3rd Qu.:6.962e+09 3rd Qu.:108.00
## Max. :8.878e+09 Max. :948.00
summary(hourlyIntensities_merged)
## Id ActivityHour TotalIntensity AverageIntensity
## Min. :1.504e+09 Length:22099 Min. : 0.00 Min. :0.0000
## 1st Qu.:2.320e+09 Class :character 1st Qu.: 0.00 1st Qu.:0.0000
## Median :4.445e+09 Mode :character Median : 3.00 Median :0.0500
## Mean :4.848e+09 Mean : 12.04 Mean :0.2006
## 3rd Qu.:6.962e+09 3rd Qu.: 16.00 3rd Qu.:0.2667
## Max. :8.878e+09 Max. :180.00 Max. :3.0000
summary(hourlySteps_merged)
## Id ActivityHour StepTotal
## Min. :1.504e+09 Length:22099 Min. : 0.0
## 1st Qu.:2.320e+09 Class :character 1st Qu.: 0.0
## Median :4.445e+09 Mode :character Median : 40.0
## Mean :4.848e+09 Mean : 320.2
## 3rd Qu.:6.962e+09 3rd Qu.: 357.0
## Max. :8.878e+09 Max. :10554.0
summary(sleepDay_merged)
## Id SleepDay TotalSleepRecords TotalMinutesAsleep
## Min. :1.504e+09 Length:413 Min. :1.000 Min. : 58.0
## 1st Qu.:3.977e+09 Class :character 1st Qu.:1.000 1st Qu.:361.0
## Median :4.703e+09 Mode :character Median :1.000 Median :433.0
## Mean :5.001e+09 Mean :1.119 Mean :419.5
## 3rd Qu.:6.962e+09 3rd Qu.:1.000 3rd Qu.:490.0
## Max. :8.792e+09 Max. :3.000 Max. :796.0
## TotalTimeInBed
## Min. : 61.0
## 1st Qu.:403.0
## Median :463.0
## Mean :458.6
## 3rd Qu.:526.0
## Max. :961.0
summary(weightLogInfo_merged)
## Id Date WeightKg WeightPounds
## Min. :1.504e+09 Length:67 Min. : 52.60 Min. :116.0
## 1st Qu.:6.962e+09 Class :character 1st Qu.: 61.40 1st Qu.:135.4
## Median :6.962e+09 Mode :character Median : 62.50 Median :137.8
## Mean :7.009e+09 Mean : 72.04 Mean :158.8
## 3rd Qu.:8.878e+09 3rd Qu.: 85.05 3rd Qu.:187.5
## Max. :8.878e+09 Max. :133.50 Max. :294.3
##
## Fat BMI IsManualReport LogId
## Min. :22.00 Min. :21.45 Length:67 Min. :1.460e+12
## 1st Qu.:22.75 1st Qu.:23.96 Class :character 1st Qu.:1.461e+12
## Median :23.50 Median :24.39 Mode :character Median :1.462e+12
## Mean :23.50 Mean :25.19 Mean :1.462e+12
## 3rd Qu.:24.25 3rd Qu.:25.56 3rd Qu.:1.462e+12
## Max. :25.00 Max. :47.54 Max. :1.463e+12
## NA's :65
After inspecting the data, we decided to leave out the weightLogInfo and heartrate_seconds data: with only 8 and 14 participants respectively, these data sets were deemed insufficient to provide useful insight. We also narrowed our focus to data recorded on a daily and hourly basis.
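We needed to compare some of the data across the days of the week. We used mutate(), as.Date(), weekdays(), format() and as.numeric() to add a Weekday column (the day name) and a WeekdayNum column (1 = Monday to 7 = Sunday) to the selected data, and data.table() to build smaller tables from it. Because the dates are stored in month/day/four-digit-year form (for example, 4/12/2016), they must be parsed with the format "%m/%d/%Y": the two-digit-year code %y reads only the "20" of "2016", so 4/12/2016 becomes 12 April 2020 and every weekday label shifts by five days.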
In other cases, we created entirely new tables (Intensity, Activity and Activity2).
# Dates are stored as m/d/YYYY (e.g. "4/12/2016"), so parse them with %Y;
# %y would read only "20" and turn the 2016 dates into 2020 dates
daily_activity <- dailyActivity_merged %>%
  mutate(ActivityDate = as.Date(ActivityDate, "%m/%d/%Y"),
         Weekday = weekdays(ActivityDate),                     # day name, e.g. "Tuesday"
         WeekdayNum = as.numeric(format(ActivityDate, "%u")))  # 1 = Monday ... 7 = Sunday
daily_steps <- dailySteps_merged %>%
  mutate(ActivityDay = as.Date(ActivityDay, "%m/%d/%Y"),
         Weekday = weekdays(ActivityDay),
         WeekdayNum = as.numeric(format(ActivityDay, "%u")))
daily_intensity <- dailyIntensities_merged %>%
  mutate(ActivityDay = as.Date(ActivityDay, "%m/%d/%Y"),
         Weekday = weekdays(ActivityDay),
         WeekdayNum = as.numeric(format(ActivityDay, "%u")))
Intensity<-data.table(LightActive=daily_intensity$LightActiveDistance, ModerateActive=daily_intensity$ModeratelyActiveDistance, VeryActive=daily_intensity$VeryActiveDistance, WeekDay=daily_intensity$Weekday)
Activity<-data.table(LightActive=daily_activity$LightActiveDistance, ModeratelyActive=daily_activity$ModeratelyActiveDistance, VeryActive=daily_activity$VeryActiveDistance, WeekDay=daily_activity$Weekday)
Activity2<-data.table(Sedentary=daily_activity$SedentaryMinutes, FairlyActive=daily_activity$FairlyActiveMinutes, LightlyActive=daily_activity$LightlyActiveMinutes, VeryActive=daily_activity$VeryActiveMinutes, WeekDay=daily_activity$Weekday)
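The effect of the year code is easy to see on two dates written the way the Fitbit files store them. A short base-R check (the vector names `raw`, `wrong` and `right` are illustrative). The ISO weekday number from "%u" is used here because, unlike the names returned by weekdays(), it does not depend on the locale.

```r
raw <- c("4/12/2016", "5/12/2016")   # month/day/four-digit year

wrong <- as.Date(raw, "%m/%d/%y")   # %y reads only "20"; the rest is ignored
right <- as.Date(raw, "%m/%d/%Y")   # %Y reads the full four-digit year

print(wrong)   # "2020-04-12" "2020-05-12"
print(right)   # "2016-04-12" "2016-05-12"

# ISO weekday number: 1 = Monday ... 7 = Sunday
format(wrong, "%u")   # "7" "2": Sunday, Tuesday
format(right, "%u")   # "2" "4": Tuesday, Thursday

# The two parses are 1461 days apart, a shift of 5 weekdays
as.numeric(wrong - right) %% 7   # 5 5
```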
We used the ggscatter() function from ggpubr to compare the TotalTimeInBed and TotalMinutesAsleep columns of the sleepDay data (see Importing the data). The plot shows a strong positive correlation (Spearman's R = 0.92) between time in bed and time asleep: for most participants, time asleep closely tracks time in bed, which suggests good sleep habits.
ggscatter(sleepDay_merged, x = "TotalTimeInBed", y = "TotalMinutesAsleep", add = "reg.line", conf.int = TRUE,
cor.coef = TRUE, cor.method = "spearman")
## `geom_smooth()` using formula 'y ~ x'
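The coefficient that ggscatter() prints with cor.method = "spearman" should correspond to Spearman's rank correlation, which base R computes directly with cor() or cor.test(). The sketch below uses six hypothetical sleep records and checks the no-ties formula rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the difference in ranks, against cor().

```r
# Hypothetical sleep records in minutes; the real values are in sleepDay_merged
sleep <- data.frame(
  TotalTimeInBed     = c(400, 450, 480, 520, 600, 700),
  TotalMinutesAsleep = c(380, 430, 500, 445, 510, 560)
)

# Built-in Spearman coefficient
rho <- cor(sleep$TotalTimeInBed, sleep$TotalMinutesAsleep, method = "spearman")

# Same value from rank differences (this formula assumes no tied values)
d <- rank(sleep$TotalTimeInBed) - rank(sleep$TotalMinutesAsleep)
n <- nrow(sleep)
rho_formula <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))

print(c(rho, rho_formula))   # both 0.9428571

# cor.test() reports the same coefficient together with a p-value
cor.test(sleep$TotalTimeInBed, sleep$TotalMinutesAsleep, method = "spearman")
```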
However, a small group of participants (with time in bed between roughly 250 and 875 minutes) slept considerably less than the time they spent in bed, which suggests some form of sleeplessness. With more data, the root causes of this sleeplessness could be identified.
ggplot(data = sleepDay_merged, aes(x = TotalTimeInBed, y = TotalMinutesAsleep)) + geom_point(aes(color =TotalMinutesAsleep)) + stat_smooth(geom = "smooth") +labs(title="Time Asleep vs Time in Bed") + theme(legend.position = "bottom")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
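One way to pin down the group that sleeps much less than the time it spends in bed is to compute the minutes spent awake in bed for each record and flag those above a cut-off. In this sketch, the 60-minute threshold and the four records are hypothetical choices for illustration, not values derived from the data.

```r
# Hypothetical sleep records; the real ones are in sleepDay_merged
sleep <- data.frame(
  Id = c(1, 1, 2, 3),
  TotalTimeInBed     = c(480, 460, 700, 420),
  TotalMinutesAsleep = c(450, 440, 420, 400)
)

# Minutes spent in bed but not asleep
sleep$MinutesAwakeInBed <- sleep$TotalTimeInBed - sleep$TotalMinutesAsleep

# Hypothetical cut-off: flag nights with more than 60 minutes awake in bed
threshold <- 60
flagged <- sleep[sleep$MinutesAwakeInBed > threshold, ]
print(flagged)

mean(sleep$MinutesAwakeInBed > threshold)   # share of nights flagged: 0.25
length(unique(flagged$Id))                  # participants affected: 1
```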
The charts below show the minutes users spent at each activity level on each day of the week.
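With the dates parsed using the four-digit-year format %m/%d/%Y, the charts show more active minutes on Tuesdays and Wednesdays than on any other day of the week. Parsed with %m/%d/%y instead, these same bars are labeled Sunday and Monday, because the 2016 dates are read as 2020 dates and every weekday shifts by five days.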
ggplot(data = daily_activity, aes(x=Weekday, y = VeryActiveMinutes)) + geom_bar(stat="identity", col=terrain.colors(940),show.legend = TRUE) +labs(title="Very Active Minutes vs. Weekdays") + theme(legend.position = "bottom")
ggplot(data = daily_activity, aes(x=Weekday, y = FairlyActiveMinutes)) + geom_bar(stat="identity", col=terrain.colors(940),show.legend = TRUE) +labs(title="Fairly Active Minutes vs. Weekdays") + theme(legend.position = "bottom")
ggplot(data = daily_activity, aes(x=Weekday, y = LightlyActiveMinutes)) + geom_bar(stat="identity", col=terrain.colors(940),show.legend = TRUE) +labs(title="Lightly Active Minutes vs. Weekdays") + theme(legend.position = "bottom")
ggplot(data = daily_activity, aes(x=Weekday, y = SedentaryMinutes)) + geom_bar(stat="identity", col=terrain.colors(940),show.legend = TRUE) +labs(title="Sedentary Minutes vs. Weekdays") + theme(legend.position = "bottom")
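geom_bar(stat = "identity") draws the sum of all records for each weekday, so a weekday that occurs more often in the collection window, or that more participants logged, gets taller bars even if activity per day is unchanged (a 31-day window, for instance, contains three of the weekdays five times and the other four only four times). Averaging per record removes that effect. A base-R sketch on hypothetical data in which the totals tie but the averages differ:

```r
# Hypothetical records: Tuesday appears three times, Wednesday twice
activity <- data.frame(
  Weekday = c("Tuesday", "Tuesday", "Tuesday", "Wednesday", "Wednesday"),
  VeryActiveMinutes = c(20, 20, 20, 30, 30)
)

# What the bar charts show: the total per weekday
totals <- tapply(activity$VeryActiveMinutes, activity$Weekday, sum)
# Average per record, which does not reward days that occur more often
averages <- tapply(activity$VeryActiveMinutes, activity$Weekday, mean)
# Number of records behind each bar
counts <- table(activity$Weekday)

print(totals)     # Tuesday 60, Wednesday 60
print(averages)   # Tuesday 20, Wednesday 30
print(counts)     # Tuesday 3, Wednesday 2
```

In ggplot2, the per-record mean can be plotted directly by using stat_summary(fun = mean, geom = "bar") in place of geom_bar(stat = "identity").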
When the charts are combined (as shown below), it becomes clear that, apart from sedentary time, which takes up by far the largest share of the day, users spend more of their time in light activity than at any other activity level.
Activity2 %>%
pivot_longer(-WeekDay ) %>%
ggplot(aes(x = WeekDay, y = value, fill=name)) + geom_bar(stat="identity") + labs(title = "Minutes Vs Daily Activities")
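Because the stacked chart includes sedentary minutes (a mean of about 991 per day in the summary, against about 193 for light activity), it helps to express each activity level as a share of the recorded minutes, both with and without sedentary time. A sketch on three hypothetical days; the column names mirror those of Activity2, but the values are made up.

```r
# Hypothetical minutes for three days
minutes <- data.frame(
  Sedentary     = c(1000, 1100, 900),
  LightlyActive = c(200, 150, 250),
  FairlyActive  = c(10, 20, 15),
  VeryActive    = c(30, 0, 35)
)

# Average minutes per day at each level, then each level's share of the total
avg <- colMeans(minutes)
share <- avg / sum(avg)
round(100 * share, 1)

# The same comparison restricted to active (non-sedentary) minutes
active <- avg[names(avg) != "Sedentary"]
round(100 * active / sum(active), 1)
```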
The charts below show the distances covered on each weekday. With the dates parsed as %m/%d/%Y, Tuesday and Wednesday are the days on which the most distance was covered, which is consistent with the minutes-vs-weekday charts (see Activity Analysis).
ggplot(data = daily_intensity, aes(x=Weekday, y = VeryActiveDistance)) + geom_bar(stat="identity", col= terrain.colors(940)) +labs(title="Very Active Distance vs Weekday") + theme(legend.position = "top")
ggplot(data = daily_intensity, aes(x=Weekday, y = ModeratelyActiveDistance)) + geom_bar(stat="identity", col= terrain.colors(940)) +labs(title="Moderately Active Distance vs Weekday") + theme(legend.position = "top")
ggplot(data = daily_intensity, aes(x=Weekday, y = LightActiveDistance)) + geom_bar(stat="identity", col= terrain.colors(940)) +labs(title="Light Active Distance vs Weekday") + theme(legend.position = "top")
ggplot(data = daily_intensity, aes(x=Weekday, y = SedentaryActiveDistance)) + geom_bar(stat="identity", col= terrain.colors(940)) +labs(title="Sedentary Active Distance vs Weekday") + theme(legend.position = "top")
When the charts are combined, it becomes clear that users covered the most distance during light activity.
Activity %>%
pivot_longer(-WeekDay ) %>%
ggplot(aes(x = WeekDay, y = value, fill=name)) + geom_bar(stat="identity") + labs(title = "Distance Vs Daily Activities")
From the minutes and distance charts, we see that most users favor light activity.
On average, users spent 193 minutes per day in light activity and covered an average distance of 3.3 km during it.
mean(daily_activity$LightlyActiveMinutes)
## [1] 192.8128
mean(daily_intensity$LightActiveDistance)
## [1] 3.340819
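The totals quoted in the conclusions can be reproduced from the rounded column means in summary(dailyActivity_merged). Because the mean of a sum equals the sum of the means, adding the three active-minute means gives the average total active minutes per day; the variable names below are illustrative.

```r
# Rounded column means copied from summary(dailyActivity_merged)
very_active_min    <- 21.16
fairly_active_min  <- 13.56
lightly_active_min <- 192.8
light_distance     <- 3.341   # LightActiveDistance
total_distance     <- 5.490   # TotalDistance

total_active_min <- very_active_min + fairly_active_min + lightly_active_min
total_active_min                        # 227.52 minutes
lightly_active_min / total_active_min   # about 0.85 of active minutes
light_distance / total_distance         # about 0.61 of total distance
```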
The plots below show that steps taken and calories burned are moderately positively correlated (R = 0.56): days with more steps tend to have a higher calorie burn.
However, the correlation is far from perfect, so more steps do not always mean more calories burned, and other factors need to be taken into account.
ggscatter(dailyActivity_merged, x = "TotalSteps", y = "Calories", add = "reg.line", conf.int = TRUE,
cor.coef = TRUE, cor.method = "spearman")
## `geom_smooth()` using formula 'y ~ x'
ggplot(data = dailyActivity_merged, aes(x = TotalSteps, y = Calories)) + geom_point(colour = "purple") + stat_smooth(geom = "smooth", col=terrain.colors(80)) + scale_color_gradient(low="red", high="blue") +labs(title="Total Steps vs Calories Burnt") + theme(legend.position = "bottom")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
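To probe the "other factors" behind calorie burn, one option is to compare a linear model with steps alone against one that adds another daily column, such as VeryActiveMinutes, and see how much more variation it explains. The data below is simulated from a made-up formula purely to show the mechanics; it says nothing about the real Fitbit data.

```r
set.seed(42)
n <- 200
steps <- round(runif(n, 0, 20000))
very_active_min <- round(runif(n, 0, 120))
# Hypothetical generating process: calories depend on both predictors plus noise
calories <- 1500 + 0.05 * steps + 8 * very_active_min + rnorm(n, sd = 150)

demo <- data.frame(TotalSteps = steps, VeryActiveMinutes = very_active_min,
                   Calories = calories)

fit_steps <- lm(Calories ~ TotalSteps, data = demo)
fit_both  <- lm(Calories ~ TotalSteps + VeryActiveMinutes, data = demo)

summary(fit_steps)$r.squared
summary(fit_both)$r.squared   # higher: the added predictor explains more variation
anova(fit_steps, fit_both)     # F-test for the added term
```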
The plot below shows that, with the dates parsed as %m/%d/%Y, the most steps were taken on Tuesdays and Wednesdays. This is consistent with the daily_activity data, where most activity was also recorded on those days (see Activity Analysis).
ggplot(data = daily_steps, aes(x=Weekday, y = StepTotal)) +geom_bar(stat="identity", col= terrain.colors(940)) +labs(title="Daily Steps") + theme(legend.position = "top")
Most users get a good night's rest (a median of 433 minutes, about 7.2 hours, asleep per recorded night). See Sleep Analysis
Activity totals peak on Tuesdays and Wednesdays (with the dates parsed as %m/%d/%Y). See Activity Analysis
Most users can be classified as lightly active. In light activity, they spent an average of 193 of their roughly 228 active minutes per day and covered 3.3 km of an average total distance of 5.49 km. See Summary of the data
On average, users burn 2,304 kcal and take 7,638 steps per day.
While the majority of users get enough sleep, a small percentage sleep considerably less than the time they spend in bed. The company should integrate a bedtime reminder into the Leaf tracker or the watch, offer support for insomnia, or direct users to resources for expert assistance in the Bellabeat app.
Since the majority of users prefer light activities and are more active on some days than others, the Bellabeat app should recommend both light exercise routines covering every day of the week and more vigorous routines for a few days a week.
The Bellabeat app should also recommend activities that could increase calorie burn during light to vigorous activity.
The company should host yearly marathons or similar events, with entry open to anyone who owns or purchases a Bellabeat product.
If customers have any questions, they should be able to speak with someone. Contact details for support should be included with the products.