INTRODUCTION

Bellabeat is a high-tech company that manufactures health-focused smart products, including watches, mobile apps, a wellness tracker and a smart water bottle, and offers fully personalized guidance on nutrition, activity, sleep and health. Some of these products are connected to the Bellabeat app, which offers up-to-date data on users’ health and wellbeing. Urška Sršen, the CCO of Bellabeat, has asked the marketing analytics team to focus on one Bellabeat product and analyze smart device usage data to gain insight into how people are already using their smart devices, in the hope that the findings will help the business grow through effective marketing strategies.

STAGE 1: BUSINESS TASK

Analyze smart device usage data to learn how consumers use non-Bellabeat smart devices. Also, make suggestions for improving the company’s marketing strategy.

Data Source

We’ll use the FitBit Fitness Tracker Data from Kaggle for our analysis. This data set contains personal fitness tracker data from 30 Fitbit users who consented to the submission of information about their daily activity, steps, heart rate and sleep monitoring. It was collected through a third party, Amazon Mechanical Turk, between March and May 2016 and is in the public domain (Creative Commons CC0). We are free to use the data without running the risk of infringing any copyright law.

Is the data ROCCC?

For the analysis to run smoothly, we needed to use the right kind of data. Having identified the data source, we assessed its suitability by asking whether it is reliable, original, comprehensive, current and cited.

The data is comprehensive in terms of the variety of information it holds. It also comes from a credible third party and cites the original source (Furberg et al). However, its reliability is low because of potential sampling bias: the sample is small and contains no demographic data. Because our product is aimed at a specific demographic, women, any business decision based on this analysis should account for this unreliability. Furthermore, the data is six years old and may not adequately portray the behavior of users today.

STAGE 2: DATA COLLECTION

To prepare the data set, it was downloaded and saved locally in a folder called “Fitabase Data new”. We started by reviewing the data sources in Excel and deciding which ones to use for our analysis. The downloaded data set contains 18 CSV files, which include dailyActivity, dailyCalories, dailyIntensities and dailySteps, with the data subdivided into daily, hourly and minute levels and into wide and narrow formats. In addition to those, we also have heartrate_seconds, SleepDay and WeightLogInfo. After some consideration, we decided to focus on the data presented in the narrow format.

This stage involved downloading, storing and importing the data into the RStudio environment.

Defining the working directory

The setwd() function was used to set the working directory to the folder that holds the data files on the computer used for the analysis. This was necessary because the knitting process in RStudio would not complete without it.

setwd("C:/Users/USER/Documents/DATA ANALYTICS/vIDEOs/R/New R/Fitabase Data new")

Importing the data

The read.csv() function imports the data into the “Environment” pane of RStudio.

knitr::opts_chunk$set(echo = TRUE)
dailyActivity_merged<-read.csv("dailyActivity_merged.csv")
heartrate_seconds_merged<-read.csv("heartrate_seconds_merged.csv")
dailySteps_merged<-read.csv("dailySteps_merged.csv")
dailyCalories_merged<-read.csv("dailyCalories_merged.csv")
dailyIntensities_merged<-read.csv("dailyIntensities_merged.csv")
hourlySteps_merged<-read.csv("hourlySteps_merged.csv")
hourlyCalories_merged<-read.csv("hourlyCalories_merged.csv")
hourlyIntensities_merged<-read.csv("hourlyIntensities_merged.csv")
minuteMETsNarrow_merged<-read.csv("minuteMETsNarrow_merged.csv")
minuteSleep_merged<-read.csv("minuteSleep_merged.csv")
minuteStepsNarrow_merged<-read.csv("minuteStepsNarrow_merged.csv")
minuteCaloriesNarrow_merged<-read.csv("minuteCaloriesNarrow_merged.csv")
sleepDay_merged<-read.csv("SleepDay_merged.csv")
weightLogInfo_merged<-read.csv("weightLogInfo_merged.csv")
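
The individual read.csv() calls above can also be written as a loop over file names. The following is a minimal sketch, not the code used in this analysis: it writes two tiny made-up CSV files to a temporary folder (demo_dir and the file contents are assumptions for illustration only) and reads every file ending in _merged.csv into a named list.

```r
# Hypothetical demo folder with two tiny CSV files standing in for the Fitabase exports.
demo_dir <- file.path(tempdir(), "fitabase_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(Id = c(1, 2), StepTotal = c(100, 200)),
          file.path(demo_dir, "dailySteps_merged.csv"), row.names = FALSE)
write.csv(data.frame(Id = c(1, 2), Calories = c(1800, 2100)),
          file.path(demo_dir, "dailyCalories_merged.csv"), row.names = FALSE)

# Find every merged CSV in the folder and read each one into a list element.
csv_paths <- list.files(demo_dir, pattern = "_merged\\.csv$", full.names = TRUE)
tables <- lapply(csv_paths, read.csv)

# Name each element after its file, without the .csv extension.
names(tables) <- tools::file_path_sans_ext(basename(csv_paths))

names(tables)
```

Keeping the tables in one named list means later checks can be applied to all of them with a single sapply() call instead of one line per table.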

STAGE 3: DATA CLEANING

At this stage, we had to carefully examine the data to determine which was best suited for our analysis. To accomplish this, we had to first install and load several R packages.

Loading the necessary packages

We had to install and load, as needed, some packages that would ensure the analysis ran smoothly.

Note: We used the {r, results=FALSE} chunk option to prevent some output from being displayed.

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.7     ✔ stringr 1.4.0
## ✔ tidyr   1.2.0     ✔ forcats 0.5.1
## ✔ readr   2.1.2

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

library(ggplot2)
library(ggpubr)

## Warning: package 'ggpubr' was built under R version 4.2.1

library(data.table)

## 
## Attaching package: 'data.table'

## The following object is masked from 'package:purrr':
## 
##     transpose

## The following objects are masked from 'package:dplyr':
## 
##     between, first, last

library(tinytex)

## Warning: package 'tinytex' was built under R version 4.2.1

options("max.print"=3000)

Inspecting the data

We used various functions to get information like number of missing values, number of columns, summary of data etc.

The sapply(list(…), …) call lets us view the column names and internal structure of each table by applying the functions glimpse(), str() and colnames() to every table in the list.

Note: The {r, results=FALSE} chunk option was used to prevent the output from being displayed.

sapply(list(dailyActivity_merged, dailyCalories_merged, dailyIntensities_merged, dailySteps_merged, heartrate_seconds_merged, 
            hourlyCalories_merged, hourlyIntensities_merged, hourlySteps_merged, sleepDay_merged, weightLogInfo_merged), glimpse)

sapply(list(dailyActivity_merged, dailyCalories_merged, dailyIntensities_merged, dailySteps_merged, heartrate_seconds_merged, 
            hourlyCalories_merged, hourlyIntensities_merged, hourlySteps_merged, sleepDay_merged, weightLogInfo_merged), str)

sapply(list(dailyActivity_merged, dailyCalories_merged, dailyIntensities_merged, dailySteps_merged, heartrate_seconds_merged, 
            hourlyCalories_merged, hourlyIntensities_merged, hourlySteps_merged,  sleepDay_merged, weightLogInfo_merged), colnames)

The sum(is.na()) call returns the number of missing values in each table. The results reveal only 65 missing values, all of them in the weightLogInfo data.

sum(is.na(dailyActivity_merged))

## [1] 0

sum(is.na(dailyIntensities_merged))

## [1] 0

sum(is.na(dailyCalories_merged))

## [1] 0

sum(is.na(dailySteps_merged))

## [1] 0

sum(is.na(heartrate_seconds_merged))

## [1] 0

sum(is.na(hourlyCalories_merged))

## [1] 0

sum(is.na(hourlyIntensities_merged))

## [1] 0

sum(is.na(hourlySteps_merged))

## [1] 0

sum(is.na(weightLogInfo_merged))

## [1] 65
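
sum(is.na()) gives one total per table. To see which column holds the gaps, the count can be taken per column instead. The following is a minimal sketch on a toy table; weight_demo and its values are made up for illustration and only mimic the shape of weightLogInfo_merged.

```r
# Toy stand-in for weightLogInfo_merged; the values are made up for illustration.
weight_demo <- data.frame(
  Id       = c(1, 1, 2, 3),
  WeightKg = c(52.6, 52.8, 61.4, 85.1),
  Fat      = c(22, NA, NA, NA)
)

# Count missing values column by column instead of for the whole table.
na_per_column <- colSums(is.na(weight_demo))
na_per_column
```

In the real data, the summary() output for weightLogInfo_merged shows the 65 missing values under the Fat column.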

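The sum(is.null()) calls return 0 for every table. Note that is.null() tests whether the object itself is NULL rather than inspecting individual cells, so this result confirms that each table was loaded; missing cells are counted by the is.na() check above.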

sum(is.null(dailyActivity_merged))

## [1] 0

sum(is.null(dailyIntensities_merged))

## [1] 0

sum(is.null(dailyCalories_merged))

## [1] 0

sum(is.null(dailySteps_merged))

## [1] 0

sum(is.null(heartrate_seconds_merged))

## [1] 0

sum(is.null(hourlyCalories_merged))

## [1] 0

sum(is.null(hourlyIntensities_merged))

## [1] 0

sum(is.null(hourlySteps_merged))

## [1] 0

sum(is.null(sleepDay_merged))

## [1] 0

sum(is.null(weightLogInfo_merged))

## [1] 0

Piping each table into distinct(Id) lists its unique participant IDs, which lets us count the participants who contributed to each table. Only 8 participants appear in the weightLogInfo data and only 14 in the heartrate_seconds data.

dailyActivity_merged %>% distinct(Id)
dailyCalories_merged %>% distinct(Id)
dailyIntensities_merged %>% distinct(Id)
dailySteps_merged %>% distinct(Id)
heartrate_seconds_merged %>% distinct(Id)
hourlyCalories_merged %>% distinct(Id)
hourlyIntensities_merged %>% distinct(Id)
hourlySteps_merged %>% distinct(Id)
sleepDay_merged %>% distinct(Id)
weightLogInfo_merged %>% distinct(Id)

The nrow() function counts the number of rows. The weightLogInfo_merged data has significantly fewer rows than the other tables.

nrow(dailyActivity_merged)

## [1] 940

nrow(dailyCalories_merged)

## [1] 940

nrow(dailyIntensities_merged)

## [1] 940

nrow(dailySteps_merged)

## [1] 940

nrow(heartrate_seconds_merged)

## [1] 2483658

nrow(hourlyCalories_merged)

## [1] 22099

nrow(hourlyIntensities_merged)

## [1] 22099

nrow(hourlySteps_merged)

## [1] 22099

nrow(sleepDay_merged)

## [1] 413

nrow(weightLogInfo_merged)

## [1] 67
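
The row counts, participant counts and missing-value counts can also be collected in one overview table. This is a minimal sketch under assumptions: the list tables and the toy values in it are made up for illustration, and base R's length(unique()) is used in place of distinct().

```r
# Toy stand-ins for two of the imported tables; values are made up for illustration.
tables <- list(
  sleep  = data.frame(Id = c(1, 1, 2, 3), TotalMinutesAsleep = c(400, 420, 380, 450)),
  weight = data.frame(Id = c(1, 1), WeightKg = c(52.6, 52.8))
)

# One row per table: number of rows, distinct participants, and missing cells.
overview <- data.frame(
  rows         = sapply(tables, nrow),
  participants = sapply(tables, function(d) length(unique(d$Id))),
  missing      = sapply(tables, function(d) sum(is.na(d)))
)
overview
```

Seeing the three counts side by side makes thin tables, such as one with few rows from very few participants, easy to spot.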

Summary of the data

The summary() function summarizes the key statistics of each table.

summary(dailyActivity_merged)

##        Id            ActivityDate         TotalSteps    TotalDistance   
##  Min.   :1.504e+09   Length:940         Min.   :    0   Min.   : 0.000  
##  1st Qu.:2.320e+09   Class :character   1st Qu.: 3790   1st Qu.: 2.620  
##  Median :4.445e+09   Mode  :character   Median : 7406   Median : 5.245  
##  Mean   :4.855e+09                      Mean   : 7638   Mean   : 5.490  
##  3rd Qu.:6.962e+09                      3rd Qu.:10727   3rd Qu.: 7.713  
##  Max.   :8.878e+09                      Max.   :36019   Max.   :28.030  
##  TrackerDistance  LoggedActivitiesDistance VeryActiveDistance
##  Min.   : 0.000   Min.   :0.0000           Min.   : 0.000    
##  1st Qu.: 2.620   1st Qu.:0.0000           1st Qu.: 0.000    
##  Median : 5.245   Median :0.0000           Median : 0.210    
##  Mean   : 5.475   Mean   :0.1082           Mean   : 1.503    
##  3rd Qu.: 7.710   3rd Qu.:0.0000           3rd Qu.: 2.053    
##  Max.   :28.030   Max.   :4.9421           Max.   :21.920    
##  ModeratelyActiveDistance LightActiveDistance SedentaryActiveDistance
##  Min.   :0.0000           Min.   : 0.000      Min.   :0.000000       
##  1st Qu.:0.0000           1st Qu.: 1.945      1st Qu.:0.000000       
##  Median :0.2400           Median : 3.365      Median :0.000000       
##  Mean   :0.5675           Mean   : 3.341      Mean   :0.001606       
##  3rd Qu.:0.8000           3rd Qu.: 4.782      3rd Qu.:0.000000       
##  Max.   :6.4800           Max.   :10.710      Max.   :0.110000       
##  VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes
##  Min.   :  0.00    Min.   :  0.00      Min.   :  0.0        Min.   :   0.0  
##  1st Qu.:  0.00    1st Qu.:  0.00      1st Qu.:127.0        1st Qu.: 729.8  
##  Median :  4.00    Median :  6.00      Median :199.0        Median :1057.5  
##  Mean   : 21.16    Mean   : 13.56      Mean   :192.8        Mean   : 991.2  
##  3rd Qu.: 32.00    3rd Qu.: 19.00      3rd Qu.:264.0        3rd Qu.:1229.5  
##  Max.   :210.00    Max.   :143.00      Max.   :518.0        Max.   :1440.0  
##     Calories   
##  Min.   :   0  
##  1st Qu.:1828  
##  Median :2134  
##  Mean   :2304  
##  3rd Qu.:2793  
##  Max.   :4900

summary(dailyCalories_merged)

##        Id            ActivityDay           Calories   
##  Min.   :1.504e+09   Length:940         Min.   :   0  
##  1st Qu.:2.320e+09   Class :character   1st Qu.:1828  
##  Median :4.445e+09   Mode  :character   Median :2134  
##  Mean   :4.855e+09                      Mean   :2304  
##  3rd Qu.:6.962e+09                      3rd Qu.:2793  
##  Max.   :8.878e+09                      Max.   :4900

summary(dailyIntensities_merged)

##        Id            ActivityDay        SedentaryMinutes LightlyActiveMinutes
##  Min.   :1.504e+09   Length:940         Min.   :   0.0   Min.   :  0.0       
##  1st Qu.:2.320e+09   Class :character   1st Qu.: 729.8   1st Qu.:127.0       
##  Median :4.445e+09   Mode  :character   Median :1057.5   Median :199.0       
##  Mean   :4.855e+09                      Mean   : 991.2   Mean   :192.8       
##  3rd Qu.:6.962e+09                      3rd Qu.:1229.5   3rd Qu.:264.0       
##  Max.   :8.878e+09                      Max.   :1440.0   Max.   :518.0       
##  FairlyActiveMinutes VeryActiveMinutes SedentaryActiveDistance
##  Min.   :  0.00      Min.   :  0.00    Min.   :0.000000       
##  1st Qu.:  0.00      1st Qu.:  0.00    1st Qu.:0.000000       
##  Median :  6.00      Median :  4.00    Median :0.000000       
##  Mean   : 13.56      Mean   : 21.16    Mean   :0.001606       
##  3rd Qu.: 19.00      3rd Qu.: 32.00    3rd Qu.:0.000000       
##  Max.   :143.00      Max.   :210.00    Max.   :0.110000       
##  LightActiveDistance ModeratelyActiveDistance VeryActiveDistance
##  Min.   : 0.000      Min.   :0.0000           Min.   : 0.000    
##  1st Qu.: 1.945      1st Qu.:0.0000           1st Qu.: 0.000    
##  Median : 3.365      Median :0.2400           Median : 0.210    
##  Mean   : 3.341      Mean   :0.5675           Mean   : 1.503    
##  3rd Qu.: 4.782      3rd Qu.:0.8000           3rd Qu.: 2.053    
##  Max.   :10.710      Max.   :6.4800           Max.   :21.920

summary(dailySteps_merged)

##        Id            ActivityDay          StepTotal    
##  Min.   :1.504e+09   Length:940         Min.   :    0  
##  1st Qu.:2.320e+09   Class :character   1st Qu.: 3790  
##  Median :4.445e+09   Mode  :character   Median : 7406  
##  Mean   :4.855e+09                      Mean   : 7638  
##  3rd Qu.:6.962e+09                      3rd Qu.:10727  
##  Max.   :8.878e+09                      Max.   :36019

summary(heartrate_seconds_merged)

##        Id                Time               Value       
##  Min.   :2.022e+09   Length:2483658     Min.   : 36.00  
##  1st Qu.:4.388e+09   Class :character   1st Qu.: 63.00  
##  Median :5.554e+09   Mode  :character   Median : 73.00  
##  Mean   :5.514e+09                      Mean   : 77.33  
##  3rd Qu.:6.962e+09                      3rd Qu.: 88.00  
##  Max.   :8.878e+09                      Max.   :203.00

summary(hourlyCalories_merged)

##        Id            ActivityHour          Calories     
##  Min.   :1.504e+09   Length:22099       Min.   : 42.00  
##  1st Qu.:2.320e+09   Class :character   1st Qu.: 63.00  
##  Median :4.445e+09   Mode  :character   Median : 83.00  
##  Mean   :4.848e+09                      Mean   : 97.39  
##  3rd Qu.:6.962e+09                      3rd Qu.:108.00  
##  Max.   :8.878e+09                      Max.   :948.00

summary(hourlyIntensities_merged)

##        Id            ActivityHour       TotalIntensity   AverageIntensity
##  Min.   :1.504e+09   Length:22099       Min.   :  0.00   Min.   :0.0000  
##  1st Qu.:2.320e+09   Class :character   1st Qu.:  0.00   1st Qu.:0.0000  
##  Median :4.445e+09   Mode  :character   Median :  3.00   Median :0.0500  
##  Mean   :4.848e+09                      Mean   : 12.04   Mean   :0.2006  
##  3rd Qu.:6.962e+09                      3rd Qu.: 16.00   3rd Qu.:0.2667  
##  Max.   :8.878e+09                      Max.   :180.00   Max.   :3.0000

summary(hourlySteps_merged)

##        Id            ActivityHour         StepTotal      
##  Min.   :1.504e+09   Length:22099       Min.   :    0.0  
##  1st Qu.:2.320e+09   Class :character   1st Qu.:    0.0  
##  Median :4.445e+09   Mode  :character   Median :   40.0  
##  Mean   :4.848e+09                      Mean   :  320.2  
##  3rd Qu.:6.962e+09                      3rd Qu.:  357.0  
##  Max.   :8.878e+09                      Max.   :10554.0

summary(sleepDay_merged)

##        Id              SleepDay         TotalSleepRecords TotalMinutesAsleep
##  Min.   :1.504e+09   Length:413         Min.   :1.000     Min.   : 58.0     
##  1st Qu.:3.977e+09   Class :character   1st Qu.:1.000     1st Qu.:361.0     
##  Median :4.703e+09   Mode  :character   Median :1.000     Median :433.0     
##  Mean   :5.001e+09                      Mean   :1.119     Mean   :419.5     
##  3rd Qu.:6.962e+09                      3rd Qu.:1.000     3rd Qu.:490.0     
##  Max.   :8.792e+09                      Max.   :3.000     Max.   :796.0     
##  TotalTimeInBed 
##  Min.   : 61.0  
##  1st Qu.:403.0  
##  Median :463.0  
##  Mean   :458.6  
##  3rd Qu.:526.0  
##  Max.   :961.0

summary(weightLogInfo_merged)

##        Id                Date              WeightKg       WeightPounds  
##  Min.   :1.504e+09   Length:67          Min.   : 52.60   Min.   :116.0  
##  1st Qu.:6.962e+09   Class :character   1st Qu.: 61.40   1st Qu.:135.4  
##  Median :6.962e+09   Mode  :character   Median : 62.50   Median :137.8  
##  Mean   :7.009e+09                      Mean   : 72.04   Mean   :158.8  
##  3rd Qu.:8.878e+09                      3rd Qu.: 85.05   3rd Qu.:187.5  
##  Max.   :8.878e+09                      Max.   :133.50   Max.   :294.3  
##                                                                         
##       Fat             BMI        IsManualReport         LogId          
##  Min.   :22.00   Min.   :21.45   Length:67          Min.   :1.460e+12  
##  1st Qu.:22.75   1st Qu.:23.96   Class :character   1st Qu.:1.461e+12  
##  Median :23.50   Median :24.39   Mode  :character   Median :1.462e+12  
##  Mean   :23.50   Mean   :25.19                      Mean   :1.462e+12  
##  3rd Qu.:24.25   3rd Qu.:25.56                      3rd Qu.:1.462e+12  
##  Max.   :25.00   Max.   :47.54                      Max.   :1.463e+12  
##  NA's   :65

Data Selection

After inspecting the data, we decided to leave out the weightLogInfo and heartrate_seconds data. These tables were deemed insufficient to provide useful insight. We also narrowed our focus to data recorded on a daily and hourly basis.

STAGE 4: ANALYSING THE DATA

We needed to compare some of the data against the days of the week. We used the mutate(), weekdays(), format(), as.Date() and as.numeric() functions to attach “Weekday” columns to the selected tables, and renamed the tables accordingly.

In some other cases, we needed to create entirely new tables with data.table(). See New tables.

# Parse the date text once (assumed to look like "4/12/2016", hence %Y for the four-digit year),
# then derive the weekday name and the ISO weekday number (1 = Monday) from the parsed date.
daily_activity <- dailyActivity_merged %>%
  mutate(ActivityDate = as.Date(ActivityDate, "%m/%d/%Y"),
         Weekday = weekdays(ActivityDate),
         WeekdayNum = as.numeric(format(ActivityDate, "%u")))

daily_steps <- dailySteps_merged %>%
  mutate(ActivityDay = as.Date(ActivityDay, "%m/%d/%Y"),
         Weekday = weekdays(ActivityDay),
         WeekdayNum = as.numeric(format(ActivityDay, "%u")))

daily_intensity <- dailyIntensities_merged %>%
  mutate(ActivityDay = as.Date(ActivityDay, "%m/%d/%Y"),
         Weekday = weekdays(ActivityDay),
         WeekdayNum = as.numeric(format(ActivityDay, "%u")))
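
The weekday labels depend entirely on the format string passed to as.Date(). The following is a minimal check, assuming the date columns hold text such as "4/12/2016" (an assumption, since the raw values are not shown in this report). It compares the %Y and %y conversion codes on that one sample value.

```r
# Assumed sample value in month/day/four-digit-year form.
raw_date <- "4/12/2016"

# %Y reads the full four-digit year.
parsed_full <- as.Date(raw_date, format = "%m/%d/%Y")

# %y reads only two digits ("20"), so the year becomes 2020.
parsed_short <- as.Date(raw_date, format = "%m/%d/%y")

# %u gives the ISO weekday number (1 = Monday ... 7 = Sunday), independent of locale.
format(parsed_full, "%Y-%m-%d")   # "2016-04-12"
format(parsed_full, "%u")         # "2" (Tuesday)
format(parsed_short, "%Y-%m-%d")  # "2020-04-12"
format(parsed_short, "%u")        # "7" (Sunday)
```

Because the two format strings place the same text on different weekdays, it is worth printing a few parsed dates before building any weekday chart.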

New tables

Intensity<-data.table(LightActive=daily_intensity$LightActiveDistance, ModerateActive=daily_intensity$ModeratelyActiveDistance, VeryActive=daily_intensity$VeryActiveDistance, WeekDay=daily_intensity$Weekday)

Activity<-data.table(LightActive=daily_activity$LightActiveDistance, ModeratelyActive=daily_activity$ModeratelyActiveDistance, VeryActive=daily_activity$VeryActiveDistance, WeekDay=daily_activity$Weekday)

Activity2<-data.table(Sedentary=daily_activity$SedentaryMinutes, FairlyActive=daily_activity$FairlyActiveMinutes, LightlyActive=daily_activity$LightlyActiveMinutes, VeryActive=daily_activity$VeryActiveMinutes, WeekDay=daily_activity$Weekday)

Sleep Analysis

We used the ggscatter() function to compare the Total Time in Bed and Total Minutes Asleep attributes in the sleepDay_merged data (see Importing the data). From the plot, we see a strong positive correlation (R = 0.92) between time in bed and time asleep, which suggests that most participants are asleep for most of the time they spend in bed.

ggscatter(sleepDay_merged, x = "TotalTimeInBed", y = "TotalMinutesAsleep", add = "reg.line", conf.int = TRUE, 
          cor.coef = TRUE, cor.method = "spearman")

## `geom_smooth()` using formula 'y ~ x'

However, a small group (with time in bed between 250 and 875 minutes) slept noticeably less than the time they spent in bed, which may indicate some difficulty sleeping. With more data, the root causes of this sleeplessness could be identified.

ggplot(data = sleepDay_merged, aes(x = TotalTimeInBed, y = TotalMinutesAsleep)) +  geom_point(aes(color =TotalMinutesAsleep)) + stat_smooth(geom = "smooth") +labs(title="Time Asleep vs Time in Bed") + theme(legend.position = "bottom")

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
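
The gap between time in bed and time asleep can be expressed as a single ratio per record. The following is a minimal sketch on toy values: sleep_demo, the SleepEfficiency and LowEfficiency columns, and the 85% cut-off are all assumptions introduced for illustration, not part of the original analysis.

```r
# Toy stand-in for sleepDay_merged; values are made up for illustration.
sleep_demo <- data.frame(
  TotalMinutesAsleep = c(400, 300, 450),
  TotalTimeInBed     = c(440, 500, 470)
)

# Share of the time in bed that was actually spent asleep.
sleep_demo$SleepEfficiency <- sleep_demo$TotalMinutesAsleep / sleep_demo$TotalTimeInBed

# Flag records below a hypothetical 85% threshold (the cut-off is an assumption).
sleep_demo$LowEfficiency <- sleep_demo$SleepEfficiency < 0.85
sleep_demo
```

A ratio like this would make it possible to count how many records fall into the low-sleep group rather than reading it off the scatter plot.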

Activity Analysis

The charts below show the minutes users spent at each activity level on each day of the week. Each bar is the total over all users and all recorded days.

In these charts, more active minutes are recorded on Sundays and Mondays than on any other day of the week.

ggplot(data = daily_activity, aes(x=Weekday, y = VeryActiveMinutes)) +  geom_bar(stat="identity", col=terrain.colors(940),show.legend = TRUE) +labs(title="Very Active Minutes vs. Weekdays") + theme(legend.position = "bottom")

ggplot(data = daily_activity, aes(x=Weekday, y = FairlyActiveMinutes)) +  geom_bar(stat="identity", col=terrain.colors(940),show.legend = TRUE) +labs(title="Fairly Active Minutes vs. Weekdays") + theme(legend.position = "bottom")

ggplot(data = daily_activity, aes(x=Weekday, y = LightlyActiveMinutes)) +  geom_bar(stat="identity", col=terrain.colors(940),show.legend = TRUE) +labs(title="Lightly Active Minutes vs. Weekdays") + theme(legend.position = "bottom")

ggplot(data = daily_activity, aes(x=Weekday, y = SedentaryMinutes)) +  geom_bar(stat="identity", col=terrain.colors(940),show.legend = TRUE) +labs(title="Sedentary Minutes vs. Weekdays") + theme(legend.position = "bottom")
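
geom_bar(stat = "identity") stacks one segment per row, so each bar shows the total over all records for that weekday, and a weekday with more records can look more active for that reason alone. The following is a minimal sketch on toy values (activity_demo is made up for illustration) that compares totals with means and keeps the weekdays in calendar order.

```r
# Toy stand-in for daily_activity; values are made up for illustration.
activity_demo <- data.frame(
  Weekday           = c("Monday", "Monday", "Monday", "Tuesday", "Tuesday"),
  VeryActiveMinutes = c(10, 20, 30, 40, 20)
)

# Keep calendar order on the axis instead of alphabetical order.
day_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")
activity_demo$Weekday <- factor(activity_demo$Weekday, levels = day_levels)

# Totals grow with the number of records; means do not.
totals <- aggregate(VeryActiveMinutes ~ Weekday, data = activity_demo, FUN = sum)
means  <- aggregate(VeryActiveMinutes ~ Weekday, data = activity_demo, FUN = mean)
totals
means
```

In this toy example both days have the same total, but the mean per record is higher on Tuesday, which is the kind of difference a chart of totals hides.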

When the charts are combined (as shown below), it becomes clear that, apart from sedentary time, people spend a greater share of their time in light activity than in any other type of activity.

Activity2 %>% 
  pivot_longer(-WeekDay ) %>% 
  ggplot(aes(x = WeekDay, y = value, fill=name)) +  geom_bar(stat="identity") + labs(title = "Minutes Vs Daily Activities")

The charts below show the distances users covered on each day of the week. They show that the greatest distances were traveled on Sundays and Mondays, which is consistent with the “minutes vs weekday” charts (see Activity Analysis).

ggplot(data = daily_intensity, aes(x=Weekday, y = VeryActiveDistance)) +  geom_bar(stat="identity", col= terrain.colors(940)) +labs(title="Very Active Distance vs Weekday") + theme(legend.position = "top")

ggplot(data = daily_intensity, aes(x=Weekday, y = ModeratelyActiveDistance)) +  geom_bar(stat="identity", col= terrain.colors(940)) +labs(title="Moderately Active Distance vs Weekday") + theme(legend.position = "top")

ggplot(data = daily_intensity, aes(x=Weekday, y = LightActiveDistance)) +  geom_bar(stat="identity", col= terrain.colors(940)) +labs(title="Light Active Distance vs Weekday") + theme(legend.position = "top")

ggplot(data = daily_intensity, aes(x=Weekday, y = SedentaryActiveDistance)) +  geom_bar(stat="identity", col= terrain.colors(940)) +labs(title="Sedentary Active Distance vs Weekday") + theme(legend.position = "top")

When the charts are combined, it becomes clear that users covered the most distance while engaged in light activity.

Activity %>% 
  pivot_longer(-WeekDay ) %>% 
  ggplot(aes(x = WeekDay, y = value, fill=name)) +  geom_bar(stat="identity") + labs(title = "Distance Vs Daily Activities")

From the minutes and distance charts, we see that most users prefer light activity.

During light activity, users spent an average of 193 minutes per day and covered an average distance of 3.3 km per day.

mean(daily_activity$LightlyActiveMinutes)

## [1] 192.8128

mean(daily_intensity$LightActiveDistance)

## [1] 3.340819
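
The share of active time spent in light activity follows from the means reported in Summary of the data. The following is a small worked calculation; the variable names are introduced here for illustration, and the three values are the mean minutes shown in the summary(dailyActivity_merged) output and the mean() result above.

```r
# Mean daily minutes by intensity, taken from the summary output in this report.
very_active    <- 21.16
fairly_active  <- 13.56
lightly_active <- 192.8128

# Total active minutes and the share contributed by light activity.
total_active <- very_active + fairly_active + lightly_active
light_share  <- lightly_active / total_active

round(total_active, 1)      # 227.5
round(light_share * 100)    # 85
```

On these figures, light activity makes up roughly 85% of the average active minutes per day.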

Calories Analysis

The plots below show that steps taken and calories burned are moderately positively correlated (R = 0.56). This indicates that taking more steps did, in some cases, go together with burning more calories.

It also implies that more steps do not always lead to more calories burned. Other factors need to be taken into consideration.

ggscatter(dailyActivity_merged, x = "TotalSteps", y = "Calories", add = "reg.line", conf.int = TRUE, 
          cor.coef = TRUE, cor.method = "spearman")

## `geom_smooth()` using formula 'y ~ x'

ggplot(data = dailyActivity_merged, aes(x = TotalSteps, y = Calories)) +  geom_point(colour = "purple") + stat_smooth(geom = "smooth", col=terrain.colors(80)) + scale_color_gradient(low="red", high="blue") +labs(title="Total Steps vs Calories Burnt") + theme(legend.position = "bottom")

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
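
The coefficient printed on the scatter plot is a Spearman rank correlation, which can also be computed directly with cor(). The following is a minimal sketch on four made-up value pairs (steps and calories here are toy vectors, not the real columns) so the result can be checked by hand.

```r
# Toy values standing in for TotalSteps and Calories; made up for illustration.
steps    <- c(2000, 5000, 8000, 12000)
calories <- c(1900, 1800, 2600, 2400)

# Spearman's rho compares the rank order of the two variables.
rho <- cor(steps, calories, method = "spearman")
rho  # 0.6
```

Here the calorie ranks are 2, 1, 4, 3 against step ranks 1, 2, 3, 4, so the squared rank differences sum to 4 and rho = 1 - 6 * 4 / (4 * 15) = 0.6.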

The plot below shows that more steps were taken on Mondays and Sundays. This is consistent with the observation in the daily_activity data, where most activity was recorded on Sundays and Mondays. See Activity Analysis.

ggplot(data = daily_steps, aes(x=Weekday, y = StepTotal)) +geom_bar(stat="identity", col= terrain.colors(940)) +labs(title="Daily Steps") + theme(legend.position = "top")

SUMMARY OF ANALYSIS FINDINGS

Most users get a good night’s rest. See Sleep Analysis.
Most users are active on Sundays and Mondays. See Activity Analysis.
Most users can be classified as lightly active. Light activity accounts for an average of 193 of about 228 active minutes per day, and for 3.3 km of an average total distance of 5.49 km. See Summary of the data.
Generally, users burn an average of 2304 kcal per day while taking an average of 7638 steps.

RECOMMENDATIONS

While the majority of users get enough sleep, a small percentage seem to sleep less than the time they spend in bed. The company should integrate a bedtime reminder in the “Leaf” or watch, offer treatments for insomnia, or direct users to resources for expert assistance in the Bellabeat app.
Since the majority of users prefer “light” activities and favor some days over others, the Bellabeat app should recommend both light exercise routines that cover every day of the week and hard exercise routines that take up a few days a week.
The Bellabeat app should also recommend activities that could increase calorie burn during light to vigorous activity.
The company should host yearly marathons or similar events, with the entry fee being the possession or purchase of any Bellabeat product.
Customers who have questions should be given the chance to speak with someone. The relevant contact details should be included with the products.

Google Data Analytics Capstone Project: Bellabeat Case Study

Chinemerem Okpara C

2022-08-28