ASK:

In this case study, Bellabeat asks us to identify trends in how its customers use their smart devices. Bellabeat is a fitness device company with three devices: Leaf, Time, and Spring. It also offers a subscription-based membership that gives customers fully guided support for nutrition, activity, sleep, health, and beauty based on their lifestyle and goals.

Goals:

Bellabeat's stakeholders have asked three major questions:

  1. What are some visible trends in their smart device usage?

  2. How could these trends apply to Bellabeat Customers?

  3. And how could these trends help influence Bellabeat’s future marketing strategy?

Important People:

  • Urska Srsen (Cofounder and Chief Creative Officer of Bellabeat)
  • Sando Mur (Cofounder and Mathematician of Bellabeat)
  • Bellabeat’s Marketing Team

PREPARE:

The data comes from Kaggle and consists of 18 CSV files. Some of the files duplicate information, with one copy saved in long format and another in wide format. The dataset is licensed as CC0: Public Domain.

About the Data:

Although the Kaggle page did not include a description of the dataset, a data dictionary was available from an outside resource (Fitabase Fitbit Data Dictionary as of 2:14:24).

The dataset contains personal fitness information that is tracked by the users' fitness devices (Fitbit trackers, according to the data dictionary) or entered manually by the verified users. The dataset is said to cover 30 verified users, although the daily activity file loaded below contains 33 distinct IDs; their age, name, and sex are not provided. All of the users consented to the submission of their tracked data, which includes heart rate, active minutes, activity intensity, sleep measurements, steps, weight, and MET. Each user is identified by a unique numeric ID.

Limitations:

Because of the small sample size of about 30 users and the omission of some of their personal information, we cannot claim that the following analysis is fully representative. For that, we would need a larger pool of data to analyze.


PROCESS:

First, I loaded the necessary libraries and then read the CSV files I needed into R.

library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.3.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(tidyr)
library(readr)
library(lubridate)

dailyActivity <- read_csv("C:/Users/yosup/Downloads/archive (3)/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")
## Rows: 940 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): ActivityDate
## dbl (14): Id, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDi...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
weightLog <- read_csv("C:/Users/yosup/Downloads/archive (3)/Fitabase Data 4.12.16-5.12.16/weightLogInfo_merged.csv")
## Rows: 67 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Date
## dbl (6): Id, WeightKg, WeightPounds, Fat, BMI, LogId
## lgl (1): IsManualReport
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
sleepDay <- read_csv("C:/Users/yosup/Downloads/archive (3)/Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")
## Rows: 413 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): SleepDay
## dbl (4): Id, TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
heartrate_seconds <- read_csv("C:/Users/yosup/Downloads/archive (3)/Fitabase Data 4.12.16-5.12.16/heartrate_seconds_merged.csv")
## Rows: 2483658 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Time
## dbl (2): Id, Value
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
n_distinct(dailyActivity$Id)
## [1] 33
n_distinct(weightLog$Id)
## [1] 8
length(which(weightLog$IsManualReport==TRUE))
## [1] 41
length(weightLog$IsManualReport)
## [1] 67
sum(is.na(weightLog$Fat))
## [1] 65
date_range_sleepDay <- range(sleepDay$SleepDay)
date_range_sleepDay
## [1] "4/12/2016 12:00:00 AM" "5/9/2016 12:00:00 AM"
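One caveat on the range above: SleepDay was read as text (chr), so range() compares the strings alphabetically rather than chronologically, and a date such as 5/12/2016 sorts before 5/9/2016. The folder name suggests the data runs to 5/12/2016, so the upper bound shown may be too early. A minimal sketch of parsing the values first, using a few made-up strings in the same format (the vector `sample_days` is hypothetical and not taken from the file):

```r
library(lubridate)

# Hypothetical sample values in the same "m/d/Y h:m:s AM/PM" format as SleepDay
sample_days <- c("4/12/2016 12:00:00 AM",
                 "5/9/2016 12:00:00 AM",
                 "5/12/2016 12:00:00 AM")

# Text comparison: "5/9/..." sorts after "5/12/...", so it is reported as the maximum
text_range <- range(sample_days)

# Parse to date-times first, then take the range chronologically
parsed_days <- mdy_hms(sample_days)
date_range <- range(parsed_days)

text_range
date_range
```

The same idea would apply to the real column, for example `range(mdy_hms(sleepDay$SleepDay))`, assuming every value follows this format.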
sdEntries <- sleepDay %>% count(Id)
sd(sdEntries$n)
## [1] 11.49661
mean(sleepDay$TotalMinutesAsleep)
## [1] 419.4673
hist(sleepDay$TotalMinutesAsleep)

mean(sleepDay$TotalTimeInBed)
## [1] 458.6392
hist(sleepDay$TotalTimeInBed)

sleepDay <- transform(sleepDay, TimeDiff = abs(sleepDay$TotalMinutesAsleep - sleepDay$TotalTimeInBed))
range(heartrate_seconds$Time)
## [1] "4/12/2016 1:00:00 AM" "5/9/2016 9:59:59 PM"
hist(heartrate_seconds$Value)

summary(heartrate_seconds$Value)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   36.00   63.00   73.00   77.33   88.00  203.00
sd(heartrate_seconds$Value)
## [1] 19.4045
# per-user maximum and minimum of Time and Value, plus the number of readings
# (Time is still text here, so its max/min are alphabetical, not chronological)
max_min_heart8 <- heartrate_seconds %>%
  group_by(Id) %>%
  summarize(maxTime = max(Time), minTime = min(Time),
            maxValue = max(Value), minValue = min(Value),
            Count = n()) %>%
  as.data.frame()
max_min_heart8
##            Id              maxTime               minTime maxValue minValue
## 1  2022484408  5/9/2016 9:59:55 AM  4/12/2016 1:00:00 PM      203       38
## 2  2026352035  5/9/2016 7:49:45 PM  4/17/2016 5:30:20 AM      125       63
## 3  2347167796 4/29/2016 6:56:50 AM  4/12/2016 1:00:10 PM      195       49
## 4  4020332650  5/9/2016 9:59:59 PM  4/12/2016 1:00:00 AM      191       46
## 5  4388161847  5/9/2016 9:59:55 PM  4/13/2016 1:00:00 AM      180       39
## 6  4558609924  5/9/2016 9:59:55 AM  4/12/2016 1:00:00 PM      199       44
## 7  5553957443  5/9/2016 9:59:55 PM  4/12/2016 1:00:05 AM      165       47
## 8  5577150313  5/9/2016 9:59:50 PM  4/12/2016 1:00:00 AM      174       36
## 9  6117666160  5/9/2016 9:59:45 AM  4/15/2016 1:00:00 PM      189       52
## 10 6775888955  5/7/2016 9:59:55 AM 4/13/2016 10:00:00 PM      177       55
## 11 6962181067  5/9/2016 9:59:55 AM  4/12/2016 1:00:00 AM      184       47
## 12 7007744171  5/6/2016 9:59:50 AM  4/12/2016 1:00:00 PM      166       54
## 13 8792009665  5/4/2016 9:59:45 AM  4/12/2016 1:00:00 PM      158       43
## 14 8877689391  5/9/2016 9:59:56 PM  4/12/2016 1:00:05 PM      180       46
##     Count
## 1  154104
## 2    2490
## 3  152683
## 4  285461
## 5  249748
## 6  192168
## 7  255174
## 8  248560
## 9  158899
## 10  32771
## 11 266326
## 12 133592
## 13 122841
## 14 228841
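The maxTime and minTime columns in this table are computed on text, so they are alphabetical extremes rather than each user's first and last reading. A sketch of the same per-user summary on parsed timestamps, using a small made-up table (`hr_sample` and all of its values are hypothetical):

```r
library(dplyr)
library(lubridate)

# Hypothetical readings in the same layout as heartrate_seconds
hr_sample <- tibble(
  Id    = c(1, 1, 1, 2, 2),
  Time  = c("4/12/2016 7:21:00 AM", "5/9/2016 9:59:55 AM",
            "5/12/2016 3:10:00 PM", "4/13/2016 1:00:00 AM",
            "4/13/2016 10:30:00 PM"),
  Value = c(97, 102, 64, 58, 120)
)

hr_summary <- hr_sample %>%
  mutate(Time = mdy_hms(Time)) %>%   # convert text to date-times
  group_by(Id) %>%
  summarize(firstReading = min(Time),
            lastReading  = max(Time),
            minValue     = min(Value),
            maxValue     = max(Value),
            Count        = n())

hr_summary
```

With parsed times, the last reading for the first sample user falls on 5/12/2016 rather than 5/9/2016, which is what a text comparison would report.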
maxmin_dailyActivity <- dailyActivity %>% group_by(Id) %>% summarize(across(everything(), max))
glimpse(dailyActivity)
## Rows: 940
## Columns: 15
## $ Id                       <dbl> 1503960366, 1503960366, 1503960366, 150396036…
## $ ActivityDate             <chr> "4/12/2016", "4/13/2016", "4/14/2016", "4/15/…
## $ TotalSteps               <dbl> 13162, 10735, 10460, 9762, 12669, 9705, 13019…
## $ TotalDistance            <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8…
## $ TrackerDistance          <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8…
## $ LoggedActivitiesDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveDistance       <dbl> 1.88, 1.57, 2.44, 2.14, 2.71, 3.19, 3.25, 3.5…
## $ ModeratelyActiveDistance <dbl> 0.55, 0.69, 0.40, 1.26, 0.41, 0.78, 0.64, 1.3…
## $ LightActiveDistance      <dbl> 6.06, 4.71, 3.91, 2.83, 5.04, 2.51, 4.71, 5.0…
## $ SedentaryActiveDistance  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveMinutes        <dbl> 25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 66, 4…
## $ FairlyActiveMinutes      <dbl> 13, 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, 21…
## $ LightlyActiveMinutes     <dbl> 328, 217, 181, 209, 221, 164, 233, 264, 205, …
## $ SedentaryMinutes         <dbl> 728, 776, 1218, 726, 773, 539, 1149, 775, 818…
## $ Calories                 <dbl> 1985, 1797, 1776, 1745, 1863, 1728, 1921, 203…
#filtering out all the unlogged days of activity
active_days <- filter(dailyActivity, LoggedActivitiesDistance > 0)
#length() on a data frame counts its columns (15 here); nrow(active_days) gives the number of logged days
length(active_days)
## [1] 15
#only 4 unique users had LoggedActivityDistance > 0
n_distinct(active_days$Id)
## [1] 4
sleepTemp <- sleepDay %>% separate(SleepDay, into = c("NewDate", "Hour", "AM_or_PM"), sep = " ")
View(sleepTemp)
merged_data <- left_join(dailyActivity, sleepTemp, by = c('Id' = 'Id','ActivityDate' = 'NewDate'), relationship = "many-to-many") 
merged_data <- merged_data %>% relocate(ActivityDate, .before = TotalSteps)
n_distinct(merged_data$Id)
## [1] 33
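The summary that follows reports 943 rows, although dailyActivity has 940, which suggests that some Id and date pairs appear more than once in the sleep table. A sketch of checking for and dropping exact duplicate rows before joining, on made-up data (`activity_sample` and `sleep_sample` are hypothetical):

```r
library(dplyr)

activity_sample <- tibble(
  Id           = c(1, 1, 2),
  ActivityDate = c("4/12/2016", "4/13/2016", "4/12/2016"),
  TotalSteps   = c(13162, 10735, 8000)
)

# The second row is an exact duplicate of the first
sleep_sample <- tibble(
  Id                 = c(1, 1, 2),
  NewDate            = c("4/12/2016", "4/12/2016", "4/12/2016"),
  TotalMinutesAsleep = c(327, 327, 410)
)

# List (Id, NewDate) keys that occur more than once
dup_keys <- sleep_sample %>% count(Id, NewDate) %>% filter(n > 1)

# Drop exact duplicate rows, then join; each activity row now matches at most one sleep row
sleep_unique <- distinct(sleep_sample)
joined <- left_join(activity_sample, sleep_unique,
                    by = c("Id" = "Id", "ActivityDate" = "NewDate"))

nrow(dup_keys)
nrow(joined)
```

After de-duplication the joined table keeps one row per activity day, so later counts and sums are not inflated by repeated sleep records.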
summary(merged_data)
##        Id            ActivityDate         TotalSteps    TotalDistance   
##  Min.   :1.504e+09   Length:943         Min.   :    0   Min.   : 0.000  
##  1st Qu.:2.320e+09   Class :character   1st Qu.: 3795   1st Qu.: 2.620  
##  Median :4.445e+09   Mode  :character   Median : 7439   Median : 5.260  
##  Mean   :4.858e+09                      Mean   : 7652   Mean   : 5.503  
##  3rd Qu.:6.962e+09                      3rd Qu.:10734   3rd Qu.: 7.720  
##  Max.   :8.878e+09                      Max.   :36019   Max.   :28.030  
##                                                                         
##  TrackerDistance  LoggedActivitiesDistance VeryActiveDistance
##  Min.   : 0.000   Min.   :0.000            Min.   : 0.000    
##  1st Qu.: 2.620   1st Qu.:0.000            1st Qu.: 0.000    
##  Median : 5.260   Median :0.000            Median : 0.220    
##  Mean   : 5.489   Mean   :0.110            Mean   : 1.504    
##  3rd Qu.: 7.715   3rd Qu.:0.000            3rd Qu.: 2.065    
##  Max.   :28.030   Max.   :4.942            Max.   :21.920    
##                                                              
##  ModeratelyActiveDistance LightActiveDistance SedentaryActiveDistance
##  Min.   :0.0000           Min.   : 0.000      Min.   :0.000000       
##  1st Qu.:0.0000           1st Qu.: 1.950      1st Qu.:0.000000       
##  Median :0.2400           Median : 3.380      Median :0.000000       
##  Mean   :0.5709           Mean   : 3.349      Mean   :0.001601       
##  3rd Qu.:0.8050           3rd Qu.: 4.790      3rd Qu.:0.000000       
##  Max.   :6.4800           Max.   :10.710      Max.   :0.110000       
##                                                                      
##  VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes
##  Min.   :  0.00    Min.   :  0.00      Min.   :  0          Min.   :   0.0  
##  1st Qu.:  0.00    1st Qu.:  0.00      1st Qu.:127          1st Qu.: 729.0  
##  Median :  4.00    Median :  7.00      Median :199          Median :1057.0  
##  Mean   : 21.24    Mean   : 13.63      Mean   :193          Mean   : 990.4  
##  3rd Qu.: 32.00    3rd Qu.: 19.00      3rd Qu.:264          3rd Qu.:1229.0  
##  Max.   :210.00    Max.   :143.00      Max.   :518          Max.   :1440.0  
##                                                                             
##     Calories        Hour             AM_or_PM         TotalSleepRecords
##  Min.   :   0   Length:943         Length:943         Min.   :1.000    
##  1st Qu.:1830   Class :character   Class :character   1st Qu.:1.000    
##  Median :2140   Mode  :character   Mode  :character   Median :1.000    
##  Mean   :2308                                         Mean   :1.119    
##  3rd Qu.:2796                                         3rd Qu.:1.000    
##  Max.   :4900                                         Max.   :3.000    
##                                                       NA's   :530      
##  TotalMinutesAsleep TotalTimeInBed     TimeDiff     
##  Min.   : 58.0      Min.   : 61.0   Min.   :  0.00  
##  1st Qu.:361.0      1st Qu.:403.0   1st Qu.: 17.00  
##  Median :433.0      Median :463.0   Median : 25.00  
##  Mean   :419.5      Mean   :458.6   Mean   : 39.17  
##  3rd Qu.:490.0      3rd Qu.:526.0   3rd Qu.: 40.00  
##  Max.   :796.0      Max.   :961.0   Max.   :371.00  
##  NA's   :530        NA's   :530     NA's   :530
merged_data <- merged_data %>% mutate(Weekday = weekdays(mdy(ActivityDate)))
merged_data
## # A tibble: 943 × 22
##            Id ActivityDate TotalSteps TotalDistance TrackerDistance
##         <dbl> <chr>             <dbl>         <dbl>           <dbl>
##  1 1503960366 4/12/2016         13162          8.5             8.5 
##  2 1503960366 4/13/2016         10735          6.97            6.97
##  3 1503960366 4/14/2016         10460          6.74            6.74
##  4 1503960366 4/15/2016          9762          6.28            6.28
##  5 1503960366 4/16/2016         12669          8.16            8.16
##  6 1503960366 4/17/2016          9705          6.48            6.48
##  7 1503960366 4/18/2016         13019          8.59            8.59
##  8 1503960366 4/19/2016         15506          9.88            9.88
##  9 1503960366 4/20/2016         10544          6.68            6.68
## 10 1503960366 4/21/2016          9819          6.34            6.34
## # ℹ 933 more rows
## # ℹ 17 more variables: LoggedActivitiesDistance <dbl>,
## #   VeryActiveDistance <dbl>, ModeratelyActiveDistance <dbl>,
## #   LightActiveDistance <dbl>, SedentaryActiveDistance <dbl>,
## #   VeryActiveMinutes <dbl>, FairlyActiveMinutes <dbl>,
## #   LightlyActiveMinutes <dbl>, SedentaryMinutes <dbl>, Calories <dbl>,
## #   Hour <chr>, AM_or_PM <chr>, TotalSleepRecords <dbl>, …

ANALYZE:

df <- data.frame(table(merged_data$Weekday))
ggplot(df) + geom_col(mapping=aes(x = factor(Var1, levels = c('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday')), y = Freq, fill = Var1)) + labs(title = "Number of Activity Records per Weekday")

summary(df)
##         Var1        Freq      
##  Friday   :1   Min.   :121.0  
##  Monday   :1   1st Qu.:123.0  
##  Saturday :1   Median :126.0  
##  Sunday   :1   Mean   :134.7  
##  Thursday :1   3rd Qu.:149.0  
##  Tuesday  :1   Max.   :152.0  
##  Wednesday:1
hold_data <- merged_data %>% group_by(Id) %>% summarize(TotalStep = sum(TotalSteps))
hold_data <- hold_data %>% arrange(-TotalStep)

ggplot(merged_data ,mapping=aes(x= factor(Weekday, level=c('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday')), y = TotalSteps, fill = Weekday)) + geom_col()+
  facet_grid(~factor(Id, levels = unique(hold_data$Id))) + theme(axis.text.x = element_blank()) + labs(title = "Total Steps per Day for Each Id")

ggplot(merged_data ,mapping=aes(x= factor(Weekday, level=c('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday')), y = TotalSteps, fill = Weekday)) + geom_col()+
  facet_wrap(~factor(Id, levels = unique(hold_data$Id))) + theme(axis.text.x = element_blank()) + labs(title = "Total Steps per Day for Each Id")

ggplot(merged_data,mapping=aes(x= factor(Weekday, level=c('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday')), y = TotalSteps, fill = Weekday)) + geom_col()+
  theme(axis.text.x = element_blank()) + labs(title = "Total Steps per Day")
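Because geom_col() stacks every row, the bars in these plots show summed steps, and the number of records per weekday differs (121 to 152 in the summary above), so weekdays with more records can look more active than they are. A sketch that compares the mean per weekday instead, with the weekday order defined once; the data frame `steps_sample` is made up for illustration:

```r
library(dplyr)
library(ggplot2)

weekday_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                    "Friday", "Saturday", "Sunday")

# Hypothetical rows standing in for merged_data
steps_sample <- tibble(
  Weekday    = c("Tuesday", "Tuesday", "Tuesday", "Sunday", "Sunday"),
  TotalSteps = c(8000, 9000, 10000, 9500, 10500)
)

# Mean steps and number of records for each weekday
mean_steps <- steps_sample %>%
  mutate(Weekday = factor(Weekday, levels = weekday_levels)) %>%
  group_by(Weekday) %>%
  summarize(MeanSteps = mean(TotalSteps), Records = n())

mean_plot <- ggplot(mean_steps, aes(x = Weekday, y = MeanSteps, fill = Weekday)) +
  geom_col() +
  labs(title = "Mean Steps per Weekday")

mean_steps
```

In this toy example Tuesday has the larger total but Sunday has the larger mean, which is the kind of difference a summed bar chart can hide.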

ggplot(merged_data, mapping=aes(x = TimeDiff, y = TotalSteps)) + geom_point()+ labs(title = "Relationship between Time Awake in Bed and Total Steps") + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning: Removed 530 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 530 rows containing missing values (`geom_point()`).

merged_data <- transform(merged_data, TotalMin=(VeryActiveMinutes+FairlyActiveMinutes+LightlyActiveMinutes+SedentaryMinutes))
merged_data <- merged_data %>% relocate(TotalMin, .before = TotalSteps)
ggplot(merged_data, mapping=aes(x = TimeDiff, y = TotalMin)) + geom_point()+ labs(title = "Relationship between Time Awake in Bed and Total Minutes") + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning: Removed 530 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 530 rows containing missing values (`geom_point()`).

ggplot(merged_data,mapping=aes(x= factor(Weekday, level=c('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday')), y = TotalMin, fill = Weekday)) + geom_col()+
  theme(axis.text.x = element_blank()) + labs(title = "Total Minutes per Day")

library(cowplot)
## Warning: package 'cowplot' was built under R version 4.3.3
## 
## Attaching package: 'cowplot'
## The following object is masked from 'package:lubridate':
## 
##     stamp
View(merged_data)
TotalMinGraph <- ggplot(merged_data,mapping=aes(x= factor(Weekday, level=c('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday')), y = TotalMin, fill = Weekday)) + geom_col()+
  theme(axis.text.x = element_blank()) + labs(title = "Total Minutes per Day", x = NULL)

TotalStepsGraph <- ggplot(merged_data,mapping=aes(x= factor(Weekday, level=c('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday')), y = TotalSteps, fill = Weekday)) + geom_col()+
  theme(axis.text.x = element_blank()) + labs(title = "Total Steps per Day", x = NULL)

TotalDistGraph <- ggplot(merged_data,mapping=aes(x= factor(Weekday, level=c('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday')), y = TotalDistance, fill = Weekday)) + geom_col()+
  theme(axis.text.x = element_blank()) + labs(title = "Total Distance per Day", x = NULL)


df <- data.frame(table(merged_data$Weekday))
TotalActGraph <- ggplot(df) + geom_col(mapping=aes(x = factor(Var1, levels = c('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday')), y = Freq, fill = Var1)) + labs(title = "Number of Activity Records per Weekday", x = NULL) + theme(axis.text.x = element_blank())
 

plot_grid(TotalMinGraph, TotalStepsGraph, TotalDistGraph, TotalActGraph, labels = "AUTO")

library(cowplot)
TimeDiff_n_TotalSteps <- ggplot(merged_data, mapping=aes(x = TimeDiff, y = TotalSteps)) + geom_point()+ labs(title = "Relationship between Time Awake in Bed and Total Steps") + geom_smooth()

TimeDiff_n_TotalDist <- ggplot(merged_data, mapping=aes(x = TimeDiff, y = TotalDistance)) + geom_point()+labs(title = "Relationship between Time Awake in Bed and Total Distance") + geom_smooth()

TimeDiff_n_TotalMin <- ggplot(merged_data, mapping=aes(x= TimeDiff, y = TotalMin)) + geom_point()+labs(title= "Relationship between Time Awake in Bed and Total Minutes") + geom_smooth()

cor.test(merged_data$TimeDiff, merged_data$TotalMin)
## 
##  Pearson's product-moment correlation
## 
## data:  merged_data$TimeDiff and merged_data$TotalMin
## t = -4.4746, df = 411, p-value = 9.925e-06
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.305665 -0.121561
## sample estimates:
##        cor 
## -0.2155274
cor.test(merged_data$TimeDiff, merged_data$TotalDistance)
## 
##  Pearson's product-moment correlation
## 
## data:  merged_data$TimeDiff and merged_data$TotalDistance
## t = 0.12105, df = 411, p-value = 0.9037
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.09057597  0.10240631
## sample estimates:
##         cor 
## 0.005970762
cor.test(merged_data$TimeDiff, merged_data$TotalSteps)
## 
##  Pearson's product-moment correlation
## 
## data:  merged_data$TimeDiff and merged_data$TotalSteps
## t = 0.54976, df = 411, p-value = 0.5828
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.06956893  0.12327966
## sample estimates:
##        cor 
## 0.02710758
plot_grid(TimeDiff_n_TotalSteps, TimeDiff_n_TotalDist, TimeDiff_n_TotalMin, labels = "AUTO")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning: Removed 530 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 530 rows containing missing values (`geom_point()`).
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning: Removed 530 rows containing non-finite values (`stat_smooth()`).
## Removed 530 rows containing missing values (`geom_point()`).
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning: Removed 530 rows containing non-finite values (`stat_smooth()`).
## Removed 530 rows containing missing values (`geom_point()`).
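The three tests above can also be collected into one table, which makes the estimates easier to compare side by side. A sketch on simulated data (the values in `sim` are randomly generated and are not from the dataset; only the column names mirror merged_data):

```r
set.seed(1)

# Simulated stand-in for merged_data; values are random, not from the dataset
sim <- data.frame(
  TimeDiff      = runif(50, 0, 120),
  TotalMin      = rnorm(50, 1000, 150),
  TotalDistance = rnorm(50, 5.5, 2),
  TotalSteps    = rnorm(50, 7600, 3000)
)

targets <- c("TotalMin", "TotalDistance", "TotalSteps")

# Run cor.test() for each target and keep the estimate, interval and p-value
cor_table <- do.call(rbind, lapply(targets, function(v) {
  res <- cor.test(sim$TimeDiff, sim[[v]])
  data.frame(variable = v,
             cor      = unname(res$estimate),
             ci_low   = res$conf.int[1],
             ci_high  = res$conf.int[2],
             p_value  = res$p.value)
}))

cor_table
```

Read this way, the results reported above show a weak negative correlation between TimeDiff and TotalMin (about -0.22), while the intervals for TotalDistance and TotalSteps both include zero.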

TimeDiff_n_TotalSteps <- ggplot(merged_data, mapping=aes(x = TotalMinutesAsleep, y = TotalSteps)) + geom_point()+ labs(title = "Relationship between Total Minutes Asleep and Total Steps") + geom_smooth()

TimeDiff_n_TotalDist <- ggplot(merged_data, mapping=aes(x = TotalMinutesAsleep, y = TotalDistance)) + geom_point()+labs(title = "Relationship between Total Minutes Asleep and Total Distance") + geom_smooth()

TimeDiff_n_TotalMin <- ggplot(merged_data, mapping=aes(x= TotalMinutesAsleep, y = TotalMin)) + geom_point()+labs(title= "Relationship between Total Minutes Asleep and Total Minutes") + geom_smooth()

cor.test(merged_data$TotalMinutesAsleep, merged_data$TotalMin)
## 
##  Pearson's product-moment correlation
## 
## data:  merged_data$TotalMinutesAsleep and merged_data$TotalMin
## t = -16.456, df = 411, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.6850669 -0.5683003
## sample estimates:
##        cor 
## -0.6302341
cor.test(merged_data$TotalMinutesAsleep, merged_data$TotalDistance)
## 
##  Pearson's product-moment correlation
## 
## data:  merged_data$TotalMinutesAsleep and merged_data$TotalDistance
## t = -3.5428, df = 411, p-value = 0.0004414
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.26424789 -0.07692598
## sample estimates:
##        cor 
## -0.1721427
cor.test(merged_data$TotalMinutesAsleep, merged_data$TotalSteps)
## 
##  Pearson's product-moment correlation
## 
## data:  merged_data$TotalMinutesAsleep and merged_data$TotalSteps
## t = -3.8563, df = 411, p-value = 0.0001336
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.27834209 -0.09203143
## sample estimates:
##        cor 
## -0.1868665
plot_grid(TimeDiff_n_TotalSteps, TimeDiff_n_TotalDist, TimeDiff_n_TotalMin, labels = "AUTO")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning: Removed 530 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 530 rows containing missing values (`geom_point()`).
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning: Removed 530 rows containing non-finite values (`stat_smooth()`).
## Removed 530 rows containing missing values (`geom_point()`).
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning: Removed 530 rows containing non-finite values (`stat_smooth()`).
## Removed 530 rows containing missing values (`geom_point()`).

TimeDiff_n_TotalSteps <- ggplot(merged_data, mapping=aes(x = TotalTimeInBed, y = TotalSteps)) + geom_point()+ labs(title = "Relationship between Total Time in Bed and Total Steps") + geom_smooth()

TimeDiff_n_TotalDist <- ggplot(merged_data, mapping=aes(x = TotalTimeInBed, y = TotalDistance)) + geom_point()+labs(title = "Relationship between Total Time in Bed and Total Distance") + geom_smooth()

TimeDiff_n_TotalMin <- ggplot(merged_data, mapping=aes(x= TotalTimeInBed, y = TotalMin)) + geom_point()+labs(title= "Relationship between Total Time in Bed and Total Minutes") + geom_smooth()

cor.test(merged_data$TotalTimeInBed, merged_data$TotalMin)
## 
##  Pearson's product-moment correlation
## 
## data:  merged_data$TotalTimeInBed and merged_data$TotalMin
## t = -18.09, df = 411, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.7162610 -0.6083721
## sample estimates:
##        cor 
## -0.6657822
cor.test(merged_data$TotalTimeInBed, merged_data$TotalDistance)
## 
##  Pearson's product-moment correlation
## 
## data:  merged_data$TotalTimeInBed and merged_data$TotalDistance
## t = -3.2459, df = 411, p-value = 0.001267
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.25076397 -0.06255465
## sample estimates:
##        cor 
## -0.1580949
cor.test(merged_data$TotalTimeInBed, merged_data$TotalSteps)
## 
##  Pearson's product-moment correlation
## 
## data:  merged_data$TotalTimeInBed and merged_data$TotalSteps
## t = -3.3717, df = 411, p-value = 0.0008178
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.25649374 -0.06865199
## sample estimates:
##        cor 
## -0.1640597
plot_grid(TimeDiff_n_TotalSteps, TimeDiff_n_TotalDist, TimeDiff_n_TotalMin, labels = "AUTO")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning: Removed 530 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 530 rows containing missing values (`geom_point()`).
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning: Removed 530 rows containing non-finite values (`stat_smooth()`).
## Removed 530 rows containing missing values (`geom_point()`).
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning: Removed 530 rows containing non-finite values (`stat_smooth()`).
## Removed 530 rows containing missing values (`geom_point()`).


SHARE:

Call To Action: Optimizing the Usage of the Bellabeat Devices

We found a large discrepancy between users in the amount of activity recorded by their devices. We also found consistent differences in how often the devices were used and in how active the users were across the week. Therefore, to encourage more use of its devices, I suggest:

  • Bellabeat should promote a mobile app that connects to one or all of the Bellabeat fitness devices a customer owns. Using both the app and the devices, it should send alerts and notifications to users who appear less active than most other users.

  • Bellabeat can also choose which days to send the alerts. Users were most active on Tuesday, Wednesday, and Thursday, and less active on Friday, Saturday, Sunday, and Monday (least active on Sunday and/or Monday). Therefore, Bellabeat should focus on sending alerts on Sunday and Monday (regardless of the activity recorded) and also send alerts on Friday and Saturday.

  • Bellabeat can also build a feature that asks users how active they want to be. Depending on the answer, the app can send them a daily report of how much they slept and how active they were.

    • For example, if a user states that they want to be active for at least 1,000 minutes, their Bellabeat device could recommend spending no more than 50 minutes idling in bed.