1. Summary

Bellabeat is a high-tech company that manufactures health-focused smart products. They offer different smart devices that collect data on activity, sleep, stress, and reproductive health to empower women with knowledge about their own health and habits.

The main focus of this case is to analyze smart device fitness data and determine how it could help unlock new growth opportunities for Bellabeat. We will focus on one of Bellabeat’s products: the Bellabeat app.

The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. This data can help users better understand their current habits and make healthy decisions. The Bellabeat app connects to their line of smart wellness products.

2. Ask Phase

Business Task

Identify trends in how consumers use non-Bellabeat smart devices to apply insights into Bellabeat’s marketing strategy.

Stakeholders

Urška Sršen - Bellabeat cofounder and Chief Creative Officer

Sando Mur - Bellabeat cofounder and key member of Bellabeat executive team

Bellabeat Marketing Analytics team

3. Prepare Phase

Dataset used: The data source for our case study is FitBit Fitness Tracker Data. This dataset is stored on Kaggle and was made available through Mobius.

Accessibility and privacy of data: Checking the metadata of our dataset, we can confirm it is open-source. The owner has dedicated the work to the public domain by waiving all copyright-related rights, as well as any related or neighboring rights, to the fullest extent permitted by law. You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

Information about our dataset: These datasets were generated by respondents to a survey distributed via Amazon Mechanical Turk between 03.12.2016 and 05.12.2016. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. Variation between outputs reflects the use of different types of Fitbit trackers and individual tracking behaviors and preferences. The files used in this analysis come from the folder covering 4.12.16 to 5.12.16.

Data Organization and Verification: We have 18 CSV files available. Each file contains different quantitative data tracked by Fitbit. The data is in long format: each row is one time point per subject, so each subject has data in multiple rows. Every user has a unique ID and multiple rows, since data is tracked by day and time.

Because the sample was small, I sorted and filtered the tables and created Pivot Tables in Excel. This let me verify the attributes and observations of each table and the relations between tables. I counted the sample size (users) of each table and verified the time span of the analysis: 31 days.

library(readr)
library(kableExtra)
data <- read_csv("data_organization_and_verification.csv")
## Rows: 18 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Table Name, Type, Description
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
kable(data)
| Table Name | Type | Description |
|:--|:--|:--|
| dailyActivity_merged | Microsoft Excel CSV | Daily Activity over 31 days of 33 users. Tracking daily: Steps, Distance, Intensities, Calories |
| dailyCalories_merged | Microsoft Excel CSV | Daily Calories over 31 days of 33 users |
| dailyIntensities_merged | Microsoft Excel CSV | Daily Intensity over 31 days of 33 users. Measured in Minutes and Distance, dividing groups in 4 categories: Sedentary, Lightly Active, Fairly Active, Very Active |
| dailySteps_merged | Microsoft Excel CSV | Daily Steps over 31 days of 33 users |
| heartrate_seconds_merged | Microsoft Excel CSV | Exact day and time heartrate logs for just 7 users |
| hourlyCalories_merged | Microsoft Excel CSV | Hourly Calories burned over 31 days of 33 users |
| hourlyIntensities_merged | Microsoft Excel CSV | Hourly total and average intensity over 31 days of 33 users |
| hourlySteps_merged | Microsoft Excel CSV | Hourly Steps over 31 days of 33 users |
| minuteCaloriesNarrow_merged | Microsoft Excel CSV | Calories burned every minute over 31 days of 33 users (Every minute in single row) |
| minuteCaloriesWide_merged | Microsoft Excel CSV | Calories burned every minute over 31 days of 33 users (Every minute in single column) |
| minuteIntensitiesNarrow_merged | Microsoft Excel CSV | Intensity counted by minute over 31 days of 33 users (Every minute in single row) |
| minuteIntensitiesWide_merged | Microsoft Excel CSV | Intensity counted by minute over 31 days of 33 users (Every minute in single column) |
| minuteMETsNarrow_merged | Microsoft Excel CSV | Ratio of the energy you are using in a physical activity compared to the energy you would use at rest. Counted in minutes |
| minuteSleep_merged | Microsoft Excel CSV | Log Sleep by Minute for 24 users over 31 days. Value column not specified |
| minuteStepsNarrow_merged | Microsoft Excel CSV | Steps tracked every minute over 31 days of 33 users (Every minute in single row) |
| minuteStepsWide_merged | Microsoft Excel CSV | Steps tracked every minute over 31 days of 33 users (Every minute in single column) |
| sleepDay_merged | Microsoft Excel CSV | Daily sleep logs, tracked by: Total count of sleeps a day, Total minutes, Total Time in Bed |
| weightLogInfo_merged | Microsoft Excel CSV | Weight track by day in Kg and Pounds over 30 days. Calculation of BMI. 5 users report weight manually, 3 users not. In total there are 8 users |
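
The checks described above (users per table and length of the tracking period) can also be sketched in R instead of Excel. This is a minimal sketch, not the procedure used for the table: `toy_log` is made-up data standing in for one of the long-format CSV files, and only the column names `Id` and `ActivityDate` follow the real files.

```r
library(dplyr)

# Made-up long-format data: one row per user per day (illustrative only)
toy_log <- tibble(
  Id = c(1, 1, 1, 2, 2),
  ActivityDate = as.Date(c("2016-04-12", "2016-04-13", "2016-04-14",
                           "2016-04-12", "2016-04-14"))
)

# Sample size (users) and tracking period covered by the table
table_check <- toy_log %>%
  summarise(
    users     = n_distinct(Id),
    first_day = min(ActivityDate),
    last_day  = max(ActivityDate),
    days_span = as.integer(last_day - first_day) + 1L
  )

table_check
```

Running the same summary on each imported file would give the user and day counts listed in the table.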

Data Credibility and Integrity: Because of the small sample size (30 users) and the lack of demographic information, we could encounter sampling bias. We cannot be sure that the sample is representative of the population as a whole. Other limitations are that the dataset is not current and that the survey covered a short period (2 months). That is why we will give our case study an operational approach.

4. Process Phase

I will carry out my analysis in R because it is accessible, handles the amount of data well, and lets me create data visualizations to share my results with stakeholders.

Installing packages and opening libraries

We will choose the packages that will help us in our analysis and load them. For our analysis, we will use the following packages:

library(ggpubr)
## Loading required package: ggplot2
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ stringr   1.5.0
## ✔ forcats   1.0.0     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter()     masks stats::filter()
## ✖ dplyr::group_rows() masks kableExtra::group_rows()
## ✖ dplyr::lag()        masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(here)
## here() starts at /cloud/project
library(skimr)
library(janitor)
## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(lubridate)
library(ggrepel)
library(magrittr)
## 
## Attaching package: 'magrittr'
## 
## The following object is masked from 'package:purrr':
## 
##     set_names
## 
## The following object is masked from 'package:tidyr':
## 
##     extract
library('dplyr')
library(ggplot2)

Importing datasets

Knowing the datasets we have, we will load the ones that will help us answer our business task. In our analysis we will focus on the following datasets:

Daily_activity

Daily_sleep

Hourly_steps

Due to the small sample size, we won’t consider Weight (8 users) and heart rate (7 users) in this analysis.

daily_activity <- read_csv(file= "./Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")
## Rows: 940 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): ActivityDate
## dbl (14): Id, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDi...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
daily_sleep <- read_csv(file= "./Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")
## Rows: 413 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): SleepDay
## dbl (4): Id, TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
hourly_steps <- read_csv("./Fitabase Data 4.12.16-5.12.16/hourlySteps_merged.csv")
## Rows: 22099 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityHour
## dbl (2): Id, StepTotal
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
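
In the imports above, `read_csv()` guessed `Id` as a number and the date columns as text. One possible alternative, shown here only as a sketch and not as the approach used in this analysis, is to declare the column types at import. The inline CSV text is made-up sample data in the layout of dailyActivity_merged.csv.

```r
library(readr)

# Made-up rows in the same layout as dailyActivity_merged.csv (illustrative only)
csv_text <- "Id,ActivityDate,TotalSteps\n1503960366,4/12/2016,13162\n1503960366,4/13/2016,10735\n"

# Declare types up front: Id as text (it is a label, not a quantity),
# ActivityDate parsed as a date, TotalSteps as a number
activity_typed <- read_csv(
  I(csv_text),
  col_types = cols(
    Id = col_character(),
    ActivityDate = col_date(format = "%m/%d/%Y"),
    TotalSteps = col_double()
  )
)

str(activity_typed$ActivityDate)
```

Declaring the types also silences the column specification message printed after each import.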

Preview our datasets

We will preview our selected data frames and check the structure of each column.

head(daily_activity)
## # A tibble: 6 × 15
##           Id ActivityDate TotalSteps TotalDistance TrackerDistance
##        <dbl> <chr>             <dbl>         <dbl>           <dbl>
## 1 1503960366 4/12/2016         13162          8.5             8.5 
## 2 1503960366 4/13/2016         10735          6.97            6.97
## 3 1503960366 4/14/2016         10460          6.74            6.74
## 4 1503960366 4/15/2016          9762          6.28            6.28
## 5 1503960366 4/16/2016         12669          8.16            8.16
## 6 1503960366 4/17/2016          9705          6.48            6.48
## # ℹ 10 more variables: LoggedActivitiesDistance <dbl>,
## #   VeryActiveDistance <dbl>, ModeratelyActiveDistance <dbl>,
## #   LightActiveDistance <dbl>, SedentaryActiveDistance <dbl>,
## #   VeryActiveMinutes <dbl>, FairlyActiveMinutes <dbl>,
## #   LightlyActiveMinutes <dbl>, SedentaryMinutes <dbl>, Calories <dbl>
str(daily_activity)
## spc_tbl_ [940 × 15] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Id                      : num [1:940] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityDate            : chr [1:940] "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
##  $ TotalSteps              : num [1:940] 13162 10735 10460 9762 12669 ...
##  $ TotalDistance           : num [1:940] 8.5 6.97 6.74 6.28 8.16 ...
##  $ TrackerDistance         : num [1:940] 8.5 6.97 6.74 6.28 8.16 ...
##  $ LoggedActivitiesDistance: num [1:940] 0 0 0 0 0 0 0 0 0 0 ...
##  $ VeryActiveDistance      : num [1:940] 1.88 1.57 2.44 2.14 2.71 ...
##  $ ModeratelyActiveDistance: num [1:940] 0.55 0.69 0.4 1.26 0.41 ...
##  $ LightActiveDistance     : num [1:940] 6.06 4.71 3.91 2.83 5.04 ...
##  $ SedentaryActiveDistance : num [1:940] 0 0 0 0 0 0 0 0 0 0 ...
##  $ VeryActiveMinutes       : num [1:940] 25 21 30 29 36 38 42 50 28 19 ...
##  $ FairlyActiveMinutes     : num [1:940] 13 19 11 34 10 20 16 31 12 8 ...
##  $ LightlyActiveMinutes    : num [1:940] 328 217 181 209 221 164 233 264 205 211 ...
##  $ SedentaryMinutes        : num [1:940] 728 776 1218 726 773 ...
##  $ Calories                : num [1:940] 1985 1797 1776 1745 1863 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Id = col_double(),
##   ..   ActivityDate = col_character(),
##   ..   TotalSteps = col_double(),
##   ..   TotalDistance = col_double(),
##   ..   TrackerDistance = col_double(),
##   ..   LoggedActivitiesDistance = col_double(),
##   ..   VeryActiveDistance = col_double(),
##   ..   ModeratelyActiveDistance = col_double(),
##   ..   LightActiveDistance = col_double(),
##   ..   SedentaryActiveDistance = col_double(),
##   ..   VeryActiveMinutes = col_double(),
##   ..   FairlyActiveMinutes = col_double(),
##   ..   LightlyActiveMinutes = col_double(),
##   ..   SedentaryMinutes = col_double(),
##   ..   Calories = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
head(daily_sleep)
## # A tibble: 6 × 5
##           Id SleepDay        TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
##        <dbl> <chr>                       <dbl>              <dbl>          <dbl>
## 1 1503960366 4/12/2016 12:0…                 1                327            346
## 2 1503960366 4/13/2016 12:0…                 2                384            407
## 3 1503960366 4/15/2016 12:0…                 1                412            442
## 4 1503960366 4/16/2016 12:0…                 2                340            367
## 5 1503960366 4/17/2016 12:0…                 1                700            712
## 6 1503960366 4/19/2016 12:0…                 1                304            320
str(daily_sleep)
## spc_tbl_ [413 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Id                : num [1:413] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ SleepDay          : chr [1:413] "4/12/2016 12:00:00 AM" "4/13/2016 12:00:00 AM" "4/15/2016 12:00:00 AM" "4/16/2016 12:00:00 AM" ...
##  $ TotalSleepRecords : num [1:413] 1 2 1 2 1 1 1 1 1 1 ...
##  $ TotalMinutesAsleep: num [1:413] 327 384 412 340 700 304 360 325 361 430 ...
##  $ TotalTimeInBed    : num [1:413] 346 407 442 367 712 320 377 364 384 449 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Id = col_double(),
##   ..   SleepDay = col_character(),
##   ..   TotalSleepRecords = col_double(),
##   ..   TotalMinutesAsleep = col_double(),
##   ..   TotalTimeInBed = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
head(hourly_steps)
## # A tibble: 6 × 3
##           Id ActivityHour          StepTotal
##        <dbl> <chr>                     <dbl>
## 1 1503960366 4/12/2016 12:00:00 AM       373
## 2 1503960366 4/12/2016 1:00:00 AM        160
## 3 1503960366 4/12/2016 2:00:00 AM        151
## 4 1503960366 4/12/2016 3:00:00 AM          0
## 5 1503960366 4/12/2016 4:00:00 AM          0
## 6 1503960366 4/12/2016 5:00:00 AM          0
str(hourly_steps)
## spc_tbl_ [22,099 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Id          : num [1:22099] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityHour: chr [1:22099] "4/12/2016 12:00:00 AM" "4/12/2016 1:00:00 AM" "4/12/2016 2:00:00 AM" "4/12/2016 3:00:00 AM" ...
##  $ StepTotal   : num [1:22099] 373 160 151 0 0 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Id = col_double(),
##   ..   ActivityHour = col_character(),
##   ..   StepTotal = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

Cleaning and formatting

Now that we know more about our data structures, we will process them to look for any errors and inconsistencies.

Verifying number of users

Before we continue cleaning, we want to check how many unique users each data frame contains, by listing the unique Id values and taking their length. Even though 30 is the minimal sample size, we will still keep the sleep dataset for practice purposes only.

unique(daily_activity$Id)
##  [1] 1503960366 1624580081 1644430081 1844505072 1927972279 2022484408
##  [7] 2026352035 2320127002 2347167796 2873212765 3372868164 3977333714
## [13] 4020332650 4057192912 4319703577 4388161847 4445114986 4558609924
## [19] 4702921684 5553957443 5577150313 6117666160 6290855005 6775888955
## [25] 6962181067 7007744171 7086361926 8053475328 8253242879 8378563200
## [31] 8583815059 8792009665 8877689391
unique(daily_sleep$Id)
##  [1] 1503960366 1644430081 1844505072 1927972279 2026352035 2320127002
##  [7] 2347167796 3977333714 4020332650 4319703577 4388161847 4445114986
## [13] 4558609924 4702921684 5553957443 5577150313 6117666160 6775888955
## [19] 6962181067 7007744171 7086361926 8053475328 8378563200 8792009665
unique(hourly_steps$Id)
##  [1] 1503960366 1624580081 1644430081 1844505072 1927972279 2022484408
##  [7] 2026352035 2320127002 2347167796 2873212765 3372868164 3977333714
## [13] 4020332650 4057192912 4319703577 4388161847 4445114986 4558609924
## [19] 4702921684 5553957443 5577150313 6117666160 6290855005 6775888955
## [25] 6962181067 7007744171 7086361926 8053475328 8253242879 8378563200
## [31] 8583815059 8792009665 8877689391
length(unique(daily_activity$Id))
## [1] 33
length(unique(daily_sleep$Id))
## [1] 24
length(unique(hourly_steps$Id))
## [1] 33
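
To see which users are missing from the sleep data, rather than only how many, the ID sets can be compared with `setdiff()`. The sketch below uses made-up IDs; `activity_toy` and `sleep_toy` are hypothetical stand-ins for daily_activity and daily_sleep.

```r
library(dplyr)

# Made-up data frames standing in for daily_activity and daily_sleep
activity_toy <- tibble(Id = c(101, 101, 102, 103))
sleep_toy    <- tibble(Id = c(101, 103, 103))

# Number of distinct users in each data frame
user_counts <- sapply(
  list(activity = activity_toy, sleep = sleep_toy),
  function(df) n_distinct(df$Id)
)

# Users who appear in the activity data but never logged sleep
no_sleep_ids <- setdiff(unique(activity_toy$Id), unique(sleep_toy$Id))

user_counts
no_sleep_ids
```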

Duplicates

I now look for any duplicates:

sum(duplicated(daily_activity))
## [1] 0
sum(duplicated(daily_sleep))
## [1] 3
sum(duplicated(hourly_steps))
## [1] 0

Remove duplicates and N/A

The daily_sleep data frame has 413 rows, 3 of which are duplicates. We remove duplicate rows and rows with missing values from all three data frames.

daily_activity <- daily_activity %>%
  distinct() %>% 
  drop_na()

daily_sleep <- daily_sleep %>%
  distinct() %>%
  drop_na()

hourly_steps <- hourly_steps %>%
  distinct() %>%
  drop_na()

We will verify that duplicates have been removed:

sum(duplicated(daily_sleep))
## [1] 0
sum(duplicated(daily_activity))
## [1] 0
sum(duplicated(hourly_steps))
## [1] 0
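
Before dropping duplicates it can help to look at the repeated rows themselves. The following is a minimal sketch on made-up data; `sleep_toy` is a hypothetical stand-in for daily_sleep.

```r
library(dplyr)

# Made-up sleep log with one fully repeated row (illustrative only)
sleep_toy <- tibble(
  Id = c(1, 1, 1, 2),
  SleepDay = c("4/12/2016", "4/12/2016", "4/13/2016", "4/12/2016"),
  TotalMinutesAsleep = c(327, 327, 384, 412)
)

# Rows that repeat an earlier row exactly
repeated_rows <- sleep_toy[duplicated(sleep_toy), ]

# distinct() keeps the first copy of each repeated row
sleep_clean <- distinct(sleep_toy)

repeated_rows
nrow(sleep_clean)
```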

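Clean and rename columns

We want to ensure that column names use the right syntax and the same format in all datasets, since we will merge them later on. clean_names() is called only to preview the cleaned names; its result is not assigned. The change that is kept is rename_with(tolower), which converts all column names to lower case (for example, totalsteps rather than total_steps).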

clean_names(daily_activity)
## # A tibble: 940 × 15
##            id activity_date total_steps total_distance tracker_distance
##         <dbl> <chr>               <dbl>          <dbl>            <dbl>
##  1 1503960366 4/12/2016           13162           8.5              8.5 
##  2 1503960366 4/13/2016           10735           6.97             6.97
##  3 1503960366 4/14/2016           10460           6.74             6.74
##  4 1503960366 4/15/2016            9762           6.28             6.28
##  5 1503960366 4/16/2016           12669           8.16             8.16
##  6 1503960366 4/17/2016            9705           6.48             6.48
##  7 1503960366 4/18/2016           13019           8.59             8.59
##  8 1503960366 4/19/2016           15506           9.88             9.88
##  9 1503960366 4/20/2016           10544           6.68             6.68
## 10 1503960366 4/21/2016            9819           6.34             6.34
## # ℹ 930 more rows
## # ℹ 10 more variables: logged_activities_distance <dbl>,
## #   very_active_distance <dbl>, moderately_active_distance <dbl>,
## #   light_active_distance <dbl>, sedentary_active_distance <dbl>,
## #   very_active_minutes <dbl>, fairly_active_minutes <dbl>,
## #   lightly_active_minutes <dbl>, sedentary_minutes <dbl>, calories <dbl>
daily_activity<- rename_with(daily_activity, tolower)
clean_names(daily_sleep)
## # A tibble: 410 × 5
##          id sleep_day total_sleep_records total_minutes_asleep total_time_in_bed
##       <dbl> <chr>                   <dbl>                <dbl>             <dbl>
##  1   1.50e9 4/12/201…                   1                  327               346
##  2   1.50e9 4/13/201…                   2                  384               407
##  3   1.50e9 4/15/201…                   1                  412               442
##  4   1.50e9 4/16/201…                   2                  340               367
##  5   1.50e9 4/17/201…                   1                  700               712
##  6   1.50e9 4/19/201…                   1                  304               320
##  7   1.50e9 4/20/201…                   1                  360               377
##  8   1.50e9 4/21/201…                   1                  325               364
##  9   1.50e9 4/23/201…                   1                  361               384
## 10   1.50e9 4/24/201…                   1                  430               449
## # ℹ 400 more rows
daily_sleep <- rename_with(daily_sleep, tolower)
clean_names(hourly_steps)
## # A tibble: 22,099 × 3
##            id activity_hour         step_total
##         <dbl> <chr>                      <dbl>
##  1 1503960366 4/12/2016 12:00:00 AM        373
##  2 1503960366 4/12/2016 1:00:00 AM         160
##  3 1503960366 4/12/2016 2:00:00 AM         151
##  4 1503960366 4/12/2016 3:00:00 AM           0
##  5 1503960366 4/12/2016 4:00:00 AM           0
##  6 1503960366 4/12/2016 5:00:00 AM           0
##  7 1503960366 4/12/2016 6:00:00 AM           0
##  8 1503960366 4/12/2016 7:00:00 AM           0
##  9 1503960366 4/12/2016 8:00:00 AM         250
## 10 1503960366 4/12/2016 9:00:00 AM        1864
## # ℹ 22,089 more rows
hourly_steps <- rename_with(hourly_steps, tolower)
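
The difference between the two renaming approaches can be seen on a few column names. This sketch uses janitor's `make_clean_names()`, the vector counterpart of `clean_names()`; the three names are taken from daily_activity.

```r
library(janitor)

original_names <- c("Id", "ActivityDate", "TotalSteps")

# Snake case with underscores, as shown in the clean_names() previews
snake_names <- make_clean_names(original_names)

# tolower() only lowercases, which gives the names used in the rest of this analysis
lower_names <- tolower(original_names)

snake_names
lower_names
```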

Consistency of date and time columns

Now that we have verified our column names and changed them to lower case, we will focus on cleaning the date-time format of daily_activity and daily_sleep, since we will merge both data frames. Since we can disregard the time in the daily_sleep data frame, we use as_date instead of as_datetime.

daily_activity <- daily_activity %>%
  rename(date = activitydate) %>%
  mutate(date = as_date(date, format = "%m/%d/%Y"))

daily_sleep <- daily_sleep %>%
  rename(date = sleepday) %>%
  # as_date() ignores time zones, so no tz argument is passed
  mutate(date = as_date(date, format = "%m/%d/%Y %I:%M:%S %p"))
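
lubridate's `mdy()` and `mdy_hms()` helpers offer another way to parse these strings without writing a format string. This is a sketch of that alternative, not the method used above; the two strings are copied from the first rows of the datasets.

```r
library(lubridate)

# Date-only string, as in dailyActivity_merged.csv
activity_date <- mdy("4/12/2016")

# Date-time string, as in sleepDay_merged.csv; as_date() drops the time part
sleep_date <- as_date(mdy_hms("4/12/2016 12:00:00 AM"))

activity_date
sleep_date
```

Both values are of class Date, so they can serve as a common join key.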

We will check our cleaned datasets

head(daily_activity)
## # A tibble: 6 × 15
##           id date       totalsteps totaldistance trackerdistance
##        <dbl> <date>          <dbl>         <dbl>           <dbl>
## 1 1503960366 2016-04-12      13162          8.5             8.5 
## 2 1503960366 2016-04-13      10735          6.97            6.97
## 3 1503960366 2016-04-14      10460          6.74            6.74
## 4 1503960366 2016-04-15       9762          6.28            6.28
## 5 1503960366 2016-04-16      12669          8.16            8.16
## 6 1503960366 2016-04-17       9705          6.48            6.48
## # ℹ 10 more variables: loggedactivitiesdistance <dbl>,
## #   veryactivedistance <dbl>, moderatelyactivedistance <dbl>,
## #   lightactivedistance <dbl>, sedentaryactivedistance <dbl>,
## #   veryactiveminutes <dbl>, fairlyactiveminutes <dbl>,
## #   lightlyactiveminutes <dbl>, sedentaryminutes <dbl>, calories <dbl>
head(daily_sleep)
## # A tibble: 6 × 5
##           id date       totalsleeprecords totalminutesasleep totaltimeinbed
##        <dbl> <date>                 <dbl>              <dbl>          <dbl>
## 1 1503960366 2016-04-12                 1                327            346
## 2 1503960366 2016-04-13                 2                384            407
## 3 1503960366 2016-04-15                 1                412            442
## 4 1503960366 2016-04-16                 2                340            367
## 5 1503960366 2016-04-17                 1                700            712
## 6 1503960366 2016-04-19                 1                304            320

For our hourly_steps dataset we will convert date string to date-time.

hourly_steps<- hourly_steps %>% 
  rename(date_time = activityhour) %>% 
  mutate(date_time = as.POSIXct(date_time,format ="%m/%d/%Y %I:%M:%S %p" , tz=Sys.timezone()))

head(hourly_steps)
## # A tibble: 6 × 3
##           id date_time           steptotal
##        <dbl> <dttm>                  <dbl>
## 1 1503960366 2016-04-12 00:00:00       373
## 2 1503960366 2016-04-12 01:00:00       160
## 3 1503960366 2016-04-12 02:00:00       151
## 4 1503960366 2016-04-12 03:00:00         0
## 5 1503960366 2016-04-12 04:00:00         0
## 6 1503960366 2016-04-12 05:00:00         0

Merging Datasets

We will merge daily_activity and daily_sleep on id and date, which together act as the key, to look for correlations between variables. Because merge() keeps only rows present in both data frames, the result has 410 rows.

daily_activity_sleep <- merge(daily_activity, daily_sleep, by=c ("id", "date"))
glimpse(daily_activity_sleep)
## Rows: 410
## Columns: 18
## $ id                       <dbl> 1503960366, 1503960366, 1503960366, 150396036…
## $ date                     <date> 2016-04-12, 2016-04-13, 2016-04-15, 2016-04-…
## $ totalsteps               <dbl> 13162, 10735, 9762, 12669, 9705, 15506, 10544…
## $ totaldistance            <dbl> 8.50, 6.97, 6.28, 8.16, 6.48, 9.88, 6.68, 6.3…
## $ trackerdistance          <dbl> 8.50, 6.97, 6.28, 8.16, 6.48, 9.88, 6.68, 6.3…
## $ loggedactivitiesdistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ veryactivedistance       <dbl> 1.88, 1.57, 2.14, 2.71, 3.19, 3.53, 1.96, 1.3…
## $ moderatelyactivedistance <dbl> 0.55, 0.69, 1.26, 0.41, 0.78, 1.32, 0.48, 0.3…
## $ lightactivedistance      <dbl> 6.06, 4.71, 2.83, 5.04, 2.51, 5.03, 4.24, 4.6…
## $ sedentaryactivedistance  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ veryactiveminutes        <dbl> 25, 21, 29, 36, 38, 50, 28, 19, 41, 39, 73, 3…
## $ fairlyactiveminutes      <dbl> 13, 19, 34, 10, 20, 31, 12, 8, 21, 5, 14, 23,…
## $ lightlyactiveminutes     <dbl> 328, 217, 209, 221, 164, 264, 205, 211, 262, …
## $ sedentaryminutes         <dbl> 728, 776, 726, 773, 539, 775, 818, 838, 732, …
## $ calories                 <dbl> 1985, 1797, 1745, 1863, 1728, 2035, 1786, 177…
## $ totalsleeprecords        <dbl> 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ totalminutesasleep       <dbl> 327, 384, 412, 340, 700, 304, 360, 325, 361, …
## $ totaltimeinbed           <dbl> 346, 407, 442, 367, 712, 320, 377, 364, 384, …
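
The effect of the join type on the number of rows can be sketched with dplyr's join functions. The data below is made up; `activity_toy` and `sleep_toy` are hypothetical stand-ins for the two daily data frames.

```r
library(dplyr)

# Made-up daily tables sharing the keys id and date (illustrative only)
activity_toy <- tibble(
  id = c(1, 1, 2),
  date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
  totalsteps = c(13162, 10735, 8000)
)
sleep_toy <- tibble(
  id = c(1, 2),
  date = as.Date(c("2016-04-12", "2016-04-12")),
  totalminutesasleep = c(327, 412)
)

# Inner join: keeps only days present in both tables (what merge() does by default)
both <- inner_join(activity_toy, sleep_toy, by = c("id", "date"))

# Left join: keeps every activity day, with NA where no sleep was logged
all_activity <- left_join(activity_toy, sleep_toy, by = c("id", "date"))

nrow(both)
nrow(all_activity)
```

An inner join suits a steps-versus-sleep comparison; a left join would be the choice if activity-only days should be kept.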

5. Analyze Phase and Share Phase

We will analyze trends among FitBit users and determine whether they can inform Bellabeat’s marketing strategy.

Type of users per activity level

Since we don’t have any demographic variables for our sample, we want to determine the type of users with the data we have. We can classify users by activity level based on their daily step count, as follows:

Sedentary - Less than 5000 steps a day.

Lightly active - Between 5000 and 7499 steps a day.

Fairly active - Between 7500 and 9999 steps a day.

Very active - 10000 or more steps a day.

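First we will calculate the average daily steps, calories, and sleep for each user. Because the averages come from the merged data frame, they cover only the 24 users who logged sleep.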

daily_average <- daily_activity_sleep %>%
  group_by(id) %>%
  summarise (mean_daily_steps = mean(totalsteps), mean_daily_calories = mean(calories), mean_daily_sleep = mean(totalminutesasleep))

head(daily_average)
## # A tibble: 6 × 4
##           id mean_daily_steps mean_daily_calories mean_daily_sleep
##        <dbl>            <dbl>               <dbl>            <dbl>
## 1 1503960366           12406.               1872.             360.
## 2 1644430081            7968.               2978.             294 
## 3 1844505072            3477                1676.             652 
## 4 1927972279            1490                2316.             417 
## 5 2026352035            5619.               1541.             506.
## 6 2320127002            5079                1804               61

We will now classify our users by the daily average steps.

user_type <- daily_average %>%
  mutate(user_type = case_when(
    mean_daily_steps < 5000 ~ "sedentary",
    # upper bounds are exclusive so that averages such as 7499.5 are not left unclassified
    mean_daily_steps >= 5000 & mean_daily_steps < 7500 ~ "lightly active", 
    mean_daily_steps >= 7500 & mean_daily_steps < 10000 ~ "fairly active", 
    mean_daily_steps >= 10000 ~ "very active"
  ))

head(user_type)
## # A tibble: 6 × 5
##           id mean_daily_steps mean_daily_calories mean_daily_sleep user_type    
##        <dbl>            <dbl>               <dbl>            <dbl> <chr>        
## 1 1503960366           12406.               1872.             360. very active  
## 2 1644430081            7968.               2978.             294  fairly active
## 3 1844505072            3477                1676.             652  sedentary    
## 4 1927972279            1490                2316.             417  sedentary    
## 5 2026352035            5619.               1541.             506. lightly acti…
## 6 2320127002            5079                1804               61  lightly acti…
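
The same classification can also be sketched with base R's `cut()`, which makes the interval limits explicit. The step averages below are made up and include values at and near the category limits; this is an alternative illustration, not the code used for the results.

```r
# Made-up average step counts, including values at and near the category limits
mean_steps <- c(3477, 5000, 7499.5, 9999.9, 10000, 12406)

# right = FALSE makes each interval closed on the left: [5000, 7500), [7500, 10000), ...
user_type <- cut(
  mean_steps,
  breaks = c(-Inf, 5000, 7500, 10000, Inf),
  labels = c("sedentary", "lightly active", "fairly active", "very active"),
  right = FALSE
)

data.frame(mean_steps, user_type)
```

Because the breaks are shared between neighbouring categories, no average can fall between two groups.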

Now that we have a new column with the user type we will create a data frame with the percentage of each user type to better visualize them on a graph.

user_type_percent <- user_type %>%
  group_by(user_type) %>%
  summarise(total = n()) %>%
  mutate(totals = sum(total)) %>%
  group_by(user_type) %>%
  summarise(total_percent = total / totals) %>%
  mutate(labels = scales::percent(total_percent))

user_type_percent$user_type <- factor(user_type_percent$user_type , levels = c("very active", "fairly active", "lightly active", "sedentary"))


head(user_type_percent)
## # A tibble: 4 × 3
##   user_type      total_percent labels
##   <fct>                  <dbl> <chr> 
## 1 fairly active          0.375 38%   
## 2 lightly active         0.208 21%   
## 3 sedentary              0.208 21%   
## 4 very active            0.208 21%
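
A shorter way to get the same kind of percentage table, sketched here on made-up data, is dplyr's `count()` followed by a division by the overall total. `user_type_toy` is a hypothetical stand-in for the user_type data frame.

```r
library(dplyr)

# Made-up classification result: one row per user (illustrative only)
user_type_toy <- tibble(
  user_type = c("sedentary", "very active", "fairly active", "fairly active")
)

# count() tallies users per type; dividing by the overall total gives the share
user_share <- user_type_toy %>%
  count(user_type, name = "total") %>%
  mutate(
    total_percent = total / sum(total),
    labels = scales::percent(total_percent)
  )

user_share
```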

Below we can see that users are fairly evenly distributed across activity levels based on their average daily steps. This suggests that users of all activity levels wear smart devices.

user_type_percent %>%
  ggplot(aes(x="",y=total_percent, fill=user_type)) +
  geom_bar(stat = "identity", width = 1)+
  coord_polar("y", start=0)+
  theme_minimal()+
  theme(axis.title.x= element_blank(),
        axis.title.y = element_blank(),
        panel.border = element_blank(), 
        panel.grid = element_blank(), 
        axis.ticks = element_blank(),
        axis.text.x = element_blank(),
        plot.title = element_text(hjust = 0.5, size=14, face = "bold")) +
  scale_fill_manual(values = c("#85e085","#e6e600", "#ffd480", "#ff8080")) +
  geom_text(aes(label = labels),
            position = position_stack(vjust = 0.5))+
  labs(title="User type distribution")

Steps and minutes asleep per weekday

We now want to know on which days of the week users are more active and on which days they sleep more. We will also check whether users walk the recommended number of steps and get the recommended amount of sleep.

Below we derive the weekday from our date column and calculate the average steps walked and minutes slept per weekday.

weekday_steps_sleep <- daily_activity_sleep %>%
  mutate(weekday = weekdays(date))

weekday_steps_sleep$weekday <-ordered(weekday_steps_sleep$weekday, levels=c("Monday", "Tuesday", "Wednesday", "Thursday",
"Friday", "Saturday", "Sunday"))

weekday_steps_sleep <- weekday_steps_sleep %>%
  group_by(weekday) %>%
  summarize (daily_steps = mean(totalsteps), daily_sleep = mean(totalminutesasleep))

head(weekday_steps_sleep)
## # A tibble: 6 × 3
##   weekday   daily_steps daily_sleep
##   <ord>           <dbl>       <dbl>
## 1 Monday          9273.        420.
## 2 Tuesday         9183.        405.
## 3 Wednesday       8023.        435.
## 4 Thursday        8184.        401.
## 5 Friday          7901.        405.
## 6 Saturday        9871.        419.
ggarrange(
    ggplot(weekday_steps_sleep) +
      geom_col(aes(weekday, daily_steps), fill = "#006699") +
      geom_hline(yintercept = 7500) +
      labs(title = "Daily steps per weekday", x= "", y = "") +
      theme(axis.text.x = element_text(angle = 45,vjust = 0.5, hjust = 1)),
    ggplot(weekday_steps_sleep, aes(weekday, daily_sleep)) +
      geom_col(fill = "#85e0e0") +
      geom_hline(yintercept = 480) +
      labs(title = "Minutes asleep per weekday", x= "", y = "") +
      theme(axis.text.x = element_text(angle = 45,vjust = 0.5, hjust = 1))
  )

In the graphs above we can determine the following:

Users walk the recommended 7500 steps a day on every day except Sunday.

On average, users don’t sleep the recommended 8 hours (480 minutes).
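
The weekday ordering in this section is set by hand with a list of English day names. As a sketch of an alternative, lubridate's `wday()` can return a Monday-first ordered factor directly; the three dates are taken from the tracking period.

```r
library(lubridate)

dates <- as_date(c("2016-04-12", "2016-04-16", "2016-04-17"))

# week_start = 1 numbers the days Monday = 1 ... Sunday = 7, independent of locale
day_number <- wday(dates, week_start = 1)

# label = TRUE returns an ordered factor whose levels start on Monday;
# the label text itself follows the system locale
day_label <- wday(dates, label = TRUE, abbr = FALSE, week_start = 1)

day_number
levels(day_label)[1:2]
```

With this approach the level order does not depend on typing the day names correctly.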

Hourly steps throughout the day

Going deeper into our analysis, we want to know when exactly users are most active during the day.

We will use the hourly_steps data frame and split its date_time column into date and time.

hourly_steps <- hourly_steps %>%
  separate(date_time, into = c("date", "time"), sep= " ") %>%
  mutate(date = ymd(date)) 
## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 934 rows [1, 25, 49, 73,
## 97, 121, 145, 169, 193, 217, 241, 265, 289, 313, 337, 361, 385, 409, 433, 457,
## ...].
head(hourly_steps) 
## # A tibble: 6 × 4
##           id date       time     steptotal
##        <dbl> <date>     <chr>        <dbl>
## 1 1503960366 2016-04-12 <NA>           373
## 2 1503960366 2016-04-12 01:00:00       160
## 3 1503960366 2016-04-12 02:00:00       151
## 4 1503960366 2016-04-12 03:00:00         0
## 5 1503960366 2016-04-12 04:00:00         0
## 6 1503960366 2016-04-12 05:00:00         0
hourly_steps %>%
  group_by(time) %>%
  summarize(average_steps = mean(steptotal)) %>%
  ggplot() +
  geom_col(mapping = aes(x=time, y = average_steps, fill = average_steps)) + 
  labs(title = "Hourly steps throughout the day", x="", y="") + 
  scale_fill_gradient(low = "green", high = "red")+
  theme(axis.text.x = element_text(angle = 90))

We can see that users are most active between 8am and 7pm, walking the most steps at lunchtime (12pm to 2pm) and in the evening (5pm to 7pm).
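One caveat for the hourly chart: the separate() warning above reports 934 rows with no time piece, and the printed rows suggest these are the midnight readings (the row before 01:00:00 has time NA). If that assumption holds, those rows are grouped under NA rather than under 00:00:00. The sketch below shows one possible way to handle this. It uses toy data that copies the first three rows shown above, and toy_hourly and toy_hourly_fixed are illustrative names.

```r
library(dplyr)
library(tidyr)

# Toy data mimicking the first rows above; the midnight row has no time part
toy_hourly <- tibble(
  id        = c(1503960366, 1503960366, 1503960366),
  date_time = c("2016-04-12", "2016-04-12 01:00:00", "2016-04-12 02:00:00"),
  steptotal = c(373, 160, 151)
)

toy_hourly_fixed <- toy_hourly %>%
  # fill = "right" pads the missing time piece with NA without raising a warning
  separate(date_time, into = c("date", "time"), sep = " ", fill = "right") %>%
  # Assumption: a missing time piece means midnight
  mutate(time = replace_na(time, "00:00:00"))

print(toy_hourly_fixed)
```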

Correlations

We will now determine if there is any correlation between different variables:

Daily steps and daily sleep

Daily steps and calories

ggarrange(
ggplot(daily_activity_sleep, aes(x=totalsteps, y=totalminutesasleep))+
  geom_jitter() +
  geom_smooth(color = "red") + 
  labs(title = "Daily steps vs Minutes asleep", x = "Daily steps", y= "Minutes asleep") +
   theme(panel.background = element_blank(),
        plot.title = element_text( size=14)), 
ggplot(daily_activity_sleep, aes(x=totalsteps, y=calories))+
  geom_jitter() +
  geom_smooth(color = "red") + 
  labs(title = "Daily steps vs Calories", x = "Daily steps", y= "Calories") +
   theme(panel.background = element_blank(),
        plot.title = element_text( size=14))
)
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Per our plots:

There is no clear correlation between daily activity level, measured in steps, and the number of minutes users sleep per day.

In contrast, there is a positive correlation between steps and calories burned: as expected, the more steps users walk, the more calories they tend to burn.
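The scatter plots give a visual impression; a correlation coefficient puts a number on it. The sketch below uses base R's cor() on a small made-up pair of vectors (not the Fitbit data) to show the call. On the real data the equivalent would be cor(daily_activity_sleep$totalsteps, daily_activity_sleep$calories), assuming those columns contain no missing values.

```r
# Made-up illustrative values, NOT taken from the Fitbit dataset
toy_steps    <- c(2000, 5000, 8000, 11000, 14000)
toy_calories <- c(1600, 1900, 2300, 2500, 3000)

# Pearson correlation coefficient: ranges from -1 to 1,
# values near 0 indicate no linear relationship
r_steps_calories <- cor(toy_steps, toy_calories, method = "pearson")
print(r_steps_calories)
```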

Use of smart device

Days used smart device

Now that we have seen some trends in activity, sleep and calories burned, we want to see how often the users in our sample use their device. That way we can plan our marketing strategy and see which features would benefit the use of smart devices.

We will calculate the number of days each user used their smart device and classify our sample into three categories, knowing that the date interval is 31 days:

high use - users who use their device between 21 and 31 days.

moderate use - users who use their device between 11 and 20 days.

low use - users who use their device between 1 and 10 days.

First we will create a new data frame grouping by id, calculating the number of days used and creating a new column with the classification explained above.

daily_use <- daily_activity_sleep %>%
  group_by(id) %>%
  # n() counts the rows per user, which this analysis treats as days used
  summarize(days_used = n()) %>%
  mutate(usage = case_when(
    days_used >= 1 & days_used <= 10 ~ "low use",
    days_used >= 11 & days_used <= 20 ~ "moderate use",
    days_used >= 21 & days_used <= 31 ~ "high use"
  ))
  
head(daily_use)
## # A tibble: 6 × 3
##           id days_used usage   
##        <dbl>     <int> <chr>   
## 1 1503960366        25 high use
## 2 1644430081         4 low use 
## 3 1844505072         3 low use 
## 4 1927972279         5 low use 
## 5 2026352035        28 high use
## 6 2320127002         1 low use
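
The same three-way classification can also be written with base R's cut(), which bins a numeric vector into labelled intervals. This is an alternative sketch, not the code used above. The days_used values are the six shown in the head(daily_use) output, and usage_cut is an illustrative name.

```r
# days_used values copied from the head(daily_use) output above
days_used <- c(25, 4, 3, 5, 28, 1)

# Intervals are (0, 10], (10, 20] and (20, 31], i.e. 1-10, 11-20 and 21-31 whole days
usage_cut <- cut(
  days_used,
  breaks = c(0, 10, 20, 31),
  labels = c("low use", "moderate use", "high use")
)

print(data.frame(days_used, usage_cut))
```

cut() returns a factor whose levels follow the order of the labels, so the result can be passed to a plot without a separate ordering step.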

We will now create a percentage data frame to better visualize the results in the graph. We are also ordering our usage levels.

daily_use_percent <- daily_use %>%
  group_by(usage) %>%
  summarise(total = n()) %>%
  mutate(totals = sum(total)) %>%
  group_by(usage) %>%
  summarise(total_percent = total / totals) %>%
  mutate(labels = scales::percent(total_percent))

daily_use_percent$usage <- factor(daily_use_percent$usage, levels = c("high use", "moderate use", "low use"))

head(daily_use_percent)
## # A tibble: 3 × 3
##   usage        total_percent labels
##   <fct>                <dbl> <chr> 
## 1 high use             0.5   50%   
## 2 low use              0.375 38%   
## 3 moderate use         0.125 12%

Now that we have our new table we can create our plot:

daily_use_percent %>%
  ggplot(aes(x="",y=total_percent, fill=usage)) +
  geom_bar(stat = "identity", width = 1)+
  coord_polar("y", start=0)+
  theme_minimal()+
  theme(axis.title.x= element_blank(),
        axis.title.y = element_blank(),
        panel.border = element_blank(), 
        panel.grid = element_blank(), 
        axis.ticks = element_blank(),
        axis.text.x = element_blank(),
        plot.title = element_text(hjust = 0.5, size=14, face = "bold")) +
  geom_text(aes(label = labels),
            position = position_stack(vjust = 0.5))+
  scale_fill_manual(values = c("#006633","#00e673","#80ffbf"),
                    labels = c("High use - 21 to 31 days",
                                 "Moderate use - 11 to 20 days",
                                 "Low use - 1 to 10 days"))+
  labs(title="Daily use of smart device")

Analyzing our results we can see that:

50% of the users in our sample use their device frequently, between 21 and 31 days.

12% use their device between 11 and 20 days.

38% of our sample rarely use their device.

Time used smart device

To be more precise, we want to see how many minutes users wear their device per day. For that we will merge the daily_use data frame we created with daily_activity, so that we can also filter the results by daily use of the device.

daily_use_merged <- merge(daily_activity, daily_use, by=c ("id"))
head(daily_use_merged)
##           id       date totalsteps totaldistance trackerdistance
## 1 1503960366 2016-05-07      11992          7.71            7.71
## 2 1503960366 2016-05-06      12159          8.03            8.03
## 3 1503960366 2016-05-01      10602          6.81            6.81
## 4 1503960366 2016-04-30      14673          9.25            9.25
## 5 1503960366 2016-04-12      13162          8.50            8.50
## 6 1503960366 2016-04-13      10735          6.97            6.97
##   loggedactivitiesdistance veryactivedistance moderatelyactivedistance
## 1                        0               2.46                     2.12
## 2                        0               1.97                     0.25
## 3                        0               2.29                     1.60
## 4                        0               3.56                     1.42
## 5                        0               1.88                     0.55
## 6                        0               1.57                     0.69
##   lightactivedistance sedentaryactivedistance veryactiveminutes
## 1                3.13                       0                37
## 2                5.81                       0                24
## 3                2.92                       0                33
## 4                4.27                       0                52
## 5                6.06                       0                25
## 6                4.71                       0                21
##   fairlyactiveminutes lightlyactiveminutes sedentaryminutes calories days_used
## 1                  46                  175              833     1821        25
## 2                   6                  289              754     1896        25
## 3                  35                  246              730     1820        25
## 4                  34                  217              712     1947        25
## 5                  13                  328              728     1985        25
## 6                  19                  217              776     1797        25
##      usage
## 1 high use
## 2 high use
## 3 high use
## 4 high use
## 5 high use
## 6 high use

We need to create a new data frame calculating the total amount of minutes users wore the device every day and creating three different categories:

• All day - device was worn all day.

• More than half day - device was worn more than half of the day.

• Less than half day - device was worn less than half of the day.

minutes_worn <- daily_use_merged %>% 
  mutate(total_minutes_worn = veryactiveminutes+fairlyactiveminutes+lightlyactiveminutes+sedentaryminutes)%>%
  mutate (percent_minutes_worn = (total_minutes_worn/1440)*100) %>%
  mutate (worn = case_when(
    percent_minutes_worn == 100 ~ "All day",
    percent_minutes_worn < 100 & percent_minutes_worn >= 50~ "More than half day", 
    percent_minutes_worn < 50 & percent_minutes_worn > 0 ~ "Less than half day"
  ))

head(minutes_worn)
##           id       date totalsteps totaldistance trackerdistance
## 1 1503960366 2016-05-07      11992          7.71            7.71
## 2 1503960366 2016-05-06      12159          8.03            8.03
## 3 1503960366 2016-05-01      10602          6.81            6.81
## 4 1503960366 2016-04-30      14673          9.25            9.25
## 5 1503960366 2016-04-12      13162          8.50            8.50
## 6 1503960366 2016-04-13      10735          6.97            6.97
##   loggedactivitiesdistance veryactivedistance moderatelyactivedistance
## 1                        0               2.46                     2.12
## 2                        0               1.97                     0.25
## 3                        0               2.29                     1.60
## 4                        0               3.56                     1.42
## 5                        0               1.88                     0.55
## 6                        0               1.57                     0.69
##   lightactivedistance sedentaryactivedistance veryactiveminutes
## 1                3.13                       0                37
## 2                5.81                       0                24
## 3                2.92                       0                33
## 4                4.27                       0                52
## 5                6.06                       0                25
## 6                4.71                       0                21
##   fairlyactiveminutes lightlyactiveminutes sedentaryminutes calories days_used
## 1                  46                  175              833     1821        25
## 2                   6                  289              754     1896        25
## 3                  35                  246              730     1820        25
## 4                  34                  217              712     1947        25
## 5                  13                  328              728     1985        25
## 6                  19                  217              776     1797        25
##      usage total_minutes_worn percent_minutes_worn               worn
## 1 high use               1091             75.76389 More than half day
## 2 high use               1073             74.51389 More than half day
## 3 high use               1044             72.50000 More than half day
## 4 high use               1015             70.48611 More than half day
## 5 high use               1094             75.97222 More than half day
## 6 high use               1033             71.73611 More than half day
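
To make the calculation concrete, the first row above can be checked by hand: the four activity-minute columns are summed and divided by the 1,440 minutes in a day. The values below are copied from that row; the variable names are illustrative.

```r
# Minute columns from the first row of head(minutes_worn) above
very_active    <- 37
fairly_active  <- 46
lightly_active <- 175
sedentary      <- 833

total_minutes_worn   <- very_active + fairly_active + lightly_active + sedentary
percent_minutes_worn <- total_minutes_worn / 1440 * 100  # 1440 = 24 hours * 60 minutes

# Same thresholds as the case_when() above
worn <- if (percent_minutes_worn == 100) {
  "All day"
} else if (percent_minutes_worn >= 50) {
  "More than half day"
} else if (percent_minutes_worn > 0) {
  "Less than half day"
} else {
  NA_character_
}

print(c(total_minutes_worn, round(percent_minutes_worn, 5)))
print(worn)
```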

As before, to better visualize our results we will create new data frames. In this case we will create four data frames so that we can later arrange them in a single visualization.

The first data frame covers all users and calculates the percentage falling into each of the three wear-time categories created above.

The other three data frames are filtered by daily-use category, so that we can also compare daily use with time worn.

minutes_worn_percent<- minutes_worn%>%
  group_by(worn) %>%
  summarise(total = n()) %>%
  mutate(totals = sum(total)) %>%
  group_by(worn) %>%
  summarise(total_percent = total / totals) %>%
  mutate(labels = scales::percent(total_percent))


minutes_worn_highuse <- minutes_worn%>%
  filter (usage == "high use")%>%
  group_by(worn) %>%
  summarise(total = n()) %>%
  mutate(totals = sum(total)) %>%
  group_by(worn) %>%
  summarise(total_percent = total / totals) %>%
  mutate(labels = scales::percent(total_percent))

minutes_worn_moduse <- minutes_worn%>%
  filter(usage == "moderate use") %>%
  group_by(worn) %>%
  summarise(total = n()) %>%
  mutate(totals = sum(total)) %>%
  group_by(worn) %>%
  summarise(total_percent = total / totals) %>%
  mutate(labels = scales::percent(total_percent))

minutes_worn_lowuse <- minutes_worn%>%
  filter (usage == "low use") %>%
  group_by(worn) %>%
  summarise(total = n()) %>%
  mutate(totals = sum(total)) %>%
  group_by(worn) %>%
  summarise(total_percent = total / totals) %>%
  mutate(labels = scales::percent(total_percent))

minutes_worn_highuse$worn <- factor(minutes_worn_highuse$worn, levels = c("All day", "More than half day", "Less than half day"))
minutes_worn_percent$worn <- factor(minutes_worn_percent$worn, levels = c("All day", "More than half day", "Less than half day"))
minutes_worn_moduse$worn <- factor(minutes_worn_moduse$worn, levels = c("All day", "More than half day", "Less than half day"))
minutes_worn_lowuse$worn <- factor(minutes_worn_lowuse$worn, levels = c("All day", "More than half day", "Less than half day"))

head(minutes_worn_percent)
## # A tibble: 3 × 3
##   worn               total_percent labels
##   <fct>                      <dbl> <chr> 
## 1 All day                   0.365  36%   
## 2 Less than half day        0.0351 4%    
## 3 More than half day        0.600  60%
head(minutes_worn_highuse)
## # A tibble: 3 × 3
##   worn               total_percent labels
##   <fct>                      <dbl> <chr> 
## 1 All day                   0.0676 6.8%  
## 2 Less than half day        0.0432 4.3%  
## 3 More than half day        0.889  88.9%
head(minutes_worn_moduse)
## # A tibble: 3 × 3
##   worn               total_percent labels
##   <fct>                      <dbl> <chr> 
## 1 All day                    0.267 27%   
## 2 Less than half day         0.04  4%    
## 3 More than half day         0.693 69%
head(minutes_worn_lowuse)
## # A tibble: 3 × 3
##   worn               total_percent labels
##   <fct>                      <dbl> <chr> 
## 1 All day                   0.802  80%   
## 2 Less than half day        0.0224 2%    
## 3 More than half day        0.175  18%
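
The four summaries above repeat the same group-and-divide steps. One way to cut the repetition is a small helper function. The sketch below is an assumption about how this could be refactored, not the code that produced the output above: percent_by() is a hypothetical helper, toy_worn is made-up data, and the code relies on the dplyr and scales packages being installed.

```r
library(dplyr)

# Hypothetical helper: share of rows per category of `column`
percent_by <- function(data, column) {
  data %>%
    count({{ column }}, name = "total") %>%       # rows per category
    mutate(
      total_percent = total / sum(total),         # share of all rows
      labels = scales::percent(total_percent)     # formatted label for plotting
    ) %>%
    select(-total)
}

# Made-up toy data, NOT taken from the Fitbit dataset
toy_worn <- tibble(
  usage = c("high use", "high use", "low use", "low use"),
  worn  = c("All day", "More than half day", "All day", "All day")
)

toy_all  <- percent_by(toy_worn, worn)
toy_high <- toy_worn %>% filter(usage == "high use") %>% percent_by(worn)

print(toy_all)
print(toy_high)
```

With the real data, the high-use summary would then read minutes_worn %>% filter(usage == "high use") %>% percent_by(worn).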

Now that we have created the four data frames and ordered the worn categories, we can visualize our results in the following plots. All the plots are arranged together for easier comparison.

ggarrange(
  ggplot(minutes_worn_percent, aes(x="",y=total_percent, fill=worn)) +
  geom_bar(stat = "identity", width = 1)+
  coord_polar("y", start=0)+
  theme_minimal()+
  theme(axis.title.x= element_blank(),
        axis.title.y = element_blank(),
        panel.border = element_blank(), 
        panel.grid = element_blank(), 
        axis.ticks = element_blank(),
        axis.text.x = element_blank(),
        plot.title = element_text(hjust = 0.5, size=14, face = "bold"),
        plot.subtitle = element_text(hjust = 0.5)) +
    scale_fill_manual(values = c("#004d99", "#3399ff", "#cce6ff"))+
  geom_text(aes(label = labels),
            position = position_stack(vjust = 0.5), size = 3.5)+
  labs(title="Time worn per day", subtitle = "Total Users"),
  ggarrange(
  ggplot(minutes_worn_highuse, aes(x="",y=total_percent, fill=worn)) +
  geom_bar(stat = "identity", width = 1)+
  coord_polar("y", start=0)+
  theme_minimal()+
  theme(axis.title.x= element_blank(),
        axis.title.y = element_blank(),
        panel.border = element_blank(), 
        panel.grid = element_blank(), 
        axis.ticks = element_blank(),
        axis.text.x = element_blank(),
        plot.title = element_text(hjust = 0.5, size=14, face = "bold"),
        plot.subtitle = element_text(hjust = 0.5), 
        legend.position = "none")+
    scale_fill_manual(values = c("#004d99", "#3399ff", "#cce6ff"))+
  geom_text_repel(aes(label = labels),
            position = position_stack(vjust = 0.5), size = 3)+
  labs(title="", subtitle = "High use - Users"), 
  ggplot(minutes_worn_moduse, aes(x="",y=total_percent, fill=worn)) +
  geom_bar(stat = "identity", width = 1)+
  coord_polar("y", start=0)+
  theme_minimal()+
  theme(axis.title.x= element_blank(),
        axis.title.y = element_blank(),
        panel.border = element_blank(), 
        panel.grid = element_blank(), 
        axis.ticks = element_blank(),
        axis.text.x = element_blank(),
        plot.title = element_text(hjust = 0.5, size=14, face = "bold"), 
        plot.subtitle = element_text(hjust = 0.5),
        legend.position = "none") +
    scale_fill_manual(values = c("#004d99", "#3399ff", "#cce6ff"))+
  geom_text(aes(label = labels),
            position = position_stack(vjust = 0.5), size = 3)+
  labs(title="", subtitle = "Moderate use - Users"), 
  ggplot(minutes_worn_lowuse, aes(x="",y=total_percent, fill=worn)) +
  geom_bar(stat = "identity", width = 1)+
  coord_polar("y", start=0)+
  theme_minimal()+
  theme(axis.title.x= element_blank(),
        axis.title.y = element_blank(),
        panel.border = element_blank(), 
        panel.grid = element_blank(), 
        axis.ticks = element_blank(),
        axis.text.x = element_blank(),
        plot.title = element_text(hjust = 0.5, size=14, face = "bold"), 
        plot.subtitle = element_text(hjust = 0.5),
        legend.position = "none") +
    scale_fill_manual(values = c("#004d99", "#3399ff", "#cce6ff"))+
  geom_text(aes(label = labels),
            position = position_stack(vjust = 0.5), size = 3)+
  labs(title="", subtitle = "Low use - Users"), 
  ncol = 3), 
  nrow = 2)

Per our plots, 36% of all users wear the device all day, 60% wear it more than half the day and just 4% wear it less than half the day.

If we split the users by the number of days they used the device and check how long they wore it each day, we get the following results:

Just a reminder:

high use - users who use their device between 21 and 31 days.

moderate use - users who use their device between 11 and 20 days.

low use - users who use their device between 1 and 10 days.

High users - Just 6.8% of the users who used their device between 21 and 31 days wear it all day. 88.9% wear the device more than half the day but not all day.

Moderate users - 27% wear the device all day and 69% wear it more than half the day.

Low users - They wear their device the longest on the days they use it: 80% wear it all day.

6. Conclusion (Act Phase)

Bellabeat’s mission is to empower women by giving them the data they need to understand their own health and habits.

To answer our business task and support Bellabeat’s mission, based on our results I would advise using Bellabeat’s own tracking data for further analysis. The datasets used have a small sample and may be biased, since we had no demographic details about the users. Knowing that our main target is young and adult women, I would encourage continuing to look for trends in order to build a marketing strategy focused on them.

That being said, our analysis found several trends that could help our online campaign and improve the Bellabeat app:

  1. Step Up for Health:

• Insight: Users average over 7,500 steps daily (except Sundays), but on some days, such as Friday (about 7,900 steps), they fall short of the 8,000-step target attributed to the CDC.

• Action: Implement personalized step notifications and educational app posts:

o Send timely reminders to encourage users to reach 8,000 steps.

o Share engaging content emphasizing the health benefits of increased activity, citing credible sources like the CDC.

o Highlight the positive correlation between steps and calorie expenditure.

• Expected Outcome: Increased user activity levels and improved health outcomes.

  2. Sweet Dreams Made Easy:

• Insight: Users generally sleep less than 8 hours per night.

• Action: Offer sleep-supportive features:

o Allow users to set desired sleep times and receive customizable bedtime reminders.

o Provide a library of sleep resources, including:

▪ Breathing exercises

▪ Relaxing music and sleep sounds

▪ Sleep hygiene tips and techniques

• Expected Outcome: Improved sleep quality and user well-being.

  3. Gamified Goals:

• Insight: Not all users respond to reminders and education.

• Action: Introduce a limited-time gamified activity challenge:

o Implement a level system based on daily step goals.

o Require sustained activity for a set period (e.g., a month) to progress.

o Reward level achievement with redeemable stars for merchandise or Bellabeat product discounts.

• Expected Outcome: Increased user engagement and motivation, leading to higher activity levels.

Overall:

• Personalization: Tailor these recommendations to user preferences and data insights for a more impactful experience.

• Continuous Improvement: Test, monitor, and adapt these features based on user feedback and data analysis.

• Long-Term Engagement: Focus on strategies that keep users actively engaged with the app over time.

THANK YOU!