How Can a Wellness Technology Company Play It Smart?

This case study addresses the following questions for Bellabeat:

  1. What are some trends in smart device usage?
  2. How could these trends apply to Bellabeat customers?
  3. How could these trends help influence Bellabeat marketing strategy?

Background information about the company

Urška Sršen and Sando Mur founded Bellabeat, a high-tech company that manufactures health-focused smart products. Sršen used her background as an artist to develop beautifully designed technology that informs and inspires women around the world. Collecting data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with knowledge about their own health and habits. Since it was founded in 2013, Bellabeat has grown rapidly and quickly positioned itself as a tech-driven wellness company for women.

By 2016, Bellabeat had opened offices around the world and launched multiple products. Bellabeat products became available through a growing number of online retailers in addition to their own e-commerce channel on their website. The company has invested in traditional advertising media, such as radio, out-of-home billboards, print, and television, but focuses on digital marketing extensively. Bellabeat invests year-round in Google Search, maintains active Facebook and Instagram pages, and consistently engages consumers on Twitter. Additionally, Bellabeat runs video ads on YouTube and display ads on the Google Display Network to support campaigns around key marketing dates.

Sršen knows that an analysis of Bellabeat’s available consumer data would reveal more opportunities for growth. She has asked the marketing analytics team to focus on a Bellabeat product and analyze smart device usage data in order to gain insight into how people are already using their smart devices. Then, using this information, she would like high-level recommendations for how these trends can inform Bellabeat marketing strategy.

The data analysis followed these six steps:

  1. Ask - the questions are stated at the start of this case study.
  2. Prepare - the required packages were installed and public data on smart device users’ daily habits (FitBit Fitness Tracker Data) was obtained.
  3. Process - the data was imported into R, cleaned, and transformed.
  4. Analyse - the data was aggregated, calculations were performed, and trends and relationships were identified.
  5. Share - the data was visualized with ggplot2 to support high-level insights and recommendations.
  6. Act - the findings were presented in an R Markdown notebook.

Packages that were installed for data analysis

install.packages("tidyverse")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.1'
## (as 'lib' is unspecified)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.6     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.1.1     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(readr)
install.packages("janitor")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.1'
## (as 'lib' is unspecified)
install.packages("skimr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.1'
## (as 'lib' is unspecified)
library(janitor)
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(skimr)
install.packages("ggplot2")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.1'
## (as 'lib' is unspecified)
library(ggplot2)

The following Fitabase data was imported

library(readr)
dailyActivity_merged <- read_csv("/cloud/dailyActivity_merged.csv")
## Rows: 940 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): ActivityDate
## dbl (14): Id, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDi...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
hourlySteps_merged <- read_csv("/cloud/hourlySteps_merged.csv")
## Rows: 22099 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityHour
## dbl (2): Id, StepTotal
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
sleepDay_merged <- read_csv("/cloud/sleepDay_merged.csv")
## Rows: 413 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): SleepDay
## dbl (4): Id, TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
weightLogInfo_merged <- read_csv("/cloud/weightLogInfo_merged.csv")
## Rows: 67 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Date
## dbl (6): Id, WeightKg, WeightPounds, Fat, BMI, LogId
## lgl (1): IsManualReport
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
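
The import messages above show that Id is read as a number (dbl), which is why some of the previews later in this notebook abbreviate it as 1.50e9. One possible way to keep identifiers exact is to read Id as text. The sketch below is an assumption and not part of the original analysis: it uses base R's read.csv() rather than readr's read_csv(), and a made-up two-row extract in the same layout as hourlySteps_merged.csv.

```r
# Hypothetical two-row extract in the same layout as hourlySteps_merged.csv
csv_text <- "Id,ActivityHour,StepTotal
1503960366,4/12/2016 12:00:00 AM,373
1503960366,4/12/2016 1:00:00 AM,160"

# Base R reader, used so the sketch has no package dependencies.
# The named colClasses entry forces Id to be read as character;
# the other columns keep their default types.
steps <- read.csv(text = csv_text,
                  colClasses = c(Id = "character"),
                  stringsAsFactors = FALSE)

str(steps)
```

If Id is read as text in one dataset, it would need to be read as text in every dataset that is later merged on Id.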

Data processing: cleaning and transforming the data

The hourly step totals in hourlySteps_merged were converted to an average number of steps per minute so that step rates could be compared on a consistent scale. The following mutation was applied:

  • Transform hourlySteps_merged by dividing the hourly StepTotal by 60 to give StepTotalPerMin, stored in minutesSteps
minutesSteps <- mutate(hourlySteps_merged, StepTotalPerMin = StepTotal/60)
head(minutesSteps)
## # A tibble: 6 × 4
##           Id ActivityHour          StepTotal StepTotalPerMin
##        <dbl> <chr>                     <dbl>           <dbl>
## 1 1503960366 4/12/2016 12:00:00 AM       373            6.22
## 2 1503960366 4/12/2016 1:00:00 AM        160            2.67
## 3 1503960366 4/12/2016 2:00:00 AM        151            2.52
## 4 1503960366 4/12/2016 3:00:00 AM          0            0   
## 5 1503960366 4/12/2016 4:00:00 AM          0            0   
## 6 1503960366 4/12/2016 5:00:00 AM          0            0

Transformation of data formats

The date columns in dailyActivity, minutesSteps, sleepDay, and weightLogInfo were converted from character strings into a standard date-time type, and the date and time were then split into separate columns.

  • Transform the dailyActivity_merged date into mm/dd/yy format
dailyActivity_merged$ActivityDate=as.POSIXct(dailyActivity_merged$ActivityDate, format="%m/%d/%Y", tz=Sys.timezone())
dailyActivity_merged$date <- format(dailyActivity_merged$ActivityDate, format = "%m/%d/%y")
head(dailyActivity_merged)
## # A tibble: 6 × 16
##           Id ActivityDate        TotalSteps TotalDistance TrackerDistance
##        <dbl> <dttm>                   <dbl>         <dbl>           <dbl>
## 1 1503960366 2016-04-12 00:00:00      13162          8.5             8.5 
## 2 1503960366 2016-04-13 00:00:00      10735          6.97            6.97
## 3 1503960366 2016-04-14 00:00:00      10460          6.74            6.74
## 4 1503960366 2016-04-15 00:00:00       9762          6.28            6.28
## 5 1503960366 2016-04-16 00:00:00      12669          8.16            8.16
## 6 1503960366 2016-04-17 00:00:00       9705          6.48            6.48
## # … with 11 more variables: LoggedActivitiesDistance <dbl>,
## #   VeryActiveDistance <dbl>, ModeratelyActiveDistance <dbl>,
## #   LightActiveDistance <dbl>, SedentaryActiveDistance <dbl>,
## #   VeryActiveMinutes <dbl>, FairlyActiveMinutes <dbl>,
## #   LightlyActiveMinutes <dbl>, SedentaryMinutes <dbl>, Calories <dbl>,
## #   date <chr>
  • Transform the hourlySteps_merged date-time into an mm/dd/yy date and an HH:MM:SS time, then recreate minutesSteps.
hourlySteps_merged$ActivityHour = as.POSIXct(hourlySteps_merged$ActivityHour, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())
hourlySteps_merged$time <- format(hourlySteps_merged$ActivityHour, format = "%H:%M:%S")
hourlySteps_merged$date <- format(hourlySteps_merged$ActivityHour, format = "%m/%d/%y")
head(hourlySteps_merged)
## # A tibble: 6 × 5
##           Id ActivityHour        StepTotal time     date    
##        <dbl> <dttm>                  <dbl> <chr>    <chr>   
## 1 1503960366 2016-04-12 00:00:00       373 00:00:00 04/12/16
## 2 1503960366 2016-04-12 01:00:00       160 01:00:00 04/12/16
## 3 1503960366 2016-04-12 02:00:00       151 02:00:00 04/12/16
## 4 1503960366 2016-04-12 03:00:00         0 03:00:00 04/12/16
## 5 1503960366 2016-04-12 04:00:00         0 04:00:00 04/12/16
## 6 1503960366 2016-04-12 05:00:00         0 05:00:00 04/12/16
## Code re-entered to recreate the dataset with new format
minutesSteps <- mutate(hourlySteps_merged, StepTotalPerMin = StepTotal/60)
head(minutesSteps)
## # A tibble: 6 × 6
##           Id ActivityHour        StepTotal time     date     StepTotalPerMin
##        <dbl> <dttm>                  <dbl> <chr>    <chr>              <dbl>
## 1 1503960366 2016-04-12 00:00:00       373 00:00:00 04/12/16            6.22
## 2 1503960366 2016-04-12 01:00:00       160 01:00:00 04/12/16            2.67
## 3 1503960366 2016-04-12 02:00:00       151 02:00:00 04/12/16            2.52
## 4 1503960366 2016-04-12 03:00:00         0 03:00:00 04/12/16            0   
## 5 1503960366 2016-04-12 04:00:00         0 04:00:00 04/12/16            0   
## 6 1503960366 2016-04-12 05:00:00         0 05:00:00 04/12/16            0
  • Transform the sleepDay_merged date-time into an mm/dd/yy date and an HH:MM:SS time.
sleepDay_merged$SleepDay =as.POSIXct(sleepDay_merged$SleepDay, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())
sleepDay_merged$time <- format(sleepDay_merged$SleepDay, format = "%H:%M:%S")
sleepDay_merged$date <- format(sleepDay_merged$SleepDay, format = "%m/%d/%y")
head(sleepDay_merged)
## # A tibble: 6 × 7
##         Id SleepDay            TotalSleepRecords TotalMinutesAsl… TotalTimeInBed
##      <dbl> <dttm>                          <dbl>            <dbl>          <dbl>
## 1   1.50e9 2016-04-12 00:00:00                 1              327            346
## 2   1.50e9 2016-04-13 00:00:00                 2              384            407
## 3   1.50e9 2016-04-15 00:00:00                 1              412            442
## 4   1.50e9 2016-04-16 00:00:00                 2              340            367
## 5   1.50e9 2016-04-17 00:00:00                 1              700            712
## 6   1.50e9 2016-04-19 00:00:00                 1              304            320
## # … with 2 more variables: time <chr>, date <chr>
  • Transform the weightLogInfo_merged date-time into an mm/dd/yy date and an HH:MM:SS time.
weightLogInfo_merged$Date = as.POSIXct(weightLogInfo_merged$Date, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())
weightLogInfo_merged$time <- format(weightLogInfo_merged$Date, format = "%H:%M:%S")
weightLogInfo_merged$date <- format(weightLogInfo_merged$Date, format = "%m/%d/%y")
head(weightLogInfo_merged)
## # A tibble: 6 × 10
##          Id Date                WeightKg WeightPounds   Fat   BMI IsManualReport
##       <dbl> <dttm>                 <dbl>        <dbl> <dbl> <dbl> <lgl>         
## 1    1.50e9 2016-05-02 23:59:59     52.6         116.    22  22.6 TRUE          
## 2    1.50e9 2016-05-03 23:59:59     52.6         116.    NA  22.6 TRUE          
## 3    1.93e9 2016-04-13 01:08:52    134.          294.    NA  47.5 FALSE         
## 4    2.87e9 2016-04-21 23:59:59     56.7         125.    NA  21.5 TRUE          
## 5    2.87e9 2016-05-12 23:59:59     57.3         126.    NA  21.7 TRUE          
## 6    4.32e9 2016-04-17 23:59:59     72.4         160.    25  27.5 TRUE          
## # … with 3 more variables: LogId <dbl>, time <chr>, date <chr>
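
The same three steps (parse, extract date, extract time) are repeated for each dataset above, so they could be wrapped in one function. The following is a minimal sketch under stated assumptions: split_datetime() is a hypothetical helper name, and the time zone defaults to "UTC" for reproducibility instead of the Sys.timezone() used above. Parsing of the AM/PM marker (%p) can depend on the locale.

```r
# Hypothetical helper: parse a Fitabase-style timestamp vector and
# return the parsed date-time plus separate date and time strings.
split_datetime <- function(x, tz = "UTC") {
  parsed <- as.POSIXct(x, format = "%m/%d/%Y %I:%M:%S %p", tz = tz)
  data.frame(
    parsed = parsed,
    date   = format(parsed, "%m/%d/%y"),   # mm/dd/yy, as used for merging
    time   = format(parsed, "%H:%M:%S"),   # 24-hour clock
    stringsAsFactors = FALSE
  )
}

# Two timestamps in the same style as the ActivityHour column
example <- split_datetime(c("4/12/2016 12:00:00 AM", "4/12/2016 1:00:00 PM"))
print(example)
```

With the real data, the returned date and time columns could be attached to a dataset with cbind() or assigned column by column, as in the code above.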

The number of participants for each dataset

The n_distinct() function was used to count the participants in each dataset:

n_distinct(dailyActivity_merged$Id)
## [1] 33
n_distinct(minutesSteps$Id)
## [1] 33
n_distinct(sleepDay_merged$Id)
## [1] 24
n_distinct(weightLogInfo_merged$Id)
## [1] 8

The results show 33 participants in dailyActivity, 33 in minutesSteps, 24 in sleepDay, and 8 in weightLogInfo.

Adding the per-dataset counts together gives:

Total_participants <- n_distinct(dailyActivity_merged$Id) +
  n_distinct(minutesSteps$Id) +
  n_distinct(sleepDay_merged$Id) +
  n_distinct(weightLogInfo_merged$Id)

Total_participants
## [1] 98

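Note that 98 is the sum of the per-dataset counts, not the number of unique participants. The same Ids appear in more than one dataset (for example, Id 1503960366 appears in the activity, steps, and weight data), so individuals are counted more than once. The number of distinct participants is at least 33, the count in dailyActivity, and only 8 of them logged their weight. With a sample this small, the conclusions below should be treated as indicative rather than as unbiased estimates.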
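
A minimal sketch of counting participants across datasets, using made-up Id vectors rather than the study data: pooling the Ids before counting gives the number of distinct people, whereas adding per-dataset counts counts a person once for every dataset they appear in.

```r
# Hypothetical Id vectors standing in for the Id columns of three datasets
activity_ids <- c(101, 102, 103, 104)
sleep_ids    <- c(101, 102, 103)
weight_ids   <- c(102)

# Adding per-dataset counts counts repeated participants more than once
sum_of_counts <- length(unique(activity_ids)) +
  length(unique(sleep_ids)) +
  length(unique(weight_ids))

# Pooling the Ids first counts each participant exactly once
unique_participants <- length(unique(c(activity_ids, sleep_ids, weight_ids)))

print(c(sum_of_counts = sum_of_counts, unique_participants = unique_participants))
```

With the datasets in this notebook, the equivalent call would be n_distinct() applied to the combined Id columns, for example n_distinct(c(dailyActivity_merged$Id, sleepDay_merged$Id, weightLogInfo_merged$Id)).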

Merging of Datasets

The datasets were merged in pairs (dailyActivity with minutesSteps, and dailyActivity with weightLogInfo) so that trends and relationships could be compared more easily when plotting.

merged_activity_minuteSteps <- merge(dailyActivity_merged, minutesSteps,by=c('Id', 'date'))
head(merged_activity_minuteSteps)
##           Id     date ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366 04/12/16   2016-04-12      13162           8.5             8.5
## 2 1503960366 04/12/16   2016-04-12      13162           8.5             8.5
## 3 1503960366 04/12/16   2016-04-12      13162           8.5             8.5
## 4 1503960366 04/12/16   2016-04-12      13162           8.5             8.5
## 5 1503960366 04/12/16   2016-04-12      13162           8.5             8.5
## 6 1503960366 04/12/16   2016-04-12      13162           8.5             8.5
##   LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1                        0               1.88                     0.55
## 2                        0               1.88                     0.55
## 3                        0               1.88                     0.55
## 4                        0               1.88                     0.55
## 5                        0               1.88                     0.55
## 6                        0               1.88                     0.55
##   LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1                6.06                       0                25
## 2                6.06                       0                25
## 3                6.06                       0                25
## 4                6.06                       0                25
## 5                6.06                       0                25
## 6                6.06                       0                25
##   FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1                  13                  328              728     1985
## 2                  13                  328              728     1985
## 3                  13                  328              728     1985
## 4                  13                  328              728     1985
## 5                  13                  328              728     1985
## 6                  13                  328              728     1985
##          ActivityHour StepTotal     time StepTotalPerMin
## 1 2016-04-12 00:00:00       373 00:00:00        6.216667
## 2 2016-04-12 01:00:00       160 01:00:00        2.666667
## 3 2016-04-12 02:00:00       151 02:00:00        2.516667
## 4 2016-04-12 03:00:00         0 03:00:00        0.000000
## 5 2016-04-12 04:00:00         0 04:00:00        0.000000
## 6 2016-04-12 05:00:00         0 05:00:00        0.000000
merged_activity_weight <- merge(dailyActivity_merged, weightLogInfo_merged, by=c('Id', 'date'))
head(merged_activity_weight)
##           Id     date ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366 05/02/16   2016-05-02      14727          9.71            9.71
## 2 1503960366 05/03/16   2016-05-03      15103          9.66            9.66
## 3 1927972279 04/13/16   2016-04-13        356          0.25            0.25
## 4 2873212765 04/21/16   2016-04-21       8859          5.98            5.98
## 5 2873212765 05/12/16   2016-05-12       7566          5.11            5.11
## 6 4319703577 04/17/16   2016-04-17         29          0.02            0.02
##   LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1                        0               3.21                     0.57
## 2                        0               3.73                     1.05
## 3                        0               0.00                     0.00
## 4                        0               0.13                     0.37
## 5                        0               0.00                     0.00
## 6                        0               0.00                     0.00
##   LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1                5.92                    0.00                41
## 2                4.88                    0.00                50
## 3                0.25                    0.00                 0
## 4                5.47                    0.01                 2
## 5                5.11                    0.00                 0
## 6                0.02                    0.00                 0
##   FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1                  15                  277              798     2004
## 2                  24                  254              816     1990
## 3                   0                   32              986     2151
## 4                  10                  371             1057     1970
## 5                   0                  268              720     1431
## 6                   0                    3             1363     1464
##                  Date WeightKg WeightPounds Fat   BMI IsManualReport
## 1 2016-05-02 23:59:59     52.6     115.9631  22 22.65           TRUE
## 2 2016-05-03 23:59:59     52.6     115.9631  NA 22.65           TRUE
## 3 2016-04-13 01:08:52    133.5     294.3171  NA 47.54          FALSE
## 4 2016-04-21 23:59:59     56.7     125.0021  NA 21.45           TRUE
## 5 2016-05-12 23:59:59     57.3     126.3249  NA 21.69           TRUE
## 6 2016-04-17 23:59:59     72.4     159.6147  25 27.45           TRUE
##          LogId     time
## 1 1.462234e+12 23:59:59
## 2 1.462320e+12 23:59:59
## 3 1.460510e+12 01:08:52
## 4 1.461283e+12 23:59:59
## 5 1.463098e+12 23:59:59
## 6 1.460938e+12 23:59:59
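
In the first merged preview above, the daily values (for example TotalSteps 13162) repeat on every hourly row, because one daily record matches many hourly records. Any statistic computed on a daily column of that merged table is therefore weighted by the number of hourly rows per day. The sketch below illustrates this with hypothetical miniature tables (not the study data) and shows one way to get back to one row per Id and day.

```r
# Hypothetical miniature tables: one daily row and three hourly rows for one Id
daily <- data.frame(Id = 1, date = "04/12/16", TotalSteps = 684,
                    stringsAsFactors = FALSE)
hourly <- data.frame(Id = 1, date = "04/12/16",
                     time = c("00:00:00", "01:00:00", "02:00:00"),
                     StepTotal = c(373, 160, 151),
                     stringsAsFactors = FALSE)

# One-to-many merge: the daily row is repeated for each hourly row
merged <- merge(daily, hourly, by = c("Id", "date"))
print(nrow(merged))

# Drop the hourly columns and de-duplicate to recover one row per Id and day
daily_again <- unique(merged[, c("Id", "date", "TotalSteps")])
print(daily_again)
```

Daily-level summaries are safest when taken from dailyActivity_merged directly or from a de-duplicated table like daily_again.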

The summary of datasets

The min, median, max and other valuable insights about the datasets were determined using the summary function.

  • Summary of dailyActivity merged with minutesSteps
merged_activity_minuteSteps %>%  
  select(Id, date, TotalSteps, TotalDistance, Calories, StepTotalPerMin, StepTotal) %>%
  summary()
##        Id                date             TotalSteps    TotalDistance   
##  Min.   :1.504e+09   Length:22099       Min.   :    0   Min.   : 0.000  
##  1st Qu.:2.320e+09   Class :character   1st Qu.: 3974   1st Qu.: 2.680  
##  Median :4.445e+09   Mode  :character   Median : 7604   Median : 5.320  
##  Mean   :4.848e+09                      Mean   : 7752   Mean   : 5.572  
##  3rd Qu.:6.962e+09                      3rd Qu.:10771   3rd Qu.: 7.750  
##  Max.   :8.878e+09                      Max.   :36019   Max.   :28.030  
##     Calories    StepTotalPerMin      StepTotal      
##  Min.   : 120   Min.   :  0.0000   Min.   :    0.0  
##  1st Qu.:1841   1st Qu.:  0.0000   1st Qu.:    0.0  
##  Median :2162   Median :  0.6667   Median :   40.0  
##  Mean   :2336   Mean   :  5.3361   Mean   :  320.2  
##  3rd Qu.:2799   3rd Qu.:  5.9500   3rd Qu.:  357.0  
##  Max.   :4900   Max.   :175.9000   Max.   :10554.0
  • Summary of dailyActivity merged with weightLogInfo
merged_activity_weight %>% 
  select(Id, TotalSteps, TotalDistance, Calories, WeightKg, Fat, BMI) %>% 
  summary()
##        Id              TotalSteps    TotalDistance       Calories   
##  Min.   :1.504e+09   Min.   :   29   Min.   : 0.020   Min.   : 928  
##  1st Qu.:6.962e+09   1st Qu.: 8477   1st Qu.: 5.945   1st Qu.:1998  
##  Median :6.962e+09   Median :11101   Median : 8.110   Median :2174  
##  Mean   :7.009e+09   Mean   :12102   Mean   : 9.211   Mean   :2545  
##  3rd Qu.:8.878e+09   3rd Qu.:14996   3rd Qu.: 9.710   3rd Qu.:3258  
##  Max.   :8.878e+09   Max.   :29326   Max.   :26.720   Max.   :4552  
##                                                                     
##     WeightKg           Fat             BMI       
##  Min.   : 52.60   Min.   :22.00   Min.   :21.45  
##  1st Qu.: 61.40   1st Qu.:22.75   1st Qu.:23.96  
##  Median : 62.50   Median :23.50   Median :24.39  
##  Mean   : 72.04   Mean   :23.50   Mean   :25.19  
##  3rd Qu.: 85.05   3rd Qu.:24.25   3rd Qu.:25.56  
##  Max.   :133.50   Max.   :25.00   Max.   :47.54  
##                   NA's   :65
  • Summary of sleepDay
sleepDay_merged %>% 
  select(Id, TotalMinutesAsleep, TotalTimeInBed) %>% 
  summary()
##        Id            TotalMinutesAsleep TotalTimeInBed 
##  Min.   :1.504e+09   Min.   : 58.0      Min.   : 61.0  
##  1st Qu.:3.977e+09   1st Qu.:361.0      1st Qu.:403.0  
##  Median :4.703e+09   Median :433.0      Median :463.0  
##  Mean   :5.001e+09   Mean   :419.5      Mean   :458.6  
##  3rd Qu.:6.962e+09   3rd Qu.:490.0      3rd Qu.:526.0  
##  Max.   :8.792e+09   Max.   :796.0      Max.   :961.0

The median time asleep, converted from minutes to hours (433/60):

SleepTimeHr <- 433/60 
SleepTimeHr 
## [1] 7.216667
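
The same conversion can be applied to the mean, which is a little lower than the median. This small sketch takes both values from the sleepDay summary above; the variable names are illustrative only.

```r
# Median and mean of TotalMinutesAsleep, copied from the summary above
sleep_minutes <- c(median = 433, mean = 419.5)

# Convert minutes to hours and round to two decimal places
sleep_hours <- round(sleep_minutes / 60, 2)
print(sleep_hours)
```

With the data loaded, the hard-coded values could be replaced by median(sleepDay_merged$TotalMinutesAsleep) and mean(sleepDay_merged$TotalMinutesAsleep).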

Observations from the summaries are:

  • On average, participants were slightly less active than recommended: their mean was 7752 steps per day (median 7604), compared with the recommended 8000 steps per day.

  • Among the participants who logged their weight, the mean weight was 72.04 kg and the mean BMI was 25.19, just above the healthy BMI range of 18.5 to <25. The median BMI of 24.39 was within the healthy range.

  • Participants slept a healthy amount on the whole: adults are recommended to have 7-8 hours of sleep, and the median time asleep was 7.22 hours (the mean was 419.5 minutes, about 6.99 hours).

Plots

The following plot shows how calories burnt vary with the total distance covered by individuals.

ggplot(data = dailyActivity_merged) + 
  geom_point(mapping = aes(x = TotalDistance, y = Calories)) +
  geom_smooth(mapping = aes(x = TotalDistance, y = Calories)) +
labs(title="Total Distance vs Calories")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

There is a positive correlation between total distance and calories, which suggests that the more active individuals are, the more calories they burn.
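
The strength of a relationship like this can be quantified with a correlation coefficient. The sketch below uses five made-up daily records, not values from the study data, and base R's cor(), which computes the Pearson correlation by default.

```r
# Hypothetical daily records (illustrative values, not from the dataset)
total_distance <- c(0.5, 2.7, 5.3, 7.8, 9.7)
calories       <- c(1500, 1850, 2150, 2800, 3250)

# Pearson correlation: values near +1 indicate a strong positive linear relationship
r <- cor(total_distance, calories)
print(r)
```

With the data loaded, the same call would be cor(dailyActivity_merged$TotalDistance, dailyActivity_merged$Calories).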

Total daily steps were compared with the average steps per minute in each hour, and the following graph shows the findings:

ggplot(data = merged_activity_minuteSteps) + 
  geom_point(mapping = aes(x = TotalSteps, y = StepTotalPerMin, color = VeryActiveMinutes)) +
  labs(title = "Total Steps vs StepTotalPerMin")

The scatter plot above gives an indication of the pace at which individuals walked or ran. The most active individuals covered more steps in a shorter time frame, which suggests that they were running or otherwise moving at a greater speed. Conversely, slower individuals covered fewer steps in the same time frame.

The following plot shows the weight distribution of the participants

ggplot(data = weightLogInfo_merged) + 
  geom_histogram(mapping = aes(x = WeightKg, color = BMI)) +
  labs(title= " Weight(Kg) distribution")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

From the weight distribution we can tell that most participants weighed between 60 and 90 kg, which is consistent with the median weight of 62.5 kg. The lowest weight was 52.60 kg and the highest was 133.50 kg.

The relationship between BMI and weight was also investigated.

ggplot(data = merged_activity_weight) + 
  geom_point(mapping = aes(x = WeightKg, y =  BMI)) +
  geom_smooth(mapping = aes(x = WeightKg, y = BMI)) +
  labs(title= " BMI Vs Weight(Kg)")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

The plot of BMI vs weight shows a positive trend: as weight increases, so does BMI.

A BMI of less than 18.5 falls within the underweight range, 18.5 to <25 within the healthy weight range, 25.0 to <30 within the overweight range, and 30.0 or higher within the obesity range.
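
These BMI ranges can be expressed as a small lookup. The following is a sketch only: classify_bmi() is a hypothetical helper name, and the example values are the minimum, median, mean, and maximum BMI from the weight summary above.

```r
# Hypothetical helper: map BMI values to the ranges described above.
# right = FALSE makes each interval closed on the left, e.g. [18.5, 25).
classify_bmi <- function(bmi) {
  cut(bmi,
      breaks = c(-Inf, 18.5, 25, 30, Inf),
      labels = c("Underweight", "Healthy weight", "Overweight", "Obesity"),
      right = FALSE)
}

# Minimum, median, mean, and maximum BMI from the summary above
bmi_values <- c(21.45, 24.39, 25.19, 47.54)
bmi_classes <- as.character(classify_bmi(bmi_values))
print(data.frame(BMI = bmi_values, range = bmi_classes))
```

Applied to a full BMI column, table(classify_bmi(...)) would give the number of records in each range.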

We also analysed the activity of individuals relative to their weight to determine which ones were more active. The following plot shows our findings:

ggplot(data = merged_activity_weight) + 
  geom_point(mapping = aes(x = WeightKg, y =  FairlyActiveMinutes)) +
  geom_smooth(mapping = aes(x = WeightKg, y = FairlyActiveMinutes)) +
  labs(title = "FairlyActiveMinutes vs Weight (kg)")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

From these findings, participants weighing approximately 60 kg appear to be more active than those at higher weights. There is, however, a peak of activity among participants weighing approximately 90 kg, which may be due to them wanting to lose weight.

Conclusion

In conclusion, participants were on average slightly less active than recommended, and the mean BMI of those who logged their weight was just above the healthy range. Fitness companies such as Bellabeat should therefore find innovative ways to encourage their target audience, women, to become healthier. They could also use smartphone notifications to alert women about their health and the steps they can take to improve it.

In answer to the questions of the case study, I would recommend that Bellabeat take advantage of these findings and find innovative ways to help women improve their lifestyle and health. Bellabeat should use these insights in its marketing to encourage women to use its products, for example through daily notifications of their steps, sleep, calories, and other metrics.

I would like to thank the Google Data Analytics course for the opportunity to exercise my skills. This is my project using R.

Regards, Tshepo Molefe