Case Study

Table of Contents

1. Summary

2. Ask Phase
  2.1 Business Task

3. Prepare Phase
  3.1 Dataset Used
  3.2 Information about our Dataset
  3.3 Data Credibility and Integrity

4. Process Phase
  4.1 Installing Packages and Opening Libraries
  4.2 Importing Datasets
  4.3 Preview our Datasets
  4.4 Cleaning and Formatting

5. Analyse Phase
  5.1 Summary
  5.2 Active Minutes
  5.3 Noticeable Day
  5.4 Interesting Finds

6. Share
  6.1 Tableau Dashboard

7. Act

1. Summary

Bellabeat is a high-tech company that makes health-focused smart products. By collecting data on activity, sleep, stress, and reproductive health, Bellabeat has been able to educate women about their own habits and health. Since its founding in 2013, Bellabeat has grown quickly and established itself as a tech-driven wellness company for women.

The company’s five main products are the Bellabeat App, Leaf, Time, Spring, and Bellabeat Membership. Bellabeat is a successful small company with the potential to become a larger player in the global smart device market. To understand how consumers use their smart devices, our team has been asked to analyse smart device data. The insights we find will then help guide the company’s marketing strategy.

2. Ask Phase

2.1 Business Task

Analyze smart device usage data in order to gain insight into how consumers use non-Bellabeat smart devices and help guide marketing strategy for Bellabeat to grow as a global player. Questions guiding our analysis:

  1. What are some trends in smart device usage?
  2. How could these trends apply to Bellabeat customers?
  3. How could these trends help influence Bellabeat marketing strategy?

Stakeholders:

  • Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer
  • Sando Mur: Mathematician and Bellabeat’s cofounder; key member of the Bellabeat executive team
  • Bellabeat marketing analytics team: A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Bellabeat’s marketing strategy. You joined this team six months ago and have been busy learning about Bellabeat’s mission and business goals, as well as how you, as a junior data analyst, can help Bellabeat achieve them.

3. Prepare Phase

3.1 Dataset Used

The data source used for analysis is the Fitbit Fitness Tracker Data. The dataset was made public by Mobius and is hosted on Kaggle.

3.2 Information about our Dataset

This Kaggle dataset contains personal fitness tracker data from thirty eligible Fitbit users who consented to the submission of their data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore users’ habits.

3.3 Data Credibility and Integrity

The dataset contains 18 CSV files. We assess it against the ROCCC criteria:

  • Reliable: The data was gathered from 30 Fitbit users who consented to the submission of their personal tracker data through a distributed survey on Amazon Mechanical Turk.
  • Original: The data was collected from these 30 Fitbit users through Amazon Mechanical Turk.
  • Comprehensive: The data is produced at the minute level for sleep, heart rate, and physical activity. Although it tracks a variety of activity and sleep-related parameters, the sample size is small and most data is recorded only on particular days of the week.
  • Current: The data covers March 2016 through May 2016. Because it is several years old, users’ behaviour may have changed since.
  • Cited: None.

The data has some limitations:

  • Data is available for only 30 users. This meets the common n ≥ 30 rule of thumb associated with the central limit theorem, so a t-test can serve as a statistical reference, but a larger sample is recommended for the analysis.
  • A later check of unique user Ids with n_distinct() shows that the collection contains data from 33 users for daily activity, 24 users for sleep, and only 8 users for weight. That is 3 more users than described, and some users who tracked their daily activity did not record sleep or weight data.
  • Of the 8 users with weight data, 5 entered their weight manually, while 3 recorded it with a connected wifi device (such as a wifi scale).
  • The majority of data is collected from Tuesday through Thursday, which might not be sufficient to provide an appropriate analysis.

4. Process Phase

Now we will perform the following tasks:

  1. Check the data for errors.
  2. Choose your tools.
  3. Transform the data so you can work with it effectively.
  4. Document the cleaning process.

4.1 Installing Packages and opening Libraries

First we will choose the packages which will help us in our analysis and open them. The packages that will be used are:

  • tidyverse
  • skimr
  • ggpubr
  • here
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(skimr)
library(ggpubr)
library(here)
## here() starts at /cloud/project

4.2 Importing Datasets

Now we will import the datasets which will help us in our analysis. The datasets we will use are:

  • daily_activity
  • sleep_daily
  • weight_Loginfo
daily_activity <- read_csv("dailyActivity_merged.csv")
## Rows: 940 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): ActivityDate
## dbl (14): Id, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDi...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
sleep_daily <- read_csv("sleepDay_merged.csv")
## Rows: 413 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): SleepDay
## dbl (4): Id, TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
weight_Loginfo <- read_csv("weightLogInfo_merged.csv")
## Rows: 67 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Date
## dbl (6): Id, WeightKg, WeightPounds, Fat, BMI, LogId
## lgl (1): IsManualReport
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
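
The column specification messages above show that read_csv() guessed Id as a double, which is why it later prints as 1.5e+09. As a minimal sketch, assuming we would rather treat Id as a label than as a quantity, the column types can be stated explicitly. The two-row csv_text below is a hypothetical inline extract used only so the example runs without the CSV files.

```r
library(readr)

# Hypothetical two-row extract in the same layout as dailyActivity_merged.csv
csv_text <- I("Id,ActivityDate,TotalSteps\n1503960366,4/12/2016,13162\n1503960366,4/13/2016,10735\n")

activity_sample <- read_csv(
  csv_text,
  col_types = cols(
    Id = col_character(),            # identifier, not a number to do arithmetic on
    ActivityDate = col_character(),  # parsed into a date later
    TotalSteps = col_double()
  )
)

activity_sample$Id
```

Supplying col_types also silences the column specification message, because nothing is left to guess.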

4.3 Preview our Datasets

Checking the summary and previewing our selected datasets.

head(daily_activity)
## # A tibble: 6 × 15
##           Id ActivityDate TotalSteps TotalDistance TrackerDistance
##        <dbl> <chr>             <dbl>         <dbl>           <dbl>
## 1 1503960366 4/12/2016         13162          8.5             8.5 
## 2 1503960366 4/13/2016         10735          6.97            6.97
## 3 1503960366 4/14/2016         10460          6.74            6.74
## 4 1503960366 4/15/2016          9762          6.28            6.28
## 5 1503960366 4/16/2016         12669          8.16            8.16
## 6 1503960366 4/17/2016          9705          6.48            6.48
## # ℹ 10 more variables: LoggedActivitiesDistance <dbl>,
## #   VeryActiveDistance <dbl>, ModeratelyActiveDistance <dbl>,
## #   LightActiveDistance <dbl>, SedentaryActiveDistance <dbl>,
## #   VeryActiveMinutes <dbl>, FairlyActiveMinutes <dbl>,
## #   LightlyActiveMinutes <dbl>, SedentaryMinutes <dbl>, Calories <dbl>
str(daily_activity)
## spc_tbl_ [940 × 15] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Id                      : num [1:940] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityDate            : chr [1:940] "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
##  $ TotalSteps              : num [1:940] 13162 10735 10460 9762 12669 ...
##  $ TotalDistance           : num [1:940] 8.5 6.97 6.74 6.28 8.16 ...
##  $ TrackerDistance         : num [1:940] 8.5 6.97 6.74 6.28 8.16 ...
##  $ LoggedActivitiesDistance: num [1:940] 0 0 0 0 0 0 0 0 0 0 ...
##  $ VeryActiveDistance      : num [1:940] 1.88 1.57 2.44 2.14 2.71 ...
##  $ ModeratelyActiveDistance: num [1:940] 0.55 0.69 0.4 1.26 0.41 ...
##  $ LightActiveDistance     : num [1:940] 6.06 4.71 3.91 2.83 5.04 ...
##  $ SedentaryActiveDistance : num [1:940] 0 0 0 0 0 0 0 0 0 0 ...
##  $ VeryActiveMinutes       : num [1:940] 25 21 30 29 36 38 42 50 28 19 ...
##  $ FairlyActiveMinutes     : num [1:940] 13 19 11 34 10 20 16 31 12 8 ...
##  $ LightlyActiveMinutes    : num [1:940] 328 217 181 209 221 164 233 264 205 211 ...
##  $ SedentaryMinutes        : num [1:940] 728 776 1218 726 773 ...
##  $ Calories                : num [1:940] 1985 1797 1776 1745 1863 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Id = col_double(),
##   ..   ActivityDate = col_character(),
##   ..   TotalSteps = col_double(),
##   ..   TotalDistance = col_double(),
##   ..   TrackerDistance = col_double(),
##   ..   LoggedActivitiesDistance = col_double(),
##   ..   VeryActiveDistance = col_double(),
##   ..   ModeratelyActiveDistance = col_double(),
##   ..   LightActiveDistance = col_double(),
##   ..   SedentaryActiveDistance = col_double(),
##   ..   VeryActiveMinutes = col_double(),
##   ..   FairlyActiveMinutes = col_double(),
##   ..   LightlyActiveMinutes = col_double(),
##   ..   SedentaryMinutes = col_double(),
##   ..   Calories = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
colnames(daily_activity)
##  [1] "Id"                       "ActivityDate"            
##  [3] "TotalSteps"               "TotalDistance"           
##  [5] "TrackerDistance"          "LoggedActivitiesDistance"
##  [7] "VeryActiveDistance"       "ModeratelyActiveDistance"
##  [9] "LightActiveDistance"      "SedentaryActiveDistance" 
## [11] "VeryActiveMinutes"        "FairlyActiveMinutes"     
## [13] "LightlyActiveMinutes"     "SedentaryMinutes"        
## [15] "Calories"
head(sleep_daily)
## # A tibble: 6 × 5
##           Id SleepDay        TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
##        <dbl> <chr>                       <dbl>              <dbl>          <dbl>
## 1 1503960366 4/12/2016 12:0…                 1                327            346
## 2 1503960366 4/13/2016 12:0…                 2                384            407
## 3 1503960366 4/15/2016 12:0…                 1                412            442
## 4 1503960366 4/16/2016 12:0…                 2                340            367
## 5 1503960366 4/17/2016 12:0…                 1                700            712
## 6 1503960366 4/19/2016 12:0…                 1                304            320
str(sleep_daily)
## spc_tbl_ [413 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Id                : num [1:413] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ SleepDay          : chr [1:413] "4/12/2016 12:00:00 AM" "4/13/2016 12:00:00 AM" "4/15/2016 12:00:00 AM" "4/16/2016 12:00:00 AM" ...
##  $ TotalSleepRecords : num [1:413] 1 2 1 2 1 1 1 1 1 1 ...
##  $ TotalMinutesAsleep: num [1:413] 327 384 412 340 700 304 360 325 361 430 ...
##  $ TotalTimeInBed    : num [1:413] 346 407 442 367 712 320 377 364 384 449 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Id = col_double(),
##   ..   SleepDay = col_character(),
##   ..   TotalSleepRecords = col_double(),
##   ..   TotalMinutesAsleep = col_double(),
##   ..   TotalTimeInBed = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
colnames(sleep_daily)
## [1] "Id"                 "SleepDay"           "TotalSleepRecords" 
## [4] "TotalMinutesAsleep" "TotalTimeInBed"
head(weight_Loginfo)
## # A tibble: 6 × 8
##           Id Date       WeightKg WeightPounds   Fat   BMI IsManualReport   LogId
##        <dbl> <chr>         <dbl>        <dbl> <dbl> <dbl> <lgl>            <dbl>
## 1 1503960366 5/2/2016 …     52.6         116.    22  22.6 TRUE           1.46e12
## 2 1503960366 5/3/2016 …     52.6         116.    NA  22.6 TRUE           1.46e12
## 3 1927972279 4/13/2016…    134.          294.    NA  47.5 FALSE          1.46e12
## 4 2873212765 4/21/2016…     56.7         125.    NA  21.5 TRUE           1.46e12
## 5 2873212765 5/12/2016…     57.3         126.    NA  21.7 TRUE           1.46e12
## 6 4319703577 4/17/2016…     72.4         160.    25  27.5 TRUE           1.46e12
str(weight_Loginfo)
## spc_tbl_ [67 × 8] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Id            : num [1:67] 1.50e+09 1.50e+09 1.93e+09 2.87e+09 2.87e+09 ...
##  $ Date          : chr [1:67] "5/2/2016 11:59:59 PM" "5/3/2016 11:59:59 PM" "4/13/2016 1:08:52 AM" "4/21/2016 11:59:59 PM" ...
##  $ WeightKg      : num [1:67] 52.6 52.6 133.5 56.7 57.3 ...
##  $ WeightPounds  : num [1:67] 116 116 294 125 126 ...
##  $ Fat           : num [1:67] 22 NA NA NA NA 25 NA NA NA NA ...
##  $ BMI           : num [1:67] 22.6 22.6 47.5 21.5 21.7 ...
##  $ IsManualReport: logi [1:67] TRUE TRUE FALSE TRUE TRUE TRUE ...
##  $ LogId         : num [1:67] 1.46e+12 1.46e+12 1.46e+12 1.46e+12 1.46e+12 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Id = col_double(),
##   ..   Date = col_character(),
##   ..   WeightKg = col_double(),
##   ..   WeightPounds = col_double(),
##   ..   Fat = col_double(),
##   ..   BMI = col_double(),
##   ..   IsManualReport = col_logical(),
##   ..   LogId = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
colnames(weight_Loginfo)
## [1] "Id"             "Date"           "WeightKg"       "WeightPounds"  
## [5] "Fat"            "BMI"            "IsManualReport" "LogId"

4.4 Cleaning and Formatting

Examine the data, check for NA, and remove duplicates for our three main tables.

dim(sleep_daily)
## [1] 413   5
sum(is.na(sleep_daily))
## [1] 0
sum(duplicated(sleep_daily))
## [1] 3
sleep_daily <- sleep_daily[!duplicated(sleep_daily), ]

dim(daily_activity)
## [1] 940  15
sum(is.na(daily_activity))
## [1] 0
sum(duplicated(daily_activity))
## [1] 0
daily_activity <- daily_activity[!duplicated(daily_activity), ]

dim(weight_Loginfo)
## [1] 67  8
sum(is.na(weight_Loginfo))
## [1] 65
sum(duplicated(weight_Loginfo))
## [1] 0
weight_Loginfo <- weight_Loginfo[!duplicated(weight_Loginfo), ]

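Remove the duplicates and any rows containing NA. Note that drop_na() with no arguments drops every row that has a missing value in any column, so a table with a sparsely filled column, such as Fat in the weight log, loses most of its rows.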

daily_activity <- daily_activity %>%
  distinct() %>%
  drop_na()

sleep_daily <- sleep_daily %>%
  distinct() %>%
  drop_na()

weight_Loginfo <- weight_Loginfo %>%
  distinct() %>%
  drop_na()
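
Because the weight log reported 65 missing values across 67 rows, and the preview shows gaps in the Fat column, it may be worth restricting drop_na() to the columns the analysis actually needs. The sketch below uses a small hypothetical table (weight_sample) with the same pattern of gaps to compare the two behaviours; it is an illustration, not a replacement for the cleaning step above.

```r
library(dplyr)
library(tidyr)

# Hypothetical weight log: Fat is mostly missing, the other columns are complete
weight_sample <- tibble(
  Id = c(1, 1, 2, 3),
  WeightKg = c(52.6, 52.6, 133.5, 56.7),
  Fat = c(22, NA, NA, NA)
)

all_columns <- weight_sample %>% drop_na()          # keeps only fully complete rows
key_columns <- weight_sample %>% drop_na(WeightKg)  # ignores gaps in Fat

nrow(all_columns)
nrow(key_columns)
```

In this toy table the unrestricted call keeps one row, while the targeted call keeps all four.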

We will check our dataset again for duplicates and NA.

Convert ActivityDate into date format and add a column for day of the week:

daily_activity <- daily_activity %>% mutate( Weekday = weekdays(as.Date(ActivityDate, "%m/%d/%Y")))
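
The step above derives Weekday but leaves ActivityDate as text, and SleepDay carries a time component that ActivityDate does not. As a hedged sketch, assuming the lubridate package loaded with the tidyverse, both columns can be parsed into a shared Date column so the tables can later be matched by day. The activity_sample and sleep_sample tables are hypothetical stand-ins.

```r
library(dplyr)
library(lubridate)

# Hypothetical rows in the same text formats as the source files
activity_sample <- tibble(ActivityDate = c("4/12/2016", "4/13/2016"))
sleep_sample <- tibble(SleepDay = c("4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM"))

activity_sample <- activity_sample %>%
  mutate(
    Date = mdy(ActivityDate),                          # month/day/year text to Date
    Weekday = wday(Date, label = TRUE, abbr = FALSE)   # ordered factor of day names
  )

sleep_sample <- sleep_sample %>%
  mutate(Date = as_date(mdy_hms(SleepDay)))            # drop the midnight timestamp

activity_sample
sleep_sample
```

An ordered factor for Weekday also keeps the days in calendar order on plots, which a plain character column does not.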

Check the number of distinct users in each table with n_distinct(), since the dataset is described as having 30 users. The dataset contains information on 33 users’ daily activities, 24 users’ sleep, and just 8 users’ weight. Where there is a discrepancy, such as in the weight table, check how the data was recorded: looking at how users entered the data can help explain why records are missing.
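
The user counts quoted here can be reproduced with n_distinct(). The sketch below shows the pattern on three small hypothetical tables (daily, sleep, weight) and uses anti_join() to list users who appear in one table but not in another; the Ids are made up for illustration.

```r
library(dplyr)

# Hypothetical Id columns standing in for the three tables
daily <- tibble(Id = c(101, 101, 102, 103))
sleep <- tibble(Id = c(101, 102, 102))
weight <- tibble(Id = c(101))

user_counts <- tibble(
  table = c("daily_activity", "sleep_daily", "weight_Loginfo"),
  users = c(n_distinct(daily$Id), n_distinct(sleep$Id), n_distinct(weight$Id))
)

# Users present in the activity table but absent from the sleep table
missing_sleep <- anti_join(distinct(daily, Id), distinct(sleep, Id), by = "Id")

user_counts
missing_sleep
```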

# IsManualReport is a logical column, so filter on it directly.
# Comparing it with the string "True" matches no rows and returns an empty tibble.
weight_Loginfo %>% 
  filter(IsManualReport) %>% 
  group_by(Id) %>% 
  summarise("Manual Weight Report" = n())
merged_data <- merge(daily_activity, sleep_daily, by = "Id", all = TRUE)
merged_data <- merge(merged_data, weight_Loginfo, by = "Id", all = TRUE)
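
Merging on Id alone pairs every activity row of a user with every sleep row of the same user, which helps explain why the merged table has many more rows than any input table. A minimal sketch of the alternative, assuming both tables carry a parsed Date column (a hypothetical name not present in the original tables), joins on user and day together:

```r
library(dplyr)

# Hypothetical daily tables keyed by user and day
activity <- tibble(
  Id = c(1, 1, 2),
  Date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
  TotalSteps = c(13162, 10735, 5000)
)
sleep <- tibble(
  Id = c(1, 1),
  Date = as.Date(c("2016-04-12", "2016-04-13")),
  TotalMinutesAsleep = c(327, 384)
)

by_id_only <- merge(activity, sleep, by = "Id", all = TRUE)      # every day paired with every day
by_id_and_date <- left_join(activity, sleep, by = c("Id", "Date"))  # one row per user-day

nrow(by_id_only)
nrow(by_id_and_date)
```

With one row per user-day, summary statistics are no longer weighted by how many sleep or weight records a user happens to have.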

5. Analyse Phase

5.1 Summary

Summarize the merged dataset using summary().

merged_data %>% 
  dplyr::select(Weekday,
                TotalSteps,
                TotalDistance,
                VeryActiveMinutes,
                FairlyActiveMinutes,
                LightlyActiveMinutes,
                SedentaryMinutes,
                Calories,
                TotalMinutesAsleep,
                TotalTimeInBed,
                WeightPounds,
                BMI) %>%
  summary()
##    Weekday            TotalSteps    TotalDistance    VeryActiveMinutes
##  Length:12575       Min.   :    0   Min.   : 0.000   Min.   :  0.00   
##  Class :character   1st Qu.: 4676   1st Qu.: 3.180   1st Qu.:  0.00   
##  Mode  :character   Median : 8580   Median : 6.110   Median :  8.00   
##                     Mean   : 8115   Mean   : 5.733   Mean   : 23.89   
##                     3rd Qu.:11207   3rd Qu.: 7.920   3rd Qu.: 36.00   
##                     Max.   :36019   Max.   :28.030   Max.   :210.00   
##                                                                       
##  FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes    Calories   
##  Min.   :  0.00      Min.   :  0.0        Min.   :   0.0   Min.   :   0  
##  1st Qu.:  0.00      1st Qu.:144.0        1st Qu.: 660.0   1st Qu.:1776  
##  Median : 10.00      Median :201.0        Median : 738.0   Median :2158  
##  Mean   : 17.22      Mean   :200.2        Mean   : 806.2   Mean   :2323  
##  3rd Qu.: 24.00      3rd Qu.:258.0        3rd Qu.: 878.0   3rd Qu.:2859  
##  Max.   :143.00      Max.   :518.0        Max.   :1440.0   Max.   :4900  
##                                                                          
##  TotalMinutesAsleep TotalTimeInBed   WeightPounds        BMI       
##  Min.   : 58.0      Min.   : 61.0   Min.   :116.0   Min.   :22.65  
##  1st Qu.:361.0      1st Qu.:402.0   1st Qu.:116.0   1st Qu.:22.65  
##  Median :432.0      Median :462.0   Median :159.6   Median :27.45  
##  Mean   :419.1      Mean   :458.2   Mean   :138.2   Mean   :25.10  
##  3rd Qu.:492.0      3rd Qu.:526.0   3rd Qu.:159.6   3rd Qu.:27.45  
##  Max.   :796.0      Max.   :961.0   Max.   :159.6   Max.   :27.45  
##  NA's   :227        NA's   :227     NA's   :10994   NA's   :10994

5.2 Active Minutes

The chart below shows the share of tracked minutes at each activity level: very active, fairly active, lightly active, and sedentary. According to the pie chart, users spend 81.3% of their tracked minutes sedentary and just 1.74% very active.

# Total minutes logged at each activity level across all users and days
sedentary_minutes <- sum(daily_activity$SedentaryMinutes)
lightly_minutes <- sum(daily_activity$LightlyActiveMinutes)
fairly_minutes <- sum(daily_activity$FairlyActiveMinutes)
active_minutes <- sum(daily_activity$VeryActiveMinutes)

total_minutes <- sedentary_minutes + lightly_minutes + fairly_minutes + active_minutes

# Share of total tracked minutes at each level, in percent
sedentary_percentage <- (sedentary_minutes / total_minutes) * 100
lightly_percentage <- (lightly_minutes / total_minutes) * 100
fairly_percentage <- (fairly_minutes / total_minutes) * 100
active_percentage <- (active_minutes / total_minutes) * 100

percentage <- data.frame(
  level = c("Sedentary", "Lightly", "Fairly", "Active"),
  minutes = c(sedentary_percentage, lightly_percentage, fairly_percentage, active_percentage)
)

library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
plot_ly(percentage, labels = ~level, values = ~minutes, type = 'pie',textposition = 'outside',textinfo = 'label+percent') %>%
  layout(title = 'Activity Level Minutes',
         xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
         yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))

5.3 Noticeable Day

The bar graphs indicate a noticeable increase in physical activity on Saturdays: on average, users spent less time sedentary and took more steps than on other days of the week. One plausible explanation is that Saturday is a weekend day, when people have more free time for physical activities they enjoy. This is consistent with the common pattern of people with busy weekday schedules being less active during the week and more active at the weekend, although a sample this small cannot confirm it.

Figure: Less Sedentary Minutes

Figure: More steps on Tuesday
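
The code behind these bar graphs is not shown in the report. As a hedged sketch of how such a per-weekday comparison could be computed, the example below averages steps and sedentary minutes by Weekday. The numbers are made up for illustration and are not taken from the dataset.

```r
library(dplyr)

# Hypothetical daily records; values are invented for the example
activity <- tibble(
  Weekday = c("Saturday", "Saturday", "Tuesday", "Tuesday", "Monday"),
  TotalSteps = c(9000, 8000, 8200, 8000, 7000),
  SedentaryMinutes = c(950, 970, 1000, 1010, 1030)
)

weekday_summary <- activity %>%
  group_by(Weekday) %>%
  summarise(
    mean_steps = mean(TotalSteps),
    mean_sedentary = mean(SedentaryMinutes),
    days_recorded = n()   # how many records each average rests on
  ) %>%
  arrange(desc(mean_steps))

weekday_summary
```

Keeping days_recorded next to the averages makes it easier to see when a weekday's mean rests on only a few records.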

5.4 Interesting Finds

ggplot(data=daily_activity,aes(x=TotalSteps,y=SedentaryMinutes, color=Calories)) +
  geom_point(size=3) +
  geom_smooth(method="lm",color="blue") +
  labs(title="Total Steps vs. Sedentary Minutes",x="Total Steps",y="Sedentary Minutes")+
  scale_color_gradient(low="#ffdca7",high="#422d9e")
## `geom_smooth()` using formula = 'y ~ x'

The graph shows a scatter plot of the relationship between the total number of steps taken by users and the amount of time spent in sedentary behavior. The color of each point represents the number of calories burned by the user.

From the plot, we can see that there is a negative correlation between the number of sedentary minutes and the total number of steps taken. In other words, as the amount of time spent being sedentary increases, the number of steps taken decreases.

The fitted regression line slopes downward, which points to a negative relationship between sedentary time and total steps; a formal test would be needed to confirm that the relationship is statistically significant. The plot also suggests that users who burn more calories tend to take more steps, although the color gradient alone does not show how strong that relationship is.
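
A scatter plot with a fitted line does not by itself report strength or significance. One way to quantify both, sketched here with base R's cor.test() on hypothetical values rather than the real data, is a Pearson correlation test:

```r
# Hypothetical daily values chosen only to illustrate the call
steps <- c(2000, 4000, 6000, 8000, 10000, 12000, 14000, 16000)
sedentary <- c(1250, 1180, 1100, 1090, 980, 900, 870, 760)

result <- cor.test(steps, sedentary, method = "pearson")

result$estimate  # correlation coefficient, negative when sedentary time falls as steps rise
result$p.value   # probability of a correlation this strong if the true correlation were zero
```

On the real data, the same call would take daily_activity$TotalSteps and daily_activity$SedentaryMinutes as its two arguments.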

Overall, the plot suggests that less sedentary time is associated with a higher daily step count and, in turn, higher calorie expenditure.

There could be several reasons for the high sedentary time and low step count among some of the users in the data. Some possible reasons could be:

  1. Job or lifestyle: The users could have a job or lifestyle that involves a lot of sitting or being sedentary, which could explain their high sedentary time.

  2. Health conditions: Some of the users may have health conditions that make it difficult for them to be physically active or mobile, leading to a sedentary lifestyle.

  3. Personal choice: Some users may choose to be sedentary for personal reasons, such as lack of motivation, leisure activities that do not involve physical activity, or preference for a more relaxed lifestyle.

  4. Environmental factors: The users could be living in an environment that is not conducive to physical activity, such as lack of safe walking or exercise spaces, or a climate that discourages outdoor activity.

  5. Lack of awareness or education: Some users may not be aware of the health benefits of physical activity, or may not have access to education or resources that promote an active lifestyle.

Examining the connection between the number of steps taken and the amount of calories burned.

mean_steps <- mean(daily_activity$TotalSteps)
mean_steps
## [1] 7637.911
mean_calories <- mean(daily_activity$Calories)
mean_calories
## [1] 2303.61
ggplot(data=daily_activity, aes(x=TotalSteps,y=Calories,color=Calories)) +
  geom_point() +
  labs(title="Calories burned for every step taken",x="Total Steps Taken",y="Calories Burned") +
  geom_smooth(method="lm") +
  geom_hline(yintercept = mean_calories, color = "yellow", linewidth = 1) +
  geom_vline(xintercept = mean_steps, color = "red", linewidth = 1) +
  # annotate() draws each label once; angle rotates the text (srt is not a ggplot2 aesthetic)
  annotate("text", x = 10000, y = 500, label = "Average Steps", angle = -90) +
  annotate("text", x = 29000, y = 2500, label = "Average Calories") +
  scale_color_gradient(low="#ffdca7",high="#422d9e")
## `geom_smooth()` using formula = 'y ~ x'
## Warning: The following aesthetics were dropped during statistical transformation: colour
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?

Overall, this graph helps to visualize the relationship between steps and calories burned. The regression line shows a positive correlation between the two variables, indicating that the more steps taken, the more calories burned. The reference lines help to highlight the mean values of both variables, giving a reference point to compare individual data points. The text labels add more information and context to the graph. The color gradient also adds more depth to the data by showing the range of calories burned, which is not immediately apparent from the scatter plot alone.
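
The regression line drawn by geom_smooth(method = "lm") can also be fitted directly, which gives the slope as a number. The sketch below fits lm() to a small hypothetical table, so the resulting figures illustrate the method only and say nothing about the Fitbit data.

```r
# Hypothetical step and calorie values for illustration
activity <- data.frame(
  TotalSteps = c(0, 2500, 5000, 7500, 10000, 12500),
  Calories = c(1500, 1700, 1900, 2150, 2300, 2500)
)

fit <- lm(Calories ~ TotalSteps, data = activity)

slope <- coef(fit)[["TotalSteps"]]  # extra calories per additional step
slope * 1000                        # extra calories per 1,000 additional steps

# Expected calories on a day with 8,000 steps, according to the fitted line
predict(fit, newdata = data.frame(TotalSteps = 8000))
```

Replacing the toy data frame with daily_activity would give the slope of the line shown in the plot above.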

6. Share

6.1 Tableau Dashboard

Tableau Dashboard

7. Act

Based on the graphs, here are some recommendations for Bellabeat:

  1. Increase focus on active minutes: The activity-level breakdown shows that the Fitbit users in this sample spent only a small share of their tracked time in active minutes. Encouraging Bellabeat users to track and increase their active minutes could be a valuable way to differentiate Bellabeat from other fitness apps and products.

  2. Provide more personalized feedback: The line chart shows that users who engage with the app for a longer period of time tend to increase their daily step count. This suggests that providing personalized feedback and encouragement over time could be an effective way to keep users engaged and motivated.

  3. Highlight the connection between steps and calories burned: The scatter plot shows a positive correlation between daily steps taken and calories burned. Emphasizing this connection could help users understand the impact of their activity level on their overall health and fitness.

  4. Consider adding more sleep tracking features: The merged data table shows that users who reported higher levels of sleep quality also tended to take more steps and burn more calories. This suggests that adding more sleep tracking features to the Bellabeat app or products could be a valuable way to enhance user engagement and satisfaction.

  5. Focus on increasing the number of active minutes per day: The data suggests that users who engage in more physical activity tend to burn more calories and achieve their weight goals. Therefore, it is recommended that Bellabeat focus on developing products that encourage users to engage in physical activity and increase their active minutes per day.

  6. Enhance the sleep tracking features: The data shows that sleep is an important factor in achieving fitness goals. Bellabeat can improve its sleep tracking features to provide users with more detailed and accurate information about their sleep patterns. This could include tracking sleep stages, detecting snoring or sleep apnea, and providing personalized recommendations for improving sleep quality.

  7. Personalize the app for individual users: The data reveals that there is a lot of variation in users’ activity levels, sleep patterns, and weight goals. Bellabeat can enhance the app’s personalization features to provide users with more customized recommendations based on their unique goals and preferences. This could include personalized workout plans, sleep recommendations, and dietary advice.

  8. Expand the product line to include more wearable devices: The data shows that wearable devices are popular among users and are effective in tracking activity and sleep. Bellabeat can expand its product line to include more wearable devices that offer a wider range of features and cater to different user preferences and budgets.

  9. Partner with health and wellness experts: Bellabeat can partner with health and wellness experts to provide users with expert advice and recommendations. This could include partnering with personal trainers, nutritionists, and sleep experts to offer personalized advice and support to users.

Overall, these recommendations focus on ways to differentiate the Bellabeat app and products from other fitness offerings by providing more personalized feedback, emphasizing the connection between activity level and health outcomes, and adding more features to enhance user engagement and satisfaction.