Bellabeat Case Study



1) Ask


Questions to Consider

    What are some trends in smart device usage?
  
  
      - According to Think with Google, 61% of people globally own a smart device.
      What does this mean? 
      
        - This suggests that the market for smart devices is large and likely still growing. 
        
      What is the current situation?
      
        - Bellabeat is looking to increase sales of its products and to gain
        insight into what its customers use those products for the most. 
        This analysis focuses on the daily steps and calories datasets in order to adjust or add to the current marketing strategies. 
        
      Business Task
      - This analysis uses the available datasets to determine whether the products that Bellabeat customers use are effective, and whether any changes should be made in order to increase sales. 
      - Some strategies we plan to put to use:
          - Offer different designs and color schemes to appeal to more women.
          - Promote products to various age groups of women.
          - Explore a new product that is unlike the rest.
          - Update features and improve the interface of existing products.
          - Example: The Leaf product is designed to be worn as a necklace, bracelet, or clip. Since it is meant for women, its designs should appeal to as many women as possible. 
          
      Key Stakeholders
      - Urška Sršen
      - Sando Mur
      - Customers that use Bellabeat products

2) Prepare


Where is this data from?

[FitBit Fitness Tracker Data](https://www.kaggle.com/datasets/arashnic/fitbit)

[Think with Google](https://www.thinkwithgoogle.com/future-of-marketing/emerging-technology/smart-device-use-statistics/)


This data can be found on Kaggle. 


How is this data organized?

![Daily steps and calories.jpg](attachment:06e0ea61-6edf-4a7c-a3e6-ded830d1d5f6.jpg)

![Sleep.jpg](attachment:80d74f74-be40-4964-98ed-42e6034acdc0.jpg)

Does this data ROCCC? 

- Reliable: Although this data seems to be of fairly good quality, it may no longer be representative, since it was collected more than five years ago.

- Original: This data is original, since it comes from thirty eligible FitBit users who consented to having their data made public. However, there are some possible sources of bias, such as manually entered weight values, and the data does not state the users' ages. 

- Comprehensive: This dataset is comprehensive in that it covers the metrics the smart devices track, such as steps, calories, sleep, and activity levels. However, there seem to be some errors in the records. 

- Current: This dataset is not current, since it was collected during 2016. This is a limitation of the dataset. In addition, the same user has multiple entries for different days, which could skew the results if some users contribute more records than others. 

- Cited: This dataset has been cited by a Kaggle user (Mobius), who is a verified analyst. 

[Mobius](https://www.kaggle.com/arashnic)

Does this help answer the business task?

- This dataset helps the analysis determine whether Bellabeat products are effective for the customers who use them.  

Problems with the Data?

- Most of the data files mix wide and long formats. 

- There are multiple entries for the same users. 
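
The repeated-user issue can be checked directly. The following is a minimal sketch, assuming the data has a user identifier column named `Id` and a date column named `ActivityDay` (both names are assumptions about the CSV layout, and the toy data frame is invented for illustration rather than taken from the dataset). It counts how many rows each user contributes and flags rows that repeat the same user-day combination.

```r
# Toy stand-in for the daily steps data; all values are invented for illustration.
# The column names Id and ActivityDay are assumptions about the CSV layout.
toy_steps <- data.frame(
  Id          = c(1, 1, 1, 2, 2, 3),
  ActivityDay = c("4/12/2016", "4/13/2016", "4/13/2016",
                  "4/12/2016", "4/13/2016", "4/12/2016"),
  StepTotal   = c(5000, 7200, 7200, 10400, 9800, 3100)
)

# Number of distinct users and number of rows contributed by each user
n_users <- length(unique(toy_steps$Id))
rows_per_user <- table(toy_steps$Id)

# Flag rows that repeat an earlier user-day combination
is_dup <- duplicated(toy_steps[, c("Id", "ActivityDay")])
n_duplicates <- sum(is_dup)

# Keep only the first row for each user-day combination
toy_steps_unique <- toy_steps[!is_dup, ]

print(n_users)
print(rows_per_user)
print(n_duplicates)
```

Several rows per user on different days are expected in daily data. Only repeated user-day combinations are true duplicates, which is why the sketch checks both columns together.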

Verifying the Data's Integrity

- The data was gathered from FitBit users who consented to sharing it, without compromising their identities. 


3) Process

Tools used for this analysis

- R was used for analyzing, cleaning, and organizing the data. 

- The datasets were merged within R.
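
As a rough illustration of that merge step, here is a minimal sketch using base R `merge()`. The shared key columns `Id` and `Date` and all toy values are assumptions made for this example; the real files may name or format their date columns differently.

```r
# Toy data frames standing in for the steps and sleep files (invented values).
# The key columns Id and Date are assumed names, not confirmed by the source files.
toy_steps <- data.frame(
  Id        = c(1, 1, 2),
  Date      = c("2016-04-12", "2016-04-13", "2016-04-12"),
  StepTotal = c(5000, 7200, 10400)
)
toy_sleep <- data.frame(
  Id                 = c(1, 2, 2),
  Date               = c("2016-04-12", "2016-04-12", "2016-04-13"),
  TotalMinutesAsleep = c(420, 380, 410)
)

# Inner join: keep only user-days that appear in both data frames
toy_merged <- merge(toy_steps, toy_sleep, by = c("Id", "Date"))
print(toy_merged)
```

Passing `all = TRUE` to `merge()` would instead keep unmatched user-days from both sides and fill the missing columns with `NA`.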


```r
install.packages("tidyverse")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(tidyverse)
## ── Attaching packages
## ───────────────────────────────────────
## tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.7     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
install.packages("ggplot2")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(ggplot2)
install.packages("dplyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(dplyr)
install.packages("janitor")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(janitor)
## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
# Import the daily steps/calories and sleep CSV files
steps_and_calories_v2 <- read.csv("Daily Steps and Calories merged - dailySteps_merged.csv")
sleep_merged <- read.csv("Sleep merged - sleepDay_merged.csv")
# Clean the data: drop rows that contain missing values
steps_and_calories_final <- drop_na(steps_and_calories_v2)
```

4) Analyze

First Observation: Time Asleep vs Time in Bed

### Plotting the Sleep Data 
ggplot(data = sleep_merged) +
  geom_point(mapping = aes(x = TotalTimeInBed, y = TotalMinutesAsleep), color = "orange")

The plot above shows a positive relationship between the total time users spent in bed and the total time they were asleep.
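
One way to put a number on this relationship is a Pearson correlation, which measures linear association. The sketch below is illustrative only: it uses a small invented data frame with the same column names as the plot (`TotalTimeInBed`, `TotalMinutesAsleep`), and `sleep_efficiency` is a name introduced here, not part of the original analysis.

```r
# Toy stand-in for sleep_merged; all values are invented for illustration.
toy_sleep <- data.frame(
  TotalTimeInBed     = c(360, 420, 450, 480, 510, 540),
  TotalMinutesAsleep = c(330, 390, 400, 455, 470, 500)
)

# Pearson correlation: values near 1 indicate a strong positive linear relationship
sleep_cor <- cor(toy_sleep$TotalTimeInBed, toy_sleep$TotalMinutesAsleep)

# Share of time in bed that was actually spent asleep, per record
# (sleep_efficiency is a hypothetical helper column)
toy_sleep$sleep_efficiency <- toy_sleep$TotalMinutesAsleep / toy_sleep$TotalTimeInBed

print(sleep_cor)
print(summary(toy_sleep$sleep_efficiency))
```

Running the same two calculations on `sleep_merged` would show how strong the relationship is in the actual data.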

Second Observation: Total Steps vs Total Calories

### Plotting the Calories vs Steps Data
ggplot(data = steps_and_calories_v2) +
  geom_point(mapping = aes(x = Calories, y = StepTotal), color = "blue") +
  labs(title = "Total Steps vs Total Calories", subtitle = "Data taken from FitBit Fitness Tracker Data on Kaggle")

Adding in another dataset

At this point, I added another dataset in which all of the daily activity data is already merged.

daily_activities_v2 <- read.csv("Daily_activity_v2 - dailyActivity_merged.csv")
#### The data was also cleaned. 

Looking at the steps and calories plot above, there also seems to be a positive relationship between the number of steps users took per day and the number of calories they burned.
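
A simple linear fit is one way to estimate how strongly calories rise with steps. The following sketch assumes a straight-line relationship and uses an invented toy data frame with the column names from the plot (`StepTotal`, `Calories`); it is not the method used in the original analysis.

```r
# Toy stand-in for the steps and calories data; values are invented for illustration.
toy_daily <- data.frame(
  StepTotal = c(2000, 4000, 6000, 8000, 10000, 12000),
  Calories  = c(1700, 1850, 2050, 2150, 2400, 2500)
)

# Fit Calories as a linear function of StepTotal
fit <- lm(Calories ~ StepTotal, data = toy_daily)

# The slope estimates the additional calories burned per additional step
slope <- coef(fit)[["StepTotal"]]

# R-squared is the share of variation in Calories explained by StepTotal
r_squared <- summary(fit)$r.squared

print(slope)
print(r_squared)
```

A positive slope with a high R-squared would support the visual impression from the scatter plot, although correlation alone does not show that the device causes the change.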

### Activity vs Calories 

Third Observation: Very Active Minutes vs Total Calories

In the merged Daily Activities dataset, there also seems to be a fairly positive relationship between the number of very active minutes and the number of calories burned.

ggplot(data = daily_activities_v2, mapping = aes(x = VeryActiveMinutes, y = Calories)) +
  geom_point() +
  geom_smooth(method = 'loess') +
  labs(title = "Very Active vs Total Calories")
## `geom_smooth()` using formula 'y ~ x'

Fourth Observation: Lightly Active Minutes vs Total Calories

When comparing calories burned against Lightly Active Minutes, the trend was flat: calories burned did not clearly increase or decrease as lightly active minutes increased.

ggplot(data = daily_activities_v2, mapping = aes(x = LightlyActiveMinutes, y = Calories)) +
  geom_point() +
  geom_smooth(method = 'loess') +
  labs(title = "Light Activity vs Total Calories")
## `geom_smooth()` using formula 'y ~ x'
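
To compare the third and fourth observations side by side, one option is to compute the correlation of each activity level with calories. This is a minimal sketch on an invented toy data frame that reuses the column names from the plots; it uses Pearson correlation as a stand-in summary and does not reproduce the loess curves.

```r
# Toy stand-in for daily_activities_v2; all values are invented for illustration.
toy_activity <- data.frame(
  VeryActiveMinutes    = c(0, 5, 15, 30, 45, 60),
  LightlyActiveMinutes = c(200, 150, 240, 180, 210, 170),
  Calories             = c(1800, 1950, 2050, 2450, 2650, 3000)
)

# Correlation of each activity level with calories burned
cor_very  <- cor(toy_activity$VeryActiveMinutes, toy_activity$Calories)
cor_light <- cor(toy_activity$LightlyActiveMinutes, toy_activity$Calories)

print(c(very_active = cor_very, lightly_active = cor_light))
```

In this toy example the very active correlation is close to 1 and the lightly active one is close to 0, which mirrors the pattern described above. The real values would need to be computed from `daily_activities_v2`.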

Conclusion from Findings

In conclusion, the data suggests that customers who use the products often, meaning while they are doing some type of activity, see the most results. Referring to the Calories vs Steps visualization, the more steps the users took, the more calories they burned. From this, I conclude that the more users use the products for their everyday activities, the more results they will see, negative or positive.

5) Share

Business Question

Throughout this analysis, I kept the business question in mind, and I believe that it was answered. Based on the visualizations, Bellabeat products are effective for users who use them consistently and for daily activity.

What story does this tell?

The data tells us how much customers are using the products and shows the various ways that people are active within a day. It also relates this activity to sleep. The more activity someone did, the more calories they burned; the less activity, the fewer calories they burned. Likewise, the more time users spent in bed, the more time they spent asleep.

What does this tell us?

This story tells us that people will find a way to make something effective in their lives, whether through small or big changes.

6) Act

Based on this analysis, I have found that Bellabeat products appear to be highly effective for customers who use them regularly as part of an active lifestyle.

Recommendations & Insights

As outlined in the strategies under the business task, a few changes could improve Bellabeat’s sales.

Ultimately, the goal of this analysis is to show that there is no need to reinvent the wheel! From the data, the products seem to be effective, so improvement is the key.