Hello everyone, I’ve been working on the Google Data Analytics Professional Certificate through Coursera for the past few months. Along the way I’ve picked up a lot of interesting, insightful and, most importantly, useful information about the tools covered in the program, such as Tableau, R, SQL, and spreadsheets.
The curriculum not only exposed me to standardized practices, but also gave me a general framework that I can apply to any project. I also learned the key terminology and processes used by data analysts. Completing the case study included in the course let me sharpen these skills by applying a variety of tools, methods, and strategies. Below, I give a brief walkthrough of my thought process and the understanding I developed while working on this case study.
2 BACKGROUND INFORMATION
Bellabeat is a high-tech manufacturer of health-focused smart products for women. The company develops beautifully designed technology that informs and inspires women around the world. Its app and smart devices collect data on activity, sleep, stress, hydration levels, and reproductive health to empower women with an understanding of their own health and habits.
Bellabeat was founded in 2013 by Urška Sršen and Sando Mur and has since grown rapidly, positioning itself as a tech-driven wellness company for women. Within three years it had opened multiple offices around the world and launched various products. It has made its products available through a growing number of online retailers in addition to the e-commerce channel on its own website.
Bellabeat offers a range of products that promote wellness and a healthy lifestyle, all of which connect to the Bellabeat app. Among its product lines, one of the most popular is the Leaf, a wellness tracker that can be worn as a bracelet, necklace, or clip. It tracks the user’s activity, sleep, and stress levels and syncs with the Bellabeat app. Apart from its products, Bellabeat also offers a subscription-based membership program that provides personalized guidance on nutrition, activity, sleep, health, beauty, and mindfulness, based on users’ lifestyles and goals, with 24/7 access.
3 DATA LIMITATIONS
We have identified several limitations in the datasets. First and foremost, the data is not comprehensive, as it includes inputs from only 33 unique users. Of these 33 users, only 8 entered weight, 12 entered heart rate, and 24 entered sleep data. Furthermore, some users did not provide information for all variables in the weight dataset, making the data incomplete. Despite these limitations, we will still work with these datasets as they contain important variables.
It is important to note that the data comes from FitBit users, a secondary source, and therefore may not accurately reflect the behavior of BellaBeat users, which could lead to inaccurate insights. Another limitation is that the data is not current: it was collected between 4/12/2016 and 5/12/2016, about 5 years before this case study. The short collection period (only about 30 days) and the small number of users (33) also reduce reliability and may have produced a biased dataset. Some users may not have entered information, while others may have turned off their devices or not used them regularly. Additionally, some data, such as weight, was entered manually, which introduces potential errors.
If this were a real-life project intended to define BellaBeat’s marketing strategy, these limitations would need to be addressed before proceeding with the analysis. However, as this is a case study and we cannot control these limitations, we will still proceed with the analysis. In a real-world scenario, a data analyst would ask several questions before proceeding with data cleaning, such as why some users generated more data rows than others, whether users contributed data voluntarily or were told how often to use the app, and what measures were taken to eliminate sampling bias. It would also be beneficial to obtain newer versions of these datasets, or similar datasets directly from BellaBeat, for comparison.
4 ASK PHASE
Some guiding questions for the stakeholders, which will shape the direction of the future marketing program:
Are there specific product areas or devices that you consider essential to focus on when recommending a marketing strategy?
What targets or expectations have you set for this analysis? Could you give a concise brief?
Beyond a high-level marketing recommendation, are you willing to pivot your business strategy if this analysis reveals insights that could help you expand your customer base and improve the customer experience?
Can you identify any products that did not receive the desired response compared to stakeholders’ expectations? If so, could you briefly describe what you think went wrong?
What is the use case for the high-level marketing strategy? Is it to enhance your current products or services, or to launch a new product or service offering?
4.1 Key Takeaways
Identify business task.
The main purpose of this analysis is to recommend a high-level marketing strategy and provide insights that give the executive team a clear view of the current state and help identify untapped opportunities for growth.
Analyzing any one product can lead to minor or significant changes to that product, or to the creation of a new product that improves the overall experience of female customers and increases retention rates.
Consider Key Stakeholders.
Urška Sršen (Bellabeat’s cofounder and Chief Creative Officer).
Sando Mur (Mathematician and Bellabeat’s cofounder).
Marketing Analytics Team.
4.2 Deliverables
Our goal is to identify growth opportunities for specific products or services, as well as to unlock the full potential of the female customers who use these offerings.
The aim is to identify the factors that hinder female customers from achieving a balanced lifestyle through the app and that limit opportunities to improve the app’s services.
5 PREPARE PHASE
In this analysis, I will use the datasets to identify patterns in how female users are using the products and services. The analysis can also highlight areas where a product or service may need improvement, or where marketing strategies can be optimized to better cater to the needs and preferences of female users. The data is in the public domain and was made available by Mobius.
5.1 Key Task
Load the datasets in a consistent order.
Download the datasets from the given online repository and save them in a separate folder as raw data.
Determine the file format and confirm that the files are accessible, readable, and writable.
Examine the credibility of the data by inspecting each dataset for ambiguous or unwanted rows, then sort them accordingly.
Check whether the number of distinct Ids is the same in each dataset to identify discrepancies and inconsistencies.
5.2 Deliverables
Documenting the entire procedure for this phase step by step.
A short brief on each operation performed, for clarity and ease of understanding.
Checking for distinct user Ids to find the exact number of users in each dataset.
For daily_steps dataset.
Code
n_distinct(daily_steps$Id)
[1] 33
For daily_calories dataset.
Code
n_distinct(daily_calories$Id)
[1] 33
For daily_intensities dataset.
Code
n_distinct(daily_Intensities$Id)
[1] 33
For daily_activity dataset.
Code
n_distinct(daily_activity$Id)
[1] 33
For sleep_day dataset.
Code
n_distinct(sleep_day$Id)
[1] 24
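The per-dataset checks above can also be run in one pass. The following is a minimal sketch, assuming the data frames are collected in a named list. The two toy data frames are made up for illustration and only stand in for the real datasets.

```r
# Toy stand-ins for the real datasets (values are illustrative only)
daily_steps <- data.frame(Id = c(1, 1, 2, 3), StepTotal = c(5000, 7000, 3000, 9000))
sleep_day   <- data.frame(Id = c(1, 2, 2), TotalMinutesAsleep = c(400, 380, 420))

# Collect the data frames in a named list so they can be checked together
datasets <- list(daily_steps = daily_steps, sleep_day = sleep_day)

# Count distinct user Ids in each dataset
id_counts <- sapply(datasets, function(df) length(unique(df$Id)))
print(id_counts)

# Ids that appear in daily_steps but are missing from sleep_day
missing_in_sleep <- setdiff(unique(daily_steps$Id), unique(sleep_day$Id))
print(missing_in_sleep)
```

Listing the missing Ids, and not only the counts, shows which users did not log sleep at all.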
6 PROCESS PHASE
The gathered datasets need to be cleaned and processed to assess their quality before proceeding with further analysis.
6.1 Key Task
Examine the datasets for errors or missing values.
Remove duplicates and outliers from the datasets, if any.
Select the appropriate tool to perform the required analysis.
Store a backup of the original datasets to refer back to in case any essential data is lost during analysis.
Transform the existing datasets into a workable format for the desired analysis.
6.2 Deliverables
Making key changes to the datasets, such as converting data types and using functions to calculate values.
Manipulating the datasets by performing the required computations.
Stating the use case for each change made to the datasets, along with a detailed summary of every dataset.
Recording every change made across the datasets, from minor to major.
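The duplicate and missing-value checks listed in the key tasks could look roughly like this. It is a sketch on a made-up data frame using base R only; the column names mirror the sleep data, but the rows are invented.

```r
# Toy sleep data with one exact duplicate row and one missing value
sleep_day <- data.frame(
  Id = c(1, 1, 1, 2),
  SleepDay = c("4/12/2016", "4/12/2016", "4/13/2016", "4/12/2016"),
  TotalMinutesAsleep = c(400, 400, NA, 380)
)

# Keep an untouched copy before changing anything
sleep_day_raw <- sleep_day

# Number of rows that exactly repeat an earlier row
n_duplicates <- sum(duplicated(sleep_day))

# Number of missing values in each column
na_per_column <- colSums(is.na(sleep_day))

# Drop the exact duplicates
sleep_day_clean <- unique(sleep_day)

print(n_duplicates)
print(na_per_column)
print(nrow(sleep_day_clean))
```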
6.3 Code Chunk
Splitting date and time into two separate columns.
Here, a character string is parsed into date and time using mdy_hms().
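The code for this step is not shown, so the following is only a sketch of how the split might be done with lubridate. The ActivityHour column name and the sample timestamps are assumptions made for illustration. Time is kept as a plain character string here for simplicity, whereas the summaries further down show it stored as an hms value.

```r
library(lubridate)

# Toy hourly data; the column name and timestamp format are assumed
hourly_calories <- data.frame(
  Id = c(1, 1),
  ActivityHour = c("4/12/2016 12:00:00 AM", "4/12/2016 1:00:00 PM"),
  Calories = c(81, 120)
)

# Parse the month/day/year hour:minute:second strings into date-times (UTC)
parsed <- mdy_hms(hourly_calories$ActivityHour)

# Store the date part and the time part in separate columns
hourly_calories$Date <- as_date(parsed)
hourly_calories$Time <- format(parsed, "%H:%M:%S")

print(hourly_calories)
```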
Merging the data by Id to create combined data frames that give a broader view.
Code
calorie_steps <- merge(daily_calories, daily_steps, by = "Id", all = TRUE)
weight_sleep <- merge(weight_log_info, sleep_day, by = "Id", all = TRUE)
dailyActivity_sleep <- merge(daily_activity, sleep_day, by = "Id", all = TRUE)
dailyIntensities_weight <- merge(daily_Intensities, weight_log_info, by = "Id", all = TRUE)
Using inner_join() to join the datasets by Id and create combined data frames that give a broader view.
Code
hourly_calories_intensities <- inner_join(hourly_calories, hourly_intensities, by = "Id", multiple = "all")
hourly_calories_steps <- inner_join(hourly_calories, hourly_steps, by = "Id", multiple = "all")
hourly_intensities_calories <- inner_join(calorie_steps, hourly_intensities, by = "Id", multiple = "all")
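One thing to keep in mind with these joins: when an Id appears on many rows in both tables, joining on Id alone pairs every row of one table with every row of the other for that Id. This is a likely reason the merged summaries report far more rows than the source tables hold. The sketch below uses made-up data and assumes both daily tables share a date column named ActivityDay. It compares a join on Id with a join on Id plus date.

```r
# Toy daily tables: one user, three days each (values are illustrative)
daily_calories <- data.frame(
  Id = c(1, 1, 1),
  ActivityDay = as.Date("2016-04-12") + 0:2,
  Calories = c(2000, 2100, 1900)
)
daily_steps <- data.frame(
  Id = c(1, 1, 1),
  ActivityDay = as.Date("2016-04-12") + 0:2,
  StepTotal = c(8000, 9000, 7000)
)

# Join on Id only: each of the 3 calorie rows meets each of the 3 step rows
by_id <- merge(daily_calories, daily_steps, by = "Id", all = TRUE)

# Join on Id and date: each day is matched with itself only
by_id_day <- merge(daily_calories, daily_steps, by = c("Id", "ActivityDay"), all = TRUE)

print(nrow(by_id))
print(nrow(by_id_day))
```

Summary statistics computed on the Id-only result weight users with more logged days more heavily, so figures taken from it are best read with that in mind.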
Omitting NAs from the datasets, as this is required before visualizing.
Code
summary(calorie_steps)
Id ActivityDay.x Calories Month.x
Min. :1.504e+09 Min. :2016-04-12 Min. : 0 Length:27800
1st Qu.:2.320e+09 1st Qu.:2016-04-19 1st Qu.:1827 Class :character
Median :4.445e+09 Median :2016-04-26 Median :2156 Mode :character
Mean :4.833e+09 Mean :2016-04-26 Mean :2313
3rd Qu.:6.962e+09 3rd Qu.:2016-05-04 3rd Qu.:2800
Max. :8.878e+09 Max. :2016-05-12 Max. :4900
WeekDay.x ActivityDay.y StepTotal Month.y
Length:27800 Min. :2016-04-12 Min. : 0 Length:27800
Class :character 1st Qu.:2016-04-19 1st Qu.: 3761 Class :character
Mode :character Median :2016-04-26 Median : 7443 Mode :character
Mean :2016-04-26 Mean : 7673
3rd Qu.:2016-05-04 3rd Qu.:10771
Max. :2016-05-12 Max. :36019
WeekDay.y
Length:27800
Class :character
Mode :character
Code
summary(weight_sleep)
Id Date.x WeightKg WeightPounds
Min. :1.504e+09 Min. :2016-04-12 Min. : 52.60 Min. :116.0
1st Qu.:5.577e+09 1st Qu.:2016-04-18 1st Qu.: 61.20 1st Qu.:134.9
Median :6.962e+09 Median :2016-04-28 Median : 61.50 Median :135.6
Mean :6.235e+09 Mean :2016-04-26 Mean : 63.34 Mean :139.6
3rd Qu.:6.962e+09 3rd Qu.:2016-05-04 3rd Qu.: 62.00 3rd Qu.:136.7
Max. :8.878e+09 Max. :2016-05-12 Max. :133.50 Max. :294.3
NA's :292 NA's :292 NA's :292
Fat BMI IsManualReport LogId
Min. :22.00 Min. :21.45 Mode :logical Min. :1.460e+12
1st Qu.:22.00 1st Qu.:23.89 FALSE:55 1st Qu.:1.461e+12
Median :25.00 Median :24.00 TRUE :1059 Median :1.462e+12
Mean :23.53 Mean :24.42 NA's :292 Mean :1.462e+12
3rd Qu.:25.00 3rd Qu.:24.21 3rd Qu.:1.462e+12
Max. :25.00 Max. :47.54 Max. :1.463e+12
NA's :1355 NA's :292 NA's :292
Time.x Month.x WeekDay.x Time_of_day
Length:1406 Length:1406 Length:1406 Night : 23
Class1:hms Class :character Class :character Morning : 31
Class2:difftime Mode :character Mode :character Afternoon: 1
Mode :numeric Evening :1059
NA's : 292
TotalSleepRecords TotalMinutesAsleep TotalTimeInBed Date.y
Min. :1.000 Min. : 58.0 Min. : 61.0 Min. :2016-04-12
1st Qu.:1.000 1st Qu.:400.0 1st Qu.:421.8 1st Qu.:2016-04-19
Median :1.000 Median :442.0 Median :457.0 Median :2016-04-27
Mean :1.101 Mean :433.7 Mean :458.3 Mean :2016-04-26
3rd Qu.:1.000 3rd Qu.:476.2 3rd Qu.:510.0 3rd Qu.:2016-05-04
Max. :3.000 Max. :796.0 Max. :961.0 Max. :2016-05-12
NA's :26 NA's :26 NA's :26 NA's :26
Time.y Month.y WeekDay.y
Length:1406 Length:1406 Length:1406
Class1:hms Class :character Class :character
Class2:difftime Mode :character Mode :character
Mode :numeric
Code
summary(dailyActivity_sleep)
Id ActivityDate TotalSteps TotalDistance
Min. :1.504e+09 Min. :2016-04-12 Min. : 0 Min. : 0.000
1st Qu.:3.977e+09 1st Qu.:2016-04-19 1st Qu.: 4660 1st Qu.: 3.160
Median :4.703e+09 Median :2016-04-27 Median : 8585 Median : 6.120
Mean :5.021e+09 Mean :2016-04-26 Mean : 8108 Mean : 5.722
3rd Qu.:6.962e+09 3rd Qu.:2016-05-04 3rd Qu.:11317 3rd Qu.: 7.920
Max. :8.792e+09 Max. :2016-05-12 Max. :22988 Max. :17.950
TrackerDistance LoggedActivitiesDistance VeryActiveDistance
Min. : 0.000 Min. :0.0000 Min. : 0.000
1st Qu.: 3.160 1st Qu.:0.0000 1st Qu.: 0.000
Median : 6.120 Median :0.0000 Median : 0.530
Mean : 5.715 Mean :0.1215 Mean : 1.397
3rd Qu.: 7.880 3rd Qu.:0.0000 3rd Qu.: 2.310
Max. :17.950 Max. :4.9421 Max. :13.400
ModeratelyActiveDistance LightActiveDistance SedentaryActiveDistance
Min. :0.0000 Min. : 0.000 Min. :0.0000000
1st Qu.:0.0000 1st Qu.: 2.350 1st Qu.:0.0000000
Median :0.4000 Median : 3.540 Median :0.0000000
Mean :0.7309 Mean : 3.532 Mean :0.0006795
3rd Qu.:1.0000 3rd Qu.: 4.830 3rd Qu.:0.0000000
Max. :6.4800 Max. :10.300 Max. :0.1100000
VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes
Min. : 0.00 Min. : 0.00 Min. : 0.0 Min. : 0.0
1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.:144.0 1st Qu.: 659.0
Median : 8.00 Median : 10.00 Median :200.0 Median : 734.0
Mean : 23.94 Mean : 17.34 Mean :199.8 Mean : 799.4
3rd Qu.: 36.00 3rd Qu.: 24.00 3rd Qu.:258.0 3rd Qu.: 853.0
Max. :210.00 Max. :143.00 Max. :518.0 Max. :1440.0
Calories Month.x WeekDay.x TotalSleepRecords
Min. : 0 Length:12348 Length:12348 Min. :1.000
1st Qu.:1776 Class :character Class :character 1st Qu.:1.000
Median :2158 Mode :character Mode :character Median :1.000
Mean :2323 Mean :1.122
3rd Qu.:2859 3rd Qu.:1.000
Max. :4900 Max. :3.000
TotalMinutesAsleep TotalTimeInBed Date Time
Min. : 58.0 Min. : 61.0 Min. :2016-04-12 Length:12348
1st Qu.:361.0 1st Qu.:402.0 1st Qu.:2016-04-19 Class1:hms
Median :432.0 Median :462.0 Median :2016-04-27 Class2:difftime
Mean :419.1 Mean :458.2 Mean :2016-04-26 Mode :numeric
3rd Qu.:492.0 3rd Qu.:526.0 3rd Qu.:2016-05-04
Max. :796.0 Max. :961.0 Max. :2016-05-12
Month.y WeekDay.y
Length:12348 Length:12348
Class :character Class :character
Mode :character Mode :character
Code
summary(dailyIntensities_weight)
Id ActivityDay SedentaryMinutes LightlyActiveMinutes
Min. :1.504e+09 Min. :2016-04-12 Min. : 0.0 Min. : 0.0
1st Qu.:1.504e+09 1st Qu.:2016-04-19 1st Qu.: 654.0 1st Qu.:191.8
Median :2.912e+09 Median :2016-04-27 Median : 739.0 Median :233.0
Mean :2.912e+09 Mean :2016-04-27 Mean : 792.0 Mean :224.4
3rd Qu.:4.320e+09 3rd Qu.:2016-05-04 3rd Qu.: 834.5 3rd Qu.:288.8
Max. :4.320e+09 Max. :2016-05-12 Max. :1440.0 Max. :390.0
FairlyActiveMinutes VeryActiveMinutes SedentaryActiveDistance
Min. : 0.00 Min. : 0.00 Min. :0
1st Qu.: 8.00 1st Qu.: 1.00 1st Qu.:0
Median :13.50 Median :14.50 Median :0
Mean :15.74 Mean :21.15 Mean :0
3rd Qu.:23.00 3rd Qu.:37.75 3rd Qu.:0
Max. :47.00 Max. :78.00 Max. :0
LightActiveDistance ModeratelyActiveDistance VeryActiveDistance
Min. :0.000 Min. :0.0000 Min. :0.0000
1st Qu.:2.973 1st Qu.:0.3225 1st Qu.:0.0625
Median :4.255 Median :0.5650 Median :1.0350
Mean :3.961 Mean :0.6482 Mean :1.5682
3rd Qu.:5.397 3rd Qu.:0.9875 3rd Qu.:2.7850
Max. :6.440 Max. :2.1200 Max. :6.4000
Month.x WeekDay.x Date WeightKg
Length:62 Length:62 Min. :2016-04-17 Min. :52.6
Class :character Class :character 1st Qu.:2016-04-17 1st Qu.:52.6
Mode :character Mode :character Median :2016-04-24 Median :62.5
Mean :2016-04-24 Mean :62.5
3rd Qu.:2016-05-02 3rd Qu.:72.4
Max. :2016-05-02 Max. :72.4
WeightPounds Fat BMI IsManualReport
Min. :116.0 Min. :22.0 Min. :22.65 Mode:logical
1st Qu.:116.0 1st Qu.:22.0 1st Qu.:22.65 TRUE:62
Median :137.8 Median :23.5 Median :25.05
Mean :137.8 Mean :23.5 Mean :25.05
3rd Qu.:159.6 3rd Qu.:25.0 3rd Qu.:27.45
Max. :159.6 Max. :25.0 Max. :27.45
LogId Time Month.y WeekDay.y
Min. :1.461e+12 Length:62 Length:62 Length:62
1st Qu.:1.461e+12 Class1:hms Class :character Class :character
Median :1.462e+12 Class2:difftime Mode :character Mode :character
Mean :1.462e+12 Mode :numeric
3rd Qu.:1.462e+12
Max. :1.462e+12
Time_of_day
Night : 0
Morning : 0
Afternoon: 0
Evening :62
Code
summary(hourly_calories_intensities)
Id Calories Date.x Time.x
Min. :1.504e+09 Min. : 42.00 Min. :2016-04-12 Length:15393213
1st Qu.:2.320e+09 1st Qu.: 63.00 1st Qu.:2016-04-19 Class1:hms
Median :4.445e+09 Median : 83.00 Median :2016-04-26 Class2:difftime
Mean :4.820e+09 Mean : 97.74 Mean :2016-04-26 Mode :numeric
3rd Qu.:6.962e+09 3rd Qu.:109.00 3rd Qu.:2016-05-04
Max. :8.878e+09 Max. :948.00 Max. :2016-05-12
Month.x WeekDay.x Time_of_day.x TotalIntensity
Length:15393213 Length:15393213 Night :4545397 Min. : 0.00
Class :character Class :character Morning :3880444 1st Qu.: 0.00
Mode :character Mode :character Afternoon:3814580 Median : 3.00
Evening :3152792 Mean : 12.06
3rd Qu.: 16.00
Max. :180.00
AverageIntensity Date.y Time.y Month.y
Min. :0.0000 Min. :2016-04-12 Length:15393213 Length:15393213
1st Qu.:0.0000 1st Qu.:2016-04-19 Class1:hms Class :character
Median :0.0500 Median :2016-04-26 Class2:difftime Mode :character
Mean :0.2010 Mean :2016-04-26 Mode :numeric
3rd Qu.:0.2667 3rd Qu.:2016-05-04
Max. :3.0000 Max. :2016-05-12
WeekDay.y Time_of_day.y
Length:15393213 Night :4545397
Class :character Morning :3880444
Mode :character Afternoon:3814580
Evening :3152792
Code
summary(hourly_calories_steps)
Id Calories Date.x Time.x
Min. :1.504e+09 Min. : 42.00 Min. :2016-04-12 Length:15393213
1st Qu.:2.320e+09 1st Qu.: 63.00 1st Qu.:2016-04-19 Class1:hms
Median :4.445e+09 Median : 83.00 Median :2016-04-26 Class2:difftime
Mean :4.820e+09 Mean : 97.74 Mean :2016-04-26 Mode :numeric
3rd Qu.:6.962e+09 3rd Qu.:109.00 3rd Qu.:2016-05-04
Max. :8.878e+09 Max. :948.00 Max. :2016-05-12
Month.x WeekDay.x Time_of_day.x StepTotal
Length:15393213 Length:15393213 Night :4545397 Min. : 0.0
Class :character Class :character Morning :3880444 1st Qu.: 0.0
Mode :character Mode :character Afternoon:3814580 Median : 41.0
Evening :3152792 Mean : 321.2
3rd Qu.: 359.0
Max. :10554.0
Date.y Time.y Month.y WeekDay.y
Min. :2016-04-12 Length:15393213 Length:15393213 Length:15393213
1st Qu.:2016-04-19 Class1:hms Class :character Class :character
Median :2016-04-26 Class2:difftime Mode :character Mode :character
Mean :2016-04-26 Mode :numeric
3rd Qu.:2016-05-04
Max. :2016-05-12
Time_of_day.y
Night :4545397
Morning :3880444
Afternoon:3814580
Evening :3152792
Code
summary(hourly_intensities_calories)
Id ActivityDay.x Calories Month.x
Min. :1.504e+09 Min. :2016-04-12 Min. : 0 Length:19615642
1st Qu.:2.320e+09 1st Qu.:2016-04-19 1st Qu.:1821 Class :character
Median :4.445e+09 Median :2016-04-26 Median :2162 Mode :character
Mean :4.802e+09 Mean :2016-04-26 Mean :2320
3rd Qu.:6.962e+09 3rd Qu.:2016-05-04 3rd Qu.:2809
Max. :8.878e+09 Max. :2016-05-12 Max. :4900
WeekDay.x ActivityDay.y StepTotal Month.y
Length:19615642 Min. :2016-04-12 Min. : 0 Length:19615642
Class :character 1st Qu.:2016-04-19 1st Qu.: 3761 Class :character
Mode :character Median :2016-04-26 Median : 7502 Mode :character
Mean :2016-04-26 Mean : 7696
3rd Qu.:2016-05-04 3rd Qu.:10817
Max. :2016-05-12 Max. :36019
WeekDay.y TotalIntensity AverageIntensity Date
Length:19615642 Min. : 0.00 Min. :0.0000 Min. :2016-04-12
Class :character 1st Qu.: 0.00 1st Qu.:0.0000 1st Qu.:2016-04-19
Mode :character Median : 3.00 Median :0.0500 Median :2016-04-26
Mean : 12.06 Mean :0.2010 Mean :2016-04-26
3rd Qu.: 16.00 3rd Qu.:0.2667 3rd Qu.:2016-05-04
Max. :180.00 Max. :3.0000 Max. :2016-05-12
Time Month WeekDay Time_of_day
Length:19615642 Length:19615642 Length:19615642 Night :5791418
Class1:hms Class :character Class :character Morning :4945271
Class2:difftime Mode :character Mode :character Afternoon:4861084
Mode :numeric Evening :4017869
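The NA-omission step mentioned earlier is not shown as code, so here is a hedged sketch of two ways it might be done in base R. The data frame is made up. In the weight_sleep summary above, the Fat column is NA in 1355 of 1406 rows, which is why dropping every row that has any NA can discard most of the data.

```r
# Toy merged data with missing values (illustrative only)
weight_sleep <- data.frame(
  Id = c(1, 1, 2, 3),
  WeightKg = c(52.6, NA, 61.5, 72.4),
  Fat = c(22, NA, NA, NA),
  TotalMinutesAsleep = c(400, 380, NA, 420)
)

# Option 1: drop every row that has an NA in any column
all_complete <- na.omit(weight_sleep)

# Option 2: drop rows only when the columns needed for a given plot are missing
needed <- c("WeightKg", "TotalMinutesAsleep")
plot_ready <- weight_sleep[complete.cases(weight_sleep[, needed]), ]

print(nrow(all_complete))
print(nrow(plot_ready))
```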
7 ANALYZE PHASE
In this crucial phase, I will analyze the cleaned and processed datasets to answer both the known questions and ones not yet asked. This will help Bellabeat’s stakeholders and marketing executives make informed decisions and develop a targeted marketing campaign. Ultimately, it will help retain the existing customer base and improve services to the highest possible standard.
7.1 Key Task
A series of computations was performed to understand how female customers are using the products and services.
The analysis was conducted to obtain a thorough understanding of female customer traits and to identify any patterns that could help the analytics team determine areas for improvement.
Merging multiple datasets makes it possible to explore trends and relationships that may exist across them, giving a clearer picture of the user base.
Several columns were aggregated to create new attributes, which were then compared to get a refined understanding of the day-to-day activity recorded through the Bellabeat app.
Various R functions were used to examine these datasets thoroughly and to finalize the profiling for this analysis.
7.2 Deliverables
Numerous analyses were performed using functions such as summarise(), distinct(), group_by(), describe(), table(), etc.
The computations give a brief picture of how female customers use products and services across different categories.
In addition, some statistical operations were performed to obtain the distribution of the attributes in the datasets that influence customer behavior.
7.3 Code Chunk
Getting an overview of the maximum and minimum calories burned by each user using the max() and min() functions respectively.
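The code for this step is not included, so the following is a minimal sketch of one way to do it with dplyr. The data frame is a made-up stand-in for daily_calories.

```r
library(dplyr)

# Toy daily calories for two users (values are illustrative)
daily_calories <- data.frame(
  Id = c(1, 1, 2, 2, 2),
  Calories = c(1800, 2200, 2500, 2100, 2900)
)

# Maximum and minimum calories burned per user
calorie_range <- daily_calories %>%
  group_by(Id) %>%
  summarise(max_calories = max(Calories), min_calories = min(Calories))

print(calorie_range)
```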
Summarizing specific columns of the dataset to get an overview of each, and rounding the summary to two decimal places using ‘digits’.
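As the original code is not shown, here is one hedged way to get a per-column summary rounded to two decimal places. The helper round_summary() is hypothetical and the data is made up. As far as I know, the digits argument of summary() controls significant digits rather than decimal places, so this sketch rounds explicitly with round().

```r
# Toy activity data (values are illustrative)
daily_activity <- data.frame(
  TotalSteps = c(4660, 8585, 11317),
  TotalDistance = c(3.1612, 6.1234, 7.9287)
)

# Hypothetical helper: basic statistics rounded to a fixed number of decimals
round_summary <- function(x, digits = 2) {
  round(c(min = min(x), median = median(x), mean = mean(x), max = max(x)), digits)
}

# Apply the helper to the selected columns; the result is a matrix with
# one row per statistic and one column per variable
summary_table <- sapply(daily_activity[c("TotalSteps", "TotalDistance")], round_summary)

print(summary_table)
```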
8 SHARE PHASE
In this phase, potential insights will be shared through appropriate visualizations created with tools such as R and Tableau. These visualizations will point to actionable steps that stakeholders can take to address the relevant concerns.
8.1 Key Task
Selecting the most suitable tools, such as R and Tableau, to present the visualizations effectively.
Choosing the appropriate graph type to convey the findings, with legends, labels, and headings to improve readability and interpretation.
Providing detailed explanations for all aspects of the analysis, including minor details, by making the visualizations interactive.
Ensuring the work is easily accessible.
8.2 Deliverables
Presentation of findings, accompanied by graphs and explanations.
A short brief for each visualization included in this phase to aid understanding.
All of the visualizations were made interactive in order to provide a wider outlook.
8.3 Visualization
Comparing total time asleep vs total time in bed using geom functions such as geom_point(), geom_smooth() and geom_jitter().
Code
# Note: as_hms() treats a plain number as seconds, so 796 minutes is shown as 00:13:16
ggplotly(
  ggplot(data = sleep_day) +
    aes(x = as_hms(TotalTimeInBed), y = as_hms(TotalMinutesAsleep)) +
    geom_point() +
    geom_smooth() +
    geom_jitter() +
    labs(title = paste0("<b>", "Total time Asleep Vs Total Time In Bed", "</b>"),
         x = "Total Time in Bed", y = "Total Minutes Asleep") +
    theme_minimal()
)
Comparing the split of distance covered for each weekday of a month using the geom_bar() function.
Code
ggplotly(
  ggplot(data = daily_activity) +
    aes(x = Month, y = TotalDistance, fill = WeekDay) +
    geom_bar(stat = 'identity', position = 'dodge', width = 1) +
    scale_fill_manual(values = c("blue", "orange", "brown", "yellow", "black", "red", "darkgoldenrod")) +
    # labs() needs a lowercase y to label the y axis
    labs(title = paste0("<b>", "Comparing Distance for every Weekday in a month", "</b>"),
         x = "Month", y = "Total Distance", fill = "Weekdays") +
    theme(axis.text.x = element_text(vjust = 0.5, hjust = 1),
          plot.background = element_rect(fill = "lightblue"))
)
Visualizing the difference in total steps taken for each weekday of a month using the geom_col() function.
Code
ggplotly(
  ggplot(data = daily_activity) +
    aes(x = Month, y = TotalSteps, fill = WeekDay) +
    geom_col(position = 'dodge', width = 1) +
    scale_fill_manual(values = c("brown", "darkgreen", "orange", "darkgoldenrod", "black", "blue", "darkorchid")) +
    labs(title = paste0("<b>", "Comparing Total Steps for every Weekday in a month", "</b>"),
         x = "Month", y = "Total Steps", fill = "Weekdays") +
    theme(axis.title.x = element_text(vjust = 0.5, hjust = 1),
          plot.background = element_rect(fill = "skyblue"))
)
Exploring the correlation between total distance covered and total steps taken for each weekday of a month using the geom_line(), geom_point() and facet_wrap() functions.
Code
# color (not fill) is the aesthetic that geom_line() and the default geom_point() respond to
ggplotly(
  ggplot(data = daily_activity) +
    aes(x = TotalDistance, y = TotalSteps, color = Month) +
    geom_line(linewidth = 1.5) +
    geom_point(size = 2) +
    facet_wrap(~WeekDay) +
    scale_color_manual(values = c("lightblue", "orange")) +
    theme(panel.grid.major = element_line(color = "gray", linetype = "dotted")) +
    labs(title = "Total Distance Vs Total Steps Taken") +
    xlab("Total Distance") +
    ylab("Total Steps")
)
Visualizing daily calories burned for each weekday in a month as a bar plot.
Code
plot_ly(data = daily_calories, x = ~WeekDay, y = ~Calories, type = "bar",
        color = ~Month, colors = c("black", "darkorchid")) %>%
  layout(title = "Daily Calories by Weekday and Month",
         xaxis = list(title = "Weekday"),
         yaxis = list(title = "Calories"),
         legend = list(title = list(text = "Month")),
         hovermode = "closest") %>%
  layout(xaxis = list(tickangle = 60, tickfont = list(size = 10)))
Visualizing hourly calories burned by each user Id for each weekday in a month using geom_line().
Create ggplot object.
Code
ggploty_obj <- ggplot(data = hourly_calories, aes(x = as_hms(Time), y = Calories, color = Month)) +
  geom_line(linewidth = 1.5, alpha = 0.8) +
  facet_wrap(~WeekDay) +
  scale_color_brewer(palette = "Set1") +
  labs(x = "Time of Day", y = "Calories Burned",
       title = "Hourly Calories burned each weekday of a month") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
Convert ggplot object to plotly object.
Code
plot_obj <-ggplotly(ggploty_obj)
Show plotly object.
Code
plot_obj
Visualizing calories burned each minute for a month, by weekday, using the geom_line() and facet_wrap() functions.
Create ggplot object.
Code
ggplot_obj <- ggplot(data = Calories, aes(x = as_hms(Time), y = total_calories, color = Month)) +
  geom_line(linewidth = 1.5, alpha = 0.8) +
  facet_wrap(~WeekDay) +
  scale_color_brewer(palette = "Dark2") +
  labs(x = "Time of Day", y = "Calories Burned",
       title = "Each minute of calories burned on a weekday of a month") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
Convert ggplot object to plotly object.
Code
plotly_obj <-ggplotly(ggplot_obj)
Display plotly object.
Code
plotly_obj
Comparing the distribution and correlation of TotalMinutesAsleep, TotalTimeInBed and SedentaryMinutes using a correlogram.
This crucial phase of strategizing the new marketing campaign will be carried out by Urška Sršen (Bellabeat’s cofounder and Chief Creative Officer), Sando Mur (Mathematician and Bellabeat’s cofounder; key member of the Bellabeat executive team) and the Bellabeat marketing team, based on the conclusions of the above analysis.
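The correlogram code is not shown, and the package used to draw it is unknown. As a rough stand-in, the numbers behind such a plot can be obtained with the base R cor() function. The values below are made up; only the column names follow the datasets.

```r
# Toy data for three variables (values are illustrative)
activity_sleep <- data.frame(
  TotalMinutesAsleep = c(420, 380, 450, 300, 500),
  TotalTimeInBed     = c(450, 420, 475, 360, 520),
  SedentaryMinutes   = c(700, 760, 650, 900, 600)
)

# Pairwise Pearson correlations, rounded for readability
cor_matrix <- round(cor(activity_sleep), 2)

print(cor_matrix)
```

A positive entry means the two variables rise together, and a negative entry means one tends to fall as the other rises. A correlogram displays this same matrix graphically.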
11 CONCLUSION
The relationship between total time asleep and total time in bed appears to be quite linear. Nevertheless, some users sleep for more than 10 hours and spend over 12 hours in bed, which may point to an unhealthy sleep cycle.
Female users appear to cover more distance on weekends, particularly Saturdays and Sundays, than on other days. The distance also varies from month to month. Of the two months of data available, Thursday and Tuesday recorded the lowest distance covered in May.
The pattern of total steps walked each weekday in a month appears similar to that of the distance covered by each user. This suggests that most calories are burned through walking rather than through other exercises.
Based on the two months of data, one interesting insight is that the highest daily calories were burned by female users in April, whereas the lowest were burned in May.
The minute-by-minute and hourly data show that calories burned follow an almost linear pattern across active weekdays, with Friday, Saturday, Thursday, and Tuesday being the peak days compared to the rest of the week.
The correlogram of total time asleep and total time in bed clearly indicates a positive relationship between them. However, sedentary minutes have a negative relationship with both total time asleep and time in bed. Additionally, the distribution of sedentary minutes is bimodal. These findings suggest that increasing time spent asleep and in bed may be beneficial for overall health, while reducing sedentary behavior could also have a positive impact on day-to-day life.
The correlogram of weight in pounds and BMI reveals a clear positive relationship between these variables, while sedentary minutes exhibit a negative relationship with both weight and BMI. These findings suggest that reducing sedentary behavior could help maintain a balanced weight and BMI.
The pie chart clearly shows that the share of sedentary minutes is much higher than that of the other segments (very active, fairly active, and lightly active minutes), which shows that users were only minimally active.
The second pie chart reveals that lightly active distance contributes almost 50% of the total, followed by very active and moderately active distance. Sedentary distance is almost negligible compared to the other three segments.
Among the three categories of distance covered by each female user, the average very active distance is 1.39 km and the average moderately active distance is 0.73 km, while the average lightly active distance is much higher at 3.53 km. These findings suggest that female users engaged in a variety of physical activity levels, with light activity being the most common and covering the most distance.
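Averages and percentage shares like the ones quoted in these conclusions can be computed along the following lines. This is a sketch on made-up numbers. The column names follow the daily activity data, but the values are not taken from it.

```r
# Toy distance data by intensity level (values are illustrative)
daily_activity <- data.frame(
  VeryActiveDistance       = c(1.0, 2.0, 0.0),
  ModeratelyActiveDistance = c(0.5, 1.0, 0.0),
  LightActiveDistance      = c(3.0, 4.0, 2.0),
  SedentaryActiveDistance  = c(0.0, 0.0, 0.0)
)

# Average distance per intensity level
avg_distance <- colMeans(daily_activity)

# Share of each level in the total, as a percentage (the pie chart slices)
share_pct <- round(100 * avg_distance / sum(avg_distance), 1)

print(avg_distance)
print(share_pct)
```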
12 DELIVERABLE
Allocate data engineers to collect a diverse range of health data while maintaining data integrity, and integrate data science into Bellabeat to enable personalized push notifications based on individual health parameters.
Integrating data science can help users build a comprehensive understanding of their health and wellness, which in turn can guide them towards more informed lifestyle choices and improve their overall well-being, making the app a more effective tool for achieving health and wellness goals.
An effective way to get weight data is to partner with a weight scale manufacturer to develop a smart scale that provides precise weight readings. Weight is one of the crucial parameters that help nutritionists or algorithms provide personalized and accurate guidance. Leveraging this data can also improve the overall experience of achieving health and wellness goals.
To create a more engaging experience for our users, we could develop a community feature that allows them to connect with friends from their contacts, social media accounts, or directly within the app. By leveraging social connections, we can boost user engagement and increase screen time on the app.
The Bellabeat app can become an essential part of our users’ daily routine by integrating additional features that help them plan their day. It could also include an alarm option that reminds them of scheduled tasks throughout the day. Additionally, we could host weekly and monthly challenges that are open to users from any location, and offer rewards such as discount coupons or free product or service subscriptions for those who complete the challenges. These features will not only enhance the user experience, but also encourage users to use the app regularly and make healthier choices as part of their daily lifestyle.
To help users improve their sleep cycle, we can add extensions to the app that provide default notifications for getting into bed and waking up on time. This can be achieved by incorporating heart rate sensors that detect the user’s sleep time and provide alerts accordingly. These features will not only promote better sleep habits, but also enhance the user experience by providing personalized notifications based on their individual sleep patterns.
To reduce sedentary minutes, it is worth adding a feature that reminds users to drink water every half hour to stay hydrated. Additionally, if a user has been sedentary for two or three hours, the app can remind them to take a short walk, making them more active and productive throughout the day. With these features, users can stay motivated to remain active and healthy, leading to a more balanced lifestyle.
To reduce total time in bed, Bellabeat can extend the sleep cycle feature with a notification that prompts users to fall asleep if they have been awake in bed for an extended period of time. Additionally, Bellabeat can partner with a software company that produces meditation content and bedtime stories and integrate these subscriptions into its annual or monthly packages. This can give users a complete bundle of health and calmness along with a push to fall asleep early, thus reducing total time in bed.
Expand the user base by offering referral benefits, such as giving both the referrer and the new user a 50% discount on annual subscriptions, or a subscription to any product. Another feature could be a secure Bellabeat wallet service where users earn coins for referrals and for every challenge they participate in, and can then spend them on future purchases of Bellabeat products or services.