Google Data Analytics Capstone Project

How can a wellness technology company play it smart?

Business Task

The business task is to analyze usage data from non-Bellabeat smart devices to gain insight into relevant consumer trends, and to determine how these trends can inform Bellabeat's marketing strategy.

Issues with Data Reliability

Reliability: The data is not reliable. It covers only 30 individuals, which is not a representative sample of all Fitbit users.
Original: The data is not original. It was collected by a third party rather than provided by Fitbit itself.
Comprehensive: The data is not comprehensive. Additional attributes, such as gender and age, would have allowed a more accurate analysis.
Current: The data is not current. It was collected in 2016.
Cited: The data is cited. It came from Amazon Mechanical Turk.

Given these points, the results of this analysis may not be accurate, because the integrity and credibility of the data are limited.

Preparing and Cleaning Data

Loading the packages that will be used during the data cleaning and visualization process:

library(tidyverse)
library(lubridate)
library(ggplot2)
library(readr)
library(tidyr)
library(dplyr)
library(skimr)
library(janitor)
library(scales)

Now that the packages have been loaded, the next step is to import the required CSV files into R.

daily_activities <- read.csv("C:\\Users\\Dibyajyoti Das\\Desktop\\case study 2\\Fitabase Data 4.12.16-5.12.16\\dailyActivity_merged.csv")
weight <- read.csv("C:\\Users\\Dibyajyoti Das\\Desktop\\case study 2\\Fitabase Data 4.12.16-5.12.16\\weightLogInfo_merged.csv")
sleep <- read.csv("C:\\Users\\Dibyajyoti Das\\Desktop\\case study 2\\Fitabase Data 4.12.16-5.12.16\\sleepDay_merged.csv")

The dataset covers roughly one month of data (12 April to 12 May 2016, per the folder name). In some rows, total steps are recorded as zero, which may be because the user forgot to wear their smart device.

Rows with zero total steps are therefore removed.

daily_activities1 <- daily_activities %>% 
  filter(TotalSteps!=0)

Next, the date and time are split into separate columns in the weight and sleep datasets.

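# Split the timestamp into Date and Time columns; extra = "merge" keeps the
# AM/PM marker attached to Time instead of silently dropping it
weight1 <- weight %>% 
  separate(Date, into = c("Date", "Time"), sep = " ", extra = "merge")

sleep1 <- sleep %>% 
  separate(SleepDay, into = c("Date", "Time"), sep = " ", extra = "merge")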
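Assuming the exported timestamps look like `4/12/2016 12:00:00 AM` (an assumed format, not shown in the output above), each one contains two spaces, so only the first space separates the date from the time. A minimal base-R sketch on two made-up timestamps shows the intended split:

```r
# Two made-up timestamps in the assumed "m/d/yyyy h:mm:ss AM" format
stamps <- c("4/12/2016 12:00:00 AM", "5/2/2016 11:59:59 PM")

# Everything before the first space is the date
date_part <- sub(" .*$", "", stamps)

# Everything after the first space is the time, including the AM/PM marker
time_part <- sub("^[^ ]+ ", "", stamps)

print(date_part)  # "4/12/2016" "5/2/2016"
print(time_part)  # "12:00:00 AM" "11:59:59 PM"
```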

To keep column names consistent across the datasets and make merging easier, the ActivityDate column in the daily activity dataset is renamed to “Date”.

daily_activities1 <- daily_activities1 %>% 
  rename(Date = ActivityDate)

Every dataset now has a Date column. For the analysis, a Weekday column is added to each dataset.

weight1 <- weight1 %>% 
  mutate(Weekday = weekdays(as.Date(Date,"%m/%d/%Y")))

sleep1 <- sleep1 %>% 
  mutate(Weekday = weekdays(as.Date(Date,"%m/%d/%Y")))

daily_activities1 <- daily_activities1 %>% 
  mutate(Weekday = weekdays(as.Date(Date,"%m/%d/%Y")))
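`weekdays()` returns day names in the current system locale, so on a non-English system the labels would not match the English factor levels used later. A locale-independent sketch, assuming the `m/d/yyyy` date format of these files, maps the numeric weekday from `as.POSIXlt()` to fixed English names (`english_weekday` is a hypothetical helper, not part of any package):

```r
# Fixed English day names, indexed by POSIXlt's wday (0 = Sunday ... 6 = Saturday)
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")

# Hypothetical helper: convert "m/d/yyyy" strings to English weekday names
english_weekday <- function(date_strings) {
  d <- as.Date(date_strings, format = "%m/%d/%Y")
  day_names[as.POSIXlt(d)$wday + 1]
}

english_weekday(c("4/12/2016", "5/12/2016"))  # "Tuesday" "Thursday"
```

Inside each `mutate()` call, `english_weekday(Date)` could then stand in for `weekdays(as.Date(Date, "%m/%d/%Y"))`.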

The datasets may contain duplicate rows, so each one is checked for them.

sum(duplicated(sleep1))

## [1] 3

sum(duplicated(weight1))

## [1] 0

sum(duplicated(daily_activities1))

## [1] 0

The output shows that the sleep1 dataset has 3 duplicate rows, which are removed.

sleep1 <- sleep1[!duplicated(sleep1), ]
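The same deduplication can be written with base R's `unique()` on a data frame (or `dplyr::distinct()`), which drops exact duplicate rows and keeps the first occurrence. A small sketch on made-up sleep records:

```r
# Made-up sleep records; the second and third rows are exact duplicates
toy_sleep <- data.frame(
  Id = c(1, 1, 1, 2),
  Date = c("4/12/2016", "4/13/2016", "4/13/2016", "4/12/2016"),
  TotalMinutesAsleep = c(400, 450, 450, 380)
)

sum(duplicated(toy_sleep))     # 1 duplicate row
deduped <- unique(toy_sleep)   # keeps the first occurrence of each row
nrow(deduped)                  # 3
```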

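Next, all three datasets are merged into one. The join uses Id and Date (plus Weekday, which is derived from Date) so that each day's activity is matched with the same day's sleep and weight records. Because these are inner joins, only user-days present in all three datasets are kept.

# Join on user and day; Weekday is included as a key so it is not duplicated
combined_data <- merge(daily_activities1, sleep1, by = c("Id", "Date", "Weekday"))
combined_data_final <- merge(combined_data, weight1, by = c("Id", "Date", "Weekday"), suffixes = c("_sleep", "_weight"))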
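Why the join keys matter can be shown on made-up data for a single user: joining on Id alone pairs every activity day with every sleep day, while joining on Id and Date keeps one row per user-day. The frame names below are hypothetical.

```r
# Two days of made-up activity and sleep data for one user
toy_activity <- data.frame(Id = c(1, 1),
                           Date = c("4/12/2016", "4/13/2016"),
                           TotalSteps = c(5000, 8000))
toy_sleep_days <- data.frame(Id = c(1, 1),
                             Date = c("4/12/2016", "4/13/2016"),
                             TotalMinutesAsleep = c(400, 450))

# Joining on Id alone: 2 x 2 = 4 rows, with separate Date.x / Date.y columns
by_id <- merge(toy_activity, toy_sleep_days, by = "Id")
nrow(by_id)

# Joining on Id and Date: one row per user-day
by_id_date <- merge(toy_activity, toy_sleep_days, by = c("Id", "Date"))
nrow(by_id_date)
```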

combined_data_final$Weekday <- factor(combined_data_final$Weekday,
                                      levels = c("Monday","Tuesday","Wednesday","Thursday",
                                                 "Friday","Saturday","Sunday"))

Now that the data has been cleaned and merged, we move on to the analysis and visualization process.

Analyze

summary(daily_activities1)

##        Id                Date             TotalSteps    TotalDistance  
##  Min.   :1.504e+09   Length:863         Min.   :    4   Min.   : 0.00  
##  1st Qu.:2.320e+09   Class :character   1st Qu.: 4923   1st Qu.: 3.37  
##  Median :4.445e+09   Mode  :character   Median : 8053   Median : 5.59  
##  Mean   :4.858e+09                      Mean   : 8319   Mean   : 5.98  
##  3rd Qu.:6.962e+09                      3rd Qu.:11092   3rd Qu.: 7.90  
##  Max.   :8.878e+09                      Max.   :36019   Max.   :28.03  
##  TrackerDistance  LoggedActivitiesDistance VeryActiveDistance
##  Min.   : 0.000   Min.   :0.0000           Min.   : 0.000    
##  1st Qu.: 3.370   1st Qu.:0.0000           1st Qu.: 0.000    
##  Median : 5.590   Median :0.0000           Median : 0.410    
##  Mean   : 5.964   Mean   :0.1178           Mean   : 1.637    
##  3rd Qu.: 7.880   3rd Qu.:0.0000           3rd Qu.: 2.275    
##  Max.   :28.030   Max.   :4.9421           Max.   :21.920    
##  ModeratelyActiveDistance LightActiveDistance SedentaryActiveDistance
##  Min.   :0.0000           Min.   : 0.000      Min.   :0.00000        
##  1st Qu.:0.0000           1st Qu.: 2.345      1st Qu.:0.00000        
##  Median :0.3100           Median : 3.580      Median :0.00000        
##  Mean   :0.6182           Mean   : 3.639      Mean   :0.00175        
##  3rd Qu.:0.8650           3rd Qu.: 4.895      3rd Qu.:0.00000        
##  Max.   :6.4800           Max.   :10.710      Max.   :0.11000        
##  VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes
##  Min.   :  0.00    Min.   :  0.00      Min.   :  0.0        Min.   :   0.0  
##  1st Qu.:  0.00    1st Qu.:  0.00      1st Qu.:146.5        1st Qu.: 721.5  
##  Median :  7.00    Median :  8.00      Median :208.0        Median :1021.0  
##  Mean   : 23.02    Mean   : 14.78      Mean   :210.0        Mean   : 955.8  
##  3rd Qu.: 35.00    3rd Qu.: 21.00      3rd Qu.:272.0        3rd Qu.:1189.0  
##  Max.   :210.00    Max.   :143.00      Max.   :518.0        Max.   :1440.0  
##     Calories      Weekday         
##  Min.   :  52   Length:863        
##  1st Qu.:1856   Class :character  
##  Median :2220   Mode  :character  
##  Mean   :2361                     
##  3rd Qu.:2832                     
##  Max.   :4900

summary(sleep1)

##        Id                Date               Time           TotalSleepRecords
##  Min.   :1.504e+09   Length:410         Length:410         Min.   :1.00     
##  1st Qu.:3.977e+09   Class :character   Class :character   1st Qu.:1.00     
##  Median :4.703e+09   Mode  :character   Mode  :character   Median :1.00     
##  Mean   :4.995e+09                                         Mean   :1.12     
##  3rd Qu.:6.962e+09                                         3rd Qu.:1.00     
##  Max.   :8.792e+09                                         Max.   :3.00     
##  TotalMinutesAsleep TotalTimeInBed    Weekday         
##  Min.   : 58.0      Min.   : 61.0   Length:410        
##  1st Qu.:361.0      1st Qu.:403.8   Class :character  
##  Median :432.5      Median :463.0   Mode  :character  
##  Mean   :419.2      Mean   :458.5                     
##  3rd Qu.:490.0      3rd Qu.:526.0                     
##  Max.   :796.0      Max.   :961.0

summary(weight1)

##        Id                Date               Time              WeightKg     
##  Min.   :1.504e+09   Length:67          Length:67          Min.   : 52.60  
##  1st Qu.:6.962e+09   Class :character   Class :character   1st Qu.: 61.40  
##  Median :6.962e+09   Mode  :character   Mode  :character   Median : 62.50  
##  Mean   :7.009e+09                                         Mean   : 72.04  
##  3rd Qu.:8.878e+09                                         3rd Qu.: 85.05  
##  Max.   :8.878e+09                                         Max.   :133.50  
##                                                                            
##   WeightPounds        Fat             BMI        IsManualReport    
##  Min.   :116.0   Min.   :22.00   Min.   :21.45   Length:67         
##  1st Qu.:135.4   1st Qu.:22.75   1st Qu.:23.96   Class :character  
##  Median :137.8   Median :23.50   Median :24.39   Mode  :character  
##  Mean   :158.8   Mean   :23.50   Mean   :25.19                     
##  3rd Qu.:187.5   3rd Qu.:24.25   3rd Qu.:25.56                     
##  Max.   :294.3   Max.   :25.00   Max.   :47.54                     
##                  NA's   :65                                        
##      LogId             Weekday         
##  Min.   :1.460e+12   Length:67         
##  1st Qu.:1.461e+12   Class :character  
##  Median :1.462e+12   Mode  :character  
##  Mean   :1.462e+12                     
##  3rd Qu.:1.462e+12                     
##  Max.   :1.463e+12                     
##

Trends

The median daily step count is 8,053, with a minimum of 4 and a maximum of 36,019.
The median total distance traveled is 5.59 kilometers.
The median daily time is 7.0 very active minutes, 8.0 fairly active minutes, 208.0 lightly active minutes and 1,021.0 sedentary minutes.
The median total time asleep is 432.5 minutes (about 7.2 hours).
The median BMI is 24.39, based on only 67 weight records.
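Activity minutes are strongly right-skewed: for VeryActiveMinutes the summary above reports a median of 7.00 but a mean of 23.02, which is why medians are quoted here. A sketch of collecting the medians (with means for contrast) using `sapply()`, on a small made-up frame that stands in for `daily_activities1`:

```r
# Made-up stand-in for daily_activities1, using a few of its column names
toy_daily <- data.frame(
  TotalSteps        = c(3000, 6500, 8000, 9500, 21000),
  VeryActiveMinutes = c(0, 0, 5, 20, 120),
  SedentaryMinutes  = c(700, 900, 1000, 1100, 1300)
)

# Median and mean of every column, side by side
medians <- sapply(toy_daily, median)
means   <- sapply(toy_daily, mean)
rbind(median = medians, mean = means)
```

A few long workouts pull the mean of VeryActiveMinutes well above its median, while the median reflects a typical day.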

Visualization

Now, we present our insights through graphs and charts.

Users spend most of their day sedentary, while very active and fairly active minutes make up only 2% of the total time.
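The share of time at each activity level can be computed by summing each minutes column and dividing by the grand total. A sketch on made-up values, reusing the column names from `daily_activities1`:

```r
# Made-up daily minutes for three user-days
toy_minutes <- data.frame(
  VeryActiveMinutes    = c(10, 0, 30),
  FairlyActiveMinutes  = c(5, 10, 15),
  LightlyActiveMinutes = c(200, 180, 250),
  SedentaryMinutes     = c(1000, 1100, 900)
)

# Total minutes per activity level, then each level's percentage of all tracked minutes
totals <- colSums(toy_minutes)
shares <- round(100 * totals / sum(totals), 1)
shares
```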

Data recording is not consistent throughout the week. Users record the least data on Friday and Saturday, the days leading into the weekend, and the most on Wednesday.
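The number of records per weekday can be counted with `table()`. Converting the column to a factor with all seven levels first keeps days with no records in the table. A sketch on a made-up Weekday vector:

```r
week_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday")

# Made-up Weekday values standing in for daily_activities1$Weekday
toy_weekday <- c("Wednesday", "Wednesday", "Monday", "Friday", "Sunday", "Wednesday")

# Records per weekday, in Monday-to-Sunday order, including zero counts
records_per_day <- table(factor(toy_weekday, levels = week_levels))
records_per_day
names(which.max(records_per_day))  # "Wednesday"
```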

Users record the most steps on Monday and Wednesday and the fewest on Friday and Saturday, which are also the days with the least recorded data.

Users burn the most calories on Wednesday and the fewest on Friday and Saturday, which is consistent with the previous bar graphs.

An interesting trend in this bar chart is that users sleep the most on Wednesday, which is also the day they burn the most calories.
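The weekday comparisons of steps, calories and sleep rest on per-weekday averages, which base R's `aggregate()` can compute in one call. A sketch on made-up user-days standing in for `combined_data_final`:

```r
# Made-up user-days (one row per user per day)
toy_days <- data.frame(
  Weekday = c("Monday", "Monday", "Wednesday", "Wednesday", "Friday"),
  TotalSteps = c(9000, 7000, 10000, 8000, 5000),
  Calories = c(2400, 2200, 2600, 2500, 1900),
  TotalMinutesAsleep = c(420, 400, 470, 450, 380)
)
toy_days$Weekday <- factor(toy_days$Weekday,
                           levels = c("Monday", "Tuesday", "Wednesday", "Thursday",
                                      "Friday", "Saturday", "Sunday"))

# Mean of each metric per weekday; weekdays with no rows are omitted
by_weekday <- aggregate(cbind(TotalSteps, Calories, TotalMinutesAsleep) ~ Weekday,
                        data = toy_days, FUN = mean)
by_weekday
```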


Users burn more calories as their step count increases. There is also a spike in calories burned between 5,000 and 10,000 steps, which may be because users are more active in that range.
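The overall relationship can be put into numbers with a correlation coefficient and a simple linear fit. Note that this is a different summary from the GAM smoother used for the plotted trend line: a straight line captures the general upward trend but not local features such as the spike described above. A sketch on made-up user-days:

```r
# Made-up user-days: total steps and calories burned
toy_steps <- data.frame(
  TotalSteps = c(2000, 5000, 8000, 11000, 14000),
  Calories   = c(1850, 2050, 2450, 2650, 3050)
)

# Pearson correlation between steps and calories
r <- cor(toy_steps$TotalSteps, toy_steps$Calories)

# Linear fit of calories on steps; the slope is extra kcal per additional step
fit <- lm(Calories ~ TotalSteps, data = toy_steps)
kcal_per_1000_steps <- 1000 * coef(fit)[["TotalSteps"]]
kcal_per_1000_steps
```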

Sedentary Minutes vs Total Steps:


Some users burn around 4,000 kcal even on days when they are largely sedentary.

Very Active Minutes vs Total Minutes:


From the above graphs, it can be seen that regardless of how much users sleep, the average user is mostly sedentary. In fact, the more a user sleeps, the more sedentary minutes they record.
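The direction and strength of the sleep and sedentary-time relationship can be checked with a correlation coefficient instead of being read off the charts. A sketch on made-up user-days, computing both Pearson (linear) and Spearman (rank-based, less sensitive to outliers) correlations:

```r
# Made-up user-days: minutes asleep and sedentary minutes
toy_sleep_sed <- data.frame(
  TotalMinutesAsleep = c(300, 360, 420, 480, 540),
  SedentaryMinutes   = c(700, 760, 720, 820, 860)
)

# Positive values mean more sleep goes together with more sedentary time
r_pearson  <- cor(toy_sleep_sed$TotalMinutesAsleep, toy_sleep_sed$SedentaryMinutes)
r_spearman <- cor(toy_sleep_sed$TotalMinutesAsleep, toy_sleep_sed$SedentaryMinutes,
                  method = "spearman")
c(pearson = r_pearson, spearman = r_spearman)
```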

Recommendations

Users did not record their data consistently. Bellabeat could offer incentives for consistent tracking.
Bellabeat products could use an algorithm that tracks each user's schedule and provides health recommendations tailored to that user.
Bellabeat could offer different membership tiers, such as premium and casual, with some services available only to premium members.