To identify Business Problem:This Business case study research Analytics is done To get insights regarding the sleep and activity intensity of the people, being a fitness tracker.
The dataset is collected from Kaggle : https://www.kaggle.com/arashnic/fitbit/metadata . It contains 18 csv files.
We will explore the data. The 2 main parameters to observwe will be sleep and activity, to track the fitness of the people.
So, Let’s get started!
Here we’ll create our data frames for review. The data frames I’ll be working with in this review will be creating objects for:
daily_activity, daily_calories, daily_sleep, weight_log_info, daily_intensities
library('tidyverse')
library('janitor')
library('skimr')
library('here')
library('dplyr')
library('lubridate')
library('ggplot2')
daily_activity <- read.csv("dailyActivity_merged.csv")
daily_calories <- read.csv("dailyCalories_merged.csv")
sleep_day <- read.csv("sleepDay_merged.csv")
daily_intensities <- read.csv("dailyIntensities_merged.csv")
weight_log <- read.csv("weightLogInfo_merged.csv")
head(daily_activity)
Id ActivityDate TotalSteps TotalDistance TrackerDistance
1 1503960366 4/12/2016 13162 8.50 8.50
2 1503960366 4/13/2016 10735 6.97 6.97
3 1503960366 4/14/2016 10460 6.74 6.74
4 1503960366 4/15/2016 9762 6.28 6.28
5 1503960366 4/16/2016 12669 8.16 8.16
6 1503960366 4/17/2016 9705 6.48 6.48
LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
1 0 1.88 0.55
2 0 1.57 0.69
3 0 2.44 0.40
4 0 2.14 1.26
5 0 2.71 0.41
6 0 3.19 0.78
LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
1 6.06 0 25
2 4.71 0 21
3 3.91 0 30
4 2.83 0 29
5 5.04 0 36
6 2.51 0 38
FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
1 13 328 728 1985
2 19 217 776 1797
3 11 181 1218 1776
4 34 209 726 1745
5 10 221 773 1863
6 20 164 539 1728
colnames(daily_activity)
[1] "Id" "ActivityDate"
[3] "TotalSteps" "TotalDistance"
[5] "TrackerDistance" "LoggedActivitiesDistance"
[7] "VeryActiveDistance" "ModeratelyActiveDistance"
[9] "LightActiveDistance" "SedentaryActiveDistance"
[11] "VeryActiveMinutes" "FairlyActiveMinutes"
[13] "LightlyActiveMinutes" "SedentaryMinutes"
[15] "Calories"
glimpse(daily_activity)
Rows: 940
Columns: 15
$ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960~
$ ActivityDate <chr> "4/12/2016", "4/13/2016", "4/14/2016", "4/1~
$ TotalSteps <int> 13162, 10735, 10460, 9762, 12669, 9705, 130~
$ TotalDistance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9~
$ TrackerDistance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9~
$ LoggedActivitiesDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ VeryActiveDistance <dbl> 1.88, 1.57, 2.44, 2.14, 2.71, 3.19, 3.25, 3~
$ ModeratelyActiveDistance <dbl> 0.55, 0.69, 0.40, 1.26, 0.41, 0.78, 0.64, 1~
$ LightActiveDistance <dbl> 6.06, 4.71, 3.91, 2.83, 5.04, 2.51, 4.71, 5~
$ SedentaryActiveDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ VeryActiveMinutes <int> 25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 66,~
$ FairlyActiveMinutes <int> 13, 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, ~
$ LightlyActiveMinutes <int> 328, 217, 181, 209, 221, 164, 233, 264, 205~
$ SedentaryMinutes <int> 728, 776, 1218, 726, 773, 539, 1149, 775, 8~
$ Calories <int> 1985, 1797, 1776, 1745, 1863, 1728, 1921, 2~
> ```
head(daily_calories)
Id ActivityDay Calories
1 1503960366 4/12/2016 1985
2 1503960366 4/13/2016 1797
3 1503960366 4/14/2016 1776
4 1503960366 4/15/2016 1745
5 1503960366 4/16/2016 1863
6 1503960366 4/17/2016 1728
[1] "Id" "ActivityDay" "Calories"
glimpse(daily_calories)
Rows: 940
Columns: 3
$ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 15039603~
$ ActivityDay <chr> "4/12/2016", "4/13/2016", "4/14/2016", "4/15/2016", "4/1~
$ Calories <int> 1985, 1797, 1776, 1745, 1863, 1728, 1921, 2035, 1786, 17~
head(sleep_day)
Id SleepDay TotalSleepRecords TotalMinutesAsleep
1 1503960366 4/12/2016 12:00:00 AM 1 327
2 1503960366 4/13/2016 12:00:00 AM 2 384
3 1503960366 4/15/2016 12:00:00 AM 1 412
4 1503960366 4/16/2016 12:00:00 AM 2 340
5 1503960366 4/17/2016 12:00:00 AM 1 700
6 1503960366 4/19/2016 12:00:00 AM 1 304
TotalTimeInBed
1 346
2 407
3 442
4 367
5 712
6 320
colnames(sleep_day)
[1] "Id" "SleepDay" "TotalSleepRecords"
[4] "TotalMinutesAsleep" "TotalTimeInBed"
glimpse(sleep_day)
Rows: 413
Columns: 5
$ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 1~
$ SleepDay <chr> "4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM",~
$ TotalSleepRecords <int> 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1~
$ TotalMinutesAsleep <int> 327, 384, 412, 340, 700, 304, 360, 325, 361, 430,~
$ TotalTimeInBed <int> 346, 407, 442, 367, 712, 320, 377, 364, 384, 449,~
head(daily_intensities)
Id ActivityDay SedentaryMinutes LightlyActiveMinutes
1 1503960366 4/12/2016 728 328
2 1503960366 4/13/2016 776 217
3 1503960366 4/14/2016 1218 181
4 1503960366 4/15/2016 726 209
5 1503960366 4/16/2016 773 221
6 1503960366 4/17/2016 539 164
FairlyActiveMinutes VeryActiveMinutes SedentaryActiveDistance
1 13 25 0
2 19 21 0
3 11 30 0
4 34 29 0
5 10 36 0
6 20 38 0
LightActiveDistance ModeratelyActiveDistance VeryActiveDistance
1 6.06 0.55 1.88
2 4.71 0.69 1.57
3 3.91 0.40 2.44
4 2.83 1.26 2.14
5 5.04 0.41 2.71
6 2.51 0.78 3.19
colnames(daily_intensities)
[1] "Id" "ActivityDay"
[3] "SedentaryMinutes" "LightlyActiveMinutes"
[5] "FairlyActiveMinutes" "VeryActiveMinutes"
[7] "SedentaryActiveDistance" "LightActiveDistance"
[9] "ModeratelyActiveDistance" "VeryActiveDistance"
glimpse(daily_intensities)
Rows: 940
Columns: 10
$ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960~
$ ActivityDay <chr> "4/12/2016", "4/13/2016", "4/14/2016", "4/1~
$ SedentaryMinutes <int> 728, 776, 1218, 726, 773, 539, 1149, 775, 8~
$ LightlyActiveMinutes <int> 328, 217, 181, 209, 221, 164, 233, 264, 205~
$ FairlyActiveMinutes <int> 13, 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, ~
$ VeryActiveMinutes <int> 25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 66,~
$ SedentaryActiveDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ LightActiveDistance <dbl> 6.06, 4.71, 3.91, 2.83, 5.04, 2.51, 4.71, 5~
$ ModeratelyActiveDistance <dbl> 0.55, 0.69, 0.40, 1.26, 0.41, 0.78, 0.64, 1~
$ VeryActiveDistance <dbl> 1.88, 1.57, 2.44, 2.14, 2.71, 3.19, 3.25, 3~
>n_distinct(daily_activity$Id)
[1] 33
>n_distinct(sleep_day$Id)
[1] 24
new_daily_activity <- daily_activity %>%
distinct() %>%
mutate(ActivityDate=mdy(ActivityDate)) %>%
mutate(TotalActiveTime = VeryActiveMinutes+FairlyActiveMinutes+LightlyActiveMinutes) %>% ### To calculate Total Active Time
group_by(Id,ActivityDate)
head(new_daily_activity)
### A tibble: 6 x 16
### Groups: Id, ActivityDate [6]
Id ActivityDate TotalSteps TotalDistance TrackerDistance
<dbl> <date> <int> <dbl> <dbl>
1 1503960366 2016-04-12 13162 8.5 8.5
2 1503960366 2016-04-13 10735 6.97 6.97
3 1503960366 2016-04-14 10460 6.74 6.74
4 1503960366 2016-04-15 9762 6.28 6.28
5 1503960366 2016-04-16 12669 8.16 8.16
6 1503960366 2016-04-17 9705 6.48 6.48
### ... with 11 more variables: LoggedActivitiesDistance <dbl>,
### VeryActiveDistance <dbl>, ModeratelyActiveDistance <dbl>,
### LightActiveDistance <dbl>, SedentaryActiveDistance <dbl>,
### VeryActiveMinutes <int>, FairlyActiveMinutes <int>,
### LightlyActiveMinutes <int>, SedentaryMinutes <int>, Calories <int>,
### TotalActiveTime <int>
new_sleep_day <- sleep_day %>%
distinct() %>%
mutate(SleepDay=mdy_hms(SleepDay)) %>%
group_by(Id)
head(new_sleep_day)
### A tibble: 6 x 5
### Groups: Id [1]
Id SleepDay TotalSleepRecor~ TotalMinutesAsl~ TotalTimeInBed
<dbl> <dttm> <int> <int> <int>
1 1.50e9 2016-04-12 00:00:00 1 327 346
2 1.50e9 2016-04-13 00:00:00 2 384 407
3 1.50e9 2016-04-15 00:00:00 1 412 442
4 1.50e9 2016-04-16 00:00:00 2 340 367
5 1.50e9 2016-04-17 00:00:00 1 700 712
6 1.50e9 2016-04-19 00:00:00 1 304 320
Observations :
From the given plot 1 we can conclude that the number of fairly active people is the least at 14 whereas the number of people sedentary is the highest at 991. the people who are very active and lightly active are in the between. The correlation stands out at 0.77.
From the plot 2 we can conclude that the maximum number of people can take a total number of steps of around 10000 as the number of steps increase and the number of activity times decrease as hence there are hardly a very few people who are around 30,000 steps. R=0.72
From plot three we can conclude that the maximum number of people sleep from 729 hours and the least number of people sleep for more than 9 hours whereas there is a considerable amount of people who sleep less than 5 hours and 5 to 7 hours.
The correlation stands out at 0.93.
From plot 4 we can conclude that the graph moves in a directly proportional manner. People tend to be asleep during the time when they are in bed. R=0.93
From plot five we can conclude that the level of sleep adequacy is maximum for 5 hours and least after around 9 hours. we can see that the sleep adequacy tends to decrease after 5 hours of sleep. The correlation stands out at -0.88
Recommendations :
In the era of technology,it’s generally seen that whenever a product becomes versatile which can be used for multiple purposes it tends to attract more buyers as it also looks attractive then its existing competitors.
To make the product more versatile several things can be added to it:
1. Heart rate monitor to monitor the heart rate of all the users.
2. Calorie tracker to track the calories burnt while walking
3. Alarm for the user to set goals and routines accordingly to maintain fitness.
4. Some Targets can be placed for users with Levels of challenges like Level 1,Level2,Level3 and so on, then giving scores will motivate users to be in activities more.