Prerequisites:
library(readr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
### Exploratory Analysis
Exploring dailyActivity and sleepDay
#reading the datasets
dailyActivity <- read_csv("C:/Users/30694/Desktop/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")
## Rows: 940 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDate
## dbl (14): Id, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDi...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
sleepDay <- read_csv("C:/Users/30694/Desktop/Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")
## Rows: 413 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): SleepDay
## dbl (4): Id, TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
weightLogInfo <- read_csv("C:/Users/30694/Desktop/Fitabase Data 4.12.16-5.12.16/weightLogInfo_merged.csv")
## Rows: 67 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Date
## dbl (6): Id, WeightKg, WeightPounds, Fat, BMI, LogId
## lgl (1): IsManualReport
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
heartrate_seconds <- read_csv("C:/Users/30694/Desktop/Fitabase Data 4.12.16-5.12.16/heartrate_seconds_merged.csv")
## Rows: 2483658 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Time
## dbl (2): Id, Value
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
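The column-specification messages above can be silenced with `show_col_types = FALSE`, and the date column can be converted right after loading. A minimal sketch, assuming a readr version that accepts literal data through `I()` (2.0 or later); the two rows are made-up values standing in for the real file, not records from the dataset:

```r
library(readr)
library(dplyr)
library(lubridate)

# Hypothetical two-row stand-in for dailyActivity_merged.csv
csv_text <- "Id,ActivityDate,TotalSteps\n1,4/12/2016,1000\n1,4/13/2016,2000\n"

activity_demo <- read_csv(I(csv_text), show_col_types = FALSE) %>%
  mutate(ActivityDate = mdy(ActivityDate))  # character -> Date

class(activity_demo$ActivityDate)
```

With the real files, the same two steps would apply to the path passed to `read_csv()`.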
# checking content
colnames(dailyActivity)
## [1] "Id" "ActivityDate"
## [3] "TotalSteps" "TotalDistance"
## [5] "TrackerDistance" "LoggedActivitiesDistance"
## [7] "VeryActiveDistance" "ModeratelyActiveDistance"
## [9] "LightActiveDistance" "SedentaryActiveDistance"
## [11] "VeryActiveMinutes" "FairlyActiveMinutes"
## [13] "LightlyActiveMinutes" "SedentaryMinutes"
## [15] "Calories"
colnames(sleepDay)
## [1] "Id" "SleepDay" "TotalSleepRecords"
## [4] "TotalMinutesAsleep" "TotalTimeInBed"
colnames(weightLogInfo)
## [1] "Id" "Date" "WeightKg" "WeightPounds"
## [5] "Fat" "BMI" "IsManualReport" "LogId"
colnames(heartrate_seconds)
## [1] "Id" "Time" "Value"
How many unique Ids are in each dataset?
n_distinct(dailyActivity$Id)
## [1] 33
n_distinct(sleepDay$Id)
## [1] 24
n_distinct(weightLogInfo$Id)
## [1] 8
dailyActivity covers 33 users, sleepDay 24, and weightLogInfo only 8. The weight data in particular is too sparse to support conclusions, which makes it a potential area for improvement in future data collection.
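The counts alone do not show how many users appear in more than one table. A small sketch of that check, using made-up Id vectors in place of the real `Id` columns:

```r
# Hypothetical Id vectors standing in for dailyActivity$Id, sleepDay$Id
# and weightLogInfo$Id
activity_ids <- c(1, 2, 3, 4, 5)
sleep_ids    <- c(2, 3, 4)
weight_ids   <- c(3, 6)

# Users present in both the activity and the sleep data
both <- intersect(activity_ids, sleep_ids)

# Users present in all three tables
all_three <- intersect(both, weight_ids)

length(both)
length(all_three)
```

On the real data the call would be `intersect(dailyActivity$Id, sleepDay$Id)`; the size of that overlap limits any analysis that combines the tables.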
#quick summary statistics
dailyActivity %>%
select(TotalSteps,
TotalDistance,
SedentaryMinutes) %>%
summary()
## TotalSteps TotalDistance SedentaryMinutes
## Min. : 0 Min. : 0.000 Min. : 0.0
## 1st Qu.: 3790 1st Qu.: 2.620 1st Qu.: 729.8
## Median : 7406 Median : 5.245 Median :1057.5
## Mean : 7638 Mean : 5.490 Mean : 991.2
## 3rd Qu.:10727 3rd Qu.: 7.713 3rd Qu.:1229.5
## Max. :36019 Max. :28.030 Max. :1440.0
sleepDay %>%
select(TotalSleepRecords,
TotalMinutesAsleep,
TotalTimeInBed) %>%
summary()
## TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
## Min. :1.000 Min. : 58.0 Min. : 61.0
## 1st Qu.:1.000 1st Qu.:361.0 1st Qu.:403.0
## Median :1.000 Median :433.0 Median :463.0
## Mean :1.119 Mean :419.5 Mean :458.6
## 3rd Qu.:1.000 3rd Qu.:490.0 3rd Qu.:526.0
## Max. :3.000 Max. :796.0 Max. :961.0
What’s the relationship between steps taken in a day and sedentary minutes?
ggplot(dailyActivity, aes(TotalSteps,SedentaryMinutes)) + geom_point()
What’s the relationship between minutes asleep and time in bed?
cor(sleepDay$TotalTimeInBed,sleepDay$TotalMinutesAsleep)
## [1] 0.9304575
ggplot(sleepDay, aes(TotalTimeInBed,TotalMinutesAsleep)) + geom_point()
There is a strong positive correlation (r ≈ 0.93): the more time spent in bed, the more minutes asleep. Interestingly, some observations fall away from this trend, which means some people spent quite a while in bed without sleeping. Those cases would be worth reviewing in detail.
Let’s calculate the time spent in bed but not asleep (AwakeInBed) for each record and plot these values on a histogram.
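One way to single out records that fall away from the trend is the ratio of minutes asleep to minutes in bed. The sketch below uses made-up rows with the same column names as sleepDay; the column `SleepEfficiency` and the 0.8 cut-off are illustrative assumptions, not values from the analysis:

```r
library(dplyr)

# Hypothetical records with the same column names as sleepDay
sleep_demo <- tibble(
  Id = c(1, 1, 2),
  TotalMinutesAsleep = c(420, 300, 450),
  TotalTimeInBed = c(450, 500, 470)
)

# The 0.8 cut-off is an arbitrary illustration, not a clinical threshold
low_efficiency <- sleep_demo %>%
  mutate(SleepEfficiency = TotalMinutesAsleep / TotalTimeInBed) %>%
  filter(SleepEfficiency < 0.8)

low_efficiency
```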
sleepDay <- sleepDay %>%
mutate(AwakeInBed = TotalTimeInBed - TotalMinutesAsleep)
ggplot(sleepDay, aes(AwakeInBed)) + geom_histogram(color="white") +
labs(title="Time awake in bed - histogram plot",x="Time awake in bed (min)", y = "Number of records (count)")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Most records show less than an hour of being awake in bed, which seems reasonable.
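The `stat_bin()` message above suggests choosing a bin width explicitly. A sketch with 15-minute bins, which makes the one-hour mark fall on a bin edge; the data frame holds made-up values and the width is only one reasonable choice:

```r
library(ggplot2)

# Hypothetical AwakeInBed values in minutes
awake_demo <- data.frame(AwakeInBed = c(5, 12, 18, 25, 31, 40, 62, 95, 150))

p <- ggplot(awake_demo, aes(AwakeInBed)) +
  # 15-minute bins whose edges start at 0
  geom_histogram(binwidth = 15, boundary = 0, color = "white") +
  labs(x = "Time awake in bed (min)", y = "Number of records (count)")

p
```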
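#join dailyActivity and sleepDay on user and date
#(assumes SleepDay holds a month/day/year date followed by a time)
mergedData <- sleepDay %>%
mutate(ActivityDate = as_date(mdy_hms(SleepDay))) %>%
left_join(mutate(dailyActivity, ActivityDate = mdy(ActivityDate)), by = c("Id", "ActivityDate"))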
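The choice of join key matters here: matching on Id alone pairs each sleep record with every activity day of the same user, while matching on Id and date keeps one row per user-day. A small sketch with made-up rows (the `Date` column name is an assumption for illustration) shows the difference in row counts; recent dplyr versions also warn about the many-to-many match in the first join:

```r
library(dplyr)

# Hypothetical data: one user, two days in each table
sleep_demo <- tibble(
  Id = c(1, 1),
  Date = as.Date(c("2016-04-12", "2016-04-13")),
  TotalMinutesAsleep = c(400, 430)
)
activity_demo <- tibble(
  Id = c(1, 1),
  Date = as.Date(c("2016-04-12", "2016-04-13")),
  TotalSteps = c(9000, 11000)
)

by_id   <- left_join(sleep_demo, activity_demo, by = "Id")
by_both <- left_join(sleep_demo, activity_demo, by = c("Id", "Date"))

nrow(by_id)    # 4: every sleep row paired with every activity row
nrow(by_both)  # 2: one row per user-day
```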
#quick analysis on heart rate
heartrate_seconds %>%
select(Time,
Value) %>%
summary()
## Time Value
## Length:2483658 Min. : 36.00
## Class :character 1st Qu.: 63.00
## Mode :character Median : 73.00
## Mean : 77.33
## 3rd Qu.: 88.00
## Max. :203.00
The median is 73 bpm, which seems healthy: heart rate should be between 60 and 100 bpm. The maximum of 203 is plausible during exercise, and the minimum of 36 is low but not alarming, since heart rate can slow to around 40 during sleep.
Overall, the users appear healthy.
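The summary above pools readings from all users. A per-user summary would show whether the extremes come from a few individuals. A sketch on made-up readings with the same `Id` and `Value` columns as heartrate_seconds; the output column names are illustrative:

```r
library(dplyr)

# Hypothetical readings (the Time column is omitted)
hr_demo <- tibble(
  Id = c(1, 1, 1, 2, 2),
  Value = c(60, 70, 80, 55, 65)
)

hr_by_user <- hr_demo %>%
  group_by(Id) %>%
  summarize(
    MedianHR = median(Value),
    MinHR = min(Value),
    MaxHR = max(Value),
    .groups = "drop"
  )

hr_by_user
```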
#Activity Level during days of the week
ggplot(mergedData, aes(TotalSteps,Calories)) + geom_jitter()
There’s a correlation between daily steps and calories. The more steps taken, the more calories one may burn.
How does activity change from week to week?
I use the mutate function here to create a new variable called “week”; lubridate makes working with dates and times easier. Calories are then summed per user and week.
weekly_calories <- dailyActivity %>%
mutate(week = lubridate::week(mdy(ActivityDate)))%>%
select(Id,week,Calories) %>%
group_by(Id,week) %>%
summarize(WeeklyCalories = sum(Calories)) %>%
arrange(Id,week)
## `summarise()` has grouped output by 'Id'. You can override using the `.groups`
## argument.
ggplot(weekly_calories, aes(week, WeeklyCalories, group = 1)) +
geom_line() +
labs(title = "Weekly calories",
y = "Calories", x = "Week") +
facet_wrap(~ Id ) +
theme(plot.title = element_text(hjust = 1.0))
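When reading weekly totals, it helps to know how many days each week number actually covers, because `lubridate::week()` counts seven-day periods from January 1 and weeks at the edges of the date range may be incomplete. A sketch that counts days per week, assuming the data spans 4/12/2016 to 5/12/2016 as the folder name suggests:

```r
library(dplyr)
library(lubridate)

# One row per calendar day in the assumed date range
days_demo <- tibble(
  ActivityDate = seq(mdy("4/12/2016"), mdy("5/12/2016"), by = "day")
)

days_per_week <- days_demo %>%
  mutate(week = lubridate::week(ActivityDate)) %>%
  count(week, name = "Days")

days_per_week
```

A week with fewer than seven days will show a lower calorie total even if daily activity is unchanged, so dividing `WeeklyCalories` by the number of recorded days may give a fairer comparison.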
NOTE:
Here we can observe that users burned the most calories during the first week (i.e. they were active with their workouts), after which activity decreased or, for a few users, stayed the same. Users such as 4057192912, 2347167796 and a few others were not at all consistent with their workouts.
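A breakdown by day of the week, which the weekly totals do not provide, could be sketched as follows. The rows are made up, `Weekday` and `MeanSteps` are illustrative names, and `week_start = 1` numbers the days from Monday (1) to Sunday (7):

```r
library(dplyr)
library(lubridate)

# Hypothetical rows with the same column names as dailyActivity
activity_demo <- tibble(
  ActivityDate = c("4/12/2016", "4/13/2016", "4/19/2016"),
  TotalSteps = c(8000, 10000, 12000)
)

steps_by_weekday <- activity_demo %>%
  mutate(Weekday = wday(mdy(ActivityDate), week_start = 1)) %>%  # 1 = Monday
  group_by(Weekday) %>%
  summarize(MeanSteps = mean(TotalSteps), .groups = "drop")

steps_by_weekday
```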
Findings:
The number of steps is positively correlated with the number of calories burned.
Distance is positively correlated with the number of calories burned.
Users are more energetic and active during the first week, after which activity stagnates.
Some users sleep less than 7 hours (not recommended).
Recommendations:
As per the analysis, some datasets were quite limited, so I would suggest using the Leaf, which can be worn in multiple ways and can capture more precise data about the user than products tied to a particular time or function.
Secondly, the product should capture age, gender, weight and height when the user first wears it.
Thirdly, a notification with suggestions should be generated when the user is sleeping too little or too much.
Fourth, users who stay consistent with their activity should be rewarded with some kind of discount notification, along with a summary of how they are staying active and benefiting their health. This will motivate them further.
Fifth, users who are inactive or have stopped burning calories should be notified of the health benefits and advantages of working out.