Bellabeat Health Technology Case Study- Hope Owens, Jan 1 2022
I am a (pretend) junior data analyst working on the marketing analyst team at Bellabeat, a high-tech manufacturer of health-focused products for women. Bellabeat is a successful small company, but they have the potential to become a larger player in the global smart device market. Urška Sršen, cofounder and Chief Creative Officer of Bellabeat, believes that analyzing smart device fitness data could help unlock new growth opportunities for the company. I have been asked to focus on one of Bellabeat’s products and analyze smart device data to gain insight into how consumers are using their smart devices to help guide marketing strategy for the company.
Characters:
Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer
Sando Mur: Mathematician and Bellabeat’s cofounder; key member of the Bellabeat executive team
Bellabeat marketing analytics team: A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Bellabeat’s marketing strategy.
Products:
Bellabeat app: The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. This data can help users better understand their current habits and make healthy decisions. The Bellabeat app connects to their line of smart wellness products.
Leaf: Bellabeat’s classic wellness tracker can be worn as a bracelet, necklace, or clip. The Leaf tracker connects to the Bellabeat app to track activity, sleep, and stress.
Time: This wellness watch combines the timeless look of a classic timepiece with smart technology to track user activity, sleep, and stress. The Time watch connects to the Bellabeat app to provide you with insights into your daily wellness.
Spring: This is a water bottle that tracks daily water intake using smart technology to ensure that you are appropriately hydrated throughout the day. The Spring bottle connects to the Bellabeat app to track your hydration levels.
The project is divided into three questions to guide marketing analysis:
What are some trends in smart device usage?
How could these trends apply to Bellabeat customers?
How could these trends help influence Bellabeat marketing strategy?
Statement of purpose: I am going to analyze open source fitness data to understand health variables in a subset of smart device users. The relationships between different health metrics and product use can give a lifestyle context for the marketing of Bellabeat smart devices, and can show areas of growth for Bellabeat. I am going to understand what factors of health can be most supported by Bellabeat Leaf, Time and application products, including sleep quality, falling asleep time, activity level, weight, BMI, fat levels, distance traveled, steps walked, as well as frequency of application use.
Our analysis needs to be specifically designed to address our business objective. Data was found through CC0: Public Domain data, made available through Mobius. This dataset generated by respondents to a distributed survey via Amazon Mechanical Turk between 03.12.2016-05.12.2016. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring.
This data source consists of 18 CSV documents that show different analytical perspectives about user health. Each document represents different quantitative data tracked by Fitbit. With each row being one time point per subject and data being tracked by day and time, these datasets are considered long.
| Table Name | Type | Description |
|---|---|---|
| dailyActivity_merged | Microsoft Excel CSV | id (user id number), ActivityDate (activity date and time in datetime), TotalSteps (total number of steps taken), TotalDistance (total distance traveled Km), LoggedActivitiesDistance (logged distance traveled Km), VeryActiveDistance (distance traveled- active state), ModeratelyActiveDistance (distance traveled- moderate state), LightActiveDistance (distance traveled- light state), SedentaryActiveDistance (distance traveled- sedentary state), VeryActiveMinutes (minutes active- active state), FairlyActiveMinutes (minutes active- moderate state), LightlyActiveMinutes (minutes active- light state), SedentaryMinutes (minutes active- sedentary state), Calories (calories burned) |
| dailyCalories_merged | Microsoft Excel CSV | Id (user id number), ActivityDate (activity date and time in datetime), Calories (calories burned) |
| dailyIntensities_merged | Microsoft Excel CSV | id (user id number), ActivityDate (activity date and time in datetime), SedentaryMinutes(minutes active- sedentary state), LightlyActiveMinutes (minutes active- light state), FairlyActiveMinutes (minutes active- moderate state), VeryActiveMinutes (minutes active- active state), SedentaryActiveDistance (distance traveled- sedentary state), LightActiveDistance (distance traveled- light state), ModeratelyActiveDistance (distance traveled- moderate state), VeryActiveDistance (distance traveled- active state) |
| dailySteps_merged | Microsoft Excel CSV | Id (user id number), ActivityDate (activity date and time in datetime), StepTotal (total number of steps taken) |
| heartrate_seconds_merged | Microsoft Excel CSV | Id (user id number), Time (activity date and time in datetime), Value (heartrate) |
| hourlyCalories_merged | Microsoft Excel CSV | Id (user id number), ActivityHour (activity date and time in datetime), Calories (calories burned) |
| hourlyIntensities_merged | Microsoft Excel CSV | Id (user id number), ActivityHour (activity date and time in datetime), TotalIntensity (total intensity level), AverageIntensity (average intensity level) |
| hourlySteps_merged | Microsoft Excel CSV | Id (user id number), ActivityHour (activity date and time in datetime), StepTotal (total number of steps taken) |
| minuteCaloriesNarrow_merged | Microsoft Excel CSV | Id (user id number), ActivityMinute (activity date and time in datetime), Calories (calories burned) |
| minuteCaloriesWide_merged | Microsoft Excel CSV | Id (user id number), ActivityHour (activity date and time in datetime), Calories00 (calories burned minute 00), Calories01 (calories burned minute 01), Calories02 (calories burned minute 02)… Calories59 (calories burned minute 59) |
| minuteIntensitiesNarrow_merged | Microsoft Excel CSV | Id (user id number), ActivityMinute (activity date and time in datetime), Intensity (intensity level) |
| minuteIntensitiesWide_merged | Microsoft Excel CSV | Id (user id number), ActivityHour (activity date and time in datetime), Intensity00 (intensity level minute 00), Intensity01 (intensity level minute 01), Intensity02 (intensity level minute 02)… Intensity59 (intensity level minute 59) |
| minuteMETsNarrow_merged | Microsoft Excel CSV | Id (user id number), ActivityMinute (activity date and time in datetime), METs (metabolic equivalent of task) |
| minuteSleep_merged | Microsoft Excel CSV | Id (user id number), date(activity date and time in datetime), value (time of sleep, hours), logId (data log id) |
| minuteStepsNarrow_merged | Microsoft Excel CSV | Id (user id number), ActivityMinute (activity date and time in datetime), Steps (total number of steps taken) |
| minuteStepsWide_merged | Microsoft Excel CSV | Id (user id number), ActivityHour (activity date and time in datetime), Steps00 (total number of steps taken minute 00), Steps01 (total number of steps taken minute 01), Steps02 (total number of steps taken minute 02)… Steps59 (total number of steps taken minute 59) |
| sleepDay_merged | Microsoft Excel CSV | Id (user id number), SleepDay(activity date and time in datetime), TotalSleepRecords (time of sleep, hours), TotalMinutesAsleep (time of sleep, minutes), TotalTimeInBed (time in bed, hours) |
| weightLogInfo_merged | Microsoft Excel CSV | Id (user id number), Date(activity date and time in datetime), WeightKg (weight measured in kg), WeightPounds (weight measured in lbs), Fat (percet fat), BMI (body mass index), IsManualReport (weight entered manually), LogId (data log id) |
Task 1: Choose data- For this business objective, I chose to use the following datasets: dailyActivity_merged, dailyCalories_merged, dailyIntensities_merged, dailySteps_merged, hourlyCalories_merged, hourlyIntensities_merged, sleepDay_merged, and weightLogInfo_merged. The data showed health variables on a larger scale that was most applicable to my business objective, being hourly and diel measures of health and not measures at the minute level.
Task 2: Know your audience- I am performing analysis for Urška Sršen, Sando Mur and the Bellabeat analytics team. They have a working understanding of the fitness smart devices, and I need to tailor my results to help inform their conclusions.
Task 3: Aims to address objective- In the aims of answering the business objective, I decided future analysis should focus on these variables: sleep quality, falling asleep time, activity level, weight, BMI, fat levels, distance traveled, steps walked, as well as frequency of application use.
Task 1: Data source validation- Verifying the metadata of the dataset confirmed it as open-source. The owner has dedicated the work to the public domain by waiving all of his or her rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law. We can copy, modify, and distribute the work without asking permission.
Due to the limitation of size (30 users) and not having any demographic information, our findings could be effected by a sampling bias. The population of fitbit users in this study might not be representative of the population as a whole. Additionally, the dataset is historic (2016) and has a limited duration of time (2 months long). This case study will pursue our business objective as an operational approach.
Task 2: Data exploration- I needed to prepare my workspace and load the packages that I would be using in my analysis. I then uploaded the files to R, and made sure they were in a usable format. I used the summary() function on each dataset to find general trends and irregularities in the data. In terms of R syntax, I did not have a problem with the original column names or data distribution, so I kept them the same.
#load packages used
require(lubridate) # package to work with dates and times
require(tidyverse) # package to clean and process data
require(plyr) # package to work with ddply
require(ggplot2) # package to graph values
require(GGally) # package extends function of 'ggplot2'
require(reshape2) # package long to wide datasets in r
require(car) # package to run anova in r
require(calendR) # package to plot calendar dates
require(openair) # package to plot for calender variables in r
require(magrittr) # package to use pipe %>%
require(tidyr) # package transform wide dataset to long dataset
require(hexbin) # package density plots
#load activity datasets provided by the CC0: Public Domain
daily_activity <- read.csv("C:/Users/hoper/OneDrive/Documents/Cycling case study/bellabeat/dailyActivity_merged.csv", header=TRUE)
daily_calories <- read.csv("C:/Users/hoper/OneDrive/Documents/Cycling case study/bellabeat/dailyCalories_merged.csv", header=TRUE)
daily_intensities <- read.csv("C:/Users/hoper/OneDrive/Documents/Cycling case study/bellabeat/dailyIntensities_merged.csv", header=TRUE)
daily_steps <- read.csv("C:/Users/hoper/OneDrive/Documents/Cycling case study/bellabeat/dailySteps_merged.csv", header=TRUE)
heartrate <- read.csv("C:/Users/hoper/OneDrive/Documents/Cycling case study/bellabeat/heartrate_seconds_merged.csv", header=TRUE)
hourly_calories <- read.csv("C:/Users/hoper/OneDrive/Documents/Cycling case study/bellabeat/hourlyCalories_merged.csv", header=TRUE)
hourly_intensities <- read.csv("C:/Users/hoper/OneDrive/Documents/Cycling case study/bellabeat/hourlyIntensities_merged.csv", header=TRUE)
hourly_steps <- read.csv("C:/Users/hoper/OneDrive/Documents/Cycling case study/bellabeat/hourlySteps_merged.csv", header=TRUE)
MET <- read.csv("C:/Users/hoper/OneDrive/Documents/Cycling case study/bellabeat/minuteMETsNarrow_merged.csv", header=TRUE)
sleep_day <- read.csv("C:/Users/hoper/OneDrive/Documents/Cycling case study/bellabeat/sleepDay_merged.csv", header=TRUE)
weight <- read.csv("C:/Users/hoper/OneDrive/Documents/Cycling case study/bellabeat/weightLogInfo_merged.csv", header=TRUE)
### Proofread data, prepare for analysis
#summary (daily_activity)
#summary (daily_calories)
#summary (daily_intensities)
#summary (daily_steps)
#summary (heartrate)
#summary (hourly_calories)
#summary (hourly_intensities)
#summary (hourly_steps)
#summary (MET)
#summary (sleep_day)
#summary (weight)
Task 1: Clean and process data- Before I could perform analysis, I needed my values to be in a usable format. I knew that I was going to be interested in date, time, and day of the week, so I transformed the date and time values to integer form, listed in separate columns. I also wanted to strip the hour from the time format, as well as day of week so I could broadly classify by these variables.
With a single row (as either an hour or day) serving as a unit of replication, I wanted rows to have all future classification variables listed in separate columns so I could group them later on. I performed these functions for all datasets.
#create date format to strip values
daily_activity$date <- mdy(daily_activity$ActivityDate) # import date field in correct format
daily_activity$month<- month(daily_activity$date) # get month of year
daily_activity$doy<- yday(daily_activity$date) # get day of year
daily_activity$year<- year(daily_activity$date) # get year
daily_activity$wday<- wday(daily_activity$date,
label = FALSE,
abbr = TRUE,
week_start = getOption("lubridate.week.start", 7),
locale = Sys.getlocale("LC_TIME")) # extract day of week from calendar date
Task 1: Analyze- Now with my datasets prepared for analysis, I was ready to group by data by different attributes. I was interested in distance and activity intensity levels during different time periods as my response variable. I set my function to find the average distance traveled by users for each intensity class, and well as the frequency of users, standard deviation and standard error.
For this test, with a single row (or day) serving as a unit of replication, I classified individual predictor variables by intensity level and time so I could run statistical tests to test for variable significance. I did not think any of the variables would be colinear, being different metrics of time and intensity, but I could have graphed two predictor variables against each other to test for colinearity if they seemed closely related.
Task 2: Visualization- I was then able to graph my function using a ggplot bar chart to visualize data, and run a type III ANOVA using the car() package. My business objective was to find differences health metrics by time, intensity and use type, so I used a type III sums of squares. I wanted to find presence of both a main effect and a main effect and interaction effect between my two predictor variables. With the possible presence of significant interactions, this approach to analysis was the most valid for my business objective.
### (1a) Daily activity by weekday- total
weekday_daily_activity <- ddply(daily_activity, c("wday"), summarise,
N = sum(!is.na(TotalDistance) ), #use this version if missing values present
mean = mean(TotalDistance,na.rm=TRUE),
sd = sd(TotalDistance,na.rm=TRUE),
se = sd / sqrt(N))
# Create classifier variable
weekday_daily_activity$activity <- paste("Total")
### (2a) Active distance by weekday- intensity subset active
weekday_active_distance <- ddply(daily_activity, c("wday"), summarise,
N = sum(!is.na(VeryActiveDistance) ), #use this version if missing values present
mean = mean(VeryActiveDistance,na.rm=TRUE),
sd = sd(VeryActiveDistance,na.rm=TRUE),
se = sd / sqrt(N))
weekday_active_distance$activity <- paste("Active")
### (3a) Moderate distance by weekday
weekday_moderate_distance <- ddply(daily_activity, c("wday"), summarise,
N = sum(!is.na(ModeratelyActiveDistance) ), #use this version if missing values present
mean = mean(ModeratelyActiveDistance,na.rm=TRUE),
sd = sd(ModeratelyActiveDistance,na.rm=TRUE),
se = sd / sqrt(N))
weekday_moderate_distance$activity <- paste("Moderate")
### (4a) Light distance by weekday
weekday_light_distance <- ddply(daily_activity, c("wday"), summarise,
N = sum(!is.na(LightActiveDistance) ), #use this version if missing values present
mean = mean(LightActiveDistance,na.rm=TRUE),
sd = sd(LightActiveDistance,na.rm=TRUE),
se = sd / sqrt(N))
weekday_light_distance$activity <- paste("Light")
### (5a) Sedentary distance by weekday
weekday_sedentary_distance <- ddply(daily_activity, c("wday"), summarise,
N = sum(!is.na(SedentaryActiveDistance) ), #use this version if missing values present
mean = mean(SedentaryActiveDistance,na.rm=TRUE),
sd = sd(SedentaryActiveDistance,na.rm=TRUE),
se = sd / sqrt(N))
weekday_sedentary_distance$activity <- paste("Sedentary")
# Bind 2a-5a
weekday_distance <- rbind(weekday_sedentary_distance, weekday_light_distance, weekday_moderate_distance,
weekday_active_distance)
##### Figure: Distance traveled by activity type
graph1<-ggplot() +
geom_histogram(data = weekday_sedentary_distance,
aes(x = wday, y = mean),
stat = "identity",
fill = "#004c6d",
alpha = 0.4,
width = 0.95) +
geom_histogram(data = weekday_light_distance,
aes(x = wday, y = mean),
stat = "identity",
fill = "#004c6d",
alpha = 0.4,
width = 0.95) +
geom_histogram(data = weekday_active_distance,
aes(x = wday, y = mean),
stat = "identity",
fill = "#004c6d",
alpha = 0.6,
width = 0.95) +
geom_histogram(data = weekday_moderate_distance,
aes(x = wday, y = mean),
stat = "identity",
fill = "#004c6d",
alpha = 1,
width = 0.95) +
labs(title = "Distance traveled by activity type",x="Day of week",y="Kilometers") +
theme_minimal() +
theme(legend.position="bottom") +
theme(legend.text = element_text(color = 'black', size = 12, hjust = 0.5)) +
theme(plot.title = element_text(color = '#554C6C', size = 15, hjust = 0.5)) +
theme(axis.title.x = element_text(color = '#554C6C', size = 12, hjust = 0.5)) +
theme(axis.title.y = element_text(color = '#554C6C', size = 12, hjust = 0.5)) +
theme(axis.line = element_line(colour = "black"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
panel.background = element_blank())
graph1 + geom_errorbar(data = weekday_light_distance,
aes(x=wday, ymin=mean-se,
ymax=mean+se), width=0.3,
colour="black",
alpha=0.9,
size=0.8) +
geom_errorbar(data = weekday_active_distance,
aes(x=wday, ymin=mean-se,
ymax=mean+se),
width=0.3,
colour="black",
alpha=1,
size=0.8) +
geom_errorbar(data = weekday_moderate_distance,
aes(x=wday, ymin=mean-se,
ymax=mean+se),
width=0.3,
colour="black",
alpha=1.2,
size=0.8)
Figure 1. Distance traveled by users (km) at different fitness intensity levels (sedentary, light, moderate and active), shown through increasing color intensity with active being indicated by the dark navy bars (n = 30, ± 1 SE). The data used in analysis was obtained as CC0: Public Domain data, made available through Mobius.
######### ANOVA: Distance: intensity/weekday #########
daily_activity_subset <- daily_activity[,(names(daily_activity) %in% c("wday",
"ModeratelyActiveDistance",
"SedentaryActiveDistance",
"LightActiveDistance",
"VeryActiveDistance"))]
# convert wide format to long format dataset
# wday= weekday, variable= intensity, value= distance
long <- melt(daily_activity_subset, id.vars = c("wday"))
# type III anova
anova.dist<-lm(value~factor(variable)*factor(wday), data= long)
Anova(anova.dist, type="III")
# Tally frequency of users within the sleep_day dataset
# One row = one day/one unit of replication
device_use <- sleep_day %>%
group_by(Id) %>%
tally()
# Create classifier variable for device use based on number of days used
device_use_class<- device_use %>%
mutate(Use_Level = case_when(n >= 1 & n<= 10 ~ "Rare Use",
n >= 11 & n<= 20 ~ "Moderate Use",
n >= 21 & n<= 31 ~ "Continuing Use"))
# Subset dataset to frequency (n) and device use level
device_use_class <- device_use_class[,(names(device_use_class) %in% c("n",
"Use_Level"))]
# Count frequency of use classes, prepare for figure
device_use_class <- data.frame(device_use_class) %>%
group_by(Use_Level) %>%
tally()
# Omit device use class = na
device_use_class <- na.omit(device_use_class)
#### Figure: Proportional frequency of smart device use
device_use_class %>%
ggplot(aes(x = "", y = Use_Level, fill = Use_Level)) +
geom_bar(stat = "identity", width = 1, col="black")+
coord_polar("y", start=0)+
theme_minimal()+
theme(axis.title.x= element_blank(),
axis.title.y = element_blank(),
panel.border = element_blank(),
panel.grid = element_blank(),
axis.ticks = element_blank(),
axis.text.x = element_blank(),
plot.title = element_text(hjust = 0.5, size=14, face = "bold")) +
scale_fill_manual(values = c("#7da2c0", "#5a84a3", "#366788","#004c6d")) +
geom_label(aes(label = Use_Level),
color = "Black",
position = position_stack(vjust = 0.39),
show.legend = FALSE) +
labs(title="Proportional frequency of smart device use") +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(),) +
theme(legend.text = element_text(color = 'black', size = 12, hjust = 0.5)) +
theme(plot.title = element_text(color = '#554C6C', size = 15, hjust = 0.5)) +
theme(axis.title.x = element_text(color = 'white', size = 12, hjust = 0.5)) +
theme(axis.title.y = element_text(color = 'white', size = 12, hjust = 0.5))
Figure 2. Proportional frequency of smart device use for users of the FitBit App from the data subset, shown through increasing color intensity with rare use indicated by the dark navy color (n = 30). The data used in analysis was obtained as CC0: Public Domain data, made available through Mobius.
sleep_day$SleepDay=as.POSIXct(sleep_day$SleepDay, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())
sleep_day$time <- format(sleep_day$SleepDay, format = "%H:%M:%S")
sleep_day$date <- format(sleep_day$SleepDay, format = "%m/%d/%y")
sleep_day$date <- mdy(sleep_day$date) # import date field in correct format
sleep_day$month <- month(sleep_day$date) # get month of year
sleep_day$doy <- yday(sleep_day$date) # get day of year
sleep_day$year <- year(sleep_day$date) # get year
sleep_day$wday <- wday(sleep_day$date, label = FALSE, abbr = TRUE, week_start = getOption("lubridate.week.start", 7), locale = Sys.getlocale("LC_TIME"))
device_use <- sleep_day %>%
group_by(date) %>%
tally() %>%
summarise( max_user = max(n))
device_use <- sleep_day %>%
group_by(date) %>%
tally() %>%
mutate(usage_rate = n / 17)
calendarPlot(device_use, pollutant = "usage_rate",
year = 2016,
month = 4:5,
annotate = "date",
main = "Frequency of smart device users",
cols = "viridis"
)
Figure 3. Proportional frequency of smart device use for users of the FitBit App April 12th, 2016 to May 12th, 2016, shown through increasing color intensity with rare use indicated by the dark navy color, and frequent use indicated by the yellow color (n = 30). The data used in analysis was obtained as CC0: Public Domain data, made available through Mobius.
######### ANOVA: Frequency: use class #########
# Anova- are means different?
use.lm <- lm(n ~ Use_Level, data = device_use_class)
use.av <- aov(use.lm)
summary(use.av)
## Df Sum Sq Mean Sq
## Use_Level 2 34.67 17.33
## Post hoc- how are means different?
tukey.test <- TukeyHSD(use.av)
tukey.test
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = use.lm)
##
## $Use_Level
## diff lwr upr p adj
## Moderate Use-Continuing Use -8 NaN NaN NaN
## Rare Use-Continuing Use -2 NaN NaN NaN
## Rare Use-Moderate Use 6 NaN NaN NaN
### (1b) Active minutes by weekday
weekday_active_minutes <- ddply(daily_activity, c("wday"), summarise,
N = sum(!is.na(VeryActiveMinutes) ), #use this version if missing values present
mean = mean(VeryActiveMinutes,na.rm=TRUE),
sd = sd(VeryActiveMinutes,na.rm=TRUE),
se = sd / sqrt(N))
weekday_active_minutes$activity <- paste("Active")
### (2b) Moderate minutes by weekday
weekday_moderate_minutes <- ddply(daily_activity, c("wday"), summarise,
N = sum(!is.na(FairlyActiveMinutes) ), #use this version if missing values present
mean = mean(FairlyActiveMinutes,na.rm=TRUE),
sd = sd(FairlyActiveMinutes,na.rm=TRUE),
se = sd / sqrt(N))
weekday_moderate_minutes$activity <- paste("Moderate")
### (3b) Light minutes by weekday
weekday_light_minutes <- ddply(daily_activity, c("wday"), summarise,
N = sum(!is.na(LightlyActiveMinutes) ), #use this version if missing values present
mean = mean(LightlyActiveMinutes,na.rm=TRUE),
sd = sd(LightlyActiveMinutes,na.rm=TRUE),
se = sd / sqrt(N))
weekday_light_minutes$activity <- paste("Light")
### (4b) Sedentary minutes by weekday
weekday_sedentary_minutes <- ddply(daily_activity, c("wday"), summarise,
N = sum(!is.na(SedentaryMinutes) ), #use this version if missing values present
mean = mean(SedentaryMinutes,na.rm=TRUE),
sd = sd(SedentaryMinutes,na.rm=TRUE),
se = sd / sqrt(N))
weekday_sedentary_minutes$activity <- paste("Sedentary")
# Bind 1b-4b
weekday_minutes <- rbind(weekday_sedentary_minutes, weekday_light_minutes, weekday_moderate_minutes, weekday_active_minutes)
### (5b) pivot table- minutes by activity type, by weekday
minutes_mean <- ddply(weekday_minutes, c("activity"), summarise,
N = sum(!is.na(mean) ), #use this version if missing values present
mean = mean(mean,na.rm=TRUE),
sd = sd(mean,na.rm=TRUE),
se = sd / sqrt(mean))
### Figure: Proportional duration of activity type
minutes_mean %>%
ggplot(aes(x = "", y = activity, fill = activity)) +
geom_bar(stat = "identity", width = 1, col="black")+
coord_polar("y", start=0)+
theme_minimal()+
theme(axis.title.x= element_blank(),
axis.title.y = element_blank(),
panel.border = element_blank(),
panel.grid = element_blank(),
axis.ticks = element_blank(),
axis.text.x = element_blank(),
plot.title = element_text(hjust = 0.5, size=14, face = "bold")) +
scale_fill_manual(values = c("#7da2c0", "#5a84a3", "#366788","#004c6d")) +
geom_label(aes(label = activity),
color = "Black",
position = position_stack(vjust = 0.5),
show.legend = FALSE) +
labs(title="Proportional duration of activity type") +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(),) +
theme(legend.text = element_text(color = 'black', size = 12, hjust = 0.5)) +
theme(plot.title = element_text(color = '#554C6C', size = 15, hjust = 0.5)) +
theme(axis.title.x = element_text(color = 'white', size = 12, hjust = 0.5)) +
theme(axis.title.y = element_text(color = 'white', size = 12, hjust = 0.5))
Figure 4. Proportional duration of activity level (sedentary, light, moderate, active), shown through increasing color intensity with sedentary use indicated by the dark navy color (n = 30). The data used in analysis was obtained as CC0: Public Domain data, made available through Mobius.
######### ANOVA: Duration: activity type #########
# Anova- are means different?
activity.lm <- lm(mean ~ activity, data = minutes_mean)
activity.av <- aov(activity.lm)
summary(activity.av)
## Df Sum Sq Mean Sq
## activity 3 649548 216516
## Post hoc- how are means different?
tukey.test <- TukeyHSD(activity.av)
tukey.test
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = activity.lm)
##
## $activity
## diff lwr upr p adj
## Light-Active 171.683642 NaN NaN NaN
## Moderate-Active -7.567495 NaN NaN NaN
## Sedentary-Active 970.487980 NaN NaN NaN
## Moderate-Light -179.251137 NaN NaN NaN
## Sedentary-Light 798.804338 NaN NaN NaN
## Sedentary-Moderate 978.055475 NaN NaN NaN
#create date format to strip values
hourly_intensities$ActivityHour=as.POSIXct(hourly_intensities$ActivityHour,
format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone()) # set variable type as date time format
hourly_intensities$time <- format(hourly_intensities$ActivityHour,
format = "%H:%M:%S") # extract time
hourly_intensities$time_obj <- strptime(hourly_intensities$time,
format = "%H:%M:%OS", tz = "EST") # create time format to strip hour as subclass
hourly_intensities$hour<- hour(hourly_intensities$time_obj) # strip time
### (1c) Daily steps by weekday
hour_activity <- ddply(hourly_intensities, c("hour"), summarise,
N = sum(!is.na(TotalIntensity) ), #use this version if missing values present
mean = mean(TotalIntensity,na.rm=TRUE),
sd = sd(TotalIntensity,na.rm=TRUE),
se = sd / sqrt(N))
graph2 <- ggplot() +
geom_histogram(data = hour_activity, aes(x = hour, y = mean), stat = "identity", fill = "#004c6d", color="#004c6d" , alpha = 0.6, width = 0.95) +
labs(title = "Fitness intensity level by hour",x="Hour",y="Intensity Level") +
theme_minimal() +
theme(legend.position="bottom") +
theme(legend.text = element_text(color = 'black', size = 12, hjust = 0.5)) +
theme(plot.title = element_text(color = '#554C6C', size = 15, hjust = 0.5)) +
theme(axis.title.x = element_text(color = '#554C6C', size = 12, hjust = 0.5)) +
theme(axis.title.y = element_text(color = '#554C6C', size = 12, hjust = 0.5)) +
theme(axis.line = element_line(colour = "black"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
panel.background = element_blank())
graph2 + geom_errorbar(data = hour_activity, aes(x=hour, ymin=mean-se, ymax=mean+se), width=0.3, colour="black", alpha=0.9, size=0.8)
head (hour_activity)
Figure 5. Fitness activity level by hour throughout a 24 hour period (± 1 SE) as shown by the blue bars (n = 30). The data used in analysis was obtained as CC0: Public Domain data, made available through Mobius.
######### ANOVA: Intensity: hour #########
hour.lm <- lm(mean ~ hour, data = hour_activity)
hour.av <- aov(hour.lm)
summary(hour.av)
## Df Sum Sq Mean Sq F value Pr(>F)
## hour 1 457 457.0 12.23 0.00204 **
## Residuals 22 822 37.4
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#create date format to strip values
daily_calories$ActivityHour=as.POSIXct(daily_calories$ActivityDay, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())
daily_calories$time <- format(daily_calories$ActivityDay, format = "%H:%M:%S")
daily_calories$date <- format(daily_calories$ActivityDay, format = "%m/%d/%y")
daily_calories$date <- mdy(daily_calories$date) # import date field in correct format
daily_calories$month <- month(daily_calories$date) # get month of year
daily_calories$doy <- yday(daily_calories$date) # get day of year
daily_calories$year <- year(daily_calories$date) # get year
daily_calories$wday <- wday(daily_calories$date, label = FALSE, abbr = TRUE, week_start = getOption("lubridate.week.start", 7), locale = Sys.getlocale("LC_TIME"))
#create date format to strip values
sleep_day$SleepDay=as.POSIXct(sleep_day$SleepDay, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())
sleep_day$time <- format(sleep_day$SleepDay, format = "%H:%M:%S")
sleep_day$date <- format(sleep_day$SleepDay, format = "%m/%d/%y")
sleep_day$date <- mdy(sleep_day$date) # import date field in correct format
sleep_day$month <- month(sleep_day$date) # get month of year
sleep_day$doy <- yday(sleep_day$date) # get day of year
sleep_day$year <- year(sleep_day$date) # get year
sleep_day$wday <- wday(sleep_day$date, label = FALSE, abbr = TRUE, week_start = getOption("lubridate.week.start", 7), locale = Sys.getlocale("LC_TIME"))
sleep_calories <- merge(sleep_day,daily_calories,by="doy")
sleep_calories_subset <- sleep_calories[,(names(sleep_calories) %in% c("TotalMinutesAsleep","doy",
"TotalTimeInBed","Calories"))]
sleep_calories_subset$sleep_hours <- sleep_calories_subset$TotalMinutesAsleep/60
sleep_calories_subset$bed_hours <- sleep_calories_subset$TotalTimeInBed/60
sleep_calories_subset$fall_asleep_time <- sleep_calories_subset$bed_hours - sleep_calories_subset$sleep_hours
# Bin size control + color palette
ggplot(sleep_calories_subset, aes(x=sleep_hours, y=Calories) ) +
geom_hex(bins = 70) +
scale_fill_continuous(type = "viridis") +
theme_bw() +
labs(title = "Daily distance travelled compared with calories burned",x="Kilometers Travelled",y="Calories Burned") +
theme(legend.text = element_text(color = 'black', size = 12, hjust = 0.5)) +
theme(plot.title = element_text(color = '#554C6C', size = 15, hjust = 0.5)) +
theme(axis.title.x = element_text(color = '#554C6C', size = 12, hjust = 0.5)) +
theme(axis.title.y = element_text(color = '#554C6C', size = 12, hjust = 0.5)) +
theme(axis.line = element_line(colour = "black"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
panel.background = element_blank())
Figure 6. Distance traveled (Km) compared with calories burned in fitness smart device users. Density of count shown by color, with brighter colors showing more frequent values (n = 30). The data used in analysis was obtained as CC0: Public Domain data, made available through Mobius.
# cont...
#
ggplot(sleep_calories_subset, aes(x=fall_asleep_time, y=Calories) ) +
geom_hex(bins = 70) +
scale_fill_continuous(type = "viridis") +
theme_bw() +
labs(title = "Daily falling asleep time compared with calories burned",x="Time Falling Asleep",y="Calories Burned") +
theme(legend.text = element_text(color = 'black', size = 12, hjust = 0.5)) +
theme(plot.title = element_text(color = '#554C6C', size = 15, hjust = 0.5)) +
theme(axis.title.x = element_text(color = '#554C6C', size = 12, hjust = 0.5)) +
theme(axis.title.y = element_text(color = '#554C6C', size = 12, hjust = 0.5)) +
theme(axis.line = element_line(colour = "black"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
panel.background = element_blank())
Figure 7. Daily falling asleep time compared with calories burned. Density of count shown by color, with brighter colors showing more frequent values (n = 30). The data used in analysis was obtained as CC0: Public Domain data, made available through Mobius.
sleep_day$SleepDay=as.POSIXct(sleep_day$SleepDay, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())
sleep_day$time <- format(sleep_day$SleepDay, format = "%H:%M:%S")
sleep_day$date <- format(sleep_day$SleepDay, format = "%m/%d/%y")
sleep_day$date <- mdy(sleep_day$date) # import date field in correct format
sleep_day$month <- month(sleep_day$date) # get month of year
sleep_day$doy <- yday(sleep_day$date) # get day of year
sleep_day$year <- year(sleep_day$date) # get year
sleep_day$wday <- wday(sleep_day$date, label = FALSE, abbr = TRUE, week_start = getOption("lubridate.week.start", 7), locale = Sys.getlocale("LC_TIME"))
daily_intensities$date <- mdy(daily_intensities$ActivityDay) # import date field in correct format
daily_intensities$month <- month(daily_intensities$date) # get month of year
daily_intensities$doy <- yday(daily_intensities$date) # get day of year
daily_intensities$year <- year(daily_intensities$date) # get year
daily_intensities$wday <- wday(daily_intensities$date, label = FALSE, abbr = TRUE, week_start = getOption("lubridate.week.start", 7), locale = Sys.getlocale("LC_TIME"))
sleep_intensity <- merge(sleep_day,daily_intensities,by="doy")
sleep_intensity_subset <- sleep_intensity[,(names(sleep_intensity) %in% c("TotalMinutesAsleep","doy",
"TotalTimeInBed","LightlyActiveMinutes",
"SedentaryMinutes","FairlyActiveMinutes",
"VeryActiveMinutes"))]
### Sleep as hour
sleep_intensity_subset$sleep_hours <- sleep_intensity_subset$TotalMinutesAsleep/60
sleep_intensity_subset$bed_hours <- sleep_intensity_subset$TotalTimeInBed/60
sleep_intensity_subset$fall_asleep_time <- sleep_intensity_subset$bed_hours - sleep_intensity_subset$sleep_hours
### Intensity as hour
sleep_intensity_subset$seden_hours <- sleep_intensity_subset$SedentaryMinutes/60
sleep_intensity_subset$light_hours <- sleep_intensity_subset$LightlyActiveMinutes/60
sleep_intensity_subset$moder_hours <- sleep_intensity_subset$FairlyActiveMinutes/60
sleep_intensity_subset$activ_hours <- sleep_intensity_subset$VeryActiveMinutes/60
sleep_intensity_subset_class<- sleep_intensity_subset %>%
mutate(Sleep_Level = case_when(sleep_hours >= 0 & sleep_hours<= 7 ~ "Low Sleep",
sleep_hours >= 7.01 & sleep_hours<= 8.5 ~ "Healthy Sleep",
sleep_hours >= 8.51 & sleep_hours<= 24 ~ "Over Sleep"))
# create correlogram for activities and calories
correlogram <- sleep_intensity_subset_class %>%
select(bed_hours, activ_hours, moder_hours,
light_hours, seden_hours, Sleep_Level) %>%
ggpairs(mapping = ggplot2::aes(colour = Sleep_Level),
lower = list(continuous = wrap("smooth", alpha = 0.5, size=0.9,alignPercent=0.8)),
diag = list(discrete="barDiag", continuous = wrap("densityDiag", alpha=0.5 )),
upper = list(combo = wrap("box_no_facet", alpha=0.5)),
columns = 1:5) +
scale_fill_manual(values=c( "#366788","#7da2c0","#004c6d")) +
scale_color_manual(values=c("#366788","#7da2c0","#004c6d")) +
theme_bw() +
labs(title = "Correlogram of Sleep Level vs. Bed Time (hours) and Fitness Intensity Level (hours)",
subtitle = "Categorized by low, healthy, and oversleep sleep levels",
caption = "Data source: Public Domain through Mobius") +
theme(legend.text = element_text(color = 'black', size = 12, hjust = 0.5)) +
theme(plot.title = element_text(color = '#554C6C', size = 15, hjust = 0.5)) +
theme(axis.title.x = element_text(color = '#554C6C', size = 12, hjust = 0.5)) +
theme(axis.title.y = element_text(color = '#554C6C', size = 12, hjust = 0.5)) +
theme(axis.line = element_line(colour = "black"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
panel.background = element_blank())
correlogram # display the correlogram
Figure 8. Correlogram of sleep level vs. bed time (hours) and fitness intensity level (hours), categorized by sleep level. Sleep level is with increased sleep time (low, healthy or over sleep) indicated by the light to dark navy points respectively (n = 30). The data used in analysis was obtained as CC0: Public Domain data, made available through Mobius.
sleep_intensity_subset <- sleep_intensity_subset_class[,
(names(sleep_intensity_subset_class) %in% c("sleep_hours",
"bed_hours",
"seden_hours"))]
### correlogram
# Set palette for correlogram
change_palette <- function(correlogram){
for(i in 1:correlogram$nrow) {
for(j in 1:correlogram$ncol){
correlogram[i,j] <- correlogram[i,j] +
scale_fill_manual(values=c( "#0582CA","#004c6d")) +
scale_color_manual(values=c("#0582CA","#004c6d"))
}
}
return(correlogram)
}
change_palette_single <- function(data, mapping, method = "lm", ...) {
correlogram <- ggplot(data = data, mapping = mapping) +
geom_point(colour = "#004c6d") +
geom_smooth(method = method, color = "black", ...)
return(correlogram)
}
correlogram <- sleep_intensity_subset %>%
select(sleep_hours,
bed_hours,
seden_hours) %>%
ggpairs(lower = list(continuous = wrap(change_palette_single, method = "lm")),
diag = list(continuous = wrap("barDiag", fill = "#004c6d", alpha = 0.6, colour = "#004c6d")),
upper = list(continuous = wrap("cor", size = 4))) +
theme() +
labs(title = "Correlogram of sedentary activity compared with sleep and duration in bed",
caption = "Data source: Public Domain through Mobius")+
theme(legend.text = element_text(color = 'black', size = 12, hjust = 0.5)) +
theme(plot.title = element_text(color = '#554C6C', size = 15, hjust = 0.5)) +
theme(axis.title.x = element_text(color = '#554C6C', size = 12, hjust = 0.5)) +
theme(axis.title.y = element_text(color = '#554C6C', size = 12, hjust = 0.5)) +
theme(axis.line = element_line(colour = "black"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
panel.background = element_blank())
correlogram
Figure 9. Correlogram of sedentary activity compared with sleep and duration in bed. Fitness activity level by hour throughout a 24 hour period (± 1 SE) as shown by the blue bars (n = 30). The data used in analysis was obtained as CC0: Public Domain data, made available through Mobius.
daily_intensities$date <- mdy(daily_intensities$ActivityDay) # import date field in correct format
daily_intensities$month <- month(daily_intensities$date) # get month of year
daily_intensities$doy <- yday(daily_intensities$date) # get day of year
daily_intensities$year <- year(daily_intensities$date) # get year
daily_intensities$wday <- wday(daily_intensities$date, label = FALSE, abbr = TRUE, week_start = getOption("lubridate.week.start", 7), locale = Sys.getlocale("LC_TIME"))
weight$WeightDay=as.POSIXct(weight$Date, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())
weight$time <- format(weight$WeightDay, format = "%H:%M:%S")
weight$date <- format(weight$WeightDay, format = "%m/%d/%y")
weight$date <- mdy(weight$date) # import date field in correct format
weight$month <- month(weight$date) # get month of year
weight$doy <- yday(weight$date) # get day of year
weight$year <- year(weight$date) # get year
weight$wday <- wday(weight$date, label = FALSE, abbr = TRUE, week_start = getOption("lubridate.week.start", 7), locale = Sys.getlocale("LC_TIME"))
intensities_weight <- merge(daily_intensities,weight,by="doy")
intensities_weight <- intensities_weight[,(names(intensities_weight) %in% c("WeightPounds","Fat","doy",
"BMI","LightlyActiveMinutes",
"SedentaryMinutes","FairlyActiveMinutes",
"VeryActiveMinutes"))]
### Intensity as hour
intensities_weight$seden_hours <- intensities_weight$SedentaryMinutes/60
intensities_weight$light_hours <- intensities_weight$LightlyActiveMinutes/60
intensities_weight$moder_hours <- intensities_weight$FairlyActiveMinutes/60
intensities_weight$activ_hours <- intensities_weight$VeryActiveMinutes/60
intensities_weight_class <- intensities_weight %>%
mutate(BMI_class = case_when(BMI >= 0 & BMI<= 18.5 ~ "Underweight",
BMI >= 18.51 & BMI<= 24.9 ~ "Healthy",
BMI >= 25 & BMI<= 29.9 ~ "Overweight",
BMI<= 30 ~ "Obese"))
intensities_weight_class <- na.omit(intensities_weight_class)
# create correlogram for activities and calories
correlogram2 <- intensities_weight_class %>%
select(Fat, WeightPounds, activ_hours, seden_hours, BMI_class) %>%
ggpairs(mapping = ggplot2::aes(colour = BMI_class),
lower = list(continuous = wrap("smooth", alpha = 0.5, size=0.9,alignPercent=0.8)),
diag = list(discrete="barDiag", continuous = wrap("densityDiag", alpha=0.5 )),
upper = list(combo = wrap("box_no_facet", alpha=0.5)),
columns = 1:5) +
scale_fill_manual(values=c( "#7FB3D5","#004c6d")) +
scale_color_manual(values=c("#7FB3D5","#004c6d")) +
theme_bw() +
labs(title = "Correlogram of BMI Level vs. Fat, Weight(lbs) and Fitness Intensity Level (hours)",
subtitle = "Categorized by healthy and overweight BMI levels",
caption = "Data source: Public Domain through Mobius") +
theme(legend.text = element_text(color = 'black', size = 12, hjust = 0.5)) +
theme(plot.title = element_text(color = '#554C6C', size = 15, hjust = 0.5)) +
theme(axis.title.x = element_text(color = '#554C6C', size = 12, hjust = 0.5)) +
theme(axis.title.y = element_text(color = '#554C6C', size = 12, hjust = 0.5)) +
theme(axis.line = element_line(colour = "black"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
panel.background = element_blank())
correlogram2 # display the correlogram
Figure 10. Correlogram of BMI level vs. fat, weight (lbs) and fitness intensity level (hours), categorized by healthy and overweight users, indicated by the blue and navy bars (n = 30). The data used in analysis was obtained as CC0: Public Domain data, made available through Mobius.
#create date time format to strip values
hourly_calories$Activity_Day=as.POSIXct(hourly_calories$ActivityHour,
format="%m/%d/%Y %I:%M:%S %p",
tz=Sys.timezone()) # set variable type as date time format
hourly_calories$time <- format(hourly_calories$Activity_Day,
format = "%H:%M:%S") # create time column
hourly_calories$date <- format(hourly_calories$Activity_Day,
format = "%m/%d/%y") # create date column
hourly_calories$date <- mdy(hourly_calories$date) # import date field in correct format
hourly_calories$month <- month(hourly_calories$date) # get month of year
hourly_calories$doy <- yday(hourly_calories$date) # get day of year
hourly_calories$year <- year(hourly_calories$date) # get year
hourly_calories$wday <- wday(hourly_calories$date,
label = FALSE, abbr = TRUE,
week_start = getOption("lubridate.week.start", 7),
locale = Sys.getlocale("LC_TIME"))# extract day of week from calendar date
hourly_calories$time_obj <- strptime(hourly_calories$time,
format = "%H:%M:%OS",
tz = "EST") # create time format to strip hour as subclass
hourly_calories$hour<- hour(hourly_calories$time_obj) # strip time
# Convert from integer time to diel categorical (Day/Night)
hourly_calories$diel <- hourly_calories$hour <- mapvalues(hourly_calories$hour,
from = c(1, 2, 3, 4, 5,
6, 7, 8, 9, 10,
11, 12, 13, 14,
15, 16, 17, 18,
19, 20, 21, 22,
23, 0), to = c("Night", "Night", "Night",
"Night", "Night", "Night",
"Night", "Day", "Day",
"Day", "Day", "Day",
"Day", "Day", "Day",
"Day", "Day", "Day",
"Day", "Night", "Night",
"Night", "Night", "Night"))
#create date time format to strip values
hourly_calories$Activity_Day=as.POSIXct(hourly_calories$ActivityHour,
format="%m/%d/%Y %I:%M:%S %p",
tz=Sys.timezone()) # set variable type as date time format
hourly_calories$time <- format(hourly_calories$Activity_Day,
format = "%H:%M:%S") # create time column
hourly_calories$date <- format(hourly_calories$Activity_Day,
format = "%m/%d/%y") # create date column
hourly_calories$date <- mdy(hourly_calories$date) # import date field in correct format
hourly_calories$month <- month(hourly_calories$date) # get month of year
hourly_calories$doy <- yday(hourly_calories$date) # get day of year
hourly_calories$year <- year(hourly_calories$date) # get year
hourly_calories$wday <- wday(hourly_calories$date,
label = FALSE, abbr = TRUE,
week_start = getOption("lubridate.week.start", 7),
locale = Sys.getlocale("LC_TIME"))# extract day of week from calendar date
hourly_calories$time_obj <- strptime(hourly_calories$time,
format = "%H:%M:%OS",
tz = "EST") # create time format to strip hour as subclass
hourly_calories$hour<- hour(hourly_calories$time_obj) # strip time
# Subset data to daytime calories
hourly_calories_day <- hourly_calories[ which(hourly_calories$diel=='Day'
), ]
# Subset dataset to hour, calories, sedentary time... very active time
hourly_calories_day_subset <- hourly_calories_day[,(names(hourly_calories_day) %in% c("hour",
"Calories",
"SedentaryMinutes",
"LightlyActiveMinutes",
"FairlyActiveMinutes",
"VeryActiveMinutes"))]
#create date time format to strip values
hourly_intensities$Activity_Day=as.POSIXct(hourly_intensities$ActivityHour,
format="%m/%d/%Y %I:%M:%S %p",
tz=Sys.timezone()) # set variable type as date time format
hourly_intensities$time <- format(hourly_intensities$Activity_Day,
format = "%H:%M:%S") # create time column
hourly_intensities$date <- format(hourly_intensities$Activity_Day,
format = "%m/%d/%y") # create date column
hourly_intensities$date <- mdy(hourly_intensities$date) # import date field in correct format
hourly_intensities$month <- month(hourly_intensities$date) # get month of year
hourly_intensities$doy <- yday(hourly_intensities$date) # get day of year
hourly_intensities$year <- year(hourly_intensities$date) # get year
hourly_intensities$wday <- wday(hourly_intensities$date,
label = FALSE,
abbr = TRUE,
week_start = getOption("lubridate.week.start", 7),
locale = Sys.getlocale("LC_TIME"))# extract day of week from calendar date
hourly_intensities$time_obj <- strptime(hourly_intensities$time,
format = "%H:%M:%OS",
tz = "EST") # create time format to strip hour as subclass
hourly_intensities$hour<- hour(hourly_intensities$time_obj) # strip time
# Convert from integer time to diel categorical (Day/Night)
hourly_intensities$diel <- hourly_intensities$hour <- mapvalues(hourly_intensities$hour,
from = c(1, 2, 3, 4, 5,
6, 7, 8, 9, 10,
11, 12, 13, 14,
15, 16, 17, 18,
19, 20, 21, 22,
23, 0), to = c("Night", "Night", "Night",
"Night", "Night", "Night",
"Night", "Day", "Day",
"Day", "Day", "Day",
"Day", "Day", "Day",
"Day", "Day", "Day",
"Day", "Night", "Night",
"Night", "Night", "Night"))
#create date time format to strip values
hourly_intensities$Activity_Day=as.POSIXct(hourly_intensities$ActivityHour,
format="%m/%d/%Y %I:%M:%S %p",
tz=Sys.timezone()) # set variable type as date time format
hourly_intensities$time <- format(hourly_intensities$Activity_Day,
format = "%H:%M:%S") # create time column
hourly_intensities$date <- format(hourly_intensities$Activity_Day,
format = "%m/%d/%y") # create date column
hourly_intensities$date <- mdy(hourly_intensities$date) # import date field in correct format
hourly_intensities$month <- month(hourly_intensities$date) # get month of year
hourly_intensities$doy <- yday(hourly_intensities$date) # get day of year
hourly_intensities$year <- year(hourly_intensities$date) # get year
hourly_intensities$wday <- wday(hourly_intensities$date,
label = FALSE,
abbr = TRUE,
week_start = getOption("lubridate.week.start", 7),
locale = Sys.getlocale("LC_TIME"))# extract day of week from calendar date
hourly_intensities$time_obj <- strptime(hourly_intensities$time,
format = "%H:%M:%OS",
tz = "EST") # create time format to strip hour as subclass
hourly_intensities$hour<- hour(hourly_intensities$time_obj) # strip time
# Subset data to daytime intensities
hourly_intensities_day <- hourly_intensities[ which(hourly_intensities$diel=='Day'
), ]
# Subset dataset to hour and total intensity
hourly_intensities_day_subset <- hourly_intensities_day[,(names(hourly_intensities_day) %in% c("hour",
"TotalIntensity"))]
# Create classifier variable for intensity level
hourly_intensities_day_subset_active <- hourly_intensities_day_subset %>%
mutate(activity = case_when(TotalIntensity <= 20 ~ "Sedentary",
TotalIntensity <= 35 ~ "Light",
TotalIntensity <= 45 ~ "Moderate",
TotalIntensity > 45 ~ "Active"))
#<20 is sedentary
#20-35 is light
#35-45 is moderate
#>40 is active
# Horizontal bind datasets (hour is unit of replication)
calorie_intensities <- cbind(hourly_intensities_day_subset_active,hourly_calories_day_subset)
# Subset data by total intensity, hour, activity, calories
calorie_intensities <- calorie_intensities[,(names(calorie_intensities) %in% c("TotalIntensity",
"hour",
"activity",
"Calories"))]
#### Figure: Hourly activity intesity compared with calories burned
w<-ggplot(calorie_intensities, aes(x=Calories, y=TotalIntensity, color=activity)) +
geom_point() +
ylim(0, 180) +
labs(title = "Hourly activity intensity compared with calories burned",x="Hourly Calories Burned",y="Hourly Intensity Level") +
theme(legend.text = element_text(color = 'black', size = 12, hjust = 0.5)) +
theme(plot.title = element_text(color = '#554C6C', size = 15, hjust = 0.5)) +
theme(axis.title.x = element_text(color = '#554C6C', size = 12, hjust = 0.5)) +
theme(axis.title.y = element_text(color = '#554C6C', size = 12, hjust = 0.5)) +
theme(axis.line = element_line(colour = "black"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
panel.background = element_blank())
linear <- stat_smooth(method = "lm", col = "#013045")
w + scale_color_manual(values=c("#7da2c0", "#366788", "#5a84a3", "#004c6d")) +
linear
\(\mathbf{\displaystyle \normalsize LM\: : \:Intensity\:Level\:(\,hours\,) = 0.31 \star Calories\:Burned\:(\,hours\,) - 17.03 ,\:p < 0.001,\:R^2 = 0.796}\)
Figure 11. Hourly activity intensity compared with calories burned. Fitness intensity level (hours) is shown by the blue points, with decreasing activity level shown by increased color intensity (n = 30). A sedentary fitness level is shown by the navy points. The data used in analysis was obtained as CC0: Public Domain data, made available through Mobius.