This document was created to fulfill some of the requirements for the Google Data Analytics Professional Certificate.
Overview
To inform marketing strategies and product development for Bellabeat, a fitness company geared towards women, an analysis of publicly available data from smart devices was conducted. The dataset aggregated information from FitBit products in 2016 and was obtained through Kaggle.
The data were cleaned, reviewed, and analyzed with RStudio. Tests of normality and an analysis of unique users were conducted for all data. To gain additional information about the under-utilization of data related to weight, a second analysis was conducted.
Summary of Data
The results of the analysis should be interpreted cautiously due to the age of the dataset, number of unique users, and tests of normality; however, Step 1 in the Recommendations section will mitigate some of these limitations.
A review of all measures indicated:
- The data are not normally distributed
- Users collected data on activities more often than other health categories
- Users collected data on weight least often
Tests of Normality
### Tests of Normality
#Tests of Skewness
SkSt <- skewness(merged$TotalSteps)
SkDi <- skewness(merged$TotalDistance)
SkVeMi <- skewness(merged$VeryActiveMinutes)
SkFaMi <- skewness(merged$FairlyActiveMinutes)
SkLiMi <- skewness(merged$LightlyActiveMinutes)
SkSeMi <- skewness(merged$SedentaryMinutes)
SkCa <- skewness(merged$Calories)
SkSl <- skewness(sleep$minutes_asleep)
SkBd <- skewness(sleep$minutes_in_bed)
SkWe <- skewness(weight$weight_pounds)
SkBmi <- skewness(weight$bmi)
#Test of Kurtosis
KuSt <- kurtosis(merged$TotalSteps)
KuDi <- kurtosis(merged$TotalDistance)
KuVeMi <- kurtosis(merged$VeryActiveMinutes)
KuFaMi <- kurtosis(merged$FairlyActiveMinutes)
KuLiMi <- kurtosis(merged$LightlyActiveMinutes)
KuSeMi <- kurtosis(merged$SedentaryMinutes)
KuCa <- kurtosis(merged$Calories)
KuSl <- kurtosis(sleep$minutes_asleep)
KuBd <- kurtosis(sleep$minutes_in_bed)
KuWe <- kurtosis(weight$weight_pounds)
KuBmi <- kurtosis(weight$bmi)
#List of Variables Tested
sk_ku_act_names <- c("Total Steps", "Total Distance", "Very Active Minutes", "Fairly Active Minutes", "Lightly Active Minutes", "Sedentary Minutes", "Calories")
sk_ku_sle_names <- c("Minutes Asleep", "Minutes in Bed")
sk_ku_wei_names <- c("Weight (lbs)", "BMI")
#Vector of Skewness Results
sk_act_values <- c(SkSt, SkDi, SkVeMi, SkFaMi, SkLiMi, SkSeMi, SkCa)
sk_sle_values <- c(SkSl, SkBd)
sk_wei_values <- c(SkWe, SkBmi)
#Vector of Kurtosis Results
ku_act_values <- c(KuSt, KuDi, KuVeMi, KuFaMi, KuLiMi, KuSeMi, KuCa)
ku_sle_values <- c(KuSl, KuBd)
ku_wei_values <- c(KuWe, KuBmi)
#Data frames
sk_ku_act_df <- data.frame(sk_ku_act_names, sk_act_values, ku_act_values)
sk_ku_sle_df <- data.frame(sk_ku_sle_names, sk_sle_values, ku_sle_values)
sk_ku_wei_df <- data.frame(sk_ku_wei_names, sk_wei_values, ku_wei_values)
#Renaming Columns
colnames(sk_ku_act_df) <- c("Activity Measures", "Skewness", "Kurtosis")
colnames(sk_ku_sle_df) <- c("Sleep Measures", "Skewness", "Kurtosis")
colnames(sk_ku_wei_df) <- c("Weight Measures", "Skewness", "Kurtosis")
#Data frames to tables
kable(sk_ku_act_df)
| Activity Measures | Skewness | Kurtosis |
|:--|--:|--:|
| Total Steps | 0.6518526 | 4.156526 |
| Total Distance | 1.1244756 | 6.090108 |
| Very Active Minutes | 2.1726691 | 8.741005 |
| Fairly Active Minutes | 2.4755336 | 10.946888 |
| Lightly Active Minutes | -0.0378688 | 2.635419 |
| Sedentary Minutes | -0.2940279 | 2.331211 |
| Calories | 0.4217761 | 3.615331 |
kable(sk_ku_sle_df)
| Sleep Measures | Skewness | Kurtosis |
|:--|--:|--:|
| Minutes Asleep | -0.6127408 | 4.582448 |
| Minutes in Bed | -0.2178411 | 6.441879 |
kable(sk_ku_wei_df)
| Weight Measures | Skewness | Kurtosis |
|:--|--:|--:|
| Weight (lbs) | 1.338812 | 6.49147 |
| BMI | 5.865064 | 43.53380 |
Observations
#Calculating number of observations across health measures
OSt <- length(merged$TotalSteps)
ODi <- length(merged$TotalDistance)
OVeMi <- length(merged$VeryActiveMinutes)
OFaMi <- length(merged$FairlyActiveMinutes)
OLiMi <- length(merged$LightlyActiveMinutes)
OSeMi <- length(merged$SedentaryMinutes)
OCa <- length(merged$Calories)
OSl <- length(sleep$minutes_asleep)
OBd <- length(sleep$minutes_in_bed)
OWe <- length(weight$weight_pounds)
OBmi <- length(weight$bmi)
#Organizing information about activity into data frame
o_act_measures <- c("Total Steps", "Total Distance", "Very Active Minutes", "Fairly Active Minutes", "Lightly Active Minutes", "Sedentary Minutes", "Calories")
o_act_observations <- c(OSt, ODi, OVeMi, OFaMi, OLiMi, OSeMi, OCa)
o_act_df <- data.frame(o_act_measures, o_act_observations)
#Organizing information about sleep into data frame
o_sle_measures <- c("Minutes Asleep", "Minutes in Bed")
o_sle_observations <- c(OSl, OBd)
o_sle_df <- data.frame(o_sle_measures, o_sle_observations)
#Organizing information on weight into data frame
o_wei_measures <- c("Weight (lbs)", "BMI")
o_wei_observations <- c(OWe, OBmi)
o_wei_df <- data.frame(o_wei_measures, o_wei_observations)
#Renaming Columns
colnames(o_wei_df) <- c("Weight Measures", "Observations")
colnames(o_sle_df) <- c("Sleep Measures", "Observations")
colnames(o_act_df) <- c("Activity Measures", "Observations")
#Converting data frames to tables
kable(o_act_df)
| Activity Measures | Observations |
|:--|--:|
| Total Steps | 940 |
| Total Distance | 940 |
| Very Active Minutes | 940 |
| Fairly Active Minutes | 940 |
| Lightly Active Minutes | 940 |
| Sedentary Minutes | 940 |
| Calories | 940 |
kable(o_sle_df)
| Sleep Measures | Observations |
|:--|--:|
| Minutes Asleep | 413 |
| Minutes in Bed | 413 |
kable(o_wei_df)
Visual Representation of Data
#Calculating number of unique users
users_activity <- length(unique(merged$Id))
users_sleep <- length(unique(sleep$id))
users_weight <- length(unique(weight$id))
#Creating data frame
user_values <- c(users_activity, users_sleep, users_weight)
user_labels <- c("Activity", "Sleep", "Weight")
unique_user_df <- data.frame(user_labels, user_values)
#Graph of unique users
ggplot(unique_user_df) + geom_col(mapping = aes(x=user_labels, y=user_values), fill = "#000066") + theme_bw() + labs(x = "Health Categories", y = "Total", title = "Unique Users") + theme(axis.text = element_text(size = 13), text = element_text(size = 13))

#Color palette
cbPalette <- c("#999999", "#009E73")
#Graph of weight-specific data
ggplot(weight) + geom_bar(mapping = aes(x=id, fill = manual_report)) + coord_flip() + theme_bw() + labs(x = "User ID", y="Total", title = "Type of Weight Log") + theme(legend.title = element_blank()) + scale_fill_manual(values=cbPalette) + theme(axis.text = element_text(size = 12.5), text = element_text(size = 13))

Potential Growth Area
Two important themes were found in the weight-specific analysis.
- Most observations were logged manually
- Of the limited users, most logged their weight fewer than 5 times
Based on the available information, it appears measures without automated data collection are less likely to be monitored and used.
Patterns in user behavior may be attributed to the effort required to manually log data, forgetting to complete the task due to competing obligations (e.g., work, family), or a combination of factors.
Bellabeat does not currently offer a product with automated data collection for weight (see table below).
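| Product | Description | Activity | Sleep | Stress | Hydration | Weight |
|:--|:--|:--|:--|:--|:--|:--|
| Leaf | Worn as bracelet, clip, or necklace | Yes | Yes | Yes | No | No |
| Time | Worn as watch | Yes | Yes | Yes | No | No |
| Spring | Used as water bottle | No | No | No | Yes | No |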
If users must enter data manually or use another product to collect weight data automatically, it may decrease the use of the Bellabeat app and the online coaching program, which would decrease user engagement.
Recommendations
Immediate Next Steps:
- Conduct a brief survey of current customers to:
  - Verify the results of the analysis
  - Gauge customer interest in a feature that logs weight data automatically
If the results of the survey correspond with the analysis and there is sufficient customer interest:
1. Examine the cost associated with adding automated weight collection to future products.
2. Consider releasing a limited number of updated devices and monitoring the use of the Bellabeat app and online coaching program for a concurrent increase in user engagement.
---
title: 'Patterns in Fitness Data from Smart Devices'
author: "Christopher Taylor"
output:
  html_notebook: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)

#Packages used
library(readr)
library(tidyverse)
library(ggplot2)
library(dplyr)
library(readxl)
library(knitr)
library(moments)

#Activity Data
merged <- read_excel("~/FitBit Analysis/merged.xlsx", col_types = c("text", "date", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric"))

#Sleep Data
sleep <- read_excel("~/FitBit Analysis/sleep.xlsx", col_types = c("text", "date","numeric", "numeric","numeric"))

#Weight Data
weight <- read_excel("~/FitBit Analysis/weight.xlsx", col_types = c("text", "date", "numeric", "numeric", "numeric", "numeric", "numeric", "text", "numeric"))

```

This document was created to fulfill some of the requirements for the Google Data Analytics Professional Certificate.

---

## Overview
To inform marketing strategies and product development for Bellabeat, a fitness company geared towards women, an analysis of publicly available data from smart devices was conducted. The [dataset](https://www.kaggle.com/arashnic/fitbit) aggregated information from FitBit products in 2016 and was obtained through Kaggle.

The data were cleaned, reviewed, and analyzed with RStudio. Tests of normality and an analysis of unique users were conducted for all data. To gain additional information about the under-utilization of data related to weight, a second analysis was conducted.

## Summary of Data

The results of the analysis should be interpreted cautiously due to the age of the dataset, number of unique users, and tests of normality; however, Step 1 in the Recommendations section will mitigate some of these limitations.

A review of all measures indicated:

- The data are not normally distributed
- Users collected data on activities more often than other health categories 
- Users collected data on weight least often

### Tests of Normality
```{r}
### Tests of Normality

#Tests of Skewness
SkSt <- skewness(merged$TotalSteps)
SkDi <- skewness(merged$TotalDistance)
SkVeMi <- skewness(merged$VeryActiveMinutes)
SkFaMi <- skewness(merged$FairlyActiveMinutes)
SkLiMi <- skewness(merged$LightlyActiveMinutes)
SkSeMi <- skewness(merged$SedentaryMinutes)
SkCa <- skewness(merged$Calories)
SkSl <- skewness(sleep$minutes_asleep)
SkBd <- skewness(sleep$minutes_in_bed)
SkWe <- skewness(weight$weight_pounds)
SkBmi <- skewness(weight$bmi)

#Test of Kurtosis 
KuSt <- kurtosis(merged$TotalSteps)
KuDi <- kurtosis(merged$TotalDistance)
KuVeMi <- kurtosis(merged$VeryActiveMinutes)
KuFaMi <- kurtosis(merged$FairlyActiveMinutes)
KuLiMi <- kurtosis(merged$LightlyActiveMinutes)
KuSeMi <- kurtosis(merged$SedentaryMinutes)
KuCa <- kurtosis(merged$Calories)
KuSl <- kurtosis(sleep$minutes_asleep)
KuBd <- kurtosis(sleep$minutes_in_bed)
KuWe <- kurtosis(weight$weight_pounds)
KuBmi <- kurtosis(weight$bmi)

#List of Variables Tested
sk_ku_act_names <- c("Total Steps", "Total Distance", "Very Active Minutes", "Fairly Active Minutes", "Lightly Active Minutes", "Sedentary Minutes", "Calories")
sk_ku_sle_names <- c("Minutes Asleep", "Minutes in Bed")
sk_ku_wei_names <- c("Weight (lbs)", "BMI")

#Vector of Skewness Results
sk_act_values <- c(SkSt, SkDi, SkVeMi, SkFaMi, SkLiMi, SkSeMi, SkCa)
sk_sle_values <- c(SkSl, SkBd)
sk_wei_values <- c(SkWe, SkBmi)

#Vector of Kurtosis Results
ku_act_values <- c(KuSt, KuDi, KuVeMi, KuFaMi, KuLiMi, KuSeMi, KuCa)
ku_sle_values <- c(KuSl, KuBd)
ku_wei_values <- c(KuWe, KuBmi)

#Data frames
sk_ku_act_df <- data.frame(sk_ku_act_names, sk_act_values, ku_act_values)
sk_ku_sle_df <- data.frame(sk_ku_sle_names, sk_sle_values, ku_sle_values)
sk_ku_wei_df <- data.frame(sk_ku_wei_names, sk_wei_values, ku_wei_values)

#Renaming Columns
colnames(sk_ku_act_df) <- c("Activity Measures", "Skewness", "Kurtosis")
colnames(sk_ku_sle_df) <- c("Sleep Measures", "Skewness", "Kurtosis")
colnames(sk_ku_wei_df) <- c("Weight Measures", "Skewness", "Kurtosis")

#Data frames to tables
kable(sk_ku_act_df)
kable(sk_ku_sle_df)
kable(sk_ku_wei_df)
```
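
As a reading aid for the values computed above: a normal distribution has a skewness of 0 and, on the Pearson scale (not excess kurtosis), a kurtosis of 3. The chunk below is a minimal sketch and not part of the original analysis. It uses simulated data and three hypothetical base-R helpers (`central_moment`, `sample_skewness`, `sample_kurtosis`) that follow the usual moment-based definitions, which are assumed to agree with what the `moments` package computes.

```{r}
# Illustrative sketch only: simulated data, not the FitBit dataset.
# central_moment(), sample_skewness(), and sample_kurtosis() are hypothetical helpers.
central_moment <- function(x, k) mean((x - mean(x))^k)
sample_skewness <- function(x) central_moment(x, 3) / central_moment(x, 2)^1.5
sample_kurtosis <- function(x) central_moment(x, 4) / central_moment(x, 2)^2

set.seed(2016)
normal_sample <- rnorm(10000)  # symmetric reference sample
skewed_sample <- rexp(10000)   # right-skewed sample with a heavy right tail

# Side-by-side comparison of the two simulated samples
reference_df <- data.frame(
  Sample = c("Normal (simulated)", "Exponential (simulated)"),
  Skewness = c(sample_skewness(normal_sample), sample_skewness(skewed_sample)),
  Kurtosis = c(sample_kurtosis(normal_sample), sample_kurtosis(skewed_sample))
)
print(reference_df)
```

Read against these reference values, the BMI measure (skewness of about 5.9 and kurtosis of about 43.5) shows the strongest departure from normality among the measures tested.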

### Observations
```{r}

#Calculating number of observations across health measures
OSt <- length(merged$TotalSteps)
ODi <- length(merged$TotalDistance)
OVeMi <- length(merged$VeryActiveMinutes)
OFaMi <- length(merged$FairlyActiveMinutes)
OLiMi <- length(merged$LightlyActiveMinutes)
OSeMi <- length(merged$SedentaryMinutes)
OCa <- length(merged$Calories)
OSl <- length(sleep$minutes_asleep)
OBd <- length(sleep$minutes_in_bed)
OWe <- length(weight$weight_pounds)
OBmi <- length(weight$bmi)

#Organizing information about activity into data frame
o_act_measures <- c("Total Steps", "Total Distance", "Very Active Minutes", "Fairly Active Minutes", "Lightly Active Minutes", "Sedentary Minutes", "Calories")
o_act_observations <- c(OSt, ODi, OVeMi, OFaMi, OLiMi, OSeMi, OCa)
o_act_df <- data.frame(o_act_measures, o_act_observations)

#Organizing information about sleep into data frame
o_sle_measures <- c("Minutes Asleep", "Minutes in Bed")
o_sle_observations <- c(OSl, OBd)
o_sle_df <- data.frame(o_sle_measures, o_sle_observations)

#Organizing information on weight into data frame
o_wei_measures <- c("Weight (lbs)", "BMI")
o_wei_observations <- c(OWe, OBmi)
o_wei_df <- data.frame(o_wei_measures, o_wei_observations)

#Renaming Columns
colnames(o_wei_df) <- c("Weight Measures", "Observations")
colnames(o_sle_df) <- c("Sleep Measures", "Observations")
colnames(o_act_df) <- c("Activity Measures", "Observations")

#Converting data frames to tables
kable(o_act_df)
kable(o_sle_df)
kable(o_wei_df)
```
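
Because `length()` counts every element of a column, including missing values, the totals above are row counts for each table. The following sketch, built on made-up data frames (the `toy_*` names and all values are hypothetical), shows one way row counts, complete rows, and unique users could be compared in a single step.

```{r}
# Illustrative sketch only: toy data, not the FitBit dataset.
toy_activity <- data.frame(id = c("a", "a", "b", "c"), steps = c(1000, 2500, NA, 4000))
toy_sleep    <- data.frame(id = c("a", "b"), minutes_asleep = c(420, 390))
toy_weight   <- data.frame(id = "a", weight_pounds = 150)

toy_tables <- list(Activity = toy_activity, Sleep = toy_sleep, Weight = toy_weight)

count_df <- data.frame(
  Category = names(toy_tables),
  # total rows, which is what length() on a column reports
  Rows = vapply(toy_tables, nrow, integer(1), USE.NAMES = FALSE),
  # rows in which no column is missing
  CompleteRows = vapply(toy_tables, function(df) sum(complete.cases(df)),
                        integer(1), USE.NAMES = FALSE),
  # distinct user ids in each table
  UniqueUsers = vapply(toy_tables, function(df) length(unique(df$id)),
                       integer(1), USE.NAMES = FALSE)
)
print(count_df)
```

If the row count and the complete-row count differ for a table, the number of observations reported for that table overstates the usable data.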

## Visual Representation of Data

```{r}

#Calculating number of unique users
users_activity <- length(unique(merged$Id))
users_sleep <- length(unique(sleep$id))
users_weight <- length(unique(weight$id))

#Creating data frame
user_values <- c(users_activity, users_sleep, users_weight)
user_labels <- c("Activity", "Sleep", "Weight")
unique_user_df <- data.frame(user_labels, user_values)

#Graph of unique users
ggplot(unique_user_df) + geom_col(mapping = aes(x=user_labels, y=user_values), fill = "#000066") + theme_bw() + labs(x = "Health Categories", y = "Total", title = "Unique Users") + theme(axis.text = element_text(size = 13), text = element_text(size = 13))
```

```{r}
#Color palette

cbPalette <- c("#999999", "#009E73")

#Graph of weight-specific data
ggplot(weight) + geom_bar(mapping = aes(x=id, fill = manual_report)) + coord_flip() + theme_bw() + labs(x = "User ID", y="Total", title = "Type of Weight Log") + theme(legend.title = element_blank()) + scale_fill_manual(values=cbPalette) + theme(axis.text = element_text(size = 12.5), text = element_text(size = 13))

```

## Potential Growth Area

Two important themes were found in the weight-specific analysis. 

1. Most observations were logged manually
2. Of the limited users, most logged their weight fewer than 5 times
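
Both themes can be checked numerically as well as visually. The sketch below is illustrative only: it uses a small made-up data frame (`toy_weight_logs`, with hypothetical values) that borrows the `id` and `manual_report` column names from the weight data, and it assumes `manual_report` can be treated as a logical flag.

```{r}
# Illustrative sketch only: toy data, not the FitBit dataset.
# Assumption: manual_report is TRUE when the weight was entered by hand.
toy_weight_logs <- data.frame(
  id = c("u1", "u1", "u2", "u3", "u3", "u3", "u3", "u3", "u3"),
  manual_report = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)
)

# Theme 1: share of observations that were logged manually
share_manual <- mean(toy_weight_logs$manual_report)

# Theme 2: number of logs per user, and share of users with fewer than 5 logs
logs_per_user <- table(toy_weight_logs$id)
share_under_5 <- mean(logs_per_user < 5)

print(share_manual)
print(logs_per_user)
print(share_under_5)
```

Applied to the real weight table, the same two proportions would put numbers on the patterns visible in the "Type of Weight Log" chart.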

Based on the available information, it appears **measures without automated data collection are less likely to be monitored and used.**

Patterns in user behavior may be attributed to the effort required to manually log data, forgetting to complete the task due to competing obligations (e.g., work, family), or a combination of factors.

Bellabeat does not currently offer a product with automated data collection for weight (see table below).

```{r echo=FALSE}

#Data for data frame
Product <- c("Leaf", "Time", "Spring")
Description <- c("Worn as bracelet, clip, or necklace", "Worn as watch", "Used as water bottle") 
Activity <- c("Yes", "Yes", "No")
Sleep <- c("Yes", "Yes", "No")
Stress <- c("Yes", "Yes", "No")
Hydration <- c("No", "No", "Yes")
Weight <- c("No", "No", "No")

#Data frame about Bellabeat products
product_df <- data.frame(Product, Description, Activity, Sleep, Stress, Hydration, Weight)

#Data frame to table
kable(product_df)
```

If users must enter data manually or use another product to collect weight data automatically, it may decrease the use of the Bellabeat app and the online coaching program, which would decrease user engagement.

## Recommendations 
Immediate Next Steps:

- Conduct a brief survey of current customers to:
  - Verify the results of the analysis
  - Gauge customer interest in a feature that logs weight data automatically

If the results of the survey correspond with the analysis and there is sufficient customer interest: 

1. Examine the cost associated with adding automated weight collection to future products.

2. Consider releasing a limited number of updated devices and monitoring the use of the Bellabeat app and online coaching program for a concurrent increase in user engagement.

---
