This document was created to fulfill some of the requirements for the Google Data Analytics Professional Certificate.


Overview

To inform marketing strategies and product development for Bellabeat, a fitness company geared towards women, publicly available data from smart devices were analyzed. The dataset aggregated information from FitBit products in 2016 and was obtained through Kaggle.

The data were cleaned, reviewed, and analyzed with RStudio. Tests of normality and an analysis of unique users were conducted for all data. A second analysis was then conducted to learn more about the under-utilization of weight data.

Summary of Data

The results of the analysis should be interpreted cautiously because the dataset is dated (2016), the number of unique users is limited, and the measures are not normally distributed; however, the customer survey proposed in the Recommendations section should mitigate some of these limitations.

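A review of all measures indicated:

  - The data are not normally distributed
  - Users collected data on activities more often than other health categories
  - Users collected data on weight least often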

Tests of Normality

### Tests of Normality

#Tests of Skewness
SkSt <- skewness(merged$TotalSteps)
SkDi <- skewness(merged$TotalDistance)
SkVeMi <- skewness(merged$VeryActiveMinutes)
SkFaMi <- skewness(merged$FairlyActiveMinutes)
SkLiMi <- skewness(merged$LightlyActiveMinutes)
SkSeMi <- skewness(merged$SedentaryMinutes)
SkCa <- skewness(merged$Calories)
SkSl <- skewness(sleep$minutes_asleep)
SkBd <- skewness(sleep$minutes_in_bed)
SkWe <- skewness(weight$weight_pounds)
SkBmi <- skewness(weight$bmi)

#Test of Kurtosis 
KuSt <- kurtosis(merged$TotalSteps)
KuDi <- kurtosis(merged$TotalDistance)
KuVeMi <- kurtosis(merged$VeryActiveMinutes)
KuFaMi <- kurtosis(merged$FairlyActiveMinutes)
KuLiMi <- kurtosis(merged$LightlyActiveMinutes)
KuSeMi <- kurtosis(merged$SedentaryMinutes)
KuCa <- kurtosis(merged$Calories)
KuSl <- kurtosis(sleep$minutes_asleep)
KuBd <- kurtosis(sleep$minutes_in_bed)
KuWe <- kurtosis(weight$weight_pounds)
KuBmi <- kurtosis(weight$bmi)

#List of Variables Tested
sk_ku_act_names <- c("Total Steps", "Total Distance", "Very Active Minutes", "Fairly Active Minutes", "Lightly Active Minutes", "Sedentary Minutes", "Calories")
sk_ku_sle_names <- c("Minutes Asleep", "Minutes in Bed")
sk_ku_wei_names <- c("Weight (lbs)", "BMI")

#Vector of Skewness Results
sk_act_values <- c(SkSt, SkDi, SkVeMi, SkFaMi, SkLiMi, SkSeMi, SkCa)
sk_sle_values <- c(SkSl, SkBd)
sk_wei_values <- c(SkWe, SkBmi)

#Vector of Kurtosis Results
ku_act_values <- c(KuSt, KuDi, KuVeMi, KuFaMi, KuLiMi, KuSeMi, KuCa)
ku_sle_values <- c(KuSl, KuBd)
ku_wei_values <- c(KuWe, KuBmi)

#Data frames
sk_ku_act_df <- data.frame(sk_ku_act_names, sk_act_values, ku_act_values)
sk_ku_sle_df <- data.frame(sk_ku_sle_names, sk_sle_values, ku_sle_values)
sk_ku_wei_df <- data.frame(sk_ku_wei_names, sk_wei_values, ku_wei_values)

#Renaming Columns
colnames(sk_ku_act_df) <- c("Activity Measures", "Skewness", "Kurtosis")
colnames(sk_ku_sle_df) <- c("Sleep Measures", "Skewness", "Kurtosis")
colnames(sk_ku_wei_df) <- c("Weight Measures", "Skewness", "Kurtosis")

#Data frames to tables
kable(sk_ku_act_df)

| Activity Measures      |   Skewness |  Kurtosis |
|:-----------------------|-----------:|----------:|
| Total Steps            |  0.6518526 |  4.156526 |
| Total Distance         |  1.1244756 |  6.090108 |
| Very Active Minutes    |  2.1726691 |  8.741005 |
| Fairly Active Minutes  |  2.4755336 | 10.946888 |
| Lightly Active Minutes | -0.0378688 |  2.635419 |
| Sedentary Minutes      | -0.2940279 |  2.331211 |
| Calories               |  0.4217761 |  3.615331 |

kable(sk_ku_sle_df)

| Sleep Measures |   Skewness | Kurtosis |
|:---------------|-----------:|---------:|
| Minutes Asleep | -0.6127408 | 4.582448 |
| Minutes in Bed | -0.2178411 | 6.441879 |

kable(sk_ku_wei_df)

| Weight Measures | Skewness | Kurtosis |
|:----------------|---------:|---------:|
| Weight (lbs)    | 1.338812 |  6.49147 |
| BMI             | 5.865064 | 43.53380 |

Observations


#Calculating number of observations across health measures
OSt <- length(merged$TotalSteps)
ODi <- length(merged$TotalDistance)
OVeMi <- length(merged$VeryActiveMinutes)
OFaMi <- length(merged$FairlyActiveMinutes)
OLiMi <- length(merged$LightlyActiveMinutes)
OSeMi <- length(merged$SedentaryMinutes)
OCa <- length(merged$Calories)
OSl <- length(sleep$minutes_asleep)
OBd <- length(sleep$minutes_in_bed)
OWe <- length(weight$weight_pounds)
OBmi <- length(weight$bmi)

#Organizing information about activity into data frame
o_act_measures <- c("Total Steps", "Total Distance", "Very Active Minutes", "Fairly Active Minutes", "Lightly Active Minutes", "Sedentary Minutes", "Calories")
o_act_observations <- c(OSt, ODi, OVeMi, OFaMi, OLiMi, OSeMi, OCa)
o_act_df <- data.frame(o_act_measures, o_act_observations)

#Organizing information about sleep into data frame
o_sle_measures <- c("Minutes Asleep", "Minutes in Bed")
o_sle_observations <- c(OSl, OBd)
o_sle_df <- data.frame(o_sle_measures, o_sle_observations)

#Organizing information on weight into data frame
o_wei_measures <- c("Weight (lbs)", "BMI")
o_wei_observations <- c(OWe, OBmi)
o_wei_df <- data.frame(o_wei_measures, o_wei_observations)

#Renaming Columns
colnames(o_wei_df) <- c("Weight Measures", "Observations")
colnames(o_sle_df) <- c("Sleep Measures", "Observations")
colnames(o_act_df) <- c("Activity Measures", "Observations")

#Converting data frames to tables
kable(o_act_df)

| Activity Measures      | Observations |
|:-----------------------|-------------:|
| Total Steps            |          940 |
| Total Distance         |          940 |
| Very Active Minutes    |          940 |
| Fairly Active Minutes  |          940 |
| Lightly Active Minutes |          940 |
| Sedentary Minutes      |          940 |
| Calories               |          940 |

kable(o_sle_df)

| Sleep Measures | Observations |
|:---------------|-------------:|
| Minutes Asleep |          413 |
| Minutes in Bed |          413 |

kable(o_wei_df)

| Weight Measures | Observations |
|:----------------|-------------:|
| Weight (lbs)    |           67 |
| BMI             |           67 |

Visual Representation of Data


#Calculating number of unique users
users_activity <- length(unique(merged$Id))
users_sleep <- length(unique(sleep$id))
users_weight <- length(unique(weight$id))

#Creating data frame
user_values <- c(users_activity, users_sleep, users_weight)
user_labels <- c("Activity", "Sleep", "Weight")
unique_user_df <- data.frame(user_labels, user_values)

#Graph of unique users
ggplot(unique_user_df) +
  geom_col(mapping = aes(x = user_labels, y = user_values), fill = "#000066") +
  theme_bw() +
  labs(x = "Health Categories", y = "Total", title = "Unique Users") +
  theme(axis.text = element_text(size = 13), text = element_text(size = 13))

#Color palette

cbPalette <- c("#999999", "#009E73")

#Graph of weight-specific data
ggplot(weight) +
  geom_bar(mapping = aes(x = id, fill = manual_report)) +
  coord_flip() +
  theme_bw() +
  labs(x = "User ID", y = "Total", title = "Type of Weight Log") +
  theme(legend.title = element_blank()) +
  scale_fill_manual(values = cbPalette) +
  theme(axis.text = element_text(size = 12.5), text = element_text(size = 13))

Potential Growth Area

Two important themes were found in the weight-specific analysis.

  1. Most observations were logged manually
  2. Of the limited users, most logged their weight fewer than 5 times

Based on the available information, it appears measures without automated data collection are less likely to be monitored and used.

Patterns in user behavior may be attributed to the effort required to manually log data, forgetting to complete the task due to competing obligations (e.g., work, family), or a combination of factors.

Bellabeat does not currently offer a product with automated data collection for weight (see table below).

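| Product | Description                         | Activity | Sleep | Stress | Hydration | Weight |
|:--------|:------------------------------------|:---------|:------|:-------|:----------|:-------|
| Leaf    | Worn as bracelet, clip, or necklace | Yes      | Yes   | Yes    | No        | No     |
| Time    | Worn as watch                       | Yes      | Yes   | Yes    | No        | No     |
| Spring  | Used as water bottle                | No       | No    | No     | Yes       | No     |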

Requiring users to enter weight data manually, or to rely on another product to collect it automatically, may reduce use of the Bellabeat app and the online coaching program and, in turn, user engagement.

Recommendations

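Immediate Next Steps:

  - Conduct a brief survey of current customers to:
    - Verify the results of the analysis
    - Gauge customer interest in adding a feature to automatically log weight data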

If the results of the survey correspond with the analysis and there is sufficient customer interest:

  1. Examine the cost associated with adding automated weight collection to future products.

  2. Consider releasing a limited number of updated devices and monitoring the use of the Bellabeat app and online coaching program for a concurrent increase in user engagement.


---
title: 'Patterns in Fitness Data from Smart Devices'
author: "Christopher Taylor"
output:
  html_notebook: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)

#Packages used
library(readr)
library(tidyverse)
library(ggplot2)
library(dplyr)
library(readxl)
library(knitr)
library(moments)

#Activity Data
merged <- read_excel("~/FitBit Analysis/merged.xlsx", col_types = c("text", "date", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric"))

#Sleep Data
sleep <- read_excel("~/FitBit Analysis/sleep.xlsx", col_types = c("text", "date","numeric", "numeric","numeric"))

#Weight Data
weight <- read_excel("~/FitBit Analysis/weight.xlsx", col_types = c("text", "date", "numeric", "numeric", "numeric", "numeric", "numeric", "text", "numeric"))

```

This document was created to fulfill some of the requirements for the Google Data Analytics Professional Certificate.

---

## Overview
To inform marketing strategies and product development for Bellabeat, a fitness company geared towards women, publicly available data from smart devices were analyzed. The [dataset](https://www.kaggle.com/arashnic/fitbit) aggregated information from FitBit products in 2016 and was obtained through Kaggle.

The data were cleaned, reviewed, and analyzed with RStudio. Tests of normality and an analysis of unique users were conducted for all data. A second analysis was then conducted to learn more about the under-utilization of weight data.

## Summary of Data

The results of the analysis should be interpreted cautiously because the dataset is dated (2016), the number of unique users is limited, and the measures are not normally distributed; however, the customer survey proposed in the Recommendations section should mitigate some of these limitations.

A review of all measures indicated:

- The data are not normally distributed
- Users collected data on activities more often than other health categories 
- Users collected data on weight least often

### Tests of Normality
```{r}
### Tests of Normality

#Tests of Skewness
SkSt <- skewness(merged$TotalSteps)
SkDi <- skewness(merged$TotalDistance)
SkVeMi <- skewness(merged$VeryActiveMinutes)
SkFaMi <- skewness(merged$FairlyActiveMinutes)
SkLiMi <- skewness(merged$LightlyActiveMinutes)
SkSeMi <- skewness(merged$SedentaryMinutes)
SkCa <- skewness(merged$Calories)
SkSl <- skewness(sleep$minutes_asleep)
SkBd <- skewness(sleep$minutes_in_bed)
SkWe <- skewness(weight$weight_pounds)
SkBmi <- skewness(weight$bmi)

#Test of Kurtosis 
KuSt <- kurtosis(merged$TotalSteps)
KuDi <- kurtosis(merged$TotalDistance)
KuVeMi <- kurtosis(merged$VeryActiveMinutes)
KuFaMi <- kurtosis(merged$FairlyActiveMinutes)
KuLiMi <- kurtosis(merged$LightlyActiveMinutes)
KuSeMi <- kurtosis(merged$SedentaryMinutes)
KuCa <- kurtosis(merged$Calories)
KuSl <- kurtosis(sleep$minutes_asleep)
KuBd <- kurtosis(sleep$minutes_in_bed)
KuWe <- kurtosis(weight$weight_pounds)
KuBmi <- kurtosis(weight$bmi)

#List of Variables Tested
sk_ku_act_names <- c("Total Steps", "Total Distance", "Very Active Minutes", "Fairly Active Minutes", "Lightly Active Minutes", "Sedentary Minutes", "Calories")
sk_ku_sle_names <- c("Minutes Asleep", "Minutes in Bed")
sk_ku_wei_names <- c("Weight (lbs)", "BMI")

#Vector of Skewness Results
sk_act_values <- c(SkSt, SkDi, SkVeMi, SkFaMi, SkLiMi, SkSeMi, SkCa)
sk_sle_values <- c(SkSl, SkBd)
sk_wei_values <- c(SkWe, SkBmi)

#Vector of Kurtosis Results
ku_act_values <- c(KuSt, KuDi, KuVeMi, KuFaMi, KuLiMi, KuSeMi, KuCa)
ku_sle_values <- c(KuSl, KuBd)
ku_wei_values <- c(KuWe, KuBmi)

#Data frames
sk_ku_act_df <- data.frame(sk_ku_act_names, sk_act_values, ku_act_values)
sk_ku_sle_df <- data.frame(sk_ku_sle_names, sk_sle_values, ku_sle_values)
sk_ku_wei_df <- data.frame(sk_ku_wei_names, sk_wei_values, ku_wei_values)

#Renaming Columns
colnames(sk_ku_act_df) <- c("Activity Measures", "Skewness", "Kurtosis")
colnames(sk_ku_sle_df) <- c("Sleep Measures", "Skewness", "Kurtosis")
colnames(sk_ku_wei_df) <- c("Weight Measures", "Skewness", "Kurtosis")

#Data frames to tables
kable(sk_ku_act_df)
kable(sk_ku_sle_df)
kable(sk_ku_wei_df)
```
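
Skewness and kurtosis describe the shape of a distribution but are not, by themselves, a formal test. As a complementary check, a test such as Shapiro-Wilk could be run on each measure. The sketch below is only an illustration using base R's `shapiro.test()`; the vector `very_active_sim` is simulated and merely stands in for a column such as `merged$VeryActiveMinutes` (it is not the FitBit data).

```r
# Minimal sketch: formal normality check with base R (stats::shapiro.test).
# `very_active_sim` is simulated, right-skewed data used as a hypothetical
# stand-in for a real column such as merged$VeryActiveMinutes.
set.seed(42)
very_active_sim <- rexp(500, rate = 1 / 20)

# Shapiro-Wilk accepts between 3 and 5000 non-missing values
sw <- shapiro.test(very_active_sim)

# A small p-value is evidence against the hypothesis of normality
is_non_normal <- sw$p.value < 0.05
print(sw)
```

With real data, the same call could be applied to each column in turn; the 0.05 threshold is a conventional choice, not one taken from this analysis.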

### Observations
```{r}

#Calculating number of observations across health measures
OSt <- length(merged$TotalSteps)
ODi <- length(merged$TotalDistance)
OVeMi <- length(merged$VeryActiveMinutes)
OFaMi <- length(merged$FairlyActiveMinutes)
OLiMi <- length(merged$LightlyActiveMinutes)
OSeMi <- length(merged$SedentaryMinutes)
OCa <- length(merged$Calories)
OSl <- length(sleep$minutes_asleep)
OBd <- length(sleep$minutes_in_bed)
OWe <- length(weight$weight_pounds)
OBmi <- length(weight$bmi)

#Organizing information about activity into data frame
o_act_measures <- c("Total Steps", "Total Distance", "Very Active Minutes", "Fairly Active Minutes", "Lightly Active Minutes", "Sedentary Minutes", "Calories")
o_act_observations <- c(OSt, ODi, OVeMi, OFaMi, OLiMi, OSeMi, OCa)
o_act_df <- data.frame(o_act_measures, o_act_observations)

#Organizing information about sleep into data frame
o_sle_measures <- c("Minutes Asleep", "Minutes in Bed")
o_sle_observations <- c(OSl, OBd)
o_sle_df <- data.frame(o_sle_measures, o_sle_observations)

#Organizing information on weight into data frame
o_wei_measures <- c("Weight (lbs)", "BMI")
o_wei_observations <- c(OWe, OBmi)
o_wei_df <- data.frame(o_wei_measures, o_wei_observations)

#Renaming Columns
colnames(o_wei_df) <- c("Weight Measures", "Observations")
colnames(o_sle_df) <- c("Sleep Measures", "Observations")
colnames(o_act_df) <- c("Activity Measures", "Observations")

#Converting data frames to tables
kable(o_act_df)
kable(o_sle_df)
kable(o_wei_df)
```
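
Note that `length()` counts every row, including missing values, so the counts above equal the number of rows in each table. If any column contained `NA`s, counting only non-missing values would be a closer measure of how often each metric was recorded. A minimal sketch, assuming a small made-up data frame (`demo` and its values are hypothetical):

```r
# Hypothetical data frame standing in for one of the source tables
demo <- data.frame(
  TotalSteps = c(1200, NA, 8400, 5300),
  Calories   = c(1800, 2100, NA, NA)
)

# Count non-missing observations in every column at once
n_obs <- colSums(!is.na(demo))

# Arrange the counts as a two-column data frame, ready for kable()
n_obs_df <- data.frame(
  Measure = names(n_obs),
  Observations = as.integer(n_obs)
)
print(n_obs_df)
```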

## Visual Representation of Data

```{r}

#Calculating number of unique users
users_activity <- length(unique(merged$Id))
users_sleep <- length(unique(sleep$id))
users_weight <- length(unique(weight$id))

#Creating data frame
user_values <- c(users_activity, users_sleep, users_weight)
user_labels <- c("Activity", "Sleep", "Weight")
unique_user_df <- data.frame(user_labels, user_values)

#Graph of unique users
ggplot(unique_user_df) +
  geom_col(mapping = aes(x = user_labels, y = user_values), fill = "#000066") +
  theme_bw() +
  labs(x = "Health Categories", y = "Total", title = "Unique Users") +
  theme(axis.text = element_text(size = 13), text = element_text(size = 13))
```

```{r}
#Color palette

cbPalette <- c("#999999", "#009E73")

#Graph of weight-specific data
ggplot(weight) +
  geom_bar(mapping = aes(x = id, fill = manual_report)) +
  coord_flip() +
  theme_bw() +
  labs(x = "User ID", y = "Total", title = "Type of Weight Log") +
  theme(legend.title = element_blank()) +
  scale_fill_manual(values = cbPalette) +
  theme(axis.text = element_text(size = 12.5), text = element_text(size = 13))

```

## Potential Growth Area

Two important themes were found in the weight-specific analysis. 

1. Most observations were logged manually
2. Of the limited users, most logged their weight fewer than 5 times\
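
The two themes above were read from the chart, but they could also be checked numerically. The sketch below shows one possible way to do so with base R. The data frame `weight_demo` and its values are made up for illustration, and `manual_report` is assumed to be logical here; if it is stored as text in the real `weight` table, it would first need to be compared against the stored label.

```r
# Hypothetical weight log: one row per weigh-in (values are made up)
weight_demo <- data.frame(
  id = c("A", "A", "B", "C", "C", "C", "C", "C", "C", "D"),
  manual_report = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE,
                    FALSE, FALSE, FALSE, TRUE)
)

# Theme 1: share of observations that were logged manually
manual_share <- mean(weight_demo$manual_report)

# Theme 2: number of logs per user, and share of users with fewer than 5
logs_per_user <- table(weight_demo$id)
share_under_5 <- mean(logs_per_user < 5)

print(manual_share)
print(logs_per_user)
print(share_under_5)
```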

Based on the available information, it appears **measures without automated data collection are less likely to be monitored and used.** \

Patterns in user behavior may be attributed to the effort required to manually log data, forgetting to complete the task due to competing obligations (e.g., work, family), or a combination of factors.\

Bellabeat does not currently offer a product with automated data collection for weight (see table below).\

```{r echo=FALSE}

#Data for data frame
Product <- c("Leaf", "Time", "Spring")
Description <- c("Worn as bracelet, clip, or necklace", "Worn as watch", "Used as water bottle") 
Activity <- c("Yes", "Yes", "No")
Sleep <- c("Yes", "Yes", "No")
Stress <- c("Yes", "Yes", "No")
Hydration <- c("No", "No", "Yes")
Weight <- c("No", "No", "No")

#Data frame about Bellabeat products
product_df <- data.frame(Product, Description, Activity, Sleep, Stress, Hydration, Weight)

#Data frame to table
kable(product_df)
```

Requiring users to enter weight data manually, or to rely on another product to collect it automatically, may reduce use of the Bellabeat app and the online coaching program and, in turn, user engagement.

## Recommendations 
Immediate Next Steps:

- Conduct a brief survey of current customers to:
  - Verify the results of the analysis
  - Gauge customer interest in adding a feature to automatically log weight data

If the results of the survey correspond with the analysis and there is sufficient customer interest: 

1. Examine the cost associated with adding automated weight collection to future products.

2. Consider releasing a limited number of updated devices and monitoring the use of the Bellabeat app and online coaching program for a concurrent increase in user engagement.

---
