Fitbit Health Data Analysis

Analysis Overview

The dataset contains health data from 30 Fitbit users who consented to share their data. Of these 30 individuals, 24 were included in this analysis, which looks for patterns in device usage and correlations between health metrics.

Step 1: Install and load the essential packages

The tidyverse is a collection of packages (including dplyr, ggplot2, and readr) that provide core functions for data analysis in R.

install.packages("tidyverse")

## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.5'
## (as 'lib' is unspecified)

library("tidyverse")

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
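Calling install.packages() unconditionally reinstalls the package every time the script runs. A minimal sketch of an optional guard, using base R's requireNamespace() to check whether the package is already available; this pattern is a suggestion and not part of the original workflow:

```r
# Install tidyverse only when it is not already available, then attach it.
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse")
}
library(tidyverse)
```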

Step 2: Load the dataset

After uploading the CSV file with the upload option in the bottom-right pane, read it into a named data frame using the assignment operator (<-) and the read_csv() function. This creates a dataset that can be used in R.

fitbit_df <- read_csv("JoinedSleepAndActDailyData.csv")

## Rows: 413 Columns: 22
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (2): primary_key_s, primary_key_d
## dbl  (18): id, total_sleep_records, total_minutes_asleep, total_time_in_bed,...
## date  (2): sleep_day, activity_date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
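The message above notes that column types can be specified explicitly. The sketch below shows one way to do that with readr's cols() helpers. It reads a tiny stand-in file written to a temporary path (the file, the name toy_df, and the three-column layout are assumptions for illustration, with values loosely based on the preview in Step 3). Reading id as character is a design choice that avoids the 1.50e9 display seen when a long numeric ID is stored as a double.

```r
library(readr)

# Hypothetical two-row stand-in for the real CSV, written to a temp file
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "id,sleep_day,total_minutes_asleep",
  "1503960366,2016-04-12,327",
  "1503960366,2016-04-13,384"
), tmp)

# Supplying col_types also silences the column specification message
toy_df <- read_csv(
  tmp,
  col_types = cols(
    id = col_character(),            # treat the participant ID as a label
    sleep_day = col_date(),          # ISO 8601 dates (YYYY-MM-DD)
    total_minutes_asleep = col_double()
  )
)
```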

Step 3: Preview the data

Use the head() and colnames() functions to gain familiarity with the data.

head(fitbit_df)

## # A tibble: 6 × 22
##   primary_key_s           id sleep_day  total_sleep_records total_minutes_asleep
##   <chr>                <dbl> <date>                   <dbl>                <dbl>
## 1 1503960366_4/12/20… 1.50e9 2016-04-12                   1                  327
## 2 1503960366_4/13/20… 1.50e9 2016-04-13                   2                  384
## 3 1503960366_4/15/20… 1.50e9 2016-04-15                   1                  412
## 4 1503960366_4/16/20… 1.50e9 2016-04-16                   2                  340
## 5 1503960366_4/17/20… 1.50e9 2016-04-17                   1                  700
## 6 1503960366_4/19/20… 1.50e9 2016-04-19                   1                  304
## # ℹ 17 more variables: total_time_in_bed <dbl>, primary_key_d <chr>,
## #   id_1 <dbl>, activity_date <date>, total_steps <dbl>, total_distance <dbl>,
## #   tracker_distance <dbl>, logged_activities_distance <dbl>,
## #   very_active_distance <dbl>, moderately_active_distance <dbl>,
## #   light_active_distance <dbl>, sedentary_active_distance <dbl>,
## #   very_active_minutes <dbl>, fairly_active_minutes <dbl>,
## #   lightly_active_minutes <dbl>, sedentary_minutes <dbl>, calories <dbl>

colnames(fitbit_df)

##  [1] "primary_key_s"              "id"                        
##  [3] "sleep_day"                  "total_sleep_records"       
##  [5] "total_minutes_asleep"       "total_time_in_bed"         
##  [7] "primary_key_d"              "id_1"                      
##  [9] "activity_date"              "total_steps"               
## [11] "total_distance"             "tracker_distance"          
## [13] "logged_activities_distance" "very_active_distance"      
## [15] "moderately_active_distance" "light_active_distance"     
## [17] "sedentary_active_distance"  "very_active_minutes"       
## [19] "fairly_active_minutes"      "lightly_active_minutes"    
## [21] "sedentary_minutes"          "calories"
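The column list includes both id and id_1, and both sleep_day and activity_date, which suggests the file was produced by joining two tables. A minimal sketch of how one might confirm that the paired columns agree, using glimpse() for a compact preview; toy_df and its values are a hypothetical miniature of the joined table, not the real data:

```r
library(dplyr)

# Hypothetical miniature of the joined table (illustrative values only)
toy_df <- tibble(
  id = c(1503960366, 1503960366),
  id_1 = c(1503960366, 1503960366),
  sleep_day = as.Date(c("2016-04-12", "2016-04-13")),
  activity_date = as.Date(c("2016-04-12", "2016-04-13"))
)

# One line per column: name, type, and the first few values
glimpse(toy_df)

# TRUE when the duplicated key columns carry the same information
ids_match <- all(toy_df$id == toy_df$id_1)
dates_match <- all(toy_df$sleep_day == toy_df$activity_date)
```

If both checks return TRUE on the real data, either column of each pair can be used interchangeably in the plots that follow.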

Step 4: Create a plot for Daily Total Steps vs Day

Use the ggplot() and aes() functions to plot daily total steps over time. geom_smooth() draws a smoothed trend line rather than the individual observations.

ggplot(data = fitbit_df) +
  geom_smooth(mapping = aes(x = sleep_day, y = total_steps)) +
  labs(title = "Daily Total Steps Over Time") +
  xlab("Day") +
  ylab("Daily Steps (count)")

## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
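A smoothed curve hides how many records sit behind each day. As a complementary view, the sketch below averages steps per day with group_by() and summarize() before plotting. The tibble toy_df and its step counts are made up for illustration; the names daily_steps and steps_plot are likewise assumptions.

```r
library(dplyr)
library(ggplot2)

# Hypothetical records: two participants on each of two days
toy_df <- tibble(
  sleep_day = as.Date(c("2016-04-12", "2016-04-12", "2016-04-13", "2016-04-13")),
  total_steps = c(8000, 10000, 6000, 9000)
)

# Mean steps and record count for each day
daily_steps <- toy_df |>
  group_by(sleep_day) |>
  summarize(mean_steps = mean(total_steps), n_records = n())

# Line plot of the per-day means
steps_plot <- ggplot(daily_steps, aes(x = sleep_day, y = mean_steps)) +
  geom_line() +
  geom_point() +
  labs(title = "Mean Daily Steps Over Time", x = "Day", y = "Mean Daily Steps (count)")
```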

Step 5: Create a plot for Daily Hours of Sleep vs Day

Use the ggplot() and aes() functions to plot daily hours of sleep over time. Convert time metrics from minutes to hours by dividing by 60, and label the plot with labs(), xlab(), and ylab().

ggplot(data = fitbit_df) +
  geom_smooth(mapping = aes(x = sleep_day, y = total_minutes_asleep/60)) +
  labs(title = "Hours Asleep Daily Over Time") +
  xlab("Day") +
  ylab("Hours Asleep (hours)")

## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
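Dividing by 60 inside aes() works, but the conversion has to be repeated in every plot and summary. One alternative, sketched here under the assumption that a derived column is acceptable, is to compute the hours once with mutate(). The tibble toy_df is a hypothetical two-row stand-in, and hours_asleep is an illustrative column name.

```r
library(dplyr)

# Hypothetical stand-in for the sleep columns
toy_df <- tibble(
  sleep_day = as.Date(c("2016-04-12", "2016-04-13")),
  total_minutes_asleep = c(327, 384)
)

# Add an hours column so later code can refer to it directly
toy_hours <- toy_df |>
  mutate(hours_asleep = total_minutes_asleep / 60)
```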

Step 6: Create a plot for Hours Spent Fairly Active vs Hours Spent Sedentary

Use the ggplot() and aes() functions to create a scatter plot of daily hours spent fairly active against daily hours spent sedentary. Convert time metrics from minutes to hours by dividing by 60, since hours are easier to interpret than hundreds of minutes. Label the plot with labs(), xlab(), and ylab().

ggplot(data = fitbit_df) +
  geom_point(mapping = aes(x = sedentary_minutes/60, y = fairly_active_minutes/60)) +
  labs(title = "Time Spent Fairly Active vs Time Spent Sedentary") +
  xlab("Daily Hours Spent Sedentary (hours)") +
  ylab("Daily Hours Spent Fairly Active (hours)")
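A scatter plot shows the shape of the relationship; a correlation coefficient can put a number on it. The sketch below uses base R's cor() on a small made-up data frame (toy_df and its values are assumptions, not results from the Fitbit data). Pearson correlation is unaffected by the minutes-to-hours conversion, so the division is kept only for consistency with the plot.

```r
# Hypothetical values: more sedentary time paired with less active time
toy_df <- data.frame(
  sedentary_minutes = c(600, 720, 800, 900),
  fairly_active_minutes = c(40, 25, 20, 5)
)

# Pearson correlation; use = "complete.obs" drops rows with missing values
r <- cor(
  toy_df$sedentary_minutes / 60,
  toy_df$fairly_active_minutes / 60,
  use = "complete.obs"
)
```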

Step 7: Create summary statistics

Use the summarize() function to compute the mean of several columns in the dataset. These figures may be useful to reference during a presentation. Once again, convert any metrics in minutes to hours.

summarize(
  fitbit_df,
  mean_daily_steps = mean(total_steps),
  mean_hours_asleep = mean(total_minutes_asleep / 60),
  mean_hours_sedentary = mean(sedentary_minutes / 60),
  mean_hours_sfa = mean(fairly_active_minutes / 60)
)

## # A tibble: 1 × 4
##   mean_daily_steps mean_hours_asleep mean_hours_sedentary mean_hours_sfa
##              <dbl>             <dbl>                <dbl>          <dbl>
## 1            8541.              6.99                 11.9          0.301
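These means are taken over all rows, so participants with more records carry more weight. Since 413 rows cannot divide evenly among 24 participants, the record counts must differ. A minimal sketch of the alternative, averaging within each participant first and then across participants, is shown on a made-up tibble (toy_df, the IDs "A" and "B", and the step counts are all hypothetical):

```r
library(dplyr)

# Hypothetical records: participant A has two days, B has one
toy_df <- tibble(
  id = c("A", "A", "B"),
  total_steps = c(1000, 3000, 5000)
)

# Mean over all rows (each record weighted equally)
row_mean <- toy_df |>
  summarize(mean_daily_steps = mean(total_steps, na.rm = TRUE))

# Mean per participant, then the mean of those (each participant weighted equally)
per_participant <- toy_df |>
  group_by(id) |>
  summarize(mean_daily_steps = mean(total_steps, na.rm = TRUE))

participant_mean <- per_participant |>
  summarize(mean_of_means = mean(mean_daily_steps))
```

On this toy data the two approaches give 3000 and 3500 respectively, which illustrates how uneven record counts can shift a summary.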

Step 8: Count the number of unique participants used in this analysis

Use the n_distinct() function to count the unique participants and check whether the analysis includes at least 30 individuals. A sample size of 30 is a common guideline for the minimum from which credible statistical tests can be run, so the count helps identify potential limitations of the analysis. The result below, 24, falls short of that guideline, so conclusions drawn from this data should be treated with caution.

unique_id_count <- n_distinct(fitbit_df$id)
print(unique_id_count)

## [1] 24
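Beyond the number of participants, it can help to see how many records each one contributed. The sketch below uses dplyr's count() for that and compares the participant total against the 30-person guideline. The tibble toy_df and the IDs in it are hypothetical, as are the names records_per_id and meets_guideline.

```r
library(dplyr)

# Hypothetical IDs: participant A logged three days, B logged one
toy_df <- tibble(id = c("A", "A", "A", "B"))

# Records per participant, largest first
records_per_id <- toy_df |>
  count(id, name = "n_records", sort = TRUE)

# Number of distinct participants and whether it reaches the guideline of 30
n_participants <- n_distinct(toy_df$id)
meets_guideline <- n_participants >= 30
```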