Fitbit is a popular wireless wearable device that monitors health metrics such as sleep, steps, and activity levels. The dataset used for this analysis is in the public domain and is available on Kaggle. More information about the dataset can be found here.
The dataset contains health data from 30 Fitbit users who consented to share their data. Of these 30 individuals, 24 were used in this analysis to look for patterns in device usage and correlations between health metrics.
Packages like tidyverse contain core functions that are useful for data analysis in R. Install the package once, then load it with library().
install.packages("tidyverse")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.5'
## (as 'lib' is unspecified)
library("tidyverse")
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
After uploading the CSV file with the upload option in the menu of the bottom-right pane, read it into a named data frame using the assignment operator (<-) and the read_csv() function. This creates a dataset that can be used in R.
fitbit_df <- read_csv("JoinedSleepAndActDailyData.csv")
## Rows: 413 Columns: 22
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): primary_key_s, primary_key_d
## dbl (18): id, total_sleep_records, total_minutes_asleep, total_time_in_bed,...
## date (2): sleep_day, activity_date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
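The message above suggests specifying column types. A minimal sketch of what that could look like follows, using a hypothetical three-row sample passed as literal text instead of the real file. Reading id as character is an assumption made here for illustration, so that user IDs print in full rather than as 1.50e9; it is not a step taken in this analysis.

```r
library(readr)

# Hypothetical three-row sample standing in for the real CSV file.
csv_text <- I(paste(
  "id,sleep_day,total_minutes_asleep",
  "1503960366,2016-04-12,327",
  "1503960366,2016-04-13,384",
  "1503960366,2016-04-15,412",
  sep = "\n"
))

# Declaring the column types up front silences the column specification
# message and makes the intended type of each column explicit.
sample_df <- read_csv(
  csv_text,
  col_types = cols(
    id = col_character(),                       # treat the user ID as a label
    sleep_day = col_date(format = "%Y-%m-%d"),  # ISO-formatted dates
    total_minutes_asleep = col_double()
  )
)

print(sample_df)
```

If only the message needs to go away, passing show_col_types = FALSE to read_csv() is the lighter option.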
Use the head() and colnames() functions to gain familiarity with the data.
head(fitbit_df)
## # A tibble: 6 × 22
## primary_key_s id sleep_day total_sleep_records total_minutes_asleep
## <chr> <dbl> <date> <dbl> <dbl>
## 1 1503960366_4/12/20… 1.50e9 2016-04-12 1 327
## 2 1503960366_4/13/20… 1.50e9 2016-04-13 2 384
## 3 1503960366_4/15/20… 1.50e9 2016-04-15 1 412
## 4 1503960366_4/16/20… 1.50e9 2016-04-16 2 340
## 5 1503960366_4/17/20… 1.50e9 2016-04-17 1 700
## 6 1503960366_4/19/20… 1.50e9 2016-04-19 1 304
## # ℹ 17 more variables: total_time_in_bed <dbl>, primary_key_d <chr>,
## # id_1 <dbl>, activity_date <date>, total_steps <dbl>, total_distance <dbl>,
## # tracker_distance <dbl>, logged_activities_distance <dbl>,
## # very_active_distance <dbl>, moderately_active_distance <dbl>,
## # light_active_distance <dbl>, sedentary_active_distance <dbl>,
## # very_active_minutes <dbl>, fairly_active_minutes <dbl>,
## # lightly_active_minutes <dbl>, sedentary_minutes <dbl>, calories <dbl>
colnames(fitbit_df)
## [1] "primary_key_s" "id"
## [3] "sleep_day" "total_sleep_records"
## [5] "total_minutes_asleep" "total_time_in_bed"
## [7] "primary_key_d" "id_1"
## [9] "activity_date" "total_steps"
## [11] "total_distance" "tracker_distance"
## [13] "logged_activities_distance" "very_active_distance"
## [15] "moderately_active_distance" "light_active_distance"
## [17] "sedentary_active_distance" "very_active_minutes"
## [19] "fairly_active_minutes" "lightly_active_minutes"
## [21] "sedentary_minutes" "calories"
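The column list shows two ID columns (id and id_1) and two date columns (sleep_day and activity_date). The file name suggests they come from joining the sleep and activity tables, but that is an assumption. The sketch below shows one way to check whether such paired columns agree row by row, using a small made-up tibble instead of fitbit_df.

```r
library(dplyr)

# Made-up rows that mimic the paired columns in the joined dataset.
toy_df <- tibble(
  id = c(1, 1, 2),
  id_1 = c(1, 1, 2),
  sleep_day = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
  activity_date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12"))
)

# glimpse() prints every column with its type, one column per line.
glimpse(toy_df)

# TRUE only if every row has matching values in both columns.
ids_match <- all(toy_df$id == toy_df$id_1)
dates_match <- all(toy_df$sleep_day == toy_df$activity_date)

print(c(ids_match = ids_match, dates_match = dates_match))
```

If both checks return TRUE on the real data, either column of each pair can be used interchangeably in the plots that follow.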
Use the ggplot() function with an aes() aesthetic mapping to create a plot that displays how daily total steps change over time.
ggplot(data = fitbit_df) +
  geom_smooth(mapping = aes(x = sleep_day, y = total_steps)) +
  labs(title = "Daily Total Steps Over Time") +
  xlab("Day") +
  ylab("Daily Steps (count)")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
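A smoothed line hides how much individual days vary. As a sketch, the raw observations can be drawn underneath the trend line by adding a geom_point() layer. The data below is randomly generated for illustration and is not taken from the Fitbit dataset.

```r
library(ggplot2)

# Randomly generated stand-in data: 3 hypothetical users over 20 days.
set.seed(1)
toy_df <- data.frame(
  sleep_day = rep(seq(as.Date("2016-04-12"), by = "day", length.out = 20), times = 3),
  total_steps = round(runif(60, min = 2000, max = 15000))
)

# Mapping the aesthetics once in ggplot() lets both layers share them.
steps_plot <- ggplot(data = toy_df, mapping = aes(x = sleep_day, y = total_steps)) +
  geom_point(alpha = 0.4) +                        # one point per user-day
  geom_smooth(method = "loess", formula = y ~ x) + # same smoother as above, set explicitly
  labs(
    title = "Daily Total Steps Over Time",
    x = "Day",
    y = "Daily Steps (count)"
  )

print(steps_plot)
```

Setting method and formula explicitly also suppresses the informational message that geom_smooth() prints when it picks defaults.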
Use the ggplot() function with an aes() aesthetic mapping to create a plot that displays how daily hours of sleep change over time. Convert any time metrics from minutes to hours by dividing by 60 with the division operator (/). Label the plot with the labs(), xlab(), and ylab() functions.
ggplot(data = fitbit_df) +
  geom_smooth(mapping = aes(x = sleep_day, y = total_minutes_asleep/60)) +
  labs(title = "Hours Asleep Daily Over Time") +
  xlab("Day") +
  ylab("Hours Asleep (hours)")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
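The minutes-to-hours conversion is done inline inside aes() above. An alternative sketch is to store the converted values in a new column with mutate(), so later plots and summaries can reuse it. The column name hours_asleep is a hypothetical choice, and the three values are the first total_minutes_asleep entries shown in the head() output.

```r
library(dplyr)

# First three total_minutes_asleep values from the head() output above.
toy_df <- tibble(total_minutes_asleep = c(327, 384, 412))

# Add a derived column in hours; the original minutes column is kept.
toy_df <- toy_df |>
  mutate(hours_asleep = total_minutes_asleep / 60)

print(toy_df)
```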
Use the ggplot() function with an aes() aesthetic mapping to create a scatter plot that displays the relationship between daily time spent sedentary and daily time spent fairly active. Convert both metrics from minutes to hours with the division operator (/), since hours are easier to read than hundreds of minutes. Label the plot with the labs(), xlab(), and ylab() functions.
ggplot(data = fitbit_df) +
  geom_point(mapping = aes(x = sedentary_minutes/60, y = fairly_active_minutes/60)) +
  labs(title = "Time Spent Fairly Active vs Time Spent Sedentary") +
  xlab("Daily Hours Spent Sedentary (hours)") +
  ylab("Daily Hours Spent Fairly Active (hours)")
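A scatter plot shows the shape of a relationship, and a correlation coefficient can put a number on it. The sketch below uses base R's cor() on made-up values, so the result says nothing about the real dataset. Pearson correlation is the default method, and whether it suits this data is left as an assumption.

```r
# Made-up daily values in minutes, for illustration only.
toy_df <- data.frame(
  sedentary_minutes = c(728, 776, 726, 773, 539, 775),
  fairly_active_minutes = c(13, 19, 34, 10, 20, 31)
)

# Pearson correlation between the two metrics, both converted to hours.
# The unit conversion does not change the coefficient; it only keeps
# the code consistent with the plot.
r <- cor(
  toy_df$sedentary_minutes / 60,
  toy_df$fairly_active_minutes / 60,
  use = "complete.obs"  # ignore rows where either value is missing
)

print(r)
```

Values near -1 or 1 indicate a strong linear relationship and values near 0 indicate a weak one.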
Use the summarize function to determine what the mean metrics were for different columns in the dataset. This may be helpful information to touch on during a presentation. Once again, make sure to convert any metrics in minutes to hours.
summarize(
  fitbit_df,
  mean_daily_steps = mean(total_steps),
  mean_hours_asleep = mean(total_minutes_asleep/60),
  mean_hours_sedentary = mean(sedentary_minutes/60),
  mean_hours_sfa = mean(fairly_active_minutes/60)
)
## # A tibble: 1 × 4
## mean_daily_steps mean_hours_asleep mean_hours_sedentary mean_hours_sfa
## <dbl> <dbl> <dbl> <dbl>
## 1 8541. 6.99 11.9 0.301
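The means above came back as numbers, which implies the columns used had no missing values. If a column did contain NA, mean() would return NA unless told otherwise. The sketch below shows the na.rm = TRUE argument on a made-up tibble with one missing sleep value.

```r
library(dplyr)

# Made-up data with one missing sleep record.
toy_df <- tibble(
  total_minutes_asleep = c(300, 420, NA),
  sedentary_minutes = c(600, 720, 780)
)

toy_means <- toy_df |>
  summarize(
    # Without na.rm = TRUE this mean would be NA.
    mean_hours_asleep = mean(total_minutes_asleep / 60, na.rm = TRUE),
    mean_hours_sedentary = mean(sedentary_minutes / 60, na.rm = TRUE)
  )

print(toy_means)
```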
Use the n_distinct() function to count the unique user IDs and store the result in a variable named unique_id_count. This shows whether the analysis was conducted using at least 30 individuals, a common guideline for the minimum sample size from which credible statistical tests can be run. The result below is 24, which falls short of that guideline, so the small sample size is a potential limitation of this analysis.
unique_id_count <- n_distinct(fitbit_df$id)
print(unique_id_count)
## [1] 24
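Beyond the number of users, it can help to see how many daily records each user contributed, since a few heavy users could dominate the averages. The sketch below uses count() on made-up IDs. The column name days_recorded and the threshold variable are hypothetical names introduced for illustration.

```r
library(dplyr)

# Made-up IDs: user 1 has three records, user 2 has two, user 3 has one.
toy_df <- tibble(id = c(1, 1, 1, 2, 2, 3))

# One row per user with the number of records, fewest first.
records_per_user <- toy_df |>
  count(id, name = "days_recorded") |>
  arrange(days_recorded)

print(records_per_user)

# Compare the number of distinct users against the 30-person guideline.
min_sample_size <- 30
n_users <- n_distinct(toy_df$id)
meets_guideline <- n_users >= min_sample_size

print(meets_guideline)
```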