This report analyzes daily gym attendance and workout data for 2024. The data set contains 2,600 gym visits recorded throughout the year, covering members of various ages, genders, and membership types. It was obtained from Kaggle: https://www.kaggle.com/datasets/zahranusratt/daily-gym-attendance-and-workout-activity-dataset?resource=download. Each row records an individual visit, including the workout type, workout duration, calories burned, and check-in time. The goal of this project is to identify trends in gym usage and better understand member habits!
head(df, 5)
## visit_date age gender membership_type workout_type
## 1 2024-10-11 64 Other Annual HIIT
## 2 2024-06-01 65 Female Quarterly Strength Training
## 3 2024-06-13 45 Male Quarterly Cardio
## 4 2024-02-05 35 Female Monthly CrossFit
## 5 2024-07-13 26 Female Quarterly Yoga
## workout_duration_minutes calories_burned check_in_time attendance_status
## 1 28 171 20:04 Absent
## 2 72 650 19:17 Absent
## 3 70 633 7:24 Absent
## 4 64 362 7:18 Absent
## 5 31 262 11:22 Absent
Pictured above are the first 5 rows of the data set. The data set has 9 variables: visit_date, age, gender, membership_type, workout_type, workout_duration_minutes, calories_burned, check_in_time, and attendance_status. Two additional variables, age_group and hour, are created later in the code.
Below are the visualizations and findings! Click through the tabs to see each.
library(ggplot2)

# Bin ages into groups. cut() builds right-closed intervals by default,
# i.e. (0,18], (18,25], ..., (65,100], so the labels are written to match them
df$age_group <- cut(df$age,
                    breaks = c(0, 18, 25, 35, 45, 55, 65, 100),
                    labels = c("18 and Under", "19-25", "26-35", "36-45",
                               "46-55", "56-65", "66+"))

# Mean calories burned for each age group / workout type combination
avg_cal_age <- aggregate(calories_burned ~ age_group + workout_type, data = df, FUN = mean)
avg_cal_age$calories_burned <- round(avg_cal_age$calories_burned, 0)

# Grouped bar chart with the average value printed above each bar
ggplot(avg_cal_age, aes(x = age_group, y = calories_burned, fill = workout_type)) +
  geom_col(position = "dodge") +
  labs(x = "Age Group", y = "Average Calories Burned", fill = "Workout Type") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_fill_brewer(palette = "Paired") +
  geom_text(aes(label = calories_burned), position = position_dodge(width = 0.9),
            vjust = -0.5, size = 2.5)
Strength training burns the most calories in every age group, peaking in the youngest group (18 and under) at an average of 649 calories. For a given workout type, average calorie burn changes little from one age group to the next, which suggests that workout type matters more than age here. The youngest group also shows the most variation between workout types. Cardio, for example, reaches its lowest average in this group at 424 calories; one plausible explanation is that younger gym-goers are less focused on cardio and more focused on getting stronger, which would also explain the spike for strength training. Yoga consistently burns the fewest calories, which is expected given its comparatively low intensity.
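One detail worth checking in the binning step: `cut()` builds right-closed intervals by default (`right = TRUE`), so with `breaks = c(0, 18, 25, ...)` an 18-year-old lands in the first bin rather than the second. The sketch below uses a handful of illustrative boundary ages (hypothetical values, not taken from the data set) to show which bin each one falls into under both conventions.

```r
# Illustrative boundary ages (hypothetical, not from the gym data set)
ages <- c(17, 18, 19, 25, 26, 65, 66)
breaks <- c(0, 18, 25, 35, 45, 55, 65, 100)

# Default right = TRUE: intervals are (0,18], (18,25], ..., (65,100]
right_closed <- cut(ages, breaks = breaks)

# right = FALSE: intervals are [0,18), [18,25), ..., [65,100)
left_closed <- cut(ages, breaks = breaks, right = FALSE)

# Side-by-side view of where each boundary age ends up
print(data.frame(age = ages, right_closed, left_closed))
```

Either convention is fine as long as the group labels describe the intervals that `cut()` actually produces.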
library(lubridate)

# Extract the hour (0-23) from the "H:MM" check-in time
df$hour <- hour(hm(df$check_in_time))

# Count check-ins per hour
hourly_counts <- aggregate(n ~ hour, data = transform(df, n = 1), FUN = sum)

ggplot(hourly_counts, aes(x = hour, y = n)) +
  geom_line() +
  geom_point(shape = 1, size = 3, color = "red") +
  scale_x_continuous(breaks = 0:23) +
  labs(x = "Hour", y = "Check-In Count") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5))
Gym check-ins peak at 8am and at 4pm (hour 16), which makes sense since most people go to the gym before or after work or school. There is another peak at 12pm that corresponds to the lunch rush, so the three peaks reflect the early-morning rush, the lunch rush, and the post-work/school rush. There is also a noticeable dip in attendance at 3pm (hour 15), which falls between the lunch rush and the post-work/school rush. Overall, this hourly pattern is exactly what one would expect for a gym.
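The hourly tally reduces each check-in string to its hour and counts rows per hour. One subtlety: grouping with `aggregate()` only returns hours that actually occur, so a line plot would silently connect across any hour with zero check-ins. A minimal base-R sketch that keeps all 24 hours, assuming times are stored as 24-hour `H:MM` strings like the `check_in_time` values in the preview above (`count_by_hour` is a name introduced here for illustration):

```r
# Count check-ins per hour of day, keeping hours with zero visits.
# Assumes times are "H:MM" or "HH:MM" strings on a 24-hour clock.
count_by_hour <- function(check_in_time) {
  # Keep the part before the colon and convert it to an integer hour
  hour <- as.integer(sub(":.*$", "", check_in_time))
  # Fixing the factor levels to 0:23 makes table() report empty hours as 0
  counts <- table(factor(hour, levels = 0:23))
  data.frame(hour = 0:23, n = as.integer(counts))
}

# The five check-in times shown in the head(df, 5) preview
hourly <- count_by_hour(c("20:04", "19:17", "7:24", "7:18", "11:22"))
hourly[hourly$n > 0, ]
```

On the full data this would be called as `count_by_hour(df$check_in_time)`; the zero rows then show up as real dips in the line rather than being skipped.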
# One row per visit, with the month and day of the month taken from visit_date
days_df <- df[, "visit_date", drop = FALSE]
days_df$month <- month(ymd(df$visit_date), label = TRUE, abbr = TRUE)
days_df$day <- day(ymd(df$visit_date))

# Count visits for every (month, day) pair
days_df <- aggregate(n ~ month + day, data = transform(days_df, n = 1), FUN = sum)

# Keep months in calendar order (month.abb is base R's "Jan", ..., "Dec")
days_df$month <- factor(days_df$month, levels = month.abb)

# One panel per month, one bar per day of the month
ggplot(days_df, aes(x = day, y = n, fill = month)) +
  geom_col() +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = scales::comma) +
  labs(x = "Day of the Month",
       y = "Visit Count",
       fill = "Month") +
  scale_fill_brewer(palette = "Set3") +
  facet_wrap(~month, ncol = 4, nrow = 3)
Gym visits are evenly distributed across all months and days, which is somewhat surprising. Normally, one would expect much higher traffic in a gym during the first months of the year than in the later months. No month shows a dramatic drop-off, which suggests that members are not cancelling their memberships over the course of the year, although the data set does not track cancellations directly. This graph gave back unexpected results!
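"Evenly distributed" can also be checked more formally than by eye. Months differ in length (2024 is a leap year, so February has 29 days), so a fair baseline expects each month's share of visits to be proportional to its number of days. The sketch below tallies visits per month and builds those day-weighted proportions for base R's `chisq.test()`; `monthly_counts` and `expected_share` are names introduced for illustration, and the five dates are the ones in the preview, used only so the code runs. The test itself is shown as a comment because it is only meaningful on the full column; this is a suggested check, not one the report performed.

```r
# Tally visits per calendar month (1-12), keeping months with no visits
monthly_counts <- function(visit_date) {
  m <- as.integer(format(as.Date(visit_date), "%m"))
  as.integer(table(factor(m, levels = 1:12)))
}

# Days in each month of 2024 (366 in total because February has 29)
days_2024 <- c(31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
expected_share <- days_2024 / sum(days_2024)

# Visit dates from the head(df, 5) preview
counts <- monthly_counts(c("2024-10-11", "2024-06-01", "2024-06-13",
                           "2024-02-05", "2024-07-13"))
counts

# On the full data set, a goodness-of-fit test against the day-weighted baseline:
# chisq.test(monthly_counts(df$visit_date), p = expected_share)
```

A large p-value from that test would be consistent with visits spreading over the year in proportion to the number of days in each month.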
library(plotly)

# Number of visits per workout type
workout_df <- aggregate(n ~ workout_type, data = transform(df, n = 1), FUN = sum)

# Donut chart with the total visit count shown in the center
plot_ly(workout_df, labels = ~workout_type, values = ~n) %>%
  add_pie(hole = 0.6) %>%
  layout(title = "Gym Visits by Workout Type",
         annotations = list(text = paste0("Total Visits:\n",
                                          scales::comma(sum(workout_df$n))),
                            showarrow = FALSE))
Workout types are very evenly distributed across all 2,600 visits. Strength training has the largest share at 21.5%, while yoga has the smallest at 18%. This shows that the gym attracts all kinds of fitness customers, and the steady month-by-month visit counts seen earlier suggest that it also keeps them coming back. The lack of a single dominant category is a good sign for the gym and is consistent with well-balanced programs and facilities.
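The percentages quoted above are each workout type's share of all visits. A short sketch of that calculation with base R's `prop.table()`, plus the arithmetic that turns the quoted shares back into approximate visit counts (21.5% of 2,600 is about 559 visits; 18% is about 468). `workout_share` is a hypothetical helper name, and the five preview rows stand in for the full `df$workout_type` column.

```r
# Share of visits per workout type, as a percentage rounded to one decimal
workout_share <- function(workout_type) {
  round(100 * prop.table(table(workout_type)), 1)
}

# The five workout types in the head(df, 5) preview, one visit each
share <- workout_share(c("HIIT", "Strength Training", "Cardio",
                         "CrossFit", "Yoga"))
share  # each type is 20% of these five rows

# Converting the quoted full-data shares back into approximate visit counts
round(c(strength = 0.215, yoga = 0.18) * 2600)
```

Because the chart rounds percentages to one decimal place, counts recovered this way are approximate.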
# Number of visits for each workout type and gender combination
workout_gender_df <- aggregate(n ~ workout_type + gender,
                               data = transform(df, n = 1), FUN = sum)

# One pie per workout type, arranged on a 2 x 3 grid (the sixth cell stays empty)
plot_ly(textposition = "inside", labels = ~gender, values = ~n) %>%
  add_pie(data = workout_gender_df[workout_gender_df$workout_type == "Cardio", ],
          name = "Cardio", title = "Cardio",
          domain = list(row = 0, column = 0)) %>%
  add_pie(data = workout_gender_df[workout_gender_df$workout_type == "CrossFit", ],
          name = "CrossFit", title = "CrossFit",
          domain = list(row = 0, column = 1)) %>%
  add_pie(data = workout_gender_df[workout_gender_df$workout_type == "HIIT", ],
          name = "HIIT", title = "HIIT",
          domain = list(row = 0, column = 2)) %>%
  add_pie(data = workout_gender_df[workout_gender_df$workout_type == "Strength Training", ],
          name = "Strength Training", title = "Strength Training",
          domain = list(row = 1, column = 0)) %>%
  add_pie(data = workout_gender_df[workout_gender_df$workout_type == "Yoga", ],
          name = "Yoga", title = "Yoga",
          domain = list(row = 1, column = 1)) %>%
  layout(showlegend = TRUE,
         grid = list(rows = 2, columns = 3))
The 5 pie charts show how evenly gender is distributed across all of the workout types. Contrary to common fitness stereotypes, strength training does not skew heavily male and yoga does not skew heavily female. The most balanced workout type of the 5 is CrossFit, which is 34% male, 34% female, and 32% other. All 5 charts are consistent with what we found earlier: the gym appears to be a highly inclusive environment in which no single gender dominates any particular type of workout. This is a noteworthy finding, since the stereotypes would predict that strength training is mostly male and yoga is mostly female.
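The gender percentages in each pie are row-wise proportions of a workout type by gender table, and `prop.table()` with `margin = 1` computes them in one step. In the sketch below, `gender_mix` is a hypothetical helper name, and the five preview rows are used only so the code runs; on the full data it would be called as `gender_mix(df$workout_type, df$gender)`.

```r
# Gender mix within each workout type, in percent; each row sums to 100
gender_mix <- function(workout_type, gender) {
  tab <- table(workout_type = workout_type, gender = gender)
  round(100 * prop.table(tab, margin = 1), 1)
}

# Workout type and gender for the five rows in the head(df, 5) preview
mix <- gender_mix(
  workout_type = c("HIIT", "Strength Training", "Cardio", "CrossFit", "Yoga"),
  gender       = c("Other", "Female", "Male", "Female", "Female")
)
mix
```

Printing the full-data table is a compact way to check the "roughly 31% to 36%" range without reading it off five separate charts.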
The analysis of 2,600 visits to a single gym in 2024 uncovered several meaningful insights about member behavior and preferences. Strength training consistently burns the most calories and yoga the least, making them the most and least intense workouts offered by the gym. Check-ins peak exactly when you would expect: during the morning rush and the afternoon rush, with a smaller peak at lunch, as people fit their gym visits around work and school schedules. There is no significant drop-off in attendance across the months of the year, which suggests that the gym retains its members. Workout type preference is well balanced across all 2,600 visits, with no dominant workout type, which points to well-rounded programming. The gender split within each workout type is nearly equal, with all percentages falling between roughly 31% and 36%. This challenges common fitness stereotypes, as males do not dominate strength training and females do not dominate yoga. Altogether, these findings suggest that this gym is performing very well for all members and all types of workouts!