Exercise Schedule & Meal Plan is a structured dataset (80,000 rows, 5 columns) that maps three basic user inputs (Gender, Goal, and BMI Category) to a recommended Exercise Schedule and Meal Plan. It is suited to recommendation systems, multi-output classification, and rule-based baselines in health and fitness applications. The dataset is available on Kaggle: https://www.kaggle.com/datasets/kavindavimukthi/meal-plan-and-exercise-schedule-gender-goal-bmi
# import libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.1 ✔ stringr 1.5.2
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(stringr)
library(scales)
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:purrr':
##
## discard
##
## The following object is masked from 'package:readr':
##
## col_factor
library(forcats)
url = 'https://raw.githubusercontent.com/mehreengillani/DATA607/refs/heads/main/GYM.csv'
gym_data <- read_csv(url)
## Rows: 80000 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Gender, Goal, BMI Category, Exercise Schedule, Meal Plan
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
cat("Dataset dimensions:", dim(gym_data), "\n")
## Dataset dimensions: 80000 5
cat("Column names:", names(gym_data), "\n")
## Column names: Gender Goal BMI Category Exercise Schedule Meal Plan
cat("\nFirst look at the data:\n")
##
## First look at the data:
gym_data
## # A tibble: 80,000 × 5
## Gender Goal `BMI Category` `Exercise Schedule` `Meal Plan`
## <chr> <chr> <chr> <chr> <chr>
## 1 Female muscle_gain Normal weight Moderate cardio, Strength trai… Balanced d…
## 2 Male fat_burn Underweight Light weightlifting, Yoga, and… High-calor…
## 3 Male muscle_gain Normal weight Moderate cardio, Strength trai… Balanced d…
## 4 Male muscle_gain Overweight High-intensity interval traini… Low-carb, …
## 5 Female muscle_gain Normal weight Moderate cardio, Strength trai… Balanced d…
## 6 Male muscle_gain Underweight Light weightlifting, Yoga, and… High-calor…
## 7 Female fat_burn Overweight High-intensity interval traini… Low-carb, …
## 8 Male muscle_gain Overweight High-intensity interval traini… Low-carb, …
## 9 Female fat_burn Obesity Low-impact cardio, Swimming, a… Low-calori…
## 10 Female muscle_gain Underweight Light weightlifting, Yoga, and… High-calor…
## # ℹ 79,990 more rows
cat("Missing values by column:\n")
## Missing values by column:
colSums(is.na(gym_data))
## Gender Goal BMI Category Exercise Schedule
## 0 0 0 0
## Meal Plan
## 0
cat("\nData types:\n")
##
## Data types:
glimpse(gym_data)
## Rows: 80,000
## Columns: 5
## $ Gender <chr> "Female", "Male", "Male", "Male", "Female", "Male"…
## $ Goal <chr> "muscle_gain", "fat_burn", "muscle_gain", "muscle_…
## $ `BMI Category` <chr> "Normal weight", "Underweight", "Normal weight", "…
## $ `Exercise Schedule` <chr> "Moderate cardio, Strength training, and 5000 step…
## $ `Meal Plan` <chr> "Balanced diet with moderate protein and carbohydr…
clean_data <- gym_data %>%
rename_with(~ tolower(gsub(" ", "_", .x))) %>%
mutate(
gender = factor(gender, levels = c("Female", "Male")),
# Standardize goal values
goal = case_when(
tolower(goal) == "muscle_gain" ~ "Muscle Gain",
tolower(goal) == "fat_burn" ~ "Fat Burn",
TRUE ~ goal
),
goal = factor(goal, levels = c("Muscle Gain", "Fat Burn")),
# Standardize BMI categories
bmi_category = case_when(
tolower(bmi_category) == "normal weight" ~ "Normal",
tolower(bmi_category) == "underweight" ~ "Underweight",
tolower(bmi_category) == "overweight" ~ "Overweight",
tolower(bmi_category) == "obesity" ~ "Obese",
TRUE ~ bmi_category
),
bmi_category = factor(bmi_category,
levels = c("Underweight", "Normal", "Overweight", "Obese")),
# Categorize exercise intensity based on schedule descriptions
exercise_intensity = case_when(
str_detect(tolower(exercise_schedule), "high-intensity|hiit|intense") ~ "High",
str_detect(tolower(exercise_schedule), "moderate|strength training") ~ "Moderate",
str_detect(tolower(exercise_schedule), "light|low-impact|yoga") ~ "Low",
# Caution: "Unknown" is not among the factor levels set below, so unmatched rows become NA
TRUE ~ "Unknown"
),
exercise_intensity = factor(exercise_intensity,
levels = c("Low", "Moderate", "High")),
# Extract step count from exercise schedule
steps = as.numeric(str_extract(exercise_schedule, "\\d+")),
# Categorize meal plan types
meal_plan_type = case_when(
str_detect(tolower(meal_plan), "balanced|moderate") ~ "Balanced",
str_detect(tolower(meal_plan), "high-calorie|protein-rich|whole milk") ~ "High Calorie",
str_detect(tolower(meal_plan), "low-carb|high-fiber") ~ "Low Carb",
str_detect(tolower(meal_plan), "low-calorie|portion control") ~ "Low Calorie",
TRUE ~ "Other"
),
meal_plan_type = factor(meal_plan_type)
)
cat("Cleaned data structure:\n")
## Cleaned data structure:
head(clean_data)
## # A tibble: 6 × 8
## gender goal bmi_category exercise_schedule meal_plan exercise_intensity steps
## <fct> <fct> <fct> <chr> <chr> <fct> <dbl>
## 1 Female Musc… Normal Moderate cardio,… Balanced… Moderate 5000
## 2 Male Fat … Underweight Light weightlift… High-cal… Low 2000
## 3 Male Musc… Normal Moderate cardio,… Balanced… Moderate 5000
## 4 Male Musc… Overweight High-intensity i… Low-carb… High 8000
## 5 Female Musc… Normal Moderate cardio,… Balanced… Moderate 5000
## 6 Male Musc… Underweight Light weightlift… High-cal… Low 2000
## # ℹ 1 more variable: meal_plan_type <fct>
colnames(clean_data)
## [1] "gender" "goal" "bmi_category"
## [4] "exercise_schedule" "meal_plan" "exercise_intensity"
## [7] "steps" "meal_plan_type"
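The `\\d+` pattern used for `steps` returns the first run of digits in the schedule text. That works for the strings shown here, but it would pick up the wrong number if a schedule mentioned another figure first or used a thousands separator. A slightly more defensive sketch, assuming the step count is always followed by the word "steps"; the helper name `extract_steps` and all example strings are made up for illustration:

```r
library(stringr)

# Hypothetical helper: pull the number that directly precedes the word "steps".
# Assumes counts look like "5000 steps" or "5,000 steps"; returns NA otherwise.
extract_steps <- function(schedule) {
  raw <- str_extract(schedule, "\\d[\\d,]*(?=\\s*steps)")
  as.numeric(str_remove_all(raw, ","))
}

# Made-up strings, loosely modelled on the schedule text
examples <- c(
  "Moderate cardio and 5000 steps",
  "3 sets of light weightlifting and 2,000 steps",
  "Yoga only"
)
extract_steps(examples)
```

In the cleaning pipeline this could stand in for the `str_extract(exercise_schedule, "\\d+")` call, if the stricter matching is wanted.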
# Verify categorical variable standardization
cat("Gender distribution:\n")
## Gender distribution:
table(clean_data$gender)
##
## Female Male
## 40680 39320
cat("\nGoal distribution:\n")
##
## Goal distribution:
table(clean_data$goal)
##
## Muscle Gain Fat Burn
## 41020 38980
cat("\nBMI Category distribution:\n")
##
## BMI Category distribution:
table(clean_data$bmi_category)
##
## Underweight Normal Overweight Obese
## 20940 19920 19840 19300
cat("\nExercise Intensity distribution:\n")
##
## Exercise Intensity distribution:
table(clean_data$exercise_intensity)
##
## Low Moderate High
## 40240 19920 19840
cat("\nMeal Plan Type distribution:\n")
##
## Meal Plan Type distribution:
table(clean_data$meal_plan_type)
##
## Balanced High Calorie Low Calorie Low Carb
## 19920 20940 19300 19840
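The counts above line up exactly with the BMI counts: Moderate and Balanced both equal Normal (19,920), High and Low Carb equal Overweight (19,840), High Calorie equals Underweight (20,940), Low Calorie equals Obese (19,300), and Low equals Underweight plus Obese (40,240). This suggests, but does not prove, that both derived columns are fully determined by `bmi_category`. A minimal sketch to check it, using a hypothetical helper `is_determined_by()` and a toy tibble standing in for `clean_data`:

```r
library(dplyr)
library(tibble)

# Hypothetical helper: TRUE when every value of `key` maps to exactly one `value`.
is_determined_by <- function(data, key, value) {
  data %>%
    distinct({{ key }}, {{ value }}) %>%
    count({{ key }}, name = "n_values") %>%
    summarise(ok = all(n_values == 1)) %>%
    pull(ok)
}

# Toy rows for illustration only (not taken from the dataset)
toy <- tibble(
  gender = c("Female", "Male", "Female", "Male"),
  bmi_category = c("Normal", "Normal", "Overweight", "Obese"),
  exercise_intensity = c("Moderate", "Moderate", "High", "Low")
)

is_determined_by(toy, bmi_category, exercise_intensity)
is_determined_by(toy, bmi_category, gender)
```

On the real data the calls would be `is_determined_by(clean_data, bmi_category, exercise_intensity)` and `is_determined_by(clean_data, bmi_category, meal_plan_type)`. If both return `TRUE`, any gender or goal difference in intensity or meal plan is just a difference in BMI mix.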
# Summary 1: Goals by Gender and BMI - Percentage of Total
goal_summary <- clean_data %>%
count(gender, bmi_category, goal) %>%
ungroup() %>% # Remove previous grouping
mutate(percentage_of_total = round(n / sum(n) * 100, 2)) %>% # Percentage of overall total
arrange(gender, bmi_category, desc(n))
cat("Goals by Gender and BMI Category (Percentage of Total):\n")
## Goals by Gender and BMI Category (Percentage of Total):
print(goal_summary, n = Inf)
## # A tibble: 16 × 5
## gender bmi_category goal n percentage_of_total
## <fct> <fct> <fct> <int> <dbl>
## 1 Female Underweight Fat Burn 5480 6.85
## 2 Female Underweight Muscle Gain 5200 6.5
## 3 Female Normal Muscle Gain 5440 6.8
## 4 Female Normal Fat Burn 4800 6
## 5 Female Overweight Muscle Gain 4940 6.18
## 6 Female Overweight Fat Burn 4860 6.08
## 7 Female Obese Muscle Gain 5020 6.28
## 8 Female Obese Fat Burn 4940 6.18
## 9 Male Underweight Muscle Gain 5260 6.58
## 10 Male Underweight Fat Burn 5000 6.25
## 11 Male Normal Muscle Gain 4880 6.1
## 12 Male Normal Fat Burn 4800 6
## 13 Male Overweight Muscle Gain 5460 6.82
## 14 Male Overweight Fat Burn 4580 5.73
## 15 Male Obese Muscle Gain 4820 6.02
## 16 Male Obese Fat Burn 4520 5.65
# Visualization 1: Goals by Gender
ggplot(clean_data, aes(x = gender, fill = goal)) +
geom_bar(position = "dodge", alpha = 0.8) +
geom_text(stat = 'count', aes(label = after_stat(count)),
position = position_dodge(width = 0.9), vjust = -0.5) +
labs(title = "Fitness Goals by Gender",
x = "Gender",
y = "Count",
fill = "Goal") +
theme_minimal() +
scale_fill_brewer(palette = "Set2")
# Visualization 2: Exercise Intensity by Goal
ggplot(clean_data, aes(x = goal, fill = exercise_intensity)) +
geom_bar(position = "dodge", alpha = 0.8) +
labs(title = "Exercise Intensity by Fitness Goal",
x = "Goal",
y = "Count",
fill = "Intensity") +
theme_minimal() +
scale_fill_brewer(palette = "Set2") +
facet_wrap(~gender)
Muscle Gain Approaches: Among muscle-gain rows, high-intensity schedules are somewhat more common than moderate ones for men, while the reverse holds for women
Overall Intensity Preference: Low-intensity schedules (yoga, swimming) make up a slightly larger share of rows for women than for men
Most Common Pattern: Low-intensity exercise is the most common category for both genders
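The dodge plot shows raw counts, so the size of these differences is easier to judge as shares within each gender and goal. A sketch that rebuilds those shares from the `goal_summary` counts printed earlier. The BMI-to-intensity mapping is an assumption inferred from the first rows of the data and the matching category totals, not something the dataset documents:

```r
library(dplyr)
library(tibble)

# Counts copied from the goal_summary table above
goal_counts <- tribble(
  ~gender,  ~bmi_category, ~goal,         ~n,
  "Female", "Underweight", "Fat Burn",    5480,
  "Female", "Underweight", "Muscle Gain", 5200,
  "Female", "Normal",      "Muscle Gain", 5440,
  "Female", "Normal",      "Fat Burn",    4800,
  "Female", "Overweight",  "Muscle Gain", 4940,
  "Female", "Overweight",  "Fat Burn",    4860,
  "Female", "Obese",       "Muscle Gain", 5020,
  "Female", "Obese",       "Fat Burn",    4940,
  "Male",   "Underweight", "Muscle Gain", 5260,
  "Male",   "Underweight", "Fat Burn",    5000,
  "Male",   "Normal",      "Muscle Gain", 4880,
  "Male",   "Normal",      "Fat Burn",    4800,
  "Male",   "Overweight",  "Muscle Gain", 5460,
  "Male",   "Overweight",  "Fat Burn",    4580,
  "Male",   "Obese",       "Muscle Gain", 4820,
  "Male",   "Obese",       "Fat Burn",    4520
)

# Assumed mapping from BMI category to exercise intensity (inferred, not documented)
intensity_by_bmi <- c(Underweight = "Low", Normal = "Moderate",
                      Overweight = "High", Obese = "Low")

intensity_share <- goal_counts %>%
  mutate(exercise_intensity = unname(intensity_by_bmi[bmi_category])) %>%
  count(gender, goal, exercise_intensity, wt = n, name = "n") %>%  # sum counts per intensity
  group_by(gender, goal) %>%
  mutate(share = round(n / sum(n), 3)) %>%                         # share within gender and goal
  ungroup()

intensity_share
```

With `clean_data` in memory, the same shares come directly from `count(clean_data, gender, goal, exercise_intensity)` followed by the same `group_by()` and `mutate()` steps, without relying on the assumed mapping.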
# Load scales package properly
library(scales)
# Calculate percentages first, then plot
bmi_gender_percentage <- clean_data %>%
count(gender, bmi_category) %>%
group_by(gender) %>%
mutate(percentage = n / sum(n)) %>%
ungroup()
# Now create the plot with percentages
ggplot(bmi_gender_percentage, aes(x = gender, y = percentage, fill = bmi_category)) +
geom_col(position = "dodge", alpha = 0.8) +
labs(title = "BMI Category Distribution by Gender",
x = "Gender",
y = "Percentage",
fill = "BMI Category") +
theme_minimal() +
scale_fill_brewer(palette = "Set2") +
scale_y_continuous(labels = percent, limits = c(0, 1)) + # 0% to 100%
geom_text(aes(label = percent(percentage)),
position = position_dodge(width = 0.9),
vjust = -0.5, size = 3)
Underweight: Nearly identical rates (Female: 26.25%, Male: 26.09%)
Normal: Close rates (Female: 25.17%, Male: 24.62%)
Overweight: Comparable percentages (Female: 24%, Male: 25.5%)
Obese: Minimal gender difference (Female: 24.5%, Male: 23.75%)
Overall: Very similar BMI distribution patterns between genders
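One way to put a number on "similar" is a chi-squared test of independence together with an effect size. The sketch below uses the gender by BMI counts obtained by summing the `goal_summary` rows above. With 80,000 rows even small differences can come out statistically significant, so Cramér's V is likely the more informative figure. This is an optional check, not part of the original analysis:

```r
# Gender x BMI counts, summed over goal from the goal_summary table above
bmi_by_gender <- matrix(
  c(10680, 10240,  9800, 9960,
    10260,  9680, 10040, 9340),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    gender = c("Female", "Male"),
    bmi_category = c("Underweight", "Normal", "Overweight", "Obese")
  )
)

# Chi-squared test of independence between gender and BMI category
test <- chisq.test(bmi_by_gender)

# Cramér's V: sqrt(chi-squared / (N * (min(rows, cols) - 1)))
cramers_v <- sqrt(unname(test$statistic) /
                    (sum(bmi_by_gender) * (min(dim(bmi_by_gender)) - 1)))

test$p.value
cramers_v
```

A V close to zero would support the reading that the two BMI distributions are practically the same, whatever the p-value says.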
# Calculate percentages first
meal_gender_goal_percentage <- clean_data %>%
count(gender, goal,meal_plan_type) %>%
group_by(gender) %>%
mutate(percentage = round(n / sum(n), 3)) %>% # share within each gender (both goals combined)
ungroup()
# Now create the plot with percentages
ggplot(meal_gender_goal_percentage, aes(x = meal_plan_type, y = percentage, fill = meal_plan_type)) +
geom_col(position = "dodge", alpha = 0.8) +
labs(title = "Meal Plan Types by Goal and Gender",
x = "Meal Plan Type",
y = "Percentage",
fill = "Meal Plan Type") +
theme_minimal() +
scale_fill_brewer(palette = "Set2") +
scale_y_continuous(labels = percent, limits = c(0, 1)) + # 0% to 100%
facet_grid(goal ~ gender) +
geom_text(aes(label = percent(percentage)),
position = position_dodge(width = 0.9),
vjust = -0.5, size = 3) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
Muscle Gain: Low-carb plans are the most common for men; balanced diets are the most common for women
Fat Burn: High-calorie meal plans are the most common for both genders
Overall: The most common meal plan differs by gender for muscle gain but not for fat burn
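The percentages in the faceted plot are computed within each gender, so the bars in one panel do not sum to 100%. The sketch below contrasts that denominator with a per-panel one (gender and goal) for the female rows. Counts come from the `goal_summary` table above; labelling them by meal plan type relies on an assumed one-to-one mapping from BMI category to meal plan type, inferred from the matching totals:

```r
library(dplyr)
library(tibble)

# Female rows only; counts from goal_summary, meal plan labels via the assumed mapping
female_meals <- tribble(
  ~gender,  ~goal,         ~meal_plan_type, ~n,
  "Female", "Muscle Gain", "High Calorie",  5200,
  "Female", "Muscle Gain", "Balanced",      5440,
  "Female", "Muscle Gain", "Low Carb",      4940,
  "Female", "Muscle Gain", "Low Calorie",   5020,
  "Female", "Fat Burn",    "High Calorie",  5480,
  "Female", "Fat Burn",    "Balanced",      4800,
  "Female", "Fat Burn",    "Low Carb",      4860,
  "Female", "Fat Burn",    "Low Calorie",   4940
)

shares <- female_meals %>%
  group_by(gender) %>%
  mutate(share_within_gender = n / sum(n)) %>%   # denominator used in the plot above
  group_by(gender, goal) %>%
  mutate(share_within_goal = n / sum(n)) %>%     # denominator matching one facet panel
  ungroup()

shares
```

Switching the plot to the per-panel denominator would only require `group_by(gender, goal)` in place of `group_by(gender)` when building `meal_gender_goal_percentage`.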
step_heatmap <- clean_data %>%
group_by(exercise_intensity, goal) %>%
summarise(mean_steps = mean(steps, na.rm = TRUE), .groups = 'drop')
ggplot(step_heatmap, aes(x = exercise_intensity, y = goal, fill = mean_steps)) +
geom_tile(color = "white", linewidth = 1) +
geom_text(aes(label = round(mean_steps)), color = "white", fontface = "bold") +
scale_fill_gradient(low = "lightblue", high = "darkblue",
name = "Average Steps") +
labs(title = "Average Step Count by Exercise Intensity and Goal",
x = "Exercise Intensity",
y = "Goal") +
theme_minimal()
This analysis of the Exercise Schedule & Meal Plan dataset revealed several key insights into fitness patterns across gender, BMI categories, and fitness goals:
Gender-Based Exercise Patterns: Exercise intensity differed slightly between genders. Among men pursuing muscle gain, high-intensity workouts were more common than moderate ones, while among women with the same goal moderate strength training was more common than high-intensity work. Low-intensity activities such as yoga and swimming were the most common category for both genders, with a slightly larger share among women.
BMI Distribution Consistency: The analysis revealed remarkably similar BMI distributions across genders, with nearly identical proportions in underweight (~26%), overweight (~24-25%), and obese (~24%) categories for both males and females.
Goal-Oriented Meal Planning: For muscle gain, low-carb plans were the most common for men and balanced diets for women. For fat burn, high-calorie meal plans were the most common for both genders.
Machine Learning Applications: Develop recommendation systems using multi-output classification to suggest personalized exercise and meal plans
Cluster Analysis: Identify distinct user segments based on gender, BMI, and goal combinations for targeted fitness programs
Predictive Modeling: Build models to predict optimal exercise intensity and meal plans for new users
Temporal Analysis: Incorporate time-series data to track fitness progress and plan effectiveness over time
Nutritional Deep Dive: Expand meal plan analysis with detailed nutritional information (macronutrients, calories)
Exercise Specificity: Categorize exercises by type (cardio, strength, flexibility) and muscle groups targeted
Cultural & Geographic Factors: Investigate how exercise and diet preferences vary across different demographics
Seasonal Patterns: Analyze how fitness recommendations change based on seasonal variations
Age Group Analysis: Extend the dataset to include age demographics for life-stage specific recommendations
Personalized Fitness Apps: Use findings to develop AI-driven fitness coaching applications
Healthcare Integration: Partner with healthcare providers for obesity prevention and weight management programs
Corporate Wellness: Adapt insights for workplace wellness program development
This analysis provides a foundation for building fitness recommendation systems and contributes to data-driven health and wellness work.
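As a starting point for the recommendation and predictive-modeling ideas listed above, a rule-based baseline can simply look up the most frequent exercise schedule and meal plan for each combination of inputs. A minimal sketch, assuming the cleaned column names from this report; `build_rules()` and `recommend()` are hypothetical helpers, and the toy rows use placeholder strings rather than real dataset values:

```r
library(dplyr)
library(tibble)

# Hypothetical helper: keep the most frequent (schedule, plan) pair per input combination
build_rules <- function(data) {
  data %>%
    count(gender, goal, bmi_category, exercise_schedule, meal_plan, name = "freq") %>%
    group_by(gender, goal, bmi_category) %>%
    slice_max(freq, n = 1, with_ties = FALSE) %>%
    ungroup() %>%
    select(-freq)
}

# Hypothetical helper: attach recommendations to new users (NA when no rule matches)
recommend <- function(rules, new_users) {
  left_join(new_users, rules, by = c("gender", "goal", "bmi_category"))
}

# Toy training rows with placeholder outputs
toy <- tribble(
  ~gender,  ~goal,         ~bmi_category, ~exercise_schedule, ~meal_plan,
  "Female", "Muscle Gain", "Normal",      "schedule A",       "plan A",
  "Female", "Muscle Gain", "Normal",      "schedule A",       "plan A",
  "Female", "Muscle Gain", "Normal",      "schedule B",       "plan B",
  "Male",   "Fat Burn",    "Obese",       "schedule C",       "plan C"
)

rules <- build_rules(toy)

new_users <- tribble(
  ~gender,  ~goal,         ~bmi_category,
  "Female", "Muscle Gain", "Normal",
  "Male",   "Muscle Gain", "Underweight"
)

recs <- recommend(rules, new_users)
recs
```

Any learned model would need to beat this lookup to justify its added complexity. On the real data the rules would be built with `build_rules(clean_data)`.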