Exercise Schedule & Meal Plan is a structured dataset (80,000 rows, 5 columns) that maps three basic user inputs (Gender, Goal, and BMI Category) to a recommended Exercise Schedule and Meal Plan. It is suited to recommendation systems, multi-output classification, and rule-based baselines in health and fitness applications. The dataset is available on Kaggle: https://www.kaggle.com/datasets/kavindavimukthi/meal-plan-and-exercise-schedule-gender-goal-bmi

Step 1: Import libraries and read the CSV from GitHub

# import libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.5.2
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(stringr)
library(scales)
## 
## Attaching package: 'scales'
## 
## The following object is masked from 'package:purrr':
## 
##     discard
## 
## The following object is masked from 'package:readr':
## 
##     col_factor
library(forcats)
url <- 'https://raw.githubusercontent.com/mehreengillani/DATA607/refs/heads/main/GYM.csv'
gym_data <- read_csv(url)
## Rows: 80000 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Gender, Goal, BMI Category, Exercise Schedule, Meal Plan
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
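The column-specification message can be avoided by declaring the column types up front. A minimal sketch, assuming a small inline CSV (`csv_text`, a hypothetical two-row stand-in for GYM.csv) so that it runs without network access:

```r
library(readr)

# Hypothetical stand-in for GYM.csv; I() marks the string as literal data
csv_text <- I("Gender,Goal,BMI Category\nFemale,muscle_gain,Normal weight\nMale,fat_burn,Underweight\n")

# Declaring every column as character leaves read_csv() nothing to guess or report
toy_gym <- read_csv(csv_text, col_types = cols(.default = col_character()))

names(toy_gym)
```

The same `col_types` argument should carry over to the real URL, since all five columns are read as character.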

Step 2: Initial data exploration

cat("Dataset dimensions:", dim(gym_data), "\n")
## Dataset dimensions: 80000 5
cat("Column names:", names(gym_data), "\n")
## Column names: Gender Goal BMI Category Exercise Schedule Meal Plan
cat("\nFirst look at the data:\n")
## 
## First look at the data:
gym_data
## # A tibble: 80,000 × 5
##    Gender Goal        `BMI Category` `Exercise Schedule`             `Meal Plan`
##    <chr>  <chr>       <chr>          <chr>                           <chr>      
##  1 Female muscle_gain Normal weight  Moderate cardio, Strength trai… Balanced d…
##  2 Male   fat_burn    Underweight    Light weightlifting, Yoga, and… High-calor…
##  3 Male   muscle_gain Normal weight  Moderate cardio, Strength trai… Balanced d…
##  4 Male   muscle_gain Overweight     High-intensity interval traini… Low-carb, …
##  5 Female muscle_gain Normal weight  Moderate cardio, Strength trai… Balanced d…
##  6 Male   muscle_gain Underweight    Light weightlifting, Yoga, and… High-calor…
##  7 Female fat_burn    Overweight     High-intensity interval traini… Low-carb, …
##  8 Male   muscle_gain Overweight     High-intensity interval traini… Low-carb, …
##  9 Female fat_burn    Obesity        Low-impact cardio, Swimming, a… Low-calori…
## 10 Female muscle_gain Underweight    Light weightlifting, Yoga, and… High-calor…
## # ℹ 79,990 more rows

Step 3: Check for missing values and data types

cat("Missing values by column:\n")
## Missing values by column:
colSums(is.na(gym_data))
##            Gender              Goal      BMI Category Exercise Schedule 
##                 0                 0                 0                 0 
##         Meal Plan 
##                 0
cat("\nData types:\n")
## 
## Data types:
glimpse(gym_data)
## Rows: 80,000
## Columns: 5
## $ Gender              <chr> "Female", "Male", "Male", "Male", "Female", "Male"…
## $ Goal                <chr> "muscle_gain", "fat_burn", "muscle_gain", "muscle_…
## $ `BMI Category`      <chr> "Normal weight", "Underweight", "Normal weight", "…
## $ `Exercise Schedule` <chr> "Moderate cardio, Strength training, and 5000 step…
## $ `Meal Plan`         <chr> "Balanced diet with moderate protein and carbohydr…
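Since every column is text, it can also help to count the distinct values per column before recoding anything. A minimal sketch on a hypothetical four-row sample (`toy` is not the real data):

```r
library(dplyr)

# Hypothetical sample with the same input columns as gym_data
toy <- tibble::tibble(
  Gender         = c("Female", "Male", "Male", "Female"),
  Goal           = c("muscle_gain", "fat_burn", "muscle_gain", "fat_burn"),
  `BMI Category` = c("Normal weight", "Underweight", "Normal weight", "Obesity")
)

# One row holding the number of distinct values in each column
distinct_counts <- toy %>%
  summarise(across(everything(), n_distinct))

distinct_counts
```

Run against `gym_data`, a low distinct count in the two output columns would indicate that they hold a small set of fixed templates rather than free text.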

Step 4: Standardize all categorical variables

clean_data <- gym_data %>%
  rename_with(~ tolower(gsub(" ", "_", .x))) %>%
  mutate(
    gender = factor(gender, levels = c("Female", "Male")),
    
    # Standardize goal values
    goal = case_when(
      tolower(goal) == "muscle_gain" ~ "Muscle Gain",
      tolower(goal) == "fat_burn" ~ "Fat Burn",
      TRUE ~ goal
    ),
    goal = factor(goal, levels = c("Muscle Gain", "Fat Burn")),
    
    # Standardize BMI categories
    bmi_category = case_when(
      tolower(bmi_category) == "normal weight" ~ "Normal",
      tolower(bmi_category) == "underweight" ~ "Underweight",
      tolower(bmi_category) == "overweight" ~ "Overweight", 
      tolower(bmi_category) == "obesity" ~ "Obese",
      TRUE ~ bmi_category
    ),
    bmi_category = factor(bmi_category, 
                         levels = c("Underweight", "Normal", "Overweight", "Obese")),
    
    # Categorize exercise intensity based on schedule descriptions
    exercise_intensity = case_when(
      str_detect(tolower(exercise_schedule), "high-intensity|hiit|intense") ~ "High",
      str_detect(tolower(exercise_schedule), "moderate|strength training") ~ "Moderate",
      str_detect(tolower(exercise_schedule), "light|low-impact|yoga") ~ "Low",
      TRUE ~ "Unknown"
    ),
    # Note: "Unknown" is not among the levels, so any unmatched schedule becomes NA
    exercise_intensity = factor(exercise_intensity, 
                               levels = c("Low", "Moderate", "High")),
    
    # Extract step count from exercise schedule
    steps = as.numeric(str_extract(exercise_schedule, "\\d+")),
    
    # Categorize meal plan types
    meal_plan_type = case_when(
      str_detect(tolower(meal_plan), "balanced|moderate") ~ "Balanced",
      str_detect(tolower(meal_plan), "high-calorie|protein-rich|whole milk") ~ "High Calorie",
      str_detect(tolower(meal_plan), "low-carb|high-fiber") ~ "Low Carb",
      str_detect(tolower(meal_plan), "low-calorie|portion control") ~ "Low Calorie",
      TRUE ~ "Other"
    ),
    meal_plan_type = factor(meal_plan_type)
  )

cat("Cleaned data structure:\n")
## Cleaned data structure:
head(clean_data)
## # A tibble: 6 × 8
##   gender goal  bmi_category exercise_schedule meal_plan exercise_intensity steps
##   <fct>  <fct> <fct>        <chr>             <chr>     <fct>              <dbl>
## 1 Female Musc… Normal       Moderate cardio,… Balanced… Moderate            5000
## 2 Male   Fat … Underweight  Light weightlift… High-cal… Low                 2000
## 3 Male   Musc… Normal       Moderate cardio,… Balanced… Moderate            5000
## 4 Male   Musc… Overweight   High-intensity i… Low-carb… High                8000
## 5 Female Musc… Normal       Moderate cardio,… Balanced… Moderate            5000
## 6 Male   Musc… Underweight  Light weightlift… High-cal… Low                 2000
## # ℹ 1 more variable: meal_plan_type <fct>
colnames(clean_data)
## [1] "gender"             "goal"               "bmi_category"      
## [4] "exercise_schedule"  "meal_plan"          "exercise_intensity"
## [7] "steps"              "meal_plan_type"
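Two details of this kind of feature engineering are easy to miss: values left out of `levels` turn into NA without a warning, and `"\\d+"` matches the first run of digits, which is not necessarily the step count. A minimal sketch of both, using hypothetical schedule strings and assuming the step count is always followed by the word "steps":

```r
library(stringr)

# 1. Values missing from `levels` silently become NA
intensity <- c("Low", "High", "Unknown", "Moderate")
dropped <- factor(intensity, levels = c("Low", "Moderate", "High"))
kept    <- factor(intensity, levels = c("Low", "Moderate", "High", "Unknown"))
sum(is.na(dropped))  # counts the "Unknown" entry that was lost
sum(is.na(kept))

# 2. "\\d+" takes the first number; a lookahead anchors it to "steps"
schedules <- c(
  "Moderate cardio and 5000 steps walking",    # hypothetical text
  "30 minutes of yoga and 2000 steps walking"  # hypothetical text
)
first_number   <- as.numeric(str_extract(schedules, "\\d+"))
steps_anchored <- as.numeric(str_extract(schedules, "\\d+(?=\\s*steps)"))

first_number
steps_anchored
```

On the sample rows shown earlier the first number is the step count, so both patterns would agree there. The anchored form only matters if a schedule mentions another number first.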

Step 4.1: Verify categorical variable standardization

# Verify categorical variable standardization
cat("Gender distribution:\n")
## Gender distribution:
table(clean_data$gender)
## 
## Female   Male 
##  40680  39320
cat("\nGoal distribution:\n") 
## 
## Goal distribution:
table(clean_data$goal)
## 
## Muscle Gain    Fat Burn 
##       41020       38980
cat("\nBMI Category distribution:\n")
## 
## BMI Category distribution:
table(clean_data$bmi_category)
## 
## Underweight      Normal  Overweight       Obese 
##       20940       19920       19840       19300
cat("\nExercise Intensity distribution:\n")
## 
## Exercise Intensity distribution:
table(clean_data$exercise_intensity)
## 
##      Low Moderate     High 
##    40240    19920    19840
cat("\nMeal Plan Type distribution:\n")
## 
## Meal Plan Type distribution:
table(clean_data$meal_plan_type)
## 
##     Balanced High Calorie  Low Calorie     Low Carb 
##        19920        20940        19300        19840
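The meal plan type counts equal the BMI category counts exactly (19,920, 20,940, 19,300, 19,840), which suggests that each BMI category maps to a single plan type. A minimal sketch of how that could be checked, on a hypothetical six-row sample rather than `clean_data`:

```r
library(dplyr)

# Hypothetical sample; in practice this would be clean_data
toy <- tibble::tibble(
  bmi_category   = c("Underweight", "Normal", "Overweight", "Obese", "Normal", "Obese"),
  meal_plan_type = c("High Calorie", "Balanced", "Low Carb", "Low Calorie", "Balanced", "Low Calorie")
)

# Number of different plan types observed within each BMI category
plans_per_bmi <- toy %>%
  distinct(bmi_category, meal_plan_type) %>%
  count(bmi_category, name = "n_plan_types")

# TRUE when every BMI category maps to exactly one plan type
is_deterministic <- all(plans_per_bmi$n_plan_types == 1)
is_deterministic
```

If the check also holds on the full data, any gender or goal pattern in meal plans is a by-product of the BMI mix in each group.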

Step 4.2: Create comprehensive summary tables

# Summary 1: Goals by Gender and BMI - Percentage of Total
goal_summary <- clean_data %>%
  count(gender, bmi_category, goal) %>%  # count() on ungrouped data returns an ungrouped tibble
  mutate(percentage_of_total = round(n / sum(n) * 100, 2)) %>%  # share of all 80,000 rows
  arrange(gender, bmi_category, desc(n))
  
cat("Goals by Gender and BMI Category (Percentage of Total):\n")
## Goals by Gender and BMI Category (Percentage of Total):
print(goal_summary, n = Inf)
## # A tibble: 16 × 5
##    gender bmi_category goal            n percentage_of_total
##    <fct>  <fct>        <fct>       <int>               <dbl>
##  1 Female Underweight  Fat Burn     5480                6.85
##  2 Female Underweight  Muscle Gain  5200                6.5 
##  3 Female Normal       Muscle Gain  5440                6.8 
##  4 Female Normal       Fat Burn     4800                6   
##  5 Female Overweight   Muscle Gain  4940                6.18
##  6 Female Overweight   Fat Burn     4860                6.08
##  7 Female Obese        Muscle Gain  5020                6.28
##  8 Female Obese        Fat Burn     4940                6.18
##  9 Male   Underweight  Muscle Gain  5260                6.58
## 10 Male   Underweight  Fat Burn     5000                6.25
## 11 Male   Normal       Muscle Gain  4880                6.1 
## 12 Male   Normal       Fat Burn     4800                6   
## 13 Male   Overweight   Muscle Gain  5460                6.82
## 14 Male   Overweight   Fat Burn     4580                5.73
## 15 Male   Obese        Muscle Gain  4820                6.02
## 16 Male   Obese        Fat Burn     4520                5.65
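The long table can be easier to compare with the two goals side by side. A minimal sketch, using the first four rows of the output above as a subset (the `muscle_gain_share` column is an illustrative addition, not part of the original analysis):

```r
library(dplyr)
library(tidyr)

# First four rows of the goal summary, copied from the printed table
goal_counts <- tibble::tribble(
  ~gender,  ~bmi_category, ~goal,         ~n,
  "Female", "Underweight", "Fat Burn",    5480,
  "Female", "Underweight", "Muscle Gain", 5200,
  "Female", "Normal",      "Muscle Gain", 5440,
  "Female", "Normal",      "Fat Burn",    4800
)

# One row per gender and BMI category, one column per goal
goal_wide <- goal_counts %>%
  pivot_wider(names_from = goal, values_from = n) %>%
  mutate(muscle_gain_share = round(`Muscle Gain` / (`Muscle Gain` + `Fat Burn`), 3))

goal_wide
```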

Step 5: Visualizations for categorical data analysis

# Visualization 1: Goals by Gender
ggplot(clean_data, aes(x = gender, fill = goal)) +
  geom_bar(position = "dodge", alpha = 0.8) +
  geom_text(stat = 'count', aes(label = after_stat(count)), 
            position = position_dodge(width = 0.9), vjust = -0.5) +
  labs(title = "Fitness Goals by Gender",
       x = "Gender", 
       y = "Count",
       fill = "Goal") +
  theme_minimal() +
  scale_fill_brewer(palette = "Set2")

Step 5.1: Exercise Intensity by Goal

# Visualization 2: Exercise Intensity by Goal
ggplot(clean_data, aes(x = goal, fill = exercise_intensity)) +
  geom_bar(position = "dodge", alpha = 0.8) +
  labs(title = "Exercise Intensity by Fitness Goal",
       x = "Goal", 
       y = "Count",
       fill = "Intensity") +
  theme_minimal() +
  scale_fill_brewer(palette = "Set2") +
  facet_wrap(~gender)


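Each exercise schedule appears to be tied to a BMI category: the intensity counts in Step 4.1 (Moderate 19,920; High 19,840; Low 40,240) equal the Normal, Overweight, and Underweight plus Obese counts. The patterns here therefore reflect the BMI mix within each group rather than individual preferences.
Muscle Gain: Low intensity is the most common recommendation for both genders (Female: 10,220 of 20,600; Male: 10,080 of 20,420). Among the remaining rows, men are more often assigned high intensity (5,460 vs. 4,880 moderate) and women moderate intensity (5,440 vs. 4,940 high)
Overall Intensity: Low-intensity schedules (yoga, swimming) make up a slightly larger share for women (50.7%) than for men (49.8%)
Most Common Pattern: Low-intensity exercise predominates across both genders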

Step 5.2: BMI Category Distribution by Gender

# Load scales package properly
library(scales)

# Calculate percentages first, then plot
bmi_gender_percentage <- clean_data %>%
  count(gender, bmi_category) %>%
  group_by(gender) %>%
  mutate(percentage = n / sum(n)) %>%
  ungroup()

# Now create the plot with percentages
ggplot(bmi_gender_percentage, aes(x = gender, y = percentage, fill = bmi_category)) +
  geom_col(position = "dodge", alpha = 0.8) +
  labs(title = "BMI Category Distribution by Gender",
       x = "Gender", 
       y = "Percentage",
       fill = "BMI Category") +
  theme_minimal() +
  scale_fill_brewer(palette = "Set2") +
  scale_y_continuous(labels = percent, limits = c(0, 1)) +  # 0% to 100%
  geom_text(aes(label = percent(percentage)), 
            position = position_dodge(width = 0.9), 
            vjust = -0.5, size = 3)


Underweight: Nearly identical rates (Female: 26.25%, Male: 26.09%)
Normal: Close rates (Female: 25.17%, Male: 24.62%)
Overweight: Comparable percentages (Female: 24.09%, Male: 25.53%)
Obese: Minimal gender difference (Female: 24.48%, Male: 23.75%)
Overall: BMI distributions are very similar between genders; no category differs by more than about 1.5 percentage points
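Whether the gender gap in BMI categories is more than noise can be checked with a chi-squared test of independence. A minimal sketch in base R, using counts obtained by summing the two goals for each gender and BMI category in the Step 4.2 table. With 80,000 rows even small gaps can reach statistical significance, so the sketch also computes Cramér's V as an effect size (an addition here, not part of the original analysis):

```r
# Counts per gender and BMI category, summed over both goals from Step 4.2
bmi_by_gender <- matrix(
  c(10680, 10240,  9800, 9960,
    10260,  9680, 10040, 9340),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    gender       = c("Female", "Male"),
    bmi_category = c("Underweight", "Normal", "Overweight", "Obese")
  )
)

# Row percentages: share of each BMI category within each gender
row_shares <- round(prop.table(bmi_by_gender, margin = 1) * 100, 2)

# Test of independence between gender and BMI category
test <- chisq.test(bmi_by_gender)

# Cramér's V; for a 2 x 4 table this reduces to sqrt(chi-squared / n)
cramers_v <- sqrt(unname(test$statistic) / sum(bmi_by_gender))

row_shares
test
cramers_v
```

A small Cramér's V alongside a small p-value would mean the difference is detectable but practically negligible.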

Step 5.3: Meal Plan Types by Goal and Gender

# Calculate percentages first
meal_gender_goal_percentage <- clean_data %>%
  count(gender, goal, meal_plan_type) %>%
  group_by(gender) %>%
  mutate(percentage = round(n / sum(n), 3)) %>%  # share of each gender's rows, as a proportion
  ungroup()

# Now create the plot with percentages
ggplot(meal_gender_goal_percentage, aes(x = meal_plan_type, y = percentage, fill = meal_plan_type)) +
  geom_col(position = "dodge", alpha = 0.8) +
  labs(title = "Meal Plan Types by Goal and Gender",
       x = "Meal Plan Type",
       y = "Percentage",
       fill = "Meal Plan Type") +
  theme_minimal() +
  scale_fill_brewer(palette = "Set2") +
  scale_y_continuous(labels = percent, limits = c(0, 1)) +  # 0% to 100%
  facet_grid(goal ~ gender) +
  geom_text(aes(label = percent(percentage)), 
            position = position_dodge(width = 0.9), 
            vjust = -0.5, size = 3) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))


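Meal plan types appear to follow BMI category one to one (the type counts in Step 4.1 equal the BMI category counts), so these patterns reflect the BMI mix of each group rather than dietary choices.
Muscle Gain: Low-carb is the most common plan type for men (5,460 of 20,420); balanced is the most common for women (5,440 of 20,600)
Fat Burn: High-calorie is the most common plan type for both genders (Female: 5,480 of 20,080; Male: 5,000 of 18,900)
Overall: The most common plan type differs by gender for muscle gain but not for fat burn, and in every group the leading type accounts for only about 26% to 27% of rows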
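How dominant the leading meal plan type is within each gender and goal can be computed directly. A minimal sketch, assuming the one-to-one mapping suggested by the matching counts in Step 4.1 (Underweight to High Calorie, Normal to Balanced, Overweight to Low Carb, Obese to Low Calorie), so that the Step 4.2 counts can be relabeled by plan type:

```r
library(dplyr)

# Step 4.2 counts relabeled by meal plan type (mapping is an assumption)
meal_counts <- tibble::tribble(
  ~gender,  ~goal,         ~meal_plan_type, ~n,
  "Female", "Muscle Gain", "High Calorie",  5200,
  "Female", "Muscle Gain", "Balanced",      5440,
  "Female", "Muscle Gain", "Low Carb",      4940,
  "Female", "Muscle Gain", "Low Calorie",   5020,
  "Female", "Fat Burn",    "High Calorie",  5480,
  "Female", "Fat Burn",    "Balanced",      4800,
  "Female", "Fat Burn",    "Low Carb",      4860,
  "Female", "Fat Burn",    "Low Calorie",   4940,
  "Male",   "Muscle Gain", "High Calorie",  5260,
  "Male",   "Muscle Gain", "Balanced",      4880,
  "Male",   "Muscle Gain", "Low Carb",      5460,
  "Male",   "Muscle Gain", "Low Calorie",   4820,
  "Male",   "Fat Burn",    "High Calorie",  5000,
  "Male",   "Fat Burn",    "Balanced",      4800,
  "Male",   "Fat Burn",    "Low Carb",      4580,
  "Male",   "Fat Burn",    "Low Calorie",   4520
)

# Share of each plan type within its gender-goal group, keeping the largest
top_plan <- meal_counts %>%
  group_by(gender, goal) %>%
  mutate(share = n / sum(n)) %>%
  slice_max(share, n = 1, with_ties = FALSE) %>%
  ungroup()

top_plan
```

Grouping by both `gender` and `goal` gives shares that sum to 100% within each facet panel, which is easier to read than shares of each gender's total.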

Step 5.4: Step Counts by Goal and Intensity

step_heatmap <- clean_data %>%
  group_by(exercise_intensity, goal) %>%
  summarise(mean_steps = mean(steps, na.rm = TRUE), .groups = 'drop')

ggplot(step_heatmap, aes(x = exercise_intensity, y = goal, fill = mean_steps)) +
  geom_tile(color = "white", linewidth = 1) +
  geom_text(aes(label = round(mean_steps)), color = "white", fontface = "bold") +
  scale_fill_gradient(low = "lightblue", high = "darkblue", 
                      name = "Average Steps") +
  labs(title = "Average Step Count by Exercise Intensity and Goal",
       x = "Exercise Intensity", 
       y = "Goal") +
  theme_minimal()

Summary

This analysis of the Exercise Schedule & Meal Plan dataset revealed several key insights into fitness patterns across gender, BMI categories, and fitness goals:

Key Findings:

Gender-Based Exercise Patterns: Low-intensity schedules are the most common recommendation for both genders. Among muscle-gain rows, men are more often assigned high-intensity workouts and women moderate strength training, but the differences are small and track the BMI mix of each group, since schedules appear to be assigned by BMI category. Overall, low-intensity activities such as yoga and swimming make up a slightly larger share for women (50.7%) than for men (49.8%).

BMI Distribution Consistency: BMI distributions are very similar across genders, with close proportions in the underweight (~26%), normal (~25%), overweight (~24-26%), and obese (~24%) categories for both males and females.

Goal-Oriented Meal Planning: For muscle gain, low-carb is the most common plan type for men and balanced is the most common for women. For fat burn, high-calorie is the most common plan type for both genders. In each case the leading type accounts for only about 26-27% of rows.

Methodological Strengths:

  • Standardized and cleaned categorical variables from raw text fields
  • Derived exercise intensity and step counts from the schedule descriptions with pattern matching
  • Created visualizations showing counts and proportional relationships
  • Kept all transformations in a single, reproducible pipeline

Future Work

1. Advanced Analytics

Machine Learning Applications: Develop recommendation systems using multi-output classification to suggest personalized exercise and meal plans
Cluster Analysis: Identify distinct user segments based on gender, BMI, and goal combinations for targeted fitness programs
Predictive Modeling: Build models to predict optimal exercise intensity and meal plans for new users
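Before training a classifier, a rule-based lookup gives a baseline to beat. A minimal sketch, assuming a hypothetical five-row training sample (`train`) with the cleaned column names. It keeps the most frequent output pair for each input combination and joins it onto new users:

```r
library(dplyr)

# Hypothetical training sample; in practice this would be clean_data
train <- tibble::tribble(
  ~gender,  ~goal,         ~bmi_category, ~exercise_intensity, ~meal_plan_type,
  "Female", "Muscle Gain", "Normal",      "Moderate",          "Balanced",
  "Male",   "Fat Burn",    "Underweight", "Low",               "High Calorie",
  "Male",   "Muscle Gain", "Overweight",  "High",              "Low Carb",
  "Female", "Muscle Gain", "Normal",      "Moderate",          "Balanced",
  "Female", "Fat Burn",    "Obese",       "Low",               "Low Calorie"
)

# Most frequent (intensity, meal plan) pair for each input combination
rules <- train %>%
  count(gender, goal, bmi_category, exercise_intensity, meal_plan_type, name = "freq") %>%
  group_by(gender, goal, bmi_category) %>%
  slice_max(freq, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(-freq)

# Hypothetical new users; the second combination is absent from train
new_users <- tibble::tribble(
  ~gender,  ~goal,         ~bmi_category,
  "Male",   "Muscle Gain", "Overweight",
  "Female", "Fat Burn",    "Normal"
)

# Unseen combinations come back as NA rather than a guessed recommendation
recommendations <- new_users %>%
  left_join(rules, by = c("gender", "goal", "bmi_category"))

recommendations
```

If the outputs are fully determined by the inputs, this lookup would already reproduce the dataset exactly, and a learned model would have nothing further to gain on it.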

2. Data Enhancement

Temporal Analysis: Incorporate time-series data to track fitness progress and plan effectiveness over time
Nutritional Deep Dive: Expand meal plan analysis with detailed nutritional information (macronutrients, calories)
Exercise Specificity: Categorize exercises by type (cardio, strength, flexibility) and muscle groups targeted

3. Expanded Research Questions

Cultural & Geographic Factors: Investigate how exercise and diet preferences vary across different demographics
Seasonal Patterns: Analyze how fitness recommendations change based on seasonal variations
Age Group Analysis: Extend the dataset to include age demographics for life-stage specific recommendations

4. Practical Applications

Personalized Fitness Apps: Use findings to develop AI-driven fitness coaching applications
Healthcare Integration: Partner with healthcare providers for obesity prevention and weight management programs
Corporate Wellness: Adapt insights for workplace wellness program development

This analysis provides a foundation for building fitness recommendation systems and contributes to the growing field of data-driven health and wellness optimization.