Introduction

This report analyzes data on performance supplements and explores the sport/exercise types the supplements have been tested with. We will count the frequency of each sport/exercise type and visualize the results using bar plots and pie charts. The data come from the “Sports Supplements” data set on Kaggle.

Load Required Libraries

# Install (only needed once per environment) and load the necessary libraries
install.packages(c("ggplot2", "dplyr", "tidyverse", "rmarkdown"))
## Installing packages into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(dplyr)


knitr::opts_chunk$set(echo = TRUE)

Question 1:

A new fitness company wants to advertise a new line of the best performance supplements, so it wants to know which supplements actually improve performance according to the scientific evidence.

Question 2:

Which sports/exercises were the supplements tested with, and which was the most frequent?

# Create the data frame for supplements
supplements_data <- data.frame(
  Supplement = c("Caffeine", "BCAAs", "Beetroot Juice", "Beta Alanine", "Bovine Colostrum", "Creatine", "HMB", "Whey Protein", "HCA", "CLA"),
  Evidence_Level = c(6, 5, 4, 4, 2, 5, 4, 6, 3, 4),
  Claimed_Improvement = c("Intense aerobic performance, endurance performance, mood during exercise, performance when sleep-deprived, skill/agility/speed",
                          "Fatigue resistance, muscle damage + soreness, aerobic performance, endurance, power",
                          "Aerobic efficiency",
                          "High-intensity performance, training tolerance, fatigue, muscular endurance, power",
                          "High-intensity performance, endurance, muscle gain, strength, power, immune defences in athletes",
                          "Strength, power, muscle mass, neuromuscular function, high-intensity performance, muscle damage, recovery",
                          "Strength, muscle building, fat loss, muscle damage, recovery",
                          "Strength, muscle building + recovery, weight + fat loss",
                          "Fat burning, weight loss, endurance",
                          "Body composition, fat burning"),
  Fitness_Category = c("Aerobic-endurance, sports psychology, anaerobic-high-intensity",
                       "Aerobic-endurance, anaerobic-high-intensity, strength–power, recovery–injury prevention",
                       "Aerobic-endurance",
                       "Aerobic-endurance, anaerobic-high-intensity, strength–power, recovery–injury prevention",
                       "Aerobic-endurance, strength–power, fat burning–muscle building",
                       "Strength–power, anaerobic-high-intensity, fat burning–muscle building",
                       "Strength–power, fat burning–muscle building, recovery–injury prevention",
                       "Strength–power, recovery–injury prevention, fat burning–muscle building",
                       "Fat burning–muscle building, aerobic–endurance",
                       "Fat burning–muscle building"),
  Sport_Testing = c("Cycling, rowing, weight training, running, football",
                    "Cycling, circuit training, swimming, weight training, running",
                    "Running, cycling",
                    "Rowing, cycling, running, weight training",
                    "Cycling, running, weight training, swimming, rowing",
                    "Weight training, cycling, rowing, running, circuit training",
                    "Weight training, cycling, running",
                    "Weight training, running",
                    "Cycling, other",
                    "Other"),
  Number_Studies = c(7, 4, 4, 3, 5, 19, 4, 11, 4, 11),
  Citations = c(409, 660, 5, 284, 194, 5090, 327, 320, 54, 374)
)
# Bar plot of the evidence level for each supplement
ggplot(supplements_data, aes(x = Supplement, y = Evidence_Level, fill = Evidence_Level)) +
  geom_bar(stat = "identity") +
  scale_fill_gradient(low = "green", high = "darkgreen") + # Optional: adjust the color range
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
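
Question 1 asks which supplements have the strongest scientific backing. The following is a minimal sketch of one way to read that off the data. It assumes that a higher Evidence_Level means stronger support, which the data itself does not state. The names `evidence` and `top_supplements` are illustrative, and the values are copied from supplements_data.

```r
library(dplyr)

# Supplement names and evidence levels copied from supplements_data above
evidence <- data.frame(
  Supplement = c("Caffeine", "BCAAs", "Beetroot Juice", "Beta Alanine",
                 "Bovine Colostrum", "Creatine", "HMB", "Whey Protein",
                 "HCA", "CLA"),
  Evidence_Level = c(6, 5, 4, 4, 2, 5, 4, 6, 3, 4)
)

# Assumption: a higher Evidence_Level means stronger scientific support.
# Keep every supplement that shares the top level, so ties are not dropped.
top_supplements <- evidence %>%
  filter(Evidence_Level == max(Evidence_Level)) %>%
  arrange(Supplement)

top_supplements
```

With the values above, this keeps Caffeine and Whey Protein, which both have an evidence level of 6.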

# Split the Sport_Testing column to handle multiple sports/exercises per entry
# This assumes the sport types are separated by commas
sport_exercise_split <- strsplit(supplements_data$Sport_Testing, ",\\s*")

# Unlist the split data to create a vector of individual sport types,
# lower-casing each one so that "Cycling" and "cycling" are counted together
sport_exercise_list <- tolower(unlist(sport_exercise_split))

# Count the frequency of each sport/exercise type
sport_exercise_counts <- table(sport_exercise_list)

# Convert to a data frame for easier manipulation and visualization
sport_exercise_df <- as.data.frame(sport_exercise_counts)

# Rename columns for clarity
colnames(sport_exercise_df) <- c("Sport_Exercise_Type", "Count")

# Print the counts for each sport/exercise type
sport_exercise_df
##   Sport_Exercise_Type Count
## 1    circuit training     2
## 2             cycling     8
## 3            football     1
## 4               other     2
## 5              rowing     4
## 6             running     8
## 7            swimming     2
## 8     weight training     7
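
Question 2 asks which sport/exercise type was tested most often. The following is a hedged tidyverse sketch of the same tally that also reports ties instead of a single winner. The names `sports`, `sport_counts`, and `most_frequent` are illustrative, and the strings are copied from the Sport_Testing column of supplements_data.

```r
library(dplyr)
library(tidyr)

# Sport_Testing values copied from supplements_data above
sports <- data.frame(
  Sport_Testing = c(
    "Cycling, rowing, weight training, running, football",
    "Cycling, circuit training, swimming, weight training, running",
    "Running, cycling",
    "Rowing, cycling, running, weight training",
    "Cycling, running, weight training, swimming, rowing",
    "Weight training, cycling, rowing, running, circuit training",
    "Weight training, cycling, running",
    "Weight training, running",
    "Cycling, other",
    "Other"
  )
)

sport_counts <- sports %>%
  separate_rows(Sport_Testing, sep = ",\\s*") %>%           # one row per sport
  mutate(Sport_Exercise_Type = tolower(trimws(Sport_Testing))) %>%  # merge case variants
  count(Sport_Exercise_Type, name = "Count", sort = TRUE)

# Keep every type that shares the highest count, so ties are not hidden
most_frequent <- sport_counts %>%
  filter(Count == max(Count))

most_frequent
```

Filtering on `Count == max(Count)` rather than taking the first row matters here, because cycling and running each appear for 8 of the 10 supplements.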
# Create a data frame with sport/exercise type counts from the tallies above,
# merging capitalization variants such as "cycling" and "Cycling"
sport_exercise_type_counts <- sport_exercise_df %>%
  mutate(Sport_Exercise_Type = str_to_title(as.character(Sport_Exercise_Type))) %>%
  group_by(Sport_Exercise_Type) %>%
  summarise(Count = sum(Count), .groups = "drop")

# Calculate percentage for each category
sport_exercise_type_counts$Percentage <- (sport_exercise_type_counts$Count / sum(sport_exercise_type_counts$Count)) * 100

# Create the pie chart
ggplot(sport_exercise_type_counts, aes(x = "", y = Percentage, fill = Sport_Exercise_Type)) +
  geom_bar(stat = "identity", width = 1) +  # Create bars to represent each slice
  coord_polar(theta = "y") +  # Wrap the stacked bar into a circle
  theme_void() +  # Remove background and axis labels
  labs(title = "Sport/Exercise Types Tested")

# Create the bar plot
ggplot(sport_exercise_type_counts, aes(x = reorder(Sport_Exercise_Type, -Count), y = Count, fill = Sport_Exercise_Type)) +
  geom_bar(stat = "identity") +  # Create bars based on the Count
  theme_minimal() +  # Apply the complete theme first so it does not overwrite the tweak below
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +  # Rotate x-axis labels for better readability
  labs(title = "Frequency of Sport/Exercise Types Tested", x = "Sport/Exercise Type", y = "Count")

Conclusion

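In this analysis, we counted how often each sport/exercise type appears in the supplement testing data. Once capitalization variants such as “Cycling” and “cycling” are merged, cycling and running are tied as the most commonly tested types (8 of the 10 supplements each), followed by weight training (7). We visualized the data using both bar and pie charts to better understand the distribution of sport/exercise types.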
