title: "GymGraphs"
output: html_document
date: "2024-09-03"
data <- read.csv("megaGymDataset.csv")
library("ggplot2")
## Warning: package 'ggplot2' was built under R version 4.3.3
str(data)
## 'data.frame': 2918 obs. of 9 variables:
## $ X : int 0 1 2 3 4 5 6 7 8 9 ...
## $ Title : chr "Partner plank band row" "Banded crunch isometric hold" "FYR Banded Plank Jack" "Banded crunch" ...
## $ Desc : chr "The partner plank band row is an abdominal exercise where two partners perform single-arm planks while pulling "| __truncated__ "The banded crunch isometric hold is an exercise targeting the abdominal muscles, particularly the rectus abdomi"| __truncated__ "The banded plank jack is a variation on the plank that involves moving the legs in and out for repetitions. Hav"| __truncated__ "The banded crunch is an exercise targeting the abdominal muscles, particularly the rectus abdominis or \"six-pa"| __truncated__ ...
## $ Type : chr "Strength" "Strength" "Strength" "Strength" ...
## $ BodyPart : chr "Abdominals" "Abdominals" "Abdominals" "Abdominals" ...
## $ Equipment : chr "Bands" "Bands" "Bands" "Bands" ...
## $ Level : chr "Intermediate" "Intermediate" "Intermediate" "Intermediate" ...
## $ Rating : num 0 NA NA NA NA NA NA NA 8.9 8.9 ...
## $ RatingDesc: chr "" "" "" "" ...
ggplot(data, aes(x = Rating)) +
  geom_histogram(binwidth = 0.5, fill = "blue", color = "black", na.rm = TRUE) +
  labs(title = "Histogram of Ratings", x = "Rating", y = "Frequency")
This histogram shows the distribution of Rating values across all exercises in the dataset. The x-axis is divided into rating bins 0.5 wide, and the y-axis gives the number of exercises that fall into each bin, so the shape shows whether ratings cluster at the low or the high end of the scale. Exercises with a missing rating are left out of the plot.
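Since rows without a rating never reach the histogram, it can help to count them first. The following is a minimal sketch of such a check; the `toy` data frame and its values are made up for illustration and are not taken from the gym dataset.

```r
# Hypothetical toy data standing in for the real dataset (values are made up)
toy <- data.frame(Rating = c(0, NA, NA, 8.9, 8.9, 7.5, NA, 0))

n_missing <- sum(is.na(toy$Rating))              # ratings that are NA
n_zero    <- sum(toy$Rating == 0, na.rm = TRUE)  # ratings recorded as exactly 0
n_plotted <- sum(!is.na(toy$Rating))             # rows a histogram could use

c(missing = n_missing, zero = n_zero, plotted = n_plotted)
```

Whether a rating of 0 means "rated zero" or "not rated" is not stated in the data, so it may be worth checking before reading much into the left edge of the histogram.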
# stat_summary() with fun = mean draws one bar per type at the mean rating;
# geom_bar(stat = "identity") would instead stack (sum) every rating in the type
ggplot(data, aes(x = Type, y = Rating, fill = Type)) +
  stat_summary(geom = "bar", fun = mean, na.rm = TRUE) +
  labs(title = "Average Rating by Exercise Type", x = "Exercise Type", y = "Average Rating") +
  theme_minimal()
This bar plot shows the average rating for each type of exercise. The x-axis lists the exercise types (e.g., Strength, Strongman, Powerlifting), and the y-axis gives the mean rating of the exercises of that type, excluding missing ratings. Comparing bar heights shows which types of exercise are rated more favorably on average.
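The per-type averages behind a plot like this can also be computed directly as a table, which makes the exact values easy to check. This is a sketch on made-up data; the `toy` data frame and the type names other than Strength are hypothetical.

```r
# Hypothetical toy data (values and some type names are made up)
toy <- data.frame(
  Type   = c("Strength", "Strength", "Cardio", "Cardio", "Stretching"),
  Rating = c(8, 6, NA, 5, 7)
)

# Mean rating per type; the formula interface of aggregate() drops rows
# with a missing Rating by default (na.action = na.omit)
avg_by_type <- aggregate(Rating ~ Type, data = toy, FUN = mean)
avg_by_type
```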
ggplot(data, aes(x = Level, y = Rating)) +
  geom_point(color = "purple", size = 3, na.rm = TRUE) +
  labs(title = "Difficulty Level vs Rating", x = "Level", y = "Rating") +
  theme_minimal()
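The scatter plot shows the relationship between exercise difficulty and ratings. Because Level is categorical, the points fall in one vertical strip per level, and exercises with identical ratings are drawn on top of each other. The plot helps identify whether more difficult exercises tend to receive higher ratings or whether difficulty level has little impact on ratings.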
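Because overlapping points can hide how many exercises sit at each level, a small numeric summary per level is a useful companion to the plot. A minimal sketch, assuming made-up data and level names (`toy`, "Beginner", and "Expert" are illustrative, not confirmed by the output above):

```r
# Hypothetical toy data (values and level names are assumptions)
toy <- data.frame(
  Level  = c("Beginner", "Intermediate", "Expert",
             "Beginner", "Intermediate", "Expert"),
  Rating = c(6, 8, NA, 4, 9, 7)
)

# Mean rating per level, ignoring missing ratings
mean_by_level <- tapply(toy$Rating, toy$Level, mean, na.rm = TRUE)

# Number of exercises per level that actually have a rating
n_by_level <- tapply(!is.na(toy$Rating), toy$Level, sum)

rbind(mean = mean_by_level, n = n_by_level)
```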
mean_rating <- mean(data$Rating, na.rm = TRUE)
sd_rating <- sd(data$Rating, na.rm = TRUE)
mean_rating
## [1] 5.91969
sd_rating
## [1] 3.584607
The mean rating is about 5.92 and the standard deviation is about 3.58. The standard deviation measures how far ratings typically fall from the mean, so a value this large relative to the mean indicates that ratings vary widely across exercises.
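To make the two statistics concrete, the sketch below computes them by hand on a short made-up vector and compares the result with `mean()` and `sd()`. The vector `x` is hypothetical and is not drawn from the dataset.

```r
# Hypothetical ratings, including one missing value (made up for illustration)
x <- c(0, 8.9, NA, 7.5, 9.1)

x_obs <- x[!is.na(x)]   # same effect as na.rm = TRUE
n     <- length(x_obs)

m <- sum(x_obs) / n                        # arithmetic mean
s <- sqrt(sum((x_obs - m)^2) / (n - 1))    # sample standard deviation (n - 1 denominator)

# Both comparisons should be TRUE
all.equal(m, mean(x, na.rm = TRUE))
all.equal(s, sd(x, na.rm = TRUE))
```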
data$Level <- as.numeric(as.factor(data$Level))
correlation <- cor(data$Rating, data$Level, use = "complete.obs")
correlation
## [1] 0.1947474
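The correlation between the numeric level code and rating is about 0.195, indicating a weak positive relationship. Note that as.factor() orders levels alphabetically rather than by difficulty, so the integer codes do not necessarily increase with difficulty. The coefficient can be read as "harder exercises receive slightly higher ratings" only if the codes are first put in difficulty order; either way, the effect is not strong.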
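The effect of the level coding can be seen on a small example. The sketch below assumes the difficulty order Beginner < Intermediate < Expert, which is an assumption about the data rather than something shown in the output above, and uses a made-up `toy` data frame. It also uses Spearman's rank correlation for the ordered codes, a swapped-in choice that is often preferred for ordinal variables.

```r
# Hypothetical toy data (values and the level names are assumptions)
toy <- data.frame(
  Level  = c("Beginner", "Expert", "Intermediate",
             "Beginner", "Expert", "Intermediate"),
  Rating = c(4, 9, 6, 5, NA, 7)
)

# Default coding: levels sorted alphabetically (Beginner, Expert, Intermediate)
alphabetical <- as.numeric(as.factor(toy$Level))

# Explicit coding: levels listed in the assumed difficulty order
difficulty_order <- c("Beginner", "Intermediate", "Expert")
ordered_codes    <- as.numeric(factor(toy$Level, levels = difficulty_order))

# Pearson correlation on the alphabetical codes, as in the original analysis
cor_alpha <- cor(toy$Rating, alphabetical, use = "complete.obs")

# Spearman rank correlation on the difficulty-ordered codes
cor_ordered <- cor(toy$Rating, ordered_codes,
                   use = "complete.obs", method = "spearman")

c(alphabetical = cor_alpha, ordered = cor_ordered)
```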
group1 <- data[data$Equipment == "Bands", ]
group2 <- data[data$Equipment == "Barbell", ]
t_test_result <- t.test(na.omit(group1$Rating), na.omit(group2$Rating))
t_test_result
##
## Welch Two Sample t-test
##
## data: na.omit(group1$Rating) and na.omit(group2$Rating)
## t = -2.8759, df = 38.846, p-value = 0.006508
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.5837953 -0.6239825
## sample estimates:
## mean of x mean of y
## 4.333333 6.437222
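The Welch two-sample t-test compares ratings for exercises that use Bands with those that use a Barbell. The mean rating is about 4.33 for Bands and 6.44 for Barbell exercises, and the 95% confidence interval for the difference (Bands minus Barbell) runs from about -3.58 to -0.62. With a p-value of about 0.0065, the difference is statistically significant at the 5% level: a gap this large would be unlikely if the two groups had the same true mean rating.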
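For readers who want to see where the t statistic and the fractional degrees of freedom come from, the sketch below recomputes Welch's test by hand and checks it against `t.test()`. The two rating vectors are made up for illustration and do not reproduce the numbers reported above.

```r
# Hypothetical ratings for two equipment groups (made-up values)
bands   <- c(0, 5.5, 7.0, NA, 4.8)
barbell <- c(8.1, 6.4, 9.0, 7.2, NA, 5.9)

x <- bands[!is.na(bands)]
y <- barbell[!is.na(barbell)]

# Squared standard error of each group mean
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)

# Welch t statistic: difference in means over the combined standard error
t_stat <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

# Reference result from the built-in test (Welch is the default)
ref <- t.test(x, y)
c(t = t_stat, df = df_welch, p = p_value)
```

The degrees of freedom are generally not a whole number, which is why the output above reports df = 38.846.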