title: “GymGraphs”
output: html_document
date: “2024-09-03”

Loading the dataset

data <- read.csv("megaGymDataset.csv")
library("ggplot2")

## Warning: package 'ggplot2' was built under R version 4.3.3

str(data)

## 'data.frame':    2918 obs. of  9 variables:
##  $ X         : int  0 1 2 3 4 5 6 7 8 9 ...
##  $ Title     : chr  "Partner plank band row" "Banded crunch isometric hold" "FYR Banded Plank Jack" "Banded crunch" ...
##  $ Desc      : chr  "The partner plank band row is an abdominal exercise where two partners perform single-arm planks while pulling "| __truncated__ "The banded crunch isometric hold is an exercise targeting the abdominal muscles, particularly the rectus abdomi"| __truncated__ "The banded plank jack is a variation on the plank that involves moving the legs in and out for repetitions. Hav"| __truncated__ "The banded crunch is an exercise targeting the abdominal muscles, particularly the rectus abdominis or \"six-pa"| __truncated__ ...
##  $ Type      : chr  "Strength" "Strength" "Strength" "Strength" ...
##  $ BodyPart  : chr  "Abdominals" "Abdominals" "Abdominals" "Abdominals" ...
##  $ Equipment : chr  "Bands" "Bands" "Bands" "Bands" ...
##  $ Level     : chr  "Intermediate" "Intermediate" "Intermediate" "Intermediate" ...
##  $ Rating    : num  0 NA NA NA NA NA NA NA 8.9 8.9 ...
##  $ RatingDesc: chr  "" "" "" "" ...

Creating histogram of the ‘Rating’ column

ggplot(data, aes(x = Rating)) + 
  geom_histogram(binwidth = 0.5, fill = "blue", color = "black", na.rm = TRUE) + 
  labs(title = "Histogram of Ratings", x = "Rating", y = "Frequency")

This histogram displays the distribution of Rating values across all exercises in my dataset. The X-Axis represents the rating range, and the Y-Axis shows the frequency, indicating how many exercises fall into each rating bin. It shows whether most exercises have lower or higher ratings.
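The counts behind a histogram like this can also be inspected numerically. The following is a minimal sketch on made-up ratings (the `ratings` vector is hypothetical, not taken from the dataset). It assumes bin edges at multiples of 0.5 between 0 and 10; the edges ggplot2 picks for `binwidth = 0.5` may differ, so the counts per bin need not match the plot exactly.

```r
# Toy ratings standing in for data$Rating (hypothetical values, not from the dataset)
ratings <- c(0, NA, 8.9, 8.9, 4.2, 9.1, NA, 0, 7.5, 6.0)

# Number of missing ratings, which the histogram drops silently via na.rm = TRUE
n_missing <- sum(is.na(ratings))

# 0.5-wide bins written as explicit breaks (assumed edges, see the note above)
breaks <- seq(0, 10, by = 0.5)
bin_counts <- table(cut(ratings, breaks = breaks, include.lowest = TRUE))

n_missing
# Show only the bins that contain at least one rating
bin_counts[bin_counts > 0]
```

Reporting the number of missing ratings alongside the plot makes it clear how many exercises the histogram actually covers.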

Creating bar plot of ‘Type’ vs ‘Rating’

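# stat = "summary" with fun = "mean" draws one bar per type at its mean rating;
# stat = "identity" would stack every row, so bar height would be the sum of ratings
ggplot(data, aes(x = Type, y = Rating, fill = Type)) +
  geom_bar(stat = "summary", fun = "mean", na.rm = TRUE) +
  labs(title = "Average Rating by Exercise Type", x = "Exercise Type", y = "Average Rating") +
  theme_minimal()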

This bar plot is intended to visualize the average rating for each type of exercise. The X-Axis (Exercise Type) shows the different exercise types (e.g., Strength, Strongman, Powerlifting), and the Y-Axis (Average Rating) shows the average rating for each type. It reveals which types of exercises receive higher ratings on average, indicating which types are generally more favorably rated.
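The per-type averages can also be computed directly, which gives a numeric check on what the bars are supposed to show. This is a small sketch on a made-up data frame (`toy` and its values are hypothetical); the column names mirror `Type` and `Rating` from the dataset.

```r
# Toy data standing in for the Type and Rating columns (hypothetical values)
toy <- data.frame(
  Type   = c("Strength", "Strength", "Cardio", "Cardio", "Stretching"),
  Rating = c(8, 6, NA, 4, 7)
)

# The formula interface of aggregate() drops rows with NA by default
# (na.action = na.omit), so each mean uses only the rated exercises
avg_by_type <- aggregate(Rating ~ Type, data = toy, FUN = mean)
avg_by_type
```

If the bar heights in the plot are far larger than these means, the bars are most likely showing sums rather than averages.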

Creating scatter plot of ‘Difficulty Level’ vs ‘Rating’

ggplot(data, aes(x = Level, y = Rating)) +
  geom_point(color = "purple", size = 3, na.rm = TRUE) +
  labs(title = "Difficulty Level vs Rating", x = "Level", y = "Rating") +
  theme_minimal()

The scatter plot shows the relationship between exercise difficulty and ratings. It helps identify if more difficult exercises tend to receive higher ratings or if difficulty level has little impact on ratings.
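Because `Level` is categorical, many points can land on exactly the same position and hide each other. One possible variation, sketched here on made-up data, spreads the points horizontally with `geom_jitter()`. The data frame `toy` and the level names "Beginner" and "Expert" are assumptions for illustration; only "Intermediate" appears in the output shown earlier.

```r
library(ggplot2)

# Toy data standing in for the Level and Rating columns (hypothetical values)
toy <- data.frame(
  Level  = c("Beginner", "Beginner", "Intermediate", "Intermediate", "Expert"),
  Rating = c(8.9, 8.9, 7.5, 7.5, 9.1)
)

# Jitter horizontally only (height = 0), so every rating keeps its true y-value;
# alpha makes overlapping points visible as darker spots
p <- ggplot(toy, aes(x = Level, y = Rating)) +
  geom_jitter(width = 0.2, height = 0, alpha = 0.6, color = "purple", na.rm = TRUE) +
  labs(title = "Difficulty Level vs Rating", x = "Level", y = "Rating") +
  theme_minimal()
p
```

A box plot per level would be another reasonable choice when the goal is to compare the rating distributions rather than see individual exercises.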

Calculating mean and standard deviation

mean_rating <- mean(data$Rating, na.rm = TRUE)
sd_rating <- sd(data$Rating, na.rm = TRUE)

mean_rating

## [1] 5.91969

sd_rating

## [1] 3.584607

The mean rating provides the average rating of exercises. The standard deviation indicates how much ratings vary from this average, showing the extent of variability in the ratings.
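To make the two statistics concrete, the sketch below recomputes them by hand on a made-up vector (`ratings` is hypothetical) and compares the results with `mean()` and `sd()`. Note that `sd()` uses the sample formula with `n - 1` in the denominator.

```r
# Toy ratings standing in for data$Rating (hypothetical values)
ratings <- c(2, 4, 4, 4, 5, 5, 7, 9, NA)

# Drop missing values first, which is what na.rm = TRUE does
x <- ratings[!is.na(ratings)]
n <- length(x)

# Mean: sum of the values divided by their count
manual_mean <- sum(x) / n

# Sample standard deviation: root of the summed squared deviations over (n - 1)
manual_sd <- sqrt(sum((x - manual_mean)^2) / (n - 1))

# Both comparisons should return TRUE
all.equal(manual_mean, mean(ratings, na.rm = TRUE))
all.equal(manual_sd, sd(ratings, na.rm = TRUE))
```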

Calculating correlation

data$Level <- as.numeric(as.factor(data$Level))

correlation <- cor(data$Rating, data$Level, use = "complete.obs")

correlation

## [1] 0.1947474

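The correlation between the numeric level codes and rating is about 0.195, indicating a weak positive relationship. Note that as.numeric(as.factor()) assigns the codes in alphabetical order of the level names, so the coefficient reflects difficulty only to the extent that alphabetical order matches the order of difficulty. Taken at face value, it suggests a slight tendency for harder exercises to receive higher ratings, though the effect is not strong.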
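How the levels are turned into numbers matters for this calculation. The sketch below, on made-up data, contrasts the default alphabetical coding with an explicit ordering. The level names "Beginner", "Intermediate", "Expert" and their order are assumptions for illustration and should be checked against `unique(data$Level)` before use.

```r
# Toy data; the level names and their order are assumptions for illustration
toy <- data.frame(
  Level  = c("Beginner", "Expert", "Intermediate", "Beginner", "Expert"),
  Rating = c(5, 9, 7, 6, 8)
)

# Default coding: as.factor() sorts the level names alphabetically
alphabetical_codes <- as.numeric(as.factor(toy$Level))

# Explicit coding: levels listed in the intended order of difficulty
ordered_codes <- as.numeric(
  factor(toy$Level, levels = c("Beginner", "Intermediate", "Expert"))
)

alphabetical_codes
ordered_codes

# Spearman's rank correlation depends only on the ordering of the codes,
# which suits an ordinal variable such as a difficulty level
cor(toy$Rating, ordered_codes, method = "spearman", use = "complete.obs")
```

With the alphabetical coding, "Expert" sits between the other two levels, so a correlation computed on those codes does not measure a monotone difficulty trend.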

T-Test

Divide the dataset into two groups

group1 <- data[data$Equipment == "Bands", ]
group2 <- data[data$Equipment == "Barbell", ]

t_test_result <- t.test(na.omit(group1$Rating), na.omit(group2$Rating))

t_test_result

## 
##  Welch Two Sample t-test
## 
## data:  na.omit(group1$Rating) and na.omit(group2$Rating)
## t = -2.8759, df = 38.846, p-value = 0.006508
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.5837953 -0.6239825
## sample estimates:
## mean of x mean of y 
##  4.333333  6.437222

The t-test compares ratings between exercises using Bands and those using Barbells. Exercises with Barbells receive higher ratings on average (mean 6.44) than those with Bands (mean 4.33). The difference is statistically significant (t = -2.88, p = 0.0065), and the 95 percent confidence interval for the difference in means, from -3.58 to -0.62, excludes 0, meaning the difference is unlikely to be due to random chance.
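The individual numbers in the printed result can be pulled out of the object that `t.test()` returns, which is handy when they need to be quoted in text. This is a minimal sketch on made-up ratings; the vectors `bands` and `barbell` are hypothetical and are not the dataset values.

```r
# Toy ratings for two equipment groups (hypothetical values, not from the dataset)
bands   <- c(4.0, 5.5, NA, 3.8, 4.6)
barbell <- c(6.5, 7.2, 6.0, NA, 6.9)

# t.test() defaults to Welch's test (var.equal = FALSE); NAs are removed first
res <- t.test(na.omit(bands), na.omit(barbell))

res$estimate   # the two group means
res$conf.int   # 95% confidence interval for the difference in means (x minus y)
res$p.value    # two-sided p-value

# Compare against a chosen significance level (0.05 is a common convention)
res$p.value < 0.05
```

The sign of the interval follows the argument order: a negative interval means the first group has the lower mean.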