Loading the needed packages

library(tidyverse)
library(scales)
library(patchwork)
library(statsExpressions)
library(DT)
library(ggstatsplot)

Loading the dataset

# Read the survey data and keep only the first eight columns
infer <- read_csv("infer.csv") %>% select(1:8)

Athleticism - checking the proportions/percentages of the levels and testing them for equality

In this case, I need to sample more “nonathletic” respondents to make the levels of this variable more balanced.
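
The kind of check behind this conclusion can be sketched in base R. This is a minimal illustration, not the original chunk: the data frame `toy` and its column `athletic` are hypothetical stand-ins, since the real column names in `infer.csv` are not shown here. It tabulates the levels, converts the counts to percentages, and runs a chi-squared goodness-of-fit test against the null hypothesis that all levels are equally frequent.

```r
# Hypothetical toy data; the real column names in infer.csv are not shown here
toy <- data.frame(
  athletic = c(rep("athletic", 70), rep("nonathletic", 30))
)

# Counts and percentages per level
counts <- table(toy$athletic)
print(round(100 * prop.table(counts), 1))

# Chi-squared goodness-of-fit test; H0: all levels are equally frequent
gof <- chisq.test(counts)
print(gof$statistic)
print(gof$p.value)
```

A small p-value here means the levels are unlikely to be equally represented, which is the situation where sampling more respondents from the smaller level makes sense.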

Gender

Age

I need to sample more respondents from the underrepresented age groups, such as 35-39, 40-44, and 45-49.

Frequency of physical exercise

Duration of physical exercise

I need to sample more respondents from the “> 2 hours” level.

Sport type

My sample is highly unbalanced. I need to sample more respondents from underrepresented sports.

ANALYSIS OF QUESTIONNAIRE ANSWERS IN THE CONTEXT OF EXERCISE FREQUENCY AND DURATION

I have decided that the most important variables to dig into are exercise frequency and duration, so I turn my attention to them. I will look for associations between these two variables and the respondents’ answers to each question. I will also test whether the answers to each question occur in equal proportions within each level of these two variables.
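
An association check of this kind can be sketched with a contingency table in base R. The counts, the level labels, and the object names (`tab`, `assoc`, `cramers_v`) below are hypothetical and chosen only for illustration; they are not taken from the survey. The sketch shows the answer percentages within each frequency level, a Pearson chi-squared test of independence, and the uncorrected Cramér's V as a rough effect size.

```r
# Hypothetical counts: rows = exercise frequency, columns = answer to one question
tab <- matrix(
  c(30, 20, 10,
    20, 20, 20,
    10, 20, 30),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    frequency = c("rarely", "weekly", "daily"),
    answer    = c("disagree", "neutral", "agree")
  )
)

# Row-wise percentages: distribution of answers within each frequency level
print(round(100 * prop.table(tab, margin = 1), 1))

# Pearson chi-squared test of independence between frequency and answer
assoc <- chisq.test(tab)
print(assoc)

# Uncorrected Cramer's V as an effect size for the strength of the association
n <- sum(tab)
cramers_v <- sqrt(unname(assoc$statistic) / (n * (min(dim(tab)) - 1)))
print(cramers_v)
```

The row-wise percentages answer the "equal proportions within each level" question descriptively, while the test and effect size address whether the two variables are associated at all.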

The respondents’ answers show that exercise frequency is significantly associated with their mood: the more frequently respondents exercise, the more strongly they agree with the question statement.
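
Because both exercise frequency and the agreement scale are ordered, the direction of such a relationship can also be checked with a rank correlation. The sketch below uses Kendall's tau, which is a technique swapped in here for illustration and not necessarily the test used in this analysis; the level labels and counts are hypothetical.

```r
# Hypothetical ordered levels; the real labels in the survey are not shown here
freq_levels   <- c("rarely", "weekly", "daily")
answer_levels <- c("disagree", "neutral", "agree")

# Hypothetical counts for each frequency/answer combination
# (expand.grid varies the first variable fastest)
cells <- expand.grid(frequency = freq_levels, answer = answer_levels,
                     stringsAsFactors = FALSE)
cells$n <- c(30, 20, 10,   # disagree: rarely, weekly, daily
             20, 20, 20,   # neutral:  rarely, weekly, daily
             10, 20, 30)   # agree:    rarely, weekly, daily

# One row per respondent, with both variables coded as integer ranks
long <- cells[rep(seq_len(nrow(cells)), cells$n), ]
freq_rank   <- match(long$frequency, freq_levels)
answer_rank <- match(long$answer, answer_levels)

# Kendall's tau: a positive value means agreement rises with exercise frequency
trend <- cor.test(freq_rank, answer_rank, method = "kendall", exact = FALSE)
print(trend$estimate)
print(trend$p.value)
```

A positive and significant tau supports the "more frequent exercise, stronger agreement" reading, whereas a plain chi-squared test only says that the answer distributions differ between levels.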