#Tut 4: Revision

Initial Set Up

# First, I am loading all the necessary packages that will be utilised in this analysis
library(pacman)
p_load(tidyverse, psych, knitr, dplyr, janitor, haven, grid, readxl, magrittr, effectsize)

Question 1

T-Tests

#Next I will load in the necessary data sets
AirTanxiety_data <- read_excel("AirTanxiety_new.xlsx")
# Next, I compute the mean, SD, and variance per group to make initial observations about the data
AirTanxiety_data %>% group_by(Group) %>% summarise(Mean = mean(Anxiety), SD = sd(Anxiety), Variance = var(Anxiety))
## # A tibble: 2 × 4
##   Group       Mean    SD Variance
##   <chr>      <dbl> <dbl>    <dbl>
## 1 Exercise    65.4  3.17    10.0 
## 2 Meditation  68.3  1.83     3.34

The output above shows that the exercise condition has a lower mean anxiety level (M = 65.4) but higher variability (SD = 3.17), indicating less consistent anxiety levels within that group. The meditation condition has a higher mean anxiety level (M = 68.3) with lower variability (SD = 1.83), indicating more consistent anxiety levels. These summary statistics alone cannot confirm the presence of outliers or the shape of either distribution.

# Now I will calculate the variance ratio (largest variance / smallest variance) to assess whether the assumption of homogeneity of variances is upheld.
10.044444/3.344444
## [1] 3.003323

The assumption of homogeneity of variances is upheld, because the variance ratio (3.00) is below 4. I can therefore continue on to conducting the t-test.
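
The variance ratio can also be computed from the summary values rather than typed in by hand. The following is a minimal sketch, assuming the two group variances reported in the summary table; `variance_ratio` is a hypothetical helper name and is not part of any package.

```r
# Hypothetical helper: ratio of the largest to the smallest group variance
variance_ratio <- function(variances) {
  max(variances) / min(variances)
}

# Group variances copied from the summary table above
group_vars <- c(Exercise = 10.044444, Meditation = 3.344444)

k <- variance_ratio(group_vars)
print(k)      # about 3.00
print(k < 4)  # TRUE: below the rule-of-thumb threshold used here
```

Taking the maximum over the minimum means the ratio does not depend on the order in which the groups are listed.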

H0: There is no difference in mean anxiety levels of air traffic controllers between the exercise and meditation groups.

H1: There is a difference in mean anxiety levels of air traffic controllers between the exercise and meditation groups.

# With all assumptions met and hypotheses stated, I will now conduct the t-test
t.test(Anxiety~Group, data= AirTanxiety_data, var.equal= TRUE)
## 
##  Two Sample t-test
## 
## data:  Anxiety by Group
## t = -2.5063, df = 18, p-value = 0.02202
## alternative hypothesis: true difference in means between group Exercise and group Meditation is not equal to 0
## 95 percent confidence interval:
##  -5.3309846 -0.4690154
## sample estimates:
##   mean in group Exercise mean in group Meditation 
##                     65.4                     68.3

An independent-samples t-test was conducted to compare anxiety levels between the exercise and meditation groups. The difference was statistically significant, t(18) = -2.51, p = .022, providing enough evidence to reject the null hypothesis in favour of the alternative hypothesis. This is corroborated by the 95% confidence interval for the mean difference, [-5.33, -0.47], which excludes zero because both bounds have the same sign. The negative t statistic indicates that the mean of the exercise condition (65.4) is lower than that of the meditation condition (68.3). Assuming higher scores reflect greater anxiety, these results favour exercise, so I would suggest the airline prioritise exercise facilities and sessions that controllers can use throughout the day or in specified slots.
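
As a cross-check, the pooled-variance t statistic, its confidence interval, and a standardised effect size can be rebuilt from the summary statistics alone. This is only a sketch: the group sizes are not shown in the output, so `n1 = n2 = 10` is an assumption inferred from df = 18, and Cohen's d is an addition that the original analysis did not report.

```r
# Summary values from the output above. n = 10 per group is an ASSUMPTION
# (df = 18 implies 20 participants in total, split equally here).
m1 <- 65.4; v1 <- 10.044444; n1 <- 10   # Exercise
m2 <- 68.3; v2 <- 3.344444;  n2 <- 10   # Meditation

# Pooled variance and standard error of the mean difference
pooled_var <- ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
se_diff <- sqrt(pooled_var * (1 / n1 + 1 / n2))

# Two-sided pooled-variance t-test
df_t <- n1 + n2 - 2
t_stat <- (m1 - m2) / se_diff
p_val <- 2 * pt(-abs(t_stat), df_t)

# 95% confidence interval for the mean difference
ci <- (m1 - m2) + c(-1, 1) * qt(0.975, df_t) * se_diff

# Cohen's d based on the pooled standard deviation
cohens_d <- (m1 - m2) / sqrt(pooled_var)

print(round(c(t = t_stat, df = df_t, p = p_val), 4))
print(round(ci, 4))
print(round(cohens_d, 2))
```

Under the equal-n assumption these values agree with the `t.test()` output, which suggests the assumption is reasonable for this data set.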

Question 2

Chi-Squared

#For this question, I will start by importing and reading in the "personality_data.xlsx" dataset.
Personality_data <- read_excel("personality_data.xlsx")
# It is essential to inspect the data
head(Personality_data, 6)
## # A tibble: 6 × 3
##   participant_id favorite_film personality_type
##            <dbl> <chr>         <chr>           
## 1              1 Action        Introvert       
## 2              2 Action        Introvert       
## 3              3 Action        Introvert       
## 4              4 Action        Introvert       
## 5              5 Action        Introvert       
## 6              6 Action        Introvert
# I have inspected the data to gain more insight into how it is set up and will now create a matrix of the observed counts
personality_matrix <- matrix(c(10, 25, 25, 15, 30, 15),
                        nrow = 2,
                        dimnames = list(Personality = c("Extrovert", "Introvert"),
                                        Film = c("Action", "Drama", "Romantic")))
print(personality_matrix)
##            Film
## Personality Action Drama Romantic
##   Extrovert     10    25       30
##   Introvert     25    15       15
# I will now produce a contingency table of the observed frequencies
Obs_Table <- kable(table(Personality_data$personality_type, Personality_data$favorite_film), caption = "Observed Frequencies")
print(Obs_Table)
## 
## 
## Table: Observed Frequencies
## 
## |          | Action| Drama| Romantic|
## |:---------|------:|-----:|--------:|
## |Extrovert |     10|    25|       30|
## |Introvert |     25|    15|       15|

The outputs above indicate that extroverts most often favour romantic films (30 of 65), whereas introverts most often favour action films (25 of 55). Drama is chosen by both groups, but more often by extroverts (25) than by introverts (15). Romantic films are chosen by twice as many extroverts as introverts (30 vs 15).

chisq.test(table(Personality_data$personality_type,Personality_data$favorite_film))$expected
##            
##               Action    Drama Romantic
##   Extrovert 18.95833 21.66667   24.375
##   Introvert 16.04167 18.33333   20.625
# It is now time to conduct the chi-squared test
chisq.test(table(Personality_data$personality_type,Personality_data$favorite_film))
## 
##  Pearson's Chi-squared test
## 
## data:  table(Personality_data$personality_type, Personality_data$favorite_film)
## X-squared = 13.187, df = 2, p-value = 0.001369

A chi-squared test of independence was conducted to investigate whether personality type is associated with film preference. The null hypothesis states that personality type and film preference are independent; the alternative hypothesis states that they are related. The test revealed a statistically significant association, χ²(2, N = 120) = 13.19, p = .001. There is enough evidence to reject the null hypothesis of no association in favour of the alternative hypothesis. All expected frequencies exceed 5, so the expected-count assumption of the test is met.
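
The test can be reproduced without the Excel file by entering the observed counts directly. The sketch below uses base R only; the Cramér's V calculation is an addition that the original analysis did not report, included here as one common effect size for contingency tables.

```r
# Observed counts copied from the contingency table above
# (matrix() fills column by column: Action, then Drama, then Romantic)
observed <- matrix(c(10, 25,
                     25, 15,
                     30, 15),
                   nrow = 2,
                   dimnames = list(Personality = c("Extrovert", "Introvert"),
                                   Film = c("Action", "Drama", "Romantic")))

chi_result <- chisq.test(observed)
n_total <- sum(observed)

# Cramér's V: sqrt(chi-squared / (N * (min(rows, cols) - 1)))
cramers_v <- sqrt(unname(chi_result$statistic) /
                    (n_total * (min(dim(observed)) - 1)))

print(chi_result)
print(n_total)
print(round(cramers_v, 2))
```

Entering the counts as a labelled matrix keeps the row and column names attached to the expected values and residuals that `chisq.test()` returns.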

# I will now produce the adjusted residuals to further understand the difference between the observed and expected values
chisq.test(table(Personality_data$personality_type,Personality_data$favorite_film))$stdres
##            
##                Action     Drama  Romantic
##   Extrovert -3.610918  1.295501  2.128725
##   Introvert  3.610918 -1.295501 -2.128725

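Introverts showed a strong preference for action films (adjusted residual = 3.61) and chose romantic films less often than expected (-2.13), whereas extroverts showed the opposite pattern, preferring romantic films (2.13) and choosing action films less often than expected (-3.61). The drama residuals (±1.30) fall within ±1.96, so drama preference does not differ reliably between the two groups. This further corroborates the significance of the relationship between personality type and film preference. One possible interpretation is that introverts prefer high-arousal genres and extroverts prefer mellower romantic genres, although the test itself cannot establish why the pattern occurs.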
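
Reading the residuals by eye can be replaced with an explicit rule. The following sketch flags cells whose adjusted residual exceeds the two-sided 5% critical value of the standard normal distribution (about 1.96). It assumes the observed counts from the contingency table and applies no correction for multiple cells.

```r
# Observed counts copied from the contingency table above
observed <- matrix(c(10, 25,
                     25, 15,
                     30, 15),
                   nrow = 2,
                   dimnames = list(Personality = c("Extrovert", "Introvert"),
                                   Film = c("Action", "Drama", "Romantic")))

# Adjusted (standardised) residuals from the chi-squared test
std_res <- chisq.test(observed)$stdres

# Flag cells that deviate from independence beyond the critical value
critical <- qnorm(0.975)
flagged <- abs(std_res) > critical

print(round(std_res, 2))
print(flagged)
```

With these counts the Action and Romantic cells are flagged and the Drama cells are not.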

Question 3.1

One-way ANOVA

# I will start by reading in the required data-sets.
diet_data <- read_excel("Diet.xlsx")

H0: The different diets have the same effect on total weight loss after 8 weeks.

H1: The different diets have different effects on total weight loss after 8 weeks.

There are four assumptions to be met in order to run a one-way ANOVA: independence of observations, homogeneity of variances, normality, and a continuous dependent variable.

diet_aov <- aov(Weight_Loss ~ Group, data = na.omit(diet_data))
summary(diet_aov)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## Group        3  166.2   55.40   16.87 5.21e-07 ***
## Residuals   36  118.2    3.28                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The one-way ANOVA revealed a statistically significant effect of diet type on weight loss, F(3, 36) = 16.87, p < .001. This provides strong evidence that mean weight loss differs between at least two of the diet programmes, which is enough to reject the null hypothesis in favour of the alternative hypothesis. The type of dietary intervention therefore matters for weight loss, although post hoc comparisons would be needed to identify which diets differ from one another.

# I will now calculate the effect size
library(effectsize)
eta_squared(diet_aov)
## For one-way between subjects designs, partial eta squared is equivalent
##   to eta squared. Returning eta squared.
## # Effect Size for ANOVA
## 
## Parameter | Eta2 |       95% CI
## -------------------------------
## Group     | 0.58 | [0.39, 1.00]
## 
## - One-sided CIs: upper bound fixed at [1.00].

The effect size was large (η² = .58), demonstrating that 58% of the variance in weight loss is explained by diet type. This indicates that the choice of diet does affect weight loss, and those who want to lose weight could be advised to follow the diet associated with the greatest weight loss. These results do not account for confounding variables, such as those related to mental health, and should not inform recommendations in isolation.

# I will now test the assumptions. These checks should ideally be run before the ANOVA, to confirm that the data meet the requirements of the test.
shapiro.test(diet_data$Weight_Loss) #This tests for normality
## 
##  Shapiro-Wilk normality test
## 
## data:  diet_data$Weight_Loss
## W = 0.96107, p-value = 0.1822

p > .05, indicating that the assumption of normality is upheld by this data.

# I will now test whether the assumption of homogeneity of variances is upheld
bartlett.test(Weight_Loss ~ Group, data = diet_data) 
## 
##  Bartlett test of homogeneity of variances
## 
## data:  Weight_Loss by Group
## Bartlett's K-squared = 3.1412, df = 3, p-value = 0.3704

p > .05, indicating that the assumption of the homogeneity of variances is upheld.
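
The normality assumption in ANOVA concerns the model residuals rather than the raw outcome, so a common alternative is to run the Shapiro-Wilk test on `residuals()` of the fitted model. The sketch below illustrates this on simulated data, because the original file is not available here: the group labels, group means, and spread are all hypothetical, and only the column names `Group` and `Weight_Loss` are taken from the analysis above.

```r
set.seed(1)  # arbitrary seed so the simulated example is reproducible

# SIMULATED stand-in for the diet data: 4 hypothetical groups of 10
sim_diet <- data.frame(
  Group = factor(rep(c("A", "B", "C", "D"), each = 10)),
  Weight_Loss = rnorm(40, mean = rep(c(3, 5, 6, 8), each = 10), sd = 1.8)
)

sim_aov <- aov(Weight_Loss ~ Group, data = sim_diet)

# Normality is assessed on the residuals, after removing the group means
norm_check <- shapiro.test(residuals(sim_aov))

# Homogeneity of variances across groups
var_check <- bartlett.test(Weight_Loss ~ Group, data = sim_diet)

print(norm_check)
print(var_check)
```

Testing the raw outcome can make normally distributed data look non-normal when the group means differ, which is why the residual-based check is often preferred.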

Question 3.2

# Completed ANOVA table values based on the analysis:
SS_between = 166.2
SS_within = 118.2
df_between = 3
df_within = 36
MS_between = 55.40
MS_within = 3.28
F_value = 16.87
p_value = 5.21e-07
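
The table entries are linked by simple formulas, so they can be derived from the sums of squares and degrees of freedom instead of being copied individually. A minimal sketch, assuming the rounded sums of squares shown in the `summary()` output (the results may therefore differ from the printed values in the last digit):

```r
# Sums of squares and degrees of freedom from the ANOVA output
ss_between <- 166.2
ss_within <- 118.2
df_between <- 3
df_within <- 36

# Mean squares are sums of squares divided by their degrees of freedom
ms_between <- ss_between / df_between
ms_within <- ss_within / df_within

# F is the ratio of the mean squares; p is the upper tail of the F distribution
f_value <- ms_between / ms_within
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

# Eta squared: share of total variability attributable to the groups
eta_sq <- ss_between / (ss_between + ss_within)

print(round(c(MS_between = ms_between, MS_within = ms_within, F = f_value), 2))
print(signif(p_value, 3))
print(round(eta_sq, 2))
```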

Question 4

Factorial ANOVA

#The first thing I am going to do is read in the data set and visually inspect it to gain more understanding on the variables
heart_data <- read_excel("Heart.xlsx")

H0 (fasting): All four fasting schedules have the same effect on heart health.

H0 (exercise): Frequent and infrequent exercise have the same effect on heart health.

H0 (interaction): The effect of fasting schedule on heart health does not depend on the frequency of exercise.

# I will now run the factorial ANOVA, determine effect size, and interpret the results, thus assessing the implications
heart_aov <- aov(Heart_health ~ Fasting * Exercise, data = heart_data)
summary(heart_aov)
##                  Df Sum Sq Mean Sq F value   Pr(>F)    
## Fasting           3  194.2   64.73  40.777 4.84e-11 ***
## Exercise          1    3.6    3.60   2.268    0.142    
## Fasting:Exercise  3   63.8   21.27  13.396 7.83e-06 ***
## Residuals        32   50.8    1.59                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
eta_squared(heart_aov)
## # Effect Size for ANOVA (Type I)
## 
## Parameter        | Eta2 (partial) |       95% CI
## ------------------------------------------------
## Fasting          |           0.79 | [0.67, 1.00]
## Exercise         |           0.07 | [0.00, 1.00]
## Fasting:Exercise |           0.56 | [0.33, 1.00]
## 
## - One-sided CIs: upper bound fixed at [1.00].

A factorial ANOVA was conducted to investigate the main effects of fasting schedule and exercise frequency on heart health. More interestingly, the ANOVA was conducted to identify whether there is an interaction between fasting and exercise that affects heart health. The ANOVA revealed a statistically significant main effect of fasting on heart health, F(3, 32) = 40.78, p < .001. The main effect of exercise was not statistically significant, F(1, 32) = 2.27, p = .142. Furthermore, there was a statistically significant interaction between fasting and exercise, F(3, 32) = 13.40, p < .001.

The effect of fasting on heart health is very large (partial η² = .79), while that of exercise is medium by conventional benchmarks (partial η² = .07) but not statistically significant, with a confidence interval that includes zero. The interaction effect is also large (partial η² = .56), though smaller than that of fasting. This suggests that heart health intervention programmes should consider fasting and exercise together, because the benefit of a fasting protocol depends on exercise frequency. In cases where frequent exercise is not attainable, fasting is still associated with better heart health than no fasting.
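
The partial eta squared values can be verified from the ANOVA table, since each one is the effect's sum of squares divided by the sum of that effect's and the residual sums of squares. A short sketch, assuming the rounded sums of squares from the `summary()` output:

```r
# Sums of squares copied from the factorial ANOVA table
ss_effect <- c(Fasting = 194.2, Exercise = 3.6, `Fasting:Exercise` = 63.8)
ss_resid <- 50.8

# Partial eta squared: SS_effect / (SS_effect + SS_residual)
partial_eta_sq <- ss_effect / (ss_effect + ss_resid)

print(round(partial_eta_sq, 2))
```

Because each effect is compared only with the residual variability, partial values for the effects in one model can sum to more than 1.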

The cell means plot depicts a clear interaction between exercise frequency and fasting protocol, as the lines are non-parallel and cross. This indicates that the effect of the fasting protocol on heart health depends on the frequency of exercise.

For the ‘Frequent’ group, the 5/2 (M = 9.7) and 16/8 (M = 8.3) protocols are highly effective, while the 48hr protocol (M = 2.9) is slightly lower than the control (M = 3.1). In contrast, for the ‘Infrequent’ group, all three active protocols yield similar, moderate results (16/8: M = 6.3; 5/2: M = 6.7; 48hr: M = 6.5), all of which are superior to the control (M = 2.1).
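
A cell means plot of this kind can be drawn from the reported means alone. The sketch below is one way to do it with base R's `interaction.plot()`; the factor level labels are assumptions based on the names used in the text, and the plot is built from the eight means rather than from the raw data.

```r
# Cell means as reported in the text; level labels are ASSUMED spellings
protocols <- c("Control", "16/8", "5/2", "48hr")
cell_means <- data.frame(
  Fasting = factor(rep(protocols, times = 2), levels = protocols),
  Exercise = rep(c("Frequent", "Infrequent"), each = 4),
  Mean = c(3.1, 8.3, 9.7, 2.9,   # Frequent
           2.1, 6.3, 6.7, 6.5)   # Infrequent
)

# One line per exercise group across the fasting protocols
with(cell_means,
     interaction.plot(x.factor = Fasting, trace.factor = Exercise,
                      response = Mean, type = "b",
                      xlab = "Fasting protocol", ylab = "Mean heart health",
                      trace.label = "Exercise"))

# Frequent minus Infrequent within each protocol; a sign change means
# the lines cross
gap <- with(cell_means,
            Mean[Exercise == "Frequent"] - Mean[Exercise == "Infrequent"])
names(gap) <- protocols
print(gap)
```

The gap is positive for the control, 16/8, and 5/2 protocols and negative for the 48hr protocol, which is the crossing described in the text.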

The most notable finding is the reversal for the 48hr fasting protocol, which is the worst protocol for frequent exercise but is just as effective as the others for infrequent exercise. The optimal fasting protocol therefore depends on how frequently an individual exercises.

Tukey’s HSD shows that when exercise was frequent, individuals following the 48-hour fasting protocol showed significantly smaller improvements in heart-health scores compared to those following the other fasting schedules (p < .05). In contrast, under infrequent exercise conditions, differences between fasting types were not statistically significant (p > .05).
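
The code behind these comparisons is not shown, so the following is only a sketch of how Tukey's HSD can be requested for the cells of a factorial design using base R's `TukeyHSD()`. The data are simulated: the cell means come from the text, while the level labels, five observations per cell (inferred from the residual df of 32), and normally distributed errors are assumptions. The p-values it prints will therefore not match the original analysis.

```r
set.seed(42)  # arbitrary seed so the simulated example is reproducible

# SIMULATED stand-in for the heart data: 8 cells with 5 observations each
protocols <- c("Control", "16/8", "5/2", "48hr")
sim_heart <- data.frame(
  Fasting = factor(rep(rep(protocols, each = 5), times = 2),
                   levels = protocols),
  Exercise = factor(rep(c("Frequent", "Infrequent"), each = 20))
)
cell_means <- rep(c(3.1, 8.3, 9.7, 2.9,    # Frequent
                    2.1, 6.3, 6.7, 6.5),   # Infrequent
                  each = 5)
sim_heart$Heart_health <- rnorm(40, mean = cell_means, sd = sqrt(1.59))

sim_aov <- aov(Heart_health ~ Fasting * Exercise, data = sim_heart)

# All pairwise comparisons between the 8 cells of the design
tukey <- TukeyHSD(sim_aov, which = "Fasting:Exercise")
cells <- tukey[["Fasting:Exercise"]]

# Keep only comparisons between protocols within the Frequent group
keep <- grepl("^[^-]+:Frequent-[^-]+:Frequent$", rownames(cells))
within_frequent <- cells[keep, ]

print(round(within_frequent, 3))
```

Filtering the row names keeps the six protocol comparisons within one exercise group, which is the set of contrasts the paragraph above describes.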

This indicates that the disadvantage of a prolonged 48-hour fast becomes most apparent when exercise is frequent. Moderate fasting schedules such as 16/8 and 5/2 appear to enhance heart-health improvements, especially alongside regular exercise. The 48-hour fast, by contrast, is associated with diminished gains when combined with high exercise frequency, producing scores comparable to the control condition, which may point to adverse effects on cardiovascular health that would need further investigation.