Diet Effectiveness Statistical Study
Three different types of diet are being trialed by clinicians studying weight loss. Diet A is the diet usually recommended; diets B and C have been newly developed. On this basis, Diet A will be the control group for the purposes of this analysis. The goal of this study is to determine statistically whether the newly formulated diets have an impact on the weight loss of the trial participants.
Based on the statistical analysis, the clinicians will adjust their recommendations to offer a new and improved diet that gives better weight loss results for customers. The statistical analysis is carried out using the frequentist statistical framework.
The data consists of 42 rows with 0 missing values. 14 trials were carried out for each diet.
The data provided consists of 3 variables; the first six rows are shown below:
| Diet | weight | weight6weeks |
|---|---|---|
| A | 58 | 54.2 |
| A | 60 | 54.0 |
| A | 64 | 63.3 |
| A | 64 | 61.1 |
| A | 65 | 62.2 |
| A | 66 | 64.0 |
Summary statistics by diet (weights in kg):

| Diet | Trials | Mean weight | SD weight | Mean weight at 6 weeks | SD weight at 6 weeks |
|---|---|---|---|---|---|
| A | 14 | 67.9 | 6.0 | 64.9 | 6.9 |
| B | 14 | 64.8 | 5.9 | 62.2 | 6.3 |
| C | 14 | 68.0 | 4.4 | 62.1 | 5.0 |
The summary statistics show that, for all 3 diets, the mean weight after 6 weeks is lower than the mean starting weight. Trial participants on diet A and diet C both have a mean starting weight of about 68 kg, and participants on diet B have a mean starting weight of about 65 kg. To evaluate the impact of the diets on participants' weight, we calculate each participant's weight loss (starting weight minus weight after 6 weeks) and add it to our data frame. See preview of new data frame below:
| Diet | weight | weight6weeks | weight_loss |
|---|---|---|---|
| A | 58 | 54.2 | 3.8 |
| A | 60 | 54.0 | 6.0 |
| A | 64 | 63.3 | 0.7 |
| A | 64 | 61.1 | 2.9 |
| A | 65 | 62.2 | 2.8 |
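The weight-loss column is a single vectorised subtraction. The sketch below rebuilds only the six Diet A rows shown in the first data preview (not the full 42-row data set); the data frame name `preview` is chosen here for illustration and does not appear in the original analysis.

```r
# Six Diet A rows copied from the data preview (not the full data set).
preview <- data.frame(
  Diet         = rep("A", 6),
  weight       = c(58, 60, 64, 64, 65, 66),
  weight6weeks = c(54.2, 54.0, 63.3, 61.1, 62.2, 64.0)
)

# Weight loss (kg) = starting weight minus weight after six weeks.
preview$weight_loss <- preview$weight - preview$weight6weeks

# Round for display to hide floating-point noise.
print(round(preview$weight_loss, 1))
```

The same subtraction applied to the full data frame would give the `weight_loss` column used in the rest of the analysis.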
A box plot of the weight loss shows the distribution for each diet, the differences between these distributions, and their key summary measures.
Figure: Distribution of weight loss for different diet types
From the box plot, Diet C has the highest weight loss on average in our data. There isn't a clear difference between the average weight loss for diet A and diet B, although the weight loss recorded for diet B is more spread out than for diet A. The difference in mean weight loss between the diets is thus worth investigating statistically.
We model our problem statistically as a one-way ANOVA (Analysis of Variance) model given that we are investigating the difference between the mean weight loss for more than two groups.
Let \(y_{i,j}\) be the weight lost by the \(j\)-th participant on the \(i\)-th diet, with \(i = 1, 2, 3\) (diets A, B and C) and \(j = 1, \ldots, 14\). The one-way ANOVA model is as follows:
\[y_{i,j} \sim N(\mu_i, \sigma^2), \quad i = 1,2,3, \quad j = 1,\ldots,14\]
where every participant on the same diet shares the same mean, and
\[\mu_2 = \mu_1 + \alpha_2\] \[\mu_3 = \mu_1 + \alpha_3\]
Description of model parameters \(\mu_1\), \(\alpha_2\) and \(\alpha_3\): this one-way ANOVA model uses \(\mu_1\) (the mean weight loss for diet A) as the “base” group, and the mean weight losses for diets B and C (\(\mu_2\) and \(\mu_3\) respectively) are expressed as the mean for diet A plus the offsets \(\alpha_2\) and \(\alpha_3\), each of which may be positive or negative.
\(\alpha_2\) and \(\alpha_3\) are thus constants that parametrize the difference in mean weight loss between diet B or diet C and diet A. Diet A is selected as the base group because it is the control against which we measure whether the newly developed diets B and C offer improvements. More generally, \(\mu_1\) can represent the mean weight loss of whichever diet is chosen as the control group.
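Equivalently, the model can be written as a linear regression on two diet indicators, which is the form a linear-model fit estimates. The indicator notation \(\mathbb{1}[\cdot]\) and the error term \(\varepsilon_{i,j}\) are introduced here for illustration:

\[ y_{i,j} = \mu_1 + \alpha_2\,\mathbb{1}[i = 2] + \alpha_3\,\mathbb{1}[i = 3] + \varepsilon_{i,j}, \quad \varepsilon_{i,j} \sim N(0, \sigma^2) \]

so that \(\alpha_2 = \mu_2 - \mu_1\) and \(\alpha_3 = \mu_3 - \mu_1\) are the differences in mean weight loss relative to diet A.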
Fitting this linear model gives estimates of \(\mu_1\), \(\alpha_2\) and \(\alpha_3\). The estimated coefficients are shown below, where (Intercept) corresponds to \(\mu_1\), DietB to \(\alpha_2\) and DietC to \(\alpha_3\):
m <- lm(weight_loss ~ Diet, data = diet_df)  # one-way ANOVA fit; Diet A is the baseline level
summary(m)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.0500000 0.5624035 5.4231521 3.271047e-06
## DietB -0.4428571 0.7953587 -0.5568018 5.808443e-01
## DietC 2.8928571 0.7953587 3.6371728 7.964651e-04
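As a check on how to read this table, the sketch below recovers the implied mean weight loss per diet and recomputes the t values and p-values from the printed estimates and standard errors. It uses only the numbers in the table, not the raw data; the variable names are illustrative.

```r
# Estimates and standard errors copied from the coefficient table above.
est <- c(Intercept = 3.0500000, DietB = -0.4428571, DietC = 2.8928571)
se  <- c(Intercept = 0.5624035, DietB = 0.7953587, DietC = 0.7953587)

# Implied mean weight loss (kg) per diet: mu_1, mu_1 + alpha_2, mu_1 + alpha_3.
diet_means <- c(A = est[["Intercept"]],
                B = est[["Intercept"]] + est[["DietB"]],
                C = est[["Intercept"]] + est[["DietC"]])

# t statistics and two-sided p-values on the residual degrees of freedom
# (42 observations minus 3 estimated means = 39).
df_resid <- 42 - 3
t_value  <- est / se
coef_p   <- 2 * pt(abs(t_value), df = df_resid, lower.tail = FALSE)

print(round(diet_means, 3))
print(signif(coef_p, 3))
```

The DietB and DietC rows each test a single difference from diet A; they do not by themselves compare diet B with diet C.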
To test for an underlying difference in mean weight loss between the diets, I applied the anova (analysis of variance) function to the fitted linear model. This carries out an F-test, which I evaluate at the 0.05 significance level, of whether the group means differ. ANOVA is chosen because the goal is to make statistical inferences about the difference in mean weight loss across more than two groups (a t-test would suffice for two groups).
With \(\mu_1\), \(\mu_2\) and \(\mu_3\) denoting the underlying mean weight loss for Diet A, Diet B and Diet C respectively, I formulated the hypothesis test as follows:
\[H_0: \mu_1 = \mu_2 = \mu_3\] \[H_1: \mu_i \neq \mu_j \text{ for at least one pair } i \neq j\]
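In terms of the model parameters, \(H_0\) is equivalent to \(\alpha_2 = \alpha_3 = 0\). The test compares between-diet variability with within-diet variability. As a sketch, with \(\bar{y}_i\) the sample mean weight loss for diet \(i\) and \(\bar{y}\) the overall sample mean (notation introduced here for illustration):

\[ SS_{\text{Diet}} = \sum_{i=1}^{3} 14\,(\bar{y}_i - \bar{y})^2, \qquad SS_{\text{Res}} = \sum_{i=1}^{3}\sum_{j=1}^{14} (y_{i,j} - \bar{y}_i)^2 \]

\[ F = \frac{SS_{\text{Diet}} / (3 - 1)}{SS_{\text{Res}} / (42 - 3)} \sim F_{2,\,39} \quad \text{under } H_0 \]

Large values of \(F\) are evidence against \(H_0\).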
anova(m)
## Analysis of Variance Table
##
## Response: weight_loss
## Df Sum Sq Mean Sq F value Pr(>F)
## Diet 2 91.895 45.947 10.376 0.0002437 ***
## Residuals 39 172.699 4.428
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
p_value <- anova(m)$`Pr(>F)`[1]
p_value
## [1] 0.0002436924
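The F value and p-value in the table can be reproduced by hand from the sums of squares and degrees of freedom. The sketch below uses only the rounded figures printed in the ANOVA table, so it agrees with the output to about three decimal places; the variable names are illustrative.

```r
# Sums of squares and degrees of freedom copied from the ANOVA table above.
ss_diet  <- 91.895
ss_resid <- 172.699
df_diet  <- 2
df_resid <- 39

# Mean squares and the F statistic.
ms_diet  <- ss_diet / df_diet
ms_resid <- ss_resid / df_resid
f_value  <- ms_diet / ms_resid

# Upper-tail probability of the F(2, 39) distribution.
p_from_table <- pf(f_value, df_diet, df_resid, lower.tail = FALSE)

print(round(f_value, 3))
print(signif(p_from_table, 4))
```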
Since the p-value (0.00024) is below the significance level of 0.05, we reject \(H_0\): there is enough evidence to conclude that the mean weight loss differs between at least two of the three diets. The test does not, however, show which pairs of diets differ.
To find out, I perform a follow-up analysis using Tukey's Honest Significant Differences (HSD) test, which tests all three pairwise hypotheses simultaneously while controlling the family-wise error rate. I formulated the tests as follows:
\[ \begin{aligned} H_0:& \mu_2 - \mu_1 = 0 \quad \text{(or, } \mu_1 = \mu_2 \text{)} \\ H_1:& \mu_2 - \mu_1 \neq 0 \quad \text{(or, } \mu_1 \neq \mu_2 \text{)} \\ H_0:& \mu_3 - \mu_1 = 0 \quad \text{(or, } \mu_1 = \mu_3 \text{)} \\ H_1:& \mu_3 - \mu_1 \neq 0 \quad \text{(or, } \mu_1 \neq \mu_3 \text{)} \\ H_0:& \mu_3 - \mu_2 = 0 \quad \text{(or, } \mu_2 = \mu_3 \text{)} \\ H_1:& \mu_3 - \mu_2 \neq 0 \quad \text{(or, } \mu_2 \neq \mu_3 \text{)} \\ \end{aligned} \]
a <- aov(weight_loss ~ Diet, data = diet_df)
coef(a)
## (Intercept) DietB DietC
## 3.0500000 -0.4428571 2.8928571
summary(a)
## Df Sum Sq Mean Sq F value Pr(>F)
## Diet 2 91.89 45.95 10.38 0.000244 ***
## Residuals 39 172.70 4.43
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Based on the output of the Tukey HSD test, we can conclude that diet C has a different effect on weight loss from diet A (the control group), and that it shows a higher weight loss than diet B when both are compared with diet A. Additionally, we can conclude that the difference in weight loss between diets A and B is not statistically significant. This result aligns with our initial analysis and visualization of the data.
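For reference, Tukey-adjusted comparisons of this kind are usually obtained by calling `TukeyHSD()` on the fitted `aov` object. The sketch below is a reconstruction from the summary figures reported earlier (coefficient estimates, residual mean square, residual degrees of freedom and group size), not the original test output. It assumes equal group sizes of 14, and the variable names are illustrative.

```r
# Summary values copied from the outputs above.
diet_means <- c(A = 3.0500000,
                B = 3.0500000 - 0.4428571,
                C = 3.0500000 + 2.8928571)
ms_resid <- 4.428   # residual mean square
df_resid <- 39      # residual degrees of freedom
n_per    <- 14      # participants per diet

# All pairwise differences in mean weight loss (kg).
diffs <- c("B-A" = diet_means[["B"]] - diet_means[["A"]],
           "C-A" = diet_means[["C"]] - diet_means[["A"]],
           "C-B" = diet_means[["C"]] - diet_means[["B"]])

# Studentized range statistic for equal group sizes, then the
# Tukey-adjusted p-value from the studentized range distribution.
q_stat  <- abs(diffs) / sqrt(ms_resid / n_per)
p_tukey <- ptukey(q_stat, nmeans = 3, df = df_resid, lower.tail = FALSE)

# Half-width of the 95% simultaneous confidence intervals.
half_width <- qtukey(0.95, nmeans = 3, df = df_resid) * sqrt(ms_resid / n_per)

print(round(cbind(diff = diffs, lwr = diffs - half_width,
                  upr = diffs + half_width, p_adj = p_tukey), 4))
```

A pair differs significantly at the 0.05 level when its adjusted p-value is below 0.05, or equivalently when its interval excludes zero.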
Management has set a benchmark: a new diet formulation will be implemented only if it increases weight loss by more than 5 kg. To further inform the decision on whether to implement a new diet, the clinicians would like to confirm whether the underlying mean weight loss under diet C is more than 5 kg higher than the underlying mean weight loss under diet B.
The hypothesis to test if the mean weight loss \(\mu_3\) of diet C is more than 5kg higher than the mean weight loss \(\mu_2\) of diet B is formulated as follows:
\[ \begin{aligned} H_0:& \mu_3 - \mu_2 \leq 5 \quad \text{: } \mu_3 \text{ is not more than 5kg higher than } \mu_2 \\ H_1:& \mu_3 - \mu_2 > 5 \quad \text{: } \mu_3 \text{ is more than 5kg higher than } \mu_2 \\ \end{aligned} \]
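library(multcomp)
# Cell-means model (no intercept): each coefficient is one diet's mean weight loss
m_mu <- lm(weight_loss ~ Diet - 1, data = diet_df)
ght <- glht(m_mu,
            # State the hypothesis to be tested (null hypothesis):
            linfct = c("DietC - DietB <= 5"))
summary(ght)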
##
## Simultaneous Tests for General Linear Hypotheses
##
## Fit: lm(formula = weight_loss ~ Diet - 1, data = diet_df)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>t)
## DietC - DietB <= 5 3.3357 0.7954 -2.092 0.979
## (Adjusted p values reported -- single-step method)
The p-value from the hypothesis test (0.979) is greater than 0.05, hence we cannot reject the null hypothesis. There is not enough evidence to conclude that the underlying weight loss under Diet C is more than 5 kg higher than the underlying weight loss under diet B; the estimated difference is 3.34 kg.
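The t value and p-value above can be reproduced from the reported estimate and standard error. The sketch below uses the rounded figures from the output and assumes 39 residual degrees of freedom (42 observations minus 3 estimated means); the variable names are illustrative.

```r
# Values copied from the glht output above.
estimate  <- 3.3357   # estimated mu_3 - mu_2 (kg)
std_error <- 0.7954   # standard error of that difference
threshold <- 5        # benchmark difference under H0 (kg)
df_resid  <- 39       # residual degrees of freedom (42 - 3)

# t statistic measured against the 5 kg benchmark rather than 0.
t_value <- (estimate - threshold) / std_error

# One-sided p-value for H1: mu_3 - mu_2 > 5 (upper tail).
p_upper <- pt(t_value, df = df_resid, lower.tail = FALSE)

print(round(t_value, 3))
print(round(p_upper, 3))
```

The t value is negative because the estimated difference lies below the 5 kg benchmark, which is why the upper-tail p-value is close to 1.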
Unfortunately, the threshold needed to effect a diet change was not met in this instance, and the clinicians may need to run more trials or improve their diet formulations.