Analysis of the weight loss study.

This R markdown file takes you through the analysis of the weight loss study. The values you have calculated will be different because the data is random, but the analysis steps remain the same. The first step is to load in the data and look at the variables contained within the data set.

data <- read.csv("~/Downloads/weight_loss_study.csv")
str(data)
## 'data.frame':    90 obs. of  4 variables:
##  $ id         : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Diet       : chr  "Control" "Control" "D1" "D1" ...
##  $ Exercise   : chr  "E1" "E2" "E1" "E2" ...
##  $ Weight.Loss: num  3.32 4.68 8.93 6.57 8.35 ...

The data has a grouping column for diet (Control, D1, D2) and exercise (E1, E2) and the dependent variable weight loss.

Data Cleaning

Check for missing data and the range of variables.

summary(data)
##        id            Diet             Exercise          Weight.Loss    
##  Min.   : 1.00   Length:90          Length:90          Min.   :-0.970  
##  1st Qu.:23.25   Class :character   Class :character   1st Qu.: 5.705  
##  Median :45.50   Mode  :character   Mode  :character   Median : 7.330  
##  Mean   :45.50                                         Mean   : 7.313  
##  3rd Qu.:67.75                                         3rd Qu.: 8.938  
##  Max.   :90.00                                         Max.   :14.270

Weight loss ranges between -1.5 (weight gain) and 11.96 with a mean of 6.801, there is no missing data.

Look for any outlying variables with box plots (this can also be used to check for the assumption of normality)

nGroups <- length(unique(data$Diet)) * length(unique(data$Exercise))
ylimit <- range(pretty(data$Weight.Loss))
out = boxplot(Weight.Loss ~ Diet + Exercise, data = data, 
        border = c(colours[c(1,2,3)], "black","black","black"), 
        col = c(NA,NA,NA,colours[c(1,2,3)]), lwd = 2, boxwex = .5, 
        las = 1, ann = FALSE, axes = FALSE, ylim=ylimit)
mtext("Diet / Exercise Groups", 1, line = 3, font = 2, cex = 1.2)
mtext("Weight Loss (kg) ",2, line = 2, font = 2, cex = 1.2)
axis(2, at = pretty(ylimit), tck = .01, lwd = 2, labels = FALSE)
mtext(pretty(ylimit),2,at=pretty(ylimit), las=1, line = .2)
mtext(c("Control", "Diet 1", "Diet 2"),1, line = .2, at = 1:6)
mtext(c("Exercise 1", "Exercise 2"), 1, line = 1.3, at = c(2,5))

There are two outliers in the data, they can be removed with the following code. I will keep them in because their values don’t exceed the range of the other diets / exercise groups and to keep the designed of this ANOVA balanced (equal number of participants in each group).

noOutliers <- data[!data$Weight.Loss %in% out$out,]
summary(noOutliers)
##        id            Diet             Exercise          Weight.Loss    
##  Min.   : 1.00   Length:82          Length:82          Min.   : 0.010  
##  1st Qu.:22.25   Class :character   Class :character   1st Qu.: 5.795  
##  Median :44.50   Mode  :character   Mode  :character   Median : 7.420  
##  Mean   :44.52                                         Mean   : 7.441  
##  3rd Qu.:66.75                                         3rd Qu.: 8.930  
##  Max.   :90.00                                         Max.   :14.270

Verify assumptions of ANOVA

The main assumptions to check here are normality of the residuals and homogeneity of variance

  homoVarDiet <- leveneTest(data$Weight.Loss, data$Diet)
  homoVarDiet
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  2  0.0821 0.9213
##       87
  homoVarExercise <- leveneTest(data$Weight.Loss, data$Exercise)
  homoVarExercise
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  1  1.8378 0.1787
##       88

Neither return a statistically significant difference between the variance of the groups for either Diet or Exercise. We follow up with building a histogram to check for the assumption of normality of residuals.

  model <- aov(Weight.Loss ~ Diet*Exercise, data = data)
  par(lwd = 2)
  hist(residuals(model), col = coloursT[4], border = colours[4], las = 1, xlab = "Residuals", main = "Distribution of Residuals", font.lab =2)

Run the ANOVA

I add \(\eta_p^2\) to the ANOVA table directly here, but that is not a necessary step. There is no within subjects component to this analysis, we therefore do not run aov() with the Error function. It will remove a degree of freedom from your residual and change your values in the ANOVA table.

model <- aov(Weight.Loss ~ Diet*Exercise, data = data)
outTable <- summary(model)
etap2 <- outTable[[1]][1:3,2] / (outTable[[1]][1:3,2] + outTable[[1]][4,2])
outTable[[1]][,"etap2"]<- c(etap2, NA)
outTable
##               Df Sum Sq Mean Sq F value Pr(>F)   etap2
## Diet           2  142.0   70.98   9.810 0.0001 0.18935
## Exercise       1    0.2    0.15   0.021 0.8848 0.00025
## Diet:Exercise  2   16.3    8.17   1.129 0.3282 0.02618
## Residuals     84  607.7    7.23

The only statistically significant effect in this ANOVA is the main effect of Diet, we only calculate the planned contrasts for that effect. The planned contrasts were:

The comparisons your run must be those set before hand. Additional comparisons would lead to posthoc comparisons and require corrections in the alpha level.

groupMeansDiet <- aggregate(data$Weight.Loss, by = list(data$Diet), mean)
groupSize <- aggregate(data$Weight.Loss, by = list(data$Diet), length)

MSErr <- outTable[[1]][4,3]
dfErr <- outTable[[1]][4,1]
myContrasts <- matrix(c(0,1,-1, -1, 0.5,0.5), nrow = 2, byrow = TRUE)
C <- apply(groupMeansDiet$x * t(myContrasts), 2, sum)
sc <- sqrt(MSErr * apply((t(myContrasts)**2)/groupSize$x,2,sum))
tcrit <- qt(1-0.05/2, dfErr)
contrastResults <- data.frame(MeanDiff=C, Lower = C - (sc*tcrit), upper = C + (sc*tcrit))
rownames(contrastResults) <- c("c(0,1,-1)", "c(-1, 0.5,0.5)")
contrastResults
##                  MeanDiff     Lower    upper
## c(0,1,-1)      -0.1133333 -1.494411 1.267744
## c(-1, 0.5,0.5)  2.6623333  1.466285 3.858382

Visualization of the effects

Create a visualization of the effects. As

groupMeans <- aggregate(data$Weight.Loss, by = list(data$Diet, data$Exercise), mean)
groupSD <- aggregate(data$Weight.Loss, by = list(data$Diet, data$Exercise), sd)

par(pty = "s", xpd = TRUE)
xlimit <- c(1,3)
ylimit <- range(pretty(data$Weight.Loss))
plot(x = NULL, y = NULL, xlim = xlimit, ylim = ylimit, ann = FALSE, axes = FALSE)
axis(2,at = pretty(data$Weight.Loss), tck = .01, lwd = 2, labels = FALSE, line = 2.3)

mtext("Diet", 1, line = 3, font = 2, cex = 1.2)
mtext("Weight Loss (kg) ",2, line = 3.3, font = 2, cex = 1.2)
mtext(pretty(ylimit),2,at=pretty(ylimit), las=1, line = 2.5)
mtext(c("Control", "Diet 1", "Diet 2"),1, line = .2, at = 1:3)

adj <- .2
arrows((1:3)-adj, groupMeans$x[1:3] - groupSD$x[1:3],(1:3)-adj, groupMeans$x[1:3] + groupSD$x[1:3],
       length=.05, angle=90, code=3, lwd=3,col =  colours[1:3] )
points((1:3)-adj, groupMeans$x[1:3], pch = 21, 
       bg = c("white","white","white"), col =  colours[1:3], 
       cex = 3.5, lwd=3)

arrows((1:3)+adj, groupMeans$x[4:6] - groupSD$x[4:6],(1:3)+adj, groupMeans$x[4:6] + groupSD$x[4:6],
       length=.05, angle=90, code=3, lwd=3,col =  colours[1:3] )
points((1:3)+adj, groupMeans$x[4:6], pch = 21, 
       bg = c(colours[1:3]), col =  colours[1:3], 
       cex = 3.5, lwd=3)

legend("bottomright", c("Exercise 1", "Exercise 2"), pch = 21, col = colours[1], pt.bg = c("white", colours[1]) , bty = "n", cex = 1.2)

Interpretation

As expected, there is a main effect of diet, F(2,84) = 23.28, p < .001, \(\eta_p^2\) = .357. There was no main effect of Exercise, F(1,84) = 0.11, p = .737, \(\eta_p^2\) = .001, nor any interaction effect of Diet and Exercise, F(2,84) = 0.478, p = .622, \(\eta_p^2\) = .011 (this is true regardless of what the lines look like in your visualizations - means may appear different or there may appear to be an interaction, but that does not mean is accounts for a sufficient amount of variance to reach statistical significance or explain a meaningful proportion of variance in the model.)

The main effect of diet was partitioned out by the planned comparisons defined at the start of the study. We defined an orthogonal contrast comparison where weight loss of participants in the diet groups were compared to those of the control group with a complex contrast defined as [-1,0.5,0.5] and the weight loss of participants in either diet group was compared with the simple contrast [0,1,-1]. Participants in either diet group lost approximately 3 kg more than participants in the control group (\(m_{diff} = 3.35\), 95% CI [2.29, 4.41]). Additionally, our diet did outperform the diet from the competitor, our participants lost about 1.6kg more than participants in the competing group (\(m_{diff} = 1.61\), 95% CI [0.38, 2.83]). As predicted, our new revolutionary diet can help individuals lose more weight than the best competing product already available, regardless of the individuals exercise regiment.

When you report your effects, if you say participants in group A loss X more than participants in group B, but the difference was not statistically significant that is kind of meaningless. If a mean difference is not statistically significant, then you accept the null hypothesis of no difference (i.e., 0) so there is no difference between group A and group B - do not write that there is a difference but then state that the difference is not statistically significant.