Chickens on different diets.

A spot of analysis, as an example of how to assess scientifically whether one way of doing things (say, a certain diet) is better than others.

We’re looking here at data about some chickens. Each chicken was fed one of 4 different diets over a period of 3 weeks. Their weights were recorded every second day, with a final measurement on day 21. Let us glance at a summary.

## Load dplyr for the pipe and the grouping verbs
library(dplyr)
## Tabulate chickens by diet
ChickWeight %>% group_by(Diet) %>% summarize(number_of_chicken = n_distinct(Chick))
## # A tibble: 4 x 2
##   Diet  number_of_chicken
##   <fct>             <int>
## 1 1                    20
## 2 2                    10
## 3 3                    10
## 4 4                    10
## Tabulate chickens by time for which they were tracked
ChickWeight %>% group_by(Time) %>% summarize(number_of_chicken = n_distinct(Chick))
## # A tibble: 12 x 2
##     Time number_of_chicken
##    <dbl>             <int>
##  1     0                50
##  2     2                50
##  3     4                49
##  4     6                49
##  5     8                49
##  6    10                49
##  7    12                49
##  8    14                48
##  9    16                47
## 10    18                47
## 11    20                46
## 12    21                45

You can see from the above that 50 different chickens were tracked over a period of 3 weeks. Each was fed one of four different diets: 20 of them were fed diet 1, and 10 each were fed diets 2, 3 and 4.

Data Wrangling

However, not all chickens were tracked to the end of the 21 days. Of the 50 chickens, only 45 are left by day 21. In the absence of other information, the dark side of me is going to assume that 5 of them died. In my analysis, I stick to the 45 for whom I have data for the whole 21 days.

## Convert the factor variable Chick to a numeric ID
ChickWeight$intchick <- as.numeric(as.character(ChickWeight$Chick))
## Identify survivors: chickens that still have a measurement on day 21
survivors <- ChickWeight %>% filter(Time == 21) %>% distinct(intchick)
## Keep only the survivors
ChickWeight2 <- filter(ChickWeight, intchick %in% survivors$intchick)
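
The plot and the model further down use a data frame called `Chickweight_delta` with columns `Diet.x` and `DeltaWeight`, but the step that creates it is not shown. The `.x` suffix suggests that two subsets were joined on `Chick`. Here is a minimal sketch of one way it could have been built, assuming that `DeltaWeight` is the weight on day 21 minus the weight on day 0. The helper names `cw`, `survivor_ids`, `start_weights` and `end_weights` are made up for this illustration.

```r
library(dplyr)

# ChickWeight ships with R (datasets package); as.data.frame() drops the
# groupedData classes so it behaves like a plain data frame.
cw <- as.data.frame(ChickWeight)

# Chickens that still have a measurement on day 21
survivor_ids <- cw %>% filter(Time == 21) %>% pull(Chick)

# ASSUMPTION: weight change = weight on day 21 minus weight on day 0
start_weights <- cw %>% filter(Time == 0, Chick %in% survivor_ids)
end_weights   <- cw %>% filter(Time == 21)

# Joining on Chick gives the shared columns a .x (day 0) or .y (day 21) suffix
Chickweight_delta <- inner_join(start_weights, end_weights, by = "Chick") %>%
  mutate(DeltaWeight = weight.y - weight.x)
```

The result should have one row per surviving chicken, with `Diet.x` holding the diet and `DeltaWeight` the change in weight in grams.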

Now that we have a dataset of just the 45 chickens we are interested in, let us look at some plots.

Scatter Plots by Diet

Stats summary plot

Let’s look at a summary of the change in weights for each chicken over 21 days, grouped by the diet they followed.

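library(ggplot2)
## Mean change in weight per diet, with whiskers from the minimum to the maximum.
## fun, fun.min and fun.max replace fun.y, fun.ymin and fun.ymax as of ggplot2 3.3.0.
ggplot(Chickweight_delta) +
  stat_summary(mapping = aes(x = Diet.x, y = DeltaWeight),
               fun = mean, fun.min = min, fun.max = max) +
  labs(title = 'Change in weight over 21 days is lowest for Diet 1 and highest for Diet 3',
       x = 'Diet', y = 'Change in weight (g)')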

Statistical Analysis

Visually, it may look like we’ve found a winner in diet 3. But how can we put a level of certainty on that in a mathematically rigorous manner? An ANOVA, or analysis of variance, is often used in such situations. Simply put, an ANOVA tells us whether the variation in weight change between the diet groups is large compared with the variation among chickens on the same diet. After all, chickens all grow at different rates, so can we be sure that the differences we see are due to diet? With diet as the only predictor, the F-test on the last line of the linear-model summary is exactly this one-way ANOVA test.
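
For reference, the comparison described above can be written down as the usual one-way ANOVA F statistic. The notation is introduced here for illustration and is not from the original analysis: k is the number of diets, N the total number of chickens, n_g the number of chickens on diet g, y_gi the weight change of chicken i on diet g, and the bars denote the group mean and the overall mean.

```latex
F = \frac{\mathrm{SS}_{\text{between}} / (k - 1)}{\mathrm{SS}_{\text{within}} / (N - k)},
\qquad
\mathrm{SS}_{\text{between}} = \sum_{g=1}^{k} n_g \,(\bar{y}_g - \bar{y})^2,
\qquad
\mathrm{SS}_{\text{within}} = \sum_{g=1}^{k} \sum_{i=1}^{n_g} (y_{gi} - \bar{y}_g)^2
```

With k = 4 diets and N = 45 chickens, the degrees of freedom are 3 and 41, which is what the F-statistic line of the model summary reports.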

summary(lm( DeltaWeight ~ Diet.x, Chickweight_delta))
## 
## Call:
## lm(formula = DeltaWeight ~ Diet.x, data = Chickweight_delta)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -142.000  -42.000   -0.667   38.813  127.813 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   136.19      16.05   8.483 1.45e-10 ***
## Diet.x2        37.81      25.89   1.461 0.151738    
## Diet.x3        93.31      25.89   3.604 0.000839 ***
## Diet.x4        61.48      26.76   2.298 0.026762 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 64.22 on 41 degrees of freedom
## Multiple R-squared:  0.2558, Adjusted R-squared:  0.2014 
## F-statistic: 4.698 on 3 and 41 DF,  p-value: 0.006551
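
To see where the F-statistic on the last line comes from, it can be rebuilt by hand from the between-diet and within-diet sums of squares. The sketch below is self-contained, so it first recreates the weight changes in base R under the assumption that `DeltaWeight` is the day-21 weight minus the day-0 weight (the original construction is not shown). The names `cw`, `delta`, `f_manual` and `p_manual` are illustrative only.

```r
# ASSUMPTION: DeltaWeight = weight on day 21 minus weight on day 0.
# merge() keeps only chickens present on both days and adds .x / .y suffixes.
cw    <- as.data.frame(ChickWeight)
delta <- merge(cw[cw$Time == 0, ], cw[cw$Time == 21, ], by = "Chick")
delta$DeltaWeight <- delta$weight.y - delta$weight.x

k <- nlevels(delta$Diet.x)   # number of diets
N <- nrow(delta)             # number of chickens

grand_mean <- mean(delta$DeltaWeight)
group_mean <- ave(delta$DeltaWeight, delta$Diet.x)  # each row gets its diet's mean

# Variation of the diet means around the overall mean
ss_between <- sum((group_mean - grand_mean)^2)
# Variation of individual chickens around their own diet's mean
ss_within  <- sum((delta$DeltaWeight - group_mean)^2)

f_manual <- (ss_between / (k - 1)) / (ss_within / (N - k))
p_manual <- pf(f_manual, k - 1, N - k, lower.tail = FALSE)

# The same statistic as reported by the linear model
fit  <- lm(DeltaWeight ~ Diet.x, data = delta)
f_lm <- summary(fit)$fstatistic[["value"]]
```

If the assumption about `DeltaWeight` holds, `f_manual` and `f_lm` agree with each other and with the value printed in the summary, and `p_manual` is the overall p-value for the hypothesis that all four diets produce the same mean weight change.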