These data describe the relationship between body weight and brain weight across various species. Below is a preview of the dataset.
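# Read the whitespace-delimited file; it has no header row.
brain_body <- read.csv('brain_body.txt', header = FALSE, sep = '')
# Drop the first column and name the remaining two.
brain_body <- brain_body[,-1]
colnames(brain_body) <- c('Brain_wt', 'Body_wt')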
head(brain_body)
## Brain_wt Body_wt
## 1 3.385 44.5
## 2 0.480 15.5
## 3 1.350 8.1
## 4 465.000 423.0
## 5 36.330 119.5
## 6 27.660 115.0
The dataset can be found here. Other, similar datasets are available on the homepage.
bb_lm <- lm(brain_body$Body_wt ~ brain_body$Brain_wt)
plot(brain_body$Brain_wt, brain_body$Body_wt)
abline(bb_lm)
summary(bb_lm)
##
## Call:
## lm(formula = brain_body$Body_wt ~ brain_body$Brain_wt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -810.07 -88.52 -79.64 -13.02 2050.33
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 91.00440 43.55258 2.09 0.0409 *
## brain_body$Brain_wt 0.96650 0.04766 20.28 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 334.7 on 60 degrees of freedom
## Multiple R-squared: 0.8727, Adjusted R-squared: 0.8705
## F-statistic: 411.2 on 1 and 60 DF, p-value: < 2.2e-16
Notice a few things about this model. The residual standard error, which measures how far the observed values typically fall from the fitted line, is 334.7. This is too high.
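To make the residual standard error less abstract, here is a minimal sketch of how it is computed. It uses only the six preview rows printed by head(brain_body) above, not the full dataset, so the numbers will not match the 334.7 reported by summary(bb_lm). The names `preview`, `fit`, `rse_manual`, `rse_builtin`, and `rse_relative` are illustrative and do not appear in the original analysis.

```r
# Toy data: the six preview rows shown above, not the full dataset.
preview <- data.frame(
  Brain_wt = c(3.385, 0.480, 1.350, 465.000, 36.330, 27.660),
  Body_wt  = c(44.5, 15.5, 8.1, 423.0, 119.5, 115.0)
)

# Same model form as bb_lm, written with the data argument.
fit <- lm(Body_wt ~ Brain_wt, data = preview)

# Residual standard error by hand: sqrt(RSS / residual degrees of freedom).
rse_manual <- sqrt(sum(residuals(fit)^2) / df.residual(fit))

# Built-in accessor for the same quantity.
rse_builtin <- sigma(fit)

# Express the error relative to the average size of the response.
rse_relative <- rse_builtin / mean(preview$Body_wt)

print(c(manual = rse_manual, builtin = rse_builtin, relative = rse_relative))
```

Because the residual standard error is in the units of the response, dividing it by a typical response value is one rough way to judge whether it is "high".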
A log transformation may help show the relationship more clearly.
bb2_lm <- lm(log(brain_body$Body_wt) ~ log(brain_body$Brain_wt))
plot(log(brain_body$Brain_wt), log(brain_body$Body_wt))
abline(bb2_lm)
As suspected, there is a positive correlation between brain size and body size, even though most of the data tend to stay lumped in the left corner of the plot. A positive correlation means that large animals tend to have larger brains than small animals. It is possible that a larger brain leaves more room to handle complex cognitive tasks. There are also some notable outliers that may have influenced the regression.
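Whether particular points influenced the regression can be checked numerically. The following is a minimal sketch, assuming Cook's distance as the influence measure (the original analysis does not compute it) and using only the six preview rows shown earlier, so it illustrates the method rather than the result for the full data. The names `preview`, `fit_log`, `cooks`, and the 4/n cutoff are illustrative choices.

```r
# Toy data: the six preview rows shown earlier, not the full dataset.
preview <- data.frame(
  Brain_wt = c(3.385, 0.480, 1.350, 465.000, 36.330, 27.660),
  Body_wt  = c(44.5, 15.5, 8.1, 423.0, 119.5, 115.0)
)

# Log-log model with the same form as bb2_lm.
fit_log <- lm(log(Body_wt) ~ log(Brain_wt), data = preview)

# Cook's distance: how much the fitted values change when each row is dropped.
cooks <- cooks.distance(fit_log)

# A common rule of thumb (not a formal test) flags points above 4 / n.
threshold <- 4 / nrow(preview)
influential <- which(cooks > threshold)

print(round(cooks, 3))
print(influential)
```

On the full dataset the same calls would apply to bb2_lm directly, for example cooks.distance(bb2_lm).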
summary(bb2_lm)
##
## Call:
## lm(formula = log(brain_body$Body_wt) ~ log(brain_body$Brain_wt))
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.71550 -0.49228 -0.06162 0.43597 1.94829
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.13479 0.09604 22.23 <2e-16 ***
## log(brain_body$Brain_wt) 0.75169 0.02846 26.41 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6943 on 60 degrees of freedom
## Multiple R-squared: 0.9208, Adjusted R-squared: 0.9195
## F-statistic: 697.4 on 1 and 60 DF, p-value: < 2.2e-16
The formula for the model is \[\widehat{\log(body\_wt)} = 2.13479 + 0.75169 \cdot \log(brain\_wt)\]
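Because bb2_lm was fit with log() on both sides, its coefficients can also be read on the original scale. As a sketch, exponentiating the fitted equation gives a power law (the rounded constants are approximations, and the back-transformed value is better read as a typical body weight than as an exact mean): \[\widehat{body\_wt} \approx e^{2.13479} \cdot brain\_wt^{0.75169} \approx 8.46 \cdot brain\_wt^{0.75169}\]
Under this form, doubling brain weight multiplies the predicted body weight by about \(2^{0.75169} \approx 1.68\).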
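For this model, 92.08% of the variability in log body weight is explained by log brain weight. The p-value for the slope is very small (< 2e-16), so the predictor is statistically significant. Changes in brain size are therefore likely to be related to the size of the body. Also notice that the residual standard error is much smaller (0.6943), although it is now measured on the log scale and so is not directly comparable to the 334.7 from the first model. The higher R-squared (0.9208 versus 0.8727) also suggests a better fit.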
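The "variability explained" figure has a simple interpretation in a one-predictor model: R-squared is the square of the correlation between the predictor and the response. A minimal sketch of that identity, using only the six preview rows shown earlier (so the value will differ from 0.9208); the names `preview`, `fit_log`, `r_squared`, and `r` are illustrative:

```r
# Toy data: the six preview rows shown earlier, not the full dataset.
preview <- data.frame(
  Brain_wt = c(3.385, 0.480, 1.350, 465.000, 36.330, 27.660),
  Body_wt  = c(44.5, 15.5, 8.1, 423.0, 119.5, 115.0)
)

# Log-log model with the same form as bb2_lm.
fit_log <- lm(log(Body_wt) ~ log(Brain_wt), data = preview)

# R-squared as reported by summary().
r_squared <- summary(fit_log)$r.squared

# Pearson correlation between the logged variables.
r <- cor(log(preview$Brain_wt), log(preview$Body_wt))

# In simple linear regression the two agree: R^2 == r^2.
print(all.equal(r_squared, r^2))
```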
As stated within the text, residual analysis tells us about the model’s quality. Let’s explore.
# Plot residuals against the predictor as it enters the model (log scale).
plot(bb2_lm$residuals ~ log(brain_body$Brain_wt), main='Residuals')
abline(h = 0, lty = 3)
hist(bb2_lm$residuals)
qqnorm(bb2_lm$residuals)
qqline(bb2_lm$residuals)
Based on the histogram and Q-Q plot above, the residuals are nearly normal. Their variability also appears roughly constant.
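The visual check of normality can be supplemented with a numerical one. This sketch uses the Shapiro-Wilk test, which is an addition here and not part of the original analysis, and again runs on only the six preview rows shown earlier, so its p-value says nothing about the full data. The names `preview`, `fit_log`, `res`, and `sw` are illustrative.

```r
# Toy data: the six preview rows shown earlier, not the full dataset.
preview <- data.frame(
  Brain_wt = c(3.385, 0.480, 1.350, 465.000, 36.330, 27.660),
  Body_wt  = c(44.5, 15.5, 8.1, 423.0, 119.5, 115.0)
)

# Log-log model with the same form as bb2_lm.
fit_log <- lm(log(Body_wt) ~ log(Brain_wt), data = preview)
res <- residuals(fit_log)

# Shapiro-Wilk test: a small p-value is evidence against normal residuals.
sw <- shapiro.test(res)
print(sw$p.value)

# Least-squares residuals from a model with an intercept sum to (nearly) zero.
print(sum(res))
```

On the full dataset the equivalent call would be shapiro.test(bb2_lm$residuals); with a sample this small the test has little power, so the plots remain the main evidence.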
The relationship between brain size and body size is positive and, on the log scale, linear. The residuals are also approximately normally distributed. There is a reason for some of the outliers. For instance, small animals such as small birds tend to have a larger brain-to-body mass ratio than humans do (about 1/12 in small birds versus 1/40 in humans). See for yourself here. A linear model is a good fit for these data.