Using R, build a regression model for data that interests you. Conduct residual analysis. Was the linear model appropriate? Why or why not?

Brain vs Body Mass for Animals

Data Preview

These data represent the relationship between body weight and brain weight across various species. Below is a preview of the dataset.

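# Read the whitespace-delimited file; it has no header row
brain_body <- read.csv('brain_body.txt', header = FALSE, sep = '')
# Drop the first column, which is not used in this analysis
brain_body <- brain_body[, -1]
colnames(brain_body) <- c('Brain_wt', 'Body_wt')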

head(brain_body)
##   Brain_wt Body_wt
## 1    3.385    44.5
## 2    0.480    15.5
## 3    1.350     8.1
## 4  465.000   423.0
## 5   36.330   119.5
## 6   27.660   115.0

The dataset can be found here. Similar datasets are available on the homepage.

The Linear Model

bb_lm <- lm(brain_body$Body_wt ~ brain_body$Brain_wt)

plot(brain_body$Brain_wt, brain_body$Body_wt)
abline(bb_lm)

summary(bb_lm)
## 
## Call:
## lm(formula = brain_body$Body_wt ~ brain_body$Brain_wt)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -810.07  -88.52  -79.64  -13.02 2050.33 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         91.00440   43.55258    2.09   0.0409 *  
## brain_body$Brain_wt  0.96650    0.04766   20.28   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 334.7 on 60 degrees of freedom
## Multiple R-squared:  0.8727, Adjusted R-squared:  0.8705 
## F-statistic: 411.2 on 1 and 60 DF,  p-value: < 2.2e-16

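A few things stand out in this model. The residual standard error estimates the typical size of a residual, in the units of the response; here it is 334.7. That is large relative to most of the body weights in the data: the six rows previewed above range from 8.1 to 423. The residuals are also strongly right-skewed (median -79.64, maximum 2050.33), which suggests that a straight line on the original scale does not describe the data well.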
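One way to judge whether a residual standard error is "high" is to compare it with a typical value of the response. The sketch below does this on the six preview rows only (the full `brain_body.txt` file is assumed unavailable here), so its numbers will not match the summary output above; the names `preview`, `fit`, `rse`, and `typical` are illustrative.

```r
# First six rows copied from the data preview; the full data set has more rows,
# so these results are for illustration only.
preview <- data.frame(
  Brain_wt = c(3.385, 0.480, 1.350, 465.000, 36.330, 27.660),
  Body_wt  = c(44.5, 15.5, 8.1, 423.0, 119.5, 115.0)
)

# Same model form as bb_lm, written with a data argument
fit <- lm(Body_wt ~ Brain_wt, data = preview)

rse     <- sigma(fit)               # residual standard error of the fit
typical <- median(preview$Body_wt)  # robust "typical" response value

cat("Residual standard error:", round(rse, 2), "\n")
cat("Median response:        ", round(typical, 2), "\n")
cat("Ratio RSE / median:     ", round(rse / typical, 2), "\n")
```

Running the same three lines on `bb_lm` and `brain_body$Body_wt` would give the comparison for the full data.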

A log transformation of both variables may help reveal the relationship more clearly.

bb2_lm <- lm(log(brain_body$Body_wt) ~ log(brain_body$Brain_wt))

plot(log(brain_body$Brain_wt), log(brain_body$Body_wt))
abline(bb2_lm)

As suspected, there is a positive correlation between brain size and body size. On the original scale most of the data are lumped in the lower-left corner of the plot; the log transformation spreads the points out more evenly. The positive correlation means that large animals tend to have larger brains than small animals. It is possible that a larger brain leaves more room to handle complex cognitive tasks. There are also some notable outliers that may have influenced the regression.
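Whether an outlier actually influences the fit can be checked with Cook's distance. The following is a minimal sketch on the six preview rows only, so it says nothing definitive about the full data; the `4 / n` cutoff is a common rule of thumb rather than a formal test, and `preview`, `fit_log`, and `flagged` are illustrative names.

```r
# First six rows copied from the data preview (illustration only)
preview <- data.frame(
  Brain_wt = c(3.385, 0.480, 1.350, 465.000, 36.330, 27.660),
  Body_wt  = c(44.5, 15.5, 8.1, 423.0, 119.5, 115.0)
)

# Log-log model, same form as bb2_lm
fit_log <- lm(log(Body_wt) ~ log(Brain_wt), data = preview)

# Cook's distance: how much the fitted values change when a point is dropped
cd <- cooks.distance(fit_log)

# Rule of thumb (an assumption, not a strict test): flag points above 4 / n
flagged <- which(cd > 4 / nrow(preview))

print(round(cd, 3))
print(flagged)
```

Applied to the full model, `cooks.distance(bb2_lm)` would point to the species that pull the regression line the most.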

Evaluating the quality of the model

summary(bb2_lm)
## 
## Call:
## lm(formula = log(brain_body$Body_wt) ~ log(brain_body$Brain_wt))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.71550 -0.49228 -0.06162  0.43597  1.94829 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               2.13479    0.09604   22.23   <2e-16 ***
## log(brain_body$Brain_wt)  0.75169    0.02846   26.41   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6943 on 60 degrees of freedom
## Multiple R-squared:  0.9208, Adjusted R-squared:  0.9195 
## F-statistic: 697.4 on 1 and 60 DF,  p-value: < 2.2e-16

The formula for the model, where log is the natural logarithm, is \[\widehat{\log(body\_wt)} = 2.13479 + 0.75169 \cdot \log(brain\_wt)\]
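Because the model was fitted on log-transformed variables, its coefficients describe a power law on the original scale: exponentiating both sides gives body_wt ≈ exp(2.13479) · brain_wt^0.75169. The sketch below back-transforms predictions using the coefficients printed above; `predict_body_wt` and `power_law` are hypothetical helper names, and the back-transformed value is best read as an estimate of the median rather than the mean response.

```r
# Coefficients copied from the summary(bb2_lm) output
intercept <- 2.13479
slope     <- 0.75169

# Hypothetical helper: prediction on the original scale,
# obtained by exponentiating the fitted log value
predict_body_wt <- function(brain_wt) {
  exp(intercept + slope * log(brain_wt))
}

# The same prediction written explicitly as a power law
power_law <- function(brain_wt) {
  exp(intercept) * brain_wt^slope
}

# Predictions for a few illustrative Brain_wt values
print(predict_body_wt(c(1, 10, 100)))
```

A slope below 1 on the log-log scale means the response grows less than proportionally with the predictor.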

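For this model, 92.08% of the variability in log body weight is explained by log brain weight. The p-value for the slope is below 2e-16, so the association is statistically significant: changes in brain size are related to the size of the body. The residual standard error (0.6943) is numerically much smaller than before, but it is measured in log units and cannot be compared directly with the 334.7 of the first model. On the original scale it corresponds to a typical multiplicative error of about exp(0.6943) ≈ 2. Better evidence of an improved fit is that the residuals are now roughly symmetric around zero (median -0.06, range -1.72 to 1.95).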
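In a simple regression with one predictor, R-squared equals the squared correlation between predictor and response. The sketch below checks that identity for the log-log model on the six preview rows only, so the value differs from the 0.9208 reported for the full data; `preview`, `fit_log`, `r`, and `r_squared` are illustrative names.

```r
# First six rows copied from the data preview (illustration only)
preview <- data.frame(
  Brain_wt = c(3.385, 0.480, 1.350, 465.000, 36.330, 27.660),
  Body_wt  = c(44.5, 15.5, 8.1, 423.0, 119.5, 115.0)
)

# Log-log model, same form as bb2_lm
fit_log <- lm(log(Body_wt) ~ log(Brain_wt), data = preview)

# Pearson correlation between the transformed variables
r <- cor(log(preview$Brain_wt), log(preview$Body_wt))

# R-squared as reported by summary()
r_squared <- summary(fit_log)$r.squared

# The two quantities agree up to floating-point error
print(all.equal(r^2, r_squared))
```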

Residual Analysis

As stated within the text, residual analysis tells us about the model’s quality. Let’s explore.

# Plot residuals against the predictor actually used in the model (log scale)
plot(bb2_lm$residuals ~ log(brain_body$Brain_wt), main='Residuals')
abline(h = 0, lty = 3)

hist(bb2_lm$residuals)

qqnorm(bb2_lm$residuals)
qqline(bb2_lm$residuals)

Based on the histogram and Q-Q plot above, the residuals are nearly normal. The residual plot shows variability that appears roughly constant around zero.
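The visual checks can be complemented by a formal normality test and a residuals-versus-fitted plot. This is a minimal sketch on the six preview rows only; with so few points the Shapiro-Wilk test has little power, so it illustrates the calls rather than confirming the conclusion. The names `preview`, `fit_log`, `res`, and `sw` are illustrative.

```r
# First six rows copied from the data preview (illustration only)
preview <- data.frame(
  Brain_wt = c(3.385, 0.480, 1.350, 465.000, 36.330, 27.660),
  Body_wt  = c(44.5, 15.5, 8.1, 423.0, 119.5, 115.0)
)

# Log-log model, same form as bb2_lm
fit_log <- lm(log(Body_wt) ~ log(Brain_wt), data = preview)
res <- residuals(fit_log)

# Shapiro-Wilk test: a small p-value would be evidence against normality
sw <- shapiro.test(res)
print(sw)

# Residuals vs fitted values: look for a fan shape or curvature
plot(fitted(fit_log), res,
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted")
abline(h = 0, lty = 3)
```

On the full model the equivalent calls would be `shapiro.test(bb2_lm$residuals)` and `plot(fitted(bb2_lm), bb2_lm$residuals)`.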

Conclusion

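The correlation between brain size and body size is positive, and the relationship is linear on the log-log scale. The residuals of the log-log model are approximately normally distributed. There is a reason for some of the outliers. For instance, small animals such as small birds tend to have a larger brain-to-body mass ratio than humans do (about 1/12 in small birds versus 1/40 in humans). See for yourself here. A linear model is appropriate for these data once both variables are log-transformed; on the original scale the residuals were too large and too skewed for a straight line to fit well.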