Using the built-in Mammal Sleep dataset (msleep), I want to see what a regression analysis of body weight against brain weight in mammals shows. This will be a simple linear regression. If the data call for other techniques, I will treat the simple linear regression as having failed.
library(tidyverse)  # provides msleep (via ggplot2), %>% and ggplot()
summary(msleep)
## name genus vore
## Length:83 Length:83 Length:83
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## order conservation sleep_total sleep_rem
## Length:83 Length:83 Min. : 1.90 Min. :0.100
## Class :character Class :character 1st Qu.: 7.85 1st Qu.:0.900
## Mode :character Mode :character Median :10.10 Median :1.500
## Mean :10.43 Mean :1.875
## 3rd Qu.:13.75 3rd Qu.:2.400
## Max. :19.90 Max. :6.600
## NA's :22
## sleep_cycle awake brainwt bodywt
## Min. :0.1167 Min. : 4.10 Min. :0.00014 Min. : 0.005
## 1st Qu.:0.1833 1st Qu.:10.25 1st Qu.:0.00290 1st Qu.: 0.174
## Median :0.3333 Median :13.90 Median :0.01240 Median : 1.670
## Mean :0.4396 Mean :13.57 Mean :0.28158 Mean : 166.136
## 3rd Qu.:0.5792 3rd Qu.:16.15 3rd Qu.:0.12550 3rd Qu.: 41.750
## Max. :1.5000 Max. :22.10 Max. :5.71200 Max. :6654.000
## NA's :51 NA's :27
msleep %>%
  ggplot(aes(bodywt, brainwt)) +
  geom_point() +
  geom_smooth(method = lm, se = FALSE) +
  labs(title = "Original Data", x = "Body Weight", y = "Brain Weight") +
  theme_minimal()
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 27 rows containing non-finite values (stat_smooth).
## Warning: Removed 27 rows containing missing values (geom_point).
There seem to be several extreme outliers in the data. We will move forward, but it does not look promising for a straight-line fit.
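To see which animals sit far from the rest, the rows can be sorted by body weight. This is a minimal sketch, assuming ggplot2 is installed so that msleep is available; the names `complete` and `heaviest` are introduced here for illustration only.

```r
library(ggplot2)  # msleep ships with ggplot2

# Keep only the rows with a recorded brain weight
# (the rows that survive the "Removed 27 rows" warnings)
complete <- msleep[!is.na(msleep$brainwt), c("name", "bodywt", "brainwt")]

# Sort from heaviest to lightest and show the top few rows
heaviest <- complete[order(-complete$bodywt), ]
head(heaviest, 5)
```

The first rows of this listing are the points stretched out along the right-hand side of the scatter plot.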
# Note: as written, body weight is the response and brain weight is the predictor
msleep_lm <- lm(msleep$bodywt ~ msleep$brainwt)
summary(msleep_lm)
##
## Call:
## lm(formula = msleep$bodywt ~ msleep$brainwt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1564.96 7.88 43.41 50.29 1538.88
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -51.73 47.54 -1.088 0.281
## msleep$brainwt 904.56 47.17 19.176 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 341.6 on 54 degrees of freedom
## (27 observations deleted due to missingness)
## Multiple R-squared: 0.8719, Adjusted R-squared: 0.8696
## F-statistic: 367.7 on 1 and 54 DF, p-value: < 2.2e-16
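The call above has body weight on the left of the formula, so it models body weight from brain weight. If the aim is to estimate brain weight from body weight, the formula would be reversed. The sketch below assumes that direction is the intended one; `brain_on_body` is a name made up for this example. In a simple linear regression the R-squared is the same in either direction, but the coefficients and residuals are not.

```r
library(ggplot2)  # msleep ships with ggplot2

# Brain weight as the response, body weight as the predictor.
# Passing data = msleep keeps the coefficient names short (bodywt
# rather than msleep$bodywt) and lets predict() work on new data.
brain_on_body <- lm(brainwt ~ bodywt, data = msleep)

# Rows with a missing brainwt are dropped automatically, as before
summary(brain_on_body)
```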
msleep_lm %>%
  ggplot(aes(fitted(msleep_lm), resid(msleep_lm))) +
  geom_point() +
  geom_smooth(method = lm, se = FALSE) +
  labs(title = "Residual Data", x = "Fitted", y = "Residual") +
  theme_minimal()
## `geom_smooth()` using formula 'y ~ x'
Again, the outliers play havoc with the residuals.
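One way to put a number on how much individual points pull on the fit is Cook's distance. The sketch below refits the same model on the complete rows so that it runs on its own. The 4 / n cutoff is a common rule of thumb and an assumption here, not a formal test, and the names `complete`, `fit` and `flagged` are illustrative.

```r
library(ggplot2)  # msleep ships with ggplot2

# Rows with a recorded brain weight, as a plain data frame
complete <- as.data.frame(
  msleep[!is.na(msleep$brainwt), c("name", "bodywt", "brainwt")]
)

# Same model as msleep_lm: body weight regressed on brain weight
fit <- lm(bodywt ~ brainwt, data = complete)

# Cook's distance: how much the fitted values move when a row is left out
complete$cooks <- cooks.distance(fit)

# Flag rows above the rule-of-thumb cutoff and list the worst first
flagged <- complete[complete$cooks > 4 / nrow(complete), ]
flagged[order(-flagged$cooks), ]
```

Rows flagged here are candidates for a closer look, not automatic deletions.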
msleep_lm %>%
  ggplot(aes(sample = resid(msleep_lm))) +
  stat_qq() +
  stat_qq_line() +
  labs(title = "Q-Q Plot") +
  theme_minimal()
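Using simple linear regression, I cannot say that you can estimate the size of a mammal's brain from the size of its body. The R-squared of 0.87 looks strong, but the residuals range from about -1565 to 1539, so a handful of extreme points dominate the fit. The outliers would have to be dealt with using more advanced techniques before that comparison can be made.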