Discussion 11

Using the built-in Mammal Sleep dataset (msleep), I want to see what a regression analysis of body weight against brain weight in mammals shows. This will be a simple linear regression. If the data require other techniques, I will treat the simple regression as having failed.

library(tidyverse)  # provides ggplot2 (with the msleep data) and the %>% pipe
summary(msleep)
##      name              genus               vore          
##  Length:83          Length:83          Length:83         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##     order           conservation        sleep_total      sleep_rem    
##  Length:83          Length:83          Min.   : 1.90   Min.   :0.100  
##  Class :character   Class :character   1st Qu.: 7.85   1st Qu.:0.900  
##  Mode  :character   Mode  :character   Median :10.10   Median :1.500  
##                                        Mean   :10.43   Mean   :1.875  
##                                        3rd Qu.:13.75   3rd Qu.:2.400  
##                                        Max.   :19.90   Max.   :6.600  
##                                                        NA's   :22     
##   sleep_cycle         awake          brainwt            bodywt        
##  Min.   :0.1167   Min.   : 4.10   Min.   :0.00014   Min.   :   0.005  
##  1st Qu.:0.1833   1st Qu.:10.25   1st Qu.:0.00290   1st Qu.:   0.174  
##  Median :0.3333   Median :13.90   Median :0.01240   Median :   1.670  
##  Mean   :0.4396   Mean   :13.57   Mean   :0.28158   Mean   : 166.136  
##  3rd Qu.:0.5792   3rd Qu.:16.15   3rd Qu.:0.12550   3rd Qu.:  41.750  
##  Max.   :1.5000   Max.   :22.10   Max.   :5.71200   Max.   :6654.000  
##  NA's   :51                       NA's   :27
msleep%>%
  ggplot(aes(bodywt, brainwt))+
  geom_point()+
  geom_smooth(method = lm, se = F)+
  labs(title = "Original Data", x="Body Weight", y="Brain Weight")+
  theme_minimal()
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 27 rows containing non-finite values (stat_smooth).
## Warning: Removed 27 rows containing missing values (geom_point).

There seem to be several extreme outliers in the data: the median body weight is 1.67, while the maximum is 6654. I will move forward, but it does not look good.
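
To see which animals sit at the extreme end of the plot, the rows can be sorted by body weight. This is only a minimal sketch and not part of the original analysis; the name `heaviest` and the choice of the top five rows are assumptions made for illustration.

```r
library(dplyr)
library(ggplot2)  # provides the msleep dataset

# Keep only rows where both weights are recorded (the same rows the plot uses),
# then list the five heaviest animals alongside their brain weights.
heaviest <- msleep %>%
  filter(!is.na(brainwt), !is.na(bodywt)) %>%
  arrange(desc(bodywt)) %>%
  select(name, bodywt, brainwt) %>%
  head(5)

print(heaviest)
```

Comparing these rows with the median values in the summary above gives a rough sense of how far the largest animals sit from the rest of the data.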

msleep_lm<-lm(msleep$bodywt ~ msleep$brainwt)
summary(msleep_lm)
## 
## Call:
## lm(formula = msleep$bodywt ~ msleep$brainwt)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1564.96     7.88    43.41    50.29  1538.88 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -51.73      47.54  -1.088    0.281    
## msleep$brainwt   904.56      47.17  19.176   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 341.6 on 54 degrees of freedom
##   (27 observations deleted due to missingness)
## Multiple R-squared:  0.8719, Adjusted R-squared:  0.8696 
## F-statistic: 367.7 on 1 and 54 DF,  p-value: < 2.2e-16
msleep_lm %>%
  ggplot(aes(fitted(msleep_lm),resid(msleep_lm)))+
  geom_point()+
  geom_smooth(method = lm, se = F)+
  labs(title = "Residual Data", x="Fitted", y="Residual")+
  theme_minimal()
## `geom_smooth()` using formula 'y ~ x'

Again, the outliers play havoc with the residuals: they range from about -1565 to 1539, while the middle half falls between roughly 8 and 50.
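
One way to check which observations drive the fit is Cook's distance. The sketch below refits the same model on complete cases so that the rows line up with the fitted values. The names `complete`, `fit`, and `influential` are hypothetical, and the `4 / n` cutoff is a common rule of thumb assumed here, not something taken from the original analysis.

```r
library(ggplot2)  # provides the msleep dataset

# Keep the rows with a recorded brain weight so rows and residuals line up.
complete <- msleep[!is.na(msleep$brainwt), c("name", "bodywt", "brainwt")]

# Same model direction as the original fit: body weight on brain weight.
fit <- lm(bodywt ~ brainwt, data = complete)

# Cook's distance measures how much each observation shifts the fitted line.
complete$cooks_d <- cooks.distance(fit)

# Assumed rule of thumb: flag observations with distance above 4 / n.
threshold <- 4 / nrow(complete)
influential <- complete[complete$cooks_d > threshold, ]

# Most influential observations first.
print(influential[order(-influential$cooks_d), ])
```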

msleep_lm %>%
  ggplot(aes(sample=resid(msleep_lm)))+
  stat_qq()+
  stat_qq_line()+
  labs(title = "Q-Q Plot")+
  theme_minimal()
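
The Q-Q plot can be paired with a formal check of residual normality. The following is a sketch using the Shapiro-Wilk test from base R. It is an addition, not part of the original analysis, and the names `fit` and `sw` are assumptions.

```r
library(ggplot2)  # provides the msleep dataset

# Refit the model; lm() drops rows with missing brainwt by default.
fit <- lm(bodywt ~ brainwt, data = msleep)

# Shapiro-Wilk test: a small p-value suggests the residuals are not normal.
sw <- shapiro.test(resid(fit))
print(sw)
```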

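Using simple linear regression, I cannot say that you can estimate the size of a mammal's brain from the size of its body. Although the fit reports an R-squared of 0.87, the outliers dominate the residuals, and they would have to be dealt with using more advanced techniques before that comparison could be made. Note also that the model above was fit as bodywt ~ brainwt, so it predicts body weight from brain weight; the formula would need to be reversed to estimate brain weight.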
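
As one possible example of such a technique, both weights could be put on a log scale before fitting. This is only a hedged sketch of a common approach for heavily skewed data, not something the analysis above carried out. The names `complete`, `log_fit`, and `log_plot` are assumptions, and the formula is written to predict brain weight from body weight.

```r
library(ggplot2)  # provides the msleep dataset

# Keep the rows with a recorded brain weight.
complete <- msleep[!is.na(msleep$brainwt), ]

# Fit brain weight on body weight, both on a log10 scale.
log_fit <- lm(log10(brainwt) ~ log10(bodywt), data = complete)
print(summary(log_fit))

# Same scatterplot as before, but with log-scaled axes; geom_smooth()
# fits its line on the transformed values.
log_plot <- ggplot(complete, aes(bodywt, brainwt)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  scale_x_log10() +
  scale_y_log10() +
  labs(title = "Log-Log Data", x = "Body Weight (log10)", y = "Brain Weight (log10)") +
  theme_minimal()

print(log_plot)
```

The residual and Q-Q checks would need to be repeated on `log_fit` before drawing any conclusion from it.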