Grey Kangaroos

The data set contains two variables:

X is the nasal length of the kangaroos, in mm
Y is the nasal width of the kangaroos, in mm

These will be relabeled nasal_length and nasal_width.

Data Source: Data Repository

setwd('/Volumes/Document_Drive/CUNY_MS_ANALYTICS/Data 605/Assignment Eleven')
kangaroos <- read.csv('slr07.csv')
(head(kangaroos))
##     X   Y
## 1 609 241
## 2 629 222
## 3 620 233
## 4 564 207
## 5 645 247
## 6 493 189
colnames(kangaroos)<- c('nasal_length', 'nasal_width')
(head(kangaroos))
##   nasal_length nasal_width
## 1          609         241
## 2          629         222
## 3          620         233
## 4          564         207
## 5          645         247
## 6          493         189

Visualizing the Kangaroo Data

plot(kangaroos$nasal_length, kangaroos$nasal_width, pch = 19, col = 'red', main = 'Kangaroo Nose Width vs Nose Length', type = 'p', xlab = "Nasal Length (mm)", ylab='Nasal Width (mm)')

From the scatter plot there does indeed seem to be a general linear trend in the relationship between nasal lengths and widths of Grey Kangaroos. It is worth a closer look.
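One way to put a number on the linear trend seen in the scatter plot is the Pearson correlation coefficient. The sketch below is only illustrative: it assumes the six rows shown in the head() output above rather than the full file, so the value it produces will differ from the one for the complete data set.

```r
# First six rows, copied from the head() output above.
# Assumption: a small illustrative subset, not the full slr07.csv data.
nasal_length <- c(609, 629, 620, 564, 645, 493)
nasal_width  <- c(241, 222, 233, 207, 247, 189)

# Pearson correlation: values near +1 indicate a strong positive linear trend.
r <- cor(nasal_length, nasal_width)
r
```

On the full data the same call would be cor(kangaroos$nasal_length, kangaroos$nasal_width).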

Regression

kangaroo_lm <- lm(nasal_width ~ nasal_length, data = kangaroos)
summary(kangaroo_lm)
## 
## Call:
## lm(formula = nasal_width ~ nasal_length, data = kangaroos)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.656  -7.479   2.132   8.229  27.344 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   46.4508    16.2998    2.85  0.00669 ** 
## nasal_length   0.2876     0.0235   12.24 1.34e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.44 on 43 degrees of freedom
## Multiple R-squared:  0.7769, Adjusted R-squared:  0.7717 
## F-statistic: 149.7 on 1 and 43 DF,  p-value: 1.342e-15

Based on the coefficients of the linear regression, the ordinary least squares equation for this data set is \(\widehat{nasal\_width} = 0.2876 \cdot nasal\_length + 46.4508\). Based on \(R^2 = 0.7769\), 77.69% of the variance in nasal width is explained by nasal length.
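For a regression with a single predictor, the slope, intercept, and \(R^2\) reported by lm() can be reproduced from basic summary statistics: the slope is the covariance of the two variables divided by the variance of the predictor, the intercept follows from the means, and \(R^2\) is the squared correlation. The sketch below checks this on the six rows shown in the head() output. That subset is an assumption made so the example is self-contained; its coefficients will not match the full 45-observation fit above.

```r
# First six rows, copied from the head() output above (illustration only).
sample_roos <- data.frame(
  nasal_length = c(609, 629, 620, 564, 645, 493),
  nasal_width  = c(241, 222, 233, 207, 247, 189)
)
x <- sample_roos$nasal_length
y <- sample_roos$nasal_width

# Closed-form ordinary least squares estimates for one predictor.
slope     <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)
r_squared <- cor(x, y)^2

# Compare against lm() on the same subset; both checks should return TRUE.
fit <- lm(nasal_width ~ nasal_length, data = sample_roos)
all.equal(unname(coef(fit)), c(intercept, slope))
all.equal(summary(fit)$r.squared, r_squared)
```

The same identity means the correlation for the full data is the square root of 0.7769, roughly 0.88, with a positive sign because the slope is positive.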

plot(kangaroos$nasal_length, kangaroos$nasal_width, main ="Kangaroo Nose Length vs Nose Width with Regression Line", xlab = "Nose Length", ylab = "Nose Width", col='red', pch=19)
abline(kangaroo_lm, col="purple")

This regression line splits the data rather nicely up the middle.
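The fitted line can also be read as a prediction rule. The sketch below plugs the coefficients reported in the summary() output into a small helper; predict_width is a hypothetical name introduced here for illustration, and the input lengths are arbitrary values chosen near the range seen in the first rows of the data.

```r
# Coefficients as reported in the summary() output above.
intercept <- 46.4508
slope     <- 0.2876

# Hypothetical helper (not part of the original analysis):
# predicted nasal width in mm for a given nasal length in mm.
predict_width <- function(nasal_length) {
  intercept + slope * nasal_length
}

predict_width(c(500, 550, 600))
```

With the fitted model object available, predict(kangaroo_lm, newdata = data.frame(nasal_length = c(500, 550, 600))) should give the same values up to rounding of the coefficients.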

Residuals

# Residuals vs fitted values (qqplot() would compare the quantiles of the two samples instead)
plot(fitted(kangaroo_lm), resid(kangaroo_lm), col='red', pch=19)

qqnorm(resid(kangaroo_lm), col='red', pch=19)
qqline(resid(kangaroo_lm), col ='purple')

In the normal Q-Q plot, the residuals follow the reference line well in the middle but less so at either end. Based on these plots, the roughly 22% of the variance in nose width not explained by nose length appears to come mainly from the extremes at either end, where the residuals depart from the line.
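The unexplained share of variance is \(1 - R^2\), which equals the residual sum of squares divided by the total sum of squares. One way to check where that unexplained variation comes from is to look at each observation's share of the residual sum of squares. The sketch below does this on the six rows shown in the head() output, an assumption made to keep the example self-contained; running the same steps on kangaroo_lm would show the picture for the full data.

```r
# First six rows, copied from the head() output above (illustration only).
sample_roos <- data.frame(
  nasal_length = c(609, 629, 620, 564, 645, 493),
  nasal_width  = c(241, 222, 233, 207, 247, 189)
)
fit <- lm(nasal_width ~ nasal_length, data = sample_roos)
res <- resid(fit)

# Residual and total sums of squares; their ratio is 1 - R^2.
rss <- sum(res^2)
tss <- sum((sample_roos$nasal_width - mean(sample_roos$nasal_width))^2)
unexplained <- rss / tss

# Each observation's share of the residual sum of squares.
share <- res^2 / rss
data.frame(sample_roos, residual = round(res, 2), share = round(share, 3))
```

Sorting this table by nasal_length would show whether the largest shares really sit at the extremes.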