The data set contains two variables:
X is the nasal length of the kangaroos in mm
Y is the nasal width of the kangaroos in mm
These will be relabeled nasal_length and nasal_width.
Data Source: Data Repository
setwd('/Volumes/Document_Drive/CUNY_MS_ANALYTICS/Data 605/Assignment Eleven')
kangaroos <- read.csv('slr07.csv')
(head(kangaroos))
## X Y
## 1 609 241
## 2 629 222
## 3 620 233
## 4 564 207
## 5 645 247
## 6 493 189
colnames(kangaroos)<- c('nasal_length', 'nasal_width')
(head(kangaroos))
## nasal_length nasal_width
## 1 609 241
## 2 629 222
## 3 620 233
## 4 564 207
## 5 645 247
## 6 493 189
plot(kangaroos$nasal_length, kangaroos$nasal_width, pch = 19, col = 'red', main = 'Kangaroo Nose Width vs Nose Length', type = 'p', xlab = "Nasal Length (mm)", ylab='Nasal Width (mm)')
From the scatter plot, there does appear to be a roughly linear relationship between the nasal lengths and nasal widths of Grey Kangaroos. It is worth a closer look.
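One way to put a number on the visual impression of a linear trend is the Pearson correlation coefficient. The sketch below is only illustrative: it uses the six rows printed by `head()` above rather than the full 45-row file, and the name `kangaroos_head` is an assumption introduced here.

```r
# First six rows of the data set, copied from the head() output above.
# This is a subset for illustration only, not the full data set.
kangaroos_head <- data.frame(
  nasal_length = c(609, 629, 620, 564, 645, 493),
  nasal_width  = c(241, 222, 233, 207, 247, 189)
)

# Pearson correlation: values near +1 indicate a strong positive linear trend.
r_head <- cor(kangaroos_head$nasal_length, kangaroos_head$nasal_width)
print(r_head)
```

With the full data loaded, the same call on `kangaroos$nasal_length` and `kangaroos$nasal_width` would give the correlation for all observations.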
kangaroo_lm <- lm(nasal_width ~ nasal_length, data = kangaroos)
summary(kangaroo_lm)
##
## Call:
## lm(formula = nasal_width ~ nasal_length, data = kangaroos)
##
## Residuals:
## Min 1Q Median 3Q Max
## -29.656 -7.479 2.132 8.229 27.344
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 46.4508 16.2998 2.85 0.00669 **
## nasal_length 0.2876 0.0235 12.24 1.34e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.44 on 43 degrees of freedom
## Multiple R-squared: 0.7769, Adjusted R-squared: 0.7717
## F-statistic: 149.7 on 1 and 43 DF, p-value: 1.342e-15
Based on the coefficients of the linear regression, the ordinary least squares equation for this data set is \(\widehat{nasal\_width} = 0.2876 \cdot nasal\_length + 46.4508\). Based on \(R^2 = 0.7769\), about 77.69% of the variance in nasal width is explained by nasal length.
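To make the fitted equation concrete, here is a minimal sketch that applies it by hand to the first observation shown by `head()` (length 609 mm, width 241 mm). The coefficients are copied from the summary output; the helper `predict_width` is a hypothetical name used only for this illustration.

```r
# Coefficients copied from the summary() output above.
intercept <- 46.4508
slope     <- 0.2876

# Hypothetical helper (not part of the original analysis):
# predicted nasal width in mm for a given nasal length in mm.
predict_width <- function(nasal_length) {
  intercept + slope * nasal_length
}

# First observation from head(): length 609 mm, observed width 241 mm.
predicted <- predict_width(609)
residual  <- 241 - predicted

print(predicted)  # about 221.6 mm
print(residual)   # about 19.4 mm
```

With the fitted model object available, `predict(kangaroo_lm, newdata = data.frame(nasal_length = 609))` should give the same value up to rounding of the coefficients.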
plot(kangaroos$nasal_length, kangaroos$nasal_width, main ="Kangaroo Nose Length vs Nose Width with Regression Line", xlab = "Nose Length", ylab = "Nose Width", col='red', pch=19)
abline(kangaroo_lm, col="purple")
This regression line splits the data rather nicely up the middle.
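The impression that the line runs up the middle of the points reflects two general properties of a least squares fit with an intercept: the residuals sum to zero, and the line passes through the point of means. The sketch below checks both on the six rows printed by `head()`; the names `kangaroos_head` and `fit_head` are assumptions for this illustration, and a fit on six rows will not reproduce the coefficients of the full model.

```r
# First six rows of the data set, copied from the head() output above.
# This is a subset for illustration only, not the full data set.
kangaroos_head <- data.frame(
  nasal_length = c(609, 629, 620, 564, 645, 493),
  nasal_width  = c(241, 222, 233, 207, 247, 189)
)

fit_head <- lm(nasal_width ~ nasal_length, data = kangaroos_head)

# Residuals of a least squares fit with an intercept sum to zero
# (up to floating point rounding).
resid_sum <- sum(resid(fit_head))

# The fitted line passes through (mean length, mean width).
mean_point     <- data.frame(nasal_length = mean(kangaroos_head$nasal_length))
fitted_at_mean <- unname(predict(fit_head, newdata = mean_point))
mean_gap       <- fitted_at_mean - mean(kangaroos_head$nasal_width)

print(resid_sum)
print(mean_gap)
```

Both printed values should be zero up to rounding error, and the same checks apply to `kangaroo_lm` on the full data.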
plot(fitted(kangaroo_lm), resid(kangaroo_lm), col='red', pch=19, xlab='Fitted values', ylab='Residuals')
qqnorm(resid(kangaroo_lm), col='red', pch=19)
qqline(resid(kangaroo_lm), col ='purple')
In the normal Q-Q plot, the residuals follow the reference line well in the middle, but less so at either end. Based on these plots, the roughly 22% of the variance in nose width not explained by nose length appears to come mainly from the extremes at either end, where the residuals depart from the line.
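The claim that the unexplained variation sits mostly at the extremes can be checked numerically rather than by eye. One possible approach, sketched below, is to compute the share of the residual sum of squares contributed by observations whose nasal length falls in the lowest or highest quarter. The helper `tail_rss_share` is a hypothetical function written for this illustration, and the demonstration uses only the six rows printed by `head()`, so its result says nothing about the full data set.

```r
# Hypothetical helper (not part of the original analysis): fraction of the
# residual sum of squares that comes from observations whose predictor value
# lies at or below the lower quantile or at or above the upper quantile.
tail_rss_share <- function(model, x, lower = 0.25, upper = 0.75) {
  cuts    <- quantile(x, probs = c(lower, upper))
  in_tail <- x <= cuts[1] | x >= cuts[2]
  sum(resid(model)[in_tail]^2) / sum(resid(model)^2)
}

# First six rows of the data set, copied from the head() output above.
# This is a subset for illustration only, not the full data set.
kangaroos_head <- data.frame(
  nasal_length = c(609, 629, 620, 564, 645, 493),
  nasal_width  = c(241, 222, 233, 207, 247, 189)
)
fit_head <- lm(nasal_width ~ nasal_length, data = kangaroos_head)

share <- tail_rss_share(fit_head, kangaroos_head$nasal_length)
print(share)
```

On the full data, the call would be `tail_rss_share(kangaroo_lm, kangaroos$nasal_length)`. Roughly half of the observations fall in the two tails by construction, so a share well above 0.5 would support the reading that the extremes drive the unexplained variation, while a share near 0.5 would suggest the residual spread is fairly even across nasal lengths.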