Problem 9.10

bailsofhay — Nov 25, 2013, 5:57 PM

data=read.table("http://www.stat.lsu.edu/exstweb/statlab/datasets/KNNLData/CH09PR10.txt")
names(data)=c("y","x1","x2","x3","x4")

### Part a ###
stem(data$x1)

  The decimal point is 1 digit(s) to the right of the |

   6 | 248
   8 | 4671468
  10 | 014456902
  12 | 0003
  14 | 00
# X1 is most likely approximately normal, since its stem-and-leaf plot is roughly bell-shaped.
stem(data$x2)

  The decimal point is 1 digit(s) to the right of the |

   6 | 37
   8 | 135947
  10 | 127034789
  12 | 01112599
# X2 is slightly skewed to the left: most values fall in the upper stems, with a
# thin tail toward the low end.
stem(data$x3)

  The decimal point is 1 digit(s) to the right of the |

   8 | 0
   9 | 01335556789
  10 | 002356789
  11 | 3456
# X3 is concentrated in the 90s and 100s, with a single low value at 80 and a
# short upper tail into the 110s, so any skew is mild.
stem(data$x4)

  The decimal point is 1 digit(s) to the right of the |

   7 | 48
   8 | 03457889
   9 | 0557
  10 | 0223345889
  11 | 0
# X4 is more oddly distributed: it is neither bell-shaped nor skewed to one side,
# and shows two clusters, one in the 80s and one in the 100s.
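#
# The shape judgments above are read off the stem-and-leaf plots by eye. As a
# rough numerical check, a sample skewness can be computed; the sketch below is
# a minimal version using the simple moment estimator. The helper name
# sample_skewness and the demo vectors are made up for illustration and are not
# part of the original analysis or the CH09PR10 data.

```r
# Sample skewness: third central moment divided by the 1.5 power of the
# second central moment (simple moment estimator, no small-sample correction).
# Positive values indicate a longer right tail, negative a longer left tail.
sample_skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Made-up vectors for illustration only; these are NOT the CH09PR10 values.
symmetric  <- c(1, 2, 3, 4, 5)
right_tail <- c(1, 1, 2, 2, 3, 10)
left_tail  <- -right_tail

skew_demo <- c(symmetric  = sample_skewness(symmetric),
               right_tail = sample_skewness(right_tail),
               left_tail  = sample_skewness(left_tail))
print(skew_demo)
```

# With the data frame read in above, something like
# sapply(data[, c("x1", "x2", "x3", "x4")], sample_skewness) should give one
# value per predictor to compare against the plots.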

### Part b ###

pairs(data)

[Figure: scatter plot matrix of y, x1, x2, x3, x4]

# The scatter plot matrix shows the pairwise relationships among y, x1, x2, x3,
# and x4. Y looks roughly linear when plotted against X3 and against X4, and X3
# and X4 are roughly linearly related to each other. Y is less clearly linear
# in X1 and X2.
cor(data)
        y     x1     x2     x3     x4
y  1.0000 0.5144 0.4970 0.8971 0.8694
x1 0.5144 1.0000 0.1023 0.1808 0.3267
x2 0.4970 0.1023 1.0000 0.5190 0.3967
x3 0.8971 0.1808 0.5190 1.0000 0.7820
x4 0.8694 0.3267 0.3967 0.7820 1.0000
# The correlation matrix shows how strongly y, x1, x2, x3, and x4 correlate
# with one another.
# Y correlates most strongly with X3 and X4, at 0.897 and 0.869 respectively.
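#
# The ranking can also be pulled out programmatically rather than read off by
# eye. The sketch below retypes the matrix printed above (so it runs without
# the data file) and sorts the predictors by their correlation with y; the
# variable names r, y_cor, pred, and strongest_pair are illustrative only.

```r
# Correlation matrix copied from the cor(data) output above
vars <- c("y", "x1", "x2", "x3", "x4")
r <- matrix(c(
  1.0000, 0.5144, 0.4970, 0.8971, 0.8694,
  0.5144, 1.0000, 0.1023, 0.1808, 0.3267,
  0.4970, 0.1023, 1.0000, 0.5190, 0.3967,
  0.8971, 0.1808, 0.5190, 1.0000, 0.7820,
  0.8694, 0.3267, 0.3967, 0.7820, 1.0000
), nrow = 5, byrow = TRUE, dimnames = list(vars, vars))

# Predictors ranked by their correlation with y (first row, y itself dropped)
y_cor <- sort(r["y", -1], decreasing = TRUE)
print(y_cor)

# Strongest correlation between two different predictors
pred <- r[-1, -1]
pred[!upper.tri(pred)] <- NA   # keep each pair once and drop the diagonal
idx <- which(pred == max(pred, na.rm = TRUE), arr.ind = TRUE)
strongest_pair <- c(rownames(pred)[idx[1, "row"]], colnames(pred)[idx[1, "col"]])
print(strongest_pair)
```

# On the real data frame the same steps apply with r <- cor(data). The second
# part picks out the X3 and X4 pair (0.782), the two predictors that are also
# most strongly related to y.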

### Part c ###
fit=lm(y~x1+x2+x3+x4, data=data)
fit

Call:
lm(formula = y ~ x1 + x2 + x3 + x4, data = data)

Coefficients:
(Intercept)           x1           x2           x3           x4  
  -124.3818       0.2957       0.0483       1.3060       0.5198  
plot(fit$fitted.values, data$y, xlab="Fitted Values", ylab="Job Proficiency Score")

[Figure: job proficiency score plotted against fitted values]

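# Yes, it appears that all of the predictor variables should be retained. This is
# judged only from the coefficients and the plot of observed scores against fitted
# values; the t-tests in summary(fit) would be needed to confirm it for each predictor.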
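#
# One way to back up the retention question is to look at the coefficient table
# from summary(), which gives a t-test for each predictor given the others. The
# sketch below wraps that in a hypothetical helper, coef_table, and runs it on
# simulated stand-in data so that it is runnable without the data file. The
# simulated values and the coefficients used to generate them are assumptions
# for illustration, not results from CH09PR10.

```r
# Hypothetical helper: coefficient table (estimate, std. error, t value,
# p-value) for the first-order model with all four predictors.
coef_table <- function(df) {
  fit <- lm(y ~ x1 + x2 + x3 + x4, data = df)
  summary(fit)$coefficients
}

# Simulated stand-in data, NOT the CH09PR10 values.
# By construction y depends on x1, x3, and x4 but not on x2.
set.seed(1)
n <- 25
toy <- data.frame(x1 = rnorm(n, 100, 20), x2 = rnorm(n, 100, 20),
                  x3 = rnorm(n, 100, 10), x4 = rnorm(n, 100, 10))
toy$y <- -120 + 0.3 * toy$x1 + 1.3 * toy$x3 + 0.5 * toy$x4 + rnorm(n, sd = 4)

tab <- coef_table(toy)
print(round(tab, 4))
```

# On the real data, coef_table(data) or simply summary(fit) gives the same
# table; a predictor with a large p-value in the last column is a candidate for
# dropping. drop1(fit, test = "F") offers an equivalent one-at-a-time check.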