bailsofhay — Nov 25, 2013, 5:57 PM
data=read.table("http://www.stat.lsu.edu/exstweb/statlab/datasets/KNNLData/CH09PR10.txt")
names(data)=c("y","x1","x2","x3","x4")
### Part a ###
stem(data$x1)
The decimal point is 1 digit(s) to the right of the |
6 | 248
8 | 4671468
10 | 014456902
12 | 0003
14 | 00
# The X1 stem-and-leaf plot is roughly symmetric and bell-shaped, with its peak at the 10 stem
# (values 100 to 119), so X1 is plausibly close to normal.
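# A formal test can back up the visual reading of the stem-and-leaf plot. This is a minimal
# sketch that assumes only base R: shapiro.test() returns a Shapiro-Wilk p-value, and a small
# value (say below 0.05) is evidence against normality. The helper name check_normality is
# hypothetical and is not part of the original analysis.

```r
# Hypothetical helper: Shapiro-Wilk normality check with an optional normal Q-Q plot.
check_normality <- function(x, plot = FALSE) {
  if (plot) {
    qqnorm(x)  # sample quantiles against theoretical normal quantiles
    qqline(x)  # reference line through the first and third quartiles
  }
  shapiro.test(x)$p.value
}

# Usage on the job-proficiency data (requires `data` loaded as above):
# check_normality(data$x1, plot = TRUE)
```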
stem(data$x2)
The decimal point is 1 digit(s) to the right of the |
6 | 37
8 | 135947
10 | 127034789
12 | 01112599
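# X2 is slightly skewed to the left: most values fall in the upper stems (100 and above),
# with a thinner tail toward the low values in the 60s.
stem(data$x3)
The decimal point is 1 digit(s) to the right of the |
8 | 0
9 | 01335556789
10 | 002356789
11 | 3456
# Apart from one isolated low value of 80, X3 is concentrated in the 90s and 100s and thins
# out toward the 110s, giving it a slight lean to the right.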
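# The direction of skew can be quantified with the moment-based sample skewness, the third
# central moment divided by the 1.5 power of the second: a negative value indicates a longer
# left tail, a positive value a longer right tail. A minimal sketch; sample_skewness is a
# hypothetical helper, and other definitions with small-sample corrections exist.

```r
# Hypothetical helper: moment-based sample skewness g1 = m3 / m2^(3/2),
# where m2 and m3 are the second and third central moments (denominator n).
sample_skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Usage (requires `data` loaded as above); negative = left tail, positive = right tail:
# sapply(data[, c("x1", "x2", "x3", "x4")], sample_skewness)
```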
stem(data$x4)
The decimal point is 1 digit(s) to the right of the |
7 | 48
8 | 03457889
9 | 0557
10 | 0223345889
11 | 0
# X4 looks bimodal, with clusters in the 80s and 100s and a dip in the 90s, so it is neither
# bell-shaped nor clearly skewed to one side.
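# One rough way to check the bimodal impression is to count the peaks of a kernel density
# estimate. This is a minimal sketch under stated assumptions: count_modes is a hypothetical
# helper, it uses density() with its default bandwidth, and peaks lower than 5% of the tallest
# one are ignored. With only 25 observations the count can change with the bandwidth, so treat
# it as a descriptive aid rather than a test.

```r
# Hypothetical helper: count peaks in a Gaussian kernel density estimate.
# Peaks lower than min_rel_height times the highest density value are ignored.
count_modes <- function(x, min_rel_height = 0.05) {
  y <- density(x)$y          # density evaluated on a 512-point grid
  i <- 2:(length(y) - 1)
  # a grid point is a peak if it is at least as high as its left neighbour
  # and strictly higher than its right neighbour (a flat top is counted once)
  is_peak <- y[i] >= y[i - 1] & y[i] > y[i + 1]
  sum(y[i][is_peak] >= min_rel_height * max(y))
}

# Usage (requires `data` loaded as above):
# count_modes(data$x4)
```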
### Part b ###
pairs(data)
# The scatter plot matrix shows every pairwise relationship among Y, X1, X2, X3 and X4.
# Y looks fairly linear against X3 and against X4, and X3 and X4 are also linearly related
# to each other. The relationships of Y with X1 and with X2 are weaker and less clearly linear.
cor(data)
        y     x1     x2     x3     x4
y  1.0000 0.5144 0.4970 0.8971 0.8694
x1 0.5144 1.0000 0.1023 0.1808 0.3267
x2 0.4970 0.1023 1.0000 0.5190 0.3967
x3 0.8971 0.1808 0.5190 1.0000 0.7820
x4 0.8694 0.3267 0.3967 0.7820 1.0000
# The correlation matrix gives the pairwise correlations among Y, X1, X2, X3 and X4.
# Y correlates most strongly with X3 and X4, at 0.897 and 0.869 respectively, and X3 and X4
# are themselves fairly strongly correlated (0.782).
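# Reading the strongest correlates off the matrix can be automated by sorting the response
# row by absolute correlation. A minimal sketch; rank_by_correlation is a hypothetical helper
# and assumes the response column is named "y", as in the names() call above.

```r
# Hypothetical helper: correlations of each predictor with the response,
# sorted from strongest to weakest in absolute value.
rank_by_correlation <- function(df, response = "y") {
  r <- cor(df)[response, setdiff(names(df), response)]
  r[order(abs(r), decreasing = TRUE)]
}

# Usage (requires `data` loaded as above):
# rank_by_correlation(data)
```

# With the correlations printed above, this ordering would be x3, x4, x1, x2.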
### Part c ###
fit=lm(y~x1+x2+x3+x4, data=data)
fit
Call:
lm(formula = y ~ x1 + x2 + x3 + x4, data = data)
Coefficients:
(Intercept)           x1           x2           x3           x4  
  -124.3818       0.2957       0.0483       1.3060       0.5198  
plot(fit$fitted.values, data$y, xlab="Fitted Values", ylab="Job Proficiency Score")
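# The observed-versus-fitted plot has a numerical counterpart: for a least-squares fit with an
# intercept, the squared correlation between observed and fitted values equals the R^2 that
# summary() reports. A minimal sketch; r2_from_fitted is a hypothetical helper.

```r
# Hypothetical helper: compare cor(observed, fitted)^2 with summary()'s R^2.
r2_from_fitted <- function(fit) {
  observed <- model.response(model.frame(fit))  # response values used in the fit
  c(squared_cor = cor(observed, fitted(fit))^2,
    r_squared   = summary(fit)$r.squared)
}

# Usage (requires `fit` from above); the two entries should agree:
# r2_from_fitted(fit)
```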
# Yes, it appears that all four predictor variables should be retained.
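# A retention decision like this is commonly supported with each predictor's t-test p-value
# from summary(fit), together with variance inflation factors (VIFs), since X3 and X4 are
# correlated at 0.782. This is a minimal base-R sketch, not necessarily the approach the
# assignment expects; retention_table and the 0.05 cutoff are illustrative assumptions.

```r
# Hypothetical helper: per-predictor t-test p-values and variance inflation factors.
retention_table <- function(fit, alpha = 0.05) {
  coefs <- summary(fit)$coefficients[-1, , drop = FALSE]  # drop the intercept row
  X <- model.matrix(fit)[, -1, drop = FALSE]              # predictor columns only
  # VIF_k = 1 / (1 - R_k^2), which equals the k-th diagonal element of the
  # inverse correlation matrix of the predictors
  vif <- diag(solve(cor(X)))
  names(vif) <- colnames(X)
  p <- coefs[, "Pr(>|t|)"]
  data.frame(term             = rownames(coefs),
             p_value          = unname(p),
             significant      = unname(p < alpha),
             vif              = unname(vif[rownames(coefs)]),
             row.names        = NULL,
             stringsAsFactors = FALSE)
}

# Usage (requires `fit` from above):
# retention_table(fit)
```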