In the chunk below, please install the package "faraway". The Packages tab for this is located in the bottom-right pane, between Plots and Help (press the Install button and search for faraway). Then run the chunk below; you should see prostate in your environment.
#####Code goes below this line
library("faraway")
data(prostate)
#####Code goes above this line
head(prostate)
##       lcavol lweight age      lbph svi      lcp gleason pgg45     lpsa
## 1 -0.5798185  2.7695  50 -1.386294   0 -1.38629       6     0 -0.43078
## 2 -0.9942523  3.3196  58 -1.386294   0 -1.38629       6     0 -0.16252
## 3 -0.5108256  2.6912  74 -1.386294   0 -1.38629       7    20 -0.16252
## 4 -1.2039728  3.2828  58 -1.386294   0 -1.38629       6     0 -0.16252
## 5  0.7514161  3.4324  62 -1.386294   0 -1.38629       6     0  0.37156
## 6 -1.0498221  3.2288  50 -1.386294   0 -1.38629       6     0  0.76547
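As an alternative to the Packages tab, the install step can also be done from the console. The following is a minimal sketch, assuming an internet connection; the CRAN mirror URL is an assumption and not part of the lab instructions.

```r
# Install faraway only if it is missing (the mirror URL is an assumption).
if (!requireNamespace("faraway", quietly = TRUE)) {
  install.packages("faraway", repos = "https://cloud.r-project.org")
}
library(faraway)
data(prostate)

# Confirm the data frame loaded by checking its dimensions (rows, columns).
dim(prostate)
```

If `dim(prostate)` reports 97 rows and 9 columns, the dataset is available in the environment.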
Help for the rest of the questions: here is some information on the dataset we are working with today.
help(prostate)
Below, create a scatterplot where log prostate-specific antigen is the response and log cancer volume is the predictor (you should be able to tell which variables are which from the help function above). What relationship do you notice? Is the relationship linear?
#put the code below
data("prostate")
# Both variables are on the log scale, so the labels say so.
plot(prostate$lcavol, prostate$lpsa, main = "Log PSA vs Log Cancer Volume", xlab = "Log cancer volume (lcavol)", ylab = "Log PSA (lpsa)")
#put the code above
#ANSWER QUESTION 2 IN THE LINE BELOW
There is a positive, roughly linear relationship between log cancer volume and log prostate-specific antigen.
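One way to back up the visual impression is to compute the correlation and overlay a smooth curve. This is only a sketch and not part of the required answer; the lowess smooth is an added illustration, and a smooth that stays close to a straight line is consistent with a linear relationship.

```r
library(faraway)
data(prostate)

# Pearson correlation between log cancer volume and log PSA.
r <- cor(prostate$lcavol, prostate$lpsa)
print(r)

# Scatterplot with a lowess smooth to judge linearity by eye.
plot(prostate$lcavol, prostate$lpsa,
     xlab = "Log cancer volume (lcavol)", ylab = "Log PSA (lpsa)")
lines(lowess(prostate$lcavol, prostate$lpsa), col = "blue", lwd = 2)
```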
Below, create a linear model where log prostate-specific antigen is the response and log cancer volume is the predictor. Call the "lm" function in R. What is the R-squared for this model?
#finish the code below
data("prostate")
plot(prostate$lcavol, prostate$lpsa, main = "Log PSA vs Log Cancer Volume", xlab = "Log cancer volume (lcavol)", ylab = "Log PSA (lpsa)")
# Fit the model once and reuse it for the regression line.
md1 <- lm(lpsa ~ lcavol, data = prostate)
abline(md1, col = "red", lwd = 2)
#finish the code above
#ANSWER QUESTION 3 IN THE LINE BELOW
The R-squared is 0.5394.
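The R-squared can be read from the model summary. The sketch below refits the same model so that it runs on its own; `r2` is just an illustrative variable name.

```r
library(faraway)
data(prostate)

md1 <- lm(lpsa ~ lcavol, data = prostate)

# summary() returns a list; r.squared holds the multiple R-squared.
r2 <- summary(md1)$r.squared
round(r2, 4)

# With a single predictor, R-squared equals the squared correlation.
all.equal(r2, cor(prostate$lcavol, prostate$lpsa)^2)
```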
Please check the following assumptions for linear regression models. For each assumption, state whether or not it holds and give the reason why.
#Put your answer to answer 1) below here
Yes, the relationship between the predictor and the response is approximately linear.
#Put your answer to answer 2) below here
Yes, there are no obvious outliers in this graph.
plot(fitted(md1), resid(md1))
#Put your answer to answer 3) below here
The residuals toward the middle of the x-axis are more tightly clustered than those toward either end of the x-axis.
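Two common graphical checks for a simple linear model are sketched below. Which assumption each plot addresses depends on the list given in the lab; the Q-Q plot is an added illustration and may not be one of the required checks.

```r
library(faraway)
data(prostate)
md1 <- lm(lpsa ~ lcavol, data = prostate)

# Residuals vs fitted values: look for curvature or a funnel shape.
plot(fitted(md1), resid(md1), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)  # reference line at zero

# Normal Q-Q plot: points near the line suggest roughly normal residuals.
qqnorm(resid(md1))
qqline(resid(md1))
```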
What is the beta coefficient for the predictor variable? What does the coefficient mean in the context of the model? If you had an x value of 3, what would the prediction for the model be? (Look back at the summary for information on the beta coefficients for the model.)
#Put your answer to answer Question 5 below here
The beta coefficient for lcavol is about 0.7193, which means that each 1-unit increase in log cancer volume is associated with an average increase of about 0.7193 in log prostate-specific antigen.
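The coefficients and the prediction at x = 3 can be obtained directly from the fitted model. This is a minimal sketch that refits the model from Question 3; `pred` and `by_hand` are illustrative names.

```r
library(faraway)
data(prostate)
md1 <- lm(lpsa ~ lcavol, data = prostate)

# Intercept and slope of the fitted line.
coef(md1)

# Prediction of log PSA when log cancer volume equals 3.
pred <- predict(md1, newdata = data.frame(lcavol = 3))
pred

# The same prediction by hand: intercept + slope * 3.
by_hand <- unname(coef(md1)["(Intercept)"] + coef(md1)["lcavol"] * 3)
by_hand
```

Both approaches should give the same value, since `predict()` simply evaluates the fitted line at the new x value.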