Question 1 (2pts) : Read in data

In the chunk below, please install the package “faraway”. The tab for this (Packages) is in the bottom-right pane, between Plots and Help: press the Install button and search for faraway. Then run the chunk below; you should see prostate in your environment.

#####Code goes below this line
library("faraway")
data(prostate)
#####Code goes above this line
head(prostate)

##       lcavol lweight age      lbph svi      lcp gleason pgg45     lpsa
## 1 -0.5798185  2.7695  50 -1.386294   0 -1.38629       6     0 -0.43078
## 2 -0.9942523  3.3196  58 -1.386294   0 -1.38629       6     0 -0.16252
## 3 -0.5108256  2.6912  74 -1.386294   0 -1.38629       7    20 -0.16252
## 4 -1.2039728  3.2828  58 -1.386294   0 -1.38629       6     0 -0.16252
## 5  0.7514161  3.4324  62 -1.386294   0 -1.38629       6     0  0.37156
## 6 -1.0498221  3.2288  50 -1.386294   0 -1.38629       6     0  0.76547
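A quick sanity check can confirm that the data loaded correctly. This minimal sketch assumes only that faraway is installed. It prints the size of the data frame, each column's type, and a per-column count of missing values. The 9 columns should match the variables shown in the head() output above.

```r
# Load the faraway package and its prostate data set
library(faraway)
data(prostate)

# Number of rows (patients) and columns (variables)
dim(prostate)

# Variable names and types, one line per column
str(prostate)

# Count of missing values in each column
colSums(is.na(prostate))
```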

Help for the rest of the questions: the help page below describes the dataset we are working with today.

help(prostate)

Question 2 (4pts) : Build a scatterplot

Below, create a scatterplot where log prostate-specific antigen (lpsa) is the response and log cancer volume (lcavol) is the predictor. You should be able to tell which variables are which from the help page above. What relationship do you notice? Is the relationship linear?

#put the code below
data("prostate")
plot(prostate$lcavol, prostate$lpsa, main = "Log PSA vs Log Cancer Volume",
     xlab = "log(cancer volume)", ylab = "log(PSA)")

#put the code above

#ANSWER QUESTION 2 IN THE LINE BELOW
There is a positive linear relationship between log cancer volume and log PSA: as lcavol increases, lpsa tends to increase, and the points follow a roughly straight-line trend.
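The Pearson correlation between the two log variables puts a number on the visual impression. In a simple linear regression, the square of this correlation equals the model's R-squared. It should therefore come out close to sqrt(0.5394), or about 0.73, and be positive. A minimal sketch:

```r
library(faraway)
data(prostate)

# Pearson correlation between log cancer volume and log PSA
r <- cor(prostate$lcavol, prostate$lpsa)
r

# Squaring r gives the R-squared of the simple regression lpsa ~ lcavol
r^2
```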

Question 3 (4pts) : Build the model and create a summary

Below, create a linear model where log prostate-specific antigen is the response and log cancer volume is the predictor, using the lm function in R. What is the R-squared for this model?

#finish the code below
data("prostate")
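plot(prostate$lcavol, prostate$lpsa, main = "Log PSA vs Log Cancer Volume",
     xlab = "log(cancer volume)", ylab = "log(PSA)")
# Fit the simple linear regression of lpsa on lcavol
md1 <- lm(lpsa ~ lcavol, data = prostate)
# Overlay the fitted line on the scatterplot
abline(md1, col = "red", lwd = 2)
# Coefficients, standard errors, and R-squared
summary(md1)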

#finish the code above

#ANSWER QUESTION 3 IN THE LINE BELOW
The R-squared is 0.5394, so log cancer volume explains about 54% of the variation in log PSA.
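The R-squared can be read directly from the summary object instead of being copied by hand. This sketch extracts it with summary(md1)$r.squared. As a cross-check, it recomputes the value from its definition, R-squared = 1 - SSE/SST. Both values should agree with the 0.5394 reported above.

```r
library(faraway)
data(prostate)
md1 <- lm(lpsa ~ lcavol, data = prostate)

# R-squared as reported by summary()
r2_summary <- summary(md1)$r.squared

# The same quantity from its definition: 1 - SSE / SST
sse <- sum(resid(md1)^2)
sst <- sum((prostate$lpsa - mean(prostate$lpsa))^2)
r2_manual <- 1 - sse / sst

c(summary = r2_summary, manual = r2_manual)
```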

Question 4 (6pts) : Linear regression assumptions

Please check the following assumptions for linear regression models. For each assumption, state whether or not it holds and explain why.

Linearity Assumption: the relationship between predictor and response is linear (you can reference a previous portion of the lab here)

#Put your answer to 1) below here
Yes, the assumption holds: the scatterplot from Question 2 shows the points following a roughly straight-line trend between the predictor and the response.
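One informal way to probe linearity goes beyond eyeballing the scatterplot. This technique is swapped in here and is not something the lab asks for. It adds a quadratic term and tests whether that term improves the fit, using a nested-model F-test via anova(). A large p-value for the comparison is consistent with a linear relationship. The comparison model name md2 is hypothetical.

```r
library(faraway)
data(prostate)

# Straight-line model from Question 3
md1 <- lm(lpsa ~ lcavol, data = prostate)

# Hypothetical comparison model that adds a quadratic term in lcavol
md2 <- lm(lpsa ~ lcavol + I(lcavol^2), data = prostate)

# Nested-model F-test: does the curvature term significantly improve the fit?
cmp <- anova(md1, md2)
cmp

# p-value for adding the quadratic term (second row of the table)
p_quad <- cmp[["Pr(>F)"]][2]
p_quad
```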

Outlier condition: No outliers in the dataset

#Put your answer to 2) below here
Yes, the condition holds; there are no obvious outliers in the scatterplot.
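A scatterplot can hide moderate outliers, so a numeric check can complement it. This sketch uses a common rule of thumb, which is an assumption here and not part of the lab. It flags observations with an absolute standardized residual above 2 as worth a closer look, and above 3 as likely outliers. It also lists the largest Cook's distances, which measure influence on the fitted line.

```r
library(faraway)
data(prostate)
md1 <- lm(lpsa ~ lcavol, data = prostate)

# Standardized residuals: residuals scaled by their estimated standard deviation
std_res <- rstandard(md1)

# Rows worth a closer look (|std residual| > 2) and likely outliers (> 3)
which(abs(std_res) > 2)
which(abs(std_res) > 3)

# The three observations with the largest Cook's distance (most influential)
head(sort(cooks.distance(md1), decreasing = TRUE), 3)
```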

Equal spread condition: The spread of observations around the regression line is consistent across all values of the predictor. (the code below will help, remember to run it)

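plot(fitted(md1), resid(md1), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)  # reference line at zero residual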

#Put your answer to 3) below here
The residuals near the middle of the fitted-value range are more tightly condensed than those at either end, so the spread is not perfectly constant and the equal-spread condition only roughly holds.
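One simple sketch puts a number on the spread pattern described above. It is an informal check, not a formal test. It splits the fitted values into three equal-width bins with cut() and compares the standard deviation of the residuals within each bin. Roughly similar values support the equal-spread condition. Bins with few observations give noisy estimates, so the counts are printed too.

```r
library(faraway)
data(prostate)
md1 <- lm(lpsa ~ lcavol, data = prostate)

# Group observations into three equal-width bins of fitted values
bins <- cut(fitted(md1), breaks = 3)

# Residual standard deviation within each bin
spread_by_bin <- tapply(resid(md1), bins, sd)
spread_by_bin

# Number of observations per bin
table(bins)
```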

Question 5 (4pts) : Interpreting coefficients

What is the beta coefficient for the predictor variable? What does the coefficient mean in the context of the model? If you had an x value of 3, what would the model's prediction be? (Look back at the summary for information on the model's beta coefficients.)

#Put your answer to Question 5 below here

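The beta coefficient (slope) for lcavol is about 0.7193, with an intercept of about 1.5073. Each one-unit increase in log cancer volume is associated with an increase of about 0.719 in predicted log PSA. At x = 3, the predicted lpsa is 1.5073 + 0.7193 × 3 ≈ 3.665.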
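The fitted line is predicted lpsa = b0 + b1 * lcavol, so the prediction at lcavol = 3 is b0 + 3 * b1. This sketch computes that value in two ways: by hand from coef(md1), and with predict() on a one-row data frame. The two results should be identical.

```r
library(faraway)
data(prostate)
md1 <- lm(lpsa ~ lcavol, data = prostate)

# Intercept (b0) and slope (b1) of the fitted line
b <- coef(md1)
b

# Prediction at lcavol = 3 by hand: b0 + b1 * 3
manual_pred <- unname(b[1] + b[2] * 3)

# Same prediction using predict() with a new data frame
model_pred <- unname(predict(md1, newdata = data.frame(lcavol = 3)))

c(manual = manual_pred, predict = model_pred)
```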

LAB2_STAT1000

Margolis

2025 Spring