1. Using the cheddar data, fit a linear model with taste as the response and the three other variables as predictors.
require(faraway)
## Loading required package: faraway
data(cheddar, package="faraway")
head(cheddar)
##   taste Acetic   H2S Lactic
## 1  12.3  4.543 3.135   0.86
## 2  20.9  5.159 5.043   1.53
## 3  39.0  5.366 5.438   1.57
## 4  47.9  5.759 7.496   1.81
## 5   5.6  4.663 3.807   0.99
## 6  25.9  5.697 7.601   1.09
lmod <- lm(taste ~ Acetic + H2S + Lactic, data = cheddar)
summary(lmod)
## 
## Call:
## lm(formula = taste ~ Acetic + H2S + Lactic, data = cheddar)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -17.390  -6.612  -1.009   4.908  25.449 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -28.8768    19.7354  -1.463  0.15540   
## Acetic        0.3277     4.4598   0.073  0.94198   
## H2S           3.9118     1.2484   3.133  0.00425 **
## Lactic       19.6705     8.6291   2.280  0.03108 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.13 on 26 degrees of freedom
## Multiple R-squared:  0.6518, Adjusted R-squared:  0.6116 
## F-statistic: 16.22 on 3 and 26 DF,  p-value: 3.81e-06
  1. Is the predictor Lactic statistically significant in this model? Ans: Yes. The p-value for Lactic is about 0.031, which is below 0.05, so Lactic is statistically significant at the 5% level in this model.

  2. Give the R command to extract the p-value for the test of β_Lactic = 0. Hint: look at summary()$coef.

## extract the p-value for the test of beta_Lactic = 0 from the coefficient table
fit_summary <- summary(lmod)
p.value <- fit_summary$coefficients["Lactic", "Pr(>|t|)"]
p.value
## [1] 0.03107948
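Equivalently, following the hint, the coefficient matrix returned by summary()$coef can be indexed directly in one line; this should return the same value (0.03107948) as above:

## one-line alternative, indexing the coefficient matrix directly
summary(lmod)$coef["Lactic", "Pr(>|t|)"]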
  3. Add normally distributed errors to Lactic with mean zero and standard deviation 0.01 and refit the model. Now what is the p-value for the previous test?
cheddar1 <- cheddar
n <- nrow(cheddar1)
## add N(0, 0.01^2) measurement error to Lactic
cheddar1$Lactic1 <- cheddar1$Lactic + rnorm(n, mean = 0, sd = 0.01)

## refit the model with the noisy predictor
fit1 <- lm(taste ~ Acetic + H2S + Lactic1, data = cheddar1)
fit1_summary <- summary(fit1)
fit1_summary
## 
## Call:
## lm(formula = taste ~ Acetic + H2S + Lactic1, data = cheddar1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -17.435  -6.509  -1.165   4.895  25.394 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -28.7726    19.7456  -1.457  0.15704   
## Acetic        0.3121     4.4664   0.070  0.94482   
## H2S           3.9601     1.2408   3.192  0.00368 **
## Lactic1      19.4612     8.5686   2.271  0.03165 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.14 on 26 degrees of freedom
## Multiple R-squared:  0.6514, Adjusted R-squared:  0.6111 
## F-statistic: 16.19 on 3 and 26 DF,  p-value: 3.87e-06
p.value1 <-fit1_summary$coefficients["Lactic1","Pr(>|t|)"]
p.value1
## [1] 0.03164736
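Because rnorm() draws random noise, this p-value will differ slightly from run to run. The code above does not fix a seed; adding one before the rnorm() call (the seed value 123 below is arbitrary, chosen only for illustration) would make the result reproducible:

set.seed(123)  # arbitrary seed; fixes the simulated noise so the run can be repeated exactly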
  4. Repeat this same calculation of adding errors to Lactic 1000 times within a for loop. Save the p-values into a vector and report the average p-value. Does this much measurement error make a qualitative difference to the conclusions?
N <- 1000
p.value.list <- rep(0, N)

for (i in 1:N) {
  ## add fresh noise to Lactic, refit the model, and store the p-value for Lactic1
  cheddar1$Lactic1 <- cheddar1$Lactic + rnorm(n, mean = 0, sd = 0.01)
  fit2 <- lm(taste ~ Acetic + H2S + Lactic1, data = cheddar1)
  fit2_summary <- summary(fit2)
  p.value.list[i] <- fit2_summary$coefficients["Lactic1", "Pr(>|t|)"]
}
mean(p.value.list)
## [1] 0.03126446
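Ans: The average p-value (about 0.031) is essentially identical to the p-value from the noise-free fit, so measurement error with sd = 0.01 makes no qualitative difference to the conclusions: Lactic remains significant at the 5% level.

As a complementary check (not part of the original output), the proportion of the 1000 simulated p-values falling below 0.05 could also be reported; a proportion near 1 would indicate that the significance conclusion is stable under this much noise:

## proportion of simulations in which Lactic remains significant at the 5% level
mean(p.value.list < 0.05)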
  5. Repeat the previous question but with a standard deviation of 0.1. Does this much measurement error make a difference?
N <- 1000
p.value.list <- rep(0, N)
cheddar2 <- cheddar

for (i in 1:N) {
  ## add noise with the larger sd, refit the model, and store the p-value for Lactic1
  cheddar2$Lactic1 <- cheddar2$Lactic + rnorm(n, mean = 0, sd = 0.1)
  fit3 <- lm(taste ~ Acetic + H2S + Lactic1, data = cheddar2)
  fit3_summary <- summary(fit3)
  p.value.list[i] <- fit3_summary$coefficients["Lactic1", "Pr(>|t|)"]
}
mean(p.value.list)
## [1] 0.06478882

Ans: The average p-value (about 0.065) now exceeds 0.05, so on average Lactic would no longer be judged significant at the 5% level. Measurement error of this size does make a qualitative difference to the conclusions.
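One way to see why sd = 0.1 matters while sd = 0.01 does not is to compare each noise level with the spread of the Lactic values themselves (output not shown here); if 0.1 is a non-trivial fraction of that spread while 0.01 is negligible, the loss of significance under the larger noise is to be expected:

## spread of the original Lactic measurements, for comparison with the noise sd
sd(cheddar$Lactic)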