Using the faithful data, fit a regression of duration on waiting. Assuming that there was a measurement error in waiting of 30 seconds, use the SIMEX method to obtain a better estimate of the slope.
Use the print_formula custom function to print the final equation for the model input
print_formula <- function(model){
as.formula(
paste0("y ~ ", round(coefficients(model)[1],4), "",
paste(sprintf(" %+.4f*%s ",
coefficients(model)[-1],
names(coefficients(model)[-1])),
collapse="")
)
)
}
library(faraway)
library(simex)
str(faithful)
## 'data.frame': 272 obs. of 2 variables:
## $ eruptions: num 3.6 1.8 3.33 2.28 4.53 ...
## $ waiting : num 79 54 74 62 85 55 88 85 51 85 ...
eruptions: eruption time in minutes of the old faithful geyser, and
waiting: the waiting time between eruptions in minutes
Let’s fit a linear model on waiting. Note that we must set x = TRUE to ensure that x is returned so as to enable asymptotic variance estimation.
lmod <- lm(eruptions ~ waiting, data = faithful, x= T)
# summary(lmod)
coefficients(lmod)
## (Intercept) waiting
## -1.87401599 0.07562795
print_formula(lmod)
## y ~ -1.874 + 0.0756 * waiting
## <environment: 0x7fa62116d258>
Realized that the coefficient in this first iteration of the model is 0.0756. We’ll now use the SIMEX method to get a better estimate assuming that there is a measurement error of 30 seconds.
adj_lmod <- simex(lmod, SIMEXvariable = "waiting", measurement.error = .5)
coefficients(adj_lmod)
## (Intercept) waiting
## -1.88192543 0.07574088
print_formula(adj_lmod)
## y ~ -1.8819 + 0.0757 * waiting
## <environment: 0x7fa6241de3a8>
We now see that a better estimate of the slope is 0.0757, which is slightly higher than the one previously found, i.e. 0.0756.