This homework is due on October 23, the day we get back from Fall Break. It is slightly longer than usual because we missed 20 minutes of class on Wednesday.
Read the PDF file "t-test refresher," which has been placed on the Resources page of Sakai. Our brief discussion of t-tests will mostly consist of a couple of examples. If you remember t-tests well from STOR 155 or STOR 455, you can skip this reading.
Attempt to do as much of this assignment as you can by next Monday with no help from others. Then we will talk about it briefly at the end of class to see how everyone is doing.
library(tibble)
# 30 points on the line y = 2x + 8, plus heavy-tailed t (df = 2) noise
sim1a <- tibble(
  x = rep(1:10, each = 3),
  y = x * 2 + 8 + rt(length(x), df = 2)
)
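A minimal sketch of the repeated-simulation idea, assuming Exercise 1 means fitting a least-squares line to each of 100 freshly simulated sim1a-style datasets. lm() minimizes the sum of squared residuals, which gives the same line as minimizing root-mean-squared distance. The helper simulate_data() and the seed are illustrative choices, not part of the assignment, and base R data.frame() replaces tibble() so the sketch runs without extra packages.

```r
# Sketch: repeat the sim1a simulation 100 times and fit a least-squares line to each
# simulate_data() is a hypothetical helper that mirrors the sim1a code in base R
set.seed(155)  # arbitrary seed so the sketch is reproducible

simulate_data <- function() {
  x <- rep(1:10, each = 3)
  data.frame(x = x, y = x * 2 + 8 + rt(length(x), df = 2))
}

n_sims <- 100
# replicate() returns a 2 x n_sims matrix; t() gives one row per simulated dataset
ls_coefs <- t(replicate(n_sims, coef(lm(y ~ x, data = simulate_data()))))
colnames(ls_coefs) <- c("intercept", "slope")

# Smallest and largest fitted intercept and slope across the 100 datasets
apply(ls_coefs, 2, range)
```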
2. One way to make linear models more robust is to use a different distance measure. For example, instead of root-mean-squared distance, you could use mean-absolute distance:
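# make_prediction() is assumed to be the linear model from class: a[1] + a[2] * x
make_prediction <- function(a, data) a[1] + data$x * a[2]
# Mean-absolute distance between observed y and predicted y
measure_distance <- function(mod, data) {
  diff <- data$y - make_prediction(mod, data)
  mean(abs(diff))
}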
Redo Exercise 1 using this metric to measure how well a line fits (100 regressions on 100 datasets). You will have to use optim() to find the best-fitting model, since lm() only minimizes squared residuals.
What do you notice about the range of models compared with the range in Exercise 1? How do you explain the results?
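One way to set up the repetition with optim(), sketched under the same assumption about Exercise 1. The helper names simulate_data() and mad_fit() are hypothetical, and starting each search at the least-squares line is a choice made here to give optim() a sensible starting point. optim() passes the extra argument data = data on to measure_distance().

```r
# Sketch: fit each simulated dataset by minimizing mean-absolute distance with optim()
# simulate_data() and mad_fit() are hypothetical helper names
set.seed(155)  # arbitrary seed

simulate_data <- function() {
  x <- rep(1:10, each = 3)
  data.frame(x = x, y = x * 2 + 8 + rt(length(x), df = 2))
}

# Linear model prediction: a[1] is the intercept, a[2] the slope
make_prediction <- function(a, data) a[1] + data$x * a[2]

measure_distance <- function(mod, data) {
  diff <- data$y - make_prediction(mod, data)
  mean(abs(diff))
}

# Start at the least-squares line, then let optim() (Nelder-Mead) minimize the MAD
mad_fit <- function(data) {
  start <- unname(coef(lm(y ~ x, data = data)))
  optim(start, measure_distance, data = data)$par
}

n_sims <- 100
mad_coefs <- t(replicate(n_sims, mad_fit(simulate_data())))
colnames(mad_coefs) <- c("intercept", "slope")

# Compare this range with the least-squares range from Exercise 1
apply(mad_coefs, 2, range)
```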
3. Get the time series data set salaries.csv and put it in your usual working directory.
(A) Load the data set and have a look. It shows an employee’s salary as a function of the number of years worked.
(B) Fit a linear model to the data set and look at summary(your_model) to see various summary statistics. All indications are that the model is a pretty good fit, but just to be sure, look at plot(your_model, which = 1:2) to see two important diagnostic plots you may remember from a past statistics course: residuals vs. fitted values and a normal Q-Q plot of the residuals.
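A minimal sketch of the fitting and diagnostic steps. The column names of salaries.csv are not given here, so the sketch assumes hypothetical columns years and salary, and it builds a small simulated stand-in with made-up numbers (not the contents of salaries.csv) so that it runs on its own. With the real file, use the commented read.csv() line instead and adjust the column names.

```r
# Sketch: fit a straight line to salary vs. years worked and inspect diagnostics
# With the real file (column names years and salary are assumptions):
# salaries <- read.csv("salaries.csv")

# Simulated stand-in with made-up numbers, NOT the contents of salaries.csv
set.seed(455)
years <- 0:30
salaries <- data.frame(
  years = years,
  salary = 40000 * 1.05^years * exp(rnorm(length(years), sd = 0.02))
)

str(salaries)
lin_fit <- lm(salary ~ years, data = salaries)
summary(lin_fit)  # R-squared, coefficient t-tests, residual standard error

# which = 1: residuals vs. fitted values; which = 2: normal Q-Q plot of residuals
plot(lin_fit, which = 1:2)
```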
(C) Based on your answer to (B), what evidence do we have that the linear model is not appropriate?
(D) You may be aware that salaries, like most financial data, tend to grow exponentially over time. Use optim() and the sum of squared residuals to fit a model of the form y = a*b^x to these data points. This will basically involve redoing what we did in class for linear models, but replacing model1() with a new model that is exponential. In case it is helpful, here are the regression lecture commands in an R script.
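A sketch of the exponential fit, again on a simulated stand-in with made-up numbers and the assumed column names years and salary. model_exp() and sse_exp() are hypothetical names playing the roles of model1() and the distance function from class. Starting values come from a straight-line fit to log(salary), because log(y) = log(a) + x * log(b); control = list(parscale = ...) puts a (tens of thousands) and b (near 1) on comparable scales for Nelder-Mead.

```r
# Sketch: fit y = a * b^x by minimizing the sum of squared residuals with optim()
# Simulated stand-in with made-up numbers; years/salary are assumed column names
set.seed(455)
years <- 0:30
salaries <- data.frame(
  years = years,
  salary = 40000 * 1.05^years * exp(rnorm(length(years), sd = 0.02))
)

# Exponential model: a[1] plays the role of a, a[2] the role of b
model_exp <- function(a, data) a[1] * a[2]^data$years

# Sum of squared residuals for a candidate (a, b)
sse_exp <- function(a, data) sum((data$salary - model_exp(a, data))^2)

# Starting values: log(y) = log(a) + x * log(b), so a0 = exp(intercept), b0 = exp(slope)
log_fit <- lm(log(salary) ~ years, data = salaries)
start <- unname(exp(coef(log_fit)))

# parscale rescales both parameters so Nelder-Mead takes sensibly sized steps
exp_fit <- optim(start, sse_exp, data = salaries,
                 control = list(parscale = abs(start)))
exp_fit$par    # fitted (a, b)
exp_fit$value  # sum of squared residuals at the optimum
```

Minimizing squared residuals on the original salary scale, rather than stopping at the lm() fit to log(salary), is what makes the result directly comparable with the linear model's sum of squares.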
(E) This model has the same number of parameters as the linear model in Part (B), so both models are equally parsimonious. Were you able to improve on the sum-of-squares fit from Part (B)?
(F) Plot the residuals for the model in Part (D). Which model is superior, the linear model or the exponential model? Explain.
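A sketch of the comparison, recomputing both fits on the same simulated stand-in (made-up numbers, assumed columns years and salary) so it runs on its own. Residuals for the optim() fit are computed by hand, since optim() returns only parameters; the two residual plots are drawn side by side to make curvature or fanning easy to spot.

```r
# Sketch: compare residuals and sums of squares for the linear and exponential fits
# Simulated stand-in with made-up numbers; years/salary are assumed column names
set.seed(455)
years <- 0:30
salaries <- data.frame(
  years = years,
  salary = 40000 * 1.05^years * exp(rnorm(length(years), sd = 0.02))
)

lin_fit <- lm(salary ~ years, data = salaries)

# Exponential fit, as sketched for the y = a * b^x model (hypothetical helper names)
model_exp <- function(a, data) a[1] * a[2]^data$years
sse_exp <- function(a, data) sum((data$salary - model_exp(a, data))^2)
start <- unname(exp(coef(lm(log(salary) ~ years, data = salaries))))
exp_par <- optim(start, sse_exp, data = salaries,
                 control = list(parscale = abs(start)))$par

# optim() returns only parameters, so residuals are computed by hand
exp_fitted <- model_exp(exp_par, salaries)
exp_resid <- salaries$salary - exp_fitted

# Sum of squared residuals for each model (smaller means a closer fit)
c(linear = sum(resid(lin_fit)^2), exponential = sum(exp_resid^2))

# Residuals vs. fitted values side by side
par(mfrow = c(1, 2))
plot(fitted(lin_fit), resid(lin_fit), main = "Linear model",
     xlab = "Fitted salary", ylab = "Residual")
abline(h = 0, lty = 2)
plot(exp_fitted, exp_resid, main = "Exponential model",
     xlab = "Fitted salary", ylab = "Residual")
abline(h = 0, lty = 2)
```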