suppressPackageStartupMessages(library("tidyverse"))
package 㤼㸱tidyverse㤼㸲 was built under R version 3.6.3
suppressPackageStartupMessages(library("modelr"))
package 㤼㸱modelr㤼㸲 was built under R version 3.6.3

1. Instead of using lm() to fit a straight line, you can use loess() to fit a smooth curve. Repeat the process of model fitting, grid generation, predictions, and visualization on sim1 using loess() instead of lm(). How does the result compare to geom_smooth()?

I’ll use add_predictions() and add_residuals() to add the predictions and residuals from a loess regression to the sim1 data.

sim1_loess <- loess(y ~ x, data = sim1)
sim1_lm <- lm(y ~ x, data = sim1)

grid_loess <- sim1 %>%
  add_predictions(sim1_loess)

sim1 <- sim1 %>%
  add_residuals(sim1_lm) %>%
  add_predictions(sim1_lm) %>%
  add_residuals(sim1_loess, var = "resid_loess") %>%
  add_predictions(sim1_loess, var = "pred_loess")

This plots the loess predictions. The loess produces a nonlinear, smooth line through the data.

plot_sim1_loess <-
  ggplot(sim1, aes(x = x, y = y)) +
  geom_point() +
  geom_line(aes(x = x, y = pred), data = grid_loess, colour = "red")
plot_sim1_loess

The predictions of loess are the same as the default method for geom_smooth() because geom_smooth() uses loess() by default; the message even tells us that.

plot_sim1_loess +
  geom_smooth(method = "loess", colour = "blue", se = FALSE, alpha = 0.20)

We can plot the residuals (red), and compare them to the residuals from lm() (black). In general, the loess model has smaller residuals within the sample (out of sample is a different issue, and we haven’t considered the uncertainty of these estimates).

ggplot(sim1, aes(x = x)) +
  geom_ref_line(h = 0) +
  geom_point(aes(y = resid)) +
  geom_point(aes(y = resid_loess), colour = "red")

2. add_predictions() is paired with gather_predictions() and spread_predictions(). How do these three functions differ?

The functions gather_predictions() and spread_predictions() allow for adding predictions from multiple models at once.

Taking the sim1_mod example,

sim1_mod <- lm(y ~ x, data = sim1)
grid <- sim1 %>%
  data_grid(x)

The function add_predictions() adds only a single model at a time. To add two models:

grid %>%
  add_predictions(sim1_mod, var = "pred_lm") %>%
  add_predictions(sim1_loess, var = "pred_loess")

The function gather_predictions() adds predictions from multiple models by stacking the results and adding a column with the model name,

grid %>%
  gather_predictions(sim1_mod, sim1_loess)

The function spread_predictions() adds predictions from multiple models by adding multiple columns (postfixed with the model name) with predictions from each model.

grid %>%
  spread_predictions(sim1_mod, sim1_loess)

The function spread_predictions() is similar to the example which runs add_predictions() for each model, and is equivalent to running spread() after running gather_predictions():

grid %>%
  gather_predictions(sim1_mod, sim1_loess) %>%
  spread(model, pred)

3. What does geom_ref_line() do? What package does it come from? Why is displaying a reference line in plots showing residuals useful and important?

The geom geom_ref_line() adds as reference line to a plot. It is equivalent to running geom_hline() or geom_vline() with default settings that are useful for visualizing models. Putting a reference line at zero for residuals is important because good models (generally) should have residuals centered at zero, with approximately the same variance (or distribution) over the support of x, and no correlation. A zero reference line makes it easier to judge these characteristics visually.

4. Why might you want to look at a frequency polygon of absolute residuals? What are the pros and cons compared to looking at the raw residuals?

Showing the absolute values of the residuals makes it easier to view the spread of the residuals. The model assumes that the residuals have mean zero, and using the absolute values of the residuals effectively doubles the number of residuals.

sim1_mod <- lm(y ~ x, data = sim1)

sim1 <- sim1 %>%
  add_residuals(sim1_mod)

ggplot(sim1, aes(x = abs(resid))) +
  geom_freqpoly(binwidth = 0.5)

However, using the absolute values of residuals throws away information about the sign, meaning that the frequency polygon cannot show whether the model systematically over- or under-estimates the residuals.

LS0tDQp0aXRsZTogIlZpc3VhbGlzaW5nIG1vZGVscyINCm91dHB1dDogDQogIGh0bWxfbm90ZWJvb2s6DQogICAgdG9jOiB0cnVlDQogICAgdG9jX2Zsb2F0OiB0cnVlDQotLS0NCg0KYGBge3J9DQpzdXBwcmVzc1BhY2thZ2VTdGFydHVwTWVzc2FnZXMobGlicmFyeSgidGlkeXZlcnNlIikpDQpzdXBwcmVzc1BhY2thZ2VTdGFydHVwTWVzc2FnZXMobGlicmFyeSgibW9kZWxyIikpDQpgYGANCg0KIyMjIDEuIEluc3RlYWQgb2YgdXNpbmcgYGxtKClgIHRvIGZpdCBhIHN0cmFpZ2h0IGxpbmUsIHlvdSBjYW4gdXNlIGBsb2VzcygpYCB0byBmaXQgYSBzbW9vdGggY3VydmUuIFJlcGVhdCB0aGUgcHJvY2VzcyBvZiBtb2RlbCBmaXR0aW5nLCBncmlkIGdlbmVyYXRpb24sIHByZWRpY3Rpb25zLCBhbmQgdmlzdWFsaXphdGlvbiBvbiBgc2ltMWAgdXNpbmcgYGxvZXNzKClgIGluc3RlYWQgb2YgYGxtKClgLiBIb3cgZG9lcyB0aGUgcmVzdWx0IGNvbXBhcmUgdG8gYGdlb21fc21vb3RoKClgPw0KDQpJ4oCZbGwgdXNlIGBhZGRfcHJlZGljdGlvbnMoKWAgYW5kIGBhZGRfcmVzaWR1YWxzKClgIHRvIGFkZCB0aGUgcHJlZGljdGlvbnMgYW5kIHJlc2lkdWFscyBmcm9tIGEgKmxvZXNzKiByZWdyZXNzaW9uIHRvIHRoZSBgc2ltMWAgZGF0YS4NCg0KYGBge3J9DQpzaW0xX2xvZXNzIDwtIGxvZXNzKHkgfiB4LCBkYXRhID0gc2ltMSkNCnNpbTFfbG0gPC0gbG0oeSB+IHgsIGRhdGEgPSBzaW0xKQ0KDQpncmlkX2xvZXNzIDwtIHNpbTEgJT4lDQogIGFkZF9wcmVkaWN0aW9ucyhzaW0xX2xvZXNzKQ0KDQpzaW0xIDwtIHNpbTEgJT4lDQogIGFkZF9yZXNpZHVhbHMoc2ltMV9sbSkgJT4lDQogIGFkZF9wcmVkaWN0aW9ucyhzaW0xX2xtKSAlPiUNCiAgYWRkX3Jlc2lkdWFscyhzaW0xX2xvZXNzLCB2YXIgPSAicmVzaWRfbG9lc3MiKSAlPiUNCiAgYWRkX3ByZWRpY3Rpb25zKHNpbTFfbG9lc3MsIHZhciA9ICJwcmVkX2xvZXNzIikNCmBgYA0KDQpUaGlzIHBsb3RzIHRoZSBsb2VzcyBwcmVkaWN0aW9ucy4gVGhlIGxvZXNzIHByb2R1Y2VzIGEgbm9ubGluZWFyLCBzbW9vdGggbGluZSB0aHJvdWdoIHRoZSBkYXRhLg0KDQpgYGB7cn0NCnBsb3Rfc2ltMV9sb2VzcyA8LQ0KICBnZ3Bsb3Qoc2ltMSwgYWVzKHggPSB4LCB5ID0geSkpICsNCiAgZ2VvbV9wb2ludCgpICsNCiAgZ2VvbV9saW5lKGFlcyh4ID0geCwgeSA9IHByZWQpLCBkYXRhID0gZ3JpZF9sb2VzcywgY29sb3VyID0gInJlZCIpDQpwbG90X3NpbTFfbG9lc3MNCmBgYA0KDQpUaGUgcHJlZGljdGlvbnMgb2YgbG9lc3MgYXJlIHRoZSBzYW1lIGFzIHRoZSBkZWZhdWx0IG1ldGhvZCBmb3IgYGdlb21fc21vb3RoKClgIGJlY2F1c2UgYGdlb21fc21vb3RoKClgIHVzZXMgYGxvZXNzKClgIGJ5IGRlZmF1bHQ7IHRoZSBtZXNzYWdlIGV2ZW4gdGVsbHMgdXMgdGhhdC4NCg0KYGBge3J9DQpwbG90X3NpbTFfbG9lc3MgKw0KICBnZW9tX3Ntb290aChtZXRob2QgPSAibG9lc3MiLCBjb2xvdXIgPSAiYmx1ZSIsIHNlID0gRkFMU0UsIGFscGhhID0gMC4yMCkNCmBgYA0KDQpXZSBjYW4gcGxvdCB0aGUgcmVzaWR1YWxzIChyZWQpLCBhbmQgY29tcGFyZSB0aGVtIHRvIHRoZSByZXNpZHVhbHMgZnJvbSBgbG0oKWAgKGJsYWNrKS4gSW4gZ2VuZXJhbCwgdGhlICpsb2VzcyogbW9kZWwgaGFzIHNtYWxsZXIgcmVzaWR1YWxzIHdpdGhpbiB0aGUgc2FtcGxlIChvdXQgb2Ygc2FtcGxlIGlzIGEgZGlmZmVyZW50IGlzc3VlLCBhbmQgd2UgaGF2ZW7igJl0IGNvbnNpZGVyZWQgdGhlIHVuY2VydGFpbnR5IG9mIHRoZXNlIGVzdGltYXRlcykuDQoNCmBgYHtyfQ0KZ2dwbG90KHNpbTEsIGFlcyh4ID0geCkpICsNCiAgZ2VvbV9yZWZfbGluZShoID0gMCkgKw0KICBnZW9tX3BvaW50KGFlcyh5ID0gcmVzaWQpKSArDQogIGdlb21fcG9pbnQoYWVzKHkgPSByZXNpZF9sb2VzcyksIGNvbG91ciA9ICJyZWQiKQ0KYGBgDQoNCiMjIyAyLiBgYWRkX3ByZWRpY3Rpb25zKClgIGlzIHBhaXJlZCB3aXRoIGBnYXRoZXJfcHJlZGljdGlvbnMoKWAgYW5kIGBzcHJlYWRfcHJlZGljdGlvbnMoKWAuIEhvdyBkbyB0aGVzZSB0aHJlZSBmdW5jdGlvbnMgZGlmZmVyPw0KDQpUaGUgZnVuY3Rpb25zIGBnYXRoZXJfcHJlZGljdGlvbnMoKWAgYW5kIGBzcHJlYWRfcHJlZGljdGlvbnMoKWAgYWxsb3cgZm9yIGFkZGluZyBwcmVkaWN0aW9ucyBmcm9tIG11bHRpcGxlIG1vZGVscyBhdCBvbmNlLg0KDQpUYWtpbmcgdGhlIGBzaW0xX21vZGAgZXhhbXBsZSwNCg0KYGBge3J9DQpzaW0xX21vZCA8LSBsbSh5IH4geCwgZGF0YSA9IHNpbTEpDQpncmlkIDwtIHNpbTEgJT4lDQogIGRhdGFfZ3JpZCh4KQ0KYGBgDQoNClRoZSBmdW5jdGlvbiBgYWRkX3ByZWRpY3Rpb25zKClgIGFkZHMgb25seSBhIHNpbmdsZSBtb2RlbCBhdCBhIHRpbWUuIFRvIGFkZCB0d28gbW9kZWxzOg0KDQpgYGB7cn0NCmdyaWQgJT4lDQogIGFkZF9wcmVkaWN0aW9ucyhzaW0xX21vZCwgdmFyID0gInByZWRfbG0iKSAlPiUNCiAgYWRkX3ByZWRpY3Rpb25zKHNpbTFfbG9lc3MsIHZhciA9ICJwcmVkX2xvZXNzIikNCmBgYA0KDQpUaGUgZnVuY3Rpb24gYGdhdGhlcl9wcmVkaWN0aW9ucygpYCBhZGRzIHByZWRpY3Rpb25zIGZyb20gbXVsdGlwbGUgbW9kZWxzIGJ5IHN0YWNraW5nIHRoZSByZXN1bHRzIGFuZCBhZGRpbmcgYSBjb2x1bW4gd2l0aCB0aGUgbW9kZWwgbmFtZSwNCg0KYGBge3J9DQpncmlkICU+JQ0KICBnYXRoZXJfcHJlZGljdGlvbnMoc2ltMV9tb2QsIHNpbTFfbG9lc3MpDQpgYGANCg0KVGhlIGZ1bmN0aW9uIGBzcHJlYWRfcHJlZGljdGlvbnMoKWAgYWRkcyBwcmVkaWN0aW9ucyBmcm9tIG11bHRpcGxlIG1vZGVscyBieSBhZGRpbmcgbXVsdGlwbGUgY29sdW1ucyAocG9zdGZpeGVkIHdpdGggdGhlIG1vZGVsIG5hbWUpIHdpdGggcHJlZGljdGlvbnMgZnJvbSBlYWNoIG1vZGVsLg0KDQpgYGB7cn0NCmdyaWQgJT4lDQogIHNwcmVhZF9wcmVkaWN0aW9ucyhzaW0xX21vZCwgc2ltMV9sb2VzcykNCmBgYA0KDQpUaGUgZnVuY3Rpb24gYHNwcmVhZF9wcmVkaWN0aW9ucygpYCBpcyBzaW1pbGFyIHRvIHRoZSBleGFtcGxlIHdoaWNoIHJ1bnMgYGFkZF9wcmVkaWN0aW9ucygpYCBmb3IgZWFjaCBtb2RlbCwgYW5kIGlzIGVxdWl2YWxlbnQgdG8gcnVubmluZyBzcHJlYWQoKSBhZnRlciBydW5uaW5nIGBnYXRoZXJfcHJlZGljdGlvbnMoKWA6DQoNCmBgYHtyfQ0KZ3JpZCAlPiUNCiAgZ2F0aGVyX3ByZWRpY3Rpb25zKHNpbTFfbW9kLCBzaW0xX2xvZXNzKSAlPiUNCiAgc3ByZWFkKG1vZGVsLCBwcmVkKQ0KYGBgDQoNCiMjIyAzLiBXaGF0IGRvZXMgYGdlb21fcmVmX2xpbmUoKWAgZG8/IFdoYXQgcGFja2FnZSBkb2VzIGl0IGNvbWUgZnJvbT8gV2h5IGlzIGRpc3BsYXlpbmcgYSByZWZlcmVuY2UgbGluZSBpbiBwbG90cyBzaG93aW5nIHJlc2lkdWFscyB1c2VmdWwgYW5kIGltcG9ydGFudD8NCg0KVGhlIGBnZW9tIGdlb21fcmVmX2xpbmUoKWAgYWRkcyBhcyByZWZlcmVuY2UgbGluZSB0byBhIHBsb3QuIEl0IGlzIGVxdWl2YWxlbnQgdG8gcnVubmluZyBgZ2VvbV9obGluZSgpYCBvciBgZ2VvbV92bGluZSgpYCB3aXRoIGRlZmF1bHQgc2V0dGluZ3MgdGhhdCBhcmUgdXNlZnVsIGZvciB2aXN1YWxpemluZyBtb2RlbHMuIFB1dHRpbmcgYSByZWZlcmVuY2UgbGluZSBhdCB6ZXJvIGZvciByZXNpZHVhbHMgaXMgaW1wb3J0YW50IGJlY2F1c2UgZ29vZCBtb2RlbHMgKGdlbmVyYWxseSkgc2hvdWxkIGhhdmUgcmVzaWR1YWxzIGNlbnRlcmVkIGF0IHplcm8sIHdpdGggYXBwcm94aW1hdGVseSB0aGUgc2FtZSB2YXJpYW5jZSAob3IgZGlzdHJpYnV0aW9uKSBvdmVyIHRoZSBzdXBwb3J0IG9mIGB4YCwgYW5kIG5vIGNvcnJlbGF0aW9uLiBBIHplcm8gcmVmZXJlbmNlIGxpbmUgbWFrZXMgaXQgZWFzaWVyIHRvIGp1ZGdlIHRoZXNlIGNoYXJhY3RlcmlzdGljcyB2aXN1YWxseS4NCg0KIyMjIDQuIFdoeSBtaWdodCB5b3Ugd2FudCB0byBsb29rIGF0IGEgZnJlcXVlbmN5IHBvbHlnb24gb2YgYWJzb2x1dGUgcmVzaWR1YWxzPyBXaGF0IGFyZSB0aGUgcHJvcyBhbmQgY29ucyBjb21wYXJlZCB0byBsb29raW5nIGF0IHRoZSByYXcgcmVzaWR1YWxzPw0KDQpTaG93aW5nIHRoZSBhYnNvbHV0ZSB2YWx1ZXMgb2YgdGhlIHJlc2lkdWFscyBtYWtlcyBpdCBlYXNpZXIgdG8gdmlldyB0aGUgc3ByZWFkIG9mIHRoZSByZXNpZHVhbHMuIFRoZSBtb2RlbCBhc3N1bWVzIHRoYXQgdGhlIHJlc2lkdWFscyBoYXZlIG1lYW4gemVybywgYW5kIHVzaW5nIHRoZSBhYnNvbHV0ZSB2YWx1ZXMgb2YgdGhlIHJlc2lkdWFscyBlZmZlY3RpdmVseSBkb3VibGVzIHRoZSBudW1iZXIgb2YgcmVzaWR1YWxzLg0KDQpgYGB7cn0NCnNpbTFfbW9kIDwtIGxtKHkgfiB4LCBkYXRhID0gc2ltMSkNCg0Kc2ltMSA8LSBzaW0xICU+JQ0KICBhZGRfcmVzaWR1YWxzKHNpbTFfbW9kKQ0KDQpnZ3Bsb3Qoc2ltMSwgYWVzKHggPSBhYnMocmVzaWQpKSkgKw0KICBnZW9tX2ZyZXFwb2x5KGJpbndpZHRoID0gMC41KQ0KYGBgDQoNCkhvd2V2ZXIsIHVzaW5nIHRoZSBhYnNvbHV0ZSB2YWx1ZXMgb2YgcmVzaWR1YWxzIHRocm93cyBhd2F5IGluZm9ybWF0aW9uIGFib3V0IHRoZSBzaWduLCBtZWFuaW5nIHRoYXQgdGhlIGZyZXF1ZW5jeSBwb2x5Z29uIGNhbm5vdCBzaG93IHdoZXRoZXIgdGhlIG1vZGVsIHN5c3RlbWF0aWNhbGx5IG92ZXItIG9yIHVuZGVyLWVzdGltYXRlcyB0aGUgcmVzaWR1YWxzLg==