suppressPackageStartupMessages(library("tidyverse"))
package 㤼㸱tidyverse㤼㸲 was built under R version 3.6.3
suppressPackageStartupMessages(library("modelr"))
package 㤼㸱modelr㤼㸲 was built under R version 3.6.3
The option na.action
determines how missing values are handled. It is a function. na.warn
sets it so that there is a warning if there are any missing values. If it is not set (the default), R will silently drop them.
options(na.action = na.warn)
1 One downside of the linear model is that it is sensitive to unusual values because the distance incorporates a squared term. Fit a linear model to the simulated data below, and visualize the results. Rerun a few times to generate different simulated datasets. What do you notice about the model?
sim1a <- tibble(
x = rep(1:10, each = 3),
y = x * 1.5 + 6 + rt(length(x), df = 2)
)
Let’s run it once and plot the results:
ggplot(sim1a, aes(x = x, y = y)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE)

We can also do this more systematically, by generating several simulations and plotting the line.
simt <- function(i) {
tibble(
x = rep(1:10, each = 3),
y = x * 1.5 + 6 + rt(length(x), df = 2),
.id = i
)
}
sims <- map_df(1:12, simt)
ggplot(sims, aes(x = x, y = y)) +
geom_point() +
geom_smooth(method = "lm", colour = "red") +
facet_wrap(~.id, ncol = 4)

What if we did the same things with normal distributions?
sim_norm <- function(i) {
tibble(
x = rep(1:10, each = 3),
y = x * 1.5 + 6 + rnorm(length(x)),
.id = i
)
}
simdf_norm <- map_df(1:12, sim_norm)
ggplot(simdf_norm, aes(x = x, y = y)) +
geom_point() +
geom_smooth(method = "lm", colour = "red") +
facet_wrap(~.id, ncol = 4)

There are not large outliers, and the slopes are more similar.
The reason for this is that the Student’s t-distribution, from which we sample with rt()
has heavier tails than the normal distribution (rnorm()
). This means that the Student’s t-distribution assigns a larger probability to values further from the center of the distribution.
tibble(
x = seq(-5, 5, length.out = 100),
normal = dnorm(x),
student_t = dt(x, df = 2)
) %>%
gather(distribution, density, -x) %>%
ggplot(aes(x = x, y = density, colour = distribution)) +
geom_line()

For a normal distribution with mean zero and standard deviation one, the probability of being greater than 2 is,
pnorm(2, lower.tail = FALSE)
[1] 0.02275013
For a Student’s t-distribution with degrees of freedom = 2, it is more than 3 times higher,
pt(2, df = 2, lower.tail = FALSE)
[1] 0.09175171
2. One way to make linear models more robust is to use a different distance measure. For example, instead of root-mean-squared distance, you could use mean-absolute distance:
measure_distance <- function(mod, data) {
diff <- data$y - make_prediction(mod, data)
mean(abs(diff))
}
For the above function to work, we need to define a function, make_prediction()
, that takes a numeric vector of length two (the intercept and slope) and returns the predictions,
make_prediction <- function(mod, data) {
mod[1] + mod[2] * data$x
}
Using the sim1a
data, the best parameters of the least absolute deviation are:
best <- optim(c(0, 0), measure_distance, data = sim1a)
best$par
[1] 6.440298 1.459773
Using the sim1a
data, while the parameters the minimize the least squares objective function are:
measure_distance_ls <- function(mod, data) {
diff <- data$y - (mod[1] + mod[2] * data$x)
sqrt(mean(diff^2))
}
best <- optim(c(0, 0), measure_distance_ls, data = sim1a)
best$par
[1] 6.078735 1.488993
In practice, I suggest not using optim()
to fit this model, and instead using an existing implementation. The rlm()
and lqs()
functions in the MASS fit robust and resistant linear models.
LS0tDQp0aXRsZTogIkEgc2ltcGxlIG1vZGVsIg0Kb3V0cHV0OiANCiAgaHRtbF9ub3RlYm9vazoNCiAgICB0b2M6IHRydWUNCiAgICB0b2NfZmxvYXQ6IHRydWUNCi0tLQ0KDQpgYGB7cn0NCnN1cHByZXNzUGFja2FnZVN0YXJ0dXBNZXNzYWdlcyhsaWJyYXJ5KCJ0aWR5dmVyc2UiKSkNCnN1cHByZXNzUGFja2FnZVN0YXJ0dXBNZXNzYWdlcyhsaWJyYXJ5KCJtb2RlbHIiKSkNCmBgYA0KDQpUaGUgb3B0aW9uIGBuYS5hY3Rpb25gIGRldGVybWluZXMgaG93IG1pc3NpbmcgdmFsdWVzIGFyZSBoYW5kbGVkLiBJdCBpcyBhIGZ1bmN0aW9uLiBgbmEud2FybmAgc2V0cyBpdCBzbyB0aGF0IHRoZXJlIGlzIGEgd2FybmluZyBpZiB0aGVyZSBhcmUgYW55IG1pc3NpbmcgdmFsdWVzLiBJZiBpdCBpcyBub3Qgc2V0ICh0aGUgZGVmYXVsdCksIFIgd2lsbCBzaWxlbnRseSBkcm9wIHRoZW0uDQoNCmBgYHtyfQ0Kb3B0aW9ucyhuYS5hY3Rpb24gPSBuYS53YXJuKQ0KYGBgDQoNCiMjIyAxIE9uZSBkb3duc2lkZSBvZiB0aGUgbGluZWFyIG1vZGVsIGlzIHRoYXQgaXQgaXMgc2Vuc2l0aXZlIHRvIHVudXN1YWwgdmFsdWVzIGJlY2F1c2UgdGhlIGRpc3RhbmNlIGluY29ycG9yYXRlcyBhIHNxdWFyZWQgdGVybS4gRml0IGEgbGluZWFyIG1vZGVsIHRvIHRoZSBzaW11bGF0ZWQgZGF0YSBiZWxvdywgYW5kIHZpc3VhbGl6ZSB0aGUgcmVzdWx0cy4gUmVydW4gYSBmZXcgdGltZXMgdG8gZ2VuZXJhdGUgZGlmZmVyZW50IHNpbXVsYXRlZCBkYXRhc2V0cy4gV2hhdCBkbyB5b3Ugbm90aWNlIGFib3V0IHRoZSBtb2RlbD8NCg0KYGBge3J9DQpzaW0xYSA8LSB0aWJibGUoDQogIHggPSByZXAoMToxMCwgZWFjaCA9IDMpLA0KICB5ID0geCAqIDEuNSArIDYgKyBydChsZW5ndGgoeCksIGRmID0gMikNCikNCmBgYA0KDQpMZXTigJlzIHJ1biBpdCBvbmNlIGFuZCBwbG90IHRoZSByZXN1bHRzOg0KDQpgYGB7cn0NCmdncGxvdChzaW0xYSwgYWVzKHggPSB4LCB5ID0geSkpICsNCiAgZ2VvbV9wb2ludCgpICsNCiAgZ2VvbV9zbW9vdGgobWV0aG9kID0gImxtIiwgc2UgPSBGQUxTRSkNCmBgYA0KDQpXZSBjYW4gYWxzbyBkbyB0aGlzIG1vcmUgc3lzdGVtYXRpY2FsbHksIGJ5IGdlbmVyYXRpbmcgc2V2ZXJhbCBzaW11bGF0aW9ucyBhbmQgcGxvdHRpbmcgdGhlIGxpbmUuDQoNCmBgYHtyfQ0Kc2ltdCA8LSBmdW5jdGlvbihpKSB7DQogIHRpYmJsZSgNCiAgICB4ID0gcmVwKDE6MTAsIGVhY2ggPSAzKSwNCiAgICB5ID0geCAqIDEuNSArIDYgKyBydChsZW5ndGgoeCksIGRmID0gMiksDQogICAgLmlkID0gaQ0KICApDQp9DQoNCnNpbXMgPC0gbWFwX2RmKDE6MTIsIHNpbXQpDQoNCmdncGxvdChzaW1zLCBhZXMoeCA9IHgsIHkgPSB5KSkgKw0KICBnZW9tX3BvaW50KCkgKw0KICBnZW9tX3Ntb290aChtZXRob2QgPSAibG0iLCBjb2xvdXIgPSAicmVkIikgKw0KICBmYWNldF93cmFwKH4uaWQsIG5jb2wgPSA0KQ0KYGBgDQoNCldoYXQgaWYgd2UgZGlkIHRoZSBzYW1lIHRoaW5ncyB3aXRoIG5vcm1hbCBkaXN0cmlidXRpb25zPw0KDQpgYGB7cn0NCnNpbV9ub3JtIDwtIGZ1bmN0aW9uKGkpIHsNCiAgdGliYmxlKA0KICAgIHggPSByZXAoMToxMCwgZWFjaCA9IDMpLA0KICAgIHkgPSB4ICogMS41ICsgNiArIHJub3JtKGxlbmd0aCh4KSksDQogICAgLmlkID0gaQ0KICApDQp9DQoNCnNpbWRmX25vcm0gPC0gbWFwX2RmKDE6MTIsIHNpbV9ub3JtKQ0KDQpnZ3Bsb3Qoc2ltZGZfbm9ybSwgYWVzKHggPSB4LCB5ID0geSkpICsNCiAgZ2VvbV9wb2ludCgpICsNCiAgZ2VvbV9zbW9vdGgobWV0aG9kID0gImxtIiwgY29sb3VyID0gInJlZCIpICsNCiAgZmFjZXRfd3JhcCh+LmlkLCBuY29sID0gNCkNCmBgYA0KDQpUaGVyZSBhcmUgbm90IGxhcmdlIG91dGxpZXJzLCBhbmQgdGhlIHNsb3BlcyBhcmUgbW9yZSBzaW1pbGFyLg0KDQpUaGUgcmVhc29uIGZvciB0aGlzIGlzIHRoYXQgdGhlIFN0dWRlbnTigJlzICp0Ki1kaXN0cmlidXRpb24sIGZyb20gd2hpY2ggd2Ugc2FtcGxlIHdpdGggYHJ0KClgIGhhcyBoZWF2aWVyIHRhaWxzIHRoYW4gdGhlIG5vcm1hbCBkaXN0cmlidXRpb24gKGBybm9ybSgpYCkuIFRoaXMgbWVhbnMgdGhhdCB0aGUgU3R1ZGVudOKAmXMgKnQqLWRpc3RyaWJ1dGlvbiBhc3NpZ25zIGEgbGFyZ2VyIHByb2JhYmlsaXR5IHRvIHZhbHVlcyBmdXJ0aGVyIGZyb20gdGhlIGNlbnRlciBvZiB0aGUgZGlzdHJpYnV0aW9uLg0KDQpgYGB7cn0NCnRpYmJsZSgNCiAgeCA9IHNlcSgtNSwgNSwgbGVuZ3RoLm91dCA9IDEwMCksDQogIG5vcm1hbCA9IGRub3JtKHgpLA0KICBzdHVkZW50X3QgPSBkdCh4LCBkZiA9IDIpDQopICU+JQ0KICBnYXRoZXIoZGlzdHJpYnV0aW9uLCBkZW5zaXR5LCAteCkgJT4lDQogIGdncGxvdChhZXMoeCA9IHgsIHkgPSBkZW5zaXR5LCBjb2xvdXIgPSBkaXN0cmlidXRpb24pKSArDQogIGdlb21fbGluZSgpDQpgYGANCg0KRm9yIGEgbm9ybWFsIGRpc3RyaWJ1dGlvbiB3aXRoIG1lYW4gemVybyBhbmQgc3RhbmRhcmQgZGV2aWF0aW9uIG9uZSwgdGhlIHByb2JhYmlsaXR5IG9mIGJlaW5nIGdyZWF0ZXIgdGhhbiAyIGlzLA0KDQpgYGB7cn0NCnBub3JtKDIsIGxvd2VyLnRhaWwgPSBGQUxTRSkNCmBgYA0KDQpGb3IgYSBTdHVkZW504oCZcyAqdCotZGlzdHJpYnV0aW9uIHdpdGggZGVncmVlcyBvZiBmcmVlZG9tID0gMiwgaXQgaXMgbW9yZSB0aGFuIDMgdGltZXMgaGlnaGVyLA0KDQpgYGB7cn0NCnB0KDIsIGRmID0gMiwgbG93ZXIudGFpbCA9IEZBTFNFKQ0KYGBgDQoNCiMjIyAyLiBPbmUgd2F5IHRvIG1ha2UgbGluZWFyIG1vZGVscyBtb3JlIHJvYnVzdCBpcyB0byB1c2UgYSBkaWZmZXJlbnQgZGlzdGFuY2UgbWVhc3VyZS4gRm9yIGV4YW1wbGUsIGluc3RlYWQgb2Ygcm9vdC1tZWFuLXNxdWFyZWQgZGlzdGFuY2UsIHlvdSBjb3VsZCB1c2UgbWVhbi1hYnNvbHV0ZSBkaXN0YW5jZToNCg0KYGBge3J9DQptZWFzdXJlX2Rpc3RhbmNlIDwtIGZ1bmN0aW9uKG1vZCwgZGF0YSkgew0KICBkaWZmIDwtIGRhdGEkeSAtIG1ha2VfcHJlZGljdGlvbihtb2QsIGRhdGEpDQogIG1lYW4oYWJzKGRpZmYpKQ0KfQ0KYGBgDQoNCkZvciB0aGUgYWJvdmUgZnVuY3Rpb24gdG8gd29yaywgd2UgbmVlZCB0byBkZWZpbmUgYSBmdW5jdGlvbiwgYG1ha2VfcHJlZGljdGlvbigpYCwgdGhhdCB0YWtlcyBhIG51bWVyaWMgdmVjdG9yIG9mIGxlbmd0aCB0d28gKHRoZSBpbnRlcmNlcHQgYW5kIHNsb3BlKSBhbmQgcmV0dXJucyB0aGUgcHJlZGljdGlvbnMsDQoNCmBgYHtyfQ0KbWFrZV9wcmVkaWN0aW9uIDwtIGZ1bmN0aW9uKG1vZCwgZGF0YSkgew0KICBtb2RbMV0gKyBtb2RbMl0gKiBkYXRhJHgNCn0NCmBgYA0KDQpVc2luZyB0aGUgYHNpbTFhYCBkYXRhLCB0aGUgYmVzdCBwYXJhbWV0ZXJzIG9mIHRoZSBsZWFzdCBhYnNvbHV0ZSBkZXZpYXRpb24gYXJlOg0KDQpgYGB7cn0NCmJlc3QgPC0gb3B0aW0oYygwLCAwKSwgbWVhc3VyZV9kaXN0YW5jZSwgZGF0YSA9IHNpbTFhKQ0KYmVzdCRwYXINCmBgYA0KDQpVc2luZyB0aGUgYHNpbTFhYCBkYXRhLCB3aGlsZSB0aGUgcGFyYW1ldGVycyB0aGUgbWluaW1pemUgdGhlIGxlYXN0IHNxdWFyZXMgb2JqZWN0aXZlIGZ1bmN0aW9uIGFyZToNCg0KYGBge3J9DQptZWFzdXJlX2Rpc3RhbmNlX2xzIDwtIGZ1bmN0aW9uKG1vZCwgZGF0YSkgew0KICBkaWZmIDwtIGRhdGEkeSAtIChtb2RbMV0gKyBtb2RbMl0gKiBkYXRhJHgpDQogIHNxcnQobWVhbihkaWZmXjIpKQ0KfQ0KDQpiZXN0IDwtIG9wdGltKGMoMCwgMCksIG1lYXN1cmVfZGlzdGFuY2VfbHMsIGRhdGEgPSBzaW0xYSkNCmJlc3QkcGFyDQpgYGANCg0KSW4gcHJhY3RpY2UsIEkgc3VnZ2VzdCBub3QgdXNpbmcgYG9wdGltKClgIHRvIGZpdCB0aGlzIG1vZGVsLCBhbmQgaW5zdGVhZCB1c2luZyBhbiBleGlzdGluZyBpbXBsZW1lbnRhdGlvbi4gVGhlIGBybG0oKWAgYW5kIGBscXMoKWAgZnVuY3Rpb25zIGluIHRoZSBbTUFTU10oaHR0cHM6Ly9jcmFuLnItcHJvamVjdC5vcmcvcGFja2FnZT1NQVNTKSBmaXQgcm9idXN0IGFuZCByZXNpc3RhbnQgbGluZWFyIG1vZGVscy4NCg0KIyMjIDMuIE9uZSBjaGFsbGVuZ2Ugd2l0aCBwZXJmb3JtaW5nIG51bWVyaWNhbCBvcHRpbWl6YXRpb24gaXMgdGhhdCBpdOKAmXMgb25seSBndWFyYW50ZWVkIHRvIGZpbmQgYSBsb2NhbCBvcHRpbXVtLiBXaGF04oCZcyB0aGUgcHJvYmxlbSB3aXRoIG9wdGltaXppbmcgYSB0aHJlZSBwYXJhbWV0ZXIgbW9kZWwgbGlrZSB0aGlzPw0KDQpgYGB7cn0NCm1vZGVsMyA8LSBmdW5jdGlvbihhLCBkYXRhKSB7DQogIGFbMV0gKyBkYXRhJHggKiBhWzJdICsgYVszXQ0KfQ0KYGBgDQoNClRoZSBwcm9ibGVtIGlzIHRoYXQgeW91IGZvciBhbnkgdmFsdWVzIGBhWzFdID0gYTFgIGFuZCBgYVszXSA9IGEzYCwgYW55IG90aGVyIHZhbHVlcyBvZiBgYVsxXWAgYW5kIGBhWzNdYCB3aGVyZSBgYVsxXSArIGFbM10gPT0gKGExICsgYTMpYCB3aWxsIGhhdmUgdGhlIHNhbWUgZml0Lg0KDQpgYGB7cn0NCm1lYXN1cmVfZGlzdGFuY2VfMyA8LSBmdW5jdGlvbihhLCBkYXRhKSB7DQogIGRpZmYgPC0gZGF0YSR5IC0gbW9kZWwzKGEsIGRhdGEpDQogIHNxcnQobWVhbihkaWZmXjIpKQ0KfQ0KYGBgDQoNCkRlcGVuZGluZyBvbiBvdXIgc3RhcnRpbmcgcG9pbnRzLCB3ZSBjYW4gZmluZCBkaWZmZXJlbnQgb3B0aW1hbCB2YWx1ZXM6DQoNCmBgYHtyfQ0KYmVzdDNhIDwtIG9wdGltKGMoMCwgMCwgMCksIG1lYXN1cmVfZGlzdGFuY2VfMywgZGF0YSA9IHNpbTEpDQpiZXN0M2EkcGFyDQpiZXN0M2IgPC0gb3B0aW0oYygwLCAwLCAxKSwgbWVhc3VyZV9kaXN0YW5jZV8zLCBkYXRhID0gc2ltMSkNCmJlc3QzYiRwYXINCmJlc3QzYyA8LSBvcHRpbShjKDAsIDAsIDUpLCBtZWFzdXJlX2Rpc3RhbmNlXzMsIGRhdGEgPSBzaW0xKQ0KYmVzdDNjJHBhcg0KYGBgDQoNCkluIGZhY3QgdGhlcmUgYXJlIGFuIGluZmluaXRlIG51bWJlciBvZiBvcHRpbWFsIHZhbHVlcyBmb3IgdGhpcyBtb2RlbC4=