A common confusion about regression is that either \(X\) or \(Y\) needs to be normally distributed. For \(X\), it’s easier to see why that can’t be the case (apart from the fact that it’s not an OLS assumption): you couldn’t include factor variables (sex, social class, etc.) if \(X\) needed to be normally distributed.

It’s slightly less clear from an application standpoint why \(Y\) doesn’t have to be normally distributed. We know that one of the commonly mentioned regression assumptions is that the error terms of our model are normally distributed. This assumption isn’t necessary for the OLS estimator to be BLUE; however, it matters when constructing confidence intervals for our regression coefficients, especially in small samples.

Now the question that never crossed my mind was this: if the error terms are normally distributed, is \(Y\) necessarily normally distributed?

Fortunately, more curious people asked that question. The answer is no.

The example below uses a \(Y\) with a bimodal distribution (hence not normal) to show how you can have a non-normally distributed \(Y\) with normally distributed residuals.
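To see why this can happen, here is a short sketch of the setup used in the simulation. The symbols \(p\), \(\mu\), \(\sigma^2\) and \(\phi\) are my own notation for illustration: \(p = P(X = 1)\), \(\mu\) and \(\sigma^2\) are the mean and variance of the error, and \(\phi(\cdot\,; m, s^2)\) is the normal density with mean \(m\) and variance \(s^2\).

\[
Y = \beta_0 + \beta_1 X + \varepsilon, \qquad X \sim \text{Bernoulli}(p), \qquad \varepsilon \sim N(\mu, \sigma^2) \ \text{independent of } X
\]

\[
Y \mid X = x \ \sim\ N(\beta_0 + \mu + \beta_1 x,\ \sigma^2)
\]

\[
f_Y(y) = (1 - p)\,\phi(y;\ \beta_0 + \mu,\ \sigma^2) + p\,\phi(y;\ \beta_0 + \mu + \beta_1,\ \sigma^2)
\]

So \(Y\) is normal within each level of \(X\), but its marginal distribution is a mixture of two normals. With equal weights (\(p = 0.5\)) and a common variance, that mixture is bimodal whenever \(|\beta_1| > 2\sigma\).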

Simulation

Let’s start by simulating 10,000 observations from a binomial distribution with a single trial (that is, a Bernoulli distribution):

set.seed(1994)
xi <- rbinom(10000, 1, .5)

We create \(y_i\), our dependent variable, from \(x_i\):

yi <- 0 + 5 * xi + rnorm(10000, mean = .7)  # errors: mean 0.7, sd 1 (the default)

Let’s plot a histogram of \(y_i\) to visually inspect its distribution:

hist(yi, breaks = 20)

As we can see from the histogram above, \(y_i\) has a bimodal distribution (since our \(x_i\) is dichotomous and shifts \(y_i\) by 5 whenever \(x_i = 1\)). We have achieved our goal of creating a \(y_i\) that’s not normally distributed.

We can also look at the QQ plot of \(y_i\) to double-check the distribution.

qqnorm(yi)

From the QQ plot above we can see that the dependent variable \(y_i\) is not normally distributed.
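If you prefer a formal test to go with the plots, one option is the Shapiro-Wilk test. The sketch below is self-contained and re-creates the simulated data. The names `y_sub` and `sw_y` are my own, and the random subsample is there only because `shapiro.test()` accepts at most 5000 values.

```r
# Re-create the simulated data from this post
set.seed(1994)
xi <- rbinom(10000, 1, .5)
yi <- 0 + 5 * xi + rnorm(10000, .7)

# shapiro.test() takes between 3 and 5000 values, so test a random subsample
y_sub <- sample(yi, 5000)
sw_y <- shapiro.test(y_sub)

# A very small p-value is evidence against normality of y
sw_y$p.value
```

With a sample this large and two modes this far apart, the test should reject normality clearly.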

Fitting Our Model

We will now fit a regression of \(y_i\) on \(x_i\)

model <- lm(yi ~ xi)
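Before looking at the diagnostics, it may help to check what the model recovered. In `rnorm(10000, .7)` the second positional argument is the mean, so the errors have mean 0.7 and the default standard deviation of 1. As a rough expectation, then, the intercept should land near 0.7 (the nominal intercept of 0 plus the error mean) and the slope near 5. The sketch below is self-contained, and `est` is just an illustrative name.

```r
# Re-create the simulated data and the fitted model from this post
set.seed(1994)
xi <- rbinom(10000, 1, .5)
yi <- 0 + 5 * xi + rnorm(10000, .7)
model <- lm(yi ~ xi)

# Named vector of estimates: "(Intercept)" and "xi"
est <- coef(model)
est
```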

Let’s check the diagnostic plots of our regression

plot(model)

The residuals-versus-fitted plot shows two vertical bands because \(x_i\) is dichotomous, so the model produces only two fitted values. Our focus, however, is on the distribution of the residuals, which we can see in the second plot (the QQ plot). Let’s look at that one on its own:

qqnorm(resid(model))

As we can see, our residuals look normally distributed even though our dependent variable \(y_i\) wasn’t. We can check the histogram of the residuals as well to double-check:

hist(resid(model), breaks = 20)
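A few numerical checks can complement the residual plots. This is a minimal, self-contained sketch. The names `res`, `res_sd`, `group_means` and `sw_res` are my own, and the Shapiro-Wilk test is again run on a random subsample because `shapiro.test()` accepts at most 5000 values.

```r
# Re-create the simulated data and the fitted model from this post
set.seed(1994)
xi <- rbinom(10000, 1, .5)
yi <- 0 + 5 * xi + rnorm(10000, .7)
model <- lm(yi ~ xi)
res <- resid(model)

# Spread of the residuals: should be close to the error sd of 1
res_sd <- sd(res)

# Mean residual within each level of xi: zero up to floating-point error,
# because the model fits a separate mean for each of the two groups
group_means <- tapply(res, xi, mean)

# Shapiro-Wilk test on a random subsample of the residuals
sw_res <- shapiro.test(sample(res, 5000))

res_sd
group_means
sw_res$p.value
```

A large p-value here would be consistent with normally distributed residuals, though I would read it alongside the QQ plot rather than on its own.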

Conclusion

We have shown in this post that normally distributed errors neither require nor imply a normally distributed dependent variable. We used a dependent variable with a bimodal distribution and found that the residuals from its regression are normally distributed.

References:

http://www.programmingr.com/examples/neat-tricks/sample-r-function/r-rbinom/ Simulating binomial and Bernoulli distributions in R

https://stats.stackexchange.com/questions/11351/left-skewed-vs-symmetric-distribution-observed/11352#11352 The code above is inspired by this Stack Exchange post

https://stats.stackexchange.com/questions/12262/what-if-residuals-are-normally-distributed-but-y-is-not Another example using a multimodal distribution of \(Y\)
