Effect of right-skew on qq-plot

Let’s create a data set that is clearly skewed right. We’ll name our variable right, and put 500 observations in it using random draws from the exponential distribution (rate = 2).

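# Assumption: histogram() and qqmath() come from lattice, and the width
# argument to histogram() suggests the mosaic package, which loads lattice
library(mosaic)
right = rexp(500, rate = 2)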

The histogram shows that this data set is strongly skewed to the right: most values pile up near zero, and a long tail stretches out to the right.

histogram(right, width = .25)

The qq-plot is clearly curved: compared with a straight reference line, the middle portion falls below the line and the ends rise above it.

qqmath(right)
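
One way to make that curvature concrete is to measure how far each point sits from a straight reference line. The sketch below uses only base R and draws the line through the first and third quartiles, which is the construction qqline() uses. The seed and the *_demo names are assumptions made for illustration and are not part of the original notebook.

```r
# Minimal sketch; the seed and every *_demo name are illustrative assumptions.
set.seed(1)
right_demo <- rexp(500, rate = 2)

# Pair each sorted observation with its theoretical normal quantile
p        <- ppoints(length(right_demo))
theo_q   <- qnorm(p)
sample_q <- sort(right_demo)

# Reference line through the first and third quartiles, as qqline() draws it
q_sample  <- unname(quantile(right_demo, c(0.25, 0.75)))
q_normal  <- qnorm(c(0.25, 0.75))
slope     <- diff(q_sample) / diff(q_normal)
intercept <- q_sample[1] - slope * q_normal[1]

# Vertical distance of each point from the line (positive = above the line)
dev <- sample_q - (intercept + slope * theo_q)

# Average distance in the lower tail, the middle, and the upper tail
band_means <- c(lower  = mean(dev[p < 0.10]),
                middle = mean(dev[p > 0.40 & p < 0.60]),
                upper  = mean(dev[p > 0.90]))
round(band_means, 3)
```

For right-skewed data the two tail averages should come out positive and the middle average negative, which is the numeric counterpart of the bowed shape in the plot.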

Effect of left-skew on qq-plot

Let’s do the same thing to create a left-skewed data set by mirroring the exponential draws. To keep nearly all of the data in the positive range, let’s take 5 minus the random draws from the exponential distribution.

left = 5 - rexp(500, rate = 2)

Let’s display a histogram to see that the data set is strongly skewed to the left: most values pile up just below 5, and a long tail stretches out to the left.

histogram(left, width = .25)

The qq-plot is clearly curved: compared with a straight reference line, the middle portion rises above the line and the ends fall below it.

qqmath(left)
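
A quick numeric check on the direction of skew is the sample skewness (the average cubed z-score), which is positive for right skew and negative for left skew. Base R has no built-in skewness function, so the helper skew_demo() below is a hypothetical name introduced for this sketch, as are the seed and the *_demo variables.

```r
# Minimal sketch; skew_demo(), the seed, and the *_demo names are assumptions.
set.seed(2)

# Sample skewness: mean of the cubed standardized values
skew_demo <- function(v) {
  z <- (v - mean(v)) / sd(v)
  mean(z^3)
}

right_demo <- rexp(500, rate = 2)
left_demo  <- 5 - right_demo   # mirror image of the same draws

skews <- c(right = skew_demo(right_demo), left = skew_demo(left_demo))
round(skews, 3)
```

Because subtracting from 5 only mirrors the values, the two skewness values have the same size and opposite signs.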

Effect of uniform distribution on qq-plot

Now we need data from a uniform distribution, so we can simply use runif(), which by default generates real numbers between 0 and 1. To cover a range similar to the one above, we’ll multiply each random value by 5.

uni = runif(500) * 5

Let’s display a histogram to see that the values are spread roughly evenly between 0 and 5, with no central peak and no tails.

histogram(uni, width = .25)

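The qq-plot has a clear S-shape: the uniform distribution has shorter tails than the normal, so the low end sits above a straight reference line and the high end sits below it.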

qqmath(uni)
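
The S-shape reflects tails that stop short of where a normal distribution would reach. As a rough sketch, the code below compares the 1% and 99% sample quantiles with the same quantiles of a normal distribution that has the sample's mean and standard deviation. The seed and the *_demo names are assumptions for illustration.

```r
# Minimal sketch; the seed and the *_demo names are illustrative assumptions.
set.seed(3)
uni_demo <- runif(500) * 5

probs <- c(0.01, 0.99)

# Extreme quantiles actually observed in the uniform sample
sample_tail <- quantile(uni_demo, probs)

# Where those quantiles would fall for a normal with the same mean and sd
normal_tail <- qnorm(probs, mean = mean(uni_demo), sd = sd(uni_demo))

round(rbind(sample = sample_tail, normal = normal_tail), 2)
```

The sample quantiles stay inside the interval from 0 to 5, while the matching normal quantiles should land outside it on both sides.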

Effect of heteroskedasticity on qq-plot

It takes a bit of work to create a data set where the variance increases as we move to the right along the \(x\)-axis. Notice that it’s actually easier to detect the heteroskedasticity in the residuals versus fitted values plot.

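x = runif(50, min = 0, max = 5)
# True mean response (no noise)
y = 5 + 10 * x
# Simulated observations: despite the name, yHat is the noisy response,
# with error standard deviation 5 * x that grows with x
yHat = 5 + 10 * x + rnorm(50) * 5 * x
mod = lm(yHat ~ x)
# Residuals versus fitted values plot
plot(mod, which = 1)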
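
The fan shape in that plot can also be checked numerically by comparing the residual spread for small and large values of x. The sketch below repeats the simulation with a larger sample (500 points, an assumption made so the comparison is stable) and uses hypothetical names throughout.

```r
# Minimal sketch; the seed, the sample size, and all names are assumptions.
set.seed(4)
x_demo <- runif(500, min = 0, max = 5)

# Observed response: linear trend plus noise whose sd is 5 * x
y_obs <- 5 + 10 * x_demo + rnorm(500) * 5 * x_demo
fit   <- lm(y_obs ~ x_demo)

# Split the observations into the lower and upper halves of the x range
halves <- cut(x_demo, breaks = c(0, 2.5, 5),
              labels = c("x <= 2.5", "x > 2.5"), include.lowest = TRUE)

# Standard deviation of the residuals within each half
spread <- tapply(resid(fit), halves, sd)
round(spread, 2)
```

With noise that scales with x, the residual standard deviation in the upper half should be clearly larger than in the lower half.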

The qq-plot has a very odd look at the extremes, but the diagnostic plot that is more likely to reveal the problem is the residuals versus fitted values plot.

# Residuals from the fitted model (observed minus fitted)
res = resid(mod)
qqmath(res)
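
One reason the extremes look odd is that pooling normal errors with different variances produces heavier tails than a single normal distribution would. A rough way to see this is the sample kurtosis (the average fourth power of the z-scores), which is about 3 for normal data. The sketch below uses a much larger sample than the example above (5000 points, an assumption made because kurtosis estimates are noisy), and all names are hypothetical.

```r
# Minimal sketch; the seed, the sample size, and all names are assumptions.
set.seed(5)
n_demo <- 5000
x_demo <- runif(n_demo, min = 0, max = 5)

# Same data-generating recipe: noise sd grows in proportion to x
y_obs    <- 5 + 10 * x_demo + rnorm(n_demo) * 5 * x_demo
res_demo <- resid(lm(y_obs ~ x_demo))

# Sample kurtosis: mean of the fourth power of the standardized residuals
z    <- (res_demo - mean(res_demo)) / sd(res_demo)
kurt <- mean(z^4)
round(kurt, 2)
```

A value well above 3 indicates heavy tails, which shows up in the qq-plot as points bending away from the line at both ends.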
