1. What is the problem with this plot? How could you improve it?
suppressPackageStartupMessages(library(tidyverse))
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point()

There is overplotting because many combinations of cty and hwy values are shared by more than one car, so several points are drawn on top of each other.
I would improve the plot by using a jitter position adjustment to decrease overplotting.
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point(position = "jitter")

The relationship between cty and hwy is clear even without jittering the points, but jittering shows the locations where there are more observations.
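To put a number on the overplotting, a quick check compares the number of rows in mpg with the number of distinct (cty, hwy) locations; every row beyond the distinct count is drawn on top of another point. This is a minimal sketch using base R's unique() on the two columns.

```r
library(ggplot2)  # attaches the mpg dataset

# Total number of cars (one point per row in the scatterplot)
n_rows <- nrow(mpg)

# Number of distinct (cty, hwy) locations visible without jitter
n_locations <- nrow(unique(mpg[, c("cty", "hwy")]))

# Points hidden behind another point at the same location
n_hidden <- n_rows - n_locations
cat("rows:", n_rows, " distinct locations:", n_locations,
    " hidden points:", n_hidden, "\n")
```
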
2. What parameters to geom_jitter() control the amount of jittering?
From the geom_jitter() documentation, there are two arguments that control the amount of jittering:
width controls the amount of horizontal displacement, and
height controls the amount of vertical displacement.
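One way to confirm which argument moves points in which direction is to extract the jittered coordinates with ggplot2's layer_data() and compare them with the raw data. The sketch below uses width = 0.3 and height = 0 (values chosen only for illustration), so every point should move horizontally by at most 0.3 and not at all vertically.

```r
library(ggplot2)

set.seed(123)  # jittering is random; fix the RNG so reruns give the same result
p <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_jitter(width = 0.3, height = 0)

# layer_data() returns the layer's data after the position adjustment
jittered <- layer_data(p)

dx <- jittered$x - mpg$cty  # horizontal displacement, controlled by width
dy <- jittered$y - mpg$hwy  # vertical displacement, controlled by height
summary(dx)
summary(dy)
```
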
The default values of width and height introduce noise in both directions. Here is what the plot looks like with the default values of height and width.
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point(position = position_jitter())

However, we can adjust them. Here are a few examples that show how these parameters affect jittering. When width = 0, there is no horizontal jitter.
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_jitter(width = 0)

When width = 20, there is too much horizontal jitter.
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_jitter(width = 20)

When height = 0, there is no vertical jitter.
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_jitter(height = 0)

When height = 15, there is too much vertical jitter.
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_jitter(height = 15)

When width = 0 and height = 0, there is neither horizontal nor vertical jitter, and the plot produced is identical to the one produced with geom_point().
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_jitter(height = 0, width = 0)

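The claim that width = 0 and height = 0 reproduces geom_point() can be checked by comparing the x and y columns that ggplot2 computes for each layer. A minimal sketch using layer_data():

```r
library(ggplot2)

base <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy))

# Coordinates after the position adjustment for each version of the plot
point_xy  <- layer_data(base + geom_point())[, c("x", "y")]
jitter_xy <- layer_data(base + geom_jitter(height = 0, width = 0))[, c("x", "y")]

# TRUE when the zero-width, zero-height jitter moved no point
identical_positions <- isTRUE(all.equal(point_xy, jitter_xy))
identical_positions
```
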
Note that the height and width arguments are in the units of the data. Thus height = 1 (width = 1) corresponds to different relative amounts of jittering depending on the scale of the y (x) variable. The default values of height and width are 40% of the resolution() of the data, which is the smallest non-zero distance between adjacent values of a variable. Because the jitter moves points in both positive and negative directions, the jittered points span 80% of that distance. When x and y are discrete variables, their resolutions are both equal to 1, so the defaults are height = 0.4 and width = 0.4.
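The default amounts can be checked numerically. ggplot2 exports resolution(), and the sketch below computes it for cty and hwy, then measures the largest displacement that geom_jitter() produces with its default arguments; each displacement should be at most 40% of the resolution.

```r
library(ggplot2)

# Smallest non-zero gap between adjacent values (both variables are whole numbers)
res_x <- resolution(mpg$cty)
res_y <- resolution(mpg$hwy)

set.seed(42)  # make the random jitter reproducible
jittered <- layer_data(
  ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) + geom_jitter()
)

# Largest horizontal and vertical shift applied to any point
max_dx <- max(abs(jittered$x - mpg$cty))
max_dy <- max(abs(jittered$y - mpg$hwy))
c(res_x = res_x, res_y = res_y, max_dx = max_dx, max_dy = max_dy)
```
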
3. Compare and contrast geom_jitter() with geom_count().
The geom geom_jitter() adds random variation to the locations of the points on the graph. In other words, it “jitters” the locations of points slightly. This method reduces overplotting since two points with the same location are unlikely to receive the same random variation.
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_jitter()

However, the reduction in overplotting comes at the cost of slightly changing the x and y values of the points.
The geom geom_count() sizes the points relative to the number of observations. Combinations of (x, y) values with more observations will be larger than those with fewer observations.
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_count()

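geom_count() draws points with stat_sum(), which counts the observations at each location in a computed variable n. A quick sketch with layer_data() shows those counts and confirms that no observation is lost or moved:

```r
library(ggplot2)

counted <- layer_data(
  ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) + geom_count()
)

# One row per distinct (cty, hwy) location; n is the number of cars there
head(counted[order(-counted$n), c("x", "y", "n")])

total_n   <- sum(counted$n)  # should equal the number of rows in mpg
n_circles <- nrow(counted)   # should equal the number of distinct locations
```
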
The geom_count() geom does not change the x and y coordinates of the points. However, if the points are close together and counts are large, the size of some points can itself create overplotting. In the following example, a third variable, class, is mapped to color. In this case, geom_count() is less readable than geom_jitter().
ggplot(data = mpg, mapping = aes(x = cty, y = hwy, color = class)) +
  geom_jitter()

ggplot(data = mpg, mapping = aes(x = cty, y = hwy, color = class)) +
  geom_count()

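Part of the reason geom_count() struggles here is that stat_sum() counts each combination of x, y, and colour separately, so a single location can receive several circles of different colours drawn on top of each other. A sketch that counts how many locations end up with stacked circles:

```r
library(ggplot2)

counted_by_class <- layer_data(
  ggplot(data = mpg, mapping = aes(x = cty, y = hwy, color = class)) +
    geom_count()
)

# Circles drawn per (x, y) location once colour splits the counts by class
circles_per_location <- table(paste(counted_by_class$x, counted_by_class$y))

# Locations where two or more differently coloured circles overlap
n_stacked <- sum(circles_per_location > 1)
n_stacked
```
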
As that example shows, unfortunately, there is no universal solution to overplotting. The costs and benefits of different approaches will depend on the structure of the data and the goal of the data scientist.
4. What’s the default position adjustment for geom_boxplot()? Create a visualization of the mpg dataset that demonstrates it.
The default position for geom_boxplot() is "dodge2", which is a shortcut for position_dodge2(). This position adjustment does not change the vertical position of a geom but moves the geom horizontally to avoid overlapping other geoms. See the documentation for position_dodge2() for additional discussion of how it works.
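Because a layer created by geom_boxplot() stores its position adjustment as a ggproto object, the default can be inspected directly. A minimal sketch, assuming the layer's internal position field is reachable as $position (this is ggplot2's internal layer structure, not a documented user-facing API):

```r
library(ggplot2)

default_layer <- geom_boxplot()

# Class vector of the stored position object
class(default_layer$position)

# TRUE if the default position adjustment is position_dodge2()
uses_dodge2 <- inherits(default_layer$position, "PositionDodge2")
uses_dodge2
```
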
When we add colour = class to the box plot, the boxes for the different levels of class within each level of drv are placed side by side, i.e., dodged.
ggplot(data = mpg, aes(x = drv, y = hwy, colour = class)) +
  geom_boxplot()

If position_identity() is used, the boxplots overlap.
ggplot(data = mpg, aes(x = drv, y = hwy, colour = class)) +
  geom_boxplot(position = "identity")

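The difference between the two plots can also be read off the computed box edges. With the default dodging, each box gets its own left edge (the xmin column of layer_data()); with position = "identity", every box in the same drv group shares one. A sketch comparing the number of distinct left edges:

```r
library(ggplot2)

base <- ggplot(data = mpg, aes(x = drv, y = hwy, colour = class))

# Count distinct left edges of the boxes; rounding guards against float noise
n_left_edges <- function(plot) {
  length(unique(round(layer_data(plot)$xmin, 6)))
}

dodged_edges   <- n_left_edges(base + geom_boxplot())
identity_edges <- n_left_edges(base + geom_boxplot(position = "identity"))
c(dodged = dodged_edges, identity = identity_edges)
```
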