In this chapter we’ll get you into the right frame of mind for developing meaningful visualizations with R. You’ll understand that as a communications tool, visualizations require you to think about your audience first. You’ll also be introduced to the basics of ggplot2 - the 7 different grammatical elements (layers) and aesthetic mappings.

Explore and Explain

In the video we made the distinction between plots for exploring and plots for explaining data. Which of the following is typically NOT true of exploratory plots?

Possible answers:

1. Meant for a specialist audience.
2. Data-heavy.
3. Pretty.
4. Rough first drafts.
5. Part of our data science toolkit as graphical data analysis.

Exactly. Beauty is not the concern at this point, although the plots should still be meaningful and conform to best practices so that you are not misled!

Exploring ggplot2, part 1

To get a first feel for ggplot2, let’s try to run some basic ggplot2 commands. Together, they build a plot of the mtcars dataset, which contains information about 32 cars (1973-74 models) from the 1974 Motor Trend magazine. This dataset is small, intuitive, and contains a variety of continuous and categorical variables.

Instructions:

1. Load the ggplot2 package using library(). On DataCamp’s servers it is already installed.
2. Use str() to explore the structure of the mtcars dataset.
3. Run the example code and see if you can understand what ggplot does with the data.

Hint: library(ggplot2) loads the ggplot2 package, and str(mtcars) shows you the structure of the mtcars dataset. For the third instruction, just run the code and try to understand it.
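
The three steps above can be sketched as follows. This is a minimal version that assumes ggplot2 is installed locally; the object name p_numeric is introduced here only for illustration.

```r
library(ggplot2)

# Explore the structure of the built-in mtcars data frame
str(mtcars)

# cyl is stored as a number, so ggplot2 places it on a continuous x-axis
p_numeric <- ggplot(mtcars, aes(x = cyl, y = mpg)) +
  geom_point()
p_numeric
```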

Phenomenal plotting! Notice that ggplot2 treats cyl as a continuous variable. You get a plot, but it’s not quite right, because the continuous axis suggests that the data could contain 5- or 7-cylinder cars, which it does not.

Exploring ggplot2, part 2

The plot from the previous exercise wasn’t really satisfying. Although cyl (the number of cylinders) is categorical, it is classified as numeric in mtcars. You’ll have to explicitly tell ggplot2 that cyl is a categorical variable.

Instructions:

1. Change the ggplot() command by wrapping factor() around cyl.
2. Run the code and see if the resulting plot is better this time.

Hint: Change cyl to factor(cyl), then run the code.
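
A sketch of the change described above, assuming ggplot2 is installed; p_factor is a name chosen here for illustration. The first call only shows which categories factor() derives from the numeric column.

```r
library(ggplot2)

# factor() turns the numeric cyl column into a categorical variable
levels(factor(mtcars$cyl))  # "4" "6" "8"

# With a factor on x, ggplot2 uses a discrete axis: one position per level
p_factor <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_point()
p_factor
```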

Stellar scatterplotting! Notice that ggplot2 now treats cyl as a factor. This time the x-axis does not contain values like 5 or 7, only the values that are present in the dataset.

Exploring ggplot2, part 3

We’ll use several datasets throughout the course to showcase the concepts discussed in the videos. In the previous exercises, you already got to know mtcars. Let’s dive a little deeper to explore the three main topics in this course: the data, aesthetics, and geom layers.

As a reminder, the mtcars dataset contains information about 32 cars (1973-74 models) from the 1974 Motor Trend magazine. It is small, intuitive, and contains a variety of continuous and categorical variables.

You’re encouraged to think about how the examples and concepts we discuss throughout these data viz courses apply to your own datasets!

Instructions (ggplot2 has already been loaded for you):

1. Take a look at the first command. It plots mpg (miles per gallon) against wt (weight, in thousands of pounds). You don’t have to change anything about this command.
2. In the second call of ggplot(), change the color argument in aes() (which stands for aesthetics). The color should depend on the displacement of the car engine, found in disp.
3. In the third call of ggplot(), change the size argument in aes(). The size should depend on the displacement of the car engine, found in disp.

Hint: For the first instruction, you don’t have to code anything; just try to understand the code that’s written for you. The second argument of the second ggplot() should contain a call of the aes() function with color set to disp. The second argument of the third ggplot() should contain a call of the aes() function with size set to disp.
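
The three calls described above might look like the sketch below. It assumes ggplot2 is installed; the names p_color and p_size are introduced here for illustration only.

```r
library(ggplot2)

# Baseline scatter plot: weight (wt) on x, miles per gallon (mpg) on y
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Map displacement to color: continuous values become a color gradient
p_color <- ggplot(mtcars, aes(x = wt, y = mpg, color = disp)) +
  geom_point()
p_color

# Map displacement to size: larger engines are drawn as larger points
p_size <- ggplot(mtcars, aes(x = wt, y = mpg, size = disp)) +
  geom_point()
p_size
```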

Legendary! Notice that a legend for the color and size scales was automatically generated.

Understanding Variables

In the previous exercise you saw that disp can be mapped onto a color gradient or onto a continuous size scale.

Another argument of aes() is the shape of the points. There is a finite number of shapes that ggplot() can automatically assign to the points. However, if you try this command in the console:

ggplot(mtcars, aes(x = wt, y = mpg, shape = disp)) + geom_point()

it gives an error. What does this mean?

Possible answers:

1. shape is not a defined argument.
2. shape only makes sense with categorical data, and disp is continuous.
3. shape only makes sense with continuous data, and disp is categorical.
4. shape is not a variable in your dataset.
5. shape has to be defined as a function.

Correct. The error message ‘A continuous variable can not be mapped to shape’ means that shape doesn’t exist on a continuous scale here.
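
The failure can be reproduced without opening a plot window. The sketch below assumes ggplot2 is installed and uses ggplot_build() to trigger the same step that printing would; the names p_bad, msg and p_shape are illustrative, and the exact wording of the message may differ between ggplot2 versions. The second plot maps a categorical variable to shape instead, here factor(cyl), chosen as an example.

```r
library(ggplot2)

# disp is continuous, so mapping it to shape fails when the plot is built
p_bad <- ggplot(mtcars, aes(x = wt, y = mpg, shape = disp)) +
  geom_point()
msg <- tryCatch(
  {
    ggplot_build(p_bad)  # building (or printing) the plot raises the error
    NA_character_        # only reached if no error was raised
  },
  error = function(e) conditionMessage(e)
)
msg

# A categorical variable works: each level of factor(cyl) gets its own shape
p_shape <- ggplot(mtcars, aes(x = wt, y = mpg, shape = factor(cyl))) +
  geom_point()
p_shape
```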

Exploring ggplot2, part 4

The diamonds data frame contains information on the prices and various metrics of almost 54,000 diamonds (53,940 rows). Among the variables included are carat (a measurement of the weight of the diamond) and price. For the next exercises, you’ll be using a subset of 1,000 diamonds.

Here you’ll use two common geom layer functions: geom_point() and geom_smooth(). We already saw in the earlier exercises how these are added using the + operator.

Instructions:

1. Explore the diamonds data frame with the str() function.
2. Use the + operator to add geom_point() to the first ggplot() command. This will tell ggplot2 to draw points on the plot.
3. Use the + operator to add geom_point() and geom_smooth() to the second ggplot() command. These just stack on each other! geom_smooth() will draw a smoothed line over the points.

Hint: The str() function takes one argument, diamonds. Add the + operator after the first ggplot() command, followed by geom_point(), which does not require an argument here. For the last plot, expand that command by adding geom_smooth() with another +.
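
A sketch of these steps, assuming ggplot2 is installed. It uses the full diamonds data frame that ships with ggplot2 rather than the 1,000-row subset used in the exercise, so it draws many more points; p_points and p_both are names introduced here for illustration.

```r
library(ggplot2)

# Explore the diamonds data frame that ships with ggplot2
str(diamonds)

# Data and aesthetics plus a point layer
p_points <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point()
p_points

# Layers stack: points first, then a smoothed trend line drawn on top
p_both <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point() +
  geom_smooth()
p_both
```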

Lovely layering! Without a geom layer added via +, the plot has nothing to draw: older versions of ggplot2 stop with the error message ‘No layers in plot’, while recent versions render an empty panel. Either way, you are missing the third essential layer - the geom layer.

Exploring ggplot2, part 5

The code for the last plot of the previous exercise is available in the script on the right. It builds a scatter plot of the diamonds dataset, with carat on the x-axis and price on the y-axis. geom_smooth() is used to add a smooth line.

With this plot as a starting point, let’s explore some more possibilities of combining geoms.

Instructions:

Plot 2 - Copy and paste plot 1, but show only the smooth line, no points.
Plot 3 - Show only the smooth line, but color according to clarity by placing the argument color = clarity in the aes() function of your ggplot() call.
Plot 4 - Draw translucent colored points. Copy the ggplot() command from plot 3 (with clarity mapped to color). Remove the smooth layer. Add the points layer back in. Set alpha = 0.4 inside geom_point(). This will make the points 60% transparent (40% opaque).

Hint: For plot 2, copy the given ggplot() command and keep only the geom_smooth() line. For plot 3, again keep only geom_smooth(), and this time add the argument color within aes(), set to the clarity column. For plot 4, keep only geom_point() and set its alpha argument to 0.4. The code within aes() should be the same as for plot 3.
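
Plots 2 to 4 could be written as in the sketch below. It assumes ggplot2 is installed and uses the full diamonds data frame rather than the exercise's 1,000-row subset; p_alpha is a name chosen here for illustration.

```r
library(ggplot2)

# Plot 2: smooth line only, no points
ggplot(diamonds, aes(x = carat, y = price)) +
  geom_smooth()

# Plot 3: one smooth line per clarity level, distinguished by color
ggplot(diamonds, aes(x = carat, y = price, color = clarity)) +
  geom_smooth()

# Plot 4: colored points only; alpha = 0.4 makes each point 40% opaque
p_alpha <- ggplot(diamonds, aes(x = carat, y = price, color = clarity)) +
  geom_point(alpha = 0.4)
p_alpha
```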

Smooth work! geom_point() + geom_smooth() is a common combination.

Understanding the grammar, part 1

Here you’ll explore some of the different grammatical elements. Throughout this course, you’ll discover how they can be combined in all sorts of ways to develop unique plots.

In the following instructions, you’ll start by creating a ggplot object from the diamonds dataset. Next, you’ll add layers onto this object to build beautiful and informative plots.

Instructions:

1. Define the data (diamonds) and aesthetics layers. Map carat to the x-axis and price to the y-axis. Assign the result to an object: dia_plot.
2. Using +, add a geom_point() layer (with no arguments) to the dia_plot object. This can be on a single line or across multiple lines.
3. Note that you can also call aes() within the geom_point() function. Map clarity to the color aesthetic in this way.

Hint: Use dia_plot <- ggplot(dataset, aes(x = column1, y = column2)) to create dia_plot. Use the correct dataset and column names! You can add the geom layer to dia_plot the same way you added the layer to the ggplot() command directly in the previous exercises. Recall that you can nest the aes() function inside ggplot(), but you can also nest it inside a geom function.
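
A sketch of the stored-object workflow described above, assuming ggplot2 is installed and using the full diamonds data frame. The name dia_plot comes from the exercise; p_clarity is introduced here for illustration.

```r
library(ggplot2)

# Store the data and aesthetics layers in an object; nothing is drawn yet
dia_plot <- ggplot(diamonds, aes(x = carat, y = price))

# Add a geom layer to the stored object
dia_plot + geom_point()

# Same geom, with an aesthetic mapping that applies to this layer only
p_clarity <- dia_plot + geom_point(aes(color = clarity))
p_clarity
```

Because dia_plot itself is never reassigned here, it still holds only the data and aesthetics and can be reused with other geoms.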

Remarkable plot recycling! Notice how you can store the plot as a ggplot object that you can use later on to add other layers; that’s pretty convenient!

Understanding the grammar, part 2

Continuing with the previous exercise, here you’ll explore mixing arguments and aesthetics in a single geometry.

You’re still working on the diamonds dataset.

Instructions:

1 - The dia_plot object has been created for you.
2 - Update dia_plot so that it contains all the functions to make a scatter plot by using geom_point() for the geom layer. Set alpha = 0.2.
3 - Using +, plot the dia_plot object with a geom_smooth() layer on top. You don’t want any error shading, which can be achieved by setting se = FALSE in geom_smooth().
4 - Modify the geom_smooth() function from the previous instruction so that it contains aes() and maps clarity to the col aesthetic.

Hint: Plot 1 - The dia_plot object has been created for you; just run the code and examine the plot. Plot 2 - Extend dia_plot with a geom layer containing points. You can do this by coding dia_plot <- dia_plot + geom_point(). Just don’t forget to set the alpha argument within geom_point(). Plot 3 - This time you want to plot the previously created figure with a smooth line on top. Do this by entering dia_plot + geom_smooth(). Don’t forget to set the correct arguments within geom_smooth(). Plot 4 - This is similar to the third instruction, but this time you have to set aesthetics within geom_smooth() as well.
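
The four steps could be sketched as follows, assuming ggplot2 is installed and using the full diamonds data frame rather than the exercise's subset. The name dia_plot comes from the exercise.

```r
library(ggplot2)

# 1 - Data and aesthetics only
dia_plot <- ggplot(diamonds, aes(x = carat, y = price))

# 2 - Add translucent points and store the result back in dia_plot
dia_plot <- dia_plot + geom_point(alpha = 0.2)

# 3 - One overall smooth line, without the standard-error ribbon
dia_plot + geom_smooth(se = FALSE)

# 4 - One smooth line per clarity level, colored by clarity
dia_plot + geom_smooth(aes(col = clarity), se = FALSE)
```

In step 4 the points stay uncolored, because the clarity mapping is given inside geom_smooth() and therefore applies to that layer only.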

Bravo! To set a property of a geom to a single value, pass it as an argument. To give the property different values for each row of data, pass it as an aesthetic.
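
The difference between setting and mapping can be seen side by side in a small sketch. It assumes ggplot2 is installed; the mtcars columns and the names p_set and p_map are chosen here for illustration and are not part of the exercise.

```r
library(ggplot2)

# Setting: color is an argument of the geom, so every point is blue
p_set <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "blue")
p_set

# Mapping: color is an aesthetic, so it varies with factor(cyl) and gets a legend
p_map <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = factor(cyl)))
p_map
```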