library(tidyverse) # Always remember to set working directory and load packages.
A data frame is a table of variables (in columns) and observations/cases (in rows). The mpg data frame contains observations collected by the US Environmental Protection Agency on 38 models of cars. You can learn more about mpg by running ?mpg; the same help syntax works for other built-in data sets.
mpg # The mpg data frame.
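You can confirm the shape of the data frame described above by asking R for its dimensions and counting the distinct car models. This is a small sketch using base R and dplyr functions that the tidyverse loads.

```r
library(tidyverse)  # loads ggplot2 (which provides mpg) and dplyr

# Rows are observations (cars), columns are variables.
dim(mpg)                 # 234 rows, 11 columns
n_distinct(mpg$model)    # number of distinct car models (38)
```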
ggplot2 is a system for describing and building graphs. Once you learn it, you can apply the same approach to many different kinds of graphs. ggplot2 starts with the function ggplot( ), which creates a coordinate system that you can then add layers to.
The first argument of ggplot( ) specifies the dataset to use in the graph; by itself, this only creates an empty graph. Graphs are completed by adding layers. geom_point( ) is a function that adds a layer of points to a plot (in this case, a scatterplot). Function names are followed by parentheses, like sum( ), mean( ), or ggplot( ). An aesthetic is a visual property of the objects in your plot and is one way of adding variables to a plot. Aesthetics include things like the size, the shape, or the color of your points.
ggplot(data = mpg) + # The first argument specifies the dataset. *The "+" sign must go at the end of a line, never at the start of the next.*
geom_point(mapping = aes(x = displ, y = hwy)) # Using the geom function to add layers (in this case points) to a graph. The x and y axes are assigned to displ and hwy variables.
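To see that ggplot( ) on its own only draws an empty coordinate system, you can store the base plot in an object and add the points layer separately. The object names p and p_points are illustrative.

```r
library(tidyverse)

# Base plot: data only, no layers yet, so printing it draws an empty panel.
p <- ggplot(data = mpg)
p

# Adding a geom with "+" returns a new plot object that now has one layer.
p_points <- p + geom_point(mapping = aes(x = displ, y = hwy))
p_points
```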

In the code below, a third variable (class) is added to the two-dimensional scatterplot by mapping it to an aesthetic: inside aes( ), the name of the aesthetic (color) is associated with the name of the variable (class). ggplot2 assigns a unique level of the aesthetic (e.g., a color) to each unique value of the variable, a process known as scaling. ggplot2 also adds a legend that explains which aesthetic levels correspond to which values.
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = class)) # Third variable (class) added.
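Scaling can be checked directly: class has seven distinct values, so ggplot2 should produce seven colors. The sketch below counts the values with dplyr's count( ) and inspects the colors ggplot2 computed via layer_data( ); the object names are ours.

```r
library(tidyverse)

# The unique values of class are the levels that get mapped to colors.
class_counts <- count(mpg, class)
class_counts

# layer_data() returns the data ggplot2 actually draws, including the
# colour column produced by the scale: one color per class.
p_class <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))
n_distinct(layer_data(p_class)$colour)
```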

If we instead mapped class to the size aesthetic, as in the graph below, the size of each point would reveal the car class. This produces a warning, because mapping an unordered (nominal) variable like class to an ordered aesthetic like size implies an ordering the data does not have, which misrepresents it.
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, size = class)) # Class variable set as size, which doesn't make sense.
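For contrast, a variable with a natural order, such as cyl (number of cylinders), suits size: more cylinders give bigger points, and no warning is issued. This is our own illustration, not part of the original example.

```r
library(tidyverse)

# cyl is numeric, so ggplot2 uses a continuous size scale:
# larger values are drawn as larger points.
p_cyl <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, size = cyl))
p_cyl
```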

Other aesthetics include alpha, shown in the graph below, which controls the transparency of the points. Because alpha is also an ordered aesthetic, mapping class to it gives a similar warning.
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, alpha = class)) # Here alpha controls transparency

The shape aesthetic controls the shape of the points. ggplot2 will only use six shapes at a time; by default, any additional groups in your dataset go unplotted. Since class has seven values, the SUVs are left out of the graph below.
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, shape = class)) # Here class is mapped to shape; the seventh class (suv) is dropped with a warning.
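Because class has seven values, the default shape palette leaves out the seventh one (suv). One workaround, which is our suggestion rather than part of the text, is to supply seven shapes yourself with scale_shape_manual( ), using ggplot2's numbered point shapes.

```r
library(tidyverse)

# How many cars fall in the class that the default palette drops?
sum(mpg$class == "suv")

# Supplying one shape per class (shapes 1 to 7) keeps every point on the plot.
p_shapes <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = class)) +
  scale_shape_manual(values = 1:7)
p_shapes
```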

You can also set the aesthetic properties of a geom manually, for example to make all the points blue. In the plot below, the color conveys no information about a variable; it just makes all the data points blue.
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy), color = "blue") # color is set outside aes( ), as an argument of geom_point( ).
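Other aesthetics can be set manually the same way, as arguments of the geom outside aes( ). The values below (a size of 3, shape 17, a filled triangle, and 50% transparency) are arbitrary choices for illustration.

```r
library(tidyverse)

# Fixed settings go outside aes(): they apply to every point
# and do not create a legend.
p_fixed <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy),
             color = "blue", size = 3, shape = 17, alpha = 0.5)
p_fixed
```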

3.3.1 Exercises
- What’s gone wrong with this code? Why aren’t the points blue?
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = "blue")) # Points are not blue because the argument is placed *inside* aes( ): "blue" is treated as a one-value variable and mapped to the default color scale.

ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy), color = "blue") # Argument needs to be placed *outside* aes.
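The difference between the two versions can be checked with layer_data( ). Inside aes( ), "blue" is treated as a variable with a single value, so it passes through the default color scale (and gets a legend labelled "blue"); outside aes( ), it is a fixed setting. The object names are ours.

```r
library(tidyverse)

p_inside <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))
p_outside <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

# Colors actually drawn by each plot.
unique(layer_data(p_inside)$colour)   # a default scale color, not "blue"
unique(layer_data(p_outside)$colour)  # "blue"
```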

- What are the variables in mpg?
- Categorical variables in mpg are stored as characters (chr), i.e., strings of text placed within quotes. Categorical variables in mpg include: manufacturer, model, trans (type of transmission), drv (type of drive train: front-wheel, rear-wheel, or 4wd), fl (fuel type), and class (type of car).
- Continuous variables in R are stored as numbers, either doubles (dbl) or integers (int). Continuous variables in mpg include: displ (engine displacement in litres), year (year of manufacture), cyl (number of cylinders), cty (city miles per gallon), and hwy (highway miles per gallon).
mpg # Also run ?mpg for description of dataset
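The storage type of every column can also be listed in one call. This sketch uses base R's sapply( ) with class( ); the result is a named character vector.

```r
library(tidyverse)

# "character" marks the categorical columns; "numeric" (double) and
# "integer" mark the numeric ones.
col_types <- sapply(mpg, class)
col_types

# Names of the categorical (character) columns only.
names(col_types)[col_types == "character"]
```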
- Map a continuous variable to color, size, and shape. How do these aesthetics behave differently for categorical vs. continuous variables?
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = cty)) # A continuous variable mapped to color. This makes sense for city miles/gallon.

ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, size = cty)) # A continuous variable mapped to size. This also makes sense for city miles/gallon.

ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, shape = cty)) # A continuous variable mapped to shape.
# Error: A continuous variable can not be mapped to shape
# This doesn't make sense because shapes have no natural order or magnitude, so shape is reserved for categorical variables.
How they differ: for color and size, a continuous variable is shown on a continuous scale (a gradient of colors or a range of sizes), whereas a categorical variable is split into discrete categories, one color or size per value. Shape works only for categorical variables.
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = class)) # Categorical variables are discrete categories.
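If you want a continuous variable such as cty to drive shape, one option (our suggestion, not part of the exercise) is to bin it into categories first. ggplot2's cut_interval( ) splits a numeric vector into n equal-width intervals and returns a factor, which shape can use.

```r
library(tidyverse)

# Bin city mileage into 3 equal-width ranges; the result is a factor,
# so it can be mapped to shape without an error.
p_binned <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = cut_interval(cty, n = 3))) +
  labs(shape = "cty (binned)")
p_binned
```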

- What happens if you map the same variable to multiple aesthetics?
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = cty, size = cty)) # Both aesthetics are mapped and multiple legends are generated.
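For comparison, when a categorical variable is mapped to two aesthetics, ggplot2 merges them into a single legend, because both scales share the same title and breaks. The example below maps drv to both color and shape; it is our own illustration.

```r
library(tidyverse)

# drv has three values (4, f, r), so each gets one color and one shape,
# shown together in a single combined legend.
p_drv <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = drv, shape = drv))
p_drv
```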

- What does the stroke aesthetic do? What shapes does it work with? (Hint: use ?geom_point).
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy), stroke = 3, shape = 21) # stroke sets the border width for shapes with a border (21 to 24), whose inside and outside can be colored separately.
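For shapes with a border, colour paints the border, fill paints the inside, and stroke sets the border width. The sketch below draws white circles with a thick black border; the particular colors and sizes are arbitrary.

```r
library(tidyverse)

# shape 21 = circle with a border; colour paints the border, fill the inside,
# and stroke controls how thick the border is.
p_stroke <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy),
             shape = 21, colour = "black", fill = "white",
             size = 3, stroke = 1.5)
p_stroke
```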

- What happens if you map an aesthetic to something other than a variable name, like aes(colour = displ < 5)?
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = displ < 5)) # R evaluates the expression and creates a temporary variable containing the result. Here, the new variable is TRUE if the engine displacement is less than 5 and FALSE if it is greater than or equal to 5.
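The temporary variable can be made explicit by creating it yourself with dplyr's mutate( ) and then counting its values. The column name small_engine is hypothetical, chosen for illustration.

```r
library(tidyverse)

# Same logical test as in aes(), stored as a real column.
mpg_small <- mutate(mpg, small_engine = displ < 5)
count(mpg_small, small_engine)   # how many cars are TRUE vs FALSE

# Mapping the stored column gives the same points and colors as
# color = displ < 5; only the legend title differs.
ggplot(data = mpg_small) +
  geom_point(mapping = aes(x = displ, y = hwy, color = small_engine))
```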

