.html file as: YourName_ANLY512-2018.html and upload it to the “Visualization Coding Exercise #6 (Homework #6)” assignment in Week #7 on Moodle.
Histograms are one of the most common and intuitive ways of showing distributions. In this exercise you’ll use the mtcars data frame to explore typical variations of simple histograms. But first, some background:
The x axis/aesthetic:
The documentation for geom_histogram() lists stat = "bin" as the default. Recall that a histogram cuts a continuous variable into discrete bins; that is what the "bin" stat does. If you supply neither bins nor binwidth, you get 30 evenly sized bins (bins = 30), so each bin is roughly one-thirtieth of the data range wide, and ggplot2 prints a message suggesting you pick a better binwidth. This is a reasonable starting point if you don’t know anything about the variable being plotted and want to start exploring.
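One way to see what the "bin" stat produces is ggplot2's layer_data(), which returns the data frame a layer actually draws, one row per bin. The sketch below (assuming a reasonably recent ggplot2 release) compares the default binning with 1-unit bins on mpg:

```r
library(ggplot2)

# Default binning: no bins or binwidth given, so stat_bin() uses bins = 30
# and prints the "Pick better value with `binwidth`" message
d_default <- layer_data(ggplot(mtcars, aes(x = mpg)) + geom_histogram())
nrow(d_default)  # one row per bin

# Explicit 1-unit bins: each row should span 1 mpg
d_width1 <- layer_data(ggplot(mtcars, aes(x = mpg)) + geom_histogram(binwidth = 1))
range(d_width1$xmax - d_width1$xmin)
```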
The y axis/aesthetic:
geom_histogram() requires only one aesthetic: x. But there is clearly a y axis on your plot, so where does it come from? There is in fact a variable mapped to the y aesthetic, called ..count.. . When geom_histogram() runs the binning statistic (see above), it not only cuts the data into discrete bins but also counts how many values fall into each bin. This information is stored in an internal data frame, and the surrounding dots in ..count.. tell ggplot2 to look up the variable count in that internal data frame rather than in mtcars. This is what appears on the y aesthetic. But it gets better! The density has also been calculated: the count in a bin divided by (total number of observations × bin width), so that the areas of all the bars sum to 1. With binwidth = 1 this equals the proportion of observations in each bin. You use ..density.. to access this information.
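The internal data frame described above can be inspected directly with layer_data(). This sketch, assuming a recent ggplot2, checks that count covers all 32 cars and that density matches count / (total count × bin width):

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = mpg)) + geom_histogram(binwidth = 1)
binned <- layer_data(p)   # the internal data frame built by stat_bin()

# count: number of cars in each bin; together the bins hold all 32 rows
sum(binned$count)

# density = count / (total count * bin width), so the bar areas sum to 1
bin_width <- binned$xmax - binned$xmin
manual_density <- binned$count / (sum(binned$count) * bin_width)
all.equal(binned$density, manual_density)
sum(binned$density * bin_width)
```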
You will be using the mtcars data frame.
Use the mtcars data frame and make a univariate histogram by mapping mpg onto the x aesthetic. Use geom_histogram() for the geom layer. This is Plot 1 that you will create.
Take Plot 1 and manually create 1-unit wide bins with the binwidth = 1 argument in geom_histogram(). This is Plot 2 that you will create.
Take Plot 2, and map ..density.. onto the y aesthetic (i.e. inside an aes()) inside geom_histogram(). You’ll have two aes() functions: one inside ggplot() and another inside geom_histogram(). This is Plot 3 that you will create.
Take Plot 3 and set the attribute fill, the inside of the bars, to the value “#377EB8” in geom_histogram(). This should not appear in aes(), since it’s an attribute, not an aesthetic mapping.
library(ggplot2)
#part a
# Plot 1 - make a univariate histogram of mpg
ggplot(mtcars, aes(x = mpg)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#part b
#Plot 1, plus set binwidth to 1 in the geom layer
ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 1)
#part c
#Plot 2, plus map ..density.. to the y aesthetic (i.e. in a second aes() function)
ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(aes(y = ..density..), binwidth = 1)
#part d
#Plot 3, plus set the fill attribute to "#377EB8"
ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(aes(y = ..density..), binwidth = 1, fill = "#377EB8")
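The ..density.. notation comes from older ggplot2 releases. Since ggplot2 3.3.0 the same computed variable can be written as after_stat(density), and releases from 3.4.0 onward emit a deprecation warning for the dot-dot form. The sketch below, assuming ggplot2 3.3.0 or later, checks that both spellings give the same bar heights:

```r
library(ggplot2)

# The filled density histogram from part d, older dot-dot notation
p_dots <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(aes(y = ..density..), binwidth = 1, fill = "#377EB8")

# The same plot written with after_stat()
p_after <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 1, fill = "#377EB8")

# Both layers should compute identical bar heights
all.equal(layer_data(p_dots)$y, layer_data(p_after)$y)
```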
You have seen that there are several ways to position the points in a scatter plot. Likewise, the geom_bar() and geom_histogram() geoms also have a position argument, which you can use to specify how to draw the bars of the plot.
Three position arguments will be reviewed:
stack: place the bars on top of each other. Counts are used. This is the default position.
fill: place the bars on top of each other, but this time use proportions.
dodge: place the bars next to each other. Counts are used.
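The difference between the three positions shows up in the bar heights ggplot2 computes. This sketch (working on a copy of mtcars with cyl and am as factors) uses layer_data() to compare the tallest bar under each position:

```r
library(ggplot2)

# Work on a copy so the built-in mtcars stays untouched
mt <- mtcars
mt$cyl <- factor(mt$cyl)
mt$am <- factor(mt$am)
base <- ggplot(mt, aes(x = cyl, fill = am))

stacked <- layer_data(base + geom_bar(position = "stack"))
filled  <- layer_data(base + geom_bar(position = "fill"))
dodged  <- layer_data(base + geom_bar(position = "dodge"))

# Height of the tallest bar under each position
max(stacked$ymax)  # 14: all 8-cylinder cars (12 automatic + 2 manual)
max(filled$ymax)   # 1: every stack is rescaled to proportions
max(dodged$ymax)   # 12: the largest single group, 8-cylinder automatics
```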
In this exercise you’ll draw the total count of cars having a given number of cylinders (cyl), according to manual or automatic transmission type (am).
You will be using the mtcars data frame.
Convert cyl and am to factor variables before you make the plots requested below.
Map cyl onto the x aesthetic and am onto fill. Use geom_bar() to make a bar plot. This is Plot 1 you will create.
Take Plot 1 and explicitly set position = "stack" in geom_bar(). This doesn’t change anything, does it? As mentioned above, "stack" is the default. This is Plot 2 you will create.
Take Plot 2 and set position = "fill" in geom_bar(). This is Plot 3 you will create.
Take Plot 3 and set position = "dodge" in geom_bar(). This is Plot 4 you will create.
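The factor conversion matters because ggplot2 only splits bars into groups by discrete variables. In this sketch, with cyl and am left numeric, no grouping is possible, so stat_count() returns one bar per cylinder value and drops the continuous fill (recent ggplot2 versions warn about this); with factors, each cyl/am combination becomes its own bar segment:

```r
library(ggplot2)

# Numeric cyl and am: no discrete variable, so there is a single group
d_numeric <- layer_data(ggplot(mtcars, aes(x = cyl, fill = am)) + geom_bar())

# Factor cyl and am (on a copy of mtcars): one group per cyl/am combination
mt <- mtcars
mt$cyl <- factor(mt$cyl)
mt$am <- factor(mt$am)
d_factor <- layer_data(ggplot(mt, aes(x = cyl, fill = am)) + geom_bar())

nrow(d_numeric)  # 3 bars: one per cylinder count
nrow(d_factor)   # 6 bar segments: cyl (3 levels) x am (2 levels)
```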
library(ggplot2)
# Convert cyl and am to factors so they are treated as categorical
mtcars$cyl <- factor(mtcars$cyl)
mtcars$am <- factor(mtcars$am)
#part a
# Plot 1 - draw a bar plot of cyl, filled according to am
ggplot(mtcars, aes(x = cyl, fill = am)) +
  geom_bar()
#part b
# Plot 2 - change the position argument to stack
ggplot(mtcars, aes(x = cyl, fill = am)) +
  geom_bar(position = "stack")
#part c
# Plot 3 - change the position argument to fill
ggplot(mtcars, aes(x = cyl, fill = am)) +
  geom_bar(position = "fill")
#part d
#Plot 4 - change the position argument to dodge
ggplot(mtcars, aes(x = cyl, fill = am)) +
  geom_bar(position = "dodge")
So far you’ve seen three different positions for bar plots: stack (the default), dodge (preferred), and fill (to show proportions).
However, you can go one step further by adjusting the dodging so that your bars partially overlap each other. For this example you’ll again use the mtcars data set, with cyl and am converted to factors as in the previous exercise.
Instead of using position = “dodge” you’re going to use position_dodge(), like you did with position_jitter() in the In-class Visualization Coding Exercise #4 during this week’s class. Here, you’ll save this as an object, posn_d, so that you can easily reuse it.
Remember, the reason you want to use position_dodge() (and position_jitter()) is to specify how much dodging (or jittering) you want.
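To see what width = 0.2 does, this sketch inspects the dodged bars with layer_data(). It assumes ggplot2's current dodging rule, in which the two bars at each cyl value share the original bar width (so each is about half of the default 0.9) while their centres are spread over only a fraction of the 0.2 dodge width:

```r
library(ggplot2)

mt <- mtcars
mt$cyl <- factor(mt$cyl)
mt$am <- factor(mt$am)

posn_d <- position_dodge(width = 0.2)
d <- layer_data(ggplot(mt, aes(x = cyl, fill = am)) + geom_bar(position = posn_d))

# Width of each dodged bar
bar_width <- d$xmax - d$xmin
# Distance between the two bar centres at each cyl position (1, 2, 3)
centre_gap <- tapply(d$x, round(d$x), function(x) diff(range(x)))
bar_width
centre_gap   # smaller than bar_width, so the bars overlap
```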
You will be using the mtcars data frame.
Convert cyl and am to factor variables before you make the plots requested below.
The last plot from the previous exercise, 2d, has been provided for you; run the code to remind yourself of the plot.
Define a new object called posn_d by calling position_dodge() with the argument width = 0.2.
Take the plot from part a and make slightly overlapping bars by using the position = posn_d argument.
Take the plot from part c and set alpha = 0.6 to see the overlap in bars.
library(ggplot2)
# Convert cyl and am to factors so they are treated as categorical
mtcars$cyl <- factor(mtcars$cyl)
mtcars$am <- factor(mtcars$am)
#part a - last plot from the previous exercise, 2d
ggplot(mtcars, aes(x = cyl, fill = am)) +
  geom_bar(position = "dodge")
#part b
posn_d <- position_dodge(width = 0.2)
#part c - change the position argument to posn_d using plot from part a
ggplot(mtcars, aes(x = cyl, fill = am)) +
  geom_bar(position = posn_d)
#part d - use posn_d as position and adjust alpha to 0.6
ggplot(mtcars, aes(x = cyl, fill = am)) +
  geom_bar(position = posn_d, alpha = 0.6)
Overlapping histograms pose similar problems to overlapping bar plots, but there is a unique solution here: a frequency polygon.
geom_freqpoly() is a geom for binned data that draws a line connecting the counts of adjacent bins. Like geom_histogram(), it takes a binwidth argument, and by default stat = "bin" and position = "identity".
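Because geom_freqpoly() uses the same "bin" stat, its line passes through the same counts as the histogram bars. This sketch compares the two layers with layer_data(), assuming geom_freqpoly() pads its bins with one empty bin at each end, as current ggplot2 releases do, so the line starts and finishes at zero:

```r
library(ggplot2)

h  <- layer_data(ggplot(mtcars, aes(mpg)) + geom_histogram(binwidth = 1))
fp <- layer_data(ggplot(mtcars, aes(mpg)) + geom_freqpoly(binwidth = 1))

# Two extra (padding) bins, both empty
nrow(fp) - nrow(h)
fp$count[c(1, nrow(fp))]

# The interior bins hold the same counts as the histogram bars
all.equal(fp$count[-c(1, nrow(fp))], h$count)
```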
You will be using the mtcars data frame.
Convert cyl to a factor variable before you make the plots requested below.
The code for a basic histogram of mpg, which you’ve already seen, is provided. Extend the code to map cyl onto fill inside aes(). Run the code and view the results.
The default position for histograms is “stack”. Copy the code from part a and set the position for the histogram bars to “identity”.
Using the same data and base layers as in the previous two plots, create a plot with a geom_freqpoly(). Because you’re no longer working with bars, change the aes() function: cyl should be mapped onto col, not onto fill. This will correctly color the geom.
library(ggplot2)
# Convert cyl to a factor so it is treated as categorical
mtcars$cyl <- factor(mtcars$cyl)
#part a - basic histogram, add coloring defined by cyl
ggplot(mtcars, aes(mpg, fill = cyl)) +
  geom_histogram(binwidth = 1)
#part b - change position to identity
ggplot(mtcars, aes(mpg, fill = cyl)) +
  geom_histogram(binwidth = 1, position = "identity")
#part c - change geom to freqpoly (position is identity by default)
ggplot(mtcars, aes(mpg, col = cyl)) +
  geom_freqpoly(binwidth = 1)
You saw a nice trick in the last exercise of how to slightly overlap bars, but now you’ll see how to overlap them completely. This would be nice for multiple histograms, as long as there are not too many different overlaps!
You’ll make a histogram using the mpg variable in the mtcars data frame. Convert cyl and am to categorical variables (factors) before making the plots requested below.
A basic histogram plot is provided. Run the code and review the output. This is Plot 1.
Take Plot 1 and map am onto fill within the aes() function. The default position is “stack”. This is Plot 2 you will create.
Take Plot 2 and add the position argument within geom_histogram(). Set it to “dodge”. This is Plot 3 you will create.
Take Plot 3 and change the position argument to “fill”. In this case, none of these positions really work well, because it’s difficult to compare the distributions directly. This is Plot 4 you will create.
Take Plot 4 and change the position argument to “identity” and set alpha = 0.4. This produces overlapping bars. This is Plot 5 you will create.
Take Plot 5 and change the aesthetic mapping. Map cyl onto fill.
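The difference between "stack" and "identity" is where each bar starts. In this sketch (a copy of mtcars with am as a factor), layer_data() shows that stacked bars for one transmission type sit on top of the other's, while with "identity" every bar starts at zero, which is why alpha is needed to see the overlap:

```r
library(ggplot2)

mt <- mtcars
mt$am <- factor(mt$am)
base <- ggplot(mt, aes(mpg, fill = am))

stacked  <- layer_data(base + geom_histogram(binwidth = 1))  # default "stack"
overlaid <- layer_data(base + geom_histogram(binwidth = 1,
                                             position = "identity",
                                             alpha = 0.4))

# With "stack", some bars start above zero (on top of the other group)
any(stacked$ymin > 0)
# With "identity", every bar starts at zero, so the groups overlap
all(overlaid$ymin == 0)
```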
library(ggplot2)
# Convert cyl and am to factors so they are treated as categorical
mtcars$cyl <- factor(mtcars$cyl)
mtcars$am <- factor(mtcars$am)
#part a - basic histogram plot command
ggplot(mtcars, aes(mpg)) +
  geom_histogram(binwidth = 1)
#part b - Plot 1, plus expanded aesthetics: am onto fill
ggplot(mtcars, aes(mpg, fill = am)) +
  geom_histogram(binwidth = 1)
#part c - Plot 2, plus change position = "dodge"
ggplot(mtcars, aes(mpg, fill = am)) +
  geom_histogram(binwidth = 1, position = "dodge")
#part d - Plot 3, plus change position = "fill"
ggplot(mtcars, aes(mpg, fill = am)) +
  geom_histogram(binwidth = 1, position = "fill")
## Warning: Removed 8 rows containing missing values (geom_bar).
#part e - Plot 4, plus change position = "identity" and alpha = 0.4
ggplot(mtcars, aes(mpg, fill = am)) +
  geom_histogram(binwidth = 1,
                 position = "identity",
                 alpha = 0.4)
#part f - Plot 5, plus change mapping: cyl onto fill
ggplot(mtcars, aes(mpg, fill = cyl)) +
  geom_histogram(binwidth = 1,
                 position = "identity",
                 alpha = 0.4)