This notebook is a test for creating R graphics with ggplot2. It reproduces the code at the following link: https://tutorials.iq.harvard.edu/R/Rgraphics/Rgraphics.html
The ggplot2 package is included in a popular collection of packages called “the tidyverse”. Take a moment to ensure that it is installed and that the ggplot2 package is attached.
#install.packages("tidyverse")
library(tidyverse)
## ── Attaching packages ───────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.0.0 ✔ purrr 0.2.5
## ✔ tibble 1.4.2 ✔ dplyr 0.7.6
## ✔ tidyr 0.8.1 ✔ stringr 1.3.1
## ✔ readr 1.1.1 ✔ forcats 0.3.0
## ── Conflicts ──────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
housing <- read_csv("graphics/landdata-states.csv")
## Parsed with column specification:
## cols(
## State = col_character(),
## region = col_character(),
## Date = col_double(),
## Home.Value = col_integer(),
## Structure.Cost = col_integer(),
## Land.Value = col_integer(),
## Land.Share..Pct. = col_double(),
## Home.Price.Index = col_double(),
## Land.Price.Index = col_double(),
## Year = col_integer(),
## Qrtr = col_integer()
## )
head(housing[1:5])
Let’s plot a histogram:
library(ggplot2)
ggplot(housing, aes(x = Home.Value)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
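The message above recommends choosing the bin width deliberately rather than accepting the default of 30 bins. A minimal sketch of that, assuming a small made-up data frame: `toy_housing` and its values are hypothetical stand-ins for the `housing` data, and the bin width of 50000 is an arbitrary choice for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for the housing data; values are invented.
toy_housing <- data.frame(
  Home.Value = c(120000, 135000, 150000, 180000, 210000, 260000, 310000, 450000)
)

# Passing binwidth explicitly makes the bin size a deliberate choice
# instead of relying on the default number of bins.
p_hist <- ggplot(toy_housing, aes(x = Home.Value)) +
  geom_histogram(binwidth = 50000)

print(p_hist)
```

With the real data, the same `binwidth` argument can be added to the `geom_histogram()` call above; a sensible value depends on the range of `Home.Value`.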
Now, let’s draw a more complex graphic:
ggplot(filter(housing, State %in% c("MA", "TX")),
       aes(x = Date,
           y = Home.Value,
           color = State)) +
  geom_point()
In ggplot land, an aesthetic means “something you can see”. Examples include:
- position (i.e., on the x and y axes)
- color (“outside” color)
- fill (“inside” color)
- shape (of points)
- linetype
- size
Each type of geom accepts only a subset of all aesthetics; refer to the geom help pages to see which mappings each geom accepts. Aesthetic mappings are set with the aes() function.
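To make the role of aes() concrete, here is a small sketch. It assumes a made-up data frame: `toy`, its columns, and the object names `mapping` and `p_map` are all hypothetical. It builds a mapping on its own, inspects it, and then attaches it to a plot.

```r
library(ggplot2)

# Hypothetical toy data; column names and values are invented.
toy <- data.frame(
  x = c(1, 2, 3, 4),
  y = c(2, 4, 3, 6),
  group = c("a", "a", "b", "b")
)

# aes() only records which variable feeds which aesthetic;
# it does not draw anything by itself.
mapping <- aes(x = x, y = y, color = group)
names(mapping)

# The mapping takes effect once it is attached to a plot with a geom
# that accepts those aesthetics.
p_map <- ggplot(toy, mapping) + geom_point()
print(p_map)
```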
Geometric objects are the actual marks we put on a plot. Examples include:
- points (geom_point, for scatter plots, dot plots, etc.)
- lines (geom_line, for time series, trend lines, etc.)
- boxplot (geom_boxplot, for, well, boxplots!)
A plot must have at least one geom; there is no upper limit. You can add a geom to a plot using the + operator.
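As a rough illustration of stacking geoms with +, the sketch below uses a made-up data frame (`toy_series` and its values are hypothetical) and adds first one geom and then two to the same base plot.

```r
library(ggplot2)

# Hypothetical toy data; values are invented.
toy_series <- data.frame(
  year = 2000:2005,
  value = c(10, 12, 11, 15, 18, 17)
)

# The base plot holds the data and default mappings but no geom yet.
base <- ggplot(toy_series, aes(x = year, y = value))

# Each + adds one layer on top of the previous ones.
p_one <- base + geom_point()
p_two <- base + geom_point() + geom_line()

print(p_two)
```

Because `base` is stored in a variable, different combinations of geoms can be tried without repeating the data and mapping.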
You can get a list of available geometric objects using the code below:
help.search("geom_", package = "ggplot2")
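If a plain character vector is more convenient than the help search results, one possible alternative (not part of the original tutorial) is to filter the names that ggplot2 exports. The variable name `geoms` is just an illustrative choice.

```r
# getNamespaceExports() is base R and lists everything a package exports;
# grep() keeps only the names that start with "geom_".
geoms <- sort(grep("^geom_", getNamespaceExports("ggplot2"), value = TRUE))
head(geoms)
```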
Now that we know about geometric objects and aesthetic mapping, we can make a ggplot. geom_point requires mappings for x and y; all others are optional.
hp2001Q1 <- filter(housing, Date == 2001.25)
ggplot(hp2001Q1,
       aes(y = Structure.Cost, x = Land.Value)) +
  geom_point()
ggplot(hp2001Q1,
       aes(y = Structure.Cost, x = log(Land.Value))) +
  geom_point()
A plot constructed with ggplot can have more than one geom. In that case, the mappings established in the ggplot() call are plot defaults that can be added to or overridden. Our plot could use a regression line:
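# Fit a linear model and store its fitted values as a new column
hp2001Q1$pred.SC <- predict(lm(Structure.Cost ~ log(Land.Value), data = hp2001Q1))
# x and y set here are the defaults for every layer
p1 <- ggplot(hp2001Q1, aes(x = log(Land.Value), y = Structure.Cost))
# Points add a color mapping; the line overrides y with the fitted values
p1 + geom_point(aes(color = Home.Value)) +
  geom_line(aes(y = pred.SC))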
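The same pattern can be tried without the housing file. The sketch below is a self-contained variant on made-up data: `toy_fit`, its columns, and all values are hypothetical. It shows that the point layer keeps the default y while the line layer uses the overridden one.

```r
library(ggplot2)

# Hypothetical toy data; values are invented for illustration only.
toy_fit <- data.frame(
  land = c(10, 20, 40, 80, 160, 320),
  cost = c(95, 110, 118, 135, 142, 160),
  home = c(105, 130, 158, 215, 302, 480)
)

# Fitted values from a linear model of cost on log(land).
toy_fit$pred <- as.numeric(predict(lm(cost ~ log(land), data = toy_fit)))

# x and y set here are defaults shared by every layer.
base_fit <- ggplot(toy_fit, aes(x = log(land), y = cost))

p_fit <- base_fit +
  geom_point(aes(color = home)) +  # adds a color mapping for the points only
  geom_line(aes(y = pred))         # overrides y for the line only

print(p_fit)
```

Calling `layer_data(p_fit, 1)` and `layer_data(p_fit, 2)` is one way to check which y values each layer actually received.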