Geometric Objects and Aesthetic Mapping in R ggplot2

housing <- read.csv("dataSets/landdata-states.csv")
head(housing[1:5])
##   State region  Date Home.Value Structure.Cost
## 1    AK   West 20101     224952         160599
## 2    AK   West 20102     225511         160252
## 3    AK   West 20093     225820         163791
## 4    AK   West 20094     224994         161787
## 5    AK   West 20074     234590         155400
## 6    AK   West 20081     233714         157458
housing$Year <- as.numeric(substr(housing$Date, 1, 4))
housing$Qrtr <- as.numeric(substr(housing$Date, 5, 5))
housing$Date <- housing$Year + housing$Qrtr/4
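
To make the date arithmetic concrete, here is a small sketch on a few hand-typed codes. The vector `codes` is made up for illustration and is not read from the data file; it only assumes the same layout as the Date column, where the first four digits are the year and the fifth is the quarter.

```r
# Hypothetical date codes in the same YYYYQ layout as housing$Date
codes <- c(20101, 20102, 20093, 20094)

year <- as.numeric(substr(codes, 1, 4))  # characters 1-4: the year
qrtr <- as.numeric(substr(codes, 5, 5))  # character 5: the quarter (1-4)

# Quarter q of year y is stored as y + q/4
decimal_date <- year + qrtr / 4
decimal_date
```

Under this convention the first quarter of 2010 becomes 2010.25, and the fourth quarter of a year lands on the next whole number (20094 becomes 2010.00).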

ggplot2 VS Base for simple graphs

hist(housing$Home.Value)
library(ggplot2)

ggplot(housing, aes(x = Home.Value)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
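
The message above is a reminder that the default of 30 bins is rarely the best choice. A minimal sketch of setting the bin width explicitly follows; the data frame `toy` and the width of 25000 are assumptions chosen for illustration, not values from the housing data.

```r
library(ggplot2)

# Made-up home values, only so the sketch runs without the CSV file
set.seed(1)
toy <- data.frame(Home.Value = rnorm(200, mean = 200000, sd = 50000))

# Choosing the bin width explicitly silences the `stat_bin()` message
p_hist <- ggplot(toy, aes(x = Home.Value)) +
  geom_histogram(binwidth = 25000)
p_hist
```

The same argument can be passed to geom_histogram() in the housing example; a sensible width depends on the range of the variable being plotted.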

Base graphics VS ggplot2 for more complex graphs

plot(Home.Value ~ Date, data=subset(housing, State == "MA"))
points(Home.Value ~ Date, col="red", data=subset(housing, State == "TX"))
legend(1975, 400000, c("MA", "TX"), title="State", col=c("black", "red"), pch=c(1, 1))

ggplot(subset(housing, State %in% c("MA", "TX")), aes(x=Date, y=Home.Value, color=State)) + geom_point()

Geometric Objects And Aesthetics

Aesthetic Mapping

In ggplot land, aesthetic means “something you can see”. Examples include:

- position (i.e., on the x and y axes)
- color (“outside” color)
- fill (“inside” color)
- shape (of points)
- linetype
- size

Geometric Objects (geom)

Geometric objects are the actual marks we put on a plot. Examples include:

- points (geom_point, for scatter plots, dot plots, etc.)
- lines (geom_line, for time series, trend lines, etc.)
- boxplot (geom_boxplot, for, well, boxplots!)

A plot must have at least one geom; there is no upper limit. You can add a geom to a plot using the + operator.

You can get a list of available geometric objects using the code below:

help.search("geom_", package = "ggplot2")

or simply type geom_<tab> in any good R IDE (such as RStudio or ESS) to see a list of functions starting with geom_.
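
As a rough sketch of both points (listing the geoms and stacking them with +), the following can be run on its own. The data frame `toy` is made up for illustration, and listing the attached package with ls() is just one possible alternative to the help search.

```r
library(ggplot2)

# Names of all exported ggplot2 objects that start with "geom_"
geoms <- ls("package:ggplot2", pattern = "^geom_")
head(geoms)

# Geoms are layered with `+`; this plot has two (points and a line).
# `toy` is a made-up data frame used only for illustration.
toy <- data.frame(x = 1:5, y = c(2, 4, 3, 6, 5))
p_two <- ggplot(toy, aes(x = x, y = y)) +
  geom_point() +
  geom_line()
p_two
```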

Points (Scatterplot)

hp2001Q1 <- subset(housing, Date == 2001.25) 
ggplot(hp2001Q1,
       aes(y = Structure.Cost, x = Land.Value)) +
       geom_point()

ggplot(hp2001Q1,
       aes(y = Structure.Cost, x = log(Land.Value))) +
       geom_point()

Lines (Prediction Line)

A plot constructed with ggplot can have more than one geom. In that case the mappings established in the ggplot() call are plot defaults that can be added to or overridden. Our plot could use a regression line:

hp2001Q1$pred.SC <- predict(lm(Structure.Cost ~ log(Land.Value), data = hp2001Q1))
p1 <- ggplot(hp2001Q1, aes(x = log(Land.Value), y = Structure.Cost))
p1 + geom_point(aes(color = Home.Value)) + geom_line(aes(y = pred.SC))

Smoothers

Not all geometric objects are simple shapes: the smooth geom includes a line and a ribbon.

p1 + geom_point(aes(color = Home.Value)) + geom_smooth()
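
With no arguments, geom_smooth() chooses a smoothing method itself (loess for small data sets). If a straight-line fit is wanted instead, the method can be named explicitly. The sketch below uses made-up data standing in for hp2001Q1, so the numbers are not meaningful; only the call pattern is.

```r
library(ggplot2)

# Made-up data standing in for hp2001Q1
set.seed(42)
toy <- data.frame(Land.Value = exp(runif(50, 9, 13)))
toy$Structure.Cost <- 20000 * log(toy$Land.Value) + rnorm(50, sd = 15000)

p_toy <- ggplot(toy, aes(x = log(Land.Value), y = Structure.Cost))

# method = "lm" fits a straight line; se = FALSE drops the ribbon
p_lm <- p_toy +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
p_lm
```

This draws the same kind of line as the predict(lm(...)) approach in the previous section, without adding a prediction column to the data.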

Text (Label Points)

Each geom accepts a particular set of mappings; for example, geom_text() accepts a label mapping.

p1 + geom_text(aes(label=State), size = 3)

## install.packages("ggrepel") 
require("ggrepel")
## Loading required package: ggrepel
p1 + geom_point() + geom_text_repel(aes(label=State), size = 3)
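
One way to check which mappings a geom accepts is to look at the Aesthetics section of its help page (for example ?geom_text). As a sketch, the same information can also be read from the ggproto objects that ggplot2 exports, such as GeomText and GeomPoint; treat this as a convenience for exploration rather than the recommended interface.

```r
library(ggplot2)

# Aesthetics geom_text() cannot draw without
GeomText$required_aes

# All aesthetics geom_text() understands (required plus optional)
GeomText$aesthetics()

# For comparison, check whether geom_point() has a `label` aesthetic
"label" %in% GeomPoint$aesthetics()
```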

Aesthetic Mapping VS Assignment

Note that variables are mapped to aesthetics with the aes() function, while fixed aesthetics are set outside the aes() call. This sometimes leads to confusion, as in this example:

p1 +
  geom_point(aes(size = 2),# incorrect! 2 is not a variable
             color="red") # this is fine -- all points red
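
In the call above, aes(size = 2) treats 2 as if it were data: the value is passed through the size scale and a legend is drawn for it, so the points are not literally drawn at size 2. A corrected sketch moves the constant outside aes(). The data frame `toy` is made up so the example runs on its own.

```r
library(ggplot2)

# Made-up data, only so the sketch runs on its own
toy <- data.frame(x = 1:10, y = (1:10)^2)
p_toy <- ggplot(toy, aes(x = x, y = y))

# Setting: constants go outside aes(), so every point is size 2 and red
p_set <- p_toy + geom_point(size = 2, color = "red")
p_set
```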

Mapping Variables To Other Aesthetics

Other aesthetics are mapped in the same way as x and y in the previous example.

p1 + geom_point(aes(color=Home.Value, shape = region))
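
Which aesthetic suits a variable depends on its type: color works for both continuous and categorical variables, while shape only accepts categorical ones. The sketch below uses a made-up data frame `toy` (including one deliberately missing region) to show the pattern.

```r
library(ggplot2)

# Made-up data: one continuous and one categorical variable
toy <- data.frame(
  x      = c(1, 2, 3, 4, 5, 6),
  y      = c(3, 1, 4, 1, 5, 9),
  value  = c(10, 20, 30, 40, 50, 60),                          # continuous
  region = c("West", "West", "South", "South", "Midwest", NA)  # categorical
)

# Continuous variable -> color gradient; categorical variable -> shape
p_map <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(aes(color = value, shape = region))
p_map
```

A row whose shape variable is missing cannot be drawn, which is one common way to get a "Removed 1 rows" warning like the one above.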
## Warning: Removed 1 rows containing missing values (geom_point).

Exercise I

These data consist of Human Development Index and Corruption Perception Index scores for several countries.

1. Create a scatter plot with CPI on the x axis and HDI on the y axis.
2. Color the points blue.
3. Map the color of the points to Region.
4. Make the points bigger by setting size to 2.
5. Map the size of the points to HDI.Rank.

dat <- read.csv("dataSets/EconomistData.csv")
head(dat)
##   X     Country HDI.Rank   HDI CPI            Region
## 1 1 Afghanistan      172 0.398 1.5      Asia Pacific
## 2 2     Albania       70 0.739 3.1 East EU Cemt Asia
## 3 3     Algeria       96 0.698 2.9              MENA
## 4 4      Angola      148 0.486 2.0               SSA
## 5 5   Argentina       45 0.797 3.0          Americas
## 6 6     Armenia       86 0.716 2.6 East EU Cemt Asia
ggplot(dat, aes(x = CPI, y = HDI)) + geom_point(color="blue")

ggplot(dat, aes(x = CPI, y = HDI, color = Region)) + geom_point()

ggplot(dat, aes(x = CPI, y = HDI, color = Region)) + geom_point(size=2)

ggplot(dat, aes(x = CPI, y = HDI)) + geom_point(aes(color = Region, size = HDI.Rank))