housing <- read.csv("dataSets/landdata-states.csv")
head(housing[1:5])
## State region Date Home.Value Structure.Cost
## 1 AK West 20101 224952 160599
## 2 AK West 20102 225511 160252
## 3 AK West 20093 225820 163791
## 4 AK West 20094 224994 161787
## 5 AK West 20074 234590 155400
## 6 AK West 20081 233714 157458
housing$Year <- as.numeric(substr(housing$Date, 1, 4))
housing$Qrtr <- as.numeric(substr(housing$Date, 5, 5))
housing$Date <- housing$Year + housing$Qrtr/4
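The three lines above turn the five-digit Date code (a four-digit year followed by a one-digit quarter) into a decimal year. A minimal sketch of the same arithmetic on a few hand-picked codes; the `codes` vector is made up for illustration and is not read from the data file:

```r
# Hypothetical sample of Date codes in the YYYYQ layout used by the data set
codes <- c(20101, 20094, 20011)

# substr() coerces the numbers to character before slicing
year <- as.numeric(substr(codes, 1, 4))
qrtr <- as.numeric(substr(codes, 5, 5))

# Same formula as above: quarter q of year y becomes y + q/4
decimal_date <- year + qrtr / 4
print(data.frame(codes, year, qrtr, decimal_date))
```

Under this convention the first quarter of 2001 is stored as 2001.25, and the fourth quarter of a year lands on the next whole number (2009 Q4 becomes 2010).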
hist(housing$Home.Value)
library(ggplot2)
ggplot(housing, aes(x = Home.Value)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
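The message above is a reminder that the default of 30 bins is rarely the best choice. A minimal sketch of setting the bin width explicitly, using R's built-in `faithful` data set as a stand-in so the example runs without the housing file (the width of 5 is an arbitrary choice for illustration):

```r
library(ggplot2)

# faithful$waiting is measured in minutes, so each bar spans 5 minutes
p_hist <- ggplot(faithful, aes(x = waiting)) +
  geom_histogram(binwidth = 5)
p_hist
```

For the housing data the same argument would be expressed in the units of Home.Value.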
#### Base graphics vs. ggplot2 for more complex graphs
plot(Home.Value ~ Date, data=subset(housing, State == "MA"))
points(Home.Value ~ Date, col="red", data=subset(housing, State == "TX"))
legend(1975, 400000, c("MA", "TX"), title="State", col=c("black", "red"), pch=c(1, 1))
ggplot(subset(housing, State %in% c("MA", "TX")), aes(x=Date, y=Home.Value, color=State)) + geom_point()
In ggplot land, aesthetic means “something you can see”. Examples include:

- position (i.e., on the x and y axes)
- color (“outside” color)
- fill (“inside” color)
- shape (of points)
- linetype
- size

### Geometric Objects (geom)

Geometric objects are the actual marks we put on a plot. Examples include:

- points (geom_point, for scatter plots, dot plots, etc.)
- lines (geom_line, for time series, trend lines, etc.)
- boxplot (geom_boxplot, for, well, boxplots!)

A plot must have at least one geom; there is no upper limit. You can add a geom to a plot using the + operator.

You can get a list of available geometric objects using the code below:

help.search("geom_", package = "ggplot2")

or simply type geom_<tab> in any good R IDE (such as RStudio or ESS) to see a list of functions starting with geom_.
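If tab completion is not available, one possible alternative is to list the package's exported functions by name. This is a minimal sketch, assuming ggplot2 is installed; it relies only on base R's `ls()` applied to the attached package, and the variable name `geoms` is illustrative:

```r
library(ggplot2)

# List every object exported by ggplot2 whose name starts with "geom_"
geoms <- ls("package:ggplot2", pattern = "^geom_")
head(geoms)
length(geoms)
```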
hp2001Q1 <- subset(housing, Date == 2001.25)
ggplot(hp2001Q1,
       aes(y = Structure.Cost, x = Land.Value)) +
  geom_point()
ggplot(hp2001Q1,
       aes(y = Structure.Cost, x = log(Land.Value))) +
  geom_point()
A plot constructed with ggplot can have more than one geom. In that case the mappings established in the ggplot() call are plot defaults that can be added to or overridden. Our plot could use a regression line:
hp2001Q1$pred.SC <- predict(lm(Structure.Cost ~ log(Land.Value), data = hp2001Q1))
p1 <- ggplot(hp2001Q1, aes(x = log(Land.Value), y = Structure.Cost))
p1 + geom_point(aes(color = Home.Value)) + geom_line(aes(y = pred.SC))
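How layer-level mappings interact with the plot defaults can be sketched with the built-in `mtcars` data, used here only so the example runs without the housing file. The names `m`, `p0`, and `pred` are illustrative and not part of the tutorial's data:

```r
library(ggplot2)

m <- mtcars
# Fitted values from a simple linear model, stored alongside the data
m$pred <- predict(lm(mpg ~ wt, data = m))

# x and y set here are defaults inherited by every layer
p0 <- ggplot(m, aes(x = wt, y = mpg))

p_layers <- p0 +
  geom_point(aes(color = hp)) +  # adds a color mapping to the defaults
  geom_line(aes(y = pred))       # overrides the default y, keeps the default x
p_layers
```

The point layer still uses the default x and y, while the line layer swaps in the fitted values for y only.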
Not all geometric objects are simple shapes: the smooth geom includes both a line and a ribbon.
p1 + geom_point(aes(color = Home.Value)) + geom_smooth()
Each geom accepts a particular set of mappings; for example, geom_text() accepts a label mapping.
p1 + geom_text(aes(label=State), size = 3)
## install.packages("ggrepel")
require("ggrepel")
## Loading required package: ggrepel
p1 + geom_point() + geom_text_repel(aes(label=State), size = 3)
Note that variables are mapped to aesthetics with the aes() function, while fixed aesthetics are set outside the aes() call. This sometimes leads to confusion, as in this example:
p1 +
  geom_point(aes(size = 2), # incorrect! 2 is not a variable
             color = "red") # this is fine -- all points red
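The difference between mapping and setting can be sketched with the built-in `mtcars` data, a stand-in chosen so the example runs on its own (the names `p0`, `p_mapped`, and `p_set` are illustrative). Inside aes(), the constant 2 is treated as data and passed through the size scale; outside aes(), it is used directly as the point size:

```r
library(ggplot2)

p0 <- ggplot(mtcars, aes(x = wt, y = mpg))

# Mapped: 2 is treated as a variable with a single value, so a legend
# appears and the drawn size is whatever the size scale assigns to it
p_mapped <- p0 + geom_point(aes(size = 2), color = "red")

# Set: size and color are fixed values applied to every point
p_set <- p0 + geom_point(size = 2, color = "red")

p_mapped
p_set
```

Comparing the two plots side by side shows that only the second one draws points at the requested size without an unwanted legend.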
Other aesthetics are mapped in the same way as x and y in the previous example.
p1 + geom_point(aes(color=Home.Value, shape = region))
## Warning: Removed 1 rows containing missing values (geom_point).
These data consist of Human Development Index and Corruption Perception Index scores for several countries.

1. Create a scatter plot with CPI on the x axis and HDI on the y axis.
2. Color the points blue.
3. Map the color of the points to Region.
4. Make the points bigger by setting size to 2.
5. Map the size of the points to HDI.Rank.
dat <- read.csv("dataSets/EconomistData.csv")
head(dat)
## X Country HDI.Rank HDI CPI Region
## 1 1 Afghanistan 172 0.398 1.5 Asia Pacific
## 2 2 Albania 70 0.739 3.1 East EU Cemt Asia
## 3 3 Algeria 96 0.698 2.9 MENA
## 4 4 Angola 148 0.486 2.0 SSA
## 5 5 Argentina 45 0.797 3.0 Americas
## 6 6 Armenia 86 0.716 2.6 East EU Cemt Asia
ggplot(dat, aes(x = CPI, y = HDI)) + geom_point(color="blue")
ggplot(dat, aes(x = CPI, y = HDI, color = Region)) + geom_point()
ggplot(dat, aes(x = CPI, y = HDI, color = Region)) + geom_point(size=2)
ggplot(dat, aes(x = CPI, y = HDI)) + geom_point(aes(color = Region, size = HDI.Rank))