Lattice and ggplot2 tend to compete. Lattice is strongly rooted in common approaches to visualization in statistics. ggplot2 attempts to use all those ideas “without the bad stuff” (Wickham is also a statistician).
ggvis and ggplot2 both come from Hadley Wickham. D3.js is a low-level JavaScript library for interactive visualization and can do much more than ggplot2. In practice, a data scientist might prototype a visualization with ggplot2 or ggvis, then hand it to a professional front-end developer or UX expert who puts it into production with D3.js. Here is a great example.
hist(sestates$price)
Tailor the plot by passing arguments to the hist() function.
hist(sestates$price,
     breaks = 30,             # suggested number of bins
     main = "Housing Prices", # plot title
     xlab = "price",          # x-axis label
     col = "blue")            # bar fill color
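One detail worth knowing: when breaks is a single number, hist() treats it only as a suggestion and rounds to “pretty” break points, so you may not get exactly 30 bins; a vector of break points is used as given. The sketch below checks this with plot = FALSE, using simulated prices (made up with rlnorm()) in place of sestates$price.

```r
# Simulated, hypothetical prices standing in for sestates$price
set.seed(1)
prices <- rlnorm(500, meanlog = log(230000), sdlog = 0.4)

# plot = FALSE returns the "histogram" object without drawing anything
h <- hist(prices, breaks = 30, plot = FALSE)
length(h$breaks) - 1   # bins actually used; may differ from 30

# A vector of break points is used as given: 31 points give 30 bins
brks <- seq(min(prices), max(prices), length.out = 31)
h_exact <- hist(prices, breaks = brks, plot = FALSE)
length(h_exact$breaks) - 1
```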
Note that hist() does not play well with the “piping” (%>%) used in dplyr. For example, the following fails, because select() returns a data frame while hist() needs a numeric vector:
sestates %>%
filter(city == "SACRAMENTO") %>%
select(price) %>%
hist
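One way to keep the pipe is to end it with dplyr's pull(), which extracts a single column as a plain vector (pull() is available in dplyr 0.7.0 and later). A minimal sketch, assuming a small hypothetical data frame toy in place of sestates:

```r
library(dplyr)

# Hypothetical toy stand-in for sestates; only the two columns used here
toy <- data.frame(
  city  = c("SACRAMENTO", "SACRAMENTO", "ELK GROVE", "SACRAMENTO"),
  price = c(150000, 220000, 310000, 98000)
)

# pull() returns a numeric vector, which hist() accepts;
# hist() also returns the histogram object invisibly
h <- toy %>%
  filter(city == "SACRAMENTO") %>%
  pull(price) %>%
  hist(main = "Sacramento housing prices", xlab = "price")
```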
Start by using aes() to create an “aesthetic mapping”, which describes how variables in the data are mapped to visual properties.
# Map the x axis to price
p <- ggplot(sestates, aes(x = price))
p
Then add the histogram as a layer.
p + geom_histogram(bins = 30)
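bins sets the number of bars; alternatively, binwidth sets the width of each bar in data units, which is often easier to interpret for prices. A sketch with hypothetical toy prices, using ggplot2's layer_data() to inspect the bars that were computed:

```r
library(ggplot2)

# Hypothetical toy prices standing in for sestates$price
toy <- data.frame(price = c(95000, 120000, 180000, 210000, 260000, 340000, 420000))

# binwidth fixes the width of each bar (here 50,000) instead of the bar count
p <- ggplot(toy, aes(x = price)) +
  geom_histogram(binwidth = 50000)

# Inspect the computed bars: each one spans 50,000
bars <- layer_data(p)
bars$xmax - bars$xmin
```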
Unlike base plotting, where you add details through function arguments, in ggplot2 you build the plot incrementally by adding “layers”.
p +
geom_histogram(bins = 30, fill = "blue") +
ggtitle("Housing Prices") # add a title
Looking at square footage, let's define a house of 1,850 square feet or more as “big”.
sestates <- mutate(sestates,
size = ifelse(sq__ft >= 1850, "big", "small"),
size = factor(size))
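Because the test is >=, a house of exactly 1,850 square feet counts as “big”. Note also that factor() sorts the levels alphabetically, which is the order ggplot2 later uses in the legend. A quick check on three hypothetical values:

```r
# Hypothetical square-footage values, chosen to include the boundary case
sq_ft <- c(1200, 1850, 2400)

size <- factor(ifelse(sq_ft >= 1850, "big", "small"))
as.character(size)  # exactly 1850 is classed as "big"
levels(size)        # levels sorted alphabetically: "big", then "small"
```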
With base plotting, you create a first histogram and then “add” a second one by setting the add argument to TRUE.
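# Semi-transparent red histogram of the "big" houses
hist(sestates$price[sestates$size == "big"],
     main = "Housing Prices", xlab = "Price",
     col = rgb(1, 0, 0, 0.5), breaks = 15, ylim = c(0, 175))
# Overlay a semi-transparent blue histogram of the "small" houses
hist(sestates$price[sestates$size == "small"],
     col = rgb(0, 0, 1, 0.5), add = TRUE, breaks = 15)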
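Two separate calls with breaks = 15 let hist() choose the break points for each group independently, so the red and blue bars need not line up, and the ylim value has to be guessed. A sketch, using simulated prices rather than the real data, that computes one shared set of breaks with pretty() and the y limit before drawing:

```r
# Hypothetical simulated prices for the two groups (not the sestates data)
set.seed(42)
big   <- rlnorm(200, meanlog = log(300000), sdlog = 0.35)
small <- rlnorm(300, meanlog = log(180000), sdlog = 0.35)

# One shared set of break points covering both groups
brks <- pretty(range(c(big, small)), n = 15)

# Compute both histograms first, so the y limit fits the taller one
h_big   <- hist(big,   breaks = brks, plot = FALSE)
h_small <- hist(small, breaks = brks, plot = FALSE)
ymax <- max(h_big$counts, h_small$counts)

plot(h_big, col = rgb(1, 0, 0, 0.5), ylim = c(0, ymax),
     main = "Housing Prices", xlab = "Price")
plot(h_small, col = rgb(0, 0, 1, 0.5), add = TRUE)
```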
In ggplot2 you don’t create two different plots and combine them. Instead, you map a variable to an aesthetic of the plot; here we map the size variable to the fill color (“fill = size”). You then add a single histogram layer.
ggplot(sestates, aes(x = price, fill = size)) + # Map price and size
geom_histogram(bins = 30, alpha = .5) + # Add histogram layer
ggtitle("Housing Prices") # Add a title layer
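By default geom_histogram() stacks the groups on top of each other (position = "stack"), whereas the base version overlays them. To get the overlaid look, set position = "identity". A minimal sketch on hypothetical toy data:

```r
library(ggplot2)

# Hypothetical toy data standing in for sestates
toy <- data.frame(
  price = c(100000, 120000, 150000, 150000, 200000, 210000, 250000, 300000),
  size  = factor(c("small", "small", "small", "big", "big", "small", "big", "big"))
)

# position = "identity" overlays the groups instead of stacking them
p <- ggplot(toy, aes(x = price, fill = size)) +
  geom_histogram(bins = 5, alpha = 0.5, position = "identity") +
  ggtitle("Housing Prices")

# With identity positioning every bar starts at zero
all(layer_data(p)$ymin == 0)
```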
Map price to the x axis, square feet to the y axis, and type (Condo, Multi-Family, Residential) to color.
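p <- ggplot(sestates, aes(price, sq__ft,
                          col = type))
Plot points.
# A fixed transparency belongs outside aes(); inside aes() it would be
# treated as a mapped variable and get its own legend
p <- p + geom_point(alpha = .2)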
p
Plot a smoothing function.
p <- p + geom_smooth()
p
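Without a method argument, geom_smooth() picks one automatically (loess for fewer than 1,000 observations, otherwise a GAM) and prints a message saying which. Naming the method makes the choice explicit. A sketch on hypothetical toy data with a straight-line fit; because color is mapped to type, ggplot2 fits one line per type:

```r
library(ggplot2)

# Hypothetical toy data standing in for sestates (price, sq__ft, type)
toy <- data.frame(
  price  = c(100000, 150000, 200000, 250000, 120000, 160000, 210000, 260000),
  sq__ft = c(900, 1300, 1700, 2100, 800, 1000, 1250, 1500),
  type   = rep(c("Residential", "Condo"), each = 4)
)

p <- ggplot(toy, aes(price, sq__ft, col = type)) +
  geom_point(alpha = 0.2) +
  # method = "lm" fits a straight line; giving the formula silences the message
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# The smooth layer holds one group per type
length(unique(layer_data(p, 2)$group))
```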
Compare across the number of bathrooms by faceting on the baths variable.
p <- p + facet_grid(. ~ baths)
p
Oops! That looks kind of ugly. ggplot2 is no substitute for good visualization practice; it is a tool. Creating informative visualizations of data is a skill you must build.
If you know what you are doing, ggplot2 can make sophisticated visualizations.
Data on crimes in the USA
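The code that builds crimes, crimesm and states_map is not shown here. The printed values match R's built-in USArrests data (Murder, Assault and Rape are arrests per 100,000 residents; UrbanPop is the percentage of the population living in urban areas), so a plausible sketch, following the example in ggplot2's geom_map() documentation and assuming the reshape2 and maps packages are installed, is:

```r
library(ggplot2)

# crimes: USArrests with state names moved into a lower-case column,
# so they match the region names returned by map_data("state")
crimes <- data.frame(state = tolower(rownames(USArrests)), USArrests)

# crimesm: long format, one row per state and variable
crimesm <- reshape2::melt(crimes, id = 1)

# states_map: polygon outlines of the US states (needs the maps package)
states_map <- map_data("state")
```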
## Source: local data frame [50 x 5]
##
## state Murder Assault UrbanPop Rape
## (fctr) (dbl) (int) (int) (dbl)
## 1 alabama 13.2 236 58 21.2
## 2 alaska 10.0 263 48 44.5
## 3 arizona 8.1 294 80 31.0
## 4 arkansas 8.8 190 50 19.5
## 5 california 9.0 276 91 40.6
## 6 colorado 7.9 204 78 38.7
## 7 connecticut 3.3 110 77 11.1
## 8 delaware 5.9 238 72 15.8
## 9 florida 15.4 335 80 31.9
## 10 georgia 17.4 211 60 25.8
## .. ... ... ... ... ...
Plot a map of the murder rate with ggplot2.
ggplot(crimes, aes(map_id = state)) +
geom_map(aes(fill = Murder), map = states_map) +
expand_limits(x = states_map$long, y = states_map$lat) +
coord_map()
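Add a facet to contrast the different crime variables. This uses crimesm, a long-format version of crimes with one row per state and variable (columns state, variable and value).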
ggplot(crimesm, aes(map_id = state)) +
geom_map(aes(fill = value), map = states_map) +
expand_limits(x = states_map$long, y = states_map$lat) +
facet_wrap( ~ variable)
Geographical maps tend to be overused. Of course there are more murders in New York than in Montana: New York has more people. ggplot2 can produce a variety of alternative visualizations.
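Comparisons between places usually need rates rather than raw counts: rate = count / population × 100,000. (The Murder, Assault and Rape columns in crimes already appear to be rates of this kind, since the values match R's built-in USArrests data.) A tiny helper, with hypothetical numbers, shows how a smaller state can have the higher rate:

```r
# Convert a raw count to a rate per 100,000 residents
per_100k <- function(count, population) count / population * 100000

# Hypothetical numbers: a large state with more murders, a small state with fewer
rates <- per_100k(count = c(800, 40), population = c(19000000, 800000))
rates  # the small state has the higher rate
```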
In a famous presentation, the Swedish physician Hans Rosling showed that the x-y scatterplot, with population mapped to the size of each point, is a powerful yet simple way of conveying population-based information.
You can create this easily in ggplot2 (for the animation you would need ggvis). Here point size shows UrbanPop, the percentage of each state's population living in urban areas.
ggplot(crimes, aes(Murder, Assault, size=UrbanPop, label=state)) +
geom_point(colour="red") +
geom_text(size=3) +
xlab("Murder arrests per 100,000") +
ylab("Assault arrests per 100,000")
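In dense regions the state labels overlap. geom_text() has a check_overlap argument that skips labels overlapping ones already drawn in the same layer, and vjust moves the labels off the points. A sketch that rebuilds crimes from the built-in USArrests data (an assumption about where crimes came from) and uses labs() to name the size legend:

```r
library(ggplot2)

# Assumed construction of crimes from the built-in USArrests data
crimes <- data.frame(state = tolower(rownames(USArrests)), USArrests)

p <- ggplot(crimes, aes(Murder, Assault, size = UrbanPop, label = state)) +
  geom_point(colour = "red") +
  # vjust lifts labels above the points; check_overlap drops overlapping labels
  geom_text(size = 3, vjust = -1, check_overlap = TRUE) +
  labs(x = "Murder arrests per 100,000", y = "Assault arrests per 100,000",
       size = "Urban population (%)")
p
```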