TL;DR: Use `coord_cartesian` to zoom, not `xlim`, `ylim`, or `scale_*`!

Why? Because xlim, ylim, and the limits argument of the scale_* functions discard data points that fall outside the requested range, whereas coord_cartesian simply zooms the plot. This causes trouble whenever ggplot2 uses one of the stat_ functions to compute something (such as a smoothed fit, a density, or a contour) from the underlying data before plotting.

This is all explained in the ggplot2 docs and book, and so in theory everyone should know it. But in practice I’ve used ggplot2 for years and somehow managed to either not read or ignore that part of the documentation, and so never realized this. I asked some colleagues and they hadn’t either, which is why I thought it was worth the time to write up a simple example of how much trouble this can cause if you’re not careful.

Generate some data, fit, and plot

Here’s an example where we’ll generate 500 random points with a (weak) linear relationship between them and use geom_smooth to fit a linear function to them.

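library(ggplot2)

# 500 points with a true slope of 2 and a lot of noise
set.seed(42)
x <- rnorm(500)
y <- 2*x + 25*rnorm(500)
df <- data.frame(x, y)

# Plot only the linear fit (no points)
ggplot(df, aes(x=x, y=y)) +
  geom_smooth(method="lm")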

This looks pretty good: the fitted line shows a positive correlation between x and y and has just about the right slope (approximately 2).

Zooming, the wrong way

Let’s try something seemingly harmless: restrict the y-axis to the range from -12 to 12, centered on 0, using the ylim function.

ggplot(df, aes(x=x, y=y)) +
  geom_smooth(method="lm") +
  ylim(c(-12,12))

## Warning: Removed 326 rows containing non-finite values (stat_smooth).

What’s going on here? Why did setting the y-axis limits on the plot change the slope of the fitted line and reverse the direction of the apparent correlation between x and y?

This happens because of the order of operations involved in making the plot. When you call a command like geom_smooth in combination with ylim, ylim first discards the points outside the specified range (they are replaced with NA, which is why the warning mentions non-finite values), then stat_smooth is called behind the scenes to fit the line to whatever is left, and, finally, the plot is displayed. So the slope of the fitted line changes because it’s fit to different data!
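To make that order of operations concrete, here is a minimal sketch in base R that mimics the two cases by hand: one fit on all the data, and one fit on only the points that survive the y-range filter. The names df_kept, fit_all, fit_kept, n_dropped, and slopes are made up for this illustration, and subsetting with subset() is only an approximation of what ylim does internally.

```r
# Recreate the data from the example above (same seed, same model).
set.seed(42)
x <- rnorm(500)
y <- 2*x + 25*rnorm(500)
df <- data.frame(x, y)

# Fit on all 500 points: the data that coord_cartesian leaves untouched.
fit_all <- lm(y ~ x, data = df)

# Fit only on points with y inside [-12, 12]: roughly what stat_smooth
# sees after ylim has discarded the out-of-range values.
df_kept <- subset(df, y >= -12 & y <= 12)
fit_kept <- lm(y ~ x, data = df_kept)

# How many points were dropped, and how do the two slopes compare?
n_dropped <- nrow(df) - nrow(df_kept)
slopes <- c(all = unname(coef(fit_all)["x"]),
            kept = unname(coef(fit_kept)["x"]))
print(n_dropped)
print(slopes)
```

If the explanation above is right, n_dropped should line up with the row count in the warning, and the two slopes should differ noticeably.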

Now, to be fair, stat_smooth warns you about removing these values, but it’s very easy to overlook this warning and make some dangerous plotting errors if you’re not careful!

The same thing happens when you set limits with xlim, or with the limits argument of scale_x_continuous, scale_y_continuous, and so on.
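As a rough illustration, ylim is a shortcut for passing limits to the corresponding scale, so the two plots sketched below should behave the same way: both drop the out-of-range points before the fit. The object names p_ylim and p_scale are hypothetical, and formula = y ~ x is spelled out only to silence the message geom_smooth otherwise prints.

```r
library(ggplot2)

# Same data as in the example above.
set.seed(42)
x <- rnorm(500)
y <- 2*x + 25*rnorm(500)
df <- data.frame(x, y)

# Limits set with the ylim shortcut ...
p_ylim <- ggplot(df, aes(x=x, y=y)) +
  geom_smooth(method="lm", formula = y ~ x) +
  ylim(-12, 12)

# ... and the same limits set directly on the y scale.
p_scale <- ggplot(df, aes(x=x, y=y)) +
  geom_smooth(method="lm", formula = y ~ x) +
  scale_y_continuous(limits = c(-12, 12))
```

Printing either object should give the same figure and the same warning about removed rows.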

Zooming, the right way

Here’s the right way to do things using coord_cartesian, which doesn’t eliminate any data points but simply zooms the plot to the desired region.

ggplot(df, aes(x=x, y=y)) +
  geom_smooth(method="lm") +
  coord_cartesian(ylim=c(-12,12))

This plot has the right slope, and so all is well again.
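One way to check this without eyeballing the figure is to pull the fitted line out of the plot and compare its slope with a plain lm fit on the full data. This is a sketch, assuming layer_data returns the points along the fitted line for the first layer; p_zoom, fit_line, slope_plot, and slope_lm are names invented for the example.

```r
library(ggplot2)

# Same data as in the example above.
set.seed(42)
x <- rnorm(500)
y <- 2*x + 25*rnorm(500)
df <- data.frame(x, y)

p_zoom <- ggplot(df, aes(x=x, y=y)) +
  geom_smooth(method="lm", formula = y ~ x) +
  coord_cartesian(ylim=c(-12,12))

# layer_data() returns the values ggplot2 computed for a layer; for
# geom_smooth these are the points along the fitted line.
fit_line <- layer_data(p_zoom, 1)
n <- nrow(fit_line)

# Slope of the plotted line, taken from its first and last points.
slope_plot <- (fit_line$y[n] - fit_line$y[1]) / (fit_line$x[n] - fit_line$x[1])

# Slope from an ordinary linear fit on all 500 points.
slope_lm <- unname(coef(lm(y ~ x, data = df))["x"])

print(c(plot = slope_plot, lm = slope_lm))
```

The two numbers should agree up to floating-point error, since coord_cartesian changes only the visible window and not the data passed to the fit.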

The lesson again, in case you missed it: Use coord_cartesian to zoom, not xlim, ylim, or scale_*!

How to zoom with ggplot2

Jake Hofman

September 14, 2016
