Issues with geom_col

The output of geom_col appears to be inconsistent: the plotted values change depending on where the plot is rendered and on the size of the window/viewport.

library(dplyr)
library(ggplot2)
library(readr)
library(lubridate)

Read data

df <- read_csv("data/geom_col_data.csv",
               col_select = c("dtime", "var"))

Distribution

The example data set is a time series (15-minute interval) with a variable that is mostly 0, with small events in the range (0, 1). This data is normally plotted as 72-hour and 90-day series. I think the distribution is likely causing the issue, but the change in output is still concerning.

ggplot(df, aes(x = var)) +
  geom_histogram(bins = 50) +
  labs(
    title = "Histogram of var",
    x = "var",
    y = "Count"
  )

Plots with geom_col

When I create a basic column plot using geom_col, the height of a column and the scale seem to change independently as the window size changes.

basic_col <- ggplot(df, aes(x = dtime, y = var)) +
  geom_col() +
  labs(
    title = "geom_col Plot with Inaccurate Column Heights"
  )
basic_col

The largest value for var is 0.73, but the output above shows it as >0.75. The screenshot below is from clicking the “Show in New Window” button, which matches the image in RStudio but is different from what is rendered in the Quarto HTML document.

If I resize the plot in the window to be longer, it looks like the largest value is now almost 1.5.

I’m guessing this may have to do with the default position = "stack", but the behavior seems inconsistent.
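
One way to test the stacking guess without depending on window size is to compare the raw maximum with the bar tops that ggplot2 computes. The sketch below uses a small made-up data frame (toy, not the real CSV) in which two rows share a timestamp; layer_data() returns the values computed for drawing. Whether the real data contains rows like this is an assumption to check, not something established here.

```r
library(ggplot2)

# Hypothetical toy data (not the real CSV): two of the three rows share one timestamp.
toy <- data.frame(
  dtime = as.POSIXct(c("2025-05-16 00:00:00",
                       "2025-05-16 00:15:00",
                       "2025-05-16 00:15:00"), tz = "UTC"),
  var = c(0.10, 0.73, 0.40)
)

# Default geom_col(): rows at the same x are stacked on top of each other.
stacked <- ggplot(toy, aes(x = dtime, y = var)) +
  geom_col()

# position = "identity": every bar starts at 0, so nothing is summed.
unstacked <- ggplot(toy, aes(x = dtime, y = var)) +
  geom_col(position = "identity")

# layer_data() returns the computed drawing data, including each bar's top (ymax).
max_stacked <- max(layer_data(stacked)$ymax)      # 0.73 + 0.40 in this toy case
max_unstacked <- max(layer_data(unstacked)$ymax)  # equals max(toy$var)

c(raw = max(toy$var), stacked = max_stacked, unstacked = max_unstacked)
```

Running the same comparison on basic_col and df would show whether the tallest drawn bar really exceeds max(df$var) in the computed data, or only in the rendered image.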

Saving basic plot

ggsave("basic_col.png", basic_col)
Saving 7 x 5 in image

The saved plot is the same as the default in RStudio.

Adding labels

If we add labels, we can clearly see that the columns extend past the actual values. The output shown below in the rendered Quarto doc is different from what is shown in RStudio, as it shows the spike much closer to the beginning of May.

basic_col + geom_text(label = df$var)

In case it was stacking values due to the scale of the x-axis, I tried reducing the date range to see if it would improve the results. It did not:

narrow_col <- ggplot(df |> dplyr::filter(between(dtime, 
                                                 ymd_hms("2025-05-16 00:00:00"),
                                                 ymd_hms("2025-05-17 00:00:00"))),
                     aes(x = dtime, y = var)) +
  geom_col() +
  scale_x_datetime(date_breaks = "15 min",
                   #date_minor_breaks = "15 min",
                   date_labels = "%R") +
  labs(
    title = "Narrow geom_col Plot with Inaccurate Column Heights",
    x = "Time (5/16/25)"
  )

narrow_col
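
If the extra height came from rows that share an identical dtime, narrowing the x range would not change it, so it may be worth checking for repeated timestamps directly. The sketch below does this on made-up toy data; the data frame toy and the choice to collapse repeats with max() are illustrative assumptions, not a statement about the real data.

```r
library(dplyr)

# Hypothetical toy data standing in for df: the 00:15 timestamp appears twice.
toy <- tibble(
  dtime = as.POSIXct(c("2025-05-16 00:00:00",
                       "2025-05-16 00:15:00",
                       "2025-05-16 00:15:00"), tz = "UTC"),
  var = c(0.10, 0.73, 0.40)
)

# Timestamps that occur on more than one row.
dupes <- toy |>
  count(dtime, name = "n_rows") |>
  filter(n_rows > 1)
dupes

# One possible way to collapse repeats before plotting: keep the maximum per timestamp.
# Whether max, mean, or sum is appropriate depends on what the repeats represent.
toy_unique <- toy |>
  group_by(dtime) |>
  summarise(var = max(var), .groups = "drop")
toy_unique
```

An empty dupes result on the real df would rule out exact duplicate timestamps as the explanation.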

Comparing with geom_point

geom_point appears to show the proper heights:

basic_point <- ggplot(df, aes(x = dtime, y = var)) +
  geom_point() +
  labs(
    title = "geom_point Plot with Accurate Column Heights"
  )
basic_point

Adjusting scales

Using scale_y_continuous does not seem to fix the issue:

scale_col <- ggplot(df, aes(x = dtime, y = var)) +
  geom_col() +
  scale_y_continuous(limits = c(0, 1)) +
  labs(
    title = "geom_col Plot with scale_y_continuous"
  )
scale_col
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_col()`).

It still thinks there are values greater than 1 in the data set.
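
The warning suggests that something in the drawn layer, rather than in var itself, falls outside 0 to 1. A possible way to locate those rows, and to zoom without dropping them, is sketched below on made-up toy data: layer_data() exposes the computed bar tops, and coord_cartesian() limits the view without removing anything. That stacked bar tops are what exceed the limit in the real data is an unverified assumption.

```r
library(ggplot2)

# Hypothetical toy data (not the real CSV): two rows share the 00:15 timestamp.
toy <- data.frame(
  dtime = as.POSIXct(c("2025-05-16 00:00:00",
                       "2025-05-16 00:15:00",
                       "2025-05-16 00:15:00"), tz = "UTC"),
  var = c(0.10, 0.73, 0.40)
)

p <- ggplot(toy, aes(x = dtime, y = var)) +
  geom_col()

# Bars whose computed top exceeds 1, even though max(toy$var) is 0.73.
built <- layer_data(p)
over_one <- built[built$ymax > 1, c("x", "y", "ymin", "ymax")]
over_one

# coord_cartesian() zooms the view; unlike scale limits, it does not drop rows.
zoomed <- p + coord_cartesian(ylim = c(0, 1))
zoomed
```

Applying the same filter to layer_data(basic_col) should return the rows that correspond to the two removed by scale_y_continuous(limits = c(0, 1)), if the cause is the same as in this toy case.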