Help please :) Highlighting a column in ggplot2

I have the following code, with help from https://stackoverflow.com/questions/58866575/how-to-highlight-a-column-in-ggplot2 . But when I apply it to my actual test data, where there are about 30 columns to be highlighted, there are diagonal lines in between, connecting the highlighted columns (i.e., ideally the columns in between should be empty). My y-axis range also increases a lot (not the data values, just the axis). Any ideas? Thank you!

Load libraries and set up the data frame.

library(tidyr)
library(kableExtra)
library(ggplot2)

fruits <- c("apple", "orange", "watermelons")

juice_content <- c(10, 1, 1000)

weight <- c(5, 2, 2000)

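# R 4.0.0 and later no longer convert strings to factors by default;
# levels(df$fruits) further down needs fruits to be a factor.
df <- data.frame(fruits, juice_content, weight, stringsAsFactors = TRUE)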

Note that the data frame is ‘short & skinny’.

fruits	juice_content	weight
apple	10	5
orange	1	2
watermelons	1000	2000

Use the `tidyr` package to reshape the data.

df <-  gather(df, compare, measure, juice_content:weight, factor_key = TRUE)

Now the data is ‘long & skinny’.

fruits	compare	measure
apple	juice_content	10
orange	juice_content	1
watermelons	juice_content	1000
apple	weight	5
orange	weight	2
watermelons	weight	2000
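In current tidyr releases, `gather()` is marked as superseded in favour of `pivot_longer()`. As a rough sketch, the same reshape with the newer function could look like this. The names `wide` and `long` are illustrative and do not appear in the original post.

```r
library(tidyr)

# Rebuild the wide data frame from the post (illustrative name: wide).
wide <- data.frame(
  fruits = factor(c("apple", "orange", "watermelons")),
  juice_content = c(10, 1, 1000),
  weight = c(5, 2, 2000)
)

# Stack juice_content and weight into a name column and a value column.
long <- pivot_longer(
  wide,
  cols = juice_content:weight,
  names_to = "compare",
  values_to = "measure"
)

# gather(..., factor_key = TRUE) stored the keys as a factor in column order;
# do the same here explicitly.
long$compare <- factor(long$compare, levels = c("juice_content", "weight"))

long
```

The rows come out grouped by fruit rather than by variable, which should not matter for the plots that follow.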

First, make a plot with no background highlighting.

plot <- ggplot(df, aes(fruits, measure, fill = compare)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  scale_y_log10()

plot

Second, generate the data for background highlighting.

highlight_level <- which(levels(df$fruits) %in% c("apple", "watermelons"))

AreaDF <- data.frame(
  fruits = unlist(
    lapply(highlight_level, function(x) c(x - 0.51, x - 0.5, x + 0.5, x + 0.51))
  ),
  yval = rep(
    c(1, max(df$measure), max(df$measure), 1), length(highlight_level)
  )
)

AreaDF

fruits	yval
0.49	1
0.50	2000
1.50	2000
1.51	1
2.49	1
2.50	2000
3.50	2000
3.51	1

Third, create the plot with background highlights.

plot <- ggplot(df, aes(fruits)) +
  geom_blank(aes(y = measure, fill = compare)) +
  geom_area(data = AreaDF, aes(y = yval), fill = "yellow") +
  geom_bar(aes(y = measure, fill = compare), stat = "identity", position = position_dodge()) +
  scale_y_log10()

plot

Summary

Your problem exists in this section of code.

AreaDF <- data.frame(
  fruits = unlist(
    lapply(highlight_level, function(x) c(x - 0.51, x - 0.5, x + 0.5, x + 0.51))
  ),
  yval = rep(
    c(1, max(df$measure), max(df$measure), 1), length(highlight_level)
  )
)

Specifically, this code is hardcoded for the case where the fruits factor has only three levels. The following line, which computes highlight_level, names the two levels to highlight, apple and watermelons, which sit at positions 1 and 3 of the levels vector.

highlight_level <- which(levels(df$fruits) %in% c("apple", "watermelons"))

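In your sample code, the factor has only three levels: apple, orange, and watermelons. The which() call converts the matching level names to their integer positions, and the unlist() function, which starts on line two of the AreaDF code block above, flattens the per-level coordinates returned by lapply() into a single numeric vector. When you apply this code to your actual dataset, it breaks because the factor has about 30 levels while the highlighted positions are still hardcoded.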

You could modify your code to accommodate an arbitrary number of levels. The key to this sample code is understanding the purpose of these two lines:

highlight_level <- which(levels(df$fruits) %in% c("apple", "watermelons"))
lapply(highlight_level, function(x) c(x - 0.51, x - 0.5, x + 0.5, x + 0.51))

Line 1 simply returns a two-element vector equal to c(1, 3), but only in the case where levels 1 and 3 are apple and watermelons, respectively.

Then, the anonymous function in Line 2 calculates the x-coordinates for a series of alternating polygons to use for the background highlighting.
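To make those two lines concrete, here is a small base R sketch of what each one produces for the example data. The names `fruit_levels` and `corners` are introduced only for illustration.

```r
# Factor levels as in the example data (factor() sorts them alphabetically).
fruit_levels <- levels(factor(c("apple", "orange", "watermelons")))

# Line 1: integer positions of the levels to highlight.
highlight_level <- which(fruit_levels %in% c("apple", "watermelons"))
highlight_level  # 1 3

# Line 2: x-coordinates of the four corner points for one level position x.
corners <- function(x) c(x - 0.51, x - 0.5, x + 0.5, x + 0.51)
corners(1)  # 0.49 0.50 1.50 1.51
corners(3)  # 2.49 2.50 3.50 3.51

# Two neighbouring positions produce overlapping x ranges:
# position 1 ends at 1.51 while position 2 already starts at 1.49.
max(corners(1)) > min(corners(2))  # TRUE
```

If neighbouring columns are highlighted in the real data, this overlap may be one source of slanted edges, since the area is drawn as a single outline through all the points. That is a guess, as the actual data is not shown in the post.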

A better approach would be to modify Line 1 to make it return a vector of all odd-numbered integers from 1 to n, where n is the number of levels in the fruits factor. Here’s one way to accomplish that goal.

highlight_level <- which(seq(length(unique(df$fruits))) %% 2 != 0)
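As a quick check, the expression above can be tried on a stand-in for a 30-level data set and compared with generating the odd positions directly. The `categories` vector and its labels are made up for this sketch.

```r
# Stand-in for a data set with 30 distinct categories (hypothetical labels).
categories <- factor(sprintf("fruit_%02d", 1:30))
n <- nlevels(categories)

# The expression from the post, applied to the stand-in data.
from_post <- which(seq(length(unique(categories))) %% 2 != 0)

# The same positions generated directly.
direct <- seq(1, n, by = 2)

all(from_post == direct)  # TRUE
length(direct)            # 15
```

Because only every other position is selected, no two highlighted columns are ever adjacent.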

Of course, an even better approach would be to build a few functions to automatically calculate the polygon regions to be highlighted.
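One possible shape for such a function is sketched below. It is not the method from the linked Stack Overflow answer: it swaps `geom_area()` for `geom_rect()`, so each highlighted column is drawn as its own rectangle and nothing connects one region to the next. The helper name `highlight_rects` and its arguments are hypothetical.

```r
library(ggplot2)

# Example data in long form, matching the post.
df <- data.frame(
  fruits = factor(rep(c("apple", "orange", "watermelons"), times = 2)),
  compare = factor(rep(c("juice_content", "weight"), each = 3)),
  measure = c(10, 1, 1000, 5, 2, 2000)
)

# Hypothetical helper: one rectangle per highlighted level, centred on the
# level's integer position and spanning the full column width.
highlight_rects <- function(f, targets, ymin, ymax, half_width = 0.5) {
  pos <- which(levels(f) %in% targets)
  data.frame(
    xmin = pos - half_width,
    xmax = pos + half_width,
    ymin = rep(ymin, length(pos)),
    ymax = rep(ymax, length(pos))
  )
}

rects <- highlight_rects(
  df$fruits,
  targets = c("apple", "watermelons"),
  ymin = 1,                 # lower edge; must be positive on a log scale
  ymax = max(df$measure)
)

highlighted_plot <- ggplot(df, aes(fruits)) +
  # Train the discrete x scale before adding numeric rectangle positions.
  geom_blank(aes(y = measure, fill = compare)) +
  geom_rect(
    data = rects,
    aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
    fill = "yellow",
    inherit.aes = FALSE
  ) +
  geom_bar(aes(y = measure, fill = compare),
           stat = "identity", position = position_dodge()) +
  scale_y_log10()

highlighted_plot
```

For the real data, `targets` would hold the names of the roughly 30 columns to highlight. Since `ymax` is set to the largest data value, the rectangles should not stretch the y-axis beyond the range of the bars.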

Reddit Post - Highlighting a column in ggplot2