I have the following code, adapted with help from https://stackoverflow.com/questions/58866575/how-to-highlight-a-column-in-ggplot2. When I apply it to my actual test data, where about 30 columns need to be highlighted, diagonal lines appear between the highlighted columns (ideally the columns in between should stay empty). My y-axis range also increases a lot (not the data values, just the axis). Any idea? Thank you!
fruits | juice_content | weight |
---|---|---|
apple | 10 | 5 |
orange | 1 | 2 |
watermelons | 1000 | 2000 |
I used the `tidyr` package to reshape the data:

fruits | compare | measure |
---|---|---|
apple | juice_content | 10 |
orange | juice_content | 1 |
watermelons | juice_content | 1000 |
apple | weight | 5 |
orange | weight | 2 |
watermelons | weight | 2000 |
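To reproduce the plots, here is a minimal sketch that rebuilds `df` from the two tables above. It assumes `tidyr::pivot_longer()` was the reshaping function (the question only says `tidyr` was used). It also converts `fruits` to a factor, because the later code calls `levels(df$fruits)`. Row order may differ from the table, which does not affect the plot.

```r
library(tidyr)

# Wide data from the first table
raw <- data.frame(
  fruits        = c("apple", "orange", "watermelons"),
  juice_content = c(10, 1, 1000),
  weight        = c(5, 2, 2000)
)

# Long format: one row per fruit/measurement pair (assumes pivot_longer())
df <- pivot_longer(
  raw,
  cols      = c(juice_content, weight),
  names_to  = "compare",
  values_to = "measure"
)

# levels(df$fruits) is used later, so fruits must be a factor
# (since R 4.0.0, data.frame() no longer converts strings to factors by default)
df$fruits <- factor(df$fruits)
```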
```r
library(ggplot2)

plot <- ggplot(df, aes(fruits, measure, fill = compare)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  scale_y_log10()
plot
```
```r
highlight_level <- which(levels(df$fruits) %in% c("apple", "watermelons"))

AreaDF <- data.frame(
  fruits = unlist(
    lapply(highlight_level, function(x) c(x - 0.51, x - 0.5, x + 0.5, x + 0.51))
  ),
  yval = rep(
    c(1, max(df$measure), max(df$measure), 1), length(highlight_level)
  )
)
AreaDF
```
fruits | yval |
---|---|
0.49 | 1 |
0.50 | 2000 |
1.50 | 2000 |
1.51 | 1 |
2.49 | 1 |
2.50 | 2000 |
3.50 | 2000 |
3.51 | 1 |
```r
plot <- ggplot(df, aes(fruits)) +
  geom_blank(aes(y = measure, fill = compare)) +
  geom_area(data = AreaDF, aes(y = yval), fill = "yellow") +
  geom_bar(aes(y = measure, fill = compare), stat = "identity", position = position_dodge()) +
  scale_y_log10()
plot
```
Your problem lies in this section of code:
```r
AreaDF <- data.frame(
  fruits = unlist(
    lapply(highlight_level, function(x) c(x - 0.51, x - 0.5, x + 0.5, x + 0.51))
  ),
  yval = rep(
    c(1, max(df$measure), max(df$measure), 1), length(highlight_level))
)
```
Specifically, this code was written for the sample data, which has only three factor levels. The line that defines `highlight_level` hardcodes the names of the levels in positions 1 and 3 of the vector: `apple` and `watermelons`.
```r
highlight_level <- which(levels(df$fruits) %in% c("apple", "watermelons"))
```
Your sample data has only three factor levels: `apple`, `orange`, and `watermelons`. The `which()` call converts the matching level names into their integer positions. `unlist()` then flattens the list of x coordinates returned by `lapply()` into a single numeric vector. When you apply this code to your actual dataset, it breaks because you have about 30 levels rather than three.
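Here is a small sketch of those two steps on a three-level factor. `which()` turns the matching level names into positions. `unlist()` flattens the list that `lapply()` returns, one vector per position. The `x + c(-0.5, 0.5)` function is a simplified stand-in for the four-coordinate function in the question.

```r
# A factor with the same levels as the sample data (alphabetical by default)
fruits <- factor(c("apple", "orange", "watermelons"))

# which() converts the matching level names into integer positions
highlight_level <- which(levels(fruits) %in% c("apple", "watermelons"))
highlight_level
# [1] 1 3

# lapply() returns a list with one vector per position; unlist() flattens it
edges <- unlist(lapply(highlight_level, function(x) x + c(-0.5, 0.5)))
edges
# [1] 0.5 1.5 2.5 3.5
```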
You could modify your code to accommodate an arbitrary number of factor levels. The key to this sample code is understanding the purpose of these two lines:
```r
highlight_level <- which(levels(df$fruits) %in% c("apple", "watermelons"))
lapply(highlight_level, function(x) c(x - 0.51, x - 0.5, x + 0.5, x + 0.51))
```
The first line simply returns the two-element vector `c(1, 3)`. It does so only because `apple` and `watermelons` happen to be levels 1 and 3 of `df$fruits`.
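Then, the anonymous function in the second line calculates four x coordinates around each highlighted position: `x - 0.51`, `x - 0.5`, `x + 0.5`, and `x + 0.51`. Paired with the `yval` values, these outline one polygon per highlighted column. Together the polygons form the highlighted background.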
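To make the geometry concrete, here is a sketch of the four vertices generated for a single column at position `x = 2`. The value 2000 stands in for `max(df$measure)` in the sample data. The polygon rises from `y = 1` (which is 0 on the log10 scale) at 1.49 to the maximum at 1.5. It stays there until 2.5 and drops back at 2.51, so the extra 0.01 on each side gives it near-vertical edges around the column.

```r
x <- 2
vertices <- data.frame(
  fruits = c(x - 0.51, x - 0.5, x + 0.5, x + 0.51),
  yval   = c(1, 2000, 2000, 1)  # 2000 = max(df$measure) in the sample data
)
vertices
```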
A better approach would be to modify the first line so that it returns a vector of all odd integers from 1 to n, where n is the number of levels in the `fruits` factor. Here’s one way to accomplish that goal:
```r
# Positions 1, 3, 5, ... up to the number of distinct fruits
highlight_level <- which(seq(length(unique(df$fruits))) %% 2 != 0)
```
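To see what that expression returns for a dataset like yours, here is a quick check with a hypothetical `n <- 30` standing in for the number of levels. It also shows `seq(1, n, by = 2)`, which gives the same positions more directly. In practice `n` would come from your data, for example `length(unique(df$fruits))` as above.

```r
# Hypothetical level count, standing in for the ~30 columns in the real data
n <- 30

# Expression from the answer: positions whose remainder mod 2 is non-zero
odd_which <- which(seq(n) %% 2 != 0)

# Equivalent: every other position starting at 1
odd_seq <- seq(1, n, by = 2)

all(odd_which == odd_seq)
# [1] TRUE
```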
Of course, an even better approach would be to build a few functions to automatically calculate the polygon regions to be highlighted.
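As one possible starting point, here is a sketch of such a function. `highlight_area()` and its `ymin` and `pad` parameters are hypothetical names, not part of ggplot2. The function packages the `lapply()`/`unlist()` logic from the question so that it works for any vector of positions. The x column is named `fruits` so that `geom_area()` can inherit the `aes(fruits)` mapping from the main plot, as `AreaDF` does in the question.

```r
# Hypothetical helper: polygon data for geom_area() from the integer
# positions (factor level indices) of the columns to highlight.
# ymin = 1 because log10(1) = 0 on the log scale; pad = 0.01 reproduces
# the 0.51 offsets used in the question.
highlight_area <- function(positions, ymax, ymin = 1, pad = 0.01) {
  # Four x coordinates per column: left foot, left top, right top, right foot
  x <- unlist(lapply(positions, function(p) {
    c(p - 0.5 - pad, p - 0.5, p + 0.5, p + 0.5 + pad)
  }))
  # Matching heights: baseline, top, top, baseline for every column
  y <- rep(c(ymin, ymax, ymax, ymin), length(positions))
  data.frame(fruits = x, yval = y)
}

# Reproduces AreaDF from the question (columns 1 and 3, maximum 2000)
res <- highlight_area(c(1, 3), ymax = 2000)
res
```

With the real data, you might call it as `highlight_area(seq(1, nlevels(df$fruits), by = 2), ymax = max(df$measure))` and pass the result to `geom_area(data = ..., aes(y = yval), fill = "yellow")` as in the question.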