Recently at work, I needed to generate a chart which plotted the number of occurrences of an item per quarter. The raw data was stored in a csv file and looked a little like this.

##   item       date quarter
## 1    a  1/24/2014 2014 Q1
## 2    b 10/28/2014 2014 Q4
## 3    c 10/29/2014 2014 Q4
## 4    d 11/12/2014 2015 Q1
## 5    e   1/5/2015 2015 Q1
## 6    f   1/5/2015 2015 Q1
## 7    g   1/9/2015 2015 Q1

One thing to note about this data is that the quarters do not follow the calendar year and instead the first quarter starts on November of the previous calendar year. That is November 2014 is 2015 Q1.

I wanted to use the R language (for practice), so initially I created a plot by first summarizing the data using dply.

library(dplyr)
grouped.df <- df %>% group_by(quarter) %>% summarise(Total = n())
grouped.df
## Source: local data frame [3 x 2]
## 
##   quarter Total
## 1 2014 Q1     1
## 2 2014 Q4     2
## 3 2015 Q1     4

I then plotted the summary using ggplot like this

library(ggplot2)
plot <- ggplot(grouped.df, aes(quarter, Total))
plot <- plot + geom_bar(stat = "identity", fill = "steelblue")
plot <- plot + ggtitle("Number of Items per Quarter")
plot <- plot + xlab("Quarters")
plot <- plot + ylab("Count")
print(plot)

The plot looks OK but because there is no data for 2014 Q2 and Q3, those quarters are not shown on the chart. For my purposes, I wanted the zero count quarters to also show on the chart.

After a lot of Googling and reading through posts on Stack Overflow, I still couldn’t find an easy way to do this. I found posts that dealt with quarters that followed the calendar year (starting in January) but not when the quarter started in November.

So an idea I had was to

  1. Make a list of all possible quarters between the start date and the end date.
  2. Summarize my data by quarter.
  3. Merge the list of all quarters with the summarized data making the count equal to 0 for any resulting NA values.

Here is a function I wrote to generate the list of all quarters

library("lubridate")
GetAllQuarters <- function(start, end) {  
  start.quarter <- as.Date(cut(start,  breaks = "quarter")) - months(2)  ## back 2 months to Nov
  end.quarter <- as.Date(cut(end,  breaks = "quarter")) - months(2)
  
  quarters <- seq(start.quarter, end.quarter, by = "quarters")

  list.of.months <- list("Nov" = "Q1","Feb" = "Q2", "May" = "Q3", "Aug" = "Q4")
  all.quarters <- NULL
  
  for (i in 1:length(quarters)) {
    target.month <- as.character(month(quarters[i], label = TRUE))
    if (target.month == "Nov") {  # If month falls into Nov break, this is for the following year.
      target.year <- as.character(year(quarters[i]) + 1)
    } else {
      target.year <- as.character(year(quarters[i]))  
    }    
    target.quarter <- list.of.months[target.month]
    all.quarters[i] <- paste(target.year, target.quarter, sep = " ")
  }

  all.quarters  
}

For example, if I was to call this function with the dates “1/24/2014” and “1/9/2015”

all.quarters <- GetAllQuarters(as.Date("1/24/2014", format = "%m/%d/%Y"), as.Date("1/9/2015", format = "%m/%d/%Y"))

it would output the following:-

all.quarters
## [1] "2014 Q1" "2014 Q2" "2014 Q3" "2014 Q4" "2015 Q1"

I then created a function to merge this list of quarters with the original summarized data

MergeQuarters <- function(data.df, quarters.df, merge.column = "quarter") {
  merged.df <- merge(quarters.df, data.df, by = merge.column, all = TRUE, sort = TRUE)
  merged.df$Total[is.na(merged.df$Total)] <- 0  # Make sure NA columns are set to 0
  merged.df
}

This function is used as follows:-

all.quarters <- data.frame(all.quarters)  # first convert the list of quarters to a data frame
colnames(all.quarters) <- "quarter"  # name the column
merged <- MergeQuarters(grouped.df, all.quarters,"quarter" )
merged
##   quarter Total
## 1 2014 Q1     1
## 2 2014 Q2     0
## 3 2014 Q3     0
## 4 2014 Q4     2
## 5 2015 Q1     4

Now when I plot the merged data I get the chart I want.

plot <- ggplot(merged, aes(quarter, Total))
plot <- plot + geom_bar(stat = "identity", fill = "steelblue")
plot <- plot + ggtitle("Number of Items per Quarter")
plot <- plot + xlab("Quarters")
plot <- plot + ylab("Count")
print(plot)

The reason I am posting this here is that I wonder is there an easier way to do this? If you know, please let me know in the comments.