BarChart Template

Overview of Tutorial

This tutorial demonstrates how to create interactive bar charts like the one below using a variety of libraries, currently plotly and highcharter.

The datasets that this tutorial considers are structured as follows:

Measure	Category.1	Category.2
1	A	X
2	B	Y
3	A	Y

Here the “measure” column contains numerical data that is categorised by a number of categories (or dimensions). There are therefore two interesting bar charts that can be generated:

The mean value of the measure broken down by a particular category (Mean Parameter per Category)
The number of observations broken down by a particular category (Number of Observations per Category)
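As a quick illustration, both summaries can be computed for the three-row example table above with the base R function aggregate. This is only a minimal sketch; the names example_data, mean_per_category and count_per_category are introduced here for illustration and are not part of the template.

```r
# Rebuild the three-row example table (example_data is a hypothetical name)
example_data <- data.frame(
  Measure = c(1, 2, 3),
  Category.1 = c("A", "B", "A"),
  Category.2 = c("X", "Y", "Y")
)

# Mean of the measure for each level of Category.1
mean_per_category <- aggregate(Measure ~ Category.1, data = example_data, FUN = mean)

# Number of observations for each level of Category.1
count_per_category <- aggregate(Measure ~ Category.1, data = example_data, FUN = length)

print(mean_per_category)
print(count_per_category)
```

Category A has a mean of 2 from two observations and category B has a mean of 2 from one observation, which shows why the two charts can tell different stories about the same data.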

Note that this template covers both how to build such a bar chart inside of an HTML RMarkdown file and how to functionalise the code so as to conveniently switch between different categories and metrics in a Shiny app.

Import and Clean Data

The data for this template is a .csv file hosted on Figshare, which is read directly from its URL using read.csv.

desktopItems <- read.csv(file = "https://ndownloader.figshare.com/files/5360960")
knitr::kable(head(desktopItems))

Timestamp	Desktop.Items	Operating.System	University.Department	University	Country
9/30/2015 13:07:58	5	Mac (OS X)	IT Services	University of Oxford	UK
11/06/2015 12:20	87	Linux	Physics	University of Durham	UK
11/06/2015 12:33	25	Windows 10	Physics	Queen’s University Belfast	UK
11/06/2015 12:46	20	Windows 7	Physics	University of Leeds	UK
11/06/2015 12:48	64	Windows 8	International Office	University of the West of England	UK
11/06/2015 12:50	34	Windows 7	Biology	King’s College London	UK

An advanced version of this template might attempt to automatically infer the measure and appropriate categories for the data. In this template we explicitly decide which columns are categories (or dimensions) and which column is the measure:

measure_column <- "Desktop.Items"
categories <- c("Operating.System","University.Department","University","Country")

These columns will be used in the bar charts to decide which dimensions of the data are visualised.
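Because the columns are chosen by hand, it may be worth checking them before aggregating. The sketch below is one possible check; check_columns and stand_in are hypothetical names, and the stand-in rows only mimic the shape of desktopItems so that the example runs offline.

```r
# Hypothetical helper: stop early if the chosen columns are unsuitable
check_columns <- function(data, measure_column, categories) {
  missing_columns <- setdiff(c(measure_column, categories), colnames(data))
  if (length(missing_columns) > 0) {
    stop("Columns not found: ", paste(missing_columns, collapse = ", "))
  }
  if (!is.numeric(data[[measure_column]])) {
    stop("The measure column must be numeric: ", measure_column)
  }
  invisible(TRUE)
}

# Small stand-in for desktopItems (illustrative rows only)
stand_in <- data.frame(
  Desktop.Items = c(5, 87, 25),
  Operating.System = c("Mac (OS X)", "Linux", "Windows 10"),
  Country = c("UK", "UK", "UK")
)

# Passes silently because all columns exist and the measure is numeric
check_columns(stand_in, "Desktop.Items", c("Operating.System", "Country"))
```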

Mean Parameter per Category

The aggregate function makes it easy to calculate the mean number of desktop items per category. The category chosen for aggregation is assigned to selected_dimension. Because the column names are stored as strings, as.name converts each string into a symbol and eval evaluates it, so that the formula refers to the column of that name.

selected_dimension <- categories[1]
aggregate_mean <- aggregate(data = desktopItems, eval(as.name(measure_column)) ~ eval(as.name(selected_dimension)), FUN = mean)
knitr::kable(aggregate_mean)

eval(as.name(selected_dimension))	eval(as.name(measure_column))
Linux	30.90909
Mac (OS X)	19.76471
Windows 10	17.07143
Windows 7	43.37500
Windows 8	48.09091

For convenience in the visualisation, the column names of the data.frame are renamed:

colnames(aggregate_mean) <- c(selected_dimension, measure_column)
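An alternative that avoids both eval(as.name()) and the renaming step is to build the formula from the strings with the base R function reformulate. The sketch below assumes syntactically valid column names (which read.csv produces by default) and uses made-up stand-in rows rather than the Figshare data; aggregate_formula and aggregate_mean_alt are names introduced for illustration.

```r
# Stand-in rows (made up for illustration, not the Figshare data)
stand_in <- data.frame(
  Desktop.Items = c(10, 20, 30, 50),
  Operating.System = c("Linux", "Linux", "Windows 7", "Windows 7")
)
measure_column <- "Desktop.Items"
selected_dimension <- "Operating.System"

# reformulate() builds the formula Desktop.Items ~ Operating.System from strings
aggregate_formula <- reformulate(selected_dimension, response = measure_column)

# The result keeps the original column names, so no renaming is needed
aggregate_mean_alt <- aggregate(aggregate_formula, data = stand_in, FUN = mean)
print(colnames(aggregate_mean_alt))
print(aggregate_mean_alt)
```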

The column names for the categories are formatted with periods instead of spaces, e.g. Operating.System, which does not aid comprehension of the chart. Using gsub, a utility function called format_label is created to replace the periods with spaces:

format_label <- function(dimension){
  gsub(pattern = "[.]", replacement = " ", x = dimension)
}
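Since gsub is vectorised, format_label can format several column names in one call. The short example below repeats the definition so that it runs on its own; the variable formatted is introduced only for illustration.

```r
format_label <- function(dimension) {
  gsub(pattern = "[.]", replacement = " ", x = dimension)
}

# All category names can be formatted at once
formatted <- format_label(c("Operating.System", "University.Department", "Country"))
print(formatted)
# "Operating System" "University Department" "Country"
```

The pattern "[.]" matches a literal period; an unescaped "." would match every character.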

Highcharter

The data.frame can now be visualised using highcharter as follows. Note that eval is unnecessary with this library, because the columns are extracted from the data.frame and passed to the chart directly as vectors.

library(highcharter)
highchart() %>%
  hc_chart(type = "column") %>%
  hc_xAxis(categories = aggregate_mean[,selected_dimension]) %>%
  hc_add_series(name = format_label(selected_dimension), data = aggregate_mean[,measure_column]) %>%
  hc_title(text = paste0("Mean number of desktop items aggregated by ",format_label(selected_dimension)))

The highcharter library distinguishes between vertically and horizontally oriented bar charts through hc_chart(type = ): column and bar are vertical and horizontal, respectively. Both varieties of chart are more legible if bars are ordered from largest to smallest. The data is sorted in ascending order, so both the categories and the measure values are reversed with rev to achieve this; reversing both keeps each bar paired with its label:

aggregate_mean <- aggregate_mean[order(aggregate_mean$Desktop.Items),]
highchart() %>%
  hc_chart(type = "bar") %>%
  hc_xAxis(categories = rev(aggregate_mean[,selected_dimension])) %>%
  hc_add_series(name = format_label(selected_dimension), data = rev(aggregate_mean[,measure_column])) %>%
  hc_yAxis(title = list(text = "Mean Number of Desktop Items")) %>%
  hc_title(text = paste0("Mean number of desktop items aggregated by ",format_label(selected_dimension)))
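Whichever way the bars are ordered, the category labels and the values have to be reordered together, otherwise bars end up under the wrong label. The sketch below shows one way to keep them paired by sorting the whole data.frame before extracting the two vectors; the stand-in values are made up and the names sorted, chart_categories and chart_values are hypothetical.

```r
# Stand-in aggregate (values made up for illustration)
stand_in <- data.frame(
  Operating.System = c("Linux", "Mac (OS X)", "Windows 8"),
  Desktop.Items = c(30, 20, 48)
)

# Sort the whole data.frame so labels and values move together
sorted <- stand_in[order(stand_in$Desktop.Items, decreasing = TRUE), ]

# Extract the two vectors only after sorting
chart_categories <- as.character(sorted$Operating.System)
chart_values <- sorted$Desktop.Items

print(chart_categories)
print(chart_values)
```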

Plotly

The data.frame can now be visualised using plotly as follows. Note that eval is necessary here because the x and y arguments are interpreted as column names within the data provided to plotly; eval forces the evaluation of as.name(selected_dimension) so that the column named by the string is used.

library(plotly)
plot_ly(data = aggregate_mean,
        type = "bar",
        x = eval(as.name(selected_dimension)),
        y = eval(as.name(measure_column))) %>%
  layout(xaxis = list(title = format_label(selected_dimension)),
         yaxis = list(title = "Mean Number of Desktop Items"),
         title = paste0("Mean number of desktop items aggregated by ",format_label(selected_dimension)))
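Releases of plotly from 4.0 onwards map columns with one-sided formulas such as ~Operating.System rather than bare names, so the eval(as.name()) pattern may not work there. The sketch below shows one way to build such formulas from the column-name strings using only base R; x_formula and y_formula are names introduced for illustration, and the plot_ly call is left as a comment because it assumes plotly 4.0 or later is installed.

```r
selected_dimension <- "Operating.System"
measure_column <- "Desktop.Items"

# Build one-sided formulas (~Operating.System, ~Desktop.Items) from strings
x_formula <- as.formula(paste0("~", selected_dimension))
y_formula <- as.formula(paste0("~", measure_column))

print(x_formula)
print(y_formula)

# Assumed usage with plotly >= 4.0 (not run here):
# plot_ly(data = aggregate_mean, type = "bar", x = x_formula, y = y_formula)
```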

Bar charts are vertical by default in the plotly library; however, horizontally oriented bar charts are often more appropriate where dimension labels are long. Independent of orientation, bar charts are more legible if bars are ordered from largest to smallest, as shown below. Orientation is controlled in the plotly library through the orientation argument:

aggregate_mean <- aggregate_mean[order(aggregate_mean$Desktop.Items),]
plot_ly(data = aggregate_mean,
        type = "bar",
        y = eval(as.name(selected_dimension)),
        x = eval(as.name(measure_column)),
        orientation = "h") %>%
  layout(xaxis = list(title = "Mean Number of Desktop Items"),
         yaxis = list(title = format_label(selected_dimension)),
         title = paste0("Mean number of desktop items aggregated by ",format_label(selected_dimension)),
         margin = list(l = 80))

Number of Observations per Category

The number of observations per category can be calculated with aggregate by passing length as FUN, which is applied to the subset of the data for each category, i.e. it counts how many observations each category contains.

aggregate_number_of_observations <- aggregate(data = desktopItems, eval(as.name(measure_column)) ~ eval(as.name(selected_dimension)), FUN = length)
colnames(aggregate_number_of_observations) <- c(selected_dimension,"Desktop.Items")
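The same counts can also be obtained with the base R function table, which may read more naturally when only the number of rows per category is needed. The sketch below uses made-up stand-in rows; observation_counts and the column name Number.Of.Observations are hypothetical.

```r
# Stand-in rows (made up for illustration)
stand_in <- data.frame(
  Desktop.Items = c(5, 87, 25, 20),
  Operating.System = c("Linux", "Linux", "Windows 7", "Linux")
)
selected_dimension <- "Operating.System"

# table() counts rows per level; as.data.frame() turns the counts into columns
observation_counts <- as.data.frame(
  table(stand_in[[selected_dimension]]),
  responseName = "Number.Of.Observations",
  stringsAsFactors = FALSE
)
colnames(observation_counts)[1] <- selected_dimension
print(observation_counts)
```

One difference to keep in mind: the formula interface of aggregate drops rows with missing values by default, whereas table counts every row of the chosen column.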

Highcharter

Using the same code as above, a barchart of the aggregated data can easily be generated:

aggregate_number_of_observations <- aggregate_number_of_observations[order(aggregate_number_of_observations$Desktop.Items),]
highchart() %>%
  hc_chart(type = "bar") %>%
  hc_xAxis(categories = rev(aggregate_number_of_observations[,selected_dimension])) %>%
  hc_add_series(name = format_label(selected_dimension), data = rev(aggregate_number_of_observations[,measure_column])) %>%
  hc_yAxis(title = list(text = "Number of respondents")) %>%
  hc_title(text = paste0("Number of respondents aggregated by ",format_label(selected_dimension)))

Plotly

Using the same code as above, a barchart of the aggregated data can easily be generated:

aggregate_number_of_observations <- aggregate_number_of_observations[order(aggregate_number_of_observations$Desktop.Items),]
plot_ly(data = aggregate_number_of_observations,
        type = "bar",
        y = eval(as.name(selected_dimension)),
        x = eval(as.name(measure_column)),
        orientation = "h") %>%
  layout(xaxis = list(title = "Number of respondents"),
         yaxis = list(title = format_label(selected_dimension)),
         title = paste0("Number of respondents aggregated by ",format_label(selected_dimension)),
         margin = list(l = 80))

Functionalising

It is convenient to proceduralise the creation of these charts by converting the scripts into functions that can easily be called with different parameters; this is particularly useful in Shiny apps. A function for each charting library considered in this document is provided below.

Note that the aggregation function is the same, for this tutorial, regardless of the visualisation library used.

aggregate_data_for_barchart <-
  function(data = NA,
           dimension_column = NA,
           measure_column = NA,
           aggregate_function = NA) {
    aggregated_data <-
      aggregate(data = data,
                eval(as.name(measure_column)) ~ eval(as.name(dimension_column)),
                FUN = aggregate_function)
    colnames(aggregated_data) <- c(dimension_column, measure_column)
    
    aggregated_data <-
      aggregated_data[order(aggregated_data[, measure_column]), ]
    # Return for use
    aggregated_data
  }

This function can easily be called to aggregate the data as follows:

intermediate_aggregate <- aggregate_data_for_barchart(
  data = desktopItems,
  dimension_column = "University",
  measure_column = "Desktop.Items",
  aggregate_function = mean
)
knitr::kable(intermediate_aggregate)

	University	Desktop.Items
8	University of Cambridge	0.00000
11	University of Greenwich	7.00000
15	University of Sheffield	8.00000
6	University College London	18.00000
10	University of Glasgow	20.00000
13	University of Leeds	20.00000
5	Queen’s University Belfast	25.00000
7	University of Birmingham	29.00000
12	University of Kent	31.00000
1	Durham University	31.33333
3	King’s College London	34.00000
14	University of Oxford	36.20225
4	Polytech’Tours	59.00000
2	Edinburgh University	61.00000
16	University of the West of England	64.00000
9	University of Durham	87.00000
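When the metric is chosen by a user, for example from a dropdown in a Shiny app, it typically arrives as a string rather than as a function. One possible approach is a named list that maps labels to functions, whose entries can then be passed as the aggregate_function argument. The sketch below calls aggregate directly on made-up stand-in rows so that it runs on its own; aggregate_functions, chosen_metric and summary_table are hypothetical names.

```r
# Hypothetical lookup from a user-facing label to an aggregation function
aggregate_functions <- list(
  "Mean" = mean,
  "Number of observations" = length
)

# Stand-in rows (made up for illustration)
stand_in <- data.frame(
  Desktop.Items = c(10, 20, 60),
  University = c("A", "A", "B")
)

# In an app this string would come from a select input
chosen_metric <- "Number of observations"

summary_table <- aggregate(
  Desktop.Items ~ University,
  data = stand_in,
  FUN = aggregate_functions[[chosen_metric]]
)
print(summary_table)
```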

Plotly

The function below generates a plotly bar chart from the output of the aggregation function. Note that a number of additional arguments have been added to provide greater control over the output.

plotly_aggregated_barchart <- function(
  data = NA,
  dimension_column = NA,
  measure_column = NA,
  aggregate_description = NA,
  left_margin = 100,
  displayFurniture = TRUE
) {
  plot_ly(
    data = data,
    type = "bar",
    y = eval(as.name(dimension_column)),
    x = eval(as.name(measure_column)),
    orientation = "h"
  ) %>%
    layout(
      xaxis = list(title = aggregate_description),
      yaxis = list(title = ""),
      title = paste0(
        aggregate_description," aggregated by ",
        format_label(dimension_column)
      ),
      margin = list(l = left_margin)
    ) %>%
    config(displayModeBar = displayFurniture)
}

For example:

plotly_aggregated_barchart(
  data = intermediate_aggregate,
  dimension_column = "University",
  measure_column = "Desktop.Items",
  aggregate_description = "Mean number of desktop items",
  displayFurniture = FALSE
)

Highcharter

The function below generates a highcharter bar chart from the output of the aggregation function. Note that a number of additional arguments have been added to provide greater control over the output.

highcharter_aggregated_barchart <- function(
  data = NA,
  dimension_column = NA,
  measure_column = NA,
  aggregate_description = NA
) {
  highchart() %>%
  hc_chart(type = "bar") %>%
  # data arrives sorted in ascending order; reverse categories and values together
  hc_xAxis(categories = rev(data[,dimension_column])) %>%
  hc_add_series(name = format_label(dimension_column), data = rev(data[,measure_column])) %>%
  hc_yAxis(title = list(text = aggregate_description)) %>%
  hc_title(text = paste0(aggregate_description," aggregated by ",format_label(dimension_column)))
}

For example:

highcharter_aggregated_barchart(
  data = intermediate_aggregate,
  dimension_column = "University",
  measure_column = "Desktop.Items",
  aggregate_description = "Mean number of desktop items"
)