Overview of Tutorial

This tutorial demonstrates how to create interactive bar charts using a variety of different libraries, currently plotly and highcharter.

The datasets that this tutorial considers are structured as follows:

Measure  Category.1  Category.2
1        A           X
2        B           Y
3        A           Y

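Here the Measure column contains numerical data, and each observation is categorised by a number of categorical columns (or dimensions). There are therefore two interesting bar charts that can be generated for a chosen category: the mean of the measure for each level of that category, and the number of observations for each level of that category.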
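Both charts start from the same aggregation step and differ only in the summary function. The base R sketch below uses a hypothetical toy data.frame (toy) with the same shape as the table above and computes both summaries for Category.1 with aggregate: mean for the first chart and length for the second.

```r
# Hypothetical toy data with the same shape as the table above:
# one numeric measure and two categorical columns.
toy <- data.frame(
  Measure = c(1, 2, 3),
  Category.1 = c("A", "B", "A"),
  Category.2 = c("X", "Y", "Y"),
  stringsAsFactors = FALSE
)

# Chart 1: mean of the measure within each level of Category.1
mean_by_cat1 <- aggregate(Measure ~ Category.1, data = toy, FUN = mean)

# Chart 2: number of observations within each level of Category.1
count_by_cat1 <- aggregate(Measure ~ Category.1, data = toy, FUN = length)

print(mean_by_cat1)
print(count_by_cat1)
```

Here level A has two observations (mean 2) and level B has one (mean 2).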

Note that this template covers both how to build such a bar chart inside an HTML RMarkdown file and how to wrap the code in functions, so as to conveniently switch between different categories and measures in a Shiny app.

Import and Clean Data

The data for this template is a .csv file hosted on Figshare, which is read directly from its download URL with read.csv:

desktopItems <- read.csv(file = "https://ndownloader.figshare.com/files/5360960")
knitr::kable(head(desktopItems))
Timestamp            Desktop.Items  Operating.System  University.Department  University                         Country
9/30/2015 13:07:58   5              Mac (OS X)        IT Services            University of Oxford               UK
11/06/2015 12:20     87             Linux             Physics                University of Durham               UK
11/06/2015 12:33     25             Windows 10        Physics                Queen’s University Belfast         UK
11/06/2015 12:46     20             Windows 7         Physics                University of Leeds                UK
11/06/2015 12:48     64             Windows 8         International Office   University of the West of England  UK
11/06/2015 12:50     34             Windows 7         Biology                King’s College London              UK

An advanced version of this template might attempt to automatically infer the measure and appropriate categories for the data; in this template, we explicitly decide which columns are categories (or dimensions) and which column is the measure:

measure_column <- "Desktop.Items"
categories <- c("Operating.System","University.Department","University","Country")
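As a rough illustration of the automatic inference mentioned above, the sketch below uses a hypothetical helper, infer_columns, that treats numeric columns as candidate measures and character or factor columns as candidate categories. This is an assumption, not part of the template: on the real desktopItems data such a rule would likely also pick up Timestamp as a category, which is one reason to choose the columns explicitly.

```r
# Hypothetical helper: guess measure and category columns of a data.frame.
# Assumption: numeric columns are measures, character/factor columns are categories.
infer_columns <- function(data) {
  is_measure <- vapply(data, is.numeric, logical(1))
  is_category <- vapply(data, function(col) is.character(col) || is.factor(col),
                        logical(1))
  list(
    measures = names(data)[is_measure],
    categories = names(data)[is_category]
  )
}

# Hypothetical toy data with one measure and two categories
toy <- data.frame(
  Measure = c(1, 2, 3),
  Category.1 = c("A", "B", "A"),
  Category.2 = c("X", "Y", "Y"),
  stringsAsFactors = FALSE
)

inferred <- infer_columns(toy)
print(inferred$measures)
print(inferred$categories)
```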

These columns determine which dimensions of the data are visualised in the bar charts.

Mean Parameter per Category

Using the aggregate function, the mean number of desktop items per category can easily be calculated; the chosen category for aggregation is assigned to selected_dimension. Because the column names are stored as strings, as.name is needed to convert each string into a symbol, and eval then evaluates that symbol against the columns of the data inside the formula.

selected_dimension <- categories[1]
aggregate_mean <- aggregate(data = desktopItems, eval(as.name(measure_column)) ~ eval(as.name(selected_dimension)), FUN = mean)
knitr::kable(aggregate_mean)
eval(as.name(selected_dimension))   eval(as.name(measure_column))
Linux                               30.90909
Mac (OS X)                          19.76471
Windows 10                          17.07143
Windows 7                           43.37500
Windows 8                           48.09091
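As an alternative to eval(as.name(...)), base R's reformulate can build the formula directly from the two strings, so the aggregated columns keep their original names. The sketch below applies it to a hypothetical toy data.frame (toy); it is one possible variant, not the approach used in the rest of this template.

```r
# Hypothetical toy data with one measure and two categories
toy <- data.frame(
  Measure = c(1, 2, 3),
  Category.1 = c("A", "B", "A"),
  Category.2 = c("X", "Y", "Y"),
  stringsAsFactors = FALSE
)
measure_column <- "Measure"
selected_dimension <- "Category.1"

# reformulate() turns the two strings into the formula Measure ~ Category.1
f <- reformulate(selected_dimension, response = measure_column)
print(f)

aggregate_mean <- aggregate(f, data = toy, FUN = mean)

# The result already carries the original column names, so no renaming is needed
print(names(aggregate_mean))
```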

For convenience in the visualisation, the column names of the data.frame are renamed:

colnames(aggregate_mean) <- c(selected_dimension, measure_column)

The column names for the categories contain periods instead of spaces, e.g. Operating.System, which makes the chart labels harder to read. Using gsub, a utility function called format_label is created to replace the periods with spaces:

format_label <- function(dimension){
  gsub(pattern = "[.]", replacement = " ", x = dimension)
}
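Because gsub is vectorised, format_label can convert a whole vector of column names at once. The pattern "[.]" is a regular-expression character class that matches a literal period; the sketch below checks that it behaves the same as gsub with fixed = TRUE, which skips regular expressions altogether.

```r
format_label <- function(dimension){
  gsub(pattern = "[.]", replacement = " ", x = dimension)
}

column_names <- c("Operating.System", "University.Department", "Country")

# Regular-expression version (as defined above)
labels <- format_label(column_names)

# fixed = TRUE treats "." literally, so no character class is needed
labels_fixed <- gsub(".", " ", column_names, fixed = TRUE)

print(labels)
```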

Highcharter

The data.frame can now be visualised using highcharter as follows. Note that eval is unnecessary with this library because the categories and values are extracted directly with standard subsetting (aggregate_mean[, selected_dimension]) rather than referenced by name inside the charting call.

library(highcharter)
highchart() %>%
  hc_chart(type = "column") %>%
  hc_xAxis(categories = aggregate_mean[,selected_dimension]) %>%
  hc_add_series(name = format_label(selected_dimension), data = aggregate_mean[,measure_column]) %>%
  hc_title(text = paste0("Mean number of desktop items aggregated by ",format_label(selected_dimension)))

The highcharter library distinguishes between vertically and horizontally orientated bar charts through hc_chart(type = ...): "column" produces vertical bars and "bar" produces horizontal bars. Both varieties of chart are more legible if bars are ordered from largest to smallest. A horizontal bar chart draws its first category at the top, so below the data.frame is sorted in ascending order and then both the categories and the measure are reversed with rev, which keeps every label paired with its own value:

aggregate_mean <- aggregate_mean[order(aggregate_mean$Desktop.Items),]
highchart() %>%
  hc_chart(type = "bar") %>%
  # Reverse the categories and the data together so that labels and bars stay aligned
  hc_xAxis(categories = rev(aggregate_mean[,selected_dimension])) %>%
  hc_add_series(name = format_label(selected_dimension), data = rev(aggregate_mean[,measure_column])) %>%
  hc_yAxis(title = list(text = "Mean Number of Desktop Items")) %>%
  hc_title(text = paste0("Mean number of desktop items aggregated by ",format_label(selected_dimension)))
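Because hc_xAxis receives the category labels and hc_add_series receives the values as two separate vectors, any reordering has to be applied to both. The base R sketch below uses a small hypothetical data.frame (sorted) to show that reversing both vectors keeps each label paired with its value, and that sorting once with order(..., decreasing = TRUE) gives the same pairing without rev.

```r
# Hypothetical aggregated data, already sorted in ascending order
sorted <- data.frame(
  category = c("A", "B", "C"),
  value = c(1, 5, 9),
  stringsAsFactors = FALSE
)

# Reversing both vectors keeps each label paired with its own value
categories_for_axis <- rev(sorted$category)
values_for_series <- rev(sorted$value)

# Equivalent alternative: sort once in decreasing order and skip rev()
descending <- sorted[order(sorted$value, decreasing = TRUE), ]

print(data.frame(categories_for_axis, values_for_series))
```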

Plotly

The data.frame can now be visualised using plotly as follows. Note that plotly (version 4.0 onwards) expects columns of the data to be referenced with formulas such as ~Operating.System; because the column names are stored as strings in selected_dimension and measure_column, the formulas are built with as.formula and paste0.

library(plotly)
plot_ly(data = aggregate_mean,
        type = "bar",
        # Build ~Operating.System and ~Desktop.Items from the stored column names
        x = as.formula(paste0("~", selected_dimension)),
        y = as.formula(paste0("~", measure_column))) %>%
  layout(xaxis = list(title = format_label(selected_dimension)),
         yaxis = list(title = "Mean Number of Desktop Items"),
         title = paste0("Mean number of desktop items aggregated by ",format_label(selected_dimension)))
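Formulas such as ~Operating.System can be built from column names stored as strings with as.formula. The base R sketch below shows what such a formula contains and how its right-hand side resolves to a column of a small hypothetical data.frame (toy), which is the kind of lookup plotly performs for its formula arguments; it does not call plotly itself.

```r
selected_dimension <- "Operating.System"
measure_column <- "Desktop.Items"

# Build one-sided formulas from the stored column names
x_formula <- as.formula(paste0("~", selected_dimension))
y_formula <- as.formula(paste0("~", measure_column))
print(all.vars(x_formula))

# Hypothetical data; evaluating the formula's right-hand side in it returns the column
toy <- data.frame(
  Operating.System = c("Linux", "Windows 7"),
  Desktop.Items = c(87, 20),
  stringsAsFactors = FALSE
)
x_values <- eval(x_formula[[2]], envir = toy)
y_values <- eval(y_formula[[2]], envir = toy)
print(x_values)
```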

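Bar charts are vertical by default in the plotly library; however, horizontally orientated bar charts are often more appropriate where dimension labels are long. Orientation is controlled in the plotly library through the argument orientation, and the columns are again referenced with formulas built from their names. Independent of orientation, bar charts are more legible if bars are ordered from largest to smallest, as shown below; converting the dimension column to a factor whose levels follow the sorted rows makes plotly draw the bars in that order:

aggregate_mean <- aggregate_mean[order(aggregate_mean$Desktop.Items),]
# Fix the category order to the sorted row order (plotly orders categories by factor levels)
aggregate_mean[[selected_dimension]] <- factor(aggregate_mean[[selected_dimension]],
                                               levels = as.character(aggregate_mean[[selected_dimension]]))
plot_ly(data = aggregate_mean,
        type = "bar",
        y = as.formula(paste0("~", selected_dimension)),
        x = as.formula(paste0("~", measure_column)),
        orientation = "h") %>%
  layout(xaxis = list(title = "Mean Number of Desktop Items"),
         yaxis = list(title = format_label(selected_dimension)),
         title = paste0("Mean number of desktop items aggregated by ",format_label(selected_dimension)),
         margin = list(l = 80))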
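Assuming, as is common practice, that plotly draws a categorical axis in the order of a factor's levels, the row order of a sorted data.frame only carries through to the chart if the levels are set explicitly. The base R sketch below, using labels and rounded means from the table of means above, shows that factor() sorts levels alphabetically by default, while supplying levels preserves the sorted row order.

```r
# Labels with means rounded from the table above, sorted in ascending order of the mean
agg <- data.frame(
  Operating.System = c("Windows 10", "Mac (OS X)", "Linux"),
  Desktop.Items = c(17.07, 19.76, 30.91),
  stringsAsFactors = FALSE
)

# Default: levels are sorted alphabetically, discarding the row order
alphabetical <- levels(factor(agg$Operating.System))

# Explicit levels: the sorted row order is kept
ordered_levels <- levels(factor(agg$Operating.System, levels = agg$Operating.System))

print(alphabetical)
print(ordered_levels)
```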

Number of Observations per Category

The number of observations per category can be calculated with aggregate by passing length as FUN, i.e. by counting how many observations fall into each category. Because the formula interface of aggregate drops rows with missing values by default (na.action = na.omit), only observations with a recorded measure are counted.

aggregate_number_of_observations <- aggregate(data = desktopItems, eval(as.name(measure_column)) ~ eval(as.name(selected_dimension)), FUN = length)
colnames(aggregate_number_of_observations) <- c(selected_dimension, measure_column)
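The effect of na.action on the counts can be checked against base R's table, which counts every row regardless of the measure. The sketch below uses a hypothetical toy data.frame in which one observation of category B has a missing measure: aggregate counts only the complete row for B, whereas table counts both.

```r
# Hypothetical data; the last row has a missing measure
toy <- data.frame(
  Measure = c(1, 2, 3, NA),
  Category.1 = c("A", "B", "A", "B"),
  stringsAsFactors = FALSE
)

# Formula interface: rows with NA in the measure are dropped before counting
counts_aggregate <- aggregate(Measure ~ Category.1, data = toy, FUN = length)

# table(): counts every row for each category
counts_table <- as.data.frame(table(toy$Category.1), stringsAsFactors = FALSE)

print(counts_aggregate)
print(counts_table)
```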

Highcharter

With the titles changed to describe counts rather than means, the same code as above generates a bar chart of the aggregated data:

aggregate_number_of_observations <- aggregate_number_of_observations[order(aggregate_number_of_observations$Desktop.Items),]
highchart() %>%
  hc_chart(type = "bar") %>%
  # Reverse the categories and the data together so that labels and bars stay aligned
  hc_xAxis(categories = rev(aggregate_number_of_observations[,selected_dimension])) %>%
  hc_add_series(name = format_label(selected_dimension), data = rev(aggregate_number_of_observations[,measure_column])) %>%
  hc_yAxis(title = list(text = "Number of respondents")) %>%
  hc_title(text = paste0("Number of respondents aggregated by ",format_label(selected_dimension)))

Plotly

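With the titles changed in the same way, the plotly code above generates the equivalent horizontal bar chart:

aggregate_number_of_observations <- aggregate_number_of_observations[order(aggregate_number_of_observations$Desktop.Items),]
# Fix the category order to the sorted row order
aggregate_number_of_observations[[selected_dimension]] <- factor(
  aggregate_number_of_observations[[selected_dimension]],
  levels = as.character(aggregate_number_of_observations[[selected_dimension]]))
plot_ly(data = aggregate_number_of_observations,
        type = "bar",
        # Reference the columns through formulas built from their names
        y = as.formula(paste0("~", selected_dimension)),
        x = as.formula(paste0("~", measure_column)),
        orientation = "h") %>%
  layout(xaxis = list(title = "Number of respondents"),
         yaxis = list(title = format_label(selected_dimension)),
         title = paste0("Number of respondents aggregated by ",format_label(selected_dimension)),
         margin = list(l = 80))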

Functionalising

It is convenient to wrap these scripts in functions that can easily be called with different parameters; this is particularly useful in Shiny apps. A function for each charting library considered in this document is provided below.

Note that, in this tutorial, the same aggregation function is used regardless of the visualisation library.

aggregate_data_for_barchart <-
  function(data = NA,
           dimension_column = NA,
           measure_column = NA,
           aggregate_function = NA) {
    aggregated_data <-
      aggregate(data = data,
                eval(as.name(measure_column)) ~ eval(as.name(dimension_column)),
                FUN = aggregate_function)
    colnames(aggregated_data) <- c(dimension_column, measure_column)
    
    aggregated_data <-
      aggregated_data[order(aggregated_data[, measure_column]), ]
    # Return for use
    aggregated_data
  }

This function can easily be called to aggregate the data as follows:

intermediate_aggregate <- aggregate_data_for_barchart(
  data = desktopItems,
  dimension_column = "University",
  measure_column = "Desktop.Items",
  aggregate_function = mean
)
knitr::kable(intermediate_aggregate)
    University                          Desktop.Items
8   University of Cambridge                   0.00000
11  University of Greenwich                   7.00000
15  University of Sheffield                   8.00000
6   University College London                18.00000
10  University of Glasgow                    20.00000
13  University of Leeds                      20.00000
5   Queen’s University Belfast               25.00000
7   University of Birmingham                 29.00000
12  University of Kent                       31.00000
1   Durham University                        31.33333
3   King’s College London                    34.00000
14  University of Oxford                     36.20225
4   Polytech’Tours                           59.00000
2   Edinburgh University                     61.00000
16  University of the West of England        64.00000
9   University of Durham                     87.00000
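To preview how the function supports switching dimensions, as a Shiny app would when a user picks a different input, the sketch below repeats the definition of aggregate_data_for_barchart so that it runs on its own and applies it to every category of a small hypothetical data.frame (toy). The list of results, one per category, is an illustrative choice rather than part of the template.

```r
aggregate_data_for_barchart <-
  function(data = NA,
           dimension_column = NA,
           measure_column = NA,
           aggregate_function = NA) {
    aggregated_data <-
      aggregate(data = data,
                eval(as.name(measure_column)) ~ eval(as.name(dimension_column)),
                FUN = aggregate_function)
    colnames(aggregated_data) <- c(dimension_column, measure_column)
    # Sort rows in ascending order of the aggregated measure
    aggregated_data[order(aggregated_data[, measure_column]), ]
  }

# Hypothetical toy data with one measure and two categories
toy <- data.frame(
  Measure = c(1, 2, 3),
  Category.1 = c("A", "B", "A"),
  Category.2 = c("X", "Y", "Y"),
  stringsAsFactors = FALSE
)

# Aggregate once per category, as a Shiny input might switch between them
dims <- c("Category.1", "Category.2")
results <- setNames(
  lapply(dims, function(d) {
    aggregate_data_for_barchart(data = toy, dimension_column = d,
                                measure_column = "Measure",
                                aggregate_function = mean)
  }),
  dims
)
print(results)
```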

Plotly

The function below generates a plotly bar chart from the output of aggregate_data_for_barchart; note that a number of additional arguments have been added to provide greater flexibility over the output.

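plotly_aggregated_barchart <- function(
  data = NA,
  dimension_column = NA,
  measure_column = NA,
  aggregate_description = NA,
  left_margin = 100,
  displayFurniture = TRUE
) {
  # Keep the bars in the row order produced by aggregate_data_for_barchart
  data[[dimension_column]] <- factor(data[[dimension_column]],
                                     levels = as.character(data[[dimension_column]]))
  plot_ly(
    data = data,
    type = "bar",
    # Reference the columns through formulas built from their names
    y = as.formula(paste0("~", dimension_column)),
    x = as.formula(paste0("~", measure_column)),
    orientation = "h"
  ) %>%
    layout(
      xaxis = list(title = aggregate_description),
      yaxis = list(title = ""),
      title = paste0(
        aggregate_description," aggregated by ",
        format_label(dimension_column)
      ),
      margin = list(l = left_margin)
    ) %>%
    config(displayModeBar = displayFurniture)
}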

For example:

plotly_aggregated_barchart(
  data = intermediate_aggregate,
  dimension_column = "University",
  measure_column = "Desktop.Items",
  aggregate_description = "Mean number of desktop items",
  displayFurniture = FALSE
)

Highcharter

The function below generates a highcharter bar chart from the output of aggregate_data_for_barchart; note that a number of additional arguments have been added to provide greater flexibility over the output.

highcharter_aggregated_barchart <- function(
  data = NA,
  dimension_column = NA,
  measure_column = NA,
  aggregate_description = NA
) {
  highchart() %>%
    hc_chart(type = "bar") %>%
    # data arrives sorted in ascending order; reverse categories and values together
    hc_xAxis(categories = rev(data[, dimension_column])) %>%
    hc_add_series(name = format_label(dimension_column), data = rev(data[, measure_column])) %>%
    hc_yAxis(title = list(text = aggregate_description)) %>%
    hc_title(text = paste0(aggregate_description, " aggregated by ", format_label(dimension_column)))
}

For example:

highcharter_aggregated_barchart(
  data = intermediate_aggregate,
  dimension_column = "University",
  measure_column = "Desktop.Items",
  aggregate_description = "Mean number of desktop items"
)