Introduction

The kirkegaard package contains a number of helper function for ggplot2. These are designed to save time, but do not genrally expand the capabilities of what a skilled ggplot2 user can do. As such, they do not constitute an extension. This documents gives some examples of the functions. All the functions begin with GG_ so they are easy to find.

GG_scatter

This is a convenience function to easily make scatterplots that add useful information such as the observed correlation in the plot and case names. To use it, give it a data.frame and the names of the two variables:

GG_scatter(iris, "Petal.Length", "Sepal.Width")

By default, the rownames are used as case names. We can turn this off with case_names = F:

GG_scatter(iris, "Petal.Length", "Sepal.Width", case_names = F)

The correlation, its confidence interval and the sample size is automatically shown in the corner where the data is least likely to be. One can control the position using text_pos:

GG_scatter(iris, "Petal.Length", "Sepal.Width", case_names = F, text_pos = "bl")

One can add weights which are automatically mapped to the size of the points using weights:

set.seed(1)
GG_scatter(iris, "Petal.Length", "Sepal.Width", case_names = F, weights = runif(150))

Note that the correlation is a weighted correlation and automatically uses the supplied weights as well.

If we want to use other case names, they can be supplied using case_names_vector:

set.seed(1)
GG_scatter(iris, "Petal.Length", "Sepal.Width", case_names_vector = sample(letters, replace = T, size = 150))

We can use another confidence interval by passing it to CI:

GG_scatter(iris, "Petal.Length", "Sepal.Width", case_names = F, CI = .99)

GG_denhist

It is frequently desired to make density or histograms of data distributions. GG_denhist makes both and combines them:

GG_denhist(iris, "Sepal.Length")

Currently, the y scale is nonsensical and only the relative differences are meaningful. In the future, the y scale will be the proportion.

A vertical one is automatically plotted for the mean. We can supply another function to vline if we want another kind of average:

GG_denhist(iris, "Sepal.Length", vline = median)

We can supply a groping variable if we want to compare groups:

GG_denhist(iris, "Sepal.Length", group = "Species")

GG_group_means

Examining group averages is a frequent task. GG_group_means makes this easier, supply a data.frame, the name of the data variable and the name of the grouping variable:

GG_group_means(iris, var = "Sepal.Length", groupvar = "Species")

There are a number of built in visualizations which can be controlled by type:

#bar (default)
GG_group_means(iris, var = "Sepal.Length", groupvar = "Species", type = "bar")

#point
GG_group_means(iris, var = "Sepal.Length", groupvar = "Species", type = "point")

#points
GG_group_means(iris, var = "Sepal.Length", groupvar = "Species", type = "points")

#violin
GG_group_means(iris, var = "Sepal.Length", groupvar = "Species", type = "violin")

#violin2
GG_group_means(iris, var = "Sepal.Length", groupvar = "Species", type = "violin2")

Confidence intervals are automatically shown. These can be controlled with CI:

GG_group_means(iris, var = "Sepal.Length", groupvar = "Species", type = "violin", CI = .99999)

We can also supply subgroups using subgroup:

#make up some subgroup data
set.seed(1)
iris$letter = sample(letters[1:2], replace = T, size = 150)

#bar (default)
GG_group_means(iris, var = "Sepal.Length", groupvar = "Species", type = "bar", subgroup = "letter")

#point
GG_group_means(iris, var = "Sepal.Length", groupvar = "Species", type = "point", subgroup = "letter")

#points
GG_group_means(iris, var = "Sepal.Length", groupvar = "Species", type = "points", subgroup = "letter")

#violin
GG_group_means(iris, var = "Sepal.Length", groupvar = "Species", type = "violin", subgroup = "letter")

#violin2
GG_group_means(iris, var = "Sepal.Length", groupvar = "Species", type = "violin2", subgroup = "letter")

GG_kmeans

GG_kmeans is a simple function to perform k-means cluster analysis on a dataset and plot the results:

GG_kmeans(iris[1:4], clusters = 3)

GG_forest

The popular package metafor is used to meta-analyze data. However, the built in plotting functions are ugly and based on base-r graphics:

#load built in dataset about European genomic ancestry and socioeconomic outcomes across the Americas
data(european_ancestry)
#meta-analysis
meta = rma(yi = european_ancestry$r, sei = european_ancestry$SE_r)
#plot
forest(meta)

I really dislike base-r plots. To make a ggplot2 plot, simply call GG_forest on the analysis:

GG_forest(meta)

## Warning: failed to assign NativeSymbolInfo for env since env is already
## defined in the 'lazyeval' namespace

It doesn’t seem that the names of the analyses are saved in the rma object, so we have to supply them to .names if they are desired:

GG_forest(meta, .names = european_ancestry$Author_sample)

GG_funnel

As with forest plots, metafor comes with a built in funnel plot:

funnel(meta)

GG_funnel is an intended ggplot2-based replacement of this:

GG_funnel(meta)

Outlying studies are automatically colored red. In the future, automatic labeling of (outlying/all) points will be added, but it can currently be done manually using standard ggplot2 functions:

#use ggrepel
library(ggrepel)
GG_funnel(meta) +
  ggrepel::geom_label_repel(data = european_ancestry, aes(r, SE_r, label = Author_sample), size = 2)

ggplot2 functions in kirkegaard package

Emil O. W. Kirkegaard