DRAFT Triad Data Visualizations: Part 1

These documents provide some examples of triad data visualization, primarily using the packages ggtern, compositions, and robCompositions. The ggtern package extends R’s ggplot2 package with many specialized options for presenting 3-part compositional data in ternary plots. Its author, Nicholas Hamilton, offers many examples with accompanying code. The compositions and robCompositions packages provide functions to transform, analyse, and model compositional data (data with multiple variables that measure distinct parts of a whole).

This code and documentation were developed to accompany in-person tutorials for an individual needing to learn just enough R to understand, create, and edit visualizations of their dissertation data. I wrote these notes primarily to serve as reminders of coding topics covered in our tutoring sessions.

NB: R software and packages are regularly updated, so version notes are provided at the end of this document along with instructions on how to roll back to older versions, if necessary.

Load, Tidy, and View Data

Load your data

Load one of the demonstration datasets. Both are comma-delimited CSV files, and both contain some triad, dyad, factor, and landscape (stones) data. This example loads triad_data1.csv.

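# Change the file path to match where you keep the data
# stringsAsFactors = TRUE reproduces the factor columns shown below in R >= 4.0.0,
# where read.csv() otherwise keeps text columns as character
demodata1 <- read.csv("E:/P_Teaching/DemoSheets/triad_data1.csv", stringsAsFactors = TRUE)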
str(demodata1)

## 'data.frame':    62 obs. of  23 variables:
##  $ ObsID    : int  27 30 2 37 25 5 10 28 4 52 ...
##  $ StartTime: Factor w/ 47 levels "2/3/2016 19:13",..: 10 11 9 43 23 12 47 4 27 38 ...
##  $ EndTime  : Factor w/ 47 levels "2/3/2016 19:13",..: 11 12 10 43 24 13 47 5 28 39 ...
##  $ T1A      : num  9.65 13.07 45.19 42.77 24.14 ...
##  $ T1B      : num  33.2 78.4 42 40.2 17.4 ...
##  $ T1C      : num  57.12 8.53 12.81 16.98 58.42 ...
##  $ T2A      : num  9.6 22.6 46.9 42.9 60.1 ...
##  $ T2B      : num  47.3 58.7 39.4 45.1 23.1 ...
##  $ T2C      : num  43.1 18.8 13.8 12 16.7 ...
##  $ T3A      : num  24.18 16.48 10.8 17.39 9.93 ...
##  $ T3B      : num  54.7 43.8 81.5 70.5 85.4 ...
##  $ T3C      : num  21.1 39.77 7.73 12.11 4.65 ...
##  $ D1X      : num  0.317 0.289 0.247 0.202 0.488 ...
##  $ D1Y      : num  0.683 0.711 0.753 0.798 0.512 ...
##  $ D2X      : num  0.372 0.546 0.82 0.233 0.909 ...
##  $ D2Y      : num  0.6278 0.4543 0.1798 0.7666 0.0915 ...
##  $ D3X      : num  0.665 0.558 0.587 0.699 0.27 ...
##  $ D3Y      : num  0.335 0.442 0.413 0.301 0.73 ...
##  $ F1       : Factor w/ 5 levels "Jupiter","Mars",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ F2       : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 1 1 1 1 1 ...
##  $ F3       : Factor w/ 2 levels "A","B": 1 2 2 2 2 1 1 2 2 2 ...
##  $ L1XRight : num  0.306 0.351 0.727 0.301 0.724 ...
##  $ L1YTop   : num  0.303 0.5 0.645 0.274 0.772 ...

If you want to view your data, you can type these commands:

str(demodata1) to view the data structure (the class and first few values of each column)
head(demodata1) to view the first 6 rows of data
summary(demodata1) to see data summaries (min and max of continuous data, counts of factor levels, etc.)

Tidy your data

You might notice that some columns are read in with an unsuitable class. For example, in demodata1 you need to change ObsID from integer to factor class (it is an identifier, not a quantity), and the StartTime and EndTime fields from factor to POSIXct date/time class:

demodata1$ObsID <- as.factor(demodata1$ObsID)
demodata1$StartTime <- as.POSIXct(as.character(demodata1$StartTime), format="%m/%d/%Y %H:%M")
demodata1$EndTime <- as.POSIXct(as.character(demodata1$EndTime), format="%m/%d/%Y %H:%M")
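A format string that does not match the data makes as.POSIXct() return NA rather than raise an error, so it is worth testing the conversion on a single value first. The sketch below assumes the "month/day/year hour:minute" layout seen in the str() output; the tz = "UTC" setting and the second (end) timestamp are hypothetical additions for illustration only.

```r
# Parse a timestamp in the same layout as the StartTime/EndTime columns.
# tz = "UTC" is an assumption; omit it to use your computer's local time zone.
fmt <- "%m/%d/%Y %H:%M"
start <- as.POSIXct("2/3/2016 19:13", format = fmt, tz = "UTC")
# Hypothetical end time, made up for this example
end <- as.POSIXct("2/3/2016 19:20", format = fmt, tz = "UTC")

print(format(start, "%Y-%m-%d %H:%M"))

# Elapsed time between the two stamps, in minutes
elapsed <- difftime(end, start, units = "mins")
print(elapsed)

# A string that does not match the format silently becomes NA
bad <- as.POSIXct("not a date", format = fmt, tz = "UTC")
print(is.na(bad))
```

With demodata1, running sum(is.na(demodata1$StartTime)) after the conversion is a quick way to spot rows the format did not match.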

It is also usually worthwhile to explore your data for errors and missing values. For example:

You might want to check whether a particular column contains NAs. sum(complete.cases(demodata1$T1A)) results in 61, so one row in this column must have an NA (because we have 62 rows of data).
You might want to see the ObsIDs of the rows that have NAs. demodata1$ObsID[!complete.cases(demodata1)] will output a vector of the associated ObsIDs.
You might want to confirm whether the compositional data already sum to 1 (or 100%). demodata1$T1A + demodata1$T1B + demodata1$T1C will show you they all add to 100 or 99.99999, except the row with NAs.
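The checks above can be bundled into a few lines that cover every column at once. This is a sketch run on a small hypothetical data frame (toy, with made-up values) so that it works without the CSV file; the 1e-3 tolerance is an arbitrary assumption that lets rounding such as 99.99999 pass.

```r
# Hypothetical toy data standing in for demodata1 (values are made up)
toy <- data.frame(
  ObsID = factor(c("1", "2", "3")),
  T1A   = c(20, 50, NA),
  T1B   = c(30, 25, 40),
  T1C   = c(50, 24.99999, 60)
)

# Number of NAs in every column at once
na_counts <- colSums(is.na(toy))
print(na_counts)

# ObsIDs of rows with at least one NA
incomplete_ids <- toy$ObsID[!complete.cases(toy)]
print(incomplete_ids)

# Row totals for one triad; rows containing an NA total to NA
triad_cols <- c("T1A", "T1B", "T1C")
totals <- rowSums(toy[, triad_cols])
print(totals)

# Complete rows whose total is more than a small tolerance away from 100
tolerance <- 1e-3
off_total <- !is.na(totals) & abs(totals - 100) > tolerance
print(toy$ObsID[off_total])
```

To use it on your own data, replace toy with demodata1 and adjust triad_cols for each triad.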

Quick views of all data for one triad

You can use the package ggtern to get a quick view of the data.

#load the libraries for the ggplot2 and ggtern packages
#(this assumes you've already obtained the packages from CRAN - see version and install notes at the bottom of this document)
library(ggplot2)
library(ggtern)

# View one of the triads
# Use ggtern which is a specialized package using ggplot2 tools.
# You will define the data source (demodata1) and then provide the three column names where ggtern
# will find the x, y, and z coordinate data (T1A, T1B, and T1C)
# Then you tell ggtern what kind of geometry to use to display the data, in this case "geom_point"
ggtern(data=demodata1, aes(x=T1A, y=T1B, z=T1C)) +
    geom_point()

The above figure is the most basic ggtern plot you can create. To make the graph more informative, you’d probably want to add better labels, a title, and maybe arrows along the axes. In ggplot2 (and therefore also in ggtern), graphics are built up layer by layer and component by component. Notice that each time you add a graphic element, you need to use the + symbol. Put the + at the end of a line, not at the start of the next one; otherwise R treats the line break as the end of the command.

# Make a prettier plot with labels
ggtern(data=demodata1, aes(x=T1A, y=T1B, z=T1C)) + #define data sources
    geom_point() +          #define data geometry
    theme_showarrows() +    #draw labeled arrows beside axes
    ggtitle("My Favorite Color") +      #add title
    xlab("Red") +                       #replace default axis labels
    ylab("Yellow") +
    zlab("Blue")
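A ternary plot positions each row by its proportions: the three values are divided by their row total, so a row recorded as percentages and the same row recorded as proportions land on the same spot. The base R sketch below shows one common way to map proportions onto flat x/y coordinates inside a triangle. It illustrates the geometry only. The helper name tern_to_xy() and the corner layout are assumptions, and ggtern may orient its corners differently.

```r
# Hypothetical helper: map three-part compositions to 2-D triangle coordinates.
# Assumed corner layout: A at (0, 0), B at (1, 0), C at (0.5, sqrt(3)/2).
tern_to_xy <- function(part_a, part_b, part_c) {
  # Closure: divide by the row total so the three parts sum to 1
  total <- part_a + part_b + part_c
  pb <- part_b / total
  pc <- part_c / total
  # A's share needs no term of its own: it pulls the point toward corner A at (0, 0)
  data.frame(x = pb + pc / 2, y = pc * sqrt(3) / 2)
}

# The same composition expressed as percentages and as proportions
xy_pct  <- tern_to_xy(20, 30, 50)
xy_prop <- tern_to_xy(0.2, 0.3, 0.5)
print(xy_pct)
print(all.equal(xy_pct, xy_prop))

# A row that is entirely part A sits on the A corner
print(tern_to_xy(100, 0, 0))
```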

Simple Options to View Density Patterns

Patterns in the data can be hard to discern, especially when there are many data points. An easily interpreted alternative is a density plot. ggtern uses a two-dimensional kernel density estimator to calculate interpolated density contours. These can be plotted as lines or as shaded regions. Different visualizations of the same data can influence which patterns stand out and what conclusions you draw.
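To show what a two-dimensional kernel density estimate is, the sketch below applies MASS::kde2d() (MASS is a recommended package bundled with R) to simulated points placed in a triangle. It illustrates the idea only and does not reproduce how geom_density_tern() computes its contours. The simulated data, the 50 x 50 grid, and the tern_to_xy() helper are assumptions made for this example.

```r
# Hypothetical helper: proportions -> triangle coordinates
# (assumed corners A (0, 0), B (1, 0), C (0.5, sqrt(3)/2))
tern_to_xy <- function(part_a, part_b, part_c) {
  total <- part_a + part_b + part_c
  data.frame(x = (part_b + part_c / 2) / total,
             y = (part_c * sqrt(3) / 2) / total)
}

set.seed(1)
# Simulated triads (made up): 100 rows of three positive parts
parts <- matrix(rgamma(300, shape = 2), ncol = 3)
xy <- tern_to_xy(parts[, 1], parts[, 2], parts[, 3])

# Kernel density estimate evaluated on a 50 x 50 grid covering the triangle
dens <- MASS::kde2d(xy$x, xy$y, n = 50, lims = c(0, 1, 0, sqrt(3) / 2))

# Points, density contour lines, and the triangle outline
plot(xy$x, xy$y, asp = 1, pch = 16, xlab = "", ylab = "")
contour(dens$x, dens$y, dens$z, add = TRUE)
polygon(c(0, 1, 0.5), c(0, 0, sqrt(3) / 2))
```

Because this naive estimate knows nothing about the triangle's edges, some contour lines near the edges can extend past the outline.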

#Create a basic density contour plot
ggtern(data=demodata1, aes(x=T1A, y=T1B, z=T1C)) + #define data sources
    geom_point() +          #define first data geometry
    geom_density_tern()     #define second data geometry

#Notice that you could leave out the points, if you prefer
ggtern(data=demodata1, aes(x=T1A, y=T1B, z=T1C)) +      #define data sources
    geom_density_tern()                                 #define a data geometry

#You can assign a color gradient to the lines using the density values
#(contour lines have no fill, so map the density level to color instead)
ggtern(data=demodata1, aes(x=T1A, y=T1B, z=T1C)) +              #define data sources
    geom_density_tern(aes(color=..level.., alpha=..level..))    #define a data geometry with an aesthetic

#Or you can apply a color gradient to the spaces between the contour lines
ggtern(data=demodata1, aes(x=T1A, y=T1B, z=T1C)) +                          #define data sources
    stat_density_tern(aes(fill=..level.., alpha=..level..),geom='polygon') +#now you need to use stat_density_tern
    scale_fill_gradient2(high = "red") +                                    #define the fill color
    guides(color = "none", fill = "none", alpha = "none")                   #we don't want to display legend items

We can bring together all these components to build a plot displaying triad data with all points, density, and labels.

ggtern(data=demodata1, aes(x=T1A, y=T1B, z=T1C)) + 
    stat_density_tern(aes(fill=..level.., alpha=..level..), geom='polygon') +
    scale_fill_gradient2(high = "blue") +  
    geom_point() +
    theme_showarrows() +
    ggtitle("My Favorite Color") +
    xlab("Red") + 
    ylab("Yellow") +
    zlab("Blue") +
    guides(color = "none", fill = "none", alpha = "none")

Session and Package Information

I created and tested these examples with:

RStudio Version 0.99.892
R version 3.2.4 (2016-03-10) – “Very Secure Dishes”
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 8 x64 (build 9200)
package ‘ggtern’ version 2.1.0
package ‘ggplot2’ version 2.1.0
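To record the equivalent details for your own setup, sessionInfo() prints the R version, platform, and attached packages. The small helper below, report_versions() (a hypothetical name, not part of any package), returns the installed version of each package you list, or NA when a package is not installed, without loading it.

```r
# Hypothetical helper: installed version of each package, NA if not installed
report_versions <- function(pkgs) {
  vapply(pkgs, function(p) {
    if (nzchar(system.file(package = p))) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))
}

print(R.version.string)
print(report_versions(c("ggplot2", "ggtern")))

# Full details (R version, platform, OS, attached packages)
sessionInfo()
```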

If you need to install older versions of ggplot2 and ggtern, enter these commands (substitute the version numbers you are seeking). Install ggplot2 first, because ggtern depends on it:

oldggplot2url <- "https://cran.r-project.org/src/contrib/Archive/ggplot2/ggplot2_1.0.1.tar.gz"
install.packages(oldggplot2url, repos=NULL, type="source")
oldggternurl <- "https://cran.r-project.org/src/contrib/Archive/ggtern/ggtern_1.0.2.0.tar.gz"
install.packages(oldggternurl, repos=NULL, type="source")

After installation, you should restart RStudio.
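If you switch versions often, a tiny helper can build the archive URLs for you. archive_url() is a hypothetical name that follows the .../Archive/<package>/<package>_<version>.tar.gz pattern of the CRAN archive addresses in the commands above. This is a sketch, not an official tool. The version currently on CRAN is usually not in the Archive folder, and the remotes package's install_version() function offers a similar shortcut.

```r
# Hypothetical helper: CRAN archive URL for a given package and version
archive_url <- function(pkg, version) {
  sprintf("https://cran.r-project.org/src/contrib/Archive/%s/%s_%s.tar.gz",
          pkg, pkg, version)
}

url <- archive_url("ggtern", "1.0.2.0")
print(url)

# Uncomment to install; ggplot2 first, because ggtern depends on it
# install.packages(archive_url("ggplot2", "1.0.1"), repos = NULL, type = "source")
# install.packages(url, repos = NULL, type = "source")
```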

Contact Information

For more information about this R script, or about associated data-support consulting services, contact Dr. Ashton Drew.


C. A. Drew, KDV Decision Analysis LLC

March 10, 2016
