Bird Trait Networks example

Let's have a look at an example of documentation I created with R Markdown, using the data I compiled for the Bird Trait Networks project.

So I have compiled my large dataset and I want to start exploring it. The plain text chunks in an R Markdown document (.Rmd) are a great place to document the procedures and methods used to produce the data. I’ve spared you that here for my dataset, but you should use this space with your own example dataset to include as much useful detail as you can, so that the methods used to generate your data are as understandable as possible.



Data

Let's first have a look at what we are dealing with, starting by loading the data and the metadata:

### SETTINGS ##############################################################
input.folder <- "~/Documents/WORK/ACCE Data management course/workflow/inputs/exercises/metadata/"

### FILES #################################################################

# metadata is read with strings kept as characters so that the levels column
# can later be split with strsplit()
meta <- read.csv(paste0(input.folder, "metadata.csv"), stringsAsFactors = FALSE)
dd   <- read.csv(paste0(input.folder, "data.csv"))

### PACKAGES ##############################################################

library(knitr) # needed for function kable


So now the data has been loaded into the R environment. I’ll use the kable function to make a nice HTML table of the first 30 rows of the data:

kable(head(dd, 30), caption = "Table 1: Sample of the Bird Trait Networks dataset")
Table 1: Sample of the Bird Trait Networks dataset
species max.altitude inc dev.mode courtship.feed.m song.dur breed.system
Abroscopus_albogularis NA NA NA NA NA NA
Abroscopus_superciliaris NA NA NA NA NA NA
Acanthagenys_rufogularis NA NA NA NA NA NA
Acanthidops_unicolor NA NA NA NA NA NA
Acanthis_flammea 1400 10.0 1 1 19.8 1
Acanthis_hornemanni NA 12.0 NA 1 NA NA
Acanthisitta_chloris NA 19.5 2 NA NA 4
Acanthiza_apicalis NA NA NA NA NA NA
Acanthiza_chrysorrhoa NA NA NA NA NA NA
Acanthiza_lineata NA NA NA NA NA NA
Acanthiza_nana NA NA NA NA NA NA
Acanthiza_pusilla NA NA NA NA NA NA
Acanthiza_reguloides NA NA NA NA NA NA
Acanthiza_uropygialis NA NA NA NA NA NA
Acanthorhynchus_superciliosus NA NA NA NA NA NA
Acanthorynchus_tenuirostris NA NA NA NA NA NA
Accipiter_badius NA 30.0 2 NA NA 1
Accipiter_bicolor NA NA NA NA NA NA
Accipiter_brevipes NA 32.5 2 NA NA 1
Accipiter_cirrocephalus NA NA NA NA NA NA
Accipiter_cooperii NA 24.0 2 NA NA 5
Accipiter_fasciatus NA 30.0 NA NA NA NA
Accipiter_gentilis NA 33.0 2 NA NA 1
Accipiter_melanoleucus NA 37.5 2 NA NA 1
Accipiter_nisus 1930 34.0 2 NA NA 5
Accipiter_novaehollandiae NA NA NA NA NA NA
Accipiter_striatus NA 34.0 2 NA NA 1
Acridotheres_cristatellus NA 15.0 NA NA NA NA
Acridotheres_tristis NA 15.5 2 0 NA 1
Acrocephalus_agricola NA NA NA NA NA NA



Exploratory plots

To begin with, I want to do some basic sanity checks. First, I might want to check the distribution of the data for each variable, which can help me identify outliers or other data entry errors. Here the metadata table can be very useful. Let's have a quick look at it:

Table 2: Variable metadata
| code | orig.vname | cat | descr | scores | levels | type | units |
|---|---|---|---|---|---|---|---|
| max.altitude | Altitude | ECOLOGY | Maximum altitudinal distribution | NA | NA | con | m |
| inc | Incubation period | LIFE-HISTORY | Incubation period | NA | NA | con | days |
| dev.mode | Developmental mode | LIFE-HISTORY | Developmental mode | 1;2;3 | Altricial;Semiprecocial;Precocial | cat | NA |
| courtship.feed.m | Courtship feeding (by the male) | SEXUAL SELECTION | Courtship feeding (by the male) | 0;1 | FALSE;TRUE | bin | NA |
| song.dur | Song duration | BEHAVIORAL | Song duration | NA | NA | con | seconds |
| breed.system | Breeding system | BEHAVIORAL | Which adult(s) provides the majority of care: | 1;2;3;4;5 | Pair;Female;Male;Cooperative;Occassional | cat | NA |
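
Since the metadata drives everything that follows, it can be worth confirming that the data and the metadata describe the same set of variables before plotting. The snippet below is a minimal sketch of such a check, not part of the original workflow. The objects dd.toy and meta.toy are small made-up stand-ins for dd and meta, and it assumes that species is the only column in the data that is not a trait variable.

```r
# Toy stand-ins for the `dd` and `meta` objects (illustrative values only)
dd.toy <- data.frame(species = c("sp_a", "sp_b"),
                     inc = c(10, NA),
                     dev.mode = c(1, 2))
meta.toy <- data.frame(code = c("inc", "dev.mode"),
                       type = c("con", "cat"),
                       stringsAsFactors = FALSE)

# variables in the data: every column except the species identifier
# (assumption: `species` is the only non-variable column)
data.vars <- setdiff(names(dd.toy), "species")

# variables present in the data but missing from the metadata, and vice versa
undocumented <- setdiff(data.vars, meta.toy$code)
unused <- setdiff(meta.toy$code, data.vars)

# both should be empty if data and metadata are in sync
stopifnot(length(undocumented) == 0, length(unused) == 0)
```

With the real objects, the same lines would be run on dd and meta instead of the toy versions.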


Descriptive plot axis labels

Firstly, while the coded variable names are reasonably descriptive, I want to make sure the variables are clearly specified in their plots. I also want to include units. All this makes the plots and therefore the data more understandable by both myself and my collaborators. So I can use information in the metadata to construct more informative axis labels.

For this I’ve created the function axisLabel that takes a single row of the metadata dataframe (the row containing the information for the variable I want to create the axis label for) and combines information in columns descr and units.

### FUNCTIONS ##############################################################

# function takes a dataframe consisting of a single variable metadata row. 
# Row must have a column named `descr` containing the variable description
# and a column named `units` containing the units (NA if variable is unitless)
axisLabel <- function(metadata){
  
  # select description for variable
  descr <- metadata$descr
  
  # select units for variable if applicable and place in parentheses
  units <- if(is.na(metadata$units)){NULL}else{
    paste(" (", metadata$units, ")", sep ="")}
  
  # combine description and units to create axis label
  label <- paste(descr, units, sep = "")
  
  # return label
  return(label)
    
}

You are welcome to copy and use this function. Just make sure you supply the function argument metadata with a dataframe that has a single row and the appropriate data in appropriately named columns descr and units. You can of course edit the function to change these requirements.
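
To see what the function returns, here is a small usage sketch. The function is repeated in compact form so the snippet runs on its own, and meta.toy is a hypothetical two-row metadata table whose values are taken from Table 2; one variable has units and the other does not.

```r
# compact copy of the axisLabel function, repeated so this runs on its own
axisLabel <- function(metadata){
  descr <- metadata$descr
  units <- if(is.na(metadata$units)){NULL}else{
    paste(" (", metadata$units, ")", sep = "")}
  paste(descr, units, sep = "")
}

# hypothetical two-row metadata table, values taken from Table 2
meta.toy <- data.frame(code  = c("inc", "dev.mode"),
                       descr = c("Incubation period", "Developmental mode"),
                       units = c("days", NA),
                       stringsAsFactors = FALSE)

# subset to a single row before passing it to the function
lab.inc <- axisLabel(metadata = meta.toy[meta.toy$code == "inc", ])
lab.dev <- axisLabel(metadata = meta.toy[meta.toy$code == "dev.mode", ])

print(lab.inc)  # "Incubation period (days)"
print(lab.dev)  # "Developmental mode"
```

Note that the unitless variable gets a label with no trailing parentheses, which is the reason for the is.na() check on units.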


The right plot for the right data type

I am also dealing with a variety of data types, including continuous, integer, categorical and binary. So I can use the type column in the metadata (coded con, int, cat and bin) to determine the right plot type to use for each variable.
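
One way to picture this decision is as a small lookup from the data type code to a plot type. The helper below, plotType, is a hypothetical name introduced only for illustration; it assumes the four type codes con, int, bin and cat, and it returns the name of a base R plotting function rather than drawing anything.

```r
# hypothetical helper: returns the name of the base R plotting function
# suited to a metadata data type code
plotType <- function(type){
  if(type %in% c("con", "int")){
    # continuous or integer data: distribution shown as a histogram
    "hist"
  }else if(type %in% c("bin", "cat")){
    # binary or categorical data: category frequencies shown as a barplot
    "barplot"
  }else{
    # fail loudly on a code that the metadata should not contain
    stop("unrecognised data type: ", type)
  }
}

print(plotType("con"))  # "hist"
print(plotType("bin"))  # "barplot"
```

Stopping on an unrecognised code is a design choice for this sketch: it surfaces typos in the metadata instead of silently skipping a variable.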


More informative labels for categorical/binary variables

Finally, because my categorical/binary variables are coded, I can use the information in the levels metadata column to provide more informative barplot labels.
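
As a minimal sketch of that idea, the semicolon-separated strings in the scores and levels columns can be split and used to recode the values into a labelled factor. The codes vector below is made up for illustration, and level.labels is just a name chosen here; the two strings are the dev.mode entries from Table 2.

```r
# hypothetical coded observations for dev.mode, including a missing value
codes <- c(1, 2, 2, 3, NA, 1)

# scores and levels strings as stored in the metadata (Table 2)
scores <- strsplit("1;2;3", split = ";")[[1]]
level.labels <- strsplit("Altricial;Semiprecocial;Precocial", split = ";")[[1]]

# recode to a labelled factor: each code is matched to its label by position
dev.mode <- factor(codes, levels = scores, labels = level.labels)

# frequency table with readable names; NA is dropped by default
counts <- table(dev.mode)
print(counts)
```

Passing counts to barplot() would then give bars labelled Altricial, Semiprecocial and Precocial rather than 1, 2 and 3.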

To produce a series of plots for my dataset, I write a for() loop and nest if() conditional statements in it to specify which of the two plot types (histogram or barplot) to use for each variable. I also want to calculate and include n, the number of observations available for each variable, in each plot, because it’s a basic check I always make.
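
Before plotting, n can also be tabulated for all variables at once. The following is a rough sketch on a made-up data frame; dd.toy and vars.toy are illustrative stand-ins for dd and the vector of variable names.

```r
# toy stand-in for `dd` (illustrative values only)
dd.toy <- data.frame(species = c("sp_a", "sp_b", "sp_c"),
                     inc = c(10, NA, 19.5),
                     song.dur = c(19.8, NA, NA))

# variables to summarise
vars.toy <- c("inc", "song.dur")

# n = number of non-missing observations for each variable
n.obs <- sapply(dd.toy[vars.toy], function(x) sum(!is.na(x)))
print(n.obs)
```

This gives the same number that length(na.omit(x)) produces for a single variable, just for every variable in one call.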



CODE & PLOTS

vars <- c("max.altitude", "inc", "dev.mode", "courtship.feed.m", "song.dur", "breed.system")

for(var in vars){
  
  # subset master dataset to variable data to plot. omit NAs
  x <- na.omit(dd[,var])
  
  # subset metadata to a single variable row
  var.meta <- meta[meta$code == var,]
  
  # use axisLabel function to create axis label for variable
  xlabel <- axisLabel(metadata = var.meta)
  
  
  
  # Use variable metadata to determine the most appropriate plot for the data type
  # _________________________________________________________________________
  
  ################################################ 
  ### Plotting continuous or integer variables ###
  ################################################
  
  #  if data type is continuous or integer, use histogram
  
  if(var.meta$type %in% c("con", "int")){
    
    hist(x, xlab = xlabel, main = paste("n =", length(x)), col = "gray")
    
  }
  
  
  ################################################ 
  ### Plotting categorical or binary variables ###
  ################################################
  
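  #  if data type is binary or categorical, use barplot
  if(var.meta$type %in% c("bin", "cat")){
    
    # split the strings containing scores and levels to get the codes and
    # their category labels. Skip this step if your data categories are not coded
    scores <- strsplit(x = var.meta$scores, split = ";")[[1]]
    level.labels <- strsplit(x = var.meta$levels, split = ";")[[1]]
    
    # recode to a labelled factor so that every level gets a bar, in metadata
    # order, even if it has no observations (otherwise the number of labels
    # may not match the number of bars). If you skipped the previous step,
    # use table(x) here instead
    counts <- table(factor(x, levels = scores, labels = level.labels))
    
    # plot TABLE of category frequencies
    barplot(counts, main = paste("n =", length(x)), xlab = xlabel, ylab = "Frequency")
  }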
  
}



YOUR MISSION

Here’s a link to the .Rmd file that created this .html file.