Kite Graphs in R

My daughter needs to make kite graphs for year 12 biology at school, and no one seems to know how to make them on a computer, so here is a step-by-step walkthrough of using R to make them, mostly because I think this is an easier way than some of the other tutorials out there. It also shows how I think through creating a graph I have not seen before, and explains why I organise the data in various ways as I go. Everything uses the base graphics options in R.

What is a Kite Graph:

Kite graphs are a way of showing how much of various things there are as you go along some kind of measurement. Normally this is a measurement of distances (so measuring populations of species along a survey line) or a measurement of time (measuring species through time).

An Example Kite Graph

Looking at a kite graph, I would describe the following characteristics as being important for making the graph:

The time or distance is on the x axis.
Each species sits at a different height on the y axis.
The abundance of the species is shown by how far its shape spreads above and below that height.

Thinking about the last two points, it is going to be relatively easy to think of each species as being at one step intervals with the abundance being a difference of up to +/- 0.5 from the level of the step.

There are actually two ways the data for a kite graph can be recorded: you could be getting raw counts of species, or you could be making a 0 to 5 range estimate of abundance, with 5 being abundant (this is often used for cases of ground cover). Either one can be scaled into a +/- 0.5 range, but in slightly different ways. A person used to R could scale the columns differently within one data set, but for this example I am going to assume that all the columns are abundance measures or all the columns are raw counts, rather than a mix.

Prerequisites:

You have R installed

R can be obtained from http://cran.r-project.org

You have RStudio installed

RStudio can be obtained from http://www.rstudio.com/products/rstudio/download/

You have a spreadsheet program for initial data collection

I don’t care which spreadsheet program, anything that can save as a .csv file is good with me.

Organising our data

Use a spreadsheet program to organise the data so that the measure (the time or distance values that are ultimately going to be on the x axis) is in the first column and the things you are measuring are in the subsequent columns. It is really important that you have a nice tidy arrangement of data with 1 row of headings at the top. Keep the headings fairly short, as they will appear on the graph as axis labels (less than 12 characters unless you know enough about R to play with graph settings). Varying from this plan will mean you will have to vary the R code. This is really important: no added text, just one row of headings and the data.

tidy data

It is going to be really important that the first column has the distance (time/space) measurement numbers. For this simple version I am assuming the distance measures are plain numbers, not for example times, just to avoid time conversion issues if you are following along with your own data (so use numbers, or learn how to handle dates and times in R).

Also, note that entries with no observed species are 0. Leaving a cell blank would mean that no reading was made, and this would complicate our graphs, so for this demonstration I am assuming that each species was looked for at each stage.

Now, I am saving the data as a csv file (in most spreadsheet programs you use a Save As… command) called survey.csv into a folder called kitegraph. From here on, if you used different names for the csv file or the folder, you will need to change the computer code to match.

csv file in folder

Starting R

We open up RStudio. In the menus at the top, there is a Session menu. From the Session menu choose Set Working Directory, then use the Choose Directory... command to set R to use the kitegraph folder (or whatever folder you chose).

Settings Menu
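
If you would rather type a command than use the menu, the same thing can be sketched in the console with setwd(). The folder location below is an assumption (a kitegraph folder inside your home directory), so change the path to match wherever you actually saved the file.

```r
# Assumed location: a folder called kitegraph in your home directory.
# Change this path to wherever you saved survey.csv.
folder <- file.path(path.expand("~"), "kitegraph")

if (dir.exists(folder)) {
  setwd(folder)                       # same effect as the Session menu command
} else {
  message("Folder not found: ", folder)
}

getwd()        # shows the folder R is currently working in
list.files()   # survey.csv should appear in this list
```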

This makes the folder the base of R for building this graph, so it is easy to read in the data. Now, with a default arrangement of RStudio we have a console area subwindow on the left of the RStudio window, and we are going to be copying and pasting instructions in and pressing Return or Enter to finish them (people used to R will probably use a script, but I am going step by step for someone who hasn’t used it).
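
If you do not have survey data of your own yet, one option is to create a small practice file from the console. This is only a sketch: the species names and counts are invented for illustration and are not the data shown in the screenshots. The file is only written if there is not already a survey.csv in the folder, so it will not overwrite real data.

```r
# Hypothetical survey: distance along the line, then raw counts per species.
# All names and numbers here are made up for illustration.
example <- data.frame(
  Distance   = c(0, 5, 10, 15, 20),
  Limpet     = c(0, 3, 8, 4, 0),
  Barnacle   = c(2, 6, 6, 1, 0),
  Periwinkle = c(0, 0, 1, 5, 7)
)

# Only create the file if one does not exist already.
# row.names = FALSE stops R adding an extra unnamed first column.
if (!file.exists("survey.csv")) {
  write.csv(example, "survey.csv", row.names = FALSE)
}
```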

We read in the data

survey <- read.csv("survey.csv", check.names = FALSE)

This uses the read.csv command to read in the survey.csv file while telling it not to worry about if the names are technically correct (as we are not using them for anything complex). It stores the contents of the csv file in a variable called survey.
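
To see what check.names = FALSE changes, and what happens to a blank cell, here is a small sketch that reads made-up csv text directly instead of a file. The column names and values are invented for the example.

```r
# Made-up csv text standing in for a file; note the space in "Sea lettuce"
# and the empty cell in the last row.
csv_text <- "Distance,Sea lettuce,Limpet
0,2,0
5,4,3
10,,1"

kept    <- read.csv(text = csv_text, check.names = FALSE)
changed <- read.csv(text = csv_text)   # check.names = TRUE is the default

names(kept)      # "Distance" "Sea lettuce" "Limpet"
names(changed)   # "Distance" "Sea.lettuce" "Limpet"

# A blank cell is read as NA (missing), not as 0
kept[3, "Sea lettuce"]   # NA
anyNA(kept)              # TRUE
```

Running anyNA(survey) on your own data is a quick way to check that no cells were left blank before going on.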

We set whether the data is raw numbers or abundance estimates

If it is raw numbers, put in

graphType <- "raw"

If it is abundance estimates, put in

graphType <- "abundance"

This creates a variable called graphType that stores whether it is a raw graph or an abundance one, so we can use that information later. In fact, let’s use it now: here are several lines of code to put in to rescale the data to a range of 0 to 0.5.

We selectively rescale the data

if (graphType == "raw") {
  survey[, 2:ncol(survey)] <- (survey[, 2:ncol(survey)] / max(survey[, 2:ncol(survey)])) / 2
}

If, and only if, it is raw data, this divides the data (all but the first column) by the biggest entry anywhere in those columns (making everything between 0 and 1), then divides that result by 2 (making it all between 0 and 0.5).

if (graphType == "abundance") {
  survey[, 2:ncol(survey)] <- survey[, 2:ncol(survey)] / 10
}

If, and only if, it is abundance data, this divides the data (which is between 0 and 5) by 10 (making it between 0 and 0.5).
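
As a worked check of the two kinds of rescaling, here is a sketch using tiny made-up tables (the species names and numbers are invented). Note that raw counts are divided by the single largest count in the whole table, so the widths of different species stay comparable with each other.

```r
# Made-up raw counts; the largest single count anywhere is 8
raw <- data.frame(Distance = c(0, 5, 10),
                  Limpet   = c(0, 4, 8),
                  Barnacle = c(2, 6, 1))
raw[, 2:ncol(raw)] <- (raw[, 2:ncol(raw)] / max(raw[, 2:ncol(raw)])) / 2
raw$Limpet     # 0.00 0.25 0.50
raw$Barnacle   # 0.1250 0.3750 0.0625

# Made-up abundance scores on the 0 to 5 scale
abund <- data.frame(Distance = c(0, 5, 10),
                    Moss     = c(0, 3, 5))
abund[, 2:ncol(abund)] <- abund[, 2:ncol(abund)] / 10
abund$Moss     # 0.0 0.3 0.5
```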

Set up the data just right

For making the graph we are basically going to be making a polygon by going along from left to right above the step level, then coming back from right to left along the bottom. To do this, we:

make sure the data is in the right order

survey <- survey[order(survey[,1]),]

This just makes sure all the rows of the data are in numeric order based on the first column (the distance)

make a full copy of the data

survey2 <- survey

Because we need to go there and back, we are making a second copy.

reverse the order in the second copy

survey2 <- survey2[order(survey2[,1], decreasing = TRUE),]

Because we need to go back again, we are reversing the order.

make the values negative

survey2[,2:ncol(survey2)] <- survey2[,2:ncol(survey2)] * -1

We want to come back below the step level at the equivalent height, so we make our second set of data negative.

stick the two sets of data together to make one

survey <- rbind(survey,survey2)
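
To see what the there-and-back arrangement looks like, here is a sketch of the same steps on a tiny made-up table with one species (values already rescaled; the names and numbers are invented).

```r
# Made-up, already rescaled data for one species
toy <- data.frame(Distance = c(0, 5, 10),
                  Limpet   = c(0.1, 0.5, 0.2))

toy  <- toy[order(toy[, 1]), ]                        # left to right
back <- toy[order(toy[, 1], decreasing = TRUE), ]     # right to left
back[, 2:ncol(back)] <- back[, 2:ncol(back)] * -1     # mirrored below the line
outline <- rbind(toy, back)

outline$Distance   # 0  5 10 10  5  0
outline$Limpet     # 0.1  0.5  0.2 -0.2 -0.5 -0.1
```

Reading down the two columns together gives the corner points of the kite: out along the top from left to right, then back along the bottom from right to left.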

Make a graph

Because we are making a blank graph and building it up with the kites, we will start with working out the dimensions

leftedge <- min(survey[,1])
rightedge <- max(survey[,1])
bottomedge <- 0
topedge <- ncol(survey)

We will also store the old margins and set new ones, as we are tinkering with the amount of space on the left.

oldMargins <- par("mar")
par(mar = c(5.1, 7, 4.1, 2.1))

Make the initial graph with no contents or y axis (this will appear in the Plots tab in the lower right of RStudio).

plot(c(leftedge, rightedge), c(bottomedge, topedge), type = "n", xlab = names(survey)[1], frame.plot = FALSE, yaxt = "n", ylab = "")
axis(2, labels = names(survey)[2:ncol(survey)], at = 1:(ncol(survey) - 1), las = 2, lty = 0)

Let’s loop through each column, making the kite plot for each by drawing a connect-the-points polygon.

xValues <- survey[, 1]
for (i in 2:ncol(survey)) {
  yValues <- i + survey[, i] - 1
  polygon(xValues, yValues, col = i)
}
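
The i + survey[, i] - 1 in the loop is what lifts each species onto its own step: column 2 is drawn around y = 1, column 3 around y = 2, and so on. Here is a sketch that prints those ranges for a made-up, already mirrored table (names and numbers invented) without drawing anything.

```r
# Made-up outline data for two species, already rescaled and mirrored
outline <- data.frame(Distance = c(0, 5, 10, 10, 5, 0),
                      Limpet   = c(0.1, 0.5, 0.2, -0.2, -0.5, -0.1),
                      Barnacle = c(0.3, 0.0, 0.4, -0.4,  0.0, -0.3))

for (i in 2:ncol(outline)) {
  yValues <- i + outline[, i] - 1    # step level (i - 1) plus or minus the width
  cat(names(outline)[i], "is drawn around y =", i - 1,
      "and spans", min(yValues), "to", max(yValues), "\n")
}
```

Because every value was rescaled to at most 0.5, no kite can spread more than half a step above or below its own level, so neighbouring kites never overlap.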

This would look good with a main title; feel free to customise the text in the double quotes.

title(main="Example Kite Graph")

Finally, because we played with the margin settings, we will put them back to what they were before (or we could restart RStudio).

par(mar=oldMargins)
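
For anyone who wants to reuse all of this, the steps can be bundled into one function. This is only a sketch under my own assumptions: kite_graph() is a name I made up (it is not a built-in R function), and the toy data is invented. It uses on.exit() so the margins are put back automatically, even if something goes wrong part way through.

```r
# Sketch only: kite_graph() is a hypothetical helper, not a built-in R function.
kite_graph <- function(survey, graphType = c("raw", "abundance"), main = "") {
  graphType <- match.arg(graphType)
  species <- 2:ncol(survey)

  # Rescale every species column to the range 0 to 0.5
  if (graphType == "raw") {
    survey[, species] <- (survey[, species] / max(survey[, species])) / 2
  } else {
    survey[, species] <- survey[, species] / 10
  }

  # Go out along the top, then back along the mirrored bottom
  survey <- survey[order(survey[, 1]), ]
  back <- survey[order(survey[, 1], decreasing = TRUE), ]
  back[, species] <- back[, species] * -1
  outline <- rbind(survey, back)

  # Widen the left margin; on.exit() restores the old margins afterwards
  oldPar <- par(mar = c(5.1, 7, 4.1, 2.1))
  on.exit(par(oldPar))

  # Blank graph, species labels on the y axis, then one polygon per species
  plot(range(outline[, 1]), c(0, ncol(outline)), type = "n",
       xlab = names(outline)[1], ylab = "", yaxt = "n",
       frame.plot = FALSE, main = main)
  axis(2, labels = names(outline)[species], at = species - 1, las = 2, lty = 0)
  for (i in species) {
    polygon(outline[, 1], i + outline[, i] - 1, col = i)
  }
  invisible(outline)   # hand back the polygon points in case they are useful
}

# Made-up data for trying the function out
toy <- data.frame(Distance = c(0, 5, 10),
                  Limpet   = c(0, 4, 8),
                  Barnacle = c(2, 6, 1))
```

Typing kite_graph(toy, "raw", main = "Example Kite Graph") in the console should then draw a two-species kite graph in the Plots tab.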

So that is building up a new kind of graph using the base graphics system in R (and how to make a kite graph).