This is my first attempt at a tutorial, I don’t know if anyone will find it useful, but perhaps someone will stumble across it and find it interesting.

I’ve always been a little overwhelmed by packages that plot pedigrees, they always seem to have hundreds of lines of code and it always seemed like a very involved process to be able to go from a huge pedigree to such a complicated diagram.

However, as I’ve become more familiar with R the the mystery behind plotting a pedigree has slowly went away. The truth is that plotting a pedigree is easy, really easy.

I’d argue that in terms of quickly visualising your data, this will be as quick as loading up a package and getting your pedigree into the correct format.

Required data and additional files can be found at: github.com/simplydch/Pedigree-Plotting-in-R

Please feel free to get in touch with any questions or comments.

Plotting a pedigree

To demonstrated I have simulated a small pedigree consisting of 136 individuals in 10 cohorts:

load("short_ped.RData")
head(short_ped)
##    ID Sex Gen SireID MumID Status Died Cohort Arrived
## 1 ID1   1   1   <NA>  <NA>      d    9   2000      NA
## 2 ID2   1   1   <NA>  <NA>      d    2   2000      NA
## 3 ID3   1   1   <NA>  <NA>      d    3   2000      NA
## 4 ID4   1   1   <NA>  <NA>      d    4   2000      NA
## 5 ID5   1   1   <NA>  <NA>      d    6   2000      NA
## 6 ID6   1   1   <NA>  <NA>      d    3   2000      NA
dim(short_ped)
## [1] 136   9

In order to plot the pedigree the first step is to set up the coordinates of the points for each individual.

## We need to take the Cohorts from the pedigree to provide our y-axis values

y_coor <- short_ped[, "Cohort"]

## We then need x-coordinates that spread each set of cohorts out evenly across the x-axis

## First we can use table to find how many individuals are in each cohort
## For each of these values we generate an evenly spaced sequence between 1 and 10
## The output of sapply is a list, so do.call is used to join the values
## into a single vector

y_counts <- unname(table(y_coor))
x_coor <- do.call(c, sapply(y_counts, function(x) seq(1, 10, length = x)))

# And we can have our first attempt at plotting

plot(x_coor, y_coor)

So we have every individual represented as a dot in the line that relates to their cohort.

Now we need to join those dots, using the under-appreciated, but incredibly useful match function!

This allows us to find the row reference of the sire that relates to each id. If we re-arrange the order of the plotted x and y values using these references that’s all the information needed to plot all the sire links using the segments function. The segments function draws lines between two sets of coordinates

## First we need to all pull the ids, sires and dams from the pedigree.

ids <- short_ped[, "ID"]
sires <- short_ped[, "SireID"]
dams <- short_ped[, "MumID"]

# And then get row references, and then coordinates for the sires.

sire_ref <- match(sires, ids)

x_sire <- x_coor[sire_ref]
y_sire <- y_coor[sire_ref]

# Segments creates lines between the points in the provided x and y - values

plot(x_coor, y_coor)
segments(x_coor, y_coor, x_sire, y_sire)

This looks good, but obviously isn’t quite right. Why are there individuals at the bottom of the plot that have no sire links? Well, out plot is currently upside down. Those must be the female founders!

# Lets rearrange the plot reversing the y-values.

plot(x_coor, y_coor, ylim = c(max(y_coor), min(y_coor)))

# And add those sire links again, this time making them orange.

segments(x_coor, y_coor, x_sire, y_sire, col = "orange")

# And we can then also add in the dams using blue lines

dam_ref <- match(dams, ids)

x_dam <- x_coor[dam_ref]
y_dam <- y_coor[dam_ref]

segments(x_coor, y_coor, x_dam, y_dam, col = "blue")

And that’s it. We’ve plotted the pedigree, using different coloured lines to represent sires and dams.

We probably still want to change a few settings to make the plot neater.

par(mar = c(0.5, 4, 0.5, 0.5))
plot(x_coor, y_coor, ylim = c(max(y_coor), min(y_coor)), col = rgb(0, 0, 0, 0.5), pch = ".", cex = 1, frame.plot = FALSE, ylab = "", xlab = "", axes = FALSE)

## We can add in the Cohort information
axis(side = 2, labels = TRUE, tick = FALSE, las = 2, at = unique(y_coor))

segments(x_coor, y_coor, x_sire, y_sire, col = "orange")
segments(x_coor, y_coor, x_dam, y_dam, col = "blue")

It really is that quick and easy. If you wish to only plot some links, for example the descendants of one individual, things become slightly more challenging. However, you have all your links already, you just need to figure out which ones to ‘turn on’.

To demonstrate this next stage we can plot all the offspring of individual ID4.

# Lets start with the background pedigree, we can specify the colours as rgb
# and make them grey and transparent.

plot(x_coor, y_coor,
  ylim = c(max(y_coor), min(y_coor)), col = rgb(0, 0, 0, 0.5), pch = ".", cex = 1,
  frame.plot = FALSE, axes = F, ylab = "", xlab = "")

axis(side = 2, labels = TRUE, tick = FALSE, las = 2, at = unique(y_coor))
segments(x_coor, y_coor, x_sire, y_sire, col = rgb(0.9, 0.9, 0.9, 0.6))
segments(x_coor, y_coor, x_dam, y_dam, col = rgb(0.9, 0.9, 0.9, 0.6))



# Lets plot the focal id (in red) using points

id_plot <- which(ids %in% "ID4")

points(x_coor[id_plot], y_coor[id_plot], col = "red")

# And also plot the points representing his offspring, this time in black.

ID_wSire <- which(sire_ref %in% id_plot)

points(x_coor[ID_wSire], y_coor[ID_wSire])

# So to plot the links, lets just turn off all the links that aren't connected to individual ID4. Since he is a sire we only have to plot sire links

x_sire2 <- x_sire
y_sire2 <- y_sire
x_sire2[sire_ref != id_plot] <- NA
y_sire2[sire_ref != id_plot] <- NA
segments(x_coor, y_coor, x_sire2, y_sire2, col = "orange")

The next stage is a bit harder still. How do we plot all descendants? Unsurprisingly it requires a few more steps and a little more thinking, but exactly same logic as before.

## Lets start again with a plain background

plot(x_coor, y_coor,
  ylim = c(max(y_coor), min(y_coor)), col = rgb(0, 0, 0, 0.5), pch = ".", cex = 1,
  frame.plot = FALSE, axes = F, ylab = "", xlab = ""
)
axis(side = 2, labels = TRUE, tick = FALSE, las = 2, at = unique(y_coor))
segments(x_coor, y_coor, x_sire, y_sire, col = rgb(0.9, 0.9, 0.9, 0.6))
segments(x_coor, y_coor, x_dam, y_dam, col = rgb(0.9, 0.9, 0.9, 0.6))


# The main hurdle is identifying all the descendants in the first place, so we'll use
# a simple loop

#  We set up vectors to hold the parents each iteration and also all the descendants
# our focal individual is included in both.  At each iteration of the loop we find all the
# individuals that are offspring of the individuals in the list of parents, and continue until there are no further offspring.

parents <- id_plot
desc_ref <- c(id_plot)

repeat{
  offspring <- c(ids[sire_ref %in% parents], ids[dam_ref %in% parents])
  off_ref <- which(ids %in% offspring)
  if (length(offspring) == 0) {
    break
  }
  desc_ref <- c(desc_ref, off_ref)
  parents <- off_ref
}

#
# We have now got all the relevant individual references
desc_ref
##  [1]   4  32  35  42  45  48  54  56  58  67  71  72  81  52  75  78  79  87  88
## [20]  89  90  91  92  93  95  97 101 104  94  96  98  99 100 102 103 106 107 109
## [39] 110 111 112 115 117 118 120 122 123 126 128 130 131 132 133 136 101 108 109
## [58] 110 113 114 116 119 121 124 125 126 127 129 130 131 133 134 136 110 118
# So again we can highlight the focal individual and all it's descendants

points(x_coor[id_plot], y_coor[id_plot], col = "red")
points(x_coor[desc_ref], y_coor[desc_ref])


# This time we are interested in plotting both sire and dam links, but only to other
# individuals in the list of descendants

x_sire2 <- x_sire
y_sire2 <- y_sire
x_sire2[!(sire_ref %in% desc_ref & ids %in% ids[desc_ref])] <- NA
y_sire2[!(sire_ref %in% desc_ref & ids %in% ids[desc_ref])] <- NA
segments(x_coor, y_coor, x_sire2, y_sire2, col = "orange")


x_dam2 <- x_dam
y_dam2 <- y_dam
x_dam2[!(dam_ref %in% desc_ref & ids %in% ids[desc_ref])] <- NA
y_dam2[!(dam_ref %in% desc_ref & ids %in% ids[desc_ref])] <- NA
segments(x_coor, y_coor, x_dam2, y_dam2, col = "blue")

In conclusion, I maybe exaggerated a little about how simple it is, but actually plotting a whole pedigree isn’t at all complicated. The key challenge is when you are only interested in plotting some links and sub-setting the lines to just the ones that you require.

Once you understand the basic principle you can easily plot any set of relationships you like, and make a professional looking pedigree plot.

plot(x_coor, y_coor,
  ylim = c(max(y_coor), min(y_coor)), col = rgb(0, 0, 0, 0.5), pch = ".", cex = 1,
  frame.plot = FALSE, axes = F, ylab = "", xlab = ""
)
axis(side = 2, labels = TRUE, tick = FALSE, las = 2, at = unique(y_coor), cex = 0.8)
segments(x_coor, y_coor, x_sire, y_sire, col = rgb(0.9, 0.9, 0.9, 0.6))
segments(x_coor, y_coor, x_dam, y_dam, col = rgb(0.9, 0.9, 0.9, 0.6))


x_sire2 <- x_sire
y_sire2 <- y_sire
x_sire2[!(sire_ref %in% desc_ref & ids %in% ids[desc_ref])] <- NA
y_sire2[!(sire_ref %in% desc_ref & ids %in% ids[desc_ref])] <- NA
segments(x_coor, y_coor, x_sire2, y_sire2, col = "#f1a340")


x_dam2 <- x_dam
y_dam2 <- y_dam
x_dam2[!(dam_ref %in% desc_ref & ids %in% ids[desc_ref])] <- NA
y_dam2[!(dam_ref %in% desc_ref & ids %in% ids[desc_ref])] <- NA
segments(x_coor, y_coor, x_dam2, y_dam2, col = "#998ec3")