Introduction

The data come from this genomics paper written by Alice Popejoy and Stephanie Fullerton. The article uses data collected in genome-wide association studies (GWAS) to show the bias in who participates in them. The data were gathered from sample descriptions in GWAS catalogs that are publicly accessible and validated before publication. The 2016 analysis repeated the original 2009 study to measure what progress, if any, had been made and to draw attention to a prevalent issue in genomic research.

Create data

Using the figures given in the article, create vectors that hold the percentage of participants in each group for each year, plus a vector of labels. Since both years share the same labels, only three vectors are necessary.

vector09_data <- c(96, 3, 1)  # Asian share is 3, matching specific09 further down
vector16_data <- c(81, 14, 5)


data_labels <- c("European\n",
                 "Asian\n",
                 "Other non-European")
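Because the vectors hold percentages, the labels can also carry each group's share. The following is a minimal sketch rather than part of the original walkthrough: plain_labels, share16, and pct_labels are hypothetical names, and the values are copied from the 2016 vector above.

```r
# Hypothetical names; values copied from the 2016 vector in the text
values16     <- c(81, 14, 5)
plain_labels <- c("European", "Asian", "Other non-European")

# Convert to whole-number shares of the total, then append to each label
share16    <- round(100 * values16 / sum(values16))
pct_labels <- paste0(plain_labels, " (", share16, "%)")

print(pct_labels)
```

Passing pct_labels as the second argument of pie() would then print the share next to each slice.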

Pie graphs

The built-in R function pie() is used to build pie charts. The following arguments can be used to customize the pie charts:

  1. main – gives a title to the chart

  2. init.angle – specifies the starting angle in degrees for the slices

  3. radius – the pie is drawn in a square box whose sides run from -1 to 1, so the radius should stay in that range; long labels usually call for a smaller radius

  4. col – a vector of colors to be used for filling the slices; by default a set of pastel colors is used
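To see what init.angle does, it can help to work out the angles by hand. The sketch below assumes the default counterclockwise drawing direction (clockwise = FALSE) and that each slice sweeps 360 degrees times its share of the total. The variable names are hypothetical.

```r
# 2016 values from the text and the init.angle used in the charts below
x          <- c(81, 14, 5)
init_angle <- -82

# Each slice sweeps 360 degrees times its share of the total
sweep_deg <- 360 * x / sum(x)

# Slices are laid end to end, starting at init_angle
end_deg   <- init_angle + cumsum(sweep_deg)
start_deg <- c(init_angle, head(end_deg, -1))

angles <- data.frame(start_deg, end_deg, sweep_deg)
print(angles)
```

Under these assumptions the first slice starts at -82 degrees and the last one ends a full turn later, at 278 degrees.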

# set up par(): one row, two columns, custom margins
par(mfrow = c(1, 2), mar = c(2, 3, 1, 5))

# pie graph 1 - 2009
# add main, init.angle, radius, and col
pie(vector09_data, data_labels, main = "2009 Diversity in GWAS Studies",
    init.angle = -82, radius = 1, col = c(1,2,3))

# pie graph 2 - 2016
# add main, init.angle, radius, and col
pie(vector16_data, data_labels, main = "2016 Diversity in GWAS Studies",
    init.angle = -82, radius = 1, col = c(1,2,3))

The par() function is used to set graphical parameters. It is best practice to restore the previous settings after changing them, so that later plots are not affected.

# par() returns the previous values of the parameters it changes,
# so op stores them for restoring later
op <- par(mfrow = c(1, 1), # 1 x 1 pictures on one plot
          pty = "s")       # square plotting region,
                           # independent of device size
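One way to make the reset automatic is to change the parameters inside a function and restore them with on.exit(). This is only a sketch: draw_side_by_side is a hypothetical helper, the values are toy data, and pdf(NULL) is used here just to open an off-screen device that writes no file.

```r
# Hypothetical helper: draws two pies side by side, then restores par()
draw_side_by_side <- function(a, b) {
  old <- par(mfrow = c(1, 2))  # par() returns the previous values
  on.exit(par(old))            # restore them when the function exits
  pie(a)
  pie(b)
  invisible(NULL)
}

pdf(NULL)                      # off-screen device, no file is written
before <- par("mfrow")
draw_side_by_side(c(3, 2, 1), c(5, 4, 1))
after <- par("mfrow")
invisible(dev.off())

print(identical(before, after))
```

Because the restore is registered with on.exit(), it runs even if one of the pie() calls stops with an error.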

Explore how the arguments change the pie charts:

par(op)  # restore the settings saved in op

pie(vector09_data, data_labels,
    main = "2009",
    init.angle = 36,
    radius = -1,
    col = c(5, 6, 7))

pie(vector09_data, data_labels,
    main = "2009.1",
    init.angle = -67,
    radius = -1,
    col = c(0, 8, 1))

pie(vector16_data, data_labels,
    main = "2016",
    init.angle = -29,
    radius = -0.8,
    col = c(8, 9, 10))

pie(vector16_data, data_labels,
    main = "2016.1",
    init.angle = -42,
    radius = -1,
    col = c(3, 6, 7))

Bar graphs

Bar graphs can be made with the barplot() function. The main arguments for bar graphs are as follows:

  1. height – the data to plot, which can be a vector or a matrix

  2. width – the width of the bars being drawn, default is 1; a single value has no visible effect unless xlim is also set

  3. col – a vector of colors for the bars

  4. legend.text – labels for the legend, useful if the data passed to barplot() is a matrix

  5. axes – TRUE or FALSE, whether the axis should be drawn
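The difference between vector and matrix input for height is easiest to see from what barplot() returns, which is the midpoint of each bar (or stack) it drew. The sketch below uses toy values and hypothetical variable names, and draws to an off-screen device opened with pdf(NULL).

```r
pdf(NULL)                      # off-screen device, no file is written
vals <- c(14, 3, 1)            # toy values

# Vector input: one separate bar per value
mid_vec <- barplot(vals)

# One-column matrix: the values are stacked into a single bar
mid_stack <- barplot(matrix(vals))

# Same matrix with beside = TRUE: the bars are drawn side by side again
mid_beside <- barplot(matrix(vals), beside = TRUE)

invisible(dev.off())

print(c(vector = length(mid_vec),
        stacked = length(mid_stack),
        beside = length(mid_beside)))
```

This is why the matrix versions in this section produce a single stacked bar with a legend, while the vector version produces one bar per group.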

# data
dat2016 <- c(14, 3, 1, 0.54, 0.28, 0.08, 0.05)
dat2016_rev <- rev(dat2016)
# one-column matrix, so barplot() stacks the values into a single bar
barplotdata2016 <- matrix(dat2016_rev)

# labels
labels_x <- rev(c("Asian", "African", "Mixed", "Hispanic &\nLatin American",
                  "Pacific Islander", "Arab & Middle East", "Native peoples"))

par(mfrow = c(1,1))

barplot(barplotdata2016,
        width = 0.01,
        xlim = c(0, 0.1),
        axes = FALSE,
        col = 1:7,
        legend.text = labels_x)

specific09 <- c(3, 0.57, 0.15, 0.06, 0.06)
bpspec09 <- matrix(specific09)

specific_labels <- c("Asian", "African", "Hispanic &\nLatin American",
                     "Pacific Islander", "Native peoples")
par(mfrow = c(2, 2))

# vector input: one bar per group
barplot(specific09,
        width = 0.01,
        xlim = c(0, 0.1),
        col = c(2, 4, 5, 6, 7),
        main = "2009 Non-European Participants")

# one-column matrix input: a single stacked bar with a legend
barplot(bpspec09,
        width = 0.01,
        xlim = c(0, 0.1),
        col = c(2, 4, 5, 6, 7),
        legend.text = specific_labels,
        main = "2009 Non-European Participants")
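When the data is passed as a vector, the bars carry no labels unless names.arg is supplied. A possible variation is sketched below. The shortened labels and the axis title are assumptions made for illustration, and the values are copied from specific09 above.

```r
pdf(NULL)                      # off-screen device, no file is written

vals09       <- c(3, 0.57, 0.15, 0.06, 0.06)   # copied from specific09
short_labels <- c("Asian", "African", "Hispanic",
                  "Pacific Isl.", "Native")     # hypothetical abbreviations

# names.arg puts a label under each bar; las = 2 turns the labels
# perpendicular to the axis so they do not overlap
mids <- barplot(vals09,
                names.arg = short_labels,
                las = 2,
                cex.names = 0.8,
                col = c(2, 4, 5, 6, 7),
                ylab = "Participants (%)",      # assumed unit
                main = "2009 Non-European Participants")

# Print each value just above its bar, using the returned midpoints
text(mids, vals09, labels = vals09, pos = 3, xpd = TRUE)

invisible(dev.off())
```

This trades the legend for direct labels, which can be easier to read when the bars are as uneven as they are here.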