Duq HW1: Plotting & t-testing data from Cell retraction saga

The controversy

  • The Cell original paper by Y et al (2013) was controversial.
  • Other labs attempted replication and failed
  • Because of the controversy the original authors collaborated with other researchers to conduct a blinded, multi-lab replication of the original study (Cox et al 2016 PLOS)
  • 3 labs participated on the replication
  • For this assignment, we will plot and anlyze data from one of those labs.


Assignment

  • Load Yi_et_al_PLOS_betatropin_data_subset_lab_3_CSV.csv
  • make a barplot of the mean values (or dotplot)
  • make a beeswarm plot of the raw data
  • do a t.test
  • email me the code to do this


How to do it

Load the data

  • load data with read.csv()
  • Be sure how have .csv at the end
  • Note that R assumes that your 1st row contains column names
lab3.df <- read.csv(file = "Yi_et_al_PLOS_betatropin_data_subset_lab_3_CSV.csv")

Look at the data

head(lab3.df)
##   animal.ID trt slide total.beta.cell Ki67.cells percent.Ki67  X X.1 X.2
## 1    CM7721 MBP     3            1887         28         1.48 NA  NA  NA
## 2    CM7722 MBP     3             897          6         0.67 NA  NA  NA
## 3    CM7725 MBP     3            1997         11         0.55 NA  NA  NA
## 4    CM7726 MBP     3            4513         17         0.38 NA  NA  NA
## 5    CM7729 MBP     3             930          3         0.32 NA  NA  NA
## 6    CM7731 MBP     3            3535         12         0.34 NA  NA  NA
##   X.3
## 1  NA
## 2  NA
## 3  NA
## 4  NA
## 5  NA
## 6  NA

Aside: Negative indexing

Not that in my version of the file there are 3 empty columns to the left. Normally I’d probably check the Excel file to make sure there are now problems, delete these by hand in Excel and re-load the file.

I can also remove them in R like this using brackets and column indexing. “-c(7:10)” means “remove columns 7 through 10”. “c(7:10)” specifies those columns, and the negative signs says remove them (this is called “negative indexing”)

lab3.df <- lab3.df[, -c(7:10)]

See the links below for more on negative indexing:

http://stackoverflow.com/questions/7336679/in-r-what-does-a-negative-index-do?noredirect=1&lq=1

http://stackoverflow.com/questions/21209786/interpretation-of-negative-index-when-subsetting-a-data-frame



Plot the data

  • processing the data with doBy:summaryBy() is probably easiest
  • You can get a function for the standard error from plotrix::std.error()
  • barplots are easiest with barplot2()
  • plotting the means as points can be done several ways

Process the data

Load doBy package

The summaryBy command

library(doBy)

The std.error command

library(plotrix)


Calculate summary stats

  • summaryBy() by uses R’s standard formulat notation
  • This could also be done with tapply()
  • see below for how you can skip this step entirely!
Yi.summary.stats <- summaryBy(percent.Ki67 ~ trt, 
                              data = lab3.df,
                              FUN = c(mean, std.error))

Look at those summary stats

Yi.summary.stats
##       trt percent.Ki67.mean percent.Ki67.std.error
## 1     MBP             0.793              0.1566174
## 2 MBP+hBT             1.652              0.1061320



Make barplot

Load gplots

barplot2() is in gplots

library(gplots)
## 
## Attaching package: 'gplots'
## The following object is masked from 'package:plotrix':
## 
##     plotCI
## The following object is masked from 'package:stats':
## 
##     lowess

Basic barplot

If I forget how barplot2() works I can get the help file with ?barplot2

barplot2(height = Yi.summary.stats$percent.Ki67.mean,
         names.arg = Yi.summary.stats$trt,
         plot.ci = TRUE,
         ci.l = Yi.summary.stats$percent.Ki67.mean - 
           Yi.summary.stats$percent.Ki67.std.error,
         ci.u = Yi.summary.stats$percent.Ki67.mean + 
           Yi.summary.stats$percent.Ki67.std.error)



Cool trick: using the with() command

  • Its kinda annoying the number of times that we have to include “Yi.summary.stats$…”.
  • We can make life easier using the with() command
  • with() takes 2 arguements: a dataframe and a function that does something with that dataframe
  • below, I give it Yi.summary.stats, and then the barplot2 code with all the “Yi.summary.stats$” removed.
with(Yi.summary.stats, 
         barplot2(height = percent.Ki67.mean,
         names.arg = trt,
         plot.ci = TRUE,
         ci.l = percent.Ki67.mean - percent.Ki67.std.error,
         ci.u = percent.Ki67.mean + percent.Ki67.std.error))

Make point plot

  • The function plotCI is in gplots
  • Note that the syntax is different from barplot2()
  • This is kind of an ugly plot
plotCI(x = Yi.summary.stats$percent.Ki67.mean,
       ui = Yi.summary.stats$percent.Ki67.mean+
             Yi.summary.stats$percent.Ki67.std.error,
       li = Yi.summary.stats$percent.Ki67.mean-
             Yi.summary.stats$percent.Ki67.std.error)

Point plot with with()

Still ugly, but less code!

with(Yi.summary.stats, 
     plotCI(x = percent.Ki67.mean,
       ui = percent.Ki67.mean+
             percent.Ki67.std.error,
       li = percent.Ki67.mean-
             percent.Ki67.std.error))



Point plot with raw data!

We actually don’t have to use the summaryBy() command to get our means and stadard errors. gplots has a function plotmeans() that can do all of this in one step, using formula notation. It even adds some other stuff. I believe the default is for 95% CI, NOT Standard erros.

plotmeans(percent.Ki67 ~ trt, data = lab3.df)

  • The sample size labels can be turned off with the “n.label = F”" arguement
  • “connect = F” turns off the connecting line
plotmeans(percent.Ki67 ~ trt, 
          data = lab3.df,
          n.labe = F,
          connect = F)



Beeswarm plot

Load beeswarm libary

library(beeswarm)
## Warning: package 'beeswarm' was built under R version 3.3.2


Make beeswarm plot

beeswarm(percent.Ki67 ~ trt,
          data = lab3.df)

t.test

t.test(percent.Ki67 ~ trt,
          data = lab3.df)
## 
##  Welch Two Sample t-test
## 
## data:  percent.Ki67 by trt
## t = -4.5404, df = 15.826, p-value = 0.0003433
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.2604237 -0.4575763
## sample estimates:
##     mean in group MBP mean in group MBP+hBT 
##                 0.793                 1.652



References

Cox, Yi, Melton, Kushner et al. 2016. Resolving Discrepant Findings on ANGPTL8 in ??-Cell Proliferation: A Collaborative Approach to Resolving the Betatrophin Controversy. PLoSONE.

Yi P, Park JS, Melton DA (2013) Betatrophin:a hormone that controls pancreatic beta cell proliferation. Cell 153:747-758.