Duq HW1: Plotting & t-testing data from Cell retraction saga

The controversy

The Cell original paper by Y et al (2013) was controversial.
Other labs attempted replication and failed
Because of the controversy the original authors collaborated with other researchers to conduct a blinded, multi-lab replication of the original study (Cox et al 2016 PLOS)
3 labs participated on the replication
For this assignment, we will plot and anlyze data from one of those labs.

Assignment

Load Yi_et_al_PLOS_betatropin_data_subset_lab_3_CSV.csv
make a barplot of the mean values (or dotplot)
make a beeswarm plot of the raw data
do a t.test
email me the code to do this

How to do it

Load the data

load data with read.csv()
Be sure how have .csv at the end
Note that R assumes that your 1st row contains column names

lab3.df <- read.csv(file = "Yi_et_al_PLOS_betatropin_data_subset_lab_3_CSV.csv")

Look at the data

head(lab3.df)

##   animal.ID trt slide total.beta.cell Ki67.cells percent.Ki67  X X.1 X.2
## 1    CM7721 MBP     3            1887         28         1.48 NA  NA  NA
## 2    CM7722 MBP     3             897          6         0.67 NA  NA  NA
## 3    CM7725 MBP     3            1997         11         0.55 NA  NA  NA
## 4    CM7726 MBP     3            4513         17         0.38 NA  NA  NA
## 5    CM7729 MBP     3             930          3         0.32 NA  NA  NA
## 6    CM7731 MBP     3            3535         12         0.34 NA  NA  NA
##   X.3
## 1  NA
## 2  NA
## 3  NA
## 4  NA
## 5  NA
## 6  NA

Aside: Negative indexing

Not that in my version of the file there are 3 empty columns to the left. Normally I’d probably check the Excel file to make sure there are now problems, delete these by hand in Excel and re-load the file.

I can also remove them in R like this using brackets and column indexing. “-c(7:10)” means “remove columns 7 through 10”. “c(7:10)” specifies those columns, and the negative signs says remove them (this is called “negative indexing”)

lab3.df <- lab3.df[, -c(7:10)]

See the links below for more on negative indexing:

http://stackoverflow.com/questions/7336679/in-r-what-does-a-negative-index-do?noredirect=1&lq=1

http://stackoverflow.com/questions/21209786/interpretation-of-negative-index-when-subsetting-a-data-frame

Plot the data

processing the data with doBy:summaryBy() is probably easiest
You can get a function for the standard error from plotrix::std.error()
barplots are easiest with barplot2()
plotting the means as points can be done several ways

Process the data

Load doBy package

The summaryBy command

library(doBy)

The std.error command

library(plotrix)

Calculate summary stats

summaryBy() by uses R’s standard formulat notation
This could also be done with tapply()
see below for how you can skip this step entirely!

Yi.summary.stats <- summaryBy(percent.Ki67 ~ trt, 
                              data = lab3.df,
                              FUN = c(mean, std.error))

Look at those summary stats

Yi.summary.stats

##       trt percent.Ki67.mean percent.Ki67.std.error
## 1     MBP             0.793              0.1566174
## 2 MBP+hBT             1.652              0.1061320

Make barplot

Load gplots

barplot2() is in gplots

library(gplots)

## 
## Attaching package: 'gplots'

## The following object is masked from 'package:plotrix':
## 
##     plotCI

## The following object is masked from 'package:stats':
## 
##     lowess

Basic barplot

If I forget how barplot2() works I can get the help file with ?barplot2

barplot2(height = Yi.summary.stats$percent.Ki67.mean,
         names.arg = Yi.summary.stats$trt,
         plot.ci = TRUE,
         ci.l = Yi.summary.stats$percent.Ki67.mean - 
           Yi.summary.stats$percent.Ki67.std.error,
         ci.u = Yi.summary.stats$percent.Ki67.mean + 
           Yi.summary.stats$percent.Ki67.std.error)

Cool trick: using the with() command

Its kinda annoying the number of times that we have to include “Yi.summary.stats$…”.
We can make life easier using the with() command
with() takes 2 arguements: a dataframe and a function that does something with that dataframe
below, I give it Yi.summary.stats, and then the barplot2 code with all the “Yi.summary.stats$” removed.

with(Yi.summary.stats, 
         barplot2(height = percent.Ki67.mean,
         names.arg = trt,
         plot.ci = TRUE,
         ci.l = percent.Ki67.mean - percent.Ki67.std.error,
         ci.u = percent.Ki67.mean + percent.Ki67.std.error))

Make point plot

The function plotCI is in gplots
Note that the syntax is different from barplot2()
This is kind of an ugly plot

plotCI(x = Yi.summary.stats$percent.Ki67.mean,
       ui = Yi.summary.stats$percent.Ki67.mean+
             Yi.summary.stats$percent.Ki67.std.error,
       li = Yi.summary.stats$percent.Ki67.mean-
             Yi.summary.stats$percent.Ki67.std.error)

Point plot with with()

Still ugly, but less code!

with(Yi.summary.stats, 
     plotCI(x = percent.Ki67.mean,
       ui = percent.Ki67.mean+
             percent.Ki67.std.error,
       li = percent.Ki67.mean-
             percent.Ki67.std.error))

Point plot with raw data!

We actually don’t have to use the summaryBy() command to get our means and stadard errors. gplots has a function plotmeans() that can do all of this in one step, using formula notation. It even adds some other stuff. I believe the default is for 95% CI, NOT Standard erros.

plotmeans(percent.Ki67 ~ trt, data = lab3.df)

The sample size labels can be turned off with the “n.label = F”" arguement
“connect = F” turns off the connecting line

plotmeans(percent.Ki67 ~ trt, 
          data = lab3.df,
          n.labe = F,
          connect = F)

Beeswarm plot

Load beeswarm libary

library(beeswarm)

## Warning: package 'beeswarm' was built under R version 3.3.2

Make beeswarm plot

beeswarm(percent.Ki67 ~ trt,
          data = lab3.df)

t.test

t.test(percent.Ki67 ~ trt,
          data = lab3.df)

## 
##  Welch Two Sample t-test
## 
## data:  percent.Ki67 by trt
## t = -4.5404, df = 15.826, p-value = 0.0003433
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.2604237 -0.4575763
## sample estimates:
##     mean in group MBP mean in group MBP+hBT 
##                 0.793                 1.652

References

Cox, Yi, Melton, Kushner et al. 2016. Resolving Discrepant Findings on ANGPTL8 in ??-Cell Proliferation: A Collaborative Approach to Resolving the Betatrophin Controversy. PLoSONE.

Yi P, Park JS, Melton DA (2013) Betatrophin:a hormone that controls pancreatic beta cell proliferation. Cell 153:747-758.

Duq HW1: Plotting & t-testing data from Cell retraction saga

brouwern@gmail.com

January 25, 2017

Duq HW1: Plotting & t-testing data from Cell retraction saga

The controversy

Assignment

How to do it

Load the data

Look at the data

Aside: Negative indexing

Plot the data

Process the data

Load doBy package

Calculate summary stats

Make barplot

Load gplots

Basic barplot

Cool trick: using the with() command

Make point plot

Point plot with with()

Point plot with raw data!

Beeswarm plot

Load beeswarm libary

Make beeswarm plot

t.test

References