1 Introduction

The data we will play with are from this study:

Rationale: Asthma is a chronic inflammatory airway disease. The most common medications used for its treatment are β2-agonists and glucocorticosteroids, and one of the primary tissues that these drugs target in the treatment of asthma is the airway smooth muscle. We used RNA-Seq to characterize the human airway smooth muscle (HASM) transcriptome at baseline and under three asthma treatment conditions.

Methods: The Illumina TruSeq assay was used to prepare 75bp paired-end libraries for HASM cells from four white male donors under four treatment conditions: 1) no treatment; 2) treatment with a β2-agonist (i.e. Albuterol, 1μM for 18h); 3) treatment with a glucocorticosteroid (i.e. Dexamethasone (Dex), 1μM for 18h); 4) simultaneous treatment with a β2-agonist and glucocorticoid, and the libraries were sequenced with an Illumina Hi-Seq 2000 instrument.

2 Preparing data

2.1 Install the SRA Toolkit

To get data out of the Sequence Read Archive (SRA), NCBI provides the SRA Toolkit. Here I install the 64-bit Ubuntu build (version 2.6.3) so that download and FASTQ conversion can be done in a single step.

download.file('http://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/2.6.3/sratoolkit.2.6.3-ubuntu64.tar.gz',
              'sratoolkit.2.6.3-ubuntu64.tar.gz')
untar('sratoolkit.2.6.3-ubuntu64.tar.gz',compressed=TRUE)

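We will put all the FASTQ files in a single directory, fastq. Paired-end reads will be saved with filenames derived from the original SRA run accessions, one file per mate. We also add the toolkit's bin directory to the PATH so that its executables can be called by name.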

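dir.create('fastq')
# use an absolute path so the tools are still found if the working directory changes
Sys.setenv('PATH'=paste(Sys.getenv("PATH"),normalizePath('sratoolkit.2.6.3-ubuntu64/bin'),sep=":"))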
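Before moving on, it can help to confirm that the PATH change took effect. The following is a minimal sketch using only base R; `tool_available` is a hypothetical helper name introduced here for illustration and is not part of the SRA Toolkit or the original workflow.

```r
# Hypothetical helper (an assumption, not from the original workflow):
# report whether a command-line tool can be found on the current PATH.
tool_available <- function(tool) {
  # Sys.which() returns "" when the executable cannot be located
  unname(nzchar(Sys.which(tool)))
}

if (!tool_available('fastq-dump')) {
  message('fastq-dump not found; check the PATH setting')
}
```

If the message appears, the most likely causes are that the archive was unpacked somewhere else or that the working directory changed before the PATH was set.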

2.2 Find SRA accessions using `SRAdb` package

To find the SRA run accessions associated with the study (SRP033351), we use the SRAdb package, which queries a local SQLite copy of the SRA metadata.

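# install SRAdb if not already present
# (biocLite() was replaced by BiocManager::install() as of Bioconductor 3.8)
if(!require('SRAdb')) {
  if(!requireNamespace('BiocManager', quietly = TRUE)) install.packages('BiocManager')
  BiocManager::install('SRAdb')
  library(SRAdb)
}
sqlf = 'SRAmetadb.sqlite'
# download the database if it is not already present
if(!file.exists(sqlf)) {
  getSRAdbFile()
}
# open a connection to the local SQLite metadata database
conn = dbConnect(SQLite(),sqlf)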

srp    = 'SRP033351'
# map the study accession to its submission, sample, experiment, and run accessions
dtable = sraConvert(srp,out_type = c("sra", "submission", "study", "sample", "experiment", "run"),sra_con = conn)
knitr::kable(dtable)

study	submission	sample	experiment	run
SRP033351	SRA114259	SRS508581	SRX384360	SRR1039523
SRP033351	SRA114259	SRS508573	SRX384352	SRR1039515
SRP033351	SRA114259	SRS508570	SRX384347	SRR1039510
SRP033351	SRA114259	SRS508572	SRX384350	SRR1039513
SRP033351	SRA114259	SRS508582	SRX384359	SRR1039522
SRP033351	SRA114259	SRS508575	SRX384353	SRR1039516
SRP033351	SRA114259	SRS508574	SRX384351	SRR1039514
SRP033351	SRA114259	SRS508577	SRX384356	SRR1039519
SRP033351	SRA114259	SRS508576	SRX384354	SRR1039517
SRP033351	SRA114259	SRS508580	SRX384358	SRR1039521
SRP033351	SRA114259	SRS508579	SRX384357	SRR1039520
SRP033351	SRA114259	SRS508571	SRX384349	SRR1039512
SRP033351	SRA114259	SRS508569	SRX384348	SRR1039511
SRP033351	SRA114259	SRS508567	SRX384346	SRR1039509
SRP033351	SRA114259	SRS508568	SRX384345	SRR1039508
SRP033351	SRA114259	SRS508578	SRX384355	SRR1039518
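A quick sanity check on the table is worthwhile before downloading anything. The sketch below assumes one run per library, which is what the table suggests (16 experiments, 16 runs), and compares the count against the 4 donors x 4 treatment conditions described in the Methods. To stay runnable without SRAdb, it uses a stand-in vector `runs` in place of `dtable$run`.

```r
# Stand-in for dtable$run so the sketch runs without SRAdb.
# The values are the 16 run accessions listed in the table above.
runs <- paste0('SRR', 1039508:1039523)

# 4 donors x 4 treatment conditions should give 16 libraries
# (assumes one run per library, as the table suggests)
n_expected <- 4 * 4
stopifnot(length(unique(runs)) == n_expected)

# every accession should look like an SRA run accession
stopifnot(all(grepl('^SRR[0-9]+$', runs)))

# sorting makes later file listings easier to compare
runs <- sort(runs)
```

In the real workflow the same checks can be applied to `dtable$run` directly.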

2.3 Get FASTQ from SRA

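Now we have the accessions of the SRA runs. Each run holds the actual sequence reads, so we use the fastq-dump executable to download every run and write gzipped FASTQ files into the fastq directory. The downloads are independent of one another, so we run them in parallel with BiocParallel.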

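# download one run and write gzipped FASTQ, one file per mate, into dir
fastqdump = function(accession,dir = getwd()) {
  # shQuote() guards against spaces or shell metacharacters in the arguments
  system(sprintf('fastq-dump --split-files --gzip -O %s %s',shQuote(dir),shQuote(accession)))
}
library(BiocParallel)
# use forked worker processes on the local machine
register(MulticoreParam())
# one fastq-dump call per run accession
bplapply(dtable$run,fastqdump,dir='fastq')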
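Because the downloads run in parallel and can fail individually, it is useful to check afterwards which files actually arrived. The following is a minimal sketch, assuming the usual `--split-files` naming of `<accession>_1.fastq.gz` and `<accession>_2.fastq.gz` for paired-end runs; `expected_fastq` is a hypothetical helper name, not part of any package used here.

```r
# Hypothetical helper (an assumption for illustration): list the files
# expected from `fastq-dump --split-files --gzip` for paired-end runs.
expected_fastq <- function(accessions, dir = 'fastq') {
  # one file per mate: <accession>_1.fastq.gz and <accession>_2.fastq.gz
  mates <- expand.grid(mate = 1:2, accession = accessions,
                       stringsAsFactors = FALSE)
  file.path(dir, sprintf('%s_%d.fastq.gz', mates$accession, mates$mate))
}

# two accessions from the table above, used as an example
files <- expected_fastq(c('SRR1039508', 'SRR1039509'))

# any file listed here has not been written yet
missing <- files[!file.exists(files)]
```

Re-running `fastqdump` only for the accessions behind `missing` avoids downloading everything again.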

2.4 Upload to SevenBridges cloud

# library(BiocInstaller)
# useDevel()
# biocLite('sevenbridges')
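library(sevenbridges)
# authenticate against the Cancer Genomics Cloud; replace the placeholder with your own token
a<-Auth(user="sdavis2", platform = "cgc", token = "XXXXXXXXXXX")
p = a$project('CSHL-2016')
# upload every file in the fastq directory to the project
sapply(dir('fastq',full.names = TRUE),p$upload)

# list the uploaded FASTQ files
p$file('*fastq.gz')
# tag each file with a sample_id taken from the run accession in its name
lapply(p$file('*fastq.gz'),function(j) {
  print(j$name)
  # the text before the first "_" is the run accession, e.g. SRR1039508
  sn = strsplit(j$name,'_')[[1]][1]
  print(j$setMeta(sample_id=sn))
})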
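The sample_id logic can be tried locally before touching any files in the cloud project. The sketch below mirrors the strsplit() step above in vectorised form, using base R only; `sample_id_from_name` is a hypothetical helper name, and the file names are examples that assume the `_1`/`_2` mate suffixes written by fastq-dump.

```r
# Hypothetical helper (an assumption for illustration): derive the
# sample_id from a FASTQ file name, vectorised over many names.
sample_id_from_name <- function(filename) {
  # basename() drops any directory part;
  # sub() removes the first "_" and everything after it
  sub('_.*$', '', basename(filename))
}

ids <- sample_id_from_name(c('SRR1039508_1.fastq.gz',
                             'fastq/SRR1039508_2.fastq.gz'))
# both mates of a run map to the same sample_id
```

Giving both mates the same sample_id is what allows downstream tools to treat the two files as one paired-end sample.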

CSHL RNA-seq with NCI Cloud Pilot

Sean Davis

7/3/2016

Contents

1 Introduction

2 Preparing data

2.1 Install the SRA Toolkit

2.2 Find SRA accessions using `SRAdb` package

2.3 Get FASTQ from SRA

2.4 Upload to SevenBridges cloud
