Obtaining Raw Affymetrix Data from ArrayExpress

First, install the ArrayExpress package for Bioconductor, which lets you search for and download data directly from within R. I am also installing the supporting packages for the human genome Affymetrix arrays (HG-U133A and HG-U95Av2).

BiocManager::install("ArrayExpress")
BiocManager::install("pd.hg.u133a")
BiocManager::install("hgu133acdf")
BiocManager::install("hgu95av2")
BiocManager::install("hgu95av2cdf")
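
If you rerun the script often, reinstalling every package each time is slow. The following is a minimal sketch of installing only what is missing. The helper name `missing_packages` is my own (hypothetical), and the `interactive()` guard is just one way to stop a batch run from triggering downloads. BiocManager itself comes from CRAN.

```r
# Return the subset of `pkgs` that is not yet installed in any library path.
# `missing_packages` is a hypothetical helper name, not part of BiocManager.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

wanted <- c("ArrayExpress", "pd.hg.u133a", "hgu133acdf",
            "hgu95av2", "hgu95av2cdf")
to_install <- missing_packages(wanted)

# Only install when run interactively, so that sourcing this file in a
# batch job never starts a download by surprise.
if (interactive() && length(to_install) > 0) {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")  # BiocManager is distributed on CRAN
  }
  BiocManager::install(to_install)   # accepts a character vector of packages
}
```

Packages that are already present are skipped, so the block is safe to leave at the top of a script.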

Once the packages are installed, you can start to query the database. In this case I am interested in lung cancer data from cell lines.

library("ArrayExpress")
sets <- queryAE(keywords = "lung cancer", species = "homo+sapiens")
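
It can help to check what came back before picking a dataset. The sketch below assumes the query result is a data frame whose accession column is called `ID`. That column name is an assumption on my part and may differ between package versions, so check `names(sets)` first. The helper `find_accession` and the `demo` table are hypothetical and only there to make the example runnable without a network connection.

```r
# Return the rows of a query result whose accession equals `accession`.
# `id_col` is an ASSUMED column name; confirm it with names(sets).
find_accession <- function(sets, accession, id_col = "ID") {
  if (!is.data.frame(sets) || !(id_col %in% names(sets))) {
    stop("expected a data frame with a column named '", id_col, "'")
  }
  # which() drops NA comparisons instead of returning NA rows
  sets[which(sets[[id_col]] == accession), , drop = FALSE]
}

# Hypothetical stand-in for a query result, for illustration only.
demo <- data.frame(ID = c("E-XXXX-1", "E-GEOD-4127", "E-XXXX-2"),
                   stringsAsFactors = FALSE)
hit <- find_accession(demo, "E-GEOD-4127")
nrow(hit)

# With the real query result you would run, for example:
#   dim(sets)                              # how many experiments were found
#   names(sets)                            # which columns are available
#   find_accession(sets, "E-GEOD-4127")    # is the dataset in the results?
```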

The query returns a very large table of results and can take a very long time to run. I want to focus on the E-GEOD-4127 dataset because it includes the raw data files for 29 samples, which is a reasonable size to practice on.

eset <- ArrayExpress("E-GEOD-4127")
Error in download.file(url, filedest) : 
  cannot open URL 'ftp://ftp.ebi.ac.uk/biostudies/fire/E-GEOD-/127/E-GEOD-4127/Files/E-GEOD-4127.idf.txt'
Reading in : /var/folders/85/9sb06ljn3ggc8rk8k2pv2c680000gn/T//RtmpfFkrFY/GSM94315.CEL
Reading in : /var/folders/85/9sb06ljn3ggc8rk8k2pv2c680000gn/T//RtmpfFkrFY/GSM94311.CEL
Reading in : /var/folders/85/9sb06ljn3ggc8rk8k2pv2c680000gn/T//RtmpfFkrFY/GSM94322.CEL
Reading in : /var/folders/85/9sb06ljn3ggc8rk8k2pv2c680000gn/T//RtmpfFkrFY/GSM94323.CEL
Reading in : /var/folders/85/9sb06ljn3ggc8rk8k2pv2c680000gn/T//RtmpfFkrFY/GSM94326.CEL
Reading in : /var/folders/85/9sb06ljn3ggc8rk8k2pv2c680000gn/T//RtmpfFkrFY/GSM94314.CEL
Reading in : /var/folders/85/9sb06ljn3ggc8rk8k2pv2c680000gn/T//RtmpfFkrFY/GSM94330.CEL
Reading in : /var/folders/85/9sb06ljn3ggc8rk8k2pv2c680000gn/T//RtmpfFkrFY/GSM94307.CEL
Reading in : /var/folders/85/9sb06ljn3ggc8rk8k2pv2c680000gn/T//RtmpfFkrFY/GSM94318.CEL
Reading in : /var/folders/85/9sb06ljn3ggc8rk8k2pv2c680000gn/T//RtmpfFkrFY/GSM94319.CEL
Reading in : /var/folders/85/9sb06ljn3ggc8rk8k2pv2c680000gn/T//RtmpfFkrFY/GSM94321.CEL
Reading in : /var/folders/85/9sb06ljn3ggc8rk8k2pv2c680000gn/T//RtmpfFkrFY/GSM94309.CEL
Reading in : /var/folders/85/9sb06ljn3ggc8rk8k2pv2c680000gn/T//RtmpfFkrFY/GSM94305.CEL
Reading in : /var/folders/85/9sb06ljn3ggc8rk8k2pv2c680000gn/T//RtmpfFkrFY/GSM94308.CEL
Reading in : /var/folders/85/9sb06ljn3ggc8rk8k2pv2c680000gn/T//RtmpfFkrFY/GSM94313.CEL
Reading in : /var/folders/85/9sb06ljn3ggc8rk8k2pv2c680000gn/T//RtmpfFkrFY/GSM94331.CEL
Reading in : /var/folders/85/9sb06ljn3ggc8rk8k2pv2c680000gn/T//RtmpfFkrFY/GSM94317.CEL
Reading in : /var/folders/85/9sb06ljn3ggc8rk8k2pv2c680000gn/T//RtmpfFkrFY/GSM94310.CEL
Reading in : /var/folders/85/9sb06ljn3ggc8rk8k2pv2c680000gn/T//RtmpfFkrFY/GSM94329.CEL
Reading in : /var/folders/85/9sb06ljn3ggc8rk8k2pv2c680000gn/T//RtmpfFkrFY/GSM94320.CEL
Reading in : /var/folders/85/9sb06ljn3ggc8rk8k2pv2c680000gn/T//RtmpfFkrFY/GSM94325.CEL
Reading in : /var/folders/85/9sb06ljn3ggc8rk8k2pv2c680000gn/T//RtmpfFkrFY/GSM94303.CEL
Reading in : /var/folders/85/9sb06ljn3ggc8rk8k2pv2c680000gn/T//RtmpfFkrFY/GSM94316.CEL
Reading in : /var/folders/85/9sb06ljn3ggc8rk8k2pv2c680000gn/T//RtmpfFkrFY/GSM94312.CEL
Reading in : /var/folders/85/9sb06ljn3ggc8rk8k2pv2c680000gn/T//RtmpfFkrFY/GSM94324.CEL
Reading in : /var/folders/85/9sb06ljn3ggc8rk8k2pv2c680000gn/T//RtmpfFkrFY/GSM94306.CEL
Reading in : /var/folders/85/9sb06ljn3ggc8rk8k2pv2c680000gn/T//RtmpfFkrFY/GSM94327.CEL
Reading in : /var/folders/85/9sb06ljn3ggc8rk8k2pv2c680000gn/T//RtmpfFkrFY/GSM94304.CEL
Reading in : /var/folders/85/9sb06ljn3ggc8rk8k2pv2c680000gn/T//RtmpfFkrFY/GSM94328.CEL
Error in if (file == "") file <- stdin() else { : 
  argument is of length zero
Error in basename(mageFiles$processedFiles) : 
  a character vector argument expected
Error in basename(mageFiles$processedArchive) : 
  a character vector argument expected

This downloads the .CEL files, processes them to create an R object, and deletes the downloaded files to tidy up. I actually wanted to keep the .CEL files so that I could check the array images for possible issues, so I downloaded them as a zip file to my R working directory. They are in a sub-directory called E-GEOD-4127.
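
One possible way to fetch and keep the raw files from within R is the `getAE()` function in the ArrayExpress package. This is a sketch under assumptions, not necessarily how the files above were obtained: it assumes the download service is reachable (the FTP error above suggests it may not always be), and it assumes the affy package is installed for reading the files, which is not in the install list earlier. The helper `list_cel_files` is a hypothetical name of my own.

```r
# List CEL files in a directory, matching the extension case-insensitively.
# `list_cel_files` is a hypothetical helper name.
list_cel_files <- function(dir) {
  list.files(dir, pattern = "\\.cel$", ignore.case = TRUE, full.names = TRUE)
}

accession <- "E-GEOD-4127"
dest <- file.path(getwd(), accession)

# Download only when run interactively and when ArrayExpress is available.
if (interactive() && requireNamespace("ArrayExpress", quietly = TRUE)) {
  dir.create(dest, showWarnings = FALSE)
  # type = "raw" requests the raw data only; extract = TRUE unpacks the archive
  files <- tryCatch(
    ArrayExpress::getAE(accession, path = dest, type = "raw", extract = TRUE),
    error = function(e) {
      message("Download failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Returns an empty vector if the directory is missing or holds no CEL files.
cel_files <- list_cel_files(dest)
length(cel_files)

# ASSUMPTION: affy is installed. Read the files and draw the first array
# as an image, which is a quick visual check for scratches or smears.
if (length(cel_files) > 0 && requireNamespace("affy", quietly = TRUE)) {
  library("affy")
  raw <- ReadAffy(filenames = cel_files)
  image(raw[, 1])
}
```

Because the files are written to a named sub-directory rather than a temporary one, they are still there after the R session ends.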