Getting data

This R Markdown document collects examples of how to get data into R. It is based on Coursera's Getting and Cleaning Data course.

Raw data -> tidy data

We aim for reproducibility. So when you process raw data to obtain tidy data, you should end up with:

the raw data (the rawest form of the data you had access to so that it may be possible to start the analysis from scratch again)
the tidy data set
code book = metadata (one section should be called “Study design” and thoroughly describe how the data was collected and another “Code book”, describing each variable and its units)
an explicit recipe of how we got the tidy data from the raw data (ideally a computer script; if not everything can be put in a script, there should be clear instructions on how to process the raw data to obtain the tidy data)

Using data.table instead of data.frame

Inherits from data.frame
- Functions that accept a data.frame generally also work on a data.table
Written in C, so it is much faster
Much, much faster at subsetting, grouping, and updating
The syntax is a bit different :(

You create it just like you would a data frame:

library(data.table)

## Warning: package 'data.table' was built under R version 3.0.3

DT = data.table(x=rnorm(9),y=rep(c("a","b","c"),each=3),z=rnorm(9))
DT

##          x y       z
## 1:  0.8060 a  0.4069
## 2:  0.5900 a  0.1419
## 3: -0.5667 a -0.6065
## 4:  0.1425 b -1.5311
## 5: -0.1231 b -0.3270
## 6: -0.5698 b -0.6895
## 7:  0.8695 c -0.7942
## 8: -0.8862 c -0.7771
## 9:  1.3449 c  0.4245

If you're concerned with memory space, you can check out the tables you are currently working with using the tables() function.

Row subsetting is the same as with data frames. Column subsetting is quite different:

The subsetting function is modified for data.table
The argument you pass after the comma is an expression:

DT[,list(mean(x),sum(z))]

##        V1     V2
## 1: 0.1786 -3.752

DT[,table(y)]

## y
## a b c 
## 3 3 3

DT[, w:=z^2] # creates new variable w

##          x y       z       w
## 1:  0.8060 a  0.4069 0.16559
## 2:  0.5900 a  0.1419 0.02014
## 3: -0.5667 a -0.6065 0.36786
## 4:  0.1425 b -1.5311 2.34416
## 5: -0.1231 b -0.3270 0.10693
## 6: -0.5698 b -0.6895 0.47542
## 7:  0.8695 c -0.7942 0.63072
## 8: -0.8862 c -0.7771 0.60389
## 9:  1.3449 c  0.4245 0.18019

If you want to create a copy of a data table, explicitly use the function copy. If you just type DT2=DT, both names refer to the same table, so any change you make to DT by reference (for example with :=) will also show up in DT2.
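A minimal sketch of the difference between assigning and copying, assuming data.table is installed. The names DT_alias and DT_copy are made up for illustration:

```r
library(data.table)

DT <- data.table(x = 1:3)
DT_alias <- DT        # plain assignment: both names point to the same table
DT_copy  <- copy(DT)  # explicit, independent copy

DT[, y := x * 2]      # add a column by reference

names(DT_alias)       # the alias also shows the new column y
names(DT_copy)        # the copy still has only x
```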

Cool stuff: you can do more than one operation at once!

DT[,m:= {tmp <- (x+z); log2(tmp+5)}] # first compute the temporary variable tmp=x+z, then assign log2(tmp+5) to the new column m

##          x y       z       w     m
## 1:  0.8060 a  0.4069 0.16559 2.635
## 2:  0.5900 a  0.1419 0.02014 2.519
## 3: -0.5667 a -0.6065 0.36786 1.936
## 4:  0.1425 b -1.5311 2.34416 1.853
## 5: -0.1231 b -0.3270 0.10693 2.186
## 6: -0.5698 b -0.6895 0.47542 1.903
## 7:  0.8695 c -0.7942 0.63072 2.344
## 8: -0.8862 c -0.7771 0.60389 1.738
## 9:  1.3449 c  0.4245 0.18019 2.759

Aggregating is also cool:

DT[,a:=x>0] # create boolean variable

##          x y       z       w     m     a
## 1:  0.8060 a  0.4069 0.16559 2.635  TRUE
## 2:  0.5900 a  0.1419 0.02014 2.519  TRUE
## 3: -0.5667 a -0.6065 0.36786 1.936 FALSE
## 4:  0.1425 b -1.5311 2.34416 1.853  TRUE
## 5: -0.1231 b -0.3270 0.10693 2.186 FALSE
## 6: -0.5698 b -0.6895 0.47542 1.903 FALSE
## 7:  0.8695 c -0.7942 0.63072 2.344  TRUE
## 8: -0.8862 c -0.7771 0.60389 1.738 FALSE
## 9:  1.3449 c  0.4245 0.18019 2.759  TRUE

DT[,b:= mean(x+w),by=a] # computes mean(x+w) separately over the rows where a is TRUE and over the rows where a is FALSE, and stores each group's mean in b

##          x y       z       w     m     a       b
## 1:  0.8060 a  0.4069 0.16559 2.635  TRUE  1.4188
## 2:  0.5900 a  0.1419 0.02014 2.519  TRUE  1.4188
## 3: -0.5667 a -0.6065 0.36786 1.936 FALSE -0.1479
## 4:  0.1425 b -1.5311 2.34416 1.853  TRUE  1.4188
## 5: -0.1231 b -0.3270 0.10693 2.186 FALSE -0.1479
## 6: -0.5698 b -0.6895 0.47542 1.903 FALSE -0.1479
## 7:  0.8695 c -0.7942 0.63072 2.344  TRUE  1.4188
## 8: -0.8862 c -0.7771 0.60389 1.738 FALSE -0.1479
## 9:  1.3449 c  0.4245 0.18019 2.759  TRUE  1.4188
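The grouped assignment above repeats the group value on every row. As a small sketch on made-up data (the column names g, v and group_mean are assumptions for illustration), the same by argument can also return one row per group when the expression is a list instead of an assignment:

```r
library(data.table)

DT_small <- data.table(g = c("a", "a", "b"), v = c(1, 3, 10))

# := with by: the table keeps its rows, each row gets its group's mean
DT_small[, group_mean := mean(v), by = g]

# list(...) with by: a new table with one row per group
summary_dt <- DT_small[, list(group_mean = mean(v)), by = g]
summary_dt
```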

Count the number of times each value appears with the special symbol .N:

set.seed(123);
DT = data.table(x=sample(letters[1:3], 1E5, TRUE))
DT[, .N, by=x] # much faster than doing a table of DT$x

##    x     N
## 1: a 33387
## 2: c 33201
## 3: b 33412

Very fast way to subset using a key:

DT = data.table(x=rep(c("a","b","c"),each=100), y=rnorm(300))
setkey(DT, x)
DT['a']

##      x        y
##   1: a  0.25959
##   2: a  0.91751
##   3: a -0.72232
##   4: a -0.80828
##   5: a -0.14135
##   6: a  2.25701
##   7: a -2.37955
##   8: a -0.45425
##   9: a -0.06007
##  10: a  0.86090
##  11: a -1.78466
##  12: a -0.13074
##  13: a -0.36984
##  14: a -0.18066
##  15: a -1.04973
##  16: a  0.37832
##  17: a -1.37079
##  18: a -0.31612
##  19: a  0.39435
##  20: a -1.68988
##  21: a -1.46234
##  22: a  2.55838
##  23: a  0.08789
##  24: a  1.73141
##  25: a  1.21513
##  26: a  0.29954
##  27: a -0.17246
##  28: a  1.13250
##  29: a  0.02320
##  30: a  1.33587
##  31: a -1.09879
##  32: a -0.58176
##  33: a  0.03892
##  34: a  1.07315
##  35: a  1.34970
##  36: a  1.19528
##  37: a -0.02218
##  38: a  0.69849
##  39: a  0.67241
##  40: a -0.79165
##  41: a -0.21791
##  42: a  0.02307
##  43: a  0.11539
##  44: a -0.27708
##  45: a  0.03688
##  46: a  0.47520
##  47: a  1.70749
##  48: a  1.07601
##  49: a -1.34571
##  50: a -1.44025
##  51: a -0.39393
##  52: a  0.58106
##  53: a -0.17079
##  54: a -0.90585
##  55: a  0.15621
##  56: a -0.37323
##  57: a -0.34587
##  58: a -0.35829
##  59: a -0.13307
##  60: a -0.08960
##  61: a  0.62793
##  62: a -1.42883
##  63: a  0.17255
##  64: a -0.79115
##  65: a  1.26204
##  66: a -0.26941
##  67: a  0.15698
##  68: a -0.76060
##  69: a  1.37060
##  70: a  0.03758
##  71: a  0.44949
##  72: a  2.78869
##  73: a -0.46849
##  74: a  1.01261
##  75: a -0.04374
##  76: a  1.40670
##  77: a  0.41993
##  78: a  0.31009
##  79: a  1.11905
##  80: a -1.29814
##  81: a -1.28248
##  82: a  1.65943
##  83: a  0.78375
##  84: a  0.57771
##  85: a -0.26725
##  86: a -0.64569
##  87: a -0.44953
##  88: a -0.82620
##  89: a  1.05504
##  90: a -0.87927
##  91: a -1.27713
##  92: a -0.63412
##  93: a  0.66470
##  94: a -0.50958
##  95: a  0.40736
##  96: a  1.67775
##  97: a -1.05206
##  98: a -0.63691
##  99: a  0.56539
## 100: a  0.38016
##      x        y

Merging two data tables fast using a key:

DT1 = data.table(x=c('a', 'a', 'b', 'dt1'), y=1:4)
DT2 = data.table(x=c('a', 'b', 'dt2'), z=5:7)
setkey(DT1, x); setkey(DT2, x)
merge(DT1, DT2)

##    x y z
## 1: a 1 5
## 2: a 2 5
## 3: b 3 6
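By default the merge keeps only the keys present in both tables, which is why the rows dt1 and dt2 are missing from the result. A short sketch, reusing the same two tables, of how the all argument of merge keeps the unmatched rows as well:

```r
library(data.table)

DT1 <- data.table(x = c("a", "a", "b", "dt1"), y = 1:4)
DT2 <- data.table(x = c("a", "b", "dt2"), z = 5:7)
setkey(DT1, x); setkey(DT2, x)

# all = TRUE keeps keys found in only one table and fills the gaps with NA
full <- merge(DT1, DT2, all = TRUE)
full
```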

Also, it is much faster to read using fread (returns a data.table) than read.table (returns data.frame). Check it out:

big_df = data.frame(x=rnorm(1E6), y=rnorm(1E6))
file = tempfile()
write.table(big_df, file=file, row.names=FALSE, col.names=TRUE, sep="\t", quote=FALSE)
system.time(fread(file))

## Read 1000000 rows and 2 (of 2) columns from 0.035 GB file in 00:00:04

##    user  system elapsed 
##    3.28    0.01    3.30

system.time(read.table(file, header=TRUE, sep="\t"))

##    user  system elapsed 
##   22.27    0.38   22.64

Downloading files

Often a data analysis starts by downloading data from the internet, and this step should preferably be included in the script. It is important to know which directory you are working in, how to set it, and how to check whether a given directory exists or create one:

getwd() - gets the current working directory
setwd("C:/example") - sets the current working directory to C:/example (an absolute path)
setwd("../example") - sets the working directory to the folder example inside the parent of the current directory (a relative path; use "./example" for a folder inside the current directory)
file.exists("directoryName") - checks whether the directory exists
dir.create("directoryName") - creates a directory
list.files("directoryName") - lists the files in the directory
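The functions above can be combined as in the following sketch. It works inside a temporary folder so that it does not touch real data; the folder and file names are made up for illustration:

```r
# Temporary base folder (hypothetical name prefix)
base_dir <- tempfile("example_")
dir.create(base_dir)

data_dir <- file.path(base_dir, "data")
if (!file.exists(data_dir)) {
  dir.create(data_dir)   # create the folder only when it is missing
}

# Put an empty placeholder file in it and list the folder contents
file.create(file.path(data_dir, "cameras.csv"))
list.files(data_dir)
```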

To download files you use the download.file() function. Example:

setInternet2(TRUE) # only relevant on Windows with old versions of R; skip this line on other setups
fileUrl = "https://data.baltimorecity.gov/api/views/dz54-2aru/rows.csv?accessType=DOWNLOAD"
download.file(fileUrl,destfile="./data/cameras.csv") # the folder ./data must already exist; on some systems, you might have to use method="curl"
dateDownloaded = date() # it is good practice to record the date of the download
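Since the download belongs in the script, it can help to avoid downloading again on every run. A possible sketch follows; download_once is a hypothetical helper written for these notes, not a function from R or any package:

```r
# Hypothetical helper: download a file only when it is not already on disk.
# Returns the download date, or NA when the file was already there.
download_once <- function(url, destfile, ...) {
  if (file.exists(destfile)) {
    return(NA_character_)
  }
  # Make sure the destination folder exists
  dir.create(dirname(destfile), showWarnings = FALSE, recursive = TRUE)
  status <- download.file(url, destfile = destfile, ...)  # 0 means success
  if (status != 0) stop("download failed with status ", status)
  date()
}
```

It could be called as download_once(fileUrl, "./data/cameras.csv"), storing the result in dateDownloaded. Extra arguments such as method or mode are passed on to download.file.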

Read flat files (usually a plain text file or a binary file)

read.table function

* This is the main function for reading data into R
* Flexible and robust but requires more parameters
* Reads the data into RAM - big data can cause problems
* Important parameters: file, header, sep, row.names, nrows
* Other important parameters
        * quote - tells R which characters mark quoted values; quote="" means no quotes.
        * na.strings - set the character that represents a missing value.
        * nrows - how many rows to read of the file (e.g. nrows=10 reads 10 lines).
        * skip - number of lines to skip before starting to read

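read.csv - read.table with defaults for comma-separated files (sep=",", header=TRUE)
read.csv2 - the same for files that use ; as the separator and , as the decimal mark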

Example:

cameraData <- read.table("./data/cameras.csv",sep=",",header=TRUE)
head(cameraData)

##                          address direction      street  crossStreet
## 1       S CATON AVE & BENSON AVE       N/B   Caton Ave   Benson Ave
## 2       S CATON AVE & BENSON AVE       S/B   Caton Ave   Benson Ave
## 3 WILKENS AVE & PINE HEIGHTS AVE       E/B Wilkens Ave Pine Heights
## 4        THE ALAMEDA & E 33RD ST       S/B The Alameda      33rd St
## 5        E 33RD ST & THE ALAMEDA       E/B      E 33rd  The Alameda
## 6        ERDMAN AVE & N MACON ST       E/B      Erdman     Macon St
##                 intersection                      Location.1
## 1     Caton Ave & Benson Ave (39.2693779962, -76.6688185297)
## 2     Caton Ave & Benson Ave (39.2693157898, -76.6689698176)
## 3 Wilkens Ave & Pine Heights  (39.2720252302, -76.676960806)
## 4     The Alameda  & 33rd St (39.3285013141, -76.5953545714)
## 5      E 33rd  & The Alameda (39.3283410623, -76.5953594625)
## 6         Erdman  & Macon St (39.3068045671, -76.5593167803)
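To see what the less common read.table parameters do, here is a sketch on a small made-up file (its contents and the NA_VALUE marker are invented for illustration):

```r
# Write a small file: two junk lines, a header, then four data rows
tmp <- tempfile(fileext = ".csv")
writeLines(c("exported by some tool",
             "second junk line",
             "id,name,score",
             "1,Ann,3.5",
             "2,Bob,NA_VALUE",
             "3,O'Neil,4.0",
             "4,Dan,2.5"), tmp)

dat <- read.table(tmp, sep = ",", header = TRUE,
                  skip = 2,                 # ignore the first two lines
                  nrows = 3,                # read only three data rows
                  quote = "",               # treat ' and " as ordinary characters
                  na.strings = "NA_VALUE")  # this token marks a missing value
dat
```

With the default quote setting, the apostrophe in O'Neil would be read as the start of a quoted value, which is why quote="" matters here.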

Excel files

Example: download+read

if(!file.exists("data")){dir.create("data")}
fileUrl = "https://data.baltimorecity.gov/api/views/dz54-2aru/rows.xlsx?accessType=DOWNLOAD"
download.file(fileUrl,destfile="./data/cameras.xlsx", mode="wb") # mode="wb" downloads in binary mode, which Excel files need (on Windows in particular)

## Error: unsupported URL scheme

dateDownloaded = date()

library(xlsx)

## Loading required package: rJava
## Loading required package: xlsxjars

cameraData = read.xlsx("./data/cameras.xlsx",sheetIndex=1,header=TRUE)
head(cameraData)

##                          address direction      street  crossStreet
## 1       S CATON AVE & BENSON AVE       N/B   Caton Ave   Benson Ave
## 2       S CATON AVE & BENSON AVE       S/B   Caton Ave   Benson Ave
## 3 WILKENS AVE & PINE HEIGHTS AVE       E/B Wilkens Ave Pine Heights
## 4        THE ALAMEDA & E 33RD ST       S/B The Alameda      33rd St
## 5        E 33RD ST & THE ALAMEDA       E/B      E 33rd  The Alameda
## 6        ERDMAN AVE & N MACON ST       E/B      Erdman     Macon St
##                 intersection                      Location.1
## 1     Caton Ave & Benson Ave (39.2693779962, -76.6688185297)
## 2     Caton Ave & Benson Ave (39.2693157898, -76.6689698176)
## 3 Wilkens Ave & Pine Heights  (39.2720252302, -76.676960806)
## 4     The Alameda  & 33rd St (39.3285013141, -76.5953545714)
## 5      E 33rd  & The Alameda (39.3283410623, -76.5953594625)
## 6         Erdman  & Macon St (39.3068045671, -76.5593167803)

XML files

About XML:

Extensible markup language
Frequently used to store structured data
Particularly widely used in internet applications
Extracting XML is the basis for most web scraping
Components
- Markup - labels that give the text structure
- Content - the actual text of the document
Tags correspond to general labels
- Start tags <section>
- End tags </section>
- Empty tags <line-break />: no need for a start tag and an end tag in this case
Elements are specific examples of tags
- <Greeting> Hello, world </Greeting>
Attributes are components of the label
- <img src="jeff.jpg" alt="instructor"/>
- <step number="3"> Connect A to B. </step>

Example: Read file http://www.w3schools.com/xml/simple.xml

Let's load the data and see what it looks like:

library(XML)
fileUrl = "http://www.w3schools.com/xml/simple.xml" 
doc = xmlTreeParse(fileUrl,useInternalNodes=TRUE) # loads the document into R so that you can then look at parts of it and identify what is what in the file
print(doc)

## <?xml version="1.0" encoding="UTF-8"?>
## <!-- Edited by XMLSpy -->
## <breakfast_menu>
##   <food>
##     <name>Belgian Waffles</name>
##     <price>$5.95</price>
##     <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
##     <calories>650</calories>
##   </food>
##   <food>
##     <name>Strawberry Belgian Waffles</name>
##     <price>$7.95</price>
##     <description>Light Belgian waffles covered with strawberries and whipped cream</description>
##     <calories>900</calories>
##   </food>
##   <food>
##     <name>Berry-Berry Belgian Waffles</name>
##     <price>$8.95</price>
##     <description>Light Belgian waffles covered with an assortment of fresh berries and whipped cream</description>
##     <calories>900</calories>
##   </food>
##   <food>
##     <name>French Toast</name>
##     <price>$4.50</price>
##     <description>Thick slices made from our homemade sourdough bread</description>
##     <calories>600</calories>
##   </food>
##   <food>
##     <name>Homestyle Breakfast</name>
##     <price>$6.95</price>
##     <description>Two eggs, bacon or sausage, toast, and our ever-popular hash browns</description>
##     <calories>950</calories>
##   </food>
## </breakfast_menu>
##

We're actually interested in the root node:

rootNode = xmlRoot(doc)
print(rootNode)

## <breakfast_menu>
##   <food>
##     <name>Belgian Waffles</name>
##     <price>$5.95</price>
##     <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
##     <calories>650</calories>
##   </food>
##   <food>
##     <name>Strawberry Belgian Waffles</name>
##     <price>$7.95</price>
##     <description>Light Belgian waffles covered with strawberries and whipped cream</description>
##     <calories>900</calories>
##   </food>
##   <food>
##     <name>Berry-Berry Belgian Waffles</name>
##     <price>$8.95</price>
##     <description>Light Belgian waffles covered with an assortment of fresh berries and whipped cream</description>
##     <calories>900</calories>
##   </food>
##   <food>
##     <name>French Toast</name>
##     <price>$4.50</price>
##     <description>Thick slices made from our homemade sourdough bread</description>
##     <calories>600</calories>
##   </food>
##   <food>
##     <name>Homestyle Breakfast</name>
##     <price>$6.95</price>
##     <description>Two eggs, bacon or sausage, toast, and our ever-popular hash browns</description>
##     <calories>950</calories>
##   </food>
## </breakfast_menu>

We can check out its name, the names of the elements inside the root node or directly access parts of the document:

xmlName(rootNode)

## [1] "breakfast_menu"

names(rootNode)

##   food   food   food   food   food 
## "food" "food" "food" "food" "food"

rootNode[[1]][[2]]

## <price>$5.95</price>

We can programmatically extract parts of the file:

xmlSApply(rootNode,xmlValue)

##                                                                                                                     food 
##                               "Belgian Waffles$5.95Two of our famous Belgian Waffles with plenty of real maple syrup650" 
##                                                                                                                     food 
##                    "Strawberry Belgian Waffles$7.95Light Belgian waffles covered with strawberries and whipped cream900" 
##                                                                                                                     food 
## "Berry-Berry Belgian Waffles$8.95Light Belgian waffles covered with an assortment of fresh berries and whipped cream900" 
##                                                                                                                     food 
##                                                "French Toast$4.50Thick slices made from our homemade sourdough bread600" 
##                                                                                                                     food 
##                         "Homestyle Breakfast$6.95Two eggs, bacon or sausage, toast, and our ever-popular hash browns950"

XPath expressions are quite useful to extract …

/node - top-level node
//node - node at any level
node[@attr-name] - node that has an attribute named attr-name
node[@attr-name='bob'] - node whose attribute attr-name has the value 'bob'
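The four patterns can be tried on a tiny document. This is a sketch assuming the XML package is installed; the recipe document is made up, in the spirit of the step example from the About XML list:

```r
library(XML)

# Made-up document with attributes on some nodes
txt <- '<recipe>
  <step number="1">Open the box.</step>
  <step number="2">Find parts A and B.</step>
  <step number="3">Connect A to B.</step>
  <note>Keep the manual.</note>
</recipe>'
doc <- xmlParse(txt, asText = TRUE)

xpathSApply(doc, "/recipe", xmlName)                       # top-level node
xpathSApply(doc, "//step", xmlValue)                       # every step, at any level
xpathSApply(doc, "//step[@number]", xmlGetAttr, "number")  # steps that have a number attribute
xpathSApply(doc, "//step[@number='3']", xmlValue)          # the step whose number is 3
```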

Example of usage:

xpathSApply(rootNode,"//name",xmlValue)

## [1] "Belgian Waffles"             "Strawberry Belgian Waffles" 
## [3] "Berry-Berry Belgian Waffles" "French Toast"               
## [5] "Homestyle Breakfast"

xpathSApply(rootNode,"//price",xmlValue)

## [1] "$5.95" "$7.95" "$8.95" "$4.50" "$6.95"

xpathSApply(rootNode,"//food[@class='french fries']",xmlValue)

## list()

The last instruction returns an empty list, because no food element has a class attribute equal to 'french fries' (the food elements in this file have no attributes at all). Let's look at an example where this type of instruction may be useful:

fileUrl = "http://espn.go.com/nfl/team/_/name/bal/baltimore-ravens"
doc = htmlTreeParse(fileUrl,useInternalNodes=TRUE)
teams = xpathSApply(doc,"//li[@class='team-name']",xmlValue)
print(teams)

##  [1] "San Francisco" "Dallas"        "Washington"    "New Orleans"  
##  [5] "Cincinnati"    "Pittsburgh"    "Cleveland"     "Carolina"     
##  [9] "Indianapolis"  "Tampa Bay"     "Atlanta"       "Cincinnati"   
## [13] "Pittsburgh"    "Tennessee"     "New Orleans"   "San Diego"    
## [17] "Miami"         "Jacksonville"  "Houston"       "Cleveland"

HTML files

You may want to do web scraping, i.e., programmatically extracting data from the HTML code of websites:

It can be a great way to get data
Many websites have information you may want to read programmatically
In some cases this is against the terms of service for the website
Attempting to read too many pages too quickly can get your IP address blocked

Let's start by reading a webpage (http://scholar.google.com/citations?user=HI-I6C0AAAAJ&hl=en):

con = url("http://scholar.google.com/citations?user=HI-I6C0AAAAJ&hl=en") # open connection 
htmlCode = readLines(con) # read

## Warning: incomplete final line found on
## 'http://scholar.google.com/citations?user=HI-I6C0AAAAJ&hl=en'

close(con) # close connection

In this case, you get a huge character vector with one element per line of the page (which is why it is not displayed here). We can instead try using the XML package:

library(XML)
url = "http://scholar.google.com/citations?user=HI-I6C0AAAAJ&hl=en"
html_obtained_using_XML_library = htmlTreeParse(url, useInternalNodes=TRUE) # the result is too big to print here

Then we can use the type of commands we saw before (see XML files section)

xpathSApply(html_obtained_using_XML_library, "//title", xmlValue)

## [1] "Jeff Leek - Google Scholar Citations"

xpathSApply(html_obtained_using_XML_library, "//td[@id='col-citedby']", xmlValue)

##  [1] "Cited by" "416"      "303"      "278"      "181"      "159"     
##  [7] "149"      "137"      "126"      "119"      "48"       "45"      
## [13] "40"       "34"       "23"       "16"       "14"       "13"      
## [19] "12"       "10"       "7"

There's still another alternative, which is rather useful when the site requires a login and a password. The result from using this approach (in this case, no login and password are required) is the same as using the XML library:

library(httr);

## Warning: package 'httr' was built under R version 3.0.3

html = GET("http://scholar.google.com/citations?user=HI-I6C0AAAAJ&hl=en")
pageText = content(html,as="text") # named pageText so that it does not mask the function content()
parsedHtml = htmlParse(pageText,asText=TRUE)

So now the parsedHtml looks exactly like html_obtained_using_XML_library. Again, we can use functions like xpathSApply. If the website does require login and password (example: http://httpbin.org/basic-auth/user/passwd), this is how to proceed:

pg2 = GET("http://httpbin.org/basic-auth/user/passwd", authenticate("user","passwd"))
pageText = content(pg2,as="text") # named pageText so that it does not mask the function content()
parsedHtml = htmlParse(pageText,asText=TRUE)

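Using handles, which keep settings and cookies across several requests to the same site (so that, for example, you stay authenticated once you have logged in):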

google = handle("http://google.com")
pg1 = GET(handle=google,path="/")
pg2 = GET(handle=google,path="search")

JSON (JavaScript Object Notation) data

About JSON data:

Similar structure to XML (but different syntax/format), also very commonly used on the internet
Data stored as
- Numbers (double)
- Strings (double quoted)
- Boolean (true or false)
- Array (ordered, comma separated, enclosed in square brackets [])
- Object (unordered, comma separated collection of key:value pairs in curly brackets {})
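As a small illustration of these types, the following sketch parses a made-up JSON text with jsonlite (assumed to be installed) and shows what each piece becomes in R:

```r
library(jsonlite)

# Made-up JSON text touching every type in the list above
txt <- '{
  "name": "example",
  "stars": 42,
  "fork": false,
  "topics": ["r", "data"],
  "owner": {"login": "someone", "id": 7}
}'
parsed <- fromJSON(txt)

class(parsed)        # a JSON object becomes a named list
parsed$fork          # a boolean becomes a logical
parsed$topics        # an array of strings becomes a character vector
parsed$owner$login   # nested objects become nested lists
```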

How to read it:

We will open the example JSON data https://api.github.com/users/jtleek/repos

library(jsonlite)

## Warning: package 'jsonlite' was built under R version 3.0.3

jsonData = fromJSON("https://api.github.com/users/jtleek/repos")

You actually get back a data frame! Want to look at the names of its columns?

colnames(jsonData)

##  [1] "id"                "name"              "full_name"        
##  [4] "owner"             "private"           "html_url"         
##  [7] "description"       "fork"              "url"              
## [10] "forks_url"         "keys_url"          "collaborators_url"
## [13] "teams_url"         "hooks_url"         "issue_events_url" 
## [16] "events_url"        "assignees_url"     "branches_url"     
## [19] "tags_url"          "blobs_url"         "git_tags_url"     
## [22] "git_refs_url"      "trees_url"         "statuses_url"     
## [25] "languages_url"     "stargazers_url"    "contributors_url" 
## [28] "subscribers_url"   "subscription_url"  "commits_url"      
## [31] "git_commits_url"   "comments_url"      "issue_comment_url"
## [34] "contents_url"      "compare_url"       "merges_url"       
## [37] "archive_url"       "downloads_url"     "issues_url"       
## [40] "pulls_url"         "milestones_url"    "notifications_url"
## [43] "labels_url"        "releases_url"      "created_at"       
## [46] "updated_at"        "pushed_at"         "git_url"          
## [49] "ssh_url"           "clone_url"         "svn_url"          
## [52] "homepage"          "size"              "stargazers_count" 
## [55] "watchers_count"    "language"          "has_issues"       
## [58] "has_downloads"     "has_wiki"          "forks_count"      
## [61] "mirror_url"        "open_issues_count" "forks"            
## [64] "open_issues"       "watchers"          "default_branch"

Now if you look at the original JSON file, you will see that there's a lot of info in the “owner” entry. Let's explore that:

class(jsonData$owner)

## [1] "data.frame"

names(jsonData$owner)

##  [1] "login"               "id"                  "avatar_url"         
##  [4] "gravatar_id"         "url"                 "html_url"           
##  [7] "followers_url"       "following_url"       "gists_url"          
## [10] "starred_url"         "subscriptions_url"   "organizations_url"  
## [13] "repos_url"           "events_url"          "received_events_url"
## [16] "type"                "site_admin"

jsonData$owner$gists_url

##  [1] "https://api.github.com/users/jtleek/gists{/gist_id}"
##  [2] "https://api.github.com/users/jtleek/gists{/gist_id}"
##  [3] "https://api.github.com/users/jtleek/gists{/gist_id}"
##  [4] "https://api.github.com/users/jtleek/gists{/gist_id}"
##  [5] "https://api.github.com/users/jtleek/gists{/gist_id}"
##  [6] "https://api.github.com/users/jtleek/gists{/gist_id}"
##  [7] "https://api.github.com/users/jtleek/gists{/gist_id}"
##  [8] "https://api.github.com/users/jtleek/gists{/gist_id}"
##  [9] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [10] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [11] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [12] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [13] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [14] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [15] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [16] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [17] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [18] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [19] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [20] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [21] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [22] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [23] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [24] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [25] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [26] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [27] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [28] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [29] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [30] "https://api.github.com/users/jtleek/gists{/gist_id}"

You can also write data frames to JSON:

head(iris)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

myjson = toJSON(iris, pretty=TRUE)
print(myjson)

## [
##  {
##      "Sepal.Length" : 5.1,
##      "Sepal.Width" : 3.5,
##      "Petal.Length" : 1.4,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.9,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 1.4,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.7,
##      "Sepal.Width" : 3.2,
##      "Petal.Length" : 1.3,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.6,
##      "Sepal.Width" : 3.1,
##      "Petal.Length" : 1.5,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5,
##      "Sepal.Width" : 3.6,
##      "Petal.Length" : 1.4,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.4,
##      "Sepal.Width" : 3.9,
##      "Petal.Length" : 1.7,
##      "Petal.Width" : 0.4,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.6,
##      "Sepal.Width" : 3.4,
##      "Petal.Length" : 1.4,
##      "Petal.Width" : 0.3,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5,
##      "Sepal.Width" : 3.4,
##      "Petal.Length" : 1.5,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.4,
##      "Sepal.Width" : 2.9,
##      "Petal.Length" : 1.4,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.9,
##      "Sepal.Width" : 3.1,
##      "Petal.Length" : 1.5,
##      "Petal.Width" : 0.1,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.4,
##      "Sepal.Width" : 3.7,
##      "Petal.Length" : 1.5,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.8,
##      "Sepal.Width" : 3.4,
##      "Petal.Length" : 1.6,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.8,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 1.4,
##      "Petal.Width" : 0.1,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.3,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 1.1,
##      "Petal.Width" : 0.1,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.8,
##      "Sepal.Width" : 4,
##      "Petal.Length" : 1.2,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.7,
##      "Sepal.Width" : 4.4,
##      "Petal.Length" : 1.5,
##      "Petal.Width" : 0.4,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.4,
##      "Sepal.Width" : 3.9,
##      "Petal.Length" : 1.3,
##      "Petal.Width" : 0.4,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.1,
##      "Sepal.Width" : 3.5,
##      "Petal.Length" : 1.4,
##      "Petal.Width" : 0.3,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.7,
##      "Sepal.Width" : 3.8,
##      "Petal.Length" : 1.7,
##      "Petal.Width" : 0.3,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.1,
##      "Sepal.Width" : 3.8,
##      "Petal.Length" : 1.5,
##      "Petal.Width" : 0.3,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.4,
##      "Sepal.Width" : 3.4,
##      "Petal.Length" : 1.7,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.1,
##      "Sepal.Width" : 3.7,
##      "Petal.Length" : 1.5,
##      "Petal.Width" : 0.4,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.6,
##      "Sepal.Width" : 3.6,
##      "Petal.Length" : 1,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.1,
##      "Sepal.Width" : 3.3,
##      "Petal.Length" : 1.7,
##      "Petal.Width" : 0.5,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.8,
##      "Sepal.Width" : 3.4,
##      "Petal.Length" : 1.9,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 1.6,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5,
##      "Sepal.Width" : 3.4,
##      "Petal.Length" : 1.6,
##      "Petal.Width" : 0.4,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.2,
##      "Sepal.Width" : 3.5,
##      "Petal.Length" : 1.5,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.2,
##      "Sepal.Width" : 3.4,
##      "Petal.Length" : 1.4,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.7,
##      "Sepal.Width" : 3.2,
##      "Petal.Length" : 1.6,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.8,
##      "Sepal.Width" : 3.1,
##      "Petal.Length" : 1.6,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.4,
##      "Sepal.Width" : 3.4,
##      "Petal.Length" : 1.5,
##      "Petal.Width" : 0.4,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.2,
##      "Sepal.Width" : 4.1,
##      "Petal.Length" : 1.5,
##      "Petal.Width" : 0.1,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.5,
##      "Sepal.Width" : 4.2,
##      "Petal.Length" : 1.4,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.9,
##      "Sepal.Width" : 3.1,
##      "Petal.Length" : 1.5,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5,
##      "Sepal.Width" : 3.2,
##      "Petal.Length" : 1.2,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.5,
##      "Sepal.Width" : 3.5,
##      "Petal.Length" : 1.3,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.9,
##      "Sepal.Width" : 3.6,
##      "Petal.Length" : 1.4,
##      "Petal.Width" : 0.1,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.4,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 1.3,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.1,
##      "Sepal.Width" : 3.4,
##      "Petal.Length" : 1.5,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5,
##      "Sepal.Width" : 3.5,
##      "Petal.Length" : 1.3,
##      "Petal.Width" : 0.3,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.5,
##      "Sepal.Width" : 2.3,
##      "Petal.Length" : 1.3,
##      "Petal.Width" : 0.3,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.4,
##      "Sepal.Width" : 3.2,
##      "Petal.Length" : 1.3,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5,
##      "Sepal.Width" : 3.5,
##      "Petal.Length" : 1.6,
##      "Petal.Width" : 0.6,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.1,
##      "Sepal.Width" : 3.8,
##      "Petal.Length" : 1.9,
##      "Petal.Width" : 0.4,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.8,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 1.4,
##      "Petal.Width" : 0.3,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.1,
##      "Sepal.Width" : 3.8,
##      "Petal.Length" : 1.6,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.6,
##      "Sepal.Width" : 3.2,
##      "Petal.Length" : 1.4,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.3,
##      "Sepal.Width" : 3.7,
##      "Petal.Length" : 1.5,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5,
##      "Sepal.Width" : 3.3,
##      "Petal.Length" : 1.4,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 7,
##      "Sepal.Width" : 3.2,
##      "Petal.Length" : 4.7,
##      "Petal.Width" : 1.4,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.4,
##      "Sepal.Width" : 3.2,
##      "Petal.Length" : 4.5,
##      "Petal.Width" : 1.5,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.9,
##      "Sepal.Width" : 3.1,
##      "Petal.Length" : 4.9,
##      "Petal.Width" : 1.5,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.5,
##      "Sepal.Width" : 2.3,
##      "Petal.Length" : 4,
##      "Petal.Width" : 1.3,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.5,
##      "Sepal.Width" : 2.8,
##      "Petal.Length" : 4.6,
##      "Petal.Width" : 1.5,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.7,
##      "Sepal.Width" : 2.8,
##      "Petal.Length" : 4.5,
##      "Petal.Width" : 1.3,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.3,
##      "Sepal.Width" : 3.3,
##      "Petal.Length" : 4.7,
##      "Petal.Width" : 1.6,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 4.9,
##      "Sepal.Width" : 2.4,
##      "Petal.Length" : 3.3,
##      "Petal.Width" : 1,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.6,
##      "Sepal.Width" : 2.9,
##      "Petal.Length" : 4.6,
##      "Petal.Width" : 1.3,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.2,
##      "Sepal.Width" : 2.7,
##      "Petal.Length" : 3.9,
##      "Petal.Width" : 1.4,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5,
##      "Sepal.Width" : 2,
##      "Petal.Length" : 3.5,
##      "Petal.Width" : 1,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.9,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 4.2,
##      "Petal.Width" : 1.5,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6,
##      "Sepal.Width" : 2.2,
##      "Petal.Length" : 4,
##      "Petal.Width" : 1,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.1,
##      "Sepal.Width" : 2.9,
##      "Petal.Length" : 4.7,
##      "Petal.Width" : 1.4,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.6,
##      "Sepal.Width" : 2.9,
##      "Petal.Length" : 3.6,
##      "Petal.Width" : 1.3,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.7,
##      "Sepal.Width" : 3.1,
##      "Petal.Length" : 4.4,
##      "Petal.Width" : 1.4,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.6,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 4.5,
##      "Petal.Width" : 1.5,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.8,
##      "Sepal.Width" : 2.7,
##      "Petal.Length" : 4.1,
##      "Petal.Width" : 1,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.2,
##      "Sepal.Width" : 2.2,
##      "Petal.Length" : 4.5,
##      "Petal.Width" : 1.5,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.6,
##      "Sepal.Width" : 2.5,
##      "Petal.Length" : 3.9,
##      "Petal.Width" : 1.1,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.9,
##      "Sepal.Width" : 3.2,
##      "Petal.Length" : 4.8,
##      "Petal.Width" : 1.8,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.1,
##      "Sepal.Width" : 2.8,
##      "Petal.Length" : 4,
##      "Petal.Width" : 1.3,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.3,
##      "Sepal.Width" : 2.5,
##      "Petal.Length" : 4.9,
##      "Petal.Width" : 1.5,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.1,
##      "Sepal.Width" : 2.8,
##      "Petal.Length" : 4.7,
##      "Petal.Width" : 1.2,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.4,
##      "Sepal.Width" : 2.9,
##      "Petal.Length" : 4.3,
##      "Petal.Width" : 1.3,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.6,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 4.4,
##      "Petal.Width" : 1.4,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.8,
##      "Sepal.Width" : 2.8,
##      "Petal.Length" : 4.8,
##      "Petal.Width" : 1.4,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.7,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 5,
##      "Petal.Width" : 1.7,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6,
##      "Sepal.Width" : 2.9,
##      "Petal.Length" : 4.5,
##      "Petal.Width" : 1.5,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.7,
##      "Sepal.Width" : 2.6,
##      "Petal.Length" : 3.5,
##      "Petal.Width" : 1,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.5,
##      "Sepal.Width" : 2.4,
##      "Petal.Length" : 3.8,
##      "Petal.Width" : 1.1,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.5,
##      "Sepal.Width" : 2.4,
##      "Petal.Length" : 3.7,
##      "Petal.Width" : 1,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.8,
##      "Sepal.Width" : 2.7,
##      "Petal.Length" : 3.9,
##      "Petal.Width" : 1.2,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6,
##      "Sepal.Width" : 2.7,
##      "Petal.Length" : 5.1,
##      "Petal.Width" : 1.6,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.4,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 4.5,
##      "Petal.Width" : 1.5,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6,
##      "Sepal.Width" : 3.4,
##      "Petal.Length" : 4.5,
##      "Petal.Width" : 1.6,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.7,
##      "Sepal.Width" : 3.1,
##      "Petal.Length" : 4.7,
##      "Petal.Width" : 1.5,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.3,
##      "Sepal.Width" : 2.3,
##      "Petal.Length" : 4.4,
##      "Petal.Width" : 1.3,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.6,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 4.1,
##      "Petal.Width" : 1.3,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.5,
##      "Sepal.Width" : 2.5,
##      "Petal.Length" : 4,
##      "Petal.Width" : 1.3,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.5,
##      "Sepal.Width" : 2.6,
##      "Petal.Length" : 4.4,
##      "Petal.Width" : 1.2,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.1,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 4.6,
##      "Petal.Width" : 1.4,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.8,
##      "Sepal.Width" : 2.6,
##      "Petal.Length" : 4,
##      "Petal.Width" : 1.2,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5,
##      "Sepal.Width" : 2.3,
##      "Petal.Length" : 3.3,
##      "Petal.Width" : 1,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.6,
##      "Sepal.Width" : 2.7,
##      "Petal.Length" : 4.2,
##      "Petal.Width" : 1.3,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.7,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 4.2,
##      "Petal.Width" : 1.2,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.7,
##      "Sepal.Width" : 2.9,
##      "Petal.Length" : 4.2,
##      "Petal.Width" : 1.3,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.2,
##      "Sepal.Width" : 2.9,
##      "Petal.Length" : 4.3,
##      "Petal.Width" : 1.3,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.1,
##      "Sepal.Width" : 2.5,
##      "Petal.Length" : 3,
##      "Petal.Width" : 1.1,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.7,
##      "Sepal.Width" : 2.8,
##      "Petal.Length" : 4.1,
##      "Petal.Width" : 1.3,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.3,
##      "Sepal.Width" : 3.3,
##      "Petal.Length" : 6,
##      "Petal.Width" : 2.5,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 5.8,
##      "Sepal.Width" : 2.7,
##      "Petal.Length" : 5.1,
##      "Petal.Width" : 1.9,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 7.1,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 5.9,
##      "Petal.Width" : 2.1,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.3,
##      "Sepal.Width" : 2.9,
##      "Petal.Length" : 5.6,
##      "Petal.Width" : 1.8,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.5,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 5.8,
##      "Petal.Width" : 2.2,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 7.6,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 6.6,
##      "Petal.Width" : 2.1,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 4.9,
##      "Sepal.Width" : 2.5,
##      "Petal.Length" : 4.5,
##      "Petal.Width" : 1.7,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 7.3,
##      "Sepal.Width" : 2.9,
##      "Petal.Length" : 6.3,
##      "Petal.Width" : 1.8,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.7,
##      "Sepal.Width" : 2.5,
##      "Petal.Length" : 5.8,
##      "Petal.Width" : 1.8,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 7.2,
##      "Sepal.Width" : 3.6,
##      "Petal.Length" : 6.1,
##      "Petal.Width" : 2.5,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.5,
##      "Sepal.Width" : 3.2,
##      "Petal.Length" : 5.1,
##      "Petal.Width" : 2,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.4,
##      "Sepal.Width" : 2.7,
##      "Petal.Length" : 5.3,
##      "Petal.Width" : 1.9,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.8,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 5.5,
##      "Petal.Width" : 2.1,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 5.7,
##      "Sepal.Width" : 2.5,
##      "Petal.Length" : 5,
##      "Petal.Width" : 2,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 5.8,
##      "Sepal.Width" : 2.8,
##      "Petal.Length" : 5.1,
##      "Petal.Width" : 2.4,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.4,
##      "Sepal.Width" : 3.2,
##      "Petal.Length" : 5.3,
##      "Petal.Width" : 2.3,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.5,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 5.5,
##      "Petal.Width" : 1.8,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 7.7,
##      "Sepal.Width" : 3.8,
##      "Petal.Length" : 6.7,
##      "Petal.Width" : 2.2,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 7.7,
##      "Sepal.Width" : 2.6,
##      "Petal.Length" : 6.9,
##      "Petal.Width" : 2.3,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6,
##      "Sepal.Width" : 2.2,
##      "Petal.Length" : 5,
##      "Petal.Width" : 1.5,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.9,
##      "Sepal.Width" : 3.2,
##      "Petal.Length" : 5.7,
##      "Petal.Width" : 2.3,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 5.6,
##      "Sepal.Width" : 2.8,
##      "Petal.Length" : 4.9,
##      "Petal.Width" : 2,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 7.7,
##      "Sepal.Width" : 2.8,
##      "Petal.Length" : 6.7,
##      "Petal.Width" : 2,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.3,
##      "Sepal.Width" : 2.7,
##      "Petal.Length" : 4.9,
##      "Petal.Width" : 1.8,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.7,
##      "Sepal.Width" : 3.3,
##      "Petal.Length" : 5.7,
##      "Petal.Width" : 2.1,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 7.2,
##      "Sepal.Width" : 3.2,
##      "Petal.Length" : 6,
##      "Petal.Width" : 1.8,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.2,
##      "Sepal.Width" : 2.8,
##      "Petal.Length" : 4.8,
##      "Petal.Width" : 1.8,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.1,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 4.9,
##      "Petal.Width" : 1.8,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.4,
##      "Sepal.Width" : 2.8,
##      "Petal.Length" : 5.6,
##      "Petal.Width" : 2.1,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 7.2,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 5.8,
##      "Petal.Width" : 1.6,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 7.4,
##      "Sepal.Width" : 2.8,
##      "Petal.Length" : 6.1,
##      "Petal.Width" : 1.9,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 7.9,
##      "Sepal.Width" : 3.8,
##      "Petal.Length" : 6.4,
##      "Petal.Width" : 2,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.4,
##      "Sepal.Width" : 2.8,
##      "Petal.Length" : 5.6,
##      "Petal.Width" : 2.2,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.3,
##      "Sepal.Width" : 2.8,
##      "Petal.Length" : 5.1,
##      "Petal.Width" : 1.5,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.1,
##      "Sepal.Width" : 2.6,
##      "Petal.Length" : 5.6,
##      "Petal.Width" : 1.4,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 7.7,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 6.1,
##      "Petal.Width" : 2.3,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.3,
##      "Sepal.Width" : 3.4,
##      "Petal.Length" : 5.6,
##      "Petal.Width" : 2.4,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.4,
##      "Sepal.Width" : 3.1,
##      "Petal.Length" : 5.5,
##      "Petal.Width" : 1.8,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 4.8,
##      "Petal.Width" : 1.8,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.9,
##      "Sepal.Width" : 3.1,
##      "Petal.Length" : 5.4,
##      "Petal.Width" : 2.1,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.7,
##      "Sepal.Width" : 3.1,
##      "Petal.Length" : 5.6,
##      "Petal.Width" : 2.4,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.9,
##      "Sepal.Width" : 3.1,
##      "Petal.Length" : 5.1,
##      "Petal.Width" : 2.3,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 5.8,
##      "Sepal.Width" : 2.7,
##      "Petal.Length" : 5.1,
##      "Petal.Width" : 1.9,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.8,
##      "Sepal.Width" : 3.2,
##      "Petal.Length" : 5.9,
##      "Petal.Width" : 2.3,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.7,
##      "Sepal.Width" : 3.3,
##      "Petal.Length" : 5.7,
##      "Petal.Width" : 2.5,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.7,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 5.2,
##      "Petal.Width" : 2.3,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.3,
##      "Sepal.Width" : 2.5,
##      "Petal.Length" : 5,
##      "Petal.Width" : 1.9,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.5,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 5.2,
##      "Petal.Width" : 2,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.2,
##      "Sepal.Width" : 3.4,
##      "Petal.Length" : 5.4,
##      "Petal.Width" : 2.3,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 5.9,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 5.1,
##      "Petal.Width" : 1.8,
##      "Species" : "virginica"
##  }
## ]

MySQL files

About:

Free and widely used open source database software
Widely used in internet-based applications
Data are structured in
- Databases
- Tables within databases
- Fields within tables
Each row is called a record

How do we connect to a database and find out what it contains? Example: the UCSC Genome Browser database - http://genome.ucsc.edu/goldenPath/help/mysql.html

library(RMySQL)
ucscDb = dbConnect(MySQL(),user="genome", host="genome-mysql.cse.ucsc.edu") # connect
result = dbGetQuery(ucscDb,"show databases;") # list the databases available on the UCSC server
dbDisconnect(ucscDb) # always disconnect!

## [1] TRUE
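
Forgetting to disconnect is easy when a query fails halfway, so one option is to wrap the work in a function that registers the cleanup with on.exit. The sketch below is only an illustration: with_connection is a hypothetical helper (not part of RMySQL or DBI), and a plain environment stands in for the database connection so the example runs offline.

```r
# Hypothetical helper (not from RMySQL/DBI): run work(con), then always call
# close_fn(con), even if work(con) stops with an error.
with_connection <- function(con, work, close_fn) {
  on.exit(close_fn(con), add = TRUE)
  work(con)
}

# Stand-in for a real connection, so the sketch runs without a database.
fake_con <- new.env()
fake_con$open <- TRUE
fake_close <- function(con) con$open <- FALSE

# The error is caught, but the cleanup registered with on.exit still runs.
outcome <- tryCatch(
  with_connection(fake_con, function(con) stop("query failed"), fake_close),
  error = function(e) conditionMessage(e)
)
outcome        # "query failed"
fake_con$open  # FALSE: the connection was closed anyway
```

With RMySQL the same pattern would presumably take the result of dbConnect as con, a function calling dbGetQuery as work, and dbDisconnect as close_fn.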

We can now see which databases are available on the UCSC server:

result

##               Database
## 1   information_schema
## 2              ailMel1
## 3              allMis1
## 4              anoCar1
## 5              anoCar2
## 6              anoGam1
## 7              apiMel1
## 8              apiMel2
## 9              aplCal1
## 10             balAcu1
## 11             bosTau2
## 12             bosTau3
## 13             bosTau4
## 14             bosTau5
## 15             bosTau6
## 16             bosTau7
## 17           bosTauMd3
## 18             braFlo1
## 19             caeJap1
## 20              caePb1
## 21              caePb2
## 22             caeRem2
## 23             caeRem3
## 24             calJac1
## 25             calJac3
## 26             calMil1
## 27             canFam1
## 28             canFam2
## 29             canFam3
## 30             cavPor3
## 31                 cb1
## 32                 cb3
## 33                ce10
## 34                 ce2
## 35                 ce4
## 36                 ce6
## 37             cerSim1
## 38             choHof1
## 39             chrPic1
## 40                 ci1
## 41                 ci2
## 42             criGri1
## 43             danRer1
## 44             danRer2
## 45             danRer3
## 46             danRer4
## 47             danRer5
## 48             danRer6
## 49             danRer7
## 50             dasNov3
## 51             dipOrd1
## 52                 dm1
## 53                 dm2
## 54                 dm3
## 55                 dp2
## 56                 dp3
## 57             droAna1
## 58             droAna2
## 59             droEre1
## 60             droGri1
## 61             droMoj1
## 62             droMoj2
## 63             droPer1
## 64             droSec1
## 65             droSim1
## 66             droVir1
## 67             droVir2
## 68             droYak1
## 69             droYak2
## 70             echTel1
## 71             echTel2
## 72             equCab1
## 73             equCab2
## 74             eriEur1
## 75             eriEur2
## 76             felCat3
## 77             felCat4
## 78             felCat5
## 79                 fr1
## 80                 fr2
## 81                 fr3
## 82             gadMor1
## 83             galGal2
## 84             galGal3
## 85             galGal4
## 86             gasAcu1
## 87             geoFor1
## 88                  go
## 89            go080130
## 90            go140213
## 91             gorGor3
## 92             hetGla1
## 93             hetGla2
## 94                hg16
## 95                hg17
## 96                hg18
## 97                hg19
## 98         hg19Patch10
## 99          hg19Patch2
## 100         hg19Patch5
## 101         hg19Patch9
## 102               hg38
## 103            hgFixed
## 104             hgTemp
## 105          hgcentral
## 106            latCha1
## 107            loxAfr3
## 108            macEug1
## 109            macEug2
## 110            melGal1
## 111            melUnd1
## 112            micMur1
## 113               mm10
## 114         mm10Patch1
## 115                mm5
## 116                mm6
## 117                mm7
## 118                mm8
## 119                mm9
## 120            monDom1
## 121            monDom4
## 122            monDom5
## 123            musFur1
## 124            myoLuc2
## 125            nomLeu1
## 126            nomLeu2
## 127            nomLeu3
## 128            ochPri2
## 129            oreNil1
## 130            oreNil2
## 131            ornAna1
## 132            oryCun2
## 133            oryLat2
## 134            otoGar3
## 135            oviAri1
## 136            oviAri3
## 137            panTro1
## 138            panTro2
## 139            panTro3
## 140            panTro4
## 141            papAnu2
## 142            papHam1
## 143 performance_schema
## 144            petMar1
## 145            petMar2
## 146            ponAbe2
## 147            priPac1
## 148            proCap1
## 149     proteins120806
## 150     proteins121210
## 151     proteins140122
## 152           proteome
## 153            pteVam1
## 154            rheMac1
## 155            rheMac2
## 156            rheMac3
## 157                rn3
## 158                rn4
## 159                rn5
## 160            sacCer1
## 161            sacCer2
## 162            sacCer3
## 163            saiBol1
## 164            sarHar1
## 165            sorAra1
## 166           sp120323
## 167           sp121210
## 168           sp140122
## 169            speTri2
## 170            strPur1
## 171            strPur2
## 172            susScr2
## 173            susScr3
## 174            taeGut1
## 175            taeGut2
## 176            tarSyr1
## 177               test
## 178            tetNig1
## 179            tetNig2
## 180            triMan1
## 181            tupBel1
## 182            turTru2
## 183            uniProt
## 184            vicPac1
## 185            vicPac2
## 186           visiGene
## 187            xenTro1
## 188            xenTro2
## 189            xenTro3

One of the databases (scroll down the list above and you'll find it) is hg19. Wanna see which tables are contained in the database hg19?

library(RMySQL)

## Loading required package: DBI

## MYSQL_HOME defined as C:\Program Files\MySQL\MySQL Server 5.6

hg19 = dbConnect(MySQL(),user="genome", db="hg19",
                    host="genome-mysql.cse.ucsc.edu")
allTables = dbListTables(hg19)

Note that we have not disconnected yet; we should do so once we are done collecting info from the database hg19. Let's look at the first tables in hg19 (there are a great many):

allTables[1:19]

##  [1] "HInv"                      "HInvGeneMrna"             
##  [3] "acembly"                   "acemblyClass"             
##  [5] "acemblyPep"                "affyCytoScan"             
##  [7] "affyExonProbeAmbiguous"    "affyExonProbeCore"        
##  [9] "affyExonProbeExtended"     "affyExonProbeFree"        
## [11] "affyExonProbeFull"         "affyExonProbesetAmbiguous"
## [13] "affyExonProbesetCore"      "affyExonProbesetExtended" 
## [15] "affyExonProbesetFree"      "affyExonProbesetFull"     
## [17] "affyGnf1h"                 "affyU133"                 
## [19] "affyU133Plus2"

Right, so maybe we want to read one of these tables, the 19th table, called “affyU133Plus2”:

affydata=dbReadTable(hg19, "affyU133Plus2")

We can also send queries to the database (list the fields of a table, count its rows, select a specific subset and compute its quantiles…). We should end by closing the connection:

dbListFields(hg19,"affyU133Plus2")

##  [1] "bin"         "matches"     "misMatches"  "repMatches"  "nCount"     
##  [6] "qNumInsert"  "qBaseInsert" "tNumInsert"  "tBaseInsert" "strand"     
## [11] "qName"       "qSize"       "qStart"      "qEnd"        "tName"      
## [16] "tSize"       "tStart"      "tEnd"        "blockCount"  "blockSizes" 
## [21] "qStarts"     "tStarts"

dbGetQuery(hg19, "select count(*) from affyU133Plus2")

##   count(*)
## 1    58463

query <- dbSendQuery(hg19, "select * from affyU133Plus2 where misMatches between 1 and 3")
affyMis <- fetch(query); quantile(affyMis$misMatches)

##   0%  25%  50%  75% 100% 
##    1    1    2    2    3

affyMisSmall <- fetch(query,n=10); dbClearResult(query);

## [1] TRUE

dim(affyMisSmall)

## [1] 10 22

dbDisconnect(hg19)

## [1] TRUE
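
To make clear what the two SQL statements above compute, here is a rough base R equivalent on a small made-up data frame. The values in toy are invented for illustration and have nothing to do with the real affyU133Plus2 table. Note that SQL's between includes both endpoints.

```r
# Toy stand-in for a database table; the values are invented.
toy <- data.frame(qName = paste0("probe", 1:6),
                  misMatches = c(0, 1, 2, 3, 4, 2))

# select count(*) from toy
nrow(toy)                    # 6

# select * from toy where misMatches between 1 and 3
# (between includes both endpoints, hence >= and <=)
toyMis <- toy[toy$misMatches >= 1 & toy$misMatches <= 3, ]
toyMis$misMatches            # 1 2 3 2
quantile(toyMis$misMatches)
```

The difference is where the work happens: with dbSendQuery the filtering runs on the database server and only the matching rows are transferred, whereas subsetting a data frame requires the whole table to be in memory first.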

HDF: hierarchical data format

About:

Used for storing large data sets
Supports storing a range of data types
The data is stored in groups containing zero or more datasets and metadata. Each group (can have subgroups) has a:
- group header with group name and list of attributes
- group symbol table with a list of objects in group
Datasets are multidimensional arrays of data elements with metadata
- Have a header with name, datatype, dataspace, and storage layout
- Have a data array with the data (think of it as a dataframe)
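
One way to picture this layout is as a nested R list, where sub-lists play the role of groups and matrices or arrays play the role of datasets. This is only an analogy in base R, not how HDF5 actually stores anything; the names are chosen purely for illustration.

```r
# Analogy only: lists act as groups, arrays as datasets,
# and attributes as the metadata attached to a dataset.
h5_like <- list(
  foo = list(                       # a group
    A = matrix(1:10, nrow = 5),     # a dataset inside the group
    foo_ex = list()                 # an empty subgroup
  ),
  baa = list()                      # an empty group
)
attr(h5_like$foo$A, "units") <- "counts"   # metadata on a dataset

str(h5_like)              # prints the hierarchy
dim(h5_like$foo$A)        # 5 2
```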

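To deal with this type of data, you first need to install the rhdf5 package from Bioconductor (only the first time you use it). The commands below are the installer as it was when this document was written; recent Bioconductor releases use BiocManager::install("rhdf5") instead: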

source("http://bioconductor.org/biocLite.R")
biocLite("rhdf5")

Then you load the rhdf5 library and create a fresh example file (removing any old copy first):

library(rhdf5)
file.remove("example.h5")

## [1] TRUE

created = h5createFile("example.h5")
created

## [1] TRUE

Calling h5createFile again on a file that already exists returns FALSE and leaves the file as it is:

h5createFile("example.h5")

## [1] FALSE

Remember that the data is stored in groups, which contain datasets. We start by creating a few example groups and then look at what the file contains once they exist:

created = h5createGroup("example.h5","foo")
created = h5createGroup("example.h5","baa")
created = h5createGroup("example.h5","foo/foo_ex")
h5ls("example.h5") # ls stands for list

##   group   name     otype dclass dim
## 0     /    baa H5I_GROUP           
## 1     /    foo H5I_GROUP           
## 2  /foo foo_ex H5I_GROUP

Now let's put datasets into those groups (foo, baa, and foo_ex, which is a subgroup of foo). We write two datasets to the group foo and leave the group baa empty.

A = matrix(1:10,nrow=5,ncol=2)
h5write(A, "example.h5","foo/A")
B = array(seq(0.1,2.0,by=0.1),dim=c(5,2,2))
h5write(B, "example.h5","foo/B")
h5ls("example.h5")

##   group   name       otype  dclass       dim
## 0     /    baa   H5I_GROUP                  
## 1     /    foo   H5I_GROUP                  
## 2  /foo      A H5I_DATASET INTEGER     5 x 2
## 3  /foo      B H5I_DATASET   FLOAT 5 x 2 x 2
## 4  /foo foo_ex   H5I_GROUP

What do we see? The file contains two groups: baa and foo. Group baa is empty. Group foo contains one subgroup (foo_ex) and two datasets, A and B (in practice these would often be data-frame-like objects). We can also write something to the root group:

df = data.frame(1L:5L,seq(0,1,length.out=5), c("ab","cde","fghi","a","s"), stringsAsFactors=FALSE)
h5write(df, "example.h5","df")
h5ls("example.h5")

##   group   name       otype   dclass       dim
## 0     /    baa   H5I_GROUP                   
## 1     /     df H5I_DATASET COMPOUND         5
## 2     /    foo   H5I_GROUP                   
## 3  /foo      A H5I_DATASET  INTEGER     5 x 2
## 4  /foo      B H5I_DATASET    FLOAT 5 x 2 x 2
## 5  /foo foo_ex   H5I_GROUP

Done with writing? Wanna read something (specifically: dataset A, which is inside group foo)?

h5read("example.h5","foo/A")

##      [,1] [,2]
## [1,]    1    6
## [2,]    2    7
## [3,]    3    8
## [4,]    4    9
## [5,]    5   10

What else is cool? Say we want to overwrite the first three elements of the first column of A (which is inside group foo). It's that easy:

h5write(c(12,13,14),"example.h5","foo/A",index=list(1:3,1))
h5read("example.h5","foo/A")

##      [,1] [,2]
## [1,]   12    6
## [2,]   13    7
## [3,]   14    8
## [4,]    4    9
## [5,]    5   10
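
The same index argument also works for reading, so you can pull out part of a dataset without loading all of it. The sketch below first shows in base R which cells index=list(1:3,1) refers to. Then, only if rhdf5 is installed and example.h5 from the steps above exists, it reads that block from the file. The guard and the name A_mem are additions for this sketch.

```r
# In-memory copy of A after the update above (column 1: 12, 13, 14, 4, 5).
A_mem <- matrix(c(12L, 13L, 14L, 4L, 5L, 6:10), nrow = 5, ncol = 2)

# index = list(1:3, 1) addresses rows 1 to 3 of column 1,
# i.e. the same cells as ordinary matrix indexing:
A_mem[1:3, 1]   # 12 13 14

# Reading only that block from the file (skipped when rhdf5 or the file is missing).
if (requireNamespace("rhdf5", quietly = TRUE) && file.exists("example.h5")) {
  part <- rhdf5::h5read("example.h5", "foo/A", index = list(1:3, 1))
  print(part)
}
```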

Reading APIs (application programming interfaces)

Most companies like Twitter or Facebook have an API through which you can download data, like
- what people are posting on Facebook
- what people are tweeting
You can usually get this info using GET requests with specific URLs (see, e.g., https://dev.twitter.com/docs/api/1/get/blocks/blocking)
We can use the httr package to get data from these websites
Usually the first step is creating an account (not a user account, but a developer account, see e.g. https://apps.twitter.com/)
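
A GET request is essentially a URL with the parameters appended after a question mark. As a minimal sketch in base R, the helper below assembles such a URL. build_get_url is a hypothetical name, and https://api.example.com/search with its q and count parameters is a made-up endpoint, not a real Twitter or Facebook API.

```r
# Hypothetical helper: append query parameters to a base URL.
build_get_url <- function(base, params) {
  # Percent-encode each value so spaces and symbols are safe in a URL.
  encoded <- vapply(params,
                    function(v) URLencode(as.character(v), reserved = TRUE),
                    character(1))
  paste0(base, "?", paste(names(params), encoded, sep = "=", collapse = "&"))
}

# Made-up endpoint and parameters, for illustration only.
example_url <- build_get_url("https://api.example.com/search",
                             list(q = "tidy data", count = 10))
example_url
# "https://api.example.com/search?q=tidy%20data&count=10"
```

httr's GET accepts such a URL directly; it also takes a query argument (a named list) that does the encoding for you.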

How to do it? Example:

library(httr)
myapp = oauth_app("twitter", key="yourConsumerKeyHere",secret="yourConsumerSecretHere")  # start the authorisation process; you get the consumer key from the API website
sig = sign_oauth1.0(myapp, token = "yourTokenHere", token_secret = "yourTokenSecretHere") # again, the necessary info is usually found on the app website (at least for Twitter)
homeTL = GET("https://api.twitter.com/1.1/statuses/home_timeline.json", sig) # the URL corresponds to the Twitter API and the data I'd like to get out (statuses on the home timeline); Twitter returns JSON data
json1 = content(homeTL) # extract the JSON data as nested R lists
json2 = jsonlite::fromJSON(jsonlite::toJSON(json1)) # the nested lists are hard to read, so it might be a good idea to use the jsonlite package to reformat them into something more readable
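
To see why the round trip through toJSON and fromJSON helps, the sketch below uses a small hand-written JSON string in place of a real API response. The fields id, text and user are invented, and no request is made. It assumes the jsonlite package is installed.

```r
library(jsonlite)

# Hand-written stand-in for an API response; the fields are invented.
raw_json <- '[
  {"id": 1, "text": "first post",  "user": {"name": "ana"}},
  {"id": 2, "text": "second post", "user": {"name": "rui"}}
]'

# Without simplification we get a list of lists, similar to what content() gives.
as_list <- fromJSON(raw_json, simplifyVector = FALSE)
as_list[[1]]$user$name     # "ana"

# With the default simplification we get a data frame, one row per record.
as_df <- fromJSON(raw_json)
as_df$text                 # "first post" "second post"
as_df$user$name            # nested objects become a nested data frame
```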

Interacting more directly with files

For most formats (SPSS data, Octave data, etc.) there are packages that do the reading for you. For interacting more directly with files:

file - open a connection to a text file
url - open a connection to a url
gzfile - open a connection to a .gz file
bzfile - open a connection to a .bz2 file
?connections for more information
Remember to close connections
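
As a minimal sketch of working with connections, the example below writes a few lines to a gzip-compressed temporary file and reads them back. The file contents and variable names are made up for illustration.

```r
# Write a few lines to a gzip-compressed temporary file.
path <- tempfile(fileext = ".gz")
con <- gzfile(path, open = "wt")
writeLines(c("id,value", "1,3.5", "2,4.1"), con)
close(con)                      # flushes and releases the file

# Read them back through a new connection.
con <- gzfile(path, open = "rt")
lines_back <- readLines(con)
close(con)

lines_back                      # "id,value" "1,3.5" "2,4.1"

# Functions that accept a connection can read from it directly;
# read.csv opens and closes an unopened connection itself.
df <- read.csv(gzfile(path))
unlink(path)                    # remove the temporary file
```

The same pattern applies to file, url and bzfile: open the connection, read or write, then close it.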