Getting data

This is an R Markdown document with examples of how to get data into R, based on Coursera's Getting and Cleaning Data course.


Raw data -> tidy data

We aim for reproducibility. So when you process raw data to obtain tidy data, you should end up with:


Using data.table instead of data.frame

You create it just like you would a data frame:

library(data.table)
## Warning: package 'data.table' was built under R version 3.0.3
DT = data.table(x=rnorm(9),y=rep(c("a","b","c"),each=3),z=rnorm(9))
DT
##          x y       z
## 1:  0.8060 a  0.4069
## 2:  0.5900 a  0.1419
## 3: -0.5667 a -0.6065
## 4:  0.1425 b -1.5311
## 5: -0.1231 b -0.3270
## 6: -0.5698 b -0.6895
## 7:  0.8695 c -0.7942
## 8: -0.8862 c -0.7771
## 9:  1.3449 c  0.4245

If you're concerned with memory space, you can check out the tables you are currently working with using the tables() function.

Row subsetting works much as it does with data frames. Column subsetting is quite different:

DT[,list(mean(x),sum(z))]
##        V1     V2
## 1: 0.1786 -3.752
DT[,table(y)]
## y
## a b c 
## 3 3 3
DT[, w:=z^2] # creates new variable w
##          x y       z       w
## 1:  0.8060 a  0.4069 0.16559
## 2:  0.5900 a  0.1419 0.02014
## 3: -0.5667 a -0.6065 0.36786
## 4:  0.1425 b -1.5311 2.34416
## 5: -0.1231 b -0.3270 0.10693
## 6: -0.5698 b -0.6895 0.47542
## 7:  0.8695 c -0.7942 0.63072
## 8: -0.8862 c -0.7771 0.60389
## 9:  1.3449 c  0.4245 0.18019

If you want a copy of a data table, use the copy function explicitly. If you just type DT2=DT, both names refer to the same table, so any change made to DT by reference (for example with :=) also shows up in DT2.
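
A minimal sketch of the difference, using a small made-up table (the names alias and snapshot are only illustrative):

```r
library(data.table)

DT <- data.table(x = 1:3)
alias <- DT            # no copy is made: both names refer to the same table
snapshot <- copy(DT)   # an independent copy

DT[, y := x * 2]       # add a column by reference

names(alias)           # the alias sees the new column
names(snapshot)        # the copy does not
```

The copy costs memory, so it is worth making only when you really need the original to stay untouched.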


Cool stuff: you can do more than one operation at once!

DT[,m:= {tmp <- (x+z); log2(tmp+5)}] # tmp = x+z is a temporary variable (not a column); m is set to log2(tmp+5)
##          x y       z       w     m
## 1:  0.8060 a  0.4069 0.16559 2.635
## 2:  0.5900 a  0.1419 0.02014 2.519
## 3: -0.5667 a -0.6065 0.36786 1.936
## 4:  0.1425 b -1.5311 2.34416 1.853
## 5: -0.1231 b -0.3270 0.10693 2.186
## 6: -0.5698 b -0.6895 0.47542 1.903
## 7:  0.8695 c -0.7942 0.63072 2.344
## 8: -0.8862 c -0.7771 0.60389 1.738
## 9:  1.3449 c  0.4245 0.18019 2.759

Aggregating is also cool:

DT[,a:=x>0] # create boolean variable
##          x y       z       w     m     a
## 1:  0.8060 a  0.4069 0.16559 2.635  TRUE
## 2:  0.5900 a  0.1419 0.02014 2.519  TRUE
## 3: -0.5667 a -0.6065 0.36786 1.936 FALSE
## 4:  0.1425 b -1.5311 2.34416 1.853  TRUE
## 5: -0.1231 b -0.3270 0.10693 2.186 FALSE
## 6: -0.5698 b -0.6895 0.47542 1.903 FALSE
## 7:  0.8695 c -0.7942 0.63072 2.344  TRUE
## 8: -0.8862 c -0.7771 0.60389 1.738 FALSE
## 9:  1.3449 c  0.4245 0.18019 2.759  TRUE
DT[,b:= mean(x+w),by=a] # rows where a is TRUE get mean(x+w) computed over the TRUE rows; rows where a is FALSE get mean(x+w) computed over the FALSE rows
##          x y       z       w     m     a       b
## 1:  0.8060 a  0.4069 0.16559 2.635  TRUE  1.4188
## 2:  0.5900 a  0.1419 0.02014 2.519  TRUE  1.4188
## 3: -0.5667 a -0.6065 0.36786 1.936 FALSE -0.1479
## 4:  0.1425 b -1.5311 2.34416 1.853  TRUE  1.4188
## 5: -0.1231 b -0.3270 0.10693 2.186 FALSE -0.1479
## 6: -0.5698 b -0.6895 0.47542 1.903 FALSE -0.1479
## 7:  0.8695 c -0.7942 0.63072 2.344  TRUE  1.4188
## 8: -0.8862 c -0.7771 0.60389 1.738 FALSE -0.1479
## 9:  1.3449 c  0.4245 0.18019 2.759  TRUE  1.4188
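
For comparison, here is a minimal sketch on a small made-up table (the column names mean_x and group_mean are illustrative). Grouping with by can either return one row per group or, combined with :=, add a column while keeping every row:

```r
library(data.table)

DT <- data.table(x = c(1, 2, 3, 4, 5, 6),
                 y = rep(c("a", "b", "c"), each = 2))

# One row per group: the mean of x within each value of y
groupMeans <- DT[, list(mean_x = mean(x)), by = y]
groupMeans

# := with by keeps all six rows and repeats the group mean on each of them
DT[, group_mean := mean(x), by = y]
DT
```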

Count the number of times each value appears using the special symbol .N:

set.seed(123);
DT = data.table(x=sample(letters[1:3], 1E5, TRUE))
DT[, .N, by=x] # much faster than doing a table of DT$x
##    x     N
## 1: a 33387
## 2: c 33201
## 3: b 33412

Very fast way to subset using a key:

DT = data.table(x=rep(c("a","b","c"),each=100), y=rnorm(300))
setkey(DT, x)
DT['a']
##      x        y
##   1: a  0.25959
##   2: a  0.91751
##   3: a -0.72232
##   4: a -0.80828
##   5: a -0.14135
##   6: a  2.25701
##   7: a -2.37955
##   8: a -0.45425
##   9: a -0.06007
##  10: a  0.86090
##  11: a -1.78466
##  12: a -0.13074
##  13: a -0.36984
##  14: a -0.18066
##  15: a -1.04973
##  16: a  0.37832
##  17: a -1.37079
##  18: a -0.31612
##  19: a  0.39435
##  20: a -1.68988
##  21: a -1.46234
##  22: a  2.55838
##  23: a  0.08789
##  24: a  1.73141
##  25: a  1.21513
##  26: a  0.29954
##  27: a -0.17246
##  28: a  1.13250
##  29: a  0.02320
##  30: a  1.33587
##  31: a -1.09879
##  32: a -0.58176
##  33: a  0.03892
##  34: a  1.07315
##  35: a  1.34970
##  36: a  1.19528
##  37: a -0.02218
##  38: a  0.69849
##  39: a  0.67241
##  40: a -0.79165
##  41: a -0.21791
##  42: a  0.02307
##  43: a  0.11539
##  44: a -0.27708
##  45: a  0.03688
##  46: a  0.47520
##  47: a  1.70749
##  48: a  1.07601
##  49: a -1.34571
##  50: a -1.44025
##  51: a -0.39393
##  52: a  0.58106
##  53: a -0.17079
##  54: a -0.90585
##  55: a  0.15621
##  56: a -0.37323
##  57: a -0.34587
##  58: a -0.35829
##  59: a -0.13307
##  60: a -0.08960
##  61: a  0.62793
##  62: a -1.42883
##  63: a  0.17255
##  64: a -0.79115
##  65: a  1.26204
##  66: a -0.26941
##  67: a  0.15698
##  68: a -0.76060
##  69: a  1.37060
##  70: a  0.03758
##  71: a  0.44949
##  72: a  2.78869
##  73: a -0.46849
##  74: a  1.01261
##  75: a -0.04374
##  76: a  1.40670
##  77: a  0.41993
##  78: a  0.31009
##  79: a  1.11905
##  80: a -1.29814
##  81: a -1.28248
##  82: a  1.65943
##  83: a  0.78375
##  84: a  0.57771
##  85: a -0.26725
##  86: a -0.64569
##  87: a -0.44953
##  88: a -0.82620
##  89: a  1.05504
##  90: a -0.87927
##  91: a -1.27713
##  92: a -0.63412
##  93: a  0.66470
##  94: a -0.50958
##  95: a  0.40736
##  96: a  1.67775
##  97: a -1.05206
##  98: a -0.63691
##  99: a  0.56539
## 100: a  0.38016
##      x        y

Merging two data tables fast using a key:

DT1 = data.table(x=c('a', 'a', 'b', 'dt1'), y=1:4)
DT2 = data.table(x=c('a', 'b', 'dt2'), z=5:7)
setkey(DT1, x); setkey(DT2, x)
merge(DT1, DT2)
##    x y z
## 1: a 1 5
## 2: a 2 5
## 3: b 3 6
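
Note that the rows with keys dt1 and dt2 were dropped: by default merge keeps only keys present in both tables. As a sketch, assuming you want to keep unmatched rows as well, the all argument of merge can be used (the names inner and outer are illustrative):

```r
library(data.table)

DT1 <- data.table(x = c("a", "a", "b", "dt1"), y = 1:4)
DT2 <- data.table(x = c("a", "b", "dt2"), z = 5:7)
setkey(DT1, x); setkey(DT2, x)

inner <- merge(DT1, DT2)              # only keys found in both tables
outer <- merge(DT1, DT2, all = TRUE)  # every key, with NA where there is no match
outer
```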

Also, it is much faster to read using fread (returns a data.table) than read.table (returns data.frame). Check it out:

big_df = data.frame(x=rnorm(1E6), y=rnorm(1E6))
file = tempfile()
write.table(big_df, file=file, row.names=FALSE, col.names=TRUE, sep="\t", quote=FALSE)
system.time(fread(file))
## 
Read 51.0% of 1000000 rows
Read 89.0% of 1000000 rows
Read 1000000 rows and 2 (of 2) columns from 0.035 GB file in 00:00:04
##    user  system elapsed 
##    3.28    0.01    3.30
system.time(read.table(file, header=TRUE, sep="\t"))
##    user  system elapsed 
##   22.27    0.38   22.64

Downloading files

Often a data analysis starts by downloading data from the internet, and this step should preferably be included in the script. It is important to know which directory you are working in and how to set the working directory, check whether a given directory exists, or create one:
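
A minimal sketch of those steps, using a hypothetical directory under tempdir() so that nothing in your project is touched (the names dataDir and oldDir are illustrative):

```r
# Hypothetical location, chosen only for this sketch
dataDir <- file.path(tempdir(), "data")

getwd()                        # show the current working directory

if (!file.exists(dataDir)) {   # does the directory exist?
  dir.create(dataDir)          # if not, create it
}

oldDir <- setwd(dataDir)       # move into it; setwd() returns the previous directory
basename(getwd())
setwd(oldDir)                  # move back
```

In a real script you would typically use a relative path such as "./data" instead.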


To download files you use the download.file() function. Example:

setInternet2(TRUE) # Windows only; not necessary on all systems
fileUrl = "https://data.baltimorecity.gov/api/views/dz54-2aru/rows.csv?accessType=DOWNLOAD"
download.file(fileUrl,destfile="./data/cameras.csv") # on some systems, you might have to add method="curl"
dateDownloaded = date() # it is good practice to record the date of the download

Read flat files (usually a plain text file or a binary file)

  1. read.table function

    * This is the main function for reading data into R
    * Flexible and robust but requires more parameters
    * Reads the data into RAM - big data can cause problems
    * Important parameters: file, header, sep, row.names, nrows
    * Other important parameters:
        * quote - tells R which characters delimit quoted values; quote="" means no quoting.
        * na.strings - sets the strings that represent a missing value.
        * nrows - how many rows of the file to read (e.g. nrows=10 reads 10 rows).
        * skip - number of lines to skip before starting to read.
    
  2. read.csv

  3. read.csv2
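
The read.table parameters listed above can be tried on a small file. This is a sketch with made-up contents written to a temporary file; the column names and the -999 missing-value code are assumptions for illustration:

```r
# Write a small, made-up file to a temporary location
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "exported for illustration only",
  "id;name;score",
  "1;Ann;3.5",
  "2;O'Brien;NA",
  "3;Cy;-999",
  "4;Dee;4.0"
), tmp)

scores <- read.table(tmp,
                     header = TRUE,                 # first line read holds the column names
                     sep = ";",                     # field separator
                     skip = 1,                      # skip the note on the first line
                     quote = "",                    # treat ' and " as ordinary characters
                     na.strings = c("NA", "-999"),  # both mark a missing value
                     nrows = 3)                     # read only the first 3 data rows
scores
```

Without quote="", the apostrophe in O'Brien would be read as the start of a quoted value.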


Example:

cameraData <- read.table("./data/cameras.csv",sep=",",header=TRUE)
head(cameraData)
##                          address direction      street  crossStreet
## 1       S CATON AVE & BENSON AVE       N/B   Caton Ave   Benson Ave
## 2       S CATON AVE & BENSON AVE       S/B   Caton Ave   Benson Ave
## 3 WILKENS AVE & PINE HEIGHTS AVE       E/B Wilkens Ave Pine Heights
## 4        THE ALAMEDA & E 33RD ST       S/B The Alameda      33rd St
## 5        E 33RD ST & THE ALAMEDA       E/B      E 33rd  The Alameda
## 6        ERDMAN AVE & N MACON ST       E/B      Erdman     Macon St
##                 intersection                      Location.1
## 1     Caton Ave & Benson Ave (39.2693779962, -76.6688185297)
## 2     Caton Ave & Benson Ave (39.2693157898, -76.6689698176)
## 3 Wilkens Ave & Pine Heights  (39.2720252302, -76.676960806)
## 4     The Alameda  & 33rd St (39.3285013141, -76.5953545714)
## 5      E 33rd  & The Alameda (39.3283410623, -76.5953594625)
## 6         Erdman  & Macon St (39.3068045671, -76.5593167803)

Excel files

Example: download+read

if(!file.exists("data")){dir.create("data")}
fileUrl = "https://data.baltimorecity.gov/api/views/dz54-2aru/rows.xlsx?accessType=DOWNLOAD"
download.file(fileUrl,destfile="./data/cameras.xlsx", mode="wb") # note the binary download mode (mode="wb") for these files; whether it is needed may depend on your system
## Error: unsupported URL scheme
dateDownloaded = date()
library(xlsx)
## Loading required package: rJava
## Loading required package: xlsxjars
cameraData = read.xlsx("./data/cameras.xlsx",sheetIndex=1,header=TRUE)
head(cameraData)
##                          address direction      street  crossStreet
## 1       S CATON AVE & BENSON AVE       N/B   Caton Ave   Benson Ave
## 2       S CATON AVE & BENSON AVE       S/B   Caton Ave   Benson Ave
## 3 WILKENS AVE & PINE HEIGHTS AVE       E/B Wilkens Ave Pine Heights
## 4        THE ALAMEDA & E 33RD ST       S/B The Alameda      33rd St
## 5        E 33RD ST & THE ALAMEDA       E/B      E 33rd  The Alameda
## 6        ERDMAN AVE & N MACON ST       E/B      Erdman     Macon St
##                 intersection                      Location.1
## 1     Caton Ave & Benson Ave (39.2693779962, -76.6688185297)
## 2     Caton Ave & Benson Ave (39.2693157898, -76.6689698176)
## 3 Wilkens Ave & Pine Heights  (39.2720252302, -76.676960806)
## 4     The Alameda  & 33rd St (39.3285013141, -76.5953545714)
## 5      E 33rd  & The Alameda (39.3283410623, -76.5953594625)
## 6         Erdman  & Macon St (39.3068045671, -76.5593167803)

XML files

About XML:

Example: Read file http://www.w3schools.com/xml/simple.xml

Let's load the data and see what it looks like:

library(XML)
fileUrl = "http://www.w3schools.com/xml/simple.xml" 
doc = xmlTreeParse(fileUrl,useInternalNodes=TRUE) # parses the document into R so that you can navigate it and extract parts of it
print(doc)
## <?xml version="1.0" encoding="UTF-8"?>
## <!-- Edited by XMLSpy -->
## <breakfast_menu>
##   <food>
##     <name>Belgian Waffles</name>
##     <price>$5.95</price>
##     <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
##     <calories>650</calories>
##   </food>
##   <food>
##     <name>Strawberry Belgian Waffles</name>
##     <price>$7.95</price>
##     <description>Light Belgian waffles covered with strawberries and whipped cream</description>
##     <calories>900</calories>
##   </food>
##   <food>
##     <name>Berry-Berry Belgian Waffles</name>
##     <price>$8.95</price>
##     <description>Light Belgian waffles covered with an assortment of fresh berries and whipped cream</description>
##     <calories>900</calories>
##   </food>
##   <food>
##     <name>French Toast</name>
##     <price>$4.50</price>
##     <description>Thick slices made from our homemade sourdough bread</description>
##     <calories>600</calories>
##   </food>
##   <food>
##     <name>Homestyle Breakfast</name>
##     <price>$6.95</price>
##     <description>Two eggs, bacon or sausage, toast, and our ever-popular hash browns</description>
##     <calories>950</calories>
##   </food>
## </breakfast_menu>
## 

We're actually interested in the root node:

rootNode = xmlRoot(doc)
print(rootNode)
## <breakfast_menu>
##   <food>
##     <name>Belgian Waffles</name>
##     <price>$5.95</price>
##     <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
##     <calories>650</calories>
##   </food>
##   <food>
##     <name>Strawberry Belgian Waffles</name>
##     <price>$7.95</price>
##     <description>Light Belgian waffles covered with strawberries and whipped cream</description>
##     <calories>900</calories>
##   </food>
##   <food>
##     <name>Berry-Berry Belgian Waffles</name>
##     <price>$8.95</price>
##     <description>Light Belgian waffles covered with an assortment of fresh berries and whipped cream</description>
##     <calories>900</calories>
##   </food>
##   <food>
##     <name>French Toast</name>
##     <price>$4.50</price>
##     <description>Thick slices made from our homemade sourdough bread</description>
##     <calories>600</calories>
##   </food>
##   <food>
##     <name>Homestyle Breakfast</name>
##     <price>$6.95</price>
##     <description>Two eggs, bacon or sausage, toast, and our ever-popular hash browns</description>
##     <calories>950</calories>
##   </food>
## </breakfast_menu>

We can check out its name, the names of the elements inside the root node or directly access parts of the document:

xmlName(rootNode)
## [1] "breakfast_menu"
names(rootNode)
##   food   food   food   food   food 
## "food" "food" "food" "food" "food"
rootNode[[1]][[2]]
## <price>$5.95</price>

We can programmatically extract parts of the file:

xmlSApply(rootNode,xmlValue)
##                                                                                                                     food 
##                               "Belgian Waffles$5.95Two of our famous Belgian Waffles with plenty of real maple syrup650" 
##                                                                                                                     food 
##                    "Strawberry Belgian Waffles$7.95Light Belgian waffles covered with strawberries and whipped cream900" 
##                                                                                                                     food 
## "Berry-Berry Belgian Waffles$8.95Light Belgian waffles covered with an assortment of fresh berries and whipped cream900" 
##                                                                                                                     food 
##                                                "French Toast$4.50Thick slices made from our homemade sourdough bread600" 
##                                                                                                                     food 
##                         "Homestyle Breakfast$6.95Two eggs, bacon or sausage, toast, and our ever-popular hash browns950"

The XPath function is quite useful to extract …

Example of usage:

xpathSApply(rootNode,"//name",xmlValue)
## [1] "Belgian Waffles"             "Strawberry Belgian Waffles" 
## [3] "Berry-Berry Belgian Waffles" "French Toast"               
## [5] "Homestyle Breakfast"
xpathSApply(rootNode,"//price",xmlValue)
## [1] "$5.95" "$7.95" "$8.95" "$4.50" "$6.95"
xpathSApply(rootNode,"//food[@class='french fries']",xmlValue)
## list()

The last instruction returns an empty list because no food element has a class attribute equal to 'french fries' (the @class predicate filters on an attribute, not on the name of the food). Let's look at an example where this type of instruction may be useful:

fileUrl = "http://espn.go.com/nfl/team/_/name/bal/baltimore-ravens"
doc = htmlTreeParse(fileUrl,useInternalNodes=TRUE)
teams = xpathSApply(doc,"//li[@class='team-name']",xmlValue)
print(teams)
##  [1] "San Francisco" "Dallas"        "Washington"    "New Orleans"  
##  [5] "Cincinnati"    "Pittsburgh"    "Cleveland"     "Carolina"     
##  [9] "Indianapolis"  "Tampa Bay"     "Atlanta"       "Cincinnati"   
## [13] "Pittsburgh"    "Tennessee"     "New Orleans"   "San Diego"    
## [17] "Miami"         "Jacksonville"  "Houston"       "Cleveland"
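
Because web pages change over time, the attribute filter can also be tried offline. This is a sketch on a made-up inline document in which the food elements do carry a class attribute (the document and the names xmlText, menu and sweet are illustrative):

```r
library(XML)

# Made-up document: each food element has a class attribute
xmlText <- '<menu>
  <food class="sweet"><name>Waffles</name></food>
  <food class="savory"><name>Hash browns</name></food>
  <food class="sweet"><name>French toast</name></food>
</menu>'
menu <- xmlParse(xmlText, asText = TRUE)

# Names of the foods whose class attribute equals "sweet"
sweet <- xpathSApply(menu, "//food[@class='sweet']/name", xmlValue)
sweet
```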

HTML files

You may want to do web scraping, i.e., programmatically extract data from the HTML code of websites.

Let's start by reading a webpage (http://scholar.google.com/citations?user=HI-I6C0AAAAJ&hl=en):

con = url("http://scholar.google.com/citations?user=HI-I6C0AAAAJ&hl=en") # open connection 
htmlCode = readLines(con) # read
## Warning: incomplete final line found on
## 'http://scholar.google.com/citations?user=HI-I6C0AAAAJ&hl=en'
close(con) # close connection

In this case, you get a character vector which is huge (which is why it is not displayed here). We can instead try using the XML package:

library(XML)
url = "http://scholar.google.com/citations?user=HI-I6C0AAAAJ&hl=en"
html_obtained_using_XML_library = htmlTreeParse(url, useInternalNodes=TRUE) # the parsed page is too big to print here

Then we can use the type of commands we saw before (see the XML files section):

xpathSApply(html_obtained_using_XML_library, "//title", xmlValue)
## [1] "Jeff Leek - Google Scholar Citations"
xpathSApply(html_obtained_using_XML_library, "//td[@id='col-citedby']", xmlValue)
##  [1] "Cited by" "416"      "303"      "278"      "181"      "159"     
##  [7] "149"      "137"      "126"      "119"      "48"       "45"      
## [13] "40"       "34"       "23"       "16"       "14"       "13"      
## [19] "12"       "10"       "7"

There's still another alternative, which is rather useful when the site requires a login and a password. The result from using this approach (in this case, no login and password are required) is the same as using the XML library:

library(httr); 
## Warning: package 'httr' was built under R version 3.0.3
html = GET("http://scholar.google.com/citations?user=HI-I6C0AAAAJ&hl=en")
pageText = content(html,as="text") # named pageText so that it does not mask httr's content() function
parsedHtml = htmlParse(pageText,asText=TRUE)

So now the parsedHtml looks exactly like html_obtained_using_XML_library. Again, we can use functions like xpathSApply. If the website does require login and password (example: http://httpbin.org/basic-auth/user/passwd), this is how to proceed:

pg2 = GET("http://httpbin.org/basic-auth/user/passwd", authenticate("user","passwd"))
pageText2 = content(pg2,as="text") # named pageText2 so that it does not mask httr's content() function
parsedHtml = htmlParse(pageText2,asText=TRUE)

Using handles, which preserve settings and cookies across several requests to the same site (so that, for example, you stay authenticated):

google = handle("http://google.com")
pg1 = GET(handle=google,path="/")
pg2 = GET(handle=google,path="search")

JSON (Javascript Object Notation) data

About JSON data:

How to read it:

We will open the example JSON data https://api.github.com/users/jtleek/repos

library(jsonlite)
## Warning: package 'jsonlite' was built under R version 3.0.3
jsonData = fromJSON("https://api.github.com/users/jtleek/repos") 

You actually get back a data frame! Want to look at the names of its columns?

colnames(jsonData)
##  [1] "id"                "name"              "full_name"        
##  [4] "owner"             "private"           "html_url"         
##  [7] "description"       "fork"              "url"              
## [10] "forks_url"         "keys_url"          "collaborators_url"
## [13] "teams_url"         "hooks_url"         "issue_events_url" 
## [16] "events_url"        "assignees_url"     "branches_url"     
## [19] "tags_url"          "blobs_url"         "git_tags_url"     
## [22] "git_refs_url"      "trees_url"         "statuses_url"     
## [25] "languages_url"     "stargazers_url"    "contributors_url" 
## [28] "subscribers_url"   "subscription_url"  "commits_url"      
## [31] "git_commits_url"   "comments_url"      "issue_comment_url"
## [34] "contents_url"      "compare_url"       "merges_url"       
## [37] "archive_url"       "downloads_url"     "issues_url"       
## [40] "pulls_url"         "milestones_url"    "notifications_url"
## [43] "labels_url"        "releases_url"      "created_at"       
## [46] "updated_at"        "pushed_at"         "git_url"          
## [49] "ssh_url"           "clone_url"         "svn_url"          
## [52] "homepage"          "size"              "stargazers_count" 
## [55] "watchers_count"    "language"          "has_issues"       
## [58] "has_downloads"     "has_wiki"          "forks_count"      
## [61] "mirror_url"        "open_issues_count" "forks"            
## [64] "open_issues"       "watchers"          "default_branch"

Now if you look at the original JSON file, you will see that there's a lot of info in the “owner” entry. Let's explore that:

class(jsonData$owner)
## [1] "data.frame"
names(jsonData$owner)
##  [1] "login"               "id"                  "avatar_url"         
##  [4] "gravatar_id"         "url"                 "html_url"           
##  [7] "followers_url"       "following_url"       "gists_url"          
## [10] "starred_url"         "subscriptions_url"   "organizations_url"  
## [13] "repos_url"           "events_url"          "received_events_url"
## [16] "type"                "site_admin"
jsonData$owner$gists_url
##  [1] "https://api.github.com/users/jtleek/gists{/gist_id}"
##  [2] "https://api.github.com/users/jtleek/gists{/gist_id}"
##  [3] "https://api.github.com/users/jtleek/gists{/gist_id}"
##  [4] "https://api.github.com/users/jtleek/gists{/gist_id}"
##  [5] "https://api.github.com/users/jtleek/gists{/gist_id}"
##  [6] "https://api.github.com/users/jtleek/gists{/gist_id}"
##  [7] "https://api.github.com/users/jtleek/gists{/gist_id}"
##  [8] "https://api.github.com/users/jtleek/gists{/gist_id}"
##  [9] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [10] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [11] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [12] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [13] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [14] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [15] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [16] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [17] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [18] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [19] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [20] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [21] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [22] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [23] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [24] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [25] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [26] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [27] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [28] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [29] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [30] "https://api.github.com/users/jtleek/gists{/gist_id}"
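
The same nesting can be reproduced without network access. This is a sketch on a made-up JSON string shaped like the GitHub response (the repository and login names are invented):

```r
library(jsonlite)

# Made-up JSON: an array of objects, each holding a nested "owner" object
txt <- '[
  {"name": "repo1", "owner": {"login": "alice", "id": 1}},
  {"name": "repo2", "owner": {"login": "alice", "id": 1}}
]'
repos <- fromJSON(txt)

class(repos)         # a data frame with one row per array element
class(repos$owner)   # the nested objects become a data-frame column
repos$owner$login

# flatten = TRUE spreads the nested fields into ordinary columns instead
flat <- fromJSON(txt, flatten = TRUE)
names(flat)
```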

You can also write data frames to JSON:

head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
myjson = toJSON(iris, pretty=TRUE)
print(myjson)
## [
##  {
##      "Sepal.Length" : 5.1,
##      "Sepal.Width" : 3.5,
##      "Petal.Length" : 1.4,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.9,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 1.4,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.7,
##      "Sepal.Width" : 3.2,
##      "Petal.Length" : 1.3,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.6,
##      "Sepal.Width" : 3.1,
##      "Petal.Length" : 1.5,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5,
##      "Sepal.Width" : 3.6,
##      "Petal.Length" : 1.4,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.4,
##      "Sepal.Width" : 3.9,
##      "Petal.Length" : 1.7,
##      "Petal.Width" : 0.4,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.6,
##      "Sepal.Width" : 3.4,
##      "Petal.Length" : 1.4,
##      "Petal.Width" : 0.3,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5,
##      "Sepal.Width" : 3.4,
##      "Petal.Length" : 1.5,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.4,
##      "Sepal.Width" : 2.9,
##      "Petal.Length" : 1.4,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.9,
##      "Sepal.Width" : 3.1,
##      "Petal.Length" : 1.5,
##      "Petal.Width" : 0.1,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.4,
##      "Sepal.Width" : 3.7,
##      "Petal.Length" : 1.5,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.8,
##      "Sepal.Width" : 3.4,
##      "Petal.Length" : 1.6,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.8,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 1.4,
##      "Petal.Width" : 0.1,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.3,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 1.1,
##      "Petal.Width" : 0.1,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.8,
##      "Sepal.Width" : 4,
##      "Petal.Length" : 1.2,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.7,
##      "Sepal.Width" : 4.4,
##      "Petal.Length" : 1.5,
##      "Petal.Width" : 0.4,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.4,
##      "Sepal.Width" : 3.9,
##      "Petal.Length" : 1.3,
##      "Petal.Width" : 0.4,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.1,
##      "Sepal.Width" : 3.5,
##      "Petal.Length" : 1.4,
##      "Petal.Width" : 0.3,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.7,
##      "Sepal.Width" : 3.8,
##      "Petal.Length" : 1.7,
##      "Petal.Width" : 0.3,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.1,
##      "Sepal.Width" : 3.8,
##      "Petal.Length" : 1.5,
##      "Petal.Width" : 0.3,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.4,
##      "Sepal.Width" : 3.4,
##      "Petal.Length" : 1.7,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.1,
##      "Sepal.Width" : 3.7,
##      "Petal.Length" : 1.5,
##      "Petal.Width" : 0.4,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.6,
##      "Sepal.Width" : 3.6,
##      "Petal.Length" : 1,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.1,
##      "Sepal.Width" : 3.3,
##      "Petal.Length" : 1.7,
##      "Petal.Width" : 0.5,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.8,
##      "Sepal.Width" : 3.4,
##      "Petal.Length" : 1.9,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 1.6,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5,
##      "Sepal.Width" : 3.4,
##      "Petal.Length" : 1.6,
##      "Petal.Width" : 0.4,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.2,
##      "Sepal.Width" : 3.5,
##      "Petal.Length" : 1.5,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.2,
##      "Sepal.Width" : 3.4,
##      "Petal.Length" : 1.4,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.7,
##      "Sepal.Width" : 3.2,
##      "Petal.Length" : 1.6,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.8,
##      "Sepal.Width" : 3.1,
##      "Petal.Length" : 1.6,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.4,
##      "Sepal.Width" : 3.4,
##      "Petal.Length" : 1.5,
##      "Petal.Width" : 0.4,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.2,
##      "Sepal.Width" : 4.1,
##      "Petal.Length" : 1.5,
##      "Petal.Width" : 0.1,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.5,
##      "Sepal.Width" : 4.2,
##      "Petal.Length" : 1.4,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.9,
##      "Sepal.Width" : 3.1,
##      "Petal.Length" : 1.5,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5,
##      "Sepal.Width" : 3.2,
##      "Petal.Length" : 1.2,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.5,
##      "Sepal.Width" : 3.5,
##      "Petal.Length" : 1.3,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.9,
##      "Sepal.Width" : 3.6,
##      "Petal.Length" : 1.4,
##      "Petal.Width" : 0.1,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.4,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 1.3,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.1,
##      "Sepal.Width" : 3.4,
##      "Petal.Length" : 1.5,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5,
##      "Sepal.Width" : 3.5,
##      "Petal.Length" : 1.3,
##      "Petal.Width" : 0.3,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.5,
##      "Sepal.Width" : 2.3,
##      "Petal.Length" : 1.3,
##      "Petal.Width" : 0.3,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.4,
##      "Sepal.Width" : 3.2,
##      "Petal.Length" : 1.3,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5,
##      "Sepal.Width" : 3.5,
##      "Petal.Length" : 1.6,
##      "Petal.Width" : 0.6,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.1,
##      "Sepal.Width" : 3.8,
##      "Petal.Length" : 1.9,
##      "Petal.Width" : 0.4,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.8,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 1.4,
##      "Petal.Width" : 0.3,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.1,
##      "Sepal.Width" : 3.8,
##      "Petal.Length" : 1.6,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 4.6,
##      "Sepal.Width" : 3.2,
##      "Petal.Length" : 1.4,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5.3,
##      "Sepal.Width" : 3.7,
##      "Petal.Length" : 1.5,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 5,
##      "Sepal.Width" : 3.3,
##      "Petal.Length" : 1.4,
##      "Petal.Width" : 0.2,
##      "Species" : "setosa"
##  },
##  {
##      "Sepal.Length" : 7,
##      "Sepal.Width" : 3.2,
##      "Petal.Length" : 4.7,
##      "Petal.Width" : 1.4,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.4,
##      "Sepal.Width" : 3.2,
##      "Petal.Length" : 4.5,
##      "Petal.Width" : 1.5,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.9,
##      "Sepal.Width" : 3.1,
##      "Petal.Length" : 4.9,
##      "Petal.Width" : 1.5,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.5,
##      "Sepal.Width" : 2.3,
##      "Petal.Length" : 4,
##      "Petal.Width" : 1.3,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.5,
##      "Sepal.Width" : 2.8,
##      "Petal.Length" : 4.6,
##      "Petal.Width" : 1.5,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.7,
##      "Sepal.Width" : 2.8,
##      "Petal.Length" : 4.5,
##      "Petal.Width" : 1.3,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.3,
##      "Sepal.Width" : 3.3,
##      "Petal.Length" : 4.7,
##      "Petal.Width" : 1.6,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 4.9,
##      "Sepal.Width" : 2.4,
##      "Petal.Length" : 3.3,
##      "Petal.Width" : 1,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.6,
##      "Sepal.Width" : 2.9,
##      "Petal.Length" : 4.6,
##      "Petal.Width" : 1.3,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.2,
##      "Sepal.Width" : 2.7,
##      "Petal.Length" : 3.9,
##      "Petal.Width" : 1.4,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5,
##      "Sepal.Width" : 2,
##      "Petal.Length" : 3.5,
##      "Petal.Width" : 1,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.9,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 4.2,
##      "Petal.Width" : 1.5,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6,
##      "Sepal.Width" : 2.2,
##      "Petal.Length" : 4,
##      "Petal.Width" : 1,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.1,
##      "Sepal.Width" : 2.9,
##      "Petal.Length" : 4.7,
##      "Petal.Width" : 1.4,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.6,
##      "Sepal.Width" : 2.9,
##      "Petal.Length" : 3.6,
##      "Petal.Width" : 1.3,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.7,
##      "Sepal.Width" : 3.1,
##      "Petal.Length" : 4.4,
##      "Petal.Width" : 1.4,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.6,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 4.5,
##      "Petal.Width" : 1.5,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.8,
##      "Sepal.Width" : 2.7,
##      "Petal.Length" : 4.1,
##      "Petal.Width" : 1,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.2,
##      "Sepal.Width" : 2.2,
##      "Petal.Length" : 4.5,
##      "Petal.Width" : 1.5,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.6,
##      "Sepal.Width" : 2.5,
##      "Petal.Length" : 3.9,
##      "Petal.Width" : 1.1,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.9,
##      "Sepal.Width" : 3.2,
##      "Petal.Length" : 4.8,
##      "Petal.Width" : 1.8,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.1,
##      "Sepal.Width" : 2.8,
##      "Petal.Length" : 4,
##      "Petal.Width" : 1.3,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.3,
##      "Sepal.Width" : 2.5,
##      "Petal.Length" : 4.9,
##      "Petal.Width" : 1.5,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.1,
##      "Sepal.Width" : 2.8,
##      "Petal.Length" : 4.7,
##      "Petal.Width" : 1.2,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.4,
##      "Sepal.Width" : 2.9,
##      "Petal.Length" : 4.3,
##      "Petal.Width" : 1.3,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.6,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 4.4,
##      "Petal.Width" : 1.4,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.8,
##      "Sepal.Width" : 2.8,
##      "Petal.Length" : 4.8,
##      "Petal.Width" : 1.4,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.7,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 5,
##      "Petal.Width" : 1.7,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6,
##      "Sepal.Width" : 2.9,
##      "Petal.Length" : 4.5,
##      "Petal.Width" : 1.5,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.7,
##      "Sepal.Width" : 2.6,
##      "Petal.Length" : 3.5,
##      "Petal.Width" : 1,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.5,
##      "Sepal.Width" : 2.4,
##      "Petal.Length" : 3.8,
##      "Petal.Width" : 1.1,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.5,
##      "Sepal.Width" : 2.4,
##      "Petal.Length" : 3.7,
##      "Petal.Width" : 1,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.8,
##      "Sepal.Width" : 2.7,
##      "Petal.Length" : 3.9,
##      "Petal.Width" : 1.2,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6,
##      "Sepal.Width" : 2.7,
##      "Petal.Length" : 5.1,
##      "Petal.Width" : 1.6,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.4,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 4.5,
##      "Petal.Width" : 1.5,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6,
##      "Sepal.Width" : 3.4,
##      "Petal.Length" : 4.5,
##      "Petal.Width" : 1.6,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.7,
##      "Sepal.Width" : 3.1,
##      "Petal.Length" : 4.7,
##      "Petal.Width" : 1.5,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.3,
##      "Sepal.Width" : 2.3,
##      "Petal.Length" : 4.4,
##      "Petal.Width" : 1.3,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.6,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 4.1,
##      "Petal.Width" : 1.3,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.5,
##      "Sepal.Width" : 2.5,
##      "Petal.Length" : 4,
##      "Petal.Width" : 1.3,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.5,
##      "Sepal.Width" : 2.6,
##      "Petal.Length" : 4.4,
##      "Petal.Width" : 1.2,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.1,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 4.6,
##      "Petal.Width" : 1.4,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.8,
##      "Sepal.Width" : 2.6,
##      "Petal.Length" : 4,
##      "Petal.Width" : 1.2,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5,
##      "Sepal.Width" : 2.3,
##      "Petal.Length" : 3.3,
##      "Petal.Width" : 1,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.6,
##      "Sepal.Width" : 2.7,
##      "Petal.Length" : 4.2,
##      "Petal.Width" : 1.3,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.7,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 4.2,
##      "Petal.Width" : 1.2,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.7,
##      "Sepal.Width" : 2.9,
##      "Petal.Length" : 4.2,
##      "Petal.Width" : 1.3,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.2,
##      "Sepal.Width" : 2.9,
##      "Petal.Length" : 4.3,
##      "Petal.Width" : 1.3,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.1,
##      "Sepal.Width" : 2.5,
##      "Petal.Length" : 3,
##      "Petal.Width" : 1.1,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 5.7,
##      "Sepal.Width" : 2.8,
##      "Petal.Length" : 4.1,
##      "Petal.Width" : 1.3,
##      "Species" : "versicolor"
##  },
##  {
##      "Sepal.Length" : 6.3,
##      "Sepal.Width" : 3.3,
##      "Petal.Length" : 6,
##      "Petal.Width" : 2.5,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 5.8,
##      "Sepal.Width" : 2.7,
##      "Petal.Length" : 5.1,
##      "Petal.Width" : 1.9,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 7.1,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 5.9,
##      "Petal.Width" : 2.1,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.3,
##      "Sepal.Width" : 2.9,
##      "Petal.Length" : 5.6,
##      "Petal.Width" : 1.8,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.5,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 5.8,
##      "Petal.Width" : 2.2,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 7.6,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 6.6,
##      "Petal.Width" : 2.1,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 4.9,
##      "Sepal.Width" : 2.5,
##      "Petal.Length" : 4.5,
##      "Petal.Width" : 1.7,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 7.3,
##      "Sepal.Width" : 2.9,
##      "Petal.Length" : 6.3,
##      "Petal.Width" : 1.8,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.7,
##      "Sepal.Width" : 2.5,
##      "Petal.Length" : 5.8,
##      "Petal.Width" : 1.8,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 7.2,
##      "Sepal.Width" : 3.6,
##      "Petal.Length" : 6.1,
##      "Petal.Width" : 2.5,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.5,
##      "Sepal.Width" : 3.2,
##      "Petal.Length" : 5.1,
##      "Petal.Width" : 2,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.4,
##      "Sepal.Width" : 2.7,
##      "Petal.Length" : 5.3,
##      "Petal.Width" : 1.9,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.8,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 5.5,
##      "Petal.Width" : 2.1,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 5.7,
##      "Sepal.Width" : 2.5,
##      "Petal.Length" : 5,
##      "Petal.Width" : 2,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 5.8,
##      "Sepal.Width" : 2.8,
##      "Petal.Length" : 5.1,
##      "Petal.Width" : 2.4,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.4,
##      "Sepal.Width" : 3.2,
##      "Petal.Length" : 5.3,
##      "Petal.Width" : 2.3,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.5,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 5.5,
##      "Petal.Width" : 1.8,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 7.7,
##      "Sepal.Width" : 3.8,
##      "Petal.Length" : 6.7,
##      "Petal.Width" : 2.2,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 7.7,
##      "Sepal.Width" : 2.6,
##      "Petal.Length" : 6.9,
##      "Petal.Width" : 2.3,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6,
##      "Sepal.Width" : 2.2,
##      "Petal.Length" : 5,
##      "Petal.Width" : 1.5,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.9,
##      "Sepal.Width" : 3.2,
##      "Petal.Length" : 5.7,
##      "Petal.Width" : 2.3,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 5.6,
##      "Sepal.Width" : 2.8,
##      "Petal.Length" : 4.9,
##      "Petal.Width" : 2,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 7.7,
##      "Sepal.Width" : 2.8,
##      "Petal.Length" : 6.7,
##      "Petal.Width" : 2,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.3,
##      "Sepal.Width" : 2.7,
##      "Petal.Length" : 4.9,
##      "Petal.Width" : 1.8,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.7,
##      "Sepal.Width" : 3.3,
##      "Petal.Length" : 5.7,
##      "Petal.Width" : 2.1,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 7.2,
##      "Sepal.Width" : 3.2,
##      "Petal.Length" : 6,
##      "Petal.Width" : 1.8,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.2,
##      "Sepal.Width" : 2.8,
##      "Petal.Length" : 4.8,
##      "Petal.Width" : 1.8,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.1,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 4.9,
##      "Petal.Width" : 1.8,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.4,
##      "Sepal.Width" : 2.8,
##      "Petal.Length" : 5.6,
##      "Petal.Width" : 2.1,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 7.2,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 5.8,
##      "Petal.Width" : 1.6,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 7.4,
##      "Sepal.Width" : 2.8,
##      "Petal.Length" : 6.1,
##      "Petal.Width" : 1.9,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 7.9,
##      "Sepal.Width" : 3.8,
##      "Petal.Length" : 6.4,
##      "Petal.Width" : 2,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.4,
##      "Sepal.Width" : 2.8,
##      "Petal.Length" : 5.6,
##      "Petal.Width" : 2.2,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.3,
##      "Sepal.Width" : 2.8,
##      "Petal.Length" : 5.1,
##      "Petal.Width" : 1.5,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.1,
##      "Sepal.Width" : 2.6,
##      "Petal.Length" : 5.6,
##      "Petal.Width" : 1.4,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 7.7,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 6.1,
##      "Petal.Width" : 2.3,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.3,
##      "Sepal.Width" : 3.4,
##      "Petal.Length" : 5.6,
##      "Petal.Width" : 2.4,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.4,
##      "Sepal.Width" : 3.1,
##      "Petal.Length" : 5.5,
##      "Petal.Width" : 1.8,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 4.8,
##      "Petal.Width" : 1.8,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.9,
##      "Sepal.Width" : 3.1,
##      "Petal.Length" : 5.4,
##      "Petal.Width" : 2.1,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.7,
##      "Sepal.Width" : 3.1,
##      "Petal.Length" : 5.6,
##      "Petal.Width" : 2.4,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.9,
##      "Sepal.Width" : 3.1,
##      "Petal.Length" : 5.1,
##      "Petal.Width" : 2.3,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 5.8,
##      "Sepal.Width" : 2.7,
##      "Petal.Length" : 5.1,
##      "Petal.Width" : 1.9,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.8,
##      "Sepal.Width" : 3.2,
##      "Petal.Length" : 5.9,
##      "Petal.Width" : 2.3,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.7,
##      "Sepal.Width" : 3.3,
##      "Petal.Length" : 5.7,
##      "Petal.Width" : 2.5,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.7,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 5.2,
##      "Petal.Width" : 2.3,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.3,
##      "Sepal.Width" : 2.5,
##      "Petal.Length" : 5,
##      "Petal.Width" : 1.9,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.5,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 5.2,
##      "Petal.Width" : 2,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 6.2,
##      "Sepal.Width" : 3.4,
##      "Petal.Length" : 5.4,
##      "Petal.Width" : 2.3,
##      "Species" : "virginica"
##  },
##  {
##      "Sepal.Length" : 5.9,
##      "Sepal.Width" : 3,
##      "Petal.Length" : 5.1,
##      "Petal.Width" : 1.8,
##      "Species" : "virginica"
##  }
## ]

MySQL files

About:


How do you connect to a database and find out what it contains? Example: the UCSC genome database - http://genome.ucsc.edu/goldenPath/help/mysql.html

library(RMySQL)
ucscDb = dbConnect(MySQL(),user="genome", host="genome-mysql.cse.ucsc.edu") # connect
result = dbGetQuery(ucscDb,"show databases;"); # list the databases available on the UCSC server
dbDisconnect(ucscDb); # always disconnect!
## [1] TRUE
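
The "always disconnect" rule is easy to forget when a query fails halfway. A minimal sketch of one way to guarantee the cleanup, assuming a hypothetical helper called with_connection (it is not part of DBI or RMySQL) and a stand-in "connection" so that the example runs without a database server:

```r
# Hypothetical helper (not part of DBI or RMySQL): open a connection,
# run `work` on it, and close it afterwards whether or not `work` fails.
with_connection <- function(open_fn, close_fn, work) {
  con <- open_fn()
  on.exit(close_fn(con), add = TRUE)  # runs on normal return and on error
  work(con)
}

# Stand-in "connection" that only records whether it is open.
state <- new.env()
state$open <- FALSE
fake_open  <- function() { state$open <- TRUE; "fake-connection" }
fake_close <- function(con) { state$open <- FALSE; invisible(TRUE) }

# Normal case: the result is returned and the connection is closed.
ok <- with_connection(fake_open, fake_close, function(con) toupper(con))
open_after_ok <- state$open

# Failing case: the error still propagates, but the connection is closed.
failed <- tryCatch(
  with_connection(fake_open, fake_close, function(con) stop("query failed")),
  error = function(e) conditionMessage(e)
)
open_after_error <- state$open
```

With RMySQL loaded, the same pattern would presumably be called with function() dbConnect(MySQL(), user="genome", host="genome-mysql.cse.ucsc.edu") as open_fn, dbDisconnect as close_fn, and function(con) dbGetQuery(con, "show databases;") as work.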

We can now see which databases are included in the UCSC database:

result
##               Database
## 1   information_schema
## 2              ailMel1
## 3              allMis1
## 4              anoCar1
## 5              anoCar2
## 6              anoGam1
## 7              apiMel1
## 8              apiMel2
## 9              aplCal1
## 10             balAcu1
## 11             bosTau2
## 12             bosTau3
## 13             bosTau4
## 14             bosTau5
## 15             bosTau6
## 16             bosTau7
## 17           bosTauMd3
## 18             braFlo1
## 19             caeJap1
## 20              caePb1
## 21              caePb2
## 22             caeRem2
## 23             caeRem3
## 24             calJac1
## 25             calJac3
## 26             calMil1
## 27             canFam1
## 28             canFam2
## 29             canFam3
## 30             cavPor3
## 31                 cb1
## 32                 cb3
## 33                ce10
## 34                 ce2
## 35                 ce4
## 36                 ce6
## 37             cerSim1
## 38             choHof1
## 39             chrPic1
## 40                 ci1
## 41                 ci2
## 42             criGri1
## 43             danRer1
## 44             danRer2
## 45             danRer3
## 46             danRer4
## 47             danRer5
## 48             danRer6
## 49             danRer7
## 50             dasNov3
## 51             dipOrd1
## 52                 dm1
## 53                 dm2
## 54                 dm3
## 55                 dp2
## 56                 dp3
## 57             droAna1
## 58             droAna2
## 59             droEre1
## 60             droGri1
## 61             droMoj1
## 62             droMoj2
## 63             droPer1
## 64             droSec1
## 65             droSim1
## 66             droVir1
## 67             droVir2
## 68             droYak1
## 69             droYak2
## 70             echTel1
## 71             echTel2
## 72             equCab1
## 73             equCab2
## 74             eriEur1
## 75             eriEur2
## 76             felCat3
## 77             felCat4
## 78             felCat5
## 79                 fr1
## 80                 fr2
## 81                 fr3
## 82             gadMor1
## 83             galGal2
## 84             galGal3
## 85             galGal4
## 86             gasAcu1
## 87             geoFor1
## 88                  go
## 89            go080130
## 90            go140213
## 91             gorGor3
## 92             hetGla1
## 93             hetGla2
## 94                hg16
## 95                hg17
## 96                hg18
## 97                hg19
## 98         hg19Patch10
## 99          hg19Patch2
## 100         hg19Patch5
## 101         hg19Patch9
## 102               hg38
## 103            hgFixed
## 104             hgTemp
## 105          hgcentral
## 106            latCha1
## 107            loxAfr3
## 108            macEug1
## 109            macEug2
## 110            melGal1
## 111            melUnd1
## 112            micMur1
## 113               mm10
## 114         mm10Patch1
## 115                mm5
## 116                mm6
## 117                mm7
## 118                mm8
## 119                mm9
## 120            monDom1
## 121            monDom4
## 122            monDom5
## 123            musFur1
## 124            myoLuc2
## 125            nomLeu1
## 126            nomLeu2
## 127            nomLeu3
## 128            ochPri2
## 129            oreNil1
## 130            oreNil2
## 131            ornAna1
## 132            oryCun2
## 133            oryLat2
## 134            otoGar3
## 135            oviAri1
## 136            oviAri3
## 137            panTro1
## 138            panTro2
## 139            panTro3
## 140            panTro4
## 141            papAnu2
## 142            papHam1
## 143 performance_schema
## 144            petMar1
## 145            petMar2
## 146            ponAbe2
## 147            priPac1
## 148            proCap1
## 149     proteins120806
## 150     proteins121210
## 151     proteins140122
## 152           proteome
## 153            pteVam1
## 154            rheMac1
## 155            rheMac2
## 156            rheMac3
## 157                rn3
## 158                rn4
## 159                rn5
## 160            sacCer1
## 161            sacCer2
## 162            sacCer3
## 163            saiBol1
## 164            sarHar1
## 165            sorAra1
## 166           sp120323
## 167           sp121210
## 168           sp140122
## 169            speTri2
## 170            strPur1
## 171            strPur2
## 172            susScr2
## 173            susScr3
## 174            taeGut1
## 175            taeGut2
## 176            tarSyr1
## 177               test
## 178            tetNig1
## 179            tetNig2
## 180            triMan1
## 181            tupBel1
## 182            turTru2
## 183            uniProt
## 184            vicPac1
## 185            vicPac2
## 186           visiGene
## 187            xenTro1
## 188            xenTro2
## 189            xenTro3
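
Scrolling through 189 rows is tedious, so it can help to filter the list instead. A small sketch, assuming a hand-made data frame (result_demo, a hypothetical name) holding a few of the database names shown above; with a live connection you would use result itself:

```r
# A few names copied from the listing above; with a live connection you
# would use result$Database instead of this hand-made data frame.
result_demo <- data.frame(
  Database = c("information_schema", "hg16", "hg17", "hg18", "hg19",
               "hg19Patch10", "hg38", "hgFixed", "mm9", "mm10"),
  stringsAsFactors = FALSE
)

# Human assemblies only: "hg" followed by digits and nothing else.
human <- grep("^hg[0-9]+$", result_demo$Database, value = TRUE)
human

# Check that a particular database exists before connecting to it.
has_hg19 <- "hg19" %in% result_demo$Database
has_hg19
```

The anchored pattern keeps hg19 but drops entries such as hg19Patch10 and hgFixed, which start with the same letters.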

One of the databases (row 97 in the list above) is hg19. Want to see which tables the hg19 database contains?

library(RMySQL)
## Loading required package: DBI
## MYSQL_HOME defined as C:\Program Files\MySQL\MySQL Server 5.6
hg19 = dbConnect(MySQL(),user="genome", db="hg19",
                    host="genome-mysql.cse.ucsc.edu")
allTables = dbListTables(hg19)

Note that we have not disconnected yet; we should do so once we are done collecting information from hg19. Let's look at the first few tables in hg19 (there are a great many):

allTables[1:19]
##  [1] "HInv"                      "HInvGeneMrna"             
##  [3] "acembly"                   "acemblyClass"             
##  [5] "acemblyPep"                "affyCytoScan"             
##  [7] "affyExonProbeAmbiguous"    "affyExonProbeCore"        
##  [9] "affyExonProbeExtended"     "affyExonProbeFree"        
## [11] "affyExonProbeFull"         "affyExonProbesetAmbiguous"
## [13] "affyExonProbesetCore"      "affyExonProbesetExtended" 
## [15] "affyExonProbesetFree"      "affyExonProbesetFull"     
## [17] "affyGnf1h"                 "affyU133"                 
## [19] "affyU133Plus2"

Right, so maybe we want to read one of these tables, the 19th table, called “affyU133Plus2”:

affydata=dbReadTable(hg19, "affyU133Plus2")

We can also query the table directly (list its fields, count its rows, select a specific subset and summarise it…). We should end by closing the connection:

dbListFields(hg19,"affyU133Plus2")
##  [1] "bin"         "matches"     "misMatches"  "repMatches"  "nCount"     
##  [6] "qNumInsert"  "qBaseInsert" "tNumInsert"  "tBaseInsert" "strand"     
## [11] "qName"       "qSize"       "qStart"      "qEnd"        "tName"      
## [16] "tSize"       "tStart"      "tEnd"        "blockCount"  "blockSizes" 
## [21] "qStarts"     "tStarts"
dbGetQuery(hg19, "select count(*) from affyU133Plus2")
##   count(*)
## 1    58463
query <- dbSendQuery(hg19, "select * from affyU133Plus2 where misMatches between 1 and 3")
affyMis <- fetch(query); quantile(affyMis$misMatches)
##   0%  25%  50%  75% 100% 
##    1    1    2    2    3
affyMisSmall <- fetch(query,n=10); dbClearResult(query);
## [1] TRUE
dim(affyMisSmall)
## [1] 10 22
dbDisconnect(hg19)
## [1] TRUE
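
If you run the same kind of range query for several columns or bounds, the SQL string can be assembled in R first. A minimal sketch, assuming a hypothetical helper range_query (it is not part of DBI or RMySQL) and trusted table and column names, since it does no escaping:

```r
# Hypothetical helper, not part of DBI or RMySQL: build a range query.
# Only use it with trusted table/column names; nothing is escaped here.
range_query <- function(table, column, lower, upper) {
  sprintf("select * from %s where %s between %d and %d",
          table, column, as.integer(lower), as.integer(upper))
}

sql <- range_query("affyU133Plus2", "misMatches", 1, 3)
sql
```

The resulting string is the same query used above and could be passed to dbSendQuery or dbGetQuery on an open connection.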

HDF: hierarchical data format

About:


To deal with this type of data, you first need to install a couple of things (first time you use it):

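source("http://bioconductor.org/biocLite.R")
biocLite("rhdf5")
# biocLite() has since been retired; on current R versions use instead:
# install.packages("BiocManager"); BiocManager::install("rhdf5")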

Then you load the rhdf5 library and create an empty HDF5 file (removing any leftover copy first):

library(rhdf5)
file.remove("example.h5")
## [1] TRUE
created = h5createFile("example.h5")
created
## [1] TRUE

We will use this example file to see how to deal with this type of data. Calling h5createFile again returns FALSE, because the file already exists:

h5createFile("example.h5")
## [1] FALSE

Remember that the data is stored in groups, which contain datasets. We start by creating a few example groups and then look at what the file contains once they exist:

created = h5createGroup("example.h5","foo")
created = h5createGroup("example.h5","baa")
created = h5createGroup("example.h5","foo/foo_ex")
h5ls("example.h5") # ls stands for list
##   group   name     otype dclass dim
## 0     /    baa H5I_GROUP           
## 1     /    foo H5I_GROUP           
## 2  /foo foo_ex H5I_GROUP

Now let's put datasets into those groups (foo, baa, and foo_ex, which is a subgroup of foo). We write two datasets to the group foo, leaving the group baa empty.

A = matrix(1:10,nrow=5,ncol=2)
h5write(A, "example.h5","foo/A")
B = array(seq(0.1,2.0,by=0.1),dim=c(5,2,2))
h5write(B, "example.h5","foo/B")
h5ls("example.h5")
##   group   name       otype  dclass       dim
## 0     /    baa   H5I_GROUP                  
## 1     /    foo   H5I_GROUP                  
## 2  /foo      A H5I_DATASET INTEGER     5 x 2
## 3  /foo      B H5I_DATASET   FLOAT 5 x 2 x 2
## 4  /foo foo_ex   H5I_GROUP

What do we see? The file contains two groups: baa and foo. Group baa is empty. Group foo contains one subgroup (foo_ex) and two datasets, A and B (in practice these would often be data-frame-like objects). We can also write something directly to the root group:

df = data.frame(1L:5L,seq(0,1,length.out=5), c("ab","cde","fghi","a","s"), stringsAsFactors=FALSE)
h5write(df, "example.h5","df")
h5ls("example.h5")
##   group   name       otype   dclass       dim
## 0     /    baa   H5I_GROUP                   
## 1     /     df H5I_DATASET COMPOUND         5
## 2     /    foo   H5I_GROUP                   
## 3  /foo      A H5I_DATASET  INTEGER     5 x 2
## 4  /foo      B H5I_DATASET    FLOAT 5 x 2 x 2
## 5  /foo foo_ex   H5I_GROUP
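
The df dataset is listed as COMPOUND, most likely because its columns have different types. A quick base R check, assuming a variant of the data frame with explicit column names (id, frac and label are made-up names; the original call leaves R to generate awkward names automatically):

```r
# Same values as df above, but with explicit (assumed) column names.
df_named <- data.frame(
  id    = 1L:5L,
  frac  = seq(0, 1, length.out = 5),
  label = c("ab", "cde", "fghi", "a", "s"),
  stringsAsFactors = FALSE
)

# One class per column: integer, numeric and character.
col_types <- sapply(df_named, class)
col_types

# Five rows, matching the dim of 5 reported by h5ls.
n_rows <- nrow(df_named)
n_rows
```

Naming the columns before writing should also make the stored fields easier to recognise when the file is read back.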

Done with writing? Want to read something back (specifically: dataset A, which is inside group foo)?

h5read("example.h5","foo/A")
##      [,1] [,2]
## [1,]    1    6
## [2,]    2    7
## [3,]    3    8
## [4,]    4    9
## [5,]    5   10
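
The values come back column by column because that is how R fills matrices and arrays. A short base R sketch that rebuilds A and B in memory (no HDF5 file involved) to show where individual values end up:

```r
# Rebuild the two objects in memory, exactly as they were defined above.
A <- matrix(1:10, nrow = 5, ncol = 2)
B <- array(seq(0.1, 2.0, by = 0.1), dim = c(5, 2, 2))

# R fills the first dimension fastest (column-major order).
second_col <- A[, 2]      # the values 6 to 10
b_121 <- B[1, 2, 1]       # 6th value of the sequence
b_112 <- B[1, 1, 2]       # 11th value of the sequence

second_col
dim(B)                    # matches the 5 x 2 x 2 shown by h5ls
```

So in B the first index moves fastest, then the second, then the third.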

What else is cool? Say we want to overwrite the first three elements of the first column of A (which is inside group foo). It's that easy:

h5write(c(12,13,14),"example.h5","foo/A",index=list(1:3,1))
h5read("example.h5","foo/A")
##      [,1] [,2]
## [1,]   12    6
## [2,]   13    7
## [3,]   14    8
## [4,]    4    9
## [5,]    5   10
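
The index argument follows the same logic as ordinary R subsetting: one entry per dimension, here rows 1 to 3 and column 1. As a sketch, the equivalent in-memory operation on a plain matrix (no HDF5 file involved) looks like this:

```r
# Start from the matrix as it was originally written to foo/A.
A <- matrix(1:10, nrow = 5, ncol = 2)

# Same rows and column as index = list(1:3, 1) in the h5write call.
A[1:3, 1] <- c(12, 13, 14)
A
```

The printed matrix matches what h5read returns above. As far as I know, h5read accepts the same index argument, which would let you read only part of a dataset instead of loading all of it.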

Reading APIs (application programming interfaces)

How to do it? Example:

library(httr) # provides oauth_app, sign_oauth1.0, GET and content
myapp = oauth_app("twitter", key="yourConsumerKeyHere",secret="yourConsumerSecretHere")  # start the authorisation process; you get the consumer key and secret from the API website
sig = sign_oauth1.0(myapp, token = "yourTokenHere", token_secret = "yourTokenSecretHere") # again, the necessary info is usually found on the app website (at least for Twitter)
homeTL = GET("https://api.twitter.com/1.1/statuses/home_timeline.json", sig) # the URL selects the Twitter API endpoint and the data we want (statuses on the home timeline); Twitter returns JSON
json1 = content(homeTL) # extract the JSON data as a nested R list
json2 = jsonlite::fromJSON(jsonlite::toJSON(json1)) # the nested list is hard to read, so round-trip it through the jsonlite package to get something more readable

Interacting more directly with files

For most formats (SPSS data, Octave data, etc.) there are packages for that. For interacting more directly with files: