This is an R Markdown document with examples of how to get data using R. It is based on Coursera's Getting and Cleaning Data course.
We aim at reproducibility. So when you treat raw data to obtain tidy data, you should end up with:
You create a data table just like you would a data frame:
library(data.table)
## Warning: package 'data.table' was built under R version 3.0.3
DT = data.table(x=rnorm(9),y=rep(c("a","b","c"),each=3),z=rnorm(9))
DT
## x y z
## 1: 0.8060 a 0.4069
## 2: 0.5900 a 0.1419
## 3: -0.5667 a -0.6065
## 4: 0.1425 b -1.5311
## 5: -0.1231 b -0.3270
## 6: -0.5698 b -0.6895
## 7: 0.8695 c -0.7942
## 8: -0.8862 c -0.7771
## 9: 1.3449 c 0.4245
If you're concerned with memory space, you can check out the tables you are currently working with using the tables() function.
Row subsetting is the same as with data frames. Column subsetting is quite different:
DT[,list(mean(x),sum(z))]
## V1 V2
## 1: 0.1786 -3.752
DT[,table(y)]
## y
## a b c
## 3 3 3
DT[, w:=z^2] # creates new variable w
## x y z w
## 1: 0.8060 a 0.4069 0.16559
## 2: 0.5900 a 0.1419 0.02014
## 3: -0.5667 a -0.6065 0.36786
## 4: 0.1425 b -1.5311 2.34416
## 5: -0.1231 b -0.3270 0.10693
## 6: -0.5698 b -0.6895 0.47542
## 7: 0.8695 c -0.7942 0.63072
## 8: -0.8862 c -0.7771 0.60389
## 9: 1.3449 c 0.4245 0.18019
If you want to create a copy of a data table, explicitly use the function copy(). If you just type DT2=DT, both names refer to the same table, so any change made to DT by reference (for example with :=) also shows up in DT2.
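A minimal sketch of the difference between plain assignment and copy(), using a small made-up table (the names DT_alias and DT_copy are hypothetical, chosen only for illustration):

```r
library(data.table)

DT <- data.table(x = 1:3)
DT_alias <- DT        # plain assignment: both names point to the same table
DT_copy  <- copy(DT)  # explicit copy: an independent table

DT[, y := x * 2]      # := modifies DT by reference

names(DT_alias)       # includes the new column y
names(DT_copy)        # still only x
```

The alias picks up the new column because no data was duplicated, while the explicit copy is left untouched.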
Cool stuff: you can do more than one operation at once!
DT[,m:= {tmp <- (x+z); log2(tmp+5)}] # first compute the temporary value tmp=x+z (it is not stored as a column), then assign log2(tmp+5) to the new column m
## x y z w m
## 1: 0.8060 a 0.4069 0.16559 2.635
## 2: 0.5900 a 0.1419 0.02014 2.519
## 3: -0.5667 a -0.6065 0.36786 1.936
## 4: 0.1425 b -1.5311 2.34416 1.853
## 5: -0.1231 b -0.3270 0.10693 2.186
## 6: -0.5698 b -0.6895 0.47542 1.903
## 7: 0.8695 c -0.7942 0.63072 2.344
## 8: -0.8862 c -0.7771 0.60389 1.738
## 9: 1.3449 c 0.4245 0.18019 2.759
Aggregating is also cool:
DT[,a:=x>0] # create boolean variable
## x y z w m a
## 1: 0.8060 a 0.4069 0.16559 2.635 TRUE
## 2: 0.5900 a 0.1419 0.02014 2.519 TRUE
## 3: -0.5667 a -0.6065 0.36786 1.936 FALSE
## 4: 0.1425 b -1.5311 2.34416 1.853 TRUE
## 5: -0.1231 b -0.3270 0.10693 2.186 FALSE
## 6: -0.5698 b -0.6895 0.47542 1.903 FALSE
## 7: 0.8695 c -0.7942 0.63072 2.344 TRUE
## 8: -0.8862 c -0.7771 0.60389 1.738 FALSE
## 9: 1.3449 c 0.4245 0.18019 2.759 TRUE
DT[,b:= mean(x+w),by=a] # for the rows where a is TRUE, b is the mean of x+w over those rows; likewise for the rows where a is FALSE
## x y z w m a b
## 1: 0.8060 a 0.4069 0.16559 2.635 TRUE 1.4188
## 2: 0.5900 a 0.1419 0.02014 2.519 TRUE 1.4188
## 3: -0.5667 a -0.6065 0.36786 1.936 FALSE -0.1479
## 4: 0.1425 b -1.5311 2.34416 1.853 TRUE 1.4188
## 5: -0.1231 b -0.3270 0.10693 2.186 FALSE -0.1479
## 6: -0.5698 b -0.6895 0.47542 1.903 FALSE -0.1479
## 7: 0.8695 c -0.7942 0.63072 2.344 TRUE 1.4188
## 8: -0.8862 c -0.7771 0.60389 1.738 FALSE -0.1479
## 9: 1.3449 c 0.4245 0.18019 2.759 TRUE 1.4188
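The := form above writes the group mean into every row. When one row per group is wanted instead, j can return a list of summaries. A small sketch with made-up data (the column names g and v are assumptions, not part of the example above):

```r
library(data.table)

DT <- data.table(g = c("a", "a", "b", "b"), v = c(1, 3, 10, 20))

# One row per group, holding the group mean and the group size
group_summary <- DT[, list(mean_v = mean(v), n = .N), by = g]
print(group_summary)

# Same grouping, but the group mean is added to every row by reference
DT[, mean_v := mean(v), by = g]
print(DT)
```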
Count the number of times each value appears with the special symbol .N:
set.seed(123);
DT = data.table(x=sample(letters[1:3], 1E5, TRUE))
DT[, .N, by=x] # much faster than doing a table of DT$x
## x N
## 1: a 33387
## 2: c 33201
## 3: b 33412
Very fast way to subset using a key:
DT = data.table(x=rep(c("a","b","c"),each=100), y=rnorm(300))
setkey(DT, x)
DT['a']
## x y
## 1: a 0.25959
## 2: a 0.91751
## 3: a -0.72232
## 4: a -0.80828
## 5: a -0.14135
## 6: a 2.25701
## 7: a -2.37955
## 8: a -0.45425
## 9: a -0.06007
## 10: a 0.86090
## 11: a -1.78466
## 12: a -0.13074
## 13: a -0.36984
## 14: a -0.18066
## 15: a -1.04973
## 16: a 0.37832
## 17: a -1.37079
## 18: a -0.31612
## 19: a 0.39435
## 20: a -1.68988
## 21: a -1.46234
## 22: a 2.55838
## 23: a 0.08789
## 24: a 1.73141
## 25: a 1.21513
## 26: a 0.29954
## 27: a -0.17246
## 28: a 1.13250
## 29: a 0.02320
## 30: a 1.33587
## 31: a -1.09879
## 32: a -0.58176
## 33: a 0.03892
## 34: a 1.07315
## 35: a 1.34970
## 36: a 1.19528
## 37: a -0.02218
## 38: a 0.69849
## 39: a 0.67241
## 40: a -0.79165
## 41: a -0.21791
## 42: a 0.02307
## 43: a 0.11539
## 44: a -0.27708
## 45: a 0.03688
## 46: a 0.47520
## 47: a 1.70749
## 48: a 1.07601
## 49: a -1.34571
## 50: a -1.44025
## 51: a -0.39393
## 52: a 0.58106
## 53: a -0.17079
## 54: a -0.90585
## 55: a 0.15621
## 56: a -0.37323
## 57: a -0.34587
## 58: a -0.35829
## 59: a -0.13307
## 60: a -0.08960
## 61: a 0.62793
## 62: a -1.42883
## 63: a 0.17255
## 64: a -0.79115
## 65: a 1.26204
## 66: a -0.26941
## 67: a 0.15698
## 68: a -0.76060
## 69: a 1.37060
## 70: a 0.03758
## 71: a 0.44949
## 72: a 2.78869
## 73: a -0.46849
## 74: a 1.01261
## 75: a -0.04374
## 76: a 1.40670
## 77: a 0.41993
## 78: a 0.31009
## 79: a 1.11905
## 80: a -1.29814
## 81: a -1.28248
## 82: a 1.65943
## 83: a 0.78375
## 84: a 0.57771
## 85: a -0.26725
## 86: a -0.64569
## 87: a -0.44953
## 88: a -0.82620
## 89: a 1.05504
## 90: a -0.87927
## 91: a -1.27713
## 92: a -0.63412
## 93: a 0.66470
## 94: a -0.50958
## 95: a 0.40736
## 96: a 1.67775
## 97: a -1.05206
## 98: a -0.63691
## 99: a 0.56539
## 100: a 0.38016
## x y
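To see what the key does on a table small enough to read, here is a sketch with made-up values. It assumes only setkey() and key() from data.table. The keyed lookup and an ordinary logical subset should return the same rows:

```r
library(data.table)

DT <- data.table(x = c("b", "a", "c", "a"), y = 1:4)
setkey(DT, x)            # sorts DT by x and marks x as the key
key(DT)                  # the current key column

by_key  <- DT["a"]       # lookup on the key
by_scan <- DT[x == "a"]  # the same rows selected with a logical expression

print(by_key)
print(by_scan)
```

Note that setkey() reorders the table itself, so the row order after the call differs from the order in which the data was created.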
Merging two data tables fast using a key:
DT1 = data.table(x=c('a', 'a', 'b', 'dt1'), y=1:4)
DT2 = data.table(x=c('a', 'b', 'dt2'), z=5:7)
setkey(DT1, x); setkey(DT2, x)
merge(DT1, DT2)
## x y z
## 1: a 1 5
## 2: a 2 5
## 3: b 3 6
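The merge above keeps only the values of x found in both tables, which is why dt1 and dt2 disappear. A sketch of how the all and all.x arguments of merge() keep the unmatched rows, reusing the same two tables:

```r
library(data.table)

DT1 <- data.table(x = c("a", "a", "b", "dt1"), y = 1:4)
DT2 <- data.table(x = c("a", "b", "dt2"), z = 5:7)

inner <- merge(DT1, DT2, by = "x")                # only values of x present in both
outer <- merge(DT1, DT2, by = "x", all = TRUE)    # keeps dt1 and dt2, filling with NA
left  <- merge(DT1, DT2, by = "x", all.x = TRUE)  # keeps every row of DT1

print(outer)
```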
Also, it is much faster to read using fread (returns a data.table) than read.table (returns data.frame). Check it out:
big_df = data.frame(x=rnorm(1E6), y=rnorm(1E6))
file = tempfile()
write.table(big_df, file=file, row.names=FALSE, col.names=TRUE, sep="\t", quote=FALSE)
system.time(fread(file))
##
Read 51.0% of 1000000 rows
Read 89.0% of 1000000 rows
Read 1000000 rows and 2 (of 2) columns from 0.035 GB file in 00:00:04
## user system elapsed
## 3.28 0.01 3.30
system.time(read.table(file, header=TRUE, sep="\t"))
## user system elapsed
## 22.27 0.38 22.64
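Timings like the ones above depend on the machine and the package version, so the sketch below only checks the other claim: the two functions read the same file into different classes. The tiny table and the file name are made up for illustration:

```r
library(data.table)

small_df <- data.frame(x = c(1.5, 2.5, 3.5), y = c("a", "b", "c"))
tmp <- tempfile(fileext = ".tsv")
write.table(small_df, file = tmp, row.names = FALSE, sep = "\t", quote = FALSE)

dt <- fread(tmp)                                  # returns a data.table
df <- read.table(tmp, header = TRUE, sep = "\t")  # returns a plain data.frame

class(dt)
class(df)
unlink(tmp)  # remove the temporary file
```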
Often a data analysis process starts by downloading data from the internet, and this step should preferably be included in the script. It is important to know which directory you are working in and how to set a directory, check whether a given directory exists, or create one:
setwd("../example") sets the working directory to the folder example in the parent of the current directory (use "./example" for a folder inside the current directory)
file.exists("directoryName") checks whether the directory exists
dir.create("directoryName") creates a directory
list.files("directoryName") lists the files in the directory
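The directory functions above can be tried safely inside R's temporary directory. This is a sketch only: the folder name example and the file notes.txt are placeholders.

```r
# Build a path inside the session's temporary directory
example_dir <- file.path(tempdir(), "example")

# Create the directory only if it is not there yet
if (!file.exists(example_dir)) {
  dir.create(example_dir)
}

# Put one file in it and list the contents
writeLines("some text", file.path(example_dir, "notes.txt"))
list.files(example_dir)

# setwd() returns the previous working directory, which makes it easy to restore
old_wd <- setwd(example_dir)
getwd()
setwd(old_wd)
```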
To download files you use the download.file() function. Example:
setInternet2(TRUE) # Windows only; not needed in recent versions of R
fileUrl = "https://data.baltimorecity.gov/api/views/dz54-2aru/rows.csv?accessType=DOWNLOAD"
download.file(fileUrl,destfile="./data/cameras.csv") # on some systems, you might need method="curl"
dateDownloaded = date() # it is good practice to record the date of the download
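Since a script is usually run many times, it can help to skip the download when the file is already there. The helper below is a hypothetical sketch, not part of any package: download_if_missing and the placeholder URL are made up. The demonstration runs offline because the target file already exists.

```r
# Hypothetical helper: download only when destfile does not exist yet.
# Returns TRUE if a download happened and FALSE otherwise.
download_if_missing <- function(url, destfile, ...) {
  if (file.exists(destfile)) {
    return(invisible(FALSE))  # nothing to do
  }
  dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)
  download.file(url, destfile = destfile, ...)
  invisible(TRUE)
}

# Offline demonstration: the file already exists, so no request is made
existing <- tempfile(fileext = ".csv")
writeLines("a,b", existing)
downloaded <- download_if_missing("https://example.org/placeholder.csv", existing)
print(downloaded)
```

When a download does happen, recording date() next to it, as above, keeps the provenance of the file clear.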
read.table function
* This is the main function for reading data into R
* Flexible and robust but requires more parameters
* Reads the data into RAM - big data can cause problems
* Important parameters: file, header, sep, row.names, nrows
* Other important parameters
* quote - tells R whether there are any quoted values; quote="" means no quotes.
* na.strings - set the character that represents a missing value.
* nrows - how many rows to read of the file (e.g. nrows=10 reads 10 lines).
* skip - number of lines to skip before starting to read
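Related functions: read.csv and read.csv2 (wrappers around read.table with defaults suited to comma-separated and semicolon-separated files, respectively)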
Example:
cameraData <- read.table("./data/cameras.csv",sep=",",header=TRUE)
head(cameraData)
## address direction street crossStreet
## 1 S CATON AVE & BENSON AVE N/B Caton Ave Benson Ave
## 2 S CATON AVE & BENSON AVE S/B Caton Ave Benson Ave
## 3 WILKENS AVE & PINE HEIGHTS AVE E/B Wilkens Ave Pine Heights
## 4 THE ALAMEDA & E 33RD ST S/B The Alameda 33rd St
## 5 E 33RD ST & THE ALAMEDA E/B E 33rd The Alameda
## 6 ERDMAN AVE & N MACON ST E/B Erdman Macon St
## intersection Location.1
## 1 Caton Ave & Benson Ave (39.2693779962, -76.6688185297)
## 2 Caton Ave & Benson Ave (39.2693157898, -76.6689698176)
## 3 Wilkens Ave & Pine Heights (39.2720252302, -76.676960806)
## 4 The Alameda & 33rd St (39.3285013141, -76.5953545714)
## 5 E 33rd & The Alameda (39.3283410623, -76.5953594625)
## 6 Erdman & Macon St (39.3068045671, -76.5593167803)
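The less common read.table parameters (quote, na.strings, nrows, skip) are easier to follow on a tiny inline table. The text below is made up for illustration and is passed through the text argument instead of a file:

```r
raw_text <- "exported by a hypothetical tool
id,name,score
1,Ann,90
2,Bob,NA
3,Cy,-
4,Di,70"

scores <- read.table(text = raw_text, sep = ",", header = TRUE,
                     skip = 1,                   # ignore the first line
                     nrows = 3,                  # read only three data rows
                     quote = "",                 # no quoted values
                     na.strings = c("NA", "-"))  # both mark missing values
scores
```

The fourth data row is never read because of nrows, and both NA and - end up as missing values in the score column.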
Example: download+read
if(!file.exists("data")){dir.create("data")}
fileUrl = "https://data.baltimorecity.gov/api/views/dz54-2aru/rows.xlsx?accessType=DOWNLOAD"
download.file(fileUrl,destfile="./data/cameras.xlsx", mode="wb") # mind the different download mode for these files, it may depend on your system
## Error: unsupported URL scheme
dateDownloaded = date()
library(xlsx)
## Loading required package: rJava
## Loading required package: xlsxjars
cameraData = read.xlsx("./data/cameras.xlsx",sheetIndex=1,header=TRUE)
head(cameraData)
## address direction street crossStreet
## 1 S CATON AVE & BENSON AVE N/B Caton Ave Benson Ave
## 2 S CATON AVE & BENSON AVE S/B Caton Ave Benson Ave
## 3 WILKENS AVE & PINE HEIGHTS AVE E/B Wilkens Ave Pine Heights
## 4 THE ALAMEDA & E 33RD ST S/B The Alameda 33rd St
## 5 E 33RD ST & THE ALAMEDA E/B E 33rd The Alameda
## 6 ERDMAN AVE & N MACON ST E/B Erdman Macon St
## intersection Location.1
## 1 Caton Ave & Benson Ave (39.2693779962, -76.6688185297)
## 2 Caton Ave & Benson Ave (39.2693157898, -76.6689698176)
## 3 Wilkens Ave & Pine Heights (39.2720252302, -76.676960806)
## 4 The Alameda & 33rd St (39.3285013141, -76.5953545714)
## 5 E 33rd & The Alameda (39.3283410623, -76.5953594625)
## 6 Erdman & Macon St (39.3068045671, -76.5593167803)
About XML:
Components
Tags correspond to general labels
<section></section>
<line-break />: no need for a start tag and an end tag in this case
Elements are specific examples of tags
<Greeting> Hello, world </Greeting>
Attributes are components of the label
<img src="jeff.jpg" alt="instructor"/>
<step number="3"> Connect A to B. </step>
Example: Read file http://www.w3schools.com/xml/simple.xml
Let's load the data and see what it looks like:
library(XML)
fileUrl = "http://www.w3schools.com/xml/simple.xml"
doc = xmlTreeParse(fileUrl,useInternal=TRUE) # loads the document into R so that you can then parse it/look at parts of it/identifying what is what in the file
print(doc)
## <?xml version="1.0" encoding="UTF-8"?>
## <!-- Edited by XMLSpy -->
## <breakfast_menu>
## <food>
## <name>Belgian Waffles</name>
## <price>$5.95</price>
## <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
## <calories>650</calories>
## </food>
## <food>
## <name>Strawberry Belgian Waffles</name>
## <price>$7.95</price>
## <description>Light Belgian waffles covered with strawberries and whipped cream</description>
## <calories>900</calories>
## </food>
## <food>
## <name>Berry-Berry Belgian Waffles</name>
## <price>$8.95</price>
## <description>Light Belgian waffles covered with an assortment of fresh berries and whipped cream</description>
## <calories>900</calories>
## </food>
## <food>
## <name>French Toast</name>
## <price>$4.50</price>
## <description>Thick slices made from our homemade sourdough bread</description>
## <calories>600</calories>
## </food>
## <food>
## <name>Homestyle Breakfast</name>
## <price>$6.95</price>
## <description>Two eggs, bacon or sausage, toast, and our ever-popular hash browns</description>
## <calories>950</calories>
## </food>
## </breakfast_menu>
##
We're actually interested in the root node:
rootNode = xmlRoot(doc)
print(rootNode)
## <breakfast_menu>
## <food>
## <name>Belgian Waffles</name>
## <price>$5.95</price>
## <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
## <calories>650</calories>
## </food>
## <food>
## <name>Strawberry Belgian Waffles</name>
## <price>$7.95</price>
## <description>Light Belgian waffles covered with strawberries and whipped cream</description>
## <calories>900</calories>
## </food>
## <food>
## <name>Berry-Berry Belgian Waffles</name>
## <price>$8.95</price>
## <description>Light Belgian waffles covered with an assortment of fresh berries and whipped cream</description>
## <calories>900</calories>
## </food>
## <food>
## <name>French Toast</name>
## <price>$4.50</price>
## <description>Thick slices made from our homemade sourdough bread</description>
## <calories>600</calories>
## </food>
## <food>
## <name>Homestyle Breakfast</name>
## <price>$6.95</price>
## <description>Two eggs, bacon or sausage, toast, and our ever-popular hash browns</description>
## <calories>950</calories>
## </food>
## </breakfast_menu>
We can check out its name, the names of the elements inside the root node or directly access parts of the document:
xmlName(rootNode)
## [1] "breakfast_menu"
names(rootNode)
## food food food food food
## "food" "food" "food" "food" "food"
rootNode[[1]][[2]]
## <price>$5.95</price>
We can programmatically extract parts of the file:
xmlSApply(rootNode,xmlValue)
## food
## "Belgian Waffles$5.95Two of our famous Belgian Waffles with plenty of real maple syrup650"
## food
## "Strawberry Belgian Waffles$7.95Light Belgian waffles covered with strawberries and whipped cream900"
## food
## "Berry-Berry Belgian Waffles$8.95Light Belgian waffles covered with an assortment of fresh berries and whipped cream900"
## food
## "French Toast$4.50Thick slices made from our homemade sourdough bread600"
## food
## "Homestyle Breakfast$6.95Two eggs, bacon or sausage, toast, and our ever-popular hash browns950"
The XPath function is quite useful to extract …
Example of usage:
xpathSApply(rootNode,"//name",xmlValue)
## [1] "Belgian Waffles" "Strawberry Belgian Waffles"
## [3] "Berry-Berry Belgian Waffles" "French Toast"
## [5] "Homestyle Breakfast"
xpathSApply(rootNode,"//price",xmlValue)
## [1] "$5.95" "$7.95" "$8.95" "$4.50" "$6.95"
xpathSApply(rootNode,"//food[@class='french fries']",xmlValue)
## list()
The last instruction returns an empty list, because no food element has a class attribute equal to 'french fries' (the predicate [@class='...'] tests an attribute, not the name of the food). Let's look at an example where this type of instruction may be useful:
fileUrl = "http://espn.go.com/nfl/team/_/name/bal/baltimore-ravens"
doc = htmlTreeParse(fileUrl,useInternal=TRUE)
teams = xpathSApply(doc,"//li[@class='team-name']",xmlValue)
print(teams)
## [1] "San Francisco" "Dallas" "Washington" "New Orleans"
## [5] "Cincinnati" "Pittsburgh" "Cleveland" "Carolina"
## [9] "Indianapolis" "Tampa Bay" "Atlanta" "Cincinnati"
## [13] "Pittsburgh" "Tennessee" "New Orleans" "San Diego"
## [17] "Miami" "Jacksonville" "Houston" "Cleveland"
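Live pages change over time, so the attribute predicate is easier to study on a small inline document. The XML below is made up for illustration. It assumes only xmlParse, xpathSApply, xmlValue and xmlGetAttr from the XML package:

```r
library(XML)

xml_text <- '<menu>
  <food class="side"><name>French Fries</name><price>$2.50</price></food>
  <food class="main"><name>French Toast</name><price>$4.50</price></food>
</menu>'
doc <- xmlParse(xml_text, asText = TRUE)

# Names of the foods whose class attribute equals "side"
side_names <- xpathSApply(doc, "//food[@class='side']/name", xmlValue)

# Value of the class attribute of every food element
classes <- xpathSApply(doc, "//food", xmlGetAttr, "class")

# Collect two fields into a data frame, turning the price into a number
menu <- data.frame(
  name  = xpathSApply(doc, "//name", xmlValue),
  price = as.numeric(sub("$", "", xpathSApply(doc, "//price", xmlValue), fixed = TRUE)),
  stringsAsFactors = FALSE
)
print(menu)
```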
You may want to do web scraping, i.e., programmatically extracting data from the HTML code of websites.
Let's start by reading a webpage (http://scholar.google.com/citations?user=HI-I6C0AAAAJ&hl=en):
con = url("http://scholar.google.com/citations?user=HI-I6C0AAAAJ&hl=en") # open connection
htmlCode = readLines(con) # read
## Warning: incomplete final line found on
## 'http://scholar.google.com/citations?user=HI-I6C0AAAAJ&hl=en'
close(con) # close connection
In this case, you get a huge character vector (which is why it is not displayed here). We can instead try using the XML package:
library(XML)
url = "http://scholar.google.com/citations?user=HI-I6C0AAAAJ&hl=en"
html_obtained_using_XML_library = htmlTreeParse(url, useInternalNodes=TRUE) # too big to print here
Then we can use the type of commands we saw before (see XML files section)
xpathSApply(html_obtained_using_XML_library, "//title", xmlValue)
## [1] "Jeff Leek - Google Scholar Citations"
xpathSApply(html_obtained_using_XML_library, "//td[@id='col-citedby']", xmlValue)
## [1] "Cited by" "416" "303" "278" "181" "159"
## [7] "149" "137" "126" "119" "48" "45"
## [13] "40" "34" "23" "16" "14" "13"
## [19] "12" "10" "7"
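The same extraction can be rehearsed offline on a small HTML string, which also shows how to turn the scraped text into numbers. The markup, the class name cited and the values are all made up for illustration:

```r
library(XML)

html_text <- '<html><head><title>Demo page</title></head><body>
<table>
  <tr><td class="cited">Cited by</td></tr>
  <tr><td class="cited">12</td></tr>
  <tr><td class="cited">7</td></tr>
</table></body></html>'
page <- htmlParse(html_text, asText = TRUE)

page_title <- xpathSApply(page, "//title", xmlValue)
cited_raw  <- xpathSApply(page, "//td[@class='cited']", xmlValue)

# The first cell is the column header, the rest are counts stored as text
citations <- as.numeric(cited_raw[-1])
sum(citations)
```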
There's still another alternative, which is rather useful when the site requires a login and a password. The result from using this approach (in this case, no login and password are required) is the same as using the XML library:
library(httr);
## Warning: package 'httr' was built under R version 3.0.3
html = GET("http://scholar.google.com/citations?user=HI-I6C0AAAAJ&hl=en")
content = content(html,as="text")
parsedHtml = htmlParse(content,asText=TRUE)
So now the parsedHtml looks exactly like html_obtained_using_XML_library. Again, we can use functions like xpathSApply. If the website does require login and password (example: http://httpbin.org/basic-auth/user/passwd), this is how to proceed:
pg2 = GET("http://httpbin.org/basic-auth/user/passwd", authenticate("user","passwd"))
content = content(pg2,as="text")
parsedHtml = htmlParse(content,asText=TRUE)
Using handles (a handle preserves settings and cookies across requests to the same site, so you only need to authenticate once):
google = handle("http://google.com")
pg1 = GET(handle=google,path="/")
pg2 = GET(handle=google,path="search")
About JSON data:
How to read it:
We will open the example JSON data https://api.github.com/users/jtleek/repos
library(jsonlite)
## Warning: package 'jsonlite' was built under R version 3.0.3
jsonData = fromJSON("https://api.github.com/users/jtleek/repos")
You actually get back a data frame! Want to look at the names of its columns?
colnames(jsonData)
## [1] "id" "name" "full_name"
## [4] "owner" "private" "html_url"
## [7] "description" "fork" "url"
## [10] "forks_url" "keys_url" "collaborators_url"
## [13] "teams_url" "hooks_url" "issue_events_url"
## [16] "events_url" "assignees_url" "branches_url"
## [19] "tags_url" "blobs_url" "git_tags_url"
## [22] "git_refs_url" "trees_url" "statuses_url"
## [25] "languages_url" "stargazers_url" "contributors_url"
## [28] "subscribers_url" "subscription_url" "commits_url"
## [31] "git_commits_url" "comments_url" "issue_comment_url"
## [34] "contents_url" "compare_url" "merges_url"
## [37] "archive_url" "downloads_url" "issues_url"
## [40] "pulls_url" "milestones_url" "notifications_url"
## [43] "labels_url" "releases_url" "created_at"
## [46] "updated_at" "pushed_at" "git_url"
## [49] "ssh_url" "clone_url" "svn_url"
## [52] "homepage" "size" "stargazers_count"
## [55] "watchers_count" "language" "has_issues"
## [58] "has_downloads" "has_wiki" "forks_count"
## [61] "mirror_url" "open_issues_count" "forks"
## [64] "open_issues" "watchers" "default_branch"
Now if you look at the original JSON file, you will see that there's a lot of info in the “owner” entry. Let's explore that:
class(jsonData$owner)
## [1] "data.frame"
names(jsonData$owner)
## [1] "login" "id" "avatar_url"
## [4] "gravatar_id" "url" "html_url"
## [7] "followers_url" "following_url" "gists_url"
## [10] "starred_url" "subscriptions_url" "organizations_url"
## [13] "repos_url" "events_url" "received_events_url"
## [16] "type" "site_admin"
jsonData$owner$gists_url
## [1] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [2] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [3] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [4] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [5] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [6] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [7] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [8] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [9] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [10] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [11] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [12] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [13] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [14] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [15] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [16] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [17] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [18] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [19] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [20] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [21] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [22] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [23] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [24] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [25] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [26] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [27] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [28] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [29] "https://api.github.com/users/jtleek/gists{/gist_id}"
## [30] "https://api.github.com/users/jtleek/gists{/gist_id}"
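A nested entry such as owner comes back as a data frame stored inside a column of the outer data frame. The sketch below reproduces that on a small made-up JSON string (the repository and login names are placeholders) and uses jsonlite's flatten() to turn the nested columns into ordinary ones:

```r
library(jsonlite)

json_text <- '[
  {"name": "repo1", "owner": {"login": "alice", "id": 1}},
  {"name": "repo2", "owner": {"login": "alice", "id": 1}}
]'
repos <- fromJSON(json_text)

class(repos$owner)   # the nested object becomes a data frame column
repos$owner$login

# flatten() expands nested data frames into regular columns
flat <- jsonlite::flatten(repos)
names(flat)
```

Flattening is optional, but it makes the result easier to pass to functions that expect plain columns.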
You can also write data frames to JSON:
head(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
myjson = toJSON(iris, pretty=TRUE)
print(myjson)
## [
## {
## "Sepal.Length" : 5.1,
## "Sepal.Width" : 3.5,
## "Petal.Length" : 1.4,
## "Petal.Width" : 0.2,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 4.9,
## "Sepal.Width" : 3,
## "Petal.Length" : 1.4,
## "Petal.Width" : 0.2,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 4.7,
## "Sepal.Width" : 3.2,
## "Petal.Length" : 1.3,
## "Petal.Width" : 0.2,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 4.6,
## "Sepal.Width" : 3.1,
## "Petal.Length" : 1.5,
## "Petal.Width" : 0.2,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 5,
## "Sepal.Width" : 3.6,
## "Petal.Length" : 1.4,
## "Petal.Width" : 0.2,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 5.4,
## "Sepal.Width" : 3.9,
## "Petal.Length" : 1.7,
## "Petal.Width" : 0.4,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 4.6,
## "Sepal.Width" : 3.4,
## "Petal.Length" : 1.4,
## "Petal.Width" : 0.3,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 5,
## "Sepal.Width" : 3.4,
## "Petal.Length" : 1.5,
## "Petal.Width" : 0.2,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 4.4,
## "Sepal.Width" : 2.9,
## "Petal.Length" : 1.4,
## "Petal.Width" : 0.2,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 4.9,
## "Sepal.Width" : 3.1,
## "Petal.Length" : 1.5,
## "Petal.Width" : 0.1,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 5.4,
## "Sepal.Width" : 3.7,
## "Petal.Length" : 1.5,
## "Petal.Width" : 0.2,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 4.8,
## "Sepal.Width" : 3.4,
## "Petal.Length" : 1.6,
## "Petal.Width" : 0.2,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 4.8,
## "Sepal.Width" : 3,
## "Petal.Length" : 1.4,
## "Petal.Width" : 0.1,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 4.3,
## "Sepal.Width" : 3,
## "Petal.Length" : 1.1,
## "Petal.Width" : 0.1,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 5.8,
## "Sepal.Width" : 4,
## "Petal.Length" : 1.2,
## "Petal.Width" : 0.2,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 5.7,
## "Sepal.Width" : 4.4,
## "Petal.Length" : 1.5,
## "Petal.Width" : 0.4,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 5.4,
## "Sepal.Width" : 3.9,
## "Petal.Length" : 1.3,
## "Petal.Width" : 0.4,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 5.1,
## "Sepal.Width" : 3.5,
## "Petal.Length" : 1.4,
## "Petal.Width" : 0.3,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 5.7,
## "Sepal.Width" : 3.8,
## "Petal.Length" : 1.7,
## "Petal.Width" : 0.3,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 5.1,
## "Sepal.Width" : 3.8,
## "Petal.Length" : 1.5,
## "Petal.Width" : 0.3,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 5.4,
## "Sepal.Width" : 3.4,
## "Petal.Length" : 1.7,
## "Petal.Width" : 0.2,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 5.1,
## "Sepal.Width" : 3.7,
## "Petal.Length" : 1.5,
## "Petal.Width" : 0.4,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 4.6,
## "Sepal.Width" : 3.6,
## "Petal.Length" : 1,
## "Petal.Width" : 0.2,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 5.1,
## "Sepal.Width" : 3.3,
## "Petal.Length" : 1.7,
## "Petal.Width" : 0.5,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 4.8,
## "Sepal.Width" : 3.4,
## "Petal.Length" : 1.9,
## "Petal.Width" : 0.2,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 5,
## "Sepal.Width" : 3,
## "Petal.Length" : 1.6,
## "Petal.Width" : 0.2,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 5,
## "Sepal.Width" : 3.4,
## "Petal.Length" : 1.6,
## "Petal.Width" : 0.4,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 5.2,
## "Sepal.Width" : 3.5,
## "Petal.Length" : 1.5,
## "Petal.Width" : 0.2,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 5.2,
## "Sepal.Width" : 3.4,
## "Petal.Length" : 1.4,
## "Petal.Width" : 0.2,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 4.7,
## "Sepal.Width" : 3.2,
## "Petal.Length" : 1.6,
## "Petal.Width" : 0.2,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 4.8,
## "Sepal.Width" : 3.1,
## "Petal.Length" : 1.6,
## "Petal.Width" : 0.2,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 5.4,
## "Sepal.Width" : 3.4,
## "Petal.Length" : 1.5,
## "Petal.Width" : 0.4,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 5.2,
## "Sepal.Width" : 4.1,
## "Petal.Length" : 1.5,
## "Petal.Width" : 0.1,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 5.5,
## "Sepal.Width" : 4.2,
## "Petal.Length" : 1.4,
## "Petal.Width" : 0.2,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 4.9,
## "Sepal.Width" : 3.1,
## "Petal.Length" : 1.5,
## "Petal.Width" : 0.2,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 5,
## "Sepal.Width" : 3.2,
## "Petal.Length" : 1.2,
## "Petal.Width" : 0.2,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 5.5,
## "Sepal.Width" : 3.5,
## "Petal.Length" : 1.3,
## "Petal.Width" : 0.2,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 4.9,
## "Sepal.Width" : 3.6,
## "Petal.Length" : 1.4,
## "Petal.Width" : 0.1,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 4.4,
## "Sepal.Width" : 3,
## "Petal.Length" : 1.3,
## "Petal.Width" : 0.2,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 5.1,
## "Sepal.Width" : 3.4,
## "Petal.Length" : 1.5,
## "Petal.Width" : 0.2,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 5,
## "Sepal.Width" : 3.5,
## "Petal.Length" : 1.3,
## "Petal.Width" : 0.3,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 4.5,
## "Sepal.Width" : 2.3,
## "Petal.Length" : 1.3,
## "Petal.Width" : 0.3,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 4.4,
## "Sepal.Width" : 3.2,
## "Petal.Length" : 1.3,
## "Petal.Width" : 0.2,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 5,
## "Sepal.Width" : 3.5,
## "Petal.Length" : 1.6,
## "Petal.Width" : 0.6,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 5.1,
## "Sepal.Width" : 3.8,
## "Petal.Length" : 1.9,
## "Petal.Width" : 0.4,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 4.8,
## "Sepal.Width" : 3,
## "Petal.Length" : 1.4,
## "Petal.Width" : 0.3,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 5.1,
## "Sepal.Width" : 3.8,
## "Petal.Length" : 1.6,
## "Petal.Width" : 0.2,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 4.6,
## "Sepal.Width" : 3.2,
## "Petal.Length" : 1.4,
## "Petal.Width" : 0.2,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 5.3,
## "Sepal.Width" : 3.7,
## "Petal.Length" : 1.5,
## "Petal.Width" : 0.2,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 5,
## "Sepal.Width" : 3.3,
## "Petal.Length" : 1.4,
## "Petal.Width" : 0.2,
## "Species" : "setosa"
## },
## {
## "Sepal.Length" : 7,
## "Sepal.Width" : 3.2,
## "Petal.Length" : 4.7,
## "Petal.Width" : 1.4,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 6.4,
## "Sepal.Width" : 3.2,
## "Petal.Length" : 4.5,
## "Petal.Width" : 1.5,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 6.9,
## "Sepal.Width" : 3.1,
## "Petal.Length" : 4.9,
## "Petal.Width" : 1.5,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 5.5,
## "Sepal.Width" : 2.3,
## "Petal.Length" : 4,
## "Petal.Width" : 1.3,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 6.5,
## "Sepal.Width" : 2.8,
## "Petal.Length" : 4.6,
## "Petal.Width" : 1.5,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 5.7,
## "Sepal.Width" : 2.8,
## "Petal.Length" : 4.5,
## "Petal.Width" : 1.3,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 6.3,
## "Sepal.Width" : 3.3,
## "Petal.Length" : 4.7,
## "Petal.Width" : 1.6,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 4.9,
## "Sepal.Width" : 2.4,
## "Petal.Length" : 3.3,
## "Petal.Width" : 1,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 6.6,
## "Sepal.Width" : 2.9,
## "Petal.Length" : 4.6,
## "Petal.Width" : 1.3,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 5.2,
## "Sepal.Width" : 2.7,
## "Petal.Length" : 3.9,
## "Petal.Width" : 1.4,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 5,
## "Sepal.Width" : 2,
## "Petal.Length" : 3.5,
## "Petal.Width" : 1,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 5.9,
## "Sepal.Width" : 3,
## "Petal.Length" : 4.2,
## "Petal.Width" : 1.5,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 6,
## "Sepal.Width" : 2.2,
## "Petal.Length" : 4,
## "Petal.Width" : 1,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 6.1,
## "Sepal.Width" : 2.9,
## "Petal.Length" : 4.7,
## "Petal.Width" : 1.4,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 5.6,
## "Sepal.Width" : 2.9,
## "Petal.Length" : 3.6,
## "Petal.Width" : 1.3,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 6.7,
## "Sepal.Width" : 3.1,
## "Petal.Length" : 4.4,
## "Petal.Width" : 1.4,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 5.6,
## "Sepal.Width" : 3,
## "Petal.Length" : 4.5,
## "Petal.Width" : 1.5,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 5.8,
## "Sepal.Width" : 2.7,
## "Petal.Length" : 4.1,
## "Petal.Width" : 1,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 6.2,
## "Sepal.Width" : 2.2,
## "Petal.Length" : 4.5,
## "Petal.Width" : 1.5,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 5.6,
## "Sepal.Width" : 2.5,
## "Petal.Length" : 3.9,
## "Petal.Width" : 1.1,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 5.9,
## "Sepal.Width" : 3.2,
## "Petal.Length" : 4.8,
## "Petal.Width" : 1.8,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 6.1,
## "Sepal.Width" : 2.8,
## "Petal.Length" : 4,
## "Petal.Width" : 1.3,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 6.3,
## "Sepal.Width" : 2.5,
## "Petal.Length" : 4.9,
## "Petal.Width" : 1.5,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 6.1,
## "Sepal.Width" : 2.8,
## "Petal.Length" : 4.7,
## "Petal.Width" : 1.2,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 6.4,
## "Sepal.Width" : 2.9,
## "Petal.Length" : 4.3,
## "Petal.Width" : 1.3,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 6.6,
## "Sepal.Width" : 3,
## "Petal.Length" : 4.4,
## "Petal.Width" : 1.4,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 6.8,
## "Sepal.Width" : 2.8,
## "Petal.Length" : 4.8,
## "Petal.Width" : 1.4,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 6.7,
## "Sepal.Width" : 3,
## "Petal.Length" : 5,
## "Petal.Width" : 1.7,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 6,
## "Sepal.Width" : 2.9,
## "Petal.Length" : 4.5,
## "Petal.Width" : 1.5,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 5.7,
## "Sepal.Width" : 2.6,
## "Petal.Length" : 3.5,
## "Petal.Width" : 1,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 5.5,
## "Sepal.Width" : 2.4,
## "Petal.Length" : 3.8,
## "Petal.Width" : 1.1,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 5.5,
## "Sepal.Width" : 2.4,
## "Petal.Length" : 3.7,
## "Petal.Width" : 1,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 5.8,
## "Sepal.Width" : 2.7,
## "Petal.Length" : 3.9,
## "Petal.Width" : 1.2,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 6,
## "Sepal.Width" : 2.7,
## "Petal.Length" : 5.1,
## "Petal.Width" : 1.6,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 5.4,
## "Sepal.Width" : 3,
## "Petal.Length" : 4.5,
## "Petal.Width" : 1.5,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 6,
## "Sepal.Width" : 3.4,
## "Petal.Length" : 4.5,
## "Petal.Width" : 1.6,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 6.7,
## "Sepal.Width" : 3.1,
## "Petal.Length" : 4.7,
## "Petal.Width" : 1.5,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 6.3,
## "Sepal.Width" : 2.3,
## "Petal.Length" : 4.4,
## "Petal.Width" : 1.3,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 5.6,
## "Sepal.Width" : 3,
## "Petal.Length" : 4.1,
## "Petal.Width" : 1.3,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 5.5,
## "Sepal.Width" : 2.5,
## "Petal.Length" : 4,
## "Petal.Width" : 1.3,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 5.5,
## "Sepal.Width" : 2.6,
## "Petal.Length" : 4.4,
## "Petal.Width" : 1.2,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 6.1,
## "Sepal.Width" : 3,
## "Petal.Length" : 4.6,
## "Petal.Width" : 1.4,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 5.8,
## "Sepal.Width" : 2.6,
## "Petal.Length" : 4,
## "Petal.Width" : 1.2,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 5,
## "Sepal.Width" : 2.3,
## "Petal.Length" : 3.3,
## "Petal.Width" : 1,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 5.6,
## "Sepal.Width" : 2.7,
## "Petal.Length" : 4.2,
## "Petal.Width" : 1.3,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 5.7,
## "Sepal.Width" : 3,
## "Petal.Length" : 4.2,
## "Petal.Width" : 1.2,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 5.7,
## "Sepal.Width" : 2.9,
## "Petal.Length" : 4.2,
## "Petal.Width" : 1.3,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 6.2,
## "Sepal.Width" : 2.9,
## "Petal.Length" : 4.3,
## "Petal.Width" : 1.3,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 5.1,
## "Sepal.Width" : 2.5,
## "Petal.Length" : 3,
## "Petal.Width" : 1.1,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 5.7,
## "Sepal.Width" : 2.8,
## "Petal.Length" : 4.1,
## "Petal.Width" : 1.3,
## "Species" : "versicolor"
## },
## {
## "Sepal.Length" : 6.3,
## "Sepal.Width" : 3.3,
## "Petal.Length" : 6,
## "Petal.Width" : 2.5,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 5.8,
## "Sepal.Width" : 2.7,
## "Petal.Length" : 5.1,
## "Petal.Width" : 1.9,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 7.1,
## "Sepal.Width" : 3,
## "Petal.Length" : 5.9,
## "Petal.Width" : 2.1,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 6.3,
## "Sepal.Width" : 2.9,
## "Petal.Length" : 5.6,
## "Petal.Width" : 1.8,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 6.5,
## "Sepal.Width" : 3,
## "Petal.Length" : 5.8,
## "Petal.Width" : 2.2,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 7.6,
## "Sepal.Width" : 3,
## "Petal.Length" : 6.6,
## "Petal.Width" : 2.1,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 4.9,
## "Sepal.Width" : 2.5,
## "Petal.Length" : 4.5,
## "Petal.Width" : 1.7,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 7.3,
## "Sepal.Width" : 2.9,
## "Petal.Length" : 6.3,
## "Petal.Width" : 1.8,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 6.7,
## "Sepal.Width" : 2.5,
## "Petal.Length" : 5.8,
## "Petal.Width" : 1.8,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 7.2,
## "Sepal.Width" : 3.6,
## "Petal.Length" : 6.1,
## "Petal.Width" : 2.5,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 6.5,
## "Sepal.Width" : 3.2,
## "Petal.Length" : 5.1,
## "Petal.Width" : 2,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 6.4,
## "Sepal.Width" : 2.7,
## "Petal.Length" : 5.3,
## "Petal.Width" : 1.9,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 6.8,
## "Sepal.Width" : 3,
## "Petal.Length" : 5.5,
## "Petal.Width" : 2.1,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 5.7,
## "Sepal.Width" : 2.5,
## "Petal.Length" : 5,
## "Petal.Width" : 2,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 5.8,
## "Sepal.Width" : 2.8,
## "Petal.Length" : 5.1,
## "Petal.Width" : 2.4,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 6.4,
## "Sepal.Width" : 3.2,
## "Petal.Length" : 5.3,
## "Petal.Width" : 2.3,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 6.5,
## "Sepal.Width" : 3,
## "Petal.Length" : 5.5,
## "Petal.Width" : 1.8,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 7.7,
## "Sepal.Width" : 3.8,
## "Petal.Length" : 6.7,
## "Petal.Width" : 2.2,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 7.7,
## "Sepal.Width" : 2.6,
## "Petal.Length" : 6.9,
## "Petal.Width" : 2.3,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 6,
## "Sepal.Width" : 2.2,
## "Petal.Length" : 5,
## "Petal.Width" : 1.5,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 6.9,
## "Sepal.Width" : 3.2,
## "Petal.Length" : 5.7,
## "Petal.Width" : 2.3,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 5.6,
## "Sepal.Width" : 2.8,
## "Petal.Length" : 4.9,
## "Petal.Width" : 2,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 7.7,
## "Sepal.Width" : 2.8,
## "Petal.Length" : 6.7,
## "Petal.Width" : 2,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 6.3,
## "Sepal.Width" : 2.7,
## "Petal.Length" : 4.9,
## "Petal.Width" : 1.8,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 6.7,
## "Sepal.Width" : 3.3,
## "Petal.Length" : 5.7,
## "Petal.Width" : 2.1,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 7.2,
## "Sepal.Width" : 3.2,
## "Petal.Length" : 6,
## "Petal.Width" : 1.8,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 6.2,
## "Sepal.Width" : 2.8,
## "Petal.Length" : 4.8,
## "Petal.Width" : 1.8,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 6.1,
## "Sepal.Width" : 3,
## "Petal.Length" : 4.9,
## "Petal.Width" : 1.8,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 6.4,
## "Sepal.Width" : 2.8,
## "Petal.Length" : 5.6,
## "Petal.Width" : 2.1,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 7.2,
## "Sepal.Width" : 3,
## "Petal.Length" : 5.8,
## "Petal.Width" : 1.6,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 7.4,
## "Sepal.Width" : 2.8,
## "Petal.Length" : 6.1,
## "Petal.Width" : 1.9,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 7.9,
## "Sepal.Width" : 3.8,
## "Petal.Length" : 6.4,
## "Petal.Width" : 2,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 6.4,
## "Sepal.Width" : 2.8,
## "Petal.Length" : 5.6,
## "Petal.Width" : 2.2,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 6.3,
## "Sepal.Width" : 2.8,
## "Petal.Length" : 5.1,
## "Petal.Width" : 1.5,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 6.1,
## "Sepal.Width" : 2.6,
## "Petal.Length" : 5.6,
## "Petal.Width" : 1.4,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 7.7,
## "Sepal.Width" : 3,
## "Petal.Length" : 6.1,
## "Petal.Width" : 2.3,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 6.3,
## "Sepal.Width" : 3.4,
## "Petal.Length" : 5.6,
## "Petal.Width" : 2.4,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 6.4,
## "Sepal.Width" : 3.1,
## "Petal.Length" : 5.5,
## "Petal.Width" : 1.8,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 6,
## "Sepal.Width" : 3,
## "Petal.Length" : 4.8,
## "Petal.Width" : 1.8,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 6.9,
## "Sepal.Width" : 3.1,
## "Petal.Length" : 5.4,
## "Petal.Width" : 2.1,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 6.7,
## "Sepal.Width" : 3.1,
## "Petal.Length" : 5.6,
## "Petal.Width" : 2.4,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 6.9,
## "Sepal.Width" : 3.1,
## "Petal.Length" : 5.1,
## "Petal.Width" : 2.3,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 5.8,
## "Sepal.Width" : 2.7,
## "Petal.Length" : 5.1,
## "Petal.Width" : 1.9,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 6.8,
## "Sepal.Width" : 3.2,
## "Petal.Length" : 5.9,
## "Petal.Width" : 2.3,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 6.7,
## "Sepal.Width" : 3.3,
## "Petal.Length" : 5.7,
## "Petal.Width" : 2.5,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 6.7,
## "Sepal.Width" : 3,
## "Petal.Length" : 5.2,
## "Petal.Width" : 2.3,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 6.3,
## "Sepal.Width" : 2.5,
## "Petal.Length" : 5,
## "Petal.Width" : 1.9,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 6.5,
## "Sepal.Width" : 3,
## "Petal.Length" : 5.2,
## "Petal.Width" : 2,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 6.2,
## "Sepal.Width" : 3.4,
## "Petal.Length" : 5.4,
## "Petal.Width" : 2.3,
## "Species" : "virginica"
## },
## {
## "Sepal.Length" : 5.9,
## "Sepal.Width" : 3,
## "Petal.Length" : 5.1,
## "Petal.Width" : 1.8,
## "Species" : "virginica"
## }
## ]
About:
How do you connect to a database and find out what it contains? Example: the UCSC database - http://genome.ucsc.edu/goldenPath/help/mysql.html
library(RMySQL)
ucscDb = dbConnect(MySQL(),user="genome", host="genome-mysql.cse.ucsc.edu") # connect
result = dbGetQuery(ucscDb,"show databases;") # get the databases within the UCSC server
dbDisconnect(ucscDb); # always disconnect!
## [1] TRUE
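Since it is easy to forget the disconnect step, one option is to wrap the connect/query/disconnect cycle in a small helper. The sketch below is an assumption, not part of the course material: `ucsc_query` is a hypothetical name, and `on.exit()` is used so the connection is closed even if the query fails. Calling it needs the DBI and RMySQL packages plus network access.

```r
# Hypothetical helper (not from the original notes): run one query against
# the UCSC MySQL server and always close the connection afterwards.
ucsc_query <- function(sql, dbname = NULL,
                       host = "genome-mysql.cse.ucsc.edu") {
  args <- list(RMySQL::MySQL(), username = "genome", host = host)
  if (!is.null(dbname)) args$dbname <- dbname
  con <- do.call(DBI::dbConnect, args)
  # on.exit() runs when the function exits, whether normally or by error
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  DBI::dbGetQuery(con, sql)
}

# Usage (requires network access, so it is left commented out):
# ucsc_query("show databases;")
# ucsc_query("select count(*) from affyU133Plus2", dbname = "hg19")
```

The trade-off is one connection per query, which is fine for a quick look but wasteful if you plan to send many queries in a row.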
We can now see which databases are included in the UCSC database:
result
## Database
## 1 information_schema
## 2 ailMel1
## 3 allMis1
## 4 anoCar1
## 5 anoCar2
## 6 anoGam1
## 7 apiMel1
## 8 apiMel2
## 9 aplCal1
## 10 balAcu1
## 11 bosTau2
## 12 bosTau3
## 13 bosTau4
## 14 bosTau5
## 15 bosTau6
## 16 bosTau7
## 17 bosTauMd3
## 18 braFlo1
## 19 caeJap1
## 20 caePb1
## 21 caePb2
## 22 caeRem2
## 23 caeRem3
## 24 calJac1
## 25 calJac3
## 26 calMil1
## 27 canFam1
## 28 canFam2
## 29 canFam3
## 30 cavPor3
## 31 cb1
## 32 cb3
## 33 ce10
## 34 ce2
## 35 ce4
## 36 ce6
## 37 cerSim1
## 38 choHof1
## 39 chrPic1
## 40 ci1
## 41 ci2
## 42 criGri1
## 43 danRer1
## 44 danRer2
## 45 danRer3
## 46 danRer4
## 47 danRer5
## 48 danRer6
## 49 danRer7
## 50 dasNov3
## 51 dipOrd1
## 52 dm1
## 53 dm2
## 54 dm3
## 55 dp2
## 56 dp3
## 57 droAna1
## 58 droAna2
## 59 droEre1
## 60 droGri1
## 61 droMoj1
## 62 droMoj2
## 63 droPer1
## 64 droSec1
## 65 droSim1
## 66 droVir1
## 67 droVir2
## 68 droYak1
## 69 droYak2
## 70 echTel1
## 71 echTel2
## 72 equCab1
## 73 equCab2
## 74 eriEur1
## 75 eriEur2
## 76 felCat3
## 77 felCat4
## 78 felCat5
## 79 fr1
## 80 fr2
## 81 fr3
## 82 gadMor1
## 83 galGal2
## 84 galGal3
## 85 galGal4
## 86 gasAcu1
## 87 geoFor1
## 88 go
## 89 go080130
## 90 go140213
## 91 gorGor3
## 92 hetGla1
## 93 hetGla2
## 94 hg16
## 95 hg17
## 96 hg18
## 97 hg19
## 98 hg19Patch10
## 99 hg19Patch2
## 100 hg19Patch5
## 101 hg19Patch9
## 102 hg38
## 103 hgFixed
## 104 hgTemp
## 105 hgcentral
## 106 latCha1
## 107 loxAfr3
## 108 macEug1
## 109 macEug2
## 110 melGal1
## 111 melUnd1
## 112 micMur1
## 113 mm10
## 114 mm10Patch1
## 115 mm5
## 116 mm6
## 117 mm7
## 118 mm8
## 119 mm9
## 120 monDom1
## 121 monDom4
## 122 monDom5
## 123 musFur1
## 124 myoLuc2
## 125 nomLeu1
## 126 nomLeu2
## 127 nomLeu3
## 128 ochPri2
## 129 oreNil1
## 130 oreNil2
## 131 ornAna1
## 132 oryCun2
## 133 oryLat2
## 134 otoGar3
## 135 oviAri1
## 136 oviAri3
## 137 panTro1
## 138 panTro2
## 139 panTro3
## 140 panTro4
## 141 papAnu2
## 142 papHam1
## 143 performance_schema
## 144 petMar1
## 145 petMar2
## 146 ponAbe2
## 147 priPac1
## 148 proCap1
## 149 proteins120806
## 150 proteins121210
## 151 proteins140122
## 152 proteome
## 153 pteVam1
## 154 rheMac1
## 155 rheMac2
## 156 rheMac3
## 157 rn3
## 158 rn4
## 159 rn5
## 160 sacCer1
## 161 sacCer2
## 162 sacCer3
## 163 saiBol1
## 164 sarHar1
## 165 sorAra1
## 166 sp120323
## 167 sp121210
## 168 sp140122
## 169 speTri2
## 170 strPur1
## 171 strPur2
## 172 susScr2
## 173 susScr3
## 174 taeGut1
## 175 taeGut2
## 176 tarSyr1
## 177 test
## 178 tetNig1
## 179 tetNig2
## 180 triMan1
## 181 tupBel1
## 182 turTru2
## 183 uniProt
## 184 vicPac1
## 185 vicPac2
## 186 visiGene
## 187 xenTro1
## 188 xenTro2
## 189 xenTro3
One of the databases (scroll down the list above and you'll find it) is hg19. Wanna see which tables the database hg19 contains?
library(RMySQL)
## Loading required package: DBI
## MYSQL_HOME defined as C:\Program Files\MySQL\MySQL Server 5.6
hg19 = dbConnect(MySQL(),user="genome", db="hg19",
host="genome-mysql.cse.ucsc.edu")
allTables = dbListTables(hg19)
Note that we have not disconnected yet; we should do so once we are done collecting info from the database hg19. Let's look at the first few tables in hg19 (there are a great many):
allTables[1:19]
## [1] "HInv" "HInvGeneMrna"
## [3] "acembly" "acemblyClass"
## [5] "acemblyPep" "affyCytoScan"
## [7] "affyExonProbeAmbiguous" "affyExonProbeCore"
## [9] "affyExonProbeExtended" "affyExonProbeFree"
## [11] "affyExonProbeFull" "affyExonProbesetAmbiguous"
## [13] "affyExonProbesetCore" "affyExonProbesetExtended"
## [15] "affyExonProbesetFree" "affyExonProbesetFull"
## [17] "affyGnf1h" "affyU133"
## [19] "affyU133Plus2"
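With that many tables, scrolling is not a practical way to find one. A minimal sketch of searching the names by pattern, using a handful of names copied from the listing above as a stand-in for `allTables` (the vector `someTables` is made up for illustration so the example runs without a connection):

```r
# A few table names copied from the listing above, standing in for allTables
someTables <- c("HInv", "HInvGeneMrna", "acembly", "affyCytoScan",
                "affyGnf1h", "affyU133", "affyU133Plus2")

# Keep only the names that start with "affyU133"
affyHits <- grep("^affyU133", someTables, value = TRUE)
affyHits

# Position of one specific table in the vector
which(someTables == "affyU133Plus2")
```

The same two calls work unchanged on the real `allTables` vector.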
Right, so maybe we want to read one of these tables, the 19th table, called “affyU133Plus2”:
affydata=dbReadTable(hg19, "affyU133Plus2")
We can send instructions (like get dimensions of the table, select a specific subset, compute quartiles…). We should end by closing the connection:
dbListFields(hg19,"affyU133Plus2")
## [1] "bin" "matches" "misMatches" "repMatches" "nCount"
## [6] "qNumInsert" "qBaseInsert" "tNumInsert" "tBaseInsert" "strand"
## [11] "qName" "qSize" "qStart" "qEnd" "tName"
## [16] "tSize" "tStart" "tEnd" "blockCount" "blockSizes"
## [21] "qStarts" "tStarts"
dbGetQuery(hg19, "select count(*) from affyU133Plus2")
## count(*)
## 1 58463
query <- dbSendQuery(hg19, "select * from affyU133Plus2 where misMatches between 1 and 3")
affyMis <- fetch(query); quantile(affyMis$misMatches)
## 0% 25% 50% 75% 100%
## 1 1 2 2 3
affyMisSmall <- fetch(query,n=10); dbClearResult(query);
## [1] TRUE
dim(affyMisSmall)
## [1] 10 22
dbDisconnect(hg19)
## [1] TRUE
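The `where misMatches between 1 and 3` clause does the filtering on the server, so only the matching rows travel over the network. For comparison, here is a sketch of the same filter done on the R side. The data frame `toy` is invented for illustration and only mimics two columns of the real table. SQL's `between` is inclusive at both ends, hence `>=` and `<=`.

```r
# Made-up stand-in for a table that has already been read into R
toy <- data.frame(
  qName = paste0("probe", 1:8),
  misMatches = c(0, 1, 2, 3, 4, 1, 2, 5),
  stringsAsFactors = FALSE
)

# Equivalent of: where misMatches between 1 and 3
inRange <- toy$misMatches >= 1 & toy$misMatches <= 3
toyMis <- toy[inRange, ]
quantile(toyMis$misMatches)

# Equivalent of fetching only the first few rows of the result
toyMisSmall <- head(toyMis, 3)
dim(toyMisSmall)
```

For large tables the server-side version is usually preferable, since reading the whole table first (as `dbReadTable` does) can be slow.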
About:
To deal with this type of data, you first need to install a couple of things (first time you use it):
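source("http://bioconductor.org/biocLite.R") # legacy installer, current when these notes were written
biocLite("rhdf5") # on current Bioconductor releases use BiocManager::install("rhdf5") instead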
Then load the rhdf5 library:
library(rhdf5)
file.remove("example.h5")
## [1] TRUE
created = h5createFile("example.h5")
created
## [1] TRUE
To see how to deal with this type of data we will create an example file. We already created it above, so calling h5createFile again returns FALSE because the file exists:
h5createFile("example.h5")
## [1] FALSE
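Both return values above depend on whether example.h5 already exists: `file.remove()` warns and returns FALSE when the file is missing, and `h5createFile()` returns FALSE when the file is already there. A minimal sketch of a setup that can be rerun safely, assuming a temporary path is acceptable (the name `h5path` is hypothetical) and skipping the HDF5 step when rhdf5 is not installed:

```r
# Hypothetical path in the session's temporary directory
h5path <- file.path(tempdir(), "example.h5")

# Remove a leftover file only if there is one, avoiding the warning
if (file.exists(h5path)) {
  file.remove(h5path)
}

# Create the HDF5 file when rhdf5 is available
if (requireNamespace("rhdf5", quietly = TRUE)) {
  created <- rhdf5::h5createFile(h5path)
  print(created)
}
```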
Remember that the data is stored in groups, which contain datasets. We start by creating these groups (examples of) and see what the file looks like after the groups have been created:
created = h5createGroup("example.h5","foo")
created = h5createGroup("example.h5","baa")
created = h5createGroup("example.h5","foo/foo_ex")
h5ls("example.h5") # ls stands for list
## group name otype dclass dim
## 0 / baa H5I_GROUP
## 1 / foo H5I_GROUP
## 2 /foo foo_ex H5I_GROUP
Now let's write datasets to those groups (groups: foo, baa, and foo_ex, which is a subgroup of foo). Let's write two datasets to the group foo, leaving the group baa empty.
A = matrix(1:10,nr=5,nc=2)
h5write(A, "example.h5","foo/A")
B = array(seq(0.1,2.0,by=0.1),dim=c(5,2,2))
h5write(B, "example.h5","foo/B")
h5ls("example.h5")
## group name otype dclass dim
## 0 / baa H5I_GROUP
## 1 / foo H5I_GROUP
## 2 /foo A H5I_DATASET INTEGER 5 x 2
## 3 /foo B H5I_DATASET FLOAT 5 x 2 x 2
## 4 /foo foo_ex H5I_GROUP
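The dclass and dim columns in the listing mirror the type and dimensions of the R objects that were written. A quick base-R check of what A and B look like before they go into the file (no HDF5 involved):

```r
A <- matrix(1:10, nrow = 5, ncol = 2)
B <- array(seq(0.1, 2.0, by = 0.1), dim = c(5, 2, 2))

dim(A)     # rows x columns
dim(B)     # rows x columns x slices
typeof(A)  # storage type of A
typeof(B)  # storage type of B

B[, , 1]   # first 5 x 2 slice of B
```

A is stored as integers and B as doubles, which is why the listing reports INTEGER for one and FLOAT for the other.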
What do we see? The file contains two top-level groups: baa and foo. Group baa is empty. Group foo contains one subgroup (foo_ex) and two datasets, A and B (here a matrix and an array; in practice they are often data-frame-like objects). We can also write something to the root group:
df = data.frame(1L:5L,seq(0,1,length.out=5), c("ab","cde","fghi","a","s"), stringsAsFactors=FALSE)
h5write(df, "example.h5","df")
h5ls("example.h5")
## group name otype dclass dim
## 0 / baa H5I_GROUP
## 1 / df H5I_DATASET COMPOUND 5
## 2 / foo H5I_GROUP
## 3 /foo A H5I_DATASET INTEGER 5 x 2
## 4 /foo B H5I_DATASET FLOAT 5 x 2 x 2
## 5 /foo foo_ex H5I_GROUP
Done with writing? Wanna read something (specifically: dataset A, which is inside group foo)?
h5read("example.h5","foo/A")
## [,1] [,2]
## [1,] 1 6
## [2,] 2 7
## [3,] 3 8
## [4,] 4 9
## [5,] 5 10
What else is cool? Say we want to overwrite the first three elements of the first column of A (which is inside group foo). It's that easy:
h5write(c(12,13,14),"example.h5","foo/A",index=list(1:3,1))
h5read("example.h5","foo/A")
## [,1] [,2]
## [1,] 12 6
## [2,] 13 7
## [3,] 14 8
## [4,] 4 9
## [5,] 5 10
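The `index` argument takes one entry per dimension, in the same order as R's own `[rows, columns]` indexing. As an in-memory analogy (plain base R, not touching the HDF5 file), the same overwrite on an ordinary matrix looks like this:

```r
A <- matrix(1:10, nrow = 5, ncol = 2)

# Same selection as index = list(1:3, 1): rows 1 to 3 of column 1
A[1:3, 1] <- c(12, 13, 14)
A
```

Here `index=list(1:3,1)` plays the role of `A[1:3, 1]`; the difference is that h5write changes the data on disk rather than a copy in memory.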
How to do it? Example:
library(httr) # provides oauth_app, sign_oauth1.0, GET and content
myapp = oauth_app("twitter", key="yourConsumerKeyHere",secret="yourConsumerSecretHere") # start the authorisation process; you get the consumer key from the API website
sig = sign_oauth1.0(myapp, token = "yourTokenHere", token_secret = "yourTokenSecretHere") # again, the necessary info is usually found on the app website (at least for twitter)
homeTL = GET("https://api.twitter.com/1.1/statuses/home_timeline.json", sig) # the URL corresponds to the twitter API and the data I'd like to get out (statuses on the home timeline); with twitter you get JSON data
json1 = content(homeTL) # extract the JSON data
json2 = jsonlite::fromJSON(jsonlite::toJSON(json1)) # the data coming from twitter is hard to read, so it might be a good idea to use the jsonlite package to reformat it into something more readable
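To see what jsonlite does with this kind of data without needing Twitter credentials, here is a small sketch on a made-up JSON string. The fields `id`, `text` and `user.name` are invented for illustration and are not the real Twitter schema. `fromJSON` turns an array of records into a data frame, and `flatten = TRUE` spreads nested objects into ordinary columns:

```r
library(jsonlite)

# Made-up JSON standing in for an API response
toyJson <- '[
  {"id": 1, "text": "first status",  "user": {"name": "alice"}},
  {"id": 2, "text": "second status", "user": {"name": "bob"}}
]'

# Parse into a data frame; nested "user" fields become "user.*" columns
statuses <- fromJSON(toyJson, flatten = TRUE)
names(statuses)
statuses$user.name
```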
For most other formats (SPSS data, Octave data, etc.) there are packages for that. For interacting more directly with files: