library(plyr)
library(knitr)

Assignment

In this assignment we will load the same book data into R from three different file formats (HTML, XML, and JSON) as a prelude to further analysis.

Part 1: HTML

Data from: https://github.com/scottogden10/607-Assignment2/blob/master/Books.html

Use the htmltab package to read the table and format it as a data frame.

html<-"https://raw.githubusercontent.com/scottogden10/607-Assignment2/master/Books.html"
library(htmltab)

htmldf<-htmltab::htmltab(html)
## Argument 'which' was left unspecified. Choosing first table.
htmldf
##                                           Title                   Authors
## 2 The Singular Universe and the Reality of Time Roberto Unger, Lee Smolin
## 3                Three Roads to Quantum Gravity                Lee Smolin
## 4                      The Trouble With Physics                Lee Smolin
##           ISBN13                  Publisher AmazonShippingWeight_lbs
## 2 978-1107074064 Cambridge University Press                      2.2
## 3 978-0465078363                Basic Books                     0.95
## 4 978-0618918683              Mariner Books                      1.1

Part 2: XML

Data from: https://github.com/scottogden10/607-Assignment2/blob/master/Books.xml

Use the XML and RCurl packages to download and parse the file, then format it as a data frame.

xml<-"https://raw.githubusercontent.com/scottogden10/607-Assignment2/master/Books.xml"

library("XML")
library(RCurl)
## Loading required package: bitops
xmldata<-getURL(xml)
xmlparse<-xmlParse(xmldata)
xmlparse
## <?xml version="1.0"?>
## <books>
##   <book id="1">
##     <Title>The Singular Universe</Title>
##     <Authors>Roberto Unger, Lee Smolin </Authors>
##     <ISBN13>978-1107074064</ISBN13>
##     <Publisher>Cambridge University Press</Publisher>
##     <AmazonShippingWeight_lbs>2.2</AmazonShippingWeight_lbs>
##   </book>
##   <book id="2">
##     <Title>Three Roads to Quantum Gravity</Title>
##     <Authors>Lee Smolin</Authors>
##     <ISBN13>978-0465078363</ISBN13>
##     <Publisher>Basic Books</Publisher>
##     <AmazonShippingWeight_lbs>0.95</AmazonShippingWeight_lbs>
##   </book>
##   <book id="3">
##     <Title>The Trouble With Physics</Title>
##     <Authors>Lee Smolin</Authors>
##     <ISBN13>978-0618918683</ISBN13>
##     <Publisher>Mariner Books</Publisher>
##     <AmazonShippingWeight_lbs>1.1</AmazonShippingWeight_lbs>
##   </book>
## </books>
## 
xmldf<-xmlToDataFrame(xmlparse)
xmldf
##                            Title                    Authors         ISBN13
## 1          The Singular Universe Roberto Unger, Lee Smolin  978-1107074064
## 2 Three Roads to Quantum Gravity                 Lee Smolin 978-0465078363
## 3       The Trouble With Physics                 Lee Smolin 978-0618918683
##                    Publisher AmazonShippingWeight_lbs
## 1 Cambridge University Press                      2.2
## 2                Basic Books                     0.95
## 3              Mariner Books                      1.1
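
The first Authors value in the XML file ends with a space, which is why that column looks misaligned in the output above. A minimal sketch of cleaning this up with base R follows. `xmldf_demo` is a hand-built stand-in for `xmldf` (an assumption made so the example runs offline), and the columns are assumed to be character; if they are factors, convert them with `as.character()` first.

```r
# Hypothetical stand-in for xmldf, with values copied from the output above.
xmldf_demo <- data.frame(
  Authors = c("Roberto Unger, Lee Smolin ", "Lee Smolin", "Lee Smolin"),
  AmazonShippingWeight_lbs = c("2.2", "0.95", "1.1"),
  stringsAsFactors = FALSE
)

# Trim leading and trailing whitespace in every character column.
is_chr <- vapply(xmldf_demo, is.character, logical(1))
xmldf_demo[is_chr] <- lapply(xmldf_demo[is_chr], trimws)

# Convert the weight column to numeric after trimming.
xmldf_demo$AmazonShippingWeight_lbs <-
  as.numeric(xmldf_demo$AmazonShippingWeight_lbs)

str(xmldf_demo)
```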

Part 3: JSON

Data from: https://github.com/scottogden10/607-Assignment2/blob/master/Books.json

Use the jsonlite package to parse the file and format it as a data frame.

library(jsonlite)

json<-"https://raw.githubusercontent.com/scottogden10/607-Assignment2/master/Books.json"
jsontext<-getURL(json)
jsonlist<-jsonlite::fromJSON(jsontext,simplifyDataFrame = TRUE)
jsondf<-data.frame(jsonlist)

jsondf
##                                           Title                   Authors
## 1 The Singular Universe and the Reality of Time Roberto Unger, Lee Smolin
## 2                Three Roads to Quantum Gravity                Lee Smolin
## 3                      The Trouble With Physics                Lee Smolin
##           ISBN13                  Publisher AmazonShippingWeight_lbs
## 1 978-1107074064 Cambridge University Press                     2.20
## 2 978-0465078363                Basic Books                     0.95
## 3 978-0618918683              Mariner Books                     1.10
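
The weights above print with aligned decimals (2.20, 0.95, 1.10), which suggests that jsonlite parsed them as numbers rather than text. The sketch below illustrates that behaviour on a small inline JSON string. The string is a made-up sample, and it assumes a top-level array of objects; the actual layout of Books.json may differ.

```r
library(jsonlite)

# Hypothetical sample JSON: an array of objects with one text and one numeric field.
json_text <- '[
  {"Title": "Three Roads to Quantum Gravity", "AmazonShippingWeight_lbs": 0.95},
  {"Title": "The Trouble With Physics", "AmazonShippingWeight_lbs": 1.1}
]'

# fromJSON simplifies an array of objects into a data frame by default.
jsondf_demo <- fromJSON(json_text)

class(jsondf_demo)
sapply(jsondf_demo, class)
```

For input shaped like this sample, the result is already a data frame, so a further `data.frame()` call would not be needed.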

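Note that all three data frames hold the same three books, but they are not strictly identical: the HTML data frame has row names 2 to 4, the XML file carries a shorter first title ("The Singular Universe") and a trailing space in its first Authors entry, and only the JSON version prints AmazonShippingWeight_lbs with aligned decimals, which suggests a numeric column.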
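
Whether two data frames really match can be checked in code rather than by eye. The following is a minimal sketch using base R's `identical()` and `all.equal()`. The data frames `a` and `b` are small made-up examples (not the ones loaded above) that differ only in the type of one column.

```r
# Two hypothetical data frames with the same values but different column types.
a <- data.frame(Title = c("A", "B"), Weight = c("2.2", "0.95"),
                stringsAsFactors = FALSE)
b <- data.frame(Title = c("A", "B"), Weight = c(2.2, 0.95),
                stringsAsFactors = FALSE)

# identical() is strict: column types and attributes must match exactly.
same_strict <- identical(a, b)

# all.equal() returns TRUE or a description of the differences.
same_loose <- isTRUE(all.equal(a, b))

# After harmonising the column type, the comparison can be repeated.
b_chr <- b
b_chr$Weight <- as.character(b_chr$Weight)
same_after_fix <- identical(a, b_chr)

c(strict = same_strict, loose = same_loose, after_fix = same_after_fix)
```

Applied to `htmldf`, `xmldf`, and `jsondf`, the same calls would report the remaining differences in row names, text, and column types.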