library(XML)
library(RCurl)
## Loading required package: bitops
library(xml2)
library(rjson)
# Read every <table> in the HTML file; the result is a list of data frames
importbookhtml <- readHTMLTable("books.html", header = FALSE)
importbookhtml
## $`NULL`
## V1
## 1 The Subtle Art of Not Giving a F*ck: A Counterintuitive Approach to Living a Good Life
## 2 The Road Back to You: An Enneagram Journey to Self-Discovery
## 3 The Nerdist Way: How to Reach the Next Level (In Real Life)
## V2 V3 V4
## 1 Mark Manson 09-13-2016 Harper
## 2 Ian Morgan Cron and Suzanne Stabile 10-04-2016 IVP Books
## 3 Chris Hardwick 11-01-2011 Berkley
# Parse the XML file, then convert the document tree into a nested R list
importbookxml <- xmlParse("books.xml")
bookxml <- xmlToList(importbookxml)
bookxml
## $book
## $book$title
## [1] "The Subtle Art of Not Giving a F*ck: A Counterintuitive Approach to Living a Good Life"
##
## $book$author
## [1] "Mark Manson"
##
## $book$pubdate
## [1] "09-13-2016"
##
## $book$publisher
## [1] "Harper"
##
##
## $book
## $book$title
## [1] "The Road Back to You: An Enneagram Journey to Self-Discovery"
##
## $book$author
## [1] "Ian Morgan Cron and Suzanne Stabile"
##
## $book$pubdate
## [1] "10-04-2016"
##
## $book$publisher
## [1] "IVP Books"
##
##
## $book
## $book$title
## [1] "The Nerdist Way: How to Reach the Next Level (In Real Life)"
##
## $book$author
## [1] "Chris Hardwick"
##
## $book$pubdate
## [1] "11-01-2011"
##
## $book$publisher
## [1] "Berkley"
# Parse the JSON file into a nested R list
importbookjson <- fromJSON(file = "books.json")
importbookjson
## [[1]]
## [[1]]$title
## [1] "The Subtle Art of Not Giving a F*ck: A Counterintuitive Approach to Living a Good Life"
##
## [[1]]$author
## [1] "Mark Manson"
##
## [[1]]$pubdate
## [1] "09-13-2016"
##
## [[1]]$publisher
## [1] "Harper"
##
##
## [[2]]
## [[2]]$title
## [1] "The Road Back to You: An Enneagram Journey to Self-Discovery"
##
## [[2]]$author
## [1] "Ian Morgan Cron and Suzanne Stabile"
##
## [[2]]$pubdate
## [1] "10-04-2016"
##
## [[2]]$publisher
## [1] "IVP Books"
##
##
## [[3]]
## [[3]]$title
## [1] "The Nerdist Way: How to Reach the Next Level (In Real Life)"
##
## [[3]]$author
## [1] "Chris Hardwick"
##
## [[3]]$pubdate
## [1] "11-01-2011"
##
## [[3]]$publisher
## [1] "Berkley"
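The three imported objects are not identical. Reading from XML and JSON yields similar structures: each is a nested list with one element per book, and each book is a list of character strings (the XML elements are all named `book`, while the JSON elements are unnamed). Reading from HTML is different: readHTMLTable returns a list holding a single data frame, with one row per book and the columns V1 to V4.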
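
To compare the three imports on equal terms, the nested lists can be flattened into a data frame with the same shape as the HTML table. The following is a minimal sketch under stated assumptions, not the method used above: `books` is a hypothetical stand-in that copies the structure and values of the printed JSON result, and `book_df` is an illustrative name.

```r
# Hypothetical stand-in for the nested list printed above (same fields and values)
books <- list(
  list(title = "The Subtle Art of Not Giving a F*ck: A Counterintuitive Approach to Living a Good Life",
       author = "Mark Manson", pubdate = "09-13-2016", publisher = "Harper"),
  list(title = "The Road Back to You: An Enneagram Journey to Self-Discovery",
       author = "Ian Morgan Cron and Suzanne Stabile", pubdate = "10-04-2016", publisher = "IVP Books"),
  list(title = "The Nerdist Way: How to Reach the Next Level (In Real Life)",
       author = "Chris Hardwick", pubdate = "11-01-2011", publisher = "Berkley")
)

# Turn each book into a one-row data frame, then stack the rows
book_df <- do.call(
  rbind,
  lapply(books, function(b) as.data.frame(b, stringsAsFactors = FALSE))
)

# The dates are month-day-year strings, so the format is given explicitly
book_df$pubdate <- as.Date(book_df$pubdate, format = "%m-%d-%Y")

str(book_df)
```

With the real data, `importbookjson` would take the place of `books`. The XML result should work the same way, although its elements all share the name `book`, so passing it through `unname()` first is probably the safer choice for tidy row names.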