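# load data from html file
# readHTMLTable(), htmlParse(), xmlParse() and xmlToDataFrame() all come from the XML package
library(XML)
# readLines() returns one string per line; collapse them into a single HTML string
books.html <- paste(readLines("books.html"), collapse = "\n")
# parse the string as HTML text, then extract each <table> as a data frame
books.html <- readHTMLTable(htmlParse(books.html, asText = TRUE), stringsAsFactors = FALSE)
# readHTMLTable() returns a list of data frames; keep the first table
books.html <- books.html[[1]]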
books.html
## title author publisher
## 1 capital carceralism jackie wang semiotext(e)
## 2 the information james gleick pantheon books
## 3 the communist manifesto karl marx, frederich engels penguin classic
## year_published pages
## 1 2018 360
## 2 2011 544
## 3 2001 64
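The HTML step depends on how paste() treats the character vector that readLines() returns (one element per line). The base-R sketch below uses a hypothetical four-line vector, html.lines, in place of the real books.html file. It shows that paste() with no collapse argument hands the vector back unchanged, while collapse = "\n" joins the lines into the single string that htmlParse(..., asText = TRUE) expects.

```r
# Hypothetical stand-in for readLines("books.html"): one string per line.
html.lines <- c(
  "<table>",
  "<tr><th>title</th><th>pages</th></tr>",
  "<tr><td>the information</td><td>544</td></tr>",
  "</table>"
)

# Without collapse, paste() returns the same four-element vector.
unchanged <- paste(html.lines)

# With collapse, the lines are joined into one newline-separated string.
joined <- paste(html.lines, collapse = "\n")

print(length(unchanged))               # 4
print(length(joined))                  # 1
print(identical(unchanged, html.lines)) # TRUE
```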
# load data from xml file
books.xml <- xmlParse("books.xml")
# xmlToDataFrame() turns each child of the root node into a row and its sub-nodes into columns
books.xml <- xmlToDataFrame(books.xml, stringsAsFactors = FALSE)
books.xml
## title author publisher
## 1 capital carceralism jackie wang semiotext(e)
## 2 the information james gleick pantheon books
## 3 the communist manifesto karl marx, frederich engels penguin classic
## year_published pages
## 1 2018 360
## 2 2011 544
## 3 2001 64
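# load data from json file
# assumes the jsonlite package, whose fromJSON() simplifies an array of JSON objects into a data frame
library(jsonlite)
books.json <- fromJSON("books.json")
# fromJSON() returns a list here; the book records sit under its first element
books.json <- books.json[[1]]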
books.json
## title author publisher
## 1 capital carceralism jackie wang semiotext(e)
## 2 the information james gleick pantheon books
## 3 the communist manifesto karl marx, frederich engels penguin classic
## year_published pages
## 1 2018 360
## 2 2011 544
## 3 2001 64
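When printed, the HTML-, XML-, and JSON-derived data frames look identical. They differ in their column classes: the HTML- and XML-derived data frames store year_published and pages as character strings, whereas the JSON-derived data frame keeps them as integers. This is because those values appear in the JSON file as unquoted numbers, which JSON treats as numeric rather than string values.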
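Because print() hides the class difference, inspecting each column's class is a quick way to see it. The base-R sketch below builds two hypothetical three-column stand-ins (books.text for the all-character HTML/XML result, books.typed for the integer-bearing JSON result) instead of reading the real files. It then re-infers the numeric columns of the character version with type.convert(), applied column by column. The as.is = TRUE argument keeps the title column as character rather than converting it to a factor.

```r
# Hypothetical stand-ins: every column is character, as in the HTML/XML results.
books.text <- data.frame(
  title = c("capital carceralism", "the information", "the communist manifesto"),
  year_published = c("2018", "2011", "2001"),
  pages = c("360", "544", "64"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the JSON result, with integer columns.
books.typed <- data.frame(
  title = c("capital carceralism", "the information", "the communist manifesto"),
  year_published = c(2018L, 2011L, 2001L),
  pages = c(360L, 544L, 64L),
  stringsAsFactors = FALSE
)

# The class of each column exposes the difference that print() hides.
print(sapply(books.text, class))
print(sapply(books.typed, class))

# Re-infer each column's type from its strings; "2018" becomes an integer,
# while non-numeric text stays character because as.is = TRUE.
books.converted <- books.text
books.converted[] <- lapply(books.converted, type.convert, as.is = TRUE)

print(sapply(books.converted, class))
print(identical(books.converted, books.typed))
```

Converting after loading keeps the loading code unchanged. The alternative is to declare column types up front when the reader supports it.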