Introduction

Pick three of your favorite books on one of your favorite subjects. At least one of the books should have more than one author. For each book, include the title, authors, and two or three other attributes that you find interesting. Take the information that you’ve selected about these three books, and separately create three files which store the book’s information in HTML (using an html table), XML, and JSON formats (e.g. “books.html”, “books.xml”, and “books.json”). To help you better understand the different file structures, I’d prefer that you create each of these files “by hand” unless you’re already very comfortable with the file formats. Write R code, using your packages of choice, to load the information from each of the three sources into separate R data frames. Are the three data frames identical?

Library

# Loading the required libraries to work with the HTML, XML and JSON data files

library(rvest)
library(XML)
library(knitr)
library(httr)
library(rjson)

Read HTML file into R

Now let’s parse the HTML file into R using the libraries loaded above.

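# Read the file locally: parsing it straight from my GitHub account failed,
# possibly because readHTMLTable() does not fetch https URLs itself
books_html <- readHTMLTable("books.html")
# readHTMLTable() returns a list of tables; take the first and flatten each column
books_html <- lapply(books_html[[1]], function(x) {unlist(x)})
df_html <- as.data.frame(books_html)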
kable(df_html)
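|Title              |Authors                              |Pages |Publishing.year |
|:------------------|:------------------------------------|:-----|:---------------|
|Winning            |Jack Welch                           |384   |2009            |
|R for everyone     |Jared Lander                         |528   |2017            |
|R for data science |Hadley Wickham and Garrett Grolemund |492   |2016            |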
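The rvest package is loaded above but not used for the HTML import. As a minimal sketch of that alternative, the block below writes a stand-in for books.html to a temporary file and reads it with read_html() and html_table(). The markup in `html_text` is an assumption about how the hand-written file might look, and the names `html_text`, `html_path`, `page`, and `df_rvest` are illustrative only.

```r
library(rvest)

# Hypothetical stand-in for the hand-written books.html (layout is assumed)
html_text <- '<html><body><table>
<tr><th>Title</th><th>Authors</th><th>Pages</th><th>Publishing year</th></tr>
<tr><td>Winning</td><td>Jack Welch</td><td>384</td><td>2009</td></tr>
<tr><td>R for everyone</td><td>Jared Lander</td><td>528</td><td>2017</td></tr>
<tr><td>R for data science</td><td>Hadley Wickham and Garrett Grolemund</td><td>492</td><td>2016</td></tr>
</table></body></html>'

# Write the markup to a temporary file so it is read the same way as a local file
html_path <- tempfile(fileext = ".html")
writeLines(html_text, html_path)

page <- read_html(html_path)

# html_table() on a whole document returns a list with one element per <table>
df_rvest <- as.data.frame(html_table(page)[[1]])
str(df_rvest)
```

Recent rvest versions convert numeric-looking columns by default, so the column types here may differ from those produced by readHTMLTable(), which is worth keeping in mind when comparing data frames.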

Read XML file into R

Now let’s read the XML file into R using the XML library.

books_xml <- xmlInternalTreeParse("books.xml")
xml_df <- xmlToDataFrame(books_xml)
kable(xml_df)
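|Title              |Authors                              |Pages |Publishing.year |
|:------------------|:------------------------------------|:-----|:---------------|
|Winning            |Jack Welch                           |384   |2009            |
|R for everyone     |Jared Lander                         |528   |2017            |
|R for data science |Hadley Wickham and Garrett Grolemund |492   |2016            |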
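xmlToDataFrame() expects a fairly flat document: one child element per record, each holding one element per column. The sketch below shows that shape with an inline string instead of a file. The element names (`books`, `book`, `Title`, and so on) are assumptions about how books.xml might be laid out, not the actual file.

```r
library(XML)

# Hypothetical stand-in for books.xml: one <book> per row, one child per column
xml_text <- '<books>
  <book><Title>Winning</Title><Authors>Jack Welch</Authors><Pages>384</Pages><Publishing.year>2009</Publishing.year></book>
  <book><Title>R for everyone</Title><Authors>Jared Lander</Authors><Pages>528</Pages><Publishing.year>2017</Publishing.year></book>
  <book><Title>R for data science</Title><Authors>Hadley Wickham and Garrett Grolemund</Authors><Pages>492</Pages><Publishing.year>2016</Publishing.year></book>
</books>'

# asText = TRUE tells the parser the argument is XML content, not a file name
doc <- xmlParse(xml_text, asText = TRUE)
xml_df <- xmlToDataFrame(doc, stringsAsFactors = FALSE)

# Element values are read as text, so numeric columns need explicit conversion
xml_df$Pages <- as.integer(xml_df$Pages)
xml_df$Publishing.year <- as.integer(xml_df$Publishing.year)
str(xml_df)
```

Converting the numeric columns explicitly makes the result easier to compare with imports from formats that carry number types, such as JSON.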

Loading data from JSON to R

Now let’s load the JSON file into R using the rjson library.

books_json <- fromJSON(file="books.json")
books3 <- as.data.frame(books_json)
kable(books3)
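|Title              |Authors                             |Pages |Publishing.year |
|:------------------|:-----------------------------------|:-----|:---------------|
|Winning            |Jack Welch                          |384   |2009            |
|R for everyone     |Jared Lander                        |528   |2017            |
|R for data science |Hadley Wickham and Garett Grolemund |492   |2016            |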
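Calling as.data.frame() directly on the parsed JSON works cleanly when the file is column-oriented, that is, one array per attribute. The sketch below assumes that layout for books.json; the actual file is not shown in this write-up, so `json_text` is a hypothetical stand-in, parsed from a string via the `json_str` argument instead of `file`.

```r
library(rjson)

# Hypothetical column-oriented stand-in for books.json: one array per attribute
json_text <- '{
  "Title": ["Winning", "R for everyone", "R for data science"],
  "Authors": ["Jack Welch", "Jared Lander", "Hadley Wickham and Garrett Grolemund"],
  "Pages": [384, 528, 492],
  "Publishing year": [2009, 2017, 2016]
}'

# fromJSON() returns a named list with one vector per JSON array
books_list <- fromJSON(json_str = json_text)

# as.data.frame() turns each list element into a column and
# rewrites "Publishing year" as the syntactic name "Publishing.year"
books_df <- as.data.frame(books_list, stringsAsFactors = FALSE)
str(books_df)
```

A record-oriented file (an array with one object per book) would need a different conversion step, since the parsed result would then be a list of records and not a list of columns.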

Conclusion

The XML package was used to read the HTML and XML files, and the rjson package to read the JSON file. I created all three files by hand and loaded them from local disk.

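Finally, to answer whether the three data frames are identical: the printed tables show the same titles, page counts and publishing years, but this was judged by eye and not with identical(), so differences in column types would not show up. One visible difference is the spelling “Garett” in the JSON output where the HTML and XML outputs have “Garrett”, which presumably traces back to a typo in the hand-written books.json.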
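Whether data frames are identical can also be checked programmatically with base R’s identical(). The sketch below uses small stand-in data frames, not the real imports, to show that tables which print alike can still differ in column type; the assumption that the HTML and XML imports hold text while the JSON import holds numbers is illustrative, and `normalise()` is a hypothetical helper.

```r
# Stand-ins for the three imports: same values, assumed different column types
df_html <- data.frame(Title = c("Winning", "R for everyone"),
                      Pages = c("384", "528"),   # text, as read from markup
                      stringsAsFactors = FALSE)
df_xml  <- df_html                               # XML values assumed to be text too
df_json <- data.frame(Title = c("Winning", "R for everyone"),
                      Pages = c(384, 528),       # JSON numbers stay numeric
                      stringsAsFactors = FALSE)

# identical() compares values, types, and attributes
same_html_xml  <- identical(df_html, df_xml)
same_html_json <- identical(df_html, df_json)

# Hypothetical helper: convert every column to character before comparing values
normalise <- function(df) {
  df[] <- lapply(df, as.character)
  df
}
same_after_normalising <- identical(normalise(df_html), normalise(df_json))

print(c(same_html_xml, same_html_json, same_after_normalising))
```

Normalising to character is a deliberately blunt choice for comparing content only; converting each column to its intended type would be the stricter alternative.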