Pick three of your favorite books on one of your favorite subjects. At least one of the books should have more than one author. For each book, include the title, authors, and two or three other attributes that you find interesting. Take the information that you’ve selected about these three books, and separately create three files which store the books’ information in HTML (using an HTML table), XML, and JSON formats (e.g., “books.html”, “books.xml”, and “books.json”). To help you better understand the different file structures, I’d prefer that you create each of these files “by hand” unless you’re already very comfortable with the file formats. Write R code, using your packages of choice, to load the information from each of the three sources into separate R data frames. Are the three data frames identical?

Loading necessary packages

# install.packages("XML", dependencies=TRUE)
library(knitr)
library(XML)
library(RCurl)
## Loading required package: bitops
library(jsonlite)

Loading and reading HTML file

url <- "https://raw.githubusercontent.com/olgashiligin/607_assignments/master/books.html"
html_file <- getURL(url)
html_file
## [1] "<html> <head> </head>\n<body>\n\t<table>\n\t\t<tr> <th>ID</th><th>title</th> <th>author</th> <th>price</th> <th>isbn</th> <th>pages</th></tr>\n\t\t<tr> <td>01</th> <th>Jamie Cooks Italy</th> <th>Jamie Oliver</th> <th>9.99</th> <th>0718187733</th> <th>408</th> </tr>\n\t\t<tr> <td>02</th> <th>Mary Berry's Complete Cookbook: Over 650 recipes</th> <th>Marry Berry</th> <th>21.00</th> <th>0241286123</th> <th>608</th> </tr>\n\t\t<tr> <td>03</th> <th>Great British Bake Off: Everyday: Over 100 Foolproof Bakes</th> <th>Linda Collister, Marry Berry</th> <th>18.17</th> <th>9781849906081</th> <th>320</th> </tr>\n\t</table>\n</body>\n"
html_table <- readHTMLTable(html_file, header=TRUE, which=1)
html_table
##   ID                                                      title
## 1 01                                          Jamie Cooks Italy
## 2 02           Mary Berry's Complete Cookbook: Over 650 recipes
## 3 03 Great British Bake Off: Everyday: Over 100 Foolproof Bakes
##                         author price          isbn pages
## 1                 Jamie Oliver  9.99    0718187733   408
## 2                  Marry Berry 21.00    0241286123   608
## 3 Linda Collister, Marry Berry 18.17 9781849906081   320

Loading and reading XML file

url2 <- "https://raw.githubusercontent.com/olgashiligin/607_assignments/master/books.xml"
xml_file <- getURL(url2)

xml_table <- xmlToDataFrame(xml_file)
xml_table
##                                                        title
## 1                                          Jamie Cooks Italy
## 2           Mary Berry's Complete Cookbook: Over 650 recipes
## 3 Great British Bake Off: Everyday: Over 100 Foolproof Bakes
##                         author price          isbn pages
## 1                 Jamie Oliver  9.99    0718187733   408
## 2                  Marry Berry 21.00    0241286123   608
## 3 Linda Collister, Marry Berry 18.17 9781849906081   320
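
The XML data frame has no ID column. The contents of books.xml are not shown here, so the cause is unknown. One possibility, and it is only an assumption, is that the ID is stored as an attribute of each book element: xmlToDataFrame builds its columns from child elements and does not read attributes. A minimal sketch on a made-up two-book XML string (the `<books>`/`<book>` layout and the `ID` attribute are hypothetical):

```r
library(XML)

# Hypothetical XML layout: ID as an attribute, other fields as child elements.
xml_text <- '<books>
  <book ID="01"><title>Jamie Cooks Italy</title><author>Jamie Oliver</author></book>
  <book ID="03"><title>Great British Bake Off: Everyday: Over 100 Foolproof Bakes</title><author>Linda Collister, Marry Berry</author></book>
</books>'

doc <- xmlParse(xml_text, asText = TRUE)

# Columns come from the child elements only, so ID is missing here.
xml_sketch <- xmlToDataFrame(doc, stringsAsFactors = FALSE)
names(xml_sketch)

# Pull the attribute separately and attach it as a column.
xml_sketch$ID <- xpathSApply(doc, "//book", xmlGetAttr, "ID")
xml_sketch
```

If the real file has no ID at all, the column would have to be added to books.xml itself.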

Loading and reading JSON file

url3 <- "https://raw.githubusercontent.com/olgashiligin/607_assignments/master/books.json"
json_file <- fromJSON(url3)
json_file
## $table
## $table$book
##   ID                                                      Title
## 1 01                                          Jamie Cooks Italy
## 2 02           Mary Berry's Complete Cookbook: Over 650 recipes
## 3 03 Great British Bake Off: Everyday: Over 100 Foolproof Bakes
##                         Author price          isbn pages
## 1                 Jamie Oliver  9.99    0718187733   408
## 2                  Marry Berry 21.00    0241286123   608
## 3 Linda Collister, Marry Berry 18.17 9781849906081   320
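
The printed result starts with `$table` and `$table$book`, so fromJSON returned a nested list rather than a data frame. The data frame itself sits one level down. A minimal sketch, using an inline JSON string whose layout is assumed from the printed output (books.json itself is not shown, and only two of the three records are reproduced):

```r
library(jsonlite)

# Assumed layout: an outer "table" object holding a "book" array of records.
json_text <- '{"table": {"book": [
  {"ID": "01", "Title": "Jamie Cooks Italy", "Author": "Jamie Oliver",
   "price": "9.99", "isbn": "0718187733", "pages": "408"},
  {"ID": "03", "Title": "Great British Bake Off: Everyday: Over 100 Foolproof Bakes",
   "Author": "Linda Collister, Marry Berry",
   "price": "18.17", "isbn": "9781849906081", "pages": "320"}
]}}'

parsed <- fromJSON(json_text)
class(parsed)             # the outer object is a list

# Step down to the array of records, which jsonlite simplifies to a data frame.
json_table <- parsed$table$book
class(json_table)
json_table
```

With the real file, the same step would be `json_table <- json_file$table$book`.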

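The three data frames hold the same book records, but they are not identical. The XML data frame has no ID column, the JSON result capitalizes the Title and Author column names, and fromJSON returns a nested list in which the data frame sits at json_file$table$book.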
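
The comparison can be checked in code instead of by eye. The sketch below is self-contained: it rebuilds stand-in data frames from the printed output above rather than downloading the files. The all-character column types are an assumption, and `normalize` is a hypothetical helper, not part of any package.

```r
# Stand-in for the HTML data frame, copied from the printed output.
html_df <- data.frame(
  ID     = c("01", "02", "03"),
  title  = c("Jamie Cooks Italy",
             "Mary Berry's Complete Cookbook: Over 650 recipes",
             "Great British Bake Off: Everyday: Over 100 Foolproof Bakes"),
  author = c("Jamie Oliver", "Marry Berry", "Linda Collister, Marry Berry"),
  price  = c("9.99", "21.00", "18.17"),
  isbn   = c("0718187733", "0241286123", "9781849906081"),
  pages  = c("408", "608", "320"),
  stringsAsFactors = FALSE
)

# XML stand-in: same records, no ID column.
xml_df <- html_df[names(html_df) != "ID"]

# JSON stand-in: same records, capitalized Title and Author.
json_df <- html_df
names(json_df)[names(json_df) == "title"]  <- "Title"
names(json_df)[names(json_df) == "author"] <- "Author"

# Strict comparison of the frames as loaded.
same_html_xml  <- identical(html_df, xml_df)
same_html_json <- identical(html_df, json_df)

# Hypothetical helper: lower-case the column names and drop the ID column.
normalize <- function(df) {
  names(df) <- tolower(names(df))
  df[names(df) != "id"]
}

# Comparison after removing the structural differences.
same_after_normalizing <-
  isTRUE(all.equal(normalize(html_df), normalize(xml_df))) &&
  isTRUE(all.equal(normalize(html_df), normalize(json_df)))

print(c(same_html_xml, same_html_json, same_after_normalizing))
# [1] FALSE FALSE  TRUE
```

On these stand-ins, the strict checks fail and the normalized check passes, so the differences are in column names and the missing ID, not in the book records.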