Pick three of your favorite books on one of your favorite subjects. At least one of the books should have more than one author. For each book, include the title, authors, and two or three other attributes that you find interesting. Take the information that you’ve selected about these three books, and separately create three files which store the book’s information in HTML (using an html table), XML, and JSON formats (e.g. “books.html”, “books.xml”, and “books.json”). To help you better understand the different file structures, I’d prefer that you create each of these files “by hand” unless you’re already very comfortable with the file formats. Write R code, using your packages of choice, to load the information from each of the three sources into separate R data frames. Are the three data frames identical?
Loading necessary packages
# install.packages("XML", dependencies=TRUE)
library(knitr)
library(XML)
library(RCurl)
## Loading required package: bitops
library(jsonlite)
Loading and reading HTML file
url<-"https://raw.githubusercontent.com/olgashiligin/607_assignments/master/books.html"
html_file <- getURL(url)
html_file
## [1] "<html> <head> </head>\n<body>\n\t<table>\n\t\t<tr> <th>ID</th><th>title</th> <th>author</th> <th>price</th> <th>isbn</th> <th>pages</th></tr>\n\t\t<tr> <td>01</th> <th>Jamie Cooks Italy</th> <th>Jamie Oliver</th> <th>9.99</th> <th>0718187733</th> <th>408</th> </tr>\n\t\t<tr> <td>02</th> <th>Mary Berry's Complete Cookbook: Over 650 recipes</th> <th>Marry Berry</th> <th>21.00</th> <th>0241286123</th> <th>608</th> </tr>\n\t\t<tr> <td>03</th> <th>Great British Bake Off: Everyday: Over 100 Foolproof Bakes</th> <th>Linda Collister, Marry Berry</th> <th>18.17</th> <th>9781849906081</th> <th>320</th> </tr>\n\t</table>\n</body>\n"
html_table <- readHTMLTable(html_file, header=TRUE, which=1)
html_table
##   ID                                                      title
## 1 01                                          Jamie Cooks Italy
## 2 02           Mary Berry's Complete Cookbook: Over 650 recipes
## 3 03 Great British Bake Off: Everyday: Over 100 Foolproof Bakes
##                         author price          isbn pages
## 1                 Jamie Oliver  9.99    0718187733   408
## 2                  Marry Berry 21.00    0241286123   608
## 3 Linda Collister, Marry Berry 18.17 9781849906081   320
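The leading zeros in ID and isbn, and the trailing zeros in price, suggest that readHTMLTable read every column as text rather than as numbers. A minimal sketch of converting the numeric columns afterwards, assuming a small hypothetical inline table (the titles and values are made up) instead of the books.html file above:

```r
library(XML)

# Hypothetical two-row table, written inline so the example needs no network access
html_text <- paste0(
  "<html><body><table>",
  "<tr><th>ID</th><th>title</th><th>price</th><th>pages</th></tr>",
  "<tr><td>01</td><td>Book A</td><td>9.99</td><td>408</td></tr>",
  "<tr><td>02</td><td>Book B</td><td>21.00</td><td>608</td></tr>",
  "</table></body></html>"
)

# Parse the string explicitly as HTML content, then read the first table
doc <- htmlParse(html_text, asText = TRUE)
books_html <- readHTMLTable(doc, header = TRUE, which = 1, stringsAsFactors = FALSE)

# Convert numeric columns explicitly; as.character() first keeps this safe
# even if a column arrived as a factor
books_html$price <- as.numeric(as.character(books_html$price))
books_html$pages <- as.integer(as.character(books_html$pages))

str(books_html)
```

ID is left as text on purpose, since converting it to a number would drop the leading zero.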
Loading and reading XML file
url2 <- "https://raw.githubusercontent.com/olgashiligin/607_assignments/master/books.xml"
xml_file<-getURL(url2)
xml_table <- xmlToDataFrame(xml_file)
xml_table
##                                                        title
## 1                                          Jamie Cooks Italy
## 2           Mary Berry's Complete Cookbook: Over 650 recipes
## 3 Great British Bake Off: Everyday: Over 100 Foolproof Bakes
##                         author price          isbn pages
## 1                 Jamie Oliver  9.99    0718187733   408
## 2                  Marry Berry 21.00    0241286123   608
## 3 Linda Collister, Marry Berry 18.17 9781849906081   320
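The XML data frame above has no ID column, unlike the HTML one. One possible cause, which is only an assumption because books.xml itself is not shown here, is that the ID is stored as an attribute of each book element: xmlToDataFrame builds its columns from child elements and does not read attributes. A minimal sketch with a hypothetical inline document (element names, attribute name, and values are made up):

```r
library(XML)

# Hypothetical XML in which the ID is an attribute, not a child element
xml_text <- paste0(
  "<books>",
  "<book id=\"01\"><title>Book A</title><pages>408</pages></book>",
  "<book id=\"02\"><title>Book B</title><pages>608</pages></book>",
  "</books>"
)

doc <- xmlParse(xml_text, asText = TRUE)

# Child elements become columns; the id attribute is not picked up
books_xml <- xmlToDataFrame(doc, stringsAsFactors = FALSE)

# Recover the attribute with an XPath query and add it as a column
books_xml$ID <- xpathSApply(doc, "//book", xmlGetAttr, "id")
books_xml <- books_xml[, c("ID", "title", "pages")]

books_xml
```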
Loading and reading JSON file
url3<-"https://raw.githubusercontent.com/olgashiligin/607_assignments/master/books.json"
json_file <- fromJSON(url3)
json_file
## $table
## $table$book
##   ID                                                      Title
## 1 01                                          Jamie Cooks Italy
## 2 02           Mary Berry's Complete Cookbook: Over 650 recipes
## 3 03 Great British Bake Off: Everyday: Over 100 Foolproof Bakes
##                         Author price          isbn pages
## 1                 Jamie Oliver  9.99    0718187733   408
## 2                  Marry Berry 21.00    0241286123   608
## 3 Linda Collister, Marry Berry 18.17 9781849906081   320
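As the $table$book prefix in the output shows, fromJSON returned a nested list here, with the data frame sitting two levels down. A minimal sketch of extracting it and aligning the column names, assuming a hypothetical inline JSON string with the same nesting (the records are made up):

```r
library(jsonlite)

# Hypothetical JSON with the same table -> book nesting as the output above
json_text <- '{"table": {"book": [
  {"ID": "01", "Title": "Book A", "pages": "408"},
  {"ID": "02", "Title": "Book B", "pages": "608"}
]}}'

parsed <- fromJSON(json_text)

# The array of objects is simplified to a data frame inside the nested list
books_json <- parsed$table$book

# Lower-case the names so they line up with columns named "title", "author", etc.
names(books_json) <- tolower(names(books_json))

books_json
```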
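The three results contain the same book records, but they are not identical as R objects. The XML data frame has no ID column, the JSON column names are capitalized differently (Title and Author rather than title and author), and fromJSON returns a nested list in which the data frame sits at json_file$table$book.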
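Printed output only shows whether the values look alike; identical() tests whether two objects really match in names, types, and values. A minimal sketch, using small made-up data frames that mimic the kinds of differences visible in the output above (a missing ID column, different capitalization, and a hypothetical integer column), plus a hypothetical helper normalize() that puts them in a common shape before comparing:

```r
# Made-up stand-ins for the three loaded data frames
html_df <- data.frame(ID = c("01", "02"), title = c("Book A", "Book B"),
                      pages = c("408", "608"), stringsAsFactors = FALSE)
xml_df  <- data.frame(title = c("Book A", "Book B"),
                      pages = c("408", "608"), stringsAsFactors = FALSE)
json_df <- data.frame(ID = c("01", "02"), Title = c("Book A", "Book B"),
                      pages = c(408L, 608L), stringsAsFactors = FALSE)

# Strict comparison: any difference in columns, names, or types fails
strict_html_xml  <- identical(html_df, xml_df)
strict_html_json <- identical(html_df, json_df)

# Hypothetical helper: lower-case names, keep shared columns, compare as text
normalize <- function(df, cols) {
  names(df) <- tolower(names(df))
  df <- df[, cols, drop = FALSE]
  df[] <- lapply(df, as.character)
  rownames(df) <- NULL
  df
}

shared <- c("title", "pages")
same_html_xml  <- identical(normalize(html_df, shared), normalize(xml_df, shared))
same_html_json <- identical(normalize(html_df, shared), normalize(json_df, shared))

c(strict_html_xml, strict_html_json, same_html_xml, same_html_json)
```

With these stand-ins the strict comparisons fail and the normalized ones succeed, which is the distinction between "identical" and "same content".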