Library preparation

library(XML)
library(RJSONIO)
library(DT)
## Warning: package 'DT' was built under R version 3.5.3

The book files were created manually and saved in three formats on a local device. Copies are also available at:

1. https://raw.githubusercontent.com/uplotnik/Data607/master/Books.html

2. https://raw.githubusercontent.com/uplotnik/Data607/master/Books.xml

3. https://raw.githubusercontent.com/uplotnik/Data607/master/Books.json

HTML file

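# Parse the local HTML file into a document tree.
html_parsed <- htmlParse(file = "file:///C:/Users/a/Desktop/607/Books.html")

# Read the tables on the page as data frames, then keep the first one.
html_table <- readHTMLTable(html_parsed, stringsAsFactors = FALSE)
html_table <- html_table[[1]]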


datatable(html_table)
str(html_table)
## 'data.frame':    3 obs. of  6 variables:
##  $ BookTitle     : chr  "American Gods" "A Game of Thrones" "Tiny Pretty Things"
##  $ Author        : chr  "Neil Gaiman" "George R. Martin" "Sona Charaipotra, Dhonielle Clayton"
##  $ Genre         : chr  "Fantasy" "Fantasy" "Fantasy"
##  $ Publisher     : chr  "William Morrow" "Bantam Spectra,Voyager Books" "Harper Collins Publishers"
##  $ Published Date: chr  "06/19/2001" "08/01/1996" "05/26/2015"
##  $ Pages         : chr  "465" "694" "464"
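
Every column read from the HTML table is a character vector, including the page counts and dates. The following is a minimal sketch of one way to convert them, using base R only. The data frame is rebuilt by hand from the output above (three of the six columns) so that the example runs without the source file. The date format "%m/%d/%Y" is an assumption based on the values shown.

```r
# Hand-built stand-in for html_table, limited to three columns for brevity.
html_table <- data.frame(
  BookTitle = c("American Gods", "A Game of Thrones", "Tiny Pretty Things"),
  `Published Date` = c("06/19/2001", "08/01/1996", "05/26/2015"),
  Pages = c("465", "694", "464"),
  stringsAsFactors = FALSE,
  check.names = FALSE  # keep the space in "Published Date"
)

# Convert page counts from text to integers.
html_table$Pages <- as.integer(html_table$Pages)

# Convert dates from text to Date objects (assumed month/day/year format).
html_table$`Published Date` <- as.Date(html_table$`Published Date`,
                                       format = "%m/%d/%Y")

str(html_table)
```

After the conversion, Pages can be used in arithmetic and the dates can be sorted chronologically.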

XML file

# Parse the local XML file and flatten its records into a data frame.
xml_books <- "file:///C:/Users/a/Desktop/607/Books.xml"
book <- xmlParse(xml_books)
xml_table <- xmlToDataFrame(book)
datatable(xml_table)
str(xml_table)
## 'data.frame':    3 obs. of  6 variables:
##  $ BookTitle    : Factor w/ 3 levels "A Game of Thrones",..: 2 1 3
##  $ Author       : Factor w/ 3 levels "George R. Martin",..: 2 1 3
##  $ Genre        : Factor w/ 1 level "Fantasy": 1 1 1
##  $ Publisher    : Factor w/ 3 levels "Bantam Spectra,Voyager Books",..: 3 1 2
##  $ PublishedDate: Factor w/ 3 levels "05/26/2015","06/19/2001",..: 2 3 1
##  $ Pages        : Factor w/ 3 levels "464","465","694": 2 3 1
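
Because every XML column is a factor, calling as.numeric() directly on Pages would return the internal level codes (the 2 3 1 shown above), not the page counts. The sketch below shows one way to convert the factors, using a hand-built stand-in for xml_table with two of the six columns. An alternative that is not run here is to pass stringsAsFactors = FALSE to xmlToDataFrame(), which should avoid the factors in the first place.

```r
# Hand-built stand-in for xml_table, with factors forced on as in the output above.
xml_table <- data.frame(
  BookTitle = c("American Gods", "A Game of Thrones", "Tiny Pretty Things"),
  Pages = c("465", "694", "464"),
  stringsAsFactors = TRUE
)

# Pitfall: converting a factor directly yields its level codes.
codes <- as.numeric(xml_table$Pages)  # 2 3 1

# Convert every factor column to character first.
is_factor <- vapply(xml_table, is.factor, logical(1))
xml_table[is_factor] <- lapply(xml_table[is_factor], as.character)

# Now the text can be converted to the real page counts.
xml_table$Pages <- as.numeric(xml_table$Pages)  # 465 694 464

str(xml_table)
```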

JSON file

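# Parse the JSON file into a nested R list.
json_books <- fromJSON(content = "file:///C:/Users/a/Desktop/607/Books.JSON")

# Convert each book record to a one-row data frame, then stack the rows.
json_table <- do.call("rbind", lapply(json_books[[1]], data.frame, stringsAsFactors = FALSE))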

datatable(json_table)
str(json_table)
## 'data.frame':    3 obs. of  6 variables:
##  $ BookTitle     : chr  "American Gods" "A Game of Thrones" "Tiny Pretty Things"
##  $ Author        : chr  "Neil Gaiman" "George R. Martin" "Sona Charaipotra, Dhonielle Clayton"
##  $ Genre         : chr  "Fantasy" "Fantasy" "Fantasy"
##  $ Publisher     : chr  "William Morrow" "Bantam Spectra,Voyager Books" "Harper Collins Publishers"
##  $ Published.Date: chr  "06/19/2001" "08/01/1996" "05/26/2015"
##  $ Pages         : num  465 694 464
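
The do.call("rbind", lapply(...)) pattern used above can be illustrated without the JSON file. In this sketch, records is a hypothetical stand-in for json_books[[1]], assuming the parser returns one named list per book and that the JSON key is "Published Date" with a space. It also shows why the column comes out as Published.Date: data.frame() repairs names that are not syntactically valid by default.

```r
# Hypothetical stand-in for json_books[[1]]: one named list per book.
records <- list(
  list(BookTitle = "American Gods", "Published Date" = "06/19/2001", Pages = 465),
  list(BookTitle = "A Game of Thrones", "Published Date" = "08/01/1996", Pages = 694)
)

# Turn each record into a one-row data frame.
rows <- lapply(records, data.frame, stringsAsFactors = FALSE)

# Stack the one-row data frames into a single table.
json_table <- do.call("rbind", rows)

names(json_table)  # "BookTitle" "Published.Date" "Pages"
str(json_table)
```

Numbers in the records stay numeric, which matches the num type reported for Pages above.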

Conclusion

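All three files look the same when their data frames are displayed with datatable() from the DT package, but the underlying structures differ. Every variable read from the HTML file is a character vector. The JSON file also yields character vectors, except "Pages", which is numeric. Every variable read from the XML file is a factor, because xmlToDataFrame() was called with its default settings under R 3.5. The date column is also named differently in each data frame: "Published Date" (HTML), "PublishedDate" (XML), and "Published.Date" (JSON).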
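
The type differences can be checked side by side instead of reading three separate str() outputs. The sketch below uses one-row stand-ins for the three data frames, and first_class is a hypothetical helper name introduced for illustration.

```r
# One-row stand-ins that reproduce the column types reported above.
html_table <- data.frame(BookTitle = "American Gods", Pages = "465",
                         stringsAsFactors = FALSE)
xml_table  <- data.frame(BookTitle = "American Gods", Pages = "465",
                         stringsAsFactors = TRUE)
json_table <- data.frame(BookTitle = "American Gods", Pages = 465,
                         stringsAsFactors = FALSE)

# Hypothetical helper: the first class of every column in a data frame.
first_class <- function(df) {
  vapply(df, function(col) class(col)[1], character(1))
}

# One row per variable, one column per source format.
class_summary <- data.frame(
  HTML = first_class(html_table),
  XML  = first_class(xml_table),
  JSON = first_class(json_table),
  stringsAsFactors = FALSE
)

print(class_summary)
```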