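library(XML)      # htmlParse(), readHTMLTable(), xmlParse(), xmlToDataFrame()
library(RJSONIO)  # fromJSON()
library(DT)       # datatable() for interactive table views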
## Warning: package 'DT' was built under R version 3.5.3
The Books files were created manually and saved in three formats on a local machine. They are also available at:
1. https://raw.githubusercontent.com/uplotnik/Data607/master/Books.html
2. https://raw.githubusercontent.com/uplotnik/Data607/master/Books.xml
3. https://raw.githubusercontent.com/uplotnik/Data607/master/Books.json
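# Parse the local HTML file and extract its tables as a list of data frames
html_parsed <- htmlParse(file = "file:///C:/Users/a/Desktop/607/Books.html")
html_table <- readHTMLTable(html_parsed, stringsAsFactors = FALSE)
html_table <- html_table[[1]]  # keep the first table in the list
datatable(html_table)
str(html_table)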
## 'data.frame': 3 obs. of 6 variables:
## $ BookTitle : chr "American Gods" "A Game of Thrones" "Tiny Pretty Things"
## $ Author : chr "Neil Gaiman" "George R. Martin" "Sona Charaipotra, Dhonielle Clayton"
## $ Genre : chr "Fantasy" "Fantasy" "Fantasy"
## $ Publisher : chr "William Morrow" "Bantam Spectra,Voyager Books" "Harper Collins Publishers"
## $ Published Date: chr "06/19/2001" "08/01/1996" "05/26/2015"
## $ Pages : chr "465" "694" "464"
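Because every column of the HTML table arrives as character, numeric and date operations need an explicit conversion first. Below is a minimal sketch, assuming the values shown by `str()` above. `html_sample` is a hypothetical two-column stand-in rebuilt from that output, not the object returned by `readHTMLTable()`:

```r
# Hypothetical stand-in: two columns copied from the str(html_table) output above
html_sample <- data.frame(
  Pages = c("465", "694", "464"),
  `Published Date` = c("06/19/2001", "08/01/1996", "05/26/2015"),
  check.names = FALSE,        # keep the space in "Published Date"
  stringsAsFactors = FALSE
)

# Pages: character -> integer
html_sample$Pages <- as.integer(html_sample$Pages)

# Published Date: "mm/dd/yyyy" strings -> Date objects
html_sample[["Published Date"]] <- as.Date(html_sample[["Published Date"]],
                                           format = "%m/%d/%Y")

str(html_sample)
```

After the conversion, the dates can be sorted or subtracted, and the page counts can be summed, which is not possible while they are strings.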
# Parse the local XML file and turn the root's child nodes into data frame rows
xml_books <- "file:///C:/Users/a/Desktop/607/Books.xml"
book <- xmlParse(xml_books)
xml_table <- xmlToDataFrame(book)
datatable(xml_table)
str(xml_table)
## 'data.frame': 3 obs. of 6 variables:
## $ BookTitle : Factor w/ 3 levels "A Game of Thrones",..: 2 1 3
## $ Author : Factor w/ 3 levels "George R. Martin",..: 2 1 3
## $ Genre : Factor w/ 1 level "Fantasy": 1 1 1
## $ Publisher : Factor w/ 3 levels "Bantam Spectra,Voyager Books",..: 3 1 2
## $ PublishedDate: Factor w/ 3 levels "05/26/2015","06/19/2001",..: 2 3 1
## $ Pages : Factor w/ 3 levels "464","465","694": 2 3 1
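Factor columns carry a trap: `as.integer()` on a factor returns the internal level codes, not the printed values. The sketch below rebuilds the `Pages` factor from the `str()` output above (levels "464", "465", "694", codes 2 3 1) and shows the safe conversion through `as.character()`. `pages` and `xml_sample` are hypothetical names. The XML package documents a `stringsAsFactors` argument for `xmlToDataFrame()`, so passing `stringsAsFactors = FALSE` there should avoid the factors altogether. That variant is not shown here.

```r
# Hypothetical copy of the Pages column as str(xml_table) reports it
pages <- factor(c("465", "694", "464"))   # levels sorted: "464" "465" "694"

as.integer(pages)                 # level codes: 2 3 1 (not page counts)
as.integer(as.character(pages))   # actual values: 465 694 464

# Convert every factor column of a data frame to character in one step
xml_sample <- data.frame(Genre = factor(rep("Fantasy", 3)), Pages = pages)
is_factor <- vapply(xml_sample, is.factor, logical(1))
xml_sample[is_factor] <- lapply(xml_sample[is_factor], as.character)
str(xml_sample)
```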
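# Read the local JSON file; json_books[[1]] holds one element per book
json_books <- fromJSON(content = "file:///C:/Users/a/Desktop/607/Books.JSON")
# Turn each book record into a one-row data frame, then stack the rows
json_table <- do.call("rbind", lapply(json_books[[1]], data.frame, stringsAsFactors = FALSE))
datatable(json_table)
str(json_table)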
## 'data.frame': 3 obs. of 6 variables:
## $ BookTitle : chr "American Gods" "A Game of Thrones" "Tiny Pretty Things"
## $ Author : chr "Neil Gaiman" "George R. Martin" "Sona Charaipotra, Dhonielle Clayton"
## $ Genre : chr "Fantasy" "Fantasy" "Fantasy"
## $ Publisher : chr "William Morrow" "Bantam Spectra,Voyager Books" "Harper Collins Publishers"
## $ Published.Date: chr "06/19/2001" "08/01/1996" "05/26/2015"
## $ Pages : num 465 694 464
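The `do.call("rbind", lapply(...))` line works in two steps. Here is a sketch on a hypothetical list of records, assuming each element of `json_books[[1]]` is a named list with one entry per field. The values are a subset copied from the `str()` output above:

```r
# Hypothetical records shaped like json_books[[1]]: one named list per book
records <- list(
  list(BookTitle = "American Gods",      `Published Date` = "06/19/2001", Pages = 465),
  list(BookTitle = "A Game of Thrones",  `Published Date` = "08/01/1996", Pages = 694),
  list(BookTitle = "Tiny Pretty Things", `Published Date` = "05/26/2015", Pages = 464)
)

# Step 1: lapply() turns each record into a one-row data frame
rows <- lapply(records, data.frame, stringsAsFactors = FALSE)

# Step 2: do.call("rbind", ...) calls rbind(rows[[1]], rows[[2]], rows[[3]])
json_sample <- do.call("rbind", rows)

names(json_sample)   # "BookTitle" "Published.Date" "Pages"
```

By default, `data.frame()` passes column names through `make.names()`, so a key written as `Published Date` becomes `Published.Date`. Adding `check.names = FALSE` to the `lapply()` call should keep the original spelling.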
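All three data frames look the same when displayed with DT::datatable(), but their structure differs. In the HTML and JSON versions the variables are character vectors, except “Pages”, which is numeric in the JSON version (presumably because the JSON file stores page counts as unquoted numbers). In the XML version every variable is a factor: xmlToDataFrame() was called without stringsAsFactors = FALSE, and in R versions before 4.0.0 strings were converted to factors by default. The date column is also named differently in each version: “Published Date” (HTML), “PublishedDate” (XML) and “Published.Date” (JSON, where data.frame() replaced the space with a dot).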
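Because the data frames differ in column names and types, `identical()` treats them as different even though DT displays the same values. Below is a minimal sketch of one way to compare them on content alone. It assumes two rows and two columns copied from the `str()` outputs above. `normalize_books()` and the `*_sample` objects are hypothetical names. Converting every column to character is only one possible normalization; converting to proper numeric and date types would be another.

```r
# Hypothetical helper: strip non-alphanumerics from names, make all columns character
normalize_books <- function(df) {
  names(df) <- gsub("[^A-Za-z0-9]", "", names(df))  # "Published Date" -> "PublishedDate"
  df[] <- lapply(df, as.character)                  # factors/numbers -> character
  df
}

# Samples mimicking each format's str() output (values copied from above)
html_sample <- data.frame(`Published Date` = c("06/19/2001", "08/01/1996"),
                          Pages = c("465", "694"),
                          check.names = FALSE, stringsAsFactors = FALSE)
xml_sample  <- data.frame(PublishedDate = factor(c("06/19/2001", "08/01/1996")),
                          Pages = factor(c("465", "694")))
json_sample <- data.frame(Published.Date = c("06/19/2001", "08/01/1996"),
                          Pages = c(465, 694), stringsAsFactors = FALSE)

identical(html_sample, json_sample)                                    # FALSE
identical(normalize_books(html_sample), normalize_books(xml_sample))   # TRUE
identical(normalize_books(html_sample), normalize_books(json_sample))  # TRUE
```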