For this assignment, I recorded three pieces of information about each of three books: the authors' names, whether the writing had clarity of style, and whether it was brief. I stored this information in an XML file, a JSON file, and an HTML file. Where individual authors wrote specific chapters, each author was judged separately on clarity and brevity. Each file was then read into R and the resulting objects were compared.
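library(xml2)      # assumed source of read_html(); rvest re-exports the same function
library(jsonlite)  # assumed source of fromJSON()
bookshtml <- read_html("https://raw.githubusercontent.com/greerda/Data607/main/books.html")
bookshtml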
## {html_document}
## <html>
## [1] <head>\n<meta http-equiv="Content-Type" content="text/html; charset=UTF-8 ...
## [2] <body>\n\t\t\t<table border="1">\n<thead><tr>\n<th>Author</th>\n\t\t\t\t\ ...
# read_html() applies the HTML parser, so the XML is returned as an html_document
booksxml <- read_html("https://raw.githubusercontent.com/greerda/Data607/main/books.xml")
booksxml
## {html_document}
## <html>
## [1] <body><root><books title="Design Patterns"><author author="Freeman"><attr ...
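To get from a parsed document like the ones above to a data frame, the nodes have to be selected and their text pulled out. The following is a minimal sketch of that step, not the method used in this assignment. The inline `html_txt` and `xml_txt` strings and their tag names are hypothetical, simplified stand-ins for the real files, and the XML is read with `read_xml()` from the xml2 package.

```r
library(xml2)

# Hypothetical, simplified stand-ins for the real files (not the author's markup).
html_txt <- '<html><body><table>
  <tr><th>Author</th><th>Clarity</th></tr>
  <tr><td>Freeman</td><td>Yes</td></tr>
  <tr><td>York</td><td>No</td></tr>
</table></body></html>'

xml_txt <- '<root>
  <book><author>Freeman</author><clarity>Yes</clarity></book>
  <book><author>York</author><clarity>No</clarity></book>
</root>'

html_doc <- read_html(html_txt)
xml_doc  <- read_xml(xml_txt)

# Select the data rows of the HTML table (rows that contain <td> cells).
html_rows <- xml_find_all(html_doc, "//tr[td]")
html_df <- data.frame(
  Author  = xml_text(xml_find_first(html_rows, "./td[1]")),
  Clarity = xml_text(xml_find_first(html_rows, "./td[2]")),
  stringsAsFactors = FALSE
)

# The same XPath helpers work on the XML document.
xml_books <- xml_find_all(xml_doc, "//book")
xml_df <- data.frame(
  Author  = xml_text(xml_find_first(xml_books, "./author")),
  Clarity = xml_text(xml_find_first(xml_books, "./clarity")),
  stringsAsFactors = FALSE
)

# With both sources as plain data frames, the comparison is about the data itself.
all.equal(html_df, xml_df)
```

Comparing the extracted data frames, rather than the parsed documents, checks whether the two files carry the same values.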
booksjson <- fromJSON("https://raw.githubusercontent.com/greerda/Data607/main/books.json")
booksjson
##             title                                Author
## 1 Design Patterns Freeman, Friedman, Yes, Yes, Yes, Yes
## 2      C++ Primer        Lippman, Lajoe, No, No, No, No
## 3          JQuery         York, Smith, No, Yes, No, Yes
all.equal(booksjson, booksxml)
## [1] "Names: 2 string mismatches"
## [2] "Attributes: < Length mismatch: comparison on first 1 components >"
## [3] "Attributes: < Component \"class\": Lengths (1, 2) differ (string compare on first 1) >"
## [4] "Attributes: < Component \"class\": 1 string mismatch >"
## [5] "Component 1: Modes: character, externalptr"
## [6] "Component 1: Lengths: 3, 1"
## [7] "Component 1: target is character, current is externalptr"
## [8] "Component 2: Modes: list, externalptr"
## [9] "Component 2: Lengths: 3, 1"
## [10] "Component 2: current is not list-like"
all.equal(booksjson, bookshtml)
## [1] "Names: 2 string mismatches"
## [2] "Attributes: < Length mismatch: comparison on first 1 components >"
## [3] "Attributes: < Component \"class\": Lengths (1, 2) differ (string compare on first 1) >"
## [4] "Attributes: < Component \"class\": 1 string mismatch >"
## [5] "Component 1: Modes: character, externalptr"
## [6] "Component 1: Lengths: 3, 1"
## [7] "Component 1: target is character, current is externalptr"
## [8] "Component 2: Modes: list, externalptr"
## [9] "Component 2: Lengths: 3, 1"
## [10] "Component 2: current is not list-like"
all.equal(bookshtml, booksxml)
## [1] TRUE
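all.equal() reports the HTML and XML objects as equal, although that does not make them identical. Both files were read with read_html(), so both are html_document objects with the same class and the same internal layout, and all.equal() appears to compare that R-level structure rather than the markup each object points to. The result is still plausible: HTML and XML both descend from the Standard Generalized Markup Language (SGML, ISO 8879), so the same tactics and techniques, such as selecting nodes with XPath, can be used to extract the data from each file type. They are not identical because of the inherent differences between XML and HTML; for example, XML lets the author define the tag names, while HTML uses a fixed set of tags.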
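The distinction between "equal" and "identical" can be seen with base R alone. This is a small sketch using made-up values rather than the book data: all.equal() tolerates tiny numeric differences, while identical() demands an exact match.

```r
# Made-up values for illustration; not the book data.
a <- list(score = 1,         label = "clear")
b <- list(score = 1 + 1e-10, label = "clear")

# all.equal() allows a small numeric tolerance, so these count as equal.
equal_result <- isTRUE(all.equal(a, b))

# identical() requires an exact match, so these do not.
identical_result <- identical(a, b)

print(c(equal = equal_result, identical = identical_result))
```

Wrapping all.equal() in isTRUE() is the usual pattern, because all.equal() returns a character vector of differences rather than FALSE when the objects do not match.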
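The XML and HTML objects are not equal to the JSON object. JSON has a very different syntax and structure from either XML or HTML: it is built from key-value objects and arrays rather than nested tags, and fromJSON() returned a data frame rather than a parsed document.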
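The mismatch with JSON shows up as soon as the classes of the objects are inspected. The sketch below assumes the jsonlite and xml2 packages and uses hypothetical, simplified inputs (`json_txt` and the inline XML string are not the contents of the real files).

```r
library(jsonlite)
library(xml2)

# Hypothetical, simplified JSON (not the author's books.json).
json_txt <- '[{"title": "Design Patterns", "author": "Freeman"},
              {"title": "JQuery", "author": "York"}]'

# Hypothetical, simplified XML with the same kind of information.
xml_txt <- "<root><book title='JQuery'><author>York</author></book></root>"

json_df <- fromJSON(json_txt)  # an array of objects is simplified to a data frame
xml_doc <- read_xml(xml_txt)   # a parsed document backed by external pointers

class(json_df)
class(xml_doc)

# all.equal() returns a character vector of differences instead of TRUE.
diffs <- all.equal(json_df, xml_doc)
isTRUE(diffs)
```

Because the two objects have different classes and internal layouts, a fair comparison would first extract the XML or HTML data into a data frame with the same columns as the JSON one.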