Introduction

The objective of this assignment is to select three of your favorite books on one of your favorite subjects. At least one of the books should have more than one author. For each book, include the title, authors, and two or three other attributes that you find interesting.

Take the information that you’ve selected about these three books, and separately create three files which store the book’s information in HTML (using an html table), XML, and JSON formats (e.g. “books.html”,“books.xml”, and “books.json”).

Write R code, using your packages of choice, to load the information from each of the three sources into separate R data frames. Are the three data frames identical?

I selected three of my favorite books on the subject of race, each credited to more than one author or contributor:

“The Fire This Time: A New Generation Speaks about Race” by Jesmyn Ward (Editor), Clint Smith (Contributor), Kevin Young (Contributor), Mitchell S. Jackson (Contributor), Natasha Trethewey (Contributor), Daniel José Older (Contributor), Edwidge Danticat (Contributor), Honorée Fanonne Jeffers (Contributor), Claudia Rankine (Contributor), Isabel Wilkerson (Contributor)

Goodreads Rating: 4.3/5

Number of ratings: 14,000+

“The Race Card: How Bluffing About Bias Makes Race Relations Worse” by Richard Thompson Ford, Karen E. Fields

Amazon Rating: 4.1/5

Number of ratings: 50+

“This Bridge Called My Back: Writings by Radical Women of Color” by Cherríe L. Moraga (Editor), Gloria Anzaldúa (Editor), Toni Cade Bambara (Contributor), Audre Lorde (Contributor), Barbara Smith (Contributor), Ana Castillo (Contributor), Cherrie Moraga (Contributor), and others

Goodreads Rating: 4.4/5

Number of ratings: 10,000+

I created the files “books.html”, “books.xml”, and “books.json”, each holding the information for these books in its respective format. Then I loaded each file into its own R object.

# Load necessary libraries
library(tidyverse)
library(openintro)
library(xml2)
library(jsonlite)
library(httr)
library(rvest)

Load HTML data into a data frame

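html_url <- "https://raw.githubusercontent.com/pujaroy280/DATA607Week7/main/books.html"
html_content <- GET(html_url)
# Request UTF-8 explicitly rather than relying on httr's default
html_text <- content(html_content, "text", encoding = "UTF-8")
# Note: html_table() on a whole document returns a list of tables, not a single data frame
html_df <- read_html(html_text) %>% html_table()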
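
Because html_table() is called on the whole document, it returns a list with one element per table, so a data frame still has to be pulled out of that list. A minimal sketch of that step, using a small inline table as a stand-in for books.html (the column names title and rating and the shortened titles are assumptions, not the real file's layout):

```r
library(rvest)

# Hypothetical inline table standing in for books.html;
# the real file's columns may differ.
html_text <- '<table>
  <tr><th>title</th><th>rating</th></tr>
  <tr><td>The Fire This Time</td><td>4.3</td></tr>
  <tr><td>The Race Card</td><td>4.1</td></tr>
</table>'

tables <- html_table(read_html(html_text))  # list with one entry per <table>
html_books <- as.data.frame(tables[[1]])    # first table as a plain data frame

class(tables)
dim(html_books)
```

If a page held several tables, the index in tables[[1]] would need to point at the one that contains the books.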

Load XML data into a data frame

xml_url <- "https://raw.githubusercontent.com/pujaroy280/DATA607Week7/main/books.xml"
xml_content <- xml2::read_xml(xml_url)
# Note: as_list() returns a nested list that mirrors the XML tree, not a data frame
xml_df <- xml2::as_list(xml_content)
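
Since as_list() keeps the tree shape of the XML, getting a flat table usually means picking out the fields node by node. A rough sketch under assumed tag names (books, book, title, and rating are hypothetical; the real books.xml may be laid out differently):

```r
library(xml2)

# Hypothetical inline XML standing in for books.xml.
xml_string <- '<books>
  <book><title>The Fire This Time</title><rating>4.3</rating></book>
  <book><title>The Race Card</title><rating>4.1</rating></book>
</books>'

doc <- read_xml(xml_string)
book_nodes <- xml_find_all(doc, "//book")  # one node per book

# Pull each field out of every book node and assemble a data frame.
xml_books <- data.frame(
  title  = xml_text(xml_find_first(book_nodes, "./title")),
  rating = as.numeric(xml_text(xml_find_first(book_nodes, "./rating"))),
  stringsAsFactors = FALSE
)

dim(xml_books)
```

XML text is always read as character, which is why the rating is converted with as.numeric() here.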

Load JSON data into a data frame

json_url <- "https://raw.githubusercontent.com/pujaroy280/DATA607Week7/main/books.json"
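json_content <- GET(json_url)
# Request UTF-8 explicitly rather than relying on httr's default
json_text <- content(json_content, "text", encoding = "UTF-8")
# The shape fromJSON() returns depends on the JSON layout; a top-level array of records becomes a data frame
json_df <- fromJSON(json_text)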
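
What fromJSON() hands back depends on how the JSON is laid out. The sketch below contrasts two plausible layouts; both are assumptions for illustration, since the actual structure of books.json is not shown here, and the key name books is hypothetical:

```r
library(jsonlite)

# Layout 1 (assumed): a top-level array of records.
flat_json <- '[{"title": "The Fire This Time", "rating": 4.3},
               {"title": "The Race Card", "rating": 4.1}]'

# Layout 2 (assumed): the same array wrapped in a top-level "books" key.
wrapped_json <- paste0('{"books": ', flat_json, '}')

flat    <- fromJSON(flat_json)     # simplified to a data frame
wrapped <- fromJSON(wrapped_json)  # a list holding a data frame in $books

is.data.frame(flat)
is.data.frame(wrapped)
is.data.frame(wrapped$books)
```

With the wrapped layout, the data frame has to be taken from the named element before it can be compared with anything else.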

Check dimensions

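# html_df and xml_df are lists, so dim() returns NULL for them and every comparison yields logical(0)
dim(html_df) == dim(xml_df)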
## logical(0)
dim(html_df) == dim(json_df)
## logical(0)
dim(xml_df) == dim(json_df)
## logical(0)
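
The logical(0) results come from comparing NULL with ==, which returns an empty vector instead of TRUE or FALSE. A small sketch with toy objects (a and b are made up for illustration) shows the behavior and a comparison that always gives a single answer:

```r
a <- list(data.frame(x = 1:3, y = 4:6))  # a list holding one data frame
b <- data.frame(x = 1:3, y = 4:6)        # the data frame itself

dim(a)                           # NULL: a plain list has no dimensions
dim(a) == dim(b)                 # logical(0): comparing NULL yields an empty vector
identical(dim(a), dim(b))        # FALSE
identical(dim(a[[1]]), dim(b))   # TRUE
```

identical() returns exactly one TRUE or FALSE, so it does not silently produce an empty result when one side is NULL.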

Check structure

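# colnames() is NULL for each object here, so these comparisons are TRUE without confirming any shared columns
identical(colnames(html_df), colnames(xml_df))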
## [1] TRUE
identical(colnames(html_df), colnames(json_df))
## [1] TRUE
identical(colnames(xml_df), colnames(json_df))
## [1] TRUE
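
A TRUE from identical() on column names is only meaningful when both objects actually have columns. One way to guard against a NULL-versus-NULL match is a small wrapper; same_columns() below is a hypothetical helper, not part of any package, and the toy objects are made up for illustration:

```r
# Hypothetical helper: compare column names only when both inputs are data frames.
same_columns <- function(a, b) {
  if (!is.data.frame(a) || !is.data.frame(b)) {
    return(NA)  # the question is not meaningful for lists or other objects
  }
  identical(names(a), names(b))
}

nested   <- list(books = list(book = list(title = list("The Race Card"))))
books_df <- data.frame(title = "The Race Card", rating = 4.1)

identical(colnames(nested), colnames(list(books_df)))  # TRUE, but only NULL vs NULL
same_columns(nested, books_df)                         # NA
same_columns(books_df, books_df)                       # TRUE
```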

Check content

identical(html_df, xml_df)
## [1] FALSE
identical(html_df, json_df)
## [1] FALSE
identical(xml_df, json_df)
## [1] FALSE
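
Even after each source has been flattened to a data frame, identical() can still return FALSE over a type difference alone. A sketch with two hand-built data frames (shortened titles, assumed column names) where the rating arrives as numbers in one and as text in the other:

```r
html_books <- data.frame(
  title  = c("The Fire This Time", "The Race Card"),
  rating = c(4.3, 4.1),
  stringsAsFactors = FALSE
)
xml_books <- data.frame(
  title  = c("The Fire This Time", "The Race Card"),
  rating = c("4.3", "4.1"),   # text, as XML parsers return it
  stringsAsFactors = FALSE
)

identical(html_books, xml_books)           # FALSE: rating types differ

# Align the column type, then compare again.
xml_books$rating <- as.numeric(xml_books$rating)
isTRUE(all.equal(html_books, xml_books))   # TRUE
```

all.equal() is used for the second check because it tolerates tiny floating-point differences; wrapping it in isTRUE() is needed since it returns a description of the differences rather than FALSE when the objects do not match.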

Conclusion: Are the three data frames identical?

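No, the three objects are not identical. Each parser returns a different R structure: html_table() returns a list of tables, as_list() returns a nested list that mirrors the XML tree, and fromJSON() returns whatever structure the JSON layout implies. The three files hold the same book information, but the dimension and column-name checks above do not prove it, because dim() and colnames() return NULL for lists and the comparisons are therefore uninformative. To confirm that the contents match, each result would first need to be converted to a flat data frame with the same column names and types.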
