This project describes three books in three formats: HTML, XML, and JSON. Each file is loaded into an R data frame, and the resulting data frames are then checked to see whether they are identical.
Each format is read with a different package: rvest for HTML, xml2 for XML, and jsonlite for JSON.
library(rvest)
# Load HTML file
html_file <- read_html("https://raw.githubusercontent.com/Amish22/DS607/refs/heads/main/Books.html")
# Parse the first <table> element on the page into a tibble
html_table <- html_table(html_elements(html_file, "table")[[1]])
# Display HTML table
html_table
## # A tibble: 3 × 5
## Title Year `Author(s)` Genre Themes
## <chr> <int> <chr> <chr> <chr>
## 1 Gödel, Escher, Bach: An Eternal Golden Braid 1979 Douglas Hofst… Non-… Self-…
## 2 The Talisman 1984 Stephen King,… Fant… Paral…
## 3 Crime and Punishment 1866 Fyodor Dostoe… Fict… Guilt…
# Load necessary library
library(xml2)
# Load XML file using xml2
xml_file <- read_xml("https://raw.githubusercontent.com/Amish22/DS607/refs/heads/main/Books.xml")
# Extract relevant nodes and convert to a data frame manually
titles <- xml_text(xml_find_all(xml_file, "//title"))
authors <- xml_text(xml_find_all(xml_file, "//author"))
years <- xml_text(xml_find_all(xml_file, "//year"))
genres <- xml_text(xml_find_all(xml_file, "//genre"))
# Theme nodes are collected here but are not added to the data frame below
themes <- xml_find_all(xml_file, "//themes/theme")
# Organize extracted data into a data frame
xml_data <- data.frame(
Title = titles,
Author = authors,
Year = years,
Genre = genres
)
# Display the XML data
xml_data
## Title Author Year
## 1 Gödel, Escher, Bach: An Eternal Golden Braid Douglas Hofstadter 1979
## 2 The Talisman Stephen King, Peter Straub 1984
## 3 Crime and Punishment Fyodor Dostoevsky 1866
## Genre
## 1 Non-fiction, Philosophy, Cognitive Science
## 2 Fantasy, Horror
## 3 Fiction, Psychological, Philosophical novel
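The XML data frame above has no Themes column. One possible way to add it is to walk the books one at a time and join each book's theme nodes into a single string. The sketch below is illustrative only: the inline `sample_xml` document is a hypothetical miniature, and it assumes each `<book>` wraps its `<theme>` children in a `<themes>` element, which the XPath `//themes/theme` suggests but the actual Books.xml may not match. The names `sample_xml`, `themes_per_book`, and `sample_data` are made up for this example.

```r
library(xml2)

# Hypothetical miniature of Books.xml; the real file's layout may differ
sample_xml <- read_xml('
<books>
  <book>
    <title>The Talisman</title>
    <themes><theme>Parallel universes</theme><theme>Good vs. evil</theme></themes>
  </book>
  <book>
    <title>Crime and Punishment</title>
    <themes><theme>Guilt</theme><theme>Morality</theme></themes>
  </book>
</books>')

books <- xml_find_all(sample_xml, "//book")

# For each book, collapse its <theme> children into one comma-separated string
themes_per_book <- vapply(
  seq_along(books),
  function(i) {
    theme_nodes <- xml_find_all(books[[i]], "./themes/theme")
    paste(xml_text(theme_nodes), collapse = ", ")
  },
  character(1)
)

# One row per book, with the joined themes alongside the title
sample_data <- data.frame(
  Title  = xml_text(xml_find_first(books, "./title")),
  Themes = themes_per_book
)
sample_data
```

Searching relative to each book (`./themes/theme`) rather than across the whole document (`//themes/theme`) keeps the themes grouped by book, so the result has one row per book.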
library(jsonlite)
# Load JSON file
json_file <- fromJSON("https://raw.githubusercontent.com/Amish22/DS607/refs/heads/main/Books.json")
json_data <- as.data.frame(json_file$books)
# Display JSON data
json_data
## title year author
## 1 Gödel, Escher, Bach: An Eternal Golden Braid 1979 Douglas Hofstadter
## 2 The Talisman 1984 Stephen King, Peter Straub
## 3 Crime and Punishment 1866 Fyodor Dostoevsky
## genre
## 1 Non-fiction, Philosophy, Cognitive Science
## 2 Fantasy, Horror
## 3 Fiction, Psychological, Philosophical novel
## themes
## 1 Self-reference, Formal systems, Intersection of mathematics, art, and music
## 2 Parallel universes, Hero’s journey, Mother-son bond, Good vs. evil
## 3 Guilt, Morality, Redemption, Crime and justice
Check whether the data frames loaded from the HTML, XML, and JSON files are identical.
identical(html_table, xml_data)
## [1] FALSE
identical(html_table, json_data)
## [1] FALSE
identical(xml_data, json_data)
## [1] FALSE
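identical() is strict: it also compares classes, column names, and column types, so frames holding the same values can still differ. The following is a minimal sketch of normalizing frames to a shared shape before comparing them. The `html_like`, `xml_like`, and `json_like` frames are small hypothetical stand-ins, not the real data, and `normalize_books()` is a helper invented for this example.

```r
# Hypothetical stand-ins that mimic differences visible in the output above
html_like <- data.frame(Title = c("A", "B"), Year = c(1979L, 1984L))
xml_like  <- data.frame(Title = c("A", "B"), Year = c("1979", "1984"))
json_like <- data.frame(title = c("A", "B"), year = c(1979L, 1984L))

# Bring a data frame to a shared shape: plain data.frame, lowercase names,
# integer year, and a fixed column order
normalize_books <- function(df) {
  df <- as.data.frame(df, stringsAsFactors = FALSE)
  names(df) <- tolower(names(df))
  df$year <- as.integer(df$year)
  df[, c("title", "year"), drop = FALSE]
}

# Before normalizing: Year is integer in one frame and character in the other
raw_match <- identical(html_like, xml_like)

# After normalizing: names, types, and column order agree
html_xml_match  <- identical(normalize_books(html_like), normalize_books(xml_like))
html_json_match <- identical(normalize_books(html_like), normalize_books(json_like))

c(raw = raw_match, html_xml = html_xml_match, html_json = html_json_match)
```

For the real data, the same idea would also need the author column names reconciled (`Author(s)`, `Author`, `author`) and a Themes column added to the XML frame.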
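This project demonstrates how the same book data can be represented in HTML, XML, and JSON, loaded into R, and compared. All three comparisons return FALSE because identical() requires an exact match, and the data frames differ in structure: the HTML table is a tibble with columns Title, Year, Author(s), Genre, and Themes; the XML data frame has no Themes column and stores Year as text; and the JSON data frame uses lowercase column names.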