Pick three of your favorite books on one of your favorite subjects. At least one of the books should have more than one author. For each book, include the title, authors, and two or three other attributes that you find interesting.

Take the information that you’ve selected about these three books, and separately create three files which store the book’s information in HTML (using an html table), XML, and JSON formats (e.g. “books.html”, “books.xml”, and “books.json”).

First I create a CSV that I can use to convert to any file type just because CSV is the most beloved flexible data type. I will load xml, html and JSON files into R too.

books <- read.csv("https://raw.githubusercontent.com/prnakyazze94/Data_607/refs/heads/main/Books2.csv")
# View first rows
head(books)
##                                                    title
## 1                                       He's Not My Type
## 2                                             Thesaurize
## 3                                            Moral Stand
## 4                                   LLC Beginner’s Guide
## 5 How to Talk to Anyone and Enchant Them into Liking You
## 6                Negotiating from a Position of Weakness
##                                                                                                                                                     subtitle
## 1                                                                                                                         Vancouver Agitators Series, Book 4
## 2                                                                                                                      The Completionist Chronicles, Book 10
## 3                                                                                                                                   Aether's Revival, Book 7
## 4        How to Successfully Start and Maintain a Limited Liability Company Even if You’ve Got Zero Experience: A Complete Up-to-Date & Easy-to-Follow Guide
## 5                   Proven Techniques to Become a People-Magnet by Building Positive, Lasting Relationships and Becoming the Most Likable Person in the Room
## 6 An 18 Step Comprehensive Negotiation System to Turn Vulnerability into Strength. Proven Techniques for Building Empathy, Embracing Vulnerability, and More
##             authors
## 1      Meghan Quinn
## 2      Dakota Krout
## 3 Daniel Schinhofen
## 4      Walter Grant
## 5        Carl Wolfe
## 6   David Whitehead
##                                                                                      coauthors
## 1 Connor Crais, Erin Mallon, Teddy Hamilton, Jason Clarke, J.F. Harding, Kelsey Navarro-Foster
## 2                                                                                 Luke Daniels
## 3                                                                              Andrea Parsneau
## 4                                                                                John Killawee
## 5                                                                                Tim Alexander
## 6                                                                              Gerhard Weigelt
##   release_date language            stars rating
## 1     11-28-23  English 5 out of 5 stars    362
## 2     11-06-23  English 5 out of 5 stars    328
## 3     11-17-23  English 5 out of 5 stars    164
## 4     11-03-23  English 5 out of 5 stars     51
## 5     11-03-23  English 5 out of 5 stars     50
## 6     11-14-23  English 5 out of 5 stars     50

Data not displaying correctly because of the colon in raw data

kable(
  head(books[, c("title", "subtitle", "authors", "release_date", "language", "rating")])
)
title subtitle authors release_date language rating
He’s Not My Type Vancouver Agitators Series, Book 4 Meghan Quinn 11-28-23 English 362
Thesaurize The Completionist Chronicles, Book 10 Dakota Krout 11-06-23 English 328
Moral Stand Aether’s Revival, Book 7 Daniel Schinhofen 11-17-23 English 164
LLC Beginner’s Guide How to Successfully Start and Maintain a Limited Liability Company Even if You’ve Got Zero Experience: A Complete Up-to-Date & Easy-to-Follow Guide Walter Grant 11-03-23 English 51
How to Talk to Anyone and Enchant Them into Liking You Proven Techniques to Become a People-Magnet by Building Positive, Lasting Relationships and Becoming the Most Likable Person in the Room Carl Wolfe 11-03-23 English 50
Negotiating from a Position of Weakness An 18 Step Comprehensive Negotiation System to Turn Vulnerability into Strength. Proven Techniques for Building Empathy, Embracing Vulnerability, and More David Whitehead 11-14-23 English 50

I can use original csv to output.

HTML created out of csv

html_table <- kable(
  head(books), 
  "html", 
  caption = "My Top 6 Books HTML Table"
) %>%
  kable_styling(full_width = FALSE, bootstrap_options = c("striped", "hover"))

# Save the HTML table to a file
save_kable(html_table, "books_info.html")

# open the HTML in default browser
browseURL("books_info.html")

# Print table in RStudio Viewer

webshot("books_info.html", file = "books_info.png", vwidth = 992)
## file:///C:/Users/pricc/OneDrive/Documents/books_info.html screenshot completed

#html_table failed to print directly

JSON CREATED OUT OF CSV

# Select subset of columns
books_subset <- books[, c("title", "subtitle", "authors", "release_date", "language", "rating")]

# Convert to JSON (pretty format)
books_json <- toJSON(books_subset, pretty = TRUE)

# Print JSON to console
cat(books_json)
## [
##   {
##     "title": "He's Not My Type",
##     "subtitle": "Vancouver Agitators Series, Book 4",
##     "authors": "Meghan Quinn",
##     "release_date": "11-28-23",
##     "language": "English",
##     "rating": 362
##   },
##   {
##     "title": "Thesaurize",
##     "subtitle": "The Completionist Chronicles, Book 10",
##     "authors": "Dakota Krout",
##     "release_date": "11-06-23",
##     "language": "English",
##     "rating": 328
##   },
##   {
##     "title": "Moral Stand",
##     "subtitle": "Aether's Revival, Book 7",
##     "authors": "Daniel Schinhofen",
##     "release_date": "11-17-23",
##     "language": "English",
##     "rating": 164
##   },
##   {
##     "title": "LLC Beginner’s Guide",
##     "subtitle": "How to Successfully Start and Maintain a Limited Liability Company Even if You’ve Got Zero Experience: A Complete Up-to-Date & Easy-to-Follow Guide",
##     "authors": "Walter Grant",
##     "release_date": "11-03-23",
##     "language": "English",
##     "rating": 51
##   },
##   {
##     "title": "How to Talk to Anyone and Enchant Them into Liking You",
##     "subtitle": "Proven Techniques to Become a People-Magnet by Building Positive, Lasting Relationships and Becoming the Most Likable Person in the Room",
##     "authors": "Carl Wolfe",
##     "release_date": "11-03-23",
##     "language": "English",
##     "rating": 50
##   },
##   {
##     "title": "Negotiating from a Position of Weakness",
##     "subtitle": "An 18 Step Comprehensive Negotiation System to Turn Vulnerability into Strength. Proven Techniques for Building Empathy, Embracing Vulnerability, and More",
##     "authors": "David Whitehead",
##     "release_date": "11-14-23",
##     "language": "English",
##     "rating": 50
##   }
## ]
# Save JSON to file
write(books_json, "books_subset.json") 

XML CREATED OUT OF CSV

# Select subset of columns from original books csv
books_subset <- books[, c("title", "subtitle", "authors", "release_date", "language", "rating")]

# Create root XML node
books_xml <- newXMLNode("books")

# Loop through each row to add book nodes
apply(books_subset, 1, function(row) {
  book_node <- newXMLNode("book", parent = books_xml)
  newXMLNode("title", row["title"], parent = book_node)
  newXMLNode("subtitle", row["subtitle"], parent = book_node)
  newXMLNode("authors", row["authors"], parent = book_node)
  newXMLNode("release_date", row["release_date"], parent = book_node)
  newXMLNode("language", row["language"], parent = book_node)
  newXMLNode("rating", row["rating"], parent = book_node)
})
## [[1]]
## <rating>362</rating> 
## 
## [[2]]
## <rating>328</rating> 
## 
## [[3]]
## <rating>164</rating> 
## 
## [[4]]
## <rating> 51</rating> 
## 
## [[5]]
## <rating> 50</rating> 
## 
## [[6]]
## <rating> 50</rating>
# Save XML to file
saveXML(books_xml, file = "books_subset.xml")
## [1] "books_subset.xml"
# Print XML in R console
cat(saveXML(books_xml))
## <books>
##   <book>
##     <title>He's Not My Type</title>
##     <subtitle>Vancouver Agitators Series, Book 4</subtitle>
##     <authors>Meghan Quinn</authors>
##     <release_date>11-28-23</release_date>
##     <language>English</language>
##     <rating>362</rating>
##   </book>
##   <book>
##     <title>Thesaurize</title>
##     <subtitle>The Completionist Chronicles, Book 10</subtitle>
##     <authors>Dakota Krout</authors>
##     <release_date>11-06-23</release_date>
##     <language>English</language>
##     <rating>328</rating>
##   </book>
##   <book>
##     <title>Moral Stand</title>
##     <subtitle>Aether's Revival, Book 7</subtitle>
##     <authors>Daniel Schinhofen</authors>
##     <release_date>11-17-23</release_date>
##     <language>English</language>
##     <rating>164</rating>
##   </book>
##   <book>
##     <title>LLC Beginner’s Guide</title>
##     <subtitle>How to Successfully Start and Maintain a Limited Liability Company Even if You’ve Got Zero Experience: A Complete Up-to-Date &amp; Easy-to-Follow Guide</subtitle>
##     <authors>Walter Grant</authors>
##     <release_date>11-03-23</release_date>
##     <language>English</language>
##     <rating> 51</rating>
##   </book>
##   <book>
##     <title>How to Talk to Anyone and Enchant Them into Liking You</title>
##     <subtitle>Proven Techniques to Become a People-Magnet by Building Positive, Lasting Relationships and Becoming the Most Likable Person in the Room</subtitle>
##     <authors>Carl Wolfe</authors>
##     <release_date>11-03-23</release_date>
##     <language>English</language>
##     <rating> 50</rating>
##   </book>
##   <book>
##     <title>Negotiating from a Position of Weakness</title>
##     <subtitle>An 18 Step Comprehensive Negotiation System to Turn Vulnerability into Strength. Proven Techniques for Building Empathy, Embracing Vulnerability, and More</subtitle>
##     <authors>David Whitehead</authors>
##     <release_date>11-14-23</release_date>
##     <language>English</language>
##     <rating> 50</rating>
##   </book>
## </books>

Write R code, using your packages of choice, to load the information from each of the three sources into separate R data frames. Are the three data frames identical?

READING MY RAW FILES INTO R

HTML LOADED INTO R

LOAD HTML FILE INTO R

 # Load html file from github and save it as a webpage

url <- "https://raw.githubusercontent.com/prnakyazze94/Data_607/refs/heads/main/Bookstable.html"

webpage <- read_html(url, encoding = "UTF-8")  # ensure UTF-8

# Print raw HTML to console

cat(as.character(webpage))
## <!DOCTYPE html>
## <html>
## <head>
## <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
## <meta charset="UTF-8">
## <title>Books Table</title>
## <style>
##     table { border-collapse: collapse; width: 100%; }
##     th, td { border: 1px solid black; padding: 8px; text-align: left; }
##     th { background-color: #f2f2f2; }
##   </style>
## </head>
## <body>
##   <h2>Books List</h2>
##   <table>
## <thead><tr>
## <th>Title</th>
##         <th>Subtitle</th>
##         <th>Authors</th>
##         <th>Coauthors</th>
##         <th>Release Date</th>
##         <th>Language</th>
##         <th>Stars</th>
##         <th>Rating</th>
##       </tr></thead>
## <tbody>
## <tr>
## <td>He's Not My Type</td>
##         <td>Vancouver Agitators Series, Book 4</td>
##         <td>Meghan Quinn</td>
##         <td>Connor Crais, Erin Mallon, Teddy Hamilton, Jason Clarke, J.F. Harding, Kelsey Navarro-Foster</td>
##         <td>11-28-23</td>
##         <td>English</td>
##         <td>5 out of 5 stars</td>
##         <td>362</td>
##       </tr>
## <tr>
## <td>Thesaurize</td>
##         <td>The Completionist Chronicles, Book 10</td>
##         <td>Dakota Krout</td>
##         <td>Luke Daniels</td>
##         <td>11-06-23</td>
##         <td>English</td>
##         <td>5 out of 5 stars</td>
##         <td>328</td>
##       </tr>
## <tr>
## <td>Moral Stand</td>
##         <td>Aether's Revival, Book 7</td>
##         <td>Daniel Schinhofen</td>
##         <td>Andrea Parsneau</td>
##         <td>11-17-23</td>
##         <td>English</td>
##         <td>5 out of 5 stars</td>
##         <td>164</td>
##       </tr>
## <tr>
## <td>LLC Beginner’s Guide</td>
##         <td>How to Successfully Start and Maintain a Limited Liability Company Even if You’ve Got Zero Experience: A Complete Up-to-Date &amp; Easy-to-Follow Guide</td>
##         <td>Walter Grant</td>
##         <td>John Killawee</td>
##         <td>11-03-23</td>
##         <td>English</td>
##         <td>5 out of 5 stars</td>
##         <td>51</td>
##       </tr>
## <tr>
## <td>How to Talk to Anyone and Enchant Them into Liking You</td>
##         <td>Proven Techniques to Become a People-Magnet by Building Positive, Lasting Relationships and Becoming the Most Likable Person in the Room</td>
##         <td>Carl Wolfe</td>
##         <td>Tim Alexander</td>
##         <td>11-03-23</td>
##         <td>English</td>
##         <td>5 out of 5 stars</td>
##         <td>50</td>
##       </tr>
## <tr>
## <td>Negotiating from a Position of Weakness</td>
##         <td>An 18 Step Comprehensive Negotiation System to Turn Vulnerability into Strength. Proven Techniques for Building Empathy, Embracing Vulnerability, and More</td>
##         <td>David Whitehead</td>
##         <td>Gerhard Weigelt</td>
##         <td>11-14-23</td>
##         <td>English</td>
##         <td>5 out of 5 stars</td>
##         <td>50</td>
##       </tr>
## </tbody>
## </table>
## </body>
## </html><html>
## <head>
## <meta charset="UTF-8">
## <title>Books Table</title>
## <style>
##     table { border-collapse: collapse; width: 100%; }
##     th, td { border: 1px solid black; padding: 8px; text-align: left; }
##     th { background-color: #f2f2f2; }
##   </style>
## </head>
## <body>
##   <h2>Books List</h2>
##   <table>
## <thead><tr>
## <th>Title</th>
##         <th>Subtitle</th>
##         <th>Authors</th>
##         <th>Coauthors</th>
##         <th>Release Date</th>
##         <th>Language</th>
##         <th>Stars</th>
##         <th>Rating</th>
##       </tr></thead>
## <tbody>
## <tr>
## <td>He's Not My Type</td>
##         <td>Vancouver Agitators Series, Book 4</td>
##         <td>Meghan Quinn</td>
##         <td>Connor Crais, Erin Mallon, Teddy Hamilton, Jason Clarke, J.F. Harding, Kelsey Navarro-Foster</td>
##         <td>11-28-23</td>
##         <td>English</td>
##         <td>5 out of 5 stars</td>
##         <td>362</td>
##       </tr>
## <tr>
## <td>Thesaurize</td>
##         <td>The Completionist Chronicles, Book 10</td>
##         <td>Dakota Krout</td>
##         <td>Luke Daniels</td>
##         <td>11-06-23</td>
##         <td>English</td>
##         <td>5 out of 5 stars</td>
##         <td>328</td>
##       </tr>
## <tr>
## <td>Moral Stand</td>
##         <td>Aether's Revival, Book 7</td>
##         <td>Daniel Schinhofen</td>
##         <td>Andrea Parsneau</td>
##         <td>11-17-23</td>
##         <td>English</td>
##         <td>5 out of 5 stars</td>
##         <td>164</td>
##       </tr>
## <tr>
## <td>LLC Beginner’s Guide</td>
##         <td>How to Successfully Start and Maintain a Limited Liability Company Even if You’ve Got Zero Experience: A Complete Up-to-Date &amp; Easy-to-Follow Guide</td>
##         <td>Walter Grant</td>
##         <td>John Killawee</td>
##         <td>11-03-23</td>
##         <td>English</td>
##         <td>5 out of 5 stars</td>
##         <td>51</td>
##       </tr>
## <tr>
## <td>How to Talk to Anyone and Enchant Them into Liking You</td>
##         <td>Proven Techniques to Become a People-Magnet by Building Positive, Lasting Relationships and Becoming the Most Likable Person in the Room</td>
##         <td>Carl Wolfe</td>
##         <td>Tim Alexander</td>
##         <td>11-03-23</td>
##         <td>English</td>
##         <td>5 out of 5 stars</td>
##         <td>50</td>
##       </tr>
## <tr>
## <td>Negotiating from a Position of Weakness</td>
##         <td>An 18 Step Comprehensive Negotiation System to Turn Vulnerability into Strength. Proven Techniques for Building Empathy, Embracing Vulnerability, and More</td>
##         <td>David Whitehead</td>
##         <td>Gerhard Weigelt</td>
##         <td>11-14-23</td>
##         <td>English</td>
##         <td>5 out of 5 stars</td>
##         <td>50</td>
##       </tr>
## </tbody>
## </table>
## </body>
## </html>

Create table from webpage. Html_node selects a single node from an HTML document that matches a CSS selector.

While table is the CSS selector, so html_node(table) selects the first table element in the

HTML LOADED INTO R TO CREATE A DF

books_htmltable <- webpage %>% 
  html_node("table") %>% 
  html_table(fill = TRUE)



# Use stringi to trim whitespace and remove non-ASCII characters
books_htmltable <- books_htmltable %>%
  mutate(across(everything(), ~ stri_trim_both(.)))  # trim spaces

# remove problematic non-ASCII characters
books_htmltable <- books_htmltable %>%
  mutate(across(everything(), ~ stri_trans_general(., "Latin-ASCII")))



head(books_htmltable)
## # A tibble: 6 × 8
##   Title          Subtitle Authors Coauthors `Release Date` Language Stars Rating
##   <chr>          <chr>    <chr>   <chr>     <chr>          <chr>    <chr> <chr> 
## 1 He's Not My T… Vancouv… Meghan… Connor C… 11-28-23       English  5 ou… 362   
## 2 Thesaurize     The Com… Dakota… Luke Dan… 11-06-23       English  5 ou… 328   
## 3 Moral Stand    Aether'… Daniel… Andrea P… 11-17-23       English  5 ou… 164   
## 4 LLC Beginner'… How to … Walter… John Kil… 11-03-23       English  5 ou… 51    
## 5 How to Talk t… Proven … Carl W… Tim Alex… 11-03-23       English  5 ou… 50    
## 6 Negotiating f… An 18 S… David … Gerhard … 11-14-23       English  5 ou… 50
write.csv(books_htmltable, "books_from_html.csv", row.names = FALSE)

SELECT ONLY A FEW COLUMNS SO I CAN TELL THE DIFFERNCE IN HTML DF

books_htmltable %>%
  select(Title, Authors, `Release Date`, Language, Stars, Rating) %>%
  head() %>%  # just to preview first few rows
  kable(caption = "Preview of HTML Books Data")
Preview of HTML Books Data
Title Authors Release Date Language Stars Rating
He’s Not My Type Meghan Quinn 11-28-23 English 5 out of 5 stars 362
Thesaurize Dakota Krout 11-06-23 English 5 out of 5 stars 328
Moral Stand Daniel Schinhofen 11-17-23 English 5 out of 5 stars 164
LLC Beginner’s Guide Walter Grant 11-03-23 English 5 out of 5 stars 51
How to Talk to Anyone and Enchant Them into Liking You Carl Wolfe 11-03-23 English 5 out of 5 stars 50
Negotiating from a Position of Weakness David Whitehead 11-14-23 English 5 out of 5 stars 50
# Preview the first few rows using kable
kable(head(books_htmltable))
Title Subtitle Authors Coauthors Release Date Language Stars Rating
He’s Not My Type Vancouver Agitators Series, Book 4 Meghan Quinn Connor Crais, Erin Mallon, Teddy Hamilton, Jason Clarke, J.F. Harding, Kelsey Navarro-Foster 11-28-23 English 5 out of 5 stars 362
Thesaurize The Completionist Chronicles, Book 10 Dakota Krout Luke Daniels 11-06-23 English 5 out of 5 stars 328
Moral Stand Aether’s Revival, Book 7 Daniel Schinhofen Andrea Parsneau 11-17-23 English 5 out of 5 stars 164
LLC Beginner’s Guide How to Successfully Start and Maintain a Limited Liability Company Even if You’ve Got Zero Experience: A Complete Up-to-Date & Easy-to-Follow Guide Walter Grant John Killawee 11-03-23 English 5 out of 5 stars 51
How to Talk to Anyone and Enchant Them into Liking You Proven Techniques to Become a People-Magnet by Building Positive, Lasting Relationships and Becoming the Most Likable Person in the Room Carl Wolfe Tim Alexander 11-03-23 English 5 out of 5 stars 50
Negotiating from a Position of Weakness An 18 Step Comprehensive Negotiation System to Turn Vulnerability into Strength. Proven Techniques for Building Empathy, Embracing Vulnerability, and More David Whitehead Gerhard Weigelt 11-14-23 English 5 out of 5 stars 50

XML

XML FILE LOADED INTO R

# Force UTF-8 locale in this session because , and // are causing errors.
Sys.setlocale("LC_CTYPE", "en_US.UTF-8")
## [1] "en_US.UTF-8"
 #Load XML file from github and save it as urlxml
# Define the URL
urlxml <- "https://raw.githubusercontent.com/prnakyazze94/Data_607/refs/heads/main/BooksXML.xml"

# Read the XML file directly from the URL

books_xml <- read_xml(urlxml, encoding = "UTF-8")

# View the XML structure
print(books_xml)
## {xml_document}
## <books>
## [1] <book>\n  <title>He's Not My Type</title>\n  <subtitle>Vancouver Agitator ...
## [2] <book>\n  <title>Thesaurize</title>\n  <subtitle>The Completionist Chroni ...
## [3] <book>\n  <title>Moral Stand</title>\n  <subtitle>Aether's Revival, Book  ...
## [4] <book>\n  <title>LLC Beginner’s Guide</title>\n  <subtitle>How to Success ...
## [5] <book>\n  <title>How to Talk to Anyone and Enchant Them into Liking You</ ...
## [6] <book>\n  <title>Negotiating from a Position of Weakness</title>\n  <subt ...
# explore it more nicely
xml_structure(books_xml)
## <books>
##   <book>
##     <title>
##       {text}
##     <subtitle>
##       {text}
##     <authors>
##       <author>
##         {text}
##     <coauthors>
##       <coauthor>
##         {text}
##       <coauthor>
##         {text}
##       <coauthor>
##         {text}
##       <coauthor>
##         {text}
##       <coauthor>
##         {text}
##       <coauthor>
##         {text}
##     <release_date>
##       {text}
##     <language>
##       {text}
##     <stars>
##       {text}
##     <rating>
##       {text}
##   <book>
##     <title>
##       {text}
##     <subtitle>
##       {text}
##     <authors>
##       <author>
##         {text}
##     <coauthors>
##       <coauthor>
##         {text}
##     <release_date>
##       {text}
##     <language>
##       {text}
##     <stars>
##       {text}
##     <rating>
##       {text}
##   <book>
##     <title>
##       {text}
##     <subtitle>
##       {text}
##     <authors>
##       <author>
##         {text}
##     <coauthors>
##       <coauthor>
##         {text}
##     <release_date>
##       {text}
##     <language>
##       {text}
##     <stars>
##       {text}
##     <rating>
##       {text}
##   <book>
##     <title>
##       {text}
##     <subtitle>
##       {text}
##     <authors>
##       <author>
##         {text}
##     <coauthors>
##       <coauthor>
##         {text}
##     <release_date>
##       {text}
##     <language>
##       {text}
##     <stars>
##       {text}
##     <rating>
##       {text}
##   <book>
##     <title>
##       {text}
##     <subtitle>
##       {text}
##     <authors>
##       <author>
##         {text}
##     <coauthors>
##       <coauthor>
##         {text}
##     <release_date>
##       {text}
##     <language>
##       {text}
##     <stars>
##       {text}
##     <rating>
##       {text}
##   <book>
##     <title>
##       {text}
##     <subtitle>
##       {text}
##     <authors>
##       <author>
##         {text}
##     <coauthors>
##       <coauthor>
##         {text}
##     <release_date>
##       {text}
##     <language>
##       {text}
##     <stars>
##       {text}
##     <rating>
##       {text}

SAVE XML FILE INTO R data frames Extract fields into a data frame

Extract simple single-value fields (like title, subtitle, release_date, etc.) directly

# Force UTF-8 locale in this session because , and // are causing errors.
Sys.setlocale("LC_CTYPE", "en_US.UTF-8")
## [1] "en_US.UTF-8"
# Extract <book> nodes 

book_nodes <- xml_find_all(books_xml, ".//book")

#create df
books_dfxml <- tibble(
  title = xml_text(xml_find_all(book_nodes, "title")),
  subtitle = xml_text(xml_find_all(book_nodes, "subtitle")),
  release_date = xml_text(xml_find_all(book_nodes, "release_date")),
  language = xml_text(xml_find_all(book_nodes, "language")),
  stars = xml_text(xml_find_all(book_nodes, "stars")),
  rating = xml_text(xml_find_all(book_nodes, "rating"))
)

Extract authors that are nested separately

# Extract authors and coauthors
authors_list <- lapply(book_nodes, function(x) {
  xml_text(xml_find_all(x, ".//authors/author"))
})

coauthors_list <- lapply(book_nodes, function(x) {
  xml_text(xml_find_all(x, ".//coauthors/coauthor"))
})

#  Combine all into one data frame ---
books_dfxml <- books_dfxml %>%
  mutate(
    authors = sapply(authors_list, function(x) paste(x, collapse = ", ")),
    coauthors = sapply(coauthors_list, function(x) paste(x, collapse = ", "))
  )

# View result ---
print(books_dfxml)
## # A tibble: 6 × 8
##   title            subtitle release_date language stars rating authors coauthors
##   <chr>            <chr>    <chr>        <chr>    <chr> <chr>  <chr>   <chr>    
## 1 He's Not My Type Vancouv… 11-28-23     English  5 ou… 362    Meghan… Connor C…
## 2 Thesaurize       The Com… 11-06-23     English  5 ou… 328    Dakota… Luke Dan…
## 3 Moral Stand      Aether'… 11-17-23     English  5 ou… 164    Daniel… Andrea P…
## 4 LLC Beginner’s … How to … 11-03-23     English  5 ou… 51     Walter… John Kil…
## 5 How to Talk to … Proven … 11-03-23     English  5 ou… 50     Carl W… Tim Alex…
## 6 Negotiating fro… An 18 S… 11-14-23     English  5 ou… 50     David … Gerhard …
# Preview the first few rows using kable
kable(head(books_dfxml))
title subtitle release_date language stars rating authors coauthors
He’s Not My Type Vancouver Agitators Series, Book 4 11-28-23 English 5 out of 5 stars 362 Meghan Quinn Connor Crais, Erin Mallon, Teddy Hamilton, Jason Clarke, J.F. Harding, Kelsey Navarro-Foster
Thesaurize The Completionist Chronicles, Book 10 11-06-23 English 5 out of 5 stars 328 Dakota Krout Luke Daniels
Moral Stand Aether’s Revival, Book 7 11-17-23 English 5 out of 5 stars 164 Daniel Schinhofen Andrea Parsneau
LLC Beginner’s Guide How to Successfully Start and Maintain a Limited Liability Company Even if You’ve Got Zero Experience: A Complete Up-to-Date & Easy-to-Follow Guide 11-03-23 English 5 out of 5 stars 51 Walter Grant John Killawee
How to Talk to Anyone and Enchant Them into Liking You Proven Techniques to Become a People-Magnet by Building Positive, Lasting Relationships and Becoming the Most Likable Person in the Room 11-03-23 English 5 out of 5 stars 50 Carl Wolfe Tim Alexander
Negotiating from a Position of Weakness An 18 Step Comprehensive Negotiation System to Turn Vulnerability into Strength. Proven Techniques for Building Empathy, Embracing Vulnerability, and More 11-14-23 English 5 out of 5 stars 50 David Whitehead Gerhard Weigelt

SELECT ONLY A FEW COLUMNS SO I CAN TELL THE DIFFERNCE IN XML DF

books_dfxml %>%
  select(title, authors, `release_date`, language, stars, rating) %>%
  head() %>%  # just to preview first few rows
  kable(caption = "Preview of XML Books Data")
Preview of XML Books Data
title authors release_date language stars rating
He’s Not My Type Meghan Quinn 11-28-23 English 5 out of 5 stars 362
Thesaurize Dakota Krout 11-06-23 English 5 out of 5 stars 328
Moral Stand Daniel Schinhofen 11-17-23 English 5 out of 5 stars 164
LLC Beginner’s Guide Walter Grant 11-03-23 English 5 out of 5 stars 51
How to Talk to Anyone and Enchant Them into Liking You Carl Wolfe 11-03-23 English 5 out of 5 stars 50
Negotiating from a Position of Weakness David Whitehead 11-14-23 English 5 out of 5 stars 50

JSON FILE LOADED INTO R

# Define the GitHub raw URL
urljson <- "https://raw.githubusercontent.com/prnakyazze94/Data_607/refs/heads/main/Booksjson.json"

# Read the JSON directly from the URL
books_json <- fromJSON(urljson)

# View the structure
str(books_json)
## 'data.frame':    7 obs. of  15 variables:
##  $ title        : chr  "He's Not My Type" "Thesaurize" "Moral Stand" "LLC Beginner’s Guide" ...
##  $ subtitle     : chr  "Vancouver Agitators Series, Book 4" "The Completionist Chronicles, Book 10" "Aether's Revival, Book 7" "How to Successfully Start and Maintain a Limited Liability Company Even if You’ve Got Zero Experience: A Comple"| __truncated__ ...
##  $ authors      :List of 7
##   ..$ : chr "Meghan Quinn"
##   ..$ : chr "Dakota Krout"
##   ..$ : chr "Daniel Schinhofen"
##   ..$ : chr "Walter Grant"
##   ..$ : chr "Carl Wolfe"
##   ..$ : chr "David Whitehead"
##   ..$ : chr "Henry Matthias"
##  $ narrators    :List of 7
##   ..$ : chr  "Connor Crais" "Erin Mallon" "Teddy Hamilton" "Jason Clarke" ...
##   ..$ : chr "Luke Daniels"
##   ..$ : chr "Andrea Parsneau"
##   ..$ : chr "John Killawee"
##   ..$ : chr "Tim Alexander"
##   ..$ : chr "Gerhard Weigelt"
##   ..$ : chr "KC Wayman"
##  $ series       : chr  "The Vancouver Agitators" "The Completionist Chronicles" "Aether's Revival" "" ...
##  $ length       : chr  "Length: 11 hrs and 40 mins" "Length: 11 hrs and 40 mins" "Length: 13 hrs and 3 mins" "Length: 3 hrs and 8 mins" ...
##  $ release_date : chr  "Release date: 11-28-23" "Release date: 11-06-23" "Release date: 11-17-23" "Release date: 11-03-23" ...
##  $ language     : chr  "Language: English" "Language: English" "Language: English" "Language: English" ...
##  $ rating       : chr  "5 out of 5 stars" "5 out of 5 stars" "5 out of 5 stars" "5 out of 5 stars" ...
##  $ no_of_ratings: chr  "362 ratings" "328 ratings" "164 ratings" "51 ratings" ...
##  $ regular_price: chr  "$24.95" "$24.95" "$24.95" "$14.95" ...
##  $ sales_price  : chr  "" "" "" "" ...
##  $ category     : logi  NA NA NA NA NA NA ...
##  $ genres       :List of 7
##   ..$ : list()
##   ..$ : list()
##   ..$ : list()
##   ..$ : list()
##   ..$ : list()
##   ..$ : list()
##   ..$ : list()
##  $ url          : chr  "https://www.audible.com/pd/Hes-Not-My-Type-Audiobook/B0CMJSQSJF" "https://www.audible.com/pd/Thesaurize-Audiobook/B0CMR1XPQY" "https://www.audible.com/pd/Moral-Stand-Audiobook/B0CNKVXCJ6" "https://www.audible.com/pd/LLC-Beginners-Guide-Audiobook/B0CMFJ7Y64" ...

CONVERT JSON FILE INTO DF

#  convert to a data frame
books_dfjson <- as.data.frame(books_json)

# View the first few rows
head(books_dfjson)
##                                                    title
## 1                                       He's Not My Type
## 2                                             Thesaurize
## 3                                            Moral Stand
## 4                                   LLC Beginner’s Guide
## 5 How to Talk to Anyone and Enchant Them into Liking You
## 6                Negotiating from a Position of Weakness
##                                                                                                                                                     subtitle
## 1                                                                                                                         Vancouver Agitators Series, Book 4
## 2                                                                                                                      The Completionist Chronicles, Book 10
## 3                                                                                                                                   Aether's Revival, Book 7
## 4        How to Successfully Start and Maintain a Limited Liability Company Even if You’ve Got Zero Experience: A Complete Up-to-Date & Easy-to-Follow Guide
## 5                   Proven Techniques to Become a People-Magnet by Building Positive, Lasting Relationships and Becoming the Most Likable Person in the Room
## 6 An 18 Step Comprehensive Negotiation System to Turn Vulnerability into Strength: Proven Techniques for Building Empathy, Embracing Vulnerability, and More
##             authors
## 1      Meghan Quinn
## 2      Dakota Krout
## 3 Daniel Schinhofen
## 4      Walter Grant
## 5        Carl Wolfe
## 6   David Whitehead
##                                                                                      narrators
## 1 Connor Crais, Erin Mallon, Teddy Hamilton, Jason Clarke, J.F. Harding, Kelsey Navarro-Foster
## 2                                                                                 Luke Daniels
## 3                                                                              Andrea Parsneau
## 4                                                                                John Killawee
## 5                                                                                Tim Alexander
## 6                                                                              Gerhard Weigelt
##                         series                     length
## 1      The Vancouver Agitators Length: 11 hrs and 40 mins
## 2 The Completionist Chronicles Length: 11 hrs and 40 mins
## 3             Aether's Revival  Length: 13 hrs and 3 mins
## 4                                Length: 3 hrs and 8 mins
## 5                               Length: 3 hrs and 21 mins
## 6                                Length: 7 hrs and 7 mins
##             release_date          language           rating no_of_ratings
## 1 Release date: 11-28-23 Language: English 5 out of 5 stars   362 ratings
## 2 Release date: 11-06-23 Language: English 5 out of 5 stars   328 ratings
## 3 Release date: 11-17-23 Language: English 5 out of 5 stars   164 ratings
## 4 Release date: 11-03-23 Language: English 5 out of 5 stars    51 ratings
## 5 Release date: 11-03-23 Language: English 5 out of 5 stars    50 ratings
## 6 Release date: 11-14-23 Language: English 5 out of 5 stars    50 ratings
##   regular_price sales_price category genres
## 1        $24.95                   NA   NULL
## 2        $24.95                   NA   NULL
## 3        $24.95                   NA   NULL
## 4        $14.95                   NA   NULL
## 5        $14.95                   NA   NULL
## 6        $19.95                   NA   NULL
##                                                                                                      url
## 1                                        https://www.audible.com/pd/Hes-Not-My-Type-Audiobook/B0CMJSQSJF
## 2                                             https://www.audible.com/pd/Thesaurize-Audiobook/B0CMR1XPQY
## 3                                            https://www.audible.com/pd/Moral-Stand-Audiobook/B0CNKVXCJ6
## 4                                    https://www.audible.com/pd/LLC-Beginners-Guide-Audiobook/B0CMFJ7Y64
## 5 https://www.audible.com/pd/How-to-Talk-to-Anyone-and-Enchant-Them-into-Liking-You-Audiobook/B0CMF9B353
## 6                https://www.audible.com/pd/Negotiating-from-a-Position-of-Weakness-Audiobook/B0CNB17GP4
# Preview the first few rows using kable
kable(head(books_dfjson))
title subtitle authors narrators series length release_date language rating no_of_ratings regular_price sales_price category genres url
He’s Not My Type Vancouver Agitators Series, Book 4 Meghan Quinn Connor Crais , Erin Mallon , Teddy Hamilton , Jason Clarke , J.F. Harding , Kelsey Navarro-Foster The Vancouver Agitators Length: 11 hrs and 40 mins Release date: 11-28-23 Language: English 5 out of 5 stars 362 ratings $24.95 NA NULL https://www.audible.com/pd/Hes-Not-My-Type-Audiobook/B0CMJSQSJF
Thesaurize The Completionist Chronicles, Book 10 Dakota Krout Luke Daniels The Completionist Chronicles Length: 11 hrs and 40 mins Release date: 11-06-23 Language: English 5 out of 5 stars 328 ratings $24.95 NA NULL https://www.audible.com/pd/Thesaurize-Audiobook/B0CMR1XPQY
Moral Stand Aether’s Revival, Book 7 Daniel Schinhofen Andrea Parsneau Aether’s Revival Length: 13 hrs and 3 mins Release date: 11-17-23 Language: English 5 out of 5 stars 164 ratings $24.95 NA NULL https://www.audible.com/pd/Moral-Stand-Audiobook/B0CNKVXCJ6
LLC Beginner’s Guide How to Successfully Start and Maintain a Limited Liability Company Even if You’ve Got Zero Experience: A Complete Up-to-Date & Easy-to-Follow Guide Walter Grant John Killawee Length: 3 hrs and 8 mins Release date: 11-03-23 Language: English 5 out of 5 stars 51 ratings $14.95 NA NULL https://www.audible.com/pd/LLC-Beginners-Guide-Audiobook/B0CMFJ7Y64
How to Talk to Anyone and Enchant Them into Liking You Proven Techniques to Become a People-Magnet by Building Positive, Lasting Relationships and Becoming the Most Likable Person in the Room Carl Wolfe Tim Alexander Length: 3 hrs and 21 mins Release date: 11-03-23 Language: English 5 out of 5 stars 50 ratings $14.95 NA NULL https://www.audible.com/pd/How-to-Talk-to-Anyone-and-Enchant-Them-into-Liking-You-Audiobook/B0CMF9B353
Negotiating from a Position of Weakness An 18 Step Comprehensive Negotiation System to Turn Vulnerability into Strength: Proven Techniques for Building Empathy, Embracing Vulnerability, and More David Whitehead Gerhard Weigelt Length: 7 hrs and 7 mins Release date: 11-14-23 Language: English 5 out of 5 stars 50 ratings $19.95 NA NULL https://www.audible.com/pd/Negotiating-from-a-Position-of-Weakness-Audiobook/B0CNB17GP4

SELECT ONLY A FEW COLUMNS SO I CAN TELL THE DIFFERNCE IN JSON DF

books_dfjson %>%
  select(title, authors, `release_date`, language, rating,no_of_ratings ) %>%
  head() %>%  # just to preview first few rows
  kable(caption = "Preview of JSON Books Data")
Preview of JSON Books Data
title authors release_date language rating no_of_ratings
He’s Not My Type Meghan Quinn Release date: 11-28-23 Language: English 5 out of 5 stars 362 ratings
Thesaurize Dakota Krout Release date: 11-06-23 Language: English 5 out of 5 stars 328 ratings
Moral Stand Daniel Schinhofen Release date: 11-17-23 Language: English 5 out of 5 stars 164 ratings
LLC Beginner’s Guide Walter Grant Release date: 11-03-23 Language: English 5 out of 5 stars 51 ratings
How to Talk to Anyone and Enchant Them into Liking You Carl Wolfe Release date: 11-03-23 Language: English 5 out of 5 stars 50 ratings
Negotiating from a Position of Weakness David Whitehead Release date: 11-14-23 Language: English 5 out of 5 stars 50 ratings

Are the three data frames identical?

No, the three data frames are not identical.

JSON retains nested structures (authors/narrators arrays), XML and HTML have flattened text. JSON requires text cleaning, XML and HTML are more ready for analysis

XML and HTML capture similar core fields but lose some detail and may have formatting inconsistencies.

For analysis, I would likely need to standardize column names and formats before comparing or combining them. For example I have title and Title.Language: English and English.

I personally like the xml file format the best when converted into a df, which comes as a suprise. I have not had a lot of experience with XML files.