HTML Table Load

library(rvest)   # read_html(), html_table()
library(knitr)   # kable()
html_table  <- as.data.frame(read_html("books.html") |> html_table(fill=TRUE))
kable(html_table)
| Title | Authors | Favorite.Attributes |
|:------|:--------|:--------------------|
| Linear Algebra and Its Applications | David C. Lay, Steven R. Lay, Judi J. McDonald | Simple Explinations, Transition to Advance topics |
| Calculus Illustrated. Volume 2: Differential Calculus | Peter Saveliev | Visuals, Format of questions, Great background information on each toppic |
| Statistics: Principles and Methods | Richard A. Johnson, Gouri K. Bhattacharyya | Practical, Basics coverage |
is.data.frame(html_table) 
## [1] TRUE
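
As a minimal sketch of why the `as.data.frame()` wrapper is needed: calling `html_table()` on a whole document returns a list with one tibble per table, whereas selecting a single table element first returns a data frame directly. The inline HTML below is a hypothetical stand-in for books.html, whose actual markup is not shown here, and `minimal_html()` and `html_element()` assume rvest 1.0 or later.

```r
library(rvest)

# Hypothetical one-table page standing in for books.html
page <- minimal_html('
  <table>
    <tr><th>Title</th><th>Authors</th><th>Favorite Attributes</th></tr>
    <tr><td>Statistics: Principles and Methods</td>
        <td>Richard A. Johnson, Gouri K. Bhattacharyya</td>
        <td>Practical, Basics coverage</td></tr>
  </table>')

# On the whole document, html_table() returns a list of tables
all_tables <- html_table(page)

# Selecting one <table> element first returns a single tibble
one_table <- page |> html_element("table") |> html_table()

is.data.frame(all_tables)  # FALSE: a list holding one tibble
is.data.frame(one_table)   # TRUE
```

Selecting the table explicitly also keeps the original header text, since no `check.names` conversion is applied on the way to a base data frame.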

XML Table Load

read_xml() returns an XML document object rather than a data frame, so the result must be converted.

library(xml2)   # read_xml(), as_list()
xml_file = "books.xml"
books_xml = read_xml(xml_file)
is.data.frame(books_xml) 
## [1] FALSE

Convert XML List to Data Frame

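library(dplyr)   # mutate(), filter(), select()
library(tidyr)   # unnest_longer(), pivot_wider()
## re-read the file and convert it to a nested R list so that child nodes become list elements
books_xml <- as_list(read_xml(xml_file))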

xml_book_df = tibble::as_tibble(books_xml)|>
              mutate(number = row_number())|>
              unnest_longer(books)

df_unt_1 <- xml_book_df  |>
            unnest_longer( col = books, names_repair = "minimal") |>
            select(c(1,3,4)) 

df_unt_2 <- df_unt_1  |> 
            filter(books_id != "title") |>
            unnest_longer( col = books, names_repair = "minimal") 

book_df <- rbind(df_unt_1 |>
           filter(books_id == "title"), df_unt_2)

books_df <- book_df |> 
  pivot_wider(
    names_from = books_id,
    values_from = books
  )
## Warning: Values from `books` are not uniquely identified; output will contain list-cols.
## • Use `values_fn = list` to suppress this warning.
## • Use `values_fn = {summary_fun}` to summarise duplicates.
## • Use the following dplyr code to identify duplicates.
##   {data} |>
##   dplyr::summarise(n = dplyr::n(), .by = c(number, books_id)) |>
##   dplyr::filter(n > 1L)
books_df <- books_df |>
              unnest_longer(col = c(title)) |> 
              unnest_longer(col = c(authors)) |> 
              unnest_longer(col = c(favoriteAttributes)) |>
              select(2,4,6)

kable(books_df)
| title | authors | favoriteAttributes |
|:------|:--------|:-------------------|
| Linear Algebra and Its Applications | David C. Lay | Simple Explanations |
| Linear Algebra and Its Applications | David C. Lay | Transition to Advanced Topics |
| Linear Algebra and Its Applications | Steven R. Lay | Simple Explanations |
| Linear Algebra and Its Applications | Steven R. Lay | Transition to Advanced Topics |
| Linear Algebra and Its Applications | Judi J. McDonald | Simple Explanations |
| Linear Algebra and Its Applications | Judi J. McDonald | Transition to Advanced Topics |
| Calculus Illustrated. Volume 2: Differential Calculus | Peter Saveliev | Visuals |
| Calculus Illustrated. Volume 2: Differential Calculus | Peter Saveliev | Format of Questions |
| Calculus Illustrated. Volume 2: Differential Calculus | Peter Saveliev | Great Background Information on Each Topic |
| Statistics: Principles and Methods | Richard A. Johnson | Practical |
| Statistics: Principles and Methods | Richard A. Johnson | Basics Coverage |
| Statistics: Principles and Methods | Gouri K. Bhattacharyya | Practical |
| Statistics: Principles and Methods | Gouri K. Bhattacharyya | Basics Coverage |
is.data.frame(books_df) 
## [1] TRUE
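
For comparison, here is a hedged sketch of a different route: querying the XML nodes directly with XPath instead of unnesting a converted list. It produces one row per book with multi-valued fields collapsed into a single string, so it is not a drop-in replacement for the one-row-per-combination result above. The inline XML is hypothetical: the tags `title`, `authors`, and `favoriteAttributes` are taken from the column names in the output, while `book`, `author`, and `attribute` are assumptions about how books.xml is laid out.

```r
library(xml2)

# Hypothetical XML mirroring the assumed layout of books.xml
xml_src <- '<books>
  <book>
    <title>Statistics: Principles and Methods</title>
    <authors>
      <author>Richard A. Johnson</author>
      <author>Gouri K. Bhattacharyya</author>
    </authors>
    <favoriteAttributes>
      <attribute>Practical</attribute>
      <attribute>Basics Coverage</attribute>
    </favoriteAttributes>
  </book>
</books>'

doc <- read_xml(xml_src)
book_nodes <- xml_find_all(doc, ".//book")

# Collapse the text of all nodes matching `path` under one book into a string
collapse_children <- function(book, path) {
  paste(xml_text(xml_find_all(book, path)), collapse = ", ")
}

# One row per book
books_flat <- data.frame(
  title = xml_text(xml_find_first(book_nodes, "./title")),
  authors = vapply(book_nodes, collapse_children, character(1),
                   path = "./authors/*"),
  favoriteAttributes = vapply(book_nodes, collapse_children, character(1),
                              path = "./favoriteAttributes/*"),
  stringsAsFactors = FALSE
)

is.data.frame(books_flat)
```

The `/*` wildcard matches any child element, which keeps the sketch independent of the exact child tag names.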

JSON Table Load

library(jsonlite)   # fromJSON()
books_data <- fromJSON("books.json") 
json_books_df <- as.data.frame(books_data) 
kable(json_books_df)
| books.title | books.authors | books.favoriteAttributes |
|:------------|:--------------|:-------------------------|
| Linear Algebra and Its Applications | David C. Lay , Steven R. Lay , Judi J. McDonald | Simple Explanations , Transition to Advanced Topics |
| Calculus Illustrated. Volume 2: Differential Calculus | Peter Saveliev | Visuals , Format of Questions , Great Background Information on Each Topic |
| Statistics: Principles and Methods | Richard A. Johnson , Gouri K. Bhattacharyya | Practical , Basics Coverage |
is.data.frame(json_books_df)
## [1] TRUE
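
The comma-joined cells in the table suggest that the multi-valued fields arrived as list columns. The sketch below illustrates that behavior on a small inline JSON string, a hypothetical stand-in for books.json with only two fields, and shows one way to give each value its own row with `tidyr::unnest_longer()`.

```r
library(jsonlite)
library(tidyr)

# Hypothetical JSON standing in for books.json
json_src <- '{"books": [
  {"title": "Statistics: Principles and Methods",
   "authors": ["Richard A. Johnson", "Gouri K. Bhattacharyya"]},
  {"title": "Calculus Illustrated. Volume 2: Differential Calculus",
   "authors": ["Peter Saveliev"]}
]}'

books <- fromJSON(json_src)$books

# Arrays of unequal length are kept as a list column
is.list(books$authors)

# unnest_longer() gives each author a row of its own
books_long <- unnest_longer(books, authors)
nrow(books_long)
```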

Conclusion

In conclusion, JSON and HTML each had a library function that loaded the file's contents directly into an R data frame. The result is not perfect: the books still need some tidying, because many of the authors and favorite attributes share a single row, and separating them can be beneficial for analysis. XML, on the other hand, differed more from a data frame in structure, which made loading it into R a little more complicated: many of the values arrived as nested lists, so they had to be unnested.
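
The tidying step mentioned above could be sketched as follows, assuming the values within a cell are separated by commas as in the HTML result. The data frame is a hypothetical single row in that shape. `tidyr::separate_rows()` is used here because it is widely available; newer tidyr releases mark it as superseded.

```r
library(tidyr)

# Hypothetical row in the shape of the HTML result
books <- data.frame(
  Title = "Statistics: Principles and Methods",
  Authors = "Richard A. Johnson, Gouri K. Bhattacharyya",
  Favorite.Attributes = "Practical, Basics coverage",
  stringsAsFactors = FALSE
)

# Split each comma-delimited column into one value per row
books_long <- books |>
  separate_rows(Authors, sep = ",\\s*") |>
  separate_rows(Favorite.Attributes, sep = ",\\s*")

nrow(books_long)  # 2 authors x 2 attributes = 4 rows
```

Splitting both columns yields every author and attribute combination for a book, which is the same shape as the XML result.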