Introduction

Pick three of your favorite books on one of your favorite subjects. At least one of the books should have more than one author. For each book, include the title, authors, and two or three other attributes that you find interesting.

The task is to store my favorite books in three file formats (JSON, HTML, and XML) and then load each file into an R data frame.

Import JSON file into an R Dataframe

We do this with the jsonlite package: fromJSON() parses the file, and the Favorite_Books element is converted to a data frame.

library(kableExtra)
## Warning in !is.null(rmarkdown::metadata$output) && rmarkdown::metadata$output
## %in% : 'length(x) = 2 > 1' in coercion to 'logical(1)'
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows
library(jsonlite)
## 
## Attaching package: 'jsonlite'
## The following object is masked from 'package:purrr':
## 
##     flatten
library(knitr)

books_json <- fromJSON("books.json")

df <- as.data.frame(books_json$Favorite_Books)

#head(df)

knitr::kable(df)
| title | authors | pages | publisher | year |
|:------|:--------|------:|:----------|-----:|
| The Strange Case of Dr. Jekyll and Mr. Hyde and Other Stories | Robert Louis Stevenson | 269 | Barnes & Noble Classics | 2003 |
| Life on the Edge: The Coming of Age of Quantum Biology | Johnjoe Mcfadden AND Jim Al-Khalili | 353 | Crown Publishers | 2014 |
| Cracking the Coding Interview Sixth Edition | Gayle Laakmann Mcdowell | 695 | CareerCup | 2020 |

# Build the table as HTML: kable_styling() can only customize HTML or LaTeX
# output, so format = "markdown" triggers a warning and the styling is not applied
books_table <- kable(df, 
                     format = "html", 
                     col.names = c("Title", "Authors", "Pages", "Publisher", "Year"),
                     align = c("l", "l", "r", "l", "r"), 
                     caption = "My Favorite Books")

# Apply striped rows and hover highlighting without stretching to full width
kable_styling(books_table, 
              bootstrap_options = c("striped", "hover"), 
              full_width = FALSE)
My Favorite Books

| Title | Authors | Pages | Publisher | Year |
|:------|:--------|------:|:----------|-----:|
| The Strange Case of Dr. Jekyll and Mr. Hyde and Other Stories | Robert Louis Stevenson | 269 | Barnes & Noble Classics | 2003 |
| Life on the Edge: The Coming of Age of Quantum Biology | Johnjoe Mcfadden AND Jim Al-Khalili | 353 | Crown Publishers | 2014 |
| Cracking the Coding Interview Sixth Edition | Gayle Laakmann Mcdowell | 695 | CareerCup | 2020 |
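
The JSON import above relies on fromJSON() turning an array of objects into a data frame. The sketch below illustrates that behavior on an inline JSON string. The string is a hypothetical stand-in for books.json (only the Favorite_Books key comes from the code above), so the real file may be laid out differently.

```r
library(jsonlite)

# Hypothetical JSON text standing in for books.json; the real file may differ.
json_text <- '{
  "Favorite_Books": [
    {"title": "Book A", "authors": "Author One", "pages": 100, "year": 2001},
    {"title": "Book B", "authors": "Author Two AND Author Three", "pages": 250, "year": 2014}
  ]
}'

# fromJSON() accepts a JSON string as well as a file path. By default it
# simplifies an array of objects that share the same keys into a data frame.
parsed <- fromJSON(json_text)

# The top-level object becomes a named list; its one element is the data frame.
books <- parsed$Favorite_Books
str(books)
```

Because the array is already simplified to a data frame, the as.data.frame() call in the import is likely a harmless safeguard rather than a required step.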

Import HTML file into an R Dataframe

We use the rvest package to read the HTML table into R.

library(rvest)
## 
## Attaching package: 'rvest'
## The following object is masked from 'package:readr':
## 
##     guess_encoding
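# Parse the HTML file
html <- read_html("books.html")

# Convert the first <table> node to a data frame, using its first row as the header
books_table <- html_table(html_nodes(html, "table")[[1]], header = TRUE)

# Drop the first column of the parsed table
books_table <- books_table[, -1]

# Add the row names as an explicit index column, then reset the row names
books_table$index <- rownames(books_table)
rownames(books_table) <- NULL

# Keep the three book rows as a plain data frame
df1 <- data.frame(books_table[1:3, ])
df1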
##                                                           Title
## 1 The Strange Case of Dr. Jekyll and Mr. Hyde and Other Stories
## 2        Life on the Edge: The Coming of Age of Quantum Biology
## 3                   Cracking the Coding Interview Sixth Edition
##                                author pages               publisher year index
## 1              Robert Louis Stevenson   269 Barnes & Noble Classics 2003     1
## 2 Johnjoe Mcfadden AND Jim Al-Khalili   353        Crown Publishers 2014     2
## 3             Gayle Laakmann McDowell   695               CareerCup 2020     3
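
As a minimal sketch of the same technique, the example below parses a small inline HTML table with rvest. The markup is a hypothetical stand-in for books.html, and the names page, first_table, and books_html are illustrative only.

```r
library(rvest)

# Hypothetical HTML standing in for books.html; the real file may differ.
html_text <- '<table>
  <tr><th>Title</th><th>author</th><th>pages</th></tr>
  <tr><td>Book A</td><td>Author One</td><td>100</td></tr>
  <tr><td>Book B</td><td>Author Two AND Author Three</td><td>250</td></tr>
</table>'

# read_html() treats a string containing markup as literal HTML, not a path.
page <- read_html(html_text)

# html_element() returns the first node matching the CSS selector.
first_table <- html_element(page, "table")

# html_table() uses the first row as the header when header = TRUE and
# returns a tibble, which is converted to a plain data frame here.
books_html <- as.data.frame(html_table(first_table, header = TRUE))
books_html
```

Selecting the table by a CSS selector and converting it in one step avoids adding a separate index column, which keeps the result closer to the JSON and XML data frames.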

Import XML file into an R Dataframe

Finally, we read the XML file into R with the xml2 package.

library(xml2)

doc <- read_xml("books.xml")


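# Each XPath selects the child elements of every matching field node;
# xml_text() returns their contents as character vectors
titles <- xml_text(xml_find_all(doc, ".//title/*"))
authors <- xml_text(xml_find_all(doc, ".//authors/*"))
pages <- xml_text(xml_find_all(doc, ".//pages/*"))
publishers <- xml_text(xml_find_all(doc, ".//publisher/*"))
years <- xml_text(xml_find_all(doc, ".//year/*"))

# Combine the vectors column by column; pages and year remain character here
df2 <- data.frame(title = titles, author = authors, pages = pages, publisher = publishers, year = years)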

df2
##                                                           title
## 1 The Strange Case of Dr. Jekyll and Mr. Hyde and Other Stories
## 2        Life on the Edge: The Coming of Age of Quantum Biology
## 3                   Cracking the Coding Interview Sixth Edition
##                                author pages               publisher year
## 1              Robert Louis Stevenson   269 Barnes & Noble Classics 2003
## 2 Johnjoe Mcfadden AND Jim Al-Khalili   353        Crown Publishers 2014
## 3             Gayle Laakmann McDowell   695               CareerCup 2020
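
The XPath approach above can be sketched on a small inline document. The XML below is a hypothetical stand-in for books.xml with one element per field. The real file evidently nests its values one level deeper, so the XPath expressions here differ from those used above. The sketch also converts the numeric fields, since xml_text() always returns character data.

```r
library(xml2)

# Hypothetical XML standing in for books.xml; the real file may nest its
# fields differently, in which case the XPath expressions need adjusting.
xml_input <- '<books>
  <book><title>Book A</title><pages>100</pages><year>2001</year></book>
  <book><title>Book B</title><pages>250</pages><year>2014</year></book>
</books>'

doc_example <- read_xml(xml_input)

# Select one node per book so that every column has the same length.
book_nodes <- xml_find_all(doc_example, ".//book")

# xml_find_first() returns one match per book node; xml_text() extracts
# the text, and as.integer() converts the numeric fields.
books_xml <- data.frame(
  title = xml_text(xml_find_first(book_nodes, "./title")),
  pages = as.integer(xml_text(xml_find_first(book_nodes, "./pages"))),
  year  = as.integer(xml_text(xml_find_first(book_nodes, "./year")))
)
books_xml
```

Iterating over book nodes rather than collecting each field across the whole document keeps rows aligned even if a book is missing a field, because xml_find_first() then yields NA for that book.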

Conclusion

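All three file formats can be read into R, although each needs some tidying to reach a clean data frame. The most noticeable difference is in the HTML import, which ends with an extra index column because the code adds one from the row names. The column names also vary slightly: the JSON result uses authors where the others use author, and the HTML result capitalizes Title. Apart from these details, the three data frames hold essentially the same book data.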
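
One way to check that the imports agree is to harmonize the column names and types and then compare the results. The sketch below does this with base R on small hypothetical stand-ins for the three data frames. The helper tidy_books() and the sample values are illustrative assumptions, not part of the original analysis.

```r
# Hypothetical stand-ins for the three imported data frames.
from_json <- data.frame(title = c("Book A", "Book B"),
                        authors = c("Author One", "Author Two"),
                        pages = c(100L, 250L))
from_html <- data.frame(Title = c("Book A", "Book B"),
                        author = c("Author One", "Author Two"),
                        pages = c(100L, 250L),
                        index = c("1", "2"))
from_xml <- data.frame(title = c("Book A", "Book B"),
                       author = c("Author One", "Author Two"),
                       pages = c("100", "250"))

# Illustrative helper: lower-case the names, unify authors/author,
# drop the index column if present, and store pages as integer.
tidy_books <- function(df) {
  names(df) <- tolower(names(df))
  names(df)[names(df) == "authors"] <- "author"
  df$index <- NULL
  df$pages <- as.integer(df$pages)
  df[, c("title", "author", "pages")]
}

tidied <- lapply(list(from_json, from_html, from_xml), tidy_books)

# Compare every tidied data frame against the first one.
same_as_first <- vapply(tidied,
                        function(df) isTRUE(all.equal(df, tidied[[1]])),
                        logical(1))
same_as_first
```

A check like this would also surface value-level differences, such as a name capitalized differently in one of the files.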