Pick three of your favorite books on one of your favorite subjects. At least one of the books should have more than one author. For each book, include the title, authors, and two or three other attributes that you find interesting.
We are tasked with storing my favorite books in three file formats (JSON, HTML, and XML) and reading each file into R.
We start with the JSON file, which we read with the jsonlite library.
library(kableExtra)
## Warning in !is.null(rmarkdown::metadata$output) && rmarkdown::metadata$output
## %in% : 'length(x) = 2 > 1' in coercion to 'logical(1)'
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
library(jsonlite)
##
## Attaching package: 'jsonlite'
## The following object is masked from 'package:purrr':
##
## flatten
library(knitr)
books_json <- fromJSON("books.json")
df <- as.data.frame(books_json$Favorite_Books)
#head(df)
knitr::kable(df)
| title | authors | pages | publisher | year |
|---|---|---|---|---|
| The Strange Case of Dr. Jekyll and Mr. Hyde and Other Stories | Robert Louis Stevenson | 269 | Barnes & Noble Classics | 2003 |
| Life on the Edge: The Coming of Age of Quantum Biology | Johnjoe Mcfadden AND Jim Al-Khalili | 353 | Crown Publishers | 2014 |
| Cracking the Coding Interview Sixth Edition | Gayle Laakmann Mcdowell | 695 | CareerCup | 2020 |
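To make the JSON step easier to follow without the file at hand, here is a minimal sketch of how `fromJSON()` turns an array of records into a data frame. The JSON layout is an assumption inferred from the `books_json$Favorite_Books` access above, since the actual contents of `books.json` are not shown, and the names `json_text` and `books_df` are only illustrative.

```r
library(jsonlite)

# Assumed layout for books.json: a top-level "Favorite_Books" array of records.
json_text <- '{
  "Favorite_Books": [
    {"title": "Life on the Edge: The Coming of Age of Quantum Biology",
     "authors": "Johnjoe Mcfadden AND Jim Al-Khalili",
     "pages": 353, "publisher": "Crown Publishers", "year": 2014},
    {"title": "Cracking the Coding Interview Sixth Edition",
     "authors": "Gayle Laakmann Mcdowell",
     "pages": 695, "publisher": "CareerCup", "year": 2020}
  ]
}'

# fromJSON() accepts a JSON string as well as a file path, and by default
# simplifies an array of uniform records into a data frame.
books <- fromJSON(json_text)
books_df <- books$Favorite_Books

# Inspect column names and types (pages and year arrive as numbers).
str(books_df)
```

Because the numbers are unquoted in the JSON text, `pages` and `year` come back as numeric columns with no extra conversion.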
# kable_styling() only customizes HTML or LaTeX tables, so request HTML output here
books_table <- kable(df,
  format = "html",
  col.names = c("Title", "Authors", "Pages", "Publisher", "Year"),
  align = c("l", "l", "r", "l", "r"),
  caption = "My Favorite Books")
kable_styling(books_table,
  bootstrap_options = c("striped", "hover"),
  full_width = FALSE)
| Title | Authors | Pages | Publisher | Year |
|---|---|---|---|---|
| The Strange Case of Dr. Jekyll and Mr. Hyde and Other Stories | Robert Louis Stevenson | 269 | Barnes & Noble Classics | 2003 |
| Life on the Edge: The Coming of Age of Quantum Biology | Johnjoe Mcfadden AND Jim Al-Khalili | 353 | Crown Publishers | 2014 |
| Cracking the Coding Interview Sixth Edition | Gayle Laakmann Mcdowell | 695 | CareerCup | 2020 |
We used the rvest library to read the HTML file into R.
library(rvest)
##
## Attaching package: 'rvest'
## The following object is masked from 'package:readr':
##
## guess_encoding
html <- read_html("books.html")
table <- html_table(html_nodes(html, "table")[[1]], header = TRUE)
table <- table[, -1]
table$index <- rownames(table)
rownames(table) <- NULL
df1 <- data.frame(table[1:3, ])
df1
## Title
## 1 The Strange Case of Dr. Jekyll and Mr. Hyde and Other Stories
## 2 Life on the Edge: The Coming of Age of Quantum Biology
## 3 Cracking the Coding Interview Sixth Edition
## author pages publisher year index
## 1 Robert Louis Stevenson 269 Barnes & Noble Classics 2003 1
## 2 Johnjoe Mcfadden AND Jim Al-Khalili 353 Crown Publishers 2014 2
## 3 Gayle Laakmann McDowell 695 CareerCup 2020 3
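The HTML step can be sketched on an inline table in the same way. The markup below is an assumed stand-in for `books.html`, whose real structure is not shown here, and the names `html_source` and `books_html` are illustrative. The sketch uses the same `html_nodes()` and `html_table()` calls as above.

```r
library(rvest)

# Assumed stand-in for books.html: one table with a header row.
html_source <- '<table>
  <tr><th>title</th><th>author</th><th>pages</th><th>publisher</th><th>year</th></tr>
  <tr><td>Life on the Edge: The Coming of Age of Quantum Biology</td>
      <td>Johnjoe Mcfadden AND Jim Al-Khalili</td><td>353</td>
      <td>Crown Publishers</td><td>2014</td></tr>
  <tr><td>Cracking the Coding Interview Sixth Edition</td>
      <td>Gayle Laakmann McDowell</td><td>695</td>
      <td>CareerCup</td><td>2020</td></tr>
</table>'

# read_html() accepts literal markup as well as a file path or URL.
page <- read_html(html_source)

# Take the first <table> and parse it, using the first row as column names.
books_html <- as.data.frame(
  html_table(html_nodes(page, "table")[[1]], header = TRUE)
)

str(books_html)
```

In this sketch no index column appears, because nothing is dropped or added after parsing. In the code above, the index column comes from the explicit `table$index <- rownames(table)` step.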
Finally, we read the XML file into R with the xml2 library.
library(xml2)
doc <- read_xml("books.xml")
titles <- xml_text(xml_find_all(doc, ".//title/*"))
authors <- xml_text(xml_find_all(doc, ".//authors/*"))
pages <- xml_text(xml_find_all(doc, ".//pages/*"))
publishers <- xml_text(xml_find_all(doc, ".//publisher/*"))
years <- xml_text(xml_find_all(doc, ".//year/*"))
df2 <- data.frame(title = titles, author = authors, pages = pages, publisher = publishers, year = years)
df2
## title
## 1 The Strange Case of Dr. Jekyll and Mr. Hyde and Other Stories
## 2 Life on the Edge: The Coming of Age of Quantum Biology
## 3 Cracking the Coding Interview Sixth Edition
## author pages publisher year
## 1 Robert Louis Stevenson 269 Barnes & Noble Classics 2003
## 2 Johnjoe Mcfadden AND Jim Al-Khalili 353 Crown Publishers 2014
## 3 Gayle Laakmann McDowell 695 CareerCup 2020
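One detail worth showing is that `xml_text()` always returns character vectors, so numeric fields such as pages and year need an explicit conversion. The sketch below assumes a flat layout with one `<book>` element per book, which may differ from the real `books.xml` (the `/*` in the XPath expressions above suggests that file nests its values one level deeper). The helper `field()` and the names `xml_source` and `books_xml` are hypothetical.

```r
library(xml2)

# Assumed stand-in for books.xml: one <book> element per book.
xml_source <- '<books>
  <book>
    <title>Life on the Edge: The Coming of Age of Quantum Biology</title>
    <author>Johnjoe Mcfadden AND Jim Al-Khalili</author>
    <pages>353</pages>
    <publisher>Crown Publishers</publisher>
    <year>2014</year>
  </book>
  <book>
    <title>Cracking the Coding Interview Sixth Edition</title>
    <author>Gayle Laakmann McDowell</author>
    <pages>695</pages>
    <publisher>CareerCup</publisher>
    <year>2020</year>
  </book>
</books>'

doc <- read_xml(xml_source)
book_nodes <- xml_find_all(doc, ".//book")

# Hypothetical helper: text of the named child of every <book>.
# xml_find_first() keeps one result per book, giving NA where a child is missing.
field <- function(name) {
  xml_text(xml_find_first(book_nodes, paste0("./", name)))
}

books_xml <- data.frame(
  title     = field("title"),
  author    = field("author"),
  pages     = as.integer(field("pages")),  # xml_text() returns character
  publisher = field("publisher"),
  year      = as.integer(field("year")),
  stringsAsFactors = FALSE
)

str(books_xml)
```

Looking up each field per book, rather than collecting all titles and all authors separately, keeps the rows aligned even if one book lacks a field.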
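All three file formats could be brought into R, although each needed some tidying to reach a clean data frame. The most noticeable difference is in the HTML version, where our code drops the first column of the parsed table and then adds an index column built from the row names. There are smaller differences as well: the column names vary slightly (Title and author in the HTML version, authors in the JSON version), and the JSON table spells the last author "Mcdowell" while the HTML and XML versions show "McDowell". Apart from these, the three data frames hold the same content.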
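Rather than comparing the three results by eye, one could bring them to a common shape and test them for equality. The sketch below is only an illustration: `normalize_books()` is a hypothetical helper, and the three one-row data frames are hand-built stand-ins that mimic how the JSON, HTML, and XML results differ in column names, column types, and the extra index column.

```r
# Hypothetical helper: bring a books data frame to a common shape.
normalize_books <- function(x) {
  names(x) <- tolower(names(x))                    # "Title" becomes "title"
  names(x)[names(x) == "authors"] <- "author"      # unify the author column name
  x <- x[, c("title", "author", "pages", "publisher", "year")]  # drops "index"
  x$pages <- as.integer(x$pages)                   # XML values arrive as character
  x$year <- as.integer(x$year)
  rownames(x) <- NULL
  x
}

# Hand-built stand-ins for the three import results (one book each).
from_json <- data.frame(
  title = "Life on the Edge: The Coming of Age of Quantum Biology",
  authors = "Johnjoe Mcfadden AND Jim Al-Khalili",
  pages = 353L, publisher = "Crown Publishers", year = 2014L,
  stringsAsFactors = FALSE
)
from_html <- data.frame(
  Title = "Life on the Edge: The Coming of Age of Quantum Biology",
  author = "Johnjoe Mcfadden AND Jim Al-Khalili",
  pages = 353L, publisher = "Crown Publishers", year = 2014L, index = "1",
  stringsAsFactors = FALSE
)
from_xml <- data.frame(
  title = "Life on the Edge: The Coming of Age of Quantum Biology",
  author = "Johnjoe Mcfadden AND Jim Al-Khalili",
  pages = "353", publisher = "Crown Publishers", year = "2014",
  stringsAsFactors = FALSE
)

# Compare the normalized versions pairwise.
json_vs_html <- isTRUE(all.equal(normalize_books(from_json), normalize_books(from_html)))
json_vs_xml  <- isTRUE(all.equal(normalize_books(from_json), normalize_books(from_xml)))
```

Applied to the real `df`, `df1`, and `df2`, a check like this would still flag the third row, since the author is spelled "Mcdowell" in the JSON output and "McDowell" in the other two.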