Introduction

Pick three of your favorite books on one of your favorite subjects. At least one of the books should have more than one author. For each book, include the title, authors, and two or three other attributes that you find interesting.

The task is to store my three favorite books in three file formats (JSON, HTML, and XML) and then load each file into an R data frame.

Import JSON file into an R Dataframe

The jsonlite library reads the JSON file into R.

library(kableExtra)

## Warning in !is.null(rmarkdown::metadata$output) && rmarkdown::metadata$output
## %in% : 'length(x) = 2 > 1' in coercion to 'logical(1)'

## 
## Attaching package: 'kableExtra'

## The following object is masked from 'package:dplyr':
## 
##     group_rows

library(jsonlite)

## 
## Attaching package: 'jsonlite'

## The following object is masked from 'package:purrr':
## 
##     flatten

library(knitr)

books_json <- fromJSON("books.json")

df <- as.data.frame(books_json$Favorite_Books)

#head(df)

knitr::kable(df)

title	authors	pages	publisher	year
The Strange Case of Dr. Jekyll and Mr. Hyde and Other Stories	Robert Louis Stevenson	269	Barnes & Noble Classics	2003
Life on the Edge: The Coming of Age of Quantum Biology	Johnjoe Mcfadden AND Jim Al-Khalili	353	Crown Publishers	2014
Cracking the Coding Interview Sixth Edition	Gayle Laakmann Mcdowell	695	CareerCup	2020

# kable_styling() can only customize HTML or LaTeX tables, so request HTML output
books_table <- kable(df, 
                     format = "html", 
                     col.names = c("Title", "Authors", "Pages", "Publisher", "Year"),
                     align = c("l", "l", "r", "l", "r"), 
                     caption = "My Favorite Books")


kable_styling(books_table, 
              bootstrap_options = c("striped", "hover"), 
              full_width = FALSE)

My Favorite Books
Title	Authors	Pages	Publisher	Year
The Strange Case of Dr. Jekyll and Mr. Hyde and Other Stories	Robert Louis Stevenson	269	Barnes & Noble Classics	2003
Life on the Edge: The Coming of Age of Quantum Biology	Johnjoe Mcfadden AND Jim Al-Khalili	353	Crown Publishers	2014
Cracking the Coding Interview Sixth Edition	Gayle Laakmann Mcdowell	695	CareerCup	2020
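
The contents of books.json are not shown above. As a minimal sketch, assuming the file holds a top-level Favorite_Books array with one object per book, fromJSON() would simplify that array into a data frame, which would explain why books_json$Favorite_Books can be passed straight to as.data.frame(). The JSON text below is a hypothetical stand-in for the real file, filled in with the values from the table above:

```r
library(jsonlite)

# Hypothetical stand-in for books.json; the real file's layout is an assumption.
json_text <- '{
  "Favorite_Books": [
    {"title": "The Strange Case of Dr. Jekyll and Mr. Hyde and Other Stories",
     "authors": "Robert Louis Stevenson",
     "pages": 269, "publisher": "Barnes & Noble Classics", "year": 2003},
    {"title": "Life on the Edge: The Coming of Age of Quantum Biology",
     "authors": "Johnjoe Mcfadden AND Jim Al-Khalili",
     "pages": 353, "publisher": "Crown Publishers", "year": 2014},
    {"title": "Cracking the Coding Interview Sixth Edition",
     "authors": "Gayle Laakmann Mcdowell",
     "pages": 695, "publisher": "CareerCup", "year": 2020}
  ]
}'

# fromJSON() accepts a JSON string as well as a file path.
parsed <- fromJSON(json_text)

# An array of objects with the same keys is simplified to a data frame.
books <- as.data.frame(parsed$Favorite_Books)
str(books)
```

With this layout, each key becomes a column and each object becomes a row, so no reshaping is needed before calling kable().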

Import HTML file into an R Dataframe

The rvest library reads the HTML table into R.

library(rvest)

## 
## Attaching package: 'rvest'

## The following object is masked from 'package:readr':
## 
##     guess_encoding

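html <- read_html("books.html")

# Parse the first <table> element, treating its first row as the header
books_table <- html_table(html_nodes(html, "table")[[1]], header = TRUE)

# Drop the first column of the parsed table
books_table <- books_table[, -1]

# Store the row numbers in an explicit index column
books_table$index <- rownames(books_table)
rownames(books_table) <- NULL

df1 <- data.frame(books_table[1:3, ])
df1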

##                                                           Title
## 1 The Strange Case of Dr. Jekyll and Mr. Hyde and Other Stories
## 2        Life on the Edge: The Coming of Age of Quantum Biology
## 3                   Cracking the Coding Interview Sixth Edition
##                                author pages               publisher year index
## 1              Robert Louis Stevenson   269 Barnes & Noble Classics 2003     1
## 2 Johnjoe Mcfadden AND Jim Al-Khalili   353        Crown Publishers 2014     2
## 3             Gayle Laakmann McDowell   695               CareerCup 2020     3

Import XML file into an R Dataframe

Finally, the xml2 library reads the XML file into R.

library(xml2)

doc <- read_xml("books.xml")


titles <- xml_text(xml_find_all(doc, ".//title/*"))
authors <- xml_text(xml_find_all(doc, ".//authors/*"))
pages <- xml_text(xml_find_all(doc, ".//pages/*"))
publishers <- xml_text(xml_find_all(doc, ".//publisher/*"))
years <- xml_text(xml_find_all(doc, ".//year/*"))

df2 <- data.frame(title = titles, author = authors, pages = pages, publisher = publishers, year = years)

df2

##                                                           title
## 1 The Strange Case of Dr. Jekyll and Mr. Hyde and Other Stories
## 2        Life on the Edge: The Coming of Age of Quantum Biology
## 3                   Cracking the Coding Interview Sixth Edition
##                                author pages               publisher year
## 1              Robert Louis Stevenson   269 Barnes & Noble Classics 2003
## 2 Johnjoe Mcfadden AND Jim Al-Khalili   353        Crown Publishers 2014
## 3             Gayle Laakmann McDowell   695               CareerCup 2020
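
Because xml_text() returns character vectors, the pages and year columns of a data frame built this way are strings, even though they print like numbers. A minimal sketch of that tidying step follows; df2_demo is a hypothetical stand-in for df2 so the example runs without books.xml:

```r
# Hypothetical stand-in for df2: every column is character, as xml_text() returns.
df2_demo <- data.frame(
  title = c("The Strange Case of Dr. Jekyll and Mr. Hyde and Other Stories",
            "Life on the Edge: The Coming of Age of Quantum Biology",
            "Cracking the Coding Interview Sixth Edition"),
  author = c("Robert Louis Stevenson",
             "Johnjoe Mcfadden AND Jim Al-Khalili",
             "Gayle Laakmann McDowell"),
  pages = c("269", "353", "695"),
  publisher = c("Barnes & Noble Classics", "Crown Publishers", "CareerCup"),
  year = c("2003", "2014", "2020"),
  stringsAsFactors = FALSE
)

# Convert the numeric-looking columns from character to integer.
df2_demo$pages <- as.integer(df2_demo$pages)
df2_demo$year <- as.integer(df2_demo$year)

str(df2_demo)
```

After the conversion, pages and year can be summed, sorted, or compared numerically, like the corresponding columns from the JSON import.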

Conclusion

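All three file formats (JSON, HTML, and XML) were read into R data frames, though each needed some tidying to reach a clean form. The most noticeable difference is in the HTML import, whose data frame carries an extra index column. The column names also differ slightly between files: the JSON data uses authors where the HTML and XML data use author, and the HTML table capitalizes Title. One author's name is also spelled "Mcdowell" in the JSON data and "McDowell" in the other two. Otherwise, the three data frames hold the same book records.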
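
The comparison above was made by eye, but it could also be checked in code. A rough sketch, assuming the data frames are first reduced to a shared set of lowercase column names and matching column types; the helper normalize_books() and the small stand-in data frames are hypothetical and not part of the original analysis:

```r
# Hypothetical helper: harmonize column names and types before comparing.
normalize_books <- function(df) {
  df <- as.data.frame(df, stringsAsFactors = FALSE)
  names(df) <- tolower(names(df))
  # The JSON data uses "authors"; the other two use "author".
  names(df)[names(df) == "authors"] <- "author"
  # Keep only the shared columns, which drops any extra "index" column.
  df <- df[, c("title", "author", "pages", "publisher", "year")]
  df$pages <- as.integer(df$pages)
  df$year <- as.integer(df$year)
  rownames(df) <- NULL
  df
}

# Small placeholder data frames that mimic the three import shapes.
json_like <- data.frame(title = c("Book A", "Book B"),
                        authors = c("Ann Author", "Bob Writer"),
                        pages = c(100L, 200L),
                        publisher = c("Pub One", "Pub Two"),
                        year = c(2001L, 2002L),
                        stringsAsFactors = FALSE)

html_like <- data.frame(Title = c("Book A", "Book B"),
                        author = c("Ann Author", "Bob Writer"),
                        pages = c(100L, 200L),
                        publisher = c("Pub One", "Pub Two"),
                        year = c(2001L, 2002L),
                        index = c("1", "2"),
                        stringsAsFactors = FALSE)

xml_like <- data.frame(title = c("Book A", "Book B"),
                       author = c("Ann Author", "Bob Writer"),
                       pages = c("100", "200"),
                       publisher = c("Pub One", "Pub Two"),
                       year = c("2001", "2002"),
                       stringsAsFactors = FALSE)

# all.equal() returns TRUE when the normalized data frames match.
json_vs_html <- all.equal(normalize_books(json_like), normalize_books(html_like))
json_vs_xml <- all.equal(normalize_books(json_like), normalize_books(xml_like))
print(json_vs_html)
print(json_vs_xml)
```

Applied to df, df1, and df2, a check like this would be expected to report a mismatch in the author column, since the JSON data spells one name "Mcdowell" while the HTML and XML data use "McDowell".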

Week7_JoeGarcia

Joe_Garcia

2023-03-13
