Pick three of your favorite books on one of your favorite subjects. At least one of the books should have more than one author. For each book, include the title, authors, and two or three other attributes that you find interesting. Take the information that you’ve selected about these three books, and separately create three files which store the book’s information in HTML (using an html table), XML, and JSON formats (e.g. “books.html”, “books.xml”, and “books.json”). To help you better understand the different file structures, I’d prefer that you create each of these files “by hand” unless you’re already very comfortable with the file formats. Write R code, using your packages of choice, to load the information from each of the three sources into separate R data frames. Are the three data frames identical? Your deliverable is the three source files and the R code. If you can, package your assignment solution up into an .Rmd file and publish to rpubs.com. [This will also require finding a way to make your three text files accessible from the web].
I will begin by creating a csv file in Google sheets by exporting this data and importing it into R to give me a view of how my data should look once I convert it to different formats.”
csv_link <- "https://raw.githubusercontent.com/Zcash95/DATA607-7/main/DATA607%20LAB%207%20Books%20CSV%20-%20Sheet1.csv"
books_data <- read.csv(url(csv_link))
## Warning in read.table(file = file, header = header, sep = sep, quote = quote, :
## incomplete final line found by readTableHeader on
## 'https://raw.githubusercontent.com/Zcash95/DATA607-7/main/DATA607%20LAB%207%20Books%20CSV%20-%20Sheet1.csv'
head(books_data)
## Book.Title Author Year.Published ISBN
## 1 Rich Dad Poor Dad Robert Kiyosaki 1997 978-1612680194
## 2 Think and Grow Rich Napolean Hill 1937 978-1612680195
## 3 Zero to One Peter Thiel, Blake Masters 2014 978-1612680196
html_data <- read_html("https://raw.githubusercontent.com/Zcash95/DATA607-7/main/books.html")
html_data1 <- html_data %>% html_table(fill = TRUE)
head(html_data1)
## [[1]]
## # A tibble: 3 × 4
## Title Author `Year Published` ISBN
## <chr> <chr> <int> <chr>
## 1 Rich Dad Poor Dad Robert Kiyosaki 1997 978-1612680194
## 2 Think and Grow Rich Napoleon Hill 1937 978-1612680195
## 3 Zero to One Peter Thiel, Blake Masters 2014 978-1612680196
| Title | Author | Year Published | ISBN |
|---|---|---|---|
| Rich Dad Poor Dad | Robert Kiyosaki | 1997 | 978-1612680194 |
| Think and Grow Rich | Napoleon Hill | 1937 | 978-1612680195 |
| Zero to One | Peter Thiel, Blake Masters | 2014 | 978-1612680196 |
xml_link <- "https://raw.githubusercontent.com/Zcash95/DATA607-7/main/book.xml"
xml_content <- readLines(xml_link, warn = FALSE)
xml_string <- paste(xml_content, collapse = "\n")
xml_data <- read_xml(xml_string)
xml_df <- xml_data %>%
xml_find_all(".//book") %>%
map_df(~{
title <- xml_text(xml_find_all(., ".//title"))
author <- xml_text(xml_find_all(., ".//author"))
year <- xml_text(xml_find_all(., ".//year"))
isbn <- xml_text(xml_find_all(., ".//isbn"))
data.frame(title, author, year, isbn)
})
print(xml_df)
## title author year isbn
## 1 Rich Dad Poor Dad Robert Kiyosaki 1997 978-1612680194
## 2 Think and Grow Rich Napoleon Hill 1937 978-1612680195
## 3 Zero to One Peter Thiel, Blake Masters 2014 978-1612680196
<author>Robert Kiyosaki</author>
<year>1997</year>
<isbn>978-1612680194</isbn>
<author>Napoleon Hill</author>
<year>1937</year>
<isbn>978-1612680195</isbn>
<author>Peter Thiel, Blake Masters</author>
<year>2014</year>
<isbn>978-1612680196</isbn>
json_data <- fromJSON("https://raw.githubusercontent.com/Zcash95/DATA607-7/main/books.json")
json_df <- as.data.frame(json_data)
print(json_df)
## books.title books.author books.year books.isbn
## 1 Rich Dad Poor Dad Robert Kiyosaki 1997 978-1612680194
## 2 Think and Grow Rich Napoleon Hill 1937 978-1612680195
## 3 Zero to One Peter Thiel, Blake Masters 2014 978-1612680196
{ “books”: [ { “title”: “Rich Dad Poor Dad”, “author”: “Robert Kiyosaki”, “year”: 1997, “isbn”: “978-1612680194” }, { “title”: “Think and Grow Rich”, “author”: “Napoleon Hill”, “year”: 1937, “isbn”: “978-1612680195” }, { “title”: “Zero to One”, “author”: “Peter Thiel, Blake Masters”, “year”: 2014, “isbn”: “978-1612680196” } ] }
The data tables were all imported into R and their data frames look the same. I had trouble with the XML file but with some tidying I was able to import the data into a table format just like that of JSON and HTML. The difference that I seen between the three data frames are that HTML and JSON had the year column as an integer while the XML file had it as a character. The source code for all are provided as well is located in the GitHub repo
print(html_data1)
## [[1]]
## # A tibble: 3 × 4
## Title Author `Year Published` ISBN
## <chr> <chr> <int> <chr>
## 1 Rich Dad Poor Dad Robert Kiyosaki 1997 978-1612680194
## 2 Think and Grow Rich Napoleon Hill 1937 978-1612680195
## 3 Zero to One Peter Thiel, Blake Masters 2014 978-1612680196
print(xml_df)
## title author year isbn
## 1 Rich Dad Poor Dad Robert Kiyosaki 1997 978-1612680194
## 2 Think and Grow Rich Napoleon Hill 1937 978-1612680195
## 3 Zero to One Peter Thiel, Blake Masters 2014 978-1612680196
print(json_df)
## books.title books.author books.year books.isbn
## 1 Rich Dad Poor Dad Robert Kiyosaki 1997 978-1612680194
## 2 Think and Grow Rich Napoleon Hill 1937 978-1612680195
## 3 Zero to One Peter Thiel, Blake Masters 2014 978-1612680196