library(rvest)
library(xml2)
library(jsonlite)
library(tibble)
# read_html() has no trim or as.data.frame arguments, so they are dropped here
books <- read_html("https://raw.githubusercontent.com/AnnaMoy/Data-607/main/Books.html")
books <- books %>%
  html_node("table") %>%
  html_table(header = TRUE, fill = TRUE)
books
## # A tibble: 3 × 5
## Title Author Genre Published ISBN
## <chr> <chr> <chr> <chr> <dbl>
## 1 The Legend of Zelda: Twlight princess Akira HImekawa Grap… 2017 by … 9.78e13
## 2 The Notebook Nicholas Sparks Love… 1996 by … 9.78e12
## 3 The Whiteout Dhoniell Clayto… Shor… 2022 by … 9.78e11
# Another way to do the same thing, kept for my own reference
# df_table <- books %>%
# html_element(xpath = "//table") %>%
# html_table()
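The ISBN column above came back as a number (`<dbl>`), so it prints in scientific notation. Below is a minimal sketch of one way to keep identifiers like this as text. It assumes rvest 1.0.0 or later, where `html_table()` has a `convert` argument. The inline table is a made-up stand-in for Books.html, not the real file.

```r
library(rvest)

# Hypothetical stand-in for the real page: a tiny inline HTML table.
page <- read_html(
  "<table>
     <tr><th>Title</th><th>ISBN</th></tr>
     <tr><td>The Notebook</td><td>9780446520805</td></tr>
   </table>"
)

# convert = FALSE skips type guessing, so every column stays character.
tbl <- page %>%
  html_element("table") %>%
  html_table(header = TRUE, convert = FALSE)

tbl$ISBN
```

With the default `convert = TRUE`, the same column would be guessed as numeric, which is what happened in the output above.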
booksxml <- read_xml("https://raw.githubusercontent.com/AnnaMoy/Data-607/main/Books.xml")
title <- xml_text(xml_find_all(booksxml, xpath ="//title"))
author <- xml_text(xml_find_all(booksxml, xpath ="//author"))
genre <- xml_text(xml_find_all(booksxml, xpath ="//genre"))
published <- xml_text(xml_find_all(booksxml, xpath ="//published"))
ISBN <- xml_text(xml_find_all(booksxml, xpath ="//ISBN"))
# tibble() replaces the deprecated data_frame()
df2 <- tibble(title, author, genre, published, ISBN)
df2
## # A tibble: 3 × 5
## title author genre published ISBN
## <chr> <chr> <chr> <chr> <chr>
## 1 The Legend of Zelda:Twlight princess Akira Himekawa Grap… 2017 by … 9781…
## 2 The Notebook Nicholas Sparks Love… 1996 by … 9780…
## 3 The Whiteout Dhonielle Clayton,… Shor… 2022 by … 9780…
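Pulling each column with its own `//name` query works here because every book has every field. If one book were missing a field, the vectors would have different lengths and the rows would no longer line up. A possible alternative is to select the record nodes first and then look up each field inside them with `xml_find_first()`, which returns one result per record. This is only a sketch: the `<book>` element name and the inline XML are assumptions, since the real layout of Books.xml is not shown here.

```r
library(xml2)
library(tibble)

# Hypothetical stand-in for Books.xml; the second record has no ISBN.
xml_src <- "<books>
  <book><title>A</title><author>X</author><ISBN>111</ISBN></book>
  <book><title>B</title><author>Y</author></book>
</books>"

doc   <- read_xml(xml_src)
nodes <- xml_find_all(doc, "//book")

# xml_find_first() returns one entry per record, so a missing field
# becomes NA instead of shortening the vector.
field <- function(name) xml_text(xml_find_first(nodes, paste0("./", name)))

df <- tibble(
  title  = field("title"),
  author = field("author"),
  ISBN   = field("ISBN")
)
df
```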
bookjson <- fromJSON("https://raw.githubusercontent.com/AnnaMoy/Data-607/main/Books.JSON")
json_data_frame <- as.data.frame(bookjson)
json_data_frame
## title
## 1 The Legend of Zelda: Twlight princess
## 2 The Notebook
## 3 The Whiteout
## Author Genre
## 1 Akira Himekawa Graphic novels
## 2 Nicholas Sparks Love stories
## 3 Dhonielle Clayton, Tiffany Jackson, Nic Stone Short stories
## Published ISBN
## 1 2017 by Viz Media 97814121593470
## 2 1996 by Warner Books 9780446520805
## 3 2022 by Quill Tree Books 978006388146
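The JSON result prints differently because `as.data.frame()` gives a base data frame, while the other two results are tibbles. A small sketch of how the same kind of data could be turned into a tibble so that all three print alike. The inline JSON is a hypothetical stand-in that assumes the file is an array of records.

```r
library(jsonlite)
library(tibble)

# Hypothetical stand-in for Books.JSON: an array of records.
json_txt <- '[
  {"title": "The Notebook", "Author": "Nicholas Sparks", "ISBN": "9780446520805"},
  {"title": "The Whiteout", "Author": "Dhonielle Clayton", "ISBN": "978006388146"}
]'

# fromJSON() simplifies an array of records to a data.frame;
# as_tibble() changes the class, and with it the print method.
json_tbl <- as_tibble(fromJSON(json_txt))
json_tbl
```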
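The HTML, XML, and JSON outputs all look much the same, but they are not 100% identical. The JSON result is a base data frame rather than a tibble, so it prints in a different format. The HTML version parsed ISBN as a number (`<dbl>`), while the XML version kept it as text. The column names also differ in capitalization between sources. The XML was the hardest to load, as I had to extract each field into its own vector and then combine the vectors into a data frame.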
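One way to check whether two of the results match is to normalize the column names and types first, then compare. This is a rough sketch using made-up one-row data frames, not the real results above. `normalize()` is a helper written for this example only.

```r
# Hypothetical miniature versions of two of the results above.
from_html <- data.frame(Title = "The Notebook", ISBN = 9780446520805)
from_xml  <- data.frame(title = "The Notebook", ISBN = "9780446520805")

# Lower-case the column names and coerce every column to character,
# so that only the cell contents are compared.
normalize <- function(df) {
  names(df) <- tolower(names(df))
  df[] <- lapply(df, function(col) trimws(format(col, scientific = FALSE)))
  df
}

same <- identical(normalize(from_html), normalize(from_xml))
same
```

Any ISBN that was already rounded when it was read in as a number would still fail this check, so reading identifiers as text in the first place is probably the safer choice.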