library(rvest)
library(xml2)
library(jsonlite)
library(tibble)

Read the HTML and make it into a dataframe

# Parse the HTML page from GitHub
books <- read_html("https://raw.githubusercontent.com/AnnaMoy/Data-607/main/Books.html")

# Select the first <table> and convert it to a tibble, using the first row as column names
books <- books %>%
  html_element("table") %>%
  html_table(header = TRUE)
books

## # A tibble: 3 × 5
##   Title                                 Author           Genre Published    ISBN
##   <chr>                                 <chr>            <chr> <chr>       <dbl>
## 1 The Legend of Zelda: Twlight princess Akira HImekawa   Grap… 2017 by … 9.78e13
## 2 The Notebook                          Nicholas Sparks  Love… 1996 by … 9.78e12
## 3 The Whiteout                          Dhoniell Clayto… Shor… 2022 by … 9.78e11
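The `<dbl>` ISBN column above comes from `html_table()` guessing column types, which is why the values print as 9.78e13 and so on. Below is a minimal sketch, assuming rvest 1.0 or later, of keeping ISBN as exact text with `convert = FALSE`. The one-row table is a hypothetical stand-in for Books.html, and its values are copied from the JSON output further down.

```r
library(rvest)

# Hypothetical one-row table with the same columns as Books.html.
page <- read_html('<table>
  <tr><th>Title</th><th>Author</th><th>Genre</th><th>Published</th><th>ISBN</th></tr>
  <tr><td>The Notebook</td><td>Nicholas Sparks</td><td>Love stories</td>
      <td>1996 by Warner Books</td><td>9780446520805</td></tr>
</table>')

table_node <- page %>% html_element("table")

# Default behaviour: html_table() guesses column types, so ISBN becomes a double.
converted <- html_table(table_node, header = TRUE)

# convert = FALSE leaves every cell as character, so ISBN keeps its exact digits.
as_text <- html_table(table_node, header = TRUE, convert = FALSE)

class(converted$ISBN)  # "numeric"
class(as_text$ISBN)    # "character"
```

Keeping ISBN as text also makes it match the `<chr>` ISBN column produced by the XML route.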

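# Another way to do the same thing, for my own reference: use an XPath selector
# instead of a CSS selector. It has to run on the parsed HTML document, not on the
# `books` tibble created above.
# df_table <- read_html("https://raw.githubusercontent.com/AnnaMoy/Data-607/main/Books.html") %>%
#   html_element(xpath = "//table") %>%
#   html_table()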
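A quick sketch to check that the CSS selector `"table"` and the XPath selector `"//table"` pick out the same table. The small inline page is hypothetical and only stands in for Books.html.

```r
library(rvest)

# Hypothetical minimal page with a single table.
page <- read_html('<html><body><table>
  <tr><th>Title</th><th>Author</th></tr>
  <tr><td>The Notebook</td><td>Nicholas Sparks</td></tr>
</table></body></html>')

# CSS selector: matches <table> elements; html_element() keeps the first match.
via_css <- page %>% html_element("table") %>% html_table()

# XPath selector: "//table" matches <table> at any depth; again the first match is kept.
via_xpath <- page %>% html_element(xpath = "//table") %>% html_table()

all.equal(via_css, via_xpath)
```

For a page with a single table the two selectors are interchangeable; XPath becomes useful when the table has to be found by position or by an attribute.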

Read the XML and make it into a dataframe

booksxml <- read_xml("https://raw.githubusercontent.com/AnnaMoy/Data-607/main/Books.xml")

# Pull each field out as a character vector (one element per book)
title <- xml_text(xml_find_all(booksxml, xpath = "//title"))
author <- xml_text(xml_find_all(booksxml, xpath = "//author"))
genre <- xml_text(xml_find_all(booksxml, xpath = "//genre"))
published <- xml_text(xml_find_all(booksxml, xpath = "//published"))
ISBN <- xml_text(xml_find_all(booksxml, xpath = "//ISBN"))

# Combine the vectors into one tibble, one column per field
df2 <- tibble(title, author, genre, published, ISBN)

df2

## # A tibble: 3 × 5
##   title                                author              genre published ISBN 
##   <chr>                                <chr>               <chr> <chr>     <chr>
## 1 The Legend of Zelda:Twlight princess Akira Himekawa      Grap… 2017 by … 9781…
## 2 The Notebook                         Nicholas Sparks     Love… 1996 by … 9780…
## 3 The Whiteout                         Dhonielle Clayton,… Shor… 2022 by … 9780…
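Extracting each field with `//title`, `//author` and so on only lines up when every book has every field. Below is a hedged sketch of a per-record alternative, assuming each book sits in its own `<book>` element (the real element names in Books.xml may differ). `xml_find_first()` returns one result per book, and `xml_text()` turns a missing field into `NA`, so the rows stay aligned. The inline XML is hypothetical, and the second book deliberately leaves out `<genre>`.

```r
library(xml2)
library(tibble)

# Hypothetical XML: assumes one <book> element per record.
# The second book deliberately has no <genre> element.
xml_src <- '<books>
  <book>
    <title>The Notebook</title><author>Nicholas Sparks</author>
    <genre>Love stories</genre><published>1996 by Warner Books</published>
    <ISBN>9780446520805</ISBN>
  </book>
  <book>
    <title>The Whiteout</title><author>Dhonielle Clayton, Tiffany Jackson, Nic Stone</author>
    <published>2022 by Quill Tree Books</published>
    <ISBN>978006388146</ISBN>
  </book>
</books>'

doc <- read_xml(xml_src)

# Field-by-field approach: only 1 genre comes back for 2 books, so columns would misalign.
naive_genre <- xml_text(xml_find_all(doc, "//genre"))

# Per-record approach: find each <book>, then look up one child element per field.
book_nodes <- xml_find_all(doc, "//book")
field <- function(name) xml_text(xml_find_first(book_nodes, name))

df_xml <- tibble(
  title     = field("title"),
  author    = field("author"),
  genre     = field("genre"),     # NA for the second book
  published = field("published"),
  ISBN      = field("ISBN")
)
df_xml
```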

Read the JSON and make it into a dataframe

bookjson <- fromJSON("https://raw.githubusercontent.com/AnnaMoy/Data-607/main/Books.JSON")

json_data_frame <- as.data.frame(bookjson)
json_data_frame

##                                   title
## 1 The Legend of Zelda: Twlight princess
## 2                          The Notebook
## 3                          The Whiteout
##                                          Author          Genre
## 1                                Akira Himekawa Graphic novels
## 2                               Nicholas Sparks   Love stories
## 3 Dhonielle Clayton, Tiffany Jackson, Nic Stone  Short stories
##                  Published           ISBN
## 1        2017 by Viz Media 97814121593470
## 2     1996 by Warner Books  9780446520805
## 3 2022 by Quill Tree Books   978006388146
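When the JSON file is an array of objects (an assumption about Books.JSON, consistent with the column names printed above), `fromJSON()` already returns a data frame, and `tibble::as_tibble()` makes it print in the same compact style as the HTML and XML results. A sketch with a hypothetical two-record string:

```r
library(jsonlite)
library(tibble)

# Hypothetical JSON, assumed to be an array of book objects.
json_src <- '[
  {"title": "The Notebook", "Author": "Nicholas Sparks", "Genre": "Love stories",
   "Published": "1996 by Warner Books", "ISBN": "9780446520805"},
  {"title": "The Whiteout", "Author": "Dhonielle Clayton, Tiffany Jackson, Nic Stone",
   "Genre": "Short stories", "Published": "2022 by Quill Tree Books", "ISBN": "978006388146"}
]'

# fromJSON() simplifies an array of objects directly into a data.frame,
# so a separate as.data.frame() step is not needed for this shape.
books_df <- fromJSON(json_src)

# Converting to a tibble gives the same compact printout as the other two formats.
books_tbl <- as_tibble(books_df)
books_tbl
```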

Conclusion

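The HTML, XML and JSON versions contain the same three books and look much the same, but they are not identical. The column names differ in capitalization (Title in the HTML, title in the XML and JSON), the HTML table's ISBN column was parsed as a number and printed in scientific notation while the XML keeps it as text, and the source files themselves have small spelling and spacing differences (for example "HImekawa" in the HTML versus "Himekawa" in the XML and JSON). The JSON result also prints in a different layout because as.data.frame() returns a base data frame rather than a tibble. The XML was the hardest to load, because I had to extract each field into its own vector with XPath and then combine them into a dataframe.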
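One way to make the comparison concrete is to put the results into a common shape before comparing them. The `harmonise()` helper below is hypothetical: it lower-cases column names and stores every column as text, so any differences left over are differences in the values themselves. The one-row inputs are hypothetical extracts shaped like the HTML and XML results.

```r
library(tibble)

# Hypothetical helper: bring a books table into a common shape for comparison.
harmonise <- function(df) {
  df <- as.data.frame(df)            # drop tibble class so all inputs share one class
  names(df) <- tolower(names(df))    # "Title" and "title" become the same column
  df[] <- lapply(df, function(col) {
    # Numbers are written out in full (no scientific notation); text is left as is.
    if (is.numeric(col)) format(col, scientific = FALSE, trim = TRUE) else as.character(col)
  })
  df
}

# Hypothetical one-row extracts shaped like the HTML (numeric ISBN) and XML (text ISBN) results.
from_html <- tibble(Title = "The Notebook", Author = "Nicholas Sparks", ISBN = 9780446520805)
from_xml  <- tibble(title = "The Notebook", author = "Nicholas Sparks", ISBN = "9780446520805")

all.equal(harmonise(from_html), harmonise(from_xml))
```

Applied to books, df2 and json_data_frame, all.equal() would then point only at value-level differences such as the spelling differences noted above.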

Assignment 7

Anna Moy

2024-03-10
