Data_607_Week7

library(XML)
library(xml2)
library(htmltab)
library(rvest)
library(jsonlite)

# Read the HTML page and convert the table it contains into a data frame
webpage <- read_html("https://raw.githubusercontent.com/Patel-Krutika/Data_607/main/books.html")
webpage %>% html_table() %>% data.frame()

##         Book.Title              Author Year   Genre              Tag
## 1 The Great Gatsby F. Scott Fitzgerald 1925 Fiction            Novel
## 2     Little Women   Louisa May Alcott 1868 Fiction Domestic Fiction
## 3             1984       George Orwell 1949 Fiction  Science Fiction

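# Fetch the XML with xml2, then parse it with the XML package and flatten the records into a data frame
d <- xml2::read_xml("https://raw.githubusercontent.com/Patel-Krutika/Data_607/main/books.xml")
xmlParse(d) %>% xmlToDataFrame()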

##               name              author year   genre              tag
## 1 The Great Gatsby F. Scott Fitzgerald 1925 Fiction            Novel
## 2     Little Women   Louisa May Alcott 1868 Fiction Domestic Fiction
## 3             1984       George Orwell 1949 Fiction  Science Fiction

# Read the JSON file and convert the parsed result into a data frame
books_json <- fromJSON("https://raw.githubusercontent.com/Patel-Krutika/Data_607/main/books.json")
books_json %>% data.frame()

##         books.name        books.author books.year books.genre        books.tag
## 1 The Great Gatsby F. Scott Fitzgerald       1925     Fiction            Novel
## 2     Little Women   Louisa May Alcott       1868     Fiction Domestic Fiction
## 3             1984       George Orwell       1949     Fiction  Science Fiction

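The three data frames look alike when printed, but they are not identical. Their column names differ: the HTML version keeps the table's header text (Book.Title, Author, Year, Genre, Tag), the XML version uses the element names (name, author, year, genre, tag), and the JSON version prefixes each name with "books." (books.name, books.author, and so on). The column types differ as well: the HTML data frame preserved the integer type of the year column, whereas the XML and JSON data frames stored year as character.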
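
One way to confirm and reconcile differences like these is to inspect each column's class, then align the names and types. The following is a minimal sketch in base R, not the method used in this report. `books_html` and `books_xml` are hypothetical two-column stand-ins built inline so the code runs without network access, with values copied from the printed output above.

```r
# Hypothetical stand-ins for two of the imported data frames (assumption:
# only the title and year columns are reproduced, to keep the sketch short).
books_html <- data.frame(
  Book.Title = c("The Great Gatsby", "Little Women", "1984"),
  Year = c(1925L, 1868L, 1949L),          # integer, as in the HTML import
  stringsAsFactors = FALSE
)
books_xml <- data.frame(
  name = c("The Great Gatsby", "Little Women", "1984"),
  year = c("1925", "1868", "1949"),       # character, as in the XML import
  stringsAsFactors = FALSE
)

# Inspect the class of every column in each data frame.
sapply(books_html, class)
sapply(books_xml, class)

# Align the column names, then convert year from character to integer.
names(books_html) <- names(books_xml)
books_xml$year <- as.integer(books_xml$year)

# Compare the two data frames after harmonizing names and types.
all.equal(books_html, books_xml)
```

The same two steps (renaming, then `as.integer()` on the year column) would apply to the JSON data frame, whose names would first need the "books." prefix removed.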