# Load the necessary libraries
library(rvest)
library(jsonlite)
library(XML)
library(xml2)
# HTML data file
book_data <- "https://raw.githubusercontent.com/MRobinson112/assignment7/main/books1.html"
# warn = FALSE suppresses the "incomplete final line" warning for files without a trailing newline
html_file <- readLines(book_data, warn = FALSE)
# Join the lines back into a single string before parsing
html_file <- paste(html_file, collapse = "\n")
html_data <- read_html(html_file)
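# html_table() returns a list with one table per <table> element; take the first one.
# The result is stored as html_tables so it does not shadow the html_table() function.
html_tables <- html_data %>% html_table(fill = TRUE)
html_data_frame <- as.data.frame(html_tables[[1]])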
colnames(html_data_frame) <- c("Title", "Author", "ISBN", "Pages", "Publisher", "Attributes")
print(html_data_frame)
##                     Title                          Author              ISBN
## 1  The Catcher in the Rye                   J.D. Salinger 978-0-316-76948-7
## 2 All the President's Men by Carl Bernstein, Bob Woodward 978-0-671-21781-5
## 3           The Alchemist                    Paulo Coelho 978-0-06-250217-9
##   Pages                 Publisher
## 1   277 Little, Brown and Company
## 2   349          Simon & Schuster
## 3   197                 HarperOne
##                                                        Attributes
## 1                        Coming of age, alienation, and identity.
## 2 Investigative journalism, Watergate scandal, political history.
## 3                                Quest, personal legend, destiny.
# XML book file
xml_url <- "https://raw.githubusercontent.com/MRobinson112/assignment7/main/books1.xml"
xml_file <- readLines(xml_url, warn = FALSE)
xml_file <- paste(xml_file, collapse = "\n")
if (nzchar(xml_file)) {
  # asText = TRUE tells xmlParse() that the argument is XML content, not a file name
  xml_data <- xmlParse(xml_file, asText = TRUE)
  xml_data_frame <- xmlToDataFrame(xml_data)
  print(xml_data_frame)
} else {
  cat("XML content is empty.\n")
}
##                     title                          author              isbn
## 1  The Catcher in the Rye                   J.D. Salinger 978-0-316-76948-7
## 2 All the President's Men Carl Bernstein and Bob Woodward 978-0-671-21781-5
## 3           The Alchemist                    Paulo Coelho 978-0-06-231500-7
##   pages                 publisher
## 1   277 Little, Brown and Company
## 2   349        Simon and Schuster
## 3   208                 HarperOne
##                                                        Attributes
## 1                        Coming of age, alienation, and identity.
## 2 Investigative journalism, Watergate scandal, political history.
## 3                                Quest, personal legend, destiny.
# JSON book file
json_data <- "https://raw.githubusercontent.com/MRobinson112/assignment7/main/book1.json"
json_file <- fromJSON(json_data)
json_data_frame <- as.data.frame(json_file$books)
print(json_data_frame)
##                     title                          author              isbn
## 1  The Catcher in the Rye                   J.D. Salinger 978-0-316-76948-7
## 2 All the President's Men Carl Bernstein and Bob Woodward 978-0-671-21781-5
## 3           The Alchemist                    Paulo Coelho 978-0-06-231500-7
##   pages                 publisher
## 1   277 Little, Brown and Company
## 2   349          Simon & Schuster
## 3   208                 HarperOne
##                                                        Attributes
## 1                        Coming of age, alienation, and identity.
## 2 Investigative journalism, Watergate scandal, political history.
## 3                                Quest, personal legend, destiny.
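After reading all three files into R, the resulting data frames have the same structure: three books with six fields each, and R numbers the rows automatically when printing all of them. The printed values are not fully identical, though. The HTML data frame uses the capitalized column names assigned in the code, while the XML and JSON data frames keep the names from their files. The HTML file lists the authors of All the President's Men as "by Carl Bernstein, Bob Woodward" and gives The Alchemist the ISBN 978-0-06-250217-9 with 197 pages, whereas the XML and JSON files give "Carl Bernstein and Bob Woodward", 978-0-06-231500-7, and 208 pages. The XML file also spells the publisher "Simon and Schuster" where the other two use "Simon & Schuster".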
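
One way to check the comparison above cell by cell rather than by eye is sketched below. It is only an illustration under stated assumptions: the two small data frames are typed in from the printed HTML and JSON output (three of the six columns, to keep it short), and `compare_frames()` is a hypothetical helper written for this example, not part of rvest, XML, or jsonlite.

```r
# Values copied from the printed HTML and JSON output above.
# Only three of the six columns are included to keep the sketch short.
html_df <- data.frame(
  Title = c("The Catcher in the Rye", "All the President's Men", "The Alchemist"),
  ISBN  = c("978-0-316-76948-7", "978-0-671-21781-5", "978-0-06-250217-9"),
  Pages = c(277, 349, 197),
  stringsAsFactors = FALSE
)
json_df <- data.frame(
  title = c("The Catcher in the Rye", "All the President's Men", "The Alchemist"),
  isbn  = c("978-0-316-76948-7", "978-0-671-21781-5", "978-0-06-231500-7"),
  pages = c(277, 349, 208),
  stringsAsFactors = FALSE
)

# Hypothetical helper: lower-cases the column names, converts every column
# to character, and returns one row per cell where the two frames differ.
compare_frames <- function(a, b) {
  names(a) <- tolower(names(a))
  names(b) <- tolower(names(b))
  stopifnot(identical(dim(a), dim(b)), identical(names(a), names(b)))
  a[] <- lapply(a, as.character)
  b[] <- lapply(b, as.character)
  a_mat <- as.matrix(a)
  b_mat <- as.matrix(b)
  # arr.ind = TRUE gives (row, col) positions instead of flat indices
  idx <- which(a_mat != b_mat, arr.ind = TRUE)
  data.frame(
    row    = unname(idx[, "row"]),
    column = colnames(a_mat)[idx[, "col"]],
    first  = a_mat[idx],
    second = b_mat[idx],
    stringsAsFactors = FALSE
  )
}

differences <- compare_frames(html_df, json_df)
print(differences)
```

On these values the helper flags row 3 (The Alchemist) in the `isbn` and `pages` columns. The same call could be applied to `html_data_frame`, `xml_data_frame`, and `json_data_frame` from the chunks above, since it only assumes equal dimensions and column names that match after lower-casing.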