library(knitr)
library(XML)
library(jsonlite)
library(RCurl)
## Loading required package: bitops

Load information from HTML

# Fetch the HTML file I uploaded to GitHub
theURL_html <- getURL("https://raw.githubusercontent.com/bpersaud104/Data607/master/books.html")
books_html <- readHTMLTable(theURL_html, header = TRUE)
books_html
## $`Fantasy Books`
##                                               Title
## 1 The Lord of the Rings: The Fellowship of the Ring
## 2           Harry Potter and the Chamber of Secrets
## 3                                      The Talisman
##                   Author(s) Year published Pages
## 1            J.R.R. Tolkein           1954   423
## 2              J.K. Rowling           1999   315
## 3 Stephen King,Peter Straub           1984   921

Load information from XML

# Fetch the XML file I uploaded to GitHub
theURL_xml <- getURL("https://raw.githubusercontent.com/bpersaud104/Data607/master/books.xml")
books_xml <- xmlParse(theURL_xml)
root <- xmlRoot(books_xml)
xmlName(root)
## [1] "Fantasy_Books"
xmlSApply(root, function(x) xmlSApply(x, xmlValue))
##                Book                                               
## Title          "The Lord of the Rings: The Fellowship of the Ring"
## Author         "J.R.R. Tolkein"                                   
## Author         "DNE"                                              
## Year_Published "1954"                                             
## Pages          "423"                                              
##                Book                                      Book          
## Title          "Harry Potter and the Chamber of Secrets" "The Talisman"
## Author         "J.K. Rowling"                            "Stephen King"
## Author         "DNE"                                     "Peter Straub"
## Year_Published "1999"                                    "1984"        
## Pages          "315"                                     "921"
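
The matrix above can be reshaped into a one-row-per-book data frame. The following is only a sketch: the inline XML string is a hypothetical two-book stand-in for books.xml with its structure guessed from the printed output, and `book_to_row` is a helper name made up for this example. It drops the "DNE" placeholder and joins the remaining authors into one field.

```r
library(XML)

# Hypothetical stand-in for books.xml (structure assumed from the output above)
xml_text <- '<Fantasy_Books>
  <Book>
    <Title>The Talisman</Title>
    <Author>Stephen King</Author>
    <Author>Peter Straub</Author>
    <Year_Published>1984</Year_Published>
    <Pages>921</Pages>
  </Book>
  <Book>
    <Title>Harry Potter and the Chamber of Secrets</Title>
    <Author>J.K. Rowling</Author>
    <Author>DNE</Author>
    <Year_Published>1999</Year_Published>
    <Pages>315</Pages>
  </Book>
</Fantasy_Books>'

doc <- xmlParse(xml_text, asText = TRUE)
books <- getNodeSet(doc, "//Book")

# Illustrative helper: turn one <Book> node into a one-row data frame
book_to_row <- function(book) {
  authors <- xpathSApply(book, "./Author", xmlValue)
  authors <- authors[authors != "DNE"]  # drop the single-author placeholder
  data.frame(
    Title          = xpathSApply(book, "./Title", xmlValue),
    Authors        = paste(authors, collapse = ", "),
    Year_Published = as.integer(xpathSApply(book, "./Year_Published", xmlValue)),
    Pages          = as.integer(xpathSApply(book, "./Pages", xmlValue)),
    stringsAsFactors = FALSE
  )
}

# Stack the per-book rows into a single data frame
books_xml_df <- do.call(rbind, lapply(books, book_to_row))
books_xml_df
```

Collapsing the authors into one string mirrors the HTML and JSON layout; keeping them as a list column would be another reasonable choice if the authors need to be processed individually.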

Load information from JSON

# Fetch the JSON file I uploaded to GitHub
theURL_json <- getURL("https://raw.githubusercontent.com/bpersaud104/Data607/master/books.json")
books_json <- fromJSON(theURL_json)
books_json
## $`Fantasy Books`
##                                               Title
## 1 The Lord of the Rings: The Fellowship of the Ring
## 2           Harry Potter and the Chamber of Secrets
## 3                                      The Talisman
##                      Authors Year Published Pages
## 1             J.R.R. Tolkein           1954   423
## 2               J.K. Rowling           1999   315
## 3 Stephen King, Peter Straub           1984   921
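
To see how `fromJSON` arrives at this shape without fetching the file, here is a small sketch on an inline string. The string is a hypothetical one-record stand-in for books.json, with its structure assumed from the printed output: a top-level object whose "Fantasy Books" key holds an array of records, which jsonlite simplifies to a data frame.

```r
library(jsonlite)

# Hypothetical stand-in for books.json (structure assumed from the output above)
json_text <- '{
  "Fantasy Books": [
    {"Title": "The Talisman",
     "Authors": "Stephen King, Peter Straub",
     "Year Published": 1984,
     "Pages": 921}
  ]
}'

parsed <- fromJSON(json_text)

# The result is a named list; the table itself is the element under the key
books_df <- parsed[["Fantasy Books"]]
str(books_df)
```

Because the result is a list wrapping a data frame, any comparison with the other formats should use the extracted element (`books_json[[1]]` in the code above) rather than the list itself.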

Are the three dataframes identical?

The three data frames are not identical. The HTML and JSON results are almost the same except for the author column: the HTML table labels it "Author(s)" to cover books with one or several authors, while the JSON file names it "Authors" even when a book has a single author. The two also differ slightly in the year header ("Year published" versus "Year Published") and in the spacing after the comma between authors. The XML result looks different from both, although some tidying could probably bring it into the same shape. In the XML file each author is listed in a separate Author element, unlike HTML and JSON, where the authors share one field. For books with only one author I put DNE (does not exist) as the second author to show that there is no other author.
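
The comparison can also be checked in code. The sketch below uses hand-built one-row stand-ins for `books_html[[1]]` and `books_json[[1]]`, with values copied from the printed output. The column types (character for HTML, integer for JSON) are an assumption, since the output does not show them.

```r
# Hand-built stand-ins; column types are assumed, not taken from the real objects
html_df <- data.frame(
  Title            = "The Talisman",
  `Author(s)`      = "Stephen King,Peter Straub",
  `Year published` = "1984",
  Pages            = "921",
  check.names = FALSE, stringsAsFactors = FALSE
)
json_df <- data.frame(
  Title            = "The Talisman",
  Authors          = "Stephen King, Peter Straub",
  `Year Published` = 1984L,
  Pages            = 921L,
  check.names = FALSE, stringsAsFactors = FALSE
)

# Differs in column names, author spacing, and column types
same_before <- identical(html_df, json_df)

# Harmonise names, comma spacing, and types, then compare again
names(html_df) <- names(json_df)
html_df$Authors <- gsub(",\\s*", ", ", html_df$Authors)
html_df$`Year Published` <- as.integer(html_df$`Year Published`)
html_df$Pages <- as.integer(html_df$Pages)
same_after <- isTRUE(all.equal(html_df, json_df))

c(before = same_before, after = same_after)
```

`all.equal` is used for the second check because it tolerates small attribute differences that `identical` would flag.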