library(knitr)
library(XML)
library(jsonlite)
library(RCurl)
## Loading required package: bitops
# Read the HTML file I uploaded to GitHub
theURL_html <- getURL("https://raw.githubusercontent.com/bpersaud104/Data607/master/books.html")
books_html <- readHTMLTable(theURL_html, header = TRUE)
books_html
## $`Fantasy Books`
## Title
## 1 The Lord of the Rings: The Fellowship of the Ring
## 2 Harry Potter and the Chamber of Secrets
## 3 The Talisman
## Author(s) Year published Pages
## 1 J.R.R. Tolkein 1954 423
## 2 J.K. Rowling 1999 315
## 3 Stephen King,Peter Straub 1984 921
# Read the XML file I uploaded to GitHub
theURL_xml <- getURL("https://raw.githubusercontent.com/bpersaud104/Data607/master/books.xml")
books_xml <- xmlParse(theURL_xml)
root <- xmlRoot(books_xml)
xmlName(root)
## [1] "Fantasy_Books"
xmlSApply(root, function(x) xmlSApply(x, xmlValue))
## Book
## Title "The Lord of the Rings: The Fellowship of the Ring"
## Author "J.R.R. Tolkein"
## Author "DNE"
## Year_Published "1954"
## Pages "423"
## Book Book
## Title "Harry Potter and the Chamber of Secrets" "The Talisman"
## Author "J.K. Rowling" "Stephen King"
## Author "DNE" "Peter Straub"
## Year_Published "1999" "1984"
## Pages "315" "921"
# Read the JSON file I uploaded to GitHub
theURL_json <- getURL("https://raw.githubusercontent.com/bpersaud104/Data607/master/books.json")
books_json <- fromJSON(theURL_json)
books_json
## $`Fantasy Books`
## Title
## 1 The Lord of the Rings: The Fellowship of the Ring
## 2 Harry Potter and the Chamber of Secrets
## 3 The Talisman
## Authors Year Published Pages
## 1 J.R.R. Tolkein 1954 423
## 2 J.K. Rowling 1999 315
## 3 Stephen King, Peter Straub 1984 921
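The three results are not identical. The HTML and JSON versions both load as data frames and are almost the same; they differ mainly in the author column. In the HTML file I headed the column "Author(s)" to show that a book may have more than one author, while in the JSON file I named the key "Authors" even for books with a single author. This is a naming choice rather than a format restriction, since a JSON key can be any string. The output also shows two smaller differences: the HTML header reads "Year published" where the JSON key reads "Year Published", and the authors of The Talisman are separated by "," in HTML but by ", " in JSON. The XML version is structured differently. Each author is stored in a separate Author element instead of being combined into one field as in HTML and JSON, and the parsed result prints as a character matrix with one column per book rather than as a data frame. For books with only one author I put DNE (does not exist) in the second Author element so that every book has the same number of fields. With some tidying, the XML result could be made to look like the other two.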
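The tidying mentioned above could be done roughly as follows. This is a minimal sketch, not a definitive implementation: the inline `xml_text` is a shortened stand-in for books.xml whose structure is inferred from the printed output, and the names `book_to_row` and `books_xml_df` are made up for illustration. It drops the DNE placeholder and joins the remaining authors into a single column so that each book becomes one row.

```r
library(XML)

# Assumed structure of books.xml, inferred from the output above.
# Only two books are included to keep the example short.
xml_text <- '<Fantasy_Books>
  <Book>
    <Title>The Talisman</Title>
    <Author>Stephen King</Author>
    <Author>Peter Straub</Author>
    <Year_Published>1984</Year_Published>
    <Pages>921</Pages>
  </Book>
  <Book>
    <Title>Harry Potter and the Chamber of Secrets</Title>
    <Author>J.K. Rowling</Author>
    <Author>DNE</Author>
    <Year_Published>1999</Year_Published>
    <Pages>315</Pages>
  </Book>
</Fantasy_Books>'

doc   <- xmlParse(xml_text, asText = TRUE)
books <- getNodeSet(doc, "//Book")

# Turn one <Book> node into a one-row data frame (hypothetical helper).
book_to_row <- function(book) {
  # Named character vector: one entry per child element of the book
  fields  <- xmlSApply(book, xmlValue)
  authors <- fields[names(fields) == "Author"]
  authors <- authors[authors != "DNE"]   # drop the single-author placeholder
  data.frame(
    Title          = fields[["Title"]],
    Authors        = paste(authors, collapse = ", "),
    Year_Published = as.integer(fields[["Year_Published"]]),
    Pages          = as.integer(fields[["Pages"]]),
    stringsAsFactors = FALSE
  )
}

# Stack the one-row data frames into a single data frame
books_xml_df <- do.call(rbind, lapply(books, book_to_row))
books_xml_df
```

To run this against the real file, `xml_text` could be replaced with `theURL_xml` from the code above, assuming the file follows the same element names. The column names would still need to be aligned by hand before comparing the result with the HTML and JSON data frames.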